Pellint: An Ontology Repair Tool

July 2nd, 2008 · Kendall Clark

Pellet performance isn’t bad, it’s unpredictable. And that’s a problem. Pellint, our new ontology repair tool, is a solution to that problem.

Non-experts often find it disappointingly hard to predict reasoner performance from their ontology. The connections between performance and the data are opaque in a way that is sometimes confounding and off-putting. Pellet isn’t unique in this regard; all automated reasoners for expressive knowledge representation formalisms have this problem, some more so than Pellet.

Contrast RDBMS technology. Not only is the underlying computational complexity much better, but many developers have internalized the technology such that predictions are more reliable and the connection between data, queries, and performance is more transparent. And the bad stuff is always bad, till you fix it, and the good stuff is always good till you break it. And if all else fails you just EXPLAIN your way to happiness.

For serious Pellet users, including the ones who are customers of ours, the analogue of EXPLAIN is either an email to the Pellet users list or a support contract, respectively. But we really don’t like the fact that the only reliable way to maximize Pellet performance is to ask or hire (or become) an expert.

So what can we do to ease this problem, while working on the next big performance paradigm shift? Well, for one thing, we’re building design and support tools to help people sniff out problems in ontologies and, ideally, automatically repair them. Taking the classic C tool, lint, as our inspiration, we’ve developed Pellint, a lint tool for Pellet that reports and repairs modeling constructs that are known to have bad performance characteristics. Actually, our new intern, Harris Lin, has been working with Evren on Pellint, and we’re all impressed with the quality of his work on the tool.

Pellint takes an OWL ontology as input and can report on problematic modeling constructs (which we call “patterns”), or it can simply output a repaired ontology with the troublesome patterns rewritten or omitted.
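To make the report-or-repair idea concrete, here is a minimal sketch in Python. It is not Pellint's code or API: the `Axiom` tuple, the `complex-left-side` pattern, and the `lint` and `repair` functions are all hypothetical names invented for illustration, and the "repair" shown only omits flagged axioms rather than rewriting them.

```python
from typing import NamedTuple


class Axiom(NamedTuple):
    """Toy stand-in for an OWL axiom (hypothetical, not Pellint's data model)."""
    kind: str
    left: object   # a class name (str) or a nested tuple for a complex expression
    right: object


def complex_left_side(axiom):
    # Illustrative pattern only: a subclass axiom whose left side is not a named class.
    return axiom.kind == "SubClassOf" and not isinstance(axiom.left, str)


# Registry of pattern name -> detector function.
PATTERNS = {"complex-left-side": complex_left_side}


def lint(ontology):
    """Report mode: list (pattern name, axiom) for every match."""
    return [
        (name, axiom)
        for axiom in ontology
        for name, matches in PATTERNS.items()
        if matches(axiom)
    ]


def repair(ontology):
    """Repair mode: return a copy of the ontology with flagged axioms omitted."""
    flagged = {axiom for _, axiom in lint(ontology)}
    return [axiom for axiom in ontology if axiom not in flagged]


ontology = [
    Axiom("SubClassOf", "Dog", "Animal"),
    Axiom("SubClassOf", ("and", "Pet", "Canine"), "Dog"),
]
for name, axiom in lint(ontology):
    print(name, axiom)
print(len(repair(ontology)), "axiom(s) kept")
```

The point of the registry is that adding a newly reported pattern would only mean adding one more detector function.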

We’ll be releasing an early adopter’s version of Pellint soon so that experts and eager users in the OWL community can test it on their ontologies, report other patterns, and give us feedback on improvements.

Early testing with our “known bad” ontologies collection is very encouraging, and we expect a production-ready version of Pellint to be released simultaneously with the next major Pellet release.

“Major-league bullshit”

July 2nd, 2008 · Bijan Parsia

I just saw this wonderful, wonderful Carlin quote (from, via):

When it comes to bullshit…bigtime, major league bullshit…you have to stand in awe of the all-time champion of false promises and exaggerated claims…religion. ... Religion has actually convinced people that there’s an invisible man living in the sky who watches everything you do, every minute of every day. And the invisible man has a special list of 10 things he does not want you to do. And if you do any of these 10 things he has a special place full of fire and smoke and burning and torture and anguish where he will send you to live and suffer and burn and choke and scream and cry for ever and ever until the end of time…but he loves you.

I wonder what Carlin would have said about tech advocacy (and Semantic Web Advocacy).

(It’s good to practice humility. Treat your own cherished claims as bullshit. What survives may be gold.)

(I’ll have some hard core techy stuff soon! Never fear!)

Architectural Arguments

June 30th, 2008 · Bijan Parsia

Further (if weak) evidence that appeals to puppies are rather non-technical (and not even socio-technical):

...HTML owns that process of extracting a valid URI-reference from an attribute’s value string. A simple string parsing description, with associated context-specific error-handling, is more than sufficient to satisfy the needs of HTML5 without appearing to override an existing standard that has recently been agreed to by all vendors, including the few browser vendors that care about HTML5
In contrast, pretending to define a new URL standard as part of HTML5 is not acceptable. HTML5 is a user of the Web, not a definer of it. HTML will never define the identifiers for the Web. That would be a fundamental violation of the Web architecture.—Roy Fielding

This just seems to mix up process/spec structure with system structure. It’s a bit like saying that the architecture of a building is ungainly because the blueprints are all smeared up.

The first paragraph isn’t insane. There could be a dispute as to whether the existing standard is in fact agreed to and, even if it is, whether it is de facto (or merely de jure) and what to do about it.

The second paragraph doesn’t seem to appeal to any facts at all. Sentence one is just Roy saying he doesn’t like it (though expressed in pseudo-factual terms). The second seems just false on any remotely literal reading (HTML5 isn’t the kind of thing that can use anything, much less the web!). The third sentence seems more like a declaration of his intent (i.e., it’ll never happen because he’ll make it never happen). The last seems factual, but it is definitely contentless, at least without serious, serious supplement (i.e., it should be a conclusion, but we haven’t even seen whether it’s a violation of the standards for writing blueprints or of what the blueprints say; whose blueprints are they anyway? can we sensibly talk about blueprints for the Web?).

Thus, Roy is not giving public reasons, primarily. He’s just expressing strong dissent with some coloring of expertise to hide the bruteness of that dissent. That’s not happy discussion technique, IMHO.

Introduction to Pellet: New York Semantic Web Meetup Talk

June 27th, 2008 · Kendall Clark

Mike Smith and I ran up to NYC yesterday to give a talk about Pellet and OWL at the New York Semantic Web Meetup. The talk went pretty well, though we still have a lot to learn about how to sell Pellet and OWL to people who don’t already get KR, logic programming, etc. Typically when we give a technical intro to Pellet, there are mostly blank stares. Then when we get to the section of slides about how people are using Pellet, the mood in the room changes. People get that stuff a lot more easily, since they have the same or similar problems.

So, natural born geniuses that we all are, from now on we’ll start leading with the ways Pellet is being used successfully, and only follow-up with tech details when people ask. Doh… :>

That said, there were plenty of very techie people in the room and some of them had very good questions in the Q&A about Pellet performance, our use of it to manage XACML policies, and probabilistic reasoning with Pronto.

Mike did a great job compressing a 60 minute talk into—whoops—20 minutes, which was just as well since we had to catch Amtrak back to DC and barely made it on time. As a SemWeb hub, NYC is starting to pick up steam. Lots of interesting people doing interesting things at the meetup last night, so that’s a great sign. Even in a crap economy, NYC is still a good place to do biz. As a place to be, NYC always kicks my ass for the first 2 hours, then I start asking myself: why don’t I live here? Goddam, I love that city.

Mad props to Marco Neumann who’s doing a good job organizing the meetup. Way back in the day I founded and ran a Linux Users Group in Dallas, which at its peak had 300+ at monthly meetings, so I know how much hard work it is.

Madder props to my old friend Paul Ford for coming out to listen to the talks and say hello. Paul, who lives (famously) in Brooklyn (ftrain!), is the genius behind Harpers.org, which happens to be a SemWeb-powered site, using RDF. This hasn’t gotten nearly enough play in SemWeb circles in my opinion. Harpers is certainly the best American magazine ever and its very cool web presence is all RDF-powered. How great is that.

Maddest props to Mike Smith for putting up with the mad dash up to NYC and for the great talk he gave.

Why Reasoning Matters: Explanations (3)

June 23rd, 2008 · Kendall Clark

Previously I talked about the most fundamental reasoning service, consistency checking. It’s the most fundamental because every other reasoning service, ultimately, is performed by doing one or more consistency checks. I undersold the utility of consistency checking last time intentionally, because saying it’s key to all the other things one can do with automated reasoning isn’t very interesting before you know about some other things automated reasoning can do.

To recall, consistency checking itself is useful in, for example, data integration projects because it eliminates run-time and query-time errors based on conceptual or modeling issues, and it does that at design-time and with certain guarantees, modulo bugs, about soundness and completeness.

(A neglectable aside: in automated reasoning, “sound and complete” comes up a lot. In principle, a reasoner is “sound and complete” if, but only if, it uses a decision procedure (i.e., a kind of algorithm) that is sound and complete. Which means that it is guaranteed to give no wrong answers (“sound”) and to give all the answers there are to give (“complete”). I say “in principle” because automated reasoners have bugs just like any complex software. I say “guaranteed” because someone has proven the soundness or completeness, or both, of the decision procedure. Unsound automated reasoners are not, as far as I know, very interesting for real apps. But when designing an automated reasoner, people often trade completeness for efficiency by implementing an incomplete decision procedure: there are answers that such a reasoner can never provide, by design. But at least it doesn’t provide them quickly!)
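(For the code-minded, a toy illustration of that trade, in Python. It assumes a deliberately tiny language with nothing but "X is a subclass of Y" statements, and the function names are made up for this sketch; it says nothing about how Pellet works internally. Both procedures are sound, but only the first is complete for this toy language.)

```python
def entailed_superclasses(told, cls):
    """Sound and complete for the toy language: follows subclass links transitively."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for sub, sup in told:
            if sub == current and sup not in found:
                found.add(sup)
                stack.append(sup)
    return found


def direct_superclasses(told, cls):
    """Sound but incomplete: cheaper, but only ever returns one-hop answers."""
    return {sup for sub, sup in told if sub == cls}


told = {("Beagle", "Dog"), ("Dog", "Mammal"), ("Mammal", "Animal")}
everything = entailed_superclasses(told, "Beagle")
some = direct_superclasses(told, "Beagle")

# The incomplete procedure never gives a wrong answer; it just misses some.
print(sorted(everything))
print(sorted(some))
print(sorted(everything - some), "are answers it can never provide")
```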

By way of comparison, Linked Data and RDF triple store vendors try to make a virtue of their vice: they can’t do consistency checking, so they claim no one would ever want or need to do it. As to this tendency, I blame no one. I’d say the same thing, too!

Explanations

Now I want to talk about the utility of another reasoning service, which in Pellet we call “explanation, debugging, and repair”—in this post, I’ll focus only on explanation, saving the others for another day.

The utility of explanation, in a nutshell, is that the reasoner can not only create new knowledge from existing knowledge by means of inference, but it can also—and this is the cool part—tell you how it reached the conclusion, or inference, that it reached.

So Pellet can derive new knowledge and then explain how it derived that new knowledge. It explains its inferences by providing the minimal set of facts or other knowledge necessary to draw the inference.
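Here is a rough sketch of what "the minimal set of facts" means, in Python. To be clear about the assumptions: this is not Pellet's explanation algorithm, the only statements allowed are toy "X is a subclass of Y" pairs, and `entails` and `explain` are hypothetical helper names. The sketch uses a simple shrink-by-deletion strategy: drop each statement in turn and keep it dropped if the conclusion still follows.

```python
def entails(axioms, sub, sup):
    """True if `sup` can be reached from `sub` by following subclass links."""
    reachable, stack = {sub}, [sub]
    while stack:
        current = stack.pop()
        for a, b in axioms:
            if a == current and b not in reachable:
                reachable.add(b)
                stack.append(b)
    return sup in reachable


def explain(axioms, sub, sup):
    """Return a minimal subset of `axioms` that still entails sub <= sup, or None."""
    if not entails(axioms, sub, sup):
        return None
    justification = list(axioms)
    for axiom in list(axioms):
        trial = [a for a in justification if a != axiom]
        if entails(trial, sub, sup):
            justification = trial  # the inference survives without this axiom
    return justification


axioms = [
    ("Beagle", "Dog"),
    ("Dog", "Mammal"),
    ("Cat", "Mammal"),   # irrelevant to the question below
    ("Mammal", "Animal"),
]
print(explain(axioms, "Beagle", "Animal"))
```

The result is minimal in the sense that removing any one remaining statement breaks the inference; the statement about cats is left out because it was never needed.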

Think about a perfectly ordinary conversation between two people. Bob tells Nancy a lot of stuff about the physics of baseball, how curve balls work, and so on. Nancy thinks about what Bob’s said and infers some other stuff based on it; for instance, maybe she draws some conclusions about how split-finger fastballs work based on how 3-seam fastballs work. So Nancy tells Bob these new bits of knowledge she’s inferred from what he told her about baseball physics. Then Bob asks Nancy to tell him why she reached those new bits. And, in response, Nancy picks out just the bits that she relied on and tells Bob about them.

This is damn useful because, while people are themselves reasoners by virtue of their nature, they tend to be very skeptical about machine or automated reasoning. And with good reason! Explanations provide a means for people to, basically, check the computer’s work.

Configuration Management

One of the ways we’ve used Pellet for customers is to build a configuration management engine for some problem using Pellet and ontologies. Basically you make an ontology describing some problem domain—say, stereo equipment. The ontology describes what things there are in this part of the world (AudioEquipment, Receiver, Interconnect, Speaker, Component, AcceptableUsePolicy, etc.) and what relationships between those things are possible (connect_to, provides_output, etc). The application takes some input from the user, say, a preferred stereo system buildout, and determines whether that’s a legal configuration by means of some of its built-in reasoning services.

So far, so cool. But it’s one thing to tell a user that her preferred buildout isn’t legal; it’s a much better thing to tell her why it’s not legal (explanation) and to suggest changes she can make (repair) that will make it legal. Obviously this capability exists on every PC and auto manufacturer’s web site, among many other places. I will save for another day a discussion of why you might want to solve this problem with an automated reasoner and OWL, rather than, say, Java or Python code.

But there’s another reason why explanation is so useful: not only does it let people check the computer’s work, but it also allows people to understand mistakes they’ve made. In other words, Pellet’s explanation service works for every inference it makes, including the inference that something is inconsistent. Pellet doesn’t just tell you that you’ve made a mistake, it shows you which bits of what you’ve said, and what it inferred, together cause the mistake.
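As a toy illustration of "showing which bits cause the mistake", here is a small Python sketch. It assumes a drastically simplified setup (individuals with asserted types, plus pairs of classes declared disjoint), and the `find_clashes` function and the `unit1`/`unit2` individuals are invented for this example; a real reasoner like Pellet handles far richer inconsistencies than this.

```python
from itertools import combinations


def find_clashes(type_assertions, disjoint_pairs):
    """For each inconsistency, return the statements that together cause it."""
    classes_of = {}
    for individual, cls in type_assertions:
        classes_of.setdefault(individual, []).append(cls)

    clashes = []
    for individual, classes in classes_of.items():
        for a, b in combinations(classes, 2):
            if frozenset((a, b)) in disjoint_pairs:
                first, second = sorted((a, b))
                clashes.append([
                    (individual, a),                       # what you said
                    (individual, b),                       # what else you said
                    ("DisjointClasses", first, second),    # why they cannot both hold
                ])
    return clashes


assertions = [("unit1", "Receiver"), ("unit2", "Speaker"), ("unit1", "Speaker")]
disjoint = {frozenset(("Receiver", "Speaker"))}

for clash in find_clashes(assertions, disjoint):
    print("Inconsistent because of:", clash)
```

Note that `unit2` never shows up in the output: it is part of the data, but not part of the mistake.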

A Comparison with RDF & Linked Data

By the way, Linked Data and RDF triple stores don’t do consistency checking and they don’t do explanations, either. Why not? Since the kinds of inferences one can draw in RDF are mostly trivial or meaningless, there’s nothing that needs to be explained. Anyone can understand the most complex RDF inference by thinking about it for no more than 2.7 seconds. That’s a scientific fact! :)

In cases where you don’t care about stuff like explanations, debugging and automatic repair, RDF can be a good choice, depending on a lot of other factors. But in cases where that stuff really matters, you need to think about automated reasoning languages and systems, like OWL and Pellet.