Archive for September, 2007

Introducing Pronto: Probabilistic DL Reasoning in Pellet

Thursday, September 27th, 2007 · Pavel Klinov

This is the first in a series of posts on extending Pellet with probabilistic reasoning capabilities. We call this tool “Pronto”. It offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”.

The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally, for example, risk factors associated with breast cancer.

First, I should say that if you are interested in a rigorous description of the approach, read the paper by Thomas Lukasiewicz “Probabilistic Description Logics for the Semantic Web”. Pronto is to a large extent an implementation of the Lukasiewicz approach—the rest is optimization and the support of explanations.

In a nutshell, the features of Pronto (in addition to the features of Pellet) are the following:


  1. Expressing generic probabilistic knowledge. “Generic” means that the knowledge doesn’t apply to any specific individual but rather to a fresh, randomly chosen one. Generic probabilistic knowledge is represented in the form of a generic conditional constraint (GCC). A GCC is an expression of the form (D|C)[l,u], where C and D are DL concepts and [l,u] is a closed subinterval of [0,1]. Without getting deeply into the semantics, the meaning of a GCC is roughly this: for a randomly chosen instance of C, the probability of being an instance of D is within [l,u]. The above statement about birds would be written as (FlyingObject|Bird)[0.9;1.0].

  2. Expressing concrete probabilistic knowledge. Here the knowledge applies to a specific individual. Concrete probabilistic knowledge is represented in the form of a:X, where “a” is an individual and “X” is a GCC restricted to the form (D|owl:Thing)[l,u]. We can express “Tweety is-a Flying Object with probability less than 5%” as Tweety:(FlyingObject|owl:Thing)[0.0;0.05].

  3. Probabilistic reasoning, that is, generic and concrete entailments. Generic entailment: given a probabilistic KB and a pair of concepts C and D, compute the tightest interval [l,u] for which (D|C)[l,u] is entailed. Concrete entailment: given a probabilistic KB, an individual “a”, and a concept “D”, compute the tightest interval [l,u] for which a:(D|owl:Thing)[l,u] is entailed. So we can ask Pronto to infer the probability that Tweety is a flying object from other statements, rather than asserting the conditional constraint directly.

  4. Probabilistic explanations. Pronto is capable of computing all minimally sufficient (w.r.t. inclusion) subsets of conditional constraints for a particular entailment, both generic and concrete.
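To make the (D|C)[l,u] notation above concrete, here is a minimal Python sketch of a conditional constraint as a data structure. It is not Pronto's API: the class name `ConditionalConstraint`, its field names, and the helper `satisfied_by` are all hypothetical, and the check uses a plain frequency reading over a finite list of individuals, which only approximates the rough "randomly chosen instance" reading given above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConditionalConstraint:
    """A conditional constraint (D|C)[l,u]; all names here are hypothetical."""
    conclusion: str  # D
    evidence: str    # C
    lower: float     # l
    upper: float     # u

    def __post_init__(self):
        # [l,u] must be a closed subinterval of [0,1]
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("[l,u] must be a closed subinterval of [0,1]")

    def __str__(self):
        return f"({self.conclusion}|{self.evidence})[{self.lower};{self.upper}]"


def satisfied_by(gcc, population):
    """Frequency reading: share of C-instances that are also D-instances.

    `population` is a list of sets of class names, one set per individual.
    """
    in_c = [classes for classes in population if gcc.evidence in classes]
    if not in_c:
        return True  # no instances of C, so the constraint holds vacuously
    share = sum(gcc.conclusion in classes for classes in in_c) / len(in_c)
    return gcc.lower <= share <= gcc.upper


# The two constraints used as examples in the text.
birds_fly = ConditionalConstraint("FlyingObject", "Bird", 0.9, 1.0)
tweety_flies = ConditionalConstraint("FlyingObject", "owl:Thing", 0.0, 0.05)
```

Printing `birds_fly` gives the same notation used in the post, and a concrete constraint for Tweety is simply the second object paired with the individual's name.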

Perhaps the single most important point about Pronto reasoning is that all inferences are done in a totally “logical” way, i.e. using a well-defined entailment relation and without any explicit or implicit translation of KB (or some parts of KB) to Bayesian graphs. This is the major difference between Pronto and other approaches, e.g. P-CLASSIC or “Probabilistic Extension to OWL”.

Finally, I should mention overriding as a feature of Pronto that we particularly like. Pronto allows certain conflicts between different pieces of probabilistic knowledge, more precisely, between different conditional constraints. The famous example is that of flying birds and non-flying penguins. (It’s similar to the famous Nixon Diamond problem.) The problem here is related to non-monotonicity: A bird is a flying object with high probability and all penguins are birds but a penguin has a low probability of flying.

The way Pronto resolves these conflicts is by allowing more specific constraints to override more generic ones. So if Pronto knows that Tweety is a Penguin and Penguin is a subclass-of Bird, it will override the constraint (FlyingObject|Bird)[0.9;1.0] by (FlyingObject|Penguin)[0.0;0.05] and correctly entail Tweety:(FlyingObject|owl:Thing)[0.0;0.05]. This idea is borrowed from reference class reasoning and is supported by Lehmann’s lexicographic entailment, which Pronto employs (see the Lukasiewicz paper for technical details). Whether one constraint is more specific or more generic than another is decided through classical DL reasoning.
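The overriding intuition can be sketched in a few lines of Python. This is only the reference-class idea (the most specific applicable class wins), not Lehmann's lexicographic entailment and not how Pronto is implemented. The taxonomy and constraint tables and the function names are hypothetical, and the subsumption check is a simple graph walk standing in for classical DL reasoning.

```python
# Hypothetical toy taxonomy: class -> direct superclasses.
SUPERCLASSES = {"Penguin": {"Bird"}, "Bird": {"owl:Thing"}, "owl:Thing": set()}

# Intervals for FlyingObject, keyed by the evidence class of each constraint.
CONSTRAINTS = {"Bird": (0.9, 1.0), "Penguin": (0.0, 0.05)}


def ancestors(cls):
    """All classes that subsume `cls`, including `cls` itself."""
    seen, stack = set(), [cls]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(SUPERCLASSES.get(current, ()))
    return seen


def most_specific_interval(cls):
    """Return the interval attached to the most specific applicable class."""
    applicable = [c for c in CONSTRAINTS if c in ancestors(cls)]
    # A class wins if no other applicable class lies strictly below it.
    winners = [c for c in applicable
               if not any(o != c and c in ancestors(o) for o in applicable)]
    if len(winners) != 1:
        return None  # nothing applies, or this sketch cannot resolve the tie
    return CONSTRAINTS[winners[0]]
```

For an individual known to be a Penguin, the Penguin interval overrides the Bird interval; for one known only to be a Bird, the Bird interval is used.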

In the next post of this series, I’ll take you through an actual use of Pronto in the life sciences domain.

New jSpace App: Baseball Stats Browser

Thursday, September 20th, 2007 · Michael Grove

We’re big baseball fans around here, and if you were in our office, you’d frequently find us debating various baseball topics, such as whether or not Jeff Bagwell had a Hall of Fame career, or lamenting the sad state of the Orioles franchise. So it was just a matter of time until our interests at work collided with our love of the national pastime and yielded our latest demo of jSpace over a data set near and dear to our hearts: baseball statistics.

Retrosheet is a great site for baseball statistics; they have a very comprehensive database of stats for nearly every game played dating back to about 1870. We’ve used this site in the past to settle more than a few debates. One of the great things about Retrosheet is that they provide a dump of their data, which we scraped into RDF and loaded into Sesame. It’s about 7.1M triples and contains all the season stats (pitching, batting and fielding) for all the players in their database who played between 1871 and 2006. That’s nearly 17000 players, from Hank Aaron to Ryan Zimmerman and everyone in between. It also includes all relevant associated team, league and position data.

We’ve created a model file for this data and hooked it up to jSpace for browsing, and it makes for a very cool demo—in fact, we think it’s probably gone beyond a demo and is a nearly-useful tool for sabermetricians, baseball stats junkies, etc.

Under the assumption you’re looking for a player, our model contains columns for all relevant statistics in the database, as well as some calculated ones, such as Slugging %, Runs Created or Zone Rating. You can very easily find all first basemen who played for any of the Baltimore Orioles franchises (there are several) who have hit more than 200 homeruns in their career and have a lifetime batting average of .280 or better; all it takes is a few clicks in jSpace’s interface. Maybe you don’t care what franchise they played for. No problem: you’re just a click away from finding all players that meet those criteria, regardless of the franchise for which they played.

To make the demo possible, we’ve added some new features to jSpace. Among the new features is support for custom UIs for columns containing typed literals. So in the case of our baseball demo, the homeruns column, which is full of integer values, now gives you the option to restrict your search using numerical operations such as greater than or less than. If a column contains date entries, you can specify that you’re looking for things before or after a certain date. And you can even set the granularity of a column of numbers, so rather than showing ALL the homerun values, you can display them in buckets of 10 or 100, narrowing down the results in a column from potentially hundreds to just a handful. The custom column UI support is extensible, so it would be easy to make new column views, such as a map for geographic data, or a calendar for date values rather than a list.
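The bucketing and numeric restriction described above boil down to simple arithmetic. The following Python sketch is not jSpace code; the function names and the sample values are made up purely for illustration.

```python
from collections import Counter


def bucket_start(value, width):
    """Lower bound of the bucket that holds a non-negative integer value."""
    return (value // width) * width


def bucket_counts(values, width):
    """Map 'low-high' bucket labels to counts, ordered by bucket start."""
    counts = Counter(bucket_start(v, width) for v in values)
    return {f"{low}-{low + width - 1}": n for low, n in sorted(counts.items())}


def greater_than(values, threshold):
    """Numeric restriction of the kind a typed-literal column could offer."""
    return [v for v in values if v > threshold]


# Hypothetical career homerun totals, not taken from the Retrosheet data.
homeruns = [3, 12, 17, 45, 210, 215]
by_ten = bucket_counts(homeruns, 10)
by_hundred = bucket_counts(homeruns, 100)
```

With a width of 10 the six sample values collapse into four buckets, and with a width of 100 into two, which is the narrowing effect the granularity setting is meant to give.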

We’ve also added a new Information Panel component called the “Web View” which simply takes the currently selected resource and shows a web page relevant to that selection. We provide out of the box support for Wikipedia and Google’s “I’m Feeling Lucky” searches. So in our baseball demo, when you select “First Base” from the list of positions, the web view will show Wikipedia’s entry on first basemen. When you select a player, you’ll see Retrosheet’s player data page. The web view brings more information about what you are browsing right to your fingertips.

We think these new features and a very cool data set make for a great demo of jSpace and we invite you over to check it out. Just browse to the jSpace page, scroll down to the bottom of the page where we list the demos, and click on the link to launch the jSpace baseball demo.

Understanding SWRL (Part 3): Some tricky bits

Thursday, September 13th, 2007 · Bijan Parsia

What could possibly be tricky about DL Safe SWRL rules? Well, the fact that I have to write out “DL Safe SWRL Rules” rather than merely “DL Safe Rules” or “SWRL Rules” should be some sort of indicator.

Recall that while (arbitrary) SWRL rules act as a very expressive sort of class (or property, or class ‘n’ property, or…) axiom, the DL Safety restriction makes them act (most directly) on named individuals only. The restriction to named individuals is common to (most) databases and, in some ways, to Prolog and many rule languages as well.

(Sometimes, in query contexts, you’ll hear the term “active domain”. So, “domain”: the set of individuals we are talking about. “Active” I’m not so clear on, but I think the intuition is that it’s the part of the domain that you’ve directly touched and are working with by naming those individuals directly. Remember that in OWL it’s easy to describe things for which you have no name. Existential/someValuesFrom restrictions are the obvious case. I can say that you have a parent without knowing, without knowing at all, who that parent is. I can say that you have a parent who is a doctor without knowing if that parent is your mom, your other mom, your dad, your step-dad, etc. That’s what makes arbitrary SWRL rules so powerful, yet difficult to work with: they have to consider all these possibilities (and more!). DL Safe SWRL rules merely need to consider the parents we’ve actually named.)
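The difference between ranging over the whole domain and ranging over the named individuals can be sketched in Python. Everything here is hypothetical: the individuals, the rule ChildOfDoctor(x) :- hasParent(x, y), Doctor(y), and the use of a single placeholder "_:p1" to stand for an unnamed parent. A real reasoner considers all models rather than one placeholder witness, so treat this strictly as an illustration of which bindings a DL Safe rule is allowed to use.

```python
# Hypothetical toy data. "_:p1" stands for a parent whose existence is
# asserted by an existential restriction but who has no name in the KB.
NAMED = {"alice", "bob", "carol"}
HAS_PARENT = {("alice", "_:p1"), ("bob", "carol")}
DOCTOR = {"_:p1", "carol"}


def child_of_doctor(dl_safe):
    """Apply ChildOfDoctor(x) :- hasParent(x, y), Doctor(y)."""
    results = set()
    for x, y in HAS_PARENT:
        if y not in DOCTOR:
            continue
        if dl_safe and not (x in NAMED and y in NAMED):
            continue  # DL Safe: every variable must bind to a named individual
        results.add(x)
    return results


with_dl_safety = child_of_doctor(dl_safe=True)      # only named parents count
without_dl_safety = child_of_doctor(dl_safe=False)  # unnamed parent counts too
```

Under the DL Safety reading only bob qualifies, because carol is a named doctor; without the restriction alice qualifies as well, through her unnamed parent.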

However, this is only one point of similarity. There are lots of ways that even DL Safe SWRL rules retain their SWRLiness…so one needs to be careful not to leap from, “Oh, DL Safe SWRL rules are more like Datalog rules because their variables range over the active domain alone!” to “So, I can slap a Datalog engine in front of my OWL reasoner and everything is hunky dory!”

No. This just isn’t true. There are cases where a full-fledged OWL reasoner will give the same answers (for a particular class of answers) as a Datalog engine given the same input (e.g., a certain OWL KB and a bunch of DL Safe SWRL rules). But, as we saw, a SWRL KB with the rules interpreted as DL Safe will produce the same answers, for a certain class of answers, as that SWRL KB with the rules treated generally. The relations between the formalisms (and their associated reasoner behavior) are not always obvious. Just off the top of my head, I’d say that as long as you don’t use any funky features such as negation as failure, most more or less naive combinations of a rule engine with a DL reasoner to process DL Safe SWRL rules should be sound (i.e., won’t give you wrong answers) but not complete (i.e., will miss answers). (Don’t hold me to this!)

Take a simple example, ever so lightly adapted from the original DL Safe rule paper:


:Child(x) :- :GoodChild(x).
:Child(x) :- :BadChild(x).

:Oedipus rdf:type [a owl:Class; owl:unionOf (:BadChild :GoodChild)].

(We’ll presume both rules are interpreted with the DL Safety restriction.)

(Ok, I’m not even pretending to make these directly usable anymore. It’s so much work! It really shouldn’t be the case that standardization makes building tutorials like this a slit-your-wrists matter. However, there is an SWRL encoding in RDF/XML of (a slightly modified version of) table 3 of the paper, which includes the Oedipus example.)

Question: Is Oedipus a child or not?

Let’s ask Pellet:


   > ./pellet.sh -if http://owldl.com/ontologies/dl-safe.owl -r -c TREE -s OFF
    Input file: http://owldl.com/ontologies/dl-safe.owl
    Consistent: Yes
    Time: 2050 ms (Loading: 1456 Consistency: 82 Classification: 32 Realization: 480 )
    Classification:
     owl:Thing - (dl-safe:Remus)
        dl-safe:BadChild - (dl-safe:Cain)
        dl-safe:Child - (dl-safe:Oedipus, dl-safe:Cain)
        dl-safe:GoodChild
        dl-safe:Grandchild
           dl-safe:Person - (dl-safe:Abel, dl-safe:Adam, dl-safe:Cain, dl-safe:Romulus)

Pellet says the answer is “yes”, even though we don’t know whether Oedipus is a good child or a bad child…we just know that he’s one or the other. And that’s exactly right. A Prologgy engine won’t show that. For example, you can try this rendition in a Prolog in Javascript implementation:


    % Below this line to the next percent sign goes in the top box.
    child(X) :- goodChild(X).
    child(X) :- badChild(X).
    rdfType(oedipus, unionOf([goodChild, badChild])).
    goodChild(bijan).
    % This goes into the query box
    child(X).
    % Hit "Run Query" and the result will include the following line mentioning bijan but no binding for oedipus
    X = bijan
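Why the two systems disagree can be shown with a brute-force sketch in Python. It is neither Pellet's algorithm nor a real rule engine: it simply enumerates every truth assignment over a tiny vocabulary, keeps those that satisfy the two rules and the two assertions, and calls a fact entailed if it holds in all of them. The function names are hypothetical, and the individual bijan is carried over from the Prolog rendition above.

```python
from itertools import product

INDIVIDUALS = ["oedipus", "bijan"]
CLASSES = ["GoodChild", "BadChild", "Child"]


def is_model(world):
    """`world` maps (class, individual) pairs to True or False."""
    for i in INDIVIDUALS:
        # The two rules, read as first-order implications.
        if world["GoodChild", i] and not world["Child", i]:
            return False
        if world["BadChild", i] and not world["Child", i]:
            return False
    # Oedipus is a GoodChild or a BadChild; we are not told which.
    if not (world["GoodChild", "oedipus"] or world["BadChild", "oedipus"]):
        return False
    return world["GoodChild", "bijan"]


def entailed(cls, individual):
    """True if cls(individual) holds in every model of the toy KB."""
    atoms = [(c, i) for c in CLASSES for i in INDIVIDUALS]
    models = []
    for values in product([False, True], repeat=len(atoms)):
        world = dict(zip(atoms, values))
        if is_model(world):
            models.append(world)
    return all(m[cls, individual] for m in models)


def naive_forward_chaining():
    """Fire the rules on atomic facts only; the disjunction is not one."""
    facts = {("GoodChild", "bijan")}
    derived = {("Child", i) for (c, i) in facts
               if c in ("GoodChild", "BadChild")}
    return facts | derived
```

Child(oedipus) holds in every model, so it is entailed, while neither GoodChild(oedipus) nor BadChild(oedipus) is. The forward-chaining function never sees an atomic fact about oedipus, so it derives Child only for bijan.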

Now, the rule engine person might chime in: “Ok, but you’re missing something…the rules that make sense out of the OWL part!” At this point, I heave a big ole sigh and say, “Not going to happen.” At least, not naively. You could go the Boris/KAON2 route, in which case you translate the ontology part into potentially quite a few rules. Unless you are in one of the specific tractable fragments like HornSHIQ or DLP, which have a fairly close correspondence between OWL Axioms and Rules, you aren’t going to get anything readable out of it. Implementing something like KAON2 is a comparable effort to implementing something like Pellet. You aren’t going to pop a simple axiomatization of the OWL vocabulary into a rules engine and get anything useful.

There is no magic pixie make-your-life-easy-by-rules dust. Sorry.

(This doesn’t mean standard rule engines aren’t useful, by any stretch of the imagination. A lot of the “best” ontologies fit into Horn-fragments of OWL or mostly into Horn-fragments of OWL which means that rule tech can be fairly straightforwardly applied. I’m not engaged in advocacy at the moment, but trying to raise the level of clarity. Decisions made for bad reasons are often bad!

Similarly, there are things you might be able to do to massage this particular example. But working from ad hoc examples is a pretty bad way to proceed (at least for completeness and correctness; for optimizations it’s a little more justifiable).)

Another perhaps tricky bit is contraposition. You can flip conditionals around if you have negation available. For example, consider the sentences:


      If Bijan has finished the SWRL series, then he is working on another series.
      If he is NOT working on another series, then Bijan has NOT finished the SWRL series.

In many rule systems, contraposition doesn’t really hold because you have no (classical) negation. But SWRL rules (including DL Safe ones) are standard first order conditionals.
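For classical conditionals, contraposition can be checked mechanically with a truth table. The Python sketch below does only that; the function names are hypothetical, and it says nothing about how any particular rule engine treats negation.

```python
from itertools import product


def implies(p, q):
    """Material (classical) implication."""
    return (not p) or q


def contraposition_holds():
    """Check that 'if p then q' matches 'if not q then not p' everywhere."""
    return all(implies(p, q) == implies(not q, not p)
               for p, q in product([False, True], repeat=2))
```

Here p stands for "Bijan has finished the SWRL series" and q for "he is working on another series". The check passes for all four combinations, which is why the two sentences above say the same thing under a first order reading.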

A simple way to play with this is to make use of the correspondence between SWRL and OWL axioms. Basically, you can convert any OWL axiom to a SWRL rule, though DL Safety will restrict the meaning a bit.
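As a starting point for playing with that correspondence, here is a small Python sketch that turns three simple kinds of axiom into rules written in the informal notation used earlier in this post. The tuple representation of axioms and the function name are assumptions made for illustration, only named classes and properties are handled, and anything else is rejected rather than guessed at.

```python
def axiom_to_rule(axiom):
    """Render a simple axiom as a rule in the post's informal syntax.

    `axiom` is a hypothetical tuple such as ("SubClassOf", "Sub", "Super").
    """
    kind = axiom[0]
    if kind == "SubClassOf":
        _, sub, sup = axiom
        return f":{sup}(x) :- :{sub}(x)."
    if kind == "ObjectPropertyDomain":
        _, prop, cls = axiom
        return f":{cls}(x) :- :{prop}(x, y)."
    if kind == "ObjectPropertyRange":
        _, prop, cls = axiom
        return f":{cls}(y) :- :{prop}(x, y)."
    raise ValueError(f"axiom kind not covered by this sketch: {kind}")


rules = [
    axiom_to_rule(("SubClassOf", "GoodChild", "Child")),
    axiom_to_rule(("ObjectPropertyDomain", "hasParent", "Person")),
    axiom_to_rule(("ObjectPropertyRange", "hasParent", "Person")),
]
```

The first rule reproduces the GoodChild rule from the Oedipus example. Keep in mind that once such a rule is read with the DL Safety restriction it only applies to named individuals, so it is weaker than the axiom it came from.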

Ok, this part of the series has been sitting in my queue for over a week. I just don’t have the time (classes starting) and energy (arthritis flare) to work up these examples. It’s pretty easy to take any arbitrary OWL axiom and SWRLize it and it’d be really great if someone would do this for some existing ontology so that we’d have some examples. Even better would be a little script that did this :) (XSLT on OWL 1.1 XML format should work easily enough.) I imagine this blog series will inform the Safe Rules OWLED task force, so further examples or tools for playing with DL Safe SWRL rules are welcome. I think the next post will be about how distinguished and undistinguished variables in query languages like SPARQL relate to the DL Safety restriction. Then a bit about syntax. Then maybe a bit about extensions and built-ins. That probably completes the series :) Oh maybe one about working with SWRL using a first order logic reasoner… And maybe one about implementation… and fragments… No. :)

New OWL Working Group!

Thursday, September 6th, 2007 · Bijan Parsia

Hurrah hurrah! The moment has finally arrived. The W3C has announced today that they are starting up a new OWL working group.

I’m really excited by this, since, after all, I’ve been working toward this for, geez, two years or so? Two and a half? Well, since before the first OWLED. I think it’ll be a big boost for the community and that users will benefit quite a bit.

The working group won’t be the only place where OWL evolution will be happening, of course. OWLED, for example, marches on and I’m hoping for great things from the task forces. There’s lots to do and lots of fun to be had while doing it.

Interestingly, tonight, we Mancunians from IMG who are not moving to Oxford threw a bash for those who are. Ian Horrocks (who is the reason I have a job at the University of Manchester) has been successfully wooed by Oxford. Ian will be (co)-chairing the OWL working group.

Interesting times!