Archive for the 'Logic' Category

Using Pronto: Breast Cancer Risk Models

Tuesday, October 2nd, 2007 · Pavel Klinov

In my previous post, I introduced Pronto, a probabilistic DL reasoning extension for Pellet. I gestured at some of the algorithmic and technical details of Pronto’s capabilities—for the technically curious, a careful read of “Probabilistic Description Logics for the Semantic Web” paper is the best place to start. Now let’s move to a more realistic example than those poor birds and penguins and Richard Nixon.

Consider the domain of cancer, more precisely, women’s breast cancer, yet more specifically, the issue of breast cancer risk assessment . Very roughly, the central problem is to combine all the risk factors that apply to a particular woman and come up with a credible number reflecting her chance of developing breast cancer, either in her lifetime or in the short term (normally in the next 10 years). There are a few models that do that, basically, just by computing an empirically inferred function of input parameters (risk factors).

Pronto offers a different way of approaching the problem. It supports a wide use of all the background knowledge captured in a classical ontology—for example, of the sort maintained in the NCI Thesaurus—but also allows us to augment the classical KB with probabilistic statements, such that the risk of developing cancer can be computed as an ontological inference. That makes modeling a lot more explicit and illustrative—especially with support of Pellet’s explanations—than using a “black box” function.

So, consider a classical part of an ontology for modeling the breast cancer domain—by the way, we’re not claiming it’s correct or useful from medical point of view; you may also consider Matt Williams’ version of clinical ontology if you’re concerned with correctness of terms and stuff like that.

The ontology defines risk factors that are relevant to breast cancer, i.e., subclasses of RiskFactor. Then it also defines different categories of women, first, those that have certain risk factors (subclasses of WomanWithRiskFactors); and, second, those distinct in terms of the risk of developing cancer (subclasses of WomanUnderBRCRisk). The basic task is to compute the probability that a certain woman is an instance of some WomanUnderBRCRisk subclass given that she is an instance of some WomanWithRiskFactors subclass. In addition, it will be useful to infer the generic probabilistic subsumption between classes under WomanUnderBRCRisk and under WomanWithRiskFactors.

The first thing to do in order to enable such probabilistic reasoning is to express the uncertain background knowledge about the domain. This is done by listing the conditional constraints in the form of OWL 1.1 axiom annotations. The constraints can either be in a separate file that imports the classical OWL ontology or be embedded into the classical part.

For this example, the constraints express how individual risk factors influence the risk of developing cancer (numbers taken from “Risk Factors and Prevention”). The job of Pronto is to combine factors that apply to a particular woman and compute the probability that she is an instance of some WomanUnderBRCRisk subclass.

Let’s now quickly go through the individuals to illustrate the reasoning:


  • Julie is a woman in her thirties. The only risk factor that applies to her is AgeUnder50, so the Pronto concludes that Julie:(WomanWithBRCInShortTime|owl:Thing)[0.0;0.027] (her chance of developing cancer in next 10 years is no higher than 2.7%)

  • Mary is known to have BRCA1 gene mutation which is known to be a hugely important risk factor. Using the generic constraint (WomanUnderGreatBRCRisk|WomanWithBRCAMutation)[1;1], Pronto puts her in the category of women with the highest relative risk of cancer (this example, also shows, that conditional constraints, with some caveats, can model certain subsumption relationships)

  • For Ann we know two risk factors – her mother had BRC and she is an Ashkenazi Jew, so she has an increased chance of having inherited gene mutation. Using the combination of risk factors without overriding, Pronto concludes that she has a 31.25% chance of being in the category of 3x increased risk and over 2.5% of being in the highest risk category.

  • Helen is the most interesting case. For her we again know 2 risk factors – her age is over 50 and her mother had cancer. Using overriding we can specify how these two factors strengthen or weaken each other to produce the actual risk. This can be done by defining a generic constraint (WomanUnderGreatBRCRisk|SeniorWomanWithMotherBRCAffected)[0.9;1] that overrides constraints for both factors individually. Thus, Pronto entails that she in the highest risk category with more than 90% probability.

Finally, I want to mention explanations, which are an important part of Pellet’s reasoning services. I begin by pointing out that DL reasoning can be difficult to understand and probabilistic reasoning can also be difficult to understand. Not surprisingly, the hybrid reasoning that Pronto is capable of can be very difficult to understand. This is both a limitation and an opportunity to extend Pellet’s explanation services to Pronto.

In other words, we want to extract the minimal set of conditional constraints that are sufficient to produce the given probabilistic entailment. For the breast cancer example above that would imply filtering out all the irrelevant risk factors and leaving only those which were taken into account during reasoning. We’ve got some initial work done in extending explanations in Pronto, but there’s much more work to do, including extending debug and repair features of Pellet to Pronto.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

Introducing Pronto: Probabilistic DL Reasoning in Pellet

Thursday, September 27th, 2007 · Pavel Klinov

This is the first in a series of posts on extending Pellet with probabilistic reasoning capabilities. We call this tool “Pronto”. It offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”.

The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally, for example, risk factors associated with breast cancer.

First, I should say that if you are interested in a rigorous description of the approach, read the paper by Thomas Lukasiewicz “Probabilistic Description Logics for the Semantic Web”. Pronto is to a large extent an implementation of the Lukasiewicz approach—the rest is optimization and the support of explanations.

In a nutshell, the features of Pronto (in addition to the features of Pellet) are the following:


  1. Expressing generic probabilistic knowledge. “Generic” means that the knowledge doesn’t apply to any specific individual but rather to a fresh, randomly chosen one. Generic probabilistic knowledge is represented in the form of generic conditional constraint (GCC). A GCC is an expression of the form (D|C)[l,u], where C and D are DL concepts and [l,u] is a closed subinterval of [0,1]. Without getting deeply into the semantics, the meaning of a GCC is roughly for a randomly chosen instance of C, the probability of being an instance of D is within [l,u]. The above statement about birds would be written as (FlyingObject|Bird)[0.9;1.0].

  2. Expressing concrete probabilistic knowledge. Here the knowledge applies to a specific individual. Concrete probabilistic knowledge is represented in the form of a:X, where “a” is an individual and “X” is a GCC restricted to the form (D|owl:Thing)[l,u]. We can express “Tweety is-a Flying Object with probability less than 5%” as Tweety:(FlyingObject|owl:Thing)[0.0;0.05].

  3. Probabilistic reasoning, that is, generic and concrete entailments. A generic entailment is, given a probabilistic KB and a pair of concepts, compute the tightest interval (D|C)[l,u]. A concrete entailment is, given a probabilistic KB, an individual “a”, and a concept “D”, compute the tightest interval (D|owl:Thing)[l,u] for “a”. So we can ask Pronto to infer the probability of a statement like Tweety being a flying object based on other statements rather than asserting the conditional constraint.

  4. Probabilistic explanations. Pronto is capable of computing all minimally sufficient (w.r.t. inclusion) subsets of conditional constraints for a particular entailment, both generic and concrete.

Perhaps the single most important point about Pronto reasoning is that all inferences are done in a totally “logical” way, i.e. using a well-defined entailment relation and without any explicit or implicit translation of KB (or some parts of KB) to Bayesian graphs. This is the major difference between Pronto and other approaches, e.g. P-CLASSIC or “Probabilistic Extension to OWL”.

Finally, I should mention overriding as a feature of Pronto that we particularly like. Pronto allows certain conflicts between different pieces of probabilistic knowledge, more precisely, between different conditional constraints. The famous example is that of flying birds and non-flying penguins. (It’s similar to the famous Nixon Diamond problem.) The problem here is related to non-monotonicity: A bird is a flying object with high probability and all penguins are birds but a penguin has a low probability of flying.

The way Pronto resolves these conflicts is by allowing more specific constraints to override more generic ones. So if Pronto knows that Tweety is a Penguin and Penguin is a subclass-of Bird, it will override the constraint (FlyingObject|Bird)[0.9;1.0] by (FlyingObject|Penguin)[0.0;0.05] and correctly entail Tweety:(FlyingObject|owl:Thing)[0.0;0.05]. This is the idea borrowed from reference class reasoning and supported by Lehmann’s lexicographic entailment employed in Pronto (see the Lukasiewicz paper for technical details). The decision whether a constraint is more specific/generic than some other one is made through the classical DL reasoning.

In the next post of this series, I’ll take you through an actual use of Pronto in the life sciences domain.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

Scaling OWL (two new ways)

Monday, June 25th, 2007 · Bijan Parsia

For those with OWL-performance-fear, there is some good news. There usually is, but these tidbits are striking:

  • On the TBox side of things: Boris Motik and Rob Shearer (with Ian Horrocks) have developed a new reasoning calculus that is very effective with the notorious Galen ontology, and, indeed, with all the OBO ontologies. They tackle both non-determinism and tableau size with stunning results. It should also have positive implications for DL Safe rules. They have a prototype reasoner using the technique, HermiT, available for download.
  • On the ABox side of things: IBM Research (Watson Research Center, NY) have recently posted information about their summarization technique for scaling ABoxes. They have a reasoner, SHER, which will be available in one form or another at some point (note! it has Pellet Inside!). Their case study is quite inspiring.

(The hypertableaux calculus has positive implications for ABox scaling as well.)

Since I’m an hour train ride from Liverpool, I’ll just conclude that it’s getting better all the time. Read the papers. Enjoy.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

Working 9 (Hours) to 5 (Minutes): Tuning the Pellet Classifier

Tuesday, April 10th, 2007 · Mike Smith

The NCI Thesaurus is an ontology of cancer, diseases, and related terminology within and outside biomedicine. The latest version is really large—about 58,000 classes in the latest release. From our perspective, as maintainers of Pellet, large ontologies present opportunities. The folks at NCI agreed and they’ve funded us to improve Pellet’s classification service to the point that it can be used with the Thesaurus.

In a short month of work, we’ve progressed from infinite time to 9 hours to 5 minutes and, though we’re shifting focus at the moment, we’re confident there are more improvements to be realized. This has been an excellent example of why working on Pellet is rewarding, why software engineering matters, and how funding Pellet’s development can make a difference.

If you were paying attention to the NCI Thesaurus when some of us worked on it at the Mindswap lab, that (older) version now takes about 15 seconds to classify. Yeah, that’s right: 15 seconds. When we started this work it was 50 minutes.

Expect to see these classification improvements in the 1.5 release, and if you’ve got other big problems for which re-engineering Pellet may help, let us know.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

Witchly interlude: SPASSing out

Wednesday, January 3rd, 2007 · Bijan Parsia

I see Dan has converted the Witch argument to N3 and used its proof generation tools to output a proof (btw, Dan, the OWL version was a solo effort by me and I guess you’ll have to come to our actual weblog to see who I am :)). (BTW, even though I linked to it, I should mention explicitly that the original is due to John Ioannidis.)

I’m jotting up the tableau version of the proof (but it’s a bit tricky, pedagogically since if I use a linear proof format, then it doesn’t really correspond to what, e.g., Pellet does, but if I use a completion graph approach I gotta go graphical). However, Dan’s post reminded me that SPASS (and MSPASS) will generate (roughly, resolution based) proofs and have web interfaces. (These proofs are meant, I think, to be checkable.)

So, following the SPASS tutorial, I got this:

begin_problem(Burn_the_Witch_Argument).

list_of_descriptions. name({*Burn the Witch Argument <strong>}). author({</strong> Bijan Parsia <strong>}). status(unsatisfiable). description({</strong> Formalization of the Burn the Witch Argument*}).
end_of_list.

list_of_symbols. functions[(girl,0), (duck,0)]. predicates[(Burns,1),(Woman,1),(Witch,1),(IsMadeOfWood,1),(Floats,1),(SameWeight,2)].
end_of_list.

list_of_formulae(axioms). formula(forall([X], implies(and(Burns(X), Woman(X)), Witch(X))), 1). formula(Woman(girl),2). formula(forall([X],implies(IsMadeOfWood(X), Burns(X))),3). formula(forall([X],implies(Floats(X), IsMadeOfWood(X))),4). formula(Floats(duck),5). formula(forall([X,Y],implies(and(Floats(X), SameWeight(X, Y)), Floats(Y))),6). formula(SameWeight(duck, girl), 7).
end_of_list.

list_of_formulae(conjectures). formula(Witch(girl),8).
end_of_list.
end_problem.

If you paste this into the SPASS interactive web form, and add -DocProof in the options field, you’ll get a bunch of stuff including the proof:

Here is a proof with depth 3, length 15 : 1[0:Inp] || -> Floats(duck)*. 2[0:Inp] || -> Woman(girl)*. 3[0:Inp] || -> SameWeight(duck,girl)*. 4[0:Inp] || Witch(girl)*+ -> . 5[0:Inp] Floats(U) || -> IsMadeOfWood(U)*. 6[0:Inp] IsMadeOfWood(U) || -> Burns(U)*. 7[0:Inp] Burns(U) Woman(U) || -> Witch(U)*. 8[0:Inp] Floats(U) || SameWeight(U,V)*+ -> Floats(V). 9[0:Res:7.2,4.0] Woman(girl) Burns(girl) || -> . 10[0:MRR:9.0,2.0] Burns(girl) || -> . 17[0:SoR:10.0,6.1] IsMadeOfWood(girl) || -> . 18[0:SoR:17.0,5.1] Floats(girl) || -> . 20[0:Res:3.0,8.1] Floats(duck) || -> Floats(girl)*. 21[0:SSi:20.0,1.0] || -> Floats(girl)*. 22[0:MRR:21.0,18.0] || -> . Formulae used in the proof : 5 2 7 8 4 3 1 6

Neat!

SPASS is a full first order logic theorem prover. Our translation into OWL DL (ALC really) shows that we don’t need full FOL for this. MSPASS tweaks SPASS so that it effects a decision procedure for (various) DLs. I did a MSPASS DL variant of the argument as well (wish I had an tool to convert to that format from OWL; oh, if you want to save it to disk, you should strip off the ”.txt” and put a dot before the “dfg”.). You can test it with the MSPASS web interface (which will show you the various translations back to FOL).

Philosophically, I’m remain skeptical about the general value of proof generation (at least for exchange, e.g., for PCA like tasks), especially for scalable, decidable logics. Basically, the proofs can be expontential in the size of the input, and the input can be large. For decidable logics with optimized reasoners, for many things, an asserted justification (i.e., a minimal set of axioms from the KB sufficient to entail the conclusion) might be more than sufficient. There’s a complex story here, of course, but if your proof checker isn’t reliable (Dan said he was still finding bugs :)), then even using it to verify is a bit tricky. Proof checkers aren’t always easier.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati