Archive for the 'Description Logic' Category

Using Pronto: Breast Cancer Risk Models

Tuesday, October 2nd, 2007 · Pavel Klinov

In my previous post, I introduced Pronto, a probabilistic DL reasoning extension for Pellet. I gestured at some of the algorithmic and technical details of Pronto’s capabilities; for the technically curious, a careful read of the paper “Probabilistic Description Logics for the Semantic Web” is the best place to start. Now let’s move to a more realistic example than those poor birds and penguins and Richard Nixon.

Consider the domain of cancer, more precisely, women’s breast cancer, and yet more specifically, the issue of breast cancer risk assessment. Very roughly, the central problem is to combine all the risk factors that apply to a particular woman and come up with a credible number reflecting her chance of developing breast cancer, either in her lifetime or in the short term (normally in the next 10 years). A few existing models do this, essentially by computing an empirically inferred function of the input parameters (risk factors).

Pronto offers a different way of approaching the problem. It supports a wide use of all the background knowledge captured in a classical ontology—for example, of the sort maintained in the NCI Thesaurus—but also allows us to augment the classical KB with probabilistic statements, such that the risk of developing cancer can be computed as an ontological inference. That makes modeling a lot more explicit and illustrative—especially with support of Pellet’s explanations—than using a “black box” function.

So, consider a classical part of an ontology for modeling the breast cancer domain. By the way, we’re not claiming it’s correct or useful from a medical point of view; you may also consider Matt Williams’ version of a clinical ontology if you’re concerned with correctness of terms and stuff like that.

The ontology defines risk factors that are relevant to breast cancer, i.e., subclasses of RiskFactor. Then it also defines different categories of women, first, those that have certain risk factors (subclasses of WomanWithRiskFactors); and, second, those distinct in terms of the risk of developing cancer (subclasses of WomanUnderBRCRisk). The basic task is to compute the probability that a certain woman is an instance of some WomanUnderBRCRisk subclass given that she is an instance of some WomanWithRiskFactors subclass. In addition, it will be useful to infer the generic probabilistic subsumption between classes under WomanUnderBRCRisk and under WomanWithRiskFactors.

The first thing to do in order to enable such probabilistic reasoning is to express the uncertain background knowledge about the domain. This is done by listing the conditional constraints in the form of OWL 1.1 axiom annotations. The constraints can either be in a separate file that imports the classical OWL ontology or be embedded into the classical part.

For this example, the constraints express how individual risk factors influence the risk of developing cancer (numbers taken from “Risk Factors and Prevention”). The job of Pronto is to combine factors that apply to a particular woman and compute the probability that she is an instance of some WomanUnderBRCRisk subclass.

Let’s now quickly go through the individuals to illustrate the reasoning:


  • Julie is a woman in her thirties. The only risk factor that applies to her is AgeUnder50, so Pronto concludes that Julie:(WomanWithBRCInShortTime|owl:Thing)[0.0;0.027] (her chance of developing cancer in the next 10 years is no higher than 2.7%).

  • Mary is known to have a BRCA1 gene mutation, which is a hugely important risk factor. Using the generic constraint (WomanUnderGreatBRCRisk|WomanWithBRCAMutation)[1;1], Pronto puts her in the category of women with the highest relative risk of cancer. (This example also shows that conditional constraints, with some caveats, can model certain subsumption relationships.)

  • For Ann we know two risk factors: her mother had BRC and she is an Ashkenazi Jew, so she has an increased chance of having inherited a gene mutation. Using the combination of risk factors without overriding, Pronto concludes that she has a 31.25% chance of being in the category of 3x increased risk and a chance of over 2.5% of being in the highest risk category.

  • Helen is the most interesting case. For her we again know two risk factors: her age is over 50 and her mother had cancer. Using overriding we can specify how these two factors strengthen or weaken each other to produce the actual risk. This can be done by defining a generic constraint (WomanUnderGreatBRCRisk|SeniorWomanWithMotherBRCAffected)[0.9;1] that overrides the constraints for both factors individually. Thus, Pronto entails that she is in the highest risk category with at least 90% probability.

Finally, I want to mention explanations, which are an important part of Pellet’s reasoning services. I begin by pointing out that DL reasoning can be difficult to understand and probabilistic reasoning can also be difficult to understand. Not surprisingly, the hybrid reasoning that Pronto is capable of can be very difficult to understand. This is both a limitation and an opportunity to extend Pellet’s explanation services to Pronto.

In other words, we want to extract the minimal set of conditional constraints that are sufficient to produce the given probabilistic entailment. For the breast cancer example above that would imply filtering out all the irrelevant risk factors and leaving only those which were taken into account during reasoning. We’ve got some initial work done in extending explanations in Pronto, but there’s much more work to do, including extending debug and repair features of Pellet to Pronto.
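To make "minimal set of conditional constraints" concrete, here is a rough brute-force sketch in Python. It is not Pronto's algorithm: the function name `minimal_sufficient_subsets`, the `entails` callback, and the labels `c1` to `c4` are all hypothetical stand-ins, and the toy oracle simply pretends that a real reasoner has told us which subsets suffice.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List, Sequence


def minimal_sufficient_subsets(
    constraints: Sequence[str],
    entails: Callable[[FrozenSet[str]], bool],
) -> List[FrozenSet[str]]:
    """Return every subset that passes `entails` and has no proper
    subset that also passes (minimality w.r.t. set inclusion).

    `entails` is a hypothetical oracle standing in for a reasoner call.
    """
    found: List[FrozenSet[str]] = []
    # Enumerate candidates from smallest to largest, so any sufficient
    # proper subset of a candidate has already been handled.
    for size in range(len(constraints) + 1):
        for combo in combinations(constraints, size):
            candidate = frozenset(combo)
            # Skip supersets of an already found minimal subset.
            if any(minimal <= candidate for minimal in found):
                continue
            if entails(candidate):
                found.append(candidate)
    return found


def toy_entails(subset: FrozenSet[str]) -> bool:
    # Toy oracle (an assumption, not a Pronto result): the entailment
    # holds if c3 is present, or if c1 and c2 are both present.
    return "c3" in subset or {"c1", "c2"} <= subset


explanations = minimal_sufficient_subsets(["c1", "c2", "c3", "c4"], toy_entails)
for explanation in explanations:
    print(sorted(explanation))
```

The irrelevant constraint `c4` never shows up in an explanation, which is the filtering effect described above. The enumeration is exponential in the number of constraints, so this only illustrates the definition and says nothing about how to compute explanations efficiently.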


Introducing Pronto: Probabilistic DL Reasoning in Pellet

Thursday, September 27th, 2007 · Pavel Klinov

This is the first in a series of posts on extending Pellet with probabilistic reasoning capabilities. We call this tool “Pronto”. It offers core OWL reasoning services for knowledge bases containing uncertain knowledge; that is, it processes statements like “Bird is a subclass-of Flying Object with probability greater than 90%” or “Tweety is-a Flying Object with probability less than 5%”.

The use cases for Pronto include ontology and data alignment, as well as reasoning about uncertain domain knowledge generally, for example, risk factors associated with breast cancer.

First, I should say that if you are interested in a rigorous description of the approach, read the paper by Thomas Lukasiewicz, “Probabilistic Description Logics for the Semantic Web”. Pronto is to a large extent an implementation of the Lukasiewicz approach; the rest is optimization and the support of explanations.

In a nutshell, the features of Pronto (in addition to the features of Pellet) are the following:


  1. Expressing generic probabilistic knowledge. “Generic” means that the knowledge doesn’t apply to any specific individual but rather to a fresh, randomly chosen one. Generic probabilistic knowledge is represented in the form of a generic conditional constraint (GCC). A GCC is an expression of the form (D|C)[l,u], where C and D are DL concepts and [l,u] is a closed subinterval of [0,1]. Without getting deeply into the semantics, the meaning of a GCC is roughly: for a randomly chosen instance of C, the probability of being an instance of D is within [l,u]. The above statement about birds would be written as (FlyingObject|Bird)[0.9;1.0].

  2. Expressing concrete probabilistic knowledge. Here the knowledge applies to a specific individual. Concrete probabilistic knowledge is represented in the form of a:X, where “a” is an individual and “X” is a GCC restricted to the form (D|owl:Thing)[l,u]. We can express “Tweety is-a Flying Object with probability less than 5%” as Tweety:(FlyingObject|owl:Thing)[0.0;0.05].

  3. Probabilistic reasoning, that is, generic and concrete entailments. A generic entailment task is: given a probabilistic KB and a pair of concepts C and D, compute the tightest interval [l,u] such that (D|C)[l,u] is entailed. A concrete entailment task is: given a probabilistic KB, an individual “a”, and a concept “D”, compute the tightest interval [l,u] such that a:(D|owl:Thing)[l,u] is entailed. So we can ask Pronto to infer the probability of a statement like Tweety being a flying object based on other statements rather than asserting the conditional constraint.

  4. Probabilistic explanations. Pronto is capable of computing all minimally sufficient (w.r.t. inclusion) subsets of conditional constraints for a particular entailment, both generic and concrete.
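The constraint notation in the list above can be modelled with a small data structure. The following Python sketch is only an illustration of the notation as written in this post: the class `ConditionalConstraint` and the function `parse_constraint` are hypothetical names and are not part of Pronto or Pellet.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConditionalConstraint:
    """A constraint (D|C)[l,u], optionally attached to an individual."""
    conclusion: str                   # D
    evidence: str                     # C
    lower: float                      # l
    upper: float                      # u
    individual: Optional[str] = None  # None means a generic constraint

    def __post_init__(self) -> None:
        # [l,u] must be a closed subinterval of [0,1].
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError("[l,u] must be a subinterval of [0,1]")
        # Concrete constraints are restricted to the form (D|owl:Thing)[l,u].
        if self.individual is not None and self.evidence != "owl:Thing":
            raise ValueError("concrete constraints must use owl:Thing as evidence")

    @property
    def is_generic(self) -> bool:
        return self.individual is None


# Accepts both "(D|C)[l;u]" and "a:(D|C)[l;u]"; "," is also allowed
# as the separator because the post writes the interval both ways.
_PATTERN = re.compile(
    r"^(?:(?P<ind>[^:()\s]+):)?"
    r"\((?P<d>[^|()]+)\|(?P<c>[^|()]+)\)"
    r"\[(?P<l>[0-9.]+)[;,](?P<u>[0-9.]+)\]$"
)


def parse_constraint(text: str) -> ConditionalConstraint:
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a conditional constraint: {text!r}")
    return ConditionalConstraint(
        conclusion=match["d"].strip(),
        evidence=match["c"].strip(),
        lower=float(match["l"]),
        upper=float(match["u"]),
        individual=match["ind"],
    )


generic = parse_constraint("(FlyingObject|Bird)[0.9;1.0]")
concrete = parse_constraint("Tweety:(FlyingObject|owl:Thing)[0.0;0.05]")
print(generic.is_generic, concrete.individual)
```

The validation in `__post_init__` encodes only the two restrictions stated above: the interval lies within [0,1], and concrete knowledge is conditioned on owl:Thing.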

Perhaps the single most important point about Pronto reasoning is that all inferences are done in a totally “logical” way, i.e. using a well-defined entailment relation and without any explicit or implicit translation of KB (or some parts of KB) to Bayesian graphs. This is the major difference between Pronto and other approaches, e.g. P-CLASSIC or “Probabilistic Extension to OWL”.

Finally, I should mention overriding as a feature of Pronto that we particularly like. Pronto allows certain conflicts between different pieces of probabilistic knowledge, more precisely, between different conditional constraints. The famous example is that of flying birds and non-flying penguins. (It’s similar to the famous Nixon Diamond problem.) The problem here is related to non-monotonicity: A bird is a flying object with high probability and all penguins are birds but a penguin has a low probability of flying.

The way Pronto resolves these conflicts is by allowing more specific constraints to override more generic ones. So if Pronto knows that Tweety is a Penguin and Penguin is a subclass-of Bird, it will override the constraint (FlyingObject|Bird)[0.9;1.0] by (FlyingObject|Penguin)[0.0;0.05] and correctly entail Tweety:(FlyingObject|owl:Thing)[0.0;0.05]. This idea is borrowed from reference class reasoning and supported by Lehmann’s lexicographic entailment employed in Pronto (see the Lukasiewicz paper for technical details). The decision whether a constraint is more specific or more generic than another one is made through classical DL reasoning.
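The specificity idea can be sketched in a few lines of Python. This is a deliberately simplified stand-in, not lexicographic entailment and not Pronto's code: the hierarchy is a hand-written dictionary rather than the output of a DL reasoner, and the names `SUPERCLASSES`, `ancestors`, and `most_specific` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# (conclusion D, evidence C, lower bound, upper bound)
Constraint = Tuple[str, str, float, float]

# Toy class hierarchy: class -> direct superclasses. In Pronto the
# subsumption checks would come from classical DL reasoning instead.
SUPERCLASSES: Dict[str, Set[str]] = {"Penguin": {"Bird"}, "Bird": set()}


def ancestors(cls: str, hierarchy: Dict[str, Set[str]]) -> Set[str]:
    """All strict superclasses of `cls` in the toy hierarchy."""
    seen: Set[str] = set()
    stack = list(hierarchy.get(cls, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(hierarchy.get(current, ()))
    return seen


def most_specific(
    constraints: List[Constraint],
    cls: str,
    conclusion: str,
    hierarchy: Dict[str, Set[str]],
) -> List[Constraint]:
    """Keep constraints about `conclusion` that apply to `cls` and are
    not overridden by a constraint on a more specific evidence class."""
    applicable_classes = ancestors(cls, hierarchy) | {cls}
    applicable = [
        c for c in constraints
        if c[0] == conclusion and c[1] in applicable_classes
    ]
    # A constraint is overridden if its evidence class is a strict
    # superclass of another applicable constraint's evidence class.
    return [
        c for c in applicable
        if not any(c[1] in ancestors(other[1], hierarchy) for other in applicable)
    ]


kb: List[Constraint] = [
    ("FlyingObject", "Bird", 0.9, 1.0),
    ("FlyingObject", "Penguin", 0.0, 0.05),
]
# Tweety is a Penguin, so the Penguin constraint overrides the Bird one.
print(most_specific(kb, "Penguin", "FlyingObject", SUPERCLASSES))
```

For an individual known only to be a Bird, the same call with `"Bird"` returns the generic (FlyingObject|Bird)[0.9;1.0] constraint, since nothing more specific applies.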

In the next post of this series, I’ll take you through an actual use of Pronto in the life sciences domain.


Scaling OWL (two new ways)

Monday, June 25th, 2007 · Bijan Parsia

For those with OWL-performance-fear, there is some good news. There usually is, but these tidbits are striking:

  • On the TBox side of things: Boris Motik and Rob Shearer (with Ian Horrocks) have developed a new reasoning calculus that is very effective with the notorious Galen ontology, and, indeed, with all the OBO ontologies. They tackle both non-determinism and tableau size with stunning results. It should also have positive implications for DL Safe rules. They have a prototype reasoner using the technique, HermiT, available for download.
  • On the ABox side of things: IBM Research (Watson Research Center, NY) have recently posted information about their summarization technique for scaling ABoxes. They have a reasoner, SHER, which will be available in one form or another at some point (note! it has Pellet Inside!). Their case study is quite inspiring.

(The hypertableaux calculus has positive implications for ABox scaling as well.)

Since I’m an hour train ride from Liverpool, I’ll just conclude that it’s getting better all the time. Read the papers. Enjoy.


Working 9 (Hours) to 5 (Minutes): Tuning the Pellet Classifier

Tuesday, April 10th, 2007 · Mike Smith

The NCI Thesaurus is an ontology of cancer, diseases, and related terminology within and outside biomedicine. The latest release is really large: about 58,000 classes. From our perspective, as maintainers of Pellet, large ontologies present opportunities. The folks at NCI agreed and they’ve funded us to improve Pellet’s classification service to the point that it can be used with the Thesaurus.

In a short month of work, we’ve progressed from infinite time to 9 hours to 5 minutes and, though we’re shifting focus at the moment, we’re confident there are more improvements to be realized. This has been an excellent example of why working on Pellet is rewarding, why software engineering matters, and how funding Pellet’s development can make a difference.

If you were paying attention to the NCI Thesaurus when some of us worked on it at the Mindswap lab, that (older) version now takes about 15 seconds to classify. Yeah, that’s right: 15 seconds. When we started this work it was 50 minutes.

Expect to see these classification improvements in the 1.5 release, and if you’ve got other big problems for which re-engineering Pellet may help, let us know.


A Pellet User Survey

Thursday, December 21st, 2006 · Kendall Clark

One of the areas we’re interested in as a business is OWL and, more specifically, Description Logics. There has been a lot of talk lately about the low-expressivity end of the Semantic Web spectrum, but we think OWL DL is still the place to be to do interesting work, solve real (that is, hard) problems, and build a market. We’re primarily concerned with solving our customers’ problems, rather than building any grand public systems. We’ll leave that to others.

So what we really need, if we’re going to build the kinds of tools people want, is an ongoing conversation about OWL, DL, Pellet, etc. Having that conversation is one reason we give talks, write papers, and sponsor conferences.

But another way is, well, just asking people (or “the community”) directly, or as directly as is possible, what they think, need, want, and what they are or might be willing to pay for. To that end, here’s a pointer to the 2007 Pellet User Survey (hosted by the very nice Wufoo web forms service—highly recommended).

Since we’re going to be offering commercial support of Pellet starting in 2007, we’re trying to gather information about what makes sense to concentrate on. So if you’ve ever used Pellet, use it regularly, or are just curious, please let us know what you think:

https://clarkparsia.wufoo.com/forms/2007-pellet-user-survey/