Why Reasoning Matters: Consistency Checking (1)
by Kendall Clark
Reasoning only matters (to us, to the market) if it’s useful. Significance—technical and economic—is a function of utility and perceived utility. This is the first in a series of posts that will, I hope, increase the perceived utility of formal, automated, logic-based reasoning.
We think reasoning is useful in a great number of ways, for a great number of use cases. But its proponents (including us, Clark & Parsia LLC) haven’t always done a great job of communicating that utility to others, in part because, like any non-trivial field, reasoning is complicated. Lots of squiggles and symbols and off-putting bits. Like machine learning or computer vision or…Perl.
Let’s talk, first, about data integration. Non-trivial cases require something more than the standard ploy:
- make an RDF Schema
- coin or re-use a URI scheme
- convert n sources into RDF
- dump that RDF into a database
- query the database (e.g., build a new front-end)
We’ve done exactly that for POPS, the NASA expertise location service we built, and for BIANCA, the NASA data center analysis tool. It worked, but neither project had any really hard modeling, mapping, or integration bits.
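The standard ploy can be sketched in a few lines of Python. This is a toy under stated assumptions: each source arrives as a list of records (dicts with an `id` key), `http://example.org/` stands in for whatever URI scheme you coin, and the schema step is reduced to a shared `schema/` namespace. The `mint`, `convert`, and `query` helpers and the sample records are hypothetical; a real system would use an RDF library and a triple store.

```python
# A hypothetical sketch of the "standard ploy": one URI scheme, n sources
# converted to triples, one merged store, simple triple-pattern queries.
from typing import Iterable, Optional

BASE = "http://example.org/"  # hypothetical URI scheme shared by all sources
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

Triple = tuple[str, str, str]


def mint(kind: str, local_id: str) -> str:
    """Coin (or re-use) a URI for an entity under the shared scheme."""
    return f"{BASE}{kind}/{local_id}"


def convert(records: Iterable[dict], kind: str) -> set[Triple]:
    """Convert one source's records (dicts with an 'id' key) into triples."""
    triples: set[Triple] = set()
    for record in records:
        subject = mint(kind, str(record["id"]))
        triples.add((subject, RDF_TYPE, f"{BASE}schema/{kind}"))
        for key, value in record.items():
            if key != "id":
                triples.add((subject, f"{BASE}schema/{key}", str(value)))
    return triples


def query(store: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> list[Triple]:
    """Return the triples matching a pattern, with None as a wildcard."""
    return sorted(t for t in store
                  if (s is None or t[0] == s)
                  and (p is None or t[1] == p)
                  and (o is None or t[2] == o))


# Two hypothetical sources, merged into one "database" (a set of triples).
people = [{"id": 1, "name": "Ada", "expertise": "optics"}]
projects = [{"id": "p9", "lead": mint("person", "1")}]
store = convert(people, "person") | convert(projects, "project")
print(query(store, p=f"{BASE}schema/lead"))
```

Note what is missing: nothing in this pipeline can reject a bad mapping. If a source puts a person’s URI where a project’s belongs, the store simply accepts the triple.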
For really hard bits, like schema and mapping alignments, partial alignments, dynamic mappings, query routing, and so on, you need more help from the computer. Reasoning gives you that help, particularly when the problems are complex (very large schemas or many schemas to be aligned, partial mappings or alignments, etc.).
In these applications, especially where data volumes are large, you want to follow standard engineering practice for detecting failures and edge cases: fail early and often. Expressing a Global View over n schemas as an OWL ontology means that certain kinds of conceptual mapping and integration errors can’t happen in a live production system. With consistency checking at design time, the computer verifies that every concept in the Global View can actually be instantiated, that is, that each one is logically consistent (in description-logic terms, satisfiable).
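To make the design-time check concrete, here is a deliberately tiny Python sketch. It assumes a Global View that uses only two kinds of axioms: atomic subclass statements and pairwise disjointness between named classes (in OWL terms, `rdfs:subClassOf` and `owl:disjointWith`). In that restricted fragment, a class can never have instances exactly when its superclasses include both members of some disjoint pair, so a graph walk suffices. Reasoners like Pellet handle far more expressive ontologies with different algorithms; the class names below are hypothetical.

```python
# Toy design-time check for a restricted Global View: only atomic
# subclass axioms and pairwise disjointness between named classes.
# Real OWL reasoners such as Pellet handle far richer ontologies.


def superclasses(subclass_of: dict[str, set[str]], cls: str) -> set[str]:
    """Return cls plus every class it is (transitively) a subclass of."""
    seen = {cls}
    stack = [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), set()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def unsatisfiable(subclass_of: dict[str, set[str]],
                  disjoint: list[tuple[str, str]]) -> dict[str, tuple[str, str]]:
    """Map each class that can never have instances to a disjoint pair it
    inherits. In this fragment, those are exactly the unsatisfiable classes."""
    classes = set(subclass_of)
    for parents in subclass_of.values():
        classes |= parents
    report: dict[str, tuple[str, str]] = {}
    for cls in sorted(classes):
        ups = superclasses(subclass_of, cls)
        for a, b in disjoint:
            if a in ups and b in ups:
                report[cls] = (a, b)
                break
    return report


# Hypothetical Global View merged from two source schemas; the mapping
# for OnSiteContractor puts it under both sides of a disjointness axiom.
global_view = {
    "Employee": {"Person"},
    "Contractor": {"Person"},
    "OnSiteContractor": {"Contractor", "Employee"},
}
disjoint = [("Employee", "Contractor")]

problems = unsatisfiable(global_view, disjoint)
for cls, (a, b) in problems.items():
    print(f"{cls} cannot be instantiated: it is under both {a} and {b}, "
          "which are declared disjoint")
```

The check flags `OnSiteContractor` before any data is loaded, and the offending disjoint pair doubles as a crude explanation of why.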
You just can’t do that in RDF or RDFS, since it’s not generally possible to express a contradiction in those languages. Everything is always consistent in an RDF/S application. But that’s not always the way the world works. For some applications, that “feature” is really a bug.
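The RDFS point can be made concrete with a toy forward chainer in Python that implements only four RDFS entailment rules (rdfs2, rdfs3, rdfs9, and rdfs11 in the W3C RDF Semantics numbering). Every rule only adds triples, so a mapping error is silently absorbed as one more inference. The `disjointness_clashes` helper is a hypothetical stand-in for the kind of constraint `owl:disjointWith` expresses; the `ex:` names and data are made up, and literals are not distinguished from resources.

```python
# Toy RDFS entailment: only rules rdfs2 (domain), rdfs3 (range),
# rdfs9 (type via subClassOf) and rdfs11 (subClassOf transitivity).
from collections import defaultdict

RDF_TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def rdfs_closure(triples: set[tuple[str, str, str]]) -> set[tuple[str, str, str]]:
    """Apply the four rules until nothing new is derived. The rules can
    only ever add triples; none of them can derive a contradiction."""
    closed = set(triples)
    while True:
        new = set()
        for s, p, o in closed:
            for s2, p2, o2 in closed:
                if o == s2 and p2 == SUBCLASS and p in (SUBCLASS, RDF_TYPE):
                    new.add((s, p, o2))          # rdfs11 / rdfs9
                if p == s2 and p2 == DOMAIN:
                    new.add((s, RDF_TYPE, o2))   # rdfs2
                if p == s2 and p2 == RANGE:
                    new.add((o, RDF_TYPE, o2))   # rdfs3
        if new <= closed:
            return closed
        closed |= new


def disjointness_clashes(closed, disjoint):
    """Individuals typed with both members of a disjoint pair: the kind of
    constraint owl:disjointWith states and RDFS cannot."""
    types = defaultdict(set)
    for s, p, o in closed:
        if p == RDF_TYPE:
            types[s].add(o)
    return sorted((ind, a, b) for ind, ts in types.items()
                  for a, b in disjoint if a in ts and b in ts)


schema = {
    ("ex:worksOn", DOMAIN, "ex:Person"),
    ("ex:worksOn", RANGE, "ex:Project"),
    ("ex:Engineer", SUBCLASS, "ex:Person"),
}
# Hypothetical mapping error: a person ID landed in the project column.
data = {
    ("ex:alice", RDF_TYPE, "ex:Engineer"),
    ("ex:bob", RDF_TYPE, "ex:Engineer"),
    ("ex:alice", "ex:worksOn", "ex:bob"),
}
closed = rdfs_closure(schema | data)
print(("ex:bob", RDF_TYPE, "ex:Project") in closed)  # True
print(disjointness_clashes(closed, [("ex:Person", "ex:Project")]))
# [('ex:bob', 'ex:Person', 'ex:Project')]
```

Under the RDFS rules alone, `ex:bob` is quietly typed as both `ex:Person` and `ex:Project`; only the disjointness constraint turns that inference into a detectable error.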
Pellet and Owlgres both offer consistency checking for ontologies, which is (roughly) analogous to static type checking at compile time in some programming languages. Like a type checker, it eliminates certain classes of failure before the system ever runs.
In complex integration apps, eliminating that class of error not only makes the system more robust at run-time, but it also increases the confidence one can have about the answers to queries against the data. An RDFS-only solution cannot eliminate that class of errors and, thus, cannot increase confidence that query answers actually make sense.
As Jim Hendler likes to say, there is no single, univocal notion or standard of truth on the Web or on the Semantic Web. Yes, of course. But for some apps and data sets there is such a notion, and it’s incredibly useful that tools like Pellet and Owlgres can detect and enforce the constraints it imposes.
Significance—technical and economic—is a function of utility and perceived utility. This post gives you some good reasons to perceive the utility of reasoning differently—in future posts, I’ll give more good reasons around things like explanation, automated debugging, and other reasoning services.
June 9th, 2008 at 12:47 am
I don’t know how hard it is, but it would be nice to see more explanations for inconsistencies in the ABox. I recently converted some legacy data to RDF, and with the help of TopBraid, Pellet, and a simple OWL ontology I managed to find errors in the data; those errors showed up as inconsistencies in the ontology.
June 9th, 2008 at 5:20 am
Hi John… My next post in this series will be about how explanation, as a reasoning service, provides all sorts of real-world benefits, including, as you point out, quite good “debugging hints” when you find problems in data.
I will also be hammering on the relative advantage OWL has in this regard over RDF and RDFS, both of which are so weak that no explanations are ever useful because no RDF/RDFS inferences are ever useful. (Okay, a bit of an exaggeration, but still…)
June 9th, 2008 at 7:19 am
I’ve been toying with the idea of only excluding certain bits of the OWL ontology when it comes to checking consistency, but then removing them when it comes to implementing the RDF. For example, in some cases an “allValuesFrom” restriction might be useful for TBox and ABox consistency checking but add little when it comes to querying, while also making reasoning less tractable. Any thoughts?
June 9th, 2008 at 2:16 pm
I assume you meant “including” rather than “excluding” in the sentence.
Generally, I think identifying different data “facets” (for lack of a better term) and producing ontologies, or versions of an ontology, for each facet seems reasonable. It reminds me of the best practice from XML Land, where it’s often said that a really complex app needs several schemas.
The real trick is doing something clever in various reasoning systems to support this kind of faceted work.
Since Owlgres, for us in particular, is considerably less expressive but more scalable, one might build a maximal ontology for consistency checking and something more minimal (or none at all) for querying. Sure.
Ideally this process could be at least semi-automated; but I’m not familiar with any work in this area, though more well-informed folks might know otherwise.
June 13th, 2008 at 12:43 am
“Ideally this process could be at least semi-automated; but I’m not familiar with any work in this area”:
Well, another project for Bijan to start, if that is the case :)