Why Reasoning Matters: Consistency Checking (1)

by Kendall Clark

Reasoning only matters (to us, to the market) if it’s useful. Significance — technical and economic — is a function of utility and perceived utility. This is the first in a series of posts that will, I hope, increase the perceived utility of formal, automated, logic-based reasoning.

We think reasoning is useful in a great number of ways, for a great number of use cases. But its proponents, including us at Clark & Parsia LLC, haven’t always done a great job of communicating that utility to others, in part because, like any non-trivial field, reasoning is complicated. Lots of squiggles and symbols and off-putting bits. Like machine learning or computer vision or…Perl.

Let’s talk, first, about data integration. Non-trivial cases require something more than the standard ploy:

  1. make an RDF Schema
  2. coin or re-use a URI scheme
  3. convert n sources into RDF
  4. dump that RDF into a database
  5. query the database (e.g., build a new front-end)
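To make the five steps concrete, here is a deliberately tiny Python sketch. It is not how POPS or BIANCA were built: a plain set of (subject, predicate, object) tuples stands in for an RDF database, and the example.org URIs, source records, and helper functions are all hypothetical.

```python
# A toy version of the "standard ploy": sources -> triples -> one store -> query.
# Everything here is hypothetical; a set of tuples stands in for an RDF database.

Triple = tuple[str, str, str]

# Steps 1 and 2: a tiny vocabulary under a made-up URI scheme.
EX = "http://example.org/schema#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON, NAME, EXPERT_IN = EX + "Person", EX + "name", EX + "expertIn"


def person_uri(local_id: str) -> str:
    """Mint a URI for a person from a source-local identifier."""
    return "http://example.org/person/" + local_id


# Two hypothetical sources with different record shapes.
hr_rows = [{"id": "e1", "full_name": "Ada"}, {"id": "e2", "full_name": "Grace"}]
skills_rows = [{"emp": "e1", "skill": "Optics"}, {"emp": "e2", "skill": "Compilers"}]


# Step 3: convert each source into triples.
def convert_hr(rows: list[dict]) -> set[Triple]:
    triples = set()
    for r in rows:
        s = person_uri(r["id"])
        triples.add((s, RDF_TYPE, PERSON))
        triples.add((s, NAME, r["full_name"]))
    return triples


def convert_skills(rows: list[dict]) -> set[Triple]:
    return {(person_uri(r["emp"]), EXPERT_IN, r["skill"]) for r in rows}


# Step 4: dump everything into one store.
store: set[Triple] = convert_hr(hr_rows) | convert_skills(skills_rows)


# Step 5: query, e.g. "names of people who are expert in a topic".
def experts_in(topic: str) -> list[str]:
    people = {s for (s, p, o) in store if p == EXPERT_IN and o == topic}
    return sorted(o for (s, p, o) in store if p == NAME and s in people)


print(experts_in("Optics"))  # ['Ada']
```

Note that nothing in this pipeline checks whether the converted data makes sense when combined; it just accumulates triples.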

We’ve done exactly that, for POPS, the NASA expertise location service we built, and for BIANCA, the NASA data center analysis tool. It works, but neither project had any really hard modeling, mapping, or integration bits.

For the really hard bits, like schema and mapping alignments, partial alignments, dynamic mappings, and query routing, you need more help from the computer. Reasoning gives you that help, particularly when the problems are complex (very large schemas, many schemas to align, partial mappings or alignments, and so on).

In these applications, especially where data volumes are large, you want to follow a standard engineering principle for detecting failures and edge cases: fail early and often. Expressing a Global View over n schemas as an OWL ontology means that certain kinds of conceptual mapping and integration errors are caught at design time and can’t make it into a live production system. Consistency checking at design time lets the computer verify that every concept in the Global View can actually be instantiated, that is, that each concept is logically consistent (in description-logic terms, satisfiable).
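Here is a rough idea of what "can every concept be instantiated?" means, as a minimal Python sketch. It handles only a tiny fragment (named classes, subclass axioms, and disjointness axioms), in which a class is unsatisfiable exactly when two of its superclasses, counting itself, are declared disjoint. Pellet and other OWL reasoners use far more general procedures (tableau algorithms over full OWL DL); the class names and the faulty mapping axiom below are hypothetical.

```python
# Toy design-time check: which classes in a (hypothetical) Global View can
# never have instances? Only handles subclass + disjointness axioms.

# (subclass, superclass) axioms, including mapping axioms from two sources.
SUBCLASS_OF = {
    ("Employee", "Person"),
    ("Contractor", "Person"),
    ("src1:Engineer", "Employee"),  # mapping axiom from source 1
    ("src2:Staff", "Employee"),     # mapping axioms from source 2 ...
    ("src2:Staff", "Contractor"),   # ... one of which is a modeling mistake
}

# Disjointness axioms in the Global View (unordered pairs).
DISJOINT = {frozenset({"Employee", "Contractor"})}


def superclasses(cls: str) -> set[str]:
    """Reflexive-transitive closure of SUBCLASS_OF starting from cls."""
    seen, todo = {cls}, [cls]
    while todo:
        current = todo.pop()
        for sub, sup in SUBCLASS_OF:
            if sub == current and sup not in seen:
                seen.add(sup)
                todo.append(sup)
    return seen


def is_satisfiable(cls: str) -> bool:
    """In this fragment: satisfiable iff no disjoint pair lies among cls's superclasses."""
    sups = superclasses(cls)
    return not any(pair <= sups for pair in DISJOINT)


def unsatisfiable_classes() -> list[str]:
    classes = {c for axiom in SUBCLASS_OF for c in axiom}
    return sorted(c for c in classes if not is_satisfiable(c))


print(unsatisfiable_classes())  # ['src2:Staff']
```

The bad mapping surfaces while the Global View is being designed, rather than as a confusing query answer after the data has been loaded.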

You just can’t do that in RDF or RDFS, since it’s not generally possible to express a contradiction in those languages. Everything is always consistent in an RDF/S application. But that’s not always the way the world works. For some applications, that “feature” is really a bug.
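To see why an RDFS application can’t fail in this way, here is a minimal Python sketch that applies a few RDFS entailment rules (rdfs2, rdfs3, rdfs9, and rdfs11 from the RDF Semantics specification) by naive forward chaining. Every rule only adds triples and none concludes "contradiction", so the closure always succeeds, even on data the modeler would consider wrong. The ex: names are hypothetical, and the final disjointness check is a hand-rolled stand-in for what an OWL reasoner does with owl:disjointWith, not Pellet’s actual algorithm.

```python
# Naive forward chaining over a subset of the RDFS entailment rules.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"
DOMAIN, RANGE = "rdfs:domain", "rdfs:range"


def rdfs_closure(triples: set) -> set:
    closed = set(triples)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                if p2 == DOMAIN and p == s2:                      # rdfs2
                    new.add((s, TYPE, o2))
                if p2 == RANGE and p == s2:                       # rdfs3
                    new.add((o, TYPE, o2))
                if p == TYPE and p2 == SUBCLASS and o == s2:      # rdfs9
                    new.add((s, TYPE, o2))
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:  # rdfs11
                    new.add((s, SUBCLASS, o2))
        if new <= closed:  # fixpoint: nothing new was derived
            return closed
        closed |= new


data = {
    ("ex:Employee", SUBCLASS, "ex:Person"),
    ("ex:Contractor", SUBCLASS, "ex:Person"),
    ("ex:worksFor", DOMAIN, "ex:Employee"),
    ("ex:billsThrough", DOMAIN, "ex:Contractor"),
    ("ex:kim", "ex:worksFor", "ex:nasa"),      # from source 1
    ("ex:kim", "ex:billsThrough", "ex:acme"),  # from source 2
}

closed = rdfs_closure(data)
# RDFS concludes kim is both an Employee and a Contractor; no rule can object.
print(sorted(o for (s, p, o) in closed if s == "ex:kim" and p == TYPE))
# ['ex:Contractor', 'ex:Employee', 'ex:Person']


def disjointness_clashes(closed: set, disjoint: list) -> list:
    """Individuals typed with both classes of some disjoint pair (OWL-style check)."""
    types: dict = {}
    for (s, p, o) in closed:
        if p == TYPE:
            types.setdefault(s, set()).add(o)
    return sorted({ind for ind, ts in types.items()
                   for a, b in disjoint if a in ts and b in ts})


print(disjointness_clashes(closed, [("ex:Employee", "ex:Contractor")]))  # ['ex:kim']
```

A single disjointness axiom, which RDFS has no way to state, is enough to turn silently accepted data into a detectable error.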

Pellet and Owlgres both offer consistency checking for ontologies, which is roughly analogous to compile-time type checking in statically typed programming languages. Like a type checker, it rules out certain classes of failures up front.

In complex integration apps, eliminating that class of error not only makes the system more robust at run time, but also increases the confidence one can have in the answers to queries against the data. An RDFS-only solution cannot eliminate that class of errors and, thus, cannot increase confidence that query answers actually make sense.

As Jim Hendler likes to say, there is no single, univocal notion or standard of truth on the Web or on the Semantic Web. Yes, of course. But for some apps and data sets there is such a notion, and it’s incredibly useful that tools like Pellet and Owlgres can detect violations of the constraints it implies and enforce them.

Significance, technical and economic, is a function of utility and perceived utility. I hope this post gives you some good reasons to perceive the utility of reasoning differently. In future posts, I’ll give more, covering things like explanation, automated debugging, and other reasoning services.