
Pellet Integrity Constraints: Validating RDF with OWL

Wouldn’t it be great if OWL could be used as a very expressive schema language for RDF, Linked Data, virtual RDF, and so on? Pellet Integrity Constraint Validator (Pellet ICV) treats OWL as a schema or validation language for RDF data via auto-generated SPARQL queries that can be executed on any SPARQL-enabled RDF store. Pellet ICV extends core Pellet by interpreting OWL axioms with integrity constraint (IC) semantics, so the ontologies you write can be used to validate RDF data.

Pellet ICV, part of Clark & Parsia’s NIST-funded work to support integrity constraints in OWL, is available as a prototype extension of Pellet that interprets OWL ontologies with the Closed World Assumption in order to detect constraint violations in RDF data. This means that the full expressivity of OWL and OWL 2 can finally be used as a schema language for RDF.

Pellet ICV translates OWL integrity constraint ontologies into SPARQL queries automatically to validate RDF data. When those SPARQL queries are executed, either by Pellet or by some other RDF triple store, the results indicate integrity constraint violations. Pellet ICV can also provide automatic explanations of why integrity constraints are violated, which aids data debugging, repair, alignment, etc.

We anticipate commercializing this technology via PelletDb, our Pellet-Oracle integration product.

Download

A preview release of Pellet ICV, version 0.4, is no longer available for download. A new version of Pellet ICV will be included in the upcoming release of Pellet 3.

Integrity Constraints in OWL

A common misconception about OWL is that standard OWL or OWL 2 can easily be used as an expressive schema language, of the kind typically used to validate data in relational databases or XML documents. Because RDF and OWL adopt the Open World Assumption and do not make the Unique Name Assumption, the axioms in an OWL ontology are meant to infer new knowledge rather than to trigger an inconsistency.

But user experience and feedback have taught us that people want to use OWL both to infer new knowledge and to validate RDF instance data. Pellet ICV allows us to do both.

Our approach in Pellet ICV is to give OWL axioms an alternative semantics in which they are interpreted with the Closed World Assumption (CWA) and a weak form of the Unique Name Assumption (UNA). Under the CWA, an assertion is assumed to be false if we do not explicitly know it to be true, for example, if the assertion is missing from the RDF data. Under the weak UNA, if two individuals are not inferred to be the same, they are assumed to be distinct. The full expressivity of OWL and OWL 2 is available to Pellet ICV, so it is now possible to use Pellet and OWL both to create new knowledge and to validate RDF instance data.

IC Examples

Let’s see some examples of how ICs can be used in ontologies. We will use a fictional ontology about products and manufacturers and describe constraints such as “products are manufactured by manufacturers”, “every product has a manufacturer”, and “a product cannot have more than one manufacturer”. For each constraint, we will point out the differences between treating the same axiom as a standard OWL axiom and as an IC.

Range constraints:

One common case that requires ICs is defining domain and range constraints for properties. Suppose that in our product ontology we would like to define a constraint saying that “products are manufactured by manufacturers”. In OWL we can use a range axiom to express this constraint. The following two-axiom ontology shows this range axiom along with a property assertion about a specific product’s manufacturer:

   :isManufacturedBy rdfs:range :Manufacturer .
   :product1 :isManufacturedBy :ACME .

According to RDFS and OWL semantics, this ontology is consistent even though ACME is not explicitly declared to be an instance of Manufacturer. The missing type assertion is not a problem: nothing in the ontology contradicts ACME being a Manufacturer, so the reasoner simply infers that it is one. The range axiom is used to infer this type assertion rather than as a check that ACME satisfies the condition.
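To make the open-world reading concrete, here is a minimal Python sketch. It assumes a toy representation of RDF as a set of (subject, predicate, object) string tuples. The function `apply_range_axioms` is hypothetical and is not part of Pellet. It only shows that an rdfs:range axiom adds a type triple for the object and never reports a problem.

```python
# Toy RDF model (an assumption for illustration): triples are
# (subject, predicate, object) tuples of prefixed-name strings.
RDF_TYPE = "rdf:type"


def apply_range_axioms(triples, ranges):
    """Open-world reading of rdfs:range (hypothetical helper).

    For every triple whose predicate has a declared range, infer that
    the object is an instance of that range class. Nothing is ever
    reported as a violation; missing types are simply inferred.
    """
    inferred = set(triples)
    # A single pass is enough in this example: the inferred triples use
    # rdf:type, which has no declared range in `ranges`.
    for s, p, o in triples:
        if p in ranges:
            inferred.add((o, RDF_TYPE, ranges[p]))
    return inferred


data = {(":product1", ":isManufacturedBy", ":ACME")}
ranges = {":isManufacturedBy": ":Manufacturer"}

closure = apply_range_axioms(data, ranges)
print((":ACME", RDF_TYPE, ":Manufacturer") in closure)  # True
```

A real reasoner would iterate to a fixpoint over many kinds of axioms. This single rule is enough to show why the ontology stays consistent.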

With IC semantics, the above range axiom is treated as a check rather than an inference rule. As a result, we detect a violation because ACME cannot be inferred to be a Manufacturer. The semantics of ICs are defined so that constraint validation can be reduced to query answering. The range axiom above would be translated to the following SPARQL query, which returns true if there is a violation:

   ASK WHERE {
      ?x  :isManufacturedBy  ?y .
      NOT EXISTS { ?y  rdf:type :Manufacturer . }
   }

The NOT EXISTS keyword is the negation-as-failure feature introduced in SPARQL 1.1. In SPARQL 1.0, negation can be encoded with the well-known OPTIONAL/FILTER/!BOUND pattern.
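The ASK query can be mirrored in the same toy Python model to show the closed-world reading: a missing rdf:type triple is treated as false, so the range axiom acts as a check. This is a sketch under stated assumptions, not Pellet ICV's implementation. `range_violations` is a hypothetical name, and unlike Pellet ICV it looks only at explicitly asserted triples and performs no other inference.

```python
RDF_TYPE = "rdf:type"


def range_violations(triples, prop, cls):
    """Closed-world check for `prop rdfs:range cls` (hypothetical helper).

    Mirrors the ASK query: returns every (x, y) with `x prop y` for which
    the triple `y rdf:type cls` is absent. Only explicit triples are
    consulted; no other inference is performed.
    """
    typed = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return sorted((s, o) for s, p, o in triples if p == prop and o not in typed)


data = {(":product1", ":isManufacturedBy", ":ACME")}
# The ASK query would answer true: ACME is not known to be a Manufacturer.
print(range_violations(data, ":isManufacturedBy", ":Manufacturer"))
# [(':product1', ':ACME')]

# Adding the missing type assertion resolves the violation.
data.add((":ACME", RDF_TYPE, ":Manufacturer"))
print(range_violations(data, ":isManufacturedBy", ":Manufacturer"))
# []
```

The ASK query answers true exactly when this list is non-empty. Returning the bindings instead of a boolean shows which triples cause the violation.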

Min cardinality constraints:

Another common use case for constraints is to require that the instances of a class have certain property values. For example, we want to say that “every product has a manufacturer”. In OWL, we can use either a min cardinality restriction or a someValuesFrom restriction for this purpose. The following ontology shows the RDF encoding of such a restriction along with a product instance definition:

   :Product rdfs:subClassOf [
         rdf:type owl:Restriction;
         owl:onProperty :isManufacturedBy;
         owl:someValuesFrom :Manufacturer
   ] .
   :product1 rdf:type :Product .

There is again no inconsistency in this ontology with OWL semantics. Under the Open World Assumption, product1 may have a manufacturer that we don’t know about, so there is no evidence that the restriction is violated. With IC semantics, the restriction is validated against the data at hand: if the data does not contain an isManufacturedBy value for product1 that is a Manufacturer, a violation is reported.
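In the same toy Python model, the IC reading of the someValuesFrom restriction can be sketched as a check. Every explicit Product must have at least one isManufacturedBy value that is explicitly typed as a Manufacturer. `some_values_violations` is a hypothetical helper assumed for illustration. It does not reproduce the SPARQL that Pellet ICV actually generates.

```python
RDF_TYPE = "rdf:type"


def some_values_violations(triples, cls, prop, filler):
    """Closed-world check for `cls rdfs:subClassOf (prop some filler)`.

    Hypothetical helper: every explicit instance of `cls` must have at
    least one `prop` value that is explicitly typed as `filler`.
    Returns the instances that fail the check.
    """
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return sorted(
        x for x in instances
        if not any(
            s == x and p == prop and (o, RDF_TYPE, filler) in triples
            for s, p, o in triples
        )
    )


data = {(":product1", RDF_TYPE, ":Product")}
# No manufacturer in the data, so product1 violates the constraint.
print(some_values_violations(data, ":Product", ":isManufacturedBy", ":Manufacturer"))
# [':product1']

data.add((":product1", ":isManufacturedBy", ":ACME"))
data.add((":ACME", RDF_TYPE, ":Manufacturer"))
print(some_values_violations(data, ":Product", ":isManufacturedBy", ":Manufacturer"))
# []
```

Under the closed-world reading, adding only the isManufacturedBy triple would not be enough. ACME must also be known to be a Manufacturer.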

The SPARQL translations become more complicated as the axioms used as ICs get more complex. However, ontology developers do not need to worry about (or even know) the SPARQL representation of a constraint. Since OWL axioms are translated into SPARQL queries behind the scenes, this is an implementation detail that ontology developers can ignore.

Max cardinality constraints:

Expressing a constraint such as “a product cannot have more than one manufacturer” reveals another interesting feature of OWL. In OWL, a property can be declared functional to express that an instance cannot have more than one value for that property. Let’s consider the following ontology:

   :isManufacturedBy rdf:type owl:FunctionalProperty .
   :product1 :isManufacturedBy :manufacturer1 .
   :product1 :isManufacturedBy :manufacturer2 .

There is again no inconsistency here with OWL semantics. A reasoner would simply infer that manufacturer1 is the same as manufacturer2. Obviously, one can add an explicit assertion that manufacturer1 and manufacturer2 are different from each other, which would make the ontology inconsistent. However, maintaining explicit owl:differentFrom assertions can be hard, especially when individuals are dynamically added to and removed from a data set.
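The open-world behavior can again be sketched in the toy Python model. The hypothetical helper `infer_same_as` shows only the single inference described above: two values of a functional property for the same subject are inferred to be owl:sameAs, rather than being reported as a violation. It is an illustration, not a reasoner, and does not compute symmetric or transitive closure.

```python
from itertools import combinations


def infer_same_as(triples, functional_props):
    """Open-world reading of owl:FunctionalProperty (hypothetical helper).

    If a subject has several values for a functional property, infer that
    those values denote the same individual. No violation is reported.
    """
    values = {}
    for s, p, o in triples:
        if p in functional_props:
            values.setdefault((s, p), set()).add(o)
    inferred = set()
    for objs in values.values():
        # Every pair of values for one subject must be the same individual.
        for a, b in combinations(sorted(objs), 2):
            inferred.add((a, "owl:sameAs", b))
    return inferred


data = {
    (":product1", ":isManufacturedBy", ":manufacturer1"),
    (":product1", ":isManufacturedBy", ":manufacturer2"),
}
print(infer_same_as(data, {":isManufacturedBy"}))
# {(':manufacturer1', 'owl:sameAs', ':manufacturer2')}
```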

The interpretation of cardinality restrictions in IC semantics adopts the view that two individuals are different from each other unless stated otherwise. That is, we get a constraint violation in the example above because there is no information to conclude that manufacturer1 and manufacturer2 refer to the same individual. If an explicit equality assertion (owl:sameAs) between them is added to the data, the IC violation is resolved. Thus, IC semantics provide a weak form of the Unique Name Assumption while retaining the ability to assert equality between two individuals.
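The weak UNA reading of the functional property can be sketched in the same toy Python model. Values of isManufacturedBy count as distinct unless explicit owl:sameAs assertions, possibly chained, link them. A small union-find tracks those links. `max_one_violations` is a hypothetical helper. Per the description of weak UNA earlier, Pellet ICV also takes inferred equalities into account, which this sketch does not.

```python
def _root(parent, x):
    """Follow union-find parent links to the representative of x."""
    while parent.get(x, x) != x:
        x = parent[x]
    return x


def max_one_violations(triples, prop):
    """Closed-world, weak-UNA check for max cardinality 1 on `prop`.

    Hypothetical helper: values are distinct unless explicit owl:sameAs
    triples (possibly chained) link them. Returns the subjects having
    more than one distinct value for `prop`.
    """
    parent = {}
    # Merge individuals that are explicitly asserted to be the same.
    for s, p, o in triples:
        if p == "owl:sameAs":
            ra, rb = _root(parent, s), _root(parent, o)
            if ra != rb:
                parent[ra] = rb
    # Group each subject's values by equivalence class.
    values = {}
    for s, p, o in triples:
        if p == prop:
            values.setdefault(s, set()).add(_root(parent, o))
    return sorted(s for s, roots in values.items() if len(roots) > 1)


data = {
    (":product1", ":isManufacturedBy", ":manufacturer1"),
    (":product1", ":isManufacturedBy", ":manufacturer2"),
}
print(max_one_violations(data, ":isManufacturedBy"))  # [':product1']

# An explicit equality assertion resolves the violation.
data.add((":manufacturer1", "owl:sameAs", ":manufacturer2"))
print(max_one_violations(data, ":isManufacturedBy"))  # []
```

Because distinctness is the default, no owl:differentFrom assertions need to be maintained. Only the (usually rarer) equalities must be stated.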