Towards SPARQL-DL Evaluation in Pellet
by Petr Kremen
This post starts the series on currently developed SPARQL-DL engine in Pellet.
So, what is SPARQL-DL ? It is a query language recently proposed by Evren and Bijan. Actually, it has less to do with SPARQL than its name may suggest. It’s really an expressive language for querying OWL-DL ontologies, which can in turn be used to extend the semantics of SPARQL’s basic graph patterns.
Each SPARQL-DL query is a conjunction of one or more query atoms. As the meaning of the atom is pretty straightforward (and the precise semantics is fully described in the above referenced paper) I would point out just their abstract syntax. All the atoms except Annotation(i,p,j) allow distinguished variables in each of its arguments. Undistinguished variables can be placed only in individual/literal positions.:
Type(i,c), PropertyValue(i,p,j)– standard ABox atoms (enriched by the possibility of distinguished variables in predicate positions—c,p)SameAs(i,j), DifferentFrom(i,j)—atoms corresponding to OWL individual axioms for handling Unique Name AssumptionSubClassOf(c,d), EquivalentClass(c,d), DisjointWith(c,d)—atoms corresponding to standard OWL class axiomsComplementOf(c,d)—the only atom representing a class description constructSubPropertyOf(p,q), EquivalentProperty(p,q), InverseOf(p,q), ObjectProperty(p), DataProperty(p), FunctionalProperty(p), InverseFunctional(p), Symmetric(p), Transitive(p)—atoms corresponding to standard OWL property axiomsAnnotation(i,p,j)—ground atom for matching OWL annotations
Simply put—okay, it’s simple if you’re familiar with DL terminology—SPARQL-DL allows mixed (TBox, RBox, and ABox) queries that may contain, in addition to distinguished variables, undistinguished variables in the instance/literal positions of ABox atoms. Here are two simple examples for querying the LUBM dataset (ub=http://www.lehigh.edu/~zhp2/2004/0401/univ-bench.owl# and rdf, rdfs are prefixes for RDF and RDFS namespaces respectively), :
- “Give me all graduate students (?X) that are somehow related to some course (?W). We also want to know what kind of the relation (?Y) they are related with and what is the kind of the course (?Z).”
Abstract syntax: Type(?X, ub:GraduateStudent), PropertyValue(?X, ?Y, ?W), Type(?W, ?Z), SubClassOf(?Z, ub:Course)SPARQL syntax: SELECT ?X ?Y ?W ?Z WHERE { ?X rdf:type ub:GraduateStudent . ?X ?Y ?W . ?W rdf:type ?Z . ?Z rdfs:subClassOf ub:Course . } - “Give me all people (?X) that are members of the mentioned department and tell me what kind of membership (?Y) it is.”
Abstract syntax: Type(?X, ub:Person), PropertyValue(?X, ?Y, http://www.Department0.University0.edu), SubPropertyOf(?Y, ub:memberOf)SPARQL syntax: SELECT ?X ?Y WHERE { ?X rdf:type ub:Person . ?X ?Y <http://www.Department0.University0.edu> . ?Y rdfs:subPropertyOf ub:memberOf . }
I’ve recently finished the first (rather naive) implementation of the SPARQL-DL engine. Its only restriction on query form is that the class, property and instance variable sets do not overlap (thus avoiding all the stuff around punning). It uses the current Pellet ABox engine (see Pellet description, section 3.5) as a black box :
- All the class and property variables are evaluated (only distinguished variables are allowed for class and property arguments) using an atom per atom evaluation.
- For each found binding a standard ABox query is created and sent to the current Pellet ABox engine.
Having this straightforward implementation I extended it to a fully autonomous implementation for queries with only distinguished variables. Of course, performance of both these implementations strongly depends on the quality of cost-based reordering. The initial experiments with in advance cost-based reordering are not very encouraging; it’s both time consuming to compute them and they are not very accurate, after all.
After doing a bit more on this stuff I would like to try some kind of dynamic cost-based reordering. i.e. recomputing the atom order after each evaluation step.
And what to do next ?
First, there is also a need to optimize query plan generation to disregard sub-optimal plans as soon as possible, possibly using intelligent sampling or heuristic-based query ordering. Second, and this will be useful for evaluation of pure ABox queries, is evaluation of cyclic queries with undistinguished variables, where each (ABox) cycle contains at least one constant or distinguished variable. This might be handled with distinguished-like engine with rolling-up techniques. Third, another useful thing is the integration of the created expressive basic pattern evaluator to full SPARQL engine, thus providing all the algebra around.
There is much (more) stuff around SPARQL-DL—various extensions are mentioned in the Evren and Bijan’s paper—so I will stop writing now and get to solving them.





November 3rd, 2007 at 4:50 am
[...] 2007 OWLED workshop. The guys at Clark&Parsia really got busy, and the first implementation of SPARQL-DL will be introduced as part of [...]