Using Taxonomies for SPARQL-DL optimizations
by Petr Kremen
Last post I spent showing directions for optimizing SPARQL-DL in general. Now I would like to touch a more particular case of SPARQL-DL queries, namely those that contain class/property variables in ABox part of the query.
I call down-monotonic all the class/property variables ?x that occur in either Type(•,?x), or PropertyValue(•,?x,•) atom. For these variables we can use class/property hierarchies to prune the search. Let’s take an atom Type(i, ?x) (i.e., i is either a binding for variable •, or constant •). If i is not of type C we can safely avoid considering subclasses of C as bindings for ?x ( and analogically for PropertyValue(i, ?x, j) and property hierarchy ).
However, reverse implication doesn’t hold, because of possible interaction with other TBox atoms. Finding a binding C for (down-monotonic) ?x we can’t take all superclasses/superporperties of C as valid bindings. Consider the query Type(i,?x), ComplementOf(?x, not(C)). If C is a subclass of D, it might happen that C is a valid binding for ?x, while D is not.
- “Give me all people (?X) together with their type (?A) that are advisors of themselves.”
Abstract syntax: Type(?X, ?A), SubClassOf(?A, ub:Person), PropertyValue(?X, ub:advisor, ?X) - “Give me all people (?X) together with their type (?A) that are teaching assistants of some course (?Y)”
Abstract syntax: SubClassOf(?A, ub:Person), Type(?X, ?A), PropertyValue(?X,ub:teachingAssistantOf,?Y), Type(?Y, ub:Course).
The first query does not return any result, thus the optimization prunes all subclasses of Person immediately, resulting in reduction of execution time by an order of magnitude (from 3 second to 0.3 seconds). On the other hand, the second query returns nonempty result set, resulting in the significant decrease of the pruning gain (from 3 seconds to 1.5 seconds).