Optimizing SPARQL-DL

by Petr Kremen

To continue with the SPARQL-DL series, I will briefly discuss two implemented cost-based query reordering methods.

The current Pellet engine for conjunctive ABox queries with distinguished variables first explores all query orderings, and then it executes the best one (according to the cost function defined here). This reordering method (let’s call it static) of SPARQL-DL queries suffers from several problems, mainly :


  • As it evaluates each query ordering, even queries consisting of as few as 10 atoms or more are effectively uncomputable.
  • To keep cost evaluation of each query ordering computable in linear time w.r.t. the query length, the cost estimates have to be based on rough, and thus imprecise, estimates of various knowledge base statistics (like the number of instances in given class). Some of these statistics are computationally expensive as they, in general case, may require classification (like estimate for number of subclasses of given class) or realization (like estimate for number of named classes that given instance belongs to) of the knowledge base.

These problems led me to experiment with reordering queries dynamically during their execution. Even the first implemented method—greedy-like search—seems to be promising (contrary to the use of greedy search in most other applications). The algorithm starts with the the list L of all query atoms and in each step the atom with best score in L is evaluated and removed from L. The scoring function I’m using is simple: for a given atom it sums up an estimate of the number of consistency checks that are required to evaluate this atom and the number of new variable bindings to be explored in next steps. This heuristic is linear in the number of atoms and thus cheap to compute.

On the other hand, this scoring function does not overcome the second objection above. It still provides only a rough estimate, and does not reflect the ordering of remaining query atoms at all. However, since costs of relevant atoms are evaluated after each iteration, this issue is less critical than in the static ordering case. My initial experiments prove that the proposed dynamic ordering method is comparable to static ordering on short queries and significantly more scalable on longer ones.

As an example, let’s take a query consisting of 9 atoms :

  • “Give me all graduate students (?X) that takes some course (?Z) together with the teacher of the course (?A) and his/her official position (?B). Filter out those teachers who do not work for the department (?W) the student belongs to.”

    Abstract syntax:Type(?X, ub:GraduateStudent]), PropertyValue(?X, ub:takesCourse, ?Z), Type(?Z, ub:Course), PropertyValue(?X, ub:memberOf, ?W), PropertyValue(?A, ?P, ?W), Type(?A, ?B]), PropertyValue(?A, ub:teacherOf, ?Z), SubPropertyOf(?P, ub:worksFor), SubClassOf(?B, ub:Faculty)

Execution of this query against LUBM (0,1) takes approximately 3 seconds using the dynamic reordering, comparing to almost 30 seconds required in the static ordering case (plus additional approx. 4 seconds for KB consistency check in both cases). Both approaches have comparable execution times on the original LUBM queries. Detailed evaluation and comparison will be shown in a future post.

Spread the word:
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

One Response to “Optimizing SPARQL-DL”

  1. Rinke Hoekstra » SPARQL DL Says:

    [...] reading the more recent post, about query optimization, I wondered whether it is foreseen that SPARQL-DL will support CONSTRUCT [...]

Leave a Reply