On Queries with Undistinguished Variables

by Petr Kremen

After a bit of delay, I want to discuss some new optimizations for queries with undistinguished variables that are being implemented in the new Pellet query engine. For simplicity, I consider just connected conjunctive ABox queries with undistinguished variables.

The current Pellet query engine evaluates the queries with undistinguished variables as follows:

  • the query is rolled-up to each distinguished variable ?x resulting in a concept rollup(?x), instances of which are taken as candidate bindings for ?x
  • Two strategies are provided at this point : (i) optimized (for q. with at-least two distinguished variables)—the query is partially grounded by all bindings for some two distinguished variables and for each such grounding an instance check is performed. In each of the next steps the current variable binding is extended by all possible bindings for one of the remaining variables and an instance check is performed; (ii) simple—for each query grounding with a possible candidate binding for all variables an instance check is performed.

The problem with this approach is that rolling-up an edge PropertyValue(?x,p,?y) to rollup(?x)= ∃ p.T and retrieving instances of this concept in many cases ends-up finding also a valid binding for ?y, which is then thrown away. In the subsequent execution another instance check is often required to check that some binding for ?x and ?y is valid. Thus, it is preferable to evaluate a property value atom with no undistinguished variables using the query engine for distinguished variables instead.

To narrow the gap between evaluation of queries with only distinguished variables and those with also some undistinguished variables I introduce cores (see figure below) of undistinguished variables as the largest connected subqueries induced by non-distinguished variables. Cores are then treated as normal query atoms with signature consisting of neighboring distinguished variables and constants. There are different strategies to evaluate a core (to be described in another post). For now just consider the simple strategy (above), mentioned at the beginning of this post.

Cores example

I stick to the tradition of two sample queries in my posts :

  • “For all people (?X) leading a department tell me at which institution (?Z) they have obtained their degrees.”
    PropertyValue(?X, ub:headOf, _:a), Type( _:a, ub:Department), PropertyValue(?X, ub:degreeFrom, ?Z)
  • “Give me all teaching graduate students (?X) that take a course (?Y) taught by their advisor.”
    Type(?X, ub:GraduateStudent), PropertyValue(?X, ub:takesCourse, ?Y), PropertyValue( _:a, ub:teacherOf, ?Y), PropertyValue(?X, ub:advisor, _:a), PropertyValue(?X, ub:teachingAssistantOf, _:b).

Even for the simple strategy mentioned in the first paragraph we get significant performance improvement. For the first query this evaluation method produces results within 3 s. with just 2 consistency checks, comparing to 30 s. and 21 consistency checks in case of the original Pellet query engine. For the second query we get 23 results within 4 s. with 2 consistency checks. The original Pellet engine didn’t finish in 5 min., having performed by this time hundreds of consistency checks.

Comments are closed.