Owlgres, DL-Lite and Selectivity Optimizations
by Markus Stocker
I have been blogging about Owlgres, our scalable OWL reasoner for DL-Lite. Recently, we have been playing with DBpedia, in particular the dataset with the Wikipedia Infoboxes. I have previously talked about the features of DL-Lite: querying a large persistent ABox and complete result sets w.r.t. a TBox.
In this post I’d like to underline the usefulness of the selectivity optimizer using a short and concrete example from DBpedia. We query over roughly 24 million triples (in Owlgres metrics 2,198,649 concept assertions and 21,503,343 data role assertions). The query asks for instances of Book and looks like this:PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> PREFIX yago: <http://dbpedia.org/class/yago/> PREFIX dbpedia: <http://dbpedia.org/property/>SELECT ?name WHERE {
?s rdf:type yago:Book106410904 .
?s dbpedia:name ?name .
}
Without the optimizer, the query runs in 61,743 ms and with the optimizer in 14,336 ms.
Why? The TBox contains totally 58,108 concepts and 53,898 of them have no asserted instances (that’s around 93%). This isn’t very surprising since typically individuals are asserted in the leaf concepts of taxonomies.
But it makes a lot of sense to consider this during the query reformulation process in DL-Lite. In fact, without the selectivity optimization, the reformulated set size is 83, i.e. the algorithm reformulates the original conjunctive query into a set of 83 conjunctive queries.
By using some basic statistics (I’m not talking about complicated maths here!) and a trivial procedure to eliminate conjunctive queries that have a known zero selectivity, we can reduce the reformulated set of conjunctive queries from 83 to 16. This makes everybody happy, including Postgres, which returns the result set 4.3 times faster (not huge, yes, but worth the effort). Note that there is no caching here. If caching is involved, the query takes approximately the same time (with and without selectivity optimization, it takes roughly 3 seconds).
I’ll be at WWW2008 this week and I’m happy to chat about DL-Lite with interested people and demo Owlgres on DBpedia. I’ll have a talk on SPARQL Basic Graph Pattern Optimization Using Selectivity Estimation on Thursday, probably the easiest way to spot me. The talk is about some previous work that started in 2006 at the University of Zurich as a diploma thesis and was refined in 2007 at HP Labs with the implementation of the query optimizer for ARQ, the Jena SPARQL query engine.