Owlgres: A Scalable OWL Reasoner
by Markus Stocker
Hi, I’m Markus Stocker, the new intern at C&P and enjoying time here in DC working on a first project, i.e. Owlgres a DL-Lite implementation for PostgreSQL. DL-Lite is a fragment of expressive DLs with some interesting properties. First, together with the standard reasoning services (e.g. subsumption) it supports conjunctive query answering over an ABox maintained in secondary storage (typically a RDBMS). Second, DL-Lite has some nice computational properties, i.e. the standard reasoning tasks are polynomial in the size of the TBox and query answering is polynomial in data complexity. Third, DL-Lite allows us to expand ABox queries with TBox knowledge and translate the expanded ABox query into SQL queries that are subsequently evaluated over the RDBMS.
More practically, this means for instance that (after tweaking a bit the data) one can load the DBpedia Wikipedia infoboxes and their classification into PostgreSQL as ABox and TBox respectively and then query along the YAGO class hierarchy. For example, DBpedia explicitly states “The Lord of the Rings” as an instance of Book and a corresponding query returns the correct result set. But although Book is a subclass of Publication the result set of a query that asks for a Publication with name “The Lord of the Rings” is empty. Not in OwlGres. Somehow sweet, isn’t it?
The main parts of Owlgres were already implemented before I joined C&P in February. Though, Owlgres suffered from some query performance problems. So my task was to look at how we could possibly optimize the overall query performance.
In a nutshell, DL-Lite query answering includes a reformulation step that encodes the TBox knowledge into the DL-Lite ABox query. This reformulation typically expands a query into a set of (reformulated) queries that are subsequently translated into SQL queries. The overall result set is the UNION of the intermediate result sets for each of the queries in the reformulated set. A good starting point for optimization is, hence, to look at the size of the reformulated set. Is it minimal?
Let’s take a look at a real world case. For instance the LUBM query 9. The query can be seen as the following DL-Lite ABox query
{ Student(?X), Faculty(?Y), Course(?Z), advisor(?X, ?Y), teacherOf(?Y, ?Z), takesCourse(?X, ?Z) }
The PerfectRef reformulation algorithm described by Calvanese et al. returns a reformulated set of queries of size 726 for the LUBM query 9 and, hence, Owlgres executes 726 SQL queries on PostgreSQL to get the result set. The question is, can this set be reduced? The answer? Yes.
One thing that we quickly notice is that a significant number of queries have an empty result set. Further, many queries have result sets that are subsets of other queries. For instance, the result set of the second query in the following example is a subset of the first:
{{ advisor(?X,?Y), teacherOf(?Y,?Z) }, { advisor(?X,?Y), teacherOf(?Y,?Z), Faculty(?Y) }}
Thus, we can drop the second query as it doesn’t add any value to the overall result set. We found that a domain/range simplification practically eliminates such subset queries. Essentially, for each reformulated query we drop query atoms that are redundant constraints. For instance, in the second query above, the Faculty(?Y) query atom is redundant as the domain of the object property teacherOf is specified to be a class of type Faculty in the LUBM TBox. By removing the Faculty(?Y) query atom in the query above the two queries are equal and, hence, we reduce the set by one. So, domain/range simplification is the first optimization.
A DL-Lite ABox query is a set of query atoms which is executed as a conjunctive query. If one of the atoms has zero selectivity, the entire query has zero selectivity. It is, hence, useful to know if query atoms have a zero selectivity as this information allows us to safely drop queries from the reformulated set. This is our second optimization. During the ABox loading process, we gather statistics about the frequency of TBox names (concepts, roles). Whenever a TBox name with zero selectivity is used in a (reformulated) query we drop the query from the resulting set.
To come back to our example LUBM query 9, by applying the optimizations described above we reduce the initial reformulated set of 726 queries to just 2 queries. Pretty amazing, isn’t it? Needless to say, PostgreSQL is happy about this too. Finally, we create a single SQL query of the optimized reformulated set by creating an SQL UNION of the queries in the set. Thus, instead of 726 single queries we execute just one.
It’s time for a couple figures. We expect faster queries, but how much faster are they? We tested the 14 LUBM queries on the University0 dataset (i.e. roughly 100k triples). The timings below are in milliseconds. We first show the timing for the non optimized reformulated set and then the timing for the optimized set.
For instance, the LUBM query 9 takes 18 seconds to be executed without the optimizations and 0.1 seconds with the optimizations. The average difference between the optimized and unoptimized queries is 1,914.43 milliseconds.
We’re going to be demoing Owlgres publicly for the first time at OWLED 2008 in DC in two weeks; if you’re attending OWLED, we’d love to show Owlgres to you. We’ll be saying more in future posts about the kinds of reasoning problems Owlgres and DL-Lite are especially suited for.






March 24th, 2008 at 11:39 am
Readers should note that LUBM, as generally used, is not expressible in DL-Lite because it uses transitive properties and because qualified existentials are used in the left side of subsumption relations. The results presented here are with a modified version that reduces the expressivity.
March 26th, 2008 at 10:50 am
Is there a demo version we can download and experiment with?
March 26th, 2008 at 3:24 pm
John,
There will be a first public release soonish; we’ll be giving demos at OWLED next week, and I think some Ordnance Survey folks will be there. After that, we’ll make a public release.
April 21st, 2008 at 8:37 am
[...] have been blogging about Owlgres, our scalable OWL reasoner for DL-Lite. Recently, we have been playing with DBpedia, [...]
June 16th, 2008 at 4:28 pm
IBM is working on a DLP (and now DL) engine on top of DB2.
there is an attempt from the EPFL to add such support to Oracle. OwlGres is targetted at Postgres.
Definitely there is something to investigate, it seems :)
From a developper point of view, I have a concern.
Is it possible to abstract such a technology from the RDBMS?
So we are not tied to a given “vendor”.
Or does it require fine tuning of the database internals?
June 17th, 2008 at 4:51 pm
Olivier,
I can’t speak for the other projects, but Owlgres isn’t Postgres-specific or even tied to Postgres. We developed it with Postgres in mind, but we don’t any Postgres internals tweaking. Owlgres exists purely at the JDBC level, except for database-specific setup code.
So we could easily make Owlgres work for Oracle or DB2, and probably will do so at some point.
To get the most performance, one might tweak some DB internals (which is one reason we prefer Postgres), but so far that’s not been necessary.