Archive for the 'Owlgres' Category

Owlgres 0.1: First Release

Wednesday, May 7th, 2008 · Markus Stocker

We are proud to announce the first release, version 0.1 (alpha), of Owlgres, a very scalable OWL reasoner that uses Postgres. It implements DL-Lite, a tractable profile of the upcoming OWL 2 standard. Owlgres supports consistency checking and conjunctive query reasoning services—the latter via SPARQL-DL.

Downloads and documentation can be found at the Owlgres site. For bug reports, feel free to open a ticket on our issue tracking site for Owlgres, which also summarizes the first steps with Owlgres on the Wiki page. There’s a mailing list for discussion and support.

Owlgres is dual-licensed; for open source projects, it’s available under the AGPL v.3. For commercial projects, commercial support licenses are available.

We’d love feedback on Owlgres and encourage people to try it out, play with it, and report bugs, issues, and ideas.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

Owlgres, DL-Lite and Selectivity Optimizations

Monday, April 21st, 2008 · Markus Stocker

I have been blogging about Owlgres, our scalable OWL reasoner for DL-Lite. Recently, we have been playing with DBpedia, in particular the dataset with the Wikipedia Infoboxes. I have previously talked about the features of DL-Lite: querying a large persistent ABox and complete result sets w.r.t. a TBox.

In this post I’d like to underline the usefulness of the selectivity optimizer using a short and concrete example from DBpedia. We query over roughly 24 million triples (in Owlgres metrics 2,198,649 concept assertions and 21,503,343 data role assertions). The query asks for instances of Book and looks like this:


PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://dbpedia.org/class/yago/>
PREFIX dbpedia: <http://dbpedia.org/property/>

SELECT ?name WHERE {
?s rdf:type yago:Book106410904 .
?s dbpedia:name ?name .
}

Without the optimizer, the query runs in 61,743 ms and with the optimizer in 14,336 ms.

Why? The TBox contains totally 58,108 concepts and 53,898 of them have no asserted instances (that’s around 93%). This isn’t very surprising since typically individuals are asserted in the leaf concepts of taxonomies.

But it makes a lot of sense to consider this during the query reformulation process in DL-Lite. In fact, without the selectivity optimization, the reformulated set size is 83, i.e. the algorithm reformulates the original conjunctive query into a set of 83 conjunctive queries.

By using some basic statistics (I’m not talking about complicated maths here!) and a trivial procedure to eliminate conjunctive queries that have a known zero selectivity, we can reduce the reformulated set of conjunctive queries from 83 to 16. This makes everybody happy, including Postgres, which returns the result set 4.3 times faster (not huge, yes, but worth the effort). Note that there is no caching here. If caching is involved, the query takes approximately the same time (with and without selectivity optimization, it takes roughly 3 seconds).

I’ll be at WWW2008 this week and I’m happy to chat about DL-Lite with interested people and demo Owlgres on DBpedia. I’ll have a talk on SPARQL Basic Graph Pattern Optimization Using Selectivity Estimation on Thursday, probably the easiest way to spot me. The talk is about some previous work that started in 2006 at the University of Zurich as a diploma thesis and was refined in 2007 at HP Labs with the implementation of the query optimizer for ARQ, the Jena SPARQL query engine.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

Owlgres: A Scalable OWL Reasoner

Sunday, March 23rd, 2008 · Markus Stocker

Hi, I’m Markus Stocker, the new intern at C&P and enjoying time here in DC working on a first project, i.e. Owlgres a DL-Lite implementation for PostgreSQL. DL-Lite is a fragment of expressive DLs with some interesting properties. First, together with the standard reasoning services (e.g. subsumption) it supports conjunctive query answering over an ABox maintained in secondary storage (typically a RDBMS). Second, DL-Lite has some nice computational properties, i.e. the standard reasoning tasks are polynomial in the size of the TBox and query answering is polynomial in data complexity. Third, DL-Lite allows us to expand ABox queries with TBox knowledge and translate the expanded ABox query into SQL queries that are subsequently evaluated over the RDBMS.

More practically, this means for instance that (after tweaking a bit the data) one can load the DBpedia Wikipedia infoboxes and their classification into PostgreSQL as ABox and TBox respectively and then query along the YAGO class hierarchy. For example, DBpedia explicitly states “The Lord of the Rings” as an instance of Book and a corresponding query returns the correct result set. But although Book is a subclass of Publication the result set of a query that asks for a Publication with name “The Lord of the Rings” is empty. Not in OwlGres. Somehow sweet, isn’t it?

The main parts of Owlgres were already implemented before I joined C&P in February. Though, Owlgres suffered from some query performance problems. So my task was to look at how we could possibly optimize the overall query performance.

In a nutshell, DL-Lite query answering includes a reformulation step that encodes the TBox knowledge into the DL-Lite ABox query. This reformulation typically expands a query into a set of (reformulated) queries that are subsequently translated into SQL queries. The overall result set is the UNION of the intermediate result sets for each of the queries in the reformulated set. A good starting point for optimization is, hence, to look at the size of the reformulated set. Is it minimal?

Let’s take a look at a real world case. For instance the LUBM query 9. The query can be seen as the following DL-Lite ABox query

{ Student(?X), Faculty(?Y), Course(?Z), advisor(?X, ?Y), teacherOf(?Y, ?Z), takesCourse(?X, ?Z) }

The PerfectRef reformulation algorithm described by Calvanese et al. returns a reformulated set of queries of size 726 for the LUBM query 9 and, hence, Owlgres executes 726 SQL queries on PostgreSQL to get the result set. The question is, can this set be reduced? The answer? Yes.

One thing that we quickly notice is that a significant number of queries have an empty result set. Further, many queries have result sets that are subsets of other queries. For instance, the result set of the second query in the following example is a subset of the first:

{{ advisor(?X,?Y), teacherOf(?Y,?Z) }, { advisor(?X,?Y), teacherOf(?Y,?Z), Faculty(?Y) }}

Thus, we can drop the second query as it doesn’t add any value to the overall result set. We found that a domain/range simplification practically eliminates such subset queries. Essentially, for each reformulated query we drop query atoms that are redundant constraints. For instance, in the second query above, the Faculty(?Y) query atom is redundant as the domain of the object property teacherOf is specified to be a class of type Faculty in the LUBM TBox. By removing the Faculty(?Y) query atom in the query above the two queries are equal and, hence, we reduce the set by one. So, domain/range simplification is the first optimization.

A DL-Lite ABox query is a set of query atoms which is executed as a conjunctive query. If one of the atoms has zero selectivity, the entire query has zero selectivity. It is, hence, useful to know if query atoms have a zero selectivity as this information allows us to safely drop queries from the reformulated set. This is our second optimization. During the ABox loading process, we gather statistics about the frequency of TBox names (concepts, roles). Whenever a TBox name with zero selectivity is used in a (reformulated) query we drop the query from the resulting set.

To come back to our example LUBM query 9, by applying the optimizations described above we reduce the initial reformulated set of 726 queries to just 2 queries. Pretty amazing, isn’t it? Needless to say, PostgreSQL is happy about this too. Finally, we create a single SQL query of the optimized reformulated set by creating an SQL UNION of the queries in the set. Thus, instead of 726 single queries we execute just one.

It’s time for a couple figures. We expect faster queries, but how much faster are they? We tested the 14 LUBM queries on the University0 dataset (i.e. roughly 100k triples). The timings below are in milliseconds. We first show the timing for the non optimized reformulated set and then the timing for the optimized set.

For instance, the LUBM query 9 takes 18 seconds to be executed without the optimizations and 0.1 seconds with the optimizations. The average difference between the optimized and unoptimized queries is 1,914.43 milliseconds.

We’re going to be demoing Owlgres publicly for the first time at OWLED 2008 in DC in two weeks; if you’re attending OWLED, we’d love to show Owlgres to you. We’ll be saying more in future posts about the kinds of reasoning problems Owlgres and DL-Lite are especially suited for.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati