Archive for the 'SPARQL' Category

Owlgres 0.1: First Release

Wednesday, May 7th, 2008 · Markus Stocker

We are proud to announce the first release, version 0.1 (alpha), of Owlgres, a very scalable OWL reasoner that uses Postgres. It implements DL-Lite, a tractable profile of the upcoming OWL 2 standard. Owlgres supports consistency checking and conjunctive query reasoning services—the latter via SPARQL-DL.

Downloads and documentation can be found at the Owlgres site. For bug reports, feel free to open a ticket on our issue tracking site for Owlgres, which also summarizes the first steps with Owlgres on the Wiki page. There’s a mailing list for discussion and support.

Owlgres is dual-licensed; for open source projects, it’s available under the AGPL v.3. For commercial projects, commercial support licenses are available.

We’d love feedback on Owlgres and encourage people to try it out, play with it, and report bugs, issues, and ideas.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

OwlSight v.52 — Keeping up with the Jones’

Monday, May 5th, 2008 · Michael Grove

Following closely on the heels of the recent release of Pellet 1.5.2, we’ve updated OwlSight. Since OwlSight runs on raw Pellet power, the new version, .52, updates the back-end to take advantage of the recent Pellet release.

If you have not already taken a look at OwlSight, cruise on over to the OwlSight page and take it for a spin. For those not already in the know, OwlSight is a lightweight browser-based ontology browser utilizing both GWT (and GWT-Ext) and Pellet as its core technologies. Until next time, stay classy cyberspace.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

How Distinguished is Your Variable?

Thursday, December 20th, 2007 · Evren Sirin

In a recent post, Petr talked about optimizations for queries involving undistinguished variables. There is a close relationship between undistinguished variables and bnodes in RDF whose semantics is being discussed in the OWL working group for OWL 1.1. In this post, I’d like to discuss how they relate and explain how using (un)distinguished variables can change the results you get from a reasoner. First off, the term undistinguished variable is used interchangeably with the term nondistinguished variable so don’t be confused. In the literature you might also see the term existential variable being used for the same purpose.

Let’s start with reviewing bnodes in RDF and then we’ll get to bnodes in SPARQL. Contrary to common belief, bnodes in RDF are not just unnamed individuals but they are existential variables (see the bnode definition from RDF semantics). Treating bnodes as existential variables does not cause a difference in ontology consistency results or entailments regarding named individuals. The difference is visible when you are checking entailments where bnodes are used in the consequences. For example, consider the following simple ontology which I will call ontology A:

:Parent owl:equivalentClass [
       a owl:Restriction ;
       owl:onProperty :hasChild ;
       owl:someValuesFrom owl:Thing ] .
:Alice a :Parent .
:Bob :hasChild :Charlie .

Suppose we have another ontology B as follows:

:Bob :hasChild _:x .

According to the bnode semantics of RDF, ontology B is entailed by ontology A. This is because the bnode _:x in ontology B plays like a wildcard role and matches the individual Charlie from A. Suppose you have yet another ontology C:

:Bob :hasChild :Somebody .

Ontology C is not entailed by ontology A because there is no evidence that the named individual Somebody is same as the individual Charlie. If bnodes were just unnamed individuals then ontology B and C would be treated equivalently because we would just assign an arbitrary name to the bnode <em>:x and ontology B would not be entailed.

In contrast to the formal semantics of RDF, thinking of bnodes as unnamed individuals is common practice for people using RDF/OWL. The situation is similar for tool support. I’m not aware of any reasoner that treats bnodes in the consequences as existential variables (Pellet supports entailment checking but it does not support bnodes in consequences). These are some reasons why OWL working group is considering to adopt the unnamed individual semantics (a.k.a. skolemization semantics) for bnodes in OWL 1.1 that is in agreement with the current practice and users’ expectations. This is an ongoing discussion and there is no resolution on this topic yet.

Now let’s move to the semantics of bnodes in SPARQL queries which is defined similar to the RDF semantics. The SPARQL specification says:

Blank nodes in graph patterns act as non-distinguished variables, not as references to specific blank nodes in the data being queried.

A nondistinguished variable in the query traditionally means that the variable can be bound not only to asserted individuals but also to individuals whose existence is inferred. For example, in ontology A, since we know Alice is a parent we can infer the existence of her child even though there are no explicit assertions regarding that child in the ontology. This means the following SPARQL query:

SELECT *
WHERE { ?parent :hasChild _:a }

would return two results where variable parent is bound to Alice and Bob. That is, if you have a query engine that respects the semantics of bnodes. Pellet query engine supports bnodes in SPARQL queries with this semantics. As far as Pellet is concerned, the previous query is exactly same as the following query:

SELECT * 
WHERE { ?parent a [
       a owl:Restriction ;
       owl:onProperty :hasChild ;
       owl:someValuesFrom owl:Thing ] . }

However, if you change the query to use a distinguished variable in place of the undistinguished </em>:a as in:

SELECT * 
WHERE { ?parent :hasChild ?child }

then the query would have a single result (Bob as the parent and Charlie as the child). The difference in the results might seem surprising because two queries look very similar. But you can see the clear analogy between the first and last query and ontologies B and C. In both cases, replacing bnodes with named individuals/variables change the results we get.

Now let’s consider one last tricky case. Let’s change ontology A to contain one extra assertion as follows:

:Alice a :Parent .
:Bob :hasChild :Charlie .
:Dudley :hasChild _:c .

As expected, the first two queries (both using existential variables) will return three results (Alice, Bob, Dudley). But what do we expect for the last query which uses the distinguished variable child? If we are treating bnodes in the ontology as existential variables (as defined in the RDF semantics) then there are no differences between Alice and Dudley since owl:someValuesFrom is an existential restriction. This suggests we should still get a single answer for the last query (Bob). However, in practice, if you try to run this query against an OWL reasoner (I tried Pellet and Jena rule reasoners) you will get two answers (Bob and Dudley) where the child of Dudley is bound to a newly generated bnode identifier like _:b0.

This is just another example of how tools treat bnodes in the source ontology more like unnamed individuals not as pure existentials. And this is why I believe OWL 1.1 should adopt the skolemization semantics for bnodes and leave the existential variables to SPARQL. After all, existential variables only make a difference in query answering (note that you can think of ontology entailment as a SPARQL ASK query). It makes sense to have explicit existentials in the query language but not in the ontology language (which already supports existentials through restrictions like owl:someValuesFrom anyway).

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

On Queries with Undistinguished Variables

Friday, December 7th, 2007 · Petr Kremen

After a bit of delay, I want to discuss some new optimizations for queries with undistinguished variables that are being implemented in the new Pellet query engine. For simplicity, I consider just connected conjunctive ABox queries with undistinguished variables.

The current Pellet query engine evaluates the queries with undistinguished variables as follows:

  • the query is rolled-up to each distinguished variable ?x resulting in a concept rollup(?x), instances of which are taken as candidate bindings for ?x
  • Two strategies are provided at this point : (i) optimized (for q. with at-least two distinguished variables)—the query is partially grounded by all bindings for some two distinguished variables and for each such grounding an instance check is performed. In each of the next steps the current variable binding is extended by all possible bindings for one of the remaining variables and an instance check is performed; (ii) simple—for each query grounding with a possible candidate binding for all variables an instance check is performed.

The problem with this approach is that rolling-up an edge PropertyValue(?x,p,?y) to rollup(?x)= ∃ p.T and retrieving instances of this concept in many cases ends-up finding also a valid binding for ?y, which is then thrown away. In the subsequent execution another instance check is often required to check that some binding for ?x and ?y is valid. Thus, it is preferable to evaluate a property value atom with no undistinguished variables using the query engine for distinguished variables instead.

To narrow the gap between evaluation of queries with only distinguished variables and those with also some undistinguished variables I introduce cores (see figure below) of undistinguished variables as the largest connected subqueries induced by non-distinguished variables. Cores are then treated as normal query atoms with signature consisting of neighboring distinguished variables and constants. There are different strategies to evaluate a core (to be described in another post). For now just consider the simple strategy (above), mentioned at the beginning of this post.

Cores example

I stick to the tradition of two sample queries in my posts :

  • “For all people (?X) leading a department tell me at which institution (?Z) they have obtained their degrees.”
    PropertyValue(?X, ub:headOf, _:a), Type( _:a, ub:Department), PropertyValue(?X, ub:degreeFrom, ?Z)
  • “Give me all teaching graduate students (?X) that take a course (?Y) taught by their advisor.”
    Type(?X, ub:GraduateStudent), PropertyValue(?X, ub:takesCourse, ?Y), PropertyValue( _:a, ub:teacherOf, ?Y), PropertyValue(?X, ub:advisor, _:a), PropertyValue(?X, ub:teachingAssistantOf, _:b).

Even for the simple strategy mentioned in the first paragraph we get significant performance improvement. For the first query this evaluation method produces results within 3 s. with just 2 consistency checks, comparing to 30 s. and 21 consistency checks in case of the original Pellet query engine. For the second query we get 23 results within 4 s. with 2 consistency checks. The original Pellet engine didn’t finish in 5 min., having performed by this time hundreds of consistency checks.

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati

Using Taxonomies for SPARQL-DL optimizations

Friday, November 9th, 2007 · Petr Kremen

Last post I spent showing directions for optimizing SPARQL-DL in general. Now I would like to touch a more particular case of SPARQL-DL queries, namely those that contain class/property variables in ABox part of the query.

I call down-monotonic all the class/property variables ?x that occur in either Type(•,?x), or PropertyValue(•,?x,•) atom. For these variables we can use class/property hierarchies to prune the search. Let’s take an atom Type(i, ?x) (i.e., i is either a binding for variable •, or constant •). If i is not of type C we can safely avoid considering subclasses of C as bindings for ?x ( and analogically for PropertyValue(i, ?x, j) and property hierarchy ).

However, reverse implication doesn’t hold, because of possible interaction with other TBox atoms. Finding a binding C for (down-monotonic) ?x we can’t take all superclasses/superporperties of C as valid bindings. Consider the query Type(i,?x), ComplementOf(?x, not(C)). If C is a subclass of D, it might happen that C is a valid binding for ?x, while D is not.

Of course, the best performance of the optimization is achieved when the search fails for a down-monotonic variables binding that is a root of a deep and complex sub/super hierarchy. Let’s take two sample queries run against LUBM :

  • “Give me all people (?X) together with their type (?A) that are advisors of themselves.”
    Abstract syntax:Type(?X, ?A), SubClassOf(?A, ub:Person), PropertyValue(?X, ub:advisor, ?X)

  • “Give me all people (?X) together with their type (?A) that are teaching assistants of some course (?Y)”
    Abstract syntax:SubClassOf(?A, ub:Person), Type(?X, ?A), PropertyValue(?X,ub:teachingAssistantOf,?Y), Type(?Y, ub:Course).

The first query does not return any result, thus the optimization prunes all subclasses of Person immediately, resulting in reduction of execution time by an order of magnitude (from 3 second to 0.3 seconds). On the other hand, the second query returns nonempty result set, resulting in the significant decrease of the pruning gain (from 3 seconds to 1.5 seconds).

Spread the word: These icons link to social bookmarking sites where readers can share and discover new web pages.
  • Reddit
  • Digg
  • del.icio.us
  • TwitThis
  • Technorati