Archive for December, 2007

Job: R&D Position for Experienced Java Programmer

Saturday, December 22nd, 2007 · Kendall Clark

We’re looking for an experienced Java programmer to join us to work on R&D projects and production systems in reasoning, planning, description logics, semantic web services, logistics, etc.

The successful applicant will be smart and have a demonstrable record of getting things done.

  • Bachelor’s or Master’s in CS or similar field
  • Minimum 5 years of serious programming experience
  • Experience with AI, KR, logic programming a definite plus
  • Enthusiasm for solving difficult, algorithmically novel problems required
  • Familiarity with database theory valuable
  • Familiarity with systems or operations research also valuable
  • Excellent written and verbal communication skills

We offer the following compensations:

  • Competitive salary and good benefits
  • Intellectually stimulating work that matters
  • regular office sports outings, Wii tournaments, sundry other diversions
  • great espresso & coffee & a well-stocked ‘fridge
  • Beer Fridays
  • great location, and a bad MOFO office, in downtown DC

To apply, email me—Kendall Clark—a resume and cover letter. We are an equal opportunity employer and encourage all qualified candidates to apply.

How Distinguished is Your Variable?

Thursday, December 20th, 2007 · Evren Sirin

In a recent post, Petr talked about optimizations for queries involving undistinguished variables. There is a close relationship between undistinguished variables and bnodes in RDF whose semantics is being discussed in the OWL working group for OWL 1.1. In this post, I’d like to discuss how they relate and explain how using (un)distinguished variables can change the results you get from a reasoner. First off, the term undistinguished variable is used interchangeably with the term nondistinguished variable so don’t be confused. In the literature you might also see the term existential variable being used for the same purpose.

Let’s start with reviewing bnodes in RDF and then we’ll get to bnodes in SPARQL. Contrary to common belief, bnodes in RDF are not just unnamed individuals but they are existential variables (see the bnode definition from RDF semantics). Treating bnodes as existential variables does not cause a difference in ontology consistency results or entailments regarding named individuals. The difference is visible when you are checking entailments where bnodes are used in the consequences. For example, consider the following simple ontology which I will call ontology A:

:Parent owl:equivalentClass [
       a owl:Restriction ;
       owl:onProperty :hasChild ;
       owl:someValuesFrom owl:Thing ] .
:Alice a :Parent .
:Bob :hasChild :Charlie .

Suppose we have another ontology B as follows:

:Bob :hasChild _:x .

According to the bnode semantics of RDF, ontology B is entailed by ontology A. This is because the bnode _:x in ontology B plays like a wildcard role and matches the individual Charlie from A. Suppose you have yet another ontology C:

:Bob :hasChild :Somebody .

Ontology C is not entailed by ontology A because there is no evidence that the named individual Somebody is same as the individual Charlie. If bnodes were just unnamed individuals then ontology B and C would be treated equivalently because we would just assign an arbitrary name to the bnode <em>:x and ontology B would not be entailed.

In contrast to the formal semantics of RDF, thinking of bnodes as unnamed individuals is common practice for people using RDF/OWL. The situation is similar for tool support. I’m not aware of any reasoner that treats bnodes in the consequences as existential variables (Pellet supports entailment checking but it does not support bnodes in consequences). These are some reasons why OWL working group is considering to adopt the unnamed individual semantics (a.k.a. skolemization semantics) for bnodes in OWL 1.1 that is in agreement with the current practice and users’ expectations. This is an ongoing discussion and there is no resolution on this topic yet.

Now let’s move to the semantics of bnodes in SPARQL queries which is defined similar to the RDF semantics. The SPARQL specification says:

Blank nodes in graph patterns act as non-distinguished variables, not as references to specific blank nodes in the data being queried.

A nondistinguished variable in the query traditionally means that the variable can be bound not only to asserted individuals but also to individuals whose existence is inferred. For example, in ontology A, since we know Alice is a parent we can infer the existence of her child even though there are no explicit assertions regarding that child in the ontology. This means the following SPARQL query:

SELECT *
WHERE { ?parent :hasChild _:a }

would return two results where variable parent is bound to Alice and Bob. That is, if you have a query engine that respects the semantics of bnodes. Pellet query engine supports bnodes in SPARQL queries with this semantics. As far as Pellet is concerned, the previous query is exactly same as the following query:

SELECT * 
WHERE { ?parent a [
       a owl:Restriction ;
       owl:onProperty :hasChild ;
       owl:someValuesFrom owl:Thing ] . }

However, if you change the query to use a distinguished variable in place of the undistinguished </em>:a as in:

SELECT * 
WHERE { ?parent :hasChild ?child }

then the query would have a single result (Bob as the parent and Charlie as the child). The difference in the results might seem surprising because two queries look very similar. But you can see the clear analogy between the first and last query and ontologies B and C. In both cases, replacing bnodes with named individuals/variables change the results we get.

Now let’s consider one last tricky case. Let’s change ontology A to contain one extra assertion as follows:

:Alice a :Parent .
:Bob :hasChild :Charlie .
:Dudley :hasChild _:c .

As expected, the first two queries (both using existential variables) will return three results (Alice, Bob, Dudley). But what do we expect for the last query which uses the distinguished variable child? If we are treating bnodes in the ontology as existential variables (as defined in the RDF semantics) then there are no differences between Alice and Dudley since owl:someValuesFrom is an existential restriction. This suggests we should still get a single answer for the last query (Bob). However, in practice, if you try to run this query against an OWL reasoner (I tried Pellet and Jena rule reasoners) you will get two answers (Bob and Dudley) where the child of Dudley is bound to a newly generated bnode identifier like _:b0.

This is just another example of how tools treat bnodes in the source ontology more like unnamed individuals not as pure existentials. And this is why I believe OWL 1.1 should adopt the skolemization semantics for bnodes and leave the existential variables to SPARQL. After all, existential variables only make a difference in query answering (note that you can think of ontology entailment as a SPARQL ASK query). It makes sense to have explicit existentials in the query language but not in the ontology language (which already supports existentials through restrictions like owl:someValuesFrom anyway).

On Queries with Undistinguished Variables

Friday, December 7th, 2007 · Petr Kremen

After a bit of delay, I want to discuss some new optimizations for queries with undistinguished variables that are being implemented in the new Pellet query engine. For simplicity, I consider just connected conjunctive ABox queries with undistinguished variables.

The current Pellet query engine evaluates the queries with undistinguished variables as follows:

  • the query is rolled-up to each distinguished variable ?x resulting in a concept rollup(?x), instances of which are taken as candidate bindings for ?x
  • Two strategies are provided at this point : (i) optimized (for q. with at-least two distinguished variables)—the query is partially grounded by all bindings for some two distinguished variables and for each such grounding an instance check is performed. In each of the next steps the current variable binding is extended by all possible bindings for one of the remaining variables and an instance check is performed; (ii) simple—for each query grounding with a possible candidate binding for all variables an instance check is performed.

The problem with this approach is that rolling-up an edge PropertyValue(?x,p,?y) to rollup(?x)= ∃ p.T and retrieving instances of this concept in many cases ends-up finding also a valid binding for ?y, which is then thrown away. In the subsequent execution another instance check is often required to check that some binding for ?x and ?y is valid. Thus, it is preferable to evaluate a property value atom with no undistinguished variables using the query engine for distinguished variables instead.

To narrow the gap between evaluation of queries with only distinguished variables and those with also some undistinguished variables I introduce cores (see figure below) of undistinguished variables as the largest connected subqueries induced by non-distinguished variables. Cores are then treated as normal query atoms with signature consisting of neighboring distinguished variables and constants. There are different strategies to evaluate a core (to be described in another post). For now just consider the simple strategy (above), mentioned at the beginning of this post.

Cores example

I stick to the tradition of two sample queries in my posts :

  • “For all people (?X) leading a department tell me at which institution (?Z) they have obtained their degrees.”
    PropertyValue(?X, ub:headOf, _:a), Type( _:a, ub:Department), PropertyValue(?X, ub:degreeFrom, ?Z)
  • “Give me all teaching graduate students (?X) that take a course (?Y) taught by their advisor.”
    Type(?X, ub:GraduateStudent), PropertyValue(?X, ub:takesCourse, ?Y), PropertyValue( _:a, ub:teacherOf, ?Y), PropertyValue(?X, ub:advisor, _:a), PropertyValue(?X, ub:teachingAssistantOf, _:b).

Even for the simple strategy mentioned in the first paragraph we get significant performance improvement. For the first query this evaluation method produces results within 3 s. with just 2 consistency checks, comparing to 30 s. and 21 consistency checks in case of the original Pellet query engine. For the second query we get 23 results within 4 s. with 2 consistency checks. The original Pellet engine didn’t finish in 5 min., having performed by this time hundreds of consistency checks.