Archive for the 'RDF Databases' Category

Owlgres 0.1: First Release

Wednesday, May 7th, 2008 · Markus Stocker

We are proud to announce the first release, version 0.1 (alpha), of Owlgres, a very scalable OWL reasoner that uses Postgres. It implements DL-Lite, a tractable profile of the upcoming OWL 2 standard. Owlgres supports consistency checking and conjunctive query reasoning services, the latter via SPARQL-DL.

Downloads and documentation can be found at the Owlgres site. To report bugs, feel free to open a ticket on the Owlgres issue tracker, whose Wiki page also summarizes the first steps with Owlgres. There’s also a mailing list for discussion and support.

Owlgres is dual-licensed; for open source projects, it’s available under the AGPL v.3. For commercial projects, commercial support licenses are available.

We’d love feedback on Owlgres and encourage people to try it out, play with it, and report bugs, issues, and ideas.


(Slowly) Open Sesame

Monday, November 26th, 2007 · Michael Grove

OpenRDF/Aduna recently announced that Sesame 2 has finally reached the release candidate stage (see the note from 11/12 on their homepage). We’ve been using Sesame as the primary backend for jSpace since about version 1.1, with reasonable success, and Sesame 1.x has proven to be a solid backend as jSpace development has progressed. Unfortunately, jSpace’s UI is very query-driven, and its responsiveness depends heavily on the performance of the backend it’s talking to: a slow database means a long wait between selecting something in one column and seeing results in the next.

Until recently, Sesame 1.x has handled this rather well. I had to do some work on the auto-generated queries, some pre-optimization, to squeeze a little more performance out of them, but for the most part query response time has been adequate for development and testing. Now, though, we’re trying to test jSpace against data sets of non-trivial size, including my favorite, our scrape of the Retrosheet.org baseball data, which is about 7.5M triples. We’ve been using the in-memory Sesame repositories because they give us the best query performance, but we’re approaching the point where they won’t be fast enough for larger data sets. I’ve been tracking Sesame 2 for most of the time it’s been in development, about two years now, I guess, so I was happy to hear it had finally made release candidate status. To me, that meant it was finally worth a test drive.

I downloaded the latest version (RC1) and tore into it like a kid at Christmas. I set up a very simple bit of profiling code, which took sample queries dumped from a session of me using jSpace against the baseball data, posed them against the repository, and tracked the query time. I was dismayed by the initial results; they were not what I expected. The out-of-the-box configuration of a Sesame 2 in-memory repository was being crushed by a copy of Sesame 1.2.7 built from their CVS trunk about a month ago. We’re talking two to three times slower for some queries, and two to three orders of magnitude slower for others. Out of 13 test queries, Sesame 2 outperformed its predecessor on only one, a rather simple query that grabbed all the rdfs:label triples from the KB. I posted the results on the Sesame forums and got two suggestions: first, try SPARQL queries rather than SeRQL; and second, there’s a dead simple query optimization that hasn’t been included in the query optimizer yet, so if I did that optimization by hand, I might see results closer to what I expected.
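The profiling approach described above (replay queries captured from a jSpace session, time each one, compare the two Sesame versions) can be sketched roughly as follows. This is not the code behind these measurements: `QueryBackend` is a hypothetical interface standing in for whatever wraps a Sesame 1.x or Sesame 2 repository, and the untimed warm-up run plus a median over repeated runs are assumptions about reasonable practice, not a description of the original setup.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal query-timing harness; a sketch, not the harness used for the results in this post. */
class QueryProfiler {

    /** Hypothetical abstraction over a repository (e.g. a wrapper around a Sesame connection). */
    interface QueryBackend {
        /** Evaluates the query and consumes every result so the full cost is measured. */
        void evaluate(String query) throws Exception;
    }

    /** Median wall-clock time in milliseconds per query (upper median when runs is even). */
    static Map<String, Double> profile(QueryBackend backend, List<String> queries, int runs)
            throws Exception {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        Map<String, Double> medians = new LinkedHashMap<>();
        for (String query : queries) {
            backend.evaluate(query); // untimed warm-up run (JIT compilation, caches)
            long[] timings = new long[runs];
            for (int i = 0; i < runs; i++) {
                long start = System.nanoTime();
                backend.evaluate(query);
                timings[i] = System.nanoTime() - start;
            }
            Arrays.sort(timings);
            medians.put(query, timings[runs / 2] / 1_000_000.0); // nanoseconds to milliseconds
        }
        return medians;
    }

    /** Per-query ratio newTime / oldTime; a value above 1.0 means the new backend is slower. */
    static Map<String, Double> slowdown(Map<String, Double> oldTimes, Map<String, Double> newTimes) {
        Map<String, Double> ratios = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : oldTimes.entrySet()) {
            ratios.put(e.getKey(), newTimes.get(e.getKey()) / e.getValue());
        }
        return ratios;
    }

    /** Number of queries on which the new backend was faster (ratio below 1.0). */
    static long countFaster(Map<String, Double> ratios) {
        return ratios.values().stream().filter(r -> r < 1.0).count();
    }
}
```

With timings from both backends, `slowdown` gives a per-query ratio (3.0 means the new backend took three times as long, 100.0 means two orders of magnitude), and `countFaster` gives the number of queries the new backend won; in the results above, that was one out of 13.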

So the next test was with SPARQL queries, but, not surprisingly, there was no appreciable speed-up. Both query languages are parsed into the same query model, which is then executed by the engine, so this is what I expected. The hand-optimization, however, did yield a significant improvement in performance. The worst-case difference was reduced to a single order of magnitude, and for the most part queries were only a couple of times slower with Sesame 2. There was also now a second query on which Sesame 2 outperformed Sesame 1.x.

This cheered me up; there still seems to be hope for Sesame 2, if perhaps only in a later release candidate. James, one of the fellows who responded to my post on the Sesame forums, did point out that Sesame 2’s performance may never reach the level of Sesame 1.x because of the added complexity of the new quad-based format compared with Sesame 1.x’s triple-based architecture. He makes a good point, but I’ve got my fingers crossed anyway. I’ve enjoyed using Sesame in the past, and I hope they can streamline the query engine a bit before the final release so we can keep using it.
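James’s point about quads can be made concrete with a toy model. The following is a hypothetical sketch, not how Sesame 2 actually stores or indexes statements: every statement carries a context (named graph) as a fourth component, a pattern that leaves the context unbound has to range over every context, and an answer at the triple level has to collapse a triple asserted in several contexts into one. The class, its methods, and the sample data in the test are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy statement store contrasting quads with triples; not Sesame's actual storage layer. */
class QuadSketch {

    /** A quad is a triple plus the context (named graph) it was asserted in. */
    record Quad(String subject, String predicate, String object, String context) {}

    /** The triple view of a quad, with the context dropped. */
    record Triple(String subject, String predicate, String object) {}

    private final List<Quad> quads = new ArrayList<>();

    void add(String s, String p, String o, String context) {
        quads.add(new Quad(s, p, o, context));
    }

    /** Null arguments act as wildcards; a null context matches every context. */
    List<Quad> match(String s, String p, String o, String context) {
        List<Quad> result = new ArrayList<>();
        for (Quad q : quads) { // linear scan for clarity; a real store would use indexes
            if ((s == null || s.equals(q.subject()))
                    && (p == null || p.equals(q.predicate()))
                    && (o == null || o.equals(q.object()))
                    && (context == null || context.equals(q.context()))) {
                result.add(q);
            }
        }
        return result;
    }

    /** Triple-level answer across all contexts, collapsing duplicates from different contexts. */
    Set<Triple> matchTriples(String s, String p, String o) {
        Set<Triple> result = new LinkedHashSet<>();
        for (Quad q : match(s, p, o, null)) {
            result.add(new Triple(q.subject(), q.predicate(), q.object()));
        }
        return result;
    }
}
```

A real store would answer these patterns from indexes rather than a scan, but the context still has to be carried through those indexes and through duplicate handling, which is one plausible source of the overhead James described.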

For those interested, you can see my post on the Sesame forums here, and you can download the raw profiling results in .xls format.
