Archive for the 'jSpace' Category

POPS is new W3C SWEO Use Case

Monday, March 10th, 2008 · Michael Grove

There is a new W3C Semantic Web Education and Outreach (SWEO) use case online for POPS, our expertise location service built on top of jSpace for our friends at NASA. The use case describes in detail what POPS is, the design issues and decisions behind it, and how it’s being used at NASA, and it includes some screenshots. There is some good information for anyone who wants to learn more about POPS and jSpace, and how Semantic Web technology is being used at NASA, so please, check it out.

NASA’s First Semantic Web App

Thursday, February 7th, 2008 · Kendall Clark

Last week POPS—the expertise location service we built for NASA—went into production as an Agency-wide application; it’s thought to be the first “institutional” (that is, business) Semantic Web app deployed Agency-wide at NASA. (I should emphasize that no one really knows whether POPS is the first; but we believe it to be and for good reasons.)

We’re very proud of this accomplishment. It’s proof that SemWeb technologies like RDF and SPARQL are useful for building solutions to information integration problems. We’re also proud because it proves what Bijan Parsia, Andy Schain (NASA HQ CTO), and I thought about this problem from the first time we talked about it in 2005: expertise location is a kind of information integration problem.

And despite all the OWL—and, especially, OWL DL—work we do, this demonstrates that we’re also a pretty good “shallow end” SemWeb app company, too.

We don’t yet know how successful POPS will be at NASA, but if it’s successful, it will be so for two reasons. First, it’s really just a visual query builder for an RDF aggregation that we’ve tricked people into using by building a user interface that reminds people of iTunes, and we owe this to one of our fav HCI researchers, m.c. schraefel, and her mspace tool. Second, C&P Employee #1, Mike Grove, will have, often by sheer force of will, made it a success by writing a ton of good, clean, interesting code; by doing an inordinate amount of project management; and by being goddam unflappable under pressure.

At the launch party for POPS, lots of people were giving and taking credit for it—I don’t disagree with a word of it. Neither Mike nor I said much about any of that because, well, that’s pretty boring. But if someone had asked me, I would have said what I know to be true: POPS is Mike Grove’s baby and it’s all grows up.

(Slowly) Open Sesame

Monday, November 26th, 2007 · Michael Grove

OpenRDF/Aduna recently announced Sesame 2 is finally in the release candidate stage (see the note from 11/12 on their homepage). We’ve been using Sesame since about version 1.1 as the primary backend for jSpace with reasonable success. Sesame 1.x has proven to be a solid backend as development has progressed in jSpace. Unfortunately, jSpace’s UI is very query driven, and its responsiveness relies greatly on the performance of the backend it’s talking to; a slow database means a long time between selecting something in a column and seeing results in the subsequent column.

Until recently, Sesame 1.x has handled this rather well. I had to do some work on the auto-generated queries, some pre-optimization, to squeeze a little better performance out of them, but for the most part, the query response time has been adequate for development and testing. But now we’re trying to test jSpace against non-trivial sized data sets, including my favorite, our scrape of the Retrosheet.org baseball data, which is about 7.5M triples. We’ve been using the in-memory Sesame repositories because they give us the best query performance, but we’re coming up on the point where it’s not going to be fast enough for larger data sets. I’ve been tracking Sesame 2 for most of the time it’s been in development, about two odd years now I guess, and I was happy to hear it finally made release candidate status. That meant to me that it was worth finally giving it a test drive.

I downloaded the latest version (RC1) and tore into it like a kid at Christmas. I set up a very simple bit of profiling code, which basically just took sample queries dumped from a session of me using jSpace against the baseball data, posed them against the repository, and tracked the query time. I was dismayed when I saw the initial results; they were not what I expected. The out-of-the-box configuration of a Sesame 2 in-memory repository was being crushed by a copy of Sesame 1.2.7 built from their CVS trunk about a month ago. We’re talking anywhere from two or three times slower for some queries to two or three orders of magnitude slower for others. Out of 13 test queries, Sesame 2 outperformed its predecessor on only one, a rather simple query which grabbed all the rdfs:label triples from the kb. I posted the results on the Sesame forums and got two suggestions: one, try SPARQL queries rather than SeRQL; and two, since there’s a dead simple query optimization that has not yet been included in the query optimizer, do that optimization by hand and maybe I’ll see results more like what I expected.

So the next test was with SPARQL queries, but not surprisingly, there was no appreciable speed-up. The queries are parsed into the same query model, which is then executed by the engine, so this is what I expected. However, the hand-optimization did yield a significant improvement in performance. The worst-case difference was reduced to only an order of magnitude, and for the most part, queries were only a couple of times slower with Sesame 2. And now there was a second query in which Sesame 2 was outperforming Sesame 1.x.

This cheered me up; there still seems to be hope for Sesame 2, if only in a later release candidate. James, one of the fellows who responded to my post on the Sesame forums, did point out that Sesame 2’s performance may never reach the level of Sesame 1.x because of the added complexity of the new quad-based architecture compared with Sesame 1.x’s triple-based one. He makes a good point, but I’ve got my fingers crossed anyway. I’ve enjoyed using Sesame in the past, and I hope they can streamline the query engine some before the final release so we can continue using it.

For those interested in my post on the Sesame forums, you can see it here, and you can download the raw profiling results in .xls format.
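For anyone who wants to run a similar comparison, here is a minimal sketch of that kind of profiling harness, written in Python for brevity. It is not the code used for the numbers above. A "backend" is assumed to be any callable that takes a query string and runs it, and the names `time_queries` and `slowdown` are made up for illustration. In practice each backend would wrap a Sesame 1.x or Sesame 2 repository.

```python
import time
from statistics import median
from typing import Callable, List

# Assumption: a backend is any callable that executes a query string.
Backend = Callable[[str], object]


def time_queries(backend: Backend, queries: List[str], repeats: int = 3) -> List[float]:
    """Return the median wall-clock time in seconds for each query."""
    timings = []
    for query in queries:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            backend(query)
            samples.append(time.perf_counter() - start)
        timings.append(median(samples))
    return timings


def slowdown(baseline: List[float], candidate: List[float]) -> List[float]:
    """Per-query ratio candidate/baseline; above 1.0 means the candidate is slower."""
    return [c / b if b > 0 else float("inf") for b, c in zip(baseline, candidate)]
```

Taking the median over a few repeats keeps one slow run (a GC pause, a cold cache) from skewing a query's result. A ratio of 100 or more from `slowdown` would correspond to the "two orders of magnitude" cases described above.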

New jSpace App: Baseball Stats Browser

Thursday, September 20th, 2007 · Michael Grove

We’re big baseball fans around here, and if you were in our office, you’d frequently find us debating various baseball topics, such as whether or not Jeff Bagwell had a Hall of Fame career, or lamenting the sad state of the Orioles franchise. So it was just a matter of time until our interests at work collided with our love of the national pastime and yielded our latest demo of jSpace over a data set near and dear to our hearts, baseball statistics.

Retrosheet is a great site for baseball statistics; they have a very comprehensive database of stats for nearly every game played dating back to about 1870. We’ve used this site in the past to settle more than a few debates. One of the great things about Retrosheet is that they provide a dump of their data, which we scraped into RDF and loaded into Sesame. It’s about 7.1M triples and contains all the season stats (pitching, batting and fielding) for all the players in their database who played between 1871 and 2006. That’s nearly 17,000 players, from Hank Aaron to Ryan Zimmerman and everyone in between. It also includes all relevant associated team, league and position data.

We’ve created a model file for this data and hooked it up to jSpace for browsing, and it makes for a very cool demo—in fact, we think it’s probably gone beyond a demo and is a nearly-useful tool for sabermetricians, baseball stats junkies, etc.

Under the assumption you’re looking for a player, our model contains columns for all relevant statistics in the database, as well as some calculated ones, such as Slugging %, Runs Created or Zone Rating. You can very easily find all first basemen who played for any of the Baltimore Orioles franchises (there are several) who have hit more than 200 home runs in their career and have a lifetime batting average of .280 or better; all it takes is a few clicks in jSpace’s interface. Maybe you don’t care what franchise they played for? No problem. You’re just a click away from finding all players that meet those criteria, except for the franchise for which they played.
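To give a feel for what those clicks amount to, here is a rough Python sketch of turning a set of column restrictions into a SPARQL query. It is only an illustration, not jSpace’s actual query generator. The `bb:` namespace, the property names, and the `Restriction` and `build_query` helpers are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical namespace; the real vocabulary of the baseball data is not shown here.
PREFIX = "PREFIX bb: <http://example.org/baseball#>"
COMPARISONS = {">", ">=", "<", "<="}


@dataclass
class Restriction:
    prop: str   # property name in the (hypothetical) bb: namespace
    op: str     # "=" for an exact selection, or one of COMPARISONS
    value: str  # an already formatted SPARQL term, e.g. '200' or '"First Base"'


def build_query(restrictions: List[Restriction]) -> str:
    """Build a SELECT query that finds players matching every restriction."""
    patterns, filters = [], []
    for i, r in enumerate(restrictions):
        if r.op == "=":
            # An exact selection becomes a plain triple pattern.
            patterns.append(f"?player bb:{r.prop} {r.value} .")
        elif r.op in COMPARISONS:
            # A numeric comparison binds a variable and filters on it.
            var = f"?v{i}"
            patterns.append(f"?player bb:{r.prop} {var} .")
            filters.append(f"FILTER ({var} {r.op} {r.value})")
        else:
            raise ValueError(f"unsupported operator: {r.op}")
    body = "\n  ".join(patterns + filters)
    return f"{PREFIX}\nSELECT DISTINCT ?player WHERE {{\n  {body}\n}}"


criteria = [
    Restriction("position", "=", '"First Base"'),
    Restriction("franchise", "=", '"Baltimore Orioles"'),
    Restriction("careerHomeRuns", ">", "200"),
    Restriction("careerBattingAverage", ">=", "0.280"),
]
query = build_query(criteria)
# Dropping the franchise criterion just means building the query without it.
any_franchise = build_query([r for r in criteria if r.prop != "franchise"])
```

In this sketch, each column selection maps to one triple pattern, so removing a selection only removes its pattern and leaves the rest of the query untouched.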

To make the demo possible, we’ve added some new features to jSpace. Among the new features is support for custom UIs for columns containing typed literals. So in the case of our baseball demo, the home runs column, which is full of integer values, now gives you the option to restrict your search using numerical operations such as greater than or less than. If a column contains date entries, you can specify that you’re looking for things before or after a certain date. And you can even set the granularity of a column of numbers, so rather than showing ALL the home run values, you can display them in buckets of 10 or 100, narrowing down the results in a column from potentially hundreds to just a handful. The custom column UI support is extensible, so it would be easy to make new column views, such as a map for geographic data, or a calendar for date values rather than a list.
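The bucketing and numeric restrictions can be sketched in a few lines of Python. This is an illustration of the idea only, not the jSpace code, and the function names `bucket_counts` and `restrict` are invented for the example.

```python
import operator
from collections import Counter
from typing import Dict, Iterable, List, Tuple

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


def bucket_counts(values: Iterable[int], width: int) -> Dict[Tuple[int, int], int]:
    """Group integers into inclusive (low, high) buckets of the given width and count each bucket."""
    if width <= 0:
        raise ValueError("bucket width must be positive")
    counts = Counter((v // width) * width for v in values)
    return {(low, low + width - 1): n for low, n in sorted(counts.items())}


def restrict(values: Iterable[int], op: str, threshold: int) -> List[int]:
    """Keep only the values that satisfy the comparison, e.g. restrict(hrs, ">", 200)."""
    compare = OPS[op]
    return [v for v in values if compare(v, threshold)]


home_runs = [3, 12, 17, 45, 102]
by_tens = bucket_counts(home_runs, 10)
by_hundreds = bucket_counts(home_runs, 100)
over_fifteen = restrict(home_runs, ">", 15)
```

With a width of 100, the five sample values collapse into two rows, which is the "hundreds to just a handful" effect on a much larger column.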

We’ve also added a new Information Panel component called the “Web View” which simply takes the currently selected resource and shows a web page relevant to that selection. We provide out of the box support for Wikipedia and Google’s “I’m Feeling Lucky” searches. So in our baseball demo, when you select “First Base” from the list of positions, the web view will show Wikipedia’s entry on first basemen. When you select a player, you’ll see Retrosheet’s player data page. The web view brings more information about what you are browsing right to your fingertips.

We think these new features and a very cool data set make for a great demo of jSpace and we invite you over to check it out. Just browse to the jSpace page, scroll down to the bottom of the page where we list the demos, and click on the link to launch the jSpace baseball demo.

BIANCA, 2007 Semantic Summer, and OWLED

Friday, May 18th, 2007 · Kendall Clark

We’ve been busy with new internal projects, as well as some milestones for some customer work:

First, we’re getting very close to putting an RDF-powered data integration app into production at NASA HQ. There have been Semantic Web projects at NASA, but not very many have been put into production alongside ordinary business apps. If we’re not the first (and we probably aren’t), we’re at least early.

At the Semantic Technology conference next week, our customer, Andy Schain, will be giving a talk about this app, which is called BIANCA, and about another one we’ve been working on for NASA, called POPS.

BIANCA provides a single, integrated view of information about, including relationships between, applications, servers, network services, networks, and change items for NASA HQ. The integration is over four different data sources, and with this integrated view we’re able to provide some novel analysis services, including the ability to do disaster and other outage planning scenarios based on building dependency graphs of the relationships between BIANCA nodes. From these graphs we can generate outage repair plans (though these are not yet optimal plans, that’s coming in the next version) as well as productivity estimates per hour of downtime.
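As a rough illustration of the dependency-graph idea (not BIANCA’s actual implementation), the Python sketch below finds everything affected by an outage and produces one valid, non-optimized repair order. The node names and the `depends_on` structure are invented for the example, and the productivity estimates are not covered.

```python
from collections import deque
from graphlib import TopologicalSorter  # Python 3.9+
from typing import Dict, List, Set

# Invented example data: each node maps to the nodes it needs in order to work.
depends_on: Dict[str, List[str]] = {
    "payroll-app": ["db-server", "auth-service"],
    "auth-service": ["db-server"],
    "db-server": ["core-network"],
    "wiki": ["web-server"],
    "web-server": ["core-network"],
}


def affected_by(outage: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every node that directly or transitively depends on the failed node."""
    dependents: Dict[str, List[str]] = {}
    for node, deps in graph.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(node)
    seen: Set[str] = set()
    queue = deque([outage])
    while queue:
        current = queue.popleft()
        for node in dependents.get(current, []):
            if node not in seen:
                seen.add(node)
                queue.append(node)
    seen.discard(outage)  # only matters if the graph contains a cycle
    return seen


def repair_order(outage: str, graph: Dict[str, List[str]]) -> List[str]:
    """Order the failed node and its dependents so each comes after what it needs."""
    scope = affected_by(outage, graph) | {outage}
    sub = {n: [d for d in graph.get(n, []) if d in scope] for n in sorted(scope)}
    # static_order() raises graphlib.CycleError if the dependencies are cyclic.
    return list(TopologicalSorter(sub).static_order())


impacted = affected_by("db-server", depends_on)
plan = repair_order("db-server", depends_on)
```

A topological order is only one valid plan. Choosing the best one (for example, restoring the most valuable services first) would need weights that this sketch does not model.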

BIANCA is a Pylons web app (and RESTful web service) in front of an RDF database. (The next version of BIANCA will include live querying of DNS, SNMP, IDS, and other “network fabric” services. I suspect we’ll do something in Sesame by building a new Sail for some of these data services.)

POPS (People Organizations Projects and Skills) is the other app, which will go into user pilot soon: it’s an expertise locator service for NASA civil servants and contractors (all 80,000 of them), which also integrates disparate data sources (this time 6 or 7 of them) using RDF. (For more details, check out the 2006 XTech talk about POPS.)

The interesting bit about POPS is the client user interface, called JSpace, which started off as our clone of mspace, but has since diverged in some non-trivial ways. JSpace translates user input into RDF queries against a data aggregation accessible via HTTP.

JSpace is an example of what the cool kids are calling these days a linked data browser, though we haven’t yet done a good enough job talking about it publicly, so no one really knows anything about it at all. One project for the summer is to get more demo data sets up on our site so people can play around with the webstart version of JSpace.

Second, our first internship program, which we’re calling 2007 Semantic Summer, is already an unexpected success. Honestly, I didn’t think we’d get a single applicant, since we’re new and the program is even newer. But we’ve gotten about 10 so far, several of them from very strong candidates, mostly people working on a PhD in computer science and pursuing a diss topic in Semantic Web.

We’ve accepted four applicants for the summer, and there’s another who may intern with us in the fall. This is all very exciting: they’ll be working on a range of projects, including new stuff for Pellet, the next version of BIANCA, and some of our internal projects.

Third, the 2007 OWLED (the OWL Experience and Directions) Workshop—which we’re proud to sponsor—is coming up very soon, the first week of June, right after ESWC in Innsbruck. I wish more of us were able to attend, but it’s not a cheap or easy trip from DC and we’re swamped with engaging work. We did get two papers accepted; they’ll be presented by our European R&D, i.e., Bijan.

We’re very excited to see the registration numbers looking good, as well as a very cool program of talks and papers. If you’re into OWL DL, OWLED is the conference.

Finally, watch this space on or around 7 June: we’ll have a couple of announcements to coincide with OWLED which will be worth hearing, especially if you’re into OWL.
