Why not SPARQL (for DBLP)?
by Bijan Parsia
Hoisted from the comments:
Hi, instead of using XSLT, you could also directly use the RDF access to DBLP: http://www4.wiwiss.fu-berlin.de/dblp/ Normally a SPARQL query can give you all the needed data. Did you try this way?
I did not so try. After poking a bit I became convinced that I didn’t want to so try! It seems much harder. Well, certainly harder than not doing anything at all, which, since I have a solution at the moment, is an available option!
Also, my personal infrastructure is much worse for using SPARQL in this circumstance. I had trouble browsing the data inside the RDF-DBLP (Tabulator never works for me, Disco wasn’t too helpful, and the Ajax SPARQL thing didn’t do much for me either). With normal DBLP, the info design and presentation are already done for me. This makes a big difference! The spidering is easy and the scraping even easier! (Tidy access was indispensable for doing this in XSLT, though.) And this way I have reasonable BibTeX as well as an Exhibit. (I didn’t see BibTeX as a translation target of Babel, and all the translators I saw while googling were from BibTeX, not to BibTeX. That’s a bit of a deal killer for me. Of course, the JSON to BibTeX route wouldn’t be that hard. But remember, I wasn’t out to show the glory of semweb tech…I was just trying to put together a nice and cool publications page. Imagine telling a random person who might have a bit of XSLT under their belt and a decent XML/XSLT environment (say oXygen) on their machine that they need to go to RDF-DBLP, figure out the schema in play, the URIs, the SPARQL, find a client, etc. etc. etc. instead of, “oh, you can scrape your DBLP page and use the BibTeX to build an Exhibit”. And you don’t need XSLT…a simple Python, Perl, Ruby, what-have-you script will do the scrape pretty easily in this case.)
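To give a feel for why the JSON to BibTeX route “wouldn’t be that hard”, here is a minimal Python sketch. It assumes an Exhibit-style JSON document with a top-level "items" list; the field names used here ("id", "pubtype", "author", "title", "booktitle", "journal", "year") are my own illustrative assumptions, not anything Exhibit or Babel prescribes, and the function names are hypothetical. It also does no escaping of special characters, so treat it as a starting point only.

```python
import json

# Fields copied into the BibTeX entry, in this order, when present.
# These names are assumptions about how the items were exported.
BIBTEX_FIELDS = ("author", "title", "booktitle", "journal", "year")


def item_to_bibtex(item):
    """Render one flat, Exhibit-style item (a dict) as a BibTeX entry."""
    entry_type = item.get("pubtype", "misc")  # hypothetical field name
    key = item.get("id", "unknown")
    fields = []
    for name in BIBTEX_FIELDS:
        if name not in item:
            continue
        value = item[name]
        # Multi-valued properties (e.g. several authors) arrive as lists.
        if isinstance(value, list):
            value = " and ".join(value)
        fields.append(f"  {name} = {{{value}}}")
    return "@%s{%s,\n%s\n}" % (entry_type, key, ",\n".join(fields))


def exhibit_to_bibtex(json_text):
    """Convert a JSON string with an "items" list into BibTeX text."""
    data = json.loads(json_text)
    return "\n\n".join(item_to_bibtex(i) for i in data.get("items", []))
```

Feeding it the JSON behind a publications Exhibit and writing the result to a .bib file would be the whole pipeline; anything fancier (escaping, entry-type mapping) depends on how the items were produced in the first place.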
In another comment, Keith Alexander points out that Exhibit itself might treat its JSON format as less structured than I thought. This is an interesting question. I’m not sure exactly how to distinguish “real” structure from, as I put it, “aped” structure. (And I’m deferring that question for the moment.)
Keith has done some work that would make scraping my pub information from RDF-DBLP to Exhibit much easier. He has a script which takes an Exhibit “template file” and converts it to a series of SPARQL queries. (Of course, I want something client side, since I’m collecting and massaging stuff first. I don’t really want to dynamically query anything at the moment. My pub list and home page don’t need or particularly want that.)
In the end, even if I had all the appropriate infrastructure easily to hand, I might still have gone with the XSLT. It seems, overall, a bit thinner and working with it kept me more “in the zone” (notwithstanding having to look up XPath and XSLT stuff…what a nightmare!). Having document() makes a huge difference, since I can write a little targeted spider very easily. I think, without a huge amount of evidence, that the result is easier to share and explain (though we’d need a worked out SPARQL based version). XSLT (and XQuery) engines should definitely have Tidy built in. It’s a hard thing to standardize, but it’s so necessary.
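For readers without an XSLT setup, the “little targeted spider” idea can be approximated in a few lines of Python using only the standard library. This is a sketch under assumptions: it collects links whose href contains a marker string (I use "bibtex" purely as an illustration; the real DBLP page layout may differ), and the class and function names are hypothetical. Fetching each collected link is left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs of <a> links whose href contains a marker."""

    def __init__(self, base_url, marker):
        super().__init__()
        self.base_url = base_url
        self.marker = marker
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Skip anchors without an href and links that do not match.
        if href and self.marker in href:
            self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url, marker="bibtex"):
    """Return matching links from an HTML string, resolved against base_url."""
    parser = LinkCollector(base_url, marker)
    parser.feed(html)
    return parser.links
```

html.parser is fairly forgiving of messy markup, which covers some of what Tidy does in the XSLT version, though it makes no attempt to repair the document.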
Lest I be seen as too negative toward RDF-DBLP, I must say that I think it will be very handy as a jSpace back end. Certainly beats massaging and publishing all that DBLP XML ourselves!
March 20th, 2007 at 4:43 am
Hi,
I was really surprised to see my comment again…
The answer is really interesting. In fact, it’s true I was looking for some kind of tech show. I discovered this RDF DBLP two days before on the W3C SWEO wiki, and I was looking for a way to integrate it in Exhibit… so when I saw your post, my first thought was “same data, same result but the middle is different…” Maybe my comment was kind of a “lazy web” one to see if you would do the job ;)
But I must agree plain DBLP is much easier to use right now. On the other hand, more interesting than RDF DBLP is SwetoDBLP http://lsdis.cs.uga.edu/projects/semdis/swetodblp/ but it’s only available as a dump and I wouldn’t deal with a triple store just for that…
Also, this Exhibit/SPARQL looks really interesting, I might give it a try….
March 20th, 2007 at 12:45 pm
I hope it was a good surprise!
I’d be interested in what you come up with.