Why not SPARQL (for DBLP)?

by Bijan Parsia

Hoisted from the comments:

Hi, Instead of using XSLT, you could also use directly the RDF access to DBLP : http://www4.wiwiss.fu-berlin.de/dblp/ Normally a Sparql query can give you all the needed data. Did you try this way ?

I did not so try. After poking a bit I became convinced that I didn’t want to so try! It seems much harder. Well, certainly harder than not doing anything at all, which, since I have a solution at the moment, is an available option!

Also, my personal infrastructure is much worse for using SPARQL in this circumstance. I had trouble browsing the data inside the RDF-DBLP (Tabulator never works for me, Disco wasn’t too helpful, and the Ajax SPARQL thing didn’t do much for me either). With normal DBLP, the info design and presentation are already done for me. This makes a big difference! The spidering is easy and the scraping even easier! (Tidy access was indispensable for doing this in XSLT, though.) And this way I have reasonable Bibtex as well as an Exhibit.

(I didn’t see Bibtex as a translation target of Babel and all the translators I saw while googling were from Bibtex, not to Bibtex. That’s a bit of a deal killer for me. Of course, the JSON to Bibtex route wouldn’t be that hard. But remember, I wasn’t out to show the glory of semweb tech…I was just trying to put together a nice and cool publications page. Imagine telling a random person who might have a bit of XSLT under their belt and a decent XML/XSLT environment (say oXygen) on their machine that they need to go to RDF-DBLP, figure out the schema in play, the URIs, the SPARQL, find a client, etc. etc. etc. instead of, “oh, you can scrape your DBLP page and use the Bibtex to build an Exhibit”. And you don’t need XSLT…a simple Python, Perl, Ruby, whathaveyou script will do the scrape pretty easily in this case.)
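To make the "JSON to Bibtex route wouldn’t be that hard" remark concrete, here is a minimal Python sketch. It assumes each publication has already been loaded as a flat dict; the field names `id` and `type`, and the sample record, are hypothetical and are not taken from Exhibit's or DBLP's actual schema. It also does no escaping of special characters, so treat it as a starting point rather than a finished converter.

```python
def to_bibtex(record):
    """Render one publication dict as a BibTeX entry string.

    'id' and 'type' are hypothetical field names chosen for this sketch.
    """
    # Everything except the key and entry type becomes a BibTeX field.
    fields = {k: v for k, v in record.items() if k not in ("id", "type")}
    lines = ["@%s{%s," % (record.get("type", "misc"), record["id"])]
    for key in sorted(fields):
        value = fields[key]
        # BibTeX separates multiple authors or editors with " and ".
        if isinstance(value, list):
            value = " and ".join(value)
        lines.append("  %s = {%s}," % (key, value))
    lines.append("}")
    return "\n".join(lines)


# Made-up sample record, for illustration only.
sample = {
    "id": "Doe08",
    "type": "article",
    "author": ["A. Doe", "B. Roe"],
    "title": "A Title",
    "year": "2008",
}
print(to_bibtex(sample))
```

Sorting the fields keeps the output stable from run to run, which makes the generated file easier to diff.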

In another comment, Keith Alexander points out that Exhibit itself might treat its JSON format as less structured than I thought. This is an interesting question. I’m not sure exactly how to characterize “real” structure from, as I put it, “aped” structure. (And I’m deferring that question for the moment.)

Keith has done some work that would make scraping my pub information from RDF-DBLP to Exhibit much easier. He has a script which takes an Exhibit “template file” and converts it to a series of SPARQL queries. (Of course, I want something client side, since I’m collecting and massaging stuff first. I don’t really want to dynamically query anything at the moment. My pub list and home page don’t need or particularly want that.)

In the end, even if I had all the appropriate infrastructure easily to hand, I might still have gone with the XSLT. It seems, overall, a bit thinner and working with it kept me more “in the zone” (notwithstanding having to look up XPath and XSLT stuff…what a nightmare!). Having document() makes a huge difference, since I can write a little targeted spider very easily. I think, without a huge amount of evidence, that the result is easier to share and explain (though we’d need a worked out SPARQL based version). XSLT (and XQuery) engines should definitely have Tidy built in. It’s a hard thing to standardize, but it’s so necessary.
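For readers without an XSLT setup, the "little targeted spider" idea can be roughed out with Python's standard library alone. This is a sketch under assumptions, not the stylesheet described above: the `.bib` suffix filter, the `example.org` URL, and the sample HTML are all made up for illustration. Python's `html.parser` copes with a fair amount of untidy markup, which stands in loosely for the Tidy step.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect absolute URLs from the <a href="..."> tags of one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, href))


def collect_links(html, base_url, suffix=""):
    """Return the links found in html whose URL ends with suffix."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    return [url for url in parser.links if url.endswith(suffix)]


# Hypothetical page fragment; a real run would fetch the page first.
page = '<p><a href="rec/one.bib">bib</a> <a href="/about.html">about</a></p>'
print(collect_links(page, "http://example.org/pers/", ".bib"))
```

Fetching each collected URL and feeding the result back through the same function would turn this into the spider proper; that step is left out so the sketch runs without network access.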

Lest I be seen as too negative toward RDF-DBLP, I must say that I think it will be very handy as a jSpace back end. Certainly beats massaging and publishing all that DBLP XML ourselves!
