Making Ontology Modules Fast
by Mike Smith
We’ve been collaborating with the National Cancer Institute this year to improve Pellet’s performance when working with the NCI Thesaurus, a very large OWL ontology for which the interesting problems are TBox problems.
In a previous post, we highlighted our success reducing classification times for the Thesaurus by a few orders of magnitude. Alongside that work, we added explanation services to the NCI infrastructure. These improvements have had the desired effect: our client reports that they enabled some long-standing modeling bugs to be diagnosed and repaired.
More recently we’ve undertaken an effort to add incremental classification support to Pellet, with the goal of enabling Thesaurus editors to see the impact of their changes in real time. The first step towards incremental classification is partitioning the ontology into modules. For us, this meant reviewing the state of the art in ontology modularity and doing some engineering to turn prototype algorithms into professional software. To date, it has yielded some very satisfying results. We never waited long enough for the first-cut partitioning code to complete, but we estimate it would have taken about 50 hours to partition the latest version of the Thesaurus. After a few days of software engineering, the partitioning takes about 5 minutes, roughly a 600-fold speedup.
The impact this has on classification times is something we expect to detail in a future post.
For now, we’re taking it as another example of what Bijan said here recently: the performance story in OWL is getting better all the time. As you can tell, we’re happy and proud to be a part of that.