The C&P Semantic Summer R&D Internship program is an intensive internship focused on the Semantic Web and related R&D areas; it combines real-world problems with state-of-the-art research questions in KR, web systems, AI, etc.
Update: We used to call this “Semantic Summer”, which was cute but limiting, since it suggested we were only interested in interns during, well, the summer. In fact, we’ve already moved to accommodating interns and trainees year-round, including people who need J1 support (e.g., students from the EU and Asia).
You don’t need to have a background in AI or logic programming specifically to be a successful C&P intern. You do need to be an experienced and enthusiastic learner and programmer with a strong CS education in hand or in progress. Some of these projects are more suited for a graduate student; others are suitable for an upper-level undergrad. They’re all substantial projects in which there is some base upon which to start but you’ll have to go beyond that base in order to hit a home run.
Real-world R&D
Students seek internships for a variety of reasons. We think a good reason to intern with us is that our R&D agenda is substantially the same as the projects we describe below. In other words, C&P interns will be working on real problems that really matter to industry. Nothing ensures productivity as much as happiness at work, and few things contribute more to that happiness than significant, meaningful challenges in the workplace.
Background Reading
Many of these projects are generally within the KR area of Description Logic and, typically, OWL DL in particular. The single best place to start learning DL is with the seminal Description Logic Handbook, which is partially available online.
Pellet
Much of what we do is based on OWL DL, and we think Pellet is one of the best all-around OWL DL reasoners available. If you want to work on DL, or learn about Semantic Web reasoning, Pellet is a good project to work on. See also: the Pellet homepage.
SPARQL-DL
Implement and optimize a SPARQL-DL query engine in Pellet. This is being actively worked on in the Fall of 2007 and throughout 2008 by interns.
XACML and WS-Policy
We’re exploring the use of OWL DL to manage various policy languages, including XACML and WS-Policy; but these efforts are in their infancy. To learn more about this approach, there are several papers, ontologies, or systems to study:
SNOMED
Translating SNOMED (or OpenGalen) into OWL so that we can study them with Pellet is an important first step in using Description Logic to manage some kinds of health data. See also:
D2RQ and ETL
ETL is a data warehousing technique for integration via aggregation. We’re interested in the relation between ETL and the query proxying approach explored by D2RQ, as well as in optimization techniques for ETL frameworks (including exploiting parallelism through concurrency, distributed message passing systems like Spread, etc.). See also:
Spread or, even better, RabbitMQ (and AMQP generally)
Prediction Markets
We’re enamored of prediction markets and market design theory generally, as well as the field of econometrics. This problem is probably the most interdisciplinary and speculative because we’re looking for someone who’s interested in both DL and econometrics or market design theory to work at the intersection of these two fields. Is there a role for DLs to play in prediction markets or electronic markets generally? While some new PhD research has been done in this area recently, it’s pretty wide open. See also: A Short Prediction Market Primer.
Rules
Pellet supports both AL-log and DL-safe rules; but both implementations could be optimized and extended. We’re also interested in complete SWRL support, which is almost certainly more than a summer’s worth of work. Or, put another way: if you can completely implement SWRL in a summer, your future is very promising. See also:
Query Answering for OWL-DL with Rules
AL-log: Integrating Datalog and Description Logics
Default Logic
There are a host of interesting non-monotonic extensions of OWL DL; Reiter’s Default Logic is one such. It is often the case that ontology modelers would like to be able to say, in effect, “by default, something is true”. Extending OWL DL with Default Logic makes a form of defaulting possible. See also:
Probabilistic Reasoning
P-SHOQ: A Probabilistic Extension of SHOQ for Probabilistic
See Pronto
Finite Model Reasoning
Finite Model Reasoning in Description Logics
Abduction
A Case for Abductive Reasoning over Ontologies
DL-Lite and EL++
OWL DL is a very expressive KR language. As a result, reasoning with OWL DL is computationally expensive (NEXPTIME worst-case complexity). Although highly optimized tableau algorithms can scale to large datasets, scalability on the order of RDBMS systems (billions of statements) is not possible with current techniques. One way to get around this problem is to define tractable fragments of the language which are less expressive but easier to reason with. Several such fragments (e.g., DL-Lite, EL++) have been identified in the literature, and we are looking for students who are interested in one of these fragments to build a proof-of-concept prototype reasoner. See also:
Linking Data to Ontologies: The Description Logic DL-Lite_A
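To give a feel for why such fragments are easier to reason with, here is a minimal sketch of a saturation-style classifier for a small core of EL (not full EL++: no bottom concept, nominals, or role inclusions). It assumes axioms are already in one of four normal forms and applies simple completion rules until nothing new can be derived. The axiom encoding, the function name classify, and all concept and role names are our own illustrative assumptions, not part of Pellet or any existing reasoner.

```python
from collections import defaultdict

# Hypothetical axiom encoding (normal forms of a small EL core):
#   ("sub", A, B)          A is subsumed by B
#   ("conj", A1, A2, B)    (A1 and A2) is subsumed by B
#   ("ex_rhs", A, r, B)    A is subsumed by (some r B)
#   ("ex_lhs", r, A, B)    (some r A) is subsumed by B

def classify(axioms):
    """Return a dict mapping each concept name to its derived subsumers."""
    names = set()
    for ax in axioms:
        if ax[0] in ("sub", "conj"):
            names.update(ax[1:])
        elif ax[0] == "ex_rhs":
            names.update((ax[1], ax[3]))
        elif ax[0] == "ex_lhs":
            names.update((ax[2], ax[3]))

    subsumers = {x: {x} for x in names}   # every concept subsumes itself
    edges = defaultdict(set)              # role name -> set of (X, Y) pairs

    changed = True
    while changed:                        # naive fixpoint iteration
        changed = False
        for ax in axioms:
            for x in names:
                new = None
                if ax[0] == "sub" and ax[1] in subsumers[x]:
                    new = ax[2]
                elif ax[0] == "conj" and ax[1] in subsumers[x] and ax[2] in subsumers[x]:
                    new = ax[3]
                elif ax[0] == "ex_rhs" and ax[1] in subsumers[x]:
                    if (x, ax[3]) not in edges[ax[2]]:
                        edges[ax[2]].add((x, ax[3]))
                        changed = True
                elif ax[0] == "ex_lhs":
                    # X has an r-successor Y, and Y is subsumed by A
                    if any(src == x and ax[2] in subsumers[y]
                           for src, y in edges[ax[1]]):
                        new = ax[3]
                if new is not None and new not in subsumers[x]:
                    subsumers[x].add(new)
                    changed = True
    return subsumers

# Illustrative, made-up medical-style axioms.
AXIOMS = [
    ("sub", "Pericarditis", "Inflammation"),
    ("ex_rhs", "Pericarditis", "hasLocation", "Pericardium"),
    ("sub", "Pericardium", "HeartPart"),
    ("ex_lhs", "hasLocation", "HeartPart", "HeartDisease"),
]
RESULT = classify(AXIOMS)
```

With these axioms, RESULT["Pericarditis"] contains "HeartDisease" even though no single axiom says so. The loop only ever adds concept names to finite sets, so it terminates after polynomially many additions; a real prototype would replace the naive loop with a worklist and proper indexing.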
Spatial Logic
There is an increasing number of applications where one needs to represent and reason with geospatial relations. Although OWL can be used to represent many different things about a spatial region, such applications need special-purpose relations so you can say “a region has a border with another region” or “a region is inside another one”. Quite a number of spatial logics have been designed for this purpose, RCC being the most prominent. We are interested in integrating OWL DL (or a less expressive fragment) with RCC (or another similar spatial logic). See also:
Representing Qualitative Spatial Information in OWL-DL
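As a rough illustration of the kind of relations RCC provides, the sketch below computes the RCC8 relation between two regions. To keep it short we assume, purely for illustration, that regions are closed one-dimensional intervals with non-empty interiors; real geospatial regions are of course richer. The function name rcc8 and the interval encoding are our own assumptions.

```python
from typing import Tuple

Interval = Tuple[float, float]  # closed interval (lo, hi), assumed lo < hi

def rcc8(x: Interval, y: Interval) -> str:
    """Return the RCC8 relation of x to y for 1-D closed intervals."""
    (a1, b1), (a2, b2) = x, y
    if b1 < a2 or b2 < a1:
        return "DC"      # disconnected: no shared point
    if b1 == a2 or b2 == a1:
        return "EC"      # externally connected: they share only a border
    if (a1, b1) == (a2, b2):
        return "EQ"      # equal
    if a2 <= a1 and b1 <= b2:
        # x is inside y; tangential if it touches y's boundary
        return "NTPP" if a2 < a1 and b1 < b2 else "TPP"
    if a1 <= a2 and b2 <= b1:
        # y is inside x (inverse relations)
        return "NTPPi" if a1 < a2 and b2 < b1 else "TPPi"
    return "PO"          # partial overlap
```

Here “a region has a border with another region” corresponds to EC, as in rcc8((0, 1), (1, 2)), and “a region is inside another one” corresponds to TPP or NTPP, as in rcc8((1, 2), (0, 3)).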
Epistemic Queries and the K Operator
OWL DL adopts the Open World Assumption (OWA) rather than the Closed World Assumption (CWA) of traditional database systems. With CWA, if we do not know the truth of a statement, we assume it is false. This assumption does not make sense on the Web, where we do not have complete knowledge about the world. However, we generally have complete knowledge about some particular domain. In such cases, we want to query what is known rather than what is possible. Epistemic queries can be used for exactly this purpose, allowing one to use OWA to represent knowledge and CWA to query it. We think extending Pellet with an epistemic query engine is a cool idea, don’t you? See also:
Towards a Nonmonotonic Extension to OWL
EQL-Lite: Effective First-Order Query Processing in Description Logics
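The difference between the two assumptions, and the flavor of an epistemic query, can be sketched with a toy example. This is only an illustration under strong simplifying assumptions: the knowledge base is a set of explicitly asserted facts with no DL entailment at all, and every name below (POSITIVE, NEGATIVE, ask_owa, ask_cwa, is_known, the individuals) is hypothetical rather than part of Pellet.

```python
from typing import Optional, Set, Tuple

Fact = Tuple[str, str, str]  # (property, subject, object)

# Toy knowledge base: facts asserted true and facts asserted false.
POSITIVE: Set[Fact] = {("hasChild", "alice", "bob")}
NEGATIVE: Set[Fact] = {("hasChild", "carol", "bob")}

def ask_owa(fact: Fact) -> Optional[bool]:
    """Open world: True, False, or None when the truth is unknown."""
    if fact in POSITIVE:
        return True
    if fact in NEGATIVE:
        return False
    return None

def ask_cwa(fact: Fact) -> bool:
    """Closed world: anything not asserted true is assumed false."""
    return fact in POSITIVE

def is_known(fact: Fact) -> bool:
    """Epistemic-style check: does the knowledge base know the fact holds?"""
    return ask_owa(fact) is True

# Epistemic-style query: who is NOT known to be a parent of bob?
PEOPLE = ["alice", "carol", "dave"]
NOT_KNOWN_PARENTS = [p for p in PEOPLE
                     if not is_known(("hasChild", p, "bob"))]
```

For dave, ask_owa returns None (unknown) while ask_cwa returns False. The epistemic-style query keeps the open-world representation but answers in a closed-world way, so NOT_KNOWN_PARENTS lists both carol (known not to be a parent) and dave (simply unknown).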
