A Quick Overview of Features
Introduction
We present PelletServer's features in three
parts: logical, statistical,
and data management. The examples use wget on the command line. Programmatic access
via a PelletServer client in Java, Scala, and JavaScript is
covered in Chapter $chapter.
[...] owl:sameAs assertions); and vice versa.
The focus of PelletServer's capabilities is on semantic data management and analysis; these may support both analytic and transactional applications, often where the focus is only information integration and analysis.
Multi-Tenancy Knowledge Bases
PelletServer can provide services for an arbitrary number of
data sets, which are called “knowledge bases” (KBs,
for short). Practically speaking, KBs are RDF graphs or OWL
ontologies and are configured at run-time via the PelletServer
configuration file.
Asynchronous Requests
PelletServer supports asynchronous requests via two patterns: Slow REST and WebHooks. If a request contains a WebHook, that pattern is used. If a request does not include a WebHook, and the Slow REST pattern is enabled, it will use that instead. See $chapter-asynch for more about asynchronous requests.
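The selection rule above can be sketched as a small decision function. This is only an illustration of the stated precedence; the function and parameter names are hypothetical, and the source does not say what happens when neither pattern applies, so the sketch returns None in that case.

```python
def choose_async_pattern(has_webhook, slow_rest_enabled):
    """Pick the asynchronous pattern for a request (hypothetical helper).

    A WebHook in the request always wins; Slow REST is used only when
    no WebHook is present and the pattern is enabled.
    """
    if has_webhook:
        return "webhook"
    if slow_rest_enabled:
        return "slow-rest"
    # Not specified in this chapter; see $chapter-asynch.
    return None


print(choose_async_pattern(has_webhook=True, slow_rest_enabled=True))
print(choose_async_pattern(has_webhook=False, slow_rest_enabled=True))
```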
Logical Features
The logical capabilities include query, reasoning, data validation, and automated planning.
SPARQL Query
PelletServer's support for SPARQL query of PelletServer KBs (both RDF and OWL) is essentially an implementation of SPARQL Protocol. PelletServer may proxy for an arbitrary SPARQL endpoint, which means it can easily integrate any RDF database that implements SPARQL Protocol; it may also be configured to use Pellet's native SPARQL support for querying OWL, including Pellet's extensions: SPARQL-DL and the Terp syntax.
Query services are configured per PelletServer KB and may provide multiple SPARQL endpoints for a single KB. To query a PelletServer KB with SPARQL,
$ wget http://www.example.com/{kb}/query{?query,default-graph-uri,named-graph-uri}
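As a rough sketch of how a client might expand the query template above, the following Python builds a request URL from a KB name and a SPARQL query. The helper name and the sample query are assumptions for illustration; the encoding is ordinary form encoding (spaces become `+`), which SPARQL Protocol endpoints generally accept.

```python
from urllib.parse import urlencode


def sparql_query_url(base, kb, query, default_graphs=(), named_graphs=()):
    """Expand {base}/{kb}/query{?query,default-graph-uri,named-graph-uri}."""
    params = [("query", query)]
    params += [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    # urlencode percent-encodes reserved characters in each value.
    return f"{base}/{kb}/query?{urlencode(params)}"


url = sparql_query_url("http://www.example.com", "wine", "ASK { ?s ?p ?o }")
print(url)
```

The resulting URL can be passed to wget (quoted) or to any HTTP library.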
OWL 2 Reasoning
PelletServer's reasoning service is based on OWL 2 automated reasoning provided by Pellet. PelletServer supports consistency checking, concept satisfiability, classification, realization, query, datatype reasoning, modularity, explanation, debugging, repair, SWRL rules, and incremental reasoning.
OWL 2 Reasoner Family
Note: Pellet is less a single reasoner than a family of OWL 2 reasoners: it supports the OWL 2 DL, EL, and QL profiles. It provides a common API, called Ortiz, as well as the Jena API, for programmatic Java access. Pellet is a family of reasoners because different profiles have different expressivity-performance tradeoffs. The design approach in PelletServer is to configure an appropriate reasoner per knowledge base (i.e., per dataset) but not to expose any of that information in the REST interface; it is a matter of explicit configuration. A PelletServer admin may or may not choose to expose some or all of this OWL 2 profile information via PelletServer service advertisement.
Consistency Checking
To check the logical consistency of a KB:
$ wget http://www.example.com/{kb}/consistency
where {kb} is syntactic shorthand (using URI
Templates) for the name of a valid KB under PelletServer
management.
Consistency checking returns a boolean value;
the default serialization is SPARQL
(application/sparql-results+xml), i.e.:
<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
<head></head>
<boolean>true</boolean>
</sparql>
Using client-side content negotiation in the normal way, consistency checking results can also be represented in JSON:
$ wget --header "Accept:application/sparql-results+json" http://www.example.com/{kb}/consistency
The results are as expected:
{"head":{},"boolean":true}
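For illustration, a client could reduce either serialization to a plain boolean along these lines. This is a minimal sketch using only the Python standard library; the function name is ours, and the sample bodies are the two responses shown above.

```python
import json
import xml.etree.ElementTree as ET

SPARQL_NS = "{http://www.w3.org/2005/sparql-results#}"


def parse_boolean_result(body, content_type):
    """Return the boolean carried by a SPARQL boolean result document."""
    if content_type == "application/sparql-results+json":
        return bool(json.loads(body)["boolean"])
    if content_type == "application/sparql-results+xml":
        text = ET.fromstring(body).findtext(f"{SPARQL_NS}boolean")
        return text is not None and text.strip() == "true"
    raise ValueError(f"unsupported content type: {content_type}")


xml_body = (
    '<?xml version="1.0"?>'
    '<sparql xmlns="http://www.w3.org/2005/sparql-results#">'
    "<head></head><boolean>true</boolean></sparql>"
)
json_body = '{"head":{},"boolean":true}'

print(parse_boolean_result(xml_body, "application/sparql-results+xml"))
print(parse_boolean_result(json_body, "application/sparql-results+json"))
```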
Requesting a results type that is not supported—
$ wget --header="Accept:text/turtle" http://www.example.com/{kb}/consistency
gives the usual response, i.e., 406 Not
Acceptable. PelletServer uses HTTP in ways that
developers and HTTP libraries expect.
PelletServer's service description and discovery capability
describes all available PelletServer KBs and the operations on those KBs,
including HTTP methods and acceptable resource formats. PelletServer
provides hypertext that can be used as the engine of application
state.
Classification
To retrieve the class tree or hierarchy:
$ wget http://www.example.com/{kb}/classify
The classification hierarchy resource for a KB may be
represented as
text/turtle, application/rdf+xml,
or text/html.
Realization
Realization treats named classes somewhat like standing queries (for example, HighCreditRiskCustomer
or PersonOfInterest): the realization process
finds all of the individuals in the KB that are instances of those
classes, i.e., that are answers to those queries. The advantage
of this ontology-based approach over plain standing
queries is that these classes can be arranged hierarchically in
subsumption relationships, reasoned about for logical
consistency, etc. Good stuff.
To realize a KB:
$ wget http://www.example.com/{kb}/realize
It supports the same MIME types as classification (Turtle, RDF/XML, and HTML).
Explanation
PelletServer's explanation support provides access to
Pellet's explanation service, which, for any inference that
Pellet can compute, will explain why that inference holds.
(TODO: Support for all-unsat or explanation cardinalities...)
To explain an inconsistency,
$ wget http://www.example.com/{kb}/explain/inconsistent
To explain a subclass relationship between a named subclass and a superclass,
$ wget http://www.example.com/{kb}/explain/subclass/{sub}/{super}
Note: PelletServer supports a namespace binding service—basically: syntactic sugar for named classes—so that URIs don't have to be encoded within URLs.
Using the bindings given by the namespace service, we can ask
for an explanation for why nasa:Employee is a
subclass of foaf:Person:
$ wget http://www.example.com/NASA/explain/subclass/nasa:Employee/foaf:Person
To explain a property relation,
$ wget http://www.example.com/{kb}/explain/property/{subject}/{predicate}/{object}
To explain why an individual is an instance of some named
class (that is, why its rdf:type is some
class),
$ wget http://www.example.com/{kb}/explain/instance/{instance}/{class}
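The explanation URLs above share one shape, so a client might build them with a single helper. The sketch below assumes that short names from the namespace service are used as path segments, as in the nasa:Employee example; the helper name is hypothetical.

```python
from urllib.parse import quote


def explain_url(base, kb, kind, *names):
    """Build {base}/{kb}/explain/{kind}/{name}/... from short names.

    kind is one of the path keywords shown above, e.g. "inconsistent",
    "subclass", "property", or "instance".
    """
    # Keep ":" unescaped so prefix:local short names stay readable.
    segments = [quote(name, safe=":") for name in names]
    return "/".join([base, kb, "explain", kind, *segments])


print(explain_url("http://www.example.com", "NASA", "inconsistent"))
print(explain_url("http://www.example.com", "NASA", "subclass",
                  "nasa:Employee", "foaf:Person"))
```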
To explain why a named class is unsatisfiable,
$ wget http://www.example.com/{kb}/unsat/{class}
Finally, all of these explanation resources are syntactic sugar for the query explanation service. The query explanation service gives an explanation for arbitrary parts of a KB, picked out by a SPARQL query:
$ wget http://www.example.com/{kb}/explain{?query}
(TODO: This explanation query service is going away for a simpler design: POST a graph of triples to a KB and get back explanations of them. Requires multipart response body; or some kind of hybrid JSON-RDF structure that maps explanations and explanandums.)
Modularity
Pellet supports ontology modularity: take an OWL ontology as input
and return a set of ontology modules.
Any Pellet-backed KB under management in PelletServer may be modularized, which typically has good results in terms of reasoning performance. These modules, however, are not exposed via PelletServer; they are, strictly speaking, an internal or implementation detail.
PelletServer also supports modularity as a service; it takes an ontology as input and returns a set of modules for further use.
To modularize an ontology, foo.owl:
$ wget --post-file=/tmp/foo.owl http://www.example.com/modularity/
That is, you POST the ontology to the modularity endpoint...TODO
Incremental Reasoning, SWRL Rules, and Datatype Reasoning
PelletServer provides access to Pellet's incremental reasoning, SWRL rules, and datatype reasoning services. However, given the nature of these services, they are configured or accessed a bit differently.
Incremental, Persistent Reasoning
As of version 2.1, Pellet is able to reason incrementally and
is able to persist to disk the results (and internal structures)
of reasoning, both of which improve reasoning performance. A
PelletServer KB may be configured to take advantage of Pellet's
incremental and persistent reasoning features; but, presently,
those details are not exposed via PelletServer's service
advertisement.
SWRL Rules Reasoning
SWRL is a de facto standard for rules reasoning in OWL ontologies. If Pellet reasons with an ontology that contains SWRL rules, the rules will be fired and their consequences inferred. See the Pellet FAQ about SWRL for more information.
Datatype Reasoning
As with SWRL rules, Pellet supports sophisticated datatype reasoning that is enabled if an OWL ontology contains axioms or individuals or typed literals that trigger datatype reasoning.
Spatial Reasoning
Pellet has a qualitative spatial reasoning extension (PelletSpatial) that is available in PelletServer. It requires use of SPARQL magic predicates, which correspond to spatial relations. TODO.
Probabilistic Reasoning
Pellet has a probabilistic DL reasoning extension (Pronto)...TODO
Data Validation
PelletServer provides access to data validation services via Pellet's Integrity Constraint Validator (ICV) extension. ICV adds closed world semantics to OWL. Based on those semantics, an OWL ontology may be used to validate some RDF data, that is, to check that data for integrity constraint violations. So ICV treats OWL as a schema language for RDF and Linked Data.
PelletServer supports two modes of ICV operation:
- Validate a KB with a pre-defined constraints ontology
$ wget http://www.example.com/{kb}/validate
- Validate a KB with an arbitrary constraints ontology pulled
off the Web
PelletServer generally does not pull arbitrary data from the Web at request-time. However, this is a notable exception, since ICV ontologies will typically be quite small and may be useful even when published by third parties.
$ wget http://www.example.com/{kb}/validate{;icv-ontology}
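To make the two validation modes concrete, here is a small sketch that builds either URL. It assumes the `{;icv-ontology}` template expands as a path-style parameter (`;icv-ontology=<percent-encoded URL>`); the helper name and the example ontology URL are hypothetical.

```python
from urllib.parse import quote


def validate_url(base, kb, icv_ontology=None):
    """Build the ICV validation URL for a KB.

    Without icv_ontology, the pre-defined constraints ontology is used;
    with it, the ontology URL is appended as a ;name=value parameter.
    """
    url = f"{base}/{kb}/validate"
    if icv_ontology is not None:
        # safe="" percent-encodes ":" and "/" inside the parameter value.
        url += ";icv-ontology=" + quote(icv_ontology, safe="")
    return url


print(validate_url("http://www.example.com", "wine"))
print(validate_url("http://www.example.com", "wine",
                   "http://example.org/constraints.owl"))
```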
Explanation of Integrity Constraint Violations
One of the advantages of treating constraint validation as a kind of OWL reasoning is that Pellet's explanation facility can be used to explain integrity constraint violations, which can be an aid to manual or automated data cleansing, etc. TODO
Automated Planning
HotPlanner is Clark & Parsia's mature, featureful, domain-independent HTN Planner integrated with Pellet; the planner's state is maintained as an OWL ontology, with access to Pellet's reasoning services. HotPlanner supports OWL-based open world reasoning via Pellet; constrained asset management; scheduling and planning for parallel or concurrent task execution; and multi-objective plan optimization.
TODO
$ wget http://www.example.com/{kb}/plan
The result is...
Statistical Features
The statistical capabilities of PelletServer include semantic search, machine learning, and natural language processing.
Semantic Search
PelletServer's semantic search capability uses information
retrieval technology together with RDF and OWL reasoning to
index KB individuals (i.e., instances of classes), rather
than documents (though it can do that,
too), and the values of RDF literals. Since there isn't a
single useful notion of how to partition the statements about an
individual, PelletServer's semantic search uses SPARQL DESCRIBE
queries to index a KB's individuals. We've implemented the
DESCRIBE query form using six different algorithms, which are
configurable at index-time—these algorithms provide
semantic search that's tunable for particular KBs.
The result of this approach is that semantic search results are pointers (i.e., URIs or URLs) to individuals in the KB, rather than to documents. Since they are typically semantically-typed (either explicitly or via inference), they are more semantically precise than document-based results.
To make use of the semantic search engine,
$ wget http://www.example.com/wine/search{?search}
For example,
$ wget http://www.example.com/wine/search?search=red+wine
The results type is JSON:
[
{"hit":{"value":"http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#MariettaOldVinesRed","type":"uri"},
"score":10.54171085357666},
{"hit":{"value":"http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#Red","type":"uri"},
"score":7.4853034019470215},
{"hit":{"value":"http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#RedBurgundy","type":"uri"},
"score":7.367768287658691},...
]
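A client consuming these results might rank and filter them as follows. This is a sketch over the three hits shown above (with the elided entries left out); the function name and the score threshold are illustrative assumptions.

```python
import json

WINE = "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#"

# The three hits from the example response, without the elided remainder.
sample = json.dumps([
    {"hit": {"value": WINE + "MariettaOldVinesRed", "type": "uri"},
     "score": 10.54171085357666},
    {"hit": {"value": WINE + "Red", "type": "uri"},
     "score": 7.4853034019470215},
    {"hit": {"value": WINE + "RedBurgundy", "type": "uri"},
     "score": 7.367768287658691},
])


def top_hits(body, min_score=0.0):
    """Return (uri, score) pairs, best first, at or above min_score."""
    hits = json.loads(body)
    ranked = sorted(hits, key=lambda h: h["score"], reverse=True)
    return [(h["hit"]["value"], h["score"])
            for h in ranked if h["score"] >= min_score]


for uri, score in top_hits(sample, min_score=7.4):
    print(f"{score:.2f}  {uri}")
```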
Machine Learning
PelletServer's machine learning capabilities are provided by
Corleone, our statistical inference system specifically for RDF and
OWL (i.e., relational) data. It supports prediction, clustering, and similarity inference.
To classify (i.e., predict) the value of a property p for some individual,
$ wget http://www.example.com/{kb}/classify/{individual}/{property}
The individual may be specified in one of two ways:
- Either by URI or URL representing the individual resource.
- By the value of an inverse functional property (IFP; that is, a property that uniquely identifies an individual).
For example, a part_number property in a supply chain management application might be inverse functional. Modulo privacy concerns with using Social Security numbers to identify people in the US, SSN is inverse functional.
When you want to identify an individual for prediction using an
IFP, you must specify both the property and the property value; for
example, supply:part_number=123ABC.
The prediction property is specified using the
{namespace}:{property_name} syntax; alternatively, the full URI or
URL may be used.
To cluster the instances of a type,
$ wget http://www.example.com/{kb}/cluster/{number}/{type}
Note that {number} is the number of clusters
you want, not the number of individuals in a cluster. Also, the
{type} argument is optional; if it's not specified, then
the entire KB is clustered into {number} clusters.
Finally, to infer similarity for an individual,
$ wget http://www.example.com/{kb}/similar/{individual}/{number}
An individual is identified in one of the two ways described
previously; {number} is the number of similar
individuals.
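The three machine learning templates can be sketched as URL builders. Because the IFP syntax is not yet final, placing `supply:part_number=123ABC` directly in the `{individual}` path segment is an assumption, as are the helper names and the `supply:lead_time` property.

```python
from urllib.parse import quote


def _seg(value):
    # Keep ":" and "=" so short names and IFP pairs stay readable.
    return quote(str(value), safe=":=")


def classify_url(base, kb, individual, prop):
    """{base}/{kb}/classify/{individual}/{property}"""
    return "/".join([base, kb, "classify", _seg(individual), _seg(prop)])


def cluster_url(base, kb, number, type_=None):
    """{base}/{kb}/cluster/{number}/{type}; the type is optional."""
    if number < 1:
        raise ValueError("number of clusters must be positive")
    parts = [base, kb, "cluster", _seg(number)]
    if type_ is not None:
        parts.append(_seg(type_))
    return "/".join(parts)


def similar_url(base, kb, individual, number):
    """{base}/{kb}/similar/{individual}/{number}"""
    return "/".join([base, kb, "similar", _seg(individual), _seg(number)])


base = "http://www.example.com"
print(classify_url(base, "parts", "supply:part_number=123ABC", "supply:lead_time"))
print(cluster_url(base, "wine", 5, "wine:Wine"))
print(similar_url(base, "wine", "wine:RedBurgundy", 10))
```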
TODO: Specify defaults, return types, finalize the syntax for IFPs.
Natural Language Processing
TODO
(document clustering, keyword extraction, key sentence or document 'sense' extraction, content extraction, etc.)
Data Management Features
The final group of capabilities includes transactional update, service description and discovery, and administrative services.
Transactional Update
TODO
Service Description & Discovery
The first point to make—and we'll consider it in more
depth later on—is that the contract between PelletServer and
its consumers concerns the service description and discovery (SDD)
resource, not particular URLs as described by URI Templates
in the SDD. In other words, the SDD decouples consumers from URLs
by describing them dynamically and accurately. The contract is that
PelletServer will serve resource representations as described in the
SDD; hence, consumer clients should dynamically construct an API
from the SDD. Put another way: the URLs described by URI Templates
in the SDD may change. While we consider it best practice that they
not change, a pragmatic approach to reality suggests that they
will change.
The practical consequence of this point is that PelletServer clients and other consumers should check the SDD resource to ensure that the URLs and resources it describes are still in play; they should construct their APIs such that code is not tightly coupled to those URLs. PelletServer clients should use the SDD hypermedia as the engine of application state; that is, dereference and parse it dynamically (subject to its caching and freshness metadata) in order to use its most current version.
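To illustrate the decoupling, a client can look up URI Templates by operation name at call time instead of hard-coding URLs. The SDD format is not described in this chapter, so the dictionary below is a hypothetical, heavily simplified stand-in for a parsed SDD, and only simple `{name}` variables are expanded.

```python
import re

# Hypothetical stand-in for a parsed SDD: operation name -> URI Template.
sdd = {
    "consistency": "http://www.example.com/{kb}/consistency",
    "realize": "http://www.example.com/{kb}/realize",
}


def resolve(sdd, operation, **variables):
    """Expand the template the SDD currently advertises for an operation."""
    template = sdd.get(operation)
    if template is None:
        raise LookupError(f"operation not advertised: {operation}")
    return re.sub(r"\{(\w+)\}", lambda m: variables[m.group(1)], template)


print(resolve(sdd, "consistency", kb="wine"))

# If the server later advertises a different URL, client code is unchanged.
sdd["consistency"] = "http://www.example.com/v2/{kb}/consistent"
print(resolve(sdd, "consistency", kb="wine"))
```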
TODO
Administrative
PelletServer provides a few administrative services to make life easier for consumers.
KB List
While the list of managed KBs is available in the service advertisement resource, that resource can be a long, complex document to wade through if all you're looking for is a list of KBs. The KB List service provides an alternate entry-point for hypertext navigation through PelletServer's managed space:
$ wget http://www.example.com/kb-list
It returns a JSON map of URLs to KB identifiers:
{
"http://www.example.com/wine" : "wine",
"http://www.example.com/peoplepets" : "peoplepets",
"http://www.example.com/dbpedia" : "dbpedia",
...
}
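Since the map is keyed by URL, a client that wants to look up a KB by its identifier can simply invert it. A minimal sketch, using the three entries shown above and a function name of our own choosing:

```python
import json

body = json.dumps({
    "http://www.example.com/wine": "wine",
    "http://www.example.com/peoplepets": "peoplepets",
    "http://www.example.com/dbpedia": "dbpedia",
})


def kb_urls_by_id(body):
    """Invert the KB List response into a {kb identifier: URL} map."""
    return {kb_id: url for url, kb_id in json.loads(body).items()}


kbs = kb_urls_by_id(body)
print(sorted(kbs))
print(kbs["wine"])
```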
Namespace Mapper
Because concepts and data are identified in RDF and OWL by
URLs and URIs, which can be arbitrarily long in practice, using
those identifiers as part of service requests can be
awkward. PelletServer supports a namespace “short
name” service that describes short names for some standard
namespace identifiers. The service can be extended via
configuration to also shorten user-specific namespace
identifiers. All of these identifiers are subsequently
recognized (and expanded appropriately) at request-time by all
PelletServer services. This works much like the PREFIX mechanism in SPARQL.
The namespace service is available both globally and on a per-KB basis:
$ wget http://www.example.com/ns-service
Or:
$ wget http://www.example.com/{kb}/ns-service
The service returns short names mapped to namespace identifiers in JSON:
{
"wine": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#",
"rdfs": "http://www.w3.org/2000/01/rdf-schema#",
"cp": "tag:clarkparsia.com,2010-06-21:pelletserver:",
"food": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/food#",
"owl": "http://www.w3.org/2002/07/owl#",
"xsd": "http://www.w3.org/2001/XMLSchema#",
"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
"skos": "http://www.w3.org/2008/05/skos#",
"sparql": "http://www.w3.org/2005/sparql-results#"
}
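On the client side, the same map can be used to expand a short name into a full identifier. The sketch below uses a subset of the response above; the function name is an assumption, and names whose prefix is not in the map are returned unchanged.

```python
namespaces = {
    "wine": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(short_name, namespaces):
    """Expand prefix:local into a full URI using the namespace map."""
    prefix, sep, local = short_name.partition(":")
    if sep and prefix in namespaces:
        return namespaces[prefix] + local
    # Unknown prefix or already a full URI: leave it alone.
    return short_name


print(expand("wine:RedBurgundy", namespaces))
print(expand("http://example.org/thing", namespaces))
```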