Civilization advances by extending the number of important operations which we can perform without thinking about them. Alfred North Whitehead

PelletServer Docs

Chapter One—A Quick Overview of Features

TLDR? PelletServer provides simple access to a rich set of semantic technology services; most services may be accessed via HTTP GET.

Introduction

We present PelletServer's features in three parts: logical, statistical, and data management. We also give examples of the most basic interaction with each service, using wget on the command line. Programmatic access via a PelletServer client in Java, Scala, and JavaScript is covered in Chapter $chapter. All PelletServer services expect information to be encoded in RDF or in OWL. PelletServer supports the usual serialization formats of both RDF and OWL (RDF/XML, Turtle, etc.). Where possible, statistical services take advantage of logical information (e.g., owl:sameAs assertions), and vice versa.

PelletServer's capabilities focus on semantic data management and analysis. They can support both analytic and transactional applications, often ones whose main concern is information integration and analysis.

Multi-Tenancy Knowledge Bases

PelletServer can provide services for an arbitrary number of data sets, which are called “knowledge bases” (KBs, for short). Practically speaking, KBs are RDF graphs or OWL ontologies and are configured at run-time via the PelletServer configuration file. (See Chapter Two for information about configuring PelletServer.) PelletServer “backend services” may run on distributed, non-local systems; and there may be more than one PelletServer frontend, behind an HTTP load balancer or equivalent mechanism. Deployment patterns or strategies are opaque to consumers of PelletServer's REST API, which provides a helpful abstraction from those details. PelletServer doesn't generally support services against arbitrary data pulled off the Web at request-time, owing to the obvious performance limitations of that approach.

Asynchronous Requests

PelletServer supports asynchronous requests via two patterns: Slow REST and WebHooks. If a request contains a WebHook, that pattern is used. If a request does not include a WebHook, and the Slow REST pattern is enabled, it will use that instead. See $chapter-asynch for more about asynchronous requests.

Logical Features

The logical capabilities include query, reasoning, data validation, and automated planning.

SPARQL Query

PelletServer's support for SPARQL query of PelletServer KBs, both RDF and OWL, is essentially an implementation of the SPARQL Protocol. PelletServer may either proxy for an arbitrary SPARQL endpoint, which means it can easily integrate any RDF database that implements the SPARQL Protocol, or be configured to use Pellet's native SPARQL support for querying OWL, including Pellet's extensions: SPARQL-DL and the Terp syntax.

Query services are configured per PelletServer KB and may provide multiple SPARQL endpoints for a single KB. To query a PelletServer KB with SPARQL,

$ wget http://www.example.com/{kb}/query{?query,default-graph-uri,named-graph-uri}
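
When the request is built programmatically, the query text has to be percent-encoded before it goes into the URL. The following is a minimal Python sketch of that step, not a PelletServer client: the helper name sparql_query_url is hypothetical, and the host and KB name are the placeholder values used throughout this chapter.

```python
from urllib.parse import urlencode


def sparql_query_url(base, kb, query, default_graphs=(), named_graphs=()):
    """Build a SPARQL Protocol GET URL for a KB (illustrative helper)."""
    params = [("query", query)]
    # The SPARQL Protocol allows the graph parameters to be repeated.
    params += [("default-graph-uri", g) for g in default_graphs]
    params += [("named-graph-uri", g) for g in named_graphs]
    return f"{base}/{kb}/query?{urlencode(params)}"


url = sparql_query_url("http://www.example.com", "wine", "ASK { ?s ?p ?o }")
print(url)
```

The resulting URL can be passed to wget or to any HTTP library.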

OWL 2 Reasoning

PelletServer's reasoning service is based on OWL 2 automated reasoning provided by Pellet. PelletServer supports consistency checking, concept satisfiability, classification, realization, query, datatype reasoning, modularity, explanation, debugging, repair, SWRL rules, and incremental reasoning.

OWL 2 Reasoner Family

Note: Pellet is less a single reasoner than a family of OWL 2 reasoners: it supports OWL 2 DL, EL, and QL profiles. It provides a common API, called Ortiz, as well as the Jena API for programmatic Java access. Pellet is a family of reasoners because different profiles have different expressivity-performance tradeoffs. The design approach in PelletServer is to configure an appropriate reasoner per knowledge base (i.e., per dataset) but not to expose any of that information in the REST interface. It is a matter of explicit configuration. A PelletServer admin may or may not choose to expose some or all of this OWL 2 profile information via PelletServer service advertisement.

Consistency Checking

To check the logical consistency of a KB:

$ wget http://www.example.com/{kb}/consistency

where {kb} is syntactic shorthand (using URI Templates) for the name of a valid KB under PelletServer management. See Chapter $conf for configuring PelletServer to handle KBs.
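
Most templates in this chapter use two URI Template forms: simple expansion, as in {kb}, and form-style query expansion, as in {?query}. The sketch below expands just those two forms in Python. It is not a full URI Template implementation (for instance, it does not handle the {;name} form), and the function name expand is hypothetical.

```python
import re
from urllib.parse import quote


def expand(template, values):
    """Expand {var} and {?a,b} expressions; undefined variables are skipped."""

    def substitute(match):
        expression = match.group(1)
        if expression.startswith("?"):
            # Form-style query expansion: ?a=1&b=2 for the defined names.
            names = expression[1:].split(",")
            pairs = [
                f"{name}={quote(str(values[name]), safe='')}"
                for name in names
                if name in values
            ]
            return "?" + "&".join(pairs) if pairs else ""
        # Simple expansion: percent-encode everything outside the unreserved set.
        return quote(str(values.get(expression, "")), safe="")

    return re.sub(r"\{([^}]+)\}", substitute, template)


print(expand("http://www.example.com/{kb}/consistency", {"kb": "wine"}))
print(expand("http://www.example.com/{kb}/query{?query}", {"kb": "wine", "query": "ASK {}"}))
```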

Consistency checking returns a boolean value; the default serialization is the SPARQL Query Results XML Format (application/sparql-results+xml), i.e.:

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head></head>
  <boolean>true</boolean>
</sparql>
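
A client has to read the boolean out of the namespaced XML. Here is a minimal sketch using Python's standard library; the function name parse_boolean_result is hypothetical, and the sample body is the response shown above.

```python
import xml.etree.ElementTree as ET

SPARQL_NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_boolean_result(xml_text):
    """Return the boolean carried by a SPARQL XML results document."""
    root = ET.fromstring(xml_text)
    node = root.find("sr:boolean", SPARQL_NS)
    if node is None or node.text is None:
        raise ValueError("no <boolean> element in SPARQL results")
    return node.text.strip().lower() == "true"


response_body = """<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head></head>
  <boolean>true</boolean>
</sparql>"""

print(parse_boolean_result(response_body))
```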

Using client-side content negotiation in the normal way, consistency checking results can also be represented in JSON:

$ wget --header "Accept:application/sparql-results+json" http://www.example.com/{kb}/consistency

The results are as expected:

{"head":{},"boolean":true}

Requesting a results type that is not supported—

$ wget --header="Accept:text/turtle" http://www.example.com/{kb}/consistency

gives the usual response, i.e., 406 Not Acceptable. PelletServer uses HTTP in ways that developers and HTTP libraries expect.
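
The same content negotiation can be done from code. The sketch below uses Python's urllib; the function names are hypothetical, and the host and KB name are placeholders. It builds the request separately from sending it so that the Accept header can be inspected without a running server.

```python
import json
import urllib.error
import urllib.request


def consistency_request(base, kb, accept="application/sparql-results+json"):
    """Build (but do not send) a GET request with an Accept header."""
    return urllib.request.Request(
        f"{base}/{kb}/consistency", headers={"Accept": accept}
    )


def is_consistent(request):
    """Send the request; raise a clear error on 406 Not Acceptable."""
    try:
        with urllib.request.urlopen(request) as response:
            return json.load(response)["boolean"]
    except urllib.error.HTTPError as err:
        if err.code == 406:
            raise ValueError("server cannot produce the requested media type") from err
        raise


request = consistency_request("http://www.example.com", "wine")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Calling is_consistent(request) performs the actual HTTP exchange and assumes the JSON result shape shown earlier.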

PelletServer's service description and discovery capability describes all available PelletServer KBs and the operations on those KBs, including HTTP methods and acceptable resource formats. PelletServer provides hypertext that can be used as the engine of application state. See Chapter $discovery.

Classification

Classification computes the subclass relations between all named classes and arranges them hierarchically.

To retrieve the class tree or hierarchy of a KB,

$ wget http://www.example.com/{kb}/classify

The classification hierarchy resource for a KB may be represented as text/turtle, application/rdf+xml, or text/html.

Realization

Realization computes the direct types for each individual in the KB; note that realization requires classification, since direct types are defined with respect to the class hierarchy. Think of realization as a kind of standing query; that is, you define classes that describe arbitrarily complex things that are of interest in the data (for example, HighCreditRiskCustomer or PersonOfInterest), and the realization process finds all of the individuals in the KB that are instances of those classes, i.e., that are answers to those queries. Where this approach beats plain standing queries is that these classes can be arranged hierarchically in subsumption relationships, reasoned about for logical consistency, etc. Good stuff.

To realize a KB,

$ wget http://www.example.com/{kb}/realize

It supports the same MIME types as classification (Turtle, RDF/XML, and HTML).

Explanation

PelletServer's explanation support provides access to Pellet's explanation service, which, for any inference that Pellet can compute, will explain why that inference holds. The explanation itself is a set of OWL axioms which, taken together, justify or support the inference in question. There may be many (even infinitely many) explanations for an inference; Pellet heuristically attempts to provide a good explanation.

PelletServer's interface for the explanation service is more complex than the other reasoning services; the complexity is largely a matter of additional URLs, each of which represents a kind of inference explanation. The explanation services all return OWL serialized as either RDF/XML or Turtle. We give examples of each kind of explanation service: inconsistency, subclass, property, instance, unsatisfiability, and query.

(TODO: Support for all-unsat or explanation cardinalities...)

To explain an inconsistency,

$ wget http://www.example.com/{kb}/explain/inconsistent

To explain a subclass relationship between a named subclass and a superclass,

$ wget http://www.example.com/{kb}/explain/subclass/{sub}/{super}

Note: PelletServer supports a namespace binding service—basically: syntactic sugar for named classes—so that URIs don't have to be encoded within URLs.

Using the bindings given by the namespace service, we can ask for an explanation for why nasa:Employee is a subclass of foaf:Person:

$ wget http://www.example.com/NASA/explain/subclass/nasa:Employee/foaf:Person
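
To see what the short names save, compare the URL above with one that carries full class URIs as path segments. The Python sketch below builds both. It assumes that PelletServer accepts percent-encoded full URIs in these positions, which this chapter implies but does not spell out. The helper name and the nasa namespace URI are made up for illustration.

```python
from urllib.parse import quote


def explain_subclass_url(base, kb, sub, sup):
    """Build an explain/subclass URL; each class is one encoded path segment."""
    # safe=":" keeps short names such as foaf:Person readable, while "/" and
    # "#" inside full URIs are still percent-encoded.
    segments = [quote(term, safe=":") for term in (sub, sup)]
    return f"{base}/{kb}/explain/subclass/{segments[0]}/{segments[1]}"


short = explain_subclass_url(
    "http://www.example.com", "NASA", "nasa:Employee", "foaf:Person"
)
full = explain_subclass_url(
    "http://www.example.com",
    "NASA",
    "http://example.org/nasa#Employee",  # hypothetical namespace
    "http://xmlns.com/foaf/0.1/Person",
)
print(short)
print(full)
```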

To explain a property relation (that is, why s has value o for property p),

$ wget http://www.example.com/{kb}/explain/property/{subject}/{predicate}/{object}

To explain why an individual is an instance of some named class (that is, why its rdf:type is some class),

$ wget http://www.example.com/{kb}/explain/instance/{instance}/{class}

An unsatisfiable class is one which cannot (logically) have any instances. An unsatisfiable class does not make an OWL ontology inconsistent; but an instance of that class in the ontology makes the ontology inconsistent. In short: an unsatisfiable class is neither necessary nor sufficient to cause an ontology to be inconsistent. Unsatisfiable classes usually indicate some kind of bad modeling.

To explain why a named class is unsatisfiable,

$ wget http://www.example.com/{kb}/unsat/{class}

Finally, all of these explanation resources are syntactic sugar for the query explanation service. The query explanation service gives an explanation for any arbitrary bits of a KB, picked out by a SPARQL query:

$ wget http://www.example.com/{kb}/explain{?query}

(TODO: This explanation query service is going away for a simpler design: POST a graph of triples to a KB and get back explanations of them. Requires multipart response body; or some kind of hybrid JSON-RDF structure that maps explanations and explanandums.)

Modularity

Pellet supports ontology modularity: take an OWL ontology as input and return a set of ontology modules. (Ontology modularity is very roughly equivalent to database sharding.) The set of modules has some interesting properties. First, any inference that can be legally drawn from the full ontology can be legally drawn from the set of ontology modules. We call this safety in the sense that modularizing an ontology is inference-preserving and, hence, a semantically safe operation. Second, the set of modules is economic: each module is as small as possible, while still fully defining the terms or concepts it contains. Finally, the modularity process itself is quite performant, with good worst-case complexity; that is, it's relatively cheap to modularize an ontology of arbitrary size.

Any Pellet-backed KB under management in PelletServer may be modularized, which typically has good results in terms of reasoning performance. These modules, however, are not exposed via PelletServer; they are, strictly speaking, an internal or implementation detail.

PelletServer also supports modularity as a service; it takes an ontology as input and returns a set of modules for further use.

To modularize an ontology, foo.owl:

$ wget --post-file=/tmp/foo.owl http://www.example.com/modularity/

That is, you POST the ontology to the modularity endpoint...TODO

Incremental Reasoning, SWRL Rules, and Datatype Reasoning

PelletServer provides access to Pellet's incremental reasoning, SWRL rules, and datatype reasoning services. However, given the nature of these services, they are configured or accessed a bit differently.

Incremental, Persistent Reasoning

As of version 2.1, Pellet is able to reason incrementally and is able to persist to disk the results (and internal structures) of reasoning, both of which improve reasoning performance. A PelletServer KB may be configured to take advantage of Pellet's incremental and persistent reasoning features; but, presently, those details are not exposed via PelletServer's service advertisement. (Please give us feedback if you have use cases or requirements for exposing that information.)

SWRL Rules Reasoning

SWRL is a de facto standard for rules reasoning in OWL ontologies. If Pellet reasons with an ontology that contains SWRL rules, the rules will be fired and their consequences inferred. See the Pellet FAQ about SWRL for more information.

Datatype Reasoning

As with SWRL rules, Pellet supports sophisticated datatype reasoning that is enabled if an OWL ontology contains axioms or individuals or typed literals that trigger datatype reasoning.

Spatial Reasoning

Pellet has a qualitative spatial reasoning extension (PelletSpatial) that is available in PelletServer. It requires use of SPARQL magic predicates, which correspond to spatial relations. TODO.

Probabilistic Reasoning

Pellet has a probabilistic DL reasoning extension (Pronto)...TODO

Data Validation

PelletServer provides access to data validation services via Pellet's Integrity Constraint Validator (ICV) extension. ICV adds closed world semantics to OWL. Based on those semantics, an OWL ontology may be used to validate some RDF data, that is, to check that data for integrity constraint violations. So ICV treats OWL as a schema language for RDF and Linked Data.

PelletServer supports two modes of ICV operation:

  1. Validate a KB with a pre-defined constraints ontology
    $ wget http://www.example.com/{kb}/validate
  2. Validate a KB with an arbitrary constraints ontology pulled off the Web
    $ wget http://www.example.com/{kb}/validate{;icv-ontology}

Note: PelletServer generally does not pull arbitrary data from the Web at request-time. However, this is a notable exception, since ICV ontologies will typically be quite small and may be useful even when published by third parties.

Explanation of Integrity Constraint Violations

One of the advantages of treating constraint validation as a kind of OWL reasoning is that Pellet's explanation facility can be used to explain integrity constraint violations, which can be an aid to manual or automated data cleansing, etc. TODO

Automated Planning

HotPlanner is Clark & Parsia's mature, featureful, domain-independent HTN Planner integrated with Pellet; the planner's state is maintained as an OWL ontology, with access to Pellet's reasoning services. HotPlanner supports OWL-based open world reasoning via Pellet; constrained asset management; scheduling and planning for parallel or concurrent task execution; and multi-objective plan optimization.

TODO

$ wget http://www.example.com/{kb}/plan

The result is...

Statistical Features

The statistical capabilities of PelletServer include semantic search, machine learning, and natural language processing.

Semantic Search

PelletServer's semantic search capability uses information retrieval technology together with RDF and OWL reasoning to index KB individuals (i.e., instances of classes) and the values of RDF literals, rather than documents (though it can index documents, too). Since there isn't a single useful notion of how to partition the statements about an individual, PelletServer's semantic search uses SPARQL DESCRIBE queries to index a KB's individuals. We've implemented the DESCRIBE query form using six different algorithms, which are configurable at index-time; these algorithms provide semantic search that's tunable for particular KBs. See $chapter-configuration for details about these semantic search indexing strategies.

The result of this approach is that semantic search results are pointers (i.e., URIs or URLs) to individuals in the KB, rather than to documents. Since they are typically semantically-typed (either explicitly or via inference), they are more semantically precise than document-based results.

To make use of the semantic search engine,

$ wget http://www.example.com/wine/search{?search}

For example,

$ wget http://www.example.com/wine/search?search=red+wine

The results type is JSON:

[
  {"hit":{"value":"http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#MariettaOldVinesRed","type":"uri"},
          "score":10.54171085357666},
  {"hit":{"value":"http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#Red","type":"uri"},
          "score":7.4853034019470215},
  {"hit":{"value":"http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#RedBurgundy","type":"uri"},
          "score":7.367768287658691},...
]
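
Because each hit is a URI with a relevance score, a client can filter and rank results directly. This is a minimal Python sketch, assuming the response is a complete JSON array of the shape shown above. The sample body repeats the three hits with rounded scores, and the function name hits_above is hypothetical.

```python
import json

# Sample body in the shape shown above (three hits, scores rounded).
body = """[
  {"hit": {"value": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#MariettaOldVinesRed", "type": "uri"}, "score": 10.54},
  {"hit": {"value": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#Red", "type": "uri"}, "score": 7.49},
  {"hit": {"value": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#RedBurgundy", "type": "uri"}, "score": 7.37}
]"""


def hits_above(body, min_score):
    """Return (uri, score) pairs scoring at least min_score, best first."""
    results = json.loads(body)
    pairs = [
        (item["hit"]["value"], item["score"])
        for item in results
        if item["hit"]["type"] == "uri" and item["score"] >= min_score
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


for uri, score in hits_above(body, 7.4):
    # Print only the local name, i.e. the fragment after '#'.
    print(uri.rsplit("#", 1)[-1], score)
```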

Machine Learning

PelletServer's machine learning capabilities are provided by Corleone, our statistical inference system specifically for RDF and OWL (i.e., relational) data. It supports prediction (predicting the value of an individual's property); clustering (arranging individuals into related groups); and similarity (finding individuals similar to a given one).

Note: In the machine learning literature prediction is often called classification, but we call it “prediction” to avoid confusion with classification as an OWL reasoning service.

To classify (i.e., predict) the value of a property p for some individual,

$ wget http://www.example.com/{kb}/classify/{individual}/{property}

The individual may be specified in one of two ways:

  1. By the URI or URL representing the individual resource.
  2. By the value of an inverse functional property (IFP; that is, a property that uniquely identifies an individual). For example, a part_number property in a supply chain management application might be inverse functional. Modulo privacy concerns with using Social Security numbers to identify people in the US, SSN is inverse functional.

When you want to identify an individual for prediction using an IFP, you must specify both the property and the property value; for example, supply:part_number=123ABC.

The prediction property is specified using the {namespace}:{property_name} syntax; alternatively, the full URI or URL may be used.

To cluster the instances of a type,

$ wget http://www.example.com/{kb}/cluster/{number}/{type}

Note that {number} is the number of clusters you want, not the number of individuals in a cluster. Also, the {type} argument is optional; if it's not specified, then the entire KB is clustered into {number} clusters.

Finally, to infer similarity for an individual,

$ wget http://www.example.com/{kb}/similar/{individual}/{number}

An individual is identified in one of the two ways described previously; {number} is the number of similar individuals to return.
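
The clustering and similarity URLs can be assembled with small helpers. The Python sketch below is illustrative only: the function names are hypothetical, and it assumes that the optional {type} segment is simply left off the path when the whole KB is clustered, which this chapter does not confirm.

```python
def cluster_url(base, kb, number, type_name=None):
    """Build a cluster URL; number is the count of clusters requested."""
    if number < 1:
        raise ValueError("number of clusters must be at least 1")
    url = f"{base}/{kb}/cluster/{number}"
    # Assumption: an omitted type means the trailing segment is dropped.
    return f"{url}/{type_name}" if type_name else url


def similar_url(base, kb, individual, number):
    """Build a similarity URL; number is how many similar individuals to ask for."""
    return f"{base}/{kb}/similar/{individual}/{number}"


print(cluster_url("http://www.example.com", "wine", 5, "wine:Wine"))
print(cluster_url("http://www.example.com", "wine", 5))
print(similar_url("http://www.example.com", "wine", "wine:RedBurgundy", 10))
```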

TODO: Specify defaults, return types, finalize the syntax for IFPs.

Natural Language Processing

TODO

(document clustering, keyword extraction, key sentence or document 'sense' extraction, content extraction, etc.)

Data Management Features

The final group of capabilities includes transactional update, service description and discovery, and administrative services.

Transactional Update

TODO

Service Description & Discovery

The first point to make (and we'll consider it in more depth later on) is that the contract between PelletServer and its consumers concerns the service description and discovery (SDD) resource, not particular URLs as described by URI Templates in the SDD. In other words, the SDD decouples consumers from URLs by describing them dynamically and accurately. The contract is that PelletServer will serve resource representations as described in the SDD; hence, consumer clients should dynamically construct an API from the SDD. Put another way: the URLs described by URI Templates in the SDD may change. While we consider it best practice that they not change, a pragmatic approach to reality suggests that they will change.

Note: Even PelletServer deployments that never change an extant URL may always add additional services or KBs; that kind of change won't break even tightly coupled systems, but it may mean that new data or services will be unused and unknown to consumers.

The practical consequence of this point is that PelletServer clients and other consumers should check the SDD resource to ensure that the URLs and resources it describes are still in play; they should construct their APIs such that code is not tightly coupled to those URLs. PelletServer clients should use the SDD hypermedia as the engine of application state; that is, dereference and parse it dynamically (subject to its caching and freshness metadata) in order to use its most current version.

TODO

Administrative

PelletServer provides a few administrative services to make life easier for consumers.

KB List

While the list of managed KBs is available in the service advertisement resource, that resource can be a long, complex document to work through if all you're looking for is a list of KBs. The KB List service provides an alternate entry-point for hypertext navigation through PelletServer's managed space:

$ wget http://www.example.com/kb-list

It returns a JSON map of URLs to KB identifiers:

{
     "http://www.example.com/wine" : "wine",
     "http://www.example.com/peoplepets" : "peoplepets",
     "http://www.example.com/dbpedia" : "dbpedia",
     ...
}
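
Clients usually want to look up a KB's URL by its identifier, which is the reverse of the map the service returns. This is a minimal Python sketch, assuming the response is a well-formed JSON object of URL-to-identifier pairs. The sample body and the function name kb_urls_by_id are illustrative.

```python
import json

# Sample response body with the three KBs used in this chapter.
body = """{
  "http://www.example.com/wine": "wine",
  "http://www.example.com/peoplepets": "peoplepets",
  "http://www.example.com/dbpedia": "dbpedia"
}"""


def kb_urls_by_id(body):
    """Invert the URL-to-identifier map so KBs can be looked up by name."""
    return {kb_id: url for url, kb_id in json.loads(body).items()}


kbs = kb_urls_by_id(body)
print(sorted(kbs))
print(kbs["wine"])
```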

Namespace Mapper

Because concepts and data are identified in RDF and OWL by URLs and URIs, which can be arbitrarily long in practice, using those identifiers as part of service requests can be awkward. PelletServer supports a namespace “short name” service that describes short names for some standard namespace identifiers. The service can be extended via configuration to also shorten user-specific namespace identifiers. All of these identifiers are subsequently recognized (and expanded appropriately) at request-time by all PelletServer services. The exception is SPARQL query answering, which ignores PelletServer short names in favor of the standard SPARQL PREFIX mechanism.

The namespace service is available both globally and on a per-KB basis:

$ wget http://www.example.com/ns-service

Or:

$ wget http://www.example.com/{kb}/ns-service

The service returns short names mapped to namespace identifiers in JSON:

{
  "wine": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "cp": "tag:clarkparsia.com,2010-06-21:pelletserver:",
  "food": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/food#",
  "owl": "http://www.w3.org/2002/07/owl#",
  "xsd": "http://www.w3.org/2001/XMLSchema#",
  "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "skos": "http://www.w3.org/2008/05/skos#",
  "sparql": "http://www.w3.org/2005/sparql-results#"
}
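
A client can use the same map to expand short names on its own side, for example to compare a short name with a full URI found in query results. The Python sketch below is illustrative: the function name expand_short_name is hypothetical, and the namespace map is a subset of the response above.

```python
# Subset of the namespace map returned by the service.
namespaces = {
    "wine": "http://www.w3.org/TR/2003/PR-owl-guide-20031209/wine#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand_short_name(name, namespaces):
    """Expand prefix:local to a full URI; return other names unchanged."""
    prefix, separator, local = name.partition(":")
    if separator and prefix in namespaces:
        return namespaces[prefix] + local
    # Unknown prefixes and full URIs (whose "prefix" is e.g. "http") pass through.
    return name


print(expand_short_name("wine:Red", namespaces))
print(expand_short_name("http://www.w3.org/2002/07/owl#Thing", namespaces))
```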

Notes