Semantic Web Technologies in Practice
Bijan Parsia
Semantic Web Technologies
- "The Semantic Web"
- Languages
- Infrastructure
- Parsers & APIs
- Reasoners and query engines
- Debuggers, model and module extractors, diffing, profiling...
- Editors, browsers, visualizers
- Applications!
About me
- Founder and Chief Scientist at Clark & Parsia, LLC
- 3 employees + founders (and growing)
- Customers include NASA, NCI, NATO, defense and telecom companies
- Custom applications, research & development, tool infrastructure
- Lecturer at University of Manchester
- Standards Vet
- OWL, WSDL, WS-Policy, SPARQL, OWL 1.1, RIF, OWL-S, SWSI, etc.
- Philosopher in Exile
A Simple Taxonomy
- Needle in the Haystack
- Known vs. Unknown Needle
- Building the Stack
- Analysis
- Terminology Engineering
Garlik.com
- (Known) Needle in the Haystack
- UK-based tech startup
- “give people real power over their online data”
- $18.5m in venture capital
- Incorporates members from the 3Store team
- DataPatrol
- Reports on personal information online
- Uses SPARQL to build these reports
- Currently 57,000 users!
- Key developer, Steve Harris, member of DAWG
Garlik: Tech details
- Reports
- 500-2000 SPARQL queries to build a report
- Often recursive, i.e., using prior results to find next ones
- 8 knowledge bases of 2 billion triples each
- Reports take 1-2 seconds to generate
- Query characteristics
- Lots of GRAPH and OPTIONAL
- Results
- Uses the SPARQL XML results format but not the SPARQL protocol (for performance)
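To make "using prior results to find next ones" concrete, here is a minimal sketch in Python. It is not Garlik's code: the triples, the `match` helper and the `build_report` function are all invented for illustration, and a toy in-memory set stands in for the SPARQL store.

```python
from collections import deque

# Toy triple store of (subject, predicate, object). All data is invented.
TRIPLES = {
    ("person:alice", "hasEmail", "mailto:alice@example.org"),
    ("mailto:alice@example.org", "mentionedIn", "doc:forum-post-1"),
    ("doc:forum-post-1", "mentions", "phone:555-0100"),
    ("phone:555-0100", "mentionedIn", "doc:directory-7"),
}


def match(s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a variable."""
    return [
        t for t in TRIPLES
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def build_report(seed, max_queries=2000):
    """Issue one query per discovered node, feeding results into later queries."""
    seen = {seed}
    queue = deque([seed])
    report = []
    queries = 0
    while queue and queries < max_queries:
        node = queue.popleft()
        queries += 1  # each match() call stands in for one SPARQL query
        for triple in match(s=node):
            report.append(triple)
            if triple[2] not in seen:
                # A prior result becomes the subject of a later query.
                seen.add(triple[2])
                queue.append(triple[2])
    return sorted(report), queries


report, query_count = build_report("person:alice")
```

The `max_queries` cap mirrors the slide's upper bound of 2000 queries per report; a real system would presumably also bound depth and filter by predicate.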
A Building Block: JSpace
- Front ends critical!
- Flexible browsing
- The mSpace approach developed at U. of Southampton
- “Google meets iTunes”
- Browsing rather than querying
- Tame multi-dimensional data
- Selections drive query building
- Each column selection instantiates a variable and adds some conjuncts
- One can browse intermediate results
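A rough sketch of how column selections might drive query building, in Python. This is an illustration of the idea only, not JSpace's implementation: `build_query`, the `ex:` prefix and the column names are assumptions made up for the example.

```python
def build_query(columns, selections):
    """Build a SPARQL query string from browser columns.

    columns: ordered list of (variable_name, predicate) pairs, one per column.
    selections: dict mapping a variable name to the chosen SPARQL term.
    """
    patterns = []
    for var, predicate in columns:
        # A selected column instantiates its variable; others stay open.
        term = selections.get(var, f"?{var}")
        patterns.append(f"?item {predicate} {term} .")
    open_vars = [f"?{var}" for var, _ in columns if var not in selections]
    head = " ".join(["?item"] + open_vars)
    body = "\n  ".join(patterns)
    return (
        "PREFIX ex: <http://example.org/>\n"  # hypothetical namespace
        f"SELECT DISTINCT {head}\n"
        f"WHERE {{\n  {body}\n}}"
    )


columns = [("genre", "ex:genre"), ("artist", "ex:artist")]
query = build_query(columns, {"genre": '"Jazz"'})
print(query)
```

Because every column contributes a conjunct whether or not it is selected, running the query with only some columns selected returns the intermediate results the user can keep browsing.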
POPS (a jSpace app)
- (Unknown) Needle in the Haystack
- Expertise location service for NASA
- Serendipity is key
- Hence browsing over search
- Federates 4 diverse data sources
- Most queries are built by browsing
- Fixed queries for info pane and socnet
- Pilot for Office of the Chief Engineer
- Production will see 10,000 users
A POPS Note
- Some simple advantages of RDF(S)
- Object model helpful
- URIs as OIDs
- Attributes and Values
- Links!
- Class model useful
- If only for consistency
- E.g., faced with ad hoc inheritance models
- Standard solutions often better
- Exploit modeling affordances
BIANCA
- Business Impact Analysis for Network Computer Assets
- Integrated view of applications, servers, networks, and changes, and their relations
- Supports interruption analysis
- Sensitive data, so few users (~50) but high impact
- One of the first deployed SemWeb Apps at NASA
- Tech details
- Classification tree, instance retrieval, graph building
BIANCA
- Interruption analysis
- "Semantically" Shallow Analysis
- I.e., graph tracing to find dependencies
- Simply making the data accessible is important!
- Expansion paths
- "Deeper" analyses
- Interaction with security policies
- Incorporating background knowledge
- Build more complex organization model
- Build more complex system/application model
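The "shallow" interruption analysis above amounts to tracing a dependency graph. A minimal Python sketch follows, assuming a simple `DEPENDS_ON` map; the asset names and the `impacted_by` function are invented and do not reflect BIANCA's actual data model.

```python
from collections import deque

# Hypothetical asset graph: each key depends on the assets in its list.
DEPENDS_ON = {
    "app:timesheets": ["server:web01", "server:db01"],
    "app:badging": ["server:db01"],
    "server:web01": ["network:lan-a"],
    "server:db01": ["network:lan-b"],
}


def impacted_by(asset):
    """Return every asset that directly or transitively depends on `asset`."""
    # Invert the edges so we can walk from an asset to its dependents.
    dependents = {}
    for source, targets in DEPENDS_ON.items():
        for target in targets:
            dependents.setdefault(target, set()).add(source)

    seen = set()
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return seen


print(sorted(impacted_by("network:lan-b")))
```

Asking what an interruption of `network:lan-b` touches is then a single call; the "deeper" analyses listed above would need more than reachability.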
("Semantically") Deep Analysis
- XACML, WS-Policy, etc.
- Languages for expressing policy constraints
- Systems generally focus on enforcement
- Policy development and maintenance need:
- Services explored by reduction to formalisms
- Such as OWL
- Also suggestive of methodological points
- Iterative refinement
- Auto-discovery of cross-cutting concerns
Policy Analysis
- A policy denotes a set of "acceptable" things
- By describing them
- Thus a policy is a kind of class
- Relations between classes are (interesting) relations between policies
- Disjointness, equivalence, subsumption, etc.
- Background knowledge
- Some aspects stretch languages like OWL
- Non-mon, builtins (math and XML)
- Production/Business rule like behavior
- Change analysis and querying
- Currently working on prototype for NASA
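The "policy as class" idea can be sketched in Python by treating a policy as the set of requests it accepts. This is only a finite stand-in: an OWL reasoner decides these relations from the descriptions themselves, whereas the sketch enumerates a small invented domain. The attributes, policies and the `relation` function are all assumptions.

```python
from itertools import product

# Invented finite domain of requests: every (role, encrypted) combination.
DOMAIN = [
    {"role": role, "encrypted": encrypted}
    for role, encrypted in product(("admin", "staff", "guest"), (True, False))
]


def extension(policy):
    """The set of requests (by index) that the policy accepts."""
    return frozenset(i for i, request in enumerate(DOMAIN) if policy(request))


def relation(p, q):
    """Name the class-level relation between two policies."""
    a, b = extension(p), extension(q)
    if a == b:
        return "equivalent"
    if a < b:
        return "subsumed by"
    if a > b:
        return "subsumes"
    if not (a & b):
        return "disjoint"
    return "overlapping"


def admins_only(request):
    return request["role"] == "admin"


def encrypted_admins(request):
    return request["role"] == "admin" and request["encrypted"]


def guests_only(request):
    return request["role"] == "guest"


print(relation(encrypted_admins, admins_only))
print(relation(admins_only, guests_only))
```

A tightened policy showing up as "subsumed by" its predecessor is the kind of result change analysis would look for.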
Terminologies
- Bio-ontologies, esp. bio-medical ontologies, have long been the most significant application area
- SNOMED-CT, GO, Galen, FMA, NCI Thesaurus, etc.
- Some characteristics
- Very large; largeness is the watchword
- Collaboratively developed
- Long-lived
- Often complex modeling
- Applications (and application characteristics) vary enormously
- Better infrastructure is critical!
Terminology Development
- NCI Thesaurus
- 50,000 concepts
- Team of up to 20
- Provide terminology for application builders
- Recently migrated to open source, standards-based tools
- Protégé, FaCT++, and (more recently) Pellet
- Heavy investment in tool infrastructure
- Most recently explanation services and incremental reasoning
- Critical to both speeding up development and improving quality
A Key Building Block: Pellet
- Pellet is a popular, open source OWL reasoner
- First to cover the entire language, and OWL 1.1
- Includes novel services
- Debugging, incremental reasoning, conjunctive query
- Good middleware
- Jena, OWL API interfaces
- Command line, web service
- Bundled with many editors
- TopBraid Composer, Protégé 4, Swoop
- Originally developed at the University of Maryland
- Now actively developed by C&P
- Original author, Evren Sirin, directs development
Performance...
Performance is an ongoing issue!
“Without a solid DL foundation, the Semantic Web would have remained largely irrelevant to health care terminology standardization... The development of OWL 1.1 eliminated one of the most significant barriers to use of OWL for SNOMED, since it permits the identification of tractable sublanguages capable of handling the size and complexity of SNOMED.”
Kent Spackman, An Examination of OWL and the Requirements of a Large Health Care Terminology, OWLED 2007
...and Expressivity
People need to be able to say what they want to say.
“[A]dding property chain inclusion axioms ... was essential. Without it, adoption of OWL by the SNOMED community would have required awkward workarounds with their attendant complications and complexities – effectively killing movement in that direction. With it, we have a clear path to using OWL 1.1 for further development and integration with other biomedical ontologies.”
Kent Spackman, An Examination of OWL and the Requirements of a Large Health Care Terminology, OWLED 2007
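For readers unfamiliar with property chain inclusion axioms, the Python sketch below shows what one licenses: if `first` followed by `second` implies `implied`, new facts can be derived by composing existing ones. The `apply_chain` function, the property names and the anatomy facts are all invented for illustration and are not taken from SNOMED.

```python
def apply_chain(facts, first, second, implied):
    """Saturate facts under the axiom: first o second subPropertyOf implied."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        # Compose every `first` edge with every `second` edge that follows it.
        derived = {
            (a, implied, c)
            for (a, p, b) in facts if p == first
            for (b2, q, c) in facts if q == second and b2 == b
        }
        if not derived <= facts:
            facts |= derived
            changed = True
    return facts


# Invented facts; the axiom used is: locatedIn o partOf subPropertyOf locatedIn
facts = {
    ("Fracture1", "locatedIn", "FemurShaft"),
    ("FemurShaft", "partOf", "Femur"),
    ("Femur", "partOf", "Leg"),
}
closed = apply_chain(facts, "locatedIn", "partOf", "locatedIn")
```

Without such an axiom, each derived location would have to be asserted by hand or encoded indirectly, which is presumably the kind of workaround the quote refers to.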
Patient Data Management
- UK NHS has a £6.2 billion “Connecting for Health” IT program
- Key component is Care Records Service (CRS)
- “interactive patient record service accessible 24/7”
- Patient data distributed across local centers in 5 regional clusters, and a national DB
- Detailed records held by local service providers
- Diverse applications support radiology, pharmacy, etc.
- Applications exchange messages containing “semantically rich clinical information”
- Summaries sent to national database
- SNOMED-CT ontology provides common vocabulary for data
SNOMED
- Large: 373,731 concepts & over 1 million terms
- Language used corresponds to EL++ fragment of OWL 1.1
- NHS version extended to 542,380 classes with
- 19,828 additional named classes
- 148,821 class drug taxonomy (primitive hierarchy)
- OWL reasoner (FaCT++) classified NHS ontology
- Able to classify whole ontology in <4 hours
- Interesting results come from 19,828 additional named classes
- 180 missing subClass relationships were found, e.g.:
- Periocular_dermatitis subClassOf Disease_of_face
SNOMED Post Coordination
- Vocabulary is extensible at point of use: “post coordination”
- Users (e.g. clinicians) may enter novel class descriptions
- Terminology service (reasoner) used to classify descriptions
- Typical new term might be “allergy caused by almond”
- OWL reasoner (FaCT++) used to classify new term
- Able to perform classification in <10 ms
- Classified as a kind of “nut allergy”
- Clearly crucial to recognize patients with an allergy caused by almond as patients with a nut allergy
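The almond example can be mimicked with a toy structural subsumption check in Python. This is not how FaCT++ works; it is a simplified sketch for conjunctions of named classes and existential restrictions over a primitive hierarchy. The hierarchy, the `causedBy` role and both function names are assumptions.

```python
# Invented primitive hierarchy: class name -> direct superclasses.
PARENTS = {
    "Almond": {"Nut"},
    "Nut": {"Food"},
    "Food": set(),
    "Allergy": {"Disorder"},
    "Disorder": set(),
}


def ancestors(name):
    """The class itself plus all of its named superclasses."""
    result, stack = {name}, [name]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in result:
                result.add(parent)
                stack.append(parent)
    return result


def subsumed_by(c, d):
    """True if description c is a kind of description d.

    A description is (named_classes, restrictions), where restrictions is a
    set of (role, filler_class) pairs read as "some role filler".
    """
    c_names, c_roles = c
    d_names, d_roles = d
    c_ancestors = set().union(*(ancestors(n) for n in c_names))
    names_ok = d_names <= c_ancestors
    roles_ok = all(
        any(role == d_role and d_filler in ancestors(filler)
            for role, filler in c_roles)
        for d_role, d_filler in d_roles
    )
    return names_ok and roles_ok


almond_allergy = ({"Allergy"}, {("causedBy", "Almond")})
nut_allergy = ({"Allergy"}, {("causedBy", "Nut")})
print(subsumed_by(almond_allergy, nut_allergy))
```

The check succeeds because `Almond` has `Nut` among its ancestors; the converse fails, so a generic nut allergy is not classified under almond allergy.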
Online Self-Medication Advice
- Self-medication is pervasive, but can be hazardous
- 180 deaths in the USA in 2006
- French project to provide on-line advice
- Will be made available to 20 million customers of French health insurance companies
- Patients have their own simple health care record (SEHR)
- Diagnosis system considers symptom descriptions, SEHR, Q&A and self-medication KB
- Uses OWL reasoner to advise on treatment, and check for contra-indications, side-effects, etc.
- E.g., do not take x if patient suffers from y; side-effects may include z
Online Self-Medication Advice
- Data taken from drug terminologies, e.g.:
- European Pharmaceutical Market Research Association (EphMRA)
- Anatomical Therapeutic Chemical (ATC)
- Data transformed into OWL ontology
- Reasoner used to check and enhance ontology
- OWL reasoner also used to check and enhance data
- Combined with induction and interaction with expert
- Corrected missing/incorrect information on interactions, contra-indications, allergies, side-effects, etc.
- Quality of data improved by 8%
OWL Community
- OWL: Experiences and Directions (OWLED) Workshops
- Users, Implementors, Theorists working to advance the state of things
- OWLED 2007 had >90 participants, >40 papers
- OWLED 2005-2006 yielded OWL 1.1 (a W3C submission)
- OWL WG charter just went to AC!
- Task forces
New OWL WG
- Strong support for extending OWL
- E.g., Siemens, Partners Healthcare, Oracle, webMethods, SRI, (IBM!) etc.
- Balance disruption against (perceived) stagnation
- Stability is good
- But don't want to seem moribund!
- OWL has stimulated a lot of interest in the technology
- E.g., Pellet wouldn't exist, if not for OWL
(Some) Other Application Areas
- Form Management
- See the Galen Experience
- Siemens picking this up
- Should apply to areas other than healthcare!
- Configuration management
- Data integration
- Conceptual Modeling
Thanks
- To Steve Harris for the Garlik.com information.
- To Chris Wroe for the CRS information and Ian Horrocks for the slides.
- To Ian Horrocks for the Online Self-Medication Advice slides.
- To Andrew Schain and Kendall Clark for BIANCA and POPS discussion.
About Clark & Parsia, LLC
DC-based R&D firm specializing in Semantic Web, web services, and advanced AI technologies for federal and enterprise customers.
http://clarkparsia.com/