Figure 7.1 Semantic Web knowledge management architecture
7.5 On-To-Knowledge Semantic Web Architecture
Building the Semantic Web involves not only using the new languages described in this book, but also a rather different style of engineering and a rather different approach to application integration. To illustrate this, we describe in this section how a number of Semantic Web-related tools can be integrated in a single lightweight architecture using Semantic Web standards to achieve interoperability between independently engineered tools (see figure 7.1).
7.5.1 Knowledge Acquisition
At the bottom of figure 7.1 we find tools that use surface analysis techniques to obtain content from documents. These can be either unstructured natural language documents or structured and semistructured documents (such as HTML tables and spreadsheets).
In the case of unstructured documents, the tools typically use a combination of statistical techniques and shallow natural language technology to extract key concepts from documents.
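As a rough illustration of the statistical side of such extraction, the following minimal sketch ranks terms by raw frequency after removing stop words. It is not the technique used by any particular tool; the function name `key_concepts`, the tiny stop-word list, and the sample text are all assumptions made for this example, and real tools combine far richer statistics with linguistic analysis.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; real tools use far larger ones.
STOP_WORDS = {"the", "a", "an", "of", "and", "or", "to", "in", "is", "are",
              "for", "on", "with", "by"}


def key_concepts(text, top_n=3):
    """Return the top_n most frequent terms that are not stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return [term for term, _ in counts.most_common(top_n)]


# Made-up sample text, used only to exercise the function.
sample = (
    "The ontology describes energy suppliers. Energy suppliers sell energy "
    "to customers, and the ontology relates customers to suppliers."
)
concepts = key_concepts(sample)
print(concepts)
```

Frequency counting alone yields a flat list of candidate terms; arranging them into a concept hierarchy requires further analysis.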
In the case of more structured documents, the tools use techniques such as wrappers, induction, and pattern recognition to extract the content from the weak structures found in these documents.
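To make the idea of a wrapper concrete, here is a minimal hand-written sketch that turns the rows of an HTML table into records keyed by the header row. It uses only Python's standard `html.parser` module. The class `TableWrapper`, the function `table_to_records`, and the sample table are hypothetical names and data chosen for illustration; an induced wrapper would be learned from examples rather than coded by hand.

```python
from html.parser import HTMLParser


class TableWrapper(HTMLParser):
    """Collect the cell text of every table row as a list of strings."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def table_to_records(html):
    """Treat the first row as a header and return one dict per later row."""
    parser = TableWrapper()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Made-up sample table, used only to exercise the wrapper.
html = """
<table>
  <tr><th>Supplier</th><th>Country</th></tr>
  <tr><td>AlphaPower</td><td>Sweden</td></tr>
  <tr><td>BetaGrid</td><td>Norway</td></tr>
</table>
"""
records = table_to_records(html)
print(records)
```

Each resulting record maps a column name to a cell value, which is already close to the property-value pairs that an RDF description of a row would contain.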
7.5.2 Knowledge Storage
The output of the analysis tools is sets of concepts, organized in a shallow concept hierarchy with at best very few cross-taxonomical relationships.
RDF and RDF Schema are sufficiently expressive to represent the extracted information.
Besides simply storing the knowledge produced by the extraction tools, the repository must of course provide the ability to retrieve this knowledge, preferably using a structured query language such as those discussed in chapter 3. Any reasonable RDF Schema repository will also support the RDF model theory, including deduction of class membership based on domain and range definitions, and deriving the transitive closure of the subClassOf relationship.
Note that the repository will store both the ontology (class hierarchy, property definitions) and the instances of the ontology (specific individuals that belong to classes, pairs of individuals between which a specific property holds).
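The kind of deduction expected from such a repository can be sketched in a few lines. The following is a naive forward-chaining illustration, not how any particular repository is implemented: it keeps schema and instance triples in one set and applies the subClassOf transitivity, type inheritance, domain, and range rules until nothing new is derived. The function name `rdfs_closure` and the `ex:` vocabulary are assumptions invented for this example.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
DOMAIN = "rdfs:domain"
RANGE = "rdfs:range"


def rdfs_closure(triples):
    """Apply four RDFS-style rules until no new triple is derived."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                # Transitivity: A subClassOf B, B subClassOf C => A subClassOf C.
                if p == SUBCLASS and p2 == SUBCLASS and o == s2:
                    new.add((s, SUBCLASS, o2))
                # Inheritance: x type A, A subClassOf B => x type B.
                if p == RDF_TYPE and p2 == SUBCLASS and o == s2:
                    new.add((s, RDF_TYPE, o2))
                # Domain: x P y, P domain C => x type C.
                if p2 == DOMAIN and p == s2:
                    new.add((s, RDF_TYPE, o2))
                # Range: x P y, P range C => y type C.
                if p2 == RANGE and p == s2:
                    new.add((o, RDF_TYPE, o2))
        if new <= inferred:
            return inferred
        inferred |= new


# The ontology: class hierarchy and property definitions (hypothetical names).
schema = {
    ("ex:Professor", SUBCLASS, "ex:AcademicStaff"),
    ("ex:AcademicStaff", SUBCLASS, "ex:Staff"),
    ("ex:teaches", DOMAIN, "ex:AcademicStaff"),
    ("ex:teaches", RANGE, "ex:Course"),
}
# The instances: a pair of individuals between which a property holds.
instances = {("ex:alice", "ex:teaches", "ex:logic101")}

closure = rdfs_closure(schema | instances)
for triple in sorted(closure - schema - instances):
    print(triple)
```

The nested loop is quadratic in the number of triples on every pass, which is acceptable for a sketch but not for a repository of realistic size; practical stores use indexed and incremental inference instead.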
7.5.3 Knowledge Maintenance
Besides basic storage and retrieval functionality, a practical Semantic Web repository will have to provide functionality for managing and maintaining the ontology: change management, access and ownership rights, and transaction management.
Besides lightweight ontologies that are automatically generated from unstructured and semistructured data, there must be support for human engineering of much more knowledge-intensive ontologies. Sophisticated editing environments must be able to retrieve ontologies from the repository, allow a knowledge engineer to manipulate them, and place them back in the repository.
7.5.4 Knowledge Use
The ontologies and data in the repository are to be used by applications that serve an end user. We have already described a number of such applications.
7.5.5 Technical Interoperability
In the On-To-Knowledge project,14 the architecture of figure 7.1 was implemented with very lightweight connections between the components. Syntactic interoperability was achieved because all components communicated in RDF. Semantic interoperability was achieved because all semantics was expressed using RDF Schema. Physical interoperability was achieved because all communications between components were established using simple HTTP connections, and all but one of the components (the ontology editor) were implemented as remote services. When operating the On-To-Knowledge system from Amsterdam, the ontology extraction tool, running in Norway, was given a London-based URL of a document to analyze; the resulting RDF and RDF Schema were uploaded to a repository server running in Amersfoort (the Netherlands). These data were then loaded into a locally installed ontology editor and, after editing, stored back on the Amersfoort server. The data were then used to drive a Swedish ontology-based Web site generator (see the EnerSearch case study in chapter 6), as well as a U.K.-based search engine, both displaying their results in the browser on the screen in Amsterdam.
In summary, all these tools were running remotely, were independently engineered, and relied only on HTTP and RDF to obtain a high degree of interoperability.
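A minimal sketch of this style of integration is given below: one component serializes its triples as text, wraps them in a plain HTTP request, and the receiving component parses the same text back into triples. The choice of N-Triples as the serialization is ours, made for brevity, and is not a statement about the formats the project components actually exchanged. The function names and all `example.org` URLs are hypothetical, and the request is only constructed, not sent.

```python
import re
import urllib.request


def to_ntriples(triples):
    """Serialize (subject, predicate, object) IRI triples as N-Triples text."""
    return "".join(f"<{s}> <{p}> <{o}> .\n" for s, p, o in sorted(triples))


def from_ntriples(text):
    """Parse the IRI-only N-Triples subset produced by to_ntriples."""
    pattern = re.compile(r"<([^>]*)> <([^>]*)> <([^>]*)> \.")
    matches = (pattern.fullmatch(line) for line in text.splitlines())
    return {m.groups() for m in matches if m}


def build_upload(repository_url, triples):
    """Prepare (but do not send) an HTTP POST carrying the triples."""
    body = to_ntriples(triples).encode("utf-8")
    return urllib.request.Request(
        repository_url,
        data=body,
        headers={"Content-Type": "application/n-triples"},
        method="POST",
    )


# Hypothetical output of an extraction tool.
triples = {
    ("http://example.org/doc1",
     "http://example.org/mentions",
     "http://example.org/Energy"),
}

# Hypothetical repository endpoint; nothing is sent over the network here.
request = build_upload("http://repository.example.org/statements", triples)

# What a receiving component would recover from the request body.
received = from_ntriples(request.data.decode("utf-8"))
print(received == triples)
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual upload. The point of the sketch is that sender and receiver share nothing except the transport and the data format.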
14. <http://www.ontoknowledge.org>.
Suggested Reading
Some key papers that were used as the basis for this chapter are:
• Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology.
<http://www.ksl.stanford.edu/people/dlm/papers/ontology101/ontology101-noy-mcguinness.html>.
• M. Uschold and M. Gruninger. Ontologies: Principles, Methods and Applications. Knowledge Engineering Review, Volume 11, Number 2 (June 1996).
• B. Omelayenko. Learning of Ontologies for the Web: the Analysis of Existing Approaches. In: Proceedings of the International Workshop on Web Dynamics, 8th International Conference on Database Theory (ICDT'01). 2001.
<http://www.cs.vu.nl/~borys/papers/WebDyn01.pdf>
Two often-cited books are:
• A. Maedche. Ontology Learning for the Semantic Web. Kluwer International Series in Engineering and Computer Science, Volume 665, 2002.
• J. Davies, D. Fensel, and F. van Harmelen. Towards the Semantic Web: Ontology-Driven Knowledge Management. New York: Wiley, 2003.
Project
This project is a medium-scale exercise that will occupy two or three people for about two to three weeks. All required software is freely available. We provide some pointers to software that we have used successfully, but given the very active state of development of the field, the availability of software is likely to change rapidly. Also, if certain software is not mentioned, this does not indicate our disapproval of it.
The assignment consists of three parts.
1. In the first part, you will create an ontology that describes the domain and contains the information needed by your own application. You will use the terms defined in the ontology to describe concrete data. In this step, you will be applying the methodology for ontology construction outlined in the first part of this chapter, and you will be using OWL as a representation language for your ontology (see chapter 4).
2. In the second part, you will use your ontology to construct different views on your data, and you will query the ontology and the data to extract information needed for each view. In this part, you will be applying RDF storage and querying facilities (see chapter 3).
3. In the third part, you will create different graphic presentations of the extracted data using XSLT technology (see chapter 2).