Applied Ontology-based Knowledge Management: A Report on the State-of-the-Art


Péter Mika
2002-07-23
Vrije Universiteit, Amsterdam

A Master's Thesis written by Péter Mika under the supervision of Prof. Frank van Harmelen at the Vrije Universiteit, Amsterdam in July, 2002.
Second reader: Heiner Stuckenschmidt
Version 1.3

Cover design shows a fragment of the ontology extracted from the text of this thesis by OntoExtract, as visualized by Spectacle Cluster Maps technology.

Table of Contents

PREFACE
ACKNOWLEDGEMENTS
INTRODUCTION
  THE VISION
  THE THESIS
ONTOLOGY
  1.1 THE HISTORY OF ONTOLOGY
  1.2 PRESENT DAY ONTOLOGY
  1.3 THE FUTURE OF ONTOLOGY – SOME REFLECTIONS
ONTOLOGY LANGUAGES
  2.1 THE RESOURCE DESCRIPTION FRAMEWORK AND SCHEMA
    2.1.1 RDF unraveled
    2.1.2 Interpreting RDF
    2.1.3 RDF-S
    2.1.4 Advanced RDF-S constructs
    2.1.5 Concluding remarks
  2.2 TOPIC MAPS
    2.2.1 Advanced features of Topic Maps
  2.3 MORE EXPRESSIVE SEMANTIC WEB LANGUAGES
    2.3.1 The Ontology Inference Layer
    2.3.2 From OIL to OWL
THE ON-TO-KNOWLEDGE RESEARCH PROJECT
  3.1 ONTOLOGY DEVELOPMENT
  3.2 ONTOLOGY EXTRACTION
  3.3 ONTOLOGY-BASED BROWSING AND VISUALIZATION
  3.4 ONTOLOGY STORAGE AND QUERY
  3.5 ONTOLOGY MIDDLEWARE AND ADVANCED REASONING
  3.6 ONTOLOGY-BASED SEARCH AND KNOWLEDGE SHARING
THE ENERSEARCH CASE STUDY
  4.1 THE ORGANIZATION
  4.2 ONTOLOGY USE CASE
  4.3 ALTERNATIVE TECHNOLOGIES
  4.4 FORMULATION OF THE RESEARCH AREA
  4.5 OVERVIEW OF THE ONTOLOGY-BASED SOLUTION
THE FIRST STEP: ONTOLOGY ENGINEERING
  5.1 DATA FOR THE SEMANTIC WEB
  5.2 THE ENERSEARCH CASE
    5.2.1 Ontology extraction with OntoExtract
    5.2.2 The domain ontology
    5.2.3 The publication ontology
  5.3 FUTURE WORK
    5.3.1 Data model and representation language
    5.3.2 Ontology editor
    5.3.3 Versioning
THE SECOND STEP: ONTOLOGY STORAGE, QUERY AND TRANSFORMATION
  6.1 THE SESAME RDF(S) STORAGE AND QUERY FACILITY
  6.2 ONTOLOGY STORAGE, QUERY AND TRANSFORMATIONS IN THE ENERSEARCH CASE STUDY
  6.3 THE ON2K.GRAPH PACKAGE
    6.3.1 Using on2k.graph in the EnerSearch case
    6.3.2 Portable Inference Modules
  6.4 FUTURE WORK
THE THIRD STEP: ONTOLOGY-BASED INFORMATION RETRIEVAL
  7.1 A BRIEF HISTORY OF WEB SEARCH AND NAVIGATION
  7.2 ONTOLOGY-BASED BROWSING USING SPECTACLE
    7.2.1 Scalability of the Spectacle presentation
  7.3 ONTOLOGY-BASED SEARCH USING QUIZRDF
  7.4 FUTURE WORK
    7.4.1 Ontology-based information visualisation
    7.4.2 Combining search and navigation
    7.4.3 Ontology-based user profiling
SUMMARY AND CONCLUSION
  8.1 SUMMARY OF THE WORK
  8.2 CONCLUSION
APPENDIX
10 REFERENCES

Preface

As with every major intellectual effort, there is a story behind this work as well. Following four years of education in Computer Science at the Eötvös Loránd Tudomány Egyetem (ELTE) in Budapest and a year at the Vrije Universiteit in Amsterdam (VUA), just like my peers, I had the obligation to complete a thesis for a Master's degree in Computer Science. Unlike my peers, however, I was not only looking for a topic, but planned to find a related internship as well: I considered that a guarantee that the topic I chose would have a practical value beyond an exercise per se. In my quest I had the fortune to meet Frank van Harmelen from the Artificial Intelligence Department of the VUA. He not only offered me a topic and an internship to go with it, but also helped me to join one of the most reputable research collaborations in the field of Semantic Web [29] technologies and applications. He also introduced me to Hans Akkermans from the Business Informatics Department of the VUA, who helped as an advisor during my work.

I started in January 2002 at EnerSearch, a Swedish case study partner in the On-To-Knowledge research project. In order to avoid relocation, AIdministrator (another partner in the project) generously hosted me for the duration of my work. For the
following six months I was entrusted with the responsibility of assembling the components developed in the project into a comprehensive solution for EnerSearch. With very little experience available to build on, my task was rightfully expected to be a challenge and an immense learning endeavor. While strenuous at times and fun at others, it altogether gave me a unique insight into some of the most exciting technologies in AI today and provided the wealth of experience that is captured in the following thesis.

Acknowledgements

I would like to express my appreciation to my supervisor, Frank van Harmelen, for his insights, reviews and professional advice. He not only helped me to get started, but guided me all along. Working for a virtual organization on a pan-European research effort, my work could not have been realized without the true spirit of cooperation that characterizes both local and international research efforts. Here, I would like to express my gratitude for their help to all On-To-Knowledge partners, who are too numerous to mention. A special thanks goes out to my colleagues at AIdministrator: Hilde, Jos, Jeen, Arjohn, Herko, Jeroen, Chris and Peter. They embraced me as a colleague of their own and took the greatest burden in helping me to cope with the challenges of working with state-of-the-art technologies. Technologies that are, well, state-of-the-art…

Introduction

"Unfortunately no one can be told what the Matrix is. You have to see it for yourself."
The vision

Through my work I came to realize that the Semantic Web is not entirely unlike the Matrix in one of the most successful movies ever made about a future governed by AI: it seems that no one can be told what the Semantic Web really is. Nevertheless, researchers who are hard at work on the nuts and bolts (and their number is growing rapidly) have little trouble conceptualizing it one way or the other. Their visions can hardly fill the void of a definition, yet they mostly point to a future that can be interpreted as follows.

Dear Reader, imagine a computer or a system that feels just a bit more human. No, it does not have emotions or come up with grand new ideas such as the Semantic Web. Yet, it feels remarkably knowledgeable when it comes to answering questions about a field, like the energy industry. In particular, it uses the same vocabulary and descriptions of the terms as the people who are at home in that field, people we usually call experts. Also, it is not only good at communicating along the common, shared conceptualization of experts, it also follows their logic, i.e. the way they use their knowledge to reason about their domain. The vocabulary, the definitions of the terms and the logic that applies within the domain are captured in formal ontologies, which we introduce in the chapter on ontology.

But we don't have to stop here: imagine an entire network of such systems. Though physically built on top of today's Internet, this web is even smarter than the sum of its components: it aggregates their potential into an enormous think tank we might call the Semantic Web. This web will also use its knowledge to annotate the contents it stores, providing descriptions of the content on a conceptual level, i.e. in terms of a well-understood ontology. At first, this metainformation will help you to find information more easily on the web by searching the meaning of the content instead of trying to match keywords you provide. Later, intelligent personal agents will scour the
vastness of knowledge structures, reasoning along the way to find an answer to just about any problem that can be described using terms of an ontology, be it finding a better match for a date or coordinating scientific research more effectively on an international scale.

The thesis

This thesis evaluates the use of Semantic Web technologies for building an ontology-based content management system in the context of the EnerSearch case study conducted within the On-To-Knowledge research project. As in most cases the tools and technologies developed within the project represent the state-of-the-art of Semantic Web research, such an evaluation will reflect the extent to which current ontology-based solutions live up to the vision outlined above. In particular, we shall investigate how the costs of building an ontology-based system measure up to the real or expected benefits of semantic technologies. We shall do so by a systematic account of the problems that we encounter in building such a system, the extent to which these problems degrade the qualities of such a system and the remedies that might be provided in the future. This means finding an answer to the following two questions:
- What are the costs of building an ontology-based solution? This requires looking at the process of creating an ontology-based system and the theoretical and practical difficulties that can be identified along the way.
- What are the benefits of building an ontology-based solution?
This requires an evaluation of the technologies in comparison to existing solutions and the outlook for the rapidly developing semantic field.

We first look at the past, present and future of ontologies and discuss the RDF(S), Topic Map and OIL ontology representation languages. We will describe RDF(S) in more detail as it is the ontology language of choice for the case study and its features have a significant impact on the tools and applications that work with semantic data in RDF(S) format. In Chapter 2, we give an overview of the goals and methods of the On-To-Knowledge project, along with a brief description of all tools and technologies. A separate chapter introduces the EnerSearch case study: the company and the business case for ontology-based content management in its context. This chapter formulates our research area as well and outlines the system that has been developed by integrating On-To-Knowledge tools and technologies into a comprehensive solution.

While Chapters 1-3 set the context of our survey, the following chapters describe the three major steps of the development. Each of these chapters focuses on a specific area, namely the acquisition of ontologies (Chapter 4), ontology storage, query and transformation (Chapter 5) and ontology-based information retrieval (Chapter 6). These chapters follow the same layout in that they offer a description of the related work, the work done in the case study and ideas for future work. Lastly, in the concluding chapter we build on the experience of the previous chapters to draw our conclusion: the price to pay for the use of semantic technologies is high and the search needs to continue for the application that would justify that investment. For the rest, Dear Reader, you have to see it for yourself.

Ontology

“Some semanticists say that certain expressions designate certain entities, and among these designated entities they include not only concrete material things but also abstract entities e.g., properties as designated by predicates and
propositions as designated by sentences…” Rudolf Carnap, 1950

1.1 The history of ontology

The term Ontology (originating from the Greek words onto for existence, being and logos for science) migrated into AI from the realm of philosophy, where it represents a systematic account of existence: “Ontology seeks to provide a definitive and exhaustive classification of entities in all spheres of existence.” [1]

There are many streams in the science of Ontology. For example, there is a strong division between adequatist ontologists, who seek a taxonomy of things in reality through descriptive means, and reductionists, who describe reality in terms of a basic level of simple entities and see more complex concepts as a combination of them. Although they make a different trade-off with respect to generativity vs. descriptiveness, reductionists and adequatists share the convention that an ontology should at least be compatible with empirical evidence. Indeed, some philosophers, such as Quine, thought that the only source of ontology should be the study of natural sciences. Quine was also the first to emphasize first-order predicate logic as a representation for ontologies.

Some twentieth-century philosophers inspired by Kant, however, took a radically new view of Ontology: they consider the building of an ontology a meta-level activity that does not deal with the real world, but rather with theories about the world. While traditional ontologists seek principles that are true in reality, this new school of thought claims that the best we can achieve is internal metaphysics: an elicitation of principles that may or may not be true. According to this group, the significance of Ontology lies in the use of these theories. In 1950, in an influential paper, Carnap defended the use of abstract entities as names in ‘linguistic frameworks’:

After new forms are introduced in the language, it is possible to formulate with their help internal questions and possible answers to them… From the internal
questions we must clearly distinguish external questions concerning the existence or reality of the total system of the new entities… we take the position that the introduction of new ways of speaking does not need any theoretical justification because it does not imply any assertion of reality. We may still speak (and have done so) of the “acceptance of the new entities” since this form of speech is customary; but one must keep in mind that this phrase does not mean for us anything more than acceptance of the new framework, i.e., of the new linguistic forms. Above all, it must not be interpreted as referring to an assumption, belief or assertion of “the reality of the entities.” There is no such assertion.

Following his example, we might have a framework for discrete mathematics where we can express an assertion like “Five is a number”. Such a framework is perfectly able to answer the question whether numbers exist or not: since it is known within the framework that five is a number, the answer is positive. However, the system is not able to answer any external questions that would relate to whether numbers exist in flesh-and-blood reality. Nevertheless, Carnap’s conclusion was that the use of abstract linguistic forms can be justified by their efficiency as instruments, “the ratio of the results achieved to the amount and complexity of the efforts required.”

Ontology (plural ontologies for specific theories) outside philosophy takes internal metaphysics as a departure point for moving further away from the philosophical traditions. Ontologies in AI are further reduced in scope and are ever more detached from reality. This process is observable in the first (1978) and second (1985) “Naive Physics Manifesto” of Hayes [2, 3], in which he describes the building of an all-encompassing theory of physical reality. Remarkably, while in the first version fidelity (i.e. that the ontology should be adequate and reasonably detailed) is one of the four characteristics such a theory
should possess, this is dropped in the later revision. Also, in the second version he talks of “the ability to interpret our axioms in a possible world” instead of faithfulness to reality.

1.2 Present day ontology

Following Gruber [4], we define present day ontologies in AI as explicit specifications of conceptualizations, where a conceptualization is an abstract, simplified view of the world we wish to represent for some purpose and a specification is a systematic account of it. What differentiates ontologies from data models is the underlying intent of communication on the basis of the ontology. Typically, a conceptualization contains objects, concepts and other entities that are of interest and the relationships that hold among them. Although there are still attempts at creating ontologies to describe common-sense knowledge (so-called upper-level ontologies, such as CYC [27] or SUO, the Standard Upper Ontology [28]), the world we thus describe is in most cases confined to some domain or related to some activity. For example, we can talk of an ontology of a university with entities such as ‘student’, ‘Peter’, ‘thesis’ and relations ‘instanceOf’, ‘authorOf’, or a biblical ontology with entities such as Virgin Mary, immaculate conception and Jesus Christ with relations ‘relatedTo’ and ‘hasMiracle’.

As ontologies represent a shared description of an area, they can be used to mediate communication within the domain.¹ Thus ontologies represent a key enabling technology for the Semantic Web, where (software) agents or applications rely on the shared description in exchanging queries and assertions in order to guarantee consistency (but not completeness).

In parallel to the ever-wider acceptance of the above definition of ontologies, experience has been mounting in the last decade in authoring, representing, transforming and applying ontologies in a web environment. While the ideal behind all this heightened interest, namely a smarter, Semantic Web, is not yet a reality, some of
the technologies are already available and many exciting smaller scale applications exist, ranging from business and medicine to law and entertainment. The EnerSearch case study presented here, built on technologies developed within the On-To-Knowledge project, is one such application for ontology-based information retrieval.

1.3 The future of Ontology – some reflections

As we have seen above, present day ontologies in AI are mere representations that at best form a logical theory that (if consistent) is capable of answering questions that are internal to the domain. However, such ontologies are inherently closed systems disconnected from

Footnote 1: For a classification of various applications of ontologies, consult [5].
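To make the notion of a question that is internal to the domain more concrete, the university ontology of Section 1.2 can be written down as a handful of subject, relation, object statements and queried by pattern matching. The following is only an illustrative sketch: the two statements, the `UNIVERSITY` set and the `ask` helper are assumptions made for this example, built from the entity and relation names given in the text, and do not correspond to any On-To-Knowledge tool.

```python
from typing import List, Optional, Set, Tuple

# A statement is a (subject, relation, object) triple.
Triple = Tuple[str, str, str]

# Hypothetical statements using the entities and relations named in
# Section 1.2; the exact statements are an assumption for illustration.
UNIVERSITY: Set[Triple] = {
    ("Peter", "instanceOf", "student"),
    ("Peter", "authorOf", "thesis"),
}


def ask(
    facts: Set[Triple],
    subject: Optional[str] = None,
    relation: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return all statements matching the pattern; None acts as a wildcard."""
    pattern = (subject, relation, obj)
    return sorted(
        triple
        for triple in facts
        if all(p is None or p == value for p, value in zip(pattern, triple))
    )


if __name__ == "__main__":
    # Internal question: what has Peter authored?
    print(ask(UNIVERSITY, subject="Peter", relation="authorOf"))
    # A question about something the ontology does not describe
    # simply yields no statements.
    print(ask(UNIVERSITY, subject="Mary"))
```

The sketch answers only what has been asserted within the framework: a query about an entity that is not described returns an empty list rather than a claim about the world outside the ontology.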

