VIETNAM NATIONAL UNIVERSITY, HANOI
UNIVERSITY OF ENGINEERING AND TECHNOLOGY

NGUYEN QUOC DAT

RIPPLE DOWN RULES FOR QUESTION ANALYSIS

Major: Computer Science
Code: 60 48 01

MASTER THESIS

Supervised by: Dr Pham Bao Son

Hanoi - 2011

Ripple Down Rules for Question Analysis

Nguyen Quoc Dat
Faculty of Information Technology
University of Engineering and Technology
Vietnam National University, Hanoi

Supervised by Dr Pham Bao Son

A thesis submitted in fulfillment of the requirements for the degree of Master of Science in Computer Science

August 2011

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at University of Engineering and Technology (UET/Coltech) or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UET/Coltech or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.’

Hanoi, August 23rd, 2011
Signed

ABSTRACT

For the task of turning a natural language question into an explicit intermediate representation in question answering systems, all published work so far uses a rule-based approach, to the best of our knowledge. We believe this is because of the complexity of the representation and the variety of question types, and because there are no publicly available corpora of a decent size.
In these rule-based approaches, the process of creating rules is not discussed. Manually creating rules in an ad-hoc manner is clearly very expensive and error-prone.

This thesis first describes an ad-hoc method that converts Vietnamese natural language questions into intermediate representation elements via grammar rules over semantic annotations. More importantly, it proposes a language-independent approach to creating those rules manually, in a way that maintains consistency between rules and keeps the effort of creating a new rule independent of the size of the current rule set. Experimental results are promising and suggest that the approach is easy to adapt to a new domain and a new language.

Publications:

Dat Quoc Nguyen, Dai Quoc Nguyen and Son Bao Pham. Systematic Knowledge Acquisition for Question Analysis. In Proc. of the 8th International Conference on Recent Advances in Natural Language Processing (RANLP 2011).

Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham and Dang Duc Pham. Ripple Down Rules for Part-Of-Speech Tagging. In Proc. of 12th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2011), Springer-Verlag LNCS, part I, pp. 190–201.

Dai Quoc Nguyen, Dat Quoc Nguyen and Son Bao Pham. A Vietnamese question answering system. In Proc. of the 2009 International Conference on Knowledge and Systems Engineering (KSE 2009), IEEE CS, pp. 26–32.

ACKNOWLEDGEMENTS

First and foremost, I would like to express my deepest gratitude to my supervisor, Dr Pham Bao Son, for his patient guidance and continuous support throughout the years. He is always there when I need help, and he responds to my queries helpfully and promptly.

I would like to express my sincere appreciation to my brother, Nguyen Quoc Dai, for his great support.

I would like to specially thank Prof Bui The Duy and my colleagues for their help during my time at Human Machine
Interaction Laboratory, UET/Coltech. I would also like to thank my friend, Nguyen Le Trang, for her kind help.

I sincerely acknowledge the Vietnam National University, Hanoi, NAFOSTED Vietnam, Toshiba Foundation Scholarship, and especially Dr Pham Bao Son for financially supporting my master's study.

Finally, this thesis would not have been possible without the support and love of my mother and my father. Thank you!

To my family ♥

Table of Contents

1 Introduction
2 Literature review
  2.1 Question analysis in question answering systems
    2.1.1 Question classification
    2.1.2 Pattern-matching based analysis
    2.1.3 Syntactic-based analysis
    2.1.4 Semantic-based analysis
    2.1.5 Annotation-based question analysis in question answering systems
  2.2 GATE
    2.2.1 Information Extraction in GATE
    2.2.2 JAPE
  2.3 Single Classification Ripple Down Rules
3 Our Question Answering System Architecture
  3.1 Introduction
  3.2 Preprocessing module
  3.3 Syntactic analysis module
    3.3.1 Noun phrases detection
    3.3.2 Question-phrases detection
    3.3.3 Relations detection
  3.4 Semantic analysis module
  3.5 Answer retrieval component
4 Systematic Knowledge Acquisition for Question Analysis
  4.1 Recall Intermediate Representation of an input question
  4.2 Rule language
  4.3 Knowledge Acquisition Process
5 Evaluation
  5.1 Question Analysis for Vietnamese
  5.2 Question Analysis for English
6 Conclusion
A Definitions of question-class types
B Definitions of question-structures
C Intermediate Representation Elements of English questions
D Embedding Java code in JAPE

List of Figures

2.1 Parse tree of question “which rock contains magnesium?”
2.2 The syntactic-semantic tree example
2.3 Aqualog’s architecture
2.4 GATE’s architecture
2.5 A set of Token annotations in GATE
3.1 Architecture of our question answering system
3.2 An example of intermediate representation element
3.3 An example of redefining the TokenVn annotation
3.4 NounPhrase annotations
3.5 QU-E-L-MC and QUTerm annotations
3.6 Relation between phrases
3.7 Relation annotations
3.8 Question structures
4.1 Question analyzer’s GUI
4.2 Question processing component to create the intermediate representation of question “trường đại học Công Nghệ có bao nhiêu sinh viên?” (“how many students are there in the College of Technology?”)
C.1 Question-structure of Definition
C.2 Question-structure of UnknTerm
C.3 Question-structure of UnknRel
C.4 Question-structure of Normal
C.5 Question-structure of Affirm
C.6 Question-structure of ThreeTerm
C.7 Question-structure of Affirm_3Term
C.8 Question-structure of And

Appendix C

Figure C.6: Question-structure of ThreeTerm
Figure C.7: Question-structure of Affirm_3Term
Figure C.8: Question-structure of And
Figure C.9: Question-structure of And (2)
Figure C.10: Question-structure of And (3)
Figure C.11: Question-structure of And (4)
Figure C.12: Question-structure of Or
Figure C.13: Question-structure of Clause
Figure C.14: Question-structure of Clause (2)

Appendix D
Embedding Java code in JAPE

Phase: EditYesnoAnno
Input: TokenVn Split
Options: control = appelt

Macro: YESNO
/* Macro
   YESNO is used to match question-word phrases “phải không”, “đúng không”,
   “có là”, “có phải là”, “có đúng”, “có phải”, “Có đúng”, “Có phải”, “Có là”,
   and “Có phải là”. These phrases mean “is that”, “is this”, “are these”,
   “are those” in English. */
(
  (
    ({TokenVn.string == "phải"} | {TokenVn.string == "đúng"})?
    {TokenVn.string == "không"}
  ) |
  (
    ({TokenVn.string == "Có"} | {TokenVn.string == "có"})
    ({TokenVn.string == "đúng"} | {TokenVn.string == "phải"})
    ({TokenVn.string == "là"})?
  )
)

Rule: editYesNoTerm
Priority: 50
(
  YESNO
):ynSet
-->
{
  // Retrieve the annotations bound to the label ynSet on the LHS
  gate.AnnotationSet YesNoSet = (gate.AnnotationSet)bindings.get("ynSet");

  // Create a new list to hold the YesNoSet annotations
  List listTerm = new ArrayList(YesNoSet);

  // Get an iterator over the created list
  Iterator termIter = (Iterator)listTerm.iterator();

  // Declare variables
  gate.Annotation yesnoAnn;
  gate.FeatureMap yesnoAnnFeatures;
  String string = "";

  // Concatenate the "string" feature of every matched TokenVn annotation
  while(termIter.hasNext()){
    yesnoAnn = (gate.Annotation)termIter.next();
    yesnoAnnFeatures = (gate.FeatureMap)yesnoAnn.getFeatures();
    string += (String)yesnoAnnFeatures.get("string") + " ";
  }

  // Create features for the new annotation
  gate.FeatureMap features = Factory.newFeatureMap();
  features.put("string", string.trim());
  features.put("category", "question-word");
  features.put("type", "YesNo");

  /* Remove all of the old TokenVn annotations corresponding to words
     in the phrase that the LHS matched */
  inputAS.removeAll(YesNoSet);

  /* Create a new TokenVn annotation covering the matched phrase */
  outputAS.add(YesNoSet.firstNode(), YesNoSet.lastNode(), "TokenVn", features);
}

Bibliography

I. Androutsopoulos, G. Ritchie, and P. Thanisch. Masque/SQL: an efficient and portable natural language query interface for relational databases. In Proceedings of the 6th international conference on Industrial and engineering applications of artificial
intelligence and expert systems, pages 327–330, 1993.

Ion Androutsopoulos, Graeme Ritchie, and Peter Thanisch. Natural language interfaces to databases - an introduction. Natural Language Engineering, 1:29–81, 1995.

Paolo Atzeni, Roberto Basili, Dorte Haltrup Hansen, Paolo Missier, Patrizia Paggio, Maria Teresa Pazienza, and Fabio Massimo Zanzotto. Ontology-based question answering in a federation of university sites: The MOSES case study. In Proceedings of 9th International Conference on Applications of Natural Languages to Information Systems, NLDB 2004, pages 413–420, 2004.

Benjamin Van Durme, Yifen Huang, Anna Kupść, and Eric Nyberg. Towards light semantic processing for question answering. In Proceedings of the HLT-NAACL 2003 workshop on Text meaning, Volume 9, pages 54–61, 2003.

Noam Chomsky. Syntactic Structures. Mouton, The Hague, 1957.

Philipp Cimiano, Peter Haase, Jörg Heizmann, Matthias Mantel, and Rudi Studer. Towards portable natural language interfaces to knowledge bases - the case of the ORAKEL system. Data Knowl. Eng., 65:325–354, 2008.

Stephen Clark, Mark Steedman, and James R. Curran. Object-extraction and question-parsing using CCG. In Proceedings of the SIGDAT Conference on Empirical Methods in Natural Language Processing, pages 111–118, 2004.

William W. Cohen, Pradeep Ravikumar, and Stephen E. Fienberg. A comparison of string distance metrics for name-matching tasks. In Proceedings of IJCAI-03 Workshop on Information Integration, pages 73–78, 2003.

P. Compton and R. Jansen. A philosophical basis for knowledge acquisition. Knowledge Acquisition, 2(3):241–257, 1990.

Paul Compton and Bob Jansen. Knowledge in context: A strategy for expert system maintenance. In Proceedings of the second Australian joint conference on Artificial intelligence, volume 406, pages 292–306, 1988.

Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. GATE: A Framework and Graphical Development Environment for Robust NLP
Tools and Applications. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics, pages 168–175, 2002.

Danica Damljanovic, Valentin Tablan, and Kalina Bontcheva. A text-based query interface to OWL ontologies. In Proceedings of 6th Language Resources and Evaluation Conference, 2008.

Christiane D. Fellbaum. WordNet: An Electronic Lexical Database. MIT Press, 1998.

A. Galea. Open-domain surface-based question answering system. In Proceedings of the Computer Science Annual Workshop (CSAW), 2003.

Sanda Harabagiu, Dan Moldovan, Marius Paşca, Rada Mihalcea, Mihai Surdeanu, Razvan Bunescu, Roxana Girju, Vasile Rus, and Paul Morarescu. Falcon: Boosting knowledge for answer engines. In Proceedings of the Ninth Text REtrieval Conference, pages 479–488, 2000.

Sanda M. Harabagiu, Steven J. Maiorano, and Marius A. Paşca. Open-domain textual question answering techniques. Natural Language Engineering, 9(3):231–267, 2003.

Zhiheng Huang, Marcus Thint, and Zengchang Qin. Question classification using head words and their hypernyms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP ’08, pages 927–936, 2008.

John Judge, Yuqing Guo, Gareth J. F. Jones, and Bin Wang. An analysis of question processing of English and Chinese for the NTCIR cross-language question answering task. 2005.

Daniel Jurafsky and James H. Martin. Speech and language processing: an introduction to natural language processing, computational linguistics, and speech recognition (Second Edition). Prentice Hall, 2008.

Boris Katz. Annotating the world wide web using natural language. In Proceedings of the 5th RIAO Conference on Computer Assisted Information Searching on the Internet - RIAO 1997, pages 136–159, 1997.

Boris Katz, Gary C. Borchardt, and Sue Felshin. Syntactic and semantic decomposition strategies for question answering from multiple resources. In Proceedings of the AAAI 2005 Workshop on Inference for Textual Question Answering, pages 35–41, 2005.

Boris Katz, Gary C.
Borchardt, and Sue Felshin. Natural language annotations for question answering. In Proceedings of the 19th International Florida Artificial Intelligence Research Society Conference, pages 303–306, 2006.

Krystle Kocik. Question classification using maximum entropy models. Master’s thesis, University of Sydney, 2004.

Wei Li. Question classification using language modeling. CIIR Technical Report, University of Massachusetts, 2002.

Xin Li and Dan Roth. Learning question classifiers. In Proceedings of the 19th international conference on Computational linguistics - Volume 1, COLING ’02, pages 1–7. Association for Computational Linguistics, 2002.

Xin Li and Dan Roth. Learning question classifiers: the role of semantic information. Natural Language Engineering, 12(3):229–249, 2006.

Vanessa Lopez, Victoria Uren, Enrico Motta, and Michele Pasin. Aqualog: An ontology-driven question answering system for organizational semantic intranets. Web Semantics: Science, Services and Agents on the World Wide Web, 5(2):72–105, 2007.

Christopher D. Manning and Hinrich Schütze. Foundations of statistical natural language processing. MIT Press, Cambridge, MA, USA, 1999.

Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, New York, NY, USA, 2008.

Donald Metzler and W. Bruce Croft. Analysis of statistical question classification for fact-based questions. Inf. Retr., 8:481–504, May 2005. ISSN 1386-4564.

Min Wu and Tomek Strzalkowski. Utilizing entity relation to bridge the language gap in crosslingual question answering system. In Proceedings of the 6th NTCIR Workshop Meeting on Evaluation of Information Access Technologies: Information Retrieval, Question Answering and Cross-Lingual Information Access, 2006.

D. Moldovan, S. Harabagiu, R. Girju, P. Morarescu, F. Lacatusu, A. Novischi, A. Badulescu, and O. Bolohan. LCC tools for question answering. In Voorhees and Buckland,
editors, Proceedings of the 11th Text REtrieval Conference (TREC-2002), 2002.

Anh Kim Nguyen and Huong Thanh Le. Natural language interface construction using semantic grammars. In Proceedings of the 10th Pacific Rim International Conference on Artificial Intelligence, pages 728–739, 2008.

Dai Quoc Nguyen, Dat Quoc Nguyen, and Son Bao Pham. A Vietnamese question answering system. In Proceedings of the 2009 International Conference on Knowledge and Systems Engineering, pages 26–32, 2009.

Dat Quoc Nguyen, Dai Quoc Nguyen, and Son Bao Pham. Systematic knowledge acquisition for question analysis. In Proceedings of 8th International Conference on Recent Advances in Natural Language Processing, (In press), September, 2011a.

Dat Quoc Nguyen, Dai Quoc Nguyen, Son Bao Pham, and Dang Duc Pham. Ripple down rules for part-of-speech tagging. In Proc. of 12th International Conference on Computational Linguistics and Intelligent Text Processing, pages 190–201, 2011b.

Ahad Niknia and Leila Sharif Hassanabadi. A question answering system based on grammatical structure matching. In Proceedings of the IADIS International Conference Applied Computing 2009, pages 165–172, 2009.

Dang Duc Pham, Giang Binh Tran, and Son Bao Pham. A hybrid approach to Vietnamese word segmentation using part of speech tags. In Proceedings of the 2009 International Conference on Knowledge and Systems Engineering, pages 154–161, 2009.

Son Bao Pham and Achim Hoffmann. Efficient knowledge acquisition for extracting temporal relations. In Proceedings of the 17th European Conference on Artificial Intelligence, pages 521–525, 2006.

T.T. Phan and T.C. Nguyen. Question semantic analysis in Vietnamese QA system. In Edited book "Advances in Intelligent Information and Database Systems" of The 2nd Asian Conference on Intelligent Information and Database Systems (ACIIDS 2010), pages 29–40, 2010.

Ana-Maria Popescu, Oren Etzioni, and Henry Kautz. Towards a theory of natural language interfaces
to databases. In Proceedings of the 8th international conference on Intelligent user interfaces, IUI ’03, pages 149–157, 2003.

Debbie Richards. Two decades of ripple down rules research. Knowledge Engineering Review, 24(2):159–184, 2009.

Ashish Kumar Saxena, Ganesh Viswanath Sambhu, Saroj Kaushik, and L. Venkata Subramaniam. IITD-IBMIRL system for question answering using pattern matching, semantic type and semantic category recognition. In Proceedings of The Sixteenth Text REtrieval Conference, 2007.

Sanjay Silakari, Mahesh Motwani, and Neelu Nihalani. Natural language interface for database: A brief review. IJCSI International Journal of Computer Science Issues, 8:600–608, 2011.

Eriks Sneiders. Automated question answering using question templates that cover the conceptual model of the database. In Proceedings of the 6th International Conference on Applications of Natural Language to Information Systems-Revised Papers, NLDB ’02, pages 235–239, 2002.

Niculae Stratica, Leila Kosseim, and Bipin C. Desai. NLIDB templates for semantic parsing. In Proceedings of the 8th International Conference on Applications of Natural Language to Information Systems, pages 235–241, 2003.

Valentin Tablan, Diana Maynard, Kalina Bontcheva, and Hamish Cunningham. GATE - an application developer’s guide. http://gate.ac.uk/sale/pg/pg.pdf, 2004.

Marjorie Templeton and John Burger. Problems in natural-language interface to DBMS with examples from EUFID. In Proceedings of the first conference on Applied natural language processing, pages 3–16, 1983.

M. Vargas-Vera and E. Motta. An ontology-driven similarity algorithm. Technical report, Knowledge Media Institute, The Open University, 2004.

David L. Waltz. An English language question answering system for a large relational database. Commun. ACM, 21:526–539, July 1978.

W. A. Woods, Ron Kaplan, and B. Nash-Webber. The LUNAR sciences natural language information system: Final report. Technical Report BBN Report No.
2378, Bolt Beranek and Newman, 1972.

Min Wu, Xiaoyu Zheng, Michelle Duan, Ting Liu, and Tomek Strzalkowski. Question answering by pattern matching, web-proofing, semantic form proofing. In Proceedings of the Twelfth Text REtrieval Conference (TREC 2003), pages 578–585, 2003.

Dell Zhang and Wee Sun Lee. Question classification using support vector machines. In Proceedings of the 26th annual international ACM SIGIR conference on Research and development in information retrieval, pages 26–32, 2003.

Copyright © 2011 by Nguyen Quoc Dat. Printed and bound by Nguyen Quoc Dat.

... described in appendix D

2.3 Single Classification Ripple Down Rules

Ripple Down Rules (RDR) (Compton and Jansen,...

... Engineering
Java Annotation Patterns Engine
A Nearly-New Information Extraction
Ripple Down Rules
Single Classification Ripple Down Rules
Question Classification
Support Vector Machine
Semantically..
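The evaluation scheme behind Single Classification Ripple Down Rules (Section 2.3) can be sketched as follows. This is a minimal illustration of the general SCRDR technique, not the thesis's rule language or implementation: the class Scrdr, its Node type, and the string-prefix conditions are hypothetical, and the labels Normal, Affirm and Definition are borrowed from the question-structure names in Appendix C purely as example conclusions. Each node holds one rule. When a rule fires, evaluation moves to its exception branch; when it does not, evaluation moves to its if-not branch. The conclusion is that of the last rule that fired.

```java
import java.util.function.Predicate;

/** Hypothetical sketch of a Single Classification Ripple Down Rules tree. */
class Scrdr {

    /** One rule node: a condition, a conclusion, and two optional children. */
    static final class Node {
        final Predicate<String> condition; // assumption: conditions test the raw question text
        final String conclusion;
        Node exceptChild; // consulted when this rule fires
        Node ifNotChild;  // consulted when this rule does not fire

        Node(Predicate<String> condition, String conclusion) {
            this.condition = condition;
            this.conclusion = conclusion;
        }

        /** Returns the conclusion of the last rule that fired on the evaluation path. */
        String classify(String question) {
            String result = null;
            Node node = this;
            while (node != null) {
                if (node.condition.test(question)) {
                    result = node.conclusion;
                    node = node.exceptChild;
                } else {
                    node = node.ifNotChild;
                }
            }
            return result;
        }
    }

    /** Builds a three-rule tree; the conditions are invented for illustration. */
    static Node buildExampleTree() {
        // The default rule always fires, so every question receives some conclusion.
        Node root = new Node(q -> true, "Normal");
        // Exception to the default rule: questions that open like a yes/no question.
        Node yesNo = new Node(q -> q.startsWith("is ") || q.startsWith("are "), "Affirm");
        // Tried only when the yes/no rule does not fire.
        Node definition = new Node(q -> q.startsWith("what is "), "Definition");
        root.exceptChild = yesNo;
        yesNo.ifNotChild = definition;
        return root;
    }

    public static void main(String[] args) {
        Node root = buildExampleTree();
        System.out.println(root.classify("is hanoi a city?"));
        System.out.println(root.classify("what is gate?"));
        System.out.println(root.classify("which rock contains magnesium?"));
    }
}
```

In this sketch a correction is always attached as a new leaf on the path the misclassified case followed, so existing rules are never edited. That is one way to read the abstract's claim that the effort of adding a rule does not depend on the size of the current rule set.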