INTERNATIONAL SCHOOL
STUDENT RESEARCH PROPOSAL
RESEARCH AND APPLICATION OF VECTOR STORE DATABASE FOR INFORMATION RETRIEVAL
Team Leader: Vu Thanh Phuong
ID: 21070139 Class: MIS2021A
NCKHSV 09
TEAM LEADER INFORMATION
- Phone no./Email: 0344582982
II. Academic Results (from the first year to the present)
We would like to express our sincere appreciation to our teachers for their care and support. From initial ideation to final completion, they guided us and gave us the motivation we needed to overcome the challenges along the way.
Without their unwavering support, we would not have been able to complete this research.
We extend our heartfelt gratitude for their significant contributions and look forward to collaborating with them on future projects.
LIST OF FIGURES
Figure 1 Streamlined ETL pipeline with chatbot integration
Figure 2 Logic models
LIST OF TABLES
Table 1 Operating milestones in details
CHAPTER 2: Literature Review
ABSTRACT
- English: This study investigates the implementation of a knowledge base chatbot system tailored to the International School of Vietnam National University. The primary objective is to use data crawling techniques to acquire comprehensive, up-to-date information from the school's website. The extracted data is then organized and stored in a knowledge base, which serves as the foundation of the chatbot system.
The chatbot system is designed to provide a seamless, interactive user experience by delivering accurate, personalized responses to inquiries about various aspects of the International School. This is made possible by natural language processing (NLP) algorithms that interpret user queries. In addition, integrating machine learning models into the system allows its performance to improve over time, strengthening the chatbot's ability to give precise, contextually relevant answers.
By implementing this knowledge base chatbot system, the research aims to improve communication and accessibility for prospective students, parents, and other stakeholders of the International School. The system streamlines access to information on admissions, curricula, faculty profiles, and extracurricular activities, and it also serves as a dependable, efficient support tool for users seeking guidance and assistance.
This research contributes to knowledge management and to the integration of artificial intelligence in education. The knowledge base chatbot system offers a novel approach to communication and information provision within the specific domain of an international school at Vietnam National University. Its outcomes have the potential to improve user experience and information access significantly, benefiting the entire school community.
- Vietnamese (translated): This study aims to examine in greater depth the implementation of a sophisticated knowledge base chatbot system designed specifically for the international school within Vietnam National University, Hanoi. The main objective of the project is to use advanced data collection techniques to obtain comprehensive, up-to-date information from the school's website. The extracted data is then carefully organized and stored in a knowledge base, which serves as the foundational structure of the chatbot system.
The chatbot system is carefully designed to deliver a seamless, interactive user experience by giving accurate, personalized responses to questions about various aspects of the international school. This is made possible by advanced natural language processing algorithms that understand and interpret user queries effectively. In addition, integrating machine learning models into the system allows its performance to improve continuously, strengthening the chatbot's ability to give accurate answers suited to the context.
By deploying this knowledge base chatbot system, the study aims to improve communication and accessibility for prospective students, parents, and other stakeholders connected with the international school. The system not only streamlines the gathering of information about admissions, curricula, faculty profiles, and extracurricular activities but also serves as a reliable, efficient support tool for users seeking guidance and assistance.
KEYWORDS:
chatbot, crawling, vector, knowledge, information, queries, responses
CONTENT
1 Research Topic
- English: Research and application of vector store database for information retrieval
- Vietnamese (translated): Research and application of vector storage databases for information retrieval
2 Member List (ID, Email, Class, etc.)
I. INTRODUCTION
1 Problem Statement
The International School at Vietnam National University faces a significant challenge in managing and answering a diverse range of inquiries about its academic programs, student services, administrative processes, and other areas. The absence of a centralized knowledge database compounds this challenge by making accurate, up-to-date information hard to access, which leads to inefficiencies, inconsistent answers, and frustrated stakeholders. The current reliance on traditional communication channels, namely email and hotlines, makes the problem worse: responses are delayed, and the school cannot address student queries promptly. This situation undermines both internal operational efficiency and the satisfaction of stakeholders, including students, faculty, staff, and other involved parties.
Together, the lack of a centralized knowledge database and the reliance on conventional channels carry significant drawbacks. First, without immediate access to accurate information, stakeholders cannot obtain timely assistance. Second, the absence of real-time support, such as a chatbot, introduces further delays. These drawbacks matter even more in an evolving educational landscape that demands fast, efficient responses. It is therefore imperative to develop a comprehensive knowledge database that addresses these challenges and integrates a chatbot to improve the accessibility, responsiveness, and efficiency of the International School's Q&A consulting service.
2 Motivation
The motivation for this research stems from the challenges described above. The primary aim is to tackle them proactively by providing a comprehensive knowledge database tailored to the institution's specific needs. By establishing a centralized platform with an integrated chatbot, the project aims to streamline and optimize question answering within the school.
A knowledge database is an essential catalyst for improving the efficiency and effectiveness of the International School's operations. A reliable, easily accessible repository of information lets stakeholders quickly find accurate, up-to-date answers to their queries. This, in turn, fosters more efficient communication within the institution and ultimately increases stakeholder satisfaction.
The need for a comprehensive knowledge database is further underscored by the rapid growth of the International School. As the institution expands and attracts a more diverse student body, demand for accurate and timely information increases. Meeting this demand calls for real-time support through a chatbot integrated with the knowledge database. Such a system not only addresses the school's immediate challenges but also aligns with its vision of being at the forefront of educational and technological innovation.
By investing in a comprehensive knowledge database with an integrated chatbot, the International School demonstrates its commitment to providing an exceptional learning environment and supporting the success of its students, faculty, and staff. Such a system addresses the existing challenges and also positions the institution as forward-thinking and technologically advanced in a competitive educational landscape.
3 Research Method
This research will follow a systematic approach to integrating databases into the chatbot system at Vietnam National University's International School (VNUIS). The following research methods will be used:
3.1 Literature Review
The literature review will survey existing knowledge base systems and chatbots built on databases. It will cover scholarly articles, research papers, and other reputable sources on chatbot development. Analyzing the methodologies, architectures, and features of these systems will yield insights into best practices and potential improvements. The literature review lays the foundation for the later stages of the research, ensuring that the proposed system builds on a solid understanding of the current state of the art.
3.2 Data Collection
Several approaches will be used to collect the data needed for this study. Data will be retrieved from the school's database server APIs or collected through web scraping. This approach aims to automate data collection and obtain the information needed to build the knowledge base system. The collected data will cover a wide range of inquiries, frequently asked questions, and other information about the International School at VNU.
By collecting comprehensive and diverse data, the resulting knowledge base system will be equipped to address a broad spectrum of user queries.
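The scraping route can be illustrated with Python's standard-library `html.parser`. The sketch below is a minimal example under stated assumptions: it only turns an already downloaded HTML page into plain text, skipping `<script>`, `<style>` and `<noscript>` content. Fetching (for example with `urllib.request.urlopen`), request throttling, and any site-specific selectors the VNUIS pages might need are left out, and the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML page, ignoring script-like content."""

    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a tag whose text is ignored
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(" ".join(data.split()))  # collapse runs of whitespace


def extract_text(html: str) -> str:
    """Return the visible text of `html`, one text node per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


if __name__ == "__main__":
    page = (
        "<html><head><script>var x = 1;</script></head>"
        "<body><h1>Admissions</h1><p>Apply   online\n before the deadline.</p></body></html>"
    )
    print(extract_text(page))
```

Keeping one text node per line preserves some of the page structure (headings stay separate from paragraphs), which helps later when the text is split into knowledge base entries.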
3.3 Data Analysis
The collected data will be analyzed to identify patterns, trends, and key concepts relevant to integrating databases into a chatbot system. Techniques such as data mining and natural language processing will be used to extract insights from the data. This analysis will clarify how the knowledge base should be structured and organized, guiding the subsequent development process and ensuring that the system captures and represents the relevant information effectively.
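As a very small illustration of the kind of analysis meant here, the sketch below counts the most frequent content words across collected questions, a first step toward spotting recurring topics. The stop-word list and the function name `top_terms` are assumptions for the example. Vietnamese text would additionally need word segmentation before counting, since many Vietnamese words consist of several space-separated syllables.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption); a real analysis needs a fuller one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "for", "and", "in",
              "how", "what", "when", "where", "do", "does", "i", "can", "my"}


def top_terms(questions: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n most frequent non-stop-word tokens across all questions."""
    counts = Counter()
    for question in questions:
        tokens = re.findall(r"\w+", question.lower())
        counts.update(token for token in tokens if token not in STOP_WORDS)
    return counts.most_common(n)


if __name__ == "__main__":
    faqs = [
        "How do I apply for admission?",
        "What is the tuition fee for admission?",
        "When is the admission deadline?",
    ]
    print(top_terms(faqs, n=3))
```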
3.4 Model Development
Based on the insights from the literature review and data analysis, a model or framework will be developed for integrating databases into the VNUIS chatbot. This model will define the structure, organization, and retrieval mechanisms of the knowledge base system, taking into account data storage, indexing, and query processing so that information can be retrieved efficiently and accurately. The aim is to design a model that enables the chatbot to give prompt, relevant responses to user queries by drawing on the integrated databases.
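One concrete decision such a model has to make is how source documents are split into retrievable units. The sketch below assumes a fixed-size word window with overlap; the `Chunk` record, its fields, and the default sizes (120 words, 20 of them overlapping) are hypothetical choices for illustration, not values fixed by this proposal.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    """One retrievable unit of the knowledge base (hypothetical schema)."""
    chunk_id: str
    source_url: str
    text: str


def chunk_document(doc_id: str, source_url: str, text: str,
                   max_words: int = 120, overlap: int = 20) -> list[Chunk]:
    """Split a document into overlapping windows of at most `max_words` words.

    Thanks to the overlap, a sentence of up to `overlap` words that straddles
    a window boundary still appears whole in at least one chunk.
    """
    if not 0 <= overlap < max_words:
        raise ValueError("overlap must satisfy 0 <= overlap < max_words")
    words = text.split()
    if not words:
        return []
    step = max_words - overlap
    chunks = []
    # Window starts stop before a tail that the previous window already covers.
    for i, start in enumerate(range(0, max(len(words) - overlap, 1), step)):
        window = words[start:start + max_words]
        chunks.append(Chunk(f"{doc_id}#{i}", source_url, " ".join(window)))
    return chunks
```

Keeping the source URL on every chunk lets the chatbot point users back to the page an answer came from.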
3.5 Prototype Implementation
To demonstrate the practical application of the proposed model, a prototype of the knowledge base system will be implemented, translating the conceptual design into a functional system. Vector databases such as Pinecone and Milvus will be used to store and search vector data efficiently. In addition, embedding models such as PhoBERT, Sentence-BERT (SBERT), and OpenAI's embedding models will be integrated to represent the knowledge base content as vectors. The prototype serves as a tangible demonstration of the proposed system's capabilities and allows for further evaluation and refinement.
3.6 Testing and Evaluation
A comprehensive testing and evaluation process will validate the functionality, accuracy, and performance of the knowledge base system within the chatbot. The system will be tested under various scenarios, user queries, and edge cases to identify potential issues or limitations. Feedback from users and stakeholders will be collected to identify the system's strengths and areas for improvement. This iterative testing and evaluation phase ensures that the system meets the desired criteria and serves the academic audience at VNU's International School effectively.
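For the retrieval part of the evaluation, two common measures are hit rate@k (did the correct passage appear among the top k results?) and mean reciprocal rank (how high did it appear?). The sketch assumes a hypothetical labelled test set in which each query is mapped to the id of the one chunk judged to answer it; such metrics would complement, not replace, the user feedback described above.

```python
def hit_rate_and_mrr(results: dict[str, list[str]],
                     relevant: dict[str, str], k: int = 5) -> tuple[float, float]:
    """Return (hit rate@k, MRR@k) over a labelled set of test queries.

    results:  query -> chunk ids returned by the system, best first
    relevant: query -> id of the chunk judged to answer that query
    """
    if not relevant:
        raise ValueError("need at least one labelled query")
    hits, reciprocal_rank_sum = 0, 0.0
    for query, gold_id in relevant.items():
        top_k = results.get(query, [])[:k]
        if gold_id in top_k:
            hits += 1
            reciprocal_rank_sum += 1.0 / (top_k.index(gold_id) + 1)  # ranks start at 1
        # a miss contributes 0 to both sums
    n = len(relevant)
    return hits / n, reciprocal_rank_sum / n


if __name__ == "__main__":
    labels = {"q1": "fees#0", "q2": "dorm#2", "q3": "exam#1"}
    retrieved = {"q1": ["fees#0", "fees#1"], "q2": ["dorm#0", "dorm#2"], "q3": ["fees#1"]}
    print(hit_rate_and_mrr(retrieved, labels, k=2))
```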
Through these research methods, this study aims to develop an effective and efficient knowledge base system integrated into the VNUIS chatbot. The systematic approach ensures a thorough review of existing literature, rigorous data analysis, and practical implementation, ultimately improving user experience and satisfaction within the International School community.
4 Scope of the Study
This research focuses on developing and implementing a knowledge base system integrated into a chatbot, designed specifically for the academic audience at Vietnam National University's International School (VNUIS). The study aims to address the distinct challenges and requirements of the institution and to provide a solution that makes question answering within the academic community more effective.
The scope of this study encompasses the following key dimensions:
4.1 Academic Audience
The target audience for this research comprises students, faculty, staff, and other stakeholders within the academic environment of VNUIS. By tailoring the knowledge base system to their specific needs, the study aims to improve accessibility and responsiveness and to ensure accurate, timely information retrieval for this audience.
4.2 Research Area
To ensure that the knowledge base system contains relevant and accurate information, data will be collected carefully. Data will be crawled manually from trusted sources, specifically the official websites of VNUIS, such as is.vnu.vn and is.vnuis.vn. This process involves systematically extracting inquiries, frequently asked questions, and other information essential to the development of the knowledge base system.
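Restricting a crawler to trusted sources can be enforced in code as well as by policy. The sketch below keeps only http(s) URLs whose host is one of the two sites named above, or a subdomain of one of them (treating subdomains as trusted is an assumption), and drops duplicates that differ only by a `#fragment`. The function names are illustrative.

```python
from urllib.parse import urldefrag, urlsplit

# Official VNUIS sites named in the proposal; the crawler stays within them.
ALLOWED_HOSTS = {"is.vnu.vn", "is.vnuis.vn"}


def is_trusted(url: str) -> bool:
    """True if the URL uses http(s) and its host is allowed (or a subdomain of one)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    return parts.scheme in {"http", "https"} and any(
        host == allowed or host.endswith("." + allowed) for allowed in ALLOWED_HOSTS)


def filter_frontier(urls: list[str]) -> list[str]:
    """Drop untrusted URLs and duplicates (ignoring #fragments), keeping order."""
    seen, kept = set(), []
    for url in urls:
        clean, _fragment = urldefrag(url)
        if is_trusted(clean) and clean not in seen:
            seen.add(clean)
            kept.append(clean)
    return kept


if __name__ == "__main__":
    print(filter_frontier(["https://is.vnu.vn/#top", "https://is.vnu.vn/", "https://example.com/"]))
```

Matching on `"." + allowed` rather than a bare suffix prevents look-alike hosts such as `evil-is.vnu.vn` from passing the check.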
Once collected, the data will be embedded, that is, converted into vector representations, and stored in the system. Embedding organizes the information in a structured form that can be stored efficiently, enabling the chatbot to retrieve the required information promptly. The embedding process is therefore crucial to the functionality and performance of the knowledge base system and, ultimately, to the user experience of the academic audience at VNUIS.
4.3 Technology Area: Technologies, Distributed Vector Databases, and Embedding for Bot Application
4.3.1 Technologies
To ensure the efficient functioning and optimal performance of the knowledge base system, several key technologies will be employed. These technologies include natural language processing (NLP), machine learning algorithms, and artificial intelligence (AI) techniques. NLP enables the chatbot to understand and interpret user queries, while machine learning algorithms and AI techniques allow the chatbot to learn and adapt over time, improving its ability to provide accurate and contextually relevant responses.
In addition, distributed vector databases enable effective storage and retrieval of data within the knowledge base system. These databases offer scalability, fault tolerance, and high performance, so the large volumes of data the chatbot relies on can be processed efficiently.
4.3.2 Distributed Vector Databases
Distributed vector databases play a crucial role in storing and retrieving information within the knowledge base system. Textual information is represented as numerical vectors, which these databases store and index for fast access.
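The following sketch shows what storing and indexing vectors for quick access means at its simplest: an in-memory store that normalizes vectors on insert and answers queries by exact cosine similarity. It is a hypothetical stand-in, not the API of Pinecone or Milvus; those systems typically use approximate nearest-neighbour indexes (Milvus, for example, offers HNSW and IVF index types) so that a query does not have to scan every stored vector.

```python
import math
from typing import Optional


class InMemoryVectorStore:
    """Toy vector store with exact (brute-force) cosine-similarity search.

    Vectors are L2-normalised when inserted, so at query time cosine
    similarity reduces to a dot product.
    """

    def __init__(self, dim: int) -> None:
        self.dim = dim
        self._vectors: dict[str, list[float]] = {}
        self._metadata: dict[str, dict] = {}

    def _normalise(self, vec: list[float]) -> list[float]:
        if len(vec) != self.dim:
            raise ValueError(f"expected {self.dim} dimensions, got {len(vec)}")
        norm = math.sqrt(sum(x * x for x in vec))
        if norm == 0:
            raise ValueError("a zero vector has no direction")
        return [x / norm for x in vec]

    def upsert(self, item_id: str, vector: list[float],
               metadata: Optional[dict] = None) -> None:
        """Insert a vector, or replace it if item_id already exists."""
        self._vectors[item_id] = self._normalise(vector)
        self._metadata[item_id] = metadata or {}

    def query(self, vector: list[float], top_k: int = 3) -> list[tuple[str, float, dict]]:
        """Return (id, cosine similarity, metadata) for the top_k closest items."""
        q = self._normalise(vector)
        scored = [(item_id, sum(a * b for a, b in zip(q, v)))
                  for item_id, v in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return [(item_id, score, self._metadata[item_id]) for item_id, score in scored[:top_k]]


if __name__ == "__main__":
    store = InMemoryVectorStore(dim=3)
    store.upsert("tuition#0", [0.9, 0.1, 0.0], {"topic": "tuition"})
    store.upsert("dorm#0", [0.0, 1.0, 0.2], {"topic": "housing"})
    print(store.query([1.0, 0.0, 0.0], top_k=1))
```

Brute-force search is exact but costs one dot product per stored vector, which is why approximate indexes matter once the knowledge base grows large.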
By using distributed vector databases, the chatbot can handle large amounts of data while retrieving information quickly and accurately. Because these databases are distributed, the system can scale to growing volumes of data and user queries without compromising performance.
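One common way a distributed vector database scales is to split the vectors across shards and answer each query by scatter-gather: every shard returns its local top k, and a coordinator merges them. The sketch below assumes that design. Shards are plain Python functions here, whereas in a real cluster they would be separate nodes queried in parallel, and the names `make_shard` and `distributed_top_k` are hypothetical.

```python
import heapq
from typing import Callable

# A shard search takes (query_vector, k) and returns [(item_id, score), ...].
ShardSearch = Callable[[list[float], int], list[tuple[str, float]]]


def make_shard(vectors: dict[str, list[float]]) -> ShardSearch:
    """Build a toy shard that scores its local vectors by dot product."""
    def search(query: list[float], k: int) -> list[tuple[str, float]]:
        scored = ((item_id, sum(a * b for a, b in zip(query, v)))
                  for item_id, v in vectors.items())
        return heapq.nlargest(k, scored, key=lambda pair: pair[1])
    return search


def distributed_top_k(shards: list[ShardSearch], query: list[float],
                      k: int) -> list[tuple[str, float]]:
    """Scatter the query to every shard, then merge the per-shard top-k lists.

    Each shard only needs its own k best hits: any item in the global top k
    is necessarily in the top k of the shard that stores it.
    """
    candidates = []
    for search in shards:  # a real cluster would issue these calls in parallel
        candidates.extend(search(query, k))
    return heapq.nlargest(k, candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    shard_a = make_shard({"a1": [1.0, 0.0], "a2": [0.9, 0.1]})
    shard_b = make_shard({"b1": [0.95, 0.0], "b2": [0.0, 1.0]})
    print(distributed_top_k([shard_a, shard_b], [1.0, 0.0], k=2))
```

Adding capacity then means adding shards: each shard scans only its own share of the vectors, and the merge step handles at most k results per shard.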
4.3.3 Embedding Techniques
Embedding techniques transform textual information into numerical representations that the chatbot can process. These representations enable the chatbot to match user queries with relevant information stored in the knowledge base system.
Through embedding techniques, the chatbot can capture semantic relationships and similarities between pieces of information, so that a query can match a relevant passage even when the two are worded differently. This improves the system's ability to provide comprehensive and contextually relevant responses to user inquiries.
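To make the text-to-vector step concrete without depending on a particular model, the sketch below uses feature hashing, a deliberately simple substitute for PhoBERT, Sentence-BERT, or OpenAI embeddings. It captures only shared words, not meaning, so unlike those models it would not match paraphrases; it does show the interface such models provide (text in, fixed-length unit vector out) and how cosine similarity ranks passages. `hashlib.blake2b` is used instead of Python's built-in `hash()` because string hashing is randomized per process, which would make the vectors irreproducible between runs. The dimension of 256 is an arbitrary assumption.

```python
import hashlib
import math
import re


def hashed_embedding(text: str, dim: int = 256) -> list[float]:
    """Map text to a fixed-length, L2-normalised vector via feature hashing.

    Each token is hashed to one of `dim` buckets with a +1/-1 sign.
    This is a lexical stand-in: it reflects shared words, not meaning.
    """
    vec = [0.0] * dim
    for token in re.findall(r"\w+", text.lower()):
        digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8).digest()
        bucket = int.from_bytes(digest[:4], "little") % dim
        sign = 1.0 if digest[4] % 2 == 0 else -1.0
        vec[bucket] += sign
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors that are already unit length."""
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    query = hashed_embedding("tuition fee for international students")
    passages = ["The tuition fee for international students is listed here",
                "Dormitory rooms open in September"]
    for passage in passages:
        print(round(cosine(query, hashed_embedding(passage)), 3), passage)
```

Replacing `hashed_embedding` with a call to a trained embedding model would leave the cosine ranking step unchanged.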