
AI05DE01 Artificial Intelligence: Final Project Report, HSU Chatbot


DOCUMENT INFORMATION

Basic information

Title: HSU Chatbot
Authors: Lê Văn Niềm, Phan Văn Khải, Nguyễn Trần Trung Kiên
Supervisor: Lê Thanh Tung
University: Hoa Sen University
Major: Information Technology
Type: Final Project Report
Year of publication: 2024
City: Ho Chi Minh City


MINISTRY OF EDUCATION AND TRAINING

AI05DE01 ARTIFICIAL INTELLIGENCE

FINAL PROJECT REPORT

HSU CHATBOT

Lecturer: Lê Thanh Tung

Member List:

1 Lê Văn Niềm - 22207193

2 Phan Văn Khải - 22206077

3 Nguyễn Trần Trung Kiên - 22205375

JULY 02, 2024

PLEDGE

“We have read and understand the academic integrity violations. We pledge on our personal honor that this work was done by us and does not violate academic integrity.”

Day month year

(Student’s full name and signature)


ABSTRACT

This project introduces an AI chatbot intended to improve user engagement and streamline information access for prospective and current Hoa Sen University (HSU) clients. The chatbot uses current language-model technology to offer a tailored and thorough guide to everything HSU.

The chatbot's underlying language model is Google's Gemini 1.5, which is well known for its natural language understanding and generation capabilities. To provide a complete knowledge base, the project integrates data taken from approximately 300 webpages on HSU's official website, which is processed and stored in a vector database called ChromaDB. This database enables the efficient retrieval of relevant information based on user queries.

The chatbot's functionality is built on LangChain, a framework for creating conversational bots. The Retrieval-Augmented Generation (RAG) approach, implemented with LangChain, enables the chatbot to retrieve relevant information from the ChromaDB database and integrate it into its responses. This ensures that the chatbot's responses are not just useful but also tailored to each user's specific needs.

The HSU AI Chatbot strives to improve user satisfaction and foster better ties between the university and its community by making information easily available, engaging, and instructive. Its ability to provide tailored insights on HSU's academic programs, facilities, student life, and other facets of university life has the potential to substantially improve the user experience, promoting a better awareness of the university's offerings and ideals.


ACKNOWLEDGEMENT


LECTURER’S REVIEW

Ho Chi Minh City, Day month year 2023

REVIEWER


TABLE OF CONTENTS

LECTURER’S REVIEW

1 Introduction

2 System Description

2.1 Overview of the system

2.2 System Architecture/System Flow

2.3 Detailed Description of System Components

3 Project Scope

4 Results

5 Summary

6 Reference


LIST OF TABLES, DIAGRAMS, IMAGES

Image 1: Chatbot's flow

Image 2: User's input

Image 3: Chatbot's output


1 Introduction

This project introduces an artificial intelligence chatbot that acts as a thorough guide for prospective students and current Hoa Sen University clients. The chatbot is powered by Google's Gemini 1.5 language model and the Retrieval-Augmented Generation (RAG) approach implemented with LangChain, and it uses a vector database (ChromaDB) to retrieve information drawn from over 300 webpages on HSU's main website. The chatbot responds to user requests with individualized responses that provide thorough information about HSU's academic programs, facilities, student life, and other topics. This project seeks to improve the user experience by offering an easily available and useful resource, streamlining communication, and increasing engagement with the university.

2 System Description

2.1 Overview of the system

The Hoa Sen academic AI Chatbot is a conversational assistant that aims to increase user engagement and provide quick access to academic information. It uses Google's Gemini 1.5 language model to produce natural and informative responses, together with a knowledge base derived from over 300 webpages scraped from HSU's official website. This information is kept in a vector database, ChromaDB, which allows the chatbot to quickly retrieve relevant information in response to user inquiries.

The chatbot's functionality is built on LangChain, a framework for creating conversational bots. It uses the Retrieval-Augmented Generation (RAG) approach, which allows the chatbot to retrieve relevant information from ChromaDB and integrate it into its responses.

This approach enables the chatbot to deliver personalized and comprehensive responses to user inquiries about HSU's academic programs, facilities, student life, and more. It offers a convenient and engaging way for prospective students and current customers to explore the university and find the information they need. The chatbot aims to improve user satisfaction and strengthen the connection between HSU and its community by offering a readily accessible and informative resource.

2.2 System Architecture/System Flow

[Diagram with the labels: Chunked Texts, Generate Embeddings, Embeddings, Prompt, Prompt Embedding, Most relevant text passages (context), LLM, Result]

Image 1: Chatbot's flow

2.3 Detailed Description of System Components

1 Data Gathering and Preparation:

Webpage Scraping: The project begins by extracting relevant information from Hoa Sen University's official website. This involves systematically collecting data from various webpages, potentially using BeautifulSoup.

Data Chunking: The scraped data is divided into manageable chunks to keep processing and storage efficient. This involves splitting the text of each webpage into chunks of 1,000 words each, a size the vector database can handle.

Embedding with Google AI: Google AI's embedding technology is used to transform the text chunks into numerical representations. This allows the ChromaDB database to store and retrieve information based on semantic similarity, meaning it can match on the meaning of text rather than just on keywords.

ChromaDB Storage: The embedded data chunks are stored in the ChromaDB vector database. ChromaDB is optimized for storing and retrieving large amounts of text data, enabling efficient search based on semantic similarity.
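The scraping and chunking steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: it uses Python's standard-library html.parser as a stand-in for BeautifulSoup, and the names TextExtractor, html_to_text, and chunk_words are hypothetical. Only the 1,000-word chunk size comes from the report.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from HTML, skipping <script> and <style> contents.

    Stand-in for BeautifulSoup (assumption: the report only says it was
    "potentially" used).
    """

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def html_to_text(html):
    """Return the visible text of an HTML page as one space-joined string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def chunk_words(text, chunk_size=1000):
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, len(words), chunk_size)
    ]
```

Each string returned by chunk_words would then be passed to the embedding model and stored in the vector database. Fixed-size word chunks are simple but can cut a sentence in half; overlapping chunks are a common alternative that the report does not mention.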

2 Agent phase:

User Query: The user interacts with the chatbot by typing in a question about Hoa Sen University.

[...] understanding model (like Gemini 1.5) to interpret its meaning and intent.

ChromaDB Retrieval: The chatbot uses the ChromaDB database to retrieve relevant chunks of information based on the user's query. This retrieval, essentially a tool included in the custom agent, is driven by semantic similarity, so the chatbot can find the most relevant information even if the user's query uses different words than the original text.

Response Generation: The chatbot combines the retrieved data with the Gemini model's capabilities to generate a comprehensive and informative response for the user.

Chat History: The chatbot stores the history of user interactions, allowing it to potentially provide more personalized responses in future interactions.
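The retrieval and response-generation steps above can be illustrated with a small, self-contained sketch. Everything here is an assumption made for illustration: toy_embed is a bag-of-words stand-in (it is not semantic and is not Google AI's embedding model), the in-memory list stands in for ChromaDB, and the function names and prompt wording are hypothetical. The only part taken from the report is the flow itself: embed the query, rank chunks by similarity, and pass the best matches to the language model as context.

```python
import math
from collections import Counter


def toy_embed(text):
    """Stand-in embedding: lowercase word counts (NOT a semantic model)."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query, best match first."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]


def build_prompt(query, context_chunks):
    """Assemble the text that would be sent to the language model."""
    context = "\n".join(f"- {chunk}" for chunk in context_chunks)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

With a real embedding model, two texts can score as similar even when they share no words, which is the behavior the report describes; the word-count stand-in only matches on shared vocabulary and is used here so the example runs without external services.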

3 Project Scope

Data Gathering:

The project begins with acquiring important information from Hoa Sen University (HSU) by scraping webpages from HSU's official website, with a focus on academics, student life, facilities, and general university information. In addition, documents providing essential information concerning HSU are collected. To facilitate efficient storage and processing, scraped data and document content are separated into manageable parts. These pieces are subsequently converted to numerical representations via Google AI's embedding technology. Finally, the embedded data chunks are saved in a ChromaDB vector database, which is designed to store and retrieve large volumes of text data based on semantic similarity. This method builds a comprehensive knowledge base from which the chatbot can present users with correct and relevant information.

Chatbot Development:

The chatbot’s functionality is built on LangChain, a framework for creating advanced conversational bots. The project uses the ChromaDB database for information retrieval, allowing the chatbot to access the relevant knowledge base during conversations. To interpret user questions and provide natural responses, the chatbot incorporates Google's Gemini 1.5 language model. A basic conversational flow is created to guide user interactions, and a small chat history function is incorporated to save the current conversation for future reference. This combination of technologies results in a chatbot that can interact with users, interpret their questions, retrieve relevant information, and deliver informative responses.
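The small chat history function mentioned above could look like the following sketch. It is an assumption-laden illustration rather than the project's implementation: the class name ChatHistory, the max_turns limit, and the "User:"/"Assistant:" line format are all hypothetical. It keeps a single shared history, which matches the report's note that separate chat sessions per user were not implemented.

```python
from collections import deque


class ChatHistory:
    """Keep the most recent turns of one conversation in memory."""

    def __init__(self, max_turns=5):
        # deque(maxlen=...) silently drops the oldest turn once full.
        self._turns = deque(maxlen=max_turns)

    def add(self, user_message, bot_reply):
        """Record one question/answer exchange."""
        self._turns.append((user_message, bot_reply))

    def as_prompt_prefix(self):
        """Render the stored turns as text to prepend to the next prompt."""
        lines = []
        for user_message, bot_reply in self._turns:
            lines.append(f"User: {user_message}")
            lines.append(f"Assistant: {bot_reply}")
        return "\n".join(lines)
```

Capping the number of stored turns keeps the prompt sent to the language model bounded; the specific limit is a tuning choice that the report does not state.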

Out of Scope:

Data from social platforms: The project will not include data from social media platforms such as Facebook or Instagram, focusing instead on the official website and provided documents.

Multi-lingual support: The chatbot will primarily operate in Vietnamese.

Advanced voice interaction: The chatbot will primarily be text-based.

Complex user authentication: User accounts and authentication will be kept simple for the initial version.

Real-time data updates: The chatbot’s knowledge base will be updated periodically, but it will not have real-time access to dynamically changing information.

Different chat sessions: The chatbot does not implement separate chat sessions for different users.

4 Results

Image 2: User's input

Image 3: Chatbot’s output

[...] relevant information from the knowledge base and generate coherent responses to user queries. Notably, the chatbot consistently retrieved accurate answers from the ChromaDB vector database, even when user questions were phrased differently from the original text within the scraped webpages and documents.

For example, when a user inquired about the university's history, the chatbot retrieved information from a webpage about the university's founding and milestones, successfully identifying and extracting the relevant information from a child page within the HSU website. This demonstrates the chatbot's ability to draw on content from across the complex structure of the HSU website based on user questions, and shows that the core processes of data gathering, embedding, storage, and retrieval work as designed.

This result indicates that the project's core components were implemented successfully and supports the use of Google AI's embedding technology and ChromaDB for storing and retrieving information based on semantic similarity. It shows that the chatbot can identify relevant information within a complex web structure, providing a valuable resource for users seeking information about Hoa Sen University.

5 Summary

The HSU AI Chatbot project aimed to develop a conversational assistant that would serve as a comprehensive guide for prospective students and current customers of the university. The project used natural language processing (NLP) and a knowledge base built from data scraped from HSU's official website and provided documents.

The project faced several challenges:

Data Gathering: While the goal was to collect data from both the website and social media platforms, the project ultimately focused on the website and provided documents due to limitations. The sheer volume of data collected from the website posed a challenge in terms of processing and storage.

Data Integration: Finding a suitable method to integrate the data into ChromaDB, a vector database optimized for semantic search, proved to be a hurdle.

Agent Development: Creating a custom agent that could effectively use the knowledge base and provide engaging responses required considerable effort.

[...] with the chatbot presented a significant challenge.

Despite these difficulties, the project demonstrated the core functionality of the chatbot. It retrieved relevant information from the ChromaDB database, supporting the effectiveness of the embedding technology and data storage methods. The chatbot was able to identify pertinent information from across the website's structure, even when user questions were phrased differently from the original text.

This project laid the groundwork for future development and highlighted the need to address the challenges encountered, particularly in user interface development, data integration, and the inclusion of social media data. The project demonstrated the potential of AI chatbots to enhance user experience and provide valuable information about Hoa Sen University.

6 Reference

Custom agent | LangChain. (n.d.).

Chroma Docs. (n.d.). https://docs.trychroma.com/

Quickstart: Send requests to the Vertex AI API for Gemini. (n.d.). Google Cloud. https://cloud.google.com/vertex-ai/generative-ai/docs/start/quickstarts/quickstart-multimodal
