Software and database design

Part of the document: Graduation thesis: Constructing a knowledge graph with fact checking about Vietnamese cuisine (pages 47 - 52)

3.2.1 System architecture

[Diagram: the components shown are user input, web interface, API application, database (entity queries and entity information), BERT-NER model (word separation, entity recognition and labeling; returns entity text and label), triple generation model, and fact checking model (word segmentation, word embedding and similarity; returns a score result), connected by HTTP request/response.]

Figure 3-1 System architecture

As Figure 3-1 shows, our system is composed of several parts and tools, which makes it simple to update and maintain. The machine learning models are trained on a personal computer, then exported and used from Python. The website is built on Reactjs and Nodejs, which are widely used to create websites that are easy to deploy and whose code is easy to manage, and which here serve as the conduit for communication between the models and the application. The client side, Reactjs, creates the user interface (UI) and processes the BERT-NER results before sending them to the server side, Nodejs. Reactjs also serves as a bridge to the graph database and the underlying machine learning models written in Python, with JSON passed back and forth between the two sides. In this way, the web application is the bridge that connects users to the system.

More specifically, the server component (Node.js) is the most crucial component of the system. It serves as the hub, connecting all models, algorithms, and other services and exchanging data between them via HTTP requests and responses. It also implements the CRUD (Create, Read, Update, Delete) operations. Input and output data are exchanged as JavaScript Object Notation (JSON). The knowledge graph is stored in a Neo4j graph database and deployed on Neo4j Aura, a cloud-based, fully automated, scalable, always-on graph platform. As a result, we do not need to reinstall the Neo4j application each time we switch servers.
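The JSON exchange described above can be sketched from the Python side as follows. This is a minimal illustration rather than the system's actual code: the `text` and `entities` field names, the `handle_request` function, and the tiny lexicon standing in for the BERT-NER model are all assumptions.

```python
import json

# Hypothetical stand-in for the BERT-NER model: a tiny lookup table.
LEXICON = {"Mi Quang": "FOOD", "Quang Nam": "LOCATION"}


def recognize_entities(text):
    """Return entity text/label pairs found in the input (placeholder logic)."""
    return [
        {"text": name, "label": label}
        for name, label in LEXICON.items()
        if name in text
    ]


def handle_request(body):
    """Turn a JSON request body into a JSON response body."""
    error = json.dumps({"error": "expected a JSON object with a string 'text' field"})
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return error
    # Reject anything that is not an object carrying a string under "text".
    text = payload.get("text") if isinstance(payload, dict) else None
    if not isinstance(text, str):
        return error
    return json.dumps({"entities": recognize_entities(text)})


if __name__ == "__main__":
    print(handle_request(json.dumps({"text": "Mi Quang comes from Quang Nam"})))
```

Keeping the handler a pure function from request body to response body means the same logic could sit behind any HTTP framework that the Node.js server calls.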

3.2.2 Knowledge graph construction

We use a knowledge graph as the database storage because of the following benefits:

• A knowledge graph organizes information in a structured way, which makes relevant information easy to track and understand.

• It represents relationships between objects, clearly defining the connections between them. This provides an overview of the data and a rich, detailed picture of the information.

• It provides a flexible way to perform complex queries, so we can search for information and get results more quickly and efficiently.

• Knowledge graphs can support artificial intelligence applications, such as machine learning and natural language processing, by providing a rich and structured source of data.

• Knowledge graphs are usually easy to maintain and extend. As new information becomes available, we can add it to the graph without affecting the overall structure.


We must first construct the knowledge graph before we can query it. Other, more automated construction techniques use models to split sentences into triples for inclusion in the knowledge graph. We chose a simpler technique instead, which saves time that we can spend on the other machine learning models. Because the Vietnamese cuisine domain is smaller than many other domains, we were able to find information online fairly quickly, drawing on Wikipedia and other websites that list the dishes of all 63 of Vietnam's provinces. Once the system detects relevant information extracted from a natural language query, it generates an answer based on the results returned from the knowledge graph.
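As a rough illustration of answering from the graph, the sketch below checks a candidate statement against stored triples. It is an in-memory stand-in for a query against the graph database; the relation name `FOUND_IN`, the function names, and the sample triples are illustrative assumptions, not data or code from the thesis.

```python
# Illustrative triples (subject, relation, object); not the thesis data.
TRIPLES = {
    ("Mi Quang", "FOUND_IN", "Quang Nam"),
    ("Cao lau", "FOUND_IN", "Quang Nam"),
    ("Bun bo Hue", "FOUND_IN", "Thua Thien Hue"),
}


def objects_of(subject, relation, triples=TRIPLES):
    """Return every object linked to `subject` by `relation`, sorted."""
    return sorted(o for s, r, o in triples if s == subject and r == relation)


def answer(subject, relation, claimed_object, triples=TRIPLES):
    """Answer a yes/no question by looking the claim up in the triples."""
    known = objects_of(subject, relation, triples)
    if not known:
        return f"No information about {subject}."
    if claimed_object in known:
        return f"Yes: {subject} {relation} {claimed_object}."
    return f"No: the graph links {subject} to {', '.join(known)}."
```

Separating the lookup (`objects_of`) from the answer wording keeps the graph query replaceable without touching how answers are phrased.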

This section outlines how we organize the database for the system. As discussed earlier, the system's database needs to be well organized. We decided to gather the data manually and under supervision. The subject is the foods and cuisines found in Vietnam's 63 provinces and cities. We used the following techniques:

• Web scraping: gathering data on Vietnamese food by pulling text and photos from pertinent blogs, forums, and websites. The gathered data is then preprocessed to ensure consistency.

• Wikipedia: this well-known online encyclopedia was a rich source of information. Structured data and textual content were extracted from Wikipedia articles about Vietnamese cuisine, provinces, cities, and culinary traditions.
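The preprocessing steps are not spelled out here, so the following is only a sketch of one plausible consistency pass: it normalizes Unicode and whitespace and derives lowercase and no-space variants of a name, in the spirit of attributes such as lowerLocationName and noSpace in the data organization chart. The `normalize_name` function and its exact rules are assumptions.

```python
import unicodedata


def normalize_name(raw):
    """Return a cleaned name plus lowercase and no-space variants."""
    # NFC makes visually identical accented strings compare equal.
    composed = unicodedata.normalize("NFC", raw)
    # Collapse runs of whitespace and trim both ends.
    cleaned = " ".join(composed.split())
    return {
        "name": cleaned,
        "lowerName": cleaned.lower(),
        "noSpace": cleaned.replace(" ", ""),
    }
```

Whether the real noSpace attribute is also lowercased is not stated, so the sketch leaves the case untouched.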

After extraction, we saved all of the data in CSV files, one file per entity. The Food entity, for instance, has attributes such as locationId, sourceId, and image. Figure 3-2, our data organization chart, is shown below.


[Diagram: entities and their attributes]

Food: Id, FoodName, engName, vieName, LocationId, TypeId, Description, Image, Temporal, sourceId

Location: Id, LocationName, lowerLocationName, RegionId, noSpace, Country

Type: Id, TypeName, lowerName

Region: Id, RegionName, lowerRegionName, RegionDetail, lowerRegionDetail, EngName

Figure 3-2 Data organization chart
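The one-file-per-entity layout can be read with Python's standard `csv` module, as in the minimal sketch below. The inline CSV text uses a subset of the Food columns from Figure 3-2 with made-up rows, and the helper names `load_rows` and `index_by_id` are hypothetical.

```python
import csv
import io

# Illustrative rows only; a subset of the Food columns from Figure 3-2.
FOOD_CSV = """Id,FoodName,LocationId,TypeId,sourceId
1,Mi Quang,10,2,5
2,Cao lau,10,2,5
"""


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def index_by_id(rows, key="Id"):
    """Index rows by their identifier column for quick lookups."""
    return {row[key]: row for row in rows}


foods = index_by_id(load_rows(FOOD_CSV))
```

Note that `csv.DictReader` returns every value as a string, so identifiers such as LocationId would need an explicit conversion if integer keys are wanted.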

The knowledge graph built for the topic "Vietnamese Cuisine" plays an important role in organizing information and creating a logical structure among the entities related to cuisine. The entities and the relationships between them together provide a comprehensive view of Vietnamese culinary culture.


Table 3-1 Data organization analysis

No. | Entity | Attributes | Key | Description

1 | Food | Id, foodName, engName, vieName, LocationId, TypeId, Description, Image, Temporal, sourceId | Id | The Food entity plays a central role in the knowledge graph. Relationships between Food and other entities such as Location, Region, Source and Type help link information in an organized and logical way.

2 | Location | Id, locationName, lowerLocationName, regionId, noSpace, Country | Id | Location and Region entities: the combination of Location and Region creates an organized geographic system. Each dish can be linked to a specific place, and each place belongs to a certain region. This makes it easy to track and understand the origin and region of each dish.

3 | Region | Id, regionName, lowerRegionName, regionDetail, lowerRegionDetail, engName | Id | (description shared with row 2)

4 | Source | Id, links | Id | Source and Type entities: the Source entity contains information about the source of the data, while the Type entity relates to the category of each dish. Although the Type entity is not too important for validating information, it contributes to enriching the data. This relationship helps identify and classify information effectively.

5 | Type | Id, typeName, lowerName | Id | (description shared with row 4)
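The Food, Location, and Region links described in Table 3-1 can be sketched as plain Python records. This is an illustrative model only: it keeps a few attributes per entity, uses snake_case field names of our own choosing, and the sample records are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    id: int
    region_name: str


@dataclass(frozen=True)
class Location:
    id: int
    location_name: str
    region_id: int  # points at Region.id


@dataclass(frozen=True)
class Food:
    id: int
    food_name: str
    location_id: int  # points at Location.id


def region_of(food, locations, regions):
    """Follow Food -> Location -> Region; return None if a link is missing."""
    location = locations.get(food.location_id)
    if location is None:
        return None
    return regions.get(location.region_id)


# Illustrative records only.
regions = {1: Region(1, "Central")}
locations = {10: Location(10, "Quang Nam", 1)}
mi_quang = Food(1, "Mi Quang", 10)
```

Calling `region_of(mi_quang, locations, regions)` walks the two identifier links in turn, which is the same path a graph query would traverse from a dish to its region.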

We gather information about Vietnamese food, including dishes, locations, and regions, and store it all in CSV files. We then import the data into the Neo4j graph database to enable a more comprehensive and user-friendly view. We estimate that our database contains roughly 1,400 items and 15,000 relationships.
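One way the CSV-to-Neo4j import could look is sketched below: each Food row is turned into parameterized Cypher statements. The import procedure actually used is not described here, so the `food_statements` helper, the `LOCATED_IN` relationship type, and the property names are assumptions, and the sketch only builds the statements without connecting to a database.

```python
def food_statements(row):
    """Build (query, parameters) pairs that upsert one Food row and its link."""
    # MERGE creates the node only if no Food with this id exists yet.
    node = (
        "MERGE (f:Food {id: $id}) SET f.foodName = $foodName",
        {"id": row["Id"], "foodName": row["FoodName"]},
    )
    # Link the dish to its location; LOCATED_IN is a hypothetical type name.
    link = (
        "MATCH (f:Food {id: $foodId}), (l:Location {id: $locationId}) "
        "MERGE (f)-[:LOCATED_IN]->(l)",
        {"foodId": row["Id"], "locationId": row["LocationId"]},
    )
    return [node, link]
```

Each pair could then be passed to a Neo4j driver session, for example `session.run(query, params)` in the official Python driver. That step is omitted because it needs a live Aura instance. Using `MERGE` rather than `CREATE` keeps a repeated import from duplicating nodes or relationships.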
