
Graduation Thesis in Information Systems: High Performance Search and Verification System for Covid-19 Information


DOCUMENT INFORMATION

Basic information

Title: High Performance Search and Verification System for Covid-19 Information
Authors: Thai Hoang Thinh, Le Trinh Quang Trieu
Advisor: PhD. Le Kim Hung
University: University of Information Technology
Major: Information Systems
Document type: Thesis
Year of publication: 2021
City: Ho Chi Minh City
Number of pages: 97
File size: 45.09 MB

Contents


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

ADVANCED PROGRAM IN INFORMATION SYSTEMS

THAI HOANG THINH

LE TRINH QUANG TRIEU

HIGH PERFORMANCE SEARCH AND VERIFICATION SYSTEM FOR COVID-19 INFORMATION

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

HO CHI MINH CITY, 2021

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY

UNIVERSITY OF INFORMATION TECHNOLOGY

ADVANCED PROGRAM IN INFORMATION SYSTEMS

THAI HOANG THINH - 17521091

LE TRINH QUANG TRIEU - 17521314

HIGH PERFORMANCE SEARCH AND VERIFICATION SYSTEM FOR COVID-19 INFORMATION

BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS

THESIS ADVISOR: PHD. LE KIM HUNG

HO CHI MINH CITY, 2021


ASSESSMENT COMMITTEE

The Assessment Committee is established under Decision No. ……, dated ……, by the Rector of the University of Information Technology.

…………………………… - Chairman


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

HIGH PERFORMANCE SEARCH AND VERIFICATION SYSTEM FOR COVID-19 INFORMATION

Students: Thai Hoang Thinh - 17521091, Le Trinh Quang Trieu - 17521314
Mentor: Dr. Le Kim Hung


Thesis review

1. About the report:

Number of pages: …… Number of chapters: ……
Number of data tables: …… Number of figures: ……
Number of references: …… Product: ……

Some comments on the report format:

2. About the research content:

3. About the application program:

4. About the student's working attitude:

General assessment: The thesis passes/fails the requirements of an engineering/bachelor's thesis, graded Excellent/Good/Average.

Score for each student:

Thai Hoang Thinh: 10
Le Trinh Quang Trieu: 9

Mentor
Dr. Le Kim Hung


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

HIGH PERFORMANCE SEARCH AND VERIFICATION SYSTEM FOR COVID-19 INFORMATION

Students: Thai Hoang Thinh - 17521091, Le Trinh Quang Trieu - 17521314
Reviewer: ……


Thesis review

1. About the report:

Number of pages: …… Number of chapters: ……
Number of data tables: …… Number of figures: ……
Number of references: …… Product: ……

Some comments on the report format:

2. About the research content:

3. About the application program:

4. About the student's working attitude:


General assessment: The thesis passes/fails the requirements of an engineering/bachelor's thesis, graded Excellent/Good/Average.

Score for each student:

Thai Hoang Thinh: ……/10
Le Trinh Quang Trieu: ……/10

Reviewer

ACKNOWLEDGMENTS


First of all, we would like to thank all the instructors of the Department of Information Systems, as well as the instructors at the University of Information Technology - Vietnam National University, Ho Chi Minh City, who have provided us with invaluable knowledge, lessons, and experience over the past four years. We also took the time and arranged our schedules to give ourselves the best opportunity to complete the graduation thesis. We wish the Faculty of Information Systems in particular, and the University of Information Technology in general, continued success in teaching and in training talent, remaining a firm source of belief for generations of students on their educational journey.

We would like to express our heartfelt gratitude to Dr. Le Kim Hung, our lecturer, in particular. He has always cared about us and helped us solve the problems and difficulties in the implementation process, through the valuable experiences and lessons he has shared. We were able to successfully complete this graduation thesis thanks to him. Next, we'd like to express our gratitude to our families for their unwavering support and encouragement throughout our time at the University of Information Technology - Vietnam National University in Ho Chi Minh City, which has given us more energy to attend classes.

Finally, our team would like to thank the brothers, sisters, and fellow students at the University of Information Technology - Vietnam National University, Ho Chi Minh City, who have always enthusiastically supported us and shared ideas and suggestions with us throughout the time of this dissertation.

Ho Chi Minh City, December 26, 2021
Thai Hoang Thinh - Le Trinh Quang Trieu


VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY

SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness

DETAILED OUTLINE

THEME NAME: HIGH PERFORMANCE SEARCH AND VERIFICATION SYSTEM FOR COVID-19 INFORMATION

Mentor: Dr. Le Kim Hung

Time to do: From 07/08/2021 to 26/12/2021

Students:

Thai Hoang Thinh - 17521091

Le Trinh Quang Trieu - 17521314

Content of the subject:

Objectives of the study:

- Apply FAISS and an available Fake News Detection model.
- Crawl data from RSS feeds.
- Learn about gRPC's HTTP/2 protocol and apply it to the system.
- Use Elasticsearch to query data quickly.
- Test, build, and optimize the website system so that the Covid-19 information it provides has been verified.
- Manage services with Portainer.
- Deploy the system and make it public.


… a high performance search and verification system for covid-19 information.

Research scope: Research FAISS and the Fake News Detection model, the gRPC protocol, and find suitable datasets.

Methods of implementation:

- Theoretical basis:
+ Crawl data from reputable RSS feeds and store the data in Elasticsearch.
+ Integrate FAISS to find the closest semantic data index, and the Fake News Detection model to calculate the correctness percentage of the searched sentences.
+ Use the gRPC protocol to request data quickly.

Expected results:

- A large and validated dataset.
- Search sentences can be rated even when their comprehensibility is low.
- A system that checks whether the information entered is correct or not.
- A scalable, stress-resistant system that scales horizontally (by increasing the number of nodes).

Technology related to the topic:

- Use Elasticsearch to index data.
- Use Docker Compose and Docker Swarm.
- Use available models such as FAISS and Fake News Detection.
- Crawl data using Selenium with RSS feeds.
- Deploy using Docker and load-balance using nginx.


- Research the FAISS model.
- Research the Fake News Detection model.
- Research the RoBERTa base model.
- Research gRPC.
- Deploy Envoy.
- Research servers.


- Set up a private server to train models.
- 31/10/2021 - 2/11/2021: Research domain and IP. Use a domain to map to the server. Hide the IP with Cloudflare.
- 16/11/2021 - 26/12/2021: Deploy all systems to the server and make them public with Docker Swarm. Check system operation.

Confirmation of mentor. Ho Chi Minh City, 26/12/2021

Student 1, Student 2

Dr. Le Kim Hung


HTTP/2 employs binary rather than text
Multiplexing of Requests and Responses
Streams
How to connect a gRPC server
CI/CD github
Docker engine
Docker container
Docker swarm
Docker compose
Portainer management
Domain ip address


Chapter 3. MODEL ARCHITECTURE
General
Overview
Workflow
Detailed training process
Detailed predicting process
Crawler API
Workflow
Fake News detection API
User Interface
Deployment and load balancing in docker swarm

Chapter 4. IMPLEMENTATION
RSS-crawler API
Fake News detection API
User interface
Github CI/CD process
Elasticsearch



LIST OF FIGURES

Figure 2-1 Workflow RSS work

Figure 2-2 Flow RSS crawler API work

Figure 2-3 Dataset sample 1

Figure 2-4 Dataset sample 2

Figure 2-5 Predict Process Workflow

Figure 2-6 Summary of all models and performances

Figure 2-7 Example Encoder data by encoder

Figure 2-8 Example Encoder data by encoder

Figure 2-9 Use FAISS search something

Figure 2-10 Flow Elasticsearch engine work

Figure 2-11 Two Sentence Input Elasticsearch

Figure 2-12 Inverted Index Elasticsearch

Figure 2-13 Search Query Elasticsearch

Figure 2-14 gRPC workflow

Figure 2-15 gRPC auto convert text, image, etc to binary

Figure 2-16 Stream gRPC workflow

Figure 2-17 Example Compare HTTP/1.1 and HTTP/2

Figure 2-18 Core Executed Envoy

Figure 2-19 Envoy Translates HTTP/* To HTTP/2

Figure 2-20 Flow CI/CD work in github (Viblo CI/CD)

Figure 2-21 Flow docker work (Topdev docker)

Figure 2-22 Swarm node (Bizflycloud)

Figure 2-23 Workflow docker compose build (Azuremarketplace)

Figure 2-24 Dashboard of portainer example

Figure 2-25 List activity of container example


Figure 2-26 Register domain completed at matbao

Figure 2-27 Grant DNS domain

Figure 3-1 Overview model architecture

Figure 3-2 Overview public system

Figure 3-3 Crawler process

Figure 3-4 AI-core overview

Figure 3-5 Fake News detection API workflow

Figure 3-6 Encoding Corpus Workflow

Figure 3-7 Result classification sentences

Figure 3-8 Envoy transfers HTTP/1.* to HTTP/2 and generated protoc

Figure 3-9 Apply nginx in docker swarm for management.

Figure 4-1 Swagger UI

Figure 4-2 Test uncrawlable url

Figure 4-3 Test crawlable url

Figure 4-4 Test add url to RSS-crawler api

Figure 4-5 Test remove rss url

Figure 4-6 Test get all rss-url in data-base

Figure 4-7 Swagger UI

Figure 4-8 Demo training a batch of data including a random sentence

Figure 4-9 Result after call predict api

Figure 4-10 Saving checkpoint api

Figure 4-11 Where to save checkpoint

Figure 4-12 Code structure

Figure 4-13 Libraries in package.json

Figure 4-14 Atomic design workflow

Figure 4-15 Source code


Figure 4-17 serve.proto

Figure 4-18 All js files have been generated

Figure 4-19 Envoy.yaml

Figure 4-20 Test gRPC on BloomRPC.

Figure 4-21 Function call gRPC on ReactJS

Figure 4-22 Website CoronaCheck Information

Figure 4-23 CI/CD build and push basic flow

Figure 4-24 CI/CD steps

Figure 4-25 CI/CD code example

Figure 4-26 CI/CD process summary

Figure 4-27 Workflow history

Figure 4-28 Model overview

Figure 4-29 Status checker

Figure 4-30 Node checker

Figure 4-31 Kibana UI node checker

Figure 4-32 Example Network configuration

Figure 4-33 Example for kibana container

Figure 4-34 Example for elasticsearch master node

Figure 4-35 Initialize swarm

Figure 4-36 Swarm node checker

Figure 4-37 Swarm service result

Figure 4-38 Result after start up

Figure 4-39 Demo run a docker container in existed swarm

Figure 4-40 The swagger result after start up append


Everyone in the world has been concerned about the covid-19 pandemic since the outbreak. It must be stated that in a pandemic the media matters most: if a single news outlet is unreliable, a country can suffer severe economic consequences. Since the outbreak, many unreliable sources of information have been widely disseminated on the internet, particularly on widely accessible social media platforms. Fake news is distributed by social media and news organizations to increase readership or as psychological warfare. The goal, according to Inge Tang, is to profit from clickbait: with flashy titles or designs, clickbait entices users to click links in order to boost ad revenue. This exposure to fake news is driven by the advances in communication brought about by the rise of social networking sites. The project's goal is to develop a solution that users can use to identify and filter out websites that contain false or misleading information.

In this graduation thesis, we investigate the problem of detecting fake news. In addition, we used our existing knowledge, together with further study of machine learning and deep learning, to apply available models and build a website system that detects fake covid-19 news.


1.2 Rationale

So, what happens when a country is affected by covid? Such incidents always occur as a result of an unconfirmed word or unreviewed articles from that country's health ministries, causing confusion and extremely serious effects on the economy and citizens. During a pandemic there is a flood of information, some of it untrustworthy and even harmful, that spreads widely around the world. What matters here is the dependability of both the source of the information and the information itself, which has emerged as a global issue in modern society. There is no doubt that social networking has revolutionized the way information spreads on the Web, and in general around the world, over the last few decades by allowing users to freely share content faster than traditional news sources. Because content spreads so quickly and easily across platforms, people (and the algorithms that power the platforms) are potentially vulnerable to misinformation, hoaxes, and biases. Every day, whether accidentally or on purpose, low-trust content is shared. However, the issue of misinformation spread affects not only social media platforms, but also the World Wide Web at large. In fact, when people use web search engines like Google or Bing to conduct a search, they can view and potentially visit hundreds or thousands of web pages with varying information about Covid-19, some of it useful and some potentially misleading.

The goal is to build an information search system that is scalable and highly error-tolerant during data collection and analysis of information documents, using reputable information from the Ministry of Health that is carefully filtered, together with reputable article URLs. We use the available fake news detection model and combine it with the above system to create a complete system that is ready for production.

1.3 Aims and Objectives

With this graduation thesis, the main goals that we aim for are as follows:

- Research and apply fake news detection models.
- Crawl data with Selenium through RSS feeds.
- Apply the Elasticsearch engine to the system and optimize search.
- Understand the gRPC protocol (HTTP/2) for backend-to-backend and frontend-to-backend communication.
- Use the Envoy proxy to translate HTTP/1.* to HTTP/2.
- Use the ReactJS framework to build the information tracking website.
- Use Portainer to manage all images and containers on Docker.
- Use Docker, Docker Swarm, and Docker Compose to deploy all services in the system.
- Integrate GitHub CI/CD for build and test.
- Register a domain and publish all services on a public server.

1.4 Scope of study

The goal of this study is to create a fault-tolerant covid-19 information tracking system that withstands large amounts of data being added and requested. Furthermore, the system must be capable of handling multiple queries simultaneously. When there is a problem with the system, it should be able to restart itself using Docker Swarm.


Chapter 2. LITERATURE REVIEW AND THEORETICAL BASIS

… to do anything. In this case, we will choose RSS feeds to provide data, using Python (Selenium) to crawl data through the RSS feed and parse the responses to get titles.

Figure 2-2 Flow of an RSS feed crawled by the RSS crawler API

In addition, we developed a module for adding more RSS link sources in order to enrich the data. The crawler crawls data every 3 hours.
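To make the crawl loop concrete, here is a minimal sketch of such a periodic crawler in Python. It assumes the feedparser and elasticsearch packages, a local Elasticsearch node, and a hypothetical covid-news index; the thesis itself drives Selenium against RSS feeds, so treat the feed-parsing call as a stand-in:

```python
import time

import feedparser                        # stand-in for the Selenium-based crawler
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")
RSS_URLS = ["https://example.gov/covid/rss"]    # hypothetical reputable feed

def crawl_once() -> None:
    for url in RSS_URLS:
        feed = feedparser.parse(url)
        for entry in feed.entries:
            doc = {
                "title": entry.get("title"),
                "link": entry.get("link"),
                "published": entry.get("published"),
            }
            # use the article link as the document id so that re-crawling
            # the same feed every 3 hours does not create duplicates
            es.index(index="covid-news", id=doc["link"], document=doc)

if __name__ == "__main__":
    while True:
        crawl_once()
        time.sleep(3 * 60 * 60)          # the thesis crawls every 3 hours
```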


2.2 Dataset

A dataset is a collection of data. In other words, a dataset corresponds to the contents of a database table or a matrix of statistical data, where each column of the table represents a specific variable and each row corresponds to a certain member of the dataset in question. In Machine Learning projects, we need a training dataset: the actual data used to train the model to perform different actions. Datasets are broken down into three categories for use in machine learning: training set, validation set, and testing set.
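As a small illustration of the three-way split (not the thesis's own preprocessing code), scikit-learn's train_test_split can be applied twice; the sentences and labels below are made up:

```python
from sklearn.model_selection import train_test_split

texts = [f"news sentence {i}" for i in range(10)]    # made-up corpus
labels = [i % 2 for i in range(10)]                  # 1 = real, 0 = fake

# first carve out 20%, then split that half-and-half:
# 80% training set, 10% validation set, 10% testing set
X_train, X_rest, y_train, y_rest = train_test_split(
    texts, labels, test_size=0.2, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=42)
```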

There are currently many datasets on the internet for the covid fake news detection problem, but is that data really useful, really clean? Therefore, selecting a dataset to pre-train the model is one of the most important issues to pay attention to.

Our team collected and analyzed datasets at paperswithcode and Kaggle.

Figure 2-3 Dataset sample 1: the COVID-19 Fake News Dataset on paperswithcode, introduced by Patwa et al. in "Fighting an Infodemic: COVID-19 Fake News Dataset" - a manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19, benchmarked with four machine learning baselines (Decision Tree, Logistic Regression, Gradient Boost, and SVM), with a best F1 score of 93.46%.


Figure 2-4 Dataset sample 2: the CORD-19 challenge dataset on Kaggle - roughly 860,000 metadata rows and over 550,000 full-text JSON files (PDF and PMC), a 63 GB download, updated regularly.

2.3 Fake news detection

On the one hand, fake news articles are easily spread through various online media platforms nowadays, posing a significant threat to the trustworthiness of information. On the other hand, our comprehension of the language of fake news remains limited. Incorporating the hierarchical discourse-level structure of fake and real news articles is an important step toward gaining a better understanding of how these articles are structured. Nonetheless, this has received little attention in the fake news detection domain and faces significant challenges. For starters, existing methods for capturing discourse-level structure rely on annotated corpora, which are not available in the case of fake news datasets.

The mechanism of fake news detection AI is based on text semantic classification. This means that every sentence or phrase fed to the model is labeled true or false. The model is trained nine times with that data.
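A minimal sketch of such a sentence classifier with the Hugging Face transformers library. The thesis re-trains its own RoBERTa base model; the generic roberta-base checkpoint and the 0 = fake / 1 = real label convention here are assumptions:

```python
import torch
from transformers import RobertaForSequenceClassification, RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=2)    # label 0 = fake, label 1 = real (assumed)
model.eval()

def predict_real_probability(sentence: str) -> float:
    """Return the model's probability that a sentence is real news."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(predict_real_probability("Masks reduce the spread of covid-19."))
```

The classification head above is untrained; in the thesis's setting it would be fine-tuned on the labeled covid news dataset before prediction.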


Figure 2-5 Predict Process Workflow: documents go through a transformer tokenizer and encode block, then into the re-trained RoBERTa base model and a logic model.

In this case, we compare the performance of 19 models provided by the authors along several aspects such as features used and resource requirements. Our team chose RoBERTa to apply to our system.

Summary of results (accuracy):

Model Type | Model | Rationale for Picking | Feature Used | LIAR | Fake or Real | Combined Corpus
Traditional Machine Learning Models | SVM | These traditional models are used in different classification tasks, including text classification; different existing studies used them for fake news detection as well | Lexical | 0.56 | 0.67 | 0.71
 | SVM | | Lexical + Sentiment | 0.56 | 0.66 | 0.71
 | LR | | Lexical + Sentiment | 0.56 | 0.67 | 0.76
 | Decision Tree | | Lexical + Sentiment | 0.51 | 0.65 | 0.67
 | AdaBoost | | Lexical + Sentiment | 0.56 | 0.72 | 0.74
 | Naive Bayes | | Unigram | 0.60 | 0.82 | 0.91
 | Naive Bayes | | Bigram | 0.60 | 0.86 | 0.93
Deep Learning Models | LSTM | LSTM remembers information for long sentences | GloVe embedding | 0.54 | 0.76 | 0.93
 | Bi-LSTM | Bi-LSTM analyzes a certain word in both directions | GloVe embedding | | |
 | HAN | HAN applies an attention mechanism for both word-level and sentence-level representation | | 0.75 | 0.87 | 0.92
 | Conv-HAN | A convolutional layer encodes the embedding into features for word-level attention | | 0.59 | 0.86 | 0.92
Pre-trained Language Models | sentence-BERT | Pre-trained language models can be fine-tuned for text classification | sentence-BERT embeddings | 0.62 | 0.96 | 0.95
 | RoBERTa | | RoBERTa embeddings | 0.62 | 0.98 | 0.96
 | DistilBERT | | DistilBERT embeddings | 0.60 | 0.95 | 0.93
 | ELECTRA | | ELECTRA embeddings | 0.61 | 0.96 | 0.95
 | ELMo | | ELMo embeddings | 0.61 | 0.93 | 0.91

Figure 2-6 Compare models and performances


Our team has tested and checked the accuracy of this RoBERTa model, and the results are exactly what the above statistics describe.

Name          | Value
accuracy      | 1
f1_score      | 1
loss          | 0.000004708
val_accuracy  | 0.9981
val_f1        | 0.9972
val_loss      | 0.001654
val_precision | 0.9976
val_recall    | 0.9975

Figure 2-6 Testing result

2.4 FAISS

Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.


2.4.1 Semantic similarity in universal sentence encoder.


Semantic similarity is a measure of the degree to which two pieces of text carry the same meaning. This is broadly useful for obtaining good coverage over the numerous ways that a thought can be expressed using language, without needing to manually enumerate them. Simple applications include improving the coverage of systems that trigger behaviors on certain keywords, phrases, or utterances. This section shows how to encode text and compare encoding distances as a proxy for semantic similarity.

"How old are you?"

"What is your age?"

"My phone is good."

Figure 2-7 Example Encoder data by encoder


    'What color is chameleon?',
    'When is the festival of colors?',
    'When is the next music festival?',
    'How far is the moon?',
    'How far is the sun?',
    'What happens when the sun goes down?',
    'What we do in the shadows?',
    'What is the meaning of all this?',
    "What is the meaning of Russel's paradox?",
    'How are you doing?'
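Putting the two pieces together, here is a small sketch of encode-then-search with FAISS. The thesis uses the universal sentence encoder; the sentence-transformers model below is a stand-in, and the flat inner-product index is simply the most basic choice:

```python
import faiss
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")    # stand-in encoder
corpus = ["How old are you?", "What is your age?", "My phone is good."]

# encode and L2-normalize so that inner product equals cosine similarity
vectors = encoder.encode(corpus).astype("float32")
faiss.normalize_L2(vectors)

index = faiss.IndexFlatIP(vectors.shape[1])          # exact search, no training
index.add(vectors)

query = encoder.encode(["What is your age?"]).astype("float32")
faiss.normalize_L2(query)
scores, ids = index.search(query, 2)                 # two nearest neighbours
for i, s in zip(ids[0], scores[0]):
    print(corpus[i], round(float(s), 3))
```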


2.5 Elasticsearch engine

Elasticsearch is a search engine based on the Apache Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.

2.5.1 How does elasticsearch work

Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in the Elasticsearch engine. Once indexed in Elasticsearch, users can run complex queries against their data and use aggregations to retrieve complex summaries of their data.

Figure 2-10 Flow Elasticsearch engine work

2.5.2 Reindex in elasticsearch

Elasticsearch uses an inverted index structure, which is intended to support full-text searches. Documents are divided into meaningful words (terms), which are then mapped to determine which text each belongs to. Depending on the type of search, it returns specific results.


For example, we have two documents that have been added to Elasticsearch:

1,The quick brown fox jumped over the lazy dog

2,Quick brown foxes leap over lazy dogs in summer

Figure 2-11 Two sentence input Elasticsearch example

(Topdev elasticsearch reindex)

To make an inverted index, we'll extract each document's content into individual words (which we call terms), make a sorted list of all unique terms, and then list the documents in which each term appears. The following are the outcomes:

Term   | Doc_1 | Doc_2
Quick  |       |   X
The    |   X   |
brown  |   X   |   X
dog    |   X   |
dogs   |       |   X
fox    |   X   |
foxes  |       |   X
in     |       |   X
jumped |   X   |
lazy   |   X   |   X
leap   |       |   X
over   |   X   |   X
quick  |   X   |
summer |       |   X
the    |   X   |

Figure 2-12 Inverted index elasticsearch

(Topdev elasticsearch reindex)


For a "quick brown" search, we just need to look up whether each term appears in each document. The following results:

Term  | Doc_1 | Doc_2
brown |   X   |   X
quick |   X   |
Total |   2   |   1

Figure 2-13 Search Query Elasticsearch

(Topdev elasticsearch reindex)

Both documents match the query. It is clear, however, that Doc_1 is a far better match. Data can be searched quickly, and fuzzy search is supported, which means that even if the search query contains misspellings or erroneous syntax, Elasticsearch can still return useful results.
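A sketch of such a search with the official Elasticsearch Python client, reusing the hypothetical covid-news index from the crawler sketch; the fuzziness option lets the misspelled query still match:

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# "qick" is misspelled on purpose; AUTO fuzziness still matches "quick"
resp = es.search(
    index="covid-news",
    query={"match": {"title": {"query": "qick brown", "fuzziness": "AUTO"}}},
)
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```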

2.6 gRPC protocol

gRPC is an open-source data exchange technology developed by Google on top of the HTTP/2 protocol. By contrast, we often use REST to design web APIs; REST uses standard HTTP/1.1 methods like GET, POST, PUT, and DELETE to work with server-side resources.

Figure 2-14 gRPC workflow


2.6.1 HTTP/2 employs binary rather than text

Binary protocols are easier to parse, more compact to communicate, and, most importantly, less error-prone than text, simply because binary data does not have to deal with things like spaces, capitalization, carriage returns, blank lines, emojis, and so on.

Figure 2-15 gRPC auto convert text, image, etc to binary

2.6.2 Multiplexing of Requests and Responses

When the client wants to optimize the connection by making multiple requests at once and then waiting for the responses, this is referred to as pipelining in HTTP/1. We already know that the responses must arrive in the correct order relative to the requests; this results in the head-of-line (HOL) blocking phenomenon.

To address this, HTTP/2 employs multiplexing for both requests and responses. Requests and responses are identified so that it can be determined which response belongs to which request; they can then proceed independently, without having to adhere to the same order as before. In effect, this method divides the data between client and server into interleaved data frames, and the receiving side is responsible for "assembling" them in order to obtain the complete data.

2.6.3 Streams

Multiple streams (data streams) are present in an HTTP/2 connection. These streams contain a series of data frames that are exchanged between the client and the server. Data frames can be interleaved regardless of origin; nonetheless, the order of the data frames within a stream is critical, and the receiver must process them in this order. A data stream can be closed or restarted on both the receiving and sending sides. A new stream is assigned an integer ID when it is created.

Figure 2-16 Stream gRPC workflow (Grpc protocol): an HTTP/2 request carrying streams 1, 3, and 7, with their data frames interleaved.

In the example below, a large image is broken down into several smaller files. With HTTP/1, the thumbnails are uploaded sequentially, one after the other. With HTTP/2, the TCP connection is divided into multiple streams, each of which is responsible for loading one small image. These streams can interleave asynchronously and do not need to wait in order.
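Below is a minimal sketch of a Python gRPC service that streams many messages over one HTTP/2 stream. The service name, message fields, and the generated serve_pb2 / serve_pb2_grpc modules are assumptions standing in for the thesis's own serve.proto:

```python
# Assumed serve.proto, compiled with `python -m grpc_tools.protoc`:
#   service NewsChecker { rpc CheckStream (stream Claim) returns (stream Verdict); }
#   message Claim   { string text = 1; }
#   message Verdict { string text = 1; bool real = 2; }
from concurrent import futures

import grpc
import serve_pb2
import serve_pb2_grpc

class NewsChecker(serve_pb2_grpc.NewsCheckerServicer):
    def CheckStream(self, request_iterator, context):
        # each claim arrives as a data frame on the same HTTP/2 stream,
        # and each verdict is streamed back without closing the connection
        for claim in request_iterator:
            yield serve_pb2.Verdict(text=claim.text, real=True)  # stub logic

if __name__ == "__main__":
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=4))
    serve_pb2_grpc.add_NewsCheckerServicer_to_server(NewsChecker(), server)
    server.add_insecure_port("[::]:50051")
    server.start()
    server.wait_for_termination()
```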


2.6.4 How to connect a gRPC server.

With gRPC-Web, client calls still need to be translated into gRPC-friendly calls, but that role is now filled by the Envoy proxy service, which has built-in support for gRPC-Web and serves as its default service gateway.


Figure 2-18 Core Executed Envoy

(Comdy envoy proxy)

The Listener is an Envoy module that accepts new connections and binds to an IP/port. A Filter is in charge of handling the request; a request can be routed through a series of filters. The Envoy proxy integrates a codec API to translate HTTP/1.* to HTTP/2 or higher.


Figure 2-19 Envoy Translates HTTP/* To HTTP/2

(Comdy envoy proxy)


2.7 CI/CD github

The CI/CD pipeline, or Continuous Integration/Continuous Deployment, is the backbone of the modern DevOps environment. It bridges the gap between development and operations by automating application building, testing, and deployment. CI stands for Continuous Integration, and CD stands for Continuous Delivery/Continuous Deployment. We can think of it as a process like the software development lifecycle. In our system, all services are integrated with CI/CD to ensure that they can run in any environment where Docker is available.

Figure 2-20 Flow CI/CD work in github (Viblo CI/CD)

This necessitates the use of a project management system as well as code versioning. We'll use GitHub because it has built-in CI/CD support. We only need to write a CI/CD script, commit the code version, and push the code, and CI/CD will run on its own.

