VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
ADVANCED PROGRAM IN INFORMATION SYSTEMS
THAI HOANG THINH
LE TRINH QUANG TRIEU
HIGH PERFORMANCE SEARCH AND VERIFICATION SYSTEM FOR COVID-19
INFORMATION
BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS
HO CHI MINH CITY, 2021
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
ADVANCED PROGRAM IN INFORMATION SYSTEMS
THAI HOANG THINH - 17521091
LE TRINH QUANG TRIEU - 17521314
HIGH PERFORMANCE SEARCH AND VERIFICATION SYSTEM FOR COVID-19
INFORMATION
BACHELOR OF ENGINEERING IN INFORMATION SYSTEMS
THESIS ADVISOR
PHD. LE KIM HUNG
HO CHI MINH CITY, 2021
ASSESSMENT COMMITTEE
The Assessment Committee is established under the Decision ....., date ....., by the Rector of the University of Information Technology.
..................... - Chairman
ACKNOWLEDGMENTS
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
HIGH PERFORMANCE SEARCH AND VERIFICATION
SYSTEM FOR COVID-19 INFORMATION
Students: Thái Hoàng Thịnh - 17521091, Lê Trịnh Quang Triệu - 17521314
Mentor: Dr. Lê Kim Hùng
Thesis review
1. About the report:
Number of pages:     Number of chapters:
Number of data tables:     Number of figures:
Number of references:     Product:
Some comments on the report format:
2. About the research content:
3. About the application program:
4. About the student's working attitude:
General assessment: The thesis passes/fails the requirements of an engineering/bachelor's thesis, graded Excellent/Good/Average.
Score for each student:
Thái Hoàng Thịnh: .../10
Lê Trịnh Quang Triệu: .../10
Mentor
Dr. Lê Kim Hùng
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
HIGH PERFORMANCE SEARCH AND VERIFICATION
SYSTEM FOR COVID-19 INFORMATION
Students: Thái Hoàng Thịnh - 17521091, Lê Trịnh Quang Triệu - 17521314
Reviewer:
Thesis review
2. About the report:
Number of pages:     Number of chapters:
Number of data tables:     Number of figures:
Number of references:     Product:
Some comments on the report format:
3. About the research content:
4. About the application program:
5. About the student's working attitude:
General assessment: The thesis passes/fails the requirements of an engineering/bachelor's thesis, graded Excellent/Good/Average.
Score for each student:
Thái Hoàng Thịnh: .../10
Lê Trịnh Quang Triệu: .../10
Reviewer
First of all, we would like to thank all the lecturers of the Department of Information Systems, as well as the lecturers at the University of Information Technology - Vietnam National University, Ho Chi Minh City, who have provided us with invaluable knowledge, lessons, and experience over the past four years. We also took the time and arranged our schedules to have the best opportunity to complete this graduation thesis. We wish the Faculty of Information Systems in particular, and the University of Information Technology in general, continued success in teaching and training talent, remaining a firm belief for generations of students on their educational journey.
We would like to express our heartfelt gratitude to Dr. Lê Kim Hùng, our lecturer, in particular. He has always cared about us and helped us solve problems and difficulties during the implementation process, thanks to the valuable experience and lessons he has shared. We were able to successfully complete this graduation thesis thanks to him.
Next, we would like to express our gratitude to our families for their unwavering support and encouragement throughout our time at the University of Information Technology - Vietnam National University, Ho Chi Minh City, which has given us more energy to attend classes.
Finally, our team would like to thank the seniors and fellow students at the University of Information Technology - Vietnam National University, Ho Chi Minh City, who have always enthusiastically supported us and shared ideas and suggestions throughout the thesis period.
Ho Chi Minh City, December 26, 2021
Thái Hoàng Thịnh - Lê Trịnh Quang Triệu
VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
UNIVERSITY OF INFORMATION TECHNOLOGY
SOCIALIST REPUBLIC OF VIETNAM
Independence - Freedom - Happiness
DETAILED OUTLINE
THEME NAME: HIGH PERFORMANCE SEARCH AND VERIFICATION
SYSTEM FOR COVID-19 INFORMATION
Mentor: Dr. Lê Kim Hùng
Duration: from 07/08/2021 to 26/12/2021
Students:
Thái Hoàng Thịnh - 17521091
Lê Trịnh Quang Triệu - 17521314
Content of the subject:
Objectives of the study:
- Apply FAISS and an available Fake News Detection model.
- Crawl data from RSS feeds.
- Learn about gRPC's HTTP/2 protocol and apply it to the system.
- Use Elasticsearch to query data quickly.
- Test, build, and optimize the website system that provides verified information about COVID-19.
- Manage services with Portainer.
- Deploy the system and make it public.
Research object: a high performance search and verification system for COVID-19 information.
Research scope: research the FAISS and Fake News Detection models and the gRPC protocol, and find suitable datasets.
Methods of implementation:
- Theoretical basis:
+ Crawl data from reputable RSS feeds and store the data in Elasticsearch.
+ Integrate FAISS to find the closest semantic data index, and the Fake News Detection model to calculate the correctness percentage of the searched sentences.
+ Use the gRPC protocol to request data quickly.
Expected results:
- A large and validated dataset.
- The system can score the reliability of a searched sentence.
- A system that checks whether the information entered is correct or not.
- A scalable, stress-resistant, and horizontally scalable system (increasing the number of nodes).
Technology related to the topic:
- Use Elasticsearch to index data.
- Use Docker Compose and Docker Swarm.
- Available models such as FAISS and Fake News Detection.
- Crawl data using Selenium with RSS feeds.
- Deploy using Docker and load-balance using nginx.
- Research the FAISS model.
- Research the Fake News Detection model.
- Research the RoBERTa base model.
- Research gRPC.
- Deploy Envoy.
- Research servers.
- Set up a private server to train models.
- 31/10/2021 - 2/11/2021: Research domain and IP; use a domain to map to the server; hide the IP with Cloudflare.
- 16/11/2021 - 26/12/2021: Deploy the whole system to the server and publish it externally with Docker Swarm; check system operation.
Confirmation of mentor                TP. HCM, 26/12/2021
Dr. Lê Kim Hùng                       Student 1 - Student 2
HTTP/2 employs binary rather than text
Multiplexing of Requests and Responses
Streams
How to connect a gRPC server
CI/CD github
Docker engine
Docker container
Docker swarm
Docker compose
Portainer management
Domain IP address
Chapter 3
MODEL ARCHITECTURE
General
Overview
Workflow
Detailed training process
Detailed predicting process
Crawler API
Workflow
Fake News detection API
User Interface
Deployment and load balancing in docker swarm
Chapter 4
IMPLEMENTATION
RSS-crawler API
Fake News detection API
User interface
Github CI/CD process
Elasticsearch
LIST OF FIGURES
Figure 2-1 Workflow RSS work
Figure 2-2 Flow RSS crawler API work
Figure 2-3 Dataset sample 1
Figure 2-4 Dataset sample 2
Figure 2-5 Predict Process Workflow
Figure 2-6 Summary of all models and performances
Figure 2-7 Example Encoder data by encoder
Figure 2-8 Example Encoder data by encoder
Figure 2-9 Using FAISS to search
Figure 2-10 Flow Elasticsearch engine work
Figure 2-11 Two Sentence Input Elasticsearch
Figure 2-12 Inverted Index Elasticsearch
Figure 2-13 Search Query Elasticsearch
Figure 2-14 gRPC workflow
Figure 2-15 gRPC auto convert text, image, etc to binary
Figure 2-16 Stream gRPC workflow
Figure 2-17 Example Compare HTTP/1.1 and HTTP/2
Figure 2-18 Core Executed Envoy
Figure 2-19 Envoy Translates HTTP/* To HTTP/2
Figure 2-20 Flow CI/CD work in github (Viblo CI/CD)
Figure 2-21 Flow docker work (Topdev docker)
Figure 2-22 Swarm node (Bizflycloud)
Figure 2-23 Workflow docker compose build (Azuremarketplace)
Figure 2-24 Dashboard of portainer example
Figure 2-25 List activity of container example
Figure 2-26 Register domain complete at matbao
Figure 2-27 Grant DNS domain
Figure 3-1 Overview model architecture
Figure 3-2 Overview public system
Figure 3-3 Crawler process
Figure 3-4 AI-core overview
Figure 3-5 Fake News detection API workflow
Figure 3-6 Encoding Corpus Workflow
Figure 3-7 Result classification sentences
Figure 3-8 Envoy transfers HTTP/1.* to HTTP/2 and generates protoc code.
Figure 3-9 Apply nginx in docker swarm for management.
Figure 4-1 Swagger UI
Figure 4-2 Test uncrawlable url
Figure 4-3 Test crawlable url
Figure 4-4 Test add url to RSS-crawler api
Figure 4-5 Test remove rss url
Figure 4-6 Test get all rss-url in data-base
Figure 4-7 Swagger UI
Figure 4-8 Demo training on a batch of data including a random sentence
Figure 4-9 Result after call predict api
Figure 4-10 Saving checkpoint api
Figure 4-11 Where to save checkpoint
Figure 4-12 Code structure
Figure 4-13 Libraries in package Json
Figure 4-14 Atomic design workflow
Figure 4-15 Source code
Figure 4-17 serve.proto
Figure 4-18 All JS files that have been generated
Figure 4-19 Envoy.yaml
Figure 4-20 Test gRPC on BloomRPC.
Figure 4-21 Function call gRPC on ReactJS
Figure 4-22 Website CoronaCheck Information
Figure 4-23 CI/CD build and push basic flow
Figure 4-24 CI/CD steps
Figure 4-25 CI/CD code example
Figure 4-26 CI/CD process summary
Figure 4-27 Workflow history
Figure 4-28 Model overview
Figure 4-29 Status checker
Figure 4-30 Node checker
Figure 4-31 Kibana UI node checker
Figure 4-32 Example Network configuration
Figure 4-33 Example for kibana container
Figure 4-34 Example for elasticsearch master node
Figure 4-35 Initialize swarm
Figure 4-36 Swarm node checker
Figure 4-37 Swarm service result
Figure 4-38 Result after start up
Figure 4-39 Demo run a docker container in existed swarm
Figure 4-40 The swagger result after start up append
Everyone in the world has been concerned about the COVID-19 pandemic since the outbreak. It must be stated that in a pandemic, the media matters most; if a news medium is not reliable, a country can suffer severe economic consequences. Since the outbreak, many unreliable sources of information have been widely disseminated on the internet, particularly on widely accessible social media platforms. Fake news is distributed by social media and news organizations to increase readership or as psychological warfare. The goal, according to Inge Tang, is to profit from clickbait: with flashy titles or designs, clickbait entices users to click links in order to boost ad revenue. Because of advances in communication brought about by the rise of social networking sites, fake news has become ever more prevalent. The project's goal is to develop a solution that users can use to identify and filter out websites that contain false or misleading information.
In this graduation thesis, we investigate the problem of detecting fake news. In addition, we used our existing knowledge, as well as further study of machine learning and deep learning, to apply available models and build a website system to detect fake COVID-19 news.
1.2 Rationale
So, what happens when a country is affected by COVID-19? Such incidents often result from an unconfirmed statement or unreviewed articles from that country's health ministries, causing confusion and extremely serious effects on the economy and citizens. During a pandemic, a great deal of information spreads widely around the world, some of it untrustworthy and even harmful. What matters here is the reliability of both the source of the information and the information itself, which has emerged as a global issue in modern society. There is no doubt that social networking has revolutionized the way information spreads on the Web, and in general around the world, over the last few decades by allowing users to freely share content faster than traditional news sources. Because content spreads so quickly and easily across platforms, people (and the algorithms that power the platforms) are vulnerable to misinformation, hoaxes, and biases. Every day, whether accidentally or on purpose, low-trust content is shared. However, the issue of misinformation affects not only social media platforms but also the World Wide Web at large. In fact, when people use web search engines like Google or Bing to conduct a search, they can view and potentially visit hundreds or thousands of web pages with varying information about COVID-19, both useful and potentially misleading.
The goal is to build an information search system that is scalable and highly error-tolerant during data collection and analysis of information documents, using reputable information from the Ministry of Health that is carefully filtered, together with reputable article URLs. We use the available fake news detection model and combine it with the above system to create a complete, production-ready system.
1.3 Aims and Objectives
With this graduation thesis, the main goals that we aim for are as follows:
- Research and apply fake news detection models.
- Crawl data with Selenium through RSS feeds.
- Apply the Elasticsearch engine to the system and optimize search.
- Understand the gRPC protocol (HTTP/2) for communication between backend and backend, and between frontend and backend.
- Use the Envoy proxy to translate HTTP/1.* to HTTP/2.
- Use the ReactJS framework to build the information tracking website.
- Use Portainer to manage all images and containers on Docker.
- Use Docker, Docker Swarm, and Docker Compose to deploy all services in the system.
- Integrate GitHub CI/CD for build and test.
- Register a domain and publish all services on a public server.
1.4 Scope of study
The goal of this study is to create a fault-tolerant COVID-19 information tracking system that handles large amounts of data being added and requested. Furthermore, the system must be capable of handling multiple queries simultaneously. When there is a problem with the system, it should be able to restart itself using Docker Swarm.
Chapter 2 LITERATURE REVIEW AND THEORETICAL BASIS
2.1 RSS crawler API
In this case, we choose RSS feeds to provide data, using Python (Selenium) to crawl data through the RSS feeds and parse the responses to get article titles.
Figure 2-2 Flow RSS crawler API work
In addition, we develop a module that accepts additional RSS link sources to enrich the data. The crawler crawls data every 3 hours, as sketched below.
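As a rough illustration of this crawling loop, the sketch below uses the feedparser package and a hypothetical feed URL; the thesis implementation itself uses Selenium, so this is a simplified assumption-based example, not the production crawler.

```python
# Minimal sketch of the 3-hour RSS crawling loop described above.
# Assumptions: the `feedparser` package is installed, and the feed URL
# below is a hypothetical placeholder, not one of the thesis' sources.
import time
import feedparser

RSS_URLS = ["https://example.com/covid-news/rss"]  # placeholder feed list

def crawl_once():
    """Fetch every registered feed and collect title/link/date per entry."""
    articles = []
    for url in RSS_URLS:
        feed = feedparser.parse(url)          # download and parse the RSS XML
        for entry in feed.entries:
            articles.append({
                "title": entry.get("title", ""),
                "link": entry.get("link", ""),
                "published": entry.get("published", ""),
            })
    return articles

if __name__ == "__main__":
    while True:
        for article in crawl_once():
            print(article["title"])           # real system: index into Elasticsearch
        time.sleep(3 * 60 * 60)               # re-crawl every 3 hours
```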
2.2 Dataset
A dataset is a collection of data. In other words, a dataset corresponds to the contents of a database table or a matrix of statistical data, where each column represents a specific variable and each row corresponds to a particular member of the dataset in question. In machine learning projects, we need a training dataset: the actual data used to train the model to perform different actions. Datasets are broken down into three categories for use in machine learning: training set, validation set, and testing set, as the sketch below illustrates.
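A minimal sketch of such a three-way split, assuming scikit-learn and a hypothetical CSV file with text and label columns (the thesis' actual loading code may differ):

```python
# Illustrative 70/15/15 split into training, validation, and testing sets.
# `covid_news.csv` and its `label` column are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("covid_news.csv")

# First carve out 30%, then halve it into validation and test portions.
train_df, temp_df = train_test_split(
    df, test_size=0.3, random_state=42, stratify=df["label"])
val_df, test_df = train_test_split(
    temp_df, test_size=0.5, random_state=42, stratify=temp_df["label"])

print(len(train_df), len(val_df), len(test_df))
```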
Currently, there are many datasets on the internet for the COVID fake news detection problem, but is that data really useful and really clean? Therefore, selecting a dataset for pre-training the model is one of the most important issues to pay attention to.
Our team collected and analyzed datasets on Papers with Code and Kaggle.
Figure 2-3 Dataset sample 1: the COVID-19 Fake News Dataset, introduced by Patwa et al. in "Fighting an Infodemic: COVID-19 Fake News Dataset", a manually annotated dataset of 10,700 social media posts and articles of real and fake news on COVID-19, benchmarked with four machine learning baselines (Decision Tree, Logistic Regression, Gradient Boost, and SVM), with a best F1 of 93.46%.
Figure 2-4 Dataset sample 2: the CORD-19 dataset on Kaggle, with over 850,000 metadata rows and full-text JSON files (about 63 GB).
2.3 Fake news detection
On the one hand, fake news articles are easily spread through various online media platforms nowadays, posing a significant threat to the trustworthiness of information. On the other hand, our comprehension of the language of fake news remains limited. Incorporating the hierarchical discourse-level structure of fake and real news articles is an important step toward gaining a better understanding of how these articles are structured. Nonetheless, this has received little attention in the fake news detection domain and faces significant challenges. For starters, existing methods for capturing discourse-level structure rely on annotated corpora, which are not available in the case of fake news datasets.
The mechanism of fake news detection AI is based on text semantic classification. This means that every sentence or phrase fed to the model is labeled true or false. The model is trained nine times with that data. A sketch of this classification step follows.
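A minimal sketch of that true/false classification step, assuming the Hugging Face transformers library; the checkpoint path is a placeholder for a RoBERTa model fine-tuned on the labeled data, not a published model id:

```python
# Sketch of scoring one sentence with a fine-tuned RoBERTa classifier.
# "path/to/finetuned-roberta" is a hypothetical local checkpoint.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/finetuned-roberta", num_labels=2)
model.eval()

def predict_real_probability(sentence: str) -> float:
    """Return the model's probability that the sentence is labeled true."""
    inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return torch.softmax(logits, dim=-1)[0, 1].item()

print(predict_real_probability("Drinking hot water cures COVID-19."))
```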
Figure 2-5 Predict process workflow: documents pass through an encode block (transformer tokenizer) into the RoBERTa base model and a logic model; the RoBERTa base model is re-trained over time.
In this case, we compare the performance of 19 models provided by the authors along several aspects, such as features used and resource requirements. Our team chose RoBERTa to apply to our system.
Summary of results (accuracy):

Model type            Model           Feature used               Liar    Fake or Real   Combined corpus
Traditional machine   SVM             Lexical                    0.56    0.67           0.71
learning models       SVM             Lexical + Sentiment        0.56    0.66           0.71
                      LR              Lexical + Sentiment        0.56    0.67           0.76
                      Decision Tree   Lexical + Sentiment        0.51    0.65           0.67
                      AdaBoost        Lexical + Sentiment        0.56    0.72           0.74
                      Naive Bayes     Unigram                    0.60    0.82           0.91
                      Naive Bayes     Bigram                     0.60    0.86           0.93
Deep learning         LSTM            GloVe embedding            0.54    0.76           0.93
models                Bi-LSTM         -                          -       -              -
                      HAN             -                          0.75    0.87           0.92
                      Conv-HAN        -                          0.59    0.86           0.92
Pre-trained           sentence-BERT   sentence-BERT embeddings   0.62    0.96           0.95
language models       RoBERTa         RoBERTa embeddings         0.62    0.98           0.96
                      DistilBERT      DistilBERT embeddings      0.60    0.95           0.93
                      ELECTRA         ELECTRA embeddings         0.61    0.96           0.95
                      ELMo            ELMo embeddings            0.61    0.93           0.91

Rationale for picking: the traditional models are used in various classification tasks, including text classification, and existing studies have used them for fake news detection; LSTM remembers information across long sentences; HAN applies an attention mechanism to both word-level and sentence-level representations; Conv-HAN adds a convolutional layer that encodes embeddings into features for word-level attention; the pre-trained language models produce embeddings and can be fine-tuned for text classification.

Figure 2-6 Compare models and performances
Our team tested and checked the accuracy of this RoBERTa model, and the results match what the above statistics describe.
Metric          Value
accuracy        1
f1_score        1
loss            0.000004708
val_accuracy    0.9981
val_f1          0.9972
val_loss        0.001654
val_precision   0.9976
val_recall      0.9975
Figure 2-6 Testing result
2.4 FAISS
Faiss is a library for efficient similarity search and clustering of dense vectors. It contains algorithms that search in sets of vectors of any size, up to ones that possibly do not fit in RAM. It also contains supporting code for evaluation and parameter tuning.
2.4.1 Semantic similarity in universal sentence encoder
Semantic similarity is a measure of the degree to which two pieces of text carry the same meaning. This is broadly useful for obtaining good coverage over the numerous ways that a thought can be expressed in language without needing to manually enumerate them. Simple applications include improving the coverage of systems that trigger behaviors on certain keywords, phrases, or utterances. This section shows how to encode text and compare encoding distances as a proxy for semantic similarity.
"How old are you?"
"What is your age?"
"My phone is good."
Figure 2-7 Example Encoder data by encoder
Figure 2-8 Example corpus encoded by the encoder: a list of questions such as 'What color is chameleon?', 'When is the festival of colors?', 'When is the next music festival?', 'How far is the moon?', 'How far is the sun?', 'What happens when the sun goes down?', 'What we do in the shadows?', 'What is the meaning of all this?', 'What is the meaning of Russell's paradox?', 'How are you doing?'
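To make the encode-then-search flow concrete, here is a small sketch assuming the sentence-transformers and faiss packages; the model name is an assumption, and the corpus reuses the example sentences above:

```python
# Sketch of semantic search: encode a corpus, index it with FAISS,
# then retrieve the nearest sentences for a query.
import faiss
from sentence_transformers import SentenceTransformer

corpus = ["How old are you?", "What is your age?", "My phone is good."]
model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed encoder

embeddings = model.encode(corpus).astype("float32")
index = faiss.IndexFlatL2(embeddings.shape[1])    # exact L2 index
index.add(embeddings)

query = model.encode(["What's your age?"]).astype("float32")
distances, ids = index.search(query, 2)           # two nearest neighbors
for dist, i in zip(distances[0], ids[0]):
    print(f"{corpus[i]!r} (L2 distance {dist:.3f})")  # smaller = more similar
```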
2.5 Elasticsearch engine
Elasticsearch is a search engine based on the Apache Lucene library. It provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
2.5.1 How does Elasticsearch work
Raw data flows into Elasticsearch from a variety of sources, including logs, system metrics, and web applications. Data ingestion is the process by which this raw data is parsed, normalized, and enriched before it is indexed in the Elasticsearch engine. Once indexed in Elasticsearch, users can run complex queries against their data and use aggregations to retrieve complex summaries of their data.
Figure 2-10 Flow Elasticsearch engine work
2.5.2 Reindex in elasticsearch
Elasticsearch uses an inverted index structure. It is intended to support full-text searches. The documents are divided into meaningful words, which are then mapped to determine which document each word belongs to. Depending on the type of search, it will return specific results.
For example, we have two documents that have been added to Elasticsearch:
1. The quick brown fox jumped over the lazy dog
2. Quick brown foxes leap over lazy dogs in summer
Figure 2-11 Two sentence input Elasticsearch example
(Topdev elasticsearch reindex)
To make an inverted index, we'll extract each document's content into individual words (which we call terms), make a sorted list of all unique terms, and then list the documents in which each term appears. The outcomes are as follows:
Term      Doc_1   Doc_2
Quick               X
The         X
brown       X       X
dog         X
dogs                X
fox         X
foxes               X
in                  X
jumped      X
lazy        X       X
leap                X
over        X       X
quick       X
summer              X
the         X
Figure 2-12 Inverted index elasticsearch
(Topdev elasticsearch reindex)
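The same construction can be expressed in a few lines of Python; this toy sketch illustrates the idea only and is not Elasticsearch's internal implementation:

```python
# Toy inverted index: map each term to the documents containing it.
from collections import defaultdict

docs = {
    1: "The quick brown fox jumped over the lazy dog",
    2: "Quick brown foxes leap over lazy dogs in summer",
}

inverted = defaultdict(set)
for doc_id, text in docs.items():
    for term in text.split():        # naive whitespace tokenization
        inverted[term].add(doc_id)

print(sorted(inverted["brown"]))     # [1, 2] -> term appears in both docs
print(sorted(inverted["quick"]))     # [1]    -> case-sensitive, like the table
```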
To search for "quick brown", we just need to look at whether each term appears in the documents. The results are as follows:
Term      Doc_1   Doc_2
brown       X       X
quick       X
Total       2       1
Figure 2-13 Search Query Elasticsearch
(Topdev elasticsearch reindex)
Both documents match the query. It is clear, however, that Doc_1 is a far better match. Data can be searched quickly, and fuzzy search is supported, which means that even if the search query contains misspellings or erroneous syntax, Elasticsearch can still return useful results. A minimal query sketch follows.
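As an illustration, the following sketch runs a fuzzy match query with the official elasticsearch Python client; the index and field names are hypothetical, and the cluster address assumes a local deployment:

```python
# Sketch of a fuzzy full-text query; note the deliberate misspelling
# "quik", which fuzziness lets Elasticsearch match anyway.
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")       # assumed local cluster

resp = es.search(index="covid-articles", query={
    "match": {
        "title": {
            "query": "quik brown",                # misspelled on purpose
            "fuzziness": "AUTO",                  # tolerate small typos
        }
    }
})
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"]["title"])
```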
2.6 gRPC protocol
gRPC is an open-source data exchange technology developed by Google that uses the HTTP/2 protocol. In contrast, we often use REST to design web APIs; REST uses standard HTTP/1.1 methods like GET, POST, PUT, and DELETE to work with server-side resources.
Figure 2-14 gRPC workflow
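For a sense of what calling such a service looks like, here is a Python client sketch; the generated modules serve_pb2/serve_pb2_grpc and the service, method, and field names are hypothetical placeholders produced by protoc from a .proto definition, not the thesis' actual API:

```python
# Sketch of a unary gRPC call from a Python client.
# serve_pb2 / serve_pb2_grpc would be generated by protoc; the names
# FakeNewsStub, Verify, sentence, and score are illustrative only.
import grpc
import serve_pb2
import serve_pb2_grpc

with grpc.insecure_channel("localhost:50051") as channel:
    stub = serve_pb2_grpc.FakeNewsStub(channel)
    request = serve_pb2.VerifyRequest(sentence="5G towers spread COVID-19")
    reply = stub.Verify(request)
    print(reply.score)    # e.g. confidence that the claim matches verified data
```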
2.6.1 HTTP/2 employs binary rather than text
Binary protocols are easier to parse, more compact to transmit, and, most importantly, less error-prone than text, simply because binary data does not have to deal with things like spaces, capitalization, carriage returns, blank lines, emojis, and so on.
Figure 2-15 gRPC auto convert text, image, etc to binary
2.6.2 Multiplexing of Requests and Responses
When the client wants to optimize the connection by making multiple requests at once and then waiting for the responses, this is referred to as pipelining in HTTP/1. We already know that the responses must arrive in the same order as the requests, which results in the head-of-line (HOL) blocking phenomenon. To address this, HTTP/2 employs multiplexing for both requests and responses: they carry identifiers so the client can determine which response belongs to which request. They can then proceed independently without having to adhere to a fixed order. In effect, this method divides data between client and server into interleaved data frames, and the receiving side is responsible for reassembling them to obtain the complete data.
2.6.3 Streams
An HTTP/2 connection carries multiple streams (data streams). These streams contain a series of data frames that are exchanged between the client and server. Data frames can be interleaved regardless of origin. Nonetheless, the order of the data frames within a stream is critical, and the receiver must process them in this order. A data stream can be closed or restarted on both the receiving and sending sides. A new stream is assigned an integer ID when it is created.
Figure 2-16 Stream gRPC workflow (Grpc protocol)
In the example below, a large image is broken down into several smaller files. With HTTP/1, the thumbnails are uploaded sequentially, one after the other. With HTTP/2, the TCP connection is divided into multiple streams, each of which is responsible for loading one small image; these streams interleave asynchronously and do not need to wait in order.
2.6.4 How to connect a gRPC server
With gRPC-Web, client calls still need to be translated into gRPC-friendly calls, but that role is now filled by the Envoy proxy service, which has built-in support for gRPC-Web and serves as its default service gateway.
Figure 2-18 Core Executed Envoy
(Comdy envoy proxy)
A listener is an Envoy module that accepts new connections on a configured IP/port. Filters are in charge of handling a request, and a request can be routed through a chain of filters. The Envoy proxy integrates a codec API to translate HTTP/1.* to HTTP/2 or higher.
Figure 2-19 Envoy Translates HTTP/* To HTTP/2
(Comdy envoy proxy)
2.7 CI/CD github
The CI/CD pipeline, or Continuous Integration/Continuous Deployment, is the backbone of the modern DevOps environment. It bridges the gap between development and operations by automating application building, testing, and deployment. CI stands for Continuous Integration, and CD stands for Continuous Delivery/Continuous Deployment. We can think of it as a process like the software development lifecycle. In our system, all services are integrated with CI/CD to ensure they can run in any environment where Docker is available.
Figure 2-20 Flow CI/CD work in github (Viblo CI/CD)
This necessitates the use of a project management system as well as code versioning. We'll use GitHub because it has built-in CI/CD support. We only need to write a CI/CD script, commit the code version, and push the code, and CI/CD will run on its own.