Thesis title: APPLYING HDP TECHNOLOGY TO OPTIMIZE DATA STORAGE AT LAO TELECOM NETWORK COMPANY, ATTAPEU PROVINCE, LAO PDR
Major: Information Systems
UNIVERSITY OF EDUCATION
VONGVILAI THIDSAMAI
APPLYING HDP TECHNOLOGY TO OPTIMIZE DATA STORAGE AT LAO TELECOM NETWORK COMPANY, ATTAPEU PROVINCE, LAO PDR
Specialization: Information Systems
Code: 84.80.104
Scientific supervisor: Dr. Nguyen Dinh Lau
Da Nang - 2023
DECLARATION
I declare that this is my own research work, carried out under the guidance of Dr. Nguyen Dinh Lau at the Department of Information Systems, Faculty of Information Technology, University of Education, Da Nang. The data and results presented in the thesis are truthful and have not been published by any other author or in any other work.
Author
Vongvilai Thidsamai
ACKNOWLEDGEMENTS
First of all, I would like to express my sincere and deep gratitude to my teacher, Dr. Nguyen Dinh Lau, who guided, encouraged, inspired, and advised me, and who gave me the best possible conditions from the start of the research until the completion of this thesis.
I sincerely thank the lecturers of the Faculty of Information Technology, University of Education, Da Nang, and especially the lecturers of the Department of Information Systems, who taught me with dedication, gave me invaluable knowledge, and created the best conditions for me to complete this thesis.
I also sincerely thank my classmates in class K40.HTTT for creating every [...]
Confirmation of the supervisor
Author of the thesis
VONGVILAI THIDSAMAI
Name of thesis: APPLYING HDP TECHNOLOGY TO OPTIMIZE DATA STORAGE AT LAO TELECOM NETWORK COMPANY, ATTAPEU PROVINCE, LAO PDR.
Major: Information Systems
Abstract: Information technology and telecommunications are among the main conditions determining the development of the global economy. They deeply affect the way we live, learn, and work, and the way the state communicates with the people. They also create economic and social challenges for individuals, businesses, and communities everywhere in the world in the pursuit of higher efficiency and creativity. All of us are facing this opportunity and need to seize it.
Hortonworks Data Platform (HDP) is an open-source platform, developed and built entirely in the open, designed to meet the big data processing needs of enterprises. HDP is flexible, providing linearly scalable storage and computing across a range of access methods: batch and real-time, search and streaming. It includes a comprehensive set of enterprise data processing capabilities such as governance, integration, security, and operations.
Applying these new technologies to the storage and processing of big data is crucial. In the telecommunications industry, data volumes are extremely large, especially for data related to calls, messages, and data usage behavior. For Lao Telecom, this data amounts to about 2.5 TB to 3 TB per month, while for other telecom companies it is about 1.5 TB to 2 TB per month. The size of the data depends on the number of subscribers of each telecom company and on the related information that the exchange records.
In the research project, the comparative analysis of the old solution and the new solution showed that the new solution performs better. Importing data into the system is much faster than with the old solution. In addition, customer billing data grows every year, possibly by 20-30%, so if the old technology and solution are still applied, they will not meet practical requirements. With the new technology and solution, these requirements can be met completely. This technology was developed specifically for large databases and for processing big data in real time.
The topic is valuable in terms of theory. It can be used as a reference for students in the field of information systems to understand how to use HDP technology and apply it in practice. It makes a significant contribution to the storage and processing of data at Lao Telecom.
Keywords: data storage optimization; big data processing; HDP technology; telecommunications technology; information technology.
TABLE OF CONTENTS
DECLARATION
ACKNOWLEDGEMENTS
LIST OF SYMBOLS AND ABBREVIATIONS
LIST OF TABLES
LIST OF FIGURES
INTRODUCTION
1 Reasons for choosing the topic
2 Research objectives
3 Research subject and scope
4 Research methods
5 Scientific and practical significance of the topic
6 Expected results
7 Thesis structure
CHAPTER 1 PROBLEM STATEMENT: THE BILLING PROBLEM CURRENTLY IN USE AT LAO TELECOM
1.1 Model, current state, and business processes of the Lao Billing system
1.1.1 Introduction to the Lao Billing system model
1.1.2 Billing business processes
1.2 Shortcomings of the system
1.3 Chapter summary
CHAPTER 2 ANALYSIS, SELECTION, AND DESIGN OF THE SOLUTION
2.1 The old solution model
2.1.1 Physical model of the current Billing system
2.1.2 Logical model of the current Billing system
2.2 The new, entirely free Billing system solution model [10]
2.3 Combined free and paid solution model
2.4 Comparison of the entirely free solution and the combined paid solution
2.5 Chapter summary
CHAPTER 3 APPLYING HORTONWORKS DATA PLATFORM TECHNOLOGY
3.1 Hortonworks Data Platform
3.2 HDP Data Management
3.4 HDP Data Access
3.5 HDP Data Governance and Integration
3.6 HDP Security
3.7 HDP Cluster Operations
3.8 Hadoop deployment options
3.9 Installing Hortonworks Sandbox on Windows using Oracle VirtualBox
3.9.1 Installing on Windows using Oracle VirtualBox
3.9.2 Importing the Sandbox file: File -> Import Appliance
3.9.3 The Import Virtual Appliance window appears
3.9.4 Import Virtual Appliance
3.9.5 The appliance settings screen appears; you can allocate more RAM than the default to improve performance
3.9.6 The appliance is imported
3.9.7 Open the Sandbox
3.9.8 Wait for the virtual machine to boot
3.9.9 Use a browser on the host machine to open the URLs shown on the console
3.10 Chapter summary
CHAPTER 4 EXPERIMENTAL EVALUATION COMPARING THE OLD AND NEW SOLUTIONS
4.1 Logical model of the experimental system
4.2 Method for collecting experimental data
4.3 Analysis and comparison of experimental data between the two systems
4.4 Chapter summary
CONCLUSIONS AND RECOMMENDATIONS
REFERENCES
LIST OF SYMBOLS AND ABBREVIATIONS
KPI: Key Performance Indicator. An index for evaluating work and a measurement tool that reflects operational effectiveness.
Balacing
LIST OF TABLES
2.1 Comparison of ESB features across service providers
2.2 Comparison of databases across service providers
4.3 Detailed server information while the script is running
4.5 Detailed server data from one experimental run
4.7 Experimental figures on the old technology with a volume of 2G7
LIST OF FIGURES
2.4 Combined paid and free solution model
3.1 Hadoop in a modern architecture model
3.2 Hortonworks Data Platform: Enterprise Hadoop
4.3 Formula and result of the experimental sample-size calculation
INTRODUCTION
1. Reasons for choosing the topic
Information technology plays an increasingly important role in economic and social development. Information technology and telecommunications are among the main driving forces shaping the 21st century. They are also one of the main conditions determining the development of the global economy. They deeply affect the way we live, learn, and work, and the way the state communicates with the people. They also create economic and social challenges for individuals, businesses, and communities everywhere in the world in the pursuit of higher efficiency and creativity. All of us are facing this opportunity and need to seize it.
According to research by Lao Telecom Network Corporation, one of the five leading information technology companies in Laos, the information technology industry will contribute about 1.16 trillion USD to the GDP of the Asia-Pacific region, with an annual growth rate of 0.8%. In 2017, about 6% of Asia-Pacific GDP came from information technology products and services delivered through digital technologies. Lao Telecom forecasts that this figure will rise to 60% of the region's GDP by 2021. Also according to Lao Telecom's figures, about 84% of organizations and enterprises in the region have been and [...] gain the upper hand and achieve many successes. It is therefore very important that enterprises regularly renew their technology and adopt new technologies [...] development and technological innovation. Companies and enterprises need to use measurement indicators (Key Performance Indicators, KPIs) to measure figures such as the effectiveness of processes [...] when new technology is applied, so that they can assess accurately whether the change is truly effective.
However, the adoption of information technology in production processes at enterprises, and research into such adoption, remains slow. Many companies still use technologies that are a decade or more behind the present, even though technology changes daily, especially in the information technology industry. Companies that specialize in technology still lack fundamental research or core products with a strong reputation in the market. This is a weakness that has kept our country's information technology industry from developing as expected.
Starting from this situation, the author raises a problem that is not new but still [...] remedial measures.
2. Research objectives
Apply these new technologies to the storage and processing of big data. In [...]
This thesis will show that if the carrier adopts the new technologies, the data import speed can increase by a factor of 1.5 to 2 compared with the old technology. The figures in the thesis were obtained on a high-specification LAB system, with each sample ranging from 1 day to [...]
3. Research subject and scope
Research subject:
Methods of using HDP technology in processing and storing data.
Research scope:
CDR data of the Lao Telecom network operator.
4. Research methods
Study the theory of HDP technology.
Study the methods of using [...]
5. Scientific and practical significance of the topic
[...] to apply it in practice.
In practical terms: the research results [...] greatly to the storage and processing of data at Lao Telecom.
6. Expected results
Theory
- Understand the HDP methods for data processing.
Practice
- Apply the HDP method to optimize the import of CDR data into the database.
7. Thesis structure
Chapter 1. Problem statement: the billing problem currently in use at Lao Telecom.
[...] presents the necessity and importance of reviewing the billing method currently applied at Lao Telecom.
[...] focuses on each aspect of the problem so that suitable solutions can be proposed that meet the requirements of customers or users.
Chapter 3. Applying Hortonworks Data Platform technology.
This chapter introduces Hortonworks Data Platform technology and covers its installation and its application to the problem. Applying Hortonworks Data Platform means using Hortonworks' Hadoop data platform to process and analyze big data. The technology is used to solve complex problems and to process large, diverse, and complex data sets.
Chapter 4. Experimental evaluation comparing the old and new solutions.
After the program is completed, the old and new solutions are evaluated and compared. The experimental evaluation assesses the effectiveness of the new solution by comparing it with the old solution used previously. Its purpose is to determine whether the new solution is better and more effective than the old one, or whether it needs further improvement.
CHAPTER 1. PROBLEM STATEMENT: THE BILLING PROBLEM CURRENTLY IN USE AT LAO TELECOM
1.1 Model, current state, and business processes of the Lao Billing system
1.1.1 Introduction to the Lao Billing system model
Figure 1.1 Lao Billing system model
[...] for customers. In 2006 there were about 5 exchanges, but by 2019 the number had risen to 30;
- The Billing system group imports data into the database, rates calls, and displays data for customers to look up.
The functions of each group are described as follows:
a. Exchange system
+ The system has about 30 exchanges supplied by many different partners.
+ All information about calls, messages, and data access history is recorded by the exchanges.
+ The system runs on Linux servers and an Oracle database.
+ Nearly 2 TB of data is imported each month.
+ Once the data has been imported, all billing operations run on this data.
1.1.2 Billing business processes
a. Retrieving data from the Billing Gateway system
+ Step 1: Scan the directory that holds the exchange data.
+ Step 2: Check the format template of the CDR file:
- If the file does not match the template, do not process it.
- If the file matches the template, go to Step 3.
+ Step 3: Check the free disk space on the local machine:
- If there is enough disk space, go to Step 4.
- If there is not enough disk space, send a warning message and return to Step 1.
+ Step 4: Download the file to a directory on the local machine and go to Step 5.
+ Step 5: Handle the files that have been downloaded using one of 4 options:
- Delete the file
- Rename the file
- Move the file to another directory on the FTP server
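The retrieval steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: a local directory stands in for the FTP server, and the file-name pattern `CDR_NAME`, the function name, and the `processed` folder are hypothetical, since the text does not give the real template.

```python
import re
import shutil
from pathlib import Path

# Hypothetical CDR file-name template; the real template is not given in the text.
CDR_NAME = re.compile(r"^cdr_\d{8}_\d+\.csv$")

def fetch_cdr_files(remote_dir: Path, local_dir: Path,
                    min_free_bytes: int, after: str = "rename") -> list[str]:
    """Steps 1-5: scan, check template, check disk space, download, post-process."""
    fetched = []
    local_dir.mkdir(parents=True, exist_ok=True)
    for src in sorted(remote_dir.iterdir()):              # Step 1: scan the directory
        if not src.is_file() or not CDR_NAME.match(src.name):
            continue                                        # Step 2: wrong template, skip
        if shutil.disk_usage(local_dir).free < min_free_bytes:
            print("WARNING: not enough local disk space")  # Step 3: warn, end this pass
            break
        shutil.copy2(src, local_dir / src.name)           # Step 4: download
        if after == "delete":                               # Step 5: handle the source file
            src.unlink()
        elif after == "rename":
            src.rename(src.with_name(src.name + ".done"))
        elif after == "move":
            done = remote_dir / "processed"
            done.mkdir(exist_ok=True)
            src.rename(done / src.name)
        fetched.append(src.name)
    return fetched
```

Renaming or moving the source file keeps it from being picked up again on the next scan, which is presumably why the process offers those options.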
b. Importing data into the database
+ Step 1: Scan the directory that holds the downloaded files.
[...] + day + month + year
+ Step 2: Check the format template of the file:
- If the file does not match the template, do not process it.
- If the file matches the template, go to Step 3.
+ Step 3: Read the contents of the file.
+ Step 5: Insert into the database in batches.
- If the insert fails, write the details to the log, save the file to the Unrate directory, and go to Step 6.
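A minimal sketch of the batch insert and its failure path is given below. It uses SQLite in place of the Oracle database named in the text, and the `cdr` table, its three columns, and the function name are assumptions made for illustration only.

```python
import csv
import logging
import shutil
import sqlite3
from pathlib import Path

def import_cdr_file(conn: sqlite3.Connection, path: Path,
                    unrated_dir: Path, batch_size: int = 1000) -> bool:
    """Insert one CDR file in batches; on failure, log it and park the file."""
    # Hypothetical table layout: cdr(caller, callee, seconds).
    sql = "INSERT INTO cdr (caller, callee, seconds) VALUES (?, ?, ?)"
    try:
        with path.open(newline="") as fh:
            batch = []
            for row in csv.reader(fh):
                batch.append(row)
                if len(batch) >= batch_size:
                    conn.executemany(sql, batch)   # one round trip per batch
                    batch = []
            if batch:
                conn.executemany(sql, batch)       # final partial batch
        conn.commit()
        return True
    except (sqlite3.Error, OSError) as exc:
        conn.rollback()                             # discard rows from this file
        logging.error("import of %s failed: %s", path.name, exc)
        unrated_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(path), str(unrated_dir / path.name))
        return False
```

Committing once per file means a failed file leaves no partial rows behind, so it can be re-imported later from the parking directory without creating duplicates.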
+ Step 3: The system aggregates according to the criteria.
+ Step 4: Update the aggregated data in the summary table.
c. Hot-charge aggregation
[...] the system also aggregates the figures for these adjustments.
+ Step 5: Calculate the debt for the customer's subscribers and contracts.
+ Step 6: Billing staff prepare SQL statements to check the execution [...]
[...] the amount according to the debt-analysis rules.
+ Step 3: Check again whether any further transactions have been pushed in.
[...] contract) to adjust. The basic search information includes:
ID card/passport number, tax code.
Number of the contract containing the subscriber to be adjusted.
Number of the subscriber to be adjusted.
+ Step 2: Enter the adjustment figures.
[...] the subscribers in the contract. The allocation rules may be:
Allocate evenly among the subscribers in the contract.
Allocate in proportion to the charges incurred.
Allocate in proportion to the amount payable.
After the allocation rule is applied to each contract, the adjustment figure for each subscriber in that contract can be calculated.
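The three allocation rules differ only in the weights they use, so they can be illustrated with a single function. The sketch below assumes integer currency units and a non-negative adjustment; the rounding policy (leftover units go to the largest fractional remainders) is an assumption, since the text does not say how rounding is handled.

```python
from fractions import Fraction

def allocate(adjustment: int, weights: dict[str, int]) -> dict[str, int]:
    """Split `adjustment` across subscribers in proportion to `weights`.

    Equal weights give the even split; charges incurred or amounts payable
    used as weights give the two proportional rules.
    """
    total = sum(weights.values())
    if adjustment < 0 or total <= 0:
        raise ValueError("adjustment must be >= 0 and weights must sum to > 0")
    exact = {s: Fraction(adjustment * w, total) for s, w in weights.items()}
    parts = {s: int(v) for s, v in exact.items()}   # whole units per subscriber
    leftover = adjustment - sum(parts.values())     # units lost to rounding down
    by_remainder = sorted(exact, key=lambda s: exact[s] - parts[s], reverse=True)
    for s in by_remainder[:leftover]:
        parts[s] += 1
    return parts
```

For example, `allocate(90, {"a": 20000, "b": 10000})` splits an adjustment of 90 in a 2:1 ratio, and the parts always add up to the original adjustment.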
+ Step 5: Apply the adjustment to the subscriber.
[...] adjustment in order to close the books at the end of the period.
+ Step 8: Re-analyze the subscriber's debt.
[...] record the detailed debt [...]
Determine the adjustment amount for that subscriber (DCO).
If the adjustment amount is larger than the outstanding debt of the oldest debt period (N0), then:
Record the adjustment amount for the oldest debt period as equal to the debt amount (N0).
Adjustment amount for the next, more recent debt period (N1) = DCO - N0
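The rule above, where the oldest period absorbs as much of the adjustment as it owes and the remainder carries to the next period, can be sketched as a short loop. Extending it beyond two periods and returning any unused remainder are assumptions; the function name is hypothetical.

```python
def apply_adjustment(adjustment: int, debts: list[int]) -> tuple[list[int], int]:
    """Spread an adjustment over debt periods, oldest first.

    `debts[0]` is the oldest period (N0 in the text). Returns the amount
    recorded against each period and whatever is left once all debts are cleared.
    """
    applied = []
    remaining = adjustment
    for debt in debts:
        used = min(remaining, debt)   # never adjust a period by more than it owes
        applied.append(used)
        remaining -= used             # with two periods, N1 receives DCO - N0
    return applied, remaining
```

With an adjustment of 150 against debts of 100 (oldest) and 80, the oldest period is cleared with 100 and the next period receives the remaining 50.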
[...] deduct from the subscriber's previous period-end debt an amount equal to the adjustment amount.
[...] run any promotion program beforehand in the system.
Basic information:
g. Verification
+ Step 3: Check the data by bill item. If any bill item is wrong, it is examined in Step 4. At this step we have a list of the bill items with discrepancies, and we continue by checking each of them.
+ Step 4: Check the data by day for each bill item with a discrepancy. At this step we have a list of the days with discrepancies for that bill item, and we continue checking at a finer level.
+ Step 5: Check the aggregated data of the subscribers and produce the list of subscribers whose charges differ.
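The drill-down in Steps 3 to 5 (bill item, then day, then subscriber) can be illustrated as a comparison of two record sets at increasing levels of detail. The record layout and function names below are hypothetical, and the sketch shares a limitation with any top-down check: errors that cancel each other out at a coarser level are not detected.

```python
from collections import defaultdict

Record = tuple[str, str, str, int]   # (bill_item, day, subscriber, amount)

def totals(records: list[Record], depth: int) -> dict[tuple, int]:
    """Sum amounts grouped by the first `depth` key fields."""
    out: dict[tuple, int] = defaultdict(int)
    for rec in records:
        out[rec[:depth]] += rec[3]
    return out

def mismatched_subscribers(expected: list[Record], actual: list[Record]) -> list[tuple]:
    """Narrow down bill item -> day -> subscriber, following Steps 3 to 5."""
    suspects = None                        # keys that differed at the previous level
    for depth in (1, 2, 3):
        exp, act = totals(expected, depth), totals(actual, depth)
        keys = set(exp) | set(act)
        if suspects is not None:           # only look inside groups already flagged
            keys = {k for k in keys if k[:depth - 1] in suspects}
        suspects = {k for k in keys if exp.get(k, 0) != act.get(k, 0)}
    return sorted(suspects)
```

Checking totals at the coarse level first keeps the expensive per-subscriber comparison limited to the few bill items and days that actually differ.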
h. Trial printing
+ Step 1: Retrieve the list of all customers who have a bill notice in the system.
+ Step 2: Sort them in priority order according to the criteria chosen by the user.
+ Step 3: Assign each customer a sequence number following the order from Step 2 and save it to the database. Along with this assignment, a barcode is generated (as described in the THNV document) and a jobin code is created for the customer. Each customer belongs to a group that shares one jobin code. One jobin contains about 3000-4000 consecutive items, subject to the rule that a jobin must not span 2 print groups and must not span 2 collection teams. (The print group is a lookup table that indicates which print group a given management type belongs to, for example group 1 consists of KNT and N1K, group 2 consists of [...], and group 3 is KXD.)
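The jobin rule in Step 3 amounts to cutting the sorted customer list into runs that never cross a print-group or collection-team boundary and never exceed the size limit. A minimal sketch, assuming the sort in Step 2 already keeps each group and team contiguous (the tuple layout and function name are hypothetical):

```python
def assign_jobins(customers: list[tuple[str, str, str]],
                  max_items: int = 4000) -> dict[str, int]:
    """Give each customer a jobin number.

    `customers` is already sorted by print priority; each entry is
    (customer_id, print_group, collection_team). A new jobin starts when the
    current one is full or when the print group or collection team changes.
    """
    jobins: dict[str, int] = {}
    job, count, prev = 0, 0, None
    for cust_id, group, team in customers:
        if prev is None or (group, team) != prev or count >= max_items:
            job, count, prev = job + 1, 0, (group, team)
        jobins[cust_id] = job
        count += 1
    return jobins
```

If the sorted list interleaved groups, this would produce many very small jobins, which is why the contiguity assumption matters.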
j. Issuing bill notices
+ Step 1: Import the rated data into the database.
+ Step 4: Trial-print the customers just found that satisfy the conditions above.
+ Step 5: Check the information on the bill notice and the trial-printed call details; if they are correct, Billing department staff sign to confirm.
1.2 Shortcomings of the system
Given the complexity of the processing described above, the system's data is increasingly [...]
a. Weaknesses of the system:
According to the system KPI, customer call data [...] at the latest [...] import the entire backlog of data accumulated during the incident.
Monthly data averaged about 1.5 TB in 2018, but by 2019 the monthly average was about 2 TB, corresponding to about 50 million subscribers.
This detailed data grows because the number of subscribers increases every year and because customers' usage also increases.
- If an incident occurs, the system must finish importing the backlog of charges within 45 minutes at the latest. This is a KPI of the Group as well as of the Ministry of Information and [...]
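To make the growth and KPI figures above concrete, the following sketch projects the monthly volume under compound annual growth and estimates the sustained import rate needed to clear a backlog inside the 45-minute window. The 20-30% growth range comes from the abstract; the backlog size used in the usage note is purely hypothetical, and both function names are illustrative.

```python
def project_monthly_volume(start_tb: float, annual_growth: float, years: int) -> list[float]:
    """Monthly volume (TB) for year 0..years, compounding at `annual_growth`."""
    return [start_tb * (1 + annual_growth) ** y for y in range(years + 1)]

def required_throughput_mb_s(backlog_gb: float, window_minutes: float = 45) -> float:
    """Sustained import rate (MB/s) needed to clear a backlog inside the KPI window."""
    return backlog_gb * 1024 / (window_minutes * 60)
```

Starting from 2 TB per month at 25% annual growth, `project_monthly_volume(2.0, 0.25, 2)` gives 2.0, 2.5, and 3.125 TB. For an assumed backlog of 90 GB, `required_throughput_mb_s(90)` is roughly 34 MB/s of sustained import.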
CHAPTER 2. ANALYSIS, SELECTION, AND DESIGN OF THE SOLUTION
2.1 The old solution model
2.1.1 Physical model of the current Billing system
With this model, expanding the physical components is very simple. Between 2006 and 2018 [10] the number of network nodes increased about 10 times, [...]
Figure 2.1 Physical model of the Billing system (components shown: Core Switch-02, Load Balancing-01, Load Balancing-02, Switch DB-01, Switch App-02)
Core switch pair: the pair of switches that acts as the interface between the Billing system and other systems.
Load Balancing pair: distributes the load of incoming connections from outside and divides [...]
b. Server group
The servers are divided into two main groups, the application group and the database group. Servers in the application group have lower specifications than those in the database group but are more numerous.
c. Storage device group
[...] on two identical disk cabinets so that each serves as a backup for the other.
2.1.2 Logical model of the current Billing system
- Store interface block;
Figure 2.2 Logical model of the Billing system
a. Call-detail import block
This block supplies the input data for the system. All detailed information about calls, messages, and data usage is imported into the database to serve billing and charge lookup.
b. Billing block
The billing block aggregates charges periodically, calculates promotions, closes the books at month end, and prints bill notices for customers. All billing operations reside in this block, and it is also the functional block that consumes resources [...]
[...] such as: registering new subscribers, updating subscribers, collecting payments, and printing call details.
d. Data storage block
[...] and information about customers' charges.
2.2 The new, entirely free Billing system solution model [10]
Components shown in the figure:
Zeppelin: real-time data visualization
Apache Hive: SQL query
Apache Pig: scripting
YARN: cluster resource management
a. Import (ETL) block
Red Hat Fuse is an open-source integration platform based on Apache Camel. It is a distributed integration platform that provides a standardized method, infrastructure, and tools for integrating services, microservices, and application components. JBoss Fuse uses Java Business Integration (JBI) as its application-integration foundation. As a result, JBoss Fuse inherits JBI features such as message routers and normalizers, and tasks for managing and installing components in the integration bus.
b. Temporary storage (cache) block
Kafka is a distributed, highly reliable messaging system that is easy to scale [...] data entering the queue. Kafka is designed to support real-time data collection well.
Speed: a single machine running Kafka can handle reads and writes of up to hundreds of megabytes per second from thousands of clients.
Scalability: Kafka is designed to be scaled easily and transparently to users (that is, with no downtime while a new server node is added to the cluster). When Kafka runs on a cluster, the data stream is partitioned and delivered to the nodes in the cluster, so [...]
HDFS is the primary storage system used by Hadoop. It provides access [...]
MapReduce is a YARN-based system for processing large data sets in parallel. The MapReduce framework consists of a single master JobTracker and one slave TaskTracker per cluster node. The master manages resources, tracks resource consumption, schedules the tasks on the worker machines, monitors them, and re-executes failed tasks. The slave TaskTrackers execute the tasks assigned by the master and report task status so that the master can track them. This master/worker description follows the classic MapReduce design; under YARN the same duties are handled by the ResourceManager, per-application ApplicationMasters, and per-node NodeManagers.
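The programming model behind MapReduce can be shown without a cluster. The following single-process sketch mimics the map, shuffle, and reduce phases to total call time per subscriber; the CDR line layout `caller,callee,seconds` is an assumption, and nothing here uses the Hadoop API.

```python
from collections import defaultdict
from typing import Iterable, Iterator

def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit (caller, seconds) for one CDR line of the form 'caller,callee,seconds'."""
    caller, _callee, seconds = line.split(",")
    yield caller, int(seconds)

def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Total call time for one subscriber."""
    return key, sum(values)

def run_job(lines: Iterable[str]) -> dict[str, int]:
    """Chain the three phases in a single process."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On a real cluster the map and reduce calls run on different nodes and the shuffle moves data across the network, but the contract between the phases is the same.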
Zookeeper is a centralized service for maintaining configuration information and naming, and for providing distributed synchronization and group services. Put simply, a Hadoop cluster has many different nodes, one of which is the master. If the master node fails for any reason, its role is transferred to another node. The main role of the master node is to manage the ordering of writes among the nodes. Zookeeper assigns the new master node and ensures that the Hadoop cluster continues processing without problems. Zookeeper is the means of coordinating all the elements of the distributed Hadoop system.
HBase is an open-source database system built on BigTable, as described in the paper "Bigtable: A Distributed Storage System for Structured Data". HBase can store big data of up to billions of records and millions of distinct columns, amounting to petabytes of capacity. HBase is a typical NoSQL database, so its tables have no fixed schema and no relations [...]
- all scripts run on the Hadoop cluster.
Hive is the data-warehouse infrastructure for Hadoop. Its main task is to provide data summarization, querying, and analysis. It supports the analysis of large data sets stored in Hadoop's HDFS as well as on Amazon S3. A strength of Hive is that it supports access similar to [...]
[...] at once on the entire data set without having to extract samples for trial computation. Spark's processing speed comes from performing computation simultaneously on many different machines. In addition, computation is performed in memory, that is, entirely in RAM.
[...] as charts, tables, or detail tables. Your questions can be saved for later, so that you can easily return to them, or you can group the questions into dashboards.
2.3 Combined free and paid solution model
Figure 2.4 Combined paid and free solution model (components shown: Data Collector; MemSQL, relational database management system; Pipeline; MemSQL Node)
a. Data collection (ETL) block
Mule ESB is an Enterprise Service Bus (ESB) from MuleSoft, [...] rule set, in which case routing to a given channel is based entirely on this rule set.
Transformation: the process of converting data from one format (for example a database file, an XML document, or an Excel spreadsheet) to another. Because enterprise data usually resides in many locations and comes in many different formats, data transformation is necessary to ensure that data can be linked together or used by other systems.
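As a small illustration of format transformation, independent of Mule ESB itself, the sketch below converts CSV records into JSON documents using only the Python standard library. The column names (`caller`, `callee`, `seconds`) and the function name are assumed for the example.

```python
import csv
import io
import json

def csv_to_json_lines(csv_text: str) -> list[str]:
    """Turn CSV rows with a header line into one JSON document per record."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = []
    for row in reader:
        row["seconds"] = int(row["seconds"])    # normalize the numeric field
        out.append(json.dumps(row, sort_keys=True))
    return out
```

A downstream system that expects JSON can then consume the records without knowing anything about the original CSV layout.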
Transaction management: an activity within the application that ensures the outcome is deterministic and correct. Transformation or routing involves many steps; to ensure that these steps succeed and that the data is consistent, the steps must be packaged into one flow. Within this data-processing flow there are no partially successful or incomplete transactions. If that happens, MuleSoft rolls back the data and re-executes the processing flow from the beginning.
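The all-or-nothing behaviour described above can be demonstrated with a database transaction. This sketch uses SQLite rather than MuleSoft, and the `run_flow` helper and its retry count are assumptions made for illustration.

```python
import sqlite3

def run_flow(conn: sqlite3.Connection, steps, retries: int = 1) -> bool:
    """Run all steps in one transaction; roll back and retry from the start on error."""
    for _attempt in range(retries + 1):
        try:
            with conn:                  # commits on success, rolls back on exception
                for step in steps:
                    step(conn)
            return True
        except sqlite3.Error:
            continue                    # nothing from this attempt was kept
    return False
```

Because the whole flow shares one transaction, a failure in any step leaves the database exactly as it was before the flow started.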
Security: provides an additional layer of capabilities on top of the existing system. This is the topmost layer and encloses all the other layers. This security layer is a [...]
Control access to APIs using proven security standards such as OAuth2, SAML, and LDAP.
After passing through the MuleSoft system and being processed (transformation, transaction management, security), the data is forwarded to the next block, the queue block. This block caches transactions and operates entirely in memory (RAM); it is therefore fast and meets the requirement that the system work with big data.
a. Queue (cache) block
The queue block is divided into three main parts: the part that receives input data (Producer), the part that caches the data (the topics), and the output part (Consumer). The block can run as a cluster across many nodes to increase processing speed and provide redundancy. Each type of data is stored in a Topic; each Topic has multiple Partitions, and each node stores one or more Partitions.
Producer: pushes data into one or more topics. The user can decide which messages (each a row of data) belong to the same partition by attaching a key string to each message. Otherwise, the producer assigns a random key and determines the message's destination from the hash of that key.
Topic: a user-named queue of messages (each a row of data). New messages from one or more producers [...] hash value of that string, which ensures that the number of messages on each partition is similar.
Partition: where data is stored on each server (broker). For each partition, [...] reprocess a message if an error occurred during earlier processing.
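The key-to-partition mapping described for the producer can be sketched with a stable hash. The example below uses CRC32 from the Python standard library as a stand-in; Kafka's own default partitioner uses a different hash function (murmur2), so this only illustrates the principle, and the function names are hypothetical.

```python
import zlib
from collections import Counter

def pick_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition via a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def spread(keys: list[str], num_partitions: int) -> Counter:
    """Count how many messages land on each partition."""
    return Counter(pick_partition(k, num_partitions) for k in keys)
```

Messages with the same key always land on the same partition, which preserves their relative order, while distinct keys spread across partitions so that the load stays roughly even.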
b. Data storage block
Pipeline: allows real-time ingestion of data from external sources; it [...] data resides entirely in memory, so it is processed at extremely high speed. Pipeline can scale out on a cluster, which gives it high performance and redundancy.
MemSQL is an in-memory database (IMDB); [...] time windows (window frames) is performed fairly frequently.
• JSON support: MemSQL supports JSON fairly well through a JSON data type. An index can be built directly on an object inside JSON instead of having to extract the data first and then index it, as most other databases require. MemSQL also supports direct access to an object inside JSON through DML, so it is possible to query, filter, and convert some JSON data to strings or numbers and to perform calculations on it. Nested objects inside JSON can even be queried.
• Geospatial data types: MemSQL supports geospatial data types fairly completely, with functions for geospatial calculations such as distance, intersection, [...] memory, so spatial join operations run quite fast.
• Snapshots on disk: MemSQL does not only store data in memory [...]
[...] restored intact from disk. When the database is large, this operation makes restarting the database considerably slower. Naturally, this can be tuned as needed. Data stored on disk is compressed, which significantly reduces the disk space used.
[...] of Web Services technology. An ESB provides routing, transformation, and transaction-management services to support interaction between separate, distributed applications and services in a secure and reliable manner. Gartner has published the following comparison of ESB service providers:
Table 2.1 Comparison of ESB features across service providers
Mulesoft Oracle Microsoft Red Dell