Thesis: Data mining in SQL Server 2012


MINISTRY OF EDUCATION AND TRAINING
THANG LONG UNIVERSITY

GRADUATION THESIS
DATA MINING IN SQL SERVER 2012

Supervisor: Tran Quang Duy
Students: Doan Minh Cong (A11278), Nguyen Duc Hoang (A11500)
Major: Information Technology
HANOI - 2014

PREFACE

The development of information technology and its application in many areas of life, the economy and society over many years has meant that the volume of data collected and stored by organizations keeps accumulating. They store this data because they believe it contains some hidden value. According to statistics, however, only a small portion of this data (roughly 5% to 10%) is ever analyzed. For the rest, organizations do not know what they should or could do with it, yet they continue the costly collection out of fear that something important might be overlooked and needed later. Meanwhile, in a competitive environment, people need more information, faster, to support decision making, and more and more qualitative questions must be answered from the huge volume of data already available. For these reasons, traditional methods of managing and exploiting databases increasingly fail to meet practical needs, which has given rise to a new technical trend: knowledge discovery and data mining (KDD - Knowledge Discovery and Data Mining).

Knowledge discovery and data mining techniques have been, and are being, studied and applied in many fields around the world. In Vietnam the technique is still relatively new, but it is also being studied and gradually put into use. The most important step of the process is data mining, which helps users obtain useful knowledge from databases or other very large data sources. Many businesses and organizations around the world have applied data mining to their operations and gained great benefits.

For these reasons we chose the topic "Data mining and its application in SQL Server 2012", hoping to learn about data mining methods, models and techniques. This is useful not only from a theoretical point of view but also in practice, by building a model and verifying the results that data mining techniques deliver. Starting from basic knowledge, we gradually moved on to the more complex issues related to data mining algorithms. Although the study stays at a basic, simple level, it touches on the outstanding problems and the potential of data mining applications, especially in the SQL Server 2012 database management system.

The thesis report consists of:
- Preface
- List of abbreviations
- Chapter 1: Overview of data mining
- Chapter 2: Data mining tasks
- Chapter 3: Data mining in SQL Server 2012
- Chapter 4: Applying data mining in SQL Server 2012
- Conclusion
- References

SYMBOLS AND ABBREVIATIONS

Abbreviation   English term                        Meaning
DM             Data Mining                         Data mining
BI             Business Intelligence               Business intelligence
CSDL/DB        Database                            Database
OLAP           Online Analytical Processing        Online data processing and analysis
KDD            Knowledge Discovery in Databases    Discovering knowledge in databases
SSIS           SQL Server Integration Services     Integration services on SQL Server that support data mining
ERP            Enterprise Resource Planning        Managing the resources of an enterprise
ODBC           Open Database Connectivity          Open database connectivity

TABLE OF CONTENTS

CHAPTER 1. OVERVIEW OF DATA MINING
1.1 The concept of data mining
1.1.1 Introduction to data mining
1.1.2 Definition of data mining
1.2 Data mining steps
1.2.1 Data mining techniques
1.2.2 Data flow
1.2.3 Life cycle of a data mining project
1.2.4 Data mining standards
1.3 Approaches to data mining
1.3.1 Architecture of a data mining system
1.3.2 Main functions of data mining
1.3.3 Types of data that can be mined
1.3.4 Difficulties in data mining
1.4 Research trends and current applications of data mining
1.4.1 Research directions
1.4.2 Applications of data mining in practice
1.4.3 Applications of data mining in solving groups of business problems
CHAPTER 2. DATA MINING TECHNIQUES
2.1 Data classification
2.1.1 Decision tree classification model
2.1.2 Bayes classification model
2.2 Data clustering
2.3 Regression
2.4 Association rules
2.5 Forecasting
2.6 Summarization
2.7 Dependency modeling
2.8 Change and deviation detection
CHAPTER 3. DATA MINING IN SQL SERVER 2012
3.1 The OLE DB model in SQL Server
3.1.1 Introduction
3.1.2 Basic concepts of OLE DB for Data Mining
3.1.3 Data Mining Extensions to SQL (DMX)
3.2 Data mining algorithms in SQL Server 2012
3.2.1 Microsoft Decision Trees
3.2.2 Microsoft Clustering
3.2.3 Microsoft Naive Bayes
3.2.4 Microsoft Sequence Clustering
3.2.5 Microsoft Time Series
3.2.6 Microsoft Association Rules
3.2.7 Microsoft Neural Network
3.2.8 Microsoft Linear Regression
3.2.9 Microsoft Logistic Regression
3.3 Principles for choosing an algorithm
CHAPTER 4. APPLYING DATA MINING IN SQL SERVER 2012
4.1 Introduction to Business Intelligence Development Studio
4.2 Application in SQL
4.2.1 Using the Microsoft Decision Tree and Microsoft Naive Bayes algorithms
4.2.2 Using the Microsoft Association Rule algorithm
CONCLUSION
REFERENCES

CHAPTER 1. OVERVIEW OF DATA MINING

1.1 The concept of data mining

1.1.1 Introduction to data mining

In recent years, the strong development of information technology and the hardware industry has made the ability of information systems to collect and store information grow at a dizzying pace. In addition, the rapid, large-scale computerization of production, business and many other activities has left us with an enormous amount of stored data. Millions of databases are used in production, business and management, many of them very large, on the order of gigabytes or even terabytes. This explosion has created an urgent need for new techniques and tools that automatically turn this huge volume of data into useful knowledge. As a result, data mining has become a topical field in information technology worldwide.

1.1.2 Definition of data mining

Knowledge discovery in databases is a process of identifying patterns or models in data that are valid, novel, useful and understandable. Data mining is a relatively new term that appeared around the late 1980s.

There are many different definitions of data mining. Professor Tom Mitchell defined it as follows: "Data mining is the use of historical data to discover rules and improve future decisions." Taking a more application-oriented approach, Dr. Fayyad stated: "Data mining is often regarded as knowledge discovery in databases, a process of extracting hidden, previously unknown and potentially useful information in the form of regularities, constraints and rules in the database." Statisticians, for their part, see "data mining as an analytical process designed to explore very large amounts of data in order to discover relevant patterns and/or systematic relationships between variables, and then to validate the findings by applying the discovered patterns to new subsets of the data."

(A11278 - Doan Thanh Cong; A11500 - Nguyen Duc Hoang)

In short, data mining is one step in the knowledge discovery process. It consists of specialized data mining algorithms that, under acceptable limits on computational efficiency, find patterns or models in data.

1.2 Data mining steps

1.2.1 Data mining techniques

Although data mining is a relatively new term, most data mining techniques have existed for many years. The three forerunners of data mining are statistics, machine learning and databases. Several data mining algorithms, including regression, time series and decision trees, were invented by statisticians. Regression techniques have existed for centuries. Time series algorithms have been studied for decades. The decision tree algorithm is one of the more recent techniques, dating from the mid-1980s.

Data mining focuses on the automatic or semi-automatic discovery of patterns. Some machine learning algorithms applied in data mining are:

a. Neural networks
This is one of the most widely used data mining techniques today. It rests on a solid mathematical foundation, and its training capability is modeled on the human central nervous system. What a neural network learns can be used to build forecasting and prediction models with high accuracy and reliability. It can detect complex trends that other conventional techniques can hardly find. However, the neural network approach is very complex and difficult to carry out: it requires a lot of time, a lot of data, and many rounds of testing.

b. Genetic algorithms
These simulate natural evolution. The main idea is based on the laws of heredity and variation, natural selection and evolution in biology. Building a biologically inspired genetic algorithm to find the best solutions involves the following steps:
- Create a genetic encoding in the form of strings over a limited alphabet.
- Set up an artificial environment on the computer in which candidate solutions can take part in a "struggle for survival" that determines their degree of success or failure, also called their "fitness".
- Develop "crossover operators" that let solutions combine with each other. The genetic strings of the parent solutions are cut and rearranged, and various kinds of mutation can be applied during this reproduction.
- Provide a reasonably diverse initial population of solutions and let the computer play the "evolution game" by removing solutions from the population and replacing them with offspring or mutations of good solutions. The algorithm ends when a family of successful solutions has been produced.

Data mining is the automatic extraction of characteristic features from a large amount of data. The knowledge usually takes the form of patterns that are non-trivial and hidden (not explicit) but can bring great benefit if used in the right place. Data mining can be regarded as the core of the process of knowledge discovery in databases (Knowledge Discovery in Databases, KDD).

1.2.2 Data flow

Data mining is one of the important members of the data warehouse family. Which data mining scenarios fit the data flows of a typical business setting?
The following figure illustrates a typical enterprise data flow in which data mining can be applied at different stages.

[Figure labels: Application, Data Mining, Online Transaction Processing (OLTP), Online Analytical Processing]
Figure 1: Data mining model in an enterprise

A business application stores its transaction data in an online transaction processing (OLTP) database. The OLTP data is regularly extracted, transformed and loaded into the data warehouse. The schema of a data warehouse usually differs from an OLTP schema. A typical data warehouse schema is shaped like a star or a snowflake, with the transaction table at the center surrounded by a set of dimension tables.

First, and most commonly, data mining can be applied to the data warehouse, where the data has already been cleaned. The patterns discovered by the mining models can be presented to marketing managers through reports.

Data mining can also be linked directly to business applications, most commonly through predictions. Embedding data mining in business applications is becoming increasingly common. For example, in a web sales scenario, each time a customer puts a product into the shopping cart, a data mining prediction query is run to obtain a list of recommended products based on the analysis.

Data mining can also be applied to analyze OLAP cubes, which are multidimensional databases with many dimensions and measures. A dimension can contain millions of records, which makes it hard to find interesting patterns. Data mining techniques can be applied to discover hidden patterns in an OLAP cube. For example, an association algorithm can be applied to a sales cube to analyze customer purchasing patterns for a specific region and time period. We can also apply data mining techniques to forecast measures such as sales and profit.

APPLYING DATA MINING IN SQL SERVER 2012

After adjusting the parameters of the mining models, press F5 to process the model.

Exploring the mining models: the results of Microsoft Association Rules are shown on the Mining Model Viewer tab, which has three main parts: Itemsets, Rules and Dependency Net.

Itemsets: this tab shows important information about the association rules, such as Support (the support of the itemset) and Size (the number of items in the itemset). To display only the itemsets that contain a given item (for example the Mountain-200 bike model), enter Mountain-200 in Filter Itemset.

[Screenshot: Itemsets tab of the Mining Model Viewer for the association rules model]
In the screenshot above, the itemset with Support 1146 consists of the items Mountain Bottle Cage and Water Bottle. This means that, among all transactions, there are 1146 transactions in which a customer who bought a Mountain Bottle Cage also bought a Water Bottle.

Rules tab: this part presents the association rules discovered by the model. The information about each rule includes:
- Probability: the probability that the rule holds.
- Importance: a measure of the usefulness of the rule; the higher the value, the better the rule.
- Rules: the association rules themselves, in the form X -> Y.

[Screenshot: Rules tab of the Mining Model Viewer, with Probability and Importance columns]
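The Support and Probability columns described above can be illustrated with a minimal Python sketch. It assumes a small, invented list of transactions (the product names echo the screenshots, but the data is hypothetical) and estimates the probability of a rule X -> Y as support(X and Y) / support(X). The Importance score is calculated by the Microsoft algorithm with its own formula and is not reproduced here.

```python
# Hypothetical toy transactions; the real model is trained on the AdventureWorks data.
transactions = [
    {"Mountain Bottle Cage", "Water Bottle"},
    {"Mountain Bottle Cage", "Water Bottle", "Mountain-200"},
    {"Water Bottle"},
    {"Mountain-200", "Sport-100"},
    {"Mountain Bottle Cage", "Water Bottle", "Sport-100"},
]


def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    return sum(1 for t in transactions if items <= t)


def probability(antecedent, consequent, transactions):
    """Estimate P(consequent | antecedent) for the rule antecedent -> consequent."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0  # the rule never fires, so no estimate is possible
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base


pair_support = support({"Mountain Bottle Cage", "Water Bottle"}, transactions)
cage_to_bottle = probability({"Mountain Bottle Cage"}, {"Water Bottle"}, transactions)
bottle_to_cage = probability({"Water Bottle"}, {"Mountain Bottle Cage"}, transactions)
print(pair_support, cage_to_bottle, bottle_to_cage)
```

In this toy data the same itemset yields two rules with different probabilities, which is why the viewer lists itemsets and rules on separate tabs.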
The rules show the associations between items in the transaction database. For example, one of the rules shown says that if a customer buys the products Touring-1000 and Water Bottle, that customer always buys the product Mountain Bottle Cage, with a probability of 100%.

Dependency Net (dependency network): the Dependency Net lets you understand how the items in the model influence each other. Each node in the Dependency Net represents an item. By selecting an item you can see the other items that are determined by the selected item (or that are used to determine it) in the model. You can drag the slider (All Links) on the left to see the strength of the associations (strong or weak) between the items in the model.

[Screenshot: Dependency Net tab of the Mining Model Viewer]

In the Dependency Net, if we select the node Mountain Bottle Cage, we see that the item Mountain Bottle Cage can be predicted by other items, namely Water Bottle, Mountain-200, Cycling Cap and Fender Set - Mountain. Mountain Bottle Cage is in turn used to predict the items Water Bottle, Mountain-200, Fender Set - Mountain and Cycling Cap (two-way arrows, see the figure below) and Sport-100 (one-way arrow).

[Screenshot: Dependency Net with the Mountain Bottle Cage node selected]
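The one-way and two-way arrows can be illustrated with a short Python sketch. It assumes the model's rules are available as hypothetical (antecedent item, consequent item) pairs that mirror the Mountain Bottle Cage example above. This is only an illustration of the idea, not how the viewer builds its graph internally.

```python
from collections import defaultdict

# Hypothetical item-to-item links mirroring the example in the text.
rules = [
    ("Water Bottle", "Mountain Bottle Cage"),
    ("Mountain-200", "Mountain Bottle Cage"),
    ("Cycling Cap", "Mountain Bottle Cage"),
    ("Fender Set - Mountain", "Mountain Bottle Cage"),
    ("Mountain Bottle Cage", "Water Bottle"),
    ("Mountain Bottle Cage", "Mountain-200"),
    ("Mountain Bottle Cage", "Fender Set - Mountain"),
    ("Mountain Bottle Cage", "Cycling Cap"),
    ("Mountain Bottle Cage", "Sport-100"),
]


def build_dependency_net(rules):
    """Return two maps: item -> items it predicts, item -> items that predict it."""
    predicts = defaultdict(set)
    predicted_by = defaultdict(set)
    for antecedent, consequent in rules:
        predicts[antecedent].add(consequent)
        predicted_by[consequent].add(antecedent)
    return predicts, predicted_by


def link_types(item, predicts, predicted_by):
    """Classify the neighbours of `item` by arrow direction."""
    outgoing = predicts.get(item, set())
    incoming = predicted_by.get(item, set())
    return {
        "two_way": sorted(outgoing & incoming),
        "predicts_only": sorted(outgoing - incoming),
        "predicted_by_only": sorted(incoming - outgoing),
    }


predicts, predicted_by = build_dependency_net(rules)
links = link_types("Mountain Bottle Cage", predicts, predicted_by)
print(links)
```

Keeping the two directions in separate maps makes the classification a matter of set intersection and difference.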
The two-way links in the dependency network mean that these products are likely to be bought together. If a customer buys a bicycle, they are likely to also buy a bottle cage and a water bottle. This information can help the sales department place products that are likely to be bought together close to each other, so that customers do not waste effort searching, and it supports building effective marketing strategies (for example, items that are usually bought together should not be discounted at the same time).

Creating predictions: once you are satisfied with the mining models, you can start creating DMX prediction queries using the Prediction Query Builder. It works much like the Access Query Builder: you can drag and drop elements to build queries. The tool offers three views:
- Design
- Query
- Result

Using the Design and Query views, you can build and test a query. You can then run it and display the results in the Result view.

Predicting the products likely to be bought together with a given product:

The input of the problem is a product in the transaction database. Based on the mining model, SQL Server Data Tools helps us predict the other products that are likely to be bought together with the given product, and displays the support and the confidence of the association.

- On the Mining Model menu, select Singleton Query.

[Screenshot: Mining Model Prediction tab with the Singleton Query option]
- In the Source column, choose Prediction Function. In the Field column, choose PredictAssociation.

[Screenshot: prediction grid with Source set to Prediction Function]

- In the Criteria/Argument field, enter [association rule].[v Assoc Seq Line Items],INCLUDE_STATISTICS,3
[Screenshot: Criteria/Argument field filled in for the prediction function]

- In the Singleton Query Input box, click the (...) button in the Value column to choose the product for which related products are to be predicted.

[Screenshot: Singleton Query Input box]

- The Nested Table Input dialog shows the products that can be chosen for the prediction. We try choosing the Water Bottle product, then click Add and OK.

[Screenshot: Nested Table Input dialog with Water Bottle selected]

- Click Results to view the prediction results.

[Screenshot: Result view of the singleton prediction query]
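What this singleton query asks for (the three items most associated with a given basket, together with statistics) can be approximated by the Python sketch below. It is only an illustration under stated assumptions: the transactions are invented, and the ranking is a plain co-occurrence count, which is not claimed to match how PredictAssociation scores its recommendations. The function name `recommend` is hypothetical.

```python
# Hypothetical toy transactions; product names echo the tutorial data.
transactions = [
    {"Water Bottle", "Mountain Bottle Cage"},
    {"Water Bottle", "Mountain Bottle Cage", "Mountain-200"},
    {"Water Bottle", "Road Bottle Cage"},
    {"Water Bottle", "Mountain Bottle Cage", "Sport-100"},
    {"Mountain-200", "Sport-100"},
]


def recommend(basket, transactions, k=3):
    """Return up to k (item, support, probability) tuples for items seen with `basket`.

    support     = number of transactions containing the basket and the item
    probability = support / number of transactions containing the basket
    """
    basket = set(basket)
    matching = [t for t in transactions if basket <= t]
    if not matching:
        return []
    counts = {}
    for t in matching:
        for item in t - basket:
            counts[item] = counts.get(item, 0) + 1
    # Highest support first; ties are broken alphabetically for a stable order.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(item, n, n / len(matching)) for item, n in ranked[:k]]


top3 = recommend({"Water Bottle"}, transactions, k=3)
for item, supp, prob in top3:
    print(f"{item}: support={supp}, probability={prob:.2f}")
```

The parameter `k` plays the role of the trailing 3 in the Criteria/Argument field, and the returned support and probability stand in for the statistics requested with INCLUDE_STATISTICS.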
Predicting the products a customer is likely to buy together, based on the customer's most recent orders:

Input: the customer keys and the regions where the customers live.
Output: tables of prediction information: each related product the customer is likely to buy, with the support and the confidence of the association rule.

We continue working on the Mining Model Prediction tab.

- In the Mining Model pane, click to select vAssocSeqLineItems.
- In the Select Input Table(s) box, click Select Case Table.

[Screenshot: Mining Model Prediction tab with the Select Input Table(s) box]

- The Select Table dialog appears; choose the vAssocSeqOrders table.

[Screenshot: Select Table dialog]
- Similarly, in the Select Input Table(s) box, click Select Nested Table. The Select Table dialog appears; this time choose the vAssocSeqLineItems table, and then click OK. SQL Server automatically creates a mapping from the mining model to the vAssocSeqLineItems table.

[Screenshot: mapping between the mining model and the input tables]

- Next, add the customer information fields needed for the prediction:
  Row 1: in the Source column, select the vAssocSeqOrders table; in the Field column, select CustomerKey.
  Row 2: in the Source column, select the vAssocSeqOrders table; in the Field column, select Region.
  Row 3: in the Source column, select Prediction Function; in the Field column, select PredictAssociation. Drag vAssocSeqLineItems into the Criteria/Argument column, and add the parameters INCLUDE_STATISTICS, 3 after the value placed in that column.

[Screenshot: prediction grid with the CustomerKey, Region and PredictAssociation rows]

- Click Result to return the predictions.
[Screenshot: Result view with the CustomerKey, Region and Expression columns]

The result consists of CustomerKey (the customer key), Region (the region) and Expression (information about the products the customer is likely to buy together, with the support and confidence of the prediction). For example, as the figure shows, customer 18239 in the Pacific region is likely to buy the Water Bottle, the confidence of the prediction is 88%, and the order may also include the products Road-750 and HL Road Tire.

CONCLUSION

This thesis has presented the basic concepts of data mining and its significance in everyday life, and introduced some new directions in the field of data mining today. Building on this basic knowledge, we applied classification algorithms based on decision trees and the Naive Bayes theorem, as well as association rules, to solve a number of practical business problems. The algorithms were deployed on SQL Server 2012, a popular data mining tool at present.

Results obtained from the thesis:
- Understood the main concepts and algorithms of data mining.
- Applied several data mining algorithms and techniques to the business problems of potential customer analysis and market basket analysis.
- Applied new technology, using SQL Server Data Tools integrated in Visual Studio 2012 and the SQL Server 2012 database management system for data mining.

Future work: continue to study the remaining algorithms in more depth and apply them to other practical business problems.

We sincerely thank the lecturers of Thang Long University, where we carried out this project, for their enthusiastic and open-hearted help. We are especially grateful to our supervisor, Mr. Tran Quang Duy, who guided us in completing this graduation thesis report.

Thank you very much.

Posted: 03/07/2016, 22:11
