Thesis on frequent itemsets and association rules
With the frequent itemset {i1, i2, i3}, association rules of the following form can be generated:
50% of the customers who buy MANY of {i1, i2} also buy MANY of {i3}.

1.7.4 Finding association rules across fuzzy data mining contexts [7]
Let FFS(O, I, RF, {μ_i}, τ, minsupp) denote the set of frequent itemsets of the fuzzy data mining context determined by the family of membership functions {μ_i}, the context conversion threshold τ, and the threshold minsupp. With three families of membership functions {μ_i^MANY}, {μ_i^AVER}, {μ_i^FEW} for each item i ∈ I, three different fuzzy data mining contexts can be created. The frequent itemset algorithms presented in the preceding sections can then be used to find the sets:
. FFS1 = FFS(O, I, RF, {μ_i^MANY}, τ, minsupp), for the MANY functions
. FFS2 = FFS(O, I, RF, {μ_i^AVER}, τ, minsupp), for the AVERAGE functions
. FFS3 = FFS(O, I, RF, {μ_i^FEW}, τ, minsupp), for the FEW functions
Find SF1 ∈ FFS1 and SF2 ∈ FFS2 such that SF12 = SF1 ∩ SF2 ≠ ∅, and decompose SF12 into non-empty subsets X and Y with SF12 = X ∪ Y and X ∩ Y = ∅, in order to form an association rule X → Y between different contexts. If the confidence of this rule exceeds the threshold minconf, association rules of the following form can be obtained:
56% of the customers who buy MANY of item X will buy FEW of item Y.
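The decomposition step can be illustrated with a minimal Python sketch. It only enumerates the candidate pairs (X, Y) obtained from the intersection of two frequent itemsets; the function name `cross_context_splits` and the sample itemsets are assumptions made for illustration, and the confidence check against minconf is left out because it depends on the fuzzy contexts themselves.

```python
from itertools import combinations

def cross_context_splits(sf1, sf2):
    """Yield every (X, Y) with X | Y == SF1 & SF2, X & Y empty, both non-empty."""
    sf12 = frozenset(sf1) & frozenset(sf2)
    items = sorted(sf12)
    # Proper, non-empty subsets X of SF12; Y is the complement of X in SF12.
    for size in range(1, len(items)):
        for chosen in combinations(items, size):
            x = frozenset(chosen)
            yield x, sf12 - x

# Hypothetical itemsets taken from two different fuzzy contexts.
splits = list(cross_context_splits({"i1", "i2", "i3"}, {"i2", "i3", "i4"}))
for x, y in splits:
    print(sorted(x), "->", sorted(y))
```

Each pair yields one candidate rule X → Y; if the intersection has fewer than two items, no rule can be formed and the generator yields nothing.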
1.8 USING ASSOCIATION RULES TO CLASSIFY DATA AND TO EXTEND THE ATTRIBUTE DEPENDENCY COEFFICIENT IN ROUGH SET THEORY [9]
1.8.1 Basic concepts
Definition 1.22 Binary decision table
Consider a data mining context (O, D, R), where O is a non-empty set of objects and D is a non-empty set of indicators (binary attributes). Let H and C be non-empty subsets of D such that D = H ∪ C and H ∩ C = ∅. The triple (O, D = H ∪ C, R) is called a binary decision table.
Table 1.11: An example of a binary decision table
Table 1.11 is an example of a binary decision table with H = {d1, d2, d3, d4, d5} and C = {c1, c2}. Attribute c1 identifies the negative class; attribute c2 identifies the positive class.
Definition 1.23 Classification rule on a binary decision table
Given a binary decision table (O, D = H ∪ C, R), let S be a non-empty subset of H. A classification rule on the binary decision table has the form S → {c} with c ∈ C. The classification function f generated from a classification rule has the form f = ∧_{d ∈ H'} d, with H' ⊂ H.
Example 1.8. Some classification rules in the binary decision table of Table 1.11:
R1: {d3, d4} → {c2}; R2: {d2, d5} → {c1}; R3: {d5} → {c1}
The corresponding classification functions are f1 = d3 ∧ d4, f2 = d2 ∧ d5, and f3 = d5. An object o satisfies the classification function f if o contains all the indicators that occur in H'.
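As a small illustration of the last sentence, the following Python sketch treats an object as the set of indicators it contains and evaluates a conjunctive classification function as a subset test. The helper name `satisfies` and the sample object are hypothetical, since the rows of Table 1.11 are not reproduced here.

```python
def satisfies(object_indicators, h_prime):
    """Return True if the object contains every indicator in H'."""
    return set(h_prime) <= set(object_indicators)

# Hypothetical object containing indicators d3 and d4.
sample_object = {"d3", "d4"}

f1_holds = satisfies(sample_object, {"d3", "d4"})  # f1 = d3 AND d4
f2_holds = satisfies(sample_object, {"d2", "d5"})  # f2 = d2 AND d5
print(f1_holds, f2_holds)
```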
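1.8.2 Precision of a classification function
The objects of O are assigned to two classes. Let O+ be the set of objects of O that belong to class c2 and O- the set of objects of O that belong to class c1. Given a classification function f, the following criteria can be used to determine the precision of f [24],[38],[48].
Let TP = {o ∈ O+ | f(o) is true}; FP = {o ∈ O+ | f(o) is false};
TN = {o ∈ O- | f(o) is true}; FN = {o ∈ O- | f(o) is false}.
Note that these names do not follow the usual confusion-matrix convention: here FP collects the positive objects that f rejects, and TN the negative objects that f accepts.
The precision of the classification is computed by the formula: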
Example 1.9. For the binary decision table in Table 1.11:
. Consider the classification rule for c1: {d2, d5} → {c1} with f = d2 ∧ d5:
O+ = {o2, o3, o6, o8} for c2; O- = {o1, o4, o5, o7} for c1
. Consider the classification rule for c2 of the form {d3, d4} → {c2} with f = d3 ∧ d4:
O+ = {o2, o3, o6, o8} for c2; O- = {o1, o4, o5, o7} for c1
TP = {o ∈ O+ | f(o) is true} = {o3, o8}
FP = {o ∈ O+ | f(o) is false} = {o2, o6}
TN = {o ∈ O- | f(o) is true} = ∅
FN = {o ∈ O- | f(o) is false} = {o1, o4, o5, o7}
Precision of the classification for c2:
\[ \frac{|TP|}{|TP| + |TN|} = \frac{|\{o_3, o_8\}|}{|\{o_3, o_8\}| + |\emptyset|} = 1.0 \]
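The computation in this example can be reproduced with a short Python sketch. Because the rows of Table 1.11 are not available in this text, the table below is hypothetical: it was chosen only so that O+, O- and the sets TP, FP, TN, FN agree with the values stated above. The function name `precision_sets` is likewise an assumption, and the set names follow the definitions of this section rather than the usual confusion-matrix convention.

```python
# Hypothetical binary decision table: object -> set of indicators it contains.
TABLE = {
    "o1": {"d1", "d4", "c1"}, "o2": {"d2", "d4", "c2"},
    "o3": {"d3", "d4", "c2"}, "o4": {"d1", "d5", "c1"},
    "o5": {"d2", "d5", "c1"}, "o6": {"d3", "d5", "c2"},
    "o7": {"d2", "d5", "c1"}, "o8": {"d3", "d4", "c2"},
}

def precision_sets(table, h_prime, positive="c2"):
    """Split the objects into TP, FP, TN, FN as defined in section 1.8.2."""
    tp, fp, tn, fn = set(), set(), set(), set()
    for obj, indicators in table.items():
        holds = set(h_prime) <= indicators      # does f hold for this object?
        if positive in indicators:              # object belongs to O+
            (tp if holds else fp).add(obj)
        else:                                   # object belongs to O-
            (tn if holds else fn).add(obj)
    return tp, fp, tn, fn

tp, fp, tn, fn = precision_sets(TABLE, {"d3", "d4"})
precision = len(tp) / (len(tp) + len(tn))
print(sorted(tp), sorted(fp), sorted(tn), sorted(fn), precision)
```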
1.8.3 Using association rules as data classification rules
Given a decision table (O, D = H ∪ C, R) and the thresholds minsupp and minconf, find the association rules of the form r: S → {c} with c ∈ C and S ⊂ H. These association rules can be used as data classification rules. By definition, the confidence of the association rule r: S → {c} is:
\[ CF(r) = \frac{|\rho(S) \cap \rho(\{c\})|}{|\rho(S)|} \]
Here ρ(S) is the set of objects that contain the attributes in S and ρ({c}) is the set of objects that belong to class c, so ρ(S) ∩ ρ({c}) identifies the objects that belong to class c and contain the attributes in S. If c is class c2, then |ρ(S) ∩ ρ({c2})| = |TP| and ρ(S) = TP ∪ TN, hence |ρ(S)| = |TP| + |TN| because TP ∩ TN = ∅. In other words:
\[ CF(r) = \frac{|TP|}{|TP| + |TN|} \]
Remark: The confidence of an association rule can be used to evaluate the precision of a classification function.
Example 1.10. For the binary decision table in Table 1.11, the following association rules are obtained:
1.8.4 Using association rules to extend the attribute dependency coefficient in rough set theory
1.8.4.1 Basic concepts in rough set theory
This section uses the basic definitions of rough set theory as the foundation for constructing the extended attribute dependency coefficient [33],[79].
Definition 1.24: Information system
Let O be a finite, non-empty set of objects and A a finite, non-empty set of discrete attributes. Let dom(a_i) be the value domain of attribute a_i ∈ A and
\[ V = \bigcup_{i=1}^{|A|} dom(a_i). \]
The function fs: O → A × V determines the values of the objects for the attributes of A. An information system is the triple (O, A, fs).
Table 1.12: An example of an information system
Definition 1.25 Indiscernibility relation and partition of the object set
Given an information system (O, A, fs) and B ⊂ A, the indiscernibility relation ind(B) on the object set O is defined as follows:
\[ \forall B \subset A,\ \forall u, v \in O:\quad u\ ind(B)\ v \iff u(B) = v(B) \qquad (1.9) \]
The indiscernibility relation ind(B) holds between two objects u and v when they have the same value for every attribute in B (u(B) = v(B)).
For B ⊂ A, it can be verified that ind(B) is an equivalence relation. The relation ind(B) therefore determines a partition of the object set O into equivalence classes. For u ∈ O, [u]ind(B) denotes the equivalence class of u under ind(B), and O/B denotes the partition induced by ind(B). Each element of the partition O/B is called an elementary set or a class.
[o2]ind(B) = [o3]ind(B) = [o6]ind(B) = [o8]ind(B) = {o2, o3, o6, o8}
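The partition O/B can be computed by grouping objects on their value tuple over B. The Python sketch below is only an illustration: the function name `partition` is an assumption, and since Table 1.12 is not reproduced in this text, the data contains only the values of attributes b and c that can be read off the equivalence classes and rules given in Example 1.13.

```python
from collections import defaultdict

# Values of attributes b and c as implied by Example 1.13 (attribute a omitted).
INFO = {
    "o1": {"b": 4, "c": 6}, "o2": {"b": 4, "c": 7},
    "o3": {"b": 4, "c": 7}, "o4": {"b": 5, "c": 6},
    "o5": {"b": 5, "c": 6}, "o6": {"b": 5, "c": 7},
    "o7": {"b": 5, "c": 6}, "o8": {"b": 4, "c": 7},
}

def partition(info, attrs):
    """Return O/B: objects grouped by their values on the attributes in B."""
    classes = defaultdict(set)
    for obj, row in info.items():
        key = tuple(row[a] for a in sorted(attrs))
        classes[key].add(obj)
    return list(classes.values())

classes_c = partition(INFO, {"c"})
classes_b = partition(INFO, {"b"})
print(classes_c)
print(classes_b)
```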
Definition 1.26: Decision table in rough set theory
Given an information system (O, A, fs), let HR and CR be non-empty subsets of A such that A = HR ∪ CR and HR ∩ CR = ∅. Then (O, A = HR ∪ CR, fs) is called a decision table in rough set theory. The set HR is called the set of condition attributes and CR the set of decision attributes. Table 1.12 is an example of a decision table in rough set theory with HR = {a, b} and CR = {c}.
Definition 1.27 Set approximation
Given an information system (O, A, fs), let X be a non-empty subset of O, X ⊂ O, and B a non-empty subset of A, B ⊂ A. To estimate the set X of objects by means of the set B of attributes, Z. Pawlak uses the notion of the lower approximation of X with respect to B, denoted B_*(X), and the upper approximation of X with respect to B, denoted B^*(X) [79]. The lower approximation is defined by:
\[ B_*(X) = \{ u \in O \mid [u]_{ind(B)} \subset X \} \qquad (1.10) \]
Definition 1.28 Attribute dependency coefficient
Given two non-empty subsets U and V of the attribute set A, the attribute dependency coefficient of V on U is used to examine how strongly the attribute set V depends on the attribute set U. It is defined as follows:
\[ \gamma(U,V) = \frac{\sum_{X \in O/V} |U_*(X)|}{|O|} \]
The attribute dependency coefficient γ(U,V) reflects the degree of dependency between the two attribute sets [79].
Example 1.12. For the information system in data table 3.2, let U = {a, b} and V = {c}. Compute γ(U,V).
a) With U = {a, b}, the equivalence classes are:
. {<a,1>; <b,4>}: U1 = [o1]ind(U) = {o1}
. {<a,2>; <b,4>}: U2 = [o2]ind(U) = {o2}
Hence the attribute dependency coefficient of V on U is 1.0, that is, V depends totally on U.
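The classical coefficient γ(U,V) can be checked numerically with the following Python sketch. The rows of the information system are not reproduced in this text, so the table is partly hypothetical: the values of b and c follow Example 1.13, while the values of attribute a are assumptions chosen so that γ({a, b}, {c}) = 1.0 as stated above. The function names are illustrative as well.

```python
from collections import defaultdict

# Columns b and c follow Example 1.13; column a is an assumption.
INFO = {
    "o1": {"a": 1, "b": 4, "c": 6}, "o2": {"a": 2, "b": 4, "c": 7},
    "o3": {"a": 3, "b": 4, "c": 7}, "o4": {"a": 1, "b": 5, "c": 6},
    "o5": {"a": 2, "b": 5, "c": 6}, "o6": {"a": 3, "b": 5, "c": 7},
    "o7": {"a": 2, "b": 5, "c": 6}, "o8": {"a": 3, "b": 4, "c": 7},
}

def partition(info, attrs):
    """Equivalence classes of ind(attrs)."""
    classes = defaultdict(set)
    for obj, row in info.items():
        classes[tuple(row[a] for a in sorted(attrs))].add(obj)
    return list(classes.values())

def lower_approx(info, attrs, x):
    """Union of the equivalence classes of ind(attrs) fully contained in X."""
    return set().union(*(cls for cls in partition(info, attrs) if cls <= x))

def gamma(info, u, v):
    """Classical dependency coefficient of V on U."""
    covered = sum(len(lower_approx(info, u, x)) for x in partition(info, v))
    return covered / len(info)

gamma_ab_c = gamma(INFO, {"a", "b"}, {"c"})
gamma_b_c = gamma(INFO, {"b"}, {"c"})
print(gamma_ab_c, gamma_b_c)
```

With U = {b} alone no equivalence class is contained in a class of {c}, so the coefficient drops to 0, which is the situation that motivates the extension in the next subsection.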
1.8.4.2 Extending the attribute dependency coefficient [9]
This section presents the theoretical basis for defining and computing the extended attribute dependency coefficient.
Definition 1.29 Inclusion degree function
Given an inclusion threshold θ ∈ [0,1], let μ_c(S,T) be the function that reflects the degree to which S is included in T. The function μ_c(S,T) is defined as follows:
\[ \mu_c(S,T) = \frac{|S \cap T|}{|S|} \]
If μ_c(S,T) ≥ θ, the set S is said to be included in T with inclusion degree θ. If θ = 1.0, then S ⊂ T.
Definition 1.30 Extended lower approximation
Using the inclusion degree function, the extended approximation B_{**}(X) in rough set theory can be defined as follows:
\[ B_{**}(X) = \{ u \in O \mid \mu_c([u]_{ind(B)}, X) \ge \theta \wedge u \in X \} \qquad (1.13) \]
Definition 1.31 Extended attribute dependency coefficient
The extended dependency coefficient is defined through the inclusion degree function. Given two attribute sets U and V, the extended attribute dependency coefficient of V on U is denoted Ψ(U,V) and is defined as follows:
\[ \Psi(U,V) = \frac{\sum_{X \in O/V} |U_{**}(X)|}{|O|} \]
Example 1.13: Consider decision table 1.12 and let U = {b} and V = {c}.
With U = {b}, the equivalence classes are:
[o1]ind(U) = [o2]ind(U) = [o3]ind(U) = [o8]ind(U) = {o1, o2, o3, o8}
[o4]ind(U) = [o5]ind(U) = [o6]ind(U) = [o7]ind(U) = {o4, o5, o6, o7}
With V = {c}, the equivalence classes are:
[o1]ind(V) = [o4]ind(V) = [o5]ind(V) = [o7]ind(V) = {o1, o4, o5, o7}
[o2]ind(V) = [o3]ind(V) = [o6]ind(V) = [o8]ind(V) = {o2, o3, o6, o8}
Using the traditional attribute dependency coefficient:
\[ \gamma(U,V) = \frac{\sum_{X \in O/V} |U_*(X)|}{|O|} = 0 \]
In rough set theory, γ(U,V) = 0 means that V does not depend on U. Under the requirements of approximate classification, however, V can still be inferred from U, as the following two classification rules show:
<b,4> → <c,7>, precision of the classification = 0.75
<b,5> → <c,6>, precision of the classification = 0.75
Based on this observation, the thesis extends the notion of the lower approximation of a rough set in order to define the extended attribute dependency coefficient Ψ(U,V).
For the elementary sets of the partition O/V and the inclusion degree θ = 0.75:
With X1 = {o1, o4, o5, o7}, U_{**}(X1) = {o4, o5, o7}
Remark: When the inclusion threshold is θ = 1.0, Ψ(U,V) = γ(U,V).
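The extended lower approximation and Ψ(U,V) for this example can be sketched in Python as follows. The data contains only the values of b and c implied by the equivalence classes and rules of Example 1.13; the function names are assumptions made for illustration.

```python
from collections import defaultdict

# Values of b and c as implied by Example 1.13.
INFO = {
    "o1": {"b": 4, "c": 6}, "o2": {"b": 4, "c": 7},
    "o3": {"b": 4, "c": 7}, "o4": {"b": 5, "c": 6},
    "o5": {"b": 5, "c": 6}, "o6": {"b": 5, "c": 7},
    "o7": {"b": 5, "c": 6}, "o8": {"b": 4, "c": 7},
}

def partition(info, attrs):
    """Equivalence classes of ind(attrs)."""
    classes = defaultdict(set)
    for obj, row in info.items():
        classes[tuple(row[a] for a in sorted(attrs))].add(obj)
    return list(classes.values())

def inclusion(s, t):
    """Inclusion degree of S in T: |S & T| / |S|."""
    return len(s & t) / len(s)

def extended_lower(info, attrs, x, theta):
    """Objects of X whose equivalence class is included in X with degree >= theta."""
    result = set()
    for cls in partition(info, attrs):
        if inclusion(cls, x) >= theta:
            result |= cls & x
    return result

def psi(info, u, v, theta):
    """Extended dependency coefficient of V on U for threshold theta."""
    covered = sum(len(extended_lower(info, u, x, theta)) for x in partition(info, v))
    return covered / len(info)

x1 = {"o1", "o4", "o5", "o7"}
ext_x1 = extended_lower(INFO, {"b"}, x1, 0.75)
psi_075 = psi(INFO, {"b"}, {"c"}, 0.75)
psi_100 = psi(INFO, {"b"}, {"c"}, 1.0)
print(sorted(ext_x1), psi_075, psi_100)
```

With θ = 1.0 the inclusion test reduces to ordinary set inclusion, so the sketch returns the classical value γ(U,V) = 0 for this example, in line with the remark above.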
1.8.4.3 Converting a decision table in rough set theory into a binary decision table
The function attributes returns the names of the attributes that occur in a subset S of the indicators of D.
Property 1.6: With the pair of functions (ρ, λ) defined above, let U ⊂ A, let O/U be the partition of O under the indiscernibility relation ind(U), and let U1, U2, ..., Uk be the elementary sets of the partition O/U. Then ρ(λ(Uj)) = Uj for all j = 1, ..., k.
Example 1.14: Let U = {a, b} and consider the elementary set of the partition O/U corresponding to the equivalence class U5 = [o5]ind(U) = [o7]ind(U) = {o5, o7}, which is determined by <a,3> and <b,5>. Under the encoding above, the two corresponding indicators are d2 = <a,2> and d5 = <b,5>. Using the pair of functions ρ, λ defined above, we have:
λ({o5, o7}) = {d2, d5, c1}; ρ(λ({o5, o7})) = ρ({d2, d5, c1}) = {o5, o7} = U5
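A possible way to picture the conversion and Property 1.6 is the Python sketch below. It is an illustration under stated assumptions: the table is hypothetical (columns b and c follow Example 1.13, column a is invented), each indicator is represented as an (attribute, value) pair rather than by a name such as d2, and the function names `to_binary`, `lam` and `rho` are chosen here for the pair (λ, ρ).

```python
# Columns b and c follow Example 1.13; column a is an assumption.
INFO = {
    "o1": {"a": 1, "b": 4, "c": 6}, "o2": {"a": 2, "b": 4, "c": 7},
    "o3": {"a": 3, "b": 4, "c": 7}, "o4": {"a": 1, "b": 5, "c": 6},
    "o5": {"a": 2, "b": 5, "c": 6}, "o6": {"a": 3, "b": 5, "c": 7},
    "o7": {"a": 2, "b": 5, "c": 6}, "o8": {"a": 3, "b": 4, "c": 7},
}

def to_binary(info):
    """Turn every <attribute, value> pair into one binary indicator."""
    return {obj: set(row.items()) for obj, row in info.items()}

def lam(binary, objects):
    """lambda: indicators shared by all the given objects (objects must be non-empty)."""
    return set.intersection(*(binary[o] for o in objects))

def rho(binary, indicators):
    """rho: objects that contain all the given indicators."""
    return {o for o, row in binary.items() if set(indicators) <= row}

BINARY = to_binary(INFO)
u5 = {"o5", "o7"}
shared = lam(BINARY, u5)
closed = rho(BINARY, shared)
print(sorted(shared), sorted(closed))
```

In this hypothetical table the class {o5, o7} shares exactly one indicator per attribute, and applying ρ to those indicators returns the class itself, as Property 1.6 states.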
1.8.4.4 Computing the extended attribute dependency coefficient from the confidence and support of association rules
Lemma 1.1: Let S ⊂ D and T ⊂ D. The degree to which ρ(S) is included in ρ(T) is given by:
\[ \mu_c(\rho(S), \rho(T)) = \frac{|\rho(S) \cap \rho(T)|}{|\rho(S)|} = CF(S \to T) \qquad (1.16) \]
Theorem 1.7 ([9]). Let (O, A = HR ∪ CR, fs) be a decision table and (O, D = H ∪ C, R) the corresponding converted binary decision table. Let U and V be two subsets of A, let Uj be the elementary sets of the partition O/U and X an elementary set of the partition O/V, and let J be the set of indices such that μ_c(Uj, X) ≥ θ for all j ∈ J. Then:
\[ \Psi(U,V) = \sum_{X \in O/V} \sum_{j \in J} \bigl( CF(\lambda(U_j) \to \lambda(X)) \cdot SP(\lambda(U_j)) \bigr) \qquad (1.17) \]
Here D is the set of indicators of the binary decision table (O, D, R) converted from the decision table (O, A, fs).
Proof: Let J be the set of indices such that μ_c(Uj, X) ≥ θ for all j ∈ J, where Uj is an elementary set of the partition O/U. Then |U_{**}(X)| can be computed as:
\[ |U_{**}(X)| = \sum_{j \in J} |U_j \cap X| \]
Since λ(Uj) ⊂ D and λ(X) ⊂ D, the support and confidence of the association rule λ(Uj) → λ(X) are already available, so CF(λ(Uj) → λ(X)) = |ρ(λ(Uj)) ∩ ρ(λ(X))| / |ρ(λ(Uj))|. By Property 1.6, because Uj and X are elementary sets of partitions, ρ(λ(Uj)) = Uj and ρ(λ(X)) = X, and therefore |ρ(λ(Uj)) ∩ ρ(λ(X))| = |Uj ∩ X| = CF(λ(Uj) → λ(X)) * |Uj|. Moreover, the support of the set λ(Uj) is SP(λ(Uj)) = |ρ(λ(Uj))| / |O| = |Uj| / |O|, so |Uj| = SP(λ(Uj)) * |O|. In summary: |Uj ∩ X| = CF(λ(Uj) → λ(X)) * SP(λ(Uj)) * |O|.
If λ(Uj) is a frequent itemset and λ(Uj) → λ(X) is an association rule, the extended attribute dependency coefficient can be computed as follows:
\[ \Psi(U,V) = \sum_{X \in O/V} \sum_{j \in J} CF(\lambda(U_j) \to \lambda(X)) \cdot SP(\lambda(U_j)) \]
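The identity can be checked on the data of Example 1.13 with a short Python sketch. The binary table below encodes only the values of b and c implied by that example, with each indicator written as an (attribute, value) pair; the function names are assumptions, and by Lemma 1.1 the index set J is selected by testing CF ≥ θ.

```python
# Binary form of the b and c columns implied by Example 1.13.
BINARY = {
    "o1": {("b", 4), ("c", 6)}, "o2": {("b", 4), ("c", 7)},
    "o3": {("b", 4), ("c", 7)}, "o4": {("b", 5), ("c", 6)},
    "o5": {("b", 5), ("c", 6)}, "o6": {("b", 5), ("c", 7)},
    "o7": {("b", 5), ("c", 6)}, "o8": {("b", 4), ("c", 7)},
}

def rho(binary, indicators):
    """Objects that contain all the given indicators."""
    return {o for o, row in binary.items() if set(indicators) <= row}

def sp(binary, s):
    """Support of the indicator set S."""
    return len(rho(binary, s)) / len(binary)

def cf(binary, s, t):
    """Confidence of the rule S -> T."""
    return len(rho(binary, s) & rho(binary, t)) / len(rho(binary, s))

def psi_from_rules(binary, u_sets, x_sets, theta):
    """Sum CF * SP over the rules lambda(Uj) -> lambda(X) with CF >= theta."""
    total = 0.0
    for x in x_sets:
        for uj in u_sets:
            confidence = cf(binary, uj, x)
            if confidence >= theta:   # by Lemma 1.1 this is mu_c(Uj, X) >= theta
                total += confidence * sp(binary, uj)
    return total

u_sets = [{("b", 4)}, {("b", 5)}]   # indicator sets of the classes of O/U
x_sets = [{("c", 6)}, {("c", 7)}]   # indicator sets of the classes of O/V
psi_value = psi_from_rules(BINARY, u_sets, x_sets, 0.75)
print(psi_value)
```

Only the rules <b,4> → <c,7> and <b,5> → <c,6> pass the threshold, each contributing 0.75 * 0.5, which matches the value obtained from the set-based definition.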
1.8.4.5 Building an algorithm based on the extended attribute dependency coefficient
Given a decision table (O, A = HR ∪ CR, fs) and a classification precision threshold minprecision ∈ [0,1], find the classification rules S → T with S ⊂ HR and T ⊂ CR such that the precision of the classification rule S → T is greater than or equal to minprecision.
Let (O, D = H ∪ C, R) be the binary decision table converted from the decision table (O, A = HR ∪ CR, fs), and let the thresholds minsupp, minconf, and minprecision be given. Let FS(O, D = H ∪ C, R, minsupp) be the set of frequent itemsets of (O, D = H ∪ C, R), and let R(O, D = H ∪ C, R, minsupp, minconf) be the set of association rules that have the form of a classification rule S → T with S ⊂ H and T ⊂ C.
Algorithm 1.11 below uses the extended attribute dependency coefficient to find data classification rules.
Algorithm 1.11: Finding classification rules based on the extended dependency coefficient
Input: Decision table (O, A = HR ∪ CR, fs)
Thresholds minsupp, minconf, minprecision
Output: The set of classification rules S → T such that S ⊂ H, T ⊂ C, D = H ∪ C, with classification threshold minprecision.
Step 1: Convert the decision table (O, A = HR ∪ CR, fs) into the binary decision table (O, D = H ∪ C, R).
Step 2: Compute FS(O, D = H ∪ C, R, minsupp) and R(O, D = H ∪ C, R, minsupp, minconf) using the algorithms for finding frequent itemsets and association rules.
Step 3: Partition the set R(O, D = H ∪ C, R, minsupp, minconf) into groups of classification rules S → T that have the same attributes in S and the same attributes in T. Let G = {G1, G2, ..., Gk} be the rule groups obtained after this partitioning.
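Step 3 can be sketched in Python as a grouping of rules by the attribute names on each side. This is an illustration only: indicators are represented as (attribute, value) pairs, the helper `attributes` mirrors the function of the same name mentioned in section 1.8.4.3, `group_rules` is a hypothetical name, and the sample rules are limited to ones that appear in this chapter's examples.

```python
from collections import defaultdict

def attributes(indicators):
    """Names of the attributes that occur in a set of indicators."""
    return frozenset(attr for attr, _value in indicators)

def group_rules(rules):
    """Group rules S -> T whose S share the same attributes and whose T do too."""
    groups = defaultdict(list)
    for s, t in rules:
        groups[(attributes(s), attributes(t))].append((s, t))
    return dict(groups)

# Sample classification rules written with <attribute, value> indicators.
RULES = [
    (frozenset({("b", 4)}), frozenset({("c", 7)})),
    (frozenset({("b", 5)}), frozenset({("c", 6)})),
    (frozenset({("a", 1), ("b", 4)}), frozenset({("c", 6)})),
]

GROUPS = group_rules(RULES)
for (s_attrs, t_attrs), members in GROUPS.items():
    print(sorted(s_attrs), "->", sorted(t_attrs), ":", len(members), "rule(s)")
```

Each resulting group corresponds to one pair of attribute sets (U, V), which is the unit over which an extended dependency coefficient Ψ(U,V) is evaluated in the worked example.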
Example illustrating Algorithm 1.11
For the binary decision table of Table 1.12, let the minimum support threshold be minsupp = 0.1, the minimum confidence threshold minconf = 0.75, and the minimum precision threshold minprecision = 0.75. Applying the algorithms for deriving classification rules from association rules yields the following classification rules:
Ψ({b},{c}) = CF(r3)*SP({d4}) + CF(r4)*SP({d5}) = 0.75*0.5 + 0.75*0.5 = 0.75
Group G3:
Association rule {d1, d4} → {c1}
This chapter develops efficient algorithms for finding frequent itemsets and association rules in a database by reducing the computational complexity and the number of database accesses. Two kinds of algorithms are developed: non-incremental algorithms and incremental algorithms.
For the non-incremental algorithms, a vector model representing itemsets and closures is proposed. It represents the database as a binary context held in main memory and reduces the number of candidate sets whose support must be computed, which improves the performance of the algorithm.
For the incremental algorithms, R. Godin's algorithm for building a concept lattice is adapted to find frequent itemsets from the formal concepts in the lattice. Besides being incremental, the lattice-based algorithm has the advantage that a single pass over the database is enough to build the concept lattice.
Next come the studies that extend traditional association rules to negative association rules and fuzzy association rules.
Finally, the chapter presents the studies that use association rules as data classification rules and that construct the extended attribute dependency coefficient in rough set theory, in order to improve the ability to examine the degree of dependency between attribute sets in approximate data classification problems.