
Applying SVM to classification problems in pattern recognition.


Document information: 8 pages, 234.55 KB.


Proceedings of ICT.rda'06, Hanoi, May 20-21, 2006

APPLYING SVM TO CLASSIFICATION PROBLEMS IN PATTERN RECOGNITION

Phạm Anh Phương, Ngô Quốc Tạo, Lương Chi Mai

Abstract: In this paper, we present an approach to the pattern classification problem based on the SVM model. This machine learning method was studied by Vapnik from the late 1970s and is now widely applied in the field of pattern recognition. Experimental results show that SVM gives fairly accurate classification and is regarded as an effective method, comparable to other machine learning methods such as neural networks and HMMs.

Keywords: Support Vector Machines, margin, feature space, kernel function

1. INTRODUCTION

Support Vector Machines (SVM) have been studied since the 1960s, beginning with the work of Vapnik and Lerner (1963) and Vapnik and Chervonenkis (1964). SVM is founded on statistical learning theory and the theory of the VC (Vapnik-Chervonenkis) dimension, developed over the following decades by Vapnik and Chervonenkis (1974) and Vapnik (1982, 1995) [1,2,3]. Only in recent years, however, has the theory developed rapidly (Burges, 1996 [4]; Osuna, 1997 [6,7]; Platt, 1998 [9]), becoming a powerful tool in many applications such as handwriting recognition (Joachims, 1999 [13]; Nguyễn Đức Dũng, 2005 [25]) and face recognition (Osuna, 1997 [7]).

Unlike conventional linear learning machines, the main idea of this method is to find a separating hyperplane that maximizes the distance (margin) between the two sets. To solve this problem, we need several concepts: margin, soft classifier, support vector, feature space, and kernel function. In this paper, we focus only on the binary classification problem and its experimental implementation, and then propose a direction for improvement that increases training speed.

2. SVM FOR THE BINARY CLASSIFICATION PROBLEM

2.1 Linear SVM

Given N samples $\{(x_1, y_1), \ldots, (x_N, y_N)\}$ with $x_i \in \mathbb{R}^n$ and $y_i \in \{\pm 1\}$, we look for a separating hyperplane $f(x) = \mathrm{sgn}(w \cdot x + b)$ that splits the sample set into two classes so that the margin between the two classes is maximal.
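That is, we want a hyperplane $H: w \cdot x + b = 0$ and two parallel hyperplanes at equal distance from it,

$$H_1: w \cdot x + b = +1, \qquad H_2: w \cdot x + b = -1,$$

such that no point lies between $H_1$ and $H_2$ and the distance between $H_1$ and $H_2$ is maximal (Figure 2.1).

Figure 2.1. Linear separating hyperplane; the "support vectors" are circled.

For any separating plane H and the corresponding planes $H_1$, $H_2$, we can always rescale w so that $H_1$ is $w \cdot x + b = +1$ and $H_2$ is $w \cdot x + b = -1$ (for a proof, see [8]).

To maximize the distance between $H_1$ and $H_2$, we rely on the samples lying on $H_1$ and $H_2$. These samples are called support vectors, because only they take part in determining the separating hyperplane; the other samples can be ignored. The distance from a point on $H_1$ to H is $|w \cdot x + b| / \|w\| = 1/\|w\|$, so the distance between $H_1$ and $H_2$ is $\delta = 2/\|w\|$. Therefore, to maximize δ we minimize $\|w\|$ subject to $y_i(w \cdot x_i + b) \ge 1$, and the problem can be restated as:

$$\min_{w,b}\ \tfrac{1}{2}\langle w, w \rangle \quad \text{subject to} \quad y_i(w \cdot x_i + b) \ge 1,\quad i = 1, \ldots, N. \qquad (*)$$

2.2 The dual problem

To solve problem (*), we consider the Lagrange function

$$L(w, b, \alpha) = f(w, b) + \sum_{i=1}^{N} \alpha_i\, g_i(w, b), \qquad (1)$$

where $f(w, b) = \tfrac{1}{2}\langle w, w \rangle$ and $g_i(w, b) = 1 - y_i(\langle w, x_i \rangle + b)$. Next, consider the function

$$\theta(\alpha) = \min_{w,b} L(w, b, \alpha) = \min_{w,b}\Big[\tfrac{1}{2}\langle w, w \rangle + \sum_{i=1}^{N} \alpha_i\big(1 - y_i(\langle w, x_i \rangle + b)\big)\Big]. \qquad (2)$$

Taking the derivatives with respect to w and b and setting them to zero gives

$$\frac{\partial L}{\partial w} = w - \sum_{i=1}^{N} \alpha_i y_i x_i = 0 \;\Rightarrow\; w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad (3)$$

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{N} \alpha_i y_i = 0. \qquad (4)$$

Substituting (3) and (4) into (2), we obtain

$$\theta(\alpha) = \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \langle x_i, x_j \rangle. \qquad (5)$$

Since $\theta(\alpha) \le L(w, b, \alpha) \le f(w, b)$ for every feasible (w, b) and every $\alpha \ge 0$, θ(α) is a lower bound on the objective of (*). Instead of solving problem (*), we therefore solve the dual problem: maximize θ(α) over α subject to $\alpha_i \ge 0$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$.

There is one Lagrange multiplier $\alpha_i$ for each training sample. After training, the samples with $\alpha_i > 0$ are called "support vectors"; all other training samples have $\alpha_i = 0$ and lie on the outer sides of the planes $H_1$ and $H_2$.

The solution gives the weight vector w through (3), together with the threshold b. A new object x is classified by the decision function

$$f(x) = \mathrm{sgn}(w \cdot x + b) = \mathrm{sgn}\Big(\sum_{i=1}^{N} \alpha_i y_i \langle x_i, x \rangle + b\Big). \qquad (6)$$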
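As a small worked check of (3), (5) and (6), the sketch below uses a hypothetical two-point training set, $x_1 = (1, 1)$ with $y_1 = +1$ and $x_2 = (-1, -1)$ with $y_2 = -1$, whose hard-margin optimum can be found by hand: $\alpha_1 = \alpha_2 = 0.25$ and $b = 0$. The helper names are illustrative only, and mapping sgn(0) to +1 is an assumption of this sketch.

```python
import math


def dot(u, v):
    """Inner product <u, v>."""
    return sum(p * q for p, q in zip(u, v))


def weight_vector(alphas, ys, xs):
    """Equation (3): w = sum_i alpha_i * y_i * x_i."""
    w = [0.0] * len(xs[0])
    for a, y, x in zip(alphas, ys, xs):
        for k, xk in enumerate(x):
            w[k] += a * y * xk
    return w


def dual_objective(alphas, ys, xs):
    """Equation (5): sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j <x_i, x_j>."""
    quad = sum(
        ai * aj * yi * yj * dot(xi, xj)
        for ai, yi, xi in zip(alphas, ys, xs)
        for aj, yj, xj in zip(alphas, ys, xs)
    )
    return sum(alphas) - 0.5 * quad


def decide(alphas, ys, xs, b, x):
    """Equation (6), summing only over support vectors (alpha_i > 0); sgn(0) -> +1 here."""
    s = sum(a * y * dot(xi, x) for a, y, xi in zip(alphas, ys, xs) if a > 0) + b
    return 1 if s >= 0 else -1


# Hypothetical toy data whose optimum is known by hand.
xs = [(1.0, 1.0), (-1.0, -1.0)]
ys = [1, -1]
alphas = [0.25, 0.25]
b = 0.0

w = weight_vector(alphas, ys, xs)
margin = 2.0 / math.sqrt(dot(w, w))  # delta = 2 / ||w||
print(w, round(margin, 4))  # [0.5, 0.5] 2.8284
print(dual_objective(alphas, ys, xs), 0.5 * dot(w, w))  # 0.25 0.25
print(decide(alphas, ys, xs, b, (2.0, 0.5)))  # 1
```

The equal printed values of θ(α) and $\tfrac{1}{2}\langle w, w \rangle$ show the dual bound being attained at the optimum, and the margin $2/\|w\| = 2\sqrt{2}$ equals the distance between the two points, as expected when both are support vectors.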
Before going into the details of solving this convex quadratic programming problem, we extend it in two directions: nonlinear SVM and the soft margin.

2.3 Nonlinear SVM

If the separating surface is not linear, we can map the data points into another space of higher dimension in which they become linearly separable. Let the mapping be Φ(·); in the new, higher-dimensional space,

$$\theta(\alpha) = \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j\, \Phi(x_i) \cdot \Phi(x_j). \qquad (7)$$

Suppose $\Phi(x_i) \cdot \Phi(x_j) = K(x_i, x_j)$; that is, the inner product in the new space equals a kernel function evaluated in the input space. We therefore never need to compute $\Phi(x_i) \cdot \Phi(x_j)$ directly, only indirectly through the kernel $K(x_i, x_j)$. Finally, θ(α) becomes

$$\theta(\alpha) = \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j). \qquad (8)$$

Mercer's theorem: a function K(x, y) can be used as a kernel if:
- K is symmetric: K(x, y) = K(y, x) for all $x, y \in \mathbb{R}^n$;
- there exist a mapping Φ and an expansion $K(x, y) = \sum_i \Phi_i(x)\,\Phi_i(y)$, which holds if and only if $\iint K(x, y)\, g(x)\, g(y)\, dx\, dy \ge 0$ for every function g with finite $\int g(x)^2\, dx$.

Commonly used kernels include:
+ Polynomial kernel: $K(x, y) = (x \cdot y + c)^d$, with $d \in \mathbb{N}$, $c \in \mathbb{R}^+$.
+ Gaussian kernel (Gaussian RBF, radial basis function): $K(x, y) = e^{-\|x - y\|^2 / (2\sigma^2)}$. (10)
+ Sigmoid kernel: $K(x, y) = \tanh(\kappa\, x \cdot y - \delta)$, with $\kappa, \delta \in \mathbb{R}^+$.
+ Linear kernel: $K(x, y) = x \cdot y$.
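The kernels listed above can be written directly as functions. The sketch below is plain Python with default parameter values chosen only for illustration. It checks numerically that the degree-2 polynomial kernel with c = 0 equals an ordinary inner product under the explicit map $\Phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ on $\mathbb{R}^2$, which is the substitution from (7) to (8) in miniature. It also shows that with the assumed values κ = δ = 1 the sigmoid kernel produces a negative diagonal Gram entry, so it does not satisfy Mercer's condition for every choice of κ and δ.

```python
import math


def dot(u, v):
    """Inner product x . y."""
    return sum(p * q for p, q in zip(u, v))


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return dot(x, y)


def polynomial_kernel(x, y, d=2, c=1.0):
    """K(x, y) = (x . y + c)^d, with d a natural number and c >= 0."""
    return (dot(x, y) + c) ** d


def rbf_kernel(x, y, sigma=1.0):
    """Gaussian RBF: K(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def sigmoid_kernel(x, y, kappa=1.0, delta=1.0):
    """K(x, y) = tanh(kappa * x . y - delta)."""
    return math.tanh(kappa * dot(x, y) - delta)


def gram_matrix(kernel, xs):
    """G[i][j] = K(x_i, x_j): an N x N table, i.e. N^2 stored values."""
    return [[kernel(xi, xj) for xj in xs] for xi in xs]


def is_symmetric(g, tol=1e-12):
    """First Mercer requirement, checked on a finite Gram matrix."""
    n = len(g)
    return all(abs(g[i][j] - g[j][i]) <= tol for i in range(n) for j in range(n))


def phi_quadratic(x):
    """Explicit feature map for (x . y)^2 on R^2: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


x, y = (1.0, 2.0), (3.0, -1.0)
# Kernel trick: the same value with and without computing Phi explicitly
# (equal up to floating-point rounding).
print(polynomial_kernel(x, y, d=2, c=0.0), dot(phi_quadratic(x), phi_quadratic(y)))
# A valid kernel must give a symmetric Gram matrix.
print(is_symmetric(gram_matrix(rbf_kernel, [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)])))
# Here K(x, x) < 0, so the Gram matrix has a negative diagonal entry and
# therefore cannot be positive semidefinite.
print(sigmoid_kernel((0.5, 0.5), (0.5, 0.5)))
```

The explicit map grows quickly with the input dimension and the polynomial degree, while the kernel call stays a single inner product; that gap is the practical reason for working with (8) rather than (7).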
2.4 Soft margin

Figure 2.2. Soft-margin classification.

Another way to extend SVM for classification is to allow noise. That is, we do not strictly force the data to lie on the two sides of $H_1$ and $H_2$, but we want to keep the number of data points falling between $H_1$ and $H_2$ as small as possible. We relax the classification constraints by adding slack variables $\xi_i \ge 0$:

$$w \cdot x_i + b \ge +1 - \xi_i \ \text{ for } y_i = +1, \qquad w \cdot x_i + b \le -1 + \xi_i \ \text{ for } y_i = -1,$$

and add a penalizing term to the objective function:

$$\min_{w,b,\xi}\ \tfrac{1}{2}\langle w, w \rangle + C\Big(\sum_{i=1}^{N} \xi_i\Big)^m, \qquad (9)$$

where m is usually chosen to be 1, so that problem (*) becomes

$$\min_{w,b,\xi}\ \tfrac{1}{2}\langle w, w \rangle + C\sum_{i=1}^{N} \xi_i \quad \text{subject to} \quad y_i(w \cdot x_i + b) + \xi_i \ge 1,\quad \xi_i \ge 0. \qquad (**)$$

With Lagrange multipliers $\alpha_i \ge 0$ and $\mu_i \ge 0$, the Lagrange function is

$$L(w, b, \xi; \alpha, \mu) = \tfrac{1}{2}\langle w, w \rangle + \sum_{i=1}^{N}(C - \alpha_i - \mu_i)\,\xi_i - \Big(\sum_{i=1}^{N} \alpha_i y_i x_i\Big) \cdot w - \Big(\sum_{i=1}^{N} \alpha_i y_i\Big)\, b + \sum_{i=1}^{N} \alpha_i.$$

Setting $\partial L / \partial \xi_i = 0$ gives $C - \alpha_i - \mu_i = 0$, which together with $\mu_i \ge 0$ yields the upper bound $\alpha_i \le C$; as a result, neither $\xi_i$ nor $\mu_i$ appears in the dual problem:

$$\max_{\alpha}\ \theta(\alpha) = \sum_{i=1}^{N} \alpha_i - \tfrac{1}{2}\sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \qquad (11)$$

subject to $0 \le \alpha_i \le C$ and $\sum_{i=1}^{N} \alpha_i y_i = 0$.

To train an SVM, we need to find the $\alpha_i$ in the feasible region of the dual problem that maximize the objective function. Whether a solution is optimal can be checked using the KKT conditions.

2.5 KKT (Karush-Kuhn-Tucker) conditions

The KKT optimality conditions of problem (**) are:
• The derivatives of $L(w, b, \xi; \alpha, \mu)$ with respect to the variables w, b and ξ must vanish.
• For $1 \le i \le N$:

$$\alpha_i\big(y_i(w \cdot x_i + b) + \xi_i - 1\big) = 0, \qquad (12)$$

$$\mu_i\, \xi_i = 0. \qquad (13)$$
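Combining (12) and (13) with $C - \alpha_i - \mu_i = 0$ gives a per-sample form of the KKT conditions, with $f(x_i) = \sum_j \alpha_j y_j K(x_j, x_i) + b$: if $\alpha_i = 0$ then $\mu_i = C > 0$, so $\xi_i = 0$ and $y_i f(x_i) \ge 1$; if $0 < \alpha_i < C$ then $\xi_i = 0$ and $y_i f(x_i) = 1$; if $\alpha_i = C$ then $y_i f(x_i) = 1 - \xi_i \le 1$. The sketch below tests those three cases with hypothetical tolerances `tol` and `eps`; a chunking-style method would apply the same test to the samples outside its current chunk. The data and multipliers are made up for illustration: the multipliers are optimal for the first two samples alone, and the fourth sample lies inside the margin.

```python
def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(p * q for p, q in zip(u, v))


def decision_values(alphas, ys, xs, b, kernel=linear_kernel):
    """u_k = sum_j alpha_j y_j K(x_j, x_k) + b for every training sample x_k."""
    return [
        sum(a * y * kernel(xj, xk) for a, y, xj in zip(alphas, ys, xs)) + b
        for xk in xs
    ]


def kkt_violations(alphas, ys, outputs, C, tol=1e-3, eps=1e-8):
    """Indices whose (alpha_i, y_i u_i) pair breaks the soft-margin KKT conditions."""
    bad = []
    for i, (a, y, u) in enumerate(zip(alphas, ys, outputs)):
        r = y * u  # y_i f(x_i)
        if a <= eps:  # alpha_i = 0: need y_i f(x_i) >= 1
            ok = r >= 1.0 - tol
        elif a >= C - eps:  # alpha_i = C: need y_i f(x_i) <= 1
            ok = r <= 1.0 + tol
        else:  # 0 < alpha_i < C: need y_i f(x_i) == 1
            ok = abs(r - 1.0) <= tol
        if not ok:
            bad.append(i)
    return bad


# Hypothetical data and multipliers (see lead-in).
xs = [(1.0, 1.0), (-1.0, -1.0), (3.0, 3.0), (0.2, 0.2)]
ys = [1, -1, 1, 1]
alphas = [0.25, 0.25, 0.0, 0.0]
b = 0.0
outputs = decision_values(alphas, ys, xs, b)
print(kkt_violations(alphas, ys, outputs, C=10.0))  # [3]
```

Only sample 3 is reported: its decision value is 0.2, so $y_3 f(x_3) < 1$ although $\alpha_3 = 0$, and the current α is no longer optimal once that sample is included.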
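3. OPTIMIZATION METHODS FOR SVM

In the previous section, we saw that training an SVM is equivalent to solving a convex quadratic programming problem subject to linear constraints. Max/min problems of multivariate functions have been studied extensively, and most standard techniques can be applied directly to SVM training. However, storing the Gram matrix requires memory proportional to the square of the training set size, so it easily exceeds the storage capacity of the computer when the training set is large. Among the common algorithms designed for SVM training, we briefly describe the methods used in most SVM applications: chunking, decomposition [6] and sequential minimal optimization (SMO) [9].

3.1 The chunking algorithm

The chunking method starts with an arbitrary subset (chunk) of the training data and trains an SVM to optimality on that chunk. The algorithm then keeps the support vectors of the chunk (the samples with $\alpha_i > 0$), discards the other elements (those with $\alpha_i = 0$), and uses the support vectors to test the remaining elements of the data set. Elements that violate the KKT conditions are added to the set of support vectors to form a new chunk. The process is repeated: the α values of each new problem are initialized from the output of the previous step, and optimization continues on the new problem with the selected optimal parameters. The algorithm stops when the optimality conditions are satisfied. The chunk of data under consideration at any moment is usually called the working set. The size of the working set changes over time, but in the end it equals the number of nonzero $\alpha_i$ (the number of support vectors). The method assumes that the Gram matrix storing the inner products of every pair of support vectors fits in memory (the Gram matrix can be recomputed whenever needed, but this slows down training). In practice, the number of support vectors can become so large that the kernel matrix exceeds the storage capacity of the computer.

3.2 The decomposition algorithm

The decomposition method overcomes the difficulty of chunking by fixing the size of the problem (the size of the Gram matrix). Thus, whenever a new element is added to the working set, another element is removed from it. This allows SVM to be trained on large data sets. However, experiments show that this method converges very slowly. In practice, several samples can be added to and removed from the problem at each step to speed up convergence. The algorithm is summarized as follows:

Input:
- A set S of N training samples $\{(x_i, y_i)\}_{i=1}^{N}$;
- The working set size M.
Output: the set $\{\alpha_i\}_{i=1}^{N}$.
Initialization:
• Set all $\alpha_i = 0$;
• Choose a working set B of size M.
Finding the optimal solution:
Repeat
• Solve the local optimization problem on B;
• Update B;
Until …

3.3 The SMO algorithm

This algorithm is a special case of the decomposition algorithm: at each iteration, it solves the convex quadratic programming problem with a working set of size 2. Its advantage is that this small optimization problem can be solved analytically. The SMO algorithm performs two main tasks: solving the optimization problem for two Lagrange multipliers analytically, and applying a heuristic to choose the two multipliers to optimize.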
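Since SMO is the decomposition loop of Section 3.2 with a working set of size M = 2, the two can be illustrated together. The following is a simplified SMO sketch in plain Python, assuming a linear kernel and a training set small enough that its whole Gram matrix fits in memory. The partner-selection rule (try j in order of decreasing $|E_i - E_j|$, where $E_k = f(x_k) - y_k$) is a simplification of Platt's heuristics [9], not the authors' implementation, and all function names are hypothetical. The pair update is the standard analytic step: $\alpha_j$ moves by $y_j(E_i - E_j)/\eta$ with $\eta = K_{ii} + K_{jj} - 2K_{ij}$, is clipped to the segment [L, H] allowed by $0 \le \alpha \le C$, and $\alpha_i$ follows so that $\sum_k \alpha_k y_k$ stays unchanged.

```python
def linear_kernel(u, v):
    """K(u, v) = u . v"""
    return sum(p * q for p, q in zip(u, v))


def pair_update(i, j, alphas, b, ys, K, C, Ei, Ej):
    """Analytic optimum of the dual (11) restricted to (alpha_i, alpha_j).

    Returns (new alpha_i, new alpha_j, new b), or None if this pair cannot make progress.
    """
    ai, aj, yi, yj = alphas[i], alphas[j], ys[i], ys[j]
    # Feasible segment [lo, hi] for alpha_j implied by the box and equality constraints.
    if yi != yj:
        lo, hi = max(0.0, aj - ai), min(C, C + aj - ai)
    else:
        lo, hi = max(0.0, ai + aj - C), min(C, ai + aj)
    eta = K[i][i] + K[j][j] - 2.0 * K[i][j]  # curvature along that segment
    if lo >= hi or eta <= 0.0:
        return None  # simplification: skip degenerate pairs instead of handling them
    aj_new = min(hi, max(lo, aj + yj * (Ei - Ej) / eta))  # unconstrained step, then clip
    if abs(aj_new - aj) < 1e-12:
        return None
    ai_new = ai + yi * yj * (aj - aj_new)  # keeps sum_k alpha_k y_k unchanged
    dai, daj = ai_new - ai, aj_new - aj
    # b1 makes f(x_i) = y_i and b2 makes f(x_j) = y_j after the update.
    b1 = b - Ei - yi * dai * K[i][i] - yj * daj * K[i][j]
    b2 = b - Ej - yi * dai * K[i][j] - yj * daj * K[j][j]
    if 0.0 < ai_new < C:
        b_new = b1
    elif 0.0 < aj_new < C:
        b_new = b2
    else:
        b_new = 0.5 * (b1 + b2)
    return ai_new, aj_new, b_new


def smo_train(xs, ys, C=1.0, kernel=linear_kernel, tol=1e-3, max_passes=5, max_rounds=1000):
    """Decomposition with a working set of size 2 (simplified SMO)."""
    n = len(xs)
    K = [[kernel(p, q) for q in xs] for p in xs]  # whole Gram matrix: small N only
    alphas, b = [0.0] * n, 0.0  # initialization: all alpha_i = 0
    passes = rounds = 0
    while passes < max_passes and rounds < max_rounds:
        rounds += 1
        changed = 0
        for i in range(n):
            # E_k = f(x_k) - y_k for every sample under the current (alphas, b).
            errs = [sum(alphas[t] * ys[t] * K[t][k] for t in range(n)) + b - ys[k] for k in range(n)]
            Ei = errs[i]
            if not ((ys[i] * Ei < -tol and alphas[i] < C) or (ys[i] * Ei > tol and alphas[i] > 0)):
                continue  # alpha_i already satisfies the KKT conditions within tol
            # Working set B = {i, j}: try partners in order of decreasing |E_i - E_j|.
            for j in sorted((k for k in range(n) if k != i), key=lambda k: -abs(Ei - errs[k])):
                step = pair_update(i, j, alphas, b, ys, K, C, Ei, errs[j])
                if step is not None:
                    alphas[i], alphas[j], b = step
                    changed += 1
                    break
        passes = passes + 1 if changed == 0 else 0  # stop after several quiet passes
    return alphas, b


def predict(x, xs, ys, alphas, b, kernel=linear_kernel):
    """Decision function (6) with a kernel; sgn(0) -> +1 by convention here."""
    s = sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, ys, xs) if a > 0) + b
    return 1 if s >= 0 else -1


xs = [(1.0, 0.0), (2.0, 0.0), (-1.0, 0.0), (-2.0, 0.0)]
ys = [1, 1, -1, -1]
alphas, b = smo_train(xs, ys, C=10.0)
print(alphas, b)  # [0.5, 0.0, 0.5, 0.0] 0.0
print([predict(x, xs, ys, alphas, b) for x in xs])  # [1, 1, -1, -1]
```

On this toy set the sketch recovers α = (0.5, 0, 0.5, 0) and b = 0, i.e. w = (1, 0) by (3), with only the two samples closest to the boundary as support vectors. Caching the full Gram matrix as done here runs into exactly the N² storage problem discussed in Section 3, so it is only suitable for small illustrations.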
4. EXPERIMENTAL RESULTS

We carried out experiments with a training set of 100 planar samples (Figure 4.1) and a test set of 71 samples.

Posted: 08/12/2022, 21:21
