Tạp chí Tin học và Điều khiển học, Vol. 25, No. 1 (2009), 88-97

COMBINING SVM CLASSIFIERS FOR ISOLATED VIETNAMESE HANDWRITTEN CHARACTER RECOGNITION

PHẠM ANH PHƯƠNG¹, NGÔ QUỐC TẠO², LƯƠNG CHI MAI²

¹ Faculty of Information Technology, Hue University of Sciences
² Institute of Information Technology, Vietnamese Academy of Science and Technology

Abstract. This paper studies several feature types that can be applied to Vietnamese handwritten character recognition. Based on SVM classification and Haar wavelet features, we propose a new model for Vietnamese handwritten character recognition. Our test results on Vietnamese handwriting with 50,000 character samples show that the model achieves relatively high accuracy.

Tóm tắt (Summary). The paper investigates several feature types applicable to the recognition of isolated handwritten Vietnamese characters. From this it proposes a recognition model for isolated handwritten Vietnamese characters based on the support vector method combined with Haar wavelet feature selection. Experimental results on self-collected Vietnamese handwriting data sets with 50,000 samples show that the proposed model achieves relatively high accuracy.

INTRODUCTION

Handwriting recognition remains a major challenge for researchers. To date, no comprehensive solution to the handwriting recognition problem exists. Most results concentrate on standard handwritten digit data sets such as USPS and MNIST [4, 5, 6]. There is also work on the Latin, Greek, and Chinese writing systems, but those results are likewise limited to a narrow scope [2, 7]. Recognizing handwritten Vietnamese is harder still, because the Vietnamese character set contains many characters with very similar shapes that differ only slightly in their diacritic marks. As a result, very few studies address Vietnamese handwriting recognition.

The problem we pose here is to build a recognition model for isolated hand-printed Vietnamese characters. The Vietnamese character set comprises the unaccented characters
A, B, C, D, Đ, E, G, H, I, K, L, M, N, O, P, Q, R, S, T, U, V, X, Y and the accented characters Ă, Â, À, Ả, Ã, Á, Ạ, Ằ, Ẳ, Ẵ, Ắ, Ặ, Ầ, Ẩ, Ẫ, Ấ, Ậ, Ê, È, Ẻ, Ẽ, É, Ẹ, Ề, Ể, Ễ, Ế, Ệ, Ì, Ỉ, Ĩ, Í, Ị, Ô, Ơ, Ò, Ỏ, Õ, Ó, Ọ, Ồ, Ổ, Ỗ, Ố, Ộ, Ờ, Ở, Ỡ, Ớ, Ợ, Ư, Ù, Ủ, Ũ, Ú, Ụ, Ừ, Ử, Ữ, Ứ, Ự, Ỳ, Ỷ, Ỹ, Ý, Ỵ. We restrict the scope of the problem with a few conventions: characters must be written with reasonable spacing between them, and the letter body and its diacritic marks must be separated.

The support vector method (SVM, Support Vector Machines) is an advanced machine learning method that has been successful not only in data mining but also in recognition. In recent decades, SVM has been widely applied to many practical problems with very promising results [ , 3]. We therefore use the support vector method for the recognition model we propose.

In a recognition system, feature extraction is an important step that strongly affects the accuracy of the system. Many effective feature extraction methods can be applied to handwriting, such as weight matrices, Kirsch operators, and projection histograms [4, 5, 7]. In this paper we implemented and tested several of these feature types and decided to adopt the idea of Haar wavelet feature extraction [8] for the recognition model for isolated handwritten Vietnamese characters.

In what follows, we first summarize the basic ideas of the support vector method. We then present experimental results on Vietnamese handwriting data with several common feature extraction methods. After that, we outline the architecture of the recognition model for isolated handwritten Vietnamese characters and the experimental results obtained with it. The paper closes with conclusions and directions for future work.

SUPPORT VECTOR MACHINES

Given a set of training samples $x_i$ (real-valued vectors), $i = 1, \dots, N$, and the corresponding labels $y_i$
$\in \{-1, +1\}$, the goal of SVM is to find a separating hyperplane (determined by $w$) such that the margin between the two classes is maximal (Figure 1).

Figure 1. Separating hyperplane with maximal margin.

The decision function of a binary SVM classifier can be stated as follows:

$$g(x) = w \cdot \Phi(x) + b, \tag{1}$$

where $x$ is the input vector, $w$ is the normal vector of the separating hyperplane in the feature space generated by the mapping $\Phi$ from the input space to $\mathbb{R}^{M}$ ($M$ is larger than the input dimension, and $\Phi(x)$ may be linear or nonlinear), and $b$ is the offset from the origin [1]. The original SVM was designed for binary classification, so the sign of $g(x)$ indicates whether the vector $x$ belongs to class $+1$ or class $-1$.

Finding the separating hyperplane amounts to solving a quadratic programming (QP) problem:

$$\max_{\alpha} \left( \alpha^{T}\mathbf{1} - \tfrac{1}{2}\,\alpha^{T} H \alpha \right) \tag{2}$$

subject to $0 \le \alpha_i$