PEOPLE'S COMMITTEE OF HO CHI MINH CITY
DEPARTMENT OF SCIENCE AND TECHNOLOGY

FINAL REPORT ON PROJECT IMPLEMENTATION
Under the Young Science and Technology Creativity Incubator program (Vườn ươm Sáng tạo KHCN trẻ)

Project: Research and development of a Vietnamese grammar and dictionary lookup program

Host organization: Center for Young Science and Technology Development
Address: 01 Pham Ngoc Thach, District 1, Ho Chi Minh City
Project lead: Eng. Do Van Long
Ho Chi Minh City, [date illegible]

IOIT-HCM (Ho Chi Minh City Institute of Information Technology)

Table of Contents

Chapter 1: Introduction to the project
  1.1 About this document
    1.1.1 Purpose
    1.1.2 Scope of use
    1.1.3 Abbreviations
    1.1.4 Document preparation
    1.1.5 References
  1.2 About the project
    1.2.1 Objectives
    1.2.2 Research content
    1.2.3 Registered products
    1.2.4 Project team
    1.2.5 Advisors
    1.2.6 Summary of results
Chapter 2: Unification grammar
  2.1 Introduction
  2.2 Background knowledge in unification grammar
    2.2.1 The role of grammar formalisms
    2.2.2 Some basic points of grammar design
    2.2.3 Types of unification grammar
  2.3 Components of unification grammar
    2.3.1 The formalism
    2.3.2 The information domain
    2.3.3 Combination rules
    2.3.4 Notational sugar
  2.4 The PATR-II grammar
    2.4.1 Grammar type I: Agreement
    2.4.2 Grammar type II: Subcategorization
    2.4.3 Grammar type III: Logical Form
    2.4.4 Additional grammar types
    2.4.5 General lexical representations
  2.5 Extended formalisms
    2.5.1 Introduction
    2.5.2 Two classes of formalisms
    2.5.3 Functional Unification Grammar (FUG)
    2.5.4 Definite-Clause Grammar (DCG)
    2.5.5 Lexical-Functional Grammar (LFG)
    2.5.6 Generalized Phrase Structure Grammar (GPSG)
    2.5.7 Head grammar and head-driven phrase structure grammar
    2.5.8 Lexical organization
    2.5.9 Other extended formalisms
  2.6 Summary
    2.6.1 Overview of the linguistic formalisms
    2.6.2 Summary
Chapter 3: A unification-based syntactic parser
  3.1 Introduction
  3.2 The PATR-II formalism in PC-PATR
    3.2.1 Phrase structure rules
    3.2.2 PC-PATR feature structures
    3.2.3 Unification
    3.2.4 Feature constraints
    3.2.5 The lexicon
  3.3 Running PC-PATR
    3.3.1 PC-PATR command line options
    3.3.2 Interactive commands
      3.3.2.1 cd
      3.3.2.2 clear
      3.3.2.3 close
      3.3.2.4 directory
      3.3.2.5 edit
      3.3.2.6 exit
      3.3.2.7 file
      3.3.2.8 file disambiguate
      3.3.2.9 file parse
      3.3.2.10 help
      3.3.2.11 load
      3.3.2.12 load ample control
      3.3.2.13 load ample dictionary
      3.3.2.14 load ample text-control
      3.3.2.15 load analysis
      3.3.2.16 load grammar
      3.3.2.17 load kimmo grammar
      3.3.2.18 load kimmo lexicon
      3.3.2.19 load kimmo rules
      3.3.2.20 load lexicon
      3.3.2.21 log
      3.3.2.22 parse
      3.3.2.23 quit
      3.3.2.24 save
      3.3.2.25 save lexicon
      3.3.2.26 save status
      3.3.2.27 set
      3.3.2.28 set ambiguities
      3.3.2.29 set check-cycles
      3.3.2.30 set comment
      3.3.2.31 set failures
      3.3.2.32 set features
      3.3.2.33 set final-punctuation
      3.3.2.34 set gloss
      3.3.2.35 set kimmo check-cycles
      3.3.2.36 set kimmo promote-defaults
      3.3.2.37 set kimmo top-down-filter
      3.3.2.38 set limit
      3.3.2.39 set marker category
      3.3.2.40 set marker features
      3.3.2.41 set marker gloss
      3.3.2.42 set marker record
      3.3.2.43 set marker word
      3.3.2.44 set promote-defaults
      3.3.2.45 set property-is-feature
      3.3.2.46 set timing
      3.3.2.47 set top-down-filter
      3.3.2.48 set tree
      3.3.2.49 set trim-empty-features
      3.3.2.50 set unification
      3.3.2.51 set verbose
      3.3.2.52 set warnings
      3.3.2.53 set write-ample-parses
      3.3.2.54 show
      3.3.2.55 show lexicon
      3.3.2.56 show status
      3.3.2.57 status
      3.3.2.58 system
      3.3.2.59 take
  3.4 The PC-PATR grammar file
    3.4.1 Rules
    3.4.2 Feature templates
    3.4.3 Parameter settings
    3.4.4 Lexical rules
    3.4.5 Constraint templates
  3.5 Standard format
  3.6 The PC-PATR lexicon file
Chapter 4: Research and construction of a Vietnamese grammar and lexicon
  4.1 Implementing Vietnamese grammar in the PATR-II formalism
    4.1.1 Introduction
    4.1.2 Building the Vietnamese grammar as a unification grammar
  4.2 Research and implementation of the Vietnamese dictionary
    4.2.1 Word formation (from the viewpoint of grammatical structure)
    4.2.2 Vietnamese parts of speech
    4.2.3 Phrases (also called "ngữ")
    4.2.4 The Vietnamese dictionary based on feature structures
Chapter 5: The Vietnamese dictionary and grammar lookup program
  5.1 Introduction
  5.2 Program design
    5.2.1 Structure of the program components
    5.2.2 Directory structure of the program
    5.2.3 Introduction to the processing techniques
      5.2.3.1 Storing the dictionary
      5.2.3.2 Adding a new word
      5.2.3.3 Editing a word sense
      5.2.3.4 Deleting a word sense
      5.2.3.5 Restoring a deleted word sense
      5.2.3.6 Drawing the parse result of a Vietnamese sentence as a tree diagram
      5.2.3.7 Highlighting words not yet in the dictionary
      5.2.3.8 Converting text files from UTF-8 to VNI-WIN and back
    5.2.4 Custom library modules used to build the program
  5.3 Introduction to the program interface
  5.4 Technical documentation and user guide
    5.4.1 Technical documentation
    5.4.2 User guide
      5.4.2.1 Using the Vietnamese dictionary function
      5.4.2.2 Using the grammar construction function
      5.4.2.3 Using the grammar analysis function
      5.4.2.4 Using the quick word-sense lookup function
  5.5 Statistical results on the sample sentence set
Chapter 6: Other results of the project
  6.1 Cooperation with domestic Vietnamese language research units
  6.2 Papers and conference reports
  6.3 Participation in the Parallel Grammar project
Chapter 7: Extensibility of the project
Chapter 8: CONCLUSION

Chapter 1: Introduction to the project

1.1 About this document

1.1.1 Purpose
This document is the final implementation report for the project "Research and development of a Vietnamese grammar and dictionary lookup program", part of the Young Science and Technology Creativity Incubator program managed by the Ho Chi Minh City Department of Science and Technology.

1.1.2 Scope of use
This document is a detailed supplement to the reports presented before the acceptance council set up by the Ho Chi Minh City Department of Science and Technology. It also serves as the basis for the Department's leadership to consider the settlement procedure under contract no. …/HD-SKHCN, signed between the Ho Chi Minh City Department of Science and Technology and the Ho Chi Minh City Center for Young Science and Technology Development.
The intended readers are:
• The leadership of the Ho Chi Minh City Department of Science and Technology
• The members of the acceptance council
• The members of the project team

1.1.3 Abbreviations
Abbreviations used in this document:
TPHCM    Ho Chi Minh City (Thành phố Hồ Chí Minh)
CNTT     Information Technology (Công nghệ Thông tin)
PVCNTT   Ho Chi Minh City Institute of Information Technology (Phân viện Công nghệ Thông tin tại TPHCM)
KHCNMT   Science, Technology and Environment (Khoa học, Công nghệ và Môi trường)
AI       Artificial Intelligence
XLNN     Language processing (Xử lý ngôn ngữ)
PC       Personal Computer
ATN      Augmented Transition Network
LFG      Lexical-Functional Grammar
FUG      Functional Unification Grammar
DCG      Definite-Clause Grammar
GPSG     Generalized Phrase Structure Grammar
HPSG     Head-driven Phrase Structure Grammar

1.1.4 Document preparation
Compiled by:
Mr. Do Van Long
Mr. Pham Manh Hung
Mr. Tran Minh Vu
Mr. Nguyen Duy Hau
Mr. Nguyen Dang Nhan
Mr. Huynh Huu Viet
Mr. Le Minh Dat
Mr. Truong Trong Dung
Mr. Duong Quoc Thang
Ms. Chau Thu Tran
Reviewed by:
MSc. Dao Van Tuyet
Eng. Le Phuoc Loc

1.1.5 References
[1] Do Van Long, Dao Van Tuyet, "Cấu trúc Feature và hình thức PATR-II, một cách tiếp cận xử lý ngôn ngữ tiếng Việt hiệu quả" [Feature structures and the PATR-II formalism: an effective approach to Vietnamese language processing], 2nd Vietnam-France International Conference on Informatics, RIVF'04, Hanoi, 2004.
[2] Do Van Long, "Giải pháp xây dựng văn phạm tiếng Việt trên máy tính" [A solution for building a Vietnamese grammar on the computer], 7th National IT Workshop, Da Nang, 2004.
[3] Nguyen Dang Nhan, Huynh Huu Viet, Do Van Long, "VietDict - Từ điển cho xử lý ngôn ngữ tự nhiên tiếng Việt" [VietDict: a dictionary for Vietnamese natural language processing], 8th National Workshop on Selected Issues in IT, Hai Phong, 2005.
[4] Do Van Long, Dao Van Tuyet, Tran Van Lang, "Using Unification-Based Grammar for Vietnamese Natural Language Processing at IOIT-HCM", Parallel Grammar Conference, Stanford University, USA, 2005.
[5] Tran Ngoc Tuan, Phan Thi Tuoi, "Feature-Based Grammar in adaptation to Vietnamese Natural Language Processing", HCMC University of Technology, Ho Chi Minh City, 2004.
[6] Stuart M. Shieber, An Introduction to Unification-Based Approaches to Grammar, 1986.
[7] Stephen McConnel, PC-PATR Reference Manual: A Unification Based Syntactic Parser, Version 1.3.0, March 2003.
[8] Noriko Tomuro, Left-Corner Parsing Algorithm for Unification Grammars, DePaul University Press, 1999.
[9] A Feature-based Korean Grammar with Unification Constraints, Dept. of Computer Science & Engineering, Korea University, 2003.
[10] George F. Luger, Artificial Intelligence, 4th ed., Chapter 6: Knowledge Representation, Addison-Wesley, Harlow, England, 2002.
[11] Frame-Based Systems, http://www.cs.umbc.edu/771/current/papers/nebel.html
[12] A Theory of Natural Language Understanding, Cognitive Psychology, http://www.cc.gatech.edu/~jimmyd/summaries/, http://www.cc.gatech.edu/computing/classes/cs3361_96_spring/Fall95/Notes/cd.html
[13] Diep Quang Ban, Hoang Van Thung, Ngữ pháp tiếng Việt [Vietnamese Grammar], Hanoi University of Education, Education Publishing House, 2003.
[14] Parallel Grammar Project: http://www2.parc.com/isl/groups/nltt/pargram/

1.2 About the project

1.2.1 Objectives
To study the theory and methods for representing natural language effectively and, through them, to represent the Vietnamese language on the computer. Specifically, to study and build a general system of Vietnamese grammar and vocabulary in the form of a unification-based grammar, and to integrate it into a computer program that will serve later Vietnamese natural language processing applications.

1.2.2 Research content
• Study the unification grammar formalism for representing the grammar of a natural language on the computer
• Systematize basic Vietnamese grammar so that it can be used on the computer in the form of a unification grammar
• Study and build a vocabulary set for the system, and from it form a Vietnamese dictionary
• Study and apply an efficient syntactic parsing system to serve the program
• Build an application that combines these results into a Vietnamese grammar and dictionary lookup program

1.2.3 Registered products
No. 1
  Product: Software supporting lookup of Vietnamese grammar and vocabulary
  Scientific and economic requirements: The software provides the following features:
    > Lets users look up the Vietnamese dictionary, with support for adding, removing, editing and deleting dictionary entries (20,000 words)
    > Lets users consult the Vietnamese grammar system, with support for looking up and updating the grammar
    > Parses a sentence taken from a text passage and displays its syntax tree
No. 2
  Product: Technical documentation and a guide to using the program and updating the dictionary
  Scientific and economic requirements: Introduces the program, provides help and usage instructions, and documents the knowledge and techniques used to carry out the project
The Notes column of the original table is empty.

1.2.4 Project team
Name                      Assignment
Eng. Do Van Long          Project lead
Eng. Pham Manh Hung       Project secretary
BSc. Nguyen Duy Hau       System architecture
Eng. Tran Minh Vu         Vietnamese grammar
BSc. Duong Quoc Thang     Vietnamese grammar
BSc. Chau Thu Tran        Vietnamese dictionary
BSc. Le Minh Dat          Vietnamese dictionary

1.2.5 Advisors
Name                                   Affiliation
Prof. Acad. Tran Ngoc Them             University of Social Sciences and Humanities, Ho Chi Minh City
Prof. Dr. Diep Quang Ban               Hanoi University of Education
Dr. Pham Van Tinh                      Institute of Linguistics, Hanoi
BSc. Vu Xuan Luong                     Center for Lexicography, Hanoi

1.2.6 Summary of results
Results the project has achieved:
• Studied and mastered the theory of unification-based formalisms for describing natural language on the computer
• Studied and mastered a text parser based on the unification formalism
• Carried out a general study of Vietnamese grammar and built a grammar for the computer
• Studied Vietnamese words and built a dictionary of about 25,000 words
• Built Vietnamese dictionary software and a general grammar lookup program based on the research results
• The program can also parse some simple Vietnamese texts, and it has produced some extended results for the language processing projects of the Ho Chi Minh City Institute of Information Technology
Extended results and research cooperation:
• The team was invited to join the Parallel Grammar project, proposed by Stanford University (USA), to implement a grammar for Vietnamese (project information is available at http://www2.parc.com/isl/groups/nltt/pargram/), and was invited to present at the most recent ParGram meetings in the USA and the United Kingdom
• The team's papers on Vietnamese language processing and on building a Vietnamese grammar and dictionary for the computer were accepted, and the team was invited to present them at national information technology conferences
• The team worked with the Center for Lexicography in Hanoi to enter data into the software the team built, and with the Institute of Linguistics in Hanoi for support in building a general Vietnamese grammar. The team also consulted Prof. Dr. Diep Quang Ban, the author of the book "Ngữ pháp tiếng Việt" [Vietnamese Grammar], and received strong encouragement from him to continue the project

Chapter 2: Unification grammar

The concept of unification grammar emerged in the early 1980s and has attracted great interest from language processing researchers around the world because of its generality and its ability to represent natural language fairly effectively. Its advantages have gradually gained wide acceptance, and it has become one of the most widely used grammar formalisms in computational linguistics and natural language processing. Within this project, we chose the unification grammar formalism as the foundation for describing the grammar of Vietnamese on the computer.

Historically, linguistic formalisms developed along two main, separate branches: the first developed general linguistic theories such as GPSG, LFG and HPSG; the second developed general linguistic tools such as PATR-II, FUG and DCG. Although the two branches arose from different goals, almost all of these formalisms agree on using features as the basic data structure and unification as the basic operation for combining information.

Today, quite a few natural language processing researchers around the world have chosen a unification-based formalism to represent their own languages (references [8], [9]). In parallel, many international linguistic research organizations have implemented the PATR-II formalism as tools supporting language analysis and processing, such as SIL (the international linguistics research institute in California), DePaul University, and Korea University (references [6], [7]). In particular, since 2002 Stanford University in the USA has run the Parallel Grammar project, which brings together grammar systems for many countries' languages on a unification grammar foundation; this includes Vietnamese, on which we are working (reference [14]).

Our approach to Vietnamese language processing is to apply the theory of unification grammar, with the advantages of feature structures and the PATR-II formalism, to build a Vietnamese lexicon and grammar. A left-corner chart parser then analyzes the Vietnamese text entered. The system is fully extensible: modules for storing, processing and analyzing Vietnamese text can be built on it and integrated into other natural language processing programs.

2.1 Introduction
The main concern of unification grammar is how semantic and syntactic information is represented. The term "unification-based grammar formalisms" covers a number of formal methods for representing language that were developed over different periods. In general, these methods are regarded as a family whose members are related and may inherit parts from one another; in short, all of them rest on the same basic operation, unification.

Historically, these grammar formalisms are the result of independent research in computational linguistics, formal representations of language, and natural language processing. Related techniques can be found in theorem proving, knowledge representation research, and the theory of data types. From this, several independent lines of research began to take shape, each relying on unification to control the flow of information in the representation of language.
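The idea that features are the basic data structure and unification the basic operation for combining information can be illustrated with a minimal sketch. It is not the PC-PATR implementation: it assumes that a feature structure is modelled as a nested Python dict with string atoms, the function name `unify` and the feature names `cat`, `agr`, `num` and `per` are hypothetical, and shared (reentrant) values and variables are deliberately left out.

```python
from typing import Optional, Union

# Assumption for this sketch: a feature structure is either an atomic
# string value or a dict mapping feature names to feature structures.
FeatureStructure = Union[str, dict]


def unify(a: FeatureStructure, b: FeatureStructure) -> Optional[FeatureStructure]:
    """Return the unification of a and b, or None if they conflict."""
    if not isinstance(a, dict) or not isinstance(b, dict):
        # At least one side is atomic: they unify only when identical.
        return a if a == b else None
    result = dict(a)  # copy the top level so the inputs stay unchanged
    for feature, value in b.items():
        if feature in result:
            merged = unify(result[feature], value)
            if merged is None:
                return None  # conflicting values under the same feature
            result[feature] = merged
        else:
            result[feature] = value  # information present only in b
    return result


# Hypothetical example: a noun phrase and the requirement a verb places
# on its subject carry compatible information, so they unify.
noun_phrase = {"cat": "NP", "agr": {"num": "sg"}}
subject_requirement = {"cat": "NP", "agr": {"per": "3"}}
combined = unify(noun_phrase, subject_requirement)

# A plural requirement conflicts with the singular noun phrase.
clash = unify(noun_phrase, {"agr": {"num": "pl"}})
```

Here `combined` holds the information of both inputs, while `clash` is `None` because the two structures disagree on `num`. A real PATR-II system also tracks shared values between paths, which this sketch omits.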
Parse results for sample sentences 67 to 83. The parse-tree figures are not reproduced; each entry gives the sentence, an English gloss, and the recorded verdict.

67  Nàng ta đẹp ghê! (She is so beautiful!): parsed, correct
68  anh hỏi cái gì? (What are you asking?): parsed, correct
69  Nhà anh ấy xa không? (Is his house far?): parsed, correct
70  anh ta mới về đó (He has just come back there): parsed, correct
71  anh ấy đang phá phách (He is making trouble): parsed, correct
72  trên cái bàn, tôi để quyển sách (On the table, I put the book): parsed, correct
73  tôi ăn cơm (I eat rice): parsed, correct
74  Nó thật là phách lối (He is really arrogant): parsed, wrong
75  tôi ngồi học (I sit studying): parsed, correct
76  anh em tôi hiền lắm (My brothers are very gentle): parsed, correct
77  Tôi để quyển sách dưới cái bàn (I put the book under the table): parsed, correct
78  anh là sinh viên trường ĐH KThuật (You are a student at the University of Technology): parsed, correct
79  hay, anh ấy hiền nhỉ! (Well, he is gentle, isn't he!): parsed, correct
80  tôi tặng cái áo cho mẹ (I give the shirt to mother): parsed, correct
81  Tôi để quyển sách dưới cái bàn và ăn cơm (I put the book under the table and eat rice): parsed, correct
82  Tôi và anh ấy hiền lắm (He and I are very gentle): parsed, correct
83  Tôi để quyển sách dưới cái bàn, tôi ăn cơm (I put the book under the table, I eat rice): parsed, correct

Node labels that can be recovered from the tree figures include Cau, CauDon, CauDonHaiThanhPhan, CauDonMoRongNongCot, CauGhep, TPhPhu, TrangNgu, NguDanhTu, NguDaiTu, NguDongTu, NguTinhTu, DanhTu, DaiTu, DongTu, TinhTu, PhoTu, LienTu, GioiTu, ThanTu, DauPhay, DauHoi and DauChamThan.

Summary: of the 83 sample sentences, 82 were analyzed correctly (98.8%).

Chapter 6: Other results of the project

The main goal of the project was to study and find a method for describing the Vietnamese language on the computer. While carrying it out, however, we also received a great deal of support and encouragement from language processing researchers inside and outside the country. This chapter presents some related results that encouraged us greatly during the research and implementation of the project.

6.1 Cooperation with domestic Vietnamese language research units
During the project, the team cooperated with the Center for Lexicography in Hanoi and sent its programs to the center's lexicographers so that they could review them, add to them, and enter data for the project. As an initial result, more than 47,000 words have been entered into VietDict, the dictionary that the team designed and built to store dictionary entries in the unification grammar formalism.

[Figure: screenshot of the VietDict dictionary program]
6.2 Papers and conference reports
During the project, team members used the knowledge gained from the research to write papers, which were accepted for presentation at IT workshops and conferences inside and outside the country. The papers were very well received, and this encouraged the research team to aim for better results. The papers (printed in the attached appendix) are:
• RIVF'04 conference: "Cấu trúc feature và hình thức PATR II, một cách tiếp cận nghiên cứu và xử lý ngôn ngữ tiếng Việt hiệu quả" [Feature structures and the PATR-II formalism: an effective approach to researching and processing Vietnamese], Do Van Long, Dao Van Tuyet
• 7th National Workshop on Selected Issues in IT, Da Nang, 2004: "Giải pháp xây dựng văn phạm tiếng Việt trên máy tính" [A solution for building a Vietnamese grammar on the computer], Do Van Long, Dao Van Tuyet
• 8th National Workshop on Selected Issues in IT, Hai Phong, 2005: "VietDict - Từ điển cho xử lý ngôn ngữ tự nhiên tiếng Việt" [VietDict: a dictionary for Vietnamese natural language processing]
• 9th National Workshop on Selected Issues in IT, Da Lat, 2006: "Web ngữ nghĩa và ứng dụng rút trích thông tin trên Web" [The semantic web and its application to extracting information from the Web], Do Van Long, Huynh Huu Viet, Truong Trong Dung, Nguyen Duy Hau, Pham Manh Hung, Le Minh Dat, Tran Minh Vu
• ParGram meeting at Stanford University, USA, 2005: "Using Unification-Based Grammar for Vietnamese Natural Language Processing at IOIT-HCM", Do Van Long, Tran Van Lang, Dao Van Tuyet
• ParGram meeting at Oxford University, United Kingdom, 2006: "Vietnamese Grammar and VNLP Project", Do Van Long, Tran Van Lang, Dao Van Tuyet

6.3 Participation in the Parallel Grammar project
The ParGram project was set up in 2000 by the Palo Alto research center of Stanford University, USA, to bring together language researchers from research institutes and universities around the world. Its main goal is to build a grammar system with broad, shared coverage across the languages of different countries. The project currently uses the LFG (Lexical-Functional Grammar) formalism as the common foundation for describing each country's grammar.

Our team was invited to join the project and became a member in 2005. The team was invited to take part in and present at the project's two most recent member meetings, where it was very well received: in December 2005 at Stanford University (USA) and in September 2006 at Oxford University (United Kingdom). The team is now also listed on the ParGram project website at http://www2.parc.com/isl/groups/nltt/pargram/

[Figure: screenshot of the Parallel Grammar Project website. The list of current participants includes the Ho Chi Minh City Institute of Information Technology (IOIT-HCM) (Vietnamese): Dao Van Tuyet, Do Van Long, Tran Van Lang. Page last updated Friday 06-Oct-2006.]
Chapter 7: Extensibility of the project

The project was built so that its processing modules can be integrated into other natural language processing projects. In the near term, the dictionary can serve common Vietnamese text processing applications. The project also shows that an application supporting intelligent question answering in natural-language Vietnamese can be developed. Although this module was built only to assess applicability to the field of artificial intelligence, continued investment would give it very promising potential for Vietnamese NLP. The project can also be developed further in the following directions:
• A Vietnamese electronic dictionary that lets users look up word meanings, parts of speech, and related words of the same part of speech. This vocabulary lookup software would help students learn and consult the Vietnamese language.
• A Vietnamese grammar lookup tool that lets users consult the general grammar of Vietnamese and that can be integrated to build an effective grammatical analyzer for common written Vietnamese texts. The program can also help students study the syntax of Vietnamese sentences through clear, vivid visualizations.
• The program can be developed into a module that checks Vietnamese texts for grammatical errors, helping writers quickly detect grammar mistakes while preparing a text.
• The prospects for an intelligent question-answering system based on analyzing and processing the text entered into the program are very good; with suitable investment, it could be embedded in Vietnamese-language robot technology. The program could also be developed into an intelligent question-answering program that helps school pupils test and extend their knowledge of general biology.
• Extension to data mining on Vietnamese: collecting facts from Vietnamese texts and aggregating the analyses to extract the main content of a text, based on analysis of the core constituents of each sentence.

Chapter 8: CONCLUSION

Results the project has achieved:
• Studied and mastered the theory of unification-based formalisms for describing natural language on the computer
• Studied and mastered a text parser based on the unification formalism
• Carried out a general study of Vietnamese grammar and built a grammar for the computer
• Studied Vietnamese words and built a dictionary of about 25,000 words
• Built Vietnamese dictionary software and a general grammar lookup program based on the research results
• The program can parse some simple Vietnamese texts, and it has produced some extended results for the language processing projects of the Ho Chi Minh City Institute of Information Technology
Extended results and research cooperation:
• The team was invited to join the Parallel Grammar project, proposed by Stanford University (USA), to implement a grammar for Vietnamese (project information is available at http://www2.parc.com/isl/groups/nltt/pargram/), and was invited to present at the most recent ParGram meetings in the USA and the United Kingdom
• The team's papers on Vietnamese language processing and on building a Vietnamese grammar and dictionary for the computer were accepted, and the team was invited to present them at national information technology conferences
• The team worked with the Center for Lexicography in Hanoi to enter data into the software the team built, and with the Institute of Linguistics in Hanoi for support in building a general Vietnamese grammar. The team also consulted Prof. Dr. Diep Quang Ban, the author of the book "Ngữ pháp tiếng Việt" [Vietnamese Grammar], and received strong encouragement from him to continue the project

Within the framework of this research project under the Young Science and Technology Incubator program of the Ho Chi Minh City Department of Science and Technology, we have sought to present the results of our work on bringing the Vietnamese language closer to the world of computing. We hope that, building on the results of this project and in cooperation with research at other organizations in the country, there will soon be programs that use Vietnamese for human-machine communication, as well as dictionary, lookup and text processing programs, and that these will contribute to the wider use of the Vietnamese language.