VIETNAM NATIONAL UNIVERSITY, HO CHI MINH CITY

Author: Nguyen Kim Huyen
Major code: 60.48.01
Master's thesis, Ho Chi Minh City, 2013
Scientific supervisor: GS.TS Phan Th[...]
MSHV (student ID): 11070455

[Committee page, thesis task form (dates 16/07/1983, 02/07/2012, and 21/06/2013 appear with their labels lost), acknowledgements, declaration, and the Vietnamese abstract: the text is not recoverable from the extract.]

SUMMARY

Keyphrases are single words or multi-word phrases that summarize the main content of a document. There are two main approaches to keyphrase extraction: supervised and unsupervised learning. However, neither approach adequately considers the semantic relations between phrases. In this thesis, we propose two methods to improve the performance of SemiRank, an approach that extracts keyphrases based on initial keyphrases and the semantic relations between phrases in the document. The two methods are the Core Phrases method and the Information Features method. Our methods outperform SemiRank with initial keyphrases from the title and two derivatives of KEA and KEA++ on the F1 measure. In addition, we show that the new methods also yield better results for SemiRank when the initial keyphrases are re-ranked based on their semantic relations.

TABLE OF CONTENTS

Chapter 1. Introduction
1.1 [...]
1.2 [...]
Chapter 2. [...]
2.1 [...]
2.2 [...]
Chapter 3. [...]
3.1 Wikipedia
3.2 [...] semantic relations [...]
3.3 [...] (hyper-graph)
3.4 [...] (community)
Chapter 4. Proposed methods
4.1 SemiRank
4.2 [...]
4.3 [...]
4.4 Data preprocessing
Chapter 5. Experiments
5.1 Wiki-20
5.2 [...]
5.3 Implementation (including the implementation of SemiRank and of data preprocessing)
[Remaining Chapter 5 entries: numbering and titles not recoverable; surviving fragments refer to effectiveness, the initial keyphrases, and the combination with semantic relations.]
Chapter 6. Conclusion
6.1 [...]
6.2 [...]
References

LIST OF FIGURES (recoverable entries only)

[...] Wikipedia
[...] hypergraph G1
4.2 The PhraseRank algorithm in SemiRank
4.3 Illustration of several iterations of the PhraseRank algorithm

LIST OF TABLES (recoverable entries only)

5-1 Performance of SemiRank using [...]
5-2 Performance of SemiRank using [...]
5-3 Performance of [...] compared with [...]
5-4 Performance of [...] on the Wiki-20 dataset
5-5 Performance of [...]

[Excerpts from Chapters 2 and 3: largely unrecoverable. Surviving fragments mention part-of-speech (POS) patterns [5, 18], Wikipedia articles [4, 8, 14], TF, KEA [23], the Dice measure, citations [7], [8], and [22], and Equation (3.1), which is defined over Wikipedia articles Ai and Aj.]
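The summary reports its comparisons on the F1 measure. As a minimal sketch of how that measure is commonly computed for keyphrase extraction, the Python below compares a set of extracted phrases with a set of reference phrases. The function names `normalize` and `keyphrase_f1`, the sample phrases, and the choice of exact matching after lowercasing and whitespace normalization are assumptions made for illustration; the thesis may match phrases differently (for example, after stemming).

```python
def normalize(phrase: str) -> str:
    """Lowercase a phrase and collapse runs of whitespace (assumed matching rule)."""
    return " ".join(phrase.lower().split())


def keyphrase_f1(extracted, gold):
    """Return (precision, recall, F1) for extracted vs. reference keyphrases.

    Both arguments are iterables of strings. Duplicates are ignored because
    the phrases are compared as sets after normalization.
    """
    extracted_set = {normalize(p) for p in extracted}
    gold_set = {normalize(p) for p in gold}
    if not extracted_set or not gold_set:
        return 0.0, 0.0, 0.0

    correct = len(extracted_set & gold_set)
    precision = correct / len(extracted_set)
    recall = correct / len(gold_set)
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical example: 2 of 3 extracted phrases appear among 4 reference phrases.
extracted = ["Keyphrase extraction", "semantic relation", "Wikipedia"]
gold = ["keyphrase  extraction", "wikipedia", "hypergraph", "SemiRank"]
precision, recall, f1 = keyphrase_f1(extracted, gold)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Here precision is 2/3, recall is 2/4, and F1 is their harmonic mean, 4/7. Because the phrases are compared as sets, repeating a phrase in the extractor's output does not change the score.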