Proceedings of the 12th Conference of the European Chapter of the ACL, pages 835–842, Athens, Greece, 30 March – 3 April 2009. © 2009 Association for Computational Linguistics

Growing Finely-Discriminating Taxonomies from Seeds of Varying Quality and Size

Tony Veale tony.veale@ucd.ie
Guofu Li guofu.li@ucd.ie
Yanfen Hao yanfen.hao@ucd.ie

Abstract

! " # $"%% !

1 Introduction

& ' !& (')*++,- ./(%01,,*- "%(.et al.*++,-!2 $ ! & 3(!!4 51,,6-! ) 3! %'0 78 # 3 (!! 59 (*+:6- 52;- # ! # !0 "% 2 !0 (5 1,,<-! 4 "% #!" = ! grown seeds! & 7>?%/%8 !""% # 2 &7 8 sharp!; (- @"% X-ness! / ! 0 1 A !< = # !> B # !& 6!

2 Related Work

# (> 2*+::-!;5 (*++1- # 4 (*+++- ! 3( - ! & KnowItAll 2 et al. (1,,<- 5(!!7%0%0*%01C8- # ! " (1,,D- 5 # = 7 E %/%87%/%E8 (-(- (-! ?%(1,,<- ## !& "% = !> ! F et al. (1,,B- ! > = !F et al. (1,,:- 5 (*++1- !; 7%/% %/% E8 E %/% %/% ! !& $ reckless (E-(%/% - = ! & F et al!(1,,:-!" # 7>? %/% E87>? E %/% 8!> (>? - # ! & "%"!Fet al!(1,,:- 3 states countries ( -singers fish ( - food sweet (G51,,D-!4 "% "%"% !

3 Seeds for Taxonomic Growth

> & 3 HI 3 0 J 0 3 = 3 0 ! & ⟨cola, carbonated, drink⟩!; cola (treat, refreshment-# 7E8 #7 E8! # #! " $" %%!

3.1 WordNet

& "% !;"% {feline, felid} {true_cat, cat} {big_cat, cat} ! 5 "% 6,K Xess "%(ess, ess, ?ess!- female ! % "% ! > ⟨lioness, female, lion⟩ ⟨espresso, strong, coffee⟩ ⟨messiah, awaited, king⟩ ⟨messiah, expected, deliverer⟩!

3.2 ConceptNet

"% ! %('1,,<- """ ! % (- !' > % espresso strong coffee("%- bagel Jewish word(usemention-!'expressionism artistic style ("% artistic movement- explosion suicide attack(-! % "% ! " % A,,,,> (78- (!!7 8- (!!78- "%! & ⟨Wyoming, great, state⟩ ⟨wreck, serious, accident⟩ ⟨wolf, wild, animal⟩!

3.3 Web-derived Stereotypes

G5(1,,D- # 7>?%/%8 !& #!! # !5 *BL (!!787 8!- 6,,, 1,,, 3 ! 5 G59 ! & = ⟨surgeon, skilful, ?⟩ ⟨virus, malicious, ?⟩ ⟨dog, loyal, ?⟩!& # !

3.4 Overview of Seed Resources

% !&"% # !& % "% !> G5 # 3 3 !># &*!
"% % M *111D **AA 6B*1 M B*A*< *:,: *66:: M <!*1 *!6 1!B6 M 1A,B BB, **D1 &*$& ! ""% $ (-( - ( -! 4 # !B # ! 4 Bootstrapping from Seeds & ! NN !& & 3 HI 3 0 J # # (E -$ *! 7 3 E 8 1! 7 3 0 E8 # 3 0 ! # #$ A! 7E 3 0 E8 <! 7E 3 E 8 & ! ; 7 8 # Ilemonade, cold, beverageJIlemon- ade, refreshing, beverageJ!& ( 3 - ! "& # #O ex- pand(T')!" ) >0! / # 1,, #B, #! 838 " # StK t S . & K , S =S K * S =K , S ∪ {T ∣ T ' ∈S ∧ T ∈expand T ' } K t* S =K t S ∪ {T ∣ T ' ∈K t S ∧ T ∈expand T ' } "# 3 ! # ex- pand(T') !; Fet al.(1,,:- reckless bootstrapping # !& 3 !" "% near-miss$ I 3 0 J"% 0 (- 0 ( -!& # "% # "%( "%-!& $ K tP S =K t S ∪ { T ∣ T '∈K t S ∧ T ∈ filter near−miss expand T ' } ;*1 ! Q* % ND <, "%N ! & "% near- miss #! ;*$)# B! ;1$) #B ! 4.1 An Example cola $Icola, refreshing, beverageJ!> cola effervescent beverage sweet beverage nonalcoholic beverage !> sugary foodfizzy drinkdark mixer! > sensitive beverage everyday beverage common drink!> irritating food unhealthy drink!> stimulating drinktoxic foodcorrosive substance! cola *<*<A1 D1A+A<*,1 B! refreshing beverage champagne lemonadebeer! 0 1 2 3 4 5 0 200000 400000 600000 800000 1000000 1200000 1400000 1600000 1800000 WordNet Simile ConceptNet Bootstrapping Cycle # Triples 0 1 2 3 4 5 0 50000 100000 150000 200000 250000 300000 350000 WordNet Simile ConceptNet Bootstrapping Cycle # Terms 839 5 Empirical Evaluation &"% near-miss (0 - ( - ( 3 - # !& # # 3 # 3 !; > 0 (1,,B- # O " %! >0 <,1 1* "%!&# ( hot red!-#R(a|an| the) * C i (is|was)R 3 ! not # # (0 -( Temperature hot -!&#+<+:+ <,1!& '&/ F(1,,1-!4 <,1 1* > 0 1* "% !& "% (- ! 3B6!DL "%!4 61!DL!.0> ,!61D ,!AA:B*A<B <,1! * We replicate the above experiments using the same 402 nouns, and assess the clustering accur- acy (again using WordNet as a gold-standard) after each bootstrapping cycle. Recall that we use only the D j fields of each triple as features for the clustering process, so the comparison with the WordNet gold-standard is still a fair one. 
Once again, the goal is to determine how closely the taxonomy that is clustered automatically from the discriminating words D_j alone resembles the human-crafted WordNet taxonomy. The clustering accuracy for all three seeds is shown in Tables 2, 3 and 4.

Cycle    E       P       # Features   Coverage
1st      .327    .629         907       66%
2nd      .253    .712        1482       77%
3rd      .272    .717        2114       82%
4th      .312    .640        2473       83%
5th      .289    .684        2752       83%
Table 2: WordNet

Cycle    E       P       # Features   Coverage
1st      .115    .842         363       41%
2nd      .255    .724         787       59%
3rd      .286    .694        1362       74%
4th      .279    .694        1853       79%
5th      .299    .673        2274       82%
Table 3: ConceptNet

Cycle    E       P       # Features   Coverage
1st      .254    .716         837       59%
2nd      .280    .712        1338       73%
3rd      .289    .693        1944       79%
4th      .313    .660        2312       82%
5th      .157    .843        2614       82%
Table 4: Simile

& <,1 #casuarina, cinchona, dodecahedron concavity> * " !"= ,!61D61!DL! 0 #B *,, 4%!' 0 > #B ! #( &*- (S:1L-B ! & yesteryear, nonce ( - salient(3-jag, droop, fluting, fete, throb, poundage, stinging, rouble, rupee, riel, drachma, escudo, dinar, dirham, lira, dispensation, hoard, airstream ( -riverside, curling!; A< $ # ! ;A$) ! ;<$0 !& 0>$ H,!61D! 4 "% % 6:L6DL B 61!DL 0 >! 5 :<!AL 66!<L0> and ( Temperature, Color!- D,!+L !; <,1 G5(1,,:- (6+!:BL-! 4 # !& 316*< 0 > B*A<B !& 0>!

6 Conclusions

& # B !%# ! 4 "% 0>!" ! & #! ; G5(1,,D- O= !G 5(1,,:-

[Figure: Coverage (0.40 to 0.90) against Bootstrapping Cycle (1 to 5); series: WordNet, Simile, ConceptNet]

[Figure: Purity (0.40 to 1.00) against Bootstrapping Cycle (1 to 5); series: WordNet, Simile, ConceptNet, Poesio & Alm.]

! 7D j C i 8 7D j P k C i 8= C i D j ! 7 8 ! > :1L 0> ! B A1: <,1 :*!B+L ! !> # ! & #F et al. (1,,:- ! # # # !

References

>&!2.!(*+::-!0!& 0! In Proc. of the 26th >.>' 1*D 11<!
> >! 0 .! (1,,B-! ' "! Proc. of the annual meeting of the Cognitive Society?!
4 >! 5 )! (1,,6-! 2 "% .'Q !Computational Linguistics,A1(*-$*A <D!
0!"?!(1,,D-!>> # Q T "! In Proc. of the 45th Annual Meeting of the ACL::: :+B!
2!4.!(*+++-!; ! In Proc. of the 37th Annual Meeting of the ACLBDN6<!
2 /! F ! ! .! 0> .!"!!&! U>! (1,,<-!" F> ( -! In Proc. of the 13th WWW Conference*,,N*,+!
5F!?!(*+:6-!52;$>. 0! In Proc. of the 5th National Conference on Artificial Intelligence 16D 1D* 00!>> >!
50!(1,,<-!"%$"V Proc. of GWC’2004, the 2nd Global WordNet conference.4!
5 .! (*++1-! > # ! In Proc. of the 14th Int. Conf. on Computational Linguistics BA+NB<B!
F G! Q ! &! >! (1,,B-!&.$ ! Int. Journal of Web and Grid Services*(1-1<, 166!
F )! (1,,1-! '&/$ > ! Technical Report 02-017.! $OO !!!OSOO!
FW!Q2!52!(1,,:-! ' "5 0')!In Proc. of the 46th Annual Meeting of the ACL.
'!4!)Q!G!(*++,-!4 $ 3!%U$> "!
'5!0!(1,,<-%$>0 Q&! BT Technology Journal11(<-$1** 116!
.)!4Q!;!)! .F!?! (*++,-! "%$ !!?' A(<-$1ABN1<<!
%!0>!(1,,*-!& !In Proc. of the 2nd International Conference on Formal Ontology in Information Systems (FOIS-2001)!
Q!?!%>!U!(1,,<-!' ! Advances in Neural Information Processing Systems*D!
G&!5U! (1,,D-!.'/ ; !In Proc. of the 45th Annual Meeting of the ACLBDN6<!
G&!5U! (1,,:-!>;F Q ) .!In Proc. of Coling 2008, The 22nd International Conference on Computational Linguistics.!