Proceedings of EACL '99 #-TBL Lite: A Small, Extendible Transformation-Based Learner Torbj6rn Lager Department of Linguistics Uppsala University SWEDEN Torbjorn.Lager@ling.uu.se Abstract This short paper describes - and in fact gives the complete source for - a tiny Prolog program implementing a flexi- ble and fairly efficient Transformation- Based Learning (TBL) system: 1 Introduction Transformation-Based Learning (Brill, 1995) is a well-established learning method in NLP circles. This short paper presents a 'light' version of the #-TBL system - a genera/, logically transparent, flexible and efficient transformation-based learner presented in (Lager, 1999). It turns out that a transformation-based learner, complete with a compiler for templates, can be implemented in less than one page of Prolog code. 2 #-TBL Rules &= Representations The point of departure for TBL is a tagged initial- state corpus and a correctly tagged training cor- pus. Assuming the part-of-speech tagging task, corpus data can be represented by means of three kinds of clauses: wd(P,W) is true iff the word W is at position P in the corpus tag(P,A) is true iff the word at position P in the corpus is tagged A tag(A,B,P) is true iff the word at P is tagged A and the correct tag for the word at P is B Although this representation may seem a bit re- dundant, it provides exactly the kind of indexing into the data that is needed3 A decent Prolog system can deal with millions of such clauses. 1 Assuming a Prolog with first argument indexing. The #-TBL systems are implemented in SICStus Pro- log. The object of TBL is to learn an ordered se- quence of transformation rules. Such rules dictate when - based on the context - a word should have its tag changed. An example would be "replace tag vb with nn if the word immediately to the left has a tag dr." Here is how this rule is represented in the #-TBL rule/template formalism: tag:vb>nn <- tag:dr@[-1]. Conditions may refer to different features, and complex conditions may be composed from sim- pler ones. For example, here is a rule saying "re- place tag rb with j j, if the current word is "only", and if one of the previous two tags is dr.": tag:rb>jj <- wd:only@[O] ~ tag:dt~[-l,-2]. Rules that can be learned in TBL are instances of templates, such as "replace tag A with B if the word immediately to the left has tag C", where A, B and C are variables. In the/~-TBL formalism: t3(A,B,C) # tag:A>B <- tag:C~[-l]. Positive and negative instances of rules that are instances of this template can be generated by means of the following clauses: pos (t3(A,B,C)) :- dif(A,B),tag(A,B,P),Pl is P-l,tag(Pl,C). neg(t3(A,B,C)) :- tag(A,A,P),P1 is P-l,tag(Pi,C). Tied to each template is also a procedure that will apply rules that are instances of the template: app(t3(A,B,C)) :- (tag(A,X,P), Pl is P-l, tag(Pl,C), retract (tag(A,X,P)), retract (tag(P,A)), assert(tag(B,X,P)), assert(tag(P,B)), fail ; true). 3 The #-TBL Template Compiler To write clauses such as the above by hand for large sets of templates would be tedious and prone to errors. Instead, Prolog's term expansion facil- ity, and a couple of DCG rules, can be used to compile templates into Prolog code, as follows: 279 Proceedings of EACL '99 term_expansion((ID # A<-Cs), [(pos(ID) :- Gt), (neg(ID) :- G2), (app(ID) :- (G3,fail;true))]) :- pos((A<-Cs),Ll,[]), list2goal(Li,Gl), neg((A<-Cs),L2,[]), list2goal(L2,G2), app((A<-Cs),L3,[]), list2goal(L3,G3). pos((F:A>B<-Cs)) > {G = [F,A,B,P]},[dif(A,B),G], cond(Cs,P). neg((F:A>_<-Cs)) > {G = [F,A,A,P]}, [G], cond(Cs,P). app ( (F: A>B<-Cs) ) > {G1 = [F,A,X,P], G2 = [F,P,A], G3 = [F,B,X,P], G4 = [F,P,B]}, [GI], cond(Cs,P), [retract(Gl), retract(G2), assert(G3), assert(G4)]. cond((C~Cs),P) > cond(C,P), cond(Cs,P). cond(FA©Pos,PO) > pos(Pos,PO,P), feat(FA,P). pos(Pos,P0,P) > [member(0ffset,Pos), P is P0+0ffset]. feat(F:A,P) > {G = [F,P,A]}, [G]. 4 The #-TBL Lite Learner Given corpus data, compiled templates, and a value for Threshold, the predicate tbl/1 imple- ments the /~-TBL main loop, and writes a se- quence of rules to the screen: tbl (Threshold) :- ( setof (N-Rule,L" (bagof (. ,pos (Rule) ,L), length(L,N), N >= Threshold) ,FL), reverse (FL, RevFL), bestof (RevFL, dummy, Threshold, Winner), dif (Winner, dummy) -> write(Winner) ,nl, app (Winner), tbl (Threshold) ; crue ). The call to the setof-bagof combination generates a frequency listing of all positive instances of all templates, based on which the call to bestof/4 then selects the rule with the highest score, tbl/1 terminates if the score for that rule is less than the threshold, else it applies the rule and goes on to learn more rules from there. bestof (FL0, Leader, HiScore, Winner) • - ( FL0 = [Pos-Kule]FL] , Pos > HiScore -> Max is Pos-HiScore, ( count0 (neg (Rule) ,Max,Neg) -> bestof (FL,Rule,Pos-Neg,Winner) ; bestof (FL, Leader, HiScore, Winner) ) Winner = Leader ). To compute the rule with the highest score, bestof/4 traverses the frequency listing, keeping track of a leading rule and its score. The score of a rule is calculated as the difference between the number of its positive instances and its negative instances. When the list of rules is empty or the number of positive instances of the most frequent rule in what remains of the list is less than the leading rules score, the leader is declared winner. The following procedure implements the count- ing of negative instances in an efficient way: count0 (G,M,N) :- ( bb_put(c,O), G, bb_get(c,NO), N is NO+l, bb_puZ(c,N), N > M -> fail ; bb_get (c, N) ). 5 p-TBL Lite Performance The learner was benchmarked on a 250Mhz Sun Ultra Enterprise 3000, training on Swedish cor- pora of three different sizes, with 23 different tags, and the 26 templates that Brill uses in his context-rule learner 2. In each case, the ac- curacy of the resulting sequence of rules was measured on a test corpus consisting of 40k words, with an initial-state accuracy of 93.3~. The following table summarizes the results: Size Thrshld Runtime ~ of rules Acc. 30k 2 15 min 99 95.5% 60k 4 24 rain 85 95.7% 120k 6 60 rain 92 95.8% By comparison, it took Brill's C-implemented context-rule learner 90 minutes, 185 minutes, and 560 minutes, respectively, to train on these corpora, producing similar sequences of rules. Thus #-TBL Lite is an order of magnitude faster than Brill's learner. The full #-TBL system presented in (Lager, 1999) is even faster, uses less memory, and is in certain respects more general. Small is beautiful, however, and the light version may also have a greater pedagogi- cal value. Both versions can be downloaded from http ://www. ling. gu. se/~lager/mutbl, html. References Lager, TorbjSrn. 1999. The #-TBL System: Logic Programming Tools for Transformation- Based Learning, In: Proceedings of CoNLL-99, Bergen. Brill, Eric. 1995. Transformation-Based Error- Driven Learning and Natural Language Process- ing: A Case Study in Part of Speech Tagging. Computational Linguistics, December 1995. 2Available from http://www, cs. jhu. edu/~brill. 280 . Proceedings of EACL '99 #-TBL Lite: A Small, Extendible Transformation-Based Learner Torbj6rn Lager Department of Linguistics. transparent, flexible and efficient transformation-based learner presented in (Lager, 1999). It turns out that a transformation-based learner, complete