... gets tokenized correctly, independently of the number of resulting tokens; the token-based measures refer to the four token fields into which the ATB splits each word ... determines the ATB tokenization. ... corpora as TR1 and TR2, and to the test corpora as TE1 and TE2. We report results on both TE1 and TE2 because of the differences in the two parts of the ATB, both in terms of origin and in terms ... request. ... development, training, and test corpora with roughly 12,000 word tokens in each of the development and test corpora, and 120,000 words in each of the training corpora. We will refer to the training...