... optimized on the heldout data. Usually, larger values are used for global parameters and for domains with more data, while for domains with less data, the variance is typically set to be smaller, ... this paper is that we show how the suggested hierarchical adaptation can be used with suitable priors and combined with the class-based ... speech. For training the LMs, two sources were used: first 5M sentences from the Gigaword (2nd ed.) corpus (99.5M words), and broadcast news transcriptions from the TDT4 corpus (1.19M words). The...

∗Currently with Tallinn University of Technology, Estonia