intelligent selection of language model training data

Scientific report: "Intelligent Selection of Language Model Training Data" ppt

... selecting a subset of the available data as language model training data. This not only produces a language model better matched to the domain of interest (as measured in terms of perplexity ... non-domain-specific language models, for each sentence of the text source used to produce the latter language model. We show that this produces better language models, trained on less data, than both random data ... Computational Linguistics Intelligent Selection of Language Model Training Data Robert C. Moore William Lewis Microsoft Research Redmond, WA 98052, USA {bobmoore,wilewis}@microsoft.com Abstract We...

Uploaded: 07/03/2014, 22:20
Scientific report: "Automatic Acquisition of Language Model based on Head-Dependent Relation between Words" pdf

... [Figure 8: Model size; x-axis: No. of training sentences; series: DEP model, TRI model] Related to the size of model, however, ... more useful than the naive word sequences of n-gram, for language modeling. We are planning to experiment the performance of the proposed language model for large corpus, for various domains, ... Based n-gram Models of Natural Language". Computational Linguistics, 18(4):467-480. C. Chang and C. Chen. 1996. "Application Issues of SA-class Bigram Language Models"....

Uploaded: 08/03/2014, 05:21
Scientific report: "A Word-Order Database for Testing Computational Models of Language Acquisition" docx

... investigators interested in computational models of natural language acquisition. 2 The Language Domain Database The focus of the language domain database (hereafter LDD) is to make readily ... TLA as a viable model of human language acquisition. The STL: Fodor’s Structural Triggers Learner (STL) makes greater use of the parser than the TLA. A key feature of the model is that parameter ... primary areas: Grammar Selection, Sentence Selection and Data Download. First a user has to specify, on the Grammar Selection page, which settings of the 13 parameters are of interest and save...

Uploaded: 08/03/2014, 04:22
Scientific report: "EVALUATION OF NATURAL LANGUAGE INTERFACES TO DATABASE SYSTEMS: A PANEL DISCUSSION" ppt

... EVALUATION OF NATURAL LANGUAGE INTERFACES TO DATABASE SYSTEMS: A PANEL DISCUSSION Norman K. Sondheimer, Chair Sperry Univac Blue Bell, PA For a natural language access to database system ... informal evaluations of systems conducted. Recently, this has begun to change. In the last several years, many of the current generation of natural language access to database systems have ... and the database domain's semantics. All natural language access systems achieve some degree of success. But to make progress as a field, we need to be able to evaluate the degree of this...

Uploaded: 08/03/2014, 18:20
Scientific report: "Combining a Statistical Language Model with Logistic Regression to Predict the Lexical and Syntactic Difficulty of Texts for FFL" potx

... a statistical language model and a measure of tense difficulty. 4.1 The language model The lexical difficulty of a text is quite an elaborate phenomenon to parameterise. The logistic regression models ... fact that the MLR model multiplies the number of parameters by J − 1 compared to the PO model. Because of this, they recommend using the PO model. 6 Implementation of the models Having covered ... perfectly representative of the population (which could be true for our data). These analyses would aim to illuminate some of the assets and flaws of each of the statistical models considered. Acknowledgments Thomas...

Uploaded: 08/03/2014, 21:20
Scientific report: "Combining data and mathematical models of language change" ppt

... computational model. We analyze the dynamics of 5 dynamical systems models of linguistic populations, each derived from a model of learning by individuals. We compare each model's dynamics to a set of ... the study of language change. Language and Linguistics Compass, 2(3):289–307. G.J. Baxter, R.A. Blythe, W. Croft, and A.J. McKane. 2009. Modeling language change: An evaluation of Trudgill’s ... 2 lists which of Models 1–5 show each of the desired properties (from §3.2), corresponding to aspects of the observed diachronic dynamics of N/V pair stress. Based on this set of models, we are...

Uploaded: 17/03/2014, 00:20
Scientific report: "Machine-Learning-Based Transformation of Passive Japanese Sentences into Active by Separating Training Data into Each Input Particle" doc

... changing the number of training data. We found that our method of separating training data for all source particles could obtain high accuracy rates even when there were few training data. This indicates that ... method in terms of separating the training data into source particles. Baseline 3 separates the training data into ... [Table 6: Deletion of features; columns: Deleted features, Closed data, Open data] ... separate training data for any source particles, obtained a low accuracy rate (75.57%), when the number of training data was small. This indicates that our method of separating training data for all...

Uploaded: 17/03/2014, 04:20
Scientific report: "Machine-Learning-Based Transformation of Passive Japanese Sentences into Active by Separating Training Data into Each Input Particle" ppt

... cannot readily be included in training data. For simplicity of implementation, they are excluded from training data (we will discuss the use of these excluded data in Section 6). Note that ... countability of English nouns from corpus data. In Proc. of 41st Annual Meeting of ACL, pages 463–470. T. Baldwin and F. Bond. 2003b. A plethora of methods for learning English countability. In Proc. of 2003 ... corresponding training data as a feature to create a new set of training data before applying a machine learning algorithm; then a machine learning algorithm is applied to the new set. The resulting model...

Uploaded: 17/03/2014, 04:20
Scientific report: "A Pylonic Decision-Tree Language Model with Optimal Question Selection" potx

... test data with respect to the tree. A comparison with the perplexity of a standard back-off trigram model will indicate which model performs better. Although decision-tree letter language models ... for word language models. In the case of words References L. R. Bahl, P. F. Brown, P. V. de Souza, and R. L. Mercer. 1989. A tree-based statistical language model for natural language speech ... automatic building of the hierarchy accounts both for similarity in meaning and of parts of speech. the vocabulary is significantly larger, making impossible the estimation of N-gram models for N...

Uploaded: 17/03/2014, 07:20
Scientific report: "EVALUATION OF NATURAL LANGUAGE INTERFACES TO DATA BASE SYSTEMS" pdf

... 19.0 Statements (data addition) 5.0 Considering the wide range of R k'r- syntax [7], the paucity of complex sentences is surprising. The use of definitions which often involved complex ... EVALUATION OF NATURAL LANGUAGE INTERFACES TO DATA BASE SYSTEMS Bozena Henisz Thompson California Institute of Technology INTRODUCTION Is evaluation, like beauty, in the eye of the beholder? ... argued as forcefully on the basis of my study of users' evaluation of machine translation [2] a study which was prompted by the evaluations of the quality of machine translation as viewed...

Uploaded: 17/03/2014, 19:20
Report: "On the detection of gross errors in digital terrain model source data" pdf

... not directly reflect the character of topography. Note that the anomaly of V_i may be caused by either gross error of source data or variation of topography. In next sections, ... the source data, they can be easily converted to points. 2. Test methodology 2.1. Test data This research uses two sets of data: one is the DEM project in the area of old ... of total number of data points and assign them intentional gross errors with magnitude of 2-20 times larger than the original root mean square error (RMSE). The selected data ...

Uploaded: 22/03/2014, 12:20
The role of language in adult education and poverty reduction in Botswan


... national language policy currently in place in Botswana, is that Setswana is the national language of the country (Republic of Botswana 1985:8). So, despite the existence of other local languages, ... parts of the world, education is for the most part the preserve of the few (elites). The choice of which language or dialect to use to teach (medium of instruction) reflects the interests of those ... second and even third languages are added to the learners’ repertoire of language systems whilst sustaining the primary language through the schooling process instead of subtractive bilingualism...

Uploaded: 05/11/2012, 16:27
Lesson 13: use of language

Uploaded: 05/07/2013, 01:26