Scientific report: "An Unsupervised System for Identifying English Inclusions in German Text"
... Proceedings of the ACL Student Research Workshop, pages 133–138, Ann Arbor, Michigan, June 2005. © 2005 Association for Computational Linguistics An Unsupervised System for Identifying English Inclusions ... German or English. The pipeline is composed of a pre-processing module for tokenisation and POS-tagging as well as a lexicon lookup and Google lookup module for id...
Uploaded: 23/03/2014, 19:20
... Pedersen, and Francine Chen. 1995. A trainable document summarizer. In Proceedings of the 18th ACM-SIGIR Conference, Association for Computing Machinery, Special Interest Group Information Retrieval, ... not experts in all of the subdomains of the papers they annotated. The annotators went through a substantial amount of training, including the reading of coding instru...
Uploaded: 22/02/2014, 03:20
... Dan Klein. 2010. Discriminative modeling of extraction sets for machine translation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 1453–1463. John ... number of words in each corpus for TM and LM training, tuning, and testing. 7.1 Experimental Setup The data for French, German, and Spanish are from the 2010 Workshop on Statistic...
Uploaded: 20/02/2014, 04:20
Scientific report: "An Unsupervised Model for Statistically Determining Coordinate Phrase Attachment"
... backed-off model in [MG, in prep] trained on only 1380 training phrases. The training corpus used in the study presented here consisted of 119629 training phrases. Reducing this figure ... It is interesting to note that after reducing the volume of training data by half there was no drop in accuracy. In fact, accuracy remained exactly the same as the volume of data was i...
Uploaded: 23/03/2014, 19:20
Scientific report: "AN EXPERT SYSTEM FOR THE PRODUCTION OF PHONEME STRINGS FROM UNMARKED ENGLISH TEXT USING MACHINE-INDUCED RULES"
... Training Mode When UTTER is operating in training mode, the system allows the user to correct errors in transcription interactively by specifying the proper pronunciation for the incorrectly ... combinations of feature values should reduce the number of iterations required in the inference routine by eliminating redundant entries in the training set. This type of training...
Uploaded: 24/03/2014, 05:21
Scientific report: "An Online System for Corpus Management and Analysis in Support of Computing in the Humanities"
... as an initial set of applications which are offered by the system. 1 Introduction Since there is an ongoing shift towards computer based studies in the humanities new challenges in maintaining ... like downloading a document for example. The Master Data include information about all objects managed by the system, for example users, groups, documents, resources and their interre...
Uploaded: 31/03/2014, 20:20
Scientific report: "AN INTERNATIONAL DELPHI POLL ON FUTURE TRENDS IN "INFORMATION LINGUISTICS""
... Automatic Indexing by Syntactic Analysis in3 Improvement of Automatic Indexing by Semantic Approaches in4 Probabilistic Methods of Indexing in5 Indexing Functions in6 Automatic Indexing of ... participants are mainly involved in research (defined as: basic groundwork, mainly of theoretical interest, experimental environment) or in application/development (defined as: mainly of...
Uploaded: 17/03/2014, 19:21
Scientific report: "An Estimate of Referent of Noun Phrases in Japanese Sentences"
... ITTA. Indefinite noun phrase An indefinite noun phrase denotes an arbitrary member of the class of the noun phrase. For example, "INU(dog)" in the following sentence is an indefinite ... of noun phrases in determining the referents of noun phrases. As a result, on training sentences we obtained a precision rate of 82% and a recall rate of 85% in the determination o...
Uploaded: 31/03/2014, 04:20
Scientific report: "An Unsupervised Morpheme-Based HMM for Hebrew Morphological Disambiguation"
... 2.4 (in contrast to 1.4 for English). In Hebrew, several morphemes combine into a single word in both agglutinative and fusional ways. This results in a potentially high number of tags for each ... the performances of the baseline tagger used by Habash and Rambow – which selects the most frequent tag for a given word in the training corpus – for Hebrew and Arabic, shows s...
Uploaded: 23/03/2014, 18:20
Scientific report: "an Unsupervised Web Relation Extraction System"
... patterns. 1 Introduction The most common preprocessing technique for text mining is information extraction (IE). It is defined as the task of extracting knowledge out of textual documents. In general, ... massive human effort and hence prevent information extraction from becoming more widely applicable. In order to minimize the huge manual effort involved with building in...
Uploaded: 23/03/2014, 18:20