... methods have been widely used in sentiment analysis. In particular, the use of SVMs in (Pang et al., 2002) initially sparked interest in using machine learning methods for sentiment classification. ... approaches. Does a one-time effort in compiling a domain-independent dictionary and using it for different sentiment tasks pay off in comparison to simply using unsupervis...
Uploaded: 30/03/2014, 23:20
... also evaluated performance using the 52 doubly-annotated files present in the RST-DT as test set (see Table 3). In each case, the remaining 340–350 files are used for training. For each corpus evaluation, ... are obtained through automated grid search with n-fold cross-validation (Staelin, 2003) on the training corpus, while a separate test set is used for performance evaluation. I...
Uploaded: 30/03/2014, 23:20
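The grid search with n-fold cross-validation mentioned in the excerpt above can be sketched as follows. This is a minimal illustration using a plain exhaustive grid, which is not necessarily the procedure of Staelin (2003); the names `k_fold_indices`, `grid_search_cv`, `fit`, and `score` are hypothetical and do not come from the paper.

```python
from itertools import product


def k_fold_indices(n_items, k):
    """Split range(n_items) into k contiguous folds of near-equal size."""
    folds = []
    start = 0
    for i in range(k):
        # The first (n_items % k) folds receive one extra item.
        size = n_items // k + (1 if i < n_items % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(data, grid, fit, score, k=5):
    """Return (best_params, best_mean_score) over an exhaustive grid.

    `fit(train, **params)` and `score(model, valid)` are caller-supplied
    callables (assumed for this sketch); higher scores are better.
    """
    folds = k_fold_indices(len(data), k)
    best_params, best_score = None, float("-inf")
    for values in product(*grid.values()):
        params = dict(zip(grid.keys(), values))
        fold_scores = []
        for held_out in folds:
            held = set(held_out)
            train = [x for i, x in enumerate(data) if i not in held]
            valid = [data[i] for i in held_out]
            model = fit(train, **params)
            fold_scores.append(score(model, valid))
        mean_score = sum(fold_scores) / len(fold_scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy usage: choose a decision threshold for a one-dimensional classifier.
toy_data = [(-2, 0), (-1, 0), (1, 1), (2, 1), (-3, 0), (3, 1)]


def toy_fit(train, threshold):
    # Nothing is learned here; the "model" is just the candidate threshold.
    return threshold


def toy_score(model, valid):
    hits = sum(1 for x, y in valid if (1 if x > model else 0) == y)
    return hits / len(valid)


best, best_mean = grid_search_cv(
    toy_data, {"threshold": [-5, 0, 5]}, toy_fit, toy_score, k=3
)
```

As in the excerpt, the folds are drawn only from the training data; a separate test set would be held back for the final performance evaluation.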
Document: Scientific report: "Translation Model Adaptation for Statistical Machine Translation with Monolingual Topic Information" doc
... easily. In this paper, we propose a novel approach for translation model adaptation by utilizing in-domain monolingual topic information instead of the in-domain bilingual corpora, which incorporates ... data, including 33 documents with 666 sentences, is our test set. To obtain various topic distributions for the out-of-domain training corpus and the in-domain monoling...
Uploaded: 19/02/2014, 19:20
Scientific report: "An Extensible Architecture for Integrating Natural Language Processing Techniques with Wikis" docx
... challenges by using wiki pages as input/output documents. For instance, by running the sentiment analysis component right from within the wiki, its output can be written back to the originating wiki ... highlighted in green. analysis by a link to a page providing related information such as evaluation datasets. Wikulu supports users in this tedious task by automatically sug...
Uploaded: 07/03/2014, 22:20
Scientific report: "Bypassed Alignment Graph for Learning Coordination in Japanese Sentences" doc
... listed above are concerned mainly with scope disambiguation, reflecting the fact that detecting the presence of coordinations in a sentence (Task 1) is straightforward in English. Indeed, nearly 100% precision ... only in the first word. Both contain a particle to, which is one of the most frequent coordination markers in Japanese, but only the first sentence contains a coordinate str...
Uploaded: 08/03/2014, 01:20
Scientific report: "Statistical Machine Translation for Query Expansion in Answer Retrieval" pptx
... $p_{LM}(\mathrm{syn}_1^I)^{\lambda_{LM}}$ For estimation of the feature weights $\lambda$ defined in equation (4) we employed minimum error rate (MER) training under the BLEU measure (Och, 2003). Training data for MER training were ... practice imagination concentration information consciousness different meditation relaxation qa-translation (-): birth industrial induced induces paraphrasing (-): way workers induc...
Uploaded: 08/03/2014, 02:21
Scientific report: "Machine-learned contexts for linguistic operations in German sentence realization" doc
... complete, spanning parse: 85.14% of the sentences in the training and parameter tuning set, and 84.59% in the blind test set fall into that category. Most sentences yield more than one training case. ... Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp. 25-32. ... a machine learning approach. The linguistically...
Uploaded: 08/03/2014, 07:20
Scientific report: "A global model for joint lemmatization and part-of-speech prediction" doc
... English indicating present tense third person singular verb and A–FS-N for Bulgarian indicating a feminine singular adjective in indefinite form. In this work we predict only main POS tags for the ... encouraging reusing the same lemma for different words. An additional feature fires for every distinct lemma, in effect counting the number of assigned lemmas. 5.2 Training and i...
Uploaded: 17/03/2014, 01:20
Scientific report: "ConsentCanvas: Automatic Texturing for Improved Readability in End-User License Agreements" pot
... command line interface. It then passes this document to four independent submodules for analysis. Each submodule stores the initial and final character positions of a string selected from within ... passed to our rendering system, which inserts the corresponding HTML5 tags at the positions in original plaintext EULA. We append a header to the output document to include the linked...
Uploaded: 23/03/2014, 16:20
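The span-based rendering step described in the excerpt above (submodules record start and end character positions, and a renderer inserts tags at those positions) can be sketched roughly as below. This is an illustration under stated assumptions, not the ConsentCanvas implementation: the function `insert_tags`, the `(start, end, tag)` tuple layout, the exclusive end offset, and the requirement that spans do not overlap are all hypothetical choices made for the example.

```python
from html import escape


def insert_tags(text, spans):
    """Wrap character ranges of plain text in HTML tags.

    `spans` is a list of (start, end, tag) tuples with `end` exclusive.
    Spans are assumed not to overlap (an assumption of this sketch).
    Text outside and inside the spans is HTML-escaped.
    """
    pieces = []
    cursor = 0
    for start, end, tag in sorted(spans):
        # Copy the untouched text before this span, then the wrapped span.
        pieces.append(escape(text[cursor:start]))
        pieces.append(f"<{tag}>{escape(text[start:end])}</{tag}>")
        cursor = end
    pieces.append(escape(text[cursor:]))
    return "".join(pieces)


# Toy usage with a made-up sentence and span.
rendered = insert_tags("You agree to pay fees.", [(13, 21, "mark")])
```

Processing spans in sorted order while tracking a cursor keeps the recorded offsets valid, since they always refer to the original plaintext rather than to the partially tagged output.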
Scientific report: "Exploiting Feature Hierarchy for Transfer Learning in Named Entity Recognition" ppt
... call tuning), using the previously trained prior for regularization. If we are unable to find a match between features in the training and tuning datasets (for instance, if a word appears in the ... trained classifier to make predictions. In the paradigm of inductive learning, $(X_{\text{train}}, Y_{\text{train}})$ are known, while both $X_{\text{test}}$ and $Y_{\text{test}}$ are completely hidden during training tim...
Uploaded: 23/03/2014, 17:20