Document: Scientific report: "Using linguistic principles to recover empty categories" ppt
... The term recovery refers to the complete package: detection, resolution, and assignment of function tags to empty categories. 2 Empty nodes in the Penn Treebank The major types of empty category ... iterate over nodes from top down; for each node X: try to insert NP* in X; try to insert 0 in X; try to insert WHNP 0 or WHADVP 0 in X; try to insert *U* in X; t...
Uploaded: 20/02/2014, 16:20
... some types of linguistic structure. Because of this, less linguistic structure needs to be “built in” to an adaptor grammar compared to a comparable PCFG. For example, the adaptor grammars ... tree under the adaptor grammar and the PCFG approximation. 3 Word segmentation with adaptor grammars We now turn to linguistic applications of adaptor grammars, specifically, to mod...
Uploaded: 20/02/2014, 09:20
... automatic method to create a thesaurus that is sensitive to the sentiment of words expressed in different domains. • We describe a method to use the created thesaurus to expand feature vectors ... vector d ∈ R^N, where the value of the j-th element d_j is set to the total number of occurrences of the unigram or bigram w_j in the review d. To find the suitable candidates to exp...
Uploaded: 20/02/2014, 04:20
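The count vector described in the excerpt above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the whitespace tokenization, the function name `count_vector`, and the toy vocabulary are all assumptions introduced here.

```python
from collections import Counter


def count_vector(review, vocabulary):
    """Return d, where d[j] counts occurrences of vocabulary[j] in the review.

    vocabulary is a list of unigrams and bigrams (the w_j of the excerpt);
    bigrams are written as two tokens joined by a single space.
    """
    tokens = review.lower().split()  # assumption: naive whitespace tokenization
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    counts = Counter(tokens + bigrams)
    return [counts[w] for w in vocabulary]


# Hypothetical feature list w_1..w_N, for illustration only.
vocabulary = ["good", "not", "not good"]
d = count_vector("not good , really not good", vocabulary)
```

A plain list keeps the sketch dependency-free; a sparse representation would be the more likely choice for a realistic vocabulary size.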
Document: Scientific report: "Using Confidence Bands for Parallel Texts Alignment" pptx
... second class from 2769 to 5538 and so forth. With this histogram, we are able to identify those words which are too far apart from their expected positions. In Figure 2, the gap in the histogram makes ... Figure 2: Histogram of the distances between expected and real word positions. In order to build this histogram, we use the Sturges rule (see ‘Histograms...
Uploaded: 20/02/2014, 18:20
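The excerpt above mentions building the histogram with the Sturges rule. A minimal sketch of that rule, in its common form k = ⌈log2 n⌉ + 1, is given below. The function names and the choice of equal-width classes spanning the minimum to the maximum distance are assumptions made here; the paper's exact binning may differ.

```python
import math


def sturges_bins(n):
    """Number of histogram classes for n observations (Sturges rule)."""
    if n < 1:
        raise ValueError("need at least one observation")
    return math.ceil(math.log2(n)) + 1


def histogram(distances):
    """Count observations in equal-width classes chosen by the Sturges rule."""
    k = sturges_bins(len(distances))
    lo, hi = min(distances), max(distances)
    width = (hi - lo) / k or 1.0  # avoid a zero width when all values are equal
    counts = [0] * k
    for x in distances:
        # The maximum value falls into the last class rather than a new one.
        index = min(int((x - lo) / width), k - 1)
        counts[index] += 1
    return counts
```

Classes with no observations show up as zeros in the returned list, which is the kind of gap the excerpt refers to when it separates points that lie too far from their expected positions.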
Document: Scientific report: "USING BRACKETED PARSES TO EVALUATE A GRAMMAR CHECKING APPLICATION" ppt
... a more complex approach to evaluating the performance of the system's ability to detect errors. Here, we need to look at both the 1. We use the term critique to represent an instance ... In order to coerce our system into accepting only the desired parse tree, we modified it to accept only parses that satisfied bracketed forms. 6. The BSEC has the capability to...
Uploaded: 20/02/2014, 21:20
Document: Scientific report: "Obfuscating Document Stylometry to Preserve Author Anonymity" pptx
... that there are two ways to apply this information. The first is to simply correct the term to conform to the norms as defined by the authors in K. The second approach is to incorporate characteristic ... discriminators, but it also means that we do not have a threshold value to drive the feature adjustment process. Token DTR Frequency Token DTR Frequency the...
Uploaded: 20/02/2014, 12:20
Document: Scientific report: "Using Smaller Constituents Rather Than Sentences in Active Learning for Japanese Dependency Parsing" docx
... Corporation Midtown Tower, 9-7-1 Akasaka, Minato-ku, Tokyo 107-6211, Japan msassano@yahoo-corp.jp Sadao Kurohashi Graduate School of Informatics, Kyoto University Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, ... Simon Tong and Daphne Koller. 2000. Support vector machine active learning with applications to text classification. In Proc. of ICML-2000, pages 999–1006. Kiyotaka Uchimoto, Satos...
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Using Cross-Entity Inference to Improve Event Extraction" docx
... (CLUTO toolkit) 3 is used to divide it into different cohesive subtypes, each of which only contains the entities of the same background. For instance, the Air entities will be divided into ... entity. After establishing the vector space model (VSM) for each entity mention of the type, we adopt a clustering toolkit (CLUTO) to further divide the mentions into different subtypes. F...
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Using Structural Information for Identifying Similar Chinese Characters" pdf
... for computer-assisted language learning and for psycholinguistic studies. Although it is possible for us to employ image-based methods to identify visually similar characters, the resulting ... † We use Arabic digits to denote the four tones in Mandarin. The sentence “經理要我構買一部計算機” also contains an error, and we need to replace “構買” with “購買”. “構買” is cons...
Uploaded: 20/02/2014, 09:20
Document: Scientific report: "Using Automatically Transcribed Dialogs to Learn User Models in a Spoken Dialog System" doc
... Computational Linguistics Using Automatically Transcribed Dialogs to Learn User Models in a Spoken Dialog System Umar Syed Department of Computer Science Princeton University Princeton, NJ 08540, ... A_t and Ã_t are all assumed to belong to finite sets, and so all the conditional distributions in our model are multinomials. Hence θ is a vector that parameterizes the user model accord...
Uploaded: 20/02/2014, 09:20