Scientific paper: "Scaling Distributional Similarity to Large Corpora" doc
... synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the naïve nearest-neighbour approach to comparing context vectors extracted ... to calculate the similarity between context vectors. Curran (2004) decomposes this into measure and weight functions. The measure calculates the similarity between two we...
Upload date: 31/03/2014, 01:20
... use the evidence of distributional similarity to achieve better spelling correction accuracy. We present two methods that are able to take advantage of distributional similarity information. ... aventura. To solve this problem, we consider alternative methods to make use of the information beyond a term’s character strings. Distributional similarity provides...
Upload date: 17/03/2014, 04:20
... vector of the expanded word is analogous to the set of all relevant documents, while tested features correspond to retrieved documents. Included features thus correspond to relevant retrieved documents, ... vectors, as those usually yield low symmetric similarity with the longer vectors of more common templates. 3. A Statistical Inclusion Measure. Our research goal was to develop a...
Upload date: 23/03/2014, 17:20
Scientific paper: "From Prosodic Trees to Syntactic Trees" doc
... we use to mark constituent structures. At that time, those cantillation marks were intended to record the correct way of reading or chanting the Hebrew text: how to group words into phrases ... correspond to each other in some systematic ways. Just as there are ways to transform syntactic structures to prosodic structures (e.g. Abney 1992), prosodic structures can also pro...
Upload date: 08/03/2014, 02:21
Scientific paper: "Two Easy Improvements to Lexical Weighting" doc
... approaches to morphology and provenance in machine translation are possible. We have chosen to implement our approach as extensions to lexical weighting (Koehn et al., 2003), which is nearly ubiquitous, ... mapping from s-features to sentence weights was chosen to optimize expected TER on held-out data. A drawback of this method is that we must now learn the mapping from s-featu...
Upload date: 23/03/2014, 16:20
Scientific paper: "Fully Abstractive Approach to Guided Summarization" docx
... task at TAC is to motivate a move towards abstractive approaches. It is an oriented multidocument summarization task in which a category is attributed to a cluster of 10 source documents to be summarized ... appropriate to address it. The idea to use an IE system for summarization can be traced back to the FRUMP system (DeJong, 1982), which generates brief summaries about vari...
Upload date: 30/03/2014, 17:20
Scientific paper document: "Finding Parts in Very Large Corpora" pdf
... tempered to take into account the quantity of data that supports its conclusion. To put this another way, we want to pick (w, p) pairs that have two properties: p(w | p) is high and |w, p| is large. ... ground patient floor unit room entrance doctor administrator corridor staff department bed pharmacist director superintendent storage chief lawn compound head...
Upload date: 20/02/2014, 19:20
Scientific paper: "Constructing Transliteration Lexicons from Web Corpora" docx
... phonetically and statistically to a syllable in the target language. Two conversions using phoneme-to-phoneme and text-to-phoneme syllabification algorithms are automatically deduced from a ... of paired terms and are used to calculate the degree of similarity between phonemes for transliterated-term extraction. In a large-scale experiment using this automated learning proce...
Upload date: 17/03/2014, 06:20
Scientific paper: "Learning Tense Translation from Bilingual Corpora" docx
... möchte mich beschweren. (Gloss: I'd like to myself weigh down. Translation: I'd like to make a complaint.) For translation, the discontinuous words must be amalgamated into single semantic items. Single words ... Transducers. Two partial parsers (rather: transducers) are used to detect English and German CVPs and to translate them into predicate-argument structures (verb chains). The par...
Upload date: 31/03/2014, 04:20
Scientific paper: "Scaling up from Dialogue to Multilogue: some principles and benchmarks" doc
... 2LS UK {ginzburg,raquel}@dcs.kcl.ac.uk. Abstract: The paper considers how to scale up dialogue protocols to multilogue, settings with multiple conversationalists. We extract two benchmarks to evaluate scaled-up protocols based on the long distance ... point. Applying DR to the querying protocol yields the following protocol: (9) Querying with multiple responders 1. LatestMove = Ask(A,...
Upload date: 08/03/2014, 04:22