... 2010. © 2010 Association for Computational Linguistics Profiting from Mark-Up: Hyper-Text Annotations for Guided Parsing Valentin I. Spitkovsky Computer Science Department Stanford University and Google ... found good solutions from bracketed corpora but not from raw text, supporting the view that purely unsupervised, self-organizing inference methods can miss the trees...
Uploaded: 30/03/2014, 21:20
... p1 and p2 are extracted. For each method, we randomly sample n samples from all of the paraphrase pairs (p1, p2) for which both s1 and s2 are retrieved. Then, from each (p1, p2) and ... recognizing definition sentences for the same concept is quite an easy task at least for Japanese, as we will show, and we were able to find a huge amount of definition sentence pairs fro...
Uploaded: 30/03/2014, 21:20
Document: Scientific report: "Event-based Hyperspace Analogue to Language for Query Expansion" ppt
... Construction From Events Since events are extracted from documents, they form a reduced text corpus from which HAL can 1 To be specific, the modifiers include negation, as well as adverbs or particles for ... syntactic information. This was represented by an undirected graph, where nodes stood for words, dependency edges stood for syntactical relations, and sequences of depe...
Uploaded: 20/02/2014, 04:20
... semantic score accounts for the semantic preference on a given set of lexical categories and a particular syntactic structure for the sentence. Various formulations for the lexical score and ... literatures. Hence, we will concentrate on the formulation for semantic score. 3. Semantic Tagging Canonical Form of Semantic Representation Given the formulation in Eqn. (1), first we...
Uploaded: 20/02/2014, 21:20
Document: Scientific report: "Analysing Wikipedia and Gold-Standard Corpora for NER Training" ppt
... available for CoNLL and MUC, the BBN corpus was split at our discretion: sections 03–21 for TRAIN, 00–02 for DEV and 22–24 for TEST. Corpus sizes are compared in Table 1. 2.2 Evaluating NER performance One ... coverage. Transforming links into annotations that conform to a gold standard is far from trivial. Link boundaries need to be adjusted, e.g. to remove excess punctuati...
Uploaded: 22/02/2014, 02:20
Document: Scientific report: "Deriving Verbal and Compositional Lexical Aspect for NLP Applications" pptx
... be determined algorithmically both from the verbal lexicon and from composed structures built from verbs and other sentence constituents, using uniform processes and representations. ... built up from linguistically relevant and universally accessible elements of verb meaning. Borrowing from Jackendoff (1990), we assume semantic structure to conform to wellformed...
Uploaded: 22/02/2014, 03:20
Scientific report: "Local Histograms of Character N-grams for Authorship Attribution" ppt
... results for these data sets as well: BOW and LOWBOW histograms obtained comparable performance to each other and the BOLH formulation performed the best. The BOLH formulation outperforms state ... have used information derived from local histograms for displaying a 2D representation of document’s content. More recently, Chasanis et al. (2009) used the LOWBOW framework for segm...
Uploaded: 07/03/2014, 22:20
Scientific report: "Noun-Phrase Analysis in Unrestricted Text for Information Retrieval" pptx
... interpretation is useful for tasks like information retrieval, document classification, and thesaurus extraction, and indeed forms the basis in the CLARIT system for automated thesaurus discovery. ... parameters for statistics. For example, one useful heuristic is that we should use a higher threshold of reliability (evidence) for accepting the pair [adjective, noun] as a l...
Uploaded: 17/03/2014, 09:20
Scientific report: "The Design of a Computer Language for Linguistic Information" ppt
... Computer Language for Linguistic Information Stuart M. Shieber Artificial Intelligence Center SRI International and Center for the Study of Language and Information Stanford University Abstract ... tion of linguistic information to computers. The PATR-II formalism is our current computer language for encoding linguistic information. This paper, a brief overview of that forma...
Uploaded: 24/03/2014, 01:21
Scientific report: Hyaluronan–CD44 interactions as potential targets for cancer therapy pptx
... of CD44v isoforms from less malignant to more advanced stages is beyond the scope of this minireview, we highlight the relevance of CD44v isoforms in cancer which seem to be suitable targets for anti-cancer ... vasoactive mediators for vascular permeability, serum fibrinogen for fibrin clot formation and growth factors/cytokines/matricellular proteins to initiate granulation tis...
Uploaded: 28/03/2014, 23:20