... counts. The potential of the bootstrapping approach can best be appreciated by imagining millions of documents with coreference annotations. With such a set, we could extract fine-grained features,
Uploaded: 20/02/2014, 11:21
... obfuscated document. This is important since there is no clear consensus as to which features should be used for authorship attribution. 2 Document Obfuscation Our approach to document obfuscation ... deletions to be equally weighted. However, while deletion sites in the document are easy to identify,
Document  Changes  Doc Size  Changes/1000
49        42       3849      10.9
50        46       2364      19.5...
Uploaded: 20/02/2014, 12:20
Document: Scientific report: "Ensemble Document Clustering Using Weighted Hypergraph Generated by NMF" docx
... feedback in IR, where retrieved documents are clustered, is actively researched (Hearst and Pedersen, 1996) (Kummamuru et al., 2004). In document clustering, the document is represented as a ... hypergraph in the integration phase. Document clustering is the task of dividing a document’s data set into groups based on document similarity. This is the basic intelligent procedure, and is...
Uploaded: 20/02/2014, 12:20
Document: Scientific report: "HITS-based Seed Selection and Stop List Construction for Bootstrapping" doc
... instance and the jth pattern. Let A^T denote the matrix transpose of A. Algorithm 1 shows the pseudocode of Espresso. The input vector i_0 (called seed vector) is an n-dimensional binary vector ... respectively representing the final scores of instances and patterns. Note that for brevity, the pseudocode assumes fixed numbers (k and m) of components in i and p are carried over to the subsequ...
Uploaded: 20/02/2014, 04:20
Document: Scientific report: "Semi-Supervised Learning of Partial Cognates using Bilingual Bootstrapping" doc
... can use the knowledge of the sense of certain words in a query in order to retrieve desired documents in the target language. Our task, disambiguating partial cognates, is in a way equivalent
Uploaded: 20/02/2014, 12:20
Document: Scientific report: "Learning with Unlabeled Data for Text Categorization Using Bootstrapping and Feature Projection Techniques" doc
... words within a document as a context. To extract contexts from a document, we use sliding window techniques (Maarek et al., 1991). The window is a slide from the first word of the document to ... occurrence (binary value: 0 or 1) of words T and W in i-th document respectively, and n is the total number of documents in the collected documents. This method calculates the similarity scor...
Uploaded: 20/02/2014, 16:20
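The sliding-window context extraction mentioned in the excerpt above can be illustrated with a minimal Python sketch. The excerpt does not give the window size, the step, or how short documents are handled, so `window_size=3`, a step of one word, and the function name `sliding_window_contexts` are assumptions made here for illustration only, not the authors' implementation.

```python
from typing import List


def sliding_window_contexts(words: List[str], window_size: int = 3) -> List[List[str]]:
    """Return each run of `window_size` consecutive words as one context.

    The window moves one word at a time from the first word of the
    document to the last (step size and window size are assumptions).
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not words:
        return []
    if len(words) <= window_size:
        # Assumption: a document shorter than the window forms a single context.
        return [list(words)]
    return [words[i:i + window_size] for i in range(len(words) - window_size + 1)]


doc = "the window slides over the document".split()
contexts = sliding_window_contexts(doc, window_size=3)
for context in contexts:
    print(context)
```

Each context is a list of consecutive words; any co-occurrence statistics would then be computed over these contexts rather than over whole documents.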
Document: Scientific report: "Word Translation Disambiguation Using Bilingual Bootstrapping" doc
Uploaded: 20/02/2014, 21:20
Document: Scientific report: Molecular basis of glyphosate resistance – different approaches through protein engineering doc
... mutations of the target enzyme (target-site mechanism of resistance) [60], but there is, as yet, no documented case of a plant species having native or evolved tolerance to glyphosate by virtue of ... that is 125-fold higher than for the physiological substrate glycine; Table 2). An in silico docking analysis of glyphosate binding at the GO active site showed that glyphosate is bound in ......
Uploaded: 14/02/2014, 14:20
Document: Scientific report: An alternative isomerohydrolase in the retinal Muller cells of a cone-dominant species doc
... Gennadiy Moiseyev², Ying Chen², Olga Nikolaeva² and Jian-Xing Ma². ¹Department of Medicine Endocrinology, Harold Hamm Oklahoma Diabetes Center, University of Oklahoma Health Sciences Center, Oklahoma
Uploaded: 14/02/2014, 14:20