Semi-Supervised SimHash for Efficient Document Similarity Search

Scientific report: "Semi-Supervised SimHash for Efficient Document Similarity Search"

... Association for Computational Linguistics, pages 93–101, Portland, Oregon, June 19–24, 2011. © 2011 Association for Computational Linguistics. Semi-Supervised SimHash for Efficient Document Similarity Search. Qixia ... are similar to a query document is an important component in modern information retrieval. Some existing hashing methods can be used for efficient document similarity search. However, unsupervised ... best performance. 1 Introduction: Document Similarity Search (DSS) is to find similar documents to a query doc in a text corpus or on the web. It is an important component in modern information...
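The snippet contrasts the proposed method with existing unsupervised hashing methods for document similarity search. For reference only, here is a minimal sketch of the classic unsupervised SimHash fingerprint with term-frequency weights. It is not the semi-supervised method the paper proposes. The function names, the MD5-based feature hash, the whitespace tokenization and the 64-bit fingerprint length are all illustrative assumptions.

```python
import hashlib
from collections import Counter


def _feature_hash(token: str, bits: int) -> int:
    """Map a token to a `bits`-bit integer using the top bits of its MD5 digest (assumed hash)."""
    if not 1 <= bits <= 128:
        raise ValueError("bits must be between 1 and 128")
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (128 - bits)


def simhash(text: str, bits: int = 64) -> int:
    """Unsupervised SimHash fingerprint of `text`, weighting each term by its frequency."""
    weights = Counter(text.lower().split())  # naive whitespace tokenization (assumption)
    acc = [0] * bits
    for token, weight in weights.items():
        h = _feature_hash(token, bits)
        for i in range(bits):
            # Each term votes +weight where its hash bit is 1 and -weight where it is 0.
            acc[i] += weight if (h >> i) & 1 else -weight
    # Bit i of the fingerprint is 1 iff the weighted vote for that bit is positive.
    return sum(1 << i for i in range(bits) if acc[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


if __name__ == "__main__":
    q = simhash("semi supervised simhash for document similarity search")
    d = simhash("semi supervised hashing for document similarity search")
    print(hamming(q, d))
```

Documents whose fingerprints differ in only a few bits are candidate near-duplicates, so the Hamming distance acts as a cheap proxy for the angle between their weighted term vectors. Because the hash bits here are fixed rather than learned, this is exactly the kind of unsupervised baseline the snippet refers to.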

Uploaded: 30/03/2014, 21:20

Scientific report: "SPEECH OGLE: Indexing Uncertainty for Spoken Document Search"

... employed for performing this computation. The computation for the backward probability β_n stays unchanged (Rabiner, 1989) whereas during the forward pass one needs to split the forward ... section, position information is crucial for being able to evaluate proximity information when assigning a relevance score to a given document. In the spoken document case however, we are faced ... TREC evaluations. The PSPL lattices for each segment in the spoken document collection were indexed. In terms of relative size on disk, the uncompressed speech for the first 20 lectures uses 2.5GB,...

Uploaded: 20/02/2014, 15:20

Scientific report: "A Hybrid Hierarchical Model for Multi-Document Summarization"

... p_{z_{s_n}} = p(z_{s_n} | c_{o_m}) via Eq. (3). The similarity between the distributions is then measured with transformed IR ... [Figure 3: Flow diagram for Hybrid Learning Algorithm for Multi-Document Summarization.] 7 Conclusion: In this paper, we presented a hybrid model for multi-document summarization. We demonstrated that ... Report MSR-TR-2005-101, Microsoft Research, Redmond, Washington, 2005. D.R. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization for multiple documents. In Int. Jrnl. Information Processing...

Uploaded: 20/02/2014, 04:20

USE OF MINIMAL LEXICAL CONCEPTUAL STRUCTURES FOR SINGLE-DOCUMENT SUMMARIZATION

... license for English and Spanish parsers. 2. Update the host/port in the files fdges-client.pl (for Spanish) and fdgen-client.pl (for English). The current values should look like this for fdges-client.pl: $remote ... capabilities useful for intelligence analysts, such as cross-lingual summarization and data mining. 6.3 CONTRIBUTIONS TO RESOURCES FOR RESEARCH This work provides an integral part for many NLP applications ... Generation of Informative Cross-Lingual Headlines for Text and Speech. Thesis Proposal, University of Maryland, 2003. 5.3 OTHER PRODUCTS 1. Trimmer: Trimmer generates a headline for a news story...

Uploaded: 07/03/2014, 11:20

Leveraging User Comments for Aesthetic Aware Image Search Reranking

... Retrieval]: Information Search and Retrieval; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems. Keywords: opinion mining, visual aesthetics modeling, image search reranking, ... aesthetic scores for ranking aesthetic-aware reranking. Intuitively, relevance and aesthetic quality are orthogonal dimensions and therefore convey complementary information about documents being ... Oliver, Telefonica Research, Barcelona, Spain, nuriao@tid.es. ABSTRACT: The increasing number of images available online has created a growing need for efficient ways to search for relevant content....

Uploaded: 07/03/2014, 17:20

Scientific report: "Lexically-Triggered Hidden Markov Models for Clinical Document Coding"

... context and the document as a whole. For each candidate code, three types of features are generated: document features, ConText features, and code-semantics features (Table 1). Document: Document features ... Cherry, Institute for Information Technology, National Research Council Canada, {Svetlana.Kiritchenko,Colin.Cherry}@nrc-cnrc.gc.ca. Abstract: The automatic coding of clinical documents is an important task for ... within a document may interact. It is an interesting combination of sentence- and document-level processing. Formally, we define the document coding task as follows: given a set of documents...

Uploaded: 07/03/2014, 22:20

Scientific report: "A Probabilistic Model for Fine-Grained Expert Search"

... CDD-based Formal Model for Expert Finding. In Proc. of CIKM 2007. Hertzum, M. and Pejtersen, A. M., 2000. The information-seeking practices of engineers: searching for documents as well as for ... of topics transformed from an original query will be obtained and then be used in the search for experts. Table 3 shows five forms of topic discovering from a given query. Forms Description ... 2005. Research on expert search at enterprise track of TREC 2005. In: Proc. of TREC 2005. Craswell, N., Hawking, D., Vercoustre, A. M. and Wilkins, P., 2001. P@NOPTIC Expert: searching for experts...

Uploaded: 08/03/2014, 01:20

Scientific report: "A Bag of Useful Techniques for Efficient and Robust Parsing"

... paper is available as DFKI Research Report RR-94-37. Hans-Ulrich Krieger and Ulrich Schäfer. 1995. Efficient parameterizable type expansion for typed feature formalisms. In Proceedings of ... Ivan A. Sag. 1987. Information-Based Syntax and Semantics. Vol. I: Fundamentals. CSLI Lecture Notes, Number 13. Center for the Study of Language and Information, Stanford. Stuart M. Shieber. ... German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany. Also in Proc. MT Summit IV, 127-135, Kobe, Japan, July 1993. ...

Uploaded: 08/03/2014, 06:20

Scientific report: "Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction"

... retrieval, document clustering, etc. For example, keywords of a document can be used for document indexing and thus benefit to improve the performance of document retrieval, and document summary ... basic forms based on WordNet before comparison. The precision p, recall r, F-measure (F=2pr/(p+r)) were obtained for each document and then the values were averaged over all documents for ... topics of the document without any additional clues and prior knowledge. In this paper, we focus on generic document summarization and keyword extraction for single documents. Document summarization...
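The snippet computes precision p, recall r and F = 2pr/(p+r) for each document and then averages the values over all documents (macro-averaging). A minimal sketch of that evaluation, assuming extracted and gold keywords are compared as sets of already-normalized strings (the WordNet base-form step is not reproduced); the function names and toy data are hypothetical.

```python
from typing import Iterable, Set, Tuple

Scores = Tuple[float, float, float]


def prf(predicted: Set[str], gold: Set[str]) -> Scores:
    """Precision, recall and F = 2pr/(p+r) for one document's keyword sets."""
    tp = len(predicted & gold)  # keywords present in both sets
    p = tp / len(predicted) if predicted else 0.0
    r = tp / len(gold) if gold else 0.0
    f = 2 * p * r / (p + r) if p + r > 0 else 0.0
    return p, r, f


def macro_average(pairs: Iterable[Tuple[Set[str], Set[str]]]) -> Scores:
    """Score each (predicted, gold) document separately, then average over documents."""
    scores = [prf(pred, gold) for pred, gold in pairs]
    if not scores:
        raise ValueError("need at least one document")
    n = len(scores)
    return (
        sum(s[0] for s in scores) / n,
        sum(s[1] for s in scores) / n,
        sum(s[2] for s in scores) / n,
    )


if __name__ == "__main__":
    docs = [
        ({"hashing", "similarity", "search", "corpus"}, {"hashing", "similarity", "retrieval"}),
        ({"summarization"}, {"summarization"}),
    ]
    print(macro_average(docs))
```

For the two toy documents the per-document scores are (0.5, 2/3, 4/7) and (1, 1, 1), so the macro averages are 0.75, 5/6 and 11/14. Averaging F per document, as the snippet describes, generally gives a different number than plugging the averaged p and r into the F formula.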

Uploaded: 17/03/2014, 04:20

Scientific report: "Packing of Feature Structures for Efficient Unification of Disjunctive Feature Structures"

... factor of 6.4 to 8.4. For realizing efficient NLP systems, I am currently building an efficient parser by integrating the packing method with the compilation method for HPSG (Torisawa and ... this system. For performance evaluation I measure the execution time for a part of application of grammar rules (i.e. schemata) of XHPSG. Table 1 shows the execution time for unifying ... Execution time for unification. Test data shows the word used for the experiment. # of LEs shows the number of lexical entries assigned to the word. Naive shows the time for unification...

Uploaded: 17/03/2014, 07:20

Scientific report: "Topic Analysis for Psychiatric Document Retrieval"

... consideration, all personal information has been removed. A total of 3,650 consultation documents were collected for evaluating the retrieval model, of which 20 documents were randomly selected ... 2000. IR Evaluation Methods for Retrieving Highly Relevant Documents. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages ... similarities at the two lengths. To calculate the similarity at ... for relevance estimation. These functions consider word frequencies and document lengths for word weighting. Both the VSM and Okapi...

Uploaded: 23/03/2014, 18:20

Scientific report: "TOWARDS AN INTEGRATED ENVIRONMENT FOR SPANISH DOCUMENT VERIFICATION AND COMPOSITION"

... marked. 3. There are additional marks for hyphenation points (for later use by a formatter performing automatic syllable partition), and several other for foreign and Latin words, geographical ... environment for document verification and composition. INTRODUCTION In the field of document processing many tools exist today which allow the user to introduce a text in storage, format it, ... bound to document composition: several other objectives are also foreseen for the dictionaries and the parser, a computer-assisted verb conjugation system has already been built for Spanish...

Uploaded: 24/03/2014, 05:21

Scientific report: "A Bottom-up Approach to Sentence Ordering for Multi-document Summarization"

... 2006. c 2006 Association for Computational Linguistics A Bottom-up Approach to Sentence Ordering for Multi -document Summarization Danushka Bollegala Naoaki Okazaki ∗ Graduate School of Information Science ... Ishizuka Abstract Ordering information is a difﬁcult but important task for applications generat- ing natural-language text. We present a bottom-up approach to arranging sen- tences extracted for multi -document ... important for such MDS systems to determine a coherent arrangement of the tex- tual segments extracted from multi-documents in order to reconstruct the text structure for summarization. Ordering information...

Uploaded: 31/03/2014, 01:20

Scientific report: "Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction"

... paper focuses on cross-document disambiguation of person names. Previous research for cross-document name disambiguation applies vector space model (VSM) for context similarity, only using ... the context similarity for a context pair is a vector of similarity features, e.g. {VSM_Similairty_equal_to_2, NE_Similarity_equal_to_1, Relationship_Conflicts_only, No_Sharing_for_Age, ... Conflict_for_Affiliation}. Besides the four categories of basic context similarity features defined above, we define induced context similarity features by combining basic context similarity...

Uploaded: 31/03/2014, 03:20

Short for Portable Document Format, a file format developed by Adobe Systems

... rotation, and conforms to the formula in the previous table. Transformations can be defined in terms of a transform matrix. Such a matrix is stored in a transform variable. For example: transform t ; ... familiar 'for loop' of all programming languages. for i=0 step 2 until 20 : draw (0,i) ; endfor ; As explained convincingly in Niklaus Wirth's book on algorithms and datastructures, the for ... for i=0 upto n-1 : p[i] endfor p[n] ; After seeing if in action, the following for loop will be no surprise: draw origin for i=0 step 10 until 100 : {down}(i,0) endfor ; This gives the zig zag...

Uploaded: 15/04/2014, 14:26

Piecewise Pseudolikelihood for Efficient CRF Training

Uploaded: 24/04/2014, 13:18

Biology report: "Cyclooxygenase activity is important for efficient replication of mouse hepatitis virus at an early stage of infection"

Uploaded: 18/06/2014, 18:20

Biology report: "Use of recombinant lentivirus pseudotyped with vesicular stomatitis virus glycoprotein G for efficient generation of human anti-cancer chimeric T cells by transduction of human peripheral blood lymphocytes in vitro"

Uploaded: 19/06/2014, 08:20

Chemistry report: "Cyclooxygenase activity is important for efficient replication of mouse hepatitis virus at an early stage of infection"

Uploaded: 20/06/2014, 01:20

Chemistry report: "Use of recombinant lentivirus pseudotyped with vesicular stomatitis virus glycoprotein G for efficient generation of human anti-cancer chimeric T cells by transduction of human peripheral blood lymphocytes in vitro"

Uploaded: 20/06/2014, 04:20
