semisupervised simhash for efficient document similarity search

Scientific report: "Semi-Supervised SimHash for Efficient Document Similarity Search" pptx

Scientific report

... Association for Computational Linguistics, pages 93–101, Portland, Oregon, June 19-24, 2011. © 2011 Association for Computational Linguistics. Semi-Supervised SimHash for Efficient Document Similarity Search. Qixia ... are similar to a query document is an important component in modern information retrieval. Some existing hashing methods can be used for efficient document similarity search. However, unsupervised ... best performance. 1 Introduction. Document Similarity Search (DSS) is to find similar documents to a query doc in a text corpus or on the web. It is an important component in modern information...
Scientific report document: "SPEECH OGLE: Indexing Uncertainty for Spoken Document Search" pptx

Scientific report

... employed for performing this computation. The computation for the backward probability βn stays unchanged (Rabiner, 1989) whereas during the forward pass one needs to split the forward ... section, position information is crucial for being able to evaluate proximity information when assigning a relevance score to a given document. In the spoken document case however, we are faced ... TREC evaluations. The PSPL lattices for each segment in the spoken document collection were indexed. In terms of relative size on disk, the uncompressed speech for the first 20 lectures uses 2.5GB,...
Scientific report document: "A Hybrid Hierarchical Model for Multi-Document Summarization" ppt

Scientific report

... pzsn=p(zsn|com) via Eq. (3). The similarity between the distributions is then measured with transformed IR ... Figure 3: Flow diagram for Hybrid Learning Algorithm for Multi-Document Summarization. 7 Conclusion. In this paper, we presented a hybrid model for multi-document summarization. We demonstrated that ... Report MSR-TR-2005-101, Microsoft Research, Redwood, Washington, 2005. D.R. Radev, H. Jing, M. Stys, and D. Tam. Centroid-based summarization for multiple documents. In Int. Jrnl. Information Processing...
USE OF MINIMAL LEXICAL CONCEPTUAL STRUCTURES FOR SINGLE-DOCUMENT SUMMARIZATION doc

Telecommunications Engineering

... license for English and Spanish parsers. 2. Update the host/port in the files fdges-client.pl (for Spanish) and fdgen-client.pl (for English). The current values should look like this for fdges-client.pl: $remote ... capabilities useful for intelligence analysts, such as cross-lingual summarization and data mining. 6.3 CONTRIBUTIONS TO RESOURCES FOR RESEARCH. This work provides an integral part for many NLP applications ... Generation of Informative Cross-Lingual Headlines for Text and Speech. Thesis Proposal, University of Maryland, 2003. 5.3 OTHER PRODUCTS. 1. Trimmer: Trimmer generates a headline for a news story...
Leveraging User Comments for Aesthetic Aware Image Search Reranking pot

Photography - Videography

... Retrieval]: Information Search and Retrieval; H.5.1 [Information Interfaces and Presentation]: Multimedia Information Systems. Keywords: opinion mining, visual aesthetics modeling, image search reranking, ... aesthetic scores for ranking aesthetic-aware reranking. Intuitively, relevance and aesthetic quality are orthogonal dimensions and therefore convey complementary information about documents being ... Oliver, Telefonica Research, Barcelona, Spain, nuriao@tid.es. ABSTRACT: The increasing number of images available online has created a growing need for efficient ways to search for relevant content....

Scientific report

... context and the document as a whole. For each candidate code, three types of features are generated: document features, ConText features, and code-semantics features (Table 1). Document: Document features ... Cherry, Institute for Information Technology, National Research Council Canada, {Svetlana.Kiritchenko,Colin.Cherry}@nrc-cnrc.gc.ca. Abstract: The automatic coding of clinical documents is an important task for ... within a document may interact. It is an interesting combination of sentence and document-level processing. Formally, we define the document coding task as follows: given a set of documents...
Scientific report: "A Probabilistic Model for Fine-Grained Expert Search" pptx

Scientific report

... CDD-based Formal Model for Expert Finding. In Proc. of CIKM 2007. Hertzum, M. and Pejtersen, A. M., 2000. The information-seeking practices of engineers: searching for documents as well as for ... of topics transformed from an original query will be obtained and then be used in the search for experts. Table 3 shows five forms of topic discovering from a given query. Forms Description ... 2005. Research on expert search at enterprise track of TREC 2005. In: Proc. of TREC 2005. Craswell, N., Hawking, D., Vercoustre, A. M. and Wilkins, P., 2001. P@NOPTIC Expert: searching for experts...
Scientific report: "A Bag of Useful Techniques for Efficient and Robust Parsing" ppt

Scientific report

... paper is available as DFKI Research Report RR-94-37. Hans-Ulrich Krieger and Ulrich Schäfer. 1995. Efficient parameterizable type expansion for typed feature formalisms. In Proceedings of ... Ivan A. Sag. 1987. Information-Based Syntax and Semantics. Vol. I: Fundamentals. CSLI Lecture Notes, Number 13. Center for the Study of Language and Information, Stanford. Stuart M. Shieber. ... German Research Center for Artificial Intelligence (DFKI), Saarbrücken, Germany. Also in Proc. MT Summit IV, 127-135, Kobe, Japan, July 1993. 480 A Bag of Useful Techniques for Efficient...
Scientific report: "Towards an Iterative Reinforcement Approach for Simultaneous Document Summarization and Keyword Extraction" doc

Scientific report

... retrieval, document clustering, etc. For example, keywords of a document can be used for document indexing and thus benefit to improve the performance of document retrieval, and document summary ... basic forms based on WordNet before comparison. The precision p, recall r, F-measure (F=2pr/(p+r)) were obtained for each document and then the values were averaged over all documents for ... topics of the document without any additional clues and prior knowledge. In this paper, we focus on generic document summarization and keyword extraction for single documents. Document summarization...
Scientific report: "Packing of Feature Structures for Efficient Unification of Disjunctive Feature Structures" pptx

Scientific report

... factor of 6.4 to 8.4. For realizing efficient NLP systems, I am currently building an efficient parser by integrating the packing method with the compilation method for HPSG (Torisawa and ... this system. For performance evaluation I measure the execution time for a part of application of grammar rules (i.e. schemata) of XHPSG. Table 1 shows the execution time for unifying ... Execution time for unification. Test data shows the word used for the experiment. # of LEs shows the number of lexical entries assigned to the word. Naive shows the time for unification...
Scientific report: "Topic Analysis for Psychiatric Document Retrieval" potx

Scientific report

... consideration, all personal information has been removed. A total of 3,650 consultation documents were collected for evaluating the retrieval model, of which 20 documents were randomly selected ... 2000. IR Evaluation Methods for Retrieving Highly Relevant Documents. In Proc. of the 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages ... similarities at the two lengths. To calculate the similarity at 1027 for relevance estimation. These functions consider word frequencies and document lengths for word weighting. Both the VSM and Okapi...
Scientific report: "TOWARDS AN INTEGRATED ENVIRONMENT FOR SPANISH DOCUMENT VERIFICATION AND COMPOSITION" pptx

Scientific report

... marked. 3. There are additional marks for hyphenation points (for later use by a formatter performing automatic syllable partition), and several other for foreign and Latin words, geographical ... environment for document verification and composition. INTRODUCTION In the field of document processing many tools exist today which allow the user to introduce a text in storage, format it, ... bound to document composition: several other objectives are also foreseen for the dictionaries and the parser, a computer-assisted verb conjugation system has already been built for Spanish...
Scientific report: "A Bottom-up Approach to Sentence Ordering for Multi-document Summarization" ppt

Scientific report

... 2006.c2006 Association for Computational LinguisticsA Bottom-up Approach to Sentence Ordering for Multi -document SummarizationDanushka Bollegala Naoaki Okazaki∗Graduate School of Information Science ... IshizukaAbstractOrdering information is a difficult butimportant task for applications generat-ing natural-language text. We presenta bottom-up approach to arranging sen-tences extracted for multi -document ... important for such MDS systemsto determine a coherent arrangement of the tex-tual segments extracted from multi-documents inorder to reconstruct the text structure for summa-rization. Ordering information...
Scientific report: "Weakly Supervised Learning for Cross-document Person Name Disambiguation Supported by Information Extraction" potx

Scientific report

... paper focuses on cross-document disambiguation of person names. Previous research for cross-document name disambiguation applies vector space model (VSM) for context similarity, only using ... the context similarity for a context pair is a vector of similarity features, e.g. {VSM_Similairty_equal_to_2, NE_Similarity_equal_to_1, Relationship_Conflicts_only, No_Sharing_for_Age, ... Conflict_for_Affiliation}. Besides the four categories of basic context similarity features defined above, we define induced context similarity features by combining basic context similarity...
Short for Portable Document Format, a file format developed by Adobe Systems

Standards - Regulations

... rotation, and conforms to the formula in the previous table. Transformations can be defined in terms of a transform matrix. Such a matrix is stored in a transform variable. For example: transform t ; ... familiar 'for loop' of all programming languages. for i=0 step 2 until 20 : draw (0,i) ; endfor ; As explained convincingly in Niklaus Wirth's book on algorithms and data structures, the for ... for i=0 upto n-1 : p[i] endfor p[n] ; After seeing if in action, the following for loop will be no surprise: draw origin for i=0 step 10 until 100 : {down}(i,0) endfor ; This gives the zig zag...
