... utilizing unsupervised data in addition to supervised data for supervised learning. We use unsupervised data to generate informative ‘condensed feature representations’ from the original feature ... utilize a large amount of unsupervised data to supplement supervised data. Specifically, an approach that involves incorporating ‘clustering-based word representations (CWR)’ induced from unsupervised ... i.e., F-score 90.72 with 344 features for CoNLL-2003 NER data, and UAS 93.55 with 12.5K features for dependency parsing data derived from PTB-III. 1 Introduction In the last decade, supervised...
Uploaded: 07/03/2014, 22:20
Uploaded: 23/03/2014, 20:20
Scientific report: "Automatic Single-Document Key Fact Extraction from Newswire Articles"
Uploaded: 24/03/2014, 03:20
Scientific report: "Extracting Noun Phrases from Large-Scale Texts: A Hybrid Approach and Its Automatic Evaluation"
Uploaded: 31/03/2014, 06:20
Automatic text extraction using DWT and Neural Network
... or video sequences and hence can be used for video browsing/retrieval in a large video database. However, text extraction presents a number of problems because the properties of text may ... in a cluttered background. These are the reasons that make text extraction a challenging task. Many papers concerning extraction of texts from static images or video sequences have been published ... images or compressed images. Text extraction from uncompressed images can be classified as either component-based or texture-based. For component-based text extraction methods, text regions...
Uploaded: 05/11/2012, 14:51
Text extraction from name cards using neural network
... contour by scanning from outer sides towards center. Studying these background pixels will give us knowledge on which part of the histogram is from background and which from text. Then the ... extracting text from name cards: 1) Variation of background color and text color (varying from line to line); 2) Complex graphical foregrounds like logos or pictures; 3) Large variation ... recognition (OCR) to build a name card database. The application provides for document information portability, thus dispensing with the need to carry a large number of name cards and facilitating...
Uploaded: 05/11/2012, 14:54
Document: Reading XML Data Directly from SQL Server
... retrieving data in XML format using the FOR XML clause. The .NET SQL Server data provider SqlCommand object has an ExecuteXmlReader( ) that allows you to retrieve an XML stream directly from SQL ... used with SQL statements that return XML data, such as those with a FOR XML clause. The ExecuteXmlReader( ) method can also be used to return ntext data containing valid XML. For more information...
Uploaded: 24/12/2013, 05:15
Document: Open Domain Event Extraction from Twitter
... of automatically discovered event types with percentage of data covered. Interpretable types representing significant events cover roughly half of the data. supervised approaches that will automatically ... infers an appropriate set of event types to match our data, and also classifies events into types by leveraging large amounts of unlabeled data. Supervised or semi-supervised classification of event ... $N_e$ do Generate $z^n_{e,i}$ from Multinomial($\theta_e$). Generate the entity $n_{e,i}$ from Multinomial($\beta_{z^n_{e,i}}$). end for for each date which co-occurs with $e$, $i = 1 \ldots N_d$ do Generate $z^d_{e,i}$ from Multinomial($\theta_e$). Generate...
Uploaded: 19/02/2014, 18:20
Document: Scientific report: "Effective Phrase Translation Extraction from Alignment Models"
... sources from existing, mature components within the translation process. This paper presents a method of phrase extraction from alignment data generated by IBM Models. By working directly from alignment ... We estimate translation confidence by measures from three models; the estimation from the maximum approximation (alignment map), estimation from the word based translation lexicon, and language ... with , at the cost of deviating from the Bayesian framework. Regardless of the approach, the question of accurately estimating a model of translation from a large parallel or comparable corpus...
Uploaded: 20/02/2014, 16:20
Document: Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"
... linear proximity 3.1 Cooccurrence Extraction with Fips Collocations are extracted from syntactically analysed corpora. The analysis is performed by Fips, a large-scale parser based on an adaptation ... terminological extraction capable of handling multi-word expressions, based on a detailed linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from ... Berkeley, Canada, pp. 169-176. Catizone R., Russell G., and Warwick S. (1989). Deriving Translation Data from Bilingual Texts. In Proceedings of the First International Lexical Acquisition Workshop,...
Uploaded: 22/02/2014, 02:20
The six business models for copyright infringement: A data-driven study of websites considered to be infringing copyright
... the natural solution From the 257 websites, we used 153 websites as the ‘Training’ set and the remaining 104 websites as the ‘Validation’ set. We used the training set of websites to test the ... business models for copyright infringement – A data driven study of websites considered to be infringing copyright Acknowledging contributions of data from: with the assistance of: 50 The six business ... business but be limited to a few websites, where a much larger segment in terms of the numbers of websites may undertake less business. Chart labels are the number of websites in each segment -...
Uploaded: 06/03/2014, 21:20
Scientific report: "Information Extraction From Voicemail"
... nondeterministic and type of data 1 is only in the neighborhood of 60-70% (Huang et al., 2000). The task that is most similar to our work is named entity extraction from speech data (DARPA, 1999). Although ... novel technique based on automatic stochastic-transducer induction. It aims to learn rules automatically from training data instead of requiring hand-crafted rules from experts. Although the ... specified transducer, and in this section, we describe how such an item can be automatically induced from labeled training data. The overall goal is to take a set of labeled training examples in which...
Uploaded: 08/03/2014, 05:20
Scientific report: "Creating a Multilingual Collocation Dictionary from Large Text Corpora"
... (paragraph-level) structure of documents is examined, possibly using mark-up from text encoding. 133 Creating a Multilingual Collocation Dictionary from Large Text Corpora Luka Nerima, Violeta Seretan, Eric Wehrli Language ... terminological extraction capable of handling multi-word expressions, based on a detailed linguistic analysis. The originality of our approach comes from the fact that collocations are not extracted from ... approach. 4 Collocation Dictionary We used the collocations extracted from the French and English corpora for creating a database of knowledge that integrates collocations and instances of...
Uploaded: 08/03/2014, 21:20
Scientific report: "The Development of Lexical Resources for Information Extraction from Text Combining WordNet and Dewey Decimal Classification"
... 227 Proceedings of EACL '99 The Development of Lexical Resources for Information Extraction from Text Combining WordNet and Dewey Decimal Classification* Gabriela Cavaglià ITC-irst ... development of new applications in the field of Information Extraction from text. Generic resources (e.g., lexical databases) are promising for reducing the cost of specific lexica defi- ... necessary for an IE lexicon; secondly the presence of a large amount of lexical polysemy. In this paper we propose a methodology for semi-automatically developing the relevant part of a lexicon...
Uploaded: 08/03/2014, 21:20
Scientific report: "Acquisition of Conceptual Data Models from Natural Language Descriptions"
... clauses. Many database interfaces have such capabilities, McCord (1982), Dahl (1982) and Warren and Pereira (1982) inter alia. 245 Acquisition of Conceptual Data Models from Natural Language ... specifications from natural language description is presented as a problem class that requires a different treatment of semantics when compared with other applied NL systems such as database and ... interfaces. Within this problem class, the specific task of obtaining explicit conceptual data models from natural language text or dialogue is being investigated. The knowledge brought to bear...
Uploaded: 09/03/2014, 01:20
Scientific report: "Toolkit for Multi-Level Alignment and Information Extraction from Comparable Corpora"
... parallel data. Then parallel sentence pairs are extracted from the aligned comparable corpora (section 2.2). The workflow for named entity (NE) and terminology extraction and mapping from comparable ... (also m to n alignments) for sentence extraction. It also allows extraction from comparable corpora as a whole; however, precision may decrease due to larger search space. LEXACC scores sentence ... translation data. Our presented toolkit deals with parallel content extraction from comparable corpora. It consists of tools bundled in two workflows: (1) alignment of comparable documents and extraction...
Uploaded: 16/03/2014, 20:20
Scientific report: "Topical Keyphrase Extraction from Twitter"
... 4 Experiments 4.1 Data Set and Preprocessing We use a Twitter data set collected from Singapore users for evaluation. We used Twitter REST API 1 to facilitate the data collection. The majority ... as news reports. So far there is little work on keyword or keyphrase extraction from Twitter. Wu et al. (2010) proposed to automatically generate personalized tags for Twitter users. However, ... content within a certain period and/or from a certain group of people such as people in the same region. Existing work on keyphrase extraction identifies keyphrases from either individual documents...
Uploaded: 17/03/2014, 00:20
Scientific report: "Rare Word Translation Extraction from Aligned Comparable Documents"
... different pair of languages. For this experiment, we used the data from one corpus to train the classifier, and used the data from another combination of languages as the test set. Results ... seeks for comparable and parallel documents from the web. Starting from a list of Chinese documents (in this case, mostly news articles), we automatically selected English target documents ... of rare lexicon extraction There are few previous works focusing on the extraction of rare word translations, especially from comparable corpora. One of the earliest works is from (Pekar et...
Uploaded: 17/03/2014, 00:20