Parallel texts extraction from the web

Document: Scientific report: "Extraction and Approximation of Numerical Attributes from the Web" pdf

... (e.g., (Pantel and Pennacchiotti, 2006)) to reduce noise. Some of the methods are suitable for retrieval of numerical attributes. However, most of them do not exploit the numerical nature of the attribute ... evaluation, since the nature of the data is different from that of the QA dataset. Most of the questions asked over the Web target named entities like spe...

Uploaded: 20/02/2014, 04:20

Document: Scientific report: "Automatic Collection of Related Terms from the Web" pptx

... collected 610 terms in total; the average number of output terms per input is 12.2 terms. We checked by hand whether each of the 610 terms is a correct related term of the original seed term. The result ... indicates that the term passed the test. Twenty terms out of the thirty candidate terms passed the first technical-term test (Tech.) and sixteen terms...

Uploaded: 20/02/2014, 16:20

Scientific report: "Automatic Set Instance Extraction using the Web" pptx

... of set instance extraction for each dataset measured in MAP. NP is the Noisy Instance Provider, NE is the Noisy Instance Expander, and BS is the Bootstrapper. ... quality of the initial list, and the ... components: the Noisy Instance Provider, the Noisy Instance Expander, and the Bootstrapper. Given a semantic class name, the Provider extracts an initial set of nois...

Uploaded: 08/03/2014, 00:20

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web" doc

... techniques, the DOM tree alignment model, sentence alignment model, and candidate web page pair verification model are introduced. DOM Tree Alignment Model. The Document Object Model (DOM) is an application ... 1: Precision of Mined Parallel Documents. Experimental Results. The DOM tree alignment based mining system is used to acquire English-Chinese paralle...

Uploaded: 08/03/2014, 02:21

Scientific report: "Automatic Acquisition of Ranked Qualia Structures from the Web" potx

... as the number of web pages containing the words the and ’and’ matched. On the basis of these, we then calculate the probability of a certain qualia element given a certain role on the basis of ... learning ranked qualia structures which allow to find an ideal cut-off point to increase the precision/recall trade-off of the learned structures. We have abstracted...

Uploaded: 08/03/2014, 02:21

Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment" potx

... preferences into the algorithm, by using the word positions to break ties of the φ² scores when sorting the word pairs. 4.4 Capturing syllable-level regularities. Many of the parenthetical translations ... parenthetical translations need only determine the first pre-parenthesis word aligned with an in-parenthesis word, whereas word alignment requires the respecti...

Uploaded: 17/03/2014, 02:20

Scientific report: "Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs" pdf

... applies hyponym patterns to the web and acquires contexts around them. The KnowItAll system (Etzioni et al., 2005) also uses hyponym patterns to extract class instances from the web and then evaluates ... semantic class: such as and *. This pattern has two variables: the name of the semantic class to be learned (class name) and a member of the semantic class (c...

Uploaded: 17/03/2014, 02:20

Scientific report: "Extracting Hypernym Pairs from the Web" potx

... is a hypernym of A, B and C • A, B and C are siblings of each other. Here, sibling refers to the relative position of the words in the hypernymy tree. Two words are siblings of each other if they ... parent. We compute a hypernym evidence score S(h, w) for each candidate hypernym h for word w. It is the sum of the normalized evidence for the hypernymy relation between h and w, a...

Uploaded: 17/03/2014, 04:20

Scientific report: "Compiling French-Japanese Terminologies from the Web" pptx

... around the seed. 2.2 Automatic Term Recognition. The next step is to extract candidate related terms from the corpus. Because the sentences composing the corpus are related to the seed, the same ... consists of picking the most likely translation from the translation candidates we have generated. To discern the likely from the unlikely, we use the empirical evidence...

Uploaded: 17/03/2014, 22:20

Scientific report: "Extracting Sequences from the Web" pptx

... Pattern (Example): the ORD (the fifth); the RB ORD (the very first); the JJS (the best); the RB JJS (the very best); the ORD JJS (the third biggest); the RBS JJ (the most popular); the ORD RBS JJ (the second least ...) ... measured how the density and functionality features improve performance on the sequence name. We queried for both the numeric form of the ordinal and the number spe...

Uploaded: 23/03/2014, 16:20

Scientific report: "Unsupervised Relation Extraction by Mining Wikipedia Texts Using Information from the Web" pdf

... start by defining the problem under consideration: relation extraction from Wikipedia. We use the encyclopedic nature of the corpus by specifically examining the relation extraction between the entitled ... pair by leveraging the vast size of the Web. Our hypothesis is that there exist some key terms and patterns that provide clues to the relations between pairs...

Uploaded: 23/03/2014, 16:21

Scientific report: "Learning to Extract Relations from the Web using Minimal Supervision" ppt

... containing a1 and a2 in the same sentence”. The returned documents (limited by Google to the first 1000) are downloaded, and then the text is extracted using the HTML parser from the Java Swing package ... elements: the Means element (e.g., “in a stock-for-stock transaction”) and the Time element (e.g., “on October 9, 2006”). Words from these elements, like “stock”, or “Octob...

Uploaded: 23/03/2014, 18:20

Scientific report: "Using Corpus Statistics on Entities to Improve Semi-supervised Relation Extraction from the Web" pot

... target relation arguments, and how to integrate the results produced by the validating patterns into the whole relation extraction system. • We show how to use corpus statistics and term extraction ... specified relations from the Web without human supervision. Accordingly, the supervised input to the system is limited to the specifications of the target 60...

Uploaded: 23/03/2014, 18:20

Scientific report: "Learning Transliteration Lexicons from the Web" pptx

... labeling. At the same time, we select samples of high confidence score from the rest and consider them correct E-C pairs. We then merge the labeled set with the high-confidence set in the PSM re-training ... resulting from automatic speech recognition to bootstrap an initial PSM model. The task of labeling samples is basically to distinguish the qualified transliteration pairs...

Uploaded: 31/03/2014, 01:20

Parallel texts extraction from the web

... from language L1 to language L2 and others the other way around. The direction of the translation may not even be known. The parallel corpora exist in several formats. They can be raw parallel texts ... the dp feature, the n feature, the r feature, the p feature, the publication date feature, the simcognates feature, the text length feature, the number of paragraphs feat...

Uploaded: 25/03/2015, 10:03
