extracting sequences from the web

Scientific report: "Extracting Hypernym Pairs from the Web"

... relations from the web. We compare our approach with hypernym extraction from morphological clues and from large text corpora. We show that the abundance of available data on the web enables obtaining ... reason, we are interested in employing the web for the extraction of hypernym relations. We are especially curious about whether the size of the web allows to achieve meaningful results with ... the two web experiments and a combination of the best web approach with the morphological approach. The conjunctive web pattern N en N rates best, because of its high frequency. The recall...

Uploaded: 17/03/2014, 04:20

Document: Scientific report: "Extraction and Approximation of Numerical Attributes from the Web"

... each kind. These patterns are the only attribute-specific resource in our framework. Value extraction. The first pattern group, P_values, allows extraction of the attribute values from the Web. All ... width 1.695m]’). We then extract new patterns from the retrieved search engine snippets and re-query the Web with the new patterns to obtain more attribute values. We provided the framework with ... value for the given object. During the first stage it is possible that we directly extract from the text a set of values for the requested object. The bounds processing step rejects some of these...

Uploaded: 20/02/2014, 04:20

Document: Scientific report: "Automatic Collection of Related Terms from the Web"

... query is a term, its hit is the number of pages that contain the term on the Web. We use the following notation. H(x) = the number of pages that contain the term x. The number H(x) can be used ... in the compiled corpus. R: the target term did not exist on the collected web pages. Only 43 terms (20%) out of 210 terms were collected by the system. This low recall primarily comes from the ... term list: To make the term list L by extracting every term that is a noun or a compound noun from the compiled corpus. 2. Selection by scoring: To select the top N (= 30) terms from the list L by...

Uploaded: 20/02/2014, 16:20

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web"

... that, using the new web mining scheme, the web mining throughput is increased by 32%; (ii) The quality of the mined data is improved. By leveraging the web pages’ HTML structures, the sentence ... English-Chinese parallel data from the web. The mining procedure is initiated by acquiring Chinese website list. We have downloaded about 300,000 URLs of Chinese websites from the web directories at ... (1) Given a web site, the root page and web pages directly linked from the root page are downloaded. Then for each of the downloaded web page, all of its anchor texts (i.e. the hyperlinked...

Uploaded: 08/03/2014, 02:21

Scientific report: "Automatic Acquisition of Ranked Qualia Structures from the Web"

... (not calculated over the Web) as well as the conditional probability calculated over the Web (Web-P) delivered the best results, while the PMI-based ranking measure yielded the worst results. ... coefficient (Web-Jac), the Pointwise Mutual Information (Web-PMI) and the conditional probability (Web-P). We also present a version of the conditional probability which does not use the Web but merely ... appropriate queries to the web search engine and choosing the article leading to the highest number of results. The corresponding patterns are then matched in the 50 snippets returned by the search engine...

Uploaded: 08/03/2014, 02:21
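The three ranking measures named in the snippet above (Jaccard coefficient, pointwise mutual information, conditional probability) can each be computed from page-hit counts. The sketch below uses the common textbook definitions, which may differ from the exact formulas in the paper, since the snippet does not show them. The function names, the `total_pages` parameter, and the handling of zero counts are assumptions made for illustration.

```python
import math


def jaccard(hits_x: int, hits_y: int, hits_xy: int) -> float:
    """Jaccard coefficient: joint hits over hits for either term."""
    union = hits_x + hits_y - hits_xy
    return hits_xy / union if union > 0 else 0.0


def conditional_probability(hits_x: int, hits_xy: int) -> float:
    """Estimate P(y | x) as joint hits over hits for x."""
    return hits_xy / hits_x if hits_x > 0 else 0.0


def pmi(hits_x: int, hits_y: int, hits_xy: int, total_pages: int) -> float:
    """Pointwise mutual information in bits.

    total_pages is an assumed estimate of the index size; it is needed
    to turn raw counts into probabilities. Zero counts give -inf here,
    which is an illustrative convention rather than a sourced one.
    """
    if min(hits_x, hits_y, hits_xy) <= 0 or total_pages <= 0:
        return float("-inf")
    return math.log2(hits_xy * total_pages / (hits_x * hits_y))
```

All three functions take counts that a caller would obtain elsewhere, for example from a search engine's reported result totals; no retrieval step is sketched here.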

Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment"

... our modified version of the competitive linking algorithm, the link score of a pair of words is the sum of the φ² scores of the words themselves, their prefixes and their suffixes. In addition ... pairs, where the translation of the in-parenthesis terms is a suffix of the pre-parenthesis text. The lengths and frequency counts of the suffixes have been used to determine what is the translation ... C ≥ 2E + K, where C is the length of the Chinese text, E is the length of the English text in the parentheses and K is a constant (we used K=6 in our experiments). The lengths C and E are...

Uploaded: 17/03/2014, 02:20
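The length constraint quoted in the snippet above, C ≥ 2E + K with K = 6, can be sketched as a one-line check. This is a minimal illustration only. The function name `satisfies_length_constraint` is hypothetical, and because the snippet is cut off before it says how C and E are measured or what is done with pairs that satisfy the constraint, the sketch takes both lengths as plain integers and only reports whether the inequality holds.

```python
def satisfies_length_constraint(c_len: int, e_len: int, k: int = 6) -> bool:
    """Return True when C >= 2E + K.

    c_len: length of the Chinese text (unit not given in the snippet).
    e_len: length of the English text in the parentheses.
    k:     constant; the snippet reports K = 6 in the experiments.
    """
    return c_len >= 2 * e_len + k
```

For example, with an English length of 5 the threshold is 2 × 5 + 6 = 16, so a Chinese length of 16 satisfies the constraint and 15 does not.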

Scientific report: "Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs"

... hyponym patterns to extract class instances from the web and then evaluates them further by computing mutual information scores based on web queries. The work by (Widdows and Dorow, 2002) on lexical ... to instantiate the pattern. On the first iteration, the pattern is given to Google as a web query, and new class members are extracted from the retrieved text snippets. We wanted the system to ... progresses. Initially, the seed is the only trusted class member and the only vertex in the graph. The bootstrapping process begins by instantiating the doubly-anchored pattern with the seed class...

Uploaded: 17/03/2014, 02:20

Scientific report: "Compiling French-Japanese Terminologies from the Web"

... translation. They use a compositional method to generate a set of translation candidates from which they select the most likely translation by using empirical evidence from the web. The method ... around the seed. 2.2 Automatic Term Recognition The next step is to extract candidate related terms from the corpus. Because the sentences composing the corpus are related to the seed, the ... precedence to the alignments obtained with the more accurate methods. Consequently, we start by adding the alignments in FJ to the output set. Then, we augment it with the alignments from FJJ...

Uploaded: 17/03/2014, 22:20

Scientific report: Subunit sequences of the 4 × 6-mer hemocyanin from the golden orb-web spider, Nephila inaurata: Intramolecular evolution of the chelicerate hemocyanin subunits

... Server of the Swiss Institute of Bioinformatics (http://www.expasy.org) and the program GENEDOC 2.6 [25] were used for the analyses of DNA and amino acid sequences. The amino acid sequences of the ... assuming that the LpoHc2 and the a-subunits of N. inaurata and E. californicum on the one hand, and TtrHcA and the arachnid g-subunits on the other hand are orthologous proteins (see above). The fossil ... allows the unambiguous assignment to distinct subunit types. The orthologous subunits of these species share 69.1–76.2% of their amino acids, with the a subunits being the most conserved and the...

Uploaded: 08/03/2014, 08:20

Document: Scientific report: Complete subunit sequences, structure and evolution of the 6 × 6-mer hemocyanin from the common house centipede, Scutigera coleoptrata

... the determination of the N-terminal sequences, C. Hunzinger for the MALDI-TOF analyses, C. Bache and J. Hermanns for their help with the cloning experiments, and J. R. Harris for correcting the ... conserved in any other arthropod hemocyanin subunit. Nevertheless, the glycosylation site at a-sheet 2E (Fig. 3) is located at the surface of the putative hemocyanin hexamer, as deduced from the comparison with ... recent common ancestor than the other four S. coleoptrata hemocyanin (Fig. 5). The topology demonstrates that the diversification of the hemocyanin subunits commenced before the Chilopoda and Diplopoda...

Uploaded: 20/02/2014, 11:20
