Keyword extraction from the Web for FOAF metadata

Document: Scientific report: "Extraction and Approximation of Numerical Attributes from the Web" pdf

... and reliable Web data. For some questions, the exact answer is the only possible one (e.g., the height of a person), while for others it is only the center of a distribution (e.g., the weight of ... datasets, Web and TREC based. Web-based QA dataset. We created QA datasets for size, height, width, weight, and depth attributes. For each attribute we extracted from the Web 250 questions in the following ... based on the number of web snippets retrieved during the value acquisition stage. If there are several values with the same frequency we select the median of these values. Approximating the attribute...

Uploaded: 20/02/2014, 04:20
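
The tie-breaking rule quoted in the excerpt above can be illustrated with a minimal sketch. It assumes, since the excerpt is truncated, that each candidate value is scored by how many snippets it was extracted from and that the highest-scoring value wins; the function name `select_value` and the list-of-numbers input are hypothetical and do not come from the paper.

```python
from collections import Counter
from statistics import median


def select_value(snippet_values):
    """Return the most frequent extracted value; ties go to the median.

    snippet_values: one numeric value per retrieved snippet (assumed input
    shape, not taken from the paper).
    """
    counts = Counter(snippet_values)
    if not counts:
        raise ValueError("no values were extracted")
    top = max(counts.values())
    # All values that share the highest snippet count.
    tied = sorted(v for v, c in counts.items() if c == top)
    # statistics.median averages the two middle values when the number of
    # tied candidates is even; the excerpt does not say how the paper
    # handles that case.
    return median(tied)


print(select_value([180, 180, 175, 175, 190]))  # two values tie, so the median is used
print(select_value([2, 1, 2, 3]))               # a single most frequent value
```

Averaging the two middle candidates on an even tie is a choice made for this sketch only; picking the lower or upper middle value would satisfy the quoted rule equally well.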

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web" doc

... (1) Given a web site, the root page and web pages directly linked from the root page are downloaded. Then for each of the downloaded web pages, all of its anchor texts (i.e. the hyperlinked ... English-Chinese parallel data from the web. The mining procedure is initiated by acquiring a Chinese website list. We have downloaded about 300,000 URLs of Chinese websites from the web directories at ... that, using the new web mining scheme, the web mining throughput is increased by 32%; (ii) The quality of the mined data is improved. By leveraging the web pages’ HTML structures, the sentence...

Uploaded: 08/03/2014, 02:21

Document: Scientific report: "Mining the Web for Language Learning" pdf

... verbs describing actions for the noun “TV.” In the results, we find fresh and authentic sample sentences mined from the web, the first of which contains “watch TV,” the most common collocation, as the top result. Additionally, ... consists of the crawler and the raw web page storage. The crawler periodically downloads two kinds of web pages, which are put into the storage. The first kind of web pages are parallel web pages ... round of the mining process. The second layer consists of the extractor, the filter, the classifiers and the readability evaluator, which are applied sequentially. The extractor scans the raw web page...

Uploaded: 20/02/2014, 05:20

Document: Scientific report: "Automatic Collection of Related Terms from the Web" pptx

... query. In case the query is a term, its hit is the number of pages that contain the term on the Web. We use the following notation. H(x) = the number of pages that contain the term “x”. The number H ... half (Evaluation II) in Table 2 shows the result. S: the target term was collected by the system. F: the target term was removed in the filtering step. A: the target term existed in the compiled corpus, but ... automatic term extraction. C: the target term existed in the collected web pages, but did not exist in the compiled corpus. R: the target term did not exist on the collected web pages. Only 43 terms...

Uploaded: 20/02/2014, 16:20

Document: This material is from the Council for Economic Education docx

... by the Indiana Council for Economic Education (ICEE). For further information see ☞ Parent and Community Support: Parents are strong supporters of the mini-economy program. They ... become rather hectic, these tips should be helpful: Display the Items/Privileges Before the Auction Takes Place. This allows the students to examine carefully what will be offered for sale ... usually open before school or at the end of the day. Others have special “store times” for students to shop. ✎ Setting Prices: Unlike the class auction, it will be necessary to set prices for store...

Uploaded: 20/02/2014, 19:20

Scientific report: "Automatic Set Instance Extraction using the Web" pptx

... and the Bootstrapper then enhances it further. On average, the Expander improves the performance of the Provider from 37% to 80% for English, 24% to 82% for Chinese, and 12% to 89% for ... components: the Fetcher, Extractor, and Ranker. The Fetcher is responsible for fetching web documents, and the URLs of the documents come from top results retrieved from the search engine using the ... effective even without the Expander; it directly improves the performance of the Provider from 37% to 77% for English, 24% to 52% for Chinese, and 12% to 39% for Japanese. The simple back-off strategy...

Uploaded: 08/03/2014, 00:20

Scientific report: "Automatic Acquisition of Ranked Qualia Structures from the Web" potx

... results for the formal and agentive roles, while for the constitutive and telic roles the Web-Jac measure performed best. (Figure 1: Average F1 measure for the different ranking measures.) The ... coefficient (Web-Jac), the Pointwise Mutual Information (Web-PMI) and the conditional probability (Web-P). We also present a version of the conditional probability which does not use the Web but merely ... presents an approach for the automatic acquisition of qualia structures for nouns from the Web and thus opens the possibility to explore the impact of qualia structures for natural language...

Uploaded: 08/03/2014, 02:21
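
The three ranking measures named in the excerpt above (Jaccard coefficient, pointwise mutual information, conditional probability) can be sketched from page-hit counts. The excerpt does not give the paper's exact formulas, so the code below uses the textbook definitions as an assumption; the function names, the argument names `h_x`, `h_y`, `h_xy`, and the total index size `n_pages` are all hypothetical.

```python
import math


def web_jaccard(h_x, h_y, h_xy):
    """Jaccard coefficient from hit counts: |X and Y| / |X or Y|."""
    union = h_x + h_y - h_xy
    return h_xy / union if union > 0 else 0.0


def web_pmi(h_x, h_y, h_xy, n_pages):
    """Pointwise mutual information, log2 of P(x, y) / (P(x) * P(y)).

    n_pages is an assumed estimate of the total number of indexed pages,
    needed to turn hit counts into probabilities.
    """
    if min(h_x, h_y, h_xy) <= 0:
        return float("-inf")
    return math.log2((h_xy * n_pages) / (h_x * h_y))


def web_cond_prob(h_x, h_xy):
    """Conditional probability P(y | x) estimated as hits(x, y) / hits(x)."""
    return h_xy / h_x if h_x > 0 else 0.0


# Hypothetical counts: 100 pages mention x, 50 mention y, 25 mention both.
print(web_jaccard(100, 50, 25))
print(web_pmi(100, 50, 25, n_pages=1000))
print(web_cond_prob(100, 25))
```

Hit counts reported by a search engine are estimates, so scores computed this way are better suited to ranking candidates against each other than to being read as exact probabilities.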

Scientific report: "Mining the Web for Bilingual Text" pot

... [END:TITLE]. The number inside the chunk token is the length of the text chunk, not counting whitespace; from this point on only the length of the text chunks is used, and therefore the structural ... considered the most reliable, these were used as the basis for the computation of recall and precision. For this reason, and because the human-judged set included only a sample of the full ... data from the European Corpus Initiative (ECI), available from the Linguistic Data Consortium (LDC). In a formal evaluation, STRAND with the new language identification stage was run for...

Uploaded: 08/03/2014, 06:20

Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment" potx

... our modified version of the competitive linking algorithm, the link score of a pair of words is the sum of the φ² scores of the words themselves, their prefixes and their suffixes. In addition ... pairs, where the translation of the in-parenthesis terms is a suffix of the pre-parenthesis text. The lengths and frequency counts of the suffixes have been used to determine what is the translation ... Whenever there is more than one translation, we randomly pick one as the answer key. For each Chinese and English word in the Wikipedia data, we first find whether there is a translation for the...

Uploaded: 17/03/2014, 02:20

Scientific report: "Semantic Class Learning from the Web with Hyponym Pattern Linkage Graphs" pdf

... demonstrated how to scale up their algorithms for the web. Several techniques for semantic class induction have also been developed specifically for learning from the web. (Paşca, 2004) uses Hearst's ... hyponym patterns to extract class instances from the web and then evaluates them further by computing mutual information scores based on web queries. The work by (Widdows and Dorow, 2002) on lexical ... to instantiate the pattern. On the first iteration, the pattern is given to Google as a web query, and new class members are extracted from the retrieved text snippets. We wanted the system to...

Uploaded: 17/03/2014, 02:20

Scientific report: "Extracting Hypernym Pairs from the Web" potx

... very large text corpora. Today, the web contains more data than the largest available text corpus. For this reason, we are interested in employing the web for the extraction of hypernym relations. ... interesting information. Most web search engines impose a limit on the number of results returned from a query (for example 1000), which limits the opportunities for assessing the performance of ... about whether the size of the web allows to achieve meaningful results with basic extraction techniques. In section two we introduce the task, hypernym extraction. Section three presents the results...

Uploaded: 17/03/2014, 04:20
