Scaling distributional similarity to large corpora

Scientific report: "Scaling Phrase-Based Statistical Machine Translation to Larger Corpora and Longer Phrases" (pptx)

Uploaded: 17/03/2014, 05:20
... [Figure residue: the suffixes of the example sentence "spain declined to confirm that spain declined to aid morocco" (word positions 0 to 9) and its suffix array, 8 3 6 1 9 5 0 4 7 2] ...
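
The permutation 8 3 6 1 9 5 0 4 7 2 in this snippet is exactly the suffix array of the 10-word example sentence: word positions ordered by the alphabetical order of the suffixes that start there. The Python sketch below rebuilds it and then uses two binary searches to find every occurrence of a phrase, since all matching suffixes sit in one contiguous block. It assumes a naive sort-based construction and uses hypothetical helper names (`build_suffix_array`, `find_phrase`); it illustrates the data structure, not the paper's implementation, which the snippet does not show.

```python
def build_suffix_array(tokens):
    """Return suffix start positions sorted by the token sequence each begins.

    Naive construction (slices every suffix), so memory is quadratic;
    fine for a toy sentence, not for a real corpus.
    """
    return sorted(range(len(tokens)), key=lambda i: tokens[i:])


def find_phrase(tokens, sa, phrase):
    """Return the sorted start positions of every occurrence of `phrase` (a list of tokens)."""
    m = len(phrase)

    def prefix(rank):
        # First m tokens of the suffix stored at this rank of the suffix array.
        start = sa[rank]
        return tokens[start:start + m]

    # Lower bound: first rank whose m-token prefix is >= phrase.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < phrase:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose m-token prefix is > phrase.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= phrase:
            lo = mid + 1
        else:
            hi = mid

    # Matching suffixes are contiguous in the suffix array.
    return sorted(sa[first:lo])


tokens = "spain declined to confirm that spain declined to aid morocco".split()
sa = build_suffix_array(tokens)
print(sa)                                           # [8, 3, 6, 1, 9, 5, 0, 4, 7, 2]
print(find_phrase(tokens, sa, ["declined", "to"]))  # [1, 6]
```

Each lookup costs O(m log n) token comparisons, which is what makes suffix arrays attractive for retrieving arbitrarily long phrases from a large corpus without precomputing a phrase table.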

Document: "More Than a Message: Framing Public Health Advocacy to Change Corporate Practices" (docx)

Uploaded: 14/02/2014, 13:20
... 52) Similar to a frame around a painting, the news frame draws attention to a specific picture and separates told from untold pieces of the story. Elements in the story are said to be in the ... Publications time. A landscape story pulls back the lens to take a broader view. It may include people and events, but it connects them to the larger social and economic forces. News stories framed in such ... should strive to make stories about the landscape as vivid and interesting as the portrait. This is not easy to do but is crucial. The framing challenge for public health educators is to create...

Document: "A Comparison of Approaches to Large-Scale Data Analysis" (pdf)

Uploaded: 19/02/2014, 12:20
... required us to (1) write to a custom tuple object using Hadoop's API, (2) modify our data loader program to transform records to compressed and serialized custom tuples, and (3) refactor each ... required DBMS-X to be running in order to adjust them, it was unfortunately easy to lock ourselves out with no failsafe mode to restore to a previous state. Vertica was relatively easy to install ... as stacked bars, where the bottom segment represents the time it took to execute the UDF/parser and load the data into the table and the top segment is the time to execute the actual query. DBMS-X...

Document (scientific report): "Finding Synonyms Using Automatic Word Alignment and Measures of Distributional Similarity" (pdf)

Uploaded: 20/02/2014, 12:20
... discarded. We refer to this by the term minimum row frequency. The cutoff is used to make the feature space manageable and to reduce noise in the data. 5.1 Distributional Similarity Based on Syntactic ... would like to make use of the distributional similarity score to set a threshold that will remove a lot of errors. The last thing that remains for future work is to find a more adequate way to combine ... Methodology. 3.1 Measuring Distributional Similarity. An increasingly popular method for acquiring semantically similar words is to extract distributionally similar words from large corpora. The underlying...
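
The minimum-row-frequency cutoff described in this snippet can be sketched as a filter over a word-by-feature co-occurrence matrix, applied before computing distributional similarity. The Python below is a minimal sketch under stated assumptions: rows are raw counts stored in `Counter`s, the feature labels are hypothetical syntactic-relation features, and cosine is used as the similarity measure for illustration only; the snippet does not say which weighting or measure the authors use.

```python
import math
from collections import Counter


def apply_min_row_frequency(matrix, min_row_freq):
    """Discard rows (target words) whose total co-occurrence count is below the cutoff."""
    return {word: feats for word, feats in matrix.items()
            if sum(feats.values()) >= min_row_freq}


def cosine(u, v):
    """Cosine similarity of two sparse count vectors stored as Counters."""
    dot = sum(count * v[feat] for feat, count in u.items())  # Counter returns 0 for missing keys
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(matrix, target, k):
    """Return the k words whose rows are closest to `target` by cosine (ties alphabetical)."""
    scored = [(cosine(matrix[target], feats), word)
              for word, feats in matrix.items() if word != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [word for _, word in scored[:k]]


# Hypothetical rows: target word -> counts of (relation:head word) features.
matrix = {
    "car": Counter({"obj_of:drive": 3, "mod:fast": 2}),
    "automobile": Counter({"obj_of:drive": 2, "mod:fast": 1}),
    "banana": Counter({"obj_of:eat": 4}),
    "zeppelin": Counter({"obj_of:drive": 1}),
}
filtered = apply_min_row_frequency(matrix, min_row_freq=2)
print(sorted(filtered))                    # ['automobile', 'banana', 'car']
print(most_similar(filtered, "car", k=1))  # ['automobile']
```

Raising the cutoff shrinks the matrix and removes rows built from one or two observations, whose similarity scores are the noisiest.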

Document (scientific report): "Measures of Distributional Similarity" (ppt)

Uploaded: 20/02/2014, 18:20
... test triple tokens in the set, and a tie results when both alternatives are deemed equally likely by the language model in question. To perform the evaluation, we incorporated each similarity ... similarity function into a decision rule as follows. For a given similarity measure f and neighborhood size k, let $S_{f,k}(n)$ denote the k most similar words to n according to f. We define the ... k most similar words according to f are on the whole better predictors than the k most similar words according to g; hence, f induces an inherently better similarity ranking for distance-weighted...
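
The decision rule in this snippet picks, for a test triple, whichever of two alternatives is more likely given the k words most similar to n under a measure f, and counts a tie when both are equally likely. The Python below is a minimal sketch, assuming that higher scores mean "more similar" and that each neighbor's vote is weighted by its raw similarity score; the paper's actual weighting function is elided in the snippet, and some of the measures it compares are distances, which would need to be converted first. All names and numbers are hypothetical.

```python
def neighbors(sim, n, k):
    """The k words most similar to n under the similarity table sim (ties alphabetical)."""
    candidates = [m for m in sim[n] if m != n]
    candidates.sort(key=lambda m: (-sim[n][m], m))
    return candidates[:k]


def weighted_estimate(sim, cond_prob, n, v, k):
    """Similarity-weighted average of P(v | m) over the k nearest neighbors m of n."""
    ns = neighbors(sim, n, k)
    total = sum(sim[n][m] for m in ns)
    if total == 0:
        return 0.0
    return sum(sim[n][m] * cond_prob[m].get(v, 0.0) for m in ns) / total


def decide(sim, cond_prob, n, v1, v2, k):
    """Return whichever of v1, v2 is more likely to co-occur with n; None signals a tie."""
    p1 = weighted_estimate(sim, cond_prob, n, v1, k)
    p2 = weighted_estimate(sim, cond_prob, n, v2, k)
    if p1 > p2:
        return v1
    if p2 > p1:
        return v2
    return None


# Hypothetical similarity scores (higher = more similar) and conditional probabilities.
sim = {"beer": {"wine": 0.8, "water": 0.6, "car": 0.1}}
cond_prob = {
    "wine": {"drink": 0.5},
    "water": {"drink": 0.4},
    "car": {"drive": 0.7},
}
print(decide(sim, cond_prob, "beer", "drink", "drive", k=2))  # drink
print(decide(sim, cond_prob, "beer", "fly", "swim", k=2))     # None
```

Comparing two measures f and g then amounts to running the same rule with each measure's neighbor sets and counting how often each one picks the correct alternative.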

Document (scientific report): "Distributional Similarity Models: Clustering Neighbors" (doc)

Uploaded: 20/02/2014, 18:20
... belong mostly to the same cluster (dotted ellipse), the two nearest neighbors to A are not the nearest two neighbors to B. ... like to control the degree of compression of C relative to N, that ... previous two sections, we presented two complementary paradigms for incorporating distributional similarity information into cooccurrence probability estimates. Now, one cannot always draw ... large number of clusters in the distributional clustering case results in only the closest centroids contributing significantly to the cooccurrence probability estimate, whereas a large...

Document (scientific report): "Finding Parts in Very Large Corpora" (pdf)

Uploaded: 20/02/2014, 19:20
... tempered to take into account the quantity of data that supports its conclusion. To put this another way, we want to pick (w, p) pairs that have two properties: $p(w \mid p)$ is high and $|w, p|$ is large. ... us to produce better lists, both because the statistics we are currently collecting would be more accurate, but also because larger numbers would allow us to find other reliable indicators. ... the machines at our disposal, so still larger corpora would not be out of the question. Finally, as noted above, Hearst [2] tried to find parts in corpora but did not achieve good results....

Scientific report: "Reducing semantic drift with bagging and distributional similarity" (pdf)

Uploaded: 08/03/2014, 00:20
... let $L_1^n$ correspond to the first $n$ terms extracted into $L$, and $L_{N-m}^{N}$ correspond to the last $m$ terms added to $L_N$. In an iteration, let $t$ be the next candidate term to be added to the lexicon. We ... semantic drift. We integrate a distributional similarity filter directly into WMEB (McIntosh and Curran, 2008). This filter judges whether a new term is more similar to the earlier or most recently ... lexical-syntactic patterns to label clusters of distributionally similar terms. Mirkin et al. (2006) used 11 patterns, and the distributional similarity score of each pair of terms, to construct features...
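
The drift filter in this snippet compares a candidate term t with the first n terms of the lexicon (extracted before any drift) and with the last m terms (where drift would show up first). The Python below is a minimal sketch under loud assumptions: it averages similarity over each group, accepts t only if its early-group score is at least `threshold` times its recent-group score, and uses Jaccard overlap of hypothetical context sets as a stand-in for a real distributional measure. The snippet does not give the paper's exact scoring formula or threshold.

```python
def jaccard(a, b):
    """Jaccard overlap of two context sets (a toy stand-in for distributional similarity)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def mean_sim(sim, terms, t):
    """Average similarity between candidate t and a list of lexicon terms."""
    if not terms:
        return 0.0
    return sum(sim(x, t) for x in terms) / len(terms)


def passes_drift_filter(lexicon, t, sim, n, m, threshold=1.0):
    """Accept t only if it is at least `threshold` times as similar to the
    first n lexicon terms as to the last m lexicon terms."""
    early = lexicon[:n]
    recent = lexicon[max(0, len(lexicon) - m):]  # last m terms; empty when m == 0
    return mean_sim(sim, early, t) >= threshold * mean_sim(sim, recent, t)


# Hypothetical context sets for a drug lexicon that has started drifting.
contexts = {
    "aspirin": {"take", "dose", "tablet"},
    "ibuprofen": {"take", "dose", "tablet"},
    "hospital": {"visit", "ward"},
    "paracetamol": {"take", "dose"},
    "clinic": {"visit", "ward", "dose"},
}


def context_sim(x, y):
    return jaccard(contexts[x], contexts[y])


lexicon = ["aspirin", "ibuprofen", "hospital"]  # the last term has already drifted
print(passes_drift_filter(lexicon, "paracetamol", context_sim, n=2, m=1))  # True
print(passes_drift_filter(lexicon, "clinic", context_sim, n=2, m=1))       # False
```

Comparing against the earliest terms works because they are the ones most likely to be correct; a term that resembles only the newest additions is a symptom of drift rather than a genuine member of the category.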

Scientific report: "Using lexical and relational similarity to classify semantic relations" (pptx)

Uploaded: 08/03/2014, 21:20
... describes two complementary approaches for using distributional information extracted from corpora to calculate noun pair similarity. The first model of pair similarity is based on standard methods for ... According to this lexical similarity model, word pairs $(w_1, w_2)$ and $(w_3, w_4)$ are judged similar if $w_1$ is similar to $w_3$ and $w_2$ is similar to $w_4$. Given a measure wsim of word-word similarity, ... context, a frequency cutoff to eliminate less common subsequences and the Gaussian kernel to compare vectors. While we cannot compare methods directly as we do not possess the large corpus of 5...
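
The lexical similarity model in this snippet builds pair similarity from a word-word measure wsim: (w1, w2) and (w3, w4) are similar when w1 is similar to w3 and w2 is similar to w4. The snippet elides how the two word-level scores are combined, so the Python below assumes a geometric mean (high only when both are high) and cosine over hypothetical dense vectors as wsim; both choices are illustrative, not the paper's.

```python
import math


def cos(u, v):
    """Cosine similarity of two dense vectors given as tuples."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def pair_sim(wsim, pair_a, pair_b):
    """Similarity of (w1, w2) and (w3, w4) from wsim(w1, w3) and wsim(w2, w4).
    Combining them by geometric mean is an assumption of this sketch."""
    (w1, w2), (w3, w4) = pair_a, pair_b
    s1 = max(0.0, wsim(w1, w3))  # clamp so the square root stays real
    s2 = max(0.0, wsim(w2, w4))
    return math.sqrt(s1 * s2)


# Hypothetical word vectors; any word-word measure could be plugged in as wsim.
vectors = {
    "steel": (1.0, 0.0, 1.0),
    "iron": (1.0, 0.0, 0.8),
    "knife": (0.0, 1.0, 0.0),
    "blade": (0.0, 1.0, 0.1),
    "cake": (0.0, 0.0, 1.0),
}


def wsim(a, b):
    return cos(vectors[a], vectors[b])


print(pair_sim(wsim, ("steel", "knife"), ("iron", "blade")))  # close to 1
print(pair_sim(wsim, ("steel", "knife"), ("iron", "cake")))   # 0.0
```

A multiplicative combination captures the "and" in the model: one well-matched constituent cannot compensate for an unrelated one, whereas an arithmetic mean would still give the second comparison a score of about 0.5.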

Scientific report: "Finding Word Substitutions Using a Distributional Similarity Baseline and Immediate Context Overlap" (potx)

Uploaded: 08/03/2014, 21:20
... where antonyms are returned by the system; in those cases, a very high distributional similarity actually corresponds to opposite meanings. Producing an output ranked according to distributional ... Information Content to Evaluate Semantic Similarity in a Taxonomy. In Proceedings of IJCAI–95, 1995. Idan Szpektor, Hristo Tanev, Ido Dagan and Bonaventura Coppola. 2004. Scaling Web-Based Acquisition of ... probability that they appear together. PMI is known to have a bias towards less frequent events. In order to counterbalance that bias, we apply a simple logarithm function to the results as a discount: d...
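
The bias of PMI toward less frequent events, mentioned in this snippet, is easy to see numerically: a pair of words that each occur once, and occur together that one time, gets a far higher score than a pair observed together 100 times. The Python below computes standard pointwise mutual information, log2(P(x, y) / (P(x) P(y))), from hypothetical counts. The authors' logarithmic discount is cut off in the snippet, so it is deliberately not reproduced here.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information log2(P(x, y) / (P(x) * P(y))) from raw counts."""
    p_xy = count_xy / total
    p_x = count_x / total
    p_y = count_y / total
    return math.log2(p_xy / (p_x * p_y))


# Two hypothetical word pairs in a corpus of 1,000 observations.
rare = pmi(count_xy=1, count_x=1, count_y=1, total=1000)            # seen together once
frequent = pmi(count_xy=100, count_x=200, count_y=200, total=1000)  # seen together 100 times
print(round(rare, 2), round(frequent, 2))  # 9.97 1.32
```

The single-observation pair scores log2(1000), the maximum possible for this corpus size, even though one co-occurrence is weak evidence; this is the effect a discount is meant to counterbalance.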

Scientific report: "CSNIPER: Annotation-by-query for non-canonical constructions in large corpora" (pdf)

Uploaded: 16/03/2014, 20:20
... analysis of large corpora due to a relatively low frequency of instances and whose identification requires expert knowledge to distinguish them from other similar constructions. Our tool integrates ... expert knowledge to identify instances of linguistic phenomena that are hard to identify by means of existing automatic annotation tools. 1 Introduction. Linguistic annotation by means of automatic procedures, ... knowledge to be annotated. We plan to integrate further automatic annotations and query possibilities to support such further use-cases. Acknowledgments. We would like to thank Erik-Lân Do Dinh,...

Scientific report: "Syntax is from Mars while Semantics from Venus! Insights from Spectral Analysis of Distributional Similarity Networks" (ppt)

Uploaded: 17/03/2014, 02:20
... the other eigenvectors corresponding to the significantly high eigenvalues are important classificatory dimensions. Fig. 2 shows the plot of the first eigenvector component (aka eigenvector centrality) ... initial attempt to answer this fundamental and intriguing question, whereby we construct the syntactic and semantic distributional similarity network (DSN) and analyze their spectrum to understand ... eigenvalue tells us to what extent the rows of the adjacency matrix are correlated and therefore, the corresponding eigenvector is not a dimension pointing to any classificatory basis of the words....
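
The eigenvector centrality mentioned in this snippet is the first (principal) eigenvector of the network's adjacency matrix. The Python below is a minimal sketch that finds it by power iteration on a hypothetical 4-word network; it assumes a symmetric, non-negative, connected graph and is not the authors' code, who may well have used a library eigensolver. Iterating with A + I rather than A is a standard trick: the eigenvectors are unchanged, but the iteration no longer oscillates on bipartite graphs.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Principal eigenvector of a symmetric non-negative adjacency matrix,
    found by power iteration on (A + I) and normalized to unit length."""
    n = len(adj)
    x = [1.0] * n  # positive start vector, so every iterate stays positive
    for _ in range(max_iter):
        # y = (A + I) x
        y = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        y = [v / norm for v in y]
        if max(abs(a - b) for a, b in zip(x, y)) < tol:
            return y
        x = y
    return x


# Hypothetical network: words 0, 1, 2 form a triangle; word 3 links only to word 0.
adj = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
c = eigenvector_centrality(adj)
print([round(v, 3) for v in c])
```

Word 0 gets the highest score because it is linked to every other node, words 1 and 2 tie by symmetry, and the pendant word 3 scores lowest: a node is central when its neighbors are themselves central.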

Scientific report: "Exploring Distributional Similarity Based Models for Query Spelling Correction" (docx)

Uploaded: 17/03/2014, 04:20
... use the evidence of distributional similarity to achieve better spelling correction accuracy. We present two methods that are able to take advantage of distributional similarity information. ... that is able to leverage all available features, which could include (but are not limited to) traditional character string-based typographical similarity, phonetic similarity and distributional ... output to drop by around 2%. The work of Ahmad and Kondrak (2005) tried to employ an unsupervised approach to error model estimation. They designed an EM (Expectation Maximization) algorithm to...

Scientific report: "Integrating Pattern-based and Distributional Similarity Methods for Lexical Entailment Acquisition" (doc)

Uploaded: 17/03/2014, 04:20
... calculated relative to the total number of correct entailment pairs acquired by both methods together. Table (Method: P, R, F): Pattern-based: 0.44, 0.61, 0.51; Distributional Similarity: 0.33, 0.53, ... target noun a scored list of up to a few hundred words with positive distributional similarity scores. Next we need to determine an optimal threshold for the similarity score, considering ... investigate automatic acquisition of the lexical entailment relation. For the distributional similarity component we employ the similarity scheme of (Geffet and Dagan, 2004), which was shown to yield...
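
The F column in this snippet's table is consistent with the balanced F-measure, the harmonic mean of precision and recall: the pattern-based row (P = 0.44, R = 0.61) gives 2 × 0.44 × 0.61 / 1.05 ≈ 0.51. A short Python check follows; reading F as balanced F1 is an inference from that one row, and recall here is relative to the pooled set of correct pairs found by both methods, as the snippet states.

```python
def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


print(round(f_measure(0.44, 0.61), 2))  # 0.51, matching the pattern-based row
```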

Scientific report: "Discovering Relations among Named Entities from Large Corpora" (pot)

Uploaded: 17/03/2014, 06:20
... we have set a frequency threshold to remove those pairs. 3.4 Context similarity among NE pairs. We adopt a vector space model and cosine similarity in order to calculate the similarities between ... richly annotated corpora which are tagged with relation instances. The biggest problem with this approach is that it takes a great deal of time and effort to prepare annotated corpora large enough to apply ... context vector is extremely small due to a lack of content words, the cosine similarity between the vector and others might be unreliable. So, we also define a norm threshold in advance to eliminate...
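
The context-similarity step in this snippet represents each named-entity pair by a vector of context words, drops vectors whose norm is below a threshold (cosine on near-empty vectors is unreliable), and compares the rest by cosine similarity. The Python below is a minimal sketch of that pipeline; the entity names, context counts, and the threshold value 1.2 are hypothetical, and the paper's own term weighting is not shown in the snippet.

```python
import math
from collections import Counter


def norm(vec):
    """Euclidean norm of a sparse Counter vector."""
    return math.sqrt(sum(c * c for c in vec.values()))


def drop_small_vectors(vectors, norm_threshold):
    """Remove NE pairs whose context-vector norm falls below the threshold."""
    return {pair: vec for pair, vec in vectors.items() if norm(vec) >= norm_threshold}


def cosine(u, v):
    """Cosine similarity of two Counter vectors (0.0 if either is empty)."""
    nu, nv = norm(u), norm(v)
    if nu == 0 or nv == 0:
        return 0.0
    return sum(c * v[w] for w, c in u.items()) / (nu * nv)


# Hypothetical context-word counts collected between the two entities of each pair.
vectors = {
    ("ORG_A", "ORG_B"): Counter({"acquire": 2, "deal": 1}),
    ("ORG_C", "ORG_D"): Counter({"acquire": 1, "deal": 1}),
    ("ORG_E", "ORG_F"): Counter({"say": 1}),
}
kept = drop_small_vectors(vectors, norm_threshold=1.2)
print(sorted(kept))  # [('ORG_A', 'ORG_B'), ('ORG_C', 'ORG_D')]
print(round(cosine(kept[("ORG_A", "ORG_B")], kept[("ORG_C", "ORG_D")]), 3))  # 0.949
```

The pair backed by a single context word is removed before any comparison, so it cannot be clustered with other pairs on the strength of one shared word.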
