scaling to very very large corpora

Scientific report: "Scaling to Very Very Large Corpora for Natural Language Disambiguation" potx

Uploaded: 23/03/2014, 19:20
... unsupervised learning with large training corpora, in hopes of being able to obtain the benefits that come from significantly larger training corpora without incurring too large a cost. 2 Confusion ... Scaling to Very Very Large Corpora for Natural Language Disambiguation Michele Banko and Eric Brill Microsoft ... exploiting very large corpora when labeled data comes at a cost. 1 Introduction Machine learning techniques, which automatically learn linguistic information from online text corpora, have...
Document: Scientific report: "Finding Parts in Very Large Corpora" pdf

Uploaded: 20/02/2014, 19:20
... a very large corpus our method finds part words with 55% accuracy for the top 50 words as ranked by the system. The part list could be scanned by an end-user and added to an existing ontology ... tempered to take into account the quantity of data that supports its conclusion. To put this another way, we want to pick (w,p) pairs that have two properties, p(w | p) is high and |w, p| is large. ... the machines at our disposal, so still larger corpora would not be out of the question. Finally, as noted above, Hearst [2] tried to find parts in corpora but did not achieve good results....
Scientific report: "Scaling Distributional Similarity to Large Corpora" doc

Uploaded: 31/03/2014, 01:20
... corresponds to a unique node. • The nodes are arranged into a hierarchy of levels, with the bottom level containing n/2 nodes and the top containing a single root node. Each level, except the top, will ... generating bilingual lexicons from parallel corpora. In RI, we first allocate a d length index vector to each unique attribute. The vectors consist of a large number of 0s and small number () ... us to choose both the weight and the measure used. LSH and PLEB could not match either the efficiency of RI or the accuracy of SASH. We intend to use this knowledge to process even larger corpora...
99 ways to say "very good"

Uploaded: 05/09/2013, 17:10
... been practicing 58. You did it very well 59. FINE 60. Nice going 61. You're really going to town 62. OUTSTANDING! 63. FANTASTIC! 64. TREMENDOUS! 65. That's how to handle that 66. Now that's ... u'r doing the right thing 99 ways to say "very good" FOR THOSE DAYS WHEN U CAN'T THINK OF WHAT TO SAY!!! My foreign teacher taught me how to express congratulations. I think ... You certainly did it well today. 75. Keep it up! 76. Congratulations. You got it right! 77. You did a lot of work today 78. Well look at you go 79. That's it 80. I am very proud of u 81. MARVELOUS! 82....
99 ways to say "very good" docx

Uploaded: 10/03/2014, 15:20
... 99 ways to say "very good" 77. You did a lot of work today 78. Well look at you go 79. That's it 80. I am very proud of you 81. MARVELOUS! 82. I like that 83. Way to go ... practicing 58. You did it very well 59. FINE 60. Nice going 61. You're really going to town 62. OUTSTANDING! 63. FANTASTIC! 64. TREMENDOUS! 65. That's how to handle that 66. Now ... the ways below! My foreign teacher taught me how to express congratulations. I think it is useful, so I post it for everyone to refer to and you can apply it in daily life. 1. you're...
Document: Which Bank Is the “Central” Bank? An Application of Markov Theory to the Canadian Large Value Transfer System doc

Uploaded: 16/02/2014, 12:20
... would be to assume that the θ’s vary by day; since it could be argued that θ captures both processing speed and other unobserved factors. 17 One way to implement this would be to find the θ vectors ... non-optimal points since as the optimizer gets close to (for example) the unit vector it will stop moving (or slow down in its movements) due to the flatness. 13 and p k i it is i’s aggregate balance ... distribution that corresponds to the transition probability matrix B t . 5 Estimation of the delay parameters We want to choose the vector θ so that over the sample period the eigenvectors defined by (6)...
Scientific report: "CSniper: Annotation-by-query for non-canonical constructions in large corpora" pdf

Uploaded: 16/03/2014, 20:20
... analysis of large corpora due to a relatively low frequency of instances and whose identification requires expert knowledge to distinguish them from other similar constructions. Our tool integrates ... expert knowledge to identify instances of linguistic phenomena that are hard to identify by means of existing automatic annotation tools. 1 Introduction Linguistic annotation by means of automatic procedures, ... knowledge to be annotated. We plan to integrate further automatic annotations and query possibilities to support such further use-cases. Acknowledgments We would like to thank Erik-Lân Do Dinh,...
Scientific report: "Discovering Relations among Named Entities from Large Corpora" pot

Uploaded: 17/03/2014, 06:20
... and effort to prepare annotated corpora large enough to apply supervised learning. In addition, the varieties of relations were limited to those defined by the ACE RDC task. In order to discover ... phrase as an initial seed in order to find similar verb phrases. 3 Relation Discovery 3.1 Overview We propose a new approach to relation discovery from large text corpora. Our approach is based on 2 A ... beginning of articles) as peculiar to The New York Times. In our experiment, the norm threshold was set to 10. We also used stop words when context vectors are made. The stop words include symbols and...
A simple large scale synthesis of very long aligned silica nanowires

Uploaded: 16/03/2014, 15:03
... August 2002; in final form 11 October 2002 Abstract A simple method based on the thermal oxidation of Si wafers has been discovered to provide a large-scale synthesis of very long, aligned silica nanowires. ... Grobert, J. Olivares, J.P. Zhang, H. Terrones, K. Kordatos, W.K. Hsu, J.P. Hare, P.D. Townsend, K. Prassides, A.K. Cheetham, H.W. Kroto, D.R.M. Walton, Nature 388 (1997) 52. J.Q. Hu et al. / Chemical ... mechanical rotary pump to a base pressure of 6 × 10⁻² Torr. The furnace was heated at a rate of 10 °C/min to 800 °C and kept at this temperature for 30 min, and then further heated to and kept at 1300...
Scientific report: "Practical very large scale CRFs" potx

Uploaded: 16/03/2014, 23:20
... resorts to scaling, a solution commonly used for HMMs. Scaling amounts to normalizing the values of α_t and β_t to one, making sure to keep track of the cumulated normalization factors so as to ... computations of exp(x) are vectorized, which provides an additional speed up of about 20%. 4.3 Optimization in Large Parameter Spaces Processing very large feature vectors, up to billions of components, ... Issues Efficiently processing very-large feature and observation sets requires to pay attention to many implementation details. In this section, we present several optimizations devised to speed up training. 4.1...
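
The scaling idea mentioned in this excerpt can be illustrated with a small sketch. It is not the paper's implementation: it assumes a toy discrete HMM (the excerpt only says scaling is commonly used for HMMs), and the names `scaled_forward`, `naive_likelihood`, `init`, `trans` and `emit` are hypothetical. Each forward vector is normalized to sum to one, and the logarithms of the normalization factors are accumulated so the sequence likelihood can still be recovered.

```python
import math


def scaled_forward(init, trans, emit, observations):
    """Forward pass with per-step scaling (toy HMM, illustrative only).

    Returns (alphas, log_likelihood). Each alphas[t] sums to one; the
    log-likelihood is the sum of the logs of the normalization factors.
    """
    n_states = len(init)
    alphas = []
    log_likelihood = 0.0
    prev = None
    for t, obs in enumerate(observations):
        if t == 0:
            raw = [init[j] * emit[j][obs] for j in range(n_states)]
        else:
            raw = [
                emit[j][obs] * sum(prev[i] * trans[i][j] for i in range(n_states))
                for j in range(n_states)
            ]
        norm = sum(raw)                         # normalization factor at step t
        prev = [value / norm for value in raw]  # rescaled so alpha_t sums to one
        alphas.append(prev)
        log_likelihood += math.log(norm)        # accumulate the scaling factors
    return alphas, log_likelihood


def naive_likelihood(init, trans, emit, observations):
    """Unscaled forward pass; underflows to 0.0 on long sequences."""
    n_states = len(init)
    alpha = [init[j] * emit[j][observations[0]] for j in range(n_states)]
    for obs in observations[1:]:
        alpha = [
            emit[j][obs] * sum(alpha[i] * trans[i][j] for i in range(n_states))
            for j in range(n_states)
        ]
    return sum(alpha)
```

On short sequences the two functions agree (after exponentiating the log-likelihood); on sequences of a few thousand observations the unscaled version underflows while the scaled one stays finite, which is the motivation for tracking the normalization factors separately.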
Scientific report: "Using Large Monolingual and Bilingual Corpora to Improve Coordination Disambiguation" ppt

Uploaded: 17/03/2014, 00:20
... asbestos w 2 h and w 1 polyvinyl English: asbestos , and polyvinyl chloride w 1 , and w 2 h chloride English: asbestos and chloride w 1 and h (no ellipsis) Portuguese: o amianto e o cloreto de ... acquire the counts using custom tools for managing web-scale N-gram 1348 Algorithm 1 The bilingual co-training algorithm: subscript m corresponds to monolingual, b to bilingual Given: • a set ... i = 0 to k do Use L m to train a classifier h m using only ¯x m , the monolingual features of ¯x Use L b to train a classifier h b using only ¯x b , the bilingual features of ¯x Use h m to label...
