A DOM Tree Alignment Model for Mining Parallel Data from the Web

Scientific report: "A DOM Tree Alignment Model for Mining Parallel Data from the Web"

... proposes a new web parallel data mining scheme. Given a pair of parallel web pages as seeds, the Document Object Model (DOM) is used to represent the web pages as a pair of DOM trees. Then a ... web pages manually labeled as parallel or non-parallel. The Iterative Scaling algorithm (Pietra, Pietra and Lafferty 1995) is used for the training. Section 7, Experimental Results: The DOM tree alignment ... discovered are regarded as anchors to new parallel data. This makes the mining scheme an iterative process. The new mining scheme has three advantages: (i) Mining coverage is increased. Parallel...
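The first step of the scheme, representing each page of a seed pair as a DOM tree, can be sketched with Python's standard `html.parser` module. This is a minimal sketch under stated assumptions: `Node`, `parse_dom`, and `tag_sequence` are hypothetical helpers, the tree keeps only tag names and stripped text, and comparing preorder tag sequences is a crude structural check, not the paper's DOM tree alignment model.

```python
from dataclasses import dataclass, field
from html.parser import HTMLParser
from typing import List

# HTML void elements never get an end tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


@dataclass
class Node:
    """Hypothetical simplified DOM node: tag name, children, visible text."""
    tag: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""


class TreeBuilder(HTMLParser):
    """Builds a Node tree from the stdlib parser's start/end/data events."""

    def __init__(self) -> None:
        super().__init__()
        self.root = Node("#document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Node(tag)
        self.stack[-1].children.append(node)
        if tag not in VOID_TAGS:
            self.stack.append(node)

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].text += data.strip()


def parse_dom(html: str) -> Node:
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def tag_sequence(node: Node) -> List[str]:
    """Preorder list of tag names, used here as a crude structural fingerprint."""
    return [node.tag] + [t for child in node.children for t in tag_sequence(child)]


if __name__ == "__main__":
    en = "<html><body><h1>Welcome</h1><p>Hello<br>world</p></body></html>"
    zh = "<html><body><h1>欢迎</h1><p>你好<br>世界</p></body></html>"
    print(tag_sequence(parse_dom(en)) == tag_sequence(parse_dom(zh)))  # True
```

The alignment of the two trees itself, which the snippet elides, is not reproduced here.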

Scientific report: "Mining Parenthetical Translations from the Web by Word Alignment"

... parenthetical translations are extremely valuable both as a stand-alone online dictionary and as training data for statistical machine translation systems. They provide fresh data (new words) and cover ... that Exact Match is a rather stringent criterion. Table 7 shows a random sample of extracted parenthetical translations that failed the Exact Match test. Only a small percentage of them are ... on a partially parallel corpus to extract translation pairs from the web. Treating the translation extraction problem as a word alignment problem allowed us to generalize across instances...

Scientific report: "A Tree Transducer Model for Synchronous Tree-Adjoining Grammars"

... we assume that all adjunctions are mandatory; i.e., if an auxiliary tree can be adjoined, then we need to make an adjunction. Thus, a derivation starting from an initial tree to a derived tree ... 2010. ©2010 Association for Computational Linguistics. A Tree Transducer Model for Synchronous Tree-Adjoining Grammars. Andreas Maletti, Universitat Rovira i Virgili, Avinguda de Catalunya 25, 43002 Tarragona, ... auxiliary tree by a special marker. Traditionally, the root label A of an auxiliary tree is replaced by A∅ once adjoined. Since we assume that there are no auxiliary trees with such a root label,...

Scientific report: "A Syntax-Driven Bracketing Model for Phrase-Based Translation"

... to the word alignments, we define bracketable and unbracketable instances. For each of these instances, we automatically extract relevant syntactic features from the source parse tree as bracketing ... considered as a syntactic constraint. Therefore we can use thousands of syntactic constraints to guide phrase translation. The SDB model maintains and protects the strength of the phrase-based approach ... in a better way than the CMVC does. It is able to reward non-syntactic translations by assigning an adequate probability to them if these translations are appropriate to particular syntactic...

Scientific report: "A Unified Syntactic Model for Parsing Fluent and Disfluent Speech"

... modified for use in a special repair grammar, which not only reduces the amount of available training data, but violates our intuition that most reparanda are fluent up until the actual edit occurs. The ... is the fact that there is often a good deal of overlap in words between the reparandum and the alteration, as speakers may trace back several words when restarting after an error. For instance, ... Communication Research Centre, University of Edinburgh. John Hale, Izhak Shafran, Lisa Yung, Bonnie Dorr, Mary Harper, Anna Krasnyanskaya, Matthew Lease, Yang Liu, Brian Roark, Matthew Snover, and...

Scientific report: "A Phrase-based Statistical Model for SMS Text Normalization"

... groups and domains can be modeled separately without accessing and adapting the language model of the MT system for each SMS application. Another advantage is that the normalization module can ... normalization as a translation problem from the SMS language to the English language and we propose to adapt a phrase-based statistical MT model for the task. Evaluation by 5-fold cross validation ... a consensus translation technique to bootstrap parallel data using off-the-shelf translation systems for training a hierarchical statistical translation model for general domain instant...

Scientific report: "GPSM: A Generalized Probabilistic Semantic Model for Ambiguity Resolution"

... environment. The open test performance can be attributed to the small database size and the estimation error of the parameters thus introduced. Because the training database is small with respect ... sentences while using the remaining parts as the training set. The overall performance is then estimated as the average performance of the 10 iterations. The performance is evaluated in terms of ... under a uniform formulation. The semantic score measure shows substantial improvement in structural disambiguation over a syntax-based approach. 1. Introduction: In a large natural language...
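The evaluation protocol in this snippet (train on the remaining parts, then average the performance over the 10 iterations) is k-fold cross-validation. A minimal sketch, assuming a caller-supplied `train_and_score` function and a round-robin split of the data into folds; both are illustrative choices, not details taken from the GPSM paper.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def k_fold_average(data: Sequence[T], k: int,
                   train_and_score: Callable[[List[T], List[T]], float]) -> float:
    """Average score over k runs, each holding out a different fold as the test set."""
    if not 2 <= k <= len(data):
        raise ValueError("k must be between 2 and len(data)")
    # Round-robin split (an assumption): item i goes to fold i % k.
    folds = [list(data[i::k]) for i in range(k)]
    scores = []
    for i, test_fold in enumerate(folds):
        # Train on every fold except the held-out one.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_score(train, test_fold))
    return sum(scores) / k


if __name__ == "__main__":
    sentences = [f"sentence {n}" for n in range(20)]

    def fraction_held_out(train, test):
        # Hypothetical toy scorer: share of the data placed in the test fold.
        return len(test) / (len(train) + len(test))

    print(k_fold_average(sentences, k=10, train_and_score=fraction_held_out))
```

With `k=5` the same function gives the 5-fold protocol mentioned in the SMS normalization entry.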

Scientific report: "Trophoblast-like human choriocarcinoma cells serve as a suitable in vitro model for selective cholesteryl ester uptake from high density lipoproteins"

... island and cell columns, is formed which maintains the ability of proliferation and invasion. Choriocarcinoma is a malignant neoplasm that represents the early trophoblast of the attachment phase ... TOPO-TA cloning vector and sequenced. Northern blot analysis: Total RNA was isolated from choriocarcinoma and human liver tissues (used as a positive control) by the RNA-easy kit (Qiagen) exactly as ... of lipoprotein-associated cholesterol across the placenta from the maternal circulation [4–8]. The fact that the placenta binds and internalizes maternal lipoproteins both in vivo and in vitro...

Scientific report: "A mouse model for in vivo tracking of the major dust mite allergen Der p 2 after inhalation"

... band) that the clearance and further metabolism of the allergen was altered as a result of the inflammation in the lungs of sensitized animals. Up to now there are few data available on the fate of ... day 30 displayed an airway inflammation 18 h after treatment, in the same magnitude as in animals challenged with a third HDM aerosol on day 30 (data not shown). The animals were killed after ... visualized by autoradiography using a PhosphorImager with the image quant software (both from Molecular Dynamics, Sunnyvale, CA, USA). As standards for SDS/PAGE autoradiography, ⁷⁵Se-labelled...

Scientific report: "A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness"

... the M&C data set. It surpasses HR though in the R&G and the 353-C data set. The latter contains the word pairs of the M&C data set. To visualize the performance of our measure in a more comprehensible ... pattern as the human ratings, as closely as our measure of relatedness does (low y values for small x values and high y values for high x). The same pattern applies in the M&C and 353-C data ... standard (human judgements). The correlations for the three data sets show that SR performs better than any other measure of semantic relatedness, besides the case of (HR) in the M&C data...

Scientific report: "A Class-Based Agreement Model for Generating Accurately Inflected Translations"

... 5. The inputs are a translation hypothesis e_1^I, an index n distinguishing the prefix from the attachment, and a flag indicating if their concatenation is a goal hypothesis. The beam search maintains ... from phrase-structure trees. Williams and Koehn (2011) annotated German trees, and extracted translation rules from them. They then specified manual unification rules, and applied a penalty according ... of the source. This large gap between the unigram recall of the actual translation output (top) and the lexical coverage of the phrase-based model (bottom) indicates that translation performance...

Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging"

... predict the POS tag with positional information for each character. Each character can be assigned one of two possible boundary tags: “B” for a character that begins a word and “I” for a character ... readers to read the above paper for details. For parameter estimation, our work adopts the Passive-Aggressive (PA) framework (Crammer et al., 2006), a family of margin-based online learning algorithms. ... segmentation can also be formulated as a sequential classification problem to predict whether a character is located at the beginning of, inside or at the end of a word. This character-by-character...
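The character-level tagging described in this snippet can be sketched as follows, assuming a pre-segmented, POS-tagged sentence as input. `words_to_joint_tags` and `tags_to_words` are hypothetical helpers, and joining the boundary tag and the POS tag with a hyphen (e.g. `B-NN`) is an illustrative convention, not necessarily the tag format used in the paper.

```python
from typing import List, Tuple


def words_to_joint_tags(words: List[str], pos_tags: List[str]) -> List[str]:
    """Tag each character 'B' (begins a word) or 'I' (inside it), joined with the word's POS."""
    if len(words) != len(pos_tags):
        raise ValueError("need exactly one POS tag per word")
    tags: List[str] = []
    for word, pos in zip(words, pos_tags):
        for i in range(len(word)):
            tags.append(("B" if i == 0 else "I") + "-" + pos)
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[Tuple[str, str]]:
    """Invert the tagging: rebuild (word, POS) pairs from characters and joint tags."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    result: List[Tuple[str, str]] = []
    for char, tag in zip(chars, tags):
        boundary, pos = tag.split("-", 1)
        if boundary == "B" or not result:
            result.append((char, pos))  # 'B' (or a stray leading 'I') opens a new word
        else:
            word, word_pos = result[-1]
            result[-1] = (word + char, word_pos)  # 'I' extends the current word
    return result


if __name__ == "__main__":
    words, pos = ["我们", "喜欢", "中文"], ["PN", "VV", "NN"]
    tags = words_to_joint_tags(words, pos)
    print(tags)  # ['B-PN', 'I-PN', 'B-VV', 'I-VV', 'B-NN', 'I-NN']
    print(tags_to_words("".join(words), tags))
```

The beginning/inside/end variant mentioned at the end of the snippet would add a third boundary tag for word-final characters.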

Scientific report: "A Language-Independent Unsupervised Model for Morphological Segmentation"

... making available the code of the RePortS algorithm, and Stefan Bordag and Delphine Bernhard for running their algorithms on the German data. Many thanks also to Matti Varjokallio for evaluating ... presented here have been shown to improve accuracy (Kurimo et al., 2006). Another motivation for evaluating the system on a task rather than on manually annotated data is that linguistically motivated morphological ... the most probable affixes from both ends of the word, all possible segmentations of the word are generated and ranked using the language model. The probabilities for the language model are learnt...
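The last sentence of this snippet describes generating every segmentation of a word and ranking the candidates with a language model. A minimal sketch, assuming a toy unigram model over morphs whose log-probabilities are invented for illustration; `all_segmentations`, `rank_segmentations`, and the fixed penalty for unknown morphs are hypothetical, and the paper's affix-selection step and actual language model are not reproduced.

```python
import math
from typing import Dict, List, Tuple


def all_segmentations(word: str) -> List[List[str]]:
    """Every split of `word` into contiguous non-empty pieces (2**(n-1) for n > 0 chars)."""
    if not word:
        return [[]]
    results: List[List[str]] = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        for rest in all_segmentations(word[i:]):
            results.append([head] + rest)
    return results


def rank_segmentations(word: str, morph_logprob: Dict[str, float],
                       unknown_logprob: float = -20.0) -> List[Tuple[List[str], float]]:
    """Score each segmentation as a sum of unigram morph log-probabilities, best first.

    Morphs missing from the (hypothetical) model get a fixed penalty `unknown_logprob`.
    """
    scored = []
    for seg in all_segmentations(word):
        score = sum(morph_logprob.get(m, unknown_logprob) for m in seg)
        scored.append((seg, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


if __name__ == "__main__":
    # Invented toy probabilities, for illustration only.
    toy_model = {"walk": math.log(0.01), "ing": math.log(0.05)}
    for seg, score in rank_segmentations("walking", toy_model)[:3]:
        print("+".join(seg), round(score, 2))
```

Because a word of n characters has 2^(n-1) segmentations, exhaustive enumeration is only practical for short words or for candidate sets that have already been narrowed down, for example by first choosing likely affixes.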

Scientific report: "A Hierarchical Phrase-Based Model for Statistical Machine Translation"

... build only partial translations using hierarchical phrases, and then combine them serially as in a standard phrase-based model. For a partial example of a synchronous CFG derivation, see Figure ... Pharaoh, a state-of-the-art phrase-based system. 1 Introduction: The alignment template translation model (Och and Ney, 2004) and related phrase-based models advanced the previous state of the art ... Yonggang Deng, and William Byrne. 2005. A weighted finite state transducer translation template model for statistical machine translation. Natural Language Engineering. To appear. Daniel Marcu and...

Scientific report: "A Generative Constituent-Context Model for Improved Grammar Induction"

... performance or data-likelihood. However, parameter search methods have a potential advantage. By aggregating over only valid, complete parses of each sentence, they naturally incorporate the ... shown, but are modeled. Parameter search is also local; parameters which are locally optimal may be globally poor. A concrete example is the experiments from (Carroll and Charniak, 1992). They restricted ... Communication Papers for the 97th Meeting of the Acoustical Society of America, pages 547–550. Eric Brill. 1993. Automatic grammar induction and parsing free text: A transformation-based approach...