a novel word segmentation approach

Scientific report: "A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers" pptx

... Korean language, many researchers have adopted a traditional WS approach, which eliminates all spaces in the user input and re-inserts proper word boundaries. Unfortunately, such an approach ... language, the majority of recent research has been based on a traditional WS approach (Nakagawa, 2004). The first step of the traditional approach is to eliminate all spaces in the user input, and then ... the ACL-IJCNLP 2009 Conference Short Papers, pages 29–32, Suntec, Singapore, 4 August 2009. © 2009 ACL and AFNLP A Novel Word Segmentation Approach for Written Languages with Word Boundary Markers Han-Cheol...

Uploaded: 17/03/2014, 02:20

4 268 0
Document: Scientific report: "A Novel Feature-based Approach to Chinese Entity Relation Extraction" ppt

... tree-kernel approaches are not suitable for Chinese, at least at current stage. In this paper, we study a feature-based approach that basically integrates entity related information with context ... name list and personal relative trigger word list. Jiang and Zhai (2007) then systematically explored a large space of features and evaluated the effectiveness of different feature subspaces ... extraction has been extensively studied in English over the past years. It is typically cast as a classification problem. Existing approaches include feature-based and kernel-based classification....

Uploaded: 20/02/2014, 09:20

4 480 0
Scientific report: A novel 2D-based approach to the discovery of candidate substrates for the metalloendopeptidase meprin pot

... oxidation of methionine and variable deamidation of asparagine and glutamine. Parent and fragment mass tolerances were set to 1 Da. Up to two missed cleavages and half tryptic peptides were allowed. ... JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M et al. (2003) Development of human protein reference database as an initial platform ... FEBS A novel 2D-based approach to the discovery of candidate substrates for the metalloendopeptidase meprin Daniel Ambort 1, Daniel Stalder 2, Daniel Lottaz 1, Maya Huguenin 1, Beatrice Oneda 1,...

Uploaded: 07/03/2014, 06:20

20 506 0
Scientific report: A novel mass spectrometric approach to the analysis of hormonal peptides in extracts of mouse pancreatic islets ppt

... Both end-plate potentials of the ion trap were set at 1.5 V and the duration of the electron pulse was 100 ms. Data acquisition and handling Primary data analysis was performed on a workstation running ... 993–998. 16. Tanaka, Y., Sato, I., Iwai, C., Kosaka, T., Ikeda, T. & Nakamura, T. (2001) Identification of human liver diacetyl reductases by nano-liquid chromatography/Fourier transform ion ... were fed a standard pellet diet and tap water ad libitum. Appropriate measures were taken to minimize pain and discomfort for the mice, which were maintained in accordance with the National Institutes...

Uploaded: 17/03/2014, 10:20

7 491 0
Document: Word Segmentation for Vietnamese Text Categorization: An online corpus approach pptx

... categorization evaluation based on our word segmentation approach. Due to the fact that our approach use internet-based statistic, we harvest news abstracts from many online newspapers 3 ... lexicon and/or a large and trusted training corpus. Character-based approaches (syllable-based in Vietnamese case) purely extract certain number of characters (syllable). It can further be classified ... our segmentation approach based on 172 3 However, we argue that both above formulas have some drawbacks. Most of Vietnamese 4-grams are actually the combination of two 2-syllable words,...

Uploaded: 12/12/2013, 11:15

6 742 1
Document: Scientific report: "Joint Word Segmentation and POS Tagging using a Single Perceptron" docx

... character) 12 tag t on a word starting with char c 0 and containing char c 13 tag t on a word ending with char c 0 and containing char c 14 tag t on a word containing repeated char cc 15 tag ... the tag set (T = 1 for pure word segmentation). It worked well for word segmentation alone (Zhang and Clark, 2007), even with an agenda size as small as 8, and a simple beam search algorithm also ... Treebank data, the joint model gave an error reduction of 14.6% in segmentation accuracy and 12.2% in the overall segmentation and tagging accuracy, compared to the traditional pipeline approach. In...

Uploaded: 20/02/2014, 09:20

9 576 0
Scientific report: "Incremental Joint Approach to Word Segmentation, POS Tagging, and Dependency Parsing in Chinese" potx

... Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara. 2009. An error-driven word-character hybrid model for joint Chinese word segmentation and POS tagging. ... Kazama, Yoshimasa Tsuruoka, Wenliang Chen, Yujie Zhang, and Kentaro Torisawa. 2011. Improving Chinese word segmentation and POS tagging with semi-supervised methods using large auto-analyzed data. ... T01–05 are taken from Zhang and Clark (2010), and P01–P28 are taken from Huang and Sagae (2010). Note that not all features are always considered: each feature is only considered if the action...

Uploaded: 07/03/2014, 18:20

9 524 0
Scientific report: "Exploring Deterministic Constraints: From a Constrained English POS Tagger to an Efficient ILP Solution to Chinese Word Segmentation" ppt

... possible tags, i.e. all tag types that are assigned to the word in training data. Furthermore, we approximate unknown words in testing data by rare words in training data. For a word that occurs ... character-based features in word-based models. Consider a character-based feature function φ(c, t, c) that maps a character-tag pair to a high-dimensional feature space, with respect to an input character ... character-based feature templates defined in Section 3.1 are naturally used in a word-based model. When character-based features are incorporated into word-based CWS models, some word-based features...

Uploaded: 07/03/2014, 18:20

9 425 0
Scientific report: "A Cascaded Linear Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" pdf

... Philadelphia, PA 19104, USA jiangwenbin@ict.ac.cn lhuang3@cis.upenn.edu Abstract We propose a cascaded linear model for joint Chinese word segmentation and part-of-speech tagging. With a character-based perceptron ... approach of discriminative models treats segmentation as a labelling problem by assigning each character a boundary tag (Xue and Shen, 2003), Joint S&T can be conducted in a labelling fashion ... trained a 3-gram word language model measuring the fluency of the segmentation result, a 4-gram POS language model functioning as the product of state-transition probabilities in HMM, and a...

Uploaded: 08/03/2014, 01:20

8 445 0
One-dimensional organic nanostructures: a novel approach based on the selective adsorption of organic molecules on silicon nanowires

... Sahaf, L. Masson, C. Leandri, B. Auffray, G. Le Lay, F. Ronci, Appl. Phys. Lett. 90 (2007) 263110. [3] M.A. Valbuena, J. Avila, M.E. Davila, C. Leandri, B. Aufray, G. Le Lay, M.C. Asensio, Appl. ... serving as a good approximation of the local density of states (LDOSs) [6–8]. A single crystal Ag(110) purchased from Mateck was prepared by several cycles of Ar-ion sputtering (500 eV) and annealing (690 ... adsorption on a clean Ag(110) surface [10]. The reactivity of the Ag surface is presumably locally modified by the SiNWs, possibly by the formation of a 2D surface Si–Ag alloy, as in the case of Si adsorbed...

Uploaded: 16/03/2014, 15:35

5 466 0
Scientific report: "A Stacked Sub-Word Model for Joint Chinese Word Segmentation and Part-of-Speech Tagging" potx

... systems (Ng and Low, 2004; Jiang et al., 2008a; Zhang and Clark, 2008). 2.2 Character-Based and Word-Based Methods Two kinds of approaches are popular for joint word segmentation and POS tagging. ... information for each character. Each character can be assigned one of two possible boundary tags: “B” for a character that begins a word and “I” for a character that occurs in the middle of a word. ... the “character-based” approach, where basic processing units are characters which compose words. In this kind of approach, the task is formulated as the classification of characters into POS tags...

Uploaded: 17/03/2014, 00:20

10 412 0
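
The B/I boundary tagging described in the snippet above can be illustrated with a minimal Python sketch. The function names `words_to_tags` and `tags_to_words` and the toy input are hypothetical and chosen only for illustration; they are not taken from the paper, whose actual model is not shown in the truncated snippet.

```python
def words_to_tags(words):
    """Turn a segmented sentence into one boundary tag per character.

    "B" marks a character that begins a word, "I" a character inside a word.
    Empty strings are skipped (an assumption of this sketch).
    """
    tags = []
    for word in words:
        if word:
            tags.extend(["B"] + ["I"] * (len(word) - 1))
    return tags


def tags_to_words(chars, tags):
    """Rebuild the word sequence from characters and their B/I tags."""
    words = []
    for ch, tag in zip(chars, tags):
        if tag == "B" or not words:
            words.append(ch)      # start a new word
        else:
            words[-1] += ch       # extend the current word
    return words


# Toy example with Latin letters standing in for Chinese characters.
segmented = ["ab", "c", "def"]
tags = words_to_tags(segmented)
restored = tags_to_words("".join(segmented), tags)
```

Under this encoding, segmentation becomes a per-character classification problem: predicting the tag sequence is equivalent to predicting the word boundaries, and the round trip above recovers the original words.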
Scientific report: "Parsing the Internal Structure of Words: A New Paradigm for Chinese Word Segmentation" doc

... on Innovative applications of artificial intelligence, AAAI’97/IAAI’97, pages 598–603. AAAI Press. Michael Collins and Terry Koo. 2005. Discriminative reranking for natural language parsing. ... part-of-speech tagged. That is, the bracketing in our case is around characters instead of words. Another observation is we can still evaluate Chinese word segmentation and part-of-speech tagging accuracy, ... of the AFNLP, pages 522–530, Suntec, Singapore, August. Association for Computational Linguistics. Canasai Kruengkrai, Kiyotaka Uchimoto, Jun’ichi Kazama, Yiou Wang, Kentaro Torisawa, and Hitoshi Isahara....

Uploaded: 17/03/2014, 00:20

10 476 0
Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study" potx

... and Representation: Bootstrapping Annotated Language Data. David Chiang. 2007. Hierarchical phrase-based translation. Computational Linguistics, pages 201–228. Michael Collins and Brian Roark. ... ACL and AFNLP Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study Wenbin Jiang † Liang Huang ‡ Qun Liu † † Key Lab. of Intelligent Information ... liuqun}@ict.ac.cn liang.huang.sh@gmail.com Abstract Manually annotated corpora are valuable but scarce resources, yet for many annotation tasks such as treebanking and sequence labeling there...

Uploaded: 17/03/2014, 01:20

9 404 0
Scientific report: "A Trainable Rule-based Algorithm for Word Segmentation" pdf

... before AB any Move from after trigram ABC to before ABC any Figure 1: Possible transformations. A, B, C, J, and K are specific characters; x and y can be any character. ~J and ~K can be any character ... disambiguation (Oflazer and Tur, 1996), and phrase parsing (Vilain and Day, 1996). 2.1 Training Word segmentation can easily be cast as a transformation-based problem, which requires an initial ... encountered, each of the characters was treated as a separate word, as in the CAW algorithm above. This variation of the greedy algorithm, using the same list of 57472 words, produced an initial score...

Uploaded: 17/03/2014, 23:20

8 470 0
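
The snippet above mentions a greedy algorithm that uses a word list and treats each character as a separate word when no match is found. A minimal sketch of that general idea (forward maximum matching) follows. The function name `greedy_segment`, the `max_len` parameter, and the toy lexicon are assumptions made for illustration; the paper's exact variant is not visible in the truncated snippet.

```python
def greedy_segment(text, lexicon, max_len=None):
    """Segment text left to right, always taking the longest lexicon match.

    When no lexicon entry matches at the current position, the single
    character is emitted as its own word (the fallback described above).
    """
    if max_len is None:
        # Longest lexicon entry bounds the search window; at least 1.
        max_len = max(1, max((len(w) for w in lexicon), default=1))
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until something fits.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


# Toy lexicon; Latin letters stand in for the characters of an unsegmented script.
toy_lexicon = {"ab", "abc", "d"}
result = greedy_segment("abcdx", toy_lexicon)
```

In a transformation-based setup such as the one the snippet describes, output like this would serve only as the initial segmentation that learned rules then correct.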
Scientific report: "A Word-Class Approach to Labeling PSCFG Rules for Machine Translation" pot

... Empirical Methods in Natural Language Processing (EMNLP). Masaaki Nagata, Kuniko Saito, Kazuhide Yamamoto, and Kazuteru Ohashi. 2006. A clustered global phrase reordering model for statistical machine ... sparser syntax model, the syntax grammar also contains the hierarchical grammar as a backbone (cf. Zollmann and Vogel (2010) for details and empirical analysis). We implemented our rule labeling ... of morphologically similar words into the same class. 3 Ashish Venugopal and Andreas Zollmann. 2009. Grammar based statistical MT on Hadoop: An end-to-end toolkit for large scale PSCFG based MT. The Prague Bulletin...

Uploaded: 23/03/2014, 16:20

11 424 0
