... Models for Naive Bayes Text Classification. AAAI ’98 workshop on Learning for Text Categorization, pp. 41-48. K. P. Nigam, A. McCallum, S. Thrun, and T. Mitchell, 1998, Learning to Classify Text ... centroid-contexts and contexts selected by the similarity measure. Using the context-clusters as labeled training data, a Naive Bayes classifier can be built. Since the Naive Bayes classifier ... clustering algorithms for text categorization (Slonim et al., 2002). Nigam studied an Expectation Maximization (EM) technique for combining labeled and unlabeled data for text categorization in...
Uploaded: 20/02/2014, 16:20
Uploaded: 30/03/2014, 23:20
A Comparison of Event Models for Naive Bayes Text Classification
... classification performance on five text corpora. We find that the multi-variate Bernoulli performs well with small vocabulary sizes, but that the multinomial usually performs even better ... other given the context of the class. This is the so-called “naive Bayes assumption.” While this assumption is clearly false in most real-world tasks, naive Bayes often performs classification ... community about the naive Bayes classifier because there are two different generative models in common use, both of which make the “naive Bayes assumption.” Both are called naive Bayes by their...
Uploaded: 16/03/2014, 19:20
Document: Word Segmentation for Vietnamese Text Categorization: An online corpus approach
... mapping method for text categorization and retrieval. ACM Transactions on Information Systems (TOIS’94): 252-277. [18] Yiming Yang, Xin Liu 1999. A re-examination of text categorization methods. ... experiment. B. Text Categorization Experiment As we stated above, there are many approaches to the text categorization task. Nevertheless, the best-performing approach for English may not ... this task for future work. In this part, we perform a simple text categorization experiment for testing our segmentation approach based on ... However, we argue that both above formulas...
Uploaded: 12/12/2013, 11:15
Scientific report: "Exploiting Comparable Corpora and Bilingual Dictionaries for Cross-Language Text Categorization"
... and low cost solution for the Cross-Language Text Categorization task. In particular, when bilingual dictionaries/repositories are available, the performance of the categorization gets close ... of texts defined by $T^* = \bigcup_i T^i$. If the function $\psi$ exists for every text $t^i_z \in T^*$ and for every language $L^j$, and is known, then the corpus is parallel and aligned at document level. For ... Melamed. 2001. Empirical Methods for Exploiting Parallel Texts. The MIT Press. L. Rigutini, M. Maggini, and B. Liu. 2005. An EM based training algorithm for cross-language text categorization. In Proceedings...
Uploaded: 17/03/2014, 04:20
Scientific report: "A Comparison of Event Models for Naive Bayes Anti-Spam E-Mail Filtering"
Uploaded: 31/03/2014, 20:20
Document (scientific report): "An ERP-based Brain-Computer Interface for text entry using Rapid Serial Visual Presentation and Language Modeling"
... symbols is removed and, like before, the user can take a break and then the system continues with the next epoch. 3 Language Modeling Language modeling is important for many text processing applications, ... impaired users. Simpler interactions via brain-computer interfaces (BCI) hold much promise for effective text communication for these most impaired users. Yet these simple interfaces have yet to take full ... for each stimulus. Sixth, the conditional probability of each letter given the typed history is obtained from the language model. Seventh, Bayesian fusion (which assumes the EEG-based information and...
Uploaded: 20/02/2014, 05:20
Document (scientific report): "Demonstration of the UAM CorpusTool for text and image annotation"
... specific file for annotation at a specific layer (each file has a button for each layer). 3 Tag Hierarchy Editing Most of the current text annotation tools lack built-in facilities for creating ... CorpusTool allows for partially overlapping segments, and embedding of segments. Annotated texts are stored using stand-off XML, one file per source text and layer. See Figure 4 for a sample. ... been necessary where software for automatic annotation has not been available, e.g., for linguistic patterns which are not easily identified by machine, or for languages without sufficient...
Uploaded: 20/02/2014, 09:20
Document (scientific report): "Outilex, a Linguistic Platform for Text Processing"
... put in our RTNs. This can be used, for instance, to insert tags in texts and therefore formalize relations between identified segments. This formalism allows for the construction of local grammars ... Data are structured both in standard XML formats and in more compact ones. Format converters are included in the platform. The WRTN formalism allows for combining statistical methods with methods ... context-free grammar) and a text in the form of an acyclic finite state automaton (instead of a word sequence). The result of the parsing consists of a shared forest of weighted syntactic trees for...
Uploaded: 20/02/2014, 12:20
Document (scientific report): "Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales"
... examples for learning sentiment. In Workshop on the Analysis of Informal and Formal Information Exchange during Negotiations (FINEXIN). Liu, Hugo, Henry Lieberman, and Ted Selker. 2003. A model of textual ... “straightforward”, “likeable” (class 2). Some unexpected distinguishing terms for this author are “lion” for class 2 (three-class case), and for class 2 in the four-class case, “jennifer”, for ... deviation of PSP for reviews expressing different ratings. But before proceeding, we note that it is possible that similarity information might yield no extra benefit at all. For instance, we...
Uploaded: 20/02/2014, 15:20
Document (scientific report): "Fragments and Text Categorization"
... highest impact was observed for J48. Thus, for instance for Czech, it was observed for fragments that the accuracy was higher for 14 out of 15 tasks when J48 had been used, and for 12 out of 15 in ... of text categorization. For the Naïve Bayes classifier this increase is significant. 1 Motivation In the process of automatically classifying documents into several predefined classes – text categorization (Sebastiani, ... also looked for the optimal length of fragments. We found that for the lengths of fragments for the range about the average document length (in the learning set), the accuracy increased for the significant...
Uploaded: 20/02/2014, 16:20
Document (scientific report): "DiMLex: A lexicon of discourse markers for text generation and understanding"
... in text understanding In text understanding, discourse markers serve as cues for inferring the rhetorical or semantic structure of the text. In the approach proposed by Marcu [1997], for ... based formalism that allows for a compact representation of individual descriptions, hyponymic relations between them, and polysemous entries. 3 Using DiMLex in text generation Present text ... ory: Towards a functional theory of text organization." In: TEXT, 8:243-281, 1988 D. Marcu. "The rhetorical parsing of natural language text." In: Proceedings of the 35th...
Uploaded: 20/02/2014, 18:20
Data mining using the K-mean and Naive Bayes algorithms on wave
... 43
@data
f,f,e,t,n,f,c,n,k,e,e,s,s,w,w,p,w,o,p,k,s,u
x,s,y,t,a,f,c,b,k,e,c,s,s,w,w,p,w,o,p,n,n,g
b,s,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,n,m
x,y,w,t,p,f,c,n,n,e,e,s,s,w,w,p,w,o,p,k,s,u
x,s,g,f,n,f,w,b,k,t,e,s,s,w,w,p,w,o,e,n,a,g
x,y,y,t,a,f,c,b,n,e,c,s,s,w,w,p,w,o,p,k,n,g
b,s,w,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,n,m
b,y,w,t,l,f,c,b,n,e,c,s,s,w,w,p,w,o,p,n,s,m
x,y,w,t,p,f,c,n,p,e,e,s,s,w,w,p,w,o,p,k,v,g
b,s,y,t,a,f,c,b,g,e,c,s,s,w,w,p,w,o,p,k,s,m
The model test results are as follows:
=== Run information ===
Scheme: weka.classifiers.bayes.NaiveBayes
Relation: mushroom
Instances: 8124
Attributes: 23
Test mode: user ...
Rule-based methods - Naïve Bayes methods and Bayesian belief networks - Support vector machine methods (Support ...
... CSDL
DM: Data Mining
FCM: Fuzzy c-Mean (the fuzzy c-Mean algorithm)
NB: Naïve Bayes (the Naive Bayes algorithm)
FP: False positives
FN: False negatives
TP: True positives...
Uploaded: 05/03/2014, 17:56
Scientific report: "A Study on Automatically Extracted Keywords in Text Categorization"
... be state-of-the-art. 3 Text Categorization Experiments This section describes in detail the four experimental settings for the text categorization experiments. 3.1 Corpus For the text categorization ... a common form. In addition, any of a number of feature selection metrics may be applied to further reduce the space, for example chi-square, or information gain (see for example Forman (2003) for ... improve text categorization. In summary we show that a higher performance — as measured by micro-averaged F-measure on a standard text categorization collection — is achieved when the full-text...
Uploaded: 08/03/2014, 02:21
Scientific report: "A Comparison and Semi-Quantitative Analysis of Words and Character-Bigrams as Features in Chinese Text Categorization"
... Chinese text processing tasks, but no systematic comparison or analysis of their values as features for Chinese text categorization has been reported heretofore. We carry out here a full performance ... similar comparative studies have been reported for Text Categorization (Li et al., 2003) so far in literature. Text categorization and Information Retrieval are tasks that sometimes share ... Space Model (VSM) in text information processing, document indexing (term extraction) acts as a pre-requisite step in most text information processing tasks such as Information Retrieval (Baeza-Yates...
Uploaded: 08/03/2014, 02:21
Scientific report: "A Generalized Vector Space Model for Text Retrieval Based on Semantic Relatedness"
... reveals that semantic information can boost text retrieval performance with the use of the proposed GVSM. 1 Introduction The use of semantic information into text retrieval or text classification has ... controversial. For example in Mavroeidis et al. (2005) it was shown that a GVSM using WordNet (Fellbaum, 1998) senses and their hypernyms, improves text classification performance, especially for small ... representing users’ information needs. Let also $t_i$ symbol- ... showed that this can improve text categorization. Stokoe et al. (Stokoe et al., 2003) reported an improvement in retrieval performance using...
Uploaded: 08/03/2014, 21:20
Scientific report: "Evaluating Centering-based metrics of coherence for text structuring using a reliably annotated corpus"
... Entity Coherence for Descriptive Text Structuring. Ph.D. thesis, Division of Informatics, University of Edinburgh. Rodger Kibble and Richard Power. 2000. An integrated framework for text planning ... Evaluating the coherence of a text and text structuring The statistics about transitions computed as just discussed can be used to determine the degree to which a text conforms with, or violates, Centering’s ... M.CHEAP. The exact number of BfCs for which the classification rate of M.NOCB is lower than its competitor for each comparison is reported in the next column of the Table. For example, M.NOCB has...
Uploaded: 17/03/2014, 06:20
Scientific report: "Modeling Topic Dependencies in Hierarchical Text Categorization"
Uploaded: 23/03/2014, 14:20
Scientific report: "A High-Performance Semi-Supervised Learning Method for Text Chunking"
Uploaded: 23/03/2014, 19:20
Scientific report: "Discourse Structures for Text Generation"
Uploaded: 24/03/2014, 01:21