06 - automatic personalized spam filtering through significant word modeling

... spam_score = $\sum_{i} W_{S_i}$ (sum is over all significant spam words in the e-mail); nonspam_score = $\sum_{i} W_{N_i}$ (sum is over all significant non-spam words in the e-mail); if ($s \cdot$ spam_score > nonspam_score) ... D) $C_{S_i}$ = count of word $i$ in all spam e-mails; $C_{N_i}$ = count of word $i$ in all non-spam e-mails; $Z_S$ = set of significant spam words; $Z_N$ = set of significant n...

Uploaded: 22/03/2014, 22:25
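
The scoring rule quoted in the significant-word-modeling snippet above (sum the weights of the significant spam words and of the significant non-spam words found in an e-mail, then flag it as spam when s · spam_score > nonspam_score) can be sketched in Python as follows. This is a minimal illustration, not the paper's implementation: the function names and the example words and weights are hypothetical, and the weights W_Si and W_Ni are passed in as given because the snippet elides how they are derived from C_Si and C_Ni. It also assumes each significant word contributes once per e-mail, which the snippet does not settle.

```python
from collections import Counter
from typing import Dict, Iterable, List


def corpus_counts(emails: Iterable[List[str]]) -> Counter:
    """Total count of each word across a set of tokenized e-mails.

    Applied to all spam e-mails this gives C_Si; applied to all
    non-spam e-mails it gives C_Ni.
    """
    counts: Counter = Counter()
    for words in emails:
        counts.update(words)
    return counts


def is_spam(words: List[str],
            spam_weights: Dict[str, float],
            nonspam_weights: Dict[str, float],
            s: float = 1.0) -> bool:
    """Apply the rule s * spam_score > nonspam_score.

    The keys of spam_weights play the role of Z_S and the keys of
    nonspam_weights the role of Z_N. The weights themselves are taken
    as given (hypothetical), since their derivation is not shown.
    """
    present = set(words)  # assumption: each significant word counts once per e-mail
    spam_score = sum(w for word, w in spam_weights.items() if word in present)
    nonspam_score = sum(w for word, w in nonspam_weights.items() if word in present)
    return s * spam_score > nonspam_score


if __name__ == "__main__":
    spam_w = {"free": 2.0, "winner": 3.0}        # hypothetical W_Si values
    nonspam_w = {"meeting": 2.5, "report": 1.5}  # hypothetical W_Ni values
    print(is_spam(["free", "winner", "meeting"], spam_w, nonspam_w))  # True: 5.0 > 2.5
    print(is_spam(["meeting", "report", "free"], spam_w, nonspam_w))  # False: 2.0 < 4.0
```

The factor s acts as a bias knob: raising it makes the filter more willing to call a message spam, so with s = 2.5 the second example would flip to spam (5.0 > 4.0).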

Scientific report: "Automatic Evaluation of Chinese Translation Output: Word-Level or Character-Level"

... (HLT-NAACL 2003) - short papers, pages 61-63, Edmonton, Canada. Hwee Tou Ng and Jin Kiat Low, 2004. Chinese Part-of-Speech Tagging: One-at-a-Time or All-at-Once? Word-Based or Character-Based? ... state-of-the-art CWS systems in the CIPS-SIGHAN’2010 evaluation. This tool is trained on Chinese Treebank 6.0. 4 Experimental Results 4.1 Data To compare the word-level automati...

Uploaded: 17/03/2014, 00:20

Scientific report: "Automatic Adaptation of Annotation Standards: Chinese Word Segmentation and POS Tagging – A Case Study"

... Low (2004). For word segmentation only, there are four boundary tags: • b: the beginning of the word • m: the middle of the word • e: the end of the word • s: a single-character word, while for Joint ... of standard-adaptation for word segmentation. Suppose we are considering the third character “…” in “…”. annotation-styles. Gao et al. (2004) described a transformation-bas...

Uploaded: 17/03/2014, 01:20
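
The four boundary tags listed in the annotation-adaptation snippet (b, m, e, s) can be illustrated with a small Python sketch that converts a segmented sentence into per-character tags and back again. The function names and the example sentence are illustrative assumptions, not code from the paper.

```python
from typing import List, Tuple


def bmes_tags(words: List[str]) -> List[Tuple[str, str]]:
    """Tag every character of a segmented sentence with b/m/e/s.

    Assumes every word is a non-empty string.
    """
    tagged: List[Tuple[str, str]] = []
    for word in words:
        if len(word) == 1:
            tagged.append((word, "s"))                     # single-character word
        else:
            tagged.append((word[0], "b"))                  # beginning of the word
            tagged.extend((ch, "m") for ch in word[1:-1])  # middle characters
            tagged.append((word[-1], "e"))                 # end of the word
    return tagged


def words_from_tags(tagged: List[Tuple[str, str]]) -> List[str]:
    """Recover the word sequence: a word closes at every 'e' or 's' tag."""
    words: List[str] = []
    current = ""
    for ch, tag in tagged:
        current += ch
        if tag in ("e", "s"):
            words.append(current)
            current = ""
    if current:  # tolerate an unfinished word at the end of a malformed sequence
        words.append(current)
    return words


if __name__ == "__main__":
    sentence = ["我", "喜欢", "北京大学"]  # illustrative segmentation
    tags = bmes_tags(sentence)
    print(" ".join(f"{ch}/{t}" for ch, t in tags))
    # 我/s 喜/b 欢/e 北/b 京/m 大/m 学/e
    assert words_from_tags(tags) == sentence
```

Because the mapping is lossless, segmentation can be treated as per-character sequence labeling and the words read back off the predicted tags.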

02 - spam filtering using spam mail communities

... Applications and the Internet (SAINT’05) 0-7695-2262-9/05 $20.00 IEEE the word ‘sex’ would be spam 99% of the time’. Thus a person who would like to receive porn spam but not others can also be accommodated. ... Approaches to filter spam The current techniques filter spam mail by classifying a message as either spam or non-spam (legitimate). M...

Uploaded: 22/03/2014, 22:25

03 - spam filtering based on preference ranking

... 0-7695-2432-X/05 $20.00 © 2005 IEEE An additional layer in the spam filtering process is presented as a new spam filter [5]. This filter is based on a representative vocabulary. Spam e-mails ... Section 4 provides our experimental results and analysis. Finally, we summarize this chapter. 2 Anti-Spam Technologies and Related...

Uploaded: 22/03/2014, 22:25

04 - collaborative spam filtering using e-mail networks

... such collaborative spam-filtering mechanisms can be implemented as plug-ins to popular e-mail programs such as Microsoft Outlook. COLLABORATIVE SPAM-FILTERING MECHANISMS Our spam-filtering system ... linearly with its e-mail activity level, that is, its degree in the network. [Figure: e-mail network of nodes A, B, C, D; edge labels: 2 e-mails (0.2), 10 e-mails (0.5), 9 e-mails (1.0), 8 e-mails (0.8), 5 e-mails (0.25), 5 e-mails (0.25), 7 e-m...]

Uploaded: 22/03/2014, 22:25

05 - current and new developments in spam filtering

... recent years with new, ML-based technologies. In the last 3-4 years, substantial academic research has taken place to evaluate new ML-based approaches to filtering spam. ML filtering techniques ... Bayesian filtering, along with its variants, provides the greatest potential for future spam prevention. 1. Introduction Constructing a single model to classify a broad r...

Uploaded: 22/03/2014, 22:25

Scientific report: "Using Rejuvenation to Improve Particle Filtering for Bayesian Word Segmentation"

... -265.40; PF 16: 62.34, -239.22, 50.14, -262.34; PF 1000: 64.11, -234.87, 57.88, -254.17; PF 1,100: 63.17, -245.32, 66.88, -257.65; PF 16,100: 68.05, -235.71, 70.05, -251.66; PF 1,1600: 77.06, -228.79, 74.47, -249.78. Table 1: Results ... (Goldwater, 2006; Goldwater et al., 2009) A sequence of words or utterance is generated by making independent draws from a discrete di...

Uploaded: 30/03/2014, 17:20

Scientific report: "The Contribution of Stylistic Information to Content-based Mobile Spam Filtering"

... varies between spam and non-spam messages. For instance, non-spammers often use special characters to create emoticons to express their mood, such as “:-)” (smiling) or “T T” (crying), whereas spammers ... including words, character n-grams, and orthogonal sparse word bigrams (OSB)³. This feature set represents the content-based approaches previously proposed by Góm...

Uploaded: 31/03/2014, 00:20
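
The stylistic-information snippet lists character n-grams and orthogonal sparse word bigrams (OSB) among its content-based features. A minimal Python sketch of both follows. It assumes a sliding window of five tokens for OSB and an explicit <skip> marker for each skipped position; the snippet states neither, so these choices and the function names are assumptions, not the paper's feature extractor.

```python
from typing import List


def osb_features(tokens: List[str], window: int = 5) -> List[str]:
    """Orthogonal sparse bigrams: pair each token with each of the
    window - 1 tokens before it, marking skipped positions explicitly."""
    features: List[str] = []
    for i, current in enumerate(tokens):
        for dist in range(1, window):
            j = i - dist
            if j < 0:
                break
            gap = "<skip> " * (dist - 1)  # one marker per skipped token
            features.append(f"{tokens[j]} {gap}{current}")
    return features


def char_ngrams(text: str, n: int = 3) -> List[str]:
    """Overlapping character n-grams of a message."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


if __name__ == "__main__":
    print(osb_features(["win", "a", "free", "phone"]))
    # ['win a', 'a free', 'win <skip> free', 'free phone',
    #  'a <skip> phone', 'win <skip> <skip> phone']
    print(char_ngrams("free :-)", 3))  # the last trigram is ':-)'
```

Keeping the gap markers makes "win free" (adjacent) and "win <skip> free" (one word apart) distinct features, while character n-grams capture emoticons such as ":-)" that a word tokenizer may split or discard.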

Scientific report: "Aspect Extraction through Semi-Supervised Modeling"

... model. As DF-LDA requires must-link and cannot-link constraints, we used our seed sets to generate intra-seed-set must-link and inter-seed-set cannot-link constraints. For its hyper-parameters, ... both SAS and ME-SAS can also discover sentiments. To compare the performance with our models, we use two existing state-of-the-art models, ME-LDA (Zhao et al., 2010) and DF-LDA (Andrzejewski...

Uploaded: 30/03/2014, 17:20
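
The aspect-extraction snippet says the seed sets were turned into intra-seed-set must-link and inter-seed-set cannot-link constraints for DF-LDA. A minimal Python sketch of that conversion follows, assuming each seed set is a list of words and each constraint is an unordered word pair. The function name and example seeds are hypothetical, and this is neither the authors' code nor DF-LDA's actual input format.

```python
from itertools import combinations
from typing import List, Set, Tuple

Pair = Tuple[str, str]


def seed_constraints(seed_sets: List[List[str]]) -> Tuple[Set[Pair], Set[Pair]]:
    """Must-link pairs within each seed set, cannot-link pairs across sets.

    Each pair is stored in sorted order so (a, b) and (b, a) coincide.
    """
    must_link: Set[Pair] = set()
    cannot_link: Set[Pair] = set()
    # intra-seed-set: every two words in the same set should share a topic
    for seeds in seed_sets:
        for a, b in combinations(sorted(set(seeds)), 2):
            must_link.add((a, b))
    # inter-seed-set: a word from one set and a word from another should not
    for set_a, set_b in combinations(seed_sets, 2):
        for a in set_a:
            for b in set_b:
                if a != b:
                    cannot_link.add((min(a, b), max(a, b)))
    return must_link, cannot_link


if __name__ == "__main__":
    seeds = [["screen", "display"], ["battery", "power"]]  # hypothetical seed sets
    ml, cl = seed_constraints(seeds)
    print(sorted(ml))  # [('battery', 'power'), ('display', 'screen')]
    print(sorted(cl))
    # [('battery', 'display'), ('battery', 'screen'),
    #  ('display', 'power'), ('power', 'screen')]
```

With k seed sets of sizes n_1, ..., n_k this yields the within-set pairs plus every cross-set pair; a word that appears in two seed sets would land in both lists, a case the snippet does not address.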
