
SpamBayes

郎雪风
2023-12-01
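  1. SpamBayes is a Bayesian spam filter written in Python. It uses the techniques Paul Graham laid out in his essay “A Plan for Spam”, and was subsequently improved by Gary Robinson, Tim Peters, and others.
  2. The most notable difference between a conventional Bayesian filter and the one used by SpamBayes is that there are three classifications rather than two: spam, non-spam (called ham in SpamBayes), and unsure.
  3. The user trains messages as either ham or spam. When filtering a message, the filter generates one score for ham and another for spam. If the spam score is high and the ham score is low, the message is classified as spam.
  4. If the spam score is low and the ham score is high, the message is classified as ham. If the scores are both high or both low, the message is classified as unsure.
  5. The unsure category leads to fewer false positives and false negatives, but it can leave many messages that need a human decision.
  6. From Wikipedia; the original text reads: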
    SpamBayes is a Bayesian spam filter written in Python which uses techniques laid out by Paul Graham in his essay “A Plan for Spam”. It has subsequently been improved by Gary Robinson and Tim Peters, among others.
    The most notable difference between a conventional Bayesian filter and the filter used by SpamBayes is that there are three classifications rather than two: spam, non-spam (called ham in SpamBayes), and unsure. The user trains a message as being either ham or spam; when filtering a message, the spam filters generate one score for ham and another for spam.
    If the spam score is high and the ham score is low, the message will be classified as spam.
    If the spam score is low and the ham score is high, the message will be classified as ham.
    If the scores are both high or both low, the message will be classified as unsure.
    This approach leads to a low number of false positives and false negatives, but it may result in a number of unsures which need a human decision.
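
The three-way decision rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as the text states it, not SpamBayes's actual code: the function name `classify` and the single `threshold` separating "high" from "low" are assumptions made for this example, and the real filter uses its own scoring and cutoffs.

```python
def classify(spam_score: float, ham_score: float, threshold: float = 0.5) -> str:
    """Return "spam", "ham", or "unsure" from two scores in [0, 1].

    `threshold` is a hypothetical cutoff chosen for illustration;
    it is not taken from SpamBayes.
    """
    spam_high = spam_score >= threshold
    ham_high = ham_score >= threshold

    if spam_high and not ham_high:
        return "spam"    # spam score high, ham score low
    if ham_high and not spam_high:
        return "ham"     # ham score high, spam score low
    return "unsure"      # both high or both low: leave it to a human


if __name__ == "__main__":
    for scores in [(0.95, 0.05), (0.05, 0.95), (0.9, 0.9), (0.1, 0.1)]:
        print(scores, classify(*scores))
```

Messages where the two scores agree with each other (both high or both low) carry conflicting or insufficient evidence, which is why this sketch routes them to "unsure" instead of forcing a spam/ham decision.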