
Abstract
E-mail, as a modern Internet service, is widely used because it is inexpensive, easy to operate, and supports near-instant interaction, and it has gradually become an essential part of network communication. However, current mail systems still have problems. Users may receive many e-mails, so the inbox gradually becomes bloated; because messages are not classified and sorted according to any rules, mail is inconvenient to manage and important messages may be missed. Another problem is the proliferation of spam, which greatly inconveniences users. Current spam filters based on whitelists and blacklists work well, but they are not easy for users to operate and the filtering is not generally applicable. The main research work of this thesis is as follows:
1. Researched and implemented a naive Bayes spam filtering algorithm, which is trained on a large number of samples to optimize the model parameters. In the testing phase, the average error rate of the algorithm is about 2.06%;
2. Researched and implemented an improved spam filtering algorithm that builds on the naive Bayes algorithm and uses the AdaBoost adaptive boosting algorithm to optimize the model parameters and improve mail filtering performance. In the testing phase, the average classification error rate is about 0.6%, a performance improvement of about 70%;
3. Researched and implemented a mail priority setting algorithm, which uses the subject, body, attachments, and other content of a message to assign a priority and ranks e-mails by priority;
4. Implemented the complete intelligent mail processing platform by building a Java server that runs the machine learning algorithm scripts, providing intelligent sending, receiving, spam filtering, prioritization, and processing of mail.
Testing the intelligent mail processing system over a long period on a variety of mail showed that the average error rate of spam filtering and e-mail priority setting is below 5%. This indicates that the system filters spam accurately, sets mail priorities effectively, and is easy to operate without complex configuration.
Keywords: machine learning; spam filtering; e-mail priority; intelligent mail processing platform
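To illustrate the kind of filter described in item 1, the following is a minimal sketch of a multinomial naive Bayes classifier over bag-of-words counts. It is not the thesis implementation: the class name `NaiveBayesSpamFilter`, the helpers `tokenize` and `error_rate`, the whitespace tokenizer, and the use of Laplace (add-one) smoothing are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (assumed; real mail needs more cleaning)."""
    return text.lower().split()


class NaiveBayesSpamFilter:
    """Multinomial naive Bayes with add-one smoothing (illustrative sketch)."""

    def fit(self, texts, labels):
        # labels: 1 for spam, 0 for legitimate mail
        self.word_counts = {0: Counter(), 1: Counter()}
        self.doc_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set(self.word_counts[0]) | set(self.word_counts[1])
        return self

    def _log_score(self, tokens, label):
        # Log prior: fraction of training messages with this label.
        total_docs = sum(self.doc_counts.values())
        score = math.log(self.doc_counts[label] / total_docs)
        # Log likelihood of each known word, smoothed so no count is zero.
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, text):
        tokens = tokenize(text)
        spam = self._log_score(tokens, 1)
        ham = self._log_score(tokens, 0)
        return 1 if spam > ham else 0


def error_rate(model, texts, labels):
    """Fraction of messages whose predicted label differs from the true label."""
    wrong = sum(model.predict(t) != y for t, y in zip(texts, labels))
    return wrong / len(labels)


# Toy data, invented purely for this example.
train_texts = [
    "win free money now",
    "free prize claim now",
    "meeting agenda for monday",
    "project report attached",
]
train_labels = [1, 1, 0, 0]
model = NaiveBayesSpamFilter().fit(train_texts, train_labels)
```

Working in log space avoids numerical underflow when many small word probabilities are multiplied. An error rate like those quoted in items 1 and 2 would be measured with something like `error_rate` on held-out test mail rather than on the training set.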

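Contents
Chapter 1 Introduction
1.1 Background and Significance of the Research
1.2 Current State of Research in China and Abroad
1.3 Organization of the Thesis
Chapter 2 Basic Theory and Key Technologies of Intelligent Mail Processing
2.1 How E-mail Systems Work
2.1.1 Basic Components of an E-mail System
2.1.2 Network Protocols Commonly Used by E-mail Systems
2.1.3 Delivery Mechanism of E-mail Systems
2.2 Basic Principles of Machine Learning
2.3 Fundamentals of Bayesian Theory
2.4 J2EE Back-End Technologies
2.4.1 Introduction to the MVC Pattern
2.4.2 Introduction to the SSH Framework
2.5 Chapter Summary
Chapter 3 Design and Implementation of the Intelligent Mail Processing System
3.1 Overall System Architecture
3.2 Mail Retrieval and Parsing Module
3.2.1 E-mail Retrieval Protocols in Detail
3.2.2 Parsing Mail Content
3.3 Spam Filtering and Mail Priority Setting Module
3.3.1 Bag-of-Words Document Model
3.3.2 Corpus Acquisition and Data Preprocessing
3.3.3 Design and Implementation of the Naive Bayes Spam Filtering Algorithm
3.3.4 Design and Implementation of the Naive Bayes Filter Combined with AdaBoost
3.3.5 Mail Priority Setting Algorithm
3.4 Back-End Server Module
3.5 Chapter Summary
Chapter 4 Operation and Result Analysis of the Intelligent Mail Processing System
4.1 Mail Retrieval and Parsing Module
4.1.1 Running Results
4.1.2 Result Analysis
4.2 Naive Bayes Spam Filtering Module
4.2.1 Running Results
4.2.2 Result Analysis
4.3 Mail Priority Setting Based on an Extended Naive Bayes Algorithm
4.3.1 Running Results
4.3.2 Result Analysis
4.4 Improving Filtering Performance with AdaBoost
4.4.1 Running Results
4.4.2 Result Analysis
4.5 Java Back End and Front-End Display
4.5.1 Running Results
4.5.2 Result Analysis
4.6 Chapter Summary
Chapter 5 Summary and Outlook
5.1 Summary of the Thesis
5.2 Future Work
References
Acknowledgements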