Research on a Text Classification Algorithm Based on Word Connotation
Abstract: With the rapid development and spread of Internet technology, we have entered an era of information explosion. How to classify, organize, and manage the vast volume of text data has become one of the most important problems of our time, and text classification has therefore become a research topic of considerable practical value. This paper studies text classification algorithms based on word connotation: it analyzes word connotation, studies the patterns by which words combine, identifies the frequency patterns of keywords, and compares them with texts whose categories are already known in order to classify new text. Word connotation differs from word meaning. Word meaning refers only to the sense of a word, whereas word connotation covers the word's meaning, its background, its context, and so on, and is therefore broader in scope. Studying word connotation should thus allow text to be classified more accurately. This paper is only a preliminary experiment in text classification based on word connotation, and the approach still needs improvement.
Keywords: word connotation; word frequency; corpus; text classification
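Contents
Chapter 1 Overview
1.1 General Overview
1.2 Background
1.2.1 Definition and Workflow of Text Classification
1.2.2 State of Research in China and Abroad
1.3 Research Purpose and Significance
Chapter 2 Text Classification Algorithms
2.1 The KNN Algorithm and the k-Nearest-Neighbor (k-NN) Classifier
2.2 The Naive Bayes Algorithm
2.3 The Support Vector Machine (SVM) Classifier
2.4 Neural Network Algorithms
2.5 Comparison of the Algorithms
Chapter 3 Implementation
3.1 Development and Runtime Environment
3.2 Preparation
3.2.1 JDK: Introduction and Installation
3.2.2 Eclipse: Introduction and Installation
3.2.3 Maven: Introduction and Installation
3.2.4 MySQL: Introduction and Installation
3.2.5 MapDB: Introduction
3.3 A Text Classification Algorithm Based on Word Connotation
3.3.1 Same-Sentence Co-occurrence Relations Between Words
3.3.2 Same-Sentence Co-occurrence Statistics
3.3.3 20_Newsgroup (Single-Label, Balanced English Corpus)
3.3.4 Implementation of the Classification Algorithm
Chapter 4 Experiments
4.1 Database Contents
4.2 Experimental Results
4.2.1 Screenshots of Experimental Results
4.2.2 Classification Results
Chapter 5 Conclusion
References
Acknowledgments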