Realization of Text Classification System Based on Improved Classification Model

doi:10.11721/cqnuj20090217

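Affiliation: Chongqing Key Lab. of Operations Research and System Engineering, College of Mathematics and Computer Science, Chongqing Normal University, Chongqing 400047, China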



Summary:

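This paper proposes a text classification system based on an improved classification model to classify text automatically. Traditional feature extraction algorithms cannot distinguish well how feature terms are distributed within a class and between classes. To address this defect, the system uses variance to improve the algorithm and quantifies the weight of each feature term with the improved feature extraction algorithm. To reduce the dimensionality of the feature vectors, it adopts a classification model that builds one classifier per class and uses a genetic algorithm to revise the weights of each class's feature terms until a feature vector that can represent the class has been trained for every class. Finally, these class feature vectors are used for classification. Comparative experiments on the same data set show that the proposed text classification system based on the improved classification model is correct and feasible.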
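The abstract does not state the paper's weighting formula, so the following is only a minimal sketch of the general idea of using variance to separate within-class from between-class term distributions. The function name `variance_adjusted_weights`, the input layout, and the scoring rule `inter / (1 + intra)` are assumptions made for illustration, not the author's method.

```python
from statistics import mean, pvariance


def variance_adjusted_weights(class_docs):
    """Score each term by how unevenly it is spread across classes.

    class_docs: dict mapping a class label to a list of documents, where each
    document is a dict of term -> relative frequency (hypothetical layout).
    Returns a dict of term -> score. The formula is an illustrative assumption.
    """
    terms = {t for docs in class_docs.values() for d in docs for t in d}
    scores = {}
    for t in terms:
        # Frequencies of the term in every document, grouped by class
        # (0.0 when the term does not occur in a document).
        per_class = [[d.get(t, 0.0) for d in docs] for docs in class_docs.values()]
        class_means = [mean(freqs) for freqs in per_class]
        # Between-class variance: large when the term favors some classes.
        inter = pvariance(class_means)
        # Within-class variance: large when the term is unstable inside a class.
        intra = mean([pvariance(freqs) for freqs in per_class])
        scores[t] = inter / (1.0 + intra)
    return scores
```

Under this assumed rule, a term that appears with the same frequency in every class scores zero, while a term concentrated in one class and stable within it scores higher.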

Abstract:

Text classification automatically assigns a text of unknown class to its corresponding class. With the rapid growth of information, automatic text classification has become an important research task and a research hotspot in information processing. This paper presents a text classification system based on an improved classification model to realize automatic text classification. The traditional feature selection algorithm does not take the inter-class and intra-class distribution of feature terms into consideration, so it cannot effectively weigh the distribution proportion of feature terms. To solve this problem, inter-class and intra-class variance, which describes the distribution of feature terms, is used to revise the weight of each feature term. A genetic algorithm is then applied to feature selection. Instead of the traditional approach of performing selection for every document, selection is performed for every category: the genetic algorithm modifies the weights of the feature terms until the feature vector trained for each category can represent that category. Finally, the trained feature vectors are used for automatic classification. Experiments indicate that the proposed method is correct and feasible.
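The abstract gives neither the fitness function nor the genetic operators, so the following is a minimal sketch of per-category weight refinement under stated assumptions: documents are dense weight vectors over a shared vocabulary, fitness is the mean cosine similarity to a category's own documents minus the mean similarity to other documents, and the operators are elitist selection, uniform crossover, and Gaussian mutation. All function names and parameters here are hypothetical.

```python
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def fitness(vec, own_docs, other_docs):
    """Assumed fitness: close to the category's documents, far from the rest."""
    own = sum(cosine(vec, d) for d in own_docs) / len(own_docs)
    other = (sum(cosine(vec, d) for d in other_docs) / len(other_docs)
             if other_docs else 0.0)
    return own - other


def evolve_class_vector(init, own_docs, other_docs,
                        generations=30, pop_size=20, sigma=0.1, seed=0):
    """Refine one category's weight vector with a small genetic algorithm."""
    rng = random.Random(seed)

    def mutate(vec):
        # Gaussian mutation, clamped so weights stay non-negative.
        return [max(0.0, w + rng.gauss(0.0, sigma)) for w in vec]

    def score(vec):
        return fitness(vec, own_docs, other_docs)

    population = [list(init)] + [mutate(init) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=score, reverse=True)
        parents = population[: pop_size // 2]  # elitist selection
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover: each weight comes from one of the two parents.
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]
            children.append(mutate(child))
        population = parents + children
    return max(population, key=score)


def classify(doc, class_vectors):
    """Assign the document to the category whose vector is most similar."""
    return max(class_vectors, key=lambda c: cosine(class_vectors[c], doc))
```

Because the best half of each generation survives unchanged, the returned vector is never less fit than the initial one under this fitness definition. Training one vector per category, rather than working with every document vector, matches the per-category selection the abstract describes.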


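Cite this article

Lü Jia (吕佳). Realization of Text Classification System Based on Improved Classification Model[J]. Journal of Chongqing Normal University (Natural Science), 2009, (2): 79-83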
