Abstract: Text classification is the task of automatically assigning a text of unknown class to its corresponding category. With the rapid growth of information, automatic text classification has become an important and active research topic in information processing. This paper presents a text classification system based on an improved classification model. Traditional feature selection algorithms do not take into account how feature terms are distributed between classes and within classes, so they cannot effectively weigh the distribution of those terms. To solve this problem, the inter-class and intra-class variances, which describe the distribution of feature terms, are used to revise the weight of each feature term. A genetic algorithm is then applied to feature selection. Instead of the traditional approach of performing selection for every document, selection is performed for every category: the genetic algorithm modifies the weights of the feature terms until the feature vector trained for each category can represent that category. Finally, the trained feature vectors are used for automatic classification. Experiments show that the proposed method is sound and feasible.
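The variance-based weight revision can be illustrated with a minimal sketch. The abstract does not state the exact formula, so the ratio used here (inter-class variance divided by intra-class variance) is only an assumption, and the function name `revise_weights`, the smoothing constant `eps`, and the toy corpus are hypothetical. The genetic-algorithm step is not shown.

```python
from statistics import mean, pvariance


def revise_weights(docs_by_class, base_weights, eps=1e-9):
    """Scale each term's base weight by an assumed variance ratio.

    docs_by_class: {class_label: [{term: frequency, ...}, ...]}
    base_weights:  {term: weight}, e.g. from TF-IDF
    """
    revised = {}
    for term, weight in base_weights.items():
        # Frequencies of this term, grouped by class.
        per_class = [
            [doc.get(term, 0.0) for doc in docs]
            for docs in docs_by_class.values()
        ]
        class_means = [mean(freqs) for freqs in per_class]
        # Inter-class variance: spread of the per-class mean frequencies.
        inter = pvariance(class_means)
        # Intra-class variance: average spread inside each class.
        intra = mean([pvariance(freqs) for freqs in per_class])
        # Assumed revision rule: favor terms that differ between classes
        # but are stable within a class.
        revised[term] = weight * inter / (intra + eps)
    return revised


# Toy corpus (hypothetical data, for illustration only).
docs_by_class = {
    "sports": [{"goal": 3, "the": 5}, {"goal": 5, "the": 5}],
    "finance": [{"stock": 4, "the": 5}, {"stock": 2, "the": 5}],
}
base_weights = {"goal": 1.0, "stock": 1.0, "the": 1.0}

revised = revise_weights(docs_by_class, base_weights)
for term, weight in sorted(revised.items(), key=lambda kv: -kv[1]):
    print(term, round(weight, 3))
```

In this toy corpus, "the" occurs equally often in every document, so its inter-class variance is zero and its revised weight drops to zero, while the class-specific terms "goal" and "stock" keep positive weights.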