Abstract
This paper uses the vector space model to represent Web text, analyses the features of Web pages written in HTML, and improves the traditional TF-IDF formula so that each feature weight is calculated according to the term's location in the document. In addition, a text classification system based on the vector space model is studied. Feature selection is tied to text classification: feature terms are selected according to their importance to classification, and on that basis the paper proposes a feature selection algorithm based on rough sets. Experiments show that this method both reduces the dimension of the feature space and improves classification accuracy.
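The abstract does not give the improved formula, so the following is only a minimal sketch of the general idea of location-dependent TF-IDF, not the authors' method. It assumes that each occurrence of a term contributes a location weight to its term frequency instead of a flat count of 1, and that the inverse document frequency is the plain log(N / df). The location names and the values in `LOCATION_WEIGHTS` are hypothetical and chosen purely for illustration.

```python
import math
from collections import Counter

# Hypothetical location weights (assumption, not taken from the paper):
# terms in the page title count more than terms in headings or body text.
LOCATION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def weighted_tf(doc):
    """Sum location weights per term.

    `doc` maps a location name (e.g. "title") to the list of terms
    extracted from that part of the HTML page.
    """
    tf = Counter()
    for location, terms in doc.items():
        weight = LOCATION_WEIGHTS.get(location, 1.0)  # unknown locations count once
        for term in terms:
            tf[term] += weight
    return tf


def position_weighted_tfidf(docs):
    """Return one {term: weight} dict per document.

    Weight = location-weighted term frequency * log(N / document frequency).
    """
    n_docs = len(docs)
    tfs = [weighted_tf(doc) for doc in docs]

    # Document frequency: number of documents containing each term.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())

    return [
        {term: freq * math.log(n_docs / df[term]) for term, freq in tf.items()}
        for tf in tfs
    ]


if __name__ == "__main__":
    pages = [
        {"title": ["rough"], "body": ["rough", "set"]},
        {"body": ["set", "web"]},
    ]
    for vector in position_weighted_tfidf(pages):
        print(vector)
```

In this sketch a term that appears in every document receives weight 0, as in standard TF-IDF. A rough-set-based selection step of the kind the abstract describes would then operate on the resulting vectors; it is not shown here because the abstract gives no details of it.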
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, X., Wang, W. (2012). Study on Web Text Feature Selection Based on Rough Set. In: Huang, DS., Jiang, C., Bevilacqua, V., Figueroa, J.C. (eds) Intelligent Computing Technology. ICIC 2012. Lecture Notes in Computer Science, vol 7389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31588-6_6
DOI: https://doi.org/10.1007/978-3-642-31588-6_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-31587-9
Online ISBN: 978-3-642-31588-6
eBook Packages: Computer Science, Computer Science (R0)