Study on Web Text Feature Selection Based on Rough Set

  • Conference paper
Intelligent Computing Technology (ICIC 2012)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 7389)

Abstract

This paper uses the vector space model to represent Web text, analyses the features of Web pages written in HTML, and improves the traditional TF-IDF formula so that a term's feature weight is calculated according to its location in the document. In addition, a text classification system based on the vector space model is studied. Feature selection is tied to text classification: feature terms are selected according to their importance to classification, and on this basis the paper proposes a feature selection algorithm based on rough sets. Experiments show that the method both reduces the dimension of the feature space and improves classification accuracy.
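The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, since the abstract does not give the improved TF-IDF formula or the rough-set procedure. The location weights in `LOCATION_WEIGHTS`, the function names, and the toy data are all assumptions made for illustration. The feature selection step uses the standard rough-set dependency degree (the share of documents whose equivalence class is label-pure) with greedy forward selection, a common approach that may differ from the one in the paper.

```python
import math
from collections import Counter

# Hypothetical location weights (assumption): the abstract does not state the values used.
LOCATION_WEIGHTS = {"title": 3.0, "h1": 2.0, "body": 1.0}

def weighted_tf(doc):
    """doc is a list of (term, location) pairs; each occurrence counts by its location weight."""
    tf = Counter()
    for term, location in doc:
        tf[term] += LOCATION_WEIGHTS.get(location, 1.0)
    return tf

def tf_idf(docs):
    """Location-weighted TF multiplied by the classic IDF term log(N / df)."""
    n = len(docs)
    tfs = [weighted_tf(d) for d in docs]
    df = Counter(term for tf in tfs for term in tf)
    return [{t: w * math.log(n / df[t]) for t, w in tf.items()} for tf in tfs]

def dependency(rows, labels, attrs):
    """Rough-set dependency degree: fraction of rows whose equivalence class
    (rows identical on attrs) contains a single class label."""
    classes = {}
    for row, label in zip(rows, labels):
        key = tuple(row[a] for a in attrs)
        classes.setdefault(key, []).append(label)
    pure = sum(len(ls) for ls in classes.values() if len(set(ls)) == 1)
    return pure / len(rows)

def greedy_reduct(rows, labels):
    """Add the attribute that raises the dependency degree most until it
    matches the dependency degree of the full attribute set."""
    all_attrs = list(range(len(rows[0])))
    target = dependency(rows, labels, all_attrs)
    selected = []
    while dependency(rows, labels, selected) < target:
        best = max((a for a in all_attrs if a not in selected),
                   key=lambda a: dependency(rows, labels, selected + [a]))
        selected.append(best)
    return selected

# Toy data (assumption): two tiny documents and a discretized term-presence table.
docs = [[("rough", "title"), ("set", "body")],
        [("set", "body"), ("web", "body")]]
weights = tf_idf(docs)

rows = [(1, 0, 1), (1, 1, 0), (0, 1, 1), (0, 0, 1)]
labels = ["sports", "sports", "tech", "tech"]
reduct = greedy_reduct(rows, labels)
```

In the toy table, the first term alone separates the two classes, so the greedy search stops after selecting a single feature; on real data the term weights would first need to be discretized before equivalence classes can be formed.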

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, X., Wang, W. (2012). Study on Web Text Feature Selection Based on Rough Set. In: Huang, DS., Jiang, C., Bevilacqua, V., Figueroa, J.C. (eds) Intelligent Computing Technology. ICIC 2012. Lecture Notes in Computer Science, vol 7389. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-31588-6_6

  • DOI: https://doi.org/10.1007/978-3-642-31588-6_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-31587-9

  • Online ISBN: 978-3-642-31588-6

  • eBook Packages: Computer Science, Computer Science (R0)
