Evaluation of Feature Combination Approaches for Text Categorisation

  • Conference paper
Foundations of Intelligent Systems (ISMIS 2011)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6804)

Abstract

Text categorisation relies heavily on feature selection. Both the possible reduction in dimensionality and improvements in classification performance are highly desirable. To this end, a range of feature selection methods for text has been developed, each with its own properties and each selecting different features. However, it remains unclear which of them can be combined and what benefits a combination brings. In this paper we present correlation methods for the analysis of feature rankings and evaluate the combination of features according to these metrics. We further show results of an extensive study of feature selection approaches using a wide range of combination methods. We performed experiments on 19 test collections and report our findings.
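
As an illustration of what comparing and combining feature rankings can look like, here is a minimal sketch. The abstract does not name the correlation measures or combination methods used in the paper, so everything in it is an assumption chosen for illustration: Spearman's rank correlation as the correlation measure, top-k overlap as a second similarity indicator, and set union or intersection of the top-k features as the combination step. The function names (spearman_rho, top_k_overlap, combine_top_k) are hypothetical.

```python
from typing import Dict, Sequence, Set


def rank_positions(ranking: Sequence[str]) -> Dict[str, int]:
    """Map each feature to its 1-based position in a ranking (best first)."""
    return {feature: pos for pos, feature in enumerate(ranking, start=1)}


def spearman_rho(ranking_a: Sequence[str], ranking_b: Sequence[str]) -> float:
    """Spearman rank correlation of two rankings over the same features.

    Assumes no ties and that both rankings order exactly the same features.
    Returns 1.0 for identical rankings and -1.0 for reversed ones.
    """
    n = len(ranking_a)
    if n < 2:
        raise ValueError("need at least two features")
    if len(set(ranking_a)) != n or set(ranking_a) != set(ranking_b) or len(ranking_b) != n:
        raise ValueError("rankings must be permutations of the same features")
    pos_a = rank_positions(ranking_a)
    pos_b = rank_positions(ranking_b)
    squared_diffs = sum((pos_a[f] - pos_b[f]) ** 2 for f in pos_a)
    return 1.0 - 6.0 * squared_diffs / (n * (n * n - 1))


def top_k_overlap(ranking_a: Sequence[str], ranking_b: Sequence[str], k: int) -> float:
    """Fraction of the top-k features that the two rankings share."""
    if k < 1:
        raise ValueError("k must be positive")
    return len(set(ranking_a[:k]) & set(ranking_b[:k])) / k


def combine_top_k(ranking_a: Sequence[str], ranking_b: Sequence[str],
                  k: int, mode: str = "union") -> Set[str]:
    """Combine two selectors by union or intersection of their top-k features."""
    top_a, top_b = set(ranking_a[:k]), set(ranking_b[:k])
    if mode == "union":
        return top_a | top_b
    if mode == "intersection":
        return top_a & top_b
    raise ValueError("mode must be 'union' or 'intersection'")


# Toy rankings from two hypothetical selection methods (best feature first).
selector_1 = ["goal", "match", "team", "vote"]
selector_2 = ["team", "goal", "match", "vote"]

rho = spearman_rho(selector_1, selector_2)
overlap = top_k_overlap(selector_1, selector_2, k=2)
combined = combine_top_k(selector_1, selector_2, k=2, mode="union")
```

Under these assumptions, two selectors with low rank correlation choose largely different features, so their union adds more new features than the union of two highly correlated selectors would. Whether that helps classification is the empirical question the paper addresses.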

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Neumayer, R., Nørvåg, K. (2011). Evaluation of Feature Combination Approaches for Text Categorisation. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science, vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_47

  • DOI: https://doi.org/10.1007/978-3-642-21916-0_47

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-21915-3

  • Online ISBN: 978-3-642-21916-0

  • eBook Packages: Computer Science, Computer Science (R0)