Evaluation of Feature Combination Approaches for Text Categorisation

Neumayer, Robert; Nørvåg, Kjetil

doi:10.1007/978-3-642-21916-0_47

Robert Neumayer²³ &
Kjetil Nørvåg²³

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 6804))

Included in the following conference series:

International Symposium on Methodologies for Intelligent Systems

3730 Accesses
1 Citations

Abstract

Text categorisation relies heavily on feature selection. Both the possible reduction in dimensionality as well as improvements in classification performance are highly desirable. To the end of feature selection for text, a range of different methods have been developed, each having unique properties and selecting different features. However, it remains unclear which of them can be combined and what benefits this brings with it. In this paper we present correlation methods for the analysis of feature rankings and evaluate the combination of features according to these metrics. We further show results of an extensive study of feature selection approaches using a wide range of combination methods. We performed experiments on 19 test collections and report our findings.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 84.99; Price excludes VAT (USA)

Softcover Book: USD 109.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

References

Cormack, G.V., Clarke, C.L.A., Büttcher, S.: Reciprocal rank fusion outperforms condorcet and individual rank learning methods. In: Proceedings of the 32nd ACM SIGIR, pp. 758–759 (2009)
Google Scholar
Forman, G.: An extensive empirical study of feature selection metrics for text classification. Journal of Machine Learning Research 3, 1289–1305 (2003)
MATH Google Scholar
Forman, G.: BNS feature scaling: an improved representation over tf-idf for SVM text classification. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM), pp. 263–270 (2008)
Google Scholar
Mitchell, T.: Machine Learning. McGraw Hill, New York (1997)
MATH Google Scholar
Mladenić, D., Brank, J., Grobelnik, M., Milic-Frayling, N.: Feature selection using linear classifier weights: interaction with classification models. In: Proceedings of the 27th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, July 25-29, pp. 234–241. ACM, New York (2004)
Google Scholar
Montague, M., Aslam, J.A.: Condorcet fusion for improved retrieval. In: Proceedings of the 11th ACM International Conference on Information and Knowledge Management (CIKM), pp. 538–548 (2002)
Google Scholar
Neumayer, R., Doulkeridis, C., Nørvåg, K.: A hybrid approach for estimating document frequencies in unstructured P2P networks. Information Systems 36(3), 579–595 (2011)
Article Google Scholar
Neumayer, R., Mayer, R., Nørvåg, K.: Combination of feature selection methods for text categorisation. In: Clough, P., Foley, C., Gurrin, C., Jones, G.J.F., Kraaij, W., Lee, H., Mudoch, V. (eds.) ECIR 2011. LNCS, vol. 6611, pp. 763–767. Springer, Heidelberg (2011)
Chapter Google Scholar
Scott Olsson, J., Oard, D.W.: Combining feature selectors for text classification. In: Proceedings of the 15th ACM International Conference on Information and Knowledge Management (CIKM), pp. 798–799 (2006)
Google Scholar
Rogati, M., Yang, Y.: High-performing feature selection for text classification. In: Proceedings of the 11th ACM International Conference on Information and Knowledge Management (CIKM), pp. 659–661 (2002)
Google Scholar
Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34(1), 1–47 (2002)
Article Google Scholar
Yang, Y., Pedersen, J.O.: A comparative study on feature selection in text categorization. In: Proceedings of the 14th International Conference on Machine Learning (ICML), pp. 412–420 (1997)
Google Scholar

Download references

Author information

Authors and Affiliations

Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway
Robert Neumayer & Kjetil Nørvåg

Authors

Robert Neumayer
View author publications
You can also search for this author in PubMed Google Scholar
Kjetil Nørvåg
View author publications
You can also search for this author in PubMed Google Scholar

Editor information

Editors and Affiliations

Faculty of Electronics and Information Technology, Institute of Computer Science, Warsaw University of Technology,, Nowowiejska 15/19, 00-665, Warsaw, Poland
Marzena Kryszkiewicz
Faculty of Electronics and Information Technology, Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19, 00-665, Warsaw, Poland
Henryk Rybinski
University of Warsaw, 02-097, Warsaw, Poland
Andrzej Skowron
Faculty of Electronics and Information Technology, Institute of Computer Science, Warsaw University of Technology, Nowowiejska 15/19,, 00-665, Warsaw, Poland
Zbigniew W. Raś

Rights and permissions

Reprints and permissions

Copyright information

About this paper

Cite this paper

Neumayer, R., Nørvåg, K. (2011). Evaluation of Feature Combination Approaches for Text Categorisation. In: Kryszkiewicz, M., Rybinski, H., Skowron, A., Raś, Z.W. (eds) Foundations of Intelligent Systems. ISMIS 2011. Lecture Notes in Computer Science(), vol 6804. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-21916-0_47

Download citation

DOI: https://doi.org/10.1007/978-3-642-21916-0_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-21915-3
Online ISBN: 978-3-642-21916-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics