
Ensemble Pruning for Text Categorization Based on Data Partitioning

Conference paper
Information Retrieval Technology (AIRS 2011)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 7097)

Abstract

Ensemble methods can improve effectiveness in text categorization, but their computational cost creates a need for ensemble pruning. In this work we study ensemble pruning based on data partitioning, using a rank-based approach: base classifiers are ranked according to their accuracy on a separate validation set and pruned accordingly. We employ four data partitioning methods with four machine learning categorization algorithms; our main aim is to examine ensemble pruning in text categorization. We conduct experiments on two text collections, Reuters-21578 and BilCat-TRT, and show that 90% of ensemble members can be pruned with almost no decrease in accuracy. We also demonstrate that ensemble pruning can increase the accuracy of traditional ensembling.
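
Below is a minimal sketch of the rank-based pruning idea described in the abstract, assuming scikit-learn-style classifiers. The function names (train_ensemble, prune_ensemble, majority_vote), the choice of MultinomialNB as the base learner, the majority-vote combination, and the keep_ratio parameter are illustrative assumptions, not the paper's exact implementation.

    from collections import Counter

    from sklearn.metrics import accuracy_score
    from sklearn.naive_bayes import MultinomialNB


    def train_ensemble(partitions):
        """Train one base classifier per data partition.

        `partitions` is a hypothetical list of (X_train, y_train)
        subsets produced by some data partitioning method.
        """
        return [MultinomialNB().fit(X, y) for X, y in partitions]


    def prune_ensemble(members, X_val, y_val, keep_ratio=0.1):
        """Rank base classifiers by accuracy on a separate validation
        set and keep only the top fraction; keep_ratio=0.1 corresponds
        to pruning 90% of the ensemble members."""
        ranked = sorted(
            members,
            key=lambda m: accuracy_score(y_val, m.predict(X_val)),
            reverse=True,
        )
        n_keep = max(1, int(len(members) * keep_ratio))
        return ranked[:n_keep]


    def majority_vote(members, X):
        """Combine the pruned ensemble's predictions by unweighted
        majority voting (an assumed combination rule; the abstract
        does not specify how members are combined)."""
        all_preds = [m.predict(X) for m in members]
        return [Counter(votes).most_common(1)[0][0]
                for votes in zip(*all_preds)]

With keep_ratio=0.1, prune_ensemble retains the top 10% of members, matching the 90% pruning level reported in the abstract.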



Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Toraman, C., Can, F. (2011). Ensemble Pruning for Text Categorization Based on Data Partitioning. In: Salem, M.V.M., Shaalan, K., Oroumchian, F., Shakery, A., Khelalfa, H. (eds) Information Retrieval Technology. AIRS 2011. Lecture Notes in Computer Science, vol 7097. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-25631-8_32

  • DOI: https://doi.org/10.1007/978-3-642-25631-8_32

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-25630-1

  • Online ISBN: 978-3-642-25631-8
