Machine Learning, Volume 39, Issue 2–3, pp 135–168

BoosTexter: A Boosting-based System for Text Categorization

  • Robert E. Schapire
  • Yoram Singer

Abstract

This work focuses on algorithms which learn from examples to perform multiclass text and speech categorization tasks. Our approach is based on a new and improved family of boosting algorithms. We describe in detail an implementation, called BoosTexter, of the new boosting algorithms for text categorization tasks. We present results comparing the performance of BoosTexter and a number of other text-categorization algorithms on a variety of tasks. We conclude by describing the application of our system to automatic call-type identification from unconstrained spoken customer responses.

Keywords: text and speech categorization; multiclass classification problems; boosting algorithms

Copyright information

© Kluwer Academic Publishers 2000

Authors and Affiliations

  • Robert E. Schapire: Shannon Laboratory, AT&T Labs, USA
  • Yoram Singer: School of Computer Science & Engineering, The Hebrew University, Jerusalem, Israel
