Towards Language Independent Automated Learning of Text Categorization Models

  • Conference paper
SIGIR ’94

Abstract

We describe the results of extensive machine learning experiments on large collections of Reuters’ English and German newswires. The goal of these experiments was to automatically discover classification patterns that can be used for assignment of topics to the individual newswires. Our results with the English newswire collection show a very large gain in performance as compared to published benchmarks, while our initial results with the German newswires appear very promising. We present our methodology, which seems to be insensitive to the language of the document collections, and discuss issues related to the differences in results that we have obtained for the two collections.

Copyright information

© 1994 Springer-Verlag London Limited

About this paper

Cite this paper

Apté, C., Damerau, F., Weiss, S.M. (1994). Towards Language Independent Automated Learning of Text Categorization Models. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_3

  • DOI: https://doi.org/10.1007/978-1-4471-2099-5_3

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19889-5

  • Online ISBN: 978-1-4471-2099-5

  • eBook Packages: Springer Book Archive
