Abstract
We describe the results of extensive machine learning experiments on large collections of Reuters’ English and German newswires. The goal of these experiments was to automatically discover classification patterns that can be used to assign topics to individual newswires. Our results with the English newswire collection show a very large gain in performance compared to published benchmarks, while our initial results with the German newswires appear very promising. We present our methodology, which seems to be insensitive to the language of the document collections, and discuss issues related to the differences between the results obtained for the two collections.
Copyright information
© 1994 Springer-Verlag London Limited
About this paper
Cite this paper
Apté, C., Damerau, F., Weiss, S.M. (1994). Towards Language Independent Automated Learning of Text Categorization Models. In: Croft, B.W., van Rijsbergen, C.J. (eds) SIGIR ’94. Springer, London. https://doi.org/10.1007/978-1-4471-2099-5_3
DOI: https://doi.org/10.1007/978-1-4471-2099-5_3
Publisher Name: Springer, London
Print ISBN: 978-3-540-19889-5
Online ISBN: 978-1-4471-2099-5
eBook Packages: Springer Book Archive