Abstract
In natural language tasks such as text categorization, we usually have an enormous amount of unlabeled data but only a small amount of labeled data. We present a transductive boosting method for text categorization that makes efficient use of the large amount of unlabeled data. Our experiments show that the transductive method outperforms conventional boosting techniques that use only labeled data.
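The abstract does not describe the algorithm itself. As a rough illustration of the general idea of letting boosting exploit unlabeled documents, the sketch below wraps discrete AdaBoost (with single-term decision stumps) in a simple self-training loop: unlabeled documents that the current ensemble classifies with a large normalized margin are given pseudo-labels and added to the training set. This is a swapped-in technique (self-training), not the authors' transductive boosting method; the binary bag-of-words representation, the function names, and the `threshold` parameter are all hypothetical choices made for this example.

```python
import math
from typing import List, Tuple

Vector = List[int]            # hypothetical binary bag-of-words: 1 = term present
Stump = Tuple[float, int, int]  # (alpha, term index, polarity)


def stump_predict(x: Vector, j: int, pol: int) -> int:
    # Predict `pol` if term j occurs in the document, otherwise -pol.
    return pol if x[j] else -pol


def train_adaboost(X: List[Vector], y: List[int], rounds: int = 10) -> List[Stump]:
    """Discrete AdaBoost with one-term stumps; labels y are in {-1, +1}."""
    n, d = len(X), len(X[0])
    w = [1.0 / n] * n
    model: List[Stump] = []
    for _ in range(rounds):
        # Pick the stump with the lowest weighted training error.
        best = None
        for j in range(d):
            for pol in (1, -1):
                err = sum(wi for xi, yi, wi in zip(X, y, w)
                          if stump_predict(xi, j, pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid log(0) / division by zero
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, j, pol))
        # Increase the weight of misclassified documents, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, j, pol))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return model


def margin(model: List[Stump], x: Vector) -> float:
    # Weighted vote normalized to [-1, 1]; its magnitude serves as confidence.
    total = sum(a for a, _, _ in model)
    if total == 0:
        return 0.0
    return sum(a * stump_predict(x, j, pol) for a, j, pol in model) / total


def predict(model: List[Stump], x: Vector) -> int:
    return 1 if margin(model, x) >= 0 else -1


def self_train(X_lab, y_lab, X_unlab, rounds=10, iterations=3, threshold=0.5):
    """Hypothetical self-training wrapper: pseudo-label confident unlabeled docs."""
    X, y = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(iterations):
        model = train_adaboost(X, y, rounds)
        remaining = []
        for x in pool:
            m = margin(model, x)
            if abs(m) >= threshold:
                X.append(x)
                y.append(1 if m > 0 else -1)
            else:
                remaining.append(x)
        pool = remaining
        if not pool:
            break
    # Final ensemble trained on labeled plus pseudo-labeled documents.
    return train_adaboost(X, y, rounds)
```

For example, with two labeled documents `[[1, 0, 0], [0, 1, 0]]` labeled `[1, -1]` and unlabeled documents `[[1, 0, 1], [0, 1, 1]]`, `self_train` pseudo-labels both unlabeled documents and returns an ensemble that assigns `[1, 0, 1]` to the positive class. A real transductive method would treat the unlabeled test documents more carefully than this greedy thresholding does.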
Copyright information
© 2001 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Taira, H., Haruno, M. (2001). Text Categorization Using Transductive Boosting. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. ECML 2001. Lecture Notes in Computer Science, vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_39
DOI: https://doi.org/10.1007/3-540-44795-4_39
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5
eBook Packages: Springer Book Archive