Text Categorization Using Transductive Boosting

Taira, Hirotoshi; Haruno, Masahiko

doi:10.1007/3-540-44795-4_39

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2167)

Included in the following conference series:

European Conference on Machine Learning

Abstract

In natural language tasks like text categorization, we usually have an enormous amount of unlabeled data in addition to a small amount of labeled data. We present here a transductive boosting method for text categorization in order to make use of the large amount of unlabeled data efficiently. Our experiments show that the transductive method outperforms conventional boosting techniques that employ only labeled data.

Author information

Authors and Affiliations

NTT Communication Science Laboratories, 2-4, Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237, Japan
Hirotoshi Taira
Advanced Telecommunications Research Institute International, 2-2, Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0288, Japan
Masahiko Haruno

Editor information

Editors and Affiliations

Department of Computer Science, Albert-Ludwigs University Freiburg, Georges Köhler-Allee, Geb. 079, 79110, Freiburg, Germany
Luc De Raedt
Department of Computer Science, University of Bristol, Merchant Venturers Bldg., Woodland Road, Bristol, BS8 1UB, UK
Peter Flach

About this paper

Cite this paper

Taira, H., Haruno, M. (2001). Text Categorization Using Transductive Boosting. In: De Raedt, L., Flach, P. (eds) Machine Learning: ECML 2001. Lecture Notes in Computer Science, vol 2167. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44795-4_39

DOI: https://doi.org/10.1007/3-540-44795-4_39
Published: 30 August 2001
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-42536-6
Online ISBN: 978-3-540-44795-5
eBook Packages: Springer Book Archive
