Distributional Features for Text Categorization

Xue, Xiao-Bing; Zhou, Zhi-Hua

doi:10.1007/11871842_47

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4212)

Included in the following conference series:

European Conference on Machine Learning

Abstract

In previous research on text categorization, a word is usually described by features that express whether the word appears in a document or how frequently it appears. Although these features are useful, they do not fully express the information contained in the document. In this paper, distributional features, which express the distribution of a word in a document, are used to describe a word. Specifically, the compactness of the appearances of the word and the position of its first appearance are characterized as features. These features are exploited by a TFIDF-style equation. Experiments show that the distributional features are useful for text categorization. Compared with using the traditional term frequency features alone, including the distributional features requires only a little additional cost, while categorization performance can be significantly improved.
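As a rough illustration of the two kinds of distributional features named in the abstract, the sketch below computes the position of a word's first appearance and a few simple compactness measures. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: a document "part" is taken to be a sentence, compactness is approximated by the number of parts containing the word, the distance between its first and last appearance, and the mean absolute deviation of its positions, and `tfidf_style` is a hypothetical helper that simply scales a feature value by a standard inverse document frequency term.

```python
import math
import re


def split_parts(text):
    """Split a document into parts; a part is one sentence (an assumption)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [re.findall(r"[a-z0-9]+", p.lower()) for p in parts if p]


def distributional_features(text, word):
    """Return illustrative distributional features of `word`, or None if absent."""
    parts = split_parts(text)
    # One entry per occurrence: the index of the part the occurrence falls in.
    occurrences = [i for i, tokens in enumerate(parts) for t in tokens if t == word]
    if not occurrences:
        return None
    mean = sum(occurrences) / len(occurrences)
    return {
        "tf": len(occurrences),                            # traditional term frequency
        "first_app": occurrences[0],                       # part index of first appearance
        "part_count": len(set(occurrences)),               # parts containing the word
        "first_last_dist": occurrences[-1] - occurrences[0],
        # Mean absolute deviation of the occurrence positions (assumed spread measure).
        "pos_spread": sum(abs(p - mean) for p in occurrences) / len(occurrences),
    }


def tfidf_style(value, n_docs, doc_freq):
    """Hypothetical TFIDF-style weight: feature value times log(N / df)."""
    if doc_freq <= 0:
        raise ValueError("doc_freq must be positive")
    return value * math.log(n_docs / doc_freq)


if __name__ == "__main__":
    doc = "Cats sleep. Dogs bark. Cats purr. Birds sing."
    print(distributional_features(doc, "cats"))
```

Two words with the same term frequency can differ in these values: one that occurs only in adjacent sentences gets a small `first_last_dist`, while one scattered across the document gets a large one. That difference is the extra information the abstract says frequency-only features miss.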

Author information

Authors and Affiliations

National Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210093, China
Xiao-Bing Xue & Zhi-Hua Zhou

Editor information

Editors and Affiliations

Knowledge Engineering Group, Technische Universität Darmstadt
Johannes Fürnkranz
Max Planck Institute for Computer Science, Saarbrücken, Germany
Tobias Scheffer
Faculty of Computer Science, Otto-von-Guericke-University Magdeburg, Germany
Myra Spiliopoulou

About this paper

Cite this paper

Xue, XB., Zhou, ZH. (2006). Distributional Features for Text Categorization. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds) Machine Learning: ECML 2006. Lecture Notes in Computer Science, vol 4212. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11871842_47

DOI: https://doi.org/10.1007/11871842_47
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45375-8
Online ISBN: 978-3-540-46056-5
eBook Packages: Computer Science, Computer Science (R0)
