Cross-Comparison for Two-Dimensional Text Categorization

  • Giorgio Maria Di Nunzio
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3246)

Abstract

The organization of large text collections is the main goal of automated text categorization. In particular, the final aim is to classify documents into a certain number of pre-defined categories efficiently and as accurately as possible. On-line and run-time services, such as personalization and information filtering services, have increased the importance of effective and efficient document categorization techniques. In recent years, a wide range of supervised learning algorithms has been applied to this problem [1]. Recently, a new approach that exploits a two-dimensional summarization of the data for text classification was presented [2]. This method does not go through a word selection phase; instead, it uses the whole dictionary to present data in an intuitive way on two-dimensional graphs. Although successful in terms of classification effectiveness and efficiency (as recently shown in [3]), this method presents some unsolved key issues: the design of the training algorithm seems to be ad hoc for the Reuters-21578 collection; the evaluation has been done only on the 10 most frequent classes of the Reuters-21578 dataset; the evaluation lacks measures of significance in most parts; and the method adopted lacks a mathematical justification. We focus on the first three aspects, leaving the fourth as future work.

References

  1. Sebastiani, F.: Machine learning in automated text categorization. ACM Computing Surveys 34, 1–47 (2002)
  2. Di Nunzio, G.M.: A bidimensional view of documents for text categorisation. In: McDonald, S., Tait, J.I. (eds.) ECIR 2004. LNCS, vol. 2997, pp. 112–126. Springer, Heidelberg (2004)
  3. Di Nunzio, G.M., Micarelli, A.: Pushing “underfitting” to the limit: Learning in bidimensional text categorization. In: Proceedings of the 16th European Conference on Artificial Intelligence (ECAI 2004), Valencia, Spain (2004) (forthcoming)
  4. Ross, S.: Introduction to Probability and Statistics for Engineers and Scientists. Academic Press, London (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Giorgio Maria Di Nunzio
    • Department of Information Engineering, University of Padua
