Chi-Square Classifier for Document Categorization

  • Mikhail Alexandrov
  • Alexander Gelbukh
  • George Lozovoi
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2004)

Abstract

The problem of document categorization is considered. The set of domains and the keywords specific to these domains are assumed to be selected beforehand as initial data. We apply a well-known statistical hypothesis test that treats the images of documents and domains as normalized vectors. In comparison with existing methods, this approach makes it possible to take into account the random character of the initial data. The classifier is developed in the framework of the Document Investigator software package.
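The full text of the paper is not reproduced here, so the following is only a minimal sketch of how a keyword-based chi-square classifier of this kind might look. It is not the authors' implementation. It assumes Pearson's goodness-of-fit statistic, that each domain is given as a dictionary of positive keyword weights, and that the document is assigned to the domain with the smallest statistic. The names `normalize`, `chi_square_statistic`, `classify`, and the sample domains are hypothetical.

```python
from collections import Counter


def normalize(weights):
    """Scale positive keyword weights so that they sum to 1."""
    total = sum(weights.values())
    return {keyword: w / total for keyword, w in weights.items()}


def chi_square_statistic(observed_counts, domain_profile):
    """Pearson statistic: sum((O_i - E_i)^2 / E_i) over the domain keywords.

    domain_profile is a normalized vector of expected keyword shares.
    E_i is that share times the number of domain keyword hits in the document.
    """
    n = sum(observed_counts.get(k, 0) for k in domain_profile)
    if n == 0:
        # No keyword of this domain occurs: treat as the worst possible fit.
        return float("inf")
    stat = 0.0
    for keyword, share in domain_profile.items():
        expected = n * share
        observed = observed_counts.get(keyword, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


def classify(text, domains):
    """Return (best_domain, scores); a smaller statistic means a better fit."""
    counts = Counter(text.lower().split())
    scores = {
        name: chi_square_statistic(counts, normalize(profile))
        for name, profile in domains.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical domains with illustrative keyword weights (assumed data).
DOMAINS = {
    "physics": {"energy": 5, "particle": 3, "field": 2},
    "finance": {"market": 5, "stock": 3, "price": 2},
}

best, scores = classify("market price stock market energy", DOMAINS)
print(best, scores)
```

In a fuller treatment, the statistic for each domain would be compared against a chi-square critical value for the corresponding degrees of freedom, so that a document fitting no domain well could be rejected rather than forced into the nearest one.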

Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Mikhail Alexandrov (1)
  • Alexander Gelbukh (1)
  • George Lozovoi (2)

  1. Center for Computing Research, IPN, Mexico
  2. Datagistics, Canada
