Data Mining of Text Documents

Part of the Springer Optimization and Its Applications book series (SOIA, volume 43)


This chapter investigates the problem of classifying text documents into two disjoint classes, using a data mining approach based on the OCAT (One Clause At a Time) algorithm. It is based on the work discussed in [Nieto Sanchez, Triantaphyllou, and Kraft, 2002]. In this setting, two sets of training examples (text documents), one for each class, are assumed to be available. The approach uses indexing terms to form patterns expressed as logical expressions (Boolean functions), which are then used to classify new text documents of unknown class. This is a typical case of supervised “crisp” classification.
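The pipeline summarized above (indexing terms, then a learned Boolean function, then classification of new documents) can be sketched in Python. This is a minimal sketch under stated assumptions, not the authors' implementation. Documents are reduced to binary term-presence vectors over a small, hand-picked list of index terms. Each clause is built with a simple greedy score (positives accepted divided by negatives accepted plus one), which stands in for the heuristics actually used with OCAT. The helper names (`to_vector`, `build_clause`, `ocat`, `classify`, `render`) and the toy documents are hypothetical. As in OCAT, the learned function is in conjunctive normal form: every clause accepts all positive examples, and clauses are added one at a time until every negative example is rejected by at least one clause.

```python
import re
from typing import List, Tuple

# Hypothetical representation: a literal is (term_index, positive).
# positive=True means "term present", positive=False means "term absent".
Literal = Tuple[int, bool]
Clause = List[Literal]  # disjunction (OR) of literals
Vector = List[int]      # binary term-presence vector


def to_vector(text: str, terms: List[str]) -> Vector:
    """Binary indexing (assumed): 1 if the index term occurs in the text, else 0."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if t in words else 0 for t in terms]


def literal_true(lit: Literal, doc: Vector) -> bool:
    i, positive = lit
    return doc[i] == 1 if positive else doc[i] == 0


def clause_true(clause: Clause, doc: Vector) -> bool:
    return any(literal_true(lit, doc) for lit in clause)


def build_clause(pos: List[Vector], neg: List[Vector]) -> Clause:
    """Greedily build one clause that accepts every positive example while
    staying false on (rejecting) as many negative examples as possible."""
    n_terms = len(pos[0])
    clause: Clause = []
    uncovered = list(pos)  # positives not yet accepted by the clause
    rejected = list(neg)   # negatives the clause still rejects
    while uncovered:
        best, best_score = None, -1.0
        for i in range(n_terms):
            for positive in (True, False):
                lit = (i, positive)
                gain = sum(literal_true(lit, d) for d in uncovered)
                loss = sum(literal_true(lit, d) for d in rejected)
                score = gain / (loss + 1)  # assumed greedy score, not OCAT's own
                if gain > 0 and score > best_score:
                    best, best_score = lit, score
        clause.append(best)
        uncovered = [d for d in uncovered if not literal_true(best, d)]
        rejected = [d for d in rejected if not literal_true(best, d)]
    if not rejected:
        # Fallback: a clause that is false only on neg[0]; it is true on every
        # other vector, so it accepts all positives and rejects one negative.
        clause = [(i, neg[0][i] == 0) for i in range(n_terms)]
    return clause


def ocat(pos: List[Vector], neg: List[Vector]) -> List[Clause]:
    """Add one clause at a time until every negative example is rejected."""
    if any(d in pos for d in neg):
        raise ValueError("a vector occurs in both classes; no consistent CNF exists")
    cnf: List[Clause] = []
    remaining = list(neg)
    while remaining:
        clause = build_clause(pos, remaining)
        cnf.append(clause)
        remaining = [d for d in remaining if clause_true(clause, d)]
    return cnf


def classify(cnf: List[Clause], doc: Vector) -> bool:
    """Positive class iff the document satisfies every clause of the CNF."""
    return all(clause_true(c, doc) for c in cnf)


def render(cnf: List[Clause], terms: List[str]) -> str:
    parts = []
    for clause in cnf:
        lits = [terms[i] if p else "NOT " + terms[i] for i, p in clause]
        parts.append("(" + " OR ".join(lits) + ")")
    return " AND ".join(parts)


# Toy usage with hypothetical index terms and documents.
terms = ["merger", "stock", "election", "vote"]
pos = [to_vector(t, terms) for t in [
    "Shareholders approved the merger; stock rose.",
    "Merger talks continue.",
    "Stock holders will vote on the buyout.",
]]
neg = [to_vector(t, terms) for t in [
    "The election vote was close.",
    "Election results are due today.",
]]
cnf = ocat(pos, neg)
print(render(cnf, terms))  # prints "(NOT election)"
print(classify(cnf, to_vector("Stock prices fell after the merger.", terms)))  # prints True
```

On this toy data a single clause suffices, because the term "election" separates the two classes; larger collections would normally need several clauses. The fallback clause guarantees that each new clause rejects at least one remaining negative example, so the loop terminates whenever no document vector appears in both classes and the positive set is non-empty.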


Keywords: Cross Validation; Boolean Function; Wall Street Journal; Text Document; Vector Space Model




Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Science, Louisiana State University, Baton Rouge, USA
