Advertisement

Text Classification by Aggregation of SVD Eigenvectors

  • Panagiotis Symeonidis
  • Ivaylo Kehayov
  • Yannis Manolopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7503)

Abstract

Text classification is a process where documents are categorized usually by topic, place, readability easiness, etc. For text classification by topic, a well-known method is Singular Value Decomposition. For text classification by readability, “Flesch Reading Ease index” calculates the readability easiness level of a document (e.g. easy, medium, advanced). In this paper, we propose Singular Value Decomposition combined either with Cosine Similarity or with Aggregated Similarity Matrices to categorize documents by readability easiness and by topic. We experimentally compare both methods with Flesch Reading Ease index, and the vector-based cosine similarity method on a synthetic and a real data set (Reuters-21578). Both methods clearly outperform all other comparison partners.

Keywords

Dimensionality Reduction Singular Value Decomposition Recommender System Test Document Text Categorization 
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Coleman, M., Liau, T.L.: A computer readability formula designed for machine scoring. Journal of Applied Psychology 60, 283–284 (1975)CrossRefGoogle Scholar
  2. 2.
    Cover, T., Hart, P.: Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13(1), 21–27 (1967)zbMATHCrossRefGoogle Scholar
  3. 3.
    Dale, E., Chall, J.: A Formula for Predicting Readability. Educational Research Bulletin 27, 11–20, 28 (1948)Google Scholar
  4. 4.
    Furnas, G.W., Deerwester, S., et al.: Information Retrieval Using a Singular Value Decomposition Model of Latent Semantic Structure. In Proceedings of SIGIR Conference, pp.465-480, Grenoble, France (1988)Google Scholar
  5. 5.
    Guan, H., Zhou, J., Guo, M.: A Class-Feature-Centroid Classifier for Text Categorization. In: Proceedings of WWW Conference, Madrid, Spain, pp. 201–210 (2009)Google Scholar
  6. 6.
    Hans-Henning, G., Spiliopoulou, M., Nanopoulos, A.: Eigenvector-Based Clustering Using Aggregated Similarity Matrices. In: Proceedings of ACM SAC Conference, Sierre, Switzerland, pp. 1083–1087 (2010)Google Scholar
  7. 7.
    Joachims, T.: Text Categorization with Support Vector Machines: Learning with many Relevant Features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398, pp. 137–142. Springer, Heidelberg (1998)CrossRefGoogle Scholar
  8. 8.
    Kincaid, J.P., Fishburne, R.P., Rogers, R.L., Chissom, B.S.: Derivation of New Readability Formulas (Automated Readability Index, Fog Count, and Flesch Reading Ease formula) for Navy Enlisted Personnel. Chief of Naval Technical Training: Naval Air Station Memphis, Research Branch Report 8-75. Memphis, USA (1975)Google Scholar
  9. 9.
    McLaughlin, G.H.: SMOG Grading a New Readability Formula. Journal of Reading 12(8), 639–646 (1969)Google Scholar
  10. 10.
    Sarwar, B., Karypis, G., Konstan, J., Riedl, J.: Application of Dimensionality Reduction in Recommenders Systems: a Case Study. In: Proceedings of ACM WebKDD Workshop, Boston, MA, pp. 285–295 (2000)Google Scholar
  11. 11.
    Smith, E.A., Senter, R.J.: Automated Readability Index. Wright Patterson AFB, Ohio. Aerospace Medical Division (1967)Google Scholar
  12. 12.
    Spache, G.: A New Readability Formula for Primary-Grade Reading Materials. The Elementary School Journal 53(7), 410–413 (1953)CrossRefGoogle Scholar
  13. 13.
    Symeonidis, P.: Content-based Dimensionality Reduction for Recommender Systems. In: Proceedings of GfKl Conference, Freiburg, Germany, pp. 619–626 (2007)Google Scholar
  14. 14.
    Yang, Y.: An Evaluation of Statistical Approaches to Text Categorization. Information Retrieval 1(1-2), 69–90 (1997)Google Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Panagiotis Symeonidis
    • 1
  • Ivaylo Kehayov
    • 1
  • Yannis Manolopoulos
    • 1
  1. 1.Department of InformaticsAristotle UniversityThessalonikiGreece

Personalised recommendations