Comparison of SVM and Ontology-Based Text Classification Methods

  • Krzysztof Wróbel
  • Maciej Wielgosz
  • Aleksander Smywiński-Pohl
  • Marcin Pietron
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9692)

Abstract

This work addresses the challenging task of text categorization. The main goal is the comparison of two different approaches, i.e. Vector Space Model and ontology-based solutions. The authors compare and contrast them with respect to accuracy and processing flow, which affect the classification results. The ontology-based method outperforms its counter-part when it comes to category resolution, i.e. the number of categories which can be processed. On the other hand, the SVM-based module is much faster and performs well when trained on an appropriately-structured learning set. The authors performed a series of tests to compare the methods and, as expected, the ontology-based solution outperformed the SVM classifier. It reached a micro averaged F1-score of 0.90 with 2.8 million Wikipedia articles, whereas the SVM-based module did not exceed 0.86 with the same data set. The macro averaged F1-score of both solutions was inferior to the micro one and reached values of 0.75 and 0.57, for ontology and SVM-based solutions respectively.

Keywords

Text classification Vector space model Ontology-based methods Support vector machine Wikipedia 

References

  1. 1.
    Ng, V., Dasgupta, S., Arifin, S.: Examining the role of linguistic knowledge sources in the automatic identification and classification of reviews. In: Proceedings of the COLING/ACL on Main Conference Poster Sessions, Association for Computational Linguistics, pp. 611–618 (2006)Google Scholar
  2. 2.
    Durant, K.T., Smith, M.D.: Predicting the political sentiment of web log posts using supervised machine learning techniques coupled with feature selection. In: Nasraoui, O., Spiliopoulou, M., Srivastava, J., Mobasher, B., Masand, B. (eds.) WebKDD 2006. LNCS (LNAI), vol. 4811, pp. 187–206. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  3. 3.
    Joachims, T.: Text categorization with support vector machines: learning with many relevant features. In: Nédellec, C., Rouveirol, C. (eds.) ECML 1998. LNCS, vol. 1398. Springer, Heidelberg (1998)Google Scholar
  4. 4.
    Hotho, A., Maedche, A., Staab, S.: Ontology-based text document clustering. KI 16(4), 48–54 (2002)Google Scholar
  5. 5.
    Salton, G., Wong, A., Yang, C.S.: A vector space model for automatic indexing. Commun. ACM 18(11), 613–620 (1975)CrossRefMATHGoogle Scholar
  6. 6.
    Liu, Z., Lv, X., Liu, K., Shi, S.: Study on SVM compared with the other text classification methods. In: 2010 Second International Workshop on Education Technology and Computer Science (ETCS), vol. 1, pp. 219–222. IEEE (2010)Google Scholar
  7. 7.
    Polpinij, J., Ghose, A.K.: An ontology-based sentiment classification methodology for online consumer reviews. In: Proceedings of the 2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, vol. 01, pp. 518–524. IEEE Computer Society (2008)Google Scholar
  8. 8.
    Zhao, L., Li, C.: Ontology based opinion mining for movie reviews. In: Karagiannis, D., Jin, Z. (eds.) KSEM 2009. LNCS, vol. 5914, pp. 204–214. Springer, Heidelberg (2009)CrossRefGoogle Scholar
  9. 9.
    Muller, H.M., Kenny, E.E., Sternberg, W.: Textpresso: an ontology-based information retrieval and extraction system for biological literature. PLoS Biol. 2(11), e309 (2004)CrossRefGoogle Scholar
  10. 10.
    Lenat, D.B.: CYC: a large-scale investment in knowledge infrastructure. Commun. ACM 38(11), 33–38 (1995)CrossRefGoogle Scholar
  11. 11.
    Pohl, A.: Classifying the wikipedia articles into the OpenCyc taxonomy. In: Rizzo, G., Mendes, P., Charton, E., Hellmann, S., Kalyanpur, A., (eds.) Proceedings of the Web of Linked Entities Workshop in Conjuction with the 11th International Semantic Web Conference, pp. 5–16 (2012)Google Scholar
  12. 12.
    Cortes, C., Vapnik, V.: Support-vector networks. Mach. Learn. 20(3), 273–297 (1995)MATHGoogle Scholar
  13. 13.
    Ben-Hur, A., Horn, D., Siegelmann, H.T., Vapnik, V.: Support vector clustering. J. Mach. Learn. Res. 2, 125–137 (2002)MATHGoogle Scholar
  14. 14.
    Fellbaum, C.: WordNet: An Electronic Lexical Database. MIT Press, Cambridge (1998)MATHGoogle Scholar
  15. 15.
    Piasecki, M., Szpakowicz, S., Broda, B.: A WordNet from the ground up. Oficyna Wydawnicza Politechniki Wrocawskiej, Wrocaw (2009)Google Scholar
  16. 16.
    Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.G.: DBpedia: a nucleus for a web of open data. In: Aberer, K., et al. (eds.) ASWC 2007 and ISWC 2007. LNCS, vol. 4825, pp. 722–735. Springer, Heidelberg (2007)CrossRefGoogle Scholar
  17. 17.
    Lehmann, J., Isele, R., Jakob, M., Jentzsch, A., Kontokostas, D., Mendes, P.N., Hellmann, S., Morsey, M., van Kleef, P., Auer, S., et al.: DBpedia-a large-scale, multilingual knowledge base extracted from wikipedia. Semant. Web J. 5, 1–29 (2014)Google Scholar
  18. 18.
    Motik, B.: On the properties of metamodeling in owl. J. Logic Comput. 17(4), 617–637 (2007)MathSciNetCrossRefMATHGoogle Scholar
  19. 19.
    Manning, C.D., Surdeanu, M., Bauer, J., Finkel, J., Bethard, S.J., McClosky, D.: The Stanford CoreNLP natural language processing toolkit. In: Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pp. 55–60 (2014)Google Scholar
  20. 20.
    Mendes, P., Jakob, M., Bizer, C.: DBpedia for NLP: A Multilingual Cross-domain Knowledge Base. In: LREC (to appear, 2012)Google Scholar
  21. 21.
    Zhang, X., LeCun, Y.: Text understanding from scratch (2015). arXiv preprint arXiv:1502.01710
  22. 22.
    Agarwal, A., Chapelle, O., Dudík, M., Langford, J.: A reliable effective terascale linear learning system. J. Mach. Learn. Res. 15(1), 1111–1133 (2014)MathSciNetMATHGoogle Scholar
  23. 23.
    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011)MathSciNetMATHGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Krzysztof Wróbel
    • 3
  • Maciej Wielgosz
    • 1
  • Aleksander Smywiński-Pohl
    • 1
    • 3
  • Marcin Pietron
    • 2
  1. 1.AGH University of Science and TechnologyKrakowPoland
  2. 2.ACK Cyfronet AGHKrakowPoland
  3. 3.Jagiellonian UniversityKrakowPoland

Personalised recommendations