Hypatia Digital Library: A Text Classification Approach Based on Abstracts

  • Frosso Vorgia
  • Ioannis Triantafyllou
  • Alexandros Koulouris
Conference paper
Part of the Springer Proceedings in Business and Economics book series (SPBE)

Abstract

This paper investigates the application of text classification in Hypatia, the digital library of the Technological Educational Institute of Athens, with the aim of providing an automated classification tool as an alternative to manual assignment. The crucial point in text classification is the selection of the most important terms for document representation. The classic TF.IDF weighting method was investigated. The document collection consists of 718 abstracts in Medicine, Tourism and Food Technology. Classification was conducted using 14 classifiers available in WEKA. The classification process yielded an excellent precision score of approximately 97%.
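
The TF.IDF weighting mentioned above can be illustrated with a minimal sketch. The abstract does not state which TF.IDF variant was used, so this assumes the textbook form, in which a term's relative frequency in a document is multiplied by log(N / df), where N is the number of documents and df is the number of documents containing the term. The function name `tf_idf` and the three toy documents are hypothetical and are not taken from the Hypatia collection or from WEKA.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document, using tf * log(N / df).

    Assumption: textbook TF.IDF with whitespace tokenization; the paper's
    exact variant and preprocessing are not given in the abstract.
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy abstracts, one per subject area named in the abstract.
docs = [
    "patient treatment clinical study",
    "hotel tourism destination study",
    "food fermentation quality study",
]
weights = tf_idf(docs)
```

In this toy collection the word "study" occurs in every document, so its weight is zero, while a term such as "patient" that occurs in only one document receives a positive weight. This is the sense in which TF.IDF selects the most discriminating terms for document representation.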

Keywords

Digital libraries, Text classification, WEKA, Word stemming

Copyright information

© Springer International Publishing Switzerland 2017

Authors and Affiliations

  • Frosso Vorgia (1)
  • Ioannis Triantafyllou (1)
  • Alexandros Koulouris (1)

  1. Department of Library Science and Information Systems, Technological Educational Institute of Athens, Aegaleo, Athens, Greece
