Hypatia Digital Library: A Text Classification Approach Based on Abstracts
This paper investigates the application of text classification in Hypatia, the digital library of the Technological Educational Institute of Athens, with the aim of providing an automated classification tool as an alternative to manual assignment. The crucial point in text classification is the selection of the most important terms for document representation. The classic TF.IDF weighting method was investigated. Our document collection consists of 718 abstracts in Medicine, Tourism and Food Technology. Classification was conducted using 14 classifiers available in WEKA. The classification process yielded a precision score of approximately 97%.
Keywords: Digital libraries, Text classification, WEKA, Word stemming
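The TF.IDF weighting mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not state which TF.IDF variant was used, so the code below assumes one common form, term frequency normalized by document length multiplied by log(N / df). The function name `tf_idf` and the toy documents are hypothetical and serve only as an illustration; this is not the WEKA implementation.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: (count / document length) * log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


# Hypothetical toy "abstracts", already lowercased and tokenized.
docs = [
    "patient therapy clinical study".split(),
    "hotel tourism destination study".split(),
    "food fermentation quality study".split(),
]
weights = tf_idf(docs)
```

Under this variant, a term that appears in every document (here "study") receives weight zero, while a term confined to one subject area (here "patient") receives a positive weight, which is why such terms are favoured for document representation.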