Knowledge-Based Metrics for Document Classification: Online Reviews Experiments

  • Mihaela Colhon
  • Costin Bădică
Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 798)


In this paper we propose a new method that addresses the problem of classifying documents by topic. The method takes into consideration only textual measures. We exemplify it on three sets of documents with gradually differing topics: (i) the first two sets contain reviews that comment on the features and characteristics of the reviewed entities, which are electronic devices (laptops and mobile phones); (ii) the third set contains reviews of tourist locations. All the review texts are written in Romanian and were extracted by crawling popular Romanian sites. The paper presents and discusses the evaluation scores obtained by applying the textual measures.



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. Department of Computer Science, University of Craiova, Craiova, Romania
  2. Department of Computer and Information Technology, University of Craiova, Craiova, Romania
