Exploring Performance of Instance Selection Methods in Text Sentiment Classification

  • Aytuğ Onan
  • Serdar Korukoğlu
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 464)


Sentiment analysis is the process of extracting subjective information from source materials, and it is a subfield of web and text mining. One major problem encountered in these areas is the overwhelming amount of data available. Hence, instance selection and feature selection become two essential tasks for achieving scalability in machine learning based sentiment classification. Instance selection is a data reduction technique that aims to eliminate redundant and noisy data from the training dataset so that training time is reduced and scalability and generalization ability are enhanced. This paper examines the predictive performance of fifteen benchmark instance selection methods in the text classification domain. The instance selection methods are evaluated with a decision tree classifier (the C4.5 algorithm) and radial basis function networks in terms of classification accuracy and data reduction rate. The experimental results indicate that the highest classification accuracies with the C4.5 algorithm are generally obtained by the model class selection method, while the highest classification accuracies with radial basis function networks are obtained by nearest centroid neighbor edition.
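
To make the idea of instance selection and the data reduction rate concrete, the following is a minimal sketch of one classic editing scheme, a Wilson-style edited nearest neighbor rule, applied to toy numeric data. It is an illustration only and is not taken from the paper's experiments: the function names `edited_nearest_neighbor` and `reduction_rate`, the choice of k = 3, the Euclidean distance, and the toy data are all assumptions made for this example.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbor(X, y, k=3):
    """Return indices of instances kept by Wilson-style editing.

    An instance is kept only if the majority label among its k nearest
    neighbors (excluding itself) matches its own label.
    """
    keep = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Sort all other instances by Euclidean distance to xi.
        others = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )
        votes = Counter(y[j] for j in others[:k])
        majority_label = votes.most_common(1)[0][0]
        if majority_label == yi:
            keep.append(i)
    return keep


def reduction_rate(n_original, n_selected):
    """Fraction of training instances removed by the selection step."""
    return 1.0 - n_selected / n_original


# Hypothetical toy data: two clusters plus one mislabeled point
# (index 3) sitting inside the cluster of class 0.
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.15, 0.15),
     (5.0, 5.0), (5.1, 5.2), (5.2, 5.1)]
y = [0, 0, 0, 1, 1, 1, 1]

kept = edited_nearest_neighbor(X, y, k=3)
rate = reduction_rate(len(X), len(kept))
print(kept, round(rate, 3))
```

On this toy data the sketch drops only the mislabeled point, so it prints `[0, 1, 2, 4, 5, 6] 0.143`. In a text setting the tuples would be replaced by document feature vectors, and the retained subset would then be used to train the downstream classifier.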


Keywords: Instance selection, Text sentiment classification, Text mining



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  1. Faculty of Engineering, Department of Computer Engineering, Celal Bayar University, Manisa, Turkey
  2. Faculty of Engineering, Department of Computer Engineering, Ege University, Izmir, Turkey
