Exploring Performance of Instance Selection Methods in Text Sentiment Classification

  • Conference paper
Artificial Intelligence Perspectives in Intelligent Systems

Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 464))

Abstract

Sentiment analysis, a subfield of web and text mining, is the process of extracting subjective information from source materials. One major problem in these areas is the overwhelming amount of available data. Hence, instance selection and feature selection are two essential tasks for achieving scalability in machine-learning-based sentiment classification. Instance selection is a data reduction technique that aims to eliminate redundant and noisy instances from the training dataset, so that training time is reduced and scalability and generalization ability are enhanced. This paper examines the predictive performance of fifteen benchmark instance selection methods in the text classification domain. The methods are evaluated with a decision tree classifier (the C4.5 algorithm) and radial basis function networks in terms of classification accuracy and data reduction rate. The experimental results indicate that the highest classification accuracies with C4.5 are generally obtained by the model class selection method, while the highest classification accuracies with radial basis function networks are obtained by nearest centroid neighbor edition.
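To make the idea of instance selection and the data reduction rate concrete, the following is a minimal sketch of one classic editing scheme, Wilson-style edited nearest neighbor, applied to toy two-dimensional data. The abstract does not list the fifteen evaluated methods, so this choice is an illustrative assumption and not necessarily the authors' setup. The function names `edited_nearest_neighbor` and `reduction_rate`, the value `k=3`, and the toy points are all hypothetical.

```python
from collections import Counter
from math import dist


def edited_nearest_neighbor(X, y, k=3):
    """Return indices of instances kept by Wilson-style editing.

    An instance is kept only if its label agrees with the majority
    label of its k nearest neighbors (Euclidean distance).
    """
    kept = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Rank every other instance by distance to xi and take the k closest.
        neighbors = sorted(
            (j for j in range(len(X)) if j != i),
            key=lambda j: dist(xi, X[j]),
        )[:k]
        # Majority label among those neighbors.
        majority = Counter(y[j] for j in neighbors).most_common(1)[0][0]
        if majority == yi:
            kept.append(i)
    return kept


def reduction_rate(n_original, n_kept):
    """Fraction of the training set removed by instance selection."""
    return 1.0 - n_kept / n_original


# Toy data (assumed): two clusters plus one likely mislabeled point at index 4.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (0.05, 0.05),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1)]
y = ["neg", "neg", "neg", "neg", "pos", "pos", "pos", "pos", "pos"]

kept = edited_nearest_neighbor(X, y, k=3)
rate = reduction_rate(len(X), len(kept))
print(kept, round(rate, 3))
```

In this toy run only the point at index 4 is discarded, because its three nearest neighbors all carry the opposite label. A classifier would then be trained on the retained instances, and its accuracy reported alongside the reduction rate.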

Author information

Corresponding author

Correspondence to Aytuğ Onan.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Onan, A., Korukoğlu, S. (2016). Exploring Performance of Instance Selection Methods in Text Sentiment Classification. In: Silhavy, R., Senkerik, R., Oplatkova, Z., Silhavy, P., Prokopova, Z. (eds) Artificial Intelligence Perspectives in Intelligent Systems. Advances in Intelligent Systems and Computing, vol 464. Springer, Cham. https://doi.org/10.1007/978-3-319-33625-1_16

  • DOI: https://doi.org/10.1007/978-3-319-33625-1_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-33623-7

  • Online ISBN: 978-3-319-33625-1

  • eBook Packages: Engineering, Engineering (R0)
