Abstract
Sentiment analysis, a subfield of web and text mining, is the process of extracting subjective information from source materials. One major problem in these areas is the overwhelming amount of data available. Hence, instance selection and feature selection become two essential tasks for achieving scalability in machine-learning-based sentiment classification. Instance selection is a data reduction technique that aims to eliminate redundant, noisy data from the training dataset so that training time is reduced and scalability and generalization ability are enhanced. This paper examines the predictive performance of fifteen benchmark instance selection methods in the text classification domain. The methods are evaluated with a decision tree classifier (the C4.5 algorithm) and radial basis function networks in terms of classification accuracy and data reduction rate. The experimental results indicate that the highest classification accuracies with the C4.5 algorithm are generally obtained by the model class selection method, while the highest classification accuracies with radial basis function networks are obtained by nearest centroid neighbor edition.
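To make the idea of instance selection and the data reduction rate concrete, the following is a minimal sketch of a Wilson-style edited nearest neighbor filter, a classic editing rule from the instance selection literature. It is an illustration only and is not claimed to be the implementation evaluated in this paper. The function names, the toy two-dimensional data, and the choice k = 3 are all assumptions made for the example; real text data would use high-dimensional term vectors.

```python
import math
from collections import Counter


def edited_nearest_neighbor(X, y, k=3):
    """Return indices of instances kept by a Wilson-style editing rule.

    An instance is kept only if its label agrees with the majority label
    of its k nearest neighbors (Euclidean distance, excluding itself).
    """
    kept = []
    for i, (xi, yi) in enumerate(zip(X, y)):
        # Sort all other instances by distance to instance i, keep the k closest.
        neighbors = sorted(
            (math.dist(xi, xj), j) for j, xj in enumerate(X) if j != i
        )[:k]
        votes = Counter(y[j] for _, j in neighbors)
        majority_label = votes.most_common(1)[0][0]
        if majority_label == yi:
            kept.append(i)
    return kept


def reduction_rate(n_original, n_selected):
    """Fraction of the training set removed by instance selection."""
    return 1.0 - n_selected / n_original


# Hypothetical toy data: two well-separated clusters plus one mislabeled
# ("noisy") point placed inside the "neg" cluster.
X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1),
     (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (1.1, 1.1),
     (0.05, 0.05)]
y = ["neg"] * 4 + ["pos"] * 4 + ["pos"]

kept = edited_nearest_neighbor(X, y, k=3)
rate = reduction_rate(len(X), len(kept))
print("kept indices:", kept)
print("reduction rate: %.3f" % rate)
```

In this toy setting the filter discards only the mislabeled point, so the reduced training set is cleaner but barely smaller. Editing rules of this kind mainly remove noise, whereas condensation-style methods aim for higher reduction rates, which is why accuracy and reduction rate are reported together.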
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Onan, A., Korukoğlu, S. (2016). Exploring Performance of Instance Selection Methods in Text Sentiment Classification. In: Silhavy, R., Senkerik, R., Oplatkova, Z., Silhavy, P., Prokopova, Z. (eds) Artificial Intelligence Perspectives in Intelligent Systems. Advances in Intelligent Systems and Computing, vol 464. Springer, Cham. https://doi.org/10.1007/978-3-319-33625-1_16
DOI: https://doi.org/10.1007/978-3-319-33625-1_16
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-33623-7
Online ISBN: 978-3-319-33625-1
eBook Packages: Engineering, Engineering (R0)