Statistical Comparison of Opinion Spam Detectors in Social Media with Imbalanced Datasets

  • El-Sayed M. El-Alfy
  • Sadam Al-Azani
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 969)

Abstract

Sentiment analysis is a growing research area that analyzes people’s opinions towards a specific target using posts shared on social media. However, spammers can inject false opinions to sway sentiment-oriented decisions; for example, low-quality products or policies can be promoted over others. Identifying and removing spam posts in social media is therefore a crucial data cleaning operation for text mining tasks, including sentiment analysis. A problem inherent to spam detection is class imbalance. In this paper, we explore the impact of the imbalance ratio on the performance of Twitter spam detection using several single and ensemble classifiers. Besides ensemble-based learning (Bagging and Random Forest), we apply the SMOTE oversampling technique to improve detection performance, especially for classifiers that are sensitive to imbalanced datasets.
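
As a rough illustration of the oversampling idea behind SMOTE, the sketch below creates synthetic minority-class points by interpolating between a minority sample and one of its nearest minority neighbours. It is a simplified variant written for this page, not the authors' experimental setup: the function name `smote`, its parameters, and the toy feature vectors are all assumptions, and the base sample is drawn at random rather than by iterating over every minority sample as in the original algorithm.

```python
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic points, each interpolated between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    Simplified sketch; names and defaults are illustrative assumptions."""
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    def sq_dist(a, b):
        # Squared Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Indices of the k nearest minority neighbours, excluding base itself.
        neighbours = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(base, minority[j]),
        )[:k]
        other = minority[rng.choice(neighbours)]
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy feature vectors, made up for illustration (not data from the paper).
spam = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
ham_count = 10  # hypothetical size of the majority (non-spam) class
new_spam = smote(spam, n_synthetic=ham_count - len(spam), k=3)
balanced_spam = spam + new_spam
```

Because every synthetic point lies on a segment between two real minority samples, the minority class grows to match the majority without exact duplication. In practice the oversampling would be applied to the training split only, so that evaluation data stays untouched.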

Keywords

Sentiment analysis; Opinion spam detection; Imbalanced dataset; SMOTE; Social media security; Social big data

Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. College of Computer Sciences and Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia
