Detecting Cyberbullying in Social Commentary Using Supervised Machine Learning

  • Muhammad Owais Raza
  • Mohsin Memon
  • Sania Bhatti
  • Rahim Bux
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1130)


This paper addresses the problem of cyberbullying in social commentary on online discussion forums. Supervised machine learning algorithms are employed to detect whether a given comment is an insult, a threat, or a hate message. First, classification models are built with the Logistic Regression, Random Forest, and Naive Bayes algorithms; then Voting and AdaBoost classifiers are applied to the developed models to observe which works best for this task. In Japan, members of the PTA (Parent Teacher Association) perform "net-patrol", manually monitoring websites in order to catch and stop cyberbullying activity; however, doing this manually is a time-consuming and tiring process. The main contribution of this paper is a mechanism for detecting cyberbullying automatically. Using supervised machine learning with the logistic regression algorithm, the model achieved an accuracy of 82.7%; with the voting classifier, an accuracy of 84.4% was observed. The evaluation results show that the voting classifier outperforms all the other algorithms tested in detecting cyberbullying.
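The idea behind a voting classifier can be illustrated with a minimal sketch of hard (majority) voting over the label predictions of three classifiers. This is not the authors' implementation: the abstract does not say whether hard or soft voting was used, the helper names `majority_vote` and `accuracy` are hypothetical, and the labels below are made-up toy values rather than the paper's data or results.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine label predictions from several classifiers by hard voting.

    predictions_per_model: list of equal-length lists, one list per classifier.
    Ties are broken in favour of the smallest label (an arbitrary choice
    made for this sketch).
    """
    if not predictions_per_model:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions_per_model[0])
    if any(len(p) != n_samples for p in predictions_per_model):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) groups the votes cast by every model for the same sample
    for votes in zip(*predictions_per_model):
        counts = Counter(votes)
        # Highest count wins; the label itself is the tie-breaker
        combined.append(min(counts, key=lambda label: (-counts[label], label)))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy labels, invented for illustration (1 = abusive comment, 0 = neutral)
y_true = [1, 0, 1, 1, 0, 0]
logreg_pred = [1, 0, 0, 1, 0, 1]   # stands in for Logistic Regression output
forest_pred = [1, 1, 1, 1, 0, 0]   # stands in for Random Forest output
bayes_pred = [0, 0, 1, 1, 1, 0]    # stands in for Naive Bayes output

ensemble_pred = majority_vote([logreg_pred, forest_pred, bayes_pred])

for name, pred in [("logreg", logreg_pred), ("forest", forest_pred),
                   ("bayes", bayes_pred), ("voting", ensemble_pred)]:
    print(f"{name}: accuracy = {accuracy(y_true, pred):.3f}")
```

In this toy case each base model makes different mistakes, so the majority corrects them and the combined prediction scores higher than any single model. Whether that happens on real data depends on how correlated the base models' errors are.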


Cyberbullying, Python, NLP, Supervised machine learning



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Muhammad Owais Raza (1)
  • Mohsin Memon (1)
  • Sania Bhatti (1)
  • Rahim Bux (1)
  1. Department of Software Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
