Weakly supervised cyberbullying detection with participant-vocabulary consistency

  • Elaheh Raisi
  • Bert Huang
Original Article


Online harassment and cyberbullying are becoming serious social health threats that damage people’s lives. This phenomenon is creating a need for automated, data-driven techniques for analyzing and detecting such detrimental online behaviors. We propose a weakly supervised machine learning method that simultaneously infers user roles in harassment-based bullying and new vocabulary indicators of bullying. The learning algorithm considers social structure and infers which users tend to bully and which tend to be victimized. To address the elusive nature of cyberbullying with minimal effort and cost, the learning algorithm requires only weak supervision. The weak supervision takes the form of a small, expert-provided seed set of bullying indicators, and the algorithm uses a large, unlabeled corpus of social media interactions to extract the bullying roles of users and additional vocabulary indicators of bullying. The model estimates whether each social interaction is bullying based on who participates and on what language is used, and it tries to maximize the agreement between these two estimates, i.e., participant-vocabulary consistency (PVC). To evaluate PVC, we perform extensive quantitative and qualitative experiments on three social media datasets: Twitter, Ask.fm, and Instagram. We illustrate the strengths and weaknesses of the model by analyzing the conversations and key phrases identified by PVC. In addition, we present the distributions of bully and victim scores to examine the relationship between the tendencies of users to bully and to be victimized. We also perform a fairness evaluation to analyze the potential for automated detection to be biased against particular groups.
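The consistency idea described in the abstract can be illustrated with a small sketch. The abstract does not state the objective function, so everything below is an assumption made for illustration: the squared-error agreement term, the L2 regularizer `lam`, the plain gradient-descent optimizer, and the hypothetical function name `fit_pvc_sketch`. Each message contributes a participant estimate (the average of the sender's bully score and the receiver's victim score) that is pulled toward the score of each word it contains, while seed words stay fixed at 1.

```python
from collections import defaultdict


def fit_pvc_sketch(messages, seed_words, lam=0.1, lr=0.1, steps=500):
    """Illustrative only: fit bully, victim, and word scores by gradient descent.

    messages   : list of (sender, receiver, list_of_words) tuples
    seed_words : expert-provided bullying indicators, held fixed at score 1.0
    Assumed loss (not taken from the abstract):
        lam/2 * (|b|^2 + |v|^2 + |w|^2)
        + 1/2 * sum over (message, word) of ((b_sender + v_receiver)/2 - w_word)^2
    """
    seeds = set(seed_words)
    bully = defaultdict(float)   # tendency of a user to bully
    victim = defaultdict(float)  # tendency of a user to be victimized
    word = defaultdict(float)    # how strongly a word indicates bullying
    for k in seeds:
        word[k] = 1.0

    for _ in range(steps):
        g_b, g_v, g_w = defaultdict(float), defaultdict(float), defaultdict(float)
        for sender, receiver, words in messages:
            for k in words:
                # Disagreement between the participant and vocabulary estimates.
                residual = (bully[sender] + victim[receiver]) / 2.0 - word[k]
                g_b[sender] += residual / 2.0
                g_v[receiver] += residual / 2.0
                g_w[k] -= residual
        for u in list(bully):
            bully[u] -= lr * (g_b[u] + lam * bully[u])
        for u in list(victim):
            victim[u] -= lr * (g_v[u] + lam * victim[u])
        for k in list(word):
            if k not in seeds:  # seed indicators are never updated
                word[k] -= lr * (g_w[k] + lam * word[k])
    return dict(bully), dict(victim), dict(word)


# Toy corpus (fabricated for illustration, not from the paper's datasets).
toy_messages = [
    ("alice", "bob", ["idiot", "loser"]),
    ("alice", "bob", ["loser", "ugly"]),
    ("carol", "dave", ["hello", "lunch"]),
    ("dave", "carol", ["thanks", "lunch"]),
]
bully, victim, word = fit_pvc_sketch(toy_messages, seed_words={"idiot"})
print(sorted(word.items(), key=lambda kv: -kv[1]))
```

In this toy corpus only one seed word is given, yet the words that co-occur with it in messages from the same sender to the same receiver also receive positive scores, and the sender and receiver of those messages receive higher bully and victim scores than users whose conversations contain no indicators. This mirrors, in a very reduced form, the vocabulary expansion and role inference the abstract describes.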



Copyright information

© Springer-Verlag GmbH Austria, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Computer Science, Virginia Tech, Blacksburg, USA
