Detecting Inappropriate Comments to News

  • Patrizio Bellan
  • Carlo Strapparava
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11298)


Inappropriate comments, defined here as posts that are deliberately offensive, off-topic, or troll-like, or that directly attack others on religious, sexual, racial, gender, or ethnic grounds, are an increasingly serious problem in user-generated content on the internet, because they can derail a conversation or spread harassment. The computational analysis of such content, posted in response to professional newspaper articles, has not yet been well investigated. In particular, the most predictive linguistic and cognitive features have seldom been addressed, and inappropriateness itself has not been studied in depth. After a new dataset of inappropriate comments was collected, three classic machine learning models were tested on two representations of the input data: normal and distorted. Text distortion, thanks to its ability to mask thematic information, improved classification performance and provided a valuable basis from which to extract features. Lexicon-based features proved to be the most valuable. Logistic regression turned out to be the most efficient algorithm.
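As an illustration of how text distortion can mask thematic information, the sketch below keeps only the k most frequent words of a corpus and replaces every other word with a run of asterisks of the same length. This is a minimal sketch under stated assumptions, not the paper's implementation: the masking scheme, the tokenizer, the value of k, the function names (`build_frequent_vocab`, `distort`), and the toy comments are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Lowercase the text and split it into word and punctuation tokens."""
    return TOKEN_RE.findall(text.lower())


def build_frequent_vocab(corpus, k):
    """Return the set of the k most frequent alphabetic tokens in the corpus."""
    counts = Counter(
        tok for doc in corpus for tok in tokenize(doc) if tok.isalpha()
    )
    return {word for word, _ in counts.most_common(k)}


def distort(text, frequent_vocab):
    """Mask every alphabetic token outside frequent_vocab with asterisks.

    Punctuation and digits are left untouched; masked words keep their length.
    """
    masked = []
    for tok in tokenize(text):
        if tok.isalpha() and tok not in frequent_vocab:
            masked.append("*" * len(tok))
        else:
            masked.append(tok)
    return " ".join(masked)


# Toy data, invented for illustration only.
corpus = [
    "the match was a disgrace",
    "the vote was a disgrace",
    "the budget was a joke",
]
vocab = build_frequent_vocab(corpus, k=3)  # assumption: k=3 for this tiny corpus
print(distort("The referee was a joke!", vocab))
# the ******* was a **** !
```

In this toy run the topic-bearing words ("referee", "joke") are masked while the frequent function words and the punctuation survive, so a classifier trained on the distorted text would have to rely more on form than on subject matter.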


Journal news comments, Inappropriateness, Off-topic, Text distortion



Copyright information

© Springer Nature Switzerland AG 2018

Authors and Affiliations

  1. University of Trento, CIMeC, Trento, Italy
  2. FBK-irst, Trento, Italy
