Detecting Inappropriate Comments to News

  • Conference paper
AI*IA 2018 – Advances in Artificial Intelligence (AI*IA 2018)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11298))

Abstract

Inappropriate comments, defined as deliberately offensive, off-topic, or troll-like posts, or direct attacks on religious, sexual, racial, gender, or ethnic grounds, are becoming increasingly problematic in user-generated content on the internet, because they can derail the conversation or spread harassment. Furthermore, the computational analysis of this kind of content, posted in response to professional newspapers, has not yet been well investigated. In particular, the most predictive linguistic and cognitive features have seldom been addressed, and inappropriateness has not been investigated in depth. After collecting a new dataset of inappropriate comments, three classic machine learning models were tested on two possible representations of the input data: normal and distorted. The text distortion technique, thanks to its ability to mask thematic information, enhanced classification performance and provided valuable ground from which to extract features. Lexicon-based features proved to be the most valuable characteristics to consider. Logistic regression turned out to be the most efficient algorithm.
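
The abstract does not say which distortion variant was used, so the following is only a minimal sketch of one common form of text distortion, in the style proposed by Stamatatos for authorship attribution: words outside a small set of frequent words are replaced character by character with asterisks, and digits with `#`. The function names `top_k_words` and `distort`, the tokenisation pattern, and the choice of frequent words are all illustrative assumptions, not the authors' implementation.

```python
import re
from collections import Counter

# Assumed tokenisation: runs of letters are words, runs of digits are numbers.
# Punctuation and whitespace are not matched and therefore stay untouched.
TOKEN = re.compile(r"(?P<word>[^\W\d_]+)|(?P<num>\d+)")


def top_k_words(corpus, k):
    """Return the k most frequent lowercase words in a list of texts."""
    counts = Counter(
        m.group(0).lower()
        for text in corpus
        for m in TOKEN.finditer(text)
        if m.lastgroup == "word"
    )
    return {word for word, _ in counts.most_common(k)}


def distort(text, keep):
    """Mask every word not in `keep` with asterisks and every digit with '#'.

    Frequent (mostly function) words survive, so stylistic structure is
    preserved while topical vocabulary is hidden.
    """
    def mask(match):
        token = match.group(0)
        if match.lastgroup == "num":
            return "#" * len(token)
        return token if token.lower() in keep else "*" * len(token)

    return TOKEN.sub(mask, text)


if __name__ == "__main__":
    keep = {"the", "is", "a"}  # hypothetical frequent-word list
    print(distort("The minister is a 100% liar", keep))
    # prints: The ******** is a ###% ****
```

Under this sketch, both the normal and the distorted strings could be passed to the same feature extractor and classifier (for example a bag-of-n-grams representation with logistic regression), which would allow the two representations to be compared on equal terms.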

Author information

Corresponding author

Correspondence to Carlo Strapparava.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Bellan, P., Strapparava, C. (2018). Detecting Inappropriate Comments to News. In: Ghidini, C., Magnini, B., Passerini, A., Traverso, P. (eds) AI*IA 2018 – Advances in Artificial Intelligence. AI*IA 2018. Lecture Notes in Computer Science, vol 11298. Springer, Cham. https://doi.org/10.1007/978-3-030-03840-3_30

  • DOI: https://doi.org/10.1007/978-3-030-03840-3_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-03839-7

  • Online ISBN: 978-3-030-03840-3

  • eBook Packages: Computer Science, Computer Science (R0)
