Abstract
Inappropriate comments, defined as comments that are deliberately offensive, off-topic, or troll-like, or that directly attack others on religious, sexual, racial, gender, or ethnic grounds, are an increasingly serious problem in user-generated content on the internet, because they can derail a conversation or spread harassment. Yet the computational analysis of such content, posted in response to professional newspapers, has not been well investigated. In particular, the most predictive linguistic and cognitive features have seldom been addressed, and inappropriateness itself has not been studied in depth. After collecting a new dataset of inappropriate comments, three classic machine learning models were tested on two representations of the input data: normal and distorted. The text distortion technique, thanks to its ability to mask thematic information, improved classification performance and provided a valuable basis from which to extract features. Lexicon-based features proved to be the most valuable. Logistic regression turned out to be the most efficient algorithm.
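To make the idea of masking thematic information concrete, the following is a minimal sketch of one possible distortion scheme. It is an assumption for illustration, not the procedure confirmed by the abstract: the k most frequent words in a corpus are kept, and every other word is replaced by asterisks of the same length. The function names `frequent_words` and `distort` and the parameter `k` are hypothetical.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of ASCII letters; punctuation and digits are left untouched.
WORD = re.compile(r"[A-Za-z]+")


def frequent_words(corpus, k):
    """Return the set of the k most frequent lowercased words in the corpus."""
    counts = Counter(
        word.lower() for text in corpus for word in WORD.findall(text)
    )
    return {word for word, _ in counts.most_common(k)}


def distort(text, keep):
    """Replace each word not in `keep` with asterisks, one per character."""
    def mask(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)

    return WORD.sub(mask, text)


if __name__ == "__main__":
    # Hand-picked keep-list for the demo; in practice it would come from frequent_words.
    print(distort("The referee is blind!", {"the", "is"}))
    # prints: The ******* is *****!
```

Under this scheme, topic-bearing words such as "referee" disappear while function words, word lengths, and punctuation survive, so a classifier trained on the distorted text has to rely on style rather than subject matter.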
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Bellan, P., Strapparava, C. (2018). Detecting Inappropriate Comments to News. In: Ghidini, C., Magnini, B., Passerini, A., Traverso, P. (eds) AI*IA 2018 – Advances in Artificial Intelligence. AI*IA 2018. Lecture Notes in Computer Science, vol 11298. Springer, Cham. https://doi.org/10.1007/978-3-030-03840-3_30
DOI: https://doi.org/10.1007/978-3-030-03840-3_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03839-7
Online ISBN: 978-3-030-03840-3
eBook Packages: Computer Science, Computer Science (R0)