Detecting Inappropriate Comments to News

Bellan, Patrizio; Strapparava, Carlo

doi:10.1007/978-3-030-03840-3_30

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11298)

Included in the following conference series:

International Conference of the Italian Association for Artificial Intelligence

Abstract

Inappropriate comments, defined as deliberately offensive, off-topic, or troll-like posts, or direct attacks on religious, sexual, racial, gender, or ethnic grounds, are becoming increasingly problematic in user-generated content on the internet, because they can either derail the conversation or spread harassment. Furthermore, the computational analysis of this kind of content, posted in response to professional newspapers, has not yet been well investigated. In particular, the most predictive linguistic and cognitive features have seldom been addressed, and inappropriateness itself has not been investigated deeply. After collecting a new dataset of inappropriate comments, three classic machine learning models were tested over two possible representations of the input data: normal and distorted. The text distortion technique, thanks to its ability to mask thematic information, enhanced classification performance and provided a valuable ground from which to extract features. Lexicon-based features proved to be the most valuable characteristics to consider. Logistic regression turned out to be the most efficient algorithm.
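As a rough illustration of how a distorted representation can mask thematic information, the sketch below keeps only words from a list of frequent words and replaces every other word with asterisks. This is a minimal sketch under stated assumptions: the abstract does not specify the distortion variant, the tokenization, or the size of the frequent-word list, so the function names `frequent_words` and `distort`, the parameter `k`, and the one-asterisk-per-character masking are hypothetical choices, not the authors' implementation.

```python
import re
from collections import Counter


def frequent_words(corpus, k):
    """Return the k most frequent lowercase word tokens in a list of texts.

    Assumption: a simple regex tokenizer; the paper's tokenizer is not stated.
    """
    counts = Counter(
        token
        for text in corpus
        for token in re.findall(r"\w+", text.lower())
    )
    return {word for word, _ in counts.most_common(k)}


def distort(text, keep):
    """Mask every word not in `keep`, one asterisk per character.

    Frequent (mostly function) words survive, so style is preserved
    while topic-bearing words are hidden. Punctuation is left as-is.
    """
    def mask(match):
        word = match.group(0)
        return word if word.lower() in keep else "*" * len(word)

    return re.sub(r"\w+", mask, text)


# Hypothetical usage with a hand-picked list of words to keep.
keep = {"the", "is", "a"}
print(distort("The article is a disgrace!", keep))
```

Both the normal and the distorted strings could then be passed to the same feature extractor and classifier, which is one way to compare the two representations on equal terms.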

Author information

Authors and Affiliations

University of Trento, CIMeC, Trento, Italy
Patrizio Bellan
FBK-irst, Trento, Italy
Carlo Strapparava

Corresponding author

Correspondence to Carlo Strapparava.

Editor information

Editors and Affiliations

Fondazione Bruno Kessler, Povo (TN), Italy
Chiara Ghidini
Fondazione Bruno Kessler, Povo (TN), Italy
Bernardo Magnini
University of Trento, Povo (TN), Italy
Andrea Passerini
Fondazione Bruno Kessler, Povo (TN), Italy
Paolo Traverso

About this paper

Cite this paper

Bellan, P., Strapparava, C. (2018). Detecting Inappropriate Comments to News. In: Ghidini, C., Magnini, B., Passerini, A., Traverso, P. (eds) AI*IA 2018 – Advances in Artificial Intelligence. AI*IA 2018. Lecture Notes in Computer Science, vol 11298. Springer, Cham. https://doi.org/10.1007/978-3-030-03840-3_30

DOI: https://doi.org/10.1007/978-3-030-03840-3_30
Published: 09 November 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-03839-7
Online ISBN: 978-3-030-03840-3
eBook Packages: Computer Science, Computer Science (R0)
