Anomalous User Comment Detection in Social News Websites

  • Jorge de-la-Peña-Sordo
  • Iker Pastor-López
  • Xabier Ugarte-Pedrero
  • Igor Santos
  • Pablo García Bringas
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 299)


The Web has evolved over the years, and content is no longer generated only by a site's administrators. Users of a website can now express their own feelings and opinions. This has had negative side effects: the content users generate is sometimes inappropriate, and it is frequently authored by troll users who deliberately seek controversy. In this paper we propose a new method to detect trolling comments in social news websites. To this end, we extract a combination of statistical, syntactic and opinion features from the user comments. Since the troll phenomenon is quite common on the Web, we propose a novel experimental setup for our anomaly detection method: we treat troll comments as the base model, that is, as normal behaviour ('normality'). We evaluate our approach with data from 'Menéame', a popular Spanish social news site, showing that our method can obtain high detection rates whilst minimising the labelling task.
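The setup described above, in which troll comments serve as the 'normality' model and anything far from them is flagged as anomalous, can be illustrated with a minimal sketch. The abstract does not state which distance measure, combination rule, or threshold the method uses, so the Euclidean distance, the mean-distance score, the threshold value, and the two-dimensional toy feature vectors below are all assumptions made purely for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def anomaly_score(sample, normal_set):
    """Mean distance from `sample` to every vector in the normality model.

    Assumption: the mean is used as the combination rule; the abstract
    does not specify one.
    """
    return sum(euclidean(sample, n) for n in normal_set) / len(normal_set)


def is_anomalous(sample, normal_set, threshold):
    """A sample is anomalous (here: not a troll comment) when its score
    exceeds the threshold."""
    return anomaly_score(sample, normal_set) > threshold


# Hypothetical feature vectors standing in for labelled troll comments.
troll_vectors = [[0.9, 0.8], [1.0, 0.7], [0.8, 0.9]]
threshold = 0.5  # assumed value; it would normally be tuned on held-out data

print(is_anomalous([0.9, 0.8], troll_vectors, threshold))  # close to the troll model
print(is_anomalous([0.1, 0.2], troll_vectors, threshold))  # far from the troll model
```

Only the troll comments need labels in such a scheme, which is consistent with the abstract's aim of minimising the labelling task.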


Information Retrieval, Troll Detection, Web Categorisation, Content Filtering, Machine-Learning





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jorge de-la-Peña-Sordo (1)
  • Iker Pastor-López (1)
  • Xabier Ugarte-Pedrero (1)
  • Igor Santos (1)
  • Pablo García Bringas (1)
  1. S3Lab, DeustoTech Computing, University of Deusto, Bilbao, Spain
