Social News Website Moderation through Semi-supervised Troll User Filtering

  • Jorge de-la-Peña-Sordo
  • Igor Santos
  • Iker Pastor-López
  • Pablo García Bringas
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 239)


Recently, the Internet has evolved into a more social space in which all users can contribute content and opinions via websites, social networks, and blogs. Accordingly, content generation within social webs has also evolved. Users of social news sites post public links to news stories, and every user can comment on the stories or on other users’ comments. On these sites, classifying users according to their behaviour can be useful for web profiling, user moderation, and similar tasks. In this paper, we propose a new method for filtering trolling users. To this end, we extract several features from users’ public profiles and from their comments in order to predict whether a user is a troll. These features are used to train several machine-learning techniques. Since the number of users and comments is very high and the labelling process is laborious, we use a semi-supervised approach known as collective learning to reduce the labelling effort of supervised approaches. We validate our approach with data from ‘Menéame’, a popular Spanish social news site, showing that our method achieves high accuracy rates whilst minimising the labelling task.
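The core idea of the abstract — train on a small set of hand-labelled users and let a semi-supervised learner propagate labels to the rest — can be sketched with scikit-learn. This is only an illustrative sketch: the paper's method is collective learning over relational data, whereas this example uses the simpler self-training strategy as a stand-in, and the three per-user features (average comment score, comments per day, fraction of downvoted comments) are hypothetical, not the ones from the paper.

```python
import numpy as np
from sklearn.semi_supervised import SelfTrainingClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Hypothetical per-user features: [avg. comment score, comments per day,
# fraction of downvoted comments]. Synthetic trolls score low and collect
# many downvotes; regular users do the opposite.
n = 200
trolls = rng.normal(loc=[-2.0, 5.0, 0.8], scale=0.5, size=(n, 3))
regular = rng.normal(loc=[3.0, 2.0, 0.1], scale=0.5, size=(n, 3))
X = np.vstack([trolls, regular])
y = np.array([1] * n + [0] * n)  # 1 = troll, 0 = regular user

# Simulate the costly labelling step: keep labels for only ~10% of users
# and mark the rest as unlabelled (-1), as SelfTrainingClassifier expects.
y_semi = y.copy()
unlabelled = rng.random(2 * n) > 0.10
y_semi[unlabelled] = -1

# Self-training: a base classifier is fitted on the labelled subset, then
# iteratively adds its own most confident predictions as pseudo-labels.
model = SelfTrainingClassifier(DecisionTreeClassifier(random_state=0),
                               threshold=0.9)
model.fit(X, y_semi)

# Evaluate against the full ground truth (normally hidden at train time).
accuracy = (model.predict(X) == y).mean()
print(f"accuracy with ~10% labels: {accuracy:.2f}")
```

On this well-separated synthetic data the classifier recovers nearly all labels despite seeing only a tenth of them, which is the labelling-effort trade-off the paper exploits.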


User Profiling · Content Filtering · Web Mining · User Categorisation · Machine Learning





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Jorge de-la-Peña-Sordo (1)
  • Igor Santos (1)
  • Iker Pastor-López (1)
  • Pablo García Bringas (1)

  1. S3Lab, DeustoTech Computing, University of Deusto, Bilbao, Spain
