Framework for Opinion Spammers Detection

  • Andrzej Opalinski
  • Grzegorz Dobrowolski
Part of the Communications in Computer and Information Science book series (CCIS, volume 429)


The evolution of the Web and the high anonymity of virtual identities have both positive and negative effects on society. One negative effect is the problem of fake (spam) opinions, which are distributed across Web forums and recommendation portals. Research in this area mainly concerns the detection of individual spam opinions. The detection of multiple virtual identities created by a single person, however, still lacks effective solutions. This article describes a system for finding virtual multi-identities created in order to generate spam opinions. The system is based on a combination of features from several domains: natural language processing, time-activity analysis, and features related to common objects. A series of tests evaluated the system's efficiency in detecting virtual multi-identities on a recommendation portal.


Keywords: virtual identities, opinion spam, cybercrime





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Andrzej Opalinski (1)
  • Grzegorz Dobrowolski (1)
  1. AGH University of Science and Technology, Krakow, Poland
