Multimedia Tools and Applications

Volume 76, Issue 3, pp 3187–3211

Deceptive review detection using labeled and unlabeled data

  • Jitendra Kumar Rout
  • Smriti Singh
  • Sanjay Kumar Jena
  • Sambit Bakshi


The availability of millions of products and services on e-commerce sites makes it difficult to find the most suitable product for a given set of requirements, because so many alternatives exist. The most popular and useful remedy is to consult reviews on opinion-sharing social media written by people who have already tried the products. Almost all e-commerce sites let users share their views on, and experience of, the products and services they have used. Customer reviews are increasingly used by individuals, manufacturers, and retailers for purchase and business decisions. Because the reviews received are not scrutinized, anybody can write anything anonymously, which ultimately leads to review spam. Moreover, driven by the desire for profit and/or publicity, spammers produce synthesized reviews to promote some products or brands and demote those of competitors. Deceptive review spam has grown considerably over time. In this work, we apply supervised as well as unsupervised techniques to identify review spam. The most effective feature sets have been assembled for model building, and sentiment analysis has been incorporated into the detection process. To obtain the best performance, several well-known classifiers were applied to a labeled dataset. For unlabeled data, clustering is used after the desired attributes have been computed for spam detection. Additionally, there is a high chance that spam reviewers are also responsible for content pollution in multimedia social networks, because many users now post reviews using their social network logins. Finally, the work can be extended to find suspicious accounts responsible for posting fake multimedia content to the respective social networks.
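The two-part approach summarized above (a classifier trained on labeled reviews, and clustering over computed attributes for unlabeled reviews) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the classifiers, features, or clustering algorithm used, so the multinomial Naive Bayes model, the k-means routine, the two attributes, the `SUPERLATIVES` word list, the toy reviews, and every function name below are hypothetical choices made only for this example.

```python
import math
from collections import Counter

# Hypothetical cue words, chosen only for this illustration.
SUPERLATIVES = {"best", "amazing", "perfect", "greatest"}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    return cleaned.split()

def train_naive_bayes(labeled_reviews):
    """Count words per label from (text, label) pairs."""
    word_counts = {}        # label -> Counter of word frequencies
    doc_counts = Counter()  # label -> number of reviews
    for text, label in labeled_reviews:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def predict(model, text):
    """Return the label with the highest log-posterior (Laplace smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        score = math.log(doc_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in tokenize(text):
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def attributes(text):
    """Two illustrative attributes: superlative share, '!' per token."""
    tokens = tokenize(text)
    n = max(len(tokens), 1)  # avoid division by zero on empty text
    superlative_share = sum(t in SUPERLATIVES for t in tokens) / n
    return (superlative_share, text.count("!") / n)

def kmeans(points, k=2, iterations=10):
    """Plain k-means; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to its nearest centroid (squared distance).
        assignment = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment

# Toy data, invented for this sketch.
LABELED = [
    ("Amazing amazing best hotel ever, my husband loved it", "deceptive"),
    ("Best experience of my life, luxury at its finest", "deceptive"),
    ("Room was small but clean, bathroom floor was wet", "truthful"),
    ("Check-in took ten minutes, the room faced the street", "truthful"),
]
UNLABELED = [
    "Best hotel ever!!! Amazing, perfect stay!",
    "The room was fine and the staff were polite.",
    "Perfect!! Greatest place, amazing!",
    "Parking costs extra and breakfast ends early.",
]

model = train_naive_bayes(LABELED)
label = predict(model, "the best luxury hotel ever")        # "deceptive"
clusters = kmeans([attributes(r) for r in UNLABELED], k=2)  # [0, 1, 0, 1]
```

On this toy data the classifier labels the superlative-heavy review as deceptive, and the clustering step groups the two exclamation-laden reviews apart from the two plain ones. Cluster indices carry no meaning by themselves; deciding which cluster corresponds to spam would still require inspection or additional evidence.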


Review spam; Spam detection techniques; Review analysis; Opinion spam; Sentiment analysis; Social networking



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Jitendra Kumar Rout (1)
  • Smriti Singh (2)
  • Sanjay Kumar Jena (1)
  • Sambit Bakshi (1)

  1. National Institute of Technology, Rourkela, India
  2. Teradata Corporation, Telangana, India
