
Fake opinion detection: how similar are crowdsourced datasets to real data?

  • Original Paper
  • Published in: Language Resources and Evaluation

Abstract

Identifying deceptive online reviews is a challenging task for Natural Language Processing (NLP). Collecting corpora for the task is difficult, because it is normally not possible to know whether reviews are genuine. A common workaround involves collecting (supposedly) truthful reviews online and adding them to a set of deceptive reviews obtained through crowdsourcing services. Models trained this way are generally successful at discriminating between ‘genuine’ online reviews and the crowdsourced deceptive reviews. It has been argued that the deceptive reviews obtained via crowdsourcing are very different from real fake reviews, but the claim has never been properly tested. In this paper, we compare (false) crowdsourced reviews with a set of ‘real’ fake reviews published online. We evaluate their degree of similarity and their usefulness in training models for the detection of untrustworthy reviews. We find that the deceptive reviews collected via crowdsourcing are significantly different from the fake reviews published online. In the case of the artificially produced deceptive texts, their domain similarity with the targets affects the models’ performance much more than their untruthfulness does. This suggests that the use of crowdsourced datasets for opinion spam detection may not result in models applicable to the real task of detecting deceptive reviews. As an alternative way to create large datasets for the fake review detection task, we propose methods based on the probabilistic annotation of unlabeled texts, relying on meta-information generally available on e-commerce sites. Such methods are independent of the content of the reviews and make it possible to train reliable models for the detection of fake reviews.

Notes

  1. www.amazon.com.

  2. www.ebay.com.

  3. www.tripadvisor.com.

  4. www.fakespot.com.

  5. https://s3.amazonaws.com/amazon-reviews-pds/readme.html.

  6. We know from discussions with Amazon researchers (p.c.) that Amazon has created a substantial task force dedicated to identifying fake reviews, removing them from the site, and pursuing their authors, and that the fake reviews in the published dataset were identified by this task force as being almost certainly false, but this status is unofficial.

  7. www.yelp.com.

  8. Amazon Mechanical Turk.

  9. Sock puppetry and fake reviews: publish and be damned, http://www.guardian.co.uk/books/2012/sep/04/sock-puppetry-publish-be-damned.

  10. http://www.moneytalksnews.com/2011/07/25/3-tips-for-spotting-fake-product-reviews.

  11. http://www.nytimes.com/2011/08/20/technology/finding-fake-reviews-online.html?_r=1&.

  12. www.nytimes.com/2012/08/26/business/book-reviewers-for-hire-meet-a-demand-for-online-raves.html?pagewanted=all.

  13. We specify the date of the scraping because the number of reviews changes over time.

  14. www.crowdflower.com.

  15. https://www.grammarly.com/plagiarism-checker.

  16. https://www.paperrater.com/plagiarism_checker.

  17. http://smallseotools.com/plagiarism-checker/.

  18. www.ims.uni-stuttgart.de.

  19. www.r-project.org.

Acknowledgements

Leticia Cagnina thanks CONICET for the continued financial support. This work was funded by MINECO/FEDER (Grant No. SomEMBED TIN2015-71147-C2-1-P). The work of Paolo Rosso was partially funded by the MISMIS-FAKEnHATE Spanish MICINN research project (PGC2018-096212-B-C31). Massimo Poesio was in part supported by the UK Economic and Social Research Council (Grant Number ES/M010236/1).

Author information

Corresponding author

Correspondence to Tommaso Fornaciari.

About this article

Cite this article

Fornaciari, T., Cagnina, L., Rosso, P. et al. Fake opinion detection: how similar are crowdsourced datasets to real data? Lang Resources & Evaluation 54, 1019–1058 (2020). https://doi.org/10.1007/s10579-020-09486-5

  • DOI: https://doi.org/10.1007/s10579-020-09486-5
