Factitious or fact? Learning textual representations for fake online review detection

Cluster Computing

Abstract

User reviews can strongly influence a company's revenue in the e-commerce industry. Online users rely on reviews before choosing a product or service. As a result, the trustworthiness of online reviews is vital for organisations and can directly affect their reputation and revenue. Because of this, some firms pay spammers to publish fake reviews. Most recent studies on fake review detection use supervised learning. Neural network techniques have more recently been used extensively to detect fake reviews and have demonstrated their ability to do so. This paper first provides a benchmark study that analyses the performance of various machine learning algorithms with different feature extraction methods on five fake review datasets. Second, we propose three advanced language models for embedding reviews into the classifiers. Third, we conduct an exhaustive feature set evaluation study to find the best features for detecting fake reviews. Fourth, we analyse the performance of traditional machine learning, deep learning, and advanced deep learning models using different feature extraction methods on five fake review datasets. Finally, we integrate the ELECTRA model with a CNN to classify reviews as real or fake. We use accuracy, precision, recall, and F1 score as the criteria for evaluating the proposed model. For deep contextualised representation and neural classification, we place a Single-Layer Perceptron (SLP), a Multi-Layer Perceptron (MLP), or a Convolutional Neural Network (CNN) after the embedding layer of pre-trained models such as ELMo, ELECTRA, and GPT-2. The experimental results indicate that our proposed model outperforms state-of-the-art methods, with improvements ranging from 1 to 7% in accuracy and F1 score.
To the best of our knowledge, no prior work has evaluated the effectiveness of such advanced pre-trained models in detecting fake reviews. Further, this research comprehensively evaluates several machine learning approaches and feature extraction strategies for fake online review detection.
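
The four evaluation criteria named in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code: the function name `classification_metrics`, the label encoding (1 = fake, 0 = genuine), and the toy labels are assumptions made here for illustration only.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for a binary classifier.

    Assumption (not from the paper): label 1 marks a fake review and
    label 0 a genuine one, so "positive" means "fake".
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one example is required")

    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, invented for illustration; not data from the paper.
    gold = [1, 1, 1, 0, 0, 0, 1, 0]
    predicted = [1, 0, 1, 0, 1, 1, 1, 0]
    for name, value in classification_metrics(gold, predicted).items():
        print(f"{name}: {value:.3f}")
```

Reporting F1 alongside accuracy matters when the fake and genuine classes are imbalanced, because accuracy alone can look high for a classifier that rarely predicts the minority class.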

Data availability

All datasets are open-source, and the sources are cited.

Notes

  1. https://www.nltk.org/_modules/nltk/tag.html.

Funding

No Funding.

Author information

Contributions

RM initialized the project and managed the study. RM collected the data and performed formal analysis. RM analyzed the data. RM wrote the initial manuscript. All authors contributed to the editing of the paper.

Corresponding author

Correspondence to Rami Mohawesh.

Ethics declarations

Conflict of interest

No competing interest.

Ethical approval

No ethical issue involved.

Informed consent

No ethical issue involved.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mohawesh, R., Al-Hawawreh, M., Maqsood, S. et al. Factitious or fact? Learning textual representations for fake online review detection. Cluster Comput (2023). https://doi.org/10.1007/s10586-023-04148-x

  • DOI: https://doi.org/10.1007/s10586-023-04148-x
