Node embedding approach for accurate detection of fake reviews: a graph-based machine learning approach with explainable AI

Zaki, Nazar; Krishnan, Anusuya; Turaev, Sherzod; Rustamov, Zahiriddin; Rustamov, Jaloliddin; Almusalami, Aisha; Ayyad, Farah; Regasa, Tsion; Iriho, Brice Boris

doi:10.1007/s41060-024-00565-2

Regular Paper
Published: 04 June 2024

International Journal of Data Science and Analytics

Abstract

In recent years, online reviews have become increasingly important in promoting products and services. Unfortunately, writing deceptive reviews has also become a common way to promote one’s own business or tarnish the reputation of competitors. As a result, identifying fake reviews has become an active area of research. This paper proposes a node embedding approach to detecting fake online reviews. The approach extracts features from the input data to create a distance matrix, which is then used to construct a graph. We then apply the Node2Vec biased random walk algorithm to the graph nodes to build a model, retrieve node embeddings from that model using Word2Vec, and classify the nodes with a range of classifiers. We trained and evaluated the machine learning models on the Deceptive Opinion Spam Corpus and YelpChi datasets and obtained better results than prior work on fake review detection: an SVM using the Hamming distance achieved 98.44% on each of accuracy, precision, recall, and F1-score. Furthermore, we explored several explainable AI methods for explaining the proposed approach, demonstrating its interpretability. Our proposed node embedding approach shows promising results for detecting fake reviews and offers a transparent and interpretable solution to the problem.
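The first stages of the pipeline described in the abstract (features, distance matrix, graph, biased random walks) can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the toy binary feature vectors, the k-nearest-neighbour rule for turning the distance matrix into a graph, the function names, and the parameter values are all assumptions made for illustration. The abstract names the Hamming distance and the Node2Vec walk but does not specify how edges are chosen. The resulting walks are what a Word2Vec implementation would consume as "sentences" to produce node embeddings for a classifier such as an SVM; that step is omitted here.

```python
import random

def hamming_distance(a, b):
    """Fraction of positions at which two equal-length vectors differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)

def distance_matrix(features):
    """Pairwise Hamming distances between all feature vectors."""
    n = len(features)
    return [[hamming_distance(features[i], features[j]) for j in range(n)]
            for i in range(n)]

def knn_graph(dist, k):
    """Undirected graph linking each node to its k nearest neighbours.
    ASSUMPTION: the k-NN rule is illustrative, not taken from the paper."""
    n = len(dist)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        others = (j for j in range(n) if j != i)
        for j in sorted(others, key=lambda j: dist[i][j])[:k]:
            adj[i].add(j)
            adj[j].add(i)
    return adj

def biased_walk(adj, start, length, p, q, rng):
    """One Node2Vec-style second-order walk on an unweighted graph.
    p is the return parameter, q the in-out parameter."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(adj[cur])
        if not nbrs:
            break
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:            # step back to the previous node
                weights.append(1.0 / p)
            elif nxt in adj[prev]:     # stay at distance 1 from previous node
                weights.append(1.0)
            else:                      # move further away from previous node
                weights.append(1.0 / q)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

def generate_walks(adj, num_walks, length, p=1.0, q=1.0, seed=0):
    """Run num_walks walks from every node; node ids become string tokens."""
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        nodes = list(adj)
        rng.shuffle(nodes)
        for node in nodes:
            walk = biased_walk(adj, node, length, p, q, rng)
            walks.append([str(n) for n in walk])
    return walks

# Hypothetical binary features for four reviews (e.g. term presence/absence).
features = [
    [1, 0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [0, 1, 0, 0, 1, 1],
    [0, 1, 0, 1, 1, 1],
]
dist = distance_matrix(features)
graph = knn_graph(dist, k=1)
walks = generate_walks(graph, num_walks=2, length=5)
```

In this toy example, reviews 0 and 1 differ in one of six features, as do reviews 2 and 3, so with `k=1` the graph splits into two connected pairs and every walk stays within one pair. On real data, larger `k` and tuned `p` and `q` would control how far walks explore, which in turn shapes the embeddings learned from them.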

Data availability

The data and code used in the study are available from https://github.com/nzaki02/Fake_Review_2023.

Acknowledgements

The authors gratefully acknowledge the partial support received from the College of Information Technology (CIT) at the United Arab Emirates University (UAEU). In addition, the authors would like to thank the Research Office at the UAEU for providing a summer grant (Grant code: G00003895) that supported the research work presented in this paper.

Funding

The work is supported by the Research Office at the UAEU (Grant code: G00003895).

Author information

Authors and Affiliations

Department of Computer Science and Software Engineering, College of Information Technology, United Arab Emirates University, Al Ain, 15551, UAE
Nazar Zaki, Anusuya Krishnan, Sherzod Turaev, Zahiriddin Rustamov, Jaloliddin Rustamov, Aisha Almusalami, Tsion Regasa & Brice Boris Iriho
Department of Information Systems and Security, College of Information Technology, United Arab Emirates University, Al Ain, 15551, UAE
Farah Ayyad

Contributions

NZ and ST conceptualized the paper. All authors contributed to the experimental work, with NZ, AK, ST, ZR, and JR contributing to the writing of the manuscript. NZ provided project supervision.

Corresponding author

Correspondence to Nazar Zaki.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 746 KB)

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Zaki, N., Krishnan, A., Turaev, S. et al. Node embedding approach for accurate detection of fake reviews: a graph-based machine learning approach with explainable AI. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00565-2

Received: 20 April 2023
Accepted: 08 May 2024
Published: 04 June 2024
DOI: https://doi.org/10.1007/s41060-024-00565-2
