Abstract
In recent years, online reviews have become increasingly important in promoting products and services. Unfortunately, writing deceptive reviews has also become a common practice, whether to promote one's own business or to tarnish the reputation of competitors. As a result, identifying fake reviews has become an active area of research. This paper proposes a node embedding approach to detecting fake online reviews. The approach extracts features from the input data to build a distance matrix, which is then used to construct a graph. We generate Node2Vec biased random walks over the graph's nodes, learn node embeddings from these walks with Word2Vec, and classify the nodes with several different classifiers. We trained and evaluated the machine learning models on the Deceptive Opinion Spam Corpus and YelpChi datasets and outperformed prior work on fake review detection, with an SVM using the Hamming distance achieving 98.44% accuracy, precision, recall, and F1-score. We also applied several explainable AI (XAI) methods to explain the predictions of the proposed approach, demonstrating its interpretability. The proposed node embedding approach shows promising results for fake review detection and offers a transparent, interpretable solution to the problem.
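The abstract describes the pipeline only at a high level. The following is a minimal sketch of its graph-construction and random-walk stages, not the authors' implementation (their code is in the repository listed under Data availability). It assumes binary (presence/absence) review features, an undirected unweighted graph, and a simple distance threshold for adding edges; the function names, the threshold rule, the toy data, and the values of `p` and `q` are all hypothetical. It computes normalized Hamming distances, links reviews whose distance is at most the threshold, and generates Node2Vec-style second-order biased random walks controlled by the return parameter `p` and the in-out parameter `q`.

```python
import numpy as np


def hamming_distance_matrix(X):
    """Pairwise normalized Hamming distance between the rows of a binary matrix.

    Equivalent to scipy.spatial.distance.squareform(pdist(X, "hamming")),
    but uses O(n^2 * m) memory, so it is only suitable for small examples.
    """
    X = np.asarray(X, dtype=bool)
    # Fraction of feature positions at which each pair of rows differs.
    return (X[:, None, :] != X[None, :, :]).mean(axis=2)


def build_graph(D, threshold):
    """Undirected, unweighted adjacency list: i -- j when D[i, j] <= threshold (hypothetical rule)."""
    n = D.shape[0]
    return {i: [j for j in range(n) if j != i and D[i, j] <= threshold] for i in range(n)}


def node2vec_walk(adj, start, length, p, q, rng):
    """One second-order biased random walk in the style of Node2Vec."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = adj[cur]
        if not nbrs:  # isolated node: the walk cannot continue
            break
        if len(walk) == 1:
            # First step: uniform choice among neighbours (unweighted graph).
            walk.append(nbrs[rng.integers(len(nbrs))])
            continue
        prev = walk[-2]
        prev_nbrs = set(adj[prev])
        # Unnormalized search bias alpha_pq(prev, x) for each candidate x:
        # 1/p to return to prev, 1 if x is also a neighbour of prev, 1/q otherwise.
        weights = np.array(
            [1.0 / p if x == prev else 1.0 if x in prev_nbrs else 1.0 / q for x in nbrs]
        )
        walk.append(nbrs[rng.choice(len(nbrs), p=weights / weights.sum())])
    return walk


def generate_walks(adj, num_walks, length, p, q, seed=0):
    """num_walks passes over all nodes; node ids become strings so walks can act as Word2Vec sentences."""
    rng = np.random.default_rng(seed)
    nodes = list(adj)
    walks = []
    for _ in range(num_walks):
        rng.shuffle(nodes)  # visit start nodes in a random order on each pass
        for node in nodes:
            walks.append([str(v) for v in node2vec_walk(adj, node, length, p, q, rng)])
    return walks


# Toy data: 4 "reviews" described by 6 binary features (hypothetical values).
X = np.array([
    [1, 1, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1],
    [0, 0, 1, 1, 1, 1],
])
D = hamming_distance_matrix(X)
adj = build_graph(D, threshold=0.2)  # {0: [1], 1: [0], 2: [3], 3: [2]}
walks = generate_walks(adj, num_walks=2, length=5, p=1.0, q=0.5)
```

In a full pipeline, these walks would be passed as sentences to a skip-gram Word2Vec model (for instance, gensim's `Word2Vec`), and the learned node vectors would serve as features for a classifier such as an SVM. A small `p` makes walks more likely to backtrack and stay local, while a small `q` pushes them outward, away from the previous node.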
Data availability
The data and code used in the study are available from https://github.com/nzaki02/Fake_Review_2023.
Acknowledgements
The authors gratefully acknowledge the partial support received from the College of Information Technology (CIT) at the United Arab Emirates University (UAEU). In addition, the authors would like to thank the Research Office at the UAEU for providing a summer grant (Grant code: G00003895) that supported the research work presented in this paper.
Funding
This work was supported by the Research Office at the UAEU (Grant code: G00003895).
Author information
Contributions
NZ and ST conceptualized the paper. All authors contributed to the experimental work, with NZ, AK, ST, ZR, and JR contributing to the writing of the manuscript. NZ provided project supervision.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
The electronic supplementary material is available with the online version of this article.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zaki, N., Krishnan, A., Turaev, S. et al. Node embedding approach for accurate detection of fake reviews: a graph-based machine learning approach with explainable AI. Int J Data Sci Anal (2024). https://doi.org/10.1007/s41060-024-00565-2
DOI: https://doi.org/10.1007/s41060-024-00565-2