Abstract
People often read online reviews before purchasing a product. Sellers sometimes post deceptive reviews that imitate genuine user experience in order to increase profits. Detecting such deceptive opinion spam has emerged as a challenging task in the field of opinion mining. Feature reduction plays an important role in data mining: it identifies the essential features and removes unnecessary dimensions that contribute only noise. This article extracts various textual features from gold-standard deceptive hotel reviews using different representation techniques, namely Part-of-Speech tags (POS tags), Bag of Words (BoW), and Doc2Vec. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are applied to reduce the dimensionality of these features. Several supervised classifiers, namely Decision Tree (DT), Naïve Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM), are used to classify opinions as deceptive or truthful. The features used by these supervised classifiers cannot retain the sequential information in reviews. To overcome this problem, we use the Words Attention-based Bidirectional Long Short-Term Memory (WABiLSTM) network model, which is trained to learn patterns of words. The article examines machine learning and deep learning-based spam detection models and provides their outline and results. Accuracy, precision, recall, and F-Measure are used to analyze the performance of these classification models. The experimental results show that model performance improved after feature reduction.
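The four evaluation metrics named above follow directly from the counts of true and false positives and negatives. The following is a minimal sketch, assuming that the deceptive class is treated as the positive class and that labels are plain strings; the names `Metrics` and `evaluate` and the label values are illustrative assumptions and are not taken from the article.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    accuracy: float
    precision: float
    recall: float
    f_measure: float


def evaluate(y_true, y_pred, positive="deceptive"):
    """Compute accuracy, precision, recall and F-measure for one positive class.

    Assumption: `positive` names the class treated as positive; every other
    label counts as negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    total = len(pairs)
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return Metrics(accuracy, precision, recall, f_measure)


if __name__ == "__main__":
    # Toy labels for illustration only; these are not results from the article.
    truth = ["deceptive", "deceptive", "truthful", "truthful", "deceptive"]
    preds = ["deceptive", "truthful", "truthful", "deceptive", "deceptive"]
    print(evaluate(truth, preds))
```

Here the F-measure is the harmonic mean of precision and recall (often written F1); whether the article weights the two equally is an assumption of this sketch.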
Funding
No funding.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Human and animals participants
This article does not involve human participants or animals.
Informed consent
There is no plagiarism.
About this article
Cite this article
Maurya, S.K., Singh, D. & Maurya, A.K. Deceptive opinion spam detection using feature reduction techniques. Int J Syst Assur Eng Manag 15, 1210–1230 (2024). https://doi.org/10.1007/s13198-023-02208-4
DOI: https://doi.org/10.1007/s13198-023-02208-4