Deceptive opinion spam detection using feature reduction techniques

  • ORIGINAL ARTICLE
International Journal of System Assurance Engineering and Management

Abstract

People usually read online reviews before purchasing a product. Sellers sometimes write deceptive reviews that imitate genuine user experience in order to increase profits. Detecting such deceptive opinion spam has emerged as a challenging task in the field of opinion mining. Feature reduction techniques play a key role in data mining: they identify the essential features and remove unnecessary dimensions that contribute only noise. This article extracts various textual features from gold-standard deceptive hotel reviews using different representation techniques such as Part-of-Speech tags (POS tags), Bag of Words (BoW), and Doc2Vec. Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) are applied to reduce the dimensionality of the features. Various supervised classifiers such as Decision Tree (DT), Naïve Bayes (NB), Logistic Regression (LR), and Support Vector Machine (SVM) are used to classify opinions as deceptive or truthful. The features used by these supervised classifiers cannot retain sequential information from reviews. To overcome this problem, we use the Words Attention-based Bidirectional Long Short-Term Memory (WABiLSTM) network model, which is trained to learn patterns in word sequences. The article examines machine learning and deep learning-based spam detection models and provides their outline and results. Metrics such as accuracy, precision, recall, and F-measure are used to analyze the performance of these classification models. The experimental results show that model performance improved after the features were reduced.
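The four evaluation metrics named in the abstract can be illustrated with a minimal sketch. The function name `binary_metrics`, the label encoding (1 = deceptive, 0 = truthful), and the sample labels below are hypothetical and are not taken from the article; the formulas are the standard definitions for a binary classifier.

```python
from typing import Dict, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F-measure for one positive class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("at least one label is required")

    pairs = list(zip(y_true, y_pred))
    # Confusion-matrix counts with respect to the positive (deceptive) class.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Return 0.0 instead of dividing by zero when a denominator is empty.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Hypothetical labels for eight reviews: 1 = deceptive, 0 = truthful.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]
scores = binary_metrics(y_true, y_pred)
print(scores)
```

In this toy case one deceptive review is missed and one truthful review is flagged, so all four metrics come out equal; on real data they generally differ, which is why reporting them together is informative.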

Funding

No funding.

Author information

Corresponding author

Correspondence to Sushil Kumar Maurya.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Human and animal participants

This article does not involve human participants or animals.

Informed consent

There is no plagiarism.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Maurya, S.K., Singh, D. & Maurya, A.K. Deceptive opinion spam detection using feature reduction techniques. Int J Syst Assur Eng Manag 15, 1210–1230 (2024). https://doi.org/10.1007/s13198-023-02208-4

  • DOI: https://doi.org/10.1007/s13198-023-02208-4
