Opinion spam detection framework using hybrid classification scheme

  • Methodologies and Application
Soft Computing

Abstract

With the advent of social networking sites, opinion-mining applications have attracted the interest of online users, who consult review sites to learn about products before making purchase decisions. However, because of the increasing trend of posting spam (fake) reviews to promote target products or defame competitors' brands, Opinion Spam detection and classification has emerged as a hot issue in the opinion mining and sentiment analysis community. We investigate Opinion Spam detection using different combinations of entities, features, and their sentiment scores. We enrich the feature set of a baseline Spam detection method with Spam detection features (Opinion Spam, Opinion Spammer, Item Spam). Using a dataset of reviews from the Amazon site with sentences labeled for Spam detection, we evaluate the role of spamicity-related features in detecting and classifying spam (fake) clues and distinguishing them from genuine reviews. For this purpose, we introduce a rule-based feature weighting scheme and propose a method for tagging each review sentence as spam or non-spam. Experimental results show that spam-related features improve Spam detection in review sentences posted on product review sites. Adding a revised feature weighting scheme increased accuracy from 93 to 96%. Furthermore, a hybrid set of features is shown to improve the performance of Opinion Spam detection in terms of precision, recall, and F-measure. This work shows that combining spam-related features with a rule-based weighting scheme can improve the performance of even a baseline Spam detection method. This improvement can be of use to Opinion Spam detection systems, given the growing interest of individuals and companies in separating fake (spam) from genuine (non-spam) reviews about products.
The outcome of this work provides insight into spam-related features and feature weighting and will assist in developing more advanced applications for Opinion Spam detection. Previous state-of-the-art studies in Opinion Spam detection used fewer spamicity-related features and less efficient feature weighting schemes. In contrast, we provide a revised feature selection and a revised feature weighting scheme with a normalized spamicity score computation technique. Our contribution is therefore novel to the field, as it provides a significant improvement over the compared methods.
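
The abstract does not give the weighting rules or the normalization formula, so the following is only a rough sketch of how a weighted, normalized spamicity score might be used to tag a review sentence. The weights, the min-max normalization, the 0.5 threshold, and all function and feature names are assumptions made for illustration, not the authors' method.

```python
from typing import Dict, Tuple

# Hypothetical weights for the three feature groups named in the abstract.
# The paper's actual rule-based weights are not stated there.
WEIGHTS: Dict[str, float] = {
    "opinion_spam": 0.5,
    "opinion_spammer": 0.3,
    "item_spam": 0.2,
}


def min_max(value: float, lo: float, hi: float) -> float:
    """Scale value into [0, 1] (assumed normalization; clips out-of-range input)."""
    if hi <= lo:
        return 0.0  # degenerate range: treat the feature as uninformative
    clipped = min(max(value, lo), hi)
    return (clipped - lo) / (hi - lo)


def spamicity(
    features: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    weights: Dict[str, float] = WEIGHTS,
) -> float:
    """Weighted average of normalized feature values, in [0, 1]."""
    total_weight = sum(weights.values())
    weighted = sum(
        weights[name] * min_max(features[name], *ranges[name])
        for name in weights
    )
    return weighted / total_weight


def tag(
    features: Dict[str, float],
    ranges: Dict[str, Tuple[float, float]],
    threshold: float = 0.5,  # hypothetical cut-off
) -> str:
    """Label a review sentence as 'spam' or 'non-spam'."""
    return "spam" if spamicity(features, ranges) >= threshold else "non-spam"


if __name__ == "__main__":
    ranges = {name: (0.0, 10.0) for name in WEIGHTS}
    sentence = {"opinion_spam": 8.0, "opinion_spammer": 6.0, "item_spam": 1.0}
    print(spamicity(sentence, ranges), tag(sentence, ranges))
```

Dividing by the total weight keeps the score in [0, 1] even if the weights do not sum to one, which makes a single threshold easier to interpret across feature sets.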

Author information

Corresponding author

Correspondence to Muhammad Zubair Asghar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (RAR 135 kb)

About this article

Cite this article

Asghar, M.Z., Ullah, A., Ahmad, S. et al. Opinion spam detection framework using hybrid classification scheme. Soft Comput 24, 3475–3498 (2020). https://doi.org/10.1007/s00500-019-04107-y

  • DOI: https://doi.org/10.1007/s00500-019-04107-y
