Opinion spam detection framework using hybrid classification scheme

Asghar, Muhammad Zubair; Ullah, Asmat; Ahmad, Shakeel; Khan, Aurangzeb

doi:10.1007/s00500-019-04107-y

Methodologies and Application
Published: 11 June 2019

Soft Computing, Volume 24, pages 3475–3498 (2020)

Muhammad Zubair Asghar (ORCID: orcid.org/0000-0003-3320-2074)¹, Asmat Ullah¹, Shakeel Ahmad² & Aurangzeb Khan³

Abstract

With the advent of social networking sites, opinion-mining applications have attracted the interest of the online community, who consult review sites to learn about products before making purchase decisions. However, because of the growing trend of posting spam (fake) reviews to promote target products or to defame competitors' brands, opinion spam detection and classification has emerged as a pressing issue in opinion mining and sentiment analysis. We investigate opinion spam detection using different combinations of entities, features, and their sentiment scores. We enrich the feature set of a baseline spam detection method with spam detection features (Opinion Spam, Opinion Spammer, and Item Spam). Using a dataset of Amazon reviews and sentences labeled for spam detection, we evaluate the role of spamicity-related features in detecting and classifying spam (fake) clues and in distinguishing them from genuine reviews. For this purpose, we introduce a rule-based feature weighting scheme and propose a method for tagging review sentences as spam or non-spam. Experimental results show that spam-related features improve spam detection in review sentences posted on product review sites. Adding a revised feature weighting scheme increased accuracy from 93 to 96%. Furthermore, a hybrid set of features is shown to improve opinion spam detection in terms of precision, recall, and F-measure. This work shows that combining spam-related features with a rule-based weighting scheme can improve the performance of even a baseline spam detection method. This improvement can benefit opinion spam detection systems, given the growing interest of individuals and companies in separating fake (spam) reviews from genuine (non-spam) ones.
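The abstract reports results as accuracy, precision, recall, and F-measure. As a hedged illustration of how these scores are conventionally computed for a binary spam/non-spam tagging task, the Python sketch below treats "spam" as the positive class. The function name `spam_detection_metrics` and the label strings are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def spam_detection_metrics(gold: Sequence[str], predicted: Sequence[str],
                           positive: str = "spam") -> dict[str, float]:
    """Accuracy, precision, recall and F-measure, with `positive` as the target class."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    pairs = list(zip(gold, predicted))
    # True positives: spam reviews correctly tagged as spam.
    tp = sum(g == positive and p == positive for g, p in pairs)
    # False positives: genuine reviews wrongly tagged as spam.
    fp = sum(g != positive and p == positive for g, p in pairs)
    # False negatives: spam reviews that slipped through as non-spam.
    fn = sum(g == positive and p != positive for g, p in pairs)
    correct = sum(g == p for g, p in pairs)

    accuracy = correct / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-measure is the harmonic mean of precision and recall.
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}
```

For example, with gold labels `["spam", "spam", "spam", "non-spam", "non-spam"]` and predictions `["spam", "spam", "non-spam", "spam", "non-spam"]`, there are two true positives, one false positive and one false negative, so precision, recall and F-measure are each 2/3, while accuracy is 3/5.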
The outcome of this work will provide insight into spam-related features and feature weighting and will assist in developing more advanced applications for opinion spam detection. Previous state-of-the-art studies in opinion spam detection used fewer spamicity-related features and a less efficient feature weighting scheme. In contrast, we provide a revised feature selection and a revised feature weighting scheme with a normalized spamicity score computation technique. Our contribution is therefore novel, because it provides a significant improvement over the compared methods.
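The abstract mentions a rule-based feature weighting scheme with normalized spamicity scores but does not give the rules, features, or weights. The Python sketch below shows one common way such a scheme could work, under loudly stated assumptions: each feature is min-max normalized to [0, 1] across the review set, the normalized values are combined as a weighted average, and a review is tagged as spam when its score reaches a threshold. The feature names (`content_similarity`, `rating_deviation`, `review_burst`), their weights, and the 0.5 threshold are hypothetical placeholders, not the authors' values.

```python
from typing import Mapping, Sequence

# HYPOTHETICAL features and weights, one loosely per spam type named in the
# abstract; the paper's actual feature set and rule weights are not given there.
WEIGHTS: dict[str, float] = {
    "content_similarity": 0.4,  # overlap with other reviews (Opinion Spam cue)
    "rating_deviation": 0.3,    # distance from the item's mean rating (Opinion Spammer cue)
    "review_burst": 0.3,        # share of the item's reviews in a short window (Item Spam cue)
}


def min_max(values: Sequence[float]) -> list[float]:
    """Rescale values to [0, 1]; a constant column maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def spamicity_scores(reviews: Sequence[Mapping[str, float]],
                     weights: Mapping[str, float] = WEIGHTS) -> list[float]:
    """Weighted average of min-max normalized features, so every score lies in [0, 1]."""
    if not reviews:
        return []
    normalized = {name: min_max([r[name] for r in reviews]) for name in weights}
    total = sum(weights.values())
    return [sum(weights[name] * normalized[name][i] for name in weights) / total
            for i in range(len(reviews))]


def tag_reviews(reviews: Sequence[Mapping[str, float]],
                threshold: float = 0.5) -> list[str]:
    """Tag a review 'spam' when its normalized spamicity reaches the (assumed) threshold."""
    return ["spam" if score >= threshold else "non-spam"
            for score in spamicity_scores(reviews)]


reviews = [
    {"content_similarity": 0.9, "rating_deviation": 3.5, "review_burst": 0.8},
    {"content_similarity": 0.1, "rating_deviation": 0.5, "review_burst": 0.1},
    {"content_similarity": 0.5, "rating_deviation": 2.0, "review_burst": 0.2},
]
print(tag_reviews(reviews))  # ['spam', 'non-spam', 'non-spam']
```

Normalizing before weighting keeps a feature measured on a wide scale (such as a rating difference of several stars) from dominating features that are already ratios; the third review scores about 0.39 and therefore falls below the assumed threshold.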

Similar content being viewed by others

A review of spam email detection: analysis of spammer strategies and the dataset shift problem (Article, Open access, 11 May 2022)

Sentiment analysis: A survey on design framework, applications and future scopes (Article, 20 March 2023)

A novel feature and class-based globalization technique for text classification (Article, 25 April 2023)

Author information

Authors and Affiliations

Institute of Computing and Information Technology, Gomal University, D.I.Khan, KP, Pakistan
Muhammad Zubair Asghar & Asmat Ullah
Faculty of Computing and Information Technology at Rabigh (FCITR), King Abdul Aziz University (KAU), Jeddah, Kingdom of Saudi Arabia
Shakeel Ahmad
Department of Computer Science, University of Science and Technology, Bannu, KP, Pakistan
Aurangzeb Khan

Corresponding author

Correspondence to Muhammad Zubair Asghar.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Additional information

Communicated by V. Loia.

Electronic supplementary material

Supplementary material 1 (RAR 135 kb)

About this article

Cite this article

Asghar, M.Z., Ullah, A., Ahmad, S. et al. Opinion spam detection framework using hybrid classification scheme. Soft Comput 24, 3475–3498 (2020). https://doi.org/10.1007/s00500-019-04107-y

Published: 11 June 2019
Issue Date: March 2020
DOI: https://doi.org/10.1007/s00500-019-04107-y
