Spam review detection using spiral cuckoo search clustering method

  • Research Paper
  • Evolutionary Intelligence

Abstract

Online reviews now play an important role in customers' decisions. From buying a shirt on an e-commerce site to dining in a restaurant, online reviews have become a basis for selection. Because people are busy and lack the time to examine the details of products and services themselves, dependence on online reviews has grown. Exploiting this reliance, some people and organizations deliberately generate spam reviews to promote or demote the reputation of a person, product, or organization. It is impossible to tell by the naked eye whether a review is spam or ham, and classifying all reviews manually is impractical. Therefore, a spiral cuckoo search based clustering method is introduced to discover spam reviews. The proposed method combines the strengths of cuckoo search with the Fermat spiral to resolve the convergence issue of the cuckoo search method. Its efficiency has been tested on four spam datasets and one Twitter spammer dataset. To validate its efficacy, the proposed clustering method is compared with six existing clustering methods: particle swarm optimization, differential evolution, genetic algorithm, cuckoo search, K-means, and improved cuckoo search. The experimental results and statistical analysis show that the proposed method outperforms the existing methods.
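The abstract does not state the update rule, so the following is only an illustrative sketch of how a cuckoo-search-style clustering loop might use a Fermat spiral (r = a·√θ) to shrink its search step over time. The function names, the shrinking angle schedule, the 0.1 exploration factor, the parameter defaults, and the sum-of-squared-distances objective are all assumptions made for illustration, not the authors' method. Standard cuckoo search generates new solutions with Lévy flights; this sketch swaps in a spiral-based step instead.

```python
import math
import random


def fermat_spiral_point(theta, a=1.0):
    """Cartesian point on Fermat's spiral, r = a * sqrt(theta)."""
    r = a * math.sqrt(theta)
    return (r * math.cos(theta), r * math.sin(theta))


def sse(centroids, data):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(
        min(sum((x - c) ** 2 for x, c in zip(point, centroid))
            for centroid in centroids)
        for point in data
    )


def spiral_cuckoo_cluster(data, k, n_nests=15, n_iter=100, pa=0.25, seed=0):
    """Search for k centroids; returns (centroids, best-SSE history)."""
    rng = random.Random(seed)
    dim = len(data[0])
    lo = [min(p[d] for p in data) for d in range(dim)]
    hi = [max(p[d] for p in data) for d in range(dim)]

    def random_nest():
        # A nest is one candidate solution: a list of k centroids.
        return [[rng.uniform(lo[d], hi[d]) for d in range(dim)]
                for _ in range(k)]

    nests = [random_nest() for _ in range(n_nests)]
    fit = [sse(n, data) for n in nests]
    history = []
    theta_max = 4 * math.pi  # hypothetical choice: two turns of the spiral
    # Never abandon every nest, so the current best always survives.
    n_drop = min(int(pa * n_nests), n_nests - 1)

    for t in range(n_iter):
        best = nests[min(range(n_nests), key=fit.__getitem__)]
        # Assumed schedule: the angle range, and with it the spiral
        # radius, shrinks toward zero as iterations proceed.
        shrink = 1.0 - t / n_iter
        for i in range(n_nests):
            theta = rng.uniform(0.0, theta_max * shrink)
            # a is chosen so the radius stays within [0, 1].
            sx, sy = fermat_spiral_point(theta, a=1.0 / math.sqrt(theta_max))
            # sx moves the nest relative to the best nest; sy scales
            # a random exploration term. Results are clamped to the data range.
            cand = [
                [min(hi[d], max(lo[d],
                     c[d] + sx * (b[d] - c[d])
                     + sy * 0.1 * (hi[d] - lo[d]) * rng.gauss(0.0, 1.0)))
                 for d in range(dim)]
                for c, b in zip(nests[i], best)
            ]
            f = sse(cand, data)
            if f < fit[i]:  # greedy: keep the new egg only if it is better
                nests[i], fit[i] = cand, f
        # Abandon the n_drop worst nests and rebuild them at random.
        order = sorted(range(n_nests), key=fit.__getitem__)
        for i in order[n_nests - n_drop:]:
            nests[i] = random_nest()
            fit[i] = sse(nests[i], data)
        history.append(min(fit))

    best_i = min(range(n_nests), key=fit.__getitem__)
    return nests[best_i], history
```

In a spam-review setting, each row of `data` would presumably be a numeric feature vector extracted from one review, with `k=2` for the spam and ham clusters. Because replacement is greedy and the best nest is never abandoned, the recorded best objective value cannot increase from one iteration to the next, which makes the convergence behaviour easy to inspect.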

Author information

Corresponding author

Correspondence to Avinash Chandra Pandey.

About this article

Cite this article

Pandey, A.C., Rajpoot, D.S. Spam review detection using spiral cuckoo search clustering method. Evol. Intel. 12, 147–164 (2019). https://doi.org/10.1007/s12065-019-00204-x

  • DOI: https://doi.org/10.1007/s12065-019-00204-x
