Abstract
Nowadays, online reviews play an important role in customer’s decision. Starting from buying a shirt from an e-commerce site to dining in a restaurant, online reviews has become a basis of selection. However, peoples are always in a hustle and bustle since they don’t have time to pay attention to the intrinsic details of products and services, thus the dependency on online reviews have been hiked. Due to reliance on online reviews, some people and organizations pompously generate spam reviews in order to promote or demote the reputation of a person/product/organization. Thus, it is impossible to identify whether a review is a spam or a ham by the naked eye and it is also impractical to classify all the reviews manually. Therefore, a spiral cuckoo search based clustering method has been introduced to discover spam reviews. The proposed method uses the strength of cuckoo search and Fermat spiral to resolve the convergence issue of cuckoo search method. The efficiency of the proposed method has been tested on four spam datasets and one Twitter spammer dataset. To validate the efficacy of proposed clustering method it is compared with six metaheuristics clustering methods namely; particle swarm optimization, differential evolution, genetic algorithm, cuckoo search, K-means, and improved cuckoo search. The experimental results and statistical analysis validate that the proposed method outruns the existing methods.
Similar content being viewed by others
References
Lackermair G, Kailer D, Kanmaz K (2013) Importance of online product reviews from a consumer’s perspective. Adv Econ Bus 1:1–5
Dixit S, Agrawal A (2013) Survey on review spam detection. Int J Comput Commun Technol ISSN 4:0975–7449
Shojaee S, Murad MAA, Azman AB, Sharef NM, Nadali S (2013) Detecting deceptive reviews using lexical and syntactic features. In: Intelligent systems design and applications (ISDA), 2013 13th international conference on, IEEE, pp 53–58
Rosso P, Cagnina LC (2017) Deception detection and opinion spam. In: A practical guide to sentiment analysis, Springer, New York, pp 155–171
Heredia B, Khoshgoftaar TM, Prusa JD, Crawford M (2017) Improving detection of untrustworthy online reviews using ensemble learners combined with feature selection. Soc Netw Anal Min 7(1):37
Ott M, Choi Y, Cardie C, Hancock JT (2011) Finding deceptive opinion spam by any stretch of the imagination. In: Proceedings of the 49th annual meeting of the association for computational linguistics: human language technologies, Vol 1, association for computational linguistics, pp 309–319
Jindal N, Liu B, Lim E-P (2010) Finding unusual review patterns using unexpected rules. In: Proceedings of the 19th ACM international conference on information and knowledge management, ACM, pp 1549–1552
Li F, Huang M, Yang Y, Zhu X (2011) Learning to identify review spam. In: IJCAI proceedings of international joint conference on artificial intelligence, vol 22, p 2488
Cheng L-C, Tseng JC, Chung T-Y (2017) Case study of fake web reviews. In: Proceedings of the 2017 IEEE/ACM international conference on advances in social networks analysis and mining 2017, ACM, pp 706–709
Munzel A (2016) Assisting consumers in detecting fake reviews: the role of identity information disclosure and consensus. J Retail Consumer Serv 32:96–108
Narayan R, Rout JK, Jena SK (2018) Review spam detection using opinion mining. In: Progress in intelligent computing techniques: theory, practice, and applications, Springer, New York, pp 273–279
Petrescu M, O’Leary K, Goldring D, Mrad SB (2018) Incentivized reviews: promising the moon for a few stars. J Retail Consumer Serv
Luca M, Zervas G (2016) Fake it till you make it: reputation, competition, and yelp review fraud. Manag Sci 62(12):3412–3427
Gieseke F, Kramer O, Airola A, Pahikkala T (2012) Efficient recurrent local search strategies for semi-and unsupervised regularized least-squares classification. Evolut Intell 5(3):189–205
Behdad M, Barone L, French T, Bennamoun M (2012) On XCSR for electronic fraud detection. Evolut Intell 5(2):139–150
Mani S, Kumari S, Jain A, Kumar P (2018) Spam review detection using ensemble machine learning. In: International conference on machine learning and data mining in pattern recognition, Springer, New York, pp 198–209
Ghai R, Kumar S, Pandey AC (2019) Spam detection using rating and review processing method, smart innovations in communication and computational sciences. Springer, Singapore, pp 189–198
Heydari A, Tavakoli M, Salim N (2016) Detection of fake opinions using time series. Expert Syst Appl 58:83–92
Liu Y, Pang B (2018) A unified framework for detecting author spamicity by modeling review deviation. Exp Syst Appl 112:148–155
Li C, Liu S (2018) A comparative study of the class imbalance problem in twitter spam detection. Concurr Comput Pract Exp 30(5):e4281
Hu Y-H, Chen Y-L, Chou H-L (2017) Opinion mining from online hotel reviews-A text summarization approach. Inf Process Manag 53(2):436–449
Hai Z, Zhao P, Cheng P, Yang P, Li X-L, Li G (2016) Deceptive review spam detection via exploiting task relatedness and unlabeled data. In: Proceedings of the 2016 conference on empirical methods in natural language processing, pp 1817–1826
Mateen M, Iqbal MA, Aleem M, Islam MA (2017) A hybrid approach for spam detection for twitter. In: Applied sciences and technology (IBCAST), 2017 14th international Bhurban conference on, IEEE, pp 466–471
Vishwarupe V, Bedekar M, Pande M, Hiwale A (2018) Intelligent twitter spam detection: a hybrid approach. In: Smart trends in systems, security and sustainability, Springer, New York, pp 189–197
Sedhai S, Sun A (2018) Semi-supervised spam detection in twitter stream. arXiv:1702.01032
Chen C, Wang Y, Zhang J, Xiang Y, Zhou W, Min G (2017) Statistical features-based real-time detection of drifted twitter spam. IEEE Trans Inf Forensics Secur 12(4):914–925
Wu T, Wen S, Xiang Y, Zhou W (2018) Twitter spam detection: survey of new approaches and comparative study. Comput Secur 76:265–284
Singh S, Singh AK (2018) Web-spam features selection using cfs-pso. Proc Comput Sci 125:568–575
Li Y, Nie X, Huang R (2018) Web spam classification method based on deep belief networks. Expert Syst Appl 96:261–270
Singh A, Batra S (2018) Ensemble based spam detection in social iot using probabilistic data structures. Fut Gen Comput Syst 81:359–371
Wei Y, Singh L (2018) Detecting users who share extremist content on twitter. In: Surveillance in Action, Springer, New York, pp 351–368
Bindu P, Mishra R, Thilagam PS (2018) Discovering spammer communities in twitter. J Intell Inf Syst, pp 1–25
Liu S, Zhang J, Xiang Y (2016) Statistical detection of online drifting twitter spam. In: Proceedings of the 11th ACM on Asia conference on computer and communications security, ACM, pp 1–10
Inuwa-Dutse I, Liptrott M, Korkontzelos I (2018) Detection of spam-posting accounts on Twitter. Neurocomputing 315:496–511
Miller Z, Dickinson B, Deitrick W, Hu W, Wang AH (2014) Twitter spammer detection using data stream clustering. Inf Sci 260:64–73
Singh M, Kumar L, Sinha S (2018) Model for detecting fake or spam reviews. In: ICT based innovations, Springer, New York, pp 213–217
Narayan R, Rout JK, Jena SK (2018) Review spam detection using semi-supervised technique. In: Progress in intelligent computing techniques: theory, practice, and applications, Springer, New York, pp 281–286
Salehi S, Selamat A, Bostanian M (2011) Enhanced genetic algorithm for spam detection in email. In: Software engineering and service science (ICSESS), 2011 IEEE 2nd international conference on, IEEE, pp 594–597
Idris I, Selamat A, Omatu S (2014) Hybrid email spam detection model with negative selection algorithm and differential evolution. Eng Appl Artif Intell 28:97–110
Storn R, Price K (1997) Differential evolution-a simple and efficient heuristic for global optimization over continuous spaces. J Glob Optim 11:341–359
Kennedy J, Eberhart R (1995) Particle swarm optimization. Neural Netw 4:1942–1948
Idris I, Selamat A, Nguyen NT, Omatu S, Krejcar O, Kuca K, Penhaker M (2015) A combined negative selection algorithm-particle swarm optimization for an email spam detection system. Eng Appl Artif Intell 39:33–44
Pereira FB, Marques JMC (2009) A study on diversity for cluster geometry optimization. Evolut Intell 2(3):121
Simon D (2008) Biogeography-based optimization. IEEE Trans Evolut Comput 12(6):702–713
Maulik U, Bandyopadhyay S (2000) Genetic algorithm-based clustering technique. Pattern Recognit 33:1455–1465
Žalik KR (2008) An efficient k’-means clustering algorithm. Pattern Recognit Lett 29:1385–1391
Yang X-S, Deb S (2009) Cuckoo search via lévy flights. In: World congress on nature and biologically inspired computing, IEEE, pp 210–214
Pandey AC, Rajpoot DS, Saraswat M (2016) Data clustering using hybrid improved cuckoo search method. In: Contemporary Computing (IC3), 2016 9th international conference on, IEEE, pp 1–6
Pandey AC, Rajpoot DS, Saraswat M (2017) Twitter sentiment analysis using hybrid cuckoo search method. Inf Process Manag 53(4):764–779
Pandey AC, Rajpoot DS, Saraswat M (2017) Hybrid step size based cuckoo search. In: Contemporary computing (IC3), 2017 10th international conference on, IEEE, pp 1-6
Pavlyukevich I (2007) Lévy flights, non-local search and simulated annealing. J Comput Phys 226(2):1830–1844
Payne RB, Sorensen MD (2005) The cuckoos, vol 15. Oxford University Press, Oxford
Kulhari A, Pandey A, Pal R, Mittal H (2016) Unsupervised data classification using modified cuckoo search method. In: Contemporary computing (IC3), 2016 9th international conference on, IEEE, pp 1–5
Bird S, Klein E, Loper E (2009) Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc., Newton
Pennebaker JW, Boyd RL, Jordan K, Blackburn K (2015) The development and psychometric properties of liwc2015, Tech. rep
Tran CT, Zhang M, Andreae P, Xue B (2016) Improving performance for classification with incomplete data using wrapper-based feature selection. Evolut Intell 9(3):81–94
Mafarja MM, Mirjalili S (2017) Hybrid whale optimization algorithm with simulated annealing for feature selection. Neurocomputing 260:302–312
Roessler EB, Alder HL (1977) Introduction to probability and statistics. WH Freeman
Saraswat M, Arya K, Sharma H (2013) Leukocyte segmentation in tissue images using differential evolution algorithm. Swarm Evolut Comput 11:46–54
Hatamlou A (2013) Black hole: a new heuristic optimization approach for data clustering. Inf Sci 222:175–184
Chawla NV, Japkowicz N, Kotcz A (2004) Special issue on learning from imbalanced data sets. ACM Sigkdd Explor Newsl 6(1):1–6
Wang H, Lu Y, Zhai C (2010) Latent aspect rating analysis on review text data: a rating regression approach. In: Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 783–792
Sun H, Morales A, Yan X (2013) Synthetic review spamming and defense. In: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, pp 1088–1096
Mukherjee A, Venkataraman V, Liu B, Glance NS (2013) What yelp fake review filter might be doing? In: ICWSM, pp 409–418
Mukherjee A, Venkataraman V, Liu B, Glance N (2013) Fake review detection: classification and analysis of real and pseudo reviews. Technical Report UIC-CS-2013–03, University of Illinois at Chicago, Tech. Rep
Pandey AC, Pal R, Kulhari A (2018) Unsupervised data classification using improved biogeography based optimization. Int J Syst Assur Eng Manag 9(4):821–829
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Pandey, A.C., Rajpoot, D.S. Spam review detection using spiral cuckoo search clustering method. Evol. Intel. 12, 147–164 (2019). https://doi.org/10.1007/s12065-019-00204-x
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s12065-019-00204-x