Abstract
Online reviews play an essential role in customers' decision-making. Websites such as Amazon, Yelp, Google Plus, BookMyShow, Facebook, and Twitter allow their users to generate large volumes of data in the form of feedback, reviews, comments, or tweets. This data helps organizations improve the quality of their products. Because so much depends on these reviews, some organizations and individuals deliberately generate spam reviews to promote or demote the standing of a product, organization, or person. Telling spam from non-spam reviews by eye is nearly impossible, and classifying reviews manually is largely guesswork. To address this problem, this paper proposes a hybrid Grey Wolf Optimizer based clustering method (GWOK) to identify spam reviews. In GWOK, the k-Means algorithm initializes the population of the basic GWO algorithm, and GWO then searches for the optimal cluster heads. To evaluate the proposed strategy, three spam datasets are used: Synthetic Spam Reviews, Movie Reviews, and Yelp Hotel & Restaurant Reviews. The results are compared with existing state-of-the-art metaheuristic clustering methods, namely genetic algorithm (GA), differential evolution (DE), particle swarm optimization (PSO), and cuckoo search (CS), as well as k-Means. Experimental and statistical analysis indicates that the proposed GWOK algorithm outperforms these techniques.
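The two-stage idea described in the abstract (k-Means seeding followed by a GWO search over cluster heads) can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract gives no details, so the following are assumptions made only for illustration: each wolf encodes one full set of k centroids, each wolf is seeded by a short independent k-Means run, the fitness is the sum of squared distances to the nearest centroid, and the position update follows the standard GWO rule of Mirjalili et al. (2014). The names `sse`, `kmeans`, and `gwok` and all parameter values are hypothetical.

```python
import numpy as np


def sse(centroids, X):
    """Sum of squared distances from each point to its nearest centroid."""
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d.min(axis=1).sum()


def kmeans(X, k, rng, iters=5):
    """A few Lloyd iterations started from random data points."""
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids


def gwok(X, k, n_wolves=10, n_iter=50, seed=0):
    """Sketch of k-Means-seeded GWO clustering; needs n_wolves >= 3."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Assumption: every wolf starts as the centroid set of one short k-Means run.
    wolves = np.stack([kmeans(X, k, rng) for _ in range(n_wolves)])
    fitness = np.array([sse(w, X) for w in wolves])
    best, best_fit = wolves[fitness.argmin()].copy(), fitness.min()
    for t in range(n_iter):
        # The three fittest wolves (alpha, beta, delta) guide the pack.
        leaders = wolves[np.argsort(fitness)[:3]].copy()
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 towards 0
        moved = np.zeros_like(wolves)
        for leader in leaders:
            A = 2 * a * rng.random(wolves.shape) - a
            C = 2 * rng.random(wolves.shape)
            D = np.abs(C * leader - wolves)
            moved += leader - A * D
        wolves = moved / 3.0  # average of the three leader-guided positions
        fitness = np.array([sse(w, X) for w in wolves])
        if fitness.min() < best_fit:  # remember the best centroid set seen so far
            best, best_fit = wolves[fitness.argmin()].copy(), fitness.min()
    return best, best_fit
```

Calling `gwok(X, k=2)` on a numeric feature matrix `X` returns a `(k, n_features)` array of cluster heads together with its fitness value. How reviews are turned into feature vectors is not described in the abstract and is left outside this sketch.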
Funding
We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
About this article
Cite this article
Shringi, S., Sharma, H. Detection of spam reviews using hybrid grey wolf optimizer clustering method. Multimed Tools Appl 81, 38623–38641 (2022). https://doi.org/10.1007/s11042-022-12848-6
DOI: https://doi.org/10.1007/s11042-022-12848-6