Detection of spam reviews using hybrid grey wolf optimizer clustering method

Multimedia Tools and Applications

Abstract

Online reviews now play an essential role in customers' decision-making. Websites such as Amazon, Yelp, Google Plus, BookMyShow, Facebook, and Twitter allow their users to generate large volumes of data in the form of feedback, reviews, comments, or tweets. This data helps organizations improve the quality of their products. Because of this dependence on online reviews, some organizations and individuals deliberately post spam reviews to promote or demote a product, organization, or person. Telling spam from non-spam reviews by eye is nearly impossible, and classifying reviews manually is unreliable. To overcome this issue, this paper proposes a hybrid Grey Wolf Optimizer (GWOK) based clustering method to identify spam reviews. In the proposed GWOK, the k-Means algorithm initializes the population of the basic GWO algorithm, and GWO then searches for the optimal cluster heads. To evaluate the proposed strategy, three spam datasets are used: Synthetic Spam Reviews, Movie Reviews, and Yelp Hotel & Restaurant Reviews. The results are compared with existing state-of-the-art metaheuristic clustering methods, namely genetic algorithm (GA), differential evolution (DE), particle swarm optimization (PSO), and cuckoo search (CS), as well as k-Means. Experimental and statistical analysis indicates that the proposed GWOK algorithm outperforms these techniques.
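The two-stage idea described in the abstract (k-Means seeds the population, GWO then refines the cluster heads) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the fitness function, the solution encoding, or any parameter values. The sketch assumes that reviews have already been converted to numeric feature vectors, that each wolf is a flat vector of centroid coordinates, that fitness is the sum of squared distances to the nearest centroid, and that the standard GWO position update is used. The names `gwok`, `sse`, `n_wolves`, and `n_iter` are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sse(centroids, points):
    """Assumed fitness: total squared distance of each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def kmeans(points, k, rng, iters=10):
    """Plain Lloyd's k-means from a random start; returns k centroids."""
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            groups[nearest].append(p)
        for j, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(group) for col in zip(*group)]
    return centroids


def gwok(points, k, n_wolves=8, n_iter=30, seed=0):
    """Hypothetical GWOK sketch: k-means-seeded wolves refined by standard GWO.

    Requires n_wolves >= 3 so that alpha, beta, and delta leaders exist.
    """
    rng = random.Random(seed)
    dim = len(points[0])

    def unflatten(wolf):
        return [wolf[i * dim:(i + 1) * dim] for i in range(k)]

    def fitness(wolf):
        return sse(unflatten(wolf), points)

    # Stage 1: every wolf starts from the result of an independent k-means run.
    wolves = [[x for c in kmeans(points, k, rng) for x in c]
              for _ in range(n_wolves)]
    best = min(wolves, key=fitness)

    # Stage 2: standard GWO update guided by the three best wolves.
    for t in range(n_iter):
        a = 2.0 * (1 - t / n_iter)  # decreases linearly from 2 toward 0
        leaders = [list(w) for w in sorted(wolves, key=fitness)[:3]]
        moved = []
        for wolf in wolves:
            position = []
            for d, x in enumerate(wolf):
                estimate = 0.0
                for leader in leaders:
                    A = 2 * a * rng.random() - a
                    C = 2 * rng.random()
                    estimate += leader[d] - A * abs(C * leader[d] - x)
                position.append(estimate / 3)  # mean of the three leader estimates
            moved.append(position)
        wolves = moved
        candidate = min(wolves, key=fitness)
        if fitness(candidate) < fitness(best):
            best = candidate  # remember the best cluster heads seen so far
    return unflatten(best), fitness(best)
```

Calling `gwok(points, k=2)` on a list of feature vectors returns the best centroids found and their fitness. Because the best solution is tracked across iterations, the result is never worse than the best k-Means seed. One simplification to be aware of: wolves seeded by different k-Means runs may list the same centroids in different orders, which this sketch does not align.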

Funding

We wish to confirm that there are no known conflicts of interest associated with this publication and there has been no significant financial support for this work that could have influenced its outcome.

Author information

Corresponding author

Correspondence to Sakshi Shringi.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Shringi, S., Sharma, H. Detection of spam reviews using hybrid grey wolf optimizer clustering method. Multimed Tools Appl 81, 38623–38641 (2022). https://doi.org/10.1007/s11042-022-12848-6

  • DOI: https://doi.org/10.1007/s11042-022-12848-6
