
Analysis of Phishing Base Problems Using Random Forest Features Selection Techniques and Machine Learning Classifiers

  • Conference paper
Data Intelligence and Cognitive Informatics

Part of the book series: Algorithms for Intelligent Systems (AIS)


Abstract

Since the Internet is anonymous and largely unregulated, it is open to phishing attacks, which can trick users into viewing malicious content and giving up their personal information. The number of victims of such attacks is increasing significantly because of inadequate security mechanisms. This research study develops a cyberbullying detection system that extracts features from Twitter text using a point-wise mutual information (PMI) approach, and a supervised machine learning method is then developed to detect cyberbullying scenarios. The proposed study employs sentiment, lexicon, and embedding features along with PMI-semantic orientation. The extracted features are classified with SVM, Naive Bayes, KNN, decision tree, and random forest algorithms. Experiments with the proposed framework in multi-class and binary settings show considerable potential in terms of kappa values, accuracy, and F-measure. These findings suggest that the proposed framework is a suitable option for recognizing cyberbullying behavior in online social networks. Finally, the results of the proposed features are compared with baseline features across several machine learning algorithms. Tenfold cross-validation yielded a highest accuracy of about 90.36%, and all four experiments assessed the random forest algorithm using 80% of the dataset for training. On the remaining 20% test dataset, the random forest algorithm also achieved higher accuracy.
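
The abstract names PMI-semantic orientation as a feature and reports kappa values, tenfold cross-validation, and an 80/20 train/test split, but gives no formulas. The Python below is a minimal sketch of those pieces, assuming the common document-level definition PMI-SO(w) = PMI(w, bullying) - PMI(w, non-bullying) with add-one smoothing on the joint counts; that definition, the class labels `"bully"`/`"non_bully"`, and every function name here are hypothetical illustrations, not the authors' published implementation.

```python
import math
import random
from collections import Counter


def pmi_semantic_orientation(docs, labels, pos="bully", neg="non_bully"):
    """Score each word by PMI(w, pos) - PMI(w, neg) (assumed definition).

    docs are token lists and labels their classes; both classes must occur.
    Counts are per document, and joint counts get add-one smoothing so a
    word never seen with one class still gets a finite score.
    """
    n = len(docs)
    class_counts = Counter(labels)
    word_counts = Counter()
    joint = Counter()
    for tokens, label in zip(docs, labels):
        for w in set(tokens):
            word_counts[w] += 1
            joint[(w, label)] += 1

    def pmi(w, c):
        # PMI(w, c) = log2(P(w, c) / (P(w) * P(c)))
        p_wc = (joint[(w, c)] + 1) / n
        p_w = word_counts[w] / n
        p_c = class_counts[c] / n
        return math.log2(p_wc / (p_w * p_c))

    return {w: pmi(w, pos) - pmi(w, neg) for w in word_counts}


def cohen_kappa(y_true, y_pred):
    """Agreement between labels and predictions, corrected for chance."""
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    expected = sum(true_counts[c] * pred_counts[c] for c in true_counts) / (n * n)
    if expected == 1.0:
        return 1.0  # single class in both lists: agreement is trivially perfect
    return (observed - expected) / (1 - expected)


def holdout_split(n, train_frac=0.8, seed=0):
    """Shuffle indices 0..n-1 and split them into train/test (80/20 by default)."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = int(n * train_frac)
    return idx[:cut], idx[cut:]


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]


docs = [["you", "idiot"], ["idiot", "loser"], ["nice", "day"], ["you", "rock"]]
labels = ["bully", "bully", "non_bully", "non_bully"]
scores = pmi_semantic_orientation(docs, labels)
print(round(scores["idiot"], 3), scores["nice"], scores["you"])  # 1.585 -1.0 0.0
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))  # 0.5
```

Words that co-occur mostly with the bullying class get positive scores and words tied to the other class get negative ones. How the per-word scores are aggregated into a per-tweet feature (for example, a sum or mean over tokens) is not stated in the abstract and would be a further assumption; any of the five classifiers could then be trained on each training split produced by `holdout_split` or `k_fold_indices`.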



Author information


Corresponding author

Correspondence to Saurabh Pal.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Pandey, M.K., Singh, M.K., Pal, S., Tiwari, B.B. (2023). Analysis of Phishing Base Problems Using Random Forest Features Selection Techniques and Machine Learning Classifiers. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Izonin, I. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-6004-8_5
