Analysis of Phishing Base Problems Using Random Forest Features Selection Techniques and Machine Learning Classifiers

Pandey, Mithilesh Kumar; Singh, Munindra Kumar; Pal, Saurabh; Tiwari, B. B.

doi:10.1007/978-981-19-6004-8_5

Part of the book series: Algorithms for Intelligent Systems (AIS)

Abstract

Because the Internet is anonymous and largely uncontrolled, it is open to phishing attacks, which trick users into viewing malicious content and handing over their personal information. The number of victims of this kind of digital attack is increasing significantly because security mechanisms are inadequate. This study develops a cyberbullying detection system that derives features from Twitter text using a point-wise mutual information (PMI) approach, together with a supervised machine learning method for detecting cyberbullying. The study also employs sentiment, lexicon, and embedding features along with PMI-semantic orientation. SVM, Naive Bayes, KNN, decision tree, and random forest classifiers were applied to the extracted features. Experiments with the proposed framework in multi-class and binary settings indicate considerable potential in terms of kappa values, accuracy, and F-values. These findings suggest that the proposed framework is a suitable option for recognizing cyberbullying behavior in online social networks. Finally, the outcomes of the proposed features and the baseline features are compared using various machine learning algorithms. Tenfold cross-validation produced a highest accuracy of about 90.36%, and all four experiments assessed the random forest algorithm using the 80% of the dataset reserved for training. The random forest algorithm also achieved higher accuracy on the 20% of the dataset reserved for testing.
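
The abstract names PMI and PMI-semantic orientation but does not define them. As a rough illustration only, the sketch below assumes the common formulation PMI(a, b) = log2(p(a, b) / (p(a) p(b))) with document-level co-occurrence counts, and scores a word as PMI(word, positive seed) minus PMI(word, negative seed). The toy documents, the seed words "good" and "bad", and all function names are hypothetical and are not taken from the chapter.

```python
import math
from collections import Counter
from itertools import combinations


def pmi_table(documents):
    """Return {(word_a, word_b): PMI} using document-level co-occurrence.

    p(w) is estimated as the fraction of documents containing w, and
    p(a, b) as the fraction containing both words.
    """
    n_docs = len(documents)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in documents:
        unique = sorted(set(tokens))  # sorted so each pair has one canonical order
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))

    table = {}
    for (a, b), joint in pair_counts.items():
        p_joint = joint / n_docs
        p_a = word_counts[a] / n_docs
        p_b = word_counts[b] / n_docs
        table[(a, b)] = math.log2(p_joint / (p_a * p_b))
    return table


def pmi(table, a, b):
    """Look up PMI for an unordered pair.

    Assumption: pairs that never co-occur get 0.0 instead of -infinity,
    a simplification that keeps the toy scores finite.
    """
    key = (a, b) if a <= b else (b, a)
    return table.get(key, 0.0)


def semantic_orientation(table, word, positive_seed, negative_seed):
    """Score a word by its association with a positive vs. a negative seed."""
    return pmi(table, word, positive_seed) - pmi(table, word, negative_seed)


# Hypothetical toy corpus of tokenized short texts.
docs = [
    ["you", "are", "stupid", "bad"],
    ["stupid", "bad", "idea"],
    ["great", "good", "idea"],
    ["you", "are", "great", "good"],
]
table = pmi_table(docs)
for word in ("stupid", "great", "idea"):
    print(word, semantic_orientation(table, word, "good", "bad"))
```

A score of this kind could serve as one numeric feature per text (for example, averaged over its tokens) alongside the sentiment, lexicon, and embedding features before a classifier is trained; how the chapter actually combines them is not stated in the abstract.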

Author information

Authors and Affiliations

Department of Computer Applications, VBS Purvanchal University, Jaunpur, India
Mithilesh Kumar Pandey, Munindra Kumar Singh & Saurabh Pal
Department of Electronics and Communication, VBS Purvanchal University, Jaunpur, India
B. B. Tiwari

Corresponding author

Correspondence to Saurabh Pal.

Editor information

Editors and Affiliations

Computer Science and Engineering, GITAM University, Bangalore, Karnataka, India
I. Jeena Jacob
Department of Mathematics and Computer Science, Ashland University, River Forest, IL, USA
Selvanayaki Kolandapalayam Shanmugam
Department of Artificial Intelligence, Lviv Polytechnic National University, Lviv, Ukraine
Ivan Izonin

Cite this paper

Pandey, M.K., Singh, M.K., Pal, S., Tiwari, B.B. (2023). Analysis of Phishing Base Problems Using Random Forest Features Selection Techniques and Machine Learning Classifiers. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Izonin, I. (eds) Data Intelligence and Cognitive Informatics. Algorithms for Intelligent Systems. Springer, Singapore. https://doi.org/10.1007/978-981-19-6004-8_5

DOI: https://doi.org/10.1007/978-981-19-6004-8_5
Published: 03 December 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-6003-1
Online ISBN: 978-981-19-6004-8
eBook Packages: Intelligent Technologies and Robotics (R0)
