Catching the Phish: Detecting Phishing Attacks Using Recurrent Neural Networks (RNNs)

Halgaš, Lukáš; Agrafiotis, Ioannis; Nurse, Jason R. C.

doi:10.1007/978-3-030-39303-8_17

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 11897)

Included in the following conference series:

International Workshop on Information Security Applications

Abstract

The emergence of online services in our daily lives has been accompanied by a range of malicious attempts to trick individuals into performing undesired actions, often to the benefit of the adversary. The most popular of these attempts are phishing attacks, delivered mainly through emails and websites. To defend against such attacks, there is an urgent need for automated mechanisms that identify this malicious content before it reaches users. Machine learning techniques have gradually become the standard for such classification problems. However, identifying common measurable features of phishing content (e.g., in emails) is notoriously difficult. To address this problem, we present a novel study of a phishing content classifier based on a recurrent neural network (RNN), which identifies such features without human input. At this stage, we scope our research to emails, but our approach can be extended to websites. Our results show that the proposed system outperforms state-of-the-art tools. Furthermore, our classifier is efficient and takes into account only the text and, in particular, the textual structure of the email. Since these features are rarely considered in email classification, we argue that our classifier can complement existing classifiers with high information gain.
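To make the idea of an RNN reading an email as a sequence more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture: it assumes a simple Elman-style recurrence (embedding, then a tanh hidden state updated token by token, then a sigmoid score), random untrained weights, and a toy vocabulary. The names `tokenize` and `TinyRNNClassifier`, the dimensions, and the choice to keep punctuation as tokens (a crude nod to textual structure) are all hypothetical. Training, for example by backpropagation through time, is omitted.

```python
import math
import random
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens and single punctuation tokens."""
    return re.findall(r"[a-z0-9']+|[^\sa-z0-9']", text.lower())


class TinyRNNClassifier:
    """Hypothetical Elman-style RNN: embedding -> tanh recurrence -> sigmoid score.

    Weights are random and untrained; this only illustrates the data flow.
    """

    def __init__(self, vocab, embed_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Index 0 is reserved for out-of-vocabulary tokens.
        self.index = {tok: i + 1 for i, tok in enumerate(vocab)}
        self.embed = mat(len(vocab) + 1, embed_dim)
        self.w_xh = mat(hidden_dim, embed_dim)
        self.w_hh = mat(hidden_dim, hidden_dim)
        self.b_h = [0.0] * hidden_dim
        self.w_out = [rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)]
        self.b_out = 0.0

    def encode(self, text):
        """Map each token to its vocabulary index (0 if unknown)."""
        return [self.index.get(tok, 0) for tok in tokenize(text)]

    def predict_proba(self, text):
        """Return a score in (0, 1); with trained weights it would be read as P(phishing)."""
        h = [0.0] * len(self.b_h)
        for token_id in self.encode(text):
            x = self.embed[token_id]
            # h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the right-hand side uses the old h.
            h = [
                math.tanh(
                    sum(w * xi for w, xi in zip(row_x, x))
                    + sum(w * hi for w, hi in zip(row_h, h))
                    + b
                )
                for row_x, row_h, b in zip(self.w_xh, self.w_hh, self.b_h)
            ]
        # The final hidden state summarises the whole token sequence.
        logit = sum(w * hi for w, hi in zip(self.w_out, h)) + self.b_out
        return 1.0 / (1.0 + math.exp(-logit))


# Usage with a toy vocabulary (illustrative only).
vocab = ["verify", "your", "account", "click", "here", "password", "meeting", "tomorrow"]
model = TinyRNNClassifier(vocab)
score = model.predict_proba("Please verify your account: click here!")
```

Because the hidden state is carried from one token to the next, the score depends on token order and not only on which words occur, which is the property that lets a recurrent model pick up sequential and structural cues without hand-crafted features.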

Author information

Authors and Affiliations

Department of Computer Science, University of Oxford, Oxford, UK
Lukáš Halgaš & Ioannis Agrafiotis
School of Computing, University of Kent, Canterbury, UK
Jason R. C. Nurse

Corresponding author

Correspondence to Lukáš Halgaš.

Editor information

Editors and Affiliations

Soonchunhyang University, Asan, Korea (Republic of)
Ilsun You

About this paper

Cite this paper

Halgaš, L., Agrafiotis, I., Nurse, J.R.C. (2020). Catching the Phish: Detecting Phishing Attacks Using Recurrent Neural Networks (RNNs). In: You, I. (ed.) Information Security Applications. WISA 2019. Lecture Notes in Computer Science, vol 11897. Springer, Cham. https://doi.org/10.1007/978-3-030-39303-8_17

DOI: https://doi.org/10.1007/978-3-030-39303-8_17
Published: 25 January 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39302-1
Online ISBN: 978-3-030-39303-8
eBook Packages: Computer Science, Computer Science (R0)