Abstract
The emergence of online services in our daily lives has been accompanied by a range of malicious attempts to trick individuals into performing undesired actions, often to the benefit of the adversary. The most popular form of these attempts is phishing, carried out mainly through emails and websites. To defend against such attacks, there is an urgent need for automated mechanisms that identify this malicious content before it reaches users. Machine learning techniques have gradually become the standard for such classification problems. However, identifying common measurable features of phishing content (e.g., in emails) is notoriously difficult. To address this problem, we present a novel study of a phishing content classifier based on a recurrent neural network (RNN), which identifies such features without human input. At this stage, we scope our research to emails, but our approach can be extended to websites. Our results show that the proposed system outperforms state-of-the-art tools. Furthermore, our classifier is efficient and takes into account only the text and, in particular, the textual structure of the email. Since these features are rarely considered in email classification, we argue that our classifier can complement existing classifiers with high information gain.
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Halgaš, L., Agrafiotis, I., Nurse, J.R.C. (2020). Catching the Phish: Detecting Phishing Attacks Using Recurrent Neural Networks (RNNs). In: You, I. (ed.) Information Security Applications. WISA 2019. Lecture Notes in Computer Science, vol 11897. Springer, Cham. https://doi.org/10.1007/978-3-030-39303-8_17
DOI: https://doi.org/10.1007/978-3-030-39303-8_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39302-1
Online ISBN: 978-3-030-39303-8
eBook Packages: Computer Science (R0)