NLP Based Phishing Attack Detection from URLs

  • Ebubekir Buber
  • Banu Diri
  • Ozgur Koray Sahingoz
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 736)


In recent years, phishing has become a growing threat in cyberspace, especially with the increasing use of messaging services and social networks. In a traditional phishing attack, users are lured to a bogus website that is carefully designed to look exactly like a well-known banking, e-commerce, or social networking site, with the aim of obtaining personal information such as credit card numbers, usernames, and passwords, or even money. Phishers usually carry out their attacks through emails that direct victims to the target website. Inexperienced users, and even experienced ones, may visit these fake websites and share their sensitive information. In a phishing attack analysis of 45 countries for the last quarter of 2016, China, Turkey, and Taiwan were the countries most plagued by malware, with rates of 47.09%, 42.88%, and 38.98%, respectively. Detecting a phishing attack is a challenging problem because this type of attack is considered a semantics-based attack, which mainly exploits the vulnerabilities of the computer user. In this paper, we propose a phishing detection system that detects this type of attack by using machine learning algorithms and by detecting visual similarities with the help of natural language processing techniques. Numerous tests were run on the proposed system, and the experimental results show that the Random Forest algorithm performs very well, with a success rate of 97.2%.
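The abstract does not list the features the system extracts, so the following is only a minimal sketch of how a URL might be turned into lexical, word-based features before classification. The feature names, the `BRAND_WORDS` and `SUSPICIOUS_WORDS` lists, and the helper functions are all hypothetical and are not taken from the paper.

```python
import re
from urllib.parse import urlparse

# Hypothetical keyword lists for illustration only; not from the paper.
BRAND_WORDS = {"paypal", "apple", "google", "amazon"}
SUSPICIOUS_WORDS = {"login", "verify", "secure", "update", "account"}


def tokenize(text):
    """Split text into lowercase alphanumeric tokens."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def extract_features(url):
    """Map a URL to a small dictionary of lexical features (assumed set)."""
    # Add a scheme when missing so urlparse can find the host name.
    parsed = urlparse(url if "://" in url else "http://" + url)
    host = parsed.hostname or ""
    tokens = tokenize(host + parsed.path + "?" + parsed.query)
    words = [t for t in tokens if t.isalpha()]
    return {
        "url_length": len(url),
        "dot_count": host.count("."),
        "digit_count": sum(c.isdigit() for c in url),
        "hyphen_count": url.count("-"),
        "token_count": len(tokens),
        "longest_word": max((len(w) for w in words), default=0),
        "brand_hits": sum(w in BRAND_WORDS for w in words),
        "suspicious_hits": sum(w in SUSPICIOUS_WORDS for w in words),
        # 1 when the host is a dotted-quad IP address instead of a name.
        "host_is_ip": int(bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host))),
    }


features = extract_features(
    "http://paypal.com.secure-login.example.net/verify/account"
)
```

Feature dictionaries of this kind could then be converted to numeric vectors and passed to a classifier such as Random Forest, which is the algorithm the abstract reports as performing best.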


Keywords: Machine learning, Phishing attack, Random Forest algorithm, Cyber attack detection, Cyber security



Thanks to Normshield Inc., BGA Security, SinaraLabs and Roksit for contributing to the development of this work.



Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Ebubekir Buber (1)
  • Banu Diri (1)
  • Ozgur Koray Sahingoz (2)
  1. Computer Engineering Department, Yildiz Technical University, Istanbul, Turkey
  2. Computer Engineering Department, Istanbul Kultur University, Istanbul, Turkey
