CBR-PDS: a case-based reasoning phishing detection system

  • Hassan Abutair
  • Abdelfettah Belghith
  • Saad AlAhmadi
Original Research


Phishing attacks have become the preferred vehicle for gathering sensitive information and for delivering dangerous malware. To date, no phishing detection system can perfectly detect phishing websites and progressively self-adapt to differentiate them from legitimate ones. This paper proposes the case-based reasoning phishing detection system (CBR-PDS), which relies on previous cases to detect phishing attacks. CBR-PDS is highly adaptive and dynamic: it can adapt to detect new phishing attacks using a rather small dataset, in contrast to other machine learning techniques. CBR-PDS aims to improve detection accuracy and the reliability of the results by identifying a set of discriminative features and discarding irrelevant ones. To this end, it relies on a two-stage hybrid feature selection procedure using information gain and genetic algorithms. The resulting reduction in data dimensionality yields an improved accuracy rate while also requiring less processing time. CBR-PDS is tested using different scenarios on various balanced datasets. The results clearly show the suitability of the proposed hybrid feature selection procedure as well as the efficiency of the proposed CBR-PDS system. The obtained accuracy rates exceed 95%. We also show that integrating an Online Phishing Threats component into CBR-PDS further improves the accuracy rate. Finally, the performance of CBR-PDS is compared with that of several well-known competitive classifiers.
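To make the idea in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes discrete features, ranks them by information gain (the first selection stage only; the genetic algorithm stage is omitted), and then classifies a query by retrieving the most similar stored cases and reusing their majority label. The feature names (`has_ip_in_url`, `uses_https`, `long_url`), the toy case base, the function names, and the use of information gain values as similarity weights are all illustrative assumptions.

```python
import math
from collections import Counter

# Hypothetical toy case base: each case is a dict of discrete features.
CASES = [
    {"has_ip_in_url": 1, "uses_https": 0, "long_url": 1},
    {"has_ip_in_url": 1, "uses_https": 0, "long_url": 0},
    {"has_ip_in_url": 1, "uses_https": 1, "long_url": 1},
    {"has_ip_in_url": 0, "uses_https": 1, "long_url": 0},
    {"has_ip_in_url": 0, "uses_https": 1, "long_url": 1},
    {"has_ip_in_url": 0, "uses_https": 0, "long_url": 0},
]
LABELS = ["phishing"] * 3 + ["legitimate"] * 3


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(cases, labels, feature):
    """Entropy reduction obtained by splitting the cases on one feature."""
    groups = {}
    for case, label in zip(cases, labels):
        groups.setdefault(case[feature], []).append(label)
    remainder = sum(len(g) / len(labels) * entropy(g)
                    for g in groups.values())
    return entropy(labels) - remainder


def select_features(cases, labels, top_k):
    """Keep the top_k features by information gain, mapped to their gains."""
    gains = {f: information_gain(cases, labels, f) for f in cases[0]}
    ranked = sorted(gains, key=gains.get, reverse=True)[:top_k]
    return {f: gains[f] for f in ranked}


def classify(query, cases, labels, weights, n_neighbours=3):
    """Retrieve the most similar cases and reuse their majority label."""
    total_weight = sum(weights.values())

    def similarity(case):
        # Weighted fraction of selected features on which case and query agree.
        if total_weight == 0:
            return 0.0
        matched = sum(w for f, w in weights.items() if case[f] == query[f])
        return matched / total_weight

    ranked = sorted(zip(cases, labels),
                    key=lambda pair: similarity(pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:n_neighbours])
    return votes.most_common(1)[0][0]


weights = select_features(CASES, LABELS, top_k=2)
query = {"has_ip_in_url": 1, "uses_https": 1, "long_url": 0}
print(classify(query, CASES, LABELS, weights))
```

In this toy data `has_ip_in_url` separates the two classes perfectly, so it receives the highest gain and dominates the similarity score. A fuller system would also retain newly solved cases in the case base, which is the mechanism a case-based reasoner generally uses to adapt over time.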


Phishing detection; Machine learning classifiers; Case-based reasoning; Features selection; Features weighting



The authors extend their appreciation to the Deanship of Scientific Research at King Saud University for funding this work through research Group No. RG-1439-023.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  • Hassan Abutair (1)
  • Abdelfettah Belghith (1)
  • Saad AlAhmadi (1)
  1. College of Computer and Information Sciences, King Saud University, Riyadh, Saudi Arabia
