A Genetic Algorithm Based Model for Chinese Phishing E-commerce Websites Detection

  • Zhijun Yan
  • Su Liu
  • Tianmei Wang
  • Baowen Sun
  • Hansi Jiang
  • Hangzhou Yang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9751)


We propose a new model for detecting Chinese phishing e-commerce websites that integrates the URL features and web features of websites. The model includes several features unique to Chinese e-commerce websites and applies the Sequential Minimal Optimization (SMO) algorithm to identify phishing e-commerce websites. We further adopt a genetic algorithm (GA) to optimize the detection model. Evaluation results show that the SMO-based model outperforms the baseline model and that the GA significantly improves detection accuracy.
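The abstract gives no implementation details, so the following is only a rough sketch of how an SMO-trained SVM can be wrapped in a GA. It is not the authors' method. Assumptions not taken from the paper: the solver is the simplified SMO variant (second multiplier chosen at random rather than by Platt's full heuristics) with an RBF kernel; the GA searches over the SVM penalty `C` and kernel width `gamma` (the abstract only says the GA "optimizes the detection model", which could equally mean feature selection); fitness is accuracy on a held-out split; and the three-feature data are synthetic stand-ins for URL and web features. All function names (`make_toy_data`, `smo_train`, `fit`, `ga_optimize`) are hypothetical.

```python
import math
import random

def make_toy_data(n, seed):
    # Hypothetical data: 3 normalized features standing in for URL/web
    # features; label +1 = phishing, -1 = legitimate (alternating).
    rng = random.Random(seed)
    y = [1 if k % 2 == 0 else -1 for k in range(n)]
    X = [[t + rng.gauss(0.0, 0.5) for _ in range(3)] for t in y]
    return X, y

def rbf(gamma):
    # Gaussian kernel exp(-gamma * ||a - b||^2).
    return lambda a, b: math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def smo_train(X, y, C, kernel, tol=1e-3, max_passes=3, max_iter=50, seed=0):
    # Simplified SMO: optimize two Lagrange multipliers at a time, choosing
    # the second at random (not Platt's full selection heuristics).
    rng = random.Random(seed)
    n = len(X)
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha, b = [0.0] * n, 0.0
    def f(i):  # current decision value for training point i
        return sum(alpha[k] * y[k] * K[k][i] for k in range(n)) + b
    passes = it = 0
    while passes < max_passes and it < max_iter:
        it += 1
        changed = 0
        for i in range(n):
            Ei = f(i) - y[i]
            # Skip alpha_i unless it violates the KKT conditions.
            if not ((y[i] * Ei < -tol and alpha[i] < C)
                    or (y[i] * Ei > tol and alpha[i] > 0)):
                continue
            j = rng.randrange(n - 1)
            j = j + 1 if j >= i else j  # random j != i
            Ej = f(j) - y[j]
            ai, aj = alpha[i], alpha[j]
            # Bounds keeping both alphas in [0, C] with sum(alpha * y) fixed.
            if y[i] != y[j]:
                L, H = max(0.0, aj - ai), min(C, C + aj - ai)
            else:
                L, H = max(0.0, ai + aj - C), min(C, ai + aj)
            eta = 2 * K[i][j] - K[i][i] - K[j][j]
            if L == H or eta >= 0:
                continue
            new_aj = min(H, max(L, aj - y[j] * (Ei - Ej) / eta))
            if abs(new_aj - aj) < 1e-5:
                continue
            new_ai = ai + y[i] * y[j] * (aj - new_aj)
            di, dj = y[i] * (new_ai - ai), y[j] * (new_aj - aj)
            b1 = b - Ei - di * K[i][i] - dj * K[i][j]
            b2 = b - Ej - di * K[i][j] - dj * K[j][j]
            alpha[i], alpha[j] = new_ai, new_aj
            b = b1 if 0 < new_ai < C else b2 if 0 < new_aj < C else (b1 + b2) / 2
            changed += 1
        passes = passes + 1 if changed == 0 else 0
    return alpha, b

def fit(X, y, C, gamma):
    # Train an RBF-kernel SVM and return a classifier x -> +1 / -1.
    kernel = rbf(gamma)
    alpha, b = smo_train(X, y, C, kernel)
    def predict(x):
        s = sum(a * t * kernel(xk, x) for a, t, xk in zip(alpha, y, X) if a > 0)
        return 1 if s + b >= 0 else -1
    return predict

def accuracy(predict, X, y):
    return sum(predict(x) == t for x, t in zip(X, y)) / len(y)

# Assumed GA search space: genes are (log2 C, log2 gamma).
BOUNDS = [(-3.0, 5.0), (-5.0, 2.0)]

def ga_optimize(Xtr, ytr, Xva, yva, pop_size=8, generations=5, seed=1):
    # Real-coded GA; fitness = validation accuracy of the trained SVM.
    rng = random.Random(seed)
    def evaluate(pop):
        return sorted(((accuracy(fit(Xtr, ytr, 2 ** g[0], 2 ** g[1]), Xva, yva), g)
                       for g in pop), reverse=True)
    scored = evaluate([[rng.uniform(lo, hi) for lo, hi in BOUNDS]
                       for _ in range(pop_size)])
    for _ in range(generations):
        nxt = [scored[0][1]]  # elitism: keep the best individual
        while len(nxt) < pop_size:
            p1 = max(rng.sample(scored, 2))[1]  # binary tournament selection
            p2 = max(rng.sample(scored, 2))[1]
            w = rng.random()  # blend (arithmetic) crossover
            child = [w * a + (1 - w) * c for a, c in zip(p1, p2)]
            child = [g + rng.gauss(0.0, 0.5) for g in child]  # Gaussian mutation
            nxt.append([min(hi, max(lo, g)) for g, (lo, hi) in zip(child, BOUNDS)])
        scored = evaluate(nxt)
    return scored[0]  # (best validation accuracy, [log2 C, log2 gamma])

if __name__ == "__main__":
    X, y = make_toy_data(60, seed=42)
    Xtr, ytr, Xva, yva = X[:40], y[:40], X[40:], y[40:]
    acc, (log_c, log_g) = ga_optimize(Xtr, ytr, Xva, yva)
    print(f"best validation accuracy {acc:.2f} at C=2^{log_c:.2f}, gamma=2^{log_g:.2f}")
```

Searching in log2 space is a common choice for SVM hyperparameters because useful values of `C` and `gamma` span several orders of magnitude. A real system would more likely rely on an established solver such as LIBSVM, whose trainer is an SMO-type decomposition method, and on cross-validation rather than a single validation split.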


Keywords: Chinese phishing website detection; E-commerce; Sequential minimal optimization; Genetic algorithm



This research is supported by the National Natural Science Foundation of China (Grant No. 71272057, 71572013) and the National Social Science Fund of China (Grant No. 14AZD045).



Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Zhijun Yan (1, corresponding author)
  • Su Liu (1)
  • Tianmei Wang (2)
  • Baowen Sun (2)
  • Hansi Jiang (1)
  • Hangzhou Yang (1)

  1. School of Management and Economics, Beijing Institute of Technology, Beijing, China
  2. School of Information, Central University of Finance and Economics, Beijing, China
