Malicious URL detection via spherical classification
- 413 Downloads
We introduce and test a binary classification method aimed at detecting malicious URL on the basis of some information on both the URL syntax and its domain properties. Our method belongs to the class of supervised machine learning models, where, in particular, classification is performed by using information coming from a set of URL’s (samples in machine learning parlance) whose class membership is known in advance. The main novelty of our approach is in the use of a spherical separation-based algorithm, instead of SVM-type methods, which are based on hyperplanes as separation surfaces in the sample space. In particular we adopt a simplified spherical separation model which runs in O(tlogt) time (t is the number of samples in the training set), and thus is suitable for large-scale applications. We test our approach using different sets of features and report the results in terms of training correctness according to the well-established tenfold cross-validation paradigm.
KeywordsClassification Spherical separation Malicious Web sites
This work has been partially supported by Italian M.I.U.R. Programma Operativo Nazionale (PON) 2007–2013, Project “Protezione dei servizi digitali e di pagamento elettronico,” PON03PE_00032_2.
- 10.Ma J, Saul LK, Savage S, Voelker GM (2009) Beyond blacklists: learning to detect malicious web sites from suspicious URLs. KDD’09, June 28–July 1, 2009. France, Paris, pp 1245–1253Google Scholar
- 14.PhishTank: http://www.phishtank.com/
- 17.Zhang J, Porras P, Ullrich J (2008) Highly predictive blacklisting. USENIX Security Symposium 2008—usenix.orgGoogle Scholar