Neural Computing and Applications, Volume 28, Supplement 1, pp 699–705

Malicious URL detection via spherical classification

  • A. Astorino
  • A. Chiarello
  • M. Gaudioso (corresponding author)
  • A. Piccolo
Original Article

Abstract

We introduce and test a binary classification method for detecting malicious URLs on the basis of information about both the URL syntax and its domain properties. Our method belongs to the class of supervised machine learning models, in which classification relies on information from a set of URLs (samples, in machine learning parlance) whose class membership is known in advance. The main novelty of our approach is the use of a spherical separation-based algorithm instead of SVM-type methods, which use hyperplanes as separation surfaces in the sample space. In particular, we adopt a simplified spherical separation model which runs in O(t log t) time (t is the number of samples in the training set), and is thus suitable for large-scale applications. We test our approach using different sets of features and report the results in terms of training correctness according to the well-established tenfold cross-validation paradigm.
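
To illustrate why a simplified spherical model can run in O(t log t) time, the following is a minimal sketch and not the method of the paper, whose exact objective is not stated in this abstract. It assumes the center of the sphere is fixed in advance (here, the barycenter of one class) and that the squared radius is chosen to minimize the number of misclassified training samples. Under these assumptions the samples only need to be sorted by distance from the center and scanned once. The names `barycenter`, `fit_radius` and `is_inside` are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Sequence[float]


def barycenter(points: List[Point]) -> List[float]:
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[k] for p in points) / n for k in range(len(points[0]))]


def squared_distance(p: Point, c: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, c))


def is_inside(p: Point, center: Point, r2: float) -> bool:
    """A sample is classified as 'inside' if it lies strictly within the sphere."""
    return squared_distance(p, center) < r2


def fit_radius(inside: List[Point], outside: List[Point],
               center: Point) -> Tuple[float, int]:
    """Return (squared radius, training errors) for a fixed center.

    Assumption: the objective is the count of misclassified samples.
    Sorting costs O(t log t); the scan afterwards is linear.
    """
    # Tag each sample: +1 should fall inside the sphere, -1 outside.
    tagged = sorted(
        [(squared_distance(p, center), 1) for p in inside]
        + [(squared_distance(p, center), -1) for p in outside]
    )
    # Empty sphere (r2 = 0): every 'inside' sample is an error.
    errors = len(inside)
    best_r2, best_errors = 0.0, errors
    i, n = 0, len(tagged)
    while i < n:
        d = tagged[i][0]
        j = i
        # Grow the sphere past all samples tied at distance d.
        while j < n and tagged[j][0] == d:
            errors += -1 if tagged[j][1] == 1 else 1
            j += 1
        if errors < best_errors:
            # Place the boundary halfway to the next distinct distance;
            # past the last sample any larger value works.
            upper = tagged[j][0] if j < n else d + 1.0
            best_r2, best_errors = (d + upper) / 2.0, errors
        i = j
    return best_r2, best_errors
```

In a URL setting each point would be a feature vector extracted from a URL; which class is enclosed by the sphere is a modelling choice that this abstract does not settle.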

Keywords

  • Classification
  • Spherical separation
  • Malicious Web sites

Notes

Acknowledgments

This work has been partially supported by Italian M.I.U.R. Programma Operativo Nazionale (PON) 2007–2013, Project “Protezione dei servizi digitali e di pagamento elettronico,” PON03PE_00032_2.

Copyright information

© The Natural Computing Applications Forum 2016

Authors and Affiliations

  1. ICAR-CNR, Rende, Italy
  2. DIMES, University of Calabria, Rende, Italy