Time Split Based Pre-processing with a Data-Driven Approach for Malicious URL Detection

  • N. B. Harikrishnan
  • R. Vinayakumar (Email author)
  • K. P. Soman
  • Prabaharan Poornachandran
Chapter
Part of the Advanced Sciences and Technologies for Security Applications book series (ASTSA)

Abstract

Malicious uniform resource locators (URLs) host unsolicited content and are used to commit cyber crime, making them a serious threat. Malicious URLs are responsible for various cyber attacks such as spamming, identity theft, and financial fraud. The growth of the internet has also increased fraudulent activity on the web. Classical methods such as blacklisting are ineffective at detecting newly generated malicious URLs, so there is a need for an effective algorithm to detect and classify them. At the same time, recent advances in machine learning have shown promising results in areas such as image processing and natural language processing (NLP). This motivates us to move toward machine learning based techniques for detecting and classifying URLs. However, significant challenges in detecting malicious URLs remain to be addressed. First and foremost, any available data used for detecting malicious URLs is outdated, which makes a model difficult to deploy in a real-time scenario. Secondly, the inability to capture semantic and sequential information affects generalization to the test data. To overcome these shortcomings, we introduce the concept of time split and random split on the training data. Random split partitions the data randomly into training and testing sets, whereas time split partitions the data based on the time information of the URLs. This is followed by different representations of the data, which are passed to classical machine learning and deep learning techniques to evaluate performance. Analysis of the data set from the Sophos Machine Learning building blocks tutorial shows better performance for time split based grouping of the data, with a decision tree classifier reaching an accuracy of 88.5%. Additionally, a highly scalable framework is designed to collect data passively from various data sources inside an Ethernet LAN. The proposed framework can collect data in real time and process it in a distributed way to provide situational awareness. It can easily be extended to handle very large volumes of cyber events by adding resources to the existing system.
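
The difference between the two splitting strategies can be illustrated with a minimal sketch. This is not the chapter's implementation: the record layout `(url, first_seen, label)`, the function names `random_split` and `time_split`, the cutoff date, and the sample URLs are all assumptions made for illustration. It only shows that a random split mixes old and new URLs across both sets, while a time split trains on older URLs and tests on newer ones, which is closer to a deployment setting.

```python
import random
from datetime import date

def random_split(records, test_fraction=0.5, seed=0):
    """Shuffle the records and hold out a fraction of them for testing."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def time_split(records, cutoff):
    """Train on URLs first seen before `cutoff`; test on the later ones."""
    train = [r for r in records if r[1] < cutoff]
    test = [r for r in records if r[1] >= cutoff]
    return train, test

# Hypothetical records: (url, first_seen, label), with 1 = malicious, 0 = benign.
records = [
    ("http://example.com/index.html", date(2017, 1, 5), 0),
    ("http://login-update.example.net/verify", date(2017, 2, 11), 1),
    ("http://example.org/docs/readme", date(2017, 3, 2), 0),
    ("http://free-prizes.example.net/claim.php", date(2017, 4, 20), 1),
    ("http://example.com/blog/post-42", date(2017, 5, 8), 0),
    ("http://secure-bank.example.org/signin", date(2017, 6, 15), 1),
]

rand_train, rand_test = random_split(records)
time_train, time_test = time_split(records, cutoff=date(2017, 4, 1))

print("random split:", len(rand_train), "train /", len(rand_test), "test")
print("time split:  ", len(time_train), "train /", len(time_test), "test")
```

With a time split, every test URL is newer than every training URL, so the measured accuracy reflects how a model trained on past data handles URLs that appear later.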

Keywords

Malicious URL, Deep learning, Machine learning, Scalable framework, Situational awareness

Notes

Acknowledgements

This research was supported in part by Paramount Computer Systems and Lakhshya Cyber Security Labs. We are grateful to NVIDIA India for the GPU hardware support provided through a research grant. We are also grateful to the Computational Engineering and Networking (CEN) department for encouraging the research.

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • N. B. Harikrishnan (1)
  • R. Vinayakumar (1, Email author)
  • K. P. Soman (1)
  • Prabaharan Poornachandran (2)
  1. Center for Computational Engineering and Networking (CEN), Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Coimbatore, India
  2. Centre for Cyber Security Systems and Networks, Amrita School of Engineering, Amrita Vishwa Vidyapeetham, Amritapuri, India