High-performance network traffic analysis for continuous batch intrusion detection

Abstract

Network traffic analysis is applied to detect intrusions and manage application traffic. Continuous batch network traffic analysis is a computationally demanding task. Because traffic intensity varies with the natural peaks and troughs of network usage, a network analysis cluster may have to be severely over-dimensioned to support 24/7 continuous packet block capture and processing. In this paper, we characterize the computational requirements of network traffic packets under several conditions; this characterization is a useful tool for generating a network workload in simulated scenarios. Our target MapReduce jobs are map-intensive and include string matching-based virus and malware detection. We present an architecture for a Hadoop-based network analysis solution that includes a scheduler, report on using this approach in a small cluster, and show scheduling performance results obtained through simulation. The scheduler considers a cloud-based traffic analysis solution that bursts traffic to the cloud to overcome local resource limitations. The results show that we are able to reduce the amount of traffic burst out by up to 50% and still accomplish continuous batch traffic analysis with run times comparable to those of a single job.
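
To make the cloud-bursting idea in the abstract concrete, the following is a minimal sketch of one possible bursting decision. It is not the scheduler described in the paper. It assumes a hypothetical greedy policy: each captured packet block has an estimated local processing demand, the local cluster completes a fixed amount of work per capture interval, and a block is burst to the cloud only when accepting it would push the local backlog past a bound. The names `plan_bursting`, `BurstPlan`, `capacity`, and `max_backlog` are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BurstPlan:
    local: List[int]  # indices of blocks kept on the local cluster
    burst: List[int]  # indices of blocks sent to the cloud


def plan_bursting(demands: List[float], capacity: float, max_backlog: float) -> BurstPlan:
    """Greedy, hypothetical bursting policy (not the paper's scheduler).

    demands[i]  : estimated local processing time of block i, in seconds
    capacity    : seconds of work the local cluster completes per capture interval
    max_backlog : largest amount of queued local work (seconds) we tolerate
    """
    backlog = 0.0
    local: List[int] = []
    burst: List[int] = []
    for i, demand in enumerate(demands):
        # The local cluster drains queued work while block i is being captured.
        backlog = max(0.0, backlog - capacity)
        if backlog + demand <= max_backlog:
            # Block fits within the backlog bound: process it locally.
            backlog += demand
            local.append(i)
        else:
            # Accepting the block would exceed the bound: burst it to the cloud.
            burst.append(i)
    return BurstPlan(local=local, burst=burst)


# Example: a traffic peak (blocks 2 and 3) forces one block out to the cloud.
demands = [1, 1, 3, 3, 1]
plan = plan_bursting(demands, capacity=2, max_backlog=3)
burst_fraction = sum(demands[i] for i in plan.burst) / sum(demands)
```

In this toy run only the second block of the peak is burst, because the first one still fits within the backlog bound. A real scheduler would also have to account for transfer cost, cloud run time, and estimation error, none of which are modelled here.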

Notes

  1. http://hadoop.apache.org.

  2. http://www.pravail.com.

  3. https://www.openstack.org/software/icehouse/.

  4. http://docs.openstack.org/developer/sahara/.

  5. The CAIDA UCSD http://www.caida.org/data/passive/trace_stats/.

Acknowledgments

Work (partially) funded by the Operational Programme for Competitiveness and Internationalisation - COMPETE 2020 within project POCI-01-0145-FEDER-006961, and by FCT – Portuguese Foundation for Science and Technology as part of projects UID/EEA/50014/2013 and UID/CEC/00027/2013.

Author information

Correspondence to Jorge G. Barbosa.

About this article

Cite this article

Morla, R., Gonçalves, P. & Barbosa, J.G. High-performance network traffic analysis for continuous batch intrusion detection. J Supercomput 72, 4107–4128 (2016). https://doi.org/10.1007/s11227-016-1743-6

Keywords

  • Packet network traffic analysis
  • Hadoop
  • Cloud bursting
  • On-line scheduling