Abstract
Internet traffic measurement and analysis have become significantly challenging because large packet trace files captured on fast links cannot be handled easily on a single server with limited computing and memory resources. Hadoop is a popular open-source cloud computing platform that provides a programming framework, MapReduce, and a distributed filesystem, HDFS, both of which are useful for analyzing large data sets. In this paper, we therefore present a Hadoop-based packet processing tool that scales to large data sets by harnessing MapReduce and HDFS. To handle large packet trace files in Hadoop efficiently, we devised a new binary input format, called PcapInputFormat, which hides the complexity of processing binary-formatted packet data and parsing each packet record. We also designed efficient MapReduce job models for traffic analysis, each consisting of map and reduce functions. To evaluate our tool, we compared its computation time with that of a well-known packet-processing tool, CoralReef, and showed that our approach is a more affordable way to process a large set of packet data.
This research was partly supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2011-(C1090-1031-0005)) and partly by the IT R&D program of MKE/KEIT [KI001878, “CASFI: High-Precision Measurement and Analysis Research”].
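The two ideas named in the abstract, parsing binary packet records and expressing traffic analysis as map and reduce functions, can be illustrated with a minimal single-process sketch. This is not the authors' PcapInputFormat or their Hadoop job code. It assumes the classic libpcap file layout (a 24-byte global header followed by 16-byte record headers) and Ethernet/IPv4 packets, and every function name below (`read_pcap_records`, `map_packet`, `reduce_bytes`, `bytes_per_source`) is hypothetical.

```python
import struct
from collections import defaultdict
from io import BytesIO

PCAP_GLOBAL_HEADER_LEN = 24   # classic libpcap global header size
PCAP_RECORD_HEADER_LEN = 16   # ts_sec, ts_usec, incl_len, orig_len


def read_pcap_records(stream):
    """Yield (timestamp, original_length, captured_bytes) for each packet record."""
    header = stream.read(PCAP_GLOBAL_HEADER_LEN)
    if len(header) < PCAP_GLOBAL_HEADER_LEN:
        return
    # The magic number tells us the byte order used by the capturing host.
    magic = header[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"
    else:
        raise ValueError("not a classic pcap file")
    while True:
        rec = stream.read(PCAP_RECORD_HEADER_LEN)
        if len(rec) < PCAP_RECORD_HEADER_LEN:
            return
        ts_sec, ts_usec, incl_len, orig_len = struct.unpack(endian + "IIII", rec)
        data = stream.read(incl_len)
        if len(data) < incl_len:
            return  # truncated final record
        yield ts_sec + ts_usec / 1e6, orig_len, data


def map_packet(orig_len, data):
    """Map step: emit (source IPv4 address, byte count).

    Assumes an Ethernet link layer: 14-byte frame header, EtherType 0x0800
    for IPv4, and the source address at offset 12 of the IP header.
    """
    if len(data) < 34 or data[12:14] != b"\x08\x00":
        return
    src = ".".join(str(b) for b in data[26:30])
    yield src, orig_len


def reduce_bytes(pairs):
    """Reduce step: sum the byte counts emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def bytes_per_source(stream):
    """Run the map and reduce steps in-process over one pcap stream."""
    pairs = []
    for _ts, orig_len, data in read_pcap_records(stream):
        pairs.extend(map_packet(orig_len, data))
    return reduce_bytes(pairs)
```

Calling `bytes_per_source(open("trace.pcap", "rb"))` on a capture file (the file name is a placeholder) returns a dictionary of total bytes per source address. In a real Hadoop job the record reader would also have to locate packet boundaries inside an HDFS block split, which this sketch does not attempt.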
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lee, Y., Kang, W., Lee, Y. (2011). A Hadoop-Based Packet Trace Processing Tool. In: Domingo-Pascual, J., Shavitt, Y., Uhlig, S. (eds) Traffic Monitoring and Analysis. TMA 2011. Lecture Notes in Computer Science, vol 6613. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20305-3_5
DOI: https://doi.org/10.1007/978-3-642-20305-3_5
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20304-6
Online ISBN: 978-3-642-20305-3
eBook Packages: Computer Science, Computer Science (R0)