A Hadoop-Based Packet Trace Processing Tool

  • Conference paper
Traffic Monitoring and Analysis (TMA 2011)

Part of the book series: Lecture Notes in Computer Science (LNCCN, volume 6613)

Abstract

Internet traffic measurement and analysis has become a challenging task because large packet trace files captured on fast links cannot be handled easily on a single server with limited computing and memory resources. Hadoop is a popular open-source cloud computing platform that provides a programming framework called MapReduce and a distributed filesystem, HDFS, both of which are useful for analyzing large data sets. In this paper, we present a Hadoop-based packet processing tool that scales to large data sets by harnessing MapReduce and HDFS. To process large packet trace files in Hadoop efficiently, we devised a new binary input format, called PcapInputFormat, that hides the complexity of processing binary-formatted packet data and parsing each packet record. We also designed efficient traffic analysis MapReduce job models consisting of map and reduce functions. To evaluate our tool, we compared its computation time with that of a well-known packet-processing tool, CoralReef, and showed that our approach is more affordable for processing a large set of packet data.
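To make the two ideas in the abstract concrete, the following is a minimal Python sketch of (a) reading length-prefixed binary packet records from a classic libpcap trace and (b) a map/reduce pair that totals IPv4 bytes per source address. It is an illustration only, not the authors' PcapInputFormat, which targets Hadoop. The function names `iter_pcap_records`, `map_packet`, and `reduce_bytes` are hypothetical, and the sketch assumes a little-endian, microsecond-resolution pcap file carrying Ethernet frames. It also reads the whole trace from one buffer, so it does not address locating record boundaries inside an HDFS split.

```python
import struct
from collections import defaultdict

PCAP_GLOBAL_HEADER_LEN = 24   # magic, version, thiszone, sigfigs, snaplen, linktype
PCAP_RECORD_HEADER_LEN = 16   # ts_sec, ts_usec, incl_len, orig_len
PCAP_MAGIC_LE_USEC = 0xA1B2C3D4


def iter_pcap_records(data):
    """Yield (ts_sec, packet_bytes) for each record in a pcap buffer.

    Assumption: classic little-endian pcap with microsecond timestamps.
    """
    if len(data) < PCAP_GLOBAL_HEADER_LEN:
        raise ValueError("buffer too short for a pcap global header")
    magic = struct.unpack_from("<I", data, 0)[0]
    if magic != PCAP_MAGIC_LE_USEC:
        raise ValueError("unsupported pcap magic number")

    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(data):
        ts_sec, _ts_usec, incl_len, _orig_len = struct.unpack_from(
            "<IIII", data, offset)
        offset += PCAP_RECORD_HEADER_LEN
        packet = data[offset:offset + incl_len]
        if len(packet) < incl_len:
            break  # truncated final record; stop rather than guess
        offset += incl_len
        yield ts_sec, packet


def map_packet(packet):
    """Map step: emit (source IPv4 address, IP total length).

    Non-IPv4 or too-short Ethernet frames emit nothing.
    """
    if len(packet) < 34 or packet[12:14] != b"\x08\x00":
        return
    total_len = struct.unpack_from("!H", packet, 16)[0]  # IP header bytes 2-3
    src = ".".join(str(b) for b in packet[26:30])        # IP header bytes 12-15
    yield src, total_len


def reduce_bytes(pairs):
    """Reduce step: sum the emitted byte counts per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)
```

A caller would chain the three pieces, for example `reduce_bytes(kv for _, pkt in iter_pcap_records(data) for kv in map_packet(pkt))`, where `data` holds the bytes of a trace file. In a real Hadoop job the record reader, mapper, and reducer run as separate distributed stages rather than in one process.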

This research was partly supported by the MKE (Ministry of Knowledge Economy), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2011-(C1090-1031-0005)) and partly by the IT R&D program of MKE/KEIT [KI001878, “CASFI: High-Precision Measurement and Analysis Research”].

Copyright information

© 2011 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lee, Y., Kang, W., Lee, Y. (2011). A Hadoop-Based Packet Trace Processing Tool. In: Domingo-Pascual, J., Shavitt, Y., Uhlig, S. (eds) Traffic Monitoring and Analysis. TMA 2011. Lecture Notes in Computer Science, vol 6613. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20305-3_5

  • DOI: https://doi.org/10.1007/978-3-642-20305-3_5

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-20304-6

  • Online ISBN: 978-3-642-20305-3

  • eBook Packages: Computer Science, Computer Science (R0)
