A Hadoop-Based Packet Trace Processing Tool

  • Yeonhee Lee
  • Wonchul Kang
  • Youngseok Lee
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6613)


Internet traffic measurement and analysis has become a significant challenge because large packet trace files captured on fast links cannot easily be handled on a single server with limited computing and memory resources. Hadoop is a popular open-source cloud computing platform that provides a software programming framework called MapReduce and a distributed filesystem, HDFS, both of which are useful for analyzing large data sets. In this paper, we therefore present a Hadoop-based packet processing tool that scales to large data sets by harnessing MapReduce and HDFS. To handle large packet trace files in Hadoop efficiently, we devised a new binary input format, called PcapInputFormat, that hides the complexity of reading binary-formatted packet data and parsing each packet record. We also designed efficient MapReduce job models for traffic analysis, each consisting of map and reduce functions. To evaluate our tool, we compared its computation time with that of a well-known packet-processing tool, CoralReef, and showed that our approach is more affordable for processing a large set of packet data.
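The abstract describes two pieces: an input format that turns a binary pcap trace into per-packet records, and a map/reduce job that aggregates those records. The following is a minimal single-process sketch of that idea in Python, not the authors' PcapInputFormat or their job models. It assumes the classic microsecond-resolution libpcap file layout (24-byte global header, 16-byte per-record header) and Ethernet/IPv4 framing. All function names (`read_pcap_records`, `map_packet`, `reduce_bytes`, `run_job`, and the test helpers `build_pcap`, `ipv4_frame`) are hypothetical. The example job, summing IP bytes per source address, is an illustrative choice. A real Hadoop InputFormat must also find the first complete record when an HDFS split starts in the middle of the file; this sketch simply reads the whole buffer from offset 0.

```python
import struct
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

PCAP_GLOBAL_HEADER_LEN = 24  # classic libpcap file header
PCAP_RECORD_HEADER_LEN = 16  # ts_sec, ts_usec, incl_len, orig_len
ETHERTYPE_IPV4 = 0x0800


def read_pcap_records(data: bytes) -> Iterator[Tuple[float, bytes]]:
    """Record-reader role: yield (timestamp, packet_bytes) per pcap record."""
    magic = data[:4]
    if magic == b"\xd4\xc3\xb2\xa1":
        endian = "<"  # file was written little-endian
    elif magic == b"\xa1\xb2\xc3\xd4":
        endian = ">"  # file was written big-endian
    else:
        raise ValueError("not a microsecond-resolution libpcap file")
    offset = PCAP_GLOBAL_HEADER_LEN
    while offset + PCAP_RECORD_HEADER_LEN <= len(data):
        ts_sec, ts_usec, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, offset)
        offset += PCAP_RECORD_HEADER_LEN
        packet = data[offset:offset + incl_len]
        if len(packet) < incl_len:
            break  # truncated final record: stop rather than emit garbage
        offset += incl_len
        yield ts_sec + ts_usec / 1e6, packet


def map_packet(ts: float, packet: bytes) -> Iterator[Tuple[str, int]]:
    """Hypothetical map(): emit (source IPv4 address, IP total length)."""
    # ts is unused by this particular job but is part of the record.
    if len(packet) < 34:  # 14-byte Ethernet + 20-byte minimal IPv4 header
        return
    (ethertype,) = struct.unpack_from("!H", packet, 12)
    if ethertype != ETHERTYPE_IPV4 or packet[14] >> 4 != 4:
        return
    (ip_total_len,) = struct.unpack_from("!H", packet, 16)
    src = ".".join(str(b) for b in packet[26:30])
    yield src, ip_total_len


def reduce_bytes(key: str, values: List[int]) -> Tuple[str, int]:
    """Hypothetical reduce(): total bytes per source address."""
    return key, sum(values)


def run_job(data: bytes) -> Dict[str, int]:
    """Single-process stand-in for map -> shuffle -> reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for ts, packet in read_pcap_records(data):
        for key, value in map_packet(ts, packet):
            groups[key].append(value)  # the "shuffle": group values by key
    return dict(reduce_bytes(k, v) for k, v in groups.items())


# --- test helpers: build a tiny synthetic trace in memory ---
def ipv4_frame(src: str, dst: str, payload_len: int) -> bytes:
    eth = b"\x00" * 12 + struct.pack("!H", ETHERTYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20 + payload_len, 0, 0, 64, 6, 0,
                     bytes(int(x) for x in src.split(".")),
                     bytes(int(x) for x in dst.split(".")))
    return eth + ip + b"\x00" * payload_len


def build_pcap(packets: List[bytes]) -> bytes:
    # magic, version 2.4, thiszone, sigfigs, snaplen, linktype 1 (Ethernet)
    out = struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
    for i, pkt in enumerate(packets):
        out += struct.pack("<IIII", i, 0, len(pkt), len(pkt)) + pkt
    return out


if __name__ == "__main__":
    trace = build_pcap([
        ipv4_frame("10.0.0.1", "10.0.0.2", 100),
        ipv4_frame("10.0.0.1", "10.0.0.3", 40),
        ipv4_frame("10.0.0.9", "10.0.0.2", 0),
    ])
    print(run_job(trace))  # {'10.0.0.1': 180, '10.0.0.9': 20}
```

Keeping record parsing separate from the map function mirrors the division the abstract describes: the input format hides the binary layout, so map and reduce code only ever sees whole packets.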





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Yeonhee Lee (1)
  • Wonchul Kang (1)
  • Youngseok Lee (1)
  1. Chungnam National University, Daejeon, Republic of Korea
