An Open Architecture for Distributed Malware Collection and Analysis

  • Davide Cavalca
  • Emanuele Goldoni
Conference paper


Honeynets have become an important tool for researchers and network operators. However, the lack of a unified honeynet data model has impeded their effectiveness, resulting in multiple unrelated data sources, each with its own proprietary access method and format. Moreover, deploying and managing a honeynet is time-consuming, and interpreting the collected data is far from trivial. HIVE (Honeynet Infrastructure in Virtualized Environment) is a novel, highly scalable architecture we designed for automated data collection and analysis. Our infrastructure is built on proven FLOSS (Free/Libre and Open Source Software) solutions, which we have extended and integrated with new tools of our own. We use virtualization to ease honeypot management and deployment, combining high-interaction and low-interaction sensors in a common infrastructure. We also address the need for rapid comprehension and detailed data analysis by relying on a relational database system, which provides centralized storage of and access to the collected data while preserving its integrity. This chapter presents our malware data collection architecture, offering some insight into the structure and benefits of a distributed virtualized honeynet and its development. Finally, we present some techniques we integrated in HIVE for the active monitoring of centralized botnets, which allow us to track the evolution of these threats and deploy effective countermeasures in a timely manner.
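To make the idea of centralized, integrity-preserving storage concrete, the following is a minimal sketch and not the HIVE schema itself. It assumes three hypothetical tables (sensor, sample, attack) and uses SQLite through Python's standard sqlite3 module purely so that the example is self-contained; the table names, column names, and helper functions are illustrative assumptions. The point it shows is that declared constraints let the database itself reject inconsistent records, such as an attack reported by an unknown sensor.

```python
import sqlite3

# Hypothetical schema: names and columns are illustrative assumptions,
# not taken from the HIVE database.
SCHEMA = """
CREATE TABLE sensor (
    id          INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    interaction TEXT NOT NULL CHECK (interaction IN ('low', 'high'))
);
CREATE TABLE sample (
    sha256 TEXT PRIMARY KEY
);
CREATE TABLE attack (
    id        INTEGER PRIMARY KEY,
    sensor_id INTEGER NOT NULL REFERENCES sensor(id),
    sample    TEXT REFERENCES sample(sha256),
    source_ip TEXT NOT NULL,
    seen_at   TEXT NOT NULL
);
"""


def open_store(path=":memory:"):
    """Open the database, enable foreign-key enforcement, create the tables."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def record_attack(conn, sensor_id, source_ip, seen_at, sample=None):
    """Store one attack and its optional sample in a single transaction."""
    # 'with conn' commits on success and rolls back if any statement fails,
    # so a rejected attack never leaves an orphaned sample row behind.
    with conn:
        if sample is not None:
            conn.execute(
                "INSERT OR IGNORE INTO sample (sha256) VALUES (?)", (sample,)
            )
        cur = conn.execute(
            "INSERT INTO attack (sensor_id, sample, source_ip, seen_at) "
            "VALUES (?, ?, ?, ?)",
            (sensor_id, sample, source_ip, seen_at),
        )
    return cur.lastrowid
```

With a sensor registered, record_attack stores the event; calling it with a sensor_id that does not exist raises sqlite3.IntegrityError and the whole transaction is rolled back, which is the kind of guarantee a flat collection of per-sensor log files cannot offer.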


Virtual Machine, Database Schema, External Service, Attack Vector, Outgoing Traffic
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. Department of Computer Engineering and Systems Science, University of Pavia, Pavia, Italy
  2. Department of Electronics, University of Pavia, Pavia, Italy
