
The MALICIA dataset: identification and analysis of drive-by download operations


Drive-by downloads are the preferred distribution vector for many malware families. In the drive-by ecosystem, many exploit servers run the same exploit kit, and it is challenging to determine whether a given exploit server is part of a larger operation. In this paper, we propose a technique to identify exploit servers managed by the same organization. We collect over time how exploit servers are configured, which exploits they use, and what malware they distribute, and we group servers with similar configurations into operations. Our operational analysis reveals that although individual exploit servers have a median lifetime of 16 h, long-lived operations exist that operate for several months. To sustain long-lived operations, miscreants are turning to the cloud, with 60% of the exploit servers hosted by specialized cloud hosting services. We also observe operations that distribute multiple malware families, and we find that pay-per-install affiliate programs manage exploit servers for their affiliates to convert traffic into installations. Furthermore, we analyze the exploit polymorphism problem, measuring the repacking rate for different exploit types. To understand how difficult it is to take down exploit servers, we analyze the abuse reporting process and issue abuse reports for 19 long-lived servers. We describe the interaction with ISPs and hosting providers and monitor the outcome of each report. We find that 61% of the reports are not even acknowledged. On average, an exploit server remains alive for 4.3 days after a report. Finally, we detail the Malicia dataset we have collected and are making available to other researchers.



  1. This practice also applies to other types of abuse, such as C&C servers, hosts launching SSH and DoS attacks, and malware-infected machines. However, spam is commonly reported by the receiving mail provider to the sending mail provider, and web server compromises are commonly first reported to the webmaster.




The authors would like to thank Chris Grier and Kurt Thomas for their help and the anonymous reviewers for their insightful comments. This work was supported in part by the European Union through the FP7 network of excellence NESSoS (Grant FP7-ICT No. 256980), by the Spanish Government through the StrongSoft project (Grant TIN2012-39391-C04-01) and a Juan de la Cierva Fellowship for Juan Caballero, by the N-Greens CM project, by the Research Fund KU Leuven, and by the Fight against Crime Programme of the European Union (B-CCENTRE). Opinions expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.


Corresponding author

Correspondence to Antonio Nappa.


Cite this article

Nappa, A., Rafique, M.Z. & Caballero, J. The MALICIA dataset: identification and analysis of drive-by download operations. Int. J. Inf. Secur. 14, 15–33 (2015).



  • Drive-by download operations
  • Malicia dataset
  • Malware distribution
  • Cybercrime