The MALICIA dataset: identification and analysis of drive-by download operations

Nappa, Antonio; Rafique, M. Zubair; Caballero, Juan

doi:10.1007/s10207-014-0248-7

The MALICIA dataset: identification and analysis of drive-by download operations

Regular Contribution
Published: 21 June 2014

Volume 14, pages 15–33, (2015)
Cite this article

International Journal of Information Security Aims and scope Submit manuscript

Antonio Nappa^1,2,
M. Zubair Rafique³ &
Juan Caballero¹

1259 Accesses
67 Citations
Explore all metrics

Abstract

Drive-by downloads are the preferred distribution vector for many malware families. In the drive-by ecosystem, many exploit servers run the same exploit kit and it is a challenge understanding whether the exploit server is part of a larger operation. In this paper, we propose a technique to identify exploit servers managed by the same organization. We collect over time how exploit servers are configured, which exploits they use, and what malware they distribute, grouping servers with similar configurations into operations. Our operational analysis reveals that although individual exploit servers have a median lifetime of 16 h, long-lived operations exist that operate for several months. To sustain long-lived operations, miscreants are turning to the cloud, with 60 % of the exploit servers hosted by specialized cloud hosting services. We also observe operations that distribute multiple malware families and that pay-per-install affiliate programs are managing exploit servers for their affiliates to convert traffic into installations. Furthermore, we analyze the exploit polymorphism problem, measuring the repacking rate for different exploit types. To understand how difficult is to takedown exploit servers, we analyze the abuse reporting process and issue abuse reports for 19 long-lived servers. We describe the interaction with ISPs and hosting providers and monitor the result of the report. We find that 61 % of the reports are not even acknowledged. On average, an exploit server still lives for 4.3 days after a report. Finally, we detail the Malicia dataset we have collected and are making available to other researchers.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Driving in the Cloud: An Analysis of Drive-by Download Operations and Abuse Reporting

The Role of Cloud Services in Malicious Software: Trends and Insights

PExy: The Other Side of Exploit Kits

Notes

This practice also applies to other types of abuse such as C&C servers, hosts launching SSH and DoS attacks, and malware-infected machines. However, spam is commonly reported from a receiving mail provider to the sender mail provider and web server compromises are commonly first reported to the webmaster.

References

Allatori java obfuscator. http://www.allatori.com/
Anderson, D.S., Fleizach, C., Savage, S., Voelker, G.M.: Spamscatter: characterizing internet scam hosting infrastructure. In: USENIX Security Symposium, Boston, MA (August 2007)
An overview of exploit packs (update 20) Jan (2014) http://contagiodump.blogspot.com.es/2010/06/overview-of-exploit-packs-update.html
Bailey, M., Oberheide, J., Andersen, J., Mao, Z., Jahanian, F., Nazario, J.: Automated classification and analysis of internet malware. In: International Symposium on Recent Advances in Intrusion Detection, Queensland, Australia (September 2007)
Bayer, U., Comparetti, P.M., Hlauschek, C., Kruegel, C., Kirda, E.: Scalable, behavior-based malware clustering. In: Network and Distributed System Security Symposium, San Diego, CA (February 2009)
Bfk: Passive dns replication. http://www.bfk.de/bfk_dnslogger.html
Caballero, J., Grier, C., Kreibich, C., Paxson, V.: Measuring pay-per-install: the commoditization of malware distribution. In: USENIX Security Symposium, San Francisco, CA (August 2011)
Caida: As Ranking (October 2012). http://as-rank.caida.org
Canali, D., Balzarotti, D., Francillon, A.: The role of web hosting providers in detecting compromised websites. In: International World Wide Web Conference, Rio de Janeiro, Brazil (May 2013)
Cho, C.Y., Caballero, J., Grier, C., Paxson, V., Song, D.: Insights from the inside: A view of botnet management from infiltration. In: USENIX Workshop on Large-Scale Exploits and Emergent Threats. San Jose, CA, April (2010)
Cova, M., Kruegel, C., Vigna, G.: Detection and analysis of drive-by-download attacks and malicious javascript code. In: International World Wide Web Conference, Raleigh, NC (April 2010)
Crocker, D.: Mailbox Names for Common Services, Roles and Functions. RFC 2142 (May 1997)
Curtsinger, C., Livshits, B., Zorn, B., Seifert, C.: Zozzle: low-overhead mostly static javascript malware detection. In: USENIX Security Symposium, San Francisco, CA (August 2011)
Cool exploit kit—a new browser exploit pack. http://malware.dontneedcoffee.com/2012/10/newcoolek.html/
Daigle, L.: Whois Protocol Specification. RFC 3912 (September 2004)
Dunn, J.C.: Well-separated clusters and optimal fuzzy partitions. J. Cybern. 4(1), 95–104 (1974)
Falk, J.: Complaint Feedback Loop Operational Recommendations. RFC 6449 (November 2011)
Falk, J., Kucherawy, M.: Creation and Use of Email Feedback Reports: An Applicability Statement for the Abuse Reporting Format (arf). RFC 6650 (June 2012)
Grier, C., Ballard, L., Caballero, J., Chachra, N., Dietrich, C.J., Levchenko, K., Mavrommatis, P., McCoy, D., Nappa, A., Pitsillidis, A., Provos, N., Rafique, M.Z. Rajab, M.A., Rossow, C., Thomas, K., Paxson, V., Savage, S., Voelker, G.M.: Manufacturing compromise: the emergence of exploit-as-a-service. In: ACM Conference on Computer and Communications Security, Raleigh, NC (October 2012)
Jang, J., Brumley, D., Venkataraman, S.: Bitshred: feature hashing malware for scalable triage and semantic analysis. In: ACM Conference on Computer and Communications Security. Chicago, IL (October 2011)
John, J.P., Moshchuk, A., Gribble, S.D., Krishnamurthy, A.: Studying spamming botnets using Botlab. In: Symposium on Networked System Design and Implementation, Boston, MA (April 2009)
Kaufman, L., Rousseeuw, P.J.: Finding Groups in Data: An Introduction to Cluster Analysis, vol. 4. Wiley, New York (1990)
Book Google Scholar
Krawetz, N.: Average Perceptual Hash (May 2011). http://www.hackerfactor.com/blog/index.php?/archives/432-Looks-Like-It.html
Kreibich, C., Weaver, N., Kanich, C., Cui, W., Paxson, V.: GQ: Practical containment for measuring modern malware systems. In: Internet Measurement Conference, Berlin, Germany (November 2011)
Li, Z., Alrwais, S., Xie, Y., Yu, F., Wang, X.: Finding the linchpins of the dark web: a study on topologically dedicated hosts on malicious web infrastructures. In: IEEE Symposium on Security and Privacy, San Francisco, CA (May 2013)
Love vps http://www.lovevps.com/
Malicia project http://malicia-project.com/
Malware domain list http://malwaredomainlist.com/
Morrison, T.: How Hosting Providers can Battle Fraudulent Sign-ups (October 2012). http://www.spamhaus.org/news/article/687/how-hosting-providers-can-battle-fraudulent-sign-ups
Moshchuk, A., Bragin, T., Gribble, S.D., Levy, H.M.: A crawler-based study of spyware on the web. In: Network and Distributed System Security Symposium, San Diego, CA (February 2006)
New Dutch Notice-and-Take-Down Code Raises Questions (October 2008). http://www.edri.org/book/export/html/1619
Nappa, A., Rafique, M.Z., Caballero, J.: Driving in the cloud: an analysis of drive-by download operations and abuse reporting. In: SIG SIDAR Conference on Detection of Intrusions and Malware & Vulnerability Assessment, Berlin, Germany (July 2013)
Nelms, T., Perdisci, R., Ahamad, M.: Execscent: mining for new c&c domains in live networks with adaptive control protocol templates. In: USENIX Security Symposium, Washington, DC (August 2013)
Perdisci, R., Lee, W., Feamster, N.: Behavioral clustering of http-based malware and signature generation using malicious network traces. In: Symposium on Networked System Design and Implementation, San Jose, CA (April 2010)
Perdisci, R., Vamo, M.U.: Towards a fully automated malware clustering validity analysis. In: Annual Computer Security Applications Conference, Orlando, FL (December 2012)
Polychronakis, M., Mavrommatis, P., Provos, N.: Ghost turns zombie: exploring the life cycle of web-based malware. In: USENIX Workshop on Large-Scale Exploits and Emergent Threats, San Francisco, CA (April 2008)
Provos, N., Mavrommatis, P., Rajab, M.A., Monrose, F.: All your iframes point to us. In: USENIX Security Symposium, San Jose, CA (July 2008)
Provos, N., McNamee, D., Mavrommatis, P., Wang, K., Modadugu, N.: The ghost in the browser: analysis of Web-based malware. In: USENIX Workshop on Hot Topics on Understanding Botnets, Cambridge, UK (April 2007)
Rafique, M.Z., Caballero, J.: Firma: Malware clustering and network signature generation with mixed network behaviors. In: International Symposium on Recent Advances in Intrusion Detection, St. Lucia (October 2013)
Rafique, M.Z., Huygens, C., Caballero, J.: Network Dialog Minimization and Network Dialog Diffing: Two Novel Primitives for Network Security Applications. Technical Report TR-IMDEA-SW-2014-001, IMDEA Software Institute, Madrid, Spain (March 2014). https://software.imdea.org/~juanca/papers/TR-IMDEA-SW-2014-001.pdf
Rieck, K., Holz, T., Willems, C., Düssel, P., Laskov, P.: Learning and classification of malware behavior. In: SIG SIDAR Conference on Detection of Intrusions and Malware & Vulnerability Assessment, Paris, France (July 2008)
Rossow, C., Dietrich, C.J.: Provex: Detecting botnets with encrypted command and control channels. In: SIG SIDAR Conference on Detection of Intrusions and Malware & Vulnerability Assessment, Berlin, Germany (July 2013)
Rossow, C., Dietrich, C.J., Bos, H., Cavallaro, L., van Steen, M., Freiling, F.C., Pohlmann, N.: Sandnet: network traffic analysis of malicious software. In: Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, Salzburg, Austria (April 2011)
Ssdsandbox. http://xml.ssdsandbox.net/dnslookup-dnsdb
Shafranovich, Y., Levine, J., Kucherawy, M.: An Extensible Format for Email Feedback Reports. RFC 5965 (August 2010). Updated by RFC 6650
Shue, C., Kalafut, A.J., Gupta, M.: Abnormally malicious autonomous systems and their internet connectivity. IEEE/ACM Transactions of Networking 20(1), (2012)
Snort http://www.snort.org/
Stone-Gross, B., Christopher, Kruegel, Almeroth, K., Moser, A., Kirda, E.: Fire: Finding rogue networks. In: Annual Computer Security Applications Conference, Honolulu, HI (December 2009)
Suricata http://suricata-ids.org/
The spamhaus project (October 2012) http://www.spamhaus.org/
Urlquery. http://urlquery.net/
Virustotal. http://www.virustotal.com/
Walls, R.J., Levine, B.N., Liberatore, M., Shields, C.: Effective digital forensics research is investigator-centric. In: USENIX Workshop on Hot Topics in Security, San Francisco, CA (August 2011)
Wang, Y.-M., Beck, D., Jiang, X., Roussev, R., Verbowski, C., Chen, S., King, S.: Automated web patrol with strider honeymonkeys: Finding web sites that exploit browser vulnerabilities. In: Network and Distributed System Security Symposium, San Diego, CA (February 2006)
Wepawet. https://wepawet.iseclab.org/
Wyke, J.: The Zeroaccess Botnet: Mining and Fraud for Massive Financial Gain (September 2012). http://www.sophos.com/en-us/why-sophos/our-people/technical-papers/zeroaccess-botnet.asp:x
X-arf: Network abuse reporting 2.0. http://x-arf.org/
Xylitol: Blackhole exploit kits update to v2.0 (September 2011). http://malware.dontneedcoffee.com/2012/09/blackhole2.0.html
Xylitol: Tracking Cyber Crime: Hands Up Affiliate (Ransomware) (December 2011). http://www.xylibox.com/2011/12/tracking-cyber-crime-affiliate.html
Zauner, C.: Implementation and Benchmarking of Perceptual Image Hash Functions. Master’s thesis, Upper Austria University of Applied Sciences, Hagenberg, Austria (July 2010)
Zelix klassmaster heavy duty protection. http://www.zelix.com/klassmaster/
Zhang, J., Seifert, C., Stokes, J. W., Lee, W.: Arrow: Generating signatures to detect drive-by downloads. In: International World Wide Web Conference, Hyderabad, India (April 2011)

Download references

Acknowledgments

The authors would like to thank Chris Grier and Kurt Thomas for their help and the anonymous reviewers for their insightful comments. This work was supported in part by the European Union through the FP7 network of excellence NESSoS (Grant FP7-ICT No. 256980), by the Spanish Government through the StrongSoft project (Grant TIN2012-39391-C04-01) and a Juan de la Cierva Fellowship for Juan Caballero, by the N-Greens CM project, by the Research Fund KU Leuven, and by the Fight against Crime Programme of the European Union (B-CCENTRE). Opinions expressed in this material are those of the authors and do not necessarily reflect the views of the sponsors.

Author information

Authors and Affiliations

IMDEA Software Institute, Madrid, Spain
Antonio Nappa & Juan Caballero
Universidad Politécnica de Madrid, Madrid, Spain
Antonio Nappa
iMinds-DistriNet, KU Leuven, Leuven, Belgium
M. Zubair Rafique

Authors

Antonio Nappa
View author publications
You can also search for this author in PubMed Google Scholar
M. Zubair Rafique
View author publications
You can also search for this author in PubMed Google Scholar
Juan Caballero
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Antonio Nappa.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Nappa, A., Rafique, M.Z. & Caballero, J. The MALICIA dataset: identification and analysis of drive-by download operations. Int. J. Inf. Secur. 14, 15–33 (2015). https://doi.org/10.1007/s10207-014-0248-7

Download citation

Published: 21 June 2014
Issue Date: February 2015
DOI: https://doi.org/10.1007/s10207-014-0248-7

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

The MALICIA dataset: identification and analysis of drive-by download operations

Abstract

Access this article

Similar content being viewed by others

Driving in the Cloud: An Analysis of Drive-by Download Operations and Abuse Reporting

The Role of Cloud Services in Malicious Software: Trends and Insights

PExy: The Other Side of Exploit Kits

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

The MALICIA dataset: identification and analysis of drive-by download operations

Abstract

Access this article

Similar content being viewed by others

Driving in the Cloud: An Analysis of Drive-by Download Operations and Abuse Reporting

The Role of Cloud Services in Malicious Software: Trends and Insights

PExy: The Other Side of Exploit Kits

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation