Skip to main content

Synthetic Data Generation and Defense in Depth Measurement of Web Applications

  • Conference paper
Research in Attacks, Intrusions and Defenses (RAID 2014)

Part of the book series: Lecture Notes in Computer Science ((LNSC,volume 8688))

Included in the following conference series:

Abstract

Measuring security controls across multiple layers of defense requires realistic data sets and repeatable experiments. However, data sets that are collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or at best been one time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed for data to be easily recreated in order to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged on four layers: network, file accesses, system calls, and database simultaneously. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare performance even across layers, compare synthetic data to real production data, and calculate combined defense in depth performance of sets of controls.

This work is sponsored in part by Air Force Office of Scientific Research (AFOSR) grant FA9550-12-1-0162 “Designing for Measurable Security” and DARPA grant FA8650-11-C-7190 “Mission-oriented Resilient Clouds.” The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFOSR or DARPA.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Sweeney, L.: k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems 10(5), 557 (2002), http://ezproxy.cul.columbia.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=bah&AN=8584293&site=ehost-live&scope=site

    Article  MATH  MathSciNet  Google Scholar 

  2. Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006), http://link.springer.com/chapter/10.1007/11787006_1

    Chapter  Google Scholar 

  3. Selenium, http://seleniumhq.org

  4. Rapid 7 Open Source Projects Metasploit framework, http://www.metasploit.com/

  5. Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers & Security 31(3), 357–374 (2012), http://dx.doi.org/10.1016/j.cose.2011.12.012

    Article  Google Scholar 

  6. Shaoul, C., Westbury, C.: A reduced redundancy usenet corpus (2005–2011), http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html (2013)

  7. Shaoul, C., Westbury, C.: The westbury lab wikipedia corpus (2010), http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html

  8. Stack exchange data dump, https://archive.org/details/stackexchange

  9. Stack exchange data explorer, http://data.stackexchange.com/

  10. Skull security wiki rockyou password file, https://wiki.skullsecurity.org/Passwords

  11. Wikimedia commons public domain images (2013), http://commons.wikimedia.org/wiki/Category:PD-user

  12. OSVDB, Testlink... arbitrary file upload weakness (2012), http://osvdb.org/show/osvdb/85446

  13. OSVDB, Foxypress plugin for wordpress.. file upload php code execution (2012), http://osvdb.org/show/osvdb/82652

  14. Boggs, N., Hiremagalore, S., Stavrou, A., Stolfo, S.J.: Cross-domain collaborative anomaly detection: So far yet so close. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 142–160. Springer, Heidelberg (2011), http://dx.doi.org/10.1007/978-3-642-23644-0_8

    Chapter  Google Scholar 

  15. Song, Y., Keromytis, A.D., Stolfo, S.J.: Spectrogram: A mixture-of-markov-chains model for anomaly detection in web traffic. In: NDSS 2009: Proceedings of the 16th Annual Network and Distributed System Security Symposium (2009)

    Google Scholar 

  16. Wang, K., Parekh, J.J., Stolfo, S.J.: Anagram: A Content Anomaly Detector Resistant to Mimicry Attack. In: Symposium on Recent Advances in Intrusion Detection, Hamburg, Germany (2006)

    Google Scholar 

  17. Kruegel, C., Vigna, G.: Anomaly Detection of Web-based Attacks. In: ACM Conference on Computer and Communication Security, Washington, D.C (2003)

    Google Scholar 

  18. Cretu, G., Stavrou, A., Locasto, M., Stolfo, S., Keromytis, A.: Casting out demons: Sanitizing training data for anomaly sensors. In: IEEE Symposium on Security and Privacy, SP 2008, pp. 81–95 (May 2008)

    Google Scholar 

  19. Cretu-Ciocarlie, G.F., Stavrou, A., Locasto, M.E., Stolfo, S.J.: Adaptive Anomaly Detection via Self-Calibration and Dynamic Updating. In: Kirda, E., Jha, S., Balzarotti, D. (eds.) RAID 2009. LNCS, vol. 5758, pp. 41–60. Springer, Heidelberg (2009)

    Chapter  Google Scholar 

  20. Valeur, F., Mutz, D., Vigna, G.: A learning-based approach to the detection of sql attacks. In: Julisch, K., Kruegel, C. (eds.) DIMVA 2005. LNCS, vol. 3548, pp. 123–140. Springer, Heidelberg (2005)

    Chapter  Google Scholar 

  21. Greensql opensource database firewall, http://www.greensql.net/download-dot-net

  22. Stolfo, S.J., Apap, F., Eskin, E., Heller, K., Honig, A., Svore, K.: A comparative evaluation of two algorithms for windows registry anomaly detection. Journal of Computer Security, 659–693 (2005)

    Google Scholar 

  23. The linux audit daemon, http://linux.die.net/man/8/auditd

  24. Eskin, E.: Anomaly detection over noisy data using learned probability distributions. In: Proceedings of the International Conference on Machine Learning, pp. 255–262. Morgan Kaufmann (2000)

    Google Scholar 

  25. Mutz, D., Valeur, F., Vigna, G., Kruegel, C.: Anomalous system call detection. ACM Trans. Inf. Syst. Secur. 9(1), 61–93 (2006), http://doi.acm.org/10.1145/1127345.1127348

    Article  Google Scholar 

  26. System intrusion analysis & reporting environment(snare) for linux, http://www.intersectalliance.com/projects/snare/ .

  27. Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A sense of self for unix processes. In: Proceedings of the 1996 IEEE Symposium on Security and Privacy, SP 1996. IEEE Computer Society, Washington, DC (1996), http://dl.acm.org/citation.cfm?id=525080.884258

  28. Sequence-based intrusion detection, http://www.cs.unm.edu/~immsec/systemcalls.htm

  29. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the kdd cup 99 data set. In: Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA 2009, pp. 53–58. IEEE Press, Piscataway (2009), http://dl.acm.org/citation.cfm?id=1736481.1736489

    Google Scholar 

  30. The DETER testbed: Overview (August 2004) , http://www.isi.edu/deter/docs/testbed.overview.pdf

  31. Benzel, T., Braden, R., Kim, D., Neuman, C., Joseph, A.D., Sklower, K.: Experience with deter: A testbed for security research. In: TRIDENTCOM. IEEE (2006)

    Google Scholar 

  32. Ingham, K.L., Inoue, H.: Comparing anomaly detection techniques for http. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 42–62. Springer, Heidelberg (2007), http://dx.doi.org/10.1007/978-3-540-74320-0_3

    Chapter  Google Scholar 

  33. Cavusoglu, H., Mishra, B., Raghunathan, S.: A model for evaluating it security investments. Commun. ACM 47(7), 87–92 (2004), http://doi.acm.org/10.1145/1005817.1005828

    Article  Google Scholar 

  34. Boggs, N.G., Stolfo, S.: Aldr: A new metric for measuring effective layering of defenses. In: Fifth Layered Assurance Workshop (LAW 2011), Orlando, Florida, December 5-6 (2011)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Boggs, N., Zhao, H., Du, S., Stolfo, S.J. (2014). Synthetic Data Generation and Defense in Depth Measurement of Web Applications. In: Stavrou, A., Bos, H., Portokalidis, G. (eds) Research in Attacks, Intrusions and Defenses. RAID 2014. Lecture Notes in Computer Science, vol 8688. Springer, Cham. https://doi.org/10.1007/978-3-319-11379-1_12

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-11379-1_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-11378-4

  • Online ISBN: 978-3-319-11379-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics