Synthetic Data Generation and Defense in Depth Measurement of Web Applications

  • Nathaniel Boggs
  • Hang Zhao
  • Senyao Du
  • Salvatore J. Stolfo
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8688)


Measuring security controls across multiple layers of defense requires realistic datasets and repeatable experiments. However, datasets collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or have at best been one-time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed so that data can be easily regenerated to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged simultaneously on four layers: network, file accesses, system calls, and database. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare their performance across layers, compare synthetic data to real production data, and calculate the combined defense-in-depth performance of sets of controls.
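
As an illustration of what combined defense-in-depth performance of a set of controls could mean, the following minimal sketch treats an attack as detected when at least one control in the set flags it. This union rule, the control names, and the attack identifiers are assumptions made for the example; they are not taken from the paper and do not reproduce its measurements.

```python
from itertools import combinations


def detection_rate(detected, all_attacks):
    """Fraction of known attacks that appear in the detected set."""
    if not all_attacks:
        raise ValueError("all_attacks must not be empty")
    return len(detected & all_attacks) / len(all_attacks)


def combined_detection_rate(controls, all_attacks):
    """Detection rate of a set of controls under the union assumption:
    an attack counts as detected if any control flags it."""
    union = set().union(*controls.values())
    return detection_rate(union, all_attacks)


def best_subset(controls, all_attacks, k):
    """Return the k control names with the highest combined detection rate,
    together with that rate (first subset wins on ties)."""
    def rate(names):
        chosen = {name: controls[name] for name in names}
        return combined_detection_rate(chosen, all_attacks)

    best = max(combinations(controls, k), key=rate)
    return best, rate(best)


# Hypothetical data: attack IDs and the attacks each control detected.
attacks = {"a1", "a2", "a3", "a4"}
detections = {
    "network_sensor": {"a1", "a2"},
    "syscall_sensor": {"a2", "a3"},
    "db_sensor": {"a3"},
}

overall = combined_detection_rate(detections, attacks)
pair, pair_rate = best_subset(detections, attacks, 2)
print(f"all controls: {overall:.2f}, best pair {pair}: {pair_rate:.2f}")
```

In this toy data the three controls together detect three of the four attacks, and a well-chosen pair does just as well, which is the kind of overlap a cross-layer comparison is meant to expose. A fuller treatment would also have to account for false positives, which this sketch ignores.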


Metrics, Defense in Depth, Web Application Attacks, Measuring Security





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Nathaniel Boggs (1)
  • Hang Zhao (1)
  • Senyao Du (1)
  • Salvatore J. Stolfo (1)

  1. Columbia University, New York, USA
