Abstract
Measuring security controls across multiple layers of defense requires realistic data sets and repeatable experiments. However, data sets that are collected from real users often cannot be freely exchanged due to privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or at best been one time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed for data to be easily recreated in order to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged on four layers: network, file accesses, system calls, and database simultaneously. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare performance even across layers, compare synthetic data to real production data, and calculate combined defense in depth performance of sets of controls.
This work is sponsored in part by Air Force Office of Scientific Research (AFOSR) grant FA9550-12-1-0162 “Designing for Measurable Security” and DARPA grant FA8650-11-C-7190 “Mission-oriented Resilient Clouds.” The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFOSR or DARPA.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Preview
Unable to display preview. Download preview PDF.
References
Sweeney, L.: k-anonymity: A model for protecting privacy. International Journal of Uncertainty, Fuzziness & Knowledge-Based Systems 10(5), 557 (2002), http://ezproxy.cul.columbia.edu/login?url=http://search.ebscohost.com/login.aspx?direct=true&db=bah&AN=8584293&site=ehost-live&scope=site
Dwork, C.: Differential privacy. In: Bugliesi, M., Preneel, B., Sassone, V., Wegener, I. (eds.) ICALP 2006. LNCS, vol. 4052, pp. 1–12. Springer, Heidelberg (2006), http://link.springer.com/chapter/10.1007/11787006_1
Selenium, http://seleniumhq.org
Rapid 7 Open Source Projects Metasploit framework, http://www.metasploit.com/
Shiravi, A., Shiravi, H., Tavallaee, M., Ghorbani, A.A.: Toward developing a systematic approach to generate benchmark datasets for intrusion detection. Computers & Security 31(3), 357–374 (2012), http://dx.doi.org/10.1016/j.cose.2011.12.012
Shaoul, C., Westbury, C.: A reduced redundancy usenet corpus (2005–2011), http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html (2013)
Shaoul, C., Westbury, C.: The westbury lab wikipedia corpus (2010), http://www.psych.ualberta.ca/~westburylab/downloads/westburylab.wikicorp.download.html
Stack exchange data dump, https://archive.org/details/stackexchange
Stack exchange data explorer, http://data.stackexchange.com/
Skull security wiki rockyou password file, https://wiki.skullsecurity.org/Passwords
Wikimedia commons public domain images (2013), http://commons.wikimedia.org/wiki/Category:PD-user
OSVDB, Testlink... arbitrary file upload weakness (2012), http://osvdb.org/show/osvdb/85446
OSVDB, Foxypress plugin for wordpress.. file upload php code execution (2012), http://osvdb.org/show/osvdb/82652
Boggs, N., Hiremagalore, S., Stavrou, A., Stolfo, S.J.: Cross-domain collaborative anomaly detection: So far yet so close. In: Sommer, R., Balzarotti, D., Maier, G. (eds.) RAID 2011. LNCS, vol. 6961, pp. 142–160. Springer, Heidelberg (2011), http://dx.doi.org/10.1007/978-3-642-23644-0_8
Song, Y., Keromytis, A.D., Stolfo, S.J.: Spectrogram: A mixture-of-markov-chains model for anomaly detection in web traffic. In: NDSS 2009: Proceedings of the 16th Annual Network and Distributed System Security Symposium (2009)
Wang, K., Parekh, J.J., Stolfo, S.J.: Anagram: A Content Anomaly Detector Resistant to Mimicry Attack. In: Symposium on Recent Advances in Intrusion Detection, Hamburg, Germany (2006)
Kruegel, C., Vigna, G.: Anomaly Detection of Web-based Attacks. In: ACM Conference on Computer and Communication Security, Washington, D.C (2003)
Cretu, G., Stavrou, A., Locasto, M., Stolfo, S., Keromytis, A.: Casting out demons: Sanitizing training data for anomaly sensors. In: IEEE Symposium on Security and Privacy, SP 2008, pp. 81–95 (May 2008)
Cretu-Ciocarlie, G.F., Stavrou, A., Locasto, M.E., Stolfo, S.J.: Adaptive Anomaly Detection via Self-Calibration and Dynamic Updating. In: Kirda, E., Jha, S., Balzarotti, D. (eds.) RAID 2009. LNCS, vol. 5758, pp. 41–60. Springer, Heidelberg (2009)
Valeur, F., Mutz, D., Vigna, G.: A learning-based approach to the detection of sql attacks. In: Julisch, K., Kruegel, C. (eds.) DIMVA 2005. LNCS, vol. 3548, pp. 123–140. Springer, Heidelberg (2005)
Greensql opensource database firewall, http://www.greensql.net/download-dot-net
Stolfo, S.J., Apap, F., Eskin, E., Heller, K., Honig, A., Svore, K.: A comparative evaluation of two algorithms for windows registry anomaly detection. Journal of Computer Security, 659–693 (2005)
The linux audit daemon, http://linux.die.net/man/8/auditd
Eskin, E.: Anomaly detection over noisy data using learned probability distributions. In: Proceedings of the International Conference on Machine Learning, pp. 255–262. Morgan Kaufmann (2000)
Mutz, D., Valeur, F., Vigna, G., Kruegel, C.: Anomalous system call detection. ACM Trans. Inf. Syst. Secur. 9(1), 61–93 (2006), http://doi.acm.org/10.1145/1127345.1127348
System intrusion analysis & reporting environment(snare) for linux, http://www.intersectalliance.com/projects/snare/ .
Forrest, S., Hofmeyr, S.A., Somayaji, A., Longstaff, T.A.: A sense of self for unix processes. In: Proceedings of the 1996 IEEE Symposium on Security and Privacy, SP 1996. IEEE Computer Society, Washington, DC (1996), http://dl.acm.org/citation.cfm?id=525080.884258
Sequence-based intrusion detection, http://www.cs.unm.edu/~immsec/systemcalls.htm
Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the kdd cup 99 data set. In: Proceedings of the Second IEEE International Conference on Computational Intelligence for Security and Defense Applications, CISDA 2009, pp. 53–58. IEEE Press, Piscataway (2009), http://dl.acm.org/citation.cfm?id=1736481.1736489
The DETER testbed: Overview (August 2004) , http://www.isi.edu/deter/docs/testbed.overview.pdf
Benzel, T., Braden, R., Kim, D., Neuman, C., Joseph, A.D., Sklower, K.: Experience with deter: A testbed for security research. In: TRIDENTCOM. IEEE (2006)
Ingham, K.L., Inoue, H.: Comparing anomaly detection techniques for http. In: Kruegel, C., Lippmann, R., Clark, A. (eds.) RAID 2007. LNCS, vol. 4637, pp. 42–62. Springer, Heidelberg (2007), http://dx.doi.org/10.1007/978-3-540-74320-0_3
Cavusoglu, H., Mishra, B., Raghunathan, S.: A model for evaluating it security investments. Commun. ACM 47(7), 87–92 (2004), http://doi.acm.org/10.1145/1005817.1005828
Boggs, N.G., Stolfo, S.: Aldr: A new metric for measuring effective layering of defenses. In: Fifth Layered Assurance Workshop (LAW 2011), Orlando, Florida, December 5-6 (2011)
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Boggs, N., Zhao, H., Du, S., Stolfo, S.J. (2014). Synthetic Data Generation and Defense in Depth Measurement of Web Applications. In: Stavrou, A., Bos, H., Portokalidis, G. (eds) Research in Attacks, Intrusions and Defenses. RAID 2014. Lecture Notes in Computer Science, vol 8688. Springer, Cham. https://doi.org/10.1007/978-3-319-11379-1_12
Download citation
DOI: https://doi.org/10.1007/978-3-319-11379-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11378-4
Online ISBN: 978-3-319-11379-1
eBook Packages: Computer ScienceComputer Science (R0)