Synthetic Data Generation and Defense in Depth Measurement of Web Applications

Boggs, Nathaniel; Zhao, Hang; Du, Senyao; Stolfo, Salvatore J.

doi:10.1007/978-3-319-11379-1_12

Part of the book series: Lecture Notes in Computer Science (LNSC, volume 8688)

Included in the following conference series:

International Workshop on Recent Advances in Intrusion Detection

Abstract

Measuring security controls across multiple layers of defense requires realistic datasets and repeatable experiments. However, datasets collected from real users often cannot be freely exchanged because of privacy and regulatory concerns. Synthetic datasets, which can be shared, have in the past had critical flaws or, at best, been one-time collections of data focusing on a single layer or type of data. We present a framework for generating synthetic datasets with normal and attack data for web applications across multiple layers simultaneously. The framework is modular and designed so that data can be easily regenerated in order to vary parameters and allow for inline testing. We build a prototype data generator using the framework to generate nine datasets with data logged simultaneously on four layers: network, file accesses, system calls, and database. We then test nineteen security controls spanning all four layers to determine their sensitivity to dataset changes, compare their performance, even across layers, compare synthetic data to real production data, and calculate the combined defense-in-depth performance of sets of controls.
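
One way to picture a combined defense-in-depth calculation is sketched below. This is an illustrative assumption, not the metric defined in the paper: it treats an attack as detected if at least one control in a set flags it, and it ignores false positives. The control names, attack IDs, and the helper functions `combined_detection_rate` and `best_pair` are all hypothetical.

```python
from itertools import combinations


def combined_detection_rate(detections, controls, total_attacks):
    """Fraction of attacks flagged by at least one control in `controls`.

    `detections` maps a control name to the set of attack IDs it flagged.
    """
    caught = set()
    for name in controls:
        caught |= detections[name]
    return len(caught) / total_attacks


def best_pair(detections, total_attacks):
    """Pair of controls with the highest combined detection rate."""
    return max(
        combinations(sorted(detections), 2),
        key=lambda pair: combined_detection_rate(detections, pair, total_attacks),
    )


# Hypothetical results: attack IDs 1..6 were launched, and each control
# (one per layer) flagged the IDs listed here.
detections = {
    "network_sensor": {1, 2, 3},
    "syscall_sensor": {4, 5},
    "db_firewall": {2, 5},
}
TOTAL_ATTACKS = 6

if __name__ == "__main__":
    for name in sorted(detections):
        rate = combined_detection_rate(detections, [name], TOTAL_ATTACKS)
        print(f"{name}: {rate:.2f}")
    print("best pair:", best_pair(detections, TOTAL_ATTACKS))
```

Under these assumptions, two controls that overlap heavily add little to each other, while controls on different layers that catch different attacks raise the combined rate, which is the intuition behind measuring sets of controls rather than individual ones.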

This work is sponsored in part by Air Force Office of Scientific Research (AFOSR) grant FA9550-12-1-0162 “Designing for Measurable Security” and DARPA grant FA8650-11-C-7190 “Mission-oriented Resilient Clouds.” The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of AFOSR or DARPA.

Author information

Authors and Affiliations

Columbia University, New York, NY, USA
Nathaniel Boggs, Hang Zhao, Senyao Du & Salvatore J. Stolfo

Editor information

Editors and Affiliations

George Mason University, Fairfax, VA, USA
Angelos Stavrou
VU University Amsterdam, The Netherlands
Herbert Bos
Stevens Institute of Technology, NJ, USA
Georgios Portokalidis

About this paper

Cite this paper

Boggs, N., Zhao, H., Du, S., Stolfo, S.J. (2014). Synthetic Data Generation and Defense in Depth Measurement of Web Applications. In: Stavrou, A., Bos, H., Portokalidis, G. (eds) Research in Attacks, Intrusions and Defenses. RAID 2014. Lecture Notes in Computer Science, vol 8688. Springer, Cham. https://doi.org/10.1007/978-3-319-11379-1_12

DOI: https://doi.org/10.1007/978-3-319-11379-1_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11378-4
Online ISBN: 978-3-319-11379-1
eBook Packages: Computer Science, Computer Science (R0)
