Abstract
Testing database applications is of great importance. Although various studies have investigated testing techniques for database design, relatively few have explicitly addressed the testing of database applications, which requires a large amount of representative data. Because testing over live production databases is often infeasible due to the high risk of disclosing confidential information or incorrectly updating real data, in this paper we investigate the problem of generating a synthetic database based on a priori knowledge about the production database. Our approach is to fit a general location model using various characteristics (e.g., constraints, statistics, rules) extracted from the production database and then generate synthetic data using the learnt model. As the extracted characteristics may contain information that an attacker could use to derive confidential information, we present a disclosure analysis method based on the cell suppression technique. Our method removes aggregate private information effectively and efficiently during data generation.
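The fit-then-generate pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a simplified general location model with a single continuous attribute: the categorical attributes define cells with multinomial probabilities, and the continuous attribute is normal within each cell with a cell-specific mean and a variance shared across cells. The function names, the `min_cell_count` threshold, and the sample rows are hypothetical. The threshold rule is only a stand-in for cell suppression, which in full form also applies complementary suppression so that hidden cells cannot be recovered from marginal totals.

```python
import random
from collections import defaultdict


def fit_general_location_model(rows, min_cell_count=3):
    """Fit a simplified general location model (illustrative assumption).

    rows: list of (cell, value) pairs, where cell is a tuple of categorical
    attribute values and value is one continuous attribute.
    Returns (cell_probs, cell_means, pooled_variance).
    """
    groups = defaultdict(list)
    for cell, value in rows:
        groups[cell].append(value)

    # Stand-in for primary cell suppression: drop cells with too few records,
    # since statistics over very small cells can expose individuals.
    kept = {c: v for c, v in groups.items() if len(v) >= min_cell_count}
    if not kept:
        raise ValueError("every cell was suppressed")

    n = sum(len(v) for v in kept.values())
    cell_probs = {c: len(v) / n for c, v in kept.items()}
    cell_means = {c: sum(v) / len(v) for c, v in kept.items()}

    # Variance pooled across cells (shared by all cells in this model).
    sse = sum((x - cell_means[c]) ** 2 for c, v in kept.items() for x in v)
    dof = n - len(kept)
    pooled_variance = sse / dof if dof > 0 else 0.0
    return cell_probs, cell_means, pooled_variance


def generate_synthetic(model, n_rows, seed=0):
    """Draw synthetic (cell, value) rows from a fitted model."""
    cell_probs, cell_means, pooled_variance = model
    rng = random.Random(seed)
    cells = list(cell_probs)
    weights = [cell_probs[c] for c in cells]
    sd = pooled_variance ** 0.5
    # First sample the categorical cell, then the continuous value given it.
    return [
        (cell, rng.gauss(cell_means[cell], sd))
        for cell in rng.choices(cells, weights=weights, k=n_rows)
    ]


if __name__ == "__main__":
    # Made-up rows: (gender, state) cells with a salary-like value.
    production = (
        [(("F", "NY"), v) for v in (52.0, 48.0, 50.0, 55.0)]
        + [(("M", "CA"), v) for v in (70.0, 66.0, 74.0)]
        + [(("M", "NY"), 999.0)]  # singleton cell, suppressed before fitting
    )
    model = fit_general_location_model(production)
    for row in generate_synthetic(model, n_rows=5, seed=42):
        print(row)
```

Only the fitted probabilities, means, and variance reach the generator, so no production record is copied into the synthetic output, and cells below the threshold contribute nothing to it.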
This research was supported by USA National Science Foundation Grant CCR-0310974.
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wu, X., Wang, Y., Zheng, Y. (2005). Statistical Database Modeling for Privacy Preserving Database Generation. In: Hacid, MS., Murray, N.V., Raś, Z.W., Tsumoto, S. (eds) Foundations of Intelligent Systems. ISMIS 2005. Lecture Notes in Computer Science, vol 3488. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11425274_40
DOI: https://doi.org/10.1007/11425274_40
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25878-0
Online ISBN: 978-3-540-31949-8
eBook Packages: Computer Science (R0)