Abstract
Privacy preservation is a major concern when data mining techniques are applied to datasets containing personal, sensitive, or confidential information. Data distortion is a critical component of privacy protection in security-related data mining applications, such as data mining-based terrorist analysis systems. We propose a sparsified Singular Value Decomposition (SVD) method for data distortion. We also put forth a few metrics that measure the difference between the distorted dataset and the original dataset, as well as the degree of privacy protection. Our experimental results on synthetic and real-world datasets show that the sparsified SVD method works well in preserving privacy while maintaining the utility of the datasets.
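The abstract does not spell out the construction, so the following is only a minimal sketch of one common reading of "sparsified SVD": truncate the SVD to rank k, then set entries of the truncated factors whose magnitude falls below a threshold to zero before multiplying the factors back together. The function names, the parameters `k` and `eps`, and the relative Frobenius-norm difference used as a distortion measure are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def sparsified_svd(A, k, eps):
    """Illustrative sketch (assumed construction, not the authors' code).

    Keeps the k largest singular triplets of A, zeroes entries of the
    truncated factors with magnitude below eps, and reconstructs a
    distorted matrix of the same shape as A.
    """
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Uk = U[:, :k].copy()      # left singular vectors, truncated to rank k
    sk = s[:k]                # k largest singular values
    Vtk = Vt[:k, :].copy()    # right singular vectors, truncated to rank k
    Uk[np.abs(Uk) < eps] = 0.0    # sparsify: drop small entries
    Vtk[np.abs(Vtk) < eps] = 0.0
    return Uk @ np.diag(sk) @ Vtk

def relative_difference(A, A_dist):
    """Assumed distortion measure: ||A - A_dist||_F / ||A||_F."""
    return np.linalg.norm(A - A_dist, "fro") / np.linalg.norm(A, "fro")

# Synthetic data: 50 records with 8 attributes (hypothetical example).
rng = np.random.default_rng(0)
A = rng.normal(size=(50, 8))
A_dist = sparsified_svd(A, k=4, eps=1e-2)
vd = relative_difference(A, A_dist)
print(f"relative difference: {vd:.3f}")
```

In this sketch, a smaller `k` or a larger `eps` distorts the data more, which is the trade-off between privacy and utility that the abstract refers to.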
Author information
Shuting Xu received her PhD in Computer Science from the University of Kentucky in 2005. Dr. Xu is presently an Assistant Professor in the Department of Computer Information Systems at Virginia State University. Her research interests include data mining and information retrieval, database systems, and parallel and distributed computing.
Jun Zhang received a PhD from The George Washington University in 1997. He is an Associate Professor of Computer Science and Director of the Laboratory for High Performance Scientific Computing & Computer Simulation and the Laboratory for Computational Medical Imaging & Data Analysis at the University of Kentucky. His research interests include computational neuroinformatics, data mining and information retrieval, large scale parallel and scientific computing, numerical simulation, and iterative and preconditioning techniques for large scale matrix computation. Dr. Zhang is an associate editor and on the editorial boards of four international journals in computer simulation and computational mathematics, and is on the program committees of a few international conferences. His research work has been funded by the U.S. National Science Foundation and the Department of Energy. He is a recipient of the U.S. National Science Foundation CAREER Award and several other awards.
Dianwei Han received an M.E. degree from Beijing Institute of Technology, Beijing, China, in 1995. From 1995 to 1998, he worked in a Hitachi company (BHH) in Beijing, China. He received an MS degree from Lamar University, USA, in 2003. He is currently a PhD student in the Department of Computer Science, University of Kentucky, USA. His research interests include data mining and information retrieval, computational medical imaging analysis, and artificial intelligence.
Jie Wang received a master's degree in Industrial Automation from Beijing University of Chemical Technology in 1996. She is currently a PhD student and a member of the Laboratory for High Performance Computing and Computer Simulation in the Department of Computer Science at the University of Kentucky, USA. Her research interests include data mining and knowledge discovery, information filtering and retrieval, inter-organizational collaboration mechanisms, and intelligent e-Technology.
About this article
Cite this article
Xu, S., Zhang, J., Han, D. et al. Singular value decomposition based data distortion strategy for privacy protection. Knowl Inf Syst 10, 383–397 (2006). https://doi.org/10.1007/s10115-006-0001-2
DOI: https://doi.org/10.1007/s10115-006-0001-2