Privacy Sensitive Distributed Data Mining from Multi-party Data
Privacy is becoming an increasingly important issue in data mining, particularly in security and counter-terrorism applications where the data is often sensitive. This paper considers the problem of mining privacy-sensitive, distributed, multi-party data. Specifically, it considers computing statistical aggregates such as the correlation matrix from privacy-sensitive data when the owner(s) of the data do not trust the program that computes the aggregates. It presents a brief overview of a random-projection-based technique for computing the correlation matrix from data at a single third-party site as well as from data spread across multiple homogeneous sites.
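The identity that random-projection approaches rely on can be illustrated in a few lines. If a data owner multiplies its m x n data matrix X (columns centered and scaled to unit norm, so X^T X is the correlation matrix) by a secret k x m matrix R with i.i.d. N(0, 1) entries and releases only RX, then E[(RX)^T (RX)] = k X^T X, so an untrusted party can estimate the correlation matrix from RX without seeing individual records. The following is a minimal pure-Python sketch under stated assumptions, not the authors' exact protocol: the Gaussian choice of R, the sizes m and k, the toy data, and the helper names (standardize, random_matrix, project, gram) are all hypothetical. For homogeneous (horizontally partitioned) sites it assumes the columns are standardized with globally agreed means and norms, and that each site projects its own records with its own block R_s, so the summed releases equal [R_1 R_2] X.

```python
import math
import random


def standardize(column):
    """Center a column and scale it to unit Euclidean norm, so the dot
    product of two standardized columns equals their Pearson correlation."""
    mean = sum(column) / len(column)
    centered = [x - mean for x in column]
    norm = math.sqrt(sum(c * c for c in centered))
    return [c / norm for c in centered]


def random_matrix(k, m, rng):
    """k x m matrix of i.i.d. N(0, 1) entries (hypothetical choice), kept secret by its owner."""
    return [[rng.gauss(0.0, 1.0) for _ in range(m)] for _ in range(k)]


def project(R, columns):
    """Compute R @ X column by column; each projected column has length k."""
    return [[sum(r * x for r, x in zip(row, col)) for row in R] for col in columns]


def gram(columns, scale=1.0):
    """Matrix of pairwise column dot products, divided by `scale`."""
    n = len(columns)
    return [[sum(a * b for a, b in zip(columns[i], columns[j])) / scale
             for j in range(n)] for i in range(n)]


rng = random.Random(0)
m, k = 1000, 400  # m records, k < m projected dimensions (hypothetical sizes)

# Toy data: attribute b is correlated with a; attribute c is independent of both.
a = [rng.gauss(0, 1) for _ in range(m)]
b = [x + 0.5 * rng.gauss(0, 1) for x in a]
c = [rng.gauss(0, 1) for _ in range(m)]
X = [standardize(col) for col in (a, b, c)]

exact = gram(X)  # true correlation matrix, computed here only for comparison

# Single site: the owner releases only the k x n matrix R @ X.
R = random_matrix(k, m, rng)
released = project(R, X)
estimate = gram(released, scale=k)  # unbiased: E[(RX)^T (RX)] = k * X^T X

# Homogeneous sites: same attributes, disjoint records (rows split at m // 2).
half = m // 2
R1 = random_matrix(k, half, rng)
R2 = random_matrix(k, m - half, rng)
part1 = project(R1, [col[:half] for col in X])
part2 = project(R2, [col[half:] for col in X])
# The aggregator adds the two releases element-wise: R1 X1 + R2 X2 = [R1 R2] X.
summed = [[u + v for u, v in zip(p1, p2)] for p1, p2 in zip(part1, part2)]
multi_estimate = gram(summed, scale=k)

for row_exact, row_est in zip(exact, estimate):
    print(" ".join(f"{e:+.3f}/{s:+.3f}" for e, s in zip(row_exact, row_est)))
```

For unit-norm columns each estimated entry is an average of k products of jointly Gaussian variables, so its standard deviation is at most sqrt(2/k), roughly 0.07 for k = 400; accuracy improves with k, while keeping k < m means RX cannot simply be inverted even by a party that learned R. The multi-site estimate has the same distribution as the single-site one because placing R1 and R2 side by side yields a single k x m Gaussian matrix.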