“Secure” Log-Linear and Logistic Regression Analysis of Distributed Databases

  • Stephen E. Fienberg
  • William J. Fulp
  • Aleksandra B. Slavkovic
  • Tracey A. Wrobel
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4302)

Abstract

The machine learning community has focused on confidentiality problems associated with statistical analyses that “integrate” data stored in multiple, distributed databases where there are barriers to simply integrating the databases. This paper discusses various techniques which can be used to perform statistical analysis for categorical data, especially in the form of log-linear analysis and logistic regression over partitioned databases, while limiting confidentiality concerns. We show how ideas from the current literature that focus on “secure” summations and secure regression analysis can be adapted or generalized to the categorical data setting.
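To make the “secure” summation idea mentioned above concrete, the following is a minimal sketch in Python. It is not the authors' implementation. It assumes horizontally partitioned data in which every party holds a contingency table over the same variables, semi-honest parties that do not collude, and at least three participants. The function names (`secure_sum`, `secure_table_sum`) and the example counts are hypothetical.

```python
import random


def secure_sum(local_values, modulus=2**32, rng=None):
    """Simulate ring-based secure summation over a list of private integers.

    Party 0 adds a random mask to its own value and passes the result on.
    Each later party adds its value modulo `modulus`, so it only ever sees
    a masked running total. Party 0 finally subtracts the mask.
    Assumption: the true total is smaller than `modulus`.
    """
    rng = rng or random.SystemRandom()
    mask = rng.randrange(modulus)                 # known only to party 0
    running = (mask + local_values[0]) % modulus
    for value in local_values[1:]:                # pass around the ring
        running = (running + value) % modulus
    return (running - mask) % modulus             # party 0 removes the mask


def secure_table_sum(local_tables, modulus=2**32):
    """Apply secure_sum cell by cell to flattened contingency tables."""
    n_cells = len(local_tables[0])
    return [
        secure_sum([table[i] for table in local_tables], modulus)
        for i in range(n_cells)
    ]


# Hypothetical example: three agencies, each holding a flattened 2x2 table.
agency_tables = [
    [12, 7, 3, 9],
    [5, 11, 8, 2],
    [20, 1, 6, 4],
]
pooled = secure_table_sum(agency_tables)
print(pooled)  # [37, 19, 17, 15]
```

Because maximum likelihood estimates for a log-linear model depend on the data only through marginal totals, pooled counts of this kind are enough to fit the model without any party revealing its own table.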

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Stephen E. Fienberg (1, 2)
  • William J. Fulp (1)
  • Aleksandra B. Slavkovic (3)
  • Tracey A. Wrobel (3)

  1. Department of Statistics, Carnegie Mellon University
  2. CyLab and Machine Learning Department, Carnegie Mellon University
  3. Department of Statistics, Pennsylvania State University
