Accurate Estimation of Structural Equation Models with Remote Partitioned Data

  • Joshua Snoke
  • Timothy Brick
  • Aleksandra Slavković
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9867)

Abstract

This paper focuses on a privacy paradigm in which researchers are given remote access to carry out analyses on sensitive data stored behind firewalls. We develop and demonstrate a method for accurate estimation of structural equation models (SEMs) for arbitrarily partitioned data. We show that, under a certain set of assumptions, our method for estimation across these partitions yields results identical to those from estimation with the full data. We consider two situations, (i) a standard setting with a trusted central server and (ii) a round-robin setting in which none of the parties is fully trusted, and extend them in two specific ways. First, we formulate our methods specifically for SEMs, which have become increasingly common models in psychology, human development, and the behavioral sciences. Second, our methods work for horizontal, vertical, and complex partitions without needing different routines. In application, this method will increase opportunities for research by allowing SEM estimation without transfer or combination of data. We demonstrate our methods with both simulated and real data examples.
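
As a rough illustration of why partitioned estimation can match full-data estimation, the sketch below covers only the simplest case: a horizontal partition, where each site holds different rows of the same variables. It is not the authors' procedure, and it does not address vertical or complex partitions, noise addition, or the round-robin setting. The function name `mvn_loglik`, the three-site split, and the parameter values are all assumptions made for the example. Because the multivariate normal log-likelihood is a sum over independent rows, the per-site values add up to the full-data value for any candidate mean and covariance.

```python
import numpy as np

def mvn_loglik(X, mu, sigma):
    """Sum of multivariate normal log-densities over the rows of X."""
    n, p = X.shape
    diff = X - mu
    _, logdet = np.linalg.slogdet(sigma)
    # Sum of Mahalanobis terms, computed with a linear solve
    # instead of forming the explicit inverse of sigma.
    quad = np.einsum("ij,ij->", diff, np.linalg.solve(sigma, diff.T).T)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

# Hypothetical candidate parameters (illustrative values only).
mu = np.array([0.0, 1.0, -1.0])
sigma = np.array([[1.0, 0.3, 0.2],
                  [0.3, 1.0, 0.4],
                  [0.2, 0.4, 1.0]])

# Simulated data standing in for the pooled data set.
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mu, sigma, size=300)

# Horizontal partition: three sites hold disjoint sets of rows.
sites = np.array_split(X, 3)

# Each site evaluates the log-likelihood on its own rows only;
# a coordinator would see just these scalars, never the rows.
site_logliks = [mvn_loglik(part, mu, sigma) for part in sites]
total = sum(site_logliks)

# Reference value computed on the pooled data.
full = mvn_loglik(X, mu, sigma)
print(np.isclose(total, full))
```

An optimizer that only ever requests the summed value at each candidate parameter vector would therefore follow the same path as one run on the pooled data, up to floating-point rounding. Sharing exact per-site values may itself disclose information, which this sketch does not attempt to handle.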

Keywords

Statistical disclosure control; Partitioned data; Structural Equation Models; Distributed Maximum Likelihood Estimation

Notes

Acknowledgements

This work was supported in part by NSF grants Big Data Social Sciences IGERT DGE-1144860 to Pennsylvania State University, and BCS-0941553 and SES-1534433 to the Department of Statistics, Pennsylvania State University. The work was also in part supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant UL1 TR000127. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Copyright information

© Springer International Publishing Switzerland 2016

Authors and Affiliations

  • Joshua Snoke (1)
  • Timothy Brick (2)
  • Aleksandra Slavković (1)
  1. Department of Statistics, Pennsylvania State University, State College, USA
  2. Department of Human Development and Family Studies, Pennsylvania State University, State College, USA
