Accurate Estimation of Structural Equation Models with Remote Partitioned Data

Snoke, Joshua; Brick, Timothy; Slavković, Aleksandra

doi:10.1007/978-3-319-45381-1_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9867)

Included in the following conference series: International Conference on Privacy in Statistical Databases

Abstract

This paper focuses on a privacy paradigm in which researchers are given access to remotely carry out analyses on sensitive data stored behind firewalls. We develop and demonstrate a method for accurate estimation of structural equation models (SEMs) for arbitrarily partitioned data. We show that, under a certain set of assumptions, our method for estimation across these partitions achieves results identical to estimation with the full data. We consider two situations: (i) a standard setting with a trusted central server and (ii) a round-robin setting in which none of the parties are fully trusted, and we extend them in two specific ways. First, we formulate our methods specifically for SEMs, which have become increasingly common models in psychology, human development, and the behavioral sciences. Second, our methods work for horizontal, vertical, and complex partitions without needing different routines. In application, this method will serve to increase opportunities for research by allowing SEM estimation without transfer or combination of data. We demonstrate our methods with both simulated and real data examples.

Acknowledgements

This work was supported in part by NSF grants Big Data Social Sciences IGERT DGE-1144860 to Pennsylvania State University, and BCS-0941553 and SES-1534433 to the Department of Statistics, Pennsylvania State University. The work was also in part supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant UL1 TR000127. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH.

Author information

Authors and Affiliations

Department of Statistics, Pennsylvania State University, State College, USA
Joshua Snoke & Aleksandra Slavković
Department of Human Development and Family Studies, Pennsylvania State University, State College, USA
Timothy Brick

Corresponding author

Correspondence to Joshua Snoke.

Editor information

Editors and Affiliations

Universitat Rovira i Virgili, Tarragona, Spain
Josep Domingo-Ferrer
University of Zagreb, Zagreb, Croatia
Mirjana Pejić-Bach

A Appendix

A.1 RAM Algebra

We briefly describe the method we use for defining SEMs and transforming the model parameters into model-implied mean vectors and covariance matrices. These model-implied moments are then used to calculate log-likelihoods iteratively given the data. Optimizing over these matrices is equivalent to optimizing over the model parameters, which yields our estimates.

The SEM path diagram has a one-to-one relationship with the multivariate normal mean vector and covariance matrix of the manifest variables. We construct this relationship using RAM matrix algebra. For this we define five matrices, denoted A, S, F, M, and I. These matrices contain both fixed and free model parameters. The free parameters are to be estimated and change during optimization, while the fixed parameters do not change. In these matrices, free parameters are denoted by a Greek symbol and fixed parameters by a constant number.

Recall the path diagram shown in Fig. 4. For this example model, the RAM algebra proceeds as follows. The A (“asymmetric”) matrix defines all regression parameters, or one-headed arrows, in the path diagram. Its numbers of rows and columns both equal the combined number of latent and manifest variables, with the column designating the path origin and the row designating the destination.

The S (“symmetric”) matrix defines all variance parameters, or two-headed arrows, in the path diagram in the same way as the A matrix.

The F (“filter”) matrix acts as a filter for the manifest variables. Its number of columns equals the combined number of latent and manifest variables, but its number of rows equals only the number of manifest variables. For each manifest variable it has a one on the diagonal.

The M (“mean”) matrix defines the mean parameters, if any, for the latent and manifest variables. These are not always included in path diagrams.

Finally, an I (“identity”) matrix is included, with the numbers of columns and rows both equal to the combined number of latent and manifest variables.
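To make these definitions concrete, consider a hypothetical one-factor model (an assumption for illustration, not the model of Fig. 4) with three manifest variables $x_1, x_2, x_3$ and one latent variable $\eta$, ordered as $(x_1, x_2, x_3, \eta)$. The first loading is fixed to 1 and the latent mean is fixed to 0; the symbols $\lambda_i$, $\theta_i$, $\phi$, and $\nu_i$ are free parameters introduced here only for this sketch. Under these assumptions the matrices would take the form

$$\begin{aligned} A = \begin{pmatrix} 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & \lambda_2 \\ 0 & 0 & 0 & \lambda_3 \\ 0 & 0 & 0 & 0 \end{pmatrix}, \quad S = \begin{pmatrix} \theta_1 & 0 & 0 & 0 \\ 0 & \theta_2 & 0 & 0 \\ 0 & 0 & \theta_3 & 0 \\ 0 & 0 & 0 & \phi \end{pmatrix}, \quad F = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}, \quad M = \begin{pmatrix} \nu_1 \\ \nu_2 \\ \nu_3 \\ 0 \end{pmatrix} \end{aligned}$$

with $I$ the $4 \times 4$ identity matrix. The nonzero entries of A sit in the fourth column because every one-headed arrow originates at $\eta$, and F keeps only the three manifest rows.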

Using these matrices, we obtain the model-implied mean vector ($\mu$) and covariance matrix ($\varSigma$) of the manifest variables for the chosen parameters. The following equations give this crucial relationship.

$$\begin{aligned} \varSigma = F (I - A)^{-1} S \left( (I - A)^{-1} \right)^{T} F^{T} \end{aligned} \tag{10}$$

$$\begin{aligned} \mu = F (I - A)^{-1} M \end{aligned} \tag{11}$$
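Equations (10) and (11) can be sketched numerically as below. This is an illustration under stated assumptions rather than the authors' implementation: the function name `ram_implied_moments`, the one-factor model with three manifest variables and one latent variable, and all numeric parameter values are hypothetical.

```python
import numpy as np

def ram_implied_moments(A, S, F, M):
    """Model-implied covariance (Eq. 10) and mean (Eq. 11) of the manifest variables."""
    identity = np.eye(A.shape[0])
    inv_ia = np.linalg.inv(identity - A)       # (I - A)^{-1}
    sigma = F @ inv_ia @ S @ inv_ia.T @ F.T    # Eq. (10)
    mu = F @ inv_ia @ M                        # Eq. (11)
    return sigma, mu

# Hypothetical model; variable order: x1, x2, x3 (manifest), eta (latent).
A = np.zeros((4, 4))
A[0:3, 3] = [1.0, 0.8, 1.2]                    # loadings: column = origin (eta), row = destination
S = np.diag([0.5, 0.4, 0.3, 2.0])              # three residual variances, then the latent variance
F = np.hstack([np.eye(3), np.zeros((3, 1))])   # keep the three manifest variables, drop eta
M = np.array([[1.0], [2.0], [3.0], [0.5]])     # three intercepts, then the latent mean

sigma, mu = ram_implied_moments(A, S, F, M)
print(sigma)
print(mu.ravel())
```

Because every path in this toy model runs from the latent variable to a manifest variable, the result reduces to the familiar factor-analytic form: loadings times latent variance times loadings transposed, plus the diagonal residual variances.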

Cite this paper

Snoke, J., Brick, T., Slavković, A. (2016). Accurate Estimation of Structural Equation Models with Remote Partitioned Data. In: Domingo-Ferrer, J., Pejić-Bach, M. (eds) Privacy in Statistical Databases. PSD 2016. Lecture Notes in Computer Science, vol 9867. Springer, Cham. https://doi.org/10.1007/978-3-319-45381-1_15

DOI: https://doi.org/10.1007/978-3-319-45381-1_15
Published: 31 August 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-45380-4
Online ISBN: 978-3-319-45381-1
eBook Packages: Computer Science, Computer Science (R0)
