Privacy-Preserving Statistical Data Analysis on Federated Databases

  • Dan Bogdanov
  • Liina Kamm
  • Sven Laur
  • Pille Pruulmann-Vengerfeldt
  • Riivo Talviste
  • Jan Willemson
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8450)


The quality of empirical statistical studies depends closely on the quality and amount of source data available. However, it is often hard to collect data from several sources due to privacy requirements or a lack of trust. In this paper, we propose a novel way to combine secure multi-party computation technology with federated database systems to preserve privacy in statistical studies that combine and analyse data from multiple databases. We describe an implementation on two real-world platforms: the Sharemind secure multi-party computation platform and the X-Road database federation platform. Our solution enables the privacy-preserving linking and analysis of databases belonging to different institutions. Indeed, a preliminary analysis from the Estonian Data Protection Inspectorate suggests that a correct implementation of our solution ensures that no personally identifiable information is processed in such studies. Therefore, our proposed solution can potentially reduce the costs of conducting statistical studies on shared data.


Keywords: secure multi-party computation, federated database infrastructures, linking sensitive data, privacy-preserving statistical analysis





Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Dan Bogdanov (1)
  • Liina Kamm (1, 2)
  • Sven Laur (2)
  • Pille Pruulmann-Vengerfeldt (3)
  • Riivo Talviste (1, 2)
  • Jan Willemson (1, 4)

  1. Cybernetica, Tallinn, Estonia
  2. Institute of Computer Science, University of Tartu, Tartu, Estonia
  3. Institute of Journalism, Communication and Information Studies, University of Tartu, Tartu, Estonia
  4. ELIKO Competence Centre in Electronics-, Info- and Communication Technologies, Tallinn, Estonia
