Data Integration between Swedish National Clinical Health Registries and Biobanks Using an Availability System

  • Ola Spjuth
  • Jani Heikkinen
  • Jan-Eric Litton
  • Juni Palmgren
  • Maria Krestyaninova
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8574)


Linking biobank data, such as molecular profiles, with clinical phenotypes is of great importance in epidemiological and predictive studies. A comprehensive overview of the data sources that can be combined to increase the statistical power of a study is a key factor in study design. Clinical data stored in health registries and biobank data from research projects are commonly held in different database systems and governed by separate organizations, which makes integration challenging and hampers biomedical investigations. Here we describe the integration of prostate cancer data from a clinical health registry with data from a biobank, and the provisioning of the integrated data in the SAIL availability system. We demonstrate the implications of using the actual raw data, data transformed into availability data, and availability data that have been subjected to anonymization techniques to reduce the risk of re-identification. Our results show that an availability system such as SAIL, with integrated clinical and biobank data, can be a valuable tool for planning new studies and for finding interesting subsets to investigate further. We also show that an availability system can deliver useful insights even when the data have been anonymized.
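The three levels the abstract compares (raw linked data, availability data, and anonymized availability data) can be illustrated with a minimal sketch. It assumes, for illustration only, that registry and biobank records share a pseudonymized person identifier, that "availability data" means counts of individuals per combination of attributes, and that anonymization is a simple small-cell suppression rule (drop any count below a threshold k), which is related to, but not the same as, k-anonymity on individual-level data. The field names (`pid`, `gleason`, `has_sample`), the threshold, and all functions are hypothetical; this is not the SAIL implementation.

```python
from collections import Counter
from typing import Iterable, Mapping

# Hypothetical record type: one row per person, keyed by field name.
Record = Mapping[str, object]


def link(registry: Iterable[Record], biobank_ids: Iterable[object], key: str = "pid") -> list[dict]:
    """Left-join sketch: flag each registry row with whether a biobank sample exists."""
    ids = set(biobank_ids)
    return [{**row, "has_sample": row[key] in ids} for row in registry]


def availability(records: Iterable[Record], fields: tuple[str, ...]) -> Counter:
    """Aggregate individual-level rows into counts per combination of `fields`."""
    return Counter(tuple(r[f] for f in fields) for r in records)


def suppress_small_cells(counts: Counter, k: int) -> dict:
    """Hypothetical anonymization step: keep only cells with at least k individuals."""
    return {cell: n for cell, n in counts.items() if n >= k}


# Toy, made-up data: a registry extract and the IDs of persons with a biobank sample.
registry = [
    {"pid": 1, "gleason": 6},
    {"pid": 2, "gleason": 6},
    {"pid": 3, "gleason": 7},
    {"pid": 4, "gleason": 6},
    {"pid": 5, "gleason": 9},
]
biobank_ids = [1, 2, 4, 5]

linked = link(registry, biobank_ids)                      # raw linked data
counts = availability(linked, ("gleason", "has_sample"))  # availability data
print(dict(counts))
print(suppress_small_cells(counts, k=3))                  # anonymized availability data
```

With k = 3, only the cell (Gleason 6, sample available) with three individuals survives; the two singleton cells are suppressed. This mirrors the trade-off the abstract describes: anonymization lowers re-identification risk but hides small, possibly interesting subsets.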


Keywords: Data integration · Health registry · Biobanks · Availability system · Anonymization




  1. 1.
    Solomon, D.J., Henry, R.C., Hogan, J.G., Van Amburg, G.H., Taylor, J.: Evaluation and implementation of public health registries. Public Health Rep. 106(2), 142–150 (1991)Google Scholar
  2. 2.
    McCarthy, M.I., Abecasis, G.R., Cardon, L.R., Goldstein, D.B., Little, J., Ioannidis, J.P.A., Hirschhorn, J.N.: Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nat. Rev. Genet. 9(5), 356–369 (2008)CrossRefGoogle Scholar
  3. 3.
    Manolio, T.A.: Genomewide association studies and assessment of the risk of disease. N. Engl. J. Med. 363(2), 166–176 (2010)CrossRefGoogle Scholar
  4. 4.
    Kaiser, J.: Swedish bioscience. working sweden’s population gold mine. Science 293(5539), 2375 (2001)CrossRefGoogle Scholar
  5. 5.
    Fortier, I., Doiron, D., Little, J., Ferretti, V., L’Heureux, F., Stolk, R.P., Knoppers, B.M., Hudson, T.J., Burton, P.R.: Is rigorous retrospective harmonization possible? application of the datashaper approach across 53 large studies. Int. J. Epidemiol. 40(5), 1314–1328 (2011)CrossRefGoogle Scholar
  6. 6.
    Reiter, J.P., Kinney, S.K.: Sharing confidential data for research purposes: a primer. Epidemiology 22(5), 632–635 (2011)CrossRefGoogle Scholar
  7. 7.
    Harris, J.R., Burton, P., Knoppers, B.M., Lindpaintner, K., Bledsoe, M., Brookes, A.J., Budin-Ljøsne, I., Chisholm, R., Cox, D., Deschênes, M., Fortier, I., Hainaut, P., Hewitt, R., Kaye, J., Litton, J.E., Metspalu, A., Ollier, B., Palmer, L.J., Palotie, A., Pasterk, M., Perola, M., Riegman, P.H.J., van Ommen, G.J., Yuille, M., Zatloukal, K.: Toward a roadmap in global biobanking for health. Eur. J. Hum. Genet. 20(11), 1105–1111 (2012)CrossRefGoogle Scholar
  8. 8.
    Dankar, F.K., El Emam, K., Neisa, A., Roffey, T.: Estimating the re-identification risk of clinical data sets. BMC Med. Inform. Decis. Mak. 12, 66 (2012)CrossRefGoogle Scholar
  9. 9.
    Homer, N., Szelinger, S., Redman, M., Duggan, D., Tembe, W., Muehling, J., Pearson, J.V., Stephan, D.A., Nelson, S.F., Craig, D.W.: Resolving individuals contributing trace amounts of dna to highly complex mixtures using high-density snp genotyping microarrays. PLoS Genet. 4(8), e1000167 (2008)Google Scholar
  10. 10.
    Gymrek, M., McGuire, A.L., Golan, D., Halperin, E., Erlich, Y.: Identifying personal genomes by surname inference. Science 339(6117), 321–324 (2013)CrossRefGoogle Scholar
  11. 11.
    El Emam, K., Dankar, F.K.: Protecting privacy using k-anonymity. Journal of the American Medical Informatics Association 15, 627–637 (2008)CrossRefGoogle Scholar
  12. 12.
    Samarati, P., Sweeney, L.: Protecting privacy when disclosing information: k-anonymity and its enforcement through generalization and suppression. Technical report (1998)Google Scholar
  13. 13.
    Avillach, P., Coloma, P.M., Gini, R., Schuemie, M., Mougin, F., Dufour, J.C., Mazzaglia, G., Giaquinto, C., Fornari, C., Herings, R., Molokhia, M., Pedersen, L., Fourrier-Réglat, A., Fieschi, M., Sturkenboom, M., van der Lei, J., Pariente, A., Trifirò, G.: EU-ADR consortium: Harmonization process for the identification of medical events in eight european healthcare databases: the experience from the eu-adr project. J. Am. Med. Inform. Assoc. 20(1), 184–192 (2013)CrossRefGoogle Scholar
  14. 14.
    Wolfson, M., Wallace, S.E., Masca, N., Rowe, G., Sheehan, N.A., Ferretti, V., LaFlamme, P., Tobin, M.D., Macleod, J., Little, J., Fortier, I., Knoppers, B.M., Burton, P.R.: Datashield: resolving a conflict in contemporary bioscience–performing a pooled analysis of individual-level data without sharing the data. Int. J. Epidemiol. 39(5), 1372–1382 (2010)CrossRefGoogle Scholar
  15. 15.
    Gostev, M., Fernandez-Banet, J., Rung, J., Dietrich, J., Prokopenko, I., Ripatti, S., McCarthy, M.I., Brazma, A., Krestyaninova, M.: Sail–a software system for sample and phenotype availability across biobanks and cohorts. Bioinformatics 27(4), 589–591 (2011)CrossRefGoogle Scholar
  16. 16.
    ENGAGE Consortium: Data sharing in large research consortia: experiences and recommendations from engage. Eur. J. Hum. Genet. 22(3), 317–321 (2014)Google Scholar
  17. 17.
    Kuriyama, M., Wang, M.C., Papsidero, L.D., Killian, C.S., Shimano, T., Valenzuela, L., Nishiura, T., Murphy, G.P., Chu, T.M.: Quantitation of prostate-specific antigen in serum by a sensitive enzyme immunoassay. Cancer Research 40(12), 4658–4662 (1980)Google Scholar
  18. 18.
    Milette, F., Larivière, L., Piché, J.: Gleason grading of prostatic biopsies. Am. J. Surg. Pathol. 24(10),1443–1444 (2000)Google Scholar
  19. 19.
  20. 20.
  21. 21.
    Templ, M.: scdMicro: A package for statistical disclosure control in R. ISI (2007)Google Scholar
  22. 22.
    Swedish Cancer Centre: Variable description for the prostate cancer quality regsitry,

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Ola Spjuth (1)
  • Jani Heikkinen (2, 3)
  • Jan-Eric Litton (1)
  • Juni Palmgren (1, 3)
  • Maria Krestyaninova (2, 3, 4)

  1. Department of Medical Epidemiology and Biostatistics and Swedish e-Science Research Center, Karolinska Institutet, Stockholm, Sweden
  2. Uniquer Sarl, Lausanne, Switzerland
  3. Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  4. Eawag, Dübendorf, Switzerland
