Management and Curation of Multi-Dimensional Data in Biobank Studies

Sansome, Gary; Hacker, Alex

doi:10.1007/978-981-15-7666-9_8


Abstract

The development of secure and reliable systems to collect, store, utilise, and share data on study participants plays a critical role in large population health studies. Contemporary prospective biobank studies typically involve hundreds of thousands of participants and collect a wide range of data over an extended period through questionnaires, physical measurements, sample assays, and linkages with external data sources. Careful planning and management of a central data repository are required to ensure the privacy, security, accessibility, flexibility, consistency, and accuracy of the data collected and generated in the study. This chapter outlines some of the key concepts and principles underlying the design and development of data storage infrastructures, database architecture, and management systems in large biobank studies. It also describes practical considerations for each step, from initial data collection from study participants to delivery of research-ready datasets: data import, cleaning, and integration; quality checks, standardisation, and validation; and finally the preparation of datasets for bona fide researchers. The general principles and approaches described should be applicable to a wide variety of population health studies in different settings.


Abbreviations

API:: Application programming interface
CKB:: China Kadoorie Biobank
DAG:: Data access governance
DBMS:: Database management system
ICD:: International classification of diseases
ID:: Identifier
IT:: Information technology
RDBMS:: Relational database management system
SQL:: Structured query language
SOP:: Standard operating procedures
WHO:: World Health Organisation


Author information

Authors and Affiliations

Big Data Institute Building, Nuffield Department of Population Health, Old Road Campus, University of Oxford, Oxford, UK
Gary Sansome & Alex Hacker


Corresponding author

Correspondence to Gary Sansome.

Editor information

Editors and Affiliations

Nuffield Department of Population Health, University of Oxford, Oxford, Oxfordshire, UK
Zhengming Chen


About this chapter

Cite this chapter

Sansome, G., Hacker, A. (2020). Management and Curation of Multi-Dimensional Data in Biobank Studies. In: Chen, Z. (ed.) Population Biobank Studies: A Practical Guide. Springer, Singapore. https://doi.org/10.1007/978-981-15-7666-9_8

DOI: https://doi.org/10.1007/978-981-15-7666-9_8
Published: 10 December 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-7665-2
Online ISBN: 978-981-15-7666-9
eBook Packages: Biomedical and Life Sciences (R0)
