Abstract
“Modern epidemiology” has consolidated the direct collection of individual data as the most valued approach for conducting epidemiological research. An essential feature of powerful epidemiological studies (whatever the design: observational, quasi-experimental, or experimental) is a longitudinal structure, in which data are collected over time and measurements can be repeated for each participant. Notably, the amount and variety of individual health data routinely collected from different sources and available in digital media have increased exponentially. This growing amount of data has forced scientific disciplines to confront essential challenges in the operational (data management, infrastructure, training), methodological (new approaches to analyzing and deriving inferences from “big data”), and epistemological (several argue that hypothesis-driven science is outdated and that we now live in a data-driven era) realms. There is no doubt that the use of large administrative databases, in particular when enriched through linkage with other sources of data, is a powerful tool, although still in its infancy, with the potential to bolster medical and epidemiological longitudinal research. Being relatively fast and low cost, it can enable the study of essential research questions that were previously unfeasible for budgetary or ethical reasons, among others.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Cite this article
Barreto, M.L., Rodrigues, L.C. Linkage of Administrative Datasets: Enhancing Longitudinal Epidemiological Studies in the Era of “Big Data”. Curr Epidemiol Rep 5, 317–320 (2018). https://doi.org/10.1007/s40471-018-0177-5
DOI: https://doi.org/10.1007/s40471-018-0177-5