Abstract
Data in epidemiological studies are obtained from several sources, such as questionnaires, medical records, medical devices, or laboratory tests. Data management includes the transfer of such data into a (central) database as well as all subsequent processing activities and quality control. This chapter describes the essential steps in the collection, entry, storage, transport, cleaning, maintenance, and statistical processing of epidemiological data, covering the full path from raw data to the final dataset for statistical analysis.
Notes
1. The definition is simplified. From the perspective of a database administrator, for example, an ORACLE database consists of several files, and it is also possible to store more than one spreadsheet in one R file.
2. http://en.wikipedia.org/wiki/Apollo_11_missing_tapes, last access: July 19, 2012
Copyright information
© 2014 Springer Science+Business Media New York
About this entry
Cite this entry
Pohlabeln, H., Reineke, A., Schill, W. (2014). Data Management in Epidemiology. In: Ahrens, W., Pigeot, I. (eds) Handbook of Epidemiology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-09834-0_48
DOI: https://doi.org/10.1007/978-0-387-09834-0_48
Publisher Name: Springer, New York, NY
Print ISBN: 978-0-387-09833-3
Online ISBN: 978-0-387-09834-0
eBook Packages: Medicine, Reference Module Medicine