Data Management in Epidemiology

  • Reference work entry
Abstract

Data in epidemiological studies are obtained from several sources, such as questionnaires, medical records, medical devices, or laboratory tests. Data management includes the transfer of such data into a (central) database as well as all subsequent processing activities and quality control. This chapter describes the essential steps in the collection, entry, storage, transport, cleaning, maintenance, and statistical processing of epidemiological data, covering the full path from raw data to the final dataset for statistical analysis.

Notes

  1. The definition is simplified. From the perspective of a database administrator, for example, an ORACLE database consists of several files, and it is also possible to store more than one spreadsheet in one R file.

  2. http://en.wikipedia.org/wiki/Apollo_11_missing_tapes, last access: July 19, 2012

References

  • Ahrens W, Merletti F (1998) A standard tool for the analysis of occupational lung cancer in epidemiologic studies. Int J Occup Environ Health 4:236–240

  • Ahrens W, Bammann K, Siani A, Buchecker K, De Henauw S, Iacoviello L, Hebestreit A, Krogh V, Lissner L, Mårild S, Molnár D, Moreno LA, Pitsiladis YP, Reisch L, Tornaritis M, Veidebaum T, Pigeot I; IDEFICS Consortium (2011) The IDEFICS cohort: design, characteristics and participation in the baseline survey. Int J Obes 35(Suppl 1):S3–15

  • Armitage P, Berry G, Matthews JNS (2002) Statistical methods in medical research, 4th edn. Blackwell, Oxford

  • Boice JD Jr, Morin MM, Glass AG, Friedman GD, Stovall M, Hoover RN, Fraumeni JF Jr (1991) Diagnostic x-ray procedures and risk of leukemia, lymphoma, and multiple myeloma. JAMA 265:1290–1294

  • Breslow NE, Day NE (1980) Statistical methods in cancer research. Volume I – the analysis of case-control studies. IARC Science Publication, Lyon

  • Breslow NE, Day NE (1987) Statistical methods in cancer research. Volume II – the design and analysis of cohort studies. IARC Science Publication, Lyon

  • CDC Centers for Disease Control and Prevention (2011) Epi Info™ 7. http://www.cdc.gov/epiinfo/. Accessed 9 Aug 2012

  • Chang S, Wong S (2005) The role of analysis datasets in successful FDA advisory meetings. http://www.lexjansen.com/pharmasug/2005/fdacompliance/fc06.pdf. Accessed 9 Aug 2012

  • Chin R, Lee B (2008) Principles and practice of clinical trial medicine. Academic, St. Louis

  • CIOMS (2008) International ethical guidelines for epidemiological studies. Council for International Organizations of Medical Sciences (CIOMS) in collaboration with the World Health Organization (WHO), Geneva

  • Cody R (2008) Cody’s data cleaning techniques, 2nd edn. SAS Institute Inc., Cary, NC

  • Dean AG, Sullivan KM, Soe MM (2011) OpenEpi: open source epidemiologic statistics for public health, Version 2.3.1. http://www.openepi.com/OE2.3/Menu/OpenEpiMenu.htm. Accessed 12 July 2012

  • De Guise P (2008) Enterprise systems backup and recovery: a corporate insurance policy. Auerbach Publications, Boston

  • EpiData Association (2010) EpiData Software. http://www.epidata.dk/. Accessed 9 Aug 2012

  • Fitzmaurice G (2008) Missing data: implications for analysis. Nutrition 24:200–202

  • Gassman JJ, Owen WW, Kuntz TE, Martin JP, Amoroso WP (1995) Data quality assurance, monitoring, and reporting. Control Clin Trials 16:104S–136S

  • Gumm HP (1986) Encoding of numbers to detect typing errors. Int J Appl Eng Educ 2:61–65

  • Hartge P (2006) Participation in population studies. Epidemiology 17:252–254

  • Hebestreit A, Ahrens W (2012) Dietary and lifestyle-induced diseases in children: design, examination modules and study population of the baseline survey of the German IDEFICS cohort (in German). Bundesgesundheitsblatt 55:892–899

  • International Labour Office (1968) International standard classification of occupations. International Labour Office Publications, Geneva

  • IEA International Epidemiological Association (2007) Good Epidemiological Practice (GEP) IEA Guidelines for proper conduct in epidemiological research. http://www.iaeweb.org/. Accessed 11 May 2012

  • Kuczmarski RJ, Ogden CL, Grummer-Strawn LM, Flegal KM, Guo SS, Wei R, Mei Z, Curtin LR, Roche AF, Johnson CL (2000) CDC growth charts: United States. Advance data from vital and health statistics. National Center for Health Statistics, Hyattsville

  • Likert R (1932) A technique for the measurement of attitudes. Arch Psychol 140:1–55

  • Little DB, Chapa DA (2003) Implementing backup and recovery: the readiness guide for the enterprise. Wiley, Indianapolis

  • Morton LM, Cahill J, Hartge P (2006) Reporting participation in epidemiologic studies: a survey of practice. Am J Epidemiol 163:197–203

  • Nelson S (2011) Pro data backup and recovery. Apress, New York

  • Olson SH, Voigt LF, Begg CB, Weiss NS (2002) Reporting participation in case-control studies. Epidemiology 13:123–126

  • Osborne JW (2010) Data cleaning basics: best practices in dealing with extreme scores. Newborn Infant Nurs Rev 10:37–43

  • Pohlabeln H, Boffetta P, Ahrens W, Merletti F, Agudo A, Benhamou E, Benhamou S, Brüske-Hohlfeld I, Ferro G, Fortes C, Kreuzer M, Mendes A, Nyberg F, Pershagen G, Saracci R, Schmid G, Siemiatycki J, Simonato L, Whitley E, Wichmann HE, Winck C, Zambon P, Jöckel KH (2000) Occupational risks for lung cancer among nonsmokers. Epidemiology 11:532–538

  • Pohlabeln H, Wild P, Schill W, Ahrens W, Jahn I, Bolm-Audorff U, Jöckel KH (2002) Asbestos fibreyears and lung cancer: a two phase case-control study with expert exposure assessment. Occup Environ Med 59:410–414

  • Porta M (2008) A dictionary of epidemiology. Oxford University Press, New York

  • Preston CW (2007) Backup & recovery: inexpensive backup solutions for open systems. O’Reilly Media, Sebastopol

  • Prud’homme GJ, Canner PL, Cutler JA (1989) Quality assurance and monitoring in the Hypertension Prevention Trial. Control Clin Trials 10:84S–94S

  • Reineke A, Pigeot I, Ahrens W (2014) MODYS – a modular control and documentation system for epidemiological studies. In: Bammann K, Ahrens W (eds) Instruments for a large scale survey in children – the European IDEFICS study: development, scientific rationale, application and practical recommendations. Springer, Heidelberg

  • Sax FL, Charlson ME (1987) Medical patients at high risk for catastrophic deterioration. Crit Care Med 15:510–515

  • Sterne JA, White IR, Carlin JB, Spratt M, Royston P, Kenward MG, Wood AM, Carpenter JR (2009) Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls. BMJ 338:b2393

  • Theobald K, Capan M, Herbold M, Schinzel S, Hundt F (2009) Quality assurance in non-interventional studies. Ger Med Sci (GMS) 7:Doc29

  • Tooth L, Ware R, Bain C, Purdie DM, Dobson A (2005) Quality of reporting of observational longitudinal research. Am J Epidemiol 161:280–288

  • TrueCrypt (2012) Free open-source on-the-fly encryption. http://www.truecrypt.org/. Accessed 12 July 2012

  • Tukey JW (1977) Exploratory data analysis. Addison-Wesley, Reading

  • United Nations Publications (1971) International standard industrial classification of all economic activities (ISIC). Publishing Service United Nations, New York

  • Van den Broeck J, Cunningham SA, Eeckels R, Herbst K (2005) Data cleaning: detecting, diagnosing, and editing data abnormalities. PLoS Med 2:e267

  • van Es GA (1996) Research practice and data management. Neth J Med 48:38–44

  • Vrijheid M, Richardson L, Armstrong BK, Auvinen A, Berg G, Carroll M, Chetrit A, Deltour I, Feychting M, Giles GG, Hours M, Iavarone I, Lagorio S, Lonn S, McBride M, Parent ME, Sadetzki S, Salminen T, Sanchez M, Schlehofer B, Schuz J, Siemiatycki J, Tynes T, Woodward A, Yamaguchi N, Cardis E (2009) Quantifying the impact of selection bias caused by nonparticipation in a case-control study of mobile phone use. Ann Epidemiol 19:33–41

  • Whitney CW, Lind BK, Wahl PW (1998) Quality assurance and quality control in longitudinal studies. Epidemiol Rev 20:71–80

  • Wichmann H-E, Kaaks R, Hoffmann W, Jöckel K-H, Greiser KH, Linseisen J (2012) The national cohort (in German). Bundesgesundheitsblatt 55:781–789. see also: http://www.nationale-kohorte.de. Accessed 9 Aug 2012

  • Williams D (1942) Basic instructions for interviewers. Public Opin Q 6:634–641

  • World Health Organization (2009) International statistical classification of diseases and health related problems. The ICD-10, 2nd edn. World Health Organization, Geneva


Copyright information

© 2014 Springer Science+Business Media New York

About this entry

Cite this entry

Pohlabeln, H., Reineke, A., Schill, W. (2014). Data Management in Epidemiology. In: Ahrens, W., Pigeot, I. (eds) Handbook of Epidemiology. Springer, New York, NY. https://doi.org/10.1007/978-0-387-09834-0_48

  • DOI: https://doi.org/10.1007/978-0-387-09834-0_48

  • Publisher Name: Springer, New York, NY

  • Print ISBN: 978-0-387-09833-3

  • Online ISBN: 978-0-387-09834-0

  • eBook Packages: Medicine, Reference Module Medicine
