Abstract
In epidemiological research, large datasets are essential to reliably capture small variations among comparative groups or detect new unsuspected associations. Although large databases of web-search information, social media, airline traffic and telephone records are already widely used to capture social trends, large databases in medical research are just emerging. With the universal use of electronic medical records underway, vast amounts of health-related information will become available for biomedical research. Accepting such new research tools—based on the analysis of large pre-existing datasets rather than hypothesis-driven, in-depth prospective study—will require a new mindset in clinical research, as data might be 'messy' and only associations, but not causality, can be detected. In spite of such limitations, the utilization of these new resources for medical research harbours great potential for advancing knowledge about digestive diseases.
Contributions
Both authors contributed equally to all aspects of this manuscript.
Ethics declarations
Competing interests
In addition to his academic post, R.M.G. is an employee of Miraca Life Sciences, serving as Chief of Academic Affairs. Miraca Life Sciences possess a private database of pathology records, which R.M.G. manages. A.S. declares no competing interests.
About this article
Cite this article
Genta, R., Sonnenberg, A. Big data in gastroenterology research. Nat Rev Gastroenterol Hepatol 11, 386–390 (2014). https://doi.org/10.1038/nrgastro.2014.18
DOI: https://doi.org/10.1038/nrgastro.2014.18
- Springer Nature Limited