Introduction

Despite the increasing prevalence and use of enormous datasets in GI and hepatology research, training in the handling of ‘big data’ appears to be underutilized. As discussed in Part 1 of this two-part series, there is an evident need for comprehensive training in the acquisition, analysis, and comprehension of large datasets, which those interested in pursuing these lines of investigation frequently lack [1]. While expansion of existing training pathways is necessary, multiple opportunities currently exist for those interested in performing big data GI and liver research. Here, we provide an overview of the current options for training in big data use and analysis and suggest potential avenues for attaining this training during undergraduate and graduate medical education. We review these opportunities in the context of the time available, arbitrarily dividing options into those requiring less than 1 year, 1 to 2 years, and 3 or more years (Fig. 1).

Fig. 1

Matching pre- and postdoctoral timeframes to opportunities for training in big data acquisition, analysis, and interpretation. The steep learning curve for proper big data analysis and interpretation can be mitigated by increasing the duration of training. Options to obtain even a rudimentary understanding of these methods are available for those with limited time, i.e., less than 1 year. ABIM American Board of Internal Medicine, BD-STEP Big Data Scientist Training Enhancement Program, NGO non-governmental organization, ME medical education, MSTP Medical Scientist Training Program, PSTP Physician Scientist Training Program. Created with Biorender.com

Existing Options for Trainees Who Wish to Pursue Experience and Develop Expertise in Big Data Analysis

Numerous options and pathways exist for trainees wishing to attain competence in working with or incorporating large datasets in their work (Table 1). Students enrolled in dual MD/PhD or DO/PhD programs (henceforth referred to as “MD/PhD”) can obtain a formal PhD in biostatistics or bioinformatics and rotate in laboratories that focus on big data generation and/or analysis. However, it is important to highlight opportunities, distinct from the MD/PhD route, that allow the larger group of MD-only aspiring physician-investigators to develop the skills needed to harness the research power of big data.

Table 1 Overview of big data research training opportunities available at various stages of medical education and clinical training

MD-only candidates with research-intensive career intentions have different research interests, practice plans, interests in advocacy and administration roles, time constraints, and foreseeable obstacles in career progression than MD/PhD or traditional MD candidates [2]. Support for their career development must be tailored accordingly, particularly in light of the declining percentages of R01-equivalent research grants awarded to MD-only physician-scientists [3], as well as the currently high attrition rate of physicians in academic medicine [4].

One option involving intensive and structured training in research at institutions with big data capabilities is a “year-out” or “2-years-out” training program for medical students, such as those offered by the NIH. Indeed, it is increasingly common for medical students interested in academic careers to take so-called ‘gap years’ to enhance their academic skills. Students typically apply for such training during their second year of medical school and, upon acceptance, take a leave of absence to complete research programs before starting the third year of medical school. The Medical Research Scholars Program at the NIH offers accepted students a year-long research opportunity in a variety of areas, including big data generation and analysis, and additionally provides networking opportunities, training in clinical protocol development, and exposure to emerging technologies. Alternatively, medical students with established research connections via a training grant (e.g., NIH T32 training program) can apply for “year-out” funding through a Medical Student Research Training Supplement offered by some NIH institutes (e.g., NIDDK).

Research support during medical school is also possible through programs like the 2-year Carolyn L. Kuckein Student Research Fellowship and non-governmental organizations (NGOs). Benefits of undertaking a “year-out” research program include focused, distraction-free enrichment of research techniques and skills, which fosters the analytical capabilities needed for impactful GI/liver-directed big data analysis. Nonetheless, these options require financial support for the training as well as for the additional work needed to complete research projects successfully (e.g., time and resources required to satisfy manuscript revisions); the latter may pose substantial obstacles following the medical student’s return to clinical rotations. Another limiting consideration is the substantial debt owed by many students who need to repay undergraduate- and graduate-level loans obtained to pay for their education.

Alternatively, some undergraduate medical education programs offer non-PhD-granting physician-scientist training programs (PSTPs), distinct from residency PSTPs, that provide dedicated time to gain research expertise in managing big datasets during undergraduate medical education. For example, the University of Pittsburgh School of Medicine offers a 5-year PSTP for aspiring MD-only clinical researchers, which aims to produce physician-scientist preclinical and translational researchers through career mentorship, PSTP-designed research courses, and one dedicated research year supported by institutional funding. The program’s success is well documented; students and alumni of the PSTP had higher measures of research productivity and funding acquisition compared to classmates in the traditional MD-granting program [5]. At the Stanford University School of Medicine, all students are required to complete a “scholarly concentration” in an area of interest, traditionally spanning all years of medical school under the mentorship of a topic expert and culminating in a publishable product. To dedicate time to research, most students commit an additional year to medical school training. Likewise, at the Duke University School of Medicine, 9 months of the standard curriculum are devoted to an intensive research program requiring completion of a formal thesis.

Requiring a more modest time commitment, some medical school curricula offer “tracks” or “concentrations” in bioinformatics, which provide introductory exposure but are insufficient for dedicated research or training in the use of big data. For example, the scholarly concentration in Biomedical Informatics at the Brown University Warren Alpert Medical School provides foundational curricula through a 3-week summer course and workshops; similar programs exist at several medical schools (UCLA David Geffen, the University of Michigan, the University of Texas Medical Branch at Galveston, and others).

A prominent talent development pathway available at the resident level is the American Board of Internal Medicine (ABIM) Research Pathway, a program available at many top-tier medical research institutions that integrates internal medicine residency and subspecialty fellowship training in GI or hepatology with several years of mentored postdoctoral research. This allows candidates committed to academic research careers to gain valuable experience while “fast-tracking” into their intended subspecialty. During the three dedicated research years afforded by the ABIM Research Pathway for those planning to subspecialize in GI or hepatology, trainees can collaborate with and learn from investigators experienced in big data and -omics methodologies.

Unfortunately, postdoctoral fellowship options catering specifically to physicians who recently finished GI fellowship training are scarce. General postdoctoral fellowships aimed at big data training tend to accept applicants who already possess substantial experience in computer science and/or bioinformatics. For example, the NIH offers Data Science Fellowship opportunities through the National Cancer Institute (NCI) and the National Institute of Allergy and Infectious Diseases (NIAID), each of which provides training and hands-on experience in managing big datasets. The Big Data Scientist Training Enhancement Program (BD-STEP) is another valuable opportunity provided by NCI in partnership with the Veterans Health Administration to train postdoctoral fellows in data science processing while addressing patient health challenges, but this is currently available only to MD/PhDs or PhDs with relevant experience.

An additional complementary approach is to incorporate big data analysis into the medical school curriculum and subsequent fellowship training by means of expert presentations at divisional Grand Rounds and at workshops and conferences sponsored by the American Gastroenterological Association (AGA, e.g., the AGA Academic Skills Workshop). At a minimum, the information offered must cover big data terminology, applications in healthcare and disease surveillance, available open-source big data platforms and utilities, and a review of recent big data research in the field.

Conclusions and Future Directions

Although there is much room to improve training opportunities, several avenues and pathways currently exist that can and should be leveraged by those interested in incorporating big data analysis into their work. These include tracks that may be pursued at the undergraduate medical level, such as pursuing an MD/PhD in bioinformatics/statistics, enrolling in a research-intensive non-PhD program or PSTP, and taking advantage of gap-year opportunities, such as the Medical Student Research Training Supplement or NIH Medical Research Scholars Program. During post-graduate clinical training, trainees can pursue the ABIM Research Pathway or obtain a master’s-level degree in bioinformatics or clinical research or a more time-intensive PhD degree. MA, the lead author of the current manuscript, has witnessed firsthand how inadequate physician training in big data slows the advancement of medicine. She is currently leveraging the MSTP at the University of Maryland School of Medicine to complete the requirements for a PhD degree with a strong emphasis on multi-omics integration and interpretation.

Undoubtedly, including such instruction will contribute to the progressive expansion of the duration of scientific training; the average junior scientist can expect to receive initial independent research funding (NIH R01 award or equivalent) no earlier than in their mid-forties [6]. To maximize return on this financial and time investment, this increased duration of training must, by necessity, result in a proportional increase in productivity and professional lifespan. Indeed, the former concept of retirement at or about age 65 years has been laid to rest by the observation that many in the scientific workforce, taking advantage of advances in human health and functional lifespan, work well past age 70 years. Rather than regard an aging scientific workforce as competitors for funding and research positions that might otherwise go to junior scientists, we urge our colleagues to view this as an opportunity to take the time to both broaden and deepen training in big data research and, therefore, expand our scientific workforce and capabilities to add important new knowledge while at the same time bolstering research rigor and reproducibility.

Key Messages

  • Training GI and liver investigators in the appropriate application of big data in their work has become increasingly important, if not vital, for a successful academic career.

  • Several training pathways that vary in intensity and duration are available for those wishing to acquire big data research skills.