Key Point

Pathological findings in esophagogastroduodenoscopy are frequent in healthy individuals without clinical symptoms and need to be considered to assess adverse drug effects in early clinical trials.

1 Introduction

The mode of action of some drugs, such as inhibition of prostaglandin synthesis [e.g. nonsteroidal anti-inflammatory drugs (NSAIDs)], vasoconstriction (e.g. nicotine, noradrenaline), chemical damage (e.g. bisphosphonates) or anticoagulation, is well known to cause gastrointestinal (GI) damage, including severe effects such as ulcers and GI bleeding [1,2,3,4]. The damaging potency depends on the individual drug, even within a class, but also on dose, duration of use [5], route of administration and even preparation. Accordingly, when developing a new medication for human use, the mechanism of action and/or preclinical results may raise concerns with regard to GI safety in humans. Such risks are monitored by targeted safety measures in human trial subjects, especially during First-In-Human (FIH) administration [6,7,8,9]. To directly assess upper GI tract toxicity early in clinical development, esophagogastroduodenoscopy (EGD) is often used in healthy volunteers. Its invasiveness is considered acceptable because the attributable risk is very low [10]. EGD findings considered pathological in such trials are typically summarized using one of the several versions of the Lanza score. Such scoring systems attempt to quantify the severity of any gastrointestinal findings by assigning predefined grades [e.g. 0 (no changes) to 4 (severe changes)] and are applied as surrogate endpoints to assess and compare the adverse effects of drugs on the upper GI tract [11].

In such studies on the gastrointestinal safety of drugs, a screening EGD is conducted and volunteers with any pathological findings are excluded from participation, although the clinical relevance of EGD findings in the absence of clinical symptoms is unclear. However, the baseline frequency and type of any lesions provide important information on the typical gastrointestinal status of clinically healthy subjects. They may serve as an indicator of the probability that such lesions develop spontaneously, and provide a reference both for the assessment of a drug as a potentially causative agent and for defining a finding as indicative of a disease process that may require treatment and/or follow-up. Indeed, EGD has been used as a method to assess gastrointestinal safety in a large number of clinical trials in healthy volunteers [11,12,13,14,15,16,17,18,19]. In these studies, the results for potential adverse treatment effects in eligible subjects were assessed, but data on the respective screening failures were not reported. Several controlled clinical trials reported results of repeated EGD after intake of placebo, but no data on screening failures were reported in these studies either [11, 17, 20, 21]. Evaluation of relevant data on GI status in clinically healthy volunteers before inclusion in a clinical trial therefore provides important additional information on the background variability of such findings. Thus, the main objective of the present evaluation was to document the occurrence of pathological findings of the GI mucosa in healthy volunteers without any clinical symptoms.

Data regarding lifestyle (e.g. consumption of stimulants such as coffee, alcohol and tobacco smoking) and dietary habits are by default recorded for selection of subjects in clinical trials, because these covariates are known to interact with drugs [22,23,24,25]. As some of these habits may also impair gastrointestinal health, the second objective was to assess a possible association of EGD findings with lifestyle (and demographics) in this population.

2 Methods

Three different clinical trials to assess safety, tolerability and pharmacokinetics of a new investigational drug or to specifically assess the GI safety of a new formulation of a known drug compound were conducted in healthy volunteers. The drugs tested were suspected to cause GI lesions; therefore, a baseline EGD was performed to detect any preexisting lesions. Volunteers with any finding considered as clinically relevant in this screening EGD were excluded from participation in the trials. The results of these baseline examinations were used for the present analysis.

2.1 Clinical Trials and Selection of Volunteers

All three trials were approved by the Ethics Committee of the Medical Association of Northrhine-Westfalia (Dec 29, 2008, #2008347; Feb 18, 2011, #2011015; and Feb 07, 2011, #2010443) and were conducted in compliance with the principles outlined in the Declaration of Helsinki as well as the International Conference on Harmonisation Good Clinical Practice [26]. They were registered in the European Clinical Trials Database (EudraCT) under the following numbers: 2008-005251-15, 2010-021371-80 and 2010-024116-34. Each volunteer provided written informed consent before participating in the respective trial. Participants were considered healthy based on standard screening procedures for clinical trials, including medical history and absence of symptoms, physical examination, hematology, clinical chemistry, serology, coagulation, ECG, and vital signs. Specific exclusion criteria included a positive fecal occult blood test, which was conducted in all volunteers, history or current evidence of any clinically relevant gastrointestinal disease including related surgery (except appendectomy and herniotomy), recent intake of any drugs including NSAIDs, and pregnancy. Additionally, volunteers with a vegetarian diet or other unusual dietary habits were excluded from participation in the trials; therefore, all analyzed volunteers consumed a mixed diet. If volunteers were considered eligible according to these screening procedures, a baseline EGD was carried out as the last step of the eligibility assessment.

As an additional safety assessment, a test for Helicobacter pylori antigen in feces was performed in two trials for all volunteers and in the third trial just for those volunteers who were found to be eligible for trial participation after baseline EGD. However, a positive test for H. pylori antigen in feces in the absence of clinically relevant EGD findings was not considered as an exclusion criterion for trial participation.

2.2 Evaluation of Lifestyle

Due to divergent demands of the protocols of these trials, the evaluation of lifestyle, that is, consumption of alcohol, caffeine-containing beverages (coffee, coke etc.) and tobacco was not homogeneous. Consumption of caffeine and smoking habits were recorded in two trials only.

For comparison of these three trials within the present analysis, alcohol consumption was transformed to grams of alcohol per day (10 g = 1 unit of alcohol, i.e. 250 mL of beer, 100 mL of wine, 30–35 mL of spirits). Caffeine consumption was calculated as milliliters of caffeine-containing beverages per day.
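The conversion described above can be sketched as follows. This is a minimal illustration, not the procedure actually used in the trials: the function name and the input format are hypothetical, and because the text gives spirits as a range (30–35 mL per unit), the midpoint of 32.5 mL is an assumption made here.

```python
# Volume (mL) of each beverage corresponding to one unit of alcohol,
# as stated in the text. The spirits value is the midpoint of the
# 30-35 mL range, which is an assumption of this sketch.
ML_PER_UNIT = {"beer": 250.0, "wine": 100.0, "spirits": 32.5}
GRAMS_PER_UNIT = 10.0  # 1 unit of alcohol = 10 g


def alcohol_grams_per_day(daily_ml):
    """Convert reported daily beverage volumes (mL) to grams of alcohol.

    daily_ml: dict mapping beverage type to mL consumed per day
              (hypothetical input format).
    """
    total = 0.0
    for beverage, ml in daily_ml.items():
        units = ml / ML_PER_UNIT[beverage]
        total += units * GRAMS_PER_UNIT
    return total


# Hypothetical volunteer: 500 mL of beer and 100 mL of wine per day,
# i.e. 2 units + 1 unit.
example = alcohol_grams_per_day({"beer": 500, "wine": 100})
```

In this hypothetical example the volunteer would be recorded with 30 g of alcohol per day.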

2.3 Endoscopic Evaluation

Classical EGD [27] was performed in a single center after an overnight fasting period (at least 6 h) using a PENTAX EG-2990i high-definition video gastroscope (PENTAX Europe GmbH, Hamburg, Germany). All EGDs were recorded on DVDs. A xylocaine throat spray (Xylocain® 10 mg/puff; 4–6 puffs) was applied in all trials. In one trial, sedation with propofol was offered as an option, while in the other trials EGD was carried out without sedating medication. The status of the esophageal, gastric and duodenal mucosa was assessed by careful inspection of all mucosal sections. The duodenum was examined up to the pars descendens.

The endoscopists were all experienced specialists in internal medicine with board accreditation in gastroenterology. Directly after the examination, the endoscopist recorded the results separately for each GI section in plain language on trial-specific paper forms. Findings were described by type of observation (redness, erosion, ulcer or other), size, and anatomical location within the GI section. The clinical relevance of each finding (rated by the endoscopist as yes/no) was also recorded immediately. Clinically relevant findings were those that were considered to directly impact therapeutic decisions and prognosis [10]. The necessity of a biopsy and any recommendation of therapy following the examination were also assessed and documented directly afterwards by the endoscopist.

2.4 Data Analysis

All data from volunteers who passed the screening examination and underwent the baseline EGD were included in this analysis. The data recording period spanned from March 2009 to October 2011. Twenty-three volunteers participated in the screening procedures of more than one of these trials. For these volunteers, only the first EGD was included in the current analysis.

The volunteers were classified in one of three groups:

  • ‘No Finding’: EGD resulted in no finding at all; volunteers in this group were considered eligible for participation in the clinical trials

  • ‘NCR’: EGD resulted in a finding that was rated as not clinically relevant (NCR); volunteers in this group were considered eligible for participation in the clinical trials

  • ‘CR’: EGD resulted in at least one clinically relevant (CR) finding, which led to exclusion from trial participation
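The three-way grouping above reduces to a simple rule on the endoscopist's yes/no relevance ratings. The sketch below is illustrative only: the function name and the input representation (one boolean per recorded finding) are assumptions, not part of the trial documentation.

```python
def classify_egd(relevance_flags):
    """Assign a volunteer to 'No Finding', 'NCR' or 'CR'.

    relevance_flags: list with one boolean per recorded EGD finding,
    True if the endoscopist rated that finding as clinically relevant
    (hypothetical data layout).
    """
    if not relevance_flags:
        return "No Finding"   # no finding at all; eligible
    if any(relevance_flags):
        return "CR"           # at least one clinically relevant finding; excluded
    return "NCR"              # findings present, none clinically relevant; eligible


def is_eligible(group):
    """Only the CR group was excluded from trial participation."""
    return group != "CR"
```

Note that a single clinically relevant finding is enough to place a volunteer in the CR group, regardless of how many non-relevant findings were also recorded.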

These three groups were compared with respect to demographic characteristics (age at examination, body height, body weight, BMI, sex) and lifestyle (consumption of alcohol, caffeine-containing beverages and tobacco). Where normal distribution was not rejected by the Kolmogorov–Smirnov test, one-way ANOVA was performed; otherwise, a Kruskal–Wallis H test was used. Post-hoc analysis was carried out with Student's t test and the Mann–Whitney U test, respectively. Pearson’s chi-square test was applied for dichotomous variables. This test was also used to evaluate the results of tests for Helicobacter pylori from two trials with regard to their distribution across the three groups of subjects, as well as by organ and type of finding; post-hoc testing again used the Mann–Whitney U test. The third trial (H. pylori test only in eligible subjects) was not included in this analysis.
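For readers less familiar with the non-parametric branch of this procedure, the Kruskal–Wallis H statistic can be sketched in a few lines. This is a standard-library illustration of the textbook formula, not the SPSS implementation used for the analysis; it omits the tie correction that statistical packages usually apply, and the input values are hypothetical rather than trial data.

```python
from itertools import chain


def average_ranks(values):
    """Return 1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, without tie correction."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    return 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Hypothetical values for three groups (not trial data); H is about 7.2.
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
```

Under the null hypothesis, H is approximately chi-square distributed with the number of groups minus one degrees of freedom, which for the three groups compared here gives two degrees of freedom.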

For all tests, statistical significance was defined as p ≤ 0.05.

For a comparison with published studies, the results of the EGDs were retrospectively graded according to a ‘Lanza score’ based on a 0–4 scale as described in Table 1 [12]. The clinically relevant findings described as ‘Other’ [see Supplemental Table S4 in the electronic supplementary material (ESM) for details] were rated with Lanza score 0, since no applicable category is foreseen for abnormalities other than lesions, hemorrhages, erosions and ulcers in Lanza scores.

Table 1 Lanza score: grading scale for endoscopic results [12]

For all calculations, the statistical software Statistical Product and Service Solutions (SPSS®) version 24 was used (International Business Machines Corp. IBM®, New York, USA). All statistical tests are considered descriptive and do not test hypotheses.

3 Results

3.1 Volunteers

A total of 294 volunteers without any clinical symptoms during standard screening examinations were included in the present analysis. Of these, 37 subjects (12.6%) were female and 257 subjects (87.4%) were male. The majority of the volunteers were of Caucasian origin (n = 279, 95%), eight volunteers were Asian and five were African. For two volunteers, information on ethnic origin was missing. The mean age was 32 ± 8.1 years. Age, body height, body weight as well as BMI are listed in Table 2 and BMI is presented in Fig. 1.

Table 2 Demographic features of trial population and within groups no finding, NCR and CR
Fig. 1 BMI (mean, SD) per group

Two hundred volunteers (out of 294 for which this information was available) declared that they regularly consumed alcohol, and 250 volunteers (out of 290) drank coffee and/or caffeine-containing beverages daily. Smoking was queried in 152 persons and out of these, 39 male volunteers reported smoking up to 10 cigarettes per day (see Table 3).

Table 3 Description of lifestyle of trial population

3.2 EGD Findings

Overall, 55.8% of the baseline EGDs showed no clinically relevant finding (20.4% without any finding at all and 35.4% with at least one finding of no clinical relevance), while 44.2% showed at least one clinically relevant finding (see Table 4).
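As a plausibility check, the reported percentages can be reproduced from group sizes of 60, 104 and 130 volunteers out of 294. These counts are back-calculated here from the rounded percentages and are therefore an assumption of this sketch; they are not quoted from Table 4.

```python
TOTAL = 294  # volunteers included in the analysis (from the text)

# Back-calculated group sizes (assumption, see lead-in).
counts = {"No Finding": 60, "NCR": 104, "CR": 130}

# Percentage of the total per group, rounded to one decimal place.
percent = {group: round(100 * n / TOTAL, 1) for group, n in counts.items()}

# Share of EGDs without any clinically relevant finding.
no_cr_percent = round(100 * (counts["No Finding"] + counts["NCR"]) / TOTAL, 1)
```

With these counts the three groups sum to 294 and the rounded shares match the 20.4%, 35.4%, 44.2% and 55.8% given in the text.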

Table 4 Outcome and frequency of endoscopic ratings within groups ‘no finding’, ‘NCR’ and ‘CR’

The majority of the CR findings (100) were detected in the stomach. In the esophagus and duodenum, 44 and 24 CR findings were recorded, respectively. The most frequent CR finding was erosion: 34 in the esophagus, 87 in the stomach and 19 in the duodenum. Details on findings are shown in Figs. 2 and 3. Details on CR and NCR findings listed as ‘Other’ and per organ are provided in Supplemental Table S4 (see ESM).

Fig. 2 Frequency of findings per organ

Fig. 3 Findings per organ and group

The three groups no finding/NCR/CR showed a statistically significant difference with respect to age (p = 0.027) and BMI (p < 0.001). Volunteers in the group with clinically relevant EGD findings were on average the oldest and heaviest (age 2.8 and 0.6 years and BMI 0.6 and 1.0 kg/m2, respectively; see Table 2 for details). Similar results with regard to age and BMI were found in the sub-group of males (age p = 0.021 and BMI p = 0.001), whereas for the small group of female volunteers no statistically significant relationship was observed.

Regarding age of volunteers and appearance of clinically relevant findings, statistically significant relationships were observed for redness in the esophagus, findings in stomach in general, as well as erosion in the stomach (p = 0.040; p = 0.019; p = 0.027). Analysis of BMI showed a relationship between higher BMI and findings in the esophagus and stomach (p = 0.001; p = 0.005). As per type of finding, just erosions were significantly more frequent (p = 0.002) for higher BMI (see Supplemental Table S2 in the ESM). The differences between the subgroups are listed in Supplemental Table S2a (see ESM).

With regard to lifestyle (i.e. consumption of alcohol, caffeine and tobacco), no statistically significant difference was found between the groups no finding/NCR/CR, either regarding the overall outcome of EGD or for the GI sections assessed separately. The only exception was that clinically relevant erosion in the esophagus was more frequent in subjects who were smokers (χ² = 7.741, p = 0.021). The statistically significant difference was apparent between sub-groups no finding and CR. A summary of results can be found in Supplemental Table S1 (see ESM).
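The reported test statistic and p value for smoking can be cross-checked with a short calculation. The sketch assumes a 3 × 2 contingency table (three groups by smoker yes/no), which gives two degrees of freedom; the table layout is an assumption, since the underlying counts are not given in the text. For two degrees of freedom the chi-square upper-tail probability has the closed form exp(−x/2). The contingency table in the example is hypothetical.

```python
import math


def pearson_chi2(table):
    """Pearson chi-square statistic for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


def chi2_p_value_df2(stat):
    """Upper-tail p value of a chi-square statistic with exactly 2 degrees
    of freedom (closed form; not valid for other degrees of freedom)."""
    return math.exp(-stat / 2)


# Cross-check of the value reported in the text.
p_reported = chi2_p_value_df2(7.741)

# Hypothetical, perfectly balanced 3 x 2 table: the statistic is 0, p is 1.
balanced = [[10, 10], [10, 10], [10, 10]]
stat_balanced = pearson_chi2(balanced)
```

Rounded to three decimals, `p_reported` agrees with the published p = 0.021, which supports the assumption of two degrees of freedom.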

Out of the 294 volunteers, 246 were tested for H. pylori, of whom 151 were included in the statistical analysis of the relationship between H. pylori infection and the appearance of GI abnormalities during EGD.

Testing of H. pylori infection showed no statistically significant difference between the three groups no finding/NCR/CR regarding appearance of clinically relevant findings in the EGD (p = 0.127). By organs, a statistically significant relationship was apparent for the esophagus (p = 0.009) and in the duodenum (p = 0.014).

Further investigations on the kind of finding by organ showed a statistically significant difference for erosions in the esophagus (p = 0.031) and erosions in the duodenum (p = 0.003). For all related findings, the post-hoc analyses showed significant differences between the subgroups No Finding and CR (see Supplemental Table S3b in the ESM).

4 Discussion

Our analysis of EGDs performed in 294 clinically healthy, mainly male subjects showed that upper gastrointestinal tract mucosal lesions, including those assessed as clinically relevant, are frequent and may increase with higher age and/or body mass index.

The current analysis describes the results of EGDs of volunteers for participation in clinical trials without history or current evidence of any clinically relevant GI or other disease. Selection of trial participants during the screening procedure for early phase clinical trials, along with study design, is aimed at a high level of standardization by applying a narrow range of strict selection criteria [8, 9, 28]. Two of the trials were part of an early development program of a new drug, including an FIH trial. For safety reasons, FIH trials are typically conducted in young (maximum 50 years), healthy men [6]. Therefore, the analyzed population is not a representative sample of the average population with regard to sex, age (32.0 ± 8.1 years), BMI (24.0 ± 2.5 kg/m2), or health status. The proportion of positive H. pylori test results in the trial population (19.9%) was similar to that in the German population (20.5%) [29, 30]. Consumption of caffeine-containing beverages was also similar to that in the general population, with about 80% of adult Germans claiming to consume coffee on a daily basis [31]. The consumption of alcohol in the average German population is reported to reach up to 32.4 g/day [32,33,34], which is clearly higher than in the tested volunteers (average 7 g/day). Likewise, only 13.3% of the volunteers were smokers, while 26% of the inhabitants of the European Union claimed to be smokers [35].

Age and BMI were significantly higher in the group with clinically relevant findings in the EGD. As it is known that BMI increases with age [31, 36, 37], it cannot be assessed which of the two could be causal for this observation. Age has been described as a risk factor for pathological findings in the upper GI tract in some studies [38,39,40], but not consistently so [41, 42].

Comparison of subjects with positive and negative H. pylori tests showed statistically significant differences only for the esophagus (p = 0.009) and the duodenum (p = 0.014). Infection with H. pylori is a well-known risk factor for diseases of the stomach and duodenum [30, 39, 42]; nevertheless, in other studies too, about 20% of symptom-free individuals were H. pylori positive [43].

No clear lifestyle effects on EGD results were seen in our study. Caffeine consumption has been a suspected cause of EGD findings in some reports [38, 44, 45], but not in another [46]. A review paper summarized that the relationship is unclear [47]. Also, the relevance of alcohol consumption as a risk factor for EGD findings is considered controversial [40, 42, 44, 48].

The high frequency of pathological EGD findings in our population, with clinically relevant findings in 44.2% of participants, remains surprising.

More than half of the studied population showed no finding or no clinically relevant finding (55.8%), which is considerably lower than shown in previous trials (73% without upper gastrointestinal findings [46], 66.3% in the asymptomatic control group [49], 62% of healthy volunteers with ‘normal’ EGD results [50], 80% of eligible subjects for trial participation after passing the screening EGD [51] or 91.2% of healthy volunteers without erosions during baseline EGD [18]). It has to be considered that in the trials analyzed here, all EGDs were performed using a high-definition video gastroscope, which allowed the endoscopists to detect lesions and small findings that would not be visible using fiberglass endoscopes, which were used in trials reported in the 1980s and 1990s. Fiberglass endoscopy was then the common method but had a lower resolution and thus a lower sensitivity to detect pathological findings [11, 13, 14, 20]. For the above-mentioned results of EGDs in healthy, symptom-free volunteers, no details are provided on the gastroscopic devices used [17, 46, 49, 51], except for one study in asymptomatic healthy volunteers in which the use of a fiberglass gastroscope was described [50].

The reason for the high prevalence of lesions of the mucosa in the upper GI tract of clinically healthy, symptom-free individuals is unclear, but it raises the question of what should be considered a normal mucosa in the GI tract. Since EGD, unlike colonoscopy, is not a preventive medical examination, sufficient data on asymptomatic persons are not available. Furthermore, it was reported that the relationship between subjectively experienced dyspeptic symptoms and the detection of damage to the epithelium of the upper GI tract during EGD (which might lead to gastritis or even erosive gastritis) is unclear [49, 50] or was even considered absent [1]. To account for the frequent observation that patients with clinical symptoms have no EGD findings, the term ‘Endoscopy Negative Reflux Disease’ has been coined, which is diagnosed in a major proportion of patients [52], but essentially reflects the limitations of EGD as a diagnostic tool for less severe upper gastrointestinal pathology.

It may also be questionable whether a ‘clean’ baseline EGD is required as an inclusion criterion for clinical studies assessing the gastrointestinal safety of drugs. In the present evaluation, absence of a clinically relevant finding was stipulated. In other such trials, a ‘normal’ mucosa (i.e. a Lanza score of 0) was often defined as the entry criterion for participation [12]. Application of the Lanza score as an entry criterion in our studies would have resulted in exclusion of another 59 subjects with Lanza scores ≥ 1.

The strength of the current analysis is that EGDs were conducted in the highly regulated set-up of a clinical trial. The performance of the EGDs and the quantitative documentation of the findings followed strictly defined instructions (i.e. a written procedure was prepared and adhered to by the endoscopists). The main limitations include the retrospective evaluation of the data; the mode of selection of participants in early-phase clinical trials, which did not allow us to include a representative sample of an average population into the current analysis; the lack of control for possible rater-specific assessments despite the standardized procedure; and the heterogeneity of covariate documentation according to the respective study protocols.

5 Conclusion

Upper gastrointestinal tract mucosal lesions, including those assessed as clinically relevant, are frequent in clinically healthy individuals, impeding the assessment of causality for both disease and drug effects on gastrointestinal health. Beyond the present cross-sectional study, large longitudinal evaluations of EGD findings in clinically healthy individuals in the absence of drug exposure would be helpful to assess intraindividual variability and time course of respective findings.