Introduction

Chronic gastrointestinal (GI) disorders and their associated symptoms, such as abdominal pain, bloating, nausea, and changes in bowel habits, pose a significant health burden [1]. Conditions such as irritable bowel syndrome (IBS), small intestinal bacterial overgrowth (SIBO), and functional dyspepsia (FD) collectively affect over 40% of the global population [2]. The origins of these chronic symptoms are complex and multifaceted, but emerging evidence suggests that gut microbiome dysbiosis plays an important role [3, 4].

Interindividual variation in gastrointestinal microbial composition means that a “healthy” microbiome is difficult to characterize [5]. Despite this, differences between the microbiomes of people with chronic GI symptoms and healthy subjects have been demonstrated [6, 7], and some of the bacterial genera that differentiate these groups also play a role in carbohydrate metabolism [8]. Intestinal fermentation of dietary carbohydrates produces gases, including hydrogen (H2) and methane (CH4), and the accumulation of these gases is thought to contribute to certain GI symptoms. Elevated levels of exhaled H2 have been observed in symptomatic cohorts compared with healthy controls both in the fasting state [9] and in response to diet [10]. Breath H2 is therefore considered a useful biomarker in the study of GI disorders.

Given the link between GI symptoms and an imbalance in gut microflora, some of the most effective treatments are those that target the microbiome. Antibiotic therapies have been shown to alter both the gut microbiota [11, 12] and the breath gas profiles [13,14,15] of GI patients. It is therefore reasonable to hypothesize that the postprandial breath gas profile of patients with chronic GI symptoms may be distinguishable from that of healthy controls and may be modulated by antibiotic therapy. Accordingly, in the present study we used a novel app-connected, hand-held breath analyzer (AIRE®, FoodMarble) to compare the postprandial breath H2 of patients before, during, and after antibiotic treatment with that of healthy controls. A technical assessment of the AIRE device was also carried out, including a comparison of its performance with commercially available mail-in breath test kits.

Methods

Technical Assessment

The technical performance and repeatability of the AIRE device were determined by an ISO/IEC 17025:2017-accredited testing and calibration laboratory (GAS Analysis Services, Wicklow, Ireland). The concentrations of certified test gases (hydrogen in air at 3, 10, and 50 ppm) were verified by gas chromatography. The test gases were heated and humidified to replicate human breath (24 °C ≤ temperature ≤ 28 °C, 80% ≤ relative humidity ≤ 95%) and were passed through each AIRE device for 5 s to match real-world operation. Each of three AIRE devices was tested three times with each test gas in sequence, with a 15-min interval between tests to replicate a typical lactulose hydrogen breath test (LHBT) scenario. This protocol was carried out on each of three non-consecutive days.

Study Subjects

The study population consisted of adult patients suspected to have SIBO based on chronic (3 months or more) GI symptoms including nausea, bloating, distention, altered bowel movements, weight loss, or abdominal pain with no evidence of mechanical obstruction. Exclusion criteria included a history of inflammatory bowel disease or antibiotic use within 30 days prior to study commencement. The study was registered with ClinicalTrials.gov (NCT04309396).

Performance of LHBT

Subjects performed simultaneous LHBTs at home using the AIRE device and a commercially available mail-in kit (Commonwealth Diagnostics International, Salem, MA, United States; or Trio-smart, Raleigh, NC, United States). LHBTs were performed in a standardized manner following each manufacturer's instructions. With Commonwealth Diagnostics International mail-in kits, subjects are instructed to exhale into a collection tube for 3–5 s. With Trio-smart mail-in kits, subjects are instructed to exhale for 2 s into a mouthpiece attached to a collection bag; a hole in the mouthpiece allows dead-space air to escape. After 2 s, the subject covers this hole and continues to exhale until the bag is filled (approximately 3 s). Mail-in kit samples are also analyzed for carbon dioxide (CO2) concentration. A reliable breath test result requires an end-tidal breath sample, which should contain approximately 5.5% CO2. If a sample is diluted with dead-space or room air because of improper collection technique, a correction factor is applied to the breath result.

With the AIRE device, the subject is instructed to hold their breath for 3 s before exhaling to allow adequate gas exchange in the lungs; the FoodMarble app displays an on-screen countdown timer to guide them. The subject is then instructed to exhale slowly into the device for 5 s to ensure an end-tidal breath sample. If the subject stops exhaling before 5 s, the FoodMarble app prompts them to retake the breath sample one minute later.
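The text states only that a correction factor is applied to diluted mail-in samples. A convention commonly used in breath testing is to normalize the measured gas to an assumed end-tidal CO2 of 5.5% by multiplying by 5.5 / measured CO2 (%). The sketch below assumes that convention, and additionally assumes that samples at or above 5.5% CO2 are left uncorrected; neither is documented here as the kit vendors' actual procedure, and the function name is hypothetical.

```python
# Sketch of a CO2-based dilution correction for a mail-in breath sample.
# ASSUMPTION: H2 is rescaled by (reference CO2 / measured CO2) using a 5.5%
# end-tidal reference, and samples at or above the reference are left as-is.
# The vendors' actual algorithms are not described in this study, and the
# function name is hypothetical.

END_TIDAL_CO2_PCT = 5.5  # approximate CO2 content of an end-tidal sample


def co2_corrected_h2(h2_ppm: float, co2_pct: float,
                     reference_co2_pct: float = END_TIDAL_CO2_PCT) -> float:
    """Estimate the H2 concentration of an undiluted end-tidal sample.

    Dead-space or room air lowers CO2 and H2 by roughly the same factor,
    so the measured H2 is scaled up by the CO2 shortfall.
    """
    if co2_pct <= 0:
        raise ValueError("CO2 concentration must be positive")
    if co2_pct >= reference_co2_pct:
        return h2_ppm  # treated as an adequate end-tidal sample
    return h2_ppm * reference_co2_pct / co2_pct


if __name__ == "__main__":
    # Hypothetical sample diluted to half the expected CO2: H2 is doubled.
    print(co2_corrected_h2(12.0, 2.75))  # 24.0
```

Scaling by a ratio rather than adding a fixed offset reflects the assumption that dilution reduces CO2 and H2 proportionally.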

In line with the North American Consensus on breath testing in gastrointestinal disorders, a positive LHBT was defined as a rise in breath H2 of ≥ 20 ppm above baseline within 90 min of substrate consumption [16]. Where a breath sample was not recorded at exactly 90 min after substrate consumption on either device, the 90-min value was obtained by linear interpolation.

Subjects were required to follow a diet low in fermentable carbohydrates for the 24 h before the test and then to fast for at least 12 h, during which only water was permitted. On the morning of the LHBT, subjects checked their fasting breath H2 using the AIRE device and app. If fasting breath H2 was elevated (> 15 ppm), they were advised to re-check it at 20-min intervals for up to 1 h; if it remained elevated, they were advised to postpone the test until the following morning. A baseline breath sample was recorded on both the AIRE device and the mail-in kit, and the test substrate (15 mL of lactulose) was then ingested. Breath samples were recorded on both the AIRE device and the mail-in kit every 15 min for at least 2 h. On completion of the LHBT, the FoodMarble app automatically confirmed adherence to the LHBT protocol and notified the subject that the results had been sent to their healthcare provider for review via the FoodMarble online dashboard. Subjects mailed the commercial LHBT kit to the testing laboratory for analysis of the breath samples, and results were returned as a PDF report within 3–10 days.
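The fasting re-check and positivity rules above are simple enough to express directly in code. The sketch below assumes readings are supplied as (minutes after substrate, H2 ppm) pairs with the earliest reading as the baseline; it evaluates every recorded point up to 90 min and, when no reading falls at exactly 90 min, adds a linearly interpolated 90-min value. It is a hypothetical illustration of the stated criteria, not the FoodMarble app's or the laboratories' implementation, and all names are illustrative.

```python
# Sketch of the LHBT rules described above: the fasting re-check and the
# positivity criterion (>= 20 ppm rise over baseline within 90 min), with
# linear interpolation when no sample was taken at exactly 90 min.
# ASSUMPTIONS: readings are (minutes_after_substrate, h2_ppm) pairs, the
# earliest reading is the baseline, and all names are hypothetical.
from typing import List, Sequence, Tuple

Reading = Tuple[float, float]  # (minutes after substrate, H2 in ppm)

FASTING_LIMIT_PPM = 15.0   # fasting H2 above this is "elevated"
MAX_FASTING_CHECKS = 4     # initial check + re-checks at 20, 40, 60 min
POSITIVE_RISE_PPM = 20.0
CUTOFF_MIN = 90.0


def fasting_ok(checks: Sequence[float]) -> bool:
    """True if any fasting reading within the first hour is <= 15 ppm."""
    return any(h <= FASTING_LIMIT_PPM for h in checks[:MAX_FASTING_CHECKS])


def h2_at(readings: Sequence[Reading], t: float) -> float:
    """Linearly interpolate H2 at time t from the recorded readings."""
    pts: List[Reading] = sorted(readings)
    for (t0, h0), (t1, h1) in zip(pts, pts[1:]):
        if t0 <= t <= t1:
            if t1 == t0:
                return h0
            return h0 + (h1 - h0) * (t - t0) / (t1 - t0)
    raise ValueError(f"t = {t} min is outside the recorded interval")


def is_positive_lhbt(readings: Sequence[Reading]) -> bool:
    """True if H2 rises >= 20 ppm above baseline at or before 90 min."""
    pts = sorted(readings)
    baseline = pts[0][1]
    values = [h for t, h in pts if t <= CUTOFF_MIN]
    if all(t != CUTOFF_MIN for t, _ in pts):
        values.append(h2_at(pts, CUTOFF_MIN))  # interpolated 90-min value
    return max(values) - baseline >= POSITIVE_RISE_PPM


if __name__ == "__main__":
    # Hypothetical readings where the 90-min sample slipped to 100 min.
    demo = [(0, 5), (15, 6), (30, 8), (45, 10), (60, 14), (75, 18), (100, 38)]
    print(h2_at(demo, 90))         # 30.0
    print(is_positive_lhbt(demo))  # True: 30.0 - 5 = 25 ppm rise
```

Interpolating only the 90-min value, rather than resampling the whole curve, leaves every actually recorded reading untouched.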

Postprandial Breath H2 Measurement

Twenty-four patients and 10 healthy controls were instructed to follow their habitual diet throughout the study. Both groups used the AIRE device and app to measure exhaled breath H2 before and after the first (morning) and last (evening) meal of each day: a baseline reading was recorded before eating, and further readings were recorded 30, 60, and 90 min after finishing each meal. Patients recorded breath data for 7 days before antibiotic treatment and for 7 days after treatment, with post-treatment data collected 2 weeks after completion of antibiotic therapy. A subset of patients (14/24) also recorded breath data during treatment. Healthy volunteers recorded breath data for 7 days.
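The text does not specify exactly how the daily average breath H2 reported in the Results was aggregated. One plausible reading, assumed here, is the mean of all scheduled readings logged on a given day (baseline plus 30, 60, and 90 min after the morning and evening meals, up to eight readings), then averaged over a subject's recording days. The record layout and function names are hypothetical.

```python
# Sketch: aggregating the meal-based schedule into daily and per-subject
# mean breath H2. ASSUMPTION: "daily average" = mean of all scheduled
# readings logged that day (2 meals x 4 readings), then averaged over days.
# Field layout and names are hypothetical.
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Tuple

# (subject_id, day, meal, minutes_after_meal, h2_ppm); minute 0 = pre-meal baseline
Record = Tuple[str, int, str, int, float]

SCHEDULE_MIN = (0, 30, 60, 90)
MEALS = ("morning", "evening")


def daily_means(records: Iterable[Record]) -> Dict[Tuple[str, int], float]:
    """Mean H2 per (subject, day) over the scheduled readings only."""
    by_day: Dict[Tuple[str, int], List[float]] = defaultdict(list)
    for subject, day, meal, minutes, h2 in records:
        if meal in MEALS and minutes in SCHEDULE_MIN:
            by_day[(subject, day)].append(h2)
    return {key: mean(vals) for key, vals in by_day.items()}


def subject_means(records: Iterable[Record]) -> Dict[str, float]:
    """Average each subject's daily means across the recording period."""
    per_subject: Dict[str, List[float]] = defaultdict(list)
    for (subject, _day), value in daily_means(records).items():
        per_subject[subject].append(value)
    return {s: mean(v) for s, v in per_subject.items()}
```

Group-level figures such as those in Table 4 could then be obtained by averaging subject means within each group, although the exact aggregation used in the study is not described.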

Statistical Analysis

Postprandial breath H2 was compared between groups using one-way analysis of variance (ANOVA), followed by Bonferroni-corrected post hoc tests to identify specific pairwise differences. A p value ≤ 0.05 was considered statistically significant. This analysis was performed using IBM SPSS Statistics version 28.
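The analysis itself was run in SPSS. As a rough, non-authoritative equivalent outside SPSS, the Python sketch below runs a one-way ANOVA with scipy.stats.f_oneway and then Bonferroni-adjusted pairwise comparisons. The pairwise step assumes the common formulation in which each t statistic uses the pooled ANOVA error term and each raw p value is multiplied by the number of comparisons (capped at 1); it has not been checked against SPSS output, and the group names and values are hypothetical.

```python
# One-way ANOVA followed by Bonferroni-adjusted pairwise comparisons.
# Each pairwise t statistic uses the pooled within-group mean square (MSE)
# and error df from the ANOVA; raw p values are multiplied by the number of
# comparisons and capped at 1. Requires NumPy and SciPy.
from itertools import combinations
from typing import Dict, Sequence, Tuple

import numpy as np
from scipy import stats

PairResult = Dict[Tuple[str, str], Tuple[float, float]]


def anova_bonferroni(groups: Dict[str, Sequence[float]]) -> Tuple[float, float, PairResult]:
    """Return (F, p_anova, {(a, b): (mean_a - mean_b, p_bonferroni)})."""
    arrays = {k: np.asarray(v, dtype=float) for k, v in groups.items()}
    f_stat, p_anova = stats.f_oneway(*arrays.values())

    # Pooled within-group variance and its degrees of freedom.
    df_error = sum(a.size for a in arrays.values()) - len(arrays)
    mse = sum(((a - a.mean()) ** 2).sum() for a in arrays.values()) / df_error

    pairs = list(combinations(arrays, 2))
    results: PairResult = {}
    for a, b in pairs:
        xa, xb = arrays[a], arrays[b]
        diff = xa.mean() - xb.mean()
        se = np.sqrt(mse * (1.0 / xa.size + 1.0 / xb.size))
        p_raw = 2.0 * stats.t.sf(abs(diff) / se, df_error)
        results[(a, b)] = (float(diff), float(min(1.0, p_raw * len(pairs))))
    return float(f_stat), float(p_anova), results


if __name__ == "__main__":
    # Hypothetical daily-mean H2 values (ppm), not study data.
    demo = {
        "controls": [15.0, 17.5, 16.0, 18.2, 17.1],
        "pre": [22.4, 26.1, 24.8, 23.9, 25.3],
        "post": [16.5, 15.9, 17.2, 15.4, 16.0],
    }
    f, p, pairwise = anova_bonferroni(demo)
    print(f"F = {f:.2f}, p = {p:.4g}")
    for (a, b), (diff, p_adj) in pairwise.items():
        print(f"{a} vs {b}: diff = {diff:.1f} ppm, Bonferroni p = {p_adj:.4g}")
```

Using the pooled error term rather than separate two-sample t-tests keeps the pairwise comparisons consistent with the variance estimate underlying the omnibus F test.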

The sample size for the LHBT comparison was calculated using the method of Buderer [17], assuming a sensitivity of 42% and a specificity of 70% [18] and a target-population prevalence of 36.7% [19]. On this basis, a sample size of at least n = 48 was required for the LHBT comparison.
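Buderer's method sizes a diagnostic-accuracy study so that sensitivity and specificity can each be estimated to a chosen precision W, scaling each requirement by the proportion of subjects expected to be positive (for sensitivity) or negative (for specificity), and taking the larger of the two. The text does not report the confidence level or W, so the sketch below treats both as inputs, assuming a two-sided 95% confidence level by default and W as the margin on either side of the estimate. The W in the example is hypothetical and is not intended to reproduce the reported n = 48.

```python
# Sample size for estimating sensitivity and specificity (Buderer's method):
#   n_se = z^2 * Se(1 - Se) / (W^2 * P)
#   n_sp = z^2 * Sp(1 - Sp) / (W^2 * (1 - P))
# W (precision) and the confidence level are NOT given in the text; both are
# illustrative inputs here.
import math
from statistics import NormalDist


def buderer_sample_size(se: float, sp: float, prevalence: float,
                        w: float, confidence: float = 0.95) -> dict:
    """Return subjects needed for sensitivity, specificity, and overall."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided critical value
    n_se = math.ceil(z ** 2 * se * (1 - se) / (w ** 2 * prevalence))
    n_sp = math.ceil(z ** 2 * sp * (1 - sp) / (w ** 2 * (1 - prevalence)))
    return {"sensitivity": n_se, "specificity": n_sp, "total": max(n_se, n_sp)}


if __name__ == "__main__":
    # Se, Sp and prevalence from the text; W = 0.25 is a hypothetical choice.
    print(buderer_sample_size(0.42, 0.70, 0.367, w=0.25))
    # {'sensitivity': 41, 'specificity': 21, 'total': 41}
```

With a prevalence below 50%, the sensitivity term usually dominates because fewer subjects contribute to the estimate of sensitivity.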

Correlations between breath H2 concentrations measured with the AIRE devices and the mail-in kits were assessed using Pearson's correlation and simple linear regression. This analysis was also performed using IBM SPSS software version 28, with p ≤ 0.05 considered statistically significant. The mean and standard deviation of baseline, peak, and overall H2 concentrations were calculated, and Cohen's kappa statistic was calculated to assess diagnostic agreement. Mail-in kit LHBT reports may flag individual breath samples as ‘invalid samples’ when insufficient gas remains in the collection tube or bag at the time of analysis. These samples were omitted from the statistical analyses, so all pairwise comparisons used only time points with viable samples from both the AIRE device and the mail-in kit.
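The pairing and agreement steps described above can be sketched in Python with SciPy. The example assumes an invalid mail-in sample is stored as None, drops the whole time point so that only complete pairs remain, and then computes Pearson's r with scipy.stats.pearsonr, a simple linear regression with scipy.stats.linregress, and Cohen's kappa for positive/negative LHBT calls from observed versus chance agreement. Function names and data are hypothetical.

```python
# Sketch of the method-comparison statistics: drop invalid pairs, then
# Pearson's r, simple linear regression, and Cohen's kappa.
# ASSUMPTION: an invalid sample is stored as None; names are hypothetical.
from typing import List, Optional, Sequence, Tuple

from scipy import stats


def valid_pairs(aire: Sequence[Optional[float]],
                mail_in: Sequence[Optional[float]]) -> Tuple[List[float], List[float]]:
    """Keep only time points where both methods returned a usable value."""
    kept = [(a, m) for a, m in zip(aire, mail_in) if a is not None and m is not None]
    return [a for a, _ in kept], [m for _, m in kept]


def cohens_kappa(calls_a: Sequence[bool], calls_b: Sequence[bool]) -> float:
    """Cohen's kappa for two sets of positive/negative test calls."""
    n = len(calls_a)
    observed = sum(a == b for a, b in zip(calls_a, calls_b)) / n
    pa, pb = sum(calls_a) / n, sum(calls_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)  # agreement expected by chance
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical H2 readings (ppm); None marks an invalid mail-in sample.
    aire = [4.0, 9.0, 22.0, 35.0, 41.0]
    mail = [3.0, None, 18.0, 30.0, 45.0]
    x, y = valid_pairs(aire, mail)
    r, p = stats.pearsonr(x, y)
    fit = stats.linregress(x, y)
    print(f"n = {len(x)}, r = {r:.2f}, p = {p:.3g}")
    print(f"mail-in = {fit.slope:.2f} x AIRE + {fit.intercept:.2f}")

    # Hypothetical positive (True) / negative (False) LHBT calls.
    print(cohens_kappa([True, True, False, False], [True, False, False, False]))  # 0.5
```

Because the kappa function needs only the two sets of binary calls, it can be applied directly to the per-subject positive/negative LHBT results from each method.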

Results

Technical Assessment

Three AIRE devices were tested with three certified test gases covering a diagnostically relevant range of H2 concentrations (3, 10, and 50 ppm H2 in air). The overall mean ± SD readings for the three AIRE devices at 3, 10, and 50 ppm H2 were 3.5 ± 0.7 ppm, 10.7 ± 1.1 ppm, and 49.5 ± 2.6 ppm, respectively. To assess day-to-day repeatability, the protocol was carried out on three non-consecutive days (Fig. 1). Across all three AIRE devices, the mean absolute errors (MAEs) at 3, 10, and 50 ppm were 0.7, 1, and 1.9 ppm, respectively, and the overall MAE was 1.2 ppm.
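The summary statistics above can be computed from raw bench readings along the following lines. This sketch assumes MAE is the mean absolute difference between each reading and the certified concentration of the test gas, and that the sample (n − 1) SD was used, neither of which the text states explicitly. The readings and function names are hypothetical.

```python
# Per-concentration accuracy summary for a breath-H2 sensor bench test.
# ASSUMPTIONS: sample (n - 1) SD; MAE measured against the certified gas
# concentration. Function names are hypothetical.
from statistics import mean, stdev
from typing import Dict, List


def summarize(readings: Dict[float, List[float]]) -> Dict[float, Dict[str, float]]:
    """Map each reference concentration (ppm) to its mean, SD, and MAE (ppm)."""
    return {
        ref: {
            "mean": mean(values),
            "sd": stdev(values),
            "mae": mean(abs(v - ref) for v in values),
        }
        for ref, values in readings.items()
    }


def overall_mae(readings: Dict[float, List[float]]) -> float:
    """MAE pooled over every reading at every concentration."""
    return mean(abs(v - ref) for ref, values in readings.items() for v in values)


if __name__ == "__main__":
    # Hypothetical readings (ppm), not the study's raw data.
    demo = {3.0: [3.0, 4.0, 2.0], 10.0: [11.0, 9.0, 10.0]}
    for ref, summary in summarize(demo).items():
        print(ref, summary)
    print(overall_mae(demo))
```

With equal numbers of readings per concentration, as in the protocol described, the pooled MAE equals the average of the per-concentration MAEs, which is consistent with the reported overall MAE of 1.2 ppm being the mean of 0.7, 1, and 1.9 ppm.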

Fig. 1 Linearity and repeatability curve for three AIRE devices tested at 3, 10, and 50 ppm H2/air on three non-consecutive days

Comparison of LHBT Performance Using AIRE and Mail-in Kit

All subjects performed simultaneous LHBTs with the AIRE device and a mail-in kit. Of the 50 LHBTs, 26% (13/50) were positive with the AIRE device and, similarly, 26% (13/50) were positive with the mail-in kit (Table 1). There was diagnostic agreement in 90% (45/50) of cases.

Table 1 LHBT comparison of AIRE and mail-in kit (n = 50)

Of the five LHBTs in diagnostic disagreement, two were positive on the AIRE device and negative with the mail-in kit, and three were positive with the mail-in kit and negative on the AIRE device. Cohen's kappa (κ) was 0.8, indicating substantial agreement between the AIRE device and the mail-in kit.

Mean (± SD) baseline breath H2 values measured with the AIRE device and the mail-in kit were 10.3 ± 13.7 ppm and 7.9 ± 11.2 ppm, respectively (Fig. 2). Mean (± SD) peak H2 values were 37.9 ± 31.3 ppm and 37.3 ± 35.6 ppm on the AIRE device and the mail-in kit, respectively (Table 2).

Fig. 2 (A) Mean breath H2 concentrations collected during the LHBT using AIRE (n = 50). (B) Mean breath H2 concentrations collected during the LHBT using the mail-in kit (n = 50)

Table 2 Mean baseline and peak breath H2 ppm (mean ± SD) for AIRE devices and mail-in kits (n = 50)

A total of 387 pairs of breath measurements taken during 50 LHBTs were included in the correlation and linear regression analysis. The results demonstrated a significant positive linear correlation between the breath H2 measurements using the AIRE device and the mail-in kits (r = 0.73, p < 0.001) (Table 3).

Table 3 Linear regression analysis for the association of breath H2 measurement using AIRE and mail-in kits (n = 50)

Postprandial Breath H2 Measurement

Twenty-four patients measured postprandial breath H2 for 7 days before treatment and for 7 days after treatment, and 10 healthy controls measured postprandial breath H2 for 7 days (Fig. 3).

Fig. 3 Comparison of postprandial breath H2 in patients before treatment (n = 24) and after treatment (n = 24) with healthy volunteers (n = 10)

Before treatment, the patient group had a significantly higher daily mean breath H2 than the healthy controls (24.5 vs 16.8 ppm; p < 0.001). Following treatment, patients' breath H2 fell significantly to 16.2 ppm (p < 0.001), and there was no longer a significant difference between the patient group and the healthy controls (p = 0.743) (Table 4).

Table 4 Daily mean postprandial breath H2 for healthy controls (n = 10) and patients before and after treatment (n = 24)

Discussion

Breath testing is commonly used in the diagnosis of GI disorders. However, current breath testing methods require specialized, expensive equipment and technical support, and tests must either be performed at a healthcare facility or have their samples mailed to a central laboratory, which extends the wait for results. Accordingly, the performance of the AIRE device was assessed, including a comparison with commercially available mail-in LHBT kits. There was a high degree of diagnostic agreement (90%) between the two approaches, and Cohen's kappa (κ = 0.8) also indicated substantial agreement (Table 1). These data support the use of the AIRE device as equivalent to mail-in LHBT kits for the diagnosis of GI disorders such as SIBO.

Conventional breath tests have substantial practical utility but also some limitations. They require patients to fast and follow a preparation diet in order to achieve a sufficiently low baseline reading. Patients may not adhere strictly to this protocol, and in those with slow transit a low baseline may be hard to achieve in practice. For SIBO testing, using glucose as the substrate limits the investigation to the proximal small bowel, which can result in false negatives, whereas lactulose can produce false positives because in some patients it reaches the cecum before the standard 90-min diagnostic cut-off. Above all, neither substrate is representative of real food.

Measuring a patient's postprandial response to their actual diet may offer an alternative to these existing breath testing paradigms. By understanding changes in postprandial breath readings following a therapeutic intervention, it may be possible to identify the people most likely to benefit from a particular intervention and, through monitoring, to track their response to it. Measuring postprandial gas responses also holds significant promise for research into dietary interventions such as the low FODMAP diet, allowing researchers to build a more comprehensive picture of how different foods affect GI function in real time. This approach could enable the identification of personalized dietary strategies tailored to individual responses, ultimately leading to more effective management of GI disorders.

In the present study, the patient group produced significantly more postprandial H2 on average than the healthy controls (p < 0.001) (Table 4). This is consistent with previous studies reporting elevated exhaled H2 in people with chronic GI symptoms compared with healthy controls in the fasting state [9] and in response to both a standard diet [20] and more fermentable challenge meals [10]. Studies of H2 production after lactulose ingestion have also reported differences between symptomatic and healthy groups [21], and imaging-based measurement of intestinal gas volume has been used to differentiate symptomatic from healthy cohorts [22, 23]. Quantitative and qualitative differences in intestinal microbiota have been demonstrated between people with a range of GI disorders and healthy controls [24,25,26], and gut bacteria play a significant role in the fermentation of dietary carbohydrates [3]. The elevated postprandial H2 in the patient group may therefore be a manifestation of an altered microbiome.

While the exact mechanism of action of the antibiotics used to treat GI disorders is not fully understood, their efficacy is thought to be due in part to inhibiting the growth of certain microbiota [11]. Mean postprandial H2 in patients fell significantly following antibiotic treatment (p < 0.001), to levels consistent with those of the healthy controls (Table 4). The changes in postprandial breath H2 observed in the current study are likely due to changes in the relative abundance, or the eradication, of certain microbial taxa. Interestingly, in the subset of patients (14/24) who also recorded postprandial breath H2 during treatment, mean H2 increased significantly compared with pre-treatment levels (p = 0.05). This may have been caused by a die-off of H2-consuming methanogens, resulting in a sudden increase in intestinal H2. Confirming this hypothesis would require assessment of postprandial CH4 levels, which future studies may wish to undertake. The changing gas profiles may be consistent with alterations in the microbial milieu along the gastrointestinal tract over the course of antibiotic treatment, suggesting a meaningful association between breath H2 and treatment-induced microbial modulation. Postprandial breath gases may therefore serve as an accessible, quantifiable indicator of treatment efficacy and provide a non-invasive means of monitoring changes in microbial composition.

The current study has several limitations. The patient and healthy control groups were not matched for age, BMI, or gender. Both groups were small, and future studies should seek to replicate these findings in larger cohorts. Postprandial CH4 was not measured in the current study but should be assessed in similar future studies to provide a more complete picture of fermentative activity. Diet was not controlled, so variation in the intake of fermentable foods cannot be discounted as a contributing factor. Antibiotic therapy was not standardized; future studies should repeat the current protocol with standardized therapy and controlled diets, so that changes in postprandial gas profiles can be more confidently attributed to the effect of a specific therapy on intestinal microbial activity.

In conclusion, this study represents a first step in demonstrating that measuring postprandial breath gases in those with chronic GI symptoms may prove to be clinically relevant for dynamic, real-time assessment of gut microbial activity and treatment response. This approach could offer a practical and patient-friendly avenue for continuous clinical monitoring and personalized therapeutic management. From a research perspective, the ability to track postprandial breath gases may also contribute to a more comprehensive understanding of GI disorder pathophysiology.