1 Background

Aspirin (acetylsalicylic acid), one of the most commonly used drugs in the USA [1, 2], is often purchased over the counter for short-term treatment of pain, fever, and colds. Other nonsteroidal anti-inflammatory drugs (NSAIDs) such as ibuprofen and naproxen are also widely used for these indications. However, with prolonged use, all of these medications carry a risk of gastrointestinal adverse effects, including ulceration and bleeding in the luminal gastrointestinal tract [3–5]. Rarely, these complications can be life threatening, but even minor adverse effects such as dyspepsia may be important, since they may discourage patients from obtaining appropriate treatment.

Despite the common use of these drugs, data regarding their safety during short-term use in over-the-counter doses in adults are scattered in the literature and are not well characterized [6]. We aimed to summarize the gastrointestinal toxicity of aspirin in comparison both with placebo and with other drugs commonly used in this manner, by conducting a meta-analysis of randomized clinical trial data bearing on the issue. This report is a companion to a recent summary using individual subject data on the relative toxicity of aspirin in short-term trials conducted by Bayer [7].

2 Methods

On February 20, 2008, we conducted an extensive search of the published medical literature to identify reports of clinical trials or observational studies comparing the gastrointestinal toxicity of aspirin with that of placebo or active comparators. The databases scanned were Medline [1950–2008], Embase [1993–2008], Derwent Drug File [1982–2008], Biosis [1978–2008], Current Contents [1992–2008], and a Bayer internal bibliographic database focusing on drug safety [1918–2008]. Search strategies, tailored to the individual databases, are detailed in Appendix 1 in the Electronic Supplementary Material. A total of 119,310 citations (including possible duplicates) were identified. Articles classified as reviews or meta-analyses, those written in a language other than English, and those that were conference abstracts or one-page short communications were not considered further, as they were unlikely to provide substantial relevant data. After removal of evident duplicates, 23,131 reports remained.

2.1 Selection of Reports for Inclusion in the Meta-Analysis

Since a manual review of each paper we identified was not feasible, we developed a relevance score, using automated text mining to grade articles for relevance to our meta-analysis (Fig. 1). The score was based on the occurrence of words in article titles, abstracts, and indexing terms. We searched for five groups of relevant words, related to (i) study design (e.g., ‘randomized’, ‘cohort’, or ‘meta-analysis’); (ii) key drug compounds (e.g., ‘aspirin’ or ‘ibuprofen’); (iii) adverse effects (e.g., ‘bleeding’ or ‘dyspepsia’); (iv) size of study (i.e., number of subjects); and (v) drugs NOT used for treatment of pain, inflammatory conditions, or as a cardioprotective agent. Through repeated examination of the candidate articles, an extensive list of synonyms was generated for each group of terms (see Appendix 1 in the Electronic Supplementary Material). In the scoring of each article, the number and places of occurrence of the terms were counted, generally weighting the index and title more heavily, and greatly weighting larger studies. Mention of drugs not used for aspirin-related conditions lowered the score. The scoring algorithm was derived in an iterative manner, in which different weighting factors were tried for each aspect, followed by manual evaluation of the highest-scoring articles. Details of the scoring algorithm are given in Appendix 1 in the Electronic Supplementary Material.
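The general shape of such a relevance score can be sketched in Python as follows. This is only an illustration of the approach described above: the term lists, the group and field weights, the logarithmic size bonus, and all names (`TERM_GROUPS`, `FIELD_WEIGHTS`, `SIZE_WEIGHT`, `Article`, `relevance_score`) are hypothetical and do not reproduce the actual algorithm, which is given in Appendix 1.

```python
import math
import re
from dataclasses import dataclass

# Hypothetical term groups: (example terms, group weight). The real synonym
# lists and weights are in Appendix 1 and are NOT reproduced here.
TERM_GROUPS = {
    "design": (["randomized", "cohort", "meta-analysis"], 2.0),
    "drug": (["aspirin", "ibuprofen"], 2.0),
    "adverse_effect": (["bleeding", "dyspepsia"], 2.0),
    # Drugs not used for aspirin-related conditions lower the score.
    "unrelated_drug": (["metformin"], -3.0),
}

# Assumed field weights: title and index terms count more than the abstract.
FIELD_WEIGHTS = {"title": 3.0, "index_terms": 3.0, "abstract": 1.0}

# Assumed weight for study size, applied to log10(number of subjects).
SIZE_WEIGHT = 5.0


@dataclass
class Article:
    title: str
    abstract: str
    index_terms: str
    n_subjects: int = 0


def count_term(text: str, term: str) -> int:
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text.lower()))


def relevance_score(article: Article) -> float:
    """Weighted sum of term hits per field, plus a bonus for larger studies."""
    fields = {
        "title": article.title,
        "abstract": article.abstract,
        "index_terms": article.index_terms,
    }
    score = 0.0
    for terms, group_weight in TERM_GROUPS.values():
        for field_name, text in fields.items():
            hits = sum(count_term(text, term) for term in terms)
            score += group_weight * FIELD_WEIGHTS[field_name] * hits
    if article.n_subjects > 0:
        score += SIZE_WEIGHT * math.log10(article.n_subjects)
    return score
```

Ranking all candidate articles by such a score and reviewing only the top of the list corresponds to the selection step summarized in Fig. 1.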

Fig. 1 Selection of publications for inclusion in the meta-analysis

We aimed to consider in more detail the 4,000 highest-scoring articles, and we were able to obtain copies of 3,983 of them. These were reviewed by trained physicians at GGA Software Services (St. Petersburg, Russia), each with an MD degree and a PhD degree. A paper was considered ‘relevant’ if it summarized a human randomized controlled trial or epidemiological study, included any usable information regarding at least one adverse event during aspirin treatment, and provided information about the doses of the active treatments that were studied and the duration of treatment.

After further elimination of duplicates, there were 3,916 apparently distinct papers. There was a steady decrease in the percentage of relevant publications across groups of articles with decreasing relevance scores. There was also a strong downward trend in the number of adverse events across papers with decreasing scores; the aggregate number of events in the 500 lowest-scoring articles was negligible.

Further steps were taken to assess the accuracy of the selection of reports for inclusion in the meta-analysis. From the 19,131 articles with lower relevance scores that had not previously been reviewed in detail, the 616 that included 1,000 or more subjects were screened manually, using the title and abstract, to ensure that important data were not missed. None was eligible for inclusion in the meta-analysis. Among the 2,345 articles with 100–999 subjects, 20 % were similarly reviewed, and only one eligible report was identified; it contained a total of only six symptom complaints and was therefore not included in the database. The original designation of non-relevance was also checked for the 289 papers among the 500 with the highest relevance scores that had been deemed not relevant. Eight were judged to be potentially relevant and were included in the database. In total, there were 805 relevant articles identified in the pool of the 4,000 highest-scoring reports.

From the relevant articles, data were extracted regarding details of study design, medications investigated (dose, duration of treatment and follow-up, etc.), numbers of subjects, and the numbers of specific events reported. The counts of subjects at risk of adverse events were taken from the safety study population (i.e., randomized subjects who took any study medication), whether or not they provided any efficacy data. The specific terms used to describe the adverse events in each of the articles were retained during the data extraction. These were then grouped into relevant categories. Dyspepsia, nausea/vomiting, and abdominal pain were considered separately and also in aggregate as ‘minor gastrointestinal events’. Dyspepsia was taken to include terms covered by the Medical Dictionary for Regulatory Activities (MedDRA) preferred term ‘dyspepsia’, nonspecific (functional) gastrointestinal disorders, eructation, abdominal/epigastric discomfort, and abdominal tenderness but not abdominal pain. Gastrointestinal bleeding was defined as including all bleeding in the gastrointestinal tract, ranging from a positive stool test to melena. Clinically active gastrointestinal ulcers and perforations were also tabulated, but purely endoscopic findings were not. The term ‘gastrointestinal events’ was reserved for descriptions of low specificity reported as a sole safety outcome, as well as an overall summary of other events considered in the same publication. Gastrointestinal events that did not match one of these outcome categories were not considered in the analysis (e.g. diarrhea, flatulence, constipation, dry mouth). Data entry was repeated on the 5 % of clinical trial and observational reports that provided the largest number of endpoints. Articles with discrepancies were re-reviewed to reconcile the differences.

The risk of experiencing gastrointestinal adverse events after short-term treatment with aspirin was assessed using meta-analytical methods. We did not include observational studies, as they rarely provided detailed data regarding dose and duration of treatment, and they did not directly compare different agents with each other. We included parallel-design, randomized clinical studies with at least one aspirin arm at a dose between 325 and 4,000 mg/day and a treatment duration of at most 10 days. We included only articles that studied aspirin as monotherapy, i.e., not in combination with other active agents (e.g., ephedrine). Vitamin C and caffeine were not considered active components. No exclusions were made with regard to blinding, subject compliance, single vs. multiple dosing, total dosages, or formulations. Crossover trials were excluded because of concerns regarding unknown carryover effects, patient dropout between treatment phases, and within-patient correlations. To avoid including previously reported data, publications describing Bayer-sponsored studies that were included in a previous report [7] were also not included in the current analysis. After these exclusions, a total of 152 studies from 150 publications were considered.
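The inclusion criteria above can be expressed as a simple filter. The following Python sketch is illustrative only: the `Study` record and its field names are hypothetical, and the encoding of the criteria (for example, treating vitamin C and caffeine as compatible with the monotherapy flag) is an assumption about how extracted data might be coded.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # All field names are hypothetical, chosen for illustration.
    randomized: bool                # False for observational studies
    design: str                     # "parallel" or "crossover"
    aspirin_daily_dose_mg: float    # dose in the aspirin arm
    duration_days: float            # treatment duration
    aspirin_monotherapy: bool       # True if no other active agent was combined
                                    # (vitamin C and caffeine do not count)
    in_previous_bayer_report: bool  # already covered by the earlier report [7]


def is_eligible(study: Study) -> bool:
    """Apply the stated inclusion and exclusion criteria to one study."""
    return (
        study.randomized
        and study.design == "parallel"
        and 325 <= study.aspirin_daily_dose_mg <= 4000
        and study.duration_days <= 10
        and study.aspirin_monotherapy
        and not study.in_previous_bayer_report
    )
```

Blinding, compliance, dosing schedule, and formulation are deliberately absent from the filter, since no exclusions were made on those grounds.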

In some reports, the number of subjects allocated to each study treatment was stated only as a percentage of an overall total. The corresponding products were retained in our database even if this resulted in fractional numbers of subjects. Calculation of incidence rates of aggregate outcomes, especially ‘minor gastrointestinal events’, created some complexities. To account for the possibility that individual subjects may have experienced more than one reported event, we estimated the total event count as the harmonic mean across the range of all possible event count values, ranging from the minimum (the largest reported individual event count) to the maximum (the sum of all different individual event counts). In formal terms, if $a_i$ was the number of patients affected by adverse event $i$, the possible event frequencies ranged between $E_{\min} = \max_i a_i$ and $E_{\max} = \sum_i a_i$. In order to assess whether the harmonic mean presented a reliable risk estimate, two other estimates were calculated in a sensitivity analysis:

(i) ‘10 % incidence rate’: $[E_{\min} + (E_{\max} - E_{\min}) \times 0.1]/N$; and

(ii) ‘90 % incidence rate’: $[E_{\min} + (E_{\max} - E_{\min}) \times 0.9]/N$

In all instances, these showed at most minor differences with the harmonic mean estimate, and thus they are not presented. Neither the harmonic mean estimates nor the 10 % and 90 % incidence estimates were rounded to integer values, which resulted in fractional numbers of patients with some adverse events.
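A short Python sketch may help make the aggregate event count concrete. It rests on two assumptions that are not stated in the text: N is taken to be the number of subjects at risk in the treatment arm, and the harmonic mean is taken over every integer count from the minimum to the maximum (which presumes integer event counts). The function names are illustrative.

```python
from statistics import harmonic_mean


def event_bounds(counts):
    """Return (E_min, E_max): the largest single count and the sum of all counts."""
    return max(counts), sum(counts)


def fractional_incidence(counts, n_subjects, fraction):
    """Incidence when the event count lies `fraction` of the way from E_min to E_max.

    fraction=0.1 gives the '10 % incidence rate', fraction=0.9 the '90 %' one.
    n_subjects is ASSUMED to be the number of subjects at risk (N).
    """
    e_min, e_max = event_bounds(counts)
    return (e_min + (e_max - e_min) * fraction) / n_subjects


def harmonic_mean_count(counts):
    """Harmonic mean over all integer counts from E_min to E_max.

    ASSUMPTION: one possible reading of 'the harmonic mean across the range of
    all possible event count values'; requires integer counts.
    """
    e_min, e_max = event_bounds(counts)
    return harmonic_mean(list(range(e_min, e_max + 1)))


# Hypothetical arm: 4 subjects with dyspepsia, 3 with nausea/vomiting,
# 2 with abdominal pain, among 100 subjects at risk.
counts = [4, 3, 2]
e_min, e_max = event_bounds(counts)                 # 4 and 9
low_rate = fractional_incidence(counts, 100, 0.1)   # (4 + 5 * 0.1) / 100
high_rate = fractional_incidence(counts, 100, 0.9)  # (4 + 5 * 0.9) / 100
estimated_events = harmonic_mean_count(counts)
```

Because the harmonic mean never exceeds the arithmetic mean, this estimate sits closer to the minimum than the midpoint of the range does, which is why bracketing it with the 10 % and 90 % rates is a natural sensitivity check.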

We compared adverse event rates in subjects randomized to aspirin with the rates in those treated with placebo, with any active comparator, or with paracetamol, ibuprofen, naproxen, or diclofenac. Odds ratios (ORs) were used as the measure of the effect, calculated using the Mantel–Haenszel risk estimator, as it is robust even where few cases of adverse events occur. A continuity correction that accounted for the sizes of treatment arms [8] was applied in case of zero cells in a stratum. Heterogeneity across studies was assessed using the modified Breslow–Day statistic for the OR [9, 10], with a P value of ≤0.10 being considered an indication of heterogeneity. Studies with no mention of an adverse event in either treatment arm were not included in the analysis of that event.
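As a rough illustration of the pooled estimate described above, the following Python sketch computes a Mantel–Haenszel OR across studies. The confidence interval uses the Robins–Greenland–Breslow variance of the log OR, which is an assumption here because the variance estimator is not stated in the text. The continuity correction for zero cells and the Breslow–Day heterogeneity test are not implemented, so the sketch assumes that no stratum contains a zero cell. The function name and the (a, b, c, d) table layout are illustrative.

```python
import math


def mantel_haenszel_or(strata, z=1.96):
    """Pooled Mantel-Haenszel odds ratio with an approximate 95 % CI.

    strata: iterable of (a, b, c, d) per study, where
        a = aspirin subjects with the event,    b = aspirin subjects without,
        c = comparator subjects with the event, d = comparator subjects without.
    Returns (or_mh, ci_lower, ci_upper).
    ASSUMES no zero cells; the continuity correction is not applied here.
    """
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        r = a * d / n          # contribution to the numerator
        s = b * c / n          # contribution to the denominator
        p = (a + d) / n
        q = (b + c) / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    or_mh = sum_r / sum_s
    # Robins-Greenland-Breslow variance of log(OR_MH) (assumed estimator).
    var_log_or = (
        sum_pr / (2 * sum_r ** 2)
        + sum_ps_qr / (2 * sum_r * sum_s)
        + sum_qs / (2 * sum_s ** 2)
    )
    se = math.sqrt(var_log_or)
    return or_mh, or_mh * math.exp(-z * se), or_mh * math.exp(z * se)
```

For a single study the pooled value reduces to the ordinary cross-product ratio ad/bc, and the variance reduces to the familiar 1/a + 1/b + 1/c + 1/d, which is a convenient sanity check on any implementation.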

Summary risk differences were also computed, using Mantel–Haenszel statistics. The absolute rates differed considerably across studies, presumably varying with the clinical setting. The risk differences also varied, with marked heterogeneity in most analyses, indicating that risk differences were not a suitable scale for summarizing the data. Consequently, those analyses are not reported here.

For paracetamol, ibuprofen, naproxen, and diclofenac, overall comparisons and low- and high-dose specific comparisons were made using the categories listed in the footnotes to Table 1. In studies with a range of possible aspirin doses, an average dose was calculated from the minimum and maximum doses.

Table 1 Characteristics of studies included in the meta-analysis

A full protocol for the meta-analysis is available from the corresponding author. Bayer HealthCare (Leverkusen, Germany) funded the study, and Bayer employees participated in this research. All authors assume responsibility for the integrity of the work.

3 Results

3.1 Studies

Overall, 150 publications describing 152 studies and 48,774 patients were selected; 78 of these with 19,829 subjects provided relevant data for at least one safety outcome in comparisons of aspirin with placebo or an active agent (see Table 1 and Appendix 2 in the Electronic Supplementary Material). Three studies did not describe whether subjects and investigators were blinded to study treatment, but 69 (88 %) were double-blinded. The most frequently investigated indication was pain, the target condition in 62 studies (79 %). Subjects were aged between 16 and 75 years; about equal numbers of men and women were included. A total of 6,712.5 subjects were allocated aspirin, 3,385.5 placebo, and 9,731 an active comparator. The aspirin treatment was a single dose in 2,694 subjects (43 %). The daily dose was 500–1,000 mg in 2,874 aspirin-treated subjects (46 %) and 1,500–2,000 mg in 2,920 subjects (47 %).

3.2 Gastrointestinal Risks

Five studies comparing aspirin with placebo and five studies comparing aspirin with active comparators reported data on overall gastrointestinal risks, which were recorded in 4.2–18.2 % of subjects (Table 2). Aspirin subjects had higher rates than those allocated placebo (OR 2.12, 95 % confidence interval [CI] 0.95–4.76) and active comparators (OR 1.61, 95 % CI 1.43–1.82) [see Table 2 and Appendix 3 in the Electronic Supplementary Material].

Table 2 Gastrointestinal events in subjects treated with aspirin vs. comparators, all doses

In 59 studies with 3,304.5 subjects receiving aspirin and 3,170.5 subjects receiving placebo, 5.2 % of aspirin subjects reported a minor gastrointestinal complaint (abdominal pain, dyspepsia, or nausea/vomiting), versus 3.7 % of placebo subjects. The corresponding summary OR was 1.46 (95 % CI 1.15–1.86) [see Table 2 and Appendix 3 in the Electronic Supplementary Material]. The ORs for dyspepsia (3.17, 95 % CI 1.73–5.82) and abdominal pain (1.92, 95 % CI 1.12–3.27) were also increased significantly.

Similar findings emerged in comparisons of aspirin with any active comparator (50 studies with 4,888 and 9,471 subjects, respectively). The pooled risks of minor gastrointestinal complaints were 12.5 % in subjects receiving aspirin and 7.8 % in subjects receiving an NSAID/analgesic. The risks varied modestly across studies of aspirin versus the different comparators. Abdominal pain tended to be the most frequent complaint, recorded in 3–11 % of subjects (see Table 2 and Appendix 3 in the Electronic Supplementary Material). Dyspepsia was reported in 3.2–6.2 %, and nausea/vomiting in 3.1–6.3 %. The OR for aspirin versus any active comparator for minor gastrointestinal complaints was 1.81 (95 % CI 1.62–2.04). The risks of dyspepsia, nausea and vomiting, and abdominal pain were each significantly increased for aspirin versus any active comparator, with ORs between 1.37 and 1.95 (Table 2).

The findings for comparisons of aspirin in any dose with paracetamol or ibuprofen in any dose were similar to those for any active comparator, with some ORs exceeding 2.0 (Table 2). Relatively limited data were available for naproxen and diclofenac; the aspirin ORs ranged from nonsignificantly reduced risks to nonsignificantly increased risks for the various endpoints, all with wide CIs.

The data for paracetamol and ibuprofen were dominated by a single large study, the Paracetamol, Aspirin and Ibuprofen New Tolerability (PAIN) study [11]. After exclusion of this trial, the numbers of subjects in the analyses were reduced by about 90 % or more. In this reduced data set, the ORs for aspirin versus paracetamol were somewhat lower than the overall estimates, ranging from 0.31 (95 % CI 0.03–3.38) for dyspepsia in two studies to 3.64 (95 % CI 0.68–19.54) for abdominal pain in one study. For comparisons with ibuprofen, the ORs tended to increase after exclusion of the PAIN study data and generally retained statistical significance (data not shown).

Overall comparisons of low-dose aspirin (1,000 mg/day or less) with lower-dose comparators and higher-dose aspirin (>1,000 mg/day) with higher-dose comparators were imprecise; most ORs had wide CIs and lacked statistical significance (data not shown). However, lower-dose aspirin was associated with significantly more overall minor gastrointestinal complaints than lower-dose ibuprofen (OR 2.67; 95 % CI 1.22–5.84) or naproxen (OR 3.52; 95 % CI 1.01–12.25). Higher-dose aspirin was associated with significantly more of these complaints than higher-dose paracetamol (OR 1.68; 95 % CI 1.44–1.97), ibuprofen (OR 1.99; 95 % CI 1.69–2.33), and naproxen (OR 11.1; 95 % CI 1.74–70.85).

Serious gastrointestinal events were very rare. There was one perforated appendix in a placebo patient, one case of ulcerative colitis after placebo treatment, and an ulcerative colitis attack after paracetamol. In one study [12], gingival bleeding occurred at slightly lower incidence with aspirin 900 mg (8 %) than with paracetamol 1,000 mg (13 %), though both rates were higher than those seen with placebo (3 %). (Statistical significance of the differences was not reported.) No clinically significant gastrointestinal bleeds were observed. Two studies each observed that one aspirin-treated subject had occult blood in stools [13, 14].

4 Discussion

We used a digital data-mining process to identify comparative studies of gastrointestinal adverse effects of aspirin and other medications commonly used over the counter for short-term treatment. After scanning approximately 4,000 articles, we found 150 relevant publications describing 152 clinical trials, including 78 trials with endpoint data that could be used in our meta-analysis. Serious gastrointestinal events were very rare. Although minor gastrointestinal complaints (dyspepsia, abdominal pain, and nausea/vomiting) tended to be uncommon, aspirin was associated with higher risks of most of them, typically increasing the risk by about 50–100 %. One large study dominated the comparison of aspirin with paracetamol and ibuprofen; exclusion of its data from the analyses left the findings more variable but broadly consistent with the overall results.

Chronic use of NSAIDs is well known to increase the risk of serious gastrointestinal events such as perforations, ulcers, and bleeds [3, 4, 15, 16]. We have shown here that those events are not a concern for short-term use of aspirin or other drugs commonly used for pain, colds, and fever. Our main focus was more minor gastrointestinal problems—subject-reported symptoms, which are inherently more subjective than serious adverse events. Nausea, vomiting, and abdominal pain are fairly well defined, but even with the most careful use, ‘dyspepsia’ can refer to several different symptom patterns [17, 18]. The ambiguity in the term naturally carries over to our analysis from the primary reports we included. However, as far as possible, we separated dyspepsia from abdominal pain and nausea/vomiting.

Previous reports have summarized data regarding gastrointestinal symptoms associated with longer-term NSAID use. In observational studies, aspirin and other NSAIDs have clearly been associated with dyspepsia [6]. An early meta-analysis [16] summarized data from NSAID trials with a treatment duration of four or more days. There was no statistically significant effect of aspirin or non-aspirin NSAIDs on dyspepsia, nausea, or abdominal pain in a random-effects analysis. In a less conservative fixed-effects analysis, aspirin was associated with an increased risk of dyspepsia and abdominal pain, and non-aspirin NSAIDs were associated with an increased risk of dyspepsia. A more recent meta-analysis summarized data regarding dyspepsia from randomized, placebo-controlled trials of non-aspirin NSAIDs used for five or more days [18]. The association depended on the definition of the endpoint. A narrow dyspepsia definition (omitting nausea, vomiting, and other symptoms only tangentially related to epigastric pain or discomfort) yielded a pooled risk ratio (RR) of 1.36 (95 % CI 1.11–1.67) versus placebo. In analyses using broader definitions, the RRs were more modest. Aside from a previous analysis of Bayer-sponsored trials [7], we are unaware of any previous overview of the adverse effects of short-term use of any NSAID, including aspirin.

The findings obtained in this meta-analysis are broadly compatible with those from the meta-analysis of the Bayer studies [7], which considered aspirin versus placebo, paracetamol, or ibuprofen (Table 3). Unfortunately, combined analysis or even detailed comparison of the two sets of findings is not possible, because of differences in the definitions of the endpoints in the two analyses (see Table 3 footnotes).

Table 3 Odds ratios (ORs) for aspirin vs. comparators in the current literature analysis and in Bayer studies

Our study utilized a novel data-mining approach to identify appropriate studies for inclusion in the meta-analysis. Our literature search identified over 119,000 citations (including possible duplicates) mentioning aspirin; it was obviously not possible to examine each of them in detail for possible inclusion in our meta-analysis. Nonetheless, our quality control measures made it clear that we identified the vast majority of the relevant data, and this comprehensive approach is a strength of our analysis. In the end, we included data from 78 studies and almost 20,000 subjects. Consequently, many of our analyses have considerable statistical precision, and we have stable estimates for the comparison of aspirin with placebo, all active comparators, paracetamol, or ibuprofen. On the other hand, our meta-analysis was unavoidably limited by the features of the studies that were summarized, including possible lack of compliance, unblinding, and ambiguous definitions of endpoints. Our findings may also reflect heterogeneity in effects over the indications for, and duration of, treatment. Close to half of the subjects who were analyzed received only a single dose of the study agent.

There are limitations to the interpretation of our data. Clinical trials of aspirin and other NSAIDs often screen potential subjects for risks of adverse events, creating low-risk study populations. Consequently, estimates of absolute risks of various events may be conservative in comparison with what might be expected in general use. In interpreting our data, it should be remembered that as we selected studies for analysis, we excluded those that reported no adverse events. This is commonly done, but, other things being equal, this has the tendency to inflate absolute incidence estimates because it reduces the denominators of rates without similarly reducing the numerators.
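The effect of excluding studies that reported no adverse events can be seen in a small worked example. The Python sketch below uses entirely hypothetical numbers (four studies of 100 subjects each, two of which report no events); it is not drawn from the data in this analysis.

```python
def pooled_incidence(studies):
    """studies: list of (events, subjects) pairs; returns total events / total subjects."""
    total_events = sum(events for events, _ in studies)
    total_subjects = sum(subjects for _, subjects in studies)
    return total_events / total_subjects


# Hypothetical studies: (number of events, number of subjects).
studies = [(5, 100), (0, 100), (3, 100), (0, 100)]

# All four studies: 8 events among 400 subjects.
incidence_all = pooled_incidence(studies)

# Only studies reporting at least one event: 8 events among 200 subjects.
incidence_reporting = pooled_incidence([s for s in studies if s[0] > 0])
```

With these made-up figures the pooled incidence doubles, from 2 % to 4 %, although the number of events is unchanged: only the denominator has shrunk.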

5 Conclusions

In this meta-analysis, serious adverse events were not observed with short-term use of aspirin or other over-the-counter medications used for pain, cold, or fever. However, aspirin conferred a higher risk of minor gastrointestinal complaints.