Non-malignant chronic pain affects up to 20% of the population of Western countries [1–3]. Chronic pain can originate from a variety of underlying diseases or syndromes, such as cancer, lower back pain, musculoskeletal alteration, or neuropathy. Opioid agonists are normally used to treat this condition. However, strong painkillers come with an array of side-effects that substantially narrow their therapeutic window. Opioid prescription has dramatically increased over the past 10 years, inducing risks of overdose, particularly for patients with chronic pain. Clinical practice guidelines have recently been reviewed and mitigation strategies have been thoroughly discussed in a recent appraisal [4]. Given these drug attributes, opioids are often titrated to an individual-based optimal efficacy–tolerability ratio. Even then, a considerable number of adverse events can remain at the cost of reduced efficacy.

In the current study, the efficacy and safety profiles of two major opioids, tramadol and tapentadol, are compared using model-based quantitative methods. A review of the mechanism of action of tramadol and tapentadol can be found in a study by Nossaman et al. [5]. Tramadol is a centrally acting synthetic analgesic of the opioid class used to treat moderate-to-severe pain. Tramadol and its active metabolite produce anti-nociception predominantly via a mechanism of binding to mu-opioid receptors. An appropriate dosing regimen and compliance are critical to the success of the therapy. Belonging to the same class of analgesic, tapentadol was approved more recently (in 2008) for the relief of moderate-to-severe acute pain in patients 18 years or older. Tapentadol binds to mu-opioid receptors and inhibits norepinephrine re-uptake. These two processes are thought to be responsible for pain relief with tapentadol [5]. Although tapentadol is often presented as a new-generation analgesic, no formal head-to-head comparison between tramadol and tapentadol has been run and/or made public so far.

Summary-level information about clinical efficacy (pain intensity) for these treatments can be found in the literature. It can be extracted and combined into a meta-analysis framework to compare treatment effects of different drugs across different patient populations. Frequently, the assessment of the efficacy of a compound in meta-analyses is based on the end-of-study results only, and pain scores collected at repeated time points are discarded or averaged. However, study durations can differ, which makes the interpretation of findings ambiguous if the time dynamics are not explicitly covered in the meta-analysis. Incorporating longitudinal information about pain intensity would allow the evaluation of the onset of effect, its magnitude, and its resilience, and could provide accurate estimates of the true response and, as a consequence, a more valid comparison between treatments. Longitudinal model-based meta-analyses are an extension of traditional meta-analyses and represent a framework for assessment of such longitudinal information. Two important components are captured in these models: the magnitude of the treatment effect, which may be related to the dose in a linear or non-linear way, and its time course. Longitudinal model-based meta-analyses have been reported for migraine pain [6], Alzheimer’s disease [7], and type 2 diabetes mellitus [8], while Ahn and French [9] discuss some methodological aspects of this approach.

Gastrointestinal adverse effects, dizziness, and somnolence are among the most commonly reported adverse events in patients taking opioid analgesics [10]. The frequency of these adverse events has been discussed in previous meta-analyses [11, 12], but they were based on a small number of studies and did not take into account the possible confounding effects of dose or study duration. In the present work, the authors apply meta-analysis techniques to compare the proportion of patients experiencing constipation, nausea, vomiting, dizziness, and/or somnolence (at least once during the study), in the four treatment groups, respectively, while accounting for dose, study duration, and any other relevant covariate effect. In addition, the same analysis is applied to the frequency of patient withdrawals, either due to adverse events or due to lack of efficacy.

The key objective of the present study was to compare the benefit–risk tradeoff of tramadol versus tapentadol in patients with chronic non-malignant pain by leveraging public-domain summary-level data and performing indirect treatment comparisons. The results of this analysis are presented in the light of previous meta-analyses and provide evidence, or highlight a lack of it, for differentiation between the investigated compounds.


The analysis in this article is based on previously conducted studies, and does not involve any new studies of human or animal subjects performed by any of the authors.

Literature Data

A systematic screening of clinical trials involving tramadol and/or tapentadol for the treatment of non-malignant pain was performed. Clinical trials published until November 2011 were considered. The search sources included PubMed®/MEDLINE™, European Medicines Agency, and Food and Drug Administration drug labeling information, and additional sources were identified in clinical trial registries. Several combinations of key words were used (see the Electronic Supplementary Material for details). A total of 83 sources were identified. After leaving out the sources that did not report any data on pain intensity, adverse event frequency, or drop-out rate, and after full-text examination, publications describing 45 unique double-blind Phase II or Phase III randomized clinical trials in adult patients with chronic non-malignant pain were retained in the meta-analysis. A list of the trials used in the analysis with key information is provided in Supplementary Table S1 in the Electronic Supplementary Material.

The majority of the trials were placebo-controlled. Six tapentadol trials [13–18] were active-controlled trials, using oxycodone as comparator. Because these trials were large in size, hence informative, it was important to keep them in the analysis. In case of active-controlled trials with a comparator other than tramadol or tapentadol, only the arm corresponding to one of these two treatments was retained in the analysis dataset. The list of studies retained in the analysis is presented in Supplementary Table S1 in the Electronic Supplementary Material. It is also worth mentioning that three studies (Adler et al. [19], Mongin et al. [20] and Beaulieu et al. [21]) considered tramadol at several therapeutic doses and in various formulations without a placebo arm. For each treatment arm, information about patient population, sample size, and baseline and demographic characteristics was also available.

In addition to describing the treatment effect over time, other differences among trials and treatment arms due to intrinsic (e.g., disease severity, gender, age) or extrinsic factors (e.g., concomitant medication) were accounted for in the analysis. These factors are introduced in the model as covariates. However, because patient-specific covariates are in the form of summary statistics, their values cover a narrower range than the individual values. Consequently, they are less informative about their effects unless the data have been stratified based on them. Covariates of interest in the dataset included year of publication, baseline pain intensity, pain syndrome, and trial duration. The various pain syndromes were grouped into the following categories: osteoarthritis pain, back pain, neuropathic pain, and other chronic non-malignant pain.

Pain Intensity

Pain intensity was analyzed on a scale ranging from 0 to 10 (0 = no pain, 10 = worst imaginable pain). Where efficacy was reported only in terms of change from baseline, the absolute pain intensity score was derived by combining the reported change from baseline with the baseline value. Pain intensity data from papers which failed to report baseline pain were discarded. Because the scales used to measure pain intensity were very heterogeneous, two broad categories were considered to capture the residual variability in the model: visual analog scale (VAS; continuous) and categorical scales. The rules used to convert the raw data into a 0–10 range are presented in Table 1.
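The normalization step can be illustrated with a minimal sketch. The actual conversion rules are those of Table 1, which is not reproduced here; the linear rescaling, the function names, and the convention that a change from baseline is signed (negative = improvement) are assumptions made purely for illustration.

```python
def to_0_10(score, scale_min, scale_max):
    """Linearly map a score from [scale_min, scale_max] onto the 0-10 range.

    Assumption: a simple linear rescaling; the paper's Table 1 may differ.
    """
    if scale_max <= scale_min:
        raise ValueError("scale_max must exceed scale_min")
    return 10.0 * (score - scale_min) / (scale_max - scale_min)


def absolute_score(baseline, change_from_baseline):
    """Recover the absolute score from a signed change (negative = improvement)."""
    return baseline + change_from_baseline


# Hypothetical inputs: a 65 mm reading on a 0-100 mm VAS, and an arm with a
# baseline of 6.9 that reports a mean change of -2.1 points.
vas_on_0_10 = to_0_10(65.0, 0.0, 100.0)
end_score = absolute_score(6.9, -2.1)
print(vas_on_0_10, round(end_score, 1))
```

If a trial reports the change as a positive reduction instead, the sign has to be flipped before calling `absolute_score`.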

Table 1 Conversion rules for each pain intensity scale

Graphical exploration of the data revealed a marked placebo response across studies (Fig. 1). The placebo effect in pain treatment is a well-known phenomenon [22] which was taken into account in the model development by capturing not only the pain intensity time course in the active treatment groups, but also in the placebo group. Capturing the precise time dynamics in the placebo groups was also important because the indirect comparison of treatments relies on a common (exchangeable) placebo response.

Fig. 1 Pain intensity (normalized to a 0–10 scale) over time, in patients treated for chronic non-malignant pain, with placebo, tapentadol, or tramadol. Each circle represents the arm-level average score, with a diameter proportional to the sample size in the arm. The outer curves give the 95% predictive interval and the bold curve the predicted median, using the final pain intensity model

The proposed structural model (1) for the kth pain intensity measured in the jth treatment arm of trial i, at time t included three components: (i) a baseline term (Base); (ii) the placebo and drug effects time courses, and (iii) the between-study random effects and residual error terms. The model was written as follows:

$$ \begin{aligned}{\text{PI}}_{ijk} &= g\left\{ {{\text{Base}}_{ij} + R_{ij} \times \left( {1 - {\text{e}}^{{ - \lambda \times t_{ijk} }} } \right) + \varepsilon_{ijk} } \right\}\\ g\left\{ x \right\} &= 10 \times \frac{{{ \exp }\left( x \right)}}{{1 + { \exp }\left( x \right)}}\end{aligned} $$

In this equation, $\text{Base}_{ij}$ is the baseline estimate (or intercept); $R_{ij}$ corresponds to the placebo or drug effect, reflecting the change from baseline; and $\varepsilon_{ijk}$ represents the residual (unexplained) variability. The exploratory graphical analysis showed a mono-exponential decrease of the response over time (in all treatment groups including placebo), which was parameterized in the model by the decay rate $\lambda$. In order to estimate the respective effect size and time course, separate $R$ and separate $\lambda$ parameters were introduced in the model for each drug (placebo, tramadol, and tapentadol). The decay rate was parameterized such that $\lambda_{\text{drug}} = \lambda_{\text{pbo}} + \lambda_{\Delta\text{drug}}$.
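As an illustration of the structural model, the following sketch evaluates the typical (population-level) time course, with random effects and residual error set to zero. It assumes that the exponent carries a negative sign, so that a positive λ acts as a decay rate. The effect size `R_PLACEBO` is not a published estimate: it is back-calculated here so that the curve plateaus at 4.8, an illustrative value. Only λ = 0.571 per week is taken from the text.

```python
import math


def logit10(pi):
    """Map a 0-10 pain score to the logit scale (inverse of g)."""
    p = pi / 10.0
    return math.log(p / (1.0 - p))


def g(x):
    """Map a logit-scale value back to the 0-10 pain scale."""
    return 10.0 / (1.0 + math.exp(-x))


def pain_intensity(t_weeks, base_logit, r, lam):
    """Typical pain intensity at time t (no random effects, no residual error)."""
    return g(base_logit + r * (1.0 - math.exp(-lam * t_weeks)))


BASE = logit10(6.9)                      # baseline on the logit scale
R_PLACEBO = logit10(4.8) - logit10(6.9)  # hypothetical, chosen for a 4.8 plateau
LAMBDA = 0.571                           # week^-1, placebo decay rate from the text

# Time at which half of the logit-scale effect is reached.
t_half = math.log(2.0) / LAMBDA

for week in (0, 1, 4, 12):
    print(week, round(pain_intensity(week, BASE, R_PLACEBO, LAMBDA), 2))
```

Because the effect is additive on the logit scale, the half-time refers to half of the logit-scale change, which is close to but not exactly half of the change on the 0–10 scale.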

In order to estimate the between-study variability, random effects were associated additively with the Base parameter and exponentially with the R parameter:

$$ \begin{aligned} {\text{Base}}_{ij} &= {\text{Base}} + \eta_{{{\text{Base}}_{ij} }} {\text{ with }}\eta_{{{\text{Base}}_{ij} }} \sim N\left( {0, \omega_{\text{Base}}^{2} } \right)\\ R_{ij} &= R \times e^{{\eta_{{R_{ij} }} }} \,{\text{with}}\,\eta_{{R_{ij} }} \sim N\left( {0, \omega_{R}^{2} } \right)\end{aligned} $$

In order to acknowledge our greater confidence in trials executed in larger populations, the residual error variance was entered in the model as inversely proportional to the number of patients (N) contributing to each data point.

$$ \varepsilon_{ijk} \sim N\left( {0, \frac{{\sigma_{\text{res}}^{2} }}{{N_{ijk} }}} \right). $$

As mentioned above, the residual variance differed according to whether the scale used to measure pain intensity was continuous (VAS) or categorical. These variances are hereafter referred to as $\sigma_{1}^{2}$ and $\sigma_{2}^{2}$, instead of a unique $\sigma_{\text{res}}^{2}$.
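The weighting scheme can be sketched as follows: the residual standard deviation of an arm-level mean shrinks with the square root of the number of patients, and the scale type selects which variance applies. The numerical values of the two standard deviations below are hypothetical placeholders, not the estimates of Table 3.

```python
import math

# Hypothetical residual standard deviations per scale type (not from Table 3).
SIGMA_BY_SCALE = {"VAS": 1.0, "categorical": 1.5}


def residual_sd(scale_type, n_patients):
    """Residual SD of one arm-level observation: sigma / sqrt(N)."""
    if n_patients <= 0:
        raise ValueError("n_patients must be positive")
    return SIGMA_BY_SCALE[scale_type] / math.sqrt(n_patients)


# An arm of 400 patients is weighted as twice as precise as an arm of 100.
sd_small = residual_sd("VAS", 100)
sd_large = residual_sd("VAS", 400)
print(sd_small, sd_large)
```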

The model was coded in R using the nlme function of package nlme [23]. This function fitted the non-linear mixed-effect model by the method of maximum likelihood.

Adverse Events and Drop-outs

The tolerability-related events (adverse events and drop-outs) were analyzed in terms of number of patients experiencing the event (at least once) during the treatment period. Using the same notation as above, the number of patients experiencing event E (at least once) was assumed to follow a binomial distribution with a probability $pE_{ij}$ and a sample size $N_{ij}$, such that:

$$ E_{ij} \sim {\text{Bin}}\left( {pE_{ij} , N_{ij} } \right). $$

The probability of a patient having an event in treatment arm j of trial i was modeled using a logistic model, as a function of the intercept ($\alpha_{0}$) and the m covariates ($X_{mij}$), including drug (parameterized either as a factor or as a dose–response relationship), and other covariates. A term for between-treatment variability ($u_{ij}$), assumed to be normally distributed in the logit scale, was also introduced in the model.

$$ \begin{aligned}& \log \left( {\frac{{pE_{ij} }}{{1 - pE_{ij} }}} \right) = \alpha_{0} + \mathop \sum \limits_{m} \beta_{m} X_{mij} + u_{ij} \\ & u_{ij} \sim N\left( {0, \omega_{E}^{2} } \right).\end{aligned} $$

This model expresses the log odds of the outcome (E) as a linear function of the predictors. Hence, the parameter $\beta_{m}$ measures the change in the log odds associated with a one-unit increase in $X_{mij}$; equivalently, $\exp(\beta_{m})$ is the corresponding odds ratio.
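A short sketch may help to show how the logistic model turns coefficients into event probabilities and odds ratios. The intercept and drug coefficient are hypothetical values chosen for illustration; the random effect `u` is set to zero to obtain the typical arm.

```python
import math


def inv_logit(x):
    """Inverse of the logit link: maps log odds to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def event_probability(alpha0, betas, covariates, u=0.0):
    """Probability of the event for one arm, given covariates and random effect u."""
    log_odds = alpha0 + sum(b * x for b, x in zip(betas, covariates)) + u
    return inv_logit(log_odds)


def odds(p):
    """Odds corresponding to a probability p."""
    return p / (1.0 - p)


ALPHA0 = -2.0      # hypothetical placebo log odds
BETA_DRUG = 1.2    # hypothetical log odds ratio of drug versus placebo

p_placebo = event_probability(ALPHA0, [BETA_DRUG], [0])
p_drug = event_probability(ALPHA0, [BETA_DRUG], [1])
odds_ratio = odds(p_drug) / odds(p_placebo)
print(round(p_placebo, 3), round(p_drug, 3), round(odds_ratio, 3))
```

The odds ratio recovered from the two probabilities equals `exp(BETA_DRUG)`, which is why the coefficients are reported as odds ratios after exponentiation.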

When enough data were available, the dose–response relationship was investigated using a linear model. Non-linear dose–response relationships were discarded a priori based on observed trends in exploratory graphics.

The potential for an increased risk of an adverse event under treatment seemed likely to be related to treatment duration; the alternative would be to hypothesize a one-off risk increase on treatment initiation, with no additional risk thereafter, however long the treatment was applied. Hence, treatment duration was always tested as a covariate in the tolerability events and drop-out rates meta-analyses.

The model was coded in R using the glmer function of package lme4 [24]. This function fitted the generalized linear mixed-effects model by the method of maximum likelihood.

Model Selection

During the model development phase, a cut-off of 4 points in the Akaike Information Criterion (AIC) value was used to decide which model to retain. Goodness-of-fit plots and visual predictive checks (VPCs) informed the decision of whether to consider the model appropriate for simulations or not. Goodness-of-fit plots included plots of observed versus predicted, and observed versus individual predicted values, stratified (as appropriate) by drug, to ensure adequacy of the fit across drugs.
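The selection rule can be written down as a small helper. The text only states that a 4-point cut-off in the Akaike Information Criterion was used; the direction of the comparison (the richer model is kept only if it lowers the criterion by at least the cut-off) and the function name are assumptions.

```python
def keep_richer_model(aic_reference, aic_candidate, cutoff=4.0):
    """Return True if the candidate model improves AIC by at least `cutoff` points.

    Assumption: lower AIC is better and ties below the cut-off favor the
    simpler reference model.
    """
    return (aic_reference - aic_candidate) >= cutoff


# Hypothetical AIC values for a model without and with a covariate.
print(keep_richer_model(1520.3, 1511.8))  # improvement of 8.5 points
print(keep_richer_model(1520.3, 1518.9))  # improvement of 1.4 points
```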

To obtain a VPC, the observations of the analysis dataset were simulated 1,000 times using the fitted model (structure, parameter estimates, and associated uncertainty). The distribution of the model predictions was superimposed onto the actual trial data to obtain a visual display of the model's ability to describe the data from which it was derived.

Indirect Comparison of Tramadol and Tapentadol

Particular attention was devoted to the comparison of tramadol and tapentadol benefit–risk ratios. In the absence of clinical trial results providing a head-to-head comparison between these two compounds, an adjusted indirect comparison method was considered.

Given the non-linear form (in the parameters) of the pain intensity time course model, tramadol and tapentadol were compared by simulation of typical time profiles. For this purpose, the typical tramadol dose was considered to be 300 mg qd, and the typical duration of a trial, 12 weeks. A total of 2,000 treatment arms (1,000 per group), each containing 1,000 patients, were simulated, with a baseline pain intensity of 7 out of 10. The predicted differences in mean pain intensity between the tramadol and tapentadol groups for each trial were then summarized by the median and 95% predictive interval.
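The simulation-based comparison can be sketched as below. This is a simplified stand-in, not the authors' implementation: the effect sizes, between-study variability, and residual standard deviation are hypothetical, a single decay rate is used for both drugs, the exponent is assumed to be negative, and parameter uncertainty is ignored. Only the design (1,000 arms per group, 1,000 patients per arm, baseline of 7, week 12) follows the text.

```python
import math
import random
import statistics


def g(x):
    """Map a logit-scale value to the 0-10 pain scale."""
    return 10.0 / (1.0 + math.exp(-x))


def logit10(pi):
    """Map a 0-10 pain score to the logit scale."""
    p = pi / 10.0
    return math.log(p / (1.0 - p))


def simulate_arm(rng, r_typical, lam, t, base_pi, omega_r, sigma_res, n_patients):
    """Simulate one arm-level mean pain score at time t."""
    r = r_typical * math.exp(rng.gauss(0.0, omega_r))         # between-study effect
    eps = rng.gauss(0.0, sigma_res / math.sqrt(n_patients))   # residual error
    return g(logit10(base_pi) + r * (1.0 - math.exp(-lam * t)) + eps)


def percentile(sorted_values, q):
    """Order statistic nearest to position q * (n - 1); adequate for a sketch."""
    return sorted_values[round(q * (len(sorted_values) - 1))]


rng = random.Random(2014)
R_TRM, R_TAP = -1.0, -0.6                 # hypothetical logit-scale effect sizes
LAM, OMEGA_R, SIGMA = 0.571, 0.2, 1.0     # LAM from the text; others hypothetical

diffs = sorted(
    simulate_arm(rng, R_TRM, LAM, 12.0, 7.0, OMEGA_R, SIGMA, 1000)
    - simulate_arm(rng, R_TAP, LAM, 12.0, 7.0, OMEGA_R, SIGMA, 1000)
    for _ in range(1000)
)
median_diff = statistics.median(diffs)
interval = (percentile(diffs, 0.025), percentile(diffs, 0.975))
print(round(median_diff, 2), [round(v, 2) for v in interval])
```

A negative median difference means lower pain intensity in the first group; whether the interval excludes zero is what the comparison reads off.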

For the comparison of event proportions, the indirect treatment comparison method of Bucher [25] was applied to derive the odds ratio between tramadol and tapentadol, as well as the associated confidence intervals.
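The adjusted indirect comparison through a common comparator can be sketched as follows: the log odds ratios of each drug versus placebo are subtracted, and their variances are added. The input odds ratios and standard errors below are hypothetical and serve only to show the arithmetic.

```python
import math


def indirect_odds_ratio(log_or_a, se_a, log_or_b, se_b, z=1.96):
    """Indirect comparison of A versus B via a common comparator.

    Inputs are log odds ratios (each drug versus the comparator) and their
    standard errors. Returns (odds ratio, lower limit, upper limit).
    """
    log_or_ab = log_or_a - log_or_b
    se_ab = math.sqrt(se_a ** 2 + se_b ** 2)  # variances add for independent estimates
    return (
        math.exp(log_or_ab),
        math.exp(log_or_ab - z * se_ab),
        math.exp(log_or_ab + z * se_ab),
    )


# Hypothetical inputs: drug A versus placebo OR = 4.0, drug B versus placebo OR = 2.0.
or_ab, lower, upper = indirect_odds_ratio(math.log(4.0), 0.2, math.log(2.0), 0.2)
print(round(or_ab, 2), round(lower, 2), round(upper, 2))
```

The resulting interval is wider than either direct interval, which reflects the extra uncertainty of not having head-to-head data.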


Literature Search Results

The analysis database consisted of 45 unique trial reports, representing publicly available (yet not free) knowledge gathered from 12,985 patients. The distribution of treatment arms per indication is provided in Table 2. The treatments evaluated were approximately equally distributed across pain syndromes.

Table 2 Number of treatment arms per syndrome

The mean (SD) duration of follow-up of the studies was 9.0 (6.8) weeks, with one trial exceeding 15 weeks, i.e., Wild et al. [16], which had a 52-week duration. Only the trial discussed by Wild et al. [16] was open-label; the others were double-blind, randomized controlled trials. The median age in these trials was 58 years (range 47–72 years), and 64% of participants were female. The treatment duration in case of osteoarthritis or back pain (median 12 weeks) was longer than the one in patients experiencing neuropathic (median 9 weeks) or other unspecific types of pain (median 4 weeks). The total number of observations available to fit the pain intensity model was 534.

Incidences of adverse events or drop-out rates were frequently, but not consistently, reported across trials. While the drop-out rate due to adverse events was available for each of the 45 trials contained in the meta-database, the other types of events were reported less frequently: constipation and nausea frequencies were reported in 40 articles, dizziness in 36, drop-out due to lack of efficacy in 37, vomiting in 31, and somnolence in 31. Whatever the event, the range of proportion of patients experiencing it was consistently large, reflecting the heterogeneity between trials and possibly between pain syndromes considered in this analysis. The most frequent adverse events were constipation and nausea, observed in up to 40% of patients exposed to tramadol, and dizziness, observed in 52% of the patients exposed to tramadol in the study by Norrbrink and Lundeberg [26].

Pain Intensity

The mean baseline pain intensity across studies was equal to 6.9 (SD = 0.72), with no marked differences between pain syndrome categories or between treatment groups. The decline in pain intensity in the placebo groups on a 0–10 scale is displayed in Fig. 1 (left panel). Without resorting to a statistical model, it would be very difficult to estimate the effect sizes of the active treatments and the associated uncertainties.

The model (1) was fitted to the data. With the final model (of which the parameter estimates are presented in Table 3), no obvious misspecification was found based on the goodness-of-fit plots. The visual predictive checks displayed in Fig. 1 were satisfactory. The placebo effect was modeled as a mono-exponential function of time, with a decay rate ($\lambda$) equal to 0.571 week$^{-1}$, corresponding to a time to reach 50% of the maximum effect ($t_{1/2}$) equal to 1.2 weeks. The onset of effect was found to be as fast in the active groups (tapentadol and tramadol) as in placebo.
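The reported half-time follows directly from the decay rate. Assuming the time course takes the form $1 - \text{e}^{-\lambda t}$, half of the maximum effect (on the transformed scale) is reached when $\text{e}^{-\lambda t_{1/2}} = 0.5$, so that:

$$ t_{1/2} = \frac{\ln 2}{\lambda} = \frac{0.693}{0.571\ \text{week}^{-1}} \approx 1.2\ \text{weeks} \approx 8.5\ \text{days}. $$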

Table 3 Parameter estimates of the final pain intensity model

As observed previously [27], subjects with high baseline pain intensity ($PI_{0ij}$) had a greater reduction in pain intensity than subjects with low baseline scores. This phenomenon was reflected in our model by a positive and significant $\theta_{\text{Base}}$ parameter estimate. In addition, for patients treated with tramadol, the extent of reduction was related to dose (Dose) by an $E_{\max}$ function. Due to lack of data, it was not possible to capture the dose–response relationship for tapentadol. However, patients treated with tapentadol received doses ranging between 100 and 250 mg twice daily (bid). This dose range constitutes the domain of validity of our results.

The extent of reduction $R_{ij}$ in model (1) was therefore expressed as:

$$ R_{ij} = R_{\text{Pla}} \times \left( {1 + \theta_{\text{Base}} \times {\text{logit}}\left( {\frac{{PI_{0ij} }}{10}} \right) + \theta_{\text{Oxy}} + \theta_{\text{Tap}} + \frac{{\theta_{\text{Trm}} \times {\text{Dose}}_{ij} }}{{{\text{ED}}_{50} + {\text{Dose}}_{ij} }}} \right). $$
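The covariate model for the extent of reduction can be sketched as a plain function. All parameter values below are hypothetical placeholders rather than the estimates of Table 3, and it is assumed that each drug-specific term is set to zero for arms that did not receive that drug, since the equation shows the terms without explicit indicator variables.

```python
import math


def extent_of_reduction(r_pla, theta_base, baseline_pi, theta_oxy=0.0,
                        theta_tap=0.0, theta_trm=0.0, ed50=1.0, dose=0.0):
    """Logit-scale extent of reduction R for one arm.

    Drug-specific terms default to zero, which corresponds to a placebo arm.
    `dose` is the tramadol dose and only matters when theta_trm is non-zero.
    """
    logit_base = math.log((baseline_pi / 10.0) / (1.0 - baseline_pi / 10.0))
    emax_term = theta_trm * dose / (ed50 + dose)
    return r_pla * (1.0 + theta_base * logit_base + theta_oxy + theta_tap + emax_term)


# Hypothetical parameter values, for illustration only.
R_PLA, THETA_BASE, THETA_TRM, ED50 = -0.9, 0.3, 1.0, 150.0

r_placebo = extent_of_reduction(R_PLA, THETA_BASE, 6.9)
r_tramadol = extent_of_reduction(R_PLA, THETA_BASE, 6.9,
                                 theta_trm=THETA_TRM, ed50=ED50, dose=150.0)
print(round(r_placebo, 3), round(r_tramadol, 3))
```

At a dose equal to `ED50`, the tramadol term contributes half of `THETA_TRM`, which is the defining property of the saturating dose–response function.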

Assuming a baseline pain intensity level of 6.9 on a scale ranging from 0 to 10, this model revealed that, in a typical trial, tramadol 300 mg qd would lead to a 46% (95% CI 41–51%) reduction of the pain intensity compared to baseline. The estimated reduction with tapentadol 100 to 250 mg bid would be equal to 36% (95% CI 35–37%), while placebo treatment would trigger a 28% (95% CI 23–33%) reduction in pain intensity compared to baseline.

The Monte Carlo simulations run to compare tramadol (300 mg qd) and tapentadol (100–250 mg bid), assuming a fixed baseline value of 6.9 (on a 0–10 pain intensity range), showed that at week 12 the pain intensity would be 0.69 points lower in the tramadol (300 mg qd) group compared to the tapentadol (100–250 mg bid) group. This difference was statistically significant, as illustrated in Fig. 2, yet not clinically relevant.
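As a rough cross-check of the simulated difference (not a substitute for the simulation itself), the percentage reductions reported above can be applied directly to the baseline value of 6.9:

$$ \begin{aligned} \text{PI}_{\text{tramadol}} &\approx 6.9 \times (1 - 0.46) \approx 3.73 \\ \text{PI}_{\text{tapentadol}} &\approx 6.9 \times (1 - 0.36) \approx 4.42 \\ \Delta\text{PI} &\approx 4.42 - 3.73 = 0.69 \end{aligned} $$

This back-of-the-envelope figure agrees with the 0.69-point difference obtained by simulation.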

Fig. 2 Distribution of 1,000 differences in predicted group-level pain intensity (normalized to a 0–10 scale) after 12 weeks of treatment with tramadol (300 mg once daily) versus tapentadol (100–250 mg twice daily) (ΔPI). Predictions of pain intensity were simulated from the final model, assuming 1,000 patients per arm. The solid and dashed vertical lines indicate the median and 95% prediction interval, respectively

Adverse Events and Drop-outs

Event rates were higher for opioids than for placebo for all events, with the exception of withdrawal due to lack of efficacy, for which the rate was higher with placebo. During the model building process, trial duration and type of pain syndrome covariates did not prove to be worth keeping in the final logistic models applied to adverse event frequencies. Hence, models with different intercepts between treatments were used to describe the data. Only for constipation could the slope of a linear and positive dose-dependency be estimated in patients treated with tramadol.

The model parameter estimates (and associated uncertainty), converted to odds ratios using placebo as a reference, are presented in Fig. 3. These results confirm the high frequency of adverse events associated with opioid-based therapies. Due to the large number of treatment arms contributing to this analysis, the precision of the estimates is relatively good.

Fig. 3 From left to right: number of arms (N arms), frequency of adverse event, odds-ratio (with 95% confidence interval) for the active groups versus placebo, and odds-ratio (with 95% confidence interval) for tramadol (300 mg once daily) versus tapentadol (100–250 mg twice daily), for each type of adverse event (Constip. constipation, Dizzin. dizziness, Somnol. somnolence)

The indirect comparison of odds-ratios between tramadol (300 mg qd) and tapentadol (100–250 mg bid) shows a significantly higher risk of experiencing constipation and vomiting when patients are treated with tramadol, and a slightly higher risk of dizziness when patients are treated with tapentadol (Fig. 3).

The analysis of drop-out frequencies shows expected differences between treatment groups (Fig. 4): more drop-outs due to adverse events (DO.AE) in the active treatment groups compared to placebo, and more drop-outs due to lack of efficacy (DO.LoE) in the placebo group. In both models (for DO.AE and DO.LoE), a dose-dependent relationship could be captured in the tramadol group. Patients treated with higher doses of tramadol were more prone to dropping out due to adverse events and less prone to dropping out for lack of efficacy. Dose coverage for tapentadol was insufficient to allow these trends to be captured in the tapentadol groups.

Fig. 4 From left to right: number of arms (N arms), proportion of drop-out patients, odds-ratio (with 95% confidence interval) for the active groups versus placebo, and odds-ratio (with 95% confidence interval) for tramadol (300 mg once daily) versus tapentadol (100–250 mg twice daily), per reason of withdrawal (DO.AE, drop-out due to adverse event; or DO.LoE, drop-out due to lack of efficacy)


Quantitative assessment of the efficacy and tolerability of treatments in pain was performed across different therapeutic and patient populations. The group-level results of 45 clinical trials in osteoarthritis, back pain, neuropathic pain, or other chronic non-malignant pain were pooled together. As expected, the reduction of pain intensity on placebo treatment was large and clinically important [28], so that a medication under investigation in an active arm needs a very large effect in order to differentiate from placebo. Assuming a baseline score of 6.9 (on a 0–10 range), the mean pain intensity in placebo groups at week 12 was found to be 4.8, i.e., a reduction of 2.1 points.

Based on the data currently available in the literature, the comparison of tramadol versus tapentadol efficacy indicated that the most effective treatment is tramadol (300 mg qd), with a median pain intensity score reduced from 6.9 at baseline (on a 0–10 range) to 3.7 after 12 weeks of treatment. For the same baseline value, the reduction estimated for patients treated with tapentadol (100–250 mg bid) led to a median score of 4.3 at week 12. The full time course of pain intensity was modeled, which allowed the assessment of the onset of effect for each compound. It was estimated that 50% of the maximal effect could be reached within 8 days after treatment initiation, whatever the treatment was (including placebo). The only significant covariate retained in the final pharmacodynamic model was the baseline pain intensity; the higher the pain score at baseline, the larger the extent of effect on treatment. This expected relationship [27] was observed in both active and placebo groups. The time dynamics were not significantly different across indications (osteoarthritic pain, back pain, neuropathic pain, or others), nor did the study duration influence the study outcomes.

With respect to the tolerability profile, the results of the current review were consistent with the ones reported in a previous meta-analysis [11] as illustrated in Supplementary Table S2 in the Electronic Supplementary Material.

In past analyses [11, 12], the variety of opioid drugs included (with different doses, different dosing schedules, different comparators, and in different conditions) meant that no statistical analysis was possible. Hence, no quantitative, formal conclusion could be drawn. Many of the publicly available results at that time referred to small and short-term trials, so that the risk of erroneous, incomplete, or imprecise conclusions from heterogeneous data was high. Since 2005, knowledge and data have increased in size and in nature. The access to a significant number of trials offers new perspectives to understand and evaluate the benefit–risk profile of analgesics in chronic non-malignant pain. Model-based meta-analysis techniques allow for controlling and measuring the variability in response that comes from differences in dose, time under treatment, and baseline characteristics. The mathematical framework not only gives the possibility to summarize the information in a clear and concise way, but also to predict the response in various hypothetical clinical scenarios. Hence, quantitative knowledge about competitor efficacy can be helpful when setting the desired safety and efficacy profile of a new drug candidate in pain management.

The majority of the studies included in this review were funded by the pharmaceutical industry. As recently discussed by Dunn et al. [29], the research agenda of the pharmaceutical industry may introduce a bias in the quality of results reported in the literature. In particular, less than adequate reporting of adverse event information in clinical trials is relatively common. While frequency was the only available data according to our review of the public-domain literature, it would be more informative to study incidence rates (number of events per unit time) or the time to (repeated) events. Patients starting opioids are usually told to expect initial adverse events, such as nausea and drowsiness, but that these will improve rapidly. Judging the adverse event time profile in the absence of longitudinal data analysis is unlikely to be sufficient to support such statements. Several authors have discussed the value of such data for a better benefit–risk assessment of new therapeutic treatments [30, 31].

The current method does not overcome the general weaknesses of many other meta-analyses, such as publication bias, mixed quality of the included studies, and their different strategies for selecting study subjects, obtaining exposure and outcome information, or controlling for potential confounders. Besides, the trials were not designed primarily to study adverse events and drop-out rates; hence, the results of the corresponding meta-analysis should be considered as hypothesis-generating. However, methods are now available to compare pairs of treatments in the absence of data from direct head-to-head comparisons [32]. While the present work focused on the comparison of two opioids, the same modeling framework could be used to expand this analysis to an entire network of treatments.


The meta-analysis suggests that the benefit–risk ratios of tramadol (300 mg qd) and tapentadol (100–250 mg bid) are similar or not markedly different, with a slightly larger efficacy for tramadol and a slightly better safety profile in favor of tapentadol. In spite of a clinically meaningful efficacy, information from large numbers of patients exposed to these opiate analgesics confirms that one in five patients will discontinue the treatment due to intolerable adverse events, most likely constipation or nausea. As with any meta-analysis, the conclusions must be treated with a degree of caution.

The presented framework can easily be adapted and re-used to address questions related to the development and prescription of pain management drugs. As more competitor information becomes available, the literature database can be extended and the modeling framework can be updated to include the most recent information. Such a longitudinal model-based meta-analysis can also represent a valuable tool for health authorities who want to evaluate new drug applications.