Key Points for Decision Makers

In a cost-effectiveness analysis, a dance and yoga intervention dominated current practice for girls aged 9–13 years with functional abdominal pain disorders (FAPDs).

Considering the high prevalence of FAPDs and that effective treatments are still rare, it is important to expand the number of cost-effective interventions.

The cost-effectiveness of the intervention provides important information for decision-makers when prioritizing and allocating public resources.

1 Introduction

Children worldwide are affected by functional abdominal pain disorders (FAPDs), with a significantly higher prevalence among girls (15.9%) than boys (11.5%) [1]. Irritable bowel syndrome (IBS), functional dyspepsia, abdominal migraine, and functional abdominal pain (FAP)—not otherwise specified, are considered as FAPDs [2]. FAPDs are characterized by recurrent or persistent abdominal pain and are associated with reduced quality of life (QoL) [3,4,5,6,7,8,9], depression, anxiety [6,7,8,9], school absenteeism [9, 10], and parental work absenteeism [3]. In the long run, abdominal pain is a risk factor for sustained pain [11,12,13,14,15] and mental health problems in later life [16, 17].

Beyond suffering for the children and their families, abdominal pain in childhood incurs high societal costs. A Dutch study estimated the total annual cost, including for healthcare, school support, travel, out-of-pocket expenses, and parental productivity loss, at approximately US dollars (US$) 3000 per child with FAPDs [18]. A study of adults with IBS reported higher healthcare consumption, more sick leave than healthy workers, and symptoms that considerably affected the patients’ productivity and ability to optimally perform their jobs [19].

Some evidence indicates that cognitive behavioral therapy (CBT) and hypnotherapy are effective for FAPDs [20, 21] but there is currently no convincing evidence supporting treatment with pharmaceuticals [22,23,24] or dietary treatments [23, 24]. Well-designed studies of new types of interventions are required [25,26,27,28] and there is a lack of cost-effectiveness studies for children with FAPDs.

To expand evidence of cost-effective interventions, the Just in TIME (JiT) research project hypothesized that combining dance and yoga could be an effective and cost-effective intervention for girls with FAPDs, specifically IBS or FAP [29]. Dance is a popular activity among girls [30] that has been shown to enhance both physical [31,32,33] and psychological health [31, 32, 34,35,36], and yoga has been shown to reduce pain and school absenteeism [37, 38], decrease IBS-related symptoms, and improve QoL and physical functioning [39,40,41] among children with FAPDs.

The efficacy evaluation of the primary outcome in the JiT study found a medium to high between-group effect size (Cohen’s d 0.67) and a significant reduction in abdominal pain compared with controls at the end of the intervention [42]. In this study, our aim was to evaluate the cost effectiveness of the JiT dance and yoga intervention for girls aged 9–13 years with FAPDs, compared with current practice. To our knowledge, this is the first study of this type of intervention for this target group.

2 Methods

2.1 Randomized Controlled Trial

The JiT study was a prospective randomized controlled trial (RCT) including 121 girls aged 9–13 years with FAPDs. The power calculation before the RCT was based on the primary outcome (abdominal pain) and the randomization into intervention or controls was based on minimization [43] according to pain intensity, both performed by an external statistician. This study was registered at ClinicalTrials.gov (identifier NCT02920268; Name: Just in TIME—Intervention With Dance and Yoga for Girls With Recurrent Abdominal Pain) and was approved by the Regional Ethical Review Board in Uppsala, Sweden (Dnr 2016/082 1-2). All participants voluntarily took part in the RCT and the legal guardians provided written informed consent.

Participants were recruited from the outpatient clinics at the Pediatric Departments of four hospitals, the school health services and the general public (via information about the study in the media and websites) in two Swedish regions (Region of Örebro and Region of Västmanland), and the primary healthcare and a counseling unit for children and adolescents in one of the regions (Region of Örebro). The inclusion criteria were girls aged 9–13 years diagnosed with FAP or IBS according to the Rome III criteria [44], with persistent pain after examination at the pediatric center. Girls who reported values of 4 or higher on the Faces Pain Scale–Revised (FPS-R) [45, 46] at least once during a week at baseline qualified for the study, whereas girls with celiac or inflammatory bowel disease, difficulties following oral instructions, severe mental problems, or being treated with CBT were excluded from the study [29]. The mean age was 10.5 years (standard deviation [SD] 1.37) and most participants (61%) had FAP. Most of the girls reported moderate (38%) or severe (39%) pain at baseline measured using the FPS-R (with values of 0–10) [45, 46] and grouped according to Tsze et al. [47]: ≤ 2 = no pain, > 2 ≤ 4 = mild pain, > 4 ≤ 6 = moderate pain, and > 6 = severe pain.

The intervention group (n = 64) received combined dance and yoga twice weekly (with the exception of school holidays) for 8 calendar months, on average 50 (48–52) sessions depending on recruitment year and site. Each session took 60 min: 30 min of dance practice, 25 min of yoga including relaxation, and 5 min of short reflection. The intervention was mirrored at both sites that were involved in the research project. Ahead of the start of the research project, all instructors received a 2-day course, including practical information about the standardized program, dance choreographies, yoga sequences and essential elements of the intervention. The instructors also received a written manual and the course was followed up with three booster sessions to ensure fidelity to the method. The control group (n = 57) was instructed to live as usual. This means that both groups had access to all types of healthcare or other support when needed, with one exception: girls receiving CBT were excluded. The control group was also allowed to participate in all types of activities if they wanted to. The efficacy study from the trial showed that the JiT dance and yoga intervention was superior to standard health care alone, with a medium to high between-group effect size and significantly greater pain reduction (b = −1.29, p = 0.002) at the end of the intervention [42]. A full description of the JiT study design has been published in the study protocol [29].

2.2 Economic Evaluation

This paper examines the health economic outcomes of JiT over 12-month and 10-year time horizons by developing a decision analysis tool, i.e., a decision tree followed by a Markov model. An important role of decision models is to bridge the gap between what has been observed in trials and what would be expected, in terms of costs and effects, over a long-term time horizon [48].

This was a cost-utility analysis using individual data [48], performed from both healthcare and societal perspectives, both to illustrate the perspective of a healthcare decision maker and to acknowledge that costs and effects occur outside the health care sector. This is also in line with current recommendations [49]. The base case considered healthcare costs and productivity loss. Quality-adjusted life-years (QALYs) gained were used to measure the effects. Cost-effectiveness ratios were based on the changes in QALYs and the net costs of the intervention group versus those in the control group. Results are presented as an incremental cost-effectiveness ratio (ICER) expressed as (Ca − Cb)/(Ea − Eb), where C denotes mean costs, E denotes mean effects (QALYs), and the subscripts a and b denote the two strategies being compared. The target population for the economic model analysis was young girls with FAP or IBS. Data from JiT were used to represent this population. Sensitivity analyses (described below) were performed to acknowledge uncertainty in parameter estimates. The reporting of the trial was based on the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) statement from 2013 [50], as these were prevailing when completing the study. An annual discount rate of 3% was applied to costs and effects for the period beyond 12 months, as recommended [49]. All costs were estimated according to Swedish values of the year 2021 and converted from Swedish krona (SEK) to US$ using an exchange rate of 1 SEK = 0.12 US$ valid in June 2021.
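
The ICER expression and the discounting rule can be illustrated with a minimal sketch. The function names are hypothetical, the inputs are the complete-case trial means per participant reported in Section 3 (societal perspective), and the convention that the first year beyond 12 months is discounted by one year is an assumption; the figures in Table 4 come from the model and need not match this arithmetic exactly.

```python
from typing import Optional


def discount_factor(year: int, rate: float = 0.03) -> float:
    """Discount factor for a value occurring `year` years after the first
    12 months (assumed convention; 3% is the annual rate stated in the text)."""
    return 1.0 / (1.0 + rate) ** year


def icer(cost_a: float, cost_b: float,
         effect_a: float, effect_b: float) -> Optional[float]:
    """Return (Ca - Cb) / (Ea - Eb), or None when the effects are equal."""
    delta_effect = effect_a - effect_b
    if delta_effect == 0:
        return None
    return (cost_a - cost_b) / delta_effect


def is_dominant(cost_a: float, cost_b: float,
                effect_a: float, effect_b: float) -> bool:
    """Strategy a dominates b when it is both cheaper and more effective."""
    return cost_a < cost_b and effect_a > effect_b


# Trial means per participant from Section 3 (US$ and QALYs).
cost_int, cost_ctl = 2786.0, 2831.0
qaly_int, qaly_ctl = 0.004, -0.0001

ratio = icer(cost_int, cost_ctl, qaly_int, qaly_ctl)
dominant = is_dominant(cost_int, cost_ctl, qaly_int, qaly_ctl)
```

With these inputs the ratio is negative because costs are lower and QALYs higher in the intervention group, which is the situation described as dominance.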

2.2.1 Measuring Health Benefits

QoL was measured using KIDSCREEN-10 [51] at baseline and 4, 8, and 12 months after by administering questionnaires to participants. KIDSCREEN-10 is a well-established self- and parent-reported index measuring subjective health and wellbeing among children and adolescents. KIDSCREEN-10 has shown good reliability and ability to differentiate between different groups [52]. It is available in Swedish and consists of 5-point response scales ranging from 0 = ‘not at all’ to 4 = ‘extremely’, or from 0 = ‘never’ to 4 = ‘always’ [51]. The self-report version was used in JiT. However, KIDSCREEN-10 is not a preference-based instrument, meaning that it cannot be used to derive QALYs for use in health economic studies. KIDSCREEN-10 data were therefore converted, using a statistical algorithm [53], to the Child Health Utility 9D (CHU9D) [54] to obtain utility scores with which to estimate QALYs. CHU9D is a generic preference-based measure of health-related QoL suitable for 7- to 17-year-olds. The CHU9D questionnaire was developed specifically for children [54], and the preference weights used in the mapping study [53] were obtained from a sample of adolescents in Australia.

QALYs were estimated by multiplying the time spent in a given health state (baseline—4 months, 4–8 months and 8–12 months) by the average utility weight associated with the period, using the trapezium rule [55].
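
As a sketch of the trapezium rule described above, the following function computes the area under the utility curve across the four assessment points. The function name and the utility values are hypothetical and serve only to show the arithmetic; they are not trial data.

```python
def qalys_trapezium(times_in_months, utilities):
    """Area under the utility curve, in QALYs (utility x years).

    Each period contributes its length in years multiplied by the mean
    of the utilities measured at the start and end of that period.
    """
    if len(times_in_months) != len(utilities):
        raise ValueError("one utility value is needed per time point")
    total = 0.0
    for i in range(1, len(times_in_months)):
        years = (times_in_months[i] - times_in_months[i - 1]) / 12.0
        mean_utility = (utilities[i] + utilities[i - 1]) / 2.0
        total += years * mean_utility
    return total


# Assessment schedule from the text: baseline, 4, 8 and 12 months.
assessment_months = [0, 4, 8, 12]
# Hypothetical CHU9D utilities for one participant.
example_utilities = [0.80, 0.83, 0.85, 0.84]
example_qalys = qalys_trapezium(assessment_months, example_utilities)
```

A participant with a constant utility of 1.0 would accrue one QALY over the 12 months, which is a convenient sanity check for the function.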

2.2.2 Measuring Costs

From the health care perspective, the total cost comprises intervention costs and healthcare costs accruing to studied girls; from the societal perspective, production losses to the girls and their legal guardians are added to the total (Table 1).

Table 1 Unit costs and sources

As this was a new intervention, it was assumed that no trained instructors would be available, therefore the training costs were included in the intervention costs in the base case analysis. Intervention costs included (1) costs of training the instructors, in a 2-day course covering practical instructions for the intervention sessions as well as delivery mode and teaching style; (2) remuneration for the dance and yoga instructors, who were compensated for each dance and yoga session, an additional hour for preparation, and all training and ‘booster’ sessions mandatory for instructors to ensure standardization of program delivery; (3) rental for the dance/yoga studio; (4) costs of music; and (5) material costs (e.g., blankets and lights) [see electronic supplementary material (ESM) Table 1]. The average class size in the trial at inclusion was 13.2 participants per intervention group; based on this information and experiences from the dance and yoga instructors in the trial, 15 participants were estimated to be a suitable number in each intervention group in the model.

In the questionnaires distributed at the 4- and 8-month follow-ups, the study participants were asked to state the number of visits to the school nurse, school physician, school psychologist, or school counsellor made during the last term. Other outpatient healthcare utilization (i.e., visits to physicians, nurses, physiotherapists, psychologists, or counsellors) was measured with an open question to the legal guardians in their questionnaires at all follow-ups, asking about visits over the past 3 months. The visits to each health profession reported at all follow-ups were summed to obtain the number of healthcare visits per profession per year.

Outpatient healthcare resource use was costed using national Swedish tariffs. Since no tariffs for school healthcare exist, these cost items were, given the shorter visit durations, less sampling, and lower rent for school premises, estimated at half the cost per visit relative to outpatient healthcare visits. Next, the cost of healthcare utilization was estimated by multiplying the number of healthcare visits per profession by the unit cost per professional visit. Finally, the costs of school healthcare and other health care were summed separately.

Production losses were associated with school absenteeism for the girls and with work absenteeism for their legal guardians. Data on school absence were collected at baseline and at each follow-up using two questions in the participants’ questionnaires, regarding absence from school for whole days or parts of days. Each question included six response alternatives ranging from ‘never’ to ‘several times each week’, coded as never = 0 days absent, sometimes during each term = 3 days absent, once each month = 5 days absent, two to three times each month = 10 days absent, once each week = 20 days absent, and several times each week = 40 days absent. Absence from school for part of a day was estimated to be absence for a quarter of the day. Absences from school (for all or part of a day) were summed into a single variable at each follow-up, and finally an average school absence value was calculated for the whole year.
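
The coding of the two absence questions can be sketched as a lookup table. The day values and the quarter-day weight for part-day absence are taken from the paragraph above; the function and variable names are hypothetical, and the averaging across follow-ups is not shown because the text does not specify it in detail.

```python
# Days absent assigned to each response alternative (from the text).
DAYS_BY_RESPONSE = {
    "never": 0,
    "sometimes during each term": 3,
    "once each month": 5,
    "two to three times each month": 10,
    "once each week": 20,
    "several times each week": 40,
}

# A part-day absence counts as a quarter of a school day (from the text).
PART_DAY_FRACTION = 0.25


def absence_days(whole_day_response: str, part_day_response: str) -> float:
    """Combine whole-day and part-day absence into whole-day equivalents."""
    whole = DAYS_BY_RESPONSE[whole_day_response]
    part = DAYS_BY_RESPONSE[part_day_response] * PART_DAY_FRACTION
    return whole + part


# Example: absent whole days once a month and part of a day once a week.
example_days = absence_days("once each month", "once each week")
```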

According to Swedish laws, legal guardians can stay home from work to care for ill children until their children are 12 years of age. One legal guardian was therefore assumed to be absent from work for the same number of days as their girl was absent for whole days, until the girls turned 12 years of age. No absenteeism from work was assumed for the legal guardians when their girls were absent for part days. The average work absence value was calculated for the whole year.

School absenteeism for the girls and work absenteeism for the legal guardians were costed in the same way, i.e., according to the mean wage (including social fees) in Sweden in year 2021. Lastly, absenteeism was multiplied by the mean wage to derive a total cost.
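
A minimal sketch of the production-loss calculation follows. The daily value of US$276 is the cost per school day cited in the Discussion; applying the same daily value to the guardian follows the statement that both were costed in the same way. The function name, the argument names, and the reading of 'until the girls turned 12' as age below 12 are assumptions.

```python
COST_PER_DAY_USD = 276.0  # cost per school day cited in the Discussion


def production_loss(whole_days: float, part_day_occasions: float,
                    girl_age: int,
                    cost_per_day: float = COST_PER_DAY_USD) -> float:
    """Yearly production loss (US$) for one girl and one legal guardian.

    The girl's absence counts whole days plus a quarter day per part-day
    occasion. The guardian is assumed absent from work only on the girl's
    whole days, and only while the girl is younger than 12 (assumption).
    """
    girl_days = whole_days + 0.25 * part_day_occasions
    guardian_days = whole_days if girl_age < 12 else 0.0
    return (girl_days + guardian_days) * cost_per_day


# Example: 5 whole days and 4 part-day occasions for a 10-year-old.
example_loss = production_loss(5, 4, girl_age=10)
```

For a girl aged 12 or older with the same absence pattern, only her own six day-equivalents would be costed.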

2.2.3 Economic Model

An economic model combining a within-trial decision tree (Fig. 1) and a Markov cohort model with a time horizon of 10 years (Fig. 2) was constructed. TreeAge Pro version 2021 R1.2 was used for analysis.

Fig. 1

Twelve-month decision tree structure. Participants are distributed in terms of allocation to the intervention group/control group, and the decision tree shows the different pathways that the participants could follow from baseline to the 12-month follow-up period. FAPDs functional abdominal pain disorders, dashed line indicates that no participants moved along that pathway

Fig. 2

The long-term Markov model. Patients begin in each of the pain groups, i.e. no pain, mild pain, moderate pain, and severe pain, where they ended after the first 12 months. The arrows indicate possible transitions. Participants may either stay in the same state or move between each of the pain states over time. The model runs for 10 years

The initial decision tree structure (Fig. 1) was based on the pathways and outcomes from baseline to the 12-month follow-up in JiT. The distribution of participants in terms of allocation to intervention/controls, abdominal pain status at baseline, and subsequent abdominal pain status at the 12-month follow-up is illustrated in the model. Abdominal pain status was measured using the FPS-R [45, 46] and recorded in a pain diary. The highest reported abdominal pain during a 1-week period was used and was divided into the four different states according to the cut-offs defined by Tsze et al. [47]. The participants were distributed between the pain states at the start of the model. Since one of the inclusion criteria for the trial was to report four or higher according to the FPS-R, no participants were identified as having no pain at baseline.
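
The assignment of a participant to a pain state can be sketched as follows, using the highest FPS-R score in the diary week and the cut-offs of Tsze et al. quoted in Section 2.1. The function name and state labels are hypothetical.

```python
def pain_state(weekly_scores):
    """Map the highest FPS-R score (0-10) in a 1-week diary to a pain state.

    Cut-offs as quoted in the text: <= 2 no pain, > 2 to <= 4 mild,
    > 4 to <= 6 moderate, > 6 severe.
    """
    highest = max(weekly_scores)
    if highest <= 2:
        return "no pain"
    if highest <= 4:
        return "mild"
    if highest <= 6:
        return "moderate"
    return "severe"


# Example diary week (hypothetical scores): the highest score, 6, decides.
example_state = pain_state([0, 2, 4, 6, 2, 0, 4])
```

Because inclusion required a score of 4 or higher at least once during the baseline week, this mapping cannot return 'no pain' for any included participant at baseline.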

Each pathway in the decision tree was assigned a QALY value and mean costs for health care, production losses, and intervention costs (only applicable to the intervention pathways). The decision tree parameters are summarized in Table 2 in the ESM. Probabilities of moving in different pathways in the decision tree were derived from JiT (ESM Table 3).

Subsequently, a Markov cohort model (Fig. 2) was constructed for each study arm to extrapolate the findings from JiT over a long-term time horizon. The arrows in the model show how participants can move through the model over the cycles, with the time horizon of 10 years being split into 10 cycles, until the participants reached adulthood, because of the differences in disease outcomes between children and adults. The distribution of participants across health states at the start of the Markov model differed between the states, reflecting the observed proportion of participants in each health state at the end of the decision tree.

The probabilities (Table 2) of moving between the health states in the Markov model correspond to the proportion of participants moving between different health states from baseline to the 12-month follow-up in JiT. The probabilities of moving from the no-pain state were derived from the control group’s movements at the 4- and 8-month follow-up. In the base case analysis, the intervention effect was assumed to successively decrease over 5 years, therefore no intervention effect was assumed during the last 5 years in the model. This assumption was based on a preliminary analysis from a 24-month follow-up in JiT and other studies of interventions for FAPDs with long-term follow-ups [37,38,39, 56]. Other assumptions regarding the intervention effect were handled in the sensitivity analyses.
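
Deriving transition probabilities as observed proportions can be sketched as below. The pairs of (baseline state, 12-month state) are hypothetical and do not reproduce Table 2; the function name is also an assumption.

```python
from collections import Counter

STATES = ["no pain", "mild", "moderate", "severe"]


def transition_matrix(pairs, states):
    """Row-wise proportions of participants moving from each origin state.

    `pairs` holds one (state at baseline, state at 12 months) tuple per
    participant. A row is None when nobody started in that state.
    """
    counts = Counter(pairs)
    matrix = {}
    for origin in states:
        row_total = sum(counts[(origin, dest)] for dest in states)
        if row_total == 0:
            matrix[origin] = None
        else:
            matrix[origin] = {
                dest: counts[(origin, dest)] / row_total for dest in states
            }
    return matrix


# Hypothetical observations for eight participants.
example_pairs = [
    ("mild", "no pain"), ("mild", "mild"),
    ("moderate", "mild"), ("moderate", "moderate"),
    ("severe", "moderate"), ("severe", "severe"),
    ("severe", "severe"), ("severe", "mild"),
]
example_matrix = transition_matrix(example_pairs, STATES)
```

The empty 'no pain' row in this example mirrors the situation in the trial, where nobody had no pain at baseline, which is why those probabilities had to be derived from later follow-ups.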

Table 2 Base case model inputs, transition probabilities for the Markov model

Costs and QALYs associated with each state were assumed to be the same for both groups (i.e., with/without intervention), so the differences between the groups were the transition probabilities and the distribution between the states at the start of the Markov model. Costs and QALYs for the base case Markov model are summarized in Table 3.

Table 3 Base case model inputs, costs and QALYs for the Markov model.

A time-dependency factor for the no-pain state was incorporated into the model. This means that being in the no-pain health state for 2 years/cycles or more increased the probability (by 10 percentage points each year) of staying in the no-pain health state. Finally, the model was run with a half-cycle correction to acknowledge that transitions in reality can occur at any time during the cycle [57].
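
The mechanics of a Markov cohort model with yearly cycles, 3% discounting and a half-cycle correction can be sketched in a few lines. This is not the TreeAge implementation used in the study: the transition probabilities, starting distribution and state utilities are hypothetical, the time-dependency factor for the no-pain state is left out, and averaging state membership at the start and end of each cycle is one common way (assumed here) to apply the half-cycle correction.

```python
def run_markov(start, transitions, state_values, cycles=10, rate=0.03):
    """Discounted sum of a per-cycle state value (cost or QALY) for a cohort.

    start        -- share of the cohort in each state when the model begins
    transitions  -- square matrix, transitions[i][j] = P(move from i to j)
    state_values -- value accrued per cycle spent in each state
    """
    n = len(start)
    dist = list(start)
    total = 0.0
    for cycle in range(cycles):
        nxt = [sum(dist[i] * transitions[i][j] for i in range(n))
               for j in range(n)]
        # Half-cycle correction: average membership at cycle start and end.
        mid = [(a + b) / 2.0 for a, b in zip(dist, nxt)]
        value = sum(m * v for m, v in zip(mid, state_values))
        # The first Markov cycle is the first year beyond 12 months, so it
        # is discounted by one year (assumed convention).
        total += value / (1.0 + rate) ** (cycle + 1)
        dist = nxt
    return total


# States: no pain, mild, moderate, severe. All values are hypothetical.
example_transitions = [
    [0.70, 0.20, 0.07, 0.03],
    [0.30, 0.40, 0.20, 0.10],
    [0.15, 0.30, 0.35, 0.20],
    [0.10, 0.20, 0.30, 0.40],
]
example_start = [0.2, 0.3, 0.3, 0.2]
example_utilities = [0.85, 0.80, 0.75, 0.70]
example_qalys = run_markov(example_start, example_transitions,
                           example_utilities)
```

Running the function once per arm, with arm-specific starting distributions and transition probabilities but shared state values, gives the incremental QALYs; the same function applied to state costs gives the incremental costs.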

2.2.4 Sensitivity Analysis

Methodological uncertainty (of model inputs and other assumptions) was handled by means of deterministic sensitivity analyses.

First, the intervention cost was varied (only affecting the decision tree): (1) we hypothesized different training costs ranging from no costs at all (assuming that trained instructors already existed) to higher training costs due to few (i.e., 10) participants in the groups; (2) the intervention cost was varied to account for different sizes of intervention groups (i.e., 10–30 participants); and (3) with no other variables changed, the cut-off (i.e., the point at which the intervention arm ceased to dominate or the ICER exceeded the recommended limit) for the intervention cost was investigated.

Second, since the base case analysis was a complete-case analysis, we also performed an analysis with imputed data for the QoL measure using multiple imputation with fully conditional specification using logistic regression. Given the proportion of missing data (29%), 15 imputed datasets were created, and the estimates obtained from each imputed dataset were combined to generate a mean estimate. As recommended, QoL was imputed at the item level [58] for KIDSCREEN, after which the CHU9D measures were estimated. The imputation process was performed in IBM SPSS Statistics 25 (IBM Corporation, Armonk, NY, USA). Missing values for healthcare visits and school absence were assumed to indicate no visits or absence, and were not imputed.

Third, the costing of the production losses was also varied, estimating lower and higher values based on lower (50%) and higher (150%) costing of school absences. The production losses were also set to 0, which is equivalent to the health care perspective.

Fourth, the costs of visits to school healthcare workers were assumed to be the same as those of visits to other health care professionals instead of 50% of the costs.

Fifth, different scenarios were applied for the long-term intervention effects, which means that the transition probabilities were different for different scenarios. As mentioned above, the base case scenario included a successively decreasing intervention effect for 5 years, and thereafter no remaining effect. The second scenario assumed a continued full intervention effect for all 10 years. The third scenario was that no intervention effect remained after 12 months, implying that the intervention group continued with the natural course of the illness (in line with the control group) from their level after 12 months. The fourth scenario expected no persistent intervention effect and that the participants would fall back into their original pain group after the first year. These scenarios are summarized in Fig. 3.

Fig. 3

Different scenarios for the remaining intervention effect applied in the sensitivity analysis
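
One way to express the scenarios is as a weight on the intervention effect for each model year, used to blend intervention-arm and control-arm transition probabilities. The text does not state the functional form of the successive decrease, so the linear decline below is purely an assumption, as are the function names and scenario labels. The third and fourth scenarios share a weight of zero and differ only in the starting distribution (12-month level versus original pain group), which this sketch does not model.

```python
def effect_weight(year: int, scenario: str) -> float:
    """Share of the intervention effect remaining in Markov year 1..10."""
    if scenario == "base":
        # Assumed linear decline over years 1-5, no effect from year 6.
        return max(0.0, 1.0 - (year - 1) / 5.0)
    if scenario == "full":
        return 1.0
    if scenario in ("none_after_12_months", "fall_back"):
        return 0.0
    raise ValueError(f"unknown scenario: {scenario}")


def blended_probability(p_intervention: float, p_control: float,
                        weight: float) -> float:
    """Weighted mix of intervention-arm and control-arm probabilities."""
    return weight * p_intervention + (1.0 - weight) * p_control


# Remaining effect per year under the base case assumption.
base_case_weights = [effect_weight(y, "base") for y in range(1, 11)]
```

Because every probability in a row is blended with the same weight, each blended row still sums to one when both source rows do.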

Finally, to assess the level of parameter uncertainty, we also performed a probabilistic sensitivity analysis (PSA). Costs and QALYs were each assigned a base case value (based on JiT), and for each value, an SD and distribution were assumed (Tables 2, 3, and ESM Tables 2, 3). The SDs of costs and QALYs in the decision tree pathways were based on both interventions and controls to account for variability in the means. In a Monte Carlo simulation, 10,000 iterations were run using values drawn from appropriate distributions to estimate expected costs and outcomes as well as measures of statistical uncertainty, which are presented in the cost-effectiveness plane. Each of the 10,000 estimates was compared against different willingness-to-pay (WTP) thresholds. The results of this analysis are presented in the form of a cost-effectiveness acceptability curve (CEAC).
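
How a CEAC is built from Monte Carlo output can be sketched as follows. In the study each iteration re-ran the model with parameters drawn from their own distributions; here, as a simplifying assumption, incremental costs and QALYs are drawn directly from normal distributions with hypothetical means and SDs. An iteration counts as cost effective when its incremental net monetary benefit (WTP × ΔQALY − Δcost) is positive.

```python
import random


def simulate_psa(n_iter, mean_dcost, sd_dcost, mean_dqaly, sd_dqaly, seed=0):
    """Draw (incremental cost, incremental QALY) pairs; normal draws are a
    simplification of the parameter-level sampling described in the text."""
    rng = random.Random(seed)
    return [(rng.gauss(mean_dcost, sd_dcost), rng.gauss(mean_dqaly, sd_dqaly))
            for _ in range(n_iter)]


def prob_cost_effective(draws, wtp):
    """Share of iterations with positive incremental net monetary benefit."""
    hits = sum(1 for dcost, dqaly in draws if wtp * dqaly - dcost > 0)
    return hits / len(draws)


def ceac(draws, thresholds):
    """Probability of cost-effectiveness at each willingness-to-pay value."""
    return [(wtp, prob_cost_effective(draws, wtp)) for wtp in thresholds]


# Hypothetical inputs: mean saving of US$500, mean gain of 0.01 QALYs.
example_draws = simulate_psa(10_000, -500.0, 1500.0, 0.01, 0.05)
example_curve = ceac(example_draws, [0, 25_000, 50_000, 75_000, 100_000])
```

Plotting the second element of each pair against the first gives the acceptability curve; the same draws plotted as points give the cost-effectiveness plane.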

3 Results

The intervention cost of the dance and yoga intervention was US$847 per participant, and the mean cost of healthcare utilization during the trial was, when using complete cases, somewhat lower for the intervention group (US$587, 95% confidence interval [CI] 399–775) than for the control group (US$738, 95% CI 477–1000), with considerable variability. Furthermore, the mean production loss was somewhat lower in the intervention group (US$1352, 95% CI 966–1737) than in the control group (US$2092, 95% CI 1262–2923), with high variability. Summing the costs of health care utilization, production loss, and intervention gave a total cost of US$2786 (95% CI 2301–3271) in the intervention group and US$2831 (95% CI 1945–3718) in the control group. The unit costs for each parameter are summarized in Table 1. There was also a non-significant QALY gain in the intervention group during the trial (0.004, 95% CI −0.027 to 0.034) and a non-significant QALY loss in the control group (−0.0001, 95% CI −0.026 to 0.026). The mean values of health care visits, days of school and work absence, and QoL are presented in ESM Table 4.

3.1 Cost-Effectiveness

3.1.1 Short-Term Analysis

The cost-effectiveness results are presented in Table 4. The 12-month decision tree analysis indicated that the intervention was the dominant strategy. There were incremental savings and small QALY gains, resulting in a negative ICER for the base case analysis. This means that there were lower costs and higher QALY gains in the intervention group compared with the control group. The results of the PSA showed that the probability of being cost effective was 0.56 at a WTP threshold of US$50,000. From a healthcare perspective, with a 12-month horizon, the ICER was US$157,328 for a gained QALY, indicating that conducting the intervention in addition to standard health care was less cost effective than standard health care alone when only healthcare was considered.

Table 4 Cost-effectiveness results, base case, and sensitivity analyses.

3.1.2 Long-Term Analysis

The long-term Markov model analysis also indicated a negative ICER based on an incremental saving and a QALY gain (Table 4). The uncertainty of this estimate is presented in both Fig. 4, showing the spread of points on the cost-effectiveness plane, and Fig. 5, showing the CEAC.

Fig. 4

Long-term base-case cost-effectiveness plane. Incremental costs and effectiveness reflect the mean sum costs and QALYs of the JiT intervention versus current practice. Each dot reflects a model iteration; points that lie above 0 for effectiveness indicate that the JiT intervention is more effective, and points that lie above 0 for cost indicate that the JiT intervention is more expensive. Most model iterations lie above 0 for effectiveness and below 0 for cost, indicating that the JiT intervention is more effective and cheaper. The WTP line indicates the threshold value of US$50,000 for a gained QALY; dots to the right of this line are seen as cost effective. JiT Just in TIME, QALY quality-adjusted life-year, WTP willingness to pay

Fig. 5

Long-term base-case CEAC. The graph plots the probability that the JiT intervention and current practice are cost effective at various thresholds for willingness to pay for a QALY. CEAC cost-effectiveness acceptability curve, QALY quality-adjusted life-years

The analysis from the healthcare perspective, excluding the production losses, was in line with the findings for the societal perspective, showing a negative ICER.

3.2 Sensitivity Analysis

The sensitivity analyses showed that the results were not sensitive to alternative assumptions regarding the training cost for the intervention, costing of production loss due to school absence, and costing of school healthcare (Table 4).

Changing the regular cost of the intervention resulted in a cut-off at US$1007/participant, to be cost-effective given a WTP threshold of US$50,000, with no change in training cost.

Moreover, running the 12-month analysis and the long-term Markov model with imputed values for QALYs reached the same conclusion as in the base case analysis with negative ICERs.

Finally, in the long-term analysis, the impact of assumptions on the enduring intervention effect was analyzed using different scenarios. These analyses indicated that results were not sensitive to different assumptions as to the intervention effect, resulting in ICERs similar to those of the base case scenario.

4 Discussion

The efficacy and cost-effectiveness of interventions provide important information for decision-makers when prioritizing and allocating public resources. This study aimed to evaluate the cost-effectiveness of a dance and yoga intervention for young girls with FAPDs, based on the JiT study. The main findings from the base case analysis show a small QALY gain as well as savings in societal costs from both the 12-month and long-term perspectives, resulting in negative ICERs. These findings indicate that a dance and yoga intervention for girls with FAPDs has the potential to be cost effective compared with standard care in Sweden. However, the sensitivity analysis indicated that the 12-month results do not seem very robust: in the PSA, the probability of being cost effective was 0.56 when stakeholders are willing to pay US$50,000 for a QALY gained. Moreover, the short-term sensitivity analyses demonstrated that the intervention was not cost effective when evaluated from a healthcare perspective. This indicates that the intervention had its considerable effect primarily on production loss, i.e., school absence for the children and work absence for their legal guardians. Over a long-term horizon, the finding seems more robust, with a probability of 0.78 when stakeholders were willing to pay US$50,000 for a QALY gained. There was, however, considerable uncertainty in the estimates, with points spread in all quadrants of the cost-effectiveness plane (Fig. 4), which gave a CEAC with a rising and then falling slope owing to savings in some estimates (74%), not all of which involved health gains [59].

As this is the first study of dance and yoga combined for this target group, no previous studies are completely comparable. Moreover, cost-effectiveness analyses of non-pharmacological treatments in pediatric FAPDs are rare. However, a recent economic study of CBT among children with FAPDs showed, after a 10-week intervention, a somewhat higher QALY gain (0.0187) and greater cost savings (US$1050.49) than the JiT study [56]. Another study of internet CBT for adolescents with IBS showed a smaller QALY gain (0.0031) and higher costs (US$170.24) after a 10-week intervention [60] compared with the JiT study. In addition, a study of social learning and CBT intervention targeting parents of children with FAP showed a decrease in health care visits after 6 months [61] and one study of hypnotherapy for children (8–18 years of age) with IBS or FAP showed improved QoL [62]. A study of dance from an economic perspective, examining an 8-month dance intervention for teenage girls with internalizing symptoms, indicated a high probability of cost-effectiveness (95%) at the 20-month follow-up, with a QALY gain of 0.1 and net costs of US$383 [63]. No cost-effectiveness studies of yoga for children with abdominal pain have been found, but a systematic review of yoga as a treatment for IBS demonstrated positive effects on QoL [41].

The diversity of findings of different studies may be attributable to several factors. Because all mentioned full cost-effectiveness studies were performed in Sweden, country-specific differences did not affect the outcomes. This fact also motivates remarks about some contextual factors that are important to consider for the generalizability of our findings. In Sweden, health care is free of charge for children and there is also overall support for school health care stated in the school act [64]. In addition, the Swedish welfare system allows parents to stay home with ill children until the child is 12 years of age, which they were also assumed to do in the present study. We believe that the general pattern of health care and intervention costs reported in the present study is likely to be valid across countries, although the bearer of the costs might differ depending on how health care is organized and financed. However, costs resulting from work absence might be affected by differences between countries regarding parents’ right to stay home with ill children or labor force participation. A substantial difference between the studies was the intervention cost, which was considerably higher in JiT (US$847) and for the dance intervention studied by Philipsson et al. [63] (US$670) compared with the various CBT interventions studied by Lalouni and colleagues [56, 60] (US$186) and Sampaio et al. [60] (US$178). The higher cost may depend on the length (8 months) and frequency (twice weekly) of the interventions in both JiT and the study by Philipsson et al. [63]. In addition, the interventions examined by Lalouni et al. [56] and Sampaio et al. [60] were digitally distributed over the internet, which likely reduced their costs. The fact that the intervention is costly and requires a significant time investment by children and parents may be a weakness of the JiT intervention. On the other hand, interventions that contribute to an increase in regular physical activity over time can lead to sustained positive lifestyle habits and counteract the alarming levels of physical inactivity in young girls [65].

An important source of variation between health economic studies in general is that they measure costs and QoL using different instruments. This variety may lead to differing results of economic evaluations, with implications for policy recommendations [66]. In JiT, visits to school health care and other outpatient health care were measured using an open-ended, self-composed question, and school absence was measured using two multiple-choice questions. Lalouni et al. [56] and Sampaio et al. [60] both used an adapted version of the Trimbos and Institute of Medical Technology Assessment Cost Questionnaire for Psychiatry (TIC-P) [67], whereas Philipsson et al. [63] only measured visits to the school nurse when estimating resource use. In general, TIC-P and the open-ended questions used in the JiT study include the same cost items, although the TIC-P instrument also includes questions about medication, which were not covered by the JiT study. Overall, the use of a self-composed, unvalidated questionnaire for healthcare utilization might be a limitation of the JiT study. Moreover, the coding of the multiple-choice question about school absence used in JiT may have affected the outcomes; for example, the choice ‘several times a week’ was coded as ‘2’, which may have been an underestimation. In addition, the unit costs diverged; in particular, school absence was costed higher in JiT (US$276/school day) than in the studies by Lalouni et al. (US$74) [56] or Sampaio et al. (US$74) [60]. In the JiT study, school absence was costed in the same way as work absence, using the mean wage in Sweden. In contrast, Sampaio et al. used the estimated daily cost of a child attending high school [60], while the costing process was not presented by Lalouni et al. [56]. These types of valuation of lost childhood education have been applied in several studies. However, according to Andronis et al. [68], none of these approaches is suitable for valuing the alternative use of children’s time; they and other approaches are nevertheless used in the absence of overall recommendations. Moreover, a weakness of the JiT study was that children’s lost leisure time was not measured. However, in their recent review of the field, Andronis et al. [68] did not find any other study that included measures of leisure time, even though this may be an important yet complex factor in childhood.
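To illustrate how strongly the unit cost and the coding of the absence question can drive the estimated cost of school absence, the following minimal sketch compares the two unit costs reported above. It assumes that the code ‘2’ represents days absent per week, and the alternative code of 3 and the 10-week period are hypothetical values chosen only for illustration; none of these are taken from the JiT questionnaire or model.

```python
# Unit costs per missed school day as reported in the text (US$).
UNIT_COST_JIT = 276          # JiT: valued at the mean wage in Sweden
UNIT_COST_CBT_STUDIES = 74   # Lalouni et al. [56] and Sampaio et al. [60]


def absence_cost(days_per_week, weeks, unit_cost):
    """Cost of school absence for one child over a period of `weeks` weeks."""
    return days_per_week * weeks * unit_cost


# Hypothetical inputs, for illustration only.
WEEKS = 10                   # assumed length of the period
CODED_DAYS = (2, 3)          # 2 = coding described in the text (assumed to mean
                             # days per week); 3 = hypothetical alternative

for days in CODED_DAYS:
    jit = absence_cost(days, WEEKS, UNIT_COST_JIT)
    cbt = absence_cost(days, WEEKS, UNIT_COST_CBT_STUDIES)
    print(f"coded as {days}: US${jit} at the JiT unit cost, "
          f"US${cbt} at the unit cost used in the CBT studies")
```

Under these assumptions, the choice of unit cost changes the estimate by a factor of almost four, whereas recoding ‘several times a week’ from 2 to 3 changes it by half.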

The four cost-effectiveness studies discussed here used three different approaches to obtain QALYs, which might also have affected the results. KIDSCREEN-10 was mapped onto CHU9D in the JiT study, while the Pediatric Quality of Life Inventory [69] was mapped onto EQ-5D-Y-3L [70] by Lalouni et al. [56] and Sampaio et al. [60], whereas Philipsson et al. [63] used the Health Utilities Index 3 [71,72,73]. Measuring children’s health states is challenging and requires a suitable and age-specific instrument [74, 75], which probably affected the choice of instrument used in the studies mentioned here. The number of preference-based QoL instruments targeting children is limited, although it has increased in recent years. Mapping is seen as the second-best alternative [76] when an appropriate instrument is missing. Mapping algorithms are useful for increasing comparability across different instruments [77]. At the start of the JiT project, there were no validated and translated preference-based instruments available that were appropriate for the target group. Instead, we decided to use the KIDSCREEN instrument and apply the available algorithm to map these scores onto CHU9D utility scores [53]. A strength of the CHU9D is that it is based on adolescents’ preferences, but a limitation is that these adolescents were living in Australia, where the culture might differ from that in Sweden. In the JiT study, the changes in QoL were very small compared with the changes in abdominal pain shown by Högström et al. [42]. This might indicate that the correlation between pain and QoL is not straightforward or that the measurement of QoL did not capture pain very well.

This study is also affected by other methodological considerations. First, unlike the studies of Lalouni et al. [56, 78], Sampaio et al. [60] and Philipsson et al. [63], this was a modeling study. Decision trees are simple and straightforward, while Markov modeling is commonly used in health economics and is flexible and powerful [78]. In this study, model inputs were based on trial data. One unexpected consequence of this was that the utility (QALY) value for no pain was lower than that for mild pain. This might be due to the limited number of participants, or it might reflect that pain does not fully correspond to QoL, which is elaborated on below. Second, even though the question on school absence specifically concerned abdominal pain, other absences may have been included in the answers. Third, a common issue in cost-effectiveness analysis is the risk of double counting. Spending less time in ill health might be captured in the QoL measure when the instrument includes questions about daily activities [68], and we believe that there may also be a risk of double counting the school absence in this study. Fourth, a state for death is commonly included in Markov models in health economic evaluations. In the JiT study, the risk of death from FAPDs was negligible, and the state was therefore omitted. Fifth, although Swedish laws state that legal guardians may stay home from work to care for ill children up to 12 years of age, it is not certain that they do so. Not asking about work absenteeism in the legal guardians’ questionnaires was a weakness of the JiT study, as this parameter instead had to be estimated from the children’s answers. Sixth, the states in the model were based on the participants’ abdominal pain level, as well as the corresponding QoL and costs for each state. Using only pain as a proxy for each state might have been a weakness of the study since there may be other symptoms of FAPDs. However, pain is seen as the main driving symptom of this disorder [27]. Simplifications of this type are often needed in health economic modeling. Seventh, despite the fact that much of the data we had access to were collected at 4-month intervals, we chose to use cycles of 1 year. The reason for this was that the questions about school absence referred to ‘the last term’ and not the last 4 months. In addition, we believe that a period of 4 months is too short to be able to see changes in the long run. A model rests on several assumptions and involves uncertainty, and we believe that using the data from the 12-month follow-up is a more valid way to deal with this uncertainty. Finally, the use of a time-dependency factor needs to be discussed. We assumed that the probability of staying in the no-pain health state would increase after 2 years/cycles or more in that state. To our knowledge, there is no research that supports this assumption, which is a limitation of this study. Long-term follow-ups of intervention studies, as well as epidemiological follow-ups of FAPDs among children, are few. However, Lalouni et al. showed that there was a remaining intervention effect 36 weeks from baseline, indicating a long-term effect of a 10-week CBT intervention [56]. A recent long-term follow-up also indicated that the majority of youths with abdominal pain experience pain resolution between childhood and late adolescence/emerging adulthood [79]. Horst et al. showed that 59% of patients with abdominal problems as children had no abdominal diagnosis as adults [80].
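The role of the time-dependency assumption can be illustrated with a minimal cohort sketch in which ‘tunnel’ states keep track of how long the cohort has been pain free. This is not the JiT model: the sketch collapses all pain levels into a single state, omits costs, QALYs and discounting, and every transition probability below is a hypothetical value chosen only for illustration.

```python
def run_cohort(n_cycles, p_recover, p_stay_base, p_stay_sustained, start):
    """Simulate state occupancy over 1-year cycles.

    States: 'pain', 'no_pain_1' (first pain-free cycle), 'no_pain_2'
    (second pain-free cycle) and 'no_pain_sustained' (2 or more completed
    pain-free cycles, where the probability of staying pain free changes).
    No death state is included, mirroring the simplification in the text.
    """
    trace = [dict(start)]
    for _ in range(n_cycles):
        cur = trace[-1]
        nxt = {
            # Relapse from any pain-free state, or remain in pain.
            "pain": cur["pain"] * (1 - p_recover)
            + (cur["no_pain_1"] + cur["no_pain_2"]) * (1 - p_stay_base)
            + cur["no_pain_sustained"] * (1 - p_stay_sustained),
            # Newly recovered children enter the first tunnel state.
            "no_pain_1": cur["pain"] * p_recover,
            "no_pain_2": cur["no_pain_1"] * p_stay_base,
            "no_pain_sustained": cur["no_pain_2"] * p_stay_base
            + cur["no_pain_sustained"] * p_stay_sustained,
        }
        trace.append(nxt)
    return trace


def years_without_pain(trace):
    """Expected pain-free years per child over the simulated cycles."""
    return sum(1 - cycle["pain"] for cycle in trace[1:])


# Hypothetical inputs: the whole cohort starts in pain.
START = {"pain": 1.0, "no_pain_1": 0.0, "no_pain_2": 0.0,
         "no_pain_sustained": 0.0}

with_dependency = run_cohort(10, 0.3, 0.7, 0.9, START)
without_dependency = run_cohort(10, 0.3, 0.7, 0.7, START)
print(years_without_pain(with_dependency),
      years_without_pain(without_dependency))
```

Comparing the two runs shows how much of the modeled benefit rests on the unsupported assumption: setting the sustained probability equal to the base probability removes the time dependency, and the difference in pain-free years is the part of the result attributable to that assumption alone.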

5 Conclusion

The JiT intervention with dance and yoga for young girls with FAPDs appears to be potentially cost-effective. However, what can be recommended depends on the perspective applied and the assumptions about the remaining intervention effect. Within the area of FAPDs, there is a huge demand for new evidence-based interventions, and with this in mind, the findings here might be ‘good enough’ to provide an economic incentive for implementing the intervention more broadly. However, more health economic research is needed since economic findings can constitute important considerations for decision makers allocating an existing budget. To decrease the intervention cost, it would be interesting to evaluate a similar intervention with a shorter duration or lower frequency. This study may provide useful information when designing future interventions.