Electronic database searches yielded 10,073 published articles after duplicates were removed. An additional 197 records were identified from supplementary searches, resulting in a total of 10,270 records for screening. Of these, 10,221 records were excluded at title/abstract screening. Figure 2 shows the flow of studies identified and included in the review.
Clinical effectiveness results and discussion
Eleven RCTs of group art therapy were included in the clinical effectiveness review. Eight of the studies were conducted in adults and three were conducted in children. All trials had small final sample sizes, with the number of participants reported to be included in each study ranging from 18 to 111. The total number of patients in the included studies is 533.
As can be seen from Table 2, eight studies compared art therapy with an active control group. The comparator groups from the included studies are shown in Fig. 3. Two of the studies used a psychological therapy as the comparator (Broome; Thyme), whereas six used attention placebo control groups, which mimic the amount of time and attention the intervention group receives. Three studies compared art therapy with a wait-list control or treatment as usual. The majority of studies were conducted in a community/outpatient setting, but the precise location of the intervention was not reported in four studies (Broome; Kim; Monti; Monti) and one study was reported to be conducted in an inpatient setting (Lyshak-Stelzer et al.).
The symptoms or ‘outcome domains’ under investigation and associated outcome measures are reported in Table 3.
The study populations were heterogeneous in their clinical profiles, highlighting the wide application of art therapy but also demonstrating the difficulty in obtaining a pooled estimate of treatment effect. The control groups across the included studies were also heterogeneous, so estimates of treatment effect may differ depending on what art therapy is compared against. Additionally, although common mental health symptoms were investigated across the included RCTs, the majority of studies used different measurement scales to assess these outcomes (see Table 3). As there are insufficient comparable data on outcome measures across studies, it is not possible to perform a formal pooled analysis.
Potential treatment effect modifiers include the experience and qualification of the art therapist, characteristics that were not consistently reported. The age of the included patients could also be a potential effect modifier, as eight studies are of adults and three are of children. Pre-existing physical conditions were present in seven of the included studies, which could represent a further potential treatment effect modifier.
The direction of statistically significant results from the 11 included RCTs is summarised in Table 4.
As can be seen in Table 4, in 10 of the 11 included studies there were improvements from baseline in some outcomes in the art therapy groups. However, both the intervention and the control groups improved from baseline in three studies with no significant difference between the groups (Broome; McCaffrey; Thyme). The control groups across these three studies were CBT, garden walking and verbal psychodynamic psychotherapy respectively. In six studies art therapy was significantly better than the control group for some but not all outcome measures. Table 5 shows the results according to the mean change from baseline between groups in these six studies.
In one study (Kim) outcomes for the art therapy intervention group were significantly better than the control group for all outcomes. Table 6 shows the results from the Kim 2013 study.
In one study (Rusted), conducted in a sample of people with dementia, outcomes were worse for the art therapy group than for the control group, which was an activity control group. An unusual pattern of results is presented, including a significant increase in anxious/depressed mood (p < 0.01) at 40 weeks which is not present at the 10 or 20 week time points and dissipates by 44 and 56 weeks. The authors discuss several reasons for this result, including the high level of attrition; the reliance on observer ratings in the frail and elderly sample (and subsequent potential impact of observer bias); increased depression as a response to the sessions ending; and the possibility that art therapy was contraindicated for this sample.
Adverse events were not reported in any of the included RCTs. The lack of adverse event data is not necessarily evidence that no adverse events occurred in the included trials; it may only indicate that adverse events were not recorded. Potential harms and negative effects of art therapy are further explored in the qualitative review within the full health technology assessment (Uttley et al. (in press)).
Quality assessment of the 11 included RCTs indicated that the trials were generally of low quality (see Table 7). All trials had high or unclear risk of bias across several domains particularly for: method of randomisation; allocation concealment; blinding; detection bias; and incomplete outcome data.
In addition, withdrawals were not consistently reported or accounted for in the included trials, which is particularly important given their small sample sizes. Attrition in the studies therefore represents an important confounder. Concomitant treatment and treatment fidelity, which were rarely reported, represent additional possible confounders to the review findings.
Cost-effectiveness results and discussion
During the clinical effectiveness review 192 abstracts were identified that were potentially relevant for cost-effectiveness purposes and these were reviewed by a health economic modeller. Twenty-six articles were retrieved for detailed inspection, although only one was deemed relevant (12 were not art therapy; 9 contained no economic data; 4 were not in English).
No existing models of art therapy were identified. The one paper deemed potentially relevant was not an economic appraisal but did report the costs incurred and health-related benefits pertaining to a single patient over a 6-year period. This patient was one of 357 patients initially recruited, but the paper did not discuss the potential impact of selection bias on the results presented.
National Institute for Health and Care Excellence (NICE) guidelines for conducting economic evaluations recommend that patient health be measured using a preference-based utility measure. Utility is a measure of patient health where 0 equates to death and 1 equates to perfect health. The EuroQol 5 dimensions (EQ-5D) is the measure preferred by NICE. None of the RCTs identified included a preference-based utility measure, and therefore mappings from outcome measures reported in the RCTs to the EQ-5D were sought from an online database (http://www.hqlo.com/content/11/1/151) reported by Dakin. Two outcome measures in the RCTs could be mapped onto the EQ-5D: the Medical Outcomes Study 36-item Short Form Health Survey (SF-36) reported in Monti et al. and the Barthel Index reported in Hattori et al. However, in Hattori et al. the Barthel Index is reported for the overall score only, whereas mapping to the EQ-5D would require the individual component scores. The authors were contacted to enquire whether the individual component data could be obtained; however, they declined to provide these data because they intended to publish them in a forthcoming publication.
In the Monti et al. RCT all participants had a diagnosis of breast cancer and were between 4 months and 2 years post-diagnosis. Women with a terminal diagnosis, or who had a current diagnosis of a major mood disorder, psychotic disorder or significant cognitive deficit, were excluded. Those receiving any type of mental health care could be included but had to obtain written permission from their treating health professional to enter the study. Eight-week data from Monti et al. were available and the SF-36 data reported are shown in Table 8. Only those variables that were used in the mapping algorithms are reported.
Two mapping algorithms from SF-36 to EQ-5D were identified: one by Ara and Brazier and one by Rowen et al. Using the data in Table 8, these predicted utility gains at the end of the 8-week period of 0.0780 and 0.0871 respectively. As the Monti et al. RCT also reported changes in the Global Severity Index (GSI), the summary score from the Symptoms Checklist Revised measure, an inference could be made between a unit decrease in GSI and the utility gain estimated via mapping: this value was 0.487 using the Ara and Brazier mapping and 0.542 using the Rowen et al. mapping. As GSI data were presented in Thyme et al., this trial could now be used in an economic evaluation, albeit with more uncertainty in the generated results. It was estimated that at the end of the 10-week treatment period in Thyme et al. there was a utility loss associated with short-term psychodynamic art therapy compared with short-term psychodynamic verbal therapy (henceforth abbreviated to verbal therapy). This value was 0.122 using the Ara and Brazier algorithm and 0.125 using the Rowen et al. algorithm.
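The per-unit conversion described above is simple arithmetic, and a minimal Python sketch may make it easier to follow. The figures are the rounded values quoted in the text for the Monti et al. RCT. The GSI change itself is not quoted in this section, so the sketch back-calculates it from those figures; the function and variable names are illustrative and are not taken from the original analysis.

```python
# Rounded values quoted in the text for the Monti et al. RCT.
utility_gain = {"Ara and Brazier": 0.0780, "Rowen et al.": 0.0871}        # 8-week gain via SF-36 mapping
utility_per_gsi_unit = {"Ara and Brazier": 0.487, "Rowen et al.": 0.542}  # gain per unit decrease in GSI


def implied_gsi_decrease(gain: float, gain_per_unit: float) -> float:
    """Back-calculate the GSI decrease implied by gain = gain_per_unit * decrease."""
    return gain / gain_per_unit


def utility_change_from_gsi(gsi_decrease: float, gain_per_unit: float) -> float:
    """Apply the same ratio to a GSI change from another trial.

    A negative gsi_decrease (i.e. GSI got worse) yields a utility loss.
    """
    return gain_per_unit * gsi_decrease


implied = {
    name: implied_gsi_decrease(utility_gain[name], utility_per_gsi_unit[name])
    for name in utility_gain
}
for name, decrease in implied.items():
    print(f"{name}: implied GSI decrease of about {decrease:.3f}")
```

Both algorithms imply nearly the same GSI decrease, which is what one would expect if each per-unit value was obtained by dividing the mapped utility gain by a single observed GSI change.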
Attempts were made to make further inferences on utility changes from the changes in the remaining outcome measures reported in the Thyme et al.  RCT in order to widen the number of RCTs considered but this did not allow the inclusion of further RCTs in the economic evaluation.
Due to heterogeneity the two RCTs were analysed separately. Based on clinical advice regarding the generalisability of the RCTs to practice in England and Wales, and on limitations of the Thyme et al. RCT (see later), the results from the Monti et al. RCT were set to be the primary analysis, with results from Thyme et al. denoted exploratory analyses.
Within the Monti et al. RCT the cost of art therapy per woman was assumed to be £180 using data from the British Association of Art Therapists (BAAT) (personal communication Val Huet, British Association of Art Therapists, February 2014) and £248 using data reported by Curtis. For Thyme et al. the cost per participant was £80 (BAAT) and £110 (Curtis). The cost of the verbal therapy in Thyme et al. was estimated to be £64 (BAAT) and £88 (Curtis) per participant, assuming a verbal therapist had the same cost as an art therapist. It was assumed that control/wait list incurred no cost in therapist time. Full details on the methods for estimating costs are provided in Uttley et al. (in press).
Probabilistic sensitivity analyses were undertaken to generate the expected cost per QALY for each RCT using the distributions reported in Table 9. It was assumed that all distributions were independent. Scenario analyses were undertaken using: the Ara and Brazier and Rowen et al. mapping algorithms; the BAAT and Curtis cost estimations; and 52 and 104 week residual benefits.
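As an illustration of how a probabilistic sensitivity analysis of this kind can be set up, the following Python sketch samples independent distributions for utility gain and cost and reports the expected cost per QALY against a wait-list control with no cost. It is a sketch under stated assumptions, not the model used in the review: Table 9 is not reproduced here, so the distribution shapes and spreads are hypothetical, and the triangular benefit profile (linear build-up during treatment, linear decay over the residual period, no discounting) is also an assumption. Only the central values (£180 cost, 0.078 utility gain, 8-week treatment) come from the text.

```python
import random
import statistics

WEEKS_PER_YEAR = 52


def qalys_from_gain(utility_gain, treatment_weeks, residual_weeks):
    """QALYs under an ASSUMED triangular profile: the gain builds linearly
    during treatment, then decays linearly to zero over the residual period."""
    return 0.5 * utility_gain * (treatment_weeks + residual_weeks) / WEEKS_PER_YEAR


def expected_cost_per_qaly(n_draws=10_000, residual_weeks=52, seed=1):
    """Monte Carlo estimate of expected cost per QALY versus wait list."""
    rng = random.Random(seed)
    costs, qalys = [], []
    for _ in range(n_draws):
        # HYPOTHETICAL distributions, sampled independently of each other.
        gain = rng.gauss(0.078, 0.03)          # utility gain at end of treatment
        cost = rng.gammavariate(16, 180 / 16)  # gamma with mean 16 * (180 / 16) = 180
        costs.append(cost)
        qalys.append(qalys_from_gain(gain, 8, residual_weeks))
    # Expected cost per QALY is the ratio of means, not the mean of ratios.
    return statistics.fmean(costs) / statistics.fmean(qalys)


icer_52 = expected_cost_per_qaly(residual_weeks=52)
icer_104 = expected_cost_per_qaly(residual_weeks=104)
print(f"52-week residual benefit:  about £{icer_52:,.0f} per QALY")
print(f"104-week residual benefit: about £{icer_104:,.0f} per QALY")
```

Taking the ratio of the mean cost to the mean QALY gain, rather than averaging per-draw ratios, avoids the instability caused by draws in which the sampled QALY gain is close to zero.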
In addition, a threshold analysis was conducted to ascertain the gain in utility at 52 weeks that would be required for art therapy, as typically used in England and Wales, to be deemed cost effective compared with wait list. This used £20,000 per QALY gained, which is a threshold cited by NICE as signifying that an intervention is likely to be cost effective. To undertake this analysis, assumptions regarding the likely cost and the likely durations of treatment and residual benefit were required. Whilst it is acknowledged that there is a spectrum of needs and treatments, it was believed that the majority of patients would be treated in either an art therapy outpatient group or a community recovery setting, with only a small proportion needing more expensive treatment. Using data provided by the BAAT, it was assumed that typical treatment would consist of 42 sessions over a 52-week period at a cost per patient of £750.
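The logic of the threshold analysis can be sketched in a few lines of Python. The cost (£750), threshold (£20,000 per QALY), treatment duration (52 weeks) and residual benefit durations (52 and 104 weeks) are taken from the text. The shape of the benefit over time is an assumption made here for illustration: utility is taken to rise linearly to its 52-week value during treatment and then decline linearly to zero over the residual period, with no discounting. The function name is hypothetical.

```python
THRESHOLD = 20_000      # £ per QALY gained, as cited in the text
COST_PER_PATIENT = 750  # £, typical treatment cost from the text
TREATMENT_WEEKS = 52
WEEKS_PER_YEAR = 52


def required_utility_gain(cost, threshold, treatment_weeks, residual_weeks):
    """Utility gain at end of treatment needed for cost / QALYs to equal the threshold.

    ASSUMPTION: triangular benefit profile, so
    QALYs = 0.5 * gain * (treatment_weeks + residual_weeks) / 52.
    """
    required_qalys = cost / threshold
    qalys_per_unit_gain = 0.5 * (treatment_weeks + residual_weeks) / WEEKS_PER_YEAR
    return required_qalys / qalys_per_unit_gain


for residual_weeks in (52, 104):
    gain = required_utility_gain(COST_PER_PATIENT, THRESHOLD, TREATMENT_WEEKS, residual_weeks)
    print(f"Residual benefit of {residual_weeks} weeks: required gain of {gain:.4f}")
```

Under these assumptions the intervention must generate 750 / 20,000 = 0.0375 QALYs per patient; a longer residual benefit spreads that requirement over more time and so lowers the utility gain needed at 52 weeks.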
Primary results from the model
Monti et al. (2006) 
Probabilistic results for the Monti et al. RCT are shown in Table 10. Even in unfavourable scenarios (low residual benefit, increased cost per participant and use of the Ara and Brazier algorithm) the expected cost per QALY is below £6000. A histogram of the QALY benefit associated with art therapy is shown in Fig. 4.
In the threshold analysis it was calculated that, even with unfavourable assumptions regarding length of residual benefit and mapping algorithm, the utility gain required for art therapy to be cost effective would be below 0.04. This value is below that reported by Monti et al., which had a mean value of 0.078, indicating that art therapy as practised in England and Wales was likely to be seen as cost effective compared with wait list.
Exploratory results from the model
Thyme et al. (2007) 
Probabilistic results when using data from the Thyme et al. RCT are shown in Table 11. On expectation, verbal therapy dominates art therapy, as it is marginally cheaper and more efficacious. However, there is considerable uncertainty and the 95 % confidence intervals indicate that art therapy may have a cost per QALY gained compared with verbal therapy of less than £300. A histogram of the incremental benefit of verbal therapy compared with art therapy is shown in Fig. 5. This shows considerable uncertainty regarding the most effective intervention, with the solid blue bars indicating that verbal therapy is more cost effective and the striped red bars indicating that art therapy is more cost effective. Art therapy is the more efficacious intervention in approximately 20 % of simulations.
Evidence from two RCTs has been used to generate estimates of cost effectiveness, although there are caveats regarding: the mappings; the study population; small sample sizes; and possible confounding, all of which increase the uncertainty in our results.
The Monti et al. RCT was conducted in the USA and recruited women with breast cancer of varying stages who were between 4 months and 2 years post-diagnosis. The generalisability of these women to those treated with art therapy in England and Wales is unclear. Furthermore, there may be inaccuracy introduced by the values in Table 8. The data for physical role and emotional role at week 8 are medians (and change in the median) due to the non-normality of the data, whereas means would be preferable. There is also a discrepancy in the results for the physical role scale, as the values reported at weeks 0 and 8 indicate a change of 25 across the 8-week period (50–25) yet the reported difference was zero. We assumed that the value of zero reported for the change between art therapy and wait list is correct, which could be unfavourable to art therapy. A further caveat regarding the reliability of these efficacy data is that only women with values at baseline (week 0) and at end of treatment (week 8) were included in the analysis, with no imputation for missing data. There were 11 dropouts in the art therapy arm and 7 dropouts in the control arm. If these dropouts were not random but related to lack of (perceived) efficacy, then it is possible that the reported results favour art therapy.
The Thyme et al. study compared art therapy and verbal therapy. The RCT was conducted in Sweden and recruited 44 women. At recruitment, 28 (63.6 %) study participants were diagnosed with dysthymic disorder and 16 (36.4 %) study participants had depressive symptoms and difficulties. One participant withdrew before randomisation, resulting in a final study population at randomisation of 43 women (21 art therapy; 22 verbal therapy). Of these, 39 women completed the study (n = 18 art therapy; n = 21 verbal therapy). The reported results are potentially confounded by concomitant treatment; two participants in the verbal therapy intervention “accepted body awareness as an additional treatment during psychotherapy” compared with none in the art therapy arm. The mechanism by which these women were offered body awareness is unclear. In addition, the use of anti-depressants may differ between arms, although the text is unclear: “In the AT group, one participant were (sic) prescribed antidepressants during therapy (n = 1) and one between termination of therapy and the 3-month follow-up (n = 1), and in the VT group three during therapy (n = 1) and two after (n = 2).” Data from women who dropped out of the study (n = 2 art therapy; n = 1 verbal therapy) or who were referred for long-term art psychotherapy (n = 1 art therapy; n = 0 verbal therapy) were not included in the analysis, which may add uncertainty to the results. As two active interventions were trialled, no inference could be made with respect to the relative efficacy compared with no treatment.
Limitations of the work
This review can be considered as an evidence portfolio for art therapy across several non-psychotic mental health disorders, but as such it suffers from substantial heterogeneity in the clinical profiles of the patients included. Narrowing the population of interest to specific health conditions or outcome domains in future systematic reviews will increase the precision of any resulting pooled treatment effects.