Emotional disorders (anxiety and mood) are the most prevalent mental disorders in society. It is estimated that 3.76% of the population suffers from a mood disorder, while 4.05% suffers from an anxiety disorder (TIHME, 2019).

Furthermore, these disorders cause a high cost for both the individual and society. Thus, people who suffer from an emotional problem have a lower quality of life, worse social adjustment, and higher levels of disability (Ishak et al., 2011; Olatunji et al., 2007; Sudhir et al., 2012).

Likewise, these disorders impose large expenses on states, both directly (costs of hospital care, outpatient care, and medication) and indirectly (costs of permanent or temporary leave from work and early retirement). In total, it is estimated that each patient represents an average expense of $5703 (Christensen et al., 2020).

However, despite economic efforts to combat emotional problems, there is evidence that the resources allocated to mental health are scarce and fall short of what is needed, especially in low- and middle-income countries. Furthermore, these resources are not distributed equally between different regions of the world and are used inefficiently. A large share of resources is invested in medication and large institutions such as psychiatric hospitals, while psychological interventions, whose effectiveness in treating emotional problems has been widely demonstrated (Hollon & Ponniah, 2010), receive a much smaller share of investment (Saxena et al., 2007). Given the above, emotional disorders are a social and economic problem for society. For this reason, there is an increasing need for effective and efficient psychological treatments. The Unified Protocol (UP), applied in a group format, could help meet this need by reducing the associated social and economic costs.

Barlow’s Unified Protocol (Barlow et al., 2010) is a cognitive-behavioral treatment that has unified a set of psychological procedures from learning theories, developmental theories, emotional regulation theories, and cognitive theories whose intervention focuses on the common factors of emotional disorders, rather than on the specific symptoms of each disorder. This transdiagnostic approach allows the UP to be applied across the entire spectrum of emotional disorders using the same therapeutic principles.

Thus, UP is a highly parsimonious therapy because it is not necessary to select treatment components for intervention; the same set of procedures is applied whatever the patient's emotional disorder. This has several advantages. On the one hand, clinical training is easier and shorter (clinicians do not have to be trained in several therapies for different specific disorders), and on the other, it is easier for patients to understand and internalize the different therapeutic techniques (Dalgleish et al., 2020).

In terms of efficacy, the UP has received empirical support from multiple studies. One of the first studies was conducted on an individual basis by Ellard et al. (2010), whose results showed effectiveness similar to that of other cognitive-behavioral interventions. RCT studies, such as the one by Farchione et al. (2012), reported similar efficacy results. Later, its efficacy was demonstrated in group format in the study conducted by Bullis et al. (2015). Since then, the UP has been replicated in a multitude of contexts around the world. It has been successfully applied with both adults and children (Ehrenreich-May et al., 2012), in group and individual contexts, as well as online (Wurm et al., 2017).

Several meta-analyses have been carried out that also support the efficacy of UP. Sakiris and Berle (2019) found it to be effective in reducing symptoms of anxiety, depression, generalized anxiety disorder, obsessive–compulsive disorder, panic disorder with/without agoraphobia, social anxiety disorder, and borderline personality disorder. In addition, moderate effect sizes were found for the increase in adaptive emotional regulation strategies and the reduction of maladaptive strategies after UP. Similarly, a meta-analysis conducted by Carlucci et al. (2021) found favorable results for the efficacy of UP in reducing anxiety and depressive symptoms in studies with children, adolescents, and adults.

In summary, UP has strong empirical support for its effectiveness in reducing the symptoms of emotional disorders.

In addition, this protocol could represent a highly efficient alternative to reduce economic costs. This is so because the group application modality allows not only treating several patients at the same time, but also treating people with different disorders simultaneously. So, this therapeutic option could represent a possible solution for public health systems where budgets are very tight and the volume of people with emotional problems is increasing.

The effectiveness of UP in group format has also been demonstrated. Studies have been conducted in patients with problems across the emotional spectrum, both in people whose main problem was anxiety (generalized anxiety disorder, social anxiety disorder, panic disorder, and agoraphobia) (Laposa et al., 2017) and in people whose main problem was post-traumatic stress disorder (Varkovitzky et al., 2018), as well as in people with suicidal ideation (Bentley et al., 2018), etc. Furthermore, the efficacy of the protocol in group format has also been tested in RCT studies (Mohsenabadi et al., 2018). However, no meta-analysis has yet been developed that evaluates the effectiveness of UP in group format. This information could be revealing in order to assess whether the UP in group format is an effective and efficient solution to emotional problems.

For all the above, we decided to carry out a systematic review and meta-analysis of the efficacy/effectiveness of the application of UP in group format for emotional disorders.

Efficacy studies emphasize internal validity to infer the existence of a causal relationship between the therapy and the results obtained. To do so, they compare the chosen therapy with a waiting list or a standard treatment condition (ideally in an RCT). The therapy is protocolized and has a predetermined number of sessions. Thus, it is applied under conditions that are as optimal and controlled as possible, so that the effects observed can be attributed exclusively to the treatment applied.

On the other hand, effectiveness or clinical utility studies give priority to external validity, carrying out treatments that have previously demonstrated their efficacy. To this end, they analyze the positive effects of the chosen therapy under conditions similar to those in clinical practice (not necessarily RCT), targeting the influence of variables such as the training and clinical experience of the therapists, the greater heterogeneity of the sample, and its added problems or the application of the therapy in a flexible way (techniques, duration, etc.) due to the needs of the patients.

Objectives

Our main objective was to summarize existing information on the group application of the Unified Protocol. We subdivided this main objective into several secondary ones:

First, to conduct a systematic review of the research about the efficacy and effectiveness of the UP in group format.

Second, to develop a meta-analysis of the efficacy and effectiveness of the UP in group format. The analysis of efficacy included RCT studies with the aim of trying to control variables external to the treatment. On the other hand, the analysis of effectiveness or clinical utility incorporated RCT studies and non-RCT studies with the aim of analyzing a larger amount of research and obtaining more information on the psychological variables affected after the UP intervention.

Method

Protocol

We followed the guidelines established by the PRISMA group for the elaboration of systematic review and meta-analysis protocols (Moher et al., 2015). A PRISMA checklist showing the items recommended by the PRISMA group that we included in our work can be found in Table 1S in the supplementary material.

A brief description of the procedure phases is the following: Step 1, a search was carried out through the main search sources of the scientific literature. Step 2, the selection of studies was carried out through the use of inclusion and exclusion criteria by two of the authors independently. Step 3, we elaborated a coding manual of the variables that could act as moderators of the results of UP effectiveness, and then, these coded variables were recorded in a table. Step 4, a homogenization of the results of the studies was carried out translating these results to the same effect size. Step 5, an analysis and interpretation of the combined estimated effect size and heterogeneity was made. A flow diagram of current meta-analysis study selection can be found in Fig. 1S in the supplementary material.

Inclusion and Exclusion Criteria

Efficacy Analysis

Inclusion criteria for the efficacy analysis were as follows: (1) studies that included a psychological intervention based on the Unified Protocol for Emotional Disorders. (2) Subjects suffered from an emotional disorder. (3) The intervention design was RCT. (4) The treatment was in group format. (5) Patients were adults, i.e., aged 18 years or older. (6) At least two measurements (pre-post treatment) were conducted. (7) The study assessed depression or anxiety symptoms. (8) At least the mean and standard deviation of the assessment results were published. (9) The psychological intervention was face-to-face. (10) The psychological intervention had at least five sessions.

The exclusion criteria for the efficacy analysis were as follows: (1) The study design was a single case. (2) The intervention modality was individual. (3) The subjects were under 18 years of age. (4) The assessments did not include depression or anxiety symptoms. (5) The psychological intervention was conducted online. (6) The therapy had fewer than five sessions. (7) The study did not include a control group and/or the sample was not randomly assigned.

Effectiveness Analysis

Studies included in the effectiveness analysis had to meet the following criteria: (1) A psychological intervention based on the Unified Protocol for emotional disorders was carried out. (2) Subjects who received the intervention suffered from an emotional disorder. (3) The intervention design was RCT or non-RCT. (4) The therapy was in a group format. (5) The subjects were adults, i.e., aged 18 years or older. (6) At least two assessments (pre-post treatment) were carried out. (7) The study assessed depressive or anxiety symptoms, or other variables such as quality of life, social adjustment, or personality styles (positive affect and negative affect). (8) At least the mean and standard deviation of the assessment results were presented. (9) Treatment was conducted face-to-face. (10) The duration of the intervention was five or more sessions.

In addition, the exclusion criteria for the effectiveness analysis were as follows: (1) The study design was single case. (2) The intervention format was individual. (3) Subjects were under 18 years of age. (4) Measures did not include symptoms of depression, anxiety, quality of life, social adjustment, or personality styles (positive affect and negative affect). (5) The psychological intervention was conducted online. (6) The psychological intervention had fewer than five sessions.
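The two sets of criteria above differ only in the required design and in the outcome measures accepted, which can be made explicit with a small screening sketch. The `Study` record, its field names, and the measure labels are hypothetical and introduced only for illustration; they are not taken from the coding manual.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical screening record; every field name is an assumption.
    uses_up: bool                 # intervention based on the Unified Protocol
    emotional_disorder: bool      # participants had an emotional disorder
    randomized_controlled: bool   # RCT with a control group
    group_format: bool
    min_age: int                  # age of the youngest participant
    n_assessments: int            # at least pre and post
    measures: frozenset           # outcome labels assessed in the study
    reports_mean_sd: bool
    face_to_face: bool
    n_sessions: int


EFFICACY_MEASURES = frozenset({"depression", "anxiety"})
EFFECTIVENESS_MEASURES = EFFICACY_MEASURES | {
    "quality_of_life", "social_adjustment", "positive_affect", "negative_affect"
}


def meets_shared_criteria(study, allowed_measures):
    """Criteria common to both analyses (all except the design requirement)."""
    return (
        study.uses_up
        and study.emotional_disorder
        and study.group_format
        and study.min_age >= 18
        and study.n_assessments >= 2
        and bool(study.measures & allowed_measures)
        and study.reports_mean_sd
        and study.face_to_face
        and study.n_sessions >= 5
    )


def eligible_for_efficacy(study):
    # Efficacy analysis: RCT designs only, depression or anxiety outcomes.
    return study.randomized_controlled and meets_shared_criteria(study, EFFICACY_MEASURES)


def eligible_for_effectiveness(study):
    # Effectiveness analysis: RCT or non-RCT, wider set of outcomes.
    return meets_shared_criteria(study, EFFECTIVENESS_MEASURES)
```

Under this reading, every study eligible for the efficacy analysis is also eligible for the effectiveness analysis, which matches the later statement that the effectiveness analysis incorporated both RCT and non-RCT studies.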

Sources of Information

We searched for studies between June 2023 and November 2023. The following databases were used: Web of Science, PsycInfo, PubMed, PSICODOC, Google Scholar, EBSCOhost, SpringerLink, and Sage Publications.

Search

The searching strategy was based on the combination of the following keywords: “Unified Protocol,” “Barlow,” “efficacy,” “effectiveness,” “group,” and “RCT.” These combinations were made using the Boolean operators “AND” and “OR.”

Selection of Studies

First, we reviewed all the databases and located those documents that contained the search terms. Second, we performed a first screening in which we eliminated duplicate results. Third, we inspected the remaining papers and excluded those in which the UP was not applied, as well as those papers that contained records of future studies and meta-analyses. We also excluded documents that were not research articles such as books or theses. Fourth, two authors independently reviewed the full text of each article to select the studies to be included in the meta-analysis following the inclusion and exclusion criteria. Fifth, both authors agreed on the final choice of articles.

Data Extraction and Description of Variables

After selecting the studies, we proceeded to select the information necessary to carry out the analysis. At a qualitative level, we collected the main information of each study. These characteristics were the following: first, contextual variables where we collected data on the sample recruitment process; second, methodological variables where we recorded information on the research design; third, treatment variables such as the intervention format; and fourth, participant variables where we collected demographic and diagnostic data.

In addition, we recorded quantitative information in a table containing the mean and standard deviation values of the psychological variables pre- and post-treatment, for both the group receiving the UP intervention and the control group. We also noted the sample size of both groups.

Finally, we selected the following psychological variables for study: anxiety symptoms, depressive symptoms, personality styles (positive affect and negative affect), quality of life, and social adjustment.

Meta-analytic Procedures

Homogenization of the Results of the Selected Studies

The efficacy results found in the selected studies presented different effect sizes, so it was first necessary to transform the different effect sizes into a common metric.

To do this, we decided to use Hedges’ g (adjusted), one of the most common effect sizes in meta-analytic studies when comparing the evolution of a continuous measure between two groups (Lipsey & Wilson, 2001). We computed Hedges’ g based on the formula proposed by Cohen (1988) and incorporated the correction factor of Hedges (1981).

Thus, for the efficacy analyses, the effect size was calculated as the post–pre difference of the experimental group minus the post–pre difference of the control group, standardized using only the pre-treatment standard deviations, as recommended by Morris (2008).

For the effectiveness analyses, the computation of effect sizes was done through the mean difference of the pre- and post-treatment measurement moments of the experimental group.

This formula together with the calculation of the standard error associated with the effect size and the calculation of the confidence interval (95%) can be found in Table 2S of the supplementary material.
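As an illustration of the two computations described above, the following is a minimal sketch. It assumes the pre-post-control form of Morris (2008) with a pooled pre-treatment standard deviation for the efficacy analyses and, for the effectiveness analyses, a single-group pre-post difference standardized by the pre-treatment standard deviation. The exact formulas used are those in Table 2S; the function names, the choice of standardizer for the single-group case, and the sign convention are assumptions, and the standard error is not reproduced here.

```python
import math


def small_sample_correction(df):
    """Hedges (1981) correction factor, in its usual approximation."""
    return 1.0 - 3.0 / (4.0 * df - 1.0)


def pooled_pre_sd(sd_pre_t, n_t, sd_pre_c, n_c):
    """Pooled pre-treatment SD of the treatment (t) and control (c) groups."""
    numerator = (n_t - 1) * sd_pre_t ** 2 + (n_c - 1) * sd_pre_c ** 2
    return math.sqrt(numerator / (n_t + n_c - 2))


def g_pre_post_control(m_pre_t, m_post_t, sd_pre_t, n_t,
                       m_pre_c, m_post_c, sd_pre_c, n_c):
    """Efficacy analyses: difference of post-pre changes over the pooled pre SD."""
    change_difference = (m_post_t - m_pre_t) - (m_post_c - m_pre_c)
    d = change_difference / pooled_pre_sd(sd_pre_t, n_t, sd_pre_c, n_c)
    return small_sample_correction(n_t + n_c - 2) * d


def g_pre_post_single_group(m_pre, m_post, sd_pre, n):
    """Effectiveness analyses (assumed form): post-pre change over the pre SD."""
    return small_sample_correction(n - 1) * (m_post - m_pre) / sd_pre


# Made-up numbers, for illustration only (not data from any included study).
g_controlled = g_pre_post_control(20.0, 12.0, 8.0, 20, 20.0, 18.0, 8.0, 20)
g_single = g_pre_post_single_group(20.0, 12.0, 8.0, 20)
```

With a symptom scale on which lower scores mean improvement, both values come out negative under the post–pre convention; reporting improvement as a positive effect size would require reversing the sign, which is assumed rather than stated in the text.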

The interpretation of the effect size was done according to the criteria outlined by Sawilowsky (2009) as very small (0.1), small (0.2), medium (0.5), large (0.8), very large (1.2), and huge (2).

Choice of Statistical Model

We decided to use a fixed-effect model rather than a random-effects model for the analysis for the following reasons:

First, the studies that were selected for the meta-analysis had very similar characteristics, and, therefore, we expected to find low heterogeneity.

Secondly, the fixed-effect model allows the results obtained in the analysis to be generalized to other populations with very similar characteristics to the samples of the studies included in the meta-analysis. This feature of the fixed effects model makes it possible to extrapolate the results to future similar contexts in which to treat emotional disorders.

Third, the choice of the random-effects model could generate biases in the results because the number of studies included in the analysis was less than 30, and according to the recommendations of Aguinis et al. (2011) for the estimation of the inter-study variance to be accurate (in the random-effects model), the studies analyzed should be more than 30.

Combined Effect Size Estimation

After homogenizing the results of the selected studies and choosing the statistical analysis model, we calculated the combined estimate. To do this, we used the formula of the fixed-effect model, which can be found in Table 3S of the supplementary material along with the formula to calculate its confidence interval (95%).
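For readers without access to Table 3S, the combined estimate under a fixed-effect model is conventionally an inverse-variance weighted mean. The sketch below shows that standard form; it is assumed, not confirmed, to match the formula in the supplementary material, and the function name is illustrative.

```python
import math


def fixed_effect_pool(effects, std_errors, z=1.96):
    """Inverse-variance fixed-effect estimate with a 95% confidence interval.

    effects: per-study effect sizes (e.g., Hedges' g)
    std_errors: per-study standard errors, in the same order
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("effects and std_errors must be non-empty and equal in length")
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * g for w, g in zip(weights, effects)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    interval = (pooled - z * pooled_se, pooled + z * pooled_se)
    return pooled, pooled_se, interval


# Made-up values for two equally precise studies (illustration only).
pooled, pooled_se, interval = fixed_effect_pool([1.0, 0.5], [0.2, 0.2])
```

Because the weights depend only on within-study precision, larger studies dominate the estimate, which is one reason the choice between fixed-effect and random-effects models matters when heterogeneity is high.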

Heterogeneity

Furthermore, we studied the variability of the effect sizes of the selected studies by calculating heterogeneity. To estimate this parameter, we used two statistics: Cochran’s Q and the \({I}^{2}\) index. On the one hand, Cochran’s Q indicates whether there is statistically significant heterogeneity, depending on whether the null hypothesis of homogeneity is rejected. The null hypothesis is rejected if p < α, where α = 0.05. On the other hand, the \({I}^{2}\) index, proposed by Higgins and Thompson (2002), shows the degree of heterogeneity of the effect sizes. The formulas for estimating Q and the \({I}^{2}\) index are given in Table 3S. In addition, we interpreted the degrees of heterogeneity following the criteria of Huedo-Medina et al. (2006): when the \({I}^{2}\) value is 0% the heterogeneity is zero, when \({I}^{2}\) is 25% the heterogeneity is low, when \({I}^{2}\) is 50% the heterogeneity is medium, and when \({I}^{2}\) is 75% the heterogeneity is high.
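The relationship between the two statistics can be sketched as follows, using the usual definitions of Q as a weighted sum of squared deviations from the fixed-effect estimate and of \({I}^{2}\) as the share of Q in excess of its degrees of freedom. These definitions are assumed to correspond to the formulas in Table 3S, and the function names are illustrative.

```python
def cochran_q(effects, std_errors):
    """Cochran's Q: weighted squared deviations from the fixed-effect estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    return sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))


def i_squared(q, df):
    """I^2 as a percentage; negative values are truncated to zero."""
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


# Made-up values for two equally precise studies (illustration only).
q_example = cochran_q([1.0, 0.5], [0.2, 0.2])
i2_example = i_squared(q_example, df=1)
```

The same function can be used to check reported values: with the Q (reported as Chi²) of 51.5 on 6 degrees of freedom from the efficacy analysis of depressive symptoms, `i_squared(51.5, 6)` gives approximately 88%, in line with the \({I}^{2}\) reported for that analysis.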

After calculating the combined estimate and heterogeneity, we created a forest plot to be able to observe the results of the analyses graphically and synthetically.

Finally, when the heterogeneity found in an analysis exceeded 25%, we searched for the studies that might be responsible and developed possible hypotheses as to why they had different efficacy and effectiveness results. To do this, we reviewed the main characteristics of each study, comparing them with each other to see if there were any relevant differences between them. A more exhaustive explanation of the possible causes of heterogeneity is given in the Discussion.
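The text does not specify an algorithm for locating the studies responsible for heterogeneity. One common way to support such a search, offered here only as an assumed illustration and not as the procedure actually followed, is a leave-one-out analysis that recomputes \({I}^{2}\) after removing each study in turn. The function names and the example values are hypothetical.

```python
def i2_percent(effects, std_errors):
    """I^2 (in %) for a set of studies under the fixed-effect model."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * g for w, g in zip(weights, effects)) / sum(weights)
    q = sum(w * (g - pooled) ** 2 for w, g in zip(weights, effects))
    df = len(effects) - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def leave_one_out_i2(labels, effects, std_errors):
    """Map each study label to the I^2 obtained when that study is omitted."""
    results = {}
    for i, label in enumerate(labels):
        remaining_effects = effects[:i] + effects[i + 1:]
        remaining_errors = std_errors[:i] + std_errors[i + 1:]
        results[label] = i2_percent(remaining_effects, remaining_errors)
    return results


# Made-up example: study "C" has a much larger effect than "A" and "B".
loo = leave_one_out_i2(["A", "B", "C"], [0.5, 0.6, 2.0], [0.2, 0.2, 0.2])
most_influential = min(loo, key=loo.get)
```

The study whose removal lowers \({I}^{2}\) the most is a natural candidate for closer inspection of its sample and design characteristics, which is the kind of comparison described in the paragraph above.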

All statistical calculations were carried out with Review Manager 5.4 software (RevMan, 2020).

Results

Selection of Studies

The search for papers through the databases resulted in a total of 115 studies in which the UP intervention had been applied. Of these 115 studies, 7 studies met the inclusion criteria for the efficacy analysis, and 24 studies met the inclusion criteria for the effectiveness analysis. Those studies that were excluded from both analyses and the reasons for their exclusion can be found in Table 4S. Finally, we describe the study selection process in more detail in Fig. 1S in the supplementary material.

Study Characteristics

The selected studies were coded according to the variables described in the Data Extraction and Description of Variables section of the Method. Table 1 shows the studies selected for analysis (efficacy and effectiveness) along with a summary of their main characteristics.

Table 1 Codification of the characteristics of the studies selected for meta-analysis

Study Results

First, we present an analysis of the effects produced by the UP on the different psychological variables under study. In the analyses of the symptoms of depression and anxiety, we included a study of the efficacy (RCT studies) and effectiveness (RCT and non-RCT studies) of the Unified Protocol. For the rest of the psychological variables, we only incorporated studies of the effectiveness of the UP, as we did not find enough RCT papers to be able to carry out an efficacy study as well. Finally, we present each analysis by means of a forest plot so that the results obtained can be observed graphically.

Efficacy Analysis

Analysis of Depressive Symptoms

In the analysis of the efficacy of the UP in depression, we obtained the following results: \({Chi}^{2}\) = 51.5; df = 6; p < 0.00001; \({I}^{2}\) = 88%; ES = 1.54; 95% CI = [1.27, 1.80]. These results are presented graphically in a forest plot in Fig. 1.

Fig. 1 Forest plot of efficacy on the depression and anxiety measure

As can be seen, we found a heterogeneity of 88%. The studies responsible for this heterogeneity were those by Bameshgi et al. (2021), Nazari et al. (2020), and Zemestani et al. (2017). Excluding them from the analysis reduced heterogeneity to 0%.

So, we conducted an analysis without including these studies and found the following results: \({Chi}^{2}\) = 1.76; df = 3; p < 0.00001; \({I}^{2}\) = 0%; ES = 1.02; 95% CI = [0.71, 1.32]. A forest plot of this analysis can be seen in Fig. 2S in the supplementary material.

Analysis of Anxiety Symptoms

In the forest plot in Fig. 1, we show the analysis of the efficacy of the UP on anxiety symptoms (\({Chi}^{2}\) = 55.37; df = 5; p < 0.00001; \({I}^{2}\) = 89%; ES = 1.35; 95% CI = [1.09, 1.60]).

We also found that the high heterogeneity (89%) was generated by the studies of Bameshgi et al. (2021), Zemestani et al. (2017), and Nazari et al. (2020). An analysis excluding them is shown in Fig. 3S in the supplementary material. The results of this analysis were \({Chi}^{2}\) = 3.98; df = 3; p < 0.00001; \({I}^{2}\) = 25%; ES = 0.84; 95% CI = [0.54, 1.14].

Effectiveness Analysis

Analysis of Depressive Symptoms

Analyses of the effectiveness of the UP on the depression measure are presented in Fig. 2: \({Chi}^{2}\) = 205.99; df = 21; p < 0.00001; \({I}^{2}\) = 90%; ES = 0.97; 95% CI = [0.87, 1.07].

Fig. 2 Forest plot of effectiveness on the depression and anxiety measure

The heterogeneity resulting from these analyses was high. The most heterogeneous studies were those by Bameshgi et al. (2021), Reinholt et al. (2016), Varkovitzky et al. (2018), Nazari et al. (2020), and Zemestani et al. (2017). Not including them in the analysis decreased the heterogeneity to 20%. We present an analysis without incorporating these investigations in Fig. 4S in the supplementary material (\({Chi}^{2}\) = 19.99; df = 16; p < 0.00001; \({I}^{2}\) = 20%; ES = 0.86; 95% CI = [0.75, 0.97]).

Analysis of Anxiety Symptoms

In Fig. 2, we present a forest plot with the analysis of the effectiveness of the UP in the measurement of anxiety symptoms. The results achieved were as follows: \({Chi}^{2}\) = 302.61; df = 17; p < 0.00001; \({I}^{2}\) = 94%; ES = 1.12; 95% CI = [1.00, 1.23].

\({I}^{2}\) reflected high heterogeneity. It was caused by the research of Reinholt et al. (2016), Nazari et al. (2020), and Zemestani et al. (2017). When they were excluded from the analysis, \({I}^{2}\) was reduced to 6%. An analysis without these studies is shown in Fig. 5S in the supplementary material (\({Chi}^{2}\) = 12.75; df = 12; p < 0.00001; \({I}^{2}\) = 6%; ES = 0.84; 95% CI = [0.70, 0.97]).

Analysis of the Positive Affect

The effectiveness of the UP on the measure of positive affect is shown in Fig. 3. The results obtained were as follows: \({Chi}^{2}\) = 21.03; df = 9; p < 0.00001; \({I}^{2}\) = 57%; ES = 0.56; 95% CI = [0.43, 0.69].

Fig. 3 Forest plot of effectiveness on the positive and negative affect measure

Heterogeneity was at a medium level (57%) and was reduced to 0% when the study by Nazari et al. (2020) was excluded from the analysis. An analysis of the effectiveness of the UP on positive affect without the Nazari study can be seen in Fig. 6S in the supplementary material (\({Chi}^{2}\) = 7.33; df = 8; p < 0.00001; \({I}^{2}\) = 0%; ES = 0.49; 95% CI = [0.36, 0.63]).

Analysis of Negative Affect

A forest plot with the analysis of negative affect is presented in Fig. 3 (\({Chi}^{2}\) = 35.07; df = 9; p < 0.00001; \({I}^{2}\) = 74%; ES = 0.77; 95% CI = [0.63, 0.90]).

The resulting 74% heterogeneity was produced by the extreme values of the Nazari et al. (2020) study. An analysis without this study is set out in Fig. 7S in the supplementary material: \({Chi}^{2}\) = 9.75; df = 8; p < 0.00001; \({I}^{2}\) = 18%; ES = 0.67; 95% CI = [0.53, 0.81].

Analysis of the Quality of Life Measure

Results are shown in Fig. 4 (\({Chi}^{2}\) = 34.50; df = 8; p < 0.00001; \({I}^{2}\) = 77%; ES = 0.76; 95% CI = [0.61, 0.91]).

Fig. 4 Forest plot of the effectiveness on the quality of life and social adjustment measure

The \({I}^{2}\) value of 77% indicated high heterogeneity. We found that research by De Ornelas Maia et al. (2013, 2017) and Johari-Fard and Ghafoupour (2015) showed the largest differences in effectiveness results. By excluding them, the heterogeneity was reduced to 22%. A forest plot with the analysis without these studies is shown in Fig. 8S in the supplementary material: \({Chi}^{2}\) = 6.40; df = 5; p < 0.00001; \({I}^{2}\) = 22%; ES = 0.60; 95% CI = [0.44, 0.76].

Analysis of the Social Adjustment Measure

Figure 4 shows a forest plot with the analysis of the effectiveness of the UP on the measure of social adjustment. The main results obtained were as follows: \({Chi}^{2}\) = 1.84; df = 5; p < 0.00001; \({I}^{2}\) = 0%; ES = 0.67; 95% CI = [0.42, 0.93].

Finally, we summarize the results obtained in all analyses in Table 2.

Table 2 Summary table of the results of the efficacy and effectiveness of the Unified Protocol on the different psychological variables

Discussion

The results obtained show the efficacy and effectiveness of UP in reducing depressive and anxiety symptoms and negative affect, as well as in increasing quality of life, positive affect, and social adjustment. We also found a high degree of heterogeneity in almost all analyses. However, it is important to point out two key aspects of this heterogeneity. First, the combined effect size estimates remained stable both when all studies were included and when those with more extreme values were excluded. This would indicate that the UP achieves robust results. Second, the studies with more extreme results achieved larger effect sizes than the average of the rest. This suggests that the characteristics that set these investigations apart could further increase the efficacy and effectiveness of the UP.

Furthermore, if we analyze the characteristics of the most heterogeneous studies in comparison with the rest, it is possible to hypothesize some reasons why they obtained different efficacy and effectiveness results than the rest. Thus, the main characteristics in which the heterogeneous studies differed from the rest were the following: type of diagnosis, level of education and age of the participants, previous chronicity of emotional problems in the intervention subjects, and finally, the use of different types of instruments to measure the psychological variables.

The most heterogeneous studies that differed from the rest in the type of diagnosis were Bameshgi et al. (2021), Nazari et al. (2020), and Varkovitzky et al. (2018).

As can be seen in Figs. 1 and 2, the research by Bameshgi et al. (2021) was an intervention that achieved results of efficacy and effectiveness in depression and anxiety measures that were much higher than average. When reviewing the study of Bameshgi et al. (2021), we found that the diagnosis of the subjects of this research was different from the rest, since the main problem of the individuals was depressive and anxious states associated with marital problems and this characteristic was only present in this study. Therefore, the higher efficacy results found in this research could be due to the type of diagnosis of the subjects. An explanation of how this occurs was presented by the authors of the research themselves. Thus, they related the high efficacy found to the UP’s ability to change emotional avoidance patterns into coping strategies. The consequences of this change of emotional strategies would be the reduction of depressive and anxiety symptoms, as well as the improvement of marital communication. Thus, the patients instead of avoiding expressing their feelings with their partner or avoiding problematic situations would use better communication patterns which would allow them to face conflicts and reach greater marital agreements. Likewise, better levels of communication would be related to better couple relationships, and better couple relationships would result in a decrease in negative emotional states. Therefore, the combination of both problems would explain why UP would be even more effective in this type of diagnosis.

Secondly, in Nazari et al. (2020), we also found differences in the diagnosis of the participants. This study was the only one that worked with patients whose main problems were depressive and anxiety disorders associated with multiple sclerosis syndrome. Thus, this difference in the diagnosis of the participants could be the reason why this study achieved much larger than average effect sizes on measures of depression, anxiety, positive affect, and negative affect (Figs. 1, 2 and 3). Two arguments could explain these differences in efficacy and effectiveness. First, people suffering from multiple sclerosis syndrome have high levels of comorbidity of anxiety disorders and depression. Consequently, these patients are likely to benefit more from transdiagnostic interventions that simultaneously address both anxiety and depression problems (Butler et al., 2016). Second, it has been shown that individuals with multiple sclerosis tend to use fewer problem-focused strategies than the general population, while more frequently employing emotional avoidance and worry strategies (Goretti et al., 2009). As discussed above, one of the main therapeutic components of UP is to shift the use of emotional avoidance strategies to coping strategies. Increasing coping strategies could therefore be very effective in patients with this type of problem.

Third, in Varkovitzky et al. (2018), we also found differences in participants’ diagnoses. This study was the only one that intervened with military personnel whose primary diagnosis was post-traumatic stress disorder (PTSD). This difference in diagnosis could explain why the effect size of the UP in this intervention was moderate on the depression measure (ES = 0.43). In fact, according to Steenkamp et al. (2015), approximately 66% of military veterans diagnosed with PTSD who receive psychological therapy continue to have PTSD after the intervention. On the other hand, the effect size found by Varkovitzky et al. (2018) was similar to other effect sizes found in other studies that applied different group treatments to military veterans diagnosed with PTSD (Sloan et al., 2013). Thus, these data seem to suggest that military patients diagnosed with PTSD present, in general, more difficulties for therapeutic success, which would explain why the efficacy of the UP in this study was lower.

In conclusion, the diagnostic characteristics of the participants could be related to the difference in efficacy and effectiveness results between these investigations and the rest. Thus, one possibility would be that the UP is more efficacious/effective when intervening in certain types of problems than in others. Furthermore, according to the results obtained, the UP could enhance its effects when intervening on depressive and anxiety disorders associated with multiple sclerosis syndrome and when intervening on depressive and anxiety disorders associated with marital problems.

Another differential characteristic found in the more heterogeneous studies was the educational level of the participants in the research by Zemestani et al. (2017).

The study by Zemestani et al. (2017) obtained efficacy and effectiveness results well above average on measures of anxiety and depression (Figs. 1 and 2). This research was the only one on the depression measure that worked with university students. Therefore, this characteristic could be related to its high results of efficacy and effectiveness in comparison with the rest. In this way, some authors such as Nazari et al. (2020) relate high levels of education with better abilities to learn UP skills and, therefore, with higher gains from the therapy. Thus, a person with high levels of education could be more effective in performing some key aspects of therapy such as self-reporting and homework, as this work methodology is similar to that used in teaching. Consequently, a possible hypothesis explaining the increased efficacy and effectiveness of these investigations could be that having a higher level of studies leads to better UP results.

An alternative explanation for the differences in efficacy between the heterogeneous studies and the rest could be the young age of the participants. The study by Zemestani et al. (2017) had the lowest mean age of all the studies (22.8 years). As we have just discussed, this study achieved very high effect sizes, so the young age of the participants could be a factor that enhances the effects of the UP.

In addition, whether participants had received previous treatment or had been suffering from symptoms for more than 5 years could be other variables influencing the effectiveness results on depressive symptoms, anxiety symptoms, and quality of life. For example, the intervention in Reinholt et al. (2017) was significantly less effective than the rest (Figs. 2 and 4), and the sample in this study had some distinctive characteristics: 91.5% had received previous treatment and 51.1% had been suffering from symptoms for more than 5 years. Reinholt et al. (2017) suggested that the “sample was composed of psychiatric outpatients with high rates of comorbidity, a long history of symptoms and several previous treatments, could indicate a certain degree of treatment resistance and/or chronicity” (Reinholt et al., 2017, p. 38). Consequently, chronic emotional problems and a history of previous treatment could reduce the effectiveness of the UP.

In turn, the use of different assessment instruments could explain the heterogeneity found in the analysis of the quality of life measure. In this analysis, with the exception of two studies, every study used a different instrument: Osma et al. (2015) used the LQI (Quality of Life Index); Bullis et al. (2015) used the Q-LES-Q (Satisfaction and Quality of Life Questionnaire); Reinholt et al. (2017) used the WHO-5 (World Health Organization Well-Being Index); De Ornelas Maia et al. (2013, 2015) used the WHOQOL (World Health Organization Quality of Life); finally, Johari-Fard and Ghafoupour (2015) measured quality of life with the IBS-QOL (Irritable Bowel Syndrome-Quality of Life). Furthermore, when we compared the assessment instruments used across all the analyses, the quality of life measure showed the greatest variety. Therefore, the use of different assessment instruments could lead to differences in effectiveness results.

Finally, we obtained other important findings when we compared the efficacy and effectiveness results. First, on the measure of depressive symptoms, the combined estimate was 1.54 in the efficacy analysis and 0.97 in the effectiveness analysis. Although there was some difference between the two results (0.57), both indicate a high effect of the intervention in reducing depressive symptomatology. Second, in the analysis of anxiety symptoms, the combined estimate was 1.35 for efficacy and 1.12 for effectiveness. Both results are similar and show a high effect of the intervention in decreasing anxious symptomatology. Therefore, the efficacy and effectiveness analyses of these measures point in the same direction and show positive and robust results in favor of the UP intervention on anxiety and depressive symptoms.
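The comparison above can be reproduced with a few lines of arithmetic. The following is a minimal sketch in Python: the four pooled estimates are the ones reported in the text, while the labelling function uses Cohen's conventional benchmarks (0.2, 0.5, 0.8), which is an assumption made here for illustration. The text does not state which cut-offs were used to describe an effect as high, so the labels may not match the authors' wording for other values.

```python
# Pooled effect sizes reported in the text
# (efficacy = RCT studies, effectiveness = non-RCT studies).
pooled = {
    "depression": {"efficacy": 1.54, "effectiveness": 0.97},
    "anxiety": {"efficacy": 1.35, "effectiveness": 1.12},
}


def cohen_label(es: float) -> str:
    """Label an effect size with Cohen's conventional benchmarks.

    Assumption: the 0.2 / 0.5 / 0.8 cut-offs are used only for illustration.
    """
    magnitude = abs(es)
    if magnitude >= 0.8:
        return "large"
    if magnitude >= 0.5:
        return "medium"
    if magnitude >= 0.2:
        return "small"
    return "negligible"


def compare(measure: str) -> dict:
    """Return the efficacy/effectiveness gap and a label for each estimate."""
    values = pooled[measure]
    return {
        "gap": round(values["efficacy"] - values["effectiveness"], 2),
        "efficacy_label": cohen_label(values["efficacy"]),
        "effectiveness_label": cohen_label(values["effectiveness"]),
    }


for measure in pooled:
    print(measure, compare(measure))
```

Under these assumed benchmarks, all four estimates fall in the same category, which is the sense in which the efficacy and effectiveness analyses point in the same direction.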

Also, due to the scarcity of RCT studies, we were only able to perform an efficacy analysis of these two variables (depressive and anxiety symptoms). The rest of the variables were analyzed in non-RCT studies, so the results we found could be influenced by variables external to the treatment. However, it is likely that the results of the effectiveness analyses will be similar to those of future RCT studies, as was the case when we compared efficacy and effectiveness results on the depression and anxiety variables. Therefore, although it still needs to be confirmed in future studies, the results obtained suggest that the UP in group format is an intervention with a great capacity to reduce negative affect and to increase quality of life, social adjustment, and positive affect.

In addition, the improvements found in these variables are similar to those reported in other meta-analyses, such as those carried out by Sakiris and Berle (2019) and Carlucci et al. (2021). However, these comparisons must be made with caution, because those meta-analyses include group and individual applications, as well as adult and child populations, while this work has only included UP research in group format with adults.

In summary, the results obtained are particularly relevant given that, in RCT studies, the UP is shown to be efficacious in group format, which supports its internal validity and represents an original result. This type of study is usually carried out in universities or university hospitals by therapists working in research and education (researchers, professors, doctoral students, and/or scholarship holders) who have been trained in the therapy being studied. In turn, the efficacy shown by the UP in group format under controlled conditions is consistent with the very satisfactory effectiveness, external validity, and clinical usefulness observed in applied settings. Furthermore, it is important to highlight that the characteristics of this treatment give it a high degree of efficiency. On the one hand, it is a therapy that can treat the entire range of emotional problems and, on the other hand, its group format makes it possible to work with several people at the same time. That is, a single therapist can treat different people with different emotional problems simultaneously.

Therefore, the Unified Protocol in group format could be an efficient solution in contexts such as hospitals or university clinics where demand is very high and there are not enough professionals to meet it. This is especially relevant for the public sector, where resources are scarce and mental health funding is insufficient.

Limitations

One of the main limitations we found was high heterogeneity. Although the results between analyses with and without heterogeneous studies were similar, generalization of the results obtained in our analyses to other studies with similar characteristics would present difficulties due to this high heterogeneity.
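To make the notion of heterogeneity concrete, the sketch below computes Cochran's Q and the I² statistic from a set of study effect sizes and their sampling variances, using inverse-variance weights. This is an illustration only: the text does not specify which heterogeneity statistic or software was used, and the numbers in the usage example are hypothetical, not taken from the included studies.

```python
def heterogeneity(effects, variances):
    """Return (fixed-effect pooled estimate, Cochran's Q, I^2 in percent).

    effects   : list of study effect sizes
    variances : list of the corresponding sampling variances
    """
    if len(effects) != len(variances) or len(effects) < 2:
        raise ValueError("need at least two studies with matching variances")

    # Inverse-variance weights: more precise studies count more.
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)

    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1

    # I^2: share of total variation attributed to between-study differences.
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0
    return pooled, q, i2


# Hypothetical example values, for illustration only.
example_effects = [0.4, 0.9, 1.5, 2.1]
example_variances = [0.04, 0.05, 0.06, 0.08]
print(heterogeneity(example_effects, example_variances))
```

A high I² indicates that much of the observed variation reflects real differences between studies rather than sampling error, which is why generalizing the pooled estimate requires caution.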

Another important limitation of meta-analytic studies is publication bias. Caution is needed because accepted and published studies tend to show positive results, whereas negative results may go unpublished, and this may affect the conclusions and treatment recommendations. Funnel plot-based methods, which include visual examination of a funnel plot, regression and rank tests, and the nonparametric trim and fill method, can be useful for assessing this limitation.
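As an illustration of the regression tests mentioned above, the following sketch implements the core of an Egger-type regression: each study's standardized effect (effect divided by its standard error) is regressed on its precision (one divided by the standard error), and an intercept far from zero suggests funnel plot asymmetry. This is a simplified sketch, not the analysis performed in this work: it returns only the intercept and slope, omits the significance test for the intercept, and the data in the usage example are hypothetical.

```python
def egger_regression(effects, std_errors):
    """Ordinary least squares of (effect / SE) on (1 / SE).

    Returns (intercept, slope). The intercept is the asymmetry measure;
    the slope estimates the underlying effect.
    """
    if len(effects) != len(std_errors) or len(effects) < 3:
        raise ValueError("need at least three studies with matching standard errors")

    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardized effect

    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be identical")
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


# Hypothetical data in which smaller (less precise) studies report larger effects.
example_effects = [0.6, 0.8, 1.1, 1.6]
example_std_errors = [0.10, 0.20, 0.30, 0.40]
print(egger_regression(example_effects, example_std_errors))
```

In practice, a dedicated meta-analysis package would normally be used, since it also provides the standard error and p-value of the intercept and the trim and fill adjustment.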

Future Directions

The amount of research on the efficacy of the Unified Protocol in group format using an RCT design is low; specifically, seven studies were located. Thus, more research is needed in this area. More experimental research would provide more information about the efficacy of the UP on the psychological variables analyzed in this meta-analysis.

In addition, an important aspect to take into account is the study of the long-term effects of the UP in group format. Due to the paucity of studies reporting them, long-term follow-ups were not included in this analysis. However, the results found so far are promising (Bullis et al., 2014; Osma et al., 2021), so including long-term follow-ups in future analyses would be a valuable contribution to the literature.

On the other hand, more research could be useful to clarify the hypotheses raised about the influence of some variables that may exert moderating effects on the outcomes of UP, such as type of diagnosis, education level, and age.

Subgroup analyses of variables such as therapy format, clinical setting, comorbidity, number of sessions, attrition percentage, and outcome measures would help to clarify and identify more treatment moderators.

There is also a need to analyze whether the findings are moderated by geography and whether they generalize to different geographical locations or population densities. Some of the more heterogeneous studies were conducted in non-Western countries, so this variable should be taken into account in future research.