A Systematic Review of Intervention Programs Promoting Peer Relationships Among Children and Adolescents: Methods and Targets Used in Effective Programs

Children’s peer relationships are crucial for their social-emotional development, mental and physical health. To identify effective strategies to facilitate peer relationships among 8–14-year-olds, a systematic review of intervention programs was conducted. Electronic databases ERIC, EMBASE, MEDLINE, PsycINFO, Cochrane Collection Library and grey literature sources were searched for intervention studies with general or clinical populations published between 2000 and 2020. Interventions had to assess quantity or quality of peer relationships as an outcome measure, thus focusing on helping children to establish more positive relationships or improving their self-reported relationship quality. Sixty-five papers were identified and grouped into universal prevention programs, selective interventions for typically developing children and indicated interventions for children with clinical diagnosis. Prevention programs and interventions for typically developing children facilitated peer relationships by targeting mental wellbeing and self-concepts. Clinical interventions focused on social-emotional skills, symptoms and peer behaviors. Successful programs showed a close alignment of methods and targeted program effects. Practitioners should also be aware of realistic goals for each population. Programs for a general population showed potential to decrease loneliness, whereas clinical populations achieved high increases in play dates, peer acceptance and sociometric status.


Introduction
Peer relationships and friendships play an important role in children's development, particularly around the transition from childhood to adolescence (Brown & Larson, 2009). Peer experiences are linked to young people's physiological health (Leigh-Hunt et al., 2017), mental health (Schwartz-Mette et al., 2020), socioemotional development (Laible, 2007) and identity development (Nawaz, 2011). While social-emotional learning (Greenberg et al., 2017), young people's personality development (Ofsted, 2019) and mental health (NICE, 2008) are increasing priorities of the education system, the role of peer relationship programs has so far been neglected in the literature. This review aims to fill this gap in the literature by synthesizing existing evidence on the effectiveness of programs aiming to facilitate peer relationships and to translate this evidence into advice for best-practice intervention development and implementation.

Indices of Peer Relationships and Their Long-Term Outcomes
Peer relationships are defined as "interactions, both positive and negative, with same-aged mates", which become ever more complex over the course of adolescence (Naylor, 2011, p. 1075). Friendships specifically are defined as a "relationship between two individuals characterized by support, time, intimacy, trust, affection, and the ability to manage conflict" (Roach, 2019, p. 330). Thus, "peers" is a more general term for individuals, typically close in age, within the same social network or community, while only some peer relationships are considered friendships, which are typically closer, mutual relationships (Arnett & Jensen, 2012). Although there is considerable heterogeneity regarding the use of this terminology (Flannery & Smith, 2017), research suggests that all types of peer experiences are fundamental for developmental trajectories.
Peer experiences in childhood and adolescence have been recognized as crucial since Vygotsky's (1978) sociocultural perspective, which postulates that cognitive development is guided by social experiences. Indeed, increased social sensitivity during adolescence (Blakemore & Mills, 2014) is thought to interact with neurological development in social-emotional (Casey et al., 2008) and cognitive domains (Blakemore, 2012). Peer relationships provide a unique context to practice socioemotional skills, due to inherent reciprocity and equal power balances (Laible, 2007). Peer attachment surpasses even parent attachment in its effects on adolescents' social-emotional development (Laible, 2007) and was found to be positively associated with identity development (Ragelienė, 2016). The consequences of poor peer relationship quality (Schwartz-Mette et al., 2020), loneliness and social isolation (Leigh-Hunt et al., 2017) can manifest in negative long-term mental health outcomes (e.g. depression, anxiety, self-harm or suicide intention). Mental health problems are both associated with (Husky et al., 2020) and exacerbated (Brendgen & Poulin, 2018) by negative peer relationships in school (i.e. victimization), associations which can be traced into adulthood (Copeland et al., 2013) and even old age (Hu, 2021). Even regarding physiological health, loneliness and social isolation are linked to poorer general wellbeing, cardiovascular disease, and mortality (Leigh-Hunt et al., 2017). Furthermore, peer relationships in childhood seem to play a role in acute and long-term immune profiles (Scott & Manczak, 2021).

Existing Support Programs and Their Shortcomings
For a long time, researchers and health care providers have been calling for early interventions to foster children's mental health (Enns et al., 2016) and counteract developmental risk factors (Conroy & Brown, 2004). A considerable body of literature focuses on social-emotional learning (Taylor et al., 2017), mental health programs (Das et al., 2016) and school-based interventions (Shackleton et al., 2016). However, research points towards strong interdependencies between peer relationships and social-emotional learning and mental health (Orson et al., 2020). Indeed, peer relationships in school are a major determinant of students' life satisfaction (Suldo et al., 2013) and their most important source of support (Bagnall et al., 2020). Negative peer experiences appear to be counterbalanced by the presence of one friend (Adams et al., 2011) and positive friendship quality (Cuadros & Berger, 2016). Thus, fostering children's peer relationships in its own right has been identified as an important objective to advance public health (Atkins et al., 2017). However, to date little effort has been made to systematically collect existing evidence regarding peer relationship programs.
Additionally, there seems to be a lack of focus on the crucial transition period from childhood to adolescence. While a vast amount of literature focuses on early years, the period of late childhood and early adolescence, between the ages of 8 and 14, is often overlooked in intervention literature (Milton et al., 2021). This is despite the fact that the effectiveness of preventive efforts was found to peak during important developmental transition periods (January et al., 2011), such as early adolescence. This lack of focus on early adolescence is problematic, as it constitutes a critical period for social-emotional development (Casey et al., 2008) and the onset of many mental health disorders (Merikangas et al., 2010), with 50% of lifetime mental health disorders developing before the age of fourteen (Kessler et al., 2005).
Furthermore, existing evidence regarding the effectiveness of support programs for children is not consistent. While some reviews of primary prevention programs provide evidence for overall effectiveness (Cheney et al., 2014; Mason-Jones et al., 2012), other reviews point towards limited effectiveness of programs (January et al., 2011; Mackenzie & Williams, 2018). These mixed results have been attributed to heterogeneity of target populations, settings and implementation factors (de Leeuw et al., 2020), underlining the importance of a close examination of intervention contents, context variables, and target populations in order to design developmentally and contextually appropriate and effective intervention programs. As the best intervention approach is likely to be dependent on the target group's specific needs, available resources and context factors, more comprehensive overviews of existing approaches are needed (Gutman & Schoon, 2015).

Current Study
Although the importance of peer relationships, specifically from middle childhood to early adolescence (8-14), is evident from presented literature, to the authors' knowledge, there is no systematic examination of existing psychoeducational support programs to facilitate peer relationships. To provide a comprehensive overview of existing programs and their outcomes, the analysis of this review aimed to answer general review questions such as "what works, for whom, and in what circumstances" (Popay et al., 2006, p. 19). The first research question concerned an overview of existing programs and contextual factors to clarify which circumstances (program duration, age groups, population etc.) further program effects in all peer relationship programs. The second research question addressed differential program set-ups and effects for specific target populations, to specify "what works for whom" (Popay et al., 2006, p. 19). The analysis will focus on what typical programs for each target population look like (i.e. methods, intended program effects and peer relationship outcomes), if specific methods and intended program effects are related to better peer relationship outcomes for each of the respective target populations, and which peer relationship outcomes (i.e. different indices of peer relationships) are addressed and improved for each target population. Since this is the first narrative review with a specific focus on peer relationship interventions, broad inclusion criteria were established to provide a comprehensive overview of targeted interventions and their outcomes, and establish a sound foundation for future research efforts and implementation in practice.

Database Search
A structured protocol for this review was registered on the PROSPERO database (reference CRD42018111227). A systematic search was conducted on the electronic databases Education Resources Information Center (ERIC), EMBASE, MEDLINE, PsycINFO and Cochrane Collection Library as well as on the grey literature sources OSF Preprints and OpenGrey. The search for this and another systematic review on the determinants of peer relationships was conducted simultaneously with broadly defined terms to include both intervention and empirical studies. For a detailed description of the search strategy see Online Resource 1. Primary search terms used in all the above databases comprised a set of terms relating to the population of children and adolescents (e.g. 'child*', 'adoles*', 'teenage*') and a set of terms referring to peer relationships (e.g. 'social relation(s)', 'social connection(s)', 'belongingness', 'friendship(s)', 'peer relation(s)'). Terms for intervention were not included at this stage; instead, intervention studies were identified within the process of title screening. Papers published in a scientific journal in English, German, Spanish, Portuguese, Italian, Hebrew, Croatian or Serbian with an available English abstract were considered for inclusion. Additionally, a secondary search of relevant literature reviews, identified during the database search, as well as hand searches of reference lists and cited references of included papers, were conducted (see Fig. 1). Included were all studies published or updated after 2000 to ensure applicability and transferability of intervention techniques and results into today's context.
The search was updated in June 2020 with the original (population and peer relationship) terms and additionally defined intervention terms (e.g. 'prevention*', 'intervention*', 'program*', 'training*'). Intervention terms were selected from standard intervention terms and terms used in papers previously identified for inclusion. Some terms appear quite general, but were deliberately included to identify as many intervention papers as possible (in line with the original broad search).

Inclusion and Exclusion Criteria
Eligible for inclusion were both prevention and intervention programs for children aged between 8 and 14 years, including general population samples, at-risk and high-functioning clinical samples. Case studies were excluded to allow for group level analyses. Biases attributed to non-randomized trials or inappropriate randomization concealment were found to be unpredictable and an a-priori exclusion has been discouraged (Gluud, 2006). Thus, this review included uncontrolled trials as well as controlled trials with randomized control, waitlist control, or "school-as-usual" control group as comparator. Studies had to report on outcome data collected with a full-scale or sub-scale measure of peer relationships with available reliability and/or validity information. During the screening process, several studies that focused on populations in precarious or adverse life circumstances were identified, including refugees, delinquent or homeless populations and populations with a serious physical illness (e.g., cerebral palsy, cancer). These studies were delivered in specific formats and/or settings that would demand a bespoke analysis and specific contextual understanding of ecology around the interventions. Due to very specific needs and targeted intervention strategies for these populations, such studies were excluded.
Although different indices of peer relationships and friendships have been discussed as separate constructs (Schwartz-Mette et al., 2020), heterogeneity of research findings has been attributed to heterogeneity of measurements and constructs across peer relationship domains (Schacter et al., 2021). However, research on various peer relationship indices and their short- and long-term associations suggests that all indices are important for general developmental trajectories. This heterogeneity in peer relationship definitions and the aim to synthesize all existing evidence led to a broad operationalization of peer relationships as a subjective feeling regarding quality and quantity aspects of relationships with peers. Thus, self-report measures of peer relationship quality (e.g. intimacy, closeness of friendship questionnaires) and quantity (e.g. perceived popularity or acceptance questionnaires) were eligible for inclusion. Social connectedness to peers was considered a quality aspect reflecting a feeling of belonging (Haslam et al., 2015), and loneliness was regarded as an indicator of a lack of relationship quality or quantity. Additionally, sociometric measures, which ask each group member to provide peer nominations or ratings, were understood as group-based self-report of relationships. Although an individual's sociometric rating is derived from their peers' ratings, this was understood as adequate self-report of young people's peer relationships on a group level. However, behavioral measures, observational measures and measures concerning contextual aspects or adult connectedness (e.g. school connectedness, community connectedness) were excluded.

Quality Appraisal of Included Studies
The Revised Cochrane risk-of-bias tool for randomized trials (RoB 2) (Higgins et al., 2019) was chosen to assess the methodological quality of included studies. The Cochrane ROB rating was conducted by two trained reviewers independently for 20 (25%) papers, with Cohen's Kappa = .64 reliability between reviewers being moderate (McHugh, 2012). Disagreement between reviewers was resolved through discussion. The remaining papers were assessed by one reviewer.
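As an illustration of the agreement statistic reported above, the following is a minimal sketch of how Cohen's Kappa can be computed for two reviewers' categorical risk-of-bias judgements. The function name and the example ratings are hypothetical and are not taken from the review's data.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters judging the same items (categorical labels)."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters chose the same label.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if p_expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical judgements for four papers ("high" / "low" risk of bias).
reviewer_1 = ["high", "high", "low", "low"]
reviewer_2 = ["high", "low", "low", "low"]
kappa = cohens_kappa(reviewer_1, reviewer_2)  # 0.5 for this toy example
```

Kappa corrects the raw agreement rate for the agreement expected by chance, which is why it can be considerably lower than the simple percentage of matching ratings.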
It has been argued that some reviews are likely to benefit from an additional quality appraisal seeking to review specific differences between studies, which might be more informative concerning review specific questions (Gough, 2007). The nature of school-based interventions and the inclusion criteria for this review led to very similar ratings for all included studies. For example, many papers were considered high risk regarding implementation fidelity, which reflects the fact that Cochrane focuses on clinical interventions, while this review assessed social interventions, which are more flexible. Therefore, an additional author-derived quality of evidence rating was carried out, based on Cochrane domains considered crucial for methodological quality with potential to impact intervention effects. Points awarded in this author-derived rating system were adjusted to better reflect differences between included studies. The author-derived rating was subdivided into (A) quality of evidence and (B) level of evidence (for a detailed description see Online Resource 2). Categories included in the (A) quality of evidence rating were (i) study design, (ii) randomization procedure, (iii) implementation fidelity, and (iv) missing data. Each category was rated with 0 points representing low quality, 1 point for medium quality and 2 points for high quality. A summary score between a minimum of 0 and a maximum of 8 points was created and transferred into percentages of achieved quality (e.g. 4 points would equal 50%).
The (B) level of evidence rating comprised (i) between group significance (if applicable), (ii) effect sizes, (iii) follow-up evidence (if applicable) and (iv) sample size. Including all these indices increased the validity of the effectiveness estimates as positive evidence was weighted against negative evidence, while absent evidence (e.g. due to lacking follow-up data) did not impact the rating negatively. For example, a significant between-group effect alongside a low within-group effect size (indicating a drop in the control group) suggests that an intervention was effective in preventing a drop in connectedness, which would be plausible as adolescence is often characterized as a time of heightened social sensitivity and loneliness (Wong et al., 2018). Similarly, significant follow-up effects after low pre-post effects are a promising sign of the intervention's effectiveness in the long run, as friendships take a while to consolidate and more time spent together has a positive effect on the relationship by establishing shared meanings and behaviors (van Hoogdalem et al., 2012). A summary score between a minimum of -3 and a maximum of 8 points was calculated. For a detailed description of the tool see Online Resource 2. As this review's aim was to discuss all existing evidence, no studies were excluded due to low quality and/or low evidence ratings. All studies were used to describe characteristics of existing interventions. However, some of the presented analyses regarding effectiveness of methods and program effects are focused on studies with positive evidence only.
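The conversion of the (A) quality of evidence points into a percentage can be sketched as follows. This is a minimal illustration that assumes only what is stated above (four categories, each rated 0, 1 or 2); the dictionary keys and the function name are hypothetical, and the detailed criteria for awarding points are given in Online Resource 2, not here.

```python
# Hypothetical keys for the four rated categories named in the text.
CATEGORIES = ("study_design", "randomization", "implementation_fidelity", "missing_data")
MAX_POINTS_PER_CATEGORY = 2


def quality_percentage(ratings):
    """Convert per-category points (0 = low, 1 = medium, 2 = high) into a percentage."""
    for category in CATEGORIES:
        if ratings.get(category) not in (0, 1, 2):
            raise ValueError(f"{category} must be rated 0, 1 or 2")
    total = sum(ratings[category] for category in CATEGORIES)   # 0 to 8 points
    maximum = MAX_POINTS_PER_CATEGORY * len(CATEGORIES)          # 8 points
    return 100 * total / maximum


# A study rated "medium" in every category scores 4 of 8 points, i.e. 50%.
example = quality_percentage(
    {"study_design": 1, "randomization": 1, "implementation_fidelity": 1, "missing_data": 1}
)
```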

Data Extraction and Data Analysis
To address the first research question regarding general effectiveness trends, data was extracted about intervention aims, setting, duration, target population, target age group, other intended program effects (operationalized as other measures), intervention methods and peer relationship measures. Additionally, study quality information as well as results were extracted as described above in study quality and level of evidence rating. For papers with missing information on peer relationship outcomes (e.g. missing effect sizes), or papers failing to provide a detailed description of intervention components, the authors were contacted. Authors who did not reply to the first email were contacted again about a month after the initial request was sent. If missing information regarding outcomes could not be obtained, the paper was excluded from further analysis.
To address the second research question 'what works for whom', differences in program structure concerning different target populations were explored. Following a long tradition of differentiation between prevention approaches (Gordon, 1983), programs were grouped into (i) universal prevention (or intervention) targeting the entire population before the manifestation of symptoms, (ii) selective intervention targeting at-risk populations showing first signs of existing symptoms and (iii) indicated intervention describing intervention efforts after the unfolding of illness (January et al., 2011). Based on this classification, the present review differentiates (i) "preventive programs" for the general population, (ii) "selective intervention" programs for typically developing children, with identified problems that put them at risk for developing mental health problems, and (iii) indicated "clinical intervention" programs for children with a clinical diagnosis targeting behavioral and emotional problems associated with the respective diagnosis. Each program type was first described regarding (a) measures used to assess peer relationships, (b) other program effects (i.e., significantly improved skills and psychological variables) and (c) didactic and practical methods used during implementation. Secondly, effectiveness trends (based on level of evidence ratings) of methods and targets were explored.

Peer Relationship Outcomes
Measures were grouped into five categories, which the authors believe represent distinct aspects of relationships. After consideration of scale descriptions and sample items, measures were grouped into (i) sociometric measures, providing an index retrieved from peer report of likability or friendship nominations, (ii) friendship quality measures, assessing quality aspects of specific relationships to one or more particular friend(s), (iii) perceived popularity or acceptance measures, assessing self-reports of quantitative popularity, (iv) loneliness or social connectedness measures, assessing the subjective feeling of social belonging and (v) the autism-specific Quality of Play Questionnaire, asking children with ASD diagnosis to report the number of recent play dates and/or their perceived level of conflict at these play dates.

Positive Program Effects
Grouping of program effects followed an existing categorization of determinants of peer relationships, which was adapted to fit the empirical data found in the presently included papers. Identified program effects included (i) emotion regulation and coping, (ii) social skills, (iii) self-concept and self-beliefs, (iv) ASD/ADHD symptoms, (v) anxiety, (vi) depression, (vii) internalizing/externalizing problems, (viii) behavior towards peers, (ix) general wellbeing, (x) family factors, (xi) victimization, (xii) academic factors, and (xiii) school connectedness. The present review only reports on significantly improved effects, as it was assumed that only positive effects could be linked to improved peer relationships.

Methodological Components
Methodological program components were derived during data extraction and iteratively refined to result in a comprehensive list of differential and specific didactic methods. Identified methodological components comprised (i) collaborative tasks, (ii) group discussion, (iii) individual tasks/self-awareness training (any task that was carried out by participants individually and with the intention to reflect/focus on concepts individually, e.g. working on self-esteem, mindfulness, drawing activities), (iv) context for interaction (only used if this was the explicit aim of the intervention: games/breakfast), (v) didactic content delivery, (vi) active practice in group (e.g. role play, practice abilities), (vii) homework, (viii) parental involvement, (ix) implicit reinforcement of behaviors (any kind of implemented system or strategies that guide children's behaviors throughout the duration of the intervention, e.g. token system or systematic praise) or (x) mentorship.

Overview of Included Studies
Sixty-five studies were identified for inclusion (see Fig. 1). Included studies were published between 2000 and 2020.
The study designs included 39 (60%) randomized controlled trials, 14 (21.5%) non-randomized controlled trials, and 12 (18.5%) uncontrolled trials. Four studies were follow-up studies on included studies and were regarded as valuable evidence to supplement the level of evidence rating and the synthesis of outcome trends on the corresponding original studies but were not included in the main analyses. A further four papers featured two intervention programs respectively and presented results in comparison to the other intervention as well as to a separate control group. While study characteristics are presented for each paper, these programs were weighted and discussed separately as two entities in the analyses of program components. See Table 1 in Online Resource 3 for an overview of included studies.
Most studies were conducted in the US (25 studies, equaling 38.5%), followed by nine studies from the UK (13.8%), five studies from the Netherlands (7.7%) and four studies each from Israel and Australia (6.2% each). Two studies each were conducted in Hong Kong, China, South Korea, and Japan (3.1% each) and one study each was conducted in Sweden, Iceland, Belgium, Italy, Spain, Slovenia, Brazil, Chile, Lebanon, and Canada. Study setting varied between programs. Thirty-four studies (52.3%) were conducted in the school context and 31 studies (47.7%) were conducted in community and social care settings. While some studies reported on clinical intervention aspects, a clear distinction between clinical and community settings could not be drawn (e.g. conducted by health care professionals in the community setting, conducted by researchers in a hospital setting). Regarding population, 40 studies (61.5%) were conducted with the general population with or without behavioral issues but without clinical diagnosis. Eighteen studies (27.6%) were conducted with children with an autism spectrum disorder (ASD) diagnosis, four studies (6.1%) with children with an attention deficit hyperactivity disorder (ADHD) diagnosis, two studies (3.1%) with mixed clinical and general population and one study (1.5%) with children with an anxiety disorder diagnosis. The final sample comprised 25 prevention programs, 14 selective intervention programs for typically developing children, and 22 intervention programs for children with a clinical diagnosis.

Quality Appraisal
The Cochrane risk of bias assessment found a majority of studies to be of poor quality (86.4%). Seven papers (8.6%) were rated as raising some concerns and only four papers (4.9%) were rated as high quality. These findings reflect the necessity of a more differentiated rating system for studies with a different context compared to the clinical one. The author-derived quality of evidence rating provided a more nuanced picture, with studies spanning the full range from the minimum of 0% to the maximum of 100%. The mean quality rating was 37.8%, with a majority of 42 studies (64.6%) receiving a rating between 25 and 75% of quality. A total of 15 studies were rated below 25% of quality and 4 studies achieved a quality rating above 75%, with 2 studies achieving a maximum rating of 100%.

General Overview of Effectiveness Results
The following results are reported for 61 included studies with data from four respective follow-up studies contributing to the level of evidence rating. Effect sizes, which were calculated in Cohen's d for all studies, were found to be between d = -.42 and d = 1.91, with M = .47 and SD = .44. Out of 49 studies that had a control group, 29 studies (59.2%) found significant intervention effects as compared to the control group. For 27 studies, some form of follow-up data was obtained; for five of those studies, follow-up data was added from four separately published follow-up studies. Twenty of those 27 studies (74.1%) showed that effects were maintained over the follow-up period. The range of the author-derived effectiveness ratings was broad, from -3 points to 7 points (just below the maximum of 8), with M = 2.59, and SD = 2.51.

Effects of Study Quality
A strong, significant correlation between effect sizes and author-derived level of evidence ratings was found (r(59) = 0.67, p < .001). However, neither indicator was found to be significantly correlated with the author-derived study quality rating. Interestingly, however, the non-significant correlation coefficient of effect size and quality was negative (r(59) = -0.11, p = .395), indicating higher effect sizes for lower quality studies. On the contrary, the non-significant correlation coefficient of the comprehensive level of evidence rating and study quality was positive (r(59) = 0.17, p = .193). Thus, a high level of evidence rating, which comprised effect sizes, between-group and follow-up data, tended to coincide with high quality, indicating a bias when looking at effect sizes only.
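The notation r(59) reports the degrees of freedom of the correlation test, which equal the number of studies minus two and are therefore consistent with the 61 studies analyzed. The following minimal sketch shows how a Pearson correlation and its test statistic can be derived. The function names are hypothetical, and the review does not state which software was used for these analyses.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 3:
        raise ValueError("need two sequences of equal length with at least 3 values")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a variable has no variance")
    return covariation / math.sqrt(ss_x * ss_y)


def t_from_r(r, df):
    """t statistic for testing r against zero, with df = n - 2 degrees of freedom."""
    return r * math.sqrt(df / (1 - r * r))


n_studies = 61                 # number of studies in the effectiveness analyses
df = n_studies - 2             # 59, as in the reported r(59)
t_value = t_from_r(0.67, df)   # roughly 6.9 for the reported r = 0.67
```

The p value follows from comparing the t statistic with a t distribution with the same degrees of freedom, which is why a coefficient as small as r = 0.17 remains non-significant with 61 studies.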

Effects of Program Duration
Program duration was operationalized in two ways: (i) program length in weeks and (ii) total hours spent on program activities. The programs were implemented over a period of between 2 weeks and 2 years, with a mean program length of 17 weeks (SD = 20). Total time spent on program activities was between 1.5 h and 84 h, with a mean of 17 h (SD = 15.4). Prevention programs seemed to be conducted over longer periods of time (M = 18 weeks, min = 2 weeks, max = 112 weeks) with less intensity (M = 14.6 h, min = 1.5 h, max = 64 h). Similarly, selective interventions were often conducted over longer periods of time (M = 18 weeks, min = 6 weeks, max = 112 weeks) with fewer daily/weekly allocated program hours (M = 11.5 h, min = 3.3 h, max = 30 h). Clinical interventions, on the contrary, tended to have a higher intensity with more program hours (M = 22.1 h, min = 4 h, max = 84 h) over shorter periods of time (M = 15 weeks, min = 5 weeks, max = 56 weeks). Neither effect size in Cohen's d nor author-derived level of evidence ratings were found to be significantly correlated with program length or hours spent on program activities. However, a marginally significant negative correlation between level of evidence ratings and total time spent on program activities was found (r(56) = -.24, p = .072).

Effects of Age
A medium significant correlation between mean age of participants and effect size (Cohen's d) was found for the total sample (r(43) = 0.35, p = .019), indicating a trend of increased effects with older age groups. Separating program types, there was a significant correlation between mean age and effect size for intervention programs (r(26) = 0.51, p = .005), but not for prevention programs. However, looking at the author-derived level of evidence ratings, no significant correlation between mean age and effectiveness was found (r(43) = 0.24, p = .107). Still, a trend of higher effectiveness of prevention programs in younger age groups was found to be evident in both effect size and level of evidence indicators (see Figs. 2, 3).
The following in-depth analyses of each intervention type's effectiveness trends will rely on the author-derived level of evidence ratings, as the authors believe this combined appreciation of effect size, between group effects, follow-up effects and sample size provided better insights into actual effectiveness trends than any one indicator alone. In the following, each intervention type will be discussed separately to provide an overview of peer relationship outcomes, other positive program effects and methodological components employed by the program.

Prevention Programs
Twenty-five papers reported on prevention programs, targeting typically developing children without any emotional or behavioral problems. Mean program length was 18 weeks (SD = 23) and 15.1 h (SD = 13.4) and mean age of participating children was 11.72 years. Most programs were conducted in a school-setting (72%). Mean program quality was 43% (SD = 28.2) and mean effectiveness rating was 2.64 (SD = 2.78).

Peer Relationship Outcomes of Prevention Programs
A majority of prevention programs (56%) measured outcomes using peer acceptance or popularity measures. Reported effectiveness of these programs on acceptance or popularity was moderate (see Online Resource 3, Table 2). The highest mean effectiveness ratings (3.75) were reported by programs assessing loneliness or connectedness. Friendship quality measures had moderate effectiveness ratings, but promising follow-up trends. However, friendship quality and loneliness results need to be interpreted with caution due to the small number of studies.

Positive Effects of Prevention Programs
Many preventive peer relationship programs had positive effects on psychological wellbeing and mental health factors such as self-concepts, internalizing/externalizing problems, wellbeing, emotion regulation, depression and anxiety (see Online Resource 3, Table 3). Depression and anxiety in particular were associated with a high level of evidence (mean rating of 3) regarding program effects on peer relationships. This is evident from the violin plot in Fig. 7, with the anxiety and depression violin lying entirely in the positive spectrum of peer relationship evidence. This means that all studies improving anxiety and depression also yielded strong positive effects on peer relationships. Figure 7 shows that most studies improving self-concepts, internalizing/externalizing problems, and emotion regulation were in the upper range of the evidence scale, and thus associated with strong peer relationship evidence. This was not the case for social skills and wellbeing factors, as evident in long, equally thin violins across the whole peer relationship evidence scale.

Methods of Prevention Programs
Methods used in prevention programs varied. Didactic content delivery and individual/self-awareness tasks were each present in 60% of prevention programs. Practice of skills, group discussions, and homework were each used in 40-44% of programs. As can be seen in Fig. 8, a majority of studies with didactic content, individual/self-awareness tasks, active practice, homework and parental involvement was in the upper range of evidence, thus associated with strong positive peer relationship evidence. Although present in few studies, highest and most consistent effectiveness ratings were found for parental involvement (mean rating = 3, Cohen's d = .54), with all studies in the positive evidence range. Didactic content delivery, individual/self-awareness tasks and practice of skills were associated with strong positive effects in 80-86% of programs and a mean rating of between 2 and 3. As evident in long, equally wide violins, collaborative tasks and group discussion components were associated with varied evidence, although their mean peer relationship evidence ratings of 3 (Cohen's d = .45) and 2 (Cohen's d = .28) respectively, were similarly strong compared to other methodological components.

Patterns of Methods and Positive Effects of Prevention Programs
As a majority of prevention programs implemented combinations of methodological components, an additional examination of such combinations of methodological components, and their relation with positive target effects and peer relationship evidence was carried out (see Online Resource 3, Table 4). Number of methodological components was not correlated with effectiveness ratings (r(25) = 0.06, p = .76), neither was number of positive target effects (r(24) = -0.02, p = .93).
The two most prevalent methodological components of prevention programs, didactic content delivery and individual/self-awareness tasks, were used in combination by 80% of programs. Programs using both methods had a mean effectiveness rating of 2.75. This combination was frequently paired with homework and parental involvement, both of which were almost exclusively associated with strong positive evidence. All of the prevention programs employing these four components had strong positive effectiveness ratings (M = 3.14). This combination of methods was frequently associated with positive effects on self-concepts and health factors (anxiety, depression or internalizing/externalizing problems). A slightly different pattern is identified regarding prevention programs with behavioral focus, that is positive effects on emotion regulation and social skills. These programs were only associated with high levels of peer relationships evidence when conducted with the methodological component active practice of skills.

Selective Intervention Programs for Typically Developing Children
A total of 14 papers reported on selective intervention programs for typically developing children. One of these papers presented two separate intervention programs in comparison to each other. Thus, the following section will report on 15 programs. Children participating in these interventions were selected based on different behavioral or emotional risk factors, such as bullying, social skills deficits, anxiety, loneliness, suicide risk, or general behavioral problems. Mean age of participating children was 9.87 years (SD = 1.99). Mean program length was 18 weeks (SD = 27.5 weeks), with a mean of 11.50 h (SD = 8.21 h). A majority of programs (64.3%) was conducted in the school-setting. Mean evidence rating of these programs was 2.07 (SD = 2.25) and mean quality rating was 30.83% (SD = 31.29).

Peer Relationship Outcomes of Selective Intervention Programs
Selective interventions used fewer outcome measures than other programs. A majority of selective interventions used peer-focused measures, such as sociometrics or peer acceptance/popularity measures. Both outcomes yielded moderate effectiveness scores (see Online Resource 3, Table 5). However, for both measures, available follow-up data pointed towards maintenance of results. Although fewer studies measured outcomes with loneliness or connectedness measures, these seemed to yield better results (mean evidence = 4) and were associated with higher study quality.

Positive Effects of Selective Intervention Programs
Many selective interventions had positive effects on self-concepts, anxiety, internalizing/externalizing problems, victimization, peer behaviors and academic factors. In particular, self-concepts, anxiety, internalizing/externalizing problems, and victimization had high mean level of evidence ratings (see Online Resource 3, Table 6). However, as evident from the violins with two bubbles in Fig. 9, self-concepts, peer behaviors, victimization, and academic factors were associated with mixed effects on peer relationships. Anxiety and internalizing/externalizing problems, on the contrary, were consistently associated with positive effects on peer relationships.

Methods of Selective Intervention Programs
A majority of selective interventions (80%) used didactic content delivery or practice of skills as methodological components. A third of programs used homework or individual/self-awareness tasks and 20% of programs used group discussions, collaborative tasks or implicit reinforcements of behaviors to achieve program goals. As can be seen in Fig. 10, didactic content and practice of skills were associated with strong evidence ratings in the majority of cases (75% and 67% respectively) and both had a mean evidence rating of 2 (Cohen's d = .47 and .4 respectively). Individual/self-awareness tasks and group discussions were both in the upper range of the scale, which implies they were consistently associated with strong evidence (mean ratings = 3, Cohen's d = .52 and .81 respectively). Using homework also seemed to be effective (mean rating of 2, Cohen's d = .56). Although present in fewer programs, parental involvement and providing context for interaction also seemed promising for achieving high effects (mean ratings of 5 and 4, Cohen's d = .72 and .88 respectively).

Patterns of Methods and Positive Effects of Selective Intervention Programs
For selective interventions, no significant correlation between the effectiveness on peer relationships and number of methodological components (r(15) = 0.266, p = .33) or number of positive effects (r(15) = 0.149, p = .59) was found (see Online Resource 3, Table 7). The majority of selective interventions (73%) used a combination of didactic content delivery and active practice in a group. Only two programs used one of these components without the other. The combination of these components seems universal for selective interventions, regardless of positive effects on other variables and effectiveness on peer relationships. Generally, more effective programs combined active practice and content delivery with either individual/self-awareness tasks, discussions or parental involvement, while less effective programs combined active practice and content delivery with collaborative tasks. Although the most effective program focused on mentoring, any trends or conclusions should be inferred with care, since one of the least effective programs also focused on mentoring.

Indicated Intervention Programs for Children with Clinical Diagnosis
Twenty-two papers reported on indicated interventions for children with clinical diagnosis. Three of these papers reported two separate interventions, thus a total of 25 clinical intervention programs will be compared. The majority of clinical diagnoses concerned autism spectrum disorder (ASD) (18 programs), while four programs focused on attention deficit hyperactivity disorder (ADHD), two programs on mixed populations and one program on anxiety disorder. Mean program length was 15 weeks (SD = 9.94) with a mean of 22.15 h spent on program activities. Participating children had a mean age of 12.12 years (SD = 2.39) and the majority of programs was conducted in a community or social care setting (72%). Mean level of evidence rating was 2.56 (SD = 2.45) and mean quality rating was 37% (SD = 20.25).

Peer Relationship Outcomes of Clinical Intervention Programs
Clinical interventions used a variety of measures to assess peer relationship outcomes. The ASD-specific Quality of Play Questionnaire was frequently used and associated with the highest mean strength of evidence rating of 3.22 (see Online Resource 3, Table 8). Other clinical interventions positively impacted subjective peer acceptance/popularity and sociometric ratings (mean evidence of 2.38 and 2.4 respectively). The interventions' impact on loneliness/connectedness was smaller with a mean strength of evidence rating of 1.33. In contrast to these positive results, effects on friendship quality were sparse, if present at all. With effect sizes between -.15 and -.11, and only one program improving over a control, mean strength of evidence rating was -1, pointing to non-existent effects.

Positive Effects of Clinical Intervention Programs
Overall, clinical intervention programs seemed to have positive effects on few specific target variables. Most clinical interventions improved social skills, ASD symptoms, emotion regulation and peer behaviors (see Online Resource 3, Table 9). Social skills and ASD symptoms were mostly associated with strong effects on peer relationships, as evident from the larger bubbles of the violins in the positive range of the evidence rating in Fig. 11. The interpersonal variables emotion regulation and peer behaviors, as well as anxiety, were not consistently related to effects on peer relationships.

Methods of Clinical Intervention Programs
Most programs implemented didactic content or practice of skills (84%) and involved parents or homework tasks (68%). Some programs used group discussions (32%), implicit reinforcement of behaviors (24%) or individual/self-awareness tasks (20%) to achieve program goals. No clear patterns regarding effectiveness of methodological intervention components emerged. Four components, namely didactic content, active practice, homework and parental involvement, were clearly most prevalent. Although all of them had mean peer relationship evidence ratings of 3 (Cohen's d between .63 and .65), evidence was spread across the scale as evident in long, equally wide violins in Fig. 12. Similarly, group discussions, implicit reinforcement, and individual/self-awareness tasks all achieved mean evidence ratings of 2 (Cohen's d between .45 and .63) but evidence was distributed across the scale.

Patterns of Methods and Positive Effects of Clinical Intervention Programs
No significant correlation between effectiveness ratings on peer relationship outcomes and number of methodological components (r(25) = -0.17, p = .41) or number of positive effects (r(23) = -0.07, p = .76) was found. The majority of clinical interventions (68%) used a combination of the most prevalent methodological components (see Online Resource 3, Table 10). At the same time, a majority of programs achieved positive effects on emotion regulation, social skills, ASD symptoms or behaviors towards others. This combination of specific methodological components and skills/behavior-based variables seems strongly interlinked. Indeed, those few programs yielding effects on variables other than social skills or emotion regulation were also employing different methods. However, no patterns regarding more or less efficient combinations of methods or targets could be identified. The described combinations were present in highly effective programs as well as less effective programs (ratings ranging from -1 to 7).

Discussion
Although peer relationships are crucial for healthy social, emotional and physiological development of adolescents, a comprehensive review of the structure and effectiveness of existing peer relationship programs has so far been lacking. To guide intervention development and inform practitioners' choices, a comprehensive overview of existing programs and effective intervention strategies is needed. Therefore, this study aimed to explore which circumstances impact program effects on peer relationships, and which program types work for different target populations. Considerable heterogeneity was found between program types regarding methods, targeted effects, and peer relationship outcomes, highlighting the need for practitioners and intervention developers to align program characteristics with the target population's needs.

General Circumstances Impacting Program Effectiveness
Consistent with other reviews in the field (Durlak et al., 2011; January et al., 2011), effects of age were observed. A trend for preventive programs to be more effective for younger children was identified, which is in line with findings regarding elevated effectiveness of universal programs when introduced during an (early) developmentally significant period (January et al., 2011). Clinical and selective interventions, however, showed a trend for higher effects in older children. Although other clinical intervention reviews found stronger effects for younger age groups, they also found interaction effects of age and other variables, such as the intervention period (Towle et al., 2020). Interventions usually set in when problems are already present and relevant. Social skills interventions have been understood to be particularly beneficial at the time of school transition as children are increasingly aware of the immanent importance of social skills to initiate positive peer interactions (January et al., 2011). This need might be elevated for a population with ASD or ADHD as behavioral or emotional difficulties are often associated with peer rejection (Perren et al., 2006). Children are likely to become more aware of these peer difficulties and their risk for rejection with increased age.
A negative correlation between time spent on program activities and level of evidence was found. This relationship is surprising as it runs contrary to expectations and to the findings of other reviews, which postulated larger effect sizes for longer and more intensive interventions (Wolstencroft et al., 2018). Indeed, clinical programs tended to have more program hours and higher effect sizes. Thus, this overall finding might be an artefact of a few very long but unsuccessful programs.

Prevention Programs
Prevention programs had promising effects on self-concepts, internalizing/externalizing problems, depression, and anxiety. Specifically, the mental health factors depression and anxiety were consistently associated with improved peer relationships. As these variables were addressed simultaneously, no inferences regarding causal relationships or the direction of effects can be made. However, these findings suggest strong associations between mental health and social relationships, as found in other studies (Leigh-Hunt et al., 2017). Loneliness in particular was found to be strongly associated with mental health (Heinrich & Gullone, 2006) and to have bidirectional links with depressive symptoms (Vanhalst et al., 2012). Preventive programs in this review were particularly effective in reducing loneliness compared to other peer relationship indices.
Additionally, this review's findings suggest a link between improved self-beliefs, mental health factors and peer relationships. Indeed, positive self-beliefs and identity factors are common target variables of mental health prevention programs (Enns et al., 2016). Low self-esteem was found to predict depression, partially mediated by rejection sensitivity, while rejection sensitivity and depression both accounted for increases in loneliness (Zhou et al., 2020). Low self-esteem and fear of negative evaluation have also been found to play a role in the maintenance of loneliness (Geukens et al., 2020). Thus, implementing peer relationship programs targeting self-beliefs and perceived loneliness appears to be a promising preventive approach to combat later mental health problems.
Child-centered interventions typically employ interactive methods similar to the methods identified in this review (Brigden et al., 2019; Voight & Nation, 2016). For prevention programs focusing on self-concepts and mental health factors, a combination of homework, parental involvement, didactic content and self-awareness was most prominent.
Direct didactic input has been linked to improved cognitive impacts (Nelson et al., 2003), which are essential for improved self-concepts. Another set of prevention programs focused on emotion regulation and social skills. These programs had best effects on peer relationships when employing active skills practice in combination with content or self-awareness tasks. Similarly, other studies reported on the benefits of direct instruction for social-emotional skills promotion (Ashdown & Bernard, 2012). This highlights the importance of a nuanced alignment of methodological components and target variables.

Selective Intervention Programs for Typically Developing Children
Selective interventions targeted similar variables as prevention programs. Associations with high peer relationship effects were found for self-concepts, anxiety, internalizing/externalizing problems and victimization. The additional focus on victimization is a result of the definition of intervention as a program targeting existing problems, which in the school context often concerned bullying and victimization. Social withdrawal in childhood has been found to predict later depression diagnosis, via a mediational pathway of adolescent peer problems (Katz et al., 2011). Thus, intervening during adolescence to improve existing peer problems seems to be a promising intervention strategy concerning mental health outcomes of at-risk children.
Many selective interventions employed sociometric measures, which assess children's social standing in the group through peer ratings. However, sociometric outcomes were only moderate, which might be explained by the relative stability of sociometric status, specifically rejection status (Jiang & Cillessen, 2005). As selective interventions focused on existing peer problems, it is likely that target children had been rejected by their peers, thus making intervention effects on sociometric standing more difficult to achieve. Loneliness was targeted in fewer studies, but yielded higher effects, similar to prevention programs. Given a similar focus on mental health factors and self-beliefs, these strong effects on loneliness are very plausible as discussed before. Considering findings regarding the mediational role of rejection sensitivity for depression outcomes and its bi-directional links with loneliness (Zhou et al., 2020), selective interventions bear potential to achieve long-term mental health effects by improving children's self-concepts and (social) anxiety, while simultaneously decreasing loneliness.
Although 73% of selective interventions employed a combination of didactic content and active practice, this combination was not necessarily associated with strong effects on peer relationships. A recent meta-analysis found that social skills interventions alone are not sufficient to reduce bullying (da Silva et al., 2018). The authors suggested that bullying as a group phenomenon needs to be addressed by interventions targeting group norms and group processes beyond individual social behaviors (da Silva et al., 2018). In line with these findings, peer relationship interventions for children with behavioral or emotional risk factors might need to go beyond content delivery and skills practice. Recently, beneficial effects of personality-targeting CBT interventions for high-risk victims were reported (Kelly et al., 2020), emphasizing the importance of self-beliefs. Indeed, this review found consistent associations between self-awareness tasks or group discussions and high peer relationship effects in selective interventions. Therefore, interventions for at-risk children might need to combine active practice of social skills (e.g. role-plays, group tasks) and self-awareness tasks.

Intervention Programs for Children with Clinical Diagnosis
Clinical interventions mainly improved peer relationships by simultaneously improving social skills, ASD symptoms, emotion regulation and peer behaviors. As the clinical diagnoses of children in these studies were mainly ASD and ADHD, typical symptoms often overlap with peer problems and can be alleviated by social skills training, as found in other reviews (Wolstencroft et al., 2018). Most clinical interventions in this review employed a typical but also widely successful combination of didactic content delivery, active practice of skills, homework, and parental involvement. Other studies have found similar intervention set-ups for a clinical population with didactic content (Wolstencroft et al., 2018), practical social skills trainings (January et al., 2011), parental involvement (Brigden et al., 2019; Wolstencroft et al., 2018) and homework (Gardner & Gerdes, 2015). Gains in social skills were partially attributed to gains in social knowledge (Gates et al., 2017), emphasizing the importance of combining content and practical abilities when promoting social skills (January et al., 2011).
Social skills interventions work on the premise of improving children's social behaviors and thereby increasing their social acceptance among peers. Indeed, this review found high effects for interventions targeting peer acceptance and sociometrics, suggesting potential of these interventions to change peer group dynamics. However, setting and group composition are important factors for effects on the group level and transfer of skills to other groups. Clinical interventions in this review were usually implemented in the community in the form of training groups for children with ASD or ADHD. These groups offer children proximity and homophily (i.e. common characteristics) to their peers, two factors essential for the establishment of friendships (Kasari et al., 2016). However, positive intervention effects achieved within this particular group context might not be carried over to a classroom setting when it comes to friendships with typically developing peers (Kasari et al., 2016). In line with this, it has been argued that typically developing peers should be involved in clinical interventions through peer-mediated activities (Kasari et al., 2016) or by focusing on typically developing peers' acceptance of unusual behaviors (Mikami et al., 2005). However, stronger effects were found for social skills interventions as compared to typically developing peer-mediated interventions (Kasari et al., 2016). It has been suggested that delivering social skills interventions in the classroom context will increase effectiveness, as target children would simultaneously benefit from social skills trainings and proximity to their typically developing peers.
Although children with ADHD were not found to experience more loneliness or have fewer friendships, self-reported characteristics of a close friendship varied from typically developing children (Heiman, 2005). Children with ASD were found to report fewer friendships and be selected fewer times as a friend by their peers, while also reporting differences in friendship quality (Kasari et al., 2011). Thus, a clinical population appears to have different peer experiences (Diendorfer et al., 2021). Therefore, the Quality of Play Questionnaire was included as a specific measure accounting for different peer relationship experiences of the clinical population and avoiding an underrepresentation of this population in this review. This measure often achieved high effect sizes, which might have impacted the trends for higher effect sizes in clinical studies. Some reviews suggest a plausible connection between a target group more in need of support and higher effects (Pandey et al., 2018). However, a sub-domain of this measure concerned the number of play dates, which is highly sensitive to parental effort and intervention trainers' encouragement. Although high effects are reduced in the current review through the combined level of evidence rating, a confounding factor of unlikely high effect sizes might persist.

Implications for Practitioners and Intervention Development
Overall, intervention developers and practitioners should be clear about realistic peer relationship goals of their intervention efforts. This review showed that preventive programs and selective interventions targeting self-beliefs or mental health factors bear the potential to decrease loneliness. Programs for a clinical population targeting social skills bear the potential to increase number of play dates, perceived acceptance and sociometric status. However, contrary to intuitive expectations of peer relationship programs, friendship quality was not a prominent outcome and hardly improved. Especially strong peer relationships such as friendships, measured by quality aspects such as support, intimacy and trust (Roach, 2019), are far more complex and might take longer to develop. In contrast, peer acceptance, sociometrics and perceived loneliness largely depend on the amount of (successful) social interactions, which can be increased through social skills trainings or merely social exposure. Similarly, other reviews found most social skills programs to improve interpersonal and emotional skills, while they hardly ever found direct effects on peer relationships (de Mooij et al., 2020). However, the present review found follow-up effects for friendship quality to be rather promising with maintained or even increased long-term effects. This is noteworthy as increases in effect sizes after follow-up periods were rare among other measures, and it suggests that relationship quality might be a realistic long-term outcome that just takes longer to develop. Thus, realistic intervention efforts might concern children's perceived loneliness and acceptance by their peers through improved social skills and improved self-esteem. Over time, more positive interactions might turn into actual friendships, which is, however, beyond the scope of a single intervention program.
For preventive efforts, both group-level indices (sociometrics, acceptance) and individual indices (loneliness) seem to be feasible targets. Highest effects, however, were found for reducing loneliness when simultaneously improving mental health factors and self-beliefs, which is supported by other studies (Geukens et al., 2020; Zhou et al., 2020). Thus, practitioners aiming to prevent relationship problems might want to focus on children's individual outlook and self-concept. Such a focus on mental health and self-concepts was found to benefit peer relationships (e.g. reducing loneliness) by implementing methods to support self-awareness and knowledge components. A focus on emotion regulation and skills, however, benefits peer relationships by actively training those skills in the group setting. Thus, a key finding regarding effectiveness was the importance of aligning methods and intervention targets.
Selective interventions were used to address peer problems and victimization, thus often targeting sociometrics and peer acceptance. However, practitioners should carefully consider which peer factors are feasible intervention targets. Peers' opinions (sociometrics, acceptance) were less likely to change as a result of the intervention, while loneliness was found to be more malleable in the short-term. To interrupt trajectories from withdrawal to later mental health problems (Zhou et al., 2020), it therefore seems more promising for practitioners to address self-beliefs and mental health factors to simultaneously improve at-risk children's perceived loneliness. A combination of content delivery, practice of skills, self-awareness and group discussions seems suitable for these programs.
Most clinical interventions focused on social skills trainings to alleviate peer problems and manage ASD/ADHD symptoms. They successfully featured a combination of content delivery, active practice, homework, and parental involvement. While this focus on social skills positively impacted children's sociometric status and their play dates, no effects on friendship quality were found and effects on loneliness were mediocre. These findings somewhat contradict previous recommendations to focus on dyadic relationships when working with a clinical population (Gardner & Gerdes, 2015). Based on this review's findings, clinical populations are best supported by training their social skills in a group, allowing them to make new friends. Most clinical interventions are currently set in a community context, but transferring them to a classroom setting might allow for better transfer of skills and positive peer experiences with typically developing peers.
Practitioners should be aware of the importance of family support and at-home practice to ensure transfer and sustainability of skills, as evident from high peer relationship effects related to homework and parental involvement. While parental involvement was previously identified as an important intervention component for clinical and at-risk populations (Brigden et al., 2019), this review's results suggest it is also crucial for preventive programs. In the context of children's peer relationships, this importance might even be elevated considering the parents' roles in organizing and supporting play dates.

Limitations of this Review and Implications for Further Research
To the authors' knowledge, this review is the first systematic examination of peer relationship programs. However, this topic comes with some inherent limitations apparent in this study. Due to the wide variability of peer relationship indices and settings, considerable heterogeneity of studies emerged. Rather than limiting this review to a subgroup of existing programs, it aimed to provide an overview of programs in all settings and all populations. While this is regarded as a strength of this study, it was therefore neither possible to conduct a meta-analysis nor to explore every possible line of interaction between program features and effects. Peer relationship indices, methods and program effects were chosen as main analysis targets as this information was thought to be most relevant for practitioners and intervention developers. Exploring more detailed associations, for example methods and age groups (Brigden et al., 2019), was beyond the scope of this review.
As a-priori exclusion of non-randomized trials has been discouraged (Gluud, 2006), this review included RCTs, CTs and uncontrolled trials. Randomization of intervention groups or the employment of a control group is sometimes difficult to achieve in the context of at-risk children in need of support or school settings with school personnel's time restrictions and preferences for either group. However, uncontrolled trials are naturally disadvantaged. Due to the instability of relationships and pronounced loneliness during adolescence (Wong et al., 2018), an intervention program achieving some stability in peer relationships might actually be a successful program. However, this would only be evident compared to a control group showing drops or changes in relationships. Thus, uncontrolled studies failing to produce significant peer relationship improvements might not necessarily be unsuccessful.
A considerable limitation concerns the high percentage of poor-quality studies (a total of 86.4% of included papers) when assessed with the standardized Cochrane RoB 2 measure. This is very problematic as methodological aspects (which are usually assessed in RoB ratings) tend to be correlated with outcome indices (Cheung & Slavin, 2016). Low-quality papers included in this review tended to produce higher effect sizes, a pattern previously reported (Mackenzie & Williams, 2018). To reduce this bias and to allow for careful appreciation of results of non-RCTs (Katikireddi et al., 2015), the Cochrane RoB 2 tool was adapted. The author-derived level of evidence rating allowed for a nuanced rating of existing evidence without downgrading studies without control group or follow-up. Higher effects as rated with this tool were better aligned with high study quality.
Missing information regarding program descriptions and program implementation was one of the biggest obstacles encountered and constitutes another limitation regarding the analysis of intervention contents and methods. For 57 of the originally identified papers, authors had to be contacted for a description of program contents and implementation procedures as the paper did not provide sufficient detail. Only 26% of papers reported appropriate implementation fidelity checks, which is a common problem addressed in other reviews. Thus, for almost three quarters of included papers in this review it is unclear how successful and rigorous implementation was. Given that implementation is crucial for intervention effectiveness (Durlak & DuPre, 2008; Voight & Nation, 2016), this lack of reporting might bias the interpretation of results.
Poor reporting standards regarding implementation, analysis, or control groups and poor quality of studies are common problems regularly criticized within the field (Durlak et al., 2011; Mackenzie & Williams, 2018). Only a quarter of published intervention papers were found to provide information about factors promoting intervention effectiveness and only 5% discussed economic factors (Premachandra & Lewis, 2021), which is highly problematic, as intervention research should aim at informing practitioners' use. Although intervention literature is popular in psychology and education, information on successful programs does not easily reach practitioners, resulting in few interventions being sustained (Durlak & DuPre, 2008). To increase the impact of intervention research and make research findings more applicable for practitioners and policy-makers, reporting of intervention strategies and implementation processes needs to be more rigorous. More rigorous reporting will also reduce potential bias and allow for more nuanced analyses. The more nuanced findings of the author-derived rating tool used in this study emphasize the importance of complex effectiveness evaluations in reviews in order to reduce bias in the interpretation of results.

Conclusion
Despite their potential to support healthy development and improve public health, peer relationship programs have so far been neglected in the literature. A comprehensive review of existing peer relationship programs was conducted to explore circumstances furthering program effectiveness and appropriate methods and targeted effects for different target populations. Intervention developers and practitioners need to put their target population's needs at the center of each intervention effort. For preventive efforts, peer relationships seem to improve alongside mental wellbeing and self-concepts, especially when starting at a young age. Intervention programs addressing peer problems or victimization have been shown to address loneliness most effectively by focusing on mental wellbeing and self-concepts. In clinical populations, peer relationships were improved alongside social skills, emotion regulation and symptom management. Additionally, a close alignment of intervention methods and desired effects is essential, such as practical activities and homework to train emotion regulation and social skills or self-awareness tasks to address mental wellbeing. For further clarification of implementation effects, future intervention studies should put more emphasis on rigorous reporting of intervention characteristics and methodological aspects. Heterogeneity of methodological quality and poor reporting standards are a major problem for efforts to synthesize findings across the wide range of peer relationship programs.
Authors' Contributions IP conceived of the study, led the literature search, conducted and coordinated data extraction and data analysis, and drafted the manuscript; MM conceived of the study and participated in the literature search and coordination of data extraction and data analysis; JB participated in the discussion of data presentation; SD participated in data extraction and data analysis; IK participated in data extraction and data analysis; JCR participated in the discussion of data presentation; EJS participated in the literature search and conceptualization of this study; BS conceived of the study and participated in coordination of data extraction and data analysis; KAMS participated in data extraction and data analysis; KAW conceived of the study and participated in coordination of data extraction and data analysis. All authors read and approved the final manuscript.
Funding This work was part of the D.O.T. research project, which was funded by Ludwig Boltzmann Society Open Innovation for Science, Karl Landsteiner University of Health Sciences, and Lower Austrian Research and Education Association (NFB). The D.O.T. project grew during a sandpit event organized by the Open Innovation Center of Ludwig Boltzmann Society and facilitated by Know Innovation. The authors would like to thank these organizations for their role in the formation of the research group.

Conflict of interest
The authors have no conflicts of interest to declare.
Preregistration A structured protocol for this review was registered on the PROSPERO database (reference CRD42018111227). The systematic search was conducted to feed into three separate strands of data extraction and analysis (see PROSPERO 2018 CRD42018107945 and CRD42018114312). The original registration for this review was published in October 2018 and amended in January 2021 after the search update was conducted.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.