Shinrin-yoku (forest bathing) is a healing practice from Japan in which people immerse themselves in nature while mindfully paying attention to their senses. Often involving a walk in a forest, it aims to integrate and harmonise humans with the forest (Miyazaki 2018). Shinrin-yoku programmes include breathing yoga, meditation, walking and other recreational activities (e.g. cooking) that are often aimed at producing relaxation effects (Forest Therapy Society 2005). The word ‘shinrin-yoku’ (“森林浴”) was coined in 1982; ‘yoku’ (bathing) conveys the holistic nature of our health. Shinrin-yoku then began to be introduced into Japanese clinical fields (Hansen et al. 2017).

Literature reviews have reported diverse health benefits of shinrin-yoku, including improved immune system functioning through increased natural killer cells, and benefits for the cardiovascular and respiratory systems (Williams 2016). The health benefits of shinrin-yoku are not limited to physical well-being; improvements have also been described in mood disorders, stress and mental relaxation (Park et al. 2012).

There are a number of theories that account for the health benefits of exposure to nature. Kaplan’s Attention Restoration Theory claims that spending time in nature restores our concentration through the practice of effortless attention (Kaplan and Kaplan 1989). Stress Reduction Theory asserts that being in an unthreatening natural environment reduces stress and improves relevant physiological functions such as heart rate and blood pressure (Ulrich et al. 1991). Indeed, Song, Ikei and Miyazaki noted that natural stimuli help to reduce stress and strengthen our immune system. More recent studies explored the mechanism of shinrin-yoku and found that its benefits accord with Gilbert’s (2014) model of affect regulation (Richardson et al. 2016). Although the benefits of nature for affect regulation are often overlooked (Korpela et al. 2018), affect regulation is essential to health and well-being (Gross 2013). Forest bathing and connecting with nature can help us regulate our emotions through soothing and calming (i.e. the parasympathetic system), instead of fear, anxiety and drive (i.e. the sympathetic system) (Richardson et al. 2016).

Humans are more familiar with spending time in nature than in urban environments: over seven million years of human history, we have spent 99.99% of the time in nature, which may partially explain why we feel better in nature (Miyazaki 2018). Women living in a green-rich area had a 12% lower rate of mortality than those living in a green-poor area (James et al. 2016). Spending time in nature is related to lower rates of depression and high blood pressure, and frequent visits to nature were related to social cohesion (Shanahan et al. 2016). Participants who viewed a towering tree for 1 min reported higher awe, which was associated with more prosocial helping behaviours, than people who viewed a building of the same height (Piff et al. 2015). A three-day shinrin-yoku programme increased the number and activity of natural killer cells compared with 3 days of walking in a city (Li 2010). Likewise, a 90-min walk in nature reduced the level of rumination (negative repetitive thoughts linked with mental health problems) and activity in the subgenual prefrontal cortex (a part of the brain related to mental health problems) (Bratman et al. 2015). Finally, there is recent evidence that visits and time in nature may be acting as proxy measures for nature connectedness (Martin et al. 2020).

While our physical health improved markedly over the twentieth century, our mental health has arguably worsened (Mental Health Foundation 2016). In 1995, the World Health Organization (WHO) launched an initiative called ‘Nations for Mental Health’, aiming to raise awareness of mental health and innovate mental health treatment (WHO 2002). The Mental Health Action Plan 2013–2020 was passed at the 66th WHO Summit, promoting universal mental health care (WHO 2013). Approximately 1.1 billion people (15% of the world population) were estimated to have a mental health problem in 2016, the most prominent disorders being anxiety (4%), depression (3%) and alcohol use disorders (1%; Ritchie and Roser 2018). The global costs of mental illness were estimated at about £2.5 trillion in 2010 and were projected to increase to £6 trillion by 2030. About two-thirds of those costs are indirect, for example, reduced productivity and income (Marquez and Saxena 2016). Among developed countries, the costs related to mental disorders are 2–4% of GDP (Hewlett 2014), and they were estimated to be substantially higher in developing countries (Patel 2007). Unsurprisingly, many countries have enacted government-led initiatives. For example, in the UK, mental health has been high on the national agenda and the budget for mental health care has been increasing (Department of Health 2011). In Japan, poor mental health has also been a major national issue (e.g. the high suicide rates), and new policies for supporting people with mental health problems were established in 2004, improving citizens’ mental health awareness and care (Ministry of Health, Labour and Welfare [MHLW] 2004). This led to a revision of the law, further enhancing the country’s mental health care (MHLW 2014).

These reports suggest that mental health is a worldwide concern, and that affordable, accessible and effective mental health solutions are needed. Treatment and care using nature may be one solution that can satisfy those needs (Hunter et al. 2019). Some of the benefits have been reviewed recently (Richardson et al. 2016; Farrow and Washburn 2019; Payne and Delphinus 2019), but no review has specifically focused on mental health benefits. Accordingly, the present review systematically evaluated empirical findings on the effects of shinrin-yoku on mental health. The most prevalent mental health problems are depression, anxiety and stress; therefore, these were our foci (Farmer and Dyer 2016). In addition, we examined shinrin-yoku’s effects on anger, as it is associated with depression, anxiety and stress (Walsh et al. 2018).

Methods

The present article followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA; Moher et al. 2009) guidelines to systematically review the literature and appraise the quality of evidence for the mental health effects of shinrin-yoku. Additionally, to maintain the validity of this systematic review, Klassen et al.’s (1998) framework was employed, focusing on question, criteria, missing articles, quality of the studies, assessment and results. The extended version of the PICO (population, intervention, control and outcomes) format (Boland et al. 2013) was used to construct a researchable question by dissecting the question into those components to help organise relevant information (Sackett et al. 1997). The main research questions of this review were the following: (i) how effective is shinrin-yoku in improving mental health outcomes? and (ii) what quantity and quality of evidence is reported?

Literature Search

The literature search was conducted clarifying (i) where (i.e. databases), (ii) when the literature was searched, (iii) who searched the literature, (iv) how (i.e. keywords), (v) how many articles were retrieved for each combination of the keywords and (vi) why some articles were included/excluded (i.e. selection criteria) (Callahan 2010). PubMed/MEDLINE, PsycINFO, Science Direct and Google Scholar were searched after a consultation with a subject librarian. Articles published before 30 October 2019 were searched in November 2019. The search terms ‘shinrin-yoku’ (including ‘shinrin yoku’) (n = 205), ‘forest bathing’ (n = 148) and ‘nature therapy’ (n = 129) retrieved 481 articles in total (‘nature therapy’ was treated as synonymous with shinrin-yoku; Hansen et al. 2017).

Eligibility Criteria

To be eligible for further analysis, articles needed to (i) be published in a peer-reviewed academic journal in the English language; (ii) employ a shinrin-yoku intervention; (iii) report an empirical intervention study, using pre- and post-intervention measures; and (iv) use mental health measures for depression, anxiety, stress or anger. Articles were excluded if they (i) were not interventions, (ii) were case studies or qualitative studies or (iii) did not measure depression, anxiety, stress or anger (Table 1).
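The criteria above can be read as a simple screening rule. The following is a minimal sketch only: the `Article` fields and the `is_eligible` function are hypothetical names introduced for illustration, and it assumes that an article qualifies when it measures at least one of the four target outcomes.

```python
from dataclasses import dataclass

# Outcomes of interest named in the eligibility criteria
TARGET_OUTCOMES = {"depression", "anxiety", "stress", "anger"}

@dataclass
class Article:
    """Hypothetical screening record for one retrieved article."""
    peer_reviewed: bool
    in_english: bool
    shinrin_yoku_intervention: bool
    pre_post_measures: bool
    case_or_qualitative_study: bool
    outcomes: frozenset  # mental health outcomes measured

def is_eligible(article: Article) -> bool:
    """Apply inclusion criteria (i)-(iv) and the exclusion criteria."""
    return (
        article.peer_reviewed
        and article.in_english
        and article.shinrin_yoku_intervention
        and article.pre_post_measures
        and not article.case_or_qualitative_study
        # Assumption: at least one target outcome is enough
        and bool(TARGET_OUTCOMES & article.outcomes)
    )

# Hypothetical example records
walk_study = Article(True, True, True, True, False, frozenset({"anxiety"}))
interview_study = Article(True, True, True, False, True, frozenset({"stress"}))
print(is_eligible(walk_study), is_eligible(interview_study))
```

In practice, screening was done by the authors reading titles, abstracts and full texts; the sketch only makes the logical structure of the criteria explicit.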

Table 1 Extended PICO for this review

Outcome Measures

Outcome measures were instruments that evaluate the levels of depression, anxiety, stress and anger. Because various measurement tools were used, we did not set principal summary measures. Mental health outcomes measured in the included studies were depression (k = 19), anxiety (k = 22), anger (k = 14) and stress (k = 1).

Data Extraction and Synthesis

The lead author comprehensively examined the search results, and articles were shortlisted for possible inclusion if the title and abstract indicated that the study might satisfy the eligibility criteria. Manual reference searches of previous systematic reviews on shinrin-yoku (n = 15) identified 16 additional articles that might meet the inclusion criteria; these were also shortlisted (Rojon et al. 2011; Appendix 1). To counter any potential bias, the other co-authors independently reviewed the entire selection process. Full papers of shortlisted articles were independently reviewed by all co-authors. Lastly, a discussion was held among the co-authors to confirm whether the selected articles had met the eligibility criteria and to revisit the excluded studies to ensure the reasons for exclusion were accurate (Appendix 2). Forward and backward reference searches of relevant articles revealed no additional studies.

Data were extracted focusing on study aims, characteristics, participants, intervention details, outcome measures and main findings (Table 2). Data were synthesised by the mental health outcomes examined in the selected articles, further categorised into the four measures of mental health—depression, anxiety, stress and anger (Table 3).

Table 2 Study details of selected articles exploring mental health effects of shinrin-yoku (n = 20)
Table 3 Included studies organised by mental health measures

Meta-analyses were conducted focusing on depression, anxiety and anger; meta-analysis of stress data was not possible (k = 1). We compared Pearson’s product-moment correlations to determine effect size for the shinrin-yoku intervention. Data were entered into Meta-Essentials (Van Rhee et al. 2015).
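The Results report effect sizes as g. As an illustration of how a standardised effect size of this kind can be derived from group summaries, the sketch below computes Hedges’ g from two groups’ means, standard deviations and sample sizes. The function name and the example numbers are hypothetical and are not data from the included studies; Meta-Essentials performs its own calculations, which may differ in detail.

```python
import math

def hedges_g(mean_1, sd_1, n_1, mean_2, sd_2, n_2):
    """Standardised mean difference (group 1 minus group 2)
    with the usual small-sample correction."""
    df = n_1 + n_2 - 2
    # Pooled standard deviation across the two groups
    pooled_sd = math.sqrt(((n_1 - 1) * sd_1**2 + (n_2 - 1) * sd_2**2) / df)
    d = (mean_1 - mean_2) / pooled_sd
    # Small-sample correction factor
    correction = 1 - 3 / (4 * df - 1)
    return d * correction

# Hypothetical post-intervention depression scores: forest group vs urban group
g = hedges_g(mean_1=4.0, sd_1=2.0, n_1=20, mean_2=6.0, sd_2=2.0, n_2=20)
print(round(g, 3))
```

A negative value here means the first (forest) group scored lower than the second (urban) group, which matches the sign convention used in the Results.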

Variability was examined using Cochran’s Q and I². Heterogeneity among effect sizes was determined by a significant Q value (p < 0.10). The I² statistic indicates the degree of variability in effect sizes (low heterogeneity, 1–49%; moderate heterogeneity, 50–74%; high heterogeneity, 75–100%). In the case of significant heterogeneity, subgroup and moderator analyses were undertaken.
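The relationship between Cochran’s Q and the inconsistency statistic can be sketched as follows, assuming the standard definition in which inconsistency is the share of Q in excess of its degrees of freedom (k − 1), expressed as a percentage and floored at zero. The function names are hypothetical; the bands are those stated in the text.

```python
def i_squared(q, k):
    """Inconsistency (%) from Cochran's Q and the number of studies k."""
    df = k - 1
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

def heterogeneity_band(i2):
    """Bands as stated in the text: low 1-49, moderate 50-74, high 75-100."""
    if i2 >= 75:
        return "high"
    if i2 >= 50:
        return "moderate"
    return "low"

# Q and k as reported for depression in the RCT meta-analysis
i2 = i_squared(38.84, 6)
print(f"{i2:.2f}% ({heterogeneity_band(i2)})")
```

With the Q value and number of studies reported for depression in RCTs (Q = 38.84, six studies), this reproduces the reported inconsistency of 87.13%.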

Quality Appraisal: Risk of Bias

The quality of the included non-randomised studies was assessed using the Newcastle-Ottawa Scale (NOS; Wells et al. 2000). Using a star system, three assessors rate the quality of studies from 0 to 9 stars (high risk, 0–3; medium risk, 4–6; low risk, 7–9) in three domains: (i) representativeness of study group selection (max. 4), (ii) comparability of groups (max. 2) and (iii) ascertainment of either the exposure or outcome of interest (max. 3). Some adjustments were made to NOS because many of the included studies recruited samples who had no mental disorders (while NOS was originally developed for medical research attended by clinical samples): (i) the word ‘exposure’ was changed to ‘intervention’, (ii) the fourth scale item was changed from ‘Demonstration that outcome of interest was not present at start of study’ to ‘Demonstration that the measured outcome was assessed before the intervention’ (because some mental health outcomes exist before intervention, e.g. stress) and (iii) in respect of the first item in the outcome assessment, a star was awarded if the outcome was assessed using a validated scale (instead of medical records).
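The star system described above amounts to summing domain scores and mapping the total to a risk band. The sketch below assumes the domain maxima and bands given in the text; the dictionary keys, the function name and the example rating are hypothetical.

```python
# Maximum stars per domain, as described in the text
NOS_DOMAIN_MAX = {"selection": 4, "comparability": 2, "outcome": 3}

def nos_risk_of_bias(stars):
    """Sum domain stars (0-9) and map the total to a risk band."""
    for domain, value in stars.items():
        if not 0 <= value <= NOS_DOMAIN_MAX[domain]:
            raise ValueError(
                f"{domain} must be between 0 and {NOS_DOMAIN_MAX[domain]}"
            )
    total = sum(stars.values())
    if total <= 3:
        risk = "high"
    elif total <= 6:
        risk = "medium"
    else:
        risk = "low"
    return total, risk

# Hypothetical rating for one non-randomised study
print(nos_risk_of_bias({"selection": 2, "comparability": 1, "outcome": 2}))
```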

Randomised controlled trials were appraised using the Quality Assessment Table of Randomised Controlled Trials (Brown et al. 2013).

Results

Search Results

The article selection process is illustrated in Fig. 1. Of the 497 articles (481 from the databases and 16 from the manual reference search), 167 were removed as duplicates. The remaining 330 articles were screened by title and abstract by the authors. Sixty articles were selected for full-text review, of which 40 were excluded (Appendix 2) and 20 were included (Table 2).
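The flow of articles can be checked with simple arithmetic. The sketch below uses the counts reported in the text; the variable names are hypothetical, and the number excluded at title and abstract screening is derived here rather than reported.

```python
# Counts as reported in the text
from_databases = 481
from_manual_search = 16
duplicates_removed = 167
full_text_reviewed = 60
full_text_excluded = 40

identified = from_databases + from_manual_search
screened = identified - duplicates_removed
# Derived, not reported: articles dropped at title/abstract screening
excluded_at_screening = screened - full_text_reviewed
included = full_text_reviewed - full_text_excluded

print(identified, screened, excluded_at_screening, included)
```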

Fig. 1 PRISMA flow diagram of the article selection process

Characteristics of Included Studies

The twenty included studies were relatively recent, the oldest being published in 2007 (Morita et al. 2007). The majority of the studies were conducted in Asia (n = 18; 86%): ten in Japan (Furuyashiki et al. 2019; Horiuchi et al. 2013; Lee et al. 2011; Morita et al. 2007; Ochiai et al. 2015; Park et al. 2011; Song et al. 2018, 2019; Takayama et al. 2014, 2019), four in Korea (Chun et al. 2017; Lee et al. 2018; Han et al. 2016; Shin et al. 2012), two in Taiwan (Chen et al. 2018; Yu et al. 2017) and one in China (Guan et al. 2017). Three studies were conducted in Europe: two in Poland (Bielinis et al. 2018a, b; Bielinis et al. 2019), and one in Serbia (Vujcic et al. 2017) (Table 2). No studies were identified in Africa, Oceania or South and North America. Eight studies were non-randomised trials (Table 5), and twelve were randomised controlled trials (RCT; Table 6). All the non-randomised studies employed a pre-post design; two studies had a comparator condition. The twelve RCTs included six crossover studies (e.g. a forest group walked in a city, while a city group walked in a forest on the second day; Lee et al. 2011; Park et al. 2011; Song et al. 2018, 2019; Takayama et al. 2014; Takayama et al. 2019), and one study where groups were categorised by different types of trees (Guan et al. 2017). Interventions included walking and meditation, and their duration ranged from 15 min to 9 days. While all studies involved paying attention to the five senses (see Table 1 for eligibility criteria), 18 (90%) studies involved walking (Bielinis et al. 2018a, b; Chen et al. 2018; Chun et al. 2017; Furuyashiki et al. 2019; Guan et al. 2017; Han et al. 2016; Horiuchi et al. 2013; Lee et al. 2011; Lee et al. 2018; Morita et al. 2007; Ochiai et al. 2015; Park et al. 2011; Shin et al. 2012; Song et al. 2019; Takayama et al. 2014, 2019; Vujcic et al. 2017; Yu et al. 2017), four (20%) involved meditative activities (Furuyashiki et al. 2019; Lee et al. 2018; Shin et al. 2012; Ochiai et al. 2015) and three (15%) involved recreational activities (Bielinis et al. 2019; Chen et al. 2018; Han et al. 2016).

A total of 2257 participants (M = 1478, F = 779; age range 18–79 years old) were involved in these included studies, indicating shinrin-yoku’s wide applicability. Six studies involved clinical samples: metabolic syndrome (Lee et al. 2018), chronic stroke (Chun et al. 2017), psychiatric disorders (Vujcic et al. 2017), chronic diseases (Yu et al. 2017), chronic pain (Han et al. 2016) and alcoholism (Shin et al. 2012).

Measures

Table 3 presents all included studies organised by the mental health measures. The Profile of Mood States (POMS) was the most frequently used measure in shinrin-yoku research (n = 14); other measures used in more than one paper were the State-Trait Anxiety Inventory (STAI; n = 6) and the Beck Depression Inventory (BDI; n = 3).

Outcomes

Depression was measured using six scales: POMS, BDI, the Depression Anxiety and Stress Scale (DASS), the EuroQol Visual Analog Scale (EQVAS), the Hamilton Depression Rating Scale (HDR) and the Multiple Mood Scale (MMS). Anxiety was also measured using six scales: POMS, STAI, the Anti-Anxiety Questionnaire (AAQ), DASS, EQVAS and MMS. Anger was measured using a single scale: POMS.

Meta-analyses

The depression subscales in POMS, DASS21, MMS and BDI were considered for meta-analysis of depression; EQVAS was excluded as it measures depression and anxiety together. Chun, Chang and Lee’s study (2017) used both HDR and BDI; HDR was removed because it emphasises physical symptoms (Hamilton 1960). The anxiety subscales in POMS, STAI and DASS21 were considered for meta-analysis of anxiety, and the anger subscale in POMS was considered for meta-analysis of anger (* in Table 3). The STAI anxiety score from Chun et al.’s study (2017) was not included because the study did not report whether the 20 items used related to state or trait anxiety. Likewise, the AAQ anxiety score (Guan et al. 2017) was not included because the study did not report whether this scale has been validated. For studies that employed two anxiety subscales, namely POMS and STAI (Chen et al. 2018; Song et al. 2018, 2019; Yu et al. 2017), POMS was used as it is more common. Random effects models were used because the included studies covered diverse populations; thus, heterogeneity was assumed (mean effect sizes = small, 0.10–0.29; moderate, 0.30–0.49; high, ≥ 0.50; Cohen 1992).
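To make the random effects approach concrete, the sketch below pools per-study effect sizes with the DerSimonian-Laird estimator and labels the result with the bands quoted above. This is an assumption for illustration: the text does not state which estimator or interval method Meta-Essentials applies, the 1.96 multiplier is a normal approximation, the "negligible" label for values below 0.10 is added here, and the example effects and variances are hypothetical.

```python
import math

def random_effects_pool(effects, variances):
    """DerSimonian-Laird random-effects pooling of per-study effect sizes."""
    k = len(effects)
    w = [1 / v for v in variances]  # fixed-effect (inverse-variance) weights
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance
    w_star = [1 / (v + tau2) for v in variances]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    ci = (pooled - 1.96 * se, pooled + 1.96 * se)
    return pooled, ci, q, tau2

def magnitude(effect):
    """Bands quoted in the text, applied to the absolute effect size."""
    size = abs(effect)
    if size >= 0.50:
        return "high"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "small"
    return "negligible"  # label assumed here; not defined in the text

# Hypothetical per-study effect sizes and sampling variances
pooled, ci, q, tau2 = random_effects_pool([-0.3, -0.6, -0.9], [0.04, 0.04, 0.04])
print(round(pooled, 2), tuple(round(x, 2) for x in ci), magnitude(pooled))
```

Because the between-study variance is added to every study’s sampling variance, the random-effects weights are more even across studies than fixed-effect weights, and the confidence interval is wider when heterogeneity is present.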

First, data from six RCT studies were analysed (Bielinis et al. 2018a, b; Chun et al. 2017; Lee et al. 2011; Shin et al. 2012; Takayama et al. 2014, 2019). Song et al.’s RCT studies (2018, 2019) were excluded as these studies only reported post-intervention scores. Lee et al. (2018) and Guan et al. (2017) were excluded as their RCTs compared different types of forest. Lastly, Vujcic et al. (2017) was excluded as their RCT did not employ a comparable control group.

Second, 16 studies that reported pre-intervention and post-intervention scores were analysed (Bielinis et al. 2018a, b, 2019, Chen et al. 2018, Chun et al. 2017, Furuyashiki et al. 2019, Horiuchi et al. 2013, Lee et al. 2011, 2018, Morita et al. 2007, Ochiai et al. 2015, Park et al. 2011, Shin et al. 2012, Takayama et al. 2014, 2019, Yu et al. 2017, Vujcic et al. 2017).

As significant heterogeneity was identified for each symptom, three moderator analyses (i, ii and iii) and three subgroup analyses (iv, v and vi) were conducted to appraise whether (i) crossover of the two groups, (ii) participants being Asian, (iii) participants being Japanese, (iv) the length of the intervention, (v) gender (female-male ratio) and (vi) age accounted for the variability in the effects. The first moderator, (i) crossover, was not examined in the 16 pre-post studies, as it was not applicable. Lastly, publication bias was examined. Table 4 summarises the results of our meta-analysis.

Table 4 Effect sizes (g) and p values for moderators and subgroups in each variable (depression, anxiety and anger)

Depression in RCT

The total sample size for RCTs measuring depression was 417 (range 12–47) from six studies. Figure 2 shows the forest plot for the meta-analysis of depression in RCTs. There was a large mean negative effect size, g = −2.54, 95% CI (−3.56, −1.52), which was significant. Heterogeneity of effects was significant (Q = 38.84, p < 0.001) and inconsistency was high (I² = 87.13%); in all studies, the effect was negative: depression decreased more in the forest setting than in the urban setting.

Fig. 2 Effect size for depression in RCT

None of the three moderators was a significant predictor: (i) whether group crossover was done or not (p = 0.27), (ii) whether participants were Asian or not (p = 0.26) and (iii) whether participants were Japanese or not (p = 0.27). In subgroup analyses, the length of the intervention (iv) was a significant predictor of effect size for depression (slope = 0.04, p = 0.002). However, the intervention length was not significant (p = 0.11) when one extreme value (Shin et al. 2012) was removed. The female-male ratio (v) was not a significant predictor (slope = 0.09, p = 0.97), whereas average age (vi) was a significant predictor for depression (slope = 0.04, p = 0.04). Possible evidence of publication bias was identified (Appendix 3).

Depression in Studies Reporting Pre-Post Scores

The total sample size for pre-post scores measuring depression was 1449 (range 12–498) from 16 studies. Figure 3 shows the forest plot for the meta-analysis of depression in studies that reported pre-post scores. There was a large mean negative effect size, g = −1.04, 95% CI (−1.47, −0.60), which was significant. Heterogeneity of effects was significant (Q = 331.57, p < 0.001) and inconsistency was high (I² = 95.48%); in all studies apart from Lee et al. (2011), the effect was negative: depression decreased from pre-shinrin-yoku to post-shinrin-yoku.

Fig. 3 Effect size for depression in studies reporting pre-post scores

Neither of the two moderators was a significant predictor: (ii) whether participants were Asian or not (p = 0.12), and (iii) whether participants were Japanese or not (p = 0.20). In subgroup analyses, the length of intervention (iv) was not significant (slope = −0.02, p = 0.24); however, after removing an extreme value (Park et al. 2011), it became significant and negative; i.e. the longer the intervention, the smaller the effects (slope = −0.03, p = 0.01). The female-male ratio (v) was significant (slope = 1.04, p = 0.03); however, after removing an extreme value (Park et al. 2011), it became non-significant (slope = 0.58, p = 0.15). Lastly, average age (vi) was non-significant (slope = 0.01, p = 0.22). Possible evidence of publication bias was identified (Appendix 4).

Anxiety in RCT

The total sample size for studies measuring anxiety was 327 (range 12–46) from five studies. Figure 4 shows the forest plot for the meta-analysis of anxiety. There was a large mean negative effect size, g = −8.81, 95% CI (−21.19, 3.57), which was not significant because the confidence interval includes zero. Variability across samples was significant (Q = 125.03, p < 0.001) and high (I² = 96.80%).

Fig. 4 Effect size for anxiety in RCT

All three moderators (crossover, p = 0.003; Asian, p = 0.007; Japanese, p = 0.003) were significant predictors of effect size for anxiety. The length of the intervention (iv) was not a significant predictor (slope = 0.35, p = 0.19). Female-male ratio (v) was a significant predictor; i.e. the effects were smaller when there were fewer female participants (slope = −30.84, p < 0.001), whereas average age (vi) was not a significant predictor (slope = 0.13, p = 0.20). No evidence of publication bias was identified (Appendix 5).

Anxiety in Studies Reporting Pre-Post Scores

The total sample size for pre-post scores measuring anxiety was 1371 (range 12–498) from 16 studies. Figure 5 shows the forest plot for the meta-analysis of anxiety in studies that reported pre-post scores. There was a large mean negative effect size, g = −1.83, 95% CI (−3.07, −0.58), which was significant. Heterogeneity of effects was significant (Q = 611.89, p < 0.001) and inconsistency was high (I² = 97.55%); in all studies, anxiety decreased from pre-shinrin-yoku to post-shinrin-yoku.

Fig. 5 Effect size for anxiety in studies reporting pre-post scores

The two moderators—(ii) whether participants were Asian or not (p = 0.88), and (iii) whether participants were Japanese or not (p = 0.75)—were not significant predictors. In subgroup analyses, (iv) the length of the intervention was significant; i.e. the longer the intervention, the more effects observed (slope = 0.13, p = 0.02); however, after removing one extreme value (Park et al. 2011), it was not significant (slope = 0.05, p = 0.24). The female-male ratio (v) was not significant (slope = 1.26, p = 0.12). Lastly, the average age (vi) was a significant moderator (slope = 0.04, p = 0.01); however, after removing an extreme value (Park et al. 2011), it became non-significant (slope = 0.01, p = 0.52). Possible evidence of publication bias was identified (Appendix 6).

Anger in RCT

The total sample size for studies measuring anger was 268 (range 12–46) from four studies. Figure 6 shows the forest plot for the meta-analysis of anger. There was a large mean negative effect size, g = −1.63, 95% CI (−3.25, −0.01), which was significant. Variability across samples was significant (Q = 25.52, p < 0.001) and high (I² = 88.25%).

Fig. 6 Effect size for anger in RCT

For anger, none of the three moderators (crossover, p = 0.13; Asian, p = 0.13; Japanese, p = 0.13) was a significant predictor. Likewise, the intervention length (data unidentified), female-male ratio (slope = −1.61, p = 0.47) and average age (slope = −3.40, p = 0.06) were not significant. Possible evidence of publication bias was identified (Appendix 7).

Anger in Studies Reporting Pre-Post Scores

The total sample size for pre-post scores measuring anger was 1365 (range 12–498) from 12 studies. Figure 7 shows the forest plot for the meta-analysis of anger in studies that reported pre-post scores. There was a large mean negative effect size, g = −0.81, 95% CI (−1.17, −0.45), which was significant. Heterogeneity of effects was significant (Q = 153.52, p < 0.001) and inconsistency was high (I² = 92.83%); in all studies, anger decreased from pre-shinrin-yoku to post-shinrin-yoku.

Fig. 7 Effect size for anger in studies reporting pre-post scores

The two moderators—(ii) whether participants were Asian or not (p = 0.46), and (iii) whether participants were Japanese or not (p = 0.15)—were not significant predictors. In subgroup analyses, all subgroups were significant: the length of the intervention (iv) slope = − 0.13, p < 0.001; female-male ratio (v) slope = − 0.65, p < 0.001, and average age (vi) slope = − 0.01, p < 0.001. Possible evidence of publication bias was identified (Appendix 8).

Risk of Bias

The risk of bias in the non-randomised studies was deemed to be medium for all eight studies (Bielinis et al. 2019; Chen et al. 2018; Furuyashiki et al. 2019; Han et al. 2016; Horiuchi et al. 2013; Morita et al. 2007; Ochiai et al. 2015; Yu et al. 2017). All of these studies assessed the mental health outcomes before and after shinrin-yoku (for non-clinical samples, participation eligibility of no mental health disorder was reported). None of these eight studies commented on the representativeness of the cohort or conducted follow-up assessments (Table 5).

Table 5 Assessment of quality of studies based on mental health outcome (non-randomised trials)

In the randomised controlled trials, the risk of bias was deemed high to medium: all the studies scored from two (Bielinis et al. 2018a, b) to six (Shin et al. 2012). For the studies that employed crossover (Park et al. 2011; Song et al. 2019; Takayama et al. 2014, 2019), blinding administration and participants were both graded as ‘not applicable (NA)’ as it was impossible for participants to be unaware of the condition they were assigned to at each time. All studies reported the number of participants allocated to different groups, and inclusion criteria apart from Song et al. (2019). The baseline comparability of different groups was reported in seven studies (Chun et al. 2017; Guan et al. 2017; Lee et al. 2011, 2018; Shin et al. 2012; Takayama et al. 2014, 2019), and achieved in seven studies (Chun et al. 2017; Guan et al. 2017; Lee et al. 2011, 2018; Shin et al. 2012; Takayama et al. 2014; Vujcic et al. 2017): Vujcic et al. (2017) did not present demographic details; however, they noted that the gender and diagnosis distribution were equal. Unsurprisingly, given the type of intervention, no study maintained allocation concealment and blinding of assessors, administration and participants; hence, the blinding procedure was poor (Table 6).

Table 6 Assessment of quality of RCT studies based on mental health outcomes

Discussion

This systematic review and meta-analysis examined the quality and extent of evidence reported in studies investigating the effects of shinrin-yoku on mental health. Twenty studies (eight non-randomised, and twelve randomised controlled trials), involving 2257 participants, satisfied all of the eligibility criteria for in-depth review and assessment. Shinrin-yoku was deemed to have a greater effect on anxiety, than depression and anger, and the effects on anxiety could be predicted by many of the moderators examined, including the gender and Japanese or Asian participants (greater proportions of females, Japanese or Asian participants were associated with larger effects). Potential publication bias was identified in all analyses apart from RCTs on anxiety. While some studies demonstrated rigorous design and reporting, our conclusions are tempered by a number of weaknesses concerning study design and outcomes. Accordingly, in this discussion, we elucidate a number of areas of improvement.

Shinrin-yoku was reported effective for depression, anxiety, stress and anger in both clinical and non-clinical samples, especially for anxiety. The results reported in the selected studies were in line with relevant theories: spending time in nature increased restoration (Bielinis et al. 2019) aligning with Attention Restoration Theory (Kaplan and Kaplan 1989). Stress was reduced through shinrin-yoku (Vujcic et al. 2017; Morita et al. 2007), supporting Stress Reduction Theory (Ulrich et al. 1991). The role of nature in affect regulation is often overlooked (Korpela et al. 2018), and although not explicitly explored, findings accord with the three emotion regulatory systems model (threat, drive, and soothing; Richardson et al. 2016); being in nature may activate our soothing system, endorsing compassion, safety and connection, protecting our mental health. Psychological constructs relevant to the soothing system such as self-compassion and psychological safety need to be examined in shinrin-yoku research. Further, although likely to activate the pathways to nature connection (Lumber et al. 2017), none of the studies explored the psychological construct of nature connectedness—oneness with nature (Nisbet et al. 2009)—which is positively associated with psychological well-being (Pritchard et al. 2019). Future shinrin-yoku research should also explore nature connectedness.

Although all included studies demonstrated promising results, the risk of bias was deemed medium to high, and potential publication bias was identified in almost all analyses. This may explain why benefits were greater for Japanese and Asian participants: people in a culture that accords with nature’s healing effects may receive greater benefits of shinrin-yoku (e.g. Shintoism, perceptions of nature differ cross-culturally; Gierlach et al. 2010). Furthermore, none of the RCTs compared shinrin-yoku with other major therapeutic approaches such as CBT (while there was a study that combined CBT and nature; Kim et al. 2009): shinrin-yoku was only compared with spending time in urban settings (Vujcic et al. 2017 compared with art therapy, not a major approach). Given that being in an urban setting has negative health effects (Lederbogen et al. 2011; Marques and Lima 2011), shinrin-yoku should be compared with other major therapeutic approaches. Indeed, in our RCT meta-analyses, all control/urban groups, apart from depression score in Chun, Chang and Lee’s study (2017), reported increases in mean scores. These points suggest the need for shinrin-yoku research in Oceania, Africa and North and South America, and the need to compare shinrin-yoku with other major approaches. Moreover, shinrin-yoku’s effects were particularly salient for anxiety, which is the most common mental health problem in the world (Ritchie and Roser 2018), again suggesting more shinrin-yoku research is needed.

Other limitations in shinrin-yoku research included a lack of follow-up assessments and consideration for sample representativeness. A lack of follow-up assessments can compromise the validity of clinical research because whether the effects of shinrin-yoku can last or not remains uncertain (Dettori 2011). The representativeness of the sample was not addressed; therefore, whether the study recruited people who were interested in, and positively interpreted/reported the effects of shinrin-yoku or not, was not clarified. In the RCTs, randomisation and blinding were not addressed. This may be again related to participants’ expectations (Antonelli et al. 2019): revealing the allocation of the group, participants who were interested in shinrin-yoku might have become more susceptible to placebo effects. In addition, many RCTs used a crossover design with no interval (the groups were swapped on the next day), which may violate the accuracy of the results: the impacts of the first intervention need to be washed out before swapping the groups (Enck and Zipfel 2019). Furthermore, failing to blind the researchers can lead to placebo effects in participants; this may be particularly important when many of the reviewed studies included a prominent shinrin-yoku figure (e.g. Miyazaki, Lee). Finally, the included RCTs did not conduct intention-to-treat analysis and did not clarify whether other outcomes were measured or not. Similar to a previous systematic review (Kamioka et al. 2012), lack of these research items needs to be addressed in the future studies.

Lastly, although we specified that a nature-based practice must include integration with nature that engages the five senses to be recognised as shinrin-yoku (Table 1), shinrin-yoku practice took diverse forms: most commonly walking, meditation and recreational activities such as handcrafts. While this indicates high applicability of shinrin-yoku, it could also leave shinrin-yoku practice rather unguided. This may resemble mindfulness, which can be practised in many ways (Williams and Penman 2011), but whose flexibility may make practitioners feel that they are just sitting or sleeping (Bojic and Becerra 2017) and lead to biased reporting (Schumer et al. 2018). As with other alternative approaches, shinrin-yoku can benefit from more guidance in practice, to be more accepted as a reliable clinical approach. Accredited training packages are emerging (e.g. the European Forest Therapy Institute 2019).

While this article offers useful insights, limitations need to be noted. Firstly, unpublished studies, qualitative studies (e.g. Sonntag-Öström et al. 2015) and studies not published in the English language were excluded (see Appendix 9 for articles in Japanese satisfying the other criteria). Also, some studies examined many variables (Bielinis et al. 2018a, b, 2019; Takayama et al. 2014, 2019); however, the multiple comparisons problem was not addressed. These could exaggerate the effects of shinrin-yoku. Lastly, the included studies were conducted on only two continents. Considering the serious nature of mental health globally, and different views on nature, research on other continents should be conducted.

Conclusion

The twenty included studies reported that shinrin-yoku is effective for mental health, particularly anxiety. Shinrin-yoku interventions lasting from 15 min to 9 days reduced negative mental health symptoms. While promising results were reported, medium to high risk of bias and publication bias were identified. Some of the key constructs related to mental health (e.g. self-compassion, isolation, nature connectedness) have not been explored in shinrin-yoku research, and the mechanisms of its benefits have not been determined. Additionally, the duration of benefits and how they compare with other established therapeutic approaches need to be examined for shinrin-yoku to be accepted as a mainstream intervention.