As highlighted by ongoing events such as climate change and the COVID-19 pandemic, which is strongly suspected to have zoonotic origins (Andersen et al., 2020), it is essential to acknowledge the interconnectedness of humans, animals, and the environment. This thought is at the core of the One Health concept, which aims to highlight the “synergistic benefit of closer cooperation between human, animal and environmental health sciences” (Amuasi et al., 2020). One example of a benefit derived from the connection between humans and animals is animal-assisted interventions (AAIs). Based on the definition presented by López-Cepero, in this review, AAIs are defined as any intervention that incorporates an element of human-animal interaction (HAI) with an unfamiliar animal, with the aim of improving a human health outcome (López-Cepero, 2020). Unfamiliar animals are defined as animals that are not owned by or living with participants. Most commonly AAIs use dogs as the intervention animal, but other animals such as cats, horses, birds, or fish are also sometimes used (Bert et al., 2016; Kamioka et al., 2014). Importantly, Howell et al. distinguish between visiting animals and therapy animals: while therapy animals are those who are included “in the work of a qualified health professional in the provision of […] treatment,” animals “that have suitable characteristics and are trained for public visitation by humans who volunteer to take them into facilities to bring enjoyment” are defined as visiting animals. Accordingly, in the context of higher education settings, we expect AAIs to mainly include visiting animals (Howell et al., 2022).

Past research has predominantly focused on the benefits of AAIs for clinical populations, and has found beneficial effects (Beetz et al., 2012; Maujean et al., 2015). Among patients with autism or dementia, AAIs have been found to improve social interaction and reduce problematic behaviors such as aggression or agitation (Berry et al., 2013; O’Haire, 2013; W. Wood et al., 2017; Yakimicki et al., 2019). AAIs are especially beneficial for patients with mental disorders. Several systematic reviews have shown reductions of clinical symptoms of disorders like anxiety, depression, and schizophrenia, as well as improved engagement and social interaction (Brooks et al., 2018; Jones et al., 2019; Kamioka et al., 2014). In addition, AAIs have been shown to reduce stress and improve well-being among non-clinical populations (Ein et al., 2018; Kamioka et al., 2014; Nimer & Lundahl, 2007).

There is a particularly strong need for stress-reducing interventions among students at higher education institutions. A higher education institution is “any postsecondary institution of learning that usually affords, at the end of a course of study, a named degree, diploma, or certificate of higher studies” (Higher Education, 2020). Due to a multitude of factors, including navigating a new environment, a high academic workload, and financial pressures, the prevalence of stress and of symptoms of depression and anxiety disorders is worryingly high among higher education students worldwide (Bayram & Bilgel, 2008; Eisenberg et al., 2007). According to the Anxiety and Depression Association of America, for example, 85% of students feel overwhelmed by academic expectations and demands, over 40% of students state that anxiety is a top concern, and 30% of students state that stress negatively affects their academic performance (Anxiety & Depression Association of America, 2020; Austin et al., 2010). Similar results have been replicated among higher education students around the world (Bayram & Bilgel, 2008; Grützmacher et al., 2018; Mortier et al., 2018). The burden of mental health problems among students has been continuously increasing and has been further exacerbated during the COVID-19 pandemic (Cao et al., 2020; Grützmacher et al., 2018; Son et al., 2020).

In light of these findings, AAIs are becoming increasingly common at higher education institutions to promote student mental health (Crossman & Kazdin, 2015). Such programs most commonly take the form of drop-in events where groups of students can freely interact with dogs and their handlers (Gee et al., 2017). AAIs in higher education settings are low-cost and easily scalable, allowing them to reach a large proportion of the student body (Bell, 2013; Crossman & Kazdin, 2015; Reynolds & Rabschutz, 2011; E. Wood et al., 2018). In addition, because AAIs are perceived overwhelmingly positively by higher education students, they do not carry the stigma associated with traditional mental health services (Crossman & Kazdin, 2015). This makes AAIs an ideal universal intervention for mental health promotion efforts at higher education institutions (Greenberg & Abenavoli, 2017; Vadivel et al., 2021).

To confidently implement AAIs in higher education settings, a comprehensive overview of the current state of research is needed. Importantly, despite a growing number of studies assessing the efficacy of AAIs on students, there are only a limited number of published systematic reviews on the topic. Previous work on the effects of AAIs in educational settings includes a meta-analysis by Hummel and Randler (2012) and a systematic review by Brelsford et al. (2017), both of which focus on school-aged children. A recent systematic review by Parbery-Clark et al. is, to the author’s knowledge, the first systematic review to focus on the effects of AAIs among higher education students (Parbery-Clark et al., 2021). However, Parbery-Clark et al. focus solely on mental health outcomes and exclude physiological and cognitive outcomes from their review. The objective of this systematic review was therefore to fill this gap in the literature by estimating the effects of AAIs in higher education settings on the mental, physiological, and cognitive outcomes of students. This review also aims to contribute evidence to the “shared medicines and interventions” subgroup of The Lancet One Health Commission (Amuasi et al., 2020).

Methods

Protocol and Registration

A systematic review protocol was developed in keeping with the PRISMA-P 2015 statement (Moher et al., 2015). This protocol was registered on PROSPERO on August 12, 2020, with the registration number CRD42020196283.

Sources, Search Methods, and Eligibility Criteria

The literature search was conducted from June 10 to June 20, 2020, and was designed to identify all published and unpublished experimental and observational trials on AAIs conducted in higher education settings. Medline/PubMed, PsycInfo, CINAHL, Web of Science, Embase, ERIC, and Scopus were searched. In addition, WALTHAM Science, HABRI Central and Animal and Society Institute, and the database OpenGrey were searched. Reference lists from relevant systematic reviews and included studies were hand-searched for potentially relevant publications.

Due to the large number of retrieved results, only randomized controlled trials (RCTs) that were published in a peer-reviewed journal were included in this review. Studies were included if they assessed the effect of an AAI on any mental, physiological, or cognitive outcome of higher education students. Mental health outcomes were considered those that describe a person’s emotional or psychological state, for example, through self-perceived assessments of stress, anxiety, or depression. We also included physiological outcome measures that reliably correlate with acute stress, such as blood pressure (BP), heart rate (HR), or cortisol levels (APA, 2018). Cognitive outcomes were considered those that describe a person’s cognitive functioning (Henderson et al., 2015), for example, through assessments of intelligence, concentration, or attention. In the higher education context, we also considered cognitive outcomes to include academic outcomes such as test performance. Details on the eligibility criteria can be found in Table S1, while details on the search strategy can be found in File S1.

Study Selection

The selection process was conducted in two steps, using Covidence (Covidence—Better Systematic Review Management, 2020). Two independent reviewers (AH and EW) screened articles first by title and abstract, then by full text, and voted on eligibility. Disagreements were resolved through regular discussions. If articles could not be found, the corresponding author was contacted. If there was no response within 2 weeks, the articles were excluded. Articles both reviewers agreed upon were included in the systematic review.

Quality Assessment

Only quantitative outcomes that were assessed by at least three studies and could thus be meaningfully combined in a quantitative synthesis were included in the quality assessment process. The risk of bias of the included studies was assessed independently by two reviewers (AH and EW), using the Cochrane Risk-of-Bias tool for Randomized Trials 2 (RoB 2) (Sterne et al., 2019). The version for individually randomized, parallel-group trials and the version for crossover trials were used.

Data Extraction

Data extraction was independently conducted by AH and EW using an Excel sheet. Data was collected on the study design, study participants, the intervention condition, the control condition, and reported outcomes. Conflicts were resolved through regular discussions. A full list of the extracted data items can be found in File S2.

Data Synthesis

All studies were grouped according to the qualitative or quantitative outcomes they assessed. For stress, anxiety, and depression, we further differentiated between chronic (long-term) and acute (short-term) outcomes. We defined acute outcomes as measuring how a person is feeling in a given moment, and chronic outcomes as measuring how a person is feeling over a longer period of time.

Qualitative Synthesis

Quantitative outcomes reported by fewer than three studies, as well as all qualitative outcomes, were summarized in a qualitative synthesis. Study results were briefly summarized for each outcome. Studies assessing mental, physiological, and cognitive outcomes were grouped together, and common trends in results were described.

Quantitative Synthesis

Quantitative outcomes reported by three or more studies were included in the quantitative synthesis. For both the meta-analyses and the albatross plots, potential multiplicity was eliminated by applying the following rules: First, if an outcome was reported across multiple time-points, the last measurement of the outcome taken before any follow-up measurements was chosen. Second, if an outcome was reported using multiple measures and the reported measures were assumed to be interchangeable, only one of the included measures was chosen. This was the case in studies reporting both systolic and diastolic BP, where systolic BP was chosen, and in studies reporting both HF (high-frequency) and rMSSD (root mean square of successive differences) heart rate variability (HRV), where HF HRV was chosen.

To be included in a meta-analysis, studies needed to supply an effect size (Hedges’ g) of the post-test difference in mental, physiological, or cognitive outcomes between an intervention and a control group, and had to be of good quality (rated as “low risk” or “some concerns” by the RoB 2). In addition, studies had to use comparable intervention and control conditions. Interventions generally fell into two categories: (1) interventions that allowed participants to freely interact with animals and their handlers (active intervention) and (2) interventions where an animal was present while participants’ primary focus was on a task (passive intervention). These tasks, such as timed math tasks, typically aimed to increase the stress levels of participants (stressors). Interventions were additionally categorized based on the animal species used in the intervention condition. Control conditions broadly fell into four categories: (1) control groups that replaced the presence of an animal with a human (active human control); (2) control groups that replaced the presence of the animal with a different animal, a toy animal, or pictures/videos of an animal (active animal control); (3) control groups with an active component other than a human or another animal, such as yoga (active other control); and (4) control groups without any active component (no-treatment control). For each outcome included in the quantitative synthesis, coded tables were created to assess meta-analysis eligibility (Tables SIII–SXIV). Meta-analyses were conducted for all outcomes where at least three studies reported an effect size, were of good quality, and used comparable intervention and control conditions. Due to the small number of studies included in each meta-analysis, it was not possible to conduct moderator analyses.
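
For reference, a conventional definition of Hedges’ g for such a post-test comparison between an intervention group (subscript I) and a control group (subscript C) is the bias-corrected standardized mean difference; the formulation below is the standard textbook form and is not taken from any individual included study:

$$
g = J \cdot \frac{\bar{x}_I - \bar{x}_C}{s_p}, \qquad
s_p = \sqrt{\frac{(n_I - 1)\,s_I^2 + (n_C - 1)\,s_C^2}{n_I + n_C - 2}}, \qquad
J \approx 1 - \frac{3}{4(n_I + n_C) - 9},
$$

where $\bar{x}$, $s$, and $n$ denote the post-test group mean, standard deviation, and sample size, and $J$ is the small-sample correction applied to Cohen’s d.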

For eligible outcomes, meta-analyses were conducted using RStudio Version 1.3.959 (RStudio Team, 2020). Summary effect sizes and the corresponding 95% confidence intervals (CIs) were calculated using a random-effects model and visualized using forest plots. The heterogeneity between included studies was assessed using the Q and I² statistics. If Hedges’ g and its standard error (SE) were not reported in the original study, they were computed in RStudio Version 1.3.959, using the package “esc” (Lüdecke, 2019). Details of the conducted calculations can be found in Table SXV. Funnel plots were used to explore publication bias, and Egger’s test for funnel plot asymmetry was conducted.
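
As an illustration of this workflow, the following sketch shows how Hedges’ g could be derived from post-test summary statistics with the “esc” package and then pooled with a random-effects model. This is not the authors’ published code (which is available at the figshare link below); all data values are hypothetical, and the “meta” package is an assumption on our part, named here only to make the sketch self-contained.

```r
# Illustrative sketch only: hypothetical data, not the review's published code
library(esc)   # effect sizes from summary statistics (Lüdecke, 2019)
library(meta)  # pooling, forest/funnel plots, Egger-type test (assumed package)

# Hedges' g and its standard error for one hypothetical study,
# computed from post-test means, SDs, and group sizes
es1 <- esc_mean_sd(grp1m = 30.2, grp1sd = 8.1, grp1n = 28,
                   grp2m = 34.6, grp2sd = 8.4, grp2n = 27,
                   es.type = "g")
es1$es  # Hedges' g
es1$se  # standard error

# Hypothetical per-study effect sizes entered for pooling
dat <- data.frame(
  study = c("Study A", "Study B", "Study C", "Study D"),
  g     = c(-0.52, -0.31, -1.10, 0.05),
  se_g  = c( 0.27,  0.22,  0.30, 0.25)
)

# Meta-analysis; Q and I^2 describe between-study heterogeneity
ma <- metagen(TE = g, seTE = se_g, studlab = study, data = dat, sm = "SMD")
summary(ma)                                      # pooled Hedges' g, 95% CI, Q, I^2
forest(ma)                                       # forest plot
funnel(ma)                                       # funnel plot
metabias(ma, method.bias = "linreg", k.min = 3)  # Egger's regression test
```

A random-effects model suits this setting because the included studies differ methodologically and are therefore not assumed to share a single true effect size.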

Due to the limited number of studies included in the meta-analyses, albatross plots were used to extend the quantitative data synthesis. The albatross plot is a graphical tool that allows an approximation of effect sizes based on p-value and sample size (Harrison et al., 2017). The eligibility criteria in place for the meta-analyses were not required for inclusion in the albatross plots. Albatross plots were created using Stata/SE 16.1 (Stata Statistical Software. College Station, TX: StataCorp LLC; 2019). Effect size contours were calculated based on the standardized mean difference (SMD). Contours corresponded to the effect sizes 0.2 (small effect), 0.5 (medium effect), and 0.8 (large effect). Since all included studies were randomized, an equal group size was assumed. As suggested by Harrison et al., if a p-value was presented as a threshold instead of an exact value (e.g., p < 0.05), the threshold value was used as the exact value (Harrison et al., 2017). In addition, for any non-significant outcome without an exact p-value (e.g., p > 0.05), a p-value of 1 was substituted (Harrison et al., 2017). If not reported in the original study, p-values were calculated by conducting unpaired two-sided Student’s t-tests in RStudio Version 1.3.959, using the command “t.test” and the mean, standard deviation, and sample size provided (RStudio Team, 2020). If not otherwise specified in the study, a normal distribution of the data was assumed.
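
Because base R’s t.test() expects raw data rather than summary statistics, the sketch below shows one way the corresponding two-sided p-value could be recovered from reported means, standard deviations, and sample sizes under the review’s stated assumptions (equal group variances and normally distributed data). The helper function is hypothetical and the input values are illustrative; this is not necessarily how the review computed its p-values.

```r
# Minimal sketch (hypothetical helper): two-sided unpaired Student's t-test
# p-value from summary statistics, assuming equal variances and normality
p_from_summary <- function(m1, sd1, n1, m2, sd2, n2) {
  sp2   <- ((n1 - 1) * sd1^2 + (n2 - 1) * sd2^2) / (n1 + n2 - 2)  # pooled variance
  tstat <- (m1 - m2) / sqrt(sp2 * (1 / n1 + 1 / n2))              # t statistic
  df    <- n1 + n2 - 2
  2 * pt(-abs(tstat), df)                                         # two-sided p-value
}

# Example with illustrative post-test values for intervention vs. control
p_from_summary(m1 = 30.2, sd1 = 8.1, n1 = 28,
               m2 = 34.6, sd2 = 8.4, n2 = 27)
```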

The threshold for statistical significance was set at p < 0.05 for all conducted calculations. The code used for all calculations can be found under https://doi.org/10.6084/m9.figshare.19368047.v1.

Results

Study Selection

A search of all databases yielded 2,431 search results. Screening of reference lists contributed an additional 63 search results, giving a total of 2,494 results. Details on the exact number of results obtained from each database can be found in Table SII. After removing duplicates and screening the articles by title and abstract, a total of 218 articles remained for full text screening. After the full text screening, 32 articles remained for inclusion in this systematic review. Of these 32 articles, three reported two separate eligible studies (Crump & Derting, 2015; Gee et al., 2019; Trammell, 2017), bringing the total of individual studies included in this review to 35 (Banks et al., 2018; Barker et al., 2016, 2017; Binfet, 2017; Capparelli et al., 2020; Charnetski et al., 2004; Crossman et al., 2015; Crump & Derting, 2015; Fiocco & Hunse, 2017; Gebhart et al., 2020; Gee et al., 2014, 2015, 2019; González-Ramírez et al., 2016; Grajfoner et al., 2017; Hall, 2018; Hunt & Chizkov, 2014; Kobayashi et al., 2017; McDonald et al., 2017; Pendry et al., 2018, 2020; Pendry et al., 2019a; Pendry, Vandagriff, et al., 2019; Pendry & Vandagriff, 2019; Polheber & Matchock, 2014; Shearer et al., 2016; Stewart & Strickland, 2013; Straatman et al., 1997; Trammell, 2017, 2019; Ward-Griffin et al., 2018; Wilson, 1987). Common reasons for exclusion can be found in the PRISMA flow chart (Fig. 1). Of the 35 studies, 30 were included in the quantitative data synthesis. Of these, eight studies were included in the meta-analyses and 28 studies were included in the albatross plots.

Fig. 1 PRISMA flow chart. ¹Quantitative outcomes assessed by fewer than three studies, as well as qualitative outcomes, were not eligible for inclusion in the meta-analyses. ²Methodological heterogeneity = heterogeneous for type of intervention condition (active/passive), animal used, or type of control condition (active animal/active human/active other or no-treatment)

Study Characteristics

An overview of the most important extracted data items and study results can be found in the data extraction table (Table 1). Information on additional study characteristics can be found in Table SXVI. In general, most studies had more female than male participants, and participants were mostly of “typical” undergraduate age (mean 20.2 years, median 19.7 years). In almost all studies (n = 29), the intervention animal was a dog. While all studies used visiting animals as defined above, most intervention animals had received a therapy animal certification, while a small number of studies used companion animals (most commonly pets of researchers) who did not have a certification (Hunt & Chizkov, 2014; McDonald et al., 2017; Pendry et al., 2019a; Pendry, Vandagriff, et al., 2019; Pendry & Vandagriff, 2019; Straatman et al., 1997). Most studies (n = 27) used active intervention conditions, with most taking place in a group setting. In studies with active interventions, the animal-to-participant ratio generally ranged from 1:3 to 1:5. The remaining studies (n = 8) used passive intervention conditions that mostly took place in individual settings and included a stressor. In studies with passive interventions, the animal-to-participant ratio was generally 1:1. The most common control condition was a no-treatment control condition (n = 27). In most studies (n = 28), intervention sessions took place only once per participant. In general, intervention sessions were relatively short (mean 20.7 min, median 15 min).

Table 1 Data extraction table (n = 35)

Outcomes were grouped into mental health outcomes, physiological outcomes, and cognitive outcomes. Mental health outcomes were by far the most common (n = 26), followed by physiological outcomes (n = 14), and cognitive outcomes (n = 9). Most reported cognitive outcomes were related to students’ academic performance.

Risk of Bias Within Studies

Thirty studies were included in the quality assessment. Overall, 60 outcomes from 27 studies were classed as “some concerns,” 6 outcomes from 5 studies were classed as “high risk,” and no studies were classed as “low risk.” Common limitations included not reporting the method of allocation sequence generation or allocation sequence concealment. Additionally, blinding of participants and study personnel to a participant’s allocated condition was generally not possible due to the presence of the animal, although some studies tried to conceal the true study purpose from participants. Nonetheless, in most studies, both participants and study personnel were probably aware of their assigned condition, which may have especially affected self-reported outcomes. Finally, none of the included crossover RCTs gave information about potential carryover effects. An overview of quality assessment results for RCTs and crossover RCTs can be found in Figures S1 and S2. Quality assessment results at the individual outcome level can be found in Tables SXVII and SXVIII.

Synthesis of Results

Qualitative Synthesis

Consistent with the study hypotheses, most studies reporting on negative mental health outcomes, including acute depression, chronic depression, homesickness, and irritability, reported lower levels of these outcomes in the intervention group compared to the control group at post-test (Binfet, 2017; Hunt & Chizkov, 2014; Pendry et al., 2018; Pendry, Vandagriff, et al., 2019; Wilson, 1987). Only Wilson et al. did not report an effect of the intervention on chronic anxiety (Wilson, 1987), and Shearer et al. did not report an effect on chronic depression (Shearer et al., 2016). Similarly, some studies reporting on positive mental health outcomes reported higher levels of these outcomes in the intervention group compared to the control group at post-test (Barker et al., 2017; Binfet, 2017; Gee et al., 2019; Grajfoner et al., 2017; Kobayashi et al., 2017; Pendry et al., 2018; Pendry et al., 2019a; Pendry, Vandagriff, et al., 2019). However, a few studies also reported no effect of the intervention on positive mental health outcomes, including mood, life satisfaction, and mindfulness (Barker et al., 2017; Grajfoner et al., 2017; Shearer et al., 2016; Ward-Griffin et al., 2018). Most studies reporting on physiological outcomes reported no effect of the intervention (Barker et al., 2016; Charnetski et al., 2004; Straatman et al., 1997), although Fiocco and Hunse reported a smaller electrodermal response after a stressor in the intervention compared to the control group (Fiocco & Hunse, 2017), and Wilson et al. found mean arterial pressure (MAP) to be significantly higher in the intervention compared to the control condition (Wilson, 1987). Similarly, most studies reporting on cognitive outcomes showed no effect of the intervention (Banks et al., 2018; González-Ramírez et al., 2016; Pendry et al., 2020; Pendry et al., 2019a). Only Pendry et al. found an improvement in test anxiety, attitude, and study motivation (Pendry et al., 2020) (Table 1).

Quantitative Synthesis

The following outcomes were included in the quantitative synthesis: acute self-perceived stress, chronic self-perceived stress, negative affect, acute anxiety, arousal, happiness, positive affect, BP, HR, HRV, salivary cortisol, and performance on a memory task. Of these, meta-analyses were conducted for chronic self-perceived stress, negative affect, acute anxiety, positive affect, and BP. All studies included in the meta-analyses used an active intervention condition, a dog as the intervention animal, and a no-treatment control condition. The most important results are presented below. Detailed results for the remaining outcomes, including the albatross plots and the meta-analyses, can be found in Figures S3–S13.

Mental health outcomes were most common. For acute anxiety and self-perceived stress, most included studies showed a clear reduction at post-test. Acute anxiety was reported by 14 studies, of which four were combined in a meta-analysis (Banks et al., 2018; Crossman et al., 2015; Shearer et al., 2016; Wilson, 1987). The pooled Hedges’ g was −0.57 (95% CI: −1.45, 0.31; Q = 12.5, I² = 76%, p = 0.006), indicating a medium-sized negative effect of the intervention (Fig. 2). This result was mirrored by the albatross plot, where most studies clustered between the 0.5 and 0.8 negative effect size contours (Fig. 3). Acute self-perceived stress was reported by seven studies. Although these studies could not be combined in a meta-analysis, the albatross plot showed a reduction of self-perceived stress with a medium to large effect size, with most results clustering between the 0.5 and 0.8 negative effect size contours (Fig. 4).

Fig. 2 Forest plot acute anxiety (n = 4). TE: Hedges’ g. seTE: standard error of Hedges’ g. N(i): number of participants in intervention condition. N(c): number of participants in control condition

Fig. 3 Albatross plot acute anxiety (n = 11)

Fig. 4 Albatross plot acute self-perceived stress (n = 7)

Negative affect was reported by six studies, of which four were combined in a meta-analysis (Banks et al., 2018; Crossman et al., 2015; Shearer et al., 2016; Ward-Griffin et al., 2018). The pooled Hedges’ g was −0.47 (95% CI: −1.46, 0.52; Q = 15.3, I² = 80.4%, p = 0.016), indicating a small- to medium-sized negative effect of the intervention (Fig. 5). The albatross plot showed that while some studies showed a reduction of negative affect, other studies showed no effect (Fig. 6). The tendency for some studies to show the expected effect while other studies showed no effect was also observed for the remaining mental health outcomes. Accordingly, a small negative effect of the intervention was observed for chronic self-perceived stress (pooled Hedges’ g: −0.23; 95% CI: −0.57, 0.11; heterogeneity: Q = 1.44, I² = 0%, p = 0.49), and a small positive effect was observed for positive affect (pooled Hedges’ g: 0.06; 95% CI: −0.78, 0.90; heterogeneity: Q = 3.97, I² = 49.6%, p = 0.138), arousal, and happiness. Forest plots and albatross plots for these outcomes can be found in Figures S3–S8.

Fig. 5 Forest plot negative affect (n = 4). TE: Hedges’ g. seTE: standard error of Hedges’ g. N(i): number of participants in intervention condition. N(c): number of participants in control condition

Fig. 6 Albatross plot negative affect (n = 4)

Among the physiological outcomes, salivary cortisol was the only outcome to demonstrate the expected reduction post-intervention. Salivary cortisol was reported by four studies. Although these studies could not be combined in a meta-analysis, they showed a small to medium negative effect on cortisol, with most results falling between the 0.3 and 0.8 effect size contours of the albatross plot. In contrast, of the eight studies assessing HR, most showed no effect, with most results clustered around the middle of the albatross plot (Fig. 7). This trend was mirrored by the studies assessing HRV.

Fig. 7 Albatross plot heart rate (n = 8)

Interestingly, studies reporting on BP showed very heterogeneous results. BP was reported by four studies, three of which were combinable in a meta-analysis. However, despite correcting for methodological heterogeneity, the results of the studies included in the meta-analysis were very disparate in the size and direction of effect. Additionally, the forest plot showed a very high, statistically significant level of heterogeneity between the included studies (Q = 45.5, I² = 95.6%, p < 0.0001). This level of heterogeneity was substantially higher than for any other meta-analysis conducted. Accordingly, it was deemed inappropriate to statistically combine BP outcomes, and no pooled effect size was calculated. This strong heterogeneity was mirrored in the albatross plot, where results were spread throughout the plot. Forest and albatross plots for physiological outcomes can be found in Figures S9–S12.

The only cognitive outcome included in the quantitative synthesis was performance on a memory task, reported by six studies. Overall, included studies suggested a very small negative effect of the intervention on memory task performance, with most results clustering around the 0.2 effect size contour of the albatross plot. The albatross plot can be found in Figure S13.

Risk of Bias Across Studies

The funnel plot showed no evidence of publication bias, as confirmed by Egger’s regression test for funnel plot asymmetry (z = −1.74, p = 0.081). The funnel plot can be found in Figure S14.

Discussion

Summary of Findings

The aim of this systematic review was to assess the effect of AAIs implemented in higher education settings on the mental, physiological, and cognitive outcomes of students. In general, the results of this review suggest that AAIs in higher education settings are particularly effective at reducing acute feelings of anxiety and stress. The evidence is less clear for the other mental health outcomes assessed in this review; the included studies suggest a smaller, but nonetheless beneficial, effect of AAIs on these outcomes as well. This review does not suggest a clear beneficial effect of AAIs on physiological or cognitive outcomes of students. Overall, the quality of included studies was moderate, with most studies being classed as “some concerns”.

Mental Health Outcomes

The beneficial effects on acute feelings of stress and anxiety are in keeping with previous systematic reviews, which have shown AAIs to improve these outcomes in a large variety of populations (Beetz et al., 2012; Bert et al., 2016; Ein et al., 2018). Parbery-Clark et al. also found included studies to have beneficial effects on self-perceived stress and anxiety among higher education students (Parbery-Clark et al., 2021). Similarly, previous reviews have shown AAIs to promote a positive mood, increase happiness, and reduce depressive symptoms (Beetz et al., 2012; Kamioka et al., 2014; Morrison, 2007; Souter & Miller, 2007). While the overall direction of effect of the included mental health outcomes was beneficial as expected, some studies reporting on mental health outcomes showed no effect of the intervention. Since formal moderator analyses were not possible in this review, we cannot say with certainty which, if any, study characteristics are associated with these differences. The comparatively smaller effect sizes of chronic stress, anxiety, and depression, assessed with instruments designed to detect changes in the mental state over longer periods of time, may point to limited long-term effects of AAIs, as suggested in previous literature (Hillen, 2020; Serpell et al., 2017; Stern & Chur-Hansen, 2013).

Another possible contributor to differences in study results could be related to intervention design. Beetz et al. suggest that the beneficial effects of HAI could stem from an activation of the oxytocin (OT) system through sensory stimulation (Beetz et al., 2012). Specifically, they state that the closeness of the connection between human and animal, including the presence and duration of physical contact with the animal, is an important factor in whether and how much OT is released during HAI (Beetz et al., 2012). Since participants in studies using a passive intervention and stressors were not able to focus completely on the animal present and often did not even touch the animal in question, it is possible that not enough OT was released to achieve the expected beneficial effects on mental health outcomes (Hunt & Chizkov, 2014; Stewart & Strickland, 2013). Since moderator analyses to confirm this hypothesis were not possible, future research could explore whether the use of passive interventions and stressors is indeed associated with a reduced effect of AAIs on the mental health outcomes of higher education students.

Physiological Outcomes

Only three of the included studies provided results for cortisol, with two studies reporting reductions and one study reporting no effect on salivary cortisol at post-test (Crump & Derting, 2015; Pendry & Vandagriff, 2019; Polheber & Matchock, 2014). This trend towards a reduction of cortisol at post-test is in keeping with other literature (Beetz et al., 2012; Lundqvist et al., 2017). By contrast, studies assessing BP showed very mixed outcomes. This mixed effect of AAIs on BP has been reported in other systematic reviews, even though the overall trend suggests a decrease in BP post-AAI (Beetz et al., 2012; Bert et al., 2016). One possible explanation for the large discrepancies between BP results in this review and past reviews could be the poor reliability of BP as an outcome measure. Indeed, a study by Kelsey et al. showed that measures of cardiovascular reactivity, including BP, have a poor reliability across different typical stressor tasks (Kelsey et al., 2007). BP can be affected by variables such as the posture of the participant, movement or respiration (Kelsey et al., 2007). All of these factors differed between the studies included in this review. Additionally, while all studies used a BP monitor, measurements were taken from different locations including the upper arm (Crump & Derting, 2015; Wilson, 1987), the wrist (McDonald et al., 2017) or the finger (Straatman et al., 1997), which may also have contributed to the heterogeneity in results.

Interestingly, most included studies showed no effect of the intervention on HR or HRV. This is different from other reviews, which have found an overall reduction of HR after an AAI in a variety of populations (Bert et al., 2016; Ein et al., 2018). It is possible that AAIs may have less of an effect on physiological outcomes in young, healthy populations. Indeed, while Nimer and Lundahl found a significant improvement of physiological outcomes after an AAI, moderator analyses revealed that populations with disabilities showed significantly larger improvements than healthy populations (Nimer & Lundahl, 2007). Additionally, it is possible that differences in effects between studies are again associated with intervention design: Most studies that assessed HR and HRV included a stressor in their intervention, thus likely triggering an acute stress response among participants. It is well established that in response to an acute stressor, HR increases while HRV decreases (Chu et al., 2021). Accordingly, it is possible that in studies with a stressor, the potential effect of an AAI on these physiological outcomes was not strong enough to compete with or alter the effects of the acute stress response. More studies without an incorporated stressor would be needed to judge the effects of AAIs on the physiological outcomes of students in a non-stressful situation.

Cognitive Outcomes

The studies included in this review showed no significant effect of the intervention on cognitive outcomes. This is an interesting finding, especially considering that past systematic reviews assessing the impact of AAIs among children have found that the presence of animals helped create a productive learning environment (Beetz et al., 2012; Brelsford et al., 2017). Although these systematic reviews point out that there is little evidence that AAIs directly improve academic performance, they have nevertheless been shown to improve related cognitive outcomes like concentration, motivation and attention (Beetz et al., 2012; Brelsford et al., 2017). However, Banks et al. hypothesized that while the presence of an animal may be beneficial for children, whose cognitive functions are still developing, there is less of an impact among higher education students, who are already at their peak of cognitive functioning (Banks et al., 2018). The primary benefits of AAIs for this population therefore seem to be affective, not cognitive.

Limitations of Included Evidence

Included studies shared some characteristics that may limit the generalizability of review results. First, participants in the included studies were overwhelmingly female. This may be attributable to an increased interest in AAIs among females, as most studies recruited participants via self-selection, or to recruitment from traditionally majority-female degree programs, such as psychology or nursing (Fowler, 2018; US Bureau of Labor Statistics, 2020). Since previous research has shown differences between males and females in, for example, responses to stressors, it is possible that results may not be generalizable to both male and female students (Merz & Wolf, 2015; Taylor et al., 2014). Second, although the search strategy was designed to find publications using any intervention animal, almost all included studies used dogs. This may be due to the popularity of dogs as companion animals and the feelings of empathy and companionship associated with them, making them a popular choice for AAIs (Custance & Mayer, 2012). Additionally, dogs may have been the easiest option logistically since some studies cooperated with established university-based AAI programs that were already using dogs (Banks et al., 2018; Barker et al., 2016, 2017; Grajfoner et al., 2017; Pendry et al., 2018, 2020; Pendry et al., 2019a; Pendry, Vandagriff, et al., 2019; Pendry & Vandagriff, 2019; Trammell, 2017), and some studies used pet dogs of the researchers (Pendry & Vandagriff, 2019; Pendry et al., 2018; Pendry, Vandagriff, et al. 2019). Nonetheless, it should be kept in mind that the results of this review represent the effects of AAIs using dogs and are likely less applicable to AAIs using other animals. Other limitations of included studies were small sample sizes, lack of sample size calculations, and general lack of follow-up assessments.

Limitations of the Review Process

The strength of this review lies in the use of albatross plots to enrich the quantitative data synthesis, as well as in the inclusion of only RCTs. Nonetheless, some important limitations remain.

First, the strong methodological heterogeneity severely limited the comparability of included studies, as has been the case with many other reviews in the AAI field (Bert et al., 2016; Brooks et al., 2018; Kamioka et al., 2014). The heterogeneity also limited the number of studies included in the individual meta-analyses and limited our ability to conduct moderator analyses. This heterogeneity is at least partly attributable to the broadly defined eligibility criteria used in this review. Kazdin et al. have remarked that such broad eligibility criteria, where inclusion is based on the presence of an animal in the intervention as opposed to the proposed mechanism of the intervention, are one of the reasons for the methodological heterogeneity in most reviews in the AAI field (Kazdin, 2017). This lack of a guiding theoretical framework in most reviews is exacerbated by the lack of a unanimously accepted theory on the mechanism of AAI effectiveness (Borrego et al., 2014). To limit this issue in future research, systematic reviews should settle on a specific theoretical framework to guide their eligibility criteria so as to include only logically comparable studies (Kazdin, 2017).

Second, the albatross plots in this review were explicitly meant to allow a more inclusive overview of available data than what was available based on meta-analyses alone, and were not meant to generate a usable summary statistic. The effect size contours superimposed on the plot are only approximations of the actual effect size (Harrison et al., 2017). While they allow a visual interpretation of the general trend of the included studies in terms of effect size and direction, they are not exact and are not to be interpreted as such (Harrison et al., 2017).

Research Gaps and Implications

If possible, future reviews in this field could conduct moderator analyses to assess whether any study characteristics have an influence on study results. Future studies could explore whether incorporating a stressor in the study design or conducting an AAI in either a group or an individual setting influences the effect of AAIs on health outcomes. Additionally, while a recent review suggested that AAI participation has no adverse effects for participating animals, research is limited and results remain conflicting (Glenk, 2017). There is even less research on potential benefits of AAI participation for animals (Glenk, 2017). Interestingly, research has suggested that following stress, trauma, or abuse, animals can exhibit behavior similar to symptoms of human mental disorders such as depression or post-traumatic stress disorder (Ferdowsian et al., 2011; PTSD in Dogs, 2010). Taking this into account, it is essential that the physical and mental health of animals participating in AAIs is protected. In the best case, AAIs should be mutually beneficial to animals and humans, thus making them a truly shared intervention in the spirit of One Health.

One of the goals of this review was to provide an evidence base that administrators at higher education institutions can use to decide whether to implement AAIs at their own campus. Despite the methodological limitations listed, this review shows that AAIs can improve students’ mental health outcomes, especially acute feelings of anxiety and stress. Taking into consideration the high burden of mental health issues among students at higher education institutions, along with the unprecedented stress caused by the COVID-19 pandemic, higher education institutions will likely be facing an increasing demand for mental health support (Cao et al., 2020; Son et al., 2020; Vadivel et al., 2021). Due to their low cost, easy scalability and high popularity, AAIs present a good option for higher education institutions to improve student mental health (Reynolds & Rabschutz, 2011). This opportunity could be taken up particularly by universities outside of the US and Canada, where AAI programs are still rare. It has to be kept in mind, however, that while stress reduction efforts can certainly help, more structural changes should be implemented to reduce academic, social and financial pressures that impact students’ mental health. These could include an increased mental health budget at higher education institutions, reduced tuition fees, and a mandatory salary for student internships (Bayram & Bilgel, 2008; Hamaideh, 2011; Heckman et al., 2014).

Conclusion

Overall, the results of this review suggest that AAIs in higher education settings can be effective at improving mental health outcomes of students and are particularly effective at reducing acute feelings of anxiety and stress. These findings have been replicated in many different settings and with a variety of populations. However, contrary to prior research, this review does not suggest a clear beneficial effect of AAIs on physiological or cognitive outcomes of students.