Salient stimuli attract attention in humans (Compton, 2003), and non-human primates (hereafter: primates) appear to share this tendency. Such attention biases are typically shaped by evolutionary pressures, as they are important for survival. Reported attention biases include the rapid detection not only of threatening stimuli, such as poisonous animals (Hopper et al., 2021; Masataka et al., 2018; Shibasaki & Kawai, 2009) or predators (Laméris et al., 2022), but also of emotionally valent stimuli (Van Rooijen et al., 2017). Based on evolutionary theories, the latter should especially be the case for social species, for whom the fast detection and recognition of such stimuli triggers corresponding behavioral responses that are thought to aid individuals in navigating their social environment (Van Rooijen et al., 2017; Vuilleumier, 2005). Specifically, emotional expressions can inform group members about the expresser’s internal state and potential future behavior (Waller et al., 2017).

Although primates use a range of emotional expressions that are comparable between species, their use and function may differ (Kret et al., 2020), possibly driven by the socio-ecological environment of each species (Dobson, 2012). Bonobos (Pan paniscus), for example, defuse conflict through sexual interactions, play, and grooming (Furuichi, 2011; Palagi & Norscia, 2013), and console individuals in distress (Clay & De Waal, 2013). In parallel, bonobos show an attention bias towards affiliative scenes, such as grooming and sexual activities (Kret et al., 2016) and play faces (Laméris et al., 2022). In contrast, rhesus macaques (Macaca mulatta) and long-tailed macaques (Macaca fascicularis) are considered despotic (Matsumura, 1999; Thierry, 1985) and show biased attention for threatening faces of conspecifics (Cassidy et al., 2021; King et al., 2012; Lacreuse et al., 2013; Parr et al., 2013). Importantly, however, an attentional bias towards emotionally salient cues appears to be absent in chimpanzees (Pan troglodytes; Kret et al., 2018; Wilson & Tomonaga, 2018). This may be related to methodological differences between studies, but also to other factors that may modulate attention biases, such as current affective states (Bethell et al., 2012; Cassidy et al., 2021) and life experiences (Leinwand et al., 2022; Puliafico & Kendall, 2006). Nevertheless, there is a large body of literature on biased attention towards emotions (Van Rooijen et al., 2017), and the findings suggest that general attention biases reflect the socio-biology of the species. This makes it worthwhile to study these biases in a range of species with different social structures, yielding evolutionary insights into emotion perception.

Orangutans (Pongo spp.) are phylogenetically close to humans and, in the wild, live in complex but loose social communities. Compared to the other great apes, orangutans do not form stable social groups (apart from mother-infant groupings) and lead a semi-solitary existence (Delgado & Van Schaik, 2000; Galdikas, 1985; Mitra Setia et al., 2009; Roth et al., 2020; Singleton et al., 2009; Van Schaik, 1999). Their social structure is highly variable, with close-range affiliations depending on sex, age, reproductive state, social status, and ecological determinants. They nonetheless form temporary social parties for mating opportunities, socialization opportunities for their infants, and protection from male coercion. Moreover, orangutans show a range of expressions and behaviors potentially indicating a sensitivity to emotions (e.g., Davila-Ross et al., 2008; Laméris et al., 2020; Pritsch et al., 2017; van Berlo et al., 2020). Given their social organization compared to the other great apes, orangutans are thus an interesting model for investigating the evolutionary roots of emotion-biased attention.

Here, we investigate whether implicit emotion-biased attention is present in orangutans using the dot-probe paradigm, a suitable paradigm for comparative studies (Van Rooijen et al., 2017). In this task, two stimuli are simultaneously, and briefly (300 ms), presented to individuals on opposite sides of a touchscreen. The stimuli are two photographs, in which a neutral expression is paired with an emotional expression, although other pairings are possible as long as the two stimuli compete for attentional resources. After the brief presentation of the two stimuli, a probe emerges at the location of either the emotional stimulus (the congruent condition) or the neutral stimulus (the incongruent condition). Attention is automatically drawn to the more salient stimulus: this attention bias results in faster reaction times in the congruent condition, i.e., when the probe replaces the stimulus that caught the individual's attention, whereas slower reaction times in the incongruent condition indicate that attention had to be shifted from the other, more salient location. As such, the dot-probe paradigm allows investigation of the implicit attentional processes involved in emotion perception.
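
To make this congruence logic concrete, the following is a minimal sketch in R (the language used for our analyses below); the data frame and column names (trials, probe_side, emotional_side, rt) are hypothetical and for illustration only:

```r
# Minimal sketch of dot-probe trial scoring; data frame and column names
# (trials, probe_side, emotional_side, rt) are hypothetical.
# A trial is congruent when the probe replaces the emotional stimulus.
trials$congruence <- ifelse(trials$probe_side == trials$emotional_side,
                            "congruent", "incongruent")

# An attention bias towards emotional stimuli shows up as a lower median
# reaction time on congruent than on incongruent trials.
tapply(trials$rt, trials$congruence, median)
```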

Based on our current knowledge of facial and bodily expressions in orangutans, and their putative relevance within their social structure, we predict that orangutans show attention biases towards emotional scenes, although selectively. Currently, we know very little about which specific emotional categories are relevant to orangutans, but we take prior work by Kret et al. (2016), who used categories such as grooming, sex, play, and yawning, as a starting point. Orangutans use play faces flexibly and possibly intentionally (Waller et al., 2015); hence, we expect orangutans to show a bias for playful scenes. Previously, an attention bias towards grooming, sexual interactions, and yawning was reported in bonobos using a similar paradigm (Kret et al., 2016). However, although the socio-behavioral repertoire of orangutans is somewhat similar to that of other apes, orangutans affiliate less frequently. For example, bonobos use sex to maintain social bonds (De Waal, 1988), whereas orangutans do not. Thus, we expect that orangutans may show a bias for grooming, but not for sexual scenes. Moreover, as orangutans are known to pucker their lips to produce kiss-squeaks when agitated (Hardus et al., 2009), orangutans may show an attention bias towards displays of agitation. Lastly, we expect to find a bias for yawning scenes, in line with previous findings in humans and bonobos (Kret et al., 2016; Kret & Van Berlo, 2021). Although not necessarily an emotional expression, yawning is highly contagious, including in orangutans (van Berlo et al., 2020). It has been proposed that yawning may synchronize vigilance levels between individuals (Gallup & Gallup, 2007; Miller et al., 2012); thus, its rapid detection may be beneficial in threatening situations.

Method

Subjects and housing

Six Bornean orangutans (Pongo pygmaeus; four females and two males; mean age = 16.2 years; range = 6–36 years), housed at Apenheul Primate Park (the Netherlands), participated in the current study (Table 1). The animals were part of a population of nine orangutans housed in a building consisting of four indoor enclosures that were each connected to outdoor islands. The orangutans were typically housed in three to four subgroups, and group composition was changed regularly with the aim of mimicking the natural social structure of orangutans, in which they form temporary parties but no stable social groups. Some individuals (e.g., the two adult males) never shared enclosures, to avoid conflict.

Table 1 Subject information

All orangutans were naïve to touchscreen training at the start of this study. Between February 2017 and June 2017, and between October 2017 and February 2018, we initially trained four individuals successfully on the dot-probe paradigm. We had the opportunity to train an additional individual (Kawan) between June 2019 and February 2020. Another individual (Baju) sporadically joined training sessions during this period and immediately showed high accuracy scores. Although this individual did not go through the different training stages, he was included in this study as he met the inclusion criteria (described below). Touchscreen sessions were conducted in an off-exhibit enclosure, between February and April 2018 for the first four individuals and in February 2021 for the remaining two individuals, and participation was completely voluntary. During training and testing, the orangutans could be surrounded by conspecifics and were thus not separated from other individuals. Nevertheless, the orangutans were trained to complete the touchscreen task alone, and sessions were paused whenever another orangutan interrupted. Sessions were conducted using positive reinforcement training, with quarter pieces of hazelnuts as rewards for the initial four subjects and sunflower seeds for the two later-included individuals, and conformed to the guidelines of the Ex-situ Program (EEP) formulated by the European Association of Zoos and Aquaria (EAZA), as well as to the guidelines of Apenheul Primate Park. The test sessions of the two additional subjects were conducted following a strict COVID-19 protocol.

Apparatus

All touchscreen sessions were conducted using E-Prime on a TFT-19-OF1 infrared touchscreen (19″, 1280 × 1024 pixels). The touchscreen was encased in a custom-made housing incorporated into the orangutans’ enclosure. The researchers controlled the sessions from a laptop connected to the touchscreen setup and could monitor the orangutans’ responses via a livestream from a camera built into the enclosure behind the orangutan. This footage was stored and later used to code the test sessions for outliers and for the orangutans’ behavior. Correct responses were rewarded with small food items that were manually delivered through a PVC chute on a fixed (100%) reinforcement schedule. The researcher was positioned behind the setup, which prevented visual contact between the orangutans and researchers.

Stimuli

The socio-emotional stimuli used in this study were sourced from the Internet or from personal photo libraries. Images were full-color, resized to 330 × 400 pixels, and depicted unfamiliar orangutans in either neutral or emotional scenes. Neutral scenes included individuals that were resting, locomoting, or showing a neutral expression. Based on previous work (Kret et al., 2016), we defined five emotional categories: Display, Grooming, Play, Sex, and Yawn (Table 2; Table ESM1). We chose to use scenes because emotional expressions consist of a combination of facial and bodily cues that together convey information about an individual’s state and intentions (De Gelder et al., 2010). Such stimuli may therefore be biologically more relevant than isolated cues (Kano & Tomonaga, 2010). To control for predicted confounding factors, we matched neutral and emotional stimuli on the number of individuals present, the presence of juveniles or flanged males, and low-level features such as luminance and contrast levels (Kano et al., 2012). Ideally, we would have included more categories, such as individuals in distress or involved in agonistic interactions, but we were unable to source enough stimuli.

Table 2 Stimulus categories used in this study, together with the number of images per category and mean valence and intensity scores

Seven people (two caretakers and five primatologists [including three authors]) rated these images on 7-point Likert scales in terms of their valence (ranging from 1 = very negative to 7 = very positive, with 4 = neutral) and intensity (ranging from 1 = not intense at all to 7 = very intense). We calculated intraclass correlations for the valence and intensity ratings using a two-way mixed model and a consistency definition. The raters showed good agreement, ICC(3,k)valence = .89; ICC(3,k)intensity = .87; Table 2. The Display and Yawn categories were rated as relatively negative compared to the Neutral category, whereas the Grooming, Play, and Sex categories were rated as more positive. Furthermore, the emotional stimuli were all rated as more intense than the neutral stimuli (Tables ESM2–3).
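
As an illustration, this reliability analysis can be sketched in R with the psych package; the name and layout of the rating data (valence_ratings, with one row per stimulus and one column per rater) are assumptions, not the actual study files:

```r
# Sketch of the reliability analysis; valence_ratings is a hypothetical
# data frame with one row per stimulus and one column per rater.
library(psych)

icc_val <- ICC(valence_ratings)           # computes ICC1 through ICC3k
subset(icc_val$results, type == "ICC3k")  # two-way mixed, consistency, k raters
```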

Procedure

Because the orangutans had never worked on touchscreens before, we followed steps 1–6 of the training protocol for the dot-probe paradigm described in the supplements of Kret et al. (2016). In summary, we first habituated the orangutans to the presence of the touchscreen by rewarding them when they approached the screen and by using vocal praise. We then presented a large black dot on the screen and rewarded the individuals if they touched the screen at any location. Once the orangutans were sufficiently conditioned on the association between touching the screen and receiving a small food reward, we gradually reduced the size of the dot until reaching the final size used during the study (200 × 200 pixels). After the orangutans reliably touched the dot, it was followed by a similar dot on either the left or the right side of the screen. Once this step was established, we proceeded to train the orangutans on the trial outline of the dot-probe task (Figure 1): The orangutans started a trial by touching a centrally presented dot (a black circle), followed by the presentation of two pictures; after 300 ms, the pictures automatically disappeared and were followed by a single probe (a black circle) replacing one of the pictures. Pictures during the training phase consisted of colored images of various animals (rabbits, sheep) or flowers. Trials were considered correct and rewarded when the orangutans correctly touched the dot and the subsequent probe and when they attended to the task for the entire trial. The initial four orangutans reached the 80% accuracy inclusion criterion in the beginning of 2018. One additional orangutan was later trained on the dot-probe paradigm for another study in 2021 (Roth et al., in prep), and another orangutan spontaneously participated.

During the dot-probe task, the animals were presented with a black dot in the lower, middle part of the screen. Touching this dot initiated the trial, after which two images were immediately presented side by side, centered on the y-axis of the screen, for 300 ms. One of the images was a neutral stimulus and the other an emotional stimulus. After 300 ms, the stimuli disappeared and the probe (a similar dot) appeared on either the left or the right side, replacing one of the two stimuli, and remained on the screen until the animal touched it. After an inter-trial interval of 2,000 ms, the start dot was presented again and the orangutan could initiate the next trial. The locations of the stimuli and of the probe were counterbalanced, and the order of presentation of the emotional categories was randomized.

Figure 1
figure 1

Trial outline of the dot-probe task

The orangutans were presented with 190 unique trials plus 10 repeated trials, divided into eight sessions of 25 trials each. Unsuccessful trials (defined in the next section) were repeated at the end of all sessions. Ultimately, each orangutan completed between 7 and 12 sessions, with an average of 248 trials (SD = 46.46; range = 175–300).

Data filtering

One researcher coded all test sessions for unsuccessful trials. A second researcher coded 25% of the trials, with high agreement between the two, ICC(3,k) = .94, p < 0.001. Unsuccessful trials were defined as trials in which the orangutan was not properly sitting in front of the screen, was not paying attention to the screen during stimulus presentation, did not press the probe immediately after its onset, or switched hands when pressing the probe, as well as trials in which other orangutans interfered with the task or the screen did not immediately register a touch of the probe despite a touch being visible on the camera recording. Based on these criteria, 556 out of 1,488 trials (37.4%) were removed. Next, we filtered out extremely fast or slow responses. The lower exclusion criterion was RT < 200 ms; the upper criterion was determined by calculating the median absolute deviation (MAD) per subject (i.e., RT > median + 2.5 × MAD; Leys et al., 2013). This resulted in the removal of an additional 135 trials (9.1%; Table ESM2).
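
A minimal sketch of this two-step filter in R, under hypothetical data frame and column names (trials, rt, subject):

```r
# Sketch of the two-step reaction-time filter; the data frame trials and its
# columns rt and subject are hypothetical. R's mad() applies the 1.4826
# scaling constant by default, as recommended by Leys et al. (2013).
library(dplyr)

filtered <- trials %>%
  filter(rt >= 200) %>%                        # lower criterion: drop RT < 200 ms
  group_by(subject) %>%                        # upper criterion is subject-specific
  filter(rt <= median(rt) + 2.5 * mad(rt)) %>% # median + 2.5 x MAD (Leys et al., 2013)
  ungroup()
```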

Statistical analysis

All analyses were done in R (R Core Team, 2020) using RStudio (version 1.4.1106). Using the package “brms” (Bürkner, 2017, 2018), we fitted Bayesian mixed models to assess whether orangutans show an attention bias for emotionally laden stimuli over neutral stimuli and whether this bias is driven by specific pre-defined emotional categories. We chose a Bayesian rather than a frequentist approach, as it is particularly useful for small-sample studies such as ours (see, e.g., Wagenmakers et al., 2008). Moreover, Bayesian analyses yield directly interpretable results. For instance, they include the 89% credible interval, which indicates an 89% probability that the effect of interest falls within the reported range (McElreath, 2018). This contrasts with the interpretation of a confidence interval, which only allows indirect inferences about the true estimate falling within a specific range (Hespanhol et al., 2019). The choice of 89% (a prime number) deliberately departs from the conventional 95% interval of the frequentist approach, in order to discourage unconscious hypothesis testing (McElreath, 2018). In addition, the analysis allows the inclusion of prior knowledge or expectations, is therefore less sensitive to Type I errors, and provides more robust results in small samples (Makowski et al., 2019).

To investigate a general bias for emotional stimuli, we fitted a Bayesian mixed model with a Student-t distribution, with reaction time (ms) as the continuous dependent variable and Congruence as an independent categorical variable (congruent trials had the probe appear behind an emotional stimulus; incongruent trials had the probe appear behind a neutral stimulus). Congruence was sum-coded. Moreover, we included nested random intercepts, namely sessions (minimum of 7 and maximum of 12 per subject) nested within subjects (6). We used a weakly informative Gaussian prior for the intercept (M = 500, SD = 100) and a more conservative Gaussian prior for the fixed effect (M = 0, SD = 10). Furthermore, we used the default half Student-t priors with 3 degrees of freedom for the random effects and the residual standard deviation.
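
As an illustration, the following is a minimal brms sketch of this model specification, not the actual analysis script; all object and column names (dat, rt, congruence, subject, session) are hypothetical, and iter is read here as iterations per chain:

```r
# Sketch of model 1; data frame and column names are hypothetical.
library(brms)

dat$congruence <- factor(dat$congruence)
contrasts(dat$congruence) <- contr.sum(2)  # sum-code the two-level factor

priors <- c(
  set_prior("normal(500, 100)", class = "Intercept"),  # weakly informative
  set_prior("normal(0, 10)", class = "b")              # conservative fixed effect
)
# Random-effect and residual SDs keep brms's default half Student-t(3) priors.

m1 <- brm(
  rt ~ congruence + (1 | subject/session),  # sessions nested within subjects
  data = dat, family = student(), prior = priors,
  chains = 4, iter = 40000, warmup = 2000,
  save_pars = save_pars(all = TRUE)         # needed for bayes_factor() below
)
```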

In the second model, in which we zoomed in on emotion categories, we fitted a Bayesian mixed (Student-t) model with reaction time as the dependent variable and an interaction between Congruence and Emotion Category (with the categories Sex, Play, Grooming, Yawning, and Display). Congruence and Emotion Category were sum-coded (also known as effect coding), and we included a nested random intercept (session within subject). We used the same prior settings as in the previous model (Gaussian priors for the intercept and independent variables, default half Student-t priors for the random effects and residual standard deviation).
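
A corresponding sketch for the second model, under the same hypothetical names and priors:

```r
# Sketch of model 2, using the same hypothetical names and priors as model 1.
dat$emotion_category <- factor(dat$emotion_category)
contrasts(dat$emotion_category) <- contr.sum(5)  # sum-code the five categories

m2 <- brm(
  rt ~ congruence * emotion_category + (1 | subject/session),
  data = dat, family = student(), prior = priors,
  chains = 4, iter = 40000, warmup = 2000,
  save_pars = save_pars(all = TRUE)
)
```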

To further substantiate our findings, we calculated a Bayes factor (BF) for both of our models by comparing them to an intercept-only (null) model. The BF quantifies the amount of evidence for or against a hypothesis (see, e.g., Lee & Wagenmakers, 2013). We also conducted post hoc analyses to assess the influence of various potential confounds (e.g., stimulus intensity, presence or absence of infants and flanged males); as we did not find evidence for an effect of any of these, a description of these analyses and their results can be found in the Supplementary Material.
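
This comparison can be sketched with brms's bridge-sampling-based bayes_factor(), again under the hypothetical object names used above; both models must be fitted with save_pars = save_pars(all = TRUE):

```r
# Sketch of the Bayes factor computation via bridge sampling; m0 is a
# hypothetical intercept-only model with the same likelihood and random effects.
m0 <- brm(
  rt ~ 1 + (1 | subject/session),
  data = dat, family = student(),
  prior = set_prior("normal(500, 100)", class = "Intercept"),
  chains = 4, iter = 40000, warmup = 2000,
  save_pars = save_pars(all = TRUE)
)

bayes_factor(m0, m1)  # BF01: values > 1 favour the null (intercept-only) model
```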

To summarize the results, we report (i) the median difference between conditions; (ii) the 89% credible interval (CI); (iii) the probability of direction (pd), reflecting the certainty with which an effect goes in a specific direction (here: a faster reaction time to probes replacing emotional stimuli) and ranging between 50 and 100% (Makowski et al., 2019); and (iv) the Bayes factor.
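
These summaries map directly onto functions in the bayestestR package (Makowski et al., 2019); a sketch, assuming the hypothetical model object m1 from above:

```r
# Sketch of extracting these summaries from the hypothetical model object m1.
library(bayestestR)

# Median estimates, 89% credible intervals, and probability of direction (pd)
describe_posterior(m1, centrality = "median", ci = 0.89, test = "pd")
```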

We checked the validity of our models using the WAMBS checklist (Depaoli & van de Schoot, 2017). For every model, we ran 4 chains and 40,000 iterations (including 2,000 warm-up iterations). Model convergence was checked by inspecting trace plots, histograms of the posteriors, Gelman-Rubin diagnostics, and autocorrelation between iterations (Depaoli & van de Schoot, 2017). No divergences or excessive autocorrelations were found.
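
A sketch of these checks using brms's built-in diagnostic tools, for the hypothetical model object m1:

```r
# Sketch of the convergence checks for the hypothetical model object m1.
plot(m1)                         # trace plots and marginal posterior histograms
summary(m1)$fixed[, "Rhat"]      # Gelman-Rubin diagnostics (values should be ~1.00)
mcmc_plot(m1, type = "acf_bar")  # autocorrelation between iterations, per chain
```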

Results

For our first model, in which we investigated a general bias for emotional stimuli over neutral stimuli, we did not find a robust effect of Congruence on reaction time (median difference neutral − emotional = 7.70 ms, 89% CI [−10.94 to 26.30], pd = 0.75; see Table 3 and Figure 2). This conclusion is based on the 89% credible interval (CI), which contains both positive and negative values for the difference between reaction times to probes appearing behind emotional versus neutral stimuli. The pd indicates 75% certainty that the effect is in the expected direction (i.e., orangutans have a bias for emotional stimuli), but it does not inform us about how plausible the null hypothesis (i.e., no difference between emotional and neutral stimuli) is. To quantify the strength of evidence for our null finding, we computed the Bayes factor in favor of the null hypothesis over the alternative hypothesis (BF01) and found BF01 = 1.35, indicating anecdotal evidence for the null hypothesis (Lee & Wagenmakers, 2013). As such, the orangutans did not show a bias for emotional over neutral stimuli in our study, but more data are needed to draw definitive conclusions.

In the second model, in which we looked at specific emotion categories, we again found no robust evidence for an attention bias for specific emotions (all median differences neutral − emotional; Yawn: −2.95 ms, 89% CI [−41.81 to 36.17], pd = 0.45; Display: 15.67 ms, 89% CI [−16.57 to 48.03], pd = 0.78; Grooming: −2.02 ms, 89% CI [−32.32 to 27.89], pd = 0.46; Play: 20.58 ms, 89% CI [−11.82 to 52.62], pd = 0.84; Sex: 12.57 ms, 89% CI [−20.37 to 45.36], pd = 0.73; see Table 4 and Figure 3; see also Table ESM3 and Figure ESM3 for individual results). The corresponding Bayes factor indicated moderate evidence for the null hypothesis (BF01 = 4.79; Lee & Wagenmakers, 2013).

Table 3 Model output for model 1 (general emotion bias)
Figure 2
figure 2

Median reaction time (in milliseconds) per probe location. Congruent trials represent trials in which the probe appeared behind an emotional stimulus, whereas in incongruent trials, the probe appeared behind a neutral stimulus. Error bars represent the 89% credible interval

Table 4 Model output for model 2 (emotion category bias)
Figure 3
figure 3

Median reaction time (in milliseconds) per emotion category and probe location. Congruent trials represent trials in which the probe appeared behind an emotional stimulus, whereas in incongruent trials, the probe appeared behind a neutral stimulus. Error bars represent the 89% credible interval

Discussion

The current study investigated whether orangutans show an attention bias towards emotional stimuli. Contrary to our predictions, the orangutans in our sample showed no attentional bias towards emotions in general, nor towards specific emotional categories. However, more data are needed to conclude decisively whether these effects are truly absent in orangutans. Below, we discuss several possible explanations for our findings.

We applied the dot-probe paradigm, a well-validated paradigm in humans (but see Puls & Rothermund, 2018), and a promising tool for comparative studies (Van Rooijen et al., 2017). Several studies have successfully used the paradigm with primates (Cassidy et al., 2021; King et al., 2012; Kret et al., 2016; Lacreuse et al., 2013; Leinwand et al., 2022; Parr et al., 2013), although, like our current study, not all report significant results (see e.g., Kret et al., 2018; Wilson & Tomonaga, 2018 for null-findings in chimpanzees). However, several methodological parameters may explain these inconsistencies.

Stimulus presentation duration can determine which attentional process is measured and can therefore affect study outcomes. Long stimulus exposure may result in the involvement of the prefrontal cortex and attentional control (Cisler & Koster, 2010; Weierich et al., 2008), such that the task no longer measures implicit attention bias (Cassidy et al., 2021). To measure implicit attention to specific stimuli, presentation times have to be short enough to prevent saccades, which occur around the 250-ms mark in humans as well as other primates, including great apes (Fuchs, 1967; Kano & Tomonaga, 2011). As such, a stimulus presentation duration of around 250–300 ms is an appropriate threshold for stimuli to be clearly (supraliminally) visible (Ben-Haim et al., 2021). To be in line with previous studies (Kret et al., 2016, 2018) and recommendations stemming from the human literature (Petrova et al., 2013), we used a stimulus presentation duration of 300 ms, which is most likely to target implicit stages of attention. Given the existing evidence, we have no reason to believe that our presentation time was insufficient to measure an attentional bias for emotional expressions in orangutans. Nevertheless, most studies on the visual system of primates have been conducted in monkeys, and only very few studies have thus far compared gaze patterns across great ape species (see, e.g., Kano et al., 2012; Kano & Tomonaga, 2011). As such, more work is needed to pinpoint potential species-specific characteristics in visual processing.

Moreover, stimulus pairing may influence test outcomes (Van Rooijen et al., 2017). Emotional stimuli can be paired with scrambled stimuli (Parr et al., 2013), neutral images without conspecifics (Kret et al., 2016), neutral stimuli (King et al., 2012; Kret et al., 2018; Lacreuse et al., 2013; Leinwand et al., 2022; Wilson & Tomonaga, 2018), or other emotional stimuli. Differences in saliency or low-level features between the emotional stimulus and its paired stimulus can influence the detectability of biases towards the stimuli of interest. For instance, Wilson and Tomonaga (2018) tested chimpanzees and found an attention bias when threatening facial expressions were paired with scrambled images, but no evidence of a bias when threatening stimuli were paired with neutral stimuli. As we only included emotion-neutral pairings in our study, future work investigating emotion-biased attention in orangutans could include different types of pairings to disentangle the effects of, e.g., seeing (familiar or unfamiliar) conspecifics, scrambled images, or neutral images without conspecifics, and to rule out potential effects of low-level features (Tomonaga & Imura, 2015).

Possibly, the stimuli we used were not biologically relevant enough for the orangutans, or still images do not adequately capture the saliency of the actual expressions. Considering that we selected our emotional categories based on previous findings (King et al., 2012; Kret et al., 2016; Parr et al., 2013) and on work indicating that orangutans are sensitive to emotional expressions of conspecifics (Davila-Ross et al., 2008; Pritsch et al., 2017), we deem this unlikely. At the same time, we presented a limited number of categories, and other emotional expressions might induce attention biases. For instance, an eye-tracking study with Sumatran orangutans showed that they looked longer at emotional stimuli than at neutral ones, specifically at the silent bared-teeth face, but not at the bulging-lip face (Pritsch et al., 2017). Moreover, following Kret et al. (2016), multiple experts rated the stimuli in terms of their emotional valence and intensity with high inter-rater reliability, suggesting that the ratings of the used categories are trustworthy. We encourage future studies to include more emotional categories, such as the silent bared-teeth face or the bulging-lip face.

Alternatively, it is possible that orangutans simply do not attend to emotional stimuli automatically. Implicit attention biases are theoretically expected to be strongest in highly social species, in which the rapid detection and recognition of another’s emotional expression is needed for appropriate responses (Spoor & Kelly, 2004). Orangutans lead semi-solitary lives, and hence it might not be important for them to be implicitly sensitive to others’ emotions, while this is arguably beneficial for obligate group-living species (see, e.g., findings by Lewis et al., 2021). In contrast, emotional expressions of unknown individuals might be more relevant, as such individuals pose a potentially higher likelihood of threat or unpredictability (Campbell & De Waal, 2011). Our results provide no clear evidence to either confirm or reject this hypothesis, although it seems unlikely that orangutans do not implicitly attend to emotional stimuli at all. For example, the flexible production of play faces (Waller et al., 2015) and rapid mimicry (Davila-Ross et al., 2008) suggest that orangutans are able to quickly recognize and respond to such facial expressions. Play facial expressions are, however, arguably more relevant for juveniles, potentially explaining why we did not find a bias for such stimuli in our sample, in which the majority (five out of six individuals) were adults. Similarly for the other emotional categories, it is possible that individual characteristics of the subjects, such as sex (Howarth et al., 2021), age, temperamental predispositions, current affective states (Bethell et al., 2012; Cassidy et al., 2021), and life experiences (Leinwand et al., 2022; Puliafico & Kendall, 2006), influenced the relevance of the emotional scenes, thereby limiting the interpretability of our results. For instance, orangutan aggregation patterns are highly variable, and sex-specific patterns may differ between sites (Galdikas, 1985; Roth et al., 2020), which could influence sex-specific effects on attention. Testing for such individual differences is beyond what our sample size allows, but visual inspection of the results per individual showed that the absence of evidence for attentional biases was consistent across our sampled individuals.

Moreover, characteristics of the individuals depicted in the stimuli, such as facial characteristics, may have obscured the effect of emotion on attention. It has previously been reported that orangutans look more at the eyes of juveniles than at those of adults (Kano et al., 2012). The lighter coloring around the eyes of juvenile orangutans and the flanged cheeks of males may present conspicuous facial features that attract attention. To control for these characteristics, we carefully paired emotional and neutral stimuli, taking into account whether the expresser was a juvenile or a flanged male. We tested whether the presence of juveniles or flanged males in the probed stimulus, the non-probed stimulus, or both influenced reaction times, but found no such effect. Hence, we can only conclude that the absence of a bias for emotional stimuli was not modulated by these facial features.

Interestingly, we did not find a bias for yawning scenes. Bonobos previously showed a strong attention bias for yawning (while controlling for canine visibility; Kret et al., 2016). Indeed, yawns are contagious between bonobos, and contagion is stronger between kin and friends or when expressed by a high-ranking group member (Demuru & Palagi, 2012; Massen et al., 2012; Palagi et al., 2014). In our earlier work, we provided experimental evidence for yawn contagion in orangutans (van Berlo et al., 2020), although this effect was independent of the familiarity of the stimulus individual. Yawning potentially facilitates thermoregulation of the brain (Massen et al., 2021), where cooling may promote vigilance. The contagiousness of yawning within a group may consequently synchronize vigilance (Bower et al., 2012; Gallup & Gallup, 2007), and quickly attending to yawns may thus be beneficial. Given that yawning is a dynamic facial expression, the still images of yawns in our study may have lacked information that is crucial for eliciting an attention bias in orangutans.

The absence of a bias for other emotional stimuli, such as the display (i.e., kiss-squeak), may be explained simply by the associated facial expression being a by-product rather than a signal in and of itself. Kiss-squeaks are mostly produced in response to predators or other orangutans (Hardus et al., 2009) and predominantly function as an auditory signal (Lameira et al., 2013), which may explain why an implicit attention bias for their visual component is absent. A recent study showed that a visual attention bias can be strengthened by including congruent auditory signals (e.g., hearing an alarm call when viewing a predator; Sato et al., 2021). This method provides an interesting way to complement studies of emotion-biased attention in the future.

In conclusion, we found no convincing evidence for implicit attention biases towards scenes depicting display, grooming, sex, play, or yawning, and we have addressed a number of methodological parameters that may explain these findings. Future studies could explore attention to a wider range of social and emotional scenes, for instance by including auditory signals. Individual factors may have influenced our results, and we recommend that future studies take these into account where possible. Orangutans remain interesting study subjects for investigating emotion-biased attention, given their unique social structure. We therefore encourage future studies to investigate both implicit and explicit attentional processing of emotional stimuli.