Reciprocity may be an important mechanism in the evolution and maintenance of cooperation between unrelated animals (Trivers 1971). Actors can recoup the immediate disadvantage of interactions that are costly to themselves but beneficial to the recipient, so long as the initial investment increases the likelihood the actor becomes the recipient in return (Trivers 1971; Axelrod and Hamilton 1981; Carter 2014). Though simple in concept, demonstrating evidence of reciprocity, particularly in free-roaming animals, poses considerable challenges.

One such challenge derives from there being multiple forms of reciprocity that can operate simultaneously in a single system, and whose presence can obscure or confound the ability of researchers to detect one or more forms. Of the forms of reciprocity, what is often called direct reciprocity has received the most attention in empirical research (Krams et al. 2008; Smith et al. 2010; Fraser and Bugnyar 2012; Wilkinson et al. 2016; Edenbrow et al. 2017; Kern and Radford 2018). According to direct reciprocity, animals preferentially cooperate with individuals that have cooperated with them in the past, i.e., an actor helping animal A results in animal A helping that actor (Trivers 1971; Axelrod and Hamilton 1981; Carter 2014). However, generalized reciprocity (sometimes called upstream indirect reciprocity) may also be widespread (Nowak and Sigmund 2005). Under generalized reciprocity, cooperation is maintained when individuals that receive cooperation are more likely to act cooperatively toward any other individual, regardless of past cooperation between that specific pair, i.e., animal A helping an actor results in that actor helping animal B, where animal B can be any individual, including animal A (Hamilton and Taborsky 2005; Nowak and Sigmund 2005). Crucially, in cases where animal B and A represent the same individual, generalized reciprocity is indistinguishable from direct reciprocity (Hamilton and Taborsky 2005; Nowak and Sigmund 2005) and either could result in the same visible pattern of cooperation. It is therefore important that studies seeking evidence of direct reciprocity also test for generalized reciprocity, and vice versa. However, only a few studies have tested generalized reciprocity alongside direct reciprocity (Majolo et al. 2012; Duque and Stevens 2016; Molesti and Majolo 2017), and we are aware of none that have done so experimentally.

In addition to different mechanisms underpinning apparently similar patterns of cooperative behaviors, another considerable challenge for researchers studying reciprocity is ruling out kin selection as a possible explanation by ensuring that subjects are unrelated. While relatively simple to achieve in lab-based studies (Milinski 1987; Hamilton and Taborsky 2005; Duque and Stevens 2016), this can be more difficult with free-roaming subjects, for which information on both maternal and paternal relatedness is seldom available (Seyfarth and Cheney 1984; Krams et al. 2008; Cheney et al. 2010; Edenbrow et al. 2017; Kern and Radford 2018). Accounting for both maternal and paternal relatedness is important when investigating reciprocity because many social animals appear to recognize and associate preferentially with not only maternal but also paternal kin (Widdig et al. 2001; Mateo 2002; Streich et al. 2002; Wahaj et al. 2004; De Moor et al. 2020).

Finally, researchers studying reciprocity must overcome the challenge of lacking a priori information on the timescales over which reciprocal exchanges might be operating in their study system (de Waal and Brosnan 2006; Schino and Aureli 2009). Animals can reciprocate behaviors in the span of minutes to hours, completing the cycle of investment and payback quickly (Trivers 1971). However, animals that live in relatively stable social groups have the opportunity to interact repeatedly with the same partners over weeks, months, or even years. Direct reciprocity can develop over a longer timescale by animals cooperating preferentially with the specific partners that cooperate with them most frequently (Schino and Aureli 2009). Observational studies have investigated direct reciprocity over the longer timescale, using multi-year datasets to explore cooperative partner preferences of vampire bats (Desmodus rotundus) (Wilkinson et al. 2016) and non-human primates (Schino et al. 2007; Jaeggi et al. 2013). Generalized reciprocity can also occur over longer timescales, where animals that are the most frequent recipients of cooperation become, in turn, the most frequent providers of cooperation (Pfeiffer et al. 2005). Studies of direct or generalized reciprocity that focus only on shorter timescales may miss evidence of reciprocity over a longer timescale, and vice versa. When possible, it can therefore be beneficial to design studies that assess all appropriate timescales over which a reciprocal exchange could occur. However, more than one timescale is seldom compared in the same system (Jaeggi et al. 2013), and we are aware of no study that has examined generalized reciprocity over the longer term.

We set out to test for direct and generalized reciprocity in both the short and longer term using free-ranging rhesus macaques (Macaca mulatta) with a deep genetic pedigree containing information on maternal and paternal relatedness extending multiple generations, allowing us to exclude related partners from our study design. We adopted a classic playback experiment, designed to investigate one of the most studied behavioral exchanges in non-human primates: grooming for coalitionary support. Grooming (“allogrooming”) is a behavior in which animals pick through one another’s fur, removing debris and parasites (Hutchins and Barash 1976). In addition to its hygienic benefits, grooming is a common behavior in non-human primates that forms and maintains social bonds (Henzi and Barrett 1999). Coalitionary support is a high-risk behavior wherein individuals intervene in ongoing conflicts by forming alliances with one combatant against the other (Chapais 1992, 1995). Low-ranking individuals are expected to exchange grooming for coalitionary support from higher-ranking individuals whose assistance in agonistic conflicts is more valuable than support from other low-ranking animals (Seyfarth 1977). The playback experiment we used takes advantage of non-human primates’ individually recognizable recruitment calls in order to measure variation in individual responses to simulated agonistic conflicts (Gouzoules et al. 1984; Rendall et al. 1996; Fischer 2004).

We predicted that if direct reciprocity is the mechanism responsible for the exchange of grooming for coalitionary support between non-relatives on a short timescale, then grooming should lead to a temporary increase in an individual’s responsiveness to a recruitment call from their most recent grooming partner. If generalized reciprocity is the mechanism behind the exchange of grooming for coalitionary support on a short timescale, we predicted that grooming should lead to a temporary increase in an individual’s responsiveness to calls for coalitionary support from any group mate. In the longer term, we predicted that if direct reciprocity is operating, an individual should be more responsive to recruitment calls from individuals that have groomed them frequently in the past. If instead generalized reciprocity is operating in the longer term, we predicted that the individuals that have received the greatest amounts of grooming from all partners in the past should be the most responsive to calls for coalitionary support from others.


Study subjects

We conducted this study on rhesus macaques at the long-running Cayo Santiago field station off the coast of Puerto Rico. Subjects were 81 mature adult females, i.e., 6 years old or greater (Blomquist et al. 2011) of a single social group, “F”. Each subject was individually identifiable from her unique physical, especially facial, features in addition to individualized ear notches, and tattoos on the chest and inner thigh. The colony’s monkeys are habituated to humans and have been subject to past behavioral experiments, including playback experiments (Gouzoules et al. 1984; Rendall et al. 1996).

We determined relatedness between all pairs of subjects along both maternal and paternal lines using the field station’s genetic pedigree, which dates back to 1985, is based on 29 microsatellite markers, and includes over 4500 individuals with known parentage (Brent et al. 2013; Widdig et al. 2016). Females in this population preferentially associate with both maternal and paternal kin compared to unrelated females (Widdig et al. 2001; Streich et al. 2002). By using complete parentage data, we were able to exclude subject-partner pairs whose interactions might be better explained by kin selection. Experimental trials involved subject-partner pairs that were not closely related, which we defined as a mean coefficient of relatedness (r) < 0.125 (mean r for all subject-partner pairs in the experiment was 0.026 ± 0.04 SD with a range of 0.0–0.093) (Seyfarth and Cheney 1984; Silk et al. 2010).
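The exclusion rule described above can be sketched in a few lines of Python. This is only an illustration: the dyad labels and r values are invented, the function name is hypothetical, and the only quantity taken from the text is the 0.125 cutoff.

```python
# Hypothetical illustration of the relatedness filter; not the authors' code.
R_THRESHOLD = 0.125  # cutoff for "not closely related", as stated in the text


def unrelated_pairs(relatedness, threshold=R_THRESHOLD):
    """Return the dyads whose coefficient of relatedness is below the threshold."""
    return {pair: r for pair, r in relatedness.items() if r < threshold}


# Invented example values (not data from the study)
example = {
    ("F01", "F02"): 0.5,     # e.g. mother and daughter: excluded
    ("F01", "F03"): 0.25,    # e.g. half-sisters: excluded
    ("F02", "F04"): 0.0625,  # eligible
    ("F03", "F04"): 0.0,     # eligible
}

eligible = unrelated_pairs(example)
print(sorted(eligible))
```

Because the threshold is applied as a strict inequality, a dyad with r exactly equal to 0.125 would be excluded under this sketch.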

Rhesus macaques live in groups with relatives and a larger number of non-relatives. Their society is highly despotic, composed of a matrilineal dominance hierarchy of natal females and a separate hierarchy for dispersing males. While nearly all behaviors are biased towards related females, female rhesus macaques also attend to and form social bonds with non-kin partners: 38% (327/900) of grooming bouts are directed toward non-kin (Sade 1965), 29% (1079/3774) of coalitionary support goes to non-kin (Bernstein and Ehardt 1985), and females attend to social information from non-kin (Gouzoules et al. 1984; Pfefferle et al. 2014). Together, this makes female rhesus macaques a good system for testing if grooming relationships between non-kin influence responsiveness to calls for coalitionary support.

Data collection schedule

This study took place between January 2016 and September 2017. Throughout the study period, we collected observational data on the behaviors of all females in group F to establish grooming partnerships, the amount of grooming each individual female received from all others, and each subject’s position in the dominance hierarchy. From January to May 2017, we recorded vocalizations for use as experimental stimuli. This 5-month period corresponded with the population’s annual mating season (Vandenbergh and Vessey 1968; Berman 1980). The experimental phase of this study took place between May and September 2017, during which time females gave birth to new infants. Data collection in the experimental phase was ended by Hurricane Maria, a category 4 storm that hit the field site on September 20th, 2017. All data were collected prior to this natural disaster.

Establishing dominance ranks

We collected observational data using 10-min focal animal samples (Altmann 1974) based on a previously established ethogram (Brent et al. 2013). We recorded the duration of all grooming bouts and the occurrence of all agonistic interactions, along with the identity of all adult female social partners. We balanced the number of focal animal samples across subjects within the year, as well as across morning and afternoon to avoid biases driven by time-dependent patterns of interactions (Brent et al. 2013). We collected 733 h of focal animal samples, with a mean of 4.95 ± 1.62 SD h of observation per subject.

Exchange of grooming for coalitionary support in non-human primates is posited to occur up the dominance hierarchy with low ranking females grooming unrelated higher-ranking females who in turn provide their support to the lower ranking female (Seyfarth 1977). To best imitate this exchange, subjects were always the higher-ranking member of each experimental dyad and thus heard the recruitment call of a lower-ranking unrelated female. We determined the dominance rank of all females using pairwise win–loss data from agonistic encounters. We calculated dominance rank as the percentage of all adult females in the study group that a subject outranked. We classed females as either high, middle, or low ranking, with high ranking animals being those that outranked ≥ 80% of other females, middle ranking animals outranked between 50 and 79% of other females, and low ranking animals outranked < 50% of other females (Madlon-Kay et al. 2017).
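The rank calculation and the three rank classes can be illustrated with a short Python sketch. The function names and the example counts are hypothetical. Two details are assumptions rather than statements from the text: the denominator is taken to be the number of other adult females in the group, and non-integer percentages between 79 and 80 are assigned to the middle class.

```python
# Hypothetical sketch of the rank classification; not the authors' code.

def percent_outranked(n_outranked, n_other_females):
    """Dominance rank as the percentage of other adult females outranked."""
    if n_other_females <= 0:
        raise ValueError("need at least one other female")
    return 100.0 * n_outranked / n_other_females


def rank_category(pct):
    """Map a percentage outranked onto the high / middle / low classes."""
    if pct >= 80:
        return "high"
    if pct >= 50:  # assumption: everything from 50 up to (not including) 80
        return "middle"
    return "low"


# Invented examples for a group with 80 other adult females
for outranked in (64, 40, 39):
    pct = percent_outranked(outranked, 80)
    print(outranked, round(pct, 2), rank_category(pct))
```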

Recording and preparation of vocal stimuli

We recorded vocalizations for playback stimuli with a Marantz PMD661MKII portable digital solid-state sound recorder (Marantz Professional, Cumberland, RI, USA) and a Sennheiser directional microphone at a mean distance of 9.2 m ± 4.8 SD. For each vocalization, we documented the caller’s ID, distance to the microphone, and behavioral context. Vocalizations were visualized with PRAAT 6.0.2. We selected calls fitting the characteristics of noisy screams, a common class of call that rhesus macaques use to recruit coalition partners when engaged in physical conflicts with higher-ranking unrelated group mates (Gouzoules et al. 1984). Of the 125 noisy scream bouts we recorded, 13 (10.4%) resulted in successful recruitment of a coalition partner: six related females, three unrelated females, three unknown juveniles, and one male. This rate of recruitment is similar to the rate of fight interference found in captive rhesus macaques, 8.1% (3774/46,517) (Bernstein and Ehardt 1985), and previously recorded in Group F on Cayo Santiago, 10.7% (988/9252) (Kaplan 1978). Each of our stimuli was composed of a sequence of 5 to 8 screams (6.47 ± 1.35, mean ± SD), each of which was between 0.5 and 1.5 s in length, leading to an average stimulus length of 4.95 s ± 1.69 (Fig. 1). The length of our stimuli was based on previous work that found longer exemplars resulted in subjects frequently approaching, discovering, and even charging the speaker (Gouzoules et al. 1984). By using shorter stimuli, we avoided inciting subjects to search for our speaker, which would reveal the experiment to the subjects and prevent future trials. All stimuli were standardized to an intensity of 70 dB ± 3.2 SD measured with a sound level meter (Peak Meter MS6708) at a distance of 10 m to match the mean intensity of recorded natural calls. We used 60 stimuli recorded from 30 females.

Fig. 1. Spectrogram of a representative noisy scream used as a stimulus in our playback experiment

Experimental procedure

Playback trials were conducted by two experimenters: one operated the camera (Panasonic HC-V770), and the second operated the playback device (iPhone 6 with a 20-m cable) and concealed the playback speaker (Mipro MA portable speaker). We hid the speaker behind foliage 10 m ± 2.2 SD to the left or right of the direction the subject was facing, and the digital video camera was placed 8 m ± 0.82 SD in front of the subject. Trials were conducted when the monkey whose call was to be played was > 50 m away or out of sight. We played the stimulus through the speaker when the subject was looking away from the speaker’s location and was at rest, i.e., not grooming another individual or foraging. Trials were recorded for 80 s beginning 20 s before the onset of the stimulus (Seyfarth and Cheney 1984).

We took a number of precautions to avoid subjects becoming habituated to the experimental procedure: (1) we conducted no more than two trials per day; (2) once used, a stimulus could not be used again for 30 days, and could not be used more than three times overall; (3) for each trial conducted we ran at least three mock trials in which the experimental apparatus was placed, but no stimulus was played; (4) females could not be subjects in two trials of the same condition; and (5) combinations of subjects and callers were never repeated.

Experimental design

Our experimental design included two control conditions and three test conditions (Fig. 2). Our two control conditions were the “null control” and “social control”. The “null control” (Fig. 2) allowed us to quantify a subject’s baseline response to the recruitment call of a lower-ranking, unrelated, female group mate in the absence of any recent interactions. In null control trials, a subject was played a female’s recruitment call after a 90-min period in which the subject received no interactions from any individual of any age-sex class. The 90-min period without grooming or submissions ensured subjects’ responses were not primed by receiving a recent interaction of interest and has been used in previous playback experiments to indicate baseline responsiveness (Cheney et al. 2010). The “social control” (Fig. 2) allowed us to account for the possibility that any prior interaction with a caller affected a female’s subsequent response to recruitment calls (Cheney et al. 2010). In social control trials, subjects (monkey B) were played the recruitment call of a female (monkey A) that submitted to them within the last 60 min. Submissive behaviors included receiving a fear grimace from monkey A, the subject physically displacing monkey A, or monkey A avoiding the subject. Importantly, in both control conditions, the subject and caller were not observed to groom each other at any point during the 2-year study period. We did not include a non-recruitment call control in our design because we considered all conditions to be equal with respect to the novelty of the call type used. That is, in all conditions, subjects hear a recruitment call. The novelty of hearing a recruitment call does not vary across conditions and should not therefore bias responses toward one condition over the others.

Fig. 2. Schematic of the experimental conditions used to test direct and generalized reciprocity in the short and longer term

Fig. 3. Subjects’ (A) latency to look and (B) duration of looking in the direction of the playback stimulus, separated by condition type and clustered by the four analyses we used to test our predictions. A1 and B1: subject responses to recruitment calls from recent unrelated female grooming partners (recent groomer), compared to calls from unrelated females other than a recent grooming partner (not recent groomer), unrelated females that recently submitted to them (social control), and unrelated females with which subjects have had no recent interactions (null control). Boxes indicate the interquartile range (IQR), with the central line depicting the median and the whiskers extending to 1.5*IQR. A2 and B2: subject responses when “recent groomer” and “not recent groomer” conditions were combined, relative to the social and null controls. A3 and B3: subject responses to recruitment calls from unrelated females that have groomed them frequently over the 2-year study period (frequent grooming partner) relative to the null control. A4 and B4: the relationship between subjects’ responses in all conditions and the total amount of grooming subjects received across the study period (standardized to the group mean). The black line is from a simple linear regression fit to the data points, and the shaded area indicates the standard error. Durations were adjusted by subtracting the amount of time the subject spent looking towards the speaker in the 20 s before the onset of the stimulus from the time the subject spent looking in the direction of the speaker in the 20 s after the onset of the stimulus. Significant differences between conditions are indicated by bars. Dots represent individual data points. No-look trials, in which the subject did not orient towards the speaker and no latency was recorded (n = 8 of 64), are not included for the purposes of visualizing the data

We tested for reciprocity in the short-term by asking whether receiving recent grooming, within the last 60 min, predicted the strength of a subject’s response to another female’s recruitment call, i.e., an estimate of a subject’s interest in gathering information on a conflict that could lead to her providing coalitionary support (Seyfarth and Cheney 1984). In rhesus macaques, intervention is preceded by orienting to the direction of a recruitment call and looking toward the disturbance (Kaplan 1977). The orienting response itself can be predictive of an individual’s likelihood to intervene because the speed of orientation and duration of time spent looking in many mammals is indicative of the information’s value to the animal, and information gathering is crucial to decision-making (Winters et al. 2015). After periods of looking in the direction of simulated conflicts, rhesus macaques in this population have also been shown to approach long sequences of noisy screams delivered from concealed playback speakers (Gouzoules et al. 1984), suggesting these stimuli are biologically and behaviorally relevant to the animals. We expected monkeys to look longer in the direction of calls when the information presented to them is more valuable — something they may need to act on, compared to cases where the information presented is of little value and no decision or future action on their part is required.

To test reciprocity in the short term, we used two test conditions. In the first condition, a female (monkey A) groomed the subject (monkey B), then the subject heard the recruitment call of her most recent grooming partner (monkey A) (Fig. 2: recent groomer). In the second condition, a female (monkey A) groomed the subject (monkey B), then the subject heard the recruitment call of a female other than her most recent grooming partner (monkey C) (Fig. 2: not recent groomer). Subjects were considered “groomed” if they received a minimum of 10 s of continuous grooming from an unrelated lower-ranking female. To test for direct reciprocity, we determined first if subjects’ responses in both conditions differed significantly from responses in the control conditions, and second if subjects’ responses differed significantly between the recent groomer and not recent groomer conditions themselves. Comparing between conditions tests if subjects showed a stronger response when the call came specifically from their most recent grooming partner (recent groomer) compared to when the caller was not the female that recently groomed them (not recent groomer) (Fig. 2). Because generalized reciprocity subsumes direct reciprocity, to test generalized reciprocity in the short term, we combined subject responses in “recent groomer” and “not recent groomer”. Then we asked if, compared to control conditions, subjects responded more strongly to a recruitment call after receiving recent grooming from any group mate (recent groomer + not recent groomer), regardless of whether the caller had provided the grooming (Fig. 2). Comparing both within and between our two short-term reciprocity conditions allowed us to discriminate between evidence for direct or generalized reciprocity. If generalized reciprocity was present, the subject (monkey B) should respond to a recent grooming partner (monkey A) and another female (monkey C) more strongly than in the control conditions (Fig. 2). 
In contrast, a significantly stronger response to the recruitment call of a recent grooming partner (monkey A), but not to that of another female (monkey C), would indicate the presence of only direct reciprocity. Note that this design allows us to discriminate between the presence of direct reciprocity alone and the presence of direct and/or generalized reciprocity. It does not allow us to determine whether generalized reciprocity is present alone.

We tested reciprocity over a longer timescale by asking whether grooming relationships detected over the 2-year study period predicted the strength of subjects’ responses to recruitment calls. To test direct reciprocity over the longer timescale, we used a test condition in which a subject (monkey B) who had received no grooming or submission from any partner (of any age or sex class) for 90 min heard the recruitment call of a female (monkey A) that had groomed them frequently within the study period (Fig. 2: frequent grooming partner). Subject pairs used in this condition represented the highest rates of grooming between unrelated females in the group. Pairwise grooming index values were calculated as the rate of grooming (seconds per hour observed) between each female dyad divided by the mean rate of grooming for all possible female dyads (Silk et al. 2006). Of the 396 dyads of unrelated females in group F, only 136 were observed to groom each other. We divided the grooming index values for these 136 dyads into quartiles and chose our subject-caller pairs from the 29 dyads in the upper two quartiles. The pairwise grooming index values of our subject-caller pairs were between 10.76 and 55.55 with a mean of 19.06 and standard deviation of 15.01. For reference, during our study period, closely related female dyads (r > 0.125) in this group had a mean grooming index of 16.89 ± 22.65. Thus, the unrelated females we tested in the frequent grooming partner condition had grooming relationships similar in strength to female kin in this highly kin-biased system. To establish if there was evidence for long-term direct reciprocity, we determined if subject responses in the frequent grooming partner condition differed significantly from their responses to the null control condition.
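The pairwise grooming index defined above (a dyad’s grooming rate divided by the mean rate across all possible dyads) can be sketched as follows. The function name, dyad labels, and values are hypothetical, and how observation time is assigned to a dyad is an assumption of this sketch rather than something stated in the text.

```python
# Hypothetical sketch of the pairwise grooming index; not the authors' code.

def grooming_indices(seconds_groomed, hours_observed):
    """Return dyad -> grooming index.

    seconds_groomed: dyad -> total seconds of grooming observed
    hours_observed:  dyad -> hours of observation for that dyad (assumed given)
    Dyads never seen grooming must be included with 0 seconds, because the
    mean rate is taken over all possible dyads.
    """
    rates = {d: seconds_groomed[d] / hours_observed[d] for d in seconds_groomed}
    mean_rate = sum(rates.values()) / len(rates)
    if mean_rate == 0:
        raise ValueError("no grooming observed; index is undefined")
    return {d: rate / mean_rate for d, rate in rates.items()}


# Invented example: four dyads, each observed for 10 h
seconds = {"A-B": 0, "A-C": 0, "B-C": 100, "B-D": 300}
hours = {d: 10.0 for d in seconds}
index = grooming_indices(seconds, hours)
print(index)
```

By construction the index averages 1 across all dyads, so a value of 3 means a dyad groomed at three times the group-wide mean rate.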

To test generalized reciprocity over the longer timescale, we determined if subject responses across all conditions (recent groomer, not recent groomer, and frequent grooming partner) and controls (social and null) were significantly positively related to the total amount of grooming they received from all partners over the study period (Fig. 2). The amount of grooming a subject received was measured as a proportion of focal animal samples in which the subject was groomed by an adult female, relative to the total number of focal animal samples collected for that subject. This allowed us to test if females who generally received more grooming were more responsive to recruitment calls. To increase the power of the analysis, we used all the available trial data to test for generalized reciprocity in the long-term statistically (n = 64), rather than creating a separate condition type (ESM, Table S1).

Video coding

Responses to playback trials were analyzed frame-by-frame, at 30 fps, by an observer blind to experimental condition using BORIS 6.3 software. We extracted two measurements of subject responsiveness: (1) “latency to look” was the amount of time between the onset of the stimulus and the subject’s orientation to the direction of the speaker (Gouzoules et al. 1984; Seyfarth and Cheney 1984; Rendall et al. 1996; Fischer 2004), and (2) “duration of looking” was the amount of time the subject spent looking towards the speaker in the 20 s after the onset of the stimulus minus the time spent looking towards the speaker in the 20 s before the onset of the stimulus (Palombit et al. 1997; Lemasson et al. 2008). Subtracting pre-onset looking time accounted for incidental orientation towards the concealed speaker, creating a conservative measure of duration of looking (Seyfarth and Cheney 1984; Palombit et al. 1997; Lemasson et al. 2008). A subject was considered to be looking in the speaker’s direction when their head was oriented to within 10° of the speaker’s location. The degree of a subject’s orientation was determined by comparing each frame of their response to photos of rhesus macaques oriented toward known angles at each 5° interval relative to a camera at 0°. To assess observer reliability, a second observer blind to the condition scored 20% of videos selected at random. There was 90% agreement between observers, with a Cohen’s kappa (κ) of 0.67, indicating a high level of inter-observer reliability.
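As a rough illustration of the two quantities described above, the following Python sketch computes an adjusted looking duration from frame counts and Cohen’s kappa for two observers. The function names and example codes are hypothetical, and treating each frame as a binary “looking / not looking” code is an assumption about the unit of agreement, which the text does not specify.

```python
# Hypothetical sketch; not the authors' code or the BORIS implementation.
FPS = 30  # frame rate stated in the text


def adjusted_looking_duration(frames_post, frames_pre, fps=FPS):
    """Seconds looking in the 20 s after onset minus the 20 s before onset."""
    return (frames_post - frames_pre) / fps


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two observers' categorical codes of the same items."""
    n = len(ratings_a)
    if n == 0 or n != len(ratings_b):
        raise ValueError("ratings must be non-empty and of equal length")
    categories = set(ratings_a) | set(ratings_b)
    # Observed proportion of items on which the observers agree
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each observer's marginal frequencies
    p_expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if p_expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Invented codes: 1 = looking toward speaker, 0 = not looking
observer_1 = [1, 1, 1, 0, 0, 1, 1, 0, 1, 1]
observer_2 = [1, 1, 0, 0, 0, 1, 1, 1, 1, 1]
kappa = cohens_kappa(observer_1, observer_2)
duration = adjusted_looking_duration(frames_post=240, frames_pre=30)
print(round(kappa, 3), duration)
```

The example shows why raw agreement and kappa can diverge: the invented observers agree on 80% of frames, but kappa is lower because much of that agreement is expected by chance when one code dominates.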

Data analysis

We used generalized linear mixed models (GLMM) fit with the lme4 R package version 1.1–18 (Bates et al. 2015) in the R environment (R Core Team 2017) to determine the factors that predicted duration of looking towards the playback speaker. We used survival models fit with coxme version 2.2–10 (Therneau 2018) to determine the factors that predicted latency to look in the direction of the playback speaker. Survival models allowed us to use trials in which subjects did not look in the direction of the playback speaker. In these trials, a subject spent 0 s looking toward the speaker (a true “0”), but their latency to look toward the speaker was a null value because the event “looking” did not occur (Jahn-Eimermacher et al. 2011). We considered no-look trials as censored observations in our survival analysis, allowing us to retain all data (Bates et al. 2015).
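The treatment of no-look trials as censored observations can be illustrated with a small Python sketch. It encodes each trial as a (time, event) pair and computes a plain Kaplan-Meier estimate of the probability that a subject has not yet looked. This is a deliberately simpler technique than the mixed-effects Cox models fit with coxme, and is used here only to show how censored trials stay in the data. The function names and latencies are invented, and the 60-s censoring time (the post-onset portion of an 80-s recording) is an assumption.

```python
# Hypothetical sketch of right-censoring; not the authors' coxme analysis.

def encode_trial(latency_s, window_s=60.0):
    """Return (time, event): event is 1 if a look occurred, else 0 (censored).

    window_s is an assumed censoring time for trials with no look.
    """
    if latency_s is None:
        return (window_s, 0)
    return (latency_s, 1)


def kaplan_meier(observations):
    """Return [(time, P(not yet looked))] evaluated at each event time."""
    obs = sorted(observations)
    at_risk = len(obs)
    survival = 1.0
    curve = []
    i = 0
    while i < len(obs):
        t = obs[i][0]
        leaving = sum(1 for time, _ in obs if time == t)
        events = sum(1 for time, e in obs if time == t and e == 1)
        if events:
            survival *= 1 - events / at_risk
            curve.append((t, survival))
        # Censored trials leave the risk set without lowering the curve
        at_risk -= leaving
        i += leaving
    return curve


# Invented latencies in seconds; None marks a no-look trial
latencies = [0.5, 1.0, 1.0, None, 2.5]
trials = [encode_trial(x) for x in latencies]
curve = kaplan_meier(trials)
print(curve)
```

In this invented example the curve never reaches zero, because the one censored trial remains in the risk set throughout rather than being discarded.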

For each response variable, we constructed three models comparing different sets of experimental conditions: model (1) recent groomer, not recent groomer, social control, null control; model (2) recent groomer + not recent groomer, social control, null control; model (3) frequent grooming partner, social control, null control (ESM, Table S1). In addition to the fixed effect “condition”, all models included the amount of grooming (seconds/hour observed) a subject received during the study period as a fixed effect covariate, the identities of subject and caller as random effects, and three additional fixed effects that could influence subject responsiveness to recruitment calls: (1) the number of individuals in the subject’s vicinity (10 m) at the time of the trial, to account for the negative effect of audience size on some animals’ responses to conspecific vocalizations (Fischer 2004; Semple et al. 2009); (2) whether the subject and caller were of similar or different dominance ranks (“rank distance”), to account for the tendency of animals to engage more frequently in affiliative behaviors with individuals of similar rank (de Waal and Luttrell 1986; Smith et al. 2010; Blomquist et al. 2011), where “1” indicated that subject and caller belonged to the same rank category (both high-ranking, middle-ranking, or low-ranking) and “0” indicated that they belonged to different rank categories (e.g., one was high-ranking, the other low-ranking); and (3) the ratio of infants to adult females within a subject’s matriline. Preliminary analysis of subject responses revealed a significant decline in female responsiveness across the experimental phase of our study. The decline in subject responsiveness was most closely correlated with an increase in the number of infants born into subjects’ matrilines over the same period (ESM, Table S2, S3). To account for this relationship, we included the ratio of infants to adult females within a subject’s matriline as a fixed effect in all models. We found no evidence of collinearity between any of our predictor variables.

To test whether the variable “condition” had a meaningful effect on subjects’ responses to stimuli, we used a log-likelihood ratio test to compare each of our three models containing only “condition” and random effects to a null model containing a constant value and random effects. All three models containing only “condition” fit significantly better than null models for both latency to look and duration of looking (p < 0.01) (ESM, Table S4). To test whether the fixed effects we added to our model were meaningful, we also used a log-likelihood ratio test to compare models containing all terms (global models) against models containing only “condition” and against null models (Crawley 2008). All global models fit significantly better than either only condition models or null models for both response variables (p < 0.01) (ESM, Table S4). All statistical analyses were conducted using RStudio version 1.1.453.
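The arithmetic behind a log-likelihood ratio test of nested models can be sketched in Python using only the standard library. The function names and the log-likelihood values are invented for illustration and are not results from the study. The sketch uses the usual chi-square reference distribution, which is an approximation for mixed models, and evaluates its upper tail with the closed-form series for integer degrees of freedom.

```python
# Hypothetical sketch of a likelihood ratio test; not the authors' R code.
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x < 0 or df < 1 or int(df) != df:
        raise ValueError("x must be >= 0 and df a positive integer")
    if x == 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r=0}^{df/2 - 1} (x/2)^r / r!
        term, total = 1.0, 1.0
        for r in range(1, df // 2):
            term *= half / r
            total += term
        return math.exp(-half) * total
    # Odd df: two-sided normal tail plus a finite correction series
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))
    pdf = math.exp(-half) / math.sqrt(2.0 * math.pi)
    term, total = root, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    return tail + 2.0 * pdf * total


def likelihood_ratio_test(loglik_reduced, loglik_full, df_diff):
    """Return (statistic, p) for a reduced model nested within a full model."""
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)
    return statistic, chi2_sf(statistic, df_diff)


# Invented log-likelihoods: null model vs. a model adding a four-level
# "condition" factor (three extra parameters)
stat, p = likelihood_ratio_test(loglik_reduced=-120.0, loglik_full=-113.0, df_diff=3)
print(stat, p < 0.01)
```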


We conducted 64 playback trials (recent groomer n = 15, not recent groomer n = 15, frequent grooming partner n = 11, social control n = 13, null control n = 10) using 39 different female subjects. Subjects looked in the direction of the speaker in 56 of our 64 trials (87.5%). As expected, due to the short duration of calls played (Gouzoules et al. 1984), subjects did not approach the speaker in any of our trials. Overall, we found no evidence for either direct or generalized reciprocity between unrelated females over either the short or the longer timescale.

Short-term direct and generalized reciprocity

We found that subjects took significantly more time to look in the direction of the speaker and spent significantly less time looking at the speaker in the recent groomer condition compared to subjects in the not recent groomer condition (latency: Coef ± SE = 1.35 ± 0.62, z = 2.18, p = 0.03; duration: Coef ± SE = 138.31 ± 44.20, t = 3.13, p = 0.004) (ESM, Table S1. Model 1). However, the strength of subjects’ responses in the recent groomer condition did not differ significantly from subjects in the social control (latency: Coef ± SE = 0.72 ± 0.54, z = 1.32, p = 0.19; duration: Coef ± SE = 35.63 ± 47.13, t = 0.76, p = 0.46) or the null control conditions (latency: Coef ± SE = 0.69 ± 0.60, z = 1.15, p = 0.25; duration: Coef ± SE = 68.15 ± 52.74, t = 1.29, p = 0.20) (ESM, Table S1. Model 1; Fig. 3 A1, B1). In other words, while subjects responded less strongly to recruitment calls from a recent grooming partner compared to a third party, subjects’ responses to a recent grooming partner did not differ significantly from their responses to calls from a recently submissive female, or a female with whom they had no interaction at all.
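For readers who wish to check how the reported z statistics relate to the coefficients and standard errors, the following minimal sketch computes a Wald z value and a two-sided p-value from the standard normal distribution. It applies only to the latency models, for which z statistics are reported; the duration models report t statistics, whose p-values depend on degrees of freedom not given here. The function name is hypothetical.

```python
from statistics import NormalDist


def wald_test(coef: float, se: float):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = coef / se
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return z, p


# Latency, recent groomer vs. not recent groomer, as reported in the text.
z, p = wald_test(coef=1.35, se=0.62)
print(f"z = {z:.2f}, p = {p:.3f}")
```

Because the published coefficients and standard errors are rounded to two decimals, values recomputed in this way can differ slightly from the reported statistics.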

We found no evidence of generalized reciprocity in the short term. Subjects did not differ in their responses in the not recent groomer condition compared to the social control (latency: Coef ± SE = −0.40 ± 0.67, z = −0.77, p = 0.44; duration: Coef ± SE = −25.73 ± 48.43, t = −0.53, p = 0.60) or null control conditions (latency: Coef ± SE = −0.72 ± 0.48, z = −1.64, p = 0.18; duration: Coef ± SE = −4.07 ± 50.34, t = −0.08, p = 0.93) (ESM, Table S1. Model 2). In other words, females that recently received grooming from any individual, including the caller, were no more responsive to recruitment calls than females who recently received a submission or had no interaction at all (Fig. 3 A2, B2).

Long-term direct and generalized reciprocity

We found no evidence of direct reciprocity in the longer term. Subjects were no more responsive to recruitment calls from frequent grooming partners compared to subjects in the null control who heard calls from females that were not observed grooming the subject during the study period (latency: Coef ± SE = −1.14 ± 0.69, z = −1.64, p = 0.10; duration: Coef ± SE = −38.57 ± 85.79, t = −0.45, p = 0.66) (ESM, Table S1. Model 3; Fig. 3 A3, B3). We also found no evidence of generalized reciprocity in the longer term. There was no significant relationship between the total amount of grooming a subject received across the study period from all other adult females and either of our measures of subject responsiveness (i.e., subject latency to look and duration of looking toward the recruitment call) in five out of our six models (ESM, Table S1). The only exception was a significant negative relationship between total grooming received and latency to look in the model containing the smallest number of data points (n = 21) and thus most prone to type II errors (Suresh and Chandrashekara 2012) (ESM, Table S1).


We found no evidence of direct or generalized reciprocity in either the short or the longer term amongst unrelated female rhesus macaques. Females that recently received grooming showed no indication of an increased responsiveness to calls for coalitionary support from unrelated female group mates, even when the caller was their most recent grooming partner. Also, females were not more responsive to recruitment calls from their most frequent unrelated grooming partners of the last 2 years. Nor were females more responsive if they received a large amount of grooming from other females in their group as a whole. Together, these results yield no evidence of a reciprocal relationship, direct or generalized, between grooming and responsiveness to calls for coalitionary support over two distinct time frames.

When interpreting responses in a playback experiment designed to ask questions about cooperation, we must consider what it does and does not mean when a subject looks in the direction of a speaker. Looking towards a call for aid is not the same as providing the aid that is called for. Yet, at the same time, the nature of an individual’s response provides a clear indication of how much the information in the call means to them and what they might do with it. Individuals who take a long time to look in the direction of a simulated conflict, or who look away quickly, demonstrate little interest in the conflict. By contrast, those who are quick to look and who spend a long time looking in the direction of a conflict demonstrate a willingness to gather information about the situation and are the most likely to be deciding whether or not to intervene. Of course, intervention does not often occur within a playback experiment as there is no actual conflict. Subjects are only provided with a single piece of information (here, the recruitment call), and their efforts to gather information (through looking and listening) reveal that the conflict is both out of sight and over quickly. As put by Seyfarth and Cheney, “playbacks elicited only looking responses… actual physical involvement may depend on further assessment of the potential costs of intervention” (Seyfarth and Cheney 1984). However, in a playback experiment the information about the potential costs of intervention is not available. Individuals’ attention to the fixed amount of information provided acts as a stand-in for their response to the richer information that would exist in an actual conflict. The information necessary for an individual to decide to intervene can only be gained by sustained interest and observation. Thus, interest and information gathering are the necessary first step toward cooperation. Certainly, if there is no interest, coalitions and cooperation are unlikely to follow.
So, while the subjects in our study may not actually provide coalitionary support, the difference in their responsiveness to a fixed amount of information about a conflict is indicative of how they would attend to and potentially intervene in an actual conflict.

Absence of evidence is not evidence of absence

Of course, the absence of evidence for a relationship between grooming and increased responsiveness to calls for coalitionary support does not mean the exchange is necessarily absent. This exchange might be occurring, but the signal may be weak. Whilst our sample sizes of 10–15 trials per condition may limit our ability to detect a weak signal, past studies using the same experimental design in other species detected a short-term effect of grooming on responsiveness to calls for coalitionary support using sample sizes somewhat larger (Cheney et al. 2010, Papio ursinus, 5–29 per condition) or even smaller (Seyfarth and Cheney 1984, Chlorocebus pygerythrus, 9–10 per condition) than our own. To evaluate the minimal effect size our study could detect, we conducted a post hoc minimal effect size calculation using the “pwr” R package version 1.3–0 (R Core Team 2017). A sample size of 10–15 subjects per condition, an alpha of 0.05, and a desired power of 0.80 yielded a Cohen’s d value of 1.06–1.32. To arrive at a study-appropriate unit, we multiplied “d” by a standard deviation value taken from the difference in subject responses between the “prior grooming” and “no prior grooming” conditions of a previous study (Seyfarth and Cheney 1984). We found that our study could reliably detect a difference of 2.94–3.68 s or larger between conditions. This means our experiment would have an 80% chance of detecting a significant difference (p < 0.05) when responses to recruitment calls were ~2.9–3.7 s longer in one condition than in another. The minimal size of the effect that our analysis can detect is relatively large, but so too is the size of the effect we are looking for. In a previous study using this paradigm, subjects looked on average 4.41 s longer in the prior grooming condition compared to the no prior grooming condition, with an effect size, Cohen’s d, of 1.92 (Seyfarth and Cheney 1984).
The difference in mean response time in our study between the recent groomer condition and the not recent groomer condition was 4.99 s with an effect size of 1.44. This value is greater than our estimated range of minimal detectable effects (MDE), and we can thus be confident in the power of our analyses to detect this difference. The other large, but not significant, difference between responses to conditions was between the recent groomer condition and the null control. The mean difference in response time was 3.94 s (above the “minimal detectable looking duration” range), but the effect size was only −0.84 (below the Cohen’s d MDE range), and the p-value was non-significant (p = 0.20). The disparity between the effect size and the looking duration difference likely indicates that the standard deviation of our sample is slightly wider than that of Seyfarth and Cheney (1984), meaning the actual minimal detectable looking duration for our study is slightly higher than the estimate, likely above 3.94 s. Since the effect size of the compared conditions is below the MDE, it is possible there is a significant difference our analysis cannot detect, but, like the difference between the recent groomer and not recent groomer conditions, the direction of the effect is negative. Thus, it would still support our conclusion that we found no evidence for a reciprocal effect. The mean difference in responses between our other conditions was relatively small (1.05–2.94 s), with effect sizes below the range of minimal detectable effects (Cohen’s d: 0.22–0.82). Thus, while it is possible that there are effect-driven differences in the responses between our other conditions, any undetected effects must be relatively small.
Given that our sole significant effect was of similar strength to, but the opposite direction of, past detected effects, and that the differences in responses between the rest of our conditions were minimal, we can be relatively confident the absence of a reciprocal effect of grooming on responsiveness to calls for coalitionary support that we found in unrelated female rhesus macaques is not simply a result of underpowered analysis, and stands at odds with previous findings in other female primates.
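The minimal detectable effect calculation described above can be approximated with a short standard-library Python sketch. This is not the “pwr” computation itself: it uses the normal approximation for a two-sided, two-sample comparison, d ≈ (z₁₋α/₂ + z_power)·√(2/n), which yields values slightly below the 1.06–1.32 reported above (presumably because “pwr” works with the noncentral t distribution). The standard deviation of 2.78 s is an assumption back-calculated from the reported figures (2.94 s / 1.06), not a value stated in the text, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def minimal_detectable_d(n_per_group: int, alpha: float = 0.05, power: float = 0.80) -> float:
    """Normal-approximation minimal detectable Cohen's d, two groups of equal size."""
    z = NormalDist()
    return (z.inv_cdf(1.0 - alpha / 2.0) + z.inv_cdf(power)) * sqrt(2.0 / n_per_group)


def d_to_seconds(d: float, sd_seconds: float) -> float:
    """Convert a standardized effect size into seconds of looking time."""
    return d * sd_seconds


# Assumed SD, back-calculated from the reported values; not given in the text.
ASSUMED_SD_SECONDS = 2.78

for n in (15, 10):
    d = minimal_detectable_d(n)
    seconds = d_to_seconds(d, ASSUMED_SD_SECONDS)
    print(f"n = {n} per condition: d = {d:.2f}, about {seconds:.2f} s")
```

Under these assumptions the approximation gives d of roughly 1.02 (n = 15) and 1.25 (n = 10), close to but somewhat smaller than the reported range, which is the expected direction of the discrepancy for small samples.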

Differences between our results and those of past studies could be driven, in part, by differences in the extent to which relatedness was excluded by the respective study designs. By taking advantage of a population with a deeply resolved pedigree containing information on both maternal and paternal relatedness, we were able to wholly exclude the effects of kinship from our results. Previous playback studies only had information on subjects’ maternal relatedness, and so may have inadvertently tested paternally related dyads, which may have been more cooperative toward one another than unrelated dyads (Seyfarth and Cheney 1984; Widdig et al. 2001; Cheney et al. 2010). Our study thus highlights the value of controlling for kin selection in cooperative experiments.

The absence of evidence for the exchange of grooming for support in rhesus macaques could also be a result of high costs associated with providing coalitionary support in this species (Flack and de Waal 2004). Rhesus macaque society is highly despotic: conflicts are likely to escalate and are unlikely to end in reconciliation (Flack and de Waal 2004; Arnold and Auriel 2010). Providing coalitionary support thus may come with an increased risk of injury compared to less physically aggressive species like vervet monkeys and chacma baboons (de Waal and Luttrell 1988). The costs of providing coalitionary support may simply be too high for the benefits of grooming to be worth the exchange in highly despotic species, and alternative rules may govern the distribution of grooming and coalitionary support. This is seemingly the case in another despotic macaque species, the Japanese macaque (Macaca fuscata), where an observational study found that females were not more likely to provide coalitionary support within 30 min of receiving grooming (Schino et al. 2007). Spotted hyenas (Crocuta crocuta) are also a highly despotic, hierarchical species, and females do not exchange coalitionary support reciprocally with consistent partners. Instead, they selectively intervene in lower-intensity fights against subordinates, avoiding high conflict costs and gaining direct and indirect fitness benefits from reinforcing their own dominance rank and those of their kin (Smith et al. 2010). Reciprocity may be absent in despotic animal societies for behaviors such as grooming and coalitionary support, but even if reciprocity is present, any effects of past grooming on decisions to help might be small relative to all the other factors that go into the decision to respond to calls for aid.

What grooming gets you

Even if grooming is not exchanged for increased responsiveness to calls that may translate into coalitionary support, this is not to say that grooming is not beneficial. As in many animal societies (Snyder-Mackler et al. 2020), female rhesus macaques who maintain proximity and exchange grooming with a few close partners, or with a wide range of infrequent partners, have better survival outcomes than less well-connected females (Ellis et al. 2019). Interestingly, whatever fitness benefits females are receiving do not necessarily come from grooming alone, but from the relationships that emerge from grooming and other interactions (Ellis et al. 2019). Partners who interact frequently may benefit from better coordinated actions, with more balanced exchanges that are less prone to cheating (Noë and Hammerstein 1994; Cheney et al. 2010). Having a wide range of partners may also provide broad social tolerance from group mates, granting access to key resources and helping individuals to avoid injury in the agonistic interactions of a despotic society (Henzi and Barrett 1999; de Waal and Brosnan 2006; Testard et al. 2020; Gareta García et al. 2021).

Generalized reciprocity in animal societies

The absence of evidence consistent with generalized reciprocity is not unusual. Observational studies of primates exchanging grooming for grooming (Majolo et al. 2012; Molesti and Majolo 2017) and grooming for food (de Waal and Brosnan 2006) have also failed to detect a pattern of indiscriminate giving or evidence of a relationship between the amount of grooming a subject received and their general level of cooperativeness. Generalized reciprocity has also been explored in short-term exchanges of coalitionary support in ravens (Corvus corax) (Fraser and Bugnyar 2012). This study found the same result: the amount of cooperation an individual received did not predict the short-term likelihood that individuals would respond more strongly to any partner. Evidence of generalized reciprocity in animals has thus far been found in lab-based tasks, where Norway rats (Rattus norvegicus) (Rutte and Taborsky 2007), capuchin monkeys (Cebus apella) (Leimgruber et al. 2014), and working dogs (Canis familiaris) (Gfrerer and Taborsky 2017) were all more likely to help unknown conspecifics after receiving help themselves. The differences in study outcomes could result from species differences, or from differences between ecologically realistic contexts, where many factors could be affecting the decision to cooperate, and controlled laboratory contexts, where the effect of past cooperation can be better isolated (Carter 2014). Generalized reciprocity may exist in free-living animal societies, but the presence of other partner preferences such as kinship, dominance rank, past interactions, and others may overshadow the tendency to “pay cooperation forward”.

The result that females were less responsive to recruitment calls from recent grooming partners compared to calls from other females could reflect differences in the expected location of the caller in these two conditions. One interpretation of looking responses is that animals look more quickly and for longer at calls that contain information that is unexpected, novel, important, or important for its novelty and unexpectedness (Winters et al. 2015; Whitehouse and Meunier 2020). In the recent groomer condition, the subject might expect the caller to be nearby because they recently interacted, but in the “not recent groomer” condition, the subject had no recent contact with the caller and so might expect her to be further away (in fact in both cases the caller is > 50 m away or well out of sight). However, we believe this explanation unlikely because our control conditions mirror the expected location of callers in these two experimental conditions. If a violation of expectation regarding the location of the caller were driving the difference in responses, we would expect subjects to also look longer in the null control where no prior interaction occurred, and for less time in the social control where the subject and caller recently interacted. But responsiveness in neither the recent groomer nor the not recent groomer conditions differed significantly from responsiveness in our control conditions, suggesting the expected location of the caller did not substantially influence our results. Given the small effect size for the difference between the recent groomer and not recent groomer conditions, further study is required to establish the robustness of this result and its potential meaning.

Mutualism too

A meta-analysis of non-experimental studies of primate behavior revealed a positive, albeit weak, relationship between grooming and coalitionary support (Schino 2007). That is, individuals that were grooming partners tended to be coalitionary partners, and vice versa. The correlation reveals a relationship between two behaviors but not a cause. Just as direct and generalized reciprocity can explain the same pattern of cooperation in a single behavior, each obscuring evidence of the other, so too can reciprocal and by-product mechanisms co-exist and co-confound within a single correlation. For example, two individuals eating from the same food patch could form a coalition to defend it from a third individual, each netting a mutual benefit by retaining their own access to the resource. If coalition partners find themselves repeatedly accessing the same resources because of shared social status, disposition, metabolism, or food preferences, a pattern of repeated support could emerge (McPherson et al. 2001). The same characteristics that assort individuals while feeding can apply to affiliative contexts; thus, pairs of individuals that mutually support one another can also be more likely to associate with and groom one another (de Waal and Luttrell 1986; de Waal and Brosnan 2006), leading to the observed correlation between grooming and coalition partners (Schino 2007). This is not to offer up by-product benefits as an alternative mechanism to reciprocity. Instead, having tested and failed to find evidence for two possibly overlapping forms of reciprocity across two timescales while controlling for kin selection, we wish to add a further explanation, immediate mutual benefits, to the list of possible factors motivating an animal’s choice to cooperate.
Exhaustive studies aimed at explaining cooperative behaviors are best served not by testing for a single mechanism but by accounting for the fact that social decisions in animal societies are made under a complex set of interacting rules that include factors like relatedness, past interactions, existing long-term relationships, a tendency to pay cooperation forward, homophily, and opportunity. Effective tests for the presence of one mechanism must be aware of all the possible interacting mechanisms. For this reason, we propose by-product explanations may also be at play in the cooperative exchanges of unrelated female rhesus macaques.


Overall, we have highlighted how choice of grooming and coalition partners, like other behaviors first described as examples of reciprocity (three-spined stickleback (Gasterosteus aculeatus) predator inspection (Milinski 1987), pied flycatcher (Ficedula hypoleuca) predator mobbing (Krams et al. 2008), vampire bat blood donations (Wilkinson et al. 2016)) may have multiple interacting explanations, some cooperative, others not. Further work is needed and should be conducted using comprehensive experimental designs and a high degree of inclusivity when exploring mechanisms to explain apparent cooperative behaviors.