Recognition memory impacts day-to-day social interactions. Failing to recognize someone you’ve met before may lead to an awkward conversation, especially if the other person recognizes you. You might also forget that you’ve gone over details of a business deal with a client, appearing unorganized when presenting them as new. People influence how we remember words, pictures, and other people (Wright et al. 2008; Wright et al. 2005). Despite the role recognition memory plays in navigating our environments, research considering the processes underlying recognition decisions has largely been conducted in non-social contexts (e.g., Eichenbaum et al. 2007; Yonelinas, 2002; Yonelinas & Parks, 2007).

To study recognition memory, researchers typically assess how people employ decision rules to respond “old” or “new” to memory probes. Researchers further characterize decisions using unequal-variance signal detection theory (SDT; Fig. 1a). SDT describes the rule leading to “old” or “new” responses as a criterion dividing the memory evidence distributions that represent targets and lures. That is, items whose memory strength lies above the criterion will be identified as “old,” and items not exceeding it will be identified as “new.” The hit rate is computed as the area under the target distribution to the right of the criterion, and the false alarm rate is the area under the lure distribution to the right of the criterion. Accuracy, or “sensitivity,” is modeled as the distance between the means of the target and lure distributions, and referred to as d_a (Macmillan & Creelman, 1991). When considering confidence, several criteria partition the evidence continuum, with each partition representing a different confidence rating on a scale (e.g., from sure old to sure new). Such a model is depicted in Fig. 1b, for a six-point scale.

Fig. 1

The unequal-variance signal detection theory (SDT) model. (a) The SDT model applied to a simple old–new task, involving a single, unbiased old–new response criterion (vertical line). (b) The SDT model as applied to a confidence-rating task, with liberal criterion placement. (c) The SDT confidence-rating model with conservative criterion placement

Individuals differ in their decision criteria. One person may call a word “old,” but be unsure, whereas another person may be very confident in the “old” response, even if the memory strength for the item is the same across the two individuals. This difference in decision criteria is illustrated in Fig. 1b, c as a shift of the response criteria to favor either more liberal (leftward shift, panel B) or more conservative (rightward shift, panel C) response biases.
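The relation between criterion placement and response rates described above can be illustrated numerically. The following is a minimal sketch, not the procedure used in this study: the distribution parameters (lure mean 0 and SD 1, target mean 1 and SD 1.25) and the three criterion values are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist


def old_response_rates(criterion, target_mean=1.0, target_sd=1.25,
                       lure_mean=0.0, lure_sd=1.0):
    """Return (hit_rate, false_alarm_rate) for a single old/new criterion.

    All default parameter values are hypothetical. Each rate is the area
    of the corresponding evidence distribution to the right of the criterion.
    """
    targets = NormalDist(target_mean, target_sd)
    lures = NormalDist(lure_mean, lure_sd)
    hit_rate = 1.0 - targets.cdf(criterion)
    false_alarm_rate = 1.0 - lures.cdf(criterion)
    return hit_rate, false_alarm_rate


# A leftward (liberal) shift raises both rates; a rightward (conservative)
# shift lowers both, even though the distributions themselves are unchanged.
for label, criterion in [("liberal", 0.0), ("neutral", 0.5), ("conservative", 1.0)]:
    hits, false_alarms = old_response_rates(criterion)
    print(f"{label:>12}: hit rate = {hits:.3f}, false alarm rate = {false_alarms:.3f}")
```

In this sketch, memory strength is held constant across the three cases, so any difference in hit and false alarm rates reflects response bias alone.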

People can control criterion placement between the old and new evidence distributions, and may shift criteria to favor different outcomes (Azimian-Faridani & Wilding, 2006; Hirschman & Henzler, 1998; Rotello et al. 2005; Strack & Forster, 1995; Van Zandt, 2000). One branch of research has assessed the impact of explicit response instructions on criterion placement. For instance, people can be motivated to call items “old” if given a high probability of items being “old” before making a decision (Dube & Rotello 2012; Van Zandt, 2000). Moreover, instruction to change decision criteria—by, for example, only responding “old” when absolutely confident—also affects criterion placement (Azimian-Faridani & Wilding, 2006).

Emerging work suggests that personality and social factors influence criterion placement, dovetailing with the increased focus on social influence in the broader memory literature (for reviews, see Echterhoff & Hirst, 2009; Hirst & Echterhoff, 2012). For instance, increased negative affect corresponds with less shifting, perhaps because those with enhanced negative affect could be less cognitively flexible (Aminoff et al. 2012). Stereotype threat also impacts criterion placement. For example, women under threat in academic settings make more risk-aversive financial decisions (Carr & Steele, 2010), and older adults under threat become more conservative, to avoid errors of commission on memory tests (Barber & Mather, 2013). Interestingly, explicit instructions to make lax or strict decisions influence the extent of reporting peers’ inaccurate memories in one’s own recollections, such that receiving instructions to be strict reduces the reporting of inaccurate suggestions while at the same time reducing the reporting of accurate details (Wright et al. 2008). Source reliability also impacts the extent to which people integrate others’ suggestions into their memory decisions (Skagerberg & Wright, 2009).

An open question in this growing line of research regards how people adaptively fine-tune memory decisions to be more conservative or liberal on the basis of performance feedback from others, without explicit instructions to change decision-making strategies. This is an important consideration because many aspects of learning involve adaptively changing decision-making strategies to optimize one’s likelihood of success. Nonsocial memory work has shown that people adaptively and implicitly shift criteria when given biased feedback, in which selectively false positive feedback on false alarm or miss trials encourages the adoption of a more lax or conservative decision criterion (Han & Dobbins, 2008), while hits and correct rejections are given fully correct feedback. Because false positive feedback is reserved for errors, people presumably have little basis to suspect a manipulation. As a result, people show adaptive criterion learning, becoming more lax when false alarms beget false positive feedback, and stricter when misses beget false positive feedback. These effects persist even in later memory tests without feedback (Han & Dobbins, 2009).

To date, no work has directly examined how social context (that is, who provides you with feedback) impacts the extent of adaptive criterion learning. Related work indicates, however, that external cues (i.e., information about the likelihood of information being old or new) influence memory (Jaeger, Cox, & Dobbins, 2012). This is relevant here because external cues can be social in nature (e.g., saying you have seen someone before on the basis of a friend’s suggestion rather than your own recollection). People do indeed over-rely on external cues, showing decreased accuracy when computer-based recommendations (e.g., “Likely old”) were ultimately invalid versus valid (Jaeger, Cox, & Dobbins, 2012). Critically, social cues influence memory as well, with people taking more recommendations into consideration for memory decisions when they come from a peer who is a reliable source of information, as compared to an unreliable source (Jaeger, Lauris et al. 2012). Social cues even impact cognition when others are not physically present (Shteynberg & Galinsky, 2011). This suggests external cues may be taken into consideration depending on who provides you with information even if a computer transmits all information. People may be more likely to prioritize recommendations or feedback from reliable people or computers, which rely on logic-based rules to provide information, relative to people who seem unreliable.

The present work leverages Han and Dobbins’s (2008) biased feedback procedure to examine social influences on adaptive criterion learning. Since feedback in everyday life does not always come from a computer, as used in Han and Dobbins (2008), examining potential differences in criteria based on source sociality may be important in improving feedback learning and developing new ways to give employees and students feedback about performance. This may be critical in some arenas, such as encouraging airport security employees to adopt a lax versus strict criterion in identifying and subsequently investigating suspicious items. Adopting a lax criterion would encourage more false alarms, but those are desirable when considering public safety. Likewise, a stricter criterion would be beneficial when errors of commission should be reduced, as when vetting people for an important position. Given the broad social influences on memory, we anticipated that social sources would influence adaptive criterion learning. We expected to replicate Han and Dobbins’s (2008) findings of criterion shifting for both social and computer feedback sources (Hypothesis 1). This would suggest that peers induce criterion shifts similarly to computers acting on programmed rules.

Meta-analytic results show highly credible sources induce more persuasion than low-credibility ones (Pornpitakpan, 2004). When people believe another person is more powerful than themselves, the other person’s opinion carries more weight in influencing memory conformity (Skagerberg & Wright, 2007). In addition, a person’s relative credibility influences perceivers’ susceptibility to misinformation (French et al. 2011). Expertise conveys reliability, with persuasiveness increasing given the perceived expertise of a communicator (Birnbaum & Stegner, 1979; Cialdini & Goldstein, 2004).

For these reasons, we anticipated that differences in source reliability could impact adaptive criterion learning in two ways. First, we anticipated that participants would be less likely to shift criteria when they believed feedback came from an unreliable and low-achieving person, relative to a reliable and high-achieving person or a computer (Hypothesis 2a). Second, we anticipated that social source reliability could impact the overall tendency to become more lax with time (as evidenced in Han & Dobbins, 2008; Murdock, 1974; Verde & Rotello, 2007). Notably, we provided false feedback to participants in the first and third of three tests, but not the second. The nature of when participants received biased feedback indicates that participants would see more feedback labeled “Incorrect” in the second test than in the first, and thus would perceive worsening performance. We believed that increased exposure to feedback labeled “Incorrect” could have repercussions when sources are reliable or unreliable.

Participants might be more sensitive to losses in the presence of a person perceived as high-achieving and reliable, rather than an unreliable person. Thus, participants assigned to a reliable social source might become more risk-aversive and adopt a stricter criterion over time (Hypothesis 2b). This could reflect protective self-presentation, in which individuals avoid losing approval from others (Arkin, 1981), or maintenance of a positive self-concept (Cialdini & Goldstein, 2004; Wood, 2000). Becoming stricter could also be indicative of avoiding negative impressions in public settings (Wooten & Reed, 2004). Indeed, behaviors performed in public can lead to changes in self-concept and self-presentation relative to identical behaviors performed in the absence of an interpersonal context (Tice, 1992). In contrast, participants might not have such a desire in the presence of an unreliable and low-achieving person, and might adopt a more lax criterion over time, similar to those receiving computer-based feedback (Hypothesis 2b).

Experiment 1

Method

Participants

Seventy-two adults (mean age = 18.40 years, SD = 0.87, range = 17–23; 47 female, 25 male) from Brandeis University participated for credit. One 17-year-old obtained parental permission to participate. Participants were randomly assigned to a lax–neutral–strict (LNS) or strict–neutral–lax (SNL) biased feedback group. In all, 36 participants were assigned to each of the LNS and SNL groups. Participants were also randomly assigned to nonsocial (N = 24), reliable/high-achieving (N = 24), or unreliable/low-achieving (N = 24) source groups. Analyses conducted in G*Power (Mayr et al. 2007) for sample size estimation suggested that 21 participants in each group would allow us to detect effects using alpha = .05, power = .80, and ηp² = .10. Within each source group, 12 participants were in the LNS group and 12 were in the SNL group. The source and feedback manipulations are described in turn.

Materials

Six hundred nouns were randomly drawn from a group of 4,983 words generated from the English Lexicon Project (http://elexicon.wustl.edu; Balota et al. 2007). Three lists of 200 nouns each (100 randomly assigned as targets and 100 as lures) were constructed for three study/test cycles. Two task versions counterbalanced target and lure assignments in the three cycles. The cycles were identical, but word order was randomized. We compared the words’ numbers of letters, syllables, and phonemes, as well as their Kučera–Francis (KF) corpus frequency (Kučera & Francis, 1967) in four 3 (Study/Test Cycle: 1, 2, 3) × 2 (Word Assignment: target, lure) analyses of variance (ANOVAs). No effects approached significance, ps > .22. The selected nouns contained, on average, 7.11 letters (SD = 1.68), 6.02 phonemes (SD = 1.64), and 2.40 syllables (SD = .81), with a KF frequency of 12.26 (SD = 7.00).

Procedure

Adaptive criterion learning task

We replicated the study–test cycle procedure of Han and Dobbins (2008). Stimuli were presented via E-Prime (Psychology Software Tools, Pittsburgh, PA). All participants practiced the study syllable counting task (Fig. 2a). They were instructed to indicate how many syllables were in each word and were told that they would be tested on their memory for the words immediately after the syllable-counting task. All participants completed ten practice trials. Words appeared on the screen one at a time for 2,000 ms. Below each word, the question, “How many syllables?” appeared, with choices of “1,” “2,” “3,” and “4+” listed below. Participants pressed the “1,” “2,” “3,” or “4” key to indicate how many syllables were in each word. Participants were then given accuracy feedback on the computer, but were told that they would not receive feedback on the syllable-counting task after the practice session. After practicing, participants began the first study–test cycle. In each of three study tasks, 100 words were presented, one at a time, for 2,000 ms each. A blank screen was presented for 500 ms between trials.

Fig. 2

Example study (a) and test (b) trials

Immediately following the first study task, participants began the first self-paced memory test (Fig. 2b). In all tests, participants viewed 200 words one at a time on the screen. One hundred words were from the study task, and 100 were lures. Participants knew that some words would be old and some would be new, and knew that the words appearing in one cycle would not appear again. Participants pressed “1” to indicate “old” (i.e., they had seen it at study) and “2” to indicate “new” (i.e., they had not seen it).

After each decision, the question “How confident are you?” appeared on the screen with a three-point scale (1 = unsure, 3 = certain). After each rating, the word “Correct” or “Incorrect” appeared at the center of the screen for 1,000 ms as feedback on the memory decisions. A blank screen was presented for 250 ms between trials. Participants then completed two more cycles.

Source manipulation

To study social influences on adaptive criterion learning, we manipulated who or what participants believed provided them with feedback, labeled “Correct” or “Incorrect.” All participants, regardless of source assignment, filled out a questionnaire prior to the study–test cycles. This questionnaire asked each participant to identify his or her major; goal for the year; how he or she was currently feeling (via a nine-point scale), along with an explanation of that response; and the number of hours slept and how well he or she had slept (via a nine-point scale) the previous night, along with a description of sleep patterns.

The questionnaire instructions varied by source type. Participants assigned to the nonsocial source type group were told, “This questionnaire just provides us some additional information about you.” Participants assigned to the social source types were told, “This questionnaire provides us some additional information about you. Today we are teaming up with another study in the lab that has to do with information processing. Your task today involves a syllable-counting task and a memory task, which we will practice next. The other study being conducted concurrently deals with how quickly people react when processing information. While you perform the memory task, another person in a room down the hall will be monitoring your performance via a computer and entering in whether your responses were correct or incorrect. You will receive this feedback in the memory task. We also had the other participant fill out the same questionnaire as you will fill out now, and we will let you see each other’s responses without revealing your name or any identifying information, to make the experience of working together less weird.”

For participants in the nonsocial source group, an experimenter took the questionnaires away upon completion. For the reliable and unreliable source groups, an experimenter took the completed questionnaires and told participants that he or she would swap their questionnaires with the other participant’s, so they could review each other’s responses. The experimenter then left the room and closed the door, returning 1 min later with the purported other participant’s questionnaire. The experimenter told participants, “Look this over for as long as you’d like, and let me know when you’re ready to practice the task.” The questionnaires from the reliable or unreliable source were handwritten. Questionnaire responses conveyed that the source was reliable/high-achieving or unreliable/low-achieving (Table 1). Nine people provided ratings on the believability of the responses and the general reliability of the people purportedly filling out the questionnaires after completing a separate study in the lab. Taken from a nine-point scale, the responses to the questionnaires from the reliable/high-achieving (M = 6.33, SD = 1.80) and unreliable/low-achieving (M = 6.33, SD = 2.24) sources were determined to be equally believable, p > .99. However, people perceived the source labeled as reliable/high-achieving (M = 6.67, SD = 1.80) as being more reliable than the unreliable/low-achieving source (M = 4.00, SD = 2.24), t(8) = 3.27, p = .01.

Table 1 Responses to participant questionnaires for groups with the reliable/high-achieving and unreliable/low-achieving source types

Before the first memory test, participants were reminded of who or what would be labeling their feedback as “Correct” or “Incorrect” in the memory tests. Participants in the nonsocial source group were told, “After you give your confidence rating, you will receive computer feedback letting you know if your response was correct or incorrect. You will receive feedback like this for each of the three memory tasks.” Participants in social source groups were told, “As soon as you decide whether a word is old or new, the participant in the information processing study has been instructed to react as quickly as possible whether you were correct or not. After making your confidence decision, you will be able to see this feedback telling you whether you were correct or incorrect. You will receive feedback like this for each of the three memory tasks.” If participants asked how the other person knew whether or not they were correct, they were told, “The other participant has a guide at the top of their screen where they see your response and accuracy. They have to react as quickly as possible. Their reaction is your feedback.” These instructions were not repeated for the second and third cycles.

Biased feedback manipulation

Participants received biased feedback in the first and third tests. In the first or third test, false positive feedback for all false alarm responses (i.e., receiving “Correct” as feedback when they classified a “new” word as “old”) encouraged a lax criterion (L; lax). Other responses (i.e., hits, misses, and correct rejections) were correctly identified. The feedback in the second test was fully correct (N; neutral). In the first or third test, false positive feedback to all miss responses (i.e., receiving “Correct” as feedback when identifying an old word as “new”) encouraged a conservative criterion (S; strict). The order of the manipulation was reversed for the LNS and SNL groups. This manipulation was conducted across participants; the participants differed in whom they were told provided the feedback.
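The feedback rule described above can be written out as a short sketch. This is an illustration under stated assumptions rather than the E-Prime implementation used in the experiment: the function names, the condition labels, and the response counts (70 hits, 30 misses, 20 false alarms, 80 correct rejections) are hypothetical.

```python
# Order of feedback conditions across the three tests for each group.
TEST_ORDERS = {
    "LNS": ("lax", "neutral", "strict"),
    "SNL": ("strict", "neutral", "lax"),
}


def feedback_label(is_old, said_old, condition):
    """Return the feedback shown for one trial.

    is_old:    True if the word was studied (target), False if it was a lure.
    said_old:  True if the participant responded "old".
    condition: "lax", "neutral", or "strict" (hypothetical labels).
    """
    false_alarm = said_old and not is_old
    miss = is_old and not said_old
    if condition == "lax" and false_alarm:
        return "Correct"  # false positive feedback on false alarms
    if condition == "strict" and miss:
        return "Correct"  # false positive feedback on misses
    return "Correct" if is_old == said_old else "Incorrect"


def count_incorrect(hits, misses, false_alarms, correct_rejections, condition):
    """Count trials labeled "Incorrect" for a given pattern of responses."""
    trials = (
        [(True, True)] * hits
        + [(True, False)] * misses
        + [(False, True)] * false_alarms
        + [(False, False)] * correct_rejections
    )
    return sum(
        feedback_label(is_old, said_old, condition) == "Incorrect"
        for is_old, said_old in trials
    )


# Hypothetical response counts, held constant across tests for illustration.
for group, order in TEST_ORDERS.items():
    counts = [count_incorrect(70, 30, 20, 80, condition) for condition in order]
    print(group, dict(zip(order, counts)))
```

With these made-up counts, the neutral test yields 50 trials labeled “Incorrect,” compared with 30 under lax feedback and 20 under strict feedback. Holding responses constant is a simplification, but it shows why fully correct feedback in the second test would expose participants to more “Incorrect” labels than either biased test.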

After the study–test cycles, participants were debriefed and completed a posttask questionnaire. This questionnaire addressed the extent to which participants had detected that the feedback was systematically biased (“Did you believe the feedback given to you?”: 1 = not at all, 4 = moderately, 7 = very much so), and how much they believed that another person had provided the feedback (“If you were told someone would be in another room giving you feedback during the memory task, how much did you believe that?”: 1 = not at all, 4 = moderately, 7 = very much so).

Results

Manipulation efficacy

An ANOVA on the posttask questionnaire responses showed no main effect of source type, p = .23 (M_nonsoc = 4.79, SD = 1.87; M_reli = 4.54, SD = 1.62; M_unreli = 5.38, SD = 1.61), indicating that participants did not suspect the systematic biased feedback. A t test confirmed that the participants assigned to social source types (M_reli = 4.58, SD = 2.08; M_unreli = 4.54, SD = 1.62) did not differ in believing that a person had provided feedback, p = .94. Although these scores suggest the efficacy of the manipulations (i.e., on average, above moderate ratings), answering questions after debriefing may have attenuated the degree of belief expressed by participants.

Because Hypothesis 2b depended on the idea that participants perceived worsening performance over time, additional analyses confirmed that participants encountered more feedback labeled “Incorrect” in the second than in the first or the third memory test (see the supplemental materials).

Signal detection measures

To most closely replicate Han and Dobbins’s (2008) analyses, we computed c_a and A_z for our analyses. c_a and A_z were calculated as follows (Macmillan & Creelman, 1991):

$$ A_z = \Phi\left(\frac{d_a}{\sqrt{2}}\right), $$
$$ c_a = \frac{-\sqrt{2}\,s}{\left(1+s^2\right)^{1/2}\left(1+s\right)}\left[z(H)+z(F)\right], $$

where s is the slope of the zROC, z(H) and z(F) are the z-transformed hit and false alarm rates, and d_a is defined as

$$ d_a = \sqrt{\frac{2}{1+s^2}}\left[z(H) - s\,z(F)\right]. $$

c_a and A_z were calculated after obtaining the zROC slope estimates from a linear regression. These slope estimates are comparable to 1/σ_Target, as obtained from model-fitting techniques (Ratcliff et al. 1994; Ratcliff et al. 1992).
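The computation described above can be sketched as follows. This is an illustrative reconstruction, not the authors' analysis script. It assumes that the zROC slope is estimated by ordinary least squares regression of z(H) on z(F) across the cumulative confidence criteria, and that d_a and c_a are evaluated at the middle (old/new) criterion. The rating counts are hypothetical, and cumulative rates of exactly 0 or 1, which would need a correction before the z transform, are not handled.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for z and Phi


def cumulative_rates(counts):
    """Cumulative proportions from 'sure old' toward 'sure new'.

    The final category is dropped because its cumulative rate is always 1.
    """
    total = sum(counts)
    running, rates = 0, []
    for count in counts[:-1]:
        running += count
        rates.append(running / total)
    return rates


def zroc_slope(z_f, z_h):
    """Least-squares slope of z(H) regressed on z(F)."""
    n = len(z_f)
    mean_f, mean_h = sum(z_f) / n, sum(z_h) / n
    sxy = sum((f - mean_f) * (h - mean_h) for f, h in zip(z_f, z_h))
    sxx = sum((f - mean_f) ** 2 for f in z_f)
    return sxy / sxx


def d_a(z_h, z_f, s):
    return sqrt(2.0 / (1.0 + s ** 2)) * (z_h - s * z_f)


def c_a(z_h, z_f, s):
    return -sqrt(2.0) * s / (sqrt(1.0 + s ** 2) * (1.0 + s)) * (z_h + z_f)


def a_z(d_a_value):
    return STD_NORMAL.cdf(d_a_value / sqrt(2.0))


def sdt_measures(target_counts, lure_counts):
    """Return slope, d_a, A_z, and c_a from six-category rating counts."""
    z_h = [STD_NORMAL.inv_cdf(p) for p in cumulative_rates(target_counts)]
    z_f = [STD_NORMAL.inv_cdf(p) for p in cumulative_rates(lure_counts)]
    s = zroc_slope(z_f, z_h)
    mid = len(z_h) // 2  # old/new boundary among the five criteria
    sensitivity = d_a(z_h[mid], z_f[mid], s)
    return {
        "slope": s,
        "d_a": sensitivity,
        "A_z": a_z(sensitivity),
        "c_a": c_a(z_h[mid], z_f[mid], s),
    }


# Hypothetical counts, ordered: sure old, ..., unsure old, unsure new, ..., sure new.
targets = [40, 20, 10, 10, 10, 10]
lures = [5, 10, 15, 20, 25, 25]
print(sdt_measures(targets, lures))
```

When the slope equals 1, these expressions reduce to the familiar equal-variance measures, d′ = z(H) − z(F) and c = −[z(H) + z(F)]/2, which offers a quick check on the formulas.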

Unsurprisingly, given the large literature on this topic, our recognition data were consistent with the assumptions of the unequal-variance SDT model. As such, the measures that we used, which were derived from that model, provide statistically independent measures of memory accuracy (A_z) and response bias (c_a; Macmillan & Creelman, 1991). This means that although accuracy and bias were both changing across conditions in the present data set, our measures are unlikely to have been unduly influenced by this fact or to result in any statistical artifact as a result of the concurrent variation (Dube & Rotello, 2012; Green & Swets, 1966; Pazzaglia et al. 2013). The means and standard deviations are reported in Table 2 unless specified otherwise.

Table 2 Experiment 1: Mean (and standard deviation) unequal-variance decision criterion and accuracy estimates across source types, groups, and tests

Decision criteria

We assessed the effects of the biased feedback and source manipulations by focusing on decision criterion (represented as the middle criterion in Figs. 1a, c) in a 3 (Source Type: nonsocial, reliable, unreliable) × 2 (Group: LNS, SNL) × 3 (Test: 1–3) mixed ANOVA (Table 2). Participants performed above chance at study (M = .86, SD = .05), with no effect of source type, p = .45.

Replication of prior work

As in Han and Dobbins’s (2008) findings, SNL participants were stricter (M = .19, SD = .23) than LNS participants (M = .02, SD = .23), F(1, 66) = 9.35, p = .003, ηp² = .12. We found a marginal effect of test, F(2, 132) = 2.85, p = .06, ηp² = .04. Criteria were stricter in Test 1 than in Test 2, F(1, 66) = 11.01, p = .001, ηp² = .14. The criteria between Tests 1 and 3 and between Tests 2 and 3 were similar, ps > .13.

Supporting Hypothesis 1, an interaction existed between group and test, F(2, 132) = 54.10, p < .001, ηp² = .45 (Fig. 3a). SNL participants were stricter in Test 1 than LNS participants, F(1, 66) = 44.48, p < .001, ηp² = .40. This persisted into Test 2, F(1, 66) = 24.82, p < .001, ηp² = .27. Consistent with Han and Dobbins (2008, Exp. 3), the criteria also differed in Test 3, F(1, 66) = 16.34, p < .001, ηp² = .20. Here, LNS participants were stricter than SNL participants. Notably, the linear trend for the interaction between group and test was significant, F(1, 66) = 60.72, p < .001, ηp² = .48.

Fig. 3

Criteria across tests were different between the lax–neutral–strict and strict–neutral–lax groups (a). The nonsocial and unreliable source type groups became more lax over time, whereas the reliable group became more conservative (b)

Social influences on criterion shift

Crucially, an interaction between source type and test emerged, F(4, 132) = 3.75, p = .006, ηp² = .10 (Fig. 3b), revealing social influence on criterion learning that supported Hypothesis 2b: Those with an unreliable source adopted stricter criteria in Test 1 than in Test 2, F(1, 22) = 16.71, p < .001, ηp² = .43, and Test 3, F(1, 22) = 8.99, p = .007, ηp² = .29. Their criteria between Tests 2 and 3 did not differ, p = .68. This demonstrates a tendency to become more lax over time, consistent with prior work (Donaldson & Murdock, 1968; Verde & Rotello, 2007). The nonsocial source group showed a nonsignificant visual trend toward leniency, ps > .13. The reliable source group, however, became more conservative: Their criteria did not differ between Tests 1 and 2, p = .41, but were stricter from Test 2 to 3, F(1, 22) = 6.99, p = .02, ηp² = .24. Tests 1 and 3 did not differ, although a trend toward becoming stricter was evident, p = .13. No other effects were significant, including an interaction between source type, group, and test that would have supported Hypothesis 2a, ps > .16.

Accuracy results and interpretation

Accuracy across the memory tasks was above chance for all participants (M = .79, SD = .09). Although not of primary interest, we assessed accuracy using the ANOVA described above, to be consistent with prior work. As in the previous findings (Han & Dobbins, 2008), accuracy decreased over time, reflecting potential fatigue or proactive interference effects over the experiment, F(2, 132) = 13.99, p < .001, ηp² = .18: Accuracy was lower in Test 3 than in Test 1, F(1, 66) = 18.32, p < .001, ηp² = .22, or Test 2, F(1, 66) = 23.41, p < .001, ηp² = .26. Accuracy did not differ between Tests 1 and 2, p = .73. LNS participants (M = .82, SD = .09) were more accurate than SNL participants (M = .77, SD = .09), F(1, 66) = 5.12, p = .03, ηp² = .07.

A marginal interaction emerged between group and test, F(2, 132) = 2.66, p = .07, ηp² = .04. Accuracy differed between the LNS and SNL groups in Test 1, F(1, 66) = 4.69, p = .03, ηp² = .07, and Test 2, F(1, 66) = 11.82, p = .001, ηp² = .15, but not in Test 3, p = .48.

The marginal interaction between group and test was qualified by source type, F(4, 132) = 4.27, p = .003, ηp² = .12 (Fig. 4). We will first discuss the patterns within the LNS group. For nonsocial source participants, we found an effect of test, F(2, 22) = 12.52, p < .001, ηp² = .53: Accuracy decreased on Test 3 relative to Test 1, F(1, 11) = 23.25, p = .001, ηp² = .68, and Test 2, F(1, 11) = 11.63, p = .01, ηp² = .51. Accuracy between Tests 1 and 2 did not differ, p = .92. For reliable source participants, there was also an effect of test, F(2, 22) = 11.32, p < .001, ηp² = .51: Accuracy decreased on Test 3 relative to Test 1, F(1, 11) = 12.75, p = .004, ηp² = .54, and Test 2, F(1, 11) = 14.68, p = .003, ηp² = .57. Accuracy did not differ between Tests 1 and 2, p = .83. Unreliable source participants showed no effect of test, p = .70.

Fig. 4

Participants anchored in the lax feedback condition became less accurate over time if they were assigned to the nonsocial or reliable source types, but not if assigned to the unreliable source type. Participants anchored in the strict feedback condition became less accurate over time if assigned to the unreliable source type, but not if assigned to the nonsocial or reliable source types.

We now discuss the patterns for the SNL group. Nonsocial and reliable source participants showed no effect of test, ps > .52. Test did matter, however, for unreliable source participants, F(2, 22) = 5.54, p = .01, ηp² = .34: Accuracy was higher in Test 1 than in Test 3, F(1, 11) = 9.58, p = .01, ηp² = .47, and Test 2, F(1, 11) = 5.76, p = .04, ηp² = .34. Accuracy was not different between Tests 2 and 3, p = .25. No other effects were significant, ps > .13.

Speculatively, being initially manipulated to become strict or lax could be one reason why accuracy patterns persisted or decreased with time. Anchoring in a lax response style (LNS) means that being manipulated to become more conservative could conflict with the tendency to become more lax over time (Donaldson & Murdock, 1968; Verde & Rotello, 2007). Such a conflict could result in decreased accuracy in the third test. This effect might only be expected among those with nonsocial and reliable sources, since they might be more invested in the task than were those with a lower-achieving source. Anchoring in a stricter response (SNL) style with a manipulation to become more lax would not result in such a conflict, perhaps lowering the chance of accuracy decreases. Decreased accuracy, however, could be noted among those with a lower-achieving source, since fatigue effects coupled with general comfort in the task could result in less accurate performance if participants felt less pressure to perform well. Although this is not our present focus, further work could clarify the mechanisms underlying maintained or decreased accuracy given different social sources.

Discussion

Experiment 1 replicated work (Han & Dobbins, 2008, 2009) showing adaptive criterion learning through biased feedback. Extending the literature, we showed learning regardless of source sociality, and of whether social sources seemed high-achieving and reliable or low-achieving and less reliable (Hypothesis 1). This suggests that people are susceptible to shifting their criterion placement in response to social sources, even when the sources are arguably imperfect. Moreover, we suggest that a pattern of adopting stricter criteria with time within the reliable source group (unlike in the nonsocial and unreliable groups) might reflect a self-protective mechanism when being evaluated by a reliable and high-achieving peer (Hypothesis 2b).

Much research on decision criteria (e.g., Rotello et al. 2005) has employed nonsocial sources. People may have fewer reasons to believe that feedback might be biased if it is computer-mediated, since computers provide feedback using logic-based rules. Person-delivered feedback means that the human factor introduces potential errors. People might be less influenced by person- than by computer-driven feedback, especially when it comes from a relatively unreliable, low-achieving person. Our results do not support this notion. Rather, they show that social sources produce adaptive criterion learning that endures over time, similar to the effect of nonsocial sources. This adds to the literature on social influences on memory, whether by social contagion (Meade & Roediger, 2002; Roediger et al. 2001) or conformity (Horry et al. 2012), by showing that social sources influence decision criterion placement.

It would be worthwhile for future work to assess how changing perceptions of feedback influences criterion shifts induced by social sources. For instance, it would be interesting to test whether letting participants know that their “source” had performed poorly in his or her task, and that as a result much of the feedback had been incorrect, would reduce the effect of the biased feedback manipulation. Alternatively, the reward of receiving feedback labeled “Correct” could be enough to shift criteria even when participants know that the feedback is suspect.

We also demonstrated that differences in source characteristics impact decision criterion placement, specifically when a social source appears to be reliable and high-achieving versus unreliable and low-achieving. We predicted two potential patterns of results for how source credibility could impact criterion shift. First, because people are more influenced by sources with increased credibility (French et al. 2011), the criterion might be more flexibly shifted given reliable-seeming social or computer feedback sources versus an unreliable-seeming person (Hypothesis 2a). Second, and following prior work (Han & Dobbins, 2008), individuals in the nonsocial and unreliable source type groups might display more lax criteria over time, whereas those in the reliable source type group might become stricter (Hypothesis 2b). Our data supported this second possibility.

People often prefer computer-based to person-based feedback, because negative person-mediated feedback may negatively impact performance given one’s sense of public self-consciousness (Kluger, 1993). This might be particularly salient when receiving feedback from a person who seems high-achieving versus low-achieving. Given the increased frequency of feedback labeled “Incorrect” in the second versus the first memory test, the reliable source type group might have adopted stricter criteria in order to avoid making a negative public impression (Wooten & Reed, 2004). Adopting more conservative criteria in the third test might be akin to impression management, whereby people take control over how others perceive them (for a review, see Leary & Kowalski, 1990). Projecting undesired behaviors (e.g., worsening performance) under social–evaluative threat can lead to embarrassment and related physiological changes (Dickerson, Gruenwald, & Kemeny, 2004; Dickerson & Kemeny, 2004), potentially leading to behaviors to repair one’s image (Miller & Leary, 1992). Responding conservatively in the third test might be a way to repair one’s image, regardless of the strategy’s efficacy.

Experiment 2

Experiment 1 showed that perceived worsening performance in the presence of a high-achieving peer potentially elicits stricter decision criteria over time, contrasting with lenient criteria given feedback from lower-achieving and nonsocial sources. We wanted to assess whether the perception of a source as being reliable leads to stricter criteria over sequential tests. If this is true, dispelling the notion that a reliable peer provides feedback should reduce the adoption of stricter criteria. We compared the criteria of participants who were made aware, immediately prior to Test 3, that a person was not providing feedback with the criteria adopted by the reliable source participants from Experiment 1 (i.e., the “unaware” group). We expected to find less strict criteria in the aware than in the unaware group in Test 3, but not in the first two tests (Hypothesis 3).

Method

Participants

Twenty-four adults (mean age = 18.71 years, SD = 0.62, age range = 18–20; 19 female, five male) from the Brandeis community participated for credit. Twelve participants were assigned to the LNS group and 12 to the SNL group. Together, these participants formed the “aware” group. The reliable source participants from Experiment 1 are referred to as the “unaware” group.

Materials and procedure

All materials matched those of Experiment 1. The procedure for the reliable and high-achieving source type group in Experiment 1 was followed, with one difference. Immediately before the third test, participants were informed that there had never been anyone in the other room providing feedback. They were not informed of the biased feedback. This is an important distinction, because our goal was to test how dispelling belief in a source (rather than in the procedure itself) would impact how conservative people became over time. We again computed c_a and A_z for our analyses.
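As an illustration of the two measures named above, the following minimal sketch computes the unequal-variance sensitivity (d_a), criterion (c_a), and area (A_z) estimates from a hit rate, a false alarm rate, and a zROC slope, using the standard textbook formulas (Macmillan & Creelman). The function name, the example rates, and the slope value are hypothetical; the slope is assumed to have been estimated separately from the confidence ratings, and the estimation procedure actually used in these experiments may differ in its details.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1


def uv_sdt_estimates(hit_rate, fa_rate, slope):
    """Return (d_a, c_a, A_z) under unequal-variance SDT.

    hit_rate, fa_rate: proportions strictly between 0 and 1.
    slope: zROC slope s (assumed here to be estimated elsewhere,
           e.g., from confidence ratings).
    """
    z_h = _STD_NORMAL.inv_cdf(hit_rate)   # z-transformed hit rate
    z_f = _STD_NORMAL.inv_cdf(fa_rate)    # z-transformed false alarm rate

    # Sensitivity: distance between distribution means in
    # root-mean-square SD units.
    d_a = sqrt(2.0 / (1.0 + slope ** 2)) * (z_h - slope * z_f)

    # Criterion: positive values indicate stricter (conservative)
    # responding, negative values more lax (liberal) responding.
    c_a = (-sqrt(2.0) * slope
           / (sqrt(1.0 + slope ** 2) * (1.0 + slope))) * (z_h + z_f)

    # Area under the ROC implied by d_a.
    a_z = _STD_NORMAL.cdf(d_a / sqrt(2.0))
    return d_a, c_a, a_z


# Hypothetical example values, not data from the experiments.
d_a, c_a, a_z = uv_sdt_estimates(hit_rate=0.60, fa_rate=0.10, slope=0.80)
print(f"d_a = {d_a:.2f}, c_a = {c_a:.2f}, A_z = {a_z:.2f}")
```

With a slope of 1, these expressions reduce to the familiar equal-variance d′ and c, which is a convenient check on any implementation.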

Results

Manipulation efficacy

The aware (M = 5.21, SD = 1.79) and unaware (M = 4.54, SD = 1.62) participants did not differ in believing feedback labeled “Correct” or “Incorrect,” p = .18, suggesting that they did not suspect biased feedback. The unaware participants (M = 4.58, SD = 1.08) reported believing marginally more than the aware ones (M = 3.54, SD = 2.00) that another person had provided feedback, t(46) = 1.77, p = .08. Note that this question assessed initial belief in the social source manipulation versus belief after being told that no one had ever provided feedback. Thus, it is not problematic if the unaware and aware groups did not differ in their beliefs. However, we cannot disentangle whether they believed the manipulation throughout the experiment or whether disbelief began after becoming aware of the manipulation, leading to the marginal difference shown here. Notably, the level of belief that another person had provided feedback did not impact the reported results (see Footnote 2). Additional analyses confirmed that the participants encountered more feedback labeled “Incorrect” in the second than in the first or third memory test (see the supplemental materials).

Decision criteria

We assessed whether awareness impacted decision criteria in a 2 (Awareness: unaware, aware) × 2 (Group: LNS, SNL) × 3 (Test: 1–3) mixed ANOVA (Table 3). Unless specified, refer to Tables 2 and 3 for the means and standard deviations. Participants performed above chance at study (M = .83, SD = .11), with no impact of later awareness, p = .31.
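The degrees of freedom reported in the tests that follow can be traced back to this design. The sketch below is illustrative only: the helper name is hypothetical, the total of 48 participants is an assumption (24 aware participants plus a reliable source group of matching size from Experiment 1), and sphericity is assumed, so no correction is applied to the within-subject terms.

```python
def mixed_anova_df(n_participants, between_levels, within_levels):
    """Degrees of freedom for a mixed ANOVA with any number of
    between-subjects factors and a single within-subject factor."""
    n_cells = 1
    for levels in between_levels:
        n_cells *= levels                     # number of between-subjects cells

    error_between = n_participants - n_cells  # error df for between effects
    df_within = within_levels - 1             # df for the within factor

    return {
        "between_effects": [levels - 1 for levels in between_levels],
        "error_between": error_between,
        "within_effect": df_within,
        # Error df for the within factor and its interactions.
        "error_within": df_within * error_between,
    }


# 2 (Awareness) x 2 (Group) between subjects, 3 (Test) within subjects.
# N = 48 is an assumption, not a value stated in this section.
design_df = mixed_anova_df(n_participants=48, between_levels=[2, 2], within_levels=3)
print(design_df)
```

Under these assumptions, between-subjects and single-test contrasts are evaluated on (1, 44) degrees of freedom, and effects involving the three-level test factor on (2, 88).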

Table 3 Experiment 2: Mean (and standard deviation) unequal-variance decision criterion and accuracy estimates for the aware participants and the aware and unaware participants combined, across tests and groups

Replications

As in Han and Dobbins (2008) and our Experiment 1, the SNL group (M = .11, SD = .23) was marginally stricter than the LNS group (M = –.01, SD = .23), F(1, 44) = 3.39, p = .07, ηp² = .07. We also found an effect of test, F(2, 88) = 3.865, p = .03, ηp² = .08: People were stricter in Test 1 than in Test 2, F(1, 44) = 6.16, p = .01, ηp² = .12. Although Test 1 did not differ from Test 3, p = .42, the Test 3 criteria were stricter than those from Test 2, F(1, 44) = 6.16, p = .01, ηp² = .12.

As in Experiment 1, we observed a test by group interaction, F(2, 88) = 29.50, p < .001, ηp² = .40. In Test 1, SNL were stricter than LNS participants, F(1, 44) = 20.26, p < .001, ηp² = .32. In Test 2, SNL were again stricter than LNS participants, F(1, 44) = 13.72, p = .001, ηp² = .24. In Test 3, however, LNS were stricter than SNL participants, F(1, 44) = 10.85, p = .002, ηp² = .20.
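The partial eta squared values reported alongside each F test can be recovered from the F statistic and its degrees of freedom. The following minimal sketch uses the standard identity, partial eta squared = (F × df_effect) / (F × df_effect + df_error); the function name is hypothetical, and the inputs are the F values and degrees of freedom reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its df."""
    explained = f_value * df_effect
    return explained / (explained + df_error)


# (label, F, df_effect, df_error) taken from the tests reported above.
reported_tests = [
    ("Group main effect", 3.39, 1, 44),
    ("Test main effect", 3.865, 2, 88),
    ("Test x Group interaction", 29.50, 2, 88),
]

for label, f_value, df_effect, df_error in reported_tests:
    effect_size = partial_eta_squared(f_value, df_effect, df_error)
    print(f"{label}: partial eta squared = {effect_size:.2f}")
```

Rounded to two decimals, these three cases give .07, .08, and .40, in line with the reported effect sizes.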

Effect of awareness

Supporting Hypothesis 3, a marginal interaction between awareness and test emerged, F(2, 88) = 2.90, p = .06, ηp² = .06. Unaware participants adopted marginally stricter criteria than did aware ones in Test 3, F(1, 44) = 3.12, p = .08, ηp² = .07. Aware and unaware participants did not differ in the first two tests, ps > .44.

Assignment to the SNL or LNS group qualified the effects of awareness and test, F(2, 88) = 5.36, p = .01, ηp² = .11 (Fig. 5). Among aware participants, criteria for the LNS and SNL groups did not differ in Test 3, p = .21, indicating that the biased feedback manipulation was unsuccessful in that test. However, a visible but nonsignificant difference between the groups suggests that awareness did not entirely eliminate the manipulation’s effects. In contrast, the manipulation was successful in the first two tests [Test 1, F(1, 44) = 2.86, p = .09, ηp² = .06; Test 2, F(1, 44) = 5.16, p = .03, ηp² = .11]. This suggests that awareness immediately before the third test reduced the biased feedback manipulation’s strength. Among unaware participants, the biased feedback manipulation was successful across tests [Test 1, F(1, 44) = 21.84, p < .001, ηp² = .33; Test 2, F(1, 44) = 8.80, p = .005, ηp² = .17; Test 3, F(1, 44) = 11.46, p < .002, ηp² = .21].

Fig. 5 The biased feedback manipulation was successful across tests among unaware participants, but only in the first two tests for the aware group

Teasing this interaction apart in a different way, unaware-LNS participants adopted stricter criteria than did aware-LNS participants in Test 3, suggesting that awareness attenuated the adoption of stricter criteria among LNS participants, F(1, 44) = 5.31, p = .03, ηp² = .11. By contrast, unaware- and aware-LNS participants had similar criteria for Test 2 (p = .88). The unaware-LNS participants had marginally more lax criteria than did aware-LNS participants in Test 1, F(1, 44) = 4.18, p = .05, ηp² = .09. Unlike in Test 3, however, the biased feedback manipulation was successful in Test 1, regardless of group. No differences emerged between the aware-SNL and unaware-SNL groups’ criteria across tests, ps > .35. No other effects were significant, ps > .53.

Accuracy results and interpretation

Accuracy across tests was above chance for all participants (M = .77, SD = .10). We again assessed accuracy within the described ANOVA. An effect of test emerged, F(2, 88) = 11.59, p < .001, ηp² = .21, potentially reflecting fatigue. Accuracy was lower in Test 3 than in Test 1, F(1, 44) = 12.13, p = .001, ηp² = .22, or Test 2, F(1, 44) = 22.67, p < .001, ηp² = .34. Tests 1 and 2 did not differ, p = .85. A Group × Test interaction emerged, F(2, 88) = 4.36, p = .02, ηp² = .09. Although there were no differences in accuracy between the LNS and SNL groups in Tests 1 and 2, ps > .52, SNL participants were marginally more accurate in Test 3 than were the LNS group, F(1, 44) = 3.846, p = .06, ηp² = .08. This complements the results of Experiment 1: Those manipulated to become stricter in the third test would be expected to have decreased accuracy relative to those manipulated to become lenient if being stricter over time conflicts with a tendency to become more lax. No other effects were significant, ps > .39.

Discussion

Awareness that a person did not provide feedback reduced a shift toward stricter criteria elicited in the reliable source group in Experiment 1 (Hypothesis 3). Our finding of less conservative criteria among aware versus unaware participants supports the idea that belief in the reliable source induced adoption of stricter response styles. Unaware participants were slightly stricter than the aware in Test 3, with no criterion differences for the first two tests.

Placement in the LNS or SNL groups qualified the relationship between awareness and test. For unaware participants, the biased feedback manipulation was successful across the three memory tests: LNS and SNL participants always differed in their criterion placements. Among aware participants, the LNS and SNL groups differed in the first and second tests, but not the third. Aware participants were told that a person did not provide their feedback; they were never explicitly told that the feedback itself had been manipulated. Suspicion aroused by awareness of the source manipulation might have been enough to reduce the strength of the biased feedback manipulation altogether. Indeed, awareness of deception in one experiment has been shown to correspond with persistent suspicion over time (Epley & Huff, 1998). Work building on Solomon Asch’s (1956) classic conformity paradigm illustrates this idea, indicating that suspicion reduces the conformity effect (Stricker et al. 1967). Awareness of deception in one aspect of an experiment (i.e., the “social” source) could create suspicion across the paradigm.

Additional comparisons revealed that aware-LNS participants became less strict in the third test than did unaware-LNS participants. The aware- and unaware-SNL participants did not differ. This suggests that the sequence of biased feedback may matter as well. If one is manipulated to become more lax over time (SNL), learning that a computer is actually providing the feedback could be congruent with that manipulation. If increasing leniency with time is natural (Donaldson & Murdock, 1968; Verde & Rotello, 2007), being aware that a person did not provide feedback might not interfere with that tendency. In contrast, if one is manipulated to become stricter (LNS), knowing of a deception might work against that manipulation, resulting in less conservative criteria than among the deception-unaware individuals. If the presence of a reliable and high-achieving peer assists in becoming conservative over time, disrupting belief in that peer might remove an important factor contributing to the adoption of stricter criteria.

Although the unaware-LNS group was stricter than the aware-LNS group in the third test, the unaware-LNS group was marginally more lenient than the aware-LNS group in the first. An important distinction between the first and third tests, however, is that the biased feedback manipulation was successful for both the aware and unaware groups in the first test, but only for the unaware group in the third, which is suggestive of relative success for our manipulations in the first, but not in the third, test. Considerable individual variability exists in criterion shifting, potentially due to factors ranging from task strategy to personality (Aminoff et al. 2012; Han, 2009). Individual differences could have in part contributed to some of the differences seen in the first test between the aware-LNS and unaware-LNS participants. Although random assignment to the LNS and SNL groups should in part control for this, it would be worthwhile to assess how differences in personality and task strategy, as well as in generalized suspicion of deception, contribute to how social sources influence adaptive criterion learning.

A limitation of Experiment 2 is that the aware participants were compared to the reliable source type group from Experiment 1 rather than to a new sample. Although the samples could differ as a result of collecting data for the two conditions at different points in time, the participants were drawn from the same introductory psychology students during subsequent semesters of the same academic year. Given that the same research assistants collected data from each sample and had experience with the paradigm and cover story, we have no reason to believe that a new sample would differ systematically from the one collected for Experiment 1.

General discussion

The success of the biased feedback manipulation (Han & Dobbins, 2008) across experiments demonstrates its generalizability to social feedback sources. We also showed that feedback from different kinds of sources shifts criteria toward strictness or leniency over time, with those receiving feedback from a reliable and high-achieving social source becoming stricter, and those with feedback from a lower-achieving social or a nonsocial source becoming lax. Finally, we showed that dispelling belief in the high-achieving person providing feedback reduced the tendency to become stricter and the biased feedback manipulation’s efficacy. Our findings may have important implications for real-world situations. Optimal decision criteria can vary by social situation. Our findings suggest that the characteristics of who teaches others might inhibit or enhance the production of desired behaviors from those being taught. Our results also suggest that disrupting belief in the different characteristics of the person providing feedback (e.g., someone who seems believable but ultimately is not) may reduce the efficacy of training.

Interestingly, the extent to which explicit (Aminoff et al. 2012) and implicit (Han, 2009) manipulations induce criterion shift may partially depend on individual personality differences. Notably, these personality influences impacted criterion learning in nonsocial contexts. Personality effects could be exacerbated in a social context. For instance, given that increased negative affect is associated with less criterion shifting (Aminoff et al. 2012), potentially because of less cognitive flexibility, source characteristics could further modulate the extent of relative criterion shift. Individuals with increased negative affect could display even less shifting when the person providing feedback was perceived as low-achieving. Future work could also connect personality to the tendency to become lax or strict when receiving feedback from sources varying in reliability. For instance, the tendency to become more conservative with time given feedback from a reliable source could be related to neuroticism or anxiety around others given particular sensitivity to losses. These assessments could inform interpersonal feedback situations, such as responses to workplace performance reviews and determining the candidates best suited to learn the decision strategies necessary for a job.

Across both studies, participants appeared to exhibit greater criterion shifts when manipulated to become more strict versus lax (see Figs. 3a and 4). Different shifts in criteria across groups are consistent with Han and Dobbins’s (2008) study. However, reducing the number of trials in one of their subsequent experiments eliminated group differences in criteria (Han & Dobbins, 2008, Exp. 3). Although this was not a focus of the present study, increasing task difficulty might lead to increased criterion flexibility over time.

More generally, this work extends research on social influences on memory. Beyond social sources influencing or even misleading participant responses—as in, for example, a social contagion study (Meade & Roediger, 2002)—we showed that different social source characteristics may induce adaptive criterion strategies. Thus, sociality not only influences responses, but also influences the strategies underlying the generation of memory-related responses. Although future work will be necessary to determine which individual differences regulate the strength of criterion shifts and how receiving feedback from social versus nonsocial sources impacts the neural and temporal dynamics underlying these shifts, here we provide initial evidence of social influences on adaptive criterion learning. These experiments extend research on social influences on memory to adaptive criterion learning, an important aspect of memory that, although relatively underexplored in the social domain, is highly relevant to everyday life.