Social influences on adaptive criterion learning

Cassidy, Brittany S.; Dubé, Chad; Gutchess, Angela H.

doi:10.3758/s13421-014-0497-8

Published: 30 December 2014

Volume 43, pages 695–708, (2015)
Memory & Cognition

Brittany S. Cassidy^1,2, Chad Dubé^3 & Angela H. Gutchess^1

Abstract

People adaptively shift decision criteria when given biased feedback encouraging specific types of errors. Given that work on this topic has been conducted in nonsocial contexts, we extended the literature by examining adaptive criterion learning in both social and nonsocial contexts. Specifically, we compared potential differences in criterion shifting given performance feedback from social sources varying in reliability and from a nonsocial source. Participants became lax when given false positive feedback for false alarms, and became conservative when given false positive feedback for misses, replicating prior work. In terms of a social influence on adaptive criterion learning, people became more lax in response style over time if feedback was provided by a nonsocial source or by a social source meant to be perceived as unreliable and low-achieving. In contrast, people adopted a more conservative response style over time if performance feedback came from a high-achieving and reliable source. Awareness that a reliable and high-achieving person had not provided their feedback reduced the tendency to become more conservative, relative to those unaware of the source manipulation. Because teaching and learning often occur in a social context, these findings may have important implications for many scenarios in which people fine-tune their behaviors, given cues from others.

Recognition memory impacts day-to-day social interactions. Failing to recognize someone you’ve met before may lead to an awkward conversation, especially if the other person recognizes you. You might also forget that you’ve gone over details of a business deal with a client, appearing unorganized when presenting them as new. People influence how we remember words, pictures, and other people (Wright et al. 2008; Wright et al. 2005). Despite the role recognition memory plays in navigating our environments, research considering the processes underlying recognition decisions has largely been conducted in non-social contexts (e.g., Eichenbaum et al. 2007; Yonelinas, 2002; Yonelinas & Parks, 2007).

To study recognition memory, researchers typically assess how people employ decision rules to respond “old” or “new” to memory probes. Researchers further characterize decisions using unequal-variance signal detection theory (SDT; Fig. 1a). SDT describes the rule leading to “old” or “new” responses as a criterion dividing the memory evidence distributions that represent targets and lures. That is, items whose memory strength lies above the criterion will be identified as “old,” and items not exceeding it will be identified as “new.” The hit rate is computed as the area under the target distribution to the right of the criterion, and the false alarm rate is the area under the lure distribution to the right of the criterion. Accuracy, or “sensitivity,” is modeled as the distance between the means of the target and lure distributions, and is referred to as d_a (Macmillan & Creelman, 1991). When considering confidence, several criteria partition the evidence continuum, with each partition representing a different confidence rating on a scale (e.g., from sure old to sure new). Such a model is depicted in Fig. 1b, for a six-point scale.
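The relation between criterion placement and the two response rates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lure distribution is taken to be a standard normal, and the target mean, target standard deviation, and criterion are hypothetical values that do not come from the article.

```python
from statistics import NormalDist


def hit_and_false_alarm_rates(criterion, target_mean, target_sd):
    """Areas to the right of the criterion under each evidence distribution.

    Assumption for this sketch: lures follow N(0, 1) and targets follow
    N(target_mean, target_sd). The article does not report these parameters.
    """
    lures = NormalDist(0.0, 1.0)
    targets = NormalDist(target_mean, target_sd)
    hit_rate = 1.0 - targets.cdf(criterion)       # "old" responses to targets
    false_alarm_rate = 1.0 - lures.cdf(criterion)  # "old" responses to lures
    return hit_rate, false_alarm_rate


# Hypothetical parameters chosen only for illustration.
hit, fa = hit_and_false_alarm_rates(criterion=0.5, target_mean=1.0, target_sd=1.25)
print(f"hit rate = {hit:.3f}, false alarm rate = {fa:.3f}")
```

Moving `criterion` to the left raises both rates (a more lax observer), and moving it to the right lowers both (a more conservative observer), while the distributions themselves stay fixed.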

Individuals differ in their decision criteria. One person may call a word “old,” but be unsure, whereas another person may be very confident in the “old” response, even if the memory strength for the item is the same across the two individuals. This difference in decision criteria is illustrated in Fig. 1b, c as a shift of the response criteria to favor either more liberal (leftward shift, panel B) or more conservative (rightward shift, panel C) response biases.
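A short Python sketch can make this concrete: the same memory strength maps to different confidence ratings once the set of criteria is shifted. The criterion values, the size of the shift, and the rating labels below are hypothetical and serve only to illustrate the idea.

```python
from bisect import bisect_left


def confidence_rating(strength, criteria):
    """Map a memory strength to a rating from 1 to len(criteria) + 1.

    `criteria` must be sorted in ascending order. Rating 1 stands for
    "sure new" and the highest rating for "sure old" (labels assumed here).
    A strength must strictly exceed a criterion to pass it.
    """
    return bisect_left(criteria, strength) + 1


# Hypothetical criterion sets for a six-point scale (five criteria each).
neutral = [-1.0, -0.5, 0.0, 0.5, 1.0]
liberal = [c - 0.4 for c in neutral]       # leftward shift
conservative = [c + 0.4 for c in neutral]  # rightward shift

strength = 0.7  # identical memory strength for every observer
ratings = {
    "liberal": confidence_rating(strength, liberal),
    "neutral": confidence_rating(strength, neutral),
    "conservative": confidence_rating(strength, conservative),
}
print(ratings)
```

With these illustrative numbers, the liberal observer gives the highest "old" rating and the conservative observer a lower one, even though the underlying evidence is the same.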

People can control criterion placement between the old and new evidence distributions, and may shift criteria to favor different outcomes (Azimian-Faridani & Wilding, 2006; Hirschman & Henzler, 1998; Rotello et al. 2005; Strack & Forster, 1995; Van Zandt, 2000). One branch of research has assessed the impact of explicit response instructions on criterion placement. For instance, people can be motivated to call items “old” if given a high probability of items being “old” before making a decision (Dube & Rotello 2012; Van Zandt, 2000). Moreover, instruction to change decision criteria—by, for example, only responding “old” when absolutely confident—also affects criterion placement (Azimian-Faridani & Wilding, 2006).

Emerging work suggests that personality and social factors influence criterion placement, dovetailing with the increased focus on social influence in the broader memory literature (for reviews, see Echterhoff & Hirst, 2009; Hirst & Echterhoff, 2012). For instance, increased negative affect corresponds with less shifting, perhaps because those with enhanced negative affect could be less cognitively flexible (Aminoff et al. 2012). Stereotype threat also impacts criterion placement. For example, women under threat in academic settings make more risk-aversive financial decisions (Carr & Steele, 2010), and older adults under threat become more conservative, to avoid errors of commission on memory tests (Barber & Mather, 2013). Interestingly, explicit instructions to make lax or strict decisions influence the extent of reporting peers’ inaccurate memories in one’s own recollections, such that receiving instructions to be strict reduces the reporting of inaccurate suggestions while at the same time reducing the reporting of accurate details (Wright et al. 2008). Source reliability also impacts the extent to which people integrate others’ suggestions into their memory decisions (Skagerberg & Wright, 2009).

An open question in this growing line of research regards how people adaptively fine-tune memory decisions to be more conservative or liberal on the basis of performance feedback from others, without explicit instructions to change decision-making strategies. This is an important consideration because many aspects of learning involve adaptively changing decision-making strategies to optimize one’s likelihood of success. Nonsocial memory work has shown that people adaptively and implicitly shift criteria when given biased feedback, in which selectively false positive feedback on false alarm or miss trials encourages the adoption of a more lax or conservative decision criterion (Han & Dobbins, 2008), while hits and correct rejections are given fully correct feedback. Because false positive feedback is reserved for errors, people presumably have little basis to suspect a manipulation. As a result, people show adaptive criterion learning, becoming more lax when false alarms beget false positive feedback, and stricter when misses beget false positive feedback. These effects persist even in later memory tests without feedback (Han & Dobbins, 2009).

To date, no work has directly examined how social context (that is, who provides you with feedback) impacts the extent of adaptive criterion learning. Related work indicates, however, that external cues (i.e., cues about the likelihood of an item being old or new) influence memory (Jaeger, Cox, & Dobbins, 2012). This is an interesting topic, as external cues can certainly be social in nature (e.g., saying you have seen someone before on the basis of a friend’s suggestion rather than your own recollection). People do indeed over-rely on external cues, showing decreased accuracy when computer-based recommendations (e.g., “Likely old”) were ultimately invalid versus valid (Jaeger, Cox, & Dobbins, 2012). Critically, social cues influence memory as well, with people taking more recommendations into consideration for memory decisions when they come from a peer who is a reliable source of information, as compared to an unreliable source (Jaeger, Lauris et al. 2012). Social cues even impact cognition when others are not physically present (Shteynberg & Galinsky, 2011). This suggests that external cues may be weighted according to who provides the information, even if a computer transmits all of it. People may be more likely to prioritize recommendations or feedback from reliable people or computers, which rely on logic-based rules to provide information, relative to people who seem unreliable.

The present work leverages Han and Dobbins’s (2008) biased feedback procedure to examine social influences on adaptive criterion learning. Since feedback in everyday life does not always come from a computer, as it did in Han and Dobbins (2008), examining potential differences in criteria based on source sociality may be important in improving feedback learning and developing new ways to give employees and students feedback about performance. This may be critical in some arenas, such as encouraging airport security employees to adopt a lax versus strict criterion in identifying and subsequently investigating suspicious items. Adopting a lax criterion would encourage more false alarms, but those are desirable when considering public safety. Likewise, a stricter criterion would be beneficial when errors of commission should be reduced, as when vetting people for an important position. Given the broad social influences on memory, we anticipated that social sources would influence adaptive criterion learning. We expected to replicate Han and Dobbins’s (2008) findings of criterion shifting for both social and computer feedback sources (Hypothesis 1). This would suggest that peers induce criterion shifts similarly to computers acting on programmed rules.

Meta-analytic results show highly credible sources induce more persuasion than low-credibility ones (Pornpitakpan, 2004). When people believe another person is more powerful than themselves, the other person’s opinion carries more weight in influencing memory conformity (Skagerberg & Wright, 2007). In addition, a person’s relative credibility influences perceivers’ susceptibility to misinformation (French et al. 2011). Expertise conveys reliability, with persuasiveness increasing given the perceived expertise of a communicator (Birnbaum & Stegner, 1979; Cialdini & Goldstein, 2004).

For these reasons, we anticipated that differences in source reliability could impact adaptive criterion learning in two ways. First, we anticipated that participants would be less likely to shift criteria when they believed feedback came from an unreliable and low-achieving person, relative to a reliable and high-achieving person or a computer (Hypothesis 2a). Second, we anticipated that social source reliability could impact the overall tendency to become more lax with time (as evidenced in Han & Dobbins, 2008; Murdock, 1974; Verde & Rotello, 2007). Notably, we provided false feedback to participants in the first and third of three tests, but not the second. The nature of when participants received biased feedback indicates that participants would see more feedback labeled “Incorrect” in the second test than in the first, and thus would perceive worsening performance. We believed that increased exposure to feedback labeled “Incorrect” could have repercussions when sources are reliable or unreliable.

Participants might be more sensitive to losses in the presence of a person perceived as high-achieving and reliable, rather than an unreliable person. Thus, participants assigned to a reliable social source might become more risk-aversive and adopt a stricter criterion over time (Hypothesis 2b). This could reflect protective self-presentation, in which individuals avoid losing approval from others (Arkin, 1981), or maintenance of a positive self-concept (Cialdini & Goldstein, 2004; Wood, 2000). Becoming stricter could also be indicative of avoiding negative impressions in public settings (Wooten & Reed, 2004). Indeed, behaviors performed in public can lead to changes in self-concept and self-presentation relative to identical behaviors performed in the absence of an interpersonal context (Tice, 1992). In contrast, participants might not have such a desire in the presence of an unreliable and low-achieving person, and might adopt a more lax criterion over time, similar to those receiving computer-based feedback (Hypothesis 2b).

Experiment 1

Method

Participants

Seventy-two adults (mean age = 18.40 years, SD = 0.87, range = 17–23; 47 female, 25 male) from Brandeis University participated for credit. One 17-year-old obtained parental permission to participate. Participants were randomly assigned to a lax–neutral–strict (LNS) or strict–neutral–lax (SNL) biased feedback group. In all, 36 participants each were assigned to the LNS and SNL groups. Participants were also randomly assigned to nonsocial (N = 24), reliable/high-achieving (N = 24), or unreliable/low-achieving (N = 24) source groups. Analyses conducted in G*Power (Mayr et al. 2007) for sample size estimation suggested that 21 participants in each group would allow us to detect effects using alpha = .05, power = .80, and η_p² = .10. Within each source group, 12 participants were assigned to the LNS group and 12 to the SNL group. The source and feedback manipulations are discussed in turn.
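For readers who want to relate the reported η_p² to the effect size f that power software for ANOVA designs commonly takes as input, the standard conversion is f = sqrt(η_p² / (1 − η_p²)). The Python sketch below only performs this conversion. It does not reproduce the sample size computation, and whether the authors entered f or η_p² directly is not stated in the article.

```python
from math import sqrt


def cohens_f_from_partial_eta_squared(eta_p_sq):
    """Convert partial eta squared to Cohen's f (standard conversion)."""
    if not 0.0 <= eta_p_sq < 1.0:
        raise ValueError("partial eta squared must lie in [0, 1)")
    return sqrt(eta_p_sq / (1.0 - eta_p_sq))


# Effect size used for the sample size estimate in the text.
f = cohens_f_from_partial_eta_squared(0.10)
print(round(f, 3))
```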

Materials

Six hundred nouns were randomly drawn from a group of 4,983 words generated from the English Lexicon Project (http://elexicon.wustl.edu; Balota et al. 2007). Three lists of 200 nouns each (100 randomly assigned as targets and 100 as lures) were constructed for three study/test cycles. Two task versions counterbalanced target and lure assignments in the three cycles. The cycles were identical, but word order was randomized. We compared the words’ numbers of letters, syllables, and phonemes, as well as their Kučera–Francis (KF) corpus frequency (Kučera & Francis, 1967) in four 3 (Study/Test Cycle: 1, 2, 3) × 2 (Word Assignment: target, lure) analyses of variance (ANOVAs). No effects approached significance, ps > .22. The selected nouns contained, on average, 7.11 letters (SD = 1.68), 6.02 phonemes (SD = 1.64), and 2.40 syllables (SD = .81), with a KF frequency of 12.26 (SD = 7.00).

Procedure

Adaptive criterion learning task

We replicated the study–test cycle procedure of Han and Dobbins (2008). Stimuli were presented via E-Prime (Psychology Software Tools, Pittsburgh, PA). All participants practiced the study syllable counting task (Fig. 2a). They were instructed to indicate how many syllables were in each word and were told that they would be tested on their memory for the words immediately after the syllable-counting task. All participants completed ten practice trials. Words appeared on the screen one at a time for 2,000 ms. Below each word, the question, “How many syllables?” appeared, with choices of “1,” “2,” “3,” and “4+” listed below. Participants pressed the “1,” “2,” “3,” or “4” key to indicate how many syllables were in each word. Participants were then given accuracy feedback on the computer, but were told that they would not receive feedback on the syllable-counting task after the practice session. After practicing, participants began the first study–test cycle. In each of three study tasks, 100 words were presented, one at a time, for 2,000 ms each. A blank screen was presented for 500 ms between trials.

Immediately following the first study task, participants began the first self-paced memory test (Fig. 2b). In all tests, participants viewed 200 words one at a time on the screen. One hundred words were from the study task, and 100 were lures. Participants knew that some words would be old and some would be new, and knew that the words appearing in one cycle would not appear again. Participants pressed “1” to indicate “old” (i.e., they had seen it at study) and “2” to indicate “new” (i.e., they had not seen it).

After each decision, the question “How confident are you?” appeared on the screen with a three-point scale (1 = unsure, 3 = certain). After each rating, the word “Correct” or “Incorrect” appeared at the center of the screen for 1,000 ms as feedback on the memory decisions. A blank screen was presented for 250 ms between trials. Participants then completed two more cycles.

Source manipulation

To study social influences on adaptive criterion learning, we manipulated who or what participants believed provided them with feedback, labeled “Correct” or “Incorrect.” All participants, regardless of source assignment, filled out a questionnaire prior to the study–test cycles. This questionnaire asked each participant to identify his or her major; goal for the year; how he or she was currently feeling (via a nine-point scale), along with an explanation of that response; and the number of hours slept and how well he or she had slept (via a nine-point scale) the previous night, along with a description of sleep patterns.

The questionnaire instructions varied by source type. Participants assigned to the nonsocial source type group were told, “This questionnaire just provides us some additional information about you.” Participants assigned to the social source types were told, “This questionnaire provides us some additional information about you. Today we are teaming up with another study in the lab that has to do with information processing. Your task today involves a syllable-counting task and a memory task, which we will practice next. The other study being conducted concurrently deals with how quickly people react when processing information. While you perform the memory task, another person in a room down the hall will be monitoring your performance via a computer and entering in whether your responses were correct or incorrect. You will receive this feedback in the memory task. We also had the other participant fill out the same questionnaire as you will fill out now, and we will let you see each other’s responses without revealing your name or any identifying information, to make the experience of working together less weird.”

For participants in the nonsocial source group, an experimenter took the questionnaires away upon completion. For the reliable and unreliable source groups, an experimenter took the completed questionnaires and told participants that he or she would swap their questionnaires with the other participant’s, so they could review each other’s responses. The experimenter then left the room and closed the door, returning 1 min later with the purported other participant’s questionnaire. The experimenter told participants, “Look this over for as long as you’d like, and let me know when you’re ready to practice the task.” The questionnaires from the reliable or unreliable source were handwritten. Questionnaire responses conveyed that the source was reliable/high-achieving or unreliable/low-achieving (Table 1). Nine people provided ratings on the believability of the responses and the general reliability of the people purportedly filling out the questionnaires after completing a separate study in the lab. Taken from a nine-point scale, the responses to the questionnaires from the reliable/high-achieving (M = 6.33, SD = 1.80) and unreliable/low-achieving (M = 6.33, SD = 2.24) sources were determined to be equally believable, p > .99. However, people perceived the source labeled as reliable/high-achieving (M = 6.67, SD = 1.80) as being more reliable than the unreliable/low-achieving source (M = 4.00, SD = 2.24), t(8) = 3.27, p = .01.

Table 1 Responses to participant questionnaires for groups with the reliable/high-achieving and unreliable/low-achieving source types

Before the first memory test, participants were reminded of who or what would be labeling their feedback as “Correct” or “Incorrect” in the memory tests. Participants in the nonsocial source group were told, “After you give your confidence rating, you will receive computer feedback letting you know if your response was correct or incorrect. You will receive feedback like this for each of the three memory tasks.” Participants in social source groups were told, “As soon as you decide whether a word is old or new, the participant in the information processing study has been instructed to react as quickly as possible whether you were correct or not. After making your confidence decision, you will be able to see this feedback telling you whether you were correct or incorrect. You will receive feedback like this for each of the three memory tasks.” If participants asked how the other person knew whether or not they were correct, they were told, “The other participant has a guide at the top of their screen where they see your response and accuracy. They have to react as quickly as possible. Their reaction is your feedback.” These instructions were not repeated for the second and third cycles.

Biased feedback manipulation

Participants received biased feedback in the first and third tests. In the first or third test, false positive feedback for all false alarm responses (i.e., receiving “Correct” as feedback when they classified a “new” word as “old”) encouraged a lax criterion (L; lax). Other responses (i.e., hits, misses, and correct rejections) were correctly identified. The feedback in the second test was fully correct (N; neutral). In the first or third test, false positive feedback to all miss responses (i.e., receiving “Correct” as feedback when identifying an old word as “new”) encouraged a conservative criterion (S; strict). The order of the manipulation was reversed for the LNS and SNL groups. This manipulation was conducted across participants; the participants differed in whom they were told provided the feedback.
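The feedback rule described above can be written as a small Python function. This is a sketch of the rule as stated in the text, not the authors' E-Prime implementation; the function name, argument names, and string labels are assumptions made for illustration.

```python
def feedback(word_is_old, response, test_bias):
    """Return the feedback label shown after one recognition trial.

    word_is_old: True if the word was studied (target), False for a lure.
    response:    "old" or "new".
    test_bias:   "lax", "neutral", or "strict" (labels assumed for this sketch).
    """
    if response not in ("old", "new"):
        raise ValueError("response must be 'old' or 'new'")
    if test_bias not in ("lax", "neutral", "strict"):
        raise ValueError("test_bias must be 'lax', 'neutral', or 'strict'")

    false_alarm = response == "old" and not word_is_old
    miss = response == "new" and word_is_old

    if test_bias == "lax" and false_alarm:
        return "Correct"  # false positive feedback for false alarms
    if test_bias == "strict" and miss:
        return "Correct"  # false positive feedback for misses
    # Hits and correct rejections, and all errors not covered by the
    # bias in this test, receive accurate feedback.
    return "Incorrect" if (false_alarm or miss) else "Correct"


# Order of the three tests for each feedback group.
TEST_ORDER = {
    "LNS": ("lax", "neutral", "strict"),
    "SNL": ("strict", "neutral", "lax"),
}

print(feedback(word_is_old=False, response="old", test_bias="lax"))
print(feedback(word_is_old=False, response="old", test_bias="neutral"))
```

Because only one error type is relabeled in a biased test, the other error type still yields “Incorrect,” which is what makes one kind of response appear more successful than the other.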

After the study–test cycles, participants were debriefed and completed a posttask questionnaire. This questionnaire addressed the extent to which participants had detected that the feedback was systematically biased (“Did you believe the feedback given to you?”: 1 = not at all, 4 = moderately, 7 = very much so), and how much they believed that another person had provided the feedback (“If you were told someone would be in another room giving you feedback during the memory task, how much did you believe that?”: 1 = not at all, 4 = moderately, 7 = very much so).

Results

Manipulation efficacy

An ANOVA on the posttask questionnaire responses showed no main effect of source type, p = .23 (M_nonsoc = 4.79, SD = 1.87; M_reli = 4.54, SD = 1.62; M_unreli = 5.38, SD = 1.61), indicating that participants did not suspect the systematically biased feedback. A t test confirmed that the participants assigned to social source types (M_reli = 4.58, SD = 2.08; M_unreli = 4.54, SD = 1.62) did not differ in believing that a person had provided feedback, p = .94. Although these scores suggest the efficacy of the manipulations (i.e., on average, above moderate ratings), answering questions after debriefing may have attenuated the degree of belief expressed by participants.

Because Hypothesis 2b depended on the idea that participants perceived worsening performance over time, additional analyses confirmed that participants encountered more feedback labeled “Incorrect” in the second than in the first or the third memory test (see the supplemental materials).

Signal detection measures

To most closely replicate Han and Dobbins’s (2008) analyses, we computed c_a and A_z. These measures were calculated as follows (Macmillan & Creelman, 1991):

$$ A_z = \Phi\left(\frac{d_a}{\sqrt{2}}\right), $$

$$ c_a = \frac{-\sqrt{2}\,s}{\left(1+s^2\right)^{1/2}\left(1+s\right)}\left[z(H)+z(F)\right], $$

where H and F are the hit and false alarm rates, z(·) is the inverse of the standard normal cumulative distribution function Φ, s is the slope of the zROC, and d_a is defined as

$$ d_a = \sqrt{\frac{2}{1+s^2}}\,\left[z(H)-s\,z(F)\right]. $$

The measures c_a and A_z were calculated after obtaining the zROC slope estimates from a linear regression. These slope estimates are comparable to 1/σ_Target, as obtained from model-fitting techniques (Ratcliff et al. 1994; Ratcliff et al. 1992).
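As a rough illustration of how these quantities fit together, the following Python sketch computes the zROC slope, d_a, A_z, and c_a. It is not the authors' analysis code. It uses the standard Macmillan and Creelman forms, in which √2 multiplies s in c_a and the square root in d_a covers only 2/(1 + s²). The regression direction (z(H) on z(F)), the function names, and the example rates are assumptions made for this sketch.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def z(p):
    """Inverse standard normal CDF; p must lie strictly between 0 and 1."""
    return STD_NORMAL.inv_cdf(p)


def zroc_slope(hit_rates, fa_rates):
    """Ordinary least-squares slope of z(H) regressed on z(F).

    Assumption: one cumulative (H, F) pair per confidence criterion.
    """
    xs = [z(f) for f in fa_rates]
    ys = [z(h) for h in hit_rates]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def d_a(hit, fa, s):
    """Unequal-variance sensitivity."""
    return sqrt(2.0 / (1.0 + s * s)) * (z(hit) - s * z(fa))


def a_z(hit, fa, s):
    """Area under the ROC implied by d_a."""
    return STD_NORMAL.cdf(d_a(hit, fa, s) / sqrt(2.0))


def c_a(hit, fa, s):
    """Unequal-variance response bias; positive values are conservative."""
    return -sqrt(2.0) * s / (sqrt(1.0 + s * s) * (1.0 + s)) * (z(hit) + z(fa))


# Hypothetical cumulative rates across five criteria, for illustration only.
hits = [0.55, 0.70, 0.80, 0.88, 0.94]
fas = [0.08, 0.18, 0.30, 0.45, 0.65]
s = zroc_slope(hits, fas)
print(f"slope = {s:.3f}")
print(f"d_a = {d_a(0.80, 0.30, s):.3f}")
print(f"A_z = {a_z(0.80, 0.30, s):.3f}")
print(f"c_a = {c_a(0.80, 0.30, s):.3f}")
```

A useful sanity check is the equal-variance case: with s = 1, d_a reduces to z(H) − z(F) and c_a reduces to −[z(H) + z(F)]/2, the familiar d′ and c.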

Unsurprisingly, given the large literature on this topic, our recognition data were consistent with the assumptions of the unequal-variance SDT model. As such, the measures that we used, which were derived from that model, provide statistically independent measures of memory accuracy (A_z) and response bias (c_a; Macmillan & Creelman, 1991). This means that although accuracy and bias were both changing across conditions in the present data set, our measures are unlikely to have been unduly influenced by this fact or to result in any statistical artifact as a result of the concurrent variation (Dube & Rotello, 2012; Green & Swets, 1966; Pazzaglia et al. 2013). The means and standard deviations are reported in Table 2 unless specified otherwise.

Table 2 Experiment 1: Mean (and standard deviation) unequal-variance decision criterion and accuracy estimates across source types, groups, and tests

Decision criteria

We assessed the effects of the biased feedback and source manipulations by focusing on decision criterion (represented as the middle criterion in Figs. 1a, c) in a 3 (Source Type: nonsocial, reliable, unreliable) × 2 (Group: LNS, SNL) × 3 (Test: 1–3) mixed ANOVA (Table 2).^{Footnote 1} Participants performed above chance at study (M = .86, SD = .05), with no effect of source type, p = .45.

Replication of prior work

As in Han and Dobbins’s (2008) findings, SNL participants were stricter (M = .19, SD = .23) than LNS participants (M = .02, SD = .23), F(1, 66) = 9.35, p = .003, η_p² = .12. We found a marginal effect of test, F(2, 132) = 2.85, p = .06, η_p² = .04. Criteria were stricter in Test 1 than in Test 2, F(1, 66) = 11.01, p = .001, η_p² = .14. The criteria between Tests 1 and 3 and between Tests 2 and 3 were similar, ps > .13.

Supporting Hypothesis 1, an interaction existed between group and test, F(2, 132) = 54.10, p < .001, η_p² = .45 (Fig. 3a). SNL participants were stricter in Test 1 than LNS participants, F(1, 66) = 44.48, p < .001, η_p² = .40. This persisted into Test 2, F(1, 66) = 24.82, p < .001, η_p² = .27. Consistent with Han and Dobbins (2008, Exp. 3), the criteria also differed in Test 3, F(1, 66) = 16.34, p < .001, η_p² = .20. Here, LNS participants were stricter than SNL participants. Notably, the linear trend for the interaction between group and test was significant, F(1, 66) = 60.72, p < .001, η_p² = .48.
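The partial eta squared values reported alongside each F statistic can be recovered from F and its degrees of freedom, using η_p² = (F × df_effect) / (F × df_effect + df_error). The Python sketch below applies this standard identity to three of the tests reported above; the function name is our own.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# Values taken from the tests reported in the text.
print(round(partial_eta_squared(9.35, 1, 66), 2))    # group effect -> 0.12
print(round(partial_eta_squared(54.10, 2, 132), 2))  # group x test -> 0.45
print(round(partial_eta_squared(60.72, 1, 66), 2))   # linear trend -> 0.48
```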

Social influences on criterion shift

Crucially, an interaction between source type and test emerged, F(4, 132) = 3.75, p = .006, η_p² = .10 (Fig. 3b), revealing social influence on criterion learning that supported Hypothesis 2b: Those with an unreliable source adopted stricter criteria in Test 1 than in Test 2, F(1, 22) = 16.71, p < .001, η_p² = .43, and Test 3, F(1, 22) = 8.99, p = .007, η_p² = .29. Their criteria between Tests 2 and 3 did not differ, p = .68. This demonstrates a tendency to become more lax over time, consistent with prior work (Donaldson & Murdock, 1968; Verde & Rotello, 2007). The nonsocial source group showed a nonsignificant visual trend toward leniency, ps > .13. The reliable source group, however, became more conservative: Their criteria did not differ between Tests 1 and 2, p = .41, but became stricter from Test 2 to Test 3, F(1, 22) = 6.99, p = .02, η_p² = .24. Tests 1 and 3 did not differ, although a trend toward becoming stricter was evident, p = .13. No other effects were significant, including an interaction between source type, group, and test that would have supported Hypothesis 2a, ps > .16.

Accuracy results and interpretation

Accuracy across the memory tasks was above chance for all participants (M = .79, SD = .09). Although not of primary interest, we assessed accuracy using the ANOVA described above, to be consistent with prior work. As in the previous findings (Han & Dobbins, 2008), accuracy decreased over time, reflecting potential fatigue or proactive interference effects over the experiment, F(2, 132) = 13.99, p < .001, η_p² = .18: Accuracy was lower in Test 3 than in Test 1, F(1, 66) = 18.32, p < .001, η_p² = .22, or Test 2, F(1, 66) = 23.41, p < .001, η_p² = .26. Accuracy did not differ between Tests 1 and 2, p = .73. LNS participants (M = .82, SD = .09) were more accurate than SNL participants (M = .77, SD = .09), F(1, 66) = 5.12, p = .03, η_p² = .07.

A marginal interaction emerged between group and test, F(2, 132) = 2.66, p = .07, η_p² = .04. Accuracy differed between the LNS and SNL groups in Test 1, F(1, 66) = 4.69, p = .03, η_p² = .07, and Test 2, F(1, 66) = 11.82, p = .001, η_p² = .15, but not in Test 3, p = .48.

The marginal interaction between group and test was qualified by source type, F(4, 132) = 4.27, p = .003, η_p² = .12 (Fig. 4). We will first discuss the patterns within the LNS group. For nonsocial source participants, we found an effect of test, F(2, 22) = 12.52, p < .001, η_p² = .53: Accuracy decreased on Test 3 relative to Test 1, F(1, 11) = 23.25, p = .001, η_p² = .68, and Test 2, F(1, 11) = 11.63, p = .01, η_p² = .51. Accuracy between Tests 1 and 2 did not differ, p = .92. For reliable source participants, there was also an effect of test, F(2, 22) = 11.32, p < .001, η_p² = .51: Accuracy decreased on Test 3 relative to Test 1, F(1, 11) = 12.75, p = .004, η_p² = .54, and Test 2, F(1, 11) = 14.68, p = .003, η_p² = .57. Accuracy did not differ between Tests 1 and 2, p = .83. Unreliable source participants showed no effect of test, p = .70.

We now discuss the patterns for the SNL group. Nonsocial and reliable source participants showed no effect of test, ps > .52. Test did matter, however, for unreliable source participants, F(2, 22) = 5.54, p = .01, η_p² = .34: Accuracy was higher in Test 1 than in Test 3, F(1, 11) = 9.58, p = .01, η_p² = .47, and Test 2, F(1, 11) = 5.76, p = .04, η_p² = .34. Accuracy did not differ between Tests 2 and 3, p = .25. No other effects were significant, ps > .13.

Speculatively, being initially manipulated to become strict or lax could be one reason why accuracy patterns persisted or decreased with time. Anchoring in a lax response style (LNS) means that being manipulated to become more conservative could conflict with the tendency to become more lax over time (Donaldson & Murdock, 1968; Verde & Rotello, 2007). Such a conflict could result in decreased accuracy in the third test. This effect might only be expected among those with nonsocial and reliable sources, since they might be more invested in the task than were those with a lower-achieving source. Anchoring in a stricter response style (SNL) with a manipulation to become more lax would not result in such a conflict, perhaps lowering the chance of accuracy decreases. Decreased accuracy, however, could be noted among those with a lower-achieving source, since fatigue effects coupled with general comfort in the task could result in less accurate performance if participants felt less pressure to perform well. Although this is not our present focus, further work could clarify the mechanisms underlying maintained or decreased accuracy given different social sources.

Discussion

Experiment 1 replicated work (Han & Dobbins, 2008, 2009) showing adaptive criterion learning through biased feedback. Extending the literature, we showed learning regardless of source sociality, and of whether social sources seemed high-achieving and reliable or low-achieving and less reliable (Hypothesis 1). This suggests that people are susceptible to shifting their criterion placement in response to social sources, even when the sources are arguably imperfect. Moreover, we suggest that a pattern of adopting stricter criteria with time within the reliable source group (unlike in the nonsocial and unreliable groups) might reflect a self-protective mechanism when being evaluated by a reliable and high-achieving peer (Hypothesis 2b).

Much research on decision criteria (e.g., Rotello et al. 2005) has employed nonsocial sources. People may have fewer reasons to believe that feedback might be biased if it is computer-mediated, since computers provide feedback using logic-based rules. With person-delivered feedback, in contrast, the human factor introduces potential errors. People might therefore be less influenced by person- than by computer-driven feedback, especially when it comes from a relatively unreliable, low-achieving person. Our results do not support this notion. Rather, they show that social sources produce adaptive criterion learning that endures over time, similar to the effect of nonsocial sources. This adds to the literature on social influences on memory, whether by social contagion (Meade & Roediger, 2002; Roediger et al. 2001) or conformity (Horry et al. 2012), by showing that social sources influence decision criterion placement.

It would be worthwhile for future work to assess how changing perceptions of feedback influences criterion shifts induced by social sources. For instance, it would be interesting to test whether letting participants know that their “source” had performed poorly in his or her task, and that as a result, much feedback had been incorrect, would reduce the effect of the biased feedback manipulation. Alternatively, the reward of receiving feedback labeled “Correct” could be enough to shift criteria even when participants know that the feedback is suspect.

We also demonstrated that differences in source characteristics impact decision criterion placement, specifically when a social source appears to be reliable and high-achieving versus unreliable and low-achieving. We predicted two potential patterns of results for how source credibility could impact criterion shift. First, because people are more influenced by sources with increased credibility (French et al. 2011), the criterion might be more flexibly shifted given reliable-seeming social or computer feedback sources versus an unreliable-seeming person (Hypothesis 2a). Second, and following prior work (Han & Dobbins, 2008), individuals in the nonsocial and unreliable source type groups might display more lax criteria over time, whereas those in the reliable source type group might become stricter (Hypothesis 2b). Our data supported this second possibility.

People often prefer computer-based to person-based feedback, because negative person-mediated feedback may negatively impact performance given one’s sense of public self-consciousness (Kluger, 1993). This might be particularly salient when receiving feedback from a person who seems high-achieving versus low-achieving. Given the increased frequency of feedback labeled “Incorrect” in the second versus the first memory test, the reliable source type group might have adopted stricter criteria in order to avoid making a negative public impression (Wooten & Reed, 2004). Adopting more conservative criteria in the third test might be akin to impression management, whereby people take control over how others perceive them (for a review, see Leary & Kowalski, 1990). Projecting undesired behaviors (e.g., worsening performance) under social–evaluative threat can lead to embarrassment and related physiological changes (Dickerson, Gruenwald, & Kemeny 2004; Dickerson & Kemeny, 2004), potentially leading to behaviors to repair one’s image (Miller & Leary, 1992). Responding conservatively in the third test might be a way to repair one’s image, regardless of the strategy’s efficacy.

Experiment 2

Experiment 1 showed that perceived worsening performance in the presence of a high-achieving peer potentially elicits stricter decision criteria over time, contrasting with lenient criteria given feedback from lower-achieving and nonsocial sources. We wanted to assess whether the perception of a source as being reliable leads to stricter criteria over sequential tests. If this is true, dispelling the notion that a reliable peer provides feedback should reduce the adoption of stricter criteria. We compared the criteria of participants who, immediately prior to Test 3, were made aware that a person was not providing feedback (the “aware” group) with the criteria adopted by the reliable source participants from Experiment 1 (i.e., the “unaware” group). We expected to find less strict criteria in the aware than in the unaware group in Test 3, but not in the first two tests (Hypothesis 3).