1 Introduction

In recent years, retraction has been the focus of many works in the philosophy of language. Indeed, since MacFarlane’s work on assessment-relativism (see MacFarlane, 2014), retraction has been one of the touchstones of the debates discussing the meaning of perspective-dependent expressions, such as epistemic modals or predicates of personal taste, and the crux of the contextualism/relativism debate (C/R debate). Those who defend contextualism, on the one hand, (see, for example, Marques, 2014; Marques & García-Carpintero, 2014) tend to minimize the importance of retraction, or deny its mandatory nature in virtue of truth assessment-sensitivity. Those, on the other hand, who advocate relativism (see, for example, MacFarlane, 2011, 2014; Bordonaba & Villanueva, 2018; Bordonaba, 2019) tend to argue that retraction is crucial for explaining the meaning of perspective-dependent expressions.

Several recent studies (see Knobe & Yalcin, 2014; Khoo, 2015; Marques, 2018; Kneer, 2021a) address speakers’ intuitions on retraction experimentally. They test speakers’ intuitions regarding the truth-value of sentences involving perspective-dependent expressions, and the mandatory nature of retraction. Concerning the first point, different studies conclude that speakers’ intuitions align with the contextualist thesis –in evaluating the truth-value of sentences involving perspective-dependent expressions, they tend to choose the standard of the context of utterance, not that of the context of assessment. Concerning the second point, these studies conclude that competent speakers (of English) do not consider retraction to be mandatory. For example, I have no obligation to retract the claim I made when I was fifteen years old proclaiming Oasis to be the best British band after The Beatles. It was true then, according to my taste at the time, and it’s no longer true now that time has instilled some sense into my musical likings. I would not make the same claim now, I probably –hopefully– never will, but that doesn’t mean that I’m under any obligation to retract what I said back then. The error, and the shame, belong to my previous self, and there’s nothing that my present self can, or needs to, do about that.

This outcome clashes with MacFarlane’s (2014) thesis, according to which speakers are compelled to retract a previous assertion if, after some time, they evaluate it differently. If retraction is mandatory, then relativism seems to be in a better position than its semantic competitors, given that its semantic framework allows you to take back a claim made at a different time, under different standards. I can, and should, retract the claim that Oasis was the best British band after The Beatles. Only a relativist semantics can accommodate the intuition that rationality requires that we retract those previous assertions that we now deem false (see MacFarlane, 2014, ch. 12).

Our purpose is not to argue for or against either of these two positions, but to emphasize that retraction is a significantly public phenomenon, something that both positions have ignored. Typically, two features are common to contextualists and relativists. For the most part, they take private scenarios into consideration, and they do not discriminate between descriptive and evaluative statements. The vignettes used in these experimental studies follow the toy examples used in MacFarlane (2014), i.e., private conversations between two people: “the vignettes employed have been modeled closely on relativists’ favorite examples.” (Kneer, 2021a, 6). This might pose a problem because, as Krabbe (2001) points out, speakers’ intuitions about a retraction may vary depending on the circumstances surrounding the retraction. In particular, we have no principled reason to suppose that public and private contexts behave alike with respect to our intuitions on retraction, or that we feel the same pressure to retract our descriptive and our evaluative statements. These are the points that we take issue with in this paper. If these two hypotheses are confirmed, our study will put new evidence on the table that both sides of the C/R debate should account for, and consider in future empirical studies.

Krabbe’s work on retraction does not belong to the philosophy of language. However, we can extract from it a crucial insight that illuminates the shortcomings of the cases used in experimental studies to test speakers’ intuitions. Krabbe’s goal is to examine the role that retractions play in critical discussion, that is, a conversation that starts “from an initial state of controversy, a dispute, and aims at a resolution of the dispute” (Krabbe, 2001, p. 143). Retractions were typically considered to be an obstacle to critical discussion. Krabbe argues, against Hamblin (1970), that retractions are crucial for reasonable and critical discussion. More specifically, he argues that retractions are not essentially disruptive, but that their disruptive nature varies depending on the situation: “in real life an occasional retraction of some statement might be condoned without serious effects upon the course of the dialogue, but frequent retractions and retractions of statements that, for one reason or other, are deemed vital to the argument can be utterly disruptive.” (Krabbe, 2001, p. 142). In other words, in a mundane discussion with a friend, a retraction may have no detrimental effect on the course of the dialogue and the argument. However, a retraction of an argument made at the United Nations may scuttle the entire discussion. Besides, Krabbe (2001) highlights that speakers’ intuitions about retraction may vary depending on the type of dialogue in which the retraction takes place, and the commitments undertaken by the speakers in that dialogue: “the diversity of types of dialogues may constitute one source for the divergence between our intuitions about the permissibility of retraction.” (Krabbe, 2001, p. 145). And later, “different types of commitment may constitute another source for the diversity of our intuitions about retraction.” (Krabbe, 2001, p. 146).

Following Krabbe’s intuition, in this paper we empirically test whether the type of conversation or dialogue affects competent speakers’ intuitions about retraction. On the one hand, we test whether the fact that speakers make descriptive or evaluative statements influences their intuitions regarding the obligatory nature of the retraction. On the other hand, we want to empirically test the hypothesis that retraction is, above all, a public phenomenon. In public contexts, retraction is frequently demanded from certain people, for specific statements. Retraction plays a central role in these conversations. Moreover, the effects that follow from retraction are significant, sometimes even constituting “reparations” for the damage produced by those from whom a retraction is demanded (see Kukla, 2023). However, the examples used in the literature are examples of private conversations, where retraction is not particularly important. These cases involve situations in which nothing crucial seemingly depends on whether retraction is conceded; they are situations in which nothing of major significance is at stake for either speaker. If you liked fish sticks when you were three years old, said so when you ate them, and do not like them now, that is okay; no retraction is likely to be demanded from you. However, if someone publicly says, for example, that plus-size or transgender models should not be included in a fashion show because the show as it stands is fantastic (Alegre, 2019), and later on realizes that they were wrong in saying so, then the case seems to involve very different practical consequences. Intuitions of competent speakers may vary from one scenario to another. They may not consider a retraction mandatory in a private conversation, but may do so in a public setting.

Additionally, an analysis of the terms that most frequently appear with “retract” and retractar / retractarse “retract” in the English Web 2020 (enTenTen20) corpus (see Jakubíček et al., 2013) and the Spanish Web 2018 (esTenTen2018) corpus (see Kilgarriff & Renau, 2013) in Sketch Engine shows that retraction is also a public phenomenon, not just a private one. For example, the verb retractar / retractarse “retract” has a total frequency in the Spanish corpus of 26,953. Its most statistically significant modifier is públicamente “publicly”, with a frequency of 654 and a Log Dice of 6.6.Footnote 1 Likewise, one of the modifiers that most frequently appears with the English verb “retract” is “publicly”, with a frequency of 319 and a Log Dice of 3.5. In short, theoretical and empirical reasons exist to consider the possibility that retraction is a public phenomenon.
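For readers unfamiliar with the association score reported above, the following is a minimal sketch of how a Log Dice value is usually computed. It assumes the standard definition, 14 + log2(2·f_xy / (f_x + f_y)), which the text itself does not state. The function name `log_dice` and the modifier frequency passed as `f_y` are hypothetical and serve only as an illustration; only the figures 654 and 26,953 come from the text.

```python
from math import log2


def log_dice(f_xy: float, f_x: float, f_y: float) -> float:
    """Log Dice association score (assumed standard definition).

    f_xy: co-occurrence frequency of the two words
    f_x:  total frequency of the first word (here, the verb)
    f_y:  total frequency of the second word (here, the modifier)
    """
    return 14 + log2(2 * f_xy / (f_x + f_y))


# f_xy and f_x are taken from the text; f_y is a HYPOTHETICAL placeholder,
# because the total frequency of the modifier is not reported.
score = log_dice(f_xy=654, f_x=26_953, f_y=200_000)
print(f"Illustrative Log Dice: {score:.2f}")
```

On this definition the score has a theoretical maximum of 14, reached when the two words only ever occur together, and each halving of the co-occurrence frequency lowers it by one point. It does not depend on corpus size, which is what makes scores from the Spanish and English corpora loosely comparable.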

We approached the studies contained in this paper with three general objectives. First, we seek to test the hypothesis that (H1) public and private contexts affect the possibility and the obligatory character of retraction. That is, we aim to test whether a speaker who made a claim in the past that turns out to be false, as assessed from a later time, might be asked to retract what she said, and also whether the speaker should retract what she said. Second, we evaluate whether (H2) the nature of the statement, i.e., whether it was a descriptive or evaluative claim, also affects the mandatory character of retraction. Finally, we aim to test whether (H3) participants’ abstract beliefs about the mandatory character of retraction are incongruent with patterns in their concrete, case-based evaluations or whether, on the contrary, they are aware of the principles guiding their assessments.

To test these hypotheses, we have conducted two studies. Study 1 employs a set of four hypothetical scenarios, each involving a statement made by a speaker at a context, which turns out to be false as assessed from a later context, either because the statement has been proven to be untrue or because the speaker’s standards have changed. We manipulate the context where the statement was made (public or private) and the nature of the statement (descriptive or evaluative). Study 2 aims at replicating the results of Study 1, in order to strengthen our findings.Footnote 2

The plan for the paper is as follows. Section 2 reviews the experimental data on retraction, assessing the findings of different experimental studies, as well as the independent variables considered in them. We will show that none of them tests the interaction of the variables analyzed in our work –public versus private, descriptive versus evaluative. Section 3 introduces some preliminary methodological remarks. Section 4 presents the first study we have conducted. Section 5 focuses on Study 2. Section 6 discusses the results of our studies and offers some conclusions.

2 Experimental works on retraction

Most of the experimental work on retraction has focused on epistemic modals, although there is also work that takes into consideration predicates of personal taste. In this section, we review three of these studies (Knobe & Yalcin, 2014; Khoo, 2015; Kneer, 2021a), whose data show that speakers’ intuitions align with the contextualist hypothesis –meaning is determined by the context of utterance, and thus retraction is not mandatory. We will then present some studies (Beddor & Egan, 2018; Dinges & Zakkou, 2020) that question the results of the former by pointing out that various effects might influence these intuitions. As stated at the outset, the aim of this paper is not to argue in favor of one of the positions in contention. Rather, our goal is to enrich the debate by adding new elements that both contextualists and relativists have to account for. So this section reviews the empirical literature with that purpose in mind.

Knobe and Yalcin (2014) present four empirical studies that explore speakers’ intuitions on epistemic modals. In the first two studies, Knobe and Yalcin tested whether participants judged a present-tense bare epistemic possibility claim (BEP), e.g., “Fat Tony might be dead”, as true only if the prejacent, i.e., “Fat Tony is dead”, is compatible with the information that participants have, i.e., the information available in the assessor’s context. According to the results they obtained, participants mostly evaluated –using a 7-point Likert scale ranging from 1 (“completely disagree”) to 7 (“completely agree”)– the nonmodal statement as false (M = around 6 and 6.7 respectively) and the modal statement as true (M = around 5 and 4.8 respectively). Against the relativist intuition, these experiments show that, despite the fact that participants knew that Fat Tony was alive, and therefore the prejacent was incompatible with their information, they judged the BEP “Fat Tony might be dead” as true. In this sense, they extract the information relevant for determining the truth-value of the BEP from the context of utterance, instead of the context of assessment.

To strengthen this outcome, Knobe and Yalcin conducted two further studies. In Study 3, they tested the appropriateness of retraction. The results showed that participants in the nonmodal condition agreed that it was right for the speaker to recognize she was wrong (M = 6.5 approximately). However, the rating for the modal condition was significantly lower (M = 4 approximately). Finally, in Study 4 they tested whether it is the case that if a subject A thinks that it would be appropriate for a speaker to retract a statement, then A considers that statement false. According to their results, retraction and falsity ratings were similar in the nonmodal scenario (both 6 out of 7 approximately), but they were significantly different in the modal scenario (retraction: 5.5 out of 7 approximately; falsity: 3 out of 7 approximately). Thus, it seems that even when retraction is seen as appropriate, the statement is not necessarily evaluated as false. Relativism once again seems to be in turmoil.

Khoo (2015) follows in the steps of Knobe and Yalcin (2014). The Fat Tony vignette is presented to the participants, but in this new experiment participants are told that they are part of the criminal investigation. The results of the experiments show two different things. First, they replicate the results of Knobe and Yalcin (2014). Speakers’ intuitions align with those of the contextualist, because speakers do not judge the expert’s sentence in Fat Tony’s case, the sentence that included the epistemic modal, to be false. Specifically, the participants mostly evaluated the nonmodal statement as false (M = 6.10), but they did not, for the most part, evaluate the modal statement as false (M = 2.42). Second, this study shows that speakers are willing to reject sentences that include epistemic modals, but not to judge such sentences as false. In other words, rejection intuitions and truth-value intuitions come apart when speakers are considering a BEP. In conclusion, the study shows that, when a speaker A utters a sentence that includes an epistemic modal, other speakers may reject speaker A’s statement, but the reason they reject it is not that they consider A to have said something false.

Moving to a different set of examples, Kneer (2021a) has recently conducted three studies, focused on predicates of personal taste instead of epistemic modals. These studies’ results showed that participants strongly disagree with the idea that if somebody changes their taste over time, then their taste claims from 20 years ago were false (M = 2.24 over 7 in a first scenario, M = 2.21 over 7 in a second scenario), and with the claim that they should retract what they said in the past (M = 2.77 over 7 in the first scenario, M = 2.34 over 7 in the second one; using again a 7-point Likert scale ranging from 1 (“completely disagree”) to 7 (“completely agree”) for both claims).Footnote 3 In a final study, Kneer modified the vignettes so that the speaker’s taste does not change. In one version of the first scenario, the protagonist has never liked fish sticks, and still does not (‘No/No’), which is the benchmark for relativism. In another version, the protagonist has always liked fish sticks, and still does (‘Yes/Yes’), which is the benchmark for contextualists. These versions contrast with the target version, previously tested (‘Yes/No’). Concerning truth-evaluations, the rating for the target scenario (‘Yes/No’) was similar to the results in experiments 1 and 2 (M = 1.72 over 7). The results of the contextualist benchmark case (‘Yes/Yes’) were similar to the target scenario’s results (M = 1.38 over 7). Finally, the results of the relativist benchmark case (‘No/No’) were much higher (M = 5.96 over 7). Concerning retraction, the results were the following: target scenario (M = 2.05 over 7), contextualist benchmark case (M = 1.3 over 7), and relativist benchmark case (M = 4.25 over 7).

Taking stock: contextualism and relativism differ in their predictions about how competent speakers assess the truth-values of, and the requirement to retract, sentences including epistemic modals and predicates of personal taste. Empirical data from three different works show that competent speakers’ intuitions about retraction align with contextualist predictions. Falsity and retractability do not seem to go hand in hand. We are under no obligation to retract our previous statements even when we discover that they were wrong. Statements including predicates of personal taste and epistemic modals are perspectival: their truth-conditions depend on the standards that we assume at a time. When those standards change, we simply change our statements. We shouldn’t retract something that we said, because when we said it, with respect to the standards that we held at the time, our statement was true. So goes the contextualist intuition that seems to be vindicated by these studies.

Different studies, however, question these conclusions, by pointing out that different factors may affect speakers’ intuitions about retraction. For example, Beddor and Egan (2018) argue that the empirical data of Knobe and Yalcin (2014) and Khoo (2015) do not support the contextualist intuition. Instead, they defend flexible relativism, the idea that, in evaluating the truth-value of a BEP, speakers can choose different contexts of assessment, either their own context, the context of the person who uttered the BEP, or some other context. Then, they point out some pragmatic constraints that allow them to predict when speakers choose one over the other. One of these pragmatic constraints is the QUD (for question under discussion) constraint.Footnote 4 In a nutshell, the QUD constraint states that, when speakers assess the truth-value of a BEP in a conversational context c, they will evaluate the utterance using the most relevant context of assessment, whichever it is, to answer the QUD in c (Beddor & Egan, 2018, p. 10). Then, they ran two experiments to test whether truth-value evaluations are QUD-sensitive. They assign participants to one of the two following QUD-conditions: the first one is focused on the BEP itself (it makes clear that the point of the experiment’s question is to determine the truth-value of the BEP), and the second condition is focused on the speaker (it makes clear that the point of the experiment’s question is to determine whether the person who uttered the BEP made an appropriate inquiry). 85% of the participants answer that the BEP is false under the first condition, and 67% of the participants answer that the BEP is false under the second one.Footnote 5 These results provide evidence of a QUD effect. When participants are assigned to the first condition, they evaluate the BEP as false because they do not choose the speaker’s context, but the context with the best information available. Our own studies will expand on this idea that context determines speakers’ intuitions about the nature of retraction.

Dinges and Zakkou (2020) also question whether the data support the contextualist thesis. They defend the existence of a direction effect on taste predicates –speakers will have contextualist or relativist intuitions about retraction concerning propositions involving taste predicates depending on the direction in which their tastes change. For example, if someone starts by liking fish sticks, but later dislikes them, they will favor contextualist intuitions, and will not be prone to retract their earlier assertion. However, if someone starts by disliking beer but later on likes it, they will favor relativist intuitions, and will be more inclined to retract their earlier assertion. Dinges and Zakkou assign participants to either of two conditions: the NLtoL-condition (for not liking to liking), in which participants are asked to imagine themselves in a scenario where they started not liking Yumble, a new brand of bubblegum, but, a few weeks later, they ended up liking it; and the LtoNL-condition (for liking to not liking), which is just the other way round. Then, participants have to rate, on a scale of 0 to 100, how likely they are to judge their initial statements of “Yumble is tasty” and “Yumble is not tasty” as true or false. On the one hand, the results show that participants assigned to the LtoNL-condition gave higher ratings to the true option (M = 47.01) than to the false option (M = 42.87), but this difference was not statistically significant. On the other hand, participants assigned to the NLtoL-condition gave higher ratings to the false option (M = 54.28) than to the true option (M = 38.10).Footnote 6

Overall, the empirical evidence seems to support the contextualist side in this debate (see Marques, 2018, but especially Kneer, 2022). Dinges and Zakkou’s studies introduce some nuances, but their results were not replicated under slightly modified scenarios (Kneer, 2022). Likewise, while Knobe and Yalcin’s second study reported some support for the appropriateness of retraction in the modal condition, these results were not replicated under modified scenarios where participants were asked whether the speaker “is required to take back what she said” instead of whether it is appropriate for the speaker to do so (Kneer, 2022).

One way to interpret the contention between contextualists and relativists is as part of the broader debate about norms of assertion (Kneer, 2018, 2021b, 2022). Understood this way, the contention centers on the validity of the norms of assertion proposed by MacFarlane’s relativism, i.e., the Reflexive Truth Rule –“An agent is permitted to assert that p at context c1 only if p is true as used at c1 and assessed from c2” (MacFarlane, 2014, p. 103), and the Retraction Rule –“An agent in context c2 is required to retract an (unretracted) assertion of p made at c1 if p is not true as used at c1 and assessed from c2” (MacFarlane, 2014, p. 108), and their possible alternatives (see Kneer, 2018, 2021b, 2022). Our aim, in this context, is not to provide evidence in favor of relativism, but only to point out that, in order to gather proper evidence regarding retraction as a rule of assertion, the difference between public and private contexts should be taken into account. In this sense, there might be a shortcoming in the empirical literature so far. Note that the vignettes employed in the previously reviewed studies take place in private contexts. Take the following sentences as evidence for this claim: “Watching this discussion on television, Fat Tony says to his henchmen” (Knobe & Yalcin, 2014, p. 11), “Sally and George are talking about whether Joe is in Boston. […] Sally says: “Joe might be in Boston.”” (Knobe & Yalcin, 2014, p. 14), “Sally asks him whether he still likes fish sticks and John says he doesn’t anymore.” (Kneer, 2021a, p. 6458), “Sally asks him whether he still thinks building sandcastles is fun, and John says he doesn’t.” (Kneer, 2021a, p. 6459), and “Yumble is a new brand of bubblegum […] One day you decide to try one. You tell your friend Paul: “Yumble isn’t tasty.”” (Dinges & Zakkou, 2020, p. 8).

Most of these cases, private conversations, are situations where nothing of importance depends on whether the speakers actually retract their previous assertions. In these situations nothing is at stake for either speaker. If Sally was wrong concerning Joe’s location during a private chat, it would be odd for George to insistently ask her to retract. If John liked fish sticks, or loved building sandcastles when he was three years old, while he does not like them now, that is fine as it is. If you didn’t like Yumble bubblegums and you like them now, that’s okay, it would be really strange for your friend Paul to ask you to retract what you said. In a private conversation with a relative, retraction may have no effect on the course of the dialogue, or the argument. In fact, asking a relative in such a context to retract could be seen as a bizarre request, or as a way to achieve something else.

As mentioned, the goal of this paper is not to argue in favor of one of the positions in contention. Our results cannot be used to support the validity of Reflexive Truth Rule, Retraction Rule, or relativism in general. Instead, our aim is to single out new effects that can influence people’s intuitions on retraction in order to enrich the discussion between contextualists and relativists. The public nature of retraction is something that both contextualists and relativists should account for, and a factor that should be included in the design of the vignettes of empirical studies. The hypothesis we want to test is whether speakers’ intuitions about retraction vary depending on the circumstances surrounding the linguistic exchange. In particular, we believe that retraction is, above all, a public phenomenon, and therefore retraction might not be mandatory in a private conversation but might be seen as obligatory in a public setting. Moreover, we wonder whether the nature of the statement –i.e., its evaluative or descriptive character– can also affect people’s intuitions regarding retraction. If so, such findings would significantly affect the results obtained by the empirical research on retraction so far: the evidence might be inconclusive or support the contextualist intuition because most of the tested vignettes are based on private conversations, and don’t discriminate between descriptive and evaluative statements.

As mentioned above, we have conducted two empirical studies to test three hypotheses: (H1) public and private contexts affect the possibility and the obligatory character of retraction, (H2) the nature of the statement, i.e., whether it was a descriptive or evaluative claim, also affects the possibility and the mandatory character of retraction, and (H3) participants’ abstract beliefs about the mandatory character of retraction are incongruent with patterns in their concrete, case-based evaluations or, on the contrary, they are aware of the principles guiding their assessments. However, before detailing the results of the studies with respect to these three hypotheses, we will make a few methodological remarks.

3 Preliminary methodological remarks

Since retraction is a highly context-dependent phenomenon, our judgments about the demand for retraction issued by X may be affected by (1) the particular way in which this demand is expressed, and (2) what those involved in the conversation know, and know that others know. Besides, as Dinges and Zakkou (2020, p. 7) point out, asking participants about the permissibility or the obligatory nature of retraction in a given scenario could be problematic, since these are normative semi-technical notions. Directly asking participants what they would do in a given situation is a better strategy, provided we can assume that “people presumably are good counterfactual reasoners at least as far as their own speech behavior is concerned.” (Dinges & Zakkou, 2020, p. 7). In our vignettes, participants are asked whether someone might ask others to retract, or whether participants think that they should retract, rather than being asked whether they agree with whatever somebody else says. This gives us the following testable prediction:

TPR (testable prediction on retraction): the participants will judge that subject X should retract, or that subject Y could ask subject X to retract, if the participant judges subject X’s or Y’s assertion to be impermissible.

Some brief remarks are also needed on the criteria that we have taken into consideration to design our cases. Some of the features that we will discuss separate our cases from the ones present in the current literature; other features indicate possible pitfalls that we took great care to avoid during the design phase of this project.

a. We put together the cases so that the difference between public and private nature is apparent. We tried to avoid any possible case in which a private situation could be interpreted as a public one, or the other way around.

b. We assumed that retraction might be connected with the level of harm associated with particular utterances. The more offensive an utterance is perceived to be, the more likely it is that a retraction is demanded from the speaker. Therefore, there is no mention in our cases of any harm arising from them. Especially for evaluative cases, it is important to balance the level of offensiveness that they may carry, and we designed our cases with that in mind.

c. In most cases involving evaluative utterances, a speaker utters an evaluative statement preceded by a reason that functions as the antecedent of a conditional. To make these cases more realistic, we did not simply include raw evaluations, but a conversational context in which, inevitably, some of the reasons for these evaluations are also mentioned. Thus, our cases are not of the form “x is good/bad/etc.”, but rather “x has certain property P, which makes them good/bad/etc”. With this setting, it was crucial to make clear that the speaker’s opinion had changed with respect to the whole conditional claim. Otherwise, the audience might ask for a retraction simply because the factual conditions mentioned in the antecedent have changed, which would make our evaluative cases collapse into descriptive ones, affecting our design. When speakers change their minds with respect to “x has certain property P, which makes them good/bad/etc”, it’s not just that they now think that x no longer instantiates property P. Rather, the point is that they no longer believe that having property P implies that x is good/bad/etc.

d. In line with the previous point, when evaluative predicates associated with thin concepts like “tasty” are tested, any change of mind is associated with a change in the standards that rule the use of that concept. One can only move from finding some food tasty to disliking it because one’s culinary standards have changed. This is not the case for every evaluative predicate. Evaluative predicates expressing thicker concepts can be involved in cases in which a speaker changes their mind simply because the object is no longer perceived as an instantiation of the descriptive properties associated with that particular thick concept. I can believe of x, whom I thought to be a generous person, that she no longer is generous even if my idea –my standards– of what constitutes a generous person hasn’t changed at all. If our evaluative cases were to be in line with others in the literature, and we were to make sure that they didn’t collapse into descriptive ones, we needed to make sure that the change experienced by the speakers concerned their standards, not just their perception of the extension of one property or another.

e. Finally, we made sure that in factual cases everyone knew that what the speaker believed was false. Correspondingly, evaluative cases are designed so that the opinion that the speaker had when she made the utterance does not correspond to the usual way of thinking about the issue. If the audience believes that the original utterance was true, it makes no sense to think that they will demand that the speaker retract it, even if she has now changed her mind. I wouldn’t ask you to retract a past statement of yours that I now think is clearly true, even if you now believe that it was false.

4 Experiment 1

4.1 Participants

For this first study, we recruited 72 Spanish undergraduate students, aged 18 or 19, and excluded 3 who didn’t complete the whole survey, leaving 69 participants. Unlike in Study 2, we did not include demographic questions in the survey.

To estimate post hoc power, we conducted a bootstrap power analysis based on 5000 simulations with replacement. The analysis revealed low power to detect the target effect of context (54%), as well as the effect of statement (19%) and the context-by-statement interaction (10%).
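As a rough illustration of how a bootstrap power estimate of this kind can be obtained, the sketch below resamples participants with replacement and counts how often a test of the context effect reaches significance. It is a simplified stand-in and not the analysis reported here: it assumes one hypothetical difference score per participant (public minus private rating), replaces the multilevel model with a normal-approximation test, and uses invented data and an invented function name, `bootstrap_power`.

```python
import random
import statistics
from math import sqrt


def bootstrap_power(diffs, n_sims=5000, z_crit=1.96, seed=0):
    """Share of bootstrap resamples in which the mean difference is significant.

    diffs: one hypothetical difference score per participant
           (public rating minus private rating).
    """
    rng = random.Random(seed)
    n = len(diffs)
    significant = 0
    for _ in range(n_sims):
        # Resample participants with replacement, keeping the sample size.
        sample = rng.choices(diffs, k=n)
        sd = statistics.stdev(sample)
        if sd == 0:
            continue  # degenerate resample: counted as non-significant
        z = statistics.mean(sample) / (sd / sqrt(n))
        if abs(z) > z_crit:  # two-sided test at alpha = .05 (normal approximation)
            significant += 1
    return significant / n_sims


# Hypothetical data for 69 participants; NOT the data of this study.
data_rng = random.Random(1)
example_diffs = [data_rng.gauss(0.4, 1.6) for _ in range(69)]
print(f"Estimated power: {bootstrap_power(example_diffs):.2f}")
```

The estimate is simply the proportion of resamples in which the effect is detected, so it depends on the observed effect size and on the test applied within each resample.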

4.2 Method

Materials. Study 1 employs a set of four hypothetical scenarios. Each scenario involves a statement made by a speaker at a context c1, which turns out to be false when assessed from a later context c2, either because the statement has been proven to be untrue or because the speaker’s standards have changed. For each of the four scenarios, we wrote a total of four variants—one for each factorial combination in the 2 (Context: public, private) by 2 (Statement: descriptive, evaluative) matrix.

As an example, in one of the scenarios, the speaker makes a statement about the NBA player Luka Dončić after he was picked in the 2018 NBA draft. In the public condition, the speaker makes the statement on a prime-time TV show. In the private condition, the speaker is talking to his friend at home:

[CONTEXT]: Every year the NBA Draft is held, a process in which the teams of this basketball league choose, in turn, young players to join their ranks. The player Luka Dončić was selected in the third position of the first round of the 2018 draft. When the pick was announced, Rashad Phillips, a former professional NBA basketball player, said on a prime-time TV show (PUBLIC) / When the pick was announced, Rashad Phillips, a former professional NBA basketball player who every year follows the draft at his friend Kevin’s house, told Kevin (PRIVATE).

Next, we introduce information about the nature of the statement made by the speaker:

[STATEMENT]: “Luka Dončić is a lousy player, he’s not worthy of being considered a top 5 pick. He’s a second-round pick.” (EVALUATIVE) / “Luka Dončić’s lateral movements are slower than most NBA players. It is impossible to be a top player in the league without quick lateral displacements and that, at this age, is no longer coachable.” (DESCRIPTIVE).

Finally, the vignette concludes with information concerning the falsity of the statements:

In the 2018-19 season, however, Luka Dončić was voted the NBA’s young player of the year. In the 2019-20 season he was third in a ranking of NBA players made based on his playing statistics during the season. In the 2020-21 season he was sixth in the same ranking, and in both 2020 and 2021 he was chosen to play in the “All-Star game” (End of the descriptive condition). Today, Phillips would not say that Dončić is a lousy player; Dončić’s numbers around the league outweigh the reasons he had for considering him a second-round pick. In the future, when faced with similar situations, he will judge more carefully.

Procedure. In a randomized block design, participants were assigned to one of four groups and viewed a battery of four trials in a random order. In each group, participants viewed every factorial combination in the 2 (Context: public, private) × 2 (Statement: descriptive, evaluative) matrix paired with a different scenario on each trial. Thus, no participant viewed the same scenario or factorial combination twice.
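The counterbalancing described above can be sketched as a Latin-square rotation. This is only one way of satisfying the stated constraints; the rotation rule, the scenario labels, and the function name `trials_for_group` are assumptions made for the illustration, not details taken from the study materials.

```python
import random

# The four cells of the 2 (Context) x 2 (Statement) design.
CONDITIONS = [
    ("public", "descriptive"),
    ("public", "evaluative"),
    ("private", "descriptive"),
    ("private", "evaluative"),
]
# Placeholder labels for the four scenarios (hypothetical names).
SCENARIOS = ["scenario_1", "scenario_2", "scenario_3", "scenario_4"]


def trials_for_group(group, seed=None):
    """Return the (scenario, context, statement) trials for one group (0-3)."""
    n = len(SCENARIOS)
    # Rotating the condition index by the group number yields a Latin square:
    # within a group every scenario and every condition appears exactly once.
    trials = [(SCENARIOS[i], *CONDITIONS[(i + group) % n]) for i in range(n)]
    # Present the trials in a random order.
    random.Random(seed).shuffle(trials)
    return trials


for group in range(4):
    print(group, trials_for_group(group, seed=group))
```

Across the four groups, each scenario is paired once with each factorial combination, so scenario content and condition are not confounded.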

After each trial, participants had to respond to two different questions on a 7-point Likert scale ranging from “completely disagree” (1) to “completely agree” (7). The first question was whether the audience might ask the speaker to retract, aimed at assessing whether retraction makes sense in each trial. The second question was whether the speaker should retract what she said, aimed at measuring whether retraction is mandatory. Thus, in the previous case, the possibility of retraction was assessed through the question: “Might the audience ask Phillips to retract?”. The mandatory character of retraction, on the other hand, was assessed through the question: “Should Phillips retract what he said on the TV show?”.

After the block of four trials, we included a post-test questionnaire about the relevance of each factor for considering retraction mandatory. Participants made relevance judgments for each condition using continuous scales from 1: “No importance/relevance” to 7: “Absolute importance/relevance”:

  1. Public context (“that the speaker has said so publicly”).

  2. Private context (“that the speaker has said so privately”).

  3. Descriptive statement (“The speaker’s statement is about a fact.”).

  4. Evaluative statement (“The speaker’s statement is an evaluation or an offense.”).

4.3 Results

Summary statistics for each experimental condition are reported in Table 1.

4.3.1 Experimental effects

Possibility. Participants manifested agreement with the possibility of retraction across the four vignettes (M = 4.30). We analyze our data using a multilevel model with random effects of participant and scenario, applying the Kenward-Roger approximation to calculate degrees of freedom (Bates et al., 2015; Luke, 2017). In the fixed effects portion of the model, we enter Context and Statement, along with their two-way interaction. Furthermore, the model employs an effect coding system, such that factors with two levels are coded as {−0.5, 0.5}. As a result, in models involving interactions, the lower-order terms represent main effects rather than simple effects.
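To see why the coding scheme matters, consider a saturated 2 × 2 model of the cell means, mean = b0 + b1·c + b2·s + b3·c·s. The sketch below solves for the coefficients under effect coding {−0.5, 0.5} and under dummy coding {0, 1}. The cell means are hypothetical values chosen for illustration, not those of Table 1, and the helper `coefficients` is an assumed name; random effects are ignored.

```python
# Hypothetical cell means on the 7-point scale, keyed by (context, statement).
cell_means = {
    ("private", "descriptive"): 3.8,
    ("public", "descriptive"): 4.9,
    ("private", "evaluative"): 4.2,
    ("public", "evaluative"): 4.6,
}


def coefficients(means, context_codes, statement_codes):
    """Solve mean = b0 + b1*c + b2*s + b3*c*s exactly for the four cells."""
    c_lo, c_hi = context_codes["private"], context_codes["public"]
    s_lo, s_hi = statement_codes["descriptive"], statement_codes["evaluative"]
    m_ll = means[("private", "descriptive")]
    m_hl = means[("public", "descriptive")]
    m_lh = means[("private", "evaluative")]
    m_hh = means[("public", "evaluative")]
    # Interaction: difference between the two simple effects of context.
    b3 = ((m_hh - m_lh) - (m_hl - m_ll)) / ((c_hi - c_lo) * (s_hi - s_lo))
    # Lower-order terms: slopes evaluated where the other factor's code is 0.
    b1 = (m_hl - m_ll) / (c_hi - c_lo) - b3 * s_lo
    b2 = (m_lh - m_ll) / (s_hi - s_lo) - b3 * c_lo
    b0 = m_ll - b1 * c_lo - b2 * s_lo - b3 * c_lo * s_lo
    return {"intercept": b0, "context": b1, "statement": b2, "interaction": b3}


effect = coefficients(
    cell_means,
    {"private": -0.5, "public": 0.5},
    {"descriptive": -0.5, "evaluative": 0.5},
)
dummy = coefficients(
    cell_means,
    {"private": 0, "public": 1},
    {"descriptive": 0, "evaluative": 1},
)
print("effect coding:", effect)
print("dummy coding: ", dummy)
```

Under effect coding the context coefficient equals the average of the two simple effects of context (the main effect) and the intercept equals the grand mean of the cells. Under dummy coding the same coefficient is the simple effect of context for descriptive statements only.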

Context, public versus private, highly affected the possibility of retraction, F(1, 201.42) = 12.67, p < .001, ηp² = 0.053, while statement, evaluative versus descriptive, didn’t, F(1, 202.73) = 0.860, p = .354, ηp² = 0.004. No higher order interactions between context and statement were observed, F = 0.827, p = .364, ηp² = 0.004.

When the statement was descriptive, the effect of context was significant (p < .001), but when the statement was evaluative, context didn’t exert a significant effect (p = .060). See Fig. 1.

Fig. 1 Differences in mean agreement regarding the claim that the speaker could be asked to retract (possibility of retraction) their previous descriptive statement (left) or evaluative statement (right) in private and public contexts

Mandatory. Participants manifested strong agreement with the mandatory character of retraction across the four vignettes (M = 4.74).

Context also strongly affected the mandatory character of retraction, F(1, 201.17) = 11.67, p < .001, ηp² = 0.046, while statement didn’t, F(1, 201.78) = 0.718, p = .397, ηp² = 0.003. No higher order interactions between context and statement were observed, F = 0.500, p = .479, ηp² = 0.002.

When the statement was descriptive, the effect of context wasn’t significant in this case (p = .056), but when the statement was evaluative, context did exert a significant effect (p = .003). See Fig. 2.

Fig. 2 Differences in mean agreement regarding the claim that the speaker should retract (mandatory character of retraction) their previous descriptive statement (left) or evaluative statement (right) in private and public contexts

Table 1 Possibility and Mandatory by Condition: Mean and Standard Deviation

Abstract principles. Participants reported that a statement being evaluative, rather than merely descriptive, is the most relevant criterion in order to consider retraction mandatory (M = 5.60), together with context being a public one (M = 5.53), followed by the statement being descriptive (M = 4.96), and context being of a private nature (M = 3.64).

4.4 Discussion

Consistent with our first hypothesis, we found that context affects retraction, both its possibility and its obligatory character. In particular, we found that speakers are more likely to be asked to retract claims that turn out to be false, and that retraction is more likely to be seen as mandatory, when the context is public rather than private. This is an important finding, to the extent that hypotheses about how retraction works have so far been tested mostly in private conversations. Previous studies’ results might be highly dependent on the nature of the vignettes employed, mainly conversations taking place in private settings.

The nature of the statement, on the other hand, didn’t exert a significant effect on the possibility or the mandatory character of retraction: in both descriptive and evaluative conditions retraction was seen as possible and mandatory. An important, if subtler, result is that when the statement was evaluative, context exerted a significant effect regarding the mandatory character of retraction. In particular, retraction of false evaluative statements was seen as mandatory much more often in public settings than in private ones. These results suggest that, in public contexts, the nature of the statement could be relevant to determine whether retraction is taken to be mandatory. This is also important, because it shows that the topic the claim was about can affect the obligatory character of retraction.

In the abstract, participants deemed evaluative statements and public contexts as the two most relevant factors in order to consider retraction mandatory. A context being private was seen as the least relevant factor when considering whether retraction is mandatory in our cases. So it seems that participants were aware of the underlying principles guiding their assessments in this study.

We observed that ratings regarding the mandatory nature of retraction were higher in scenario 3 than in the rest. This vignette, unlike the others, has a famous right-wing Spanish politician as speaker. Politicians are public agents, and thus it might be the case that subjects have some difficulty giving full credit to a scenario designed to be private when it involves such public figures. Since our results seemed to be both clear-cut and controversial –they go against certain features exhibited in most common studies on the topic, as was shown in Sect. 2–, we decided to conduct a second study to try to replicate our results.

5 Experiment 2

5.1 Participants

This time we recruited a sample of 361 native Spanish speakers from the Netquest panel (www.netquest.com). A total of 382 participants were invited to participate, of whom 14 did not complete the survey and 3 failed our attention check, resulting in a final sample of 361 participants (185 male, 51%, 174 female, 48%, 2 non-binary, 1%).

Mean sample age was 48.7 years, ranging from 24 to 88 years. The sample was somewhat left of center (M = 3.50, SD = 1.41, on a 1: “Left” to 7: “Right” scale).

To estimate post hoc power, we conducted a bootstrap power analysis based on 5000 simulations with replacement. The analysis revealed excellent power to detect the target effect of context (> 99%), but lower power to detect the effect of statement (75%) and the context-by-statement interaction (28%).

5.2 Method

Materials. For Study 2 we removed one of the previous four hypothetical scenarios, so three remained. In particular, we removed scenario 3, because the speaker in this vignette, as mentioned above, is a famous right-wing Spanish politician, and we thought that this feature could affect the results when trying to discriminate between public and private contexts. As in Study 1, each scenario involves a statement made at context c1, which turns out to be false when assessed from a later context c2, either because the statement has been proven to be untrue or because the speaker’s standards have changed. For each of the three scenarios, we wrote a total of four variants, one for each factorial combination in the 2 (Context: public, private) by 2 (Statement: descriptive, evaluative) matrix.

Procedure. In a randomized block design, participants were assigned to one of four groups and viewed a battery of three trials in a random order. In each group, participants viewed three of the four factorial combinations in the 2 (Context: public, private) × 2 (Statement: descriptive, evaluative) matrix, each paired with a different scenario. Thus, no participant viewed the same scenario or factorial combination twice.

After each trial, participants had to respond to two different questions on a 7-point Likert scale ranging from “completely disagree” (1) to “completely agree” (7). As in the previous study, the first question was whether the audience might ask the speaker to retract, aimed to measure if retraction makes sense in each trial. The second question was whether the speaker should retract what she said, aimed to measure if retraction is mandatory.

After the block of three trials, we included again the same post-test questionnaire about the relevance of each factor in those cases in which retraction was considered to be mandatory. Participants made relevance judgments for each condition using continuous scales from 1: “No importance/relevance” to 7: “Absolute importance/relevance”.

5.3 Results

Summary statistics for each experimental condition are reported in Table 2.

5.3.1 Experimental effects

Possibility. Participants manifested agreement with the possibility of retraction across the three vignettes (M = 4.64, SD = 1.87). As in Study 1, we analyze our data using a multilevel model with random effects of participant and scenario, applying the Kenward-Roger approximation to calculate degrees of freedom.

Context, public versus private, highly affected the possibility of retraction, F(1, 781.28) = 15.37, p < .001, ηp² = 0.018, and statement, descriptive versus evaluative, also affected it, F(1, 780.89) = 6.82, p = .009, ηp² = 0.008. A higher order interaction between context and statement was found, F = 8.972, p = .002, ηp² = 0.011.

When the statement was descriptive, the effect of context was significant (p < .001), but when the statement was evaluative, context didn’t exert a significant effect (p = .502). When the context was private, the effect of statement was significant (p < .001), but when the context was public, statement didn’t exert a significant effect (p = .792). See Fig. 3.

Fig. 3 Differences in mean agreement regarding the claim that the speaker could be asked to retract (possibility of retraction) their previous descriptive statement (left) or evaluative statement (right) in private and public contexts

Mandatory. Participants manifested strong agreement with the mandatory character of retraction across the three vignettes (M = 4.99, SD = 1.83).

Context strongly affected judgments concerning the mandatory nature of retraction in our cases, F(1, 783.82) = 21.72, p < .001, ηp² = 0.025, and statement also affected them, F(1, 783.46) = 6.63, p = .010, ηp² = 0.008. No higher order interactions between context and statement were observed, F = 1.805, p = .179, ηp² = 0.002.

When the statement was descriptive, the effect of context was significant (p < .001), and when the statement was evaluative, context did also exert a significant effect (p = .018). When the context was private, the effect of statement was significant (p = .005), but when the context was public, statement didn’t exert a significant effect (p = .381). See Fig. 4.

Fig. 4 Differences in mean agreement regarding the claim that the speaker should retract (mandatory character of retraction) their previous descriptive statement (left) or evaluative statement (right) in private and public contexts

Table 2 Possibility and Mandatory by Condition: Mean and Standard Deviation

Abstract principles. Participants reported a context being public as the most relevant criterion in order to consider retraction mandatory (M = 5.90), together with the evaluative nature of the statement (M = 5.57), followed by descriptive statements (M = 5.31) and finally private contexts (M = 3.95).

5.4 Discussion

Study 2 successfully replicated the main effects of context observed in Study 1 concerning the possibility and the mandatory nature of retraction: context affected both significantly. Again, speakers were more likely to be asked to retract false statements, and retraction was more likely to be seen as mandatory, when the context was public than when it was private.

This time, however, the nature of the statement did exert an effect on the possibility and the mandatory character of retraction. Retraction concerning false evaluative claims was much more often considered to be mandatory than retraction concerning false descriptive statements, even in private contexts.

Once again, in the abstract, participants deemed public and evaluative conditions as the two most relevant factors in order to consider retraction mandatory.

6 General discussion

The goal of our studies was to examine the effect of the context (i.e., public or private) and the nature of a statement (i.e., descriptive or evaluative) on retraction. There are reasons to hypothesize that both factors may modulate the obligatory character of retraction. Looking across a diverse set of four scenarios, we manipulated the conditions of context and statement to assess three hypotheses about the nature of retraction. Take the following as a synthesis of our primary results:

  1. Against prior research (see Knobe & Yalcin, 2014; Khoo, 2015; Marques, 2018; Kneer, 2021a), when the truth-value of a previously uttered claim changes, the speaker seems to be required to retract it.

  2. Whether retraction is mandatory or not significantly depends on the context: in public settings speakers are more often required to retract what they have said than they are in private ones.

  3. Requests for retraction are slightly more likely to occur whenever evaluative statements are involved, especially in public contexts.

  4. Participants are aware, in the abstract, of the influence of context on the obligatoriness of retraction. They recognized public settings as more suitable for demands of retraction.

In support of H1, public and private contexts affected participants’ perception of the possibility and the obligatory character of retraction.

Study 1 didn’t find any significant effect of the nature of the statement concerning the obligatoriness of retraction. However, Study 2 provided support for H2, in that evaluative statements required retraction more often than descriptive claims did. One possible reason why evaluative claims required retraction more often than descriptive statements is that evaluative uses of language are more likely to cause offense, and other kinds of harm, than descriptive ones are (Almagro & Villanueva, 2021; Cepollaro et al., 2021; Soria-Ruiz & Stojanovic, 2019). More studies will be needed to support H2, but this change with respect to the effect of the nature of the statement from the first study to the second one might be consistent with the replicated result that, in the abstract, participants see evaluative statements as slightly more relevant than descriptive statements when they consider retraction to be mandatory. The same statement might count as descriptive or evaluative depending on, among other factors, who the speaker is (Almagro et al., 2022, 2023), and we often fail to differentiate between descriptive and evaluative statements, so this might be affecting the results here as well.

Finally, concerning H3, it seems that participants are aware of the principles guiding their assessments regarding retraction. In the abstract, people recognize public contexts and evaluative statements as highly important in order to consider retraction mandatory, and these were the factors that most influenced participants’ responses about the obligatoriness of retraction through the vignettes.

Someone might argue, however, that recent empirical studies showing that the norm of assertion is not truth but justified belief (Kneer, 2018, 2021b, 2022; Marsili & Wiegmann, 2021) support contextualism. If we assume, contrary to MacFarlane, that the norm of assertion is not the Truth Rule but the Justified Belief Rule, why would it be necessary for the speaker to retract a statement now assessed as false if she was justified when she made the original assertion? This would make the falsehood of the previous statement somewhat irrelevant to a proper understanding of retraction. However, we do not believe that these empirical works pose a problem for the results obtained in our paper, because the examples that appear in these papers showing that the norm of assertion is justified belief are very different from the ones that we use in our work.

On the one hand, they are examples that take place in epistemic contexts: Russell’s Clock Case (Kneer, 2018, p. 166), in which June infers the time by looking at a watch that does not work; the well-worn American Car vignette (Kneer, 2018, p. 167), in which Bob claims that Jill drives an American car but does not know that Jill has changed it recently; and Rolex Unlucky (Marsili & Wiegmann, 2021, p. 2), in which Maria, a watch collector, believes she owns a 1990 Rolex but in the end doesn’t because of a very rare mistake in her detailed inventory. None of our examples, however, have to do with epistemic contexts (Case 4, Fat Tony, is also not a case of epistemic context, since there are no epistemic standards or evidence involved in any sense). On the other hand, the examples contained in studies defending that the norm of assertion is justified belief are all descriptive, while ours are descriptive and evaluative. We wanted to test whether there were differences between descriptive and evaluative judgments regarding the mandatory nature of retraction. In summary, we do not believe that the experiments reported in the above studies can directly and non-problematically be adapted to cases such as the ones we use in our work.

6.1 Limitations

First, our studies were conducted in a single country, Spain. The effects we observed could stem from cultural or even linguistic norms of Spanish. If so, we cannot confidently assert that our findings will generalize to other cultures/languages. It might be the case that retraction works differently in Spanish and English. It would be interesting to conduct a cross-linguistic study in order to be better positioned to deal with this possible limitation. However, despite the differences we might find, we are confident that the relevance of context for retraction extends beyond Spanish.

Second, it might be the case that our results, and the results of most studies conducted so far, were affected by a confusion between two different practices intuitively associated with retraction: withdrawing a false claim vs. apologizing. If so, our results may conflate two very different phenomena. In future studies, it would be necessary to train participants from the beginning of the study, by clarifying the sense of retraction we want to test. Empirical evidence supports this idea. A search (Footnote 7) in the esTenTen18 corpus in Sketch Engine for the most common collocations (Footnote 8) of retractar “to retract” shows that the most frequent collocate in the “‘retractar’ and/or” list is disculparse “apologize”, with a total frequency of 62 and a Log Dice of 8, as in No iba a retractarse o disculparse por ello “He was not going to retract or apologize for it.” In English, we obtain similar results. When we searched for “retract” in the enTenTen20 corpus, the second most frequent collocate in the “‘retract’ and/or” list was “apologize”, with a total frequency of 528 and a Log Dice of 9.
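For readers unfamiliar with the Log Dice (logDice) association score, it is commonly defined as 14 + log2(2·f_xy / (f_x + f_y)), where f_xy is the co-occurrence frequency of the two words and f_x and f_y are their individual frequencies. The sketch below implements this definition. The counts used are hypothetical and do not come from esTenTen18 or enTenTen20, and the function name `log_dice` is an assumption made for the example.

```python
from math import log2


def log_dice(f_xy, f_x, f_y):
    """logDice = 14 + log2(2 * f_xy / (f_x + f_y)); the theoretical maximum is 14."""
    return 14 + log2(2 * f_xy / (f_x + f_y))


# Hypothetical counts: two words that always occur together reach the maximum.
print(log_dice(f_xy=100, f_x=100, f_y=100))
# Hypothetical counts: a much weaker association gives a lower score.
print(log_dice(f_xy=1, f_x=64, f_y=64))
```

Because the score depends on relative rather than absolute frequencies, it can be compared across corpora of different sizes, which is what makes the Spanish and English figures above comparable in spirit.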

Finally, we want to discuss two routes worth exploring in future work for addressing a limitation of our studies, namely the lack of an answer to the question of how and why individuals feel a need to retract their statements, particularly in public settings. One route would be to explore the hypothesis that the criteria for truth-value attributions differ in public contexts. Presumably, public speakers, as givers of knowledge, are expected to be reliable and trustworthy, especially in an environment where attention is limited. So this commitment to honesty and trustworthiness might partially justify the capacity of a speaker to reach a large audience as an informant. Thus, if a public claim is later found to be false, it might be important for the speaker to retract the statement in order to maintain their credibility as a trusted source of information. Otherwise, they would lose their status as an informant worth paying attention to. That is, they would lose their status as a public informant.

Another route worth exploring would be to pursue the hypothesis that someone may think that a speaker should retract their statement not because it is untrue, but because the speaker may regret having said it. It is possible to feel regretful about a statement that is true but, say, unkind. Thus, the need to retract a statement may be more significant in public contexts not only because it is important to be truthful as a public informant, but also because it is important not to be perceived as insensitive. When public speakers cause some kind of harm, it might be important for them to show regret. One way to do this is to retract or take back what they said (Footnote 9).

We believe that these issues must be addressed empirically to better understand the phenomenon of retraction.

6.2 Conclusion

Empirical research so far has suggested that people’s intuitions are inconsistent with the predictions made by MacFarlane’s assessor relativism concerning retraction. In particular, according to these studies, people don’t consider retraction to be obligatory, which makes relativism less plausible than some of its competitors, such as contextualism. However, we have identified a common feature of these empirical studies that might be affecting their results: they only consider scenarios of private conversations between two subjects, or at least don’t take into consideration the crucial difference between public and private contexts.

At the most general level, our findings show that people take retraction to be mandatory when speakers consider that a statement they made in the past is false now. But, more importantly, when assessing both abstract and concrete situations, participants considered retraction to be mandatory more often in public than in private contexts. This is an important aspect that must be taken into consideration to empirically test the plausibility of MacFarlane’s relativism. Again, to be clear, we don’t think that our results support MacFarlane’s relativism.

The present study is an attempt to fill this critical gap in our understanding of retraction. To this end, we reveal that retraction appears to be significantly affected by the context in which the original statement was made. Moreover, we found that the nature of the statement could also play an important role in relation to retraction. Evaluative statements might be required to be retracted more often than descriptive ones are. Our results so far are, nevertheless, insufficient to support this hypothesis.