Choice-Driven Counterfactuals

In this paper, we investigate the semantics and logic of choice-driven counterfactuals, that is, of counterfactuals whose evaluation relies on auxiliary premises about how agents are expected to act, i.e., about their default choice behavior. To do this, we merge one of the most prominent logics of agency in the philosophical literature, namely stit logic (Belnap et al. 2001; Horty 2001), with the well-known logic of counterfactuals due to Stalnaker (1968) and Lewis (1973). A key component of our semantics for counterfactuals is to distinguish between deviant and non-deviant actions at a moment, where an action available to an agent at a moment is deviant when its performance does not agree with the agent's default choice behavior at that moment. After developing and axiomatizing a stit logic with action types, instants, and deviant actions, we study the philosophical implications and logical properties of two candidate semantics for choice-driven counterfactuals, one called rewind models, inspired by Lewis (Noûs 13(4), 455–476, 1979), and the other called independence models, motivated by well-known counterexamples to Lewis's proposal due to Slote (Philos. Rev. 87(1), 3–27, 1978). In the last part of the paper we consider how to evaluate choice-driven counterfactuals at moments arrived at by some agents performing a deviant action.


Introduction
What would have happened if the charge nurse had not put the wrong medications on the desk? Would the intern have given them to the patient anyway? What if Alice hadn't moved out of the way? Would the thief have shot her? Would Beth's husband have picked up the kids if she hadn't? If David had bet tails, would Max have kept playing? These types of questions are asked in many situations, such as when determining responsibility, when making plans for the future, and when reasoning strategically about how our choices influence the choices of others. A common feature of these questions is that they involve choice-driven counterfactuals. Choice-driven counterfactuals are counterfactuals whose semantic value depends on how agents are expected to act. This means that the evaluation of a choice-driven counterfactual relies on auxiliary premises about the default choice behavior of the involved agents, where the default choice behavior is determined by, for instance, duties, personality, daily schedule, preferences, goals, and so on.
Our aim in this paper is to study a logic for reasoning about choice-driven counterfactuals. To do this, we merge one of the most prominent logics of agency in the philosophical literature, namely stit logic (the logic of seeing-to-it-that) [5,25], with the well-known logic of counterfactuals due to Stalnaker [46] and Lewis [30].
There has been some investigation of the semantics of counterfactuals in the context of branching time [38,49], the theory of time that underlies stit semantics. However, these proposals do not take agency into account. In addition, although counterfactual reasoning is key to a number of applications of stit logic, such as the analysis of the notion of responsibility [2,11,20,32], to our knowledge, only Xu [52] and Horty [25, Chapter 4] explicitly consider how to interpret counterfactuals in stit semantics. This paper begins to fill this important gap in the stit literature. We develop a stit logic with the resources to represent the agents' default choice behavior and show how to extend this logic with counterfactuals, highlighting some key motivating assumptions and identifying interesting logical properties of choice-driven counterfactuals.
The paper is organized as follows. In Section 2, we present the stit logic with deviant actions and n agents, SLD_n, that we use to study choice-driven counterfactuals. In Section 2.1, we introduce the notion of agency in branching time. In Section 2.2, we motivate a key component of our semantics for counterfactuals, namely the distinction between deviant and non-deviant actions at a moment, where an action available to an agent is deviant if it is not prescribed by the agent's default choice behavior. In Section 2.3, we present the syntax and semantics of SLD_n, and provide a sound and complete axiomatization. Section 3 extends SLD_n to include counterfactuals. In Section 3.1, we gradually introduce two candidate semantics for choice-driven counterfactuals, one called rewind models, inspired by Lewis [31], and the other called independence models, motivated by well-known counterexamples to Lewis's proposal [44]. The logical properties of the two semantics are studied in Section 3.2. In Section 4, we consider how to evaluate choice-driven counterfactuals at moments arrived at by some agents performing a deviant action. Finally, we conclude in Section 5 with a brief discussion of future work. All proofs can be found in Appendices A and B.

Basic Framework
This section introduces the stit logic with deviant actions and n agents, SLD_n, that we use as a basis to study choice-driven counterfactuals. The following example, adapted from [49], illustrates the type of situation that we aim to model: Example 1 There are three agents engaged in the following game: Initially, David decides whether to play with Max or Maxine and then he bets heads or tails. After David bets, the person nominated by David flips a coin. David wins if his bet matches the outcome of the coin flip and loses otherwise; Max wins just in case David loses; finally, Maxine wins no matter whether David's bet matches the outcome of the coin flip. Unknown to David, both Max and Maxine have two coins, one with heads on each side and one with tails on each side (called the H-coin and the T-coin, respectively). If Max has a chance to play, he flips the H-coin if David bets tails and the T-coin if David bets heads. If Maxine has a chance to play, she picks one of the coins to flip at random. After nominating Max, David bets heads and Max flips the T-coin, so David loses.
In Example 1, after Max flips the T-coin, the counterfactual C1 If David had bet tails, then he would still have lost is intuitively true: according to the story, the reasoning goes, if David had bet tails instead of heads, Max would have flipped the H-coin, thus making David lose. In order to capture this intuition, we need a semantics that can represent the following elements:

(E1) The different ways in which things could go or could have gone. For instance, in Example 1, David bets heads but he could have bet tails, and this would have led to an alternative course of events.

(E2) The particular time at which an agent makes a choice. When we evaluate a choice-driven counterfactual, we consider what would have happened had the agents acted differently at a particular time. For instance, when we evaluate C1, we consider alternatives where David has just bet tails; alternatives where he has not just bet tails but did bet tails, say, two weeks ago or will bet tails six days from now are immaterial.

(E3) The types of action performed by the agents. When we evaluate a choice-driven counterfactual, we consider what would have happened had the agents performed different types of action. For instance, when we evaluate C1, we consider alternatives where David performs the action type "betting tails" instead of the action type "betting heads".

(E4) The default choice behavior of the agents. When we evaluate a choice-driven counterfactual, we rely on default assumptions about what the agents would have done had some agents acted differently. For instance, when we suppose that David bets tails in order to evaluate C1, we use Max's default choice behavior (i.e., to select the coin that makes David lose) to conclude that he would choose the H-coin.
The semantics of stit logic has almost everything we need. Stit captures the idea that the future can unfold in different ways, and how it will actually unfold depends, in part, on what the agents decide to do. This leads to defining stit models in terms of two main components: a branching time structure representing the different ways things could go (as per element E1) and a choice function representing the actions available to the agents at each moment. The branching time structure is sometimes supplemented with instants, which represent the time at which alternative moments occur (as per element E2); see [5]. In addition, the choice function is sometimes accompanied by a function that labels the actions available to the agents with their types (as per element E3); see, e.g., [14,27,53]. The only missing ingredient is a representation of the agents' default choice behavior (element E4).
We propose a way to model E4 in Section 2.2 below, after we introduce the formal definitions of branching time structure, instant, and action-type function in Section 2.1 (readers who are familiar with these notions should feel free to skim quickly through the definitions). We then present the syntax, semantics, and an axiomatization of our stit logic with deviant actions SLD_n in Section 2.3. We will use SLD_n models to provide a semantics for choice-driven counterfactuals in Section 3.

Agency in Branching Time
A branching time structure is a set of moments, Mom, with a relation < on Mom, where m < m′ means that moment m occurs before moment m′. The relation < is assumed to have a treelike structure with forward branching representing the indeterminacy of the future and backward linearity representing the determinacy of the past. For technical convenience, in this paper we assume that time is discrete, meaning that every moment has a set of immediate successors, and that it has a unique beginning and no end. Formally, a discrete branching time (DBT) structure is a tuple T = ⟨Mom, m₀, <⟩ where Mom is a non-empty set of moments and < is a strict partial order on Mom that is rooted in m₀, backward linear, discrete, and without endpoints. The standard notions used to reason about DBT structures are summarized in Table 1. Given a DBT structure T = ⟨Mom, m₀, <⟩, each history h ∈ Hist_T represents a complete course of events. Because of forward branching, many different histories can pass through a single moment m (i.e., m can be an element of many different histories). The set of histories passing through moment m is denoted H_m^T; each h ∈ H_m^T represents a complete course of events that can still be realized at m. Since time is discrete with no endpoints, for each m ∈ Mom, the set of immediate successors of m, denoted succ(m), is non-empty. If h ∈ H_m^T, then h ∩ succ(m) is a singleton because histories are linearly ordered sets of moments. This means that there is one and only one successor of m on history h, denoted succ_h(m). The condition of past linearity ensures that every non-initial moment m ≠ m₀ has a unique predecessor, denoted pred(m). An index m/h ∈ Ind_T represents the complete state of affairs at moment m on history h. In the context of branching time, formulas are typically evaluated at indices.
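To make the bookkeeping concrete, here is a minimal sketch of a DBT structure encoded by its immediate-successor relation, together with the derived notions Hist, H_m, pred, and succ_h. The moment names are hypothetical, and since the paper's structures are infinite (no endpoints), a finite encoding like this one is only an approximation: leaves stand in for moments whose futures we do not model.

```python
# Finite sketch of a DBT structure via its immediate-successor relation.
# Moment names are hypothetical; leaves stand in for unmodeled futures.
succ = {
    "m0": {"m1", "m2"},
    "m1": {"m3", "m4"},
    "m2": {"m5"},
    "m3": set(), "m4": set(), "m5": set(),
}

# Backward linearity: every non-initial moment has a unique predecessor.
pred = {m: parent for parent, kids in succ.items() for m in kids}

def histories(root="m0"):
    """Hist: maximal linearly ordered chains of moments starting at the root."""
    def extend(path):
        if not succ[path[-1]]:
            yield tuple(path)
        else:
            for m in sorted(succ[path[-1]]):
                yield from extend(path + [m])
    return list(extend([root]))

def H(m, hists):
    """H_m: the set of histories passing through moment m."""
    return [h for h in hists if m in h]

def succ_h(m, h):
    """succ_h(m): the unique successor of m on a history h through m."""
    return h[h.index(m) + 1]
```

Since each history is backward linear and passes through the root, `pred` is total on non-initial moments, and `h ∩ succ(m)` being a singleton corresponds to `succ_h` returning exactly one moment.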
We now supplement DBT structures with instants. Intuitively, an instant is a set of moments happening at the same time. Definition 2 (Instants) Let T = ⟨Mom, m₀, <⟩ be a DBT structure. For every n ∈ ℕ, let succⁿ(m₀) be the set of moments reachable from m₀ in exactly n steps, i.e., succ⁰(m₀) = {m₀} and succⁿ⁺¹(m₀) = ⋃{succ(m) | m ∈ succⁿ(m₀)}. Then Inst_T = {succⁿ(m₀) | n ∈ ℕ} is the set of instants over T. We use t, t₁, t₂, … to denote elements of Inst_T.
According to Definition 2, each clock tick transitions every moment in an instant to the next instant. When m ∈ t we say that moment m occurs at instant t, and when m ∈ h ∩ t we say that history h crosses instant t at moment m. Let T = ⟨Mom, m₀, <⟩ be a DBT structure. The fact that < is discrete and rooted in m₀ ensures that: 1. Inst_T is a partition of Mom. Hence, every m ∈ Mom occurs at one and only one instant, denoted t_m. 2. Every history h crosses each instant t at exactly one moment, denoted m_(t,h).
In what follows, we write t/h for m_(t,h)/h.
The above notation, together with the notation introduced in Table 1, will be used repeatedly in Sections 3 and 4. In what follows, we omit the superscript T and simply write Hist, H_m, Ind, and Inst when the DBT structure is clear from the context. Turning to agency, we start by fixing sets of (names of) action types and agents: • Let Atm be a non-empty finite set of (names of) action types.
(We use a, b, c, possibly with superscripts a′, a″, …, for elements of Atm.) • Let Ag = {1, . . . , n} be the set of n agents for some number n ∈ ℕ.
(We use i, j, k, possibly with superscripts i′, i″, …, for elements of Ag.) We think of agents as endowed with a repertoire of action types of which they can be authors. Let Acts = Atm × Ag be the set of (names of) individual actions. We write a_i when (a, i) ∈ Acts. The idea is that a_i is the action type that is instantiated whenever agent i performs an action of type a. For instance, if a ∈ Atm is the action type "flipping a coin" and 1, 2 ∈ Ag are, respectively, David and Max, then a_1 is the action type "David flipping a coin" and a_2 is the action type "Max flipping a coin". For i ∈ Ag, let Acts_i = {a_i | a ∈ Atm} be the set of action types authored by agent i. A profile is a function α : Ag → Acts such that, for all i ∈ Ag, α(i) ∈ Acts_i. So, a profile is any combination of actions, one for each agent. Let Ag-Acts be the set of all profiles (we use Greek letters α, β, γ for elements of Ag-Acts). As usual, when α ∈ Ag-Acts and I ⊆ Ag, we write α_I for the restriction of α to the set I, α_(−I) for α_(Ag∖I), and α(I) for the image of I under α.
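The action vocabulary can be enumerated mechanically. The sketch below (a two-agent, two-type toy instance of our own choosing) builds Acts as Atm × Ag, the sets Acts_i, the set of profiles Ag-Acts, and a restriction α_I:

```python
# Sketch of the action vocabulary: Acts = Atm x Ag, Acts_i, and profiles.
# The type names and the two-agent setup are hypothetical.
from itertools import product

Atm = ["bt", "bh"]              # action-type names
Ag = [1, 2]                     # agents

Acts = {(a, i) for a in Atm for i in Ag}            # individual actions a_i
Acts_i = {i: {(a, i) for a in Atm} for i in Ag}     # actions authored by i

def profiles():
    """Ag-Acts: every assignment of one of its own actions to each agent."""
    per_agent = [sorted(Acts_i[i]) for i in Ag]
    return [dict(zip(Ag, combo)) for combo in product(*per_agent)]

alpha = profiles()[0]
alpha_I = {i: alpha[i] for i in alpha if i in {1}}   # restriction to I = {1}
```

With |Atm| = 2 and two agents there are 2 × 2 = 4 profiles, matching the fact that a profile independently assigns each agent one of its own action types.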
We make the following two key assumptions about the individual actions that are performed at a moment: 1. The action types in Atm, Acts, and Ag-Acts represent one-step actions. So, in the spirit of Propositional Dynamic Logic (PDL) [22] and Coalition Logic (CL) [35], performing an action at a moment transitions to a set of next moments representing the different possible outcomes of the action. 2. Every transition from a moment to one of its successors is brought about by a unique profile. Accordingly, we label every index m/h with the profile that brings about the transition from m to its successor on h (i.e., the moment succ_h(m)). This labeling is given by an action-type function act : Ind → Ag-Acts over T (Definition 3). If index m/h is labeled with α ∈ Ag-Acts, then α(i) represents the action type that agent i ∈ Ag performs at m/h. Hence, every agent i performs one, and only one, type of action at every index m/h. In this way, every action token is assigned a unique type and different tokens are assigned different types. Conditions 1 and 2 from Definition 3 are standard requirements in stit semantics, see [25, Chapter 2]: The condition of no choice between undivided histories ensures that no individual action executable at a moment can separate histories that are undivided at that moment. The condition of independence of agents ensures that every combination of individual actions executable at a moment (one for each agent) can itself be executed at that moment.

Deviant Actions
Having introduced branching time structures, instants, and action types, the last element we need in order to provide a semantics for choice-driven counterfactuals is the notion of default choice behavior. Before presenting a formal definition, let us go back to Example 1. A DBT structure and an action-type function representing Example 1 are pictured in Fig. 1. In the figure, David is agent 1, Max is agent 2, and Maxine is agent 3. David's individual action types are nm_1 (nominate Max), nm′_1 (nominate Maxine), bt_1 (bet tails), and bh_1 (bet heads); Max's individual action types are tc_2 (flip the T-coin) and hc_2 (flip the H-coin); and Maxine's individual action types are tc_3 (flip the T-coin) and hc_3 (flip the H-coin). The dashed lines represent instants, and the actual history is h_2 (the thick line).
Suppose that we are at moment m_4 on history h_2 (so, David and Max have made their choices) and that we want to determine whether the counterfactual C1 If David had bet tails, then he would still have lost is true. In order to evaluate C1, we need to consider histories on which David performs an action of type "betting tails" just prior to the time of m_4 (the time of utterance). In other words, we need to consider histories on which David performs the action type bt_1 at instant t_2. Histories h_3, h_4, h_7, and h_8 all have this property. However, among these histories, we only focus our attention on those that are most similar to the actual history h_2. We give a full analysis of similarity in Section 3. What is important at this stage is that there is a crucial difference between h_3 and h_4.
On both histories, David bets tails at t_2 after nominating Max. Yet, after that, Max flips the H-coin on h_3 and the T-coin on h_4. The key difference is that only h_3 is consistent with Max's default choice behavior, namely that if he has a chance to play, he flips the coin that makes David lose. Thus, we take C1 to be true assuming that Max's choice matches his default choice behavior. Contrast C1 with the counterfactual: "If David had nominated Maxine and bet tails, then he would still have lost". Given that Maxine might well flip the T-coin, this counterfactual is false. In order to represent the default choice behavior of the agents over time, we will introduce a deviant-action function that identifies the deviant actions at each moment. An action available to an agent i at a moment m is deviant if its performance at m does not agree with agent i's default choice behavior at m; it is a non-deviant or default action otherwise. To simplify the exposition, we call an agent's default choice behavior a choice rule. In Example 1, "Max flips the coin that makes David lose" is a choice rule, and the actions hc_2 (flipping the H-coin) and tc_2 (flipping the T-coin) are deviant actions at m_4 and m_5, respectively. The following four comments clarify the notion of choice rule.
What Choice Rules are (not). Choice rules can have various sources, including social conventions, shared standards of rationality, habits, individual preferences or goals, and, in the case of artificial agents, choice-guiding programs. Natural examples of a choice rule are the decision rules found in the game- and decision-theory literature, such as expected utility maximization or maximin. However, it is important to stress that some choice rules can be dictated by habits or behavior that is, on the face of it, irrational (more on this in Section 4). A final point about the interpretation of choice rules is that they should not be thought of as physical or causal laws. The key difference is that the latter laws constrain the behavior of the agents in a way that choice rules do not: while an agent who is hit on his legs by a 220-pound rolling ball cannot avoid falling, an agent who normally cheats at cards can avoid cheating.

Degrees of Deviation.
It is natural to think that the notion of deviant action comes in degrees: the way that some actions deviate from the default choice behavior may be more or less important or "abnormal" than others. For simplicity, we treat all deviant choices equally. Everything that follows can be adapted to a graded notion of deviant action.
(In)deterministic Choice Rules. Suppose that m is a moment at which an agent i has a non-vacuous choice, and let r be a choice rule that guides the behavior of i at m. We will say that: r is deterministic at m if it leaves i with exactly one non-deviant option; r is indeterministic at m if it leaves all of i's options non-deviant; and r is non-deterministic at m if it rules out some of i's options as deviant but leaves more than one option non-deviant. Max's behavior in Example 1 is guided by a deterministic choice rule: provided that Max can play, flipping the T-coin is his only non-deviant option if David bets heads and flipping the H-coin is his only non-deviant option if David bets tails. Maxine's behavior, on the other hand, is guided by an indeterministic choice rule: if she can play, Maxine may flip either one of the two coins, no matter how David bets. Finally, an example of a non-deterministic choice rule is: "If mango, pineapple, and pear are available, then Alice picks either mango or pineapple". When all three fruits are present, this rule guides Alice's behavior only partially, since picking the mango and picking the pineapple are both non-deviant. In this paper, we make the simplifying assumption that all choice rules are either deterministic or indeterministic. Excluding non-deterministic choice rules simplifies our formal definitions. Of course, this is a significant assumption, since non-deterministic choice rules are ubiquitous. However, the issues concerning choice-driven counterfactuals addressed in this paper do not depend on this assumption.
Extensional Perspective on Choice Rules. Our models represent the distinction between actions that are deviant and actions that are not deviant according to an underlying set of choice rules. But we do not include a representation of the underlying choice rules themselves. Using this approach, we can represent a wide variety of choice rules, including choice rules that may change over time. For example, we can easily represent the choice rule "Alice normally cheats at cards up to time t and normally respects the rules afterwards" by classifying all instances of Alice's non-cheating up to t as deviant and all instances of Alice's cheating after t as deviant. Similarly, we can represent choice rules such as "Alice is indifferent between mango and pineapple but strictly prefers watermelon over mango and pineapple": according to this rule, picking watermelon is the only non-deviant option for Alice when watermelon is available, while none of her options is deviant at moments when watermelon is not available.
We are now ready to introduce the definition of a frame for our logic SLD_n.
Definition 4 (SLD_n frame) An SLD_n frame is a tuple ⟨T, act, dev⟩ where T is a DBT structure, act : Ind → Ag-Acts is an action-type function over T, and dev : Mom → 2^Acts assigns to every moment a set of deviant individual actions. Writing Acts_i^m for the set of individual actions executable by agent i at moment m, the function dev is required to satisfy the following conditions: for all m ∈ Mom and i ∈ Ag, 1. dev(m) ⊆ ⋃{Acts_j^m | j ∈ Ag} (executability of deviant actions); 2. Acts_i^m ∖ dev(m) ≠ ∅ (availability of non-deviant actions); 3. if Acts_i^m ∩ dev(m) ≠ ∅, then |Acts_i^m ∖ dev(m)| = 1 ((in)determinism of choice rules). According to condition 1, only individual actions executable at a moment can be deviant at that moment. The idea is that individual actions that cannot be performed at a moment are immaterial for the default choice behavior of the agents at that moment. According to condition 2, every agent can perform at least one non-deviant action at every moment. Given the condition of independence of agents, this means that, at every moment, there is some history on which no agent performs a deviant action. So, according to the choice rules underlying an SLD_n frame, something will always happen. Finally, condition 3 captures the simplifying assumption that all choice rules are either indeterministic or deterministic. This condition ensures that, at each moment, agents can be divided into two categories: (i) agents that have no deviant actions (called unconstrained) and (ii) agents who have some deviant actions and only one non-deviant action (called constrained). This distinction will play a key role in Section 3.1.
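The three conditions on dev are easy to check mechanically once the executable actions at each moment are given. Below is a sketch of such a check, together with the constrained/unconstrained classification; all names (the `acts_at` encoding, the moment and action labels) are our own illustrative choices, not the paper's:

```python
# Sketch of the three conditions on dev from Definition 4, assuming the
# executable actions Acts_i^m at each moment are given. Names are ours.
def dev_ok(acts_at, dev, agents):
    """acts_at: moment -> {agent: executable actions}; dev: moment -> set."""
    for m, per_agent in acts_at.items():
        d = dev.get(m, set())
        if not d <= set().union(*per_agent.values()):
            return False                      # 1: deviant actions executable
        for i in agents:
            nondev = per_agent[i] - d
            if not nondev:
                return False                  # 2: some non-deviant action
            if per_agent[i] & d and len(nondev) != 1:
                return False                  # 3: unique default if any deviant
    return True

def constrained(acts_at, dev, m, i):
    """An agent is constrained at m iff some of its actions are deviant there."""
    return bool(acts_at[m][i] & dev.get(m, set()))

# Max (agent 2) after David bets heads (m4) or tails (m5): the coin that
# would make David win is deviant, leaving a unique non-deviant option.
acts_at = {"m4": {2: {"hc2", "tc2"}}, "m5": {2: {"hc2", "tc2"}}}
dev = {"m4": {"hc2"}, "m5": {"tc2"}}
```

Condition 3 is what makes the constrained/unconstrained dichotomy exhaustive: an agent with any deviant action at m automatically has exactly one non-deviant action there.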
An SLD_n frame representing Example 1 is pictured in Fig. 2, where the gray cells represent the deviant actions (recall that Max's choice rule is that he flips the coin that guarantees that David's bet is incorrect). In the frame, all agents are unconstrained at every moment, except for Max, who is constrained at moments m_4 and m_5.
We conclude this subsection with some brief comments about extensions of the stit semantics related to the one proposed here.
The first extension that we discuss is strategic stit, see [5, Chapter 13], [25, Chapter 7], [15]. Labeling some actions as deviant at a moment can be viewed as a generalization of a strategy used in strategic stit. Given a dev function and an agent i, we can define a function s_i : Mom → 2^(Acts_i) as follows: for all m ∈ Mom, s_i(m) = Acts_i^m ∖ dev(m). Thus defined, s_i is a partial strategy for agent i that assigns to each moment m the non-deviant actions available to i at m. It is a partial strategy because agent i may be unconstrained at moment m, in which case it is possible that s_i(m) = Acts_i^m with |Acts_i^m| > 1. A similar generalization of strategic stit can be found in [33], where the authors supplement stit with a set of rational choices for every agent at every moment. But, as we mentioned above, choice rules may be grounded in preferences or habits that are, on the face of it, irrational. So, non-deviant choices may not coincide with rational choices. The approach that comes closest to our understanding of the dev function is Müller's [34, p. 199] idea of using strategic stit to "affix 'defaults' to future choices". The key difference between Müller's proposal (and, more generally, strategic stit) and our own is the role that "defaults" (or strategies) play in the semantics: in the present paper, "defaults" are introduced to contribute to the analysis of choice-driven counterfactuals rather than to provide a semantics for strategic stit operators.
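The induced partial strategy can be sketched in one line; as before, the moment and action names below are hypothetical, and the `acts_at` encoding of the executable actions is our own:

```python
# Sketch of the partial strategy s_i induced by dev: it assigns to each
# moment the non-deviant actions available to agent i (names are ours).
def s(acts_at, dev, i):
    return {m: per[i] - dev.get(m, set()) for m, per in acts_at.items()}

acts_at = {
    "m4": {2: {"hc2", "tc2"}},   # Max constrained: only tc2 is non-deviant
    "m1": {2: {"hc2", "tc2"}},   # Max unconstrained here
}
dev = {"m4": {"hc2"}}
s_max = s(acts_at, dev, 2)       # {"m4": {"tc2"}, "m1": {"hc2", "tc2"}}
```

At constrained moments the strategy is single-valued, as in ordinary strategic stit; at unconstrained moments it is multi-valued, which is exactly the sense in which s_i is only a partial strategy.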
A second extension of stit adds epistemic operators, see, e.g., [17,23,27,32]. It is important not to confuse an epistemic indistinguishability relation (an equivalence relation on indices) with instants. Our interpretation of instants is that they represent "alternative presents," not uncertainty of the agents. In this paper, we are interested in truth conditions for choice-driven counterfactuals, not in what such counterfactuals may express about the cognitive procedure, knowledge, and beliefs used to evaluate them.

The Logic SLD_n
Recall that Ag = {1, . . . , n} is a fixed set of (names of) agents and Atm is a fixed non-empty finite set of (names of) action types. In addition, let us fix a non-empty countable set Prop of propositional variables (we use p, q, r, possibly with superscripts p′, p″, …, for elements of Prop).
Definition 5 (Syntax of SLD_n) Let Prop, Atm, and Ag be defined as above. The set of formulas of the language of SLD_n, denoted L_SLD_n, is generated by the following grammar: ϕ ::= p | do(a_i) | dev(a_i) | ¬ϕ | (ϕ ∧ ϕ) | □ϕ | Xϕ | Yϕ, where p ∈ Prop and a_i ∈ Acts.
The abbreviations for the Boolean connectives ∨, →, ↔, and the propositional constants ⊥ and ⊤ are defined as usual. We use ♦ϕ, X̂ϕ, and Ŷϕ as abbreviations for ¬□¬ϕ, ¬X¬ϕ, and ¬Y¬ϕ, respectively. Finally, we will adopt the usual rules for the elimination of parentheses.
The three modalities are standard in branching time logic: □ϕ means "ϕ is settled true" or "ϕ is historically necessary," Xϕ means "ϕ is true at the next moment on the current history," and Yϕ means "ϕ is true at the previous moment on the current history". The intended interpretations of the action formulas do(a_i) and dev(a_i) are "agent i does action a" and "action a_i is deviant", respectively. For any α ∈ Ag-Acts, we define do(α) := do(α(1)) ∧ · · · ∧ do(α(n)). Thus, do(α) means "the agents do α" (i.e., "for all i ∈ Ag, i performs action α(i)").
We now define a model based on an SLD n frame and truth for formulas from L SLD n at an index.
Definition 6 (SLD n model) An SLD n model is a tuple M = F, π , where F is an SLD n frame and π : P rop → 2 Ind is a valuation function.
Definition 7 (Truth for L_SLD_n) Suppose M is an SLD_n model. Truth of a formula ϕ ∈ L_SLD_n at an index m/h in M, denoted M, m/h |= ϕ, is defined recursively as follows:
M, m/h |= p iff m/h ∈ π(p);
M, m/h |= do(a_i) iff act(m/h)(i) = a_i;
M, m/h |= dev(a_i) iff a_i ∈ dev(m);
M, m/h |= ¬ϕ iff it is not the case that M, m/h |= ϕ;
M, m/h |= ϕ ∧ ψ iff M, m/h |= ϕ and M, m/h |= ψ;
M, m/h |= □ϕ iff M, m/h′ |= ϕ for all h′ ∈ H_m;
M, m/h |= Xϕ iff M, succ_h(m)/h |= ϕ;
M, m/h |= Yϕ iff m ≠ m₀ and M, pred(m)/h |= ϕ.
The notions of validity and satisfiability are standardly defined as follows: Let ϕ be a formula in L_SLD_n and M an SLD_n model. Then: ϕ is valid in M just in case ϕ is true at all indices m/h in M; ϕ is valid in the class of SLD_n models just in case ϕ is valid in all SLD_n models; ϕ is satisfiable in M just in case ϕ is true at some index m/h in M; finally, ϕ is satisfiable in the class of SLD_n models just in case ϕ is satisfiable in some SLD_n model.
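The recursive truth clauses can be turned directly into a toy model checker over a finite encoding of a model. Everything below (the tuple encoding of formulas, the dictionary model, the moment names) is our own illustrative choice; in particular, for simplicity the `dev` sets here hold bare action names rather than agent-indexed pairs:

```python
# A toy checker for the truth clauses of Definition 7 over a finite model.
# Formulas are nested tuples; the encoding is our own illustrative choice.
def truth(M, m, h, phi):
    """Does M, m/h |= phi hold? h is a tuple of moments containing m."""
    op = phi[0]
    if op == "p":                 # m/h in pi(p)
        return (m, h) in M["val"].get(phi[1], set())
    if op == "not":
        return not truth(M, m, h, phi[1])
    if op == "and":
        return truth(M, m, h, phi[1]) and truth(M, m, h, phi[2])
    if op == "box":               # settled: true on every history through m
        return all(truth(M, m, h2, phi[1]) for h2 in M["hist"] if m in h2)
    if op == "X":                 # next moment on the current history
        return truth(M, h[h.index(m) + 1], h, phi[1])
    if op == "Y":                 # previous moment; false at the root
        i = h.index(m)
        return i > 0 and truth(M, h[i - 1], h, phi[1])
    if op == "do":                # act labels each index with a profile
        return M["act"][(m, h)][phi[1]] == phi[2]
    if op == "dev":               # simplification: dev holds bare action names
        return phi[2] in M["dev"].get(m, set())
    raise ValueError(op)

h1, h2 = ("m0", "m1"), ("m0", "m2")
M = {
    "hist": [h1, h2],
    "act": {("m0", h1): {1: "bt"}, ("m0", h2): {1: "bh"}},
    "dev": {"m0": {"bh"}},        # betting heads is deviant at m0 (stipulated)
    "val": {"win": {("m1", h1)}},
}
```

Note that the □ clause quantifies over H_m while X and Y walk along the current history, mirroring the clauses above; the sketch does not enforce the frame conditions of Definitions 3 and 4, which a full implementation would check separately.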
The proof of the following theorem can be found in Appendix A.

Theorem 1 The axiom system SLD_n, defined by the axioms and rules in Table 2, is sound and complete with respect to the class of all SLD_n frames.

The axioms for do are a reformulation, in L_SLD_n, of the main axioms of the Dynamic Logic of Agency (DLA) proposed by [24]. Axioms Act (for "Active") and Sin (for "Single") say that every agent performs one, and only one, action at every index. Axiom UH expresses no choice between undivided histories: if a group of agents performs an action that does not rule out that ϕ is true at the next moment, then there is some history consistent with the group action on which ϕ is true at the next moment. Axiom IA expresses independence of agents: if the individual actions a_1, . . . , a_n can be performed separately, then these actions can also be performed jointly.
Finally, the axioms in the last group express the fact that the dev function is moment-relative (axiom Ax1) and satisfies the conditions of executability of deviant actions (axiom Ax2), availability of non-deviant actions (axiom Ax3), and (in)determinism of choice rules (axiom Ax4).

Adding Counterfactuals
In this section, we extend L_SLD_n with formulas of the form ϕ □→ ψ with the interpretation "if ϕ were true, then ψ would be true". Let L_SLD_n^□→ be the full language. We aim to provide a semantics for L_SLD_n^□→ based on SLD_n frames. Our starting point is the well-known possible world semantics for counterfactuals due to Stalnaker [46] and Lewis [30]: (*) A counterfactual ϕ □→ ψ is true at a world w just in case either (i) there is no ϕ-world accessible from w (the vacuous case), or (ii) some world satisfying ϕ ∧ ψ is more similar to w than any world satisfying ϕ ∧ ¬ψ.
The fundamental notion is a relative similarity relation between possible worlds, which [30] takes to be a weak ordering (a transitive relation in which ties are permitted but any two worlds are comparable) satisfying the centering condition (any world is more similar to itself than any other world).
There are two key questions that arise when adapting the above definition to our semantics: What should take the place of possible worlds as arguments of the relative similarity relation? What properties does the relative similarity relation satisfy? There is an extensive literature about the second question; see, e.g., [6]. While the properties we consider in this paper are not uncontroversial, our semantics for choice-driven counterfactuals takes into account some core issues from this literature. Our aim is to: 1. study the implications of these issues in our stit framework (Sections 3.1 and 3.2); and 2. explore some of the additional issues that arise when evaluating choice-driven counterfactuals after some agents do not follow their default choice behavior (Section 4).
We start by addressing the first question, about the definition of relative similarity in our framework.
In the Lewis-Stalnaker semantics, possible worlds are treated as unanalyzed entities. By contrast, in our framework formulas are interpreted at a moment on a history, where the latter represents everything that happened in the past and everything that will happen in the future. From a logician's perspective, since Lewis defines relative similarity as a three-place relation on possible worlds and indices (i.e., moment-history pairs) are the analogue of possible worlds in an SLD_n frame, relative similarity should be defined as a three-place relation over indices. However, when scholars in the Lewisian tradition try to put flesh on the bones of Lewis's abstract relative similarity relation, they typically think of possible worlds as evolving over time (as histories) and not as momentary states (as moment-history pairs). This squares, too, with the analysis of Example 1 we suggested in Section 2: In order to determine the truth value of (C1) If David had bet tails, then he would still have lost we consider histories that differ minimally from the actual one where it is true, at the time of utterance, that David bet tails and check whether, at that time, it is true that David loses. From this perspective, it makes sense to introduce a relative similarity relation between histories (rather than indices). We will see below that, granted some additional assumptions, both perspectives can be accommodated.
Taking the more philosophical stance and following the intuitive analysis of Example 1, let us supplement SLD_n frames with a relative similarity function ⪯ : Hist → 2^(Hist×Hist) that assigns to every history h a relative similarity relation ⪯_h, where, for all h, h_1, h_2, h_1 ⪯_h h_2 means "h_1 is at least as similar to h as h_2". Let a relative similarity SLD_n frame be a tuple ⟨T, act, dev, ⪯⟩ such that ⟨T, act, dev⟩ is an SLD_n frame and ⪯ a relative similarity function. A relative similarity SLD_n model is a tuple ⟨T, act, dev, ⪯, π⟩ where ⟨T, act, dev, ⪯⟩ is a relative similarity SLD_n frame and π is a valuation function (as in Definition 6). Recall that, for any moment m, t_m is the instant to which m belongs (the time of m). When a formula is evaluated at m/h, we call t_m the time of evaluation. The following definition is the analogue of the Lewis-Stalnaker semantics for counterfactuals (*): Definition 8 (Truth for counterfactuals) M, m/h |= ϕ □→ ψ iff either (i) there is no h′ ∈ Hist such that M, t_m/h′ |= ϕ, or (ii) there is an h_1 ∈ Hist such that M, t_m/h_1 |= ϕ ∧ ψ and, for all h_2 ∈ Hist, if M, t_m/h_2 |= ϕ ∧ ¬ψ, then h_1 ≺_h h_2 (where h_1 ≺_h h_2 iff h_1 ⪯_h h_2 and not h_2 ⪯_h h_1). Accordingly, a counterfactual is true at an index m/h just in case the consequent is true, at the time of evaluation t_m, on all histories that differ minimally from h where the antecedent is true at t_m (if there are any histories on which the antecedent is true at t_m). We are thus assuming that the truth values of ϕ and ψ at indices not occurring at the time of evaluation do not affect the truth value of ϕ □→ ψ. This reflects the idea that, when we reason from a counterfactual supposition, we reason about what would happen if the supposed proposition were true now; see [49, p. 68]. More generally, the tense used in the antecedent and the consequent of a counterfactual is a source of indexicality: it points to a specific time (past or future) with respect to the time of utterance. A semantics for counterfactuals should be able to identify this specific time. Our semantics does this by first fixing the time of evaluation and then interpreting the temporal operators occurring in the antecedent and consequent.
A few definitions will clarify the connection between Definition 8 and the Lewis-Stalnaker semantics ( * ). For any index m/h in a similarity SLD n model T, act, dev, ≼, π , let m 1 /h 1 ≼ m/h m 2 /h 2 hold just in case m 1 /h 1 is accessible from m/h and h 1 ≼ h h 2 . That is, m 1 /h 1 is at least as similar to m/h as m 2 /h 2 just in case m 1 /h 1 is accessible from m/h and h 1 is at least as similar to h as h 2 . The evaluation rule for □→ in Definition 8 can then be rewritten in the standard form of the evaluation rule for counterfactuals, replacing possible worlds with indices. Rewriting Definition 8 in this way reveals a key assumption underlying our semantics for counterfactuals, namely that the time of evaluation does not affect the relation of relative similarity between histories: if h 1 is at least as similar to h as h 2 , then this is true no matter what time it is. Call this assumption ( * * ). It is a substantial assumption, and it contrasts with condition 2.3 of Thomason and Gupta [49], which gives absolute priority to similarity with respect to the past:

"This informal principle is to be intended as strongly as possible: if h 3 up to m 3 is even a little closer to h 1 up to m 1 than is h 2 up to m 2 , then m 3 /h 3 is closer to m 1 /h 1 than m 2 /h 2 is, even if h 2 after m 2 is much closer to h 1 after m 1 than h 3 after m 3 . Any gain with respect to the past counts more than even the largest gain with respect to the future." [Notation adapted.]

Consider the DBT structure in Fig. 3. Condition 2.3 implies that t 2 /h 2 is more similar to t 2 /h 1 than t 2 /h 3 , even if t 1 /h 2 and t 1 /h 3 may well be equally similar to t 1 /h 1 . This is excluded by our assumption ( * * ), according to which, if t 2 /h 2 is more similar to t 2 /h 1 than t 2 /h 3 , then t 1 /h 2 must be more similar to t 1 /h 1 than t 1 /h 3 . The acceptance or rejection of Thomason and Gupta's [49] condition 2.3 influences the logic of counterfactuals. We come back to this issue in Section 3.2.

Similarity Defined
In this Section, we say more about the properties that our relative similarity relation ≼ h should satisfy. We gradually introduce two candidate definitions of relative similarity in SLD n frames. The first definition is based on Lewis's [31] criteria for determining similarity and gives rise to what we call rewind models. The second definition, based on well-known counterexamples to Lewis's criteria [44, p. 27, fn. 33], incorporates the idea that a notion of (in)dependence is key to a semantics of counterfactuals, giving rise to what we call independence models.
We start with Lewis's [31, p. 472] first criterion of similarity: "It is of the first importance to avoid big, widespread, diverse violations of law".
Lewis has in mind mainly causal or physical laws, but the notion of law in the above quote can also be understood in terms of choice rules. The suggestion is that a history h 1 is more similar to a history h than another history h 2 if fewer deviations from the agents' default choice behavior occur on h 1 than on h 2 . For any history h, let n dev(h), the number of deviations on h, be the number of indices on h at which some agent performs a deviant action. This yields a first candidate analysis (Analysis 1): h 1 ≼ h h 2 just in case n dev(h 1 ) ≤ n dev(h 2 ). Our first observation in this Section is that our definition of similarity requires additional constraints that go beyond Analysis 1. To see this, consider again Example 1 and its representation in Fig. 2. Recall that the actual history is h 2 : after nominating Max, David bets heads and Max flips the T-coin, so David loses. Let L be the proposition that David loses (so, L is true at instant t 3 on h 2 , h 3 , h 6 , h 7 ). Intuitively, the counterfactual C1 is true at m 4 /h 2 . The counterfactual C1 is expressed by the following formula of L SLD n :

(F 1) Ydo(bt 1 ) □→ L ("If David had bet tails, then he would still have lost").
It is not hard to see that Definition 8 and Analysis 1 would evaluate F 1 as false. The histories on which Ydo(bt 1 ) is true at the time of evaluation t m 4 = t 3 are h 3 , h 4 , h 7 , and h 8 . Among these histories, the ones with the fewest deviations are h 3 , h 7 , and h 8 (in fact, no deviant action is performed on these histories). So, according to Analysis 1, h 3 , h 7 , and h 8 are the most similar histories to h 2 on which Ydo(bt 1 ) is true at t 3 . But ¬L rather than L is true on h 8 at t 3 . So, if we compare histories only in terms of the number of deviations as in Analysis 1, then F 1 turns out to be false at m 4 /h 2 . The problem with Analysis 1 is that it ignores the fact that a "small miracle" [31, p. 478] (or a "surgical intervention" [36, p. 239]) at m 4 /h 2 suffices to reach h 3 from h 2 , while a substantial change in the past is needed to reach h 7 and h 8 . This suggests that the greater past overlap between h 3 and h 2 is more important than the smaller number of deviations on h 7 and h 8 .
Given the condition of past linearity, the past overlap between two histories h 1 and h 2 is their intersection h 1 ∩ h 2 , which is an initial segment of both. This leads to a straightforward modification of Analysis 1 (Analysis 2): h 1 ≼ h h 2 just in case either the past overlap of h 1 with h properly includes that of h 2 , or the two overlaps coincide and n dev(h 1 ) ≤ n dev(h 2 ).
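To make the contrast between the two analyses concrete, here is a small Python sketch of our own (the numeric encoding of Example 1's histories, via an overlap length and a deviation count relative to the actual history h 2 , is a hypothetical illustration, not the paper's official frame):

```python
# Hypothetical summary of the candidate histories of Example 1 (Fig. 2):
# for each history on which the antecedent is true, the length of its past
# overlap with the actual history h2 and its number of deviations.
histories = {
    "h3": {"overlap": 3, "ndev": 0},
    "h4": {"overlap": 3, "ndev": 1},
    "h7": {"overlap": 1, "ndev": 0},
    "h8": {"overlap": 1, "ndev": 0},
}

def closest_analysis1(hs):
    """Analysis 1: compare histories only by their number of deviations."""
    best = min(h["ndev"] for h in hs.values())
    return {name for name, h in hs.items() if h["ndev"] == best}

def closest_analysis2(hs):
    """Analysis 2: greater past overlap first, then fewer deviations."""
    key = lambda h: (-h["overlap"], h["ndev"])
    best = min(key(h) for h in hs.values())
    return {name for name, h in hs.items() if key(h) == best}

print(sorted(closest_analysis1(histories)))  # ['h3', 'h7', 'h8']
print(sorted(closest_analysis2(histories)))  # ['h3']
```

Analysis 1 keeps h 8 among the closest histories (where David wins), which is why F 1 comes out false; Analysis 2 lets past overlap take priority and selects h 3 alone.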

Remark 1
The criterion of past overlap is the second criterion for determining similarity between histories proposed by [31]. There are well-known criticisms of this criterion. Suppose you left your jacket on a chair in a café. Consider the counterfactual "If my jacket had been stolen, then it would have been stolen right before I left". Since the histories on which your jacket was stolen one moment ago have the greatest past overlap with the current history, the past overlap criterion implies that this counterfactual is true. This is a clearly counterintuitive consequence of past overlap. However, this issue arises when evaluating a counterfactual whose antecedent includes an arbitrary past operator. The closest we can come to expressing this counterfactual in our language is "If my jacket had been stolen n moments ago, then it would have been stolen one moment ago," which is clearly false when n > 1. In this paper we assume the Lewisian analysis and leave a full discussion of this problem for future work. In doing this, we follow previous work on the semantics of counterfactuals in the context of branching time [38, 52], where a relative similarity relation between histories is defined in terms of the past overlap criterion. Unlike the present paper, these papers do not consider any other criterion of similarity.
Analysis 2 delivers the correct evaluation of F 1 at m 4 /h 2 : histories h 3 and h 4 are more similar to h 2 than h 7 and h 8 , because their past overlap with h 2 is greater. In turn, history h 3 is more similar to h 2 than h 4 because there are fewer deviations on h 3 than on h 4 .

An SLD n frame representing Example 2 is depicted in Fig. 4, where the labels and shadings are read as in Fig. 2. Intuitively, F 2 is true at m 2 /h 1 . But Analysis 2 and Definition 8 do not vindicate this judgement. The histories on which Max flips the H-coin at t m 2 = t 2 are h 2 , h 3 , h 6 , and h 7 . Histories h 2 and h 3 have a greater past overlap with h 1 than h 6 and h 7 , so the latter two histories can be discarded. In turn, since the number of deviations on h 2 is the same as the number of deviations on h 3 , h 2 and h 3 are equally similar to h 1 . Yet, L rather than ¬L is true on h 3 at t 2 . Given Definition 8, it follows that David might win, a weaker conclusion than the desired one. The problem is that, even though h 2 and h 3 have the same past overlap with h 1 as well as the same number of deviations, more agents need to change their actions to reach h 3 than h 2 (in this sense the change required to reach h 3 is not minimal). This suggests that the smaller change making h 2 branch off from h 1 is more important than the equal number of deviations on h 2 and h 3 . (The importance of fixing the actions of as many agents as possible when evaluating a counterfactual in a stit model is already emphasized by Horty [25, Chapter 4], who uses this criterion to define a selection function that picks, for every index m/h, agent i, and action (token) K available to i at m, the most similar histories to h where i performs K. Since he is only interested in counterfactuals of the form "if agent i performed (now) a different action, then ϕ would be true," [25] does not consider other criteria of similarity.)

Given two histories h 1 and h 2 , say that h 1 and h 2 divide at moment m if m is the last moment they share, i.e., m ∈ h 1 ∩ h 2 and succ h 1 (m) ≠ succ h 2 (m). When h 1 and h 2 divide at moment m, let n sep(h 1 , h 2 ) be the number of agents that, by performing different actions on h 1 and h 2 at moment m, make h 1 and h 2 divide at m. When h 1 and h 2 never divide (i.e., h 1 = h 2 ), let n sep(h 1 , h 2 ) = 0. Putting everything together, we have our first definition of similarity (Definition 9): h 1 ≼ h h 2 just in case h 1 does at least as well as h 2 when the two histories are compared, lexicographically, by past overlap with h first, then by the number of separating agents, and finally by the number of deviations. We will call rewind model any similarity model T, act, dev, ≼ R , π , where ≼ R is defined as in Definition 9.

Definition 9 encodes a substantial assumption about how we let a scenario unfold under the supposition that the antecedent of a counterfactual is true. To see this, let us go back to our initial Example 1 (cf. also Fig. 2, p. 12), but suppose that the actual history is h 6 instead of h 2 : after nominating Maxine, David bets heads and Maxine happens to flip the T-coin, so David loses. What if David had bet tails? Would he have won? There are two ways to answer this question.

(1) Rewind History: When we suppose that David bet differently, we rewind the course of events to the moment when David bets (m 3 ), intervene on his choice, and then let the future unfold according to the agents' default choice behavior. Since there is no choice rule constraining Maxine's flip, we only conclude that David might win. This is the conclusion we reach by applying Definition 9.

(2) Assume Independence: When we suppose that David bet differently, we intervene on his choice while holding fixed everything that is independent of it, in particular Maxine's flipping of the T-coin, and conclude that David would have won.

To make the reasoning in (2) precise, we need to identify all the events that are independent of David's choice. In stit, we can think of events as actions performed by agents (possibly treating Nature as an agent). This allows us to use our distinction between constrained and unconstrained agents to capture the reasoning in (2): the unconstrained agents whose default choice behavior is not constrained by a choice rule at a moment are precisely those whose actions at that moment are independent of the actions performed at previous moments (e.g. David betting). To account for the Assume Independence intuition, we supplement Definition 9 with a further requirement on unconstrained agents. Recall that an agent i is unconstrained at a moment m when none of the actions available to her at m is deviant (cf. Section 2.2). The set of agents unconstrained at moment m is thus defined as:

{i ∈ Ag(m) | Acts m i ∩ dev(m) = ∅}.

In addition, for any index m/h, let act(m/h) = {act(m/h)(i) | i ∈ Ag(m)} be the set of actions performed at m/h.
Then, for any histories h 1 and h 2 , the number of independent events n indep(h 1 , h 2 ) counts, for every instant t, the number of agents unconstrained at t on both h 1 and h 2 that act in the same way on these histories. (To account for the reasoning in (2) in the context of branching time, Thomason and Gupta [49] impose constraints of "causal coherence" on their models. Yet, they acknowledge that this move adds a substantial layer of complexity to their theory. With a similar aim but in the context of branching space-time, Placek and Müller [38] define "independence" as space-like separation. Yet, they acknowledge that this kind of independence is hardly realized in everyday situations like the betting scenarios of our examples. The possibility of distinguishing constrained and unconstrained agents provides us with a convenient way to get around these difficulties.)

Let us illustrate the previous definitions with Fig. 2. Assume that the vacuous choices of agent i ∈ {1, 2, 3} are all labeled with vc i . We then have the following:

• Ag(m k ) = {1, 2, 3} for k ∈ {1, 2, 3, 6, 7} and Ag(m j ) = {1, 3} for j ∈ {4, 5}.

Our second definition of similarity (Definition 10) refines our first definition by incorporating the assumption of independence discussed in item (2): histories are now compared, lexicographically, by past overlap first, then by the number of separating agents, then by the number of independent events (where preserving more independent events makes a history more similar), and finally by the number of deviations. We will call independence model any similarity model T, act, dev, ≼ I , π , where ≼ I is defined as in Definition 10. In the following, we will use ≺ for elements of {≺ R , ≺ I } and ≼ for elements of {≼ R , ≼ I }.
Definition 10 delivers the correct analysis of Example 2: although h 2 and h 3 overlap the same initial segment of h 1 , at m 2 both David and Maxine act in the same way on h 2 and h 1 , while Maxine changes her behavior on h 3 . Hence, h 2 is more similar to h 1 than h 3 . Since ¬L is true on h 2 at t 2 , it follows that F 2 is true at m 2 /h 1 . (The reason why n indep is defined over all instants rather than a single instant or a set of relevant instants is that our relative similarity relation compares histories "globally"; see the discussion on pp. 17-18. Note also that this analysis essentially relies on the assumption that Maxine has two choices: she can pick the H-coin or pick the T-coin. If Maxine tossed a fair coin instead of choosing between the H-coin and the T-coin, the example would be different, since Maxine would have a single choice with indeterministic outcomes instead of two choices with deterministic outcomes. So, unless the coin itself were modeled as an unconstrained agent, i.e., unless we treated Nature as an agent, our analysis would be different.)
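Abstracting from the formal details, the two orderings can be pictured as lexicographic comparisons of numerical profiles. The following Python sketch is our own illustration (the Profile fields and the numbers are hypothetical summaries, not the paper's official definitions):

```python
from dataclasses import dataclass

@dataclass
class Profile:
    # Hypothetical summary of how a candidate history relates to the actual history h.
    overlap: int   # length of the past overlap with h (longer = more similar)
    n_sep: int     # agents separating the history from h (fewer = more similar)
    n_indep: int   # preserved independent events (more = more similar)
    n_dev: int     # deviations performed on the history (fewer = more similar)

def rewind_key(p: Profile):
    """Definition 9 (sketch): past overlap, then separating agents, then deviations."""
    return (-p.overlap, p.n_sep, p.n_dev)

def independence_key(p: Profile):
    """Definition 10 (sketch): as in Definition 9, but the number of preserved
    independent events is compared before the number of deviations."""
    return (-p.overlap, p.n_sep, -p.n_indep, p.n_dev)

# Two histories that tie on overlap, separation, and deviations but differ in
# how many independent events they preserve (illustrative numbers):
p2 = Profile(overlap=2, n_sep=1, n_indep=3, n_dev=0)
p3 = Profile(overlap=2, n_sep=1, n_indep=2, n_dev=0)

print(rewind_key(p2) == rewind_key(p3))            # True: Definition 9 ties them
print(independence_key(p2) < independence_key(p3)) # True: Definition 10 prefers p2
```

Smaller keys mean "more similar"; sorting the antecedent-histories by these keys and taking the minima mirrors the selection of the most similar histories in the evaluation of a counterfactual.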

Logical Properties
The following are some immediate consequences of Definitions 9 and 10. Recall that, for any index m/h from a similarity SLD n model, the set of indices accessible from m/h consists of the indices occurring at the instant t m .

Proposition 1
Suppose that T, act, dev, ≼, π is either a rewind model or an independence model.
The following is a straightforward corollary of Proposition 1:

Proposition 2
The following axioms and rule are valid and truth preserving in any rewind model (resp. independence model):

More interestingly, the principles in the next proposition reflect the interaction between counterfactuals and temporal modalities.

Proposition 3
The following principles are valid in any rewind model (resp. independence model).

Corollary 2
The following principles are theorems of the axiom system obtained by extending SLD n with the principles in Proposition 2, Cen1, and Cen2:

Proof Straightforward given Cen1, Cen2, and the fact that □ is an S5 modality.
The validity of the distribution principles Dis X and Dis Y depends on the assumption that the time of evaluation does not affect the relation of relative similarity between histories. In fact, since the most similar histories to a history h up to the present time t are the same as the most similar histories to h up to one instant after t, the most similar histories to h on which Xϕ is true at t must be the same as the most similar histories to h on which ϕ is true one instant after t (similarly for Yϕ).
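In the notation used here (with the box arrow as our rendering of the counterfactual), the direction of the distribution principles at issue in the counterexamples discussed below can be reconstructed as:

```latex
(\mathrm{Dis}_X)\quad X(\varphi \mathbin{\Box\!\!\to} \psi) \to (X\varphi \mathbin{\Box\!\!\to} X\psi)
\qquad
(\mathrm{Dis}_Y)\quad Y(\varphi \mathbin{\Box\!\!\to} \psi) \to (Y\varphi \mathbin{\Box\!\!\to} Y\psi)
```

That is, a counterfactual evaluated one instant later (earlier) distributes over the temporal operator, which is exactly what the time-invariance of relative similarity guarantees.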
Interestingly, the condition 2.3 from [49] (see p. 18) makes it possible to find counterexamples to Dis X and Dis Y . To see this, let us go back to Fig. 3. Recall that, according to condition 2.3, t 2 /h 2 is more similar to t 2 /h 1 than t 2 /h 3 . Assume that t 1 /h 2 and t 1 /h 3 are equally similar to t 1 /h 1 and that p is true only at t 2 /h 2 and t 2 /h 3 while q is true only at t 2 /h 2 . Since q is true at the most similar index to t 2 /h 1 at which p is true (i.e., t 2 /h 2 ), p □→ q is true at t 2 /h 1 , and so X(p □→ q) is true at t 1 /h 1 . On the other hand, since ¬Xq is true at one of the most similar indices to t 1 /h 1 at which Xp is true (i.e., t 1 /h 3 ), Xp □→ Xq is false at t 1 /h 1 .
Thomason and Gupta [49, pp. 70-71] rely on a variant of Example 1 to support the claim that Dis X and Dis Y should not come out as logical validities. In their version of the example, Max and David are the only agents, the game starts with David's bet (at t 2 in Fig. 2) and ends after Max flips either the T-coin or the H-coin. So we can depict their example as in Fig. 2, ignoring histories h 5 , h 6 , h 7 , and h 8 and moments occurring before time t 2 . As in Example 1, Max flips the coin that guarantees that David loses. In addition, the actual history is h 2 : David bets heads and Max flips the T-coin. Now, let L be the proposition "David loses at time t 3 " (so, L is true at all moments on histories h 2 and h 3 ). According to [49], the counterfactual (A) do(bt 1 ) □→ L ("If David bets tails, he would lose at t 3 ") is intuitively true at t 2 /h 2 , i.e., at the beginning of the game on the actual history. Hence, Y(do(bt 1 ) □→ L) is true at t 3 /h 2 . On the other hand, the authors take the counterfactual (B) Ydo(bt 1 ) □→ YL ("If David had bet tails, he would have lost at t 3 ") to be intuitively false at t 3 /h 2 , i.e., at the end of the game on the actual history. If this is correct, then the implication Y(do(bt 1 ) □→ L) → (Ydo(bt 1 ) □→ YL) is false at t 3 /h 2 , that is, the principle Dis Y is not intuitively valid. We disagree with Thomason and Gupta's judgement about B. Given Max's choice rule, at the end of the game it would be perfectly natural to explain to David: "Well, if you had bet tails, you would still have lost". We think that the problem stems from a confusion between the time of evaluation and the time to which the antecedent of a counterfactual refers. In discussing the present example, Thomason and Gupta seem to take it that, in reasoning from a counterfactual supposition, we hold fixed as many past facts as possible up to the time of evaluation (t 2 in the case of A and t 3 in the case of B). But, as most scholars think (cf.
[6, Chapter 12]), what we intuitively do is rather to hold fixed as many past facts as possible up to the time to which the antecedent refers (t 2 for both A and B). It then makes sense that relative similarity between histories is not affected by the time of evaluation: what is important is just that the longer a history h 1 overlaps another history h, the more similar h 1 is to h.
Turning to Cen1 and Cen2, the validity of these principles follows from the priority of the criterion of past overlap: if ϕ can be true at a moment, then supposing that ϕ is true does not require shifting to a different moment. (Compare the reasoning behind the validity of Cen: if ϕ is true at an index, then supposing that ϕ is true does not require moving to a different index).
Items 1 and 2 in Corollary 2 highlight an interesting interaction between counterfactuals and historical necessity. In particular, item 2, which we discuss below, can be viewed as a principle of "exportation" of □ from □→ .
The validities we have considered so far do not depend on whether we work with rewind models or with independence models. The next proposition involves a formula that distinguishes the two classes of models.

Before stating it, two observations about the preceding discussion are in order. First, observe that Thomason and Gupta's [49] condition 2.3 does not exclude the possibility of defining a similarity relation between the indices from Fig. 2 such that t 2 /h 3 is the most similar index to t 2 /h 2 where do(bt 1 ) is true and t 3 /h 4 is the most similar index to t 3 /h 2 where Ydo(bt 1 ) is true. Given such a similarity relation, A turns out to be true at t 2 /h 2 while B turns out to be false at t 3 /h 2 , in accordance with the authors' intuitive judgement. Our property ( * * ) does not allow us to define a similarity relation of this sort: according to it, t 2 /h 3 is the most similar index to t 2 /h 2 where do(bt 1 ) is true if and only if t 3 /h 3 is the most similar index to t 3 /h 2 where Ydo(bt 1 ) is true. Second, it is worth noting that, if we kept fixed as many past facts as possible up to the time of evaluation, B would be false, no matter whether Max flips the T-coin by chance or because his default choice behavior is to make David lose. Yet, intuitively, we judge B false only in the former case (recall the reasoning underlying the Rewind History and Assume Independence attitudes).

Proposition 4
The following principle, Exp □ , is valid in any rewind model but invalid in some independence models.
Using item 2 in Corollary 2 and Exp □ we can show that (ϕ □→ □ψ) → □(ϕ □→ ψ) is valid in the class of rewind models. The validity of this principle can also be proved directly from Definition 9, which ensures that the most similar ϕ-histories (i.e., histories on which ϕ is true at the time of evaluation) to histories passing through a moment pass through the same moments. Note that the converse implication is not valid: suppose that we scheduled a lecture on Tuesday at 1pm and our default choice behavior is to follow the schedule. Then, "If I were not sick, I would be teaching" is settled true on Tuesday at 1pm, even though "If I were not sick, it would be settled that I would be teaching" may be false (e.g., because there is a possibility that my bike breaks down on the way to school).
To see why the addition of the criterion regarding the number of independent events leads to the invalidity of Exp □ , consider another example.

Example 3
Suppose that there is a basket containing an apple, a banana, an orange, and a grapefruit on a table. Next to the basket there is a jar containing three pieces of paper with the choices orange+grapefruit, orange+apple, and grapefruit+banana written on them. Bob can pick one piece of paper and is given the fruits written on it. After Bob makes his choice, Ann can pick one of the remaining fruits from the basket. Assume that Bob picks the orange+grapefruit-paper and Ann picks the banana. In Fig. 5, Bob is agent 1 and his non-vacuous choices are og 1 (pick the orange+grapefruit-paper), oa 1 (pick the orange+apple-paper), and gb 1 (pick the grapefruit+banana-paper). Ann is agent 2 and her non-vacuous choices are a 2 (pick the apple), b 2 (pick the banana), g 2 (pick the grapefruit), and o 2 (pick the orange). The actual history (thick line) is h 2 . In our terminology, both Bob and Ann are unconstrained agents: none of their actions are deviant. At m 2 , there are no citrus fruits in the basket. But what if there were? According to Definition 10, the most similar history to h 2 satisfying this condition is h 3 , where Bob picks the orange+apple-paper and Ann picks the banana, as she does at m 2 /h 2 . At t 2 /h 3 it is settled that Ann can pick a banana, so "If there was a citrus fruit in the basket, it would be settled that Ann could pick a banana" is true at m 2 /h 2 . But consider the index m 2 /h 1 where Ann picks the apple instead of the banana. Again, what if there was a citrus fruit in the basket? Reasoning as before, the most similar history to h 1 satisfying this condition is h 5 , where Bob picks the grapefruit+banana-paper and Ann picks the apple.
Since there is no banana in the basket at t 2 /h 5 , "If there was a citrus fruit in the basket, Ann could pick a banana" is false at m 2 /h 1 , and so "It is settled that, if there was a citrus fruit in the basket, Ann could pick a banana" is false at m 2 /h 2 .

Example 3 is illustrated in Fig. 5.
To conclude this section, let us highlight a potential problem for our proposal emerging from Fig. 5. We have seen that, according to Definition 10, h 3 is the most similar history to h 2 on which Bob does not choose the orange+grapefruit-paper. So, "If Bob had picked a different piece of paper, then Ann would have picked the banana" is true at m 2 /h 2 . But this is a counterintuitive conclusion: if Bob had picked a different piece of paper, he might have picked the grapefruit+banana-paper, in which case Ann could not even pick a banana! We view this as a modeling issue: since choosing a banana over an apple is not the same type of choice as choosing a banana over a grapefruit, the two choices should not be labeled the same way (see the discussion of menu dependence in rational choice theory [21, 28, 40]). If we change the labeling, then the weaker (and unproblematic) "If Bob had picked a different piece of paper, then Ann might have picked the banana" is true at m 2 /h 2 . (To be sure, suppose that we label Ann's choice at t 2 /h 3 as b' 2 , choosing a banana over a grapefruit, instead of b 2 , choosing a banana over an apple. In addition, for simplicity, assume that every agent i has a vacuous choice vc i at all moments after t 2 . Then, it is not difficult to see that histories h 3 , h 4 , h 5 , and h 6 are equally similar to h 2 : these histories have the same past overlap with h 2 (they all branch off from h 2 at m 1 ); the same number of agents make them branch off from h 2 (namely 1, i.e., Bob); the same number of independent events occur on them (namely the events corresponding to the agents' vacuous choices); finally, the same number of deviant actions are performed on them (namely 0). Since these are all the histories on which Bob picks a different piece of paper at t 1 , and Ann picks a banana only on h 3 , we indeed conclude that, if Bob had picked a different piece of paper, then Ann might have picked a banana; the unwanted conclusion that Ann would have picked a banana does not follow. Of course, according to this reasoning, we should also replace the label a 2 at t 2 /h 5 with a' 2 .) This suggests the introduction of the following condition: for all i ∈ Ag and m, m' ∈ Mom,

1. Identity of Overlapping Menus: if Acts m i ∩ Acts m' i ≠ ∅, then Acts m i = Acts m' i .

According to this condition, if an agent has the same type of choice available at two different moments, then the menus of alternative choices available to the agent at those moments must be the same. The model in Fig. 5 does not satisfy this condition because Ann has two different but overlapping menus at m 2 and m 3 , that is, {a 2 , b 2 } and {b 2 , g 2 } respectively. Interestingly, as proved in Appendix B, Exp □ remains invalid in the class of independence models satisfying the condition of identity of overlapping menus. In fact, the countermodel presented there satisfies a stronger condition: for all m, m' ∈ Mom,

2. Uniformity of Menus: if t m = t m' , then Acts m = Acts m' .
While the condition of identity of overlapping menus is a desirable condition, the condition of uniformity of menus is not: as illustrated by Example 3, depending on what happens at a moment, different actions may become executable in the future.
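Operationally, identity of overlapping menus is a simple check on the assignment of menus to moments. Here is a minimal Python sketch of our own (the `menus` encoding, mapping each moment to each agent's set of available action types, and the labels are hypothetical):

```python
from itertools import combinations

def identity_of_overlapping_menus(menus):
    """Check: whenever an agent's menus at two moments overlap, they coincide."""
    agents = {i for menu in menus.values() for i in menu}
    for i in agents:
        agent_menus = [acts[i] for acts in menus.values() if i in acts]
        for a, b in combinations(agent_menus, 2):
            if a & b and a != b:  # overlapping but distinct menus
                return False
    return True

# Ann's menus at m2 and m3 in Fig. 5 overlap on b2 but differ,
# so the original labeling violates the condition:
menus = {
    "m2": {"ann": {"a2", "b2"}},
    "m3": {"ann": {"b2", "g2"}},
}
print(identity_of_overlapping_menus(menus))  # False

# Relabeling the choice at m3 as a distinct action type restores it:
menus["m3"]["ann"] = {"b2_prime", "g2"}
print(identity_of_overlapping_menus(menus))  # True
```

This mirrors the relabeling proposed above: once choosing a banana over a grapefruit gets its own label, no two distinct menus overlap.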

A Refinement: From Independence to Influence
The definitions of similarity we introduced in the previous Section differ in how they treat choices of unconstrained agents. Definition 10 can be understood as fixing the choices of unconstrained agents when reasoning about counterfactual situations. On the other hand, Definition 9 does not keep track of the actions of unconstrained agents on the actual history. Despite this difference, a crucial assumption that both definitions of similarity rely on is that the evaluation of choice-driven counterfactuals depends on the default choice behavior of the agents. Do these definitions still make sense when evaluating a choice-driven counterfactual on a history where one or more agents behaved deviantly in the past? Should we ignore any past deviation from default choice behavior or take it into account when evaluating a choice-driven counterfactual? Consider the following variant of our running example.

According to either Definition 9 or Definition 10, C2 is true at t 2 /h 1 : the most similar history to h 1 on which David bets heads during the second game is h 5 , where XXL is true at t 2 . It is not clear that this is the correct judgement about C2, given that Max mistakenly flipped the fair coin in the first game. The main issue is that neither definition of similarity takes into account the fact that the counterfactual is evaluated at a history along which Max acted deviantly. This raises a question about what Max would do in the second game. There are different ways to answer this question:

1. Forget that Max's actual choice was deviant and assume that he is still constrained by his choice rule (i.e., he would flip the coin that makes David lose).
2. Assume that Max would make the same mistake and flip the fair coin.
3. Assume that Max would make a mistake, but we cannot tell which one (e.g., he might flip the fair coin or the tails coin).
4. Assume that Max is no longer a constrained agent, so the only conclusion we can draw is that Max might flip any of the available coins.
Without further details about why Max made the deviant choice in the first game, it is not clear which of the above options is best. Perhaps Max made a fleeting mistake and there is no further explanation, which would suggest that option 1 is the best. There might be a systematic problem with the coins (e.g., they are labeled incorrectly), which would suggest that either option 2 or option 3 is the best. Finally, option 4 is best if Max's deviant action is some type of signal that he is no longer being guided by his choice rule.
Remark 2
Counterfactuals like C2 play an important role in the analysis of strategic reasoning in game theory [7, 10, 39, 41, 43, 45, 54]. A central question in this literature is: what do players expect their opponents to do if an unexpected point in the game tree is reached? One answer (forward induction) is that players rationalize past behavior and use it as a basis for forming beliefs about future moves [3, 4, 47]. A second answer (backward induction) is that players ignore past behavior and reason only about their opponents' future moves [1, 9, 37, 47]. These different answers roughly correspond to the four options listed above explaining Max's deviant choice. Forgetting that Max made a deviant choice and assuming he will be guided by his choice rule (option 1) is analogous to the assumptions underlying backward induction reasoning (the second answer). The other options can be viewed as different ways to rationalize Max's surprising choice, as in forward induction reasoning (the first answer).
In our framework, option 1 is implicitly assumed in both Definition 9 and Definition 10. Option 4 is best understood as Max transitioning from a constrained to an unconstrained agent, which requires a revision of Max's dev function. We leave the revision of the dev function to future work and suggest a way to represent options 2 and 3.
The reasoning underlying options 2 and 3 can be captured by generalizing Definition 10: When we suppose that David will bet tails, we follow the actual course of events up to the moment when David leaves the game, intervene on his choice by making sure that he will bet tails in the second game, fix all the actions of the unconstrained agents and the fact that Max acted deviantly in the game, and then let the future unfold according to the agents' default choice behavior. The key idea is that Max's deviant choice in the first game overrides his default behavior in the second game by fixing the fact that his choice will be deviant. Similarly, according to Definition 10, the choices of unconstrained agents are held fixed in counterfactual situations.
Both ideas can be captured by adding a relation between agent-moment pairs, where "(i, m) is related to (j, m')" means that i's choice at m influences j's choice at m'. On the one hand, Max's deviant choice at m 1 influences him to make a deviant choice at m 4 . On the other hand, Definition 10 requires that an unconstrained agent's choice at a moment m on a history h influences that agent to make the same type of choice at t m on the most similar histories to h. This leads us to the following definitions. In our example, the set of choices expected of agent 2 at m 4 is Acts m 4 2 ∩ dev(m 4 ) (in line with option 3 above). That is, if 2 chooses deviantly at m 1 , then 2 will choose deviantly at m 4 . Then, n indep * (h 1 , h 5 ) is smaller than each of n indep * (h 1 , h 2 ), n indep * (h 1 , h 3 ), and n indep * (h 1 , h 4 ), since 2 chooses deviantly at m 4 on all of h 2 , h 3 , and h 4 (as 2 does at m 1 on h 1 ) but not at m 4 on h 5 . Hence, histories h 2 , h 3 , and h 4 are more similar to h 1 than h 5 , and so the counterfactual C2 is false at m 2 /h 1 according to Definition 10 using n indep * in place of n indep.
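The effect of the influence relation on the independence count can be sketched in Python as follows (our own illustration, following option 3; the dictionary encoding and the coin labels are hypothetical):

```python
def n_indep_star(expected, history):
    """Count how many influenced choices a history preserves.

    expected: maps (instant, agent) to the set of choices fixed by the
    influence relation there, e.g., for Max after his deviant flip, the set
    of deviant choices available at that instant (option 3).
    history: maps (instant, agent) to the action performed there.
    """
    return sum(
        1
        for key, allowed in expected.items()
        if key in history and history[key] in allowed
    )

# Max (agent 2) deviated in the first game, so at the instant of m4 his
# expected choices are the deviant ones (hypothetical labels):
expected = {("t4", 2): {"flip_fair", "flip_tails"}}

h2 = {("t4", 2): "flip_fair"}   # Max repeats a deviant flip
h5 = {("t4", 2): "flip_heads"}  # Max follows his original choice rule

print(n_indep_star(expected, h2))  # 1: h2 preserves the influenced choice
print(n_indep_star(expected, h5))  # 0: h5 does not
```

Under this count, histories on which Max again chooses deviantly score higher and hence come out more similar, as in the reasoning above.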

Conclusion
In this paper, we studied the semantics and logical properties of choice-driven counterfactuals in a stit logic with action types, instants, and deviant choices. Following Lewis [30], we interpreted counterfactual statements using a relation of relative similarity on histories. We introduced two definitions of similarity motivated by different intuitions about how choice rules guide the agents' actions in counterfactual situations: the Rewind History intuition and the Assume Independence intuition. We showed how to adapt our definitions to situations in which some agents perform a deviant action, and we highlighted the subtle issues that arise when merging a logic of counterfactuals with a logic of branching time and agency.
There are a number of interesting technical questions concerning our full language L^→_SLDn. One question is whether L^→_SLDn is strictly more expressive than L_SLDn over our class of models. For instance, consider the formula ¬ϕ → ⊥, which says that ϕ is true at all indices occurring at the instant of evaluation (cf. [30, p. 22]). Note that, for any index m/h in any model M, there is an n ∈ N such that m ∈ succ^n(m_0). This means that M, m/h |= ¬ϕ → ⊥ iff M, m/h |= Y^n X^n ϕ. Thus, at any index of any model we can find a formula of L_SLDn that is equivalent to ¬ϕ → ⊥ at that index. Of course, n (and, hence, the formula of L_SLDn) varies with the index. This suggests that comparing the expressive power of L^→_SLDn and L_SLDn over our models is not straightforward.
A second question concerns the possibility of a sound and complete axiomatization of rewind (resp. independence) models with respect to our full language. We do have a sound and complete axiomatization of SLD n frames (Definition 4) in a language without counterfactuals (Theorem 1). For our full language, we identified some core validities (Proposition 2 and Proposition 3) and an interesting formula that distinguishes rewind and independence models (Proposition 4). Since our definitions of similarity (Definition 9 and Definition 10) involve counting (deviant) actions along different histories, we expect that a complete axiomatization (if there is one) will require an extension of our language.
Another direction for future research is to explore applications of the logical framework developed in this paper. Branching-time logics with both agency operators and counterfactuals are a powerful tool for reasoning about complex social interactions.
In particular, logics of this sort seem necessary to clarify complex moral and legal ideas, such as the concept of responsibility [2, 11, 12, 20, 32] and of "could have done otherwise" [5]. In addition, the discussion in Section 4 and Remark 2 suggests that a stit logic with counterfactuals may be fruitfully used to incorporate strategic reasoning into stit, thus advancing recent research connecting stit and game theory (see, e.g., [19, 29, 48, 51]). We conjecture that the latter application may call for a framework combining our approach to the semantics of counterfactuals with extensions of stit logics with epistemic operators [23, 27, 50] and probabilistic belief operators [13].

Appendix A: Completeness of SLD_n
In this appendix, we prove that the axiom system SLD_n is complete with respect to the class of all SLD_n frames. The proof consists of two parts. First, we show that SLD_n is sound and complete with respect to a class of Kripke models (called pseudo-models). Then, elaborating on a technique presented in [24], we prove that every pseudo-model in which a formula ϕ ∈ L_SLDn is satisfiable can be turned into an SLD_n model in which ϕ is satisfiable.

A.1 Pseudo-Models
Pseudo-models consist of a non-empty set W of possible states, representing moment-history pairs, partitioned into equivalence classes by an equivalence relation R_□. Intuitively, every equivalence class of R_□ represents a moment. Besides R_□, pseudo-models feature the following elements: two accessibility relations, denoted R_X and R_Y, modeling, respectively, what happens next and what happened a moment ago; a function f_do assigning to every possible state the action profile performed at that state; and, finally, a function f_dev assigning to every state a set of deviant individual actions.
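For readers who find a concrete data structure helpful, the components just listed can be collected as follows. This is a purely illustrative sketch: all names are made up rather than taken from the paper, and the helper methods assume that available actions at a moment are exactly those performed somewhere in the corresponding equivalence class.

```python
from dataclasses import dataclass

@dataclass
class PseudoModel:
    """Illustrative container for the components of a pseudo-model (names made up)."""
    states: set      # W
    r_box: set       # R_box as a set of pairs (an equivalence relation on W)
    r_x: set         # R_X ("what happens next")
    r_y: set         # R_Y ("what happened a moment ago")
    f_do: dict       # state -> action profile (dict: agent -> action)
    f_dev: dict      # state -> set of deviant individual actions
    valuation: dict  # proposition letter -> set of states

    def moment(self, w):
        """The R_box-equivalence class of w; intuitively, the moment of w."""
        return {v for (u, v) in self.r_box if u == w}

    def available(self, w, i):
        """Acts^w_i: the actions agent i performs somewhere at the moment of w."""
        return {self.f_do[v][i] for v in self.moment(w)}

# A made-up model with a single moment containing two states.
m = PseudoModel(
    states={"w0", "w1"},
    r_box={(a, b) for a in ("w0", "w1") for b in ("w0", "w1")},
    r_x=set(), r_y=set(),
    f_do={"w0": {"i": "a"}, "w1": {"i": "b"}},
    f_dev={"w0": set(), "w1": {"b"}},
    valuation={},
)
print(sorted(m.available("w0", "i")))  # ['a', 'b']
```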

Remark 3
We adopt the following standard notation. For any set S, element s ∈ S, and relation R ⊆ S × S, R(s) = {s' ∈ S | sRs'}. For any number n ∈ N, R^n ⊆ S × S is defined recursively by setting: R^0 = {(s, s) | s ∈ S} and, for all w, v ∈ S, wR^{n+1}v iff there is u ∈ S such that wR^n u and uRv.
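For concreteness, the iterated relation R^n can be computed directly from this recursion; the following sketch uses a made-up three-state relation.

```python
def compose(r, s):
    """Relational composition: {(w, v) | there is u with (w, u) in r and (u, v) in s}."""
    return {(w, v) for (w, u1) in r for (u2, v) in s if u1 == u2}

def power(r, n, domain):
    """R^0 is the identity on the domain; R^(n+1) = R^n composed with R."""
    result = {(s, s) for s in domain}
    for _ in range(n):
        result = compose(result, r)
    return result

# A made-up relation on three states: 0 -> 1 -> 2.
R = {(0, 1), (1, 2)}
print(power(R, 2, {0, 1, 2}))  # {(0, 2)}
```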
A pseudo-model is a tuple ⟨W, R_□, R_X, R_Y, f_do, f_dev, ν⟩, where W is a non-empty set of states, R_□ is an equivalence relation on W, R_X and R_Y are binary relations on W, f_do : W → Ag-Acts is the action function, f_dev : W → 2^Acts is the deviant-choice function, and ν : Prop → 2^W is a valuation function. For any w ∈ W and i ∈ Ag, let Acts^w_i = {f_do(w')(i) | w' ∈ R_□(w)} be the actions available to agent i at R_□(w), and let Acts^w = ∪_{i∈Ag} Acts^w_i be the individual actions executable at R_□(w). Define R_Ag ⊆ W × W by setting: for all w, w' ∈ W, wR_Ag w' iff wR_□w' and f_do(w) = f_do(w'). The elements of a pseudo-model are assumed to satisfy the following conditions:
1. Properties of R_X and R_Y: for all w, w_1, w_2 ∈ W,
1.1. Seriality of R_X: there is w' ∈ W such that wR_X w'.
1.2. R_X-functionality: if wR_X w_1 and wR_X w_2, then w_1 = w_2.
2. Independence of Agents: for all w ∈ W and α ∈ Ag-Acts, if α(j) ∈ Acts^w_j for all j ∈ Ag, then there is w' ∈ R_□(w) such that f_do(w') = α.

3. No Choice between Undivided Histories.

Theorem 2
The axiom system SLD_n, defined by the axioms and rules in Table 2, is sound and complete with respect to the class of all pseudo-models.
The proof of Theorem 2 is entirely standard: soundness is proved via a routine validity check, and completeness via the construction of a canonical model for SLD_n (see [8, Chapter 4.2]). We only provide the definition of the canonical model for SLD_n and leave the rest to the reader. Let W be the set of all maximal consistent sets of SLD_n. Where w ∈ W and ○ ∈ {□, X, Y}, define w/○ = {ϕ ∈ L_SLDn | ○ϕ ∈ w}.

Definition 15
The canonical SLD_n model is a tuple ⟨W^c, R^c_□, R^c_X, R^c_Y, f^c_do, f^c_dev, ν^c⟩, where:
• W^c = W and ν^c : Prop → 2^{W^c} is s.t., for all w ∈ W^c, w ∈ ν^c(p) iff p ∈ w;
• where ○ ∈ {□, X, Y}, R^c_○ ⊆ W^c × W^c is s.t., for all w, w' ∈ W^c, wR^c_○w' iff w/○ ⊆ w';
• f^c_do : W^c → Ag-Acts is s.t., for all w ∈ W^c, f^c_do(w) = α iff do(α) ∈ w;
• f^c_dev : W^c → 2^Acts is s.t., for all w ∈ W^c and a_i ∈ Acts, a_i ∈ f^c_dev(w) iff dev(a_i) ∈ w.
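As an aside, the Independence of Agents condition (condition 2 above) is mechanically checkable on a finite equivalence class. The following sketch, with a made-up class and hypothetical names, illustrates what the condition requires: every combination of individually available actions must be jointly performed at some state of the class.

```python
from itertools import product

# A made-up R_box-equivalence class: each state is represented by the action
# profile f_do(w) performed there, given as a dict from agents to actions.
profiles = [
    {"i": "a", "j": "c"},
    {"i": "a", "j": "d"},
    {"i": "b", "j": "c"},
    {"i": "b", "j": "d"},
]

def independence_holds(profiles):
    """Check Independence of Agents on one equivalence class: every combination
    of individually available actions is performed at some state of the class."""
    agents = sorted(profiles[0])
    available = {i: {p[i] for p in profiles} for i in agents}
    for combo in product(*(available[i] for i in agents)):
        candidate = dict(zip(agents, combo))
        if candidate not in profiles:
            return False
    return True

print(independence_holds(profiles))       # True: all four combinations occur
print(independence_holds(profiles[:-1]))  # False: the profile (b, d) is missing
```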

A.2 From Pseudo-Models to SLD n Models
Call a pointed pseudo-model any pair M, w such that M is a pseudo-model and w a state in M. By Theorem 2, for any SLD_n-consistent formula ϕ, there is a pointed pseudo-model M, w such that M, w |= ϕ. We want to show that M can be transformed into an SLD_n model in which ϕ is satisfiable. To build stit models from Kripke models similar to our pseudo-models, Herzig and Lorini [24] use a construction consisting of two preliminary steps: (1) the relevant Kripke model is unraveled in order to ensure that the relation R_X generates a tree-like ordering of the equivalence classes of R_□ (recall that these represent moments); (2) from a certain point on along the relation R_X in the unraveled model, every equivalence class of R_□ is forced to be a singleton.
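The generic idea behind step (1) — turning a graph into a tree of paths so that the successor relation becomes tree-like by construction — can be sketched as follows. This toy procedure uses made-up names and ignores the refinement for Y that the construction actually requires.

```python
def unravel(successors, root, depth):
    """Unravel a graph into a tree of finite paths from the root.

    Each node of the tree is a path (a tuple of states); its children extend
    the path by one successor step, so the child relation is tree-like.
    """
    tree = {}
    frontier = [(root,)]
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            children = [path + (s,) for s in successors.get(path[-1], [])]
            tree[path] = children
            next_frontier.extend(children)
        frontier = next_frontier
    return tree

# A made-up graph with a cycle: its unraveling is nevertheless a tree.
succ = {"w0": ["w1"], "w1": ["w0", "w2"], "w2": []}
tree = unravel(succ, "w0", 3)
print(tree[("w0",)])  # [('w0', 'w1')]
```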
Step (2) guarantees that there is a one-to-one correspondence between states in the unraveled model and indices in the stit model built from it. The presence of the operator Y in the language of SLD_n requires us to refine the unraveling procedure in step (1). We present this refinement in detail (Steps 1 and 2 below) and only sketch the rest of the proof (Steps 3 and 4 below), which proceeds (except for a few minor modifications) as in [18, Appendix A.1.2].

Step 1: Extended language and complexity measures
Our first task is to define an unraveling procedure u that takes a pointed pseudo-model M, w and a formula ϕ ∈ L_SLDn and returns a pointed pseudo-model u_ϕ(M, w) satisfying property P1. The idea is roughly as follows: we first identify the earliest state w' needed to determine whether ϕ is true at w; then, we unravel R_X around the R_□-equivalence class of w'. To make this work, we need to extend our language and introduce three complexity measures on the formulas of the extended set L_ALD: (i) the Y-depth of ϕ, needed to identify w' and the state corresponding to w in the unraveled model; (ii) the size of ϕ and (iii) the c-size of ϕ, needed to define a well-founded strict partial order <^S_c on L_ALD. The proof that our unraveling procedure satisfies P1 proceeds by <^S_c-induction on ϕ (cf. Proposition 6).
Definition 16 (Extended language) Let Prop and Acts be as before. The set L_ALD is generated by the following grammar, where p ∈ Prop and a_i ∈ Acts.

Lemma 1
<^S_c is a well-founded strict partial order on the formulas of L_ALD.

Lemma 2
For any ϕ ∈ L_ALD and any n ∈ N such that n ≥ d(ϕ), there is ϕ' ∈ L_SLDn such that, in particular, ϕ ↔ ϕ' is valid on any pseudo-model.

The model in Fig. 7 satisfies the conditions of uniformity of menus and of identity of overlapping menus from Section 3.2. Hence, Exp remains invalid in the class of independence models satisfying these conditions.