1 Introduction

The notion of ‘finite control’ arose primarily as a descriptive term that broadly refers to a phenomenon in which a control-like interpretation is observed in seemingly finite environments (see, e.g., Iatridou 1993; Krapova 2001; Landau 2004; among others). Japanese is a language which, at least according to some authors, exhibits finite control in certain syntactic environments (Nakau 1973; Uchibori 2000; Fujii 2006):

  1. (1)
    figure a

The verb ketuisi-ta in (1) is semantically similar to the infinitive-taking English verb decide, but the embedded clause is morpho-syntactically marked for tense, suggesting that it is a finite clause. Note in particular that, roughly speaking, the unexpressed embedded subject and the matrix subject are understood to be the same (we will be more precise about the nature of “sameness” here in what follows, since this is the very issue that is at the heart of the analysis of control verbs).

Since control was taken to be limited to non-finite environments in the early phase of GB theory (cf. the PRO Theorem), such cross-linguistic facts received much attention in the syntactic literature (see, e.g., Landau 2013 for a useful and lucid review of this rich and complex literature), and the issue remains highly relevant today. One point that most current work on control agrees on is that semantic factors (such as de se interpretation; cf. Chierchia 1989; Pearson 2018) play a much more prominent role in the interpretation and distribution of control predicates than has traditionally been recognized. Somewhat curiously, however, despite extensive work to date, the lexical semantic properties of control predicates have not been studied in any detail beyond rough descriptive classifications (e.g. verbs of trying, verbs of ordering, implicative verbs, aspectual verbs, etc., as in the list provided by Stiebels 2007). Thus, the exact role that the semantic properties of the embedding predicates play in licensing the control interpretation and in accounting for its related properties, such as the oft-noted fact that the temporal properties of the verb influence the availability of the control interpretation, remains largely unclear. This latter issue is especially pertinent in the case of finite control, since a syntactic analysis involving PRO (or its more contemporary counterpart, the Movement Theory of Control; Hornstein 1999) inevitably necessitates some nontrivial modifications to its licensing conditions.

In this paper, we take up Japanese koto-taking verbs of the sort exemplified by (1) to investigate precisely this semantic question. Japanese koto-taking verbs are ideal for this purpose for two reasons. First, the class of verbs that take koto-marked complements is semantically diverse, making it possible for us to identify the crucial semantic properties that underlie the control-like interpretation observed in (1). Second, the distribution and interpretation of tense in Japanese is relatively simple and is well documented in the literature, and for this reason, the semantic interactions between tense and control verb meanings are much easier to disentangle than in other languages (such as those in the Balkan family; see, e.g., Zec 1987; Iatridou 1993; Varlokosta and Hornstein 1993; Krapova 2001; Giannakidou 2009; Smirnova 2009a). We believe that the relatively simple picture that emerges from Japanese has much wider cross-linguistic implications, a topic that we address toward the end of the paper.

The paper is structured as follows. We start with a descriptive overview of koto-taking control verbs in Japanese in Sect. 2. Section 3 then analyzes the semantics of these verbs in detail, by building on previous semantic approaches to control. In Sect. 4, we provide an explicit mechanism which captures the syntax-semantics interface of koto-taking control verbs, and briefly compare our semantic analysis with previous approaches in both the mainstream and lexicalist syntax traditions. We then discuss cross-linguistic implications of our proposal in Sect. 5. Section 6 concludes the paper.

2 Finite control in Japanese

As noted above, our starting point is the class of sentences such as the following in which a verb that has a control-like meaning takes a finite complement clause.Footnote 1,Footnote 2

  1. (2)
    figure b

In all these examples, the embedded subject is obligatorily coreferential with one of the matrix arguments (the orientation, that is, which of the matrix arguments is identified as the controller, is determined by the lexical semantics of the matrix verb).Footnote 3 Note also that in almost all classes, the embedded tense is nonpast (the -(r)u form), but there is one class, namely, the factive verbs, which allow the embedded tense to be past (the -ta form).Footnote 4 Japanese has the nonpast vs. past distinction for tense, and tense interpretation in embedded clauses is uniformly relative; see Sect. 3 for more details.Footnote 5

As a first attempt at making sense of the similarities and differences between the verbs in (2), we provide a rough descriptive classification in (3), which largely follows Stiebels (2007), with an additional broader two-way classification into future-oriented and non-future-oriented verbs. Roughly, if the complement P is irrealis (i.e. if the truth of P is unknown at the time of the matrix predicate), the verb counts as future-oriented; otherwise it counts as non-future-oriented. We alert the reader that the classification in (3) is purely for the sake of convenience, and that it is not meant to be exhaustive either, since descriptive classification is not itself the goal of this paper.

  1. (3)
    figure c

Aside from the broad distinctions of irrealis vs. realis, the classification in (3a, b) is motivated by the different distributional properties with respect to temporal modifiers given in (4), where the posteriority of complements is indicated by the modifiers.

  1. (4)
    figure d

Note here that the attemptive verb kokoromiru in (4a) exhibits slightly more complicated behavior. Specifically, unlike the other future-oriented verbs, it does not allow the embedded and matrix events to be temporally disjoint. We nevertheless group attemptive verbs as future-oriented due to their irrealis nature (i.e., non-completion of the embedded event) and their broad semantic similarity to other future-oriented verbs, in that these verbs all denote (conceived or actual) commitments of certain attitude holders.

The list in (3) may give one the impression that koto-taking control verbs in Japanese belong to rather heterogeneous semantic classes that do not have anything in common. Some are attitude predicates, but others (for example, implicatives, aspectuals and dispositionals) are not, at least not in any straightforward sense. Some are factive, others are implicative, and still others are neither factive nor implicative. An important subclass are speech act verbs, but this class is by no means representative of the whole. A key claim of the present paper is that, despite this apparent heterogeneity, once we tease apart the fine-grained lexical semantic properties of these verbs there is in fact a common core semantic property shared by all these verb classes.

3 The lexical semantics of finite control in Japanese

We advocate a semantic analysis of koto-taking finite control verbs in Japanese, building on extensive research on semantic approaches to control (see, for example, Jackendoff 1972; Růžička 1983; Foley and Van Valin 1984; Comrie 1985; Farkas 1988; Ladusaw and Dowty 1988; Chierchia 1989; Sag and Pollard 1991; Růžička 1999; Jackendoff and Culicover 2003; Stiebels 2007; Uegaki 2011; Duffley 2014; among others). Our work is also in line with the recent trend in the literature on control, which unanimously recognizes the importance of semantic factors (see Grano 2015; Pearson 2016; and in particular Landau 2015).

Since the discussion in this section is somewhat technical and complex in some parts, we emphasize here the larger goal we aim to achieve. Our main goal is to propose an independently motivated semantic analysis of koto-taking control verbs in Japanese, and to see to what extent such an account can explain some of the core properties of (finite) control that have traditionally been attributed to syntactic factors, such as the obligatory coreference of the embedded subject with a matrix argument and restrictions on embedded tense. In our analysis, the unexpressed subject in the examples in (2) is just an ordinary zero pronoun (this type of analysis of control has a long tradition, going back at least to the analysis of anaphoric control in LFG by Bresnan 1982), and the way in which the link between the controller argument in the matrix clause and the embedded subject is established is exactly the same in both the zero pronoun cases and cases involving overt reflexives/pronouns (such as (5) below for the latter type of example).

Before moving on, some comments are in order regarding the terminology we adopt in this paper and the questions that it does and does not address (readers who would like to see the analysis first can skip directly to Sect. 3.1). First, unless it is clear from the context that a different meaning is intended, we use the term “control” as a descriptive term that refers to a phenomenon involving a predicate that takes some clausal complement in which the subject of the embedded clause is obligatorily “coreferential” with one of the arguments of the higher clause (where the specific nature of “coreference” is itself an important issue that is to be clarified). Relatedly, up to the end of Sect. 4, we focus on koto-taking control verbs in Japanese, so, when we state generalizations about “control verbs” in that part of the paper, the statement should be understood to pertain to koto-taking control verbs in Japanese only, and not to control phenomena in general, unless otherwise noted.

Note that, by adopting the above terminology, even an example such as the following in which the embedded subject is overt counts as a case of control as long as the semantic criterion above is satisfied.

  1. (5)
    figure e

Conversely, when the embedded clause lacks an overt subject and exhibits coreference with a matrix argument, that alone does not automatically mean that we have a control construction (in our definition), even if the meaning of the matrix verb is similar to obligatory control verbs in English. A case in point comes from verbs such as kimeru and kettei-suru (which both roughly translate as ‘decide’):

  1. (6)
    figure f

These verbs can have overt subjects distinct from the matrix subject; moreover, an unexpressed subject is not obligatorily coreferential with the matrix subject.

Second, even though we aim to delineate the core common semantic property of koto-taking control verbs as precisely as possible, this by itself should not be taken to constitute a claim about either the necessary or sufficient condition for control verbs (in the above descriptive sense) in general. Delineating such a condition is clearly one of the ultimate goals of the semantic approach, but there are some thorny outstanding issues, as we briefly mention in the conclusion section. Thus, our proposal should be understood first and foremost as the lexical semantic analyses of these verbs, with implications primarily for related facts in Japanese (see below), and secondarily, for similar phenomena in other languages (discussed in Sect. 5).

Third, unlike some authors (see, e.g., Duffley 2014), we are not advocating the position that all varieties/aspects of control are to be reduced to semantic factors. For one thing, our account of koto-taking control verbs recognizes an irreducibly syntactic factor (see Sect. 4.1). For another, even in Japanese, there are clear cases in which control interpretation is arguably established solely by syntactic mechanisms, such as certain types of complex predicate constructions including V-hazimeru ‘start V-ing’ and V-te miru ‘try V-ing’ (Kageyama 1993, 1999; Matsumoto 1996) (cf. “restructuring predicates”; Rizzi 1982). Such syntactic control constructions are simply outside the scope of the present paper.

Finally, we take it that one of the criteria for success for our proposal is whether it can account for related facts that have traditionally been attributed to syntactic factors. Among these, the most important is the distribution of embedded tense reviewed above (and the related facts about temporal adverbials).Footnote 6 We take it to be essential to spell out the semantic analysis fully to achieve this goal. For example, the actual accounts of the embedded tense distribution developed below depend nontrivially on the interactions between modal and temporal semantics across the dimensions of presupposition and assertion, and the way in which the restriction falls out differs in subtle but crucial ways for each subtype of control verb.

To aid the reader’s understanding, we start by outlining the key analytic ideas in informal terms in Sect. 3.1. The rest of the section develops the formal analysis gradually, by spelling out the necessary components one by one.

3.1 An informal sketch

We characterize the core semantic property of koto-taking control verbs in terms of the notions of de se attitudes (Morgan 1970; Chierchia 1989) and responsibility (Farkas 1988), building on the key ideas of previous semantic approaches to control. Of the two broader subgroups of control verbs, future-oriented ones are semantically more uniform. These verbs all encode ascription of an attitude—more specifically, commitment to bring about the state of affairs named by the embedded clause—to some volitional agent. The verbs differ in several dimensions (such as whether the ascription is made in public or not, whether the ascription is to oneself or to one’s interlocutor, and whether the commitment is purely mental or is partly “externalized”). But the core meaning is common and can be characterized by means of a relation that holds between two elements as schematized in (7).

  1. (7)
    figure h

To illustrate, in the case of the attemptive verb kokoromiru (‘try’), the complement clause denotes some event or state of affairs involving the agent himself, which is the goal P to be realized. The verb additionally means that the agent behaves in a way conducive to his plan of achieving this goal in the future, that is, the course of action that the agent engages in (V) is causally linked to the goal (P) such that the former supports the realization of the latter (at least according to the agent’s belief). We deliberately leave the two key notions, “volitional action” and “causal relation,” undefined since the purpose here is to characterize the meanings of the relevant verbs in broad strokes. While we believe that the schematic characterization in (7) is useful, what really matters in the end is the specific analysis for each verb to be presented in Sect. 3.4.

By viewing the underlying meanings of (future-oriented) control verbs as in (7), the relation between the notions of responsibility and de se attitude becomes clearer: at the core of the meanings of (at least a large subset of) control verbs is a causal relation between a volitional action and its (direct or indirect) consequence. (The propositional content of) the volitional action is inherently de se and so is its consequence. As compared to future-oriented control verbs, the way in which a de se attitude ascription is involved in the meanings of non-future-oriented verbs is somewhat trickier to identify, due to the fact that these verbs are semantically more complex and heterogeneous. We discuss this issue in the next subsection.

3.2 Two semantic properties of control

We now examine in more detail the properties of the two meaning components of control verbs and how they are instantiated in different subclasses of verbs in (3) (in particular, the non-future-oriented verbs).

Morgan (1970) and Chierchia (1989) made an important observation that the complement clause of (at least some) control verbs denotes obligatorily de se properties. This can be illustrated by the following minimal pair.

  1. (8)
    figure i

(8a) and (8b) mean different things. To see this, consider the following context.

  1. (9)

    Context: John had scheduled his own business trip to France for a year later and then completely forgot about it. When the day of departure was approaching, he found the note on this trip in his schedule book, but mistakenly thought that his colleague was making the trip. There were various obstacles for the trip, and he did everything he could to make sure his colleague could travel to France smoothly, without realizing that “colleague” was he himself.

In this context, (8b) can be true and felicitous on the de re reading of the embedded pronoun he, but (8a) cannot felicitously be used to describe the same situation. For (8a) to be true and felicitous, John has to realize that the person who goes to France is no one other than himself. That is, the property denoted by the complement is a de se property that the matrix subject ascribes to him/herself.

As noted by Fujii (2006), just like English control verbs, Japanese control verbs taking koto-marked complement clauses exhibit the obligatory de se interpretation. Thus, (10) receives exactly the same interpretation as (8a), and not (8b).

  1. (10)
    figure j

This much should be straightforward but there is one complication. While the de se (or de te, for the “second person” perspective with directive verbs) property of future-oriented verbs in (3a) should be easy to see, one may wonder whether all the koto-taking control verbs in (3) (especially, implicatives such as sippai-suru ‘fail,’ aspectuals such as tuzukeru ‘continue’ and dispositionals such as nigate-da ‘be poor at’) obligatorily involve de se attitude ascription in the above sense. In fact, it was already noted by Chierchia (1989: 17) that not all control verbs can be analyzed as simple de se predicates.

We believe that the picture is obscured here only because non-future-oriented verbs have more complex lexical entailments. This is perhaps related to the fact that conceptually natural relations that can be identified between volitional actions and their (expected or unexpected) outcomes can be more varied for things that have (at least partly) already happened than for things that have not yet taken place (for the latter, intention is the only obvious candidate relation). For example, we can reflexively talk about the causal relations between past events and past deeds (factives), about the actual (as opposed to intended) outcome of past attempts (implicatives), about the ways in which volitional decisions directly influence the temporal unfolding of events (aspectuals), or about intrinsic relations between volitional properties of an individual and externally observable states of affairs (dispositionals). These all involve more complex meaning components on top of the causal relation between V and P in (7).

In light of this, the implicative verb sippai-suru can be roughly paraphrased as ‘try P and ¬ P’ and the attemptive component arguably involves a de se attitude.Footnote 7 To see this, consider the following situation.

  1. (11)

    Context: John had set up a fundraising website to start his own business, but closed it immediately after changing his mind. However, due to an operation error, the site was actually not closed. Some time later, John (thought he) found out that Bill had a similar website to raise funds (which was actually his own site), and decided to help Bill’s fundraising. He did everything he could to make sure Bill could successfully raise funds. Despite his efforts, John learned later that funds were not raised, meaning that John’s site didn’t raise the initial target amount. But he still didn’t realize that the site was his own, and not Bill’s.

In this scenario, even though (i) John had an intention to bring the fundraising to success, (ii) the fundraising was unsuccessful in the end, and (iii) John is aware of the failure, (12) is clearly infelicitous, due to the fact that John is confused about self-identity.

  1. (12)
    figure l

Similarly, the dispositional verb nigate-da ‘be poor at’ is just as infelicitous in an analogous situation (well imaginable in a fictional setting) in which John’s assistance to ‘somebody else’s’ fundraising consistently results in a failure.

  1. (13)
    figure m

Again, the point here is that for the truth of (13) it is not enough for the property in question to hold of John, for John to have an intention to achieve the goal, and for him to be aware of the relevant outcome; in addition to all these, he needs to identify himself as the ‘possessor’ of the property in question.

Aspectual verbs such as hazimeru (‘begin’), tuzukeru (‘continue’) and yameru (‘stop’) exhibit a similar behavior. Thus, consider the following minimal pair:

  1. (14)
    figure n

In this context, John is aware of the ongoing deceit, is intentionally acting to keep it going, but he fails to recognize that the person who is legally responsible for the crime is he himself. The compound verb version in (14b) is perfectly felicitous in this context since it just encodes the aspectual meaning. By contrast, the koto-taking variant of tuzukeru is infelicitous, which we take to be evidence for the fact that a de se, self-ascription meaning is an obligatory component of the koto-taking variant of aspectual verbs.Footnote 8

In all the examples in (12)–(14a), the lexical entailments of the verbs are all satisfied except for the relevant self-identity of the attitude holder. Given this, we conclude that all subclasses of koto-taking control verbs in (3) involve de se attitude ascription as an obligatory meaning component.

The other important property of control predicates comes from the work of Farkas (1988). In an attempt to provide a semantic account of controller identification in English, Farkas notes that the notion of ‘responsibility,’ defined as in (15), is at the heart of the lexical meanings of verbs that induce control.

  1. (15)
    figure o

Farkas hypothesizes that in control sentences, the RESP relation holds between the controller (i) and the embedded clause (s). For example, in (8a), in the worlds in which John’s goals are satisfied, John brings it about that he goes to France, that is, his going to France is a consequence of some volitional action performed by John.

While we believe that this characterization gets at the heart of the meanings of many control verbs (note that our (7) is directly influenced by Farkas’s (15)), there are both conceptual and empirical issues that need to be clarified. First, conceptually, Farkas takes RESP to be a semantic primitive, and she doesn’t offer any characterization of this notion beyond the intuitive paraphrase along the lines reproduced above.

Second, empirically, ‘responsibility’ (at least if we take Farkas’s original characterization verbatim) seems somewhat too narrow as a notion that unifies the meanings of all control verbs in (3). Specifically, with negative implicatives such as sippai-suru (‘fail’), it is unclear in just what sense an event that did not happen was ‘brought about’ by (or is ‘the result’ of) an act performed by the controller. Some of the factive verbs are similarly (or perhaps even more) problematic: in the case of kookai-suru (‘regret’), the embedded event is typically something that happened despite one’s will, or without one’s conscious recognition that one’s own action would later lead to undesirable consequences. In this sense, factive verbs with negative connotation in (3b) seem to encode a relation that is exactly the opposite of RESP.

A key property of our proposal is that it decomposes and generalizes the notion of responsibility along the lines informally sketched in (7) in Sect. 3.1 above. Importantly, as will become clear when we present more detailed analyses of each type of verb below, the exact content of the volitional meaning and the causal relation differs from one verb to another, and this is essentially the reason that a simple characterization of RESP along the lines of (15) is inadequate. The notion of de se attitude ascription (as outlined above) turns out to play a crucial role in our decomposition of the RESP relation. Thus, through a close examination of the lexical semantic properties of control verbs in Japanese in (3), we aim to shed light on the deeper connection between these notions, which (to our knowledge) has not yet been fully clarified in the literature.

3.3 Formal assumptions about embedded tense and de se attitudes

In this subsection, we introduce some formal machinery and notation. In line with the recent literature, we analyze de se attitude ascription in terms of the so-called “centered propositions” (see especially Stephenson 2010, in relation to control). A centered proposition is essentially a proposition with a bunch of parameters (formally, a set of variables) designating the attitude holder’s self, the ‘current world’ (for the attitude holder), the ‘current time,’ etc. We generally follow this approach but add one extra parameter: the ‘event time,’ that is, the time at which the state of affairs described by the proposition is taken to hold. While this is not a standard assumption, it makes it easy to state conditions directly referring to the event time of the embedded clause in a compositional system implementing relative tense. Conceptually, this merely amounts to the claim that the attitude holder has direct access to the temporal location (relative to his/her ‘now’) of the state of affairs that his/her belief is about.

We assume that declarative sentences denote centered propositions of type \(\langle \mathscr{C}, t\rangle\), where the type \(\mathscr{C}\) is the type of contexts.Footnote 9 A context \(\mathbb{C} = \langle x, w, t_{0}, t_{1}\rangle\) of type \(\mathscr{C}\) is a quadruple such that x (of type e) is the doxastic center, w (of type s) is the doxastic center’s ‘current world,’ \(t_{0}\) (of type i, for temporal intervals) is the doxastic center’s ‘now,’ and \(t_{1}\) (also of type i) is the extra parameter we introduce for the ‘event time.’Footnote 10 We write \(\mathbb{C}_{s}\) (s for ‘self’), \(\mathbb{C}_{w}\), \(\mathbb{C}_{now}\) and \(\mathbb{C}_{t}\) for x, w, \(t_{0}\) and \(t_{1}\) (for the above ℂ), respectively. In other words, the following holds for any ℂ:

  1. (16)
    figure p

Since attitude verbs frequently modify a subset of the parameters of a context, we introduce a notational convention here. We write \(\mathbb{C}[\mathbb{C}_{\alpha}/u]\) to denote a context identical to ℂ except that \(\mathbb{C}_{\alpha}\) is replaced by u. For example, with \(\mathbb{C} = \langle x, w, t_{0}, t_{1}\rangle\), we have:

  1. (17)
    figure q
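As an informal aid, the context quadruple and the substitution notation can be mirrored in a few lines of Python. This is only a sketch and not part of the formal system: the class name `Context`, its field names, the helper `update`, and the use of integers in place of temporal intervals are all illustrative assumptions.

```python
from typing import NamedTuple


class Context(NamedTuple):
    """A context as a quadruple <x, w, t0, t1> (field names are illustrative)."""
    self_: str  # C_s: the doxastic center
    world: str  # C_w: the center's 'current world'
    now: int    # C_now: the center's 'now' (an integer stands in for an interval)
    event: int  # C_t: the 'event time'


def update(c: Context, **changes) -> Context:
    """Return a context identical to c except for the named parameters,
    mirroring the substitution notation C[C_alpha / u]."""
    return c._replace(**changes)


c = Context(self_="john", world="w0", now=5, event=3)
c_shifted = update(c, event=7)  # corresponds to C[C_t / 7]
```

The original context is left untouched by `update`, which matches the reading of the notation as naming a new context rather than modifying ℂ itself.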

To avoid notational clutter, we introduce further abbreviations as in (18) (since the two temporal parameters cannot be uniquely identified by their semantic type alone, we distinguish them by prefixing the symbol @ for the ‘current time’ parameter):

  1. (18)
    figure r

We now illustrate the workings of this system with the compositional calculation of relative tense in Japanese. As discussed by many authors (Teramura 1984; Ogihara 1996; Kusumoto 1999; among others), Japanese exhibits a relative tense system more or less uniformly across different types of embedding contexts. The following example of the propositional attitude verb omou ‘think’ illustrates the relevant point.

  1. (19)
    figure s

In (19), when the past tense form -ta is used in the embedded clause, Mary was in Tokyo (in John’s belief) before John had the relevant belief. With the nonpast tense, Mary’s being in Tokyo is simultaneous with John’s thought (rather than the speech time, unlike the present tense in English).

Since context parameters explicitly carry temporal indices, tense morphemes can simply be defined as functions that take centered propositions and return centered propositions. Specifically, nonpast and past tense morphemes are both modifiers of type \(\langle\langle\mathscr{C},t\rangle,\langle\mathscr{C},t\rangle\rangle\). Here and in what follows, we use variables P and Q for centered propositions (i.e., expressions of type \(\langle\mathscr{C},t\rangle\)).

  1. (20)
    figure t

When centered propositions are embedded under attitude predicates, \(\mathbb{C}_{now}\) is identified as the attitude holder’s ‘now’ (we illustrate this point immediately below). Thus, the past tense in (20a) says that the event time of the embedded clause precedes the attitude holder’s ‘now.’ The nonpast tense imposes the non-precedence condition.
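The idea that tense morphemes are modifiers of centered propositions can be sketched in Python as follows. The sketch rests on several simplifying assumptions of ours: times are integers rather than intervals, the temporal condition is treated as an ordinary conjunct (we make no claim here about its presuppositional status), and all names (`Context`, `past`, `nonpast`, `mary_in_tokyo`) are hypothetical.

```python
from typing import Callable, NamedTuple


class Context(NamedTuple):
    self_: str  # doxastic center
    world: str  # center's current world
    now: int    # center's 'now' (integer stand-in for an interval)
    event: int  # event time


# A centered proposition maps a context to a truth value.
CProp = Callable[[Context], bool]


def past(p: CProp) -> CProp:
    """Past: the event time precedes the center's 'now'."""
    return lambda c: c.event < c.now and p(c)


def nonpast(p: CProp) -> CProp:
    """Nonpast: the event time does not precede the center's 'now'."""
    return lambda c: not (c.event < c.now) and p(c)


def mary_in_tokyo(c: Context) -> bool:
    """Toy tenseless proposition: Mary is in Tokyo at times 2..4 in world w0."""
    return c.world == "w0" and 2 <= c.event <= 4


c_before = Context("john", "w0", now=5, event=3)  # event precedes 'now'
c_after = Context("john", "w0", now=2, event=3)   # event follows 'now'
```

Under these assumptions, `past(mary_in_tokyo)` holds at `c_before` but not at `c_after`, and `nonpast(mary_in_tokyo)` shows the opposite pattern.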

The denotation of the attitude verb omou ‘think’ can be defined as in (21), using the standard notion of epistemic alternatives defined along the lines of (22).

  1. (21)
    figure u
  1. (22)
    figure v

As in (22), \(Alt^{epst}_{x,w,t}\) is the set of contexts (formally, triples of individual, world and time) that constitute the epistemic alternatives for x in world w at time t. Note that \(Alt^{epst}\) takes three arguments rather than four and it returns contexts as triples rather than quadruples (in the extended form introduced above). Since the extra temporal parameter designating the event time of the embedded clause for centered propositions is irrelevant for defining epistemic alternatives (similarly for other flavors of modality; see below), \(Alt^{epst}\) does not make reference to it. Centered propositions take quadruple-contexts rather than triple-contexts, so, the argument given to P in (21) has the extra temporal parameter appended at the end. The notation + denotes the ‘append’ operation:

  1. (23)
    figure w
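The relation between triple-contexts (as returned by the epistemic alternative function) and quadruple-contexts (as required by centered propositions) can be illustrated with a minimal Python sketch. The names `Context`, `Triple` and `append` are illustrative assumptions, and integers again stand in for temporal intervals.

```python
from typing import NamedTuple, Tuple


class Context(NamedTuple):
    self_: str  # doxastic center
    world: str  # center's current world
    now: int    # center's 'now'
    event: int  # event time


# An epistemic alternative: (individual, world, time), with no event time.
Triple = Tuple[str, str, int]


def append(alternative: Triple, t: int) -> Context:
    """Extend a triple-context with an event time, yielding a quadruple-context."""
    x, w, now = alternative
    return Context(x, w, now, t)


alternative = ("j1", "w1", 3)
full_context = append(alternative, 4)
```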

Then, with the denotation for the embedded clause in (24) and an empty operator defined as in (25) which supplies the appropriate context for the matrix clause (consisting of the speaker sp, the current world w∗, the speech time now and an existentially bound matrix event time), the truth conditions for (26) come out as in (27).

  1. (24)
    figure x
  1. (25)
    figure y
  1. (26)
    figure z
  1. (27)
    figure aa

(27) says that in all of John’s epistemic alternatives in the current world at some past time t, there is a time that is simultaneous with or following what John identifies as ‘now’ at which Mary is in Tokyo. Note that the doxastic center for the matrix proposition is the speaker (which is ensured by (25)) but the center is shifted to whoever John takes himself to be by the epistemic alternative operator encoded in the meaning of the belief verb omou.
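To make the paraphrase concrete, the following Python sketch evaluates truth conditions of this shape in a small toy model. Everything in it is an assumption made for illustration only: the finite time domain `TIMES`, the facts in `IN_TOKYO`, the toy alternative function `alt_epst`, the argument order of `omou` (complement first, then subject), and the operator name `matrix_op`.

```python
from typing import Callable, List, NamedTuple, Tuple


class Context(NamedTuple):
    self_: str
    world: str
    now: int
    event: int


CProp = Callable[[Context], bool]
TIMES = range(10)  # toy domain of times (integers stand in for intervals)


def past(p: CProp) -> CProp:
    return lambda c: c.event < c.now and p(c)


def nonpast(p: CProp) -> CProp:
    return lambda c: not (c.event < c.now) and p(c)


# Toy facts: (world, time) pairs at which Mary is in Tokyo.
IN_TOKYO = {("w1", 4), ("w2", 6)}


def mary_in_tokyo(c: Context) -> bool:
    return (c.world, c.event) in IN_TOKYO


def alt_epst(x: str, w: str, t: int) -> List[Tuple[str, str, int]]:
    """Toy epistemic alternatives, returned as (center, world, now) triples."""
    if (x, w, t) == ("john", "w*", 3):
        return [("j1", "w1", 3), ("j2", "w2", 5)]
    return [("j0", "w0", t)]  # a world in which Mary is never in Tokyo


def omou(p: CProp) -> Callable[[str], CProp]:
    """'think': universal quantification over alternatives, with the embedded
    event time existentially bound inside each alternative."""
    def with_subject(x: str) -> CProp:
        def cprop(c: Context) -> bool:
            return all(
                any(p(Context(cx, cw, cnow, t)) for t in TIMES)
                for (cx, cw, cnow) in alt_epst(x, c.world, c.event)
            )
        return cprop
    return with_subject


def matrix_op(p: CProp, speaker: str = "sp", world: str = "w*", now: int = 8) -> bool:
    """Supplies the matrix context and existentially binds the matrix event time."""
    return any(p(Context(speaker, world, now, t)) for t in TIMES)


# Embedded nonpast under matrix past.
sentence = past(omou(nonpast(mary_in_tokyo))("john"))
result = matrix_op(sentence)

# Embedded past under matrix past, for comparison.
result_embedded_past = matrix_op(past(omou(past(mary_in_tokyo))("john")))
```

In this toy model the first sentence comes out true, because at the past time 3 each of John’s alternatives contains a time at or after its own ‘now’ at which Mary is in Tokyo. The variant with embedded past comes out false, since no alternative has Mary in Tokyo before its ‘now.’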

An astute reader should have noticed one peculiar property of the compositional system we have introduced above. That is, explicitly keeping track of the embedded event time via an extra parameter is superfluous in (27), since this temporal variable is just existentially bound under the universal quantification over epistemic alternatives introduced by the attitude predicate (that is, the same effect could be obtained by existentially binding this variable before passing the denotation to the matrix verb). With simple attitude predicates such as omou, the way we have set up the compositional system is indeed unnecessarily complex. However, it will become clear in the next section that this extra complexity is necessary for control verbs, which directly impose restrictions on the relationships between the matrix and embedded event times.Footnote 11

3.4 Lexical semantics of koto-taking control verbs in Japanese

In what follows, we take up one or two verbs from each group in (3) in Sect. 2 as representatives and analyze their meanings in some detail. We deliberately take a somewhat eclectic approach balancing detail and coverage, since our primary goal in this paper is to identify the key common property of koto-taking control verbs belonging to apparently heterogeneous semantic classes as precisely as possible. In doing so, we build on both detailed formal semantics approaches (such as Sharvit 2003; Baglini and Francez 2016; Grano 2017b) and broader typological work (such as Sag and Pollard 1991; Pollard and Sag 1994; Jackendoff and Culicover 2003; Stiebels 2007).Footnote 12

In this connection, a brief comment is in order on Pollard and Sag’s (1991, 1994) work, which proposes a classification of the lexical meanings of control verbs in English into three general types (influence, commitment and orientation), and which is similar in spirit to our own proposal (see also Foley and Van Valin 1984; Jackendoff and Culicover 2003; Stiebels 2007). While Pollard and Sag’s discussion of the three types of control relations (organized in a “type inheritance hierarchy” mechanism of HPSG) clearly suggests that the guiding intuition behind their classification is to capture the similarities and differences between these classes in a systematic manner, they are cautious in noting that their classification remains at an intuitive level (see, e.g., Pollard and Sag 1994: 287–288, fn. 4 and 5). Our proposal can be seen as an attempt to move this line of work one step forward, by delineating as clearly as possible the common underlying core of at least a large subset (if not all) of control verbs.

Before jumping into the details, we remind the reader of the big picture: one of our main goals in this section is to account for some of the apparently syntactic properties of koto-taking control verbs in Japanese from independently motivated lexical semantic properties of these verbs. The most important one among these is the distribution of embedded tense (see (2)). Importantly, the status of the semantic anomaly that arises when the wrong tense form is chosen turns out to be subtly different for each subtype in our analysis, and details do actually matter when it comes to identifying precisely the relevant underlying factors. It is for this reason that we take it to be important to present a formal analysis explicitly. Some readers may find some of the formulas we present below unforgivingly meticulous; we have tried to paraphrase the analyses carefully in prose, so that at least the key ideas of our proposal should be comprehensible without technical expertise in formal semantics.

3.4.1 Attitudinal, commissive and directive verbs

Attitudinal verbs are the prototypical type of control verbs in which the causal relation schematized in (7) above can be identified most transparently. Here, we take up ketui-suru (‘decide’) as an example. We analyze the meaning of ketui-suru as in (28), formally as a function that takes a centered proposition P and an individual x as arguments to return a centered proposition with a certain presupposition. The core meaning (in (iii)) consists of quantification over volitional alternatives for the attitude holder. The two presuppositions are conditions that constrain the nature of the volitional alternatives involved.

  1. (28)
    figure ac

The two presuppositional components and the assertion can be paraphrased as follows:

  • (i) simply says that P will be true in at least one of x’s epistemic alternatives; in other words, x believes that P is within the realm of possibility.

  • (ii) is what captures the causality meaning. It says that there is some property Q that needs to be true of the doxastic center in order for P (content of the embedded clause) to be true in the future.

  • Finally, on the basis of these two presuppositions, (iii) asserts that the attitude holder has a volitional commitment to Q.

Since P and Q take \(C + t'\) and \(C + C_{now}\) as their context arguments, the runtimes of P and Q are \(t'\) and \(C_{now}\), respectively. Since \(t' > C_{now}\) is specified as part of the lexical meaning of the verb, it then follows that Q temporally precedes P. This means that, in our analysis, the verb meaning has direct access to the time at which P holds. This is why the centered proposition has an additional parameter that keeps track of the time of the embedded clause (an issue that was left open at the end of Sect. 3.3).

The notion of volitional commitment, formally modelled by the set of contexts that constitute the volitional alternatives \(Alt^{volit}\), perhaps requires some comment here. We take \(Alt^{volit}_{x,w,t}\) to be defined as follows:Footnote 13

  1. (29)
    figure ad

Note that the intended outcome of the commitment (P in (28)) and the state of affairs that constitutes the actual commitment (Q in (28)) are distinct. As in (28), we take only the latter to necessarily hold in the volitional worlds that are meant to represent the attitude holder’s commitments.

The analysis in (28) captures the core property of the meaning of ketui-suru, but it is somewhat simplified, and some aspects of it are in need of further elaboration. First, the property Q that constitutes a necessary condition for the realization of P cannot be any arbitrary property that may or may not hold of the attitude holder at the relevant time, but has to be something whose truth or falsity is up to the attitude holder’s choice. The underlying intuition here is that, though there are of course numerous conditions that support the realization of P at a future time, conditions that are not under one’s control are simply irrelevant for the truth of ‘decide to P’ (that is, if they are satisfied P will happen ceteris paribus and if they are not P won’t, but these contingencies do not affect whether the attitude decide(P) can be truthfully and felicitously ascribed to the attitude holder in question). We introduce a predicate discret (‘at one’s discretion’) to capture this idea.

While a completely precise characterization of this notion is beyond the scope of the present paper, we can attempt an approximation with the desiderative and abilitative modal operators along the following lines (here, P is of type ):

  1. (30)
    figure ae

This says that the attitude holder either takes it to be desirable that s/he does P at his/her ‘now’ and has the ability to do so, or else s/he finds it desirable that s/he doesn’t do P at his/her ‘now’ and has the ability to prevent P. In other words, either way s/he has the ability to bring about the desired outcome as far as the truth of P is concerned. The definition of the abilitative modal base \(Alt^{abl}\) is itself a complex issue, and we do not attempt to articulate it further here. One possibility would be to adopt a version of dispositional modality of the sort entertained in recent work by Castroviejo and Oltra-Massuet (2018), in which the relevant modal alternatives are defined roughly along the following lines:

  1. (31)
    figure af

Second, the presupposition in (28ii) is too weak. Specifically, we need to ensure that the property that actually holds of the attitude holder at the level of asserted content (28iii) is the totality of such preconditions Q that are at one’s discretion and which constitute the necessary conditions for P. This can be ensured by maximizing over the property Q in (28) with the max operator defined as in (32).

  1. (32)
    figure ag

(32) says that P is the strongest (or most informative) property that satisfies the (higher-order) property \(\mathscr{P} \) of type (that is, it asymmetrically entails all the other properties satisfying \(\mathscr{P} \)).
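Since the formula in (32) is available here only as a figure, the prose paraphrase can be rendered roughly as follows. This is a hedged sketch rather than the original definition: the argument order of max, the variable \(P'\), and the use of \(\Rightarrow\) for entailment between properties are all assumptions made for illustration.

```latex
% Sketch only: a rendering of the prose paraphrase of (32), not the original formula.
% Assumes the amsmath and mathrsfs packages.
% P' and the entailment arrow \Rightarrow are notational assumptions of this sketch.
\[
\mathbf{max}(P, \mathscr{P}) \;\leftrightarrow\;
  \mathscr{P}(P) \,\wedge\,
  \forall P' \bigl[ \bigl( \mathscr{P}(P') \wedge P' \neq P \bigr)
    \rightarrow \bigl( (P \Rightarrow P') \wedge \neg (P' \Rightarrow P) \bigr) \bigr]
\]
```

On this rendering, the second conjunct is what makes the entailment asymmetric: P entails every other property satisfying \(\mathscr{P} \), but none of them entails P.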

With the discret and max operators introduced above, we can now refine the meaning of ketui-suru as in (33).

  1. (33)
    figure ah

With (33), the denotation for (34) (= (1)) comes out as in (35).

  1. (34)
    figure ai
  1. (35)
    figure aj

This says that there is a time t in the past such that John is volitionally committed at t to engaging in whatever he can do in order to bring about the outcome that he opens the box in the future relative to his ‘now’ (which coincides with the matrix event time t as long as John is not confused about the current time). Note that the semantic contribution of the embedded nonpast tense (the grayed-in part in (35)) is consistent with the temporal restriction introduced by the matrix verb (\(t' > C_{now}\)).

With the past tense in the embedded clause, we would obtain \(t' < C_{now}\) replacing the grayed-in part in (35):

  1. (36)
    figure ak

This then would conflict with the temporal restriction imposed by the matrix verb (the underlined part).Footnote 14
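The compatibility check behind this contrast can be spelled out schematically. The following is a sketch under simplifying assumptions, not a reproduction of (35) or (36): it writes \(t'\) for the embedded event time, treats times as points on a linear order, and takes the nonpast tense to contribute non-precedence and the past tense strict precedence relative to the attitude holder's 'now'.

```latex
% Sketch only: times are treated as points on a linear order (an assumption).
% Lexical restriction of ketui-suru: t' > C_{now}
% Embedded nonpast contributes t' \geq C_{now}; embedded past contributes t' < C_{now}.
% Assumes the amsmath package.
\[
\begin{array}{lll}
\text{embedded nonpast:} & (t' > C_{now}) \wedge (t' \geq C_{now}) & \equiv\; t' > C_{now} \\[2pt]
\text{embedded past:}    & (t' > C_{now}) \wedge (t' < C_{now})    & \equiv\; \bot
\end{array}
\]
```

The nonpast tense thus adds nothing beyond what the verb already requires, whereas the past tense yields an unsatisfiable condition inside the attitude holder's volitional alternatives.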

A couple of comments are in order here on some of the key components of this analysis. First, from the definition in (28)/(33), it should be clear that our analysis captures causality (corresponding to Farkas’s ‘bring about’ relation) in a version of Lewisian counterfactual analysis (i.e., had Q not taken place, P would not have happened). Note in particular that this causal relation holds in the attitude holder’s epistemic alternatives only. Thus, if John decided (ketuisi-ta) before going to bed to wake up at 6am the next morning but woke up at that time for a reason completely unrelated to his decision (for example, he forgot to set the alarm clock, but woke up at 6am anyway since a car happened to drive by his house, making a loud noise), his waking up at 6am in the actual world has nothing to do with the truth of the sentence. In such a situation, it’s just that his decision was not fulfilled. More generally, whether the complement clause P is true in the actual world has nothing to do with the truth of ‘decide to P.’

Second, the condition that Q is at x’s discretion pertains to the volitional component in the informal schematic characterization in (7) in Sect. 3.1 and captures the fact that one cannot truthfully decide to do things that do not causally depend on things that are at one’s will. Thus, suppose John takes a sleeping pill, and he knows that this pill is so effective that, once he takes it, he has absolutely no control over when he wakes up whatever he does. In such a situation, saying that John decided (ketuisi-ta) to wake up at 6am is infelicitous, since there is no candidate Q that is up to John and whose execution is a precondition for bringing about P.

Finally, decision verbs such as ketui-suru have certain entailment properties that distinguish them from simple bouletic verbs such as nozomu (‘hope’) and kiboo-suru (‘request’). Our analysis captures these properties as well. Since this is a somewhat complex issue, we discuss it in Appendix B.

Before moving on, we briefly comment on commissive (tikau ‘vow,’ etc.) and directive (meiziru ‘order,’ etc.) verbs (cf. Foley and Van Valin 1984; Comrie 1985; see also Landau 2015). These verbs have the meanings of attitudinal control verbs at their core, and that essentially explains their temporal orientations. Commissive verbs are similar to attitudinal verbs in involving a ‘first person’ attitude of the controller. The difference is in the relationship between the relevant attitude and the controller. With attitudinal verbs, the attitude is strictly internal to the controller him/herself. This is captured in the above analysis by requiring Q to be true only in the controller’s volitional alternatives. By contrast, being speech act verbs, commissive verbs necessarily involve other participant(s) of the speech act event; these verbs refer to the speech act of making the relevant attitude public (i.e., known to others). While we do not spell out a formal analysis, the meaning of tikau ‘vow’ can thus be characterized as a three-place relation V between the agent/controller (a), the recipient of the message (r), and the content of the message (P) such that V(a,r,P) is true just in case a makes it known to r that a self-ascribes λC.decide(P)(C), that is, the commitment to bring about P. Directive verbs are similar to commissive verbs in involving attitude ascription. The difference between the two is to whom the ascription is directed. While commissive verbs involve a ‘first person’ ascription, directives involve a ‘second person’ ascription of obligation. Thus, taking meiziru ‘order’ to denote a three-place relation O between the agent (a), the recipient/controller (r) and the attitude P, O(a,r,P) is true iff a makes it shared knowledge between a and r that a finds it necessary/imperative that his interlocutor (i.e. who a thinks r is) ascribes to him/herself the property P.Footnote 15 Thus, both commissive and directive verbs are attitudinal verbs at their core, with additional speech act meaning components on top of it. Since the attitudes involved are future-oriented, the restriction on embedded tense (nonpast only) follows for exactly the same reason as with attitudinal verbs.

3.4.2 Factive verbs

In contrast to the future-oriented verbs, factive verbs such as kookai-suru (‘regret’) and zihu-suru (‘take pride in’) are non-future-oriented, given the nature of the semantic relations they express. That is, one can only regret (or take pride in) things whose consequences are currently relevant. This means that, for eventive predicates, the event denoted by the complement clause needs to have already happened (note that nonpast tense of eventive predicates in Japanese is future-oriented (cf., e.g., Jacobsen 2020: 313), thus excluding the simultaneous interpretation). This fact is shown in (37)(= (2e)).Footnote 16 For stative predicates, since the nonpast tense is compatible with the simultaneous interpretation, both the past and nonpast tenses are acceptable (Uchibori 2000: 204) as in (38):

  1. (37)
    figure ao
  1. (38)
    figure ap

As noted by Akuzawa (2018) and Akuzawa and Kubota (2020), this fact constitutes counterevidence to Fujii’s Tense Alternation Generalization (see Sect. 4.2.2).

The key idea behind our analysis of the factive control verb kookai-suru (‘regret’) is that it expresses a counterfactual causal dependency between what has already been done (or, more precisely, what is already ‘settled’) and what one could have done to prevent that outcome. An analysis that embodies this idea can be formalized as follows:Footnote 17

  1. (39)
    figure aq

(39) introduces two conditions in its presupposition: (i) P obtains at \(t_{0}\), which is either simultaneous with or precedes the matrix time \(C_{t}\) (the factive component; note that the evaluation time is shifted from the speech time to the matrix event time, in accordance with the shift of the doxastic center to x), and (ii) there is some property Q that x believes was at x’s discretion at a time \(t''\) preceding \(t'\) (i.e. the time of P in x’s belief worlds) and which could have prevented P. On the basis of this presupposition, the sentence asserts that at \(C_{t}\), x wishes (or, finds it desirable) to have actually done Q prior to P (where \(Alt^{desider}_{C}\) is the set of desiderative alternatives for \(C\)).

As for the temporal relations between the matrix and embedded events, the time \(t_{0}\) of P is constrained to either be simultaneous with or precede the matrix time \(C_{t}\) in the presuppositional component of the meaning of the verb. So, using the nonpast tense with an eventive predicate in the embedded clause simply leads to presupposition failure, unless coercion of the sort described in footnote 16 rescues the interpretation. This is as expected, since it makes no sense to regret matters that are not yet settled. The actual distribution of tense is somewhat more complex, partly affected by event type (stative vs. dynamic), but this arguably follows from independently motivated assumptions about the interactions between tense and aspect along the lines noted above.
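The interaction between the factive presupposition and embedded tense can be summarized schematically. This is a sketch under stated assumptions rather than part of (39): times are treated as points on a linear order, \(\leq\) abbreviates 'simultaneous with or precedes', the nonpast tense of eventive predicates is taken to be strictly future-oriented, and the nonpast tense of stative predicates is taken to allow simultaneity, as described in the text above.

```latex
% Sketch only: point-like times on a linear order (an assumption).
% Presupposition of kookai-suru: t_0 \leq C_t
% Tense contributions follow the prose description, not a formula from the paper.
% Assumes the amsmath package.
\[
\begin{array}{lll}
\text{past:}              & (t_0 \leq C_t) \wedge (t_0 < C_t)    & \equiv\; t_0 < C_t \\[2pt]
\text{nonpast, stative:}  & (t_0 \leq C_t) \wedge (t_0 \geq C_t) & \equiv\; t_0 = C_t \\[2pt]
\text{nonpast, eventive:} & (t_0 \leq C_t) \wedge (t_0 > C_t)    & \equiv\; \bot
\end{array}
\]
```

Only the third combination is unsatisfiable, which matches the observation that the nonpast tense is excluded with eventive complements but acceptable with stative ones.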

With (39), the translation for (40) (= (2e) from Sect. 2) comes out as in (41).

  1. (40)
    figure ar
  1. (41)
    figure as

This says that John had gotten married at some point \(t_{0}\) in the past, and that he later believed that he could have undertaken Q at an earlier point so as to prevent that outcome. In all the worlds compatible with what John found desirable at t (i.e. the time of regretting), he would have actually engaged in Q. Note that Q is a sufficient condition for ¬P in (39). The intuition here is that the inexecution of Q, together with other circumstantial conditions, was the cause of P. This is similar to the causal meaning of the verb let. That is, the agent let the situation develop in such a way that the outcome P ensued (causal meaning). Regret is essentially an attitude about such outcomes that one could, in principle, have prevented by deliberate intervention (volitional meaning).

Importantly, it is only Q, that is, the hidden (or absent) cause, which is required to be volitional; the outcome P can be a non-volitional event (such as losing one’s wallet), so the following example is fully acceptable:

  1. (42)
    figure at

John could have been more careful (had he chosen to be), but he wasn’t, and he now regrets it.

3.4.3 Attemptive and implicative verbs

We now turn to attemptive and implicative verbs. Unlike the classes of verbs examined above, in which the asserted content strictly pertains to the mental attitudes of the agents, attemptive and implicative verbs have entailments about facts that obtain in the real world (more precisely, the world of evaluation for the verb itself).

The most detailed analysis of attemptive verbs to date is Sharvit’s (2003) analysis of try in English (see also Grano 2017a), which builds on Landman’s (1992) analysis of progressives in terms of continuation branches. Intuitively, a continuation branch of an event is a future development of that event. According to Sharvit, the key difference between try and progressives is that the latter but not the former requires the continuation branch to be realistic. This explains the difference between the two in an ‘unrealistic’ scenario such as the one in (43). In the situation described in (43), it is inappropriate to use the progressive as in (44a) to describe what Mary is doing, but try in (44b) is perfectly fine, as long as Mary seriously believed that achieving the goal was within the realm of possibility.

  1. (43)

    We see Mary one evening at the beach in San Francisco, paddling a tiny boat. She is somehow determined to make it to the other side of the Pacific Ocean, that is, all the way to China. Clearly, there is no way for her to succeed, but she believes that, with luck, there is a chance for her to accomplish the task.

  1. (44)
    figure au

Note also that, while Mary’s attempt does not have to be backed up with realistic estimates for success, there has to be some such attempt that she is actually engaged in in the real world for (44b) to be true and felicitous. Thus, (44b) cannot be used to describe Mary sitting on a sofa in her room, just speculating leisurely on her plans about crossing the Pacific Ocean with a boat.

Though we do not literally adopt Sharvit’s continuation branch-based analysis, our analysis incorporates its key idea that the attempt need not be realistic. Our analysis of kokoromiru (which has properties essentially analogous to English try) is given in (45), which makes explicit the temporal and causal relations between the ‘volitional attempt’ component and the attempted outcome denoted by the embedded proposition.

  1. (45)
    figure av

(45) is similar to the meaning of the attitudinal verb ketui-suru (‘decide’) in (33) in involving (the maximality of) a volitional action Q that constitutes a necessary condition for the de se proposition P, but there are two differences, highlighted in gray in (45). First, unlike in (33), in (45) Q obtains not just in the attitude holder’s volitional worlds, but in the current world of evaluation as well (the second conjunct of the asserted content). This captures the intuition that trying involves an actualized precondition (but note that Q is causally related to the goal only in the attitude holder’s belief worlds).

The other difference, noted by authors such as Pearson (2016) and Grano (2017a), is the temporal relation between P and Q. In the case of attitudinal verbs, the time of P is located in the future with respect to the time of Q, the latter of which holds at the attitude holder’s ‘now.’ Attemptive verbs differ from attitudinal verbs in locating P at an extended interval containing the time of Q (i.e. the attitude holder’s ‘now’) as its subpart (that is, Q’s runtime \(C_{now}\) is a subinterval of P’s runtime \(t'\)). This is attested by the felicity of the following example (data constructed on the basis of a similar example in English with try in Pearson 2016):

  1. (46)
    figure aw

As we show immediately below, the obligatoriness of the embedded nonpast tense follows from this requirement.Footnote 18

Given the lexical entry for kokoromiru in (45), the denotation for (47) (= (2a) from Sect. 2) comes out as in (48).

  1. (47)
    figure ay
  1. (48)
    figure az

Here, the embedded tense constrains the time \(t'\) of opening the box to not precede the attitude holder’s now \(C_{now}\). This is consistent with the lexical meaning of the attemptive verb, which requires \(t'\) to be a superinterval of \(C_{now}\). If the embedded tense were replaced with the past tense, the strict precedence relation of the past tense would contradict this superinterval requirement, and this would lead to the same type of anomalous attitude ascription as in the case of attitudinal verbs.
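The reasoning in this paragraph can be made explicit as follows. This is a schematic sketch, not a reproduction of (48): it assumes that \(t'\) and \(C_{now}\) are non-empty intervals, and that \(\prec\) is strict precedence between intervals (every point of the first interval precedes every point of the second); both the symbol and this interval-based reading are assumptions of the sketch.

```latex
% Sketch only: t' and C_{now} are non-empty intervals; \prec is strict interval
% precedence (notational assumptions, not taken from the paper's formulas).
% Lexical requirement of kokoromiru: C_{now} \subseteq t'
% Assumes the amsmath and amssymb packages.
\[
\begin{array}{ll}
\text{embedded nonpast:} &
  (C_{now} \subseteq t') \wedge \neg (t' \prec C_{now})
  \quad \text{(satisfiable)} \\[4pt]
\text{embedded past:} &
  t' \prec C_{now} \;\Rightarrow\; t' \cap C_{now} = \varnothing, \\
 & \text{which is incompatible with } C_{now} \subseteq t'
   \text{ for non-empty } C_{now}
\end{array}
\]
```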

Let us now move on to implicative verbs. As discussed in Sect. 3.2 (cf. the discussion on example (12)), implicative verbs that take koto-marked complement clauses have meaning components that express de se attitudes. The simplest way to capture this property is to assume that these verbs presuppose the meaning of an attemptive verb. Specifically, we assume the following meaning for seikoo-suru, which roughly says that ‘succeed P (at t)’ presupposes that at some subinterval \(t'\) of t, ‘try P’ is true and asserts that P actually obtains at t.Footnote 19

  1. (49)
    figure bb

We assume that the predicate try in (49) is identical to the meaning of kokoromiru in (45). Note that seikoo-suru resets the evaluation time to the time of the matrix clause (\(C_{t}\)) for both the presupposed attemptive meaning and the asserted implicative content (see (18) for the \(/@C_{t}\) notation). This way, the embedded tense is interpreted relative to the matrix time (see below for the specifics).

The translation for (50) (= (2f) in Sect. 2) in (51) shows how the temporal relations are specified in the presupposed and asserted meanings of implicative verbs:

  1. (50)
    figure bc
  1. (51)
    figure bd

(51) presupposes that John tried to deceive the enemy at \(t'\), an initial subinterval of some past time t. On the basis of this presupposition, the sentence asserts that John actually did deceive the enemy at t. Note in particular here that, since the evaluation time of the embedded clause is set to the matrix time by the lexical meaning of seikoo-suru in (49), the contribution of the embedded nonpast tense (whose function is to impose a non-precedence relation between the event time and the evaluation time) ends up being redundant (t ≥ t).Footnote 20

With the past tense, this redundancy will be replaced by a contradiction (t<t). Unlike with attitudinal and attemptive verbs, due to the resetting of the evaluation time for tense (which comes from the implicative nature of these verbs), here the contradiction obtains at the level of matrix assertion:

  1. (52)
    figure bf

That is, it simply makes no sense for the speaker to assert that somebody ‘succeeded’ in doing P, where the time of success (located by the matrix tense) is simultaneous with the time of P (due to the lexical meaning of the implicative verb) while at the same time the time of P is past relative to the time of success (due to the meaning of embedded past). Intuitively, the embedded past tense variant of (50) sounds as if the choice of embedded tense is simply wrong, so we take the prediction of our analysis (that it is contradictory for the speaker to assert (52)) to be on the right track.
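Schematically, the contrast between the two embedded tenses with implicative verbs reduces to the following. This is a sketch under the assumptions stated in the text (the evaluation time of the embedded clause is reset to the matrix time t, and the event time of P is identified with t), not a reproduction of (51) or (52).

```latex
% Sketch only: with seikoo-suru, both the evaluation time and the event time of
% the embedded clause are the matrix time t, so the tense relates t to itself.
% Assumes the amsmath package.
\[
\begin{array}{lll}
\text{embedded nonpast:} & t \geq t & \equiv\; \top \quad \text{(redundant)} \\[2pt]
\text{embedded past:}    & t < t    & \equiv\; \bot \quad \text{(contradictory, by irreflexivity of } < \text{)}
\end{array}
\]
```

Because the offending conjunct sits in the asserted content rather than inside an attitude, the resulting contradiction is one that the speaker, not the attitude holder, is committed to.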

3.4.4 Aspectual and dispositional verbs

Finally, we briefly comment on aspectual and dispositional verbs. These verbs are non-future-oriented, and they allow only the nonpast tense for the embedded clause. Here again, once we examine the meanings of these verbs carefully, the embedded tense restriction follows straightforwardly. We start with aspectual verbs. As noted in Sect. 2, some koto-taking aspectual verbs have corresponding compound verb variants ((53a) (= (i) in footnote 6)), and there is a subtle but crucial meaning difference between these two variants that is relevant for the analysis of the koto-taking variant. The following pair illustrates this point.

  1. (53)
    figure bg

The compound verb version in (53a) is more neutral in meaning, where the role of the aspectual verb is just to encode the inception of the event. Thus, (53a) is compatible with some specific event of John drinking milk, and the sentence just asserts that the relevant event started. By contrast, the koto-complement version in (53b) is somewhat awkward in the same situation, except when a special effort is required on the part of John in initiating the milk-drinking situation (which goes against our ordinary world knowledge about drinking milk). The sentence is instead most typically construed as an establishment of a habit (see Uchibori 2000: 96–97 and Yamada 2019: 321–322), and has the connotation that the habit involves some conscious management on the part of the agent (i.e. John).Footnote 21 Note that in this respect, the koto-taking variant of the aspectual verbs instantiates the general schema in (7).

The temporal properties of aspectual control verbs can be accounted for by taking into consideration this additional aspect of meaning. The idea in a nutshell is that these verbs mark the beginning, continuation, or ending of a habitual/persistent state of affairs. For example, in the case of the continuative aspectual verb tuzukeru (‘continue’), the matrix clause denotes a temporally extended state of affairs in which some property P (denoted by the embedded clause) obtains as a ‘regular pattern.’ Given the nature of habitual events, the specific instantiations of the embedded event P all occur at subintervals of the larger interval at which the matrix clause is evaluated. This effectively forces the choice of the embedded tense to nonpast.Footnote 22

Finally, let us take a look at dispositional verbs briefly. As noted by previous authors (e.g. Copley 2018; see also Sect. 6 below), there is a close connection between the notions of disposition and volitionality, manifested in a range of phenomena in natural language. It is then not too surprising to find dispositional verbs as a subtype of control verbs. But there is one difference between English and Japanese: unlike control verbs with dispositional meanings in English such as (54a) noted by Landau (2013), Japanese koto-taking dispositional verbs obligatorily require animate subjects, as shown by the infelicity of (54b).Footnote 23

  1. (54)
    figure bi

As already noted with aspectual verbs, this is a language/construction-specific property of koto-taking control verbs, having nothing to do with dispositionality per se.

Disposition is arguably an inherently causative notion; for example, Choi and Fara (2018) offer the following (slightly simplified) characterization from Lewis (1997):

  1. (55)

    An object x is disposed to M when C iff x has an intrinsic property B such that, if it were the case that C, and if x were to retain B for a sufficient time, then C and B would jointly cause x to M.

It then seems reasonable to assume that dispositional verbs in (3) instantiate the same general schema of control verb meanings in (7). Their temporal properties also follow naturally. Given (55), dispositional predicates are essentially counterfactual, and they shift the world of evaluation, but not necessarily the temporal index. The intrinsic property B (which corresponds to the volitional meaning component in the case of intentional control verbs) has to either precede or be simultaneous with its outcome M (denoted by the embedded clause) given the causal relation. But this just means that dispositional predicates are present/future-oriented, so the obligatoriness of the nonpast tense follows in a way completely parallel to future-oriented control verbs.

The discussion in this section, together with the de se property of koto-taking aspectual and dispositional verbs noted in Sect. 3.2, makes it clear that these verbs embody the general schema in (7). That is, they require volitional management on the part of the agent to bring about a de se proposition, denoted by the embedded clause. In this respect, though these verbs at first sight may not appear to encode any attitudinal meaning (or causality), they share a common core meaning with the more prototypically volitional types of verbs such as attitudinal and attemptive verbs.

4 Syntax-semantics interface

4.1 Assumptions about syntax

In our analysis, the complement clause of a control verb denotes a centered proposition of type semantically. The link between the individual parameter of this context (\(C_{s}\)), that is, the doxastic center, and the controller argument in the matrix clause is established in the lexical meaning of the control predicate. This is in line with the semantic analysis of control, in particular, Chierchia’s (1989) analysis which takes the notion of de se to be a key property of control. But there is one loose end that needs to be tightened: the lexical semantic analysis doesn’t by itself ensure that the embedded subject gets identified as the doxastic center in the complement clause. There are several different ways in which this particular aspect of the syntax-semantics interface can be implemented. In this section, we outline the key aspects of such a mechanism somewhat informally, so that readers can have an idea of how this can in principle be done in any theory. In Appendix A, we provide a fully explicit fragment within Hybrid Type-Logical Grammar (Kubota and Levine 2020), a version of categorial grammar in which this component can be implemented particularly straightforwardly.

The question is how the embedded subject is identified as the doxastic center of the embedded clause. As we have argued in Sect. 2, we assume that the subject position of the embedded clause is occupied by an ordinary zero pronoun (or sometimes an overt pronoun). Then, following Chierchia (1989), the syntax needs to provide some mechanism for λ-binding this argument position when the whole clause is formed, along the following lines:

  1. (56)
    figure bj
  1. (57)

In (57), the passing of information (that the semantics of the subject NP is what gets abstracted over in the semantics of the whole clause) is mediated by a syntactic feature-inheritance mechanism in an HPSG-like notation. Specifically, the SUBJ(ECT) feature at the VP node and the TO-BIND feature at the immediately dominating S node are coindexed, which ensures that the semantics of this NP, namely, the variable x, is bound by the λ-operator at the clausal level (the same effect can be achieved by local feature checking of some sort in the more mainstream approach; see, e.g., von Stechow 2004).

Once this semantic abstraction of the embedded subject position is ensured, then it only suffices to assume that the matrix control verb has the following type of meaning that takes a property and an individual as arguments to return a proposition, of type .

  1. (58)
    figure bk

In the semantic term in (58), a centered proposition whose doxastic center is identified as the individual type argument of this property (i.e. \(\lambda C.P(C_{s})(C)\)) is given as the first argument of the constant decide, whose definition is given in (33) from Sect. 3, and this enforces the identification of the type e argument of the property as the doxastic center of the relevant centered proposition. With these assumptions, the denotation for the matrix VP node in (57) comes out as in (59), which corresponds to the formula in (35) from Sect. 3.4.1.

  1. (59)
    figure bl

The fact that a control verb requires its complement clause to denote a property is ensured by its semantic type, so, no further syntactic assumption is needed. Specifically, assuming that lambda abstraction at the clausal level is licensed only in the presence of a pronominal local subject (mediated by a syntactic feature-inheritance mechanism of the sort outlined above), the derivation simply crashes by semantic type mismatch if a fully saturated proposition syntactically combines with a matrix control verb (in which case a proposition of type would be given as an argument to a function that is looking to combine with an argument of type ).

The above discussion is admittedly sketchy, but we believe that it gives the reader enough of an idea of how the semantic analysis we have argued for in the previous sections can be coupled with an adequate syntax-semantics interface of koto-taking control verbs in Japanese. For a more complete fragment spelling out the relevant details fully explicitly, we refer the reader to Appendix A.

4.2 Brief comparisons with alternatives

In this section, we compare our proposal with some of its major alternatives. We start with approaches to control in the lexicalist syntax literature in Sect. 4.2.1. While our approach shares with this lexicalist tradition the heavy emphasis on semantic aspects of control, there are clear syntactic reasons for which a direct application of the so-called “VP analysis” of control commonly adopted in this literature is not feasible for our Japanese data. We then compare our work with syntactic approaches to control in Sect. 4.2.2. Here, our claim essentially is that much (or in fact most) of what has traditionally been regarded as syntactic generalizations in this latter approach receives independent semantic explanations. We expand on this latter point further in Sect. 5, in relation to the larger literature on finite control in other languages.

4.2.1 Lexicalist approaches to control

There is an important line of work on the analysis of control predicates in formal semantics and nontransformational syntax that has its roots in the categorial grammar literature in the 80s (Bach 1979; Chierchia 1984; Dowty 1985). The key idea of this approach is that English-type control can be analyzed simply by taking the control verb to combine with a VP (semantically denoting a property) rather than an S (semantically denoting a proposition), along the following lines:

  1. (60)

There are important similarities between this analysis and our own proposal. Most crucially, in both, the embedded clause semantically denotes a property rather than a proposition (so they are both instances of the “property analysis” of control semantically, as opposed to the “propositional analysis”), and that crucially explains the link between the controller argument of the matrix predicate and the unexpressed subject argument in the embedded clause.

The lexicalist approach works fine for English, but it cannot be directly applied to koto-taking control verbs in Japanese, as long as one seeks a uniform analysis for cases with and without overt subjects. The crucial case comes from examples such as (61) (= (5) in Sect. 2) involving an overt controlled subject in the embedded clause:

  1. (61)
    figure bm

Here, there is no way in which the embedded clause can be analyzed as a VP. Since the lexicalist analysis takes the lack of an overt subject to correlate with the property denotation of the complement clause, a simple application of the lexicalist analysis to Japanese finite control is untenable. Instead, something like the anaphoric binding-based analysis of the sort we have spelled out in Sect. 4.1 is needed.

4.2.2 Syntactic approaches to control

By far the dominant assumption about (finite) control in the syntactic literature is still one in which the link between the matrix controller argument and the embedded subject is mediated via abstract mechanisms in the syntax, be it PRO in GB Theory (Chomsky 1981) or some kind of movement in minimalist syntax (Hornstein 1999).Footnote 24 In fact, in the literature on Japanese finite control, this approach has been the majority view since the 80s (Hasegawa 1984/85; Uchibori 2000; Fujii 2006).

In what follows, we briefly review Fujii (2006), the most recent and most detailed work among syntactic approaches to finite control in Japanese. A common idea behind such approaches is the assumption that the embedded tense of finite control is ‘defective’ in some sense, borrowing an idea from the Balkan subjunctive literature (we will say more about this line of work in Sect. 5). As a descriptive generalization, this makes sense to some extent—with many of the verbs discussed above, the embedded tense is semantically redundant since the temporal relation between the matrix and embedded events is already determined by the lexical meaning of the control verb.Footnote 25

However, instead of pursuing a semantic analysis (a possibility he acknowledges in passing), Fujii (2006: 92) elevates this descriptive observation to a syntactic generalization along the following lines:

  1. (62)
    figure bn

This condition enables Fujii to treat Japanese koto-clauses on a par with infinitives in English; that is, all the standard syntactic properties of control immediately follow from the structural properties of such clauses. However, there are several reasons to think that this type of syntactic approach is unsatisfactory.Footnote 26

The first problem is that overt subjects can appear in the embedded clause, as already noted in Sect. 2 (repeated from (5)):

  1. (63)
    figure bo

Note in particular that the overt element is not limited to reflexives (which on some approaches might be argued to be an overt allomorph of PRO); a pronoun is also possible as long as the right pragmatic conditions are satisfied.Footnote 27 One can of course claim that such sentences with overt subjects are by definition not (syntactically) control constructions, but that leaves unexplained why an obligatory de se interpretation is observed in sentences like (63) in just the same way, regardless of whether the embedded subject is overt or not.

Second, as pointed out by Akuzawa (2018), the control/non-control contrast holds not just with koto-taking verbs, but also with event nominal complements, as in (64).

  1. (64)
    figure bp

Event nominals are syntactically tenseless. Thus, the most straightforward way to capture the parallels between them and koto-taking verbs is to account for the observed patterns in terms of semantic, rather than syntactic factors. In particular, Fujii’s syntactic account of koto-taking verbs has nothing to say about the data in (64). It is worth noting in this connection that these event nominals obey essentially the same restrictions about temporal adverbial modification as koto-taking control verbs (see (4) in Sect. 2 for corresponding data for the latter). We list in (65) the contrast between attitudinal and implicative verbs for illustration (see Uchibori 2000 and Kubota and Akuzawa 2020 for more data and related discussion).

  1. (65)
    figure bq

Third, as noted in the previous section, there is at least one type of verb, namely, factive control verbs such as kookai-suru (‘regret’), which does permit tense alternation, constituting a clear counterexample to the Tense Alternation Generalization (TAG) in (62) (the fact itself was actually already noted by Uchibori 2000: 204):

  1. (66)
    figure br

To summarize, a syntactic approach to finite control in Japanese suffers from at least three empirical problems. Moreover, a syntactic approach is conceptually unsatisfying too, in that it does not by itself shed any light on the question of why something like the TAG (almost) holds. Given that an analysis which takes the embedded subject to be an ordinary zero pronoun is the null hypothesis to begin with, we take this discussion to provide compelling evidence favoring a semantic approach like ours.

In view of the larger literature on control, syntactic approaches to control come in several varieties, of which Fujii’s proposal reviewed above is just one. Our proposal differs from all such approaches to the extent that the latter posit some mechanism or other at the level of syntax that goes beyond the explicit identification of the doxastic center and the local subject slot of the embedded clause. A comparison with Wurmbrand’s (2014) work on infinitives in English is useful at this point, as it brings out clearly the relationship (and points of contention) between a semantic analysis of the sort we advocate and syntactic approaches in the literature with respect to languages (such as English) in which the relationship between temporal interpretation and syntactic realization of tense is potentially more complex. Wurmbrand conducts a careful study on the temporal interpretations of infinitives in English, and comes to the conclusion that the alleged “tenses” of infinitives in English (Stowell 1982; Martin 2001) do not display indexical properties (such as double access readings for present) typical of morphological tense in English at all.

Wurmbrand (2014) illustrates her point with the contrast in (67), where the time of the future event denoted by the infinitival complement of decide is relative to the matrix event time (i.e., Leo’s now), unlike in the finite clause.

  1. (67)
    figure bs

The gist of Wurmbrand’s claim is that English future infinitives are themselves tenseless, but that they contain a syntactic operator woll, which is responsible for the future interpretation of infinitival clauses. In addition, on the basis of contrasts such as (68), she takes it that (what she calls) simultaneous infinitives in (68b) lack woll altogether.

  1. (68)
    figure bt

Wurmbrand’s proposal is syntactic, in that it involves the operator woll as an indispensable component. While investigating the syntactic structure of English infinitives is beyond the scope of this paper, it is important to note that the evidence provided by Wurmbrand for the syntactic presence of woll in future infinitives is not conclusive. One of the two pieces of evidence she offers involves the tense interpretation of relative clauses in examples such as the following, originally due to Abusch (2004: 49):Footnote 28

  1. (69)
    figure bu

In this sentence, the time of the relative clause (having a crush) can precede the time of the infinitive (having dinner). Wurmbrand takes this as evidence for the syntactic presence of woll in future infinitives, on the assumption that the tense in the infinitive and that in the relative clause are syntactically bound by the same operator. However, as Wurmbrand (2014: 423, fn. 14) herself acknowledges, whether the interpretation of tense is uniformly derived by syntactic binding is itself a controversial issue, and there is an anaphora-based alternative (similar to the situation with real pronouns) that can possibly explain the data point in (69) without recourse to woll. Therefore, whether or not woll is present in future infinitives remains open to dispute.

The English infinitive patterns in (67) and (68) are compatible with our approach, assuming that the meanings of the relevant verbs in Japanese and English are similar in their temporal properties.Footnote 29 While the issue of how much needs to be encoded in syntax remains controversial, we believe that our discussion has at least clarified the implications of pushing the semantic approach to its limit, and that at this point a careful comparison is needed on the relative advantages of syntactic and semantic approaches to the interpretation of temporal properties in control constructions.

5 Going beyond Japanese

Building on the brief discussion of the implications for syntactic alternatives above, this section considers the implications for the broader cross-linguistic literature on the syntax and semantics of finite tense/finite control in languages other than English.

Landau’s (2004: 818) study on finite control in Hebrew, with examples such as (70), turns out to be quite illuminating for this purpose.

  1. (70)
    figure bv

In (70), the embedded future tense is interpreted relative to the matrix event time (Landau calls this type of tense interpretation “dependent tense,” in accordance with the terminology in the Balkan subjunctive literature).

Based on the temporal properties of finite control in Hebrew, Landau draws the following generalization:

  1. (71)
    figure bw

Interestingly, this type of tense interpretation is impossible with non-control verbs in Hebrew, as the following example shows:Footnote 30

  1. (72)
    figure bx

The data in (70) and (72) suggest that temporal interpretation and the licensing of (finite) control are somehow closely related in Hebrew. Essentially, a relative tense interpretation that is otherwise unavailable in embedded contexts becomes possible when the embedded clause receives a control interpretation. This is highly suggestive, given that some of the other languages in which finite control is observed (Japanese and Korean) are known to be relative tense languages.

In fact, a broadly similar pattern has been observed across a range of Balkan languages too. The key descriptive generalization from this literature is that there is an overall correlation between the types of control observed and the interpretation of embedded tense (Krapova 2001; Spyropoulos 2008; Smirnova 2009a): “dependent tense” induces partial control (where the controlled subject of the embedded clause is understood to refer to a group of people including the referent of some matrix argument NP) and “anaphoric tense”—which is taken to be a type of tense morpheme that is apparently meaningless, being licensed only via some “anaphoric” mechanism—induces exhaustive control (where the referents of the controlled subject and one of the matrix arguments coincide with each other).
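The two notions of control just defined can be stated as simple relations between referents. The following Python sketch is only a schematic restatement under assumptions of our own: referents are modeled as sets of individuals, partial control is taken to require proper inclusion, and the function names and the example individuals are hypothetical.

```python
# Referents are modeled as frozensets of individuals (an assumption of this sketch).

def exhaustive_control(controller, controlled_subject):
    """Exhaustive control: the two referents coincide."""
    return controller == controlled_subject


def partial_control(controller, controlled_subject):
    """Partial control: the controlled subject is a larger group
    that includes the controller (proper inclusion is assumed here)."""
    return controller < controlled_subject


# Hypothetical individuals, for illustration only.
chair = frozenset({"chair"})
committee = frozenset({"chair", "member1", "member2"})

chair_alone_is_exhaustive = exhaustive_control(chair, chair)
chair_in_committee_is_partial = partial_control(chair, committee)
chair_alone_is_partial = partial_control(chair, chair)
```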

The discussion of Greek by Spyropoulos (2008) is representative of this literature. According to Spyropoulos (2008), the verbs in (73) license dependent tense and anaphoric tense in the complement clauses respectively:

  1. (73)
    figure by

The verbs in (73a) in general have future-oriented meanings, and the tense in the embedded clause receives a relative tense interpretation, just like Landau’s Hebrew example in (70). The verbs in (73b) are implicative, aspectual and perception verbs. These are similar to the Japanese implicative and aspectual verbs from Sect. 3. The matrix and embedded ‘events’ are simultaneous (if there are indeed two distinct events involved), and the (nonpast) tense in the embedded clause seems to just ‘corefer’ to the matrix tense, hence the name ‘anaphoric’ tense.

While the notions of anaphoric and dependent tenses have played a major role in the literature on Balkan subjunctives (especially in relation to the notion of “defective tense”; cf., e.g., Varlokosta and Hornstein 1993; Terzi 1997; Krapova 2001), in the context of a more general typology of embedded tense, they are just instances of relative tense. This view is advocated particularly clearly and forcefully by Smirnova (2009a), who notes that (despite what the names suggest) the distinction primarily pertains to the lexical meanings of the matrix predicates rather than to the interpretations (let alone syntactic properties) of the tense morphemes or the subjunctive markers.

It then seems reasonable to conclude that there is a robust cross-linguistic tendency for tense morphemes in finite control environments to be interpreted as relative tense. The following hypothesis then suggests itself as a possible cross-linguistic universal:

  1. (74)
    figure bz

Taken as a semantic generalization, there is a sense in which (74) is a conceptually natural pattern. Recall from Sect. 3 that the fundamental property of control predicates is self-ascription of a centered proposition (i.e. a de se property) by an attitude holder. Different control verbs encode different types of causal dependency between these de se attitudes and underlying volitional meanings. It is then only natural that such de se properties are located temporally from the perspective of the attitude holder, that is, relative to the evaluation time of the embedded predicate which is identified as the attitude holder’s ‘now’ via the lexical meaning of the matrix predicate.Footnote 31
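The difference between a relative and an absolute interpretation of an embedded tense can be illustrated with a toy model. The following Python sketch is a schematic illustration under assumptions of our own, not a formalization of the analysis: times are integers, a future tense requires the event time to follow its evaluation time, and the function names are hypothetical. On the relative reading the evaluation time is the attitude holder's 'now' (the matrix event time); on the absolute reading it is the utterance time.

```python
# Toy model: times are integers; a larger number means a later time.
# All names here are illustrative assumptions, not notation from the analysis.

def future_holds(event_time, evaluation_time):
    """A future tense is satisfied if the event follows the evaluation time."""
    return event_time > evaluation_time


def embedded_future(event_time, matrix_time, utterance_time, relative):
    """Evaluate an embedded future tense.

    relative=True  -> anchored to the matrix event time (attitude holder's 'now')
    relative=False -> anchored to the utterance time
    """
    anchor = matrix_time if relative else utterance_time
    return future_holds(event_time, anchor)


# Scenario: a decision at time 1 about an event at time 2, reported at time 3.
matrix_time, event_time, utterance_time = 1, 2, 3

relative_reading = embedded_future(
    event_time, matrix_time, utterance_time, relative=True
)
absolute_reading = embedded_future(
    event_time, matrix_time, utterance_time, relative=False
)
```

In this scenario the embedded event lies in the past of the utterance but in the future of the decision, so only the relative reading is satisfied.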

As far as we have been able to determine, reports in the literature on finite control in languages that have tense inflection in control clauses all conform to the generalization in (74) (examples include Gamerschlag 2007: 101 on Korean and Potsdam and Polinsky 2007: 285 on Malagasy).Footnote 32 Note also that, though stated primarily as a generalization about languages that realize tense in control clauses, (74) is potentially relevant for languages in which tense (in control clauses) is not morphologically overt as well. For example, infinitival clauses in English have sometimes been argued to be “tensed” (see, e.g., Stowell 1982; Martin 2001). Note in particular Wurmbrand’s (2014) work we discussed briefly in Sect. 4.2.2, whose overall conclusions are (at least at the descriptive level) perfectly consonant with our own.

6 Conclusion

In this paper, we proposed a semantic analysis of finite control in Japanese, focusing on verbs that take koto-marked complement clauses. While koto-taking verbs that exhibit (semantic) control properties belong to apparently heterogeneous semantic classes, we have shown that they all share a common underlying abstract meaning that can be characterized in terms of a causal relation between a hidden volitional action and the de se property denoted by the complement clause. The semantic approach we argued for has its roots in the pioneering works by Farkas (1988) and Chierchia (1989) from the 80s, and is in line not only with the property theory of control in the formal semantics and categorial grammar tradition (Bach 1979; Chierchia 1984; Dowty 1985) but also with what seems to be the emerging convergence in the recent literature (cf., e.g., Grano 2015; Landau 2015), which unanimously recognizes the importance of semantic factors in characterizing the notion of control. We believe that our proposal is the first to identify the core meaning common to a wide range of control verbs encompassing both the future-oriented (attitudinal, commissive, etc.) and non-future-oriented (factive, implicative, etc.) subtypes.

Our proposal raises several questions for future inquiry. Here, we identify three issues which we take to be the most urgent. First and foremost, the validity of the Hypothesis of Relative Tense in Finite Control needs to be investigated more thoroughly. The fact that temporal interpretation correlates closely with control construal has been repeatedly noted in the syntactic literature (Stowell 1982; Landau 2000; Martin 2001; Wurmbrand 2014; Grano 2015). We believe that our hypothesis offers an interesting alternative interpretation for many of the known facts, but it goes without saying that a much more thorough investigation is needed in this domain.

Second, and relatedly, we did not have space to examine the phenomenon of partial control. This issue is important since the partial vs. exhaustive control distinction is often taken to reflect a syntactic difference (e.g. Landau 2015), and in at least some proposals (such as that of Krapova 2001), it is directly related to the different types of syntactic notions of tense that are posited in control complement clauses (but see Asudeh 2005 for an opposing view, according to which this distinction is orthogonal to syntactic assumptions). Thus, whether a semantic analysis that does not rely on such syntactic distinctions (see, e.g., Jackendoff and Culicover 2003 and Pearson 2016) is feasible has direct ramifications for the larger controversy between syntactic and semantic approaches to control.

Finally, there is the question of whether a uniform semantic analysis of control would be possible for languages other than Japanese. As noted by many authors, the classes of verbs that induce control interpretations are remarkably similar across a wide range of typologically unrelated languages. While the notion of de se attitude ascription adequately characterizes the class of koto-taking control verbs in Japanese, it is arguably too narrow for characterizing the semantics of control predicates in general. This can be seen most clearly from examples such as the following noted by Landau (2013: 33–34):

  1. (75)
    figure cb

Strikingly, all these examples arguably involve the notion of ‘dispositional causation’ in the sense of Copley (2018) (see also Copley and Wolff 2014), a property that is observed when an inanimate expression is unexpectedly licensed in a syntactic environment in which a volitional agent is normally expected (attested in futurates and have causatives). Copley (2018) suggests that there may be a close connection between the notions of intentionality and dispositionality in such cases. In the case of (75), the higher predicate ascribes some inherent property to the controller which essentially constitutes a sufficient condition for the proposition expressed by the embedded clause. The question that arises then is whether we can come up with a suitably general semantic notion that subsumes both de se attitude ascription in the Chierchian sense and the notion of disposition relevant for (75). Much more work is needed here in order to critically examine whether a truly uniform semantic characterization of the notion of control is possible. We see this as an exciting opportunity for future research.

Postscript

At the last minute of finalizing this paper, we became aware of Fujii et al. (2023), which addresses an earlier version of our criticism of the Tense Alternation Generalization. While we believe that all of their criticisms of our semantic analysis of control are fully addressed by the updated analysis we have presented in this paper, a detailed critique of their argument must await another occasion.