1 Introduction

1.1 The Intuitive Notion of “Level-Generation”

Our point of departure is a philosophical theory from as far back as 1970, the year when the first seminal papers by Richard Montague appeared and triggered the development of formal semantics. Goldman’s theory of “level-generation” was the first general theory of action [Footnote 1] to come up with the idea (and observation) that we very often consider ordinary tokens of acts as representing more than one type of act. While it is an almost trivial fact about categorization that one and the same thing can always be categorized in numerous different ways, Goldman’s theory makes a much stronger claim: his basic mechanism of “level-generation” relates multiple categorizations of the same doing in systematic ways. Under given circumstances, level-generation yields a whole tree of categorizations, such that doing a particular thing amounts to, or constitutes, doing at the same time (in one) a variety of things of different types. Goldman emphasizes that his notion of level-generation meets a basic intuition, and you will see that it does from just a handful of examples (in (1) in the box). These examples are to be read as follows: start from the bottom and follow the ↥ arrows; these symbolize level-generation. Assume that for each example the given circumstances are such that they allow the arrow to be read as “and thereby”, or “this constitutes”. You can easily imagine (or reconstruct) circumstances that would support these steps of level-generation. The vertical structures are trees; for the sake of simplicity, the trees in (1) don’t branch, but you will see below that trees can. The trees consist of acts by the same agent and they coalesce acts that are all done in one: x, in one, flips the light switch, turns on the light, lightens the room, wakes the baby, ruins their night, all done by one little movement of a finger. The same holds for all other examples of level-generation. Being done in one, all those acts in a tree happen at the same time.

figure a

These examples all seem natural. Without much reflection, we would agree that in all these cases the upward arrow may, under appropriate circumstances, be expressed as “and thereby” and always means the same; and it is natural to view these examples as different types of act done in one. It is this intuitive connection between different ways in which—under circumstances—a given act can be categorized that Goldman’s theory of action is about.

Level-generation is an extremely common thing. If we think about it, we realize that our minds are doing it automatically and inevitably all the time. If somebody does something concrete, we will categorize it not just as a basic bodily action like keeping a door open, handing money to somebody, or pressing a button. Rather, we will have our attention on what the person is doing thereby, because what will matter to us will not be the mere bodily movements, meaningless in themselves, but what they achieve (or try to achieve). The same applies to our own actions and the ways we mean them. We don’t mean to exercise our thumb when we press a button on the remote control; we mean to turn on the TV. Most, if not all, of the things we physically do, we do not do just for themselves.

1.2 The Structure of the Chapter

Goldman originally presented his theory as a contribution to philosophical ontology. He argued that under circumstances like those assumed in the examples, the agent exemplifies multiple different acts in one. Not every ontologist would follow him; many would argue that the agent does just one thing which may happen to meet different descriptions, under circumstances.

I will re-construe Goldman’s theory not as an ontological theory of action, but as a theory of the cognitive categorization of action, a view which Goldman actually supported later in arguing that the notion of level-generation is “a psychological structure, or the manifestation of a psychological structure” (see (7) in Sect. 2.3 for the full quote from Goldman 1979). This turn has important consequences. First, Goldman’s theory is turned into a theory of cognitive representation, and his mechanism of level-generation receives the role of a cognitive mechanism. Second, it makes the theory immune to the ontological objection that there exists only one doing, not several distinct ones: the fact that one doing may, under circumstances, be categorized in multiple ways is uncontroversial. Third, the psychological turn makes Goldman’s theory applicable to linguistic semantics (of a cognitive orientation); as you will see, it is to be assumed that level-generation is written into the lexical meanings of probably almost all verbs of action.

In Sect. 2, I will briefly review Goldman’s original theory and its reception in the philosophical discussion. My own construal of the theory will be made precise; I will introduce the central notions of ‘cascade’ and ‘c-constitution’ replacing Goldman’s ‘act-tree’ and ‘level-generation’, respectively. Section 3 provides examples and data that illustrate the relevance of level-generation for verb semantics and verb grammar.

The second part will be concerned with a formalization of c-constitution and cascades in the framework of Düsseldorf Frame Theory and the application of the approach to semantics. In Sect. 4, act-cascades will be modeled as trees of first-order frames that each represent a single type of action (like ‘flip the light switch’ or ‘wake the baby’). Section 5 will treat in depth an illustrative, more complex example, the ‘write’ cascade. In Sect. 6, I will discuss the far-reaching consequences of a cascade approach to action verb meanings for theories of lexical meaning, composition, and reference. The chapter will conclude with a brief reflection on the perspectives that the multilevel approach to categorization opens up for cognition, semantics, and life.

2 Level-Generation: Doing Multiple Things in One

2.1 Preliminary: Act-Tokens, Act-Types, and Act-TTs

The upward relation symbolized by the arrow ↥ in the examples represents what Goldman called level-generation. The first question concerning this notion is: what kind of thing does it relate? Goldman (1970) distinguishes act-tokens and act-types. Act-types are common enough: they are types such as ‘open the door’, ‘turn on the light’, ‘wake the baby’, or ‘decline a request’. [Footnote 2] They can be defined more or less specifically, for example as ‘open’, ‘open a door’, ‘open (a particular) door’, ‘x open (a particular) door’ etc. In philosophy, types of act (or action) are often subsumed under the notion of “property”, in semantics, under “types of events”. Act-types are exemplified/enacted/performed/implemented if someone does something of that type. The agent then produces an act-token of this type. If Sue does something that can be described as “open the door”, she produces a token of the act-type ‘open the door’. An act-token has a determinate agent and occurs at a determinate time.

According to Goldman’s approach, level-generation obtains between act-tokens in this sense; there is a token of ‘flip the light switch’ that level-generates a token, by the same agent and at the same time, of ‘turn on the light’, and so on. In Goldman’s account, two act-tokens are different if they are tokens of different types, and two tokens are identical only if they are tokens of the same type; more precisely:

figure b

Thus, according to him, the tokens in one act-tree are distinct. The conditions in (2) mean that the relation of level-generation does not obtain between act-tokens as such, but between acts-as-tokens-of-a-type. For example, (1d) is to be construed as: a token of the act-type ‘say “No” to y’ level-generates a token of the act-type ‘decline y’s request’, and this in turn a token of the act-type ‘disappoint y’.

Tokens-of-a-type are a very natural kind of thing. Whenever we talk about acts or events, we do so while describing them as of one type or another. For example, if we use a VP for event reference, the VP provides a description of the event referred to and thereby gives its type. Language cannot refer to acts other than by type description and semantic and pragmatic means that fix the reference to particular tokens of that type. This does not only hold for acts and events, but in general for all things we verbally refer to: we always refer qua type, that is, using expressions that provide a type description. It may even be argued that this applies beyond language to thinking in general: we can’t think of things, or even perceive things, without categorizing them in one way or another.

I will refer to a token-of-a-type as a “TT” for short, and introduce the following notation:

figure c

TTs are essentially ordered pairs of an entity and a type such that the entity is of this type. It follows immediately that two TTs t/T and t’/T’ are different if T and T’ are. Goldman himself never speaks explicitly of act-tokens-of-a-type, but always of act-tokens and of act-types. However, due to the conditions in (2), he implicitly talks of TTs whenever he talks of act-tokens in the context of his theory. We will keep this in mind for the following discussion.
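The pair construal of TTs can be illustrated with a minimal Python sketch. The class name `TT`, its field names, and the identifier `"doing-1"` are hypothetical choices made for this illustration only; they are not part of Goldman’s theory or of the notation introduced in (3). The sketch merely shows that two TTs come out as different as soon as their types differ, even when the underlying doing is the same.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TT:
    """A token-of-a-type: an entity paired with a type it is an instance of."""
    token: str     # hypothetical identifier for the concrete doing
    act_type: str  # label of the type under which the doing is categorized


# One and the same doing, categorized in two different ways.
flip = TT("doing-1", "flip the light switch")
turn_on = TT("doing-1", "turn on the light")

same_token = flip.token == turn_on.token  # the underlying doing is shared
distinct_tts = flip != turn_on            # the TTs differ because the types differ
```

Equality of the generated dataclass compares both fields, so the comparison behaves like equality of ordered pairs.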

2.2 Goldman’s Theory of Act-Levels

2.2.1 The Multilayered View on Human Action

Goldman’s point of departure is the observation that agents, when they act, may do several distinct things in one; they produce a set of several act-tokens. Goldman emphasizes that these act-tokens are distinct “because”, he argues, “the properties picked out […] are distinct properties” (Goldman 1970: 12, his italics): flipping the light switch is not a token of the same property as turning on the light is a token of, etc. One crucial difference between the properties distinguished concerns the respective causal relationships of the types of action: flipping the light switch may cause the light to go on, but turning the light on does not cause the light switch to be flipped. As a consequence of the regulations in (2), acts related to each other as in the examples cannot be identical, as they are tokens of different properties. Goldman presents this argument against the proponents of what he calls the “identity thesis” put forward by Anscombe (1963) and Davidson (1963), among others he mentions [p. 2]. According to Goldman, there is one doing by the agent that constitutes a combination of distinct act-tokens of distinct act-types. Our construal of Goldman’s theory, namely that he is actually talking of TTs, avoids the ontological controversy between “unifiers” (Davidson, Anscombe and others) and “multipliers” (Goldman himself). [Footnote 3]

2.2.2 Act Levels and Level-Generation

In Goldman’s theory of action, the act-tokens enacted with a single doing are ordered in levels. Act-tokens at lower levels “level-generate” higher-level act-tokens of the same agent at the same time. If an act-token a by agent s level-generates an act-token a’, then s does a’ “by” or sometimes “in” doing a [pp. 20–1]. Goldman distinguishes four general types of level-generation. One of them is “augmentation generation”; I will set it apart from the other three (as Goldman himself does, to a degree) and turn to it later in Sect. 2.5. I will use original examples from Goldman (1970) in order to introduce and illustrate Goldman’s types of level-generation. As above, I use the symbol ↥ for level-generation, but I do not yet apply the notion of act-TTs, as I want to quote Goldman’s original definitions. A restatement of Goldman’s notions in terms of TTs will be undertaken in Sects. 2.5 and 2.6. [Footnote 4]

figure d

Among the introductory examples, (1a) and (1b) involve causal generation in all steps. In order to avoid confusion, it is very important to keep in mind that causal level-generation does not relate an act a with an event e caused by a, but an act a with the act a’ of causing such an event. For example, it does not relate the act of turning on the light with the event of the baby waking up; rather it relates the act of turning on the light with the act of waking the baby. Unlike the other two types to follow, causal generation raises the question as to whether the generating and the generated act happen at the same time. Goldman points out [p. 21] that it is generally inadequate for two acts a and a’ related by level-generation to state that the agent did a and then did a’. This holds even if a’ is causally generated and the effect caused sets in only later than a is done; thus, even if in the case of, say, (1d) y learns of x’s declining y’s request only several days later, one would not say that x declined y’s request and then disappointed her. Rather the disappointing act was done when x declined the request.

figure e

(1c) is a case of conventional generation; in (1d), the first step is conventional, the second is causal.

figure f

The distinction of types of level-generation reflects the fact that level-generation may draw on different types of connection between actions: on causal connections, on convention, or just on the constellation of facts (simple generation).

Goldman uses “act-tree” diagrams for complex level-generational act structures; the trees are to be read bottom-up. The act-tree in Fig. 1 contains instances of all three types of level-generation listed in (4). [Footnote 5] The diagram displays six nodes that stand for act-tokens of different types as labeled. They are connected by arrows indicating the direction of generation. The numbers indicate the three types of level-generation as numbered in (4). The tree contains two act-nodes with upward branching generation. Moving the agent’s head not only conventionally generates indicating a refusal, but also causally generates upsetting the agent’s glasses. The agent’s declining the nomination causally generates his disappointing his followers; it also generates in simple generation breaking a long-standing tradition. The latter constitutes simple generation because it comes about by the mere circumstances of such a tradition having obtained for a long time. If an act-token generates two or more others which do not generate each other, the generated acts are both at a higher level, but the levels are independent of each other; in particular, they are not the same level. According to Goldman [p. 31], two acts are “at the same level” if and only if they are distinct but generated by the same act and generating the same acts. His examples include ‘hitting the tallest man in the room’ and ‘hitting the wealthiest man in the room’ where in the circumstances given the tallest man in the room happens to be the wealthiest one. I will neglect the issue of same-level acts in the following.

Fig. 1 Goldman’s act-tree for declining the nomination for vice-president
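The branching structure described for Fig. 1 can be written down as a small labeled graph. The following Python sketch is only an illustration under stated assumptions: the node labels paraphrase the text, and the step from ‘indicate a refusal’ to ‘decline the nomination’ is assumed here so that the six nodes form one tree; its type is left as `None` because the running text does not name it.

```python
from collections import defaultdict

# (generating act, generated act, type of level-generation)
EDGES = [
    ("move head", "indicate a refusal", "conventional"),
    ("move head", "upset glasses", "causal"),
    # Assumed link; its type is not stated in the running text.
    ("indicate a refusal", "decline the nomination", None),
    ("decline the nomination", "disappoint followers", "causal"),
    ("decline the nomination", "break a long-standing tradition", "simple"),
]

# Map each generating act to the acts it generates.
generated_by = defaultdict(list)
for lower, higher, _kind in EDGES:
    generated_by[lower].append(higher)

# Collect all act nodes and find those with upward branching.
nodes = {name for lower, higher, _kind in EDGES for name in (lower, higher)}
branching = sorted(n for n in nodes if len(generated_by[n]) > 1)
```

Under these assumptions, `nodes` contains six acts and `branching` picks out the two acts that generate more than one higher-level act.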

Goldman gives the following general definition of level-generation. [Footnote 6] He also includes the type of augmentation generation which we exclude, but the definition applies to the three types in (4) just the same.

figure g

The condition in (ii) that a and a’ be not co-temporal is in need of explanation. According to Goldman’s introduction of the term, two acts a and a’ are “co-temporal” if and only if the agent of a does a “while also” doing a’, as an instance, one might add, of multitasking. If x turns on the light by flipping the light switch, x does not flip the light switch while also turning on the light. Thus, condition (ii) bars level-generation between acts performed in parallel. It does not preclude that the acts related by level-generation have the same temporal extension; on the contrary, they necessarily do. “There is a sense […] in which pairs of generational acts are always done at the same time”, Goldman explains [pp. 21–2].

Goldman’s definition captures important basic properties of level-generation [Footnote 7]:

figure h

Goldman’s definition secures the basic relational properties of level-generation. The relation of “level-generation is intended to be asymmetric, irreflexive, and transitive” (Goldman 1970: 22). Since it is irreflexive, no act generates itself. Asymmetry prevents two acts from generating each other. Due to transitivity, if a generates b and b generates c, then a generates c. As a consequence of transitivity, level-generation may result in chains, and due to irreflexivity and asymmetry the chains cannot contain loops. (If loops were not excluded, acts in a loop would generate themselves and generate their generators.)

Transitivity has two important consequences. First, we may combine a given sequence of level-generations into one larger step. For example in (1a) we might skip some of the levels; somebody might warn the agent: “if you flip this switch, you’ll ruin your night!” Second, it may conversely be possible that a given step be broken down into several smaller steps. For instance, one might analyze the level-generation of ‘flip the light switch’ ↥ ‘turn on the light’ into more steps that take into account what the agent does on the mechanical and the electrical level, like closing an electric circuit and thereby providing electricity to the bulb in a lamp, heating a wire and making it radiate light. A fine-grained analysis like this might matter under circumstances where the attempt to turn on the light by flipping the switch fails.

Asymmetry, irreflexivity, and transitivity hold for generalized level-generation comprising the causal, conventional, and simple type. It is these logical properties of level-generation that give rise to tree structures such as the one in Fig. 1.
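The three relational properties can be checked mechanically on a small example. The Python sketch below takes the chain of acts named for (1a) in Sect. 1.1, closes it under transitivity, and tests irreflexivity and asymmetry; it then shows that adding a loop violates both. The function names and the act labels are illustrative assumptions, not part of Goldman’s definition.

```python
from itertools import product

# Single steps of level-generation, read bottom-up as in (1a).
STEPS = [
    ("flip the light switch", "turn on the light"),
    ("turn on the light", "lighten the room"),
    ("lighten the room", "wake the baby"),
    ("wake the baby", "ruin the night"),
]


def transitive_closure(pairs):
    """Add (a, c) whenever (a, b) and (b, c) are present, until nothing new appears."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


def is_irreflexive(rel):
    return all(a != b for a, b in rel)


def is_asymmetric(rel):
    return all((b, a) not in rel for a, b in rel)


generates = transitive_closure(STEPS)

# A hypothetical loop: the topmost act would generate the lowest one.
looped = transitive_closure(STEPS + [("ruin the night", "flip the light switch")])
```

In the closed relation, ‘flip the light switch’ generates ‘ruin the night’ directly, which corresponds to combining a sequence of steps into one larger step. In the looped variant every act ends up generating itself, which is exactly what irreflexivity and asymmetry rule out.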

2.3 Critics of Goldman’s Theory

Goldman’s theory was criticized by Castañeda (1979), Bennett (1988), and McCann (1982), among other philosophers. The central target of criticism is Goldman’s formal definition of level-generation quoted in (5). The critics showed by counterexamples that it would apply to cases of act pairs that are obviously not intended to be included. This criticism is justified, but it fails to invalidate Goldman’s theory of level-generation; it just shows that Goldman’s attempt at a formal definition did not achieve an adequate description of level-generation.

Goldman’s definition in (5) is essentially in terms of logical conditions on two statements s does a and s does a’ where s’s doing a level-generates s’s doing a’. Logical conditions, properties, and relations are in terms of truth-values (entailment) or in terms of extensions of concepts. For example, if a sentence B is always and necessarily true if sentence A is, then A and B are related by logical entailment: A entails B. If a concept P is such that it applies to all cases that another concept Q applies to, then P is in the logical relationship of superordination to Q. By contrast, conceptual relations concern the conceptual content. For example, the two sentences “Today is Tuesday” and “Tomorrow is Wednesday” logically entail each other, but they are not the same. There are conceptual meaning relations between them that explain why they are logically equivalent (both refer to a day, the second sentence to a day following the one referred to in the first; Wednesdays are related to Tuesdays in the same way). Logical relations derive from conceptual relations; for example it derives from the concepts of ‘perceive’ and ‘hear’ that ‘x hears y’ logically entails ‘x perceives y’. But conversely, no particular conceptual relation derives from entailment. Thus, Goldman’s condition (5iiia) does not tell us how the categorizations of a and a’ are conceptually related, for example in the way that a’ of type A’ is done by exemplifying some a of type A. Taking a look at the conditions in (5), we realize that (5i) is just a restricting precondition for the definition, and that the conditions in (5iii) are in terms of logical entailment (or can be paraphrased as such). The only (probably) non-logical condition is the restriction in clause (5ii) that a and a’ be not co-temporal; but this weak constraint is far from capturing the basically non-logical notion of level-generation.

Level-generation, as introduced by Goldman, is a genuinely conceptual, or as I see it, cognitive relation. In his reply to Castañeda (1979), Goldman explicitly locates level-generation in the realm of psychology:

figure i

Given that, Goldman’s definition in (5) fails to capture the real nature of the notion of level-generation—in fact no definition in terms of logical relations can. A definition like the one intended in (5) can only provide necessary logical conditions to be met by level-generation. The critics mentioned were right in pointing out that Goldman’s attempt at a [logical] analysis of the relation does not provide a sufficient condition; but this circumstance does not invalidate the underlying intuitive notion of level-generation that Goldman’s attempt at an analysis was aimed at.

figure j

It appears uncontroversial to consider the rich analysis of doings like the ones indicated in the examples as “real” in the sense that if an agent acts in a particular situation and we consider a multilevel conceptualization adequate, then all the act-types, to us, are “really” enacted in this one doing. Thus, Goldman’s theory of human action can be considered a contribution to ontology, and metaphysics, of the world as it is perceived and conceived by human cognitive agents, i.e. of what is real to us.

2.4 Goldman’s Theory of Human Action Applied to Cognitive Representation

In view of the two quotes cited, I will apply Goldman’s theory to the cognitive representation of human action (a construal which was not applied by the philosophical critics). If, to us, an act constitutes a whole tree of act-TTs, I will assume that our cognitive representation has this tree structure, composed of representations of the participating types of act. I assume that level-generation is a fundamental cognitive mechanism, ubiquitously at work in our cognitive systems. Whenever somebody acts, we will try to interpret their action at levels beyond the pure doing, and will thereby come up with a view that, for example, explains the action as the result of the agent pursuing certain intentions to be accomplished at some level generated; we will try to relate the action to ourselves as some type of act towards us; we will often appraise the action as positive or negative in various regards; we will take it as constituting interaction with ourselves, and so on. All these views amount to the addition of cascade levels to the doing. Thus, there are quite general level-generations we may assume, like the following:

figure k

In view of such examples, it is hard to imagine that we do not level-generate whenever we observe the actions of others, or plan and execute our own. Level-generation as a cognitive process will very often be automatic, not involving any conscious reasoning.

Construing Goldman’s as a theory of cognitive representation of action will enable us below to apply it to semantics—which I take to be part of a theory of cognitive representations, too, in this case of linguistic meanings. But before we turn to this aspect, I will restate the basic points of the theory in terms of act-TTs, and also undertake a slight revision of Goldman’s view of “augmentation generation”.

2.5 Level-Generation and Augmentation Generation

Goldman (1970: 28–30) distinguishes three subtypes of what he calls “augmentation generation” [Footnotes 8, 9]:

figure l

Goldman himself did not seem entirely convinced that augmentation generation is of the same kind as the other three types of level-generation (cf. his discussion pp. 28–30). Related to the conceptual level, augmentation in all varieties mentioned is enrichment of a given act-type concept: the original concept is maintained and a condition, or circumstance, added such as to form a concept that is more specific. In ‘extend one’s arm out the car window’, the direction of the movement is added as a particular circumstance, analogously for manner augmentation; for compound augmentation, the co-temporal acts constitute the crucial circumstances for each other.

The application of the augmented concept must be narrower than the application of the concept augmented. If a concept A+ is an augmentation of a concept A, then A+ unilaterally entails A, that is, A applies to all cases to which A+ applies, but not conversely. As we saw in (6d), entailment does not hold for the other types of level-generation.

Rather than attempting to subsume augmentation under level-generation, I recognize the conceptual process as a mechanism of its own, independent of the phenomenon of level-generation. Augmentation is the well-known, basic, and ubiquitous conceptual process of concept enrichment: a given concept/categorization/type is enriched by adding conditions. Thereby the extension of the concept is narrowed down. As a cognitive process, augmentation, or enrichment, is of fundamental importance. It underlies learning in the form of gradual differentiation of a concept; it is involved in all processes of adding information to existing knowledge representations, including concepts for categories. In the theory of types such as in Carpenter (1992), the relationship between a given type and an enrichment of it is established as “subsumption”: the wider, less rich type subsumes the narrower, enriched type.
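The effect of enrichment on the extension of a concept can be made concrete with a toy model. In the Python sketch below, a concept is modeled as a set of conditions and a case as the set of conditions that hold in it; this encoding, the function names, and the two sample cases are assumptions made for illustration only and do not reproduce the frame-based formalization used later in the chapter. The example concept is taken from ‘extend one’s arm out the car window’.

```python
def augment(concept, *extra_conditions):
    """Enrich a concept by adding conditions; the original conditions are kept."""
    return frozenset(concept) | frozenset(extra_conditions)


def applies(concept, case):
    """A concept applies to a case if all of its conditions hold in that case."""
    return concept <= case


def extension(concept, cases):
    """Names of all cases to which the concept applies."""
    return {name for name, facts in cases.items() if applies(concept, facts)}


EXTEND_ARM = frozenset({"agent extends arm"})
EXTEND_ARM_OUT_WINDOW = augment(EXTEND_ARM, "movement is directed out the car window")

# Hypothetical cases, each described by the conditions that hold in it.
CASES = {
    "case-1": frozenset({"agent extends arm", "movement is directed out the car window"}),
    "case-2": frozenset({"agent extends arm"}),
}

wide = extension(EXTEND_ARM, CASES)
narrow = extension(EXTEND_ARM_OUT_WINDOW, CASES)
```

Because the augmented concept carries more conditions, its extension `narrow` is a proper subset of `wide`: the wider concept applies wherever the enriched one does, but not conversely.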

Augmentation is a basic process along with level-generation; it may even be more general. The definition in (11a) defines the general notion as a relation between concepts in general; it applies to act-types in particular. The definition is generalized in (11b) so as to cover Goldman’s compound augmentation. (11c) defines the derived notion of an act-TT a+/A+ being more specific than an act-TT a/A; in the case of compound augmentation, the relation holds between each component act and the compound act.

figure m

By referring to the act tokens as “a” and “a+”, it is not implied that they are different as such. In fact, by the very definition, if a+ is a token of act-type A+, then it also is a token of all act-types A that subsume A+. The notation for the act tokens is chosen for convenience in order to fit in with the distinction of act tokens involved in c-constitution. We will refer to both the relation between types and the relation between TTs as augmentation.

Augmentation shares certain basic properties with level-generation. (i) By definition, augmentation preserves all information. Thus, if we apply augmentation to an act-TT a/A, then the agent of a+/A+ is necessarily the same as the agent of a/A; the same holds for the act times of a and a+. Note that this also holds in the case of compound augmentation: the subsumption relation can only obtain between A1, …, An and A+ if all n + 1 act-types have the same agent and time specification. Thus, the analogue of (6a, b) applies to augmentation. (ii) Augmentation, too, is an asymmetric, irreflexive, and transitive relation between act-TTs, and hence generates tree structures. Applied in the same domain, we can form trees that involve both augmentation and level-generation. However, there is one fundamental difference between augmentation and level-generation in the narrower sense: level-generation requires logical independence, while augmentation involves logical entailment.

I define “cascades” basically as Goldmanian act trees. I introduce a new term because I want to be able to extend the notion to multilevel representations of things other than acts.

figure n

According to this definition, act-cascades are co-extensive with Goldmanian act-trees, but they are considered to be not all produced by sub-types of what I call “level-generation”.

2.6 C-Constitution

2.6.1 The Relations c-by and c-in

Goldman mentions the two options of paraphrasing the downward relationship between a generated act-TT h/H and its generator l/L, with a by or an in paraphrase: ‘Agent does h/H by doing l/L’ or ‘Agent does h/H in doing l/L.’ [Footnote 10] He exempts augmentation. Goldman does not elaborate on the question as to when one or the other type of paraphrase is adequate, but there is some discussion in Kearns (2003), although she does not refer to Goldman’s theory. Kearns discusses in versus by paraphrases in connection with certain action predicate types, to be discussed in Sect. 3.3 as “criterion predicates”. What I refer to as lower and higher level, she calls ‘host’ and ‘parasite’, respectively. According to her, an in paraphrase expresses that “the host simply realizes the parasite” [p. 602], while a by paraphrase expresses that “the causative parasite is not realized simply in the occurrence of the one action performed, but requires also a consequential upshot” [p. 615]. It is not clear from her discussion either when which of the two paraphrases applies. Still, Kearns’ observation that the in paraphrase applies when the generating act simply realizes the generated act seems to be a valid generalization. We would say, for example, in the case of (13) that the casting of the speaker is the mistake.

figure o

By contrast, cases of generation where a by paraphrase is adequate seem to not allow for the equation, in this sense, of generating and generated act:

figure p

Clearly, giving young people the facts about AIDS is not, in itself, a reduction of the number of HIV infections, rather it is a possible means, or method, of achieving that.

I conclude that there are two distinct inverse cascade relations that can be described by using in or by, respectively. These are alternative inverses of the relation of level-generation. I index the relations with the subscript ‘c’ for the given circumstances since these relations, like level-generation, only hold under circumstances.

figure q

A simple intuitive description of the relation between the generating act l/L and the generated act h/H derives from these definitions; it holds in both cases: Under the given circumstances, doing L is a way, or a method, to do H.

2.6.2 The Relation of C-Constitution

Rather than striving for a general formal definition of level-generation, I will apply the notion to the more concrete three types, causal, conventional, and simple. I will also introduce a different term, and with it a slightly different perspective: the notion of level-generation emphasizes the process of creating additional categorizations for a given act-TT. In the following I will focus rather on the conceptual relation between the act-TTs, and speak of “c-constitution”. Thus, the following definition of c-constitution can mutatis mutandis be taken as a definition of level-generation:

figure r

3 Cascades and Verb Classes

In this section, I will apply the cascade approach to verb meanings, that is, lexicalized act-TTs. Goldman never did this, although, of course, he used English verbs for referring to the act-types he discussed. The recognition of the fact that Goldman’s theory applies to TTs opens the way to consider level-generation as a relation between act-types, abstracting away from the particular circumstances under which a TT is exemplified. The cognitive perspective developed here allows us to apply the theory to lexical verb meanings if we assume, as I do, that these consist in event concepts that cognitively represent the type of event a verb denotes.

Applying cascade theory to lexical action verb meanings and to certain morphological and grammatical phenomena will yield ample evidence for the relevance of the approach to verb semantics. We will start out with the distinction between basic and non-basic act-TTs and demonstrate that most verbs appear to denote non-basic act-types.

3.1 Basic Versus Non-basic Act-Types

The notion of level-generation raises the question whether there is a basic level of action. Goldman’s (1970) answer is positive. His examples of basic act-types include the following:

figure s

Informally, a type of action is basic if it does not require a generating act of a different type in order to come about. Basic act-types are exemplified immediately, not by means of level-generation. A convenient test for a non-basic act-type is to check whether there are different types of act for implementing it. For example, depending on the circumstances, an electric light may be turned on by doing various more basic things, like flipping a light switch, triggering a motion detector, using a smartphone touch display, or giving a voice command to an electronic device that controls the light. Thus, ‘turn on the light’ is not a basic act-type. Similarly, if you are working at a computer, you may bring the cursor on the screen to a certain position by various methods, including clicking the mouse, using a mousepad, pressing arrow keys on your keyboard, or touching the screen, if it is a touchscreen. Even these act-types are not basic, though; only the simple bodily movements are basic. Incidentally, none of the act-types shown at the lowest level of the act-trees in (1) is basic.

According to Goldman [p. 67], all action is caused by a current want to act correspondingly. Essentially, he defines basic act-types as things an agent would do if they had the want to do so and were in standard condition with respect to this type of act, and if the act can be brought about without level-generation. Basicness is primarily defined for act-types, and derivatively for act-TTs.Footnote 11

3.2 Verbs of Basic and Non-basic Action

The meaning of a verb describes a type of situation; for action verbs, it describes a type of act. The distinction between basic and non-basic act-types therefore immediately carries over to verbs. If one takes a look at corpus and dictionary data, it turns out that non-basicness of action verbs is the rule rather than the exception.

Table 1 displays the 100 most frequent English action verbs, among the 156 most frequent verbs in all. The table was obtained by checking the entries in the online Oxford Dictionary of EnglishFootnote 12 (ODE) for the most frequent English verbs in the online British National Corpus. A verb was counted as an action verb if the first sense in the dictionary entry has an agentive, non-stative description. It was classified as non-basic if the definition was in terms of multiple synchronous or sequential action, if the method was left open, or if a cascade-like definition was given (“do … by doing ---”). In the table, verbs of social action are marked with italics. Social action is necessarily non-basic, as its social character derives from social rules. For any type of social action, a generating physical act is required that under circumstances will count as that type of social action, according to some rule. Thus, concepts for social act-types always involve conventional generation.Footnote 13 I classified verbs as social if the sense description mentions interaction with other persons.

Table 1 100 most frequent English action verbs (verbs of social action are written in italics)

Among the one hundred action verbs, there is not a single example of a clearly basic-act verb. One verb might be a candidate: The ODE describes the first sense of stay as ‘remain in the same place’Footnote 14; it is a borderline case, however, and the fact that it seems basic may just be due to it not involving doing anything concrete. Certain verbs in the list may appear basic, but they aren’t. For example, say is not basic because saying something involves a complex cascade of actions, starting from the basic acts of what we do with our articulatory organs in order to produce speech sounds; the sound productions may or may not constitute productions of linguistic sounds like vowels and consonants; even if they do, they need not necessarily constitute acts of ultimately producing ordinary words and grammatical sentences. I will come back to this special case of action in the brief discussion of Austin’s speech act cascade in Sect. 5.1. Even a seemingly elementary verb like sit is not basic (as an action verb): depending on what the agent sits on (a chair, a bike, a swing, etc.), the action requires different physical activities; sit may also mean ‘sit up’ from a lying position, or ‘sit down’, which call for yet different physical action. Apart from these senses, there is the transitive use of sit as in sit the child on one’s shoulder. Even if certain verbs denote action that is closely related to a particular body part, like kick, they are not necessarily basic, as one can, for example, kick with various parts of the foot, with one’s shin, one’s knee or thigh; these are variants of kicking that are executed by different more basic types of action.

As a result, it appears that there may be no basic-act verbs at all among the 100 most frequent English action verbs. Are there any basic-act verbs in English, verbs that invariably denote basic action rather than what is accomplished by some type of more basic action? The verbs in Goldman’s basic action examples in (18) (extend, move, bend, shrug, open, turn, pucker, wrinkle) are not in themselves verbs of basic action. In Goldman’s examples, they are all transitive verbs and their basicness depends on the choice of a particular body-part as the object argument. For types of object other than one’s own body-parts (‘move the table’, ‘turn the pancake’, ‘open the door’), there would be various methods of enactment available. Some of the verbs have intransitive action uses (move, bend, shrug, and turn); among them, shrug is a candidate for a basic-action verb because to shrug is the same as to shrug one’s shoulders; maybe intransitive bend is another one.

It is not surprising that there are so few verbs that denote basic acts. The vocabulary of natural language serves communication in, and about, our reality, and this is to a large part social reality. Verbs of action are used in order to describe what people do. If we were restricted to verbs of basic action, it would be extremely hard, if not impossible, to describe what people are really doing (try to say that you are writing an article by reporting the basic physical movements you make to do so; no one would understand what you are describing). Quite generally, it seems, we communicate about what people do on considerably advanced levels of cascading. Verbs like help supply a good illustration of the ‘abstractness’ of action concepts. Ranking 24th in the above list, help belongs to the central vocabulary. According to the analysis in Engelberg (2005), the verb means essentially ‘do something for somebody that improves their situation’. The concept of helping leaves open what the generating action would be concretely; in fact, an action of almost any type may constitute help in one situation, and the contrary in another, and the very same act-token may constitute help for one person and a big problem for another. In social life, improving others’ situation is of utmost importance; it applies to all kinds of situation in our complex lives; we need general verbs like this.

For another source on basicness or non-basicness, one may take a look at Levin’s (1993) English Verb Classes and Alternations, where a comprehensive collection of semantic verb classes is compiled and described. It distinguishes 49 major classes, almost all of them classes of action verbs; not a single class is basic-action.

3.3 Criterion Predicates

Goldman’s theory of action was not really taken up in semantic theories of verb meaning.Footnote 15 There is, though, a small thread of discussion on the semantic analysis of by gerunds where a two-level view on the meaning of selected types of action verb is adopted. The discussion starts out with Kearns (2003). Kearns distinguishes two special classes of action predicates which she dubs “causative upshots” and “criterion predicates”. Causative upshots are transitive predicates like cure the patient or convince s.o. [p. 599]; they denote the achievement of some sort of change by doing something more concrete, e.g. curing someone by administering a certain treatment, or convincing someone by presenting evidence. Criterion predicates are often intransitive and not inherently causative; they include predicates such as make a mistake, break the law, score a goal, or prove a theorem. As with help, the predicate requires that something be done that fulfils a given criterion, while the method is left open; it can be specified with a by or in locution (recall the example in (13)). For both types of predicate there is, in Kearns’ terms, a “host” and a “parasite” [pp. 600–1]. The “more abstract” parasite, the causative upshot or criterion predicate, is denoted by the verb and is implemented, or accomplished, by the “more concrete” host. For example, the parasite is ‘breaking-the-law’ and the host is a theft; the parasite is ‘curing-the-patient’ and the host is administering the treatment. Clearly, Kearns’ hosts level-generate the parasites. Kearns does not mention Goldman’s work, though. Her analyses are confined to two levels, and to two special classes of generated act-types.

The two classes of verbs were taken up in Sæbø (2008, 2016). He chooses different terms for Kearns’ causative upshots (“manner-neutral causatives” in 2008, “method-neutral causatives” in 2016); hosts and parasites he calls concrete and abstract.

Notably, the “hosts”, or more concrete acts, are not basic in the sense explained here, at least not necessarily so; they may be high-level act-types. What matters here is that the two authors distinguish within one verb meaning different levels of action related by, in fact, level-generation.

3.4 Means of Explicit Level-Generation

In addition to this lexical evidence for cascade-structure action concepts, there are numerous lexical and grammatical mechanisms operating on verbs and their lexical meanings to the effect of generating further cascade levels. Some of them involve word formation, for example affixation, or conversion from a different word class, others employ certain grammatical constructions, or types of adverbial. The examples in the following are chosen for the sake of illustration; they do not provide a systematic survey, but represent just the tip of an iceberg. Almost all the cases described involve augmentation along with level-generation. The augmentation of the underlying action concept iconically corresponds to the augmentation by word formation and/or syntax at expression level.

3.4.1 Adding a Level of Social Interaction

Many lexical and grammatical processes add a further argumentFootnote 16 to a given action concept. This amounts to augmentation of the underlying concept, but in addition c-constitution is involved, on top of the augmentation. I will discuss the addition of arguments of the type ‘person’; this will inevitably have the effect of cascading to a level of social interaction.

Many basic types of bodily action are used as non-verbal signals in communication. For example, the verb expressions smile, frown, raise one’s brows, wink, nod, shrug, bow, kneel down, fold one’s hands, scratch one’s head, wave one’s hand, and others can also denote communicative action. They do so invariably if they are used with a prepositional phrase that adds an addressee: ‘smile/wink/wave/frown at someone’. German has verb prefixes such as those in zu-zwinkern (‘wink at’) or an-lächeln (‘smile at’) which serve the same purpose of enriching the argument structure with an addressee.Footnote 17 (19a) is an example that attests the social-level relevance of zuzwinkern. The concept of zuzwinkern has the informal cascade structure in (19b).

figure t

German an and zu can also be used as prepositions marking an additional addressee argument for verbs of communication: schreiben an + accusative NP ‘write to’ or sprechen zu + dative NP ‘speak to’.

Similar to these cases are applicative constructions (Van Valin and LaPolla 1997: 337–8). Japanese has several such constructions consisting of two verbs. The first verb is in the gerund -te form and the second is a verb of possession transfer, such as ageru ‘give upward’ and kureru ‘give downward’; the direction component is metaphorically used for expressing ‘give to superior’ or ‘give to inferior’. A speaker will always treat the addressee as socially superior and themselves as inferior; therefore the beneficiary in the -te ageru construction will typically be the other, and the agent typically the self or someone related to the self. The complex expression is used to describe doing a favor.Footnote 18 The cascade analysis has the first verb as the generator.

figure u

Thus, the construction has the structure of a criterion predicate, with the method specified. A similar construction in Mandarin is discussed in Tsai (2012). It makes use of the verb gěi 给 ‘give’ that is otherwise also used as a standard verb of giving (Chang 2016: 251–2).Footnote 19

figure v

Van Valin and LaPolla (1997, p. 384) describe beneficiary constructions in Lakhota with essentially the same semantics. German has a special use of the dative in such casesFootnote 20:

figure w

As witnessed by the translation, English has a for-complement construction with the same function.

3.4.2 Adding a Level of Achieving a Result

Predicate expressions such as hammer flat or drink empty consist of a verb of action and a predicative adjective that denotes a resulting state of the object acted upon. Resultatives of this type denote an action that is generated by an act of the type of the base verb; for example, hammer flat denotes a cascade of the structure ‘hammer …’ ↥ ‘flatten’, and drink empty a cascade ‘drink …’ ↥ ‘empty_verb’. However, the cascade first requires an augmentation that adds the affected object. Thus, the analysis again requires two cascade steps:

figure x

Dowty (1979), and many others since, analyzed this type of construction as causative in the sense that, for example, drink the glass empty means ‘drink from the glass and [thereby] cause the glass to become empty’ (Dowty 1979: 93). This is reflected by the analysis in (23) if ↥ is taken as representing the causal type of level-generation. German has many particle verbs with a resultative particle such as tot- ‘dead’ in tot-schießen ‘shoot to death’, klein- ‘small, little’ in kleinschneiden ‘cut into small pieces, chip’ or an- ‘on’ in anknipsen ‘to flick on’; these can be analyzed analogously.

Van Valin and LaPolla (1997: 90) mention verbs of killing in Lakhota; they have the form of compounds with the first part indicating the method of killing, and the second a verb t’a that means ‘dead / to die’, for example ka-t’a ‘strike to death’ (ka- ‘by striking’), ya-t’a ‘bite to death’ (ya- ‘with the teeth’), yu-t’a ‘strangle’ (yu- ‘with the hands’). English can generally use the addition to death for level-generating a predicate of killing. German has a series of verbs of killing with the prefix er- that does not have much of a lexical meaning on its own, but rather constructional meaning in this type of verb formation: erschießen (‘shoot to death’), erschlagen (‘beat to death’), erwürgen (‘choke/strangle to death’), erhängen (‘hang’), erdrücken (‘crush to death’), and several more.Footnote 21—The generating act-type fails to be specified in cases of conversion of adjectives to verbs; the adjective denotes the resulting state of the object of an unspecified action: empty, fill, smooth, etc. These verbs are method-neutral predicates in the sense of Sæbø (2016).

3.4.3 Adding a Level of Appraisal

A further type of cascade extension adds an appraisal to the action-verb concept. German has a productive word formation pattern that derives from almost arbitrary verbs of action a verb used to express failure; these verbs have been dubbed ‘erratic’ verbs (see Fleischhauer 2016: 293). One variant of the derivation adds the prefix ver- to a transitive verb and yields another transitive verb (die Hecke verschneiden, ‘cut the hedge in the wrong way’Footnote 22); a second type adds the same prefix and the verb is reflexivized so as to form an intransitive predication (sich verschneiden ‘cut in the wrong way’). This derivation adds a cascade level of failure: ‘cut’ ↥ ‘fail’. Thus, this is another mechanism that produces criterion predicates. The highest level of the cascade is fairly unspecific, but the cascade as a whole yields the meaning expressed. English has some erratic verbs with the prefix mis-: misunderstand, misdirect, mishear, but the pattern is far less productive than the German one.Footnote 23

Other constructions across languages serve the generation of a level of ‘doing too much’: cf. English overcook, overheat, overpay etc.; Russian uses the prefix pere- in a similar way (pere-gret’ ‘overheat’).Footnote 24 Japanese has verb compounds with the second verb -sugi-ru ‘exceed’, for example nomi-(‘drink’)-sugi-ru ‘drink too much’.Footnote 25

A two-verb construction in Mandarin with the second verb 玩 wán ‘play’ can be used to express the level-generation of acting for pleasure:

figure y

German has a very productive adverb formation that adds -erweise to an adjective or a present participle stem. This type of adverb is used for evaluating an act, or more generally an event or a state. Examples include dummerweise ‘stupidly’, erstaunlicherweise ‘surprisingly’, unnötigerweise ‘unnecessarily’, glücklicherweise ‘luckily’, and hundreds more. They correspond to English adverbs in sentence-initial use.

figure z

This type of adverb projects the verb to a criterion-predication level. For example, adding dummerweise to a verb V has the effect of [V] ↥ ‘do something stupid’.

3.5 Implicit Level-Generation

It may be worthwhile to consider cases of “integrated” augmentation and level-generation of the types discussed above, as they provide a glimpse into the decompositional structure of certain types of action concept.

Appraisal. One group with an integrated specific evaluation is constituted by verbs of forbidden action, e.g. lie, steal, trespass, rob, rape, murder, and many others. These add to the concept of a particular type of action a level ‘do something forbidden/illegal’. Thus, there is a cascade relationship between ‘kill’ and ‘murder’. ‘Murder’ can project further to ‘assassinate’ if the victim is an important person, giving rise to elaborate cascades such as ‘shoot’ ⊏ ‘shoot at y’ ↥ ‘kill y’ ↥ ‘murder y’ ↥ ‘assassinate y’.

Result. Van Valin and LaPolla (1997) distinguish causative and active accomplishments, and achievements. Causative accomplishments are verbs like kill: the agent does something that causes somebody to die. The authors apply the following general semi-formal analysis to this type of action verb [pp. 188–9].Footnote 26

figure aa

This reads essentially as follows: agent x does something of the type predicate1 which causes x or y to change into the condition denoted by predicate2. The first part of the analysis—do x, [predicate1(x, (y))]—describes an action by the agent x (that possibly involves another participant y); according to the second part—CAUSE [BECOME predicate2(x) or (y)]—x’s doing causes x or y to enter the condition described by the second predicate. The whole formula describes the constitutive condition for causal generationFootnote 27:

figure ab

Causative achievement and accomplishment verbs with an agent argument are abundant in natural languages. Typically, the generating level of the more basic method action is not specified.

Signaling. As mentioned above, some action verbs of basic or near-basic level can be used to denote a social-level act of signaling (smile, frown, harrumph, nod, shrug, and others). If used in this sense, they incorporate generation of a social level. As social agents, equipped with the “sense-making machines” our minds are, we usually try to come up with a construal of the acts of others as meaningful beyond the mere act. The verbs mentioned reflect this tendency by incorporating a higher cascade level in lexicalized meaning variants.

4 Cascades and Frames

Application of Goldman’s approach to psychology calls for a framework for modelling cognitive representations. I apply the theory of Barsalou frames as further developed in the Düsseldorf context of research on the structure of representations.Footnote 28 The framework is applied to the decompositional analysis of lexical meanings and the modelling of compositional processes, among other things.Footnote 29 I will characterize it here very briefly and then propose an integration of cascade structures into the theory.

4.1 Barsalou Frames

As a working hypothesis, I adopt Barsalou’s Frame Hypothesis, according to which Barsalou frames constitute the universal format of concept representation in human cognition.Footnote 30 It is assumed that lexical meanings are concepts stored in long-term memory and that compositional meanings are concepts formed as the result of syntactic and semantic processing, essentially by unification.

According to Löbner’s (2017) formal theory of Barsalou frames, a frame structure is a coherent network of nodes connected by functional attributes. The nodes represent individuals in a global universe of discourse. The attributes are functions that for individuals of an appropriate type return another individual of the same or another type as value. For example, the attribute size returns the individual size for all individuals that have size; the attribute mother returns the mother for every animal with parents; the attribute head returns the head for those things that have a head. The values of attributes may carry their own attributes; thus, frame structures are recursive. In a frame, type restrictions may be imposed on the nodes, that is, conditions specifying that the entity represented by the node belong to a certain subset of the universe. The frame structures defined in Löbner (2017) are first-order in that the underlying ontology provides a universe of discourse, the set of all individuals, and the attributes are functions that map individuals to individuals. The universe does not contain second-order entities such as properties, relations, attributes, or first-order frames. Frame structures can be translated into an appropriate first-order predicate logic language (see Löbner 2017: 99–109 for details).

Frames are usually represented by frame diagrams (see examples below), or else by attribute value matrices. I will use diagrams. There is always a distinguished central node that represents the individual described by the whole frame. Frames have the same double nature as Goldmanian act-TTs: they represent a token of a type. A frame diagram as a whole provides a type description of the token represented by the central node; the analogue holds for frames represented by attribute-value matrices.

In the context here, we exclusively deal with frames for actions. Actions are a particular type of individual in the universe, a subtype of events. All events have an attribute τ for the time they occupy; therefore every action frame has this attribute on the central act node. Actions have an agent, whence the act node in an action frame carries an attribute agent. For the current discussion in the context of a theory of human action, it will be assumed that agents are persons. An action frame may contain more attributes of the act, corresponding to more semantic roles such as theme, patient, instrument, goal etc.Footnote 31
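The notion of a first-order action frame described above can be made concrete with a small data structure. The following Python fragment is only a minimal sketch, not part of Löbner’s (2017) formalism: the class name Node, the methods set and get, the spelling tau for τ, and the type labels are all hypothetical choices made for illustration. It shows nodes standing for individuals, functional attributes (at most one value per attribute and node), optional type restrictions, and recursion through the values of attributes.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)  # nodes are compared by identity: one node, one individual
class Node:
    """A frame node standing for one individual in the universe of discourse."""
    label: str
    node_type: Optional[str] = None  # optional type restriction (hypothetical labels)
    attributes: Dict[str, "Node"] = field(default_factory=dict)

    def set(self, attribute: str, value: "Node") -> "Node":
        # Attributes are functional: a node has at most one value per attribute.
        if attribute in self.attributes:
            raise ValueError(f"{attribute} already has a value on {self.label}")
        self.attributes[attribute] = value
        return self

    def get(self, *path: str) -> "Node":
        # Follow a chain of attributes; values may carry attributes of their own.
        node = self
        for attribute in path:
            node = node.attributes[attribute]
        return node


# An action frame: the central act node carries the attributes agent and tau.
bill = Node("Bill", "person")
light = Node("the light", "lamp")
t1 = Node("t1", "time")
act = Node("a1", "turn on")
act.set("agent", bill).set("tau", t1).set("theme", light)

# Recursion: the value of agent has an attribute head of its own.
bill.set("head", Node("Bill's head", "head"))
```

In this sketch, act.get("agent", "head") follows two attributes in sequence, and a second call of act.set("agent", ...) is rejected because an attribute is a function. The central node is simply the node from which the frame is accessed; nothing in the code marks it specially.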

4.2 Cascades in Frame Theory

The question arises whether cascades are another variant of frames. Löbner (2017) allows only first-order attributes in frames. The cascade relations c-constitution, c-in, c-by, and subsumption, however, are essentially and irreducibly second-order, because they relate types, i.e. whole first-order frames. Apart from that, the upward relations are not functions. Due to transitivity, a level-generating act-token does not project to a uniquely defined token it generates. In addition, level-generation may branch upwards. Thus the cascade relations cannot figure as attributes within first-order frames. I will integrate them into frame theory as second-order relations between first-order frames.

Let us consider a simple two-level cascade for illustrating the interplay of frame representation and c-constitution:

figure ac

The cascade diagram in Fig. 2 contains the frames for a1/‘Bill turns on the light’ and for a2/‘Bill wakes the baby’ at the lower and the upper level, respectively. The two frames are parallel in structure. They have a central act node that represents an act of the type indicated by the bold-face type label. In both frames, the action nodes carry the attributes agent and τ. Both frames also have a theme attribute on the central node, though of a different nature. As the two frames are related by c-constitution, the attributes agent and τ necessarily both take the same value in the lower and the upper frame. The identity of agent and time cannot be expressed by linking the attributes in both frames to one value node; attributes cannot take values in a frame other than the one their argument node belongs to. The identity of values can only be accomplished by assigning the same individuals as values for the two attributes, respectively. The dashed upward arrow in Fig. 2 stands for the relation of c-constitution between the two acts.

Fig. 2 Cascade formed by two frames

A structure formed by more than one first-order frame is itself second-order, that is, a hyperframe. Hyperframe structures are a natural extension of first-order frame theory. For example, if one is to model scripts with frames, one will have to design hyperframes that consist of first-order action frames for subsequent acts, connected in an appropriate way.
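The two-frame cascade discussed above can be sketched as a small hyperframe. The Python fragment below is an illustration under stated assumptions, not the author’s formalization: the class names Individual, ActFrame and Cascade, the methods add_c_constitution and generated_by, and the spelling tau for τ are hypothetical. It keeps attributes inside their own frame, records c-constitution as a relation between whole frames, and enforces the identity of agent and τ by requiring that both frames assign the same individuals as values.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(eq=False)  # individuals are compared by identity
class Individual:
    name: str


@dataclass(eq=False)
class ActFrame:
    """A first-order action frame, reduced to its type label and attribute values."""
    act_type: str
    attributes: Dict[str, Individual]


@dataclass
class Cascade:
    """A hyperframe: first-order frames plus a second-order relation between them."""
    frames: List[ActFrame] = field(default_factory=list)
    c_constitution: List[Tuple[ActFrame, ActFrame]] = field(default_factory=list)

    def add_c_constitution(self, lower: ActFrame, upper: ActFrame) -> None:
        # Agent and time must be the very same individuals in both frames.
        for shared in ("agent", "tau"):
            if lower.attributes[shared] is not upper.attributes[shared]:
                raise ValueError(f"{shared} differs between the two levels")
        for frame in (lower, upper):
            if frame not in self.frames:
                self.frames.append(frame)
        self.c_constitution.append((lower, upper))

    def generated_by(self, lower: ActFrame) -> List[ActFrame]:
        # A list, not a single value: the upward relation is not a function.
        return [up for low, up in self.c_constitution if low is lower]


bill, t = Individual("Bill"), Individual("t")
a1 = ActFrame("turn on the light",
              {"agent": bill, "tau": t, "theme": Individual("the light")})
a2 = ActFrame("wake the baby",
              {"agent": bill, "tau": t, "theme": Individual("the baby")})

cascade = Cascade()
cascade.add_c_constitution(a1, a2)
```

The sketch stores only direct c-constitution links; transitivity and upward branching would show up as further pairs in the relation, which is why generated_by returns a list.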

5 The Writing Cascade

We will now turn to an elaborate example, the cascade for the act-type ‘write by hand’. It will be used to discuss the consequences that the adoption of the cascade model to lexical verb meanings has for semantic theory. As a prelude, we will have a brief look at Austin’s (1962) speech act model. Austin’s analysis anticipated Goldman’s multilevel theory of action; Goldman mentions it as such in his introduction [p. 8].Footnote 32 The speech act cascade also prepares the discussion of the writing cascade in the section to follow because the upper levels of the speech act cascade also appear in the write [act] cascade.

5.1 Austin’s Speech Act Cascade

Austin’s (1962) analysis of speech acts constitutes a classical example of a cascade. Austin distinguishes five levels of action in an ordinary verbal utterance (Fig. 3). The “locutionary” level consists in saying something with a particular sense and reference in the given context of utterance. Within the locutionary act, Austin makes a finer distinction into three levels: with the “phonetic act”, the speaker produces speech sounds; the “phatic act” is “the uttering of certain vocables or words, that is, noises of certain types, belonging to and as belonging to, a certain vocabulary, conforming to and as conforming to a certain grammar.” (Austin 1962: 95); the “rhetic act” is “the performance of an act of using those vocables with a certain more-or-less definite sense and reference.” [p. 95]. The phonetic act generates the phatic act, and this in turn the rhetic act. Austin continues [p. 98], “To perform a locutionary act is in general, we may say, also and eo ipso to perform an illocutionary act”. Austin calls this level the illocutionary act in order to emphasize that it is done in performing the locutionary act. He thus explicitly assumes a c-in relation between illocution and locution. The achievement of the illocutionary act—a promise, an answer to a question, etc.—only succeeds if complex “felicity conditions” [pp. 25–38] are fulfilled. Austin discussed these conditions in detail, thereby offering an elaborate case study of the “circumstances” involved in these cases of level-generation.

Fig. 3 Austin’s speech act cascade

Finally, by performing an illocutionary act, the speaker may execute a “perlocutionary act” that consists in causing a particular effect, for example, convincing, offending, or delighting the addressee. Austin calls it perlocution because it is done by performing the illocution [p. 108]. “[T]he perlocutionary act always includes some consequences” [p. 107]. Unlike the lower four levels of a speech act, the perlocutionary act may or may not be intended. The nature of the four level-generations is a combination of conventional and simple for phatic, rhetic, and illocutionary act; the level-generation of the perlocutionary act from the illocutionary act is causal; it does not involve convention [p. 121].

5.2 The Cascade Structure of Writing by Hand

We will now proceed to an example that is suitable to illustrate and discuss central aspects of applying the cascade approach to verb semantics. Figure 4 displays a cascade for the concept of writing by hand. This concept essentially constitutes the lexical meaning of the verb (except for the specification of the lowest level which we will argue in Sect. 6.1 is not specified in the lexical entry). It is roughly analogous to Austin’s cascade, but I will elaborate it more, commenting on the single-level frames and their relationships. The writing cascade has a lowest level of three co-temporal acts: the agent holds a writing implement in their hand, presses its writing part on some surface, and moves it along leaving a visible trace. Compound augmentation integrates the three co-temporal acts into the act-type at H1 ‘write by hand’, the first level that can be called writing, in the sense of producing visible lines and shapes. For reasons of space, the three frames for the acts of holding, pressing, and moving along are only represented by their central act nodes. In fact, they share the agent and the action time among them; they also have the same theme argument (i.e. the pen or other writing implement); the acts of pressing and moving share the surface as a third argument. Actually, the process of handwriting is even more complex; usually, the pen will not be in continuous contact with the surface since writing requires lifting the pen and moving it to a different position on the surface. We neglect this aspect here.

Fig. 4 The cascade for writing by hand

The higher Levels H1 to H5 consist of action frames that each have an agent and a product attribute (the attribute arrows are labeled accordingly only in the highest level). If Level H1 produces perceptible forms of writing on the surface, it generates Level H2 ‘write_graph’ of producing graphemes. Graphemes, in turn, may or may not constitute linguistic text: under circumstances, Level H2 generates Level H3 ‘write_text’. Again under circumstances, writing text constitutes a fourth Level H4 ‘write_content’. Writing verbal content corresponds to the locutionary level in Austin’s cascade. To this level is added an illocutionary level H5 ‘write_illocution’, for example, an application, an excuse, a reply, a request, etc. The specific type labels for the agents will be explained in Sect. 5.4. A perlocutionary level is not assumed to figure in the concept of writing.

At each cascade level, the act is embedded in a different context, and each context comes with different conditions and requirements. The context of Level H1 is the same as, for example, the context of a drawing activity. The agent needs a surface such as a sheet of paper and a pen or other implement, maybe along with ink, paint, etc. The agent needs to be able to hold the implement and move it along on the surface at some level of motor control. The agent determines readability in terms of the size of writing, the visibility of the writing material on the surface, the durability of the product; they may be concerned with highlighting parts of the writing by different color or style. The product at Level H1 can be copied or scanned; if properly processed, it can be stored on an electronic device. At Level H2, the agent bothers about a writing system and a writing style; they need to command the skill of writing; they will write legibly or not. The Level H3 agent is concerned with choosing a language, with orthography and grammar; they need to be in sufficient command of the language. At Level H4, the agent is an author of content, whereby the agent potentially relates to other content and its authors; for larger texts, the author is concerned with aspects such as coherence and structure which are crucial for comprehensibility. Obviously, producing text involves more abilities than just knowing the language. It is at the illocutionary level H5 that the agent enters social interaction with a reader addressee, possibly initiating or continuing a sequential exchange; the agent at this level will choose an appropriate type of text, a style and a tone of expression, which requires the relevant social skills. At each level, different criteria of successful action obtain. And each level is motivated and informed by what it serves to level-generate.

5.3 Types of Products and Levels of Manner Modification

Depending on the level, writing brings about different types of product, for example, lines, letters and characters, words, coherent text, illocutions, etc. This amounts to different selectional restrictions for each level. Correspondingly, if the verb write is complemented with a direct object such as whorls, e’s, “mama”, “I’m to the cafeteria”, a receipt, etc., an appropriate level within the cascade will be selected for application. If one were to describe the selectional restrictions for the theme argument of write in a single-level approach, one would run into an inconsistent type assignment for the product argument (Footnote 33).

The level-distinction is equally relevant for the analysis of manner modification. (29) lists manner modifiers of write that are level-specific; others like slowly or beautifully may apply at more than one level.

[Example (29): level-specific manner modifiers of write (not reproduced)]

Without requiring disambiguation or coercion, the verb combines with any-level modifiers or product specifications. Simultaneous relation to different levels is possible, such as in the following example:

[Example (30): simultaneous relation to different levels (not reproduced)]

5.4 Agencies at Cascade Levels

In Goldman’s theory, the agents of the acts in a cascade are presupposed to be the same. They are, however, in different roles, a fact that is blurred if one uses the same generalized attribute agent through all levels as I did in Fig. 2 and the writing cascade; the difference becomes transparent if one uses instead the more specific role attributes that actually apply. These are in the case of writing by hand:

[Example (31): level-specific agent role attributes for writing by hand (not reproduced)]

Goffman (1979) introduced the notion of “footing” in order to distinguish different roles that the participants in a verbal communication can take on (Footnote 34). There are producer footings and recipient footings. On the producer’s side, which matters here, Goffman distinguishes the roles of “principal”, “author”, and “animator”. The principal is the one on whose behalf an utterance is made, the one who is responsible. The author chooses the words, the animator produces the verbal signals. In everyday communication, the three roles are usually enacted by the same person. In institutional settings, however, like press conferences, public speeches, court trials, examinations, and countless others, the producer footings may be distributed among more than one person, present or absent; ghostwriters choose the words they don’t utter themselves, attorneys speak on behalf of their clients, typists type words not their own. In the diagram of the writing cascade in Fig. 4, the agent nodes are labeled according to Goffman’s distinctions. Agentship can in principle be delegated down the cascade if the higher-level agent is in a social position to do so. A lower-level agent is responsible to their higher-level delegators; ultimately, the principal will be held responsible for the performance of all the agents involved at the lower levels.

These considerations suggest a generalization of level-generation that allows for delegation of agency down the cascade, instead of strict identity of agents. In the realm of social interaction, delegated agency is a common phenomenon. For example, I may help somebody by delegating helpful action to a third party; I may pay a debt by having a third person pay who owes me money; I may break the law by making my subordinates do something illegal, and so on.

If agency does not split, there is a relation more specific than physical identity between the agent roles at the different levels—if these agents are not considered just persons but persons-in-a-particular-role. Let us assume that Erica holds a pen and moves it along a piece of paper. As such she is already in three roles, implementing the penholder, the one who presses the pen upon the paper, and the one who moves it along on the paper. If she produces script, she thereby implements a ‘writer-by-hand’. The implementation cascades upwards if Erica is successful in writing graphemes, thereby producing text, content, an illocution. Under the circumstances required, the agent at a given generator level implements the agent at the generated higher level. As the implementation is successful only under circumstances, I will talk of “c-implementation”.

The implementation relation is asymmetric: the writer-of-text implements a writer-of-content, but not vice versa, since text need not have content. It is also irreflexive: no role implements itself. And implementation is transitive. Thus, the c-implementation relation has essentially the same properties as c-constitution, except for the fact that it is a relation between persons and the roles they implement, rather than between acts. In analogy to c-constitution, I consider c-implementation as a relation between TTs, in this case persons under a particular role description, for example Erica/agent(h1/write_by-hand), that is, “Erica in the role of the agent of an act h1 of the type ‘write_by-hand’”.
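The three properties named here (asymmetry, irreflexivity, transitivity) are those of a strict partial order. As a minimal sketch, assuming unsplit agency and circumstances under which every step succeeds, the properties can be checked mechanically on the transitive closure of the direct implementation steps. The role labels and function names below are illustrative assumptions, not the labels of Fig. 4.

```python
from itertools import product

# Hypothetical role labels for the agent track at Levels H1..H5
# (illustrative only; not the labels used in the original figure).
ROLES = ["writer-by-hand", "writer-of-graphemes", "writer-of-text",
         "writer-of-content", "writer-of-illocution"]

# Direct c-implementation steps, assuming suitable circumstances:
# each role implements the role at the next higher level.
DIRECT = {(ROLES[i], ROLES[i + 1]) for i in range(len(ROLES) - 1)}


def transitive_closure(pairs):
    """Return the smallest transitive relation containing `pairs`."""
    closure = set(pairs)
    while True:
        new = {(a, d) for (a, b), (c, d) in product(closure, closure) if b == c}
        if new <= closure:
            return closure
        closure |= new


def is_irreflexive(rel):
    """No role implements itself."""
    return all(a != b for a, b in rel)


def is_asymmetric(rel):
    """If a implements b, then b does not implement a."""
    return all((b, a) not in rel for a, b in rel)


def is_transitive(rel):
    """If a implements b and b implements c, then a implements c."""
    return all((a, d) in rel
               for (a, b), (c, d) in product(rel, rel) if b == c)


C_IMPLEMENTS = transitive_closure(DIRECT)
```

On this toy relation, all three checks succeed, and the lowest role is related to the highest one only via the closure, which mirrors the upward cascading of implementation described above.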

C-implementation shares with c-constitution the question of grounding. Although c-implementation goes hand in hand with c-constitution of acts, the grounding of c-implementation is not just derivative from the grounding of c-constitution. Rather, for any level of action, including the basic level, taking the agent role means implementing it, for the person who acts. Hence, if l/L is the basic act-TT in a cascade to perform, the c-implementation chain starts with an additional prior step, taking the form in (32a), while the corresponding act-cascade is as in (32b):

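[Example (32): (a) the c-implementation chain with its additional prior step; (b) the corresponding act-cascade (not reproduced)]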

Figure 5 displays the two levels involved with agency: the person who implements the agent and the person in the agent role for a specific act. The act level may cascade further upwards.

Fig. 5 The two levels of implementing an agent role

We may assume that a person is implemented by a living human, the human by an organism, the organism by biomass, and so on. This assumption would be in line with theories that model social entities such as persons as supervenient on biological entities, and these on chemical entities, etc. The problem of grounding persons is an ontological problem of its own.

This mismatch notwithstanding, we may consider generalizing the term c-constitution so as to also cover the c-implementation relation. It makes sense to extend the use of the term in this way: the writer-by-hand under circumstances constitutes a writer of graphemes, who in turn may constitute an author of text, and so on.

5.5 Objects at Cascade Levels

Goldman’s notion of level-generation does not impose conditions on arguments other than agents. In view of the writing cascade, we see that it would be inadequate to assume identity of the products across levels because they exemplify ontologically different types of object. Extracting the product track from the cascade yields a multilevel conceptual description of the product on its own. The products are things of a quality that originates at Level H1, H2, etc. respectively. Again, there is a relation of constituency: under circumstances, the graphemes constitute text, the text constitutes content, the content an illocution.

The difference of description that applies to the products of writing at the levels distinguished is particularly conspicuous. This will always be the case for object arguments in action cascades of creating, destroying, or changing things, like bake, break, or repair. However, objects in any cascade will be in different roles, too, analogous to the agents in a cascade. Consider the following cascade, imagining circumstances that would support its formation:

[Example (33): an action cascade involving a TV set (not reproduced)]

And now consider the role of the TV set at the different levels:

[Example (34): roles of the TV set at the different levels (not reproduced)]

5.6 A Multitrack Notion of C-Constitution

I argued above that the cascade relations are second-order because they are relations between act-types, and therefore relations between, rather than within, first-order frames, in the frame-model adopted here. We now see that there is an even stronger argument for the second-order view: c-constitution between acts necessarily comes along with c-constitution of agencies and potential further arguments of the acts if they are shared across levels. These other tracks of c-constitution are conceptualized as roles of the arguments involved. Hence, c-constitution is a multitrack condition. Figure 6 displays a three-track sub-configuration cascade that would apply to the writing example. Notably, the parallel tracks in an action cascade intrinsically harmonize. To each of them the same circumstances (the “c” parameter of c-constitution) are relevant, and with them the level-specific contexts. The diagram highlights the multitude of c-const relations; the three tracks can alternatively be considered the components of one complex inter-level relation.

Fig. 6 Three tracks of c-constituency in a cascade

6 Reference and Composition

The assumption that action verb meanings are concepts with a cascade structure has far-reaching consequences not only for a theory of cognitive representation and decomposition, but also for the theory of reference and composition.

6.1 Meaning and Reference of the Verb Write

We call activities at all Levels H1 to H5 of the writing cascade “writing”, regardless of whether the higher levels are actually achieved. If we refer to a level higher than H1, a choice of alternative methods at Levels L and H1 is available, such as writing with a typewriter, or on a computer with a keyboard, or on a smartphone with a touch screen, etc. Thus, for present-day English, it is not to be assumed that the cascade in Fig. 4 represents the lexical meaning of the verb, as the lexical entry must not fix the method of writing. That does not mean that the level of the writing method is absent from the concept; it cannot be absent because it is required for logical reasons (there are no higher-level acts without appropriate generating lower-level acts). Thus, I assume that the lexical meaning of the verb write is the cascade in Fig. 4 with the lowest level H1 and its generators left unspecified. In general, verbs for non-basic action eo ipso call for a lexical analysis in the form of a cascade. If an unspecified generating level is addressed, for example by a modification of write with shakily, it is to be accommodated suitably.

The multilevel structure of the meaning is not a case of polysemy, that is, different senses on a par with each other. Rather, it is a case of one sense with several components, organized into a cascade. Of course, action verbs with a cascade structure meaning can be polysemous independently, requiring a separate cascade analysis for each sense.

When the verb write is used referentially, it refers to a whole cascade of act-TTs. Even if the very token of the verb is used in a way that relates to a specific level, for example, by specifying a product of a specific level or by applying level-specific modification, more than this level is concerned. On the one hand, reference is necessarily downward-complete: reference to a non-basic cascade level ontologically and conceptually requires generating act-TTs. This holds for all verbs that denote non-basic action: their cascade-format lexical meaning will contain at least one generating level, of an act-type which may or may not be specified. Even if unspecified, generating lower-level actions are not of arbitrary type; rather they must be such that, under the circumstances one is entitled to assume, they level-generate what is at stake. On the other hand, we will further assume that, if a lower level is explicitly addressed, it will generate higher levels according to our assumptions about the circumstances. That does not mean we have to assume that a complete writing cascade up to Level H5 is always referred to. The circumstances may be such that they prevent level-generation of certain higher levels. Also, a given specification of the product argument, say as “whorls”, may preclude level-generation on the object track and therefore also on the act-track.
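The two directions of this reasoning can be made concrete in a small sketch. It assumes, for illustration only, that the circumstances can be summarized as a set of levels whose generation is prevented; the function name and the `blocked` parameter are hypothetical and not part of the theory's notation.

```python
# Levels of the writing cascade, lowest first.
LEVELS = ["H1", "H2", "H3", "H4", "H5"]


def referred_levels(addressed, blocked=frozenset()):
    """Levels covered by a use of 'write' that addresses one level.

    Downward: every level below the addressed one is required
    (reference is downward-complete).
    Upward: higher levels are generated until the circumstances,
    modeled here as the set `blocked`, prevent a level.
    """
    i = LEVELS.index(addressed)
    covered = LEVELS[: i + 1]        # downward-complete part
    for level in LEVELS[i + 1:]:
        if level in blocked:         # circumstances prevent level-generation
            break
        covered.append(level)
    return covered
```

For instance, a product specified as “whorls” addresses Level H1 and precludes generation of H2, so `referred_levels("H1", blocked={"H2"})` covers Level H1 only, whereas addressing H3 with no blocking circumstances covers the full cascade.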

In addition to the levels subject to direct reference, we will be ready to generate further levels of a given TT cascade in our inevitable attempts to make more sense of what is said, by relating the act to further contexts in which it might matter. Thus, level-generation is a particularly rich source of conversational implicatures based on relevance. These cascade extensions will not be found in the lexical entries since they depend on the circumstances of an individual utterance.

6.2 Cascades and Composition

If we consider semantic meanings to be concepts, for example frame cascades for verbs of action, and if we are provided with explicit models of these concepts, we are in a position to ground a theory of semantic composition on decomposition. Semantic composition can then be modeled in more detail and more precisely. Also, if we know more about the meanings of words, we can start to model the interaction of semantic information with context knowledge. Using the example of the verb write, I will illustrate some of the general perspectives of semantic composition emerging.

Let us assume we are to interpret a simple sentence with the verb write in finite use, with a subject and a direct object.

[Example (35): a simple sentence with finite write, a subject, and a direct object (not reproduced)]

The lexical meaning of the name Martha, when taken as a person name, is a very simple frame: There is a central referential node typed as ‘person’ with one attribute, name, that carries the value ‘[Martha]’, basically an English sound and written form; we may add a gender attribute to the central node with the value ‘female’ if we consider it adequate to assume that the bearer’s being female constitutes part of the meaning of the name Martha. The subject DP in (35) specifies the agent argument of the verb. Now, there are five agent nodes in the writing cascade that belong to an act typed as some level of writing. In principle, the frame for Martha can be unified with any one of them. What about the remaining four agent nodes? They will essentially be taken care of by the c-constitution requirements. In the simpler case of unsplit agency, Martha implements the agent at all levels, i.e. the scribbler, the scriber, the author, and the principal at the same time. If we allow for footing splits, the conditions are more involved: the level-agent is either Martha herself, or somebody who delegates this level to Martha, or someone to whom Martha delegates this level.

In addition to the full five-level readings of write, there is the possibility that the writing cascade may be implemented only up to a level lower than H5. Thus, there are three degrees of freedom given for the composition of verb and subject DP: (i) choice of the overall expansion of the writing cascade up to a level lower than or equal to H5; (ii) selection of a level for the agent; (iii) selection of the agent’s role in a footing structure. This amounts to a vast number of readings on this part alone.

Dealing with the direct object in (35) is less complex because the product is a Level H5 product, an illocution. In order to be able to select the appropriate level for unifying the product node with the frame for the statement, we need to know that statements are illocutions, that is, we need a corresponding frame representation of the noun statement. As to the remaining four object nodes in the cascade, again the c-const relation will take care of them; for any product at a Level n + 1, the product at Level n must support (i.e. c-constitute in the generalized sense) the higher-level product type. We may, however, also have product specifications that leave the type and level open, such as it or that. Depending on how the reference of the pronoun is determined in the given context, it might result in selecting a different level than was chosen for the agent. Therefore, the number of readings due to handling the agent argument potentially multiplies with the number of levels on account of level-selection for the object specification.

As is natural when one works with frames, I assume that the basic mechanism of semantic composition is unification (Footnote 35). Unification is restricted by the condition that the type information on the nodes unified be compatible. In the case of level-specific object specifications or modifiers, this condition accounts for how these “find” their level to apply to. If there is more than one pair of nodes that fit, there may be more than one way of unification. We therefore have to accept that semantic composition is not deterministic. Although this is a bitter pill to swallow for some theoretical orientations in semantics, this consequence is after all welcome. All the readings possible are potentially “real”. If there are several readings to a construction, the compositional theory must predict all of them. Thus, the multilevel approach is on the one hand considerably more complex, but on the other able to account for the data more adequately.
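How type compatibility lets an object specification “find” its level, and why the result need not be unique, can be illustrated with a deliberately simplified sketch. It assumes a flat type system in which a specification either names a product type exactly or leaves it open (as with it or that); the type labels and function names are illustrative assumptions, and real frame unification is considerably richer.

```python
# Product type at each level of the writing cascade
# (labels are illustrative assumptions).
PRODUCT_TYPE = {
    "H1": "script",
    "H2": "graphemes",
    "H3": "text",
    "H4": "content",
    "H5": "illocution",
}


def compatible(node_type, spec_type):
    """Types are compatible if the specification is open (None) or matches."""
    return spec_type is None or spec_type == node_type


def unify_product(spec_type):
    """Return every level whose product node can unify with the object.

    More than one result means more than one reading: composition
    by unification is not deterministic.
    """
    return [level for level, node_type in PRODUCT_TYPE.items()
            if compatible(node_type, spec_type)]
```

A noun such as statement, typed as an illocution, selects exactly one level, while a type-open pronoun is compatible with all five product nodes, so that context has to decide among the resulting readings.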

The classical model of semantic composition is not a psychologically realistic model (and never was meant to be). In a realistic approach to semantic processing, the semantic agent will not only process linguistic information (i.e. syntactic structure and lexical meanings), but they will also draw on contextual knowledge during the process of composition, not only after it is finished (Hagoort et al. 2004). Aiming not at abstract sentence meaning, but at utterance meaning, i.e. meaning plus reference in the given context, the composing subject will merge the semantic information as early as possible with contextual information about the referents. For example, when faced with the sentence Martha wrote the statement, in a context where they know who Martha is, what statement is at issue, and which writing footing Martha can have, they may end up with one possible reading only. It is in this connection that the dependence of c-constitution on the circumstances comes to bear crucially. The c-parameter in every cascade link calls for the inclusion of contextual knowledge in the compositional process; knowledge of the circumstances is necessary in order to decide which cascade levels are actually accomplished.

7 Conclusion: Cascades in Cognition, Semantics, and Life

We started out from Goldman’s (1970) theory of level-generation and act-trees. Taken as the psychological notion Goldman had in mind, level-generation provides the ground for a novel theory of the cognitive representation of action concepts: human action is conceptualized in multilevel cascade structures (the occasional basic acts notwithstanding). The levels of c-constitution are not levels of generality, but of constituency: lower-level acts constitute higher-level acts, where constituency is generally dependent on circumstances that make it possible.

In his introduction, Goldman relates his theory of action to the ontological debate about the question as to whether, say, flipping a switch and thereby turning on the light is one act or two. The problem dissolves if one adopts the psychological view on the matter. From this perspective, Goldman’s theory is not about just act-tokens, but about act-tokens-of-a-type, i.e. what I dubbed “act-TTs”. There is no doubt that, if one does something (one doing), one potentially enacts a whole cascade of action. All the acts in a cascade really are enacted; they really are what they are categorized as at each cascade level. This is reality to us as we cognitively construe the world. For psychology and for the analysis of verbal communication, and thereby for semantics and pragmatics, this is the relevant notion of reality.

In a second step, we applied Goldman’s multilevel approach to action verb concepts in natural language. Almost all action verbs denote non-basic action and therefore cascades of action. Some examples of everyday activities such as writing or speech acts call for cascaded concepts of as many as six or more levels. Thus, the repertoire of natural language verb meanings provides ample evidence for a Goldmanian multilevel view on action categorization. As a theory of the structure of semantic verb concepts, the cascade approach has far-reaching consequences for semantic theory.

Linking the cascade theory of action to observations on the meanings of action verbs is not only an application of the theory; these observations conversely provide evidence for cognitive theory: if so many lexical verb concepts turn out to be multilevel, this must be due to the way in which our minds work.

A closer look at the participants in the acts within a cascade revealed that there are analogous constituency relationships between the respective participants at different levels. There is a track of stepwise upwards implementation of agency in terms of the finer-grained level-specific agent roles. A parallel track obtains for other participants involved through cascade levels. This finding suggests that the multilevel conceptualization of human action induces cascades not only for action itself, but also for agents and objects involved.

Can cascade theory be extended to other types of verb? One natural way of extension appears to be the generalization of c-constitution in a way that captures the meaning and relevance of arbitrary events for the options of acting. For example, a rainfall or a blackout or a low battery level of our mobile phone may c-constitute all sorts of conditions for possible and impossible action. The outcome of level-generation would be what events and situations mean to us and for our options to act. In any event, the findings on the multilevel categorization of action, as well as, derivatively, of roles to act in and roles in which objects may be involved in action, suggest that the conceptualization of action may play a more fundamental and central role in our cognitive system than widely assumed (Footnote 36).

A radical induction from these findings might be this: All human categorization is, at least potentially, multilevel in the sense of cascade theory. Whatever we categorize, we categorize at potentially more than one level. This is owed to the fact that the bits and pieces of reality, or to be precise, of what is reality to us as human cognitive subjects, always matter in many different contexts. The brief glimpse at upward cascading mechanisms in the verbal lexicon (Sect. 3.4) gave an impression of where cascading expands to: in many cases it is a projection into the realm of social action and interaction; in others, cascading takes categorization to the realm of appraisal (with respect to personal or socially shared values). This might be taken as an indication that there are macrolevels across specific action types. Acquiring a vocabulary of verbs for human action with cascade structure meanings will help the members of a language community to synchronize their cascade level distinctions for single types of action as well as for overarching macrolevels. Clark’s (1996) theory of language use is a detailed study of how conversational interactants synchronize their multilevel views of the interaction they are engaged in.

The higher levels of an action cascade can be considered as corresponding to as many respects in which the doing has meaning to us (in a nontechnical sense). Likewise, persons in roles matter at the level of action that defines this role, and so do objects involved in action. Conversely, acts, persons, and objects can be viewed as lacking meaning to us as long as they, for us, do not c-constitute anything at a higher level. Of course, what carries meaning to a subject is first of all a personal issue. There are, however, socially established ways of c-constitution that will be anticipated by persons in social interaction (cf., for example, Searle’s (1995) social ontology).

An aspect of cascade theory that was not discussed here is the role of cascades in practical knowledge. The basic levels of cascades, like pressing a button on a remote control, flipping a switch, touching a symbol on a touch screen, constitute the methods we learn and then command for doing the higher-level types of action such as turning on the TV, or the light, or starting an app. In our complex and ever-expanding knowledge-how about the world we live in, we have learned countless such cascades from our earliest stages of life on: we have learnt by which methods to do what. Notably, most of the time, we have no understanding of the underlying circumstances and causal relations responsible for the possibility of these level-generations; for all practical purposes, they are just given in our world and part of it. Level-generation in these cases does not seem to involve any kind of reasoning. Thus, the observation that most of our practical knowledge about the environment has cascade structure constitutes solid evidence that level-generation, or c-constitution, is indeed a fundamental brain mechanism, as I assumed above. This view of the role of cascade formation in the psychology of knowing how and learning by doing is developed in the contribution by Kalenscher et al. in this volume. That contribution is about rats, suggesting that cascade theory might apply even to animal cognition.