Perception, Types and Frames

We present a view of perception as the classification of objects and events in terms of types in the sense of TTR, a Type Theory with Records. We argue that such types can be used to give a formal model of concepts and cognitive processing involving concepts. This yields a view that natural language semantics is based on our cognitive perceptual ability. The paper provides an overview of some key ideas in TTR including the important notion of record type. We suggest that record types can be used to model frames in a way that relates to the Düsseldorf notion of frame as well as those of Fillmore and Barsalou.


Introduction
We will present a simple-minded view of perception as the classification of objects and events in terms of types viewed as cognitive resources. The theory of types that we are using is TTR, a Type Theory with Records, which borrows a great deal from work in logic and computer science in a tradition initiated by Per Martin-Löf. It provides a rich type theory, that is, it includes types not just for basic ontological categories such as entities and functions, but also types of objects such as Tree and Boy and types of events (or situations) such as Hugging-of-a-dog-by-a-boy. Types may be complex objects constructed from other types in a type theoretic universe. We will argue that such types can be used to give a formal model of concepts and cognitive processing involving concepts. In particular, we will suggest that natural language semantics is at bottom based on our cognitive ability to perceive objects and situations in terms of types. To this we have added the ability to reason in terms of the types themselves. Thus, for example, we can consider types of situations without actually perceiving a situation of the type and we can even consider types of situations which are impossible.
Among the complex types introduced in TTR are record types which are used to model types of situations and also propositions. An utterance of the sentence A boy hugged a dog is true if there is a situation of the type Hugging-of-a-dog-by-a-boy and false if there is nothing of that type. (This follows the dictum known as "Propositions as Types" which Martin-Löf took over from intuitionistic logic.) Both the intuition behind record types and their structure in the formal theory suggest that they can be used to model frames, both as conceived of by Fillmore and as introduced by Barsalou. We will develop this correspondence and suggest that this provides one way of integrating frames into compositional semantics. In exploring this we will find relations with work on frames conducted by several researchers in the Düsseldorf group working on frames.

Types and Cognition
Here we will give a brief overview of certain key ideas in TTR. For more detailed discussion of TTR in general see Cooper (2012, in prep) and Cooper and Ginzburg (2015). TTR is a rich type theory: in contrast to the simple type theory used in formal semantics as developed by Montague (1974), it contains a much richer collection of types. Whereas Montague has types for what we might call basic ontological categories such as entities and truth values, TTR includes types of objects like Tree and of events such as boy-hugs-dog. We will see later that such types may have a complex internal structure. For discussion of the difference between simple and rich type theories including a historical perspective see Chatzikyriakidis and Cooper (2018). TTR is inspired by work in the tradition of Martin-Löf type theory (Martin-Löf 1984; Nordström et al. 1990). While it has borrowed many tools and insights from this tradition, it does not follow all of the basic tenets of Martin-Löf type theory, such as a proof-theoretic constructive approach derived from intuitionism. For discussion of some of the differences and motivations see Cooper (2017a).
A central notion in Martin-Löf type theory is judgement, a judgement that an object (or event), a, is of type, T. This is represented in symbols in (1).
(1) a : T
We say that a is a witness for T. In work using TTR we put a cognitive spin on this notion. Suppose an agent, A, perceives a tree, t. (Here we are thinking of t as an object in the world, construed naively, that is the physical object with a trunk, branches and leaves.) We say that perception involves classifying an object as being of some particular type, that is making a judgement. Thus perceiving t as a tree, A makes the judgement that t is of type Tree. In symbols we can write this as (2).
(2) t :_A Tree
For discussion of this notation and the theory of type acts that we associate it with see Cooper (2014). We can think of the type Tree as what Gibson (1979) would call an invariant: whatever it is that trees share in common that enables us to classify them as trees. Following Gibson's terminology we can say that A is attuned to this type or A has this type as a resource. The idea that attunement is an important notion for semantics goes back to work on situation semantics (Barwise and Perry 1983), which is another important source of inspiration for our work on TTR.
Different agents have different type resources available. For example, a bee landing on the tree perceived by A probably does not have the same type Tree as the human A does. Different species have different perceptual apparatus and cognitive abilities. Even within a species the resources we have available might vary depending on our experience. For example, most people have a greater variety of subtypes for Tree than I do corresponding to different kinds of trees. The idea of linking types to perception is developed further by Larsson (2013) and is related to the theories which ground cognition in perception, for example, Barsalou (1999). For an agent to be able to make classifications corresponding to types there must be patterns of neural activation corresponding to types which we could think of as mental representations of types. For some suggestions concerning how such neural representations might look, see Cooper (2017b, 2019).
TTR provides not only types of objects but also types of situations, following a suggestion by Ranta (1994). Suppose that the boy, Sam, hugs his dog, Fido. The type of situation in which Sam hugs Fido is represented as in (3).
(3) hug(sam, fido)
We are used to this notation as a logical formula which denotes a truth value. In TTR, however, we use the notation to represent a type of situation. Nevertheless we can recover the notion of truth by using the "propositions as types" dictum (see Chatzikyriakidis and Cooper 2018 for discussion and references). A type (thought of as a proposition) is true just in case it has a witness, that is, there is something of the type. The type (3) is a complex type which is constructed from the predicate 'hug' and two individuals ('sam' and 'fido') as arguments.
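The judgement a : T and the "propositions as types" reading of truth can be illustrated with a minimal Python sketch. It is not part of TTR itself: the names Situation, Type, ptype, judge, is_true and WORLD are hypothetical, and a type is modelled, as a simplifying assumption, by nothing more than a test that recognizes its witnesses.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable


@dataclass(frozen=True)
class Situation:
    """A toy situation: a predicate holding of some individuals."""
    pred: str
    args: tuple


@dataclass(frozen=True)
class Type:
    """A type, modelled here (an assumption) by a name and a witness test."""
    name: str
    is_witness: Callable[[Any], bool]


def ptype(pred: str, *args: str) -> Type:
    """Build the type of situations in which pred holds of args."""
    def test(s: Any) -> bool:
        return isinstance(s, Situation) and s.pred == pred and s.args == args
    return Type(f"{pred}({', '.join(args)})", test)


def judge(obj: Any, typ: Type) -> bool:
    """The judgement obj : typ."""
    return typ.is_witness(obj)


def is_true(typ: Type, world: Iterable[Any]) -> bool:
    """Propositions as types: a type is true just in case it has a witness."""
    return any(judge(obj, typ) for obj in world)


# Hypothetical world containing a single hugging situation.
WORLD = [Situation("boy", ("sam",)), Situation("hug", ("sam", "fido"))]

hug_sam_fido = ptype("hug", "sam", "fido")
hug_fido_sam = ptype("hug", "fido", "sam")

print(hug_sam_fido.name, is_true(hug_sam_fido, WORLD))
print(hug_fido_sam.name, is_true(hug_fido_sam, WORLD))
```

In this toy world the type hug(sam, fido) counts as true because a witness exists, whereas hug(fido, sam) has no witness and so counts as false.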
Suppose, however, that we want a more general type of situation, one where any boy hugs any dog, that is, the type Hugging-of-a-dog-by-a-boy which we mentioned in Sect. 1. In TTR we use record types for this. Consider the record type in (4).
(4) [ x : Ind
      c_boy : boy(x)
      y : Ind
      c_dog : dog(y)
      e : hug(x,y) ]
This is a graphical notation for a set of fields, which in turn are ordered pairs containing a label and a type. The type Ind is the type of individuals, about which we say more below. A type like 'boy(x)' is a dependent type: exactly which type it is depends on the individual you choose in the 'x'-field. A witness for this record type is also a set of fields, though in this case the fields consist of a label followed by an object. A record is a witness for the record type if it contains fields with the same labels as the type (and possibly more fields with other labels) and the objects in these fields are witnesses for the corresponding types in the record type. So, for example, a record of the form (5a) would be a witness for (4) provided that it meets the conditions in (5b).
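The witness condition for record types, including the resolution of dependent fields, can be sketched as follows. This is an illustrative Python model under stated assumptions, not the formal definition: INDIVIDUALS and PROOFS form a hypothetical toy world, situations are named by strings such as "s1", and the helper names Ind, ptype, Dep and is_witness are invented for the example.

```python
# Hypothetical toy world: the individuals, and which situation proves what.
INDIVIDUALS = {"sam", "fido"}
PROOFS = {
    "s1": ("boy", ("sam",)),
    "s2": ("dog", ("fido",)),
    "s3": ("hug", ("sam", "fido")),
}


def Ind(obj):
    """Witness test for the type of individuals."""
    return obj in INDIVIDUALS


def ptype(pred, *args):
    """Witness test for the type of situations proving pred(args)."""
    return lambda s: PROOFS.get(s) == (pred, args)


class Dep:
    """A dependent field: its type depends on objects in other fields."""

    def __init__(self, pred, *labels):
        self.pred, self.labels = pred, labels

    def resolve(self, record):
        # Replace each label by the object the record supplies for it.
        return ptype(self.pred, *(record[label] for label in self.labels))


# The record type (4): labels paired with types or dependent types.
BOY_HUGS_DOG = {
    "x": Ind,
    "c_boy": Dep("boy", "x"),
    "y": Ind,
    "c_dog": Dep("dog", "y"),
    "e": Dep("hug", "x", "y"),
}


def is_witness(record, record_type):
    """True if record has every field of record_type with a suitable object."""
    for label, typ in record_type.items():
        if label not in record:
            return False
        if isinstance(typ, Dep):
            if any(dep_label not in record for dep_label in typ.labels):
                return False
            typ = typ.resolve(record)
        if not typ(record[label]):
            return False
    return True  # extra fields in the record are allowed


r = {"x": "sam", "c_boy": "s1", "y": "fido", "c_dog": "s2", "e": "s3", "extra": 42}
print(is_witness(r, BOY_HUGS_DOG))
```

The record r carries an additional field labelled 'extra', which does not prevent it from being a witness, in line with the remark that a witness may contain more fields than the type requires.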
We can think of records as modelling complex situations in which each field introduces either an object or a situation. Thus we can think of (4) as being the type of situations in which a boy hugs a dog. What does it mean for an agent to perceive some situation, s, as being of type (4)? If situations are to be construed as being part of the world (as in Barwise and Perry 1983) then we might be misled by thinking of a situation as being of the type (4). After all (4) is a record type and a record, as we have seen, is a pairing of labels with objects like Sam and situations in which, for example, Sam is a boy or, if you like, proof objects, such as a part of the world which shows that Sam is a boy. (The term proof object was introduced by Martin-Löf and shows an important bridge between a proof theoretic and a model theoretic approach to logic.) While it seems reasonable (though not entirely uncontroversial) to say that objects like Sam and situations in which he is a boy are parts of the world, the world does not come conveniently labelled as would be suggested by a record. We do not wish to claim that the world consists of records as characterized in TTR. The notation (6a) in TTR is a convenient graphic display of a set of ordered pairs (the graph of a function) whose first members are labels and whose second members are objects, as in (6b).
Another intuitive way to think about this is as a labelling of the set (7a), which could be graphically represented as (7b).
(7) a. {sam, s_1, fido, s_2, s_3}
    b.  x    c_boy y     c_dog  e
        |    |     |     |      |
       {sam, s_1,  fido, s_2,   s_3}
Intuitively, the elements in the set in (7b) are part of the world whereas the labels are pointers or handles introduced by cognitive processing of the world. Depending on your metaphysical view, you can consider the set in (7a), as opposed to the elements of the set, either as something existing in the world or as a cognitive construct which assembles those elements into a collection. On our view, records, at any rate, represent cognitive objects since they introduce labelling, and perception of a situation as one in which a boy hugs a dog involves breaking down the situation into components corresponding to the boy, the dog, the "boyness" of the boy, the "dogginess" of the dog and the "hugging" event involving the boy and the dog.
It might be that we could regard this as perception of a collection of tropes according to one or more of the varieties of tropes that have been proposed (Maurin 2016). A witness for a type like 'boy(sam)' is normally glossed in TTR as a situation which shows (or proves) that 'sam' is a boy. Such a situation is a particular (an "object" in TTR terms) as required for a trope though it is perhaps not clear that it is abstract in the right sense for a trope. It appears, at any rate, that it would not be the kind of trope discussed by Moltmann (2013). For one thing, Moltmann does not consider tropes as corresponding to common nouns in natural language. For another, there seems to be a kind of uniqueness of tropes instantiated by particular objects as in the red of the box whereas on our view given a box b, there could be many witnesses for the type 'red(b)', that is, situations which are proofs for the redness of the box. Furthermore the red of the box would be shared with another box which has exactly the same shade of red. There is no requirement that a situation which shows that one box is red also shows another box to be red, although there can be such situations. However, a situation which shows two boxes to be red would not require that the two boxes have an identical shade of red. This would, in Moltmann's terms at least, indicate that the situation is not a trope. Nevertheless, there is something trope-like about the situations which witness these types in that they are particulars which instantiate a specific quality obtained by applying a single predicate to appropriate arguments.
Record types give us a notion of subtyping. We can obtain a subtype of a record type by adding additional fields to it. Any record of the type with additional fields will also be of the type with fewer fields because a witness for a record type may contain additional fields with labels not occurring in any field in the record type. Thus the intuitive fact that any situation in which a boy hugs a dog is a situation in which there is a boy is modelled by the subtype relation expressed in (8). We have talked as if there are situations like a boy hugging a dog on the one hand and objects like trees on the other, but actually the dividing line between them is not so obvious. For example, you could think of Tree as being shorthand for a record type like (9).
(9) [ x : Ind
      y : set(Ind)
      c_leaves : leaves(y,x)
      z : set(Ind)
      c_branches : branches(z,x)
      w : Ind
      c_trunk : trunk(w,x) ]
(Here 'set(Ind)' represents the type of sets of individuals.) This represents the intuition that trees have leaves, branches and a trunk. You can either think of this as an individual or as a situation in which various things hold. Using the type Ind for "individual" as we standardly do in TTR, following the lead of traditional model theoretic semantics (cf. Montague's type e), hides a great deal of complexity which needs attention if we are to take a cognitive approach to perception and semantics. Perhaps the least you can say is that each agent may have their own view of what counts as a witness for Ind corresponding to a scheme of individuation (discussed in connection with semantics by, for example, Barwise 1989). For important work addressing some of the many difficulties involving individuation see Sutton and Filip (2017).
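The idea that adding fields to a record type yields a subtype, and that Tree can itself be read as a record type, can be sketched in Python. This is a deliberately simplified, purely syntactic check: record types are written as dictionaries from labels to type expressions given as strings, field types are compared by identity only, and the names BOY, TREE and is_subtype are hypothetical. The type BOY is an assumed rendering of "a situation in which there is a boy".

```python
# Record types as label -> type expression (strings, compared syntactically).
BOY_HUGS_DOG = {
    "x": "Ind",
    "c_boy": "boy(x)",
    "y": "Ind",
    "c_dog": "dog(y)",
    "e": "hug(x,y)",
}

# Assumed type of situations in which there is a boy.
BOY = {"x": "Ind", "c_boy": "boy(x)"}

# Tree read as a record type with leaves, branches and a trunk.
TREE = {
    "x": "Ind",
    "y": "set(Ind)",
    "c_leaves": "leaves(y,x)",
    "z": "set(Ind)",
    "c_branches": "branches(z,x)",
    "w": "Ind",
    "c_trunk": "trunk(w,x)",
}


def is_subtype(sub, sup):
    """Sufficient condition: sub contains every field of sup unchanged."""
    return all(label in sub and sub[label] == typ for label, typ in sup.items())


print(is_subtype(BOY_HUGS_DOG, BOY))  # more fields, so a subtype
print(is_subtype(BOY, BOY_HUGS_DOG))  # fewer fields, so not a subtype
print(is_subtype(TREE, {"x": "Ind"}))
```

A fuller treatment would also let a field count as matching when its type is a subtype of the corresponding field type; that refinement is omitted here.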
In this section we have talked about types from a cognitive perspective and in fact we can think of types as models of cognitive notions like concept, memory and belief. If we think of a concept as a type we can say that the concept is instantiated just in case there is a witness of the type. If we think of a memory as a type we can say that the memory is correct just in case there is or was a witness for the type. If we think of a belief as a type we can say that the belief is true just in case there is a witness for the type. This, coupled with the ideas of how types could be represented on a network of neurons presented by Cooper (2017b, 2019), gives us an admittedly very preliminary and "armchairish" theory of how concepts, memories and beliefs could be represented in the brain. It is my hope that this might in the future lead to a substantial connection between formal work on language and empirically based neuroscience. It is in this context that I would like to view the discussion of frames in the next section.

Record Types and Frames
TTR has been used to model frames by Cooper (2010, 2016). This work took the frame semantics suggested by Fillmore (1982, 1985) leading to the kind of frames used in FrameNet (https://framenet.icsi.berkeley.edu) as its starting point. However, the use of frames to analyze the Partee temperature puzzle is strikingly similar to that proposed by Löbner (2014, 2015) who based his work on Barsalou's (1992) more cognitively based notion of frame.
Partee's temperature puzzle involves explaining why the inference in (10) is not valid, as it would be if the interpretation of "is 90" were "is identical with 90".
(10) The temperature is 90
     The temperature is rising
     90 is rising
In order to address this puzzle Cooper (2016) uses the record type (11) corresponding to a stripped down version of the FrameNet frame Ambient_temperature. We call (11) AmbTempFrame. Any record belonging to this type will contain a pair of a real number (in the 'x'-field) and a location (in the 'loc'-field) such that the real number is the temperature at the location. In the terminology adopted in Cooper (2016) we refer to the record type AmbTempFrame as a frame type and we refer to records that are witnesses for it as frames. As records are used to model situations (including both states and events), frames correspond to situations and frame types correspond to situation types. The basic idea in Cooper (2010, 2016) is that a temperature rise is a string of two frames, s_1 s_2, such that s_1, s_2 : AmbTempFrame and s_1.loc = s_2.loc and s_1.x < s_2.x. This is a very simple theory of temperature rises. One might, for example, object to holding the location constant in view of sentences like (12).
(12) The temperature rises as you go south
Cooper (2016) suggests, however, that all locations are relative, even those we consider to be fixed locations on the Earth when we consider them from an astronomical perspective, so we could think of the location in (12) as being the relative location "around you". One might object also to having a string of just two frames corresponding intuitively to two temperature readings over time. The idea of strings is adapted from Fernando's (2004, 2006, 2008, 2009, 2011, 2015) work on a string theory of events, where a finite string can be regarded as a finite number of observations of a continuous world. The question arises whether the temperature should be rising between the two frames or whether it would still count as a rise even if the temperature was lower at some point between the two frames. The fact that examples like (13) can be true despite temperature dips during the night suggests that we can allow for temperature falls during a rise.
(13) The temperature rose during the week
AmbTempFrame can be related to a directed graph similar to those discussed by Kallmeyer and Osswald (2013) and Kallmeyer et al. (2017) in connection with frames. We let the labels in the record type be labels on the edges and the types be labels on the nodes. In the case of types constructed with a predicate we use the predicate to label a node with edges labelled 'arg_n' corresponding to the arguments of the predicate. Thus the type Ambient_temperature in (11) could correspond to the directed graph in (14).
(14) [Directed graph: a root node with edges labelled 'x', 'loc' and 'e' leading to nodes labelled Real, Loc and temp respectively; the temp node has an edge labelled 'arg1' to the Loc node and an edge labelled 'arg2' to the Real node.]
This would indicate that ambient temperature has three attributes: a real number (here labelled as the attribute 'x'), a location and a constraint (here labelled as the attribute 'e') that the real number is the temperature at the location. Both the record type (11) and the directed graph (14) could be coded in terms of hybrid logic in the manner suggested in Kallmeyer et al. (2017) as in (15).
(15) ⟨x⟩(l_1 ∧ Real) ∧ ⟨loc⟩(l_2 ∧ Loc) ∧ ⟨e⟩(temp ∧ ⟨arg1⟩l_2 ∧ ⟨arg2⟩l_1)
One of the anonymous referees offers a different way of relating TTR frames and Düsseldorf frames (DF). This involves thinking of the attributes in DF as functions from entities to entities in TTR. What appears below is my own adaptation of the referee's suggestion and the referee (anonymous, though he or she is) should not be held responsible for it. The suggestion involves first recasting the TTR frame type suggested in (11) in a neo-Davidsonian version, something that I think can be a good idea in many respects although it has not been explored to any extent within TTR. My suggestion for a neo-Davidsonian type for ambient temperature is given in (16). The referee's idea is that then the labels in fields with basic types stand in for values in DF and the predicates in the types labelled 'c_1' and 'c_2' correspond to attributes which label edges in the DF graph. Thus we would obtain (17), again a modification of the reviewer's original.
(17) [Directed graph: a node 'e' of type State with an edge labelled TEMP to a node 'x' of type Real and an edge labelled LOC to a node 'loc' of type Loc.]
This certainly gives us a more intuitive looking Düsseldorf frame. Also the representation of this in hybrid logic, given in (18), corresponds more closely to the use of hybrid logic by Kallmeyer et al. (2017).
A possible disadvantage with this, though, is that the relationship between the record type and the directed graph is less direct than in the first suggestion that we presented. This discussion raises the interesting question of whether a general relationship could be shown between TTR and hybrid logic and more specifically between frames modelled in terms of records and record types and frames as modelled by Kallmeyer and Osswald (2013), Kallmeyer et al. (2017). Then it is interesting to consider whether the particular linguistic analyses offered in the two approaches to frames can be intuitively represented in both TTR and DF.
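The first, more direct mapping from a record type to a directed graph (field labels on edges, types on nodes, and 'arg_n' edges from a predicate node to its arguments) can be sketched in Python. The field layout assumed for AmbTempFrame, with 'x' of type Real, 'loc' of type Loc and 'e' of type temp(loc, x), is read off the description of (14) and (15); the node name "root" and the function name to_edges are hypothetical.

```python
def to_edges(record_type):
    """Turn a record type into labelled edges (source, edge label, target).

    record_type maps each label either to a basic type name (a string) or to
    a pair (predicate, [argument labels]). Nodes are named by their type,
    which is a simplification: two fields with the same basic type would
    share a node.
    """
    edges = []
    node_of = {}
    # Fields with basic types: an edge from the root to a node for the type.
    for label, typ in record_type.items():
        if isinstance(typ, str):
            node_of[label] = typ
            edges.append(("root", label, typ))
    # Fields with predicate types: a predicate node plus one edge per argument.
    for label, typ in record_type.items():
        if not isinstance(typ, str):
            pred, arg_labels = typ
            edges.append(("root", label, pred))
            for n, arg_label in enumerate(arg_labels, start=1):
                edges.append((pred, f"arg{n}", node_of[arg_label]))
    return edges


# Assumed field layout for AmbTempFrame.
AMB_TEMP_FRAME = {"x": "Real", "loc": "Loc", "e": ("temp", ["loc", "x"])}

for edge in to_edges(AMB_TEMP_FRAME):
    print(edge)
```

The resulting edge list has the same shape as the hybrid logic formula in (15): one conjunct per edge leaving the root, with the 'arg1' and 'arg2' edges pointing back to the Loc and Real nodes.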
For example, it is not obvious to me that the following analysis could be easily reconstructed in DF, although I would be happy to be convinced otherwise. The basic idea in Cooper (2010, 2016), although the analyses in the two papers differ in details, is that temperature and rise correspond to predicates not of numbers but of frames of the type AmbTempFrame and for this reason the offending inference in the Partee puzzle does not go through. This leads us to distinguish between nouns and verbs which correspond to properties of individuals on the one hand and properties of frames on the other. The way that this distinction is made in Cooper (2016) is represented in (19) where dog and run correspond to individual level properties and temperature and rise frame level properties (modelled as properties of records).
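The claim that rise is a property of frames rather than of numbers can be made concrete with a small Python sketch of the two-frame theory of temperature rises. The class below keeps only the 'x' and 'loc' fields of AmbTempFrame and omits the constraint field; the function names rise and temperature_is, and the location "here", are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AmbTempFrame:
    """A simplified ambient temperature frame (constraint field omitted)."""
    x: float   # the real number in the 'x'-field
    loc: str   # the location in the 'loc'-field


def rise(s1, s2):
    """A rise is a string of two frames at one location with s1.x < s2.x."""
    return (
        isinstance(s1, AmbTempFrame)
        and isinstance(s2, AmbTempFrame)
        and s1.loc == s2.loc
        and s1.x < s2.x
    )


def temperature_is(frame, n):
    """'The temperature is n' inspects the 'x'-field of a frame."""
    return frame.x == n


earlier = AmbTempFrame(x=85.0, loc="here")
later = AmbTempFrame(x=90.0, loc="here")

print(temperature_is(later, 90))  # the temperature is 90
print(rise(earlier, later))       # the temperature is rising
print(rise(85.0, 90.0))           # numbers are not frames, so no rise
```

Because rise only applies to frames, the truth of "the temperature is 90" and "the temperature is rising" gives no licence to predicate rising of the number 90.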

(20) The dog is nine
     The dog is getting older
     Nine is getting older
The conclusion drawn by Cooper (2016) is that expressions corresponding to individual level properties can have a coerced interpretation where they correspond to frame level properties. Thus in addition to (19a) we can obtain a coerced interpretation of dog as in (21).
(21) λr:[x:Rec] . [e : dog_frame(r.x)]
A record is a dog frame just in case it is of the type (22a). For example, it may be of the type (22b), a subtype of (22a). This allows for frames of types other than (22b) to count as dog frames. The only requirement on a dog frame is that it contain an individual which is a dog. What other information we put into the frame may vary with whatever we are interested in when creating the frame. For many objects age is a relevant issue and we can imagine that among our resources is the type (23a) (which requires an individual with some age) and that this type can be merged with a minimal frame type like (22a) as indicated in (23b).
(23) a.  2015.) Thus (23a) could be thought of as a resource which could be used in a general coercion procedure for taking individual level properties to frame level properties involving a frame type including age information. This, perhaps, points to a rather different notion of frame than we have in either Fillmore's or Barsalou's work where we get the impression that frames might be a fixed non-dynamic part of our cognitive furniture. This appears to be the case despite Barsalou's interest in ad hoc categories. Barsalou (1991), for example, sees ad hoc goal-derived categories as important in providing the mapping from frames to world models. Thus while categories are created on the fly, the frames seem less dynamic, even if they are learned over time. Here, however, in talking of coercion we are considering creating frames on the fly. It seems reasonable to say that some of the frame types we have available are a permanent part of our general cognitive resources. However, it also seems reasonable to say that other frame types can be created ad hoc for the purposes at hand and that our ability to do this is exploited in cases of coercion.
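The idea of creating a frame type on the fly by merging a minimal frame type with an age resource can be sketched as follows. All field names below are hypothetical stand-ins, since the exact record types are not reproduced here; record types are again written as dictionaries from labels to type expressions, and the merge is simplified to a union of fields that refuses conflicting types for a shared label.

```python
def merge(t1, t2):
    """Union of the fields of two record types (simplified merge).

    A label shared by both types must carry the same type expression;
    a real merge would combine the two field types instead of failing.
    """
    merged = dict(t1)
    for label, typ in t2.items():
        if label in merged and merged[label] != typ:
            raise ValueError(f"conflicting types for label {label!r}")
        merged[label] = typ
    return merged


# Hypothetical minimal dog frame type: an individual which is a dog.
DOG_FRAME_MIN = {"x": "Ind", "e": "dog(x)"}

# Hypothetical age resource: an individual with some age.
AGE_RESOURCE = {"x": "Ind", "age": "Real", "c_age": "age_of(x, age)"}

# A frame type created ad hoc for talking about the age of a dog.
DOG_AGE_FRAME = merge(DOG_FRAME_MIN, AGE_RESOURCE)
print(DOG_AGE_FRAME)
```

On this sketch the same age resource could be merged with the minimal frame type for any other individual level property, which is the sense in which coercion draws on a general, reusable resource.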

Conclusion
We have discussed a simple-minded theory of the perception of objects and situations couched in terms of a theory of types which takes inspiration from Martin-Löf type theory. As part of this we introduced the notion of record type as corresponding to types of situations like boy-hugging-dog situations where we do not require particular individuals to be involved in the situation. We also suggested that such record types could correspond to types of individuals and raised (but did not solve) issues of individuation which relate to those which have been discussed by Sutton and Filip. We suggested that such record types can be used to model frame types and that they relate to both the Fillmorean notion of frame and that put forward by Barsalou together with linguistic developments of this notion carried out in Düsseldorf. Despite the fact that the origins of our notion of frame came from Fillmore, the fact that we take a cognitive view of our type theoretic analysis perhaps makes them appropriate for Barsalou's notion.
We discussed work on the Partee puzzle using such frames which seems similar in spirit to Löbner's recent work using frames to analyze the same puzzle. We also pointed out that the techniques we are using seem to have a correspondence to techniques used by Kallmeyer and colleagues, although more detailed investigation would be required to show a general relationship.
Finally, we suggested that the Partee puzzle is not limited to a restricted number of frame level properties but that individual level properties seem to be able to be coerced into frame level properties. This suggests that the frames that we have available as cognitive resources are not necessarily stable but apparently can be created ad hoc to meet requirements at hand. This is perhaps an aspect of frames that was discussed neither by Fillmore nor Barsalou.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.