
1 Introduction

What makes one choice of representation better suited than another for conveying the same information? Stapleton et al. made a contribution towards a general theory that may provide an answer to this question [27]. They put forward a formal theory of ‘observation’ and ‘observational advantage’ that distinguishes between the information that is observable in a given representation and the information that must be inferred from it. This theory makes it possible to prove formally the observational advantage of Euler diagrams over set-theoretic sentences when it comes to conveying claims about set equality and inclusion. To achieve that, Stapleton et al. resort to an abstract notation for Euler diagrams that is detached from the cognitive aspects of the act of observing and making sense of a diagram. This leaves open the possibility that some diagrammatic formalism in which observation is much more cognitively costly has an equivalent observational advantage, and is thus judged to be equally effective. For instance, as we will show in this paper, Hasse and Euler diagrams can have equivalent observational advantage over set-theoretic sentences. Thus, to account for the cognitive aspects of observation, we will model the act of observing and making sense of a diagram as a network of conceptual blends of image schemas with the geometric configuration of the diagram, and show that observation on the Hasse diagram is modeled with a much more complex network of blends. We believe the latter fact indicates that the act of observation has a higher cognitive cost for the user.

Our work is based on various theories of cognitive science. First, the notion of sense-making refers to how agents actively create meaning by perceiving and acting within their environment [20, 28]. Image schemas are mental structures acquired through infancy, as humans interact with their environment, and reflect the basic structure of sensorimotor contingencies experienced repeatedly, such as container, link, and path [13, 15]. Conceptual blending is a theory that posits that novel meaning emerges as we integrate existing concepts with each other [11]. Integrating all these theories, and applying them to the domain of diagrammatic reasoning, our proposal is the following: The geometry of a diagram is not meaningful on its own. We make sense of it, and reason with it, by integrating with it certain image schemas that are suitable to actively draw conclusions about its semantics [1,2,3].

To realise the above proposal, we must decide which image schemas are blended with each diagram, which can be done by following the approach that the advocates of the theories of image schemas and conceptual blending have followed for language. In this literature (e.g., [11, 16]), in order to argue that humans make sense of certain concepts by integrating certain image schemas with them, it is shown that:

  • the components of the image schema correspond, in a one-to-one manner, to the components of the concept to be made sense of,

  • there is a transfer of a more detailed inferential structure, that allows reasoning about the new concept.

For example, to explain the concept of being depressed, a conceptual metaphor is described using the container schema to convey the experience of being trapped, when one says: “I am in a deep depression.” By uttering this sentence, we put in correspondence the inside of a container with the state of being depressed, and the outside of the container with the non-depressed mental state. The inferences here originate in our embodied experience with containers: if I am inside a depressed state, I cannot be outside of it; if my depression is deep, then getting out of it will be hard. Transferring this approach from language to diagrams, in this paper we will show that:

  • certain image schemas can be put in correspondence, in a way that is almost one-to-one, with the geometric configuration of certain diagrams

  • certain blends of these image schemas with certain mathematical diagrams are apt to model the sense-making of the latter, because they can give rise to inferences that are valid in the reference domain of these diagrams.

Integrating image schemas with the geometry of our diagrams, using the guidelines described above, we will be able to compare the resulting networks of conceptual blends. Our hypothesis is that, between two diagrams for both of which such networks exist, the more cognitively effective one is the one with the simpler network of blends. We will argue that users reason about sets with Hasse diagrams by conceptualising them as vertically linked paths along a scale, and with Euler diagrams by conceptualising them as a configuration of containers that may contain other containers. We present an Euler and a Hasse diagram that have equivalent observational advantage with respect to set-theoretical notation, but we argue that the Euler diagram is more cognitively effective than the Hasse one because the network of conceptual blends modeling observation with it is much simpler. We believe our approach reaps the benefits of a formal but abstract approach, such as that of Stapleton et al. [27], while accounting for the cognitive aspects of reasoning when comparing the effectiveness of two diagrams.

2 Background

The term sense-making is defined within the framework of enactive cognition, which takes cognition and sense-making to refer to the process of an autonomous agent bringing its own meaning upon its environment, as a result of trying to grow and sustain itself [20, 28]. This process is dependent on the embodiment of the agent, because a specific body—including a brain, sensory organs, and actuators—constrains the ways an agent can perceive, and interact with, its environment. Cognition and sense-making are therefore understood as emerging through the interaction of an embodied agent with its environment.

One concrete way to approach sense-making is through image schemas and conceptual blending. Image schemas are mental structures formed early in life, constituting structural contours of repeated sensorimotor contingencies, such as container, support, verticality and balance [13, 15]. They are not acquired by learning a set of propositions, rules, or criteria, but by experiencing, for instance, our bodies being balanced, trying to maintain our balance, supporting an object, etc. Repeated experiences of the same kind lead to the formation of a mental structure capturing what is invariant and shared among them. The most important function of image schemas is their capacity to structure our experience. For example, we can perceive bees as being in a swarm, through the container and count-mass schemas, even though there is no single physical object in the environment corresponding to ‘swarm’ [17, p. 31]. Image schemas are Gestalts; they consist of a set of necessary components with a specific relational structure, whereby each component becomes meaningful only through its relation to all the others [17, p. 31]. By way of this structure, agents can, unconsciously but systematically, integrate image schemas with their experience, thus making sense and drawing meaning out of it. In order to fulfill this function, the image-schematic structure has to be preserved during this integration [17, p. 42]. Consequently, when putting image schemas in correspondence with the geometry of a diagram, it would be desirable to put in correspondence as many elements of the image schemas as possible, and in a one-to-one manner, with the geometrical shapes. Finding the right schema for a given state of affairs is unconscious and immediate, but is nonetheless a cognitive process that uses our mental resources.

The image schemas of relevance for our case study are: link, path, verticality, scale, and container. We will now discuss their cognitive structure according to the literature, and explain what kind of geometrical configurations they should be put in correspondence with. However, these correspondences are not written in stone, but are flexible and could change depending on the context the diagrams are used in. We have previously described and formalised similar correspondences for Hasse, Euler, and some more diagrams [3].

link. This schema can capture associations of various types, ranging from a physical chain tying two objects together, to two events abstractly linked by occurring at the same time. The prototypical link schema associates two distinct, usually contiguous, entities linked with each other through a link. Therefore, the link schema structure comprises two objects of the same type (entities), and a third object of a different type (link). Being in this particular configuration makes it so that the two entities have the property of being ‘linked’. This structure fits well with a geometrical configuration of two regions or points that both intersect with a line. The objects identified as linked entities are typically “spatially contiguous within our perceptual field.” [13, p. 118], which holds for points linked by a line.

path. This schema gives rise to our understanding of things moving from one point to the other [13, pp. 113–114]. It underlies the conceptualisation of objects following trajectories through space, irrespective of the details of the trajectory [18]. The path schema has the cognitive structure of a sequence of pairwise adjacent locations, naming the first one as a source and the last one as a goal. There can optionally be a trajector on some location of the path [13, 15]. The structure of the schema necessitates that, if someone is on a certain location of the path, then they have already traversed all prior locations, and that contiguous locations serially lead from the source to the goal without branching. Given its structure, we believe the path schema should be put in correspondence with a series of shapes that neighbor each other in some way, and the source and the goal with shapes that do not have the same neighboring relation with any shape. This description is quite general, and could apply to almost any diagram. We will later see how it can be applied to the diagrams studied here.

verticality. This schema obtains its structure from our experience of standing upright with our bodies resisting gravity, or from perceiving upright objects like trees. It comprises the axis of an upright object, the axis reflecting the trajectory an object would follow if free-falling, or an axis that is merely mentally visualised by an observer upon a scene [25]. Regarding the latter case, for example, when observing the sun on the horizon, the horizon is the base, and a visualised vertical axis runs upward from it, reaching the sun. This axis is always unique, and has an up-down polarity, so it is associated with a base at the bottom, or the ground, as a reference point. The base corresponds to the point where the axis meets the ground, or, if discussing an upright object, to the bottom part of an object by which it can stand [25]. Given the above, the verticality schema could be put in correspondence with diagrams with configurations along a vertical axis. More precisely, there must be a single shape that is geometrically lower than all others, serving as a base, and a geometric configuration resembling a vertical axis, e.g., shapes being one above the other.

scale. This image schema pertains to a gradient of quantity, and has the following four properties: it has a fixed directionality; it is cumulative (if one has 15 euros, one also has 10); it can be open or closed, i.e., have a specific endpoint or not; and numerical gradients or normative judgements can be projected on it [13, pp. 122–123]. The scale schema is proposed to underlie the more-is-up metaphor, whereby a higher position in the vertical axis implies a higher quantity of something; that is, a larger number of rocks, or amount of water, means the top/surface reaches a higher position. Thus, on this account, the fixed directionality of scale is always upward [13, p. 121]. However, we believe that horizontal or circular scales (e.g., rulers and measuring tapes, or mechanical weighing scales, respectively) also satisfy the other properties of scale, and so perhaps scale is not inherently vertical, and a separate verticality schema is additionally involved in the more-is-up metaphor. Therefore, for us scale simply comprises an order of several discrete levels. Given the above, a scale schema could be put in correspondence with a geometrical structure of shapes that have a graded property. Such a structure could comprise, for example, shapes with a color or size grading, or shapes that are positioned one above the other, one to the right of the other, etc.

container. This schema captures the structure of entities that are hollow, and can enclose and protect other entities in various ways, ranging from a fence enclosing a plot of land, to a balloon enclosing the air inside it. container consists of a boundary, separating an inside and an outside, and this structure gives it certain properties; that is, an entity can be either in the inside or on the outside of a boundary, but not both. Also, several axioms hold, such as: if object A is inside boundary B, and boundary B is inside boundary C, then object A is inside boundary C; if object A is inside boundary B, and boundary B is outside boundary C, then object A is outside boundary C [17, p. 44]. We can see that the boundary of a container can be put in correspondence very naturally with a closed curve of any shape on the 2D plane. The inside and outside regions of the container also correspond naturally with the areas inside and outside the curve in this 2D space, respecting all the aforementioned properties of container [17, pp. 45, 122].
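The two container axioms quoted above can be checked on a concrete geometric configuration. The following is a minimal sketch, assuming that boundaries are drawn as circles in the plane; the names `Circle`, `inside` and `outside`, and the coordinates used, are illustrative assumptions and do not come from the cited literature.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Circle:
    """A closed curve standing in for a container boundary (assumed circular)."""
    x: float
    y: float
    r: float


def inside(a: Circle, b: Circle) -> bool:
    """True if circle a lies entirely within the boundary of circle b."""
    return hypot(a.x - b.x, a.y - b.y) + a.r <= b.r


def outside(a: Circle, b: Circle) -> bool:
    """True if circles a and b do not overlap (they may touch)."""
    return hypot(a.x - b.x, a.y - b.y) >= a.r + b.r


# Hypothetical configuration: A inside B, B inside C, and D apart from B.
A = Circle(3, 0, 1)
B = Circle(2, 0, 5)
C = Circle(0, 0, 10)
D = Circle(20, 0, 4)

# Axiom 1: A inside B and B inside C, so A is inside C.
print(inside(A, B) and inside(B, C), inside(A, C))
# Axiom 2: A inside B and B outside D, so A is outside D.
print(inside(A, B) and outside(B, D), outside(A, D))
```

In this configuration both premises and both conclusions hold, which is the pattern of inference the container schema is taken to license for closed curves.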

In our approach, sense-making as the integration of image schemas with our experience can be described through the theory of conceptual blending, following [11, pp. 104–105]. Conceptual blending operates on mental spaces, which we introduce below based on the descriptions of Fauconnier [10] and Gärdenfors [12]. Mental spaces are mental representations that structure our perception and action. They comprise coherent and integrated chunks of information, containing entities, and relations or properties that characterise them. Mental spaces can be constructed from knowledge we have acquired previously, or from current experience, including exposure to language. Therefore, they operate in working memory, but long-term memory can play an important role in their construction. Last but not least, the elements of one mental space can be put in correspondence with those of others, allowing cognitive access to them.

The central claim of the theory of conceptual blending is that a systematic process of building correspondences between different, preexisting mental spaces—called input spaces—can result in the emergence of novel meaning. This process gives rise to a new mental space—called blended space—that contains some elements of the input spaces with new relations among them. To construct a blend, some pairs of entities, relations, or attributes from input spaces must be put in correspondence with each other, and related in a new way, or even merged with each other, in the blend. This process leads to the emergence of novel structure and thus novel meaning. The entire network comprising the input spaces, the blended space, the generic space—reflecting the common structure among input and blended spaces—as well as the correspondences among all spaces, is called the integration network. Meaning emerges in the integration network as a whole.

Now we can put the aforementioned theories in the context of sense-making of diagrams. An enactive cognition approach to diagrammatic reasoning would entail that no geometric configuration is meaningful in itself, but it prompts the user to unconsciously structure it into a meaningful diagram by activating suitable frames, and integrating them appropriately with the configuration. The logical approaches taken to diagrammatic reasoning are very different from this paradigm. Such approaches formally study the informational content, and the effectiveness of diagrams for reasoning. To that end, a mapping between the syntax (geometric configuration) and the semantics of the diagram is typically assumed [19]. The theory of observational advantage put forward by Stapleton et al. [27], which stems from Shimojima’s early work on the effectiveness of representations [26], follows an equally abstract approach. We believe such abstract approaches overlook the active, embodied role of the user in diagrammatic reasoning. Indeed, in agreement with enactive cognition, it has been suggested that the interpretation of diagrams entails a constructive and imaginative process on the part of the user [7, 19]. We wish to extend the theory of Stapleton et al. [27] to take into account the embodied and enactive aspect of our capacity to understand diagrams, and explain observation as emerging from the structure of the image schemas.

3 Related Work

In this section we will briefly summarise the theory of observational advantage of Stapleton et al. [27], and a cognitively-inspired framework for the analysis of representations, developed by Cheng et al. [6]. The former work put forth a formal criterion to compare the effectiveness of two notations of any kind, whether diagrammatic or sentential. First, any notation has some meaning-carrying relationships among its components, i.e., visuo-spatial relationships that express a certain meaning. A mathematical diagrammatic notation, in particular, is drawn with certain meaning-carrying relationships intended to express some sentences in another notation, e.g., logical or set-theoretical. In some cases, drawing this notation can result in the appearance of additional meaning-carrying relationships that allow even more sentences to be read directly off the first notation, sentences that would require additional inference steps in the second notation. In this case, the first notation has an observational advantage over the second. For example, someone intending to express the sentences \(P \cap Q = \emptyset \) and \(R \subseteq P\) with an Euler diagram will have to draw a diagram that is topologically equivalent to that of Fig. 1(a), and will, in doing so, inadvertently also express that \(R \cap Q = \emptyset \). In contrast, to obtain that \(R \cap Q = \emptyset \) from the sentential notation, an inference step is required. Observation is therefore seen as a kind of immediate inference rule by which we extract, by merely looking at the notation, some atomic fact (that evaluates to either true or false) that is already ‘within’ that notation. Finally, a notation can also be observationally complete with respect to a set of facts (in the same or another notation), meaning that any inferences that can be drawn from these statements can be observed from the first notation.
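The inference step that the sentential notation requires in the example above can be illustrated by an exhaustive check. The sketch below is only an illustration under stated assumptions: it enumerates all choices of P, Q and R over a small, arbitrarily chosen universe and looks for a counterexample, so it does not replace a proof; the function names are hypothetical.

```python
from itertools import combinations, product


def subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(c) for n in range(len(items) + 1)
            for c in combinations(items, n)]


def disjointness_is_entailed(universe):
    """Check that R ∩ Q = ∅ holds whenever P ∩ Q = ∅ and R ⊆ P hold."""
    all_sets = subsets(universe)
    for P, Q, R in product(all_sets, repeat=3):
        premises_hold = not (P & Q) and R <= P
        if premises_hold and (R & Q):
            return False  # a counterexample to the entailment
    return True


def inclusion_is_entailed(universe):
    """Check whether R ⊆ Q would also follow from the same premises."""
    all_sets = subsets(universe)
    for P, Q, R in product(all_sets, repeat=3):
        if not (P & Q) and R <= P and not R <= Q:
            return False
    return True


print(disjointness_is_entailed({0, 1, 2}))  # True
print(inclusion_is_entailed({0, 1, 2}))     # False
```

The first check finds no counterexample, in line with the claim that \(R \cap Q = \emptyset \) follows from the two sentences; the second shows that not every sentence over the same labels follows from them.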

With these definitions of ‘observation’ and ‘observational advantage’, Stapleton et al. go about proving the observational advantage of Euler diagrams [27] over set-theoretical sentential notation. This is done in a very abstract way, disconnected from the embodied, enactive nature of observation and from the spatial properties of the geometry. The visuo-spatial relationship of a ‘region’ \(r_1\) being contained in a ‘region’ \(r_2\) is not a visuo-spatial relationship anymore in this abstract treatment of Euler diagrams. As we will see in the next section, this gives rise to the possibility of defining an alternative diagrammatic notation with an equivalent abstract observational advantage, but in which observation would arguably have a higher cognitive cost.

Regarding other work with similar goals to ours, Cheng et al. [6] develop a comprehensive formal framework for characterising the formal and cognitive properties of representations, ultimately aiming to build an AI system to automatically select effective representations for particular problem solving tasks. They systematically classify cognitive properties of representation systems, allowing them also to discuss cognitive cost, and thus effectiveness, of using a certain representation system for solving a problem. Important variables assessed have to do with both the components of the representation (e.g., symbols, sentences etc.) and their characteristics, as well as cognitive processes from symbol parsing to problem solving.

4 Approach

In this section we introduce an Euler and a Hasse diagram that have equivalent observational advantage, in that any entailment about sets that can be observed in one diagram can also be observed in the other. However, the act of observing a particular set-theoretic claim is more complicated in the Hasse diagram. We will show this by describing the act of observation in these diagrams as integration networks that make the conceptual blends of the image schemas with the geometrical elements of the diagram explicit, and show that the integration network corresponding to the Hasse diagram is more complex than the one corresponding to the Euler diagram.

Fig. 1. Observationally complete (a) Euler and (b) Hasse diagrams (and thus of equivalent observational advantage) that are semantically equivalent to the set of set-theoretical sentences \(\mathscr {S} = \{ P \cap Q = \emptyset , R \subseteq P \}\).

4.1 Working Example

Take, for example, the set of set-theoretic sentences \(\mathscr {S} = \{ P \cap Q = \emptyset , R \subseteq P \}\) over a set of labels \(\mathscr {L} = \{ P, Q, R \}\) (two additional symbols, \(\emptyset \) and U, are also part of the syntax, to denote the empty set and the universal set, respectively). An observationally complete Euler diagram that is semantically equivalent to \(\mathscr {S}\) is shown in Fig. 1(a). All set-theoretic sentences that are entailed by \(\mathscr {S}\) can be observed from this Euler diagram. We can also draw a semantically equivalent Hasse diagram for \(\mathscr {S}\), such as the one shown in Fig. 1(b). This Hasse diagram represents the lattice of all regions of the Euler diagram, generated as the lattice of sets closed under finite unions and intersections, such that \(A\vee B = A\cup B\) and \(A\wedge B = A\cap B\). Put more simply, the nodes on the second level from the bottom of the Hasse diagram correspond to the four minimal disjoint sets R, \(P \setminus R\), \(\overline{P \cup Q}\), and Q. The bottom level corresponds to their intersection, which is empty, the third level is generated by all possible unions of the minimal disjoint sets, and finally the top level is generated by the unions of the previous unions. As with the Euler diagram in Fig. 1(a), all set-theoretic sentences that are entailed by \(\mathscr {S}\) can be observed from the Hasse diagram of Fig. 1(b). In what follows, we will describe these observations using integration networks of image schemas with the geometry, and compare the complexity of the integration networks corresponding to the two diagrams.
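The lattice construction described above can be sketched in a few lines. This is a minimal illustration, assuming that every union of the four minimal disjoint sets appears as a node and that a line is drawn between two nodes exactly when the upper one adds a single minimal set to the lower one; the zone names and function names are hypothetical, and the layout of Fig. 1(b) itself is not reproduced here.

```python
from itertools import combinations

# Hypothetical names for the four minimal disjoint sets:
# R, P \ R, Q, and the complement of P ∪ Q.
ZONES = ("r", "p_only", "q", "rest")


def all_regions(zones):
    """Every union of minimal sets, including the empty union."""
    return [frozenset(c) for n in range(len(zones) + 1)
            for c in combinations(zones, n)]


def hasse_edges(regions):
    """Pairs (lower, upper) where upper adds exactly one minimal set to lower."""
    return [(a, b) for a in regions for b in regions
            if a < b and len(b) == len(a) + 1]


regions = all_regions(ZONES)
edges = hasse_edges(regions)

# Group nodes by the number of minimal sets they contain (0 = bottom node).
levels = {}
for region in regions:
    levels.setdefault(len(region), []).append(region)

for height in sorted(levels):
    print(height, [sorted(region) for region in levels[height]])
print(len(regions), "nodes,", len(edges), "lines")
```

Under these assumptions, the labeled nodes of the working example are the node containing only `r` (for R), the node containing `r` and `p_only` (for P), and the node containing only `q` (for Q).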

4.2 Enactive Observation in Hasse Diagrams

To observe if a certain set-theoretic claim \(S \subseteq T\) or \(S = T\) holds in a given Hasse diagram (where S and T are labels or complex set-theoretic expressions formed using the operators \(\cap \), \(\cup \), \(\setminus \), and \(\overline{\,\,}\)), we must first identify the nodes of the Hasse diagram representing set-expressions S and T, and then check if there is an upward path between these nodes (for set inclusion) or if they are the same (for set equality). The existence of an upward path can be immediately ruled out if the nodes representing S and T are distinct nodes at the same level of the Hasse diagram. Let us denote this identification task with a function node that assigns to each set-theoretic expression S over a set of labels \(\mathscr {L}\) a node \(\textsf {node}(S)\) in the Hasse diagram:

  • if \(S \in \mathscr {L}\), then \(\textsf {node}(S) = \lambda (S)\), the node labeled with S

  • if \(S = S_1 \cup S_2\), then

    • if there is a downward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(S_2)\), then \(\textsf {node}(S) = \textsf {node}(S_1)\)

    • if there is an upward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(S_2)\), then \(\textsf {node}(S) = \textsf {node}(S_2)\)

    • if there is neither an upward nor a downward path between \(\textsf {node}(S_1)\) and \(\textsf {node}(S_2)\), then \(\textsf {node}(S)\) is the lowest of all those nodes that are on a meeting point between an upward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(U)\), and an upward path from \(\textsf {node}(S_2)\) to \(\textsf {node}(U)\)

  • if \(S = S_1 \cap S_2\), then

    • if there is a downward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(S_2)\), then \(\textsf {node}(S) = \textsf {node}(S_2)\)

    • if there is an upward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(S_2)\), then \(\textsf {node}(S) = \textsf {node}(S_1)\)

    • if there is neither an upward nor a downward path between \(\textsf {node}(S_1)\) and \(\textsf {node}(S_2)\), then \(\textsf {node}(S)\) is the highest of all those nodes that are on a meeting point between a downward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(\emptyset )\), and a downward path from \(\textsf {node}(S_2)\) to \(\textsf {node}(\emptyset )\)

  • if \(S = S_1 \setminus S_2\), then

    • if there is a downward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(S_2)\), then

      • if \(\textsf {node}(S_2) = \textsf {node}(\emptyset )\), then \(\textsf {node}(S) = \textsf {node}(S_1)\)

      • if \(\textsf {node}(S_2) \ne \textsf {node}(\emptyset )\), then \(\textsf {node}(S)\) is the highest among all those nodes (excluding \(\textsf {node}(S_1)\)) that are on all downward paths from \(\textsf {node}(S_1)\) to \(\textsf {node}(\emptyset )\) that do not go through \(\textsf {node}(S_2)\)

    • if there is an upward path from \(\textsf {node}(S_1)\) to \(\textsf {node}(S_2)\), then \(\textsf {node}(S) = \textsf {node}(\emptyset )\)

    • if there is neither an upward nor a downward path between \(\textsf {node}(S_1)\) and \(\textsf {node}(S_2)\), then

      • if \(\textsf {node}(S_1 \cap S_2) \ne \textsf {node}(\emptyset )\), then \(\textsf {node}(S)\) is the highest among all those nodes (excluding \(\textsf {node}(S_1)\)) that are on all downward paths from \(\textsf {node}(S_1)\) to \(\textsf {node}(\emptyset )\) that do not go through \(\textsf {node}(S_1 \cap S_2)\)

      • if \(\textsf {node}(S_1 \cap S_2) = \textsf {node}(\emptyset )\), then \(\textsf {node}(S) = \textsf {node}(S_1)\)

  • if \(S = \overline{S_1}\), then

    • if \(\textsf {node}(S_1) = \textsf {node}(\emptyset )\), then \(\textsf {node}(S) = \textsf {node}(U)\)

    • if \(\textsf {node}(S_1) \ne \textsf {node}(\emptyset )\), then \(\textsf {node}(S)\) is the highest among all those nodes (excluding \(\textsf {node}(U)\)) that are on all downward paths from \(\textsf {node}(U)\) to \(\textsf {node}(\emptyset )\) that do not go through \(\textsf {node}(S_1)\)
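The label, union and intersection cases of the identification task above can be sketched as a search along the lines of the diagram. The following is a minimal sketch under stated assumptions, not the authors' implementation: the Hasse diagram is assumed to contain one node per union of the four minimal sets of the working example, all names (`ZONES`, `UP`, `LABEL`, `above`, `below`) are hypothetical, and the set-difference and complement cases are deliberately left out. The two special cases of each rule (an upward or a downward path already exists) are covered by the general "meeting point" search.

```python
from itertools import combinations

# Hypothetical names for the four minimal sets of the working example.
ZONES = ("r", "p_only", "q", "rest")

# Node identifiers; the set structure is used only to build the graph.
NODES = [frozenset(c) for n in range(len(ZONES) + 1)
         for c in combinations(ZONES, n)]

# Each line of the diagram joins a node to a node directly above it.
UP = {a: [b for b in NODES if a < b and len(b) == len(a) + 1] for a in NODES}

LABEL = {
    "P": frozenset({"r", "p_only"}),
    "Q": frozenset({"q"}),
    "R": frozenset({"r"}),
    "U": frozenset(ZONES),
    "EMPTY": frozenset(),
}


def above(start):
    """Nodes reachable from `start` by walking upward along lines (start included)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in UP[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def upward_path(a, b):
    """True if b can be reached from a going upward (a trivial path counts)."""
    return b in above(a)


def below(start):
    """Nodes from which `start` can be reached going upward (start included)."""
    return {n for n in NODES if upward_path(n, start)}


def node(expr):
    """Locate a label (str) or a tuple (op, left, right), op in {'union', 'inter'}."""
    if isinstance(expr, str):
        return LABEL[expr]
    op, left, right = expr
    n1, n2 = node(left), node(right)
    if op == "union":
        meeting = above(n1) & above(n2)   # meeting points of upward paths
        return next(m for m in meeting if meeting <= above(m))  # the lowest one
    if op == "inter":
        meeting = below(n1) & below(n2)   # meeting points of downward paths
        return next(m for m in meeting if meeting <= below(m))  # the highest one
    raise ValueError(f"unsupported operator: {op}")


print(sorted(node(("union", "P", "Q"))))            # ['p_only', 'q', 'r']
print(node(("inter", "R", "Q")) == LABEL["EMPTY"])  # True
print(upward_path(node("R"), node("P")))            # True
```

Even in this reduced form, every operation is realised as one or more graph traversals, which is the point the comparison with Euler diagrams rests on.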

As is evident from the above description, we can observe set-theoretic claims in a given Hasse diagram by realising these observations in an enactive, experiential way through the image schemas link, path, verticality, and scale. Notice that all operations between sets are expressed as spatial relations between the objects in the diagram, therefore satisfying the definition of observation. We thus describe the cognitive process of observation as constructing a network of blends involving some instances of the aforementioned image schemas, and parts of the geometric configuration of the Hasse diagram.

Apart from the path schema, a verticality schema is also involved. Specifically, the base of the verticality schema is put in correspondence with the point that is geometrically lowest. This schema provides the polarity required in order to disambiguate which correspondences of the source and the goal of a path schema are needed in order to go ‘upwards’ or ‘downwards’; that is, to go upward, we put in correspondence the source with the point closer to the base, i.e., lower, and the goal with the point further from the base, i.e., higher. To move downward, we build the reverse correspondence. The link schema also plays a crucial role because what counts as a path, given the desired interpretation of a Hasse diagram, is formed only by those points connected by lines, not e.g., merely neighboring points, as the path schema structure dictates. Therefore, adjacency on the path is determined by lines drawn between node locations. In summary, we can model observations on the Hasse diagram through the involvement of a verticality schema to specify upward and downward orientation, several link schemas blended on pairs of nodes that are connected by some line, and also a path schema blended on the sequence of linked node locations from a source location (node) to a target location, capturing our experiential understanding of advancing, step by step, node by node, along the lines of the Hasse diagram.

Concretely, to observe, for instance, whether \(Q \subseteq P \setminus R\), we need to check if we can reach a target location \(\textsf {node}(P \setminus R)\) starting from a source location \(\textsf {node}(Q)\) by traversing a path of contiguous node locations going upwards. Since Q is already denoted in the diagram, there is no need to locate it by way of our enactive cognition. We would, however, need to identify the target location \(\textsf {node}(P \setminus R)\) in the Hasse diagram. To do so, we would need to check first if we can reach \(\textsf {node}(R)\) on a downward path from \(\textsf {node}(P)\), blending the base of the verticality schema to the lowest node, i.e., \(\textsf {node}(\emptyset )\), and a link schema and a path schema on the edge from \(\textsf {node}(P)\) to \(\textsf {node}(R)\) of the Hasse diagram, so that we can “walk down the path” from \(\textsf {node}(P)\) to \(\textsf {node}(R)\). Since this is possible, we next need to find all downward paths from \(\textsf {node}(P)\) to \(\textsf {node}(\emptyset )\) that do not go through \(\textsf {node}(R)\). This blends a verticality schema, two link schemas and a path schema on the Hasse diagram, in order to traverse the two steps on the path from \(\textsf {node}(P)\) to \(\textsf {node}(\emptyset )\) via the node location that is not labeled with R. The highest location on our path down (excluding \(\textsf {node}(P)\)) is the node we were looking for. Subsequently, we return to our original question, whether \(Q \subseteq P \setminus R\). Now, we have to check whether there is an upward path from \(\textsf {node}(Q)\) to the node we have identified as \(\textsf {node}(P \setminus R)\). Here, the scale schema comes into play. 
The way this particular Hasse diagram is drawn, a user can easily put in correspondence the base of the verticality schema with the geometrically lowest shape of the Hasse diagram, i.e., the node representing \(\emptyset \), and one level of a scale to each group of points that are on the same horizontal plane. This way, the user can observe that \(\textsf {node}(Q)\) and the node we identified as \(\textsf {node}(P \setminus R)\) are on the same level. Our embodied experience with paths, scales and the vertical dimension equips us with the knowledge that if two objects are on the same level of a vertical scale, it is impossible to traverse an upward path from one towards the other. Thus, it is immediately clear to us that there is no upward path from \(\textsf {node}(Q)\) to \(\textsf {node}(P \setminus R)\) and therefore \(Q \subseteq P \setminus R\) does not hold.
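The same-level shortcut used in this example can be made concrete as follows. This is a sketch under assumptions: the diagram is taken to contain one node per union of the four minimal sets, levels are counted as the number of lines walked upward from the bottom node (so the bottom node is at level 0), and all names are hypothetical.

```python
from itertools import combinations

# Hypothetical names for the four minimal sets of the working example.
ZONES = ("r", "p_only", "q", "rest")
NODES = [frozenset(c) for n in range(len(ZONES) + 1)
         for c in combinations(ZONES, n)]
UP = {a: [b for b in NODES if a < b and len(b) == len(a) + 1] for a in NODES}
BOTTOM = frozenset()


def level(target):
    """Number of lines walked on an upward path from the bottom node to target."""
    frontier, steps = {BOTTOM}, 0
    while target not in frontier:
        frontier = {nxt for n in frontier for nxt in UP[n]}
        if not frontier:
            raise ValueError("target is not reachable from the bottom node")
        steps += 1
    return steps


def upward_path(a, b):
    """Explicit search: walk upward from a, one level at a time, looking for b."""
    frontier = {a}
    while frontier:
        if b in frontier:
            return True
        frontier = {nxt for n in frontier for nxt in UP[n]}
    return False


def ruled_out_by_level(a, b):
    """The shortcut: distinct nodes on the same level cannot be joined upward."""
    return a != b and level(a) == level(b)


node_q = frozenset({"q"})                # node(Q)
node_p_minus_r = frozenset({"p_only"})   # the node identified as node(P \ R)

print(level(node_q), level(node_p_minus_r))        # 1 1
print(ruled_out_by_level(node_q, node_p_minus_r))  # True
print(upward_path(node_q, node_p_minus_r))         # False
```

The level comparison and the explicit search agree here, which matches the observation that \(Q \subseteq P \setminus R\) does not hold.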

To summarise: although the fact that \(Q \not \subseteq P \setminus R\) is observable from the Hasse diagram, observing it requires the user to walk many paths with different source and target locations, stepping through several linked node locations, sometimes in an upward and sometimes in a downward direction, and to find the highest node locations traversed. Our description shows that this involves a complex network of blends, with many instances of the path, link, verticality and scale schemas and correspondences with many different shapes.
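As a rough illustration, the path-walking procedure described above can be sketched in Python. The diagram encoded below is a hypothetical reconstruction that is merely consistent with the textual description; the node names, the edge list and the helper functions `paths`, `level`, `locate_difference` and `observe_subset` are all assumptions made for this sketch, not part of the original formalisation. The sketch mirrors only the graph-theoretic steps of the observation and does not model the image-schematic blends themselves.

```python
# Hypothetical Hasse diagram (an assumption for illustration only).
# Each node maps to the nodes directly above it, i.e. its upward edges.
UP = {
    "empty": ["R", "P_minus_R", "Q"],
    "R": ["P"],
    "P_minus_R": ["P"],   # node that carries no label in the drawing
    "Q": [],
    "P": [],
}


def down_edges(up):
    """Reverse the upward edges so that paths can be walked downwards."""
    down = {node: [] for node in up}
    for lower, uppers in up.items():
        for upper in uppers:
            down[upper].append(lower)
    return down


def paths(graph, source, target, avoid=frozenset()):
    """Yield every path from source to target, never stepping on avoided nodes."""
    if source in avoid:
        return
    if source == target:
        yield [source]
        return
    for nxt in graph[source]:
        for rest in paths(graph, nxt, target, avoid):
            yield [source] + rest


def level(up, node, bottom="empty"):
    """Level on the vertical scale: longest upward path from the bottom node."""
    return max((len(p) - 1 for p in paths(up, bottom, node)), default=0)


def locate_difference(up, a, b, bottom="empty"):
    """Locate node(a minus b) by walking down from node(a) while avoiding node(b)."""
    down = down_edges(up)
    # Step 1: node(b) must be reachable on a downward path from node(a).
    if not any(paths(down, a, b)):
        raise ValueError("node(b) is not below node(a)")
    # Step 2: walk every downward path to the bottom that avoids node(b) and
    # keep the highest node visited after leaving node(a).
    return {p[1] for p in paths(down, a, bottom, avoid={b}) if len(p) > 1}


def observe_subset(up, s, t):
    """S is a subset of T exactly when an upward path leads from node(S) to node(T)."""
    if s != t and level(up, s) == level(up, t):
        return False  # same level of the scale: no upward path can exist
    return any(paths(up, s, t))


(target,) = locate_difference(UP, "P", "R")
print(target, level(UP, "Q"), level(UP, target), observe_subset(UP, "Q", target))
```

Even in this small example, one observation needs a downward reachability check, an enumeration of downward paths, a level comparison and an upward reachability check, which loosely parallels the number of blends the description above requires.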

4.3 Enactive Observation in Euler Diagrams

To observe whether a certain set-theoretic claim \(S \subseteq T\) or \(S = T\) holds in a given Euler diagram, such as the one in Fig. 1(a), we must first identify the regions of the Euler diagram representing the set expressions S and T, and then check whether the first region is inside the second (for set inclusion), or whether they are the same region (for set equality). Let us denote this identification task with a function \(\textsf {region}\) that assigns to each set-theoretic expression S over a set of labels \(\mathscr {L}\) a region \(\textsf {region}(S)\) in the Euler diagram:

  • if \(S \in \mathscr {L}\), then \(\textsf {region}(S)\) is the region inside the closed curve labeled with S

  • if \(S = S_1 \cup S_2\), then \(\textsf {region}(S)\) is the region made up of the combination of the insides of \(\textsf {region}(S_1)\) and \(\textsf {region}(S_2)\)

  • if \(S = S_1 \cap S_2\), then \(\textsf {region}(S)\) is the region that is both inside \(\textsf {region}(S_1)\) and inside \(\textsf {region}(S_2)\)

  • if \(S = S_1 \setminus S_2\), then \(\textsf {region}(S)\) is the part of \(\textsf {region}(S_1)\) outside of \(\textsf {region}(S_2)\)

  • if \(S = \overline{S_1}\), then \(\textsf {region}(S)\) is the region outside \(\textsf {region}(S_1)\)

As is evident from the above description, any set-theoretic claim in a given Euler diagram is enactively observed by way of the container image schema. We model this cognitive process as a network of conceptual blends involving some instances of the container schema and parts of the geometric configuration of the Euler diagram.
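The recursive definition of \(\textsf {region}\) given above can be sketched as a small Python function. This is only a minimal sketch under stated assumptions: regions are encoded as sets of zones, each zone being the set of curve labels that enclose it; the zone list is a hypothetical encoding of an Euler diagram in which curve P encloses curve R and curve Q lies outside P; and the tuple syntax for expressions and the names `ZONES`, `region`, `observe_inclusion` and `observe_equality` are invented for illustration. The sketch captures the set-theoretic bookkeeping only, not the container blends through which a user makes sense of the curves.

```python
# Hypothetical zone encoding of an Euler diagram (an assumption for illustration).
# Each zone is the set of labels of the closed curves that enclose it.
ZONES = {
    frozenset(),             # outside every curve
    frozenset({"P"}),        # inside P, outside R
    frozenset({"P", "R"}),   # inside R, and therefore inside P
    frozenset({"Q"}),        # inside Q
}


def region(expr, zones=ZONES):
    """Return the set of zones making up region(expr).

    expr is either a label (str) or a tuple such as ("diff", "P", "R").
    """
    if isinstance(expr, str):
        # Label: the region inside the closed curve carrying that label.
        return {z for z in zones if expr in z}
    op, *args = expr
    if op == "union":
        return region(args[0], zones) | region(args[1], zones)
    if op == "inter":
        return region(args[0], zones) & region(args[1], zones)
    if op == "diff":
        return region(args[0], zones) - region(args[1], zones)
    if op == "compl":
        return set(zones) - region(args[0], zones)
    raise ValueError(f"unknown operator: {op}")


def observe_inclusion(s, t, zones=ZONES):
    """S is included in T when region(S) lies inside region(T)."""
    return region(s, zones) <= region(t, zones)


def observe_equality(s, t, zones=ZONES):
    """S equals T when both expressions pick out the same region."""
    return region(s, zones) == region(t, zones)


print(observe_inclusion("Q", ("diff", "P", "R")))
print(observe_inclusion("R", "P"))
print(region(("inter", "R", "Q")))
```

With this encoding, each clause of the definition maps onto a single set operation, in contrast to the repeated path traversals needed for the Hasse diagram.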

For instance, to observe \(Q \subseteq P \setminus R\), we need to check if \(\textsf {region}(Q)\) is contained in \(\textsf {region}(P \setminus R)\). This points to two instances of the container schema blended upon the geometric configuration of the Euler diagram, capturing our sense-making of the inside, boundary, and outside of \(\textsf {region}(P \setminus R)\), and of \(\textsf {region}(Q)\), together with the containment relationship between the two container schemas. Concretely, the integration network involved is as follows. First, to identify \(P \setminus R\), we put the boundary of one container schema in correspondence with the curves labeled P and R, its inside with the area between curves P and R, and its outside with the area outside curve P together with the area inside curve R. With this blend, we model the way we observe \(\textsf {region}(P \setminus R)\) in the diagram as a container. Subsequently, to check if \(Q \subseteq P \setminus R\), we construct another blend between a second container schema and the same geometrical configuration. This time the boundary, inside and outside of the container correspond to the curve labeled Q, its interior, and its exterior. Checking whether \(Q \subseteq P \setminus R\) then amounts to observing where the boundary of the second container (blended with curve Q) lies relative to the first container (blended with \(\textsf {region}(P \setminus R)\)): here, it lies on the outside. This observation again draws on our experience with containers, leading to the realisation that if Q is on the outside of \(P \setminus R\) then it cannot be on its inside, and thus \(Q \subseteq P \setminus R\) does not hold.

Regarding the complexity of the integration networks required to model the observation of \(Q \subseteq P \setminus R\) from the Euler versus the Hasse diagram, we note that the network for the Euler diagram contains fewer distinct image schemas, fewer instances of image schemas, far fewer geometric elements of the diagram itself, and fewer correspondences. Concerning the blended space, blending the boundaries of container schemas with the closed curves in a diagram imbues the latter with a sense of enclosure and separation. This sense emerges in the conceptual blends, where geometrical and image-schematic elements are integrated with each other into elements that are simultaneously geometric and image-schematic. As we have seen, what constitutes the interior, boundary and exterior of a configuration of closed curves representing a set-theoretic expression, such as \(P \setminus R\), arises from the way a container schema is blended with said configuration, not from the geometry itself.

5 Discussion

The predominant logical approaches to diagrammatic reasoning and effectiveness usually view the diagram as a mapping between an abstract geometry and an abstract semantics. These approaches seem to overlook the enactive cognitive processes on the user’s part, despite the fact that the term effectiveness can only be conceptualised and tested with respect to a user. We believe the user’s embodied experiences—whose invariants are crystallized in the form of image schemas—can help bridge that gap. Using them, we can propose a conceptual model of the sense-making of a diagram as the integration of image schemas with the geometry of a diagram. This way, we can provide a more cognitively-plausible approach to diagrammatic reasoning whereby the users act cognitively upon the geometry of the diagram.

According to our framework, the effectiveness of Euler diagrams for representing set inclusion and disjointness (demonstrated in behavioral experiments [5, 24]) can be explained as follows: The geometry of an Euler diagram can be put in correspondence with instances of the container schema. Through the process of constructing these correspondences, and thus integration networks, facts like \(R \cap Q = \emptyset \) in Fig. 1(a) become immediately apparent. This integration network models how a user cognitively structures set P as a container that surrounds and envelops curve R, thus preventing it from exiting and coming into contact with set Q, in agreement with [21]. Furthermore, it has been proposed that classes and Boolean logic are conceptualised via the container schema [17]. Elements are understood as being in or out of a class, and Boolean logic has intersections and unions, which also emerge from a blend with the container schema. Moreover, the container schema corresponds very naturally to Euler diagrams, therefore making them apt to visualise such semantics [17, pp. 45, 122] (see Footnote 6).

In contrast, when reasoning with the Hasse diagram, we think about paths, links, the vertical orientation, and levels of scales. Some indication that image schemas are implicitly used to cognitively structure diagrams is provided by the informal language researchers use when describing how Hasse diagrams should be used for reasoning [4, 8, 9, 21, 22]. Researchers talk about Hasse diagrams, and the posets they represent, as having top/bottom elements and arrows pointing upward (verticality). They mention implications or entailments going upwards, line segments running upwards, and diagrams having upward paths (path, verticality). Reasoning is done by following upward/downward edges and upward/downward sequences of lines (path, verticality, link). Each line is said to connect an ordered pair of objects, and edges are said to connect adjacent nodes/elements and to form sequences (path, link). Moreover, nodes and edges can be traversed, lines can be followed or traced, and arrows can form sequences with consecutive points (path). Finally, posets have levels and a largest and smallest element (scale).

Additional support comes from behavioral experiments showing that being upright, as opposed to slanted, explicitly showing levels (i.e., having points placed on horizontal parallels), and having non-crossed lines make Hasse diagrams faster to interpret [14, 23]. These findings are consistent with our claims that observation in Hasse diagrams can be modelled as blends of verticality, scale, link and path. Arguably, being upright, showing levels, and having non-crossed lines make it easier to put the structures of verticality, scale, and link with path, respectively, in correspondence with the geometry of a Hasse diagram. Regarding the non-crossed lines, perhaps crossings result in some ambiguity because there are two possible ways to link pairwise the four points involved in the crossing when making sense of them as being adjacent in a path. Theoretical work on diagrammatic reasoning also asserts that Hasse diagrams prioritise visualising the structure of the order they represent, through a vertical organisation and an explicit visualisation of levels [8]. Levels corresponding to elements with the same rank are geometrically orthogonal to the vertical axis. In fact, this axis is the one intended to be interpreted, and elements of the same rank are indeed not comparable semantically with respect to the ordering. This account seems consistent with our description of how verticality and scale may structure the geometrical configuration of a Hasse diagram.

Arguably, there is no definitive way to prove that a user reasons with Hasse and Euler diagrams using the image schemas we have claimed. Therefore, our approach is to show that these integration networks model all the possible observations that Hasse and Euler diagrams allow. In previous work we have followed the same approach to model the inferences we can draw from various diagrams [3]. In the present paper, we specifically discuss facts that emerge as observations, not simply inferences. Moreover, we use our framework to study a case where the observational advantage is equivalent between two diagrammatic representations, but arguably one is much more effective than the other for showing certain information. The reason for this discrepancy could be the mathematical abstraction of the theory of observational advantage. In contrast, our framework accounts for the user as an embodied actor by modeling observation as a conceptual integration network of various image schemas with the diagram geometry. We propose that one diagram may be more cognitively effective than another because the observations it affords can be modeled with a simpler conceptual integration network. Complexity manifests in several ways: the integration network for the Hasse diagram involves far more distinct image schemas and image schema instances, many more geometric elements of the diagram itself, and overall many more mental spaces and correspondences than the network for the Euler diagram. Since mental spaces and their correspondences are proposed to be realised and manipulated in the mind in some way, we conjecture that a higher complexity of the integration network modeling the sense-making of a diagram would correlate with a higher utilisation of the cognitive resources of the user reasoning with that diagram, and thus with lower effectiveness of the diagram [10].

An additional contribution of our work is defining in more detail what Stapleton et al. call ‘meaning-carrying relationships’ [27]. The definition of observation that Stapleton et al. use includes this term, forcing them to address concrete geometric and cognitive properties of the diagram; a meaning-carrying relationship is defined as a visuo-spatial relationship between syntactic elements of a visual representation that expresses a certain meaning. The prefix visuo- implies an agent with a certain body and perceptual faculties. Cheng et al. [6] also take as a given which relations between symbols of a given representation are meaningful and should be used for inference. One of our contributions here is to show that what counts as a meaning-carrying relationship, or a valid inference, can be explained in terms of blends with image schemas. At the level of discrete shapes like closed curves, lines etc., a wide range of spatial relationships hold; shapes can be related by having the same or different size, color and shape, by showing symmetry with respect to certain axes, and by their relative position. Someone who has been trained to read Euler diagrams knows that only topological relations are meaning-carrying. In contrast, in Hasse diagrams, relative position and the topological intersection of lines with points are meaning-carrying, but the topological intersection between lines is not. Focusing on the right meaning-carrying relationships and utilising them correctly for reasoning can be challenging for novices. Thus, we believe our approach can have future applications in guiding novices on how to use diagrams. Moreover, our theoretical contributions include showing how meaning-carrying relationships can become salient through blending apt image schemas with the geometry of a diagram, making explicit their experiential origins, and finally, providing new avenues for evaluating the cognitive effectiveness of diagrams.

6 Conclusions and Future Work

In this paper we explore the notions of observational advantage and meaning-carrying relation of Stapleton et al. [27] in a more cognitively-inspired way. In their work, as in most diagrammatic reasoning work, the specific meaning-carrying relations involved are taken as a given and treated abstractly. In contrast, we believe our framework explores how they can emerge through the interplay of image schemas, which crystallize our early embodied experiences, with the diagram geometry. Our model simply accounts for the differences of the image schemas at play, keeping all else equal. We do not model all processes and factors that could affect the cognitive cost, e.g., the user’s experience with the diagrammatic formalism, domain knowledge and cognitive strategies. We study two examples of diagrammatic notations, Hasse and Euler diagrams, with equivalent observational advantage over sentential set-theoretical notation, where an Euler diagram is arguably more cognitively effective for many set-theoretic claims than a Hasse diagram. We show that their difference, according to our framework, lies in the complexity of the integration network modeling how observations on these diagrams become possible. In this paper we discuss the integration networks reflecting only one example of observation. However, we describe how various types of observation about sets can be made with both Euler and Hasse diagrams, and it seems likely that the integration networks modeling most of them would be much simpler in the case of the Euler diagram. Nonetheless, depending on how the observational advantage and the meaning-carrying relations are defined, it might be the case that certain sentences regarding the empty set are not observable from the Euler diagram but only from the Hasse diagram [22, p. 10].

In previous work, we have used first-order logic to formalise and implement the integration networks reflecting reasoning with several diagrammatic formalisms [1,2,3]. Image schemas provide pointers to the meaning-carrying spatial relations of diagrams, and a cognitive explanation of how an embodied agent uses those relations to reason about the semantics the diagrams represent. Our framework could be used to guide students on which spatial relations of a diagram they must draw meaning from, by making explicit a blend with some image schema. Moreover, by analysing the integration networks modeling observations with particular diagrams, we could compare the cognitive effectiveness of those diagrams. The above could be developed into computational systems, as we have already shown that such conceptual blends can be implemented [1,2,3].