1 Introduction

During the summer of 2022 I took part in a summer school on the philosophy of mind in Pomaia, Italy. One of the voluntary workshops was led by Father Francis Tiso, who took us for extraordinary early-morning walks in the surrounding countryside, where the rolling hills and valleys are covered with cultivated orchards and wild bushes. During one of these walks, just before daybreak, we descended a path overlooking a valley. Father Tiso stopped us at a vista at the foot of a hill, beyond which the sun was just about to reveal itself and cast its light over the valley. He paused and explained that when the first rays of sunlight transcended the hill and hit the canopy of the trees and bushes down in the valley, birds would jointly start to sing and a breeze would rattle the leaves of the trees. And, indeed, when the first rays hit the canopy, this is exactly what occurred. A wind appeared from nowhere and the flickering leaves rattled and reflected and refracted the rays of the morning sun, creating an effect of the whole canopy flickering with light and rattling with sound, while the birds joined in, in beautiful harmony.

It was a magnificent moment of what I call shared attention. Without the guidance of Father Tiso, I would not have detected this symbiotic event of nature; I would not have known which aspects and events to pay attention to in order to grasp the connectedness between the light, the wind, and the flora and fauna. He did not say much, just describing a few aspects of how the rays of the sun will trigger a natural reaction in the environment. He did not add any theological tonality to his statements. Still, the event was revelatory for me. I come from a completely atheistic environment, and I have been socialized to regard theological explanations with skepticism. But what we witnessed together that morning had an existential and even spiritual character; it gave me a sense of what grace might mean—that through the benevolent workings of natural light, the whole fabric of our environment has a shared foundation. Within our small workshop group, consisting of people from quite different backgrounds, we shared this sense of communion. For a brief moment we were all beheld by the world in the same way, through the guidance of Father Tiso’s simple gestures and words. Independent of the metaphysical or spiritual explanations we all separately would give for this event, from the perspectives of our different frameworks and backgrounds, I felt that the event in itself provided us with a miracle—a miracle that occurs every morning for a few minutes, there to be sensed if we have the eyes to see it and the ears to hear it.

In this context, the concept of “grace” alludes to the external agency that provides me with new possibilities for my actions of perception. Father Tiso enabled me to acknowledge that which was there to be seen. His orientation was a contributing factor to my revelation. However, what was there to be seen was independent of our subjective intentions. The condition for the shared attention was external to our respective intentionalities. Without a common world with its external agency in creating conditions for salience, our intentionalities would not have found an orientation from which to grasp the phenomenon. In this sense, grace signifies the propensity of the natural world to provide our different subjectivities with a common structure that grants us the potential for salience and discovery. The revelatory aspect of the experience hinged on the realization that my attention was awakened by something external to self, an agency that had its source in something other than my intentions and motivations (see Freeman 2015, 172).

This experience reverberates with my earlier work and the many things I have tried to describe within the modality of attention. It is a philosophically puzzling concept. In late modern philosophy, a discourse on attention is brought in at the beginnings of psychology as an independent scientific subject (see Fredriksson 2022, 11ff). Why is attention puzzling? William James posed the question: “Millions of items of the outward order are present to my senses which never properly enter into my experience. Why? Because they have no interest for me” (2017, 402). Five decades later, Maurice Merleau-Ponty starts his treatment of attention with a similar question: “How could one real object among all objects be able to arouse an act of attention, given that consciousness already possesses them all?” (2012, 30). The question of how objects and aspects become salient for the perceiver is the foundation of the philosophy of attention. If we think about the vastness of our visual field, the infinite amount of information that our perception contains, why do we cognize and sense certain aspects and elements rather than others?

In the anecdote above, we already have some answers to this question. Attention is driven not only by my internal intentions, thoughts, and reflections, or “interests”, as James would have it. There is a certain role played by that which is external, that creates salience for my perception. The natural light of the sun—its rays reflected and refracted by the structures and qualities of the environment, with its play of light and shadow—will to some extent create paths for my perception. This means that attention is partly determined by the external environment. Furthermore, the example shows how attention requires the perception of other subjects. Without the guidance of Father Tiso, I would not have been present for the event, even if I had been standing on the same spot at the same time. I might have been caught up by my inner reflections on the talks at the summer school or concerns about my orientation in the foreign environment. With his guidance I became aware and was able to attend to exactly these ephemeral occurrences. And this experience was shared among the group. For a brief moment, we were taken by the same aspects of our perception: we beheld the same phenomena. In this sense, attention is not solely determined by my intentions, desires, and will. It also requires the guidance of others. Attention is not purely governed by the volitional control of the self; it requires an attitude in which we are expectant of what the world presents to us—including the perspectives of other subjects—and this cannot be determined by us beforehand. Whereas my intentions and will usually come with a specific expectation that I project on the world and others, attention builds on a responsiveness and expectation that is open-ended. Tim Ingold describes it as the propensity that “allows every present moment to be a new beginning” (2018, 21). 
In acknowledging that there is a particularity and an open-endedness in each moment of perception, we also accept that our attentiveness carries with it a revelatory potential, which cannot be put into play willingly or intentionally.

We did not create the revelatory event in Pomaia as an intentional joint focusing of our perceptions. Rather, we discovered something together, through being receptive, vigilant, and attentive. What we saw, heard, and sensed in a general manner was provided by the play of natural light in symbiosis with the wind, the leaves, the birds, etc. Merleau-Ponty describes this as communion, referring to the theological meaning of the word. The sharedness is given to us by grace: “[N]othing other than a certain manner of being in the world that is proposed to us from a point in space, that our body takes up and adopts if it is capable, and sensation is, literally, a communion” (2012, 219). The perceptual external world is a prerequisite for the common, a common in which subjects cease to be in a first- and third-person relation with each other: a we-perspective is formed (Merleau-Ponty 1964, 175).

I claim that these two aspects are crucial when we want to articulate a philosophical account of attention as a shared practice:

  1. The way in which the external objective world guides and molds our attention.

  2. The way in which other subjects play a part in co-constituting our ways of attending to the world.

Attention—the way in which certain objects and aspects become salient for us while others stay obscured—is a complex affair. In my example, my attention to the symbiotic event, triggered by the first rays of sunlight hitting the tree canopy, required many things: the guidance of Father Tiso; the reactions and the presence of the others in the group; the rays of the sun; the breeze; the interaction of the leaves with the breeze which made a rattling sound, and refracted the rays of light, creating a flickering in the whole environment; the joint harmony of the birds; and so on. If I were to be asked why I, for a moment, was able to focus intensely on the play of light and wind in the leaves, whereas I usually would not be attentive enough to discern these intricate details, all the aspects that I recount above would form an important part of the answer.

In my book on the phenomenology of attention, I try to make similar points by using a somewhat different example in which the guide for my attention is a dog rather than a human being. I describe an experience of how my dog was able to enrich my sense of the perceptual world:

“When I go for a walk with my dog and suddenly she reacts to a squirrel high up in a tree that is on our path, my attention is turned through her engagement. Earlier, my mind was occupied by reflections on unanswered emails, when suddenly my perception is brought into the present. Without my dog, I would have no chance of detecting the squirrel. This makes me realize that my sense of this world is not solely constructed by my own devices. Through the attention of the other, I may discover aspects of our common world that are partially hidden for me”. (Fredriksson 2022, 111)

This experience had a revelatory meaning similar to the one I described earlier with Father Tiso. When I got a dog for the first time, I started to perceive the environment in my neighborhood with new eyes. The park next to my house, which I had walked in hundreds of times, was suddenly shown to me in a new light once my walks were accompanied by my dog. She reacted to all the wildlife there—the hares, the doves, the foxes, and the squirrels—as her perceptive apparatus was more prone to detect them. Although she was not able to speak, her embodied disposition and her actions guided my gaze: she started pulling the leash, ran toward the tree trunk, looked up toward the branch with the squirrel sitting on it, and started barking. I was able to see the animals and attend to them, since my dog could sniff them out while they were hiding in the bushes or the tree branches.

The aspect that is relevant here for my philosophical account of shared attention is that my dog is evidently a different creature from a human being. Her cognitive capacities and sensory modalities are clearly different from mine: she walks on four legs, whereas I walk upright; her sense of smell is superior to mine, whereas my sense of vision is superior to hers. In the example, it is exactly through these differences that she is able to enrich my understanding of the perceptual world (footnote 1). Our differences are, in this case, not a hindrance for her guiding my attention but actually the circumstance that makes it possible for me to see things that I would not see by my own means or even with the guidance of another human being.

Both of the above examples reveal something about how we conceive of the perceptual world in a shared way and how we acknowledge that we live in a shared world. I regard these factors as existentially important aspects of the ways in which our attention is constituted. When James and Merleau-Ponty posed the puzzling question about how certain features within our perceptual field become salient for us whereas others stay obscured, the role of the processes in the natural world and the guidance of other (even non-human) subjects are part of the explanation for how our perception is able to find its focus and orientation.

Merleau-Ponty acknowledges this ambiguity in how the focusing of our perception is co-constituted: “We perceive according to light, just as in verbal communication we think according to others” (2012, 323). According to this view, our ability to focus our perception is dependent from the start on the environment structuring our orientation by the means of light and shadow, and by the gaze of others. It is partly these kinds of realizations that have led to the development of theories of joint attention. Shaun Gallagher describes joint attention as a capacity that “has tremendous importance for social interaction and for our ability to generate meaning through such interaction” (2020, 108). Through the coordination of our perceptions, we learn from others, and our world is enriched beyond what we could conceive of solely by ourselves. In developmental psychology, joint attention is studied as a pivotal phase in child development that plays a crucial role in socializing infants to a common adult world (Eilan 2005, 1). What can be seen as crucial for the concept is that it articulates the way we as subjects are co-constituted in relation to others. In the phenomenological framework this we-intentionality is primarily the context for the theory of joint attention (Brinck et al. 2017, 134). However, this line of thought is not clearly compatible with the mainstream theories of joint attention, and I will now show why this is the case.

2 Shared Attention vs. Joint Attention

My approach to the theoretical framework of joint attention stems from examples such as those I have described above. I believe it is philosophically significant to acknowledge how our perceptions are guided by others and how these practices of shared recognition and attention play a part in molding and scaffolding our understanding and knowledge of the world. We need the eyes of others to find our orientation. The act of focusing our perceptions in the same way, on the same objects and aspects, and the understanding of other beings as able to do the same, is constitutive for how our consciousness comes about. However, there is no clear consensus in the current state of research on how joint attention should be defined (see, for example, Eilan et al. 2005; Seemann 2019; Urban 2014). My examples and claims are not necessarily compatible with the mainstream of theories of joint attention, whereas there is a clear affinity with alternative theories stemming from the phenomenological tradition.

One dividing question in the literature on joint attention is whether this capacity requires higher-order reflective cognitive capacities and a shared language, that is, whether it is a solely human modality of consciousness or whether the phenomenon is also shared by other non-human animals (see Urban 2014). The first view would clearly disqualify the example of my dog as joint attention.

Another common theoretical underpinning is that joint attention is often studied as a “mental state” occurring in a singular human mind. It is considered to be dependent on the capacity of the human mind to simulate or theorize about the intentions and representation of the other subject. Based on this theorizing or simulation, a sense of joint perception is achieved (see Eilan 2005, 10; Seemann 2011, 4f). These theories commonly presuppose that joint attention involves forms of mindreading: interpretations and predictions about the representations and intentions of the other.

In my first example, with the rays of sun creating the reaction in the environment, shared attention does not build particularly on any reading or theorizing of the others’ minds but rather on interaction within the group and interaction with the environment. Surely Father Tiso’s gestures and sparse words were partly formative for my attention, but this did not come about through my somehow reading into his mind; rather, my focus was supported by the interaction between Tiso’s gestures and the processes in nature. Because of this, both of my examples would be ill-suited to the mainstream theories of joint attention.

An alternative account has been developed by psychologists and philosophers of mind who are rooted in the phenomenological tradition. Child psychologist Vasudevi Reddy aims to show how attention, and the way we share our practices of attending, cannot solely be understood “as a ‘purely’ mental state that is both discrete and unavailable in action and interaction” (Reddy 2005, 104). A related account of joint attention can be found in the work of Shaun Gallagher, who writes, “the kind of coordination needed for joint attention is the kind of movement found in embodied interaction rather than a psychological coordination of mental states” (2020, 108). These critical accounts are much more in line with the existential character of experiences of what I call shared attention, since they do not build on the reductionist views in which joint attention is considered as dependent on the higher-level cognitive reflective capacities, nor do they emphasize mindreading as a precursor for joint attention (footnote 2).

In order to understand what we are talking about when we talk about joint attention, I will now turn to a discussion on how the mainstream view is constituted. I aim to show in what way much of the theory of joint attention is quite narrowly constructed and comes with tacit disciplinary biases that exclude much of what is existentially important. I want to show that in our interactions, we do share our perceptions even with beings that are distinctly different from us. And that this difference in our personal worldviews, our cognitive capacities, and our sensory modalities should be seen as a constitutive aspect of our practices of shared attention.

As the reader might notice, I have intentionally used the term “shared attention” in my examples rather than the more technical term “joint attention,” exactly because the more technical term comes with a baggage of metaphysical assumptions and reductionistic ideas. Joint attention is used partly as a diagnostic tool to determine certain stages in human development (Eilan et al. 2005). It is used as a signifier for a certain form of joint intentionality that is claimed to distinguish the human animal from other forms of biological life (Call and Tomasello 2005; Whiten 2013). The term is entrenched in a certain cognitivist framework in which consciousness is viewed as foundationally representational. All these theoretical underpinnings have their understandable aims, and they can be useful tools in certain specific empirical and theoretical research. However, they also limit our understanding of the phenomenon of shared attention.

3 Theory of Joint Attention as a Mental State

The ongoing debate within philosophy of mind and psychological research on how a joint form of attention should be understood often circles around questions about whether joint attention is achieved through “rich,” higher-level, cognitive and reflective faculties of the human mind or whether joint attention simply signifies “lean” behavioral gaze following (Seemann 2019, 161; see footnote 3). As Timothy P. Racine has shown, the whole distinction between rich and lean conceptions is problematic from the start (Racine 2011, 22). Although these theoretical underpinnings have served some explanatory purpose in describing aspects of joint attention, the debate over which aspect (lean or rich) should be considered to be primary has stood in the way of developing an account in which joint attention can be understood as a complex, context-dependent, and dynamic phenomenon (Racine 2011, 38). Much of my view follows this line of argumentation in trying to unpack the limiting effects of reductionism.

In order to understand how the theoretical framework of joint attention is constructed, we have to scrutinize its key building blocks. Many theoretical frameworks (exceptions: Gallagher 2020; Hutto 2011; Racine 2011; Reddy 2005) presuppose that we are able to attend jointly because we have developed the propensity to theorize about other minds. An important question here is, who does this “we” refer to? I will return to this question later. For now, I need to articulate what is meant by “theory of mind.” The theory-of-mind theory (TT) claims that joint attention (among other higher cognitive functions) requires reflective mental capacity: I am able to infer what you are attending to and your intentionality—the motivations guiding your perception—through mental processing based on knowledge of how my own attention and intentionality works. Victoria Southgate describes TT as follows: “A theory of mind comprises not only the formation of a representation of someone’s thought or perspective, but the process of using that representation to generate predictions about how those thoughts will influence behavior” (Southgate 2013, 15). In this sense, I can theorize about your intentions and perceptual content—hypostasize what you are seeing and how you are experiencing it—because of our common human reflective capacity. According to TT, we are able to create knowledge of the other’s mental states based on our reflective knowledge of our own mental states. Axel Seemann writes: “[S]ubjects of propositional common knowledge must have in place a reflective understanding of their own and their cooperators’ mental states” (Seemann 2019, 179).

The emphasis in TT is on the ability to create both representations of the other’s mental state and an understanding of the intentions of the other, based on these representations. Although Josep Call and Michael Tomasello claim not to advocate TT (2005, 59), their account of joint attention builds on a very similar premise, namely, that joint attention requires inferential knowledge of the intentions of the other (Call and Tomasello 2005, 60). According to the representationalist/intentionalist framework, joint attention is thus something that occurs in the mind as a “coordination of mental states” (Gallagher 2011, 295), through constructing a theoretical understanding of the attention of the other mind.

The question that follows is, what kind of beings are capable of this level of mental action? For example, Seemann admits that joint attention might be achieved by “some non-human primates” (2019, 159), whereas Call and Tomasello claim that chimpanzees (as the primate closest to the human species) are not capable of joint attention, since they lack the capacity to theorize about other minds: “Our hypothesis is simply that they have the cognitive skills to recall, represent, categorize, and reason about the behaviour and perception of others, but not about their intentional or mental states—because they do not know that others have such states, since they cannot make a link with their own” (Call and Tomasello 2005, 61). Within the representationalist/intentionalist framework, the pivotal issue for whether joint attention is achievable hinges on the capacity of reflective cognition within an “individual mind” (Seemann 2019, 62). According to Call and Tomasello, chimpanzees have some level of reflective cognition (since they can recall the actions and perceptions of others) but not a fully developed ability to read other minds.

One remarkable aspect of these discussions is that TT and joint attention are quite commonly understood as achievements for beings who have reached a certain level of development in two senses:

  1. Development in children: Subjects of a certain age achieve the propensity for, first, joint attention and, later on, TT.

  2. Development of the human species: At some point in history the human species developed into a life form with propensity for joint attention and later on TT.

The theory theorists create a kind of script for the developmental arc of the human species and the developmental arc of infants. In both cases, sufficient development is understood to be achieved based on theoretical assumptions about a certain linear cognitive development within subjects belonging to a specific category of primates. The tacit assumption here is that, at some point, the human species achieved the ability to theorize about other minds, and at some point an individual infant attains sufficient development for the same task. For infants, predictions about this development vary. For example, Josef Perner (and Jean Piaget) claim the threshold to be around the age of 18 months, whereas Tomasello defines it as 9–12 months (see Reddy and Morris 2004, 653).

However, as Reddy and Morris acknowledge, there is a tacit conceptual problem in TT that is independent of the question of correctly defining the threshold. The question is, what do theory theorists mean by communication? Reddy and Morris note that a common cognitivist articulation of the threshold is dependent on distinguishing between mere behavior and the capability to read other minds. Whereas small toddlers (under 9 months) and chimpanzees are assumed to be able to simulate and react to the expressions of others, this is explained from the TT side as merely behavior (footnote 4), but not yet as proper understanding of the other’s intentions (i.e., mental life). Reddy and Morris write:

“The sequence has also been adopted in relation to evolution, with monkeys being described as good ethologists (reading behaviour but not minds) but poor psychologists (reading minds!) (Cheney & Seyfarth, 1990). In this way, mentality is conceptualized as a gradually emerging intervening variable in the understanding of behaviour” (2004, 656).

When the theory theorists define true communication, their claim is that it requires that “both participants have equal access to interpretive procedures that entail sophisticated theories of mind” (Shatz and O’Reilly, quoted in Reddy and Morris 2004, 652). To exemplify this “mindreading,” Michael Tomasello claims that to understand what is meant when another person points at something with her finger, it is required not only that I see the finger and the object it points to but also that I understand the intention of the pointing. According to Tomasello, this reading of the other’s intentions requires conceptual communication and a common form of life (2008, 4f), which entails that the commonness is achieved as a distinctly human trait, dependent on our “extraordinary cooperative abilities” and “linguistic capabilities” (Watzl 2017, 16).

Reddy and Morris detect a conceptual problem in these kinds of theories, since communication is then understood as “the activity of one individual subject towards another rather than something that emerges between them” (2004, 653). If we grant some merit to the “mindreading” hypothesis, it does refer to something we do. We do theorize about other people’s intentions and about the contents of their minds. We may inquire and ask questions about how the other person experiences a certain object and about her intentions regarding this object, since we share a language. However, can we call this true communication and interaction? In the above-mentioned theories of mindreading, all the interpretation and theorizing take place in one singular mind as predictions based on cognitive reflections on what potentially goes on in the other’s mind. Reddy and Morris point out that in this view the communicator is seen as isolated from the receiver, and the act of communication is separated from the content of communication, which leads to a form of solipsism (2004, 653).

To highlight the philosophical problem here, let me bring in the famous private language argument of Ludwig Wittgenstein: “Suppose that everyone had a box with something in it which we call a ‘beetle’. No one can ever look into anyone else’s box, and everyone says he knows what a beetle is only by looking at his beetle. Here it would be quite possible for everyone to have something different in his box” (2009, § 293). Wittgenstein criticizes the idea of subjective consciousness as consisting of “internal” representations. He shows that if our concepts were based on internal representations, we would never be able to conclude that we are talking about the same thing. Our interactions and communications would only involve theorizing about other minds and we would never actually engage, act, or interact with the other.

The problem hinges on the concepts “experiential facts,” “mental representations,” “perceptual contents,” and “states of mind.” In the above-mentioned intentionalist/representationalist theories, the notion of experience becomes fixed and compartmentalized. Verbs become nouns: the action of intending, perceiving, and experiencing is transposed into “intentions,” “perceptions,” “experiences,” that is, representations. This grammatical shift reveals the difference in the theoretical frameworks (see Reddy 2011, 137). The distinction here runs between the mainstream intentionalist/representationalist accounts (Call, Tomasello, Whiten) and the enactive and embodied accounts (Reddy, Gallagher). A second critical tradition, with similar aims to the enactivists, can be traced to Wittgensteinian philosophers of mind (Hutto, Racine).

For these reasons, there is a weakness in the intentionalist/representationalist theories of mind. They can hardly explain what we quite commonly do when we interact and communicate with each other (see Gallagher 2020, 110).

4 Communication Based on Difference and Novelty

When we look closely at what exactly is assumed in the theory-of-mind theory, there is something that goes against the grain of what I call shared attention. The pivotal point for my account is that in my two examples, my attention was awakened by the actions and engagement first of Father Tiso and then of my dog. They were able to reveal something new, different, and unexpected in our common environment. If my background and experience had been similar to that of Father Tiso, I would have easily acknowledged the interconnectedness of the sun, the wind, and the natural environment. In my case, however, the differences in our life-worlds were the factor that triggered my attention. He showed me something unexpected, something that I could not conceive of by myself, and as a result the experience was revelatory for me. The same aspect is present in the example with my dog. Because of her different cognitive capabilities, the differences in our sensory apparatus, and the differences in our ways of engaging with the environment, she was able to guide me toward that which I could not have seen by my own means.

In both cases, no theorizing about the representations or hidden intentions of the other was required. My attention was focused on the interaction of my dog and the environment, and on the interaction of Father Tiso, the other group members, and the environment. It was the engagement, actions, and behavior of the other that awakened my attention. I was able to stretch my perception beyond self-referentiality and see things and aspects I would not see by myself—aspects that are not created by my thinking. Or, to put it another way, the intentions of the other were present for me in my direct perception rather than in my reflection and thinking. Gallagher describes how joint attention comes about through “perception-based understanding of another person’s intentions because their intentions are explicitly instantiated in their embodied actions” (2020, 106). In this view, the intentions of the other are seen as integrated in the embodied actions—intentions are not isolated inside a singular mind.

Now, someone might object that in the example of Father Tiso, something was communicated by his words. He told us about certain phenomena that would take place. Even though this is true—his words did guide my attention—the interconnectedness between his words and the visual phenomena was what awakened my attention. It was not his words per se that mattered, nor his words as expressions of his inner intentions, but rather the interaction between his words and the external perceptual world that gave me a sense of connectedness and revelation. And, as the example of my dog shows, this kind of revelatory experience of shared attention may be achieved even without words. It was enough that my dog showed me where to look, through her embodied actions. I did not need to speculate about any hidden intentions, since her intentions were explicitly expressed in her embodied disposition and orientation.

Another, perhaps more pressing, objection is that the dog example does not really describe joint attention, since there is no reciprocity. I claim that my experiential horizon is expanded through the guidance of my dog, but this is still a one-sided affair, since my dog is clearly not affected by my embodied disposition nor by my way of attending to the squirrel. The dog is not grasped by my attention, whereas I am grasped by hers. Furthermore, her reaction carries completely different desires and intentions compared to mine (hers probably predatory instincts; mine a sense of the wonder and joy of discovery). We can claim that the example shows that I can see the same thing as my dog, but this does not entail seeing it in the same way. The meaning of the squirrel is very different for me and for my dog. It solicits us both in different ways.[5]

This objection points at something significant. I do understand that there is an important meaning for the concept of joint intentionality and that there is a distinction between mutual intentional and affective attention, and merely one-sided attention in which I see the other interacting with the world. For these reasons, the example of Father Tiso carried an existential importance for me, since the revelatory experience was heightened because the wonder in that moment was shared with others. I was acutely aware of the phenomena being met with similar enthusiasm within the group and this amounted to a we-perspective. Whether we call this “shared” or “joint” does not make a difference; both concepts disclose the mutuality at play. The quality of my experience is different owing to the mutual and reciprocal dynamics.

What is important here, however, is why we should accept these anecdotes as examples of “attention.” I claim that the two examples illustrate an important aspect of something being salient for several subjects, and that this salience is intrinsic to the concept of attention. Attention as a concept refers to some aspect or feature of the perceptual field becoming clear and distinct, in relation to other aspects that are more obscure. And this criterion for attention is met in the example of the squirrel, as it appears saliently for both me and my dog, even though our motivations and actions in relation to this object are characteristically different. In this case, the salience of the squirrel is common to me and my dog. Among all the other objects, features, and aspects in the environment, we are for this moment grasped by the squirrel. And even though my dog is apparently attentive to the squirrel rather than to my awareness of the squirrel, whereas I am attentive to the squirrel through my awareness of my dog’s awareness of the squirrel, we still share attention owing to the same object being salient for both of us. Our intentionality, actions, understanding, and motivations might differ, but our attention qua salience has a common ground in the external world.

Whereas the mainstream account usually articulates joint attention as a species or subcategory of shared intentionality (see Urban 2014, 63), I want to highlight how common salience in many (not all) cases works as a prerequisite for what I call shared attention, and by extension also for what is commonly referred to as joint attention in the mainstream theories. In these moments in which the same object becomes salient for two or more subjects, a shared understanding of the world may start to gain a foothold. To claim that the sharedness of intentions, interests, or common goals must be in place before two subjects may perceive the same thing saliently is too categorical as a criterion for shared attention. The anecdote of my dog shows how shared intentions do not necessarily precede shared attention.

What I want to emphasize by this example is the moment in which my attention is grasped by the perceptual actions of another being. By drawing me out of my self-constituted visual habits, my dog brings me into her world of perception. The shared salience is a starting point for a relational understanding of the world, through which I am invited to see the world through the intentions of the other. This entails that I must surrender something of my disposition and adopt something of my dog’s disposition. I become enabled to see the park and its wildlife through the eyes of my dog. We might say I adapt to a dog-like perception, which does not mean that I also come to see the squirrel as prey, or that I would somehow develop canine sensory and cognitive capacities, but rather that I become responsive to my dog’s way of seeing the world: I become enabled to see salience in what she perceives saliently. Our “intentions,” in the sense of motivations for action, might still differ, but my awareness of the environment has become affected by aspects of her way of perceiving the world, drawing me toward a new way of corresponding with the environment that I had not acknowledged before (see Ingold 2018, 30). Anna Bloom-Christen describes this receptiveness as “attentionality.” It is “the beginning of participation” and precedes my deliberate, intentional agency (2023, 69f). From this point on, our relationship may develop toward shared intentionality: my dog and I may share a world in a sense that we did not before. However, in this example, potential shared intentions develop out of the moment of shared attention and common salience, not the other way around.

The critical point that I want to underline with this articulation of shared attention is that the idea that we would first have to establish shared intentionality between subjects, in order to achieve a shared form of attention, is forced and misleading. Of course, shared attention often takes this form: the walk in Pomaia was a project in which our actions and motivations (to take a walk together to discover the surrounding nature) were, at least to some extent, intentionally shared from the start. Some common goals were set, preceding and leading up to the revelatory event of attention. However, the dog example shows that the sharing of attention, in the sense of attending with somebody and perceiving salience in the same thing, may also occur without defined shared goals. The way in which my dog’s intentions start to inform my understanding of the environment does not depend on us having the same motivations, goals, or reasons for action from the start.

This is remarkable since attention is revealed to be shared despite our different life-worlds. And although we can agree that the similarity of our forms of life enables shared intentionality and joint action, this cannot be the whole story, since we do share a world with beings that are distinctly different from us. When we ask how this is possible, we cannot fall back on explanations based on similarity in experience. There is a modality of my mind that enables me to grasp that which is different from me, external to my subjectivity, and independent of self. And we are capable of learning to perceive the world differently through the guidance of beings that are different from us. This modality is emphasized in what I call shared attention.

This does not refute the fact that some level of phenomenal similarity must be in place for shared attention to be possible. I do share overlapping perceptual capabilities with my dog—we both have senses of sight, hearing, smell, taste, etc.—and without this common experiential ground we would not be able to interact and guide each other. However, if the similarities in our experience were exhaustive, there would be no need for communication in the first place. Because we have different perspectives on the world, we need to establish not only similarity and consensus but also an understanding that is able to contain this plurality. Tim Ingold describes how correspondence requires “the co-dependency of commoning and variation” (2018, 26). Some commonality is required in order to establish and grasp the plurality and the differences in our perspectives on the world. Therefore, similarity alone does not grant us a sense of a shared world. If we take the point about similarity to its extreme, I am left with the other seeing exactly in the same way as me. And without any difference there is no room for salience to occur, nothing novel, different, or unfamiliar to awaken my curiosity and wonder.

Lastly, I want to articulate this aspect of attention by showing how salience is connected to novelty. Whether we want to talk about joint or shared attention, the meaningfulness of these concepts hinges on how we understand the concept of attention.

In addition to the philosophical problems described above, there is one more critical detail that I want to highlight, which reveals a certain misconception about the concept of attention in the joint attention mainstream. The modality of attention is a unique concept that reveals something about the process of how we are able to grasp and learn new things. This aspect usually goes unnoticed in the theoretical discourse.

There is a stark discrepancy between Call and Tomasello’s account and mine, since they emphasize the similarity in cognitive capacities as the guarantor of joint attention. They claim that, since certain subjects are able to reflect between the intentions and representations of their own minds and those of the other mind, we are able to achieve a joint way of attending to the same object in the same way. This theory is built on the idea that we share a certain life-world and a common conceptual ground. Tomasello claims that “The ability to create common conceptual ground—joint attention, shared experience, common cultural knowledge—is an absolutely critical dimension of all human communication” (2008, 5). He continues: “Human Cooperative Communication includes everything we both know (and know that we both know, etc.), from facts about the world, to the way that rational people act in certain situations, to what people typically find salient and interesting” (75). I do not contest this categorically; of course, we often share some common ground and through this commonality we are able to share a world.

However, this does not explain how we come to learn new things, adopt new ways of seeing, and acknowledge new objects and aspects in the perceptual world. And I claim that these aspects of novelty play a foundational part in the act of shared attention. My account, supported by the embodied and enactive accounts, emphasizes the requirement of there being at least some level of difference between my mind and the other’s mind in order for shared attention to occur.

When I am dealing with another mind that is working in a similar way to mine, at some point this interaction becomes unproductive, stale, and boring. A person with a quite similar experiential background and way of thinking would not have been able to help me acknowledge the symbiotic play of the rays of the sun and the environment on that morning in Pomaia. A human being would most probably not be able to guide me in detecting the wildlife in the park, at least not in the same direct manner as my dog.

In both cases, my attention is awakened by that which is alien and unfamiliar to me: aspects and objects are brought to my attention owing to their being unprecedented for me. Attention is not only a concept that signifies a focusing of perception; it also refers to the unexpected, the surprising, and the wondrous. Bernhard Waldenfels describes otherness (the tension that something unknown, unfamiliar, or alien poses for one’s experiential life) as a constitutional aspect of attention. He shows how part of the modus of attention is to turn, quite naturally, to those elements that are unfamiliar to us. When we are attentive, we are able to see beyond our habitual ways of perceiving. Waldenfels writes: “It is intrinsic for attention that senses can be controlled only to a limited extent. If controls were perfect, life would be determined only by habit without allowing for anything of the alien” (2011, 58). In both of my examples, the revelatory aspect of attention came about owing to the novelty that was introduced to me by the guidance of the other. I was learning to see the environment in a new way. This required that the other subject performed his or her perceptual actions differently from me; that is, when I was grasped by the novel aspects of the environment, I was able to adapt to an unfamiliar way of seeing. This in turn meant that the behavior of my dog and of Father Tiso, their ways of acting and engaging with the environment, revealed to me novel ways of acting and perceiving.

This is part of the process of learning new things. We do not always look for similarities, and our mind is capable of stretching, adopting, and assimilating new forms of acting and perceiving.[6] Without this propensity our experience would consist of habitual loops: repeatedly seeing the same objects in the same ways. Through interaction and communication with others, we develop our forms of life and find kinship with beings who can guide us and liberate us from our self-referential habits. I want to show that attention as a revelatory practice can aid us in discovering a shared world through plurality rather than the similarity of our perceptual actions.

5 Attention and Revelation

Merleau-Ponty articulates something along these lines when he writes: “Attention is no longer a form that more or less lights up an immutable field but rather a restructuring power [my emphasis], one that makes the components of the landscape that did not exist reappear phenomenally. Thus, instead of a clarification of preexisting details, a transformation of the object occurs” (2010, 416). This brings us to an understanding of attention as a creative power. The moments of attention that I have described are not only occasions in which I come to see new things. I have heard the rattling of the leaves in a breeze, seen the rays of the sun, and witnessed the squirrels and the foxes before. The revelatory aspect of attention here also requires that I see in a new manner. In these moments, my experience is restructured. For once I do not repeat my habitual way of perceiving but discover novel ways to engage with my environment, and through this transformation objects and aspects are revealed.

The creativity of attention resides in the potential for my perception to make new connections in a visual world that nevertheless is, in a certain sense, constant. That which is restructured or recreated is the quality of my relationality to the environment: suddenly I am more present in an environment, which to a large extent has the same qualities as it had yesterday. Tim Ingold writes about attention as the propensity through which “the world opens up and is made present to us, so that we ourselves may be exposed to this presence and be transformed” (2018, 30). When I discovered the wildlife in my neighborhood park, I did not doubt that this fauna had been present in that environment before. The novelty introduced to me by my dog led to my realization that my perceptual capability had transformed, become more acute, and found a new orientation that allowed me to see what I had not seen before. This same sense was present during my experience in Pomaia. I was enabled to discover what was there to be seen every morning. The novelty was not dependent on new objects in the environment (squirrels, foxes, flickering sunlight, birdsong) but rather on a new or heightened relation between me and my environment.

Although my intentions, desires, will, and interests may play a part here, this restructuring requires an external agency. As William James pointed out, one of the elements in the puzzle of attention is my interest, but even this interest is co-constituted by the external: other subjects and the environment. As Diego D’Angelo puts it, in these specific moments attention is “motivated by the interest that comes from the things themselves” (2018, 111). Here D’Angelo interprets Merleau-Ponty, who shows that the object that is external to the self of the observer, that which is not self, has an agency of its own. This mode of attention, which for Merleau-Ponty is the primary one, aids us in understanding the agency of the objective world and the agency of other subjects. Attention is, in this sense, not driven solely by subjective intentions; it is also formed and co-constituted by my being addressed by the other. Bloom-Christen articulates this ambiguity: “Attention emerges in situations where we have (been) trained to be observant, or where habits are disrupted by the extraordinary” (2023, 71). It is in this tension between my presuppositions, interests, and intentions, on the one hand, and the disruptive, revelatory, and unfamiliar agency of the world, on the other, that new orientations for my perception can become established. I have agency and act upon the world; the world has agency and acts upon me. Attention is developed in the tension between these two movements of the mind.