Investigations into song-like vocalizations, such as chanting, among non-human primates and other animals have proposed that they play a role in conveying affective experiences. The significance of chanting in human experience, however, invites further inquiry. There are reasons to believe that chanting existed long before human civilization (see Oubré, 1998), and to this day chanting is present in various forms in social rituals across cultures all around the world (Holtmann, 2017; Jeffery, 1992). More than a hundred years ago, Durkheim (1858–1917) coined the term “collective effervescence” to capture the collective emotion that arises from engaging in ecstatic musical rituals involving chanting, dancing, music-making, and singing. Contemporary theorists of human cultural evolution have speculated that such synchronized activities developed to promote social bonding (Savage et al., 2020). “A chant may be started by [one] but is most often sung in groups” (Shuter-Dyson & Gabriel, 1981: 4). To chant is to verbalize the same words or phrase in unison, usually repeatedly, and often in rhythm (Schweingruber & McPhail, 1999). While music in general has a unique ability to trigger memories, awaken emotions, and intensify our social experiences (Molnar-Szakacs & Overy, 2006), chanting as a type of participatory music (Turino, 2008) allows those who join in to engage in a joint action that promotes collective emotion and feelings of togetherness (McNeill, 1995; Herrera, 2018), produces affiliation, enhances memory performance, and increases coordination among in-group members (von Zimmermann & Richardson, 2016). Rhythmic chanting generally plays a significant role in promoting social, heroic, moral, political, and religious narratives, as well as communal myths and salvation among human groups in conflict (Holtmann, 2017).
However, not all forms of chanting elicit positive emotions, nor do they affect only the chanters’ own group. For example, a battle chant, sing-song, or hymn performed on a battlefield can “bond and prepare fighters, [while at the same time] deter and frighten enemies” (Holtmann, 2017: 278). The role of chanting in mediating collective emotional experiences, among other things, may have led to its use in demonstrations and protests, where “chanting is a key activity to most protests” (Manabe, 2019: 5).

Aside from street protests, chanting can also be observed in other events in which social movement repertoires (Tilly, 2004), such as marching, gathering, and rallying, take place. Some authors (Ganz & McKenna, 2019; Matsumoto et al., 2015) argue that social movement leaders may orchestrate activists to experience certain emotions. They do so through “the careful use of specific types of emotion-laden words, metaphors, images, and analogies, and nonverbally through their facial expressions, voices, and gestures” (Matsumoto et al., 2015: 370). Activists engage in this practice as well, by “construct[ing] situations to evoke particular cognitions and emotions in fellow activists, bystander, and social control agents … [through] … social and material arrangements of protest events …” (Van Ness & Summers-Effler, 2018: 421). Thus, it can be argued that gestures, icons, speeches, symbols, objects, music, sing-song, and chanting are among the many “tools” that can provide scaffolds for influencing, and even manipulating, activists’ cognition and emotion. These scaffoldings, in turn, might affect those who are exposed to them or who interact with them, whether they want to be affected or not. Such affective experiences might occur during social movement repertoires. I argue that chanting, which allows people to actively join in and contribute, can also facilitate collective emotions in similar contexts. Specifically, this act of raising one’s voice serves as an excellent tool for expressing collective emotions during social movement repertoires, an act that has not only cognitive and somatic components but also (potentially) situational, social, cultural, and historical ones.

In this paper, I aim to analyze how chanters’ (i.e., activists’) cognitive and affective processing interacts with synchronous chanting, especially emotionally charged chanting, as it occurs in social movement repertoires (i.e., marches, rallies, demonstrations, and protests). For this analysis, I consider the recently developed framework of situated cognition and affectivity (i.e., SCAff) an apt lens because it explains how a combination of neural, bodily, social, and environmental resources can contribute to how a situated agent cognizes and emotes in a certain context. Given that such repertoires are prevalent worldwide, applying concepts from the situated approach can provide proper tools for analyzing the phenomenon and can advance the debate in relevant fields concerning the mind and emotions.

The paper proceeds as follows. The “Situated Cognition and Affectivity” section introduces the theoretical framework of SCAff, especially the user-resource interactions model and so-called mind invasion, along with other relevant notions and terms. The “Chanting Matters” section reviews the characteristics of synchronous chanting, together with the neural, bodily, situational, social, cultural, and historical components of which it is composed, and argues why synchronous chanting is a phenomenon worth analyzing with this framework. In the “Chanters-Chant Interactions” and “Chanting as Mind Invasion” sections, the framework is used to analyze how various distributed affectivity phenomena are facilitated or enabled by certain scaffoldings during an occurrence of synchronous chanting in Indonesian social movement repertoires, including how synchronous chanting mediates so-called mind invasion.

Turning to occurrences of synchronous chanting in social movements, this paper presents documented cases from Indonesia, the world’s third most populous democracy and the world’s largest archipelagic state (The World Factbook, 2020). With over 270 million people, Indonesia is a multicultural nation, a diverse country composed of hundreds of ethnic groups with different languages, cultures, religions, and ways of life. A shared identity was forged against this background of diversity under the slogan “Bhinneka Tunggal Ika” (i.e., “Unity in Diversity”), defined by a national language (i.e., Bahasa Indonesia), ethnic diversity, religious pluralism within a Muslim-majority population, and a history of colonialism and rebellion against it. In 1998, street protests conducted by “generasi pemuda 1998” (i.e., the youths of generation 1998) overthrew Suharto, the tyrannous president, along with his 32-year-long New Order military regime. Free and fair legislative elections took place the year after, and thus began the era of democratic reformation (for further details, see The World Factbook, 2020). These struggles against colonialism, tyranny, and many other sociopolitical issues are the contexts in which the occurrences of synchronous chanting were documented. The cases discussed in this paper focus on the role of Indonesian youth nationalists in the nation-building process, without any intention to diminish or belittle the significant role of other contributors. The objects of analysis are taken from the book “Activist Archives: Youth Culture and the Political Past in Indonesia” (Lee, 2016), a few relevant scenes in the documentary film “Student Movement in Indonesia: They Forced Them to Be Violent” (Saroengallo, 2002), and several social media videos depicting Indonesian social movement repertoires, among other sources.

Situated Cognition and Affectivity

In the realm of cognitive science, the concept of “situatedness” has been extensively explored for over two decades (Clark, 2008; Menary, 2007; Rowlands, 1999; Sutton, 2010). In contrast to the traditional view of cognition, the recently developed framework called “situated cognition” (Walter, 2014) asserts that the mind is not solely an intracranial entity, but rather interacts with environmental scaffoldings. The term scaffolding was based on the work of Vygotsky (1986 [1962]), which was then reintroduced by Clark as “… any item or structure in the environment that provides reliable support for cognitive processes, so that cognitive routines will regularly exploit these structures to enhance their functionality and effectiveness” (Slaby, 2016: 266). These scaffoldings serve as components in the environment to facilitate cognitive processes, emphasizing the interplay of body, world, and interaction in shaping cognitive processing (Costello, 2014). The term “soft assembly” clarifies this notion—minds are viewed as systems softly assembled from a temporary coalition of elements, in contrast to systems with constituting components that have fixed and dedicated functions (Anderson et al., 2012, in Kirchhoff & Kiverstein, 2019), such as cars or computers. Building on Clark’s (2008) account, Kirchhoff and Kiverstein (2019) elaborate:

[t]he functional contribution of each [component] in softly assembled systems set by the task for which the cognitive system is assembled. [Hence], the cognitive system will be made up of whatever mixture of elements is able to accomplish the task most efficiently. Sometimes this will mean relying purely on internal resources. On other occasions, … making use of the right mixture of internal neural elements and bodily action or resources located in the environment. (2019: 91)

In other words, the mind may recruit extracranial resources as if they were “tools” for dealing with the cognitive task at hand more efficiently, which can be illustrated by how children use their fingers to help them with addition or subtraction. This symbiotic relationship between mind, body, and environment showcases the interactions between proper components that enhance cognitive processing, which has been argued to also extend to the domain of affectivity (e.g., Griffiths & Scarantino, 2009; Colombetti, 2014; Krueger, 2014; Colombetti & Krueger, 2014; Stephan et al., 2014; Colombetti & Roberts, 2015; Wilutzky, 2015; Slaby, 2016). The so-called Hypothesis of Extended Affectivity (HEA) further argues that affective phenomena can extend beyond the confines of an individual’s body, where the interaction between humans and their environment, including its natural, technological, and social aspects, serves as scaffolding that facilitates various affective processes (Colombetti & Roberts, 2015). The framework takes an important turn when “the hypothesis of scaffolded mind” (Sterelny, 2010) “in its more comprehensive form” takes center stage as the main thesis “that human cognitive capacities both depend on and have been transformed by environmental resources” (ibid., 2010: 472, in Stephan & Walter, 2020: 305). When applied to affectivity, this means that our affective life is situated in relation to how intimate the interactions between us and our scaffoldings are. For example, we can compare the feelings that arise when driving our own car while listening to our own music playlist with those that arise when driving someone else’s car while listening to their playlist.
There are “interactions which originate with the individual and from there stretch out into the environment through a (mostly intentional) process of resource usage” (Stephan & Walter, 2020: 305), and there are interactions that result neither from one’s intentional decision to organize one’s environmental resources nor from an attempt to make sense of them; rather, “they involve some sort of mind invasion whereby structures in the environment reach inward into the individual …” (ibid., 305; see also Slaby, 2016). To understand how these reciprocal interactions might transform an agent’s affective life, situated affectivity, an extrapolation of the so-called situated cognition framework, has been developed.

Two different, yet interconnected, types of affective scaffoldings are principally distinguished. The first type is so-called user-resource interactions and the second type is so-called mind invasion (cf. Stephan & Walter, 2020: §§ 3 and 4). For the sake of the arguments in this paper, only one case of each type is presented here. The first is a case in which human affectivity can be transformed through user-resource interactions: (A) strongly coupled and integrated material tools for emoting (ibid., 306). Referring to the work of Clark (2008: 129–131), Colombetti and Roberts (2015) have identified coupling and self-stimulating loops as requirements for a coupled system and applied them to affectivity. Coupling happens when “two or more systems reciprocally influence and constrain one another’s behavior over time,” such that they can be modeled as a single system, whereas a self-stimulating loop involves an “activity, that has been designed, or selected and maintained, for a certain purpose over time” (ibid., 1246–1247, emphasis added). Thus, when applied to affectivity, an emoting agent and a coupled environmental item form a coupled system that “allows novel forms of steering and expressing affectivity inaccessible to the uncoupled individual” (Stephan & Walter, 2020: 306). An example is “… an agent who uses a diary as a tool for constantly and reliably refueling the (dispositional) feelings of resentment towards her parents” (Colombetti & Roberts, 2015: 1253). The other case belongs to the second type of affective scaffoldings, so-called mind invasion. Coined by Slaby (2016), mind invasion, in a broader sense, happens when the scaffoldings that can transform our affective life are imposed on us, regardless of whether we want to be affected or not, and even without our being consciously aware of them (Stephan & Walter, 2020). The case presented here is (B) tools for manipulating emoters. This case belongs to the type of affective scaffoldings that is “deliberately launched as tools for mind invasion to diachronically modify the attitudes and the emotional set up of a target group” (ibid., 308). Advertisements and propaganda are examples of this type. However, according to Slaby (2016), this type of affective environment can also be created accidentally. I will discuss this type of affective scaffolding further, with real cases, in the “Chanting as Mind Invasion” section.

Chanting Matters

The arguments built in this paper are based on an understanding of synchronous chanting as a form of “verbalization of the same words in unison, usually repeatedly, and often in rhythm” (Schweingruber & McPhail, 1999: 465). Particularly in social movements, this kind of joint action can often be observed as a collective response to certain invitations or rhetorical devices from a chant initiator (e.g., the leader, the organizer, the orator, or anyone in the crowd who initiates a chant). By nature, synchronous chanting is a collective event, since isolated instances would be either inaudible or short-lived if the chant does not “catch on” (Dye, 2018: 26). Conversely, asynchronous chanting, which resembles the sound of indistinct chatter, would be likely to result in unintelligible murmuring (ibid.). In short, synchronous chanting is a collective activity in which chanters utter chants in synchrony.

Chants are typically short, rhythmic, and repetitive so that anyone can join in, making chanting a type of participatory music that ideally involves as many people as possible (Turino, 2008). The idea that chanting relates to emotional experiences can be traced to its evolutionary origin. Oubré (1998) argues that chanting developed prior to language in the first hominids, as an attempt to convey information about visceral and affective states. She writes:

... the earliest form of proto-chanting may have been simply emotionally exhilarating experiences which were biologically based in the same neuroanatomically substrates corresponding to the components of the non-hominid vocalization system ... as hominids gradually extended their perceptual world of time, concomitantly developing a more elaborate auditory system, they would have been able to utter vocalization with improved precision in pitch, timing, and rhythmicity. (ibid., 177-178)

Along the same lines, Winkelman and Baker (2008) argue that chanting has roots in the emotive vocalizations that primates use for a variety of social purposes and that our hominin ancestors developed into excited synchronous singing and dancing among members of a group. Such group vocalizations, observed in many mammalian species, have their closest human parallels in the ritualized synchronous group singing that is at the core of human shamanic rituals. Although this capability is present in all the apes and in most monkeys, humans epitomize the capability to chant in passionate and emotional outbursts (ibid.). Richman (1987) argues that, as in non-human primates and other animal species, human group vocalizations such as chanting and singing form an expressive system that serves not only to convey specific conceptual information but also to express emotions, enhance group solidarity, motivate other members, and accomplish social ends. Unsurprisingly, chanting can be observed in human cultures all around the world, often as a part of human rituals, and throughout human history, for example in infantry drill in ancient Mesopotamia, the Nazis’ muscular mass politics, and Pentecostalist religious worship (Codrons et al., 2014; Cummins, 2013; McNeill, 1995; Winkelman & Baker, 2008). McNeill (1995) argues that when a group moves briskly together in unison while chanting, singing, or shouting rhythmically, the boundaries between participating individuals become blurry. He coined the term “muscular bonding” to capture the phenomenon of “euphoric fellow feeling that prolonged rhythmic muscular movement arouses among nearly all participants [that makes] other social ties [fade] to insignificance among them” (ibid., 2–3).
We can see how this applies in both protest and non-protest contexts, where synchronous chanting activities “incite intense feelings of social cohesion, belonging, and bonding” (Herrera, 2018: 483) and heighten the feeling that the chanters are united as one (Manabe, 2019).

Chanting also occurs as a response to an invitation (e.g., rhetorical devices) by an organizer, a leader, or anyone among the crowd, although chanting is the most costly type of audience response compared with applause, cheering, laughter, and booing (Dye, 2018). Linking this back to (A), it can be argued that a successful occurrence of synchronous chanting serves as an excellent case of a self-stimulating loop, because successful chanting requires others to join in and maintain the chanting through repetition and rhythm, which is achievable only through collective coordination among chanters (Choi et al., 2016). The loop begins when a chant initiator projects information into the world in the form of an emotionally charged chant invitation (consisting of voice, rhythmic pattern, prosodic features of emotional expression, ideas, and other chant components), and the audience members who perceive it as an input respond by producing more information into the world, thereby extending the emotions conveyed through the invitation to other chanters. If this continues for some time, the participating chanters will keep recycling the (emotional) information carried by the chant, producing a continuous loop that transforms their collective emotional experiences and expressions. This can mean that, over time, emotional experiences are distributed across participating agents. (Footnote 1)

Should the necessary cooperation fail, the result will be barely audible or asynchronous chanting, which will usually cease on its own. Without a further motivating instruction or an additional cue, such as the sound of musical instruments that helps the mass coordinate their chanting, it would be hard to compensate for the lack of volume of the seemingly dying chant without actually joining in. On the situated approach, minds may softly assemble both intra- and extracranial resources to deal with this demanding task of enabling or facilitating collective affective phenomena. Thus, it can be argued that, during the initiation (and subsequent repetition) of synchronous chanting, internal neural elements, bodily action, and resources located in the environment are the components that are softly assembled. These environmental resources can be situationally embedded (i.e., available here and now) or socially, culturally, and even historically embedded. We will look at these components one by one, starting with non-trivial biological mechanisms.

Cranial Nerves

While the cranial nerves responsible for audiovisual processing deal with relevant cues for chanting, the glossopharyngeal nerves move the muscles of the throat and the facial nerves move the muscles of facial expression. Each of these cranial nerves is presumably involved during an occurrence of chanting. The vagus nerves also play a pivotal role in phonation, because they innervate the majority of the muscles associated with the pharynx and larynx; they also wander all over the body and reach various organs. They interact with the microbiota, particularly those residing in the gut, thus driving brain-gut communication (Bagga et al., 2018; Panduro et al., 2017) through the so-called brain–gut axis. This term captures a complex system that consists of the brain, the spinal cord, the autonomic nervous system, and the hypothalamic–pituitary–adrenal axis. The axis manages a two-way communication process between the brain and the enteric nervous system, bridging brain regions that are responsible for emotional and cognitive processing with peripheral intestinal functions (Carabotti et al., 2015).

Mutual Monitoring Mechanism

Dye (2018) points out how mutual monitoring is involved in an occurrence of synchronous chanting. During a successful chant, infectious emotions can often be observed among those who join in, because observing others in a certain emotional state consistently sets off behavioral and physiological synchronization (Dimberg & Thunberg, 1998; Hietanen et al., 1998; Wild et al., 2001). Even in contemporary contexts, euphoric waves may sweep from athletes to their supporters when they win a sporting event, a sense of tranquility may arise among religious devotees during moments of silence in a sacred ceremony, or, in the context of social movements, a peaceful protest may turn into a violent riot when anger and hatred sweep through the protesters (Nummenmaa et al., 2014). These phenomena are thought to be mediated by emotional contagion, which captures our ability and tendency to “catch” emotions from the people in our vicinity, and they have been widely documented (Schoenewolf, 1990; Hatfield et al., 1994; Dimberg & Thunberg, 1998; Hietanen et al., 1998; Wild et al., 2001; Cacioppo et al., 2014; Nummenmaa et al., 2014). According to perception control theory (Schweingruber & McPhail, 1999; for references, see Miller, 2000), “people must monitor and interpret one another's behavior in order for collective action to occur … by way of observing direction, speed, [and the] tempo of the movement of others that [people] can adjust [their] behavior with respect to a particular objective” (Miller, 2000: 45). In a collective action, people also tend to spontaneously mimic others’ expressions, gestures, or postures (Belot et al., 2013).

Some researchers argue that humans predict others’ thoughts and actions, including emotions, through the “mirror neuron” system (Decety & Jackson, 2004; Gallese, 2003; Gallese & Goldman, 1998). Activation of these neurons with mirror-like properties is found when actions are performed, observed, or listened to (Kohler et al., 2002), enabling people to recognize and understand others’ states by mirroring perceived actions in their own brains. Auditory cues arguably play a significant role in mediating the spread of emotion during successful collective chanting, because emotions can be expressed not only via facial expressions, gestures, and postures but also via prosody (for a review, see Klasen et al., 2012). Moreover, emotional information is carried by human speech and music (Gibbon, 2016), as well as by ambient sound events (Weninger et al., 2013). The mutual monitoring mechanism thus allows chanters to perceive others’ emotions while keeping the chanting “alive” by tracking the chant’s rhythm and the timing of the beat. Some collective responses are “made up of very extended vowel sounds” (Atkinson, 1984: 20), such as “Merdekaa!” (which roughly translates into “Freedomm!” [sic]), a typical example in Indonesian students’ protests. “Their open-ended character … makes it very easy for others to join in some time after a roar has got underway.” By doing so, “even late starters are able to play an active part in determining the volume, intensity, and duration of a response” (ibid., 20–21). That said, it is safe to assume that at least visual and auditory stimuli are involved in the mutual monitoring mechanism, influencing one’s decision to participate in, maintain, alter, or stop the chanting event.

Timing Mechanism

Timing refers to “the ability to represent and use temporal information such as duration and rhythm” (Schirmer et al., 2016: 760). Though there are several ways for synchronous chanting to occur, a successful synchronous chanting occurrence involves appropriate timing mechanisms. For example, Atkinson (1984, in Bull & Noordhuizen, 2000) identified that audience response would “ideally” occur within one second of the end of a speaking turn and pause subsequent to [the time] when the speaker signals that they are ready to continue speaking by beginning their next phrase. However, “timing,” in a sense of choosing when something should be done, is also important for speakers to properly utilize rhetorical devices to emphasize their message and ensure positive audience responses. One strategy is to use rhetorical devices, such as claptrap, a technique equivalent to three cheers or starters’ orders such as, “On your marks, get set—Go!” In this way, “those who [have] heard it are guided step by step towards a precise moment in the near future when they should do the same thing at the same time” (Atkinson, 1984: 47).

A speaker may also use multimodal cues to orchestrate the activists’ responses and lead a series of free-flowing synchronous chants. (Footnote 2) By using simple hand gestures, such as faster beat gestures to speed up the tempo of the collective chanting, raised hand(s) to increase its volume, and gestures that resemble “air sweeping” to invite the audience in the area to chant together, the speaker manages to coordinate the chants shouted and/or sung by the activists. Invitations to chant are sometimes paired with particular movements, such as waving a raised fist or making beat gestures, so that chanters may perceive the cues more clearly, predict the timing more accurately, and produce and maintain chanting successfully. Moreover, chanters’ sense of timing seems to improve when the chants are broken down into short phrases, because the longer the phrases, the greater the cognitive demands of maintaining coordinated responses.

Turn-taking Mechanism

Timing also seems to be involved in the turn-taking routine during an occurrence of synchronous chanting. These alternating turns can be observed both between the initiator and the chanters and among the chanters themselves. Logically speaking, a turn-taking mechanism involves both action and inaction, that is, exhibiting or inhibiting verbalization of the chant. Incorrect timing in taking turns to verbalize the chant would result in asynchronous and/or interruptive utterances (Dye, 2018), be it at the onset, during the utterance of the phrases, during the pause between one word or phrase and the next, or when the chanting ought to cease. The turn-taking mechanism, which seems to be based on the same exchange principles, shows strong universality in communication across cultures (Levinson, 2016), suggesting its evolutionary benefits to our social life. Neuroimaging studies have revealed that the brain regions involved in timing processing overlap with the “social brain,” the brain regions that are typically engaged in social-stimuli processing (Kennedy & Adolphs, 2012). When engaging with others, humans tend to mimic others’ facial expressions, gestures, or postures in a process of chorusing and rhythmic synchronization. Such mimicry also extends to when these behaviors occur in time, leading to temporal synchronization between interacting agents (Hasson et al., 2012). For most species, chorusing requires the perception of the behavior of a conspecific; however, there are a few exceptions in which chorusing can be triggered by just an auditory rhythm, and humans are one of the exceptions (Ravignani et al., 2014).
Rhythmic processing is closely related to increased activation in the insula and superior temporal cortex, and the timing of social stimuli (e.g., bodily movements, facial expressions) seems to further intensify activation in these two regions, which have been consistently reported to be responsible for empathizing and mentalizing, respectively (see Schirmer et al., 2016). This suggests that the synchronization process relates to the human abilities to share feelings with others and to gain insight into the thoughts of others. These relations allow for spatiotemporal coordination to achieve a shared goal and can be observed when people play music and dance together, engaging in forms of musical joint action that are often characterized by a shared sense of rhythmic timing and affective state (Phillips-Silver & Keller, 2012).

Memory Mechanisms

Even in its simplest form, such as remembering words or phrases to be repeated, memory seems to play a pivotal role in an occurrence of synchronous chanting. Arguably, however, memory mechanisms go beyond simply encoding, storing, and recalling implicit and explicit memories, especially in the context of social movements, because they allow chanters to relate to the situational components present during an occurrence of synchronous chanting, as well as to the social, cultural, and historical components of a chant. This relates to the observation that chants in social movements are often made up of borrowed familiar patterns and tunes that make it easy for people to participate and to create new chants through incremental changes. In the process of making novel chants, the tendency to follow the familiar musical forms of sentences or periods often persists (Manabe, 2019). According to Atkinson (1984), “[a] recognizable rhythmic beat, sometimes in combination with the familiar words of a hymn or popular song, makes it possible for thousands of different individuals to join in and produce exactly the same actions at exactly the same time” (19).

Some songs, chants, and symbols also seem to flow from movement to movement. The adoption of pre-existing chant patterns (i.e., intertextuality) is a common tactic used widely in social movements (Manabe, 2019). Chants are often constructed out of patterns that have previously been used in other movements. Unlike novel chants, these pre-existing patterns are easy to remember, repeat, and adapt to the current issue. In chants built from musical sentences, the words of the basic idea do not have to match those of the original; only the rhythm needs to be similar (ibid., 8–10). According to Eyerman and Jamison (1998a, b), these sounds serve as “channels of communication that enter the collective memory and conjure up long-lost movements from extinction, as well as reawakening forgotten structures of feeling” (161, as also cited in Manabe, 2019).

To sum up, we can see how the intracranial components recruited during an occurrence of chanting might interact with the bodily components in a complementary manner to enable and/or facilitate certain emotional experiences during collective chanting. As the analysis above also implies, these biological components can interact with proper environmental resources. In the next two sections, we will take a closer look at the synchronic, here-and-now interactions among the components assembled during collective chanting, and at the diachronic constituents of chants that give access to novel forms of emotional experience. While these components comprise any proper intra- and extracranial resources available to an agent, I will focus on resources related to synchronous chanting activity used in Indonesian social movement repertoires.

Chanters-Chant Interactions

In social movements, emotions play a role in motivating participation (Jasper & Poulsen, 1995; Jasper, 1997; Oliver, 1984; Brym, 2007) and sustaining commitment (Jasper & Poulsen, 1995). However, actual demonstrations consist not only of heightened emotional moments but also of dull moments of silence along with repetitive walking, sitting, standing, and queueing (Lee, 2016), which pose a risk to the movement because “[e]xperiencing sustained fatigue without emotional intensities of thrill, risk, or hope may eventually lead to frustration at the movement or [the] leader” (see Summers-Effler, 2010). Hence, leaders and activists develop various ways to avoid this risk, either by using verbal and non-verbal behavior (Matsumoto et al., 2015) or by constructing emotion-evoking situations through both the social and the material arrangements of protest events (Van Ness & Summers-Effler, 2018). The means most often used in protest events is chanting (Manabe, 2019).

Nonetheless, even social movement leaders are not always capable of initiating synchronous chanting. We can see how situational elements can overshadow a leader’s invitation to chant by referring to Doreen Lee’s account of her experience of participating in a demo (i.e., demonstration). The demo began with a long march in commemoration of the Trisakti Tragedy of May 12, 1998, in which Indonesian soldiers opened fire on unarmed protestors who were demonstrating to demand the resignation of the tyrannous president Suharto. She writes:

The Trisakti Tragedy is the original event that galvanized public support for Reformasi and so the massa [i.e. Mass; activists] are extremely motivated. Bereaved relatives of the dead have turned up. The march is slow and the heat ... [is unbearable]. Under the hot sun, activists scream themselves hoarse about the injustice of the state and the political aspirations of the army: “Fight tyranny! Fight militarism! Fight neoliberalism!” The crowd repeats these chants. [However, the] heat drives people ... [away from the demonstration's center to] the outer fringes of the demo’s soundscape, where speeches are faintly audible so that the massa are not close enough to the command truck to repeat the yells or to respond to the orator’s cues. (Lee, 2016: 78-79)

Emotions are a major source of motivation for behavior in individuals (Ekman, 2003) as well as in groups (Matsumoto et al., 2015). Lee’s account above shows that emotions gave the activists reasons to participate in the movement, to gather with others who are driven to demonstrate their feelings, to voice what they have to say, and to go on a demo. On a long march, the activists walk a considerable distance from their starting point along a route towards a destination, chorusing as they go and moving together in coordination, each adjusting direction and speed to that of the others, in a manner resembling a single superorganism.

It is to be expected that not every activist will engage fully in the long march, and so “[o]rganizers worry about their [activists] for the duration of the demo, wondering if disinterested [activists] have wandered off” (Lee, 2016: 76). Chanting does seem to help activists counter boredom and stay engaged with the overall process of a demo. Nonetheless, as the example shows, some situations hinder synchronous chanting from occurring, such as when the activists were dispersed by the unbearable heat. We can infer that without adequate proximity among the activists, it is difficult to hear and monitor others, let alone perceive the orator’s cue. On the other hand, proximity does not guarantee that a chant will be verbalized synchronously. For example, if we look at the shouted political aspirations (“Fight tyranny! Fight militarism! Fight neoliberalism!”), we can think of these chants as being uttered not in unison but sporadically, shouted by individuals in some form of a group without any causal relationship between one person’s shout and another’s, despite their closeness to each other. Crucially, while asynchronous chanting also occurs in social movements’ repertoires (as well as in riots or panic incidents) and may lead to emotional contagion, the mind does not need to “recruit” mutual monitoring, timing, and turn-taking mechanisms in the same way it does during an occurrence of synchronous chanting.

Aside from proximity, other situational elements facilitate an occurrence of synchronous chanting, especially when it is led by multiple cues from the organizers. For instance, tools (e.g., a microphone or a handheld speaker) can be used to amplify the organizer’s voice so that it reaches more of the audience (i.e., activists). The organizer can stand on an elevated platform, such as a balcony, the top of a building, or a command truck, or in a place where the activists can see or hear the organizer, such as in front of or at the center of the activists. In some cases, the organizers ask the audience to do simple tasks, such as standing, raising a fist, or putting a hand on the heart. In this way, the audience’s attention is caught, and they are more likely to be ready to start the chant, to stop, or to change phrases when cues are given verbally (e.g., rhetorical devices, direct instruction, “repeat after me,” or “follow my lead”) and/or nonverbally (e.g., pointing the microphone towards the audience, using beat gestures, iconic gestures, or other inviting gestures). Combinations of verbal and nonverbal cues might further reduce the audience’s cognitive demands, especially for timing prediction and turn-taking.

Another situational element is the direction of the chant. It seems unlikely that chanting in demonstrations serves to entertain audiences. Since synchronous chanting may help to boost social cohesion and a sense of belonging to the group, chanting and sing-song are usually directed towards the group itself; in other words, they are chanting by the group and for the group. The exceptions are when an organizer leads the event, in which case the activists’ attention and their chanting are usually directed towards the person on the stage, and during a long march or a rally, where the participants usually shout in the direction in which they are heading.

There are instances where chanting is aimed at specific targets, whether they are objects, buildings, symbolic representations, individuals, or groups. For example, following the Semanggi Tragedy, in which public protests resulted in the deaths of civilians, one of the student activists chanted “Pembunuh! Pembunuh! Pembunuh!” (i.e., Murderer! Murderer! Murderer!) towards the armed soldiers guarding the area in front of them, and other activists immediately joined in while repeatedly giving a thumbs-down along with the chant, as seen in the documentation of the 1998 street protests (Saroengallo, 2002, 18:00–18:18).Footnote 3 Another example involves a group’s action against Papuan university students in Surabaya who refused to raise the Indonesian flag on Independence Day, resulting in a hostile confrontation.Footnote 4 Their place was surrounded and the crowd chanted, “Usir! Usir! Usir Papua! Usir Papua sekarang juga!” (i.e., Cast out! Cast out! Cast out [people of] Papua! Cast out [people of] Papua, immediately!). In both cases, negative emotions were expressed through synchronous chanting directed towards the target of those emotions, potentially to elicit feelings of guilt, shame, fear, or other negative emotions in the targets. In contrast, some chants targeted at an individual serve as encouragement, such as naming (Atkinson, 1984), where the audience chants to encourage a particular individual to act, to give a speech, or simply to move forward.

Most chanting, particularly during long marches, is spontaneous. Usually no one organizes what phrases or messages to shout, when to shout them, or in what order, especially because in a long march the organizer might not be at the center of the activists. Anyone in the crowd has the opportunity to initiate a chant, especially when there are no speeches. However, successful synchronous chanting hinges on others joining in with repetitions of a word or phrase that collectively expresses their emotions. Should the other activists not be ready to catch on to the uttered, often slogan-like messages, chanting might occur asynchronously or not at all. To prevent unsuccessful attempts, some cues to chant are deliberately used because many individuals are already coupled with them. These cues potentially evoke memories tied to the social, cultural, and historical aspects of the chants. Here are examples from social movements in Indonesia:

Chorus-Based Chants

The first case of (A) in Indonesian social movements involves chanters and the chorus of a song. Specifically, it involves activists and the chorus of a renowned Indonesian children’s song entitled “Menanam Jagung” (roughly translated as “Planting (The) Corn”). The song was composed by Saridjah Niung Bintang Soedibjo (1908–1993), popularly known as Ibu Soed, who composed hundreds of children’s songs during her lifetime. The melody of this chorus (see Fig. 1) was chanted by Indonesian youth revolutionaries, called pemudasFootnote 5 and mahasiswa, the archetypal university students, during the street protests following the 1998 Tragedy and the demonstration that forcefully ended President Suharto’s thirty-two-year military dictatorship. In both events, its original lyrics were replaced with phrases that represented their collective intentionalities.

Fig. 1 Chorus part of the song “Planting (The) Corn” by Ibu Soed; Saridjah Niung Bintang Soedibjo (1908–1993)

In the documentary of the 1998 Tragedy by Saroengallo (2002),Footnote 6 the lyrics were replaced by “Turun, Turun, Turun, Suharto. Turun Suharto sekarang juga!” (6:04–6:08), which served as an order for Suharto to give up his position right away; by “Gantung, Gantung, Gantung, Suharto! Gantung Suharto di Taman Lawang!” (28:04–28:11), which conveyed the chanters’ intention to lynch the president in Taman Lawang, a public park in the Indonesian capital; and by “Lawan! Lawan! Lawan dan menang! Hancurkan rezim penindas rakyat!” (12:54–13:07), which translates to “Fight! Fight! Fight and win! Destroy the People-oppressor regime!” These were presumably only three of the many chorus-based chants that the pemudas chanted during that period.

Let us now skip forward around two decades, to a time when the landscape of audiovisual media has fundamentally changed and online video is commonly shared through the internet and social media platforms. As a result, many social movement events in Indonesia have been documented and published online and are thus available for analysis. The exact melody of the same chorus was used as the basis of a chant that could be heard in 2017, during the night of the torch relay to welcome Ramadan, the ninth month of the Muslim calendar and the holy month of fasting. Just as in 1998, the original lyrics were replaced, this time with “Bunuh! Bunuh! Bunuh, si Ahok! Bunuh si Ahok sekarang juga” (i.e., “Kill! Kill! Kill, Ahok! Kill Ahok right now!”).Footnote 7 Ahok is a “double minority” in Indonesia, an Indonesian of Chinese descent and a Christian, who became Jakarta’s first non-Muslim governor in 50 years. Before he was accused of blasphemy against Islam, he had been known as an abrasive politician with a strong anti-corruption stance. The charge against him led to a two-year prison sentence and to his giving up his position as governor of the capital city shortly before the next election in Jakarta (Coca, 2017). The most disturbing part, however, was that the chant was chanted by children who participated in the torch relay to welcome Ramadan, their holy month. Based on the video, it seems to have been a collective response to an unknown initiator’s invitation, and the chant appeared to catch on immediately. The same chant with the same message was chanted by Rizieq Shihab,Footnote 8 the hardline Muslim leader of FPI (Front Pembela Islam; the radical Islamic Defender’s Front),Footnote 9 during a demo against the said governor six months earlier (BBC News Indonesia, 2017).

The same chorus-based chants can also be heard in other social movements with different purposes, such as in street protests by groups of mahasiswa against the Criminal Code Bill issued in 2019,Footnote 10 in the previously mentioned incident in Surabaya, which involved a “siege (verbal) attack” to cast out the people of Papua, and in a gathering of a radical right-wing group who called for a revolution by promoting the establishment of a caliphate system in secular Indonesia.Footnote 11 It seems that textual phrases representing each social movement’s goal are set to the rhythmic melody of a familiar song so that participants will be more ready to join in unison.

To sum up, although these social movements consist of different group members, are organized by different leaders, and have different repertoires and agendas, the same chorus is used consistently, with its original lyrics replaced by phrases that suit each movement’s goal. This leads to the conclusion that the chorus-based chant serves as a case of strongly coupled and integrated material tools for emoting (see A in the “Situated Cognition and Affectivity” section), where an assembly of intra- and extracranial tools enables and/or facilitates its repeated occurrence. More importantly, the repeated use of the chorus-based chant represents a “shared (demo) culture” among Indonesian protesters (not exclusive to student activists) that has been maintained and practiced collectively for more than two decades.

Historically-Scaffolded Chants

Sumpah Pemuda (i.e., Youth Pledge) is an oath that embodies the spirit of Indonesian youth collectivism and solidarity, which has been enmeshed with the nationalism of Indonesian youths (Sebastian et al., 2014). Originally a pledge, it occurs in social movement repertoires in much the same way as a typical repeat-after-me type of chant: a leader recites it line by line, followed by the audience, who verbalize the solemn pledge in unison. This historically significant pledge has been present in pemudas’ movements over generations. Its origin can be traced back to the Youth Congress (the Congress of Indonesian Youth) on 28 October 1928, where diverse young intellectual groups of the 1928 generation (known as Jong Java, Jong Sumatranen Bond, Jong Ambon, and other ethnicity-based pemudas’ groups) declared the Sumpah Pemuda: “We have one homeland, that is, the Indonesian homeland. We are one nation, that is, the Indonesian nation. We have one language, that is, the Indonesian language” (Foulcher, 2000: 403). With its recitation, ethnicity was set aside for the first time in favor of the broader concept of nationalism. The idea of Indonesian youth solidarity gradually emerged, enmeshed, and extended alongside the idea of Indonesia as a nation (Sebastian et al., 2014). However, the symmetry of the formulation of this “sacred pledge”Footnote 12 has been maintained in all of its variations since its first declaration in 1928, even as it has been customized to suit anti-colonial, anti-imperial, and Old and New Order needs (Foulcher, 2000). In essence, the Sumpah Pemuda puts emphasis on unity: “One homeland, One nation, One language” (377).

There is another variation of the Youth Pledge, known as Sumpah Mahasiswa Indonesia (i.e., Nationalist Student Oath). It was also chanted during the street protests to overthrow Suharto and his New Order in 1998, as documented by Saroengallo (2002, 9:52–10:34).Footnote 13 This version reads, “We have one homeland, that is the homeland without oppression. We have one nation, that is the nation that loves justice. We have one language, that is the language without lies.” Its verses and the way it is recited clearly resemble those of its predecessor. From one generation to another, this pledge has been used to show pemudas’ nationalism and their spirit of unification despite differences of ethnicity, language, belief, and culture. Later, “[a]fter the fall of Suharto, a generational identity was conferred upon pemudas of Generation 98, the Reformasi generation, which in turn codified personal and individual memories” (Lee, 2016: 11). It is as if the spirit of nationalism were conveyed through the pledge throughout decades of pemudas’ struggle, so that when it is chanted, it evokes feelings associated with the history behind it that might otherwise fade over time (cf. Eyerman & Jamison, 1998a, b). Furthermore, even today, Sumpah Pemuda can still be heard in student activists’ demonstrations outside its annual commemoration on 28 October. One documented occasion on which student activists chanted this pledge was in September 2019.Footnote 14 One can tell from the video how zealously the participating mahasiswa recite the pledge in unison, channeling a spirit that may not be felt by those who neither take part in the struggle nor identify with the nationalism of Indonesian pemudas. To further illustrate, here is Lee’s account:

Many of the demonstrations I observed (from 2003 until 2009) ended with the Youth Pledge [where] the massa snapped into formation, raised their fists forward toward the sky ... and lustily recited the oath they knew by heart. The mostly male and young activists around me would notice my reticence in those moments and nudge me to raise my fist, sometimes aggressively insistent that I get in line with the others, ignoring my discomfort ... who ... could not quite mimic the heartfelt declaration of pledging the self to the nation. (ibid., 95)

We can interpret that anyone could join the chanting provided they know the verses, yet their overall affective experience would arguably differ from that of those who identify themselves with the pemudas generations. The chanting of the pledge scaffolds affective experiences for those coupled with its historical significance.

A Rallying Cry

Returning to Lee’s report on the demonstration in the commemoration of the Trisakti Tragedy, in which the activists got dispersed due to unbearable heat, one may not expect that it ended well. Nonetheless, as a matter of fact, it did. In an attempt to rectify various failing situations which happened during the demonstration, one of the activists climbed up the command truck that served as a podium, took the role of an orator, and began reciting a particular poem in a slogan-like form, which eventually “brought the dispersed demonstrators together again … All the demonstrators, young and seasoned alike, knew which script to follow to appear united, strong, and resistant … right until the final moment of dissolution” (Lee, 2016: 84).

This poem, titled “Peringatan” (i.e., Warning), is a favorite among leftist activists (ibid.). It was composed by Wiji Thukul (born 1963; missing since 1998), an activist who was persecuted by the government during Suharto’s New Order and who remains missing to this day. The poem captures both the tyrannous era of Suharto and the “semangat perlawanan” (i.e., Widerstandsgeist; the spirit of resistance) against it. It starts with a warning for the ruling class about their subjects’ gradual desperation; its rhythm slowly builds people’s anticipation for the climactic conclusion of inevitable resistance; and the ritual effect of the poem is meant to lead the demonstrators to unite, so that “[d]uring the demonstration, all activists are expected to say [the] words in chorus, ending with the powerful final line: ‘Maka hanya ada satu kata: Lawan!’” (i.e., “For there is only one word: Resist!”; ibid., 83, emphasis added). Lee notices that the phrase was repeated over and over again in the archived student documents, starting from the mid-1980s and lasting for almost two decades. “Satu kata: Lawan!” (i.e., One word: Resist!) was also added as the final line following the triplet formula of the student oath of Generation '89. It “appears as the epilogue to secret reports, in closing statements on leaflets, in demonstrations as a rallying cry” (ibid.).

Peringatan

Warning

Jika rakyat pergi

If people leave

Ketika penguasa pidato

While the rulers are speaking

Kita harus berhati-hati

We must beware

Barangkali mereka putus asa

Perhaps they are desperate

Kalau rakyat bersembunyi

If the People hide

Dan berbisik-bisik

And whisper

Ketika membicarakan masalahnya sendiri

When talking about their troubles

Penguasa harus waspada dan belajar mendengar

The rulers must be alert and learn to listen

Bila rakyat berani mengeluh

If the People dare to complain

Itu artinya sudah gawat

That means it’s terminal

Dan bila omongan penguasa

And if the ruler’s speech

Tidak boleh dibantah

Cannot be challenged

Kebenaran pasti terancam

The truth must be in jeopardy

Apabila usul ditolak tanpa ditimbang

If suggestions are refused without consideration

Suara dibungkam kritik dilarang tanpa alasan

Voices silenced, criticism outlawed without reason

Dituduh subversif dan mengganggu keamanan

Accused of subversion and disturbing the peace

Maka hanya ada satu kata: Lawan!

Then there is only one word: Resist!

The Poem “Peringatan” by Wiji Thukul, written in Solo in 1986 (Rishanjani et al., 2019: 62).

Translation into English is taken from Doreen Lee (2016: 82–83).

Particularly interesting is the last word, “lawan!”, which is the most anticipated word during the recitation of the poem. “Lawan” can indeed be translated into English as “resist.” Nevertheless, the word “resist” arguably does not embody the sociocultural-historical and psychophysiological components that constitute “lawan,” nor the meanings related to each of those components. To clarify, the word “lawan” in Indonesian is closer to the word “fight” in English. “Lawan!” is the imperative form of a verb with multiple meanings, such as to fight back, counter, oppose, and combat. Moreover, when the cultural context in which it is embedded is taken into consideration, the word, especially during the New Order, symbolizes a kind of mental revolution.

While Indonesian society comprises numerous ethnic groups, the Javanese are the largest, making up 41 percent of the total population. In the late twentieth century, the political culture of the Indonesian government, including the officers’ corps, was dominated by paternalistic rule reflecting Javanese cultural values (Santoso, 2012). From the perspective of Javanese culture, “[t]he whole of society should be characterized by the spirit of rukun [i.e. harmonious] … its behavioral expression in relation to … superiors is respectful, polite, obedient, and distant …” (Mulder, 1978: 39). Its values involve loyalty to the top level of the hierarchy, obedience to superiors, and the desire to avoid conflict, so that Javanese children are raised to regard expressing disagreement or overt emotions as intolerable, especially towards older people (Santoso, 2012). Hofstede (1993) added that hierarchical relationships are more readily observed in Asian cultures than in Western cultures; such relationships are strongly embedded in the strictly hierarchical Javanese society (Dean, 2021), as they ensure that the members of the society learn their position and responsibility within the social structure. This also extended to the government, where power was exerted through a paternalistic bureaucratic state.

Put in the Indonesian cultural context, “lawan” can often be heard as part of a phrase that questions one’s disobedience when one is being scolded or during a dispute, mostly coming from senior members of the society, authoritative figures, or even one’s own parents, such as “kamu berani me-lawan saya?” (i.e., you dare to defy me?). It is not an exaggeration to say that it is normatively “forbidden” for the younger generations to act out, raise their voice, talk back, stare, “give an attitude,” or show any sign of hostility towards those who are older or deemed to have seniority over them. Consequently, showing any sign of per-lawan-an (i.e., resistance) may bring punishment or treatment harsher than what one would receive by suppressing it. Yet me-lawan (i.e., to resist) describes what happens when younger generations decide to actively resist oppressive treatment, supposedly because they can no longer tolerate it. Arguably, the same spirit can be seen on a national level, especially in times of colonialism and also “during the New Order, when soldiers … intimidated and even beat students who dared to mouth back to them”.Footnote 15 This is why the word “lawan” can be argued to have a sociocultural-historical meaning beyond its “naked” semantics.

During the recitation of the poem, there are also individuals who mouth the words [silently] before the poem reaches its climax (Lee, 2016: 83); arguably, they are anticipating and gradually building up their emotions for the climactic conclusion. Moreover, the synchronous chanting of the final line of “Peringatan” is often accompanied by the activists raising their clenched fists, a symbol that expresses unity, strength, defiance, or resistance. The same gesture can be observed across cultures throughout history, often as a reaction to oppression.

Furthermore, one might realize how different it feels to vocalize the word “lawan” compared to “resist,” for one does not need to “resist” oneself when shouting the former as opposed to the latter. In anticipation, one’s mouth readily articulates the word as one’s lungs fill with air, allowing one to shout to one’s heart’s content, for the mouth and the throat are already wide open to produce the two syllables that constitute the word. Since the words that the chant includes are meaningful to the audience and express a shared agreement or position with the initiator (Dye, 2018), using the word “lawan” reflects who they are up against. The chanting of the word implicitly conveys the stance the activists are taking and expresses their shared emotion as a group. Taking these various perspectives into account, it is not surprising that the word equivalent to “fight” is often chanted in street protests, for it embodies various elements tied to novel forms of emotion for Indonesian people.

“Both the Sumpah Pemuda and Wiji Thukul’s dissident poetry [especially the One Word] continue to punctuate Indonesian demonstrations today, even long after Thukul’s disappearance by the state” (Lee, 2016: 42), and so do the chorus-based chants. The three make up the list of tools that organizers or activists use to evoke collective emotions while potentially recruiting memories related to the social, cultural, and/or historical components of the chants, among other constitutive components of collective chanting, thereby providing access to scaffolded collective emotional experiences. On the other hand, because synchronous chanting by its nature requires other people to join in, it can be argued that these chants might also be used as tools for invading the minds and affective lives of the chanters (i.e., the student activists). Synchronous chanting in Indonesian social movements as a case of affective mind invasion is the topic of the next section.

Chanting as Mind Invasion

The concept of “Mind Invasion” (Slaby, 2016), applied to social movements, can be understood through the transformation of newcomers as they engage in the social movement repertoire. The behavioral distinctions between newcomers and seasoned activists become evident as newcomers learn the actions and interactions of demonstrators, including attire, social interactions, responses to authorities, and ways of managing challenges within the movement (Lee, 2016: 75–76). Over time, they pick up how seasoned demonstrators act and interact with each other, how to dress properly for street protests, how to deal with social control agents, and how to cope with the adversities they encounter during the movement. Eventually, they also learn how displaying certain traits or behaviors may benefit them when relating to fellow demonstrators. For instance, dressing too formally may lead to alienation. Likewise, activists who are given the chance to lead the movement may feel proud or anxious, feelings that might be unknown to newcomers. However, when interacting with certain affective scaffoldings in certain situations, even seasoned activists can experience unintended emotions, being overcome by them regardless of whether they are aware of it or consent to it. Noting such situations, in which seasoned activists with years of experience became nervous before climbing onto a command truck to give a public speech during a demonstration, Lee (2016) writes:

Activists changed their body language the minute they climbed onto the command truck or clutched a microphone to make a speech. They pitched their bodies at a forward-leaning angle to shout into the crowd, raising their arms and jabbing their fingers in the air performatively ... In this soundscape of leadership resonating from the mobil komando [i.e. command truck], the broken, harsh, and high-pitched screams of orators represented a climax. They were signs that one was overcome with emosi (i.e. emotion), that an everyday voice could not carry their emosi (emotions, anger), which was so physical that sometimes people swooned and had to be helped off the truck. (ibid., 83)

In accordance with the concept of (affective) mind invasion, such environmentally scaffolded emotional experiences can potentially be mediated during an occurrence of synchronous chanting. I argue that chanting can be used as a tool for manipulating emoters (see B in the “Situated Cognition and Affectivity” section) especially because its nature allows access to distributed emotions among the chanters. In this section, I will discuss a case, in which affective mind invasion seems to apply during occurrences of synchronous chanting in Indonesian social movements. This case is selected to provide an overview of chanting as a case of affective mind invasion because it contains most cases of strongly coupled and integrated tools for emoting (see A in the “Situated Cognition and Affectivity” section) that have been discussed in the “Chanters-Chant Interactions” section. In what follows, I will analyze various means of affective mind invasion in this particular demo.

Affective Manipulation

One particular demonstration carried out by Indonesian student activists against Revisi RUU KPK (i.e., Corruption Eradication Commission Policy Revised-Draft) in 2019 was documented by an amateur and uploaded to an online video platform (Footnote 16). The demonstrators, who called themselves PMII (Pergerakan Mahasiswa Islam Indonesia, i.e., Indonesian Muslim Undergraduate Student Movement), consisted of student activists from Muslim universities on Java Island, wearing their university attributes and carrying their university banners. They began their protest with a rally and proceeded until they arrived in front of the KPK building. The organizing student activists stood on a stage. Only one of them held a handheld loudspeaker. Between the activists and the building stood police forces serving as social control agents.

Chronologically, the protest can be summarized as follows:

(i) The student activists started their protest by throwing something that looked like a pebble as a symbol of their mistrust towards KPK. Suddenly someone in the crowd initiated a chorus-based chant, “Usir! Usir! Usir KPK! Usir KPK sekarang juga!” (i.e., Cast out! Cast out! Cast out KPK! Cast out KPK right now!). The invitation led the activists to jointly produce the chant several times until the organizer asked them to keep calm. The organizer then asked his comrades to listen to him, and eventually they obeyed.

(ii) The organizer then told the policemen that the crowd was under his command, saying “I can manage my comrades,” to which the audience cheered and applauded in agreement.

(iii) A moment later, one police officer walked into the crowd and the organizer pointed out that he might do something fishy. Without any command from the organizer, some of the people began chanting repeatedly, “Hati-hati! Hati-hati provokasi!” (i.e., Beware! Beware of provocation!), based on another familiar song, to which the others responded.

(iv) The chant stopped when smoke rose from the center of their rank and began to spread in the area between the two parties. Someone must have burned something on the ground (Footnote 17), and the policeman was just trying to extinguish it. Because of that incident, a clash broke out.

(v) To end the dispute between the student activists and the police forces, the organizer chanted a particular Muslim chant to calm them down. It sounded soothing, especially for the activists, who were mostly Muslims. It did not take long for the audience to collectively join the chant led by the organizer, and thus the confrontation was over.

(vi) However, the smoke spread once again. It was more intense than before, and the student activists once again clashed with the policemen. Some tried to start the soothing Muslim chant again, but apparently the crowd did not “catch on” to the chant since they were too busy fighting violently with the police.

(vii) The organizer tried his best to calm the situation, yet he failed. Before the situation got worse, some activists began to chant the previous phrases again, “Hati-hati! Hati-hati provokasi!”, followed by other demonstrators, and thus the physical confrontation ended.

(viii) By the time the clash was over, the fire they had started began to burn amidst the activists, and the organizer began to cue them to sing “Lingkaran besar, lingkaran besar ...” (Footnote 18) (i.e., large circle, large circle), while one of the organizers was seen making a “circle” in the air with a hand gesture. The audience joined the cheerful song soon after being prompted by the organizer, and they collectively continued the song by themselves while jumping in the air and raising their arms energetically.

(ix) Later the fire burned brighter and forced the student activists to step back, creating a distance between them and the fire in front of their rank. The organizer used this chance to instruct his comrades to assume a formation and raise their left hands as a symbolic gesture of their per-lawan-an (i.e., resistance), and shouted, “Sumpah Mahasiswa Indonesia!”, leading the activists to recite the 1998 version of the Student Pledge wholeheartedly until, in the end, they chanted synchronously and repeatedly, “Hidup mahasiswa!” (i.e., Long live students!), while the fire burned even brighter and produced a smoke cloud that covered a large area between the activists, the KPK building, and the police forces.

(x) The crowd went into a state of chaos again when members of the police force tried to put out the fire. However, the fire grew larger and the situation grew even more chaotic. The organizer then tried his best to calm his comrades by repeatedly asking them nicely. Seeing that his attempt was in vain, the organizer began to sing the song “Lingkaran” again to reduce the chaos, but the crowd had begun to chant something entirely different.

(xi) The student activists zestfully and synchronously chanted the chorus-based chant, “Bakar! Bakar! Bakar KPK! Bakar KPK sekarang juga!” (i.e., Burn down! Burn down! Burn down KPK! Burn down KPK right now!). Considering that the demonstrators were no longer amicable, the police ended the demonstration by sending reinforcements while the rest of them tried to extinguish the fire. The combined police forces effectively forced the student activists to disband.

Several chanting occurrences during the demonstration can be identified and further analyzed. Viewed through the lens of situated affectivity, it can be argued that synchronous chanting as a case of strongly coupled and integrated material tools for emoting (see A in the “Situated Cognition and Affectivity” section) occurred in (i), (iii), (v), (vii), (viii), (ix), and (xi). These occurrences allowed the student activists to “jointly instantiate an affective atmosphere” (Stephan, 2018: 616). In these cases, activists were responding to certain emotionally charged chanting invitations, allowing the chant to be distributed to more participating chanters (i.e., student activists), who reacted reciprocally in return. Following Griffiths and Scarantino (2009: 438–440) and Stephan et al. (2014: 76–78), these transiently growing interactions between chanters allow distributed emotions to be realized and shaped over time until the chanting ceases.

Supposing that each activist had her/his own intentionality despite coming as part of a group of demonstrators, chanting allows their intentionality to synchronize to some extent. Moreover, the nature of synchronous chanting ensures that facilitating cognitive mechanisms (e.g., mutual monitoring, memory, and turn-taking mechanisms) are recruited during this process. Accordingly, their attention and their somatic and affective states are among the many aspects of the activists’ overall being that are likely to converge once they decide to participate in collective chanting. As the number of activists who join the synchronous chant increases, their synchronicity, volume, intensity, and energy level also rise over time to match the current emotional atmosphere, before the chanting gradually ceases or is replaced with another chant (see ix, where a transition from the Youth Pledge to the “Long Live Students” chant takes place). In these occurrences, chanting can apparently help the group of demonstrators turn a rampaging atmosphere into a soothing one (see v), elevate their moods (see viii), circulate a word of warning within their ranks (see iii and vii), and convey dismissal (see i) and/or aggression (see xi) towards their out-group. This supports the claim that occurrences of synchronous chanting facilitate collective affective experiences among the chanters.

At the same time, it can be argued that chanting also serves as a tool for manipulating emoters (see B in the “Situated Cognition and Affectivity” section). In (v), (viii), and (ix), collective chanting was deliberately and successfully initiated by the organizer, while in (i), (iii), and (vii), occurrences of synchronous chanting were triggered by non-organizers. It can be argued that these invitations serve as attempts to use chanting as a tool for manipulating the demonstrators’ emotions. These chanting invitations were “deliberately launched” and, when caught on, they modified the “attitudes and the emotional set up of [the] group” (Stephan & Walter, 2020: 308), though whether the modification took place diachronically or not is still open for discussion.

In the case of the chorus-based chant, although it has kept the same melodic pattern for more than two decades, the contents of the “lyrics” keep changing according to the message the demonstrators want to propagate in each social movement repertoire. Note that the chorus-based chant is generally used for inviting others to act (as the original “Menanam Jagung” chorus does), but not necessarily or exclusively for expressing one particular emotion. In other words, the emotional contents of each message conveyed through the chorus-based chant may differ, depending on the intention of the message and the context in which it is initiated. For instance, one chorus-based chant may convey disgust in one protest, but anger in another. Likewise, the affective modifications that the chant facilitates during a demonstration take place synchronically, per occurrence.

By contrast, the collective chanting of Sumpah Pemuda allows the student activists to experience the spirit of youth collectivity, solidarity, and nationalism tied to the nation-building efforts (Sebastian et al., 2014). Given that undergraduate student activists have been diachronically enculturated before embracing activism during their college years, the Youth Pledge in particular may have become entrenched in Indonesian student activists throughout their elementary and secondary school years, when the history of the pemudas regarded as national heroes is taught, promulgated, and commemorated, especially on October 28. Even when the verses were changed, the spirit of unity encapsulated in each version of Sumpah Pemuda (e.g., the Generation 98’s version) unfolds when the pledge is chanted in unison, regardless of the demonstration’s agenda, as Lee’s account suggests (2016: 95). The ritualized reiteration of the “sacred” pledge of the youth then serves as the affective scaffolding (i.e., mind invasion) that affects student activists in the demonstration in question, especially newcomers. In this way, student activists can feel as if they were representing the spirit of the pemudas who fought for the nation. This is in line with the function of “synchronic tools” that allow their users to access “a larger set of diachronic enculturation processes” (Stephan & Walter, 2020: 308). In this respect, the chanting of the Youth Pledge is used not only for emoting but also for manipulating student activists synchronically during the demonstration, giving access to novel experiences associated with the youth archetype embedded in Indonesian pemudas’ nation-building movements.

It is worth mentioning that, most of the time, the chanting of certain phrases was started intentionally. Inviting others to chant together is a deliberate action, and this holds true in the context of social movement repertoires such as demonstrations and street protests. It is the initiator’s conscious decision to choose a compact phrase or statement that represents the collective thought of the group (cf. Dye, 2018: 26). In this regard, the invitation to join in collective chanting is an act of active manipulation. Following Stephan and Walter (2020), the initiator first projects her/his intentionality outward into the environment in the form of a chanting invitation, which in turn reaches inward into other individuals. The change in direction occurs when the chant is caught on by other chanters, who then reciprocally contribute to an occurrence of chanting and temporarily set off self-stimulating loops of affective experiences among themselves. In some cases, the organizer might even invite others to chant using a specific cue word, phrase, or familiar tune precisely because she or he knows that such an invitation will trigger others to respond with synchronous chanting. This practice can be seen during speeches in political campaigns, demonstrations, and street protests, where tools that elicit emotional contagion are often used to influence the collective attitudes and behaviors of the participants. While speeches may influence the emotional repertoire of others, Dye (2018) suggests that a statement spoken by the organizer and how it is framed may bear less importance than if the organizer “mention[s] at least one ‘hot button’ word or phrase … that ma[kes] the audience feel a surge of emotion towards the [organizer] and their fellow group members” (ibid., 49).
In summary, we can see that initiating an occurrence of synchronous chanting is not a mere act of expressing oneself, but it counts as an attempt to intentionally manipulate others to experience certain emotions conveyed through the chants.

Accidental Affective Mind Invasion

On the other hand, there are also cases in which synchronous chanting failed despite being prompted by the organizer. In (vi), (x), and (xi), the organizer’s manipulation attempts were not effective enough to alter the activists’ affective repertoire, which suggests that his claim in (ii) did not hold. The most interesting development occurred in (xi), when the fire, as a structure in the environment that reached into the chanters’ minds, contributed to the process of making up a chant and led them to collectively express violent intent. Several attempts to get the activists to calm down, or even to start a non-violent chant, were in vain. The organizer could do nothing when the rest of the activists started shouting a chant that conveyed their intention to burn down the target of their protest. It is therefore not far-fetched to say that mind invasion played a significant role in (xi), where structures in the environment scaffolded the student activists’ minds to the point that they started a chorus-based chant whose message fitted, and contributed even more to, the chaotic atmosphere they were already in. In a way, we can see how the unified and rowdy demonstrators “welcomed” the flame and the heat it emitted as proper structures that invaded their minds. Subsequently, a chorus-based chant containing a violent invitation to act filled the air as their fervor was set ablaze, matching the fiery atmosphere. While the invitation to start the chant was itself intentional, the message it contained is arguably a product of the situation, in which the student activists’ minds were influenced by accidentally set mind-invading structures (i.e., a small fire that grew bigger). The flame may only be a cue, but that does not change the fact that it contributes to the demonstrators’ minds in a way that is difficult to ignore.
In other words, the structure in the environment invades their minds and is assembled in the process of emoting. Taken together, we can see how synchronous chanting occurs and modifies the emotional set up of the student activists in real-time during a protest.

Conclusion

Spanning thousands of years and woven into various social rituals, synchronous chanting serves as a powerful tool for evoking emotions and influencing behavior, particularly in the context of social movements. Through the lens of SCAff, this paper has explored how the interactions between activists and proper environmental scaffoldings during an occurrence of synchronous chanting contribute to distributed affective phenomena among participating chanters that are otherwise inaccessible to uncoupled individuals. The analysis of Indonesian social movements further demonstrates the enduring role of synchronous chanting as a tool for emoting, stretching across social movements, historical periods, and diverse activist groups. Ultimately, this study sheds light on the multifaceted nature of chanting, highlighting its potential not only for emoting but also for manipulation within the complex landscape of social movement repertoires. Lastly, it has been shown that environmental scaffolds might invade the minds of potential chanters, whether launched deliberately or triggered accidentally.