Embodied meaning

In this paper we introduce a social semiotic framework for analysing paralanguage. Our approach was inspired by Chris Cléirigh’s contributions [Footnote 1] to New South Wales Youth Justice Conferencing research consolidated in Zappavigna and Martin (2018) and Martin and Zappavigna (2018). [Footnote 2] Cléirigh’s work drew on Matthiessen’s synopses (Matthiessen 2004, 2007, 2009) of Systemic Functional Linguistic (hereafter SFL) research on early child language development. Following Matthiessen (2009) we use the term paralanguage to refer to gestural resources arranged along what McNeill (1992) christened ‘Kendon’s Continuum’ (gesticulation, pantomime and emblems), along with the vocalisations outlined in van Leeuwen (1999) not usually included in linguistic descriptions of the segmental and prosodic phonology of spoken language (timbre, tempo, tension, pitch range etc.). In this paper, however, we will consider only gestural systems.

There are of course many ways to classify gestural resources. Kendon (2004: Chapter 6) provides a thorough historical survey. The most useful vantage point from which to compare classifications is Kendon’s Continuum. The introductory chapters in McNeill (2000a, 2000b, 2012) include clear presentations of the model outlined in Fig. 1 below (taken from Sekine et al. 2013). We will cross-reference our work to this model as we present our framework, setting aside the sign languages of deaf communities (ASL, BSL, Auslan, LSF etc.) since these are languages in their own right (themselves involving paralanguage; Johnston 2018).

Fig. 1 Kendon’s Continuum (as rendered in Sekine et al. 2013)

Following on from Martinec’s ground-breaking SFL inspired studies (Martinec 1998, 2000a, 2000b, 2001, 2004, 2008; see also Muntigl 2004) we will organise our description around the kinds of meaning being made – in SFL terms the trilogy of ideational, interpersonal and textual meaning. [Footnote 3] Ideational meaning involves resources for construing reality, interpersonal meaning involves resources for enacting social relations and textual meaning involves resources for managing information flow. Textual meaning corresponds roughly to beats and pointing/deictics in Fig. 1; ideational and interpersonal meaning involve both iconic and metaphoric gestures. The correspondences with Kendon’s distinction between representational and pragmatic functions of gestures are outlined in Table 1 below. Ideational gestures would be representational in his terms; and his pragmatic gestures (defined as not referring to referential or propositional content) include both interpersonal and textual functions.

Table 1 SFL metafunctions (ideational, interpersonal and textual meaning)

There are a number of reasons why our SFL interpretation of paralanguage is timely. One has to do with the explosion of SFL inspired work on modalities other than language triggered by Kress and van Leeuwen’s (1996, 2006) monographs, which focused on single static images. As reviewed in Martinec (2005), O’Halloran et al. (in press) and Taylor (2017) this work has now been extended to the study of diagrams, PowerPoint slides, webpages, comics, picturebooks, moving images, sound and music, architecture, sculpture, toys and behaviour. Since so many texts involve one or more of these modalities, it is advantageous when studying inter-modal relations to be able to draw on descriptions informed by the same theoretical principles. The concept of metafunction introduced above for example allows us to compare like with like as far as convergence and divergence of meaning across modalities is concerned (Painter et al. 2013). Paralanguage is so closely coordinated with spoken language and so regularly implicated in inter-modal texts of several kinds that the utility of a common metalanguage is clear.

Alongside theoretical integration, SFL is particularly well-suited to the study of paralanguage in a number of ways. One is that it provides a linguistically informed model of prosodic phonology (Halliday 1967, 1970; Halliday and Greaves 2008; Smith and Greaves 2015; van Leeuwen 1992; Martinec 2002) which can be used to make explicit the coordination of rhythm and intonation in spoken language with beats and strokes in gesture. This has facilitated Martinec’s development of Kendon’s early work (Kendon 1972) in this area, taking into account later work by Tuite (1993). We will in fact suggest that SFL’s tone group, analysed for rhythm and tone, provides an essential unit of analysis for work on paralanguage as far as questions of synchronicity across modalities are concerned.

Another advantage of SFL is the clear distinction it draws between paradigmatic and syntagmatic relations (system vs structure in SFL terminology). As is well-attested, there is more variation in the language structures realising systemic options than in the underlying systems themselves (Caffarel et al. 2004). This is even more true when comparing the structural realisation of systems from one modality to another. Kendon’s (2004: 186–187) well-known example of the different trajectories of the gestures accompanying “sliced the wolf’s head off” vs “sliced the wolf’s stomach open” illustrates this point. The swinging arm motions are very different structurally from the clause structures in play; but from the perspective of system, the oppositions in meaning are comparable. [Footnote 4] Systematically separating system from structure is crucial when comparing and contrasting modalities.

We also feel that further development of Martinec’s pioneering modelling is timely in light of theoretical and descriptive developments in SFL since his work. This has mainly to do with a clearer articulation of the stratification of language as levels of phonology, lexicogrammar and discourse semantics (e.g. Martin 2010, 2011, 2014; Martin and Rose 2007). Martinec’s work draws largely on Halliday’s lexicogrammatical systems (those proposed in Halliday 1985), the same systems which inspired Kress and van Leeuwen’s (1996) breakthrough. We have found it illuminating to further develop this work by drawing on ideational, interpersonal and textual systems at the level of discourse semantics (ideation, connexion, negotiation, appraisal, identification and periodicity). Work on appraisal (the language of evaluation) in particular (Martin and White 2005) has a number of ramifications for models of paralanguage, especially in relation to the relative marginalisation of these resources in canonical work by Calbris (2011), Kendon (1997) and McNeill (2006).

In this paper we accordingly proceed as follows. In section “Language development (ontogenesis)” we briefly review SFL research on early child language development. We then move on to draw a distinction between behaviour (somasis) and meaning (semiosis), outlining our current framework in sections “Non-semiotic behaviour (somasis)” and “Embodied meaning (semiosis)”. As noted above, for this framework we adopt the term paralanguage to refer to semiosis dependent on language and realised through both sound quality and body language (including facial expression, gesture, posture and movement). We close with a brief discussion of the relations among language, paralanguage and other modalities of communication.

Language development (ontogenesis)

SFL research on language development in young children is a useful starting point for work on paralanguage in two respects. On the one hand, the emergence of the first signs (protolanguage) highlights the issue of what counts as semiosis and what does not. On the other, the realisation of these first signs is multimodal – linguistic and paralinguistic resources are not differentiated at this stage.

Matthiessen (2004, 2007, 2009) reviews the emergence of language and other semiotic systems based on SFL studies of child language development by Halliday (1975, 2003), Painter (1984, 2003) and Torr (1991). These studies show that language develops out of a protolinguistic system in which children draw on sounds, facial expressions and gestures to enact signs. With the emergence of language proper however, these resources become specialised in distinctive ways. Segmental articulation and prosody (rhythm and intonation) are marshalled as the phonology of spoken language. [Footnote 5] But vocal resources such as timbre, tempo, tension and loudness (explored in detail in van Leeuwen 1999) continue as expressive resources, often referred to as sound quality. And gesture, posture and facial expression develop as resources often referred to as body language. As Matthiessen points out, sound quality and body language are then coordinated with language as texts unfold:

...certain interpersonal contrasts in language are realized vocally by contrasts in tone (pitch movement) accompanied by facial contrasts involving eyebrow movements; textual contrasts in deicticity are often accompanied by pointing gestures; talking to babies may involve rounding, pouting lips – a feature that affects the sound but which is also visible; and as detailed studies have shown, there is a complex relationship between addressing somebody in language and gazing at them. (2007: 6–7)

In this paper we will follow Matthiessen (2009: 21–22) in referring to the resources of both sound quality and body language as paralanguage.

Non-semiotic behaviour (somasis)

One basic challenge that has to be faced when working on paralanguage is how to distinguish it from behaviour – separating semiosis from non-semiosis in other words. [Footnote 6] This is of course the challenge faced by specialists in ontogenesis as they track the emergence of protolanguage out of pre-linguistic interaction, as explored by Trevarthen (2005). For Halliday and Painter the key criteria are that i. the act in question is interpretable as one of a system of content/expression pairs (i.e. signs with valeur), and ii. the act in question is used on a number of different occasions [Footnote 7] (i.e. not simply iterated in a single interaction). Halliday (1984/2003: 240) for example notes three signs oriented to action in his son’s protolanguage at 8 months of age (Fig. 2). The signs are constituted as the following content/expression pairs: ‘I want it/grasp firmly’, ‘I don’t want it/touch lightly’ and ‘do that with it/touch firmly’. The contrast between semiosis and non-semiosis is evident here, perhaps most clearly in the contrast between touching something lightly (semiosis) and pushing it away (non-semiosis). In Peircean terms [Footnote 8] we might say that the semiosis symbolises the intention of the speaker while the non-semiosis indexes it.

Fig. 2 Early protolanguage (action systems); from Halliday 1984/2003: 240, Fig. 9

From this point on we will use the term somasis for non-semiotic behaviour, and semiosis for systems of signs. As far as somasis is concerned we have found it useful to draw on Halliday’s proposals for an evolutionary typology of systems (Halliday 1996: 388, Halliday 2005: 67–68). He recognises four orders of complexity, with semiotic systems evolving out of social systems, social systems out of biological ones and biological ones out of physical ones. We have adapted this framework in our classification of somatic behaviour, distinguishing physical activity, biological behaviour and social communion. Physical activity covers material action involving some change in the relationship of one physical entity to another (walking, running, jumping, throwing, breaking, cutting, digging, pulling etc.). Biological behaviour can be divided into changes that restore comfort (sneezing, coughing, scratching, laughing, adjusting garments or hair etc.) and changes that index discomfort (nail biting, fiddling, fidgeting, wriggling, blushing, shivering, crying etc.). Social communion can be divided into mutual perception (sharing gaze, pitch, proximity, touch, smell etc.) and reciprocal attachment (tickling, cradling, holding hands, hugging, stroking, kissing, mating etc.). These proposals are outlined in Fig. 3.

Fig. 3 A model of behaviour (somasis)
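For readers who annotate video data computationally, the classification of somatic behaviour described above can be encoded as a simple lookup structure. The following is a minimal sketch only: the dictionary layout, the subtype label "material action" and the function name `classify` are assumptions made for illustration, and the lists contain just the examples named in the text (each of which ends in "etc." and is therefore open-ended).

```python
# Illustrative encoding of the somasis classification described in the text.
# The structure and names below are assumptions for this sketch, not part of
# the framework itself; the example lists are deliberately incomplete.
SOMASIS = {
    "physical activity": {
        "material action": [
            "walking", "running", "jumping", "throwing",
            "breaking", "cutting", "digging", "pulling",
        ],
    },
    "biological behaviour": {
        "restoring comfort": [
            "sneezing", "coughing", "scratching", "laughing",
            "adjusting garments or hair",
        ],
        "indexing discomfort": [
            "nail biting", "fiddling", "fidgeting", "wriggling",
            "blushing", "shivering", "crying",
        ],
    },
    "social communion": {
        "mutual perception": [
            "gaze", "pitch", "proximity", "touch", "smell",
        ],
        "reciprocal attachment": [
            "tickling", "cradling", "holding hands", "hugging",
            "stroking", "kissing", "mating",
        ],
    },
}


def classify(behaviour):
    """Return (order, subtype) for a listed behaviour, or None if unlisted."""
    for order, subtypes in SOMASIS.items():
        for subtype, examples in subtypes.items():
            if behaviour in examples:
                return (order, subtype)
    return None


print(classify("scratching"))
print(classify("blushing"))
```

A lookup of this kind only records the somatic reading of a behaviour. Whether a given instance has been taken up as a sign still has to be decided in context.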

Trained as we are as linguists and semioticians we are not ourselves in a strong position to further develop this model. [Footnote 9] But we have found it useful to try to compile a range of behaviours that border on semiosis and which can be interpreted by social semiotic animals as indexing purposeful activity. As Halliday and Painter have shown, early protolinguistic semiosis involves a reconstrual of some of these activities as the expression face of signs. And all of the behaviour outlined above has the potential to be used as signs – for example stamping one’s foot in frustration, coughing to remind a meeting of one’s presence, shivering to indicate one is cold, sniffing to object to an odour, kissing on the cheek as a greeting and so on. In these cases there is some degree of deliberation involved, as manifested in the fact that the behaviour will synchronise with the prosodic phonology and turn-taking structure of an interaction and will be responded to as meaningful by co-participants. [Footnote 10]

The model of non-semiotic and semiotic behaviour we have developed to this point is outlined as Fig. 4.

Fig. 4 Somasis and semiosis

Embodied meaning (semiosis)

In their work on intermodal relations in children’s picture books (Painter and Martin 2012; Painter et al. 2013) Painter and her colleagues suggest a model involving degrees of convergence between verbiage and image. The model is organised by metafunction – degrees of concurrence for ideational meaning, degrees of resonance for interpersonal meaning and degrees of synchronicity for textual meaning (for illustrative text analysis see Martin 2008, Painter and Martin 2012). The relevant terminology is presented in Table 2 below.

Table 2 Convergent verbiage/image relations in children’s picture books

We have drawn on this terminology to deal with two dimensions of the relation between language and paralanguage introduced by Cléirigh as ‘linguistic body language’ and ‘epilinguistic body language’. [Footnote 11] The basic distinction here is between paralanguage that is in tune with (resonance) or in sync with (synchronicity) the prosodic phonology (i.e. rhythm and intonation) of spoken language on the one hand and on the other paralanguage that expresses meanings made possible by having language – in Cléirigh’s terms linguistic vs epilinguistic body language respectively. We have preferred a more transparent terminology, derived from Table 2, with phonologically convergent paralanguage referred to as sonovergent and semantically convergent paralanguage as semovergent. This revised terminology is outlined in Table 3.

Table 3 Sonovergent and semovergent paralanguage

Sonovergent and semovergent paralinguistic systems will be introduced in turn below, drawing on examples from a YouTube video titled ‘Let’s Talk. | Random Chatty Vlog’, used here with the presenter’s permission (https://youtu.be/YRx-zDoPbVw). A ‘vlog’ (derived from ‘blog’ [Footnote 12]) is a video in which a user recounts, or presents, some form of personal activity (e.g. a ‘day in the life’ vlog where the user shows highlights from their activity over a day). The following is the description accompanying the video posted by the vlogger:

Grab a cup of coffee and a snack. Let's just sit down and talk today. I chat about annoying people who follow me in the parking lot, my kids begging for food all summer, my hair, feet, Youtube...etc. I have no trouble rambling on. If you like this, PLEASE give this video a thumbs up so I know! I want to know what you guys like seeing. Thank you for watching! Subscribe so you don't miss another video. I post every Monday, Tuesday and Thursday at 2pm EST.

A full transcription, including intonation [Footnote 13] analysis, of the hair dye episode of this vlog, from which we take most of our examples, is provided as Appendix 1.

Sonovergent paralanguage

Sonovergent paralanguage converges with the prosodic phonology of spoken language (Halliday 1967, 1970; Halliday and Greaves 2008; Smith and Greaves 2015). From an interpersonal perspective, it resonates with tone and involves a body part (e.g. eyebrows or arms) moving up and down in tune with pitch movement in a tone group (tone and marked salience). From a textual perspective it involves a body part [Footnote 14] (e.g. hands, head) beating in sync with the periodicity of speech [Footnote 15] – which might involve beats aligned with a salient syllable of a foot, the tonic syllable of a tone group, or a gesture co-extensive with a tone group (i.e. in sync with rhythm, tonicity or tonality respectively). An outline of this sonovergent paralanguage is presented in Table 4.

Table 4 Sonovergent paralinguistic systems

The phonological system of tone is realised through pitch movement. In example (1) the vlogger’s eyebrows move up in tune with the rising tone (tone 2) on the syllable prev (Fig. 5).

Fig. 5 Example 1

The phonological system of tonality organises spoken language into waves of information called tone groups, with one salient syllable carrying this tone movement. Gestures tend to be co-extensive with this periodic unit. In examples (2) and (3) the vlogger makes a sweeping right-to-left [Footnote 16] gesture referencing past time; the gestures unfold in sync with the temporal extent of the tone group (Figs. 6 and 7).

Fig. 6 Example 2

Fig. 7 Example 3

The phonological system of tonicity highlights a peak of informational prominence by positioning the major pitch movement of a tone group (its tone) on one or another of its salient syllables (its culminative salient syllable in the unmarked case). In example (4) the vlogger claps on the syllable realising the tone group’s major pitch movement – hair (Fig. 8).

Fig. 8 Example 4

The phonological system of rhythm is realised in English through the timing of the salient syllables beginning each foot (relatively equal timing between salient syllables in a stress-timed language). In the following example, the vlogger beats with her hands in time with the salient syllables of the feet / not /, / find the / and / hair dye that I /. The last of these in fact syncs with the tonic syllable hair (Fig. 9).

Fig. 9 Example 5

Salient syllables other than the tonic syllable can be given additional prominence through various means. In the following sequence the vlogger’s pitch on the first tone group is unusually high, and contrasts with the descending lower pitch of the following tone group (a sing/song effect).

//3 hopefully / next ↑time I will

//1 get my / ↓hair colour / back //

And the vlogger’s eyebrows move up in tune and in sync with the higher pitch on / hopefully /, before lowering again by the end of the following tone group (Fig. 10).

Fig. 10 Example 6

The same sing/song effect follows on and culminates this section of the vlog, with a high pitch on the tonic syllable / now // contrasting with the low pitch on / do //. The vlogger’s eyebrows once again move up and down in tune and in sync with the contrasting pitch salience (this time on contrasting tonic syllables).

//3 [handclap] / um / but for / ↑now

//3 this will / ↓do //

These rhythmic in-tune gestures reinforce the attitudinal import of the rhythm and tonicity (cf. section “Evaluation (interpersonal semovergent paralanguage)” below).
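The transcription conventions used in the examples above follow SFL practice: a double slash marks a tone group boundary and is followed by the tone number, a single slash marks a foot boundary, and a caret marks a silent beat. For analysts who want to align gesture annotations with tone groups and feet, a transcript in this notation can be split mechanically. The sketch below is illustrative only; the function name `parse_tone_groups` and the returned dictionary layout are assumptions, and pitch arrows and silent beats are simply left inside the foot text.

```python
import re


def parse_tone_groups(transcript):
    """Split a transcript into tone groups, each with a tone number and feet."""
    groups = []
    # '//' marks a tone group boundary; the text between boundaries is one group.
    for chunk in transcript.split("//"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # An optional leading number is the tone choice for the group.
        match = re.match(r"(\d+)?\s*(.*)", chunk, re.DOTALL)
        tone = int(match.group(1)) if match.group(1) else None
        # '/' marks a foot boundary; empty segments are discarded.
        feet = [foot.strip() for foot in match.group(2).split("/") if foot.strip()]
        groups.append({"tone": tone, "feet": feet})
    return groups


example = "//3 hopefully / next ↑time I will //1 get my / ↓hair colour / back //"
for group in parse_tone_groups(example):
    print(group)
```

Tone groups transcribed without a tone number (as in some of the later examples) come back with `tone` set to `None`.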

The contribution of sonovergent paralanguage to the vlog is interrupted in tone group 15 of Appendix 1, suspended for tone groups 16–19, and resumes for tone group 20 – to allow for a somatic phase during which the vlogger uses her left hand to scratch her right arm. This phase unfolds as follows:

//3 lighter than it / was a few / days ago

//1 ^ but / yeah it’s

//1 such a / bummer and then I

//2 went to / Target

//3 ^ like / two days / later and there / was a //

The vlogger stops looking at her followers and begins scratching in the final foot of tone group 15 (Fig. 11).

Fig. 11 Example 7

The scratching and absence of gaze continue for two tone groups (Figs. 12 and 13).

Fig. 12 Example 8

Fig. 13 Example 9

Gaze resumes in the final foot of tone group 18 (Fig. 14).

Fig. 14 Example 10

And the vlogger then resumes gesturing (Fig. 15).

Fig. 15 Example 11

It is interesting to note that the vlogger does not scratch in sync with the rhythm, tonicity and tonality of the text; the scratching lasts for two and a half tone groups, and does not match the timing of salient and tonic syllables. But the paralanguage remains in sync, stopping precisely at the tonic syllable of tone group 15 (/ days ago //), resuming with a smile precisely at the tonic syllable of tone group 18 (/ Target //) and resuming with gesture precisely at the beginning of tone group 19. This indicates that synchronicity with prosodic phonology can function as a demarcating criterion for distinguishing somatic from semiotic behaviour.
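The demarcating criterion suggested here could in principle be operationalised over time-aligned annotations. The following is a rough sketch under stated assumptions, not a procedure used in this study: it assumes that onset times are available for prosodic landmarks (tonic syllables, tone group onsets) and for the start and end points of a body movement, and it treats a movement as prosodically aligned when each of its boundaries falls within a small tolerance of some landmark. The function name, the tolerance value and all timestamps are hypothetical.

```python
def is_prosodically_aligned(boundary_times, landmark_times, tolerance=0.1):
    """True if every movement boundary lies within `tolerance` seconds of a landmark."""
    return all(
        any(abs(boundary - landmark) <= tolerance for landmark in landmark_times)
        for boundary in boundary_times
    )


# Hypothetical landmark times in seconds (invented for illustration only).
landmark_times = [1.2, 2.0, 2.9, 3.6, 4.8]

# Hypothetical boundaries: paralanguage stops and resumes on landmarks,
# while scratching starts and stops between them.
gesture_boundaries = [1.2, 3.6]
scratch_boundaries = [1.5, 3.3]

print(is_prosodically_aligned(gesture_boundaries, landmark_times))  # True
print(is_prosodically_aligned(scratch_boundaries, landmark_times))  # False
```

A check of this kind can only flag candidates. As the discussion of somasis above makes clear, behaviour that fails to synchronise may still be taken up as meaningful by co-participants, so the output would need to be read alongside the interaction itself.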

Gesture converging with meaning (semovergent paralanguage)

Semovergent paralanguage is convergent with the lexicogrammar and discourse semantics of spoken language (its content plane). We adopt a discourse semantic perspective on these meaning making resources here (Martin and Rose 2007). Ideational paralanguage is ‘mimetic’, concurring with ideation and connexion [Footnote 17] systems; interpersonal paralanguage is ‘expressive’, resonating with negotiation and appraisal systems; and textual body language is ‘deictic’, syncing with identification and periodicity [Footnote 18] systems. These convergences are outlined in Table 5. [Footnote 19]

Table 5 Converging paralinguistic and discourse semantic systems

Representation (ideational semovergent paralanguage)

From an ideational perspective we need to take into account how spoken language combines entities, occurrences, qualities and spatiotemporal circumscriptions as figures (ideation), and how these figures are connected to one another (connexion). Semovergent paralanguage supports these resources with hand shapes, which potentially concur with entities, and hand/arm motion, which potentially concurs with occurrences; the hand/arm motion is optionally directed, potentially concurring with spatiotemporal direction (to/from there in space, to/from then in time). We say “potentially concurring” because ideational paralanguage can be used on its own, without accompanying spoken language; see the discussion of mime in section “Multidimensionality (multiplying meaning)” below.

By way of illustration we now move to the next section in the vlog, which concerns a visit to the vlogger’s dermatologist (for treatment for granuloma). The sequence of figures we are interested in unfolds verbally in tone groups as follows (for the complete anecdote see Appendix B):

// and so the dermatologist um took like this needle

// and under each like bump

// and injected this like steroid

// and like it all bubbled up //

From the perspective of language, this sequence makes explicit four entities (dermatologist, needle, bump, steroid). The paralanguage uses handshape to concur with two of these (needle and bump) (Fig. 16). The ‘needle’ is first rendered as a tiny pointed entity the vlogger holds between thumb and index finger, and then with the hand shape used for holding a syringe. The ‘bump’ is not actually visualised until the fourth tone group, where it renders the shape of the steroid bubbling up. As we can see, the meanings construed in language and paralanguage can either correspond or complement one another. In terms of commitment (i.e. the amount of meaning specified across semiotic modes; Martin 2010, Painter et al. 2013), the dermatologist and steroid are committed in the language but not the paralanguage; but the needle is more delicately committed in the paralanguage as a tiny pointed entity and then as a syringe. And the paralinguistic commitment of the bump in fact takes place two tone groups after it is committed verbally.

Fig. 16 Example 12

Turning from a static to a dynamic perspective, the language of this sequence makes explicit three occurrences (took, injected, bubbled). The paralanguage concurs with these, and in addition uses six rapid piercing gestures to make explicit the occurrences implied by the second tone group (Figs. 17 and 18).

Fig. 17 Example 13

Fig. 18 Example 14

In each case the entity indicated by the hand shape is in motion, as the dermatologist picks the needle up, pierces the bumps, injects the steroid and the bump bubbles up (Figs. 19 and 20).

Fig. 19 Example 15

Fig. 20 Example 16

As with imagic sequences in film, animations, graphic novels, comics, cartoons and picture books, the gesture sequence does not make explicit the conjunctive relations between figures. These have to be abduced (Bateman 2007) from the sequence and concurring language. In the case of this sequence conjunctive relations of time and cause are not made explicit linguistically either; only the additive linker and is used. A defeasible reading of the sequence is offered below.

// and so the dermatologist um took like this needle

(temporal sequential)

// and under each like bump

(temporal overlapping)

// and injected this like steroid

(causal)

// and like it all bubbled up //

As noted above, for this paralinguistic sequence handshape and motion are combined. In other cases handshapes occur on their own (Fig. 21). In the following sequence our vlogger concentrates on the size of the snack she has given her children, without setting the bowl in motion.

Fig. 21 Example 17

// then they had a snack I

// gave them each a bowl

// like a heaping bowl full of Chex Mix and applesauce squeeze //

Motion can also occur on its own, without a handshape concurring with an entity (Fig. 22). For example the vlogger uses a circular hand motion (two rotations) concurrent with the tone group // tried washing it out and it's //.

Fig. 22 Example 18

Motion can also be used to support direction in space or time (Fig. 23). Above in section “Sonovergent paralanguage” we illustrated two examples of hands sweeping right-to-left towards the past, concurring with the tone groups // bought previously when I // and // loved the first time //. These contrast with left-to-right movement towards the future, concurrent with // hopefully next time I will //. This motion to the right is reinforced by a pointing gesture, which we discuss in section “Information flow (textual semovergent paralanguage)” below (as textual semovergence).

Fig. 23 Example 19

Evaluation (interpersonal semovergent paralanguage)

From an interpersonal perspective we need to take into account how spoken language inscribes attitudes, grades qualities and positions voices other than the speaker’s own (appraisal). We also need to account for how speakers exchange feelings, greetings, calls for attention, information and goods & services in dialogue (negotiation). Semovergent paralanguage potentially resonates with appraisal resources through facial expression, bodily stance, muscle tension, hand/arm position and motion (Hood 2011, Ngo in press) and voice quality. Whereas spoken language can make explicit attitudes of different kinds (emotional reactions, judgements of character and appreciation of things), paralanguage can only enact emotion. A further interpersonal restriction (as suggested by Cléirigh), setting aside emblems (discussed in Section “Emblems” below; Kendon 2004, McNeill 2012), is that semovergent paralanguage cannot be used to distinguish move types in dialogic exchanges (although sonovergent paralanguage can of course support tone choice in relation to these moves).

Paralanguage deploys facial expression and bodily stance to share attitude (Fig. 24). In the following example our vlogger nuances her appreciation (exciting) of a neighbourhood get-together she has dressed up for with raised eyebrows and a lopsided mouth expression [Footnote 20] (which we might read as indicating that some followers might not find it all that exciting).

Fig. 24 Example 20

As outlined by Martin and White (2005) attitude may not be explicitly inscribed in language, but invoked by ideational choices a speaker expects a reaction to. We introduced an example of this in section “Sonovergent paralanguage” above, as the vlogger introduces the good news that her hair dye is back in stock at Target. Her smiling face makes explicit the affect that her language does not (Fig. 25).

Fig. 25 Example 21

A good example of a combined face and body commitment of affect in the vlog we are drawing our examples from comes as the vlogger is complaining about being hassled for her parking spot before she is ready to leave. The relevant tone groups are presented below, and we will return to this example in our discussion of mime in section “Emblems” below (for a complete transcription of this narrative see Appendix C). At this point we are simply interested in the way the vlogger’s facial expression and arm position are used to express the hassler’s exasperation (Fig. 26).

Fig. 26 Example 22

// some guy was sitting there

// and there was cars behind him

// and he was like

// [mimics man’s expression]

// [mimics man’s gesture] like

// waving me out //

Turning to graduation, as noted by Hood (2011) the size of hand shapes and the range of hand/arm motion can be used to support graded language. In the following example the sweeping extent of the hand/arm motion resonates with the large quantity of hair dye in stock (whole stack) (Fig. 27).

Fig. 27 Example 23

The most striking example of intensification in the hair colour episode occurs when the vlogger uses whole body movement to enact her reaction to how dark her hair is. She throws her head back and leans back as her arms rise up – literally overwhelmed with emotion (Fig. 28).

Fig. 28 Example 24

Alongside paralanguage of this kind converging with force, Hood notes the potential for precise hand shapes and muscle tension to resonate with focus. In the following example, introduced in section “Representation (ideational semovergent paralanguage)” above, the vlogger tightens her grip on the tiny virtual needle she is holding and frowns slightly in concentration as she role plays the precision involved in the dermatologist piercing her bumps (Fig. 29).

Fig. 29 Example 25

Hao and Hood (in press) draw attention to the use of what they call de-centering postures to soften focus, using the example of a shoulder shrug converging with fairly non-contractile in a biology lecture. The paralinguistic generalisation here would appear to be loss of equilibrium – e.g. asymmetrical facial expression, out of kilter posture or a rotating prone hand (interpretable as between prone and supine). Clear examples in our data are the faces the vlogger pulls as she struggles to name her skin condition in the second tone group below, the second of which is accompanied by two shakes of her head (Fig. 30).

Fig. 30 Example 26

// anyway

//^ ‘it was / some / granuloma /: / /^ / something

// I don’t know- it’s / called- it’s

// some sort of / skin thing //

Turning to engagement, Hood notes the significance of hand position as far as supporting the expansion and contraction of heteroglossia is concerned – with supine hands opening up dialogism and prone hands closing it down. In the following example the vlogger’s supine hands converge with the modalisation probably, reinforcing acknowledgement of the viewers’ voice (Fig. 31).

Fig. 31 Example 27

Two moves later the hands flip over to prone position in support of the negative move shutting down the expectation that the vlogger was in control of the new colour of her hair (Fig. 32).

Fig. 32
figure 32

Example 28

Voice quality was noted in section “Sonovergent paralanguage” above in relation to the sing/song pitch (high then low) movement the vlogger uses in her last four tone groups to close down her hair dye narrative. From the perspective of appraisal the sound quality resonates with her resignation. Further work on this interpersonal aural dimension of paralanguage, drawing on van Leeuwen 1999, is beyond the scope of our current research.Footnote 21

Information flow (textual semovergent paralanguage)

From a textual perspectiveFootnote 22 we need to take into account how spoken language introduces entities and keeps track of them once there (identification) and how it composes waves of information in tone groups, clauses and beyond (periodicity). Semovergent paralanguage potentially supports these resources with pointing gestures and whole body movement and position.

As far as pointing deixis is concerned we can return to the examples contrasting past and future in sections “Sonovergent paralanguage” and “Representation (ideational semovergent paralanguage)” above. Alongside motioning to the past the vlogger’s hand points there. And alongside motioning to the future both the vlogger’s index fingers point there (Figs. 33 and 34).

Fig. 33
figure 33

Example 29

Fig. 34
figure 34

Example 30

Fig. 35
figure 35

Example 31

As far as longer wave lengths of information flow are concerned,Footnote 23 our vlogger is seated and so whole body movement from one location to another is not a factor (as it would be for example for a lecturer roaming to and fro across a stage; cf. Hood 2011). As noted in sections “Sonovergent paralanguage” and “Information flow (textual semovergent paralanguage)” above, however, the vlogger does end the episode with a contrasting high then lowered pitch (Fig. 35). The higher pitch penultimate tone group begins rhythmically speaking with a handclap foot and then a foot comprising the ‘filler’ / um /.

Fig. 36
figure 36

Example 32

This is followed by the low pitch tone group; the vlogger is winding down. Following this there is a suspension of both language and paralanguage as the vlogger’s eyes shut and her head slumps forward (Fig. 36).

The episode preceding the one we are using to explore sonovergence here ends in a similar way (lowered pitch, with eyes shut, head down) (Fig. 37). So shutting down language and paralanguage and handing over to somasis is clearly a strategy for punctuating longer waves of discourse. It is at these points that the vlogger cuts from one filmic segment to the next (as she thinks of something more to say).

Fig. 37
figure 37

Example 33

Multidimensionality (multiplying meaning)

The sonovergent and semovergent paralinguistic systems discussed thus far are outlined in Fig. 38 (including cross-references to Cléirigh’s original terminology). Although presented as a simple taxonomy, all five subtypes of paralanguage can combine with one another in support of a single tone group (Fig. 38).

Fig. 38
figure 38

Sonovergent and semovergent paralanguage

Several examples of multiple dimensions of paralanguage converging on the same tone group were in fact presented above (for example, the combination of motion towards the future and pointing deixis in Example (19) of section “Representation (ideational semovergent paralanguage)”). It is probably safe to say that whenever semovergent paralanguage is deployed, it will be coordinated with tonality, tonicity and rhythm; this is tantamount to arguing that semovergence implies sonovergence. Sonovergent paralanguage on the other hand can be deployed without semovergence, through gestures in tune with or in sync with prosodic phonology (but no more).

An important exception to these principles is what is commonly referred to as mime. In terms of our model mime is semovergent paralanguage that does not accompany language, an apparent contradiction in terms. To explore this further we will return to the miming segment in the parking lot narrative referred to above. The vlogger sets up what happened as follows:

Oh another thing that has been really annoying this summer is you know when you go to a parking lot and it’s a busy place. You get in your car and you don’t necessarily want to leave immediately. Like you might want to- I might want to have Henry test his blood sugar, give the kids snacks. Or if we were at the pool, like change or look at my phone or send a text message or whatever. It drives me crazy when a car is like sitting there following you and then they just wait for you to leave. I cannot stand that. And that has happened so many times. And I was just at the Mall of America and I got back to my car and I went into-. And I met up with a Kimmy from the Dodge family and I went to- I wanted to like Instagram a picture of us and FaceBook whatever. And as I was doing that I- I had...

This is followed by a specific parking lot incident, presented in tone groups below.

// just got in my car

// got my phone and

// as I was doing that

// some guy was sitting there and

// there was cars behind him and

// he was like

// [mimics man’s paralanguage]

// [mimics man’s paralanguage] like

// waving me out and

// I was so:: upset like

// I immediately got up

// put my phone down

// I immediately drove away a bit

// I wasn’t even thinking I

// shouldn’t have done that I

// should not have done that

// but it was just like “What!”

// There’s a guy sitting there

// waving and

// angry at me be-

// cause I was sitting in my car //

In terms of the tonality of this sequence, there are two miming segments where tone groups might have been. For each, the vlogger mimes the paralanguage of her parking spot assailant. In the first slot she mimes his interpersonal attitude paralanguage, as discussed in section “Evaluation (interpersonal semovergent paralanguage)” above (Fig. 39).

Fig. 39
figure 39

Example 34

In the second she mimes his ideational motion paralanguage as she twice gestures leaving (the second time including a textual pointing gesture) (Figs. 40 and 41).

Fig. 40
figure 40

Example 35

Fig. 41
figure 41

Example 36

The third time his motion gesture is mimed in fact concurs with language (Fig. 42).

Fig. 42
figure 42

Example 37

As we can see, the two miming segments are heavily co-textualised by language that makes explicit what is going on. The orientation to the narrative introduces the recurrent problem of someone following the vlogger in a parking lot and waiting for her to leave. The miming segments are themselves introduced with the incomplete tone group // he was like... //, with a missing tonic segment. The vlogger then mimes the expected information, before making it linguistically explicit in a tone group converging with the third iteration of the gesture.

Setting aside pantomime (the ‘art of silence’, as Marcel Marceau referred to it), we can predict that co-textualisation of this kind is a generalisable pattern as far as semovergent paralanguage in the absence of language is concerned. What the moment of mime does not provide as far as language is concerned, the immediately preceding and following co-text does provide. So the convergent nature of semovergent paralanguage is clear.

Emblems

It remains to introduce our treatment of what Kendon (2004) refers to as emblems, drawing on Ekman and Friesen (1969). Included here are gestures such as thumbs up or thumbs down (as praise or censure), index finger touching lips (for ‘quiet please’), hand cupped over ear (for ‘I can’t hear’), middle finger vertical (for ‘get fucked’) and so on. Our vlogger uses one of these gestures to introduce the first of her explanations as to why her hair is darker than usual – raising her index finger as an emblem for the numeral ‘1’ (Fig. 43).

Fig. 43
figure 43

Example 38

These gestures differ from the semovergent ones illustrated thus far in critical ways (cf. McNeill 2012: 7–10). For one thing they commit very specific meanings and can be readily recognised without accompanying co-text. As part of this specificity they can enact moves in exchange structure on their own – e.g. the statements and requests noted above, alongside greetings and leave-takings (hand waving), calls (beckoning gestures), agreement (nodding head), disagreement (shaking head), challenges (upright palm facing forward for ‘stop’) and so on. For another they are much more easily called to consciousness, as the first thing that comes to mind when someone mentions gesture. And in this regard they are often commented on as culturally specific (e.g. the difference between an Anglo supine hand beckoning gesture and its Filipino prone hand equivalent). In both respects emblems contrast with common-sense dismissals of the paralanguage (introduced in sections “Sonovergent paralanguage” and “Gesture converging with meaning (semovergent paralanguage)”) as idiosyncratic (although none of us has any trouble successfully interpreting another speaker’s sonovergent and semovergent systems). From the perspective of the sign language of the deaf, emblems most strongly resemble signs; they are expression form gestures explicitly encoding meaning. Similarly, from the perspective of character based writing systems (such as those of Chinese), emblems most strongly resemble characters (but gestured rather than scribed).

This indicates that emblems are better treated as part of language than as a dimension of paralanguage. The relationship we are emphasising between emblems and alternative expression form systems is outlined in Fig. 44, using the words zero, one, two, three, four and five as examples. These words can be alternatively expressed in English through segmental phonology (e.g. /tuw/), graphological characters (e.g. ‘2’) or hand gestures (index and middle finger vertical).

Fig. 44
figure 44

Alternative realisations of expression form (gesture, graphology, phonology)

An outline of the place of emblems in our overall system is presented in Fig. 45. Rather than treating them as a dimension of paralanguage, we have moved them over to language proper, as an alternative manifestation of its expression form.

Fig. 45
figure 45

Emblems as gestural signs

Intermodality

In this paper we have outlined a model distinguishing behaviour from meaning (somasis vs semiosis), and within semiosis, language from paralanguage. Paralanguage itself was then divided into sonovergent and semovergent systems according to their convergence with either the expression plane or content plane of language. Sonovergent systems enact interpersonal meaning in tune with and compose textual meaning in sync with the prosodic phonology of language; semovergent systems construe ideational meaning, enact interpersonal meaning and compose textual meaning convergently with the discourse semantics of language (and its realisation through lexicogrammar).

Compared to other modalities of communication, paralanguage has a distinctive relation to language in that it is coordinated with prosodic phonology. This is obviously true, by definition, for sonovergent paralanguage. But semovergent paralanguage is also coordinated with tonality, tonicity, rhythm and tone, since gestures, facial expression, bodily stance and sounds unfold in measures of time converging with units of rhythm and intonation. Even brief episodes of mime follow this principle, filling in for ‘missing’ tonic segments or tone groups as a whole. Alongside this expression form of temporal dependency, paralanguage is dependent on the content form of language because of its inherent generality. Semovergent paralanguage typically commits meaning far less specifically than spoken language can; instantiations are by and large interpretable with respect to what is said. With respect to these two dependencies, the prefix para- (understood as ‘beside’) is appropriate.

But what about the stem -language which para- is prefixed to? The drift of consensus in gesture studies, as reviewed and promoted by Fricke (2013), appears to be towards treating aspects of what we have been calling paralanguage here as part of language (in fact as part of grammar in Fricke’s work). From the perspective of SFL this argues for a re-interpretation of the taxonomy in Fig. 44 above as Table 6 below, with paralanguage positioned not alongside language but as part of its expression form. In this model, the content form of face-to-face linguistic communication can be realised as phonology (of spoken language) or sign (including the sign languages of deaf communities and the ‘emblems’ of hearing ones), plus in both cases sonovergent and semovergent paralanguage; and for many languages we have a graphological system used for written communication. This leaves us with the terminological challenge of how best to name the sound quality and gestural resources we have been calling paralanguage in this paper (since they wouldn’t be para- anymore); we will not attempt to improve on our usage here.

Table 6 Paralanguage as expression form

This of course makes research into the relation between language and paralanguage an interesting case study as far as research into intermodality in general is concerned, possibly helping to clarify some of the theoretical and descriptive challenges posed in Martin 2011.

Our evolving work on these dependencies can be tracked through Martin et al. (2010), Hood (2011), Martin (2011), Martin, Zappavigna, Dwyer, and Cléirigh (2013), Martin and Zappavigna (2018), Zappavigna and Martin (2018), and Hao and Hood (in press). From the perspective of SFL the most pertinent work on relations between modalities to compare with these studies is Painter et al. (2013) (on language and image in children’s picture books). Beyond these initiatives, multimodal discourse analysis research is best guided by Bateman et al. (2017).

As we stressed at the beginning of the paper, building models of intermodality is facilitated if the descriptions of distinct modalities are informed by the same theoretical principles; and this is important for applications. Work in educational linguistics, for example Hood (2011) and Hao and Hood (in press), regularly has to deal with the interaction of language, paralanguage and imaging on PowerPoint slides. And for forensic linguistics, for example Martin and Zappavigna (2013), Martin and Zappavigna (2018) and Zappavigna and Martin (2018), language and paralanguage interact with the semiotics of the location of the legal proceedings (which are very different for courtrooms and Youth Justice Conferences). The model of intermodal convergence (ideational concurrence, interpersonal resonance and textual synchronicity) presented in Table 2 above is far easier to operationalise when each of the modalities involved is interpreted from the perspective of SFL.

Our model of paralanguage might also prove of interest as a contribution to the growing field of interactional linguistics (Ochs et al. 1996; Fox et al. 2013; Couper-Kuhlen and Selting 2001, 2018). These linguists see language structure as an emergent phenomenon which can only be understood in relation to the use of language in dialogue, and they draw heavily on Conversation Analysis (CA) in their research. This brings paralanguage and other modalities of communication into the picture as far as our understanding of language is concerned (cf. Heath and Luff 2013). SFL’s perspectives on multimodality create an opportunity for linguistics to make a stronger contribution to this transdisciplinary exercise (Martin forthcoming).