The issue of learning is and will remain central to education. The influence of the cultural–historical paradigm of the Soviet psychologist Vygotsky on thinking about learning has not waned. It has been expanded in various ways, as it is in the article we review here, in which Lilian Pozzer and Wolff-Michael Roth turn to the still quite new notion of multimodality to gain a more detailed and more expansive account of the resources involved in processes of learning, and a better understanding of how they are used. Pozzer and Roth’s article explores how scientific concepts are communicated in a Canadian secondary school classroom. The authors present 22 episodes from a series of video-taped biology lessons. In most of these episodes, the teacher deals with the phenomenon of contraction within the human heart. Pozzer and Roth explore how the teacher uses a range of different semiotic resources to communicate this phenomenon to the students. They show how the teacher, seemingly anew each time, designs a multimodal learning environment on the basis of the principle of aptness of fit between what is to be communicated and the specific possibilities of the semiotic resources available to the teacher for communication. Speech, gesture, drawing, and other semiotic modes are used by the teacher in multimodal complexes (designed combinations of modes), because each mode offers distinct, complementary possibilities for characterizing contraction and for drawing the students’ attention to its specific features. Without the range of resources available to the teacher, the potential for student learning would be limited, perhaps severely so (Bezemer and Kress 2016).

The authors have documented the teacher’s multimodal accounts of contraction in transcripts of speech combined with a series of video-stills depicting hand and arm positions at different points in time, suggesting movement, a defining feature of gesture. In the transcripts, some words are underlined to indicate that they are spoken as the teacher produces the “stroke”—the main movement of the gesture suggested by the series of video-stills. Additional video-stills depict what the teacher had drawn on the blackboard, including drawings of the heart and written words naming its features. More information on pitch and intonation contour is given for those parts of the speech that coincide with the stroke.

Academics, just like teachers, are limited by the semiotic resources available to them to (re)construct and comment on phenomena in the physical and social world. For Pozzer and Roth, one specific challenge is to render visible the movements that make up the science teacher’s gestures and the way in which these unfold in time and space. Alternative resources for transcription, including line drawings and formalized and narrative descriptions of gesture (see, for example, Kendon 2004; Goodwin 2018), may be needed to produce a usable semiotic, multimodal account of gesture in the science classroom.

Still, we can make pertinent observations about the place of gesture in the sign-complexes that the teacher designs to construct and to communicate scientific concepts. We briefly discuss four observations. First, the teacher uses both hands in all examples. In some, the movements of his hands jointly represent a single contraction; in others, the movement of each hand represents an individual contraction. For example, in Episode 11, the teacher brings his hands together as he states that “the” ventricle contracts. In contrast, in Episode 14, the teacher squeezes both hands while referring to the two “ventricles” (plural) contracting at the same time. Second, the speed, the “reach,” and the apparent force of the movements signify varying levels of intensity of the contraction. Indeed, as the video-stills suggest, intensity goes beyond the hand movements. “Forcefulness” is signified by the teacher’s whole upper body: tensed, simultaneous squeezing of the hands into fists and coordinated “strokes” do semiotic work that is particular to the science classroom, in this case drawing attention to the distinct kinds and functions of contraction in the different heart chambers. Third, the strokes signifying contraction are repeated to suggest a beat, a rhythm; and they are held in a fixed position for some time (“post-stroke hold”), freezing the process being represented to draw attention to specific states. Fourth, the “placement” of gestural representations of contraction on the imaginary canvas in front of the teacher signifies where in the heart the represented contraction happens. Gestures representing contraction in the upper chambers (atria) are made above the level of the teacher’s shoulders; those representing contraction in the lower chambers (ventricles) appear to be made below the level of the teacher’s shoulders.

The authors observed a total of seven consecutive lessons on the heart, so the collection of examples allows for a diachronic exploration. The teacher builds an account of contraction and related phenomena over a series of lessons. We can understand the changes in the gestural representations in the course of the lesson series as the teacher’s responses to presumed, imagined, and observed changes in the students’ understanding of contraction. Thus, the teacher not only produces different versions of the “same” gesture but also orders them so as to draw attention, in turn, to the criterial features of a contraction, the loci of contractions in the heart, the intensity of some contractions, their temporal instantiation, et cetera. The authors provide a parallel set of examples showing that the teacher produces different drawings, each highlighting different features of the heart. Similar principles underpin the design of textbooks and of other learning materials, including online ones (Bezemer and Kress 2016). What is clear is that the teacher designs in response to his assessment of the understanding of his students.

Recognition of multimodality in sites of learning calls for a critical review of terms that until recently appeared entirely apt for describing the “semiotic work” of teachers and students. If—as we think it is—a semiotic account is now needed, the terms used in that account have to indicate and be apt for that. We are no longer dealing with “speakers” and “listeners,” but with meaning makers, sign makers, semiotic workers, communicators, interlocutors. In a semiotic approach, scientific concepts, brought into the science classroom, are drawn out, written out, acted out, talked about. If multimodality is brought into the account—as it is here—there are consequences for theory and hence for “naming”: science is communicated multimodally.

In using the notion of mode, the social semiotic paradigm abandons the distinction between “the verbal” and “the non-verbal,” which is still present in some parts of Pozzer and Roth’s article and, indeed, in publications on multimodality more generally. The problem with that distinction is that it is derived from an old paradigm that privileges “the linguistic” over “the non-linguistic.” Social semiotics instead treats speech and gesture as two modes. As a mode, gesture has the same potential for making meaning as any other mode, such as speech or writing. Pozzer and Roth do in fact recognize this when they define “growth points” as “moments in which ideas in the form of a gesture–speech dialectic are born.” That will make it possible to “analyze, from a multimodal communicative perspective, the articulation and development of scientific concepts”: drawing on all the modes involved is essential to that.

As the notion of multimodality becomes more widely used, one important task for semioticians is to refine commonly used, yet still somewhat haphazard, descriptive categories. Frequently, lists of items are presented as belonging to the same paradigm when it might be argued that they do not. For instance, Pozzer and Roth write about “multimodal resources, such as, for example, speech, gestures, body orientations, facial expressions, prosody, videos, three-dimensional models, drawings, diagrams, graphs, and photographs.” Yet are “drawings, diagrams, graphs, and photographs” all instances of the mode of image? Are speech and prosody distinct modes, or is prosody one of the resources of the mode of speech? Are intonation, pitch, and prosody distinct semiotic features, and if so, how? Is video a medium for documenting and disseminating meanings-as-messages, or is it a mode? These questions, we believe, put into sharp focus the need for more refined theoretical frameworks in multimodality.

Many of the observations of Pozzer and Roth resonate with prior accounts of science education from multimodal and (social) semiotic perspectives (e.g., Lemke 1998; Kress et al. 2001), as does their statement that “a concept is not taught as a single thing but each concept is the outcome of an event, the cumulative effect of the production of coordinated material signs.” We would, however, regard the mode of speech as more than “the words uttered”: that is, we would attempt to become aware of the regularities and the different potentials for making meaning in the differing modal resources and in their use; similarly with the processes of learning and teaching, and with the role, function, and position of the participants in these. In a social semiotic account, communication—as an instance both of social interaction generally and of learning and teaching specifically—crucially depends on the semiotic work of interpretation by the person who engages with a message. In the case of learning and teaching, that is the semiotic work of the learner.

And so, as a concluding comment on the possibilities of bringing the two paradigms into productive conjunction: if, as Pozzer and Roth announce at the beginning of their article, “Concepts are the currency of Science,” then a social semiotician would want to assert that “signs are the currency of meaning.” Each paradigm holds to its position; both are valid. For social semiotics, it is clear that concepts are made materially evident as signs, whether simple or complex. Each paradigm offers a specific “take,” distinct and different, yet entirely and essentially compatible with the other, able to answer specific questions and to do and achieve specific kinds of things.