The voice, text, and the visual as semiotic companions: an analysis of the materiality and meaning potential of multimodal screen feedback


The gap between how learners interpret and act upon feedback has been widely documented in the research literature. What is less certain is the extent to which the modality and materiality of the feedback influence students’ and teachers’ perceptions. This article explores the semiotic potential of multimodal screen feedback to enhance written feedback. Guided by an “Inquiry Graphics” approach, situated within a semiotic theory of learning and edusemiotic conceptual framework, constructions of meaning in relation to screencasting feedback were analysed to determine how and whether it could be incorporated into existing feedback practices. Semi-structured video elicitation interviews with student teachers were used to incorporate both micro and macro levels of analysis. The findings suggested that the relationship between the auditory, visual and textual elements in multimodal screen feedback enriched the feedback process, highlighting the importance of form in addition to content to aid understanding of written feedback. The constitutive role of design and material artefacts in feedback practices in initial teacher training pertinent to these findings is also discussed.


Although commenting on student writing is the most widely used method for responding to student writing, it is the least understood (Sommers, 1982, p.143).

The quote above relates to research conducted in the late 1970s. However, its message is likely to resonate with many tutors working in higher education today (Burke and Pieterick, 2010). Feedback is considered to play a crucial role in student learning and achievement (Black and Wiliam, 1998; Hattie and Timperley, 2007; Price et al., 2010). Different models of effective and sustainable feedback practices have been proposed, including a focus on process and task (Hattie and Timperley, 2007), prioritising timely feedback (Irons, 2008; Wiggins, 2012) and comments which are developmental rather than evaluative (Wiggins, 2012).

Academic writing is a key assessment component of teacher training courses in the post-compulsory education sector and, as such, is considered a “high stakes” activity (Lillis and Scott, 2007, p.9). Trainee teachers are expected to possess an excellent level of professional literacy to be able to meet the rigorous demands of their programme and to be able to assess their students’ work effectively. Given the increasingly diverse body of non-traditional learners, including those entering teaching, from different social, cultural and economic backgrounds, the need for transparent, meaningful and supportive feedback is more relevant than ever. However, the root of the issue lies with how feedback is interpreted by learners, suggesting that the written form of the feedback may be problematic. Learners can find written comments difficult to unpack and view them as overly critical, unclear or impersonal.

The literature on written feedback practices in teacher education, albeit limited, has tended to emphasise the relational aspects of the process. For example, personalised, constructive comments were considered instrumental in building confidence and motivation in Ferguson’s study (Ferguson, 2011). Dowden et al. (2013) concluded that trainee teachers’ perceptions of feedback were heavily shaped by their emotions, sometimes resulting in misinterpretation of key messages, and highlighted the need for increased dialogue. Greater reciprocity in the teacher-student relationship was also foregrounded in Davis and Dargush’s research (2015), mediated through mutual respect and “academic trust”: trainers were expected to adopt a professional approach to marking and respond to students’ queries. Common to all these studies was the idea that feedback was seen primarily as a language-based method.

An increased emphasis on self-regulated feedback practices (Butler and Winne, 1995; Laurillard, 2002; Nicol and McFarlane-Dick, 2006), guiding students to notice gaps in their assessment performance, reflects an epistemological shift to a socio-constructivist theory of knowledge. Learners are encouraged to be more autonomous in the feedback process, to engage in dialogue with the tutor and work on their areas for development. This will help them not only to perform better on their current programme of study but also to foster effective lifelong learning habits (Boud, 2000): a general move from “mechanistic” to “responsive” feedback (Boud and Molly, 2013, p.703). It is questionable, however, whether a monomodal form of feedback is the only or the most appropriate way of achieving this. New approaches to understanding learning and communication have also emerged, notable among them multimodal approaches to assessment feedback. Multimodality can be seen as part of the recent emergence of semiotic approaches to education (Jewitt et al., 2016; Kress, 2009) that argue for a more holistic view of communication modes and their integration.


Screencast and assessment feedback

Research into the effects of auditory feedback on written assignments has revealed generally positive findings (Butler, 2011; Voelkel and Mello, 2015) and this relationship between multimodality and feedback has expanded to include simultaneous visual and auditory modes of delivery, collectively known as screencasting. This is the process of capturing screen activities, including cursor movements, accompanied by an audio narration (Peterson, 2007). Tutors can home in on specific sections of a learner’s work, commenting on both content and use of language; a link to the recording can then be sent to learners to watch in their own space and time.

With multimedia screen feedback, the attention switches to a combination of elements working in synch: the visual markers on the screen; the static and dynamic movement of the cursor; the text, typography and layout in addition to the voice. It is, therefore, not only language which is significant in the feedback process but also the material constituents of the feedback: how the interrelated elements contribute to meaning-making. Although screencasting is not new, most research has focused on its use as an instructional tool (Peterson, 2007; Razik and Ali, 2016). Its potential as a feedback mechanism has tended to foreground the significance of the medium in terms of conveying content, possibly providing more depth and clarity (Brick and Holmes, 2008; Martinez-Arbodela, 2018), greater nuance (Hope, 2011) and connectivity and engagement with the tutor (Hope, 2011; Jones et al., 2012; Mathieson, 2012). Less analytical attention has been paid to its materiality and how this is connected to knowledge and educational practice (Sørensen, 2009). This study on multimodal screen feedback builds on this view of sociomateriality (Fenwick, 2011; Sørensen, 2009), emphasising the “matter” of education, defined by Fenwick and Landri (2012, p.1) as “the mutual entailment of human and non-human energies in local materialisations of education and learning”. At the heart of this research is the view that multimodal feedback practices are not shaped by human actions alone; the social and material components of the screencasting technology are inextricably linked and contribute to different forms of knowledge.

Multimodal approaches and assessment feedback

In today’s digital world, electronic screens are ubiquitous in education, increasingly superseding the printed page (Kress, 2003). Although “digital natives” (Prensky, 2001) is an oversimplified term to describe those who have grown up with technology, there is no denying that digital literacy practices play a significant role in students’ work (Lea and Jones, 2011). It is unsurprising, therefore, that feedback has evolved to “keep up with the changes taking place beyond the boundaries of the classroom and campus” (Lamb, 2018, p.5). Although the written word remains dominant in assessment feedback, primarily because it is perceived as a more scholarly and permanent medium, multimodal feedback has the potential to resonate with learners from a diverse range of academic backgrounds. Given the drive to recruit larger numbers of students into higher education, including those who are considered “non-traditional” learners, a one-size-fits-all approach to delivering feedback is unfeasible. It seems sensible, therefore, to adopt a more open-minded approach to assessment feedback, to explore the potential value of multidimensional modes in their representation of knowledge rather than dismissing them as alternative or “risky” (Lamb, 2018, p.15).

Situated within a semiotic analytical framework based on an Inquiry Graphics (Lacković, 2018) approach, the contribution of this study to the field of initial teacher education is twofold. Firstly, it highlights the significance of the materiality or characteristics of modes which both carry meaning and shape social actions in situated feedback practices. Secondly, with reference to Kress (2009) and Kress and Selander (2012), it foregrounds the role of multimodal design on teacher education programmes as practitioners select and deploy semiotic resources based on their prior uses, availability, intention and knowledge of the audience.

Semiotics of assessment via an inquiry graphics approach

Multimodal approaches to communication, of which feedback is an example, are part of a larger and longer tradition of semiotics and, in particular, of a recent and emerging semiotic theory of learning and edusemiotics. In the broadest sense, semiotics is the study of “signs” (Eco, 1976). Its primary aim is to analyse how we create and interpret signs in order to communicate and interpret meanings (Sebeok, 2001) through a diversity of forms such as words, images, sounds, gestures, road signs, text messages and so on (Chandler, 2002; Lacković, 2018). According to Peirce (1931–58), a sign in its represented form can be categorised in different ways: as icon, index or symbol. An iconic sign is the least arbitrary of the typology, bearing a physical resemblance to the object it denotes; for example, in the form of a picture or diagram. An indexical sign indicates something: “it is related to its representamen by an actual, single, existential, cause and effect relation” (Trifonas, 2015, p.139). In Peircian terms, a symbol does not resemble the object being represented; instead it is “constituted a sign merely or mainly by the fact that it is used and understood as such” (Peirce, 1931–58, p.2307). This reinforces the view that interpretation is instrumental in meaning-making and is particularly pertinent to feedback practices. Audio-visual multimodal feedback is a unique form of icon-symbol feedback in which the pictorial and design information of the screen (iconic and indexical signs such as text highlighting and the deictic movement of the cursor) is combined with language as symbolic signs.

This study is guided by an Inquiry Graphics (IG) approach (Lacković, 2018) in its analysis of multimodal screen feedback. This approach draws upon a Peircian triadic model of interpreting signs, the unit of analysis in edusemiotic research. Edusemiotics posits semiotics “as the foundation for educational theory and practice at large” (Olteanu and Campbell, 2018, p.246). It adopts a holistic perspective of education, rejecting the subject-object dualism pervasive in classical scientific inquiry (Deely and Semetsky, 2017). Viewing education as “a process of continuous enquiry and exploration, both formal and informal, through engagement with signs” (Semetsky and Stables, 2014, p.1), its alternative philosophy complements the view that feedback is not simply a “textual product” (Tuck, 2012), independent of social and political factors, focused on achieving concrete outcomes. Feedback is a complex social practice, beyond the purposive actions of both the teacher and the student: the social and material aspects of feedback interrelate in frequently unpredictable ways.

Edusemiotics offers an alternative view to the “linguistic turn” (Semetsky, 2014) and an Inquiry Graphics approach extends this to encompass multimodal elements including screen layout, the voice, diagrams, images and videos in its framework (Lacković, 2018). Peirce’s triadic model (fig. 1) offers a dynamic process of meaning-making with its irreducible combination of the three following elements: representamen, object and interpretant. The representamen refers to the form of the sign: in this study, what can be heard or seen on the screencast, such as the voice and cursor. The object is the referent to which the sign refers (the screencast) and the interpretant is the meaning ascribed to the sign in the interpreter’s consciousness, dependent on contextual, historical and socio-cultural factors (Peirce, 1931–58).

Fig. 1. Peirce’s triadic model in relation to screencasting

Although these elements have been presented separately for ease of analysis, they exist simultaneously. Meaning-making is interactive and evolutionary: new signs are born out of “an ongoing open-ended process of interpretation, growth and development” (Semetsky and Stables, 2014, p.1).

The IG model has adapted these elements of the Peircian triad, also incorporating Barthes’ (1977) principles of semiotics of denotation and connotation (Lacković, 2018) as will be explained via my presentation of analytical data. This enables the researcher to integrate both top-down and bottom-up processes in a research study, embedding different levels of analysis. Although Peirce’s sign is triadic, the levels of interpretation applied consist of four layers. The initial focus of analysis is on the individual elements of the screencast (interpretation level one), which links to Peirce’s concept of “representamen”. These are listed systematically as nouns, representing what can be seen or heard. In order to convey meaning, they need to be linked to the other elements of the triad. In the IG model, these then divide into two separate levels of signification: denotation and connotation under Peircian “interpretant” (fig. 1). The denotational level (interpretation level two) is concerned primarily with the description of the elements in terms of their role in providing feedback, focusing on their literal meaning.

However, these visual, textual and auditory elements are packed with meaning; they are not “neutral recordings of reality” (Machin, 2007, p.23). The sign-maker (teacher) has selected these elements to communicate different messages, drawing upon the affordances of each mode. The connotation level (interpretation level three) extends this denotational recording process to the different meanings these elements evoke in the mind of the interpreter and in relation to the thematic object of reflection. This stage of analysis is interactive, requiring the observer to draw upon their sociocultural knowledge of the represented elements to generate meaning (Lacković, 2018), akin to Peirce’s interpretant. This emphasises the subjective nature of the semiotic coding process. In addition to drawing upon their linguistic reserves to decode the spoken and written comments on the screencast, observers will refer to their contextual knowledge (what they have encountered or familiarised themselves with previously in feedback practices), for example the notion that the highlighting of specific text in this context connotes prominence or visibility.

One potential criticism of the IG coding system is that, as with the semiotic analysis of still images, it offers only “impressionistic insights into the construction of meaning” (Penn, 2000, p.239) and that interpretations derive from descriptive stereotypes or are embedded in personal bias. It is, thus, essential to adopt a reflexive approach throughout the process, remaining continuously alert to individual prejudices and focusing the analysis on the research object. It is important to acknowledge the limits of our interpretations but equally to recognise that, by raising awareness of the possibility of bias, we will be practising criticality in our semiotic analysis. Gathering the interpretations of others will also ensure greater validity and provoke further criticality through our questioning of concepts or images that have become deeply entrenched in our cognitive schema.

Based on the reviewed literature, the following research questions were designed:

  • What are the characteristics of multimodal screen feedback?

  • How do particular ensembles of modes communicate meaning?

  • How do student teachers perceive the semiotic potential of the audio-visual multimodal feedback to enhance written feedback?

  • How do the findings relate to the practice of feedback in teacher education?

Methods and methodology

Research context and participants

This study was situated within a two-year part-time higher education initial teacher development programme. Semi-structured video elicitation interviews were conducted with ten student teachers (eight female, two male) of different disciplines (SEND, English literature, Health and Social care, Beauty, Engineering, ESOL and Education Studies) to gauge their perceptions of multimodal screen feedback. The participants represented a diversity of ethnicity, culture and languages with five of the interviewees declaring English as their second or third language.

The object of the multimodal analysis was a screencast used by a tutor to deliver feedback to a female student teacher on a higher education training programme. The purpose was to respond to the trainee’s written assignment in alignment with a self-regulated approach to assessment feedback. The recipient of the feedback had been encouraged to adopt a more active role in the feedback process and outline what she considered to be the strengths and weaknesses of her written assignment. The tutor then tailored her feedback, using Camtasia Studio, in response to this self-assessment.

Video elicitation interviews

Adhering to the semiotic principles of the IG approach, analysis of the screencasting artefact was embedded into the interview process. This ensured greater coherence between conversations focused on the micro analysis of the screencast, its individual elements and their contribution to meaning-making, and macro aspects such as issues of power and the influence of internal and external forces in feedback practices. In addition, viewing the screencast acted as an aide-mémoire for the interviewees who had been recipients of the feedback to recall their experiences, emotions and perceptions of the medium.

The first step of the multimodal analysis was to play different sections of the screencast. The video was then stopped, and the participants were invited to list all the details they could see on the screen and hear. Two screenshots were provided for analysis (figs. 2 and 3). These were specifically selected as they revealed different represented elements which were perceived to be most relevant to the focus of the research (Lacković, 2018). Finally, conclusions were drawn about the relationship between the multimodal characteristics and the affordances of the screencast in respect of enhancing feedback practices in teacher education, the object of the research.

Fig. 2. A screenshot of the screencast

Fig. 3. A screenshot of the screencast depicting different visual elements

IG analysis

As I have already introduced the triadic sign on which the analysis is based, the definition of each IG code, as aligned with fig. 1 presented earlier in this paper, is provided below.

  • Representamen: the identification by the participants of the visual and auditory screencast elements.

  • Denotation: a description of each element: what the participants recognise is happening on the screen in conjunction with the audio commentary.

  • Connotation: the socio-cultural and personal meanings associated with each element.

Findings were developed as reflections and insights following the analytical coding.

Ethical considerations

During the data collection phase of the study, an open and welcoming environment was created, one in which the participants were encouraged to speak as freely as possible. They were reassured that no interpretations were considered incorrect; it was the significance they attached to the multimodal elements that was noteworthy. Forging a relationship based on trust and openness also facilitated a rebalancing of the unequal power dynamics in the study, between the researcher (a teacher trainer) and participants (student teachers). Each interview took approximately thirty minutes and was transcribed verbatim; as new themes emerged, these were categorised using NVivo software. These broad themes were then divided into sub-themes in order to perceive patterns in the data and highlight any gaps or follow-up questions. Confidentiality of data was assured and, to protect the participants’ identities, each interviewee was allocated a number: P1, P2 and so on. Where illustrative quotes have been used to record the interviewees’ narratives, their academic and/or vocational discipline has also been provided to contextualise the data.

Findings using inquiry graphics analysis coding

Multimodal elements (interpretation level one)

A list of the individual elements of the screencast (figs. 2 and 3) is provided in fig. 4. These have been aligned in order of their spatial and structural “hierarchy”: smaller units are represented via the number of oblique lines.

Fig. 4. Spatial and structural hierarchy of multimodal elements in the video feedback

It was evident from the interviews that the participants easily identified the leading modes – the voice, laptop screen and also the cursor – but the sub-units, smaller units contained within these larger units, were generally not named and needed to be elicited from the interviewees. For example, in regard to the voice elements, the participants were able to broadly identify the tone of the voice but did not comment on prosodic features such as tempo, pitch and intensity.

This illustrates that participants and learners in general might not be attuned to detailed observation of representations and their description. In educational practice, we often engage with signs, such as books, technologies or pictures, but their semiotic compositions as meaning-making resources are rarely reflected upon. Semiotic and multimodal approaches support the sharpening of the senses, for example hearing or vision, in order to explore and question what is perceived. Multimodality research makes a case for attending to a combination of modalities to acquire a deeper and more critical understanding of how meanings are created and interpreted in communication.

Denotation of elements (interpretation level two)

The next stage of the process was for the interviewees to describe what was happening on the screen as the tutor delivered the feedback. A summary of the researcher’s and participants’ responses is provided below, with corresponding quotations from the tutor’s audio commentary where appropriate. These are categorised under the leading modes – the voice and screen – although there is clearly an overlap between the two as the feedback is essentially multimodal.

The screen

  • TEXT (structure and meaning): In fig. 3, the screen reveals the title of the assignment in bold print. There are two main blocks of text visible on the page. The text consists of letters; these form words, phrases, clauses and sentences within paragraphs. There are two paragraphs: one comprising five lines and the other twenty lines. These deal with separate topics or ideas: the first paragraph introduces the assignment, indicated by its use of language: “In this essay, I am going to focus on…”. The second paragraph introduces the reader to the philosophical underpinnings of behaviourism, a learning theory shown through the choice of lexis; for example: “their findings led to the school of behaviourism…”.

  • TABLE: The assessment table (fig. 2) outlines the trainee teacher’s perceptions of her strengths and weaknesses in relation to meeting the assessment criteria and development of her academic writing.

  • HIGHLIGHTING AND COMMENT BOXES: Three words are highlighted in yellow (fig. 3). The comment boxes in the right-hand margin (fig. 2) are linked to words and phrases within the body of the text. They represent different uses of language (functions), for example, praise, suggestion and more directive comments. These are in pink.

  • CURSOR: The position of the cursor above or below the words has a focusing function. Its movement is linked to language in the voice commentary, primarily to replicate what is being said and to emphasise special information juxtaposed with deictic expressions such as “here” and “there”.

The voice

  • Words are often stressed in conjunction with the movement of the cursor; for example, “a new paragraph here”.

  • Words are stressed when they are repeated several times; for example, “very long paragraphs”.

  • The voice quietens when the cursor hovers on specific words; for example, “entails learning occurs… I wondered if you meant”.

  • The voice speeds up and there is a rise in pitch when the speaker reads information from the table (fig. 2); for example, “so on what areas would you like to get feedback?”.

  • Pauses of more than two seconds usually occur when the cursor is static along with the utterance of a discourse marker/filler, primarily “um”; for example, “I thought it was very good and um (….)”.

Connotation of elements (interpretation level three)

Participants were asked to comment on their interpretations of the multimodal elements: the concepts and values which they felt were communicated through the audio-visual feedback. Their understandings of the represented elements in conjunction with mine are provided below. As before, although the findings are categorised under screen and voice elements, the two are clearly interrelated and, therefore, relational connections are inevitable and highly valuable in understanding the materiality of the screencast.

Screen elements

Textual elements: Typography and layout

The screen has a communicative or social function (Kress, 2009) which shapes the relationship with the observer or listener. This is visible if we compare the screen elements in figs. 2 and 3. The screen in fig. 3 does not contain any images; the text is more lexically dense and follows a logical and cohesive structure. The visual and linguistic features ease the reading of the text and focus the reader’s attention. This is evidenced via the use of paragraphs (an introductory one and a longer one), topic sentences and discourse markers, for example “on the other hand”. Coupled with its use of academic language and writing conventions such as embedded citations, it is evident that this is a scholarly piece of assessed writing.

In fig. 2, the writing is contained within a table. This is visually more appealing as it is less cluttered. It also occupies most of the space on the page, so the reader’s attention is immediately drawn to it. The use of numbers and bullet points also aids clarity. In the screencast commentary, when the tutor zooms in on the table and discusses what is written, the cursor largely remains static. Because less cognitive effort is demanded in interpreting the written word, there is less need to pinpoint specific elements. This also explains why the voice speeds up here as the information is shared by both tutor and student, so less elaboration is required.

Comment boxes

The comment boxes in the right margin were generally thought to be a useful addition to the spoken commentary owing to their position on the screen and focusing function as highlighted below:

I think they’re clearer on the side because if you embedded them in the text, the student might get mixed-up (P10, Beauty).

I’ll read the comments and then listen to them to try to understand what the tutor meant so the writing is also important because when I don’t listen to the feedback, I can see and correct my mistakes (P9, Engineering).

It is significant that the comment boxes are located on the right side of the screen. Kress and Van Leeuwen (1996) discuss how the positioning of elements communicates different meanings. In the composition of the page or screen, the left-hand side is generally reserved for given or accepted information so, here, the margins, which are used to define the beginning and end of text, are clutter-free. The comment boxes on the right represent new and, potentially, valuable information for learners, framed as separate from the body of the text. For a few participants, these were perceived as having a “scolding” function:

It can be a little off-putting if you see hundreds of comments down the side of the page… People would naturally think if something has been highlighted, it’s wrong (P8, ESOL).

This, however, depends on their content, whether they pose a query for the learner to consider or serve a more corrective function.

Most of the interviewees observed how the addition of the voice enhanced the feedback process as they were able to make clear links between the written word and the tutor’s interpretation of the feedback. This aligns with Mayer’s (2009) cognitive theory of multimedia learning which posits that the brain processes information in different channels: visual and auditory. In order to make sense of this information, individuals will assimilate both modes, building on their cognitive schemata. The synchronous nature of multimodal feedback, combining both visual and auditory elements, may signify less cognitive overload and result in deeper processing and learning.

However, dual-coded feedback, with spoken and written comments in tandem, was perceived as potentially confusing by one participant:

You’re reinforcing the written comments with verbal ones although the problem for the listener or observer is do I listen or do I read? (P7, English Literature).

Furthermore, in this screencasting example, the written feedback was pre-existing, and tutors might be tempted to repeat the information on the screen in the spoken commentary. A more dynamic effect would be for tutors to mark up the assignment as they speak so the feedback unfolds before the students, enabling the recipient of the feedback to link elements together and make sense of the feedback holistically. Without the written margin comments to support the voice narration, learners might be less distracted and more inclined to listen to the feedback. This view, however, was generally not shared by the participants, who believed the text comments provided reinforcement and demonstrated the tutor’s interest in the student’s work.

Highlighting feature

Because the comment boxes are located at the side of the screen, in the peripheral view of the student, they are perhaps not deemed quite as significant in conveying meaning as other visual modes, namely the highlighted sections and the cursor, which hold the individual’s attention in connection with the voice. The use of colour makes words and phrases stand out and, as such, is perhaps more memorable for students. If they can notice and reflect on their error, they may be better able to regulate their own learning:

Highlighting’s bright; it tells you the words that you need to work on. For example, “entails learning” is highlighted: that’s a specific word or phrase you’re talking about. The learner has something to take away and look back on for the next assignment (P10, Beauty).

The highlighted words inevitably imply that the learner needs to be alert to something in their writing, generally a linguistic error of some kind. However, because they are not attached to any written comments, there is a danger that learners will be unable to interpret the feedback, only guessing what the tutor wants from them. This is where the synchronous voice commentary is significant in dictating the way the feedback is provided as emphasised below:

Sometimes words on the screen are quite flat and you don’t always understand what’s being suggested. I think when someone speaks you get a better understanding of tone, whether something’s right or if it’s a question. For example, in the highlighted part, “entails learning”, just looking at the yellow bit, I don’t know why that’s been highlighted but the voice said, “I’m not sure, are you suggesting…?” It’s obvious there was a question the tutor wanted the student to think about (P8, ESOL).

This suggests that the intonation of the voice provides more meaning than words alone and allows others to make sense of what we are saying. Although this is generally at an unconscious level, we become attuned to speakers’ intentions and emotions through the way they express themselves. This also reiterates the view that it is the combination of spoken and visual elements that enhances the understanding of the feedback.

Cursor movement

With screen cultures, the temporal and spatial aspects of feedback practices are highly significant. The fluctuating motion of the cursor, for example, could be perceived as an extension of the speaker’s hand, and provides a dynamic element to the feedback. The cursor, as a deictic tool, is instrumental in guiding the observer to different locations on the screen, moving from the “here and now” to the “there” (Hill, 2006). Combined with prosodic features such as word stress, there is a sense of travelling on the same conceptual route as the speaker. The corrective features of the cursor can also save time in providing feedback, resulting in an instant and possibly more memorable way of exerting the tutor’s authority. One interviewee, who had been a recipient of multimodal feedback, remarked how the indication by the cursor of where to begin a paragraph had enhanced her knowledge of this feature albeit connected to other semiotic features:

It was very visual because I could see the cursor and could see how to start a new paragraph… I need to see things for them to stick in my mind [P4, ESOL].

Nevertheless, the absence of the student in the feedback process is significant. The flow of information is still one-way irrespective of the form of the feedback. This means that the tutor needs to carefully consider which aspects are noteworthy to highlight in the assignment to avoid information overload and disorientation as the user scrolls up and down the screen with the cursor (Martinez-Arboleda, 2018).

Voice commentary

The area which provoked the most discussion during the interviews was the effect of the voice commentary in conjunction with visual elements. Although the voice was not considered “superior” to the other elements, its impact on the feedback process, as mentioned in the previous section, was undeniable. Its impression on the participants was noticeable, possibly because it was perceived as a novelty; the student teachers had become so accustomed to written comments that hearing the voice as they watched the screen immediately piqued their interest. A recognisable voice was also perceived as more genuine as it provided warmth and a more individualised approach to feedback:

It gives it [feedback] a personal touch because if it’s your tutor giving you the feedback, it makes you think you are listening to him or her teaching the class so you can relate to it better (P5, Health and Social Care).

The tone, pitch and intonation of the voice appear to soften more critical comments on the page, and the informal use of language, displayed through hesitation (“um”) and vague language (“sort of”), gives the commentary a spontaneous feel, which could put the listener at ease. Longer pauses and hedging language, for example “I JUST wanted to go through a few points with you”, further convey a certain tentativeness on the part of the speaker: the adverb ‘just’ is stressed, underlining how the tutor is gathering her thoughts together. Cumulatively, these aspects reveal the speaker’s concern for caution, to avoid being too direct and affecting the recipient’s self-esteem. The tutor wants to convey a supportive persona and these auditory elements suggest that she is conscious of not being too judgemental in her narration as she balances facilitative and evaluative comments.

Discussion in relation to the research object (interpretation level four)

In this section, I return to how the findings of the multimodal analysis relate to feedback practices in teacher education.

The link between focused attention and engagement in feedback practices

In the voice commentary, several of the visual screen elements, for example, comment boxes, highlighted words and paragraphs, were explicitly referenced by the tutor. The material aspects of the feedback were thus intricately related to the construction of the trainee’s knowledge. The juxtaposition of visual pointers such as the cursor and audio narration foregrounded the tutor’s preoccupation with the micro-elements of feedback connected with syntax and lexis. Prosodic features of speech such as pauses and lowering of pitch also affected how meaning was represented, in this case to signal doubt and caution. The use of language was still significant, for example, hedging language to signify tentativeness, but this was situated within a “multimodal ensemble of modes” (Jewitt, 2013, p.251). Using these visual and sound signifiers suggested that the attentional and self-regulatory capabilities of the recipient of the feedback would be enhanced as the student teacher noticed and acted upon these spoken and written comments. A connection between focused attention and engagement was also implied: the trainee would be more motivated to watch the whole of the student screencast because the combined multimodal elements communicated meaning more efficiently.

Student engagement is a widely espoused albeit slippery concept in higher education (Kahu, 2011). In recent years, higher education institutions have implemented measures to respond to students’ views on feedback practices (Crook et al., 2011), yet learner satisfaction levels with assessment and feedback remain consistently lower than those for other aspects of their higher education experience. Jones et al. (2012) suggest that video screencasts promote student engagement with feedback and this research largely supports this idea. In this study, the illusion of direct address conveyed through linguistic and visual modes (the tutor’s use of conversational language, the tone of the voice and the movement of the cursor) suggested greater interest, care and engagement in the feedback process.

However, because the transmission of feedback is one-way, changes would need to be incorporated into existing feedback practices to move towards greater reciprocity in the student-tutor relationship. Tutors are responsible for determining the content and there is a danger with this medium that they are only paying lip-service to a more democratic feedback process. With respect to teacher education, if engagement is at the forefront of teaching, learning and assessment practices, it would be beneficial for student teachers to engage in critical discussions around more complex issues such as the power relationship within feedback and other aspects pertaining to teaching pedagogy. Raising awareness of the discourses and resources available to them will enable trainee teachers to voice their opinions and develop their professional identities in the process, essential to their development as teachers.

The significance of materiality in feedback practices

Material aspects – technologies, texts and visual elements – play a pivotal role in teaching and teacher education. Practices are “intrinsically connected to and interwoven with objects” (Schatzki, 2002, p.106). Feedback practices will inevitably entail interaction with non-human artefacts, including handbooks, digital tools, chairs and cultural artefacts such as the tick and cross marking symbols, but the impact they have on social interaction and educational practices is frequently overlooked. Identifying individual elements in the multimodal ensemble enabled the researcher in this study to better understand their significance in the sociomaterial assemblage of arrangements and circumstances which hold a practice in place. These include temporal and spatial aspects, histories, relationships across people, conventions and values which can both enable and constrain a practice (Fenwick et al., 2011). For example, the written form remains the expected mode of feedback in further and higher education institutions, connected to its social, cultural uses, and is embedded in the norms and protocols of assessment and feedback practices. This point applied to this study as, generally, the student teachers were required to mark online using plagiarism detection software or to complete written summaries on learners’ work in adherence to quality assurance processes. The trainees were doubtful whether internal and external verifiers would want to spend time accessing and viewing screencasts as part of the feedback process. Students too have become accustomed to producing written assessments in a prescribed format as documented earlier: a bold font to convey the title; margins to signal space and readability; paragraphs to assure cohesion and so on. Incorporating multimodal screen feedback into existing feedback practices was perceived as a challenge although not an impossibility: any change initiatives would need to be aligned with the cultural ethos of the institution.

Within a sociomaterial realm, it is necessary to explore how meaning emerges from a wider range of interests beyond the individual (Fenwick et al., 2011). A multimodal approach focuses on the social effects (Jewitt, 2013) that are enacted via different ensembles of modes. It has already been noted that specific voice elements, such as tonal variety, long pauses and intonation, were perceived as neutralising more negative comments in the feedback. The spatial and temporal aspects of a practice are also significant in constructing social relations. For example, the asynchronous nature of the feedback may reduce affective barriers as both the tutor and trainee are less constrained by social pressures: they occupy the same virtual space but are unable to read each other’s facial expressions. Vincelette and Bostic (2013) maintain that devoid of the social environment and conventions involved in the delivery of feedback such as the positioning of chairs, the feedback is more focused. The “conversation” is less likely to deviate off topic, as the instructor is in control of the delivery and difficult exchanges can be avoided.

However, is there a danger that the feedback begins to border on narcissism, as suggested by one of the interviewees (P2, Education Studies), as tutors impose their authority on a student’s work, directing learners to the areas they specifically want them to notice? The interviewees felt that one advantage of this type of feedback was being able to repeatedly watch the screencast in their own time and space, but there was also a concern that the written comment boxes, often comprising more critical feedback, use of the voice and the “judgemental” cursor (P1, ESOL) might result in the student feeling demoralised. Although uncomfortable conversations are part and parcel of teacher education feedback practices, tutors still need to be mindful of how they position themselves in the process and convey feedforward messages, particularly as they are unable to gauge the reaction of the recipient in real time. The value of the feedback, therefore, does not simply rest on its medium but on how it is incorporated into existing feedback and assessment strategies (Lamb, 2018).

The significance of multimodal design in feedback practices

An increasing focus on dialogue and self-regulation in feedback has resulted in a blurring of roles between teacher and student. Students, like teachers, have become “designers of their learning practices” (Kress and Selander, 2012, p.265). Here, design is understood as something which shapes both practitioners’ use of artefacts and the way they interact. It is not arbitrary: teachers will select the mode which they feel best communicates their feedback messages depending on their understanding of its histories, material characteristics, the semiotic resources available to them and a knowledge of the wider social conditions (Kress, 2009) in which feedback will occur. For example, in this study, the provider of the feedback often focused on replication of information within the text, using the voice to elaborate on highlighted words and phrases, aided by the cursor. In conjunction with these multimodal clues was the instructor’s use of language, selected to be sensitive to the trainee’s personality, previous history and learning needs.

Learners will engage with the signs conveyed through the feedback, drawing in turn on their semiotic resources to interpret them in their own way, forming a new sign in the process (Kress and Selander, 2012). This dynamic view of meaning-making requires a re-examination of the view of agency in feedback and how this shapes the relationship between teachers and students. A focus on more collaborative feedback, however, does not necessarily signify a shift in practice. The design of teacher education programmes means that trainees’ teaching practice frequently revolves around meeting pre-determined outcomes (Brandt, 2008), resulting in prescriptive models of teaching and learning, with few opportunities for innovation. This can also lead to trainees teaching in a way which is different from their own beliefs of effective pedagogy as they conform to what they believe their tutor wants to see. Acknowledging these oft unspoken aspects of teaching practice with student teachers could facilitate a better understanding of the significance of design in teaching, learning and assessment practices. Discussions on teacher education programmes about how the representation of knowledge is shaped by the selection and configuration of modes (Jewitt, 2013), which then inform design practices, will foreground the view that all signs are motivated by the interests of the sign-maker and interpreted in social interaction (Kress, 2009).

As with this study, an IG approach would be a valuable way of uncovering some of the meanings implicit within written feedback artefacts. Such an approach provides a precise, robust and meaningful analytical framework within which to study the format, discourse and content of feedback artefacts used in different disciplines, including digital tools, e-portfolios, written summaries and criteria sheets, and how these aspects afford and constrain their use (Wertsch, 2007). For example, the student teachers could examine the implicit power relations embedded within these artefacts (Engin, 2015) and how the social and the material are interrelated (Orlikowski, 2007) in feedback practices. If they recognise the attributes of different semiotic resources and the constraints which affect their situated use, they are better positioned to critically evaluate and select artefacts which best convey their messages and accommodate their learners’ needs, interests and goals. This is naturally not a straightforward process as departments and institutions are frequently bound by external accountability pressures. Nevertheless, raising awareness of the different meanings that emerge from these artefacts could also provoke introspection of feedback practices at a micro, meso and macro level, encouraging the trainee teachers to “reflect on their work relative to the wider domain of other practitioners in the field and in the context of their work in the marketplace” (Ryan and Brough, 2012, p.5).


During the study, the research participants and I became more attuned to the different multimodal elements which, when intertwined, contributed to the totality of feedback: its richness and complexity. Although language was significant in conveying meaning, it was the semiotic companionship of the voice, visual and textual elements which emphasised the impact of the material representation and design of multimodal screen audio feedback. Adopting an Inquiry Graphics approach proved invaluable in exploring both “compositional” and “social” modality (Rose, 2001, p.72), moving progressively from unpeeling the different levels of signification at the micro level to examining meaning-making on a macro level and discussing the implications for feedback practices in initial teacher education. The purpose of the research was neither to recommend nor to criticise audio-visual feedback, but its semiotic potential suggests possible avenues for exploration in opening up the sociocultural landscape of multimodal communication to pedagogic inquiry on teacher education programmes.

In addition, this study has attended to the constitutive role of design and artefacts in situated feedback practices. Digital technologies prefigure human relationships and activities, but social arrangements also shape and condition their use. In terms of feedback practices, these social forces include having to conform to rigid assessment criteria, access to appropriate resources, temporal and spatial aspects, and engrooved cultural norms. Conducting research into the perception of digital technologies necessitates further analysis beyond the modality itself (Phillips et al., 2016) to explore how these technological artefacts constitute “larger systems and elements of a sociotechnical landscape” (Rip and Kemp, 1998, p.328). A useful extension of this research study would be to investigate the use of multimodal artefacts in specific contexts, through an ethnographic lens, to examine their role in feedback and what “shapes, sustains and transforms” (Mahon et al., 2017, p.2) the practitioners’ realities. By obtaining a better understanding of the consequences that are generated from situated actions, practitioners would be better equipped to respond to contextual challenges and develop and design better practices for the future.

