Multimodal Genre of Science Classroom Discourse: Mutual Contextualization Between Genre and Representation Construction

This paper argues that meaning-making with multimodal representations in science learning is always contextualized within a genre and, conversely, what constitutes an ongoing genre also depends on a multimodal coordination of speech, gesture, diagrams, symbols, and material objects. In social semiotics, a genre is a culturally evolved way of doing things with language (including non-verbal representations). Genre provides a useful lens to understand how a community’s cultural norms and practices shape the use of language in various human activities. Despite this understanding, researchers have seldom considered the role of scientific genres (e.g., experimental account, information report, explanation) to understand how students in science classrooms make meanings as they use and construct multimodal representations. This study is based on an enactment of a drawing-to-learn approach in a primary school classroom in Australia, with data generated from classroom videos and students’ artifacts. Using multimodal discourse analysis informed by social semiotics, we analyze how the semantic variations in students’ representations correspond to the recurring genres they were enacting. Across different scientific genres, we found a general pattern in the use and creation of representations that supports the theory of a mutual contextualization between genre and representation construction.


Introduction
Social semiotics theory has been widely used to understand language and meaning-making in the science classroom. Seminal studies from this theoretical tradition include Talking Science (Lemke, 1990), Writing Science (Halliday & Martin, 1993), Reading Science (Martin & Veel, 1998), and Multimodal Teaching and Learning (Kress, Jewitt, Ogborn, & Tsatsarelis, 2001). In general, there have been two major strands in applying social semiotics to science education research. The first strand foregrounds the importance of genre as a "staged, goal-oriented social process" in all human activities (Martin, 2007). Genre has been used to examine the linguistic structure in scientific written texts such as explanation or experimental account (e.g., Rose & Martin, 2012; Tang, 2016a; Unsworth, 2001). However, its application has largely been restricted to a mono-modal focus on reading or writing. The second strand applies the concept of multimodality to examine how students learn with a range of multimodal resources including diagrams, symbols, and gestures (e.g., Tang, 2016b; Tang, Delgado, & Moje, 2014; Van Rooy & Chan, 2017). However, studies in this strand seldom take into account the role of genre as the overarching "context of culture" (Halliday, 1999) that shapes how people use and create representations.
Combining the ideas of genre and multimodality (henceforth multimodal genre) has received increasing attention in the analysis of multimodal texts and digital media in the last decade (e.g., Bateman, 2008; Nielsen, Jones, Georgiou, Turney, & Macken-Horarik, 2019). Yet few researchers have applied multimodal genre to the analysis of classroom discourse or within science education. This has resulted in a general tendency in science education to interpret representations in isolation from the genre and social context that drive the meaning-making process in using those representations. For instance, it is common to label a student's drawing as an "explanation diagram" by noting visual features in the final product (e.g., Ainsworth, 2006), without examining the social purpose of why the drawing was made or the process of how it was drawn.
This paper presents an argument that meaning-making with multimodal representations in science learning is always contextualized within a genre and, conversely, what constitutes an ongoing genre also depends on a multimodal coordination of speech, gesture, diagrams, symbols, and material objects. This argument is supported by a case study that illustrates how a group of primary school students made science meanings during a drawing-to-learn approach. The study shows the students' use and creation of representations tended to correspond to recurring patterns in different scientific genres. It also implies the importance of further theorizing multimodal genre within the context of science education, drawing upon the conceptual tools of genre and multimodality from social semiotics theory.

Literature Review and Theoretical Framing
Genre: a Pattern of Language Use
In social semiotics, a genre is not merely a collection of similar texts, but it is a "higher-level patterning" of language use that both arises from and realizes the repetitive stages and actions in a communicative event (Bateman, 2008; Lemke, 1988). There are numerous genres in any culture that we can recognize based on the defining stages as well as the semantic variations of language use¹ in every stage. For instance, a shopping transaction genre frequently unfolds along a series of stages, with each stage achieved by a distinctive semantic variation, as shown in Fig. 1. These patterns (both stage and semantic variation) are quite consistent and recurring in a given culture. Thus, genres help us to predict how an activity is going to unfold, and this is how we can seamlessly participate in the activity by knowing and expecting what to say and do at various stages. In contrast, we often find it difficult to participate in unfamiliar genres. In other words, a genre is a culturally evolved way of doing things with language.
The pattern in a genre is characterized by how we use language to mediate and achieve different functional stages of a social activity. Over time, the pattern evolves and develops into a set of norms and conventions that is not only recognizable within a culture, but also becomes a kind of framing that guides our participation within familiar genres (Fairclough, 1992). It is important to highlight that genres are not fixed structures that dictate our actions. Instead, genres are abstractions that we create and maintain through our communicative actions. In other words, the relationship between a genre and a communicative event is mutually constitutive and reciprocal. Just as knowing the predictive stages of a genre guides our participation in the associated activity, the semantic variations in the things we say and do in a communication also shape our recognition and co-construction of the genre and its constitutive stages.

Scientific Genres
Functional linguists have identified four major genres in science, namely, information report, experimental account, explanation, and argument (Rose & Martin, 2012). Each of these genres consists of several functional stages as summarized in Table 1. Every stage is distinguished and recognizable based on a particular semantic variation in the text. For instance, an explanation genre comprises the stages of phenomenon and implication (Unsworth, 2001). The phenomenon stage is typically a general statement of natural events or objects that is linguistically realized as simple clauses with timeless present-tense verbs; for example, "Matter exists as either a solid, liquid or gas." By contrast, the implication stage often contains a logical sequence of causal or temporal statements that have a high percentage of action verbs (e.g., molecules vibrate) and conjunctions (e.g., and, then, therefore).
Although the characterization of scientific genres according to Table 1 is widely used in science education (e.g., Fang, Lamme, & Pringle, 2010), it is mainly derived from the account of written texts. The assumption here is that the structure of written texts reflects the staged, goal-oriented social activity that produced those texts (Fairclough, 1992). This assumption neglects the role of non-verbal resources that are prevalent in scientific discourse and thus motivates us to consider the concept of multimodality.
¹ In social semiotics theory informed by systemic functional linguistics, this semantic variation is referred to as register. Register consists of three variables corresponding to the context of situation that is occurring. These variables are connected to the subject matter in the communication (called field), the social relationship between the participants (tenor), and how the communication is constructed or set up (mode). Thus, a genre is theorized as a pattern of register, which is in turn a pattern of discourse semantics, which is a pattern of lexicogrammar, which is itself a pattern of phonology and graphology (Lemke, 1988).

Multimodality: Combination of Multiple Modes
Multimodality refers to the multiplicity or combination of two or more modes in the act of making meaning (Lemke, 1998). In social semiotics, a mode is defined as a system of culturally shaped signs (e.g., sounds, alphabets, symbols) that are created and used by a discourse community as a meaning-making tool. Van Rooy and Chan (2017) identify five of the most common modes used in science education. These modes are as follows: verbal-linguistic (spoken and written language), visual-graphical (image, diagram, graph), mathematical-symbolic (number system, equation), gestural-kinesthetic (hand gesture, gaze), and material-operational (physical object, tactile manipulative).
In social semiotics, the identification and analysis of genres has traditionally focused on the verbal-linguistic mode, and it was only recently that the notion of "multimodal genre analysis" became known (Hiippala, 2014). Since then, some researchers have applied the ideas from genre theory to the analysis of multimodal documents, such as newspapers, illustration books and websites, and within science education, digital explanations (Nielsen et al., 2019). However, as Bateman (2008, p. 182) cautions, "the extension of traditional notions of genre to multimodal documents is not straightforward. Just what genres there are and how they are best to be defined multimodally remains far from clear." In science education, few studies have explicitly made a connection between genre and multimodality. There have been some studies that examine how multiple representations (notably visual ones) are used to construct scientific explanations (e.g., Park, Chang, Tang, Treagust, & Won, 2020; Prain & Tytler, 2012; Yeo & Gilbert, 2017). These studies have highlighted the affordances of various representations to support students in learning and producing explanations. However, they do not make explicit the role of genre. They are also limited to one particular genre (i.e., explanation) with few comparisons across other genres. What we need are further studies that examine the use of multimodal representations across a range of genres within science classroom discourse.

Fig. 1 (shopping transaction genre; of the stage labels, only "Product inquiry" survives)
- Product inquiry: Buyer asking questions, seller offering information. Factual information on product specification, performance, pricing, durability, after-sales support etc.
- Series of requests and acceptance/rejection between buyer and seller. Argumentative and persuasive talk around price and economic justification.
- Seller requesting information and actions, buyer complying or declining. Factual information on finance and payment. Exchange of cash or cards as well as goods and services.
- Buyer or seller signalling end of sale. Exchange of gratitude.

Table 1 (only the row for the argument genre survives)
Argument
Purpose: To present a claim and supporting evidence in favor of the claim
Stages: Thesis (claim or position); Argument (justification of claim); Discussion (consideration between two claims)

Research Context and Purpose
We draw on an instrumental case study (Stake, 2000) to analyze data from a primary school science classroom in Australia. Fifteen fifth and sixth grade high-ability students took part in a research project that saw the enactment of a drawing-to-learn approach over five 120-min lessons. The topics involved were states of matter, water pressure, and air pressure. The drawing-to-learn approach, called the Thinking Frame Approach (Newberry & Gilbert, 2007), generally followed a guided inquiry where students investigated a puzzling phenomenon through hands-on activities, discussed ideas in small groups, and then constructed oral and written explanations to account for the observed phenomenon. This approach was chosen due to its focus on the use of multimodal representations. The design and enactment of this instructional approach were not guided by genre theory or social semiotics. All the lessons were conducted by an experienced teacher who collaborated with us during the research project. Taking the role of participant-observers, the three authors took part as co-instructors throughout the lessons.
Given the highly contextual nature of a case study, the purpose of the study is not to make universal claims regarding the connection between scientific genres and multimodal representations. Instead, our purpose is to generate a "thick description" (Denzin, 2002) to reveal interesting and detailed connections between genres and multimodality as they were dynamically enacted in a classroom environment that used a drawing-to-learn approach. With this purpose and context in mind, the research questions that guided our analysis are as follows:
1. What genres were enacted through classroom discourse during the lessons?
2. What were the recurring semantic variations in the use and construction of multimodal representations within each genre?

Data Sources and Analytical Methods
Aligned with our purpose of generating a thick description, the primary data source for this study was classroom videos. Videos provide a rich multimodal dataset to examine the discourse among the teachers and students. As our focus was on student learning, we used four cameras and audio recorders to record each group of students. The videos captured the interaction among the students, teacher, and researchers; their manipulation of hands-on materials; and the process of their drawing and writing on an accompanying worksheet. The completed worksheets, containing the final products of their drawing and writing, were also collected to obtain higher resolution copies of the students' work (see https://goo.gl/rQVBWK for a sample).
Data analysis of the videos was conducted in two interconnected phases. The first phase comprised a macro-event analysis to view the entire video stream and divide it into discrete and meaningful "episodes," as determined by the shifts in participation structure or nature of the interaction (Erickson, 1992). Based on Jordan and Henderson's (1995) interaction analysis, a content listing indexing the description and categories of every episode with their respective video timestamp was also generated. The purpose of the content listing was to provide an overall contextual basis as well as facilitate a purposeful sampling and selection of "telling cases" (Mitchell, 1983) for subsequent analysis. Telling cases serve to reveal insights into the phenomenon being investigated and enable researchers to establish connections between the theoretical constructs of the study (i.e., scientific genre and multimodality).
The second analytical phase comprised a micro-discursive analysis, which involved a discourse analysis (Tang, Tan & Mortimer, 2021) of the participants' moment-by-moment actions. In particular, semantic analyses based on social semiotics were used to analyze the verbal utterances (Lemke, 1990) and drawings (Kress & van Leeuwen, 2006). This analytical method involved identifying the semantic relationship among the words within an utterance (for verbal mode) and visual elements within a drawing (for visual mode). Common semantic relationships include hyponym (type-of relationship, e.g., solid is one of the states of matter), meronym (part-whole relationship, e.g., matter is made of atoms), and transitivity (object-verb relationship, e.g., ice melts). See Tang (2020) for further elaboration of this method for analyzing both verbal and visual modes. The participants' gestures and gazes were also considered in the micro-discursive analysis, but they were not analyzed in as much detail as the verbal and visual modes.
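For readers who want to keep such codes in a machine-readable form, the three semantic relationships named above can be recorded as simple labelled links. The following Python sketch is purely illustrative and is not part of the study's procedure; the class and field names (Relation, SemanticLink, mode) are hypothetical, and the three links are the examples given in the preceding paragraph.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Relation(Enum):
    """The three semantic relationships named in the text."""
    HYPONYM = "type-of"
    MERONYM = "part-whole"
    TRANSITIVITY = "object-verb"


@dataclass(frozen=True)
class SemanticLink:
    """One coded relationship between two elements (hypothetical schema)."""
    source: str
    relation: Relation
    target: str
    mode: str  # assumed labels: "verbal" or "visual"


# The examples quoted in the paragraph above, coded as links.
links = [
    SemanticLink("solid", Relation.HYPONYM, "states of matter", "verbal"),
    SemanticLink("atoms", Relation.MERONYM, "matter", "verbal"),
    SemanticLink("ice", Relation.TRANSITIVITY, "melts", "verbal"),
]


def tally_relations(items):
    """Count how often each relation type occurs in a coded segment."""
    return Counter(link.relation for link in items)


counts = tally_relations(links)
```

A tally of this kind would only summarize codes that an analyst has already assigned; identifying the relationships in talk and drawings remains an interpretive step.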
To address our research questions, the identification and characterization of the genres were carried out in both analytical phases, in an iterative and incremental manner. During the macro-event analysis, each episode was first tentatively coded with an identified genre: report, explanation, or experimental account (no argument genre was observed during the lessons). This preliminary coding was informed by our current understanding of scientific genres based solely on the verbal mode (see Table 1). During the micro-discursive analysis, we then examined the multimodal coordination of representations (e.g., speech, drawing, gesture, hands-on materials) more closely to identify the recurring semantic variations that characterized the genre.
In this iterative process, we made changes to our earlier identification of genres in three ways. First, we expanded our characterization of genres based on a more detailed and holistic consideration of the non-verbal resources that were used. Second, the multimodal nuances allowed us to resolve any ambiguity and overlap between genres that were identified during the macro-event analysis. Third, as we revised our characterization of genres, we modified our earlier preliminary codes for the genre based on its multimodal characteristics. From this process, assertions that address our research findings were formulated. Throughout the process, the authors checked on one another's analysis and interpretation in order to establish a common understanding and joint consensus.
Following a constructivist research paradigm (Guba & Lincoln, 1994), we adopted the following criteria in strengthening the validity of this study: prolonged engagement with participants, disconfirming evidence, and thick description (Creswell & Miller, 2000). First, all the authors were present at the research site during the data collection and actively engaged with the teacher and students for the entire duration of the program (over two months). This allowed us to build trust and rapport so that the participants were comfortable in revealing their natural self and disclosing reliable information in front of the cameras. During the analysis, we constantly searched for confirming as well as disconfirming evidence that might contradict the emerging claim or interpretation. Any disconfirming evidence was discussed jointly, which frequently led to a clarification in the interpretation or modification of the claim. Lastly, the provision of a thick description also improves validity as the detailed and contextual account (within the space constraint in this journal) allows other researchers to evaluate and ascertain for themselves the credibility and applicability of our interpretation.

Findings
There was a general pattern in the semantic variations of representation across the genres of information report, experimental account, and explanation throughout the five lessons (see Table 2). This pattern is similar to previous studies that examined scientific genres solely in written texts (as reviewed earlier and summarized in Table 1). However, based on empirical evidence from science classroom discourse focusing on multimodal genre and in response to the research questions, this study added new semantic variations in terms of how non-verbal representations were used or created as part of the genres. Furthermore, the study revealed that each genre was usually triggered by the teacher's instruction at the beginning of the activity and followed by the creation and coordination of representations by the students in a certain way. Conversely, the ongoing construction of the students' representations also influenced the instructors' interventions (e.g., questions, directions) to maintain the development of the genre. This instructor-student interaction was how the corresponding relationship between genres and representation construction was developed and reinforced.

Table 2 Multimodal semantic variations across genres in science classroom discourse
(Genre, followed by each stage and the semantic variation of representation use)

Information report
Identification/classification:
- Mainly verbal language with relational verb, e.g., "what is pressure?", "what is the difference between solid, liquid and gas?"
Description:
- Verbal language: generalized, taxonomic (hyponym, meronym), non-sequential
- Occasional drawings showing and comparing taxonomic and microscopic attributes, e.g., particles, arrows
- Occasional tables showing comparison

Experimental account
Goal:
- Mainly verbal language with behavioral verb and action, e.g., "you have to tilt the bottle and observe what happens"
Methods:
- Predominant use of material-tactile objects and gestures
- Verbal language: specific, procedural, indexical to situation, e.g., "do this", "look here"
Results:
- Verbal language: concrete and observable statements with material verbs, e.g., "the block moves"
- Occasional drawings showing macroscopic observable objects, e.g., container, water

Explanation
Phenomenon:
- Frequently preceded by the "results" of an experimental account
- Verbal language tends to focus on "why" or "how"
Implication:
- Use of verbal and/or visual modes to show:
  a) Symbolic entities based on conventions (e.g., forces)
  b) Movement or dynamic action of objects
  c) Temporal or causal sequence of events
To illustrate our findings, we selected three representative episodes (one for each genre) to show the interaction and use of representations in further detail. Not only do these episodes typify the semantic variations of their corresponding genre (shown in Table 2), they were also selected to facilitate a meaningful comparison across the three genres as well as illustrate the instructors' interventions to trigger and maintain the development of the genre. Figure 2 shows the chronological sequence of genres across the five lessons, as well as the position of the three selected episodes.

Genre of Information Report (Episode 1)
This episode illustrates a typical information report genre. The students had individually written the similarities and differences between solids, liquids, and gases on a 3 × 2 table. The teacher then proceeded to get each group to "report back" (line 1) what they had written. The following transcript shows the exchange with one of the groups:
Transcription notation: Tr: teacher; S#: individual student; Ss: group of students; R1: researcher (first author). (..): pause of more than 2 seconds; (…): omission of irrelevant text. Bold: phrases and clauses crucial for our interpretation.
Fig. 2 Chronological sequence of events according to genres
In this teacher-students exchange, the verbal characteristic consisted of mostly clauses with relational verbs that describe a general attribute of the objects; for example, solid and liquid "have a definite mass" (line 5) and "are tightly packed" (lines 7, 8, and 10). The objects here are "solids, liquids and gases" as well as the "particles." There is a hyponym relationship among the first set of objects (solids, liquids and gases are types of matter) and a meronym relationship between "matter" and "particles" (matter is made of particles). These semantic relationships were made in line 3 when S1 said, "solid, liquid and gases (hyponyms of matter) are all formed by particles." The meronym relationship made in line 3 is an important distinction in order to separate the general attributes of solid, liquid, and gas at a macroscopic level (discussed from lines 5 to 7) and the attributes of their constituents at a sub-microscopic level (discussed from lines 8 to 15). Subsequently, the discussion from line 10 onwards focused on the attributes of the particles in terms of their motion. The teacher and students were negotiating the difference between "moving" and "vibrating." The students were initially not sure whether the particles are moving (line 11), although S2 knew that they are vibrating (line 12).
Once it was established that vibrating is "technically" (lines 14-15) a type of moving (a hyponym relationship), the students could accept that the particles are indeed "moving," just not "moving away from each other." To show the particles were "moving," the teacher made a reference to "the video" (line 15) they had watched prior to this worksheet activity. The video contains several diagrams showing the molecular behavior of a solid, liquid, and gas (see Fig. 3). There are two common semantic relationships depicted in the diagrams. The first one is inclusion, where the red circles are drawn inside the square, which signifies a meronym relationship where a container holds a solid, liquid, and gas, which are themselves made of particles. The second relationship is spacing, which shows the relative distance between the red circles, and this corresponds to the "tightly packed" attribute mentioned in lines 8 to 10 (see Tang (2020) for a range of such visual semantic relationships).
This episode highlights a typical semantic variation in the genre of information report as enacted through classroom discourse. Attributive and taxonomic relationships (e.g., meronym, hyponym) are frequently found and realized through both verbal and visual modes. This is because such relationships are essential in describing the abstract entities and their relations to one another in science. Comparative relationships are also prominent. These can be realized using conjunctions in talk (e.g., and, but, similarly, however), using a table, or juxtaposing diagrams side by side as shown in Fig. 3, or presenting them one after another in a video or presentation slide. An information report also does not have any ordered sequence. For instance, it makes no difference whether the verbal or visual descriptors for solid appear before or after those for liquid and gas.

Genre of Experimental Account (Episode 2)
In this episode, the students were participating in a hands-on activity to feel the water pressure acting on their hands, wrapped in a plastic bag, inside a pail of water. The teacher started by giving procedural instructions on how to perform the "experiment" (lines 1-5), and this was followed by the students taking turns to observe, feel, and react to the "results" of the experiment (lines 6-15):
Transcript columns: Line, Time, Speaker, Utterance and action
In this activity, although the teacher mentioned that it was an "experiment" (line 4), it is not a real scientific experiment involving hypothesis testing, controlled variables, and systematic data collection. Nevertheless, the activity incorporated several elements connected to an experimental genre that were appropriate for primary school students. These elements can be analyzed from the functional stages and semantic variations of representation use in this activity.
First, the teacher's verbal instruction and tactile demonstration from lines 1 to 5 as well as the students' physical actions from line 6 onwards mirrored the methods stage of the genre involving procedural and situated actions. The verbal instruction frequently contained behavioral verbs and demonstrative pronouns, such as "do it/this/that" (lines 1, 2, 4, and 5). At the same time, these instructions always occurred with an accompanying physical action, in order for the instruction to be contextually meaningful. For instance, when the teacher said, "you can do this," the demonstrative pronoun "this" refers to the procedural action of removing the plastic bag that would prevent "water dripping everywhere" (line 2). Subsequently, the students copied this action when they removed the plastic bag, as shown in Fig. 4. Another thing to note in an experimental account is that the procedural actions were sequential, as seen from the temporal words and conjunctions used in the instruction, for example, "as" and "when" (line 2), "after" (line 4), and "and then" (line 5).
The next functional stage mirrored the results of the experimental account genre. As the students dipped their hands wrapped in plastic bags into the water, they verbally reported their observation. In this case, the observation was tactile rather than visual as they reported how "it feels" (lines 7, 10, 12, and 13). From lines 7 to 12, the students only reported superficial results in terms of their sensation and emotional feeling (e.g., "weird," "so cool!"). This prompted the researcher (R1) to intervene by asking if they could feel the difference in pressure according to the depth of water (line 13). This intervention is telling as it shows R1 somehow knew that the students were not adequately producing the results of an experimental genre and thus decided to ask a question that would point them towards the genre. In effect, R1 introduced an experimental variable for them to test the variation of depth. Consequently, S4 gave a response in line 16 that met the criterion of a valid "result" for R1. R1 then used that response to prompt S3 to "try it again."
Fig. 4 Students' physical actions and manipulation of materials in a hands-on activity
In this particular example, the students did not need to complete a worksheet or produce a laboratory report. Thus, they did not write and draw their procedures and observations. However, there were other instances where they had to record their observations (as results) in the form of writing and drawing. Such records tended to occur as a prerequisite to the genre of explanation, which will be examined next.

Genre of Explanation (Episode 3)
Prior to this episode, there was a hands-on activity for students to observe the behavior of solid, liquid, and gas. They tilted three different bottles containing a Lego® block, water, and incense smoke and observed the outcome; that is, the Lego block retained its shape, while water and incense smoke did not. In this episode, the instructional goal was for the students to draw and explain the shape in terms of the intermolecular spacing and forces between the particles in each state. The use of drawing here functioned as a visual tool to support the students' model-based reasoning and explanation (Tytler, Prain, Aranda, Ferguson, & Gorur, 2020). This episode began with the teacher's instruction to the class:
Immediately after the teacher's instruction, almost every student drew the outline of the containers (line 5). The teacher then asked twice if they "really need to draw the container" in the explanation (lines 6 and 8). His question revealed a certain expectation of what a diagram in an explanation should or should not contain. Just like R1 in the earlier episode on experimental account, the teacher implicitly knew the students were not enacting an explanation genre when they only drew the macroscopic observable objects. This intervention to maintain the development of the genre is a recurring theme that resurfaced several times, as seen in the following interaction about 5 minutes later between R1 and the group comprising S1, S2, and S3:
While the students were drawing, R1 noted that they were not constructing an explanation and asked them how the diagrams could help them answer the questions (line 12). The subsequent exchange shows that S2 understood their diagrams could only show and describe "what happens" in terms of the macroscopic things they "can see" (line 13), but cannot explain "why it happens" (lines 14-15). The students were then given an opportunity to discuss how to change their diagrams.
Consequently, S3 and S1 together gave three suggestions that revealed their thoughts about the features of a diagram suited for an explanation.
The first feature was using arrows to show forces of gravity (line 20). This is a form of symbolic representation to depict abstract entities based on social conventions instead of drawing observed macroscopic things (Tang, 2020). The second feature was showing a temporal sequence by drawing a "before and after sort of things" (line 22). This feature is common in the genre of explanation because it has a logical sequence (Unsworth, 2001), as opposed to the genre of report which has no sequence. The last feature was to draw "little connecting circles" (line 26), which has the effect of depicting the sub-microscopic and adjoining particles in a solid.
After this interaction, S3 and S1 went on to erase their writing and include more "explanatory details" in their diagrams, as shown in Figs. 5 and 6 respectively. In Fig. 5, S3 initially drew the outline of the container and Lego brick based on his observation of "what happens." This drawing could only show the "results" of an experimental account but not account for the explanation of "why." After R1's questioning, S3 drew the solid as "little connecting circles" and the effect of gravity as a big arrow. He also drew other arrows to show the "molecules pulling on each other" in order to explain why the solid keeps its shape. Similarly for S1 (shown in Fig. 6), his initial drawing of the macroscopic observation was later supplemented with the invisible molecules and their movement, as well as a temporal sequence of before and after tilting the bottle.

Summary of Findings
Based on the concept of multimodal genre and the above illustration, this paper highlights two major observations concerning how students create and learn with representations in science.
First, the use of representations follows a general pattern that varies across different genres (as summarized in Table 2). This pattern within multimodal genres both arises from and realizes the repetitive actions of our communication (Bateman, 2008). It accounts for why we expect certain characteristics in the representations used within a particular genre; for instance, a diagram for an explanation should go beyond depicting macroscopic observations or a report should emphasize taxonomic relationships (e.g., meronym, hyponym) in both its written and visual forms.
Second, the pattern within multimodal genres frames and is framed by the instructor-students' ongoing interaction. This was most visible in the teacher's and researchers' implicit interventions to guide the students' creation and manipulation of representations at different stages of a genre; for instance, asking students if they really needed to "draw the container" in an explanation or introducing a variable for students to test in a hands-on experimental account. Such interventions are crucial to maintain and reinforce the development of the multimodal genre and consequently the process of representation construction undertaken by the students.

Limitation
In this study, although we have presented the various genres as three distinct blocks in the chronology of the lesson activities (see Fig. 2), the boundaries between the genres were at times blurred and fluid. While an activity typically unfolded according to a dominant genre (e.g., experimental account), the participants sometimes deviated into another genre prematurely before coming back to complete the stages of the first genre (e.g., asking why it happens before completing the experiment). This is not unexpected given that genres are always dynamically constructed by us instead of functioning as scripts to follow. Another reason for the fuzzy boundary is the transition from one genre to the next, which makes it difficult to draw the line where a particular genre starts and ends. This is particularly the case when the results stage of an experimental account (e.g., observation) frequently transitioned into the phenomenon stage of an explanation genre (e.g., why it happens).

Significance and Implications
This study has implications for the shared objective in this special issue to understand how students create and use representations as they learn science. Our interpretation of any representation must always be contextualized to the cultural purpose of the genre for which the representation was constructed, instead of merely examining the surface features of the representation. This understanding challenges prevalent theories that are based on universal claims about multiple representations and ignore the role of context in meaning-making (e.g., Ainsworth, 2006; Mayer, 2001). At the same time, this study also reveals the mutually constitutive and reciprocal relationship between genres and multimodal coordination of representations. That is, the "context" in a genre is not a pre-existing structure (akin to a container) that imposes on the use of representations, but it is in fact dynamically created and maintained through our multimodal actions and interactions.
Fig. 6 Drawing of liquid scenario by student S1
In social semiotics, although the role of context is understood through the notions of register and genre (as "context of situation" and "context of culture" respectively; see Halliday, 1999), there have been few discussions on how multimodality is connected to this theorization. Previous accounts developed from genre theory (e.g., Rose & Martin, 2012; Schleppegrell, 2004) were largely based on written texts and neglected the role of multimodal representations. Thus, our focus was to develop a greater understanding of the relationship between genre and multimodality, as enacted in primary science classroom discourse. Specifically, we applied the concept of multimodal genre and showed how the semantic variations in students' representations corresponded to the recurring genres they were enacting.
Moreover, this study also suggests several theoretical connections between multimodal representations and scientific genres that warrant further investigation. First, scientific genres are constructed through the use and coordination of multimodal representations. Second, multimodal representations are seldom used or constructed on their own outside the context of a scientific genre. Third, there is a predictive mutual contextualization between a scientific genre and representation construction, such that the unfolding of a genre shapes the coordination of representations, and vice versa.
Lastly, the expanded inclusion of multimodal representations in scientific genres raises a number of issues for science teaching and learning. Genre theory has at times been critiqued not only for its sole focus on written texts, but also for prescribing a linear approach to knowledge construction (Yore, 2018). This prescriptive nature is partly attributed to the dominant role of writing as a reporting tool within formal curriculum and assessment. The inclusion of talk and non-verbal representations has the potential to change the view of genre to one that is more dynamic, particularly with the use of drawing as a reasoning tool (Tytler et al., 2020). For example, the talk around students' drawing in episode 3 enabled the students to break away from merely reporting their observations to engaging in a deeper reasoning of why it happened. Thus, it is crucial to expand the view of scientific genres to include multimodal representations.
At the same time, while genres do not impose fixed structures for us to follow, our actions are often based on familiar genres. Genres are, after all, culturally evolved ways of doing things with specific social purposes. This was most evident in the teacher's and researchers' interventions to ensure the students' meaning-making developed along scientific genres, even though the project did not foreground genre during the research. Research on representation construction must therefore take into account the role of genre, not only as a lens for researchers to consider the contextual purpose of the surrounding activity but also for teachers to highlight and explain the genre requirements as a form of explicit disciplinary literacy (see Tang & Rappa, 2020). For instance, in an experimental account genre, teachers could be more explicit in naming the variables in the experiment and discussing the rationale of the experiment, instead of focusing on the "results." In an explanation genre, teachers could highlight the purpose of drawing as a modeling process to think about the underlying causes and mechanism, instead of depicting surface observations at a macroscopic level. In both genres, students should also be given more opportunities to create multimodal representations in relation to the purpose of designing experiments or constructing scientific explanations.