Introduction

Much has been written recently about the importance of the STEM disciplines for advances in many fields, from those aiding economic progress in a rapidly changing technological world to those enhancing sustainability outcomes in the face of climate change (e.g. Engler, 2012; Office of the Chief Scientist, 2013). Such moves have fostered the spread of interest in STEM as an integrating force for Science, Technology, Engineering, and Mathematics throughout the school curriculum, as well as in the industries themselves (Honey et al., 2014). From early childhood (Early Childhood STEM Working Group, 2017), to primary school (Mildenhall et al., 2019; Tytler et al., 2021), middle school (Pecen et al., 2012; Stohlmann et al., 2012), and high school (Australian Curriculum, Assessment and Reporting Authority [ACARA], 2016; Harland, 2011), resources have become available to assist teachers and students to appreciate the potential linkages among these disciplines and the supports that exist to aid investigations in authentic contexts. Indeed, the importance of integration across the STEM disciplines is the focus of Anderson and Li’s (2020) collection of reports on various approaches to integration, how the approaches are designed for students (e.g. Wang et al., 2020), and the implications for teacher education (e.g. Delen et al., 2020). Input from around the world illustrates the impact of moves to STEM integration internationally (e.g. Steffensen, 2020).

In this context, Watson et al. (2020a) have made the case for the critical role of statistics in integrating STEM education. The first evidence put forward for this case is that variation is an essential part of STEM, underpinning the implementation, quantification, and improvement of projects that rely on the measurement of data in these fields. Through graphical representations and numerical statistical calculations, variation is analysed and decisions are made to improve STEM initiatives. Specifically, variation underpins the “practice of statistics” (Watson et al., 2018), for example as outlined by the GAISE report (Bargagliotti et al., 2020) and sketched in code after the list:

  • Formulate questions, anticipating variability and making predictions,

  • Collect data, planning for variability and fair testing,

  • Analyse data, accounting for variability using a range of representations and technologies,

  • Interpret and evaluate results, allowing for variation and suggesting improvements.
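
The following Python fragment sketches the four-phase cycle above; the plant heights and watering regimes are invented for illustration and are not data from the study.

```python
# A minimal sketch of the four-phase practice of statistics,
# using invented plant heights (mm) under two watering regimes.
import statistics

# 1. Formulate a question, anticipating variability:
#    "Do plants watered daily grow taller than plants watered weekly?"

# 2. Collect data, planning for variability (here, invented measurements):
daily = [112, 98, 125, 104, 117]   # heights (mm) of plants watered daily
weekly = [86, 101, 79, 95, 88]     # heights (mm) of plants watered weekly

# 3. Analyse the data, accounting for variability:
for label, sample in (("daily", daily), ("weekly", weekly)):
    print(label, "mean:", statistics.mean(sample),
          "range:", max(sample) - min(sample))

# 4. Interpret and evaluate the results, allowing for variation:
#    the means differ, but the samples overlap, so any conclusion should
#    acknowledge the variation and suggest improvements such as a larger sample.
```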

This process, moreover, parallels the Engineering Design Process (Lyden et al., 2018) and the Science Inquiry Skills (ACARA, 2019). This natural alliance lends further support to considering interdisciplinary opportunities across these disciplines (e.g. Mayes, 2019).

More research, however, is needed to illustrate the way in which students engage with the learning experiences and to provide evidence of the way in which student learning about variation is fostered within multi-curricula/discipline learning contexts (Tytler et al., 2019). This is particularly true in relation to the Australian Curriculum: Mathematics (ACARA, 2019). Although the Proficiencies across Years 1–6 related to statistics (see Appendix 1) generally cover the fluency, reasoning, and problem-solving required of the “practice of statistics”, the Content Descriptions often do not supply the detailed techniques required to carry out investigations in other subject areas. For example, following the mention of column graphs, dot plots, and tables in Year 5 (ACMSP119), in Year 6 this is extended to “Interpret and compare a range of data displays, including side-by-side column graphs for two categorical variables” (ACMSP147). Although using digital technologies is mentioned, it is in relation to the types of representation listed. In this study, it was necessary to extend students’ experiences beyond the specifics of the curriculum. In doing so, the workbook questions focused on the various manifestations of variation encountered in an actual scientific experiment as the students completed an activity on Plant Growth. Furthermore, the assessment of the learning that took place at the end of the year employed more general questions, allowing students opportunities to display their appreciation of the impact of variation on the outcomes of the experiments. Background for these two perspectives is considered next.

Variation

The word variation itself has different, or varying, interpretations. In the Chambers Dictionary (Kirkpatrick, 1983, p. 1437), variation is defined under the word “variable”, in part as, “a varying; a change; continuous change; … departure from the mean or usual character; the extent to which a thing varies; … an inequality in the moon’s motion discovered by Tycho Brahe; a change in the elements of an orbit by the disturbing force of another body.” James and James (1959), with reference to statistics, equate variation with dispersion and describe it as “scatteration of the data; the lack of tendency to concentrate or congregate” (p. 125). Many sources ignore the term and concentrate on related words like “variability” or “variable”. Some texts accept variation as an undefined term. Kitchens (1998), for example, defines a variable as “any characteristic that can be measured on each experimental unit in a statistical study” (p. 3), then claims, “It is the variation exhibited by the variable that is of interest to the statistician”, and further defines, “The distribution of a variable specifies the distinct values that the variable assumes and how often these values occur. The distribution illustrates the pattern of the variation in the data” (p. 9). When introducing contextual data to children in primary school, the idea of variation as change or scatter, and the need to illustrate (or represent) this in a distribution to study and explain it, is the critical starting point in answering questions.
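
Kitchens’ notion of a distribution as the distinct values a variable assumes and how often they occur can be illustrated in a few lines of code; the height values below are invented for illustration.

```python
# A distribution pairs each distinct value of a variable with its
# frequency (Kitchens, 1998). The plant heights (mm) are invented.
from collections import Counter

heights = [10, 12, 10, 15, 12, 10, 18]
distribution = Counter(heights)
print(sorted(distribution.items()))
# [(10, 3), (12, 2), (15, 1), (18, 1)] -- the pattern of variation in the data
```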

Although statistics is built on variation, at times in the early days of introducing statistics at the school level, it was taken for granted (e.g. Holmes, 1980). Green (1993), and particularly Shaughnessy (1997), brought it to the fore, asking for research specifically on the topic. Shaughnessy (2007) summarised eight conceptions of variability arising from research at that time, of which six were considered relevant to the experiences of the students in the study reported here: variability (i) in particular values, (ii) as change over time, (iii) as a whole range of possible values, (iv) as the likely range of a sample, (v) as association of several variables, and (vi) as distribution (pp. 984–985).

Hence by 2007, the first 7 of the 18 chapters of Lovett and Shah’s book, Thinking with Data, were related to variation, indicating its foundational importance before considering statistical reasoning and data analysis (Chapters 8–14) and learning from and making decisions with data (Chapters 15–18). The diversity of the first six chapters is typified by the page header of Chapter 7: “Variation in Variation”! Of particular interest to this study is an earlier chapter (Lehrer & Schauble, 2007), which considered two activities with children where variation first occurred in the actual measurements created by the children (e.g. estimating the height of a flagpole) and then later occurred naturally within the context of the investigation of plant growth. This research of Lehrer and Schauble, related to distribution and variation, detailed several issues encountered by Year 5 students when first introduced to the study of plant growth data. They concluded that considering variation in specific measurement data was easier for the students than considering the natural variation that occurred as the plant growth context became more complex over time. Earlier, Lehrer and Schauble (2004) had used a design methodology, focussed on students’ development of thinking and the support given to them in the classroom, to study their growing appreciation of natural variation through a series of experiences, including creating hand-drawn representations of data from samples of plant height. Although the students progressed to considering divisions in the data equivalent to box plots, they did not have the advantage of the technology available today for representing the variation in their data further.

SOLO taxonomy: describing development of understanding

The Structure of Observed Learning Outcomes (SOLO) model (Biggs & Collis, 1982, 1991) is a hierarchical model that describes the increasing complexity of student understanding of ideas and concepts. The SOLO model has been used and extended over the years in various ways, related to the two major components of the structure: the modes of development and the advancement within each mode. First, five modes of development are proposed from birth to adulthood: the Sensori-motor mode, from birth to about 18 months, is described as tacit knowledge; from about 18 months, the Ikonic mode emerges, based on intuitive knowledge often related to visual imagery; from around 6 years, the Concrete Symbolic mode involves a transition to written language and symbol systems, producing declarative knowledge; from around 14 years, the Formal mode emerges, appreciating the theoretical aspects underpinning particular disciplinary knowledge; and finally, potentially from around 20 years, the Postformal mode may develop to allow for theory creation. Of particular interest in the primary school is the transition from the Ikonic (IK) mode to the Concrete Symbolic (CS) mode.

Second, the detailed study of this development is based on the levels of advancement that take place within each mode. The first level of understanding for a mode, labelled unistructural (U), involves using a single aspect or element of the context of the mode to make a response. The multistructural level (M) involves using two or more elements in a serial fashion in responding to a task, whereas at the relational level (R) the elements are integrated into a coherent explanation. Moving to the next higher mode involves generalising the structure into single elements of that mode. Because the CS mode is the dominant mode in primary and middle school, often observations in the previous IK mode are labelled as Prestructural and not analysed further. Groth et al. (2021), however, recently analysed IK responses to tasks, resulting in U-M-R sequences of intuitive reasoning preliminary to CS sequences. In doing so for particular tasks, some of the IK reasoning is incomplete but relevant to the context of the task and termed “normative compatible” (c), whereas other IK reasoning is irrelevant, hence termed “normative incompatible” (ic). Types of normative incompatible reasoning include a focus on irrelevant task aspects, myths and imaginative stories, out-of-context mental imaging, and a deterministic worldview (Groth et al., 2021). Again, it is possible for either normative compatible or normative incompatible responses to display U, M, or R structures, depending on how many aspects are included and how they are combined. Hence, the length of a response is not the determining factor of whether a response is IK or CS, but whether elements of the context and the specific question are employed. An IK response can be quite long but out of context, whereas a CS response, particularly at the U level, can be very short but relevant to the context. Considering responses in the IK mode in more detail, rather than just labelling them “Prestructural” as done previously (e.g. Biggs & Collis, 1982; Watson & Moritz, 1998, 2000), allows more opportunity for discussion with students on their current conceptions, why they are potentially inadequate for the task at hand, and how they might be improved. Figure 1 summarises the possible pathways for responses to tasks, noting that the IK mode is the precursor to the CS mode.

Fig. 1 SOLO model for end-of-year questionnaire items

The SOLO model has been used extensively in statistics education research with school students over the years. In particular, it has been applied to learning about variation in chance contexts (e.g. Watson & Kelly, 2004; Watson & Moritz, 1998), sampling contexts (e.g. Watson & Moritz, 2000), and the study of distributions (e.g. Watson, 2009), as well as more generally across contexts based on surveys (e.g. Watson et al., 2003). More recently, the extension of Groth et al. (2021) has been applied to Year 3 students’ understanding of data (Watson & Fitzallen, 2021) and to Year 5 students’ appreciation of variation when working in a STEM context collecting data from catapults (Watson et al., 2022b).

Background

A 4-year project in a primary school, Years 3 to 6, sought to embed the practice of statistics and its underpinning big ideas, particularly variation, within a range of STEM-related activities. The aim was to reinforce the need to be able to use data arising across these disciplines, and particularly Science, in meaningful ways for decision-making. Each year, investigations embedded in the Australian Curriculum: Science (ACARA, 2019), with links to Digital and Design Technologies and other topics in Mathematics, were introduced to build understanding. Each activity from the beginning of the larger project was intended to build sequentially on more complex aspects of the STEM contexts within which variation occurred and the practice of statistics was taking place. The building of the associated science concepts was linked specifically to the Australian Curriculum for that year of schooling (ACARA, 2019). In relation to the practice of statistics, for example, the encounter with the fundamental concept of variation (e.g. Cobb & Moore, 1997) grew first from considering variation in a measurement context (cf. Lehrer & Schauble, 2007). Initially, students experienced two different treatments with the same expected outcome (objective) but carried out in different fashions: they created licorice sticks from Playdoh® by two methods, “by hand” and “by machine”, but with the same measurement criteria (Watson et al., 2020b).

Variation was later experienced in a context of comparing two conditions where the expectation of outcome was also different, this time by launching ping pong balls with catapults and then changing the catapults, intending to increase the distance travelled (Watson et al., 2022b). In these two situations, class data were combined to make decisions about the research questions that were set for the students. Time was introduced as a variable in an activity involving heat, insulation, and cooling (Fitzallen et al., 2017), where again class data were combined. Each activity added to the complexity of the data analysis and decision-making because of the need to appreciate the combining of samples, the variation in the larger data sets, and the need then to find the relevant data to answer the questions.

The activity preceding the one reported here, at the end of Year 5, considered the dispersal of seeds, where students worked in groups to design methods of dispersal by wind, using the engineering design process (Fitzallen et al., 2019; Smith et al., 2019). Each group of three students designed and tested one of the three methods of dispersal: helicopter, parachute, or sail. Hence in this case, not all class data were combined; instead, they were contrasted across three subsets to answer the research question. The process involved each student in the group creating a mechanism, which was tested by the group. Subsequently, the group chose its “best” mechanism and worked to improve it further, thereby addressing the engineering design process associated with the Design Technologies curriculum. Across the class, group data for the three methods of dispersal were combined for a class discussion of the most effective method. Although the students each had the opportunity to create their original designs, the research question was based on comparing the three methods and related to the “best” method, judged by the distance travelled. The Plant Growth activity took the type of data set considered for decision-making one step further.

TinkerPlots (Konold & Miller, 2015) was a contribution from the Digital Technologies curriculum because of its particular features assisting students to display and understand variation in distributions (Konold, 2007; Konold & Lehrer, 2008; Konold et al., 2007). Watson and Fitzallen (2016) examined the software from the perspective of affordances (Chick, 2007; Gibson, 1977), finding that it had the capability to foster learning based on its accessible and flexible features, facilitating its incorporation in learning activities by teachers. They provided examples of affordances related to students’ learning of statistics, to enhancing teachers’ pedagogical content knowledge for teaching statistics, to assessing students’ understanding, and to supporting other areas of the curriculum. In the case of this study, that other area of the curriculum is obviously Science. Successful implementation of learning activities supported by TinkerPlots across the school years has been reported in recent years (e.g. Allmond & Makar, 2014; Fielding-Wells & Hillman, 2018; Kazak et al., 2014; Khairiree & Kurusatian, 2009).

Before the Plant Growth activity, students had learned to create plots in TinkerPlots when data were provided with the Licorice activity, to enter their own data with the Catapults activity, and to display numerical and multiple categorical variables with Wind Dispersal of Seeds. Through an activity on Viscosity (Watson et al., 2022a), they also plotted the relationship of two numerical variables. A summary of these activities is found in Fitzallen and Watson (2020). This report presents some outcomes from the final major activity in the project, Plant Growth.

The Plant Growth activity then focused on combining the statistical understanding gained with the Science Inquiry Skills to allow students to decide on, within the context of plant growth, their own research question, collect the appropriate data, represent and analyse the data in TinkerPlots, and present their conclusions with respect to the method of plant growth chosen.
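
TinkerPlots itself is interactive software, but the kind of display the students produced (cf. Fig. 3) can be approximated in a few lines of Python with matplotlib; the data below, organised by the same three attributes the students used (Height, Treatment, Days_Since), are invented stand-ins for one group’s file.

```python
# An approximation of a TinkerPlots display: Height plotted against
# Days_Since, with one colour per treatment. All values are invented.
import matplotlib.pyplot as plt

data = {
    "Control":     [(7, 20), (14, 65), (21, 110)],
    "Treatment 1": [(7, 15), (14, 40), (21, 60)],
    "Treatment 2": [(7, 5),  (14, 10), (21, 12)],
}
for treatment, points in data.items():
    days = [d for d, _ in points]      # Days_Since values
    heights = [h for _, h in points]   # Height values (mm)
    plt.scatter(days, heights, label=treatment)

plt.xlabel("Days_Since")
plt.ylabel("Height (mm)")
plt.legend(title="Treatment")
plt.show()
```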

Building upon interest in plants, the Year 6 Australian Curriculum: Science (ACARA, 2019) suggested the following Content Description and Elaboration for Biological Sciences:

The growth and survival of living things are affected by physical conditions of their environment (ACSSU094)

  • Investigating how changing the physical conditions for plants impacts on their growth and survival such as saltwater, use of fertilizers, and soil types.

Plant growth has been suggested often as a learning context in the classroom. As early as 1984, Smith and Anderson explored the teaching strategies employed to support students’ conceptual understanding of plant growth. Wood and Roper (2000) suggested a basic technique for students to measure the number of expanded leaves for garden bean plants grown in the classroom under three lighting conditions over an 11-day period. The GAISE Reports (Bargagliotti et al., 2020; Franklin et al., 2007) provided a similar example by considering growth data from radish seedlings grown in three conditions of darkness and light. In line with the Australian Curriculum (ACARA, 2019), the Australian Academy of Science (2016) produced a 7-lesson resource for Year 6 focusing on sustainability, water, salt, and plant growth. One lesson considered the growth of lettuce seedlings in four different saltwater concentrations. An optional lesson included students working in teams to formulate a question related to salinity and conducting an investigation to collect evidence to support or refute their claims. These reports did not include any research outcomes related to student learning as a consequence of carrying out the experiments.

Watching plants grow involves observing variation: they become taller, produce more leaves, flower, bear fruit, and/or produce seeds with different dispersal mechanisms. Plants change across the seasons and eventually they may shrink and die, or become dormant until the weather is again favourable for producing flowers and fruit. The aspects of plant growth that occur on the germination of seeds provide a context in which students can observe change and variation over time. As a Biological Science, the study of plant growth can contribute to appreciation of the importance of variation not only in Science but also across the STEM disciplines. At the school level, experimenting with plant growth, as reported here, was aimed at linking content expectations across the areas of the curriculum associated with Science, Technology, Engineering, and Mathematics. Although in most countries, Science and Mathematics are distinctive content areas of the school curriculum, often Technology and Engineering are not singled out individually. In Australia, for example, these topics are considered under Digital and Design Technologies (ACARA, 2019), whereas in the USA, Engineering is considered by the National Research Council (NRC, 2013) as part of the Science curriculum. In particular, based on the Australian Curriculum (ACARA, 2019), this study focused on Year 6 Science Understanding (Biological Sciences) and Science Inquiry Skills from Science; Measurement and Statistics from Mathematics; data handling software from Digital Technologies; and experimental design and testing from Design and Technologies.

The overall research interest and questions in this study related to the features associated with variation that students included when using TinkerPlots software to interpret plots of data collected from a plant growth experiment. Initially, this related to the learning activity as it occurred, with students reporting in their workbooks the analyses they were carrying out to explain the variation they were seeing. Later in an end-of-year questionnaire, the focus was on students’ consolidation of understanding of variation in terms of the SOLO developmental model as they answered questions about the activity.

Methodology

The methodology for the Plant Growth study followed a pragmatic paradigm (Mackenzie & Knipe, 2006) in being problem-centred and oriented to real-world practice with specific questions matched to the purpose of the research. Here, the initial data were qualitative in nature, with the aim of capturing the nuances of students’ formative thinking (e.g. Wiliam, 2011) while they were carrying out the Plant Growth activity. Second, again following Wiliam (2011), data were collected to monitor the summative learning as a consequence of the activity. The method hence was based on a classroom intervention—the problem-centred, real-world experimental activity, with accompanying workbook questions—followed by an extended end-of-year multipart question designed to monitor learning.

In the classroom: the experiment

The Plant Growth activity was a variation on that suggested by the Australian Academy of Science (2016). It was decided that, within the context of an appropriate experimental design, groups of four students would themselves decide the research question, and hence the materials to be used, in setting up the experiment with a control and three treatments using a hydroponic watering system (Fig. 2). There were four soil types, four treatments, and two different seeds. Each group could choose a seed, a soil type, and a treatment, or test the influence of the various soil types (see Appendix 2 for details). Hence, it was very unlikely that there would be a duplicate design in a class. Each group of students was its own scientific team, carrying out the instructions for setting up the experiment, with each student supervising one of the treatments. Measurements were taken across 3 or 4 weeks, including recording the data, entering the data into TinkerPlots, and making a decision about the plants’ growth in relation to the treatments chosen, to answer the question: What makes plants grow best?

Fig. 2 Planting wheat seeds in the hydroponic bottle system and growth after 14 days

The purpose of allowing a large degree of freedom in the choice of materials and treatments was to introduce students to the wide range of possibilities and to the importance of data when applying Science Inquiry Skills. Given how the activity was set up, it was not expected that, at the end, all students across the class would agree on “the best” of the various treatments; some treatments, however, were agreed not to be appropriate.
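
To see why duplicate designs within a class were unlikely, the choice space can be enumerated in a few lines; the option names below are hypothetical placeholders (the actual options are detailed in Appendix 2), so the resulting count is illustrative rather than the study’s.

```python
# Illustrative only: enumerate a hypothetical seed x soil x treatment
# choice space (the study's actual options are listed in Appendix 2).
from itertools import product

seeds = ["wheat", "seed B"]                       # hypothetical labels
soils = ["potting mix", "sand", "clay", "loam"]   # hypothetical labels
treatments = ["salt", "sugar", "vinegar", "fertiliser"]

designs = list(product(seeds, soils, treatments))
print(len(designs), "possible seed/soil/treatment combinations")  # 32 here
# Each chosen design was then run as a control plus three treatment
# levels, so a handful of groups per class rarely repeat a design.
```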

The details of setting up and carrying out the experiments are presented in Wright et al. (2021). This report considers two aspects of the extended activity. When the experiments were complete, students were given instructions for entering their data into TinkerPlots, and each member of the group had access to the complete data collected by the group. An example is shown in Fig. 3, which involved planting wheat seeds in potting mix, with the treatment being three different concentrations of vinegar in the water. In their workbooks, students then answered the questions in Fig. 4 (condensed, with space for writing removed), which focused specifically on the variation experienced during the activity as seen in the TinkerPlots graphs, within and across the variables. The instructions were very explicit due to the complexity of the experiment and the students’ lack of previous experience. Responses to these workbook questions provided the data for the formative assessment.

Fig. 3 TinkerPlots file for one group’s data

Fig. 4 Workbook questions focussing on variation (space for writing removed)

Longer-term learning outcomes

At the end of the year, students completed an extended questionnaire on the year’s activities and other linking items across the 4 years of the project. The first multipart question was related to the Plant Growth activity and is shown in Fig. 5. As can be seen, one example of a TinkerPlots graph was presented and students were asked to explain and interpret the outcomes from it. This was to evaluate the summative aspect of learning from the activity. Because of the time gap since completing the activity for some of the students, the initial questions were intended to be straightforward, helping students recall the main components of the activity before discussing the outcomes.

Fig. 5 End-of-year multipart question (space for writing removed)

Research questions

In order to appreciate the two aspects of learning that took place in relation to the Plant Growth activity, it is necessary to document the students’ formative experiences while taking part in the experiment, as well as the summative results from the end of the year (Wiliam, 2011).

  • Research Question 1 (formative) What were the explicit features associated with variation that students included in their descriptions of a science experiment on plant growth (cf. Fig. 4) based on their groups’ TinkerPlots representations?

  • Research Question 2 (summative) What evidence of student learning associated with variation in the context of the Plant Growth activity was demonstrated in responses to the end-of-year question based on one group’s data (cf. Fig. 5)?

The end-of-year multipart question presented all students with a single example of the outcomes of an experiment in the form of a TinkerPlots graph (cf. Fig. 3). Although students had seen their classmates give oral presentations on each group’s work, they only had first-hand experience within their own groups. The second research question hence relates to the consolidation of learning and its application to a particular experimental data set. The word “variation” was purposely not included in the end-of-year question.

Participants

Three classes of Year 6 students at an urban independent Catholic school participated in the study, with 64 of the 71 students having parental and student permission for their data to be included in the research reported here. The school, in an inner regional centre, had a socio-economic status index (ICSEA) value of 1026 (mySchool.com.au; mean = 1000, standard deviation = 100), and four of the students had English as a second language. The reported sample consisted of 40 boys and 24 girls, ranging from 11 to 12 years of age. Each class was taught by the same teacher, using teaching notes and student workbooks developed by the research team with input from the teacher. The activity took place across school terms 1, 2, and 4. Eight of the 64 students were unavailable for the end-of-year questionnaire, which was administered at the end of term 4, at the conclusion of the entire study. The overall longitudinal study, for which the Plant Growth activity was the final part, had ethics approval from the Tasmanian Social Sciences Human Research Ethics Committee (approval number H0015039).

Analysis for Research Question 1

As the first research question was formative in nature (Wiliam, 2011), the data analysis was considered qualitative and interpretative (Creswell, 2013). As noted, the experiment completed by each group of students was unique, with its own set of data to interpret. Furthermore, the activity took place across three school terms for three Year 6 cohorts, often with different climatic conditions under which the plants were growing. It was hence important to assess the types of response without judgement on the variable/s chosen or the experimental conclusions drawn for growth. All responses to the questions in Fig. 4 were clustered by the first author into categories related to the expectations stated specifically in each of the workbook questions. The same process was carried out independently by the second author, with a discussion confirming the observed categories, which are summarised in Table 1.

Table 1 Categories of response to workbook questions

Analysis for Research Question 2

For the second research question, because the end-of-year question was monitoring the summative learning (Wiliam, 2011) that had taken place in relation to the Plant Growth activity, the Structure of Observed Learning Outcomes (SOLO) Model (Biggs & Collis, 1982, 1991), as adapted for the Ikonic Mode (Groth et al., 2021), was appropriate for categorising the outcomes.

In considering the coding of responses to the end-of-year questionnaire items, the nature of the individual questions targeted different levels of response in the CS mode. In particular, items (b) and (c) were included to reorient the students to the activity. This was taken into account when rubrics were developed for coding responses. There were also many potential elements of the task to be used in responses due to the complex nature of the overall activity. Some may or may not have been relevant to the particular question being answered. These elements included the physical components of the experiment (e.g. soil type, treatment kind, plant type); the actions taking place (e.g. watering, growth, wilting, measuring, comparing); and the products of the investigation (e.g. data, plot, conclusion).

Table 2 details the notation used to describe the modes and levels of responses employed here for responses to the Plant Growth multipart question in the end-of-year questionnaire. CS responses are those where elements relevant to the question are included. For these responses, the difference between M and R levels is often the use of a word such as “because”, which shows the linking between or among the elements listed. IK responses may be vague or intuitive with unclear or unspecific relation to the question asked. If responses are considered within the context of the particular question, they are deemed normative compatible (c). If they are considered irrelevant, or incorrect with respect to the context of the question, they are deemed normative incompatible (ic). As an example, for the question about why there was a gap in the data, a normative incompatible response would give a reason not related to the gap (“to measure”) or an irrelevant reason (“because different plants grow different”), whereas a normative compatible response might be related to the gap (“because of the numbers on the line”). Table 2 is used as a rubric for coding the end-of-year questionnaire.

Table 2 SOLO modes and levels of response
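
One way to make the rubric concrete is to encode the modes, levels, and compatibility flags as a small data structure; this is a sketch of the notation only, and the example codes printed at the end are invented, not actual coded responses.

```python
# A sketch of the SOLO coding scheme used here: each response receives
# a mode (IK or CS), a structural level (U, M, or R), and, for IK
# responses, a compatibility flag (c or ic). Examples are invented.
from dataclasses import dataclass

MODES = {"IK": "Ikonic", "CS": "Concrete Symbolic"}
LEVELS = {"U": "unistructural", "M": "multistructural", "R": "relational"}
COMPATIBILITY = {"c": "normative compatible", "ic": "normative incompatible"}

@dataclass
class SoloCode:
    mode: str                # "IK" or "CS"
    level: str               # "U", "M", or "R"
    compatibility: str = ""  # "c" or "ic" for IK responses; empty for CS

    def label(self) -> str:
        parts = [self.mode, self.compatibility, self.level]
        return "-".join(p for p in parts if p)

# An in-context response linking elements with "because":
print(SoloCode("CS", "R").label())        # CS-R
# An out-of-context response using a single irrelevant aspect:
print(SoloCode("IK", "U", "ic").label())  # IK-ic-U
```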

Coding was undertaken by the first two authors. The overall agreement rate for the student workbook items was 85%, ranging from 66% for Q2 to 98% for Q5. For coding of the student summative assessment items, the agreement rate was 96%, with a range from 93 to 100%. All disagreements were resolved by discussion.
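
The agreement rates above appear to be simple percent agreement between the two coders; a minimal computation, under that assumption and with invented code lists, is sketched below.

```python
# Percent agreement between two coders over the same responses
# (a minimal sketch assuming the reported rates are simple agreement;
# the code lists are invented examples, not the study's data).
def percent_agreement(coder_a, coder_b):
    assert len(coder_a) == len(coder_b)
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return 100 * matches / len(coder_a)

a = ["CS-U", "CS-M", "IK-c-U", "CS-R", "CS-U", "IK-ic-U", "CS-M", "CS-U"]
b = ["CS-U", "CS-M", "IK-c-U", "CS-M", "CS-U", "IK-ic-U", "CS-M", "CS-U"]
print(f"{percent_agreement(a, b):.0f}%")  # 88% here; in the study, all
                                          # disagreements were resolved by discussion
```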

Results

The results are presented in two parts. First, the categories identified in the responses to the five workbook questions (Fig. 4) are identified with examples for answering Research Question 1, related to the formative aspect of learning. Second, the SOLO analysis of the responses to the end-of-year questions (Fig. 5) is presented with examples for answering Research Question 2, related to the summative aspect of the learning.

Workbook responses for Research Question 1: what were the explicit features associated with variation that students included in their descriptions of a science experiment on plant growth (cf. Fig. 4) based on their groups’ TinkerPlots representations?

Q1. Describe 3 different ways variation is shown in the plot

This question could be answered succinctly by listing the three different variables labelled in the plot: Height, Treatment, and Days_Since (referring to the days since planting). This was done by 20% of students, who elaborated no further. Other students (23%) went further in the description of the “ways” these three variables represented the variation seen in the plots. At times this was an extended description of one of the variables, such as “different amounts of salt” [ID175] or “plants went up then most went back down and then they went up again” [ID117]. Nine students (14%) commented on a single variable only, with eight commenting on Height and one on Treatment. Three students (5%) did not respond to this question.

Most frequently, the variation was described between the two variables, Height and Treatment, often with implicit language for height (e.g. tallest) and comparison (31%), as shown in the following examples.

  • Treatment 1 grew the tallest. [ID166]

  • The heights are different. Control got in the hundreds and treatment 2 didn’t grow at all. [ID138]

  • With the differences of control and treatment 1, 2 & 3 control has grown the most but in some places they are very consistent. [ID119]

  • There is variation the plants grew for the different treatments. [ID135]

  • Most of the control is on 1 mm and Treatment 1 has the highest TinkerPlot and treatment 2 and 3 are the same really. [ID159]

Although time (Days_Since), displayed in the plot, influenced many of the comments made about the TinkerPlots graphs, it was specifically mentioned in conjunction with height less frequently (6%). The examples that follow illustrate the types of connections made.

  • Most of the variation was in between 19 days, with the height there is also variation between the days 12 and 14. [ID148]

  • I think the plants have grown a lot over the time when we first planted them. Mine has grown 130 mm … [ID170]

  • [It] goes from 0 to 220 and there is 20 days. [ID115]

Q2. Analyse the data for one Treatment. Choose one. What does the variation within the treatment tell you?

Appropriate responses typically made one or two general comments about the plants’ growth in relation to height for the treatment chosen, although eight students (13%) linked three comments together. Examples of these are shown in Table 3. Because the question related to a single treatment, comparing the chosen treatment with another treatment was not appropriate; such comparisons nevertheless accounted for 22% of the comments, and included:

  • That Treatment 1 is bigger than all of the others. [ID143]

  • Control had no sugar and others did and control always was the tallest. [ID127]

Table 3 Variation within treatment*

Some students combined a within-treatment statement with a comparison statement. These responses were given credit for the appropriate comment.

Q3. Analyse the data comparing two treatments. What does the variation between those two treatments tell you?

Generally, responses pointed out one or two differences or similarities between the two treatments (69%). Very few took the comparison outside the two treatments chosen. Occasionally, comments (13%) went outside of the data available in the plots when suggesting idiosyncratic causes for difference. These included:

  • Maybe treatment 3 had more sun and a good amount of water and maybe treatment 2 had too much water and not enough sun. [ID130]

  • That our prediction was incorrect. We believed that the vinegar being acidic would kill the plant because a young plant’s roots wouldn’t be able to handle the vinegar’s acidity. [ID136]

Table 4 contains examples of some of the most common responses.

Table 4 Variation between treatments*

Q4. Analyse the data according to the days grown. What does the variation across the Days_Since attribute tell you?

As time was central to the experiment on plant growth, it provided the greatest opportunity to discuss changes in the other variables. With the focus placed on the plot, responses generally recounted observations drawn from it in terms of the variables Treatment and Height, rather than offering explanations based on the science of plant growth (see Q5). Only one response did not make a connection to the other variables: “Most of the TinkerPlots are on 10–14” [ID159], and two others were unrelated or idiosyncratic (5% in total). Following on from Q3, some responses continued to refer back to the previous question. For example, a response of “Control was the highest” for comparing Control and Treatment 2 was followed by “That they went higher.” [ID115] when referring to Days_Since. These responses were considered appropriate. The responses ranged from one or more generalised statements to one or more contrasts observed for the other variables, as exemplified in Table 5.

Table 5 Variation across Days_Since*

Q5. Analyse the data in the plot. What conclusions do you draw about typical growth of the plants?

Because there were so many possible designs (17) for the experiments, with different expectations, the responses to the final workbook question ranged widely over the possibilities. Some (33%) stayed focused on the data in the plots, reviewing the outcomes for the various treatments.

  • Treatment 3: It grew slow but well. Treatment 2: Did not grow too well but it was OK I guess. Treatment 1: It grew well and strong. Control: It grew the best out of all of them with just water. [ID117]

  • That through all the measurement, the plants got higher and higher each week. Different things, like vinegar, etc., can make a big big difference. [ID109]

  • Treatment 2 and 3 grew the best. Treatment 1 and Control didn’t grow as well. [ID165]

Other responses made generalisations from their group’s experience (20%).

  • Sugar is not good for plants. It can make them grow but only for a matter of time, before it kills them. My plants I say grew the best because [they] had no sugar. [ID105]

  • If you have too much sugar it grows a lot and if you just have water it also grows a lot but if you have too less or not enough sugar your plant may not grow as well as the other plants or not even grow at all. [ID114]

  • It doesn’t need anything added, it just grows naturally because control which had no sugar grew the best. [ID127]

Some responses, however, reflected less appreciation of the finer points of the experiment for typical plant growth (22%).

  • You put water in it and it will go up so that’s how to grow a plant. [ID115]

  • I have learnt that plants die out. [ID147]

  • I think that all plants slowly grow. [ID121]

  • That even if it was the same thing to grow, the data is always different and never the same and it just depends on luck if it grows well or not. [ID116]

One student made a comment peripheral to plant growth and 15 (23%) did not respond to the question.

End-of-year responses for Research Question 2: what evidence of student learning associated with variation in the context of the Plant Growth activity was demonstrated in responses to the end-of-year multipart question based on one group’s data (cf. Fig. 5)?

Using the rubric described in Table 2, each summative assessment question (see Fig. 5) is considered in turn. It should be noted that the construction of the questions at the end of the year was designed to assist the students in remembering the parts of the activity and how they were linked to the plot provided rather than necessarily expecting a full range of SOLO responses.

(a) Why did we have a control and 3 treatments?

Generally, students showed a good appreciation of why a control and three treatments were required for their experiments, with only one student providing a normative incompatible response and 85% of students focusing on the essential elements of variation, i.e. comparison and difference, for CS responses (see Table 6). Half, however, noted only one specific reason.

Table 6 SOLO levels for “Why did we have a control and 3 treatments?”

(b) What data did we collect to plot?

Appreciation of variation in responses to this question was reflected in the number of different variables measured, as well as in the implicit appreciation of the change associated with growth. Again, only one response was considered normative incompatible (see Table 7). Given the specific focus on the measurement of “data” in the activity, however, only half of the students were specific about what was measured or noted a single variable or a list of the variables, e.g. Height, Treatment, Days_Since. Forty-seven percent of responses were Ikonic in referring more obliquely to “measurement” or “growth”. A split of responses such as this illustrates the value of acknowledging IK responses as less focused on the actual “data” than CS responses but still appreciating the context in which the question was asked.

Table 7 SOLO levels for “What data did we collect to plot?”

(c) Why is there a gap in the data?

This question was included to check whether the awareness of missing data (see Fig. 5), which had been discussed pointedly with the classes, had been internalised (see Table 8). As it turned out, each class had a reason for absence during the activity: the Easter break, a trip to the nation’s capital, or a school camp. At the time of entering data into TinkerPlots, the importance of acknowledging that data were not collected at particular times was emphasised; hence, there were gaps with no icons for some values on the x-axis. Eighty percent of students recalled this aspect of the data entry, whereas 18% of responses were considered normative incompatible because they did not recall the context of the experiments.

Table 8 SOLO levels for “Why is there a gap in the data?”

(d) Draw a line through the data for the control

This was a task about visually summarising the variation in the data over time. To complete the task, students needed to read the key in the plot to determine which data were the control data (see Fig. 5). This was the basic criterion for a CS response. Ninety-two percent of students managed this successfully, and their drawn “lines” were considered to be in the CS mode, representing the upward trend in the data (see Table 9). Four others did not recognise the control. Of these, two were considered IK incompatible, as showing no relationship at all, whereas two others were considered IK compatible, showing a trend but one not related to the control. Although many of the CS responses would not be considered “lines” as accepted in more senior classrooms, they identified the required data, the control, which was the context for the question. The unistructural ones highlighted in some fashion all of the control data, whereas the multistructural responses provided a continuous line connecting “boundary” values of the control data, and the relational responses attempted to show a general trend with a “smooth” line within the control data or suggesting smooth boundaries.

Table 9 SOLO levels for “Draw a line through the data for the control.”

(d+) Write a sentence on how the data changed

For this question on how the data changed, 81% of the students were able to describe with CS evidence the variation in the data, with a few integrating their explanations at the relational level (see Table 10). Again, there were a few responses (16%) that were either compatible or incompatible with the context of change and not focused on the “how” aspect of the question.

Table 10 SOLO levels for “Write a sentence on how the data changed”

(e) How did the data for the control compare with the other treatments?

The focus of this question was a comparison of the control with the other treatments, with 88% of students able to make at least one reasonable description, using language describing variation, e.g. consistency, taller, different, bigger, faster. Of the five responses (10%) that did not focus on the data (see Table 11), four were, however, compatible with the context of the question.

Table 11 SOLO levels for “How did the data for the control compare with the other treatments?”

(f) What did this group learn from their experiment?

This question was open-ended and some students interpreted it from their experience more generally rather than from the example given in Fig. 5. Sixty-eight percent responded in line with the expectations of learning about variation embedded in the extended activity, although not explicitly using a related word. Twenty-eight percent made IK suggestions, with 16% considered incompatible with the learning objectives (see Table 12).

Table 12 SOLO levels for “What did this group learn from their experiment?”

Discussion

Representing, analysing, and interpreting the variation in the data from the Plant Growth activity, which occurred at the end of each of the three classroom interventions (cf. Fig. 4), formed the context for Research Question 1 of this report. The work reported was a culmination of other workbook entries and many discussions that took place across the term as the experiments were set up and monitored over the three or more weeks of gathering data. There was often debate in the classroom about the expectations and surprises related to the emerging results for the various treatments. Each group of four also presented a report to the class, with questions and discussion, after completing the workbook questions (see also Wright et al., 2021).

Specifically, the classroom intervention part of this study illustrated six of the conceptions of variability suggested by Shaughnessy (2007): (i) Variation was noted in particular values as students collected and recorded their data. (ii) Variation as change over time was built into the activity and perhaps was the most obvious aspect to students as they collected data from their plants on many occasions, commenting on changes in weather conditions or growth spurts. (iii) Variation as the whole range of possible values was experienced through the TinkerPlots representations. (iv) Variation as the range of possible samples was clear as the students within their groups discussed and compared the different treatments being applied to the plants. (v) Variation as association of several variables was seen through the plots of Days_Since and Height with different colours for the different treatments. (vi) Variation as distribution was seen as the class observed and discussed the results of other groups.

The expectations of the workbook questions based on the TinkerPlots representations the students created were quite advanced for Year 6 students. Although from Year 4, the Australian Curriculum (ACARA, 2019) Content Descriptions related to “Data representation and interpretation” suggest students “evaluate the effectiveness of different displays in illustrating data features including variability” (ACMSP097), the graphical contexts do not include more than two variables. Across Years 4 to 6, suggestions include, “construct suitable data displays … [that] include tables, column graphs and picture graphs where one picture can represent many data values” (ACMSP096); “construct displays including column graphs, dot plots and tables, appropriate for data type …” (ACMSP119); and “interpret and compare a range of data displays, including side-by-side column graphs for two categorical variables” (ACMSP147). The expectations associated with the graphs these students created went well beyond this. The more statistical language of “within” and “between” variables is not used in the curriculum. The Proficiencies in the Curriculum for Statistics (Appendix 1), particularly “reasoning” and “problem-solving”, cover well the general aims across Years 1 to 6. There needs to be, however, more explicit linkage of these to the details in the Content Descriptions.

The only previous classroom research that could be found in the context of collecting data associated with plant growth was that of Lehrer and Schauble (2004, 2007). As a design study, their research documented the progression from the beginning of the students’ experiences with variation in measurement data and representing it in hand-drawn displays. As the students in the current study had experienced 3 years of discussion about, and representing of, variation, often in hand-drawn graphs, they began the activity with a more sophisticated background and experience with the software TinkerPlots. It was hence considered important to begin with what the research team regarded as significant questions for the students to answer during the activity. Although it was not ideal that some of the workbook entries were incomplete, the teacher noted that the expectation of writing detailed explanations was quite demanding for Year 6. The teacher, however, was favourable to the approach and later had the students prepare other written reports for their classroom assessment. The research team was satisfied that, generally speaking, over half of the responses to the workbook questions were acceptable, showing appreciation of the meaning of “within” (71%), “between” (72%), and “across” (79%) in relation to the variables and the graph presented (cf. Tables 3, 4, and 5), indicating appreciation of the concept of variation and the role it played in making decisions about what makes plants grow best. Overall, the students were able to answer their research questions and to recall the data and the variation evident when making their final decisions about the growth of their plants. The students were often able to identify optimal growth patterns and instances where growth rates were either impeded from germination or declined after a short period of growth.

For the end-of-year questions related to Research Question 2, the recent adaptation of the SOLO model (Biggs & Collis, 1982, 1991) by Groth et al. (2021) was considered very useful in describing the longer-term learning outcomes from the activity in terms of a developmental model. When experiencing a new context and new content, as these students were, it is not surprising that at times responses failed to include concrete symbolic (CS) language and symbols in the explanation of events to answer the questions. It is important to recognise the informal visual and contextual clues students provided, indicating they were developing the intuitions required as a foundation for CS reasoning. Although the range of IK responses across the questions (Fig. 5) was from 4% (Part d) to 49% (Part b), the normative incompatible IK responses ranged from only 2% (Parts a and b) to 18% (Part c). The roughly even split of normative compatible IK and CS responses to the question, “What data did we collect to plot?” (Part b), shows an example where a focus on the specific language of what data represent is likely to support the move of students’ IK responses into the CS mode. The normative incompatible IK responses to “Why is there a gap in the data?” most likely represent forgetfulness on the part of some students or a continued difficulty in identifying the precise meaning of the TinkerPlots representation.

Although some of the representations created to show the trend in the control data were idiosyncratic by more sophisticated statistical standards (cf. Table 9), almost all students could focus on the control data and their trend upward over time. It was pleasing that over 80% of the students could write straightforward descriptions of the change in height over time in relation to the “change in the data,” as well as for “comparing the control with the other treatments.” Although CS responses to “what did this group learn?” fell to 68%, this may not be surprising given the new context and complex nature of the activity. The normative compatible IK responses (12%) brought the percentage responding in the context to 80%, which offers promise for future movement to the CS mode with further conversation.

Conclusion

As a culmination of the larger STEM-based project (cf. Fitzallen & Watson, 2020), the Plant Growth activity was planned to embed explicitly many of the content descriptors of the Australian Curriculum that can be related to making meaning in a STEM context (ACARA, 2016). The overlap across the curriculum of the critical underlying concepts and goals found there should reinforce efforts of teachers to make STEM connections explicit to students in the classroom. As Watson et al. (2020a) point out, many career opportunities are emerging across a range of fields linked to STEM. It is hoped that some students will sustain an interest in similar topics as they continue their education.

Picking variation as the underpinning theme for the entire project reflected the obvious, but not always acknowledged, realisation that none of the STEM fields would exist were it not for the variation present across every aspect of those fields. As noted earlier, the importance of variation was not broadly recognised by the statistics education community until the 1990s, following Moore’s (1990) claim that variation underpinned four of his five core components of statistical thinking. This recognition has continued through the American Statistical Association’s two GAISE Reports by Franklin et al. (2007) and Bargagliotti et al. (2020). Early work with student interviews (e.g. Watson & Kelly, 2005; Watson et al., 2003), surveys (e.g. Watson & Kelly, 2004), and classroom interventions (e.g. Lehrer & Kim, 2009; Lehrer & Schauble, 2004; Lehrer et al., 2007; Petrosino et al., 2003) has laid a foundation for studying understanding with larger interventions and meaningful contexts, providing links across the curriculum, as explored here. Hopefully, further research will be carried out, not only related to STEM in this context from Biological Science, but also more widely for topics of interest across the STEM fields.

In their analyses of students creating hand-drawn representations of variation in two contexts, one where the variation arose from the students’ own measurements and one where it arose from the natural growth of plants, Lehrer and Schauble (2007) emphasised the difficulties students had creating representations of the variables in the second, more complex, context. By the time the students in the study reported here reached the Plant Growth activity, the last of nine activities in the project, they had the support of TinkerPlots technology and experience using it in other activities (e.g. Fitzallen & Watson, 2020; Watson et al., 2020b). The authors speculate that the students in this study would not have been able to create reasonable representations of the data and variation in the activity by hand.

The language in the workbook questions, “within” and “between” treatments, was deliberately chosen to mirror the type of language used in more advanced statistical analyses when similar decisions are made about trends in experimental data. Helping students begin thinking in such ways about data, with relevant language, can hopefully contribute to their appreciation of similar contexts encountered in later courses in the STEM fields. Similar to Watson’s (2005) evidence that students’ appreciation of variation precedes their appreciation of expectation, having students think intuitively about variation within and between variables at this level may support understanding when numerical techniques are introduced later.
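
The formal descendant of this “within” and “between” language is the partition of variation behind one-way ANOVA. As a pointer to where the students’ informal comparisons lead, the sketch below computes between-treatment and within-treatment sums of squares for invented heights; it illustrates the later technique, not an analysis performed in the study.

```python
# Partitioning total variation into between-treatment and
# within-treatment sums of squares, as in one-way ANOVA.
# All height values (mm) are invented for illustration.
groups = {
    "Control":     [110, 95, 120],
    "Treatment 1": [60, 72, 55],
    "Treatment 2": [10, 8, 14],
}

values = [v for g in groups.values() for v in g]
grand_mean = sum(values) / len(values)

ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2
                 for g in groups.values())
ss_within = sum((v - sum(g) / len(g)) ** 2
                for g in groups.values() for v in g)

print("between-treatment SS:", round(ss_between, 1))
print("within-treatment SS:", round(ss_within, 1))
# A large between/within ratio is what later motivates the F statistic.
```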

In the large body of research required to put STEM education firmly in the school curriculum (Honey et al., 2014), this study is a relatively small component. Although part of a larger longitudinal study, it stands on its own here as an example of Foundational Research (Honey et al., 2014, p. 139), aiming to model improved learning by developing and testing methodologies and technologies (e.g. TinkerPlots) that will inform others. In terms of the components of an Integrated STEM Education (Honey et al., 2014, p. 32), the Goals and Outcomes for students are addressed here through the development of twenty-first-century competencies, particularly those associated with integrating statistical literacy with other STEM literacies to build meaningful foundations for future growth across the fields. The study also suggests classroom design aspects to be further trialled and improved by others.

Constraints

In retrospect, it is always possible to imagine that some aspects of the Plant Growth activity could have been different. Allowing many choices of treatments was complex to organise, and the authors believe the outcomes could have been achieved with fewer choices available. Students, however, were seen to enjoy discussions in their groups on what choices to make, sometimes based on their previous out-of-school experiences with the treatments. Although the students could rarely compare progress and results directly with those in other groups of four, there was interest across the class in seeing which plants and treatments grew best.

Some of the questions on the end-of-year questionnaire could have been phrased to expect higher-level SOLO responses, but overall the authors had a purpose in not doing so, given the time gap for some students and the length of the complete end-of-study questionnaire. Although some students did not respond to a few questions, generally the response rate was good. The teacher and the researchers encouraged the students to complete the end-of-year questionnaire, but there was no penalty for not doing so and it did not count toward their in-class assessment by the classroom teacher. Those who completed the task appeared to enjoy thinking back on the activity.