Introduction

The important contribution of the STEM disciplines has gained increased attention in recent years in political, economic, and educational contexts (e.g. Honey et al., 2014; Office of the Chief Scientist, 2013; Timms et al., 2018), and this attention has translated into education policy (e.g. Department of Education, 2015; National Science and Technology Council, 2018) and curricula (e.g. Australian Curriculum, Assessment and Reporting Authority [ACARA], 2020; Ministry of Education, n.d.). A feature of STEM education is the integration of learning across the STEM disciplines (e.g. Anderson & Li, 2020; Tytler et al., 2019; Watson et al., 2020a). In a recent commentary on preparing a STEM-ready workforce, however, Bentley et al. (2022) highlighted issues considered to impede the advancement of “STEM education”, including a contested definition of the phrase, uncertainty about the exact skills required, hesitancy about the effectiveness of inquiry-based teaching approaches within STEM learning, and questions about who should teach STEM. Some of these arguments are not presented in relation to the primary school level, and absent from the commentary is any acknowledgment of issues raised in the literature about implementing the curriculum and determining how students’ understanding develops within integrated learning environments (e.g. Fitzallen, 2015). In a study focussed on coherence and integration in STEM teaching, Roehrig et al. (2021) also noted the problematic [mis]alignment of the science and mathematics learning standards with the science and mathematics content needed to connect the STEM disciplines in meaningful ways. 
The study reported here addresses Fitzallen’s call for greater attention to be paid to the contribution mathematics makes to STEM learning, the curriculum issues raised by Roehrig et al., and Bentley et al.’s call for continuing research in the field, in particular research about the way in which students’ understanding of mathematical concepts develops within integrated learning environments.

The research reported in this paper explored opportunities for students to build graphical representations, conduct data analyses, and utilise interpretation skills while developing an understanding of science concepts using digital technology. The focus of the exploratory research was on students’ capacities to “interpolate” and “extrapolate” from data collected in viscosity experiments. The experiments were designed to address mathematics curriculum outcomes associated with recognising and using graphical and numerical representations to describe data from statistical investigations, and science curriculum outcomes associated with explaining the observable properties of solids, liquids, and gases (ACARA, 2020). The terms interpolation and extrapolation, however, were not introduced to the students during the activities. Instead, the students were “solving a mystery” and “making a prediction into the future”, respectively. Across the STEM disciplines, both tasks involve making decisions based on evidence derived from data collected in a classroom context.

Statistical background

In relation to statistical analyses, interpolation involves estimating a value within the range of a set of bivariate data, and extrapolation involves estimating a value outside that range. Estimating such values begins with plotting the data to look for a trend. If the data are roughly linear, calculus can be used to find the line that minimises the sum of the squared vertical distances between the data points and the line; this “line of best fit” shows the trend that the group of points appears to follow (Nagle et al., 2017). The line is then used to answer a question relating a given value of one variable to a predicted value of the other. If, however, the trend in the data does not appear to be linear, the challenge is to create the equation of a function that represents the relationship between the two variables. Whether the line or curve is approximated by the students or provided by a computer program (e.g. Motulsky & Ransnas, 1987) depends on the level of the course being studied. Dealing with equations for curved lines of fit is not generally part of the school curriculum (ACARA, 2020). In tertiary courses, it occurs as part of numerical analysis units (e.g. Sauer, 2012) or in applied STEM subjects where online applications are available to create fits with various mathematical models (e.g. Mavrevski et al., 2018). Even when students are not given the results of the advanced calculations used to determine a curve that fits the data, they can still use their own approximate curves to discuss the association of variables grounded in meaningful contexts.
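For the linear case, the following is a minimal Python sketch, assuming ordinary least squares as the fitting criterion (the usual choice for a line of best fit; it is not a method used by the students in this study). The helper names `fit_line` and `predict` and the data values are hypothetical and for illustration only. A prediction is labelled interpolation when the new x-value lies within the range of the observed x-values, and extrapolation otherwise.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x.

    Minimises the sum of squared vertical distances from the points to the line.
    """
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired observations")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x-values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(xs, intercept, slope, x_new):
    """Predict y at x_new and label it by whether x_new lies inside the observed x-range."""
    kind = "interpolation" if min(xs) <= x_new <= max(xs) else "extrapolation"
    return intercept + slope * x_new, kind


# Hypothetical bivariate data (not from the study).
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]
intercept, slope = fit_line(xs, ys)
print(round(intercept, 2), round(slope, 2))  # 0.09 1.97
print(predict(xs, intercept, slope, 2.5)[1])  # interpolation
print(predict(xs, intercept, slope, 7)[1])    # extrapolation
```

An extrapolated value relies on the assumption that the trend continues beyond the observed data, which is why it carries more uncertainty than an interpolated one.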

Traditionally, the method of representation of such data is a scatterplot, with two perpendicular axes, each scaled evenly for the values of the two variables plotted. The research on aspects of covariation, and hence linear trends, has occurred almost exclusively with middle and high school students (e.g. Casey, 2015; Groth et al., 2018; Nagle et al., 2017). Outcomes from these studies indicate students can interpret scatterplots in several ways, including looking at trends in aggregated values, connecting values in a dot-to-dot sequence, signifying a trend with a straight line, or conceptualising the slope of a line of best fit as the rate of change. Even without this formal background in primary education, younger students have the potential to observe trends in plotted data points and sketch lines of fit related to them (Andre et al., 2022; Fitzallen, 2012; Moritz, 2004). The lines can then be used to predict other values within or extended from the plotted data. A review of literature related to judgements made by 8–12-year-old students when using spreadsheets to analyse data found they used: knowledge of everyday experience, accumulated over extended time; knowledge gained recently during the data-generating activity itself; an expectation that graphical or numerical patterns in data should be maintained or continued; and emerging knowledge of an underlying mathematical model (Ainley et al., 2001). The use of other types of representation has been made possible through advances in technology (Frischemeier et al., 2021; Konold, 2002; Mojica et al., 2019). Further research based on more recent technological applications is warranted.

Within a bivariate graph, the interpretation of the trend in the data can also be thought of in terms of Curcio’s (1989) and Shaughnessy’s (2007) discussions of “reading” with respect to the message in data. For Curcio, reading “within the data” related to making sense of a graph. Then, reading “beyond the data” for Curcio was interpreting the information in the graph to answer a question about a context. Shaughnessy was more specific in suggesting the need also to read “behind the data” for causes of the variation shown in the data or the potential relationship of the variables represented on the two axes. For Shaughnessy, it was looking at these aspects behind the data that could assist in answering questions asked beyond the data as suggested by Curcio. According to these definitions, determination of a line or curve that approximates the data would be considered reading “behind the data”, interpolation would be considered reading “within the data”, and extrapolation would be considered reading “beyond the data”.

In the graphical representations used to carry out interpolation or extrapolation, the two variables involved are plotted on the x-axis and y-axis, as either time series or scatterplots. A recent revision of the Australian Curriculum: Mathematics v9 (ACARA, 2022) included for the first time at the Year 5 level, “interpret line graphs representing change over time; discuss the relationships that are represented and conclusions that can be made” (AC95ST02). The US Common Core for Grade 5 (Common Core State Standards Initiative [CCSSI], 2010), under Measurement and Data, described graphing points on the coordinate plane in detail to “represent real world and mathematical problems” (p. 38). The National Council of Teachers of Mathematics (NCTM) Standards (2000) for Grades 3–5 suggested using line plots, bar graphs, and line graphs (p. 176). It was not until Grades 6–8 that scatterplots were introduced to show the relationship between two numerical variables (NCTM, 2000, p. 248). In Australia, scatterplots specifically appear at Year 10 (ACARA, 2020), with interpolation and extrapolation encountered in the senior secondary curriculum. In “Essential Mathematics”, Unit 3, students are required to fit a linear model to numerical data and distinguish between interpolation and extrapolation when using the fitted line to make predictions. Neither the Common Core (CCSSI, 2010) nor the NCTM Standards (2000) mentions these two topics at the secondary level.

Although specific mention of scatterplots does not appear in curricula until the secondary years of schooling, research recommends that exploring student understanding of relationships suggested by scatterplots in the primary years of schooling is appropriate (e.g. Ainley et al., 2001; Andre et al., 2022; Fitzallen, 2012; Moritz, 2004). Now that time series are included in Year 5 (ACARA, 2022), research into student understanding of those graph types needs to be conducted. Gaining an appreciation of the early emergence of intuitions about interpolation, extrapolation, and lines and curves of fit has the potential to inform teaching practices that build on those intuitions to develop more sophisticated strategies for interpreting trends using statistical calculations when introduced later across the curriculum.

Informal statistical inference

Developments in statistics education have shifted from a focus on statistical processes to a focus on using informal statistical inference to interpret data. “The foundational difference in newer approaches to working with data is the shift from learning statistical tools and artefacts (measures, graphs, and procedures) as the focus of instruction, towards more holistic, process-oriented approaches to learning statistics” (Makar & Rubin, 2009, p. 83). Informal statistical inference involves making statements or predictions beyond the data at hand, with the inclusion of uncertainty, through explicit use of evidence from the data based on an understanding of the context (Makar & Rubin, 2018). Students who have an appreciation of variability and are familiar with the context of the data develop more sophisticated statistical understanding, which becomes imperative when applying formal statistical inference methods, such as tests of significance (Garfield et al., 2015). Lehrer (2017) contended that adopting a statistical modelling approach with students facilitates linking variation to process. He claimed that collecting data and constructing statistical models of the data support students’ development of informal statistical inference in a natural and intuitive way.

Research into informal statistical inference has been conducted from the early years of schooling (e.g. Makar, 2016) to tertiary education (e.g. Noll et al., 2016). More research, however, is needed to explore how students experience informal statistical inference in varied statistical contexts and how inferential reasoning established in one statistical context can be transferred to other statistical situations (Makar & Rubin, 2018). The study reported in this paper addresses Makar and Rubin’s call for more research about inferences from time series and inferences about relationships revealed in scatterplots, and Lehrer’s (2017) suggestion that students apply statistical models (in this instance, lines or trendlines) to make informal inferences.

Data analysis: TinkerPlots

The use of the exploratory data analysis software TinkerPlots™ (Konold & Miller, 2015) at all levels of education and across many contexts has grown substantially in recent years (e.g. Allmond & Makar, 2014; delMas et al., 2014; Fitzallen, 2012; Frischemeier, 2020; Khairiree & Kurusatian, 2009; Noll et al., 2018; Podworny & Biehler, 2014; Watson, 2014). Designed by Konold and Miller (2015), the software was developed to help middle school students visualise data and the relationships within them before encountering the theoretical approaches of formal statistics (e.g. Konold, 2007). Data are recorded in the software on data cards (one card per case, listing its attributes) and in a table (listing all cases). The drag-and-drop functionality of the software allows users to create graphical representations by dragging attributes from the cards or table into a plot window. Data icons can also be coloured according to the attribute selected.

Further development of the software and its affordances for learning and teaching (e.g. Watson & Fitzallen, 2016) has seen the software also used at the tertiary level (e.g. Noll & Kirin, 2016). It has also been the basis for exploratory research, in interview settings, on students’ initial engagement with the software: when first introduced to the software with a data set of measurement and gender data for 200 students their age, Year 5/6 students were found to use three different strategies (Fitzallen, 2013). Other research has shown that TinkerPlots™ gives students more flexibility in creating graphical representations than paper versions do (e.g. Konold, 2002). The level of understanding demonstrated, however, depends on which features of the software are chosen (Watson & Donne, 2009). Further research is needed to document the transition from creating hand-drawn representations to using TinkerPlots™ and to examine the potential of this transition for empowering young students to explore data in more meaningful ways.

Study background

The research reported in this paper was part of a STEM education project with primary school students from Years 3 to 6 that involved the implementation of STEM learning activities that focused on conducting statistical investigations. The learning activities implemented were chosen with input from several STEM subject areas. All topics included the collection of data associated with the development of informal inference, through understanding of the practice of statistics (Bargagliotti et al., 2020; Watson et al., 2018) and identification of variation within a data set and among data sets. Before introducing the first activity, students completed a pre-test, which included three questions about data (Watson & Fitzallen, 2021): (a) What do you think “data” means? (b) Give an example of some data you have seen or collected. (c) Sketch a graph of the data. Of interest here are the responses to part (c). At that time, 53% of the students could create a single-variable pictograph, table, or column graph for their examples of data (the three types of representation recommended in the curriculum for Year 3 [cf. ACARA, 2020]). This indicated many of the students had limited experiences with learning about different graph types.

In Year 3, the first activity focused on the fundamental concept of variation, without which there can be no statistical investigation. The context chosen allowed students to experience variation twice in different situations, while reinforcing their developing measurement and plotting skills from the mathematics curriculum (ACARA, 2020). In an activity adapted from the work of Konold and Harradine (2014) on making a product by hand and by machine, students worked in groups to create licorice sticks by hand from Play-Doh™; the sticks were intended to be 8 cm long and 1 cm in diameter. The licorice sticks were weighed, their masses in grams recorded on sticky labels, and the labels from all groups placed on a stacked plot on the classroom wall, which prompted much discussion of the different values of mass, ranging from 6 to 30 g. In the next lesson, the same procedure was carried out, but this time the licorice sticks were created using a Play-Doh™ extruder to simulate production of licorice by a “factory”. When a stacked plot of the factory data was created on the wall and compared, on the same scale, with the plot of the hand-made data, the differences in variation within and between the two distributions became obvious to the students. The results from this initial activity indicated that young students could use variation in the data (illustrated as spread from the centre) to compare two sets of data and explain the variation in terms of the context of the data collection (Watson et al., 2020b).

The other activity students experienced in Year 3 involved using time series data to describe variation in a physical phenomenon. The STEM connection to science was through the physical sciences context of heat: “Heat can be produced in many ways and can move from one object to another” (ACARA, 2020). Because time series graphs had not yet been covered in the curriculum, students used a stylised graph made of “thermometers” to record the cooling of insulated and non-insulated cups full of hot water placed in cold water, into which ice was added after 10 min (Fitzallen et al., 2016; see Fig. 1). Combining data across groups in the classroom, and observing the variation in the data, students described the difference in cooling for the two cup conditions, as well as the warming of the water in which the cups were placed (Chick et al., 2018; Fitzallen et al., 2016). The results suggested young students are capable of interpreting complex graphs, describing trends in data, identifying where in a graph the data reflect a change in conditions, and comparing the variation in the rate of change, albeit at varying levels of understanding.

Fig. 1

Using thermometers to create a graph to measure temperature over time

At the beginning of Year 4, in introducing students to the practice of statistics and writing investigative questions, students wrote individual survey questions to compare life in their city with life in another city in which other students involved in the project lived (English et al., 2017; Watson et al., 2019). The questions were collated to construct a questionnaire that was administered to the students in the project. Once completed, the results from the questionnaire were given to the students for analysis. The analysis of students’ hand-drawn representations of the questionnaire results detailed the first time they explicitly considered a categorical variable (city) in their representations of other data to answer their research questions (Watson et al., 2019). Although not always involving a two-dimensional representation, 64% of the students were able to distinguish in some manner conclusions about the two cities from the data. The results from the activity showed young students can represent data using drawings and images as well as conventional graphs.

The other activity in Year 4 extended the exploration of difference in variation from the Licorice activity to include a difference in the expectation (middle) of two related data sets. Here, the STEM link to science involved the study of force from the Science Curriculum: “Forces can be exerted by one object on another through direct contact or from a distance” (ACARA, 2020). Students experimented with catapults in two conditions: (1) an initial construction of the catapults, testing how far they launched ping pong balls, and (2) repeated testing after changes had been made to the catapults to increase the force of the launches. In this activity, in which students collected data in groups of three, there was variation in the expectation between the two catapult constructions as well as within each of these methods of launching (Watson et al., 2022, 2023). Again, students created hand-drawn representations of their initial trials with the catapults. The Catapult activity reinforced the students’ emerging appreciation of the fundamental nature of variation and also provided the context for their initial exposure to using technology as part of investigations. In the computer lab, each student was asked to complete a representation to demonstrate the results of the experiment (cf. Watson et al., 2023, p. 963). Seventy percent of students were able to distinguish the two conditions of launching the ping pong balls, either by creating a two-dimensional plot or by using different colours for the two treatments (see Fig. 2). The results indicated that TinkerPlots™ supports students in identifying elements of variation that can be used to explain the difference between two sets of data.

Fig. 2

Differentiating the first and second trials in the Catapult activity (Watson et al., 2022, p. 15)

The activity reported in this paper, conducted when the students were in Year 5, combined the experience of plotting two variables and identifying trends in the data, developed in the Heat activity, with the experience of comparing two groups in the Licorice and Catapult activities, including using TinkerPlots™ to create representations of measurement data with categorical data. The STEM connection chosen from the Science Curriculum was related to the chemical sciences: “Solids, liquids and gases have different observable properties and behave in different ways” (ACARA, 2020). The STEM connections with mathematics were firmly based in statistics through “Data representation and interpretation”. There were also close links with the content descriptions for science inquiry skills, through “planning and conducting” and “processing and analysing data and information” (ACARA, 2020).

The research reported in this paper continued investigation into students’ levels of understanding of graphical representations and development of data-related concepts experienced within integrated STEM learning contexts. It also explored students’ potential transition from creating and interpreting hand-drawn representations to using those generated by data analysis software in relation to identifying trends in the data to make predictions and projections from data.

Methodology

Given that no evidence could be found of previous research related to school students’ understanding of interpolation and extrapolation in nonlinear data contexts, the research reported in this paper was exploratory in nature. It adopted a pragmatist paradigm as suggested by Mackenzie and Knipe (2006). Hence, the research questions, the resultant data, and the method of analysis were central to exploring the extent to which students could use their developing skills of data representation to interpret data arising in an unfamiliar context (interpolation or extrapolation) to answer questions from another part of the STEM curriculum. Following Mackenzie and Knipe (cf. Table 1, p. 4), the research was problem-centred in terms of context, related to the consequences of the students’ actions in their experiments, pluralistic in addressing two situations where viscosity is relevant, and real-world practice-oriented due to the link to the STEM curriculum. Hence, in terms of the analysis of the students’ responses to their research questions using the data collected by their groups, the study’s research questions were qualitative, focusing on how the students employed the elements at their disposal to make informal inferences for their two research questions. For one class, this included analysing their hand-drawn representations as well as those in TinkerPlots™ files for the first question, on interpolation. This focus on how students use elements suggested incorporating the theoretical perspective of the Structure of the Observed Learning Outcome (SOLO) model of Biggs and Collis (1982) when considering students’ creation of responses to the tasks presented. Because of the importance of graph creation, the SOLO model as extended by Watson and Fitzallen (2010), in terms of consolidation in the concrete symbolic mode as suggested by Watson et al. (1995) and Pegg (2002), was used to explore the levels of cognitive development students displayed.

SOLO taxonomy: describing levels of graphical understanding

The SOLO model (Biggs & Collis, 1982, 1989) is a hierarchical model that describes the increasing complexity of student understanding of ideas and concepts. The basic SOLO model comprises five modes of development, from sensori-motor in infancy to formal and post-formal in adulthood. The concrete symbolic (CS) mode, from the beginning of schooling to adolescence, is the mode of interest in this research. Within each mode, responses to tasks are characterised by a framework based on the elements in the task that are required to create a response to the question asked. If one element is used, the response is considered unistructural (U). If two or more elements are employed in sequence, the response is considered multistructural (M). If the two or more elements are shown to be related in answering the task, the response is considered relational (R). Also, before reaching an extension of this structure in the formal mode, a consolidation of a particular concept may occur in the CS mode; the consolidated concept can then be used as a unistructural element in a higher U-M-R cycle in response to the task (Levins & Pegg, 1993; Pegg, 2002; Watson et al., 1995). An example of this might be the construction of the arithmetic mean in one sequence and then its subsequent use in another sequence solving a task that requires the mean (e.g. Callingham, 1997; Watson & Moritz, 2000).
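The U-M-R rule above can be expressed as a small coding function. The following Python sketch is illustrative only: it assumes a coder has already identified which task elements appear in a response and whether they are related, and the names `SoloLevel`, `classify_response`, and `cycle_label` are hypothetical. It also adds a prestructural label for responses with no relevant element; that level belongs to Biggs and Collis’s model but is not discussed in this section.

```python
from enum import Enum


class SoloLevel(Enum):
    """Response levels within one U-M-R cycle (PRESTRUCTURAL marks no relevant element)."""
    PRESTRUCTURAL = "P"
    UNISTRUCTURAL = "U"
    MULTISTRUCTURAL = "M"
    RELATIONAL = "R"


def classify_response(elements_used, elements_related):
    """Apply the U/M/R rule to a coded response.

    elements_used: task elements identified in the response (hypothetical coding).
    elements_related: True if the coder judged the elements to be related to each
    other in answering the task, rather than only used in sequence.
    """
    n = len(set(elements_used))
    if n == 0:
        return SoloLevel.PRESTRUCTURAL
    if n == 1:
        # A relation needs at least two elements, so a single element is always U.
        return SoloLevel.UNISTRUCTURAL
    return SoloLevel.RELATIONAL if elements_related else SoloLevel.MULTISTRUCTURAL


def cycle_label(level, cycle):
    """Attach the cycle number to a level, e.g. 'R1' or 'U2'."""
    if level is SoloLevel.PRESTRUCTURAL:
        return level.value
    return f"{level.value}{cycle}"


# Elements of the first cycle (graph for one attribute), as listed in the text.
CYCLE1_ELEMENTS = {"Attribute", "Data", "Variation", "Scale"}

response = {"Attribute", "Data", "Scale"}
print(cycle_label(classify_response(response, elements_related=False), 1))  # M1
```

The judgement of whether elements are related remains the coder’s; the function only records how that judgement and the element count combine into a level.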

In this paper, SOLO is applied to the concepts of graph creation and informal inference based on graphical representations. In relation to graph creation, Watson and Fitzallen (2010) employed the SOLO model using the elements required for students to construct graphical representations in repeated sequences in the CS mode. This begins with the potential levels of achievement shown in Fig. 3 for the creation of the type of graph in the curriculum representing one attribute, as first introduced in the primary curriculum. The elements in the U1-M1-R1 cycle for such a construction are one attribute (variable), Data, the variation present in the data, and Scale required to present a visualisation of the attribute. Specific examples of the elements could be the attribute of height, the height data from the students in a class, demonstrating the variation among the students, and scale created with height on a horizontal axis and frequency on a vertical axis.

Fig. 3

U1-M1-R1 sequence for the concept of graph for one attribute (variable)

Employing the Concept of Graph consolidated as seen in Fig. 3 (R1), Watson and Fitzallen (2010) considered it as a new element (cf. Pegg, 2002) that could then be used in a second SOLO sequence, where, besides the element of the Concept of Graph, the other elements were the Types of Attributes (e.g. numerical or categorical), 2-D Scaling appropriate to the attributes, Relationship of the Two Attributes to a Single Case, and, for the concepts in this activity, Trendline (identified from the data). This is shown in Fig. 4.

Fig. 4

The U2-M2-R2 sequence for the Concept of Graph for Multiple Attributes (adapted from Watson & Fitzallen, 2010)

Depending on the context, Concept of Graph for Multiple Attributes applies, for example, when two numerical variables suggest that a linear correlation may exist. Building on the examples of elements for the Concept of Graph suggested above, the plot shown in Fig. 5 is an example of an R2 graph for multiple attributes. Based on the two attributes of height and arm span measured for each of 34 children, with the 2-D-scaled numerical attributes (measured in centimetres) on the axes, including the single points representing the attributes, a linear trendline can be created to suggest the relationship between the attributes.

Fig. 5

Scatterplot of height and arm span with a suggested trendline (Bargagliotti et al., 2020, p. 40), exemplifying the Concept of Graph for Multiple Attributes

Contexts that suggest interpolation or extrapolation can also be represented in this way, with the acknowledgement that the relationship involved is not linear. The trendline may be all that is required in some contexts, in which case only an R2 representation is needed. However, once the two-dimensional graph is created, the expectation is likely to be that an informal inference is made for a question asked about the attributes (variables) involved. This adds another level of complexity expected of students in tasks involving multiple attributes. In the case of interpolation or extrapolation, this is usually a prediction of the value of one attribute given a new value of the other, based on a question about the context of the data. With the Concept of Graph for Multiple Attributes consolidated as a new element, the other elements are a reminder of the importance of variation in decision-making and the reference lines associated with the known and predicted values. These are the elements required for statistical analysis and informal inference, as shown in Fig. 6 for decision-making in the tasks in this study.

Fig. 6

The U3-M3-R3 sequence for Informal Inference for Graphs (adapted from Watson & Fitzallen, 2010)

This meta sequence of SOLO models for graph creation provides a way of reporting on the learning outcomes for students as they initially move from creating graphs with one attribute, to creating graphs with two attributes, and then further using the representations to make informal inferences in relation to their research questions. For example, from the R2 representation in Fig. 5, one might want to suggest an arm span for a person of a height of 145 cm. This would involve adding a new element, reference lines, first vertically from the horizontal height axis and then horizontally from the intersection with the trendline to the vertical arm span axis.
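The two reference lines can be sketched in Python. This is a minimal illustration, assuming the trendline is stored as a list of (x, y) vertices, which lets the same procedure handle a straight line or a sketched curve of fit. The function name `read_off_trendline` and the trendline values are hypothetical; they are not the data or trendline in Fig. 5. Following the mapping to Curcio’s terms given earlier, a query inside the plotted range is reported as reading “within” the data and one outside it as reading “beyond” the data.

```python
from bisect import bisect_left


def read_off_trendline(vertices, x_query):
    """Read a predicted y-value from a trendline using two reference lines.

    vertices: (x, y) points defining a straight or sketched curved trendline
    (hypothetical; at least two points with distinct x-values).
    The vertical reference line rises from x_query on the horizontal axis to the
    trendline; the horizontal line then carries that value to the vertical axis.
    Outside the plotted range, the nearest end segment is extended.
    """
    pts = sorted(vertices)
    xs = [x for x, _ in pts]
    if len(pts) < 2 or len(set(xs)) != len(xs):
        raise ValueError("need at least two vertices with distinct x-values")
    # Right-hand end of the segment to use, clamped to a valid segment index.
    i = min(max(bisect_left(xs, x_query), 1), len(pts) - 1)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    y = y0 + (y1 - y0) * (x_query - x0) / (x1 - x0)
    region = "within" if xs[0] <= x_query <= xs[-1] else "beyond"
    return y, region


# Hypothetical trendline for height (cm) against arm span (cm); NOT the Fig. 5 values.
trendline = [(130, 128), (150, 150), (170, 171)]
print(read_off_trendline(trendline, 145))  # (144.5, 'within')
print(read_off_trendline(trendline, 180))  # (181.5, 'beyond')
```

Extending the end segment is only one possible assumption for reading beyond the data; a student might instead continue a sketched curve with a different shape.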

As the outcomes of each of the first two cycles become elements of the next higher cycle, how they are reported depends on the aim of the task and the level reached in the response. For example, the R1 Concept of Graph may or may not be combined with other elements in a U2 attempt for multiple attributes; the same applies to R2 responses in the third cycle. Figure 7 illustrates the relationship across the three cycles. Also, the initial model for the Concept of Graph for a single attribute may be useful in assessing responses if students struggle with their hand-drawn representations.

Fig. 7

Consolidation of the concept of graph across SOLO levels in the concrete symbolic mode

Research questions

Given the students’ background in exploring relationships observed in graphical representations (e.g. Figs. 1 and 2) and the lack of any known previous research on interpolation or extrapolation in the primary years of schooling, the research questions were exploratory in nature. Two student activities were developed for the research, one on interpolation and one on extrapolation. Three research questions relate to the interpolation activity and one to the extrapolation activity.

RQ1: How can students’ creations of hand-drawn representations for the interpolation activity be characterised by the SOLO models for graph creation?

RQ2: How can students’ creations of TinkerPlots™ representations for the interpolation activity be characterised by the SOLO models for graph creation?

The difficulties experienced by the students in completing, to their own satisfaction, their hand-drawn representations led to considering the additional cognitive support provided by the introduction of the 2-dimensional framework of TinkerPlots™; hence, RQ3 was investigated.

RQ3: For students who completed hand-drawn and TinkerPlots™ representations for the interpolation activity, what does the relationship between the SOLO levels reached suggest about the additional support provided by the software?

Finally, the extrapolation activity was considered.

RQ4: How can students’ creations of TinkerPlots™ representations for the extrapolation activity be characterised by the SOLO models for graph creation?

Participants

Data were collected from 51 students in two Year 5 classes at an urban independent Catholic school, for whom parental and student permission to use the data had been obtained. All except five had been students at the school for the two previous years; of the remaining five, two had been present the previous year and three were new in the year the data were collected. The sample consisted of 29 boys and 22 girls aged 10 or 11 years, of whom 22 completed hand-drawn representations. The project had ethics approval from the Tasmanian Social Sciences Human Research Ethics Committee (H0015039). For reporting purposes, students were assigned a unique student identification code (e.g. ID101).

In summary, in Year 4, the students had two specific experiences with TinkerPlots™ before this activity:

  • introduction to plotting the class data from the Year 3 Licorice activity (Watson et al., 2020b); and

  • from the Catapult activity (Watson et al., 2022), being presented with TinkerPlots™ files on USB memory devices that contained the data from their first trials, being taught how to enter the data from their second trials, and then creating a representation to show the results for their groups (cf. Fig. 2).

Student data collection context: the STEM connection

The STEM connection chosen related to viscosity, because viscosity was acknowledged to be part of the Science Curriculum (ACARA, 2020) that the school would teach in Year 5 (Australian Academy of Science [AAS], 2014), under the following content description:

  • Solids, liquids, and gases have different observable properties and behave in different ways (ACSSU077), recognising that not all substances can be easily classified on the basis of their observable properties.

The viscosity of a liquid is a measure of its resistance to flowing (AAS, 2014, p. 9). It is an observable property of liquids, described in terms of “thickness”, that accounts for liquids behaving in different ways. The AAS (2014) recognised viscosity as an appropriate topic for Year 5 in its Primary Connections chemical science activities document, What’s the matter?, in which it developed an activity, “Runny Races”, based on liquids of varying viscosity flowing at different rates. Hence, the first activity, for interpolation, was set up to solve a mystery, with the aim of using the nature of viscosity to determine the concentration of an unknown solution. The second activity developed for the project was based on lava flows. The activity was implemented concurrently with Year 5 students in two cities. The data from one school were used to investigate the way in which students model quantitative data (English, 2022). The implementation of the task in the other city, for the purpose of investigating students making informal statistical inferences, is reported in this paper.

Activity 1: Mystery Solution

In this activity, five mixtures of different viscosity were created using the same amount of hair conditioner with different amounts of water (0 ml, 10 ml, 20 ml, 30 ml, or 40 ml) added to each. To initiate the activity, the teacher led discussions on “fair tests”, with each group of three students deciding how to complete the tests and record the data. Each solution was tested by measuring how far the mixture flowed in 30 seconds down square-centimetre graph paper attached to a clipboard placed at a fixed angle. Each mixture was tested three times as groups worked their way around five “test sites” in the classroom (Fig. 8). The students therefore had potentially 15 data values to represent in relation to the concentration of the conditioner and the distance travelled. In the first class, which created hand-drawn plots, students were allowed freedom in how they represented the data their groups collected. In most cases, a two-dimensional representation reported the distance vertically with the trials grouped along the horizontal axis and the frequency noted in some form vertically. After the TinkerPlots™ representations had been completed with this class, it was decided to skip the hand-drawn representations in the second class, because it had become apparent that the hand-drawn graphs lacked the accuracy required to make reasonable estimations of values from the graphs. Also, most hand-drawn graphical representations were not suitable for interpolating the data. After completing both graphing activities, students in the first class were asked to compare the two representations and note the similarities and differences in their workbooks.

Fig. 8

Setup for timing flows of mixtures of different viscosities

TinkerPlots™ files were prepared for students in both classes, with the attribute names, concentration and distance, and with the data entered for their groups’ measurements, as seen in Fig. 9. The teacher then demonstrated how to plot the attributes (e.g. top left of Fig. 10) and how to use the pencil tool in TinkerPlots™ to draw a trendline showing how distance changed as the concentration increased (e.g. top right of Fig. 10). The students then completed plotting the data and drawing the line of best fit in their own files.

Fig. 9

Examples of a TinkerPlots™ table and data card

Fig. 10

Proposed 4-step procedure from data collection to predicting the concentration of the mystery solution

Students were then given a solution of conditioner of unknown concentration within the range of 0–40 ml to test using the same procedure. After testing three samples, each group used an appropriate middle value of the three measurements as an estimate of the distance travelled and plotted it on the distance axis in TinkerPlots™. Again as demonstrated by the teacher, a horizontal reference line was extended from that value to cross the trendline (e.g. bottom left of Fig. 10, from distance 3.5 cm). At this intersection, a vertical reference line was created and dropped to the horizontal axis to suggest the concentration of the mystery solution (e.g. bottom right of Fig. 10, to concentration 27.5 ml). This was the students’ “hands-on” method of interpolation. The actual mystery concentration was different for each class.
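The interpolation steps can be sketched in Python, assuming a hand-drawn trendline is approximated by straight segments between vertices and that the “middle” of the three trials is their median. The function `read_x` and the trendline vertices are hypothetical; the vertices were chosen only so that the reading reproduces the 3.5 cm to 27.5 ml example described for Fig. 10, and they are not the students’ data.

```python
from statistics import median


def read_x(trendline, y):
    """Horizontal reference line at y: return every x where it meets the trendline.

    `trendline` is a list of (x, y) vertices sorted by x and joined by straight
    segments. A hand-drawn line may cross the reference line more than once,
    so all crossings are returned from left to right.
    """
    crossings = []
    for (x0, y0), (x1, y1) in zip(trendline, trendline[1:]):
        if y0 == y1:
            continue  # flat segment: skipped in this simple sketch
        if min(y0, y1) <= y <= max(y0, y1):
            x = x0 + (y - y0) * (x1 - x0) / (y1 - y0)
            # A crossing exactly at a shared vertex would otherwise be counted twice.
            if not crossings or abs(x - crossings[-1]) > 1e-9:
                crossings.append(x)
    return crossings


# Hypothetical trendline: (water added in ml, distance flowed in cm).
trendline = [(0, 0.5), (10, 1.0), (20, 2.0), (30, 4.0), (40, 7.0)]
mystery_trials = [3.2, 3.5, 4.1]   # hypothetical distances for the mystery solution
distance = median(mystery_trials)  # the "middle" of the three values
print(distance, read_x(trendline, distance))  # 3.5 [27.5]
```

Returning every crossing, rather than only the first, reflects that a hand-drawn trendline need not rise steadily, so a horizontal reference line can meet it more than once.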

Activity 2: Lava Flow

A week later, the second activity linked the Science Curriculum topic of viscosity (ACARA, 2020, ACSSU077) to that of volcanic eruptions (Earth & Space Sciences, AAS, 2015), with lava of a given viscosity flowing down the side of a volcano and threatening a village some distance away. For this extrapolation activity, models of shield volcanoes were created as shown in Fig. 11, with six equally spaced sections marked down the slope, each representing a kilometre. A viscous material representing the magma was dropped from the top of the volcano, and students used stopwatches to time when the substance reached each of the six marked sections. Working in teams of three, students timed six separate flows of the lava down the volcano and entered the resulting 36 values themselves into TinkerPlots™, set up with interval and time as attributes (see Fig. 11, right). This time, extrapolation involved extending the interval axis and the trendline to 10 intervals, dropping a vertical reference line to the interval axis at Interval 10, and then creating a horizontal reference line from its intersection with the trendline back to the time axis, in order to estimate the time required to reach the imaginary village (10 km away). The 4-step procedure, demonstrated by the teacher in TinkerPlots™, is shown in Fig. 12.
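The extrapolation step can be sketched in the same way. This Python example is a hypothetical illustration: it assumes the trendline is a polyline over the measured intervals and, beyond the last vertex, simply continues the final segment in a straight line. That straight continuation is an assumption of the sketch; students were free to draw curved trendlines, which would give different estimates. The function name `extrapolate_y` and the vertices are invented for illustration.

```python
def extrapolate_y(trendline, x):
    """Read the trendline at x, extending its last segment in a straight line
    when x lies beyond the final vertex.

    `trendline` is a list of (x, y) vertices with strictly increasing x.
    """
    if x <= trendline[-1][0]:
        # Within the drawn line: find the segment that contains x.
        for (x0, y0), (x1, y1) in zip(trendline, trendline[1:]):
            if x0 <= x <= x1:
                return y0 + (x - x0) * (y1 - y0) / (x1 - x0)
        raise ValueError("x lies before the start of the trendline")
    # Beyond the data: continue the final segment (assumed straight).
    (x0, y0), (x1, y1) = trendline[-2], trendline[-1]
    return y1 + (x - x1) * (y1 - y0) / (x1 - x0)


# Hypothetical trendline: (interval, time in seconds) over the six measured intervals.
lava_line = [(1, 2.0), (3, 7.0), (6, 16.0)]
print(extrapolate_y(lava_line, 10))  # 28.0
```

Reading at Interval 10, well beyond the last measured interval, makes the estimate depend visibly on the shape of the drawn trendline rather than on the individual recorded times.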

Fig. 11

Setup for shield volcano and TinkerPlots™ files used for data collection

Fig. 12

Proposed 4-step procedure from initial data representations to predicting time for lava to reach the 10th time interval

The graph used to demonstrate the procedure for extrapolating values from a graph displayed hypothetical data (Fig. 12). The aim was to use data that showed variation across the trials and showed that the variation in time could change across the intervals. The goal was to use an example that would look familiar to the students without replicating the data to be collected. Although explicit instruction was necessary to show students how to use the graph for extrapolation, the researchers did not want to tell the students exactly what their trendlines should look like. The sweeping trendline could be construed as fitting a particular nonlinear function, but the intention was simply to illustrate that trendlines need not be straight: because students were expected to draw roughly linear trendlines for the Mystery Solution activity, the sweeping trendline was added to the Lava Flow graph to expose them to the idea that trendlines do not always have to be linear.

For both the interpolation and extrapolation examples, the emphasis was on using the trendline to determine the unknown values. Although primary students are able to interpret scatterplots by looking at trends in aggregated values (Casey, 2015; Groth et al., 2018; Nagle et al., 2017), Fitzallen (2012) found that strategies were needed so that students focused on the trend in the data rather than relying on patterns in the numerical values to make predictions. With this in mind, the decision was made to extend the extrapolation of the lava flow data from Interval 6 to Interval 10. Interval 10 was chosen so that there was a large distance between the recorded data and the extrapolated value. The purpose of this strategy was to ensure students focused on using the trendline when extrapolating the data and to minimise the risk of students being distracted by numerical patterns, which may have come to the fore if asked to extrapolate the data to Interval 7 or Interval 8.

Purpose of activities

For both activities, discussion took place about the need to model real-world situations, with examples such as solving a criminal case that requires evidence about a suspicious substance, or estimating the time required to escape from an erupting volcano. The purpose of such activities was to create an awareness of the link between the science topic, in this case viscosity, and the power of data to answer meaningful questions. The activities were designed to provide the opportunity for students to make informal inferences from the evidence gathered. When implementing the activities, the language of interpolation and extrapolation was not used with the students. Hence, the students were answering two “classroom research questions”:

  • What is the concentration of a mystery solution given data from five different concentrations?

  • What is the length of time it would take “lava” to flow past 10 distance intervals having collected data for six intervals?

Although classroom instruction and discussion also took place concerning the science topics of viscosity and lava flow from volcanoes, this report explores not only the students’ results for their research questions, but also the transition from hand-drawn representations to the students’ use of the software TinkerPlots™ to provide the estimated data values to answer the first question. For the viscosity activities, the Year 5 students were:

  • given TinkerPlots™ files on their USB memory devices with the variable names entered (see Figs. 9 and 11), to enter the data for their groups’ trials, first for the Mystery Solution data and later for the lava flow data, each time creating a plot of the data to answer the question related to the activity; and

  • shown the “pencil” tool in TinkerPlots™ to draw a trendline fitting the data they plotted and how to create reference lines.

Data analysis

In keeping with the pragmatist paradigm adopted for this research, a deductive data analysis process was implemented to explore students’ understanding (Mackenzie & Knipe, 2006). To conduct the analysis, coding rubrics (Tables 1 and 2) were developed from the SOLO models presented in Figs. 3, 4, and 6. The coding rubrics included descriptions of the elements required to demonstrate achievement of the Concept of Graph with one attribute, then the Concept of Graph for Multiple Attributes, and finally Informal Inference for Graphs. They also incorporated the requirements of the activities arising from the Science Curriculum context being explored (ACARA, 2020). In accordance with deductive analysis, the coding rubrics were applied as set out and not altered in light of the data.

Table 1 SOLO levels for hand-drawn representations for Mystery Solution activity
Table 2 Levels for TinkerPlots™ representations for Mystery Solution concentration and Lava Flow prediction

Each viscosity activity included two numerical attributes: Concentration and Distance for the Mystery Solution activity, and Interval and Time for the Lava Flow activity. Hence, the Concept of Graph for Multiple Attributes (Watson & Fitzallen, 2010; cf. Fig. 4) was required to create a graph that could answer the questions. As some students had difficulty with this, particularly with the hand-drawn representations, the basic Concept of Graph (cf. Fig. 3) was used to characterise some responses. Because the Concept of Graph was itself an element of the model for multiple attributes, the R1 level of the Concept of Graph was redundant whenever a response went further and included a complete graph. Also, for the hand-drawn representations, the students did not have access to the reference lines introduced with TinkerPlots™. Hence, reaching the R3 level for informal inference with a hand-drawn graph required somehow adding the mystery solution data to the plot and using them, together with the available data, to predict the concentration of the mystery solution. Table 1 contains the possible SOLO levels for the hand-drawn representations.

For the TinkerPlots™ representations for both activities, students began at the multistructural level of the Concept of Graph for Multiple Attributes (M2, Fig. 4), because the software supplied the two-dimensional framework together with the data. Also, because it was possible to confuse the placement of the reference lines, a response could be unistructural for Informal Inference for Graphs (U3, Fig. 6), for example, if the reference line was not moved from its original placement. Table 2 summarises these descriptions.

There were 22 hand-drawn representations for the Mystery Solution activity from the first class. These were sorted by the first author according to the descriptions of the SOLO levels in Table 1. These categories were then reviewed by the third author, again based on the SOLO levels. Agreement was 82%, with four plots reallocated after negotiation. These levels of development were used to answer Research Question 1.

Of the 51 students, 35 completed both activities, whereas eight completed only the Mystery Solution activity and eight completed only the Lava Flow activity. Hence, there were 86 TinkerPlots™ graphs to be coded. The most advanced TinkerPlots™ graph created and saved by each student for each activity was used for further analysis. The graphs were printed and sorted by the first author into categories with reference to the descriptions of the five SOLO levels in Fig. 6. Again, the third author reviewed the coding to confirm the levels allocated. Agreement on the categorisation was 96.5%, with three of the graphs reallocated after negotiation. This analysis was used to answer Research Questions 2 and 4. A two-way table was then created comparing the SOLO levels resulting from Research Questions 1 and 2, to examine the support provided by TinkerPlots™ (Research Question 3).
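As an arithmetic check, the graph count and both agreement figures can be reproduced in a few lines of Python, assuming agreement was calculated as simple percentage agreement, that is, the proportion of graphs whose initial coding was not reallocated. This is an inference from the reported numbers rather than a description of the authors’ procedure; simple percentage agreement also does not adjust for agreement expected by chance. The function name `percent_agreement` is illustrative.

```python
def percent_agreement(total, reallocated):
    """Simple percentage agreement: the share of coded items whose first
    coding was confirmed by the second coder (i.e. not reallocated)."""
    return 100 * (total - reallocated) / total


# Graph counts reported in the text.
both, mystery_only, lava_only = 35, 8, 8
tinkerplots_graphs = 2 * both + mystery_only + lava_only  # 86

print(round(percent_agreement(22, 4), 1))                  # 81.8 (reported as 82%)
print(round(percent_agreement(tinkerplots_graphs, 3), 1))  # 96.5
```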

Results

Research question 1: how can students’ creations of hand-drawn representations for the interpolation activity be characterised by the SOLO models for graph creation?

Of the 22 hand-drawn representations, only five could have been used to suggest the concentration of the mystery solution, and two actually did so. A summary of the SOLO levels is shown in Table 3 with examples presented in Fig. 13.

Table 3 Number of graphs at each SOLO level for the hand-drawn representations (n = 22)
Fig. 13

Examples of hand-drawn representations at each SOLO level

The hand-drawn representations in Fig. 13 illustrate that at the U1 level, only data are shown; at the M1 level, there are data, attribute, and variation, but no scale; and at the U2 level, the basic concept of graph is present but only one attribute can be identified. At the M2 level, the graph does not include 2-D scaling; at the R2 level, a complete graph is presented for the two attributes, but no further information is seen; at the M3 level, the mystery solution data for the time are present in a complete graph (on the right) but they are not linked to a suggested concentration; and at the R3 level, the mystery solution data are linked to one of the five concentrations, which have identifying “horizontal” lines to connect the data values for each time.

Research question 2: how can students’ creations of TinkerPlots™ representations for the interpolation activity be characterised by the SOLO models for graph creation?

Initially, all 43 students who engaged with the Mystery Solution interpolation activity were considered. Table 4 indicates the number of responses in each category as described in Table 2. All students began at M2 given the support of TinkerPlots™.

Table 4 Number of TinkerPlots™ graphs at each SOLO level for Mystery activity (n = 43)

Figure 14 shows that at the M2 level, the data have only been plotted in TinkerPlots™; at the R2 level, a trendline has been added with the pencil tool; and at the U3 level, vertical reference lines have been included but not moved to cross the horizontal line at the trendline (13.2 was the default value for the initial placement in TinkerPlots™).

Fig. 14

Incomplete representations for Mystery Solution data

Although the plots at the M3 level included the hand-drawn trendline and the horizontal and vertical reference lines, at times the intersection was close to, but not exactly on, the trendline, or the reference line crossed the trendline twice owing to atypical data collection or placement of the line of best fit. Examples are seen in Fig. 15.

Fig. 15

Atypical representations for Mystery Solution data

At the R3 level, responses were considered to have achieved the expected goal of moving the vertical reference line to the point where the horizontal reference line intersected the trendline, in order to determine the concentration of the mystery solution, as seen in Fig. 16. Overall, just over half of the students were considered to have been successful in achieving the aim of the Mystery Solution activity in relation to using a graphical interpolation method in TinkerPlots™.

Fig. 16

Complete representations to predict concentration for Mystery Solution data
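Because the coding was deductive, the level descriptions for the TinkerPlots™ graphs can be read as an ordered checklist. The Python sketch below is a simplified, hypothetical encoding of the descriptions given in the text for M2 to R3: the feature names are invented for illustration, it does not reproduce Table 2, and features such as “close to the trendline” were judged by the coders rather than computed.

```python
from dataclasses import dataclass


@dataclass
class TinkerPlotsGraph:
    """Hypothetical features a coder might note for one saved graph."""
    trendline: bool             # a trendline has been drawn with the pencil tool
    reference_lines: bool       # horizontal and vertical reference lines are present
    meet_near_trendline: bool   # the reference lines intersect on or close to the trendline
    exact_single_reading: bool  # the intersection is on the trendline and gives one value


def solo_level(g: TinkerPlotsGraph) -> str:
    """Apply the level descriptions in a fixed order. Every graph starts at M2
    because TinkerPlots supplies the 2-D framework with the data."""
    if not g.trendline:
        return "M2"  # data plotted only
    if not g.reference_lines:
        return "R2"  # trendline added, no reference lines
    if not g.meet_near_trendline:
        return "U3"  # reference lines present but not placed at the trendline
    if not g.exact_single_reading:
        return "M3"  # near miss, double crossing, or other idiosyncrasy
    return "R3"      # reading taken from the trendline as intended


print(solo_level(TinkerPlotsGraph(trendline=True, reference_lines=True,
                                  meet_near_trendline=True,
                                  exact_single_reading=False)))  # M3
```

Checking the features in a fixed order mirrors the nesting of the SOLO cycles: a graph cannot be coded at a level unless the elements of the lower levels are already present.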

Research question 3: for students who completed hand-drawn and TinkerPlots™ representations for the interpolation activity, what does the relationship between the SOLO levels reached suggest about the additional support provided by the software?

This question was considered in order to report on the support provided by TinkerPlots™ for completing the task. Five of the students who completed the hand-drawn representation for the Mystery Solution activity did not create a plot using TinkerPlots™. The SOLO levels for the 17 students who completed both representations are shown in Table 5. Although only six students achieved R3 level representations with TinkerPlots™, all except one produced a higher-level representation in TinkerPlots™ than with a hand-drawn graph. Four students who could not complete a basic graph by hand (U1/M1) were able, in TinkerPlots™, to adapt a basic graph in some way or at least add a trendline. Also, four students who were able to create a trend feature by hand were able to move further toward solving the mystery.

Table 5 Number of graphs for Mystery Solution activity in each SOLO category for hand-drawn and TinkerPlots™ representations (n = 17)
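The two-way comparison used for Research Question 3 can be sketched as a cross-tabulation of paired levels. In this Python example the levels are ordered U1 < M1 < R1 < U2 < M2 < R2 < U3 < M3 < R3, following the three cycles of the SOLO models, and the four student pairs are hypothetical; they are not the data in Table 5. The function names are illustrative.

```python
from collections import Counter

# SOLO levels in the concrete symbolic mode, ordered across the three cycles.
LEVELS = ["U1", "M1", "R1", "U2", "M2", "R2", "U3", "M3", "R3"]
RANK = {level: i for i, level in enumerate(LEVELS)}


def cross_tab(pairs):
    """Count (hand-drawn level, TinkerPlots level) pairs: the cells of a two-way table."""
    return Counter(pairs)


def count_higher_with_software(pairs):
    """Number of students whose TinkerPlots graph was coded above their hand-drawn graph."""
    return sum(RANK[tp] > RANK[hand] for hand, tp in pairs)


# Hypothetical (hand-drawn, TinkerPlots) pairs for four students; not the data in Table 5.
pairs = [("U1", "R2"), ("M2", "M3"), ("R3", "R3"), ("M1", "R2")]
print(cross_tab(pairs)[("U1", "R2")], count_higher_with_software(pairs))  # 1 3
```

Cells above the diagonal of such a table (TinkerPlots™ level ranked higher than the hand-drawn level) correspond to students for whom the software appeared to provide additional support.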

Research question 4: how can students’ creations of TinkerPlots™ representations for the extrapolation activity be characterised by the SOLO models for graph creation?

The categorisation of responses for the 43 students who completed the Lava Flow activity trials is shown in Table 6, again based on the SOLO levels in Table 2. All students began at M2 given the support of TinkerPlots™.

Table 6 Number of TinkerPlots™ graphs at each SOLO level for Lava Flow activity (n = 43)

Examples of the first three incomplete levels are shown in Fig. 17. As with the Mystery Solution data, at the M2 level, only the data are plotted; at the R2 level, only a trendline has been added; and at the U3 level, the trendline does not reach the intersection of the reference lines.

Fig. 17

Incomplete representations for Lava Flow data

At the M3 level, the idiosyncrasies generally related to the line of best fit, for example, one drawn along the edge rather than through the centre of the data collected. Two of these plots also switched the axes. Examples are seen in Fig. 18.

Fig. 18

Idiosyncratic representations for Lava Flow data

Twenty-two students created TinkerPlots™ graphs judged appropriate for the task of predicting the time required for the “lava” to reach the 10th interval from the top of the volcano. Of these, three students (with different data sets) declared that there were outliers in their data sets, and four of the plots switched the axes from the expected representation. Examples are seen in Fig. 19. Hence, together with the M3 responses, approximately 70% of students were considered to have been successful in appreciating the task of predicting how much time it would take for the lava to reach 10 intervals using a graphical extrapolation method in TinkerPlots™, although some responses were idiosyncratic or lacked accuracy.

Fig. 19

Complete representations to predict lava reaching the 10th section

Discussion

The increasing complexity of graph creation, interpretation, and then use for decision-making, even when facilitated by software support for 2-D scaling, presents cognitive challenges for primary students in combining all of the levels for problem-solving, in this case making an informal inference from data in context. This is indeed one of the challenges noted by Bentley et al. (2022) for STEM education. The extension of the SOLO model of developing cognition, however, is useful in describing the elements involved in the increasingly complex constructions required to solve problems in the CS mode.

Although the theoretical mathematical demands of interpolation and extrapolation for nonlinear relationships require formal-mode SOLO level mathematics, the introduction of technology, in this study TinkerPlots™, enabled students to appreciate the context in a manner that is possible in the CS mode. The specific descriptions of the levels, (i) first constructing a graph for one variable (attribute) (U1, M1, R1); (ii) then extending it to two variables (U2, M2, R2), including elements that are meaningful in the 2-D context such as trend; and (iii) finally extending to informal inference, again including elements necessary for this purpose such as reference lines (U3, M3, R3), are helpful in appreciating and documenting the progress students are making. The representations produced in Fig. 13 and the relationship shown in Table 5 between the SOLO levels for the hand-drawn and TinkerPlots™ graphs leave no doubt that the support for 2-D representations provided by TinkerPlots™ was instrumental in students achieving the Concept of Graph for Multiple Attributes (Fig. 4) and making informal inferences for the data they represented (Fig. 6). These examples provide further evidence of the affordances featured in Watson and Fitzallen’s (2016) review of statistical software and of the potential for a wide spectrum of learning opportunities.

The Mystery Solution and Lava Flow activities illustrated how statistical concepts can be introduced qualitatively before the sophisticated mathematical techniques that provide exact descriptions of the outcomes are available, much as happens with determining correlations and rates of change (e.g. Casey, 2015) and with the standard deviation. Today, students learn about variation years before they learn the formula for the standard deviation (Shaughnessy, 1997). If students are exposed to the ideas of interpolation and extrapolation in meaningful contexts, even without the mathematical names for the concepts, this will provide a meaningful background for when they meet numerical analysis algorithms later. Imagine the joy of being able to tell a teacher or lecturer, “This is what we did in Grade 5!” Those who do not continue with advanced mathematics will at least have encountered contexts that may help them make meaning of the claims of “experts” and graphs presented in the media.

Reflecting on Curcio’s (1989) and Shaughnessy’s (2007) suggestions about reading “beyond” and “behind” the data brings us back to the place of these activities in the STEM curriculum, in this case through viscosity in science. Curcio’s “beyond the data” provides the estimated value for the context. Shaughnessy’s “behind the data”, in looking for the cause of the variation affecting the estimated value, links back to the Science Curriculum (ACARA, 2020) and the learning related to the viscosity of various substances. As questions in context are the foundation of the practice of statistics (Bargagliotti et al., 2020; Watson et al., 2018), viscosity provides a fascinating topic for embedding statistics as an integral part of STEM education (Watson et al., 2020a).

It is clear from the graphs presented that some of the data collected in the classes did not support the expected trend for the tasks set. Discussion in the classroom throughout data collection is needed to make students aware of both the need for extreme care when conducting scientific experiments and the variation that is likely to occur. In addition, the purpose of fitting a continuous curve to the data needs to be reinforced, to ensure that students move on from joining data values with a dot-to-dot strategy to drawing lines that represent the trends in the data more generally (Fitzallen, 2012; Groth et al., 2018).

The students’ somewhat better performance on the Lava Flow activity than on the Mystery Solution activity (22 compared with 16 R3 responses) could be attributed to several circumstances. Students may have learned aspects of the use of TinkerPlots™, and the purpose of the trendlines and the reference lines, from the first activity, which assisted them in creating the graphs for the second activity. The conceptual difficulty of the first task (interpolation) may have been greater than that of the second (extrapolation). The context of the second activity, volcanoes, may also have been of more interest to the students in terms of imagining the outcome for villagers escaping the lava. Further research with primary and middle school students should help explore these possibilities.

This exploratory study has extended the research of Ainley et al. (2001) in particular, and also that of others who have more recently looked at correlation and its representation in finding linear trends (e.g. Groth et al., 2018). It has illustrated the value of using software such as TinkerPlots™, which is accessible to primary school students (Watson & Fitzallen, 2016), to explore data that would otherwise be overly tedious to represent graphically by hand, especially without the support of 2-D scaling. There are times, however, when allowing students to demonstrate their creativity and emerging understanding of statistical concepts via hand-drawn graphs is most appropriate. Which strategy to use in the classroom will depend on the intended learning outcomes, the need for accuracy in the representation of scale, the need for all students to use the same graph type for comparison purposes, or the need to combine data from a class.

Conclusion and implications

The research reported in this paper illustrates that the curriculum does not need to wait until senior secondary or tertiary levels to introduce the more sophisticated application of interpolation and extrapolation in meaningful data contexts across the STEM curriculum. This responds to some extent to Roehrig et al.’s (2021) concern about the potential misalignment of mathematics and science in supporting STEM activities, although it was not clear in their study that statistics was of any interest. The recent version of the Australian Curriculum: Mathematics v.9.0 (ACARA, 2022) includes a Year 5 content descriptor about interpreting line graphs of time series and discussing the relationship represented. This applies to the Lava Flow activity. The “interpolation” in the Mystery Solution activity, however, depends on interpreting a context in which time is not one of the plotted variables, so it is necessary to understand the relationship between the two variables on the axes. Even so, the mathematics curriculum still has some way to go in supporting the interpretation of scatterplots, which are still not mentioned until Year 10 in the new curriculum (ACARA, 2022). The suggested use of digital technology for recording data from Year 2 would appear to provide an environment (e.g. TinkerPlots™) in which correlation could be studied intuitively with scatterplots much earlier (e.g. Fitzallen, 2012). The scaffolding provided by the technology allows young students to study relationships between two numerical variables accurately represented on continuous scales.

This research has also demonstrated the way in which data analysis software, in this case TinkerPlots™, gives students the capacity to develop intuitions about statistical concepts not currently expected at the primary level of schooling. Although it is important to give young students the opportunity to create hand-drawn graphical representations that reflect and foster creativity (Watson et al., 2020b), there are situations where standardised, conventional graphical representations are important for analysing the data (Fitzallen, 2012). For such purposes, data analysis software becomes vital because it applies scale accurately, which is necessary for activities associated with interpolation and extrapolation. Like Nagle et al.’s (2017) use of piano wire as a “straight edge” for drawing lines of best fit, intended to stop students joining data points in a dot-to-dot fashion, TinkerPlots™ scaffolded student learning by ensuring the correct graph type was used, and its tools (e.g. reference lines) facilitated the extraction of accurate values from the graphs. Students were given the freedom to draw the trendlines using the pencil tool, and at times these were constructed in a dot-to-dot fashion. For the purposes of this study, accuracy of the trendline was not vital because it was not being used to determine the linear relationship in algebraic terms. The decision to allow students to add hand-drawn trendlines was made to elicit students’ varying understanding of the trend as a generalisation of the data. It is, therefore, incumbent on teachers to be able to discern when student learning outcomes are best served by using data analysis software to construct graphical representations from data instead of the hand-drawn alternatives.
It is not always practical or productive for students to analyse graphical representations that do not have the accuracy of scale and representation afforded by data analysis software. The curriculum needs to support teachers in this regard.

This study took place at a level that dealt with less sophisticated science contexts than those in the science and engineering programs studied by Roehrig et al. (2021). Those authors, however, did not consider statistics as part of the mathematics curriculum relevant to the STEM contexts reviewed. The fact that technology allows statistical issues to be considered at the primary school level challenges to some extent Roehrig et al.’s lack of focus on the T in STEM and their narrow view of the field of mathematics. Acknowledgement of the importance of cross-curricular opportunities in STEM for younger students needs to include both of these areas, technology and statistics. With viscosity as a topic in the Year 5 Science Curriculum (ACARA, 2020), the opportunity arises to link to complex mathematical ideas through the use of intuitive technology that allows students to begin making informal inferences.

In relation to Bentley et al.’s (2022) questioning of the adoption of inquiry-based approaches, they note that “STEM learning tasks often already require complex conceptual understanding” and “[i]f these understandings are not automated, the learner may be under excessive cognitive load before the task has commenced …” (p. 5). This study provides an example in which the context of the students’ investigations made the statistical concepts accessible, which is in line with one of Bentley et al.’s summary points:

Critical with the reforming of STEM education is the further investigation of problem- or inquiry-based learning with current STEM education practices. The importance of reviewing and adopting pedagogical approaches that are sympathetic to new cognitive understanding and cater for the cognitive load implications created by STEM learning may provide future breakthroughs to improve not only the current pedagogical approaches but also the current pedagogical teaching models. (p. 8)