Introduction

Well before the introduction of Chance and Data as a separate strand in A National Statement on Mathematics for Australian Schools in 1991 (Australian Education Council (AEC), 1991), researchers had begun to explore students’ understanding of stochastic concepts (e.g. Green, 1986). Most of the research was focused on probability (e.g. Falk, 1983; Fischbein, 1975; Fischbein & Gazit, 1984), which is not surprising given that theoretical probability was a pure mathematics topic in high schools. In Australia, Truran (1985) reported clinical interviews from the early 1980s with nine students in Years 4 to 10, related to the symmetry of dice and coins, exposing a range of unconventional beliefs about outcomes when tossing them.

Prior to 1991, the curriculum was not standardised across Australia, and how statistics and probability were addressed in school was left to individual states’ curriculum bodies. After the formal inclusion of both statistics and probability in the National Council of Teachers of Mathematics ([NCTM], 1989) Standards, followed by Australia (AEC, 1991) and New Zealand (Ministry of Education [MoE], 1992), specific recommendations for research began to be identified, including at an International Statistical Institute Round Table in 1992 (Begg, 1993; Green, 1993), in an NCTM Handbook on Research (Shaughnessy, 1992), and in Australia (Watson, 1992). The International Association for Statistical Education (IASE) was founded in 1993 and initiated meetings, mini-conferences, and special interest groups (SIGs), including a SIG devoted to research in statistics education (Phillips, 2002). The focus of this paper is the research undertaken by Australian and New Zealand researchers that arose from this international interest in statistics education research at the school level during the early 1990s.

This paper aims to track the breadth of research into statistics and probability at the school level. It has a focus on works by Australian and New Zealand researchers and does not seek to be a complete review of the whole international field. After providing some background, the paper is organised around types of research including small-scale clinical interviews, classroom studies including teaching experiments and observations, studies based on surveys and questionnaires, and studies where the focus is on teachers as well as students. The emphasis of these studies includes the development of statistical concepts; teaching approaches and their impact on learning; in-service and pre-service teacher development; assessment; longitudinal change; development of models to describe cognitive aspects of learning; teaching experiments with specific goals; and continuing impact of changes in technologies.

Approach

Several criteria were applied to the choice of articles for this review.

  1. Authorship. Although papers from international collaborations were included, the author list of any paper had to contain Australian or New Zealand researchers, even if these people were based elsewhere.

  2. Scope. All papers had to address research within or related to school contexts. This restriction meant that many papers addressing first year university statistics or bridging courses were excluded from the review.

  3. Type of paper. The papers had to address findings from research studies. Many papers have been published in professional journals (e.g. journals published by the Australian Association of Mathematics Teachers [AAMT]) to provide practical help for teachers, including activities for classrooms, but these were generally excluded, other than where they were needed to verify a resource used in research or to provide context.

  4. Date. The date range was predominantly after 1990, when interest grew in statistics education research at the school level.

  5. Sources. In addition to journal articles, peer-reviewed conference papers were also included, especially from the International Conference on Teaching Statistics (ICOTS), as were book chapters, notably those arising from IASE gatherings, and occasionally media articles that had a research base.

The authors collaboratively developed a list of both authors and sources as a starting point. They also contacted researchers about recent work. Finally, a search was undertaken of Scopus-indexed papers using the search term “statistics education”, restricted to Australian and New Zealand researchers, together with a review of every issue of the Statistics Education Research Journal, the major journal in the field. This final search yielded four additional papers.

It proved difficult to organise the material chronologically because many different threads developed in parallel, so the decision was made to categorise papers methodologically, although it is acknowledged that many studies used a variety of approaches and hence could appear in more than one category. In this situation, the authors used their professional judgement to place the relevant citations in appropriate places in the paper. This paper is organised around the following: Interview and small-scale survey studies; Frameworks for classroom studies; Classroom studies; Large-scale studies; Studies involving teachers; and Supporting statistics in schools.

The next section provides a brief background.

Background to the inclusion of statistics in the school curriculum

In New Zealand, interest in the inclusion of statistics in the school mathematics curriculum grew from a genuine concern of research professors of statistics to transform the introductory teaching of the subject from the purely mathematical to the use of practical experimentation (Jowett & Davies, 1960). When the school mathematics curriculum was being revised in the early 1990s, this view underpinned the planning for the statistics component of Mathematics in the New Zealand Curriculum (MoE, 1992). Significant in relation to this approach was Vere-Jones (1990), representing those who were “involved in applying statistics to real world projects and advocated hands-on ‘playing with data’ in the statistics classroom at all levels” (Forbes, 2014, p. 7). This involvement of professional statisticians was important in shaping the approach to teaching in New Zealand and led to the renaming of the Mathematics curriculum to Mathematics and Statistics (MoE, 2007). In Australia, there was less involvement from statisticians, and this difference is reflected in the different ways in which the research agenda developed in the two countries. New Zealand researchers made a significant contribution to the field with the development of the Problem, Plan, Data, Analysis, and Conclusion (PPDAC) model (Wild & Pfannkuch, 1999). In Australia, in contrast, a major contribution was the development of the Statistical Literacy Hierarchy (Watson & Callingham, 2003) and the identification of appropriate goals for statistical literacy at the school level (Watson, 2006). The research agenda began with initial small-scale studies, and these are discussed in the next section.

Interview and small-scale survey studies

Much of the early statistics education research involved in-depth interviews with sample sizes of fewer than 100. Most of this work had a focus on specific concepts such as chance measurement (Watson & Moritz, 1998; Watson et al., 1997), bar graphs (Watson & Moritz, 1999a), comparing groups (Watson & Moritz, 1999b), average (Watson & Moritz, 1999c, 2000a), variation (Torok & Watson, 2000), conjunction and conditional events (Watson & Moritz, 2002), and fairness of dice (Watson & Moritz, 2003). Many of these studies (e.g. Watson & Moritz, 1998, 1999b, c, 2000a, 2003; Watson et al., 1997) used the Structure of Observed Learning Outcomes (SOLO) model of Biggs and Collis (1982, 1991) to develop frameworks for describing students’ understanding. The SOLO model is based on five modes of cognitive development: sensorimotor, ikonic, concrete symbolic, formal, and post-formal. Of most interest at the school level are the ikonic (IK) and concrete symbolic (CS) modes. Within each mode, there are three levels of response, depending on how the elements required for problem solving are employed and combined: Unistructural (U) responses use a single element from the context; Multistructural (M) responses use two or more elements in sequence; and Relational (R) responses include links or relationships among the multiple elements. These three levels of response form a cycle, the U-M-R cycle, within each mode.
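To give a concrete sense of how such SOLO coding operates, the following Python sketch encodes the U-M-R cycle as a toy classification rule based on how many elements a response uses and whether it links them. The function and its inputs are hypothetical illustrations, not the coding manuals published in the cited studies.

```python
from enum import Enum

class SoloLevel(Enum):
    """Levels of the U-M-R cycle within a SOLO mode."""
    UNISTRUCTURAL = "U"    # a single relevant element used
    MULTISTRUCTURAL = "M"  # several elements used in sequence
    RELATIONAL = "R"       # elements linked into a relationship

def code_response(elements_used: int, elements_linked: bool) -> SoloLevel:
    """Toy coding rule (hypothetical, not a published coding scheme)."""
    if elements_used <= 1:
        return SoloLevel.UNISTRUCTURAL
    if not elements_linked:
        return SoloLevel.MULTISTRUCTURAL
    return SoloLevel.RELATIONAL

# A response citing both sample size and spread, and relating them,
# would be coded Relational under this toy rule:
print(code_response(elements_used=2, elements_linked=True))  # SoloLevel.RELATIONAL
```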

Jones et al. (1997, 1999, 2000) interviewed eight Year 3 students and four students from each of Years 1 to 5, and used the SOLO model to develop a framework for probabilistic thinking. Following this study, Jones et al. (2000) hypothesised a framework for characterising children’s statistical thinking more broadly with four components—describing data displays, organising and reducing data, representing data, and analysing and interpreting data—each having four levels of response based on SOLO. This framework was validated and slightly modified following interviews with 20 students across Grades 1 to 5 in the USA. Nisbet (2002; Nisbet et al., 2003) used this framework to consider the differences and difficulties that students had representing categorical and numerical data, finding that numerical data were more difficult for students in primary years to organise.

Extensions over the years have added to the usefulness of the SOLO model by acknowledging multimodal functioning (Biggs & Collis, 1991; Collis & Romberg, 1991), where there is interaction between two modes and there can be U-M-R progressions in each. For school students, this is most likely related to the IK and CS modes, with observations in particular of IK support (often visual) for CS responses (e.g. Watson & Collis, 1994). More recently, working with students’ responses to chance questions, Groth et al. (2021) also found it useful to explore two distinct U-M-R progressions within the IK mode, depending on whether the response (or support) was within the context of the statistical response expected in the CS mode (normative compatible) or not (normative incompatible). Both studies (Groth et al., 2021; Watson & Collis, 1994) concerned interviewees’ explanations of the outcomes of chance events. For example, it was useful to distinguish IK support that advanced a CS response, such as discussing how a coin is tossed, from support that did not advance statistical understanding, such as expressing a superstition about the outcome. The SOLO model and its adaptations have continued to be useful in analysing the learning outcomes of students undertaking statistical investigations.

The introduction of technology into classrooms has also impacted on this type of research. As the use of software has become more common in classrooms, it has been possible to base interviews on students’ interactions with technology. For example, Prodromou (2011) reported on an interview with two Year 8 students based on interaction with TinkerPlots (Konold & Miller, 2015) as the students worked their way through three data sets based on the weight of backpacks, then adding gender, followed by year level. Implications included the students’ appreciation of variation in the data and their confidence in informal inferences with increasing sample size. As part of a design experiment based on four lessons with a Year 7 class employing TinkerPlots, including the use of hat plots (Watson & Donne, 2008), the SOLO model was employed across responses in the lessons for 15 students, and 12 students were subsequently interviewed. Not all of the students used hat plots—a representation produced by the software to show the “middle 50%” of the data rather than the more complex box-and-whisker plot—preferring to use “bins” or arbitrary reference lines, and the SOLO levels of responses ranged from unistructural to relational. These 12 students, and another 12 Year 5/6 students who had taken part in activities with TinkerPlots for four weeks, were interviewed with the “comparing groups” protocol used by Watson and Moritz (1999b). This study considered differences between being presented with hard-copy graphical representations and the same data sets in TinkerPlots. Students were given freedom to create whatever graphical representations they desired to answer a question about which of two groups had done better on a quiz of “quick recall of maths facts” (Watson & Donne, 2009). The students carried out the comparison of groups four times with different sized groups. Analysis took longer for the students using TinkerPlots, because they tried different possibilities with the software before making a decision, but the advantage was that they could create something meaningful to them rather than having to interpret a fixed graphical form presented on paper. This and other affordances of the software are reported from research around the world in Watson and Fitzallen (2016). These affordances include value plots as well as dot plots, a ruler tool to show differences of data values from the mean, bins for grouping data, and dividers for highlighting differences in distributions.
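As a minimal illustration of what a hat plot displays, the Python sketch below (with hypothetical quiz scores) computes the quartiles that bound the “middle 50%” crown and the full-range brim; TinkerPlots performs equivalent calculations internally, so this is only a stand-in for the software’s behaviour.

```python
import numpy as np

def hat_plot_summary(data):
    """Compute the regions a TinkerPlots-style hat plot displays:
    the 'crown' covers the middle 50% of the data (Q1 to Q3),
    and the 'brim' extends over the full range of the data."""
    q1, q3 = np.percentile(data, [25, 75])
    return {"brim": (min(data), max(data)), "crown": (q1, q3)}

scores = [4, 7, 9, 10, 12, 13, 15, 18, 21, 25]  # hypothetical quiz scores
print(hat_plot_summary(scores))
```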

Callingham (2011), in the context of assessment, pointed out that technology use led to a wider range of representations being available (Lesh, 2000), such as heat maps, “hat” plots, and complex nested tables. Technology use also provided opportunities for innovative approaches to the assessment of statistical understanding. For example, students could produce a PowerPoint presentation demonstrating understanding of statistical or probabilistic concepts, or, in computer-based assessment, use drag-and-drop interactions. In particular, technology use reduces the cognitive demands associated with producing routine representations, such as some types of graph, and allows for an emphasis on interpretation and communication. Fitzallen (2006, 2008) developed a theoretical framework specifically to address graphing in a technology environment, together with a traditional pen-and-paper assessment instrument. More than half the students involved in the trial of the assessment tool indicated that they had not used technology to create graphs. The respondents also had difficulty drawing inferences from graphs or determining trends in the data. Prodromou (2015) provided a review of the research on the use of technology in statistics based on the Technological Pedagogical Content Knowledge framework (Mishra & Koehler, 2006). She concluded that technology environments that allowed students to interact with the data gave opportunities to draw deeper connections between real-life situations and statistical problem-solving activities in the classroom.

Frameworks for classroom studies

Closely linked to the interview research were various studies conducted in actual classroom settings. As with the interviews, the early studies focussed on specific aspects of statistics and probability. As the research moved into classroom settings however, study outcomes broadened as students began to draw on wider knowledge bases. Constructs such as “statistical literacy” (e.g. Wallman, 1993) and the “practice of statistics” (Moore & McCabe, 1989) began to emerge in the classroom context, emphasising the contextual nature of statistics, in contrast to the mathematical bases of probability that were the focus in early studies. This led to new frameworks being developed that guided researchers.

Wallman (1993) set the scene for statistical literacy with the statement:

“Statistical Literacy” is the ability to understand and critically evaluate statistical results that permeate our daily lives–coupled with the ability to appreciate the contributions that statistical thinking can make in public and private, professional and personal decisions. (p. 1)

Following this, Watson (1997a) provided one of the earliest frameworks for statistical literacy with a three-tiered model: (1) understanding basic statistical terminology; (2) understanding that terminology in social contexts; and (3) questioning statistical claims made in context. The examples provided were from authentic media stories presenting claims based on limited information and data. Moving from the adult media to preparing students during their school years, Watson (2006) presented suggestions based on previous research related to the following: Sampling, a good start; Graphs, how best to represent data; Average, what does it tell us?; Chance, precursor to probability; Beginning Inference, supporting a conclusion; and Variation, the underlying phenomenon.

Coming from the perspective of the actual practice of statistics as employed by their applied statistician colleagues, Wild and Pfannkuch (1999) introduced the PPDAC model of statistical thinking for empirical enquiry. They suggested that the investigative cycle includes Problem, Plan, Data, Analysis, and Conclusion (PPDAC). Franklin et al. (2007) adapted the New Zealand model in the American Statistical Association’s (ASA) GAISE Report, with four steps for statistical problem-solving that acknowledged the importance of variation at every step. Their stages were as follows (a code sketch of the cycle follows the list):

  • Formulate question/s, anticipating variability:

    • Clarify the problem at hand

    • Formulate one (or more) questions that can be answered with data

  • Collect data, designing for variability:

    • Design a plan to collect appropriate data

    • Employ the plan to collect the data

  • Analyse data, accounting for variability:

    • Select appropriate graphical and numerical methods

    • Use these methods to analyse the data

  • Interpret results, allowing for variability:

    • Interpret the analysis

    • Relate the interpretation to the original question
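The cycle can be made concrete with a toy investigation. The Python sketch below walks through the four GAISE steps using a hypothetical backpack-weight question and simulated data; it is illustrative only and is not drawn from any of the cited studies.

```python
import random
import statistics

# 1. Formulate a question (anticipating variability):
#    "Do Year 5 students typically carry backpacks heavier than 3 kg?"

# 2. Collect data (designing for variability): here we simulate a class
#    sample; a real plan would specify who is measured and how.
random.seed(1)
weights = [round(random.gauss(3.2, 0.8), 1) for _ in range(30)]  # kg, hypothetical

# 3. Analyse data (accounting for variability): summaries chosen to
#    show both centre and spread.
centre = statistics.median(weights)
quartiles = statistics.quantiles(weights, n=4)

# 4. Interpret results (allowing for variability): relate back to the question.
print(f"median = {centre} kg, quartiles = {quartiles}")
print("Typical backpacks exceed 3 kg" if centre > 3
      else "Typical backpacks are at most 3 kg")
```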

Makar and Rubin (2009), in working with primary children, focused more generally on what they termed “informal statistical inference” (ISI). Rather than particular steps, their process stressed the importance of dealing with actual data, as would be expected in the primary classroom:

1. Generalization, including predictions, parameter estimates, and conclusions, which extend beyond describing the given data

2. The use of data as evidence for those generalizations

3. Employment of probabilistic language in describing the generalization, including informal reference to levels of certainty about the conclusions drawn (2009, p. 85).

They then presented extensive examples of classroom contexts, dialogues, and student work illustrating these features as a framework: generalisation beyond the data, data as evidence, and probabilistic language. Later, Makar and Rubin (2018) further focussed on five principal components of ISI:

  1. Making claims that extend beyond the data at hand.

  2. Acknowledging the uncertainty inherent in the claim.

  3. Using data as evidence for the claim.

  4. Considering the aggregate (focusing on signal and noise).

  5. Consideration of the context.

This framework has been used by many others working in primary classrooms (e.g. Fielding-Wells, 2018; Pfannkuch et al., 2015; Watson & English, 2018).

The emphasis on statistical literacy (Wallman, 1993) and the practice of statistics (Moore & McCabe, 1989) together impacted on classroom practice. The nature of learning and teaching experiences began to change from a narrow focus on mathematical statistics to students conducting investigations in social contexts of interest to their age and experience. In turn, research shifted to considering whole class experiences as described in the next section.

Classroom studies

Early classroom studies often developed out of interview protocols. For example, Watson et al. (1995) used a set of data cards originally developed for use in interviews. The 16 cards each had information about a different fictitious student. After being trialled in interviews with students in Years 6 and 9, the cards were later used as a basis for group work in two Year 6 classrooms (Watson & Callingham, 1997). The data were analysed using SOLO, and in the group rather than the interview situation, higher level responses were identified as the students sorted the cards and then represented the data to convince others about what they had learned from the data. Using the same data cards approach, Chick and Watson (2001) confirmed the higher order responses obtained from 27 Year 6 students working in groups. Similarly, Pfannkuch and Rubick (2002) used these data cards with twelve 12-year-old students to explore how students construct meaning from data. They identified five issues for consideration: prior knowledge of statistics and context; active representation of the data to make meaning; moving beyond the data using higher-order thinking; linking local and global ideas; and changing the statistical language with different representations.

Reading (2004) employed the SOLO model as adapted by Reading and Shaughnessy (2004) to consider responses by students in Years 7, 9, and 11 before and after a classroom intervention, in the context of using rainfall or temperature data to decide on the best month of the year for an outdoor youth festival. In both of these studies, students had to explore and make decisions about the data, and justify their thinking. The use of SOLO allowed higher order responses to be identified. Pfannkuch (2005) also used SOLO as a framework for analysis of thirty 15-year-old students’ responses to two statistics activities. She provided insights into potential ways in which teaching and assessing statistics might be improved.

English (2010) engaged 6-year-olds in reading picture books including Baxter Brown’s Messy Room, purposefully created by English to provide a context for data modelling (English, 2013). Provided with small, illustrated cut-outs of the various items in the story to sort, the students across several lessons first sorted by type of object, e.g. bones and cans, later putting items into categories they determined themselves, such as recycle, throw away, and compost. Further sessions with a set of 12 distinct objects produced various models for grouping the objects. A summary of English’s data modelling activities in this context across Years 1, 2, 3, and 7 is found in English (2014).

In a small classroom study with 5-year-olds, Kinnear (2018) explored the potential of English’s picture books to create interest in statistical problems. With these young children, data gathering proved most effective with the story of Baxter Brown (English, 2013), a white fluffy dog, who was lost under the mess of his room. Of the three scenarios used, Kinnear speculated that the children could perhaps identify more readily with the familiar context of Baxter Brown’s “messy” room dilemma. These studies also emphasise the importance of the context in which the statistics are embedded.

Oslington et al. (2018) also worked with nine very able Year 1 students. These students developed rules for categorising portraits drawn by other children, and then applied these rules to portraits that they created themselves, identifying different ways of grouping the data (portraits). These studies indicate that very young students can engage with data in meaningful ways.

In a longitudinal study beginning with 46 Year 3 students, Oslington et al. (2020, 2021) considered the development of students’ predictive reasoning strategies, based on a two-way table of maximum daily temperatures for each month of the year for the previous seven years for Sydney, Australia, where the students lived. In Year 3, the students were asked to fill in the missing values for the current temperatures based on the information in the table, to respond to the prompt, “Write down anything you notice about the numbers,” and to construct a representation for the prompt, “Show how the numbers might look on a graph” (Oslington et al., 2020, p. 11). The coding for each of the three requests was based on a developmental structural model with five levels: Pre-structural, Emergent, Partial, Structural, and Advanced Structural. The researchers also considered four data lenses used by the students in their observations: data as pointers, data as case values, data as classifiers, and data as aggregates. Analysis of the use of data lenses by level of structure showed a strong relationship. Further, in Year 4, 44 of these students considered the same questions, with the task of predicting the following year’s maximum monthly temperatures from eight years of data. Across the three tasks, there was more improvement in Year 4 in the number of reasonable predictions of maximum temperatures and in the graphical representations than in the explanations about the numbers.
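One way to make the “data as aggregate” lens concrete is the hypothetical Python sketch below, which builds a small stand-in for such a temperature table and predicts each month’s maximum from the mean across years. The values and the prediction rule are illustrative only, not the students’ actual data or strategies.

```python
import random

# Hypothetical stand-in for the two-way table:
# rows = years, columns = months, entries = maximum temperature (°C).
random.seed(0)
monthly_means = [26, 26, 25, 22, 19, 17, 16, 18, 20, 22, 24, 25]  # Sydney-like
table = [[m + random.uniform(-2, 2) for m in monthly_means] for _ in range(7)]

# An 'aggregate' lens: predict next year's maximum for each month
# from the average across the seven observed years.
predictions = [round(sum(years) / len(years), 1) for years in zip(*table)]
print(predictions)
```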

Much of the research of Fielding-Wells has been based in primary classrooms using an inquiry-based pedagogy. Activities have included children 9–10 years old using evidence when planning investigations of the usefulness of shower timers and choosing the most popular songs for a school disco (Fielding-Wells, 2010), children 7–8 years old arguing about strategies for playing an addition bingo game exposing the equiprobability bias (Fielding-Wells, 2014; Fielding-Wells & Makar, 2015), and children 10–11 years old developing ideas associated with distribution, centre, and variability, and employing technology, while testing two types of catapult planes (Fielding-Wells, 2018). These studies were based on the stages of the ASA statistical problem-solving model (Franklin et al., 2007)—formulating a question, collecting data, analysing data, and interpreting results—and demonstrated the value of this model being reinforced in separate, but linked, components.

More recently, classroom projects have involved schools making multiple-year commitments, allowing students to be followed over time as concepts are developed and monitored. One project, following from English’s (2010) early childhood work on modelling, worked with all students in Year 4 of a school and followed them through to Year 6, with a planned series of activities developing beginning inference through the practice of statistics. This sequence began with problem posing, as students developed multi-part questions about improvements their fellow students would like to see to the school playground (English & Watson, 2015b). Different types of questions, such as “how” and “if,” were created, as were diverse representations of the results. The next focus was on variation, in an activity where students graphically compared the measurements that all students made of one student’s arm span with the measurements made once of every student’s arm span (English & Watson, 2015a). The software TinkerPlots (Konold & Miller, 2015) was introduced, allowing students to begin to explore multiple ways of displaying variation in distributions. Expectation was combined with variation in an activity based on tossing two coins to work out the probability of achieving two heads, two tails, or one of each (English & Watson, 2016). With an initial expectation of equal probability for the three outcomes, the progression from tossing coins by hand to having TinkerPlots produce large numbers of trials allowed students to appreciate the decreasing variation in the proportion of each outcome with larger numbers of trials. Although appreciating this aspect, the students did not find it easy to create visual models of the relationship they discovered among the coins. The complete process of carrying out the practice of statistics was formally introduced in an activity based on the Australian Bureau of Statistics (ABS) CensusAtSchool website, which provided a series of questions on whether Australian students were environmentally friendly (Watson & English, 2015). These questions introduced categorical (Yes/No) data, with students allowed to determine their own criteria as to whether their classes were environmentally friendly or not. Students were then provided with a “population” of 1300 Year 5 students from the ABS CensusAtSchool in order to carry out “random” sampling with TinkerPlots and make similar decisions for Year 5 students across Australia. Decision-making was extended in Year 6: having measured their reaction times in two ways in Year 5 (Watson & English, 2017), students used the “better” method to test a hypothesis found in the media that brown-eyed people had faster reaction times than people with other coloured eyes (Watson & English, 2018). Rasch analysis of four surveys, at the start of Year 4 and the end of each subsequent year, showed a statistically significant improvement in performance overall but considerable variation across years and students (Watson et al., 2017). This likely reflected the ambitious nature of the activities’ contexts and the sophisticated aims of the project.
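The key phenomenon in the coin activity, decreasing variation in outcome proportions as the number of trials grows, can be reproduced in a few lines of code. The Python sketch below is a stand-in for the TinkerPlots sampler used in the study, not the study’s actual materials.

```python
import random

random.seed(42)

def toss_two_coins(n):
    """Proportions of two-heads, two-tails, and one-of-each in n trials."""
    counts = {"HH": 0, "TT": 0, "mixed": 0}
    for _ in range(n):
        a, b = random.choice("HT"), random.choice("HT")
        key = "HH" if a == b == "H" else "TT" if a == b == "T" else "mixed"
        counts[key] += 1
    return {k: v / n for k, v in counts.items()}

# Variation in the proportions shrinks as the number of trials grows,
# approaching the theoretical 1/4, 1/4, 1/2.
for n in (10, 100, 10_000):
    print(n, toss_two_coins(n))
```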

A more recent four-year exploratory study considered the possibilities of linking the practice of statistics with topics from the areas of STEM (Science, Technology, Engineering, Mathematics), in particular Science (Fitzallen & Watson, 2020). Addressing topics such as heat (Fitzallen et al., 2017), manufacturing (Watson et al., 2020), catapults (Watson et al., 2022a), and plant growth (Watson et al., 2022b), students developed surprisingly sophisticated representations of the data (see Fig. 1) over the period of the project. The project demonstrated both the opportunities for, and the need for, closer cooperation among curriculum developers across learning areas to ensure that meaningful investigations can be undertaken across the primary years.

Fig. 1 Examples of representations for science activities

Similarly, working with 26 Year 5 students in Queensland, Doerr et al. (2017) used a model-based approach to provide a coherent sequence of topics and novel situations for activities focussing on drawing informal inferences when comparing two data sets. A series of student questions was developed, beginning with “How long does a paper helicopter stay in the air?” and progressing to consideration of whether height matters and whether length of wing matters, illustrating the potential of a modelling perspective linked to informal statistical inference.

Similarly, Makar and Allmond (2018) used a modelling approach in a Year 6 classroom to identify “Which origami animal jumps the furthest” (p. 1142). Their focus was the purpose, process, and prediction of the modelling process as a basis for developing statistical thinking and informal inference. These studies emphasised the importance of statistics as a unifying thread across different learning areas, and STEM in particular.

Classroom studies have also involved older students. For example, in New Zealand, Budgett et al. (2013) included five final-year secondary school students with five tertiary students in a pilot teaching program on data visualisations and the randomisation test, including the use of software, with pre- and post-evaluations. Problematic aspects included the language used with and by the students, students inferring a causal relationship between two variables, and students’ ability to interpret “no evidence against chance.” In other classroom studies, Arnold (2008) reported on a pilot study of 15 Year 10 students, introducing question posing in a multivariate data set using information on data cards. The study involved pre- and post-tests around two lessons focussed on problem posing, reflecting the first P in the PPDAC model. Arnold et al. (2011) later reported on a much larger classroom intervention, with approximately 100 students around 14 years old, involving computer simulations and animations to enhance students’ inferential reasoning. This was based on the preparation of extensive learning materials (Pfannkuch et al., 2010), including dialogues for comparative reasoning (“I notice…, I wonder…, I worry…, I expect…”). Based on a continued extensive focus in the class in this fashion while looking for evidence to reach a conclusion (“make a call”) in an investigation, a five-level categorisation was used to compare results from pre- to post-test. Both of these studies drew on aspects of the statistical investigative cycle (PPDAC) and the hierarchical framework implicit in SOLO.
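For readers unfamiliar with the randomisation test that Budgett et al. taught, the Python sketch below shows its core logic on hypothetical data: repeatedly reshuffle the group labels and ask how often a difference at least as large as the observed one arises by chance alone. The data and function are illustrative, not the study’s materials.

```python
import random

def randomisation_test(group_a, group_b, reps=10_000):
    """Two-sample randomisation test on the difference in means.
    Returns the proportion of label reshuffles giving a difference
    at least as extreme as the observed one, the tail proportion
    students learn to read as evidence against 'just chance'."""
    observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))
    pooled = group_a + group_b
    n_a = len(group_a)
    extreme = 0
    for _ in range(reps):
        random.shuffle(pooled)
        a, b = pooled[:n_a], pooled[n_a:]
        if abs(sum(a) / len(a) - sum(b) / len(b)) >= observed:
            extreme += 1
    return extreme / reps

# Hypothetical hours-of-sleep data for two groups of students:
print(randomisation_test([7, 8, 6, 9, 8, 7], [6, 5, 7, 6, 5, 6]))
```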

In a small study of a Year 10 class, directed at students’ development of informal inferential reasoning, two types of context were the focus: the data-context of the problem considered in the classroom and the learning-experience-contexts that the students brought with them from their previous real-life and learning experiences, including interaction with the teacher (Pfannkuch, 2011). Presenting box plots (a data context) comparing a variable (e.g. hours of sleep) across two conditions (e.g. owning a cell phone or not) created an example of whether it was to be expected that those with cell phones would have less sleep. Later examples focused on data from different samples to create cognitive conflict in making informal inferences, taking account of both the data presented or collected and the learning experiences of the students in the context employed.

Arnold and Pfannkuch (2014, 2016) reported on a classroom study of Year 10 students’ development of the concept of distribution over a series of three lessons based on an extensive framework of characteristics and 28 specific features of distribution. The lessons focused on the language of shape and used the SOLO model (Biggs & Collis, 1982, 1991) for the analysis of pre- and post-tests. For the entire class, they found that 26 out of 27 students improved their SOLO levels of performance from the pre- to the post-test in describing the visual presentations they were given (2016), while a detailed analysis of two extreme students—one who did not change and one who moved from unistructural to extended abstract—showed that these students had used 18 of the 28 features of distribution. In a different intervention study, 27 students took part in a 16-lesson classroom teaching experiment (Arnold & Pfannkuch, 2018), including pre- and post-test response analysis of five questions involving the critiquing of investigative questions. All students improved performance on the second testing.

Following the results of these studies, Arnold and Pfannkuch (2019) addressed the broad issue of posing comparative statistical investigative questions, based on four teaching experiments. In the fourth experiment, 26 students completed pre- and post-tests in which they were asked to pose three comparison investigative questions; responses were analysed with SOLO model criteria similar to those of the earlier studies, based on a complex two-dimensional developmental hierarchy. One dimension was based on eight categories of questions related to comparison investigations, ranging from A, including nonsense questions or questions not involving comparison, to H, comparison questions including the idea of tendency and phrases such as average. The other dimension was based on the population chosen for the questions, ranging from (1), referring to “the sample,” to (6), referring to an actual New Zealand student population, for example by year or gender. These categories were then combined to reflect six SOLO levels from “no response/idiosyncratic” (0) through to extended abstract (5) (Biggs & Collis, 1982). Analysis using these scores showed that 23 of the 26 students improved their performances from pre- to post-test, with a highly significant paired t-test.
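As an indication of the kind of pre/post comparison reported, the sketch below runs a paired t-test on hypothetical SOLO-level scores using SciPy. The numbers are invented for illustration and are not the study’s data.

```python
from scipy import stats

# Hypothetical pre/post SOLO-level scores (0-5) for a small class,
# standing in for an Arnold & Pfannkuch style pre/post analysis.
pre  = [1, 2, 1, 0, 2, 3, 1, 2, 2, 1]
post = [3, 4, 2, 2, 4, 4, 3, 3, 4, 2]

t, p = stats.ttest_rel(post, pre)
print(f"t = {t:.2f}, p = {p:.4f}")  # small p: improvement unlikely to be chance
```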

Considering statistical literacy as a general component of external achievement standards at the end of schooling in New Zealand, Budgett and Rose (2017) designed a classroom intervention to assist students in critically evaluating media reports. The contexts played a role in terms of motivation, as well as providing conceptual foundations for interpretations of claims made in the media reports. Visualisations and procedural scaffolds were also useful in assisting students to make evaluations.

Sharma (2014, 2016) showed that culture, as well as context, was important in influencing secondary students’ statistical thinking. Using interview data from 14- to 16-year-old students in Fiji, she identified difficulties with probabilistic language and a disconnect between students’ classroom experiences with probability and those met outside of school. She recommended using familiar contexts, such as “Fiji Sixes,” to teach probability in Fiji schools. In addition, the school context of following teachers’ directions appeared to inhibit students’ explanations when presented with an open-ended task. Similar findings have been shown in other areas of mathematics (e.g. Hunter, 2023).

Classroom-based research has highlighted two key outcomes. First, sophisticated statistical thinking can be achieved by students in primary and middle years, especially when scaffolded by a statistical investigation framework. Second, a range of appropriate activities can bring about improvement in students’ learning. The scale of studies located within classrooms was, however, relatively limited and to achieve some generalisation across school contexts different approaches were needed.

Large-scale studies

Large-scale student surveys

Large-scale survey studies began to emerge using items that were developed from the earlier small-scale interviews. Whereas early studies addressed specific concepts of statistics and probability, these larger survey studies tended to be broader. Reading (2002) reported the development of a Profile of Statistical Understanding based on responses of 180 students across Years 2 to 12. This study did not include any probability items and was restricted to aspects of data. The profile was developed using Rasch analysis (Bond & Fox, 2015) and was specifically linked to the SOLO model.
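For reference, the Rasch approaches used here and in later studies (Bond & Fox, 2015) rest on the dichotomous Rasch model, in which the probability that person n succeeds on item i depends only on the difference between person ability $\beta_n$ and item difficulty $\delta_i$ (the partial-credit extension used for hierarchically coded items follows the same logic):

$$P(X_{ni} = 1) = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}$$

Person and item locations estimated under this model sit on a common interval scale, which is what allows item hierarchies such as the Profile of Statistical Understanding to be interpreted developmentally.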

Another study considered the over-arching concept of variation (Watson et al., 2003) across many different contexts. Students in Years 3, 5, 7, and 9 (N = 746) responded to a 16-item survey covering notions of variation in chance, data, and sampling contexts. A hierarchical coding scheme, based on SOLO, was developed for each item, and the data were analysed using Rasch approaches (Bond & Fox, 2015). The outcomes were interpreted as a four-level scale of increasingly sophisticated appreciation of variation. This study provided a springboard for several later studies, also using Rasch analysis.

As the notion of statistical literacy began to take hold, ways of describing it emerged as introduced earlier. Watson’s (1997a) three-tier framework was used with specific aspects of statistics that went beyond the school curriculum, such as sampling (Watson & Moritz, 2000b). There was a need, however, to bring many of the underlying concepts together in an overarching construct. To this end, Watson and Callingham (2003) used archived data from several prior studies employing small-scale surveys and interviews, all of which had included items coded with a hierarchical structure based on SOLO or similar models. This exploratory study was important because it included a wider range of concepts from both statistics and probability than previous studies, and aimed to establish that the broader idea of statistical literacy was a single hierarchical construct. The analysis, based on responses from a total of 3852 students across Years 3 to 9, was undertaken using Rasch approaches. A six-level hierarchy of statistical literacy was identified as shown in Table 1. The construct was confirmed in a second empirical study using the same analysis technique, with responses from 673 students across Years 5 to 10 in five different schools (Callingham & Watson, 2005). The new study used some of the items from the previous study together with new items to address perceived gaps in the initial exploratory study. The aim of this study was to provide information to teachers in the form of a “profile” of student understanding.

Table 1 Statistical literacy construct (Watson & Callingham, 2003)

Large-scale teacher collaboration studies

The establishment of the statistical literacy hierarchy provided a springboard for two further large-scale intervention studies, both of which also involved teacher professional learning. The first of these, StatSmart (Callingham & Watson, 2007), specifically provided professional learning to teachers with the aim of improving students’ learning outcomes. The second, Reframing Mathematical Futures 2 (RMFII) (Siemon et al., 2018) aimed to develop assessment processes for statistical reasoning through collaboration with teachers in the middle years (Years 7 to 10).

The StatSmart study, a partnership with the Australian Bureau of Statistics, employed a sophisticated research design (Callingham & Watson, 2007) that allowed student participants to be tracked across their involvement with the study and to be linked to individual teachers across the years. The study provided teachers with targeted professional development with the aim of improving student learning outcomes, and involved over 4000 students across Years 5 to 11 in 19 different schools from three sectors (Independent, Catholic, and Government) in three Australian states. Teachers had to teach at least one unit of statistics during the year (a requirement of the curriculum), and students were surveyed at the start of their first year in the project, at the end of that year and again one year later. About 200 students stayed in the project for all three years, and these students undertook a fourth or fifth survey, which was used to trial new items. The surveys were administered and marked by teachers in their normal classrooms, because part of the philosophy of the project was that teachers needed to get close to their students’ learning if they were to intervene effectively. Among several outcomes from this study for students (summarised in Callingham & Watson, 2017), a key finding was that students were not moving into the higher levels of the statistical literacy hierarchy shown in Table 1. Although students’ statistical skills did improve, they did not appear to develop the critical, questioning stance that Gal (2002) suggested was essential for statistical literacy.

A unique aspect of the StatSmart project was the development of a measure of teachers’ pedagogical content knowledge (PCK) in statistics (Callingham & Watson, 2011), which allowed the complex relationship between PCK and students’ outcomes to be examined. Teachers’ PCK in statistics was found to have a positive influence on students’ learning (Callingham et al., 2016). Because of the ways in which teachers’ PCK was assessed, comparisons could also be made between student and teacher knowledge of specific topics in statistics (Watson & Callingham, 2013, 2014), highlighting the complexities for teachers of dealing with their students’ mathematical understandings and appropriate pedagogical approaches.

Further outcomes from the StatSmart project came from an associated study looking at students’ interest and self-efficacy with respect to statistics. In addition to developing a scale of interest in statistics (Carmichael et al., 2009), the complex interactions among self-efficacy, prior mathematics achievement, and interest in statistics were examined. Interest in statistics and self-efficacy interacted to influence academic achievement positively (Carmichael et al., 2010; Hay et al., 2015). Findings emphasised the importance of context to raise interest when teaching statistics.

A second significant large-scale study was Reframing Mathematical Futures 2 (RMFII). The focus of this study was mathematical reasoning across three domains: Geometric, Algebraic, and, of interest here, Statistical Reasoning. The aim of the project was to develop an evidence-based learning progression in each of the domains, and then to provide teachers with appropriate assessment tools to identify their students’ progress along the progression alongside targeted teaching advice (Siemon & Callingham, 2019). In the statistics domain, items from the StatSmart study were revised where the context was no longer relevant, such as a question about median house prices, and augmented by new questions to tap into higher order thinking. Across the three years of the study, approximately 80 teachers and 3500 students across Years 7 to 10 were involved in the development and trialling of the Learning Progressions and Teaching Advice. To comply with the requirements of the project, an eight-zone Learning Progression of Statistical Reasoning was developed, and each zone was attached to targeted Teaching Advice (Callingham et al., 2019). A total of four statistical reasoning assessment tasks were also produced, so that teachers could estimate where their students were along the learning progression prior to beginning a teaching sequence using the Teaching Advice as a framework for targeted teaching. Teachers then followed up with a second assessment task to determine progress. In this way, teaching was explicitly linked to the research outcomes.

The RMFII project, however, went further. Using Rasch analysis, the three domains of Geometric, Algebraic, and Statistical Reasoning were combined to create a hierarchy of Mathematical Reasoning. This scale was in turn linked to archived data about multiplicative thinking, demonstrating that the three reasoning domains were all underpinned by the knowledge and understanding of mathematical relationships that were at the heart of multiplicative thinking (Callingham & Siemon, 2021). The RMFII project probably went further than any other in explicitly bringing mathematics and statistics together, while also recognising the essential differences (Callingham et al., 2021).

Studies involving teachers (classroom and preservice)

An early study relating to the teaching of statistics was reported by Begg and Edwards (1999), who used interviews and surveys about teachers’ beliefs, attitudes, and content knowledge about statistics and teaching statistics to profile 22 primary and 12 pre-service teachers in New Zealand. Findings fell into four main categories reflecting in many ways the emerging interest in pedagogical content knowledge (PCK, Shulman, 1987). For this sample, beliefs and attitudes about statistics included initial negative views and fears, and limited appreciation of the need for a mathematical background to understand statistics in the media. In terms of content knowledge, the interviewees were considered weak in probability and average, with a preference for bar graphs and pictographs for data representation. In terms of beliefs and attitudes about teaching statistics, it was generally viewed as “a part of maths” and treated relatively procedurally, without further applications across the curriculum. Finally, in terms of their knowledge in relation to teaching statistics, teachers generally felt that knowing how to provide statistical activities was more important than knowing the statistics. Some were not yet familiar with the curriculum but overall expressed confidence in teaching statistics. The authors concluded that more support was needed for both groups of teachers.

An early study of teachers’ attitudes to chance and data (Callingham et al., 1995) indicated differences between teachers in primary and high schools, and between male and female teachers, on some aspects of confidence in using and teaching statistical and probability ideas. This study led to Watson (2001) developing an extensive teacher profiling instrument, designed to assess teachers’ knowledge and identify needs in terms of adopting new professional standards for teaching chance and data. Trialled with 43 teachers, the instrument showed a wide range of outcomes in terms of planning, teaching practices, content knowledge, confidence, beliefs, background, and previous professional learning. The need to continue monitoring these factors and intervening where necessary was stressed.

Working with 33 BTeach pre-service secondary mathematics teachers, Watson (2000) challenged them with a Harvard Case Study (Merseth & Karp, 1997) based on solutions to Kahneman and Tversky’s famous “hospital problem” (1972) related to appreciation of sampling. The activity exhibited strategies based on mathematics or intuition or a combination of the two, and different mathematical approaches often related to the participants’ previous mathematics backgrounds, which were quite varied. Reaction to the case study after solving the problem was positive, as the BTeach students compared simulations with coins and software (Konold & Miller, 1994) and discussed the opportunities for reinforcing work with fractions, percentages, and independence of events, for an intuitive appreciation of the solution without the formal mathematics.
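The intuition behind the hospital problem, that small samples vary more, is easy to demonstrate by simulation, much as the BTeach students did with coins and software. The Python version below is a hypothetical stand-in for those simulations, using the problem’s usual figures of about 45 and 15 births per day.

```python
import random

random.seed(7)

def days_over_60_percent_boys(births_per_day, days=365):
    """Count days on which more than 60% of births are boys,
    assuming each birth is a boy with probability 0.5."""
    count = 0
    for _ in range(days):
        boys = sum(random.random() < 0.5 for _ in range(births_per_day))
        if boys / births_per_day > 0.6:
            count += 1
    return count

# The smaller hospital records more such days: small samples vary more.
print("large (45/day):", days_over_60_percent_boys(45))
print("small (15/day):", days_over_60_percent_boys(15))
```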

Makar and Confrey (2005) interviewed 17 pre-service secondary mathematics and science teachers at the beginning and end of an assessment course, which included overviews of data graphing, descriptive statistics, linear regression, sampling distributions, and inference. The interviews were based on interpreting a dot plot presenting variation across two conditions for some students’ scores. Although the researchers reported improvement in the use of standard language across the two interviews, of particular interest was the use of nonstandard “Variation-talk.” This was found in four categories, expressing spread, low-middle-high, modal clump, and distribution chunks, reflecting the work of Konold and Pollatsek (2002) and Konold et al. (2002).

Burgess (2010) used classroom video and stimulated recall discussions of four upper primary school teachers to identify progress on four types of knowledge required: common knowledge of content, specialised knowledge of content, knowledge of content and students, and knowledge of content and teaching (Ball et al., 2008; Burgess, 2006). He found that knowledge of content and students grew the most. Less frequent was the growth of specialised knowledge of content.

In working with pre-service primary teachers (PST), Prodromou (2012) used data from 100 portfolios to consider the PSTs’ understanding of the relationship of experimental and theoretical probability and how it can be fostered. In further work with pre-service primary teachers, Prodromou (2013) analysed 20 of their essays to report how they could use students’ difficulties in understanding the arithmetic mean to influence the teaching of the topic.

As part of a three-year Australian government initiative for a mathematics and science teacher program, a cross-university collaboration developed 25 online learning modules for pre-service primary teacher programs, including one on statistical literacy. Bilgin et al. (2017) described the Statistical Literacy Module for primary teachers from its development and evaluation to initial trials, with an inquiry-based approach relevant to teacher practice using a six-phase model (Engage, Explore, Explain, Elaborate, Evaluate, and Elucidate). Evaluation indicated that 9 of the 16 students who completed the module expressed both satisfaction and suggestions for improvement.

A single teacher’s scaffolding norms for developing an argumentation-based inquiry for statistical investigations were the focus of a study in a Year 4 classroom (Makar et al., 2015). The teacher taught one unit of work in each of four terms over a school year, and set expectations for classroom behaviour, including Classroom talk: active listeners (reflect on others’ ideas), clear audible speakers, and active contributors. Expectations were also set for Collaboration: it is possible to reach a conclusion using more than one method; everyone is expected to think; all ideas and opinions are valued; and ideas can be questioned or challenged respectfully. Furthermore, prompts were given to students to scaffold their questioning of each other, e.g. “Tell us more about…,” “I agree/disagree with…,” and “What convinced you that was the answer?” Having built the norms across the year, the activity completed in term 4, “What is the typical time it takes for a Year 4 student to read a book?”, was used to document the teacher’s success in scaffolding.

In a different approach, Howley and Roberts (2020) described an initiative where tertiary educators and STEM industry workers travelled to remote areas of New South Wales to deliver professional learning to teachers of Years 3 to 10 and workshops for students based on their practical work as statisticians in environmental science and sustainability. These encounters illustrated the practice of statistics in terminology appropriate to the school curriculum in order to motivate both teachers and students in appreciating the value of statistics to answer real-world questions. Analysis of pre- and post-intervention survey questions of both the teachers and students indicated a large positive change in both groups related to belief in the importance and applications of statistics in addressing environmental issues through accessible projects.

Concerns about the statistical literacy needs of the education sector were expressed in relation to quantitative data being provided for teachers to make decisions. This concern prompted a project that surveyed 704 Victorian teachers, probing their perceptions of data and particularly their understanding of box plots, the form in which many results of the National Assessment Program–Literacy and Numeracy (NAPLAN) are presented (Pierce & Chick, 2013). Some serious misconceptions were discovered for some aspects of box plots, and issues were raised about teachers’ abilities to interpret and effectively use the information provided. Reporting a case study of a single teacher, Pfannkuch (2006) considered the reasoning of a Year 11 teacher while teaching the class about comparing box plots, using two examples. Analysis focussed on 10 aspects, including shift, spread, sample size, and individual cases, with examples, followed by a discussion of implications for continuing research and teacher training.

Studies such as these led to a further consideration of the factors that teachers thought were influencing their engagement with statistical results (Pierce et al., 2013). In relation to System Reports on Student Achievement (SRSA) provided in Victoria, including NAPLAN, the teachers, both primary and secondary, reflected on their attitudes to the usefulness of the SRSA generally, with over 50% agreement on usefulness, but only 28% agreeing that NAPLAN tests were well designed to assess students’ achievements. Generally, teachers said that school administrations were more interested in SRSA than parents were, and more than half felt the timing of the release of NAPLAN results reduced their effectiveness. Using Rasch analysis, a hierarchy for professional statistical literacy was created from the data for 704 teachers (Pierce et al., 2014b), with the characteristics of the scale ranging from reading a data value from a table to analysing the data set provided and explaining all components. The comparison of scores showed statistically significant differences by gender, by professional learning attendance, by teacher job classification, and by last statistics course studied. As a follow-up to these results, a workshop was developed (Pierce et al., 2014a), in which teachers were given hypothetical data for 30 students to be manipulated on a scale to create a distribution. Five workshops were conducted, and although the feedback was positive, significant misconceptions persisted for some on what is shown or not shown in a box plot. Given the few times a year that the knowledge is required, the authors created a series of short online packages to assist teachers when needed throughout the school year.
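A box plot reduces a data set to a five-number summary, which is also what it hides (individual values, sample size, clusters), and this is one source of the misconceptions reported. A minimal Python sketch with hypothetical scale scores:

```python
import numpy as np

def five_number_summary(data):
    """The only values a standard box plot displays; individual data
    points, sample size, and cluster structure are not shown, which
    is one reason box plots are easy to misread."""
    return dict(zip(
        ["min", "Q1", "median", "Q3", "max"],
        np.percentile(data, [0, 25, 50, 75, 100]),
    ))

scores = [380, 420, 450, 470, 480, 500, 510, 530, 560, 610]  # hypothetical scale scores
print(five_number_summary(scores))
```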

In 2014, the updated Senior Secondary Mathematics subjects (ACARA, 2014) in Australia included increased statistics content. Marshman et al. (2015) compared statistics in the new curriculum with that in operation at the time in Queensland, finding many differences. They surveyed teachers attending professional learning about their attitudes related to their own cognitive competence, use of technology, belief in the value of statistics generally, and emotions concerning statistics in the affective domain. Significant associations were found between the affective domain and the cognitive and value domains, and between the cognitive and technology domains. Dunn et al. (2015) had concerns about the use of language in both the curriculum and textbooks, and reported on these same teachers’ definitions of important statistical terms: mean, standard deviation, sample, confidence interval, and standard error. In general, teachers demonstrated a limited understanding of the terms, and the researchers found that the textbooks designed to support them were unhelpful in this regard.

Further examples have been reported over the years from studies reflecting the characteristics of teachers in relation to teaching statistics, and from interventions designed to improve teaching. Pierce and Chick (2011) reported on the beliefs of teachers about statistics generally, with particular examples of their pre-service primary teachers’ beliefs about statistics, the relationship between mathematics and statistics, and the teaching and learning of statistics. For these students, the beliefs mainly affected their acknowledgement of the value of an interactive approach to teaching. Reading and Canada (2011) specifically reviewed the research related to teachers’ understanding of distribution, presenting several frameworks, with examples from pre-service teachers and from professional development of in-service teachers. In the context of assisting teachers to teach statistics using statistical investigations, Makar and Fielding-Wells (2011) reported on a project with primary teachers based on a four-step “model of learning to teach statistical inquiry: Orientation, Exploration, Consolidation, and Commitment” (p. 354). The implications for teacher education included making explicit the required statistical content knowledge, engaging teachers in investigations as learners, helping teachers embed their learning in their classrooms, encouraging collaboration with other teachers and researchers, providing time and opportunity for reflection, and continuing with long-term support and resources.

From a different perspective, Burgess (2011) employed a two-dimensional framework relating four aspects of statistical knowledge for teaching (related to pedagogical content knowledge [PCK]) to the types of thinking involved (derived from Wild & Pfannkuch, 1999) and the investigative or interrogative cycle of an activity. Discussing the types of knowledge and ways of thinking, he emphasised the need in pre-service teacher education to avoid over-focus on single aspects: to be able to teach investigations, teachers required the wide foundation he identified.

Based on a detailed Rasch analysis of 12 survey items related to PCK for statistics, for 45 teachers involved in an extensive professional learning program, Callingham and Watson (2011) identified a hierarchical scale, interpreted in four levels: Aware, Emerging, Competent, and Accomplished. These levels reflected teachers’ abilities to suggest both appropriate and inappropriate student responses to survey questions and to display increasing appreciation of PCK in their suggestions for next steps for students. A follow-up survey of 18 teachers after the professional learning program showed a statistically significant increase in the mean value of teachers’ responses, with an effect size of 0.59, suggesting the scale could be useful for other such programs. The levels are summarised in Table 2.

Table 2 Levels of statistical PCK

Supporting statistics in school

Although a full discussion of the place of statistics and probability in school curricula is beyond the scope of this paper, as the research agenda developed, so did support packages for teachers. These packages were developed in different ways, reflecting diverse stakeholder influences.

As indicated earlier, in New Zealand, professional statisticians became involved in the school curriculum. Significant to the curriculum reform in New Zealand was the establishment by the New Zealand Statistical Association (NZSA) of its Education Committee, which actively lobbied for statistics across all levels of education. This tertiary group set up the 1990 Children’s Census in conjunction with the third International Conference on Teaching Statistics (ICOTS) (Vere-Jones, 1990), held in Dunedin. This led eventually, in collaboration with other nations including Australia, to the international CensusAtSchool program, which provided an excellent resource for teachers at all levels for introducing the statistics curriculum. Close collaboration with the NZSA continued over the years, with many professional development sessions supported by Vere-Jones (1995) and others from the tertiary sector (Forbes, 2014).

In contrast, after the introduction of the National Statement on Mathematics for Australian Schools (AEC, 1991), teachers in Australia drove change as they began to demand professional learning to implement the new Chance and Data strand of the curriculum. This need led to a number of initiatives to support teachers. These included Maths Works: Teaching and Learning Chance and Data (Watson, 1994), supported by the AAMT, and the three-year LUDDITE project (Learning the Unlikely at Distance Delivered as an Information Technology Enterprise) (Watson & Baxter, 1997; Watson et al., 1996; Watson, 1997b), which culminated in the formal trialling of a package of materials, including the text Statistics: Concepts and Controversies (Moore, 1991), video extracts from the series Statistics: Decisions through Data (Moore, 1992), and a CD-ROM. Many other packages for effective teaching of statistics and probability began to emerge, including Chance and Data Investigations (Lovitt & Lowe, 1993) and Chance and Data: Exploring Real Data (Finlay & Lowe, 1993). Teachers’ response to these initiatives was very positive. By 2013, the place of statistics and probability in the Australian Curriculum: Mathematics v8.4 (Australian Curriculum, Assessment and Reporting Authority [ACARA], 2013) was firmly established. Nevertheless, Watson (2013) indicated that teachers were less confident about teaching statistics and called for more attention to the pedagogical needs of teachers.

In a collaboration with the AAMT’s online Top Drawer Teacher resource, a “drawer” was created for Statistics (Watson et al., 2013). It was built around five Big Ideas of Statistics at the school level: Variation, Expectation, Distribution, Randomness, and Informal Inference. With this background in mind, the “practice of statistics” and the big ideas have more recently been emphasised as the foundation for a review of the international research that followed these initiatives (Watson et al., 2018). The continued evolution of frameworks for statistical analysis at the school level is seen in the six-step statistical modelling framework of Patel and Pfannkuch (2018), arising from research with Year 7 students and including a total of 23 possible sub-steps.

Using work from the Reframing Mathematical Futures II (RMFII) project (Siemon et al., 2018), Callingham et al. (2021) suggested that a curriculum could be based on a validated learning progression, using examples from the Australian Curriculum: Mathematics v8.4 (ACARA, 2013). The differences in statistics curriculum development between New Zealand and Australia have had an impact on teachers. Callingham and Burgess (2014) considered data from focus groups with teachers in New Zealand and Australia responding to a stimulus question showing student responses to a counter-intuitive two-way table based on a problem from Batanero et al. (1996). Responses from teachers in New Zealand were firmly embedded in the statistical context, whereas Australian teachers focussed on the underlying proportional reasoning and the social context, suggesting that teachers in New Zealand had a stronger grasp of statistical ideas.
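To indicate the flavour of such an item, the sketch below gives a hypothetical two-way table (invented for illustration; it is not the actual problem from Batanero et al., 1996) in which the largest cell count points one way while the row proportions point the other, the kind of conflict that demands proportional reasoning:

```python
# Hypothetical two-way table (not the actual Batanero et al. item):
# counts of patients by treatment and outcome.
table = {
    "treatment":    {"improved": 40, "not_improved": 20},
    "no_treatment": {"improved": 15, "not_improved": 5},
}

for row, cells in table.items():
    total = sum(cells.values())
    rate = cells["improved"] / total
    print(f"{row}: {cells['improved']}/{total} improved = {rate:.0%}")

# treatment:    40/60 improved = 67%
# no_treatment: 15/20 improved = 75%
# The treatment row contains the largest single count (40), yet its
# improvement RATE is lower, so the raw frequencies are counter-intuitive:
# judging the association requires comparing proportions, not cell counts.
```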

Subsequent revisions of the mathematics curricula in both countries from 2000 have continued to consolidate the position of statistics at the school level (ACARA, 2022; MoE, 2022). In the latest iteration of the Australian Curriculum: Mathematics v9.0, statistics is included in every year level from Foundation to Year 10. This revised Australian curriculum also has an emphasis on undertaking statistical investigation, bringing the intent closer to that of the New Zealand curriculum. This emphasis may lead to deeper understanding by Australian teachers. Nevertheless, differences between New Zealand and Australia remain. For example, in Year 10 in Australia, students “analyse claims, inferences and conclusions of statistical reports in the media, including ethical considerations and identification of potential sources of bias” (AC9M10ST01). The equivalent in New Zealand states “Evaluate a wide range of statistically based reports, including surveys and polls, experiments, and observational studies:

  • critiquing causal-relationship claims

  • interpreting margins of error.” (Level 8 Achievement Standard).

In New Zealand, it seems that explicit higher-order thinking is expected of students, whereas in Australia this expectation is more implicit.
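As a brief illustration of the reasoning the New Zealand standard asks for (a generic textbook example, not taken from either curriculum document), a 95% margin of error for a poll proportion $\hat{p}$ from a sample of size $n$ is commonly approximated by

$$\text{MoE} \approx 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},$$

so a poll of $n = 1000$ reporting $\hat{p} = 0.52$ has a margin of error of about $1.96\sqrt{0.52 \times 0.48/1000} \approx 0.031$; interpreting it means recognising that the population value could plausibly lie anywhere between about 49% and 55%.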

Conclusion

Prior to the introduction of chance and data into the curriculum as a separate strand (AEC, 1991; MoE, 1992; NCTM, 1989), statistics and probability were either neglected altogether or presented as mathematical processes, such as theoretical probability, narrow conceptions of measures of central tendency, or technical aspects of graphing (e.g. Holmes, 2003), depending on the curriculum jurisdiction. The research agenda was focussed on narrow mathematical ideas, often related to theoretical probability. In more recent times, however, researchers from Australia and New Zealand have influenced, and often led the way in, statistics education research internationally. This strength may have contributed to the performance of students on statistics questions in international studies. In the Trends in International Mathematics and Science Study (TIMSS) in 2019, for example, Australian students performed better in the data strand at both Year 4 and Year 8 than in any other content strand (Thomson et al., 2020). The studies reported in this paper showcase a breadth of research, both in approach and focus.

Early studies focussed on fundamental ideas of statistics held by students, and sometimes their teachers, across the years of schooling (e.g. Jones et al., 2000; Nisbet, 2002; Watson & Moritz, 1999a, b, c). These studies have been important in clarifying ways in which students understand statistical concepts and develop statistical thinking. As such these have provided a foundation for advice to teachers and the development of resources (e.g. Watson et al., 2013).

Numerous classroom-based projects have documented students’ increasing understanding from experiencing quality teaching activities. Different approaches to teaching statistics have been investigated (e.g. English, 2014; Fielding-Wells & Makar, 2015; Pfannkuch et al., 2010; Watson & Callingham, 1997; Wild & Pfannkuch, 1999), and ways of assessing and measuring learning and understanding have been developed from small- and large-scale studies (e.g. Callingham, 2011; Fitzallen, 2006, 2008; Siemon et al., 2018). Links between mathematics and statistics have been identified (e.g. Pierce & Chick, 2011). Australian and New Zealand researchers have contributed to many conferences, meetings, publications, and collaborations nationally and internationally. This paper does not pretend to have included every study that has been reported, but the range of studies covered is impressive.

There are new and emerging research agendas. The place of technology, and the best ways of using it, is still open to question. At present, technology use is still relatively procedural, being used to create representations or to sort and classify data. The emergence of Big Data and new approaches to data analysis has the potential to be linked to some of the work on computational thinking and coding, for example. Data can now be collected and presented in a variety of ways, and there are hundreds of data visualisation tools accessible to school students; as yet, the impact of such tools is largely unexplored. Projects such as Dollar Street (https://www.gapminder.org/dollar-street) focus on social aspects, and such ideas lead to investigating the crossover of statistics from mathematics into other curriculum areas.

The importance of statistical education is increasing, and the necessity for students to learn about statistics through conducting investigations is growing. As Cobb (2015) stated, the job of teachers of statistics is not “to prepare students to use data to answer a question that matters; our job is to help them use data to answer a question that matters” (p. 277, italics in the original). Although Cobb was addressing issues in undergraduate statistics, this approach is just as pertinent to school-level statistics. Today the issue is even more important at all levels of education, given the impact of recent challenges such as the Covid-19 pandemic and the growing pressures on the environment caused by climate change (Watson & Smith, 2022). The body of statistics education research presented in this paper indicates that Australian and New Zealand educators have a sound research foundation for facing the future.