Students’ Tool-Shaped Conceptualisation of the Idea of Statistical Distributions: The Case of Frida

This paper presents a case study that explores digital experiences in statistics teaching within Danish lower secondary school, focusing on the development of students' statistical concepts. The study tracks the progress of a student named Frida, who engages with the digital tool TinkerPlots over the span of a year. During this period, Frida develops a unique 'plot-stack-drag' technique that


Introduction
Statistics is a way to cope with variability in the world and offers strong tools, concepts and language to understand and describe such variability. The power of statistics lies in the abstraction from the individual values of the data collected to an aggregate view of how the data are distributed, both when the task involves describing the data and when the data are utilised for inference (e.g. Wild, 2006). When learning statistics, it is essential to develop a conceptual understanding of a data set, perceiving it not merely as a collection of individual values but rather as a unified entity that can reveal important aspects of the variation in the data. According to Bakker and Gravemeijer (2004), a conceptual understanding of distribution is a prerequisite for developing the ability to choose appropriate statistical measures. Nevertheless, it is difficult for middle-grade students to expand their view of data from the measurement value of a single object to an understanding of data distribution. According to Garfield et al. (2008), the tasks students encounter in textbooks do not support their understanding of the notion of distribution. Rather, the tasks ask students to '… look at a histogram or stem plot and describe the shape, center, and spread …' (Garfield et al., 2008, p. 186). Consequently, many students do not understand that the distribution of data is an entity. As noted by Bakker and Hoffmann (2005, p. 334), 'An essential characteristic of statistics is that it can predict properties of individual values. Therefore, if students cannot see a data set as a whole, they miss the essential point of doing statistics.'
Digital tools can support students in developing an understanding of key concepts in statistics education, including the idea of distribution. Digital technology can promote students' development of a conceptual understanding of such powerful statistical concepts, tools and language. In the field of statistics education research, the potential of digital tools has been widely explored. Biehler et al. (2013) presented an overview of statistical tools for education at all levels, from the primary to the tertiary level. Different tools serve different purposes and offer different affordances. Some tools have been developed for professional statisticians; others have been developed for educational purposes, e.g. Minitool (Cobb et al., 1997), TinkerPlots (Konold & Miller, 2011) or CODAP (Finzer & Damelin, 2014). What they have in common is what Biehler et al. (2013) called fast modes of transportation. According to this metaphor, statistical reasoning is characterised as movements between looking at data as individual values and looking at the patterns and trends in the data. Instead of spending time, for instance, calculating means manually or drawing graphs, students can rely on the power of the digital tools, which frees up cognitive space to focus on the more fundamental ideas. This is equivalent to the idea of lever potential, which Winsløw (2003) and Dreyfus (1994) investigated in the context of computer algebra systems and which involves allowing students to work on a conceptual level while leaving lower-level operations to the computer.
Few studies have explicitly investigated how digital tools support students' development of statistical concepts. Bakker and Hoffmann (2005) followed the development of students' diagrammatic reasoning while working with Minitool. Ben-Zvi and Ben-Arush (2014) studied students' instrumentation of certain features in TinkerPlots. The current study, however, draws on Vergnaud's (1998, 2013) theory of conceptual fields. Vergnaud (1998, 2013) presented an extensive conceptualisation of what constitutes a mental scheme. This study follows the case of Frida, a 12-13-year-old Danish middle-school student, as she construes the concept of statistical distribution while interacting with the digital tool TinkerPlots. Specifically, the study involves an in-depth investigation of how the digital experiences shape her mental understanding of data distribution in terms of Vergnaud's conceptual notions. The richness of this theoretical approach relates not only to the development of Frida's understanding of the concept but also to how, over time, the digital tool shapes her personal goals and aspirations while she works on statistics tasks. In particular, the analysis of her personal goals and anticipations helps to extend our understanding of how digital experiences shape and challenge students' conceptual understandings.
The study seeks to answer the following research question: How can interaction with the digital tool TinkerPlots shape a student's conceptual understanding of data distribution, and how does the student's tool-shaped scheme(s) challenge or support further conceptual development of this understanding as she progresses to new situations?
This research involves a case study built on transcripts from, in part, screencast recordings of Frida's actions with TinkerPlots while working on two different tasks as well as a transcript from an artefact interview conducted after one of the teaching sequences. The findings and potential implications for statistical teaching practice in a digital learning environment are also discussed.
The following section introduces the potential of digital tools in statistics teaching. First, the idea of distribution is presented along with why this concept is important as an educational aim. Second, research on the potential of including digital tools in this process is discussed. Next, the method section clarifies the boundaries of the specific case of Frida. The analysis focuses on three excerpts, which extend over a time span of one year from three different teaching sequences. Finally, the findings of the analysis are discussed along with their educational implications.

The complexity of making sense of data distribution
In the context of statistics, distribution is a meta-concept with several embedded statistical sub-concepts, such as variability, shape and centres (e.g. Burrill & Biehler, 2011). Wild (2006) explained the idea of distribution from a statistician's point of view. He emphasised that data are not interesting as a collection of individual values; rather, it is the patterns that can be seen when freed from distracting details:
When case labels are set aside individuals with identical values for the variables of interest become more indistinguishable so that, without any loss of information, we can reduce the data to a set of distinct values and their corresponding frequencies, that is, to a frequency distribution. (Wild, 2006, p. 11)
Statistical reasoning can be described as the shift from seeing data as individual values to seeing data as a distribution (Bakker & Gravemeijer, 2004). Bakker and Gravemeijer (2004) investigated seventh-grade students' learning of several aspects of distribution, developing a three-layer model. The model describes how statistical reasoning moves between the layers. The top layer includes the distribution as an entity, while the middle layer contains terminology and concepts describing individual features of the distribution. This could be a way to express central tendencies, such as means, modes or medians, or it could be a way to describe spread (e.g. range or standard deviation). It could also be a way to describe the skewness and density of the data. The bottom layer contains data as individual values. Here, the student focuses on what the data represent, for example, considering what a single outlier actually represents in the given context. A professional statistician would begin by looking at the distribution of data as a conceptual entity and then move between the different layers of this model. However, according to Bakker and Gravemeijer (2004), this movement should not be seen as the end goal.
The conceptualisation of statistical distribution and the development of smooth movement between the layers is non-trivial and, according to Bakker and Gravemeijer (2004), difficult for middle-school students. Indeed, it is complicated to connect the distribution as an entity to the different formal descriptors. For instance, Mokros and Russell (1995) raised concerns about the 'unappreciated complexity' of the mathematical mean. While it is trivial to calculate, it is complex and difficult for students to find meaning in it. Konold and Pollatsek (2002) highlighted a problem with students' understanding of averages, noting that students often fail to see averages as representative of the whole distribution. To address this issue, they proposed viewing statistics as the study of a 'noisy' process in which signals can be detected. Both Mokros and Russell (1995) and Konold and Pollatsek (2002) identified a potential conceptual lever for use in comparing distributions. When students describe similarities and differences between distributions, they utilise various methods to accomplish the comparison, and in doing so, they gain an understanding of the concept of distribution as a distinct entity.
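The reduction Wild describes and the layers of the model can be illustrated with a minimal Python sketch. The ages below are invented for illustration and are not data from the study, and the function and variable names are assumptions introduced here. The sketch collapses individual values into distinct values and their frequencies and then computes a few of the formal descriptors belonging to the middle layer.

```python
from collections import Counter
from statistics import mean, median, multimode

# Hypothetical ages; invented for illustration, not data from the study.
ages = [26, 28, 28, 30, 30, 30, 31, 33, 35]


def frequency_distribution(values):
    """Reduce individual values to sorted (distinct value, frequency) pairs."""
    return sorted(Counter(values).items())


# Bottom layer: the individual values in `ages`.
# Reduction step: distinct values and their corresponding frequencies.
freq = frequency_distribution(ages)

# Middle layer: formal descriptors of centre and spread.
descriptors = {
    "mean": mean(ages),
    "median": median(ages),
    "modes": multimode(ages),
    "range": max(ages) - min(ages),
}

print(freq)
print(descriptors)
```

The frequencies sum to the number of original values, which is one way to read the phrase 'without any loss of information': only the case labels are discarded, not the values.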

The potential of digital technologies in statistics education
The role of digital technologies in statistics teaching should involve '…accessing, analyzing and interpreting large real data sets, automating calculations and processes, generating and modifying appropriate statistical graphics and models, performing simulations to illustrate abstract concepts and exploring "what happens if…" type questions' (Chance et al., 2007, pp. 2-3). According to the literature, digital tools in statistics education hold great potential for good statistics learning environments (Ben-Zvi et al., 2018). There is a distinction between route-type and landscape-type educational software for statistics teaching. The educational tool TinkerPlots creates a landscape '…in which students and teachers may freely explore data' (Garfield & Ben-Zvi, 2004, p. 402). The tool is designed to support students in moving data from an unorganised form to both formal and informal structures (Biehler et al., 2013). Various informal and formal visualisations are embedded in the software (both in Minitool and TinkerPlots), which allows students to develop various sub-models in an emergent modelling process (Bakker & Gravemeijer, 2004; Ben-Zvi et al., 2018).
In the Connections Project, TinkerPlots gradually became a thinking tool for developing students' statistical reasoning, aiding them in learning new ways to organise and represent data (Biehler et al., 2013). Ben-Zvi and Ben-Arush (2014) investigated students' interaction with certain features of TinkerPlots. They used the theory of instrumental genesis (e.g. Guin & Trouche, 1999) to analyse the students' instrumentation and instrumentalisation of the tool. This approach supported the students' instrumental and cognitive development while they constructed meanings of key ideas through an exploratory data analysis process. Ben-Zvi and Ben-Arush (2014) identified three kinds of instrumentation profiles: unsystematic, systematic and expanded instrumentation. Bakker and Hoffmann (2005) analysed students' learning about statistical distributions using Peirce's framework of semiotics. They found that students' diagrammatic reasoning led to the development of statistical concepts as objects in the discourse. The students later developed these objects into means of communication in their further reasoning. In particular, Bakker and Hoffmann (2005) focused on students' development of the term 'bump', referring to the shape of the statistical distribution, hence the distribution as an aggregate. Their analyses focused on the possibilities for students to engage in what they called hypostatic abstraction, which refers to the process whereby certain characteristics of a set of objects become an object in itself. They gave an example from their own study, where students developed the concept of spread. First, the students referred to the spread by saying 'the dots are spread out'. Subsequently, through the process of hypostatic abstraction, they referred to the same phenomenon as 'the spread is large'. The term spread has developed as an object in itself and become a means for communication (Bakker & Hoffmann, 2005, p. 342). Bakker and Hoffmann (2005) concluded that the diagrammatic reasoning process involves three steps: making diagrams, experimenting with them and reflecting on the results. Their conclusions led to three recommendations for teaching statistics. First, students should both make their own and learn conventional diagrams. Second, students should experiment with diagrams. Third, students' reflections should be stimulated.

Students' statistical distribution schemes in a digital learning environment
In this study, the analysis of Frida's case draws on Vergnaud's theory of conceptual fields and the notion of schemes. However, the notion of schemes also appears in the theory of instrumental genesis developed by Guin and Trouche (1999). The idea behind this theory is that the tool develops from being an artefact to an instrument for the student. Drijvers et al. (2013) described the process of instrumental genesis as involving three dualities: artefact-instrument, instrumentation-instrumentalisation and scheme-technique.
In this study, the analysis will relate Frida's techniques with the tool to the four aspects of Vergnaud's notion of scheme, which will be elaborated in the next section. Whereas the study of Ben-Zvi and Ben-Arush (2014) focused on the instrumentation-instrumentalisation duality, this study focuses on the scheme-technique duality. In the case of Frida, we follow the techniques she employs with the tool in parallel with the traces of her scheme of statistical distribution. Vergnaud (2013) described conceptualisation as a way to reduce information and elaborate a sense of what is sufficient and necessary to manage a certain type of situation. The use of symbolism enables a person to represent the relevant information and support the organisation of activity when facing the given situation. The different approaches are then interiorised so that they can be evoked when the person faces a similar situation. The different ways to organise activity guide perception, action and imagination. A person's mediation actions, such as the use of words, sentences and gestures, are crucial to the development of schemes. Vergnaud (2013) distinguished between two psychological functions of schemes. The first involves organising and generating activities for familiar situations. The second function relates to tackling new situations and extending the scope of the application of the scheme. According to Vergnaud (2013), a scheme can be defined in four ways: as 1) a dynamic functional whole, 2) an invariant organisation of activity and behaviour for a certain class of situations, 3) a subset of four categories (see below) and 4) a mapping of a multidimensional informational space onto a multidimensional space of action variables. The third definition is useful for analytical purposes and will be the frame for this study. Under this definition, a scheme comprises four components: 1) the student's goals and anticipations, 2) rules of action, 3) operational invariants (theorems-in-action and concepts-in-action) and 4) possibilities to make inferences (Vergnaud, 1998, 2013). The objectives of the scheme are what the student anticipates and wants to attain, the effects to be considered and the possible intermediate states that are reached (Vergnaud, 2013). The student's goals are not to be confused with the goals of the tasks. While objectives are personal, in this study they inform the process of students' conceptualisation of data distribution. The operational invariants refer to the epistemic components of schemes. They often remain implicit and unconscious. However, Vergnaud (2013) noted an important distinction: Whereas a theorem-in-action is a proposition that can be true or false, concepts-in-action do not have this property; they are simply relevant or not.
According to Vergnaud (1998), schemes and situations are closely tied together. A scheme addresses a class of situations. Vergnaud (2013) presented the example of an algorithm that can apply to a whole class of situations that share certain characteristics. Vergnaud (2013) stated that when students are first introduced to a concept and first construe a scheme, the class of situations is small. Schemes therefore begin as what he called local organisations of activity, with a very small field of application. Vergnaud (1998) defined different categories of schemes. One category that is relevant to the case of Frida is perceptive-gestural schemes (Vergnaud, 1998). A perceptive-gestural scheme can be efficiently applied to a whole range of situations, and the scheme can generate a sequence of actions relevant in the situation. Concepts and theorems are involved in these schemes. One example of a perceptive-gestural scheme is counting a set of objects, where the concept of cardinal numbers constitutes the underlying concept. This study will focus on Frida's perceptive-gestural scheme of data distribution. The analyses will show how Frida's digital experiences with TinkerPlots strongly shape her perceptive-gestural scheme of data distribution, which will be referred to here as a tool-shaped scheme.

Method
The task and the tool
The study follows Frida and her peers over a period of one year in three teaching modules of 8-9 lessons. In the first teaching module, the students compared the ages of their parents (gender comparison). This was their very first experience with TinkerPlots. Because the tool was new to the students, the idea was that understanding the data and the context should not involve any obstacles. However, it should be possible to activate relevant functions in the tool and answer a statistical question. In the second teaching module about six months later, the students were working with data they collected themselves with the purpose of entering into the public debate about indoor climate in Danish schools. The students had read newspaper articles about a general problem of overly high levels of carbon dioxide (CO2) in Danish classrooms (Børne- og Undervisningsministeriet, 2019). The teacher facilitated a data-driven investigation of the following question: Do we have a problem with the indoor climate in our classroom? The students prepared a presentation of their findings and invited the school principal to hear their arguments. The students created representations of the data in TinkerPlots that they found relevant for explaining and describing their findings for their presentation.
In the curricular demands for this age group, descriptive statistics are emphasised in the mid-level of the curriculum, and statistical descriptors, such as means, medians, modes and ranges, are central to the curriculum in grades 4 to 6 (Børne- og Undervisningsministeriet, 2019). The overarching aim of the activities was twofold: on the one hand, students should experience how data could support them in gaining insight into a phenomenon and in communicating about a problem; on the other hand, they should develop a concept of distribution. The progression was supposed to move from students' informal descriptions of the data to embedding the formal descriptors (e.g. means, medians, ranges, quartiles and boxplots) to increasingly make sense of the distribution of the data. TinkerPlots was an important tool that supported students' informal investigations. The first features they were introduced to were how to draw a plot, stack, drag and separate dots, draw lines, add attributes to the axes and use the measure tool to find the range. The tool also had black-box features that students could activate and deactivate, including means, medians and modes. In the second teaching module, the students were also introduced to boxplots.
Because the idea of distribution is a meta-concept that includes several other embedded concepts, the task should be balanced between openness and explicitness. If the task asks students to compare two sets of data, they have various ways of doing so, some more relevant than others. Both formal and informal natural language and gestures as well as exact values interweave in a good and thorough description of the data distribution. On the one hand, if students are asked to calculate a mean, median or range, they might succeed in doing so. However, this might serve no purpose and carry no meaning for the students. On the other hand, if the task is too vague and asks questions that are too broad, there is a risk that the students will be confused. The teaching sequence in this study seeks to balance this dilemma.

The data of the study and selection of the case of Frida
The data for the case of Frida are drawn from several sources. First, field notes from classroom observations helped to identify the case of Frida. The main source of data was transcribed video data from screencast recordings of students' work on the computer while working with the tasks in TinkerPlots. A smaller group of students was selected for a semi-structured artefact interview (Kvale & Brinkmann, 2015). The analysis of Frida's case builds on the transcripts of screencast recordings of her work on two different tasks. In addition, Frida was selected for an interview after the second teaching sequence.
Frida's case was selected because of its epistemological potential. Several cases could have been selected from the teaching sequence to follow students' learning trajectories and their interaction with TinkerPlots. However, according to Thomas (2011, p. 514), the '… essence of selection must rest in the dynamic of the relation between the subject and object'. The subject is a practical, historical unit, and the object is an analytical or theoretical frame (Thomas, 2011). In this study, the relationship between Frida's use of the tool and the development of the scheme represented a key case (Thomas, 2011). A key case is defined by its ability to exemplify the analytical object. In this study, Frida's case illuminates all parts of the scheme in the scheme-technique duality. Thomas (2011) proposed a case study approach that guides the case selection: 1) establishing the subject, 2) defining the purpose, 3) selecting the approach and 4) determining the process and establishing the boundaries of the case.
1) Frida as the subject presents an interesting relation to the theoretical frame, as we are able to observe her articulation of her goals, which is an important aspect of Vergnaud's concept of schemes, and this could inform the analysis of the role of the tool in her establishment of a scheme of statistical distribution. In line with Thomas (2011), the case of Frida was not selected because she is representative of the larger population, that is, all students. Rather, the case was selected because it reveals an important aspect of the object, here the theory of conceptual fields. Students' goals are an indispensable part of their development of mental schemes according to Vergnaud's theory. Frida is orally clear about her goals, which makes them accessible for observation. Therefore, the selection of the case of Frida is based on the dynamic relationship between the four parts of Vergnaud's mental scheme and the accessibility to all four parts of the scheme of statistical distribution in the case of Frida.
2) The case of Frida makes it possible to explore the relationship between scheme and technique in the learning of statistics. The case study explores how a digital tool (TinkerPlots) can create certain types of situations that affect the development of the student's scheme.
3) According to Thomas (2011), the approach to a case study can be either theoretical or illustrative. The case of Frida is tied to the theory of conceptual fields and thus involves a theory-testing approach. The purpose is not to expand the theory but rather to empirically explore a special case in which a digital tool heavily shapes the student's scheme.
4) This case study focuses on a single, diachronic case, as described by Thomas (2011). The singularity of the approach pertains to the selection of a single student's learning path. Unlike studies that compare various learning paths, this research centres on a longitudinal investigation into Frida's unique learning process. The term 'diachronic' emphasises the commitment to tracking Frida's development over time. The study closely examines how Frida's understanding of data distribution evolves over the span of one year.

Analysis
Excerpt 1: Frida and Mathias compare the ages of their parents while they plot-stack-drag: They reject the 'ugly' representations and finally find the one that satisfies them
In the following excerpt, Frida and Mathias work on the task of comparing the ages of their parents based on gender. The task is formulated as follows:

A: Describe the shape of the distribution in your own words. You could consider questions such as the following: Are there any peaks? Are there one or more peaks? Is it symmetrical or skewed? Is it flat, or is there a high peak? How do the two distributions differ?
B: Make a conclusion about your mothers' and fathers' ages when they had their first child.

The following excerpt from the screencast recording follows Frida and Mathias, who have a pile of 48 data cards with the age and gender of their parents when they had their first child, to the point where they have a stacked dot plot connected with a line. The point at which they have a stacked dot plot is where they are able to solve the task and describe and compare the distributions of their parents' ages.
Frida and Mathias use a plot-sort-stack-drag technique. First, they drag a plot down, and the data from the table are spread out randomly as dots in the window (Fig. 1). Second, they sort the data by adding the attributes to the axes. Third, the students stack the data, sorting them into piles with the same case values. Fourth, the students drag the dots to the left or the right until a representation satisfies them. They also adjust the size of the plot window. In the following excerpt, Frida and Mathias also activate the mean and connect the stacks with a line.

Frida reads the task out loud.
Frida: Okay, now we need to get this one down. This is gender.
Frida drags the attribute 'gender' (Køn in Danish) to the vertical axis and age to the horizontal axis.
Frida: Well, I can't see anything.
Frida enlarges the window. Mathias points at the four white dots on the screen (Fig. 2).
Mathias: Like this, because this is Eva's parents.

Mathias points at the white dots.
separate the ages further and activates a 'draw line'. There are four choices for drawing lines in the dropdown menu. Frida tries different ways.
Fig. 5 The data are separated horizontally, the mean is calculated, and the 'draw line' feature is applied
Mathias: Wow. Shut up!
Frida: But, there are two of these. This is not what we wanted, is it?
Mathias: What does it even mean?
Frida: I don't understand that either.

Frida and Mathias draw the sketch on the paper. They move on to the next sub-task.
Mathias: It is not symmetrical. It is really not symmetrical. Or … a little bit there.
Frida: The mothers' are very flat.
Mathias: The fathers' has many peaks.
Frida: Yes. Well, there is one peak.
Mathias: There are two there. (Mathias refers to the two small peaks left of the highest peak.)
Frida: Yeah, but that is not really … The mothers' are very flat.
Mathias: But, there is one peak?
Frida and Mathias answer question C with the support of Fig. 6. Frida writes the following: 'The mothers' is very flat. There are not that many peaks. There is a very little peak at 26. The fathers' has a very high peak at 30 and a small peak at 28.'

Question C: 'The fathers are approximately 1.5 years older than the mothers. The mothers are more spread out over the whole diagram, whereas the fathers peak very big at 30.'
Frida and Mathias use informal words when describing and comparing the distributions. However, they succeed in doing so, and they also draw a conclusion about the fathers being older. Without using the formal term 'mean', they use the concept in their comparison. It seems evident that the tool played an important role and that Frida and Mathias tried several (for them) useless representations before they decided to use the diagram in Fig. 6 to support them in solving the task.
Focusing on the interaction with the tool, it is evident that Frida has an objective. It is not precisely articulated what her objective is, but it becomes visible when she is not satisfied with the feedback from the tool ('This is ugly' or 'This is not what we wanted, is it?'). Her objective does not coincide with the task, as the task does not ask for a specific representation. Nevertheless, the tool guides Frida's goal of solving the task, and she articulates the desire to be able to 'see'. This establishes some kind of rule of action: The representation Frida creates with the tool must expose visual characteristics of the distribution. The distribution is thus a concept-in-action, understood as a collection of data that has a shape that exposes certain features of a phenomenon, in this example, the differences in the age of the parents. Actually, the representation from Fig. 4 could also have helped her solve the task, but Frida found it ugly for some reason. Nevertheless, in the end it is possible to make an (informal) inference about differences between genders. Ultimately, Frida's experiments with different representations of the distribution make it possible to solve the task and make a comparison of the two genders.
The flexibility of the tool and the ability to plot, stack and drag the dots play an important role in Frida's development of the distribution scheme. As stated by Vergnaud (2013), schemes start as local organisations of activity and thus have a small field of application. In the case of Frida, the scheme is tool shaped and relates to the action of manipulating the representation until she is satisfied.
Data distribution becomes a concept-in-action for Mathias and Frida. Distribution is a meta-concept with several embedded sub-concepts. While the task does not ask for specific sub-concepts, Frida and Mathias handle both individual values of the data (Emily's parents) and the mean as representative of the ages of the mothers and fathers, and they describe the shape of the distribution in natural language.
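The stack step and the informal comparison of means can be illustrated with a minimal Python sketch. It does not reproduce TinkerPlots or the students' data: the cards are invented, and the function names `stack`, `render` and `mean_gap` are hypothetical helpers introduced here for illustration only.

```python
from collections import Counter
from statistics import mean

# Hypothetical (gender, age) cards; invented for illustration only.
cards = [
    ("mother", 26), ("mother", 26), ("mother", 27), ("mother", 29),
    ("mother", 31), ("father", 28), ("father", 28), ("father", 30),
    ("father", 30), ("father", 30),
]


def stack(cards, group):
    """Stack step: count how many cards of one group share each age."""
    return Counter(age for g, age in cards if g == group)


def render(stacks, low, high):
    """Draw each stack as a row of dots, one row per age on the axis."""
    return "\n".join(
        f"{age:>3} " + "o" * stacks.get(age, 0) for age in range(low, high + 1)
    )


def mean_gap(cards, a, b):
    """Difference between the mean ages of group a and group b."""
    def ages(group):
        return [age for g, age in cards if g == group]
    return mean(ages(a)) - mean(ages(b))


for group in ("mother", "father"):
    print(group)
    print(render(stack(cards, group), 26, 31))

print(round(mean_gap(cards, "father", "mother"), 1))
```

In this sketch the stacks make the peaks visible as long rows of dots, while the difference in means summarises the comparison in a single number, which mirrors the two kinds of statements in the students' written answers.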

Excerpt 2: Frida and Iris are explaining what they like about their favourite representation
The next teaching sequence takes place six months later. This time, the theme is indoor climate. The students read newspaper articles about a general problem of overly high CO2 levels in Danish classrooms. The articles claim that these high levels disturb students' ability to learn and cause headaches. In the articles, 1000 ppm is defined as the critical value that should not be exceeded. The students are to investigate if they have this problem in their classroom and report their findings to the school principal. The teacher and the researcher have installed a CO2 meter that measures the CO2 levels during the day in ppm. The students measure other attributes as well, including the school subject, time of day (categories: morning, forenoon, lunchtime and afternoon) and mood/energy level.
The dragging part of the plot-stack-drag technique is where Frida makes her final decision regarding whether the representation is good. Frida drags the dots back and forth until she reaches her goal. The three examples (Figs. 8-10) show the different degrees of separation when the dots are dragged horizontally. In Fig. 8, the dots are not separated at all. In Fig. 9, the dots are fully separated horizontally, and in Fig. 10, the dots are separated to a level where Frida and Iris are satisfied with the representation. In the following excerpt, the interviewer asks Frida and Iris to explain why they prefer this representation (Fig. 10).

Fig. 8 The CO2 data in two columns
Fig. 9 The CO2 data in ppm on the horizontal axis are completely separated
Fig. 10 The CO2 data are grouped into categories with a span of 300 ppm

Iris: So, now I have separated them like totally, so I think it's hard to see where this one is and where this one is.
Int.: What is it that you want to see? Now you're saying, Iris, you have such a nice way of expressing it, you say it's a bit difficult to see.
Iris: Well, it's a bit tricky to see where it's located without having to tap on it, then you can see that it's there. But, it can be a bit harder to spot. It's a bit easier to see here (Iris is referring to Fig. 10).
Frida: So, you can kind of make it more clear (Frida makes a curve with her hand).
Iris: Because the wave is very unclear (Iris is referring to Fig. 9).

This part of the artefact interview with Frida and her peer Iris clarifies her goal. In the first excerpt, it is evident that some representations did not fulfil Frida's goal. In excerpt 2, the interviewer asks Frida and Iris which representation they preferred and why.
Once again, the representation should make them able to 'see'. Iris explains that it is 'difficult to see' when the dots are dragged out the most (Fig. 9). Frida makes an important gesture when she explains what she likes about the representation in Fig. 10. She draws a wave in the air and explains that it is clearer this way. This shows that Frida and Iris are driven by a goal to create a representation with a smooth shape, which reveals some features. The students have developed a goal that is very closely related to distribution as a concept-in-action. They seek to view a collection of data as a coherent unit, which can be represented in a way that exposes features to the students. The plot-stack-drag technique becomes a rule of action (drag until satisfied with the representation) and implies a scheme of distribution according to which data can create a shape.

Excerpt 3: Frida and Mathias investigate data about the indoor climate: 'Why do they not stack properly?'
The third excerpt takes place six months later. The students are working on new tasks, but the teacher also asks them to revisit the CO2 data and write a letter to the editor of the local newspaper to start a debate about the problem of the indoor climate. Frida is now working with Mathias again. However, Mathias has some technical issues with a damaged file and does not participate in the dialogue. Frida continues to work on the task while attempting to help Mathias and sharing her work.
The task asks the students to investigate the data with the features they know from before: make a data card, plot the data, drag the data, draw attributes, drag lines, activate the mean, median and mode, and use the ruler to measure the spread. The teacher reminds the students of the common descriptors: means, medians, modes, ranges and minimum and maximum values. The teacher also introduces the students to the boxplot, shows them how to draw one and instructs them to include it in their representation of the data. In the following excerpt, Frida has made a plot, added attributes and dragged the dots into an acceptable representation. By now, this technique has become routine to Frida, and she performs it rapidly. Nevertheless, this time it does not satisfy Frida's goals.
The CO2 data take many different values. By contrast, the age data are categorical and are more compatible with Frida's plot-stack-drag technique. Frida is not satisfied with the representation, as her technique did not work. However, her conceptual understanding and her more diverse ways of describing the data allow her to proceed in spite of the representation being 'ugly'.

Frida reads the task aloud.
Frida: Okay, we can do that.
Frida drags a plot. Frida drags the dots to the right two times (Figs. 13 and 14). Frida activates both the mean and median. Mathias has some technical difficulties with the files. Frida tries to help him and returns to their common task afterwards.
Frida: Okay, now you can see. Now, I will just take a picture of this. I can send it to you.

Frida and Mathias write the following in their written assignment as a solution to the task, accompanying the screenshot:
As one can see, the mean of the CO2 measurements in September is 1152.73 ppm.
The first 50% are below 1000, and the last 50% are above 1000. The median, which is the point between these two 50% groups, is approximately 1000, but the average is higher. This is because there are some very high values. The high values have a greater impact on the mean than on the median. For example, if the value just above 2600 were 1000000, the median would remain the same.¹ So, the last 50% are higher than the first 50%, which are lower. The different 25% intervals have a wider and wider range as the values get higher. It spreads out more as the numbers increase.
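The students' reasoning about the mean and the median can be checked with a minimal sketch. The readings below are hypothetical values chosen only to resemble a data set with a few very high measurements; they are not the students' CO2 data, and the function name `summarise` is an illustrative assumption.

```python
from statistics import mean, median, quantiles

# Hypothetical CO2 readings in ppm (NOT the students' measurements),
# chosen so that a few high values stretch the upper end of the data.
readings = [400, 500, 620, 760, 920, 1080, 1300, 1600, 2000, 2600]


def summarise(values):
    """Return the mean, the median and the width of each 25% interval."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    lowest, highest = min(values), max(values)
    return {
        "mean": mean(values),
        "median": median(values),
        # Widths of the four quarters, from the lowest to the highest.
        "quarter_widths": [q1 - lowest, q2 - q1, q3 - q2, highest - q3],
    }


original = summarise(readings)

# The students' thought experiment: replace the highest value with 1000000.
extreme = summarise(readings[:-1] + [1_000_000])

print("original:", original)
print("extreme: ", extreme)
```

In this sketch, the median is identical in both summaries while the mean changes drastically, and the widths of the four 25% intervals grow from the lowest to the highest quarter, which mirrors the pattern the students describe in words.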
As excerpt 2 shows, Frida appreciates a representation that exposes the shape of the distribution. This is a valuable goal for Frida, allowing her to 'see' something, as she referred to it in excerpts 1 and 2. The plot-stack-drag technique has become a rule of action. She is very explicit that the plot-stack-drag technique does not satisfy her goal this time. This might be due to the data, which act differently than the categorical data on age in the first excerpt, but had Frida been more familiar with the tool, she might have succeeded anyway. She could have changed the size of the dots and the shape and size of the plot window and thereby created a representation of a smooth right-skewed distribution. This was not the case, and Frida was annoyed with the representation, blaming it on the tool ('Why don't they stack properly? It is so annoying.'). Frida's plot-stack-drag technique does not satisfy her objective of creating a smooth shape of the data. Therefore, she has to set aside her tool-shaped distribution scheme this time and revise it.
Fortunately, Frida does not give up, and since the first excerpt, her scheme of distribution has expanded with several other theorems-in-action. Here, she makes sense of the mean, median and quartiles in her description of the distribution. However, the shape does not seem to support her description this time. Her scheme of distribution now also consists of more formal descriptors, such as range, mean, median and boxplot (subschemes), which are now concepts-in-action and theorems-in-action. Hence, she exhibits the capacity to explore representations other than the one created by her plot-stack-drag technique. The boxplot and its representation of median and quartiles become a new set of concepts-in-action that support Frida in achieving her goals and solving the task.

Discussion
As stated in the introduction, TinkerPlots creates a landscape in which students can freely explore data and which supports them in the organisation of data. TinkerPlots frees the students from tedious calculations (Garfield & Ben-Zvi, 2004). In the case of Frida, it is clear how she uses the features of the tool to support her in solving the task. In the first excerpt, Frida and Mathias try different representations, rapidly shifting between them, before deciding which one best supports solving the task. Frida's desire to 'see' something in the representation when they compare gender differences might indicate that her idea of distribution encompasses data not only as a collection of individual values but also as a whole, which can expose certain characteristics like shape. In Frida's conceptualisation of the statistical distribution, her personal goals and anticipation might play a significant role.
It is not only important to make sense of formal statistical measures, such as means, medians, modes and ranges. The tricky part is to judge what is worthy of attention, and therefore the students' personal goals take on an enhanced role when making sense of data distributions.
The flexibility of TinkerPlots provides the students with a variety of choices. The students can adjust the dot plot, choose which attributes are interesting, make adjustments until the representation is just right, activate or deactivate different features and measures of centre, activate the ruler and so on. In her first experience with the tool, Frida quickly develops a technique of dragging a dot plot, stacking the dots, selecting the attributes and then dragging the dots back and forth until she is satisfied with the representation. The representation that satisfies her is the one that enables her to 'see something'. I refer to this gesture as her plot-stack-drag technique. Even the simplest and most modest plot-stack-drag technique allows the student to actively evaluate the feedback from the tool and to easily adjust it if it does not satisfy the student's objective. For Frida, this represents a relevant starting point when she moves from the individual values of the parents' ages to construct a coherent description of gender differences.
In the first excerpt, Frida and Mathias use the mean as the only formal descriptor, but in the last excerpt, several descriptors are embedded in the description. One of the major challenges in developing a conceptual understanding of descriptive statistics and the idea of empirical distribution is seeing the descriptors as representative of the whole distribution (e.g. Konold & Pollatsek, 2002; Mokros & Russell, 1995). It seems that TinkerPlots supports Frida in developing a conceptual understanding of the descriptors as representatives of the whole distribution. In the first excerpt, Frida uses the mean along with a description of the shape. In the third excerpt, Frida uses several descriptors. In her description of the quartiles of the CO2 levels, she does not report the quartiles as numbers. Rather, she has noticed how they increasingly spread out from the first to the last quartile. This description arises from Frida's interaction with the tool, as she identifies the (most) satisfying representation and then notices characteristic features of the distribution. This might be connected to her personal goal of creating a representation that enables her to 'see something', a goal that is tightly connected to her plot-stack-drag technique and her experiences with the digital tool. It seems that the metaphor of digital tools in the learning of statistics as fast modes of transportation (Biehler et al., 2013) is appropriate and beneficial in the case of Frida. Students must not only conceptualise statistical measures but also develop their judgement and anticipation. The process of learning to judge what is worthy of attention is non-trivial, and students must develop this competence in parallel with conceptual knowledge about statistical measures. It seems that the tool has supported Frida in this conceptual development.
However, in the third excerpt, it is evident that Frida's plot-stack-drag technique was challenging for her as she attempted to make sense of the CO2 data. The plot-stack-drag technique did not act as she anticipated, which obviously annoyed her. Frida's experience with TinkerPlots has established a certain kind of situation, as the technique supported her in solving the task, and in her first experience with data exploration, the plot-stack-drag technique supported her in drawing conclusions about the data distributions. As Vergnaud (1998) stated, situations and schemes are tightly connected. Frida's distribution scheme is shaped by situations she has already mastered. The tool-shaped scheme is relevant for her until she encounters a situation in which the data act differently in the tool. This relates to what Vergnaud (1998) called the perceptive-gestural scheme, which can be efficient for a whole range of situations and can generate a sequence of actions relevant to the situation. Fortunately, by the time of the third excerpt, the teacher had introduced the students to other ways of describing and representing data, and Frida's scheme had been expanded with the boxplot, which supported her in reaching her goal, which might have still been to 'see something'.
Although digital tools provide good support to students in their exploration of data and conceptual development, it is important to be sensitive to which tool-shaped schemes the students develop along the way. To support the students' development of statistical distribution schemes, it is important for the teacher to be sensitive to the duality between the techniques the students develop with the tool and how these techniques shape the scheme. In particular, there is something important to learn from the ways students' experiences with the digital tool shape their goals and anticipations. Frida is flexible and is quick to either develop a new sub-scheme or adapt the boxplot into her distribution scheme. However, the teacher should be aware that tool-shaped schemes that are fixed may present an obstacle hampering students' development of a rich and flexible statistical distribution scheme. The conflict Frida experiences when the tool does not serve her goal reveals a great deal of information. If the teacher engaged in a dialogue with the student to explore the origin of the student's personal goals, the teacher could understand important aspects of the student's conceptualisation of statistical distribution.
For instance, it is important for Frida to find a representation with a smooth shape that brings out features of the distribution. Indeed, by engaging in a dialogue to explore students' goals and anticipations, the teacher could contribute to the expansion of the students' schemes. In this case, an expansion could have been supported in two ways if the teacher had explored Frida's dissatisfaction with the feedback from the tool. First, it could have led to an investigation of the digital opportunities, and the students could have learned new ways to handle the tool, such as adjusting the window, the size of the dots and the bin size. This might have led Frida to succeed in finding a satisfactory representation with her plot-stack-drag technique. This success, in turn, could have expanded the range of possible actions available to her, ultimately increasing her readiness to handle various situations in the data exploration process. Second, a dialogue with Frida and her peers about why the tool did not 'stack properly' could elicit a reflection on how to handle various kinds of data, that is, differences between categorical and continuous variables. Crucially, the personal goals and anticipations that students develop from their interactions with the digital tool present learning opportunities if they are brought into the light and explored through dialogue.
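Why many-valued data do not 'stack properly' until they are grouped can be illustrated with a short sketch. The readings are hypothetical, the helper `stack_heights` is an illustrative assumption, and the code does not reproduce how TinkerPlots bins or stacks dots internally; only the 300 ppm span is taken from Fig. 10.

```python
from collections import Counter

# Hypothetical CO2 readings in ppm (NOT the students' measurements).
readings = [400, 500, 620, 760, 920, 1080, 1300, 1600, 2000, 2600]


def stack_heights(values, bin_width=None):
    """Count how many dots would pile up at each position.

    With bin_width=None every distinct value gets its own stack;
    otherwise each value is assigned to the lower edge of its bin.
    """
    if bin_width is None:
        positions = values
    else:
        positions = [(v // bin_width) * bin_width for v in values]
    return dict(sorted(Counter(positions).items()))


raw = stack_heights(readings)          # one stack per distinct value
binned = stack_heights(readings, 300)  # bins with a span of 300 ppm

print("unbinned stacks:", raw)
print("300 ppm bins:   ", binned)
```

With these values, every unbinned stack contains a single dot, so no shape emerges, whereas the 300 ppm bins produce stacks of different heights. Data with few distinct values, such as the parents' ages, form stacks without any such grouping, which is one way to frame the contrast between the two kinds of variables in a classroom dialogue.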
The case of Frida is a key case as defined by Thomas (2011). This case was selected due to the interesting dialectical relationship between the subject, Frida, and the object, the theory of conceptual fields. Frida was remarkable because of her explicitness regarding her satisfaction and dissatisfaction with the feedback from the digital tool, which gave access to interesting parts of her statistical distribution scheme, particularly how the digital experiences shaped her personal goals and anticipations. The results of this paper reveal an important dialectical relationship between the plot-stack-drag technique in TinkerPlots and how it formed Frida's goals and anticipations. This finding enhances an important aspect of Vergnaud's theory of conceptual fields. Specifically, it illustrates how digital experiences can shape perceptive-gestural schemes and the conceptualisation of statistical distribution. The time span of one year allowed the exploration of how the technique and personal goal of finding the right representation followed Frida from one situation to the next.
However, this study presents opportunities for further research. The study of Ben-Zvi and Ben-Arush (2014), as noted above, contributed different student profiles of instrumentation in exploratory data analysis with TinkerPlots. Bakker and Hoffmann (2005) explored the role of digital tools in students' learning trajectories. In the case of Frida, the focus was on the dialectical relationship between the student's goals and the technique developed with the digital tool. To enhance our comprehension of the intricate link between students' personal goals and the techniques they employ with digital tools while learning statistical concepts, it could be intriguing to investigate additional cases, particularly if variations exist in student profiles when monitoring the learning paths of other students.

Conclusion
The research question of this study focused on how the interaction with the digital tool TinkerPlots could shape students' conceptual development of the notion of data distribution and how the students' tool-shaped scheme challenges or supports their further conceptual development as they progress to new situations. The analysis of the case of Frida showed how her plot-stack-drag technique shaped her personal goal of finding a representation that enabled her to 'see'. The analysis revealed that this personal goal supported her conceptual development of the idea of statistical distribution. However, the study also showed that the technique could lead to disappointment when the student fails to identify a representation that fulfils this goal. This disappointment required Frida to reshape her distribution scheme.
Overall, the study highlights the importance of teachers' awareness of the personal goals that students develop when interacting with TinkerPlots. It is crucial to bring possible conflicts to the fore in order to exploit the learning opportunities such conflicts might contain. The findings also suggest a direction for further research. Specifically, future research could explore the dialectical relationship between the techniques students develop when using digital tools for data exploration and the personal goals and anticipations these techniques form along the way.

Statements and declarations
The author, Stine Gerster Johansen, has no financial or non-financial interests that are directly or indirectly related to the work submitted for publication.

Fig. 1 Data cards (to the left) and dot plot (to the right)

Fig. 2 The data are sorted by gender in vertical columns and by age in horizontal rows.

Fig. 6 A line that connects all the dots is drawn

Fig. 7 A line is drawn to connect the stacks

Figs. 8-10 present data from the students' study of the CO2 levels. The measurements vary from around 400 ppm to 2600 ppm. The same data are the basis of the task in excerpt 3, where Frida is making sense of the data, but excerpt 2 stems from an artefact interview with Frida and Iris focused on why they prefer the representations they do and, more generally, what they value in the representations they create with TinkerPlots. The aim was to gain a deeper understanding of what guided Frida's techniques when she sought the representation that fulfils her goal. The goal is not, as shown in the previous excerpt, explicit. Nevertheless, Frida is clear when she does not reach her goal.
Int.: Frida, you're saying, you can make it a bit clearer, and then you do this with your hand. What do you like about it? What does the hand movement mean?
Frida: It's just very easy to figure out what's happening or something like that. It's very... I don't know. You can see it quite well. It's easy to understand, unlike the other one, for example, if it's all scattered.

Fig. 11 A table on the left and a dot plot on the right

Fig. 13 The dots are dragged to the right once and grouped into two categories with a span of 2000 ppm

Fig. 15 A box plot is added to the chart

Fig. 16 The 'equal count division' feature is activated and applied to the chart

Fig. 17 The mean and the median are applied, represented by the blue triangle and the red 't' symbol, respectively

Fig. 18 Numeric values are added to the mean and median