Introduction

In some educational contexts, finding a general rule for a data set, often presented as pairs of independent and dependent values or as sequential data, is a typical task in school algebra (see Rivera, 2013). Identification of an underlying rule, mainly for linear relationships, has been addressed in research. This paper contributes to this area by aiming to identify strategies for simple quadratic sequences (monic two-term) used by relative experts with onscreen data, in order to inform future approaches to teaching. In this study, we asked trainee teachers to find nth term rules given the first few terms in a sequence of numbers, a process of generalizing from sequential data, with a particular focus on quadratic sequences.

In a previous study, we used eye-track software to develop conjectures about the ways in which individuals collect and coordinate information when presented with data from which they might identify general relationships. Crisp, Inglis, Watson and Mason (2012) analysed the ways in which a small sample of participants looked at data pairs presented in a vertical tabular form when identifying linear and quadratic rules from the first few terms of a sequence. For example, for a sequence generated by m = n² + n, the first six terms (1, 2), (2, 6),…, (6, 42) were given in a tabular format. The only significant finding was that participants tended to repeat particular personal search behaviours. We analysed eye-track data quantitatively in terms of the number and order of horizontal and vertical saccades.

In this paper, we investigate search behaviours further by drawing on a larger sample and on qualitative analysis of eye-tracking data. Also, being aware that format has a major effect on how a screen is ‘read’ (Canham & Hegarty, 2010), we investigate in what ways the format of task-data may have influenced participants’ behaviours. To this purpose, we engage the participants with a variety of formats: tabular, scattered data pairs and sequence of values of the dependent variable (sequential). We conjectured that sequential presentation would require imagining the missing independent variable; also, data given as scattered data pairs could disrupt any tendency to make assumptions about sequential patterns.

We first formulate our research questions through a brief overview of research on generalizing from sequential data. Then, we discuss research that uses eye-tracking with specific attention on how eye-gaze behaviour during reading onscreen mathematical representations might support inferences about strategies. We then describe the design of our study and present the main findings about strategies with examples from our data. We conclude with a discussion on the findings and their interpretation.

Generalization from Sequential Data

There is considerable research about generalizing sequential data, particularly for linear relationships (e.g. Carraher, Martinez, & Schliemann, 2008; Radford, 2008; Rivera, 2013). Such tasks are a component in teaching and testing early algebra in several educational contexts worldwide. For example, “generalizing from sequential data” given the first few terms is an explicit expectation in the national curriculum for mathematics in England and children from 11 years are tested on such tasks (DfE, 2013, p. 7). Here, we use quadratics as the simplest and most familiar class of non-linear functions. If generalizing from sequential data is a useful school mathematical activity, we would expect people who have worked successfully on generalizations of linear sequences to also have some expertise in expanding that skill to other functions commonly taught in school. Familiarity with non-linear sequential data can support work on modelling. However, few researchers have looked beyond linear sequences and specifically at quadratics.

Zaslavsky (1997) provides the most thorough examination (n = 800) of student strategies in working with a quadratic function. However, her study focuses on relations between graphical and algebraic representations and provides no direct insights into the use of sequential data, except that students typically confuse linear and quadratic features. Ellis and Grinstead (2008) found that their eight participants made assumptions about symbolic representations of quadratics based on what they knew about linear functions. The closest work we have found to our enquiries is from Yeşildere and Akkoç (2010) who gave 147 pre-service teachers two sequences of diagrams generated by quadratic growth, and identified two generalization approaches in participants’ self-reports. Where the given diagrams were visually related to squares, 43% were able to give an explicit full formula derived from n² and 30% gave a recursive relation for a term to be generated from the preceding one. In the other task, the diagram was not visually related to a square and 24% gave a full formula, while 45% gave a recursive relation. This suggests that recursion could often be a default strategy. Several researchers have suggested that task presentation of the first few values of a sequence may influence students’ approaches; presenting data in order might encourage the recursive approach (e.g. Steele, 2008). In previous studies, students have tended to focus on differences in the dependent variable (e.g. Confrey & Smith, 1994). For linear sequences, novice learners often recognize constant difference and give recursive descriptions about repeated addition or subtraction. It is likely that tabular presentation tends to mask any need to pay attention to the independent variable, often given as consecutive natural numbers in such tasks. Following these observations, we wanted to investigate whether the use of varied presentations in our design (i.e. tabular, numerical sequences without the visual presence of the independent variable, or scattered data pairs) influences solvers’ strategies.

We expected behaviour suggesting a search for the relationship between a value of the independent variable and the associated dependent variable (correspondence) and also behaviour suggesting that changes in the dependent variable are associated with changes in the independent variable (covariation). Such approaches have the potential to lead to appropriate generalization. In the study by Yeşildere and Akkoç (2010), strategies included construction of partial formulae by correspondence or recursion, finalized using particular cases. Foster (2004) lists possible formal methods for generalizing polynomial-based sequential data that might be taught in schools. We expected that responses to scattered data pairs might include attempts to construct correspondences between independent and dependent variables, which Foster refers to as by inspection. We also expected some familiarity with methods that manipulate successive differences, such as reducing the degree of a function, or comparison with a general case (Cuoco, 2005; Foster, 2004).

In this paper, we investigate strategies in finding the nth term for simple quadratic sequences adopted by those who are already confident with numbers and elementary algebra. We invited graduate pre-service mathematics teachers, all of whom had mathematics-related degrees, as our participants in order to investigate these research questions:

RQ1: What strategies do our participants use to generalize quadratic sequences?

RQ2: How are these strategies affected by different formats (vertical tables, scattered data pairs and sequences of numbers)?

Inferring Meaning from Eye-tracking Data

The number of educational studies that include analysis of gaze patterns has been growing (e.g. van Gog & Scheiter, 2010; Was, Sansosti, & Morris, 2017) including in mathematics education (e.g. Barmby, Andrà, Gomez, Obersteiner, & Shvarts, 2014). Eye-tracking software monitors the movement of the eye-gaze while a participant interacts with content, which in our study is on screen. The software provides several kinds of data about where, and for how long, eye-gaze is focused during an onscreen task. Here, we use the following conventions: ‘fixation’ means the position of the gaze; ‘fixation duration’ indicates time spent on that fixation; movements between fixations are called ‘saccades’, which may be fast or slow. ‘Area of interest’ (AoI) indicates a cluster of nearby positions that may have significance for the study (e.g. they all relate to a particular number or symbol). Attention to designated AoIs, of which participants would be unaware, is the focus of the analysis (e.g. Panse, Alcock, & Inglis, 2018).

Studies on the relationship between gaze and cognition focus mainly on two areas: the development of reading and the attention-directing role of layout and displayed items. A study of generalizing from sequential data involves elements of both of these. Onscreen data has something in common with reading text, in that meaning can be inferred by reading in a particular order. Because the data is patterned and sequential, there is the possibility of some automatisation in this process. However, unlike text, the order of reading might be decided by the participant and might also be led by the layout. In this sense, we see similarities with reading screen graphics, where the gaze sequence is influenced by the arrangement and salience of particular features, and also by having goals and subgoals (Rayner, 1998). In our study, goals were introduced by the researcher (to find a formula) and participants had to construct subgoals to achieve this goal. It is the enactment of such subgoals that interests us.

When people have an element of choice in how they seek and coordinate data, there may be differences in the behaviour of novices and experts in the subject matter (e.g. Ainsworth, 2006). Chumachenko, Shvarts, and Budanov (2014) compared eye movements of people with three ‘levels’ of mathematical expertise when ‘reading’ the Cartesian coordinate system. Experts were more likely to select essential data and bring other information to bear than those with less experience. Interpreting conventional formats is a learnt skill. In this sense, onscreen data about quadratic sequences could be read by experts in part as a diagram, that is a spatial structure of mathematical inscriptions, letters, numerals and punctuations, having mathematical relationships among its parts (Dörfler, 2016). In Dörfler’s view, knowing how to work mathematically with a diagram is a practice, and therefore would be expected to have novice-expert dimensions. Hence, interplay of self-directed gaze and the effects of format interest us, the former suggesting deliberate mathematical subgoals, possibly arising from prior knowledge, and the latter being as yet unexamined.

The effect of prior knowledge on fixation includes short-term “episodic knowledge” (Canham & Hegarty, 2010, p. 156), that is, the effects of what readers do earlier in the episode, and the knowledge they accrue, can influence later gaze. Haider and Frensch (1999) develop a hypothesis that, as expertise grows, people reduce the amount of information they collect. Thus, emergent patterns of gaze have at least two sources: participants’ prior knowledge and experience, and knowledge and experience developed during the study.

All the studies cited so far assume direct connections between gaze and cognition. An alternative view of the plausibility of connections between gaze and cognition arises in the research of Knoblich, Ohlsson, and Raney (2001). Their work focuses on the eye-gaze behaviour associated with being stuck on a problem: when “[h]e or she does not know what to do next, so he or she tends to stare at the problem without testing particular solution ideas” (Knoblich et al., 2001, p. 1002). These long fixations occur after initial explorations of the data. Long fixations can indicate deeper processing, but they could also signal an impasse. The authors characterize actions between impasse and insight as: looking at the problem differently; selecting different data; or gradual transformation of what is known.

Many past studies are about reading text. For example, Just and Carpenter (1980, p. 330) offer an assumption that “the eye remains fixated on a word as long as the word is being processed”, for example, by thinking about meanings already accumulated and how this new ‘word’ relates to them. Our participants do not come to the tasks tabula rasa so may make initial choices about which terms can be read with meaning for the task. They might also use information that is not on the screen. Another useful aspect of Just and Carpenter’s reading model is that the end of a fixation is taken to indicate a move to get further input.

Of relevance to our study, it has been found that analysis of gaze patterns supports the identification of mathematical strategies (Beitlich & Obersteiner, 2015) such as: calculation (Green, Lemaire, & Dufau, 2007); comparison of fractions (Huber, Moeller, & Nuerk, 2014; Obersteiner et al., 2014); and using the number line (Sullivan, Juhasz, Slattery, & Barth, 2011). To avoid over-interpretation of specific fixations and the gaze-cognition connections, a combination of methods is advised: “eye tracking in combination with verbal reports could be reasonable to validate the data” (Beitlich & Obersteiner, 2015, p. 97). For this reason, to support our analysis, we made use of spontaneous utterances by participants combined with our own appreciation of the mathematical structures that they seemed to be using.

The Study

The study involved 18 graduates who were following a masters level mathematics teacher training programme in 2015 at an East Midlands English university. All participants had degrees in mathematics-related subjects, thus having secure relevant numerical and algebraic competence in arithmetical operations and algebraic expressions. They had all been successful in a mathematics curriculum which included generalization of sequence data at elementary and advanced levels, but we did not know if they had used taught or ad hoc methods for non-linear data. They would all be expected to teach such generalizations in the future. We needed participants who could engage with such tasks, although they might not be able to solve them. Recruitment was opportunistic: an invitation went to a cohort of a mathematics teacher training programme and all volunteers participated.

In our research the screen and the seating were normal; there was no special headgear or seating. The technology requires participants to carry out the task ‘on screen’, without access to pencil and paper, as writing would prevent us from following the eye movements. Aware of this limitation, we designed sequences (linear and monic two-term quadratics) which we felt these participants could complete mentally, given their academic background.

The gaze of each participant was calibrated and adjusted for individual differences as far as possible. The accuracy of the eye-tracking is dependent upon several factors, such as features of the participants themselves, lighting in the room, stimuli properties, and head position. We ensured that the head position of each participant was within a suitable range and that the illumination in the room was consistent throughout the study. The stimuli on the screen were also kept consistent by using the same format dimensions and font size across all the tasks. Lastly, the distance between each AoI was such that differences of accuracy should not affect clarity about which AoI the gaze was focused on, while allowing us also to observe fixations on other features, such as spaces between values. We did find with some participants that the heatmaps showed a consistent displacement, but always by the same translation. Data from P14 was excluded from analysis since gaze data did not appear to relate to screen images, possibly due to an eye condition.

Nine tasks (Table 1) were presented, one at a time. Participants were told that they would see sequences of numbers and that their task was to try to find a general rule for each sequence; however, they were not aware of what type of sequences they would meet. There was no time limit. The time taken for each participant to do all the tasks (possibly passing on some) varied between 6 and 23 min. Participants were told to say out loud a rule when they felt they had found one. Some spontaneously stated out loud some key aspects they noticed along the way, such as when they knew it involved n² or when they identified a sequence of differences between successive terms. When such comments were made, the researcher noted the comments along with the approximate time they were made (Fig. 10). The exact time was then verified when analysing the captured data of eye movement and audio within the eye-tracking software.

Table 1 The rules and presentation type of each of the nine tasks for each group of participants

The tasks were a mixture of six quadratic and three linear sequences. All participants had the same tasks in the same order (but not in the same form). The nine tasks were organized into three sets of three tasks (tasks 1, 2, and 3; tasks 4, 5, and 6; tasks 7, 8, and 9 in Table 1). Each set of three included two quadratic and one linear rule. Some quadratics were of a form that could involve factors (e.g. n² − 2n or n(n − 2)) and some others were not (e.g. n² + 7). Also, one quadratic in each set involved a subtraction and the other an addition. To keep attention on the screen, the tasks had to be manageable by mental methods as far as we could predict. Hence, all the quadratic formulae had one as the lead coefficient and all could be expressed as two terms.

For RQ2, we were interested in the effect of having the middle three tasks presented in sequential or paired-data form. To this end, the 18 participants were placed into one of three groups of six. Group 1 had nine tabular forms. Group 2 had the middle three tasks presented as scattered data pairs with the others in tabular form. Group 3 had the middle three tasks as sequences of numbers showing the first six values of the dependent variable, with the others in tabular form. Table 1 summarizes the tasks and their design.

By starting and then returning everyone to the familiar tabular format for the first and then the last three tasks, respectively, we hoped to find out whether working with different presentations might affect competences or strategies. Different presentations of task 5 in tabular, scattered data pairs, and sequential forms are shown in Fig. 1.

Fig. 1 Three forms of presentation: tabular, scattered data pairs, and sequential

For our analysis, each onscreen number was designated as an ‘Area of interest’ (AoI) and attention to these and other fixations is our main source of data. The eye-tracking software (Tobii Technology, 2010) recorded eye-focus position every 1/60th of a second during these tasks. The software can transform the data into heatmaps, videos of eye-gaze trajectories, strings of position references and other formats. We used the data in two main ways. We used heatmaps to gain a sense of where the eyes were looking on the screen as a proportion of a specific time interval. Maps are coloured from green, through yellow, to red, with red indicating the areas which were looked at for a longer time. In greyscale, the red parts are darker. Figure 2 shows heatmaps from two participants while completing task 9. References to source data throughout the paper are P: Participant; T: Task number; V: Video; N: Field Notes; and H: Heatmap.

Fig. 2 Heatmaps from participants P2 (left) and P17 (right), P2-T9-H and P17-T9-H

When it was relevant, we produced heatmaps for shorter time periods when a participant had verbally expressed an intermediate result. For example, Fig. 3 shows two heatmaps for Participant 11 (P11) when working on task 3. The left map covers the time until they said “goes up from 3 to 5 to 7 to 9 to 11”; the right map runs from this expression of differences to when they stated a final formula, and shows a different pattern of foci.

Fig. 3 Two stages of Participant 11 working on task 3, P11-T3-H

We also used videos of eye movements tracked on the screen in real time, showing saccades as lines and fixations as blobs, and fixation durations as growing blobs (Fig. 5). Three of the authors viewed data from 12 participants with an overlap of six participants between each pairing of authors. Thus, each participant’s data was analysed by two researchers. Of 18 participants, 17 were analysed. Each researcher started by identifying sequences of fixations and saccades that could be mathematically significant. This led to the joint development of our analytical skills and processes for this type of data. It was agreed that the unit of analysis was not at the level of every move and gaze, merely providing a verbal version of the video data, but was at the level of groups of actions from which some mathematical meaning could be reasonably inferred. For example, saccades were understood to be a rich source of data since they could be interpreted as comparing or relating values. Discussions in researcher pairings led to agreements about the identification of significant episodes. The process of reaching agreement was for each pair to compare their identification of episodes for every participant for every task. Where there was disagreement about an episode, they would watch the video and other data again and reanalyse, together. Sometimes this led to splitting or combining episodes, sometimes to reinterpretation or additional interpretation. Movements evidenced in the videos were summarized as exemplified in Fig. 4. The three researchers then met and used their experiences of this process to confirm the nature of the evidence, the processes of identifying episodes, the processes of interpretation, and to start the process of labelling different strategies used. Interpretation of every episode was therefore agreed by at least two of the three researchers. We then summarized these episodes by constructing narratives about how each participant had tackled each task.

Fig. 4 Video observation notes for Participant 9 in Task 2

In previous research (Crisp et al., 2012) horizontal and vertical saccades had been the units of analysis, rather than fixations, with no additional data to support inferences about relationships between attention and cognition. Here, we followed the advice of Hyönä (2010) to combine on- and off-line data to support inferences about connections between perception and cognition. In line with Beitlich and Obersteiner (2015), we supplement eye-tracking data with the soundtrack of spontaneous interim utterances and final results, with a researcher occasionally intervening to ensure clarity. The soundtrack, clarified by contemporaneous researcher notes, gave points in time at which some recognition, confirmed by verbal utterances, had occurred. However, we did not depend on spoken and written commentaries but used these, where available, to validate our interpretations.

Our narratives were necessarily incomplete with several saccades and fixations for which we have no plausible interpretation and no additional data, but the narratives do contain many episodes of mathematical significance and they support plausible conjectures. We then sought other evidence that could validate those conjectures. For example, the form in which a final formula was stated was an indication about how the participant might have arrived at it, such as by factorizing (see Fig. 7).

The narratives synthesized collections of phenomena deemed worthy of further discussion according to our research questions, to the literature, to our initial assumptions and to our observations. Once the pairs of researchers had developed these narratives, we sought common threads across participants and phenomena. This paper reports the common threads relevant for our research questions for the quadratic cases.

Strategies for Generalizing from Quadratic Sequential Data

To address our first research question, we identified and categorized strategies evidenced as sequences of episodes used by participants to gather sequential data, whether they found a formula successfully or not. The following strategies were identified: Sequence of Differences; Building a Relationship (with two substrategies: Building from Square and Factor Search); Known Formula; Linear Recursive; and, Initial Conjecture. In the next sections, we describe these strategies with examples from the data, where all audio mathematical utterances have been transcribed as mathematical text.

Sequence of Differences Strategy

The Sequence of Differences strategy involves comparing consecutive values of the dependent variable with stepwise processes, such as calculation of differences by subtraction or identification of the gap between one value and the following one. This strategy may involve checking all the given values or examining some of them, usually the first few values of the sequence. We consider, as evidence of this strategy, saccades between, and fixations at, consecutive dependent values (Fig. 5). Sometimes this was validated by relevant verbal statements. These saccades moved between consecutive dependent values vertically in the tabular representation, horizontally in the sequential representation or between pairs in the data pair representation. For those who appeared to use the Sequence of Differences strategy, the focus was mostly on the dependent values. For example, the gaze of P10, in the first 17 s (0:00–0:17) of their engagement with task T3 (m = n² + 7 with tabular representation), moved vertically, mostly in the m column, as follows: started from the top values, continued down to higher values, then returned to the top values and scanned through all the pairs of consecutive values (Fig. 5) before saying “again 3, 5, 7, 9, 11”.

Fig. 5 Sequence of gaze movement, P10-T3-V (0:00–0:17)

The associated heatmap also indicated that the majority of the gaze activity was on the m column of the data table. P10 identified the difference between consecutive values of the dependent variable; however, they were not successful in finding the formula. For the remaining time between identification of differences and giving up on the task, gaze focused on the top pair values. They may have been trying to build a relationship (see Building a Relationship strategy below) between m and n, without success, or they may have been at an impasse and had not engaged further strategies (Knoblich et al., 2001). Figure 3 showed a participant who used a successful strategy after an initial search for differences.

Most relevant research shows the Sequence of Differences strategy to be a common strategy for linear data, and we were not surprised to see it in the quadratic case; the sequential layout may have supported this approach and eventually participants might have realized that some tasks were linear and some not, so used it as an initial sorting strategy. Use of this strategy did not always lead to the correct formula. It was also common for it to be followed with periods of apparently random movement across the table, or long repeated fixations on single small values, which we interpreted as an impasse. P10 identified the difference in all the quadratic tasks but did not manage to use this information to find any of the formulae in quadratic tasks (see Tables 2, 3, and 4), although successful with the linear data.
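The arithmetic behind the Sequence of Differences strategy can be sketched in a few lines of Python. This is an illustrative sketch only, not part of the study's method: the function names `differences` and `classify` are hypothetical, and the data are the first six terms of task 3 (m = n² + 7), for which P10 reported the differences 3, 5, 7, 9, 11.

```python
def differences(values):
    """Return the gaps between consecutive values of a sequence."""
    return [b - a for a, b in zip(values, values[1:])]


def classify(values):
    """Label a sequence by the first level at which differences are constant.

    Hypothetical helper: it mirrors the 'initial sorting' use of differences
    (linear versus not linear) described in the text.
    """
    first = differences(values)
    if len(set(first)) == 1:
        return "linear"
    if len(set(differences(first))) == 1:
        return "quadratic"
    return "other"


# First six terms of task 3, m = n^2 + 7, for n = 1..6.
terms = [n * n + 7 for n in range(1, 7)]   # [8, 11, 16, 23, 32, 43]
first = differences(terms)                  # [3, 5, 7, 9, 11]
second = differences(first)                 # [2, 2, 2, 2]
print(first, second, classify(terms))
```

Note that the constant second difference identifies the sequence as quadratic but does not by itself give the formula, which is consistent with participants who found the differences and then reached an impasse.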

Table 2 Strategies, in order of use, identified for the quadratics in the group of participants where tasks 5 and 6 were all in vertical tabular form
Table 3 Strategies, in order of use, identified for the quadratics in the group of participants where tasks 5 and 6 were in sequence of numbers form
Table 4 Strategies, in order of use, identified for the quadratics in the group of participants where tasks 5 and 6 were in scattered data pairs form

Building a Relationship Between the Independent and Dependent Variables Strategy

The Building a Relationship strategy involved examination of relationships between independent and the corresponding dependent variables. This included building from the value of the independent variable towards the value of the dependent (e.g. in the data pair (5, 15), building from 5 to 15 might be 5 + 10 = 15 or 5 × 3 = 15) or vice versa, starting from the value of the dependent variable and finding how it can be expressed in order to include the value of the independent variable (e.g. again in the pair (5, 15), 15 can be written as 3 × 5, so 15 = (5 − 2) × 5). This may involve checking all the given values or examining special cases that suggest possibilities. Repeated and sustained horizontal saccades between pairs of dependent and independent values in the tabular and data pair representations, sometimes accompanied by relevant verbal statements recorded in the field notes, and sometimes by the form of the final rule, were taken as evidence of Building a Relationship. In the sequence form, where the independent variable was not shown, evidence was sought either from the verbal statements or plausible mathematical interpretation of the proposed formula. Our analysis suggested two distinctive forms of relationships that were evidenced in the eye-tracker data and participants’ statements, both of which were dependent on participants recognizing particular structures in certain m values: Building from Square and Factor Search. We discuss and exemplify these further below. It is possible that there were other Building a Relationship strategies that depended on characteristics of particular values, but we could not identify convincing evidence to report any other specific strategies; hence, we keep Building a Relationship as a general strategy within which there were two identifiable substrategies.

Building a Relationship — Building from Square Strategy.

The Building from Square strategy was detected when a participant stated that “[it is] n²” before suggesting the final formula, and/or presented it as “n² ± something”. For example, P11 in task 5 initially attended mainly to different values of f(n) (see Fig. 6, left) before saying “[I] notice from previous 3, 5, 7, … so n²”. After establishing that it was a quadratic, their eye movements changed: saccades went from n to f(n) (see Fig. 6, right), we assume in order to establish the constant term in the formula.
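One way to picture the arithmetic of Building from Square is to subtract n² from each dependent value and inspect what remains. The Python sketch below is illustrative only: the function name `square_residuals` is hypothetical, and the example data come from the two rules named in the text (n² + 7 and n² − 2n), not from any participant's recorded working.

```python
def square_residuals(pairs):
    """For each (n, m) pair, return m - n^2: the 'something' in n^2 ± something."""
    return [m - n * n for n, m in pairs]


# Task 3 rule, m = n^2 + 7: the residual is the same constant for every pair.
pairs_plus_seven = [(n, n * n + 7) for n in range(1, 7)]
print(square_residuals(pairs_plus_seven))   # [7, 7, 7, 7, 7, 7]

# Task 2 rule, m = n^2 - 2n: the residual is not constant but is linear in n.
pairs_minus_2n = [(n, n * n - 2 * n) for n in range(1, 7)]
print(square_residuals(pairs_minus_2n))     # [-2, -4, -6, -8, -10, -12]
```

A constant residual corresponds to the constant term that saccades from n to f(n) were assumed to be establishing; a residual that varies with n indicates that a further term in n is still to be found.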

Fig. 6 Typical saccades before and after noticing the differences and n², P11-T5-V

Building a Relationship — Factor Search Strategy.

In the Factor Search strategy, the values of f(n) are factorized and these factors are seen in relation to the corresponding n values. This activity can also be seen in the opposite direction, when the n value is multiplied to reach the corresponding f(n) value (e.g. in the pair (4, 28), 28 = 4 × 7 = 4(4 + 3)). This strategy was evidenced by gaze activity from n to f(n) or vice versa, accompanied by the verbal statement of the formula in a factorized form. For example, P6 (Fig. 7) started the investigation in task 2 (n² − 2n) by checking across the table, vertically and horizontally. They identified the differences of the m values: “up 1, 3, 5, 7” and then they did not know how to use the differences to identify the formula. They even considered the possibility of giving up when they asked: “can I pass?”. However, they kept navigating across the table, without a distinctive saccade pattern, up to the point the eye-tracker captured horizontal saccades between 5 and 15; then between 6 and 24; then they said “Mmmm” while the gaze was moving towards the top of the table; and finally said “n(n − 2)” before moving to the pair (4, 8). We interpreted this gaze behaviour as a factor search, possibly starting with an insightful conjecture, where the relationship between n and f(n) is built by seeing n as a factor of f(n). We cannot always distinguish in individual tasks between deliberate Factor Search and the more general use of specific numbers leading to initial conjectures. Particular data pairs, such as (5, 15), can be eye-catching for participants. We can, however, distinguish between the final proposal of factorized or non-factorized forms, and whether the same participant then appeared to seek factors in subsequent tasks. P6 might have attempted to use factors again in the next task (task 3), as their saccades were similar to the previous task. However, this strategy would not be productive since the rule was not one which factorized (n² + 7). It did not appear that P6 used the Factor Search strategy again.

Fig. 7

Sequence of screenshots leading to ‘n(n − 2)’, P6-T2-V (1:21–1:37)
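The arithmetic behind a factor search can be sketched in a few lines of Python. This is only an illustration of the mathematics, not a model of any participant's reasoning: the function name `factor_search` is hypothetical, and the sketch assumes that each table lists n = 1 to 6 and that the rule sought has the monic form f(n) = n(n + c).

```python
def factor_search(pairs):
    """Return c if every pair satisfies f(n) = n * (n + c), else None."""
    offsets = set()
    for n, value in pairs:
        if n == 0 or value % n != 0:
            return None  # n is not a factor of f(n), so the search fails
        cofactor = value // n      # e.g. 15 = 5 * 3 gives cofactor 3
        offsets.add(cofactor - n)  # compare the cofactor with n itself
    # A single shared offset means the same c works for every pair.
    return offsets.pop() if len(offsets) == 1 else None


# Assumed data: the first six terms of each sequence.
task2 = [(n, n * n - 2 * n) for n in range(1, 7)]  # (1, -1), ..., (6, 24)
task3 = [(n, n * n + 7) for n in range(1, 7)]      # (1, 8), ..., (6, 43)

result_task2 = factor_search(task2)  # -2, i.e. f(n) = n(n - 2)
result_task3 = factor_search(task3)  # None: 11 is not a multiple of 2
```

For task 2 every cofactor is two less than n, which matches the factorized form n(n − 2); for task 3 the search breaks down at the pair (2, 11), in line with the observation that this rule does not factorize.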

Known Formula Strategy

The clear application of an algorithm to identify a polynomial formula from a given set of data was used by only one participant (P13). In task 2 (m = n² − 2n), they looked at values in the m column (Fig. 8) and said: “OK so difference is changing 1 3 5 7 9”. Then they checked the top pair and the m column first, while saying “So, it will be −1 … +”, then “n − 1 + ½ (n − 1)(n − 2) × 2” and “½ and 2 cancel each other”.

Fig. 8

Heatmaps before and after noticing the differences, P13-T2-H

P13’s method drew on a general version of the method of differences, which calculates the coefficients of a polynomial from its values at consecutive points. For quadratics, it could be expressed in a simpler form, but P13 used a general version (see Footnote 2) for polynomial functions. This strategy was identifiable from the final stated form; heatmaps tell us that differences were being used but not how they were used.
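P13’s spoken expression, −1 + (n − 1) + ½(n − 1)(n − 2) × 2, is consistent with the forward-difference form of a polynomial, in which the leading entry of each difference row is multiplied by a binomial coefficient. The Python sketch below illustrates that form under the assumption that the table starts at n = 1 and lists six terms; the function names are hypothetical, and the code is not taken from the study or from Footnote 2.

```python
from math import comb


def leading_differences(values):
    """Return the first entry of each successive difference row."""
    leads = []
    row = list(values)
    while row:
        leads.append(row[0])
        # Next row: differences between consecutive entries.
        row = [b - a for a, b in zip(row, row[1:])]
    return leads


def newton_value(values, n):
    """Evaluate the interpolating polynomial at n >= 1, with values[0] = f(1)."""
    leads = leading_differences(values)
    # f(n) = sum over k of (k-th leading difference) * C(n - 1, k)
    return sum(d * comb(n - 1, k) for k, d in enumerate(leads))


task2_values = [-1, 0, 3, 8, 15, 24]       # assumed: n^2 - 2n for n = 1..6
leads = leading_differences(task2_values)  # [-1, 1, 2, 0, 0, 0]
```

With leading differences −1, 1 and 2, the formula gives −1 + (n − 1) + 2 × ½(n − 1)(n − 2), which expands to n² − 2n. The expanded form is only reached after simplification, which may be one reason the method demands careful mental tracking.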

Linear Recursive Strategy

This strategy involves identification of recursive linear relationships between consecutive data pairs. These relationships include both independent and dependent variables. Evidence of activity that was identified with this strategy includes up/down, left/right, and diagonal saccades between four adjacent cells of tabular data accompanied by relevant verbal statements recorded in the field notes (Fig. 9). We have no evidence of this strategy in the data pair and sequence representations, so we conjecture that this strategy is induced by the physical layout. The table is seen as a grid of data and potential patterns are sought on this grid, without any distinctions being made between variables. For example, P16 said, for task 3 (n² + 7): “n-cubed is n-squared + n – 1 + n” (sic) and later stated “(n − 1) + n + n − 1” as the final statement. Interpreting these as mathematically meaningful statements that were expressed incorrectly, we get: “f(3) is f(2) + n − 1 + n” and “f(n) is (n − 1) + n + f(n − 1)”, both correct but expressing a recursion rather than a quadratic generalization.

Fig. 9

Part of the table for task 3, P16-T3-V
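The interpreted recursion f(n) = f(n − 1) + (n − 1) + n can be checked against the closed form n² + 7 with a short Python sketch. The function name is hypothetical, and the starting value f(1) = 8 and the six-term length are assumptions based on the task 3 rule.

```python
def recursive_terms(first_value, count):
    """Build terms with f(n) = f(n - 1) + (n - 1) + n, starting at f(1)."""
    terms = [first_value]
    for n in range(2, count + 1):
        # Each new term depends on the previous term, not on n alone.
        terms.append(terms[-1] + (n - 1) + n)
    return terms


terms = recursive_terms(8, 6)                   # assumed f(1) = 8 for n^2 + 7
closed_form = [n * n + 7 for n in range(1, 7)]  # direct rule for comparison
```

Both lists contain 8, 11, 16, 23, 32, 43, so the recursion reproduces the table. It does so only by stepping through every earlier term, which is the sense in which it is a recursion rather than a quadratic generalization.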

Initial Conjecture Strategy

The Initial Conjecture strategy involves an early conjecture of a plausible element of the formula based on one or two data points, followed by attempts to build up the rest of the formula from this element, evidenced by a few horizontal saccades during the testing stage. This is what Foster (2004) calls method by inspection. This strategy could be seen together with other strategies. The field notes (Fig. 10) of P8’s comments provide major evidence that they tried to build a formula from the constant “8” in task 3 (n² + 7).

Fig. 10

Copy from researcher’s notes, P8-T3-N

Analysis of the Strategies Used

In this section we take an overview of the strategies used. We look for patterns in the way strategies were used across the participants and across tasks. Numbers are small and what we notice is used to raise questions and conjectures to inform further study and practice.

Tables 2, 3, and 4 display the strategies identified, but note that we are not interested in comparing participants’ competence. By far the most common way the participants started these tasks appeared to be using the Sequence of Differences strategy. On 59 occasions this was the initial strategy discernible from the eye movements (see Tables 2, 3, and 4). Fourteen of the 17 participants started off using this strategy on at least one of the quadratic tasks and they had all used that strategy with linear data (not presented in this paper). Building a Relationship and its substrategies were used 32 times at the start, with other strategies being used at the start on only five occasions or fewer.

Although Sequence of Differences was by far the most common strategy used to start, it was usually insufficient on its own. On the 16 occasions when it was the only strategy identified, it resulted in an articulation of a correct formula just five times. It was Building a Relationship, including the two substrategies, namely Factor Search and Building from Square, which was the most successful for finishing off to produce a correct formula. Fifty of the 74 successful outcomes came from these strategies being employed in the final stages. These strategies are associated with horizontal saccades with tabular data, whereas Sequence of Differences employs vertical saccades. These all employed some horizontal saccades. In fact, when looking solely at the tasks involving a tabular form, none of the 57 successful outcomes involved only the vertical-saccade strategy of Sequence of Differences; horizontal saccades were required to gain a successful outcome from tables, even by P13, who needed one horizontal move to check initial values. Two other strategies which were employed successfully to complete a correct formula were Known Formula and Initial Conjecture. P13 used a Known Formula successfully throughout, which included using differences in consecutive values of the dependent variable. The strategy of Factor Search was generally successful when used for those sequences which did factorize.

All 13 participants who succeeded in the final two tasks had improved their speed per task, which suggests automatisation, or an increased expertise in selecting helpful data, as suggested by the literature (Chumachenko et al., 2014; Haider & Frensch, 1999).

We saw evidence of partial approaches that may have been learnt in school. Some participants appeared to be trying to apply methods for identifying linear formulae to quadratic data, building solely from the differences and f(0), as suggested by Zaslavsky (1997) and Ellis and Grinstead (2008). Several participants acted as if announcing the sequence of differences was a result for the task they had to solve. Then, there was a pause before starting to seek further. Behaviour after such announcements often appeared to be random at first, and most of the gaze trajectories suggested the kind of response to impasse described by Knoblich et al. (2001) rather than inspection suggested by Foster (2004). On at least one occasion each, three participants (P6, P8, and P16) did not treat dependent and independent variables as if they were anything more than numbers in a grid. P12, who had sequences of the dependent variable for the middle tasks, worked on extending the sequence rather than generalizing the formula, at least for a while, and two others in the same group appeared to do the same but without saying so. Only P13 had discernible strategies for constructing quadratics at first.

Tasks 2, 3, 7 and 9 were presented in a vertical tabular form. Tasks 5 and 6 were presented in different forms to each one of the three groups: vertical tables, scattered data pairs and sequences of numbers. After the overview of participants’ behaviour summarized in Tables 2, 3 and 4, we tried to identify whether there were differences in gaze behaviour in participants who experienced different formats in the middle three tasks, either in their strategies with these tasks or changes in strategies used with tabular data after their work with tasks involving other presentations. Table 2 shows the strategies used by those who had vertical tabular form throughout.

Strategies used when working on non-tabular forms differed from those used with the tabular form. We found indications but not strong evidence that the strategies used with tabular data could be changed as a consequence of having a set of three middle tasks of a different form, either as scattered data pairs or as a sequence of numbers.

Specifically, three of the five participants in the group which had middle tasks presented in the form of sequence of numbers displayed changes in strategies for the final tasks (Table 3). Some changes were noticeable with P12, who continued to start with the Sequence of Differences strategy but in the last two tasks also used Building a Relationship, which they had first used when doing a task in the sequence of numbers form. The first successful attempt with quadratic tasks for P12 was Task 5 with a sequence of numbers. P15 completely stopped using the Sequence of Differences strategy in the final two quadratic tasks, having used it in the first two tasks. P18 started using Sequence of Differences from Task 5 and kept using it afterwards. For this group, there was a greater reliance upon the use of Sequence of Differences in the middle tasks. This is not surprising with the sequence of numbers form, since the independent variable was not present at all.

Of the participants who had middle tasks of scattered data pairs (Table 4), P5 used a Sequence of Differences to begin with on both of the initial two tabular data tasks but changed to starting with a Building a Relationship strategy when meeting the scattered data pairs tasks. This carried through to the final two tasks in tabular form. P1 used Sequence of Differences as a starting strategy in the tabular format, but Building a Relationship was their initial strategy in the middle tasks of paired-data. With the other participants in this group, there were no significant changes in the strategies they used for tabular data before and after the data pair tasks. What was of particular interest was the less obvious use of the Sequence of Differences strategy when the data were presented in scattered data pairs form. The deliberate placing of data pairs randomly on the screen, so that they were not visually placed in sequential form, could have encouraged use of horizontal saccades between the independent and dependent variables. Note, however, that some of those who had tabular data throughout also changed strategy during the task sequence.

Conclusion

Our first research question focused on identifying strategies used to generalize quadratic sequences. We had expected our participants to be familiar with this kind of task, but, apart from P13, there was no evidence of fluent execution of known methods. In general, the use of the onscreen data was not about seeking the information participants knew they needed for a particular method. Instead, apart from looking at differences, most showed characteristics of impasse and appeared to look around unsystematically for some familiar relations or numbers in order to make progress. We have identified a range of their strategies. Sequence of Differences was used by most as an opening strategy in all layouts, even ones with scattered data pairs — a format where this would not be so natural a strategy. Building a Relationship subdivides into methods of building, the most identifiable and successful here being Building from Square and Factor Search. These were productive in finishing off the final formula. Successful application of a known method (Known Formula) was found only in one participant. Linear Recursive seemed to relate to tabular formation and the identification of recursive linear relationships between consecutive data pairs. There was also evidence of learnt methods used in partial and often irrelevant ways. Lastly, Initial Conjecture was used by five participants, leading to localized building attempts, usually unsuccessfully.

Sequence of Differences has the potential to identify the order of a polynomial, such as when the changes in the dependent variable form a linear sequence. P13 used a correct learnt method that required remembering a fairly complex formula and keeping mental track of its application. P13’s suggested formulae were correct, but in an expanded form rather than the simpler versions found by all the other successful participants. This raises the issue of whether it is sensible to teach a general method which can be carried out in a formulaic way for all polynomials. It is a relatively complex process and would not encourage students to develop useful heuristics and creativity to build formulae from awareness of mathematical connections between data. Such creative activity is what mathematicians employ when working with data which do not conform to a known method. In fact, 12 of the 16 participants who devised strategies to create a rule were successful for over half of their quadratic tasks. We feel that developing the ‘mathematician within’ students requires adaptation to analyse the data presented rather than trying to apply a given learned process.

Our tasks were designed for our research purposes about the use of onscreen data and not about the teaching of quadratics or generalization. Nevertheless, it is probable that some learning about strategies took place during this task sequence, with increased speed and quicker use of Building from Square seen as evidence of this (Haider & Frensch, 1999). The most-used successful strategy overall was Building a Relationship using elements, similar to the partial formula method described by Yeşildere and Akkoç (2010). This choice turned out to be reasonable given the simplicity of the tasks. However, this was not always obvious to our participants and they might find higher polynomial functions, or non-unit leading coefficients, problematic. Our aim was not to find out what they could or could not do mentally, or what they could have done in other contexts, but to understand strategies in onscreen gathering of formatted data.

Our second research question concerned whether there were noticeable changes in strategies used with tasks involving other presentations, either scattered data pairs or in sequence of numbers form, and whether these formats might influence subsequent work with tabular data. The influence on participants’ range of strategies was inconclusive, but analysis indicated sporadic influence. This is an area that needs further investigation. Additionally, we identified that the Sequence of Differences strategy was less popular in the paired data tasks, an observation that we would like to discuss further. Participants’ behaviour suggested that they became aware that sequential connections do not provide enough information to generate a formula. There is a need to make connections stemming from horizontal saccades connecting the dependent and independent variables. It is well known that when sequential data are presented in tabular form, students in schools tend to find recursive rules and have more difficulty finding a functional rule (e.g. MacGregor & Stacey, 1993). The scattered data pairs tasks were less likely to encourage use of the Sequence of Differences strategy because the pairs were not arranged sequentially on the screen. Indeed, this strategy was employed on only three of the 10 scattered data pairs tasks carried out by participants, which is few compared to the appearance of the strategy in the other tasks. The main strategy used was based upon Building a Relationship and this was highly successful, with most resulting in a correct formula being stated. Common practice with tabular data encourages saccades between successive values of the dependent variable, resulting in a recursive rule being articulated. In most research using tabular data, many students do not seem able to go on to find a functional rule (e.g. Montenegro, Costa, & Lopes, 2018).
Our results raise the issue of whether creating a table of values is the most productive way of working with sequential data, and even whether generalization of sequential data is a useful way to develop a correspondence view of functions. In order to encourage horizontal saccades and use of methods based upon Building a Relationship it may be more productive to present the data in a non-sequential form, such as in data pairs positioned randomly on a page as we did in our study, or using data that are not sequential.

Finally, we review our use of eye-tracking software in this study. The value of eye-tracking data has been to give insight into the possible foci of attention of individuals. It does not tell us definitively what was thought, but then neither would think-aloud or post facto explanations (e.g. Branch, 2000). It has, however, shown us how some participants were able to overcome unhelpful tendencies and to adopt personal strategies constructed during the study. If our participants had displayed fluent generalizations, we might have been able to say more about the relationship between searching for data and onscreen format. However, what we have instead is information about how participants used formatted data, firstly to apply familiar methods and then to search for clues towards suitable strategies and in some cases to apply recent successful strategies again. Since this is an area that has not been researched and could only proceed in an exploratory manner, we had no hypothesis about participants’ behaviour. Our approach has been to analyse eye-tracking videos qualitatively and combine them with heatmaps and other sources of data (participant utterances, audio recording and field notes) together with our own mathematical experience and interpretations. We credit the eye-tracking methodology with making a significant contribution to this work, but not as a complete source in itself.

Our study has reinforced our view that it is important and valuable to think about functions from a correspondence perspective, seeking informative data pairs and making use of sequential differences. There are possibilities for further eye-tracking research on the layout and format of data from which to formulate functions.