Summary accuracy feedback and the left digit effect in number line estimation

Kayton, Kelsey; Williams, Katherine; Stenbaek, Claudia; Gwiazda, Gina; Bondhus, Charles; Green, Jordan; Fischer, Greg; Barth, Hilary; Patalano, Andrea L.

doi:10.3758/s13421-022-01278-2


Published: 25 February 2022

Volume 50, pages 1789–1803, (2022)

Memory & Cognition


Abstract

A robust left digit effect arises in number line estimation such that adults’ estimates for numerals with different hundreds place digits but nearly identical magnitudes are systematically different from one another (e.g., 299 is placed too far to the left of 302). In two experiments, we investigate whether brief feedback interventions designed to increase task effort can reduce or eliminate the left digit effect in a self-paced 0–1,000 number line estimation task. Participants were assigned to complete three blocks of 120 trials each where the middle block contained feedback or no feedback. Feedback was in the form of summary accuracy scores (Experiment 1; N = 153) or competitive (summary) accuracy scores (Experiment 2; N = 145). In both experiments, planned analyses revealed large left digit effects in all blocks regardless of feedback condition. Feedback did not lead to a reduction in the left digit effect in either experiment, but improvements in overall accuracy were observed. We conclude that there are no changes in the left digit effect resulting from either summary accuracy feedback or competitive accuracy feedback. Also reported are exploratory analyses of trial characteristics (e.g., whether 299 is presented before or after 302) and the left digit effect.


Introduction

Understanding numerical magnitudes is an important skill. Numerical magnitude estimation is often measured with the number line estimation task. This task is used as a skills assessment and education training tool, and is also important for understanding underlying cognitive processes (e.g., Barth & Paladino, 2011; Booth & Siegler, 2008; Brez et al., 2016; Schneider et al., 2018; Siegler & Opfer, 2003; Siegler & Ramani, 2009; Slusser et al., 2013; Xing et al., 2021; Zhu et al., 2017). In a typical number line estimation task, one is shown a blank horizontal line with a label at each end (e.g., 0 and 1,000) and is asked to estimate the location of target numerals on the line (e.g., 802; see Fig. 1).^{Footnote 1} A primary measure of task performance is overall accuracy error, which reflects the difference between one’s placement and the correct location of each numeral. Overall accuracy error has been linked to many measures of numerical competency, including counting and fraction skills in children (Hamdan & Gunderson, 2017; Hansen et al., 2015; Jordan et al., 2013; Östergren & Träff, 2013), math performance on standardized achievement tests (Booth & Siegler, 2008; Holloway & Ansari, 2009; Schneider et al., 2009; Tosto et al., 2017; see also Schneider et al., 2018, for review) and numeracy in adults (Patalano et al., 2020; Peters & Bjalkebring, 2015; Schley & Peters, 2014), even when controlling for potential confounding variables (e.g., Bailey et al., 2014; Geary, 2011; Hansen et al., 2015; Hornung et al., 2014; Östergren & Träff, 2013; Zhu et al., 2017).

Some error in number line estimation is known to be systematic. The bias that is the subject of most work in this area, in its simplest form, is a tendency to overestimate numerals on one half of the line and to underestimate numerals on the other half. This bias has been modeled as an S-shaped or an inverse-S-shaped curve, with the direction and degree of bias indicated by a parameter estimate β (e.g., Cohen & Blanc-Goldhammer, 2011; Slusser & Barth, 2017; but see Siegler et al., 2009). The shape of the curve is thought to be the result of imprecision in one’s estimate of individual magnitudes (e.g., Dehaene et al., 2008; Siegler & Opfer, 2003) and in the relationship of the part to a whole (e.g., estimating 599 as a proportion of 1000; Barth & Paladino, 2011; Cohen & Blanc-Goldhammer, 2011; Cohen et al., 2018; Slusser & Barth, 2013; see also Hollands & Dyre, 2000; Zax et al., 2019; Zhang & Maloney, 2012). The pattern of bias may also be multi-cyclical (e.g., two S-shapes in a row), with the number of cycles thought to depend on the number of reference points, besides the two endpoints, used to perform the task (e.g., using the line’s midpoint of 500 as an additional reference point; Hollands & Dyre, 2000; Peeters et al., 2017; Slusser et al., 2013; Sullivan et al., 2011). All else being equal, placements are typically more accurate when more reference points are used.

In most work to date, it is the overall value of the target numeral that is used for predicting placements. For example, for targets 598 and 601, placements would be predicted to be similar because the targets have nearly the same overall magnitude. However, it has recently been demonstrated that the individual digits comprising a numeral also contribute to placement of a target on a line. Lai et al. (2018) observed that when asked to estimate the locations of three-digit numerals on a 0–1,000 number line, individuals exhibited a left digit effect, placing numerals with different leftmost (hundreds-place) digits but similar overall magnitudes farther apart than is warranted. In contrast, they placed numerals with different tens-place digits (e.g., 348 vs. 352) in the same location on the line, suggesting that the bias is driven by the leftmost digit. The effect is very large (ds ≈ 1 in adults), and appears in task variations such as a speeded version of the task (Lai et al., 2018; Williams et al., 2020), and various numerical ranges (e.g., 0–100; Vaidya et al., 2022; Savelkouls et al., 2020; Williams et al., 2021). There is also noticeable individual-level variation in the size of the left digit effect.

Although the left digit effect in number line estimation has been reported only relatively recently, related phenomena have been observed in numerical comparison tasks and in judgment tasks. Numerical comparison tasks involve deciding which of two numbers is larger (e.g., 59 vs. 61). In these tasks, a distance effect arises in that numerals that are farther apart produce faster response times (Dehaene et al., 1990; Moyer & Landauer, 1967). While these effects are most often attributed to comparisons of overall magnitudes, there is evidence that individual digits also matter. For example, longer response times are observed for comparable pairs with the same leftmost digit (47 vs. 49) than for pairs with a different leftmost digit (49 vs. 51; Moeller et al., 2009; Nuerk et al., 2011; Verguts & De Moor, 2005). Relatedly, in price judgment tasks, a product costing on or above a whole-dollar value (e.g., $5.00) is judged as significantly more costly than one priced just below the whole-dollar (e.g., $4.99), while products whose prices do not cross such a boundary are perceived as equally costly (e.g., $4.20 vs. $4.19; Beracha & Seiler, 2015; Lin & Wang, 2017; MacKillop et al., 2014; Manning & Sprott, 2009; Sokolova et al., 2020; Thomas & Morwitz, 2005, 2009). This finding has been extended to judgments about nutritional information (Choi et al., 2019), product evaluations (Thomas & Morwitz, 2005), medical records (Olenski et al., 2020), and hypothetical college portfolios (Patalano et al., 2022), revealing significant consequences of a left digit bias for real decisions.

One important question that has emerged is the extent to which the left digit effect in number line estimation arises because people do not put sufficient effort into performing the task accurately. If it does, the effect might be reduced using simple motivational interventions designed to increase task effort (by encouraging participants to put more effort into making accurate placements). This question is important in that it speaks to the malleability of the left digit effect, as well as the source of the effect and the contexts in which it is likely to emerge. If the left digit effect that has been observed in past work occurs under conditions in which task effort is not sufficiently high, the effect might be easily reduced or eliminated with a simple motivational intervention, and the effect should be unlikely to arise in everyday contexts of importance to an individual. Alternatively, if the effect is not easily reduced through such interventions, the effect should emerge even in everyday contexts in which one strives for accuracy, and elimination of the effect will likely demand more carefully developed interventions. This research question is also of value to researchers seeking to identify strong individual difference measures of magnitude estimation skills, as such measures should reflect as much as possible one’s estimation skills, not variations in task effort.

There is some reason to believe that the left digit effect could be related to task effort. Although the specific cognitive processes that underlie the left digit effect are not well understood, it is widely believed that the left digit effect reflects an overweighting of the leftmost digit of multidigit numerals (e.g., Lacetera et al., 2012; Thomas & Morwitz, 2005). It is possible that this overweighting arises, at least in part, because individuals give careful attention to the leftmost digit but devote less attention to rightward digits, leading the latter to be underweighted in magnitude estimates. If true, motivational interventions might reasonably lead to a reduction in the left digit effect. There are many possible reasons for how or why this might happen, but one possibility is that when people reduce attention to rightward digits, they do so strategically to reduce effort, aware that it is possible to give a “close enough” estimate without fully attending to rightward digits. When motivated to be as accurate as possible, they might increase attention to rightward digits in order to increase the precision of their estimate. By this description, intervention-based reductions in the left digit effect would arise from the use of general strategies associated with increasing task effort towards improving accuracy, rather than from trying to reduce the left digit effect per se. This is important in that we have no reason to believe that people are aware of the left digit effect or would be explicitly trying to reduce it in this context.

No work to date has considered the malleability of the left digit effect in response to motivational interventions intended to increase task effort; however, there are a few suggestive findings from related work in numerical cognition. Notably, Eyler et al. (2018) assessed whether a trial-by-trial feedback intervention would reduce overall accuracy error in number line estimation in adults. Although the intervention was brief (feedback was given on only five trials), researchers found that overall accuracy error decreased by about a third for a subset of participants (specifically, those with no college education). In more distantly related numerical cognition studies that use non-symbolic tasks (e.g., judging which set of dots is larger), accuracy feedback interventions have been found to improve performance in adults (De Wind & Brannon, 2012), with improvements attributed to increased motivation, rather than to any intervention-related changes in knowledge (Lindskog et al., 2013). In addition, variability of an individual’s scores across time points on at least one non-symbolic task has been attributed to lapses of attention during task performance (Peters & Bjalkebring, 2015). All of these findings point to the possibility that a motivational intervention intended to increase task effort might serve to reduce the left digit effect in number line estimation.

Overview of experiments

In two experiments, three 120-trial blocks of a 0–1,000 self-paced number line estimation task were administered. There were two between-subjects conditions: no-feedback and feedback. In the no-feedback condition, all three blocks of the task were the same. In the feedback condition, the middle block was modified to serve as an intervention. In this block, a summary accuracy score (on a 0–100 scale where 100 is perfect accuracy across all 20 trials) was provided after each set of 20 trials along with instructions to try to improve one’s score over time. Participants in this condition were also instructed that one reason people often give estimates that are not precisely correct is because they do not pay careful attention to all the digits, and were periodically reminded to pay attention to all digits. The purpose of the intervention was not to give detailed instructions or feedback that would indicate in what direction to adjust one’s responses. Rather, the intervention served to test whether simply motivating efforts to perform the task as accurately as possible (including attending to all digits) would lead to a reduction in or elimination of the left digit effect (and reduction in accuracy error generally).

As in past work, we computed three dependent measures (Lai et al., 2018). The left digit effect was assessed using a dependent measure called a hundreds difference score (Lai et al., 2018), for which we focused on eight critical pairs of target values, called hundreds pairs, with similar magnitudes but different leftmost hundreds place digits (e.g., 199 and 201). Of interest was whether the larger value in each pair would be placed too far to the right of the smaller value on the line (assuming they should be placed in approximately the same location given the numerical range and physical line length used here). For comparison, we also computed fifties difference scores in the same manner except using nine pairs of target values, called fifties pairs, with similar magnitudes but surrounding fifties boundaries (e.g., 149 and 151). As a measure of overall accuracy error, we computed percent absolute error (PAE), a commonly used measure of one’s overall performance on the number line estimation task. PAE reflects the differences between one’s placements and the correct locations of targets as a percentage of line length, using all trials except those used in computing hundreds and fifties difference scores.

If the motivational intervention leads to a reduction in the left digit effect, we should see an interaction effect in both experiments with regard to the hundreds difference score. Namely, there should be a greater reduction in the hundreds difference score across blocks in the feedback condition than in the no-feedback condition. In contrast, if the intervention does not reduce the left digit effect, we should not see any greater reduction of the hundreds difference score in the feedback condition relative to the no-feedback condition. Regarding PAE, given the suggestive findings of Eyler et al.’s (2018) feedback study, and the fact that there are likely many sources of the errors that contribute to PAE, we had reason to believe that PAE might decrease as a result of the feedback intervention whether or not the hundreds difference score also decreased.

In addition to asking the important question regarding the malleability of the left digit effect, we also used this opportunity to consider properties of trials that might be related to the size of the hundreds difference score. In exploratory analyses, we used combined data (from both experiments) from each participant’s first block of trials to test whether each of the following properties of pairs of hundreds trials is related to the size of the hundreds difference score: (1) distance between paired targets (i.e., number of intervening trials), (2) order of paired targets (i.e., whether the larger or smaller target in a pair was presented first), and (3) pair boundary (i.e., which boundary the pair surrounded, as in 200, 300, etc.). One past study that looked at boundary pairs individually found a left digit effect for all pair boundaries except those at the 200 and 500 boundaries (Williams et al., in press), thereby suggesting differences across pair boundaries, but no research has yet considered pair distance or order. These exploratory analyses were conducted here to provide additional clues as to when and why the left digit effect emerges.

Experiment 1: Summary accuracy feedback

In this experiment, we compared performance across three blocks of a number line estimation task for participants who had a summary accuracy feedback intervention in the middle block versus those who did not (i.e., who had three identical blocks). If the left digit effect is reduced or eliminated as a result of summary accuracy feedback, we should see an interaction of condition by block; specifically, there should be greater reduction in the hundreds difference score across blocks in the feedback condition than in the no-feedback condition. If the left digit effect also decreases as a result of task practice (regardless of whether the feedback intervention affects performance), we should also see a main effect of block, where scores decrease across blocks in both conditions. While our focus was on the left digit effect, we also conducted the analyses with our measure of overall accuracy error, PAE, to compare findings.

Method

Participants

Participants were 153 undergraduates (89 women, 63 men, one undisclosed) who received Introductory Psychology course credit for their participation. They were run individually in a lab setting in 1-h sessions. Participants were assigned in alternation to either a no-feedback condition (n = 79) or a feedback condition (n = 74).^{Footnote 2} They completed a number line estimation task (as well as several cognitive tasks and scales unrelated to this report). A power analysis (1 – β = .80, α = .05) indicated samples of n ≈ 40 per condition would be needed to detect a moderately small effect size (Cohen’s f = 0.15) for the interaction between condition and block in an ANOVA. The study was approved by the Wesleyan Institutional Review Board; participants gave written consent to participate in the study and were debriefed at its conclusion.

Stimuli

Stimuli for a 0–1,000 number line estimation task were displayed using PsychoPy3 software onto a desktop computer monitor (47 cm wide × 27 cm high; screen resolution 2,560 × 1,440 pixels). On each trial, a horizontally centered target numeral (e.g., 47; 1.5 cm tall) located 8 cm above a black horizontal line (20 cm long) was presented, as shown in Fig. 1a. The horizontal line, which was in the center of the screen, had small vertical lines at each end (1 cm long). The endpoints of the horizontal line were labeled ‘0’ on the left and ‘1,000’ on the right (0.8 cm tall). After a participant selected a location on the line, a vertical red line (1 cm long) immediately appeared in the selected location, as shown in Fig. 1b.

Each block (of three) consisted of the same 120 target numerals between 0 and 1,000. Target numerals were selected to fall on either side of hundreds boundary values (eight pairs of values: 199/202, 298/302, 398/403, 499/502, 597/601, 699/703, 798/802, 899/901), fifties boundary values (nine pairs of values, used as controls: 149/152, 248/252, 348/352, 449/451, 549/551, 648/653, 748/752, 849/853, 947/951),^{Footnote 3} and 82 non-boundary values (e.g., 235, 367, 411). Non-boundary values were used to compute percent absolute error (PAE), a standard measure of task accuracy (e.g., Lai et al., 2018; Petitto, 1990; Siegler & Booth, 2004; Siegler & Opfer, 2003). We use only non-boundary values in computing PAE in order to have an independent set of estimates for calculating this measure. Target numerals were presented one at a time in a different randomized order within each block for each participant. Numerals were “paired” for the purposes of analyses only; they were not paired during presentation.

Procedure

Each participant was seated in front of a computer and given written instructions followed by three blocks of 120 trials each. On each trial, participants responded with a mouse click and a vertical red line appeared in the selected location on the response line. The task was self-paced, but participants were instructed to respond as quickly and accurately as possible. After each response, a rectangular button icon (labeled “Next”) appeared centered at the bottom of the screen to advance to the next trial. A 0.5-s blank screen was presented before each new trial. Coordinates of mouse clicks were recorded and converted to a number between 0 and 1,000, corresponding to the selected location on the response line.
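The conversion from a click coordinate to a value on the 0–1,000 scale can be sketched as a linear mapping. This is a minimal illustration, not the authors' task code: the function name, the parameter names, and the decision to clamp clicks that fall beyond the endpoints are all assumptions.

```python
def click_to_value(click_x, line_left_x, line_right_x, low=0.0, high=1000.0):
    """Map a horizontal click coordinate onto the numerical range of the line.

    All names here are hypothetical. Coordinates may be in any unit
    (cm, pixels) as long as the three x values share it.
    """
    if line_right_x <= line_left_x:
        raise ValueError("line_right_x must be greater than line_left_x")
    # Fraction of the way along the line, measured from the left endpoint.
    proportion = (click_x - line_left_x) / (line_right_x - line_left_x)
    # Assumption: clicks past either endpoint are treated as the endpoint.
    proportion = min(1.0, max(0.0, proportion))
    return low + proportion * (high - low)


# A 20 cm line centered on the screen spans x = -10 cm to x = +10 cm.
midpoint_value = click_to_value(0.0, -10.0, 10.0)  # 500.0
```

With the 20 cm line described above, each centimeter corresponds to 50 units, so the roughly 3-unit gap within a hundreds pair amounts to well under a millimeter on screen.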

In the no-feedback condition, all three blocks were identical and no feedback was given. In the feedback condition, the first and third blocks were identical but, in the middle block, participants received feedback after every 20 trials, six times in total. Feedback was presented as an overall accuracy score ranging from 0 to 100, where a higher score reflects greater accuracy. Participants were instructed that a score of 0 indicates their responses were as distant as they could be from the correct locations while a score of 100 indicates responses were precisely in the correct locations (see Online Supplemental Materials (OSM) for full instructions).

To calculate a feedback score, we first computed one’s average PAE (PAE = (|actual location – estimated location| / numerical range) * 100) over the preceding 20 trials. We then calculated an accuracy score by subtracting the PAE from 100 (accuracy score = 100 – PAE). Finally, to obtain the feedback score, we assigned all accuracy scores less than 90 to a feedback score of 0, while accuracy scores between 90 and 100 were rescaled to values between 0 and 100 (feedback score = (accuracy score – 90) * 10).^{Footnote 4} A feedback score of “50 out of 100,” for example, would correspond to a PAE of 5% and an accuracy score of 95.
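The three steps of the feedback score can be written out as a short sketch. The formulas follow the description above; the function names are hypothetical, and no rounding is applied because the text does not say how the displayed score was rounded.

```python
def percent_absolute_error(actual, estimate, numerical_range=1000.0):
    """PAE for one trial: absolute error as a percentage of the range."""
    return abs(actual - estimate) / numerical_range * 100.0


def feedback_score(actuals, estimates, numerical_range=1000.0):
    """Feedback score (0-100) for one set of trials, e.g., the preceding 20."""
    paes = [percent_absolute_error(a, e, numerical_range)
            for a, e in zip(actuals, estimates)]
    mean_pae = sum(paes) / len(paes)
    accuracy = 100.0 - mean_pae          # accuracy score = 100 - PAE
    if accuracy < 90.0:
        return 0.0                       # all accuracy scores below 90 map to 0
    return (accuracy - 90.0) * 10.0      # rescale 90-100 onto 0-100


# Two trials that each miss by 50 units give a mean PAE of 5%,
# an accuracy score of 95, and a feedback score of 50 out of 100.
example = feedback_score([500, 200], [550, 250])
```

The rescaling means that only the top tenth of the accuracy range is visible to the participant, so small changes in PAE translate into large changes in the displayed score.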

At the time summary feedback was provided, participants in the feedback condition were also instructed to do their best to improve their score and were reminded to carefully attend to all digits comprising each numeral (see Fig. 2). Participants could view the feedback screen for as long as they wished before continuing on to the next set of trials.

Results and discussion

Exclusions

All exclusion criteria and data analyses were planned unless otherwise indicated; exclusion criteria were the same as those in Lai et al. (2018) and Patalano et al. (in press). An individual’s estimate for a target number was removed as an outlier from the computation of hundreds and fifties difference scores (but not from the computation of PAE) if it differed from the group mean for a given target by more than two standard deviations (3.26% of trials, on average, were removed within each block). In addition, we excluded from all analyses participants missing more than three hundreds pairs (as a result of outlier removal) from one or more blocks (n = 7 excluded). A total of 146 participants were in the final dataset (no-feedback condition n = 77, feedback condition n = 69). For descriptive and inferential statistics with outliers retained (N = 153, because participants no longer needed to be excluded for missing hundreds pairs), see OSM.
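The two exclusion rules can be sketched as follows. This is an illustration under stated assumptions: the data layout (a dictionary of estimates per participant per block), the function names, and the use of the sample standard deviation are not specified in the text.

```python
from statistics import mean, stdev

HUNDREDS_PAIRS = [(199, 202), (298, 302), (398, 403), (499, 502),
                  (597, 601), (699, 703), (798, 802), (899, 901)]


def remove_outliers(block_estimates, n_sd=2.0):
    """Drop estimates more than n_sd SDs from the group mean for their target.

    block_estimates: {participant_id: {target: estimate}} for one block
    (hypothetical layout). Returns a new dictionary; the input is unchanged.
    """
    targets = {t for est in block_estimates.values() for t in est}
    bounds = {}
    for t in targets:
        values = [est[t] for est in block_estimates.values() if t in est]
        if len(values) < 2:
            continue  # SD is undefined for a single value
        m, sd = mean(values), stdev(values)  # assumption: sample SD
        bounds[t] = (m - n_sd * sd, m + n_sd * sd)
    return {
        pid: {t: x for t, x in est.items()
              if t not in bounds or bounds[t][0] <= x <= bounds[t][1]}
        for pid, est in block_estimates.items()
    }


def missing_hundreds_pairs(estimates, pairs=HUNDREDS_PAIRS):
    """Count pairs for which at least one member has no usable estimate."""
    return sum(1 for lo, hi in pairs if lo not in estimates or hi not in estimates)


def should_exclude(cleaned_blocks, max_missing=3):
    """cleaned_blocks: one {target: estimate} dict per block for a participant."""
    return any(missing_hundreds_pairs(b) > max_missing for b in cleaned_blocks)
```

Because PAE is computed with outliers retained, the cleaned dictionaries would feed only the difference scores, while the original estimates would feed PAE.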

Dependent measures

We computed three dependent measures for each participant: PAE, hundreds difference score, and fifties difference score. PAE was calculated as (|actual location – estimated location| / numerical range) * 100, using estimates of all non-boundary target values. Hundreds and fifties difference scores were computed as (estimated location of larger numeral – estimated location of smaller numeral), averaged over the eight hundreds pairs (e.g., the estimate for 703 minus the estimate for 699; see Lai et al., 2018) and, separately, over the nine fifties pairs (e.g., the estimate for 653 minus the estimate for 648). If specific left digits matter, hundreds pairs (target values with similar magnitudes but different leftmost hundreds place digits) will not be placed in the same location on the response line and the larger number in the pair will be placed to the right of the smaller number. Thus, evidence of a left digit effect comes from hundreds difference scores that are greater than zero. We included fifties pairs (target values with similar magnitudes and the same leftmost hundreds place digit) as controls; these numbers should be placed in the same location on the response line, and so we expected fifties difference scores to be no different from zero. We calculated hundreds and fifties difference scores for each block separately so that we could evaluate intervention-related changes in the left digit effect.
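The three dependent measures can be sketched for a single participant and block. The pair lists come from the Stimuli section; the function names, the dictionary layout, and the choice to skip a pair when one of its estimates is missing are assumptions made for illustration.

```python
from statistics import mean

HUNDREDS_PAIRS = [(199, 202), (298, 302), (398, 403), (499, 502),
                  (597, 601), (699, 703), (798, 802), (899, 901)]
FIFTIES_PAIRS = [(149, 152), (248, 252), (348, 352), (449, 451), (549, 551),
                 (648, 653), (748, 752), (849, 853), (947, 951)]


def difference_score(estimates, pairs):
    """Mean of (estimate for larger numeral - estimate for smaller numeral).

    estimates: {target: estimated location on the 0-1,000 scale}.
    Pairs with a missing member are skipped (assumption).
    """
    diffs = [estimates[hi] - estimates[lo]
             for lo, hi in pairs if lo in estimates and hi in estimates]
    return mean(diffs) if diffs else None


def pae(estimates, numerical_range=1000.0):
    """Mean percent absolute error over non-boundary targets only."""
    boundary = {t for pair in HUNDREDS_PAIRS + FIFTIES_PAIRS for t in pair}
    errors = [abs(t - x) / numerical_range * 100.0
              for t, x in estimates.items() if t not in boundary]
    return mean(errors)


# Toy example with two hundreds pairs: differences of 30 and 10 average to 20.
toy = {199: 180, 202: 210, 298: 290, 302: 300}
toy_score = difference_score(toy, [(199, 202), (298, 302)])
```

A hundreds difference score near zero would indicate no left digit effect, since the true gap within each pair is only a few units on a 1,000-unit line.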

Planned analyses

See Table 1 for descriptive statistics for all dependent measures by condition and block. We first asked whether there was evidence of the left digit effect in each block and condition. To do this, we conducted t-tests (all two-tailed) and found that hundreds difference scores were reliably greater than 0 in each of the three blocks in both the no-feedback (ts > 8, ps < .001, ds = 0.92–1.26) and the feedback condition (ts > 6, ps < .001, ds = 0.77–1.13). Fifties difference scores were not different from 0 in any block in either condition (|t|s < 1.5, ps > .160). Based on these findings, we conclude that there is a large left digit effect even when summary feedback is provided. There were no gender differences (|t|s < 2, ps > .090) except in block 3 of the no-feedback condition in which hundreds difference scores were larger for women than men (M = 25.48 vs. 15.43 respectively), t(75) = 2.20, p = .031; and in block 1 of the feedback condition, in which fifties difference scores were larger for women than men (M = 1.69 vs. –4.97 respectively), t(65) = 2.03, p = .047.
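The test of whether difference scores exceed zero can be sketched as a one-sample t statistic with an accompanying Cohen's d. This is a generic illustration using the conventional one-sample definitions (d as the mean divided by the sample standard deviation); the text does not state which d formula was used, and the sample scores below are invented placeholders, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t_and_d(scores, mu=0.0):
    """Return (t, df, d) for a one-sample test of mean(scores) against mu."""
    n = len(scores)
    m, sd = mean(scores), stdev(scores)
    t = (m - mu) / (sd / sqrt(n))   # t statistic with n - 1 degrees of freedom
    d = (m - mu) / sd               # conventional one-sample Cohen's d
    return t, n - 1, d


# Invented placeholder scores, for illustration only.
t, df, d = one_sample_t_and_d([10.0, 20.0, 30.0])
```

The two-tailed p value would come from the t distribution with the returned degrees of freedom, for example via `scipy.stats.ttest_1samp` if SciPy is available.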

Table 1 Descriptive statistics for Experiment 1


To assess whether summary feedback leads to a reduced left digit effect (even if it does not fully extinguish it), we conducted a two-way mixed ANOVA with one between-subjects independent variable (condition: no-feedback, feedback) and one within-subjects variable (block: 1, 2, 3). We predicted that if summary feedback leads to a reduction in the left digit effect, a condition by block interaction should arise for hundreds difference scores; we found none, F(2, 288) = 1.85, MSE = 370.27, p = .159. There was also no effect of condition, F(1, 144) = 1.34, MSE = 478.25, p = .249, or block, F(2, 288) = 0.17, MSE = 370.27, p = .844 (see Fig. 3). If summary feedback leads to increased overall accuracy, there should be a condition by block interaction for PAE; in this case, an interaction was found, F(2, 288) = 11.58, MSE < 0.01, p < .001, η_p² = .07; and there were main effects of block, F(2, 288) = 24.00, MSE < 0.01, p < .001, η_p² = .14, and condition, F(1, 144) = 4.25, MSE < 0.01, p = .041, η_p² = .03 (see Fig. 4). PAE generally decreased across blocks, and especially from block 1 to block 2 in the feedback condition, as shown in Table 1. Overall, we found that summary accuracy feedback led to modest improvements in overall accuracy but did not reduce the left digit effect.
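The reported partial eta squared values can be cross-checked from the F statistics and their degrees of freedom using the standard identity η_p² = (F × df_effect) / (F × df_effect + df_error). The function name below is hypothetical; the inputs are the PAE results reported above.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Recover partial eta squared from an F statistic and its dfs."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# PAE results from the mixed ANOVA reported above.
interaction = partial_eta_squared(11.58, 2, 288)  # about .07
block = partial_eta_squared(24.00, 2, 288)        # about .14
condition = partial_eta_squared(4.25, 1, 144)     # about .03
```

All three values round to the effect sizes given in the text, so the reported F statistics and effect sizes are mutually consistent.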

In sum, in Experiment 1 we replicated the left digit effect and the large effect size. The findings provide no evidence, however, that summary accuracy feedback leads to a reduction in the left digit effect, and thus do not suggest that the effect observed in many studies could be reduced if greater effort were devoted to the task. In contrast, summary accuracy feedback did lead to reductions in overall accuracy error (specifically, a 13% reduction in accuracy error in the feedback condition), suggesting that increased task effort can lead to improvements in performance more generally. These findings notwithstanding, we did notice that, descriptively, the pattern of means for the hundreds difference score was consistent with a feedback effect. In particular, the hundreds difference scores were smallest in block 2 of the feedback condition. It is possible that our intervention was not strong enough to motivate a change in performance that would be detectable in our study, so we conducted a second experiment with a stronger intervention. Specifically, we added a competitive game context to further motivate accurate number line estimation performance.

Experiment 2: Competitive (summary) accuracy feedback

This experiment largely followed the design and procedure of Experiment 1 except that the summary accuracy feedback was enhanced with the addition of a scoreboard that was described as ranking the top ten highest scoring games of players that semester. Participants were instructed to try to get their own screen name onto (or to move their name up) the scoreboard. This design is supported by studies showing that use of competitive games can enhance motivation (e.g., Burguillo, 2010; Cagiltay et al., 2015; Chen et al., 2018). As in Experiment 1, we asked whether the feedback intervention would lead to a reduction in the hundreds difference score. If the previous findings were the result of the intervention being insufficiently motivating to lead to a reduction in the left digit effect, we would expect to see a condition by block interaction emerge here. That is, we would expect participants to show greater reduction in the hundreds difference score across blocks in the feedback condition relative to the no-feedback condition. However, if the left digit effect is not reduced by increasing one’s task effort, the pattern of findings should instead be similar to Experiment 1.