Introduction

Measurement estimation skills—and specifically length estimation skills—are frequently needed in everyday life and are, therefore, of central relevance for our orientation in everyday and professional contexts. For example, wrapping a gift with wrapping paper requires estimating the length which is needed to fully cover the gift, and there are several situations in traffic that require length estimation such as instructions from the GPS system or certain traffic signs with speed limits that apply for a certain length only. Because of this relevance, estimation skills are considered in curricular guidelines in many countries (e.g. Kultusminister der Laender [KMK], 2004; Ministry of Education [MoE], 2010; National Council of Teachers of Mathematics [NCTM], 2000) and should be addressed in mathematics classes. However, there is still a lack of research on students’ measurement estimation skills, specifically on models about the estimation processes as well as on models describing the structure of estimation skills (Joram et al., 1998). Some articles on measurement estimation describe several characteristics of estimation situations that are relevant in the course of measurement estimation such as the size or accessibility of the to-be-estimated object (TBEO) (e.g. Bright, 1976; Desli & Giakoumi, 2017; Jones et al., 2009). However, previous studies that aim at assessing measurement estimation skills covered only a small selection of these characteristics (if applicable). In addition, these studies usually define measurement estimation skills as a unidimensional construct regardless of those characteristics that were implicitly or explicitly considered in the respective test. That means, for example, it is not analyzed whether estimating the length of small touchable objects requires a different skill than estimating the length of large untouchable objects. 
Thus, just as Hogan and Brezinski (2003) raised the question of the structure of quantitative estimation skills, the same question can be asked here about the structure of measurement estimation skills.

Currently, there is a lack of studies with tests which systematically vary different measurement estimation situations and which, thus, are able to analyze and model the structure of measurement estimation skills. This missing knowledge of the dimensionality of measurement estimation skills makes it difficult to compare findings from different studies addressing different estimation situations in the administered tests. Moreover, if it turns out that different dimensions of measurement estimation skills should be distinguished, then this may have consequences for the choice of estimation situations offered in the mathematics classroom as opportunities to learn (e.g. Hogan & Brezinski, 2003). Therefore, one purpose of this research is to analyze the dimensionality of elementary students’ measurement estimation skills, which we exemplify for the most basic case of length estimation.

In addition to the estimation situations that may influence students’ estimation processes and their outcomes, differences in the educational approach—as manifested in varying educational traditions in different countries—may also affect the acquisition of students’ estimation skills. Joram et al. (1998, 2005) suggested that students’ measurement estimation skills are closely related to their conceptual understanding of measurement processes and physical measurement skills that students mainly develop in their mathematics education at school (Outhred & Mitchelmore, 2000). Based on these considerations, different curricula and instructional traditions may result in different knowledge bases and measurement skills of students from different international schooling contexts which — in turn — should yield different outcomes of students’ estimations. Such differences can be examined by contrasting student samples of countries with very different educational traditions like the ones in Western and in East Asian countries (Leung et al., 2006). With regard to our main goal of identifying the structure of length estimation skills, it seems reasonable to consider length estimates performed by students who were educated in these different educational traditions such as Germany as a Western country and Taiwan as an East Asian country (for additional information about the differences in the two teaching traditions regarding length estimation, see Ruwisch & Huang, 2018). However, due to a lack of international comparisons on students’ skills in measurement or measurement estimation, little is known about the differences concerning this topic. Hence, the second aim of the present research is to analyze cross-country differences between German and Taiwanese students’ length estimation skills.

Theoretical Background

In the research literature, the notion estimation occurs in various contexts and is considered from different perspectives. A general definition is provided by Mitchell et al. (1999) who define estimation

as a process whereby one approximates, through rough calculations, the worth, size, or amount of an object or quantity that is present in a given situation. The approximation, or estimate, is a value that is deemed close enough to the exact value or measurement to answer the question being posed. (p. 9)

Their definition reflects that there are different kinds of estimates that can be addressed. In the literature, it is generally differentiated between (1) estimating results of computations, (2) estimating measures, and (3) estimating numerosity (Hogan & Brezinski, 2003; Sowder, 1992). In the current research, we focus on measurement estimation only. More specifically, we focus our research on the specific measurement area “length” and analyze with regard to this specific field, whether different characteristics of estimation situations require different estimation skills.

Length Estimation

Measuring can be characterized as “the process of comparing an attribute of a physical object to some unit selected to quantify that attribute” (Bright, 1976, p. 88). According to Bright (1976), a measurable attribute of an object is a characteristic that can be quantified by comparing it to some standardized unit. For measurement estimation, Bright (1976, p. 89) specifies the process of estimating as “arriving at a measurement or a measure without the aid of measuring tools. It is a mental process though there are often visual or manipulative aspects to it.”

Analyzing this mental process, D’Aniello et al. (2015) propose a multi-step action model. Applying this model to the length estimation process results in the three stages: (1) A representation of the object whose length is to be estimated is activated in the working memory. Subsequently, (2) the length of the resulting representation is estimated, employing knowledge and skills (such as strategies). Finally, (3) the estimated length is evaluated in a monitoring process. Estimation strategies—that become relevant in the second step of the process model—are of special relevance in the estimation process (e.g. Hildreth, 1983; Joram et al., 1998; Siegel et al., 1982). Findings of Siegel et al. (1982) suggest that there are certain characteristics of the estimation problems which may lead to a specific choice of estimation strategy. In addition, they found that using specific strategies affects the estimation accuracy (Siegel et al., 1982).

Three main estimation strategies are distinguished in the research literature (e.g. Hildreth, 1983; Joram et al., 1998; Siegel et al., 1982): (1) unit iteration, which involves counting the number of standard or non-standard units needed to measure the length of a given object; (2) benchmark comparison, which involves comparing the length of the TBEO to the known length of a reference object; and (3) decomposition/recomposition, which decomposes the TBEO into smaller units (decomposition), estimates one of these units using strategy 1 or 2, and adds this estimate as often as needed to reach the estimated length of the entire TBEO (recomposition).

As proposed by the previous arguments, the accuracy of length estimates (which can be an indication of length estimation skills) is related to the use of estimation strategies which is—in turn—related to specific characteristics of the estimation situation. Therefore, it might be possible that different estimation situations require different estimation skills because they can be linked to a specific strategy use. In addition to the strategies, specific knowledge—such as knowledge about physical measurement, about the length of specific objects that can be used as reference objects, or about the TBEOs themselves—is considered a crucial element in the estimation process (D’Aniello et al., 2015). In this regard, the structure of students’ length estimation skills may be determined by different kinds of estimation situations (resulting in different strategy use) and students’ knowledge about measurement and object-specific characteristics. To analyze this structure, firstly, different kinds of estimation situations are necessary to validly assess students’ length estimation skills, and secondly, students from diverse schooling contexts with different educational traditions might be beneficial in order to account for differences in the available knowledge and strategies. In the following two subsections, we elaborate on possibly relevant characteristics of estimation situations and their influence on students’ estimation processes. Finally, we analyze differences in children’s estimation skills with respect to the different instructional approaches in Germany and Taiwan.

Characteristics of Length Estimation Situations

One of the first descriptions of measurement estimation situations that systematically varied specific characteristics was provided by Bright (1976). He distinguished three dichotomous categories which resulted in eight different types of estimation situations. First, he considered the case in which an object is given and one specific characteristic of this object should be estimated. In this case, two independent criteria were distinguished: whether the TBEO is physically present or not, and whether a unit is present or not. Second, he considered the reverse case in which a measure is given and students should assign an object that complies with the given measure. In this case, he again distinguished two criteria: whether a selection of suitable objects is given or not, and whether a unit is present or not. Bright (1976) argued that these eight estimation situations must be addressed in the mathematics classroom in order to successfully and comprehensively foster students’ measurement estimation skills. Starting from this description by Bright and with a view to discovering further potentially relevant characteristics of length estimation situations, Heinze et al. (2018) identified seven characteristics in an analysis based on theoretical arguments and a review of the test concepts of previous empirical studies (Table 1).
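The way three dichotomous categories yield eight situation types can be made concrete with a short enumeration. The following is only an illustrative sketch: the function name and the labels are our own paraphrases of the criteria described above, not Bright's (1976) original wording.

```python
from itertools import product


def bright_situations():
    """Enumerate the eight situation types as (task, criterion, unit) tuples.

    All labels are hypothetical paraphrases chosen for illustration.
    """
    situations = []
    # Case 1: an object is given and its measure is to be estimated.
    for object_present, unit_present in product([True, False], repeat=2):
        situations.append((
            "estimate the measure of a given object",
            "object present" if object_present else "object absent",
            "unit present" if unit_present else "unit absent",
        ))
    # Case 2: a measure is given and a fitting object is to be assigned.
    for selection_given, unit_present in product([True, False], repeat=2):
        situations.append((
            "assign an object to a given measure",
            "selection given" if selection_given else "selection not given",
            "unit present" if unit_present else "unit absent",
        ))
    return situations


for situation in bright_situations():
    print(situation)
```

Each of the two task directions contributes four combinations, which gives the eight situation types.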

Table 1 Characteristics of estimation situations (TBEO = to-be-estimated object)

Heinze et al. (2018) argued that it is relevant for research in the field of length estimation skills to consider these different estimation situations because the low variation of estimation situations might affect the validity of the empirical data and the interpretation of the results.

The Role of Length Estimation Situations when Assessing Length Estimation Skills

Previous research has usually modeled estimation skills as a unidimensional construct regardless of (a) what kinds of estimation contexts (numerosity, measurement, or computational estimation) are used in an assessment of estimation skills (e.g. Harel et al., 2007; Mitchell et al., 1999), (b) whether there is more than one measure to estimate in a test on measurement estimation skills (e.g. Hogan & Brezinski, 2003; Swan & Jones, 1980), or (c) what kinds of estimation situations are considered in a test aiming to assess length estimation skills as one specific measure in particular (e.g. Desli & Giakoumi, 2017; Jones et al., 2009; Siegel et al., 1982). In the case of measurement estimation, the test instruments often use only one type or a small selection of estimation situations but do not systematically vary possible characteristics. For example, Jones et al. (2009) conducted an interview study with students from grades 6 to 9 on their ability to estimate linear measurements. They used estimation situations that they characterized as (1) estimating the length of an object while viewing the object, (2) naming an object from memory for different metric sizes, (3) estimating the lengths of large objects like a building, (4) metric estimation of objects that students can touch or distances they can pace, and (5) student knowledge of using their body as a ruler to measure different objects (Jones et al., 2009, p. 1502). In terms of the categories presented by Heinze et al. (2018), they considered size, estimation condition, activity, scale, role of benchmark, and (implicitly) accessibility (see Footnote 1) as characteristics of estimation situations but did not systematically vary these aspects in their test and data analysis.

Desli and Giakoumi (2017) assessed length estimations by third and fifth-grade students. The tasks which were used in the assessment systematically varied (1) the object’s orientation (horizontal vs. vertical), (2) visual interference (white background vs. pattern background), (3) spatial dimensionality (three-dimensional objects vs. two-dimensional objects), (4) representation of standard measurement units (draw a line vs. indicate an object), (5) unit (standard unit vs. non-standard unit), and (6) size of the object (small vs. large). Other characteristics like the objects’ accessibility or whether a benchmark is given or not were not considered.

In the approach by Siegel et al. (1982), the TBEOs were either physically present or presented as a photograph. With regard to the TBEOs’ accessibility, Siegel et al. (1982, p. 217) stated that “objects and photographs were passed around”, indicating that the TBEOs were touchable. However, it is unclear whether other characteristics such as size or scale were considered for task development.

As illustrated by the presented studies, the existing research on measurement estimation skills considered some characteristics of measurement estimation situations. All of the studies aim to assess children’s or adults’ measurement estimation skills. Therefore, the different tests aim to assess the same construct even if they follow different research goals. In order to validly assess measurement estimation skills, it is necessary to cover all facets that become relevant in this regard. This implies that it is highly relevant to test whether the different characteristics (as presented in Table 1) may assess different facets of the construct of measurement estimation or whether they assess the same facet. However, due to different research goals, previous studies hardly varied these characteristics systematically in order to examine their relevance. Although the studies of Jones et al. (2009) and Desli and Giakoumi (2017) included the greatest variation in characteristics of measurement estimation situations in their estimation assessment, most studies cover only one or two characteristics and unidimensional models are specified for the analyses. Since different characteristics of estimation situations might require different dimensions of estimation skills, unidimensional models of estimation skills might be too simplistic which in turn limits the conclusions drawn from empirical findings based on these models and the comparability between different studies.

Impact of Educational Traditions on Students’ Length Estimation Skills

According to empirical research, measurement estimation skills are learnable and can be improved by teaching interventions (e.g. Jones et al., 2009; Joram et al., 2005). In addition, it is assumed that measurement estimation skills depend on specific knowledge such as knowledge about measurement processes (e.g. Joram et al., 1998) or spatial sense (e.g. Sowder, 1992). Opportunities to acquire this knowledge and these skills are to a large extent provided in school mathematics due to the importance of length estimation skills in the mathematics curriculum (KMK, 2004; MoE, 2010; NCTM, 2000). Therefore, different educational traditions, which manifest in different schooling contexts such as curricula or teacher education, may have a significant influence on students’ length estimation skills. In this respect, international comparative research in recent decades has revealed a great variety of content topics, educational goals, and outcomes of mathematics education in elementary school (e.g. Kelly et al., 2020; Mullis et al., 2020) as well as of the organization, contents, and outcomes of mathematics teacher education (e.g. Blömeke et al., 2014; Tatto et al., 2012). Significant differences in the corresponding educational traditions were observed especially between Western and East Asian countries (Leung et al., 2006). Hence, including samples from a Western and an East Asian culture in a study appears to be a promising way to vary different aspects of educational traditions. In our study, we considered elementary school samples from Germany and Taiwan as exemplary countries from the Western and East Asian cultures.

International comparative studies on students’ mathematical achievement in grade 4 revealed great differences between German and Taiwanese students with a strong advantage for students from Taiwan (e.g. Mullis et al., 2020). Analyses of the elementary mathematics curricula indicate that length estimation is addressed differently in Taiwan and Germany (Ruwisch & Huang, 2018). In Taiwan, there are specific learning opportunities that address length estimation (Huang, 2016; MoE, 2010). Since over 90% of elementary school teachers in Taiwan use textbooks examined by the MoE as their main instructional materials (Askew et al., 2010; Chen, 2019), Taiwanese students have learning opportunities for length estimation in their mathematics classes. In particular, the use of body parts for estimating lengths is explicitly taught (see Fig. 1 for a typical task example from a Taiwanese textbook). As Fig. 1 shows, length estimation is derived from a measurement process that uses non-standard units (such as arm spans) and determines the number of non-standard units that fit the to-be-estimated length. Figure 2 shows an example in which a group of fourth-grade students estimated the length of one side of the classroom: the body length (140 cm) was used repeatedly as a non-standard unit, and the estimate, 7 m, was obtained through multiplication.

Fig. 1 Typical task example from the Teacher Handbook of a Mathematics Textbook series in Taiwan; Na-I, vol. 4 (2015), p. 30

Fig. 2 Taiwanese students’ use of body length for estimating the length of the classroom
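The arithmetic behind the classroom example can be written out as a minimal sketch of unit iteration with a non-standard unit. The function name is hypothetical, and the count of five repetitions is not stated in the text; it is inferred from the reported unit (140 cm) and the reported estimate (7 m).

```python
def unit_iteration_estimate_m(unit_length_cm, repetitions):
    """Estimated length in metres when a unit is laid end to end `repetitions` times."""
    return unit_length_cm * repetitions / 100


# Body length of 140 cm as the non-standard unit.
# Five repetitions is an inferred value (7 m / 1.4 m), not reported in the source.
print(unit_iteration_estimate_m(140, 5))  # 7.0
```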

Textbooks are similarly important in German mathematics classrooms. In elementary school, 86% of German teachers use mathematics textbooks as the basis for their lessons (Mullis et al., 2012), and the textbooks chosen by teachers have an effect on students’ performance in mathematics (e.g. Van den Ham & Heinze, 2018). In contrast to Taiwan, length estimation in German elementary schools is mainly understood as a purely mental process (visual estimation), while strategies like using body parts play a minor role. Length estimation is often implemented as an intermediate step in the introduction of length measurement and serves as a motivation to use a precise, correct, and objective measuring process (Franke & Ruwisch, 2010). One main element in the German teaching tradition is to have children memorize the length of specific objects as representative benchmarks for 1 mm, 1 cm, 10 cm, and 1 m (in German: “Stützpunktvorstellungen”). Students are encouraged to use this knowledge for length estimation and to compare the length of the TBEO with the length of a memorized reference object. Figure 3 shows a typical task example from a German textbook.

Fig. 3 Typical task example from Germany; Buschmeier et al. (2012), p. 100

The findings of the curriculum analysis in Taiwan and Germany are supported by textbook comparisons focusing on length measurement and length estimation (Ruwisch & Huang, 2019). The analysis of three Taiwanese and four German mathematics textbook series for elementary school revealed that German textbooks had a stronger emphasis on visual estimation (i.e. estimating the length of a TBEO without touching it or using aids like body parts), whereas Taiwanese textbooks put a stronger emphasis on the use of non-standard units like body parts as a kind of approximate measurement process for touchable TBEOs. Hence, it is likely that Taiwanese students are taught different estimation strategies than German students (usage of body parts as opposed to mental comparison with memorized objects). As the action model of estimation proposes (e.g. D’Aniello et al., 2015), such differences in strategy use affect the estimation process and its outcome.

In the light of these analyzed differences, we assume that students from Taiwan and Germany have different opportunities to learn length estimation. Hence, cross-national analyses of students’ length estimation skills would provide a good basis for the analysis of the structure of length estimation skills.

Research Intention and Research Questions

Due to the lack of empirical research on the structure of measurement estimation skills, the main aim of our research is to analyze the dimensionality of elementary school students’ length estimation skills and, if possible, to describe a corresponding structure model. In order to pursue this aim, we …

  • developed a length estimation assessment which systematically varies the situational aspects given in Table 1,

  • implemented a cross-national study design with samples of Taiwanese and German students who were in third and fourth grades. Thus, we included students from two countries with different educational traditions in mathematics education which possibly induce different learning processes for the acquisition of length estimation skills.

The study was guided by the following three research questions:

  1. What is the structure of elementary school students’ length estimation skills?

  2. To what extent is the structure of elementary school students’ length estimation skills invariant across the countries Taiwan and Germany?

  3. How do German and Taiwanese elementary school students’ length estimation skills differ and can these differences be explained by the different educational traditions?

In order to examine the research questions, two studies were conducted. In study 1, a length estimation assessment was developed to assess the length estimation skills of third- and fourth-grade students from Taiwan and Germany. As Joram et al. (1998) point out, most children are able to learn physical measurement concepts as early as first and second grade, but these concepts seem to be challenging for many students. Hence, for third- and fourth-graders, measurement concepts should be available and it should be possible to measure length estimation skills with a written test. The newly developed length estimation assessment was tested in study 1, and the structure of elementary school students’ length estimation skills was analyzed. In study 2, a revised test was used to replicate the result on the structure of students’ length estimation skills, to check the measurement invariance between the students from the two countries for the identified dimensions, and to analyze the benefits of the structure model for research when examining the length estimation skills of students educated in Taiwan and Germany. Figure 4 gives an overview of the research approach.

Fig. 4 Overview research intention and design

Study 1

Sample

Ninety-seven students from Taiwan and 151 students from Germany participated in study 1, giving a total of 248 third- and fourth-grade students (see Table 2). The Taiwanese third- and fourth-grade students were recruited from four classes in a public elementary school in Taipei City, Taiwan. In Germany, four third-grade and four fourth-grade classes from public schools near Lueneburg and Kiel took part in study 1.

Table 2 Sample characteristics of study 1

Test Design, Test Development, and Test Administration

The previously described characteristics of estimation situations (Table 1) formed the basis for item development. In total, 28 length estimation items were developed, and each item was classified with regard to the types of estimation situations described in Table 1. Figure 5 presents some sample items (translated into English), which are used below to illustrate the characteristics of estimation situations covered in the test.

Fig. 5 Item examples of the formal estimation test on length estimation skills

Estimation Condition

Items with physically present TBEOs also comprised situations with a picture of an object in original size or situations in which a line of x cm had to be drawn (i.e. the object was physically present after drawing the line; see Item 3 in Fig. 5).

Accessibility

Touchable TBEOs were always physically present, whereas some TBEOs were physically present and visible to the students but were not allowed to be touched. For the latter, several TBEOs were brought into the classroom, and the students were asked during the test to estimate their length without touching them (see Item 1 in Fig. 5; the yellow ribbon was physically present on the classroom’s blackboard but was not allowed to be touched). TBEOs that were physically present and touchable were given as representations in the test booklet (see Item 2 in Fig. 5).

Activity

Some questions asked the students to estimate a measure (see Items 1 and 2 in Fig. 5) while other questions asked the students to draw a line in the length of a given measure (see Item 3 in Fig. 5).

Benchmarks

In some items, a benchmark was described that the students could use for their length estimations (see Item 2 in Fig. 5) while other items did not provide a benchmark (see Item 1 in Fig. 5). For items that provided benchmarks, it was varied whether the length of the benchmark was given (see Item 2 in Fig. 5) or not (see Item 3 in Fig. 5).

Scale

Students were asked to estimate either in the metric units cm or mm, or in non-metric units like stripes (see Item 3 in Fig. 5).

Size

The TBEOs in the assessment varied in size from 1 to 100 cm.

Detailed information about the number of items per characteristic is provided in the Online Supplement (Table 9 in Appendix 1). For reasons of implementation, items were separated into two test booklets: The first test booklet contained all items where either no benchmark was given at all or a benchmark was visible but its length was not given (22 items in total). The students had to finish this test booklet before they received the second booklet, which held six items with given benchmark lengths. This arrangement was chosen to prevent students from using the given benchmark lengths to estimate the TBEOs in the other items.

All the TBEOs used in the test items were carefully chosen to ensure that (a) children in Germany and Taiwan are familiar with the objects and (b) typical everyday representatives of the TBEOs are of equal length in both countries (e.g. a sharpener for pencils). The test manual that involved short item instructions was developed and discussed by the research team in English and the final version was translated into German and Chinese, respectively. During test administration and the subsequent analysis of the item responses, we found no indication that students did not understand the items.

Students were tested in their classrooms by test administrators following a test manual. The testing time was about one class period (40 minutes). After a short introduction of the item format by the test administrator (one additional sample item was discussed with the entire class) as well as an explanation of the TBEOs that were shown in the front of the classroom, the students worked silently and individually in their booklet. Student questions that occurred during the testing were answered according to the test manual.

The students’ answers were scored with regard to their deviation from the original lengths of the TBEOs. If a student’s length estimation did not deviate more than 10% from the original length, the answer was scored with 3 points. If the deviation was greater than 10% but not more than 25%, the answer was scored with 2 points. Length estimations that deviated more than 25% but not more than 50% received 1 point, and answers that deviated more than 50% were scored with 0 points. Test runs showed that one school lesson was sufficient time for almost all students. Therefore, missing responses were considered as a lack of skills and scored with 0 points.
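The scoring rule can be summarized in a few lines of code. This is a minimal sketch of the rule as described above, not the authors' scoring script; the function name and the use of `None` to represent a missing response are assumptions made for illustration.

```python
def score_estimate(estimate, true_length):
    """Return 0 to 3 points based on the relative deviation from the true length.

    `estimate` is None for a missing response (hypothetical convention).
    """
    if estimate is None:
        return 0  # missing responses were scored with 0 points
    deviation = abs(estimate - true_length) / true_length
    if deviation <= 0.10:
        return 3  # not more than 10% off
    if deviation <= 0.25:
        return 2  # more than 10% but not more than 25% off
    if deviation <= 0.50:
        return 1  # more than 25% but not more than 50% off
    return 0      # more than 50% off


# Example with a TBEO of 20 cm: an estimate of 23 cm deviates by 15%.
print(score_estimate(23, 20))  # 2
```

Note that the rule is symmetric: overestimates and underestimates of the same relative size receive the same score.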

Data Analysis of Study 1

We had no hypothesis about how many and which dimensions of length estimation skills might originate from the different characteristics of the estimation situations. Accordingly, we used an exploratory approach and conducted a principal component analysis (PCA) to analyze the structure of the estimation skills. To determine the number of principal components, the eigenvalue criterion and the elbow criterion based on the scree plot were considered. Varimax rotation was used to improve the interpretability of components. All statistical analyses were performed using IBM SPSS Statistics 24 (IBM Corp., 2016).
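For readers who want to follow the analysis steps outside SPSS, the procedure can be sketched with NumPy. This is an illustrative reimplementation under stated assumptions, not the authors' analysis: the function names are our own, the data are simulated, and the rotation omits the Kaiser normalization that SPSS applies by default, so numerical results would not match SPSS output exactly.

```python
import numpy as np


def pca_loadings(scores, n_components):
    """Return all eigenvalues and the unrotated loadings of the first components.

    scores: array of shape (n_students, n_items) holding item scores.
    """
    corr = np.corrcoef(scores, rowvar=False)   # item correlation matrix
    eigvals, eigvecs = np.linalg.eigh(corr)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder to descending
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return eigvals, loadings


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation via the usual SVD-based iteration."""
    n_items, n_comp = loadings.shape
    rotation = np.eye(n_comp)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        col_ss = np.diag((rotated ** 2).sum(axis=0))
        u, s, vt = np.linalg.svd(
            loadings.T @ (rotated ** 3 - rotated @ col_ss / n_items)
        )
        rotation = u @ vt
        new_criterion = s.sum()
        if criterion != 0 and new_criterion / criterion < 1 + tol:
            break  # criterion no longer improves
        criterion = new_criterion
    return loadings @ rotation


# Simulated data only (hypothetical): 200 students, 6 items scored 0 to 3.
rng = np.random.default_rng(0)
demo_scores = rng.integers(0, 4, size=(200, 6)).astype(float)
eigvals, loadings = pca_loadings(demo_scores, n_components=3)
n_kaiser = int((eigvals > 1).sum())  # eigenvalue criterion: eigenvalues above 1
rotated = varimax(loadings)
```

The sorted `eigvals` are what a scree plot displays; the elbow criterion is a visual judgment on that plot and is therefore not automated here.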

Results of Study 1

The first aim of study 1 was to develop an assessment for length estimation skills which is feasible for elementary school children in Taiwan and Germany. As a second aim, study 1 should provide first explorative results on the possible multidimensional structure of length estimation skills.

Bartlett’s test of sphericity was significant (χ²(378) = 1434.35, p < .001), and the Kaiser-Meyer-Olkin measure of sampling adequacy (KMO) was .75, indicating that patterns of correlations are relatively compact and that a PCA should yield reliable factors (Field, 2005, p. 650). Four items were excluded from the analyses due to low correlations with the other items (see Footnote 2). The elbow criterion based on the scree plot suggested a three-component solution (see Appendix 2 in the Online Supplement), whereas the eigenvalue criterion did not yield a plausible solution. The items that load on the three specific components were analyzed in terms of their theoretical similarities, and three dimensions were defined with regard to the theoretically specified characteristics of estimation situations in the items (the rotated component matrix is provided in Table 10 in Appendix 2). The first dimension (Component 1 with 7 items) included estimation situations with TBEOs that were small (≤ 12 cm) and touchable, the second dimension (Component 2 with 8 items) involved TBEOs that were not small (≥ 15 cm) and touchable, and the third dimension (Component 3 with 6 items) included estimation situations with non-touchable TBEOs.
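As a brief aid for reading the test statistic, the sketch below shows the standard formulas for Bartlett's test of sphericity. The function names are our own and this is not the SPSS routine. The degrees of freedom are p(p − 1)/2 for p items, so the reported value of 378 corresponds to p = 28, that is, to the full set of developed items; the text does not state this explicitly, so it should be read as an inference.

```python
import numpy as np


def bartlett_df(n_items):
    """Degrees of freedom of Bartlett's test of sphericity: p(p - 1) / 2."""
    return n_items * (n_items - 1) // 2


def bartlett_statistic(corr, n_obs):
    """Chi-square statistic: -(n - 1 - (2p + 5) / 6) * ln|R|."""
    n_items = corr.shape[0]
    _, log_det = np.linalg.slogdet(corr)  # log-determinant of the correlation matrix
    return -(n_obs - 1 - (2 * n_items + 5) / 6) * log_det


print(bartlett_df(28))  # 378
```

An identity correlation matrix (no correlations among items) gives a statistic of zero, which is the null hypothesis the test rejects here.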

In addition to the four items that were excluded due to low correlations, there are three items with negligible factor loadings. The only two items in the test with small but untouchable TBEOs were either excluded prior to the analyses due to low correlations (Q18) or did not load on any of the three factors (Q10). This is a limitation of study 1 because it is unclear to which of the three dimensions this type of item belongs. From the statistical results we cannot infer whether the structure is (a) small; not small and touchable; not small and not touchable (items with small and untouchable TBEOs in dimension 1) or (b) as suggested above, small and touchable; not small and touchable; not touchable (items with small and untouchable TBEOs in dimension 3).

In sum, the main result from study 1 is that the characteristics of size and accessibility from Table 1 emerged as the two central characteristics for modeling the multidimensional construct “length estimation skills”.

Study 2

In order to gain further insights about the first dimension, additional items with the characteristics “small and untouchable” were developed and used in study 2 with another group of third- and fourth-grade students from German and Taiwanese elementary schools.

Study 2—Sample and Test Instrument

In total, 903 students from Germany and Taiwan participated in study 2. The sample is spread evenly between countries and between third and fourth grades (Table 3). The 453 Taiwanese students were recruited from four public elementary schools (18 classes) in Taipei, Taiwan. The German sample consisted of 450 students (25 classes) from Kiel and its surrounding regions as well as from the area surrounding Lueneburg.

Table 3 Sample size of the estimation test in study 2

Following the evaluation of study 1, some items and the test manual were optimized (e.g. change of a figure). Six additional items were developed to increase the number of items with TBEOs that are not physically present (and therefore not touchable); two of these items had small TBEOs. Finally, 34 items were used in the main study (detailed information is provided in the Online Supplement Table 9 in Appendix 1). The test administration procedure was identical to the procedure of study 1.

Data Analysis of Study 2

Following the results from study 1, confirmatory factor analyses (CFAs) were conducted to confirm or disprove the previously indicated three-dimensional structure. Seven items were excluded from the CFA because they showed low correlations with the other items (see Footnote 3). Due to the nested data structure (students in classes), standard errors were corrected for the CFAs. Two tasks in the estimation test had only one prompt for more than one corresponding item (Q2 a–c, Q13 a–c). For these items, residual variances were allowed to correlate.

In order to verify the previously indicated three-dimensional structure from study 1, different models were specified that vary the identified relevant characteristics size and accessibility of estimation situations (Table 4; detailed information about the item distribution within each model is given in Table 11 in Appendix 3). Several fit indices were considered to compare the models. The χ2 test is very sensitive to sample size (Hu & Bentler, 1999, p. 2), and large sample sizes generally result in significant χ2 values. Therefore, fit indices such as Bentler’s comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA) were compared, as they provide additional information for identifying the model that best represents the given data. Hu and Bentler (1999, p. 27) suggest that models with RMSEA values close to 0.06 or smaller as well as CFI and TLI values close to 0.95 or greater fit the data well. Bentler (1990) suggests considering CFI and TLI values between 0.90 and 0.95 as an acceptable fit as well. All CFAs were conducted using the software Mplus (version 5, Muthén & Muthén, 1998–2011).
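To make the cutoff logic concrete, the following minimal sketch computes RMSEA, CFI, and TLI from χ2 values using the common textbook formulas and then applies the thresholds cited above as hard cutoffs. This is an illustration only: the function names and input values are hypothetical, the phrase "close to" in the cited guidelines is simplified to strict inequalities, and Mplus may compute these indices differently for categorical indicators, so the sketch is not expected to reproduce the reported fit indices.

```python
import math

def fit_indices(chi2_model, df_model, chi2_base, df_base, n_obs):
    """Textbook formulas for RMSEA, CFI and TLI (base = independence model)."""
    d_model = max(chi2_model - df_model, 0.0)       # model noncentrality
    d_base = max(chi2_base - df_base, d_model)      # baseline noncentrality
    rmsea = math.sqrt(d_model / (df_model * (n_obs - 1)))
    cfi = 1.0 - d_model / d_base if d_base > 0 else 1.0
    ratio_base = chi2_base / df_base
    tli = (ratio_base - chi2_model / df_model) / (ratio_base - 1.0)
    return {"RMSEA": rmsea, "CFI": cfi, "TLI": tli}

def judge_fit(ind):
    """Apply the cited thresholds as strict cutoffs (a simplifying assumption)."""
    if ind["RMSEA"] <= 0.06 and ind["CFI"] >= 0.95 and ind["TLI"] >= 0.95:
        return "good"
    if ind["CFI"] >= 0.90 and ind["TLI"] >= 0.90:
        return "acceptable"
    return "poor"

# Hypothetical chi-square values; only n_obs = 903 comes from the text
example = fit_indices(chi2_model=500.0, df_model=300,
                      chi2_base=4000.0, df_base=350, n_obs=903)
verdict = judge_fit(example)
```

With these made-up inputs the CFI and TLI fall between 0.90 and 0.95, so the sketch labels the model "acceptable" rather than "good".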

Table 4 Model specifications considering “size” and “accessibility” (combinable) criteria. Models 4 and 5 are both consistent with the results of study 1

After identifying the underlying structure of length estimation skills, multiple-group analyses were conducted to analyze whether the previously identified three-dimensional structure is invariant across countries (Taiwan and Germany). Multiple-group CFA was conducted in a stepwise procedure, beginning with the least restricted model as suggested by Brown (2006, p. 269). Stepwise, restrictions concerning factor loadings and thresholds were implemented into the model to test for strong factorial invariance. Since the present data is categorical, thresholds were analyzed instead of intercepts (Bowen & Masa, 2015, p. 234). We followed the measurement invariance steps suggested by Bowen and Masa (2015) except that factor loadings and thresholds were constrained and freed simultaneously as suggested by Muthén and Asparouhov (2002). In order to address the third research question, the mean scores of the German and Taiwanese students in our sample were compared and tested for significant differences using t-tests.

Results of Study 2

Results Regarding the Structure of Students’ Length Estimation Skills (Confirmative Approach)

Table 5 shows the resulting fit indices of the five models described in Table 4.

Table 5 Fit indices of the 5 models resulting from CFA

As Table 5 shows, model 5 provides the best fit to the present data. This finding of study 2 confirms the result of study 1 that length estimation skills should preferably be described as a multidimensional construct that takes into account the characteristics size and accessibility of the estimation situations. The resulting three dimensions in study 2 are (1) small, (2) not small and touchable, and (3) not small and not touchable. The three dimensions correlate significantly with each other: small correlates with not small and touchable (r = .64, p < .001); small correlates with not small and not touchable (r = .52, p < .001); not small and touchable correlates with not small and not touchable (r = .51, p < .001).

Results of Measurement Invariance

Multiple-group analyses were conducted to analyze whether the multidimensional structure is invariant across countries (Brown, 2006). As the findings in Table 6 show, separate CFAs conducted in both subsamples indicate that the three-dimensional model can be considered acceptable for both countries.

Table 6 Results of Step 1: baseline models for each country

Subsequently, configural, metric, and scalar invariance were tested (Table 7). Since the multiple-group model in Step 2 shows acceptable fit indices (except for the χ2 value, which becomes significant due to the large sample size; Hu & Bentler, 1999, p. 2), metric and scalar invariance were tested simultaneously in Step 3. Factor loadings and thresholds were constrained to be equal across countries. Fit indices of the restricted model were compared to those of the non-restricted model from Step 2. Based on simulation studies, Meade et al. (2008) recommend using ΔCFI as an indicator of invariance because it is independent of model complexity and sample size, whereas Δχ2 is highly sensitive to the sample size and less sensitive to a lack of invariance. To analyze differences in model fit, we checked whether ΔCFI ≤ 0.01 (Cheung & Rensvold, 2002; Meade et al., 2008) and, in addition, whether ΔRMSEA ≤ 0.015 (Sass, 2011). Both criteria were satisfied for our results (Table 7).
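The comparison of the unrestricted and restricted models can be sketched as a simple decision rule. The function name, the sign conventions for the deltas, and the example values below are assumptions made for illustration; they are not the values from Table 7.

```python
def invariance_supported(cfi_free, cfi_constrained,
                         rmsea_free, rmsea_constrained,
                         max_delta_cfi=0.01, max_delta_rmsea=0.015):
    """Check the two delta criteria when equality constraints are added.

    Deltas are rounded to four decimals so that values sitting exactly on a
    cutoff are not rejected because of floating-point noise.
    """
    delta_cfi = round(cfi_free - cfi_constrained, 4)        # drop in CFI
    delta_rmsea = round(rmsea_constrained - rmsea_free, 4)  # rise in RMSEA
    return delta_cfi <= max_delta_cfi and delta_rmsea <= max_delta_rmsea

# Hypothetical fit indices for the Step 2 (free) and Step 3 (constrained) models
ok = invariance_supported(cfi_free=0.935, cfi_constrained=0.930,
                          rmsea_free=0.030, rmsea_constrained=0.034)
```

Here the CFI drops by 0.005 and the RMSEA rises by 0.004, so both criteria are met and the constrained model would be retained.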

Table 7 Results of Steps 2 and 3: analyzing strong factorial invariance

A further analysis revealed that three factor loadings and thresholds were non-invariant across countries. These values were estimated freely across countries. This means that only 11% of the factor parameters were non-invariant, and the length estimation test proved to be partially measurement invariant so that a comparison of group means across countries is acceptable (Dimitrov, 2010; see Footnote 4).

Results of Cross-National Comparisons of Students’ Length Estimation Skills

In order to analyze the differences between German and Taiwanese students’ length estimation skills in our sample, the items for the three dimensions were grouped into three subtests. Cronbach’s α was determined to judge the subtests’ reliabilities and mean values were compared by conducting t-tests. Table 8 shows that there is a significant difference between the German and Taiwanese students’ skills to estimate the length of small TBEOs, on the one hand, and not small TBEOs that are touchable, on the other hand. In both cases, the Taiwanese students outperformed the German students. However, German students performed significantly better than Taiwanese students on items with TBEOs that are not small and not touchable.
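The two statistics used for the subtests can be sketched as follows. The text does not state which t-test variant was used, so Welch's unequal-variance version is shown here as an assumption; the function names and the toy data are hypothetical and unrelated to the study's data.

```python
import math
from statistics import mean, variance

def cronbach_alpha(item_scores):
    """item_scores: one list per student holding that student's k item scores."""
    k = len(item_scores[0])
    item_vars = [variance([row[i] for row in item_scores]) for i in range(k)]
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

def welch_t(group_a, group_b):
    """Return (t, df) for Welch's t-test on two independent samples."""
    va, vb = variance(group_a), variance(group_b)
    na, nb = len(group_a), len(group_b)
    se2 = va / na + vb / nb                      # squared standard error
    t = (mean(group_a) - mean(group_b)) / math.sqrt(se2)
    df = se2 ** 2 / ((va / na) ** 2 / (na - 1) + (vb / nb) ** 2 / (nb - 1))
    return t, df

# Toy data: four students answering three dichotomous items (1 = acceptable estimate)
alpha = cronbach_alpha([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]])

# Toy subtest scores for two groups
t_value, t_df = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

A negative t value in this sketch means the first group scored lower on average than the second; the p-value would then be read from a t distribution with the returned degrees of freedom.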

Table 8 Group differences in the three dimensions of students’ length estimation skills between German and Taiwanese students

General Discussion

Structure of Elementary School Students’ Length Estimation Skills

Research findings presented in previous studies seem to follow the implicit assumption that students’ length estimation skills can be considered as a unidimensional construct (e.g. Harel et al., 2007; Mitchell et al., 1999). Though some studies distinguish different characteristics of length estimation situations in their items (e.g. Desli & Giakoumi, 2017; Jones et al., 2009; Siegel et al., 1982), an effect of these characteristics on students’ length estimation skills was not examined systematically. In particular, it is an open question whether characteristics of length estimation situations used in test items measure different aspects of length estimation skills. If this is the case, the comparison or combination of reported empirical findings in the sense of cumulating scientific evidence seems problematic. In the present paper, the focus lay on (a) analyzing the structure of length estimation skills, and (b) verifying the relevance of possible dimensions of length estimation skills by analyzing cross-national differences between Taiwanese and German students.

We developed an assessment on length estimation with a systematic variation of seven characteristics of estimation situations (Table 1) as proposed by Bright (1976) and Heinze et al. (2018). The results of the two studies with elementary school students from Taiwan and Germany (i.e. from different educational traditions) yield evidence that a unidimensional model is not an adequate representation of students’ length estimation skills (neither in the whole sample nor in the national subsamples). We were able to show that the three-dimensional structure fit the data of both countries, implying that the characteristics of size and accessibility become crucial for length estimation processes in both countries (even if educational traditions vary). Moreover, the three-dimensional structure of length estimation skills can be adequately assessed across the two groups (partial measurement invariance). These similarities indicate that size and accessibility may be important facets for the teaching and learning of length estimation independent of culture and curriculum. Finally, we found an interesting pattern of differences between Taiwanese and German students’ length estimation skills. In two dimensions, the Taiwanese students of our sample outperformed the German students, and in the third dimension of length estimation skills, the German students of our sample were slightly better than the Taiwanese students. These findings suggest that a simple integration of results of prior research on students’ length estimation skills (e.g. Harel et al., 2007; Hogan & Brezinski, 2003; Mitchell et al., 1999) should be done cautiously and must take into account the characteristics of the test items used in the corresponding studies.

Based on the set of test items used in studies 1 and 2, we found a three-dimensional structure that takes into account the structure-giving characteristics size and accessibility of the TBEO. More specifically, the three resulting dimensions are length estimation of (1) small TBEOs, (2) not small TBEOs which are touchable, and (3) not small TBEOs which are not touchable (Table 5). In our studies, the characteristic “small” refers to TBEOs with a length of 12 cm or less, whereas “not small” means TBEOs with a length of 15–100 cm. The results indicate that estimating the length of small TBEOs involves different skills than estimating the length of TBEOs that are not small (the same is true for touchable and not touchable TBEOs). Different strategies may be required for the two size categories: large TBEOs may call for decomposition, whereas small TBEOs can be estimated using benchmark strategies. It might be the case that children are more familiar with smaller lengths, perhaps because of their finger or hand span as a body measure (which is about 10–15 cm in grade 3/4), so that the identified threshold in the interval 12–15 cm has a certain plausibility. In addition, if children use decomposition strategies, this may require additional skills like spatial abilities (rotating parts of the TBEO mentally) and arithmetic skills (multiplying the length of a part), while small TBEOs can be assessed as a whole. Similarly, different skills are required to estimate the lengths of TBEOs which are touchable as opposed to untouchable TBEOs, which demand mental processes for measurement comparisons. In this case, the results from the cross-country analyses indicate that differences in these items might be traced back to different knowledge and strategy use by students from Taiwan and Germany, as initiated by the different emphasis of the curricula and instructional approaches for length estimation in both countries.

Comparing the mean scores of German and Taiwanese students’ test results in the three resulting dimensions (Table 8) showed significant differences in favor of Taiwanese students for small TBEOs and TBEOs that are not small but touchable. For TBEOs that are not small and not touchable, the mean score of the German subsample was significantly higher than that of the Taiwanese subsample. The skill profiles of the two groups correspond with the different educational traditions in both countries. One main difference between the educational approaches in Germany and Taiwan is (a) the estimation strategy which is taught in class and (b) how explicitly it is addressed. In the German mathematics classroom, length estimation is mainly considered as one specific element in the process of introducing length measurement (Franke & Ruwisch, 2010). Among others, children are asked to measure the lengths of various objects to acquire knowledge about the length of these objects from their surroundings (as benchmark knowledge), which can also be used for estimation purposes. The application of estimation strategies is not emphasized in mathematics classes as a topic on its own. In contrast, length estimation is specifically addressed in the Taiwanese mathematics classroom (Huang, 2016). For example, children are explicitly taught to use their body parts for length estimation (as illustrated in Fig. 1). Hence, it is plausible that Taiwanese students outperform German students in estimating the length of touchable TBEOs in our test because they can use the estimation strategies they were taught. German students are taught to use knowledge about the length of reference objects and, thus, are not dependent on the accessibility of the TBEO. Instead, they are better at using benchmark knowledge to estimate the length of TBEOs that are not touchable. 
For estimating the lengths of the objects that are not small and not touchable, Taiwanese students’ performance was inferior to that of German students, which may result from insufficient experience in using benchmarks mentally (Huang, 2020). Here, another difference between the educational approaches in the two countries becomes apparent: While estimation in Germany is understood as a mental process, in Taiwan length estimation can be seen as an estimation process using non-standard units such as body parts. Collectively, the findings imply that curriculum and instruction on measurement estimation play an essential role in developing students’ skills in length estimation.

Limitations

Findings of empirical studies are always subject to restrictions which might influence the results and their interpretation. Our studies used seven characteristics of estimation situations (Table 1) as a basis for test development. In addition, the lengths of the TBEOs used for the test items were restricted to the interval of 1–100 cm and did not cover, for example, large TBEOs of 2 m or 5 m which are also discussed in mathematics classes in elementary school (e.g. length of the classroom, height of the door). Moreover, the test items with small and not touchable TBEOs did not all function well, and some had to be excluded in both studies. These restrictions of test items might influence the identified structure of length estimation skills. For example, there is a need for a replication study to confirm that the first dimension identified in study 1 and study 2 covers all small TBEOs and not only the small and touchable TBEOs. Therefore, one limitation of the present study is that the imbalance of item characteristics might distort the identified characteristics of the three dimensions. However, though the set of items is restricted, the studies provide evidence that length estimation skills should be modeled as a multidimensional construct. In order to better characterize the relevant dimensions, further evidence and replication studies are needed. A second limitation is that the test on length estimation skills did not assess the estimation strategies children used for reaching their estimates. Hence, it is an open question whether varying characteristics such as accessibility enable or hinder certain strategies such as unit iteration or decomposition/recomposition. A third limitation is the sample selection. It should be mentioned that our analyses are based on the data from students of two intentionally chosen countries (Germany and Taiwan). 
Including more countries with a different variety of schooling contexts and educational traditions might result in other structure-giving characteristics or in more than three dimensions. However, the aim of the present research was to provide evidence that a more complex structure model of length estimation skills has an added value, so that two different countries can be considered sufficient as a first step. A fourth limitation is that we did not assess the knowledge of the students about the TBEOs and benchmarks used in the test. Thus, we cannot differentiate whether a child estimated inaccurately because of poor length estimation skills or because of missing knowledge about the TBEOs and benchmarks. Our test items were reviewed by some elementary school teachers from both countries before test administration. Based on their assessment, we considered the TBEOs and benchmarks as familiar to the children. However, we recommend controlling for this knowledge in future studies. Finally, we can only make assumptions about children’s estimation skills in the field of lengths. Of course, due to the close connections between length and other quantities such as area and volume, it can be assumed that the characteristics of size and accessibility may become relevant there as well. However, this must be investigated in further research.

Conclusions

The results of our studies imply that researchers as well as mathematics educators should be aware that students’ length estimation skills should not, in general, be assumed to be a unidimensional construct. Depending on the research or teaching goals, different estimation situations should be addressed when students’ length estimation behavior is examined or when length estimation is taught in mathematics classes. The characteristics “size” and “accessibility” in particular seem to be relevant when developing test items or learning opportunities for students. Future research should further evaluate the relationships between characteristics of estimation situations, estimation strategies, knowledge, and skills. Specific knowledge and strategy use may have an influence on students’ length estimation skills, as may spatial abilities and measurement skills, as proposed by Sowder (1992) or Joram et al. (1998).