
1 Introduction

Within the context of presage–process–product research and its pursuit of ‘good’ teaching (Type C), learning outcomes (Type A) represent the end goal and the final criterion on which any assessment of teaching must be based (Medley, 1977, 1987a; also see Figs. 2 and 3 in Manizade et al., 2022). In this respect, ‘good’ teaching can be considered teaching that produces the maximum learning outcomes and progress towards the prescribed education goals. However, the very idea of learning outcomes and educational goals has changed throughout the years, mainly because the understanding of what education, and here mathematics education, should entail has changed (Kilpatrick, 2020a; Manizade et al., 2022).

Since the 1980s, worldwide, the increasing demand for knowledge in many areas of life and work has placed the burden of productivity on education systems (Klieme et al., 2008). Consequently, this has led to a stronger focus on ‘outputs’ and ‘outcomes’ at all levels of the educational system and on their transferability to the job market. In such a society, mathematical knowledge, ability, skills and/or competence are seen as an essential prerequisite for meeting the challenges of today’s world (Boesen et al., 2018; Ehmke et al., 2020; Freeman et al., 2015; Gravemeijer et al., 2017; OECD, 2016). Such a need has also led to a broader understanding of what being ‘mathematically’ equipped means. It includes both posing and answering questions in and by means of mathematics (i.e., reasoning, modelling, problem-solving), as well as handling the language, constructs and tools of the field (i.e., formalism and language, handling different representations, handling material aids and tools for mathematical activity, digital tools included; Niss et al., 2017; Niss & Højgaard, 2019). At the same time, mathematics itself consists of different subfields, each of which may employ somewhat different mathematical tools. Some have argued that the development of the underlying frameworks that bring together these constituents has, in turn, affected teaching to a certain extent (Type C) (e.g., Boesen et al., 2014). Such discussions are coupled with considerable differences of opinion regarding which teaching methods are effective and which may help sustain different learning goals and desired outcomes (e.g., Blazar, 2015; Hiebert & Grouws, 2007; Hill et al., 2005).

Indeed, efforts to improve the quality of teaching largely depend on the effectiveness and availability of accurate, detailed and objective evaluations of teaching (Medley, 1987a, 1987b). Students’ learning outcomes represent one of the most favoured criteria, especially in policy contexts, under the assumption that it is reasonable to judge teaching by its results, just as we do for most other activities in life (Darling-Hammond & Rustique-Forrester, 2005). Additionally, both affective and self-belief constructs may be specified as learning outcomes (Ramseier, 2001), here with the rationale that competent participants within a field also hold certain beliefs about the field itself (Aditomo & Klieme, 2020; Radišić & Jensen, 2021). Furthermore, facilitating students’ development of positive self-beliefs and interest in mathematics increases the probability that even students with lower skills gain the opportunity to move forward in developing their own skills and can gradually become individuals who apply reasoning or problem-solving in daily situations (Callan et al., 2021; Freeman et al., 2015; Radišić & Jensen, 2021; Verschaffel et al., 2020).

Against this background, in this chapter, we focus on how learning outcomes have been defined and on some of the conceptualisations that have been used for that purpose. Afterwards, the process of assessing learning outcomes within the classroom context, particularly in relation to international large-scale assessments (ILSAs), will be discussed in more detail. From the perspective of Medley (1987b), successful assessment of student outcomes involves three essential steps (p. 170). First, a standard task or set of tasks must be alike or equivalent for all students, so that differences in the quality of performance do not arise from dissimilarities in the tasks. Next, a detailed, objective and accurate documentary record of performance is required. Finally, a scoring key with clear procedures for deriving the criterion score from the record is required, ensuring that the same score can be obtained no matter who does the scoring. Robust conceptualisation, instrumentation, design and statistical analyses are crucial prerequisites for these three steps to be fulfilled. ILSAs may be seen as prime examples of student outcome assessments that attempt to realise all these conditions.
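Medley’s three conditions can be read as a simple protocol: identical tasks, a faithful documentary record and a deterministic scoring key. The following minimal sketch (in Python, with hypothetical task identifiers and answers) illustrates the scorer-independence the third condition demands; it is an illustration of the principle, not an implementation of any actual assessment.

```python
# A deterministic scoring key: applying it to the same documentary record
# always yields the same criterion score, no matter who does the scoring.
# Task identifiers and answers below are hypothetical placeholders.
SCORING_KEY = {"task_1": "42", "task_2": "x = 3", "task_3": "7/8"}

def score_record(record: dict[str, str]) -> int:
    """Derive the criterion score from a documentary record of responses.

    Every student faces the same standard tasks (the keys of SCORING_KEY),
    so differences in scores cannot stem from dissimilar tasks."""
    return sum(1 for task, answer in record.items()
               if SCORING_KEY.get(task) == answer)

# Two scorers applying the key to the same record necessarily agree:
record = {"task_1": "42", "task_2": "x = 5", "task_3": "7/8"}
assert score_record(record) == score_record(record) == 2
```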

Subsequently, this chapter will also address how technology shapes our understanding of students’ learning and outcomes. Finally, we revisit the idea of an ‘outcome’, examining it in the context of individual students’ characteristics (Type G), namely self-beliefs and interest in mathematics.

2 Being Competent in Math: A Desirable Learning Outcome

Historically, becoming proficient in mathematics has taken on different meanings and expectations (Abrantes, 2001; Kilpatrick et al., 2001; Niss et al., 2017). From the 1930s onwards, when it came to defining mathematics learning outcomes, the focus was primarily on knowledge and understanding of the mathematical content, that is, definitions and theorems and a clear set of associated procedural skills. However, these ideas of being proficient in mathematics were soon challenged (Niss et al., 2017). Examples can be found as early as the work of Polya (1945), who warned that a teacher who only ‘drills’ students in routine operations destroys their interest and hampers their intellectual development. Since the 1980s, the National Council of Teachers of Mathematics (NCTM) in the US has strongly advocated that problem-solving should be the focus of school mathematics and that, at the same time, basic skills should be defined to include more than merely computational ability (1980, p. 1). Similar conceptions were nurtured elsewhere, leading to a firm footing in the understanding that the overall enactment of mathematics and general mathematical thinking is and should be embedded in the different aspects of daily activities, the school curriculum included (Kilpatrick et al., 2001; Niss & Højgaard, 2019; Niss et al., 2017). At the same time, it was acknowledged that none of these different aspects could stand alone or contradict one another (Kilpatrick et al., 2001; Niss et al., 2017; RAND Mathematics Study Panel, 2003). Notably, despite the development in what it means to master mathematics, the desired mastery, that is, the outcome, was inevitably used as a criterion to assess student learning progress in math, the quality of teaching, or even the system itself (Klieme et al., 2008). Moreover, although criterion-oriented outcome evaluation still has a strong foothold in both practice and research, the concept of ‘mathematical knowledge’ has lost its supremacy, and the idea of competence has gained momentum both in mathematics education research and in neighbouring fields like educational psychology (Guskey, 2013; Niss et al., 2017; Kilpatrick, 2020b; Sternberg & Grigorenko, 2003; Sternberg, 2017; Weinert, 2001). There, the idea of competence was mainly discussed within and in connection to the notions of ability and intelligence (Sternberg & Grigorenko, 2003; Sternberg, 2017).

The concept of competence is one of the most elusive in the educational literature (Kilpatrick, 2020b), and arriving at a collective meaning of competence across different fields is even more difficult. The concept is surrounded by a myriad of similar terms, like mastery, proficiency or skill (Niss et al., 2017). Furthermore, one distinguishes between ‘competence’, which carries a broader meaning, and ‘competencies’, which refer to the various facets of competence (Blömeke et al., 2015).

Noting that there are ‘many theoretical approaches and no single conceptual framework’ (Weinert, 2001, p. 46), Weinert recognises seven different ways of defining competence. His framework includes general cognitive competencies, specialised cognitive competencies, the competence–performance model, modifications of the competence–performance model, cognitive competencies and motivational action tendencies, objective and subjective competence concepts and action competence. This idea can be observed as a critical shift within the broader context of presage–process–product research because it moves away from a strictly cognitive lens. Interestingly, starting in the late 1980s, the NCTM Curriculum and Evaluation Standards for School Mathematics (1989), besides focusing on cognitive competencies, already included motivational action tendencies, as recognised by Weinert, in the form that students should learn to value mathematics and become confident in their ability to do mathematics. Revisions in 2000 eliminated these attitudinal and dispositional aspects (NCTM, 2000).

In the mathematics field, by contrast, specialised cognitive competency frameworks dominate. Being context-dependent, such competencies can only be seen to advance as a result of an individual’s interaction with relevant situations and experiences, such as those encountered during mathematics class.

In contrast, Bloom’s taxonomy (1956), with its attempt to outline the cognitive goals of any school subject, can be seen as a predecessor of today’s context-dependent frameworks. Its categories of knowledge, comprehension, application, analysis, synthesis and evaluation, later revised by Anderson and Krathwohl (2001), have not withstood the criticism that the taxonomy does not genuinely fit the needs of the mathematics field (Kilpatrick, 2020b). Nevertheless, a competency framework for mathematics may still comprise a division of processes alone, leaving out the mathematical content, or combine the processes with the subject’s content. Examples of the former are seen in the frameworks proposed by the National Research Council in the United States and the well-known KOM project (Kompetencer og Matematiklæring; Niss, 2003; Niss & Højgaard, 2019) linked to the reform of the Danish education system. Reforms in other countries, under similar influences, followed (e.g., Abrantes, 2001; Boesen et al., 2014; Nortvedt, 2018).

The KOM framework

The KOM project defines mathematical competence as a ‘means to have knowledge about, to understand, to exercise, to apply and relate to and judge mathematics and mathematical activity in a multitude of contexts, which do involve, or potentially might involve, mathematics’ (Niss, 2003, p. 43). It distinguishes eight competencies needed for mastering mathematics, divided into two groups (Niss, 2003, 2015). The first gathers aspects of involvement with and in math: thinking mathematically, posing and solving mathematical problems, modelling mathematically and reasoning mathematically. The second group addresses dealing with and managing mathematical language and tools: representing mathematical entities, handling mathematical symbols and formalisms, making use of aids and tools and communicating in, with and about mathematics (Fig. 1).

Fig. 1 Competencies within the KOM framework (adapted from Niss, 2003). The diagram shows interconnected ellipses labelled with the eight competencies (reasoning competency and modelling competency, among others); an upward arrow represents asking questions in and about mathematics, and a downward arrow represents dealing with mathematical language.

At the same time, it is proposed that each competency has both an analytical and a productive side (Niss & Højgaard, 2019). In addition, each competency can be developed and employed only by dealing with specific topics in math, but the choice of topics is not predetermined and may transcend the subject.

Parallel to the use and development of each of the competencies, the framework also proposes three types of ‘overview and judgement’ that students should develop through their study of mathematics. These concern the application of mathematics, its historical development and its unique nature, which, combined with the eight competencies, may be used (a) descriptively, to describe mathematics teaching and learning; (b) normatively, by proposing outcomes for school mathematics; and (c) metacognitively, aiding teachers and students in monitoring what they are teaching or have learned so far. Overall, the critical impact of the KOM framework lay in its introduction and description of the concepts of mathematical competence and mathematical competencies (Kilpatrick, 2020b), as well as the possible roles these may play in the process of teaching and learning mathematics (Niss & Højgaard, 2019). Ultimately, it should be noted that although the KOM authors recognise the importance of affective and dispositional factors as part of mathematical mastery, these factors are not considered within the KOM framework. Mathematical competence and competencies are, in essence, cognitive constructs.

Strands of mathematical proficiency

In contrast to the competency approach, which focuses on what it takes to do mathematics, Kilpatrick, Swafford and Findell (2001) focus on mathematics learning. Mathematical proficiency is the key concept used for this purpose. The basic premise is that mathematical proficiency should not be seen as a unidimensional trait. Instead, it combines five strands that are mutually intertwined and codependent (Kilpatrick et al., 2001). Central to this understanding is a finding from neighbouring fields (i.e., the ‘cognitive’ sciences, as referred to by Kilpatrick) that deep understanding involves learners being able to connect pieces of existing knowledge. Such connections, in turn, are an essential factor in whether learners can use what they know effectively while solving (mathematical) problems.

The following strands are pertinent to the framework: (a) conceptual understanding, the comprehension of mathematical concepts, operations and relations; (b) procedural fluency, the skill needed to carry out procedures accurately, fittingly and flexibly; (c) strategic competence, the ability to articulate, formulate, represent and solve mathematical problems; (d) adaptive reasoning, the capacity for logical thought, justification, explanation and reflection; and (e) productive disposition, the habitual inclination to see mathematics as functional, useful and meaningful, coupled with a belief in one’s own efficacy (Kilpatrick et al., 2001). Mathematical proficiency cannot be achieved by attending to merely one or two of these strands; rather, this calls for instructional programmes that address them all (Kilpatrick et al., 2001), an argument resonating well with the earlier ideas of Polya (1945) (Fig. 2).

Fig. 2 Five proficiency strands, depicted as the intertwined strands of a rope: adaptive reasoning, strategic competence, conceptual understanding, productive disposition and procedural fluency (adapted from Kilpatrick et al., 2001)

Kilpatrick et al. (2001) argue that, although there is no perfect fit between the proposed strands and the different kinds of knowledge and processes identified by researchers within mathematics education or adjacent fields as contributing to learning, the strands do resonate with a substantial body of literature on the topic. Examples can be found in the research on motivation, which is considered a component of productive disposition, and on metacognition, which contributes to strategic competence. Finally, unlike the KOM framework, which remains solely in the cognitive sphere, the proficiency strands include motivational action tendencies (Weinert, 2001).

How is mathematical competence conceptualised in ILSAs?

Among the frameworks that combine processes and mathematics content, we choose to discuss two coming from the ILSA domain, given their prevalent influence on the understanding of student competence in mathematics worldwide. The first is linked to the Trends in International Mathematics and Science Study (TIMSS), which was initiated in 1995 as a follow-up to the IEA’s previous studies conducted from the 1960s through the 1980s. TIMSS uses the curriculum as the principal organising concept to examine how educational opportunities are provided to students and the factors that affect how students use these opportunities (Mullis, 2017). The TIMSS curriculum model has three aspects: the intended curriculum, the implemented curriculum and the attained curriculum. Combined, these three aspects represent the mathematics students are expected to master. Since 1995, the TIMSS framework has been provided for grades four and eight, with recurrent improvements in each four-year cycle (Mullis, 2017). For example, the TIMSS assessment frameworks for 2019 were updated from those employed in 2015. In this way, the participating countries are given a chance to update information regarding their national curricula, standards and mathematics instruction in every cycle, keeping the frameworks relevant and coherent with previous assessments. At the same time, in each cycle, particular emphasis is given to a specific aspect of the assessment. In the TIMSS 2019 cycle, the focus was on the transition to eTIMSS, that is, conducting the assessments in a digital format, hence providing an enriched measurement of the TIMSS mathematics (and science) frameworks. Here, the mathematics frameworks were updated to suit both digital and paper assessment formats. About half of the countries participating in TIMSS 2019 transitioned to eTIMSS, and the process has continued into the 2023 cycle.

In 2019, the frameworks for both grades four and eight were organised around two dimensions: (1) the content dimension (i.e., the subject matter to be assessed) and (2) the cognitive dimension (i.e., the thinking processes to be assessed; Lindquist et al., 2017). See Table 1 for details.

Table 1 Domains of mathematical competence within the TIMSS framework (adapted from Lindquist et al., 2017)

The content domains differ between grades four and eight, thus reflecting the topics taught at each level. For example, ‘number’ is emphasised more in grade four than in grade eight, at which point algebra is also introduced. Whereas in grade four the ‘data’ domain focuses on collecting, reading and representing data, in grade eight the focus is on the interpretation of data, basic statistics and the fundamentals of probability. Also, about two-thirds of the items demand that students use applying and reasoning skills. The cognitive domains are alike for both grades, with less emphasis on the ‘knowing’ domain in grade eight. Altogether, the cognitive domains largely resemble the earlier categories of Bloom’s taxonomy (1956).

Echoing Bandura’s (1990) distinction between knowing one’s skills and being able to use them well in diverse settings, the question of which skills young adults would need at the end of (compulsory) education to play a constructive role as citizens in society was the guiding principle of OECD policymakers in setting up an international programme to assess the outcomes of schooling (OECD, 1999, 2003, 2013a, 2018; Trier & Peschar, 1995). Unlike TIMSS, the Programme for International Student Assessment (PISA) crosses the boundaries of school curricula by taking a functional view (Klieme et al., 2008), with the idea of being prepared to cope with future demands and challenges. This cross-curricular competence, or life skill, became central within the PISA framework (OECD, 1997, 2013a, 2018).

In the context of mathematics, the PISA framework initially defined mathematical literacy as ‘an individual’s ability, in dealing with the world, to identify, to understand, to engage in and to make well-founded judgements about the role that mathematics plays, as needed for that individual’s current and future life as a constructive, concerned and reflective citizen’ (OECD, 1999, p. 41). The mathematics framework also drew a clear parallel to the eight competencies of the KOM framework, but under the label of skills (e.g., modelling skill; Niss, 2015). Succeeding frameworks replaced ability with the ‘capacity to reason mathematically and to formulate, employ and interpret mathematics to solve problems in a variety of real-world contexts’ (OECD, 2018).

PISA 2003 was the first cycle to focus on students’ mathematical literacy. Its framework entailed the situations/contexts (i.e., personal, educational/occupational, public and scientific) in which the problems were situated, the mathematical content categories (i.e., quantity, space and shape, change and relationships, uncertainty) and the processes (i.e., thinking and reasoning; argumentation; communication; modelling; problem posing and solving; representation; using symbolic, formal and technical language and operations; and the use of aids and tools) employed to solve them (OECD, 2003). The processes serve the purpose of supporting mathematisation, representing constitutive parts of a comprehensive mathematical competence (Niss, 2015). In the 2003 framework, the reliance on the work of Niss and colleagues is even more prominent.

With the 2012 PISA round, the mathematics framework grew significantly, eventually including a reorganisation of the contexts (i.e., a societal category was introduced) and content (i.e., data were combined with uncertainty), while the processes underwent a more significant change. This change led to the following division of the processes students engage in as they solve problems: formulating situations mathematically; employing mathematical concepts, facts, procedures and reasoning; and interpreting, applying and evaluating mathematical outcomes. The associated underlying fundamental mathematical capabilities, which replaced the earlier mathematical competencies (Niss, 2015), include communication; representation; devising strategies; mathematisation; reasoning and argument; symbolic, formal and technical language and operations; and mathematical tools (OECD, 2013a). Redefining the fundamental capabilities served, among other things, to clearly set the stage for a scheme to analyse the demands of PISA items.

Finally, PISA 2022 has aimed to consider mathematics in a ‘rapidly changing world driven by new technologies and trends in which citizens are creative and engaged, making nonroutine judgements for themselves and the society in which they live’ (OECD, 2018, p. 7) (see Fig. 3).

Fig. 3 PISA 2022 mathematics framework: the relationship between mathematical reasoning, the problem-solving (modelling) cycle, mathematical contents, context and selected twenty-first-century skills (adapted from OECD, https://pisa2022-maths.oecd.org/). The figure shows overlapping circles, with mathematical reasoning at the centre surrounded by formulate, employ, and interpret and evaluate; outer circles carry the content categories (e.g., quantity, uncertainty and data, relationships), with double-sided arrows indicating context and skills. Note: This is an adaptation of an original work by the OECD. The opinions expressed and arguments employed in this adaptation are the sole responsibility of the author of the adaptation and should not be reported as representing the official views of the OECD or of its member countries.

These shifts put the focus on the capacity to reason mathematically. At the same time, the effects of technology foster the need for students to understand the computational thinking concepts that are part of mathematical literacy. The theoretical foundations of the PISA mathematics assessment still rest on the fundamental concept of mathematical literacy, relating mathematical reasoning to the three processes of the problem-solving (mathematical modelling) cycle. The framework defines how mathematical content knowledge is organised into four content categories, coupled with four context categories that situate the mathematical challenges students face. Novel to the framework is a more detailed description of mathematical reasoning, which includes six basic understandings that deliver structure and support. These include (1) understanding quantity, number systems and their algebraic properties; (2) valuing the power of abstraction and symbolic representation; (3) seeing mathematical structures and their regularities; (4) distinguishing functional relationships between quantities; (5) using mathematical modelling as a lens onto the real world; and (6) understanding variation as the fundamentals of statistics.

To sum up, irrespective of whether a competency framework is hierarchical (e.g., Bloom), whether it addresses topic areas in mathematics (e.g., TIMSS) or not (e.g., the KOM framework), or what its primary use is (normative vs. descriptive), the frameworks serve to demonstrate that learning mathematics, and its outcomes, amounts to more than acquiring a myriad of facts. Instead, mastering mathematics as an outcome of learning (Type A) involves grappling with its content and goes beyond carrying out well-rehearsed procedures. Although school mathematics is often seen as a simple match between knowledge and skill, competency frameworks challenge this view, increasingly affecting curricular contents (Abrantes, 2001; Boesen et al., 2014; Nortvedt, 2018). Even if it may appear that the frameworks do not communicate well, fundamentally, some form of mathematical modelling is described in each (i.e., in PISA by ‘formulate’, ‘employ’ and ‘interpret’ or in TIMSS by ‘applying’ and ‘reasoning’). Still, how explicitly this is stated (e.g., the modelling competency in KOM) varies. What separates KOM from the PISA and TIMSS frameworks is that the latter consider the reality of ILSAs; that is, clear operationalisations are needed for measurement to take place (Medley, 1987b), since ‘elements have to be separable to be measurable’ (Niss et al., 2017, p. 241). A clear example of this principle can be found in the introduction of the fundamental capabilities within the 2012 PISA mathematics framework. In the following section, we focus on how the learning of mathematics and its outcomes are captured.

3 Assessment of Student Outcomes

It has been argued that for as long as there have been students learning mathematics in one form or another, they have experienced some form of assessment, whether to observe the impact of the teaching they were exposed to or to gauge how much of the content they had mastered (Niss, 1993; Suurtamm et al., 2016). Thus, although it may seem that discussion of student learning outcomes has recently picked up in intensity, especially where the results of ILSAs are concerned, measuring student outcomes has a long tradition. Furthermore, Kilpatrick (1993) maintains that the notion of assessing the mathematics students have learned was unavoidably entwined with the questions of who should receive additional mathematics instruction and how that instruction should be brought about. Assessment and the measuring of outcomes have thus been an integral part of teaching and learning from the beginning (Suurtamm et al., 2016).

According to Niss et al. (1998), in mathematics, the term assessment refers to the identification and appraisal of students’ knowledge, insights, understanding, skills, achievement, performance and capability in math. Pegg (2002) later contests this, stating that the dominant view of assessment in mathematics has focused on content, specific skills and the production of these in a given situation. In addition, assessments are never free of context; they serve different stakeholders and are bound by the available resources. So, within the classroom context, although assessing students’ problem-solving skills may be inviting, many teacher-made tests will still focus on computational skills because these are less time-consuming to assess (Palm et al., 2011).

Indeed, one of the principal reasons the assessment of student outcomes has attracted increased attention from the international mathematics education community is that, during the past couple of decades, the field of mathematics education has developed considerably (Suurtamm et al., 2016). Nevertheless, assessment practice seems to be somewhat lagging (e.g., paper and pencil is still the dominant format), and the ideas of mathematics as a hierarchically organised school subject and a vehicle for regulating education still seem to be alive (Kilpatrick, 1993, 2020a). Thus, the challenge of assessing students’ learning gains in mathematics still centres on producing measures that allow for an understanding of how students come to use mathematics (Type A) in different social settings and how one can create mathematics instruction that helps them use mathematics even better (Type C) (Blazar, 2015; Blömeke et al., 2016; Hiebert & Grouws, 2007; Kilpatrick, 2020a; Manizade et al., 2022; Medley, 1977, 1987a).

One also needs to acknowledge that the variety of assessment practices has increased. Still, Nortvedt and Buchholtz (2018) recognise that discussions within the field of mathematics education are often influenced by discussions in neighbouring disciplines (e.g., educational psychology), ultimately affecting the purpose, conceptualisation and chosen outlets of assessment. To date, the purpose of assessing student outcomes in mathematics has varied. Debates are still very much alive on what the purpose of mathematics education is (Niss, 2007; Niss et al., 2017) and on optimal teaching practices (e.g., Blazar, 2015; Hiebert & Grouws, 2007; Hill et al., 2005); the same goes for what the primary purpose of assessments in and of mathematics should be. Although some strongly argue that assessments should be used mainly to improve learning (e.g., Black & Wiliam, 2012; Niss, 1993), the formative–summative debate, coupled with the existence of ILSAs and national tests, continues to ignite discussion, despite attempts to merge some of the contrasting perspectives (e.g., Buchholtz et al., 2018; Nortvedt & Buchholtz, 2018). Each type of assessment may target different audiences and needs. While some have the purpose of informing policy-making, others are intended to inform the teacher teaching a particular group of students (Nortvedt, 2018). These goals can also be combined within a particular assessment.

‘One size of assessment does not fit all’

Student learning outcomes are assessed in different contexts and for different purposes (Klieme et al., 2008; Niss, 1993; Kilpatrick, 1993, 2020a; Suurtamm et al., 2016). These include ILSAs (e.g., TIMSS and PISA), evaluations of implemented programmes and classroom assessments. Assessment is thus of central importance in education (Taras, 2005). Furthermore, given that the realisation of many educational decisions, choices and interventions depends on assessments, their accuracy in monitoring learning outcomes is pivotal (Klieme et al., 2008). At the individual level, assessments provide teachers with an opportunity to promote individual learning. However, they may also be decisive in granting or denying students the opportunity to continue education in a desired field (e.g., an entry test to the STEM field in higher education). Conversely, assessments that report results at an aggregated level (e.g., a country’s score in PISA) evaluate institutions or systems and advise and inform decision-makers and policy. Thus, ‘one size of assessment does not fit all’ purposes (Pellegrino et al., 2001, p. 222).

However, when it comes to improving learning outcomes, the discussion often centres on the formative–summative divide. If we assume that the central aim of educational research is to improve teaching and learning processes, formative assessment can be seen as one such tool (Black & Wiliam, 2012; Niss, 1993; Pinger et al., 2018; Taras, 2005; Thompson et al., 2018). Formative assessment is founded on the notion of evaluating students’ understanding and progress regularly throughout the process of teaching and using this information to improve both teaching and learning. Teachers can thus use this information to adapt their (mathematics) instruction, aligning it with students’ needs and providing them with feedback to improve learning. For an assessment to be regarded as formative, it is fundamental that the assessment information is used successively to alter students’ learning processes (Black & Wiliam, 2009, 2012). Providing students with feedback is a powerful tool for changing learning processes and, as a result, is regarded as a key strategy in realising formative assessment (Hattie & Timperley, 2007; Klieme et al., 2008).

At the same time, the assessment of individual achievements may also entail the summative evaluation of competencies, either at the individual or the aggregated level (Klieme et al., 2008; Taras, 2005). Such evaluations help determine whether a student has reached a certain level of competence, for example, upon completion of upper secondary education. As such, these evaluations can have significant consequences, representing a high-stakes test for the student (de Lange, 2007; Klieme et al., 2008). However, if a student takes part in an ILSA survey such as TIMSS or PISA, the practical consequences of taking the test at the student level are nonexistent (e.g., there is no risk of getting a low mark), making it low stakes. In contrast, the consequences of the same assessment at the system level may be considerable and lead to practical policy decisions (e.g., the ‘PISA shock’ in Germany; de Lange, 2007).

International large-scale assessment and the field of mathematics

Robust conceptualisation, instrumentation and design have long been recognised as essential to successful assessment (Medley, 1987b). Large-scale studies utilise complex and often representative samples; offer multi-layered, rich data and results; allow for the latter’s generalisability; and are created to describe and inform about a particular system rather than an individual student (Middleton et al., 2015). Of course, such studies are also rather costly. Nevertheless, in past decades, many countries have opted for some version of large-scale assessment. Examples may be found in the national mathematics tests in Norway (Nortvedt, 2018) and Sweden (Boesen et al., 2018), the National Educational Panel Study (NEPS) in Germany (Ehmke et al., 2020) and the National Assessment of Educational Progress (NAEP) in the United States (NCES, 2021). Common across these is that they attempt to capture, in one way or another, what it means to be mathematically competent. However, despite this ‘common’ goal, it can be questioned to what extent they all measure the same thing, the ‘what’ of the assessment (Nortvedt & Buchholtz, 2018). A joint criterion or framework is missing; comparing them is like measuring and comparing temperatures in different capitals across the world without referencing either the Celsius or Fahrenheit scale (Cartwright et al., 2003). ILSAs produce such a frame of reference, thus attracting much attention in educational research and outside the field, that is, in policy and the media, when discussing the quality of education in different countries and how that quality can be improved (de Lange, 2007; Nortvedt, 2018).
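Statistically, this common frame of reference rests on item response theory (IRT) scaling, whereby students and items are placed on one latent scale. As a minimal sketch, the Rasch model, the simplest member of the IRT family on which ILSA scaling builds, models the probability that student $p$ answers item $i$ correctly as

$$P(X_{pi} = 1 \mid \theta_p, b_i) = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)},$$

where $\theta_p$ is the student’s latent proficiency and $b_i$ the item’s difficulty. Because items are calibrated onto a common difficulty scale, students who responded to different (but linked) item sets can still be reported on the same proficiency scale, which is what makes cross-country and cross-cycle comparisons meaningful. The operational PISA and TIMSS scaling models are more elaborate; this equation is only meant to convey the principle.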

The origins of ILSAs date back to the 1960s, with the International Association for the Evaluation of Educational Achievement (IEA) established in 1959. Its purpose was to conduct international comparative research studies focused on educational achievement and the factors influencing it. In this initial stage, the aim was to understand the vast complexity of the aspects influencing student achievement in different subject fields, mathematics being one of them. The famous metaphor used by the founding researchers was that they ‘wanted to use the world as an educational laboratory to investigate effects of school, home, student and societal factors’ (Gustafsson, 2008, p. 1). The argument was that an international comparative methodology was essential for investigating the effects of many of these factors. The first study, investigating mathematics achievement in 12 countries, started in 1964. The Six Subject Survey, conducted in 1970–71, followed, gathering information on reading comprehension, literature, civic education, English and French as foreign languages, and science. Throughout the 1980s, the mathematics and science studies were repeated.

During the 1990s, the IEA was transformed, and TIMSS was born, creating a slight shift in focus: describing the educational systems that partake in the study. The published international reports primarily describe the outcomes alongside background and process factors. There is no attempt to explain the variations in outcomes between school systems, and inferences about causes and effects are likewise omitted. The latter are left to the participating countries, with caution urged when claiming causality because of the cross-sectional design of ILSAs (Rutkowski et al., 2010).

In many cases, the results were used to evaluate educational quality as a basis for national discussions about educational policy (Cai et al., 2015; Gustafsson, 2008; Middleton et al., 2015). This purpose became even more prominent with the establishment of PISA in 2000 (OECD, 2001). The volume and frequency of ILSAs have increased (i.e., cycles every three and four years), along with the number of participating countries (e.g., 58 in grade four and 39 in grade eight in the 2019 TIMSS cycle). Both aspects have contributed to comparing and contrasting systems and particular groups within a system, for example, boys and girls in grade four in Norway or across the Scandinavian countries (Cai et al., 2015), helping to identify the affordances and strengths within and across each (Mullis et al., 2016; Mullis, 2017; OECD, 2013b, 2016).

In terms of methodological challenges, ILSAs have been questioned on whether a single assessment format or a particular test can grasp the full image of being skilful in mathematics, the ‘how(s)’ of assessment (Nortvedt & Buchholtz, 2018), and provide comparable measures of curriculum effects across countries (de Lange, 2007). Jablonka (2003) addresses this situated nature of mathematics competence in the context of PISA, stating that the contexts used in the assessment will be more familiar to some students than to others (e.g., students across Europe compared with students in many African countries). Cultural differences are an essential aspect in understanding students’ mastery of mathematics as a field (Manizade et al., 2022). This variation is visible in the ILSA results. For example, PISA 2012 reports a significant variance across countries (OECD, 2013b), with, on average, as many as 43% of students perceiving themselves as not competent in mathematics. Within Europe alone, in the same cycle, the share of students who scored low in mathematics (below level 2) ranged from 10.5% (Estonia) to 60.7% (Albania). However, according to de Lange (2007), follow-up discussions about the outcomes of ILSAs are often about politics rather than performance, and the consequences of having results similar or dissimilar to a neighbouring country may not be acted upon in the fashion envisioned. Taking the example of PISA, Baird et al. (2016) claim that the connection between PISA results and policy is not consistent. PISA’s ‘supranational spell’ (p. 133) in policy refers to how its results are used as a magic wand in political discourse, as though the results themselves invoked particular policy choices. Instead, such uses divert attention from the ideological basis for reforms, given that the same PISA results could motivate different policy solutions.

Most often, when a new set of PISA or TIMSS mathematics results comes out, policymakers, the media and parts of academia focus on the country rankings, using the position in the league tables as an indication of system quality. Auld and Morris (2016) dispute such a view, claiming it reduces the complexity of the information ILSAs may provide while decreasing the opportunity to identify insights that could be used to learn valuable lessons about school effectiveness and inform national educational policies.

In observing the benefits of partaking in ILSA programmes and their relevance to mathematics as a field, Sälzer and Prenzel (2014) argue that ILSAs provide a standard or benchmark against which countries can measure themselves. In addition, the abundance of data collected about schools, processes and outcomes allows for profound insights into policy and decision-making and for observing particular patterns relevant to the teaching and learning process. Cai et al. (2016) are even more explicit regarding the affordances mathematics education gains from ILSAs; these include understanding students’ mathematical thinking, classroom instruction, students’ experiences with teaching and students’ disposition towards mathematics. All of these are highly valuable to mathematics education researchers, as well as to school leaders and teachers. Moreover, with the recent uptake of technology in mathematics classrooms, ILSAs can also be a vehicle for understanding what it means to be competent in mathematics in a digital environment (Stacey & Wiliam, 2013).

How does technology affect our understanding of student outcomes in mathematics?

The use of technology, especially within the past decade, has influenced how mathematics is viewed (Manizade et al., 2022). Technology has enabled a ‘transformation of [the field] from static to dynamic symbolic systems through which teachers and learners can access knowledge and think’ (Hegedus & Moreno-Armella, 2018, p. 1). It has also set a new understanding of students’ competence, including the handling of digital tools (Niss et al., 2017), affected curricular goals (Gravemeijer et al., 2017) and initiated the need for different kinds of assessments to probe students’ skills in new ways (Li & Ma, 2010; Stacey & Wiliam, 2013). Several studies have analysed technology-enhanced learning environments in mathematics classrooms (e.g., Higgins et al., 2019; Hillmayr et al., 2020; Pape et al., 2013). In their meta-study, Li and Ma (2010) show that the effect of technology may vary. For example, technology may promote elementary students’ achievement more than secondary students’, or that of students in special needs education more than that of the general population. In addition, the positive effect of technology has been found to be more significant when combined with constructivist instructional approaches rather than traditional ones. Drijvers (2015) urges caution, stating that the integration of technology in mathematics education is a subtle question whose success and failure occur at the levels of learning, teaching and research (p. 147).

Over the past decade, digital assessments have emerged primarily in the context of large-scale assessments of student outcomes, both in national (e.g., Norway, Japan, the USA) and international contexts. TIMSS and PISA are clear examples of the latter. It has been argued that computer-based assessments (CBAs) possess several advantages. They allow complex stimuli, response formats and interactive testing procedures and may incorporate computerised adaptive testing (Klieme et al., 2008). In adaptive testing, the task (stimulus) presented is chosen in real time to fit the individual ability level of the test taker. Feedback procedures may also be incorporated (Chung et al., 2008), thus allowing the assessment of learning progress (i.e., ‘dynamic testing’). Moreover, CBAs allow for the production of complex and interactive stimuli that would be very expensive or difficult to realise on paper. Consequently, the practice may afford the assessment of new competencies previously not accessible through more traditional procedures.
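To make the adaptive mechanism concrete, the following sketch pairs the Rasch model introduced above with the simplest sensible item-selection rule: always administer the unused item whose difficulty lies closest to the current ability estimate (where a Rasch item is most informative). The up/down stepping rule that updates the estimate is a deliberately simplified stand-in for the maximum-likelihood estimation used in operational adaptive tests, and the item pool is invented for illustration.

```python
import math
import random

def p_correct(theta: float, b: float) -> float:
    """Rasch model: probability that a student with ability theta
    answers an item of difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def pick_item(theta_hat: float, pool: list[float]) -> float:
    """Select the unused item whose difficulty is closest to the current
    ability estimate (maximum information under the Rasch model)."""
    return min(pool, key=lambda b: abs(b - theta_hat))

def adaptive_test(true_theta: float, pool: list[float], n_items: int = 10) -> float:
    theta_hat, step = 0.0, 1.0            # start at the pool centre
    pool = pool.copy()
    for _ in range(n_items):
        b = pick_item(theta_hat, pool)
        pool.remove(b)
        correct = random.random() < p_correct(true_theta, b)  # simulated response
        theta_hat += step if correct else -step  # move towards the ability level
        step = max(step * 0.7, 0.2)       # shrink steps as evidence accumulates
    return theta_hat

random.seed(1)
item_pool = [i / 4 for i in range(-12, 13)]  # difficulties from -3 to 3 logits
print(adaptive_test(true_theta=1.2, pool=item_pool))
```

Operational systems add exposure control, content balancing and proper standard errors; the point here is only that each response immediately steers the choice of the next task.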

Software suitable for use in the mathematics classroom is also increasingly available. Its use within the classroom context is advocated with the argument that it improves the quality of mathematics teaching and makes assessment more realistic and attuned to the needs of new generations of learners (Gravemeijer et al., 2017; Hoogland & Tout, 2018). The possibility of simulating real-life situations within the assessment makes CBA well suited to capturing what Weinert (2001) describes as context-specific cognitive dispositions.

Although digital tools may enable new and enhanced possibilities for the learning, teaching and assessment of mathematics (Drasgow, 2002; Drijvers, 2015; Higgins et al., 2019; Hillmayr et al., 2020; Pape et al., 2013), only their appropriate use and equal availability to all participants will produce a positive impact (Gravemeijer et al., 2017; Higgins et al., 2019; Li & Ma, 2010). With this in mind, a core requirement of any assessment is to afford all students optimal opportunities to demonstrate what they have learned and can do (Niss, 2007). The same requirement holds for technology-assisted assessments. Although the recent shift towards assessments focusing on problem-solving and modelling may benefit from technology, others argue that assessments of student outcomes combining the affordances of technology and nonstatic item formats allow students to demonstrate mastery of a broader range of mathematical skills (Hoogland & Tout, 2018; Stacey & Wiliam, 2013). Conversely, although a shift from traditional paper-based to computer-assisted assessment may favour tasks requiring students to model a solution, Jerrim (2016) warns of possible adverse effects on student outcomes. These risks arise primarily when mathematics teaching does not include the use of such tools. An interesting finding here comes from PISA 2012. Together with the regular paper-and-pencil test, a computer-based assessment in mathematics was offered as an option. Among the European countries that took advantage of this, only a handful remained at the same competency level when paper- and computer-based assessment results were compared (OECD, 2013b).

Furthermore, studies show that the impact of computer-assisted assessment also relies on students’ prior experience. In some cases, this concerns general computer skills (Falck et al., 2018; Stacey & Wiliam, 2013), whereas in others, it concerns the understanding and use of specific tools (Hoogland & Tout, 2018) or item formats (e.g., real-life problems). Hillmayr et al. (2020) show that, overall, digital tools positively affect student learning outcomes; however, the provision of teacher training on digital tool use significantly moderates the effect. The effect size is larger when digital tools are used in addition to other instruction methods rather than as a substitute, with intelligent tutoring systems and dynamic mathematical tools being more beneficial than hypermedia systems.

The new opportunities CBAs provide have opened doors for their use in different contexts that remain relevant to mathematics teaching and learning. Nevertheless, many of these applications are driven by the rapid development of computer technology rather than by well-founded models and theories. Thus, much empirical and theoretical work is needed to link complex measurement potentials to particular learning outcomes and/or instructional practices, hence maintaining rigour in the conceptualisation, instrumentation and design (Medley, 1987b) of future assessments.

4 Optimal Self-beliefs and Motivation for Math: A New-Old Learning Outcome

So far, we have discussed students’ learning outcomes as achievement outcomes. However, an array of motivational and ability-belief constructs may also be specified as learning outcomes (Ramseier, 2001). Although Weinert (2001) recognises ‘cognitive competencies and motivational action tendencies’ as one of the seven ways to define competence, in Medley’s reflections (1987a), individual student characteristics (Type G) are seen as mediating the relationship between A (outcomes) and B (students’ learning activities). Medley (1987a) states, ‘Even if two pupils have identical learning experiences, they do not show identical outcomes because of differences in these characteristics’ (p. 105). To date, such a lens has dominated the field. ILSAs are a clear example, with domain achievement measured separately from attitudinal constructs and the latter often reported in relation to achievement. An exception may be found in the framework of Kilpatrick et al. (2001), which includes motivational action tendencies.

The tradition of observing ‘attitude’ towards mathematics in mathematics education research was apparent as early as the 1950s (Zan & Di Martino, 2020). However, one fundamental characteristic of the research in that period was the absence of a proper definition or theoretical background. Schukajlow et al. (2017) recognise an increased interest in attitudinal constructs (i.e., motivation, affect, ability beliefs) in mathematics education over the last decade, while Zan and Di Martino (2020) attribute the beginnings of modern research on these topics to McLeod (1992), who included attitude among the three factors that define the affective domain.

At the same time, neighbouring fields, namely educational psychology, have produced a wealth of conceptualisations and frameworks that explain what drives human action (e.g., Ryan & Deci, 2016; Eccles & Wigfield, 2020; Hidi & Renninger, 2006). Several of these frameworks are used in mathematics education research, each with its own terms (Schukajlow et al., 2017). Among them, attitudes, self-beliefs, intrinsic motivation and interest are probably the most commonly used. Amid the diverse frameworks aiming to explain students’ motivation, expectancy-value (EV) theory covers a variety of aspects that affect the decisions students make by relating students’ expectancies for success and subjective task values to their achievement and achievement-related choices (Eccles & Wigfield, 2020; Wigfield & Eccles, 2000).

Within the EV framework, expectancies for success originate from a person’s domain-specific beliefs, grounded in experience, about their ability to succeed in future tasks, like solving a mathematical problem. Though the labels these beliefs carry differ somewhat, confidence, self-efficacy and self-concept are all found under the category of ability beliefs (Lee & Stankov, 2018). Furthermore, the EV model recognises four subjective task values: intrinsic value, attainment value, utility value and cost (Eccles & Wigfield, 2020; Wigfield & Eccles, 2000). Intrinsic value relates to the anticipated enjoyment one expects to gain from doing a task; the dimension itself is similar in certain respects to the concepts of interest (Hidi & Renninger, 2006) and intrinsic motivation (Ryan & Deci, 2016). Attainment value relates to identity and how important the task is for the individual. Utility value indicates how useful the task is for other goals; in certain respects, it is related to the idea of extrinsic motivation (Ryan & Deci, 2016). Finally, cost indicates the time, effort, stress and other valued activities given up to complete the task at hand.

Today, when mastering math is seen as a requirement for meeting the demands of modern life (Boesen et al., 2018; Ehmke et al., 2020; Freeman et al., 2015; Gravemeijer et al., 2017; OECD, 2016), research demonstrates that students’ task values and ability beliefs are fundamental to their optimal outcomes in mathematics (Dowker et al., 2016; Marsh et al., 2012; Schöber et al., 2018; Skaalvik et al., 2015; Stankov & Lee, 2017; Wang, 2012; Watt et al., 2012). For example, in PISA 2012, a rise of one standard deviation in self-efficacy was linked to an increase of 49 score points in achievement, the equivalent of more than one school year (OECD, 2013b). Similarly, students experiencing low ability beliefs are potentially at risk of underperforming (OECD, 2013b).
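The ‘49 score points per standard deviation’ figure is simply a standardised regression slope, and it describes an association rather than a causal effect. The following sketch reproduces that reading on synthetic data; the intercept of 480 and the residual spread are hypothetical choices, and only the slope of 49 mirrors the PISA 2012 association cited above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z_self_efficacy = rng.standard_normal(n)  # standardised: mean 0, SD 1

# Synthetic achievement scores: the intercept (480) and noise SD (80) are
# hypothetical; the slope of 49 mirrors the association cited in the text.
score = 480 + 49 * z_self_efficacy + rng.normal(0, 80, n)

slope, intercept = np.polyfit(z_self_efficacy, score, deg=1)
print(round(slope, 1))  # ~49: one SD more self-efficacy, ~49 more score points
```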

Nevertheless, the relationship between achievement in mathematics and different motivational and belief constructs is not always straightforward, nor does it portray a single image (Zan & Di Martino, 2020). For example, Wang (2012) argues that task values are stronger predictors of engagement and of choices to stay in the field of mathematics, while expectancies of success better predict more immediate student achievement outcomes. Watt et al. (2012) draw a similar conclusion for intrinsic value, linking ability beliefs to staying within the field (i.e., career choice in math). Prast et al. (2018) argue for a unique contribution of perceived competence in predicting subsequent achievement in mathematics.

Gender differences in students’ ability beliefs in mathematics are relatively common (Nagy et al., 2010), mostly favouring boys (e.g., Geary et al., 2019). Likewise, although it has been shown that among ability beliefs, positive self-concept is conducive to student learning and achievement in mathematics (Marsh et al., 2017), the relationship itself can be direct, indirect (Habók et al., 2020) or reciprocal (Schöber et al., 2018).

Despite this somewhat diverse image of how students’ ability beliefs and task values, namely intrinsic value, contribute to achievement in math, that is, of the genuine relationship between what Medley (1987a) labels the A and G type variables, there is growing support for fostering the development of positive self-beliefs and interest in math. The latter has gained a foothold given that both self-beliefs and interest are regarded as facilitators of students becoming individuals who engage in mathematical reasoning, apply problem-solving in daily situations and even choose careers in mathematics (Freeman et al., 2015). From the perspective of lifelong learning, this is crucial, given its end goal of building highly competent, engaged individuals. The reasoning backs the notion that competent participants within a field are also those who hold certain beliefs about the field itself (Aditomo & Klieme, 2020), such as valuing the use of mathematical reasoning in everyday life. Also, supporting students’ development of positive self-beliefs and interest in math increases the likelihood that even students with lower competence will acquire the opportunity to move forward in developing their skills and gradually become individuals who engage in, for example, applying problem-solving or some form of mathematical modelling in daily situations (Callan et al., 2021; Radišić & Jensen, 2021).

Echoing what Maehr (1976) noted many years ago, that motivation is one of the most essential yet seldom studied educational outcomes, Anderman and Grey (2017) conclude that motivation matters. However, coupled with ability beliefs, motivation is still not considered a prized outcome in (mathematics) education. Undeniably, ‘achievement’ repeatedly trumps motivation. Although, across many countries, decision-makers proudly proclaim the high achievement students have reached in a particular domain, like mathematics, little focus is given to whether those students subsequently wish to continue pursuing a career in mathematics (Anderman & Grey, 2017).

5 Concluding Remarks

Starting from the presage–process–product paradigm and the reasoning formed primarily in the period after Medley’s reflections (1977, 1987a, 1987b) on the variables relevant to understanding mathematics teaching and student outcomes as its ultimate goal (Manizade et al., 2022), in the present chapter, an attempt has been made to provide an overview of the main lines of rationale in mathematics education research on student learning outcomes and their assessment. The point of departure in this process has been capturing the basic ideas of what it means to be proficient in mathematics and how students’ outcomes could be understood in light of such ideas. The focus was on different conceptual frameworks rather than on a particular theoretical background. In doing so, different frameworks were presented, with no ambition to capture all of them but instead to sketch the flow of ideas pre- and post-Medley. This was achieved by showing dominant orientations (e.g., the dominance of cognitive and context-specific frameworks), their possible similarities (e.g., the KOM and PISA frameworks) and the dissimilarities in the understandings each of them provides (e.g., TIMSS and PISA).

A discussion of some core aspects of assessing student outcomes followed, capturing the historical perspective within mathematics education, including the foundations of formative versus summative assessment, and viewing the process through the lens of ILSAs, which have strongly affected assessment practice. Although there was no particular aim to investigate all the methodological challenges related to assessment as such, the major ideas were discussed keeping in mind the principal conditions Medley (1987b) mentions (e.g., robust conceptualisation, instrumentation, design) for the successful assessment of student outcomes. The section ended with a deliberation on the uptake of technology and the need to link complex measurement potentials to particular learning outcomes and/or instructional practices (Manizade et al., 2022).

Finally, an argument was raised about how student outcomes could be envisioned today and the extent to which this widens or blurs the relationship between the A and G variables proposed by Medley (1987a). To date, it remains crucial to grasp what it means and what it requires to master mathematics. Understanding the role of dispositional factors (e.g., ability beliefs and task values) in conceptualisations of mathematical competence is still required (Niss et al., 2017), especially given their fleeting position across existing frameworks (e.g., included as one of the proficiency strands but absent from the vast majority of other frameworks). Recognising them could lead to a broader and fuller understanding of what it means to be proficient in mathematics without relying solely on the cognitive aspect of being competent. Ultimately, this may lead to different methodological choices in measuring mastery and shift the balance from cognitive to noncognitive learning outcomes, which, in turn, affects choices such as applying problem-solving or some form of mathematical modelling in daily situations or even pursuing a career in mathematics. Only then could the enactment of mathematics, coupled with teaching and assessment, be genuinely in agreement.