It is the daily task of mathematics teachers to monitor what students have learned and to adapt their instructional practices to the diagnostic information about students’ mathematical assets and needs (Carpenter et al., 1999; Empson & Jacobs, 2008; Stacey et al., 2018). To support teachers’ monitoring and enhancement practices with external resources, designing formative assessment tools has gained increasing importance in education research, including mathematics education (Burkhardt & Schoenfeld, 2018; Wiliam, 2007). In this context, formative assessment or assessment for learning was defined as “assessment functioning as a bridge between teaching and learning, helping teachers collect evidence about student achievement in order to adjust instruction to better meet student learning needs” (Wiliam, 2007, p. 1054). Existing research shows that well-designed formative assessment tools can be educative for teachers in that they focus attention on learning goals (which so far perhaps have been underattended to), support the interpretation of assessment outcomes, and provide advice for further instructional actions (Black & Wiliam, 1998; Burkhardt & Schoenfeld, 2018; Stacey et al., 2018).

These general support affordances can be further strengthened by digital formative assessment platforms when student answers can be automatically analyzed and turned into advice for instructional actions (Olsher et al., 2023). While impressive examples have been given for the deep analysis of single answers (e.g., Drijvers, 2020), more research is needed to optimize the analytic summaries on teacher dashboards for quick but actionable overviews (Olsher et al., 2023). This corresponds to the state of research on learning analytics dashboards, for which technical elements are well developed, but researchers have called for better empirically and theoretically grounded dashboard designs to turn the descriptive information into “teacher-actionable insights” (Koh & Tan, 2017, p. 319).

In this paper, we report on a larger design research study whose overall goal is to optimize diagnostic information dashboards in a digital formative assessment platform to provide teacher-actionable insights that can underpin teachers’ decisions on enhancing students’ conceptual understanding. Such dashboards can only fully exploit their support affordances when they really align with teachers’ support needs (Koh & Tan, 2017; Moser Opitz, 2022; Sergis & Sampson, 2017), so this paper’s intent is to empirically identify these support needs by investigating what expert teachers can do without suitable dashboards that novice teachers do not do. For the medium-term design goal of optimizing the support affordances of dashboards, we first have to investigate the following research question:

How do expert and novice teachers use diagnostic information from a digital formative assessment platform?

To motivate and empirically approach this question, in Sect. 1.1 we outline a conceptual framework for describing and explaining teachers’ practices of monitoring and enhancing student understanding and then summarize the state of research on formative assessment tools and dashboards potentially supporting these practices. In Sect. 2.1, we present the methodological background of the qualitative expert-novice comparison. Section 3 provides empirical insights into the analyzed practices and support needs. Through these steps, we lay out the theoretical foundation and empirical grounding for the development of five design outcomes for supporting expert and novice teachers’ practices in Sect. 4: correctness overview, error-focused detailed report, and individual-focused, task-focused, and trajectory-integrating dashboards.

1 Theoretical Background and Conceptual Framework for Supporting Teachers’ Monitoring and Enhancement Practices

1.1 Teachers’ Practices of Monitoring and Enhancing Student Understanding Through Unpacking and Goal-Setting Subject-Matter Concepts

We start the theory section by defining the constructs of teachers’ jobs and practices and the underlying constructs for explaining the practices: Following Bass and Ball (2004) and Bromme (1992), jobs are defined as typical and often complex situational demands that teachers have to master in classrooms. Following Franke et al. (2007), we define teachers’ practices as the recurrent patterns of teachers’ utterances and actions for managing particular jobs and characterize these practices by the underlying categories and pedagogical tools (e.g., assessment tasks or visuals) upon which teachers implicitly or explicitly draw (Prediger, 2019). Categories are defined as the conceptual (i.e., non-propositional) knowledge elements that filter and focus teachers’ perceiving and thinking, for example, categories about content elements in which learning goals are articulated vaguely or in detail. The overall job of assessing students’ learning progress comprises different subjobs (Wiliam, 2007), for example, screening as a subjob of selecting students with further learning needs (e.g., while recruiting for pull-out groups) or finding difficult tasks as a subjob of selecting difficulties to be revised (for all or for some students).

Different from those is the job in formative assessment on which this paper focuses, monitoring students’ understanding, that is, identifying students’ resources for, errors about, and misconceptions about the learning content and its content categories with the purpose of making decisions about further actions for fostering students’ understanding (Black & Wiliam, 1998; Wiliam, 2007). In formative assessment, monitoring understanding is tightly connected to two other jobs (Black & Wiliam, 1998; Moser Opitz, 2022; Prediger et al., 2023): setting learning goals (i.e., unpacking an area of learning content into more detailed content categories and selecting which content category should be the next learning goal for different groups of students) and enhancing students’ understanding (i.e., taking measures such as selecting/adapting tasks, moves, and manipulatives for providing learning opportunities, either for the overall learning goal or targeted to particular content categories).

Various empirical studies have documented that teachers handle these three jobs (goal setting, monitoring, and enhancing understanding) with different practices (Carpenter et al., 1999; Empson & Jacobs, 2008; Prediger et al., 2023; Stacey et al., 2018). For example, many mathematics teachers restrict their monitoring practices to procedural fluency and neglect conceptual understanding, in particular for students who struggle in the subject (Wilhelm et al., 2017). These procedural monitoring practices can be explained by goal-setting practices (which may be implicit) in which only procedural learning goals are identified as relevant and later used as diagnostic categories. Other teachers adopt conceptual goal-setting practices and monitor students’ conceptual understanding; nevertheless, the learning opportunities they provide might not align with what they assessed as missing in students’ understanding (van der Steen et al., 2023). When teachers’ goal-setting practices include the careful unpacking of learning goals into well-articulated content categories, their enhancement practices tend to align more easily with the diagnostic information received through the monitoring practices (Morris et al., 2009; van der Steen et al., 2023). If the learning goals are articulated only vaguely, then the enhancement practices are rarely targeted to the most relevant content categories (Karsenty, 2010; Prediger et al., 2023). These findings resonate with the systematic review in which teachers’ pedagogical content knowledge (PCK) of detailed learning content categories was identified as a critical prerequisite for productive formative assessment practices, across various subjects and areas of content (Schildkamp et al., 2020).

We exemplify these connections between teachers’ content categories and practices for the meaning of multiplication and division, the mathematical topic in view of this paper, which should be acquired in Grades 2 and 3, but is still not deeply understood by many Grade 5 children (Siemon, 2019). Meanings of multiplication and division can be assessed and enhanced by connecting symbolic and graphical representations, so this has been the main approach in the formative assessment and enhancement material provided to teachers (Prediger, 2022; see Fig. 1).

Fig. 1 Unpacking the mathematical topic in view into relevant content categories: Meanings of multiplication and division

In order to interpret diagnostic outcomes, teachers need to know typical challenges, for example, the surface translation, meaning switching between representations only superficially by focusing solely on the numbers, but not on the underlying mathematical structures (Fig. 1). For enhancing students’ understanding, teachers need to be able not only to “feed-back” but to “feed-forward” (Hattie & Timperley, 2007). This requires the unpacking of the content elements needed for successful connections of representations: the model of multiplication as counting in units and, inversely, the partitive model of division and the quotitive model of division (Fig. 1). Teachers have been shown to hold diverse PCK with respect to these content elements and their use as diagnostic categories or goal categories. Because the models are often only vaguely expressed, or the two models for division are not carefully distinguished (Roche & Clarke, 2009), teachers may be able to only vaguely monitor students’ assets and difficulties and may fail to provide targeted learning opportunities (Prediger et al., 2023).

Summing up, the state of research on teachers’ monitoring and enhancement practices has revealed that they should be considered together with goal-setting practices that include the unpacking of the overall learning goal into PCK content elements. These can then serve as diagnostic categories for monitoring practices (What does the teacher notice in a student solution?) and as goal categories for enhancement practices (Towards which next goal does the teacher intend to work?).

1.2 Digital Formative Assessment Tools as External Support of Teachers’ Practices

Even if productive monitoring and enhancement practices have been identified in research, their scaled-up dissemination into many classrooms remains an ongoing challenge (Chazan & Ball, 1999), with strong differences among teachers (van der Steen et al., 2023; Wiliam, 2007). Hence, the research-based design of formative assessment tools has been identified as a potential external support, starting with the seminal work of Swan (1985) with his formative assessment tasks for functions and graphs. External support using formative assessment tasks and teacher manuals counts as a promising affordance for (a) shifting teachers’ focus to a wider set of learning goals, (b) supporting teachers in unpacking the learning goals into more refined content elements, (c) supporting the interpretation of student work on the assessment tasks, and (d) providing advice for enhancement practices (Burkhardt & Schoenfeld, 2018; Stacey et al., 2018).

For example, Siemon (2019) specified a multiplication learning trajectory from Grade 4 to Grade 8 and designed pen-and-paper formative assessment tasks along this learning trajectory, with diagnostic evaluations from which she drew adaptively assignable learning opportunities. In two field trials, she provided empirical evidence that classes using the formative assessment over several months achieved higher learning gains than control classes without it. The teachers of these classes were found to give higher priority to the formerly underattended conceptual learning goals and to unpack the concept elements that were most relevant to progressing along the learning trajectory.

In the last decade, various digital formative assessment platforms have been developed (Gikandi et al., 2011; Looney, 2019; Olsher et al., 2023), many of them focusing mainly on procedural learning goals (as summarized by Thurm & Graewert, 2022). Example platforms such as SMART (Stacey et al., 2018) and Numworx (Drijvers, 2020) provide conceptually focused digital formative assessments based on clearly defined learning stages and students’ common misconceptions as a means of informing teachers about students’ levels of understanding. Stacey et al. (2018) reported that teachers’ use of the SMART platform led to improvements in attitudes towards formative assessment by encouraging deeper analysis beyond statically categorizing student achievement. Teachers also self-reported changes in their knowledge of teaching and their enhancement practices (now often starting on more basic levels when realizing learning gaps in these basics). In this way, digital formative assessment platforms can be educative for teachers (Burkhardt & Schoenfeld, 2018).

The three briefly sketched formative assessment tools provide encouraging examples of digital platforms that support teachers’ use of formative assessment in the classroom by providing information about students’ understanding and thereby supporting teachers’ monitoring practices and enhancement practices (Drijvers, 2020; Stacey et al., 2018). Siemon (2019) also provided explicit support for teachers in setting their learning goals in a complete learning trajectory.

The additional affordance of digital formative assessment tools compared to paper-and-pencil tools is the automatic analysis of student answers (Looney, 2019; Olsher et al., 2023), compiling immediate analytic reports for teachers that can save time and allow for instructional implications within the same lesson. However, these analytic reports require further consideration, as emphasized in the recent overview on the state of the art:

There are many important details to consider about how results of analysis are presented: What is the form that a report takes (e.g., is an analysis presented in words or in numbers?)? Do reports only look backwards or do they look forward, offering hints, or suggestive alternative strategies? … Finally, what level of aggregation do reports use as their level of analysis? (Olsher et al., 2023, p. 17).

So far, two analytic reports are most common in digital formative assessments:

  • An aggregated “stop light” overview (Trgalová & Tabach, 2023) on the correctness of each student’s solution for each task (possibly in a matrix with the stop light colors of green, yellow, and red) mainly serves for the first two assessment subjobs: screening, as selecting students with further learning needs (e.g., when recruiting for pull-out groups), and finding difficult tasks, as selecting tasks to be revised (for all or for some students). However, this highest degree of aggregation in correctness overviews cannot really support teachers’ deeper monitoring of students’ understanding when no further diagnostic information is provided.

  • Detailed error-focused reports can provide detailed analyses of every student’s solution for every task, with a characterization of the errors and their backgrounds and possibly with advice for further adaptive learning opportunities. Although these detailed reports are highly educative and allow teachers to dive deeply into the specific assets and needs of single students, they also run the risk of holding more information than teachers can handle, so other, more directly actionable types of aggregation of diagnostic insights are required (a minimal data sketch contrasting the two report types follows this list).
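
To make the contrast between the two report types concrete, the following minimal sketch illustrates how detailed, error-focused records per student and task could be aggregated into a stop-light correctness matrix. The data structure, field names, and error categories are hypothetical illustrations and do not represent any particular platform’s implementation.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

@dataclass
class ErrorRecord:
    """One entry of a detailed error-focused report (hypothetical structure)."""
    student: str
    task: str
    correct: bool
    error_category: Optional[str] = None         # e.g., "surface translation"
    suggested_enhancement: Optional[str] = None   # e.g., a linked enhancement task

def stoplight_overview(records: List[ErrorRecord]) -> Dict[Tuple[str, str], str]:
    """Aggregate detailed records into a student-by-task matrix of stop-light colors
    (a 'yellow' level for partially correct answers is omitted for brevity)."""
    return {(r.student, r.task): "green" if r.correct else "red" for r in records}

records = [
    ErrorRecord("Chantal", "Task 2b", False, "surface translation", "counting in units"),
    ErrorRecord("Chantal", "Task 2d", True),
]
print(stoplight_overview(records))
# {('Chantal', 'Task 2b'): 'red', ('Chantal', 'Task 2d'): 'green'}
```

The sketch makes visible what the aggregation gains and loses: the overview is quick to scan, but the error category and the suggested enhancement are dropped in the stop-light view.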

While there has been ample research on the conditions and effects of automated feedback provided for students (Deeva et al., 2021; Mertens et al., 2022; Olsher et al., 2023), the affordances and constraints of different analytic reports for teachers have not yet been intensively studied (Trgalová & Tabach, 2023). To approach this design challenge and motivate the empirical research needs, we therefore draw upon a slightly different but kindred domain: dashboards in digital learning environments.

1.3 State of Design and Research of Teacher-Actionable Diagnostic Insights in Dashboards

Digital learning (instead of assessment) environments aim at temporarily replacing the teacher (when organized as individual self-learning environments) or supporting the teacher (when combining individualized and collective work in classrooms). In recent years, various learning analytics tools have been developed that gather data from multiple technical sources to provide insights into students’ working processes and learning progress, for instance, how often students change their opinions in drag-and-drop tasks or how long they take for writing a text (Khulbe & Tammets, 2023; Koh & Tan, 2017; Sergis & Sampson, 2017). As it is technically easy to gather a huge amount of process information, the learning analytics community has started a discourse on how to prepare the data so that teachers can make sense of it and act on the information provided by it (Khulbe & Tammets, 2023; Trgalová & Tabach, 2023).

In their use of the term “actionable insights,” Tan and Chan (2016, p. 2) suggested a construct that can guide the design and analysis of dashboards with respect to their usefulness for users. Koh and Tan (2017) ranked four affordances of dashboards that offer actionable insights for teachers to help with student engagement:

  • “Descriptive analytics describe what students’ activities on [sic] the systems are, depicting indicators of student engagement.” (p. 321)

  • “Diagnostic analytics tries to explain, why students did what they did.” (p. 322)

  • “Predictive analytics provides empirical evidence of what students will be engaged in.” (p. 323)

  • “Prescriptive analytics provide recommendations to the student, reducing the immediate intervention required by the teacher.” (p. 323)

In their systematic review, Sergis and Sampson (2017) analyzed 50 papers on learning analytics, of which 46 (92%) provided no actionable insights for teachers. Instead the papers focused mainly on providing teachers with analyses of data without “recommendations to facilitate teachers’ reflection and sense-making” (Sergis & Sampson, 2017, p. 42). The four exceptions mainly provided two forms of teacher-actionable insight (Sergis & Sampson, 2017, p. 43): (a) allowing teachers to pose questions on their educational design or delivery that can be answered by the learning analytics from the delivery of the educational design or (b) generating textual feedback (mainly descriptive or diagnostic) to teachers through pre-defined feedback templates derived from the learning analytics data.

Van Leeuwen and Rummel (2022) designed different dashboards for teachers in the context of computer-supported student collaboration: Mirroring dashboards descriptively show the complete information obtained by the data analyses (e.g., all written answers or captured time lengths). Alerting dashboards provide prioritized information about when students’ work differs from the expected work and draw teachers’ attention to those differences. Advising dashboards provide recommendations for teachers’ actions for specific students. Intervening dashboards partially take over teachers’ roles by acting on their analyses, for example, by immediately assigning a remediation task or prompting a particular form of communication, but they were less accepted by teachers. The researchers compared teachers’ success in identifying collaborative student groups in need of support with or without advising dashboards, in a first study without time pressure and in a second study with strong time pressure. Under time pressure, teachers with the dashboard selected students in need of support significantly more successfully than those without the dashboard, whereas the selection of both teacher groups was equally good without time pressure. This suggests that the support affordances of dashboards for teachers can also depend on the circumstances of their teaching (such as time pressure).

While Koh and Tan (2017) focused their teacher-actionable dashboard work mainly on students’ engagement in learning environments and Van Leeuwen and Rummel (2022) focused theirs on student collaboration, we can learn from both for the design of dashboards for formative assessment, but we first have to better understand teachers’ practices.

1.4 Deriving the Design Research Goal and the Refined Research Question of the Expert-Novice Comparison

Summing up the wide range of empirical research on teachers’ monitoring and enhancement practices, the importance of the underlying goal-setting practices has been outlined (Morris et al., 2009; Prediger et al., 2023). Expert teachers have been shown to unpack the learning goals in greater detail and on this basis to enact well-aligned monitoring and enhancement practices, whereas novice teachers’ practices tend to be characterized by more mismatches and vaguer learning goals (Prediger et al., 2023; van der Steen et al., 2023). As many teachers in our German school context have had limited access in their initial teacher education to relevant PCK about content categories, such as the meanings of multiplication and division (due to being trained for older grade levels or teaching out of field), it is the goal of our long-term design research project to support teachers in their monitoring and enhancement practices in our digital formative assessment platform, which will be introduced in Sect. 2.1.

In principle, digital formative assessment platforms reveal affordances for supporting teachers in all three practices (Olsher et al., 2023; Stacey et al., 2018). However, these affordances have not yet been fully exploited (Trgalová & Tabach, 2023), partly because teachers’ dashboards provide either too aggregated or not sufficiently aggregated information, neither of which is aligned with teachers’ typical practices. Together with the different kinds of dashboards suggested by Van Leeuwen and Rummel (2022), the taxonomy adapted from Koh and Tan (2017) allows us to articulate the state of the art for dashboards in formative assessment tools: On one end of the scale of informativity, descriptive analytics are usually provided in correctness overviews (Trgalová & Tabach, 2023) that serve as alerting dashboards for selecting students or tasks requiring attention but reveal no diagnostic or prescriptive functions. On the other end of the scale of informativity, detailed error-focused reports for each student and task can serve diagnostic, predictive, and prescriptive functions but tend to produce information overload. In between these two extremes, information should be aggregated in such a way that it reveals the diagnostic information that teachers need for deriving actions in classrooms. We therefore derive our design goal of developing structures for dashboards that scaffold teachers’ practices of better aligning the goal-setting, monitoring, and enhancement of student understanding.

In general, scaffolds are support means to enable a “novice to solve a problem … which would be beyond his unassisted efforts. This scaffolding consists essentially of … ‘controlling’” those aspects of the task (in our case practices) that are beyond the person’s “current capacity” (Wood et al., 1976, p. 90). In order to find these aspects in the practices that novice teachers might be able to perform with support, we conducted a qualitative expert-novice comparison of practices.

The existing state of research reveals that expert teachers seem to be able to work without automatic analysis, so for them the digital formative assessment platform may mainly provide support in terms of reducing workload or time for analysis; this can be helpful in situations such as being under time pressure (Van Leeuwen & Rummel, 2022). For novice teachers, however, the digital assessment and the automated analysis with backgrounds might be educative, improving their PCK and the depth of their monitoring practices and leading to enhancement decisions that are increasingly better aligned with the monitored learning needs and the next learning goals set (Siemon, 2019; Stacey et al., 2018). However, as long as the analytic reports in formative assessment tools are too detailed or not aligned with teachers’ practices, there is a risk that these affordances will not be exploited (Trgalová & Tabach, 2023). So, for dashboard designs that can fully exploit support affordances (aligned with teachers’ support needs), we empirically operationalize these support needs as the differences between what an expert teacher and a novice teacher can do without dashboards (Wood et al., 1976, p. 90). This research approach is condensed in the refined empirical research question for the current paper:

How do expert and novice teachers use diagnostic information from a digital formative assessment for their practices of goal setting, monitoring, and enhancing student understanding?

Comparisons of experts’ and novices’ practices have been conducted in many domains, revealing some stable findings: Experts perceive large meaningful patterns in their domain, perceive problems in their domains “at a deeper level than novices” (Glaser & Chi, 2014, p. 14), and thereby draw more sustainable connections. In contrast, “novices tend to represent a problem at a superficial level” and jump more quickly to immediate solutions (summarized in Glaser & Chi, 2014, p. 19), possibly with weaker alignment to the perceived elements. In this paper, we follow van der Steen et al. (2023) in their research approach of conducting expert-novice comparisons, assuming that characterizing expert teachers’ practices can reveal “knowledge [that] can be used to support teachers who struggle with implementing formative assessment as intended. Therefore, the outcomes of this study will result in design steps and strategies for all teachers” (p. 2). Design implications are then presented in Sect. 4.

2 Methods for the Qualitative Expert-Novice Comparison

2.1 Research Context of the Mastering Math Online-Check

We situate our expert-novice comparison in the larger design research study on the Mastering Math Online-Check (Hankeln et al., submitted), a digital formative assessment platform developed and investigated within the 15-year-long Mastering Math project (Prediger, 2022). The project aims at teachers of Grade 5–7 students (10- to 13-year-olds) who struggle in mathematics and need second learning opportunities for understanding basic arithmetic concepts such as the meaning of multiplication and division. From 2008 to 2017, 45 pen-and-paper modules of the Mastering Math project intervention program were iteratively designed, qualitatively tested in design experiments, and redesigned to best support teachers in fostering low-achieving students’ deep conceptual understanding. Each module starts with a short test of three to five tasks, designed to formatively assess students’ conceptual understanding of carefully chosen concept elements. Because these brief targeted tests cannot deterministically provide accurate and comprehensive assessments of students’ understanding, teachers are encouraged to continuously monitor their students’ understanding in later oral conversations. They can, however, still serve the teachers in formatively planning their enhancement practices, ideally in small-group (or whole-class) teaching. The assessment tasks are directly linked to teaching material for deepening incomplete or shallow understanding, so the test results can guide the prioritization of learning tasks and communicative prompts in remediating instruction. During the oral conversation in the enhancement sessions, teachers are encouraged to continuously monitor the progression of their students’ understanding, and to adapt their enhancement practices accordingly. The Mastering Math formative assessment and intervention program was shown effective (in its pen-and-paper version) in a field trial with 655 children (Prediger, 2022).

As the evaluation of the formative assessment tests can be time-consuming for teachers and partially constrained by teachers’ heterogeneous PCK (Prediger et al., 2023), we decided to digitalize the pen-and-paper tests and to develop the platform Mastering Math Online-Check, which can automatically provide detailed yet accurate feedback to teachers (Hankeln et al., submitted). This platform includes tasks in closed formats (multiple- or single-choice tasks, short answers, drag-and-drop answers, etc.) that are automatically coded as correct or incorrect with a distinction of different error categories. It also includes open answers that need to be manually coded by teachers. In the future, evaluation outcomes will be displayed in different evaluation dashboards with different degrees of detail and focus. The current study documents our steps towards the (re-)design of dashboards, namely, the expert-novice comparison in Sect. 3 and the empirically grounded design decisions in Sect. 4.
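
As a rough illustration of what such automatic coding of closed-format answers might look like, the following sketch compares a selected answer set against the correct options and against answer patterns linked to known error categories. This is a simplified, hypothetical example, not the Online-Check’s actual implementation; the option labels and category names are invented.

```python
from typing import Dict, Optional, Set, Tuple

def code_closed_answer(selected: Set[str], correct: Set[str],
                       error_patterns: Dict[str, Set[str]]) -> Tuple[bool, Optional[str]]:
    """Return (is_correct, error_category) for a closed-format (e.g., multiple-choice) answer."""
    if selected == correct:
        return True, None
    for category, pattern in error_patterns.items():
        if selected == pattern:
            return False, category
    return False, "uncategorized"  # open answers would instead be routed to manual coding

# Hypothetical use for a task like Task 2 (matching pictures to 3 x 4 = 12):
is_correct, category = code_closed_answer(
    selected={"dot array", "dice picture", "picture with 3 + 4 dots"},
    correct={"dot array", "dice picture"},
    error_patterns={"surface translation (numbers only)":
                    {"dot array", "dice picture", "picture with 3 + 4 dots"}},
)
print(is_correct, category)  # False surface translation (numbers only)
```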

2.2 Methods of Data Collection of the Empirical Study

2.2.1 Participants in the Expert-Novice Comparison of Practices

In order to compare teachers’ practices, we conducted individual work sessions with a think-aloud protocol with four teachers, none of whom were yet acquainted with the Online-Check: Two novice teachers (less than 3 years of teaching experience) were selected among the volunteers as having worked with the Mastering Math pen-and-paper materials for only several months. Two expert teachers were selected using four indicators of expertise: They (a) had at least 5 years of mathematics teaching experience and (b) at least 3 years of teaching experience with the Mastering Math materials, (c) had been recruited by the local authorities to serve as facilitators in Mastering Math professional development (PD) programs due to their general teaching expertise, and (d) had deepened their expertise in the Mastering Math program through an additional qualification program on its PCK backgrounds.

2.2.2 Tasks, Student Products, and Error-Focused Reports Used in the Think-Aloud Sessions

The four participating teachers started the think-aloud session by administering the prototype digital formative assessment with eight tasks on the meanings of multiplication and division in their classes. To assure comparability, the think-aloud sessions all referred to products of the same students (brought in by the researchers) in a simulation of the Online-Check (extracts of the tasks are shown in Fig. 2, sequenced along the learning trajectory from the bottom to the top).

Fig. 2 Assessment tasks with one student’s answers and extracts of the analysis and aligned learning goals along the learning trajectory (to be read from bottom to top)

In the think-aloud session, the teachers first worked on three jobs: (1) analyzing Chantal’s answers (summarized in blue in Fig. 2) without analytic support from the Online-Check platform to monitor her understanding, (2) setting individualized learning goals for her, and (3) planning the enhancement session by choosing and adapting tasks from the given teaching materials and planning prompts that take the diagnostic information into account. In a second step, they (4) received the detailed error-focused report (similar to the second and third columns of Fig. 2) and the content categories along the learning trajectory (as on the right side of Fig. 2) and were invited to work with them, and (5) commented on drafts of other dashboards.

Teachers might monitor Chantal’s understanding as showing some ideas about the meaning of multiplication and division that are nevertheless fragile: The partitive model of division seems to be safer (in Tasks 3a, 3c, and 5a) than the quotitive model (in Tasks 4a, 4c, and 5a) and the inverse relationship (Task 5a). The meaning of multiplication as counting in units (Tasks 2a, 2d, and 5b) is safe in pre-structured figures (Task 2d) but not stable in figures that still need to be structured (Tasks 5b and 2a/b). Chantal repeatedly struggles with matching symbolic, graphical, and textual representations, with a sole focus on numbers in Tasks 1a, 1b, 2b, 3b, 4a, and 5b. From there, teachers might set the learning goal that the student should overcome surface translations (of finding only numbers) and learn to explain the multiplicative/partitive/quotitive structure. Building upon her knowledge about counting in units in pre-structured figures (such as the dice picture), her next learning goals seem to be structuring the unstructured figures into units and translating the unit count into symbolic multiplication. This will allow her to gain a deeper understanding of the inverse relationship of multiplication and division with both the partitive and quotitive models. Hence, planning enhancement practices should focus on exactly these learning goals, building upon what Chantal brings in and using her intuitive ideas. Even if the teachers decide to continue monitoring Chantal’s understanding in oral conversation during the lesson enactment, these kinds of monitoring results and the next learning goals and enhancement practices chosen form the basis for their lesson planning.

2.3 Methods of Data Analysis

In total, about 330 min of think-aloud sessions were videorecorded and partially transcribed. The qualitative analysis of the transcripts followed a deductive coding procedure (Mayring, 2015).

Step 1: The transcripts were deductively coded according to (a) the tasks from the digital assessment platform or enhancement materials (as in Fig. 2) that were addressed by the teachers or interviewers, and (b) the jobs implicitly or explicitly addressed by the interviewer’s questions or the teacher’s think-aloud articulations: monitoring students’ understanding (alluding to practices to identify students’ assets, errors, and misconceptions), setting learning goals (alluding to practices to identify and sequence the next learning goals for students), or envisioned enhancing of students’ understanding (alluding to planning various ways to work with students).

Step 2: The transcripts were segmented, with new segments starting whenever the conversation changed with respect to (a) the addressed job, (b) the task, or (c) the learning goal (successively refined in Step 3).

Step 3: Using Vergnaud’s (1998) concepts-in-action as a methodological means, we inferred in each identified transcript segment the implicit or explicit content categories that the teachers used as diagnostic categories for monitoring or as goal categories for setting learning goals and/or striving for them in envisioned enhancement practices. Content categories were first determined deductively from those in the learning trajectory (Fig. 2, right side); further content categories were then added inductively when teachers referred to others. In the transcripts and analytic text, we mark inferred content categories with “||.||” and use abbreviated names, for instance, ||inverse|| for the inverse relationship of multiplication/division. The analytic findings were then visualized in graphical analytic summaries along the learning trajectory (Fig. 3). In this way, the connectedness and coherence of teachers’ noticing, goal setting, and enhancing could be analyzed visually. The codings were first conducted by the first author and a student research helper and discussed among three of the authors until interpretative consensus was achieved.

3 Empirical Insights into Expert and Novice Teachers’ Practices

3.1 First Comparison: Construction and Use of a Learning Trajectory

We start by comparing two cases, the expert teacher Kaye and the novice teacher Will, with respect to the content categories addressed while monitoring Chantal’s assets and challenges, setting goals, and envisioning enhancement practices.

3.1.1 Case of Expert Teacher Kaye’s Practices

In Fig. 3, we summarize relevant excerpts of the transcript from Kaye’s think-aloud session, the first case analyzed, together with their coding.

Fig. 3 Kaye’s monitoring, goal-setting, and envisioned enhancement practices: Coded transcript

When monitoring Chantal’s answers (see Fig. 2), Kaye notices substantial assets, first in vague terms (Turn T2a in Fig. 3) and later more concisely. In her first utterance, she already sets the learning goal to ensure the understanding of ||multiplication as counting in units|| (T2b), and identifies some assets for this in early understandings (T4c). She monitors a general asset in ||connecting representations|| (T4a,c), which she then unpacks into what she acknowledges as a first understanding of ||partitive division|| (T10a) but a fragile understanding of ||quotitive division|| (T4a), questioned with respect to its depth (T10b).

Continuing with goal setting, Kaye focuses on the three goal categories of ||connecting representations|| (T12 and T18), ||partitive division|| (T14), and ||quotitive division|| (T14 and T24a). Having identified Chantal’s assets in some understanding of ||partitive division|| (T10a), Kaye picks it up (T14a) to connect it systematically to ||quotitive division|| (T14a). She explicitly combines goal setting (T14a) with envisioned enhancing practices (T14b), indicating the clear focus that the set learning goals provide for her enhancement.

When asked to envision her enhancing practices, she reconfirms her strategy already articulated (T18 as in T14a, b) by also referring back to monitoring results (T20) to justify the selected enhancement prompt (T20). Later, after building a safe ground in ||partitive division||, she proceeds to ||quotitive division|| (T24a), in line with the learning trajectory in the enhancement materials. She articulates explicitly what ||quotitive division|| means (T24a), introduces an additional activity and then continues with the first enhancement task on this goal in the enhancement material (T24b). This is enriched by connecting students’ everyday experiences to the quotitive structuring into groups of a given size.

To sum up, Fig. 4 (left side) shows the graphical analytic summary of Kaye’s addressed content categories. It illustrates her practices as being characterized by clear intent and a coherent focus, both for the diagnostic categories in monitoring and for the goal categories strived for in goal setting and envisioned enhancing of students’ understanding. Her practice of explicitly justifying enhancement decisions by the set goals and monitoring results is made graphically visible by the horizontal arrows in three content categories. We interpret these arrows as evidence of a high consistency between her practices. Furthermore, the diagonal arrows depict sequences of learning goals (in goal-setting practices) and goals of activities (in enhancement practices) that Kaye explicitly articulated to be sequenced in this order. These sequences resonate with the intended learning trajectory in the material, and she even pursues them for content categories that Chantal seems to master, just to build a safe ground for the next steps in the learning trajectory and to integrate relevant monitoring results along the way.

Fig. 4 Visual summary for the first expert (left)-novice (right) comparison (see Fig. 2 for the learning trajectory). (Boxes indicate transcript turns in which the teachers explicitly or implicitly refer to content categories: Solid color boxes indicate categories adequately addressed as student challenges. Boxes with colored borders indicate monitored student assets. Boxes with light colors indicate falsely or vaguely addressed categories. Arrows indicate teachers’ articulated connections: Vertical/diagonal arrows indicate deliberate sequences of categories. Horizontal arrows indicate argumentative connections of monitoring, goal setting, and enhancing). (Color figure online)

In the following subsection, we contrast these practices with those of the novice teacher Will. The visual summary of his analysis is shown in Fig. 4 (right side). After the analysis of Will’s case, we will discuss the interesting contrasts that these cases reveal.

3.1.2 Case of Novice Teacher Will’s Practices

In Fig. 5, we summarize relevant excerpts of the transcript from Will’s think-aloud session, the second case analyzed, together with their coding.

Fig. 5 Will’s monitoring, goal-setting, and envisioned enhancement practices: Coded transcript

Will starts his monitoring practices by mentioning the student’s procedural assets (T2a) and some assets in understanding ||multiplication|| (T2b) but does not remark on the challenges in assessment Task 2c. He also notices challenges in the content categories of ||inverse relationship|| (T4) and ||connecting representations||, focusing only on numbers instead of structures (T6).

However, when turning to setting learning goals, he articulates rather unspecific learning goals (Turns T8a and T10) without referring back to the monitoring results of noticed assets and challenges. In T8b, he names both ||partitive division|| and ||quotitive division||, but without carefully distinguishing them. (While this distinction is critical PCK for articulating two distinct learning goals, the German terms for ||partitive division|| (“Verteilen”) and ||quotitive division|| (“Aufteilen”) are often used interchangeably in everyday language.) Later, in T18a/b, the legitimate doubt about Will’s distinction is confirmed when he mixes both in one suggestion for enhancement. When asked to justify his learning goals (T9), he inconsistently refers to another monitoring result coded as ||other goal|| (T10), so the dotted arrows in Fig. 4 are not horizontal (which would signify consistent reference to the same content categories) but indicate jumps between steps in the learning trajectory.

When asked to envision his enhancing practices for Chantal, his first approach inconsistently refers to a further learning goal that is not part of the main learning trajectory, delineating division from addition and subtraction (T12). Although this learning goal is not irrelevant in general, it is, in particular, not in line with his previous monitoring and goal-setting results. After that, he jumps to an enhancement task that provides the representation he wants to work on, the dot array, yet is designed for the ||inverse relationship||, which he does not explicitly address (T14). While discussing this enhancement task, he switches the addressed content category from ||quotitive division|| (T14 and T16) to ||partitive division|| (T18a) and, in the same utterance, back to ||quotitive division|| (T18b), apparently unintentionally. When asked for a prompt in case the student is unable to solve the task (T19), Will focuses again on ||quotitive division|| (T20) and finally provides an appropriate explanation of this content category (T22).

To sum up, Will’s practices in comparison to Kaye’s can be outlined as follows: While Kaye has clearly articulated learning goals that she sequences along her learning trajectory, the goal categories that Will addresses when setting goals or envisioning his enhancing practices are rather vague (depicted in Fig. 4 by light colored boxes) and do not have a clear sequencing. This makes it difficult for Will to construct a consistent learning trajectory and to justify his goal-setting or enhancement decisions. Whereas Kaye focuses her envisioned enhancing practices on the learning goals she set, the consequence of Will’s vagueness is that the focus of his enhancing practices wavers and is not necessarily in line with the intentions of the tasks. In the visual summary in Fig. 4, this becomes visible in the small number of arrows.

Whereas Kaye consistently connects her monitoring, goal setting, and enhancing (several horizontal arrows in the visual summary), we depicted only one horizontal arrow indicating a consistent inference from goal setting to enhancing for Will (Fig. 4). The consistency between his monitoring and his other practices is low, as he addresses entirely different content categories, which is visible in the visual summary in the lack of correspondence between the columns within the rows of the content categories.

3.2 Second Comparison: Influence of the Learning Trajectory on Enhancement Practices

To widen the empirical insights to further cases, we present two more excerpts of transcripts that refer to Task 2 (in Fig. 2), which asks students to select the graphical representations that match 3 × 4 = 12. The dot array (Task 2a) and the dice picture (Task 2d) are correctly matched, which reveals some initial knowledge. However, Chantal’s wrong answer for the picture with 3 + 4 dots (Task 2b) indicates that her knowledge of the meaning of ||multiplication|| is only shallow, with a sole focus on numbers while ||connecting representations||. As the novice teacher Chris and the expert teacher Tom both decide to work on assessment Task 2 with Chantal, it is worth contrasting their enhancement practices.

3.2.1 Case of Novice Teacher Chris’s Practices

Chris conducts a typical practice of error analysis with subsequent repair (Fig. 6).

Fig. 6 Chris’s monitoring, goal-setting, and envisioned enhancement practices: Coded transcript

Starting from his observation of Chantal’s challenge with matching multiplication to counting in unstructured figures in Tasks 2a and 2b (T14a), he takes this as the learning goal and uses the assessment task as an occasion for communication about it in the enhancement practice (T14b). He does not take into account Chantal’s asset in the first learning goal for multiplication (counting in pre-structured units in Task 2d). When the interviewer asks for further action when the student shows difficulties in understanding “counting smartly”, i.e., counting in units (T17), Chris goes back to ||multiplication in pre-structured units|| by referring to the enhancement task with dice (T18a), which he connects to the new goal, ||multiplication in unstructured figures|| (T18b).

3.2.2 Case of Expert Teacher Tom’s Practices

The expert teacher Tom sees the same student errors and monitors similar challenges and assets as Chris, but he starts his enhancement practice differently, not with the immediate error repair, but with the first task along the learning trajectory in the enhancement material (Fig. 7).

Fig. 7 Tom’s monitoring, goal-setting, and envisioned enhancement practices: Coded transcript

As the student is already able to translate the dice picture into symbolic multiplication, Tom identifies this asset as a productive starting point for deepening her understanding of what ||connecting representations|| means: not matching solely the numbers but also the multiplicative structures, namely, the ||multiplication as counting in units in the pre-structured figure|| (T7). He then explicitly connects this familiar learning content to the new learning content, ||multiplication as counting in units in unstructured dot arrays|| (T13b), for which students have to impose the structure onto the dot array.

As the visual summary in Fig. 8 shows, Tom’s and Chris’s practices differ in two characteristics: First, the expert teacher Tom (on the right side in Fig. 8) is able to identify the fundamental content categories underlying the content in view in Task 2 and prioritizes addressing this fundamental content first, whereas the novice teacher Chris (on the left side in Fig. 8) directly addresses the last task for a quick error repair. Second, Chris only goes back to the fundamental content when he notes that the student is not able to handle it (T17). In contrast, Tom notes an asset the student has (T9) and decides to pick it up so that he can build upon it.

Fig. 8 Visual summary of the second expert (right)-novice (left) comparison. (Boxes indicate transcript turns in which the teachers explicitly or implicitly refer to content categories: Solid color boxes indicate categories adequately addressed as student challenges. Boxes with colored borders indicate monitored student assets. Boxes with light colors indicate falsely or vaguely addressed categories. Arrows indicate teachers’ articulated connections: Vertical/diagonal arrows indicate deliberate sequences of categories. Horizontal arrows indicate argumentative connections of monitoring, goal setting, and enhancing). (Color figure online)

3.3 Repeating Patterns in the Expert-Novice Comparisons

The two expert-novice comparisons provide highly valuable insights into differences between two expert teachers (Kaye and Tom) and two novice teachers (Will and Chris) that we have also found in the other parts of the data. Our expert teachers stand out with respect to five differences:

  • Observed Difference 1: Less vague content categories. The novice teacher Will applies multiple content categories, but often in quite vague or general forms (indicated in Fig. 4 by boxes in light colors). In contrast, the expert teacher Kaye uses less vague, well-delineated diagnostic categories with a clear focus on the most relevant aspects. This difference might be traced back to Kaye having more elaborate and more differentiated PCK than Will, who does not safely distinguish quotitive and partitive division.

  • Observed Difference 2: Higher consistency between categories across monitoring, goal-setting, and enhancing practices. The novice teacher Will monitors using other diagnostic categories than he applies as goal categories for enhancing practices (indicated in Fig. 4 by the lack of correspondence between the categories addressed in the different columns). In contrast, the expert teacher Kaye more consistently uses the same content categories as diagnostic categories (for monitoring) and goal categories (for goal setting and enhancing).

  • Observed Difference 3: More explicit category-guided inferences. Whereas the expert teacher Kaye explicitly draws upon the monitored content categories for inferring or justifying goal-setting and enhancement decisions, the novice teacher Will rarely connects them (indicated in Fig. 4 by horizontal arrows). However, the novice teacher Chris seems to infer the goal for the enhancement conversation too directly from the detected error by targeting only the error on a surface level (i.e., the additive picture in the array representation) instead of the underlying content categories (see Observed Difference 4).

  • Observed Difference 4: Learning trajectory rather than errors as guidance. Both expert teachers, Kaye and Tom, operate with clear learning trajectories in mind, which helps them to sequence the learning goals and the enhancement activities. In contrast, both novice teachers, Will and Chris, jump immediately from the detected errors to a task with a similar surface (in the case of Will, the same dot array as graphical representation) or to the same task for which a repair is needed, without attention to the learning progression needed to arrive at this point in the learning trajectory.

  • Observed Difference 5: Building upon students’ assets to strengthen the step to the next learning goal. Whereas the novice teachers Will and Chris concentrate on errors and plan their intervention directly from these errors, expert teachers Kaye and Tom also search for students’ assets, not just to give positive feedback, but to deliberately build upon or even use them to establish a foundation in earlier learning goals (e.g., Tom, who uses the dice picture to establish the norms for connecting representations through structures). More generally, experts stand out in sometimes starting with tasks that the student can already do, just to deepen their insights and then link the new learning content to this familiar content.

Even if the comparison between only four teachers cannot claim any generalizability, these empirical findings have substantially informed the design of the analytic reports in our digital formative assessment, as will be outlined in the next section.

4 Analytic Reports as Scaffolds for Teachers’ Practices with Formative Assessments: How Can Expert-Novice Comparisons Inform the Design?

Based on the literature review in Sect. 1 and our own empirical study in Sects. 2 and 3, we can now present the design decisions we have drawn for the analytic dashboards provided in the teacher platform for the Mastering Math Online-Check (Hankeln et al., submitted).

Informed by the research on dashboards for digital learning environments (Koh & Tan, 2017; Olsher et al., 2023; Trgalová & Tabach, 2023; Van Leeuwen & Rummel, 2022), we distinguish alerting and advising dashboards and adapt the taxonomy by Koh and Tan (2017) for our purpose of providing topic-specific teacher-actionable insights from digital formative assessment focusing on conceptual understanding:

  • Descriptive analytics describe students’ activities on the systems, depicting indicators of (in-)correctness of students’ work.

  • Diagnostic analytics are intended to explain why students did what they did, so revealing information about students’ conceptual understanding.

  • Predictive-prescriptive analytics provide (diagnostically justified) advice on those content categories that are the next learning goals for students.

  • Prescriptive analytics provide recommendations for specific actions in the classroom to enhance students’ understanding of the next learning goals.

As an advance organizer, Fig. 9 summarizes five types of analytic reports, including three kinds of teacher dashboards that we had designed before and redesigned after the expert-novice comparisons. In the following, we explain for each type of report how existing design approaches, research findings on affordances and constraints, and the findings from the expert-novice comparisons of practices informed the redesign.

Fig. 9 Design outcome of this paper: Five different analytic reports aligned with teachers’ practices and support needs of novice teachers (in grey drawn from literature, in black from observed differences OD1–5 in expert-novice comparisons). (Color figure online)

4.1 Correctness Overview

The most aggregated descriptive analytics (Koh & Tan, 2017) are provided by a correctness overview, which reports the correct/incorrect results of each student for each task. Belonging to the alerting dashboards (Van Leeuwen & Rummel, 2022), it has the affordance of supporting teachers in immediate screening practices for selecting students (who need particular attention) and screening tasks (that need attention for many students in class), two relevant practices that are prerequisites for formative assessment practices (Wiliam, 2007).

However, this most highly aggregated alerting dashboard has the serious constraint of not providing sufficient diagnostic information to inform teachers’ enhancing practices in topic-specific ways and risks being handled very technically, without attention to the task content (Trgalová & Tabach, 2023).

The empirical investigation of our expert and novice teachers indicated that the titles of our assessment tasks (which exactly correspond to the enhancement units; Prediger, 2022) already provide an immediate hint to the content elements in view. Hence, for the redesign we decided to include the task titles in the correctness overview, which enriches the highly aggregated overview in a small but relevant manner.

4.2 Detailed Error-Focused Reports

On the other end of the scale, most in-depth information is provided by detailed error-focused reports. Their affordances can be characterized using Koh and Tan’s (2017) taxonomy as providing not only descriptive analytics on (in-)correctness for each student and for each task, but also diagnostic analytics by giving possible explanations for student errors and prescriptive analytics by providing recommendations for specific enhancement tasks. This therefore fulfils the criteria of an advising dashboard (Van Leeuwen & Rummel, 2022) with detailed support for monitoring and enhancing practices.

According to our empirical expert-novice comparison (Observed Difference OD2: Higher consistency of categories across monitoring, goal-setting, and enhancing practices), the immediate assignment of enhancement tasks for particular errors can support teachers in making consistent decisions. However, what most distinguishes our two experts from novices is that they consistently activate the same well-articulated content categories (Observed Difference OD1: Less vague content categories) as diagnostic categories in monitoring and as goal categories for goal-setting and envisioned enhancing practices. As long as the detailed error report only articulates the background of the error but not the next learning goal (see example in Fig. 9), this consistency is much harder to achieve for novice teachers.

We therefore decided to redesign the articulation of the diagnosis in the detailed error report by explicating the next learning goals. In this way, we hope to provide novice teachers with less vague content categories (OD1) that they can use consistently for monitoring, goal-setting, and enhancing practices (OD2), like expert teachers. We also hope that the redesign will support novice teachers in making more explicit connections between monitoring, goal-setting, and enhancing practices by turning diagnostic categories into goal categories to be set and strived for (Observed Difference OD3: More explicit category-guided inferences).
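As a concrete illustration of this redesign decision, the following sketch shows how one entry of a detailed error-focused report could keep the diagnostic explanation, the explicated next learning goal, and the recommended enhancement task together in one record. All field names and example contents are hypothetical and only indicate the intended structure.

```python
from dataclasses import dataclass

# Illustrative structure of one redesigned detailed-report entry; the field
# names and example contents are hypothetical, not the platform's schema.
@dataclass
class ErrorReportEntry:
    student: str
    task_title: str
    error_explanation: str    # diagnostic analytics: possible background of the error
    next_learning_goal: str   # redesign: explicit goal category (OD1, OD2)
    enhancement_task: str     # prescriptive analytics: recommended enhancement task

def render_entry(entry: ErrorReportEntry) -> str:
    """Render one entry so that diagnosis, goal, and advice stay connected (OD3)."""
    return (
        f"{entry.student} | {entry.task_title}\n"
        f"  Error: {entry.error_explanation}\n"
        f"  Next learning goal: {entry.next_learning_goal}\n"
        f"  Suggested enhancement task: {entry.enhancement_task}"
    )

if __name__ == "__main__":
    example = ErrorReportEntry(
        student="Student A",
        task_title="Meaning of division",
        error_explanation="Confuses sharing and grouping interpretations of division",
        next_learning_goal="Connect division equations to grouping situations",
        enhancement_task="Enhancement unit on grouping stories for division",
    )
    print(render_entry(example))
```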

However, these redesign decisions cannot overcome the already articulated constraint that detailed error reports for each student and task risk overloading teachers with overly detailed and isolated information (Trgalová & Tabach, 2023), in particular when teachers do not work with only one student but with a whole group. To overcome this constraint, we have introduced three additional dashboards with different modes of aggregation.

4.3 Task-Focused Dashboard

The task-focused dashboard is an advising dashboard that aggregates the information from the detailed reports of each student into one list for all students on one specific task, with nearly the same descriptive, diagnostic, and prescriptive affordances as the detailed report (Fig. 9). In this way, the task-focused dashboard can overcome the information-overload constraint and increase teacher actionability (Tan & Chan, 2016) by helping teachers combine the information about individual students in a way that does not provide every individual error detail but supports working under time pressure (Van Leeuwen & Rummel, 2022).

As in the detailed reports, the redesign, based on OD1 and OD2, included the next learning goals rather than only the error backgrounds. In this way, we intended to strengthen the predictive-prescriptive content focus by assigning students to the relevant content categories in need of revision with the given enhancement tasks. However, the dashboard remains error focused (relevant with respect to OD4 and OD5); this constraint can only be overcome with the two further dashboards introduced after the expert-novice comparisons.
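The aggregation step from detailed reports to the task-focused dashboard can be sketched as follows: entries for one task are grouped so that, for each learning goal and its enhancement task, the dashboard lists all concerned students. Again, the input format and names are hypothetical and serve only to make the aggregation logic explicit.

```python
from collections import defaultdict

def task_focused_view(entries):
    """Group detailed-report entries by task: for each (learning goal, enhancement
    task) pair, list all students concerned. Input tuples are hypothetical:
    (student, task_title, next_learning_goal, enhancement_task)."""
    by_task = defaultdict(lambda: defaultdict(list))
    for student, task_title, goal, enhancement_task in entries:
        by_task[task_title][(goal, enhancement_task)].append(student)
    return by_task

if __name__ == "__main__":
    demo = [
        ("Student A", "Meaning of division",
         "Connect division to grouping situations", "Grouping stories for division"),
        ("Student B", "Meaning of division",
         "Connect division to grouping situations", "Grouping stories for division"),
        ("Student C", "Meaning of division",
         "Connect division to sharing situations", "Sharing stories for division"),
    ]
    for task, goals in task_focused_view(demo).items():
        print(task)
        for (goal, enhancement), students in goals.items():
            print(f"  {goal} ({enhancement}): {', '.join(students)}")
```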

4.4 Individual-Focused Dashboard

The most interesting outcome of the expert-novice comparison was that only the novices in our data set took the errors as their main guidance (Observed Difference OD4: Learning trajectory rather than errors as guidance). Whereas the novice teacher Chris immediately intended to work on the content category identified as erroneous (Fig. 8, left), the expert teacher Tom (Fig. 8, right) drew the learning goal from the error analysis but then planned a learning trajectory in several steps starting from students’ assets (Observed Difference OD5: Building upon students’ assets to strengthen the step to the next learning goal) to build a foundation for successively reaching the final goal. This much more elaborate enhancement practice is not supported by the first three analytic reports, which are all organized starting from errors and provide teacher-actionable information for repairing these errors.

In order to scaffold novice teachers in what expert teachers can do without support (Wood et al., 1976), we have introduced the individual-focused dashboard, another advising dashboard that organizes the descriptive, diagnostic, and predictive-prescriptive information by student so that the sequence of learning goals becomes more explicit. In the future, we aim to explicate students’ assets in this dashboard.

Even with this future revision, the individual-focused dashboard still bears the constraint that teachers need to aggregate diagnostic insights and advice for several students while the learning trajectory stays implicit, so we designed a fifth dashboard.

4.5 Learning-Trajectory Integrated Dashboard

The learning-trajectory integrated dashboard (see Fig. 10 for a larger example) provides the most teacher actionability for goal-guided enhancing practices along the learning trajectory while omitting support for deeper monitoring of student understanding. It reveals no descriptive affordances but organizes the predictive-prescriptive information directly along the highly unpacked, fine-grained content categories, which are presented in a clear sequence along the learning trajectory. Implicit diagnostic information is given by listing all students whom the teacher needs to engage in classroom communication when treating a particular enhancement task with respect to a particular unpacked learning goal. With this deliberate reduction of diagnostic information, we intend to reduce the load of personalizing adaptive instruction for each student separately, yet encourage teachers to organize a teacher-moderated group discussion that engages the concerned students in diagnostic talks at the right moment in the learning trajectory. This encouragement is also supported by suggested prompts for particular unpacked goals, which enable teachers to derive explicit enhancing practices aligned with the goal setting (OD3).

Fig. 10 Learning-trajectory integrated dashboard: Supporting the use of diagnostic information for the unpacked goals at the right moment in the learning trajectory
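The aggregation logic behind this learning-trajectory integrated view can be sketched as follows: the unpacked content categories are kept in their trajectory order, and for each unpacked goal the dashboard lists the students whom the teacher needs to engage. The trajectory, goal labels, and data in this sketch are hypothetical examples, not the project’s actual unpacked learning goals.

```python
# Hypothetical ordered list of unpacked learning goals along the trajectory.
TRAJECTORY = [
    "Interpret multiplication as repeated addition",
    "Connect division to grouping situations",
    "Connect division to sharing situations",
]

def trajectory_view(student_goals: dict[str, set[str]]) -> list[tuple[str, list[str]]]:
    """Return, in trajectory order, each unpacked goal together with the students
    whom the teacher needs to engage for this goal."""
    view = []
    for goal in TRAJECTORY:
        concerned = sorted(s for s, goals in student_goals.items() if goal in goals)
        view.append((goal, concerned))
    return view

if __name__ == "__main__":
    demo = {
        "Student A": {"Connect division to grouping situations"},
        "Student B": {"Interpret multiplication as repeated addition",
                      "Connect division to grouping situations"},
        "Student C": {"Connect division to sharing situations"},
    }
    for goal, students in trajectory_view(demo):
        print(f"{goal}: {', '.join(students) if students else '(none)'}")
```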

With this innovative dashboard, we intended to take into account OD4 and first steps toward OD5 by providing exactly the support that novice teachers might need for teaching adaptively, with a focus on the most relevant goal categories, outside of one-to-one tutoring situations.

We hope it will allow teachers to avoid treating isolated details separately in error-guided ways without long-term coherence and connections, and instead to successively build upon students’ assets and sequence learning steps so that the mathematics becomes understandable. In a future redesign, we aim to include students’ assets and teacher-actionable ways to build upon them (OD5). The structure of this innovative dashboard was co-constructed with highly experienced teachers and PD facilitators of the project design team in reaction to the empirical findings.

5 Discussion and Conclusion

5.1 Embedding the Outcomes into the General Discourse

For the wide implementation of formative assessment in mathematics classrooms, as has been requested for decades (Black & Wiliam, 1998), digital formative assessments bear strong affordances through automated analysis (Olsher et al., 2023; Trgalová & Tabach, 2023). Until now, these affordances have not been fully exploited, as existing tools have provided few “teacher-actionable insights” (Koh & Tan, 2017, p. 319). One hindrance has been the limited knowledge about teachers’ actions and exact support needs, in particular for monitoring and enhancing students’ deep conceptual understanding (Drijvers, 2020; Hankeln et al., submitted; Sergis & Sampson, 2017).

In this paper, we have followed Wood et al. (1976) in their idea of scaffolding exactly those aspects that novices cannot do without support but experts can. For this, we conducted an expert-novice comparison using think-aloud sessions to compare expert and novice teachers’ practices of monitoring, goal-setting, and enhancing low-achieving fifth-graders’ understanding of the mathematical topic of meanings of multiplication and division (Fig. 1), to specify the differences between them, and to conceptualize these differences as the aspects in need of support.

The qualitative analysis of practices with respect to the content categories addressed by teachers (see visual summaries in Figs. 4 and 8) revealed five observed differences:

  • OD1: Less vague content categories

  • OD2: Higher consistency of categories across monitoring, goal-setting, and enhancing practices

  • OD3: More explicit category-guided inferences

  • OD4: Learning trajectory rather than errors as guidance

  • OD5: Building upon students’ assets to strengthen the step to the next learning goal

While these observed differences were identified with a sample of only 2 × 2 teachers, they were strengthened by the fact that they resonate with findings from expert-novice research in other domains (Glaser & Chi, 2014): Experts perceive large meaningful patterns in their domain (in our case, keeping the learning trajectory in mind; OD4), notice problems in their domain more precisely (in our case, with less vague and more unpacked content categories; OD1), and draw more sustainable connections (by consistently using the same content categories for monitoring, goal-setting, and enhancing; OD2). In contrast, novices tend to capture problems at a more superficial level (OD1) and jump more quickly to immediate solutions (in our case, being guided by immediate error repair rather than a systematic succession of learning opportunities along a learning trajectory (OD4), and not building upon students’ assets in that learning trajectory (OD5)). In particular, the last two findings have also been identified in other studies of teachers’ monitoring and enhancement practices in mathematics (Prediger et al., 2023; Siemon, 2019; van der Steen et al., 2023).

For the design of dashboards, we could draw only upon limited research on teacher dashboards for formative assessment (Olsher et al., 2023; Trgalová & Tabach, 2023), yet upon substantial research on dashboards for learning environments (Tan & Chan, 2016; Van Leeuwen & Rummel, 2022), which shows that the different affordances (descriptive, diagnostic, predictive, and prescriptive) can best be exploited by different modes of aggregation in dashboards, between the two extreme ends of the scale of informativity: the correctness overview and the detailed error-focused reports.

Following the empirical findings of our expert-novice comparison, we have designed and redesigned five analytic reports (Fig. 9) to strengthen novice teachers’ practices of unpacking concise subgoals (OD1), using them consistently across monitoring, goal-setting, and enhancing practices (OD2), and drawing more explicit inferences for enhancement decisions (OD3). These findings and the subsequent redesign decisions are in line with what Siemon (2019) and Stacey et al. (2018) outline as critical supports for teachers’ formative assessments.

However, even after redesign, the three already introduced analytic reports (the correctness overview, the detailed error-focused report for each student and task, and the task-focused dashboard aggregating information for all students on one task; see Fig. 9) still did not meet the novice teachers’ support needs that we identified through OD4 and OD5: Novice teachers in our sample immediately jumped from a detected error to enhancement tasks for repairing this error (OD4), whereas the expert teachers kept their eyes on the necessary progression of learning goals along the learning trajectory and came back to a particular error only when the prerequisites had first been safely established (Fig. 8), building upon students’ assets (OD5). Our first three analytic reports do not provide sufficient support for overcoming this immediate error-repair practice, so we developed two further dashboards that might support novice teachers in planning along the learning trajectory, spanning one or two enhancement sessions (Fig. 10). The restructured information in the individual-focused dashboard provides an overview for each student across all tasks; the learning-trajectory integrated dashboard even aggregates this information for all students. By complementing the suggested enhancement tasks with oral prompts (to help particular children focus on particular goals in particular steps of the learning trajectory), we hope to provide the most teacher-actionable information (Tan & Chan, 2016) that fully exploits the power of learning trajectories (Siemon, 2019).

Summing up, all five analytic reports we have introduced or redesigned (Fig. 9) provide some actionable information for teachers, with the correctness overview being only an alerting dashboard for selection actions and the other advising dashboards providing substantial advice on how to proceed with enhancement. These actionable insights relate to the observed differences OD1–5 and aim to support novices with respect to them, but OD5 (building upon students’ assets) still needs to be fully integrated into our dashboard design. This will be the subject of future redesign and research, which can then also validate the hypotheses that have been generated from the small sample in this study.

5.2 Methodological Limitations and Future Research and Design Needs

Of course, the findings of this study must be interpreted with caution regarding its methodological limitations. As in all qualitative studies, the findings are bound to the particular data-gathering contexts and cannot easily be generalized to all contexts.

First, we worked with a small sample of four teachers on very particular tasks, so we do not claim any statistical generalizability of the findings and are aware that they are still tied to the chosen tasks. With other teachers, other assessment tasks, and other enhancement tasks, further phenomena might occur or observed differences might disappear. The sample was also particular in that all participating teachers were (at least somewhat) familiar with the Mastering Math program and were likely to engage in formative assessment practices more often than other teachers. For example, all teachers integrated opportunities for further monitoring of students’ understanding into their enhancement practices in order to adapt their enhancement to students’ needs. While we do not claim statistical generalizability, the fact that our case study findings resonate with expert-novice comparisons in other domains (Glaser & Chi, 2014) can be interpreted as a first cross-domain validation.

The study used typical answers of a fictitious student in order to compare teachers’ assessment-based enhancement practices without any distractions (e.g., by teachers’ preformed judgments about their students’ needs, or by small changes in students’ answers that could have had an impact on the monitoring results). What helped us to compare teachers’ practices in comparable situations also hindered us from analyzing how they took into account further context factors that usually impact teachers’ decisions (e.g., teachers’ knowledge about students’ different motivation or self-regulation), so the analysis is restricted to topic-specific cognitive categories for teaching. When future formative assessment (e.g., by log-data analysis of students’ processes) also captures self-regulation or motivational indicators, these could be included in the analysis of teachers’ decision making.

With the focus on only one mathematical topic, the understanding of multiplication and division, the current findings are local; in other words, they are bound to the particular topic in view. While their resonance with findings on other mathematical topics (Siemon, 2019; Stacey et al., 2018) gives a first indication that they are not completely topic specific, future research should investigate to what extent expert and novice teachers’ monitoring, goal-setting, and enhancing practices bear topic-specific differences.

In future studies, we plan to test the design hypotheses generated from the observed differences in order to validate quantitatively whether the hypothesized support affordances of the redesigned dashboards really support novice teachers in the intended ways. This quantitative validation will require larger samples.