Rubric formats for the formative assessment of oral presentation skills acquisition in secondary education

Acquiring complex oral presentation skills is cognitively demanding for students and requires intensive teacher guidance. The aim of this study was twofold: (a) to identify and apply design guidelines in developing an effective formative assessment method for oral presentation skills during classroom practice, and (b) to develop and compare two analytic rubric formats as part of that assessment method. Participants were first-year secondary school students in the Netherlands (n = 158) who acquired oral presentation skills with the support of either a formative assessment method with analytic rubrics offered through a dedicated online tool (experimental groups), or a method using more conventional (rating scale) rubrics (control group). One experimental group received text-based rubrics and the other received video-enhanced rubrics. No prior research is known about analytic video-enhanced rubrics, but, based on research on complex skill development and multimedia learning, we expected this format to best capture the (non-verbal aspects of) oral presentation performance. Significant positive differences in oral presentation performance were found between the experimental groups and the control group. However, no significant differences were found between the two experimental groups. This study shows that a well-designed formative assessment method, using analytic rubric formats, outperforms formative assessment using more conventional rubric formats. It also shows that the higher costs of developing video-enhanced analytic rubrics cannot be justified by significantly greater performance gains. Future studies should address the generalizability of such formative assessment methods to other contexts, and to complex skills other than oral presentation, and should lead to a more profound understanding of video-enhanced rubrics.


Introduction
Both practitioners and scholars agree that students should be able to present orally (e.g., Morreale & Pearson, 2008; Smith & Sodano, 2011). Oral presentation involves the development and delivery of messages to the public with attention to vocal variety, articulation, and non-verbal signals, and with the aim to inform, self-express, relate to and persuade listeners (Baccarini & Bonfanti, 2015; De Grez et al., 2009a; Quianthy, 1990). The current study is restricted to informative presentations (as opposed to persuasive presentations), as these are most common in secondary education. Oral presentation skills are complex generic skills of increasing importance for both society and education (Voogt & Roblin, 2012). However, secondary education seems to lack instructional design guidelines for supporting oral presentation skills acquisition. Many secondary schools in the Netherlands are struggling with how to teach and assess students' oral presentation skills, lack clear performance criteria for oral presentations, and fall short in offering adequate formative assessment methods that support the effective acquisition of oral presentation skills (Sluijsmans et al., 2013).
Many researchers agree that the acquisition and assessment of presentation skills should depart from a socio-cognitive perspective (Bandura, 1986) with emphasis on observation, practice, and feedback. Students acquire new presentation skills by observing other presentations as modeling examples and then practicing their own presentation, after which they address the feedback by adjusting their presentations towards the required levels. Evidently, delivering effective oral presentations requires much preparation, rehearsal, and practice, interspersed with good feedback, preferably from oral presentation experts. However, large class sizes in secondary schools in the Netherlands offer only limited opportunities for teacher-student interaction, and offer even fewer practice opportunities. Based on research on complex skill development and multimedia learning, it can be expected that video-enhanced analytic rubric formats best capture and guide oral presentation performance, since much non-verbal behavior cannot be captured in text (Van Gog et al., 2014; Van Merriënboer & Kirschner, 2013).

Formative assessment of complex skills
To support complex skills acquisition under limited teacher guidance, we will need more effective formative assessment methods (Boud & Molloy, 2013) based on proven instructional design guidelines. During skills acquisition students will perceive specific feedback as more adequate than non-specific feedback (Shute, 2008). Adequate feedback should inform students about (i) their task-performance, (ii) their progress towards intended learning goals, and (iii) what they should do to further progress towards those goals (Hattie & Timperley, 2007; Narciss, 2008). Students receiving specific feedback on criteria and performance levels will become equipped to improve oral presentation skills (De Grez et al., 2009a; Ritchie, 2016). Analytic rubrics are therefore promising formats to provide specific feedback on oral presentations, because they can demonstrate the relations between subskills and explain the open-endedness of ideal presentations (through textual descriptions and their graphical design). Ritchie (2016) showed that adding structure and self-assessment to peer- and teacher-assessments resulted in better oral presentation performance. Students were required to use analytic rubrics for self-assessment when following their (project-based) classroom education. In this way, they had ample opportunity for observing and reflecting on the attributes of (good) oral presentations, which was shown to foster acquisition of their oral presentation skills.

Viewbrics
Analytic rubrics incorporate performance criteria to inform teachers and students when preparing oral presentations. Such rubrics support mental model formation, and enable adequate feedback provision by teachers, peers, and self (Brookhart & Chen, 2015; Jonsson & Svingby, 2007; Panadero & Jonsson, 2013). Such research is inconclusive about which formats and delivery media are most effective, but most studies dealt with analytic text-based rubrics delivered on paper. However, digital video-enhanced analytic rubrics are expected to be more effective for acquiring oral presentation skills, since many behavioral aspects refer to non-verbal actions and processes that can only be captured on video (e.g., body posture or use of voice during a presentation).
This study is situated within the Viewbrics project, in which video-modelling examples are integrated with analytic text-based rubrics (Ackermans et al., 2019a). Video-modelling examples contain question prompts that illustrate behavior associated with (sub)skills performance levels in context, and are presented by young actors the target group can identify with. The question prompts require students to link behavior to performance levels, and build a coherent picture of the (sub)skills and levels. To the best of the authors' knowledge, no previous studies exist on such video-enhanced analytic rubrics. The Viewbrics tool has been incrementally developed and validated with teachers and students to structure the formative assessment method in classroom settings (Rusman et al., 2019).
The purpose of our study is twofold. On the one hand, it investigates whether the application of evidence-based design guidelines results in a more effective formative assessment method in the classroom. On the other hand, it investigates (within that method) whether video-enhanced analytic rubrics are more effective than text-based analytic rubrics.

Research questions
The twofold purpose of this study is captured in two research questions: (1) To what extent do analytic rubrics within formative assessment lead to better oral presentation performance? (the design-based part of this study); and (2) To what extent do video-enhanced analytic rubrics lead to better oral presentation performance (growth) than text-based analytic rubrics? (the experimental part of this study). We hypothesize that all students will improve their oral presentation performance over time, but that students in the experimental groups (receiving analytic rubrics designed according to proven design guidelines) will outperform a control group (receiving conventional rubrics) (Hypothesis 1). Furthermore, we expect the experimental group using video-enhanced rubrics to achieve more performance growth than the experimental group using text-based rubrics (Hypothesis 2).
After this introduction, the second section describes previous research on design guidelines that were applied to develop the analytic rubrics in the present study. The actual design, development and validation of these rubrics is described in "Development of analytic rubrics tool" section. "Method" section describes the experimental method of this study, whereas "Results" section reports its results. Finally, in the concluding "Conclusions and discussion" section, main findings and limitations of the study are discussed, and suggestions for future research are provided.

Previous research and design guidelines for formative assessment with analytic rubrics
Analytic rubrics are inextricably linked with assessment, either summative (for final grading of learning products) or formative (for scaffolding learning processes). They provide textual descriptions of skills' mastery levels with performance indicators that describe concrete behavior for all constituent subskills at each mastery level (Allen & Tanner, 2006; Reddy, 2011; Sluijsmans et al., 2013) (see Figs. 1 and 2 in "Development of analytic rubrics tool" section for an example). Such performance indicators specify aspects of variation in the complexity of a (sub)skill (e.g., presenting for a small, homogeneous group as compared to presenting for a large heterogeneous group) and related mastery levels (Van Merriënboer & Kirschner, 2013). Analytic rubrics explicate criteria and expectations, can be used to check students' progress, monitor learning, and diagnose learning problems, either by teachers, students themselves or by their peers (Rusman & Dirkx, 2017).
Several motives for deploying analytic rubrics in education are distinguished. A review study by Panadero and Jonsson (2013) identified the following motives: increasing transparency, reducing anxiety, aiding the feedback process, improving student self-efficacy, and supporting student self-regulation. Analytic rubrics also improve reliability among teachers when rating their students (Jonsson & Svingby, 2007). Evidence has shown that analytic rubrics can enhance student performance and learning when they are used for formative assessment purposes in combination with metacognitive activities, like reflection and goal-setting, but research shows mixed results about their learning effectiveness (Panadero & Jonsson, 2013).
It remains unclear what exactly is needed to make their feedback effective (Reddy & Andrade, 2010; Reitmeier & Vrchota, 2009). Apparently, transparency of assessment criteria and learning goals (i.e., making expectations and criteria explicit) is not enough to establish effectiveness (Wöllenschläger et al., 2016). Several researchers stressed the importance of how and which feedback to provide with rubrics (Bower et al., 2011; De Grez et al., 2009b; Kerby & Romine, 2009). We now continue this section by reviewing design guidelines for analytic rubrics we encountered in the literature, and then specifically address what the literature mentions about the added value of video-enhanced rubrics.

Design guidelines for analytic rubrics
Effective formative assessment methods for oral presentation and analytic rubrics should be based on proven instructional design guidelines (Van Ginkel et al., 2015). Table 1 presents an overview of the seventeen guidelines on analytic rubrics we encountered in the literature. Guidelines 1-4 inform us how to use rubrics for formative assessment; Guidelines 5-17 inform us how to use rubrics for instruction, with Guidelines 5-9 on a rather generic, meso level and Guidelines 10-17 on a more specific, micro level. We will now briefly describe them in relation to oral presentation skills.
Table 1 Overview of guidelines on analytic rubrics
Guideline 1: Use analytic rubrics instead of rating scale rubrics if rubrics are meant for learning
Guideline 2: Use self-assessment via rubrics for formative purposes
Guideline 3: Use peer-assessment via rubrics for formative purposes
Guideline 4: Provide rubrics for usage by self, peers, and teachers as students appreciate rubrics
Guideline 5: Train teachers before using rubrics in instruction
Guideline 6: Formulate explicit policy towards using rubrics in instruction
Guideline 7: Train and motivate students for using rubrics
Guideline 8: Take the context of the school/class into account when using rubrics
Guideline 9: Use rubrics within a constructive approach to learning
Guideline 10: Align learning objectives with outcomes in rubrics descriptions [instruction]
Guideline 11: Use authentic learning tasks with increasing complexity [instruction]
Guideline 12: Use modeling examples to scaffold and illustrate behavioral skills [learning]
Guideline 13: Provide sufficient rehearsal practice for behavioral skills [learning]
Guideline 14: Provide high-quality feedback [assessment]
Guideline 15: Involve peers in formative assessment [assessment]
Guideline 16: Facilitate self-assessment [assessment]
Guideline 17: Use specific goal-setting within self-assessment [assessment]

Guideline 1: use analytic rubrics instead of rating scale rubrics if rubrics are meant for learning
Conventional rating-scale rubrics are easy to generate and use as they contain scores for each performance criterion (e.g., by a 5-point Likert scale). However, since the performance levels are not clearly described or operationalized, rating can suffer from rater-subjectivity, and rating scales do not provide students with unambiguous feedback (Suskie, 2009). Analytic rubrics can address those shortcomings as they contain brief textual performance descriptions on all subskills, criteria, and performance levels of complex skills like presentation, but are harder to develop and score (Bargainnier, 2004; Brookhart, 2004; Schreiber et al., 2012).

Guideline 2: use self-assessment via rubrics for formative purposes
Analytic rubrics can encourage self-assessment and self-reflection (Falchikov & Boud, 1989; Reitmeier & Vrchota, 2009), which appears essential when practicing presentations and reflecting on other presentations (Van Ginkel et al., 2017). The usefulness of self-assessment for oral presentation was demonstrated by Ritchie's study (2016).

Guideline 3: use peer-assessment via rubrics for formative purposes
Peer-feedback is more (readily) available than teacher-feedback, and can be beneficial for students' confidence and learning (Cho & Cho, 2011; Murillo-Zamorano & Montanero, 2018), also for oral presentation (Topping, 2009). Students positively value peer-assessment if the circumstances guarantee serious feedback (De Grez et al., 2010; Lim et al., 2013). It can be assumed that using analytic rubrics positively influences the quality of peer-assessment.

Guideline 4: provide rubrics for usage by self, peers, and teachers as students appreciate rubrics
Students appreciate analytic rubrics because they support them in their learning, in their planning, in producing higher quality work, in focusing efforts, and in reducing anxiety about assignments (Reddy & Andrade, 2010), aspects of importance for oral presentation. While students positively perceive the use of peer-grading, the inclusion of teacher-grades is still needed (Mulder et al., 2014) and most valued by students (Ritchie, 2016).

Guidelines 5-9
Heitink et al. (2016) carried out a review study identifying five relevant prerequisites for effective classroom instruction on a meso-level when using analytic rubrics (for oral presentations): train teachers and students in using these rubrics, decide on a policy for their use in instruction, take school and classroom contexts into account, and follow a constructivist learning approach. The next section describes how these guidelines were applied to the design of this study's classroom instruction.

Guidelines 10-17
The review study by Van Ginkel et al. (2015) presents a comprehensive overview of effective factors for oral presentation instruction in higher education on a micro-level. Although our research context is secondary education, the findings from the aforementioned study seem very applicable, as they are rooted in firmly researched and well-documented Instructional Design approaches. Their guidelines pertain to (a) instruction, (b) learning, and (c) assessment in the learning environment (Biggs, 2003). The next section describes how these guidelines were applied to the design of this study's online Viewbrics tool.

Video-enhanced rubrics
Early analytic rubrics for oral presentations were all text-based descriptions. This study assumes that such analytic rubrics may fall short when used for learning to give oral presentations, since much of the required performance refers to motoric activities, time-consecutive operations and processes that can hardly be captured in text (e.g., body posture or use of voice during a presentation). Text-based rubrics also have a limited capacity to convey contextualized and more 'tacit' behavioral aspects (O'Donovan et al., 2004), since 'tacit knowledge' (or 'knowing how') is interwoven with practical activities, operations, and behavior in the physical world (Westera, 2011). Finally, text leaves more space for personal interpretation (of performance indicators) than video, which negatively influences mental model formation and feedback consistency (Lew et al., 2010). We can therefore expect video-enhanced rubrics to overcome such restrictions, as they

Development of analytic rubrics tool
This section describes how design guidelines from previous research were applied in the actual development of the rubrics in the Viewbrics tool for our study, and then presents the subskills and levels for oral presentation skills as were defined.

Application of design guidelines
The previous section already mentioned that analytic rubrics should be restricted to formative assessment (Guidelines 2 and 3), and that there are good reasons to assume that a combination of teacher-, peer-, and self-assessment will improve oral presentations (Guidelines 1 and 4). Teachers and students were trained in rubric usage (Guidelines 5 and 7), and students were motivated to use rubrics (Guideline 7). As participating schools were already using analytic rubrics, one might assume their positive initial attitude. Although the policy towards using analytic rubrics might not have been generally known on the work floor, the participating teachers in our study were knowledgeable (Guideline 6). We carefully considered the school context, as (a representative set of) secondary schools in the Netherlands were part of the Viewbrics team (Guideline 8). The formative assessment method was embedded within project-based education (Guideline 9).
Within this study and on the micro-level of design, the learning objectives for the first presentation were clearly specified by lower performance levels, whereas advice on students' second presentation focused on improving specific subskills that had been performed with insufficient quality during the first presentation (Guideline 10). Students carried out two consecutive projects of increasing complexity (Project 1, Project 2) with authentic tasks, including the oral presentations (Guideline 11). Students were provided with opportunities to observe peer-models to increase their self-efficacy beliefs and oral presentation competence. In our study, only students who received video-enhanced rubrics could observe videos with peer-models before their first presentation (Guideline 12). Students were allowed enough opportunities to rehearse their oral presentations, to increase their presentation competence, and to decrease their communication apprehension. Within our study, feedback could be provided on only two oral presentations, but students could rehearse as often as they wanted outside the classroom (Guideline 13). We ensured that feedback in the rubrics was of high quality, i.e., explicit, contextual, adequately timed, and of suitable intensity for improving students' oral presentation competence. Both experimental groups in the study used digital analytic rubrics within the Viewbrics tool (for teacher-, peer-, and self-feedback). The control group received feedback via a more conventional rubric (rating scale), and could therefore not use the formative assessment and reflection functions (Guideline 14). The setup of the study implied that all peers played a major role during formative assessment in both experimental groups, because they formatively assessed each oral presentation using the Viewbrics tool (Guideline 15). The control group received feedback from their teacher only.
Both experimental groups used the Viewbrics tool to facilitate self-assessment (Guideline 16). The control group did not receive analytic progress data to inform their self-assessment. Specific goal-setting within self-assessment has been shown to positively stimulate oral presentation performance, to improve self-efficacy and to reduce presentation anxiety (De Grez et al., 2009a; Luchetti et al., 2003), so the Viewbrics tool was developed to support both specific goal-setting and self-reflection (Guideline 17). Reddy and Andrade (2010) stress that rubrics should be tailored to the specific learning objectives and target groups. Oral presentations in secondary education (our study context) involve generating and delivering informative messages with attention to vocal variety, articulation, and non-verbal signals. In this context, message composition and message delivery are considered important (Quianthy, 1990). Strong arguments ('logos') have to be presented in a credible ('ethos') and exciting ('pathos') way (Baccarini & Bonfanti, 2015). Public speaking experts agree that there is not one right way to do an oral presentation (Schneider et al., 2017). There is agreement that all presenters need much practice, commitment, and creativity. Effective presenters do not rigorously and obsessively apply communication rules and techniques, as their audience may then perceive the performance as too technical or artificial. But all presentations should demonstrate sufficient mastery of elementary (sub)skills in an integrated manner. Therefore, such skills should also be practiced as a whole (including knowledge and attitudes), making the attainment of a skill performance level more than the sum of its constituent (sub)skills (Van Merriënboer & Kirschner, 2013). A validated instrument for assessing oral presentation performance is needed to help teachers assess and support students while practicing.

Subskills and levels for oral presentation
When we started developing rubrics with the Viewbrics tool (late 2016), there were no studies or validated measuring instruments for oral presentation performance in secondary education, although several schools used locally developed, non-validated assessment forms (i.e., conventional rubrics). For higher education, Schreiber et al. (2012) had developed an analytic rubric for public speaking skills assessment, aimed at faculty members and students across disciplines. They identified eleven (sub)skills of public speaking, which could be subsumed under three factors ('topic adaptation', 'speech presentation' and 'nonverbal delivery', similar to logos-ethos-pathos).
Such previous work holds much value, but still had to be adapted and elaborated in the context of the current study. This study elaborated and evaluated eleven subskills that can be identified within the natural flow of an oral presentation and its distinctive features (See Fig. 1 for an overview of subskills, and Fig. 2 for a specification of performance levels for a specific subskill).
Between brackets are names of subskills as they appear in the dashboard of the Viewbrics tool (Fig. 3).
The upper part of Fig. 2 shows the scoring levels for first-year secondary school students for criterion 4 of the oral presentation assessment (four values, from more expert (4 points) to more novice (1 point), from right to left), an example of the conventional rating-scale rubrics. The lower part shows the corresponding screenshot from the Viewbrics tool, representing a text-based analytic rubric example. A video-enhanced analytic rubric example for this subskill provides a peer modelling the required behavior on expert level, with question prompts on selecting reliable and interesting materials. Performance levels were inspired by previous research (Ritchie, 2016; Schneider et al., 2017; Schreiber et al., 2012), but also based upon current secondary school practices in the Netherlands, and developed and tested with secondary school teachers and their students.
All eleven subskills are scored on similar four-point Likert scales, and have equal weights in determining total average scores. Two pilot studies tested the usability, validity and reliability of the assessment tool (Rusman et al., 2019). Based on this input, the final rubrics were improved and embedded in a prototype of the online Viewbrics tool, and used for this study. The formative assessment method consisted of six steps: (1) study the rubric; (2) practice and conduct an oral presentation; (3) conduct a self-assessment; (4) consult feedback from teacher and peers; (5) reflect on feedback; and (6) select personal learning goal(s) for the next oral presentation.
After the second project (Project 2), which used the same setup and assessment method as the first project, students in experimental groups could also see their visualized progress in the 'dashboard' of the Viewbrics tool (see Fig. 3, with English translations provided between brackets), by comparing performance on their two project presentations during the second reflection assignment. The dashboard of the tool shows progress (inner circles), with green reflecting improvement on subskills, blue indicating constant subskills, and red indicating declining subskills. Feedback is provided by emoticons with text. Students' personal learning goals after reflection are shown under 'Mijn leerdoelen' [My learning goals].
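The scoring and progress logic described above can be illustrated with a minimal sketch. It is not taken from the Viewbrics tool itself: the function names (`total_score`, `progress_colors`) and the example ratings are hypothetical, and the sketch only assumes what the text states, namely eleven equally weighted subskills rated from 1 to 4 and a green/blue/red coding for improved, constant, and declining subskills.

```python
# Hypothetical sketch of equal-weight scoring and dashboard color coding.
# Names and example data are illustrative assumptions, not the Viewbrics code.

SUBSKILL_COUNT = 11           # eleven subskills, as described in the text
MIN_LEVEL, MAX_LEVEL = 1, 4   # novice (1) to expert (4)


def total_score(levels):
    """Sum equally weighted subskill levels for one presentation."""
    if len(levels) != SUBSKILL_COUNT:
        raise ValueError("expected one level per subskill")
    if any(not MIN_LEVEL <= level <= MAX_LEVEL for level in levels):
        raise ValueError("levels must lie between 1 and 4")
    return sum(levels)


def progress_colors(first, second):
    """Compare two presentations subskill by subskill."""
    colors = []
    for before, after in zip(first, second):
        if after > before:
            colors.append("green")   # improvement
        elif after < before:
            colors.append("red")     # decline
        else:
            colors.append("blue")    # constant
    return colors


# Made-up ratings for one student on Project 1 and Project 2.
first_presentation = [2, 2, 3, 1, 2, 3, 2, 2, 3, 2, 2]
second_presentation = [3, 2, 3, 2, 2, 2, 3, 3, 3, 2, 3]

print(total_score(first_presentation), total_score(second_presentation))
print(progress_colors(first_presentation, second_presentation))
```

Dividing the total by the number of subskills would give the average level per subskill; how the tool aggregates several raters before this step is not specified here and is left out of the sketch.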

Method
The previous sections described how design guidelines for analytic rubrics from literature ("Previous research and design guidelines for formative assessment with analytic rubrics" section) were applied in a formative assessment method with analytic rubrics ("Development of analytic rubrics tool" section). "Method" section describes this study's research design for comparing rubric formats.

Research design of the study
All classroom scenarios followed the same lesson plan and structure for project-based instruction, and consisted of two projects with specific rubric feedback provided in between. Both experimental groups used the same formative assessment method with validated analytic rubrics, but differed in the analytic rubric format (text-based, video-enhanced). The students of the control group did not use such a formative assessment method, and only received teacher-feedback (via a conventional rating-scale rubric that consisted of a standard form with attention points for presentations, without further instructions) on these presentations. All three scenarios required similar time investments for students. School classes (six) were randomly assigned to conditions (three), so students from the same class were in the same condition. Figure 4 graphically depicts an overview of the research design of the study.
A repeated-measures mixed-ANOVA on oral presentation performance (growth) was carried out to analyze data, with rubric-format (three conditions) as between-groups factor and repeated measures (two moments) as within-groups factor. All statistical data analyses were conducted with SPSS version 24.
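To make the logic of this design concrete, the sketch below illustrates one component of such an analysis. It is not the SPSS procedure used in this study. It relies on the fact that, with only two measurement moments, the condition-by-time interaction of a mixed ANOVA is equivalent to a one-way ANOVA on gain scores (second minus first score). The function name `interaction_f_from_gains` and all scores are hypothetical, and no p-value is computed, since that would require an F distribution from a statistics package.

```python
# Hypothetical sketch: F statistic for the condition x time interaction,
# obtained as a one-way ANOVA on gain scores (valid for two moments only).
# This is an illustration, not the analysis pipeline of the study.
from statistics import mean


def interaction_f_from_gains(first, second):
    """first/second: dicts mapping condition -> scores in the same student order."""
    gains = {
        condition: [b - a for a, b in zip(first[condition], second[condition])]
        for condition in first
    }
    all_gains = [g for values in gains.values() for g in values]
    grand_mean = mean(all_gains)

    # Variation of condition means around the grand mean of the gains.
    ss_between = sum(
        len(values) * (mean(values) - grand_mean) ** 2 for values in gains.values()
    )
    # Variation of individual gains around their own condition mean.
    ss_within = sum(
        (g - mean(values)) ** 2 for values in gains.values() for g in values
    )

    df_between = len(gains) - 1
    df_within = len(all_gains) - len(gains)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up total scores (three students per condition) at both moments.
first_scores = {"video": [20, 22, 24], "text": [21, 23, 25], "control": [20, 22, 24]}
second_scores = {"video": [25, 28, 31], "text": [25, 28, 31], "control": [20, 23, 26]}

f_value, df_between, df_within = interaction_f_from_gains(first_scores, second_scores)
print(f_value, df_between, df_within)
```

The main effects of time and condition, and the fact that students are nested within classes, are not covered by this sketch.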

Participants
Participants were first-year secondary school students (all within the 12-13 years range) from two Dutch schools, with participants equally distributed over schools and conditions (n = 166, with 79 girls and 87 boys). Classes were randomly allocated to conditions. Most participants completed both oral presentations (n = 158, so an overall response rate of 95%). Data were collected (almost) equally from the video-enhanced rubrics condition (n = 51), text-based condition (n = 57), and conventional rubrics (control) condition (n = 50).
A related study within the same context and with the same participants (Ackermans et al., 2019b) analyzed the concept maps elicited from participants and revealed that their mental models (indicating mastery levels) for oral presentation were similar across conditions. From that finding we can conclude that students possessed similar mental models for presentation skills before starting the projects. Results from the online questionnaire ("Anxiety, preparedness, and motivation" section) reveal that students in experimental groups did not differ in anxiety, preparedness and motivation before their first presentation. Together with the teacher assessments of similarity of classes, we can assume similarity of students across conditions at the start of the experiment.
Fig. 4 Research design overview

Materials and procedure
Teachers from both schools worked closely together in guaranteeing similar instruction and difficulty levels for both projects (Project 1, Project 2). Schools agreed to follow a standardized lesson plan for both projects and their oral presentation tasks. Core team members then developed (condition-specific) materials for teacher and student workshops on how to use rubrics and provide instructions and feedback (Guidelines 5 and 7). This also ensured that similar measures were taken for potential problems with anxiety, preparedness and motivation. Teachers received information about (condition-specific) versions of the Viewbrics tool (see "Development of analytic rubrics tool" section). The core team consisted of three researchers and three (project) teachers, with one teacher also supervising the others. The teacher workshops were given by the supervising teacher and two researchers before starting recruitment of students.
Teachers estimated similarity of all six classes with respect to students' prior presentation skills before starting the first project. All classes were informed by an introduction letter from the core team and their teachers. Participation in this study was voluntary. Students and their parents/caretakers were informed about 4 weeks before the start of the first project, and received information on research-specific activities, time investment and schedule. Parents/caretakers signed an informed consent form on behalf of their underage children before the study started. All were informed that data would be anonymized for scientific purposes, and that students could withdraw at any time without giving reasons.
School classes were randomly assigned to conditions. Students of experimental groups were informed that the usability of the Viewbrics tool for oral presentation skills acquisition was being investigated, but were left unaware of the different rubric formats. Students of the control group were informed that their oral presentation skills acquisition was being investigated. From all students, concept maps about oral presentation were elicited (reflecting their mental model and mastery level). Students participated in workshops (specific for their condition and provided by their teacher) on how to use rubrics and provide peer-feedback (all materials remained available throughout the study).
Before giving their presentations on Project 1, students filled in the online questionnaire via LimeSurvey. Peers and teachers in experimental groups provided immediate feedback on given presentations, and students immediately had to self-assess their own presentations (step 3 of the assessment method). Subsequently, students could view the feedback and ratings given by their teacher and peers through the tool (step 4), were asked to reflect on this feedback (step 5), and to choose specific goals for their second oral presentation (step 6). In the control group, students directly received teachers' feedback (verbally) after completing their presentation, but did not receive any reflection assignment. Control group students used a standard textual form with attention points (conventional rating-scale rubrics). After giving their presentations on the second project, students in the experimental groups got access to the dashboard of the Viewbrics tool (see "Development of analytic rubrics tool" section) to see their progress on subskills. About a week after the classes had ended, some semi-structured interviews were carried out by one of the researchers. Finally, one of the researchers functioned as a hotline for teachers in case of urgent questions during the study, and randomly observed some of the lessons.

Measures and instruments
Oral performance scores on presentations were measured by both teachers and peers. A short online questionnaire (with 6 items) was administered to students just before their first oral presentation at the end of Project 1 (see Fig. 4). Interviews were conducted with both teachers and students at the end of the intervention to collect more qualitative data on subjective perceptions.

Oral presentation performance
Students' progress in oral presentation performance was measured by comparing the performance scores on both oral presentations (with three months in between). Both presentations were scored by teachers using the video-enhanced rubric in all groups (half of the score in experimental groups, full score for control group). For participants in both experimental groups, oral presentation performance was also scored by peers and self, using the rubric version of their condition (either video-enhanced or text-based) (other half of the score). For each of the (eleven) subskills, between 1 point (novice level) and 4 points (expert level) could be earned, with a maximum of 44 points for the total performance score. For participants in the control group, the same scale applied but no scores were given by peers or self. The inter-rater reliability of assessments between teachers and peers was acceptable, with Cohen's Kappa = 0.74.
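The scoring scheme and the agreement statistic described above can be illustrated with a minimal Python sketch. The function names are hypothetical, and the way the peer and self ratings are averaged into the second half of the score is an assumption, since the text does not specify the aggregation; the kappa shown is the standard unweighted form and is not necessarily the exact variant computed in the study.

```python
from collections import Counter

N_SUBSKILLS = 11               # number of subskills, taken from the text
MIN_LEVEL, MAX_LEVEL = 1, 4    # 1 = novice level, 4 = expert level


def total_score(levels):
    """Sum the subskill levels of one rater (range 11 to 44)."""
    if len(levels) != N_SUBSKILLS:
        raise ValueError("expected one level per subskill")
    if any(not MIN_LEVEL <= lv <= MAX_LEVEL for lv in levels):
        raise ValueError("levels must lie between 1 and 4")
    return sum(levels)


def combined_score(teacher_levels, peer_self_levels=None):
    """Teacher total counts fully (control) or for half (experimental).

    ASSUMPTION: the peer/self half is the plain mean of the peer and
    self totals; the aggregation rule is not given in the text.
    """
    teacher = total_score(teacher_levels)
    if not peer_self_levels:
        return float(teacher)
    others = [total_score(lv) for lv in peer_self_levels]
    return 0.5 * teacher + 0.5 * (sum(others) / len(others))


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equally long lists of ratings."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty lists of equal length")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)
```

Under these assumptions, a control group student rated at expert level on every subskill receives `combined_score([4] * 11)`, the maximum of 44 points, whereas an experimental group student's teacher total is averaged with the mean of the peer and self totals.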

Anxiety, preparedness, and motivation
Just before presenting, students answered the short questionnaire with five-point Likert scores (from 0 = totally disagree to 4 = totally agree) as an additional control for potential differences in anxiety, preparedness and motivation, since these factors in particular might influence oral presentation performance (Reddy & Andrade, 2010). Notwithstanding this, teachers were the major source to control for similarity of conditions with respect to dealing with presentation anxiety, preparedness and motivation. The two items for anxiety were: "I find it exciting to give a presentation" and "I find it difficult to give a presentation", a subscale that appeared to have a satisfactory internal reliability with a Cronbach's Alpha = 0.90. The three items for preparedness were: "I am well prepared to give my presentation", "I have often rehearsed my presentation", and "I think I've rehearsed my presentation enough", a subscale that appeared to have a satisfactory Cronbach's Alpha = 0.75. The item for motivation was: "I am motivated to give my presentation". Unfortunately, the online questionnaire was not administered within the control group, due to unforeseen circumstances.
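For readers unfamiliar with the internal reliability measure reported above, the following is a minimal sketch of Cronbach's Alpha in plain Python. The function name and the example scores are hypothetical and are not the study's data; the formula is the standard one based on item variances and the variance of the summed scale.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a subscale.

    item_scores holds one list per item, each with one score per
    respondent, in the same respondent order.
    """
    k = len(item_scores)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    n = len(item_scores[0])
    if n < 2 or any(len(item) != n for item in item_scores):
        raise ValueError("every item needs the same number (>= 2) of scores")
    # Total scale score per respondent.
    totals = [sum(scores) for scores in zip(*item_scores)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("alpha is undefined when total scores do not vary")
    item_var_sum = sum(variance(item) for item in item_scores)
    return (k / (k - 1)) * (1 - item_var_sum / total_var)


# Hypothetical answers of four respondents to a two-item subscale (0 to 4).
example_items = [[1, 2, 3, 4], [2, 1, 4, 3]]
alpha = cronbach_alpha(example_items)
```

For the hypothetical answers above, the two items largely move together across respondents, so `alpha` evaluates to 0.75.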

Interviews
Semi-structured interviews with teachers (six) and students (thirty) were meant to gather qualitative data on the practical usability and usefulness of the Viewbrics tool. Examples of questions are: "Have you encountered any difficulties in using the Viewbrics online tool? If any, could you please mention which one(s)" (both students of experimental groups and teachers); "Did the feedback help you to improve your presentation skills? If not, what feedback do you need to improve your presentation skills?" (just students); "How do you evaluate the usefulness of formative assessment?" (both students and teachers); "Would you like to organize things differently in applying formative assessment as during this study? If so, what would you like to organize different?" (just teachers); "How much time did you spend on providing feedback? Did you need more or less time than before?" (just teachers).
Interviews with teachers and students revealed that the reported rubrics approach was easy to use and useful within the formative assessment method. Project teachers could easily stick to the lesson plans as agreed upon in advance. However, project teachers regarded the classroom scenarios as relatively time-consuming. They expected that for some other schools it might be challenging to follow the Viewbrics approach. None of the project teachers had to consult the hotline during the study, and no deviations from the lesson plans were observed by the researchers.

Results
The most important results on the performance measures and the questionnaire are presented below and compared between conditions.

Oral presentation performance
A mixed ANOVA, with oral presentation performance as within-subjects factor (two scores) and rubric format as between-subjects factor (three conditions), revealed an overall significant improvement of oral presentation performance over time, with F(1, 157) = 58.13, p < 0.01, ηp² = 0.27. Significant differences over time were also found between conditions, with F(2, 156) = 17.38, p < 0.01, ηp² = 0.18. Tests of between-subjects effects showed significant differences between conditions, with F(2, 156) = 118.97, p < 0.01, ηp² = 0.59, with both experimental groups outperforming the control group as expected (so we could accept H1). However, only control group students showed significant progress on performance scores over time (at the 0.01 level). Contrary to expectations, no significant differences between the experimental groups were found at either measurement (so we had to reject H2). For descriptives of group averages (over time) see Table 2.
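The partial eta squared effect sizes reported above are related to the F statistics and their degrees of freedom through a standard identity, F × df_effect / (F × df_effect + df_error). The sketch below, with a hypothetical function name, shows this relation; it is offered as a reading aid and not as the procedure the authors used to obtain their values.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its df.

    Uses the identity eta_p^2 = F * df_effect / (F * df_effect + df_error),
    which follows from F = (SS_effect / df_effect) / (SS_error / df_error).
    """
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and df must be positive")
    numerator = f_value * df_effect
    return numerator / (numerator + df_error)


# Reported effect of time: F(1, 157) = 58.13
time_effect = round(partial_eta_squared(58.13, 1, 157), 2)       # 0.27
# Reported time-by-condition effect: F(2, 156) = 17.38
interaction = round(partial_eta_squared(17.38, 2, 156), 2)       # 0.18
```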
A post-hoc analysis, using multiple pairwise comparisons with Bonferroni correction, confirmed that the experimental groups significantly (at the p < 0.01 level) outperformed the control group at both moments in time, and that the experimental groups did not differ significantly from each other at either measurement. Regarding performance progress over time, only the control group showed significant growth (again with p < 0.01). The difference between experimental groups in favour of video-enhanced rubrics did 'touch upon' significance (p = 0.053), but formally H2 had to be rejected. This finding is nevertheless a promising trend to be further explored with larger numbers of participants.
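As an illustration of the Bonferroni correction used in such post-hoc comparisons, the sketch below enumerates the pairwise comparisons between three conditions and multiplies each raw p-value by the number of comparisons (capped at 1). The function names and the raw p-values are hypothetical and are not taken from the study; whether the reported p-values are adjusted or unadjusted is not stated in the text.

```python
from itertools import combinations

CONDITIONS = ["video-enhanced", "text-based", "control"]


def pairwise_labels(conditions):
    """All unordered pairs of conditions, as readable labels."""
    return [f"{a} vs {b}" for a, b in combinations(conditions, 2)]


def bonferroni(p_values):
    """Bonferroni-adjusted p-values: p * number of tests, capped at 1."""
    m = len(p_values)
    return {label: min(1.0, p * m) for label, p in p_values.items()}


labels = pairwise_labels(CONDITIONS)   # three comparisons for three groups

# HYPOTHETICAL raw p-values, for illustration only.
raw = dict(zip(labels, [0.20, 0.001, 0.002]))
adjusted = bonferroni(raw)
significant = {label: p < 0.01 for label, p in adjusted.items()}
```

With three comparisons, a raw p-value has to fall below roughly 0.0033 to remain below the 0.01 level after correction, which is why the adjustment makes marginal differences harder to confirm.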

Anxiety, preparedness, and motivation
Independent-samples t-tests comparing participants in both experimental groups before their first presentation on anxiety, preparedness and motivation showed no significant differences, with t(98) = 1.32 and p = 0.19 for anxiety, t(98) = −0.14 and p = 0.89 for preparedness, and t(98) = −1.24 and p = 0.22 for motivation (see Table 3 for group averages). As mentioned in the previous section (interviews with teachers), teachers assessed that presentation anxiety, preparedness and motivation in the control group were no different from those in both experimental groups. It can therefore be assumed that all groups were similar regarding presentation anxiety, preparedness and motivation before presenting, and that these factors did not confound oral presentation results. Questionnaire data are missing for 58 respondents: one in the video-enhanced group, seven in the text-based group, and fifty in the control group.
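The group comparison above can be sketched as a Student's t-test for two independent samples. The code below is a minimal illustration with a hypothetical function name and made-up scores; it assumes equal variances in both groups (a pooled variance), which the text does not state, and it returns only the t statistic and degrees of freedom, leaving the p-value lookup to a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's t statistic and df for two independent samples.

    ASSUMPTION: equal population variances (pooled variance estimate).
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    if pooled_var == 0:
        raise ValueError("t is undefined when neither group varies")
    standard_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / standard_error, df


# HYPOTHETICAL anxiety scores (0 to 4 scale) for two small groups.
t_value, df = independent_t([1, 2, 3], [2, 3, 4])
```

The degrees of freedom equal the number of respondents in both groups minus two, which is consistent with 98 degrees of freedom if 100 of the 158 participants returned the questionnaire.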

Conclusions and discussion
The first purpose was to study whether applying evidence-informed design guidelines in the development of formative assessment with analytic rubrics supports the oral presentation performance of first-year secondary school students in the Netherlands. Students who used such validated rubrics indeed outperformed students using common rubrics (so H1 could be accepted). This study has demonstrated that the design guidelines can also be effectively applied and used in secondary education, which suggests that they are more generically applicable. The second purpose was to study whether video-enhanced rubrics would be more beneficial to oral presentation skills acquisition than text-based rubrics, but we did not find significant differences here (so H2 had to be rejected). However, post-hoc analysis shows that the growth in performance scores over time indeed seems higher when using video-enhanced rubrics, a promising difference that is 'only marginally' significant. Preliminary qualitative findings from the interviews indicate that the Viewbrics tool can be easily integrated into classroom instruction and appears usable for the target audiences (both teachers and students), although teachers state that it is rather time-consuming to conform to all guidelines.
All students had prior experience with oral presentations (from primary school) and relatively high oral presentation scores at the start of the study, so there remained limited room for improvement between their first and second oral presentation. Participants in the control group, however, scored relatively low on their first presentation and thus had more room for improvement during the study. In addition, the somewhat more difficult content of the second project (Guideline 11) might have slightly reduced the quality of the second oral presentation. Also, more intensive training and additional presentations with their assessments might have demonstrated more added value of the analytic rubrics. Learning might also have occurred without showing up in performance scores, since adequate mental models of skills are not automatically applied during performance (Ackermans et al., 2019b). A first limitation (and at the same time a strength) of this study was its contextualization within a specific subject domain and educational sector over a longer period of time, which implies that we cannot completely exclude some influence of confounding factors. A second limitation is that the Viewbrics tool has been specifically designed for formative assessment and is not meant for summative assessment purposes. Although our study revealed the inter-rater reliability of our rubrics to be satisfactory (see "Measures and instruments" section), this reliability is likely to be lower than that of more traditional summative assessment methods, which makes the rubrics less suitable for summative use (Jonsson & Svingby, 2007). Third, a reliable rubric by itself provides no evidence for content validity (representativeness, fidelity of scoring structure to the construct domain) or for generalizability to other domains and educational sectors (Jonsson & Svingby, 2007). Fourth, one might criticize the practice-based research design of our study, as it is less controlled than laboratory studies.
We acknowledge that applying more unobtrusive and objective measures, in order to better understand the complex relationship between instructional characteristics, student characteristics, and cognitive learning processes and strategies, could best be achieved through a combination of more laboratory research and more practice-based research. Notwithstanding these issues, we have deliberately chosen design-based research and evidence-informed findings from educational practice.
Future research could examine the Viewbrics approach to formative assessment for oral presentation skills in different contexts (other subject matters and educational sectors). The Viewbrics tool could be extended with functions for self-assessment (e.g., recording and replaying one's own presentations), for coping with speech anxiety (Leary & Kowalski, 1995), and for goal-setting (De Grez et al., 2009a). As this is a first study on video-enhanced rubrics, and as developing video-enhanced rubrics is more costly than developing text-based rubrics, more fine-grained and fundamental research into their beneficial effects on cognitive processes is needed, also to justify the additional development costs. Another line of research might be directed at developing multiple measures for objectively determining oral presentation competence, for example using sensor-based data gathering and algorithms for guidance and meaningful interpretation (Schneider et al., 2017), or direct measures of cortisol levels for speaking anxiety (Bartholomay & Houlihan, 2016;Merz & Wolf, 2015). Other instructional strategies might also be considered; for example, repeated practice of the same oral presentation might result in performance improvement, as has been suggested by Ritchie (2016). This would also make it possible to reduce the importance of presentation content and to put more focus on presentation delivery. Finding good instructional technologies to support the acquisition of complex oral presentation skills will remain important throughout the twenty-first century and beyond.