Introduction

Personal initiative in learning is an important concept that has long been emphasized by educational leaders. A former US Secretary of Health, Education, and Welfare stated that “the ultimate goal of the education system is to shift to the individual the burden of pursuing his own education” [1]. This concept is now popularly known as self-regulated learning (SRL) [2]. Learners who can self-regulate their learning approach educational tasks with confidence, deliberation, and resourcefulness; most importantly, their self-assessment (SA) ability is well developed [3].

In dental education, SA has received increasing attention, and some dental schools have set it as a core competence to be fostered in trainees [4]. Some dental schools teach trainees to self-assess using objective criteria so that their self-assessments align more closely with teachers’ assessments (TA) [5]. Nevertheless, the debate over whether SA is a stable characteristic or a learnable skill remains unresolved. In medical education, a longitudinal study detected no significant change in SA over the course of three years [6], and studies in dental education replicated this finding [7, 8]. In contrast, some researchers have reported positive findings in developing self-assessment skills in dental education [9]. According to a systematic review, many SA studies in dental education reported neither the use of a structured assessment form nor detailed assessment criteria [10]. The review also highlighted that a number of studies were limited to a single assessment encounter and provided no information on how students were trained to self-assess [10]. Another shortcoming of SA studies was the failure to explore trainees’ attitudes [10], which could provide insight into the educational impact of self-assessment, its perceived value, and, therefore, students’ motivation to improve this particular skill.

A pilot study published in 2022 investigated whether the gap between trainees’ SA and TA in clinical operative dentistry could be bridged by providing trainees with sufficient SA training [11]. That study found a decrease in the gap between SA and TA after four assessment encounters. Nevertheless, it was limited by its sample size and the lack of inter-rater and intra-rater reliability statistics. The current study extends the pilot study and aims to evaluate the effectiveness of a modified workplace assessment method designed to improve trainees’ self-assessment of clinical performance in operative dentistry.

Methods

Ethical approval was obtained from the ethics committee at Damascus University Faculty of Dental Medicine on January 15, 2022 (no. 98,735).

Study design

This quasi-experimental study was conducted at Damascus University Faculty of Dental Medicine during the second semester of the academic year 2021/2022, which began in March 2022 and ended in June 2022. Participation was voluntary and confidential, and the reported data do not compromise this confidentiality. Participants provided their consent to participate via the recruiting survey. The assessment method used in this study was piloted in 2021 [11].

Participants and settings

The study was conducted on fifth-year (final-year) dental students during the clinical operative dentistry training module, in which they treated patients in authentic work settings. The required total sample size was calculated using G*Power 3.1 [12] based on the findings of the pilot study [11], assuming an effect size of 0.64, a power of 90%, and an alpha of 0.05; the calculated sample size was 28. Participants were recruited via an online survey, in which they were also asked to self-assess their overall performance in clinical operative dentistry on a 4-point scale similar to that used in the study.
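For illustration, this a priori calculation can be reproduced outside G*Power; the following is a minimal sketch in Python, assuming the statsmodels library (the study itself used G*Power 3.1).

```python
# Minimal sketch reproducing the a priori sample-size calculation above
# (paired t-test, effect size d = 0.64, power = 90%, alpha = 0.05).
# Assumes the statsmodels library; the study itself used G*Power 3.1.
from statsmodels.stats.power import TTestPower

n = TTestPower().solve_power(effect_size=0.64, alpha=0.05,
                             power=0.90, alternative="two-sided")
print(f"Required sample size: {n:.1f}")  # ≈ 27.5, rounded up to 28
```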

To ensure fairness between participants and non-participants, it was explicitly stated that taking part in the study would not affect participants’ grades in the module, which would be assigned according to the Faculty’s traditional assessment criteria, just as for their non-participating peers.

Assessment method

Participants were assessed using the DOPS (Direct Observation of Procedural Skills) method and were instructed to perform self DOPS. Two modifications were made relative to the pilot study [11]: first, the grading scale was reduced from a 5-point to a 4-point scale (1 = clear fail, 2 = borderline fail, 3 = borderline pass, 4 = clear pass) because participants rarely attained the excellent point; second, two assessment criteria, tooth preparation and restoration, were subdivided into 4 and 7 sub-criteria respectively (supplementary file I, where an English version is also available). The DOPS forms for clinical teachers and participants were identical except for an additional item, coded III (supplementary file I), on the clinical supervisor’s form, which was designed to assess participants’ ability to pinpoint areas of improvement and excellence. All forms designed and used in the study were in Arabic to overcome the language barrier. A grading rubric was also shared with both participants and supervisors; the rubric provided a detailed description of each intersection between a scale point and a criterion (supplementary file II). All documents went through a double-translation process to ensure linguistic accuracy.

Participants underwent five DOPS assessment encounters, one week apart, in which they assessed their own performance while simultaneously being assessed by a clinical supervisor. Participants independently completed the form immediately after finishing their procedure (retrospectively), whereas supervisors completed the form in real time as participants performed the procedure. Participants did not receive feedback from supervisors until they had completed the self-assessment form.

There were three calibrated clinical assessors, and each student was assessed at least once by each supervisor. This was done to increase inter-rater reliability [13] and to mitigate possible bias resulting from participants becoming calibrated to a particular supervisor.

Five assessment encounters were set as the minimum required per participant; a reliability study of DOPS found that five encounters achieved a generalizability coefficient of 0.87 (95% CI: 0.59) [14] even though the assessors were not trained. The pilot study [11] conducted four encounters and obtained significant findings; hence, five encounters were deemed sufficient, especially since the assessors in the present study were calibrated. Further assessment encounters were impractical owing to pragmatic constraints and limited human resources. Data from participants who completed fewer than five encounters were excluded from the analysis.

Calibration of clinical supervisors

Three qualified general practitioner (GP) dentists were chosen as clinical supervisors. They had experience in conducting DOPS assessments, having served as assessors in the pilot study. Moreover, the grading rubric was discussed in detail among the three supervisors to ensure that all agreed on the meaning of each criterion.

Prior to commencing the study, the inter-rater reliability of the three clinical supervisors was evaluated on three DOPS occasions in which they assessed two participants conducting three different classical operative procedures on real patients; one participant performed one procedure and the other performed two. This procedure assessed inter-rater reliability across different cases.

For intra-rater reliability, a simulated trainee-scenario paper-based exam was designed. The rationale for using a simulated scenario is that it can be repeated identically, whereas a participant’s clinical performance may vary across occasions. Supervisors evaluated the simulated trainee scenario using the DOPS form twice, with a one-week interval, after which the intra-rater reliability was calculated for each supervisor.

During the actual DOPS encounters, the clinical supervisors did not intervene unless a participant was entirely incapable of completing a step of the procedure; in that case only, the supervisor assigned the participant the lowest grade on that skill before taking over and completing the step.

Self-assessment training

A detailed description of the self DOPS assessment protocol was sent to participants along with the DOPS form and grading rubric; all documents sent were in Arabic. An instructional video explaining the assessment process and criteria was filmed and sent to participants; it demonstrated correct application of the grading rubric and assessment form on scenario cases covering clear pass, clear fail, and borderline fail/pass performances. Before the clinical assessment encounters, and to ensure that participants had read and understood the assessment instructions, an electronic quiz presented participants with a virtual case to assess using the DOPS form and grading rubric. After submitting their answers, participants could compare their evaluations with the actual evaluations assigned by the clinical supervisors. Furthermore, the grading criteria, assessment approach, and basic assessment skills were explained to participants face-to-face before the study commenced.

Before each self DOPS encounter, clinical supervisors held a 5-minute feedforward session with participants to discuss the self-assessment pitfalls observed in the previous session [15]. After each self DOPS, a 5-minute feedback session took place in which clinical supervisors compared their scores with the self-assigned ones and discussed the differences in scoring with participants, so that the reasons and/or the specific observations that merited a certain score were clear to participants. Furthermore, areas of improvement and excellence were highlighted, and an action plan to improve clinical performance was agreed upon. This model of starting a feedback discussion with self-assessment has been widely supported [16,17,18] and is based on multiple pedagogical theories suggesting that feedback providers should engage in a dialogue with feedback receivers, with self-assessment serving as a good conversation starter [19]. This sequence is especially useful when providing negative feedback, as it is easier for faculty to ask participants what they did wrong and then elaborate than to criticize their performance directly; the former approach is less pejorative [20]. Moreover, using a standard assessment form for both participants and supervisors helps keep the content of the feedback provided by both sides closely related, an aspect by which previous studies were limited [20].

Quantifying self-assessment accuracy

Three main methods were used to quantify self-assessment accuracy. First, the mean arithmetic difference between SA and TA scores indicated how much participants overestimated or underestimated their performance on each item. Second, the sum of deviations (absolute differences) between SA and TA scores across the 22 items indicated how far SA was from TA; the deviation was also calculated at the domain level. These two measures were inspired by a previous study [6].
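As a concrete illustration of these first two measures, the sketch below computes the per-item mean difference (SA − TA) and each participant’s summed absolute deviation for one encounter; the file and column names are hypothetical, not the study’s actual data files.

```python
# Hypothetical sketch of the first two accuracy measures; the column
# names (sa_1..sa_22, ta_1..ta_22) and file name are assumptions.
import pandas as pd

# One row per participant; items scored 1-4 on the 22 DOPS criteria.
df = pd.read_csv("encounter_scores.csv")  # hypothetical file

items = range(1, 23)

# Measure 1: mean arithmetic difference (SA - TA) per item;
# positive values indicate overestimation, negative underestimation.
mean_diff = {i: (df[f"sa_{i}"] - df[f"ta_{i}"]).mean() for i in items}

# Measure 2: summed absolute deviation across the 22 items,
# one value per participant (lower = more accurate SA).
df["total_deviation"] = sum((df[f"sa_{i}"] - df[f"ta_{i}"]).abs()
                            for i in items)
print(mean_diff[22], df["total_deviation"].mean())
```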

The third method of quantifying self-assessment accuracy is novel to our study and was illustrated in the pilot study [11]. Briefly, on the back page of the DOPS form (supplementary file I), participants and supervisors were asked to independently identify three areas of improvement and three areas of excellence. The clinical supervisors then compared their points with those of the participants and recorded how many of the participants’ points matched theirs on a 4-point scale (0 = no matching points, 1 = one matching point, 2 = two matching points, 3 = three matching points). This variable assessed participants’ ability to identify the most serious areas of improvement and the most prominent areas of excellence in their own performance, as perceived by the more experienced and calibrated clinical supervisors.
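A simplified sketch of this matching-point score follows; note that in the study the matching was judged qualitatively by the supervisor, so the exact set intersection here, and the example area names, are illustrative assumptions.

```python
# Illustrative sketch of the matching-point score (0-3 per encounter).
# In the study, supervisors judged matches qualitatively; exact set
# intersection and the area names below are simplifying assumptions.
def matching_points(trainee_areas: set[str], supervisor_areas: set[str]) -> int:
    """Count how many of the trainee's three points match the supervisor's."""
    return len(trainee_areas & supervisor_areas)

trainee = {"moisture control", "proximal contact", "occlusal anatomy"}
supervisor = {"moisture control", "proximal contact", "shade selection"}
print(matching_points(trainee, supervisor))  # -> 2 matching points
```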

To examine changes in participants’ performance, the sum of the points given by the supervisors to each participant (across the 22 items) was calculated, and the mean was used as the main indicator of participants’ performance.

Participants’ attitudes towards the assessment method

Participants’ attitudes toward the self-assessment method were assessed after each encounter using two items on the DOPS assessment form (a- and b-, supplementary file I, pg. 4).

Data analysis

Data normality was examined using skewness statistics and histograms. Minor violations of normality were disregarded because the sample size exceeded 15 (n = 32) [21]. The intraclass correlation coefficient was used to measure inter-rater and intra-rater reliability. Paired t-tests were used to measure the mean difference between SA and TA for each item across encounters. The deviation (absolute difference) between SA and TA scores was calculated for each domain at each encounter. Repeated-measures ANOVA was used to test the difference between the deviation mean scores of the five encounters. A P-value < 0.10 was considered significant, and the confidence level was set at 90% in all statistical tests, as per previous recommendations [22]; the significance threshold was relaxed in view of the human factor and the limited number of assessment encounters. The analysis did not focus merely on hypothesis testing but also on estimation; therefore, effect sizes and observed power were reported whenever possible. Mixed ANOVA was used to test the effects of sex, percentage grade (PG), case complexity, and restoration type on the deviation mean scores across the five encounters. Friedman’s two-way analysis was conducted to measure the difference in the number of matching points between participants and supervisors with regard to areas of improvement and areas of excellence.
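For readers wishing to replicate the main inferential steps outside SPSS, a minimal sketch is given below, assuming the pandas, pingouin, and scipy libraries; the file layouts and column names are hypothetical.

```python
# Sketch of the main inferential steps (the study itself used SPSS v26);
# file layouts and column names below are hypothetical assumptions.
import pandas as pd
import pingouin as pg
from scipy import stats

calib = pd.read_csv("calibration.csv")    # columns: case, supervisor, score
long_df = pd.read_csv("deviations.csv")   # participant, encounter, deviation, ...
enc1 = long_df[long_df["encounter"] == 1]

# Inter-rater reliability of the three supervisors via the ICC.
icc = pg.intraclass_corr(data=calib, targets="case",
                         raters="supervisor", ratings="score")

# Paired t-test between SA and TA on one item within one encounter.
res = stats.ttest_rel(enc1["sa_item22"], enc1["ta_item22"])

# Repeated-measures ANOVA on deviation scores across the five encounters;
# effsize="np2" matches the partial eta squared reported in the paper.
rm = pg.rm_anova(data=long_df, dv="deviation", within="encounter",
                 subject="participant", effsize="np2", detailed=True)

# Friedman's test on matching-point counts across the five encounters.
chi2, p = stats.friedmanchisquare(
    *[long_df.loc[long_df["encounter"] == e, "matching_improvement"]
      for e in range(1, 6)])
```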

Data processing and analysis were conducted using Microsoft Excel (2019) [23] and IBM SPSS Statistics for Windows, version 26 (IBM Corp., Armonk, N.Y., USA). G*Power 3.1 [24] was used to calculate the sample size. Google Forms [25] was used to conduct the screening survey.

Results

Out of 182 trainees, 87 completed the recruiting survey, and 39 were selected for the study on a first-come, first-served basis. Seven trainees did not complete all five required assessment encounters and were therefore excluded from the analysis, leaving 32 participants who met the inclusion requirement. The participants’ mean age was 22.45 years (SD = 0.8), and 25% (n = 8) were male. As for their reported percentage grade (PG), 6.3% (n = 2) had a PG of 70–75%, 28.1% (n = 9) had a PG of 75–80%, 53.1% (n = 17) had a PG of 80–85%, and 12.5% (n = 4) had a PG of 85–90%. The majority of cases (82.5%) treated by participants were resin composite restorations; the total number of cases was 160 (32 participants × 5 encounters). Table 1 summarizes the cases participants treated during the five assessment encounters. The inter-rater reliability, as measured by the intraclass correlation coefficient, for the three supervisors in the first, second, and third pre-study encounters was 0.746 (P = 0.004), 0.794 (P < 0.001), and 0.790 (P < 0.001) respectively. As for intra-rater reliability, the intraclass correlation coefficients for the three supervisors were 0.982 (P < 0.001), 0.713 (P = 0.002), and 0.843 (P < 0.001).

Table 1 Summary of assigned supervisor, case complexity, restoration material, and restoration class of participants’ cases across the five encounters in total (n = 160)

The changes over time in the mean differences (MDs) between SA and TA scores (SA − TA) for each assessment criterion are shown in Table 2. Most MDs were positive, indicating that SA scores were higher than TA scores. Across encounters, the MDs of most items decreased over the five encounters. The number of items with a statistically significant difference (P < 0.10) between TA and SA was 17, 10, 11, 7, and 10 in the first, second, third, fourth, and fifth encounters respectively. It is also important to note that P-values increased across encounters, from mostly < 0.001 to higher values, as shown in Table 2. For items no. 18 and 20, the MDs were consistently negative, indicating that TA scores were higher than SA scores. For items no. 11, 14, and 15, MDs were initially positive but turned negative in the 4th and 5th encounters. MDs between SA and TA remained statistically significant across the five encounters for items no. 1, 2, 3, 6, and 7; nevertheless, the P-value decreased, as shown in Table 2. The MD of the overall performance assessment (item no. 22) dropped from 0.77 ± 0.7 in the first encounter, with a significant difference (P < 0.001), to 0.29 ± 1.0 in the 5th encounter, with no significant difference between SA and TA (P > 0.10).

Table 2 Paired t-test results showing the mean difference between self-assigned scores and those of clinical assessors for each assessment criterion in each encounter

The deviations (absolute differences) between SA and TA in each domain across the five encounters are presented in Table 3. Deviation in all domains except tooth preparation decreased in the last two encounters compared with the first two. Deviation in the overall performance assessment (item no. 22) also decreased slightly, and the total deviation across all 22 items dropped consistently across encounters (Table 3). A repeated-measures ANOVA, with sphericity assumed, was conducted to assess the difference between the deviation scores of the five encounters; the results are shown in Table 3. The aggregated total deviation mean differed significantly across the five encounters, P = 0.064, with a medium effect size and an observed power of 65.1%. Polynomial contrasts indicated a statistically significant linear trend in the changes of the aggregated total deviation across encounters, F(1, 31) = 5.659, P = 0.024, with a large effect size (partial eta² = 0.154) and an observed power of 0.752. There was a significant difference in the deviation score of the tooth restoration domain, P = 0.068, partial eta² = 0.068. The largest P-value and the smallest effect size among the assessment domains were observed in the tooth preparation domain, P = 0.874, partial eta² = 0.010.

In terms of changes in TA across the five encounters, the sum of points for all 22 items given by supervisors fluctuated between encounters with a rising tendency (Table 3). A repeated-measures ANOVA conducted to assess whether TA differed between the five assessment encounters indicated that teacher-rated performance differed significantly across encounters (F = 3.986, P = 0.011, partial eta² = 0.363). Bonferroni multiple comparisons detected a significant difference (P = 0.005) between TA in the first and fourth encounters.

The sum of SA points for the 22 items is also shown in Table 3. SA mean scores fluctuated slightly with a decreasing tendency. Repeated-measures ANOVA did not indicate a significant difference between the SA scores of the five encounters, P = 0.539, with a small effect size (partial eta² = 0.025) and an observed power of 0.361. At the level of individual assessment criteria, SA mean scores generally declined after each encounter, whereas TA scores increased (supplementary file III).

Table 3 Absolute differences between self-assigned scores and those of clinical assessors in each assessment domain, with repeated-measures ANOVA statistics

Mixed ANOVA showed no statistically significant effect of sex, age, or percentage grade on the aggregated total deviation score, P > 0.10 (Table 4). Similarly, one-way ANOVA showed no significant effect (P > 0.10) of case complexity, restoration material, or restoration type on the aggregated total deviation mean score, with small effect sizes. Only age had a large effect size (partial eta² = 0.202), but its effect was not statistically significant (P = 0.194) (Table 4).

Table 4 Mixed ANOVA statistics showing the effect of sex, age, percentage grade and case-related factors on the aggregated deviation score across each of the five assessment encounters

Figure 1 shows the sum of matching points between participants and supervisors for areas of improvement and areas of excellence in each of the five encounters. The number of matching points for areas of improvement increased significantly according to Friedman’s two-way analysis (P = 0.044), from only 20 matching points in the first encounter to 33 in the fifth. Similarly, the number of matching points for areas of excellence also increased consistently and significantly (P = 0.042) across the five encounters.

Fig. 1 Changes in the sum of matching points between participants and assessors in the areas of improvement and areas of excellence across the five encounters

SA and TA of the overall performance assessment (item no. 22) in the screening survey and in each of the five encounters are shown in Fig. 2. The SA of overall performance in the screening survey and in the first DOPS encounter were quite similar, and a paired-sample t-test showed no significant difference (MD = 0.06, SD = 0.56, 90% CI: −0.10 to 0.23, P = 0.536). In the following encounters, self-assessment began to track TA, increasing when it increased and decreasing when it decreased.

Fig. 2 Mean of participants’ self-assigned scores and those of clinical assessors on the overall performance assessment item (no. 22) in the screening survey (encounter 0) and in the five DOPS encounters

Participants’ attitudes towards the assessment method

After each DOPS encounter, participants indicated their agreement or disagreement on a 4-point scale with two statements examining their attitudes (a- and b-, supplementary file I, pg. 4). The first statement was “Your experience with the current assessment method was positive”, with which all participants across all encounters agreed, except for one participant who strongly disagreed in two of the five DOPS encounters. The second statement was “You benefited from participating and evaluating yourself with the supervisor”, with which all participants agreed across all five encounters.

Discussion

This study set out to evaluate the efficacy of a modified workplace assessment method (self DOPS) designed to improve trainees’ self-assessment of clinical performance in operative dentistry. To this end, a quasi-experiment was conducted on 32 participants who engaged in self DOPS and were evaluated by trained and calibrated clinical supervisors across five encounters. The inter- and intra-rater reliability statistics indicated good reliability of these supervisors. The findings showed that SA was consistently higher than TA across encounters; nevertheless, the aggregated total deviation (absolute difference) between SA and TA decreased considerably over the course of the five DOPS encounters. The gap between SA and TA differed from one skill to another. Both SA and TA fluctuated between encounters, but in general SA decreased while TA increased. Participants’ ability to identify areas of improvement and areas of excellence improved consistently over the five encounters. Similarly, participants’ clinical performance as assessed by supervisors improved steadily over the first four encounters, yet dipped in the last encounter. Participants expressed positive attitudes towards the utility of the assessment method.

Triangulating the different measures of self-assessment, the introduced assessment method appeared effective in improving participants’ self-assessment ability in operative dental procedures, albeit with SA accuracy varying from one assessment criterion to another. In dental education, there have been several attempts to improve trainees’ ability to self-assess; some were successful [7, 9, 26,27,28], while others were not [7, 29]. Common features of successful approaches were focused training and feedback, clear objective criteria, and allowing students enough time to develop this skill [9, 26,27,28]. Repeated experience with SA in conjunction with corrective feedback might also contribute to improving SA accuracy [30]. A recent study that used digital assessment to improve SA accuracy in a dental anatomy wax-up exercise reported positive findings; it provided participants with clear evaluation criteria, a lecture on conducting SA, and practice sessions in which students compared their scores with those of instructors, along with access to the evaluation software for feedback [9]. Most of these elements resemble the self-assessment experience of participants in the current study. This comparison with previous studies sheds light on the elements of the assessment method believed to have affected SA the most: the use of clear assessment criteria, focused SA training, and repeated experience with SA accompanied by corrective TA. Moreover, unlike some assessment models that rely only on feedback, the proposed model introduced feedforward sessions in which the assessment criteria were discussed in detail with students to calibrate and guide self-assessment practice in the immediately following encounter [15]. This may have helped participants understand the assessment criteria and therefore self-assess more accurately.

According to a 2016 systematic review of self-assessment in dental education [10], the majority of the literature did not report the use of structured self-assessment training. In contrast, the current study reports in detail several structured approaches to familiarizing participants with, and training them in, self-assessment: an instructional video, an orientation session, an electronic quiz, and feedforward and feedback sessions before and after each assessment encounter. None of these approaches was difficult to implement, and all could be feasible at a larger scale. As for faculty calibration, very few studies have reported the extent of and approach to faculty calibration [10], and, to our knowledge, few studies have reported inter-rater or intra-rater reliability [8, 10, 31]. This adds to the significance of the current study, which reported both inter-rater and intra-rater reliability along with the calibration method.

Self-assessment has previously been described as a stable characteristic that changes little over time [6]. This may be true when no repeated correction of SA takes place, as illustrated in this study: the SA mean scores of overall performance in the screening survey (with no corrective feedback) and in the first encounter were approximately equal (Fig. 2). The aggregated SA mean scores of the five encounters were not significantly different. Nevertheless, this does not necessarily mean that SA did not change; notably, SA scores did not rise even though teachers gave students better scores after each encounter, but rather decreased overall. A significant difference in SA might have been detected had more assessment encounters been conducted. Further evidence of improvement in participants’ SA ability is present in this study, namely: (1) the reduced gap between SA and TA encounter after encounter, and (2) the improvement in participants’ ability to pinpoint areas of improvement and areas of excellence.

Participants overestimated their performance on the majority of assessment criteria, which is consistent with other studies’ findings [10, 11]. This might reflect a lack of experience in performing operative dental procedures [7, 32], or it might indicate that participants were injecting their emotions, consciously or subconsciously, into the assessment process as a defense mechanism.

In this study, SA of technical skills (tooth preparation and tooth restoration) was closer to TA than SA of non-technical skills (knowledge and professionalism). This is consistent with previous findings in surgical and dental training [11, 33]. Judging one’s performance is relatively straightforward when it produces simple objective outcomes, such as avoiding excess or sub-margination (item no. 17) or achieving optimal tooth anatomy (items no. 19 and 20) [3]. In contrast, assessing covert interpersonal skills such as professionalism and patient management has varying subjective outcomes, making the assessment process more difficult.

Determining the relationship between conducting self-assessment and performance levels is beyond the scope of this study. Nevertheless, participants’ scores as given by teachers improved consistently over the first four encounters and dropped slightly in the fifth. The dip in the fifth encounter might have occurred because it was the last session in which participants could finish their assigned clinical cases before the end of the term. In a former study in a removable prosthodontics lab, self-assessment and reflection were found beneficial in improving future performance [34]. Zimmerman’s SRL cyclical model shows how self-assessment and performance are mutually dependent; for instance, self-assessment can help an individual identify learning goals and increase motivation, which in turn can affect performance [3, 35]. It is only logical for self-assessment to change according to performance: the skills needed to perform the metacognitive task of SA are the very skills necessary to perform well in a given procedure [36]. Similarly, performance improvement can also affect SA, in what is popularly known as the Dunning–Kruger effect [37]. This could explain why our novice participants’ inflated SA scores at the first encounter declined as performance improved. A model illustrating the factors leading to a reduced gap between SA and TA is shown in Fig. 3.

Fig. 3 A model highlighting the factors leading to the reduced gap between TA and SA in the current study

The complexity of the clinical environment makes it difficult to determine whether the introduced assessment method or some other variable affected SA accuracy. For example, the roles of the patient and the dental assistant were not taken into consideration in the current study. Task familiarity is another factor that can affect participants’ ability to self-assess; this limits the transferability of our findings to other procedures. Although many variables, such as time, case difficulty, and procedure, were taken into consideration, this study’s results should be interpreted with an understanding of the complex nature of the clinical environment.

Implementing DOPS as proposed in this study’s protocol should be about as feasible as regular DOPS without self-assessment forms. The long time the assessor must spend with the assessee is a feasibility issue that has been reported in the previous literature and was also faced in this study [14]. This issue is especially problematic where staff shortages and large student numbers coincide, as is the case at the Damascus University dental school [38, 39]. Nonetheless, adding the self-assessment component to DOPS did not increase the usual amount of time taken for feedback, despite the focus on fostering self-assessment. In fact, it made the feedback session more focused and efficient, as the teacher could skip the points on which the trainee’s self-assessment was confirming and concentrate primarily on aspects where it was disconfirming. The proposed DOPS model should be as pedagogically effective as regular DOPS [40], and it could be more student-centered thanks to its self-assessment component, which might increase the acceptability of DOPS among students. To allow the implementation of this self-assessment-focused DOPS at a larger scale in dental schools such as Damascus University Faculty of Dental Medicine, it is suggested that the university, first, reform its assessment policy to allow the adoption of, and experimentation with, assessment-for-learning methods; second, increase funding to train the clinical staff; and third, modify its admission policy taking staff capacity into consideration [38, 39].

Despite its limitations, the current study adds to the body of literature concerned with improving the SA accuracy of dental trainees. Few studies have tried to improve SA in authentic clinical work settings [10]; this makes the current study an important addition, especially given its findings on the improvement in participants’ SA ability. It is well established that students have poor self-assessment abilities [41, 42], and the evidence does not suggest that this skill improves over time without intervention [6, 43]. This calls for the exploration of novel structured methods to teach students how to accurately self-assess their performance in the hectic clinical environment [10]. It would be of interest for future research to explore how this self DOPS affects students’ SRL processes.

Conclusion

The findings of this study suggest that the introduced assessment method was effective in fostering participants’ ability to self-assess and identify areas of improvement in the clinical environment. A natural progression of this work is to evaluate the effectiveness of the introduced assessment method in developing participants’ self-assessment of performance in a wider range of clinical procedures.