Introduction

Health sciences education has long focused on the science of teaching, but in recent years we have seen a shift toward the science of learning [1, 2]. This study focused on undergraduates’ workplace learning in health sciences education [3,4,5] from an educational psychology perspective. Psychological theories, especially the theory of self-regulated learning (SRL), focus on the individual learner and view learning as a process in which cognitive, motivational, emotional, and contextual aspects are considered [6,7,8]. To gain a better understanding of undergraduates’ learning processes at the workplace, multivariate and longitudinal studies are needed. Such studies require fewer items per construct, which helps avoid survey fatigue and increases applicability in workplace settings, especially stressful ones.

We assessed the psychometric properties of single-item measures of constructs related to self-regulated learning at the workplace and this paper discusses the items’ role in health sciences education research. We selected some single items from the Workplace Learning Inventory in Health Sciences Education (WLI) [9] scales and specifically developed others using more general wording.

Self-regulated learning in the workplace

The study was based on SRL research [10, 11]. In health sciences education, the most prominent SRL theory is a process-based model, namely Zimmerman’s cyclical phases model, which differentiates between the forethought, performance, and reflection phases [12, 13]. Besides process-based models, there are also component-based models that integrate different areas (e.g., Pintrich’s conceptual framework for assessing motivation and SRL) [14] and levels (e.g., Boekaerts’ six-component model of SRL) [15]. The present study adopted Steinberg et al.’s [9] component-based model as its conceptual framework. The model integrates four areas, namely cognition, motivation, emotion, and context, at two levels, namely the learning process level and the metalevel, resulting in a total of eight components (see Fig. 1).

Fig. 1 The four areas of workplace SRL at the learning process level (inner components) and the metalevel (outer components), based on Steinberg et al. [9]

Cognition refers to learning strategies focused on workplace learning [16]. Motivation means instigating and sustaining goal-directed activity [17]. Emotions are defined within the broader concept of affect but are distinguished from other affective phenomena, such as moods, in that emotions are more intense, have a clearer object focus and a more salient cause, and are typically experienced for a shorter duration [18,19,20]. Context means undergraduate medical students’ perceptions of multiple dimensions of the educational environment in the clinical practice setting [21]. The metalevel of cognition, motivation, emotion, and context means regulating those respective aspects of the learning process [14, 22,23,24,25]. For more details on the model, we refer to Steinberg et al. [9].

Steinberg et al. identified aspects relevant to the eight components of undergraduates’ workplace learning and developed corresponding scales, resulting in the WLI [9], which provides 31 scales, each comprising three to six items. Researchers investigating workplace learning can select the scales that are relevant to their research questions. Table 1 lists the constructs with their corresponding definitions.

Van Houten-Schat et al. [13] and Roth et al. [26] reviewed SRL research and respectively identified the need to investigate SRL sub-processes in the workplace to gain a better understanding of the interplay of the different SRL aspects, as well as the need to use more diverse methodologies in SRL research, including multivariate longitudinal and diary studies. Single-item measures of the WLI constructs could facilitate such studies.

Table 1 WLI constructs and definitions (based on Steinberg et al.) [9]

Single-item measures

We summarize the discussion on the advantages and disadvantages of using single items in scientific studies, based on overviews provided in the literature [27,28,29]. Arguments in favor of the use of single items include parsimony, which is relevant in holistic studies considering the large number of theoretical constructs, as well as in diary studies with many measurement points and in time-limited settings such as data collection in the workplace. Parsimony is also associated with increased participant motivation and cognitive involvement, resulting in fewer missing values and higher validity. Moreover, parsimony addresses researchers’ ethical commitment to participants; that is, researchers strive not to overburden participants and to avoid their confusion and frustration when answering similar items. Other arguments in favor of the use of single items are their lower ambiguity, better interpretability, higher face validity, and reduced risk of criterion contamination.

Arguments against the use of single items include their lower or unknown reliability, their inability to adequately capture complex psychological constructs, and the less fine-grained distinctions between individuals. Hence, single items are usually acceptable when the construct is concrete, unidimensional, clearly defined, narrow in scope, and used as a moderator or control variable or when the desired precision is low [27, 28, 30]. Fisher et al. summarized successful examples of single items used in organizational psychology [28].

If there is uncertainty about whether a construct meets the above requirements, validation tests can be performed to ensure trustworthiness [30]. The appropriate validation method depends on how the item was developed, that is, whether it was selected from an existing scale or developed anew [28]. For items selected from a scale, Gogol et al. [29] have provided the following best practice for examining the psychometric properties [27]: assessing the reliability, information reproduction, and relationships within the nomological network. For newly developed single items measuring stable characteristics (or traits), the recommendation is to assess the test–retest reliability [28]; however, this method is inappropriate for single items measuring states that are expected to change over time, as in longitudinal studies [31]. To provide evidence of the validity of newly developed single items, assessing relationships within the nomological network is recommended.

Aim

Our study aimed to examine two sets of single items appropriate for research on undergraduate health science students’ learning by analyzing their reliability, their correspondence to the full scale, and their relations with external criteria. These sets of single items could be helpful for economically conducting multivariate longitudinal and intensive longitudinal studies in health workplace settings. First, we investigated 29 single items selected from the WLI [9]. The items address four areas of workplace learning, namely cognition, motivation, emotion, and context, at two levels, namely the learning process level and the metalevel. Each of the eight components is represented by several items, with the exception of emotion on the learning process level since Duffy et al. [20] have already provided single items for that. We systematically compared the single items with their corresponding full scales with respect to the following measurement questions [29]: (1) How reliable are single-item measures? (2) How well do single-item measures reproduce the information that the full scales obtain? (3) How well do single-item measures reproduce the relationships with external criteria in the nomological network that the full scales obtain?

Second, we examined four newly developed and more generally formulated single items measuring states rather than traits [32]. The items represent cognition, motivation, emotion, and context at the learning process level. Although their reliability cannot be tested, we examined the items’ validity with respect to the following measurement questions: (1) How well do single-item measures correlate with their respective full WLI scales? (2) How well do single-item measures relate to external criteria within the nomological network?

Methods

Participants

The outcomes should represent a diverse population of undergraduate health sciences students with respect to cognition, motivation, emotion, and learning environment. Consequently, we made a deliberate effort to include the majority of a pertinent student cohort from a single institution rather than distributing a questionnaire to students at different institutions, which could have led to a biased sample predominantly comprising highly motivated high achievers. We invited students from a second institution to participate in order to reach the predetermined target sample size of n = 200, in adherence to a rule-of-thumb guideline for the minimum sample size for confirmatory factor analysis [33].

Participants were from two higher education institutions in Austria and Germany. At Institution 1, the target group comprised 200 students enrolled in a Clinical Rotation course as part of a veterinary degree program in which students learn in a clinical practical setting over a relatively long period for the first time. Students take this course in their ninth semester and rotate among highly varied workplace settings (rotations include, e.g., anesthesia/imaging diagnostics, surgery, gynecology, internal medicine, emergency department, reproduction). Although all 200 enrollees participated in the study, 13 students did not consent to their data being used for research purposes, and 11 consenting participants had to be excluded from further analysis owing to a high proportion of missing values (> 50%); the final sample size was n = 176 at Institution 1.

At Institution 2, the target group comprised about 260 students in their practical year of a veterinary degree program. Students usually complete their practical year during the ninth and tenth semesters and familiarize themselves with various workplaces. The questionnaire was opened 91 times; of these, 38 participants completed more than 50% of the items and consented to their data being used for research purposes. Combining both samples, the total sample size was n = 214 (78% [167] female[s], 21% [45] male[s], 1% [2] diverse; age: 21–41 years; M = 24.79, SD = 2.74). There were no statistically significant differences in gender (female: 77.90% [167] and 79.37% [170]) and age (M = 24.76, SD = 2.67 and M = 24.56, SD = 3.99) between Institutions 1 and 2, respectively.
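The sample-size figures reported above can be retraced with a few lines of arithmetic. The following is a minimal consistency check, not part of the original analysis; the variable names are chosen here for illustration, and all numbers are taken from the text.

```python
# Institution 1: all 200 enrollees participated, then two exclusions applied
enrolled_inst1 = 200
no_consent = 13          # did not consent to research use of their data
excluded_missing = 11    # more than 50% missing values
n_inst1 = enrolled_inst1 - no_consent - excluded_missing

# Institution 2: participants who completed > 50% of items and consented
n_inst2 = 38

n_total = n_inst1 + n_inst2

# Gender counts reported for the combined sample
gender_counts = {"female": 167, "male": 45, "diverse": 2}
gender_pct = {k: round(100 * v / n_total) for k, v in gender_counts.items()}

print(n_inst1, n_total)
print(gender_pct)
```

The counts add up to the reported totals (n = 176 at Institution 1, n = 214 overall), and the rounded gender percentages match those given in the text.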

Measures

We tested the psychometric properties of two sets of single items. First, the project team selected 29 single items from the WLI’s 29 full scales [9]. The project team comprised one professor, three senior scientists, two clinical teachers, and two students, all working in health science education and/or educational psychology. Single items were selected based on content and factor loadings (using the data of the study at hand). We preferred items couched in broader terms and considered face validity according to the project team’s ratings as well as those of nine researchers in the field of health sciences education and/or SRL who were not part of the project team. Furthermore, we chose items with high factor loadings (see Steinberg et al. [9] for details on factor loadings). Second, we tested the psychometric properties for four generally formulated single items representing cognition, motivation, emotion, and context on the learning process level. The project team developed these items using established instruments/scales and experiences with the SRL questionnaire and diary items, as well as theoretical assumptions. Table 2 provides an overview of the items.

Table 2 Overview of single items

To validate the single items, we used the WLI, as well as measures of external criteria within the nomological network, which have also been used to validate the WLI [34,35,36,37,38,39,40,41,42]. A nomological network is a system of related constructs [43]. We excluded external criteria for emotions at the learning process level and context at the metalevel because none were available in German. Table 3 provides an overview of the measures.

Table 3 Overview of measures

Procedure

At Institution 1, the students completed the questionnaires as part of the course, as it was an exercise that supported the course’s learning goal of “reflecting on one’s own learning and practice.” Due to the large number of items, we spread data collection over a week, and students completed the questionnaires in the period December 6–10, 2021, or December 13–17, 2021, using the online survey tool unipark© [44]. Each item was answered once. Every morning during the survey period, the participants received an email with an invitation link to the questionnaire and were encouraged to complete it in the workplace.

At Institution 2, the rectorate invited all students in their practical year to participate in the study via an email with a link to the online questionnaire. Students were allowed to pause at any time and continue completing the questionnaire later within the aforementioned period using the online survey tool unipark© [44]. To improve the response rate at Institution 2, participants who completed the questionnaire were entered into a raffle to win a €50 voucher.

To avoid survey fatigue, the following steps were taken in addition to spreading the data collection over a week: Students were provided with targeted information about the study’s aims and benefits; teachers gave students time to complete the survey at their workplace; and students received individual feedback on their results, with tips and tricks for further developing their SRL skills.

Data analysis

We analyzed the first set of single items, which were selected from existing scales, according to Gogol et al.’s [29] recommendations for single-item measures in psychological science [27]. Accordingly, we assessed (1) the items’ reliability by computing the coefficient ω, which reflects the proportion of item variance accounted for by the latent construct (note that an item’s reliability is the square of its standardized factor loading; see Brown [45], p. 115); (2) the amount of reproduced information by computing the product–moment correlation between the scores obtained by the full scales and the scores for every single item, while accounting for the overlapping error variance [46]; and (3) relationships within the nomological network by computing product–moment correlations between the single-item measures and measures of external criteria within the nomological network. Similarly, we examined the second set of single items, which were generally formulated, by assessing their relationships with the full WLI scales and their relationships within the nomological network. These items’ reliability could not be assessed because they were not derived from a scale: their ω could not be calculated, and test–retest reliability is inappropriate for measures of states. Note that there is no clear cut-off separating good and poor reliability, but it has been suggested that 0.70 is an acceptable lower bound [47], with values between 0.65 and 0.70 considered minimally acceptable [48]. For correlations, r = .10 is considered small, r = .30 medium, and r = .50 large [49]. Analyses were conducted in Mplus 8.6 (Muthén and Muthén, Los Angeles, California) and R 4.3.1 (R Core Team, Vienna, Austria) [50, 51]. All analyses are based on the significance level α = 0.05.
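Two of the rules stated above lend themselves to a short illustration: an item’s reliability as the square of its standardized factor loading, and the benchmarks for interpreting correlations. The following Python sketch is illustrative only; the analyses themselves were run in Mplus and R. The function names, the example loading, and the label "negligible" for |r| < .10 are assumptions introduced here.

```python
def item_reliability(standardized_loading: float) -> float:
    """Proportion of item variance explained by the latent construct
    (the square of the standardized factor loading)."""
    if not -1.0 <= standardized_loading <= 1.0:
        raise ValueError("a standardized loading must lie in [-1, 1]")
    return standardized_loading ** 2


def correlation_size(r: float) -> str:
    """Label |r| with the benchmarks cited in the text (.10, .30, .50).
    The label 'negligible' for |r| < .10 is an assumption of this sketch."""
    magnitude = abs(r)
    if magnitude >= 0.50:
        return "large"
    if magnitude >= 0.30:
        return "medium"
    if magnitude >= 0.10:
        return "small"
    return "negligible"


# Hypothetical loading, chosen only for illustration
print(round(item_reliability(0.85), 4))
# Correlation reported for 'cognition general' with elaboration
print(correlation_size(0.22))
```

Under this rule, a standardized loading of roughly 0.84 or higher is needed for an item to exceed the 0.70 reliability bound.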

Results

Table 4 provides an overview of the detailed results. We then summarize the results for the 29 single items selected from the full WLI [9] scales, as well as those for the four generally formulated single items.

Table 4 Psychometric characteristics of the full scales (FS) and single items (SI), including reliability, correlation with the full scale (information reproduction), and correlation with external criteria (nomological network)

Single items selected from full scales

The following paragraphs summarize the results for the 29 single items selected from full scales. Twenty items showed acceptable reliability (ω > 0.70), seven showed minimally acceptable reliability (ω = 0.65 to ω = 0.70), and two showed unacceptable reliability (ω < 0.65) [48]. Of the 29 reliability values for the single items, eight differed significantly from the corresponding full-scale value but showed minimally acceptable to relatively high values (ranging from ω = 0.67 to ω = 0.83).
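The three reliability bands used above can be written as a small classification rule. This is a minimal sketch with a hypothetical function name; placing a value of exactly 0.70 in the "minimally acceptable" band follows the ranges as worded in this section and is an assumption about the boundary. The example values are illustrative and are not taken from Table 4.

```python
from collections import Counter


def classify_reliability(omega: float) -> str:
    """Assign an omega value to one of the three bands used in the text."""
    if omega > 0.70:
        return "acceptable"
    if omega >= 0.65:
        return "minimally acceptable"
    return "unacceptable"


# Illustrative omega values only (not the study's item-level results)
example_omegas = [0.83, 0.67, 0.60]
print(Counter(classify_reliability(w) for w in example_omegas))
```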

Regarding information reproduction, the single items showed low to substantial correlations (corrected for shared error variance) [34] with the corresponding full scales, with r ranging from 0.20 for reviewing to 0.79 for monitoring on the emotion metalevel. Of the 29 correlations, 27 values were below 0.70, indicating insufficient information reproduction (we considered less than 50% information reproduction to be insufficient; information reproduction expressed as a percentage is the square of the correlation values); two values were above 0.70, indicating substantial information reproduction.
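The link between the correlation threshold and the 50% criterion can be made explicit with a short calculation. The sketch below uses hypothetical function names and simply squares the correlation, as described above; the three example values are the minimum, the threshold, and the maximum mentioned in this paragraph.

```python
import math


def information_reproduced(r: float) -> float:
    """Percentage of full-scale information reproduced by a single item (100 * r^2)."""
    return 100.0 * r ** 2


def is_sufficient(r: float, threshold_pct: float = 50.0) -> bool:
    """True if the item reproduces at least threshold_pct of the information."""
    return information_reproduced(r) >= threshold_pct


for r in (0.20, 0.70, 0.79):
    print(f"r = {r:.2f}: {information_reproduced(r):.1f}% reproduced, "
          f"sufficient = {is_sufficient(r)}")

# Correlation at which exactly 50% of the information is reproduced
print(f"{math.sqrt(0.5):.3f}")
```

Strictly, 50% reproduction corresponds to r ≈ 0.707, so the cut-off of 0.70 used in the text is a rounded value; a correlation of exactly 0.70 corresponds to 49%.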

Regarding the relationships of the selected single items within the nomological network, the items showed patterns that were similar to those of the full scales in terms of their correlations with the external criteria, but the correlations were significant less often. A similar pattern was reflected in the small mean absolute differences between the correlations obtained for the full scales and single items (between −0.03 for all cognition metalevel aspects and 0.09 for all contextual aspects). The respective differences in the correlations ranged from −0.25 to 0.28, but only 7 of the 124 correlations between the single items and the external criteria differed significantly from the correlations between the corresponding full scales and these external variables. See Table 5 for an overview of the results.

Generally formulated single items

The following paragraphs summarize the results for the four general single items shown in Table 4 (cognition general, motivation general, emotion general, and context general). The correlations between the generally formulated single items (see lines “correlation with … general”) and their respective full scales (see columns “FS”) ranged between 0.16 and 0.51 for general cognition, −0.51 and 0.67 for general motivation, −0.54 and 0.66 for general emotion, and 0.21 and 0.74 for general context.

Regarding the general single items’ (see column “… general”) relationships within the nomological network (see lines below “correlation with external criteria”), the items showed low to substantial correlations with external criteria. Analyses showed significant correlations between the general cognition single item and the external criterion elaboration (r = .22), single-item general motivation and the external criterion learning goal approach, and single-item general context and the external criteria perception of teachers (r = .61) and perception of atmosphere (r = .68). See Table 5 for an overview of the results.

Table 5 Overview of results

Discussion

In this study, we analyzed the psychometric properties of single items measuring different aspects of undergraduate health sciences students’ self-regulated learning (SRL) in the workplace. First, we assessed the psychometric properties of 29 single items selected from full WLI scales [29], of which 27 items showed sufficient reliability; however, the results regarding validity were heterogeneous. Second, we assessed the psychometric properties of four generally formulated single items [28], which showed acceptable validity, although their reliability could not be assessed. Consequently, this study provides evidence to inform decision-making regarding whether to use single-item measures rather than full scales when investigating the various aspects of workplace learning.

Single items’ psychometric properties

Reliability was acceptable for most of the 29 single items selected from the WLI. The broad range of reliability results is in line with Gogol et al. [29], who found low-reliability values for different types of academic anxiety and low- to acceptable-reliability values for different types of self-concept. If higher reliability is desired, single items can be used as daily measures aggregated to weekly measures in diary studies.
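How much aggregation might raise reliability can be gauged with the Spearman–Brown prophecy formula. This formula is not named in the study and is brought in here purely as an illustration. It assumes that the daily measurements are parallel measures of the same construct, which is questionable for states that change from day to day, so the result is best read as an optimistic approximation. The function name and the example values are hypothetical.

```python
def spearman_brown(single_reliability: float, k: int) -> float:
    """Predicted reliability of the mean of k parallel measurements,
    given the reliability of a single measurement."""
    if not 0.0 <= single_reliability <= 1.0:
        raise ValueError("reliability must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return k * single_reliability / (1 + (k - 1) * single_reliability)


# Hypothetical example: an item with reliability 0.60, measured on five workdays
print(round(spearman_brown(0.60, 5), 2))
```

Under these assumptions, an item that falls below the 0.65 bound on a single day would exceed 0.70 once several daily ratings are averaged.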

Regarding the items’ validity in terms of information reproduction, 27 of the 29 selected single items showed limited validity. This result aligns with Gogol et al. [29], who also found low correlations of their anxiety single items with the corresponding full scales (but acceptable correlations of their self-concept single items with the corresponding full scales). The possible reasons for low information reproduction are manifold [52]. For example, the constructs might be too complex [27, 52], and the items may not be representative [28, 29]. That could be the case for the single item for the construct ‘reviewing’, whose information reproduction was particularly low. Additionally, the response format might not have enough categories and might, therefore, lack sufficient sensitivity [27]. We recommend cautious interpretation of such items, particularly if the items used do not represent the construct as defined above. For example, although planning, as defined in Table 1, includes both anticipation and planning, the corresponding single item only addresses anticipation, necessitating a narrow interpretation based on the single item’s wording.

The 29 single items selected from the WLI showed acceptable validity in terms of similar relationships with external criteria of the nomological network compared to the full scales. Similarly, the absolute differences between the correlations obtained for the full scales and those for the single items were small. The range of mean absolute differences is similar to Gogol et al.’s [29].

For the four newly developed single items, we analyzed how well they correlated with their corresponding WLI full scales and with the external criteria within the nomological network. In summary, the correlations were as expected. For example, the correlations of ‘cognition general’ and ‘motivation general’ with their respective full scales were significant, whereas their correlations with external criteria were not always significant. This is plausible, as the respective full scales were developed for the workplace setting while the scales that measured external criteria were developed for the classroom setting. In contrast, the correlations of ‘context general’ with both the respective full scales and the external criteria were significant, as both were developed for the workplace setting. This study’s results should be used to interpret future studies’ results derived from the newly developed single-item measures. For example, the generally formulated single item “I am motivated today” showed a high correlation with situational interest and the mastery goal approach but a low correlation with the expectancy of success. Hence, the item represents the value rather than the expectancy component of motivation.

For constructs where the single items’ psychometric properties are insufficient, we recommend the use of full scales. This applies to ‘reviewing’ and ‘control cognition’, which have low reliability, and to other constructs for which it is important to include multiple facets. However, researchers often need to balance the number of constructs measured with ethical standards to avoid overburdening participants, as well as to obtain complete and valid data. Several scenarios might justify the use of single items with limited information reproduction: (1) when the single item represents a control or moderator variable [30], (2) when a narrower definition of the construct is justified and the item represents the study’s aspect of interest [27], or (3) when the measure’s desired precision is low [28, 30].

Strengths, limitations, and implications

Our study’s strength is its rigorous methodology to test the psychometric properties of 33 single items [28, 29]. Furthermore, our data represent students’ heterogeneity in terms of cognition, motivation, and emotions, as we collected the data from an almost full cohort of students at one institution. The data also represent heterogeneous learning environments, as the students were in very different workplace settings.

A limitation is that our respondents were from two institutions only. This was necessary because we aimed to collect high-quality data using rigorous implementation management, such as ensuring support from all stakeholders and adequate time for the respondents to complete the questionnaires at the workplace. Furthermore, although preliminary analysis showed measurement invariance regarding gender, this result is limited due to the limited number of male participants. Future studies need sufficient participants to test for measurement invariance. Another limitation is the self-report aspect of our measurement instrument, as the reported information can sometimes differ from actual lived experiences [53, 54]. This study was an important first step in assessing the psychometric properties of single items for measuring different aspects of health sciences undergraduates’ SRL at the workplace. Further research should validate the items using alternative measures for comparison. Furthermore, items with different wording should be tested to improve the quality of the single items [28].

The scientific implication of our study lies in its provision of evidence to inform decision-making regarding whether to use scales versus single-item measures to investigate undergraduates’ workplace learning in health sciences education. This study took a very differentiated view of SRL by providing items on its various aspects. Such a differentiated view with corresponding single items enables future researchers to investigate SRL sub-processes. However, the use of single items also makes it possible to consider several aspects of SRL simultaneously and thus take a more holistic view of SRL. This allows researchers to be more economical in their data collection and to include more constructs. It also supports multivariate longitudinal and diary studies of workplace learning.

The research’s practical implication is its contribution to building scientific evidence that can serve as a foundation for developing interventions to enhance workplace learning. The single items can be used for screenings to further probe cognition, motivation, emotions, and contexts in workplace learning within a particular cohort toward the evaluation of different workplace learning curricula. The items can also be used for learning analytics.

Conclusion

The present study has enhanced knowledge of the psychometric properties of single items measuring different aspects of undergraduates’ self-regulated learning (SRL) at the workplace. Most single items showed acceptable reliability, but the results regarding validity were mixed. While the single items reproduced the relationships with external criteria in the nomological network that the full scales obtained, most single items insufficiently reproduced the information that the full scales obtained. The results provide evidence for health sciences education researchers to decide between using full scales and single items. The present study supports further investigation of health sciences undergraduates’ SRL at the workplace.