Introduction

Intellectual character education, which refers to educating a person in a manner that inculcates desirable thinking dispositions known as epistemic or intellectual virtues, is gaining popularity in all sectors of education (Baehr, 2021; Ruch et al., 2020; Zhang et al., 2022), and particularly in higher education (Arum, Eccles, Heckhausen et al., 2021; Orona, 2021c; Schwartz, 2020). Its relevance is fueled, in part, by the growing sub-field of character education in psychology, the desire of employers to see graduates with specific soft skills, including reflective thinking habits (McGrew et al., 2018; Villacís et al., 2022), and the widespread aim of educators to assist students in personal growth trajectories leading to a life of flourishing (Arum, Eccles et al., 2021).

While its significance is gaining recognition across policy recommendations and scholarship, few empirical studies, and even fewer experimental ones, address whether educational processes produce such epistemic virtues and whether specific interventions can be devised to foster them (Ruch et al., 2020). In this study, we begin by succinctly discussing three key theoretical perspectives: virtue epistemology, the STRIVE-4 framework, and Besser’s virtue learning theory. We highlight the connections between each and focus on noteworthy prior research before analyzing experimental data pertaining to several thinking disposition outcomes. Afterward, we test a theoretically inspired mediator as the possible link between the intervention and virtue growth. Finally, we discuss implications and future work.

Theoretical background

Virtue epistemology and intellectual virtue

The notion of an intellectual virtue can be traced to ancient Greece, being explicitly introduced in Aristotle’s Nicomachean Ethics as a set of personal qualities and knowledge contributing to eudaimonia, or individual flourishing. Aristotle described two parts of the soul: one irrational and the other rational, with the latter separating humans from animals and characterizing the capacity for the development of moral and intellectual virtue. The two sets of virtue (moral and intellectual) are theorized to develop via different routes. While the moral virtues, such as courage, generosity, temperance, and justice, are developed by habituation and training from youth, the intellectual virtues are acquired through teaching. Aristotle stipulated five intellectual virtues: techne (technical expertise), phronesis (practical wisdom), nous (insight/grasping first principles), episteme (scientific knowledge), and sophia (theoretical wisdom, which is nous plus episteme). Because the Nicomachean Ethics is a treatise on human happiness, it asserts that the best life is the life of contemplation, a life marked by theoretical wisdom (Zhang et al., 2022).

The Aristotelian concept of intellectual virtue has since undergone modifications and has been used to resolve debates in the field of epistemology (Sosa, 1980), leading to the sub-field of virtue epistemology (Kotsonis, 2020), which evaluates processes of knowing and knowledge as stemming from these individual attributes. The most popular virtue epistemology thesis in recent decades is Zagzebski’s (1996) “Virtues of the Mind.” Zagzebski (1996) argues that intellectual virtues are just like moral virtues: both are attributes of a person, but whereas moral virtues dictate good or desirable behavior, intellectual virtues dictate good or desirable thinking. In this way, intellectual virtues ought to assist in the acquisition of true and accurate beliefs via good epistemic habits; hence they are also referred to as epistemic virtues (Greene & Yu, 2016).

Over the years, scholars have unsurprisingly described intellectual virtues differently. Whereas some define them as reliable cognitive faculties (Greco, 2000; Sosa, 1985), such as memory and perceptual acuity, Zagzebski (1996) promotes a twofold construct, with one component relating to cognitive skills or capability, and the other component motivational. Baehr (2015) expanded on Zagzebski’s model, adding two more dimensions: an affective dimension and a judgment dimension. The affective dimension goes beyond intellectual interest in a topic, or even motivation that leads to truth-seeking behavior, and emphasizes the enjoyment of the learning process itself. That is, one is intellectually virtuous not only if one desires to solve, and actually solves, an epistemic problem, but also if one takes pleasure in asking questions and being inquisitive. The judgment dimension adds yet another layer, which entails being sensitive to or “judging” when a particular moment or situation (in everyday life or otherwise) calls for critical reflection.

STRIVE-4 framework

Scholarship in moral virtue also offers insights into how virtue develops and what types of research questions are well-suited for virtue inquiry. Because of the growing interest in (intellectual) virtues, scholarship is beginning to transition from philosophical, psychological, and policy ruminations to a testable empirical framework (Fowers et al., 2021). For example, Fowers et al. (2021) introduced the STRIVE-4 (Scalar Traits that are Role sensitive, include Situation × Trait Interactions, and are related to important Values that help to constitute Eudaimonia) model to support and integrate the currently disjointed yet budding field of virtue science. The STRIVE-4 model posits that virtue has four components: knowledge, behavior, motivation, and disposition.

As the acronym suggests, virtues are explicitly conceptualized as quantitative attributes (e.g., individuals possess certain amounts of virtue) as opposed to categorical ones (e.g., presence/absence of virtue). Fowers et al. (2021, p. 129) also note “…from a neo-Aristotelian perspective, virtue traits are not biologically given. Rather, we see virtues as acquired traits.” Though they are conceptualized as stable dispositions, virtues are role sensitive and dependent on specific contexts and situations.

With this basis, Fowers et al. (2021) generate 26 testable hypotheses stemming from their model. The hypotheses range from topics regarding measurement validity (e.g., #17. Self-reported virtue will be related to relevant criterion variables), anticipated correlations (e.g., #26. Virtue traits will be associated with variations in neural processes), prediction (#19. Virtue-related knowledge will add incremental validity to the ability to predict virtue-related criteria), development (e.g., #22. The rudiments of virtue can develop over time into mature virtue), and intervention (e.g., #24. Virtue-related behavior can be increased with simple, short-term interventions). The hypotheses are intended to guide virtue science, thus the STRIVE-4 framework does not provide a list of virtues nor does it suggest specific mechanisms for development.

Besser’s theory of virtue learning

Another noteworthy perspective in moral virtue is Besser’s (2020) theory of virtue learning, which is grounded in self-determination theory. The theory notes that for virtue to develop in individuals, it must resonate with their perception of who they are and who they desire to become. Thus, it must connect with and become a part of their identity. In this way, the successful development of virtue must tap an individual’s basic psychological needs, such as autonomy, competence, and relatedness.

In order for instruction to support virtue development by connecting it to learners’ sense of autonomy, competence, and how they relate to others, the theory predicts that learning what virtue is/what is involved in its application, why it’s important, and how to implement it in one’s own life are the three key knowledge areas necessary for virtue learning. Besser (2020) places particular salience on learning why (e.g., why virtue is important), arguing that this is the bedrock on which deeper learning of the what and how of virtue rests. She (2020, p. 285) summarizes, “…our focus within moral development ought to be on helping subjects develop an understanding of the goal of virtue in a way that resonates with them.”

Connections between the models

The frameworks and theories share important features with each other. First, all three agree that virtue aims at eudaimonia, or flourishing. Second, they suggest that instruction and intervention are effective means of developing virtue. Third, two of the three frameworks explicitly mention that relaying the significance of virtue is a salient component of virtue instruction.

While both the STRIVE-4 model and Besser’s theory were originally formulated to pertain to moral virtue, their scope readily extends to intellectual virtues. This is because: (a) the STRIVE-4 virtue components conform with proposed intellectual virtue dimensions (Baehr, 2015); (b) empirical studies of character often classify intellectual and moral virtue studies together (Brown et al., 2022; McGrath et al., 2020); (c) growth trajectories of intellectual and moral dispositions occur synchronously (King et al., 1989); (d) previous research has applied moral virtue theory to intellectual virtue testing (Orona, 2021b); (e) teaching strategies to inculcate moral and intellectual virtue have substantial overlap (Besser, 2020; Kotzee et al., 2019); (f) leading virtue taxonomies categorize several popular thinking dispositions as virtues (e.g., the VIA model lists curiosity and open-mindedness as virtues); and (g) seminal philosophical texts argue against this dichotomization of virtue types (Zagzebski, 1996). Thus, the intellectual virtues appear to be sufficiently like the moral virtues to warrant the application of the STRIVE-4 framework and Besser’s theory to intellectual virtue research (Zagzebski, 1996).

(Intellectual) virtue research

Several empirical studies support aspects of the models reviewed above. For instance, one persistent issue is whether (intellectual) virtue measures have appropriate (e.g., positive, negative, or nonoverlapping) associations with external criteria, over and above other, related variables that have been studied for decades. McGrath et al. (2020) found that broad character traits measured with the Virtues in Action (VIA) scale are highly related to, but not identical with, long-standing personality scales such as the NEO and HEXACO. Anjum and Amjad (2021) established this structure in a different population, confirming key associations with positive and negative affect.

Importantly, Lian and You (2017) found that several key virtues predict behaviors in undergraduates, such as time spent on smartphones. Self-reported measures of intellectual humility have been positively correlated with cognitive reflection (Krumrei-Mancuso, Haggard, LaBouff, & Rowatt, 2020) and mastery behaviors (Porter, Schumann, Selmeczy, & Trzesniewski, 2020). Orona et al. (2023) found that an index composed of various curiosity trait measures (e.g., Openness to Experience, Need for Cognition, and Epistemic Curiosity) moderated the influence of broad learning experiences on the development of higher-order cognition, such that highly curious individuals exhibited greater gains in reasoning ability when exposed to diverse educational content than their less curious peers. Finally, one of the mainstay thinking dispositions, the Need for Cognition, has been positively related to cognitive reflection (Šrol, 2018) and has significantly predicted complex problem-solving abilities (Rudolph et al., 2018).

Moreover, general character education (including intellectual character) programs have had positive effects on well-being and other desirable outcomes. Brown et al. (2022) found in a meta-analysis that character education programs, across a range of intervention and outcome types, exhibit an average standardized effect size of 0.24. Interestingly, however, few studies have looked at the effects of character education on character, and even fewer on intellectual character (Ruch et al., 2020).

The intellectual virtue curriculum development and prior research

Our research team developed and piloted a novel, online thinking disposition intervention internally referred to as the “Intellectual Virtue Curriculum” (IVC). The intervention was informed by philosophical, psychological, and online pedagogical theory and practice (Baehr, 2013; Fischer et al., 2022; Hidi & Renninger, 2006; Orona, Li et al., 2022), and was the brainchild of a leading virtue epistemologist. In designing the IVC, theoretical perspectives were combined to trigger and maintain interest (Hidi & Renninger, 2006) in intellectual virtues via pedagogical activities, such as: (a) introducing novel experiences (Quinlan, 2019), (b) exposing students to experts’ struggles and applications of concepts (Hong & Lin-Siegler, 2012), (c) providing interactive learning activities alongside a lecture course (Yuretich et al., 2001), (d) involving students repeatedly in inquiry activities, and (e) engaging them in reflective exercises (Hulleman & Harackiewicz, 2009).

Corresponding with the above theory-based pedagogical activities, the IVC consists of a set of modules containing high-quality videos of philosophers, scientists, educators, and graduate students detailing the components of intellectual virtue, their relevance, and the ways they relate to academic and intellectual pursuits, as well as a life of flourishing. Exemplars model good epistemic thinking across different domains and perspectives, an important feature in developing students’ epistemic thought processes (Vossoughi et al., 2021). Alongside videos, students are given a series of activities, including quizzes, thought puzzles, and reflective exercises.

In this way, the IVC module attempts to develop intellectual virtue by (a) introducing what virtue is (lecture videos defining and explaining the components of intellectual virtue); (b) explaining why it is important (videos and activities geared towards explaining how virtue is relevant to different disciplinary perspectives, as well as quiz questions testing virtue knowledge); and (c) detailing how to implement it in one’s own life (reflective exercises that ask individuals to think of areas of their life where they can implement virtue). Phrased another way, the mechanism by which the IVC is posited to operate involves the degree to which students acquire knowledge of intellectual virtue (what), grasp the significance of intellectual virtue (why), and understand what it takes to implement intellectual virtue (how). According to Besser’s theory outlined above, if these three criteria can be meaningfully and effectively targeted, individual growth in (intellectual) virtue is likely to follow suit.

In the pilot iteration of this intervention, Orona and Pritchard (2021) evaluated the IVC’s effect on two measures of intellectual curiosity and found positive preliminary effects of 0.18 (Need for Cognition) and 0.13 (Epistemic Curiosity). To understand the mechanisms by which the IVC inculcates intellectual virtue, Orona (2021b) tested some of the theoretical links deemed essential for virtue learning (Besser, 2020). Consistent with Besser’s (2020) model and the broader intellectual character education stipulations, learning what intellectual virtue is and why it’s important was positively associated with increases in intellectual curiosity across a range of analytic approaches (Orona, 2021b).

Based upon this early work, the IVC has been expanded to include modules pertaining not only to intellectual curiosity, but also to intellectual humility, integrity, and tenacity, as these are some of the commonly referenced intellectual virtues (Baehr, 2015; Pritchard, 2020). Therefore, in the current iteration of the IVC, students in the treatment group completed videos, quizzes, brainstorming activities, and reflective exercises pertaining to each of these intellectual virtues.

Present study

The present study scales up the IVC evaluation, being among the first randomized controlled trials of an intentionally designed thinking disposition intervention aimed at improving university students’ intellectual virtue. Guided by the epistemic virtue frameworks indicating the significance of instruction on intellectual virtue development, Besser’s theory highlighting the what, why, and how of virtue learning, and the STRIVE-4 model (Fowers et al., 2021) hypotheses: “#23. Virtue acquisition can be fostered by well-designed, structured interventions” and “#24. Virtue-related behaviors can be increased with simple, short-term interventions”, we test the effect of the thinking disposition intervention on intellectual character development via a theoretically stipulated mechanism. Specifically, we specify latent variable mediation models using Besser’s learning components as mediator(s) transferring the effect of the treatment to growth on four key virtues: curiosity, tenacity, integrity, and humility. Thus, we test two research questions:

  • RQ1: Does participation in the IVC increase intellectual virtue?

  • RQ2: Do the components of virtue learning (i.e., understanding what virtue is, why it is important, and how to implement it) mediate the relationship between participation in the IVC and increases in intellectual virtue?

Methods

Participants

This study took place at a large public southern California research university. Instructors were contacted via email; those willing to embed the intervention in their course also agreed to offer extra credit to students for participating in the study. To be eligible for the study, participants needed to be: (a) enrolled in a participating course and (b) not previously exposed to the intervention. Initially, 806 undergraduates consented to participate. Due to attrition (i.e., either leaving the course or the study: 32%), eligibility criteria (of those who completed the posttest, 23% were ineligible to be included in the study) and missing data (among those eligible, 15%), 424 students (treatment = 216; control = 208) had analyzable data. However, not all of these students had full administrative data to describe the sample. Those with complete data on demographics and academic records were n = 361 (treatment = 186; control = 175). We therefore specify latent variable models on the full sample (without covariate information; n = 424) and the analytic sample (n = 361) to compare how the drop in cases and inclusion of covariates impacts the results.

In Table 1, we present descriptive statistics by condition for the analytic sample, as well as mean differences. Across both conditions, most students were enrolled full-time during the term they participated (87%) and were female (> 60%). A little over 50% of the sample were first-generation college students, and about one-third were underrepresented minority students (URM: Hispanic, Black, and Native American). All imbalances between groups and pretest measures were tested with either a chi-square test (categorical) or a t-test (ordinal/numerical). No significant differences were found between the treatment and the control groups across any of the demographic, academic, and pretest survey measures.
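As a rough illustration of the balance checks described above, the sketch below computes a Pearson chi-square test for one binary covariate crossed with condition. The counts are invented and are not the study's data, the function name is hypothetical, and no continuity correction is applied, which may differ from the software settings used in the study. The closed-form p-value shown holds only for a 2 × 2 table (df = 1).

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test for a 2 x 2 table of counts (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    # With df = 1 the chi-square upper tail reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value

# Hypothetical counts: rows = treatment/control, columns = female/not female.
stat, p_value = chi_square_2x2([[60, 40], [58, 42]])
print(f"chi2 = {stat:.3f}, p = {p_value:.3f}")
```

A large p-value for a given covariate is consistent with, but does not prove, successful randomization; numerical covariates would be compared with a t-test instead.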

Table 1 Descriptive Statistics by Treatment Condition

Procedure

We employ a pretest/posttest randomized controlled trial. Once students signed a consent form and completed a pre-survey, they were randomly assigned with 50% probability to either the thinking disposition intervention or a control condition. The control condition consisted of additional educational materials and exercises relating to the broad course domain in which the student was enrolled. For instance, students who signed up for the extra-credit opportunity in their science course and who were assigned to the control condition received a science-focused module touching on a variety of fields to describe emergent phenomena. Students who signed up for the extra-credit opportunity in their critical reasoning course and who were assigned to the control condition received additional materials developed by their instructor to further knowledge in the course topics. It is important to note that students were randomized within courses; this design feature guards against confounding by instructor and course topic (Orona, 2021a). In order to receive the post-survey, participants were required to send in proof of completion of the assigned modules via a snippet through email. All four intellectual virtues were assessed at both time points, while perceived virtue learning items were only assessed at posttest. Because this latter set of items asks explicit questions about intellectual virtues, the items were withheld from the pretest so as not to reveal condition status to participants.
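The assignment step described above can be sketched as follows. This is a minimal illustration, assuming each consented student is randomized individually with probability 0.5; the data format, function names, course labels, and seed are hypothetical and do not come from the study's materials.

```python
import random
from collections import Counter

def assign_conditions(enrollments, seed=2023):
    """Assign each consented student to a condition with probability 0.5.

    enrollments: list of (student_id, course_id) pairs (hypothetical format).
    Every student is randomized individually, so both conditions can occur
    inside every course rather than whole courses sharing one condition.
    """
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    return {
        student_id: ("treatment" if rng.random() < 0.5 else "control")
        for student_id, _course_id in enrollments
    }

def counts_by_course(enrollments, assignment):
    """Tabulate condition counts per course to inspect within-course balance."""
    return Counter((course_id, assignment[student_id])
                   for student_id, course_id in enrollments)

# Hypothetical roster of 200 students split across two courses.
enrollments = [(f"s{i:03d}", "science" if i % 2 else "critical_reasoning")
               for i in range(200)]
assignment = assign_conditions(enrollments)
for key, count in sorted(counts_by_course(enrollments, assignment).items()):
    print(key, count)
```

Because randomization happens at the student level, instructor and course topic are balanced across conditions in expectation, which is the property the design relies on.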

Measures

Intellectual virtue measures

To measure intellectual curiosity, we relied on the 18-item Need for Cognition (NFC) scale (Cacioppo & Petty, 1982). The measure has been shown to constitute a core aspect of epistemic curiosity (Powell et al., 2016). To measure intellectual humility (IH), we deployed the 6-item scale developed by Leary et al. (2017). The IH scale has been shown to be highly correlated with other intellectual virtues and to exhibit non-overlapping associations with personality (Leary et al., 2017). To measure intellectual integrity, the researchers devised their own 6-item scale, given that no adequate preexisting scale was found. The items were devised to measure one’s willingness to be intellectually honest despite the prospect of personal gain. The intellectual tenacity scale was also developed by the researchers, as no adequate preexisting scale was found. The construct is understood as applying effort towards intellectual goals despite the presence of obstacles. All items were rated on a 5-point scale ranging from 1 = “Extremely Uncharacteristic” to 5 = “Extremely Characteristic”.

Nine items were developed to measure Besser’s virtue learning (BVL) components; specifically, items were designed to track students’ subjective valuation of what intellectual virtue is, why it’s important, and how to exercise it. Three items were used per learning type (i.e., what, why, how), each of which was rated on a 5-point scale ranging from 1 = “Disagree Strongly” to 5 = “Strongly Agree.” Full items for all scales can be found in the appendix.

Control variables

Control variables include demographic, academic, and additional pretest scores, which can be viewed in Table 1. Demographic and academic variables were obtained from university administrative records. The additional pretest measures include the 3-item cognitive reflection test (CRT; Frederick, 2005; Cronbach’s α = 0.75) and the (log) minutes students spent on the pretest and posttest, obtained from the survey software. Time spent on the surveys is included because low-stakes testing/surveying studies are prone to elicit hasty and sometimes thoughtless responses from participants (Liu et al., 2012). Recent work shows that response times can provide valuable information on survey quality (Lundgren & Eklöf, 2023). Thus, by controlling for total survey time, we are holding constant a proxy for the quality of responses from a student.

Data analysis

Prior to answering our main research questions, we test the adequacy of the intellectual virtue scales and the self-reported learning measures by fitting a series of measurement models. First, we begin with an exploratory factor analysis (EFA) of all the items for the intellectual virtue scales at time point 1 (36 items). Using the criteria outlined by Howard (2016), we drop items with low loadings (< 0.40) or high cross-loadings (> 0.30) from the EFA. Then, we perform confirmatory factor analysis (CFA) to test configural, weak, and strong measurement invariance across the treatment groups at time point 1. We do the same invariance testing by group at time point 2. Finally, we do the same invariance testing across time points.
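The item-pruning rule can be illustrated with a short sketch. It assumes an item is dropped when its largest absolute loading falls below 0.40 or when its second-largest absolute loading exceeds 0.30; treating the two criteria as alternatives is an assumption of this example, and the function name, item labels, and loadings are hypothetical rather than taken from Table 2.

```python
def screen_items(loadings, min_primary=0.40, max_cross=0.30):
    """Split items into (kept, dropped) using the two loading thresholds.

    loadings: dict mapping item name -> list of loadings, one per factor.
    """
    kept, dropped = [], []
    for item, values in loadings.items():
        magnitudes = sorted((abs(v) for v in values), reverse=True)
        primary = magnitudes[0]
        largest_cross = magnitudes[1] if len(magnitudes) > 1 else 0.0
        if primary < min_primary or largest_cross > max_cross:
            dropped.append(item)
        else:
            kept.append(item)
    return kept, dropped

# Hypothetical loadings on four factors (illustrative values only).
example = {
    "nfc_01": [0.62, 0.10, 0.05, 0.08],  # clean primary loading: kept
    "nfc_02": [0.35, 0.12, 0.02, 0.09],  # primary loading below 0.40: dropped
    "ih_01": [0.15, 0.58, 0.34, 0.03],   # cross-loading above 0.30: dropped
}
kept, dropped = screen_items(example)
print(kept, dropped)
```

In practice the loadings would come from the rotated EFA solution, and the EFA would be re-estimated after each round of pruning.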

Since the Besser virtue learning measures were only assessed at time point 2, and because we have a theory-inspired latent variable structure, we begin with a CFA model. We then test a series of models to assess model fit. We also test configural, weak, and strong measurement invariance across the treatment groups.

For our main research question(s), we use two different strategies. First, we test the direct effects of the intervention on the IV variables using simple linear regression, reporting the coefficients and F-tests associated with each model. We use change score models for the virtue variables for interpretability. (Latent variable models testing direct effects are shown in the appendix.) Second, to test mediation, we build upon the invariance models in the preliminary analysis and specify cross-lagged structural equation models (SEMs), holding constant the time 1 intellectual virtue scores and maintaining the same invariance constraints of the strictest model obtained. We use these SEMs to specify a mediation model with Besser’s virtue-learning measures as the mediator between the treatment and the intellectual virtues. This type of model allows us to estimate the indirect effect of the treatment on the intellectual virtues via the intervening mechanisms of virtue learning. Figure 1 shows the general form of the mediation model.
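To make the first strategy concrete, the sketch below regresses a standardized change score on a binary treatment indicator. It is a minimal illustration under stated assumptions: the data are invented, the helper names are hypothetical, and change scores are z-scored over the pooled sample using the sample standard deviation, which the text does not specify.

```python
from statistics import mean, stdev

def standardized_change(pre, post):
    """Return z-scored change scores (post minus pre) for one virtue."""
    change = [after - before for before, after in zip(pre, post)]
    center, spread = mean(change), stdev(change)
    return [(value - center) / spread for value in change]

def ols(x, y):
    """Simple linear regression of y on x; returns (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Hypothetical data: 1 = treatment, 0 = control; scores on a 1-5 scale.
treat = [1, 1, 1, 1, 0, 0, 0, 0]
pre = [3.0, 3.2, 2.8, 3.5, 3.1, 2.9, 3.3, 3.0]
post = [3.4, 3.5, 3.0, 3.9, 3.1, 3.0, 3.2, 3.1]

z_change = standardized_change(pre, post)
intercept, slope = ols(treat, z_change)
print(f"standardized treatment effect on change: {slope:.2f}")
```

With a single binary predictor, the slope equals the difference between the treatment and control means of the standardized change score, which is why the coefficient can be read in standard deviation units.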

Fig. 1

Cross-lagged panel model with latent variable mediation. IV = Intellectual virtue scores; T1 = time-point 1/pretest; T2 = time-point 2/posttest; BVL = Besser’s Learning Index; SNFC = short Need for Cognition; IH = Intellectual Humility; II = Intellectual Integrity; IT = Intellectual Tenacity

Additionally, we specify models with and without covariates. The model with covariates includes a host of pretest characteristics (see Measures section above), a strategy proposed by Gelman et al. (2020) to stabilize estimates. Thus, while we report models with and without covariates, we focus on the former. We also report standardized coefficients of latent variables (mean of 0 and standard deviation of 1); regression weights from the treatment indicator to the outcomes can therefore be interpreted akin to Cohen’s d.
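For readers less familiar with mediation models, the structural part of the model for a single virtue can be sketched with the standard product-of-coefficients formulation. This is a simplified illustration rather than the exact specification estimated here: it omits the measurement model and covariates, and the symbols (T for the treatment indicator, M for the BVL factor, Y1 and Y2 for a virtue at pretest and posttest, and the path labels a, b, c', s) are notation assumed for this example.

```latex
\begin{align}
M   &= a\,T + \varepsilon_{M} \\
Y_2 &= c'\,T + b\,M + s\,Y_1 + \varepsilon_{Y} \\
\text{indirect effect} &= a \times b \\
\text{total effect}    &= c' + a \times b
\end{align}
```

Under this formulation, a large treatment effect on virtue learning (a) paired with a positive path from virtue learning to the posttest virtue (b) yields a positive indirect effect, even when the direct path c' is small.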

Results

Preliminary analysis

The Kaiser-Meyer-Olkin (KMO) measure indicated good sampling adequacy, KMO = 0.87. All individual items had KMO values above the 0.50 acceptable limit proposed by Hair et al. (2006). Bartlett’s Test of Sphericity was \(\chi^{2}\) = 4854.76 (df = 630), p < .001, indicating the data are suitable for reduction. Finally, the determinant of the correlation matrix was positive, indicating the ability to extract common variance. The scree plot indicated a four-factor solution with the full 36 items included. After specifying a four-factor model and pruning items that had low loadings or high cross-loadings, we re-ran the EFA with varimax rotation. The four factors were a shortened NFC (SNFC; 8 items), IH (6 items), IT (3 items), and II (3 items). For time point 1 (T1) and time point 2 (T2), each scale exhibited adequate to strong reliability per Cronbach’s α: SNFC (T1 = 0.81; T2 = 0.83); IH (T1 = 0.84; T2 = 0.88); IT (T1 = 0.83; T2 = 0.82); II (T1 = 0.75; T2 = 0.71). Table 2 presents the factor loadings from the final solution (scree plot presented in the appendix).
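Two of the quantities reported above can be reproduced with a few lines of code. The sketch below computes Cronbach's α from item-level responses and the degrees of freedom for Bartlett's test, which for p items equal p(p − 1)/2. The function names and the response data are hypothetical; only the item count of 36 comes from the text.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha; item_scores is a list of items, each a list of
    scores with respondents in the same order across items."""
    k = len(item_scores)
    totals = [sum(responses) for responses in zip(*item_scores)]
    item_variance_sum = sum(variance(item) for item in item_scores)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

def bartlett_df(n_items):
    """Degrees of freedom for Bartlett's test of sphericity."""
    return n_items * (n_items - 1) // 2

# Hypothetical responses of six students to a three-item scale (1-5 ratings).
items = [
    [4, 3, 5, 2, 4, 3],
    [5, 3, 4, 2, 4, 2],
    [4, 2, 5, 3, 3, 3],
]
print(f"alpha = {cronbach_alpha(items):.2f}")
print(f"Bartlett df for 36 items = {bartlett_df(36)}")
```

The degrees of freedom for 36 items work out to 630, matching the value reported for Bartlett's test.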

Table 2 Factor Loadings for Intellectual Virtue Scales

Table 3 presents the results of the invariance testing. Configural (i.e., same model form across groups), weak (i.e., factor loadings equal across groups), and strong (i.e., item intercepts equal across groups) invariance were all met between the treatment and control groups for both time 1 and time 2, as indicated by the non-significant p-value(s). We also tested the same set of invariances across time points (before and after the intervention). Across time points, configural invariance was met but weak invariance was not achieved. A typical approach when invariance is not met is to seek partial invariance (e.g., allowing some item loadings/intercepts to be freely estimated across groups/time points). Thus, after identifying the item causing a decrease in fit (one item from the NFC scale) and allowing its loading to be freely estimated across time points, partial weak invariance was achieved. Similarly, we found one problematic item intercept for the test of strong invariance (another item from the NFC scale); once it was allowed to be freely estimated across time points, partial strong invariance was achieved.
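Nested invariance models of this kind are commonly compared with a chi-square difference test, in which a non-significant result favors the more constrained model. The sketch below illustrates that comparison under the assumption that an unscaled likelihood-ratio difference is appropriate; the text does not state which variant was used, and robust estimators require a scaled version. The fit values are invented, the function names are hypothetical, and the tail probability is computed from the closed-form series for integer degrees of freedom so that only the standard library is needed.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= (x / 2) / i
            total += term
        return math.exp(-x / 2) * total
    # Odd df: two-sided normal tail plus a finite series in sqrt(x).
    chi = math.sqrt(x)
    term, total = chi, 0.0
    for r in range(1, (df - 1) // 2 + 1):
        if r > 1:
            term *= x / (2 * r - 1)
        total += term
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total

def chi2_difference_test(chi2_constrained, df_constrained, chi2_free, df_free):
    """Compare a constrained model against the less constrained model it is nested in."""
    delta_chi2 = chi2_constrained - chi2_free
    delta_df = df_constrained - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)

# Hypothetical fit statistics for a weak-invariance vs. configural comparison.
delta_chi2, delta_df, p_value = chi2_difference_test(352.4, 344, 331.9, 328)
print(f"delta chi2 = {delta_chi2:.1f}, delta df = {delta_df}, p = {p_value:.3f}")
```

When the test is significant, freeing one loading or intercept at a time and re-running the comparison is one way to arrive at the partial invariance described above.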

Table 3 Measurement Invariance of Intellectual Virtues (across groups and time)

For the virtue learning measures, we compared a variety of models, including a model specifying a meta-factor for virtue learning composed of three first-level factors corresponding to the three aspects of virtue learning: what, why, and how. This model, which included model-implied correlated uniquenesses, had the best global fit (CFI = 0.985; TLI = 0.972; SRMR = 0.026; RMSEA = 0.071) and the best relative fit (i.e., the lowest BIC and AIC values) of the models compared (competing models are shown in the supplemental material). This model and the corresponding factor loadings are shown in Fig. 2. Thus, group invariance was tested on this model, as presented in Table 4. Both configural (i.e., same model form across groups) and weak (i.e., factor loadings equal across groups) invariance were met, as indicated by the non-significant p-value. However, one item’s intercept had to be freely estimated across treatment status; thus, partial strong invariance was achieved.

Fig. 2

CFA model for the Besser Virtue Learning index. A meta-factor model in which three first-order factors (what, why, how) are measured by nine manifest variables

Table 4 Besser Virtue Learning Invariance Testing Models Group Comparison

RQ1: Does participation in the IVC increase intellectual virtue?

Figure 3 shows the mean scores for each IV variable by time point and condition. Descriptively, we see that the differences between conditions appear minimal for all the virtues.

Fig. 3

Mean plots for each intellectual virtue before and after treatment by condition

Table 5 presents the formal tests of these comparisons, showing direct effects of the treatment on the standardized IV change scores. The treatment has a medium-sized effect on both SNFC (0.20, p < .001) and IT (0.21, p < .05). Interestingly, all effect estimates were positive except that for the II change score, though this estimate was not significant (p > .05). Relatedly, the F-test is significant for all models except IH and II.

Table 5 Manifest Variable Models: Direct Effects of the Treatment

RQ2: Does virtue learning mediate the relationship between IVC participation and virtue development?

Table 6 presents the results of the main analysis. First, we see that the treatment had a very large direct effect on the BVL meta-factor in the model without covariates (n = 424) conducted on the full sample (\(\beta\) = 0.72, p < .001) and the model with covariates (n = 361) conducted on the sub-sample (\(\beta\) = 0.77, p < .001).

Table 6 Latent Variable Mediation Models

Additionally, all indirect effects from the treatment to the four intellectual virtues via BVL were statistically significant, p < .05. Furthermore, the estimates were stable across the models/samples presented, with minimal difference in the size or significance of the estimates. The intellectual virtue most impacted by the treatment through BVL was IT, in both the model without covariates (\(\beta\) = 0.24, p < .001) and the model with covariates (\(\beta\) = 0.26, p < .001). The intellectual virtue least impacted by the treatment through BVL was SNFC, in both the model without covariates (\(\beta\) = 0.13, p < .01) and the model with covariates (\(\beta\) = 0.11, p < .05).

To visualize the extent to which change in the virtue scores correlates with virtue learning, Fig. 4 presents the bivariate correlations between the factor change scores and the extracted factor scores for the BVL latent variable. All BVL associations with virtue change scores were statistically significant (p < .001), except for intellectual humility (IH), which narrowly missed the conventional significance threshold (p = .0502).

Fig. 4

Bivariate relations between intellectual virtue change scores (computed from extracted factor scores) and the virtue learning (Besser Virtue Learning = BVL) variable
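
The bivariate relations summarized in the figure are Pearson correlations. As a minimal sketch, assuming two equal-length lists of scores (the function names are hypothetical and not taken from the study), the correlation and its associated t statistic with n - 2 degrees of freedom can be computed as follows. Converting the t statistic to a p value requires a t distribution, which is left out here.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_t_statistic(r, n):
    """t statistic for testing r against zero, with n - 2 degrees of freedom.

    Undefined when r is exactly 1 or -1 (division by zero).
    """
    return r * sqrt((n - 2) / (1 - r ** 2))
```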

Discussion

In line with the budding interest in both virtue science and intellectual character education in higher education, the purpose of the present study was to evaluate the impact of a thinking disposition intervention on intellectual virtue development and learning. Using a randomized controlled trial design, we assessed the direct effects of the IVC on four key virtues (curiosity, humility, integrity, and tenacity) and its indirect effects through subjective virtue learning measures. The experiment generated generally positive results for the intervention, though some notable aspects suggest limited effectiveness.

For example, the change score regression models revealed that there was no significant impact on II or IH. The intervention did, however, have large impacts on virtue learning, with effect sizes approaching a full standard deviation unit. Moreover, we observed statistically significant indirect effects of participation for every one of the four virtues across the full sample and the sub-sample with covariate variables included. Given these findings, the results of this study have implications for virtue science and the design of intellectual character interventions.

Virtue science

The study results have clear implications for models of (epistemic) virtue. First, the intervention was effective at promoting knowledge of what intellectual virtue is, why it is important, and how to implement it, suggesting the importance of instruction in virtue learning. Second, Besser’s theory of virtue learning was well supported by the current results, which show that learning the what, why, and how of virtue is associated with greater growth in intellectual virtue. Third, guided by the STRIVE-4 framework, this study speaks to the criteria for virtue science. Specifically, the results show that virtue acquisition can be fostered by well-designed, structured interventions. Furthermore, given that an academic term at the study university lasts 10 weeks and students complete the intervention in approximately 7, the IVC may, depending on one’s frame of reference, qualify as a short-term intervention. These two points relate to the two STRIVE-4 hypotheses regarding interventions and provide a basis to begin unpacking the specific components that foster virtue development.

Design and evaluation of character interventions

Ruch et al. (2020) note that most character interventions emphasize outcomes other than character, and they point to the need for studies showing the efficacy of such interventions for character development. Insofar as the current study measures are adequate representations of intellectual character, our study contributes to the knowledge base on the efficacy of character education interventions for character attributes. Moreover, it is in line with recent suggestions regarding rigor in evaluating character education programs (McGrath, 2022). We also found that the treatment effects on the (shortened) Need for Cognition scale (0.20) and the intellectual tenacity scale (0.21) corresponded well with the effect sizes of previous character development programs (0.24, Brown et al., 2022) and of the pilot iteration of the thinking disposition intervention (0.18, Orona & Pritchard, 2021).

Another implication is that not all character traits are impacted equally, even when the intervention treats them equally. For example, we found that intellectual curiosity (as measured by NFC) and tenacity were more impacted than humility and integrity, yet the treatment emphasized each trait equally. There are several possible explanations for this. First, it may be that the IVC modules relating to the former two traits are somehow better designed, thus resulting in stronger gains in these dispositions. Second, it may be that the virtues themselves are sufficiently different, such that curiosity and tenacity are more fluid and easily targeted than integrity and humility, which may be more deeply ingrained character attributes and thus require more intense intervention. Third, there may be a ceiling effect: participants may already have been too high on the latter traits to grow. This may be the case for intellectual humility, which was the only virtue that had a strong negative skew in its distribution (histograms of every outcome variable can be found in the supplementary material).

Finally, and most importantly, there is the possibility of measurement error. However, insofar as reliability estimates are concerned, no clear pattern was observed in the traits that were and were not impacted. For example, the two significantly impacted virtues included the most established scale administered (i.e., NFC) and a researcher-developed scale (tenacity), each with high to adequate Cronbach α values. Likewise, the two virtues that were not significantly impacted included one previously existing scale (humility) and one researcher-developed scale (integrity), exhibiting high and less than adequate Cronbach α values, respectively. Still, only the NFC has undergone extensive psychometric testing and validation, with a research base spanning over 40 years (e.g., Cacioppo & Petty, 1982; Lavrijsen et al., 2021). We further expound on measurement issues as they relate to limitations and future directions in the section below.

Limitations and future directions

The limitations largely concern measurement, external validity, and follow-up. Ideally, measurements beyond self-report are desired (Maul, 2017). But insofar as psychometrics is concerned, no clear pattern was observed in the traits that were and were not directly impacted. For example, the two significantly impacted virtues included the most established scale administered (i.e., NFC) and a researcher-developed scale (tenacity), each with high to adequate Cronbach α values. Still, advancing situated, contextual measures with enhanced predictive validity will require extensive conceptual and empirical research moving well beyond self-report scales (Ng & Tay, 2020). Another limitation is external validity, which could be enhanced with a multi-campus experiment examining the effects of IVC within and across diverse institutions (and populations). Finally, a follow-up study would greatly strengthen our understanding of the lasting effect of the IVC intervention. Together, such data would be vital for understanding whether and when supports and boosters may be needed to sustain fundamental intellectual character development.

Conclusion

Intellectual virtue development appears to be a feasible aim when intervention design is guided by philosophical, psychological, and pedagogical perspectives. This study demonstrated the effect of intellectual virtue education on learning about virtue, which in turn was related to virtue development. Given the widespread appeal of developing lifelong learners with reflective thinking habits and good epistemic hygiene, researchers and funders may consider investing time and resources in intellectual character education as a potential avenue towards a society of virtuous reasoners.