The field of personality psychology explores how individual differences relate to personal life outcomes. Obtaining a comprehensive understanding of how such individual differences impact the trajectory of life is paramount in gaining insights into human behaviour. While research on the five broad personality factors, also commonly referred to as domains (Goldberg, 1990), is ubiquitous and replicable (Soto, 2019), there is a clear-cut case for the increased use of narrower traits (i.e., facets and items) (Seeboth & Mõttus, 2018) for improved predictive power and, subsequently, a more detailed understanding of personality (Stewart et al., 2022). As an example, individuals high in the personality factor conscientiousness report greater work satisfaction, whereas individuals high in extraversion and agreeableness report greater social satisfaction (Olaru et al., 2023). However, little is understood as to why these broad factors are associated with work and social satisfaction or how broad traits relate to life outcomes in general. It is possible that there are more informative relationships underlying such associations, visible only in the measurement of the narrower traits (i.e., facets and items) that make up the Big Five domains. The present study set out to compare explained variance between factors, facets, and items (markers for nuances), predicting various life outcomes with a long, publicly available instrument (IPIP-NEO-120; Johnson, 2014) in a non-English sample.

Levels of personality (factors, facets and nuances)

Most commonly, personality-outcome associations are studied using the Five Factor Model (FFM; McCrae & John, 1992) or the Big Five (Goldberg, 1990), which consists of five broad traits: openness, conscientiousness, extraversion, agreeableness, and neuroticism. These have been associated with a variety of life outcomes, such as divorce, career success, or even mortality (Roberts et al., 2007; Soto, 2019). Personality plays a role in advanced professions (Raoust et al., 2023). Personality factors are highly heritable (Polderman et al., 2015), stable across adulthood (Mõttus et al., 2017), and predict various life outcomes, often better than other factors such as socio-economic status (Kajonius & Carlander, 2017).

The Big Five factors, which are considered the highest level of the personality hierarchy, are composed of narrower traits known as facets (Johnson, 2014). For instance, neuroticism includes the facet traits anger, self-consciousness, immoderation, and vulnerability (IPIP-NEO-120; Johnson, 2014). Similar to the Big Five traits, facets have been demonstrated to be heritable and stable (Briley & Tucker-Drob, 2012) while also capturing additional variance beyond factors (Mõttus et al., 2014). This implies that facets represent enduring traits with the capacity to capture important variation in personality, which might go unnoticed when aggregating to the domain level (Briley & Tucker-Drob, 2012). As such, people can obtain identical scores for broad factors, while their facet scores could be drastically different from one another. For example, the meta-analysis by Vainik et al. (2019) reported on the relationship between personality and Body-Mass Index (BMI) as a measure of obesity. While BMI was positively linked to neuroticism and negatively to conscientiousness, the analysis revealed relationships between BMI and 15 facets across all five factors. Moreover, it was found that facets explained four times more variance than factors for obesity. Interestingly, the factor neuroticism was associated with higher BMI, which could suggest a role of increased anxious behavior. However, when inspecting BMI-facet associations, BMI was associated foremost with higher impulsiveness, anger, and hostility. Notably, while BMI was also positively associated with facets of extraversion, such as warmth and assertiveness, the effect of factor-level extraversion on BMI was inconsistent. Similar studies have found that facets provide a more precise perspective on the relationship between personality and life outcomes (Espinoza et al., 2023; Schwaba et al., 2019). By utilizing narrower traits, Vainik et al. (2019) facilitated a more comprehensive understanding of the relationship between personality and obesity, demonstrating the ability of narrow traits to reveal more meaningful relationships.

While the usefulness of facets in personality-outcome prediction has been demonstrated, it has been suggested that there is yet another indispensable level of traits below facets, labeled nuances (McCrae, 2015). Due to the lack of classification, nuances are operationalized as individual items in a personality instrument. However, personality nuances and items may not be the same. Nuances refer to the smallest unique aspects of an individual’s personality (McCrae, 2015), which can be represented by a single item as well as sets of items that capture no distinct information from one another (Stewart et al., 2022). Items are standardized statements designed to capture tiny aspects of personality and are often treated as interchangeable within item-pools (Mõttus et al., 2017). For instance, extraversion can contain items such as “I have a lot of fun” and “I look at the bright side of life.” While these items capture the facet cheerfulness and tend to correlate with each other, they contain unique information on their own. “I look at the bright side of life” may capture a nuance labeled “positive thinking,” whereas “I have a lot of fun” may reflect a nuance labeled “lively” or “playful.” While treated as interchangeable in personality instruments, items capture distinct nuances of personality (Speer et al., 2022). Indeed, nuances have been found to capture variance beyond that of facets and share some of the properties of higher-order personality traits: stability, heritability, and observability across raters (Mõttus et al., 2017).

In a comprehensive study on the predictive value of nuances, Seeboth and Mõttus (2018) surveyed a British sample (N = 8719) with 40 life outcomes and 50 items based on the International Personality Item Pool (IPIP). They found that on average, nuances explained more variance than factors, and outperformed factors in predicting 37 of the 40 outcomes. Even after dropping the 10 most predictive items, nuances outperformed factor models.

Building further on similar findings, Stewart et al. (2022) examined associations using the Big Five Inventory-2 (BFI-2; Soto & John, 2017) to predict 53 life outcomes using factor, facet, and nuance models in a large US sample (N = 6126). Nuances (20.9%) outperformed both facets (18%) and factors (16.6%) across outcomes. This held true even after removing factor and facet variance from items, suggesting that unique item-variance was behind associations between personality and outcomes.

The present study

Given that facets and nuances account for unique variance and have been shown to outperform factors in predictions (Seeboth & Mõttus, 2018; Stewart et al., 2022), there is a need for research using various samples and other instruments to determine their potential. Extending previous studies with a different sample and different outcomes is important, as it helps validate and strengthen the reliability of research findings, while also ensuring that previous findings are not merely the result of flukes or biases (Soto, 2019). Moreover, there are ongoing calls for a bottom-up taxonomy and instruments for measuring nuances (Condon et al., 2020), which all require a critical mass of empirical studies. There is also a need for analyzing various lengths of instruments in trait-outcome association studies, as a substantial proportion of informative personality variance is left unaccounted for by short scales (Sleep et al., 2021).

The aim of the present study was to compare the predictive validity of the three hierarchical levels in personality models (factor-, facet-, and item-level) in predicting life outcomes. We posed two hypotheses: (1) facet-level models will outperform factor-level models in predictions, and (2) item-level models (nuances) will outperform facet- and factor-level models in life outcome predictions. The purpose of the study was to extend the work of Seeboth and Mõttus (2018) and Stewart et al. (2022), using a new sample, different outcomes, and a different personality instrument, thus contributing to a growing body of literature on the value of using all levels of personality.

Methods

Participants and procedure

The sample used in the present study is a convenience sample. It consists of individuals who voluntarily participated in the research by accessing and completing an online questionnaire for personality testing in Swedish. All gave their informed consent, and since participation was voluntary and the collected data were anonymous, no ethical review was required. In total, N = 568 participants completed the survey. However, participants who skipped ≥10% of items (n = 52) were excluded from analysis. Also, participants below 25 years of age (n = 76) were excluded from the study due to the low longitudinal stability of personality during childhood, which significantly increases as individuals transition into adulthood (Briley & Tucker-Drob, 2014), as well as the potential lack of sufficient life outcome experience. The final sample size was N = 440 (57% female), ranging from 25 to 65 years old (M = 42.0, SD = 10.5). See Supplemental Materials and Appendix A for available demographics.

Measurements

IPIP-NEO-120

This is a widely used self-report instrument that assesses the Big Five with 5 factors, 30 facets, and 120 items (Johnson, 2014). The instrument is one of the few extensive and publicly free instruments with very high reliabilities (α > .85). See Kajonius and Johnson (2019) for more psychometric properties. Respondents rate items on a five-point Likert scale ranging from 1 = very inaccurate to 5 = very accurate, with balanced (+ and -) keying. Example items: “I make friends easily” (extraversion), “I love to help others” (agreeableness), “I usually leave a mess in my room” (reversed conscientiousness), “I am not bothered by difficult social situations” (reversed neuroticism), and “I have a lively imagination” (openness). Item scores are summarized into facet traits with four items per facet. Facet traits are summed and averaged into factors, with six facets per factor. See Appendix B for a complete overview of the IPIP-NEO-120 scale.
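The scoring logic described above can be sketched as follows. This is only an illustration in Python, not the authors' R code: the function names, the example responses, and the choice to average (rather than sum) items within a facet are assumptions made here for clarity.

```python
from statistics import mean

SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert scale described in the text


def reverse_key(score):
    """Flip a reverse-keyed item so that 1 <-> 5, 2 <-> 4, and 3 stays 3."""
    return SCALE_MIN + SCALE_MAX - score


def facet_score(responses, reversed_flags):
    """Average the items of one facet after re-keying the reversed items."""
    keyed = [reverse_key(r) if rev else r
             for r, rev in zip(responses, reversed_flags)]
    return mean(keyed)


def factor_score(facet_scores):
    """Average the facet scores of one factor into a single factor score."""
    return mean(facet_scores)


# Hypothetical respondent: four items for one facet, the last two reverse-keyed.
orderliness = facet_score([4, 5, 2, 1], [False, False, True, True])

# Hypothetical scores for the remaining five facets of the same factor.
conscientiousness = factor_score([orderliness, 3.5, 4.0, 3.0, 4.25, 3.75])
```

Because aggregation averages over items, two respondents with different item patterns can end up with the same facet or factor score, which is the information loss the item-level models are meant to recover.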

Life outcomes

Personal life outcomes were measured by six self-rated single items. These were chosen as a battery of questions concerning having a balanced life, in terms of one’s job, relationships, having space to empathize with others, having a positive outlook on future endeavors, and the incentives of working. These items were considered to tap into healthy life outcomes. See Table 1 for questions and labels used in the present study. Questions were answered using a seven-point Likert scale ranging from 1 = not at all to 7 = completely.

Table 1 Life outcome measures used in the present study

Statistical analyses

All statistical analyses were performed using R (R Core Team, 2021). The packages psych (Revelle, 2021), dplyr (Wickham et al., 2023a), tidyr (Wickham et al., 2023b), lavaan (Rosseel, 2012), semPlot (Epskamp, 2022), ggplot2 (Wickham, 2016), reshape2 (Wickham, 2007), caTools (Tuszynski, 2021), tidyverse (Wickham et al., 2019), glmnet (Friedman et al., 2010), lme4 (Bates et al., 2015), and caret (Kuhn, 2022) were utilized for analyses and plots. Prior to all analyses, missing values were handled using the R package mice (van Buuren & Groothuis-Oudshoorn, 2011). Specifically, the mean of each column was computed and inserted in place of that column's missing values.
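Column-mean imputation as described above amounts to the following. This is a minimal Python sketch rather than the mice call used in the study; the function name and the toy data are hypothetical.

```python
def impute_column_means(rows):
    """Replace None in each column with the mean of that column's observed values.

    Assumes every column has at least one observed value.
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]


# Hypothetical responses: three participants, three items, None marks a skipped item.
data = [[4, None, 2],
        [5, 3, None],
        [3, 5, 4]]
completed = impute_column_means(data)
```

Mean imputation leaves column means unchanged but shrinks variances slightly, which is one reason it is usually reserved for data with few missing values, as is the case here after excluding participants who skipped 10% or more of the items.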

First, descriptive statistics on factor-, facet-, and item-level were analyzed. Second, we analyzed the structural validity of each of the Big Five hierarchical structures, using second-order Confirmatory Factor Analysis (CFA) in a SEM framework. Five models were tested, one per Big Five factor, with the factor at the top, its six facets loading on it, and 24 items (four per facet) loading on the facets. See Appendix E for the visual structure of the models. No modification indices were used. Model fits were reported using point estimate values of the Standardized Root Mean Square Residual (SRMR), robust Root Mean Square Error of Approximation (RMSEA), and Comparative Fit Index (CFI) (Cheung & Rensvold, 2002). While conventional model fit criteria often constitute at least the following: CFI ≥ .90, RMSEA ≤ .06 and SRMR ≤ .06 (Brown, 2014), low model fit in the context of personality measures is not necessarily unacceptable (Hopwood & Donnellan, 2010). The multidimensional and multifaceted nature of personality makes it challenging to precisely define the relationships among traits and items. Specifically, the complexity of using CFA for personality measures arises from challenges related to specifying accurate models and dealing with correlated residuals. Consequently, the conventional criteria for model fit are seen as guiding principles, rather than rigid cut-off criteria. Third, bivariate zero-order correlational analyses between Big Five factors and six life outcomes were performed. Fourth and last, Elastic Net Regression (ENR) was conducted to compare the predictive validities of factor, facet, and nuance models on each of the six life outcomes (Table 1). As our study aimed to extend Stewart et al.’s (2022) and Seeboth and Mõttus’s (2018) research, both of which employed ENR, we adopted the same approach for the sake of comparability.
ENR is a variant of classic linear regression often used in studies predicting life outcomes from personality traits (Roberts et al., 2007). ENR differs from regular linear regression by utilizing penalty terms from Lasso (L1) and ridge (L2) regression (see Footnote 1). These penalty terms guard against overfitting and handle correlated features and multicollinearity in the data.
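To make the role of the two penalty terms concrete, the sketch below evaluates an elastic net objective for a given set of coefficients. It follows the parameterization documented for glmnet (squared-error loss divided by 2n, plus λ times a mix of L1 and L2 penalties weighted by α); the function name and the numbers are illustrative, and the α used in the study is not stated in this section.

```python
def elastic_net_objective(y, y_hat, coefs, lam, alpha):
    """Penalized least-squares objective in the glmnet parameterization.

    alpha = 1 gives the Lasso (L1) penalty, alpha = 0 the ridge (L2) penalty,
    and values in between mix the two. The intercept is not penalized and is
    therefore not part of `coefs`.
    """
    n = len(y)
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    l1 = sum(abs(b) for b in coefs)
    l2 = sum(b * b for b in coefs)
    return rss / (2 * n) + lam * (alpha * l1 + (1 - alpha) / 2 * l2)


# Hypothetical values: three observations, two predictors, equal L1/L2 mix.
loss = elastic_net_objective(
    y=[1.0, 2.0, 3.0],
    y_hat=[1.0, 2.0, 4.0],
    coefs=[0.5, -1.0],
    lam=0.1,
    alpha=0.5,
)
```

Larger values of λ make any non-zero coefficient more costly, so with 120 item predictors and a modest sample the penalty pulls weakly informative items toward zero instead of letting them fit noise.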

The data set was randomly split into a training (67%) and a validation (33%) sub-sample, inspired by Stewart et al. (2022). This strategy enables us to train the model on one subset and evaluate its performance on an independent dataset. It is common practice to reserve a bigger portion of the sample for training the model, as more information produces a more accurate model. The training sample was used to set up ENR models for each of the six outcomes, with the Big Five factors, facets, or nuances as predictors. To further enhance model performance, we employed 10-fold cross-validation to select the optimal shrinkage parameter lambda (λ), the value that minimizes cross-validation error and thereby limits overfitting (see Footnote 2). Next, we evaluated the trained models by applying them to the validation subsample to predict each outcome. This allowed us to assess the model’s predictive accuracy and its ability to generalize to unseen data. Finally, we quantified the model’s performance by calculating the correlation between the predicted outcomes and the actual observed values. Each correlation was squared to show the percentage of explained variance. These steps were implemented as a data analysis pipeline in R.
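The split and the evaluation metric described above can be sketched as follows. This Python illustration covers only the random 67/33 split and the squared correlation between observed and predicted values; the model fitting itself (done with glmnet in R in the study) is omitted, and the function names, the seed, and the rounding of the cut point are assumptions.

```python
import random


def split_indices(n, train_share=0.67, seed=0):
    """Shuffle the row indices 0..n-1 and split them into training and validation lists."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_share)
    return idx[:cut], idx[cut:]


def pearson_r(x, y):
    """Pearson correlation between two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def explained_variance(observed, predicted):
    """Squared correlation between observed and predicted outcome values."""
    return pearson_r(observed, predicted) ** 2


# Split the final sample of N = 440 reported in the text.
train_idx, valid_idx = split_indices(440)
```

Because the squared correlation is computed on participants the model never saw during training, adding predictors does not inflate it automatically, which keeps the comparison between 5 factors, 30 facets, and 120 items fair.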

Results

A descriptive analysis was performed for the Big Five personality structure. Table 2 displays the mean, standard deviation, skewness, and kurtosis, along with Cronbach’s alpha values for facets and factors. The mean reliability for facets was α = .68, with 16 facets > .70. This can be compared with a meta-analysis which reported that the mean Cronbach’s alpha (k = 4286) was α = .77, and that 25% of personality measurements report below α = .70 (Peterson, 1994). The factors and facets were overall acceptably symmetrical and normally distributed. Agreeableness and altruism showed notably high kurtosis. Furthermore, descriptive statistics on items are reported in Appendix C, together with a visual correlational heatmap for the 30 facets (Appendix D).
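For readers unfamiliar with the reliability coefficient reported above, the sketch below computes Cronbach's alpha from item responses using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score). It is a Python illustration with hypothetical data, not the psych routine used in the study.

```python
from statistics import variance


def cronbach_alpha(item_columns):
    """Cronbach's alpha for a scale; each inner list holds one item's responses."""
    k = len(item_columns)
    totals = [sum(vals) for vals in zip(*item_columns)]  # per-respondent sum score
    item_var = sum(variance(col) for col in item_columns)
    return k / (k - 1) * (1 - item_var / variance(totals))


# Hypothetical facet: four items answered by five respondents.
facet_items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 1, 5],
    [5, 5, 2, 2, 4],
    [3, 4, 3, 2, 5],
]
facet_alpha = cronbach_alpha(facet_items)
```

Alpha increases with the number of items, so four-item facets will tend to show lower values than 24-item factors even when the items are equally well chosen.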

Table 2 Descriptives for facets and factors

To test the structural validity of the Big Five, second-order CFAs were conducted (S-CFA). Table 3 reports fit indices (χ2(df), RMSEA, SRMR, and CFI), and standardized item loadings in rows for each respective facet. Neuroticism was the best-fitting model with an RMSEA of .05 and a CFI of .93. The remaining models had acceptable fits, with RMSEAs ranging from .06 to .07. CFIs ranged lower (from .81 to .93). All the models had SRMR values of .08 or less. Although some CFI values fell below the conventional criterion of ≥ .90, we considered the overall model fit reasonable. As stated earlier, achieving a perfect model fit in personality research is often unattainable due to the intricate nature of studying personality. The standardized factor loadings averaged λ = .61 for extraversion, λ = .55 for neuroticism, λ = .55 for agreeableness, λ = .46 for conscientiousness, and λ = .57 for openness. The full range was λ = .16–.95. See Appendix E for visual CFA figures for each Big Five structure, showing the facet loadings. Only two facets, cooperation and modesty, failed to show appreciable loading estimates. Overall, the Big Five hierarchies showed acceptable structural validity.

Table 3 S-CFA fit indices and standardized item loadings

Factors, facets and nuances in life outcomes

Before running the models comparing the predictive validities between the levels in the personality structure, a bivariate correlation analysis between the Big Five factors and outcome measures was performed. Table 4 displays correlations between the Big Five factors and the six life outcomes in the present study. Extraversion was positively associated with all six life outcomes, whereas associations with neuroticism were mostly negative (with empathy as the exception). Conscientiousness was mostly positively related, while openness was found to be positively related to positive beliefs about the future and negatively linked to both extrinsic and intrinsic reward. No relationship was observed between agreeableness and the other outcomes.

Table 4 Correlations between Big Five factors and outcome measures

The main aim of the study was to investigate the predictive validity of factors, facets, and nuances for various life outcomes. The hypotheses were that facets would outperform factors, and that nuances would outperform both facets and factors. Figure 1 shows the results of the analysis, with variance explained by each of the models for all six life outcomes (see Footnote 3). As illustrated by Fig. 1, clear support was found for both hypotheses. The factors (red dots) were all far to the left of the facets (blue dots) and items (green dots) in the diagram, indicating less explained variance. While the life outcomes varied in the degree to which the personality models predicted them, facets consistently explained more variance than factors for all outcomes, and nuances explained more variance than facets. On average, factors, facets, and nuances accounted for 12%, 22.5%, and 34%, respectively, of explained variance across all outcomes (see Table 5).

Fig. 1 Variance accounted for in each life outcome. Note. Results of the Elastic Net Regression models, with six life outcomes at three levels of personality prediction: factors (red dots), facets (blue dots), and nuances (green dots). The scores reflect the average score after 100 permutations

Table 5 Proportion of variance explained for each outcome measure

As further shown in Table 5, social satisfaction was the strongest personality-outcome association, with nuances explaining 52% of variance. Extrinsic reward was the weakest outcome-association, with nuances explaining 16% of variance. The size of the advantage of narrower traits also varied across outcomes. For instance, nuances explained 12 percentage points more variance than factors for job satisfaction, while nuances explained an additional 35 percentage points compared to factors for empathy. Overall, facets tended to explain 1.5 to 2 times more variance than factors, and nuances tended to explain 2 to 3 times more in these life outcomes, confirming both our hypotheses.
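The relative gains can be checked directly from the average percentages of explained variance reported in the text (12%, 22.5%, and 34%). The short Python sketch below does this arithmetic; the variable and function names are illustrative.

```python
# Average explained variance across the six outcomes, in percent (from the text).
average_r2 = {"factors": 12.0, "facets": 22.5, "nuances": 34.0}


def gain_over_factors(level):
    """Return (difference in percentage points, ratio) relative to the factor models."""
    points = average_r2[level] - average_r2["factors"]
    ratio = average_r2[level] / average_r2["factors"]
    return points, ratio


facet_points, facet_ratio = gain_over_factors("facets")
nuance_points, nuance_ratio = gain_over_factors("nuances")
```

On these averages the facet models explain just under twice as much variance as the factor models and the nuance models just under three times as much, in line with the ranges stated above.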

Discussion

The present study investigated the predictive validity of the hierarchical structure of personality traits comparing factor-, facet-, and item-level models. Results supported both hypotheses: First, facets provided better predictive validity than factors. Secondly, nuances provided better predictive validity than facets and factors. These results were largely consistent with recent research (Kajonius & Johnson, 2019; Speer et al., 2022; Stewart et al., 2022).

In particular, inspired by and in contrast to Stewart et al. (2022), we applied the long personality instrument IPIP-NEO-120 in Swedish. Interestingly, the present study found even greater differences between factor-, facet-, and item-level models in predicting outcomes. On average, nuances explained 13% more variance than facets, while Stewart et al. (2022) reported a difference of 3%. Such differences could depend on the number of items in the IPIP-NEO (120 items) compared to the BFI-2 (60 items). Nevertheless, reliability (Cronbach’s alphas) was similar between studies and yet differences were larger than expected. It could be that our present study made use of only single-item outcomes which could have decreased some error variance, making item-level models artificially superior. Furthermore, the self-reported outcomes were quite similar to some personality traits (e.g., satisfaction, optimism, or motivation), and occasional items might have closely matched the outcome being predicted, thus making item-level models unrepresentatively better than factor-level models.

Value of nuances in personality

Item-level nuances accounted for the most variance in the social satisfaction outcome (see Table 1 for outcome descriptions), explaining 52%, which is unusually strong in personality research (equivalent to a multiple r = .72). This was 25 percentage points more variance than was explained by the factor-level model. Even in the least predictive outcome, extrinsic reward, accounting for 16% variance, nuances outperformed factors by a factor of 3. It seems that the aggregation of items into broader facets and then factors loses nuanced information in how personality traits relate to personal outlooks on life. The largest discrepancy in predictive validity between models was observed for the empathy outcome, with nuances explaining 35 percentage points more variance than factors. This could exemplify the occasional biased advantage of item-level use, seeing that IPIP-NEO-120 contains the facet sympathy, which includes items such as “I feel sympathy with homeless” and “I feel sympathy with those with problems”. Moreover, it contains the facet emotionality, including items such as “I feel others’ emotions”, which is akin to empathy. Considering the overlap between such items and the empathy outcome, a post-hoc correlational analysis was conducted between the IPIP-NEO items and the life outcome variables (see Appendix G). However, no particularly strong correlation was found between the empathy life outcome and items akin to empathy. As such, it can be argued that the use of items more accurately reflects nuances of personality than the use of the facet sympathy or the factor agreeableness.

Implications

Generalized predictions in personality research are often derived from broad, parsimonious Big Five factors, practical and easy to communicate. Often item selection is standardized and items making up facets are viewed as interchangeable (see the large item-pool in IPIP) despite items capturing distinct and unique variance on their own (Mõttus et al., 2014). While broad factors are associated with both positive and negative life outcomes, narrow facet or nuance models may be more appropriate if precision and prediction are the desired goals.

Personality-outcome connections frequently serve as the foundation for interventions. As such, we must weigh the benefits of general face validity against the loss of precision and prediction. For instance, researchers may be interested in increasing the well-being of the general population. Considering the consistent negative correlation between neuroticism and well-being, as well as the positive association between conscientiousness and well-being (Gilberto et al., 2020), interventions may focus on strategies aimed at either reducing neuroticism or enhancing conscientiousness. However, this is a very complex task. If, instead, it turns out that only a select few specific traits, discernible through facets or nuanced measures, underlie these associations, it suggests that interventions can be more targeted and actionable. In such a scenario, should the Big Five factors continue as the prevailing level of analysis for personality-outcome research when a potentially informative item-level alternative is available? The study’s results indicate that achieving greater success in personality-based interventions may be attainable by targeting precise behavioral, cognitive, affective, and motivational tendencies, captured by narrower traits. Furthermore, researchers should aim not only to establish connections between traits and outcomes but also to offer explanations that pave the way for actionable insights. The findings of this study suggest that employing nuance models may facilitate better insight into personality-outcome relationships.

Limitations and conclusion

While the study has several strengths, such as a novel test of a long personality instrument in a sample and language other than English, it also has limitations. The sample size can be considered meager for the current purpose (Cui & Gong, 2018). A larger sample is not only more representative, increasing the generalizability, but is also more robust to random fluctuations. Relatedly, very little is known about the present sample, which was collected through an internet webpage, making it difficult to gauge how far the findings generalize. Very likely, there is an overrepresentation of high openness and agreeableness (cf. Kajonius & Johnson, 2019).

Furthermore, the life outcomes assessed in the present study were represented by single items framed in a positive manner, potentially leading to increased social desirability tendencies among the participants. Research has demonstrated that items perceived as highly desirable can artificially produce a general factor of personality (Bäckström et al., 2009). This could imply that some of the items (e.g., “to what extent is reward your source of motivation for working?”) could have overestimated predictive validities. Future studies should take into account whether their life outcome measures are framed in a neutral manner.

Another limitation worth mentioning is the low CFA fit indices, which suggest a weak internal structure. Notably, personality structures are known for not showing great model fits. Presumably this could be due to error-proneness in response styles as well as individuals differing slightly in internal factor structures. This is considered a problem within the field of personality research, and some degree of model misfit is to be expected when working with personality assessments (Hopwood & Donnellan, 2010).

Also, both a strength and a limitation is the use of an instrument with a large item-pool, such as IPIP-NEO. This instrument is intended to maximize common variance and scope in facets and factors. Item and residual correlations make it almost impossible to gauge how many nuances are captured by the instrument. It is most likely less than 120, as the items overlap in content, and some are close to reverse-keyed duplicates.

The present study was able to conclude that personality measured at item-level clearly outperformed both facets and factors. Additionally, the predictive advantage was partly contingent on the specific outcome. Future research should encompass a wider range of life outcomes to explore which specific results are most accurately predicted by narrow traits. The Big Five factor models have traditionally produced countless trait-outcome associations and are liked for their practicality and theoretical simplicity. However, this may be the time to put forward the recommendation of making more use of narrower traits, and nuances, especially when enhanced statistical prediction is desired.