Abstract
The time-efficient assessment of moral values using systematically validated measures is a high priority in moral psychology research. However, few such options exist for researchers working with Moral Foundations Theory, one of the most popular theories in moral psychology. Across two samples totaling 1336 participants (756 Australian undergraduates and 580 American Mechanical Turk workers), we used a genetic algorithm-based (GA) approach to construct and validate abbreviated versions of the Moral Foundations Vignettes (MFV), a 90-item scale comprising vignettes of concrete violations of each of the six moral foundations. We constructed 36- and 18-item versions of the MFV, demonstrating close correspondence with the complete MFV, and adequate reliability, predictive validity, and factor-analytic goodness of fit for both abbreviated versions. Overall, the abbreviated scales achieve substantially reduced length with minimal loss of information, providing a useful resource for moral psychology researchers.
Introduction
In recent decades, moral psychology has taken to describing individual variation in people’s moral concerns. Most prominent among these descriptions is Moral Foundations Theory (MFT; Graham et al., 2013; Haidt & Joseph, 2004), according to which differences in moral judgments can be explained by differential endorsement of five (or six) foundational moral values: Care, Fairness, Ingroup loyalty, respect for Authority, Purity, and Liberty. This constellation of moral values has been used to describe moral judgments of phenomena ranging from real-life political issues (Koleva et al., 2012) to responses to sacrificial moral dilemmas (Crone & Laham, 2015), and much else besides.
To date, three systematically validated questionnaire measures have been developed to assess endorsement of these different moral values. Most prominent among these are the Moral Foundations Questionnaire (MFQ; Graham et al., 2011) and the Moral Foundations Sacredness Scale (MFSS; Graham & Haidt, 2012). Recently, an independent group of researchers developed the Moral Foundations Vignettes (MFV; Clifford, Iyengar, Cabeza, & Sinnott-Armstrong, 2015), comprising 90 standardized, concrete moral transgressions covering all six foundations (unlike the MFQ and MFSS, which cover the original five),Footnote 1 and assessing the extent to which respondents disapprove of violations of each foundation.
Although the MFV benefits from being long enough to yield reliable scores, its 90-item length renders it impractical in many settings, prompting researchers to administer subsets of the items to reduce test length (e.g., Ottaviani, Mancini, Provenzano, Collazzoni, & D’Olimpio, 2018; Wagemans, Brandt, & Zeelenberg, 2018).Footnote 2 Without a systematically abbreviated scale, however, researchers will tend to use ad hoc abbreviations that may vary in quality, and insofar as different researchers use different item subsets, results will not be readily comparable.
Given the promise of the MFV as a measure of concern about the six moral foundations, and given that no abbreviated version of the MFV exists, we aimed to create abbreviated MFV scales using rigorous scale-shortening methods to allow time-efficient, standardized assessment of the extent to which people disapprove of violations of the six moral foundations.
Method
Participants
We used two samples to develop and validate the abbreviated versions of the MFV. The first sample (henceforth labeled UG) comprised 756 Australian undergraduate psychology students (age M = 19.56, SD = 3.85, range 17 to 51) participating in a larger study for course credit between January 2015 and November 2016. The second sample (labeled AMT) comprised 580 Mechanical Turk workers (age M = 34.01, SD = 10.07, range 17 to 71), recruited between May 2017 and November 2017. Additional demographic information is available in Supplementary Tables S1 and S2.
Sample partitioning
In developing an abbreviated version of the MFV, we identified three goals that had to be traded off against one another. First, it is desirable to have a sufficiently large training sample to ensure that sampling error has minimal influence on the item selection process. Second, it is desirable to have a sufficiently diverse sample for the abbreviation process to ensure that the abbreviated scales are not just optimized for use in a single, narrow population or context. Finally, related to the second goal, it is desirable to have large quantities of independent data for cross-validation to provide a concrete test of generalizability.
To strike a balance between these competing goals, we assembled a balanced “training sample” comprising 800 participants (i.e., 400 randomly drawn from each of the two samples) which we used to select items and to perform the majority of our initial analyses (described below). The remaining participants form the two “validation samples” (UG validation sample N = 356; AMT validation sample N = 180) in which we replicated the same analyses as conducted in the training sample.
Materials
Participants completed the 90-item version of the Moral Foundations Vignettes (Clifford et al., 2015). Vignettes in the MFV describe third-person moral violations, each relating to a specific moral foundation (e.g., “You see a student copying a classmate’s answer sheet on a makeup final exam” for Fairness). For each item, participants rated how morally wrong the behavior is on a 5-point scale (“Not at all wrong” to “Extremely wrong”).
Additionally, in the UG sample, participants completed a survey of political issue positions on 13 issues (e.g., abortion) taken from Koleva et al. (2012). Participants rated the issues on a 5-point scale (“Morally acceptable in most or all cases” to “Morally wrong in most or all cases”). Given MFT’s widespread use as an explanation for political differences (Graham et al., 2013), these issues served as criterion variables with which to compare the abbreviated versions of the MFV.
Further details of the materials (including descriptions of subtle variations in question wording across samples) are available in the Supplementary Materials.
Scale abbreviation procedure
To abbreviate the scale, we used the GAabbreviate package for R (Sahdra, Ciarrochi, Parker, & Scrucca, 2016). GAabbreviate uses a genetic algorithm (GA) to iterate through a large number of possible shortened scales to try to find the abbreviated scale that maximizes explained variance in the complete scale (for further details, see Sahdra et al., 2016; Schroeders, Wilhelm, & Olaru, 2016; Yarkoni, 2010). GA-based approaches have been shown to be preferable to common manual scale-shortening strategies such as selecting items with the highest loadings, both in terms of maximizing explained variance, and in terms of performance on conventional scale evaluation metrics such as factor analytic fit indices (Schroeders et al., 2016). See the analysis code available at https://osf.io/cmwpv/ for further details.
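The GA procedure itself runs in R via GAabbreviate and is not reproduced here. As a language-agnostic illustration of the underlying technique, the following is a minimal Python sketch of evolving k-item subsets to maximize correspondence with the full-scale score; the simulated single-factor data and the simple correlation-based fitness function are assumptions for illustration only, not the package's actual cost function.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(subset, items):
    """Correlation between the abbreviated sum score and the full-scale sum score."""
    full = items.sum(axis=1)
    short = items[:, subset].sum(axis=1)
    return np.corrcoef(full, short)[0, 1]

def ga_abbreviate(items, k, pop_size=40, generations=60, mut_rate=0.3):
    """Toy genetic algorithm: evolve k-item subsets that best reproduce the full score."""
    n_items = items.shape[1]
    pop = [rng.choice(n_items, size=k, replace=False) for _ in range(pop_size)]
    for _ in range(generations):
        order = np.argsort([fitness(ind, items) for ind in pop])[::-1]
        survivors = [pop[i] for i in order[: pop_size // 2]]   # selection: keep best half
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.choice(len(survivors), size=2, replace=False)
            pool = np.union1d(survivors[a], survivors[b])      # crossover: merge two parents
            child = rng.choice(pool, size=k, replace=False)
            if rng.random() < mut_rate:                        # mutation: swap in a random item
                child[rng.integers(k)] = rng.integers(n_items)
                child = np.unique(child)
                while child.size < k:                          # repair any duplicates
                    child = np.unique(np.append(child, rng.integers(n_items)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=lambda ind: fitness(ind, items))

# simulated data: 300 respondents, 15 items loading on one latent factor
latent = rng.normal(size=(300, 1))
items = latent + rng.normal(scale=1.0, size=(300, 15))
best = ga_abbreviate(items, k=3)
print(sorted(best.tolist()), round(fitness(best, items), 2))
```

The real GAabbreviate cost function additionally penalizes subset size and handles multiple subscales at once; this sketch conveys only the select-crossover-mutate loop at the heart of the approach.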
The recommended MFV set contains 90 vignettes overall, covering all six foundations (including Liberty) and containing three Care-related subscales (Animal harm, Human emotional harm, and Human physical harm). For the Care foundation, we created an overall Care scale by averaging the three Care subscales.Footnote 3 We constructed two abbreviated scales with three and six items per foundation, respectively. We chose three items for the shortest version because this is one more than the minimum number of indicators per factor typically required to identify a CFA model (Kenny & Milan, 2012); moreover, as shown in the analyses below, any further reduction would be unlikely to yield a usable measure for such broad constructs. Our choice of six items for the longer abbreviated scale was motivated by a desire to make a short scale with minimal loss of information relative to the full MFV, but one of comparable length to the commonly used and substantially shorter MFQ (which also uses six items per foundation, compared to the MFV’s average of 15).
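The facet-balancing choice for the Care scale (see Footnote 3) amounts to averaging facet means rather than pooling all items. A small illustration with hypothetical ratings; the item counts per facet are illustrative of an unbalanced design, not the MFV's actual counts:

```python
import numpy as np

# hypothetical per-respondent item ratings, grouped by Care facet
# (item counts here are illustrative, not the MFV's actual counts)
facets = {
    "emotional": np.array([4, 5, 3, 4, 5, 4, 3, 5, 4, 4]),
    "physical":  np.array([5, 4]),
    "animal":    np.array([3, 4, 4, 5, 3]),
}

# pooled mean: implicitly weights each facet by its item count
pooled = np.concatenate(list(facets.values())).mean()

# facet-balanced mean: average the facet means, weighting facets equally
balanced = np.mean([v.mean() for v in facets.values()])

print(round(pooled, 3), round(balanced, 3))
```

With unbalanced item counts the two estimates diverge, which is why the facet-balanced version was used for the overall Care score.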
Results
The final items for both abbreviated scales are presented in the Appendix.Footnote 4 Our primary analyses consisted of four components: (1) examining correlations between the original and abbreviated scales, (2) computing scale reliabilities, (3) performing confirmatory factor analysis (CFA), and (4) estimating a set of regression models in which the different MFV scale versions are used to predict political issue positions.
Correlations between the original and abbreviated 3- and 6-item scales are respectively shown in Figs. 1 and 2 for the training sample, and in Table 1 for the two validation samples. Correlations between the complete and abbreviated scales were all extremely high (rs respectively ≥ .95 and .91 for the 6- and 3-item scales in the training sample, and ≥ .91 and .86 for the 6- and 3-item scales across the two validation samples).
Next, we compared the reliabilities of the complete and abbreviated scales, as shown in Table 2. Unsurprisingly, reliabilities for the abbreviated scales tended to be slightly lower, given that scale length enters the computation of Cronbach’s alpha. However, the reliabilities of the 6-item scales were comparable to those of the MFQ (which contains the same number of items per foundation).Footnote 5 Reliabilities for the 3-item scales were slightly lower than those of the 6-item scales, but still comparable to the MFQ. Moreover, mean inter-item correlations were often slightly higher for the abbreviated scales than for the complete scales. Thus, the abbreviated scales appear adequately reliable despite their brevity.
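The length-dependence of Cronbach's alpha noted above follows directly from its formula, α = k/(k−1) · (1 − Σσ²ᵢ/σ²ₜ). A minimal sketch of both statistics, using simulated single-factor data (illustrative only, not the study's data):

```python
import numpy as np

def cronbach_alpha(items):
    """alpha = k/(k-1) * (1 - sum of item variances / variance of the sum score)."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

def mean_interitem_r(items):
    """Average off-diagonal Pearson correlation among items."""
    r = np.corrcoef(np.asarray(items, dtype=float), rowvar=False)
    k = r.shape[0]
    return (r.sum() - k) / (k * (k - 1))

rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
items = latent + rng.normal(scale=1.0, size=(500, 12))

full = cronbach_alpha(items)
short = cronbach_alpha(items[:, :3])   # the same kinds of items, shorter scale
print(round(full, 2), round(short, 2), round(mean_interitem_r(items), 2))
```

Even with identical mean inter-item correlations, the shorter scale yields a lower alpha, which is why mean inter-item correlation is the fairer length-independent comparison reported above.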
Next, we performed CFAs for the complete and abbreviated scales, summarized in Table 3. Although the models for the scales of differing lengths are not formally comparable, given that they are based on non-identical covariance matrices, the abbreviated measures performed identically to or slightly better than the complete scales according to the root mean square error of approximation (RMSEA), and substantially better according to the comparative fit index (CFI) and Tucker-Lewis index (TLI). Note also that the abbreviated versions approached or exceeded conventional criteria for well-fitting models (RMSEA < .06; CFI > .95; Hu & Bentler, 1999). For additional CFA results such as factor loadings, see the R Notebook available at https://osf.io/cmwpv/.
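For reference, the three fit indices compare the fitted model (subscript $M$) against its degrees of freedom and a baseline independence model (subscript $B$); their standard definitions (not specific to this study) are:

$$\mathrm{RMSEA}=\sqrt{\max\!\left(\frac{\chi^2_M-df_M}{df_M\,(N-1)},\,0\right)}$$

$$\mathrm{CFI}=1-\frac{\max\!\left(\chi^2_M-df_M,\,0\right)}{\max\!\left(\chi^2_B-df_B,\;\chi^2_M-df_M,\;0\right)},\qquad \mathrm{TLI}=\frac{\chi^2_B/df_B-\chi^2_M/df_M}{\chi^2_B/df_B-1}$$

RMSEA rewards parsimony per degree of freedom, while CFI and TLI index improvement over the independence model, which is why the two families of indices can rank the full and abbreviated models differently.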
Finally, to test the construct validity of the abbreviated scales, we fitted a set of ordinary least squares (OLS) regressions in which scores on all six moral foundations were used to predict UG participants’ positions on various political issues (i.e., 39 models: 3 scale versions by 13 political issues).Footnote 6 The amount of variance explained by each of the three scale versions across these issues is summarized in Fig. 3 (analogous structural equation models are described in the SI). Across the set of regressions, the 6- and 3-item scales explained slightly less variance in the criterion variables than the full scale; nevertheless, they retained on average 90% and 87%, respectively, of the full scale’s R2 across the 13 issues, once again suggesting that much of the information in the complete scale is preserved despite dramatic reductions in length.
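The comparison above amounts to fitting the same criterion regression with full-scale and short-scale predictor sets and taking the ratio of the resulting R² values. A schematic sketch with simulated scores; the attenuation model (short scores = full scores plus extra measurement noise) is an assumption for illustration, not the study's data-generating process:

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit with an intercept term."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(2)
n = 500
# simulated foundation scores: short-scale scores are noisier versions of full-scale scores
full_scores = rng.normal(size=(n, 6))
short_scores = full_scores + rng.normal(scale=0.4, size=(n, 6))
# simulated political-issue criterion driven by the six foundations
issue = full_scores @ rng.normal(size=6) + rng.normal(size=n)

r2_full = r_squared(full_scores, issue)
r2_short = r_squared(short_scores, issue)
print(round(r2_short / r2_full, 2))   # proportion of full-scale R^2 retained
```

The extra measurement error in the short-scale predictors attenuates R², so the retained-variance ratio is the natural summary of how much predictive information survives abbreviation.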
Discussion
The present study aimed to create abbreviated versions of the MFV to allow time-efficient, standardized assessment of people’s endorsement of the six moral foundations. To this end, we constructed two abbreviated versions of the MFV comprising 40% and 20% of the original items. Across wide-ranging analyses, these shortened scales corresponded closely to the original scales, and exhibited promising levels of reliability, as well as factor analytic and predictive validity. Here, we close with a brief discussion of recommendations for, and limitations of, use of these new abbreviated scales.
Firstly, we note that the abbreviated scales we present were validated in an Australian undergraduate sample and American Mechanical Turk sample. One obvious direction for future studies is to validate these scales in other populations. However, we believe our results provide strong evidence for the validity of the abbreviated scales in samples that (for better or worse) resemble a substantial proportion of the most frequently studied populations in psychological research.
Overall, we recommend the 6-item version of the MFV for accurate results with substantially reduced testing time. While the 3-item scale largely possesses adequate psychometric properties (especially in comparison to shortened MFQ scales of comparable length), it is limited by its somewhat lower reliability. The drawbacks of this lower reliability can be partly remedied by using latent variable models (Cole & Preacher, 2014; Westfall & Yarkoni, 2016).Footnote 7 Finally, we note that brief scales might not be suitable for all settings. In some cases, researchers will face strict time constraints yet also want to collect many observations per participant to enhance precision and generalizability. In such contexts, researchers might consider other time-efficient alternatives such as image stimuli (Crone, Bode, Murawski, & Laham, 2018). In many cases, however, we believe the abbreviated MFV scales presented here will be a useful resource for moral psychology researchers.
Open practices statement
The study reported in this paper was not preregistered. Analysis code and data from the UG sample are available at https://osf.io/cmwpv/. Data from the AMT sample are stored in a separate OSF repository and are available upon request.
Notes
1. MFQ-style items for Liberty do exist as a separate measure (Iyer, Koleva, Graham, Ditto, & Haidt, 2012); however, their factor structure (particularly with respect to the rest of the MFQ) has not received extensive attention.
2. A recent review of scale lengths in papers published in the Journal of Personality and Social Psychology revealed that the average scale length for measures of latent constructs is less than seven items (Flake, Pek, & Hehman, 2017). This suggests that many researchers would be reluctant to use the average 15 items per moral foundation required by the full MFV.
3. If we instead averaged over all Care items to create a Care scale, the Care facets would be weighted in proportion to the number of items measuring each facet, implying that Animal harm is more than twice as important as Physical harm, and Emotional harm more than five times so (simply because of the unbalanced number of items). Given that we are aware of no literature justifying such an imbalanced weighting of different facets of Care, and given that measures such as the MFQ emphasize breadth, we opted to weight each of the three facets equally.
4. Note that the 3-item version only partially overlaps with the 6-item version: 15 of the 18 items in the 3-item version appear amongst the 36 items in the 6-item version.
5. In the original MFQ validation paper, alphas across the five subscales ranged from .65 (Fairness) to .84 (Purity), with an average of .73 (see Table 1 in Graham et al., 2011), while a meta-analysis of MFQ subscale reliabilities by Tamul et al. (2020) yielded estimates between .64 (Fairness) and .83 (Purity). It is also worth noting that alphas for the 3-item scales described here are substantially better than the 15-item MFQ described in Table 1 of Graham et al. (2011), whose alphas ranged from .39 to .70, with an average of .59.
6. Note that unlike the preceding analyses, these analyses were conducted just once, on the entire UG sample (because such data are only available in this sample). These analyses are thus not replicated in the AMT sample.
7. As shown in the Supplementary Materials, however, the 3-item scales may be vulnerable to overly optimistic R2 estimates in SEMs, compared to the longer scales.
References
Clifford, S., Iyengar, V., Cabeza, R., & Sinnott-Armstrong, W. (2015). Moral Foundations Vignettes: A standardized stimulus database of scenarios based on moral foundations theory. Behavior Research Methods, 47(4), 1178–1198. https://doi.org/10.3758/s13428-014-0551-2
Cole, D. A., & Preacher, K. J. (2014). Manifest variable path analysis: Potentially serious and misleading consequences due to uncorrected measurement error. Psychological Methods, 19(2), 300–315. https://doi.org/10.1037/a0033805
Crone, D. L., Bode, S., Murawski, C., & Laham, S. M. (2018). The Socio-Moral Image Database (SMID): A novel stimulus set for the study of social, moral and affective processes. PLoS ONE, 13(1), e0190954. https://doi.org/10.1371/journal.pone.0190954
Crone, D. L., & Laham, S. M. (2015). Multiple moral foundations predict responses to sacrificial dilemmas. Personality and Individual Differences, 85(4), 60–65. https://doi.org/10.1016/j.paid.2015.04.041
Flake, J. K., Pek, J., & Hehman, E. (2017). Construct validation in social and personality research: Current practice and recommendations. Social Psychological and Personality Science, 8(4), 370–378. https://doi.org/10.1177/1948550617693063
Graham, J., & Haidt, J. (2012). Sacred values and evil adversaries: A moral foundations approach. In M. Mikulincer & P. R. Shaver (Eds.), The Social Psychology of Morality: Exploring the Causes of Good and Evil (pp. 1–26). New York: APA Books.
Graham, J., Haidt, J., Koleva, S. P., Motyl, M., Iyer, R., Wojcik, S. P., & Ditto, P. H. (2013). Moral Foundations Theory: The pragmatic validity of moral pluralism. Advances in Experimental Social Psychology, 47, 55–130. https://doi.org/10.1016/B978-0-12-407236-7.00002-4
Graham, J., Nosek, B. A., Haidt, J., Iyer, R., Koleva, S. P., & Ditto, P. H. (2011). Mapping the moral domain. Journal of Personality and Social Psychology, 101(2), 366–385. https://doi.org/10.1037/a0021847
Haidt, J., & Joseph, C. M. (2004). Intuitive ethics: How innately prepared intuitions generate culturally variable virtues. Daedalus, 133(4), 55–66. https://doi.org/10.1162/0011526042365555
Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1–55. https://doi.org/10.1080/10705519909540118
Iyer, R., Koleva, S. P., Graham, J., Ditto, P. H., & Haidt, J. (2012). Understanding libertarian morality: The psychological dispositions of self-identified libertarians. PLoS ONE, 7(8), e42366. https://doi.org/10.1371/journal.pone.0042366
Kenny, D. A., & Milan, S. (2012). Identification: A nontechnical discussion of a technical issue. In R. H. Hoyle (Ed.), Handbook of Structural Equation Modeling (pp. 145–163). Guilford Press. Retrieved from https://psycnet.apa.org/record/2012-16551-009
Koleva, S. P., Graham, J., Iyer, R., Ditto, P. H., & Haidt, J. (2012). Tracing the threads: How five moral concerns (especially Purity) help explain culture war attitudes. Journal of Research in Personality, 46(2), 184–194. https://doi.org/10.1016/j.jrp.2012.01.006
Ottaviani, C., Mancini, F., Provenzano, S., Collazzoni, A., & D’Olimpio, F. (2018). Deontological morality can be experimentally enhanced by increasing disgust: A transcranial direct current stimulation study. Neuropsychologia, 119, 474–481. https://doi.org/10.1016/j.neuropsychologia.2018.09.009
Sahdra, B. K., Ciarrochi, J., Parker, P., & Scrucca, L. (2016). Using genetic algorithms in a large nationally representative American sample to abbreviate the Multidimensional Experiential Avoidance Questionnaire. Frontiers in Psychology, 7. https://doi.org/10.3389/fpsyg.2016.00189
Schroeders, U., Wilhelm, O., & Olaru, G. (2016). Meta-heuristics in short scale construction: Ant colony optimization and genetic algorithm. PLoS ONE, 11(11), e0167110. https://doi.org/10.1371/journal.pone.0167110
Tamul, D., Elson, M., Ivory, J., Hotter, J., Lanier, M., Wolf, J., & Martínez-Carrillo, N. (2020). Moral foundations’ methodological foundations: A systematic analysis of reliability in research using the Moral Foundations Questionnaire. PsyArXiv Preprint. https://doi.org/10.31234/osf.io/shcgv
Wagemans, F. M. A., Brandt, M. J., & Zeelenberg, M. (2018). Disgust sensitivity is primarily associated with purity-based moral judgments. Emotion, 18(2), 277–289. https://doi.org/10.1037/emo0000359
Westfall, J., & Yarkoni, T. (2016). Statistically controlling for confounding constructs is harder than you think. PLoS ONE, 11(3), e0152719. https://doi.org/10.1371/journal.pone.0152719
Yarkoni, T. (2010). The abbreviation of personality, or how to measure 200 personality scales with 200 items. Journal of Research in Personality, 44(2), 180–198. https://doi.org/10.1016/j.jrp.2010.01.002
Appendix
Cite this article
Crone, D.L., Rhee, J.J. & Laham, S.M. Developing brief versions of the Moral Foundations Vignettes using a genetic algorithm-based approach. Behav Res 53, 1179–1187 (2021). https://doi.org/10.3758/s13428-020-01489-y