Testing the Moral Foundations Questionnaire within a Muslim society: a study among young adults in Pakistan

This paper examines the psychometric properties of the 30-item Moral Foundations Questionnaire among a sample of 370 young adults between the ages of 18 and 26 years who were born in Punjab and who had lived there since their birth. Initial analyses did not support the internal consistency reliability of the five scales of moral predispositions proposed by this measure. Exploratory factor analysis and confirmatory factor analysis identified two factors that distinguished not between areas of moral predisposition, but between the two styles of items (relevance and judgement), each of which included all five predispositions. Correlations with personal religiosity suggested that the scale comprising 12 judgement items (α = .88) was susceptible to religious sentiment, but that the scale comprising 12 relevance items (α = .89) was not. The scale of 12 relevance items is commended for further testing and application within Muslim societies.

Within the context of Moral Foundations Theory (MFT), Haidt and Joseph (2004) postulated the multidimensionality of morality, identifying four core predispositions: to prevent suffering (care), to respect hierarchies (authorities), to act reciprocally (fairness), and to behave purely (purity). Subsequently, Haidt and Graham (2007) added a fifth core predisposition: to affirm affinity to one's group (loyalty). Building on these foundations, Graham et al. (2011) proposed the Moral Foundations Questionnaire (MFQ), comprising sets of six items to measure each of the five predispositions, together with two unscored 'catch' questions designed to screen inattentive or random responding. The 30 scored items are presented in two sections. In the first section, participants are invited to assess how relevant they find specific issues when making a moral decision (known as the relevance set of items). In the second section, participants are invited to assess their agreement with moral statements (known as the judgement set of items). Examples of the first set of items include: whether or not someone suffered emotionally (care), whether or not someone acted unfairly (fairness), whether or not someone did something to betray his or her group (loyalty), whether or not an action caused chaos or disorder (authority), and whether or not someone violated standards of purity and decency (purity). Examples of the second set of items include: compassion for those who are suffering is the most crucial virtue (care), justice is the most important requirement for a society (fairness), it is more important to be a team player than to express oneself (loyalty), respect for authority is something all children need to learn (authority), and people should not do things that are disgusting, even if no one is harmed (purity). A shorter 20-item version of the MFQ has also gained currency, with four items assessing each of the five predispositions (see Iurino & Saucier, 2020).
There are also examples of studies that have utilised the set of 15 relevance items alone, or the 15 judgement items alone (Kivikangas et al., 2021; Klein et al., 2018).
MFT proposes that morality develops through the interaction between a small set of innate instincts and socially constructed virtues. Some have offered conceptual critiques of MFT. For example, Suhler and Churchland (2011) argued that MFT is inconsistent with recent advances in neuroscience. Musschenga (2013) argued that MFT's descriptive nature lacks a normative political theory about how politics should work in multifarious, pluralist societies. Haste (2013) argued that MFT does not adequately address the complexities of affect and cognition processing. Some have accepted the plurality of MFT but proposed alternate numbers and combinations of moral foundations, while others have argued that morality is better understood as just one foundation concerned with perceptions of dyadic harm (Gray et al., 2012). Another critique has proposed that morality is the solution to social conflict and that game theory can be used to identify types of cooperation (Curry et al., 2019).
While the theoretical rationale for MFT may generally be considered largely sound and coherent, the empirical evidence for the MFQ is somewhat less secure. For example, confirmatory factor analyses have reported only reasonable levels of fit with the hypothesised five-factor model (Iurino & Saucier, 2020; Kim et al., 2012; Kivikangas et al., 2017; Nilsson & Erlandsson, 2015; Yilmaz et al., 2016). Internal consistency reliabilities for the five scales have returned low alpha coefficients (see, for example, Graham et al., 2009, 2011, 2012; Harper & Hogue, 2019). Exploratory factor analysis has tended to support a two-factor solution (Franks & Scherr, 2015; Graham et al., 2011; Johnson et al., 2014; Kugler et al., 2014; Milesi, 2017; Rempala et al., 2016). The two-factor solution distinguished between two individualising scales (care and fairness) and three binding scales (loyalty, authority, and sanctity). Empirically, the individualising factor was linked with liberal political values, while the binding factor was linked with conservative political values, although the precise meanings of liberal and conservative may vary across cultures (Kivikangas et al., 2021). Confirmatory factor analysis, as originally demonstrated by Graham et al. (2011), indicated that a five-factor model demonstrated better fit than models with one, two, three, or six factors, or a hierarchical model involving the individualising and binding components as supra-ordinate factors.
Some studies have reported separate psychometric properties of the judgement items set alone, measuring concrete application of moral judgements to specific situations, and the relevance items set alone, measuring first-order attitudes toward the moral judgements themselves (Curry et al., 2019). There are mixed findings on the performance of these individual scales. Some studies reported no significant differences in goodness of fit as evidenced by the Root Mean Square Error of Approximation (RMSEA) and the Standardised Root Mean Residual (SRMR) (Davies et al., 2014; Graham et al., 2011). Of the studies that reported differences between the two scales, the relevance scale most frequently showed higher internal consistency and a higher Comparative Fit Index (CFI) than the judgement scale and the scale as a whole (Curry et al., 2019; Du, 2019; Yalçındağ et al., 2019). One study reported that the judgement scale indicated superior RMSEA and SRMR (Yilmaz et al., 2016).

Research question
There is one aspect of the design of the MFQ that has not featured highly in the examination and critique of the instrument. This feature concerns the distinctive natures of the two components of the instrument: one concerned with judgement and one concerned with relevance. One component is more personal than the other. The judgement component invites participants to align themselves with attitudinal predispositions and includes some 'I' statement items. Here, the assessment is turned toward the subjective evaluation of the self. The relevance component is less personal and concerns the more objective evaluation of general principles.
There is good reason to hypothesise that these two styles of questions may function differently in different cultural contexts, especially when these cultural contexts have been shaped by distinctive beliefs that may prioritise respect for religion, respect for others, and respect for self. Such beliefs may, for example, inhibit denial of expected good qualities within the subjective evaluation of the self, but influence less strongly the objective evaluation of more general propositions. This hypothesis emerges from a sequence of recent studies that has explored within Muslim societies the psychometric properties of measures concerned with the evaluation of religion (Erken & Francis, 2021; Francis & Lewis, 2016; Francis et al., 2006, 2013; Musharraf et al., 2014), the evaluation of other people (Akhtar et al., in press a), and the evaluation of the self (Akhtar et al., in press b).
The MFQ has been employed by a small number of studies in Muslim societies, but some of these studies do not provide data on the psychometric properties of the instrument (Alper & Yilmaz, 2020; Karimi-Malekabadi & Baboli, 2022). Examining the Turkish version of the MFQ, Yilmaz et al. (2016) reported that confirmatory factor analysis found the original five-factor model was significantly better than the hierarchical two-factor model, the three-factor model, and the two-factor model. Some of the five scales generated poor Cronbach alpha coefficients: care/harm, .60; fairness/cheating, .57; loyalty/betrayal, .66; authority/subversion, .78; purity/degradation, .76. Also working with Turkish translations of the MFQ, Yalçındağ et al. (2019) reported some poor alpha coefficients across three samples for the five scales (ranging from .61 to .79). In this study, exploratory factor analysis yielded a three-factor solution, although confirmatory factor analysis indicated the best fit for a five-factor solution despite low fit indices and high error coefficients.
Employing the Persian version of the MFQ in Iran, Mikani and Tabatabaei (2021) reported acceptable Cronbach alpha coefficients for all five scales: care/harm, .66; fairness/cheating, .68; loyalty/betrayal, .71; authority/subversion, .69; purity/degradation, .80. Examining three datasets using the Persian version of the MFQ in Iran, Nejat and Hatami (2019) reported that the Cronbach (1951) alphas were relatively low for care, fairness, and loyalty, and that exploratory factor analysis generated a two-factor solution on two datasets and a three-factor solution on the third. Examining a new Persian translation of the MFQ in Iran, Atari et al. (2020a) reported that exploratory factor analysis generated a five-factor solution different from the one originally proposed by Graham et al. (2011), and that the extracted factors were not readily interpretable. Confirmatory factor analysis suggested that neither the original structure nor the new structure reached satisfactory fit indices. Two studies containing Muslim society samples within a large number of cross-cultural samples reported mixed results on the performance of the five-factor model across western and non-western cultures (Doğruyola et al., 2019; Iurino & Saucier, 2020).
Against this background, the aim of the present paper is to explore more fully the psychometric properties of the MFQ proposed by Graham et al. (2011) within a Muslim society.

Procedure
The MFQ was included within the online survey Parental Attachment and Life designed for completion by young adults between the ages of 18 and 26 who were born in Punjab and had lived there all their life. Participants were assured of confidentiality. The project was approved by the Research Ethics Committee of the Advanced Studies Research Board of GC University Lahore.

Instrument
The 32 items of the MFQ (Graham et al., 2011) were presented in two parts of 16 items each. Part one (the relevance items) was fronted by the instruction, 'When you decide whether something is right or wrong, to what extent are the following considerations relevant to your thinking?' Each item was rated on a six-point scale: not at all relevant (0), not very relevant (1), slightly relevant (2), somewhat relevant (3), very relevant (4), and extremely relevant (5). The 'catch' question in this set was 'Whether or not someone was good at maths'. Part two (the judgement items) was fronted by the instruction, 'Please read the following sentences and indicate your agreement or disagreement'. Each item was rated on a six-point scale: strongly disagree (0), moderately disagree (1), slightly disagree (2), slightly agree (3), moderately agree (4), and strongly agree (5). The 'catch' item in this set was 'it is better to do good than to do bad'.
Personal religiosity was assessed by the question: 'How close do you feel to your religion?' rated on an 11-point scale from very low (0) to very high (10).
Sex was coded: male (1) and female (2). Age was coded in years from 18 to 26. Participants under 18 or over 26 were excluded from the survey.

Participants
The Parental Attachment and Life survey was fully completed by 370 participants who met the profile of young adults between the ages of 18 and 26 who were born in Punjab and had lived there since their birth. The participants comprised 151 males, 217 females, and two who preferred not to say: 45 were aged 18 or 19, 131 were aged 20 or 21, 116 were aged 22 or 23, 65 were aged 24, 25, or 26, and 13 preferred not to say.

Analysis
The data were initially analysed in SPSS using the frequency, correlation, factor, and reliability routines, assigning items to the five scales as specified in the original instrument (Graham et al., 2011). The first stage of data analysis examined the psychometric properties of these five scales. Since the results suggested low correlations between individual items and the sum of the other five items within the scale, and low reliabilities, the second stage of data analysis employed factor analyses to interrogate the factor structure of the instrument in this sample. An initial exploratory factor analysis (principal components extraction), using the default settings, extracted six components. Parallel analysis (O'Connor, 2000) indicated that three components would be a better solution. Two confirmatory factor analyses specifying extraction of five and three components were then run and the varimax rotated component matrices compared. Since these varimax rotated components suggested two primary components, further analyses refined these two components to create two separate scales. Factor one comprised only items concerning the objective evaluation of more general principles (the relevance items). Factor two comprised only items concerning the subjective evaluation of the self (the judgement items).
Table 1 presents the scale properties for the five scales of the MFQ in terms of the means, standard deviations, and internal consistency reliability expressed by the alpha coefficient (Cronbach, 1951). Two of these scales fail to meet the threshold of .65. Table 2 focuses more closely on the items comprising the five scales in terms of the correlations between the individual items and the sum of the other five items, and in terms of item endorsement presented as the sum of the very relevant and extremely relevant responses and the sum of the moderately agree and strongly agree responses.
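The parallel analysis step in the analysis strategy above can be illustrated with a minimal sketch. It is not the SPSS program of O'Connor (2000) used in the study: it assumes normally distributed random comparison data, a 95th percentile criterion, and a simple Jacobi routine for the eigenvalues, and the function names and the simulated two-factor dataset are hypothetical. Components are retained for as long as the observed eigenvalue exceeds the corresponding random-data threshold.

```python
import math
import random


def correlation_matrix(rows):
    """Pearson correlation matrix for a list of equal-length response rows."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    centred = [[r[j] - means[j] for j in range(k)] for r in rows]
    norms = [math.sqrt(sum(c[j] ** 2 for c in centred)) for j in range(k)]
    return [[sum(c[i] * c[j] for c in centred) / (norms[i] * norms[j])
             for j in range(k)] for i in range(k)]


def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-12):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-15:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta ** 2 + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


def parallel_analysis(rows, n_sims=50, percentile=95, seed=0):
    """Number of components whose eigenvalue exceeds the random-data percentile."""
    rng = random.Random(seed)
    n, k = len(rows), len(rows[0])
    observed = symmetric_eigenvalues(correlation_matrix(rows))
    simulated = []
    for _ in range(n_sims):
        noise = [[rng.gauss(0, 1) for _ in range(k)] for _ in range(n)]
        simulated.append(symmetric_eigenvalues(correlation_matrix(noise)))
    index = min(n_sims - 1, math.ceil(percentile / 100 * n_sims) - 1)
    thresholds = [sorted(s[j] for s in simulated)[index] for j in range(k)]
    retained = 0
    for obs, thr in zip(observed, thresholds):
        if obs <= thr:
            break
        retained += 1
    return retained, observed, thresholds


def demo_data(n=200, seed=1):
    """Hypothetical data: six variables driven by two independent factors."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        f1, f2 = rng.gauss(0, 1), rng.gauss(0, 1)
        rows.append([0.8 * f1 + 0.6 * rng.gauss(0, 1) for _ in range(3)]
                    + [0.8 * f2 + 0.6 * rng.gauss(0, 1) for _ in range(3)])
    return rows


retained, observed, thresholds = parallel_analysis(demo_data())
print("components retained:", retained)
```

Applied to the 30 MFQ items, the same comparison would use 370 cases and 30 variables; a pure-Python routine of this kind would be slow at that size, so it is offered only to show the logic of the comparison.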

Reliability analyses
Regarding the scale concerning care/harm, the item with the highest correlation was 'One of the worst things a person could do is hurt a defenceless animal' (.54). One item fell below the threshold of .35: 'It can never be right to kill a human being' (.34). The highest endorsement was given to the following two items: 'It can never be right to kill a human being' (65%) and 'One of the worst things a person could do is hurt a defenceless animal' (62%).
Regarding the scale concerning fairness/cheating, the item with the highest correlation was 'Justice is the most important requirement for a society' (.49). One item fell below the threshold of .35 'I think it's morally wrong that rich children inherit a lot of money while poor children inherit nothing' (.25). The highest endorsement was given to the following two items: 'Justice is the most important requirement for a society' (74%) and 'When the government makes laws, the number one principle should be ensuring that everyone is treated fairly' (63%).
Regarding the scale concerning loyalty/betrayal, no item reached the threshold of .35. The highest endorsement was given to the following two items: 'I am proud of my country's history' (58%) and 'People should be loyal to their family members' (52%).
Regarding the scale concerning authority/subversion, the item with the highest correlation was 'Respect for authority is something all children need to learn' (.46). Three items fell below the threshold of .35. The highest endorsement was given to the following two items: 'Men and women each have different roles to play in society' (65%) and 'Respect for authority is something all children need to learn' (60%).
Regarding the scale concerning purity/degradation, the item with the highest correlation was 'People should not do things that are disgusting, even if no one is harmed' (.44). One item fell below the threshold of .35: 'I would call some acts wrong on the grounds that they are unnatural' (.31). The highest endorsement was given to the following two items: 'Chastity is an important and valuable virtue' (56%) and 'People should not do things that are disgusting, even if no one is harmed' (55%).
Table 3 examines the correlations between each of the five scales and two personal variables: sex and age. These data demonstrate that females recorded higher scores than males on four of the five scales (care/harm, fairness/cheating, authority/subversion, and purity/degradation). These sex differences are consistent with the findings of Atari et al. (2020b) concerning sex differences recorded on the MFQ across 67 countries, namely that women consistently scored more highly than men on care, fairness, and purity, but that sex differences in loyalty and authority were variable across cultures. Scores decline with age on three of the five scales (care/harm, fairness/cheating, and purity/degradation).
The results from this first stage of data analysis suggest that the five scales may need to be used with caution in Muslim societies, given the overall low internal consistency reliabilities.
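The three statistics reported in this section (the alpha coefficient, the correlation between each item and the sum of the other items, and item endorsement) can be sketched as follows. This is an illustration rather than the SPSS reliability routine used in the study. It assumes complete responses coded 0 to 5, with endorsement counted as a response of 4 or 5; the function names and the four-respondent dataset are invented for the example.

```python
from statistics import mean, variance


def pearson(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """Alpha = k/(k-1) * (1 - sum of item variances / variance of total score)."""
    k = len(rows[0])
    item_variances = [variance([r[j] for r in rows]) for j in range(k)]
    total_variance = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


def item_rest_correlations(rows):
    """Correlation of each item with the sum of the remaining items."""
    k = len(rows[0])
    return [pearson([r[j] for r in rows], [sum(r) - r[j] for r in rows])
            for j in range(k)]


def endorsement(rows, item, top=(4, 5)):
    """Percentage of respondents choosing one of the two highest categories."""
    return 100 * sum(r[item] in top for r in rows) / len(rows)


# Invented responses: four respondents (rows) by two items (columns).
responses = [[1, 2], [2, 1], [3, 3], [4, 4]]

alpha = cronbach_alpha(responses)
rest = item_rest_correlations(responses)
weak_items = [j for j, r in enumerate(rest) if r < 0.35]  # threshold used in the text

print(round(alpha, 3), [round(r, 2) for r in rest], weak_items)
print(endorsement(responses, 0))
```

For a six-item MFQ scale each row would hold six responses, and the item-rest correlation would then be the correlation between one item and the sum of the other five, as in Table 2.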

Confirmatory factor analyses and scale development
Specifying a five-factor structure showed a poor fit to the data in this sample (Table 4). Of the 30 items, 23 loaded most heavily on the first component, including all the items in the care and sanctity scales, and five of the six items in the fairness scale. Five of the 30 items loaded most heavily on the second component, three from the loyalty scale and one each from the fairness and authority scales. Two items loaded most heavily on the third component, one from the loyalty scale and one from the authority scale. No items loaded most heavily on the remaining two components.

Table 4 (excerpt). Item | Loadings in printed order
People should be loyal to their family members, even when they have done something wrong | .39 .48
It is more important to be a team player than to express oneself | .37
Note. Principal components extraction, with number of components set to five, followed by varimax rotation. Factor loadings < .3 suppressed. Loadings in bold type indicate highest loading.

Specifying a three-factor structure (as suggested by a parallel analysis) resulted in 13 items loading on the first component, 14 on the second component, and three on the third component (Table 5). Both main components had items from all five scales of the MFQ, but what so clearly differentiates these two components is that the first component comprised only the relevance items, and the second component comprised only the judgement items.

Table 5 (excerpt). Scale | Item | Loadings in printed order
Care | One of the worst things a person could do is hurt a defenseless animal | .38 .63
Fair | When the government makes laws, the number one principle should be ensuring that everyone is treated fairly | .32 .63
Authority | If I were a soldier and disagreed with my commanding officer's orders, I would obey anyway because that is my duty | .59
Loyalty | People should be loyal to their family members, even when they have done something wrong | .58
Loyalty | I am proud of my country's history | .56 .38
Fair | I think it's morally wrong that rich children inherit a lot of money while poor children inherit nothing | .53
Loyalty | It is more important to be a team player than to express oneself | .47
Loyalty | Whether or not someone's action showed love for his or her country | .62
Authority | Whether or not someone showed a lack of respect for authority | .39 .59
Authority | Whether or not someone conformed to the traditions of society | .35 .56
Note. Principal components extraction, with number of components set to three, followed by varimax rotation. Factor loadings < .3 suppressed. Loadings in bold type indicate highest loading.
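The varimax rotation named in the table notes can be sketched with Kaiser's pairwise planar rotations. This is an illustration under stated assumptions rather than the SPSS routine used in the study: it applies the raw varimax criterion without Kaiser normalisation, and the function name and the four-item loading matrix are hypothetical. The example takes a loading matrix with perfect simple structure, mixes the two components by a 30 degree rotation, and checks that varimax recovers the simple structure.

```python
import math


def varimax(loadings, max_sweeps=100, tol=1e-10):
    """Raw varimax rotation of a loading matrix (rows = items, columns = components)."""
    rotated = [row[:] for row in loadings]
    p, m = len(rotated), len(rotated[0])
    for _ in range(max_sweeps):
        largest_angle = 0.0
        for a in range(m - 1):
            for b in range(a + 1, m):
                u = [r[a] ** 2 - r[b] ** 2 for r in rotated]
                v = [2 * r[a] * r[b] for r in rotated]
                sum_u, sum_v = sum(u), sum(v)
                c_term = sum(ui * ui - vi * vi for ui, vi in zip(u, v))
                d_term = 2 * sum(ui * vi for ui, vi in zip(u, v))
                # Angle that maximises the varimax criterion for this pair of columns.
                phi = 0.25 * math.atan2(d_term - 2 * sum_u * sum_v / p,
                                        c_term - (sum_u ** 2 - sum_v ** 2) / p)
                largest_angle = max(largest_angle, abs(phi))
                cos, sin = math.cos(phi), math.sin(phi)
                for r in rotated:
                    r[a], r[b] = r[a] * cos + r[b] * sin, -r[a] * sin + r[b] * cos
        if largest_angle < tol:
            break
    return rotated


# Hypothetical simple structure: two items per component, no cross-loadings.
simple = [[0.8, 0.0], [0.7, 0.0], [0.0, 0.6], [0.0, 0.9]]

# Mix the two components by a 30 degree rotation to hide the structure.
angle = math.radians(30)
mixed = [[x * math.cos(angle) - y * math.sin(angle),
          x * math.sin(angle) + y * math.cos(angle)] for x, y in simple]

recovered = varimax(mixed)
for row in recovered:
    print([round(value, 3) for value in row])
```

Because the rotation is orthogonal, each item's communality (the sum of its squared loadings) is unchanged; only the distribution of loadings across components changes.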
After dropping the three items that loaded on the third component, Table 6 confirmed the two-factor structure of the remaining 27 items. On the basis of this solution, and accepting now loadings of .54 and above, the internal consistency reliability of two 12-item scales was tested. The first scale comprising 12 relevance items recorded alpha = .89. The second scale comprising 12 judgement items recorded alpha = .88. The correlation between these two scales was r = .29, p < .001. Both scales recorded a positive correlation with sex, indicating higher scores among females: scale one, r = .23, p < .01; scale two, r = .14, p < .01. Both scales recorded a negative correlation with age, indicating lower scores among older participants: scale one, r = −.11, p < .05; scale two, r = −.14, p < .01. Of particular interest, however, is the way in which these two scales related in different ways to personal religiosity. While scores recorded on the relevance scale were independent of personal religiosity (r = .03, ns), scores recorded on the judgement scale were positively correlated with personal religiosity (r = .23, p < .001).

Table 6 (excerpt). Item | Loading
One of the worst things a person could do is hurt a defenseless animal | .60
I would call some acts wrong on the grounds that they are unnatural | .54
I think it's morally wrong that rich children inherit a lot of money while poor children inherit nothing | .53

The results from the second stage of data analysis suggest that the measurement of MFT, employing many of the items proposed by Graham et al. (2011), may be reconceptualised as comprising two components: one component concerning the relevance items and the other component concerning the judgement items. Both components embrace the five core predispositions proposed by MFT: to prevent suffering (care), to respect hierarchies (authorities), to act reciprocally (fairness), to affirm affinity to one's group (loyalty), and to behave purely (purity). The relatively low correlation between these two components (r = .29) indicates that they are accessing distinctive underlying constructs.
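The contrast between the two correlations with personal religiosity can be checked approximately with a Fisher z test. This is a swapped-in approximation, not the t-based test that SPSS reports, and it assumes that every correlation is based on all 370 participants, which the text does not state (some participants did not report sex or age). The function names are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_z_test(r, n):
    """Two-sided test of H0: rho = 0 using Fisher's z transformation."""
    z = math.atanh(r) * math.sqrt(n - 3)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


def fisher_interval(r, n, level=0.95):
    """Approximate confidence interval for a correlation coefficient."""
    critical = NormalDist().inv_cdf(0.5 + level / 2)
    centre = math.atanh(r)
    half_width = critical / math.sqrt(n - 3)
    return math.tanh(centre - half_width), math.tanh(centre + half_width)


N = 370  # assumed pairwise sample size

for label, r in [("relevance scale with religiosity", 0.03),
                 ("judgement scale with religiosity", 0.23),
                 ("relevance scale with judgement scale", 0.29)]:
    z, p = fisher_z_test(r, N)
    low, high = fisher_interval(r, N)
    print(f"{label}: r = {r:.2f}, z = {z:.2f}, p = {p:.4f}, "
          f"95% CI [{low:.2f}, {high:.2f}]")
```

Under these assumptions the interval for r = .03 spans zero, while the intervals for r = .23 and r = .29 do not, which matches the pattern of significance reported in the text.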

Conclusion
Set against the background of MFT, the aim of the present paper was to explore the MFQ proposed by Graham et al. (2011) within a Muslim society. The first stage of data analysis examined the properties of the five scales specified in the original instrument as proposed by Graham et al. (2011). Of these five scales, two failed to record an alpha coefficient at the threshold of .65 (loyalty/betrayal and authority/subversion) and a further two failed to reach the threshold of .70 (purity/degradation and fairness/cheating). The fifth scale recorded an alpha coefficient of .70 (care/harm). Further detailed examination of the correlations between the individual items and the sum of the other five items within the scale reported some low correlations highlighting poor homogeneity within scales. These results from the first stage of data analysis suggested that the five scales as proposed by Graham et al. (2011) may need to be used and interpreted with caution in a Muslim society.
The second stage of data analysis employed exploratory factor analysis (principal component extraction), parallel analysis, and confirmatory factor analysis to re-examine the structure of responses to the original pool of 30 items within a Muslim society. These analyses suggested that the responses failed to recover the proposed structure of the five core predispositions proposed by MFT (care/harm, fairness/cheating, loyalty/betrayal, authority/subversion, and purity/degradation). Rather, the identified two-factor solution recovered the distinction between the two sets of items that comprise the MFQ: the judgement component and the relevance component. The first factor drew together 12 items from the relevance component, generating an alpha coefficient of .89, while the second factor drew together 12 items from the judgement component, generating an alpha coefficient of .88. Both components embraced the five core predispositions proposed by MFT: to prevent suffering (care), to respect hierarchies (authorities), to act reciprocally (fairness), to affirm affinity to one's group (loyalty), and to behave purely (purity), indicating that the five components cohere to generate a unidimensional construct. The correlation between these two components was quite low (r = .29), indicating that they are accessing distinctive underlying constructs.
The rationale for testing the psychometric properties of the MFQ specifically within a Muslim society was grounded in a group of earlier studies which had suggested that Muslim beliefs, and especially the notion of respect for religion, respect for others, and respect for self, may influence the way in which participants respond to some items. The specific hypothesis was advanced that such beliefs may inhibit denial of expected good qualities within the subjective evaluation of the self, but influence less strongly the objective evaluation of more general predispositions. Moreover, it was hypothesised that within the MFQ one set of questions is more personal than the other. The judgement component invites participants to align themselves with attitudinal predispositions and includes some 'I' statement items. Here, the assessment is turned toward the subjective evaluation of self. The relevance component is less personal and concerns the more objective evaluation of general principles. Now if this were the case, we would anticipate different patterning of responses for the two components within a Muslim context, and we might expect the judgement component to more closely relate to individual differences in the participants' religiosity than is the case for the relevance component. This is consistent with the finding that scores recorded on the judgement scale were positively correlated with personal religiosity, while scores recorded on the relevance scale were independent of personal religiosity.
These results from the second stage of data analysis suggested that, within a Muslim society, the most effective use of the MFQ could be achieved by scoring 12 of the 15 relevance items (as identified by factor 1 in Table 6). These items capture all five core predispositions proposed by MFT, achieve a high level of internal consistency reliability (α = .89), and are independent of individual differences in religiosity. The finding that the two styles of items (the judgement component and the relevance component) do not work satisfactorily together within a Muslim society has reduced the overall number of items beyond the point at which it would be reasonable to disaggregate the hypothesised five core predispositions. Future research could now attempt to develop more items within the relevance set to reflect the five core predispositions and then to test the factor structure of this new set of items.
The limitations with the present study include being based in one specific Muslim society (Punjab), being confined to a narrowly defined subgroup of the population (young adults between the ages of 18 and 26 who were born in Punjab and had lived there all their life), and involving only one set of data (N = 370). The implications of the findings, however, deserve testing by further replication studies.