Although bariatric surgery is the most effective treatment for severe obesity in terms of long-term weight loss and reduction in medical comorbidities (Ahmed et al., 2018; Jakobsen et al., 2018; O’Brien et al., 2019), some patients experience suboptimal surgical outcomes (King et al., 2018, 2020). The American Society for Metabolic and Bariatric Surgery (ASMBS) recommends a presurgical psychological assessment to identify psychosocial risk factors and provide recommendations to both the patient and multidisciplinary team that aim to facilitate the best outcome for the patient (Sogg et al., 2016). The ASMBS recommendations indicate that presurgical assessment should include psychometric testing in addition to the clinical interview, with the rationale that test data can aid in forming a more comprehensive clinical impression, provide information that may not be sufficiently covered or available within the time restrictions of the interview, and reveal information about the patient that may not have been disclosed during the interview (Sogg et al., 2016).

The Minnesota Multiphasic Personality Inventory (MMPI) instruments have been commonly used in bariatric surgery clinics as broadband measures to assess a range of relevant psychosocial domains (Bauchowitz et al., 2005; Walfish et al., 2007). The newest version of the test, the MMPI-3 (Ben-Porath & Tellegen, 2020a), was released in November 2020. Extensive research with the previous version, the MMPI-2-Restructured Form (MMPI-2-RF; Ben-Porath & Tellegen, 2008/2011), demonstrated good psychometric properties in bariatric samples, including predictive utility, reliability and validity, and replicable comparison group data (Marek et al., 2021; Tarescavage et al., 2013). The goals of the MMPI-3 revision were to collect new normative data representative of the 2020 census and enhance content coverage while building on the previous MMPI instruments’ strong foundations (Ben-Porath & Tellegen, 2020a). The MMPI-3 consists of 335 items comprising 52 scales, which include 10 validity scales, 3 higher-order scales, 8 restructured clinical scales, 26 specific problem scales (within the domains of somatic/cognitive, internalizing, externalizing, and interpersonal), and 5 PSY-5 scales.

To date, the MMPI-2-RF has been a particularly useful psychometric tool in bariatric surgery psychological evaluations because it has bariatric norms. Currently, the MMPI-3 has a female-only bariatric surgery candidate population as a standard comparison group; more data are needed to create a male bariatric surgery candidate standard comparison group. Thus, the current project, which examines the validation of the MMPI-3 in both female and male bariatric surgery candidates, is important to long-time users of the MMPI-2 and the MMPI-2-RF.

Ben-Porath and Tellegen (2020b) provide extensive data analyses demonstrating that the empirical correlates of MMPI-3 scale scores are comparable to those obtained with MMPI-2-RF versions of these scales. Although these analyses included a presurgical spine surgery candidate sample, findings were not reported for bariatric surgery candidates. Marek et al. (2021) have demonstrated clinical utility of the new MMPI-3 Eating Concerns Specific Problems scale in assessing eating pathology in a postoperative bariatric surgery sample. Specifically, elevated scores on the Eating Concerns scale were associated with 6-year postoperative percent weight regain and higher scores on the Eating Disorder Examination-Questionnaire. However, the psychometric properties of the broader set of MMPI-3 scales within a preoperative bariatric sample have yet to be examined.

The purpose of the current study was to establish whether the MMPI-3 is comparable to the MMPI-2-RF in a sample of patients seeking bariatric surgery. We also aimed to report reliability data for all MMPI-3 scale scores. Last, we sought to explore associations between commonly used self-report symptom measures and the MMPI-3 Emotional/Internalizing Dysfunction, Behavioral/Externalizing Dysfunction, and a few additional Specific Problems Scales (such as Eating Concerns) for the purpose of examining convergent and discriminant validity patterns. The self-report symptom inventories utilized in this study included the Patient Health Questionnaire-9 (PHQ-9; Spitzer et al., 1999) to assess depression, Generalized Anxiety Disorder-7 (GAD-7; Spitzer et al., 2006) to assess anxiety, Alcohol Use Disorders Identification Test-Consumption (AUDIT-C; Bush et al., 1998) to assess alcohol use, and Eating Disorder Examination-Questionnaire (EDE-Q; Fairburn & Beglin, 1994) to assess eating disorder psychopathology.

We hypothesized that MMPI-3 scale scores would be similar across genders, with the exception of the Behavioral/Externalizing Dysfunction scales, on which men tend to score 5 to 6 T score points higher than women (Ben-Porath & Tellegen, 2020b). We also hypothesized that MMPI-3 scale scores would be similar to the MMPI-2-RF scale scores of a different sample of patients seeking bariatric surgery (reported in Marek et al., 2014). It was also hypothesized that MMPI-3 scale scores would demonstrate reliability coefficients comparable to those reported in other samples (Ben-Porath & Tellegen, 2020b); specifically, that Cronbach’s alpha would be ≥ 0.70 for larger scales and that mean inter-item correlations would be ≥ 0.15 for shorter Specific Problems scales. Last, it was hypothesized that MMPI-3 scales assessing facets of depression, anxiety, alcohol/substance use, and disordered eating would demonstrate substantial correlations [at least 0.30, a moderate correlation as defined by Cohen (1988)] with commonly used, brief self-report symptom measures assessing similar constructs (i.e., PHQ-9, GAD-7, AUDIT-C, and EDE-Q, respectively).



Participants included patients who were seeking bariatric surgery at a large academic medical center in the Midwest and, as part of standard medical care, met with a psychologist for an evaluation prior to having bariatric surgery. Participants were required to be 18 years or older and English-speaking.

The sample consisted of 649 patients. A total of 14 patients were removed from the study sample because they invalidated the MMPI-3 based on criteria outlined in the MMPI-3 Technical Manual (Ben-Porath & Tellegen, 2020b). Of those who produced valid protocols (n = 635), 502 (79.1%) were female and 133 (20.9%) were male. The mean age was 41.93 (SD = 11.05). Race breakdown was as follows: 65.5% were white, 26.6% were Black, and the rest (7.9%) identified as another race. In terms of highest level of education attained, 4.7% did not complete high school or a GED, 23.1% completed high school or had a GED, 31.7% completed some college, 10.9% had an associate’s degree, 17.5% had a bachelor’s degree, and 12.1% had a master’s degree or higher. The majority of patients (90.9%) were presenting for an initial bariatric surgery and 9.1% were presenting for a revision. In terms of surgery preference, 44.9% desired sleeve gastrectomy, 40.6% desired Roux-en-Y gastric bypass, 13.4% were undecided, and 1.1% indicated adjustable gastric banding (although this procedure is no longer offered at the study hospital). In our study, the two MMPI-3 scales assessing underreporting, Uncommon Virtues (L) and Adjustment Validity (K), were elevated in 22.2% and 37.8% of patients, respectively.


Minnesota Multiphasic Personality Inventory-3 (MMPI-3)

The MMPI-3 (Ben-Porath & Tellegen, 2020a) is a 335-item broadband self-report measure of psychopathology and personality normed based on projected 2020 census demographics. It takes approximately 25–35 min to complete via computer administration. The test is comprised of 52 scales, which include 10 validity scales and 42 substantive scales. The 42 substantive scales include 3 higher-order scales, 8 restructured clinical scales, 26 specific problem scales (within the domains of somatic/cognitive, internalizing, externalizing, and interpersonal), and 5 PSY-5 scales. The scale scores of the MMPI-3 yield good reliability and validity across samples (Ben-Porath & Tellegen, 2020b) and among presurgical psychological evaluation samples (Marek et al., 2022).

Patient Health Questionnaire-9 (PHQ-9)

The PHQ (Spitzer et al., 1999) is a self-report diagnostic tool for psychological disorders that assesses the areas of depression, anxiety, eating, alcohol, and somatoform symptoms and was derived from the Primary Care Evaluation of Mental Disorders (PRIME-MD), a diagnostic tool created by Pfizer following the publication of the DSM-III. The PHQ-9 (Kroenke & Spitzer, 2002) is the 9-item depression module from the PHQ and has commonly been used in medical populations, including bariatric surgery patients. The PHQ-9 has yielded evidence of good validity, reliability, and utility as a depression screening tool with bariatric surgery patients (Cassin et al., 2013; Marek et al., 2016). Cronbach’s alpha for the PHQ-9 in the current sample was 0.86.

Generalized Anxiety Disorder-7 (GAD-7)

The GAD-7 (Spitzer et al., 2006) is a 7-item self-report screener for anxiety. Although the GAD-7 has demonstrated good reliability in bariatric surgery samples (Atwood et al., 2021; de Zwaan et al., 2014; Koehler et al., 2020), there is fairly limited research on its psychometric properties in bariatric settings (Marek et al., 2016). Sockalingam et al. (2017) found that anxiety scores as assessed by the GAD-7 decreased at 1 and 2 years after bariatric surgery as compared to presurgery scores. Cronbach’s alpha for the GAD-7 in the current sample was 0.91.

Eating Disorder Examination-Questionnaire (EDE-Q)

The Eating Disorder Examination (EDE; Fairburn & Cooper, 1993) is a semi-structured clinical interview that measures psychopathology of eating disorders, specifically concerns with shape, weight, and binge eating behaviors (Guest, 2000). The EDE includes four subscales: Dietary Restraint, Eating Concern, Shape Concern, and Weight Concern. The EDE-Q (Fairburn & Beglin, 1994) was derived from the EDE for clinical and research purposes and includes 28 items that address eating disorder behaviors and cognitive symptoms. The EDE-Q generates the same four subscales (Dietary Restraint, Eating Concern, Weight Concern, and Shape Concern) as the EDE plus a global score. The global score measures the incidence and severity of eating disorder behaviors (Rand-Giovannetti et al., 2020). The EDE-Q is easily accessible in the public domain and can be administered in less than 10 min (Marek et al., 2016). It is a widely used clinical instrument that has been validated in the bariatric surgery population, demonstrating adequate to good psychometric properties, including adequate reliability (Cronbach’s alpha ranges from 0.72 to 0.95) and good concurrent/criterion-related validity among bariatric surgery samples (Elder et al., 2006; Kalarchian et al., 2000; Marek et al., 2016).

Alcohol Use Disorders Identification Test-Consumption (AUDIT-C)

The AUDIT-C (Bush et al., 1998) is a modified version of the Alcohol Use Disorders Identification Test (AUDIT; Bush et al., 1998). The AUDIT is a 10-item measure that was developed by the World Health Organization (WHO) in an effort to measure alcohol use and behaviors and alcohol-associated problems (King et al., 2012). Among bariatric patients, the instrument demonstrates good reliability and convergent validity coefficients (King et al., 2012; Marek et al., 2016; Mitchell et al., 2015). The AUDIT-C is a shorter validated instrument that measures alcohol consumption in the past 12 months using 3 items (Marek et al., 2016; Suzuki et al., 2012). The measure is a widely accepted screening tool that is often included as part of the presurgical process for patients pursuing bariatric surgery. AUDIT-C scores range from 0 to 12 with a score of ≥ 4 for men and of ≥ 3 for women indicating positive for hazardous alcohol use or an active alcohol use disorder (Bush et al., 1998; Ibrahim et al., 2019). Cronbach’s alpha for the AUDIT-C in the current sample was 0.41, likely owing to limited variability and small item count for the scale. The mean inter-item correlation was 0.29 indicating good reliability.


As part of standard clinical care at the study hospital, patients met with a psychologist for an evaluation prior to having bariatric surgery. These presurgical evaluations are risk assessments that aim to identify psychosocial factors that may diminish the outcome of bariatric surgery. The presurgical evaluations consisted of 1 h of psychological testing with questionnaires administered via computer, and then, immediately after, 1 h of face-to-face interview with the psychologist. Approximate administration times for the questionnaires are as follows: 25–35 min for MMPI-3 (Ben-Porath & Tellegen, 2020a), 5 min for PHQ-9, 5 min for GAD-7, 5–10 min for EDE-Q, and < 5 min for AUDIT-C (Marek et al., 2016). Since the onset of the COVID-19 pandemic, and for the entirety of this study, both the testing and interview portions of the presurgical evaluations have been conducted remotely. As noted above, as part of testing, patients were administered the following self-report measures: MMPI-3, PHQ-9, GAD-7, EDE-Q, AUDIT-C, and a health psychology demographics questionnaire (age, sex, marital status, race/ethnicity, highest level of education, history of bariatric surgery, and type of bariatric surgery being pursued). The MMPI-3 was administered via Pearson Assessment’s Q-global, which is a secure web-based scoring and reporting system. All other measures were administered via REDCap (Harris et al., 2009, 2019), which is a secure web application for building and managing online surveys and databases. Remote administration of psychological testing was proctored as recommended (Corey & Ben-Porath, 2020). Patients were evaluated consecutively between November 2020 and May 2021. Use of data was approved by the medical center’s Institutional Review Board.

Statistical Analyses

Due to the large amount of data in relation to the sample size, a conservative correction method was deemed appropriate for interpreting results. For each set of analyses where p-values were interpreted, a Bonferroni-corrected alpha was calculated when determining statistical significance.

Means and standard deviations for the MMPI-3 scale scores and external criteria broken down by gender were calculated and placed in Table 1. To identify whether there were meaningful gender differences, independent samples t-tests were calculated. Cohen’s ds (0.20 = small effect, 0.50 = medium effect, 0.80 or greater = large effect) were also calculated for every independent samples t-test (Cohen, 1988). Because of numerous comparisons, a Bonferroni-corrected alpha was calculated (0.05/58) and differences were only deemed statistically significant if the p-value was less than 0.0009.
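The two quantities described above, the Bonferroni-corrected threshold and Cohen's d, can be illustrated with a minimal sketch. The function names are ours, and the means and standard deviations are hypothetical placeholders rather than values from Table 1; only the number of comparisons (58) and the group sizes (133 men, 502 women) come from the text. The sketch assumes the pooled-standard-deviation form of Cohen's d.

```python
import math


def bonferroni_alpha(familywise_alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return familywise_alpha / n_tests


def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of two independent groups."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)


def effect_label(d):
    """Map |d| onto Cohen's (1988) conventional benchmarks."""
    size = abs(d)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "medium"
    if size >= 0.20:
        return "small"
    return "below small"


# 58 comparisons, as in the gender analyses described above.
threshold = bonferroni_alpha(0.05, 58)

# Hypothetical T-score summaries (NOT values from Table 1); group sizes match the sample.
d = cohens_d(52.0, 10.0, 133, 47.0, 10.0, 502)
print(f"threshold = {threshold:.4f}, d = {d:.2f} ({effect_label(d)})")
```

A difference is then flagged only when its p-value falls below `threshold`, and the effect size is reported regardless of significance.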

Table 1 Means and standard deviations for MMPI-3 scales and other self-report criteria (N = 635)

Means and standard deviations from the comparable MMPI-2-RF scales reported in Marek et al. (2014) and the current sample’s combined gender MMPI-3 scale scores are reported in Table 2. Independent samples t-tests were calculated to compare scale scores along with Cohen’s d to establish effect sizes. A Bonferroni-corrected alpha was calculated (0.05/47) and differences were only deemed statistically significant if the p-value was less than 0.0010.
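Because the MMPI-2-RF comparison relies on published means and standard deviations rather than raw scores, the t statistic has to be computed from summary statistics. The sketch below shows one way this could be done, assuming a pooled-variance (Student's) t-test; the text does not specify the variant used. All input values, including the comparison-sample size, are hypothetical placeholders, and the p-value uses a normal approximation that is reasonable only because the degrees of freedom are large. With SciPy available, `scipy.stats.ttest_ind_from_stats` would give an exact t-based p-value instead.

```python
import math
from statistics import NormalDist


def pooled_t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Student's t statistic and degrees of freedom from summary statistics."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def approx_two_sided_p(t_stat):
    """Normal approximation to the two-sided p-value (large df only)."""
    return 2 * NormalDist().cdf(-abs(t_stat))


# Hypothetical inputs: the comparison-sample size and both SDs are placeholders.
t_stat, df = pooled_t_from_summary(62.0, 12.0, 1000, 50.0, 10.0, 635)
p_value = approx_two_sided_p(t_stat)
significant = p_value < 0.05 / 47  # Bonferroni threshold for 47 comparisons
print(f"t({df}) = {t_stat:.2f}, significant after correction: {significant}")
```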

Table 2 Scale score comparability between valid MMPI-2-RF and MMPI-3 scale scores

Listed in Table 3 are reliability and standard error of measurement estimates. Internal consistency coefficients (mean inter-item correlations and Cronbach’s alphas) and standard errors of measurement were calculated. Kuder-Richardson-20 calculations (Kuder & Richardson, 1937) were used to estimate Cronbach’s alphas due to the dichotomous nature of the MMPI-3 items (True/False).
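For readers less familiar with these indices, the following sketch shows how KR-20, the mean inter-item correlation, and the standard error of measurement could be computed for a set of True/False items. The response matrix is toy data invented for illustration, the function names are ours, and the SEM uses the conventional formula SD × √(1 − reliability), which we assume matches the approach taken here.

```python
import math
from itertools import combinations
from statistics import fmean, pvariance


def kr20(responses):
    """KR-20 for rows of 0/1 item responses (one row per respondent)."""
    k = len(responses[0])
    item_means = [fmean(row[i] for row in responses) for i in range(k)]
    sum_pq = sum(p * (1 - p) for p in item_means)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum_pq / total_var)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def mean_inter_item_r(responses):
    """Average Pearson correlation over all pairs of items."""
    columns = list(zip(*responses))
    return fmean(pearson_r(a, b) for a, b in combinations(columns, 2))


def sem(sd, reliability):
    """Standard error of measurement in the same units as sd."""
    return sd * math.sqrt(1 - reliability)


# Toy data: 6 respondents x 4 dichotomous items (1 = keyed response).
toy = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 1, 1, 1],
]
print(round(kr20(toy), 3), round(mean_inter_item_r(toy), 3), sem(10.0, 0.75))
```

For dichotomous items, KR-20 is algebraically identical to Cronbach's alpha, which is why the two labels can be used interchangeably for these scales.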

Table 3 Minnesota Multiphasic Personality Inventory–3 Substantive Scale Internal Consistencies and Standard Errors of Measurement in a Bariatric Surgery Sample (N = 635)

Pearson Product–Moment Correlations were then calculated among the external criteria and between the MMPI-3 scale scores and the external criteria (Table 4) to examine convergent and discriminant validity. A Bonferroni-corrected alpha was calculated (0.05/37) and correlations were only deemed statistically significant if the p-value was less than 0.0014.
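The convergent validity check can be sketched as follows: compute a Pearson correlation between a scale and a criterion, convert it to a t statistic with n − 2 degrees of freedom, and compare its magnitude against the 0.30 benchmark stated in the hypotheses. The paired scores are invented for illustration and the helper names are ours.

```python
import math
from statistics import fmean


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ss_x = sum((a - mx) ** 2 for a in x)
    ss_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(ss_x * ss_y)


def t_for_r(r, n):
    """t statistic (df = n - 2) for testing whether a correlation differs from zero."""
    return r * math.sqrt((n - 2) / (1 - r**2))


def is_convergent(r, minimum=0.30):
    """True when |r| meets the moderate-correlation benchmark."""
    return abs(r) >= minimum


# Hypothetical paired scores for 8 patients (scale T scores vs. a symptom total).
scale_t_scores = [45, 52, 60, 48, 71, 55, 66, 50]
criterion_totals = [3, 6, 11, 4, 17, 8, 12, 7]

r = pearson_r(scale_t_scores, criterion_totals)
threshold = 0.05 / 37  # Bonferroni-corrected alpha for 37 correlations
print(f"r = {r:.2f}, convergent: {is_convergent(r)}, alpha = {threshold:.4f}")
```

With a sample of 635, even a correlation of exactly 0.30 yields a t statistic near 8, so the 0.30 benchmark is the more demanding criterion than the corrected significance threshold.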

Table 4 Pearson product-moment correlations between MMPI-3 scale scores and other self-report measures


Descriptive Statistics

Presented in Table 1 are descriptive statistics for the MMPI-3 scale scores and external criteria broken down by gender. With regard to the MMPI-3 scale scores, men scored statistically significantly higher than women on Behavioral/Externalizing Dysfunction, Antisocial Behaviors, Juvenile Conduct Problems, Impulsivity, Aggression, Cynicism, and Disconstraint. Effect sizes for these differences were in the small to modest range. No other statistically significant gender differences on the MMPI-3 scale scores or external criteria were observed. Thus, data were combined for further analyses.

Table 2 provides scale score comparisons between a sample of patients seeking bariatric surgery who took the MMPI-2-RF (Marek et al., 2014) vs. the current sample that took the MMPI-3. Those who took the MMPI-2-RF scored trivially to modestly higher on the following scales: Infrequent Responses, Somatic Complaints, Neurological Complaints, and Aggressiveness. There was also a substantial difference between Malaise scores such that those who took the MMPI-2-RF scored, on average, 12 T score points higher than those who took the MMPI-3.

Reliability Analyses

Regarding the internal consistency coefficients reported in Table 3, the median Cronbach’s alpha estimate for the Higher-Order Scales was 0.77. The median reliability estimate among the Restructured Clinical Scales was 0.80. The Specific Problems Scales yielded a median of 0.72. The median internal consistency estimate among the Personality-Psychopathology-5 Scales was 0.75.

Mean inter-item correlations for the Higher-Order Scales yielded a median of 0.13. Regarding the Restructured Clinical Scales, the mean inter-item correlation median was 0.20. Mean inter-item correlations for the Specific Problems Scales yielded a median of 0.27. Among the Personality-Psychopathology-5 Scales, mean inter-item correlations yielded a median of 0.14.

Standard errors of measurement (SEMs) are expressed in T-scores in Table 3. Among the Higher-Order Scales, these SEMs yielded a median of 3.63. Among the Restructured Clinical Scales, the median SEM was 4.16. With regard to the Specific Problems Scales, SEMs yielded a median of 4.61. Among the Personality-Psychopathology-5 Scales, the median SEM was 4.37.

Pearson Product-Moment Correlations were conducted on the external criteria measures. The inter-correlations between the external criteria measures are reported in Supplemental Table A. A substantial correlation was observed between the PHQ-9 and GAD-7 (r = 0.79), implying that both measures are likely capturing a similar construct vs. discriminating between depression and anxiety. Likewise, large inter-correlations were observed within EDE-Q subscales.

Validity Analyses

Most of the Emotional/Internalizing Dysfunction scales of the MMPI-3 were meaningfully associated with the PHQ-9, GAD-7, and most EDE-Q subscales (except for Restraint; see Table 4). Some discriminant patterns can be observed despite high inter-correlations among the external criteria. For instance, Low Positive Emotions scores were more strongly associated with the PHQ-9 than with the GAD-7. Scores on the Dysfunctional Negative Emotions scale and most of its facet scales, such as Worry and Negative Emotionality/Neuroticism, were more strongly associated with the GAD-7 than with the PHQ-9. The Eating Concerns scale on the MMPI-3 was most strongly associated with the Eating Concerns subscale on the EDE-Q, though still meaningfully associated with the other EDE-Q subscales (except Restraint) and, as would be expected, not meaningfully correlated with the PHQ-9, GAD-7, or AUDIT-C, providing evidence of good discriminant validity. The MMPI-3 Substance Abuse scale was most strongly associated with the AUDIT-C, and the remaining MMPI-3 scale scores evidenced good discriminant validity with the AUDIT-C.


Use of the MMPI instruments is empirically supported in bariatric surgery settings (Marek et al., 2013, 2014; Tarescavage et al., 2013). This study adds to the existing literature by being the first to appraise the recently released MMPI-3 within a presurgical bariatric sample. Our findings indicate that the MMPI-3 is a psychometrically sound measure for presurgical bariatric psychological evaluations as discussed next.

MMPI-3 scale score differences between genders map onto most other samples reported in the MMPI-3 Technical Manual (Ben-Porath & Tellegen, 2020b). Both men and women produce comparable T score means and standard deviations except for some of the Behavioral/Externalizing Dysfunction scales, where men tended to score higher than women. This pattern was also observed on the MMPI-2-RF in bariatric surgery-seeking samples (Marek et al., 2013, 2014; Tarescavage et al., 2013). These differences likely reflect actual differences rather than test bias; however, further studies using external criteria similar to Marek et al.’s (2014) with a bariatric surgery-seeking sample are needed to directly address this question.

With regard to a comparison of MMPI-3 and MMPI-2-RF scales in bariatric surgery candidates, scores were similar on both the MMPI-2-RF and the MMPI-3, reflecting substantial cross-version comparability. This finding is consistent with data reported in Appendix E of the MMPI-3 Technical Manual (Ben-Porath & Tellegen, 2020b). Of note, scores on most of the MMPI-3 Somatic/Cognitive scales were modestly to substantially lower than those on their MMPI-2-RF counterparts. Ben-Porath and Tellegen (2020b) report a comparison between the normative samples of the MMPI-2-RF (collected in the mid-1980s) and the MMPI-3 (collected in 2020), which demonstrates that there was a substantial increase in scores on the Somatic/Cognitive scales for the MMPI-3 normative sample. Thus, cross-version differences on the Somatic/Cognitive scales are accounted for by normative shifts, with MMPI-3 scale scores likely providing a more accurate reflection of somatic/cognitive functioning in medical samples compared to the MMPI-2-RF. Patients in the current sample produced MMPI-3 scores that are more in line with the MMPI-3 normative sample, a finding similar to those reported among patients seeking spine surgery (Marek et al., 2022).

Reliability data in the current sample are generally good. These findings are consistent with those reported in the MMPI-3 Technical Manual (Ben-Porath & Tellegen, 2020b) for the normative sample for most scales. Some reliability estimates were lower than conventional thresholds for adequate reliability (e.g., for the Substance Abuse scale). This is largely due to a restricted range of scores among patients seeking bariatric surgery, which attenuates reliability estimates. For some scales, such as the Eating Concerns scale, mean inter-item correlation coefficients are a better estimate of internal consistency. This is because Cronbach’s alpha is impacted by the number of items on a scale. Nonetheless, most scales on the MMPI-3 yielded good reliability estimates in this sample. Standard errors of measurement correct for the attenuating effects of range restriction. Most standard errors of measurement across the Higher-Order, Restructured Clinical, and PSY-5 Scales in this sample fall just at or below 5 T score points. This includes the Specific Problems Scales that had lower reliability estimates, but some fall slightly above 6–7 T score points, a finding that is consistent with standard errors of measurement reported for the MMPI-3 normative sample (Ben-Porath & Tellegen, 2020b).
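The dependence of alpha on scale length can be made concrete with the standardized-alpha formula, α = k·r̄ / (1 + (k − 1)·r̄), where k is the number of items and r̄ the mean inter-item correlation. The sketch below holds r̄ fixed at 0.27 (the median mean inter-item correlation reported above for the Specific Problems Scales) and varies k. The item counts are hypothetical, and standardized alpha is only an approximation to the KR-20 values reported in Table 3.

```python
def standardized_alpha(n_items, mean_r):
    """Standardized alpha implied by an item count and a mean inter-item correlation."""
    return n_items * mean_r / (1 + (n_items - 1) * mean_r)


# Same inter-item coherence, different (hypothetical) scale lengths.
for n_items in (3, 5, 10, 20):
    print(n_items, round(standardized_alpha(n_items, 0.27), 2))
```

Under these assumptions a short scale can fall below 0.70 while a longer scale with the same inter-item coherence exceeds it, which is why the mean inter-item correlation is the more informative index for brief scales.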

Although the MMPI-3 Substance Abuse (SUB) scale correlated moderately with the AUDIT-C, the association was weaker than some of the other convergent correlations with other criteria. This is likely due to the prevalence of alcohol use in the sample and the scope of both the AUDIT-C and the SUB scale on the MMPI-3. For instance, the AUDIT-C is intended to be a screener, not a full measure, of problematic alcohol use. The screener contains only three items, which limits the ability to assess the full range and severity of problematic alcohol use. Moreover, the AUDIT-C assesses only problematic alcohol use and not the wide range of substance abuse problems that the MMPI-3 SUB scale is able to capture. Finally, the SUB scale of the MMPI-3 is a face valid measure. Because approximately 20% of patients seeking bariatric surgery engage in an underreporting response style (Ambwani et al., 2013; Marek et al., 2015), scores on the SUB scale are likely range restricted as well. Nonetheless, the pattern of correlations indicates that SUB scores can detect problematic alcohol and substance use in this population.

Regarding validity, there was evidence of convergent correlations between the Emotional/Internalizing Dysfunction scales and external criteria. For example, the MMPI-3 Emotional/Internalizing Dysfunction scales that assess Demoralization (and specific facets) correlated substantially with both the PHQ-9 and GAD-7. Low Positive Emotions correlated more strongly with the PHQ-9 than with the GAD-7. Dysfunctional Negative Emotions (and its facets, notably Worry and Anxiety) correlated more highly with the GAD-7 than with the PHQ-9. The Eating Concerns scale on the MMPI-3 correlated most highly with the Eating Concern scale on the EDE-Q.

An important consideration is that inter-correlations were high between the PHQ-9 and GAD-7 (r = 0.79) and among EDE-Q subscales, findings that are not unique to this sample (Gideon et al., 2016; Rahman et al., 2022; Taube-Schiff et al., 2015; Teymoori et al., 2020). The PHQ-9 and GAD-7 scores and the EDE-Q Global score all had their highest correlation with the MMPI-3 Demoralization (RCd) scale, indicating these measures are likely saturated with demoralization variance, limiting their ability to identify discriminating correlations between depression, anxiety, and core eating disorder constructs. This is likely due to the heterogeneity and symptom overlap of the diagnostic criteria and distress typically caused by eating disorder constructs (e.g., body image concerns). Indeed, Teymoori et al. (2020) also found a high correlation between the PHQ-9 and GAD-7 in their sample of patients who had sustained a traumatic brain injury. They stated that this may suggest a “unidimensional construct such that both instruments were part of a general common factor” (p. 12) because they were unable to independently explain the variance of the construct (Teymoori et al., 2020). They hypothesized that this may be due to there being a few similar items on both instruments, as well as the fact that depression and anxiety share some underlying aspects, including negative affect and negative bias in information processing (Teymoori et al., 2020). Interestingly, the Eating Concerns (EAT) scale on the MMPI-3 was not meaningfully associated with the EDE-Q Restraint subscale. This finding is consistent with Marek et al. (2021, 2022), who examined associations between the MMPI-3 EAT scale and the EDE-Q subscales in a postoperative bariatric sample. This likely reflects content overlap between the EAT scale and the other EDE-Q subscales of Eating Concern, Weight Concern, and Shape Concern. The EDE-Q Restraint subscale, on the other hand, overlaps in content with just one EAT scale item.

In terms of generalizability, the demographic makeup of our sample is similar to that of other bariatric surgery centers; that is, it was primarily comprised of women and the average age was between 40 and 45 (Welbourn et al., 2018). The majority (65.5%) of our sample was white, 26.6% was Black, and 7.9% identified as another race. Of note, our results indicate a lack of gender differences on most MMPI-3 scales with the exception of some Behavioral/Externalizing Dysfunction (BXD) scales where men scored 4–5 T score points higher than women. These findings are consistent with similar patterns in the Technical Manual (Ben-Porath & Tellegen, 2020b) across other samples and likely reflect true gender differences. MMPI-3 scale scores in the current study are indeed similar to MMPI-2-RF scale scores in samples of patients seeking bariatric surgery.

The MMPI-3 assesses a broad range of relevant psychosocial domains and can be used to inform clinical impressions and recommendations in the preoperative bariatric surgery evaluation process. Our study demonstrates that the substantive scales of the MMPI-3 are reliable, comparable to their MMPI-2-RF counterparts, and have good convergent validity with extra-test measures assessing depression, anxiety, alcohol use, and eating disorder psychopathology. Additional research is needed to replicate our findings and continue to ascertain the psychometric qualities of the MMPI-3 in bariatric surgery settings. It is recommended that future research utilize different external criteria measures, such as data from the clinical interview and medical records, as well as outcome data to examine whether patterns of predictive validity evidenced with the MMPI-2-RF further generalize to the MMPI-3.

In terms of clinical utility and deciding whether to add, continue to use, or eliminate the MMPI from bariatric psychological evaluation protocols, there are several points that clinicians may want to consider. First, the PHQ-9 and GAD-7 were derived from the Primary Care Evaluation of Mental Disorders (PRIME-MD) Patient Questionnaire, which was developed for screening in primary care clinics to make referrals. These questionnaires, along with other brief symptom measures, typically do not assess beyond DSM criteria, and no qualifications are required for administration. The MMPI utilizes construct-related assessment and assesses a broad range of psychological functioning with norms; however, qualifications and adequate training are required to use the MMPI. The MMPI has also demonstrated incremental validity. For example, Martin-Fernandez et al. (2021) found that MMPI-2-RF scale scores accounted for an additional 3%–24% of the variability in postoperative eating behaviors and quality of life in bariatric surgery patients, above and beyond other preoperative variables including the EDE-Q, Binge Eating Scale, and interview portion of the psychological evaluation. Presurgical psychological evaluations are higher-stakes evaluations and, given the literature on the tendency for this population to present favorably (Ambwani et al., 2013; Marek, 2014), the validity scales can be helpful to assess for underreporting. Information on underreporting gathered from the validity scales can be integrated into the written report, communicated with members of the multidisciplinary team who also care for the patient, and discussed with the patient prior to or after the psychological evaluation. Discussion with the patient could help providers relay the importance of being open and honest during appointments in order for the team to make individualized, meaningful recommendations for the patient and ultimately increase the chances of optimal outcomes. Sharing this information with a patient can also help providers relay that they are interested in knowing if/when a patient is experiencing challenges before or after surgery so that they may be able to intervene with additional support/intervention.