Structural differences in psychopathy between women and men: a latent modeling perspective

Research on sex differences in psychopathy indicates that men generally exhibit higher psychopathy scores than women. Measurement equivalence is an important prerequisite for the investigation of mean differences, but is often neglected for psychopathy instruments. The current research provides a systematic qualitative review of the pertinent literature on measurement invariance between men and women for several rater-based and self-report-based psychopathy assessments. Based on 28 studies, we found that the factor structure and factor loadings are most likely comparable between sexes for four out of nine instruments. Results on item thresholds, however, are inconsistent, which questions the comparability of mean scores between men and women for these instruments. The majority of studies that reported acceptable measurement equivalence indicated higher psychopathy scores among men compared to women. As the current literature is neither consistent nor exhaustive, further research needs to address structural differences in psychopathy between biological sexes more systematically.

Psychopathic individuals are characterized by a lack of remorse or shame, empathy, and responsibility, but also by high impulsiveness, deceptiveness, poor behavioral control, egocentrism, and the susceptibility to antisocial behavior (Cleckley 1941, 1976). As women commit fewer crimes than men (Leuschner 2020) and are naturally thought to be compassionate, emotional, and selfless (e.g., Connell and Pearse 2015), it is not surprising that, historically, most research on psychopathy focused on males, whereas females have long been neglected in this context. Only in the last two decades has research on female psychopathy increased. In doing so, researchers have focused on prevalence rates, external correlates, and manifestations. Lower prevalence rates have almost consistently been found in female compared to male offenders (for a review see Beryl et al. 2014). Typically, among female offenders, base rates of a categorical psychopathy diagnosis (i.e., psychopathic vs. nonpsychopathic) are estimated at between 11 and 17% (see Verona and Vitale 2018), whereas in male offenders they are estimated to be about twice as high (i.e., 15-30%; e.g., Hare 2003; Nicholls et al. 2005). Although overall lower prevalence rates are typically found among the general population (Hare 2003), similar disparities are found here, with estimates of, for example, 0.9% for women and 3.7% for men in the UK (Coid et al. 2009). When treated as a dimensional construct, men generally exhibit higher psychopathy scores than women in both institutionalized and non-institutionalized samples (see Verona and Vitale 2018).
In the clinical and criminal justice system, the assessment of psychopathy can have a serious impact on an individual's life and society, as psychopathy is predictive of violence, treatment response, alcoholism, and recidivism (e.g., Douglas et al. 2018; Ellingson et al. 2018; Hare 1999). An important concern is therefore whether the observed prevalence differences between men and women reflect actual sex differences in psychopathy, or whether they might, at least in part, be due to sex-related biases in the assessment of those traits. It is thus critical to ensure that measurement instruments capture psychopathy equally in men and women, a condition referred to as measurement equivalence or invariance.

Assessment of psychopathy
Based on Cleckley's (1941) descriptions of the psychopathic individual, Hare (1980) developed the Psychopathy Checklist (PCL) for the clinical and forensic assessment of psychopathy, which is currently applied in its revised version (PCL-R; Hare 2003). The PCL-R is referred to as the gold standard of psychopathy assessment, in particular in the clinical and criminal justice system. Both the PCL-R and its screening version, the PCL:SV (Hare et al. 1995), involve a semistructured interview and the review of file information. While the PCL-R is primarily used in the forensic context, the instruments of the Comprehensive Assessment of Psychopathic Personality (CAPP; Cooke et al. 2012) were developed for a variety of settings (e.g., correctional, forensic psychiatric, civil psychiatric, community, and family). Similar to the PCL-R, the CAPP Symptom Rating Scale (CAPP-SRS; Cooke et al. 2012) is an expert rating that draws on, among other sources, clinical reports, interviews, and behavioral observations. In addition, for research purposes there is the CAPP Lexical Rating Scale (CAPP-LRS; Cooke et al. 2012), which exists in three variants (i.e., the prototypicality, informant, and self-rating forms) and can be used by experts as well as lay people. For both the CAPP and the PCL-R, there are also self-report versions available (CAPP-SR; Sellbom and Cooke 2020; Self-Report Psychopathy Scale in its various editions [SRP]; e.g., Paulhus et al. 2017).
Although expert ratings offer a number of advantages over self-reports, the latter yield useful information regarding the lack of emotional responsiveness in psychopathic individuals, are economical and easily administered, and reveal response styles.

Measurement process and measurement invariance
Psychopathy is a psychological construct that cannot be observed directly (and is, therefore, considered a latent trait); its behavioral manifestations, however, can be captured by a certain set of items. In Confirmatory Factor Analysis (CFA), the items that make up the latent construct load on a latent factor representing the construct. These factor loadings reflect the strength of the association between the test item and its assigned factor. For example, the four-factor model of psychopathy (Hare and Neumann 2005) was derived from factor analysis of the 20 PCL-R items, whereby each PCL-R item is assigned to a latent factor (i.e., Interpersonal, Affective, Antisocial, and Lifestyle).
The equivalence of the psychopathy construct across sexes has been validated by several researchers in terms of internal consistency, external correlates, and factor structure equivalence (for an overview see Verona and Vitale 2018). The latter is usually performed by applying Multi-Group Confirmatory Factor Analysis (MGCFA; e.g., Vandenberg and Lance 2000). If groups exhibit the same number of underlying factors as well as the same factor-item assignment, the measurement is said to be configurally invariant (CI). Metric invariance (MI) is established if, additionally, the item factor loadings are equal across groups. MI allows comparisons of relations between variables, since the same measurement unit can be assumed.

Table 1 Number of studies supporting invariance on the respective levels, by instrument (table body not reproduced here)
N number of studies, PCL-R Psychopathy Checklist-Revised (Hare 2003), PCL:SV Psychopathy Checklist: Screening Version (Hare et al. 1995), CAPP-SRS Comprehensive Assessment of Psychopathic Personality-Symptom Rating Scale (Cooke et al. 2012), CAPP-LRS Comprehensive Assessment of Psychopathic Personality-Lexical Rating Scale (Cooke et al. 2012), CAPP-SR Comprehensive Assessment of Psychopathic Personality-Self-Report (Sellbom and Cooke 2020), LSRP Levenson Self-Report Psychopathy Scale (Levenson et al. 1995), Hare SRP Hare Self-Report Psychopathy Scale, now formally labeled SRP-4 (Paulhus et al. 2017), SRP-E Self-Report Psychopathy Scale-Experimental Version (Williams et al. 2007), SRP-SF Self-Report Psychopathy Scale-Short Form (Paulhus et al. 2017), TriPM Triarchic Psychopathy Measure (Patrick 2010), PPI-R Psychopathic Personality Inventory-Revised (Lilienfeld and Widows 2005), PPI-SF Psychopathic Personality Inventory-Short Form (Lilienfeld 1990), EPA Elemental Psychopathy Assessment (Lynam et al. 2011), DIF differential item functioning
a Omitted tests at lower invariance levels
To meaningfully compare mean scores between groups (such as men and women), at least (partial) scalar invariance (SI) needs to be established, which means that the thresholds of items are equal across groups. Strict invariance can be established by finally constraining residual variances of the items to be equal (Meredith 1993). These steps should be executed by imposing increasingly strict constraints (i.e., from configural to strict; for a review see Vandenberg and Lance 2000). If measurement invariance at a given level cannot be obtained, releasing constraints on individual parameters while testing for invariance of the remaining parameters makes it possible to establish partial invariance (Byrne et al. 1989). The invariance levels can be briefly illustrated on the basis of the impulsivity item of the PCL-R: If CI holds, the impulsivity item can be assigned to the same latent factor (i.e., the Lifestyle facet) for both genders. If impulsivity, however, were assigned to a different facet for men than for women (e.g., the Affective facet for men), CI would not hold. MI holds if impulsivity is an equally adequate indicator of the Lifestyle facet in men and women. If impulsivity, however, were a more direct manifestation of the Lifestyle facet in men than in women (i.e., the factor loading is higher for men), this would suggest a lack of MI. SI holds if mean differences in the impulsivity item between men and women are fully accounted for by mean differences in the Lifestyle facet. If, for example, men were generally more impulsive than women for reasons other than differences in the Lifestyle facet, this would result in a lack of SI. Strict invariance holds if impulsivity assesses the Lifestyle facet with the same precision in both groups. If, for example, other factors have a stronger influence on impulsivity in men than in women, this would mean that impulsivity assesses the Lifestyle facet with less precision in men than in women and strict invariance would not hold.
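The hierarchy of invariance levels can be summarized formally. The notation below is an illustrative shorthand for a linear MGCFA model and is not taken from any of the studies reviewed: for group g, x_g denotes the vector of item scores, τ_g the item intercepts (for ordinal items, thresholds take this role), Λ_g the matrix of factor loadings, ξ_g the latent factors with mean κ_g, and ε_g the residuals with covariance matrix Θ_g.

```latex
\begin{align*}
x_g &= \tau_g + \Lambda_g \xi_g + \varepsilon_g,
  \qquad g \in \{\text{men}, \text{women}\} \\[4pt]
\text{CI:}\quad & \Lambda_{\text{men}} \text{ and } \Lambda_{\text{women}}
  \text{ share the same pattern of zero and nonzero loadings} \\
\text{MI:}\quad & \text{CI and } \Lambda_{\text{men}} = \Lambda_{\text{women}} = \Lambda \\
\text{SI:}\quad & \text{MI and } \tau_{\text{men}} = \tau_{\text{women}} = \tau \\
\text{Strict:}\quad & \text{SI and } \Theta_{\text{men}} = \Theta_{\text{women}} \\[4pt]
\text{Under SI:}\quad & \mathrm{E}(x_g) = \tau + \Lambda \kappa_g
\end{align*}
```

The last line shows why SI is the prerequisite for mean comparisons: once intercepts and loadings are equal, group differences in expected item scores can only stem from differences in the latent means κ_g.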
Applying Item Response Theory (IRT; Embretson and Reise 2000; Reise et al. 2005) also allows conclusions to be drawn on the comparability of an instrument across groups. When the probability of endorsing an item differs between groups for individuals with the same level of the latent trait, the item exhibits so-called differential item functioning (DIF). Items that display substantial DIF are of questionable validity and may lead to bias in total scores. Thus, DIF can imply a lack of measurement equivalence.
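The idea of DIF can be made concrete with a minimal sketch. It assumes a two-parameter logistic (2PL) item response function, which is only one of several IRT models, and the item parameters are hypothetical values chosen for illustration rather than estimates from any study reviewed here.

```python
import math


def endorsement_probability(theta, discrimination, difficulty):
    """2PL item response function: probability of endorsing the item at trait level theta."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))


# Hypothetical item parameters (illustrative assumptions, not empirical estimates).
# Same discrimination in both groups, but a higher difficulty for women.
ITEM_MEN = {"discrimination": 1.2, "difficulty": 0.5}
ITEM_WOMEN = {"discrimination": 1.2, "difficulty": 1.0}


def dif_gap(theta):
    """Difference in endorsement probability (men minus women) at the same trait level."""
    p_men = endorsement_probability(theta, **ITEM_MEN)
    p_women = endorsement_probability(theta, **ITEM_WOMEN)
    return p_men - p_women


# Print the gap at several trait levels; a nonzero gap at equal theta is DIF.
for theta in (-1.0, 0.0, 1.0, 2.0):
    print(f"theta={theta:+.1f}  gap={dif_gap(theta):.3f}")
```

Because only the difficulty differs, the gap has the same sign at every trait level (often called uniform DIF); a difference in discrimination would instead make the gap change sign along the trait continuum.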

Current study
The aim of our research was to systematically review the extant literature on measurement invariance of psychopathy instruments between biological sexes. In their review, Verona and Vitale (2018) conclude that factor-analytic research with females had produced results largely consistent with studies in males regarding the underlying factor structure of several psychopathy instruments, implying CI. Concerning higher levels of invariance, some of the extant literature suggests SI for several questionnaires (e.g., Neal and Sellbom 2012; Salekin et al. 2014), whereas other research indicates that it only partially holds (e.g., Anestis et al. 2011).
Based on these initial results, we hypothesized that CI holds for all the psychopathy instruments described above (H1). However, prior research led to the expectation that the instruments exhibit different levels of invariance beyond CI (i.e., metric, scalar, and strict invariance) or display DIF (H2). In addition, it was assumed that latent factor mean scores are generally higher in men than in women (H3).¹
¹ Note that differences in latent means were only considered if the respective researchers reported (partial) SI for a given psychopathy measure. If full SI is obtained, observed mean scores and latent mean scores are thought to be sufficiently equal to interpret the observed mean scores meaningfully. That was the case for two studies, for which we report observed mean scores (Neumann and Hare 2008; Walsh et al. 2019).

Methods
This review applied a systematic qualitative approach with a selection process following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines (Page et al. 2021). The literature search was conducted in the following EBSCOhost sources in September 2022: APA PsycArticles, APA PsycInfo, PSYNDEX Literature with PSYNDEX Tests, and Psychology and Behavioral Sciences Collection. Moreover, the search was repeated in the Web of Science (Social Sciences Citation Index [SSCI]) database. The search string comprised the names of the above-mentioned instruments (which were selected mainly because they exclusively measure psychopathy) as well as terms related to measurement invariance and sex.
Documentation of the whole search (including a table with all screened reports) and the supplemental material can be found on the Open Science Framework (OSF): https://osf.io/dk2q7/. All steps of the review, including the hypotheses, were preregistered on the OSF.
Only empirical studies published in peer-reviewed journals were considered. Book chapters, dissertation abstracts, meta-analyses, and reviews were omitted from the search. Only studies that included both males and females were considered to allow for direct comparisons. We also included studies that estimated CFAs separately for men and women (instead of MGCFA), since adequate model fit in both groups can be indicative of CI. A further inclusion criterion was the participants' age (minimum 18 years), since personality disorders are not diagnosed in underage individuals (e.g., American Psychiatric Association 2013). As an exception, we included two articles (Adams et al. 2020; Gummelt et al. 2012) with samples that mainly consisted of adults but also included 17-year-old individuals.
The search yielded 630 results in EBSCOhost and 770 in the Web of Science. Five articles were detected from backward search in the included articles. Duplicates were removed, leading to a total of 1095 articles, most of which were excluded upon screening. The most frequent reasons for exclusion were clearly off-topic articles (n = 353) and samples including only males (n = 285) or underage individuals (n = 131). Further reasons for exclusion are documented in the PRISMA diagram (Fig. 1). The final literature review comprised 28 articles, which are highlighted in the "Reference" section. Detailed information on the samples, methods, and main results of those studies can be obtained from Table 3 in the Appendix.

Results
This section summarizes the main findings of our qualitative synthesis, grouped by psychopathy measure. None of the studies included in this review investigated the latent structure of the EPA, the CAPP-SRS, or the CAPP-SR. Moreover, no study assessed strict invariance. An overview of the number of studies supporting invariance on the respective levels is given in Table 1. Some researchers omitted tests for invariance on one or more levels or did not report the corresponding results; therefore, the number of studies varies depending on the invariance level.

PCL-R
Six studies examined measurement equivalence of the PCL-R. Four of them estimated CFAs separately for males and females, two studies applied MGCFA, and one study IRT analyses (cf. Table 3 in the Appendix).
Five of the six pertinent studies found support for CI (Bolt et al. 2004; Klein Haneveld et al. 2022; Neumann et al. 2007; Walters et al. 2011; Windle and Dumenci 1999), whereas one did not find empirical support for their model in either subsample (Darke et al. 1998).
Three studies examined higher levels of invariance. Both Klein Haneveld et al. (2022) and Windle and Dumenci (1999) found partial MI with one PCL-R item lacking invariance. Their MGCFA results further supported partial SI, with three non-invariant item thresholds reported in both studies. In their IRT study, Bolt et al. (2004) found DIF for 12 items. Since the magnitude of the detected item differences between the male and female offender groups seemed to be negligible, the authors concluded that partial SI was confirmed for the PCL-R in their samples.
In summary, results of the studies reviewed largely support lower (i.e., CI) and higher levels (i.e., partial SI) of measurement invariance for the PCL-R. However, results on sex differences in the psychopathic traits assessed were inconsistent.

PCL:SV
Five studies investigated the latent structure of the PCL:SV, whereby one study applied Exploratory Factor Analysis (EFA), two studies separate CFA in men and women, two studies MGCFA, and one IRT analysis (cf. Table 3 in the Appendix). Two studies found evidence of CI (Skeem et al. 2003; Thomson et al. 2019), whereas one did not (Forth et al. 1996). An EFA by Strand and Belfrage (2005) revealed a two-factor structure of the PCL:SV for males and a three-factor structure for females.
Only one of the three studies examining higher levels of invariance tested and found support for full MI (Skeem et al. 2003). Skeem et al. (2003) did not further test for SI. Results by Neumann and Hare (2008) supported full SI, while in the IRT study by Strand and Belfrage (2005) five PCL:SV items displayed DIF. Neumann and Hare (2008) were the only researchers to report mean differences, and they found higher observed PCL:SV scores for men (M = 3.53, SD = 3.79) than for women (M = 2.16, SD = 3.23) with a small effect size (d = 0.30).
In sum, results do not unanimously support the presence of CI. Studies that tested for higher invariance levels (i.e., MI and SI) attest to the measurement equivalence of PCL:SV items in men and women. Yet, some PCL:SV items may still show sex-related response bias according to IRT analysis. The existence of sex differences in PCL:SV-assessed psychopathy traits (H3) is supported, but evidence is limited to a single study.

CAPP-LRS
Taken together, extant findings indicate lower levels of measurement invariance for the self-report version of the CAPP-LRS. However, as results on SI are inconclusive, the reported sex differences in CAPP-LRS scores (Sellbom et al. 2015) should be interpreted with caution.

LSRP
In total, six studies examined measurement equivalence for the LSRP. There was one study providing CFA results and one providing congruence coefficients (an index of the similarity between factors). Two studies applied MGCFA and two IRT, respectively (cf. Table 3 in the Appendix). CI was supported by all four studies that examined CI (Anestis et al. 2019; Lynam et al. 1999; Sellbom 2011; Somma et al. 2014).
Of the two studies that tested for MI, one found support for full MI (Sellbom 2011) and one for partial MI (Lynam et al. 1999) with one non-invariant factor loading. Neither study tested further for SI. The two studies that applied IRT detected different degrees of DIF: In Gummelt et al. (2012), 17 items displayed DIF between men and women, while in Hauck-Filho and Teixeira (2014), only 3 items displayed DIF.
All in all, the studies reviewed support the notion of measurement equivalence for the LSRP at the first two invariance levels. Nevertheless, the results of the DIF analyses indicate that the number of items that work differently in men and women may be sample dependent. (Latent) mean differences in LSRP scores between men and women were not reported.

SRP
CI was examined and supported by three of these studies (Dotterer et al. 2017; Neumann et al. 2012; Walsh et al. 2019). Three studies tested for MI, two of which reported full MI for the SRP (Neal and Sellbom 2012; Neumann et al. 2012), whereas Carre et al. (2018) did not. Four studies attained SI (Dotterer et al. 2017; Neal and Sellbom 2012; Neumann et al. 2012; Walsh et al. 2019). Men (M = 62.10, SD = 15.95) obtained higher observed SRP-SF total scores than women (M = 51.97, SD = 15.19; F(1, 587) = 65.85, p < 0.001) with a small effect size (η² = 0.10; Walsh et al. 2019). Taken together, the majority of studies found evidence of measurement equivalence for the SRP in men and women, at both lower and higher invariance levels. Our prediction on sex differences in SRP scores was also supported, but only by a single study.
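The effect size reported for the SRP-SF comparison can be cross-checked from the F statistic alone. The sketch below assumes a one-way design with sex as the only factor, in which case η² equals F·df1 / (F·df1 + df2); the function name is ours and purely illustrative.

```python
def eta_squared_from_f(f_value, df_effect, df_error):
    """Eta squared for a one-way design, recovered from F and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F(1, 587) = 65.85 as reported for the SRP-SF sex comparison (Walsh et al. 2019).
eta_sq = eta_squared_from_f(65.85, 1, 587)
print(round(eta_sq, 2))  # 0.1
```

The recovered value agrees with the reported η² = 0.10 after rounding.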

PPI
Of the three studies examining equivalence for the PPI, two studies included separate CFAs for men and women, one of which used additional MGCFA, and one used IRT (cf. Table 3 in the Appendix). Anestis et al. (2011) compared three competing models of the PPI-R and found that both the one-factor (PPI-Psychopathy) and the two-factor model (Self-Centered Impulsivity and Fearless Dominance) yielded good fit for the female sample but barely acceptable fit for the male sample, whereas for the three-factor model, fit was modest for both groups. In contrast, Adams et al. (2020) did not find empirical support for their eight-factor PPI-SF model in either male or female participants.
Two studies tested for higher levels of invariance. Anestis et al. (2011) reported partial MI for the one- and the two-factor PPI model, with two non-invariant items and one non-invariant item, respectively. The partially constrained three-factor model fit the data poorly. Anestis et al. (2011) did not apply further constraints. In the IRT study, 61.1% (n = 80) of the PPI-R items displayed DIF across sex groups (Eichenbaum et al. 2019).
The results cast doubt on the presence of measurement invariance for the PPI and the general suitability of the respective measurement models tested. Moreover, the amount of DIF was substantial. Mean sex differences in PPI scores were not reported in any of the studies reviewed.

TriPM
One study investigated the TriPM by means of MGCFA and one by IRT. Neither study tested for CI. Full MI was attained by Carre et al. (2018). The same study further supported full SI. In the study by Eichenbaum et al. (2021), 61% of the TriPM items (n = 34) displayed DIF. Women scored lower than men on all three factors in the study by Carre et al. (2018; boldness, t(474) = 5.874, p < 0.001; meanness, t(474) = 8.262, p < 0.001; and disinhibition, t(474) = 3.898, p < 0.001). Although the MGCFA results reported support the presence of higher-level measurement invariance for the TriPM, IRT results imply a lack of equivalence. Thus, evidence of sex differences in TriPM scores needs to be interpreted with caution, not least because such differences have only been reported by one study.

Discussion
Within this qualitative synthesis, we examined equivalence across sexes in the latent structure of psychopathy as measured with several expert and self-report assessment instruments. We hypothesized that the instruments exhibit at least a basic level of measurement invariance (i.e., CI [H1]). In line with previous research (Verona and Vitale 2018), CI was confirmed for several instruments, namely the PCL-R, the CAPP-LRS, the LSRP, and the SRP. No study that investigated the TriPM reported results on CI. Results on the PCL:SV and the PPI-R were inconsistent, but, in sum, suggest a lack of CI. A general reexamination of the latent factor structure of the PPI in male and female samples should be considered. Assessments of measurement equivalence of the EPA, the CAPP-SRS, and the CAPP-SR are still lacking. In sum, we found conclusive evidence of configural measurement invariance for four of the psychopathy measures addressed (i.e., PCL-R, CAPP-LRS, LSRP, SRP), which provides only partial empirical support for our first hypothesis.
We further assumed that the psychopathy instruments would exhibit different levels of measurement invariance or display DIF (H2). Based on the studies included, (partial) MI was mostly confirmed for all instruments except the EPA, CAPP-SRS, and CAPP-SR. With regard to the SRP, however, results on MI were mixed. With respect to the remaining instruments results were limited to one or two studies each. In view of this sparse empirical evidence, the present results should be interpreted with caution.
Findings on scalar invariance (SI) are highly inconclusive between but also within several instruments. Even though Bolt et al. (2004) found 12 items of the PCL-R to display DIF, partial SI was largely supported. Likewise, studies unanimously support the presence of SI for the SRP measures. No clear conclusions can be drawn for the other psychopathy scales reviewed, as the results on SI for those measures were inconsistent and partly appeared to depend on the statistical method (e.g., DIF analysis vs. MGCFA). In sum, six instruments appeared to exhibit (partial) MI, whereas SI can only be presumed for two measures, the PCL-R and the SRP, with relative certainty.
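Why a lack of SI undermines mean comparisons can be shown with a small numerical sketch. It assumes a linear factor model in which the expected score on each item equals its intercept plus its loading times the latent mean; the three-item facet and all parameter values are hypothetical and serve only as an illustration.

```python
def expected_item_means(intercepts, loadings, latent_mean):
    """Model-implied item means: intercept + loading * latent mean, per item."""
    return [tau + lam * latent_mean for tau, lam in zip(intercepts, loadings)]


# Hypothetical three-item facet; all numbers are illustrative assumptions.
LOADINGS = [0.8, 0.7, 0.6]          # equal across groups (metric invariance holds)
INTERCEPTS_WOMEN = [1.0, 1.0, 1.0]
INTERCEPTS_MEN = [1.0, 1.0, 1.4]    # third item violates scalar invariance
LATENT_MEAN = 0.0                   # identical latent mean in both groups

sum_women = sum(expected_item_means(INTERCEPTS_WOMEN, LOADINGS, LATENT_MEAN))
sum_men = sum(expected_item_means(INTERCEPTS_MEN, LOADINGS, LATENT_MEAN))

# The gap in expected sum scores equals the intercept shift, not a trait difference.
print(round(sum_men - sum_women, 2))  # 0.4
```

In this constructed case, men and women have the same latent mean, yet the expected sum scores differ by exactly the intercept shift on the non-invariant item, so an observed mean difference would be misread as a trait difference.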
Finally, we assumed that men exhibited higher levels of psychopathy (H3) when the assessment method has been proven to be comparable for men and women, i.e., when at least partial SI has been confirmed for that measure. Across different samples (i.e., community, student, incarcerated), six studies (Bolt et al. 2004; Carre et al. 2018; Klein Haneveld et al. 2022; Neumann and Hare 2008; Sellbom et al. 2015; Walsh et al. 2019) found either latent or observed mean scores of the respective instrument to be higher for men than for women with small to medium effect sizes (please note that effect sizes were only given for three studies). In contrast, only one study (Windle and Dumenci 1999) did not find any significant sex differences; however, this might be attributed to the specific nature of the sample tested, which comprised alcoholic inpatients. In sum, these results support our last hypothesis and corroborate previous findings (Verona and Vitale 2018).
It is conceivable that the inconsistent results with regard to both invariance and mean differences are, at least in part, due to sampling issues. Some samples comprised criminal offenders (e.g., Bolt et al. 2004; Sellbom 2011), methadone patients (Darke et al. 1998), or alcoholics (Windle and Dumenci 1999), whereas others comprised community individuals (Neumann and Hare 2008; Somma et al. 2014) or undergraduate students (e.g., Gummelt et al. 2012; Lynam et al. 1999). The contradictory results on the PCL:SV, for example, may be attributed to differences in sample sizes between the studies by Forth et al. (1996, n = 75 per sex) and Thomson et al. (2019; N = 565). Likewise, sampling issues were apparent for the CAPP-LRS, for which qualitatively different samples were compared (general population [Sellbom et al. 2015] vs. felons [Hanniball et al. 2021]).
Another sample property that might cause inconclusive results is ethnicity or cultural background. The majority of the studies has been conducted in Western samples, whereas Hauck-Filho and Teixeira (2014) examined sex differences for the LSRP in a Latin American sample and obtained results that differed from those of a U.S. sample with the same method (Gummelt et al. 2012). Hence, the psychopathy measures examined were not only administered in different cultural contexts, but also sometimes in different languages, which has a known impact on measurement invariance analyses (see Bader et al. 2021). Accordingly, upon comparing women from different world regions in their large worldwide sample, Neumann et al. (2012) found that SI of a given psychopathy measure depended on the world regions that women came from. In this context, it is important to mention that exclusively German samples were not investigated in any of the 28 studies (but German subsamples are included in Neumann and Hare [2008] and Walters et al. [2011]). In order to draw conclusions about the use of the investigated instruments for both biological sexes in Germany, this research gap needs to be addressed.
Besides sample properties, and sample heterogeneity in particular, as a potential cause of inconsistent findings, there are five methodological aspects that we would like to address. First, it should be taken into account that the various psychopathy instruments have been developed on the basis of different conceptualizations of psychopathy. Whether or not an instrument is invariant between genders may depend on the underlying theoretical concept. Second, if a confirmatory model fits the data, it does not necessarily mean that another model would not fit the data even better. Thus, it is possible that the optimal measurement model of some instruments differs between men and women (e.g., Salekin et al. 1997). Third, many of the studies we reviewed struggled with the implementation of a properly fitting measurement model. To achieve acceptable fit, the models were often adapted in various ways, which raises the probability of overfitting. These discrepancies in model adaptation may have caused differences in the subsequent MI analyses as well. Fourth, a large number of studies only attained partial MI or partial SI, but neither provided explanations for the partial non-invariance nor compared further results to those of a model set to invariance, as should be done to investigate the potential effects of partial non-invariance (Putnick and Bornstein 2016).
Finally, a major concern is the inconsistent execution and reporting of the analytic strategy and results across studies. Study authors sometimes omitted tests of one or more invariance levels. They relied on different indices (e.g., likelihood ratio tests of nested models, or changes [Δ] in comparative and absolute model fit indices) and applied inconsistent cut-off values. Among the criteria applied, some were stricter than others, thus yielding ambiguous conclusions about measurement invariance. Meredith (1993) recommended executing MGCFA with increasing restrictiveness. Moreover, there are clear conventions on testing and reporting invariance (Putnick and Bornstein 2016), which should be applied more consistently in future studies. In order to circumvent the drawbacks of sample size-dependent significance tests and cut-off values, Nye and Drasgow (2011) suggested the application of effect size estimates.
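How strongly a conclusion can hinge on the chosen criterion is easiest to see when the comparison of nested models is written out. The following sketch computes the change in χ² and in the comparative fit index (CFI) between a less and a more constrained model. The fit values are hypothetical, and the tolerated CFI drop of 0.01 is a frequently used rule of thumb that we assume here for illustration; the studies reviewed applied varying cut-offs.

```python
def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index from model and baseline (null-model) chi-square values."""
    d_model = max(chi2_model - df_model, 0.0)
    d_baseline = max(chi2_baseline - df_baseline, d_model)
    if d_baseline == 0.0:
        return 1.0
    return 1.0 - d_model / d_baseline


def compare_nested(less_constrained, more_constrained, baseline, max_cfi_drop=0.01):
    """Compare two nested models given as (chi2, df) tuples.

    max_cfi_drop is an assumed cut-off, not a value prescribed by the reviewed studies.
    """
    chi2_a, df_a = less_constrained
    chi2_b, df_b = more_constrained
    chi2_0, df_0 = baseline
    delta_cfi = cfi(chi2_b, df_b, chi2_0, df_0) - cfi(chi2_a, df_a, chi2_0, df_0)
    return {
        "delta_chi2": chi2_b - chi2_a,
        "delta_df": df_b - df_a,
        "delta_cfi": delta_cfi,
        "invariance_retained": delta_cfi >= -max_cfi_drop,
    }


# Hypothetical fit statistics for a configural and a metric model.
result = compare_nested(
    less_constrained=(250.0, 100),   # configural model
    more_constrained=(262.0, 108),   # metric model (loadings constrained equal)
    baseline=(2100.0, 132),          # independence (null) model
)
print(result)
```

With these made-up numbers, the CFI drops by about 0.002, so metric invariance would be retained under the assumed cut-off; a stricter cut-off or a significance test on Δχ² could lead to a different decision for the same data, which is precisely the reporting problem described above.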
Taken together, the results on sex differences in the covariance and mean structure of psychopathy measures are inconclusive. The initial question of whether observed sex differences in psychopathy occur because of actual trait differences or whether they are due to sex-related bias in the assessment method(s) cannot be answered clearly. Nevertheless, the substantial number of studies attaining partial SI for at least some of the measures provides a promising perspective towards a future assessment of psychopathy that would be equally applicable to both men and women. Future research should consider a stringent investigation of the various instruments, as for example presented by Klein Haneveld et al. (2022). In this regard, the conditions and reasons for (partial) non-invariance need to be addressed directly, for example, by systematically assessing different target populations. Only based on such studies will it be possible to make informed decisions on the suitable measure(s) when exploring psychopathy in men and women. Until then, forensic practice should be cautious when interpreting mean scores and applying cut-off values in women, as different norms may apply to them.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix
Surviving fragments of the appendix tables:
Assessment format: self-report, informant report, or prototypicality rating; duration: 15-20 min.
Congruence coefficients: The congruence coefficient indicated that factors were not satisfyingly similar between sexes (rc = 0.83 for the first factor and rc = 0.34 for the second factor).
PPI-R (CFA; MGCFA; one-factor, two-factor, and three-factor model): In separate CFAs, the one-factor and two-factor model yielded good fit for the female sample and barely acceptable fit for the male sample, while for the three-factor model fit was modest for both groups. Partial MI was attained for the one- and the two-factor model with two items and one item differing between sexes, respectively. The partially constrained three-factor model fit the data poorly. No further constraints were applied.