Daubert and Rules of Evidence

Psychological evaluations are recognized as useful in legal proceedings, especially given the relevance of human judgment, motivation, and behavior to so many criminal and civil legal matters. The Federal Rules of Evidence permit opinions by psychologists (and other mental health professionals) as expert testimony if the expert's education, training, and experience will assist the trier of fact in understanding the evidence or determining a fact at issue. At the same time, the powerful influence that psychological testimony can have on jury perceptions and judicial decisions makes it imperative that the opinions expressed by psychologists in these cases derive from assessment procedures with demonstrable scientific credibility. The courts are rightly concerned about the quality of expert testimony on matters as complex as psychological functioning, and they strive to exclude irrelevant or pseudoscientific evidence from trial.

In 1993, the US Supreme Court’s unanimous decision in Daubert v. Merrell Dow Pharmaceuticals established standards regarding the admissibility of testimony from expert witnesses, such as psychologists and other mental health professionals. The Daubert standard holds that, if the introduction of psychological expert testimony is challenged by opposing counsel, the presiding judge must determine whether that testimony is based on sound scientific principles and methods. The standard further holds that the testimony must be relevant to the proceedings and that its probative value must outweigh its prejudicial potential. Daubert sets a higher hurdle for the admissibility of psychological testimony than the earlier Frye rule, which required only that the work of experts be generally accepted within their field. When psychologists offer expert testimony that is informed by the results of psychological tests, applying the Daubert standards to that testimony can be facilitated by existing scientific evidence related to those tests. Challenges from opposing counsel often demand that the psychologist explain or justify the specific measures or assessment procedures selected to inform the opinion. Providing an adequate response to such challenges requires a thorough understanding of the empirical evidence for the reliability and validity of the measure, as well as a fair appraisal of its limitations. This article reviews information that may assist psychologists in addressing such challenges and in evaluating whether the Personality Assessment Inventory (PAI; Morey, 1991, 2007a) sufficiently meets the Daubert standards. The PAI was a new instrument at the time of the Daubert ruling, as it was originally published in 1991, and the last 30 years have allowed for the accumulation of a diverse array of scientific findings that inform potential questions of its admissibility under the Daubert standards.

Personality Assessment Inventory

The PAI is appropriate to address general questions about the presence of psychopathology and behavior problems. The scales and subscales of the PAI cover a wide range of mental disorders that include the more frequently diagnosed conditions described in the Diagnostic and Statistical Manual of Mental Disorders (DSM-5; American Psychiatric Association, 2013) and problems observed in clinical settings (e.g., suicidal ideation). Construction of the PAI followed the construct validation approach in test development (Clark & Watson, 2019; Loevinger, 1957). The generation and conceptual evaluation of items were closely informed by theories of the phenomenology and presentation of various mental disorders. Face validity, at least to experts in the relevant theory domain, was emphasized in the composition of the items. The items that appeared on the final version of the PAI administered to normative samples were the ones that survived multiple rounds of empirical evaluation. Two empirical parameters that were prioritized in item selection were the breadth of content represented by items within the same scale and discriminant validity with respect to other scales and demographic variables. The normative reference sample for the PAI includes 1000 cases randomly selected from a larger sample of community adults residing in 12 US states who were not in treatment at the time of assessment. The cases in the community normative sample were selected following a stratified random sampling procedure to match sex, age, and racial identification to the 1995 US census projections.

The PAI includes 11 clinical scales that cover major categories of mental disorders described in the DSM-5, including affective disorders, anxiety disorders, schizophrenia, personality disorders, and substance use disorders. Five treatment consideration scales cover additional clinical constructs that can become relevant in court proceedings, such as aggression, suicidal ideation, situational stress, and attitude about treatment. Published research on the PAI has addressed all the scales and subscales, although some scales have attracted considerably more research attention to date than others. Fortunately, the empirical literature offers considerable evidence from forensic settings regarding the validity of PAI scales that assess posttraumatic stress disorder (ARD-T subscale), antisocial personality features (ANT scale and subscales), and aggressive behavior (AGG scale and subscales). These constructs represent frequent matters of interest in litigation concerned with psychological injury (e.g., posttraumatic stress) and in court decisions about sentence mitigation or parole (e.g., antisocial and aggressive traits). The present article reviews some validation studies of the PAI for the diagnosis of posttraumatic stress disorder (PTSD) and for the prediction of criminal offending and recidivism.

The PAI also includes a collection of validity scales and indicators that aid in the assessment of protocol acceptability and response distortion (Morey, 2007a). The Infrequency (INF) and Inconsistency (ICN) scales are designed to detect random or careless patterns of responding to the PAI items, and the test manual reports on the success of these scales in discriminating between randomly generated protocols and protocols completed by community adults and clinical patients. The assessment of response distortion is especially critical in the forensic context, as many legal scenarios create incentives for plaintiffs and defendants to misrepresent their psychological status by amplifying or suppressing symptoms and problems. The empirical literature provides more information about diagnostic efficiency and error rates for the detection of feigning, malingering, and deception than for clinical diagnoses. This emphasis on validity scales is likely because it is easier for researchers to acquire criterion groups to evaluate the accuracy of validity indicators than it is to acquire patient groups with clearly established and focal DSM-5 diagnoses. Samples of non-clinical respondents can be instructed to dissimulate mental disorders, and samples of clinical patients can be instructed to conceal their symptoms and problems while completing the PAI. Negative distortion, feigning, and potential malingering are assessed with the Negative Impression Management (NIM) scale, the Malingering Index (MAL), and the Rogers Discriminant Function (RDF). Additional measures have been constructed by other researchers that show promise as supplements to these three standard indicators, including the Negative Distortion Scale (NDS; Mogge et al., 2010), the Multiscale Feigning Index (MFI; Gaines et al., 2013), the Cognitive Bias Scale (CBS; Gaasedelen et al., 2019), and the Hong Malingering Index (HMI; Hong & Kim, 2001). 
Positive distortion, defensiveness, and concealment of problems are assessed with three standard indicators, including the Positive Impression Management (PIM) scale, the Defensiveness Index (DEF), and the Cashel Discriminant Function (CDF). The Hong Defensiveness Index (HDI; Hong & Kim, 2001) and the Positive Distortion Scale (PDS; Mogge & LePage, 2017) are two supplemental indicators of positive distortion.

The many scales and subscales of the PAI have been the subject of hundreds of research studies. For the sake of illustrating the utility of the PAI for forensic assessment, this article will review some research on two clinical problems that are common concerns for forensic cases and that have been the subject of multiple research studies. The present review also focuses on the capacity of the PAI to detect the feigning of mental disorders and concealed psychopathology. To keep the focus on the applicability of the PAI to the Daubert standards, the following sections are not intended to be a comprehensive review of the validity of the PAI in general or in these areas specifically.

Assessment of Trauma and Psychological Injury with the PAI

As an instrument originally designed to assess a broad spectrum of psychopathological conditions and personality traits, the PAI is well-suited to provide evidence of psychological damages resulting from personal injury, infliction of harm, or traumatic events. In a recent survey of case law, disability claims were found to be the most frequent reason for introducing PAI data into evidence in legal cases brought to court (Meaux et al., 2022). Personal injury cases will also frequently center on the cognitive and neurobehavioral symptoms related to head injury. Research on PAI score profiles obtained from victims of traumatic brain injury shows that the instrument can discriminate between the somatic and cognitive symptoms that may accompany head trauma (Kurtz et al., 2007). Moreover, the PAI does not over-pathologize emotional symptoms or behavioral problems in most cases of head injury (Kennedy et al., 2015; Lange et al., 2012; Miskey et al., 2015).

Questions of psychological injury will frequently center on experiences of trauma during the accident or injury event and its lasting effects on emotional, cognitive, and behavioral functioning. The PAI Traumatic Stress (ARD-T) subscale presents eight items that inquire about ongoing distress related to some specific event in the past, including re-experiencing the event, intrusion symptoms, and avoidance behaviors. Scores on ARD-T can be helpful in a forensic setting to determine the extent to which a person is experiencing mental health problems following the criminal or civil matter in question. ARD-T has been the subject of validation research by several different investigators. Mozley et al. (2005) examined PAI results for a sample of 210 combat-exposed veterans and found that ARD-T was the highest score in the mean profile. They also reported large correlations between ARD-T and other self-report measures of combat-related posttraumatic stress. The validity of ARD-T has also been investigated in a study of adult females seen in a clinical setting (Cherepon & Prinzhorn, 1994): 44 patients with reported abuse histories obtained significantly higher ARD-T scores than 47 patients without abuse histories. Other studies have compared respondents with and without diagnoses of PTSD based on the Clinician-Administered PTSD Scale (CAPS; Blake et al., 1995). Bellet et al. (2018) reported that a cut score of 64 on ARD-T yielded a sensitivity of 0.91 and specificity of 0.75 for diagnosing PTSD in a sample of 47 combat-exposed veterans. McDevitt-Murphy et al. (2005) recruited a sample of women from the community who reported exposure to various forms of trauma, mostly sexual assaults and transportation accidents. The mean PAI profile of women with CAPS diagnoses of PTSD showed the highest score on ARD-T; a cut score of 71 produced a sensitivity of 0.79 and specificity of 0.88. After screening a large sample of undergraduate students, McDevitt-Murphy et al. (2007) identified 30 students with PTSD using the CAPS. For these students with PTSD diagnoses, the highest mean score was on ARD-T, and the mean score for this group was significantly higher than for groups of students with depression, social phobia, or well-adjusted status. Although these studies demonstrate the adequate validity of ARD-T for evaluating the presence of psychological injury following traumatic events, further diagnostic efficiency studies are needed to identify cut scores that yield acceptable error rates.

PTSD is a complex syndrome of emotional symptoms and behavioral problems. Accordingly, there are several full scales and subscales beyond ARD-T that are relevant to the characteristic symptomatology of PTSD. Morey (1991, 1996) examined the profiles of a subgroup of patients from the clinical standardization sample with clinical diagnoses of PTSD. Although ARD-T was the highest score in the mean profile, elevations were also observed on the Anxiety (ANX) and Depression (DEP) scales and the Thought Disorder (SCZ-T) and Physical Aggression (AGG-P) subscales. These elevations were also observed in the above-cited study of combat veterans (Mozley et al., 2005), whereas AGG-P scores were not elevated in the mean profiles of female psychiatric patients (Cherepon & Prinzhorn, 1994) and trauma-exposed female college students (McDevitt-Murphy et al., 2007). Morey (1996) derived a LOGIT function to predict the diagnosis of PTSD, and this formula has also demonstrated diagnostic efficiency in detecting CAPS-diagnosed PTSD in studies of civilian females (Calhoun et al., 2009) and male and female veterans (Bellet et al., 2018). However, the performance of the LOGIT function in these two studies was quite comparable to the performance of ARD-T alone. The PAI may also be limited in the identification of recognized subtypes of PTSD. A latent profile analysis by Ingram et al. (2021) of valid PAI profiles from 261 trauma-exposed veterans with probable diagnoses of PTSD did not recover the expected symptom clusters based on the trauma literature.

Criminal Risk Assessment with the PAI

The PAI can contribute valuable information for the prediction of institutional misconduct, the prediction of recidivism for incarcerated persons being considered for release, and the prediction of violence in individuals being evaluated for potential hospitalization or confinement. The accuracy of risk assessments is essential for security purposes and for minimizing problems both during incarceration and after release. Several scales included in the PAI have been specifically associated with institutional risk assessment, including the Antisocial Features (ANT) scale, the Aggression (AGG) scale, the Dominance (DOM) scale, and the Violence Potential Index (VPI; Morey, 1996).

A meta-analytic review by Gardner et al. (2015) examined the ability of PAI scores to predict future misconduct and violence. Across 17 studies, they found that the ANT and AGG scales were the strongest predictors of all types of misconduct. These results also indicated that these PAI scales were comparable to other commonly used risk assessment instruments in the forensic field, such as the Psychopathy Checklist-Revised (PCL-R; Hare, 2003). Psychopathy increases the likelihood of individuals engaging in criminal behavior, and the subscales of ANT provide indications of the behavioral manifestations of psychopathy, specifically rule-breaking, norm violation, and legal troubles (Antisocial Behaviors, ANT-A) and stimulus-seeking and risk-taking activities (Stimulus-Seeking, ANT-S). The interpersonal aspect of psychopathy, specifically callous attitudes and disregard for others, is assessed with the Egocentricity (ANT-E) subscale. Research with at-risk youth demonstrates that the adolescent version of the PAI (PAI-A; Morey, 2007b) provides information on how traits of psychopathy, such as callous-unemotional behavior and impulsivity problems measured by the ANT scale, can predict adult criminal behavior (Charles et al., 2021; Preston et al., 2021).

Edens and Ruiz (2009) examined the validity of the PAI for the prediction of institutional aggression. They note that the ANT scale is vulnerable to efforts at positive impression management or defensiveness. Edens and Ruiz (2009) found that, for a sample of 215 inmates who answered the PAI nondefensively, the ANT scale was the strongest predictor of both aggressive defiance and physically aggressive institutional misconduct. Among 134 inmates who responded to the PAI in a defensive manner, only the AGG and Nonsupport (NON) scales were found to significantly predict aggressive misconduct. Thus, AGG scores were an especially strong predictor of physical aggression, regardless of whether the respondent engaged in defensiveness. The AGG scale has demonstrated significant utility in forensic psychiatric units. Battaglia et al. (2021) found that the Aggressive Attitude (AGG-A) and Verbal Aggression (AGG-V) subscales were significantly correlated with severe aggressive incidents. Several other studies have demonstrated the strength of the AGG scale in predicting misconduct among incarcerated and hospitalized individuals and differentiating between violent and nonviolent offenders (Davidson et al., 2016; Douglas et al., 2001; Newberry & Shuker, 2012).

The PAI can also provide important information on the risk of recidivism. McCallum et al. (2022) examined the predictive capabilities of PAI scores in a sample of 1412 sex offenders. They observed the validity of ANT, AGG, DOM, and VPI scores to predict arrests after release from incarceration. McCallum et al. (2022) found that the AGG subscales provided the strongest and most consistent predictions of recidivism. AGG scores were a stronger predictor than more commonly used indicators of recidivism, including age at release and the number of pre-release arrests. Similar results have been found in earlier studies examining the relationship between AGG scores and recidivism (Walters & Duncan, 2005). Newberry and Shuker (2012) determined that ANT has significant utility in correctly classifying individuals who will re-offend. They found that a cut score of 60 T on ANT-A accurately detected 96% of offenders with a high risk of reconviction, while incorrectly identifying only a small number of individuals. Walters (2007) similarly found that the ANT scale was most effective at predicting recidivism and behavior outside of prison. The ANT scale also demonstrated significant incremental validity over the PCL-R. Thus, information from the ANT and AGG scales can be used to make important post-adjudication decisions.

Negative Distortion, Feigning, and Malingering

Negative distortion is a significant limitation of the use of self-report methods in the forensic context, especially when the items have strong face validity. In court proceedings, the question of malingering is most commonly encountered in cases involving questions of mental state at the time of the offense, sentence mitigation, and psychological injury. The NIM is a nine-item scale embedded within the original PAI that presents symptoms that are highly exaggerated or rarely observed in psychiatric patients. A meta-analysis by Hawes and Boccaccini (2009) found mean differences in the NIM between analog fake-bad groups and clinical patients corresponding to large effect sizes. Hawes and Boccaccini (2009) also compiled available evidence from diagnostic efficiency studies to determine optimal cut scores for identifying protocols compromised by negative distortion. Although the test manual (Morey, 1991, 2007a) recommends T-score elevations at 92 or higher as indicative of deliberate feigning of mental disorder, the meta-analytic findings place the optimum cut score at 84 (Hawes & Boccaccini, 2009). The diagnostic efficiency studies of NIM and MAL to detect feigned mental disorders typically find higher specificity than sensitivity for the recommended ranges of the cut score (Hawes & Boccaccini, 2009; Kurtz & McCredie, 2022). Thus, the use of the PAI to evaluate feigned mental disorders can be considered conservative in the legal context, as it is more likely to accept the results from a defendant who is malingering than it is to erroneously suggest that malingering is occurring in a defendant who is responding honestly.
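The trade-off between sensitivity and specificity at a given cut score can be made concrete with a brief computational sketch. The cut score of T = 84 follows the meta-analytic recommendation cited above, but the score distributions below are fabricated solely for illustration:

```python
# Illustrative computation of sensitivity and specificity for a cut score
# applied to a feigning (criterion-positive) group and an honest
# (criterion-negative) group. The T scores below are invented for
# demonstration; they are not data from any published PAI study.

def sensitivity_specificity(feigners, honest, cut):
    """Sensitivity: proportion of feigners at or above the cut.
    Specificity: proportion of honest responders below the cut."""
    sens = sum(t >= cut for t in feigners) / len(feigners)
    spec = sum(t < cut for t in honest) / len(honest)
    return sens, spec

feigners = [95, 88, 110, 79, 92, 101, 85, 73]  # hypothetical NIM T scores
honest = [55, 62, 70, 81, 49, 66, 58, 90]      # hypothetical NIM T scores

sens, spec = sensitivity_specificity(feigners, honest, cut=84)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
# → sensitivity = 0.75, specificity = 0.88
```

Raising the cut score in this sketch would lower sensitivity while raising specificity, which is the asymmetry the text describes for the recommended NIM and MAL cut ranges.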

Studies that examine attempts to feign unspecified mental disorders are most germane to legal questions about sanity, the general mental state at the time of the offense, or sentence mitigation. Psychological injury cases will be more concerned with the malingering of specific diagnoses. Subsequent to the Hawes and Boccaccini (2009) meta-analysis, more studies have emerged that examine attempts to feign specific disorders, such as general and pain-related disability (Hopwood et al., 2010; Rogers et al., 2012), attention deficit hyperactivity disorder (Aita et al., 2018; Musso et al., 2016; Smith et al., 2017), posttraumatic stress disorder (Russell & Morey, 2019; Thomas et al., 2012; Wooley & Rogers, 2015), and head injury (Armistead-Jehle, 2010; Gaasedelen et al., 2017; Keiski et al., 2015; Kurtz et al., 2023). Pignolo et al. (2023) assigned community adults and forensic inpatients to feign schizophrenia, PTSD, or depression, and they obtained very similar diagnostic efficiency statistics for all three targets for dissimulation.

Morey (1996) notes that negative distortion is not a unitary concept. Exaggeration and over-reporting of symptoms are commonly observed with various forms of psychopathology, like depression and borderline personality disorder. Thus, elevations on scales like NIM should not be equated with “faking.” The diagnosis of malingering requires extra-test evidence to support the contention that there are clear incentives for the respondent to misrepresent symptoms and problems consciously and deliberately. In contrast, the respondent who is exaggerating due to extreme distress, concern for receiving help, or negative views of the self is unaware of the distortion in their responses to the test items. Morey and Hopwood (2007) label this latter type of response “non-effortful” to reflect the absence of deliberate intention to present unfavorably. Additional indicators of negative distortion were added to the PAI by Morey (1996) and Rogers et al. (1996) that assist in the discrimination of conscious-effortful negative distortion (i.e., feigning) from the unconscious distortion associated with distress and psychopathology. MAL is a collection of score configuration rules that are more strongly associated with attempts by healthy college students to role-play severe mental disorders than with the responses of actual clinical patients (Morey, 1996). RDF also contrasts role-play attempts by students of varying sophistication with the genuine responses of clinical patients, but the differences between groups are represented in the form of a discriminant function formula that includes 20 PAI full scales and subscales (Rogers et al., 1996). The supplemental and other recently introduced negative distortion indicators have been subjected to varying amounts of validation research, and the available evidence demonstrates their modest and mixed success in incrementing the standard negative distortion indicators (Kurtz & McCredie, 2022). 
Among the more promising new indicators are the CBS and the three CBS Scale-of-Scales indicators (CBS-SOS; Boress et al., 2022), which were collectively designed to predict non-credible performance on neuropsychological tests. The CBS and CBS-SOS may have probative value for the evaluation of head injury and other disability claims, and they have demonstrated ability to identify examinees who fail performance validity tests in outpatient neurorehabilitation (Gaasedelen et al., 2017, 2019), civil forensic (Tylicki et al., 2021), veterans health service (Shura et al., 2023), and active-duty military settings (Ingram et al., 2024).

Morey and Hopwood (2007) propose a configural analysis of all three standard indicators of negative distortion (NIM, MAL, and RDF) to make determinations regarding the conscious-effortful versus non-effortful varieties of negative distortion. The sum of the three indicators is compared to the sum of their pairwise differences to classify a case into one of three groups: minimal, effortful, or non-effortful negative distortion. Preliminary validation of the configural analysis showed that 100% of the cases in the Morey (1991) fake-bad sample and 98% of the cases in the Morey and Lanier (1998) fake-bad sample were correctly identified as conscious-effortful distortion. Applying the configural analysis to the feigning groups reported by Rogers et al. (1996) identified 89% of the cases instructed to fake bad. Twenty-five percent of 1246 patient respondents in the clinical standardization sample were identified as conscious-effortful by the Morey and Hopwood (2007) configural analysis. Although cross-validation research by other investigators is currently lacking, applying the negative distortion configural analysis will nonetheless assist test interpreters in considering all three indicators and avoiding basing conjectures about feigning or malingering solely on NIM.
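The decision logic of the configural analysis, as described above, can be sketched in a few lines. The numeric thresholds below are placeholders, not the values published by Morey and Hopwood (2007), and the mapping of low divergence among elevated indicators onto the effortful category is an assumption made for this sketch:

```python
# Sketch of the configural-analysis logic: the sum of the three standardized
# negative distortion indicators (NIM, MAL, RDF) is compared with the sum of
# their pairwise absolute differences. Both cut values are PLACEHOLDERS and
# do not reproduce the published Morey and Hopwood (2007) rules.

def classify_negative_distortion(nim, mal, rdf,
                                 elevation_cut=210,   # placeholder threshold
                                 divergence_cut=30):  # placeholder threshold
    total = nim + mal + rdf
    divergence = abs(nim - mal) + abs(nim - rdf) + abs(mal - rdf)
    if total < elevation_cut:
        return "minimal"
    # Assumption for this sketch: indicators elevated together suggest
    # deliberate (effortful) distortion, whereas large divergence among
    # them suggests non-effortful exaggeration.
    return "effortful" if divergence < divergence_cut else "non-effortful"

print(classify_negative_distortion(nim=95, mal=90, rdf=88))
```

Whatever the exact thresholds, the value of the configural approach is that classification depends on the pattern across all three indicators rather than on any single score.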

Positive Distortion and Concealed Psychopathology

Positive distortion enters the forensic context in the selection of public safety applicants, questions about fitness for duty, and child custody disputes, with custody cases more likely to face Daubert challenges in court. The approach to positive distortion assessment with the PAI parallels the approach to negative distortion described above. Specifically, positively distorted responses on the PAI reflect some combination of effortful-deceptive versus non-effortful ego-defensive processes. Again, the central question about this distortion is the respondent’s state of mind while answering the test items. Persons with known symptoms and problems who wish to conceal them from the examiner are conscious and deliberate in their decisions about which answers will reflect positively on them. It requires some effort to answer the test items contrary to the truth. The non-effortful respondent will under-report problems and exaggerate strengths with less awareness of the inaccuracy of their responses. They believe the healthy and well-adjusted impression conveyed in their answers, regardless of how inaccurate these answers might be. This self-deceptive mode of responding is likely more automatic and less effortful.

The nine items of PIM are distributed among the 344 items of the original PAI. Morey (1996) constructed DEF by comparing score configuration patterns between adults in the community standardization sample and college students instructed to present positively. CDF is a discriminant function formula that includes six PAI full scales (Cashel et al., 1995). It was constructed by contrasting the PAI scores obtained by jail inmates role-playing a job-seeking scenario with their PAI scores while responding honestly. A PIM score of T = 57 or higher is the optimum cut score to discriminate honest community respondents from respondents instructed to fake good in order to obtain desirable employment (Morey, 1991), and this optimum value for PIM has been replicated in several subsequent studies of faking good (Cashel et al., 1995; Fals-Stewart, 1996; Morey & Lanier, 1998). The proximity of this cut score to the normative mean (T = 50) speaks to the difficulty of distinguishing healthy good adjustment from faking good. Although more research is needed to appraise the validity of the supplemental positive distortion indicators, HDI and PDS, they show some promise in a validation study by McCredie and Morey (2018).

PIM scores will elevate in the presence of both conscious-deceptive responding and non-effortful, defensive responding (Morey & Hopwood, 2007), whereas CDF appears to be unrelated to the level of symptom reporting and specific to conscious efforts to distort the test results (Morey, 1996). Accordingly, a configural analysis strategy that parallels the one proposed for negative distortion can be used: the sum of the three indicators (PIM, DEF, and CDF) is compared to the sum of their pairwise differences to gauge the relative influence of each indicator in the overall distortion (Morey & Hopwood, 2007). In the forensic context, this determination can be very important for understanding when a respondent is concealing problems. Additional strategies using the observed PIM score to predict or adjust the substantive scales of the profile can effectively identify specific problem areas that the respondent is working to minimize (Kurtz et al., 2015, 2016).

When respondents are motivated to present themselves as healthier or better functioning than they are, denial of substance use can be easily accomplished due to the face validity of items on the Alcohol Problems (ALC) and Drug Problems (DRG) scales. Supplemental formulas are available for estimating scores on ALC and DRG from five PAI scales that are empirically correlated with substance abuse problems (Morey, 1996). Observed scores on ALC and/or DRG that are 10 or more T-score points lower than the estimated scores are proposed to indicate possible denial of substance use and associated problems. Fals-Stewart (1996) reported that deviations from the ALC-estimated and DRG-estimated scores were effective in identifying the PAI profiles of known substance use disorder patients who were instructed to conceal these problems. Significant deviations in the ALC-estimated and DRG-estimated scores were also observed in respondents who denied using drugs but were referred by law enforcement for drug possession or positive drug screens. Thus, when substance use and its consequences are specifically denied by a respondent, the ALC-estimated and DRG-estimated index scores can effectively reveal such denial by their deviation from the observed ALC and DRG scores.
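The 10-point deviation rule lends itself to a simple sketch. The estimation formulas themselves (Morey, 1996) are not reproduced here, so the estimated T score is treated as a given input; the example values are invented:

```python
# Sketch of the denial-of-substance-use check described above. The estimated
# ALC/DRG T score would come from Morey's (1996) supplemental regression
# formulas, which are not reproduced here; this function only applies the
# 10-point deviation rule to an observed and an estimated score.

def possible_denial(observed_t, estimated_t, threshold=10):
    """Flag possible denial when the observed ALC or DRG T score falls
    10 or more points below the score estimated from correlated scales."""
    return (estimated_t - observed_t) >= threshold

print(possible_denial(observed_t=48, estimated_t=63))  # True: 15-point deviation
print(possible_denial(observed_t=60, estimated_t=65))  # False: deviation under 10
```

Note that the rule is directional: an observed score above the estimate does not trigger the flag, since only suppression of ALC or DRG responding is of interest here.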

Specific Daubert Standards

The Daubert ruling established five criteria for the court to consider when ruling on the admissibility of expert testimony based on psychological tests. These five standards assist the judge in evaluating the scientific merits of tests or procedures used by experts. The specific Daubert standards are applied to the available research on the PAI in the following section.

Validity of the PAI Is Empirically Testable

The first Daubert criterion asks whether the instrument in question can be evaluated with objective evidence. As a structured instrument that yields quantitative and continuous scores, the PAI is amenable to close psychometric evaluation of its various scales and index scores. The reliability and validity of the PAI scores can be estimated and compared to alternative tests and procedures designed to assess the same traits and conditions. The rationale behind the construction of the PAI followed the established principles of construct validation that continue to produce the best measures in modern psychological assessment (Clark & Watson, 2019; Watson, 2012). A full review of the evidence for the reliability and validity of the PAI is beyond the scope of this article, but research on the specific scales and subscales up to 2007 is covered in the second edition of the test manual (Morey, 2007a).

PAI Has Been Subject to Peer Review

The importance of peer review of assessment instruments and procedures to establish their scientific credibility cannot be overstated.

Since its publication in 1991, the PAI has been the subject of or has been used as a measure in hundreds of articles published in peer-reviewed journals in psychology, psychiatry, law, and mental health. A search of the PsycINFO database conducted on 1 April 2024 reveals that the PAI is cited in more than 1400 peer-reviewed journal articles or book chapters and in more than 200 dissertation studies.

Known Error Rate

The third Daubert criterion considers whether research has been conducted that allows for the estimation of error rates associated with using the test to make specific decisions. One of the major sources of tension between psychology and the law is that the former deals with probabilistic assertions, whereas the latter demands conclusive facts (Melton et al., 2017). Accordingly, the confidence with which the court can proceed with psychological testimony is tied to demonstrated error rates for the inferences made based on specific test findings. Error exists in the test scores themselves and in the decisions made on the basis of those scores. The former type of error can be addressed with estimates of the standard error of measurement (SEM), and these estimates are reported in the test manual (Morey, 1991, 2007a). SEM estimates for the full scales and subscales of the PAI range between 2.8 and 5.7 T-score points. Information about error rates for various PAI test decisions is available in the empirical research literature, and these studies allow for the quantitative estimation of error rates (e.g., sensitivity, positive predictive power) for categorical decisions based on continuous PAI test scores. There are several varieties of error in test decisions, including the rate of false negative errors (the complement of sensitivity) and the rate of false positive errors (the complement of specificity). What is a desirable error rate? In the case of feigning determinations, the type of error that is perhaps most critical to avoid in an adjudicative context is false positives. The specificity of a given cut score indicates the percentage of cases that are true negatives; thus, the inverse percentage indicates the rate of false positives. Accordingly, experts in forensic psychology have proposed a benchmark specificity value of 0.90 for the cut scores to be used for entertaining inferences about feigned mental disorders or disabilities (see Sherman et al., 2020).
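The two kinds of error discussed here, error in scores and error in decisions, both reduce to simple arithmetic. The sketch below uses an SEM of 4.0 T-score points, an illustrative value within the reported 2.8 to 5.7 range, not the SEM of any particular PAI scale:

```python
# Sketch of the error-rate arithmetic described above. The SEM value is
# illustrative only; actual per-scale SEMs are reported in the test manual.

def t_score_interval(observed_t, sem, z=1.96):
    """95% confidence interval for an observed T score given its standard
    error of measurement (SEM); z = 1.96 corresponds to 95% coverage."""
    return (observed_t - z * sem, observed_t + z * sem)

low, high = t_score_interval(observed_t=70, sem=4.0)
print(f"95% CI around T = 70: {low:.1f} to {high:.1f}")  # → 62.2 to 77.8

# Specificity is the true-negative rate, so the false-positive rate is its
# complement; the 0.90 benchmark thus allows at most 10% false positives.
specificity = 0.90
print(f"false-positive rate: {1 - specificity:.2f}")  # → 0.10
```

The width of such an interval, roughly plus or minus 8 T-score points in this example, is one reason experts are cautioned against over-interpreting small differences between observed scores and decision thresholds.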

One challenge in obtaining useful error rates is that the relevant studies cover a wide range of specific contexts for clinical decisions. Whether a decision rule for the case at hand can be generalized from a study carried out in a different setting or with a different population is a source of uncertainty and potential controversy. Moreover, error rates trend slightly higher in studies where specific disorders are targeted for feigning than in studies of general “faking bad” (Kurtz & McCredie, 2022). Table 1 presents information about specific cut scores that produced specificity values at or above the recommended 0.90 benchmark for several different disorders targeted by feigning respondents. This information demonstrates, for example, that NIM scores at T = 84 or higher have the desirable level of specificity needed for psycho-legal questions about brain injury or ADHD, but much higher scores (T > 100) may be required for confident assertions about feigned PTSD. Even with the level of detail in Table 1, the precise cut scores corresponding to specificity estimates of 0.90 are unclear because study authors usually report such estimates for a few recommended cut scores, and they often do not report optimal cut scores. For example, although Table 1 implies that T = 70 on RDF is required to ensure a specificity of 0.90, the optimal cut scores in several studies were closer to T = 60 (e.g., Hopwood et al., 2010; Kurtz et al., 2023; Pignolo et al., 2023).
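The logic of locating the lowest cut score that satisfies the 0.90 specificity benchmark can be sketched as follows. The honest-responder scores are again invented; published studies derive such cuts from real normative and clinical samples:

```python
# Hypothetical sketch of selecting the lowest T-score cut that meets a
# 0.90 specificity benchmark. The honest-responder scores are invented;
# published studies derive such cuts from real samples.

def lowest_cut_meeting_benchmark(honest_scores, benchmark=0.90):
    """Return (cut, specificity) for the lowest cut whose specificity
    among honest respondents reaches the benchmark."""
    for cut in range(min(honest_scores), max(honest_scores) + 2):
        specificity = sum(t < cut for t in honest_scores) / len(honest_scores)
        if specificity >= benchmark:
            return cut, specificity
    return None, None

honest = [48, 52, 55, 57, 58, 60, 61, 62, 63, 65,
          66, 68, 70, 72, 74, 76, 78, 80, 82, 86]

cut, spec = lowest_cut_meeting_benchmark(honest)
print(f"lowest qualifying cut: T = {cut} (specificity = {spec:.2f})")
# prints: lowest qualifying cut: T = 81 (specificity = 0.90)
```

Because published studies typically report specificity only for a handful of pre-selected cut scores rather than scanning every candidate value in this way, the tabled cut meeting the benchmark can sit well above the true optimal cut for a given sample.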

Table 1. PAI negative distortion indicators in feigning studies: cut scores with observed specificity ≥ 0.90

Standards for Use and Administration of the PAI

The publisher of the PAI requires prospective users to submit information about their education and training history to determine whether they are qualified to use the test. These qualifications include an advanced graduate or professional degree from a program that provides instruction and training in the administration, interpretation, and ethical use of psychological tests. Such instruction and training are typically found in doctoral and some master’s programs in clinical or counseling psychology, and they are rarely found in graduate programs for other mental health professions (including psychiatry). Some challenges to PAI evidence presented in court may concern the qualifications of the user rather than the value of the instrument per se. Even an instrument with firmly established scientific standards can be challenged in court proceedings if it is not interpreted by a professional with adequate qualifications and training. The availability of computerized interpretive narratives from published scoring software does not relieve the PAI user of the need to fully comprehend how the test operates, to attend closely to all the test data, and to render an independent expert analysis. At least one court decision responding to a challenge to PAI-based testimony objected to an evaluation report that was prepared by “cut and paste” of the software-generated interpretive text (Meaux et al., 2022).

Further objections to test-based evidence presented to the court may focus on whether the instrument was administered properly. Competent users of the PAI carefully attend to the administration guidelines described in the test manual (Morey, 1991, 2007a). The integrity of any normative comparison depends on adherence to the standard instructions, monitoring the testing environment, and being available to answer questions. Remote assessment platforms have been embraced widely by mental health practitioners, especially in the aftermath of the COVID-19 pandemic. Amid the excitement over these new possibilities, however, the fundamental procedures for proper test administration can be overlooked. Technological advances in remote assessment do not relieve the examiner of the burden of supervising the full administration of testing protocols, including self-report measures like the PAI. In a high-stakes forensic context, closely monitoring the testing process is paramount to ensuring the integrity of the results. Furthermore, the specific procedures used for remote testing should be fully described in the written evaluation report. Although there are no indications that remote assessment platforms compromise the validity of the PAI, the complexities of forensic assessment cases demand careful consideration of the potential limitations of such practices for the conclusions offered.

General Acceptance of the PAI in Forensic Psychology

The final Daubert criterion considers whether the instrument in question is used by professionals working in forensic settings and/or testifying in court. The clinical and validity scales of the PAI offer relevant information for many legal and forensic scenarios (Edens et al., 2001; Fokas & Brovko, 2020; Morey et al., 2007). There are two reviews of published court decisions in which the PAI is cited. Mullen and Edens (2008) reviewed appellate decisions filed by federal and state courts between 1991 and 2006. They found 43 criminal cases and 82 civil cases (125 total) involving the PAI. Only three cases challenged the admissibility of the expert testimony, and because the PAI was typically included in a battery of procedures, the review could not attribute any of these challenges specifically to the expert’s use of the PAI. Meaux et al. (2022) conducted a similar case law survey for the years 2006 to 2020. They found 523 cases and a trend of increasing frequency in citations over the period studied. Challenges to the PAI were rare (six cases), and they typically centered on questions of proper administration and interpretation rather than the value of the test itself. Charles et al. (2022) report findings from another case law survey indicating that the adolescent version of the PAI (PAI-A; Morey, 2007b) has also been admitted into evidence in several federal and state court decisions.

Other indications of general acceptance in the field of assessment psychology can be gleaned from surveys of test usage by practitioners. Neal and Grisso (2014) reported that the PAI was the second-most frequently mentioned test in an international survey of 423 forensic mental health professionals. This finding was consistent with an earlier survey of 152 forensic psychologists by Archer et al. (2006). Piotrowski (2017, p. 83) reviewed surveys of psychological test usage patterns and preferences and concluded that the PAI is “recognized and relied upon to a high degree in both assessment training and practice,” especially among neuropsychologists and forensic professionals in the USA. Surveys of graduate and internship training programs can indicate whether a test or instrument will continue to be used in the field. Mihura et al. (2017) surveyed the assessment training offered by 83 doctoral programs in clinical psychology and found that the PAI was taught in 76% of these programs. Stedman et al. (2018) surveyed 355 directors of psychology internship training programs regarding their expectations of which test procedures interns should have learned in graduate school and which they should learn during the internship. Fewer than half (46%) of directors believed that the PAI should be taught in graduate school, and 59% believed that training with the PAI should be included in the internship. A survey of 534 clinical and counseling psychology trainees by Ingram et al. (2022) found that 72.3% of trainees received didactic instruction in the PAI, but only 50.2% had an opportunity to use it with clinical training cases. Collectively, these surveys demonstrate that, although the PAI is not universally taught in graduate schools or internship programs, most professional psychologists have been educated about it.

Conclusions

The use of the PAI to inform psychological expert testimony should be robust to Daubert challenges in court. The PAI is a standardized instrument, and its reliability and validity have been the subject of many peer-reviewed research studies. There is a large empirical literature on the PAI in forensic contexts that affords an appraisal of the information it can bring to legal proceedings. Studies of the diagnostic efficiency of various PAI scales and subscales provide estimates of error rates, although most of these are studies of the detection of feigned mental disorders. The publisher of the PAI has specified standards for the proper education and training of PAI users, and there are explicit guidelines for administration, scoring, and interpretation. The PAI is widely used in the contemporary practice of clinical assessment, and it is frequently an instrument of choice in forensic settings and legal contexts.

Daubert provides no guarantee that scientifically sound practices in psychology will be admitted or that questionable practices will be excluded. Indeed, a recent analysis of case law records demonstrated that attorneys and judges are limited in their ability to distinguish valid from flawed tests and assessment procedures (Neal et al., 2019). Mental health professionals who provide expert testimony to the court must be aware of and acknowledge the limitations of their input. Although there is considerable validity evidence for some PAI clinical scales, such as ARD-T, ANT, and AGG, the validity of other clinical scales and index scores in forensic assessment contexts has received less empirical attention to date. The available case law indicates that the admissibility of any instrument is more easily defended when there is published empirical support for the specific interpretation advanced and its application to the specific legal scenario (Meaux et al., 2022). Ultimately, the PAI, like all psychological tests, is imperfect but likely to be revised and improved over time. The normative samples for the PAI are now more than 30 years old, older than the Daubert standard itself, but plans are underway for a revision and re-standardization project. Finally, a strong psychological evaluation considers all the data. To address most psycho-legal questions, the PAI should be administered along with multiple other tests and procedures, and the opinions of the forensic evaluator should give fair and even consideration to all the evidence collected.