In 2012, the US Food and Drug Administration (FDA) launched its Patient-Focused Drug Development (PFDD) initiative, established as part of the fifth reauthorization of the Prescription Drug User Fee Act (PDUFA V) to underscore the importance of patient engagement in the pharmaceutical development process [1]. The initiative was in many ways the natural evolution of a shift in the drug development paradigm that had been signaled by the FDA's 2009 guidance on patient-reported outcome (PRO) measures [2]. As part of the PFDD initiative, and to reflect observations and lessons learned in the decade following the release of the original PRO guidance, the FDA has been collaborating with stakeholders in academia, healthcare, and the consulting industry to develop four guidance documents that update and expand upon the development and use of clinical outcome assessments (COAs), including (but no longer limited to) PROs [3].

While the four guidance documents, when completed, will collectively replace the PRO guidance from 2009, standalone versions of each document are being released as they become available. Final versions of the first two guidance documents, which focus on qualitative data, are already in the public domain [4, 5]. A draft version of the third document (PFDD G3) was released in June 2022 [6]. When finalized, PFDD G3 (the focus of this commentary) will represent the FDA’s current thinking on “selecting, developing, or modifying fit-for-purpose” COAs.

Readers intimately familiar with the 2009 PRO guidance may be tempted to compare the draft PFDD G3 directly with its predecessor. One of the most ambitious changes from the 2009 PRO guidance to the four-part PFDD series is the expansion of coverage from PROs to all COAs; another, informed by the FDA's more than a decade of experience helping study sponsors implement PROs in clinical trials, is greater clarity about the evidence expected to support the identification of patient-centric concepts and the selection of appropriate COAs. Meanwhile, a particularly striking omission from the PFDD G3, compared with the 2009 PRO guidance, is its deafening silence on COA-based product labeling. It is our hope that the forthcoming Guidance 4 will fill this void.

Perhaps the most obvious difference is the increased emphasis on the role (and rigor) of psychometrics. Whereas the 2009 PRO guidance focused mainly on the importance of qualitative data and made limited reference to analytic approaches, the PFDD G3 gives considerable (arguably equal) weight to topics previously left to COA developers. Frameworks and standards from educational and psychological testing have long been understood and widely applied by researchers evaluating COAs; now they are being formally acknowledged in the new guidance. However, casual readers of the PFDD G3 without a background in educational and psychological testing may feel lightheaded wading through the specialized terminology and commentary on specific measurement methods: differential item functioning, measurement invariance, item characteristic curves, reflective indicator models, composite indicator models, Samejima’s Graded Response Model … the list goes on and on! There is an entirely new vocabulary to learn; the words may be unfamiliar, but the message is clear: the age of the psychometrician has arrived.
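For readers meeting Samejima’s Graded Response Model for the first time, a short sketch may make two of these terms more concrete. The Python below assumes the common logistic form of the model, with one discrimination parameter `a` per item and strictly increasing threshold parameters; the function names and parameter values are hypothetical and chosen only for illustration, not taken from the guidance. Each boundary curve gives the probability of responding in category k or higher at a given trait level θ, and each category's probability is the gap between adjacent boundary curves; plotting those probabilities against θ yields the item's category characteristic curves.

```python
import math
from typing import List, Sequence


def grm_category_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under a logistic graded response model.

    Hypothetical helper for illustration only.
    theta      -- respondent's latent trait level
    a          -- item discrimination (slope), assumed positive
    thresholds -- strictly increasing boundary locations b_1 < ... < b_m
    Returns m + 1 probabilities for response categories 0..m.
    """
    if a <= 0:
        raise ValueError("discrimination must be positive")
    if any(upper <= lower for lower, upper in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    # Boundary curves P(response >= k) for k = 1..m, padded with
    # P(response >= 0) = 1 and P(response >= m + 1) = 0.
    boundary = [1.0]
    boundary += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    boundary.append(0.0)
    # P(response == k) is the gap between adjacent boundary curves.
    return [boundary[k] - boundary[k + 1] for k in range(len(thresholds) + 1)]


def expected_score(theta: float, a: float, thresholds: Sequence[float]) -> float:
    """Expected item score: the sum of k * P(response == k)."""
    probs = grm_category_probs(theta, a, thresholds)
    return sum(k * p for k, p in enumerate(probs))


if __name__ == "__main__":
    # Hypothetical four-category item scored 0-3.
    item_a, item_b = 1.5, [-1.0, 0.0, 1.0]
    for theta in (-2.0, 0.0, 2.0):
        probs = grm_category_probs(theta, item_a, item_b)
        print(theta, [round(p, 3) for p in probs], round(expected_score(theta, item_a, item_b), 3))
```

Because θ = 0 sits at the centre of the hypothetical thresholds, `grm_category_probs(0.0, 1.5, [-1.0, 0.0, 1.0])` returns four probabilities that sum to one and are symmetric about the middle categories, while the expected score rises as θ increases.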

It is no secret that psychometricians currently spend most of their working hours in the back rooms when contributing to COA development. The seemingly seismic shift ushered in by the PFDD G3, with its increased focus on the measurement properties and interpretability of COAs, is likely to bring psychometricians to center stage in discussions about the development and interpretation of COAs. The implications are many, ranging from fundamental revisions to the trial planning process to job security and hip lingo. Notably, although sponsors have long been encouraged to hold pre-trial inquiries and meetings with the FDA on COA strategy and psychometric analysis plans, such discussions have often been treated as an afterthought. We hope that the weight the PFDD G3 places on psychometric testing will encourage sponsors to pursue these conversations earlier in the planning stages, more frequently, and with greater emphasis. Similarly, we predict that this interest will spur demand for at least basic psychometric training for drug development teams in pharmaceutical companies, and that item response theory (IRT), which before its revival in the PFDD G3 had lurked on the periphery of COA development, will become a fashionable term to throw around in product-development meetings. Furthermore, some psychometricians have expressed the hope that the draft guidance will prompt support for emerging methodological work within the COA field, work that may one day provide the ‘go-to’ references for methods publications in place of less directly applicable sources from educational and psychological testing.

Apart from its emphasis on modern psychometric approaches, the guidance also ventures into the value of new technologies and modes of data collection, championing some innovations and cautioning against overenthusiasm for other relatively novel approaches that still need ironing out. Among the ‘winners’ here, the guidance (1) acknowledges the rapid evolution of digital health technologies (DHTs) that may be used to measure outcomes in clinical trials; (2) assures readers that the FDA will consider well-justified approaches to computerized adaptive testing; and (3) encourages the use of assistive technologies, such as eye trackers and screen readers, so that study participants with impairments, including vision impairment, can provide reliable reports. This shift is highly encouraging for drug developers, especially those who struggle to demonstrate treatment benefit in rare diseases [7]. Conversely, although the PFDD G3 does not denigrate paper-based data collection, it does warn of the undue bias that may be introduced by using different modes of data collection in the same study. Enthusiasts of the Bring-Your-Own-Device concept should take note.

Inevitably, the examination of several novel approaches also means the FDA has revisited some old ‘classics’ and, in some instances, found them lacking. The relegation of the visual analog scale, an antiquated measurement tool, to the COA Hall of Shame is to be applauded. On the other hand, the dismissal of Cohen’s much-venerated cut-offs for interpreting correlation coefficients may be mourned by those who prefer these rules of thumb.

Ultimately, the newfound celebrity of psychometricians and the flashy technology and data collection approaches described in the guidance are in service to an old master: evidence of fitness-for-purpose (or, as we old-timers may still call it, ‘validity’). The PFDD G3 describes eight components that should be considered for inclusion in the rationale and supporting evidence or justification for a COA, but the core of each component is the same. Evidence, evidence, and more evidence.

While the evidence requirements in the guidance may look daunting at first glance, anyone who reads the PFDD G3 and tells you that the sky is falling has missed the point. The final takeaway here is neither new nor shocking: the Agency is likely to expect more extensive evidence to support COAs in areas of greater uncertainty. Where a COA developed in adults is being used with adolescents, cognitive debriefing will be required. Where new translations of a COA are being used, cultural adaptation confirmatory interviews will be required. Where a COA developed for home completion is being administered during a clinical site visit, an independent study to confirm equivalence will likely be required. These ‘requirements’ are not new. The insistence on demonstrating that a COA is fit-for-purpose has always been at the heart of regulatory consideration. Releasing an entire guidance document to emphasize this simply provides greater clarity, helping sponsors demonstrate a robust COA strategy.

The speed at which the FDA review divisions will implement and adapt the recommendations of the PFDD G3 (not to mention how other major regulators, such as the European Medicines Agency, will respond) remains to be seen. Certainly, changes based on these recommendations will take time to unfold fully, but we are already seeing growth in COA task forces and in joint endeavors by government agencies, academia, and industry to develop IRT-based standardized COAs such as PROMIS® and ASCQ-Me®. These advances are complemented by rapid technological developments, the spread of wearable and in-home medical devices, and the evolution of test theory in our field. Taken together, the increased emphasis on evidence requirements and the modern test theories that the PFDD G3 embraces to meet them could result in improvements in benefit–risk analysis, greater review efficiency, and (dare we hope?) better chances of having label claims approved.