1.1 Introduction

The domain of “person-centered outcomes” is an evolving array of ideas, tools, and practices. In this book, we use person-centered outcomes (PCOs) as an umbrella term encompassing key stakeholders’ (i.e., the recipient, caregiver, or provider of healthcare) assessments, ratings, beliefs, opinions, experiences, or satisfaction concerning medical/surgical interventions (including clinical practice, research, and trials). PCO instruments (e.g., rating scales, ability tests, biometric equipment, wearables) purport to quantify health, health-related quality of life, and other latent health constructs, such as pain, mood, and function. They can also be used to quantify the quality of healthcare. PCO instruments play an increasingly central role in evidence-based medicine [93, 107, 108].

Used alone or in tandem with surrogate data (e.g., samples analyzed in the laboratory), PCO data offer the opportunity for more meaningful and interpretable individualized measures of patient outcomes. Custom-tailored PCO reports need entail neither superficially comparable numbers nor completely disconnected details. PCO data, instruments, and theory have repeatedly proven themselves, across multiple clinical situations, dependable foundations for meaningful common languages and shared metrics that speak directly to care recipients, caregivers, healthcare providers, researchers, and policy makers [30, 39, 59].

Meaningful PCO measures map the natural courses of disease, healing, degenerative conditions, learning, development, and growth in quality-assured common units with clearly stated uncertainties, guiding treatment decisions tailored to the unique situations of different patients. Most current approaches to person-centeredness are limited, in that they typically do not follow through from the stated intention of focusing on people (patients, employees, clients, customers, suppliers, students, etc.) to fulfillment in practice [20, 33]. The crux of the matter is the difference between modernizing and ecologizing: prioritizing the objectivity of data in disconnected statistical modeling, versus prioritizing networks of actors who agree on the objectivity of a unit quantity that retains its properties across samples and instruments [53, 76].

Therefore, it is important that we can clearly articulate how scientifically calibrated and metrologically distributed metrics—measurement systems—fulfill the meaning of person-centeredness in strikingly new and widely unanticipated ways. We offer seven suggestions.

  • First, instead of burying the structure and meaning of patients’ expressions of their experiences in sum scores and ordinal statistics, we advocate using response data and explanatory models [35, 87, 104] to calibrate quality-assured instruments expressing that experience in substantively interpretable interval quantitative terms that are uniformly comparable everywhere [84, 85, 93, 94, 116, 118].

  • Second, instead of perpetuating the failed assumption that objective reality somehow automatically propagates itself into shared languages and common metrics for free, we acknowledge and leverage networks of actors who agree on the objectivity of repeatable and reproducible structural invariances, and who collaborate in bringing those invariances into distributed measurement systems, usually at great expense, but also with proportionate returns on the investments [5, 13, 17, 37, 43, 45, 47, 48, 54, 68, 69, 72, 77, 120].

  • Third, instead of using vaguely defined terms and policies to promote patient engagement and the improved outcomes that follow from informed patient involvement, we advocate defining engagement by mapping it, calibrating it, explaining it, and individualizing the navigation of it [16, 86, 105, 112, 132].

  • Fourth, instead of assuming data are inherently valid and meaningful, we advocate theoretical explanations of patient experiences that support a qualitative narrative accounting for variation [35, 87, 104]; this sets up a new level of defensibility, not solely reliant on any given provider of healthcare’s skills and experience.

  • Fifth, instead of reifying unidimensionality in a rigid and uncompromising way, we take the pragmatic idealist perspective of using empirically and theoretically validated standards to illuminate differences that make a difference, and, conversely, tapping even small degrees of correlation between different dimensions for the information available [3, 113, 119].

  • Sixth, instead of siphoning off data into research and management reports incapable of affecting the care of the individual patients involved, we advocate immediately feeding back, at the point of care, coherent, contextualized, and structured diagnostic reports [53, 111]; i.e., self-scoring forms and “kidmaps,” which we may call “PatientMaps,” “ClientMaps,” or “PersonMaps” [12, 18, 26, 27, 50, 79, 80, 86, 111, 114, 115, 131, 132].

  • Seventh, instead of assuming that statistical averages of ordinal scores are adequate to the needs of individual patient care, and instead of assuming even that logit measures and uncertainties are capable of summarizing everything important about an individual patient experience, we advocate displaying patterns of individual ratings illustrating diagnostically relevant special strengths and weaknesses; by acknowledging the multilevel semiotic complexity of all signification in language in this way, we recognize the nature of measured constructs as boundary objects “plastic enough to be adaptable across multiple viewpoints, yet maintain continuity of identity” [45, 47, 54, 101, p. 243].
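The contrast drawn in the first and sixth points, between ordinal sum scores and calibrated interval measures read from quality-assured instruments, can be made concrete with a small numerical sketch. The following minimal joint maximum-likelihood calibration of the dichotomous Rasch model is our own illustration (function name, damping, and simulation setup are assumptions, not the authors’ procedure); production calibration would use established software and full fit diagnostics:

```python
import numpy as np

def rasch_jmle(X, n_iter=100):
    """Minimal joint maximum-likelihood calibration of the dichotomous Rasch
    model. X is a persons-by-items 0/1 matrix with no extreme (all-0/all-1)
    person rows. Returns person measures and item calibrations in logits."""
    theta = np.zeros(X.shape[0])   # person abilities (logits)
    beta = np.zeros(X.shape[1])    # item difficulties (logits)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - beta[None, :])))  # expected responses
        w = p * (1.0 - p)                                            # binomial information
        # damped Newton-Raphson steps for persons and items
        theta += np.clip((X - p).sum(axis=1) / w.sum(axis=1), -1.0, 1.0)
        beta -= np.clip((X - p).sum(axis=0) / w.sum(axis=0), -1.0, 1.0)
        beta -= beta.mean()        # fix the origin: mean item difficulty = 0
    return theta, beta

# Simulated check: recover known item difficulties from generated responses
rng = np.random.default_rng(7)
true_beta = np.linspace(-2.0, 2.0, 12)
true_theta = rng.normal(0.0, 1.0, 400)
P = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_beta[None, :])))
X = (rng.random(P.shape) < P).astype(int)
keep = (X.sum(axis=1) > 0) & (X.sum(axis=1) < X.shape[1])  # drop extreme scores
theta_hat, beta_hat = rasch_jmle(X[keep])
```

Because person and item parameters are separable in the Rasch model, the recovered item calibrations should track the generating difficulties regardless of which sample of persons was drawn; this sample-invariance is what lets a calibrated instrument report measures in a stable interval unit rather than sample-dependent sum scores.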

An additionally useful reporting application would associate anomalous observations with diagnostically informative statistics drawn from historical data on similar patients with similar response patterns, conditions, co-morbidities, genetic propensities, etc. Guttman scalograms [63], for instance, used in conjunction with model fit statistics, reveal stochastic patterns in individual responses [78] predicting signature sequences of diagnostically informative departures from expectation [34, 61].
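As a rough illustration of the scalogram idea (the function and data below are hypothetical, not drawn from the cited sources), a response matrix can be sorted into Guttman order and scanned for cells departing from the deterministic Guttman pattern; in practice such flags would be assessed probabilistically via model fit statistics rather than this all-or-nothing rule:

```python
import numpy as np

def guttman_scalogram(X):
    """Sort a 0/1 response matrix into Guttman order (higher-scoring persons
    on top, easier items to the left) and flag cells departing from the ideal
    Guttman pattern, i.e., unexpected failures on easy items or unexpected
    successes on hard ones."""
    persons = np.argsort(-X.sum(axis=1), kind="stable")
    items = np.argsort(-X.sum(axis=0), kind="stable")
    S = X[persons][:, items]
    # ideal pattern: a person with raw score r succeeds on exactly the r easiest items
    ideal = (np.arange(S.shape[1])[None, :] < S.sum(axis=1)[:, None]).astype(int)
    return S, (S != ideal)

# A perfect Guttman staircase produces no flags; one flipped response does
X = np.array([[1, 1, 1, 1],
              [1, 1, 1, 0],
              [1, 1, 0, 0],
              [1, 0, 0, 0],
              [0, 0, 0, 0]])
_, flags = guttman_scalogram(X)   # flags.sum() == 0
X[2, 3] = 1                       # an unexpected success on the hardest item
_, flags = guttman_scalogram(X)   # the anomaly is now flagged
```

Flagged cells of this kind are exactly the diagnostically informative departures from expectation that could be matched against historical response patterns from similar patients.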

Metrological quality assurance is essential if reliable decisions about diagnosis, treatment, and rehabilitation are to be made consistently throughout a healthcare system, with continuous improvement [25]. This goes far beyond the well-trodden path of debates about data analysis or model choice, which have played out ad nauseam, accomplishing little more than endless arguments over arbitrary methodological criteria. A description of the situation dating to over 20 years ago remains as true now as it was then. The fundamental oversight in person-centered health care outcome management is that, in addition to the problem of model choice,

The task of psychosocial measurement has another aspect that remains virtually unaddressed, and that is the social dimension of metrology, the networks of technicians and scientists who monitor the repeatability and reproducibility of measures across instruments, users, samples, laboratories, applications, etc. For the problem of valid, reliable interval measurement to be solved, within-laboratory results must be shared and communicated between laboratories, with the aim of coining a common currency for the exchange of quantitative value. Instrument calibration (intra-laboratory repeatability or ruggedness) studies and metrological (interlaboratory reproducibility) studies must be integrated in a systematic approach to accomplishing the task of developing valid, reliable interval measurement. [43, p. 529]

Objective metrological comparability (‘traceability’) and declared measurement uncertainty leverage patterns that have been repeatedly reproduced for decades across patients and instruments, and that cohere into a common language [53, 111]. A possible way forward involves a synthesis of metrology, psychometrics, and philosophy that involves four cornerstones.

First, it is essential to root measured constructs and unit quantities succinctly and decisively in the ways they:

  • are structured as scientific models in the form of Maxwell’s equations, following Rasch [44, 48, 96, pp. 110–115];

  • extend in new ways everyday language’s roots in the metaphoric process [49], following Maxwell’s method of analogy in his exposition of how “every metaphor is the tip of a submerged model” [14, 15, p. 30]; and

  • extend everyday thinking into new sciences in the manner described by Nersessian’s [89] study of Maxwell’s method of analogy [44, 45, 48].

Second, it is also essential to show and assert that measured constructs and unit quantities:

  • are defined by the populations of persons and items manifesting the construct;

  • are substantiated empirically in terms of samples of persons and items drawn from those populations that are rigorously representative of them; and

  • are explained theoretically in terms of predictive models structuring experimental tests of cognitive, behavioral, and structural processes.

Third, from this it follows that:

  • reference standard units and associated uncertainties will be set up as formal constants open to testing, refinement, and reproduction anywhere, anytime, by anyone;

  • criteria for sample definitions, instrument administration, data security, etc. will have to be developed and adopted via consensus processes; and

  • local reference standard laboratories will be charged with reproducing the unit from standard samples and from theory, to within a relevant range of uncertainty, and maintaining it in clinical practice and research applications.
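The conformity checking implied in this list can be sketched with the normalized-error statistic commonly used in interlaboratory comparisons (the function name and coverage factor here are our illustrative assumptions, not prescriptions from this chapter):

```python
def consistent_with_reference(x_lab, u_lab, x_ref, u_ref, k=2.0):
    """Normalized-error check used in interlaboratory comparisons:
    E_n = |x_lab - x_ref| / (k * sqrt(u_lab**2 + u_ref**2)).
    E_n <= 1 indicates the local laboratory's reproduction of the unit agrees
    with the reference value within the expanded (k = 2, roughly 95%)
    uncertainty; E_n > 1 signals an unexplained discrepancy to investigate."""
    e_n = abs(x_lab - x_ref) / (k * (u_lab**2 + u_ref**2) ** 0.5)
    return e_n <= 1.0

# A local value 0.02 units high, with 0.01 standard uncertainty on each side
consistent_with_reference(10.02, 0.01, 10.00, 0.01)   # True: within expanded uncertainty
consistent_with_reference(10.10, 0.01, 10.00, 0.01)   # False: discrepancy unexplained
```

A local reference laboratory charged with maintaining the unit would run checks of this kind routinely, against both standard samples and theoretical predictions.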

Fourth, expanding and clarifying these points:

  • day-to-day measures will not be estimated via data analysis, but will instead be read from the calibrated instrument and will be reported in varying ways depending on the application:

    • individualized ‘kidmaps’ reporting specific responses;

    • measurements in the unit quantity and uncertainty; and

    • aggregate comparisons over time, horizontally across clinics and providers, and vertically within an organization, system, or region (see Fig. 1.1);

  • quality assurance processes in the reference labs and the standard setting lab will document legally binding conformity with procedures;

  • stakeholder participation in every area of activity and completely transparent openness to every kind of critical input will be essential; and

  • we would warmly welcome every conceivable empirical and/or theoretical challenge, because the contestability of comparable results is a hallmark precursor of scientific progress that has to date been counterproductively excluded from the methods of outcome modelling and measurement in health care and other fields.
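To make the individualized ‘kidmap’ reporting mentioned above more tangible, here is a hypothetical sketch of a text-mode “PersonMap” (the item names, flagging threshold, and function are ours, for illustration only): items are listed easiest to hardest alongside the observed response, the Rasch-expected success probability given the person’s measure, and a flag where observation and expectation diverge sharply.

```python
import numpy as np

def person_map(item_names, difficulties, responses, ability, flag_at=0.75):
    """Hypothetical text-mode 'PersonMap': one line per item, ordered easiest
    to hardest, showing the observed 0/1 response, the Rasch-expected
    probability of success, and a flag for diagnostically surprising
    responses (observed far from expected)."""
    lines = []
    for i in np.argsort(difficulties):
        p = 1.0 / (1.0 + np.exp(-(ability - difficulties[i])))
        flag = "  <-- unexpected" if abs(responses[i] - p) > flag_at else ""
        lines.append(f"{item_names[i]:<18} obs={responses[i]}  P(success)={p:.2f}{flag}")
    return "\n".join(lines)

report = person_map(
    item_names=["walk 10 m", "climb stairs", "run 100 m"],
    difficulties=np.array([-1.0, 0.0, 2.0]),
    responses=[1, 1, 1],
    ability=0.5,
)
print(report)   # the success on the hardest item is flagged as unexpected
```

A report in this spirit keeps the individual response pattern visible at the point of care, instead of collapsing it into a single score.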

The urgent need for a new focus is the key motivating factor for this edited volume. In this unique collection, we explore the synthesis of metrology, psychometrics, philosophy, and clinical management to support the global comparability and equivalence of measurement results in PCO measurement. The target audience for this book is any and all key stakeholders interested in person-centered care, including policy makers, clinicians, pharmaceutical industry representatives, metrologists, and health researchers.

Fig. 1.1 Developmental, horizontal, and vertical coherent measurement dimensions: a 3-D line graph plotting horizontal coherence (x axis) against vertical coherence, from patient through clinic, state, national, and international levels (y axis), and developmental coherence from TP 1 through TP 4 (z axis). (Modified from Fisher et al. [51])

This book includes a unique collection of works from world-recognized experts, researchers, and thought leaders in PCO research. The two sections of this volume explore the potential benefits of moving towards a PCO metrological framework across clinical practice and research, and across methodology and theory, providing solutions that include:

  • addressing the lack of units in patient-centered outcome measurement through recourse to mathematical models devised to define meaningful, invariant, and additive units of measurement with known uncertainties;

  • establishing coordinated international networks of key stakeholders guided by five principles (i.e., collaboration, alignment, integration, innovation and communication); and

  • better use of technology leveraging measurement through item banks linking PCO reports via common items, common patients, or specification equations based in strong explanatory theory.
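The item-banking and linking ideas in this list can be sketched in miniature (a hypothetical illustration of ours; real equating designs involve many more items, fit checks, and uncertainty estimates): when two instruments share common items, the mean difference between their independently estimated item difficulties gives the additive constant placing both on one scale.

```python
import numpy as np

def common_item_link(common_items_form_a, common_items_form_b):
    """Rasch common-item equating in miniature: both forms have calibrated
    the same shared items in their own local units; the mean difficulty
    difference is the shift carrying form-B measures onto form A's scale."""
    a = np.asarray(common_items_form_a, dtype=float)
    b = np.asarray(common_items_form_b, dtype=float)
    return float(np.mean(a - b))

# hypothetical difficulties of three shared items as calibrated on each form
shift = common_item_link([0.5, 1.0, -0.2], [0.2, 0.7, -0.5])   # 0.3 logits
```

Adding this constant to form-B measures expresses them in form A’s frame of reference, which is the elementary step behind linking PCO reports through item banks with common items or common patients.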

1.2 The Chapters

Section one includes five chapters covering person-centered research and clinical practice. In her clinician’s guide to performance outcome measurements, Anna Mayhew provides excellent insight as a clinical evaluator and researcher as to the role of the Rasch model in maximizing the use and interpretability of the North Star Ambulatory Assessment in better understanding the progression of Duchenne muscular dystrophy. Continuing this theme, Diane Allen and Sang Pak provide a clinical perspective as to what drives PCO measurement strategies in patient management.

We then turn to ophthalmology in two research programs. The first, from Maureen Powers and William P. Fisher, Jr. describes how physical and psychological measurements of vision combine into a model of functional binocular vision; this psychophysical combination of biological, survey, and reading test data demonstrates how data from different domains can be integrated in a common theoretical and applied context. The next chapter, from Bob Massof and Chris Bradley, describes the evolution of a long-standing program for low vision rehabilitation, which exploits item banking and computer adaptive testing. They propose a strategy for measuring patient preferences to incorporate in benefit-risk assessments of new ophthalmic devices and procedures. Finally, Sarah Smith describes the importance of quantitative and qualitative enquiry, against the backdrop of calibrated rating scales, providing the perspective of a health services researcher working in the field of dementia, at the national level.

In Section two, we move to fundamentals and applications. The section begins with John Michael Linacre’s reflections on equating measurement scales via alternative estimation methods; conceptually similar scales are aligned so that the derived measures become independent of the specifics of the situation on which they are based, with the concomitant result that theoretical differences between supposedly superior and inferior estimation methods turn out to have little or no practical consequence. David Andrich and Dragana Surla tackle the same subject from the perspective of one estimation method, but with the goal of having a common unit referenced to a common origin and where the focus is on making decisions at the person level. Thomas Salzberger takes this one step further by providing an example from the measurement of substance dependence, making the argument for traceability in social measurement via the co-calibration of different instruments in a common metric.

Jeanette Melin and Leslie Pendrill provide two chapters, which take the conversation about co-calibration an additional step further, returning to the subject of dementia. First, the authors describe a research program which elaborates the role of construct specification equations and entropy to better understand the measurement of memory through ability tests. The subsequent chapter makes the link to quality assurance in PCO measurement by describing the potential for establishing metrological references in fields such as person-centered care in the form of “recipes” analogous to certified reference materials or procedures in analytical chemistry and materials science. Finally, William Fisher grounds the contents of this book in a philosophical framework extending everyday thinking in new directions that offer hope for achieving previously unattained levels of efficacy in health care improvement efforts.

1.3 Acknowledging and Incorporating Complexity

We expect the reader will recognize that there are potential inconsistencies and even disagreements across the chapters. We fully acknowledge these, and would respond that, though matters are very far from being resolved in any kind of a settled way, there are productive, constructive, and pragmatic reasons for considering a metrological point of view on the role of measurement in health care’s person-centered quality improvement efforts.

Of particular importance among these reasons are the ways in which metrology undercuts the “culture wars” and the futile modernist-postmodernist debates, doing so by taking the focus off the relative priorities of theory vs observation [73, 74]. In Golinski’s [60, p. 35] words, “Practices of translation, replication, and metrology have taken the place of the universality that used to be assumed as an attribute of singular science.” Alternatively, in Haraway’s [64, pp. 439–440] terms, “…embedded relationality is the prophylaxis for both relativism and transcendence.” That is, the universality of scientific laws cannot be demonstrated absent instrumentation and those laws cannot be communicated without a common language; nor can the observed data’s disparate local dependencies make any sense in relation to anything if there is no metric or linguistic standard to provide a medium of comparison.

Both modern and postmodern perspectives must inevitably make use of shared standards, suggesting a third alternative focused on the shared media of communications standards and metrologically traceable instruments. Latour’s [72, pp. 247–257] extended consideration of the roles of metrology is foundational. Latour [73, 74] characterizes this third alternative as amodern, and Dewey [36, p. 277] similarly arrives at a compatible unmodern perspective, saying that “…every science and every highly developed technology is a systematic network of connected facts and operations.” Galison [57] considers the modern focus on transcendental universals as positivist, the postmodern emphasis on relativism as antipositivist, and the unmodern inclusion of the instrument as postpositivist. A large and growing literature in science and technology studies pursues the implications of instruments and standards for understanding the progress of science [1, 5, 13, 19, 21, 37, 67, 90, 109].

Galison [58, p. 143] offers an “open-ended model” of how different communities of research and practice interrelate. This perspective allows:

  • partial autonomy to each community at its level of complexity:

    • experimentation’s focus on concrete observable data,

    • instrumentation’s focus on abstract communications standards, and

    • theory’s focus on formal models, laws, and predictive theories; and

  • “a rough parity among the strata—no one level is privileged, no one subculture has the special position of narrating the right development of the field or serving as the reduction basis” [57, p. 143].

A significant consequence of this open-ended model in physics is, Galison [57, pp. 46–47] points out, that

…between the scientific subcultures of theory and experiment, or even between different traditions of instrument making or different subcultures of theorizing, there can be exchanges (co-ordinations), worked out in exquisite detail, without global agreement. Theorists and experimenters, for example, can hammer out an agreement that a particular track configuration found on a nuclear emulsion should be identified with an electron and yet hold irreconcilable views about the properties of the electron, or about philosophical interpretations of quantum field theory, or about the properties of films.

The work that goes into creating, contesting, and sustaining local coordination is, I would argue, at the core of how local knowledge becomes widely accepted. At first blush, representing meaning as locally convergent and globally divergent seems paradoxical. On one hand, one might think that meaning could be given sentence by sentence. In this case the global sense of a language would be the arithmetical sum of the meaning given in each of its particular sentences. On the other hand, the holist would say that the meaning of any particular utterance is only given through the language in its totality. There is a third alternative, namely, that people have and exploit an ability to restrict and alter meanings in such a way as to create local senses of terms that speakers of both parent languages recognize as intermediate between the two. The resulting pidgin or creole is neither absolutely dependent on nor absolutely independent of global meanings.

What Galison describes here is the state of being suspended in language, semiotically, where abstract semantic standards mediate the negotiation of unrealistic conceptual ideals and unique, concrete local circumstances. This theme also emerges in the work of S. L. Star under the heading of the boundary object [19, 99, 100, 103], and in Woolley and Fuchs’ [121] contention that healthy scientific fields must incorporate both divergent and convergent thinking.

An obvious point of productive disagreement in this vein emerges in the chapter by Massof and Bradley, with their “heretical” statements about expecting and accepting failures of invariance in their low vision rehabilitation outcomes measurement and management system. Differential item functioning and differential person functioning take on a new significance when theory explains the structural invariance incorporated in a measurement standard, and item location estimates have been stable across thousands or even millions of cases. A variation on this point is raised by Allen and Pak in the section in their chapter on the tensions between standardization and personalization. Here, local failures of invariance become actionable and relevant bits of information clinicians and others need to know about if they are to be able to formulate effective interventions.

It is part of the nature of a boundary object to accept those concrete levels of manifestations of unique patterns in the diagnosis-specific ways described by Allen and Pak, and by Massof and Bradley. As Star [101, p. 251] put it,

…boundary objects…are a major method of solving heterogenous problems. Boundary objects are objects that are both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual-site use.

Star and Ruhleder [103, p. 128] similarly say, “only those applications which simultaneously take into account both the formal, computational level and the informal, workplace/cultural level are successful.” As is suggested in the chapter by Fisher, might the ongoing failures of person-centered quality improvement efforts listed by Berwick and Cassel [11] derive from inattention to the nature of boundary objects?

In that same vein, a more pointed instance of the heterogeneity of perspectives implied by boundary objects emerges in the longstanding debates between item response theory (IRT) and Rasch model advocates [4, 31, 42]. The arguments here focus on the descriptive value of statistical models obtaining the lowest p-values in significance tests, vs the prescriptive value of scientific models providing narrative explanations of variation and information, with additional indications as to how instruments and sampling procedures might be improved. These purposes are not, of course, always pursued in mutually opposed ways, and in current practice, both of them typically assume measurement to be achieved primarily via centrally planned and executed data analyses, not via the distributed metrology of calibrated instruments advocated here.

But in this metrological context, the “Rasch debate” [42] is defused. Data analysis certainly has an essential place in science, even if it should not be the primary determining focus of measurement. Boundary objects align with a kind of pragmatic idealism that recognizes there are communities, times, and places in which each of the different levels of complexity represented by data, instruments, and theory is valid and legitimate. There are just as many needs for locally intensive investigations of idiosyncratic data variations as there are for interconnected instrument standards and globally distributed explanatory theories.

But there are different ways of approaching local failures of invariance. It is essential not to confuse levels of complexity [45, 47]. Data analyses of all kinds can be productively pursued in hierarchically nested contexts bound by consensus standards structuring broad communications. But local exceptions to the rule that do not clearly invalidate the empirical and theoretical bases of item calibrations should no longer be allowed to compromise the comparability of measurements. There is nothing radical or new in saying this. It has long been recognized that “The progress of science largely depends on this power of realizing events before they occur,” that “laws are the instruments of science, not its aim,” and that “the whole value…of any law is that it enables us to discover exceptions” [32, pp. 400, 428, 430]. Instead of conceiving measurement primarily in terms of statistically modeled multivariate interactions, a larger role needs to be made for scientific modeling of univariate dimensions, as these are the means by which metrology creates labor-saving “economies of thought” [7, 32, pp. 428–429, 55, 56, 83, pp. 481–495].

Butterfield [24, pp. 16–17, 25–26, 96–98] notes that, in the history of science, observations do not accumulate into patterns recognized as lawful; instead, science advances as new ways of projecting useful geometric idealizations are worked out. Measurement structures linear geometries affording conceptual grasps of concrete phenomena by positing the hypothesis that something varies in a way that might be meaningfully quantified. Kuhn [71, p. 219; original emphasis] makes the point, saying,

The road from scientific law to scientific measurement can rarely be traveled in the reverse direction. To discover quantitative regularity, one must normally know what regularity one is seeking and one's instruments must be designed accordingly; even then nature may not yield consistent or generalizable results without a struggle.

Practical metrological implementations of the results of measurement modeling exercises that begin by specifying the structure of lawful regularities, as in the use of Rasch’s models for measurement, require agreements on standardized criteria for knowing when and if comparability is substantively threatened. Efforts in this direction are being taken up, for instance, in the European NeuroMet project [39].

Taken out of context, the unfortunate effect of compromising the invariance of the unit quantity in the application of IRT models with multiple item parameters is that data are described to death. The value of identified models [98, 100], such as Rasch’s, concerns the practical implications of structural invariances for policy and programs. Rasch and Thurstone are acknowledged for their contributions in “fruitful” discussions concerning the development of the concept of identified models, those that require structural invariances reproducible across samples and instruments [70, p. 165]. Referred to by one of Rasch’s mentors, Frisch, as “autonomy” [2], this quality in cross-data patterns is characteristic of a class of models necessary to learning generalizable lessons informing practical capacities for predicting the future [48]. Over-parameterized models, in contrast, may achieve statistical significance only at the expense of practical significance, such that the particular relationships obtained in a given data set are so closely specified that they are useless for anticipating future data [6, 23, 38, p. 211, 81, p. 22, 99, 110, p. 235; 123, 127].
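The over-parameterization point in the preceding paragraph can be illustrated generically (this toy example is ours, not drawn from the cited works): an over-parameterized polynomial reproduces its calibration data more closely than a simple linear law, yet fails badly when asked to anticipate observations beyond them.

```python
import numpy as np

rng = np.random.default_rng(3)
x_cal = np.linspace(-1.0, 1.0, 30)                       # calibration sample
y_cal = 1.5 * x_cal + rng.normal(0.0, 0.3, x_cal.size)   # linear 'law' plus noise
x_new = np.linspace(1.2, 2.0, 10)                        # future data beyond the sample
y_new = 1.5 * x_new + rng.normal(0.0, 0.3, x_new.size)

simple = np.polyfit(x_cal, y_cal, deg=1)   # the identified, lawful model
overfit = np.polyfit(x_cal, y_cal, deg=9)  # an over-parameterized description

def mse(coef, x, y):
    """Mean squared prediction error of a fitted polynomial."""
    return float(np.mean((np.polyval(coef, x) - y) ** 2))

# The flexible model 'describes the data to death' on the calibration sample...
fit_gain = mse(simple, x_cal, y_cal) - mse(overfit, x_cal, y_cal)
# ...but is useless for anticipating future observations
forecast_loss = mse(overfit, x_new, y_new) - mse(simple, x_new, y_new)
```

Under these assumptions `fit_gain` is non-negative (the nested linear model cannot fit the calibration data more closely) while `forecast_loss` is large and positive, echoing the point that statistical significance can be purchased at the expense of practical, predictive significance.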

By sweeping unexpected responses into data-dependent item parameters, exceptions to the rule are concealed in summary statistics and hidden from end users who might otherwise be able to make use of them in the manner described in the chapters by Allen and Pak, and by Massof and Bradley. But the disclosure of anomalies is well-established as a primary function of measurement. Rasch [96, pp. 10, 124], Kuhn [71, p. 205], and Cook [32, p. 431] all illustrate this point using the example of the discovery of Neptune from perturbations in the orbit of Uranus. Burdick and Stenner [22] concur, noting that IRT models put analysts in a position akin to Ptolemaic astronomers, whose descriptive approach to planetary orbits was more accurate than could be achieved using Newton’s laws. What if astronomy had stuck with the Ptolemaic methods instead of adopting new ones based on physical theory? Ptolemaic astronomers can be imagined saying, “Forget about those perturbations in the orbit of Uranus. Our model accounts for them wonderfully.” If astronomy as a field had accepted that position, instead of insisting on the prescriptive Newtonian model, then Neptune never could have been discovered by estimating the position and mass of an object responsible for perturbations of the magnitude observed in Uranus’ orbit.

Though this process of being able to perceive exceptions only in relation to a standard may seem to be a highly technical feature of mathematical science, it is but an instance of the fact recognized by Plato in The Republic (523b–c) that “experiences that do not provoke thought are those that do not at the same time issue a contradictory perception.” That is, we do not really think much at all about experiences meeting our expectations. The labor-saving economy of thought, created when language pre-thinks the world for us, removes the need to bother with irrelevant details. Scientific instruments extend this economy by embodying invariant meaning structures that predict the form of new data.

But in contrast to these more typical situations, innovative ideas and thoughtful considerations tend to follow from observations that make one think, “That’s odd… .” And so it happened with the discovery of penicillin when a lab culture died, the discovery of x-rays when a lead plate was misplaced in a lab, of vulcanization when liquid rubber was accidentally left on a hot stove, of post-it notes when an experimental glue did not stick, etc.

Nature is revealed by means of exceptions that are often products of serendipitous accidents [97]. Anomalous observations are answers to questions that have not been asked. Because attention is focused on conceptually salient matters already incorporated in the linguistic economy of thought, most unexpected observations are ignored as mere noise, as nuisance parameters of little interest or value. Deconstructing the context in which unexpected observations arise is difficult, as it requires a capacity for closely following the phenomenology giving rise to something that may or may not be of any use. It is not only hard to know when pursuit of new avenues of investigation might be rewarded, but formulating the question to which the observation is an answer requires imagination and experience. Thus, Kuhn [71, p. 206] observes that a major value of quantified methods follows from the fact that numbers, sterile in themselves, “register departures from theory with an authority and finesse that no qualitative technique can duplicate.” Creating organizational environments capable of supporting full pivots in new directions is, of course, another matter entirely, but that is just what is entailed by the way science and society co-evolve [5, 54, 68, 69, 72, 77].

Continuing to accept summed ratings and multiparameter IRT models’ undefined, unstable, uninterpretable, sample- and instrument-dependent unit quantities as necessary and unavoidable has proven a highly effective means of arresting the development of psychology and the social sciences. The common practice of willfully mistaking ordinal ratings and IRT estimates for interval measures perpetuates the failure even to conceive the possibility that communities of research and practice could think and act together in the terms of common languages employed for their value as the media of communication and the rules by which exceptions are revealed. Continued persistence in this confusion has reached a quite perverse degree of pathological denial, given that the equivalence of measurement scales across the sciences was deemed “widely accepted” over 35 years ago [88, p. 169] but still has not fulfilled its potential in mainstream applications. Grimby et al. [62] are not unreasonable, from our point of view, in viewing the ongoing acceptance of ordinal scores and undefined numeric units in person-centered outcome measurement as a form of fraudulent malpractice.

The dominant paradigm’s failure to distinguish numeric scores from measured quantities [10] commits the fundamental epistemological error of separating individual minds from the environmental context they inhabit [9, p. 493; 47]. Cognitive processes do not occur solely within brains, but must necessarily leverage scaffolded supports built into the external environment, such as alphabets, phonemes, grammars, dictionaries, printing presses, and quantitative unit standards’ quality assurance protocols [65, 66, 75, 95, 106]. Metrological infrastructures define the processes by which real things and events in the world are connected with formal conceptual ideals and are brought into words with defined meanings, including common metrics’ number words. And so, as Latour [72, pp. 249, 251] put it,

Every time you hear about a successful application of a science, look for the progressive extension of a network. … Metrology is the name of this gigantic enterprise to make of the outside a world inside which facts and machines can survive.

Shared standards and common languages are the means by which we prepare minds on a mass scale to better recognize and act on chance events. As Pasteur put it in 1854, “in the fields of observation, chance favors only the prepared mind” [40, p. 309]. Because currently popular measurement methods neither map the unfolding sequences of changes in health, performance, functionality, or learning, nor express differences in terms of defined unit quantities with stated uncertainties, nor reveal unexpected departures from theory, person-centered care lacks systematic ways of apprehending and communicating accidental and serendipitous events that might possess actionable value.

Identified models and metrological standards set up an alternative vision of broad ecosystems of interdependent and reproductively viable forms of social life. A key potential for productive innovations emerges here, since, as the populations of these forms of life grow, highly improbable combinations (mutations) become so frequent that their failure to occur is what becomes unlikely [41, p. 91]. In other words, multilevel metrologically-traceable systems of measurement create the combinations of construct theories’ self-descriptive genotypes, instrument standard phenotypes, and mutable individual data needed for natural selection to amplify adaptively superior new forms of social life [91, 92] in a kind of epigenetic organism-environment integration. But if statistical descriptions of ordinal scores and IRT’s varying unit estimates continue to be taken as satisfactory approaches to quantifying person-centered outcomes, it is only reasonable to expect the status quo to persist, with anomalies and exceptions that could otherwise lead in qualitatively productive new directions at both individual and aggregate levels remaining systematically and statistically concealed.

1.4 Concluding Comments

Differences between centrally-planned data analytics and distributed metrological networks were a matter of concern for Ben Wright [123, 126, 127] not just in his steadfast focus on science over statistics but more broadly throughout his conception of measurement [46, 117]. In the last paragraph of his 1988 Inaugural Address to the AERA Rasch Measurement SIG, Wright ([124]; also see [125]) said:

So, we come to my last words. The Rasch model is not a data model at all. You may use it with data, but it’s not a data model. The Rasch model is a definition of measurement, a law of measurement. Indeed, it’s the law of measurement. It’s what we think we have when we have some numbers and use them as though they were measures. And it’s the way numbers have to be in order to be analyzed statistically. The Rasch model is the condition that data must meet to qualify for our attention. It’s our guide to data good enough to make measures from. And it’s our criterion for whether the data with which we are working can be useful to us.
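For readers who want the law Wright describes in symbols, the dichotomous Rasch model can be stated compactly (the notation here is ours, added for illustration, not the source’s):

```latex
\Pr\{X_{ni} = 1 \mid \beta_n, \delta_i\}
  = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)},
```

where \(\beta_n\) is the measure of person \(n\) and \(\delta_i\) is the calibration of item \(i\), both expressed in the same interval unit. Because the parameters enter only as a difference, comparisons of persons can be made independently of which items are used, and vice versa; this separability is the condition data must meet to qualify as measures in Wright’s sense.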

This recurring theme in Wright’s work is also foregrounded on the first page of Wright and Masters’ 1982 book [130]:

Because we are born into a world full of well-established variables it can seem that they have always existed as part of an external reality which our ancestors have somehow discovered. But science is more than discovery. It is also an expanding and ever-changing network of practical inventions. Progress in science depends on the creation of new variables constructed out of imaginative selections and organizations of experience.

With his colleagues and students, Wright [122] advanced the ideas of item banking and adaptive item administration [8, 28, 29, 82, 129], individualized “kidmap” reports [26, 27, 131] and self-scoring forms [12, 50, 79, 80, 86, 126, 128, 132]. All of these depend on understanding measurement as operationalized via structurally invariant, anchored item estimates and networks of instruments read in a common language at the point of use [46, 117]. Wright’s contributions to construct theorizing, instrument calibration, and individually-customized reports of special strengths and weaknesses span all three semiotic levels of complexity.
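As a concrete illustration of how anchored item calibrations support adaptive administration, the sketch below implements a minimal Rasch-based adaptive testing loop in Python. It is our own simplified example, not code from Wright or his colleagues; the function names, the toy item bank, and the Newton-Raphson update are illustrative assumptions.

```python
import math

def rasch_prob(theta, b):
    """Rasch model: probability of a positive response for ability theta
    on an item with anchored difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def next_item(theta, bank, used):
    """Adaptive selection: the unused item whose anchored difficulty is
    closest to the current ability estimate (for the Rasch model this
    maximizes item information)."""
    return min((i for i in bank if i not in used),
               key=lambda i: abs(bank[i] - theta))

def estimate_theta(responses, bank, theta=0.0, iters=25):
    """Maximum-likelihood ability estimate via Newton-Raphson, holding
    item difficulties fixed at their anchored (banked) values.
    `responses` is a list of (item_id, 0-or-1) pairs."""
    for _ in range(iters):
        probs = [(x, rasch_prob(theta, bank[i])) for i, x in responses]
        score = sum(x - p for x, p in probs)         # first derivative
        info = sum(p * (1.0 - p) for _, p in probs)  # Fisher information
        if info < 1e-9:
            break
        theta += score / info
    return theta
```

With a mixed response pattern (some successes, some failures), `estimate_theta` converges to a finite measure; all-correct or all-incorrect patterns have no finite maximum-likelihood estimate and would need a different estimator in practice. The key point is that the bank’s difficulties stay anchored across administrations, so measures from different item subsets remain in a common unit.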

The chapters in this book build on and remain in dialogue with Wright’s [127, p. 33] realization that “Science is impossible without an evolving network of stable measures.” With increasing awareness of the metrological viability of instruments calibrated using Wright’s ideas [25, 52, 85, 93], and the emergence of new consensus standards for uniform metrics [39], there is also increasing need for examples of the kind brought together in this book. We hope that the efforts begun by the contributors to this volume will inspire a growing sphere of imaginative and productive applications of person-centered outcome metrology.