Abstract
Broadly stated, this book makes the case for a different way of thinking about how to measure and manage person-centered outcomes in health care. The basic contrast is between statistical and metrological definitions of measurement. The mainstream statistical tradition focuses attention on numbers in centrally planned and executed data analyses, while metrology focuses on distributing meaningfully interpretable instruments throughout networks of end users. The former imposes group-level statistics from the top down in homogenizing ways; the latter tracks emergent patterns from the bottom up, feeding them back, in custom-tailored applications, to end users whose decisions and behaviors are coordinated by means of shared languages. New forms of information and knowledge necessitate new forms of social organization to create them and put them to use. The chapters in this book describe the analytic, design, and organizational methods that have the potential to open up exciting new possibilities for systematic and broad-scale improvements in health care outcomes.
Keywords
- Statistical modeling
- Measurement modeling
- Complexity
- Distributed thinking
1.1 Introduction
The domain of “person-centered outcomes” is an evolving array of ideas, tools, and practices. In this book, we use person-centered outcomes (PCOs) as an umbrella term to encompass the assessments, ratings, beliefs, opinions, experiences, or satisfaction of key stakeholders (i.e., the recipient, caregiver, or provider of healthcare; see Note 1) concerning medical/surgical interventions (including clinical practice, research, and trials). PCO instruments (e.g., rating scales, ability tests, biometric equipment, wearables) purport to quantify health, health-related quality of life, and other latent health constructs, such as pain, mood, and function. These can also be used to quantify quality of healthcare. PCO instruments play an increasingly central role in evidence-based medicine [93, 107, 108].
Used alone, or in tandem with surrogate data (e.g., analyzed in the laboratory), PCO data offer the opportunity for more meaningful and interpretable individualized measures of patient outcomes. Custom-tailored PCO reports entail neither superficially comparable numbers nor completely disconnected details. PCO data, instruments, and theory have repeatedly—across multiple clinical situations—proven themselves dependable foundations for meaningful common languages and shared metrics that speak directly to care recipients, caregivers, healthcare providers, researchers, and policy makers [30, 39, 59].
Meaningful PCO measures map the natural courses of disease, healing, degenerative conditions, learning, development, and growth in quality-assured common units with clearly stated uncertainties, to guide treatment decisions tailored to the unique situations of different patients. Most current approaches to person-centeredness are limited, in that they typically do not follow through from the stated intentions of focusing on people (patients, employees, clients, customers, suppliers, students, etc.) to fulfillment in practice [20, 33]. The crux of the matter is the difference between modernizing and ecologizing [53, 76]: prioritizing the objectivity of data in disconnected statistical modeling, versus prioritizing networks of actors who agree on the objectivity of a unit quantity that retains its properties across samples and instruments.
Therefore, it is important that we can clearly articulate how scientifically calibrated and metrologically distributed metrics—measurement systems—fulfill the meaning of person-centeredness in strikingly new and widely unanticipated ways. We offer seven suggestions.
- First, instead of burying the structure and meaning of patients’ expressions of their experiences in sum scores and ordinal statistics, we advocate using response data and explanatory models [35, 87, 104] to calibrate quality-assured instruments expressing that experience in substantively interpretable interval quantitative terms that are uniformly comparable everywhere [84, 85, 93, 94, 116, 118] (see the model sketched after this list).
- Second, instead of perpetuating the failed assumption that objective reality somehow automatically propagates itself into shared languages and common metrics for free, we acknowledge and leverage networks of actors who agree on the objectivity of repeatable and reproducible structural invariances, and who collaborate in bringing those invariances into distributed measurement systems, usually at great expense, but also with proportionate returns on the investments [5, 13, 17, 37, 43, 45, 47, 48, 54, 68, 69, 72, 77, 120].
- Third, instead of using vaguely defined terms and policies to promote patient engagement and the improved outcomes that follow from informed patient involvement, we advocate defining engagement by mapping it, calibrating it, explaining it, and individualizing the navigation of it [16, 86, 105, 112, 132].
- Fourth, instead of assuming data are inherently valid and meaningful, we advocate theoretical explanations of patient experiences that support a qualitative narrative accounting for variation [35, 87, 104]; this sets up a new level of defensibility, one not solely reliant on any given healthcare provider’s skills and experience.
- Fifth, instead of reifying unidimensionality in a rigid and uncompromising way, we take the pragmatic idealist perspective of using empirically and theoretically validated standards to illuminate differences that make a difference, and, conversely, of tapping even small degrees of correlation between different dimensions for the information available [3, 113, 119].
- Sixth, instead of siphoning off data into research and management reports incapable of affecting the care of the individual patients involved, we advocate immediately feeding back, at the point of care, coherent [53, 111], contextualized, and structured diagnostic reports; i.e., self-scoring forms and “kidmaps,” which we may call “PatientMaps,” “ClientMaps,” or “PersonMaps” [12, 18, 26, 27, 50, 79, 80, 86, 111, 114, 115, 131, 132].
- Seventh, instead of assuming that statistical averages of ordinal scores are adequate to the needs of individual patient care, and instead of assuming even that logit measures and uncertainties are capable of summarizing everything important about an individual patient experience, we advocate displaying patterns of individual ratings illustrating diagnostically relevant special strengths and weaknesses; by acknowledging the multilevel semiotic complexity of all signification in language in this way, we recognize the nature of measured constructs as boundary objects “plastic enough to be adaptable across multiple viewpoints, yet maintain continuity of identity” [45, 47, 54, 101, p. 243].
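As a concrete anchor for the first and seventh suggestions, consider a minimal sketch of the kind of measurement model intended here, the dichotomous Rasch model (notation ours):

$$\Pr(X_{ni} = 1 \mid \beta_n, \delta_i) = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)},$$

where $\beta_n$ is the person measure and $\delta_i$ the item calibration, both expressed in logits. The comparison of any two persons is then independent of which item mediates it, since for every item $i$

$$\ln\frac{\Pr(X_{1i}=1)}{\Pr(X_{1i}=0)} - \ln\frac{\Pr(X_{2i}=1)}{\Pr(X_{2i}=0)} = \beta_1 - \beta_2,$$

with $\delta_i$ cancelling from the comparison: the invariance Rasch called specific objectivity, and the formal basis for the claim that calibrated measures of this kind are uniformly comparable across instruments and samples.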
A further useful reporting application would associate anomalous observations with diagnostically informative statistics drawn from historical data on similar patients with similar response patterns, conditions, co-morbidities, genetic propensities, etc. Guttman scalograms [63], for instance, used in conjunction with model fit statistics, reveal stochastic patterns in individual responses [78] predicting signature sequences of diagnostically informative departures from expectation [34, 61].
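To make this concrete, here is a minimal sketch (simulated dichotomous responses; hypothetical scale) of a Guttman scalogram with fit flags. Persons are sorted by raw score and items by facility, so responses should trend from 1s on easy items to 0s on hard ones; ‘!’ marks responses whose standardized residuals depart markedly from Rasch-model expectation, the kind of signature departure described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_persons, n_items = 12, 8
ability = rng.normal(0, 1.5, n_persons)
difficulty = np.linspace(-2, 2, n_items)

# Dichotomous Rasch probabilities and simulated responses
p = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
x = (rng.uniform(size=p.shape) < p).astype(int)

# Sort persons by raw score (high to low) and items by facility (easy to hard)
person_order = np.argsort(-x.sum(axis=1))
item_order = np.argsort(-x.mean(axis=0))
xs, ps = x[person_order][:, item_order], p[person_order][:, item_order]

# Standardized residuals flag individual responses far from expectation
z = (xs - ps) / np.sqrt(ps * (1 - ps))
for score, row, zrow in zip(xs.sum(axis=1), xs, z):
    cells = " ".join(f"{v}{'!' if abs(u) > 2 else ' '}" for v, u in zip(row, zrow))
    print(f"score {score:2d}: {cells}")
```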
Metrological quality assurance is essential if reliable decisions about diagnosis, treatment, and rehabilitation are to be made consistently throughout a healthcare system, with continuous improvement [25]. This goes far beyond the well-trodden path of debates about data analysis or model choice, which have played out ad nauseam, accomplishing little more than endless arguments over arbitrary methodological criteria. A description of the situation dating to over 20 years ago remains as true now as it was then. The fundamental oversight in person-centered health care outcome management is that, in addition to the problem of model choice,
The task of psychosocial measurement has another aspect that remains virtually unaddressed, and that is the social dimension of metrology, the networks of technicians and scientists who monitor the repeatability and reproducibility of measures across instruments, users, samples, laboratories, applications, etc. For the problem of valid, reliable interval measurement to be solved, within-laboratory results must be shared and communicated between laboratories, with the aim of coining a common currency for the exchange of quantitative value. Instrument calibration (intra-laboratory repeatability or ruggedness) studies and metrological (interlaboratory reproducibility) studies must be integrated in a systematic approach to accomplishing the task of developing valid, reliable interval measurement. [43, p. 529]
Objective metrological comparability (‘traceability’) and declared measurement uncertainty leverage patterns that have been repeatedly reproduced for decades across patients and instruments, and that cohere into a common language [53, 111]. A possible way forward is a synthesis of metrology, psychometrics, and philosophy resting on four cornerstones.
First, it is essential to root measured constructs and unit quantities succinctly and decisively in the ways they:
- are structured as scientific models in the form of Maxwell’s equations, following Rasch [44, 48, 96, pp. 110–115];
- extend in new ways everyday language’s roots in the metaphoric process [49], following Maxwell’s method of analogy in his exposition of how “every metaphor is the tip of a submerged model” [14, 15, p. 30]; and
- extend everyday thinking into new sciences in the manner described by Nersessian’s [89] study of Maxwell’s method of analogy [44, 45, 48].
Second, it is also essential to show and assert that measured constructs and unit quantities:
- are defined by the populations of persons and items manifesting the construct;
- are substantiated empirically in terms of samples of persons and items drawn from those populations that are rigorously representative of them; and
- are explained theoretically in terms of predictive models structuring experimental tests of cognitive, behavioral, and structural processes.
Third, from this it follows that:
- reference standard units and associated uncertainties will be set up as formal constants open to testing, refinement, and reproduction anywhere, anytime, by anyone;
- criteria for sample definitions, instrument administration, data security, etc. will have to be developed and adopted via consensus processes; and
- local reference standard laboratories will be charged with reproducing the unit from standard samples and from theory, to within a relevant range of uncertainty, and maintaining it in clinical practice and research applications.
Fourth, expanding and clarifying these points:
- day-to-day measures will not be estimated via data analysis, but will instead be read from the calibrated instrument and will be reported in varying ways depending on the application:
  - individualized ‘kidmaps’ reporting specific responses (see the sketch after this list);
  - measurements in the unit quantity and uncertainty; and
  - aggregate comparisons over time, horizontally across clinics and providers, and vertically within an organization, system, or region (see Fig. 1.1);
- quality assurance processes in the reference labs and the standard-setting lab will document legally binding conformity with procedures;
- stakeholder participation in every area of activity and completely transparent openness to every kind of critical input will be essential; and
- we would warmly welcome every conceivable empirical and/or theoretical challenge, because the contestability of comparable results is a hallmark precursor of scientific progress that has to date been counterproductively excluded from the methods of outcome modeling and measurement in health care and other fields.
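A minimal sketch of the kind of individualized ‘kidmap’/‘PersonMap’ report meant here, with hypothetical item names, calibrations, response pattern, and person measure (all assumptions for illustration): endorsed items print on the left, unendorsed on the right, ordered by anchored difficulty, and ‘*’ flags unexpected responses, such as failing an item far easier than the person’s measure.

```python
# Hypothetical anchored item calibrations (logits), assumed read from a standard
items = {
    "climb stairs": 1.8, "walk 1 km": 1.0, "carry groceries": 0.3,
    "dress self": -0.6, "walk indoors": -1.2, "sit unsupported": -2.0,
}
responses = {  # 1 = can do / endorsed, 0 = cannot / not endorsed
    "climb stairs": 1, "walk 1 km": 1, "carry groceries": 1,
    "dress self": 1, "walk indoors": 1, "sit unsupported": 0,
}
measure = 1.2  # person measure (logits), read from the calibrated instrument
GAP = 1.5      # flag responses more than GAP logits from the measure

print(f"{'CAN DO':<20}{'logit':^8}{'CANNOT DO':>20}")
for name, d in sorted(items.items(), key=lambda kv: -kv[1]):
    unexpected = (responses[name] == 1 and d > measure + GAP) or (
        responses[name] == 0 and d < measure - GAP)
    mark = "*" if unexpected else ""
    left = name + mark if responses[name] else ""
    right = mark + name if not responses[name] else ""
    print(f"{left:<20}{d:^8.1f}{right:>20}")
print(f"person measure = {measure:+.1f} logits")
```

In this example the flagged failure of a very easy item (“sit unsupported”) is exactly the kind of diagnostically relevant anomaly a clinician would want surfaced at the point of care.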
The urgent need for a new focus is the key motivating factor for this edited volume. In this unique collection, we explore the synthesis of metrology, psychometrics, philosophy, and clinical management to support the global comparability and equivalence of PCO measurement results. The target audience for this book is any and all key stakeholders interested in person-centered care, including policy makers, clinicians, pharmaceutical industry representatives, metrologists, and health researchers.
Fig. 1.1 Developmental, horizontal, and vertical coherent measurement dimensions. (Modified from Fisher et al. [51])
This book includes a unique collection of works from world-recognized experts, researchers and thought leaders in PCO research. The two sections of this volume explore the potential benefits of moving towards a PCO metrological framework across clinical practice and research, methodology and theory to provide solutions including:
- addressing the lack of units in patient-centered outcome measurement through recourse to mathematical models devised to define meaningful, invariant, and additive units of measurement with known uncertainties;
- establishing coordinated international networks of key stakeholders guided by five principles (i.e., collaboration, alignment, integration, innovation, and communication); and
- better use of technology leveraging measurement through item banks linking PCO reports via common items, common patients, or specification equations based in strong explanatory theory.
1.2 The Chapters
Section one includes five chapters covering person-centered research and clinical practice. In her clinician’s guide to performance outcome measurements, Anna Mayhew provides excellent insight, as a clinical evaluator and researcher, into the role of the Rasch model in maximizing the use and interpretability of the North Star Ambulatory Assessment in better understanding the progression of Duchenne muscular dystrophy. Continuing this theme, Diane Allen and Sang Pak provide a clinical perspective on what drives PCO measurement strategies in patient management.
We then turn to ophthalmology in two research programs. The first, from Maureen Powers and William P. Fisher, Jr., describes how physical and psychological measurements of vision combine into a model of functional binocular vision; this psychophysical combination of biological, survey, and reading test data demonstrates how data from different domains can be integrated in a common theoretical and applied context. The next chapter, from Bob Massof and Chris Bradley, describes the evolution of a long-standing program for low vision rehabilitation, which exploits item banking and computer adaptive testing. They propose a strategy for measuring patient preferences to incorporate in benefit-risk assessments of new ophthalmic devices and procedures. Finally, Sarah Smith describes the importance of quantitative and qualitative enquiry, against the backdrop of calibrated rating scales, providing the perspective of a health services researcher working at the national level in the field of dementia.
In Section two, we move to fundamentals and applications. The section begins with John Michael Linacre’s reflections on equating measurement scales via alternative estimation methods; conceptually similar scales are aligned so that the derived measures become independent of the specifics of the situation on which they are based, with the concomitant result that theoretical differences between supposedly superior and inferior estimation methods turn out to have little or no practical consequence. David Andrich and Dragana Surla tackle the same subject from the perspective of one estimation method, but with the goal of having a common unit referenced to a common origin, where the focus is on making decisions at the person level. Thomas Salzberger takes this one step further by providing an example from the measurement of substance dependence, making the argument for traceability in social measurement via the co-calibration of different instruments in a common metric.
Jeanette Melin and Leslie Pendrill provide two chapters, which take the conversation about co-calibration an additional step further, returning to the subject of dementia. First, the authors describe a research program which elaborates the role of construct specification equations and entropy to better understand the measurement of memory through ability tests. The subsequent chapter makes the link to quality assurance in PCO measurement by describing the potential for establishing metrological references in fields such as person-centered care in the form of “recipes” analogous to certified reference materials or procedures in analytical chemistry and materials science. Finally, William Fisher grounds the contents of this book in a philosophical framework extending everyday thinking in new directions that offer hope for achieving previously unattained levels of efficacy in health care improvement efforts.
1.3 Acknowledging and Incorporating Complexity
We expect the reader will recognize that there are potential inconsistencies and even disagreements across the chapters. We fully acknowledge these and would respond that, though matters are far from being resolved in any settled way, there are productive, constructive, and pragmatic reasons for considering a metrological point of view on the role of measurement in health care’s person-centered quality improvement efforts.
Of particular importance among these reasons are the ways in which metrology undercuts the “culture wars” and the futile modernist-postmodernist debates, doing so by taking the focus off the relative priorities of theory vs observation [73, 74]. In Golinski’s [60, p. 35] words, “Practices of translation, replication, and metrology have taken the place of the universality that used to be assumed as an attribute of singular science.” Alternatively, in Haraway’s [64, pp. 439–440] terms, “…embedded relationality is the prophylaxis for both relativism and transcendence.” That is, the universality of scientific laws cannot be demonstrated absent instrumentation and those laws cannot be communicated without a common language; nor can the observed data’s disparate local dependencies make any sense in relation to anything if there is no metric or linguistic standard to provide a medium of comparison.
Both modern and postmodern perspectives must inevitably make use of shared standards, suggesting a third alternative focused on the shared media of communications standards and metrologically traceable instruments. Latour’s [72, pp. 247–257] extended consideration of the roles of metrology is foundational. Latour [73, 74] characterizes this third alternative as amodern, and Dewey [36, p. 277] similarly arrives at a compatible unmodern perspective, saying that “…every science and every highly developed technology is a systematic network of connected facts and operations.” Galison [57] considers the modern focus on transcendental universals as positivist, the postmodern emphasis on relativism as antipositivist, and the unmodern inclusion of the instrument as postpositivist. A large and growing literature in science and technology studies pursues the implications of instruments and standards for understanding the progress of science [1, 5, 13, 19, 21, 37, 67, 90, 109].
Galison [58, p. 143] offers an “open-ended model” of how different communities of research and practice interrelate. This perspective allows:
- partial autonomy to each community at their level of complexity:
  - experimentation’s focus on concrete observable data,
  - instrumentation’s focus on abstract communications standards, and
  - theory’s focus on formal models, laws, and predictive theories; and
- “a rough parity among the strata—no one level is privileged, no one subculture has the special position of narrating the right development of the field or serving as the reduction basis” [57, p. 143].
A significant consequence of this open-ended model in physics is, Galison [57, pp. 46–47] points out, that
…between the scientific subcultures of theory and experiment, or even between different traditions of instrument making or different subcultures of theorizing, there can be exchanges (co-ordinations), worked out in exquisite detail, without global agreement. Theorists and experimenters, for example, can hammer out an agreement that a particular track configuration found on a nuclear emulsion should be identified with an electron and yet hold irreconcilable views about the properties of the electron, or about philosophical interpretations of quantum field theory, or about the properties of films.
The work that goes into creating, contesting, and sustaining local coordination is, I would argue, at the core of how local knowledge becomes widely accepted. At first blush, representing meaning as locally convergent and globally divergent seems paradoxical. On one hand, one might think that meaning could be given sentence by sentence. In this case the global sense of a language would be the arithmetical sum of the meaning given in each of its particular sentences. On the other hand, the holist would say that the meaning of any particular utterance is only given through the language in its totality. There is a third alternative, namely, that people have and exploit an ability to restrict and alter meanings in such a way as to create local senses of terms that speakers of both parent languages recognize as intermediate between the two. The resulting pidgin or creole is neither absolutely dependent on nor absolutely independent of global meanings.
What Galison describes here is the state of being suspended in language, semiotically, where abstract semantic standards mediate the negotiation of unrealistic conceptual ideals and unique, concrete local circumstances. This theme also emerges in the work of S. L. Star under the heading of the boundary object [19, 99, 100, 103], and in Woolley and Fuchs’ [121] contention that healthy scientific fields must incorporate both divergent and convergent thinking.
An obvious point of productive disagreement in this vein emerges in the chapter by Massof and Bradley, with their “heretical” statements about expecting and accepting failures of invariance in their low vision rehabilitation outcomes measurement and management system. Differential item functioning and differential person functioning take on a new significance when theory explains the structural invariance incorporated in a measurement standard, and item location estimates have been stable across thousands or even millions of cases. A variation on this point is raised by Allen and Pak in the section in their chapter on the tensions between standardization and personalization. Here, local failures of invariance become actionable and relevant bits of information clinicians and others need to know about if they are to be able to formulate effective interventions.
It is part of the nature of a boundary object to accommodate concrete manifestations of unique patterns in the diagnosis-specific ways described by Allen and Pak, and by Massof and Bradley. As Star [101, p. 251] put it,
…boundary objects…are a major method of solving heterogeneous problems. Boundary objects are objects that are both plastic enough to adapt to local needs and constraints of the several parties employing them, yet robust enough to maintain a common identity across sites. They are weakly structured in common use, and become strongly structured in individual-site use.
Star and Ruhleder [103, p. 128] similarly say, “only those applications which simultaneously take into account both the formal, computational level and the informal, workplace/cultural level are successful.” As is suggested in the chapter by Fisher, might the ongoing failures of person-centered quality improvement efforts listed by Berwick and Cassel [11] derive from inattention to the nature of boundary objects?
In that same vein, a more pointed instance of the heterogeneity of perspectives implied by boundary objects emerges in the longstanding debates between item response theory (IRT) and Rasch model advocates [4, 31, 42]. The arguments here focus on the descriptive value of statistical models obtaining the lowest p-values in significance tests, vs the prescriptive value of scientific models providing narrative explanations of variation and information, with additional indications as to how instruments and sampling procedures might be improved. These purposes are not, of course, always pursued in mutually opposed ways, and in current practice, both of them typically assume measurement to be achieved primarily via centrally planned and executed data analyses, not via the distributed metrology of calibrated instruments advocated here.
But in this metrological context, the “Rasch debate” [42] is defused. Data analysis certainly has an essential place in science, even if it should not be the primary determining focus of measurement. Boundary objects align with a kind of pragmatic idealism that recognizes there are communities, times, and places in which each of the different levels of complexity represented by data, instruments, and theory is valid and legitimate. There are just as many needs for locally intensive investigations of idiosyncratic data variations as there are for interconnected instrument standards and globally distributed explanatory theories.
But there are different ways of approaching local failures of invariance. It is essential to not confuse levels of complexity [45, 47]. Data analyses of all kinds can be productively pursued in hierarchically nested contexts bound by consensus standards structuring broad communications. But local exceptions to the rule that do not clearly invalidate the empirical and theoretical bases of item calibrations should no longer be allowed to compromise the comparability of measurements. There is nothing radical or new in saying this. It has long been recognized that “The progress of science largely depends on this power of realizing events before they occur,” that “laws are the instruments of science, not its aim,” and that “the whole value…of any law is that it enables us to discover exceptions” [32, pp. 400, 428, 430]. Instead of conceiving measurement primarily in terms of statistically modeled multivariate interactions, a larger role needs to be made for scientific modeling of univariate dimensions, as these are the means by which metrology creates labor-saving “economies of thought” [7, 32, pp. 428–429, 55, 56, 83, pp. 481–495].
Butterfield [24, pp. 16–17, 25–26, 96–98] notes that, in the history of science, observations do not accumulate into patterns recognized as lawful; instead, science advances as new ways of projecting useful geometric idealizations are worked out. Measurement structures linear geometries affording conceptual grasps of concrete phenomena by positing the hypothesis that something varies in a way that might be meaningfully quantified. Kuhn [71, p. 219; original emphasis] makes the point, saying,
The road from scientific law to scientific measurement can rarely be traveled in the reverse direction. To discover quantitative regularity, one must normally know what regularity one is seeking and one's instruments must be designed accordingly; even then nature may not yield consistent or generalizable results without a struggle.
Practical metrological implementations of the results of measurement modeling exercises that begin by specifying the structure of lawful regularities, as in the use of Rasch’s models for measurement, require agreements on standardized criteria for knowing when and if comparability is substantively threatened. Efforts in this direction are being taken up, for instance, in the European NeuroMet project [39].
The unfortunate effect of compromising the invariance of the unit quantity in applications of IRT models with multiple item parameters is that data are described to death, with estimates that cannot be taken out of the context of the particular sample. The value of identified models [98, 100], such as Rasch’s, concerns the practical implications of structural invariances for policy and programs. Rasch and Thurstone are acknowledged for their contributions in “fruitful” discussions concerning the development of the concept of identified models, those that require structural invariances reproducible across samples and instruments [70, p. 165]. Referred to by one of Rasch’s mentors, Frisch, as “autonomy” [2], this quality in cross-data patterns is characteristic of a class of models necessary to learning generalizable lessons informing practical capacities for predicting the future [48]. Over-parameterized models, in contrast, may achieve statistical significance only at the expense of practical significance, such that the particular relationships obtained in a given data set are so closely specified that they are useless for anticipating future data [6, 23, 38, p. 211; 81, p. 22; 99; 110, p. 235; 123; 127].
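The invariance at stake can be illustrated with a minimal simulation sketch (simulated data; crude centered log-odds estimation rather than a full Rasch estimator): under the Rasch model, item difficulty estimates from two samples of very different ability agree up to random error, so the unit survives the change of sample.

```python
import numpy as np

rng = np.random.default_rng(42)
n_items = 10
difficulty = np.linspace(-2, 2, n_items)  # generating item difficulties (logits)

def simulate(ability):
    # Dichotomous Rasch responses for a sample with the given abilities
    p = 1 / (1 + np.exp(-(ability[:, None] - difficulty[None, :])))
    return (rng.uniform(size=p.shape) < p).astype(int)

def item_logits(x):
    # Crude difficulty estimates: centered log-odds of item facilities
    f = x.mean(axis=0).clip(0.01, 0.99)
    d = np.log((1 - f) / f)
    return d - d.mean()

d_low = item_logits(simulate(rng.normal(-1.0, 1.0, 500)))   # low-ability sample
d_high = item_logits(simulate(rng.normal(+1.0, 1.0, 500)))  # high-ability sample

print("item  d_low  d_high")
for i in range(n_items):
    print(f"{i:4d} {d_low[i]:6.2f} {d_high[i]:7.2f}")
print("correlation:", round(float(np.corrcoef(d_low, d_high)[0, 1]), 3))
```

Multi-parameter models, by contrast, let sample particulars leak into the item parameters themselves, which is the sample-dependence described in the text.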
Sweeping unexpected responses into data-dependent item parameters conceals exceptions to the rule in summary statistics, hiding them from end users who might otherwise be able to make use of them in the manner described in the chapters by Allen and Pak, and by Massof and Bradley. But the disclosure of anomalies is well established as a primary function of measurement. Rasch [96, pp. 10, 124], Kuhn [71, p. 205], and Cook [32, p. 431] all illustrate this point using the example of the discovery of Neptune from perturbations in the orbit of Uranus. Burdick and Stenner [22] concur, noting that IRT models put analysts in a position akin to that of Ptolemaic astronomers, whose descriptive approach to planetary orbits was more accurate than what could then be achieved using Newton’s laws. What if astronomy had stuck with the Ptolemaic methods instead of adopting new ones based on physical theory? Ptolemaic astronomers can be imagined saying, “Forget about those perturbations in the orbit of Uranus. Our model accounts for them wonderfully.” If astronomy as a field had accepted that position, instead of insisting on the prescriptive Newtonian model, Neptune could never have been discovered by estimating the position and mass of an object responsible for perturbations of the magnitude observed in Uranus’ orbit.
Though this process of being able to perceive exceptions only in relation to a standard may seem to be a highly technical feature of mathematical science, it is but an instance of the fact recognized by Plato in The Republic (523b–c) that “experiences that do not provoke thought are those that do not at the same time issue a contradictory perception.” That is, we do not really think much at all about experiences meeting our expectations. The labor-saving economy of thought, created when language pre-thinks the world for us, removes the need to bother with irrelevant details. Scientific instruments extend this economy by embodying invariant meaning structures that predict the form of new data.
But in contrast to these more typical situations, innovative ideas and thoughtful considerations tend to follow from observations that make one think, “That’s odd… .” And so it happened with the discovery of penicillin when a lab culture died, of x-rays when a lead plate was misplaced in a lab, of vulcanization when liquid rubber was accidentally left on a hot stove, of Post-it notes when an experimental glue did not stick, and so on.
Nature is revealed by means of exceptions that are often products of serendipitous accidents [97]. Anomalous observations are answers to questions that have not been asked. Because attention is focused on conceptually salient matters already incorporated in the linguistic economy of thought, most unexpected observations are ignored as mere noise, as nuisance parameters of little interest or value. Deconstructing the context in which unexpected observations arise is difficult, as it requires a capacity for closely following the phenomenology giving rise to something that may or may not be of any use. It is not only hard to know when pursuit of new avenues of investigation might be rewarded, but formulating the question to which the observation is an answer requires imagination and experience. Thus, Kuhn [71, p. 206] observes that a major value of quantified methods follows from the fact that numbers, sterile in themselves, “register departures from theory with an authority and finesse that no qualitative technique can duplicate.” Creating organizational environments capable of supporting full pivots in new directions is, of course, another matter entirely, but that is just what is entailed by the way science and society co-evolve [5, 54, 68, 69, 72, 77].
Continuing to accept summed ratings and multiparameter IRT models’ undefined, unstable, uninterpretable, sample- and instrument-dependent unit quantities as necessary and unavoidable has proven a highly effective means of arresting the development of psychology and the social sciences. The common practice of willfully mistaking ordinal ratings and IRT estimates for interval measures perpetuates the failure to even conceive the possibility that communities of research and practice could think and act together in the terms of common languages employed for their value as the media of communication and the rules by which exceptions are revealed. Continued persistence in this confusion has reached a quite perverse degree of pathological denial, given that the equivalence of measurement scales across the sciences was deemed “widely accepted” over 35 years ago [88, p. 169] but still has not fulfilled its potential in mainstream applications. Grimby et al. [62] are not unreasonable, from our point of view, in viewing the ongoing acceptance of ordinal scores and undefined numeric units in person-centered outcome measurement as a form of fraudulent malpractice.
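The ordinal-versus-interval point can be made concrete with a minimal sketch (dichotomous Rasch case; assumed item calibrations): converting each possible raw score to its logit measure shows that one extra raw-score point corresponds to very different amounts of the measured quantity at different places on the scale, which is exactly why summed ratings cannot be treated as interval measures.

```python
import numpy as np

difficulty = np.linspace(-2, 2, 10)  # assumed anchored item calibrations (logits)

def measure_for_score(r, d, tol=1e-8):
    # Solve sum_i P_i(theta) = r for theta by bisection (a score-to-measure table)
    lo, hi = -10.0, 10.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        expected = (1 / (1 + np.exp(-(mid - d)))).sum()
        lo, hi = (mid, hi) if expected < r else (lo, mid)
    return (lo + hi) / 2

measures = [measure_for_score(r, difficulty) for r in range(1, 10)]
for r, m in zip(range(1, 10), measures):
    print(f"raw score {r}: {m:+6.2f} logits")
print("logits per extra score point:", np.round(np.diff(measures), 2))
```

Running the sketch shows the score-to-measure table is nonlinear: the logit distance bought by one extra score point near the extremes is several times the distance bought in the middle of the scale.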
The dominant paradigm’s failure to distinguish numeric scores from measured quantities [10] commits the fundamental epistemological error of separating individual minds from the environmental context they inhabit [9, p. 493; 47]. Cognitive processes do not occur solely within brains, but must necessarily leverage scaffolded supports built into the external environment, such as alphabets, phonemes, grammars, dictionaries, printing presses, and quantitative unit standards’ quality assurance protocols [65, 66, 75, 95, 106]. Metrological infrastructures define the processes by which real things and events in the world are connected with formal conceptual ideals and are brought into words with defined meanings, including common metrics’ number words. And so, as Latour [72, pp. 249, 251] put it,
Every time you hear about a successful application of a science, look for the progressive extension of a network. … Metrology is the name of this gigantic enterprise to make of the outside a world inside which facts and machines can survive.
Shared standards and common languages are the means by which we prepare minds on mass scales to better recognize and act on chance events. As Pasteur put it in 1854, “in the fields of observation, chance favors only the prepared mind” [40, p. 309]. Because currently popular measurement methods neither map the unfolding sequences of changes in health, performance, functionality, or learning, nor express differences in terms of defined unit quantities with stated uncertainties, nor reveal unexpected departures from theory, person-centered care lacks systematic ways of apprehending and communicating accidental and serendipitous events that might possess actionable value.
Identified models and metrological standards set up an alternative vision of broad ecosystems of interdependent and reproductively viable forms of social life. A key potential for productive innovations emerges here, since, as the populations of these forms of life grow, highly improbable combinations (mutations) become so frequent that their failure to occur is what becomes unlikely [41, p. 91]. In other words, multilevel metrologically-traceable systems of measurement create the combinations of construct theories’ self-descriptive genotypes, instrument standard phenotypes, and mutable individual data needed for natural selection to amplify adaptively superior new forms of social life [91, 92] in a kind of epigenetic organism-environment integration. But if statistical descriptions of ordinal scores and IRT’s varying unit estimates continue to be taken as satisfactory approaches to quantifying person-centered outcomes, it is only reasonable to expect continued perpetuation of the status quo vis-à-vis systematically and statistically concealed anomalies and exceptions to the rule that could otherwise lead in qualitatively productive new directions at both individual and aggregate levels.
1.4 Concluding Comments
Differences between centrally-planned data analytics and distributed metrological networks were a matter of concern for Ben Wright [123, 126, 127] not just in his steadfast focus on science over statistics but more broadly throughout his conception of measurement [46, 117]. In the last paragraph of his 1988 Inaugural Address to the AERA Rasch Measurement SIG, Wright ([124]; also see [125]) said:
So, we come to my last words. The Rasch model is not a data model at all. You may use it with data, but it’s not a data model. The Rasch model is a definition of measurement, a law of measurement. Indeed, it’s the law of measurement. It’s what we think we have when we have some numbers and use them as though they were measures. And it’s the way numbers have to be in order to be analyzed statistically. The Rasch model is the condition that data must meet to qualify for our attention. It’s our guide to data good enough to make measures from. And it’s our criterion for whether the data with which we are working can be useful to us.
This recurring theme in Wright’s work is also foregrounded on the first page of Wright and Masters’ 1982 book [130]:
Because we are born into a world full of well-established variables it can seem that they have always existed as part of an external reality which our ancestors have somehow discovered. But science is more than discovery. It is also an expanding and ever-changing network of practical inventions. Progress in science depends on the creation of new variables constructed out of imaginative selections and organizations of experience.
With his colleagues and students, Wright [122] advanced the ideas of item banking and adaptive item administration [8, 28, 29, 82, 129], individualized “kidmap” reports [26, 27, 131] and self-scoring forms [12, 50, 79, 80, 86, 126, 128, 132]. All of these depend on understanding measurement as operationalized via structurally invariant, anchored item estimates and networks of instruments read in a common language at the point of use [46, 117]. Wright’s contributions to construct theorizing, instrument calibration, and individually-customized reports of special strengths and weaknesses span all three semiotic levels of complexity.
The chapters in this book build on and remain in dialogue with Wright’s [127, p. 33] realization that “Science is impossible without an evolving network of stable measures.” With increasing awareness of the metrological viability of instruments calibrated using Wright’s ideas [25, 52, 85, 93], and the emergence of new consensus standards for uniform metrics [39], there is also increasing need for examples of the kind brought together in this book. We hope that the efforts begun by the contributors to this volume will inspire a growing sphere of imaginative and productive applications of person-centered outcome metrology.
Notes
1. For brevity, we use the term ‘patient’ as a shorthand for the recipient of health care or treatment, although we acknowledge this is quickly becoming an outdated term.
References
J.R. Ackermann, Data, Instruments, and Theory: A Dialectical Approach to Understanding Science (Princeton University Press, 1985)
J. Aldrich, Autonomy. Oxf. Econ. Pap. 41, 15–34 (1989)
D.D. Allen, M. Wilson, Introducing multidimensional item response modeling in health behavior and health education research. Health Educ. Res. 21(suppl_1), i73–i84 (2006)
D. Andrich, Controversy and the Rasch model: A characteristic of incompatible paradigms? Med. Care 42(1), I-7–I-16 (2004)
C. Audia, F. Berkhout, G. Owusu, Z. Quayyum, S. Agyei-Mensah, Loops and building blocks: A knowledge co-production framework for equitable urban health. J. Urban Health 98(3), 394–403 (2021)
D. Bamber, J.P.H. van Santen, How many parameters can a model have and still be testable? J. Math. Psychol. 29, 443–473 (1985)
E. Banks, The philosophical roots of Ernst Mach’s economy of thought. Synthese 139(1), 23–53 (2004)
M. Barney, W.P. Fisher Jr., Adaptive measurement and assessment. Annu. Rev. Organ. Psych. Organ. Behav. 3, 469–490 (2016)
G. Bateson, Steps to an Ecology of Mind: Collected Essays in Anthropology, Psychiatry, Evolution, and Epistemology (University of Chicago Press, 1972)
G. Bateson, Number is different from quantity. CoEvol. Q. 17, 44–46 (1978) [Reprinted from pp. 53–58 in Bateson, G. (1979). Mind and Nature: A Necessary Unity. New York: E. P. Dutton]
D.M. Berwick, C.K. Cassel, The NAM and the quality of health care-inflecting a field. N. Engl. J. Med. 383(6), 505–508 (2020)
W.R. Best, A Rasch model of the Crohn’s Disease Activity Index (CDAI): Equivalent levels of ranked attribute and continuous variable scales, in Crohn’s Disease: Etiology, Pathogenesis and Interventions, ed. by J. N. Cadwallader (Nova Science Publishers, 2008), Chapter 5
A. Bilodeau, L. Potvin, Unpacking complexity in public health interventions with the Actor–Network Theory. Health Promot. Int. 33(1), 173–181 (2018)
M. Black, Models and Metaphors (Cornell University Press, 1962/2019)
M. Black, More about metaphor, in Metaphor and Thought, ed. by A. Ortony, (Cambridge University Press, Cambridge, 1993), pp. 19–43
P. Black, M. Wilson, S. Yao, Road maps for learning: A guide to the navigation of learning progressions. Meas. Interdiscip. Res. Persp. 9, 1–52 (2011)
A. Blok, I. Farias, C. Roberts (eds.), The Routledge Companion to Actor-Network Theory (Routledge, 2020)
R.K. Bode, A.W. Heinemann, P. Semik, Measurement properties of the Galveston Orientation and Amnesia Test (GOAT) and improvement patterns during inpatient rehabilitation. J. Head Trauma Rehabil. 15(1), 637–655 (2000)
G. Bowker, S. Timmermans, A. E. Clarke, E. Balka (eds.), Boundary Objects and beyond: Working with Leigh Star (MIT Press, 2015)
J. Browne, S. Cano, S. Smith, Using patient-reported outcome measures to improve healthcare: Time for a new approach. Med. Care 55, 901–904 (2017)
R. Bud, S.E. Cozzens (eds.), SPIE Institutes: Vol. 9. Invisible connections: Instruments, Institutions, and Science, ed. by R.F. Potter (SPIE Optical Engineering Press, 1992)
H. Burdick, A.J. Stenner, Theoretical prediction of test items. Rasch Meas. Trans. 10(1), 475 (1996). http://www.rasch.org/rmt/rmt101b.htm
J.R. Busemeyer, Y.-M. Wang, Model comparisons and model selections based on generalization criterion methodology. J. Math. Psychol. 44(1), 171–189 (2000)
H. Butterfield, The Origins of Modern Science (Revised Edition) (The Free Press, 1957)
S. Cano, L. Pendrill, J. Melin, W. Fisher, Towards consensus measurement standards for patient-centered outcomes. Measurement 141, 62–69 (2019)
T.W. Chien, Y. Chang, K.S. Wen, Y.H. Uen, Using graphical representations to enhance the quality-of-care for colorectal cancer patients. Eur. J. Cancer Care 27(1), e12591 (2018)
T.-W. Chien, W.-C. Wang, H.-Y. Wang, H.-J. Lin, Online assessment of patients’ views on hospital performances using Rasch model’s KIDMAP diagram. BMC Health Serv. Res. 9, 135 (2009). https://doi.org/10.1186/1472-6963-9-135. or http://www.ncbi.nlm.nih.gov/pmc/articles/PMC2727503/
B. Choppin, An item bank using sample-free calibration. Nature 219, 870–872 (1968)
B. Choppin, Recent developments in item banking, in Advances in Psychological and Educational Measurement, ed. by D. N. M. DeGruitjer, L. J. van der Kamp, (Wiley, 1976), pp. 233–245
W. Cohen, L. Mundy, T. Ballard, A. Klassen, S. Cano, J.P. Browne, A. Pusic, The BREAST-Q in surgical research: A review of the literature 2009–2015. J. Plast. Reconstr. Surg. 69, 149–162 (2016)
K. Cook, P.O. Monahan, C.A. McHorney, Delicate balance between theory and practice: Health status assessment and Item Response Theory. Med. Care 41(5), 571–574 (2003)
T.A. Cook, The Curves of Life (Dover, 1914/1979)
A. Coulter, Measuring what matters to patients. Br. Med. J. 356, j816 (2017)
L.H. Daltroy, M. Logigian, M.D. Iversen, M.H. Liang, Does musculoskeletal function deteriorate in a predictable sequence in the elderly? Arthritis Care Res. 5, 146–150 (1992)
P. De Boeck, M. Wilson (eds.), Explanatory Item Response Models: A Generalized Linear and Nonlinear Approach, Statistics for Social and Behavioral Sciences (Springer-Verlag, New York, 2004)
J. Dewey, in Unmodern Philosophy and Modern Philosophy, ed. by P. Deen, (Southern Illinois University Press, 2012)
S. Donetto, C. Chapman, S. Brearley, A.M. Rafferty, D. Allen, G. Robert, Exploring the impact of patient experience data in acute NHS hospital trusts in England: Using Actor-Network Theory to optimise organisational strategies and practices for improving patients’ experiences of care. Health Serv. Deliv. Res. 14, 156 (2019)
S.E. Embretson, Item Response Theory models and spurious interaction effects in factorial ANOVA designs. Appl. Psychol. Meas. 20(3), 201–212 (1996)
EMPIR Project 18HLT04 NeuroMet, Innovative Measurements for Improved Diagnosis and Management of Neurodegenerative Diseases. https://www.lgcgroup.com/our-programmes/empir-neuromet/neuromet-landing-page/ (2022)
S. Finger, Minds Behind the Brain: A History of the Pioneers and Their Discoveries (Oxford University Press, 2004)
R.A. Fisher, Retrospect of the criticisms of the theory of natural selection, in Evolution as a Process, ed. by J. Huxley, A. C. Hardy, E. B. Ford, (George Allen & Unwin Ltd, 1954), pp. 84–98
W.P. Fisher Jr., The Rasch debate: Validity and revolution in educational measurement, in Objective Measurement: Theory into Practice, ed. by M. Wilson, vol. II, (Ablex Publishing Corporation, 1994), pp. 36–72
W.P. Fisher Jr., Objectivity in psychosocial measurement: What, why, how. J. Outcome Meas. 4(2), 527–563 (2000)
W.P. Fisher Jr., The standard model in the history of the natural sciences, econometrics, and the social sciences. J. Phys. Conf. Ser. 238(1), 012016 (2010). http://iopscience.iop.org/1742-6596/238/1/012016/pdf/1742-6596_238_1_012016.pdf
W.P. Fisher Jr., Contextualizing sustainable development metric standards: Imagining new entrepreneurial possibilities. Sustainability 12(9661), 1–22 (2020a)
W.P. Fisher Jr., Wright, Benjamin D. [Biographical entry], in SAGE Research Methods Foundations, ed. by P. Atkinson, S. Delamont, A. Cernat, J. W. Sakshaug, R. Williams, (Sage, 2020b). https://methods.sagepub.com/foundations/wright-benjamin-d
W.P. Fisher Jr., Bateson and Wright on number and quantity: How to not separate thinking from its relational context. Symmetry 13, 1415 (2021a)
W.P. Fisher Jr., Separation theorems in econometrics and psychometrics: Rasch, Frisch, two fishers, and implications for measurement. J. Interdiscip. Econ., OnlineFirst, 1–32 (2021b)
W.P. Fisher Jr., Metaphor and Measurement (Submitted, in Review, 2022)
W.P. Fisher Jr., R.F. Harvey, K.M. Kilgore, New developments in functional assessment: Probabilistic models for gold standards. NeuroRehabilitation 5(1), 3–25 (1995)
W.P. Fisher Jr., E.P.-T. Oon, S. Benson, Applying design thinking to systemic problems in educational assessment information management. J. Phys. Conf. Ser. 1044, 012012 (2018). http://iopscience.iop.org/article/10.1088/1742-6596/1044/1/012012
W.P. Fisher Jr., A.J. Stenner, Theory-based metrological traceability in education: A reading measurement network. Measurement 92, 489–496 (2016). http://www.sciencedirect.com/science/article/pii/S0263224116303281
W.P. Fisher Jr., A.J. Stenner, Ecologizing vs modernizing in measurement and metrology. J. Phys. Conf. Ser. 1044, 012025 (2018). http://iopscience.iop.org/article/10.1088/1742-6596/1044/1/012025
W.P. Fisher Jr., M. Wilson, Building a productive trading zone in educational assessment research and practice. Pensamiento Educativo: Revista de Investigacion Educacional Latinoamericana 52(2), 55–78 (2015)
G. Franck, The scientific economy of attention: A novel approach to the collective rationality of science. Scientometrics 55(1), 3–26 (2002)
G. Franck, The economy of attention. J. Sociol. 55(1), 8–19 (2019)
P. Galison, Image and Logic: A Material Culture of Microphysics (University of Chicago Press, 1997)
P. Galison, Trading zone: Coordinating action and belief, in The Science Studies Reader, ed. by M. Biagioli, (Routledge, 1999), pp. 137–160
J. Goldstein, M. Chun, D. Fletcher, J. Deremeik, R. Massof, Low Vision Research Network Study Group, Visual ability of patients seeking outpatient low vision services in the United States. JAMA Ophthalmol. 132, 1169–1177 (2014)
J. Golinski, Is it time to forget science? Reflections on singular science and its history. Osiris 27(1), 19–36 (2012)
C.V. Granger, R.T. Linn, Biologic patterns of disability. J. Outcome Meas. 4(2), 595–615 (2000). http://jampress.org/JOM_V4N2.pdfs
G. Grimby, A. Tennant, L. Tesio, The use of raw scores from ordinal scales: Time to end malpractice? J. Rehabil. Med. 44, 97–98 (2012)
L. Guttman, The basis for scalogram analysis, in Measurement and Prediction, Studies in Social Psychology in World War II. Volume 4, ed. by S. A. Stouffer, L. Guttman, E. A. Suchman, P. F. Lazarsfeld, S. A. Star, J. A. Clausen, (Wiley, 1950), pp. 60–90
D.J. Haraway, Modest witness: Feminist diffractions in science studies, in The Disunity of Science: Boundaries, Contexts, and Power, ed. by P. Galison, D. J. Stump, (Stanford University Press, 1996), pp. 428–441
E. Hutchins, Cognition in the Wild (MIT Press, 1995)
E. Hutchins, The cultural ecosystem of human cognition. Philos. Psychol. 27(1), 34–49 (2014)
D. Ihde, Instrumental Realism: The Interface Between Philosophy of Science and Philosophy of Technology, The Indiana Series in the Philosophy of Technology (Indiana University Press, 1991)
S. Jasanoff, States of Knowledge: The Co-production of Science and Social Order, International Library of Sociology (Routledge, 2004)
S. Jasanoff, The practices of objectivity in regulatory science, in Social Knowledge in the Making, ed. by C. Camic, N. Gross, M. Lamont, (University of Chicago Press, 2011), pp. 307–338
T.C. Koopmans, O. Reiersøl, The identification of structural characteristics. Ann. Math. Stat. XXI, 165–181 (1950)
T.S. Kuhn, The function of measurement in modern physical science. Isis 52(168), 161–193 (1961/1977). (Rpt. in T. S. Kuhn, (Ed.). (1977). The essential tension: Selected studies in scientific tradition and change (pp. 178–224). University of Chicago Press)
B. Latour, Science in Action: How to Follow Scientists and Engineers Through Society (Harvard University Press, 1987)
B. Latour, Postmodern? No, simply amodern: Steps towards an anthropology of science. Stud. Hist. Phil. Sci. 21(1), 145–171 (1990)
B. Latour, We Have Never Been Modern (Harvard University Press, 1993)
B. Latour, Cogito ergo sumus! Or psychology swept inside out by the fresh air of the upper deck: Review of Hutchins’ Cognition in the Wild, MIT Press, 1995. Mind Cult. Activity Int. J. 3(192), 54–63 (1995)
B. Latour, To modernise or ecologise? That is the question, in Remaking Reality: Nature at the Millennium, ed. by B. Braun, N. Castree, (Routledge, 1998), pp. 221–242
B. Latour, Reassembling the Social: An Introduction to Actor-Network-Theory, Clarendon Lectures in Management Studies (Oxford University Press, 2005)
J.M. Linacre, Stochastic Guttman order. Rasch Meas. Trans. 5(4), 189 (1991). http://www.rasch.org/rmt/rmt54p.htm
J.M. Linacre, Instantaneous measurement and diagnosis. Phys. Med. Rehabil. State Art Rev. 11(2), 315–324 (1997). http://www.rasch.org/memo60.htm
J. Liu, Development and translation of measurement findings for the motivation assessment for team readiness, integration, and collaboration self-scoring form. Am. J. Occup. Ther. 72(4_Supplement_1), 7211500015p1 (2018)
J. Lumsden, Tests are perfectly reliable. Br. J. Math. Stat. Psychol. 31, 19–26 (1978)
M.E. Lunz, B.A. Bergstrom, R.C. Gershon, Computer adaptive testing. Probabilistic Conjoint Measurement. A Special Issue of the Int. J. Educ. Res. (W.P. Fisher, Jr., B.D. Wright, eds.), 21(6), 623–634 (1994)
E. Mach, The Science of Mechanics: A Critical and Historical Account Of Its Development, Trans. T.J. McCormack, 4th ed. (The Open Court Publishing Co., 1883/1919)
L. Mari, M. Wilson, An introduction to the Rasch measurement approach for metrologists. Measurement 51, 315–327 (2014)
L. Mari, M. Wilson, A. Maul. Measurement Across the Sciences, R. Morawski, G. Rossi, others, eds., Springer Series in Measurement Science and Technology (Springer, 2021)
G.N. Masters, R.J. Adams, J. Lokan, Mapping student achievement. Probabilistic Conjoint Measurement, A Special Issue of the Int. J. Educ. Res. (W.P. Fisher, Jr., B.D. Wright, eds.), 21(6), 595–610 (1994)
J. Melin, S. Cano, L. Pendrill, The role of entropy in construct specification equations (CSE) to improve the validity of memory tests. Entropy 23(2), 212 (2021)
L. Narens, R.D. Luce, Measurement: The theory of numerical assignments. Psychol. Bull. 99(2), 166–180 (1986)
N.J. Nersessian, Maxwell and “the method of physical analogy”: Model-based reasoning, generic abstraction, and conceptual change, in Reading Natural Philosophy: Essays in the History and Philosophy of Science and Mathematics, ed. by D. Malament, (Open Court, 2002), pp. 129–166
J. O’Connell, Metrology: The creation of universality by the circulation of particulars. Soc. Stud. Sci. 23, 129–173 (1993)
H.H. Pattee, Universal principles of measurement and language functions in evolving systems, in Complexity, Language, and Life: Mathematical Approaches, ed. by J. L. Casti, A. Karlqvist, (Springer, 1985), pp. 268–281
H.H. Pattee, J. Raczaszek-Leonardi, Biosemiotics. Vol. 7: Laws, Language and Life: Howard Pattee’s Classic Papers on the Physics of Symbols with Contemporary Commentary, M. Barbieri, J. Hoffmeyer, eds., (Springer, 2012)
L. Pendrill, Quality Assured Measurement: Unification Across Social and Physical Sciences, R. Morawski, G. Rossi, others, eds., Springer Series in Measurement Science and Technology (Springer, 2019)
L. Pendrill, W.P. Fisher Jr., Counting and quantification: Comparing psychometric and metrological perspectives on visual perceptions of number. Measurement 71, 46–55 (2015)
E. Petracca, S. Gallagher, Economic cognitive institutions. J. Inst. Econ. 16(6), 747–765 (2020)
G. Rasch, Probabilistic Models for Some Intelligence and Attainment Tests. (Reprint, with Foreword and Afterword by B. D. Wright, Chicago: University of Chicago Press, 1980) (Danmarks Paedogogiske Institut, 1960)
R.M. Roberts, Serendipity: Accidental Discoveries in Science (Wiley, 1989)
E. San Martin, J. Gonzalez, F. Tuerlinckx, Identified parameters, parameters of interest, and their relationships. Meas. Interdiscip. Res. Persp. 7(2), 97–105 (2009)
E. San Martin, J. Gonzalez, F. Tuerlinckx, On the unidentifiability of the fixed-effects 3 PL model. Psychometrika 80(2), 450–467 (2015)
E. San Martin, J.M. Rolin, Identification of parametric Rasch-type models. J. Stat. Plan. Inference 143(1), 116–130 (2013)
S.L. Star, The structure of ill-structured solutions: Boundary objects and heterogeneous distributed problem solving, in Proceedings of the 8th AAAI Workshop on Distributed Artificial Intelligence, Technical Report, (Department of Computer Science, University of Southern California, 1988/2015). (Rpt. in G. Bowker, S. Timmermans, A. E. Clarke & E. Balka, (Eds.). (2015). Boundary objects and beyond: Working with Leigh Star (pp. 243–259). The MIT Press)
S.L. Star, J.R. Griesemer, Institutional ecology, ‘translations,’ and boundary objects: Amateurs and professionals in Berkeley’s Museum of Vertebrate Zoology, 1907–39. Soc. Stud. Sci. 19(3), 387–420 (1989)
S.L. Star, K. Ruhleder, Steps toward an ecology of infrastructure: Design and access for large information spaces. Inf. Syst. Res. 7(1), 111–134 (1996)
A.J. Stenner, W.P. Fisher Jr., M.H. Stone, D.S. Burdick, Causal Rasch models. Front. Psychol. Quant. Psychol. Meas. 4(536), 1–14 (2013)
M.H. Stone, Substantive scale construction. J. Appl. Meas. 4(3), 282–297 (2003)
J. Sutton, C.B. Harris, P.G. Keil, A.J. Barnier, The psychology of memory, extended cognition, and socially distributed remembering. Phenomenol. Cogn. Sci. 9(4), 521–560 (2010)
UK Department of Health, UK PROMS Programme (2022), https://digital.nhs.uk/data-and-information/data-tools-and-services/data-services/patient-reported-outcome-measures-proms
US Food and Drug Administration, FDA Patient-Focused Drug Development Guidance Series for Enhancing the Incorporation of the Patient’s Voice in Medical Product Development and Regulatory Decision Making (2022), https://www.fda.gov/drugs/development-approval-process-drugs/fda-patient-focused-drug-development-guidance-series-enhancing-incorporation-patients-voice-medical
A. van Helden, T. L. Hankins (eds.), Instruments (Vol. 9). Osiris: A Research Journal Devoted to the History of Science and Its Cultural Influences (University of Chicago Press, 1994)
N.D. Verhelst, C.A.W. Glas, The one parameter logistic model, in Rasch Models: Foundations Recent Developments, and Applications, ed. by G. H. Fischer, I. W. Molenaar, (Springer, 1995), pp. 215–237
M. Wilson (ed.), National Society for the Study of Education Yearbooks, Vol. 103, Part II: Towards Coherence Between Classroom Assessment and Accountability (University of Chicago Press, 2004)
M.R. Wilson, Constructing Measures: An Item Response Modeling Approach (Lawrence Erlbaum Associates, 2005a)
M. Wilson, Subscales and summary scales: Issues, in Outcomes Assessment in Cancer: Measures, Methods and Applications, ed. by J. Lipscomb, C. C. Gotay, C. Snyder, (Cambridge University Press, 2005b), pp. 465–479
M. Wilson, Cognitive diagnosis using item response models. Zeitschrift Für Psychologie/J. Psychol. (Special Issue: Current Issues in Competence Modeling and Assessment) 216(2), 74–88 (2008)
M. Wilson, Making measurement important for education: The crucial role of classroom assessment. Educ. Meas. Issues Pract. 37(1), 5–20 (2018)
M. Wilson, W.P. Fisher Jr., Preface: 2016 IMEKO TC1-TC7-TC13 Joint Symposium: Metrology Across the Sciences: Wishful Thinking? J. Phys. Conf. Ser. 772(1), 011001 (2016)
M. Wilson, W. P. Fisher Jr. (eds.), Psychological and Social Measurement: The Career and Contributions of Benjamin D. Wright, ed. by M. G. Cain, G. B. Rossi, J. Tesai, M. van Veghel, K.-Y. Jhang, Springer Series in Measurement Science and Technology (Springer, 2017). https://link.springer.com/book/10.1007/978-3-319-67304-2
M. Wilson, W.P. Fisher Jr., Preface of special issue, Psychometric Metrology. Measurement 145, 190 (2019)
M. Wilson, P. Gochyyev, Having your cake and eating it too: Multiple dimensions and a composite. Measurement 151, 107247 (2020)
M.N. Wise, Precision: Agent of unity and product of agreement. Part III—“Today precision must be commonplace”, in The Values of Precision, ed. by M. N. Wise, (Princeton University Press, 1995), pp. 352–361
A.W. Woolley, E. Fuchs, Collective intelligence in the organization of science. Organ. Sci. 22(5), 1359–1367 (2011)
B.D. Wright, Solving measurement problems with the Rasch model. J. Educ. Meas. 14(2), 97–116 (1977)
B.D. Wright, Despair and hope for educational measurement. Contemp. Educ. Technol. 3(1), 281–288 (1984)
B.D. Wright, Georg Rasch and measurement: Informal remarks by Ben Wright at the inaugural meeting of the AERA Rasch Measurement SIG, New Orleans—April 8, 1988. Rasch Meas. Trans. 2, 25–32 (1988a). http://www.rasch.org/rmt/rmt23.htm. (Rpt. in J. M. Linacre, (Ed.). (1995). Rasch Measurement Transactions, Part 1 (pp. 25–32). MESA Press)
B.D. Wright, Useful measurement through concurrent equating and one-step (concurrent) item banking. Rasch Meas. Trans. 2(2), 24 (1988b). http://www.rasch.org/rmt/rmt22f.htm
B.D. Wright, Fundamental measurement for outcome evaluation. Phys. Med. Rehabil. State Art Rev. 11(2), 261–288 (1997a)
B.D. Wright, A history of social science measurement. Educ. Meas. Issues Pract. 16(4), 33-45–33-52 (1997b)
B.D. Wright, Benjamin D. Wright’s annotated KeyMath diagnostic profile. Rasch Meas. Trans. 25(4), 1350 (2012). https://www.rasch.org/rmt/rmt254.pdf
B.D. Wright, S.R. Bell, Item banks: What, why, how. J. Educ. Meas. 21(4), 331–345 (1984). http://www.rasch.org/memo43.htm
B.D. Wright, G.N. Masters, Rating Scale Analysis: Rasch Measurement (MESA Press, 1982)
B.D. Wright, R.J. Mead, L.H. Ludlow, KIDMAP: Person-by-Item Interaction Mapping, MESA Memorandum #29 (MESA Press, Chicago, 1980). http://www.rasch.org/memo29.pdf
B.D. Wright, M.H. Stone, Best Test Design: Rasch Measurement (MESA Press, 1979)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
About this chapter
Cite this chapter
Fisher Jr., W.P., Cano, S.J. (2023). Ideas and Methods in Person-Centered Outcome Metrology. In: Fisher, Jr., W.P., Cano, S.J. (eds) Person-Centered Outcome Metrology. Springer Series in Measurement Science and Technology. Springer, Cham. https://doi.org/10.1007/978-3-031-07465-3_1
Print ISBN: 978-3-031-07464-6
Online ISBN: 978-3-031-07465-3