Abstract
The last 50 years of human and social science measurement theory and practice have witnessed a steady retreat from physical science as the canonical model. Humphry (2011) unapologetically draws on metrology and physical science analogies to reformulate the relationship between discrimination and the unit. This brief note focuses on why this reformulation is important and on how these ideas can improve measurement theory and practice.
Measurement: Interdisciplinary Research and Perspectives, 9, 2011.
In principle, any characteristic of the instrument, objects of measurement, or measurement contexts that influences the change in probability of a modeled response (i.e., slope of the item response function) is a threat to unit invariance. Such influences produce instability in the unit. Progress in science requires stable unit specifications because it is only through such conventions that a unit is reproduced and shared. A large fraction of the metrology budget for more mature sciences is devoted to identifying and engineering around threats to unit stability.
A big step in realizing the metrology program outlined by Humphry is the abandonment of descriptive Rasch models in favor of an explicitly causal interpretation of the regression of the response probability on the exponentiated difference between a person parameter and an instrument/item parameter. All that is meant by this causal claim is that an intervention on the person parameter can be traded off against an offsetting intervention on the item parameter to hold the probability of a correct response constant. If this trade-off property is experimentally verified throughout the range of the attribute, and is invariant across task types, person characteristics, and measurement contexts, then a stable, reproducible unit for measuring persons and items has been specified and actualized. If invariance is lacking, say for a new task type, the first thing to check is whether the new task type, which purportedly measures the same construct as the task types evidencing invariance, has unexpected added easiness/hardness or is measuring in a differently sized unit. In writing research we have found that human and machine scoring of student writing need to be adjusted for differences in unit size; once this adjustment is made, the concordance between machine scores and human ratings is striking.
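The trade-off property admits a minimal numerical sketch. The Rasch model form is as stated above; the parameter values below are hypothetical, chosen only to illustrate that equal interventions on ability and difficulty cancel:

```python
import math

def p_correct(theta, b):
    """Rasch model: probability of a correct response given person
    ability theta and item difficulty b (both in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Trade-off property: raising ability by delta and raising difficulty
# by the same delta leaves the response probability unchanged, because
# the probability depends only on the difference theta - b.
theta, b, delta = 1.2, 0.4, 0.7
p_before = p_correct(theta, b)
p_after = p_correct(theta + delta, b + delta)
print(p_before == p_after)  # the two probabilities coincide
```

Verifying this invariance experimentally, rather than assuming it, is precisely what distinguishes the causal reading of the model from the descriptive one.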
A measurement instrument is built to detect variation of a kind. The specification equation answers the question: what causes the variation the instrument detects? It is also clear that non-test behaviors (e.g., reading a Harry Potter novel) can be brought into a reading measurement frame of reference by imagining that the novel contains an ensemble of test items whose distributional properties may be treated as known. The text complexity of the novel is then the reader ability required to correctly answer, say, 75% of the virtual items making up the novel. The specification equation is the tool used to calibrate these non-test behaviors. A third use of the specification equation is to calibrate actual test items (for example, computer-generated reading items), making it possible to convert counts correct into quantities.

However, all of the above uses of the specification equation pale in relation to the role it can and should play in maintaining the unit of scale for an attribute. Once a specification equation for an attribute is “locked down,” the unit origin and unit size are fixed. New task types and measurement contexts (e.g., machine vs. human scoring) can be linked back to the fixed unit. Tampering with the specification equation by changing the intercept or adjusting the regression weights alters the origin and unit size, respectively. Thus, the specification equation defines the unit, independent of any particular test form or linking study, and maintains that unit over widely varying instrumentation and measurement contexts.
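Under the Rasch model the 75% criterion has a closed form: setting P = 0.75 in P = 1/(1 + exp(−(θ − b))) and solving for θ gives θ = b + ln(0.75/0.25) = b + ln 3. The sketch below assumes a hypothetical specification equation; the feature names, weights, and intercept are illustrative stand-ins, not an actual calibrated equation:

```python
import math

# Hypothetical specification equation: predicted difficulty (in logits)
# from observable text features. The intercept fixes the origin of the
# unit; the weights fix its size.
WEIGHTS = {"log_word_frequency": -1.1, "mean_sentence_length": 0.05}
INTERCEPT = 0.2

def predicted_difficulty(features):
    return INTERCEPT + sum(WEIGHTS[k] * v for k, v in features.items())

def text_complexity(features, target=0.75):
    """Reader ability required to answer `target` of a text's virtual
    items correctly: solve target = 1/(1+exp(-(theta-b))) for theta."""
    b = predicted_difficulty(features)
    return b + math.log(target / (1.0 - target))

novel = {"log_word_frequency": 2.0, "mean_sentence_length": 18.0}
print(round(text_complexity(novel), 3))
```

Note how the code makes the maintenance point concrete: altering `INTERCEPT` shifts every predicted difficulty (a change of origin), while rescaling `WEIGHTS` stretches the spacing between texts (a change of unit size).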
It often happens that new task types, improved theory or improved technology necessitate changes to the specification equation that, if not taken into account, will result in a “new” unit. Adjustments to the specification equation can be made to ensure that the “old” unit and “new” unit are comparable as to origin and unit size. Typically some standard artifact (boiling point of normal water, platinum meter bar, or collection of empirically calibrated texts) is used to ensure unit stability over time.
Scale unification is a well-understood theme in the history of science. Its obverse, scale proliferation, is a prominent feature of measurement theory and practice in the human and social sciences. Today there are dozens of scales for measuring every important attribute (anxiety, depression, reading ability, spatial reasoning). There is often debate about whether variation among task types in added easiness/hardness or unit size signals that something different is being measured. Infrequently, attempts are made to document that the same attribute is being measured by the various instruments; linking studies are then launched that result in correspondence tables or equations linking the respective scales (similar to the equation that links the Fahrenheit and Celsius scales).
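The Fahrenheit–Celsius linkage can itself be recovered from paired measurements by ordinary least squares, the same machinery a linking study applies to two scales of the same attribute. Temperature data stands in here for hypothetical paired measures on two reading scales:

```python
# Paired measurements of the same objects on two scales. Fahrenheit and
# Celsius are related exactly by F = 1.8 * C + 32: a slope other than 1
# reflects a difference in unit size, an intercept other than 0 a
# difference in origin.
celsius = [0.0, 25.0, 37.0, 100.0]
fahrenheit = [32.0, 77.0, 98.6, 212.0]

n = len(celsius)
mx = sum(celsius) / n
my = sum(fahrenheit) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(celsius, fahrenheit))
         / sum((x - mx) ** 2 for x in celsius))
intercept = my - slope * mx
print(round(slope, 3), round(intercept, 3))  # 1.8 32.0
```

A linking study for two reading scales yields exactly such a correspondence equation, with the slope and intercept estimated from common persons or common texts rather than known in advance.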
Humphry has sketched a metrology program for the human and social sciences to follow as we begin the arduous task of building a system of units. Although the far-term goal is a system of units for the human and social sciences, the near-term goal should be an invariant unit shared by a relevant community for a single attribute. We will learn much as these first attempts play out.
Reference
Humphry, S. M. (2011). The role of the unit in physics and psychometrics. Measurement: Interdisciplinary Research and Perspectives, 9(1), 1–24.
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2023 The Author(s)
Cite this chapter
Stenner, A.J., Burdick, D.S. (2023). Can Psychometricians Learn to Think Like Physicists?. In: Fisher Jr., W.P., Massengill, P.J. (eds) Explanatory Models, Unit Standards, and Personalized Learning in Educational Measurement. Springer, Singapore. https://doi.org/10.1007/978-981-19-3747-7_16
Print ISBN: 978-981-19-3746-0
Online ISBN: 978-981-19-3747-7