1.1 Why We Wrote This Book

It would be difficult to overstate the value and importance of measurement in nearly every aspect of society. Every time we purchase or eat food, take prescribed medicine, travel in a vehicle, use a phone or computer, or step inside a building, we place our trust in the results of measurements—and, for the most part, that trust seems well-earned; as such, measurement is commonly associated with precision, accuracy, and overall trustworthiness. Against this backdrop, it seems little wonder that the human sciencesFootnote 1 have, since their inception, attempted to incorporate measurement into their activities as well. However, despite—or perhaps, to at least some extent, because of—the ubiquity of measurement-related concepts and discourse, there remains a remarkable lack of shared understanding of these concepts across (and often within) different fields, perhaps most visibly reflected in the vast array of proposed definitions of measurement itself. In addition to hampering communication across different disciplinary fields regarding shared methodological principles, such a lack of common understanding hints at the possibility that the same terms—“measurement”, “measurement result”, “measurement model”, etc.—are used with very different and possibly even incompatible meanings in different disciplines, with potentially disastrous results.

Of course, measurement is not a natural entity, pre-existing and waiting to be discovered; rather, measurement is designed and performed on purpose. Hence, in attempting to define or characterize measurement, we inevitably must attend to domain-related conventions and customs in the contexts in which measurement has been developed and established. Given the aforementioned lack of common understanding of measurement across the scientific and technical literatures, one might conclude that there is an irreducible multiplicity of measurement-related concepts and terms; from this perspective, an endeavor aimed at exploring a possible shared understanding of measurement across the sciences would seem to be pointless.

Obviously, this is not our position. We believe, instead, that a transdisciplinary understanding of the nature of measurement is a valuable target, for both theoretical and practical reasons. As previously noted, measurement is commonly acknowledged to be a (or even the) basic process of acquiring and formally expressing information on empirical entities, which suggests the usefulness of a shared understanding of basic and general concepts related to measurement (hence not only <measurement> itself, but also <measurand>, <measurement result>, <uncertainty>, <accuracy>, etc.Footnote 2). Information is routinely acquired and reported by means of values on properties as diverse as reading comprehension ability, well-being, the quality of industrial products, the complexity of software systems, user satisfaction with social services, and political attitudes. But should these cases all be understood as instances of measurement? Stated alternatively, are all such cases worthy of the trust commonly afforded to measurement, and if so, what do they share in common that makes them so worthy? Or, at least in some cases, are such examples better understood as instances of something other than measurement—perhaps something less trustworthy, such as statements of opinion or subjective evaluation? To restate the issue as succinctly as possible: what justifies the perceived epistemic authority of measurement?

We think any attempt to answer such questions will require acknowledgment that measurement is not something that can be isolated from scientific and technical knowledge more generally. Characterizations of measurement found in different fields and different historical periods relate in important ways to more general issues in science, philosophy, and society, such as the nature of properties and the objects that bear them, the relationship between experimentation and modeling, and the relationships between data, information, and knowledge—and indeed, the very possibility of (true) knowledge. In particular, on this last point, a question of increasing relevance to our data-saturated world is: given the growing interest in “big data” and “datafication”, under what conditions does data actually provide information on the measurand? In the radically new context of widespread availability of large, sometimes huge, amounts of data in which we are now living, it is plausible that measurement science will maintain a role in our society only by attaining a broadly shared fundamental basis, instead of dissolving into a myriad of technical sub-disciplines.

1.1.1 Is Measurement Necessarily Physical?

Kuhn (1961: p. 161) once observed that

at the University of Chicago, the facade of the Social Science Research Building bears Lord Kelvin’s famous dictum: “when you cannot measure, your knowledge is of a meagre and unsatisfactory kind.” Would that statement be there if it had been written not by a physicist, but by a sociologist, political scientist, or economist? Or again, would terms like “meter reading” and “yardstick” recur so frequently in contemporary discussions of epistemology and scientific method were it not for the prestige of modern physical science and the fact that measurement so obviously bulks large in its research?

It is hard to dispute that, for most, the paragon of measurement is physical measurement. For some, this might even be the end of the conversation: measurement is necessarily of physical quantities, and thus, anything called “measurement” in the human sciences is either ultimately of something physical, or is at best a metaphorical application of the concept of measurement to something that is in fact not measurement. And indeed, there is some historical weight to this argument: for much of the history of human civilization, measurement was associated with a relatively small number of spatiotemporal properties, such as length, mass, and time, and more recently force, temperature, and electric charge. As scientific understanding of the physical world has advanced, these properties have become increasingly understood as mutually interdependent, via physical laws (such as Newton’s second law of motion, which posits that force is the product of mass and acceleration); when values are attributed to physical properties, such laws can be used for inferential purposes by operating mathematically on the available values by means of the relevant laws. Reasoning about the physical world in this way proved so successful that it was the common ground upon which new branches of physics were created in the eighteenth and nineteenth centuries, in particular thermodynamics and electromagnetism, the development of each of which involved the discovery of their own sets of properties and laws connecting them. And, of course, such scientific advances led to technological changes, which in turn triggered further scientific advances, as well as changes in society at large.

This positive feedback loop would not have been possible without effective tools for obtaining information about the relevant properties. This is, of course, the role played by measurement; as Norman Campbell effectively summarized, “the object of measurement is to enable the powerful weapon of mathematical analysis to be applied to the subject matter of science” (1920: p. 267).Footnote 3 Measurement is thus a critical component of the scientific paradigm of the physical sciences and has been integral to its success.

Again, given this, it is perhaps unsurprising that other scientific fields and areas of human activity have increasingly incorporated measurement-related concepts and terms into their own activities. In part, this seems to be based on a widespread acceptance of the premise contained in Lord Kelvin’s credo—that without measurement, knowledge is at best “meagre and unsatisfactory”—with the further implication that, as put for example by psychologist James McKeen Cattell, “psychology cannot attain the certainty and exactness of the physical sciences, unless it rests on a foundation of experiment and measurement” (1890: p. 373; for more extended discussions see, e.g., Michell, 1999; Briggs, 2021).

In the late nineteenth and early twentieth centuries, scholars working in the field of psychophysics, such as Gustav Theodor Fechner and Stanley Smith Stevens, investigated relationships between physical stimuli and their associated human responses (sensation and perception), on the premise that by establishing quantitative relationships (e.g., through what is now known as Fechner’s Law; Fechner, 1860) between known physical quantities such as weight and human sensations, the latter could be measured as well.Footnote 4 Separately, scholars interested in individual differences, such as Francis Ysidro Edgeworth (1888, 1892) and Charles Spearman (1904, 1907), applied statistical methods and logic originally developed in the context of astronomy (in particular related to the Gaussian distribution) to the study of human beings, largely in the context of scores on educational and psychological tests. Starting from the observation that some sets of test items, such as arithmetic questions, seemed to give more internally consistent results than others, they posited that an individual’s “observed scores” (O) on tests could be decomposed into “true scores” (T) and “errors” (E), i.e., O = T + E, giving rise to a field that would later come to be known as psychometrics.
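
The classical decomposition O = T + E is easy to explore in simulation. The following sketch (in Python; the score distributions and variances are our own illustrative choices, not drawn from the works cited above) generates observed scores as true scores plus independent error, and checks the classical definition of reliability as the ratio of true-score variance to observed-score variance:

```python
import numpy as np

rng = np.random.default_rng(0)
n_examinees = 10_000

true_scores = rng.normal(loc=50, scale=10, size=n_examinees)  # T
errors = rng.normal(loc=0, scale=5, size=n_examinees)         # E, independent of T
observed = true_scores + errors                               # O = T + E

# Classical test theory reliability: var(T) / var(O)
reliability = true_scores.var() / observed.var()
print(f"var(T) = {true_scores.var():.1f}, var(O) = {observed.var():.1f}, "
      f"reliability = {reliability:.2f}")  # expected: 100 / (100 + 25) = 0.80
```

On these assumptions, the more internally consistent item sets that Edgeworth and Spearman observed correspond to smaller error variance, and hence to a reliability closer to 1.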

Perhaps unsurprisingly, these early attempts at measuring psychosocial properties were met with skepticism by some members of the broader scientific community. As one important committee concluded, referring in particular to psychophysics, “to insist on calling these other processes measurement adds nothing to their actual significance but merely debases the coinage of verbal intercourse” (Ferguson et al., 1940: p. 345). Even within the human sciences, many were (and still are) skeptical of psychosocial measurement, often based on concerns such as that “not everything that can be counted counts, and not everything that counts can be counted”, as put by the sociologist William Bruce Cameron (1963: p. 13).

In part, skepticism about psychosocial measurement may have been related to the very fact that, as previously mentioned, available examples of measurement pertained exclusively to physical properties, which might have given the impression that only physical properties are measurable. Interestingly, even many prominent proponents of psychosocial measurement seem to have accepted this position; for example, Jum C. Nunnally and Ira H. Bernstein, in the 3rd edition of their influential textbook Psychometric Theory, noted that “it is more defensible to make no claims for the objective reality of a construct name such as ‘anxiety’ and simply use the construct name as a convenient label for a particular set of observable variables. The name is ‘valid’ only to the extent that it accurately describes the kinds of observables being studied to others. […] The words that scientists use to denote constructs (e.g., ‘anxiety’ and ‘intelligence’) have no real counterparts in the world of observables; they are only heuristic devices for exploring observables. Whereas, for example, the scientist might find it more comfortable to speak of anxiety than of [item] set A, only set A and its relations objectively exist, research results relate only set A, and, in the final analysis, only relations within members of set A and between set A and members of other sets can be unquestionably documented” (Nunnally & Bernstein, 1994). This use of a term such as “heuristic devices” again seems to imply that claims about measurement in the human sciences are best understood as being metaphorical rather than literal; alternatively, one might conclude that the term “measurement” simply has irreducibly different meanings in the physical sciences and the human sciences, which was indeed the conclusion of some human scientists, such as Stanley Smith Stevens (see, e.g., McGrane, 2015).Footnote 5

But, to us, such conclusions seem unsatisfactory: again, measurement is regarded as integral to science and society on the basis of its epistemic authority, and so the question remains of what, exactly, justifies claims to such authority. As Kuhn asked: “what [is] the source of [the] special efficacy” of measurement? (1961: p. 162). Stated alternatively, what are the necessary and sufficient elements of trustworthy measurement processes, independent of domain or area of application? We hope, in this book, to address exactly this question: what could be a common foundation of measurement across the sciences?

1.2 Some Familiar and Not-So-Familiar Contexts for Measurement

In the sections below, we introduce two examples of the sorts of measurement that we had in mind when writing this book. Each will appear later at several points in the text, along with other examples when they are more pertinent. In particular, we recognize that many of our readers might not have experience with measurement across both the physical sciences and the human sciences, and hence the accounts are each designed to be quite basic, starting from a very low expectation of expertise in their respective topic areas. These basic accounts will be expanded, deepened and updated at appropriate places in the text. We have also included a third section, where we give an illustration of how the typical format of measurement in the human sciences, in terms of sets of items, can be seen as structurally analogous to measurement approaches in the physical sciences.

1.2.1 A Brief Introduction to Temperature and Its Measurement

While discussing the features and the problems of measurement systems in this book, we mention some examples of physical properties, including the well-known cases of length and mass. In particular, in Chap. 6 the hypothesis that length is an additive quantity is exploited in the construction that, starting from lengths of rods, leads to units of length and, hence, values of length. But we develop the example of temperature in somewhat more detail: it is used in Chap. 6 to show how values may be obtained for a non-additive quantity, and again in Chap. 7, where we introduce a model of direct measurement.

From the perspective of our conceptual analysis of measurement, temperature has some very interesting features. It is, first, a property of critical importance: “Temperature has a profound influence upon living organisms. Animal life is normally feasible only within a narrow range of body temperatures, with the extremes extending from about 0–5 °C (32–41 °F) to about 40–45 °C (104–113 °F).”Footnote 6 It is a property that we perceive with our senses and that we understand qualitatively, in relative terms of warmer and colder, but the quality of our perception system is quite low, in particular due to its limited selectivity (what we actually perceive is the so-called apparent temperature, caused by the combined effects of air temperature, relative humidity, and wind speed) and range (our thermoception loses all discriminatory power for temperatures outside the narrow range mentioned above). Given its practical importance, it is not surprising that the history of the understanding and the measurement of temperature is rich, with several significant stages (see, e.g., Chang, 2007; Sherry, 2011), from the starting point of our physiology, which allows us to consider temperature only as a (partially) ordinal property based on the relation warmer than, to the introduction of instruments which make differences of temperature observable by transducing temperature to the height of a liquid via the effect of thermal expansion. Such instruments were based on the hypothesis of a causal relation between temperature and volume: ceteris paribus, if the temperature of the liquid increases then its volume increases (and then also its height increases, thanks to the ingenious configuration of the instrument). In other words, the problem of the low sensitivity to temperature of the human senses was solved not by looking for some sort of “temperature amplifier”, but by gaining and then exploiting knowledge about the effects of temperature on a second property, which is in some sense more directly observable, by means of instruments that were designed as nomological machines (Cartwright, 1999).Footnote 7

The discovery that a transduction effect can be effectively modeled as a monotonic relation is sufficient for building instruments that only detect changes of the relevant property, as is, for example, the case for a thermoscope, which is a device able to reveal a change of temperature by somehow showing a change of volume. However, the ceteris paribus condition is critical for a quantitative characterization of the transduction effect, and therefore for equipping a thermoscope with a scale and thus making it a thermometer, given the dependency of the transduced height on the context—air pressure in particular—and the instrument’s features, including the kind of liquid used and the material of which the tube is made (typically some sort of glass). It was only on the basis of such a condition that fixed points were discovered, so that, e.g., ceteris paribus, water always boils at the same temperature. This was a fundamental enabler of the establishment of scales of temperature, which were initially created without a strong theoretical understanding of temperature and its relation to thermal expansion, and instead were mainly based on models of data, typically with the assumption of linearity of values between the fixed points (Bringmann & Eronen, 2015). The compatibility of the results produced by different instruments was hard to achieve, and in consequence so was the construction of a socially agreed thermometric scale, Celsius and Fahrenheit being only the two remnants of a larger set of once-proposed scales. But this multiplicity of instruments, able to produce at least partially compatible results, also helped advance our knowledge of temperature: the observed transduction effects implemented in different instruments share a common cause, which is also the same physical property that we perceive and describe in terms of warmer or colder. This standpoint was further supported by the discovery of other temperature-related transduction effects, independent of thermal expansion, for example the thermoelectric effect, by which differences of temperature are transduced to differences of electric potential (i.e., voltage). The hypothesis of the existence of temperature, as the cause of multiple, independent but correlated effects, was thus strongly corroborated.
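
As a small illustration of the fixed-point approach just described, the following sketch (with hypothetical column-height readings; no particular historical instrument is intended) builds a centigrade-style scale by assuming linearity between the ice and boiling points of water:

```python
def make_two_point_scale(height_at_ice: float, height_at_boil: float):
    """Return a function mapping column height to degrees, assuming the
    height varies linearly with temperature between the two fixed points
    (which are mapped to 0 and 100, as in a centigrade scale)."""
    def to_degrees(height: float) -> float:
        return 100 * (height - height_at_ice) / (height_at_boil - height_at_ice)
    return to_degrees

# Hypothetical calibration readings for one particular thermoscope:
to_celsius = make_two_point_scale(height_at_ice=2.0, height_at_boil=27.0)
print(to_celsius(14.5))  # 50.0: halfway up the column maps to 50 degrees
```

Two instruments calibrated this way agree at the fixed points by construction, but may disagree in between if their liquids expand differently with temperature—which is exactly the compatibility problem mentioned above.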

Temperature has some other interesting features for our conceptual metrological perspective. It is an intensive property, i.e., “one that is independent of the quantity of matter being considered”,Footnote 8 so that the temperature of a thermally homogeneous body does not change by removing a part of the body; nevertheless, it has a fundamental connection with several additive/extensive properties, in particular heat energy, which spontaneously flows from bodies at a higher temperature to bodies at a lower temperature. Moreover, the temperature of a gas is proportional to the average kinetic energy of its molecules, so that a property at the macroscopic level (temperature) is explained in terms of a property at the microscopic level (molecular kinetic energy).

Finally, the measurement of temperature and its development are also interesting with respect to scale types. While historically temperature was considered to be only an ordinal property, the scientific and technological advances resulting from the adoption of the experimental method led to thermometric scales (including the previously mentioned Celsius and Fahrenheit scales), which are interval scales, because of the lack of knowledge of a “natural zero” common to all scales. (Compare the cases of length and mass: even though many scales of length and mass were introduced, each corresponding to a different unit, all scales of length share the same zero-length and all scales of mass share the same zero-mass.) Accordingly, ratios of values of (thermometric) temperature are still not meaningful—in the sense that if the temperatures of two bodies are, e.g., 20 and 40 °C, then the conclusion that the latter is twice as warm as the former is mistaken, as one can easily check by converting the two values to °F—but units of temperature are nevertheless well-defined, and allow us to compare invariantly the ratios of differences of values. A second scientific development created the conditions for the final step: thermodynamics implies the existence of a minimum, or absolute zero, of temperature, at −273.15 °C, which led to the Kelvin scale, which is thus a ratio scale, although, of course, still non-additive, as further discussed in Sect. 6.3.6.
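
These claims about scale types can be checked directly with the standard conversion formulas (a minimal sketch):

```python
def c_to_f(c: float) -> float:
    return c * 9 / 5 + 32

t1, t2, t3 = 20.0, 40.0, 30.0  # temperatures in degrees Celsius

# Ratios of values are not invariant on interval scales:
print(t2 / t1)                  # 2.0 in Celsius...
print(c_to_f(t2) / c_to_f(t1))  # ...but about 1.53 in Fahrenheit (104/68)

# Ratios of differences ARE invariant:
print((t2 - t1) / (t3 - t1))                                  # 2.0
print((c_to_f(t2) - c_to_f(t1)) / (c_to_f(t3) - c_to_f(t1)))  # 2.0 again

# On the Kelvin (ratio) scale, ratios of values are meaningful:
print((t2 + 273.15) / (t1 + 273.15))  # about 1.07: not "twice as warm"
```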

1.2.2 A Brief Introduction to Reading Comprehension Ability and Its Measurement

An important example of measurement in the human science domain is that of a student’s reading comprehension ability (RCA). The relevance of reading comprehension ability to the modern world can hardly be exaggerated; indeed, you, the reader, would not have gotten this far without your own reading comprehension! It is obvious that accurate measurement of RCA is of crucial importance in education, but it is equally so in many other social domains, such as in the writing of guidebooks, the formulation of tests such as driving tests, and in the communication of public health warnings.

A basic scenario for the measurement of reading comprehension ability might involve the following.

  (a)

    A reader reads a textual passage, and is then asked one or more questions about how well they understand the contents of the passage. One of the first such tests was developed by Frederick Kelly (1916), and an example question from that test (see Fig. 1.1) will serve as an illustration of this typical format. The questions were chosen by Kelly to be likely to generate incontrovertibly correct or incorrect responses. Such questions and their accompanying rules for interpretation of responses are commonly called items in this field.

    Fig. 1.1
    Item from Kelly’s reading test. (The item reads: “I have red, green, and yellow papers in my hand. If I place red and green papers on the chair, which color do I still have in my hand?”)

  (b)

    The reader responds to each item by writing a response or selecting an option from a predetermined set of options. Thus, the reader’s RCA is transduced to the responses to the items.

  (c)

    A rater judges the correctness of each item response (this may be carried out automatically, especially in the case of multiple choice items).

  (d)

    An initial indication of a reader’s RCA is then given by the pattern of correct and incorrect responses for the set of items, which might be summarized in terms of the number (or percentage) of test items that the reader answered correctly, typically called the “sum-score”. The sum-score is then an indication at the macro level of the reader’s comprehension of the reading passage, whereas each individual item response is an indication at the micro-level of the reader’s comprehension of the question asked in the item.

  (e)

    This indication is, of course, limited in its interpretation to just the specific set of items administered. A variety of methods are available to allow interpretation beyond that specific test to generate a value on an instrument-independent RCA scale (some of which are presented below).

In a typical educational context, once the measurement is completed, a teacher would use interpretational curriculum materials keyed to the RCA scale value to assign the reader to some specific reading instruction activities designed to be appropriate for her level of RCA.

If the process just described has been successful, the basic input of the process is the student’s RCA, and the basic output is the estimated value of the student’s reading comprehension ability. As in the case of temperature, other inputs are usually present that could contribute to the output, such as distracting noises, the mood of the student, peculiarities of the text passages and/or the questions, and even specific background characteristics of the reader.

A traditional method to generate an RCA scale is the norm-referenced approach, which relates the RCA values to the sum-score distribution for a chosen reference population of readers. In this method, a representative sample of individual readers from a specified population (e.g., five-year-olds in a given country) take a test, and this generates a sample of results in the form of the local values (“sum-scores”) on the test. Then some statistics are computed on the sum-scores, such as the mean and the standard deviation (or the median and the interquartile range, or the percentiles), and the public reference properties are taken to be the RCAs of readers at those values. For example, if the mean and standard deviation were the chosen reference points, then a scale could be set by mapping the mean to, say, 500, and the standard deviation to, say, 100: thus, following this scale formulation, a value of 600 RCA units would be for a reader located at one standard deviation (100 RCA units)Footnote 9 above the mean (500 RCA units). This is thus an ordinal scale, but it is often treated as an interval-level scale in psychosocial measurement. An alternative, in which the RCA scale values are related to specific reference reading comprehension criteria, is known as the criterion-referenced approach; an example is discussed in Sect. 7.3.5.
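
A minimal sketch of this norm-referenced scale construction (the simulated sum-score sample and the 500/100 anchor values are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sum-scores for a representative sample on a 40-item test:
sample = rng.binomial(n=40, p=0.6, size=5_000)
mean, sd = sample.mean(), sample.std()

def to_rca_units(sum_score: float) -> float:
    """Map the reference sample's mean to 500 and one standard deviation
    to 100, as in the scale formulation described above."""
    return 500 + 100 * (sum_score - mean) / sd

print(round(to_rca_units(mean)))       # 500
print(round(to_rca_units(mean + sd)))  # 600: one standard deviation above
```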

1.2.3 An Initial View of Psychosocial Measurement from a Physical Science Perspective

Some may find it difficult to relate the above account of an RCA test to the traditional idea of a physical instrument such as a thermometer. The analogies may not be obvious—in particular, it can be hard to conceive how observations of how well readers respond to RCA items can be compared to how temperature is reflected in a thermometer—as these just do not seem like similar events!

This subsection (itself based on Mari & Wilson, 2014) is intended as a stepping-stone between these two worldviews of measurement. It starts with a standard physical measurement context, specifically the measurement of temperature using an alcohol thermometer, and shows how this can be adapted to a situation analogous to that for RCA.

With the aim of measuring a temperature Θ, a thermometer can be exploited as an indicating measuring instrument, and specifically as a sensor, which is supposed to behave according to a transduction function (sometimes also called “observation function”: this is what Fechner called a “measurement formula”, Boumans, 2007: p. 234) assumed to be linear in the relevant range:

$$x = \varTheta /k$$
(1.1)

i.e., the measurement principle is that an object a of temperature Θ[a] put in interaction with a thermometer of sensitivity k⁻¹ generates an expansion of the substance (e.g., a gas or a liquid) in the thermometer bulb and therefore an elongation x = Θ[a]/k of the substance in the tube.Footnote 10 (We will omit measurement units from now on, but of course we take the kelvin, K, as the unit of the temperature Θ, the metre, m, as the unit of the elongation x, and K m⁻¹ as the unit of the constant k.) Once the instrument has been calibrated, and therefore a value for k is obtained, the measurement is performed by applying the measurand Θ[a] to the thermometer, getting a value for the indication x, and finally exploiting the inverted version of the law, Θ[a] = k x, to calculate a value for the measurand.
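
In code, this calibrate-then-invert logic of Eq. (1.1) can be sketched as follows (the numerical values are made up, and units are omitted as noted above):

```python
def calibrate(known_theta: float, observed_x: float) -> float:
    """Estimate the instrument constant k from one known temperature and
    the elongation it produces: k = theta / x, by Eq. (1.1)."""
    return known_theta / observed_x

def measure(x: float, k: float) -> float:
    """Invert Eq. (1.1): attribute the value theta = k * x to the measurand."""
    return k * x

k = calibrate(known_theta=373.15, observed_x=0.25)  # hypothetical calibration
print(measure(0.20, k))  # 298.52: value attributed to the measurand
```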

This relationship (which is linear, due to constant sensitivity) is illustrated in Fig. 1.2 by the dotted line.

Fig. 1.2
Relationship between Θ and x (the transduction function) for thermometers as given in Eq. (1.1) and Eq. (1.2) (scaled values). (Line graph of obtained elongation versus applied temperature: the linear thermometer appears as a constant positive slope, the Boolean thermometer as a step profile.)

Suppose now that, instead of a thermometer whose behavior is described by Eq. (1.1), a modified thermometer is available, again characterized by a constant k, operating according to the following transduction function:

$$x = \begin{cases} 0, & \text{if } \varTheta[a]/k < 1 \text{ (the alcohol stays in its rest position)} \\ 1, & \text{if } \varTheta[a]/k \ge 1 \text{ (the alcohol elongates to a fixed position)} \end{cases}$$
(1.2)

Let us call such a transducer a “Boolean thermometer”, whereas “linear thermometer” will be the term for any transducer behaving according to Eq. (1.1). (The principle of transduction for a Boolean thermometer is not important here: we might suppose, for example, that the substance enters the tube only when it reaches its boiling temperature—as such, it could be interpreted as a calibrated thermoscope.)

While the behavior of a linear thermometer is mathematically modeled as a continuous, linear function in the relevant range, Eq. (1.2) defines a function whose range is discrete, and in fact binary. A second major difference between Eqs. (1.1) and (1.2) is related to the dimension of the parameter k: while in the case of linear thermometers, dim Θ/k = dim x = L, and therefore dim k = Θ L⁻¹, Eq. (1.2) assumes that dim Θ/k = 1 (i.e., is a quantity with unit one—sometimes, the term “dimensionless quantity” is used in this case), so that dim k = dim Θ for Boolean thermometers. The fact that in this case the parameter k is dimensionally homogeneous to a temperature has the important consequence that it can be interpreted as a “threshold temperature”, such that the substance elongates in the tube only if the applied temperature is greater than the threshold. This interpretation is crucial for what follows, as it allows the comparison of the involved quantities not only through ratios (Θ/k > 1) but also through differences (Θ − k > 0) and orderings (Θ > k), and therefore makes it possible to place values of the measurand and the parameter of the measuring instrument on the same scale.

Calibrating such a Boolean thermometer requires applying increasing temperatures whose values are known and registering the value Θ′ of the temperature that makes the substance elongate, so that k = Θ′. If we then apply a temperature Θ to this calibrated Boolean thermometer, and we obtain the indication value x = 1, then the only conclusion that can be drawn in this case is that Θ/k ≥ 1, and therefore that Θ ≥ k. Thus, this Boolean thermometer has taken the underlying algebraically rich scale of the quantity subject to measurement (the temperature Θ) and rendered it as an ordinal quantity: that is, it has operated as a pass/fail classifier. This, then, is the link to the correct/incorrect nature of the RCA items, as described in the previous section. Of course, the imperfection of this instrument is clear—it does no more than divide up temperatures into two categories, above k and below (or equal to) k. And this is, of course, also the reason why RCA tests always consist of multiple RCA items: so that the RCA scale will be able to distinguish more categories.
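
The pass/fail behavior of Eq. (1.2) amounts to a one-line classifier (a sketch; the threshold value is arbitrary):

```python
def boolean_thermometer(theta: float, k: float) -> int:
    """Transduction function of Eq. (1.2): indicate 1 if theta/k >= 1,
    i.e., if the applied temperature reaches the threshold k, else 0."""
    return 1 if theta / k >= 1 else 0

# A Boolean thermometer calibrated to a threshold of 350 (units omitted):
print(boolean_thermometer(300.0, 350.0))  # 0: below the threshold
print(boolean_thermometer(400.0, 350.0))  # 1: at or above the threshold
```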

Then, to accomplish this using Boolean thermometers, suppose that an array of M calibrated Boolean thermometers is available, each of them with a different constant kᵢ, and sequenced so that kᵢ < kᵢ₊₁ (this sequencing is an immediate by-product of the calibration described in the previous paragraph). Then, given an object to be measured a, the measurement procedure would be to apply the measurand Θ[a] to the Boolean thermometers in sequence until the jth thermometer is identified such that:

  • Θ[a] generates an elongation in all thermometers i, i < j, i.e., the indication value xᵢ = 1 is obtained, so that Θ[a] ≥ kᵢ;

  • Θ[a] does not generate an elongation in the jth thermometer, i.e., the indication value xⱼ = 0 is obtained, so that Θ[a] < kⱼ.

Hence, if j = 1, i.e., no thermometers elongate, Θ[a] < k₁, and if j = M + 1, i.e., all M thermometers elongate, Θ[a] is at least the largest constant in the array.

In the simplest case of a sequence of M = 2 Boolean thermometers, with constants k₁ and k₂, k₁ < k₂, three cases can then arise:

  (a)

    x₁ = 0, i.e., the applied temperature does not elongate any Boolean thermometer: Θ[a] < k₁;

  (b)

    x₁ = 1 and x₂ = 0, i.e., the applied temperature elongates the first Boolean thermometer but not the second one: k₁ ≤ Θ[a] < k₂;

  (c)

    x₂ = 1, i.e., the applied temperature elongates both the Boolean thermometers: Θ[a] ≥ k₂.

Clearly, this procedure can be extended to whatever number of Boolean thermometers is pragmatically usable in a given context. And that is exactly the formal foundation for one classical approach to measurement in the human sciences, Guttman scaling (Guttman, 1944). Under this approach, RCA “Guttman” items are seen as being related to the underlying scale as is the Boolean thermometer in Eq. (1.2), and a sequence of successively harder Guttman items is generated, so that they specify an ordinal scale of readers.
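
The array-of-thermometers procedure—and hence the deterministic logic of a Guttman scale—can be sketched as follows (the threshold values are illustrative):

```python
def ordinal_category(theta: float, thresholds: list[float]) -> int:
    """Apply Boolean transducers in order of increasing threshold and
    return the number that indicate 1; this count (j - 1 in the text's
    numbering) locates theta in one of M + 1 ordered categories."""
    count = 0
    for k in sorted(thresholds):
        if theta >= k:  # this transducer elongates: indication 1
            count += 1
        else:           # first transducer that does not elongate: stop
            break
    return count

thresholds = [273.15, 300.0, 350.0, 373.15]  # M = 4 calibrated constants
print(ordinal_category(290.0, thresholds))   # 1: k1 <= theta < k2
print(ordinal_category(400.0, thresholds))   # 4: all thermometers elongate
```

A reader answering a set of ideal Guttman items would produce exactly this kind of response pattern: all items up to some difficulty answered correctly, and all items beyond it answered incorrectly.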

This illustrates what one can do if one already has an algebraically rich measurand such as temperature. In the real situation of RCA this is, of course, not readily available, so that one must, in some sense, reverse the logic that was worked through here, and proceed from the Guttman items back to the underlying scale. The problem is actually a little more complicated than that: the drawback to this formulation is that RCA and other human science items only seldom behave exactly as described by Eq. (1.2); rather, they function in a less reliable way, so that an element of probability must be introduced in order to better model the situation. One way to do so is illustrated in Fig. 1.3, where the indication is given in terms of a probability (see also the sketch following the figure). How this can be done is described in Sect. 7.3.7.

Fig. 1.3
Sketch of a transduction relationship between an RCA measurand (in log metric) and the probability of observing a correct response. (Line graph of the probability of a positive indication versus the measurand, with an S-shaped profile.)
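
A common way to introduce such an element of probability is to replace the step function of Eq. (1.2) with a smooth, S-shaped curve like the one in Fig. 1.3. The sketch below uses a logistic function of the difference between reader ability and item difficulty; this particular functional form and its parameter values are our illustrative assumptions here, anticipating the treatment in Sect. 7.3.7:

```python
import math

def p_correct(ability: float, difficulty: float) -> float:
    """Probability of a correct response as a logistic function of the
    difference between reader ability and item difficulty (log metric)."""
    return 1 / (1 + math.exp(-(ability - difficulty)))

# Unlike the Boolean transducer, the transition from 0 to 1 is gradual:
for ability in [-2.0, 0.0, 2.0, 4.0]:
    print(ability, round(p_correct(ability, difficulty=2.0), 2))
# prints: -2.0 0.02 / 0.0 0.12 / 2.0 0.5 / 4.0 0.88
```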

1.3 The Path We Will Travel in This Book

As we said above, in this book we are seeking a conceptualization of <measurement> that can encompass evaluation of both physical and psychosocial properties, and of both quantitative and non-quantitative properties. In doing so, we require that this conceptualization be specific enough to account for the acknowledged epistemic authority of measurement, which is a critical part of its importance.

We start our story proper in Chap. 2, where we seek to identify a basic set of conditions necessary for measurement, which we hypothesize to be acceptable for many, if not all, researchers and practitioners across a wide range of fields of application of measurement. Chapter 2 concludes with a statement that summarizes those conditions:

measurement is an empirical and informational process, designed on purpose, whose input is an empirical property of an object and that produces information in the form of values of that property

In Chap. 3, we will add to this position three key additional points. First, we stipulate that measurement results should include information about the quality of the reported values, though we acknowledge that this is sometimes neglected in non-scientific situations. Historically, this quality was considered in reference to measurement errors, but in contemporary measurement it is more usually characterized in terms of uncertainty and validity, in physical and psychosocial measurement respectively.

Second, as inherited from the Euclidean tradition, when we report measured values, we are providing a relational form of information—the ratio of the measured property to the chosen unit, in quantitative cases. To do so requires that there is broad social availability of a metrological system that disseminates the reference properties by the usual means of measurement standards connected through traceability chains. This means that measurement requires calibration.

Third, in our view, and despite our previous point, we see that there has been an overemphasis on the relevance of the Euclidean tradition to measurement science. In particular, this tradition refers to a concept that is only loosely related to the above-mentioned empirical and informational process of measurement: the mathematical concept <measure>, i.e., a numerical ratio of entities. Hence, our conclusion is that the contention that measurement applies only to quantitative properties cannot be justified by kowtowing to the Euclidean tradition.

At this point in the book, we begin our own explorations beyond these basic positions, and address the question: given these necessary conditions, what complementary conditions are sufficient to characterize measurement?

As a background to answering that question, we review, in Chap. 4, three broad perspectives on measurement—realism, operationalism, and representationalism—and discuss, in the context of each of them, the epistemic status of measurement and the conditions of its proper use. We present the main findings of this discussion in a simple two-by-two mapping,Footnote 11 and the whole discussion leads us to the conclusion that an essential characterization of measurement is as an empirically structured model of the process, rather than some set of mathematical constraints on the inputs or the outputs of the process. This, coupled with an acknowledgment of the inevitable role of models in the measurement process, can be summarized as a model-dependent realism about measurement.

Next, in Chap. 5, we take up the very target of measurement, i.e., properties. We analyze properties from both ontological and epistemological perspectives, and identify a core issue in terms of the meaning of the Basic Evaluation Equation (BEE),

$$\text{property of a given object} = \text{value of a property}$$

which displays the basic components of any measurement result, and which must also be complemented with some information about uncertainty. From our model-dependent realist standpoint, we interpret the BEE relation as the (simple though controversial) claim of an actual referential equality: the BEE conveys information on the measurand because the measurand and the measured value remain conceptually distinct entities, though they identify the same individual property. Our position that measurement is an empirical process forces us to conclude that properties cannot be conceptual entities, and hence, we must investigate the very existence of properties. An additional complexity of the subject of the existence of properties is that <property> is a cluster concept, including four sub-concepts:

  •  <property of an object> (e.g., the mass of a given object and the reading comprehension ability of a given individual);

  •  <value of a property> (e.g., 1.234 kg and 1.23 logits on a specific RCA scale);

  •  <individual property> (e.g., a given mass and a given RCA);

  •  <general property> (e.g., mass and RCA).

In our realist perspective, individual properties exist as universals, but the interpretation of the BEE as a referential equality is compatible with other positions: hence, the continued progress in our exploration is not thwarted by possible disagreements over the actual nature of the entities exemplified by one or more of the sub-concepts of <property>.

In Chap. 6, we discuss three fundamental issues for measurement science. The first issue concerns the nature of values of properties, though we start by discussing the values of quantities. We provide a step-by-step construction to show that values are not symbols for the representation of properties, but that values are individual properties, identified as elements of a scale. Taking this perspective, we can see that the difference between values of quantitative and non-quantitative properties is a matter of the structure of the scale to which they belong. The second issue is then about the structure of scales and the related conditions of invariance, so that scale types provide a classification for property evaluations and then properties themselves. Our analysis finds no unique condition for separating quantitative and non-quantitative properties, and this finding reinforces the distinction between being quantitative and being measurable. The third issue in this chapter concerns general properties. Our basic assumption here is that an empirical process can interact only with an empirically existing entity, and that this applies both to the objects that bear the properties and the properties of the objects. Thus, a distinction needs to be maintained between empirical properties and the mathematical variables that may be used as mathematical models of properties. Regarding the conditions of the existence of general properties and the possible role of measurement in the definition of general properties, the hypothesis of existence of an empirical property can be corroborated by the observation of effects causally attributed to the property.

In Chap. 7, we reach the high point of our story where we propose a general model of the measurement process, one consistent with the ontological and epistemological commitments developed in the chapters before. Again, we start with the distinction between empirical and informational processes, and recall that measurement is neither a purely empirical nor a purely informational process. We broadly distinguish between direct and indirect methods of measurement as a fundamental classification of measurement methods related to the complementary roles of these empirical and informational components: indirect measurements necessarily include at least one direct measurement. In consequence, we give a structural characterization of direct measurement as the actual foundation of measurement science. This structural characterization we call the Hexagon Framework, and we exemplify it for both physical and psychosocial properties. We also use the framework to highlight the importance of evaluating the quality of the information produced by a measurement, but now frame this in terms of the high-level, complementary conditions of object-relatedness (“objectivity”) and subject-independence (“intersubjectivity”). Finally, the Framework provides a sufficient condition for measurability: a property is measurable if it is the input of at least one process that has been successfully structured according to the Framework.

Thus, in the conclusion of our story in Chap. 8, we revisit the arguments and discussions of the earlier chapters. We come back to address our initial question: given the necessary conditions we discuss in Chaps. 2 and 3, and the conclusions we reach in the subsequent chapters, what sufficient conditions, complementary to the necessary conditions, do we propose for characterizing measurement across the sciences?