Design for Values and the Definition, Specification, and Operationalization of Values

  • Peter Kroes
  • Ibo van de Poel
Living reference work entry


This chapter discusses a methodological problem that advocates of design for values have to face. In order to take into account moral values in designing technology, these values have to be operationalized or made measurable; otherwise it will not be possible to evaluate various design options with regard to these values. A comparison of the operationalization of values with the operationalization of physical concepts shows that certain conditions that enable the operationalization of physical concepts in objective measurement procedures are not fulfilled for the operationalization of values. The most significant difference is that physical concepts are embedded in networks of well-tested theories and operational procedures, which is not the case for moral values. We argue that, because of this, second-order value judgments play a crucial role in the operationalization of values and that these value judgments seriously undermine any claim that values may be measured in an objective way. The absence of objective measurement of values, however, does not imply that the operationalization and measurement of values in design is arbitrary. In our opinion, technical codes and standards may play a major role in coming to a reasonable or justified consensus on how to operationalize and measure moral values in design.


Design for values; Specification of values; Operationalization of values; Measuring moral values

Introduction: Design and Value Creation

The design and development of technical artifacts and systems is just one step in a complex process of trying to create value, more in particular to create valuable technical goods and services. Apart from design and development, other steps included in this value creation process are production, sales, after-sales, and use. The different stakeholders involved in this process may have different views on what kind of value is being created. Design engineers may highlight the technical value by stressing technical innovations in and patents on the product, whereas production managers may look at the value created primarily in terms of corporate profits, and sales managers in terms of market position. The end users may appreciate the value of the goods and services in terms of satisfying their needs and reaching their goals; these needs and goals may be very diverse bringing into play various kinds of user values (which, for instance, may be classified as values corresponding to Maslow’s five basic human needs). Governmental institutions may look at how the creation, production, and use of technical goods and services enhance public or social values like the health and safety of production workers or users or the privacy of citizens.

So, various kinds of value play a role in the design and production of technical goods and services, including technical, economic, social, and moral ones. Although these various kinds are associated with different phases and stakeholders in the product creation process, there is a strong tendency to take more or all of them into account in the design phase. Technical and economic values and values related to health, safety, and environment play a central role in today’s engineering design practice. Advocates of design for values and of socially responsible innovation argue that design engineers should go one step further, namely, that they also should take into account social and moral values in designing technology. This raises the main issue of this chapter, namely, the issue whether it is possible to take such values into account and if so – the possibility of design for values hinges on a positive answer – how this may be achieved. Much progress has already been made with regard to taking into account various kinds of values in engineering design; there are, for instance, depending on the kind of technical artifact that is being designed, all kinds of norms and standards for values such as health, safety, and environment. Clearly, some of these are highly morally relevant. From this perspective, it seems that the prospects for making progress in taking moral values into account look rather good. So, what are the obstacles, if any, for bringing design for values into practice?

The problem with regard to design for values is a methodological one, which is not specific to taking into account moral values but is of a general design methodological nature. According to design methodology, any functional requirement and any other constraint that the object of design has to satisfy have to be formulated or translated into a list of design specifications. Since any proposed design is going to be evaluated against this list of specifications, specifications have to be formulated as unambiguously as possible, preferably in terms of criteria that may be operationalized in objective measurement procedures. Often the meaning of these criteria and the measurement procedures are fixed in industry or governmental standards. One of the reasons for putting so much effort in standardization is to avoid disagreements about whether or not a particular technical design (technical artifact) satisfies the list of specifications or is “out of specs.”

So, if the aim is to make design for values an integral part of engineering design practice, then any constraint imposed on the object of design stemming from social or moral values somehow has to be translated as unambiguously as possible into design specifications, and these in turn have to be operationalized, again as unambiguously as possible, in measurement procedures. In order to explore to what extent the specification and operationalization of moral and social values face problems that are specific to these kinds of values and that may raise doubts about the feasibility of design for values, we will have a closer look at how physical concepts are made measurable. Just as a general definition of, for instance, privacy does not tell what specific constraints a particular object of design has to satisfy in order to protect or enhance the privacy of its users, general definitions of physical concepts such as temperature or mass are not sufficient to put these concepts to “work” in physics. For that it is necessary to operationalize these concepts in terms of measurement procedures. We will take the way concepts are made measurable in physics as our “gold standard” and explore to what extent this standard may be transposed to the specification and operationalization of moral and social values in engineering design.

In order to analyze what it would take to measure moral values in the context of design for values, we proceed in the following way. After a brief look at the philosophical background of the issue of measuring moral values (section “Philosophical Background”) and a discussion of some preliminary issues (section “Some Preliminary Issues”), we describe for comparison purposes how the concept of temperature is operationalized in physics (section “Definition and Measurement of Temperature”). This is followed by a discussion of three conditions that a “good” measurement has to satisfy (section “A ‘Good’ Measurement”). In the next step, we analyze with the help of an example the problems that are encountered in trying to operationalize and measure morally relevant values (section “Value Definition, Specification, and Operationalization: An Example”). In particular we will focus on what is called “specification” of values and how it relates to the operationalization of values. Thereafter we turn to a discussion of the role of codes and standards in value judgments (section “Codes, Standards, and Value Judgments”). The chapter ends with a brief summary of our main results.

Philosophical Background

Let us introduce the philosophical background of the methodological issue addressed in this chapter with the help of two quotations. The first quotation is taken from one of Plato’s dialogues, Euthyphro. In this dialogue Socrates questions Euthyphro about what is the holy and good. Euthyphro professes to know what these notions stand for, and convinced that in this he is doing the good, he is on his way to the Athenian court to accuse his own father of murder! This dialogue contains the following passage (Plato 1973, p. 175):

SOCRATES: And similarly if we differed on a question of greater length or less, we would take a measurement, and quickly put an end to the dispute.

EUTHYPHRO: Just that.

SOCRATES: And so, I fancy, we should have recourse to scales, and settle any question about a heavier or lighter weight?

EUTHYPHRO: Of course.

SOCRATES: What sort of thing, then, is it about which we differ, till, unable to arrive at a decision, we might get angry and be enemies to one another? Perhaps you have no answer ready, but listen to me. See if it is not the following – right and wrong, the noble and the base, and good and bad. Are not these the things about which we differ, till, unable to arrive at a decision, we grow hostile, when we do grow hostile, to each other, you and I and everybody else?

EUTHYPHRO: Yes, Socrates, that is where we differ, on these subjects.

In this dialogue the two disputants come to the agreement that certain differences of opinion may be resolved by measurements, others not. Their examples of problems that may be resolved by measurements are called in modern terms “empirical” problems, problems that may be resolved by observation, whereas the problems that may not so be resolved concern issues about moral values. So, with regard to a certain kind of issues, consensus may be reached (or forced?) by an appeal to measurements. In those cases, it is possible to reveal, so to speak, the true, objective state of affairs in the world simply by observation or performing a measurement. When it comes to differences about moral values, “scales” (methods) for measuring the moral value of something are lacking and so we are “unable to arrive at a decision.”

The second quotation stems from The Tanner Lecture on Human Values delivered by Thomas Nagel in 1979 which is entitled The Limits of Objectivity. In his Tanner Lecture, Nagel defends the pursuit of objectivity in the domain of ethics. He interprets objectivity as a method of understanding the world; we may arrive at a more objective understanding of the world by stepping back from our own subjective view of the world (“the view from within”) and by including ourselves with our subjective view in the world that is to be understood (“the view from without”). However, this way of “objectivizing” the world has its dangers (1979, p. 117):

So far I have been arguing against scepticism, and in favour of realism and the pursuit of objectivity in the domain of practical reason. But if realism is admitted as a possibility, one is quickly faced with the opposite of the problem of scepticism. This is the problem of over-objectification: the temptation to interpret the objectivity of reasons in too strong and unitary way.

In ethics, as in metaphysics, the allure of objectivity is very great: there is a persistent tendency in both areas to seek a single, complete objective account of reality – in the area of value that means a search for the most objective possible account of all reasons for action: the account acceptable from a maximally detached standpoint.

According to Nagel objectivity has its limits and conflicts between objective and subjective reasons for action should be taken seriously in ethical issues. However, from the pursuit of objectivity in ethics, more in particular of objectivizing reasons for action, it appears to follow that we should strive for the most objective possible account of moral evaluations of various options for actions and of states of affairs in the world, since these moral evaluations play an important role in reasons for action. If we assume that there are no a priori methods for doing so, the only way to achieve this, it seems, is to try to introduce objective methods for measuring the moral value (goodness, badness) of actions or states of affairs. If we would succeed in doing so, then, as in the case of length or weight, it would be possible to settle disagreements about moral values with the help of such measurements.

This chapter deals with the methodological question of whether or not or to what extent it is possible to measure objectively moral value (goodness), such that disagreements about moral value (goodness) may be resolved with the help of measurements. Socrates’ claim that moral disagreements cannot be settled in an empirical way is widespread, and even the suggestion to explore to what extent moral issues may be settled by empirical measurements may sound strange. After all it is quite common to oppose the domain of the moral (or of the normative in general) to the domain of the empirical: it is taken to be a defining feature of moral (normative) issues that they cannot be resolved empirically. Nevertheless there is, as Nagel points out, the allure of realism and objectivity in the domain of the moral. Indeed, there is a long tradition in philosophy of defending various forms of moral realism, all of which center around the core idea that there are (moral) facts in the world that make moral judgments true or false.1 If there are such facts, then the question arises why apparently it is not possible to resolve disagreements about moral claims by an appeal to these (moral) facts similar to how disagreements about physical claims may be resolved by an appeal to physical facts. It is not our intention to enter here into a discussion of whether there are such facts, that is, whether moral realism is indeed the case. We will approach the problem of whether measurements may resolve or help in resolving moral issues in a different way. In order to reach a better understanding of the possible role of measurements in resolving moral issues, we will analyze in detail the role of measurements in resolving disagreements about physical claims. What is involved in measuring physical quantities and what conditions have to be fulfilled such that measurements can play their role in settling disagreements about physical claims? 
The answers to these questions will put us in a better position to diagnose the reasons why an appeal to measurements is or may be problematic in the case of moral disagreements.

The methodological issue of measuring moral values (goodness) appears to be of central importance to any attempt to implement design for values. Somehow, design for values appears to presuppose that, at least with regard to some moral values, this is possible. If it were not possible to measure and compare the moral goodness of various design options, then it seems that the whole idea underlying design for values, namely, that engineers should take moral values into account when designing technical artifacts and systems, would lose its rationale. In that case it would be difficult to settle disagreements about the moral value of various design options, since there would be no way of telling which design option is morally better than another.2 We will argue that in the absence of methods for objectively measuring moral values, design for values may still make sense in case there is widespread consensus about which design option is the morally better one; in that case, however, this intersubjective consensus is not grounded in objectively measurable features of the design options under consideration. In the first instance, however, we are interested in analyzing the conditions that have to be fulfilled so that judgments about the moral goodness of designs may be grounded in objectively measurable features of these designs, just as claims about the physical world may be grounded in objectively measurable features of the world.

Some Preliminary Issues

Before we enter into a discussion whether moral values may be measured, a number of preliminary remarks are in order. First, of course, there is the issue about the nature of moral values. In the literature there is neither consensus about the meaning of the notion of value in general nor about the meaning of the notion of moral value in particular. For our purposes the following will be sufficient. Examples of moral values of interest within the context of design for values are values such as safety, privacy, sustainability, and accessibility. In contrast to most other values that play a role in engineering design practice (see below), these values are not instrumental in nature but are pursued primarily (or exclusively) for their own sake because they are intimately related to or an integral aspect of human well-being and human flourishing.3 So, if we assume that there is some kind of hierarchical ordering of values, design for values deals with values located in the highest regions in a value hierarchy. In the literature these values are often characterized as intrinsic or final values. So, the question we are dealing with is whether values high up in the value hierarchy may be measured objectively.

Second, we are going to use the notion of measurement in a broad sense. Very roughly, a measurement is a representation of relations between certain features of the world in terms of relations between a set of abstract entities. The set of abstract entities is known as the measurement scale. Depending on the measurement scale that is used, a measurement may be classificatory, comparative, or quantitative. The classification of things (states of affairs) in the world in equivalence classes is a measurement on a nominal scale. Suppose that we want to classify persons morally in two types, A and B, and that we have an objective method at our disposal to do so. Then, the classification of a person as of moral type A or B is a measurement on the nominal scale with two measurement values, labeled types A and B (which are labels for two equivalence classes, each class containing all persons who are morally on the same footing). By introducing an ordering on the measurement values of a nominal scale, we get an ordinal scale with regard to which it is possible to perform comparative measurements. In that case, it makes sense to say that a person of type A is morally better (worse) than a person of type B. Suppose that we have at our disposal “moral scales” for comparing the moral goodness of persons, then just as in the case of scales for measuring weights, we would be able to perform a measurement in order to establish which person is morally better (or whether persons are morally on the same footing). If it would also be possible to establish through some kind of measurement how much some person is morally better than another, we are entering the domain of quantitative scales (interval and ratio scales). What is important to note is that on our broad notion of measurement, the idea of measuring moral value does not necessarily imply that such a measurement will result in a quantitative value. 
The claim that design option A is morally better than design option B may, for instance, amount to an objective comparative measurement of the moral goodness of these design options on just an ordinal scale.
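The difference between a nominal and an ordinal scale can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the names `MoralType`, `MoralRank`, `same_class`, and `compare` are hypothetical, and the ordering in which type A ranks above type B is stipulated for the example, not derived from anything.

```python
from enum import Enum, IntEnum


class MoralType(Enum):
    """Nominal scale: two labels for equivalence classes, with no ordering."""
    A = "A"
    B = "B"


class MoralRank(IntEnum):
    """Ordinal scale: the same labels plus an ordering.

    Assumption for illustration only: type A ranks above type B.
    The integers encode order; their differences carry no meaning.
    """
    B = 1
    A = 2


def same_class(x: MoralType, y: MoralType) -> bool:
    # On a nominal scale the only meaningful question is
    # whether two items fall in the same equivalence class.
    return x is y


def compare(x: MoralRank, y: MoralRank) -> str:
    # An ordinal scale supports "better", "worse", and "equal",
    # but not "how much better".
    if x > y:
        return "better"
    if x < y:
        return "worse"
    return "equal"
```

With the nominal scale, an expression such as `MoralType.A < MoralType.B` is rejected with a `TypeError`, which mirrors the point that a classification alone licenses no comparative claim. A quantitative (interval or ratio) scale would additionally require that differences or ratios between measurement values be meaningful, which this sketch deliberately does not provide.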

Third, the notion of objectivity in relation to values and measurements stands in need of further clarification. In metaethics there is a long-standing discussion about whether values are real or objective in an ontological sense, that is, whether values are part of the ontological structure of the world. If they are, they are usually taken to be mind independent; if values are real or objective, they are part of the ontological structure of the world independently of the existence of human beings. In that case, values are in Searle’s terminology ontologically objective as opposed to ontologically subjective features whose existence is mind dependent (such as the State of France or screwdrivers) (Searle 1995). In this chapter we are not going to address issues about the ontological status of values, nor are we going to make any particular assumptions about their ontological status. We are interested in the question whether knowledge (judgments) about values may be objective and whether measurements of values may form the basis for making objective judgments about values. In Searle’s terminology, again, we are interested whether judgments about values may be epistemologically objective, which means that “the facts in the world that make them true or false are independent of anybody’s attitudes or feelings about them” (Searle 1995, p. 8). It may be argued that when values are ontologically objective, they are also epistemologically objective – this depends on how the relation between ontology and epistemology is construed – but the reverse appears not to hold. Searle has argued convincingly that objective knowledge of ontologically subjective features of the world is possible: it is, for instance, an objective fact that the State of France exists and that a particular thing is a screwdriver, in spite of the fact that both features of the world are ontologically mind dependent. 
So, the idea of the epistemological objectivity of moral values is compatible with the idea of their ontological subjectivity. Similarly, it may be argued that the idea that knowledge of moral values is epistemologically subjective is compatible with the idea that moral values are ontologically objective – again this depends on how the relation between ontology and epistemology is construed. Thus, whatever conclusion we may reach with regard to the epistemological objectivity or subjectivity of moral values, it does not commit us to a particular view with regard to the ontological objectivity or subjectivity of moral values.

Fourth, we have to clarify what we mean by the notion of an objective measurement. Intuitively, a measurement is considered to be epistemologically objective if it tells us something about the object on which the measurement is performed, that is, if its outcome is determined only by features of that object, where we leave open whether these features are taken to be ontologically subjective or objective. This intuitive idea and Searle’s notion of epistemological objectivity suggest the following necessary condition for an objective measurement: if a measurement is objective, then its outcome does not depend on particular features of the person, the subject, who performs the measurement, such as her or his preferences, points of view, or attitudes. Thus, the outcome of an objective measurement is intersubjectively valid. A measurement of which one of two objects is heavier with the help of scales satisfies this condition; the outcome does not depend on who performs the measurement and is the same for every subject. This is not the case when the measurement is done by comparing the weights of the objects “by hand,” for then the outcome may depend on subjective features (wishes, preferences, etc.) of the person who performs the measurement. Thus measuring by hand is not an objective but a subjective measurement method, and in particular cases, it may not be possible to reach an agreement about which object is heavier by this measuring method. Note that it is not the simple fact that a person (subject) performs the measurement by hand that makes this method of measuring subjective; any measurement as an intentional act is performed by a person (subject). What makes this method subjective is the fact that subjective features of the person who performs the measurement may influence its outcome.

This intersubjectivity condition, however, is not strong enough. In order to see why, note that if a measurement satisfies the above condition, this does not imply that that measurement is objective in the above sense. For instance, a systematic error may occur in measurements with the scales due to an error in their construction. Nevertheless, measurements with such scales satisfy the intersubjectivity condition: the outcomes do not depend on subjective features of the person who performs the experiments. What is measured, however, is not some feature of the objects whose weight is being compared but some feature of the system consisting of these objects and the measurement equipment.4 In order to ensure that a measurement reveals only features about the object of the measurement, we also have to require that the measurement outcome is not influenced by features of the measuring device. So for a measurement to be strictly objective, it is necessary that it is transparent (in the sense of not containing any traces) not only with regard to features of the person who performs the measurement but also with regard to features of the measurement equipment.5 This means that the measurement of epistemologically objective features also does not depend on the kind of measurement equipment involved (for instance, temperature may be measured with a mercury thermometer or a thermocouple). Even a measurement that satisfies this stronger condition is not always a good measurement, for, as we will see later on in more detail, there may be problems about its validity.
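The two necessary conditions discussed above can be illustrated with a toy model. The sketch below is an assumption-laden simplification, not a model of any real instrument: it treats a reading as the object's feature plus a device error plus, where the method allows it, an observer bias. All names (`Instrument`, `reading`, `is_intersubjective`, `is_instrument_independent`) and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instrument:
    name: str
    systematic_offset: float  # error built into the device (0.0 = well constructed)
    observer_sensitive: bool  # True if the observer can sway the outcome


def reading(true_value: float, instrument: Instrument, observer_bias: float) -> float:
    """Toy model: outcome = object feature + device error (+ observer bias if allowed)."""
    bias = observer_bias if instrument.observer_sensitive else 0.0
    return true_value + instrument.systematic_offset + bias


def is_intersubjective(true_value: float, instrument: Instrument, observer_biases) -> bool:
    # First condition: the outcome is the same no matter who measures.
    outcomes = {reading(true_value, instrument, b) for b in observer_biases}
    return len(outcomes) == 1


def is_instrument_independent(true_value: float, instruments) -> bool:
    # Second condition: the outcome is the same no matter which device is used.
    outcomes = {reading(true_value, i, 0.0) for i in instruments}
    return len(outcomes) == 1


# Hypothetical measurement methods and observers.
by_hand = Instrument("by hand", 0.0, True)
good_scales = Instrument("scales", 0.0, False)
faulty_scales = Instrument("faulty scales", 0.5, False)
observers = [-0.2, 0.0, 0.3]
```

In this model, weighing by hand fails the intersubjectivity check, while the faulty scales pass it: every observer obtains the same (wrong) outcome. Only the comparison across devices, `is_instrument_independent(10.0, [good_scales, faulty_scales])`, exposes the systematic error, which is why the intersubjectivity condition alone is not strong enough.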

Let us see how our analysis of the notion of (strictly) objective measurement would work out for measurements of moral values. Suppose person X has to decide whether person Y is a morally good or bad person and, on the basis of the observation of Y’s features and behavior (to be interpreted in the broadest sense), concludes that Y is a morally good person. On our account of measurement, X performs a measurement of the moral value of Y on a nominal scale using himself or herself as the measuring device. What person X as a measuring device does is to represent all the information about Y’s features and behavior on a measuring scale with two values, good and bad. Whatever the particular details of how person X as a measuring device does this, if the outcome of the measurement and the measurement method is objective, then the outcome is not influenced by X’s preferences, points of view, attitudes, etc. In that case, anybody making the same observations of Y’s features and behavior and making use of the same measurement device (i.e., the same way of representing information about features and behavior of a person on the measurement scale as used by X) will come to the same measurement result. As in the case of the weighing scales, this does not exclude the possibility that the measurement method itself does influence the outcome. X’s measurement device may lead to systematic errors in the outcome.

So far we have been discussing necessary conditions for (strictly) objective measurements. We will leave it an open matter whether these conditions may also be considered to be sufficient. Suffice it here to remark the following. Suppose that there is intersubjective agreement (consensus) about person X being morally better than person Y (even Y agrees!). Does this mean that it is an epistemologically objective feature of X that (s)he is morally better than Y? That depends on the nature of this consensus. If it is the result of everybody applying the same measurement method, along the lines sketched above, then this case looks similar to the one in which scales are used to reach consensus about which one of two objects is heavier. It is generally assumed that in the latter case, given the absence of systematic errors, we are dealing with an objective measurement and with epistemologically objective features. Since there seem to be no significant differences between the two cases, we do not see any reason to question the epistemological objectivity of the moral features and of the objectivity of comparatively measuring moral goodness. This conclusion, however, is based on a big “if,” namely, that the consensus about the moral evaluation is based on the use of a common measurement method.

As the quotation from Plato suggests, we do not have at our disposal a common measurement method for moral goodness. So if de facto there is consensus about X being morally better than Y, then this consensus is not forced, so to speak, by the use of a common measurement procedure. Everyone performs his or her own measurement, often without a clear insight into the details of their measurement method, but the outcome of all measurements is nevertheless the same. In that case the inference from consensus (intersubjective agreement) to epistemological objectivity becomes much more problematic. The main reason for this is the relation between the meaning of the notion of moral goodness and the way it is measured or operationalized. Suppose that the consensus is the result of the use of various measurement procedures for moral goodness. Schematically, two different situations may be distinguished. In the first situation, all these measurement procedures correspond to different operationalizations of the same notion of moral goodness, similarly to the different operationalizations and measurement methods of, for instance, the notion of temperature in physics. In the second situation, different notions of moral goodness are at play, each with their own operationalization and corresponding measurement method. In the first case, there is no reason to doubt the inference from intersubjectivity to epistemological objectivity. But of course it will be necessary to underpin the claim that the intersubjectivity with regard to some particular moral issue is indeed grounded in different ways of operationalizing and measuring the same concept. How is that to be done? As we will see shortly for the concept of temperature, physics offers detailed conceptual frameworks and theories about what temperature is and how it may be measured to support the claim that the various measurement methods for temperature are all operationalizations of the same concept. 
Nothing that comes near to this exists in the field of ethics and with regard to the concept of moral goodness. That is one of the main reasons why in this domain it is so difficult to make a strong case for the inference from intersubjectivity to epistemological objectivity. In the second case, any inference from intersubjectivity to epistemological objectivity appears to be out of the question, since the concept of moral goodness has various meanings and various things are being measured depending on the meaning attached to the notion of moral goodness. What on the face of it seems to be consensus about a moral judgment (“person X is morally better than person Y”) is then on closer inspection not a consensus at all, since there is no agreement about the meaning of this judgment.

A final preliminary remark concerns issues about aggregation of values and value (in)commensurability. Consider a situation in which the moral goodness of a person is operationalized in such a way that it is measured in terms of different criteria (such as honesty, justice, and altruism). Then the question arises how evaluations of a person against these various criteria separately may be aggregated into an overall evaluation of that person’s moral goodness. In general, such an aggregation is necessary in order to be able to compare the moral goodness of different persons. This aggregation problem involves issues about whether different values may be compared to each other or not. In our discussion of measuring values, we will run into issues about aggregation of values and value (in)commensurability, and although we recognize the relevance and importance of these issues, we will not discuss them in any depth (but see chapter “Conflicting Values in Design for Values” in this handbook).
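A minimal sketch may help to show why aggregation is not a neutral step. The Python fragment below uses a weighted sum, which is only one possible aggregation rule and which already presupposes that the criteria are commensurable and scored on a quantitative scale. The function `aggregate`, the scores, and the weights are all hypothetical values chosen for illustration.

```python
def aggregate(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted sum of criterion scores.

    The weights are not measured; choosing them is itself a value judgment.
    """
    return sum(weights[criterion] * scores[criterion] for criterion in scores)


# Hypothetical scores on a 0-10 scale for two persons.
person_x = {"honesty": 9, "justice": 4, "altruism": 5}
person_y = {"honesty": 5, "justice": 8, "altruism": 6}

# Two hypothetical weightings of the same three criteria.
honesty_first = {"honesty": 0.6, "justice": 0.2, "altruism": 0.2}
justice_first = {"honesty": 0.2, "justice": 0.6, "altruism": 0.2}

# Compare the two persons under each weighting.
x_better_if_honesty_first = aggregate(person_x, honesty_first) > aggregate(person_y, honesty_first)
x_better_if_justice_first = aggregate(person_x, justice_first) > aggregate(person_y, justice_first)
```

With these numbers, person X comes out ahead under the first weighting and person Y under the second, although the scores on the individual criteria are identical in both cases. The overall ranking thus depends on the choice of weights as well as on the criterion scores.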

Definition and Measurement of Temperature

It has taken physicists and engineers centuries to develop a clear notion of temperature, of its unit and scale, and methods of how to measure it quantitatively. Moreover, it seems that this development has not yet reached an endpoint and is still on its way; for instance, the latest revision of the definition of the International Temperature Scale dates back to 1990 (see below). Without doing any justice to the complex history of the notion of temperature and to the complexity of the modern notion itself, the following remarks suffice for our purposes (for more details, see Chang (2004)).

From a phenomenological point of view, the notion of temperature has always been associated with the distinction between warm and cold and with the notion of heat and has been taken to be some kind of measure of the hotness or coldness of things. Within physics, it was only after a clear distinction between intensity of heat and quantity of heat was made and the idea of heat as a form of fluid (“caloric”) was given up that the modern notion of temperature established itself during the nineteenth century. Before that time various reliable ways to measure temperature and various temperature scales and units (Celsius and Fahrenheit) had already been introduced. According to most definitions of temperature to be found in present day introductory physics textbooks, the temperature of an object is a measure of the disorderly (random) motion of the particles of which it is made up, more in particular of their mean kinetic energy. For an ideal gas, the temperature is defined more precisely as a measure proportional to the mean translational kinetic energy of its particles.
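
For reference, the textbook relation behind this definition can be written out for an ideal gas. The symbols $k_{B}$ (Boltzmann constant), $m$ (particle mass), and $v$ (particle speed) do not appear in the text above and are introduced here only for illustration:

$$ \left\langle \tfrac{1}{2} m {v}^{2} \right\rangle = \tfrac{3}{2} {k}_{B} T $$

On this definition, doubling the absolute temperature of an ideal gas doubles the mean translational kinetic energy per particle.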

Physics, however, has much more to say about the notion of temperature than there is to be found in introductory textbooks. Actually, various notions of temperature are in use in physics. The interpretation of temperature in terms of mean kinetic energy works fine at the macroscopic level for particular kinds of systems (composed of atoms and/or molecules). But physicists have developed notions of temperature for other kinds of systems, including systems consisting of photons (electromagnetic radiation; they speak about the temperature of “blackbody radiation”) and spins. Apart from that, thermodynamics offers the following definition of absolute thermodynamic temperature in terms of energy and entropy:
$$ T=\frac{d{q}_{\mathrm{rev}}}{ dS} $$
which is independent of the particular physical makeup of the system under consideration. This notion of temperature is only well defined for systems that exchange energy with their environment in a reversible way and thus are in equilibrium with their environment. Just as in the case of an ideal gas, absolute thermodynamic temperature is defined with reference to an ideal kind of system since in practice heat exchange is not reversible.
All in all, the situation with regard to the notion of temperature in physics is that it is defined in different ways in different theoretical frameworks, but it may be shown, theoretically as well as empirically, that these various notions of temperature hang together and because of this it is assumed that they all refer to one and the same physical quantity. This is reflected in the fact that measurements of all of these various theoretical notions of temperature share a common temperature scale, namely, the International Temperature Scale of 1990 (T90) referred to above. This scale is defined in the following way6:

Between 0.65 K and 5.0 K T90 is defined in terms of the vapour-pressure temperature relations of ³He and ⁴He.

Between 3.0 K and the triple point of neon (24.5561 K) T90 is defined by means of a helium gas thermometer calibrated at three experimentally realizable temperatures having assigned numerical values (defining fixed points) and using specified interpolation procedures.

Between the triple point of equilibrium hydrogen (13.8033 K) and the freezing point of silver (961.78 °C) T90 is defined by means of platinum resistance thermometers calibrated at specified sets of defining fixed points and using specified interpolation procedures.

Above the freezing point of silver (961.78 °C) T90 is defined in terms of a defining fixed point and the Planck radiation law.

The unit of this temperature scale, the kelvin, is defined as the fraction 1/273.16 of the temperature of the triple point of water.7 This does not concern ordinary water; for measurement purposes a standardized form of water with a specific isotopic composition is used known as Vienna Standard Mean Ocean Water.8

The technical details of the definition of the temperature scale and its unit do not concern us. They are presented here because they illustrate an important point, namely, that not one measurement procedure is used for defining the whole temperature scale in one stroke; apparently it is considered better to refer to different measurement methods in different regions of the temperature scale. This should not come as a surprise. It is no use trying to measure temperatures in the region of 10,000 K with a mercury thermometer. In general, when we want to measure the temperature of something, it is necessary not only to specify the relevant temperature range but also other physical characteristics of that something. For instance, even if the temperature of a tiny drop of water lies within a temperature range of a mercury thermometer, it does not make sense to use that measurement method, because, as we will see shortly, it will not lead to valid measurements. Thus, specification of the conditions under which a physical quantity such as temperature is to be measured (the kinds of system and the temperature range involved) is an important step in making that quantity measurable.
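
The point that different measurement methods define different regions of the scale can be made concrete with a small lookup. The following is only a sketch based on the four ranges quoted above: the names `ITS90_RANGES` and `defining_methods` are our own illustrative choices, the range bounds are treated as inclusive for simplicity, and the calibration and interpolation procedures that the scale actually prescribes are not modeled.

```python
# Sketch of the ITS-90 ranges quoted above; names are illustrative, not part of ITS-90.
SILVER_FREEZING_POINT_K = 961.78 + 273.15  # 961.78 degrees Celsius expressed in kelvin

# Each entry: (lower bound in K, upper bound in K or None if unbounded, defining method)
ITS90_RANGES = [
    (0.65, 5.0, "helium vapour-pressure relations"),
    (3.0, 24.5561, "helium gas thermometer"),
    (13.8033, SILVER_FREEZING_POINT_K, "platinum resistance thermometer"),
    (SILVER_FREEZING_POINT_K, None, "Planck radiation law"),
]


def defining_methods(temperature_k):
    """Return the defining methods whose quoted range contains temperature_k.

    Bounds are treated as inclusive (a simplifying assumption). Overlapping
    ranges can yield more than one method; an empty list means the value lies
    below the lowest quoted range.
    """
    methods = []
    for lower, upper, method in ITS90_RANGES:
        if temperature_k >= lower and (upper is None or temperature_k <= upper):
            methods.append(method)
    return methods


if __name__ == "__main__":
    for t in (0.1, 4.0, 300.0, 10000.0):
        print(t, defining_methods(t))
```

Note that the quoted ranges overlap, so a temperature of 4 K falls under two definitions at once, while 10,000 K is covered only by the radiation-law definition.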

The definition of the temperature scale and unit illustrates yet another point. It does not define the meaning of the notion of temperature, that is, it does not say what kind of physical quantity is being measured or what the kelvin is a unit of. What is defined is how a certain quantity called “temperature” is to be measured, by specifying for each region of the temperature scale a specific measurement procedure against which other measurement methods that may be used in that region are to be calibrated. If these other measurement methods lead to results which are coherent with the definition of the standard in that region, then they are supposed to measure the same physical quantity. There is no reference to notions like mean kinetic energy or other theoretical notions that play a role in the various theoretical definitions of temperature.

The foregoing does not mean that theory plays no role in measuring temperature. On the contrary, these definitions are the outcome of numerous developments in theoretical and experimental physics, and the idea that these various methods all measure the same physical quantity is anchored deeply in theoretical as well as empirical considerations. According to Chang (2004, p. 212 ff), the modern accurate methods for measuring temperature are the outcome of a long process of successful convergence of iterative attempts to improve our theoretical conceptions of and measurement methods for temperature. This is not to be interpreted as a convergence toward the measurement of the “true” value of the absolute thermodynamic temperature of a physical system. For Chang (2004, p. 207) an unoperationalized abstract concept like absolute thermodynamic temperature “does not correspond to anything definite in the realm of physical operations, which is where values of physical quantities belong.” The true or real value of the temperature of a physical system is defined by the measurement method that is the outcome of successful convergence of repeated attempts to operationalize the abstract notion of temperature; in other words, the real value is constituted by a successful operationalization (Chang 2004, p. 217).

So in general the problem of measuring a physical quantity is a problem of how to connect abstract (theoretical) concepts of this quantity to real physical systems and operations on these systems, that is, how to make “contact between thinking and doing” (Chang 2004, p. 197). Abstract physical concepts have, so to speak, no grip on real physical systems; as we will see below, the same applies to abstract moral values and technical designs. As an alternative to the widespread idea that the operationalization of abstract concepts may be achieved through correspondence rules that directly connect these concepts to empirical terms, Chang (2004, pp. 206–207) offers a two-step view on how operationalization of abstract concepts may be achieved (see Fig. 1). The first step consists in finding a concrete image for a system of abstract concepts. A concrete image is not a real physical system but an “imagined system, a conception, consisting of conceivable physical entities and operations, but not actual ones” (Chang 2004, p. 206). The conception of an ideal gas is an example of a concrete image; it is an imagined system whose behavior is governed by an abstract system (the concepts and laws of thermodynamics). The second step consists in finding a match between the imagined system and actual, real physical systems. If no exact match is possible but real physical systems may be configured such that they approach the imagined system, then the concrete image may be characterized as an idealized system. According to Chang the notion of a valid operationalization has to be interpreted in terms of what he calls a “good correspondence” between the abstract system and concrete image and between the latter and actual physical systems. Unfortunately, he leaves us more or less in the dark about the details of what constitutes a good correspondence. He claims that it is not a one-dimensional notion and that any assessment of the validity of an operationalization calls for a complex judgment.
Fig. 1 Chang’s two-step view of operationalization

Finally, the history of the notion of temperature in physics shows that reliable ways of measuring a physical quantity may exist in spite of the fact that the nature of that physical quantity, that is, the meaning of the corresponding notion, may still be under dispute. It shows that, historically, measurement procedures are not always the end result of specifying and operationalizing a well-defined notion. On the contrary, as the history of the notion of temperature shows, reliable measurement methods often play a key role in arriving at consensus about the nature of what is being measured. It may be questioned whether, generally speaking, the definition of a concept always comes prior to its specification and operationalization. For example, from an operationalist’s point of view, the measurement procedure appears to be conceptually prior since the meaning of a concept is identified with its measurement procedure.

The operationalization of concepts plays a key role not only in science but also in engineering, especially in the context of drawing up design requirements. Vincenti (1990, Chapter 3) contains a detailed description of how engineers in the early days of powered flight wrestled with and finally solved the problem of defining and operationalizing the notion of “flying quality” of airplanes. He describes how they set up a research program to deal with issues about what it means for an airplane to have good flying qualities and how those might be measured. Finally, they were successful and were able to specify measurable (testable) design requirements for flying qualities for airplanes. When attempts to operationalize concepts are successful, it is possible to settle disputes about those concepts by an appeal to measurement (testing).

Before we turn to a discussion of the measurement of moral values, it will be necessary to pause for a moment on the question of what makes a measurement a “good” measurement.

A “Good” Measurement

When is a measurement method a good way of measuring something, that is, when may we consider the outcome of measuring something to be a good measurement, assuming that the measurement method has been applied correctly? If we take the notion of measurement in our broad sense, then this is a crucial question for any attempt to settle whatever issue in an empirical way, since then even the categorization of some observable event or object into a particular class involves a measurement. If an appeal to measurement is made in the context of settling a disagreement, then a “good measurement” must satisfy certain necessary conditions in order to assure that the measurement will make it possible to settle the dispute in an unambiguous way, that is, to “force” consensus among rational disputants. Ideally, these necessary conditions together are sufficient and then define the notion of a good measurement. At least the following three questions appear to be important with regard to the notion of a good measurement in the context of a dispute:
  1. Validity: Does what is measured correspond to what one intended to measure in order to settle the dispute?

  2. Reproducibility: Is the outcome of the measurement independent of the person who performs it?

  3. Accuracy: Is the outcome of the measurement accurate, that is, to what extent does it correspond to the “real” state of affairs in the world (to the “real” value in the case of quantitative measurements)?9


If the answer to one of these questions is negative or under dispute, then clearly an appeal to measurement will not settle the issue, because the (outcome of the) measurement itself may become the object of dispute. In general, any good measurement has to be valid, reproducible, and accurate. We will briefly discuss each of these features and illustrate their relevance with the help of measurement methods for temperature.

A measurement method is called valid if it measures what it is supposed or claimed to measure. Various notions of validity are in use (especially in the social sciences). Here we concentrate on what is often called construct validity of a measurement method: a measurement method is construct valid if it measures the theoretical (abstract) notion it is intended to measure. For instance, if temperature is theoretically defined in terms of the mean kinetic energy of particles, then the measurement of the temperature of a cup of tea with a mercury thermometer is a valid measurement method (on the assumption that the measurement procedure is correctly executed). The same is true when a thermocouple is used. What is measured is the intended physical quantity, namely, the temperature of the cup of tea. Construct validity presupposes that there is a “theoretical network that surrounds the concept” and is “concerned with the extent to which a particular measure relates to other measures consistent with theoretically derived hypotheses concerning the concepts (or constructs) that are being measured” (Carmines and Zeller 1979, p. 23). As we observed in the previous section, this is indeed the case for the physical concept of temperature. It is embedded in a whole network of theories which explain why measurements of temperature based on the expansion of liquids or on the generation of a voltage difference over a boundary layer between two metals lead to consistent results in situations where both measurement methods can be applied.

To illustrate the importance of the theoretical network for assessing the construct validity of a measurement method, consider the situation in which the temperature of a cup of tea is measured by simply measuring the volume of the tea contained in the cup. This is clearly a nonvalid measurement; its outcome is a particular volume and as such has no relation to temperature at all. The theoretical network in which the notions of temperature and volume of a fluid are embedded does not allow interpreting this particular volume in terms of the temperature of the tea. However, measurement of changes in the volume of the tea may be taken as a valid measurement of changes in its temperature, since there is a theoretically and empirically grounded relation between changes in the volume of a fluid and changes in its temperature. So, changes in temperature may be measured validly by changes in volume (or changes in voltage, or changes in radiation spectrum, etc.) on condition that a suitable theoretical background is in place.
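
The relation appealed to here can be stated as a worked formula. As an idealization, assume a volumetric thermal expansion coefficient $\beta$ that is approximately constant over the temperature range considered, and let ${V}_{0}$ be the initial volume; both symbols are introduced here for illustration and do not appear in the text above:

$$ \Delta V\approx \beta {V}_{0}\Delta T\kern1em \Rightarrow \kern1em \Delta T\approx \frac{\Delta V}{\beta {V}_{0}} $$

The measured quantity is $\Delta V$; it can be read as a temperature change only because the background theory supplies $\beta$ and the form of the relation. A single volume reading ${V}_{0}$, by contrast, fixes no temperature at all.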

On top of being construct valid, a good measurement has to be reproducible: the outcome of a good measurement may not depend on specific features of the person who performs the measurement (see also the discussion in section “Some Preliminary Issues”). As we have stressed in the foregoing, reproducibility is intended to safeguard the objectivity of the measurement outcome.10 In equipment that measures temperature automatically, the person who performs the measurement is more or less eliminated (her role is reduced to switching on the measuring apparatus, or not even that; think of the thermostat measuring the temperature in a room), and reproducibility is then usually not an issue. But when temperature is measured by means of the human body (e.g., by hand), then reproducibility may become a real issue; for instance, two persons may systematically, that is, after repeated measurements, disagree about which one of two objects is warmer than the other.

Finally, there is the notion of the accuracy of a measurement method. It is usually interpreted in terms of the extent to which the measured value corresponds to the real value. For a measurement to be accurate, it is necessary that the outcome is not influenced by features of the measuring instruments (see the discussion about the transparency of the measurement equipment in section “Some Preliminary Issues”). The more accurate a measurement procedure is, the higher the chance that the outcome of a measurement is close to the real value. The notion of real value, however, has to be interpreted with care. It is not simply the value that corresponds to the objective state of affairs “out there.” It would be a mistake, as Chang has pointed out with regard to the notion of temperature, to think that the objective state of affairs includes a definite value of the absolute thermodynamic temperature for a system under consideration. Without an operationalization of that concept, the system has no absolute thermodynamic temperature. The real value, we propose, may be interpreted as the value that will be measured in the long run when the successful convergence of corrective epistemic iterations in the field of temperature physics (see Chang (2004, pp. 44–48)) has come to an end. This real value as such is not part of the objective state of affairs but is constituted by that state of affairs in combination with the operationalization procedures that will be adopted in the long run.

In the following section, we will explore the extent to which these ideas about operationalization of concepts and about good measurements taken mainly from physics may be applied when it comes to measuring moral values in the context of design for values.

Value Definition, Specification, and Operationalization: An Example

To see how values can be operationalized in design for values and whether this can result in a “good” measurement of values, we will look at an example. The example we will consider is the “design” of a new coolant for household refrigerators in the 1990s (van de Poel 2001).

Before the 1990s, CFC 12 was the commonly used coolant for household refrigerators of the vapor compression type. It had come into common use in the 1930s when Thomas Midgley invented the CFCs. After 1970, however, the use of CFCs came under increasing pressure due to their contribution to degradation of the ozone layer. In 1987, the Montreal Protocol called for a substantial reduction in the use of CFCs. International conferences following the Montreal Protocol recommended yet tougher measures, and during the 1990s many Western countries decided to ban CFCs.

In the search for alternatives to CFC 12, three values played a key role: environmental sustainability, health, and safety. Each of these values has moral significance, and the better an alternative coolant scores on each of these values, the better it is from a moral point of view. So, taken together these three values may be said to determine the “moral goodness” of different potential alternatives. The values were operationalized in a two-step process (see Fig. 2). In a first step, the values were associated with certain evaluation criteria that are more concrete and specific than the values. This step is somewhat comparable with step 1 in Fig. 1. This step associates a more concrete image with an abstract concept. Similarly, evaluation criteria can be seen as the more concrete image of abstract values.
Fig. 2 Operationalization of the moral goodness of alternative coolants

However, the evaluation criteria are not directly measurable. To make them measurable, they have to be matched with attributes that can be readily measured and for which (standard) measurement methods exist. This step is comparable with the second step in Fig. 1, in which the concrete image (evaluation criteria in our case) is matched with actual entities and processes (attributes in our case).

It is important to note that both steps in Fig. 2, from values to evaluation criteria and from evaluation criteria to attributes, involve value judgments that have to be carefully distinguished from value judgments about the moral goodness of a particular refrigerant. The latter may be called first-order (or object-level) value judgments; they are judgments about the moral goodness of a particular refrigerant (design). The former are second-order (or meta-level) value judgments; they are value judgments involved in operationalizing moral values in a specific way. The role of second-order value judgments in choosing attributes is clearly pointed out by Keeney and Gregory (2005). They state that good attributes for measuring evaluation criteria11 have to satisfy a number of conditions, among which they list comprehensiveness. An attribute (measure) is comprehensive when its levels cover all possible forms of achieving the evaluation criterion and any value judgment expressed in the attribute is reasonable. They give the following example (Keeney and Gregory 2005, p. 4):

Comprehensiveness … requires that one consider the appropriateness of value judgments embedded in attributes. Whenever an attribute involves counting, such as the number of fatalities, there is the assumption that each of the items counted is equivalent. With the number of fatal heart attacks, there is a built-in assumption that a fatal heart attack for a 45-year-old is equivalent to a fatal heart attack for a 90-year-old. Is this a reasonable value judgment for a particular decision? There is not a right or wrong answer, but it is an issue that should be considered in selecting the attribute.

According to Keeney (1992, p. 100) “the assignment of attributes to measure objectives always requires values judgments.” This is in fact also visible in the refrigerant case. One example is the use of global warming potential (GWP) as an attribute for the evaluation criterion “direct contribution to global warming.” GWP can be measured on different so-called integrated time horizons, i.e., different time spans over which the contribution of a certain substance to global warming is integrated. The choice of a specific time horizon involves a second-order value judgment, because it reflects a judgment about what are appropriate time horizons, for example, in the light of considerations of intergenerational justice that are part of the first-order value of “environmental sustainability,” of which the GWP is an operationalization.
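
How the choice of a time horizon can change a comparison may be illustrated with a toy calculation. The sketch below does not implement the actual GWP definition: it assumes, purely for illustration, that each substance has an initial warming strength that decays exponentially with a fixed lifetime, and it integrates that effect up to the chosen horizon. The two substances "X" and "Y", their numbers, and the function names are all hypothetical.

```python
import math


def integrated_impact(initial_strength, lifetime_years, horizon_years):
    """Integrate initial_strength * exp(-t / lifetime_years) from t = 0 to horizon_years.

    This exponential-decay model is a toy assumption, not the GWP definition.
    """
    decayed_fraction = 1.0 - math.exp(-horizon_years / lifetime_years)
    return initial_strength * lifetime_years * decayed_fraction


# Hypothetical substances with made-up numbers: (initial strength, lifetime in years).
# X is potent but short-lived; Y is weak but persistent.
SUBSTANCES = {"X": (10.0, 10.0), "Y": (1.0, 200.0)}


def worst_substance(horizon_years):
    """Return the name of the substance with the largest integrated impact."""
    return max(
        SUBSTANCES,
        key=lambda name: integrated_impact(*SUBSTANCES[name], horizon_years),
    )


if __name__ == "__main__":
    for horizon in (20, 100, 500):
        print(horizon, worst_substance(horizon))
```

With these made-up numbers, X comes out worse over 20 and 100 years, while Y comes out worse over 500 years. Which substance counts as "worse" therefore depends on a horizon that the measurement itself cannot supply, which is the sense in which the choice embodies a second-order value judgment.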

Second-order value judgments are also involved in the first step of Fig. 2, leading from the overall value of moral goodness to more specific values and from there to evaluation criteria. This is related to the fact that the operationalization of values in design is always context dependent. This can be clearly seen in Fig. 2: it would be absurd to claim that the attributes mentioned in Fig. 2 measure “moral goodness” in general! Rather, they are meant to measure “moral goodness” in a very specific situation, namely, the “moral goodness” of potential alternative coolants to be used in household refrigerators. The choice of which specific values and evaluation criteria should be taken into account in a specific context itself involves (second-order) value judgments.

A closer look at the various “translations” made in Fig. 2 reveals that context and second-order value judgments play a different role in the various steps involved in the operationalization of values:
  • The association of certain values (like safety, health, and environmental sustainability) with “moral goodness” is context dependent. It depends on second-order value judgments on which values are affected by a design and should be taken into account in the design process.

  • The definition (and conceptualization) of values (like safety, health, and environmental sustainability) is largely context independent as there are general definitions available (e.g., in moral philosophy or in law). However, there may be lack of consensus on how to define (and conceptualize) these values.

  • The specification of values in terms of evaluation criteria is context dependent as it depends on the specific product (or class of products) designed. Similarly the selection of certain attributes to measure an evaluation criterion is context dependent. Both steps also involve second-order value judgments.

  • Measurement methods for attributes in design will often (but not always) be context independent as for many relevant attributes, general measurement methods are available. Here value judgments and lack of consensus play a minor role.

From the foregoing we may draw the conclusion that the operationalization of moral values can never be “objective” in the sense that the operationalization can be rationally derived from the meaning of the value without intermediary second-order value judgments. Prima facie a comparison of Figs. 1 and 2 may leave the impression that moral values can be operationalized in a two-step way similar to how physical quantities like temperature are operationalized. But a closer look reveals a significant difference. Because of the role of second-order value judgments in the operationalization of values, it may always be questioned whether a particular operationalization of a value will result in a “good” measurement of that value.12 In section “A ‘Good’ Measurement,” we distinguished three considerations in judging measurements: the validity, the reproducibility, and the accuracy of a measurement. The attributes mentioned in Fig. 2 can be measured in a reproducible and accurate way.13 It is, however, far less clear that they result in a valid measurement of the values and ultimately of moral goodness. Do the attributes together indeed measure the “moral goodness” of a certain alternative refrigerant? The issue here is one of construct validity. We have seen that in the case of the measurement of temperature, construct validity is achieved (or at least enabled) by a network of theories and measurement procedures. The crucial difference between the measurement of physical quantities and the measurement of values is that in the case of the measurement of values, we lack such a network of theories to guide the choice of second-order value judgments; as a result these second-order value judgments seriously undermine the construct validity of any measurement of values.

Codes, Standards, and Value Judgments

Second-order value judgments may be indispensable in operationalizing and measuring moral values in design, but that does not mean that such value judgments need be arbitrary. One may ask whether there is anything in the case of design and moral values that may play the role that the network of theories plays in operationalizing physical concepts. One option here may be so-called technical codes and standards. Technical codes are legal requirements that are enforced by a governmental body to protect safety, health, and other relevant values (Hunter 1997). Standards are usually not legally binding, but they might be designated as a possible, and sometimes even mandatory, way to meet a code; they may also play a role in business contracts and are sometimes seen as describing good design practice, and as such they may also play a role in litigation.

Codes and standards often play a prime role in the operationalization and measurement of moral values in design. In the coolants’ case, for example, the specification of the values of safety and health in terms of flammability and toxicity, and the attributes matched to these evaluation criteria, was directly based on technical codes and standards. In this case the ANSI/ASHRAE standards 15 and 34 played a major role.14 Standard 34 (“Designation and Safety Classification of Refrigerants”) says the following:

6.1 Refrigerants shall be classified into safety groups according to the following criteria.

6.1.1 Classification. The safety classification shall consist of two alphanumeric characters (e.g., “A2” or “B1”). The capital letter indicates the toxicity as determined by Section 6.1.2; the arabic numeral denotes the flammability as determined by Section 6.1.3.

6.1.2 Toxicity Classification. Refrigerants shall be assigned to one of two classes—A or B—based on allowable exposure: Class A refrigerants have an OEL of 400 ppm or greater. Class B refrigerants have an OEL of less than 400 ppm.

6.1.3 Flammability Classification. Refrigerants shall be assigned to one of three classes (1, 2, or 3) and one optional subclass (2L) based on lower flammability limit testing, heat of combustion, and the optional burning velocity measurement. (ASHRAE 2013b, p. 14)

ASHRAE standard 34, then, results in six safety classes for refrigerants as indicated in Table 1:
Table 1 Refrigerant safety group classification (Based on ASHRAE 2013b, Fig. 6.1.4)

                        Lower toxicity    Higher toxicity
No flame propagation    A1                B1
Lower flammability      A2                B2
Higher flammability     A3                B3

Standard 15 (Safety Standard for Refrigeration Systems) prescribes which of these safety classes are allowed, and in what maximum amounts, for different kinds of refrigerating applications. The initial versions of standard 15 allowed unlimited use of A1 refrigerants in household refrigerators, forbade the use of A3 and B3 refrigerants, and set limits to all other categories, in this way guaranteeing a certain level of safety and health protection (ASHRAE 1994).
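
The classification logic quoted from standard 34, together with the early standard 15 rule summarized above, can be sketched in a few lines. This is an illustration under stated assumptions, not an implementation of either standard: the flammability class is taken as an input because the text does not give the numerical test criteria, the optional subclass 2L is ignored, and the function names are our own.

```python
OEL_THRESHOLD_PPM = 400  # threshold quoted in Section 6.1.2 above


def toxicity_class(oel_ppm):
    """Class A if the OEL is 400 ppm or greater, otherwise class B."""
    return "A" if oel_ppm >= OEL_THRESHOLD_PPM else "B"


def safety_group(oel_ppm, flammability_class):
    """Combine the toxicity letter and the flammability numeral, e.g. 'A2'.

    flammability_class (1, 2, or 3) is assumed to have been determined
    elsewhere by the tests the standard prescribes.
    """
    if flammability_class not in (1, 2, 3):
        raise ValueError("flammability_class must be 1, 2, or 3")
    return f"{toxicity_class(oel_ppm)}{flammability_class}"


def household_refrigerator_use(group):
    """Early standard 15 rule as summarized in the text (amount limits not modeled)."""
    if group == "A1":
        return "unlimited"
    if group in ("A3", "B3"):
        return "forbidden"
    return "limited"


if __name__ == "__main__":
    # Hypothetical refrigerant: OEL of 1000 ppm, flammability class 3.
    group = safety_group(1000, 3)
    print(group, household_refrigerator_use(group))
```

Once the threshold of 400 ppm and the class boundaries are fixed, classification is mechanical; the value judgments are located in the choice of those thresholds and of the permitted uses per class.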

It should be noted that there are basically two kinds of standards: (1) standards for measurement and classification, like ANSI/ASHRAE standard 34 and, for example, the European Standard IEC 60079-20-1, and (2) standards setting (minimal) design requirements (or certain performances), like ANSI/ASHRAE standard 15 and the European Standards IEC 60335-1 and IEC 60335-2-24. The former are especially important for the operationalization and measurement of values in design, while the latter are intended to guarantee that all designs at least meet relevant values to a minimal degree.

Standards often associate certain evaluation criteria and attributes with values like safety (or health or environmental sustainability). They also may contain measurement procedures or criteria for reproducibility (sometimes by reference to other standards). For example, European Standard IEC 60079-20-1:2010 contains the following description of the measurement method for autoignition temperature, an attribute that is relevant for the flammability of a coolant:

A known volume of the product to be tested is injected into a heated open 200 ml Erlenmeyer flask containing air. The contents of the flask are observed in a darkened room until ignition occurs. The test is repeated with different flask temperatures and different sample volumes. The lowest flask temperature at which ignition occurs is taken to be the auto-ignition temperature of the product in air at atmospheric pressure. (International Electrotechnical Commission 2010a, p. 14)

The same standard also contains the following criteria for reproducibility (and repeatability):

7.5.1 Repeatability

Results of repeated tests obtained by the same operator and fixture shall be considered suspect if they differ by more than 2 %.

7.5.2 Reproducibility

The averages of results obtained in different laboratories shall be considered suspect if they differ by more than 5 %. (International Electrotechnical Commission 2010a, p. 17)
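
A minimal Python sketch of these two checks is given below. The quoted clauses do not say relative to what the percentage difference is computed; the sketch assumes the spread (maximum minus minimum) relative to the mean, and that choice, like the function names and the sample values, is an assumption.

```python
from statistics import mean


def relative_spread(values):
    """(max - min) / mean; an assumed reading of 'differ by more than x %'."""
    return (max(values) - min(values)) / mean(values)


def repeatability_suspect(results, tolerance=0.02):
    """Repeated results from one operator and fixture (clause 7.5.1)."""
    return relative_spread(results) > tolerance


def reproducibility_suspect(results_by_lab, tolerance=0.05):
    """Averages of results from different laboratories (clause 7.5.2)."""
    lab_means = [mean(results) for results in results_by_lab.values()]
    return relative_spread(lab_means) > tolerance


# Made-up temperatures in degrees Celsius, for illustration only.
print(repeatability_suspect([400.0, 404.0]))  # False
print(repeatability_suspect([400.0, 420.0]))  # True
print(reproducibility_suspect({"lab_a": [400.0, 404.0], "lab_b": [410.0, 414.0]}))  # False
```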

Even if standards may contain very detailed prescriptions and measurement methods for values, they eventually also rely on value judgments. This becomes quite clear if one looks at the process of standard formulation (and revision). Standards are usually formulated by engineers sitting on standardization committees. The large standardization organizations like ANSI (American National Standards Institute), ISO (International Organization for Standardization), and CEN (European Committee for Standardization) all have procedural safeguards that try to ensure that stakeholders are heard in standard setting and that the resulting standards are based on a certain degree of consensus (or at least a majority). ANSI, for example, has requirements to guarantee openness, transparency, balance of interests, and due process, and standards require consensus. Consensus is defined by ANSI as:

substantial agreement reached by directly and materially affected interest categories. This signifies the concurrence of more than a simple majority, but not necessarily unanimity. Consensus requires that all views and objections be considered, and that an effort be made toward their resolution. (ASHRAE 2013a)

This is a clear recognition that the process of standard formulation involves value judgments, about which different people (stakeholders) may reasonably disagree. Nevertheless, standardization may be seen as a process in which a certain social consensus is achieved about how to operationalize and measure specific values in the design of specific product classes. If the achievement of this consensus meets certain (procedural) constraints, it might even be the case that a justified consensus is achieved.15 However, the sheer existence of technical codes and standards should not be seen as proof that such a justified consensus exists. The coolants’ case is an interesting example that shows why such an assumption is problematic.

As we have seen, ANSI/ASHRAE standard 15 initially forbade the use of flammable coolants. Table 2 lists a number of alternatives to CFC 12 that were considered. Of these, only HFC134a and HFC152a met the requirements of standard 15; of these, HFC134a quickly became the preferred coolant of the refrigerator industry because HFC152a was moderately flammable. However, the choice for HFC134a was heavily opposed by some environmental groups like Greenpeace, who preferred alternatives with a lower GWP. Eventually this led, at least in Europe, to a choice for other coolants like propane and isobutane.

Table 2 Properties of refrigerants (a). Columns: environmental sustainability, toxicity class, flammability class. Rows: CFC 12, HFC 134a, HFC 152a, HC 290 (propane), HC 600a (isobutane).

(a) OWP and GWP are based on Solomon et al. (2007)

Interestingly, the choice for flammable coolants was accompanied by a change in the relevant codes and standards and by another operationalization of safety. What happened was that safety was at first specified in terms of the evaluation criterion “flammability (of the coolant).” This came to be replaced by the evaluation criterion “explosion risk (of the refrigerator).” So the prescriptive European Standard EN-IEC 60335-2-24 in its 2010 version now contains the following prescription:

22.107 Compression-type appliances with a protected cooling system and which use flammable refrigerants shall be constructed to avoid any fire or explosion hazard, in the event of leakage of the refrigerant from the cooling system.

Compliance is checked by inspection and by the tests of 22.107.1, 22.107.2 and if necessary, 22.107.3. (International Electrotechnical Commission 2010b, pp. 31–32)

The reformulated standard 15 of ASHRAE in its 2013 version also leaves open the possibility for using flammable coolants, but it does not (yet) provide a new operationalization of safety in terms of explosion risk. Instead it says:

Group A3 and B3 refrigerants shall not be used except where approved by the AHJ. [AHJ = authority having jurisdiction]. (ASHRAE 2013a, p. 9)

If we want to understand why the operationalization of safety in terms of flammability suddenly became contested in the 1990s after it had been taken for granted since at least the 1930s, two contextual factors are of prime importance. One is the growing emphasis on environmental sustainability as an important value (see, e.g., Calm 2008). As we have seen, flammable coolants scored well on this criterion, which raised the question whether a refrigerator with flammable coolant could nevertheless be safe (which is obviously an issue of construct validity). The other has to do with the design of refrigerators. In the 1930s, when the CFCs were introduced, a household refrigerator could contain more than a kilogram of coolant and leakages were not uncommon; at that time explosion risks and toxicity were a serious issue. By the 1990s, after 60 years of design improvement, a typical household refrigerator contained 10 to 100 times less refrigerant, and leakages were much less common. These changes in technical design, in fact, opened the way to another operationalization of safety in terms of explosion risk of the refrigerator rather than flammability of the coolant.

What this story underlines is that changes in context may undermine the construct validity of a certain operationalization and measurement of a value in design. This is largely due to the fact that, as we have seen, the operationalization of values in design is very context dependent. This is different from the case where we measure physical quantities. Of course, it is conceivable that new insights in physics may change the operationalization and measurement of temperature, but the operationalization and measurement of temperature appears to be much more robust against such changes in physics than the operationalization and measurement of values against the occurrence of contextual changes in engineering.

Conclusion and Discussion

The first conclusion to be drawn from our analysis is that there is a strong analogy between operationalizing physical quantities and moral values in the sense that abstract notions first have to be made more concrete by interpreting them in a specific setting or context: physical quantities in terms of a specific kind of physical system (concrete images) and moral values in terms of evaluation criteria for a specific design. Once that has been done, the interpreted physical quantities and the morally relevant evaluation criteria may be operationalized. In both cases the operationalization thus proceeds in two distinct steps.

But there are also crucial differences. Certain conditions that enable the operationalization of physical concepts in objective measurement procedures are not fulfilled when it comes to the operationalization of values. The most significant difference concerns the embedding of physical and moral concepts in detailed theoretical (abstract) frameworks. Physical concepts are embedded in networks of well-tested theories and operational procedures which make it possible not only to relate various interpretations of a physical concept to each other but also to relate one physical concept to other physical concepts. At present, something similar is lacking with regard to moral values. As a result, issues about construct validity play no major role in modern physics; the convergence and coherence of theoretical and empirical developments usually make it possible to settle disagreements about the construct validity of a particular physical measurement procedure. Because of the absence of such a network of detailed theoretical frameworks and measurement procedures when dealing with values, issues about whether a particular measurement procedure captures the attribute one intends to measure, that is, issues about construct validity, are much more difficult to resolve. More specifically, we have seen that second-order value judgments play a crucial role in the operationalization of values and that these value judgments seriously undermine any claim that values may be measured in an objective way.

Not only are controversies about construct validity scarce in physics but also controversies about what is called content validity. Content validity of a measuring procedure is related to the “adequacy with which the content [of a concept] has been cast in the form of test-items” (Carmines and Zeller 1979, p. 22). Because of the intricate network of theories and measurement procedures in which the notion of temperature is embedded, it is clear that the various notions of temperature employed all hang together and that we are dealing here with a “monolithic” notion that can be measured on one temperature scale and that measurement on this temperature scale fully exhausts the content of the notion of temperature. In contrast, the conceptual resources for arguing that a particular specification of a moral value of a design is content valid appear to be missing. If moral goodness of a design is not a monolithic notion, that is, if its meaning is, so to speak, spanned by various attributes (dimensions), each attribute corresponding to a different aspect relevant to the moral goodness of a design and being measured on a different scale, then how can one ascertain whether all relevant attributes (with their corresponding scales) of the notion of moral goodness have been taken into account? Moreover, how can one justify the relative importance of the various attributes for the assessment of the overall notion of moral goodness of a design? In other words, how should the scores on the various attributes be aggregated into an overall score for moral goodness? This multi-criteria problem with regard to the morally best design option is just a special case of the general multi-criteria problem that presents itself with regard to selecting the best design option from a set of options given their scores on various criteria (Franssen 2005).16
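
The dependence of the overall ranking on the relative importance assigned to the attributes can be illustrated with a minimal Python sketch. It assumes a weighted sum as the aggregation rule, which is only one of many possible rules and is not proposed in this chapter; the option names, scores, and weights are hypothetical.

```python
def weighted_score(scores, weights):
    """Aggregate attribute scores with a weighted sum (an assumed rule)."""
    return sum(weights[attribute] * scores[attribute] for attribute in weights)


def best_option(options, weights):
    """Name of the option with the highest aggregated score."""
    return max(options, key=lambda name: weighted_score(options[name], weights))


# Hypothetical scores on a 0-1 scale, for illustration only.
options = {
    "nonflammable_coolant": {"safety": 0.9, "sustainability": 0.3},
    "flammable_coolant": {"safety": 0.6, "sustainability": 0.9},
}
safety_first = {"safety": 0.8, "sustainability": 0.2}
sustainability_first = {"safety": 0.3, "sustainability": 0.7}

print(best_option(options, safety_first))          # nonflammable_coolant
print(best_option(options, sustainability_first))  # flammable_coolant
```

With identical attribute scores, the two weightings select different options, so the choice of weights is itself a value judgment that the measurements cannot settle.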

All in all, we may conclude that issues about construct and content validity and issues about aggregation in the case of multiple attributes make any objective measurement (comparison) of the overall moral value of design options a highly problematic affair. The absence of objective measurement of values, however, does not imply that the operationalization and measurement of values in design is arbitrary. We have seen that technical codes and standards play a major role in the operationalization and measurement of values in design. Although codes and standards ultimately rely on certain value judgments, they may nevertheless establish a reasonable or justified consensus on how to operationalize and measure values in design. Standards organizations indeed adhere to certain procedural criteria in order to enable the achievement of such a consensus.

Still, we have seen that we cannot simply assume that current codes and standards establish a reasonable or justified consensus on how to operationalize and measure values in design. One main reason is the highly context-dependent character of operationalizations in design. As a consequence, standards may not reflect the latest technical and social contextual developments, because even if codes and standards are regularly revised, major changes, as with many formalized rule systems like the law, often go slowly and are difficult to achieve. In addition, even if standards can be very detailed and specific for particular kinds of apparatus and devices, they may still not cover all relevant considerations for designing them. For both reasons, the operationalization of values in design processes will usually require value judgments by the designer or design team. However, the value judgments made by designers need not be arbitrary or unjustified. What designers at least can do is to try to embed them in a network of other considerations, including definitions of the values at stake in moral philosophy (or the law), existing codes and standards, earlier design experiences, etc. (For a suggestion on how designers might do so, especially when they try to translate moral values into design requirements, see Van de Poel (2014).)



  1.

    See, for instance, Sayre-McCord (2011).

  2.

    Here we have to point out an important caveat. If the overall moral goodness of a design option is not directly measurable but is the aggregated result of the assessment of that design option on various criteria each of which is separately objectively measurable, then in general it will not be possible to compare various design options with regard to their overall moral goodness. In that case the notion of the morally best design option makes no sense. This is due to issues in multiple criteria analysis (see below). What we have in mind here is the assessment of various designs against a “monolithic” moral criterion, that is, a criterion that is not itself the aggregated result of multiple measurable sub-criteria and that may be directly measured in an objective way.

  3.

    Note that the fact that a value is pursued for its own sake does not exclude that it may also be pursued for other reasons.

  4.

    Of course, it is possible to correct for systematic errors of measuring devices such that the corrected outcome is determined only by features of the object(s) on which the measurement is performed (for instance, one may correct the outcome of a measurement with scales for the fact that the arms of the scales are not of the same length). In our opinion, however, we are then dealing with two different kinds of measurement methods, the original one and one with a correction procedure.

  5.

    For an interesting discussion of the notion of transparency of experimental equipment, including measurement devices, see Lelas (1993).

  6.
  7.
  8.
  9.

    In the literature accuracy is often taken to be part of the notion of validity; see, for instance, Carmines and Zeller (1979). Conceptually, however, a distinction can be made between the questions whether a measurement method measures the intended quantity (e.g., temperature) of the system under consideration or not and if so, how accurate the measurement method is. That is the reason why we prefer to distinguish between validity and accuracy. In specific cases, however, it may be difficult to distinguish between accuracy and construct validity (see below for the notion of construct validity). Consider the case in which someone tries to measure the temperature of a liquid with a mercury thermometer. Suppose that the amount of heat transferred from the liquid to the tip of the thermometer is small compared to the total amount of heat in the liquid. Then, the smaller the heat transfer from the liquid to the tip of the thermometer is, the more accurate the measurement will be. But now suppose that the amount of heat transferred becomes more or less equal to or less than the amount of heat in the liquid (e.g., someone tries to measure the temperature of a drop of water with an ordinary mercury thermometer). Then the measurement becomes less accurate or even construct invalid, for in the extreme case of a very small amount of liquid compared to the mercury in the tip of the thermometer, one no longer measures the temperature of the drop of liquid but the ambient temperature in which the thermometer is kept. This example shows that under certain conditions, very inaccurate measurements may become construct invalid.

  10.

    Reproducibility does not mean that if the measurement is repeated by the same person, exactly the same result will come out. Due to (random) measurement errors, the outcomes will be distributed according to a certain probability function. Reproducibility requires that this probability function over the outcomes is the same when the measurement is performed by another person.

  11.

    They speak of objectives but these are similar to what we call evaluation criteria.

  12.

    Note that, similar to second-order value judgments in the case of the operationalization of values, second-order epistemic value judgments are important in the operationalization of physical concepts. In general, however, these second-order epistemic value judgments appear not to undermine the construct validity of measurement procedures for physical concepts; we will not enter here into a discussion of why this is the case.

  13.

    At least reproducibility and accuracy are not fundamentally more problematic than in the case of physical quantities because the attributes are, at least in this case, all physical quantities. It may not always be possible to operationalize values in terms of physical quantities, and in such cases reproducibility and accuracy are more of an issue. But even then, we would argue, the real issue is validity.

  14.

    ASHRAE is the American Society of Heating, Refrigerating and Air-Conditioning Engineers; ANSI is the American National Standards Institute.

  15.

    We leave open here when a consensus is justified, but one might think here of John Rawls’ idea of an overlapping consensus (Rawls 2001).

  16.

    See the chapter “Conflicting Values in Design for Values” for a detailed description of this multi-criteria problem for choosing the morally best design option.



We thank Maarten Franssen for valuable comments on an earlier version of this chapter.


  1. ASHRAE (1994) Safety code for mechanical refrigeration, ANSI/ASHRAE standard, 15-1994. ASHRAE, Atlanta
  2. ASHRAE (2013a) Safety standard for refrigeration systems, ANSI/ASHRAE standard, 15-2013. ASHRAE, Atlanta
  3. ASHRAE (2013b) Designation and safety classification of refrigerants, ANSI/ASHRAE standard, 34-2013. ASHRAE, Atlanta
  4. Calm JM (2008) The next generation of refrigerants – historical review, considerations, and outlook. Int J Refrig 31:1123–1133
  5. Carmines EG, Zeller RA (1979) Reliability and validity assessment. Sage, London
  6. Chang H (2004) Inventing temperature: measurement and scientific progress. Oxford University Press, Oxford
  7. Franssen M (2005) Arrow’s theorem, multi-criteria decision problems and multi-attribute design problems in engineering design. Res Eng Des 16:42–56
  8. Hunter TA (1997) Designing to codes and standards. In: Dieter GE, Lampman S (eds) ASM handbook. ASM International, Materials Park, pp 66–71
  9. International Electrotechnical Commission (2010a) International standard IEC 60079-20-1 Explosive atmospheres – part 20–1: material characteristics for gas and vapour classification – test methods and data. Edition 1.0 1-2010, Geneva
  10. International Electrotechnical Commission (2010b) International standard IEC 60335-2-24 Household and similar electrical appliances – safety – part 2–24: particular requirements for refrigerating appliances, ice-cream appliances and ice-makers. Edition 7.0 2010-02
  11. Keeney RL (1992) Value-focussed thinking: a path to creative decisionmaking. Harvard University Press, Cambridge, MA
  12. Keeney RL, Gregory RS (2005) Selecting attributes to measure the achievement of objectives. Oper Res 53:1–11
  13. Lelas S (1993) Science as technology. Br J Philos Sci 44:423–442
  14. Nagel T (1979) The limits of objectivity. Brasenose College, Oxford University (May 4, 11 and 18)
  15. Plato, Hamilton E, Cairns H (eds) (1973) The collected dialogues of Plato, vol LXXI, Bollingen series. Princeton University Press, Princeton
  16. Rawls J (2001) Justice as fairness. A restatement. The Belknap Press of Harvard University Press, Cambridge, MA
  17. Sayre-McCord G (2011) Moral realism. In: Zalta EN (ed) The Stanford encyclopedia of philosophy (Summer 2011 Edition).
  18. Searle J (1995) The construction of social reality. Penguin, London
  19. Solomon S, Qin D, Manning M, Chen Z, Marquis M, Averyt KB, Tignor M, Miller HL (eds) (2007) Climate change 2007: the physical science basis: contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change. Cambridge University Press, Cambridge
  20. van de Poel I (2001) Investigating ethical issues in engineering design. Sci Eng Ethics 7:429–446
  21. Van de Poel I (2014) Translating values into design requirements. In: Michelfelder D, McCarty N, Goldberg DE (eds) Philosophy and engineering: reflections on practice, principles and process. Springer, Dordrecht, pp 253–266
  22. Vincenti WG (1990) What engineers know and how they know it. Johns Hopkins University Press, Baltimore

Copyright information

© Springer Science+Business Media Dordrecht 2014

Authors and Affiliations

  1. Delft University of Technology, Delft, Netherlands