Happiness Studies and the Problem of Interpersonal Comparisons of Satisfaction: Two Histories, Three Approaches


An old methodological obstacle confronts the use of Life Satisfaction surveys in Happiness Studies: a problem that economists recognize by the name of (the impossibility of) interpersonal comparisons of satisfaction/utility. But the recent implementation of insights from happiness studies into policy making transforms an originally theoretical obstacle into a real-world problem, providing substantial motivation for engaging with this issue. Just this problem is highlighted by recent critics of happiness surveys. This paper locates the problem currently facing happiness studies at the intersection of two traditions or two histories: that of economic methodology and that of psychological methodology. Three dominant approaches to the issue revealed through these histories are identified: ‘the skeptical approach’, ‘the pragmatic approach’, and ‘the ethical-normative approach’. The paper works to bring together the two disciplinary histories and evaluate the three approaches in order to frame a suitable interpretation of interpersonal comparisons in happiness studies. The implications of this are twofold: it contributes to the legitimation of happiness studies, suggesting an answer to its critics, while at the same time casting the status of its building blocks in a different light.

  1. I choose to use the term satisfaction and not utility here, first, in order to avoid confusions about the changing meaning of the concept of utility in the history of economics, and, second, because it is the term more frequently met in the language of psychologists.

  2. Prominent examples include Oswald (1997), Diener and Seligman (2004), Layard (2005), Layard (2006), Diener (2009), Veenhoven (2010), Frey and Stutzer (2009), Frey and Stutzer (2012).

  3. For example, researchers widely use surveys such as the World Values Survey, in which the single item (question) is: “All things considered, how satisfied are you with your life as a whole these days? Please use this card to help with your answer” [range of 1–10 with 1 labelled “Very Dissatisfied” and 10 labelled “Very Satisfied”]. For more examples see Adler (2012, pp. 2–3).

  4. Extensive surveys of the subject in the history of economics have been provided by Sen (1979), Hammond (1991), Fleurbaey and Hammond (2004).

  5. See Fleurbaey and Hammond (2004, pp. 46–49), Davis (1992), Rosenbaum (1995).

  6. I relate here to Hands’s important clarification of the distinction between the mere ‘normative’ and the ethical-normative (Hands 2012). The latter constitutes a case of dependence on values and ethical convictions, i.e. not just on any kind of value judgement, but on one that relates to social arrangements, welfare, fairness or justice.

  7. Enthused by the doctrine of utilitarianism and by its psychological-hedonist presumptions, they aimed at combining utilitarian elements with cutting-edge economic theory. The result in Jevons’s case was the formulation of a mathematical representation of individual utility (focusing on the ‘lower feelings’). Jevons, nevertheless, avoided using this as a platform for a mathematical formulation of social aggregate utility/satisfaction.

  8. Jevons (1888), paragraph 20. Jevons’s colleague, Edgeworth, by contrast, did not hesitate to do exactly this. Under what he titled ‘exact utilitarianism,’ Edgeworth presented abstract problems of allocation (of stimulus, and then of ‘means to stimulus,’ such as wealth and labor) using basic functions of pleasure-producing, so that the aggregate of pleasures/happiness, the result of that allocation, would be at a maximum.

    Edgeworth (1877, 1879). The basic mathematical representation was:

    $$ \int \int \int (dp)\,(dt)\,(dn) $$

    where p is pleasure degree, t is time duration, n is ‘number of enjoyers.’ See Edgeworth (1879, p. 394).

  9. The interpretation of utility as cardinal ascribes to the utility function more than just an ordering of the preferences of one individual (i.e. it addresses utility as a magnitude).

  10. Robbins (1934, p. 123).

  11. See Hands (2009).

  12. Robbins (1997).

  13. See footnote 6 above.

  14. See Angner (2009a) on the two methodologies: the psychometric approach used in happiness studies, and the axiomatic/representative approach in welfare economics. For the basic distinction see Krantz (1991).

  15. Fumagalli (2013, pp. 325–326).

  16. Broome (1991), Hausman and McPherson (2009).

  17. Kahneman et al. (1997), Kahneman et al. (1999), Kahneman and Krueger (2006).

  18. Camerer et al. (2004).

  19. It is unfortunate because neural utility is, perhaps, the most resistant to the IPCS problem.

  20. Adler (2012, pp. 10–14). The interpretation of LS data as non-purely-experiential goes hand in hand with psychologists’ distinction between affective and cognitive components of subjective well-being and their designing scales in order to reflect both. Life-satisfaction scales were implicitly designed as more than mere affective-well-being scales. See Lyubomirsky and Lepper (1999), Diener (1994).

  21. Kahneman and Krueger have taken Robbins’s economic point of view explicitly into account. Thus they make use of this very problem when arguing for the need to alternate the methodology of life-satisfaction surveys with other methodologies for the measurement of subjective well-being (such as the U-index): “One of the difficulties of using data on subjective well-being is that individuals may interpret and use the response categories differently… when Tim answers a 4 about the intensity of a particular emotion, maybe that is the equivalent of a 6 for Jim… We propose an index, called the U-index which overcomes this problem.” Kahneman and Krueger (2006, pp. 18–19). Whether or not their suggested methodology succeeds in overcoming the problem of IPCS is an interesting question, albeit outside the scope of this paper.

  22. For this point see also Angner (2009b, pp. 158–163), Fleurbaey and Hammond (2004, p. 52).

  23. Here the approaches may differ. One could argue for the objective or descriptive status of the data. This might be done by suggesting that once the many individuals make their reports in a contemplative manner (with no hidden intra-personal problematics), there is no need for an external authority to ascribe meaning to the given scores (thereby making them useful). This is the approach endorsed in Ng (1996). A counterargument might be that this step necessarily requires an additional external ethical-normative judgment.

  24. See for example the constructive suggestions regarding these issues in Fleurbaey and Blanchet (2013, pp. 199–201). Their suggestion focuses on the particular design of the questionnaires. Another attempt to face these challenges is Ng (1996). Whether this problem is solvable is an interesting question, but beyond the scope of this paper since this is not the stage involving IPCS.

  25. Hence the development of a microeconomics that connected competitive market solutions with Pareto optimality (via the first and the second ‘fundamental theorems’) and that was therefore, allegedly, immune to IPCS.

  26. See the surveys by Sen (1979) and Hammond (1991). Among those approaches a very partial list would include: Harsanyi (1955), Little (1950), Waldner (1972), Ng (1975, 1982). See also Davis (1992) for a philosophical account of the issue. Davis characterized inter-personal comparisons of utility as descriptive and value-laden and so not ethical-normative.

  27. Economists such as Isbell (1959) and Schick (1971) have suggested such strategies, i.e. first cardinalizing by using preferences over risk (probabilities), a method used by von Neumann and Morgenstern, and then normalizing the scales by putting upper and lower bounds on all individual utilities. See Hammond (1991, pp. 215–216).

  28. As raised by Hammond, ibid. p. 216.

  29. Harsanyi (1955, pp. 315–316).

  30. Harsanyi (1955, p. 317).

  31. Described as follows: “If in a given situation one individual gives more forcible signs of satisfaction or dissatisfaction than another, is this so because the former feels more intense satisfaction… or only because he is inclined to give stronger expression to his feeling?” Ibid. p. 318.

  32. Ibid. pp. 318–319.

  33. It should be noted, though, that Harsanyi does agree with Robbins that in some cases IPCS does involve “ethical or political restrictive postulates”, but holds that those cases should be distinguished from the cases of IPCS without a conventional element of this kind. Ibid. p. 320.

  34. Ibid. pp. 319–320.

  35. Harsanyi’s later and more famous accounts of inter-personal comparisons, which used such concepts as ‘the similarity postulate,’ ‘imaginative empathy’ and ‘extended preferences,’ are in line with this approach. Nevertheless, they do not add much to the basic solution offered to our concerns here. Harsanyi (1982, p. 50). This is also the case with the spectator’s ‘extended preferences’; see Adler (2014, pp. 126–131).

  36. A recent account of the shortcomings of this approach in welfare economics is given by Adler (2014). Adler suggests basing inter-personal comparisons of utility on the spectator’s sympathy. As stated by Adler, this alternative base, contrary to Harsanyi’s ‘imaginative empathy,’ is not value-free. Ibid. p. 150.

  37. Kahneman et al. (2004, p. 432).

  38. Kahneman et al. (1999), preface p. ix.

  39. See Michell (1999, chapter 4).

  40. Stevens’s idea was developed especially in reaction to skeptical views concerning psychology and scientific measurement expressed by The Ferguson Committee (1940); see Michell (1997, pp. 368–369; 1999, pp. 143–155).

  41. Stevens (1946, p. 677).

  42. Chang and Cartwright (2013, p. 367).

  43. On Stevens’s operationalism (a view that he explicitly adhered to) see Michell (1999, pp. 169–177).

  44. The other, more moderate nominalism is known as conventionalism, according to which we are free to choose by agreement the correct measurement method for a concept; see Chang and Cartwright (2013, p. 368).

  45. Michell (1997, pp. 360–361). The ontology of a theory consists in the objects the theory assumes there to be. Epistemology is the study of knowledge and justification; see Audi (2015).

  46. Chang and Cartwright distinguish between “precision” and “accuracy,” which are often confused: “Accuracy is a realist notion about whether measurement results agree with the true values; precision is a concept that is meaningful to the realist and nominalist alike, as it indicates merely how specific a measurement result is.” Ibid. p. 370. So focusing only on methods could lead us to far-reaching conclusions as far as improvements in precision are involved but not necessarily to improvements in accuracy.

  47. Cronbach and Meehl (1955, p. 283).

  48. Nunnally and Bernstein (1994, chapter 3: ‘Validity,’ p. 85).

  49. The internal consistency and stability with which a measuring instrument performs its function. Colman (2008).

  50. The extent to which a test measures what it purports to measure, or the extent to which specified inferences from the test’s scores are justified or meaningful. Colman (2008).

  51. Test theory is the archetype of a problem unique to psychological research that requires a statistical solution. Considering a test score from a statistical point of view, it is highly desirable to derive an ancillary statement of its precision. In the most basic approach to true score theory, the test score X is considered the sum of a true score T and a random error E, where X = T + E. The standard deviation of the errors E is a statement of the (lack of) precision, or standard error, of the test score. Jones and Thissen (2007, pp. 10–11).
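
    The decomposition X = T + E can be made slightly more explicit. As a minimal sketch in standard classical-test-theory notation (the variance and reliability symbols below are not used in the source and are introduced here only for illustration), and assuming that T and E are uncorrelated:

    $$ \sigma_X^2 = \sigma_T^2 + \sigma_E^2, \qquad \rho_{XX'} = \frac{\sigma_T^2}{\sigma_X^2}, \qquad \sigma_E = \sigma_X \sqrt{1 - \rho_{XX'}} $$

    The middle ratio is the reliability of the test, and the last expression is the standard error of the test score, which shrinks toward zero as reliability approaches 1.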

  52. Zumbo (2007, pp. 48, 56).

  53. The idea behind this form of validity was introduced by the English statistician and psychologist Charles Spearman in 1904, where he interpreted intelligence as the factor g that underlies all test items and subtests with good content validity (that is, items and subtests that appear to require intelligence) and argued that the most valid tests are those with highest loadings on the factor g. Colman (2008).

  54. Zumbo (2007, p. 72).

  55. Diener et al. (1985).

  56. “Results indicated that the Subjective Happiness Scale has high internal consistency, which was found to be stable across samples. Test–retest and self-peer correlations suggested good to excellent reliability, and construct validation studies confirmed the use of this scale to measure the construct of subjective happiness.” Lyubomirsky and Lepper (1999).

  57. Equating is a statistical process that is used to adjust scores on different test forms so that scores can be used interchangeably; see Kolen and Brennan (2004, p. 2).

  58. Linking is concerned with situations in which statistical adjustments are made to scores for tests that differ in content and/or difficulty; see Kolen and Brennan (2004, p. 423).

  59. Among the first seminal works in the field are: Lord and Novick (1968), Samejima (1969), Bock and Lieberman (1970). For an historical review see Jones and Thissen (2007, pp. 12–13).

  60. Borsboom (2005, chapters 2, 3).

  61. Reise et al. (2005, p. 95).

  62. To judge the quality of an item, one can transform the item’s IRF into an item information function, which represents the item’s ability to differentiate among people at each trait level. Reise et al. (2005, p. 95).

  63. Invariance in IRT means two things. First, an individual’s position on a latent-trait continuum can be estimated from his responses to any set of items with known IRFs, even items that come from different measures. (In contrast, in classical test-theory, item responses are aggregated to estimate a true score that is specific to that measure alone). Second, item properties, as represented by the IRF, do not depend on the characteristics of a particular population (also, contrary to classical test-theory). Ibid. p. 96. This gives these kinds of models advantages in linking and equating (see footnotes 57, 58).

  64. Ibid. p. 95.

  65. “Trait-level estimates in IRT are superior to raw total scores (in classical test-theory) because: (a) they are optimal scaling of individual differences (i.e. no scaling can be more precise or reliable); (b) latent-trait scales have relatively better (i.e. closer to interval) scaling properties.” Ibid. p. 98.

  66. The model developed by Rasch and his followers is often called the “one parameter” model to contrast it with other IRT “two/multi parameters” models: its item parameter reflects the difficulty of the items, and it does not include any item discrimination parameter, the latter representing how sensitively an item discriminates between respondents along the latent-trait continuum.
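
    As a minimal sketch of this contrast, for a dichotomous item and in standard IRT notation (the symbols θ, b and a are assumptions introduced here for illustration; they do not appear in the source), the probability of endorsing an item under the one-parameter and two-parameter logistic models can be written as:

    $$ P_{\mathrm{1PL}}(\theta) = \frac{e^{\theta - b}}{1 + e^{\theta - b}}, \qquad P_{\mathrm{2PL}}(\theta) = \frac{e^{a(\theta - b)}}{1 + e^{a(\theta - b)}} $$

    where θ is the respondent’s position on the latent-trait continuum, b is the item’s difficulty, and a is the item’s discrimination. Setting a = 1 for every item reduces the second expression to the first.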

  67. Note the interesting definition of “objective measurement” on the institute website: “An objective measurement estimate of amount stays constant and unchanging (within the allowable error) across the persons measured, across different brands of instruments, and across instrument users. The goal of objective measurement is to produce a reference standard common currency for the exchange of quantitative value, so that all research and practice relevant to a particular variable can be conducted in uniform terms. Objective measurement research tests the extent to which a given number can be interpreted as indicating the same amount of the thing measured, across persons measured, and brands of instrument.”

  68. The same scale is analyzed in Lyubomirsky and Lepper (1999).

  69. SEM is a method that combines the factor analysis model (see footnote 53) with multiple regression (econometrics). Jones and Thissen (2007, pp. 17–18).

  70. See for example Oishi (2006).

  71. Maul (2013, p. 753). While some psychologists are aware of this shift and have explicitly suggested abandoning Stevens’s basic definition of measurement, many other psychologists embrace Stevens’s definition (usually without acknowledging the nominalist position it embodies). See Bond and Fox (2015, pp. 1–5), Michell (1997, 1999), Borsboom (2008, pp. 47–50).

  72. Maul (2013, pp. 753–754).

  73. Borsboom and Mellenbergh (2004, p. 118). See also Borsboom (2005) and Borsboom (2008) for an elaborated discussion of “latent variables”.

  74. Borsboom (2005, chapters 2, 3).

  75. Zumbo (2007, p. 73).

  76. For specific problematic implications for public policy see Duncan (2013), Frey and Stutzer (2012).

  77. Indeed, it might be the case that part of the research is actually restricted to this kind of policy (or very close to it), but most decisions in public policy (especially the more interesting ones) are not of this kind.

  78. Simon (1974, p. 66).

  79. Fleurbaey and Hammond (2004, p. 48). This description can be framed also with the first-aspect and second-aspect terminology (presented in Sect. 1.2), with the second and third steps together with the first part of the fourth step generating the first-aspect.

  80. Ibid.

  81. In an earlier prominent paper by Hammond (1991) a related explicit argument was raised to establish IPCS with the aid of: the values of the ethical observer, as influenced by that observer’s understanding of the individual’s psychology, and the observer’s view of how society benefits from creating that individual or changing the individual’s situation. Hammond (1991, p. 234).

  82. As suggested in the general case by Fleurbaey and Hammond.


  Interpersonal comparisons
  Life satisfaction
  Methodology
  Happiness
  Utility
  Psychology
  Economics