Are Most People Happy? Exploring the Meaning of Subjective Well-Being Ratings

The claim that most people are happy and satisfied, assuming that high self-ratings on numerical scales indicate good lives, is cross-checked against extensive verbal reports in a large-scale mixed-methods validation study. For a sample of 500 qualitative interviews conducted in Austria, the usual 10-point-scale self-ratings of life satisfaction and happiness were linked to the content of respondents’ actual narrations. Additionally, the narrated well-being was classified according to an alternative evaluation scheme by external raters. The results show that many persons report substantial restrictions to their hedonic experience in spite of high or even very high ratings, and that the narrated well-being evaluation is much more critical than the self-rating. Therefore it is argued that a naïve interpretation of high self-rating values as top life experience systematically ignores negative aspects of life. The claimed predominance of happiness should be substantially reformulated. In particular, more attention should be drawn to resilient satisfaction in the presence of substantial psychological burden, and to the nonnegligible group of highly positive life satisfaction ratings which lack evidence of corresponding hedonic experience in the life narratives.


Introduction
It is a common statement in the subjective well-being (SWB) literature that most people in Western civilization feel happy and are satisfied with their lives. As an example of corresponding media coverage, Statistics Austria (2012) stated in a press release that 79 % of Austrians are satisfied with their lives, and reported a ''high global life satisfaction''. Diener and Diener (1996) use the phrase ''Most people are happy'' as the title for a widely cited article. Indeed, there is a huge amount of data showing that country averages of life satisfaction (LS) ratings are mostly in the positive regions, seldom falling below ''the midpoint of the scale'' or the ''neutral point'' (p. 181): ''For moods and emotions, the neutral point refers to that place at which the individual experiences an equal amount of pleasant and unpleasant affect. A positive hedonic level refers to experiencing positive affect more of the time than negative affect.'' There is also evidence that people spend more time in predominantly positive hedonic states. In a similar vein, Veenhoven (1991) argues that reported levels of happiness are more than just relative values. However, Biswas-Diener et al. (2005: 205) conclude: ''Thus, the fact that most people tend to be moderately happy does not mean that they are ecstatic.'' Considering certain kinds of positivity tendencies in the ratings, they add later (223): ''In other words, positive happiness ratings do not indicate that conditions are excellent or that the society need not be improved.'' Indeed, other sources of evidence about psychological states seem to convey a less optimistic impression than satisfaction charts. As an example, the prevalence of mental disorders in the past year is estimated to be as high as 27 % in the European Union (WHO 2015) and 19 % in the US (NIH 2015).
Cummins and Nistico (2002: 37) suspect a positive cognitive bias underlying the ''remarkable level of uniformity'' regarding self-rated SWB, which they specifically attribute to a necessity of keeping up self-esteem, control and optimism. Cummins (2003) and Tomyn and Cummins (2011) assume a general tendency to protect one's own mood homeostatically by inhibiting negative thoughts about one's life. Comparing the shape of different distributions, Cummins (2003) concludes that homeostatic protection particularly prevents self-ratings from falling below 70 % of the maximum possible value. This theory refers to LS in general; the articles do not focus explicitly on biases that arise only from defense in the moment of responding. Similarly, Staudinger (2000) formulates high satisfaction in spite of negative circumstances as a ''subjective well-being paradox'', and explains it, above all, by means of coping and adaptation processes (see also Shmotkin 2005).
Unfortunately, in spite of the complex and subjective character of LS and happiness, there are hardly any cross-checks at an individual level on what kind of lives and concrete circumstances are actually represented by numerical self-ratings such as 7 or 8, nor is there, to the authors' knowledge, any guideline on how to translate an ''8'' back into actual life circumstances. This study tries to bridge this gap by comparing the self-ratings of 500 interview partners with the contents of in-depth semi-structured qualitative interviews about their lives.

Background
Measures of SWB (Diener 2000), in particular the aspects of LS (the overall evaluation of life satisfaction) and happiness (focusing rather on the emotional aspect), are widely used and have made their way into official statistics and reporting (Krueger and Stone 2014), such as the OECD publication ''How's life?'' (OECD 2013a). Many see SWB assessment as one of the tools which will enable a move away from the merely economic evaluation of societies (such as through GDP) towards a more human-centered view: In his seminal paper, Easterlin (1974) claims that LS remains rather static in spite of rapidly growing economic output, giving a diagnosis which is still subject to scientific debate (Stevenson and Wolfers 2008; Easterlin et al. 2010; Veenhoven and Vergunst 2014). Among others, the well-known ''Stiglitz report'' (Stiglitz et al. 2010) proposes enhancing assessment of the progress of societies by including subjective indicators about well-being such as LS or happiness, a recommendation which has been widely adopted by many nations and international initiatives like Europe's 'Beyond GDP' movement (European Commission 2015). However, there is still an ongoing discussion about the significance of subjective information in general, and about the meaning of LS or happiness ratings in particular (Angner 2005; Haybron 2008; Schwarz et al. 2008).
Formulations of items can be found in the World Database of Happiness (Veenhoven 1995); the wording ''Taking all things together, how satisfied are you with your life these days? 0…very dissatisfied, 10… very satisfied.'' may serve as a typical example. Items of this kind reach a certain level of stability (Diener et al. 2013; see also Krueger and Schkade 2008), with correlations between two time points of about 0.56 (Fujita and Diener 2005), 0.67 (Michalos and Kahlke 2010) or 0.77 (Lucas et al. 1996) after 1 year, and of validity, with typically moderate correlations with plausible criteria of a satisfactory life (Diener et al. 2013; Larsen et al. 1985; Layard 2010; Oswald and Wu 2010). Critics refer to the fundamental problem that a rating about one's own life involves many hardly controllable factors, such as different aspiration levels of what constitutes a ''satisfying'' life. For an overview of different interpretations of responses to LS questions, see Haybron (2008).
Of course, although single-item measures are widely applied, far more sophisticated measurement tools have been developed, for example the Satisfaction with Life Scale or panel approaches such as the experience-sampling method (Kahneman and Krueger 2006). However, the problems presented by this article still remain as long as the assessment is based on sets of items involving self-rating procedures.
In addition to the problem that measured LS may be subject to many moderating processes, such as adaptation towards positive or negative changes in life, homeostatically protected mood (Tomyn and Cummins 2011), self-serving biases and top-down effects such as downgrading actual negative experience in global hindsight (cf. Diener et al. 2013), defense mechanisms, social desirability, different judgment standards, better-than-average effects (Alicke 1985), context or order effects, and similar (for an overview, cf. Pavot 2008; OECD 2013b; Diener et al. 2013), we want to draw attention to two very fundamental measurement issues: the actual semantic meaning of ''life satisfaction'' from the respondent's point of view, and the actual semantic meaning of the particular response categories. Interpreting the notion ''life satisfaction'' and the meaning of the possible response levels is typically left to the respondent. Whereas there are sophisticated scientific definitions describing how researchers understand ''satisfaction'', the responses are usually just defined by their numerical positions between two extremes, for example a certain position between 0 and 10. Veenhoven (2009) had participants from various countries assign numbers from 0 to 10 to various verbal labels (in their respective languages) to indicate the level of implied happiness. Slight country differences were observed, such as ''very happy'' receiving an average rating of 9.0 from the Dutch, but 8.6 from the English participants. Even if all categories have a verbal anchor, formulations such as ''very satisfied'' still leave some room for interpretation. For a discussion about interpretation in rating scales, see Schwarz et al. (2008). Diener et al. (2013) discuss potentially differential use of the underlying numerical value by different persons, and how this might be dealt with by robustness analyses or mixed item-response models.
Well-known approaches to tackling the latter issue are the Cantril ladder (Cantril 1965), which invokes a comparison with the ''best possible life'', or the usage of anchor vignettes (King et al. 2004) where the respondents rate certain fictitious lives for satisfaction, enabling comparison of their own lives with the anchors. Neither approach assigns direct explanations for the absolute quality of a rated level; it still has to be taken as an implicit assumption that the respondents perceive the question and the grading of possible responses in a similar way to the professionals. In contrast, instruments like Kahneman and Krueger's U-Index (Kahneman and Krueger 2006) circumvent the anchor problem by reporting a time percentage (namely, the time spent in a more unpleasant than pleasant state).
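To make the idea of a time-percentage measure concrete, the following is a minimal sketch in Python, not Kahneman and Krueger's actual procedure. It assumes a hypothetical diary format in which each episode has a duration and intensity ratings for pleasant and unpleasant feelings, and it assumes that an episode counts as unpleasant when its strongest negative rating exceeds its strongest positive one. The names `Episode`, `is_unpleasant` and `u_index` and all example values are illustrative assumptions, not data from this study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Episode:
    """One diary episode (hypothetical format, not taken from the study)."""
    minutes: float               # duration of the episode
    positive: Sequence[float]    # intensity ratings of pleasant feelings
    negative: Sequence[float]    # intensity ratings of unpleasant feelings


def is_unpleasant(ep: Episode) -> bool:
    # Assumed rule: an episode is unpleasant when its strongest negative
    # feeling is rated higher than its strongest positive feeling.
    return max(ep.negative, default=0.0) > max(ep.positive, default=0.0)


def u_index(episodes: Sequence[Episode]) -> float:
    """Share of total time spent in unpleasant episodes (0.0 to 1.0)."""
    total = sum(ep.minutes for ep in episodes)
    if total <= 0:
        raise ValueError("episodes must cover a positive amount of time")
    unpleasant = sum(ep.minutes for ep in episodes if is_unpleasant(ep))
    return unpleasant / total


# Made-up example day: 180 minutes in total, 30 of them unpleasant.
day = [
    Episode(60, positive=[4, 3], negative=[1]),
    Episode(30, positive=[2], negative=[5, 1]),
    Episode(90, positive=[3], negative=[3]),  # a tie is not counted as unpleasant
]
print(round(u_index(day), 3))
```

Because the result is a share of time, it only relies on the ordering of ratings within an episode and does not require respondents to agree on what a particular scale value means.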
The aforementioned conclusion that most people are happy is subject to essentially the same problem: what amount (or 'what level') of happiness does it indicate? How happy is ''happy''? Is the statement just another formulation of the numerical fact that average ratings are to the right of the middle of the scale, or does it express that most people really live in a desirable psychological state? The cited sources rather suggest that the latter is the case. In fact, interpreting values to the right of the middle as ''happy'' assumes that the respondent's transition process (Kim-Prieto et al. 2005; 'transformation function' in Köke and Perino 2014) from life perception to self-rating and the researcher's transformation function from the response back to a judgment of another person's life are sufficiently inverse to each other (for an overview of the psychology of responding to rating scales, see again Schwarz et al. 2008; Schwarz and Strack 1999). In both cases, a semantic bridge between the rating and the experienced quality of life is desirable: either to give the term ''happy'' some psychological content, or to check whether people with responses in the upper half really enjoy a positive life experience. Accordingly, the results described in the following will shed some more light on the actual meaning of happiness ratings, and in particular will provide evidence against premature interpretations of ''very happy'' responses as ''top life experience''.

The MODUL Study of Living Conditions
Between April and August 2012, 500 semi-structured interviews were conducted in German language at 10 different locations in Austria (Ponocny et al. 2015). 27.6 % of respondents (all aged 16+) were recruited by simple random sampling from telephone lists, professional marketing address lists, and via local communities. The remaining participants were recruited through snowball sampling by the randomly sampled participants, whereby an additional 22.8 % stemmed from the same household, with the other 49.6 % from new households. Demographic data showed approximate representativeness of the Austrian adult population regarding age and education, but not for gender (62.1 % female). Respondents were asked about good and bad things in life by a total of 12 trained interviewers (psychology graduates), with a typical interview duration of about 45 min. The aim of the interviews was to cover those aspects which were important for the evaluation of life from the participant's point of view. The interview guidelines contained, among others, the tasks ''Describe good and bad times in your life'', ''What is important for your well-being?'', ''Are there burdens or challenges in your life?'', ''What is currently influencing your mood?'', ''Are there restrictions in your life?'', and at the end ''Are there issues influential to your well-being which have not been addressed yet?'' Almost all interviews took place in the interviewees' homes and were audio-recorded and transcribed. The guidelines for these semi-structured interviews had been probed and improved in pre-tests on 50 persons. Immediately after the interviews, a 1-page demographic questionnaire was filled in, which involved the standard questions ''Taking all things together, how satisfied with your life/how happy are you these days?'' (in German), with a 10-point rating between ''very dissatisfied'' and ''very satisfied'' or ''very unhappy'' and ''very happy'', respectively. The 10-point scale was chosen in order to be consistent with the European Quality of Life Survey (Eurofound 2005). (The German terms were ''zufrieden'' for ''satisfied'' and ''glücklich'' for ''happy''.) Presenting the self-rating questions after the interview made it likely that the responses coincide more closely to the contents of the narratives than the reverse order, which was desirable for the comparison between verbal and numerical statements.

Table 1 continued

Balanced
Balanced positive versus negative emotions: ''so-so''. Credible reports of positive AND negative experience, but somewhat dampened (compared to ambiguity). This applies to many persons who rate their mood as ''neutral'', ''normal'', or ''not good - not bad'', in a life without marked peaks or lows, or show resignation. Defense of emotions is likely, in particular claiming normality (''that's how it is'')

5 Small emotions or close-lipped
Shallow, indifferent or close-lipped. Positive emotions are not credibly supported by the narrative, but there is also no report about substantial impairment due to burden

4 Unfulfilled
An anchorless, unauthentic life in which desired positive experience remains unfulfilled (without pronounced negative experience). Any reports of positive resources are hardly supported in the narrative, at least in central areas. Reported burden is likely downplayed. The persons seem to live a life they would not have chosen, or work towards alternative lives, without explicit dissatisfaction with concrete issues. Usually something is ''missing'', a lack of perspectives is noticeable, or people are bored and lacking alternatives. A slightly negative, weary disposition results less from suffering than lack of satisfaction, and may be manifested as a grumpy tone. Time poverty due to obligations is possible, while emotional defenses of downplaying, justifying, rationalizing, denying more ambitious aspiration or projecting emotions towards outside stimuli are likely. Persons may judge themselves as satisfied but not happy

3 Disharmonious life but with support
Sad, burdened or stressed, but with positive resources or support. Persons report that problems impair their mood, with at least one problem in a central area which is detrimental for the global mood. Burden is explicitly expressed at least once, using words such as dissatisfaction, occasional depression, burden, stress, pain, or similar. Time either lacks due to obligations or cannot be filled with positive experience. Persons may struggle to fix some problems or achieve a targeted goal, such that life is out of balance or unfulfilled. Life regularly brings negative emotions (grief, sadness, anger, displeasure, anxiety, sorrows, pain, problems, unmet needs or desires, quarreling, loneliness, disappointment, boredom, or lack of self-esteem or self-confidence), unpleasant situations or torturous thoughts. Symptoms such as sleep disorders, nervousness, fragility, agitation or irritation are likely to be reported, possibly accompanied by deprecation of others. Problems constitute a major part of the interviews or are mentioned repeatedly, though they may be downplayed or temporary (e.g. when preparing for an important exam), but life still has (authentically reported) positive sides and resources such as family which bring joy to the person

2 Disharmonious life without support
Like the previous, but without support by credibly reported positive resources. Although impairment does not lead to a dominantly depressive mood, it seems to outweigh the good things in life. Resigned or cynical statements are likely, as well as symptoms of depression and anxiety disorders. Life seems to be a burden, and persons are not looking forward to the future

1 Dominant depression
Depressive, clouded, dominantly burdened: a distinctively negative touch characterizes the interview, persons report about depression or depressive symptoms, despair, hopelessness, disappointment or dissatisfaction with life, or the lack of positive expectations. The category applies to persons with or without noticeable positive support

a Central areas: core family, partner, children (for young persons: parents); social contacts; work life/education; way of life, leisure time, time use, self-actualization, time poverty; self-esteem, self confidence; health (physiological + psychological), feeling well; spirituality/religion; finances/standard of living; caregiving; social environment/ecological environment/habitat; housekeeping
b Resignation: Involuntary acceptance of undesirable circumstances, without acute impairment, having ''got used to'' something, without fully successful coping (such as deliberately abandoning unrealistic desires through mature re-evaluation). Giving up a goal, the pursuit of which is realized to be detrimental may serve as an example
The following results section links actual contents of life narratives to subjective self-ratings. An additional illustration is provided by a comparison between the self-ratings regarding happiness and LS and an alternative classification ''narrated well-being'' (NWB), for which a total of 13 raters coded their overall impression of the transcribed life narratives. NWB was developed in an iterative process of trial and error, merging theoretical expectations with empirical experience.

The NWB Classification
Since an interview contains manifold qualitative information which is hardly communicable in a journal paper, a classification scheme was developed in order to summarize the interviewees' own evaluative judgments about positive and negative conditions of their lives. This scheme preserves information about whether considerable pleasure or displeasure was explicitly expressed by the participants, and provides a workable characterization of a person's balance between these experiences. The different levels of the scheme are verbalized in a way which supports an interpretation of ''how life is'' in absolute terms, as independent of individual anchor levels as possible.
The starting point was the collection of good and bad circumstances. Numerical conversions like scaling or counting did not lead to satisfying consistency between different coders, but experience showed that explicitly considering the authenticity of emotional expression helped resolve discrepancies between different ratings. Similarly, explicitly involving coping efforts in the classification removed coding ambiguity substantially. In the end, a scheme with 11 different levels was considered detailed enough to assign all 500 cases plausibly, while retaining sufficient distinctiveness between the levels in terms of their semantic meaning. The resulting scheme is called the narrative well-being classification (NWB); an overview of its categories is given in Table 1.
Note that the NWB is not claimed to be an alternative assessment of life satisfaction or happiness, but an instrument to summarize the occurrence and affective balance of subjectively evaluated circumstances in a 45 min life narrative. Only aspects which influence present well-being are considered. It is a formative indicator (in the meaning of Diamantopoulos and Winklhofer 2001) rather than a reflective one, merging heterogeneous information into a single number, including objective and subjective aspects of well-being. But it is less quantitative than indexes proposed by Alkire and Foster (2011) which count the number of dimensions in which certain problematic thresholds are exceeded. Certainly, NWB cannot be considered as a fully objective external rating, but it does ease communication between researchers and is therefore employed as an illustration tool in the results section.
A substantial improvement regarding positive NWB ratings has been the consideration of ''authentic'' statements which go beyond mere (possibly dissonance-avoiding) evaluations (such as ''my job is all right'') but make the appraisal plausible (such as ''I love what I am doing at work''). Furthermore, the category ''resilient happiness'' (3) helped to classify persons who do well to a certain extent and plausibly describe their life experience as positive although life burdens them, which means that some life circumstances are described as unfavorable but the negative effects are largely overcome by successful coping. Typical examples are informal care-givers who tell about the burdens of their obligations but on the other hand credibly claim that they have adapted and do not mind anymore, or elderly people who change their activities according to their reduced capacities. This corresponds to ''adaptation'' in Zapf's (1984) welfare positions: subjective well-being in spite of unfavorable objective conditions. Category (4) relates to similar situations in which the coping strategies are only partially successful. Three categories (5)-(7) (which are not considered ordered) characterize ambiguous narratives with no clear dominance of positive or negative aspects, yet are differentiated according to the degree of emotion expressed, whereby (7) includes interviews where people did not talk openly about their emotional experiences, for example by constantly downplaying them. (8) was introduced for situations where authentic positive experience reports are lacking, giving the impression of unfulfilled desires as the main impairment (cf. Zapf's welfare position ''dissonance'', dissatisfaction in spite of supportive objective conditions, or McKennell's (1978) ''resigned'', satisfied but not happy).
The remaining categories describe narratives with a clearly negative dominance, either still supported by (but outweighing) positive resources (9), or not (10), with (11) already describing symptoms of depression or naming it explicitly. According to the feedback during coding, these categories seem in principle workable, understandable, sufficiently distinct but nevertheless exhaustive.
The raters marked all circumstances with a reported evaluation and classified the degree of its influence on the interviewee's life according to frequency and centrality (the detailed results are not reported here). All of these highlighted codes and the total interview content were eventually taken into account to assign an NWB rating. (A version of NWB suitable for self-rating is currently under preparation.) It turned out that with sufficient training, inter-rater agreement increased to very satisfying levels: up to an intra-class correlation of 0.85. (On the basis of two independent ratings per case, this would enable a Cronbach's α reliability of 0.92, qualifying as a highly reliable assessment.) The 500 codings which are used here rely on a platform where 12 raters reached an inter-rater agreement of 0.67, which is acceptable for our purposes (0.40-0.75 may be considered ''fair to good''; Fleiss 1986).

Figure 1 shows the distribution of the interviewees' LS and happiness self-ratings which appear, as usual, to have a strong dominance of positive ratings (8.54 on average for LS, 8.46 for happiness) and a remarkable step between 7 and 8. The alternative NWB categorization, in contrast, suggests a much more moderate view (Fig. 2). There are substantially fewer extreme judgments, many more neutral categorizations, and more negative ones, though these remain a minority. External coders also rated life satisfaction and happiness on the basis of the interviews, using the same 10-point scale as the interviewees. These self- and external standard ratings are only weakly related (Pearson's r = 0.29, p < 0.001 for LS as well as for happiness), and the external ratings are much more critical (on average: 7.71, p < 0.001, for LS; 7.43, p < 0.001, for happiness). For LS, there are few cases (17 %) with a better external rating than the self-rating, 27 % ties, but 56 % with more critical external ratings. (In particular, negativity bias in the meaning of Rozin and Royzman (2001) does not seem to play a dominating role.) The top LS categories 9 and 10 were chosen by 59.2 % of the self-raters, but only by 38.4 % of the external raters.
The small correlation between self-and external standard ratings cannot be explained purely by the information gap between real life and interview narrative, since external ratings of the same interviews by different coders are even less correlated (0.16 for LS, and 0.22 for happiness). Thus, there is a much lower degree of common understanding among the raters about how the standard scales should be applied to the narratives than for the NWB classification. But apart from this unsystematic variation, there remains a substantial systematic difference in the size of the values.
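The agreement figures discussed above can be put on a common footing with the Spearman-Brown formula, which gives the reliability of the mean of k parallel ratings as k·r / (1 + (k - 1)·r) for a single-rater reliability r. The Python sketch below is an illustration only: it assumes that the reported coefficients may be treated as single-rater reliabilities, and the function name `spearman_brown` and the dictionary labels are chosen for illustration. With r = 0.85 and k = 2 it yields about 0.92, which is consistent with the two-rater reliability quoted for the NWB classification.

```python
def spearman_brown(r: float, k: int = 2) -> float:
    """Reliability of the mean of k parallel ratings, given single-rater reliability r."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return k * r / (1 + (k - 1) * r)


# Agreement coefficients quoted in the text, treated here (as an assumption)
# as single-rater reliabilities.
reported = {
    "NWB after training (intra-class correlation)": 0.85,
    "NWB, the 500 codings used here": 0.67,
    "external happiness rating": 0.22,
    "external LS rating": 0.16,
}

for label, r in reported.items():
    print(f"{label}: r = {r:.2f} -> mean of two raters: {spearman_brown(r, 2):.2f}")
```

Under this assumption, averaging two coders would still leave the external standard ratings far below the NWB values, which illustrates how differently the coders applied the standard scales to the same narratives.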
Thus, assuming that the interviews create a valid impression of a life at all, the ratings obviously do not mean the same to the subject and the observer (i.e. the transformation functions mentioned before are not inverse to each other). Additionally, in 24 % of cases the external raters judged the respondent's value not to be plausible given the interview content. Extreme disparities could be found regarding dissatisfaction: 40 persons were rated 5 or worse by the external raters and 15 persons by self-rating, but of those only 6 cases match.

Table 2 shows the bivariate distribution of LS self-ratings and NWB (essentially the same picture is obtained for happiness, not shown). From the point of view of NWB, the ''very satisfied'' (LS) group looks rather heterogeneous, but not like a highly privileged one. Most interestingly, people who express only small emotions or who are categorized as ''small emotions or close-lipped'' have a strong tendency to rate themselves as ''10'', which gives rise to the suspicion that, for some respondents, positive self-rating might express defensive response behavior rather than true bliss (in line with Cummins 2003, as mentioned in the introduction). Some of the entries in Table 2 seem quite discrepant, combining rather negative NWB with very positive self-ratings, or, more rarely, the other way round. Fortunately, the contents of the interviews offer some explanation. Two examples shall be given: one where the self-rating seems to characterize coping with unfavorable conditions rather than the actual hedonic status, and one which seems to involve cognitive reappraisal or suppression of emotional expression, both of which are well-known strategies of emotion regulation (e.g. Gross 2002).
Regarding coping, a female participant (self-rating: 10) continually complains about the double burden of professional work and raising a small child, and was therefore rated as disharmonious life but with support: She ''still regrets'' becoming a mother again after an unplanned pregnancy, in particular missing the more comfortable life she was used to. She reports frequently being fully exhausted and occasionally flipping out, but also that she appreciates the support of her husband and that she takes time for personal recreation every now and then, ''when I can't take it anymore and then say: Ok, I have to do something for myself, or I'll flake out'' (translated from German to English). In spite of all the social support, the case was not rated as ''resilient'' because the interviewee's own words describe her actual emotional reaction to her situation as markedly negative.
The following passage from another interview reveals verbal compliance with obviously burdensome circumstances (self-rating: 9, NWB: disharmonious life but with support). Having repeatedly mentioned being burdened by time pressure, the participant responds to a question about restrictions in life: ''Time pressure, I repeat myself […] It will get better. And still everything works. It does not knock me out. It is ok as it is. I just hope I continue to have the strength and health to keep going like that. And, all in all, it is fine. It is fine. [In German: ''Es passt.'']'' To what extent this can be taken as a direct proof of suppressed emotional expression is debatable; at any rate, it strikingly demonstrates substantial (subjectively reported) impairment which is not detectable in evaluative judgments, verbally as well as numerically.
This case may also serve as an example of positive life satisfaction not considering current but rather temporary (or hopefully temporary) stressors, such as for students writing a thesis or persons longing for a partnership. In another example (self-rating: 9, NWB: small emotions or close-lipped), the interviewee insists that his stressful experiences are ''normal'', which leads to a rationalization setting low standards: After claiming that stress at work influences his mood, he adds: ''That's normal, that does not matter'', and being asked whether this occurs often: ''No, not really. But there are days, once a week, where it does. That's normal. […] There is no job where you do all things right, where everything is ok. That does not exist. And if somebody says it does, he lies.'' In the very rare cases where positive NWB meets very critical self-rating, the negative self-rating is harder to explain. One case shows a large discrepancy between LS (1) and happiness (9), maybe suggesting that, in spite of private happiness, she believes that her current job might only be a transition phase, making it too early to be ''satisfied''. In another case (LS and happiness: 2, light-hearted happiness with minor impairment), the concept of happiness as an unreachable ideal leads to paradoxical statements (''if I would say I am happy I would most probably be very unhappy''). However, it is always possible that crucial issues have simply not been mentioned in the interview, or that persons occasionally mixed up the positive and the negative side of the response scales.
To explain why the NWB ratings have been so critical, Tables 3 and 4 contrast positive and negative life circumstances from the life narratives with self-rated LS values of 7-10. Only life incidents for which the respondents themselves expressed substantial positive or negative hedonic consequences enter the tables. Out of those, the ones most important for the person's overall hedonic state, as judged by the external raters, were selected. The interviews were chosen by implicit stratification by NWB (and randomly within the strata), which is indicated on the left-hand side and by the background shades. Therefore the choice of cases can be regarded as representative for the LS categories within the sample.

Table 3 Self-rating versus NWB, and self-reported circumstances (Part I)
Self-rating: 10 and 9 (scale: 1 very dissatisfied to 10 very satisfied); NWB: 1 dominant depression to 11 light-hearted happiness
+ family, sport, traveling, heritage, takes care for others at work - has seen bad things during alternative civilian service, politics, corruption
+ good partnership, family, kids, grandchildren, garden/flowers, sports, going for a walk with the dog, motorbike tour, positivism, health… a) - menopause (sleep disorder, hot flash), going to be unemployed a few years before retirement, eye degeneration, financial restrictions
+ family, energy-sapping leadership of school but with pleasure - dual burden: household and work
+ overall good life, lots of experiences/friends/parties, two kids, family, partner, garden - mother's/sister's death, misses recognition as housewife from others and partner, wants to stay at home - financially not possible as children's allowance to be terminated
+ 3 grandchildren after birth of kid at the age of 44, gym, humor - early death of 1st child, marriage break up, multiple operations (intervertebral discs, hip…)
+ stronger from things happened in the past, hobbies, natural surrounding - brother suffers from cancer, mother's depressions, worries about kids' future
+ army, baby cats, good relationship with mother since father's death, paragliding - no good relationship of family with grandmother, stressful movement to new site
+ visit from son, strong connection to homeland, dog, helps out people in the surrounding - parents, husband, sister and youngest son died within last 10 years
+ strength from partner and family, job and education, sport, rural surrounding - father died, lost job/partner, job dissatisfaction, too many foreigners
+ kids, friendship, satisfied with herself - 17 years horrible marriage, 2nd divorce, husband lives next door, aortic aneurysm, taking care of mother till death, father's late comeback from captivity - no real relationship
+ family, sister, friends, nature - physical burden, away from home and friends
+ intense partnership while kids are out, appreciating the little things since taking care of sister and father, parents, siblings… c)
+ job in nursery school, marriage, birth of three kids - sad to become older, has to wear glasses
+ family, good job climate, soccer, trips to home country, housing benefit facilitates financial situation, wants to buy own apartment - war in home country, had to leave family
+ family, friends, quality of life - annoyed with people who think they are a cut above the rest
+ family, son, grandchildren, skilled manual work, physical health - surrounding infrastructure
+ finished studies, kids - abused during childhood, conflicts, missed qualifying examination for school, low self-esteem in comparison to others, intervertebral discs operation, back problems, daughter suffered from depression and anxiety states one year ago, burnout, cumbersome infrastructural changes from bus to train
+ friends, partner, hobbies, natural surroundings and neighborhood - not happy with education, lots of work, constructions change surrounding
+ animals, good reputation, nature and fresh air - misses friends after moving, talking about somebody behind one's back, troubles with mother in law
+ partner, part time job, nice colleagues, makes ends meet quite well - rich's politics, mother's alcoholism, violent father, brother's suicide, son in drug scene
+ being at home, garden, talking to mother, motivation gained through motivation books, house, silence, nature, rural surrounding, neighbors
+ recognition from parents, financial support, good relationship with mother - conflict with best friend led to bad school grades, more leisure time and shopping facilities desired
+ engagement in fire department - pressure to perform
+ financially secure by heritage, traveling, locality - won fight against breast cancer
+ place of residence and surroundings, friends, financial situation, optimism, neighbors - health problems - avoids going to doctor, mother's death after taking care of her led to heritage disputes with brother who had no contact, became single, joint pain, retirement pay will be better than unemployment pay
+ bicycle tour - stress in the workplace
+ partner, kids, mother refugee of war, language problems, alone - no parents/friends… d)
+ weekends without kids, married, healthy, standard of living… e) - celiac disease, neighbors, atopic dermatitis, two divorces… e)
+ birth of daughter, partnership, sports, rural surroundings… f) - contact with daughter/relationship with daughter's mother… f)
+ faith gives power, psychiatrist helps out with stress and burdens… g) takes care of aunt - no holidays, time squeezes, insomnia… g)
+ friends, music band - brother's suicide, best friend died, yearly deaths of friends… b)

Tables 3 and 4 clearly show that even extremely positive self-ratings do not necessarily indicate highly comfortable life circumstances or experiences, but may point to considerable harm or suffering.
Even respondents with self-ratings of ''10'' often report substantial psychological burden, including financial restrictions, health problems, unemployment, alcoholism, discrimination, death or life-threatening diseases of close relatives, and sadness. With decreasing self-rating, the NWB rating also becomes more critical, as expected.
Aggregating NWB categories 1-2, 5-7, and 8-11, the same sample could be described following Fig. 3: 23 % of respondents are living an authentically happy life without major problems (light-hearted); 20 % are judged as resilient (coping well in spite of substantial psychological burden); another 15 % as noticeably impaired but still with a positive tenor (still positive but impaired); a large proportion (28 %) seems rather balanced or close-lipped regarding positive and negative aspects; and finally, for 15 %, negative content seems to dominate the interview (negative balance). We believe that this is an informative, realistic and workable add-on to diagnoses like Fig. 1 or statements such as ''84 % report one's own LS of at least 8'' or ''the average LS is 8.54''.
What do these sample results tell us about the real-life meaning of the possible responses to the ''life satisfaction'' question? Table 2 suggests a rough translation of the rating scale values into the NWB scheme: 10 means ''light-hearted with probably still some impairment, or possibly just close-lipped''; 8 and 9 mean ''positive but impaired to a minor or substantial degree, resilient, or ups and downs balanced''; 7 means ''probably balanced, but could be more or less anything''. For the other, infrequently chosen values, the sample is too small to reliably assign characteristics. The few cases with LS ratings of 5 or 6 generally show marked negative tendencies in NWB, as is the case for ratings of 4, but strangely not for ratings of 3 or worse.

Conclusion
Linking the interviews to LS or happiness self-ratings clearly shows that many persons evaluate their lives very positively in spite of essential restrictions of their hedonic status (as narrated by themselves), an effect some researchers do not seem to be fully aware of (in particular according to feedback the authors received at conferences). Interpreting high self-ratings as representing highly pleasant psychological states would clearly underestimate the amount of suffering in the sample. Considering what people actually report in the interview, some high ratings seem mainly to express that their burden was not worse than normal, not unbearable, or not a reason to complain. Additionally, a considerable share of the cheerful self-evaluations is deemed implausible by external raters, who obviously tend to apply different criteria than the interviewees themselves.

Table 3 continued
a) + money for holiday, recognition from others, religion, nice place of residence, no problems with neighbors, participation in communities (choir)
b) - alcohol problem, unemployment, calcification of ligaments
c) + sports, social contacts, good marriage, kids, feeling healthy - sister's death (breast cancer), worries about kids' future, feels like an elephant compared to other women, financial restrictions, less time with husband
d) - mother's cancer, discrimination (no job, problems in finding a flat)
e) + work in contrast to mother role, surrounding - struggles with late-born kid due to unintentional pregnancy
f) + few foreigners (good for kids coming to school) - 2 month wheel chair (traffic accident), financial/time pressure (self-employment), movement of shops to shopping centers, crime rate due to immigrants, politics
g) + enjoys job, friends - bad experiences with men - one died, compassion with kids
Tables 3 and 4 contain some examples of comments from high self-raters which appear contrary to what most people would consider a good life.

Table 4 Self-rating versus NWB, and self-reported circumstances (Part II)
Self-rating: 8 and 7 (scale: 1 very dissatisfied to 10 very satisfied); NWB: 1 dominant depression to 11 light-hearted happiness
+ traveling, partner and friends, physical condition, close to city, nature for leisure time, quality of life in locality - mother died, bad relationship with stepmother, lack of contact to father, daughter's autoimmune disease, kids living far away
+ music, friends, nature - self-esteem, pressure in school, boring town, wants to make own decisions
+ partnership, studies, music, veganism, relatives - comparison with brother and others, two grandfathers died (leukemia, heart attack), motivational problems, lack of self-esteem, depressive mother, annoyed girlfriend
+ desired workplace with recognition from employer/parents, relationship with brother, friendships, own house, financially independent, nice country, sport, go for a walk to relax, newspaper, TV, leisure time, accepts oneself - worries about mother in hospital, green areas covered by buildings nowadays, less nursery school places because of immigrants
+ friends, own apartment with good access to public transport, enjoy time on one's own - comparison with others, being alone, failed exam, health problems (toes, spine, joints, chronic bladder infection, bronchitis, asthma)
+ sports career and engagement in sports association… b) - suicide of a friend's mother, father died, moved to mother's town
+ support from parents and grandparents, having moved a lot facilitates the setup of a circle of friends, routines like weekly meal with father, finished exams, friend to talk to - grandmother's operation, distance to partner, failed exams, thoughts about future, pressure from studies and parents, comparison with others, has to wear glasses
+ relationship with sister, contact to father - alcoholic stepfather, lots of disputes in childhood
+ partner - politics
+ family, daughter, education - war in home country, less money for same job than husband, small flat - searching for bigger one
+ nature and leisure activities - negative thoughts about deceit of others, stopped education due to parent's divorce
+ family, kids, quality of life in hometown - too few jobs, settlements away from town, criticizes politics (nursing care insurance) - has seen harvester-thresher accident with fire brigade - financial troubles, living together with parents, tinnitus, bad job opportunities due to chosen studies
+ culture, art, literature, music, language studies, talking to friends, go for a walk, living in green surrounding with kids - illness of daughter, troublesome partnership, employment… g)
+ contact to kids renewed, let ex-husband off the hook - mental breakdown, depressions, separation escalated into a war of the roses, lost contact to kids and conflict
+ child has mutism under control, kids, tai chi gong, go for a walk, setting up house plan, kids learning musical instrument… h) - sister's depression - less contact, daughter's mutism - lots of therapies, because of child care part time job - waits until they are older… h)
+ hiking, engagement in associations accompanied by friends, collecting mushrooms - back problems and related limitations in leisure activities… c)
+ partner and kids - mother's health condition, job and troubles with head, time pressure because of moving, poor sleep
+ kids' development, goals reached - no permanent job, divorce, no partner to share costs, early death of father, overweight
+ positive health report, new partner after death of former one, kids and grandchildren, good social system and retirement plan - need of care, cancer, death of partner who took care… e)
+ relatives and friends, former work in nursery school, made peace with stepparents during their lifetime - overweight due to frustration, not accepted by stepparents… f)
+ psychological and physiological condition, time with friends and activities in communities, living in harmony with others, locality - early death of father/mother/grandmother, no steady partner, only time for others, impersonal development of others (phone calls, big supermarkets instead of small ones, opinionated driving style)
+ surrounding and people - takes care of partner (dementia, often aggressive)
+ physical condition, house for whole family, own car, friends, dogs - unemployed (mobbed out of last job), dissatisfied, loss of motivation… i)
+ physical condition, sport, support from parents who take care of son - conflicts with parents living in the same house, noise, pollution from construction works by neighbors, bus connection, overweight
+ married, financial support from husband's parents, pharmacy studies - grandfather's death, eye operations, financial pressure, 2 years in foreign country/too big city - wants to go back home, no friendships

To sum up the results in the sample, good ratings should not be misinterpreted by researchers as indicating lives full of positive emotional experience, at least for some part of the population. Doing so will produce a positivity bias and artificially overestimate well-being. Contrasting the self-ratings with the NWB scheme, it becomes evident that only markedly positive ratings (8+) may be taken as indicating a clear dominance of positive over negative aspects, and even a part of these raters' lives is substantially impaired.
In principle, these results allow for a wide range of different explanations: (1) A common understanding of the meaning of the numbers on the self-rating scales exists, but some of the well-known self-serving biases interfere when it comes to rating one's own life. This view is supported by the fact that external raters evaluate systematically more critically than internal ones, and that many positive LS self-ratings lack plausibility. Moreover, the interviews contain ample evidence of downplaying negative circumstances; in fact, this is observed in about half of the interviews (Grünwald 2014). (2) There is no self-serving bias and the numbers fully capture what people feel, but the external interpretation that ''high rating = top life'' is too superficial because the transformation processes life → self-rating (within the subjective rater) and self-rating → life (within the observing researcher) are not sufficiently inverse to each other. Evidence for this or similar views lies, for example, in the fact that some respondents explained LS (''Zufriedenheit'' in German) as specific contentment with material or visible achievements (in contrast to the inner hedonic level), whereas others considered it just a modest form of happiness (for these interviewees, highly rated satisfaction means that a basic happiness level is reached, but in no way indicates an extreme appraisal). A theoretical framework for the heterogeneity of happiness concepts across individuals is presented by Rojas (2005).

Table 4 continued
a) + relationship with stepparents
b) + positive thinking, beautiful surrounding, relaxed mood compared to former times… b)
c) - overdrawn bank account, public transport options
d) + curiosity - arthrosis, minimum pension, conflict with sister, partner's/mother's death
e) - meniscus operation failed, bladder and laryngeal cancer, in wheelchair since feet operation, hometown and its infrastructure
f) - psychotherapy due to broken contact with kids/grandchildren, dislikes public attention due to partner's job, forthcoming carpal canal syndrome operation
g) - situation of husband, often tired, headache, stomachache
h) + nice people in nursery school, leisure opportunities, sustainable living (bio products, less car rides…) - no friends in town - housewife, wants to move (rent, commuting, heating costs), only ones who live sustainably, brothers and sisters do not get along, restrictions because of back problems, no support from husband due to his job, difficult to get bio products
i) - life often felt meaningless, feeling everything went wrong, no support by others, no positive feedback from family/friends

Fig. 3 A shortened narrated well-being categorization
It has to be noted that the various interpretations do not necessarily affect SWB comparisons between groups or trend evaluations, but they strongly affect the interpretation of absolute levels, as reflected in statements such as ''for most people, everything is ok'' or ''most people are happy''. We believe that categorization schemes which better differentiate between the positive states are more relevant for policy. It could even be detrimental to a society's progress if official institutions claim that a vast majority is doing well, as this may downplay the need for political action. Superficial pronouncements on a population's well-being may therefore be not only useless but even dangerous.
Are most people happy, after all? Judged from our sample, which shows the usual positive self-ratings, a more sophisticated statement is required. Roughly speaking, comfortable experience seems to outweigh negative experience for a majority of only about 60 %. But it is also true that about 55 % seem noticeably impaired regarding their hedonic state. And of the more privileged remaining 45 %, almost half feel well in spite of substantial problems.
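The arithmetic behind these rounded figures can be retraced from the category shares reported for Fig. 3. The following is a minimal sketch and not part of the original analysis: the grouping of categories into ''positive'', ''privileged'' and ''impaired'' is an assumption about how the percentages were formed, and the variable names are illustrative. Because the published shares are themselves rounded (they sum to 101), the results only approximate the 60 %, 55 % and 45 % quoted in the text.

```python
# Shares (in % of respondents) of the shortened NWB categories, as reported for Fig. 3.
shares = {
    "light-hearted": 23,
    "resilient": 20,
    "still positive but impaired": 15,
    "balanced or close-lipped": 28,
    "negative balance": 15,
}

total = sum(shares.values())  # rounded shares, so not exactly 100

# Assumed grouping: positive experience outweighs negative experience.
positive = (shares["light-hearted"] + shares["resilient"]
            + shares["still positive but impaired"])

# Assumed grouping: the "more privileged" who feel well overall.
privileged = shares["light-hearted"] + shares["resilient"]

# Assumed grouping: everyone else counts as noticeably impaired.
impaired = total - privileged

# Share of the privileged group that feels well despite substantial burden.
resilient_within_privileged = shares["resilient"] / privileged

print(f"total of rounded shares: {total} %")
print(f"positive balance: {positive} %")
print(f"privileged: {privileged} %, impaired: {impaired} %")
print(f"resilient among privileged: {resilient_within_privileged:.0%}")
```

Under these assumptions the sketch gives 58 % with a positive balance, 43 % privileged and 58 % impaired, with just under half of the privileged group classified as resilient, which is in line with the rough figures given above.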

Limitations
These first results strongly suggest that quantitative ratings of SWB need to be evaluated and interpreted very carefully. Future studies need to confirm these findings, since the sample cannot be considered fully representative of the Austrian 16+ population, much less of populations in other countries. In particular, it cannot be concluded automatically that our results generalize to languages other than German, or to other cultures. Thus, the study cannot show how self-ratings work in general. On the other hand, there is no evidence that the relation between self-rating and life circumstances, or the underlying psychological processes, would be completely different in other countries (acknowledging that the translation of LS and happiness into national languages may create semantic differences).
The fundamental question arises as to whether it is at all possible to validate subjective life evaluation by external ratings of life narratives. Maybe an interview does not cover the really relevant aspects of life, or the emotional consequences of a narrated life are not accessible via external rating (moreover, the external ratings on the traditional 10-point scale could be negatively biased). Fortunately, there is empirical evidence that the discrepancy between NWB and self-rated LS does not merely reflect an impossibility in principle of judging a life from outside: for example, in spite of the handicap of being evaluated by another person, NWB correlated even better with the self-rated item ''living in harmony with oneself'' (0.4) than the (also self-rated) LS did (0.3). NWB does capture some aspects of subjective experience, but is far from being identical to the LS question, and should not be considered less relevant given the results in Tables 3 and 4. In any case, if a lengthy self-report does not provide the basis for a valid impression of a person's hedonic state, it is hard to assume that a few closed responses do. Note that our results do not at all question the honesty of the individual's response, but rather any kind of naïve interpretation of the chosen rating value by an external observer.
Replication of these analyses in languages other than German or in different cultures would be highly desirable, as would additional applications of qualitative techniques or more sophisticated questionnaires, in order to find out more precisely what respondents are actually telling us when they rate their lives.

Implications
How, consequently, should SWB assessment improve? Further development in two directions is recommended: (1) SWB studies should involve more qualitative information to lay a more solid and validated foundation for their assessment instruments, and (2) restricted inventories of question types should be avoided, by moving beyond overall evaluations of life or life domains to construct new items or response categories with more explicitly defined content. This approach would also be a step towards more person-oriented research in the sense of Bergman and Magnusson (1997), acknowledging the importance of considering many components on the individual level simultaneously. In fact, the results demonstrate an essential gain in knowledge from including an idiographic, qualitative component in the assessment procedure (while still allowing for quantitative classification). Diener and Fujita (1995) observed on a quantitative level that resources correlate more closely with SWB if they are deemed more important by the individual under consideration. Our results also fit well with Kim-Prieto et al.'s (2005) integrative model for the various stages of the evaluation process, ranging from events and circumstances, experienced emotions, and recalled emotions to the final global valuation, whereby personality can play a moderating role at each of the stages, since different persons experience different events, react and recall differently, and use different criteria for the global judgment.
The proposed approach would, of course, require intense effort (including human resources and interdisciplinary working teams), and it seems particularly unlikely that national surveys would include qualitative interviewing, but it should certainly be considered for supplemental studies. From a scientific point of view, such an approach is substantially more promising than running an assessment that ranks so highly on the political agenda without firm qualitative evidence. A major disaster in SWB research would be if people were not doing well but researchers failed to recognize it. Our results strongly suggest that relying exclusively on standard self-rating questions will not protect against this danger.