1 Introduction

1.1 “All happy families are alike; each unhappy family is unhappy in its own way”

This famous first line of Anna Karenina (Tolstoy, 1877/2014) has left the literary realm and become the subject of active scientific research. Much of this work has been spearheaded by psychologists, who have found evidence that we tend to both (a) perceive positive cases (e.g., happy families) as more similar to each other than their negative counterparts and (b) judge more similar cases to be more positive than their less similar counterparts. Several aspects of this observation remain less clear, however. First, to what extent do these subjective perceptions reflect objective properties of the environment? To use Tolstoy’s example, are happy families really more alike, or does something about their being happy lead people to an illusory perception of similarity (or does something about their similarity perhaps lead to a biased judgment that they are happier)? Second, to what extent do these observations, whether real or illusory, generalize beyond happy families to more general assessments of positivity? Making this assessment particularly hard is the arguably inherently subjective nature of both positivity and similarity. It is not obvious how to define positivity without reference to evaluation from the human mind, and there is widespread debate in psychology about how to measure similarity and the extent to which it is a subjective metric stemming from a cognitive process or a representation of something external and more absolute (Alves et al., 2017). In this paper we borrow ideas from non-psychological research on the convergence of positivity (Diamond, 1997; Zaneveld et al., 2017) and propose a practical solution to these challenges. In a series of five studies we demonstrate that Tolstoy’s observation about families might be objectively correct about positive psychological outcomes more generally, at least in the realm of individual differences.
We also discuss possible mechanisms behind this phenomenon, speculate about its scope, and consider implications for psychometrics, the process of scientific discovery, and our understanding of complex systems.

Understanding the differences between good and bad, pleasure and pain, reward and punishment has been a central thread in philosophical and religious texts for millennia, and, consequently, it has turned into a central topic in psychology since its inception as an empirical science. Two reviews covering decades of research on the role of valence in psychology found a fundamental psychological advantage of negative information (Baumeister et al., 2001; Rozin & Royzman, 2001). For example, negative events are remembered more easily (Skowronski & Carlston, 1987), negative words are recognized more accurately (Hansen & Hansen, 1988), proximity enhances the effects of negative stimuli more strongly (Brown, 1948), negative events result in more extreme or elaborate causal and moral judgements (Alicke, 2000; Kahneman & Miller, 1986; Knobe, 2003; Weiner, 1985), mixes between positive and negative features are evaluated as more negative than positive (Rozin & Royzman, 2001), negative behaviors are much more likely to lead to illusory correlations and stereotyping than positive ones (Mullen & Johnson, 1990), and losses loom larger than gains (Kahneman & Tversky, 1979).

More recently, the earlier conclusion that psychologically “bad is stronger than good” has been extended and partially replaced by a novel line of research, which can be summarised as “good is more alike than bad” (Alves et al., 2017). While accepting the role of motivational and affective factors when processing valenced information, it was proposed that valence also affects how information is represented in memory (Unkelbach et al., 2008). The authors suggested that the semantic space is more dense toward the positive pole, and that positive items, events or concepts are closer to each other in the mental space.

There has been a substantial amount of evidence to support this hypothesis. Compared to their negative counterparts, positive concepts are stronger primers for other positive concepts (Unkelbach et al., 2008), positive words are judged to be more similar to other positive words (Unkelbach, 2012), and both positive emotions (Schrauf & Sanchez, 2004) and positive words (Koch et al., 2016) cover narrower semantic space. In addition, when we are evaluating other people, we perceive likable people as more similar to each other than less likable people (Alves et al., 2016). Alternatively, when we create likable personality profiles we tend to converge on the same desirable features, but when creating negative profiles we do not converge on the same undesirable features (Borkenau & Leising, 2016; Wood & Furr, 2016). Greater semantic density of positivity is also reflected in language: while people tend to talk about positive events more than negative ones (Boucher & Osgood, 1969), the number of terms describing negative emotions exceeds the number of terms describing positive emotions (Rozin & Royzman, 2001), leading to a greater differentiation between negative states.

Indeed, people tend to perceive average stimuli (an indicator of similarity) more positively than stimuli which are further from the center of a particular dimension. This pattern has been observed for multiple categories, ranging from human faces to cars (Winkielman et al., 2006). This suggests that perhaps the perception that happy families are more similar than unhappy families stems from a cognitive bias to view similar families more positively than dissimilar families rather than (or in addition to) a tendency for more positive things to be–or at least appear–more similar.

1.2 The Role of the Environment

The previous work on negativity bias, and the more recent work on convergence of positivity have emphasized that in one or another form the environment is at least partially responsible for the observed asymmetries in mind and language. There is plenty of evidence that statistical patterns influence our perception of positivity. Increasing the frequency of stimuli (Zajonc, 2001) or the averageness of a geometric pattern (Winkielman et al., 2006) results in more positive evaluations, and changing the range of arbitrary stimuli can selectively lead to tighter or looser mental clustering regardless of the valence (Shin & Niv, 2021). Similarly, manipulating the statistical distribution of gains and losses can decrease, or even reverse loss aversion (Walasek & Stewart, 2015). If positive events have different statistical patterns in the environment, those statistical patterns might lead to convergence of positivity in the mind.

But is there a convergence of positivity in the environment? Are positive items, events or people more alike than their negative counterparts? Leading psychologists who have studied happiness for decades have been sceptical:

“Calling a notion a principle need not make it so. I prefer to regard the Anna Karenina Principle as a hypothesis to be tested. While it may hold in some cases, it likely does not hold in all or indeed most cases.” (Peterson, 2012).

While there is ample evidence that people perceive positively valenced cases to be more similar than their negative counterparts and, from the other direction, evaluate more similar cases more positively, to the best of our knowledge there is no systematic work testing whether these subjective assessments are associated with objectively more similar or more positive cases. There is, however, a suggestive mix of empirical and anecdotal evidence. In one study (Tumminello et al., 2011), the homogeneity of answers was computed in a large questionnaire among different groups of older adults. The authors found that the happier the participants were, the more similar their answers to the questions were. The work provides some evidence that happier people might be objectively more alike than less happy people. Alternatively, research has found that average faces are not only judged more attractive, but are also predictors of better health (Little et al., 2011), suggesting that more similar people (to the extent averageness is a good proxy for greater similarity to others) are objectively more positive (at least with respect to good health) than less similar people. There is also some evidence that this pattern could be observed in domains which are not linked to happiness, attractiveness, or human evaluation. A study on animal microbiomes (Zaneveld et al., 2017) found that microbial communities in stressed or diseased individuals do not form a new, “unhealthy” cluster, but spread around healthy controls, leading to a convergence of positivity pattern. The authors further note that the pattern is ubiquitous, yet underappreciated, “easily missed or discarded by some common workflows, and therefore probably underreported” (Zaneveld et al., 2017).

1.3 Potential Mechanisms

In this work we present a targeted testing of the convergence of positivity hypothesis in regard to psychological similarity between humans. Before we describe our studies and results, we will review four potential mechanisms which have been proposed as possible explanations of convergence of positivity patterns. For a graphical representation of the four mechanisms see Fig. 1.

  1.

    Asymmetric distributions: Many authors have noted that if events are positioned on a valence scale, positive events will be more frequent than negative events, yet negative events have a greater range and variance. An illustration for this range asymmetry is that extreme pain is a much stronger subjective experience than extreme pleasure (Baumeister et al., 2001; Rozin & Royzman, 2001). In other words, there is larger negative space and smaller positive space and the positive space is more densely populated. It is easy to see how this asymmetry results in a greater proximity of positive items, where the distance between two randomly chosen items will be smaller on average for positive than for negative pairs.

  2.

    Averageness and homeostatic models: Aristotle claimed that goodness is in the lack of extremes, and recently this principle was proposed as a likely explanation of the environmental factors behind the psychological convergence of positivity (Alves et al., 2017; Unkelbach et al., 2021). The relationship between a dimension in the environment and subjective valence often has an inverted U shape, where the positivity first increases, then peaks, and then starts to decrease. An example is the narrow range of comfortable temperature, where discomfort is associated with too high or too low temperatures. Since a negative state in a homeostatic model can be on either side of the comfort zone, on average the distance between negative states is expected to be higher than between positive states. Alternatively, averageness might be related to a joint maximization of multiple factors. For a given sport, for example, both strength and speed might be important, leading to an optimal body weight, which balances both requirements.

  3.

    Many necessary but few sufficient conditions: In evolved and other complex systems, proper functioning overall might tend to involve several necessary conditions, with failure resulting from even one unmet requirement. This idea has been referred to as the chain principle (Peeters & Czapinski, 1990; Weinberg, 1975) or as the Anna Karenina Principle (Diamond, 1997), and has been proposed as an explanation of why independent cases of animal and plant domestication, for example, have been so rare in human cultural evolution. Since there are very few ways to succeed, but many ways to fail, less successful systems are more different from each other than more successful systems. Another example comes from evolutionary genetics, where it is well known that a random mutation is much more likely to be deleterious for the organism than to be beneficial (Desai & Fisher, 2007).

  4.

    Cultural Niche Construction and Outliers: The second mechanism emphasized the positivity of the average; this and the first mechanism emphasize the ways in which positive outliers, positives on the extreme tail of a distribution, may be more similar to each other. It should be uncontroversial that cultural groups seek to promote what they consider good, and so it may not be surprising that over time cultural entities construct institutions and other built environments that are carefully fit to assess and promote those culture-specific valued goods (Laland & O’Brien, 2011). Medin et al. (2010) point to this as a possible problem for research in cultural psychology which (to the extent cultural psychologists come from or are asymmetrically influenced by North American or Western conceptions of psychology) may lead to psychological measures that are better fit to participants from the field of psychology’s original and current dominant home, Europe and North America, respectively (Henrich et al., 2010). Bennis and Medin (2010) suggest this as a possible explanation for Henrich and colleagues’ finding that WEIRD people (research participants from Western, Educated, Industrialized, Rich, Democratic societies) seem to be psychological outliers: perhaps WEIRD people are not inherent outliers, but are just outliers given the culturally constructed measures of positivity that correspond in turn to institutions and other built environments that select and promote people who best fit those standards of success. The mechanism here might be facilitated by the previous mechanism, whereby there are many necessary and few sufficient conditions for success, but it points to a process of cultural niche construction whereby environments are built to identify and enhance culturally marked positives, sculpting and refining narrow extremes as ideals for success.
Note that, unlike the previous three potential explanations, this mechanism might account for convergence not just on the positive side, but also a convergence of negativity, at least for negatives that are culturally marked, and in turn associated with institutions and other built environments that seek to identify (and reduce) people who fit those negatives, such as tools for identifying suicide risk, terrorism threats, or psychological disorders.

Fig. 1

Graphical representations of the four potential mechanisms which can lead to the convergence of positivity. p1 and p2 are randomly selected positively valenced events, entities, or subjects, and n1 and n2 are randomly selected negative ones. The red–black scale represents the positive and negative regions of the valenced space, respectively. In examples A, B.1, B.2 and C, |p1, p2| < |n1, n2|, where |x, y| is the distance between x and y. A. Asymmetric Distributions: The density of positive events or entities is higher, so two randomly selected positive examples are more likely to be closer in a given semantic space than two negative ones. B.1 Averageness: The middle space of a particular dimension (e.g., temperature) has positive valence, while the two extremes have negative valences. B.2 Joint Optimization: Preference for middle areas might be related to joint optimization of two or more factors. For example, for a given environment, both strength and speed might increase fitness, yet they might be inversely related to body weight, and their joint optimization leads to increased positivity in the center compared to the extremes. C. Necessary Conditions for Success: f1 and f2 are hypothetical features, and they are both necessary to have a positive example (p1 and p2), so the distance |p1, p2| = 0. Negative examples, on the other hand, have only some of the necessary features and are therefore more likely to be further apart than positive examples are. D. Cultural Niche Construction: p1 and p2 are members of a cultural group, while n1 and n2 are not members of this group. d1 and d2 are hypothetical dimensions which are not directly measured. In the original (d1, d2) space |p1, p2| > |n1, n2|. Due to a hypothetical cultural process, instead of considering the original (d1, d2) space, the p1, p2 group derives a new, culturally salient dimension, d', and creates metrics which compare subjects based on their d' projections, where |p1', p2'| < |n1', n2'|.
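The geometric intuition behind mechanisms B.1 and C can be checked with a small simulation. The following is a minimal sketch and is not part of the analyses reported in this paper: the comfort-zone boundaries (0.25 and 0.75), the number of binary features (four), the sample size, and all function and variable names are assumptions chosen purely for illustration.

```python
import itertools
import random


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(itertools.combinations(points, 2))
    total = sum(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for p, q in pairs
    )
    return total / len(pairs)


rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Mechanism B.1 (averageness): valence is positive only inside a central
# "comfort zone" of a single dimension scaled to [0, 1] (assumed bounds).
values = [rng.random() for _ in range(400)]
comfortable = [(v,) for v in values if 0.25 <= v <= 0.75]
uncomfortable = [(v,) for v in values if v < 0.25 or v > 0.75]
d_pos_b = mean_pairwise_distance(comfortable)
d_neg_b = mean_pairwise_distance(uncomfortable)

# Mechanism C (necessary conditions): a case counts as positive only if
# all four hypothetical binary features are present.
cases = [tuple(rng.randint(0, 1) for _ in range(4)) for _ in range(400)]
successes = [c for c in cases if all(c)]
failures = [c for c in cases if not all(c)]
d_pos_c = mean_pairwise_distance(successes)
d_neg_c = mean_pairwise_distance(failures)

print(f"B.1 positive: {d_pos_b:.3f}  negative: {d_neg_b:.3f}")
print(f"C   positive: {d_pos_c:.3f}  negative: {d_neg_c:.3f}")
```

Under these assumptions the negative cases end up further apart in both scenarios: in B.1 because they can fall on either side of the comfort zone, and in C because all successes share the same feature profile while failures can miss any combination of features.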

1.4 Methods

Testing the convergence of positivity outside the mind faces the two-fold challenge of how to define positivity and similarity in ways that can be measured independently of subjective evaluations. Here we propose a practical approach which can help us with our empirical quest. Furthermore, we distinguish between at least two components of positivity. The first one is the hedonic aspect of positivity and it is conditional on the presence of subjective evaluation. For the current purposes, we will limit these subjective evaluations to human judgements. For example, in Tolstoy’s quote, the positivity of the families might be defined as self-reported well-being, or as the rating by an external human observer. Note that most of the psychological research on valence-based asymmetries in general, and on convergence of positivity in particular, has been focused on this hedonic aspect of positivity. However, researchers who have studied convergence of positivity in non-psychological contexts have used more flexible definitions, related to the direction of change or the preferred state in a complex system. For example, when trying to understand animal domestication, we might define positivity as the evolutionary success of animals in the context of human civilization (Diamond, 1997). When studying the symbiotic communities of microorganisms, we might define positivity as the health of the microbiome’s host (Zaneveld et al., 2017). When trying to understand societal patterns, we might use socio-economic status, or health as measures of positivity (Adler & Ostrove, 1999). In the context of data science, we might define positivity of a data structure as the ease with which it can be used directly by an analyst (Wickham, 2014). To capture these diverse meanings associated with positivity, we introduce a fitness component, which unlike the hedonic component will not necessarily be conditional on a direct subjective evaluation.
Instead, it will define the directionality of the movement or preferred state of the entity via other metrics. Determining the relevant fitness component of positivity will depend on the particular complex system we are studying and on the particular research question we are trying to answer. Having two interpretations of positivity will allow us to examine a broader set of domains and eventually to search for converging evidence. In some cases only the hedonic component of positivity might be available for analysis, in other cases only the fitness components might be available. When both components are present, in general we expect them to lead to similar conclusions, yet we treat this as an empirical question which might lead to different answers in different domains or datasets.

It is worth noting that our definition of positivity is very broad, and clearly leaves a lot of space for subjective interpretation on the side of the researcher. For example, it assumes a general direction of movement in the complex system and correspondence between different operationalizations of the hedonic and the fitness components of positivity. It will be very challenging, or may be impossible, to define this in the general case, since different systems might have their own idiosyncrasies, and the same system might have different short and long term goals. A classical example comes from research on rats (Bozarth & Wise, 1985; Routtenberg & Lindy, 1965), where rats allowed to stimulate directly the hedonic component of their motivational system often die from starvation or from exhaustion. Another example comes from research on pain. While people are motivated to avoid pain, and pain is a proverbial example of a negative experience, physiological insensitivity to pain is a life-threatening condition, with people often dying in childhood from injuries or illness (Linton, 2005). So while pleasure and pain are in general aligned with the environmental success of an organism, this alignment might depend on the level of analysis, the time scope, and in some cases it will also depend on the particular context or research question.

Furthermore, as the mechanism of cultural niche construction mentioned above might suggest, an unfortunate consequence of the convergence of positivity might be that similarity to the majority or dominant culture might be part of what gets selected for in defining and promoting positivity. To the extent that a high degree of positivity is culturally identified and nurtured and over time ratified such that only a selective few can achieve it, and to the extent there is a convergence of positivity such that the people who are able to achieve it must be more alike across other variables as well, it may serve as a barrier to minority and less powerful other groups that did not play a central role in defining and promoting success. We should note that the measures of positivity used in this paper are not outside the scope of this criticism. Rather than endorsing these standards, we are simply calling attention to their wide use as standards of positivity. To the extent they correspond to increased similarity with respect to other traits, it would – in our opinion – be a mistake to conclude that these other traits are important to success. Rather, we suspect the process of cultural niche construction may have led to an unjust preference for irrelevant traits that have been selected for by members of cultural institutions overfitting their tools to the individuals most like themselves.

The second big challenge for a broader testing of the convergence of positivity principle is defining and applying a distance or similarity metric which can allow us somehow to measure proximity. Similarly to valence, in psychological studies similarity is easy to measure or manipulate. People readily provide subjective similarity judgments and their estimates are fairly predictable (Nosofsky, 1989, although they might violate various distance assumptions, e.g., Tversky, 1977). In addition, in lab studies it is rather easy to measure and control for objective similarity between stimuli. Measuring similarity in general, however, is not possible, and it will again depend on the particular system being studied and on the concrete research question we have. We will list some of the challenges for applying a similarity metric to the general case, and the precautions we implemented to avoid them.

  • There are dozens (Brusco et al., 2021) of distance/similarity metrics to choose from, and some of them might return very different results. Using a consistent metric across studies helps to avoid this problem. In the present work we use Euclidean distance across all tests.

  • Similarity is always measured with respect to something, and two entities might be highly similar or highly different depending on what dimensions or features the metric is based on (Goodman, 1972; Murphy & Medin, 1985). This is a challenge, but does not invalidate similarity as a scientific construct (Medin et al., 1993). Choosing well established scales or metrics which are directly relevant to the particular research question helps to avoid this problem.

  • Another aspect of similarity is that it can be based on distances in a multidimensional space, or on the relationships between the dimensions within this space. This distinction is often referred to as surface similarity versus structural similarity (Gentner & Markman, 1997). Here we take the simplest approach, and look at surface similarity only (see Footnote 1).

  • Similarity might be sensitive to the level of resolution of the features included. For example, a set of books in a section in the library might be very different with respect to language, but if we use an English versus foreign classification rule, then the previously different books will become very similar.

  • Similarity measures can be affected by scale transformations or scale design. For example, income is often studied on a logarithmic scale, and using transformed versus raw data could easily lead to opposite conclusions regarding distances. Alternatively, some scales are intentionally designed to follow a particular distribution (e.g., IQ scores), and changing the scale creation procedure might result in over- or under-weighting distances. In our studies we use existing metrics, created by other researchers, and we do not run any scale transformations.

  • In the context of testing the convergence of positivity hypothesis, it is also important to define which groups are being compared in terms of similarity. When Tolstoy referred to “happy families” did he mean the upper half versus the lower half of the population, or did he mean the top n% and the bottom n% on some hypothetical happiness scale? Different splits might lead to different answers. To avoid this problem, across our studies we use a conservative 50/50 split (see Footnote 2).
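The point about scale transformations in the list above can be made concrete with a toy calculation. This is a minimal sketch using invented income values (they are not taken from any dataset in this paper), and the helper name `mean_pairwise_gap` is hypothetical. It computes the difference between the low-group and high-group average distances on the raw scale and again on a logarithmic scale.

```python
import itertools
import math


def mean_pairwise_gap(values):
    """Average absolute difference over all unordered pairs of values."""
    pairs = list(itertools.combinations(values, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)


# Hypothetical incomes, invented for illustration only.
low_income = [1_000, 4_000, 10_000]
high_income = [100_000, 120_000, 150_000]

# Raw scale: the high-income group is far more spread out, so the
# low-minus-high difference is negative.
raw_score = mean_pairwise_gap(low_income) - mean_pairwise_gap(high_income)

# Log scale: the same high-income values now sit closer together than
# the low-income values, so the difference changes sign.
log_low = [math.log(v) for v in low_income]
log_high = [math.log(v) for v in high_income]
log_score = mean_pairwise_gap(log_low) - mean_pairwise_gap(log_high)

print(f"raw-scale difference (low minus high): {raw_score:.1f}")
print(f"log-scale difference (low minus high): {log_score:.3f}")
```

With these particular values the sign of the difference flips between the two scales, which is why the choice of transformation has to be fixed before distances are compared.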

All these considerations related to defining positivity and similarity make a targeted testing of the convergence of positivity hypothesis a rather difficult task. The fact that we are relying on existing scales points to the fact that there might be systematic patterns in the scales themselves that reflect cultural norms or biases that make distributions on the positive side more similar than distributions on the negative side. This work is just a first step in exploring the question of whether there is a convergence of positivity when relying on external measures rather than just with respect to preferential choice or subjective judgments of similarity. It may well be that the existing assessment tools themselves are systematically biased in ways that correspond to some of the issues in the list above. Right now the initial question as to whether there is a convergence of positivity at all needs to be addressed. Once that is established, future research remains to address the many possible mechanisms behind that convergence, including systematic tendencies in the measurement tools used to assess similarity. We believe that such an endeavor can be very useful for generating novel knowledge and for understanding complex systems and our roles in them better. In the next part we present five studies which test the convergence of positivity hypothesis in the domain of individual differences.

1.5 Hypotheses and Research Spaces

Broadly speaking, if we define two groups of individuals, where one group is closer to some positive pole, and the other group is further away from the positive pole, the convergence of positivity hypothesis suggests that the individuals in the positive group will be closer to each other than the individuals in the less positive group. In this paper we are narrowing down this broad prediction by focusing on psychological individual differences. When studying individual differences, psychologists often focus on personality, and the most widely used tool for studying personality is the Big Five model, which suggests that a set of stable characteristics describing humans can be represented in a 5-dimensional space (Gosling et al., 2003).

As such, the first research space in which we will test the hypothesis will be defined by the Big Five model:


H1: Individuals in the high positivity group will be closer to each other in the Big Five space than individuals in the low positivity group.

While personality models were primarily designed to capture universal patterns, largely ignoring group membership, other individual-differences models were designed to capture variance which can be modeled while explicitly taking cultural and societal norms into account. A central example for such a model is Schwartz’s theory of Basic Human Values. The theory defines a 10-dimensional human values space, which leads us to the next hypothesis:


H2: Individuals in the high positivity group will be closer to each other in Schwartz’s values space than individuals in the low positivity group.

The first two hypotheses are narrowly focused and define the research space based on some of the most popular tools for assessing individual differences. We also propose a less restrictive hypothesis, defining research spaces as conceptually related questions targeting psychologically relevant individual differences which can be represented on a number axis. This hypothesis is exploratory, and we do not have a priori an explicit set of domains. Instead, this hypothesis will be aimed at utilizing any additional data that we might encounter while testing our two main hypotheses.


H3: Individuals in the high positivity group will be closer to each other in a conceptually coherent research space than individuals in the low positivity group.
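The hypotheses above share a single formal shape, which can be written compactly. The notation below is introduced only for illustration and is an assumption rather than notation taken from the original analyses: $G$ is a group of $n_G$ individuals, $\mathbf{x}_i \in \mathbb{R}^k$ is the position of individual $i$ in a $k$-dimensional research space (e.g., $k = 5$ for the Big Five and $k = 10$ for Schwartz's values), and distances are Euclidean, as stated in Sect. 1.4.

```latex
% Mean pairwise Euclidean distance within a group G
\bar{d}(G) = \frac{2}{n_G (n_G - 1)} \sum_{i < j} \lVert \mathbf{x}_i - \mathbf{x}_j \rVert_2 ,
\qquad i, j \in G

% Convergence-of-positivity score: low-positivity minus high-positivity group
C = \bar{d}(G_{\mathrm{low}}) - \bar{d}(G_{\mathrm{high}})

% Each hypothesis predicts a positive score in its research space
C > 0
```

Written this way, the hypotheses differ only in which research space supplies the coordinates $\mathbf{x}_i$, while the positivity metric determines how individuals are assigned to $G_{\mathrm{high}}$ and $G_{\mathrm{low}}$.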

1.6 Defining Positivity

For the purposes of this paper we operationalized the two components of positivity as follows:

  • Hedonic: Well-being or life satisfaction.

  • Fitness1: General health.

  • Fitness2: Income.

It is important to note that all of these measures of positivity are partly cultural. Cultural groups differ in terms of what goes into evaluations of health, well-being, and life-satisfaction, and in terms of how highly-valued (positive) these variables are (including income). As already noted, the convergence of positivity might in part be explained by the culture-bound components of these measures. This research should not be seen as a claim that the positivity of these measures transcends culture. Indeed, to the extent cultural components contribute to greater similarity among people who score positively on these outcomes, the findings may provide insight into some of the least-visible cultural barriers to success among those who do not fit the majority-culture mold.

1.7 Overview of Studies

To test our hypotheses we ran a series of studies, which looked for a relationship between various proximity metrics and positivity in its two components, hedonic and fitness. Table 1 presents an overview of the studies and the main variables of interest.

Table 1 An overview of the five studies presented in Sect. 2: main variables, sample sizes and data sources

2 Results

2.1 Study 1: HILDA Personality

Data: We searched through various publicly available sources, with the goal of finding a large dataset which contains personality scores as well as some, or ideally all, of the positivity metrics we outlined above. We found the HILDA dataset, which is a longitudinal study of Australian households, containing data from approximately 17,000 subjects, studied over the period 2001–2016. For three of the years (2005, 2009, 2013), the study included a Big Five personality assessment, as well as a question about income, and scales about general health and emotional well-being.

Analyses: We averaged the data across the time period (2005–2013), and computed mean scores for each of the positivity metrics and for each of the five personality dimensions. Next, we split the participants into high and low positivity groups based on the hedonic metric. Then we computed the average distance in the personality space between all members of the high positivity group and then we repeated the procedure for all members of the low positivity group. The average distance in the high positivity group was reliably smaller than in the low positivity group (p < 0.005, permutation test). The same procedure was applied to the other two positivity metrics. Both for health and for income the high positivity group participants were closer to each other in the personality space than those in the low positivity group. See Fig. 2.A and the Supplementary Materials (SM) for more details. The Pearson’s correlations between the three positivity metrics are presented in Table 2. These results provide support for H1.
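The procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used for the reported analyses: the data are synthetic (profiles of the simulated high scorers are drawn with less spread by construction), the number of permutations is kept small for speed, and the add-one correction of the p-value, the handling of ties at the median, and all function names are choices made here for illustration.

```python
import itertools
import random
import statistics


def mean_pairwise_distance(points):
    """Average Euclidean distance over all unordered pairs of points."""
    pairs = list(itertools.combinations(points, 2))
    total = sum(
        sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5 for p, q in pairs
    )
    return total / len(pairs)


def convergence_score(profiles, positivity):
    """Low-group minus high-group mean pairwise distance (median split)."""
    cutoff = statistics.median(positivity)
    high = [p for p, s in zip(profiles, positivity) if s > cutoff]
    low = [p for p, s in zip(profiles, positivity) if s <= cutoff]
    return mean_pairwise_distance(low) - mean_pairwise_distance(high)


def permutation_p_value(profiles, positivity, n_perm=200, seed=0):
    """Two-sided permutation test: shuffle positivity scores across people."""
    rng = random.Random(seed)
    observed = convergence_score(profiles, positivity)
    shuffled = list(positivity)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(convergence_score(profiles, shuffled)) >= abs(observed):
            hits += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


# Synthetic data: 80 simulated people with five trait scores each.
# Higher positivity is paired with a smaller trait spread by construction.
rng = random.Random(1)
positivity = [rng.random() for _ in range(80)]
profiles = [
    tuple(rng.gauss(0.0, 1.5 - s) for _ in range(5)) for s in positivity
]

score, p_value = permutation_p_value(profiles, positivity)
print(f"convergence score: {score:.3f}, permutation p: {p_value:.4f}")
```

A positive score means that the low positivity group is more dispersed than the high positivity group; the permutation step asks how often a difference at least that large would arise if positivity scores were unrelated to the profiles.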

Fig. 2

Results from single-domain analyses in Studies 1–3. A. Study 1: HILDA personality. B. Study 2: WVS Schwartz’s basic values. C. Study 3: HILDA Cognitive Assessment Task. D. Study 3: HILDA Time Use. The y-axis represents the positive convergence scores (averaged distances for the low positivity group minus the averaged distances for the high positivity group), where greater values indicate greater support for the convergence of positivity hypothesis. For details, see the main text. The x-axis represents the three positivity metrics used to split the samples. The error bars represent ± 1 standard error of the mean based on bootstrapped estimates for panels A, C and D, and on country-level aggregation (n = 60) for panel B. The significance estimates were based on permutation tests for panels A, C and D, and on a one-sample t-test for panel B. Significance levels: * p < .05, ** p < .01, *** p < .005, two-tailed.

Table 2 Pearson correlations between the three positivity metrics from the HILDA and WVS datasets

2.2 Study 2: The World Values Survey and Schwartz’s Theory of Basic Human Values

Data: Next, we searched for a dataset which had the Schwartz’s values measures as well as some, or all, of the relevant positivity metrics. We found the World Values Survey (WVS) Wave 6 dataset, which contained data from about 89,000 participants from 60 countries, collected in the period 2010–2014.

Analyses: First, for each country, we split the participants into high and low positivity groups based on the hedonic metric, and we computed a convergence of positivity score as the difference between the distances in the low positivity group and the distances in the high positivity group. Based on H2, we expected this score to be reliably higher than 0. We averaged the computed scores across countries and found that the mean convergence of positivity score was significantly greater than 0 (p < 0.05). We repeated the same analysis for health and for income, reaching the same conclusion. The individuals in the high positivity groups were closer to each other in the Schwartz’s values space than the individuals in the low positivity groups, and this pattern was directionally the same for all three positivity metrics. The results are presented in Fig. 2B, and more details about the data and the analyses are provided in the SM. This study provides support for H2.
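The country-level procedure can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the within-country median split, the Euclidean distance, and the function names are hypothetical, and the data are synthetic rather than drawn from the WVS. Only the t statistic and its degrees of freedom are computed here; a p-value would additionally require the t distribution.

```python
import numpy as np


def mean_pairwise_distance(points):
    """Mean Euclidean distance over all unordered pairs of rows in `points`."""
    i, j = np.triu_indices(len(points), k=1)
    return np.linalg.norm(points[i] - points[j], axis=1).mean()


def country_convergence_scores(values, positivity, country):
    """Per-country score: low-group mean distance minus high-group mean distance."""
    scores = {}
    for c in np.unique(country):
        in_country = country == c
        v, pos = values[in_country], positivity[in_country]
        # Assumed within-country median split; with tied ordinal responses
        # the two groups would not be of equal size.
        is_high = pos > np.median(pos)
        scores[int(c)] = mean_pairwise_distance(v[~is_high]) - mean_pairwise_distance(v[is_high])
    return scores


def one_sample_t(sample):
    """t statistic and degrees of freedom for H0: population mean equals 0."""
    x = np.asarray(sample, dtype=float)
    t_stat = x.mean() / (x.std(ddof=1) / np.sqrt(len(x)))
    return t_stat, len(x) - 1


# Synthetic illustration only (not WVS data): 12 hypothetical countries with
# 80 respondents each and a ten-dimensional values space.
rng = np.random.default_rng(2)
country = np.repeat(np.arange(12), 80)
positivity = rng.normal(size=country.size)
spread = np.where(positivity > 0, 0.7, 1.0)  # higher positivity, smaller spread
values = rng.normal(size=(country.size, 10)) * spread[:, None]

scores = country_convergence_scores(values, positivity, country)
t_stat, df = one_sample_t(list(scores.values()))
print(f"mean score = {np.mean(list(scores.values())):.2f}, t({df}) = {t_stat:.2f}")
```

Treating each country as one observation means the test asks whether the convergence pattern holds across countries, rather than whether it holds in the pooled sample of respondents.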

In the next two studies we explored further the generality of the convergence of positivity hypothesis by testing if the pattern occurs in a broader set of domains related to individual differences.

2.3 Study 3: Miscellaneous Psychological Domains in HILDA

Data: This study was designed as an initial test of H3. We searched the HILDA dataset for other sets of variables which can define a research space in the context of the current work. In our search we were guided by two criteria: 1. There should be two or more questions, or scales, which represent a conceptually consistent grouping. 2. The answers to the questions should be on the same scale, and should contain at least three alternatives, which can be treated as numeric values. We found two such sets of variables (in addition to the personality space already analyzed in Study 1). The first new domain was referred to in HILDA as “Cognitive Abilities Task” and contained measures of cognitive performance on three dimensions: working memory, word pronunciation, and symbol-digit processing. The second domain which met our criteria was referred to as “Time Use” and measured how participants spend their time on 9 dimensions, ranging from work and household duties to volunteering and playing with children.

Analyses: For both domains we ran the same analysis as in Study 1. First, we tested whether individuals in the high positivity groups are closer to each other in HILDA’s cognitive space than individuals in the low positivity groups. For all three positivity metrics the results agreed with the convergence of positivity hypothesis (ps < 0.005). Next, we repeated the same analysis for the time use domain and reached the same conclusion: individuals in the high positivity groups were closer to each other in the ways they spent their time than individuals in the low positivity groups (ps < 0.005). The results can be seen in Panels C and D of Fig. 2. More details are provided in the SM. These two analyses provided strong initial support for H3.

2.4 Study 4: Miscellaneous Psychological Domains in WVS

Data: In the WVS dataset we found 25 additional domains which fit our criteria for a research space. Those domains ranged from attitudes towards science and democracy, to trust and confidence. For a full list see the SM.

Analyses: First we split the participants from each country into high positivity and low positivity groups based on the hedonic measure. Next, we computed the average distance within group for each of the 25 additional domains. We analyzed the data with a mixed-effects model, where country and domain were entered as random intercepts, while group valence (high vs low positivity) was entered as a fixed factor. The results revealed a robust overall convergence of positivity. Controlling for country and domain differences in intercepts, the participants in the high positivity groups were more likely to be closer to each other than participants from the low positivity groups (p < 0.005). The same analysis was run for splits based on health and income. The statistical results revealed the same pattern for both of these metrics (ps < 0.005). These results provided additional support for H3. Country- and domain-level aggregations are presented in Fig. 3.
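One way to write down a model of the kind described above is given below. The notation is ours and is only a sketch under stated assumptions: the text specifies random intercepts for country and domain and a fixed effect of group valence, while the symbols, the dummy coding, and the normality assumptions are illustrative.

```latex
% d_{cjg}: average within-group distance for country c, domain j, group g
% High_g:  1 for the high positivity group, 0 for the low positivity group
d_{cjg} = \beta_0 + \beta_1 \,\mathrm{High}_g + u_c + v_j + \varepsilon_{cjg},
\qquad
u_c \sim \mathcal{N}(0, \sigma_u^2), \quad
v_j \sim \mathcal{N}(0, \sigma_v^2), \quad
\varepsilon_{cjg} \sim \mathcal{N}(0, \sigma^2)
```

Under this coding, the convergence of positivity hypothesis corresponds to $\beta_1 < 0$: after accounting for country and domain differences in intercepts, distances within the high positivity group are smaller.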

Fig. 3

Results from Study 4: WVS. In this analysis we measured distances within high and low positivity groups for 60 countries and 25 domains. The error bars represent ± 1 standard error of the mean based on the level of aggregation (n = 60 for country-level, n = 25 for domain-level). Blue bars are effects in the direction predicted by the convergence of positivity hypothesis; red bars are effects in the opposite direction. A. Country-level aggregation. B. Domain-level aggregation.

It is important to note that while the WVS miscellaneous domain analysis provided overall support for H3, it also revealed considerable heterogeneity across countries, domains, and the three positivity metrics. Since our focus here is on the overall test of the convergence of positivity hypothesis, we will not attempt to explain the sources of this heterogeneity, yet we will point out that some of the opposite effects we observed seem to be related to political attitudes and political activities.

2.5 Study 5: Follow-up Personality

One of the 25 miscellaneous domains in Study 4 was personality, and the effect was in the predicted direction only for income. While the overall effect in Study 4 was statistically robust, this partial non-replication challenged the results we observed in Study 1 with the HILDA dataset. We found that other researchers had problems with the personality domain of the WVS dataset and referred to it as “problematic,” warning against using it for testing overall effects (Ludeke & Larsen, 2017). Nevertheless, we considered this a partial non-replication, and we ran a follow-up study as an additional test of H1. We searched for other publicly available datasets which contained a Big Five personality metric together with some relevant measure of positivity. We came across such a source, created by Lauriola and Iani (2015), who studied the link between personality traits and positivity. In this line of work positivity is measured as a combination of optimism, self-esteem, and life-satisfaction (Caprara et al., 2012). We followed the same analysis steps as in our Study 1 and created high and low positivity groups. Next, we computed the distance between participants in the Big Five space, and, similarly to our finding in Study 1, we found that participants in the high positivity group were closer to each other than those in the low positivity group (p < 0.005, permutation test). This result replicated our finding from Study 1 and provided additional support and generalization for H1. For more information see the SM.

3 Discussion

Psychologists have long been suggesting that the valence-based asymmetries, frequently found in psychological research, are related to patterns in the environment, yet those claims have not been backed by direct empirical evidence. In this paper we focused on the psychological domain of individual differences, and tested if individuals who are closer to the positive end of the spectrum are also closer to each other in some broadly defined research space relevant to individual differences. Our approach was two-pronged, combining focused studies with a more exploratory search. First, we started with targeted hypotheses, aiming at the most popular tools for measuring individual differences. We looked at the Big Five personality measures and at Schwartz’s Basic Human Values measures as complementary metrics for assessing psychological individual differences. We found two large public datasets which contained these measures, and in both datasets we found strong support for the convergence of positivity hypothesis. Further, we took a more exploratory stance, and expanded our empirical tests to research spaces which were not explicitly planned, but were conveniently available in the datasets we were working with. We looked at all items or scales which can be represented on a numeric scale and are aimed at a coherent psychologically-relevant domain. We analyzed them with the methods used in our targeted hypothesis testing. This exploratory approach brought additional support for the convergence of positivity hypothesis. We found strong convergence of positivity in the cognitive tasks and the time use measures in the HILDA dataset. We also found strong overall support using a diverse set of domains in the WVS. The combined results of our five studies suggest that the convergence of positivity in the realm of individual differences is a robust and rather broad phenomenon. 
Even though personality and cultural values have been among the most extensively studied areas of psychology, to the best of our knowledge this pattern has not been documented before and no current theory in behavioral or social sciences can explicitly predict the results we have observed.

While we demonstrated that positivity convergence can be reliably found outside the psychological lab, our findings bring up many new questions, which, at this stage, we are not in a position to answer. The most important one is to what degree what we observed is due to an objective pattern in the environment, or to some kind of widespread research bias in developing psychological scales, creating survey items, or selecting domains. When discussing valence-based asymmetries in the mind, researchers have often referred to patterns in the environment which are related to the evolution of complex biological or social systems. From this perspective, the convergence of positivity can be expected in many domains, including some which are not directly related to the human mind (Diamond, 1997; Zaneveld et al., 2017). In our current work, however, we only demonstrated convergence of positivity between individuals on metrics developed by other behavioral researchers. It is possible that the metrics are biased in their statistical properties. It is also possible that what researchers choose to focus on is guided by their cultural understanding of what is important and valuable (Bennis & Medin, 2010; Medin et al., 2010). In this sense, our results cannot be directly compared to studies on the convergence of positivity in microbiomes (Zaneveld et al., 2017) or to studies where objective physical features can be measured independently from psychological scales (Little et al., 2011; Winkielman et al., 2006). Irrespective of whether our findings are due to a domain-general principle in the environment or to a systematic bias in our measurement of individual differences, we believe that the current work will provide an impetus for building a broader framework for understanding valence-based convergence of research spaces.

Why is studying research spaces important? Currently, the dominant approaches in behavioral science are typically focused on establishing connections between single pairs of dependent–independent variables, trying to control for external confounds in elaborate study designs. While undeniably successful, this focus on single relationships and emphasis on control might also be contributing to what some authors refer to as a crisis in psychology (Feldman-Barrett, 2021), and particularly to a crisis of generalizability (Yarkoni, 2019). The approach we have explored here is different from current approaches in behavioral science in two important ways. First, we are not focusing on relationships between single variables, but on spaces defined by a research question. Second, our framework suggests that there might be an inherent valence-based directionality in the research spaces we are studying. We will illustrate this point with two versions of a quote ascribed to Edward Thorndike by Paul Meehl. Meehl was an avid critic of the single-relationship hypothesis testing approach which had been emerging as a primary tool for generating psychological knowledge. His main criticism was that mainstream approaches ignore the “crud factor,” where in complex systems the null hypothesis (no relationship between two variables) is almost never true. To illustrate his point, he cited Thorndike’s dictum, yet he provided two slightly different versions. The first version of the dictum was “All good things tend to go together, as do all bad ones” (P. E. Meehl, 1990). In a later version, the dictum became less restrictive: “In psychology and sociology, all good things tend to go together” (P. Meehl, 2016). While both versions emphasize the complexity of the network of relationships, the second version of the dictum does not assume a symmetry between the positive and the negative end of the spectrum. 
Our framework also emphasizes the multiple connections between variables, yet it is better aligned with the second version of Thorndike’s dictum. Instead of assuming symmetry, we treat it as an empirical question which can be estimated from existing data. For example, when looking at a broader set of domains, we found overall convergence of positivity, yet we also observed counterexamples related to political attitudes and to the nature of the research space. It also might be the case that tools explicitly focused on the negative end of the spectrum, such as psychopathology or suicide risk, might result in greater convergence of negativity. Both the presence and the directionality of the asymmetry in research spaces might be informative for understanding the dynamics of the system and can help us to approach various problems at a different level of abstraction.

The work we described here also has implications that go beyond behavioral and social sciences. It adds to a growing number of examples coming from multiple disciplines which suggest that a broad set of desirable features associated with evolutionary success tend to converge in a narrower space than the space defined by undesirable features. A broad positivity convergence pattern would suggest that various types of statistical analyses, machine learning algorithms, and design approaches might benefit from including valence or desirability information. For example, a machine learning classifier would benefit from knowing the valence of the different features before any training data is available: negatively valenced features might be more diagnostic, while positive outcomes might be more informative, so the accuracy of predicting features or outcomes will depend on valence (see Footnote 3). Similarly, researchers working on the design of new products, or the discovery of new materials with desirable properties, could also benefit from models which account for the valence-based space asymmetries in a particular research domain.

Last, before we conclude, we need to address a limitation of the current work. Here we were able to demonstrate a pattern of convergence which is not predicted by existing theories, but we did not provide a particular explanation of why this pattern happens. When the same pattern is observed multiple times, this is often an indicator of a common mechanism. At the level of generality of the phenomenon studied in this paper, however, it is not clear if a single mechanism is at play. There is evidence that all four mechanisms listed in the introduction can lead to the patterns we have observed, but we do not propose that any one of them is responsible, or that they work in isolation. We hope that our work will inspire future research which will provide a deeper understanding of the relationship between variables in context, rather than in isolation.

4 Conclusion

In five studies and fourteen hypothesis tests we found robust support for the convergence of positivity in the domain of individual differences. While those results are well aligned with some previous findings on convergence of positivity in the mind, current theories in psychology do not predict such a pattern. We propose that this pattern emerges either as a consequence of a domain-general convergence of positivity principle or as a systematic bias on the side of researchers in choosing domains and constructing scales. Both of these possibilities are intriguing and invite further investigation.