Introduction

Friendship is prevalent in all cultures: people in every society habitually create long-lasting social bonds with non-kin outside the mating context1,2,3,4,5. For chimpanzees, although kin-relationships are the primary norm, non-kin friendship is frequent both in within-sex female6 and male dyads7. The fact that friendship formation behaviour is universal in humans and appears in similar form in our species’ close relatives points towards evolutionary roots8,9,10,11.

Although friendship is similar to kinship in many ways12, there is one key difference: friendship is always more of a choice than kinship. While our family tends to be pre-determined13,14, we can and do choose our friends. The propensity to make such non-kin, non-mating positive social affiliations raises a practical problem: who to make friends with? This is an important question: friendship can be a long-lasting bond that requires substantial investment both to create and to maintain4,15.

Both humans16,17,18 and chimpanzees10 use trait similarity as the primary friendship choice heuristics. Why is it similarity rather than compatible difference that drives friendship choice? Arguably, the latter would allow division of labour through specialisation. The literature’s existing argument19 is that homophily increases the efficiency of interaction via reducing the cost of mental state synchronisation, and thus facilitates collective action within the dyad. There is empirical and theoretical evidence that supports this explanation. For instance, ethnocentricity, i.e., choosing social connections based on ethnic similarity, has been associated with increased efficiency of collective action20. A series of mathematical models demonstrated that similarity or “tag” based cooperation can arise autonomously, bypassing the free-rider problem21,22.

However, recent empirical results raised questions about the direction of causality: although trait homophily correlates with efficiency in real world collective action, the underlying mechanism might be more complicated. For instance, a longitudinal study of six thousand students23 has shown that homophily in academic performance among friends was due not to the friends-will-pull-you-up effect, rather to the rewiring of the friendship networks based on academic performance.

The suggestion that some other dynamics than choosing a partner for effective collective action drives the correlation, pairs well with the observation that many of the particular traits that are taken into account in homophilic friendship choice, such as gender or ethnicity24,25, have little to do with effectiveness in collective action. Equally, body similarity is unlikely to correlate with behaviour, despite the fact, that both genetic26 and facial27 similarity have been shown to trigger positive social affiliation even when relative relatedness was negligible.

Some network similarity traits, e.g., smoking or eating behaviour28,29 point to the possibility that homophily has more to do with belonging to a segregated group rather than with collective action efficiency30,31,32. In any case, the focal factors are not necessarily stable: experimental evidence has shown that the traits on which homophilic friendship choice heuristic is based can be shifted using a simple and brief experimental manipulation33. This suggests that homophilic friendship choice might have evolved for a different reason than facilitating cooperation in the dyad.

In this paper, I explore an alternative hypothesis: homophily is adaptive because it increases the clustering coefficient of the ego network. If a higher clustering coefficient induces a cooperative stance from the agent’s partners34, and thus creates a more trustful network environment, then trait similarity-based friendship choice can be adaptive. For instance, let us assume that a group of individuals are assigned a letter of the alphabet each. Let us focus on an individual with the letter A, who has a choice of possible friends with any letter in the entire range of the alphabet. If she chooses two friends with letters B and C, this choice is unlikely to result in triadic closure as they themselves can choose from the entire range, as well, and hence it is not likely that they will choose each other. However, if A chooses two others who are also As, then, assuming they too are following homophily as choice heuristic, they are more probable to connect to each other. Thus, with a simple similarity-based criterion, a triadic closure is achieved, and the clustering coefficient of all three friends increases.

Empirical evidence supports the idea that people actively increase their clustering coefficients by trying to achieve triadic closure among their social contacts35. It may be having a hobby with a social element, joining a club where new friends are also likely to be friends with each other, or playing a team sport in which cooperation is polyadic by design36,37,38. In these, the motivation for using a homophily heuristic as a friend choice strategy is the resulting triadic closure39,40.

Methods

Let G denote a set of graphs, each representing kin networks with 500 vertices, generated in a simulated population history with varying fertility34. In these binomial kin networks, the edges represent either sibling or first-degree cousin relationships in the following way. The average degree of the graphs in the G set varies, determined by the population’s fertility34. For any of these kin-only graphs gG, let a denote the corresponding adjacency matrix, where aij = 1 if agents i and j share two grandparents34.

I assume that the desired number of social contacts is uniform among the agents and set at the semi-arbitrary ν = 60, such that almost all the agents have fewer relatives than their degree target34. To fill the missing social connections, friends are added using the following three methods: (a) homophily only, (b) homophily weighted by graph distance, and (c) random friend choice34. (See the Supplementary Information for robustness calculations concerning these assumptions).

Homophily only

Let us assume that the agents possess a varied trait type, and let ϕ denote the value corresponding to the type of an agent, such that

$${\phi }_{i} \sim {\rm{U}}(0,100)\forall i=1,\ldots ,n$$
(1)

Let b denote the adjacency matrix that contains all kin and friend edges. At the beginning of the friendship choice algorithm, let us set b = a. (For notational simplicity, I drop the algorithm step counter for b).

Let si denote the number of social contacts for individual i:

$${s}_{i}\equiv \mathop{\sum }\limits_{j=1}^{n}{a}_{i,j}$$
(2)

Step 1: let X denote the set of all individuals that have a total number of social contacts lower than ν:

$$X\equiv \{i\}|{s}_{i} < \upsilon $$
(3)

Step 2: select the element, xi, from the X set that has the lowest number of contacts:

$$xi\in X|{s}_{xi}=\,{\rm{\min }}\,{\{{s}_{i}\}}_{i\in X}$$
(4)

and select another element, xj, that is different to xi:

$$xj\in X|c({\phi }_{xi},{\phi }_{xj})=\,\min \,{\{c({\phi }_{xi},{\phi }_{j})\}}_{j\in X\backslash xi}$$
(5)

where c denotes the type distance:

$$c({\phi }_{xi},{\phi }_{xj})=|{\phi }_{xi}-{\phi }_{xj}|$$
(6)

Step 3: if they are not connected, create a network edge between xi and xj:

$${b}_{xi,xj}={b}_{xj,xi}=1$$
(7)

I repeated the Steps 1–3 until either

$${s}_{i}=\upsilon \forall i=1,\ldots ,n$$
(8)

or

$$\nexists \{xi,xj\}|xi,xj\in X\wedge xi\ne xj\wedge {a}_{xi,xj}=0$$
(9)

that is, either all individuals have the limiting number of social contacts, or there are no two elements of X left such that they are not connected while their degrees are smaller than the limit.

When this algorithm finishes, the b adjacency matrix has its final form, and it defines the full social network graph, h. Notice that as g is a network that contains the kin relationships, and h is a network that contains both kin and friend relationships, g is a subgraph of h.

Homophily weighted by graph distance

In the above version of the friend-choice algorithm, the missing social contacts are filled with non-kin friends only on the basis of similarity. This assumes that their choice is unaffected by graph distance. To see the effect of this assumption, let us consider a version in which friends are chosen based on both type similarity and the number of network steps from each other.

For this, I used the Eqs. (15) and (79) as above, but with (6) replaced by the type distance weighted by the graph distance:

$$c({\phi }_{xi},{\phi }_{xj})={({\phi }_{xi}-{\phi }_{xj})}^{2}\cdot {d}_{b}{(xi,xj)}^{0.5}$$
(10)

where \({d}_{b}(xi,xj)\) is the graph distance between xi and xj on the graph defined by the adjacency matrix b. (I chose the power factors, arbitrarily set to 2 and 0.5, so that they reflect the assumptions that larger trait distances are disliked with increasing intensity, and changes in graph distance are less important further away from the ego. The results are not qualitatively sensitive to the exact value of these parameters, see the Supplementary Information for robustness calculations).

Random, homophily independent friend-choice

As a baseline, I include the random friendship choice algorithm34, which is the same as (14) and (79) with (5) replaced by

$$xj \sim {\rm{U}}(X\backslash i)$$
(11)

Thus, the new connections are entirely random, and both trait similarity and network distance are ignored in friendship choice.

Results

Let λ denote the proportion of social network edges that are non-kin in nature, i.e., λ is the ratio of friends to all contacts:

$$\lambda =1-\frac{{\Sigma }_{i=1}^{n}{\Sigma }_{j=1}^{n}{a}_{i,j}}{{\Sigma }_{i=1}^{n}{\Sigma }_{j=1}^{n}{b}_{i,j}}$$
(12)

Let χ denote the average local clustering coefficient of the combined kin and non-kin adjacency matrix, b:

$$\chi =\frac{{\Sigma }_{i=1}^{n}{\Sigma }_{j=1}^{n}{\Sigma }_{k=1}^{n}{b}_{i,j}{b}_{i,k}{b}_{j,k}}{{n}^{2}\upsilon (\upsilon -1)}$$
(13)

that is, χ denotes the average number of triadic closures around the agents as a proportion of all possible such triads.

The results show that the friendship choice algorithm determines the network structure in the following way (Fig. 1). First, the λ-dependent χ path varies with the friendship choice algorithm such that homophily based-friendship choice raises the average clustering coefficient (Both orange and green are above blue in Fig. 1). Second, homophilic friendship choice is additive with the network proximity-based friendship choice (the green pattern in Fig. 1 is similarly dependent on λ as the orange, shifted up).

Figure 1
figure 1

The availability of relatives drives the clustering coefficient. Orange: purely homophily-based friendship choice algorithm. Green: network distance weighted homophily based algorithm. Blue: the baseline random friendship selection. X-axis: the proportion of social contacts that are non-relatives. Y-axis: clustering coefficient.

Third, the λ-dependent χ path exhibits a transition pattern. When friendship choice is random (blue in Fig. 1), the falling ratio of relatives among the social contacts (moving from left to right on the x-axis in Fig. 1) leads to monotonously decreasing clustering coefficient. When homophily drives friendship choice at least to some extent (orange and green in Fig. 1) then the falling number of available relatives results in a lower clustering coefficient only up to a point. After a trough, the clustering coefficient increases from about mid-point of the x-axis all the way to the right end. Thus, when homophily plays a role in friendship choice, the transition curve is u-shaped.

Discussion

This paper introduced homophily based friendship choice into a model in which individuals replace missing kin with non-kin friends. The results show that similarity-based friendship choice increases the clustering coefficient, and thus it is an adaptive response to decreasing interconnectedness within the individual’s immediate social network. This is of import as although the clustering coefficient may fall for a number of reasons for any one individual, for an entire society it tends to be associated with demographic processes in which the number of available relatives falls34. Urbanisation, falling fertility, migration, war, and epidemics can all result in reduced access to kin, either because relatives are not available, as in urbanisation and migration, or they were never born, as in falling fertility, or have died in wars and epidemics. The consequence of all these demographic processes is a fall in the clustering coefficient that results in increasing incentives for norm violation and evaporating social trust34.

Homophily is adaptive behaviour in increasing the clustering coefficient

It has been suggested in the previous literature that the implementation of law, i.e., an institutionalised third party punishment system is a society-level response to increased norm violations34. This paper has shown that homophilic friendship choice can be seen as a parallel, individual-level response. When traditional kin networks break up, the average clustering coefficient falls34, which people perceive as feeling lonely, disconnected, surrounded by a social environment in which trust collapses, and norm violating behaviours become frequent41,42,43. To reverse the fall in the clustering coefficient, the individuals’ optimal response is to rely on a friendship choice strategy based on trait similarity. Notice, however, this individually adaptive behaviour may not be a societally optimal response if as a consequence a network cleavage emerges.

Social identity is a consequence of homophily

The social network literature, which does not have an evolutionary behavioural science-based approach, suggested a causal link from identity to social network homophily44: people are raised with socially constructed identities which determine the traits in which people choose each other as social connections. This paper’s logic suggests causality in the opposite direction: first, people agree on the traits on which similarity-based friendship choice can take place. The content of these traits is irrelevant, as long as there is a societal consensus concerning the traits themselves. Using these traits, people build their social networks using trait similarity as a friendship choice heuristic. The consequently high network clustering leads to dense subgroups emerging. The social identity of individuals within such subgroups may then be merely a mental shorthand.

The logic of falling clustering to homophilic friendship choice to segmentation to group identity would explain why the phenomenon of trait-based societal segmentation has emerged historically with urbanisation and falling fertility, and why there is such a wide range of traits along which social identities emerge, despite the similarity of many societal and economic factors in the underlying societies.

Transition clique-iness

The results suggest a clustering coefficient pattern during demographic processes in a u-shaped curve, predicting that mid-transition societies are particularly prone to a crisis of trust and norm-violation. If this is true, then societies in the middle of demographic change are more likely to exhibit coping behaviours. Shifting from predominantly rural to urban living, a large influx of migration, prolonged war, or deadly epidemics should increase the importance of sub-group membership (e.g., clubs), group signalling behaviours (e.g., special handshakes), and difficult-to-fake markers of origins (e.g., accents). Through the middle of the u-shaped curve, clique-iness becomes normal behaviour, which may fade away once the transition is over if replaced by multi-dimensional trait signalling.

Fake news

Notice that although the clustering coefficient of a homophily-based friendship network can be as high as for a traditional kin-based one, there remains a crucial difference. While the stability of network edges is high among relatives as these tend to be fixed for life, friendship edges are less certain. Empirically, it has been established that without maintenance via meaningful social interactions, friendship wanes faster than kinship does15. Friends, in general, are not chosen for life, these relationships need to be reaffirmed with regularity. To be chosen and re-chosen by their friends, people need to keep signalling their type. This might be particularly pertinent in societies in which others with whom there is a similarity in a number of traits are much more numerous than the personal sociality limit.

In these societies, individuals might feel a need to signal their traits frequently, and in a way that stands out as new. Seasonally changing fashion associated with brands, and competition news about affiliated sport clubs may serve as a source of such flow of markers. Anecdotal observation suggests that it is customary in many societies to start social interaction with a “banter” that refers to news concerning one’s chosen brand (car, clothing, accessory, food etc.) or club (sport, religion, political party etc.) affiliation.

In the past decade, with the rise of social networking apps, a new pathway has opened up for this similarity marker reaffirmation: internet-based news outlets provide a constant flow of possible markers, while social networking apps provide a way of renewing self-assignment to type, by passing on, or “wearing”, the ever newer signals of affiliation. Thus, the fake news phenomenon can be viewed as a side effect of an individual-level coping mechanism responding to decreasing clustering coefficient due to urbanisation, falling fertility, migration, exploited by politically motivated institutions.

In summary, this paper offers a theoretical argument that shows how homophily, i.e., using trait similarity as a friend selection criterion, can increase the clustering coefficient of the social network. Thus, if, as argued in previous literature, higher local connectedness is beneficial for the individual by increasing the cooperative stance of the network partners, achieving this aim using homophily is an adaptive behaviour. Furthermore, the results also suggest that societies that go from a rural, high fertility, low migration state to an urban, low fertility, high migration state are likely to go through a u-shaped transition period in terms of clustering coefficient. If the lower clustering coefficients are associated with increasing norm violations, this theoretical result offers a new explanation for the transition crises in countries that have gone through this shift in recent history.