A relatively recent phenomenon, online dating is becoming an increasingly relevant site of investigation spanning disciplines as varied as sociology, economics, evolutionary biology, and anthropology [5]. Foundational work on mate preferences in online dating, matching markets, and the role of physical attractiveness in online dating has been done by Finkel et al. [9], Hitsch et al. [14], and Fiore et al. [10]. Zhang and Yasseri [31] explored the latent asymmetries in messaging between men and women on these platforms, and Holme et al. [15] studied the community dynamics of users on such platforms. Though the aforementioned literature is rich and sets a foundation for robust discussion of online dating, no existing study presents a longitudinal approach to online dating. The contribution of this work is the expansive dataset, which encompasses over 12 years of user activity, allowing us to better understand not only how these phenomena of interest work in extraordinary detail, but also how they have changed over time.

As the Internet rose as a social medium used to facilitate communication, it eventually adapted to specialist functions including online dating sites. Online dating is the practice of using dating sites, which are made specifically for users to meet each other with the end goal of finding a romantic partner [9]. As Michael Norton put it, “Finding a romantic partner is one of the biggest problems that humans face and the invention of online dating is one of the first times in human history we’ve seen some innovation” [13]. In fact, online dating has emerged as one of the most widely used applications on the Internet. Online dating has an annual growth rate of 70% in the United States. It has also developed into a highly profitable business with growing numbers of people worldwide willing to pay for access to services that will find them a romantic partner. Online dating is now a $2.1 billion business in the US and is expected to continue growing in the foreseeable future [22]. Considering that three-quarters of US singles have tried dating sites and up to a third of newly married couples originally met online [32], online dating seems to have shed its old stigma and is ostensibly here to stay as the new normal.

When considering online dating, it may be useful to think of these platforms and marriage in general as markets [25]. As economist Alvin Roth explains in his book Who Gets What and Why, there can be thick and thin matching markets, where thick markets have lots of buyers and sellers (single people in this case) and little differentiation, while thin markets have fewer buyers and sellers and considerable differentiation [25]. For instance, we can imagine that there was a thick market for marrying your high school sweetheart before women started going to college. However, as more and more women decided to pursue higher education and enter the workforce, the market shifted to a wider selection of potential spouses for each side, which reduced the thickness of the market.

The increased variety of potential mates gave way to dating phenomena like speed-dating, which was a pre-internet predecessor to any modern app with a market design where singles meet many people very quickly, indicate who they are interested in, and only receive each other’s contact information if there is mutual interest. However, with the rise of the Internet, there is now a thick market for finding love online again. More specifically, we can think of these Internet-based dating platforms as two-sided matching markets (if we exclude niche platforms for polyamory and non-traditional relationships). This means that there are two sides of the market to be matched, participants on both sides care about to whom they are matched, and money cannot be used to determine the assignment [1]. This model includes high-end management consulting firms competing for college graduates that must attract candidates who also choose them, home buyers and sellers, and many more important markets. Two-sided matching markets have been extensively studied, with the literature splitting them into two categories: the “marriage” model and the “college admissions” model [1].

Becker’s (1973) marriage model [33] assumes simple preferences, with men and women ranked vertically from best to worst. This model and its assumptions have been applied to diverse problems such as explaining gender differences in educational attainment, changes in chief executive officer wages, and the relationship between the distribution of talent and international trade [34,35,36,37,38,39].

Another line of research follows Gale and Shapley’s college admissions model [41], which allows for complex heterogeneous preferences. This model is a cornerstone of market design and has been applied to the study and design of market clearing houses such as matching residents to hospitals and students to charter schools. This raises the question: who gets matched with whom in the online dating matching market? Are differences in dimensions of type mostly horizontal (e.g., some pairs make better matches than others, following the college admissions model), or vertical (e.g., there are some people that we can universally agree are more desirable mates than others, following the marriage model)? Work on a small sample of online dating users provides limited support for the latter [3].
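To make the matching-market framing concrete, the following is a minimal sketch of one-to-one deferred acceptance, the procedure associated with Gale and Shapley. It is included for illustration only and is not part of the analysis in this work; the function name `deferred_acceptance`, the argument names, and the toy preference lists are all assumptions.

```python
def deferred_acceptance(proposer_prefs, receiver_prefs):
    """One-to-one deferred acceptance; returns {receiver: proposer}."""
    # rank[r][p] = position of proposer p in receiver r's list (lower is better)
    rank = {r: {p: i for i, p in enumerate(prefs)}
            for r, prefs in receiver_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}  # next index each proposer tries
    engaged = {}                                  # receiver -> tentatively held proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        prefs = proposer_prefs[p]
        if next_choice[p] >= len(prefs):
            continue  # p has exhausted the list and stays unmatched
        r = prefs[next_choice[p]]
        next_choice[p] += 1
        if p not in rank[r]:
            free.append(p)  # r does not list p at all, so p tries the next option
            continue
        current = engaged.get(r)
        if current is None:
            engaged[r] = p
        elif rank[r][p] < rank[r][current]:
            engaged[r] = p          # r trades up and releases the previous proposer
            free.append(current)
        else:
            free.append(p)          # r keeps the current proposer
    return engaged


# Hypothetical toy market: both proposers rank "x" first.
matching = deferred_acceptance(
    {"a": ["x", "y"], "b": ["x", "y"]},
    {"x": ["b", "a"], "y": ["a", "b"]},
)
```

In this toy market both proposers prefer "x", but "x" prefers "b", so "a" ends up with "y"; no pair would rather be matched to each other than to their assigned partners.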

Earlier work suggests that there are “superstar” users who attract lots of attention and matches on any given platform. In some cases, the top 5% of all men on a platform receive twice as many messages as the next 5% and several times as many messages as all the other men [23]. However, it would be incorrect to assume these superstars would be universally appealing to all users and that popularity alone determines matches. Instead, it could be useful to consider the economic concept of assortative mating observed in offline marriage markets, and how online matching reflects or deviates from this behaviour.

Positive assortative mating or matching occurs when people choose mates with similar characteristics. Empirical evidence strongly suggests that spouses tend to be similar in a variety of characteristics, including age, education, race, religion, physical characteristics, and personality traits [24, 34, 41, 43, 44]. This phenomenon can be measured and observed in online dating markets when we inspect the pairs. Using data from an online dating site, Hitsch et al. found that although physical attractiveness and income are largely vertical attributes, preferences concerning a partner’s age, education, race, and height tend to sort assortatively. Likewise, the examination of “bounding” characteristics shows that life course attributes, including marital status, whether one wants children, and how many children one has already, are much more likely than chance to be the same across the two users in a dyadic interaction [10].
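One common way to quantify positive assortative matching on a numeric attribute is the correlation of that attribute across the two members of each pair. The sketch below assumes this Pearson-correlation approach, which the cited studies may or may not have used; the helper name `pearson` and the age pairs are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


# Hypothetical (age of user A, age of user B) for four matched pairs.
pairs = [(25, 27), (31, 30), (40, 44), (52, 49)]
r = pearson([a for a, _ in pairs], [b for _, b in pairs])
```

A value of `r` close to 1 would indicate strong sorting on age, a value near 0 would indicate no sorting, and a negative value would indicate pairing of opposites.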

In other words, mate preferences are not simply vertical, meaning that we always want mates with the highest level of education, income, etc. Rather, horizontal preferences and preferences for similarity, in particular, play an important role [14]. Overall, users with similar education levels are three times as likely to match. As we can observe, assortative mating occurs in both online and offline contexts and can partially help explain why these markets still tend to be efficient. Lewis [18] provides evidence for the co-existence of both similarity and universal desirability (status)-based mechanisms.

Newer niche dating apps that only admit users from certain echelons of society may be changing the way we sort and actually exacerbate existing assortative tendencies. A recent Bloomberg report argues that dating apps, particularly elite ones like the League and Luxy, may be worsening economic inequality by making it easier for couples to pair by socioeconomic status. The League famously only admits graduates from top universities, while Luxy purports that the median income of users on its platform is $500,000. Instead of meeting someone at a bar or other social setting, singles can now use apps to find their economic and educational equivalent. While one might argue that this phenomenon already occurs offline, according to Bloomberg, “these services help facilitate unions between educated, affluent Millennials who are clustering in such cities as San Francisco and New York,” indirectly intensifying economic inequality.

While those may be exceptional cases, some combination of an individual’s attributes and potential partners’ preferences dictates market dynamics both in online and offline contexts. This means that an individual may have high desirability for one person and low desirability for another, and the preferences may not necessarily be monotonically related to their attributes. Efficient matching in this market thus relies on the existence of pairs of mutually desirable agents in a setting where preferences are heterogeneously distributed. As Hitsch et al. note, these markets tend to naturally resolve into pairs of mutual desirability [14].

Online platforms provide us with a unique opportunity to study the economic and evolutionary concepts of sorting and matching. While part of this is due to the ability to observe and classify user attributes, preferences, and behaviour in great detail, it is also due to the unique lack of search frictions in online dating markets. Certainly, a main reason for the existence of online dating sites is to make the search for a partner as easy as possible. Yet, despite the wealth of insight that user-generated data from online dating have revealed about latent and stated mate preferences, there remains significant uncertainty regarding the way these preferences have evolved over time.

Sociologists often assume that society has become more egalitarian, and that the pluralist ideals have translated into a more equal quest for love [7]. It would then follow that people’s mate preferences have become more pluralist, switching from sorting based on ascribed traits to sorting based on acquired traits. Ascribed characteristics, as used in the social sciences, refer to properties of an individual attained at birth. The individual has very little, if any, control over these characteristics. In other words, based on the progress we have reportedly seen over the past decade in social integration, we would expect to observe users placing less importance on inherited traits like ethnicity and height, and more importance placed on characteristics achieved through merit such as education.

RQ1: How have stated and revealed mate preferences evolved over the last decade and are the claims of a more egalitarian society in fact reflected in online dating and mate selection?

In mate selection and especially in online dating, there seems to be a preoccupation with physical beauty [24]. Historically, theories of interpersonal attraction and interpersonal judgments have emphasized the importance of physical attributes over other factors such as personality and intelligence [44, 45]. Accordingly, online dating sites often urge their users to post photos of themselves to increase the chances that potential dates will contact them. Dating services like Grindr and Tinder have gone even further by doing away with detailed profile descriptions altogether, allowing users to base their dating decisions on physical appearance alone or at least at the first instance [12]. Indeed, 85% of interviewees in a study of Australian online dating users said that they would not contact someone without a photo on his or her profile [30].

Only a few studies so far have considered how users judge attractiveness online generally or in online dating in particular and how this translates into messaging strategy. Ellison et al. [5] describe the strategies employed by online dating users to interpret the self-presentations of others. Primarily, the participants they interviewed made substantial inferences from small cues, lending support to Walther’s theory of Social Information Processing [58]. For example, one woman felt that people who were sitting down in their online dating profile photos were trying to disguise that they were overweight [5]. Fiore et al. found that, in line with past research on the psychology of attraction, the attractiveness of the photograph was the strongest predictor of whole profile attractiveness in online dating [59].

However, while it is evident that the attractiveness of one’s photo is important in determining overall perceived attractiveness of an online dating profile as a whole, predicting popularity based on looks alone is much more ambiguous. Rudder [26] explored the importance of attractiveness in online dating and found that how good-looking you are does not dictate how popular you are on an online dating website. In fact, having some people think that you are ugly can work in your favor [12].

To test how attractiveness might predict popularity, the OkCupid team took a random sample of 5,000 female users and compared the average attractiveness scores they each received from other users with the number of messages they were sent in a month. They found that it is not just the better-looking people who receive lots of messages. Using the spread of attractiveness ratings, they identified people who divide opinion on their attractiveness. These polarizing users ended up being far more popular on internet dating sites than universally attractive people [26]. In essence, the most beautiful users will always do well, but users whose attractiveness divides opinion are better off than those who everyone agrees are just quite cute.
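The distinction between average attractiveness and the spread of ratings can be illustrated with a short sketch. The rating values, the 1 to 5 scale, and the function name `rating_summary` are hypothetical, and the population standard deviation is only one possible measure of spread; the source does not specify which measure was used.

```python
from statistics import mean, pstdev

# Hypothetical ratings given to two profiles by other users.
consensus_ratings = [3, 3, 3, 3]    # everyone agrees on a middling score
polarizing_ratings = [1, 1, 5, 5]   # raters split between low and high


def rating_summary(ratings):
    """Return (average rating, population standard deviation)."""
    return mean(ratings), pstdev(ratings)


consensus = rating_summary(consensus_ratings)
polarizing = rating_summary(polarizing_ratings)
```

Both profiles have the same average rating, yet only the second has a non-zero spread, which is the kind of profile the passage above describes as polarizing.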

Fiore and Donath [60] also explored this question of predicting popularity, but used self-reported attractiveness instead of attractiveness scores given by other users. They found that men received more messages when they were older, more educated, and had higher levels of self-reported attractiveness. Women received more messages when they did not describe themselves as “heavy,” had higher levels of self-reported attractiveness, and posted a photo on their profiles.

Among online daters, sending signals such as a “Superlike” or “Smile,” or “favoriting” a user can be a way to let them know a user is interested. In a notable study using a Korean dating/marriage site, researchers found evidence that the most sought-after people on the website were not very responsive to “virtual roses” [17], because their attitude was “well, of course, that person is interested in me.” Instead, the virtual rose was most effective on the middle desirability group, which did not have as many great dating options and was almost twice as likely to accept a proposal sent with the costly signal of a rose.

This brings to light issues with signaling optimization: despite the positive effect of sending roses, a considerable portion of participants did not use their roses, and even those who exhausted their supply did not properly use them to maximize their dating success. It seems there are substantial tradeoffs in preference signaling. Reminiscent of the bar scene with John Nash in A Beautiful Mind, a user could send their signal to the ‘blonde’ or the most attractive female on the platform, who would be their number one pick. However, if everyone uses this strategy, chances of success are low. Instead, users would be better off using their costly signal on a medium-quality mate where chances of reciprocity are higher. By the same token, it seems like success could be almost guaranteed by seeking out the least desirable mate and sending a signal, but this is obviously not optimal. Therefore, there is a trade-off in choosing to whom to send a costly signal, such as a favorite or message, and this trade-off goes back to the aforementioned difference in user “quality” or desirability.

RQ2: What is the impact of user attractiveness on messaging patterns and is it a powerful predictor of “success” in online dating?

In the social sciences, gender is a built-in variable that can account for measurable differences in behaviour [46]. While non-binary users and same-sex dyads are a growing segment of online dating users, the dataset examined in this work consists exclusively of heterosexual dyads. One of the main research areas related to online dating systems is the difference in messaging behaviour between men and women on these platforms. However, to meaningfully investigate computer-mediated communication between genders, it is important to first understand underlying patterns of offline communication between heterosexual dyads that may be reflected, moderated, or exacerbated online.

Examining single women’s use of the telephone in heterosexual dating relationships, Sarch found that, in line with gender norms at the time of the study, subjects expected men to pursue women [47]. Additionally, on occasions when a woman did take the initiative and started a conversation, she expected her partner to “overcompensate” by reaching out with more frequency. Subjects also reportedly saw how often their dates called as an indicator of how well the relationship was going or how often their date was thinking about them.

In keeping with these two indicators, subjects did not want to be perceived as the pursuer, so they limited the frequency of their own calls by ensuring that each one was “carefully executed so that sufficient time elapsed between multiple phone calls” ([47], p. 141). This phenomenon has not entirely disappeared—Ansari and Klinenberg observe, “the fear of coming off as desperate or overeager through texting” as a common concern in recent focus groups [32]. Despite coming 22 years after Sarch’s study, Ansari and Klinenberg’s research shows that initiator status and contact frequency equating to interest have translated from telephone calls to modern online messaging culture.

Besides the stigma against female initiators, another reason initiators tend to be male has to do with the way incentives are structured in online dating. About 60% of the men in Whitty and Carr’s study saw online dating as a “numbers game” [30]. Given the seemingly endless number of profiles available, individuals could keep trying until they get a response, meaning that they are not fully interested in some of the profiles they send messages to. Instead, they would send a large number of initiations regardless of actual interest and see which women reciprocate, filtering at the response level.

The result is staggeringly lop-sided activity levels for men and women. Men are on average twice as active as women in online dating apps—skewing an already imbalanced gender ratio; taking into consideration activity level, the gender ratio of the active user base is about 80:20 [13]. Rudder [26] confirms this, showing that even the most attractive men receive fewer messages than women on average. In turn, since women are often inundated with date requests, they are less compelled to respond to each request [28]. Fiore et al. confirm this, finding that women responded more selectively than men, answering 16% of the time compared to men’s 26% reciprocation rate [10].

Zhang and Yasseri found that messages were five times more likely to have been initiated by a man than by a woman even in mobile dating applications that allow users to communicate only after they have mutually signaled their interest [31], in line with previous work that found men to be the main initiators in heterosexual conversations [9, 28, 29, 49]. Fiore et al. also confirm this, finding that rates of initial contact differed sharply by gender. Men initiated a median 1 contact per day compared with 0.875 for women [10]. Given this difference combined with the greater number of men on the site, women tended to be contacted much more often than men, a median 2 times per day, compared to 0.5 for men. Finally, more popular men and women—those who were contacted more often per day—initiated contact with others slightly less often, confirming economic theory that “high quality” users need not pursue others as actively.

RQ3: Has gender asymmetry in online dating messaging behaviour remained stable, lessened, or grown over time?

We integrate the previously mentioned literature on attractiveness and selectivity to investigate how user behaviour and strategy vary across different facets of communication: searching for partners to initiate contact with, and selecting which users to reply to when they have some awareness of their attractiveness or signals of their success. As well as studying variations of behaviour in the population, we are also motivated by research around Dunbar’s number [4] to study what limits and commonalities might be present in the data around users’ communication.

RQ4: How do different facets of online daters’ success relate to their selectivity?

Referring back to the “college admission” model that suggests strong homophily in seeking partners, most studies have overlooked whether a match based on homophily actually translates into initiation of contact and communication between users in a liquid market and in the absence of search friction. Given the abundance of inactive users and the asymmetry in the activity between male and female users, matching alone is insufficient to determine whether online dating is driven by homophilic tendencies. Hence, we form our last research question as the following.

RQ5: Does similarity between the parties involved in a computationally made match map into initiation of contact and successful communication?

Moreover, homophily is unlikely to be uniformly distributed across all characteristics for all users. For instance, some users will weigh age differences more strongly than others. While there seem to be some hints that demographic or socioeconomic features in particular play an important role, the exact relationships and relevant variables are still ambiguous.

RQ6: Given the presence of homophily, which are the decisive dimensions and variables predicting successful communication?

Materials and methods

To address the aforementioned research questions, this work analyses a data set obtained through a collaboration with eharmony UK, a major web-based online dating system. Broadly speaking, web-based online dating systems include the following:

  • Personal profiles for each user, which include demographic and other fixed-choice responses, free-text responses to prompts, and, optionally, one or more photographs.

  • Searching and/or matching mechanisms, so that users can find potential dates from among the thousands of profiles on a typical system.

  • Some means of private communication that permits users to contact potential dates within the closed online dating system without disclosing an email address, phone number, or identifying information. This usually means a private mail system, but it sometimes also includes the ability to send “smiles” or some other token of interest.

  • Optionally, other forms of self-description: for example, the results of a personality test, or multimedia uploaded by the user.

eharmony’s platform follows the typical format of other online dating systems, including personal profiles and messaging channels, but is distinctive in that users can only communicate with matches selected through an algorithm. This matching algorithm is based on responses from a questionnaire each user completes upon registration. This work will utilize stated mate preference and demographic data collected through this questionnaire, as well as the user interactions that occur after the match—namely, messaging communications. Since the aim of online dating systems is to facilitate face-to-face contact, with communications being a prerequisite to any offline encounters, this research will operationalize communications received, sent, and reciprocated as meaningful measures of interest and popularity.

The sample dataset used in this work was generated from users who registered during a randomly selected month (March) for each year between 2007 and 2018. Data were not sampled from January or February, since they are probably not the most “typical” months, due to holidays including New Year’s Day and Valentine’s Day. The data were generated from user profiles and private messaging activity on the dating site over the 12-year period and consist of 149,440 unique heterosexual users from across the United Kingdom. All demographic information and gender were self-reported upon registration. The dataset did not contain any users identifying as non-binary, so the term “gender” in this work will refer to male or female self-identification. Since we sampled for all users registered in the month of March for each year, the registration month for all cases is the same, but the total cases of each year vary, as reported in Table 1.

Table 1 Number of users in the dataset for each year

Table 2 summarizes the variables describing users and their features. In the following sections, the variables are grouped by type and defined in further detail.

Table 2 Description of variables in the dataset for each user (user-level data)

Mate preferences were collected upon registration through a questionnaire asking about the importance of different match criteria based on a Likert-type scale, ranging from Not Important to Very Important. The variable for user attractiveness used in RQ2 was created using the average score of self-reported responses to the following questions: “How stylish do you consider yourself?” “How attractive do you consider yourself?” and “How sexy do you consider yourself?” on a scale from 1 to 7. The remaining psychometric variables were also created using a similar formula of questionnaire responses.
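The attractiveness variable described above can be sketched as a simple average of the three questionnaire items. The dictionary keys, the constant `ATTRACTIVENESS_ITEMS`, and the function name are hypothetical stand-ins for the actual field names, which the dataset description does not give.

```python
from statistics import mean

# Hypothetical keys for the three self-report items (each on a 1-7 scale).
ATTRACTIVENESS_ITEMS = ("stylish", "attractive", "sexy")


def attractiveness_score(responses):
    """Average the three self-reported items into a single score."""
    values = [responses[item] for item in ATTRACTIVENESS_ITEMS]
    if any(not 1 <= v <= 7 for v in values):
        raise ValueError("responses must lie on the 1-7 scale")
    return mean(values)


score = attractiveness_score({"stylish": 4, "attractive": 5, "sexy": 6})
```

For the example responses of 4, 5, and 6, the resulting score is 5.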

Communication-level data were inspected by gender and initiation. Initiation refers to whether the sender of the message is the user who had sent the first message in the conversation. The gender of a given message sender is tied to the initiator of a message, as all messages in the dataset are between heterosexual matches—for example, if the conversation initiator is male, the responder would be female.

When computing messaging statistics, the primary measure was not the sheer number of messages sent or received but the number of distinct people whom a user contacts or is contacted by. This places the focus not on how many messages a pair exchanges, but rather on distinct cases of initiated contact. In particular, one key focus of this study is predicting “popularity” in an online dating system. This study falls in line with Fiore et al.’s [59] theory that a person’s popularity on an online dating site is best indexed by the average number of people who initiate contact with him or her. However, this work deviates from their belief that this measure doubly serves as a reasonable proxy for overall attractiveness as well. While Fiore et al. assumed that more attractive people on average receive more unsolicited attention than less attractive people, this work seeks to tie in later findings from Rhodes et al. [24], Rudder [26], and Fry [12], and understand user attractiveness and popularity as two distinct variables. Finally, to control for the fact that the number of matches each user has is an artifact of the algorithm, which has slightly changed over the years, we sought to normalize the number of contacts received by the number of profile views each user received, creating a new “communication rate” variable for each user. This metric is an approximation which accounts for users who might be much more active and get more site exposure than others.
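A minimal sketch of the “communication rate” described above follows, counting distinct initiators rather than raw messages and dividing by profile views. The function name, the argument names, and the choice to return `None` for a profile with no views are assumptions made for illustration.

```python
def communication_rate(initiators, profile_views):
    """Distinct users who initiated contact, divided by profile views received."""
    if profile_views <= 0:
        return None  # assumption: the rate is left undefined without any exposure
    # A set collapses repeated messages from the same sender into one contact.
    return len(set(initiators)) / profile_views


# Hypothetical user: three initiating messages from two distinct senders, 40 views.
rate = communication_rate(["u1", "u2", "u1"], 40)
```

Here two distinct initiators over 40 profile views give a rate of 0.05, regardless of how many messages each initiator sent.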

Further to the user profile data, we also possess data for messages for a subset of users (70,508 users, 1,048,575 matches—which can be acted upon by both people, one person, or neither). This is summarized in Table 3.

Table 3 Data parameters (match-level data)


Given that this research used data generated from real users of an online dating platform, privacy was a top ethical concern throughout data collection and analysis. As such, proper precautions were taken to preserve privacy and ensure the anonymity of users. During data collection, the eharmony team excluded any personally identifiable information such as names, payment information, and address to prevent triangulation. Data were captured, transferred, and stored on a password-protected computer with an encrypted hard-drive. Users are only identified by an anonymous user ID number.

Confidentiality and data transfer agreements were signed both by the company and the University of Oxford to preserve privacy rights for eharmony users. Users are informed of data collection and analysis efforts at the time of sign up, when presented with Terms & Conditions and Privacy Policy agreements. eharmony UK operates fully certified under the EU-US and Swiss-US Privacy Shield frameworks. The University of Oxford CUREC (Central University Research Ethics Committee) approved all handling of data and research methods. The CUREC number for this project is SSH OII C1A 18 032.


Partner preferences

The majority of users in our dataset are from London, followed by Manchester, Birmingham, Glasgow, and Bristol. The minimum age is 18 and the maximum age is 98. The mean age of the users in the sample is 38, while the median age is 37. The gender makeup of the dataset is 52% female and 48% male. Most users are non-religious (53%), followed by Christian (34%), then Other and Muslims. Most users have never been married (67%), 24% are divorcees, and 3% widowed. All users are engaging in heterosexual interactions on the platform.

We inspected the stated level of importance for both men and women in regards to six different mate preference criteria: income, education, age, religion, smoking level, and drinking level.

Regarding the average importance of income (Fig. 1), women have a consistently higher mean than men, meaning that they consider income of a potential match more important than men do for all years 2007–2018. This difference between female and male preference for income is statistically significant (at p < 0.05). Nevertheless, for both genders, after an increase following the financial crisis, we see that the importance of the income of the partner has been decreasing over more recent years.

Fig. 1 Importance of income of the partner (2007–2018). Note that the scale of the vertical axis is different in each panel

As for education (Fig. 2), women have a consistently higher mean than men, meaning that they consider education of a potential match more important than men do for all years 2008–2018. The overall trend is very similar to the one for income: an increase around 2010–2013 and then a steady decrease.

Fig. 2 Importance of education level of the partner (2007–2018)

When it comes to age preference (Fig. 3) too, women have a higher mean than men for all years 2008–2018, meaning that they consider age more important than men do. Change in average score over time is not monotonic for men or women.

Fig. 3 Importance of age of the partner (2007–2018)

Regarding the importance of smoking and drinking levels, there is no clear pattern in the changes of “average” over time, mostly because users are polarized by prospective partners smoking (58% Not Important, 40% Very Important). Users are less concerned with prospective partners drinking alcohol (77% Not important to Somewhat Not Important) in 2018.

Finally, comparing all online dating mate preferences with each other, women consider these traits most to least important: smoking level, drinking level, education, income, and then religion. For men, the order is: smoking level, drinking level, religion, education, and finally income.

Physical attractiveness

The second research question we sought to answer was to what extent self-perceived physical attractiveness determines popularity in online dating. When investigating the relationship between self-perceived physical attractiveness and communication rate (communication initiations received over profile views), we found no significant change from year to year. Thus, here, the results will be discussed in the context of the overall dataset and not with respect to change over time. Looking at the aggregate dataset, users do tend to have a higher communication rate as self-perceived attractiveness increases, but the rate of increase appears to first plateau and then decrease.

When the data are separated by gender (Fig. 4), the pattern holds for both men and women, although the slope for women between 2 and 6 attractiveness (0.023) is significantly larger than for men (0.008).

Fig. 4 Communication rate vs attractiveness. The slope of the linear fit is 0.023 for female (left) and 0.008 for male (right) profiles
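The slopes quoted for Fig. 4 are those of a straight-line fit. The following is a minimal sketch of how such a slope can be obtained by ordinary least squares, assuming the fit is restricted to attractiveness scores 2 to 6 as described in the text. The helper name `ols_slope` is hypothetical, and the numbers in the example are synthetic values chosen to reproduce a slope of 0.023; they are not the study's data.

```python
def ols_slope(xs, ys):
    """Least-squares slope of y on x for paired observations."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("x values are all identical")
    return sxy / sxx


# Synthetic illustration only: mean communication rate per attractiveness score.
scores = [2, 3, 4, 5, 6]
rates = [0.100, 0.123, 0.146, 0.169, 0.192]
slope = ols_slope(scores, rates)
```

A slope of 0.023 in this setting would mean that each additional point of self-rated attractiveness is associated with roughly 2.3 more initiations per 100 profile views.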

An intriguing observation here is that for both genders, the most “successful” profiles based on the chance of receiving a message from a visitor are not the ones whose owners have ranked themselves the most attractive. For male profiles, this is even more evident.

Communication patterns

Next, we addressed the third research question regarding the asymmetries in communication initiation between men and women in online dating (Fig. 5). The initiator ratio, or the percentage of sent communication initiations over total communications, trends downward over time for the average female user. For men, the initiator ratio shows the opposite trend, increasing from 2008 to 2013, with a small dip in 2014 before climbing again. The initiator ratio drops for both men and women in 2018. It should be noted that the percentages for men and women do not add up to 100% for each year, because the calculation is not for total communications sent and received between men and women within a single year, but for the lifetime of each user profile sampled from March of each given year. As evidenced, men on average consistently initiate more communications than women. The difference between men’s and women’s average initiator ratios is persistent from 2009 to 2018 (Fig. 5, right panel).

Fig. 5 Initiation ratios for men and women over time (left) and difference in initiation ratios between men and women over time (right)
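As a minimal sketch of the initiator ratio described above, assuming per-profile lifetime counts are available, the ratio can be computed per user and then averaged within each gender. The field names `gender`, `initiated`, and `total` and the sample values are hypothetical and serve only as illustration.

```python
from statistics import mean


def initiator_ratio(n_initiated, n_total):
    """Share of a user's communications that the user initiated (0 to 1)."""
    if n_total <= 0:
        raise ValueError("user has no communications")
    return n_initiated / n_total


def mean_initiator_ratio_by_gender(users):
    """Average the per-user initiator ratio within each gender."""
    by_gender = {}
    for user in users:
        if user["total"] == 0:
            continue  # ratio is undefined for users with no communications
        ratio = initiator_ratio(user["initiated"], user["total"])
        by_gender.setdefault(user["gender"], []).append(ratio)
    return {gender: mean(ratios) for gender, ratios in by_gender.items()}


# Hypothetical lifetime counts for four sampled profiles.
sample = [
    {"gender": "F", "initiated": 2, "total": 10},
    {"gender": "F", "initiated": 4, "total": 10},
    {"gender": "M", "initiated": 6, "total": 10},
    {"gender": "M", "initiated": 9, "total": 10},
]
ratios = mean_initiator_ratio_by_gender(sample)
```

Because each ratio is averaged over the profiles of one gender, the two averages are separate quantities and need not sum to 1, which is consistent with the note above about percentages not adding up to 100%.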

We look further to the individuals in the matching data to investigate initiation (\(I'\)) and reply rate (\(R'\)) in time, defined as the average number of requests initiated or replied to by a user per day. Messages are binned into individual days, and users tend to go through phases of high and low activity, presumed to be when they might be ‘exclusive’ with a partner, temporarily unsubscribed from the platform, or simply too busy or uninterested in engaging in online dating. As such, we take the average of the inverse time between days of activity, weighted by the number of interactions on the days in question.
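The weighting scheme is described only in words, so the following is one possible reading rather than the authors' implementation. It assumes that each gap between consecutive active days contributes its inverse length, and that the weight of a gap is the number of interactions on the later of the two days. The function name `weighted_inverse_gap_rate` and the day-indexed input format are hypothetical.

```python
def weighted_inverse_gap_rate(daily_counts):
    """Weighted mean of 1/gap between consecutive active days.

    daily_counts maps a day index (int) to the number of interactions
    on that day. Returns None for users active on fewer than two days.
    """
    active_days = sorted(day for day, n in daily_counts.items() if n > 0)
    if len(active_days) < 2:
        return None
    weighted_sum = 0.0
    total_weight = 0.0
    for previous, current in zip(active_days, active_days[1:]):
        gap = current - previous         # days between consecutive active days
        weight = daily_counts[current]   # assumption: weight by the later day
        weighted_sum += weight / gap
        total_weight += weight
    return weighted_sum / total_weight


# Hypothetical user: active on days 0, 1, 3 and 10.
rate = weighted_inverse_gap_rate({0: 2, 1: 1, 3: 3, 10: 1})

# A user active only at the start and end of a one-year subscription.
yearly = weighted_inverse_gap_rate({0: 1, 365: 1})
```

Under this reading, a user who is active only on the first and last day of a one-year subscription gets a rate of 1/365 per day.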

There is a sizable number of users who are seemingly active at the very start of their subscription and then not at all until the end of their subscription; we speculate that this is due to a reminder email encouraging one last stab at finding romance. This results in an artificial peak of users whose activity is around 1 request per 365 days. These users are removed from the following analysis, so as to focus on those who regularly use the site. We also remove users who are only active on 1 day. The cumulative distributions are shown in Fig. 6, along with the 50%, 90%, 95%, and 99% quantiles. The quantiles are given as number per day, so an initiation rate of 1.735 per day at the 95% quantile translates to 12.15 new initiations per week. Similarly, a reply rate of 0.943 per day at the 95% quantile translates to 6.60 new replies per week.

Fig. 6 Cumulative histograms for Initiation Rate (\(I'\)) and Reply Rate (\(R'\))
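The per-week figures follow from multiplying the daily quantile by seven. The sketch below shows that conversion together with a simple empirical quantile. The text does not say how the quantiles were computed, so the nearest-rank rule used here is an assumption, and the function names and sample rates are hypothetical.

```python
import math


def per_week(rate_per_day):
    """Convert a per-day rate to a per-week rate."""
    return rate_per_day * 7


def nearest_rank_quantile(values, q):
    """Smallest observed value with at least a fraction q of values at or below it."""
    if not values or not 0 < q <= 1:
        raise ValueError("need a non-empty sample and 0 < q <= 1")
    ordered = sorted(values)
    rank = math.ceil(q * len(ordered))
    return ordered[rank - 1]


# Conversions for the two 95% quantiles quoted in the text.
initiations_per_week = per_week(1.735)  # about 12.15
replies_per_week = per_week(0.943)      # about 6.60

# Hypothetical per-day rates for five users.
rates = [0.1, 0.2, 0.5, 1.0, 2.0]
median_rate = nearest_rank_quantile(rates, 0.5)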

In relation to RQ4, the match-level data offer the opportunity to investigate the attractiveness and selectivity of users both when they send and when they receive requests, which need not be symmetrical. The match-level data are aggregated by user, allowing us to study the range of behaviors of users when sending or receiving requests. We define the following measures for communication attractiveness when sending \(({A}_{s})\) and receiving \(({A}_{r})\) requests:

\({A}_{s}=\frac{{n}_{rep}}{{n}_{sent}}\), \({A}_{r}=\frac{{n}_{rec}}{{n}_{v}}\), where \({n}_{sent}\) is the number of messages sent, \({n}_{rep}\) is the number of replies, \({n}_{rec}\) is the number of messages received, and \({n}_{v}\) is the number of profile views. Note that this defines attractiveness according to the success of a user, rather than self-perceived physical attractiveness, as used previously. We also calculate the initiation ratio and reply ratio, that is, how likely a user is to engage with another user when presented with a possible match or a new message:

\(I=\frac{{n}_{sent}}{{n}_{v}}\), \(R=\frac{{n}_{rep}}{{n}_{rec}}\).

Selectivity when sending (\({S}_{s}\)) and receiving (\({S}_{r}\)) requests, or how picky a user is when selecting partners, is consequently defined as

\({S}_{s}=1-I\), \({S}_{r}=1-R\),

where S ~ 0 indicates users send/reply to requests indiscriminately, and S ~ 1 indicates users send/reply to a very small fraction of possible requests. Thresholds for the studied users are applied, such that each user has more than 5 potential interactions for the attractiveness/selectivity measures and at least one interaction is undertaken by the user. The correlations between the attractiveness and selectivity features are plotted in Fig. 7.
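The six measures can be computed per user from five counts. The sketch below makes several assumptions that the text leaves open: \({n}_{rep}\) is read as replies received in \({A}_{s}\) and as replies sent in \(R\); a single view count stands in for \({n}_{v}\) in both \({A}_{r}\) and \(I\); and the threshold is applied by requiring each denominator to exceed 5 and the user to have sent or replied to at least one message. All class, field, and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserCounts:
    n_views: int             # n_v: profile views
    n_sent: int              # n_sent: messages the user initiated
    n_replies_received: int  # n_rep as used in A_s (assumed reading)
    n_received: int          # n_rec: messages the user received
    n_replies_sent: int      # n_rep as used in R (assumed reading)


def ratio(numerator: int, denominator: int, min_potential: int = 5) -> Optional[float]:
    """Return numerator/denominator only if there are enough potential interactions."""
    if denominator > min_potential:
        return numerator / denominator
    return None


def measures(u: UserCounts, min_potential: int = 5) -> Optional[dict]:
    """Attractiveness, engagement and selectivity measures for one user."""
    if u.n_sent + u.n_replies_sent < 1:
        return None  # user undertook no interaction at all
    initiation = ratio(u.n_sent, u.n_views, min_potential)
    reply = ratio(u.n_replies_sent, u.n_received, min_potential)
    return {
        "A_s": ratio(u.n_replies_received, u.n_sent, min_potential),
        "A_r": ratio(u.n_received, u.n_views, min_potential),
        "I": initiation,
        "R": reply,
        "S_s": None if initiation is None else 1 - initiation,
        "S_r": None if reply is None else 1 - reply,
    }


# Hypothetical user with 100 views, 20 sent messages and 10 received messages.
example = measures(UserCounts(100, 20, 5, 10, 4))
```

For the hypothetical user above, one initiation per five views gives a sending selectivity of 0.8, and four replies to ten received messages give a receiving selectivity of 0.6.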

Fig. 7 Relationships between attractiveness and selectivity for both receiving and sending

We observe a positive, significant correlation between attractiveness and selectivity: those who receive lots of requests are able to select their preferred partners, whereas those who receive fewer matches are forced to be less choosy. Users act on feedback based on how many replies they receive to adjust their expectations, as well as on their self-perception of attractiveness. There is also a correlation between attractiveness when receiving requests and selectivity when sending requests, which are one step removed from each other and reliant on self-perception of attractiveness compared to potential matches. In summary, more attractive people, whether initiating or receiving contact, are more selective with the fraction of people that they interact with.

Personality features

We built a multivariate regression model to determine which variables could predict “success” in online dating, as measured by communication rate (communication initiations received over profile views by matches). After transforming skewed variables to normalize their distributions and standardizing all coefficients, the results of the multivariate regression against communication rate are reported in Table 4.

Table 4 Multivariate linear regression models to determine which variables predict receiving communication initiations. The first model is for men receiving communication initiations from women, and the second is for women receiving communication initiations from men.

The results of the individual models for each gender reveal that there are different variables that predict success for men and women. Since the coefficients are standardized, we can compare between variables within each gender. For men, being altruistic and having a higher drinking level were the strongest predictors of receiving messages, while being older and more oriented toward conflict resolution were the most negative predictors of receiving messages. For women, cleverness, neuroticism, and drinking level had no impact on predicting likelihood of receiving messages. Being older was the strongest negative predictor of receiving messages, while being athletic was the strongest positive predictor. Similar to the results for men, sending communications and being sexual or oriented toward conflict resolution had a negative impact on receiving messages. Having photos and being romantic and altruistic helped chances of success for women as well. Also, we observe that overall, the rate at which women receive messages is much more predictable than men judging by the R-squared for both models.


To answer RQ5 and RQ6, we use a logistic regression to analyze whether homophily in sociodemographic or psychometric variables translates into higher chances of match communication. Homophily is operationalized as two users having the same value for any particular variable by creating a series of dummy variables. In the case of number of children and age, however, it was more sensible to simply calculate the absolute value of the difference (Footnote 1). Whether two users communicate with each other was operationalized by creating two dependent variables: Communication and Initiation. Both are binary variables, which are set to one whenever a suitable variable indicates that a user has replied to a message or initiated contact after a match by sending a message.
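The homophily operationalization above can be sketched as a small feature-construction step: a 0/1 dummy for each categorical variable on which two users share a value, and an absolute difference for number of children and age. The function name, the feature prefixes, and the example variable names and values are hypothetical; fitting the logistic regression itself is not shown.

```python
def homophily_features(user_a, user_b, categorical, numeric):
    """Build pairwise homophily features for one match.

    categorical: variables encoded as a same-value dummy (1 if equal, else 0).
    numeric: variables encoded as the absolute difference between the users.
    """
    features = {}
    for var in categorical:
        features[f"same_{var}"] = 1 if user_a[var] == user_b[var] else 0
    for var in numeric:
        features[f"absdiff_{var}"] = abs(user_a[var] - user_b[var])
    return features


# Hypothetical pair of matched users.
a = {"smoking_level": "never", "wants_children": "yes", "age": 34, "n_children": 0}
b = {"smoking_level": "never", "wants_children": "no", "age": 29, "n_children": 2}

row = homophily_features(
    a, b,
    categorical=["smoking_level", "wants_children"],
    numeric=["age", "n_children"],
)
```

Each such row, paired with the binary Communication or Initiation outcome for the match, would form one observation for the regression.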

The logistic regression was run in a number of different specifications to hedge against omitted relevant variable bias and to test the robustness of the results. Variables were grouped into socioeconomic, personal, BAPIM, and importance clusters, and regressions were run within clusters for all years, within clusters for each year separately, within each year with all clusters, and across all years with all clusters. The significance and sign of most coefficients varied widely across specifications, with the most rigorous specification rendering the majority insignificant. The only variables which seem to be fairly robustly significant and somewhat stable in their effects are number of children, the desire for children, and a user’s smoking level.

Discussion and conclusion

The development of evolutionary theories of human social behaviour [50,51,52,53,54] has afforded a strong theoretical framework for sex differences in mate selection criteria. The finding that women have consistently higher means across mate preferences in this work confirms findings of gender differences in mate preferences; namely that women are more selective and restrict their potential mating pool more than men do. This result has also been reported in the speed-dating literature (e.g., [11, 55]) and falls in line with theories in evolutionary biology about females being pickier about their potential mates. However, there are notable new findings in the work at hand that contradict the previous investigations of mate preference in online daters.

For instance, Hitsch et al. claim that women have a stronger preference than men for income over physical attributes [14]. This work reveals that smoking level and drinking level were the most important match criteria for both men and women overall, suggesting that lifestyle choices are important across both genders. In fact, income was the second least important criterion to women, religion being the least. Hitsch et al.’s claim is partially true, in that women on average do consider income in a potential match more important than men do, but the importance of this trait has decreased significantly over time. This change could theoretically be due to women’s increased financial independence, though it would be difficult to attribute cause definitively.

The decline in importance of income, religion, and education for both men and women is a surprising trend that suggests perhaps people are becoming more tolerant and open to dating others outside of their own social strata. This tolerance has notably not translated over to age preferences, where patterns over time are less clear. Somewhat surprisingly, women are still more restrictive overall in their preference for age than men are. This may seem counterintuitive to those who might expect men to only seek mates within child-bearing age. As it turns out, women are pickier across the board, which may also have more to do with male over-representation on online dating sites and therefore increased female choice.

The finding that gender differences in response for the two lifestyle questions (smoking level and drinking level) were not significant from 2015 onward may reveal that social attitudes toward these activities are not gender-dependent. While the importance of drinking level for men rose from 2014 to 2017, both genders consistently regard drinking level “Somewhat Important,” suggesting that social attitudes may have relaxed toward drinking level for both men and women. Meanwhile, preferences for smoking level became almost evenly split between those who consider smoking “Very Important” and “Not Important,” suggesting that people in general fall into the two camps of smokers and non-smokers and are intolerant towards the other group. Since smoking levels have decreased in the UK over time, this polarization of opinion may be due to changing demographics of the user base as well.

While women are more selective along virtually every mate preference criterion, this gender difference in selectivity crucially depends on group size. Previous literature has found that in smaller sessions (fewer than fifteen partners), selectivity is virtually identical for men and women, with subjects of each gender “saying yes” to about half of their partners. In larger group sizes, however, male selectivity is unchanged, while females become significantly more selective, choosing a little more than a third of their partners [11]. These results are quite distinct from the average difference in selectivity between men and women, suggesting rather more rapidly diminishing returns for increased dates for females when group size increases. Though Fisman’s research focused on speed-dating, the parallel holds for the significantly increased group size of an online dating platform, where choice is virtually endless. The reasons women may be more selective than men and find less utility in increased choice could be manifold, from social stigma against women who go on many dates to differing motivations for why men and women use online dating in the first place.

The findings relating to the relationship between physical attractiveness and communication rate are notable due to their online context. Other studies attempting to measure the effects of physical attractiveness on popularity have encountered difficulties separating physical attractiveness from confounding characteristics, including social skills [56, 57]. However, the online nature of this work is unique in that very little can be socially expressed from an online profile on eharmony. After being matched through an algorithm, users are left to evaluate a profile based on little more than a picture.

The finding that physical attractiveness does not have a monotonic relationship with communication rate is somewhat surprising, but in line with previous research that produced similar findings [12]. The slight but notable differences in the relationship for each gender have several implications. First, the higher rate of change for women scored between 2 and 6 in attractiveness suggests that women’s communication rates are more dependent on their looks than men’s are. The finding that men value attractiveness more than women do is consistent with previous research that found stronger correlations between physical attractiveness and opposite-sex romantic popularity for women than for men. It is also in line with critical feminist theory as well as evolutionary and sociocultural theories of mate selection preferences that contend that men place greater value on physical attractiveness than do women. However, this is complicated by the fact that the rate of change is steeper and more negative for men ranked 6 and above. The same fear of rejection mentioned earlier may be stronger for women initiating conversations with particularly attractive men.

The findings of growing asymmetry in communication initiation between men and women are rather counterintuitive. While early on, people might have hoped online dating would create a more equal playing field for women to initiate courtship, it has become clear that online dating has not only reflected but exacerbated male-dominated initiation. This is due largely to the lop-sided activity levels for men and women on online dating sites, as women learn to expect male initiation and avoid initiation in keeping with learned norms. The introduction and mass popularity of mobile dating applications such as Tinder in 2014 could also explain the accelerated decline of female initiation over the following years, as online dating became more popular and the signaling and psychological costs for men sending messages declined.

As online dating becomes more popular and increasingly sophisticated, a new generation of dating apps is embedding costly signals into their platform design to solve this issue of lop-sided communication. By instituting a mechanism whereby each agent has only a limited number of signals, they create opportunity costs associated with sending signals. For instance, Coffee Meets Bagel has a Woo button, where users pay (with the in-app “beans” currency) to send an extra signal to a specific someone. Users only get beans by performing tasks like inviting friends or purchasing them directly in the app store. In a similar vein, Tinder lets users send one Superlike per day. These signals work, because they are costly to the sender by virtue of scarcity and the receiver knows this, and thus, they pay attention to the signal in an otherwise noisy environment.

Other factors that may influence the design of online dating platforms are our findings on individuals’ ‘dating capacity’, or how frequently people engage with new prospective partners. While user strategies might vary across more casual dating platforms, users on eharmony are particularly invested in finding a long-term romantic partner, so we are confident that these findings are applicable to non-casual courtship behaviour in general. These findings and methods may nevertheless be integrated across dating platforms to enable effective platform-specific communication. Though research behind Dunbar’s number acted as motivation for this line of inquiry, the picture is by no means complete. Specifically, further work is needed to assess the number of prospective partners that contact is simultaneously maintained with, rather than new people contacted, as well as cross-platform research to test results for serious communication patterns with different platform affordances, cultures, and sexualities. A related result for initiation rate (median ~ 1 per day) is provided in [10], which is of comparable size to our own, though initiation is likely more sensitive to the platform population and design as compared to reply rate.

Our results across different dimensions of attractiveness by popularity and selectivity indicate individuals’ awareness, if weak, of their own desirability. This, together with feedback from their level of success on the platform, informs user choices in initiating and replying. The correlations between attractiveness and selectivity for both the same and different modes of replying and initiating indicate that this awareness and strategy goes beyond directly addressing the individual games of active search for and deciding responses to potential partners.

From the models for the number of communications received, we learn that being younger and athletic and having more photos increases likelihood of receiving messages in online dating, as does being romantic and altruistic. These results add a more nuanced understanding to previous findings in RQ2 about the importance of physical attractiveness. It could be possible that being young and athletic is at least related to identifying as physically attractive, and that these traits increase likelihood of receiving messages. The negative relationship between communication rate and being older could suggest that age is not heterogeneous, but that users prefer younger potential partners. The negative relationship between communication rate and sending communication initiations also confirms that users who receive many communication initiations are less likely to send initiations themselves. It is unclear what signals users with higher levels of neuroticism and conflict resolution skills are sending in their profiles that decrease likelihood of receiving messages.

As for the differences between predicting success for men and women, the findings that drinking and being clever were positive predictors of success for men but not for women are noteworthy. These findings suggest that physically reflected traits such as age and athleticism were the most important factors for determining whether women would receive messages, in line with our earlier results in RQ2 about women being evaluated by their looks more than men.

The logistic regressions testing for the interplay of homophily and match communication uncover that a big difference in the number of children between two users seems to have strong negative effects on their chances of communicating or initiating contact. This seems intuitive, as the shared experience of having children or not having children has huge implications on the success of a potential relationship. Furthermore, the presence of children in a relationship introduces responsibility on a potential spouse that will greatly impact the seriousness of the relationship. Additionally, the desire for children is unsurprisingly decisive, with a matching desire leading to sometimes very strong positive effects. The choice of having or not having children is clearly a strong factor in a potential couple’s life planning. Differing opinions on such life plans will clearly have a negative impact on a couple’s shared experience.

The sometimes very strongly negative effect of a user’s matching smoking level is notable. Intuition seems to dictate that non-smokers will be very averse to the idea of entering a relationship with smokers, whereas smokers would either be indifferent or averse to the idea of entering a relationship with a non-smoker. These results, however, indicate that matching smoking levels are detrimental to successful communication. One explanation might be that smokers do not want to enter a relationship with other smokers (Footnote 2).

The lack of robustly significant findings in the remainder of the variable set hints at low levels of homophily on online dating sites. Similarity in key socioeconomic or psychometric variables does not seem to matter for the chances of successful match initiation or communication. Even when users seem to place nominal importance on variables such as income, these variables appear de facto to be insignificant.

This may be driven by a few factors. First, men are much less selective in who they communicate with than women. Moreover, low activity of a large number of users, lower activity levels for women vis-à-vis men [28], and the social and gender-normative conventions of men still typically being the first initiator for a conversation in dating settings [9, 26, 32] make it optimal for men to use the “shotgun method” of dating. That is, the optimal strategy is not to find a “good match” (i.e., high homophily levels), but to maximize the probability of a successful match by messaging a large number of people, irrespective of their potentially low fit. Second, it is very possible that people’s reflections on what they find desirable in a potential match are overridden by more superficial concerns, such as a user’s profile picture. Third, it is possible that the relationship between homophily and successful communication is non-linear and thus is not captured by a logistic regression.

In conclusion, our findings provide a quantitative overview of how heterosexual users seek mates, evaluate physical attractiveness, and communicate with one another in online dating systems. The results span investigations of various online dating phenomena first at the individual level, then between pairs of users, and finally between genders. In addition to its broader findings, this work sheds light on latent gender asymmetries in the user preferences and user behaviour of online daters and how these differences have changed over time. The results from the aforementioned areas, and the regression of a number of them against the number of communications received on the platform, show that while there are many variations across mate preferences and communication patterns, there are a few pointed variables that could actually act as potential predictors of sending and receiving messages overall.

Finally, one should note that this work was conducted using data generated from a single online dating site within the United Kingdom. While extra caution was taken to ensure that findings were as representative as possible, it would be valuable to confirm their validity by testing novel data sets from other dating sites and from different geographies where social norms may differ. Future research could also strive to situate these findings within the offline courtship context. Building on this work, researchers could use a combination of qualitative and quantitative methods to better understand whether users approach mate preference, evaluation of physical attractiveness, and communication initiation differently in offline contexts where search costs or fear of rejection may be higher.

Funding statement

RD was partially funded by eharmony. CB was funded by the Alan Turing Institute Studentship. TY was partially funded by The Alan Turing Institute under the EPSRC grant EP/N510129/1. The Funder had no role in the conceptualization, design, data collection, analysis, decision to publish, or preparation of the manuscript. Open Access funding provided by the IReL Consortium.