1 Introduction

Most individuals adopt an innovation by imitating their influential peers (Rogers 1962; Bass 1969), which underlines the role of social networks in the diffusion of new products, technologies or ideas (Granovetter 1978; Valente 1996). Network scientists argue that the structure of social networks can explain the underlying mechanisms of social influence and adoption: highly connected nodes have more influence than others (Pastor-Satorras et al. 2015), while diffusion is more likely in tightly connected cliques and less likely across them (Centola and Macy 2007). Wang et al. (2019) paint a more nuanced picture: they found that while network hubs are effective in spreading simple messages, less connected nodes gain importance in the diffusion of complex stories.

A central part of this discussion has led to the “influence maximization” problem (IM) (Kempe et al. 2003), which aims to identify the ideal seed nodes that a marketing campaign should target to achieve maximum impact, given pre-defined diffusion models. The IM problem is NP-hard; thus, many authors use heuristics to find the seed nodes. They start the optimization by assuming that nodes with high network centrality (e.g., degree) are influential spreaders (Kitsak et al. 2010; De Arruda et al. 2014) and then run diffusion simulations, most notably using the linear threshold and the independent cascade models. However, these models fail to capture an important feature that is observed in real-life networks: homophily, the tendency of similar individuals to be connected more often than dissimilar ones.

Homophily, also referred to as assortativity in relation to social networks (Newman 2002), is a general phenomenon (McPherson et al. 2001; Cho et al. 2012) that has a fundamental role in innovation spreading (Anwar et al. 2021). Ignoring this effect poses a major problem for seed identification in influence maximization when the sole source of information is the network structure (Aral and Dhillon 2018).

Although there are papers that address the role of homophily in network diffusion and papers that consider innovators and early adopters in influence spreading (see Sect. 2 for an overview), none focus on both problems simultaneously. In particular, our paper is the first that aims to identify innovators and early adopters while taking their assortative mixing into account, with the aim of providing heuristics for seed selection in influence maximization.

This research niche is important since social influence and centrality are difficult to disentangle without knowing at least some of the early adopters of the specific innovation (Banerjee et al. 2013) or assuming homophily in terms of adoption in the network (Toole et al. 2012). Furthermore, central individuals may be reluctant to participate in a campaign or may not be susceptible to the marketing message due to risk-averseness. Subscribing to a new trend or technology needs commitment and entails social risk, which not everyone is willing to take; yet, homophily in terms of risk-taking behavior is a prerequisite of diffusion cascades (Watts 2002). Central agents with many friends may particularly feel the social pressure to be conformist and to avoid eccentric behavior. Innovators and early adopters, on the other hand, are known to possess psychological traits that make them perfect subjects for the early market of an innovation (Rogers 2003).

In this paper, we aim to contribute to the above discussion in two ways. First, empirical data on adoption dynamics from two online social networks enable us to investigate how network structure can be useful to identify innovators and early adopters in innovation diffusion. Second, we propose a ranking of the users based on the so-called Top Candidate method (Sziklai 2018)—an expert selection algorithm that exhibits features resembling assortativity.

We compare the Top Candidate ranking with seven well-known centrality measures on two online social networks: iWiW from Hungary and Pokec from Slovakia. Registration days of users are known in both networks, both are assortative in terms of adoption time but represent different levels of assortativity in network centralities. We look at the top 1000 nodes of the Top Candidate ranking and the other seven alternative measures and plot how the date of registration is distributed over time.

We find that the Top Candidate ranking is more efficient in capturing innovators and early adopters than other widely used indicators. Top Candidate nodes adopt earlier and have higher reach among innovators, early adopters and early majority than nodes highlighted by other methods. These results suggest that the Top Candidate method can identify good seeds for influence maximization campaigns on social networks.

2 Literature overview

2.1 Early adopters as opinion leaders

The identification of innovators and early adopters is key for marketing campaigns, and their characterization has received considerable attention. The literature converges toward the conclusion that innovators and early adopters stand out from their peers.

Rogers (2003) describes innovators as venturesome individuals who can cope with a high degree of uncertainty, and early adopters as a group with high socioeconomic status. Moore (2014) depicts innovators as technology enthusiasts, or geeks, and early adopters as visionaries who are willing to take high risks. In the literature, innovators and early adopters are often grouped together under the term early market (Muller and Yogev 2006; Moore 2014). Although we do not follow this convention here, we do assume that both groups are susceptible to marketing messages; hence, they are good candidates as seeds for influence maximization.

A field study by Brancheau and Wetherbe (1990) supports the hypotheses that early adopters were more highly educated, more attuned to mass media, more involved in interpersonal communication, and more likely to be opinion leaders. Eastlick and Lotz (1999) report that social risk negatively relates to the tendency to be a potential innovator and that potential innovators possessed significantly stronger opinion leadership. A Dutch survey shows that early adopters are likely to be highly mobile and to have a high socioeconomic status, high levels of education and high personal incomes (Zijlstra et al. 2020). Gender imbalance can also be observed for certain products: Plötz et al. (2014) report that early adopters of electric vehicles are predominantly middle-aged men. Finally, Muller and Yogev (2006) provide empirical evidence that the average time at which the main market outnumbers the early market is indeed when 16% of the market has already adopted the product, which supports Rogers' (1962) division of adopter sets.

Another important concept is market mavenness (Feick and Price 1987). Market mavens are consumers who are highly involved in a market. They have information about many kinds of products and shops, and they enjoy sharing their knowledge. Peers often seek out their opinion and rely on their expertise. Goldsmith et al. (2003) find that consumer innovativeness and market mavenism positively correlate, although they argue that market mavens and innovators are distinct groups. Nevertheless, market mavens can convince their community and thus their social interaction is decisive for innovation diffusion, as was demonstrated in the case of electric vehicles (Seebauer 2015).

Directly related to the context of this study, Lynn et al. (2011) explore the personality traits of early adopters of social network sites. They report that extraversion, openness and conscientiousness impact positively and significantly on information sharing, and negatively on rumor sharing. On the other hand, both information sharing and rumor sharing impact positively and significantly on the centrality of early adopters. These seemingly contradictory observations can be explained by separating the social status of opinion leadership from the influencing capacity of the agent, which relates more to network centrality.

To sum up, innovators and early adopters stand out in their personal characteristics. Thus, marketing campaigns have usually targeted and labeled them as opinion leaders to convince society. However, both Lynn et al. (2011) and Dedehayir et al. (2017) argue that a distinction has to be made between opinion leadership and innovativeness. Even Rogers (2003) affirms that opinion leaders are not necessarily innovators.

2.2 Early adoption and homophily in network diffusion

In the Influence Maximization framework (Kempe et al. 2003), few papers have addressed node characteristics that concentrate in network communities and can help to predict the future popularity of a novelty. For example, influential individuals can form clusters that can help the early propagation of an idea (Aral and Walker 2012). Weng et al. (2014) build a predictive model for meme popularity using three classes of features: network topology, community diversity, and growth rate. They found that community-related features are the most powerful predictors of future success. Hajdu et al. (2020) study the community structure of public transportation networks and find that transmission probabilities depend on the community structure. Calió and Tagarelli (2021) study attribute-based seed diversification. They argue that a seed set with different characteristics (age, gender, etc.) might be more successful in information propagation. Rahimkhani et al. (2015) identify the community structures of the input graph and then choose a number of representative nodes to form the final output of the proposed algorithm.

However, this literature has largely overlooked a phenomenon inherent in social networks and diffusion dynamics alike: the role of homophily (McPherson et al. 2001). It has long been recognized that a behavior can spread in society only when those most prone to it are surrounded by peers who are somewhat less but almost equally open to its adoption (Granovetter 1978). In other words, innovators must be connected to early adopters such that adoption can penetrate their communities and later influence the rest of the market too; otherwise, the innovation will not spread (Watts 2002). Adoption dynamics can be predicted at small scales only by assuming homophily of adoption (Toole et al. 2012). Despite the importance of adoption homophily in networks, it has been largely ignored in influence maximization modeling (Aral and Dhillon 2018).

Instead, a usual assumption to find the seed nodes for Influence Maximization is that network structure alone can quantify influence. For example, nodes with high network centrality (e.g., degree) are usually considered as influential spreaders (Kitsak et al. 2010; De Arruda et al. 2014).

Finally, the presence of assortativity implies that not every connection is equally important in the diffusion. However, the literature also ignored the problem of determining where the probabilities of influence between users come from (Goyal et al. 2010). Recently, Qiang et al. (2019) proposed two learning models that are aimed at understanding person-to-person influence in information diffusion from historical cascades, while Bóta et al. (2015) and Bóta et al. (2016) considered the Inverse Infection Problem as a way to estimate the hidden edge infection probabilities.

3 Data and methodology

In this paper, we propose the Top Candidate method that can identify innovators and early adopters in social networks more efficiently than other widely used network centrality measures, by using network structure as the only source of information. We compare the ranking induced by the Top Candidate method with seven other centrality measures by using data from two online social networks.

3.1 Data

Our empirical analysis leverages data retrieved from two social media platforms. The first platform, iWiW (international who is who), was an early Hungarian online social network that aimed to link pre-existing friends and was an outstanding online innovation of its time. The iWiW platform existed between 2002 and 2014. It was the most visited website in the country in the mid 2000s, but lost the competition with Facebook, which launched in Hungary in 2008. The second platform, Pokec, is a still functioning Slovakian dating and chatting website whose purpose is meeting new people.

These data sources provide a unique opportunity to understand how network structure can help us identify early adopters of an innovation. Both data sources contain the date of individual registration to the websites, which is used as a proxy of adoption. We define innovators as the first 2.5% and early adopters as the following 13.5% of adopters (Rogers 2003). The data also include the identifiers of friends, which enables us to generate social networks. The iWiW dataset has been used in previous work to describe and model the innovation diffusion process (Török and Kertész 2017; Lengyel et al. 2020; Bokányi et al. 2022).
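
As an illustration of how adopter categories can be derived from registration dates, the following Python sketch labels users by their position in the registration order. Only the first two cut-offs (2.5% and 13.5%) are stated above; the remaining shares (34% early majority, 34% late majority, 16% laggards) follow Rogers' standard classification. The function name `adopter_categories`, the input format and the handling of ties are assumptions made for illustration, not details of our data processing.

```python
def adopter_categories(registration_day):
    """Map each user to a Rogers adopter category by registration order.

    registration_day: dict user -> day of registration (smaller = earlier).
    Ties are broken by user id, an arbitrary illustrative choice.
    """
    # Cumulative upper bounds in per mille: 2.5%, 16%, 50%, 84%, 100%.
    bounds = [
        (25, "innovator"),
        (160, "early adopter"),
        (500, "early majority"),
        (840, "late majority"),
        (1000, "laggard"),
    ]
    ordered = sorted(registration_day, key=lambda u: (registration_day[u], u))
    n = len(ordered)
    categories = {}
    for rank, user in enumerate(ordered, start=1):
        for upper, label in bounds:
            # Integer comparison avoids floating-point issues at the cut-offs.
            if 1000 * rank <= upper * n:
                categories[user] = label
                break
    return categories
```

With 200 users, for example, the first 5 registrants are labeled innovators and the next 27 early adopters.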

Here, we use a 10% sample of the iWiW data that contains 271 913 nodes and 2 712 587 edges. The Pokec network contains 277 695 nodes and 2 122 778 edges. Access to the iWiW data was provided to us under a non-disclosure agreement with the data owner company. Pokec data are open access at https://snap.stanford.edu/data/soc-pokec.html.

3.2 The Top Candidate method

The Top Candidate (TC) algorithm is a group identification method designed to find experts on recommendation networks (Sziklai 2018, 2021). The algorithm takes a network as an input and outputs a list of experts. With a parameter \(\alpha \in [0,1]\), we can adjust how exclusive our list should be. Each agent nominates an \(\alpha\) fraction of their most popular neighbors as experts, where popularity is based on (weighted) in-degree. In the beginning, every agent is labeled as an expert; then, in successive rounds, we remove the agents who were not nominated by anyone, together with their nominations, until we obtain a stable set. The underlying idea resembles homophily: experts identify other experts much more effectively than amateurs. Thus, in the set of experts (i) each expert should be nominated by another expert and (ii) each nominee of an expert should also be included in the set. This property is called stability. In this paper, we apply this algorithm to identify innovators and early adopters based on the assumption that opinion leaders can be identified in networks in the same way as experts.
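
The iterative elimination described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation of Sziklai (2018): the number of nominees is taken to be the ceiling of \(\alpha\) times the number of neighbors, ties in popularity are broken by node identifier, arcs are unweighted, and the function name `top_candidate` and the adjacency-dictionary input are hypothetical.

```python
import math

def top_candidate(graph, alpha):
    """Return a stable expert set for one value of alpha.

    graph: dict node -> set of neighbors the node may nominate; every
    node must appear as a key. Rounding and tie-breaking rules below
    are illustrative assumptions.
    """
    # Popularity = unweighted in-degree.
    indeg = {u: 0 for u in graph}
    for nbrs in graph.values():
        for v in nbrs:
            indeg[v] += 1
    # Each agent nominates an alpha fraction of its most popular neighbors.
    nominees = {}
    for u, nbrs in graph.items():
        k = math.ceil(alpha * len(nbrs))
        ranked = sorted(nbrs, key=lambda v: (-indeg[v], v))
        nominees[u] = set(ranked[:k])
    # Start with everyone as an expert; repeatedly drop agents that no
    # current expert nominates until nothing changes.
    experts = set(graph)
    while True:
        nominated = set()
        for u in experts:
            nominated |= nominees[u]
        remaining = experts & nominated
        if remaining == experts:
            return experts
        experts = remaining
```

Because the expert set only shrinks, an agent nominated by a surviving expert was nominated in every earlier round as well, so both stability conditions hold for the returned set.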

One advantage of the Top Candidate algorithm is its axiomatic characterization. It is the unique method that satisfies stability, exhaustiveness and decisiveness. Exhaustiveness ensures that all possible experts are recognized on the network, not just a subset, and decisiveness guarantees that at least one expert is selected if reasonable choices are presented. There are other centralities that feature characterizations, most notably PageRank (Wąs and Skibski 2018), Generalized Degree (Csató 2017) and the Shapley value (Shapley 1953; Young 1985), but it is less clear how these relate to socio-demographic properties of the nodes.

3.3 Network centralities

We compute seven other measures on the data to assess their ability to find innovators and early adopters.

Degree represents the number of connections that a user has. It is a natural benchmark for the user’s centrality. Another classical measure is Harmonic centrality. It is a distance-based measure proposed by Marchiori and Latora (2000). The Harmonic centrality of a node \(\textbf{u}\) is the sum of the reciprocals of the distances between \(\textbf{u}\) and every other node in the network. Disconnected node pairs have infinite distance, thus the reciprocal is defined as zero. Peripheral agents, who are many handshakes away from most of the other users, thus have a small Harmonic centrality.
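
A minimal Python sketch of this definition on an unweighted network uses breadth-first search to obtain the distances; unreachable nodes are never visited and therefore contribute zero. The function name and the adjacency-dictionary input are assumptions for illustration.

```python
from collections import deque

def harmonic_centrality(graph, source):
    """Sum of reciprocal shortest-path distances from source.

    graph: dict node -> iterable of neighbors (unweighted).
    """
    dist = {source: 0}
    queue = deque([source])
    total = 0.0
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                total += 1.0 / dist[v]  # each reachable node is counted once
                queue.append(v)
    return total
```

On the path a-b-c, the middle node obtains 1 + 1 = 2, while an end node obtains 1 + 1/2 = 1.5.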

PageRank (PR), introduced by Page et al. (1999), is a close relative of Eigenvector centrality (Bonacich 1972). The latter assigns centrality scores to nodes based on the eigenvector of the adjacency matrix of the underlying graph. The method breaks down if the graph is not strongly connected. PageRank rectifies this by (i) connecting sink nodes (i.e., nodes with no outgoing arc) with every other node through a link and (ii) redistributing some value uniformly among the nodes. Redistribution is parameterised by the so-called damping factor, \(\alpha \in (0,1)\). PageRank was designed to model a random walk on the World Wide Web. We start from an arbitrary webpage. On any subsequent step, we follow one of the outgoing links of the current webpage, each with equal probability. After each step, we have a \((1-\alpha )\) probability to restart the walk at a random node. The probability that we occupy node \(\textbf{u}\) as the number of steps tends to infinity is the PageRank value of node \(\textbf{u}\). PageRank forms the core of Google’s search engine, but the algorithm is used in a wide variety of applications. The damping value is usually chosen from the interval (0.7, 0.9); here we opted for \(\alpha =0.8\).
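
The random-walk description can be turned into a short power iteration. The Python sketch below is illustrative only: it spreads the weight of sink nodes uniformly over all nodes (including the sink itself), a common simplification of the sink treatment described above, and the function name, tolerance and iteration limit are assumptions.

```python
def pagerank(graph, alpha=0.8, tol=1e-12, max_iter=1000):
    """Power iteration for PageRank with damping factor alpha.

    graph: dict node -> list of out-neighbors; every node is a key.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(max_iter):
        # Mass sitting on sinks is redistributed uniformly.
        sink_mass = sum(rank[u] for u in nodes if not graph[u])
        base = (1.0 - alpha) / n + alpha * sink_mass / n
        new_rank = {u: base for u in nodes}
        for u in nodes:
            out = graph[u]
            if out:
                share = alpha * rank[u] / len(out)
                for v in out:
                    new_rank[v] += share
        change = sum(abs(new_rank[u] - rank[u]) for u in nodes)
        rank = new_rank
        if change < tol:
            break
    return rank
```

The scores sum to one, and on a directed cycle every node receives the same value.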

Generalized degree discount (GDD) introduced by Wang et al. (2016) was developed specifically for the independent cascade network diffusion model. In this model, each active node has a single chance to infect its neighbors, transmission occurring with the probability specified by the arc weights. GDD is a suggested improvement on Degree Discount (Chen et al. 2009) which constructs a seed group of size q starting from the empty set and adding nodes one by one using a simple heuristic. It primarily looks at the degree of the nodes but also considers how many of their neighbors are already in the seed set. GDD improves this by also taking into account how many of the neighbors’ neighbors are spreaders. The spreading parameter of the algorithm was chosen to be 0.05.
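
To make the seed-construction heuristic concrete, the following Python sketch implements the simpler Degree Discount rule of Chen et al. (2009), on which GDD builds; it is not GDD itself, whose second-neighbor correction is not reproduced here. The discounted degree of a node with degree d and t already selected neighbors is d − 2t − (d − t)·t·p. The function name and the tie-breaking by node identifier are assumptions, and the default p = 0.05 merely mirrors the spreading parameter mentioned above.

```python
def degree_discount_seeds(graph, q, p=0.05):
    """Select q seeds with the Degree Discount heuristic (not GDD).

    graph: dict node -> set of neighbors (undirected).
    p: propagation probability of the independent cascade model.
    """
    degree = {v: len(nbrs) for v, nbrs in graph.items()}
    discounted = dict(degree)           # current discounted degree
    selected_nbrs = {v: 0 for v in graph}  # number of seed neighbors
    seeds, chosen = [], set()
    for _ in range(min(q, len(graph))):
        candidates = [v for v in graph if v not in chosen]
        # Highest discounted degree first; ties broken by node id.
        best = min(candidates, key=lambda v: (-discounted[v], v))
        seeds.append(best)
        chosen.add(best)
        for v in graph[best]:
            if v in chosen:
                continue
            selected_nbrs[v] += 1
            t, d = selected_nbrs[v], degree[v]
            discounted[v] = d - 2 * t - (d - t) * t * p
    return seeds
```

On the path a-b-c-d-e with q = 2, the sketch first picks b and then d, because the neighbors of b are discounted after the first pick.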

k-core, also referred to as k-shell, categorizes nodes into layers (Seidman 1983; Kitsak et al. 2010). First, it successively deletes nodes with only one neighbor. These are assigned a k-core value of 1. Then, it successively deletes nodes with two or fewer neighbors and labels them with a k-core value of 2. The process is continued until every node is classified. For instance, every node of a path or a star graph is assigned a k-core value of 1, while nodes of a cycle will have a k-core value of 2.
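
The peeling procedure can be sketched in a few lines of Python. The function name and input format are assumptions; note that in this sketch isolated nodes, which the description above does not cover, receive the value 0.

```python
def core_numbers(graph):
    """k-core (k-shell) value of every node by successive peeling.

    graph: dict node -> set of neighbors (undirected, no self-loops).
    """
    degree = {u: len(nbrs) for u, nbrs in graph.items()}
    remaining = set(graph)
    core = {}
    while remaining:
        # The smallest remaining degree defines the current layer.
        k = min(degree[u] for u in remaining)
        stack = [u for u in remaining if degree[u] <= k]
        while stack:
            u = stack.pop()
            if u not in remaining:
                continue  # already peeled via a duplicate stack entry
            remaining.discard(u)
            core[u] = k
            for v in graph[u]:
                if v in remaining:
                    degree[v] -= 1
                    if degree[v] <= k:
                        stack.append(v)  # deletion cascades within the layer
    return core
```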

Linear threshold centrality (LTC), as the name suggests, was developed for the linear threshold diffusion model (Riquelme et al. 2018). Given a network G with node thresholds and arc weights, the LTC of a node \(\textbf{u}\) represents the fraction of nodes that \(\textbf{u}\) and its neighbors would manage to activate as a seed set in the linear threshold model. Since the social networks we used in our analysis had no data on friendship intensity, we decided to assign a uniform unit weight to each connection. Node thresholds were defined as 0.7 times the node degree. That is, a user became activated if at least 70% of its friends were active.
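
Under the unit weights and 70% thresholds described above, LTC can be sketched as follows in Python. The sketch assumes that the seed nodes count toward the activated fraction and that isolated nodes are never activated by the cascade; the function name is hypothetical.

```python
def linear_threshold_centrality(graph, u):
    """Fraction of nodes activated when u and its neighbors are seeds.

    graph: dict node -> set of neighbors (undirected, unit weights).
    A node activates once at least 70% of its neighbors are active.
    """
    active = {u} | set(graph[u])
    changed = True
    while changed:
        changed = False
        for v in graph:
            if v in active or not graph[v]:
                continue
            active_nbrs = sum(1 for w in graph[v] if w in active)
            # Integer form of active_nbrs >= 0.7 * degree(v).
            if 10 * active_nbrs >= 7 * len(graph[v]):
                active.add(v)
                changed = True
    return len(active) / len(graph)
```

On the path a-b-c-d-e, the middle node c activates the whole network (LTC = 1), whereas the end node a stays at 2/5 because c needs both of its neighbors to be active.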

Suri and Narahari (2008) define a cooperative game on the network and derive node centrality by computing the Shapley value. In this setting, the Shapley value of a node is the average marginal contribution that a node generates when the seed set is composed by adding nodes one by one and any order of the nodes is equally likely. Every node set is assigned a (characteristic function) value. The marginal contribution of a node \(\textbf{u}\) is just the difference between the value of the node set with and without \(\textbf{u}\). There is more than one way to define the characteristic function. We use the G1 game variant proposed by Michalak et al. (2013), who also gave an efficient algorithm to compute the corresponding Shapley(G1)-value. In G1, the characteristic function value of a node set C is the number of nodes in C plus the number of neighbors of C. Under this setting, the Shapley value of a node u is calculated as the sum of reciprocals, \(\frac{1}{1+deg(v)}\), for each v belonging to the extended reach of u (the neighbors of u plus u itself).
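
The closed-form expression for the Shapley(G1)-value translates directly into code. The Python sketch below assumes an undirected network stored as an adjacency dictionary; the function name is hypothetical.

```python
def shapley_g1(graph, u):
    """Shapley(G1)-value of u: sum of 1/(1+deg(v)) over u and its neighbors.

    graph: dict node -> set of neighbors (undirected, no self-loops).
    """
    extended_reach = set(graph[u]) | {u}
    return sum(1.0 / (1 + len(graph[v])) for v in extended_reach)
```

On a star with three leaves, the center obtains 1/4 + 3·(1/2) = 1.75 and each leaf obtains 1/2 + 1/4 = 0.75. The values sum to 4, the number of nodes, which is the value of the grand coalition in G1.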

4 Results

4.1 Homophily of adoption

Before we delve into the performance of the centrality measures, let us take a look at the networks themselves. Tables 1 and 2 explore the interconnectedness of adopter groups. iWiW and Pokec paint a similar picture: typically, there are more connections between subsequent groups in the adoption timeline than between other groups. Innovators are mainly friends with early adopters, who in turn are mainly connected to the early majority, and so on.

Table 1 Group interconnectedness in iWiW. An entry of the matrix shows the portion of links that connects the column group to the row group with respect to the column group’s total connections
Table 2 Group interconnectedness in Pokec. An entry of the matrix shows the portion of links that connects the column group to the row group with respect to the column group’s total connections
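
The entries of such an interconnectedness matrix can be computed as in the following Python sketch, which counts every undirected link once from each endpoint and normalizes by the column group's total connections. The function name and the nested-dictionary output are assumptions for illustration.

```python
from collections import defaultdict

def group_link_shares(edges, group):
    """Column-normalized link shares between adopter groups.

    edges: iterable of undirected (u, v) pairs.
    group: dict node -> group label.
    Returns shares[row][col]: portion of the column group's link
    endpoints that connect to the row group.
    """
    counts = defaultdict(lambda: defaultdict(int))  # counts[col][row]
    totals = defaultdict(int)
    for u, v in edges:
        # Count the link from both endpoints' points of view.
        for a, b in ((u, v), (v, u)):
            counts[group[a]][group[b]] += 1
            totals[group[a]] += 1
    shares = {}
    for col, row_counts in counts.items():
        for row, c in row_counts.items():
            shares.setdefault(row, {})[col] = c / totals[col]
    return shares
```

Each column of the resulting matrix sums to one.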

A number of interesting observations can be made. Firstly, the result reinforces Rogers’ classification. It becomes much more obvious why a cascade unfolds the way it does. Innovators have the biggest impact on early adopters because early adopters are the innovators’ closest (or at least most numerous) friends.

Secondly, psychological traits do affect the network structure. Rogers’ categorization correlates with risk attitudes, extraversion, openness and a number of other traits. It seems that risk-seeking (extrovert, open-minded, etc.) users prefer the company of other risk-seekers, while risk-averse users are more comfortable with other risk-averse individuals. The results are in line with the findings of Selfhout et al. (2010).

Thirdly, identifying innovators and early adopters does not seem to be a hopeless task anymore. Clearly, these groups form clusters on the network. Thus, there can be centralities that are systematically better in recognizing them.

These observations have a rather remarkable implication. Researchers of influence maximization frequently validate their algorithms using simulations with either the linear threshold or the independent cascade diffusion models; these are the most commonly used configurations by far. A basic flaw in these simulations is that thresholds and diffusion probabilities are chosen at random, either independently of the network structure or with only a crude relationship to it. For instance, in every simulation of the linear threshold model, the node thresholds (which signify the tendency of the nodes to adopt an innovation) are generated uniformly at random for each node (Kempe et al. 2003). In the independent cascade model, the two most common propagation setups are the weighted cascade and the trivalency models (Jung et al. 2012). In the former, the propagation probability on each edge equals the reciprocal of the in-degree of the target node, while in the latter, it is chosen randomly from the set \(\{0.1,0.01,0.001\}\).

In light of Tables 1 and 2, these assumptions lead to highly unrealistic threshold and propagation probability distributions. In order to obtain a realistic network configuration, the distribution should take into consideration the clustering of the adopter sets. For instance, thresholds of nodes that belong to innovators or early adopters should be lower in general than thresholds of other nodes. This could be achieved by choosing the thresholds from an interval that depends on the adopter category of the node. Disregarding the underlying structure introduces a systemic bias that may be favorable for some influence maximization algorithms while detrimental to others.
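
The contrast between the conventional setup and a category-aware alternative can be sketched in Python as follows. The interval endpoints in `HYPOTHETICAL_INTERVALS` are invented purely for illustration and are not calibrated to iWiW, Pokec or any other data; the function name and the fixed random seed are likewise assumptions.

```python
import random

# Invented, uncalibrated intervals: earlier adopter groups get lower thresholds.
HYPOTHETICAL_INTERVALS = {
    "innovator": (0.0, 0.2),
    "early adopter": (0.1, 0.4),
    "early majority": (0.3, 0.6),
    "late majority": (0.5, 0.8),
    "laggard": (0.7, 1.0),
}

def draw_thresholds(category, intervals=None, seed=0):
    """Draw a linear threshold value for every node.

    category: dict node -> adopter category.
    intervals: None for the conventional uniform draw on [0, 1], or a
    dict category -> (low, high) for a category-dependent draw.
    """
    rng = random.Random(seed)
    thresholds = {}
    for node, label in category.items():
        low, high = (0.0, 1.0) if intervals is None else intervals[label]
        thresholds[node] = rng.uniform(low, high)
    return thresholds
```

Calling `draw_thresholds(category)` reproduces the usual structure-independent setup, while `draw_thresholds(category, HYPOTHETICAL_INTERVALS)` ties the thresholds to the adopter groups.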

Table 3 Assortativity of the iWiW and Pokec networks in terms of network centralities. The assortativity index ranges from −1 to +1. Negative values mean that nodes of similar centrality values tend not to be connected, while positive values mean that nodes of similar centrality values tend to be connected

Although the two online social networks are similar in terms of adoption homophily, the assortativity of these networks is different in terms of the network centrality measures described in Sect. 3.3. Both networks are assortative in terms of Harmonic centrality and k-shell measures (Table 3). However, Pokec is disassortative in terms of Degree, Generalized Degree Discount, and PageRank. This means that the identification of innovators and early adopters is carried out on networks in which individuals of similar levels of assumed influence are mixed differently.
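
For reference, an assortativity index with respect to a numeric node attribute such as a centrality score can be computed as the Pearson correlation of the attribute values at the two ends of the links (Newman 2002). The Python sketch below assumes an undirected network given as an edge list and counts every link in both directions; the function name is hypothetical.

```python
def attribute_assortativity(edges, value):
    """Pearson correlation of a node attribute across link endpoints.

    edges: iterable of undirected (u, v) pairs.
    value: dict node -> numeric attribute (e.g., a centrality score).
    Returns NaN if there are no links or the attribute does not vary.
    """
    xs, ys = [], []
    for u, v in edges:
        # Each undirected link contributes both orientations.
        xs += [value[u], value[v]]
        ys += [value[v], value[u]]
    if not xs:
        return float("nan")
    mean = sum(xs) / len(xs)
    # xs and ys are permutations of each other: same mean and variance.
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    return cov / var if var > 0 else float("nan")
```

A star graph is perfectly disassortative in terms of degree (index −1), since every link joins the hub to a leaf.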

4.2 Identification of innovators and early adopters

Now we turn to the network centrality indicators and their performances in finding innovators and early adopters. We computed the top 1000 nodes according to eight centrality measures on both iWiW and Pokec. If the 1000th and 1001st node tied under some measure, we discarded nodes of the same centrality value randomly until there were only 1000 nodes in the set.
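
The selection rule with random tie-breaking at the cut-off can be sketched in Python as follows. The function name and the seeded random generator are assumptions made so that the example is reproducible.

```python
import random

def top_k_random_ties(score, k, seed=0):
    """Top-k nodes by score; ties at the cut-off are resolved at random.

    score: dict node -> centrality value; k: positive integer.
    """
    rng = random.Random(seed)
    ranked = sorted(score, key=score.get, reverse=True)
    if len(ranked) <= k:
        return set(ranked)
    cutoff = score[ranked[k - 1]]
    above = [u for u in ranked if score[u] > cutoff]
    tied = [u for u in ranked if score[u] == cutoff]
    # Keep all nodes strictly above the cut-off and fill the remaining
    # places with a random subset of the tied nodes.
    return set(above) | set(rng.sample(tied, k - len(above)))
```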

Tables 4 and 5 show the overlap between the top 1000 nodes of the centralities that we employed in this paper on the iWiW and Pokec networks. Each centrality genuinely differs from the others, although LTC, GDD and PageRank somewhat overlap with Degree on both networks. In general, k-core, TC and Harmonic centrality contain more nodes that are uniquely represented by those centralities.

Table 4 Overlap in the top 1000 nodes of different centralities on the iWiW network. Measures are ordered according to their distance to degree
Table 5 Overlap in the top 1000 nodes of different centralities on the Pokec network. Measures are ordered according to their distance to degree

Table 6 compiles the average and median date of registration for the top 1000 nodes. Centralities are ordered by the median; the last row shows the average and the median for all nodes in the network.

Table 6 Average and median date of registration of the top 1000 nodes of various centrality measures on the social networks iWiW and Pokec. Registration date is measured in number of days from the kickoff of the network. Last row shows the average/median on the whole network

In the case of iWiW, all measures performed well; that is, all averages and medians are below the network average and median. The Top Candidate (TC) method proved to be the best, with an average date of registration 7% lower than that of the next best centrality, Degree, and almost 20% lower than the network average.

TC retains its first place on Pokec as well, though with smaller margins. It performs 4.3% better than the next best, GDD, and 7.5% better than the network average. Note that the centralities showed much more volatility on this network: five out of the eight performed worse than the network average.

The results seem to be consistent. TC, GDD and Degree are among the first four, while Harmonic centrality, PageRank and Shapley (G1) lag behind on both networks. Only LTC and k-core showed varying results.

The average and median date of registration are, in themselves, imperfect indicators of performance. Due to their extreme risk-aversion, laggards would almost surely refuse to participate in a campaign, while individuals belonging to the early majority might be persuaded with, e.g., a small financial reward. Hence, we need to take a look at the whole distribution to evaluate the measures.

In the case of iWiW, the field is mostly even (Fig. 1). TC is the only centrality that sticks out of the crowd, consistently outperforming the other measures in the innovator and early adopter categories, while also having the fewest laggards and late majority members.

Although the performances are more nuanced in Pokec, TC is still the best (Fig. 2). In the case of innovators, its performance is on par with the other measures. This is perhaps due to the fact that very few individuals fell into this category. Its top nodes include more early adopters and early majority members and fewer late majority members than those of any other centrality, while in the laggards category, it is the second best. GDD also shows some very promising results.

Fig. 1 iWiW dates of registration of the top 1000 users of various network centralities. Measures are ordered by the median day of registration

Fig. 2 Pokec dates of registration of the top 1000 users of various network centralities. Measures are ordered by the median day of registration

Assuming that (i) a marketing message or a product sample will only incite innovators or early adopters, and that (ii) these two groups have their greatest influence on like-minded groups and on the early majority, it is worth restricting our attention to these two groups and their interactions with their neighbors. Figures 3 and 4 show the net reach of innovators and early adopters among the top 1000. The bar graph on the left depicts how many innovators, early adopters and early majority members they reach, not counting themselves. This illustrates the indirect impact of the campaign. The bar graphs on the right-hand side show the composition of their reach.
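
One way to compute such a net reach is sketched below in Python. The sketch interprets net reach as the number of distinct neighbors reached, so that a user befriended by several receptive top nodes is counted once; this interpretation, the function name and the category labels are assumptions made for illustration.

```python
def net_reach(graph, top_nodes, category):
    """Distinct innovators, early adopters and early majority members
    adjacent to the receptive nodes (innovators, early adopters) among
    top_nodes, not counting those receptive nodes themselves.

    graph: dict node -> iterable of neighbors.
    category: dict node -> adopter category label.
    """
    receptive = {"innovator", "early adopter"}
    targets = ["innovator", "early adopter", "early majority"]
    # Only receptive top nodes are assumed to respond to the campaign.
    seeds = {u for u in top_nodes if category[u] in receptive}
    reached = set()
    for u in seeds:
        for v in graph[u]:
            if v not in seeds and category[v] in targets:
                reached.add(v)
    counts = {label: 0 for label in targets}
    for v in reached:
        counts[category[v]] += 1
    return counts
```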

Note that TC only comes out as a winner if these two assumptions hold. The bulk reach of, e.g., PageRank, which includes late majority and laggards as well, is much larger than that of TC. Thus, in a conventional linear threshold or independent cascade simulation, PageRank would outperform TC. However, by omitting these two assumptions, we oversimplify the diffusion model and assign inaccurate prediction power to the tested algorithms.

Fig. 3 Reach of innovators and early adopters among the top 1000 nodes of different centralities on iWiW

Fig. 4 Reach of innovators and early adopters among the top 1000 nodes of different centralities on Pokec

5 Conclusion

Innovators and early adopters are not abstract theoretical constructions, but groups that can be found on social networks as node clusters with distinct connection preferences. Consequently, they can be identified by observing the network structure. The top choices of some network centralities include more innovators and early adopters than others. Since these two groups play an essential role in innovation spreading, such network centralities might be more effective in real marketing campaigns.

Influence maximization aims to find the most influential nodes on the network. In the past two decades, myriads of clever heuristics have been invented to optimize this computationally difficult task. Usually, these algorithms are validated via computer simulations with little care about what a real diffusion would look like. In real life, targeted agents often refuse to participate in the campaign. The underlying reasons are manifold, but most prominently agents differ in their risk attitudes. It does not matter how central a node is if it is risk-averse and unwilling to try the advertised product or commit to it openly.

Simulations also commonly ignore network homophily, which can have a serious impact on how a cascade unfolds. Both social networks presented here show strong patterns of homophily (Tables 1 and 2).

We tested eight different network centralities on two social networks where data about the date of registration were available. This allowed us to rank the centralities by their ability to identify innovators and early adopters. A novel expert selection algorithm, the Top Candidate method (TC), consistently outperformed every other method. To a smaller extent, Generalized Degree Discount and Degree were also effective.

A possible explanation of the success of the Top Candidate ranking is that individuals with high socioeconomic status and opinion leadership qualities, two traits that are associated with innovators and early adopters, are perceived as experts in society. Since the Top Candidate method is specifically designed to identify experts, it is small wonder that it finds more innovators and early adopters than other measures. The Top Candidate ranking is derived from the different parametrizations of the Top Candidate method. For a fixed parameter, the Top Candidate method outputs a list of individuals that form a stable set. The underlying idea is that experts are much more efficient in recognizing each other than amateurs, thus the selected individuals must support each other. This property resembles assortativity and might be the reason why the method is successful in identifying such highly assortative sets as innovators and early adopters. Another possible explanation is that TC identifies more market mavens, who are also crucial in innovation spreading and widely acknowledged as experts.

The results may be interesting for practitioners of various fields. Computer scientists often test their heuristics with simulations on either the linear threshold or the independent cascade models. In light of the results, the accuracy of these experiments can be improved by redesigning the threshold and propagation probability distributions. There are already a few papers that study how to obtain sensible propagation probabilities for the independent cascade model, but less attention has been given to node thresholds, and no papers take into account Rogers’ adopter classification when calibrating diffusion variables.

For marketing specialists, the practical lessons of this paper are that aiming for experts in a campaign might be a rewarding strategy, and that the Top Candidate method is an excellent tool for finding them.

6 Limitations and future research

In our study, we implicitly assumed that date of registration is a good proxy for innovativeness. Users that are keen to connect to the social network at an early stage can be reasonably categorized as innovators or early adopters, at least regarding products and services related to social media. However, it is unclear how general the area is where this innovativeness applies. Opinion leaders tend to be monomorphic in nature, meaning they exercise their influence in a domain-specific manner (Flynn et al. 1996; Doumit et al. 2011). The further we are from the initial product, the less certain we can be about their behavior. The same users might be innovative in information technology, but we cannot meaningfully say anything about their attitudes toward unrelated subjects like food and fashion. Thus, researchers of innovation diffusion should always look for additional characteristics besides network position.

The high level of assortativity that is observable in social networks calls for the revision of diffusion models. It would be expedient to test variants of the linear threshold and independent cascade models that account for homophily.

The good performance of the Top Candidate method suggests that at least some fraction of the innovators and early adopters are considered experts in society. However, the rich characterization of innovators and early adopters does not traditionally include the ‘expert’ label. An interesting sociometric line of research would be to explore the relationship between early adoption (or innovativeness in general) and perceived expertise.