Introduction

Social networks are complex structures that describe individuals in any social context. Theoretically, they can be mapped to graphs where nodes represent the individuals and edges connect nodes according to the individuals’ relationships. Then, properties and features can be extracted from the graph, and metrics can be applied to nodes and edges in order to better understand the individuals’ social behavior. Finally, there are many interesting applications based on such networks, including (but definitely not limited to) ranking individuals and their groups, link prediction, information diffusion, recommendation, and pattern analysis (e.g., [5, 14, 22]).

One such property is the strength of ties (given by the graph edges). Initial studies of social networks have emphasized the importance of properly measuring tie strength to understand social behaviors [17, 25]. More recently, analyzing how strong a tie is has allowed researchers to investigate the different roles of relationships, including ranking for influence detection [14], identifying impact at the micro-macro levels in the network [7], influence on patterns of communication [31], and team formation [8].

Despite the importance of analyzing the strength of ties, there are not many studies on evaluating how to measure it in scientific collaboration networks (also called co-authorship networks). In such networks, nodes are researchers and there is an edge between those pairs that have co-authored at least one scientific publication. Specifically, studying the strength of co-authorship ties may reveal how its behaviors relate to research and how any application based on co-authorship patterns may benefit. For instance, new strength-related metrics could help existing works on measuring research productivity [12] and ranking researchers [14] and their graduate programs [21], as well as recommending collaborations [5].

Furthermore, properly measuring the strength of co-authorship ties may help to identify which collaborations are more influential for each researcher. For example, if a researcher A collaborates with researchers B and C, the strength of ties reveals which one is more important to A, which in turn enables different studies, such as team formation analyses. Also, researchers who form mostly weak (or mostly strong) ties in the social network may exhibit different collaboration patterns; for example, a researcher may have many collaborators through single papers, i.e., that person has collaborated only once with many people.

Overall, tie strength may be measured by a combination of the amount of time, the cooperation intensity, and the reciprocal services that characterize the tie [17]. Such strength may also be measured by using the neighborhood overlap metric [4, 13], a topological property that captures the total number of collaborations between the two ends of each edge and identifies the edges forming bridges in a community (a set of nodes that are densely connected). The advantages of such a metric are its simple computation, the ability to identify whether ties are bridges, and the inclusion of neighbors when calculating tie strength (which allows analyzing how a tie is embedded in the social network, for example, whether it is isolated).

Another metric that has been largely used to measure the intensity of co-authorship between ties is the absolute frequency of interaction (the number of publications between pairs of researchers) [29, 32]. Besides its simple calculation, another advantage is the representation of the exact frequency of collaboration between ties. However, we find a few problems in both metrics that complicate their sole use to measure the strength of co-authorship ties, such as presenting extreme values that do not represent reality. The existence of such problems suggests the metrics should be considered together and with other social network (SN) properties to better measure tie strength.

To overcome such limitations, this work proposes a new metric, called tieness, that helps to define a tie as weak or strong (see endnote 1). Note that the goal of tieness is not to replace neighborhood overlap and absolute frequency of interaction, but to be an additional feature that may allow deeper and complementary analyses.

In summary, tieness is an easy-to-compute metric that considers the neighbors and the intensity of co-authorships between researchers to measure tie strength. It differs from the existing ones by combining relevant aspects of the social network. Moreover, tieness can solve problems present in neighborhood overlap and weight (a shorter name for absolute frequency of interaction), which have been largely used to measure tie strength [13, 26]. It may also be applied to different social networks, not only co-authorship social networks, e.g., a movie-producing network such as the one in [30].

After discussing the methods (“Methods overview” section), we present the contributions of this paper, summarized as follows:

  • We discuss four case studies in which neighborhood overlap and absolute frequency of interaction alone fail to properly measure the strength of ties. Also, we show the relationship between both metrics in three real datasets built from digital libraries of distinct fields: Computer Science with DBLP (endnote 2), Medicine with PubMed (endnote 3), and Physics with APS (endnote 4) (“Neighborhood overlap and absolute frequency of interaction” section).

  • We propose a new metric called tieness that combines a modification of neighborhood overlap with absolute frequency of interaction. It is easy to calculate and better differentiates tie strength at different levels. We also introduce a nominal scale for tieness based on the values of the modified neighborhood overlap and absolute frequency of interaction. Such a nominal scale allows identifying when a tie is weak or strong and whether it links researchers from different communities (“Tieness: a new metric for the strength of ties” section).

  • We validate tieness and its nominal scale according to Granovetter’s theory by removing weak and strong ties (“Results and discussion” section).

We finish this article by discussing previous work in the “Related work” section and final remarks in the “Conclusion” section.

Methods overview

The main goal of this article is to propose a new metric to measure the strength of co-authorship ties. In order to do so, we empirically evaluate four cases in which existing metrics commonly used to measure tie strength (neighborhood overlap and absolute frequency of interaction) present problems. Then, we propose our new metric called tieness focusing on solving these problems.

Next, we analyze the linear and non-linear correlation between neighborhood overlap (NO) and absolute frequency of interaction (W). The result of such correlation helps to identify whether both metrics are independent, i.e., whether they add or multiply when taken together. We do so by analyzing the relationship between both metrics on academic social networks from three different areas of expertise. The areas and their datasets are (i) Computer Science given by DBLP (collected in September 2015), (ii) Medicine by PubMed (April 2016), and (iii) Physics by APS (March 2016). We split DBLP into two datasets: DBLP articles and DBLP inproceedings. For PubMed (a database of biomedical publications maintained by the US National Library of Medicine at the National Institutes of Health), we consider publications from the top-20 journals ranked by h-index. For APS (American Physical Society), we consider a sample dataset with its journal publications. Then, we build a co-authorship SN for each dataset, with the features shown in Table 1.

Table 1 Datasets and their basic statistics and information
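As a minimal sketch of how such a co-authorship network can be built, the following Python code assumes that each publication is available as a list of author names. The function name and the toy records are hypothetical and are not taken from DBLP, PubMed, or APS. It creates the neighbor set of each researcher and the absolute frequency of interaction (W) of each pair of co-authors.

```python
from collections import defaultdict
from itertools import combinations

def build_coauthorship_network(publications):
    """Return (neighbors, weight) for a list of author lists.

    neighbors[a] is the set of co-authors of a; weight[(a, b)] with a < b
    is the number of publications that a and b have co-authored (W).
    """
    neighbors = defaultdict(set)
    weight = defaultdict(int)
    for authors in publications:
        # Every unordered pair of distinct authors of a paper shares one tie.
        for a, b in combinations(sorted(set(authors)), 2):
            neighbors[a].add(b)
            neighbors[b].add(a)
            weight[(a, b)] += 1
    return dict(neighbors), dict(weight)

# Hypothetical toy records: three papers and four researchers.
papers = [["A", "B"], ["A", "B", "C"], ["C", "D"]]
neighbors, weight = build_coauthorship_network(papers)
```

Storing each edge once under a sorted pair keeps the network undirected, which matches the definition of a co-authorship tie used here.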

Considering the four problem cases and the correlation results, we propose tieness by combining a modification of neighborhood overlap with the absolute frequency of interaction. As neighborhood overlap is a normalized metric and absolute frequency of interaction is not, we have to normalize the latter before combining it with the modified neighborhood overlap. Thus, we guarantee that tieness is in the range [0, 1].

In the following, we propose a nominal scale to tieness by analyzing the ECDFs (Empirical Cumulative Distribution Functions [20]) of neighborhood overlap, absolute frequency of interaction, modified neighborhood overlap, and tieness for each social network. ECDF is a graph used to evaluate the data distribution, estimate percentiles, and compare distinct distributions. The analysis of such graph reveals the percentile of data that falls below a specific value.
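To illustrate how an ECDF yields percentiles, the sketch below computes the ECDF of a small sample and reads the quartiles from it. The sample values are hypothetical, and the quartile rule used (the smallest observed value whose ECDF reaches the target fraction) is only one of several common conventions.

```python
from bisect import bisect_right

def ecdf(values):
    """Return a function F where F(x) is the fraction of values <= x."""
    ordered = sorted(values)
    n = len(ordered)

    def F(x):
        return bisect_right(ordered, x) / n

    return F

def quartiles_from_ecdf(values):
    """Smallest observed value whose ECDF reaches 25%, 50%, and 75%."""
    ordered = sorted(values)
    F = ecdf(ordered)
    return [next(v for v in ordered if F(v) >= p) for p in (0.25, 0.50, 0.75)]

# Hypothetical co-authorship frequencies of eight ties.
w = [1, 1, 1, 1, 2, 3, 5, 8]
F = ecdf(w)
quartiles = quartiles_from_ecdf(w)
```

In this toy sample, half of the ties have a frequency of 1, so the first and second quartiles coincide, which is the kind of repeated value an ECDF makes visible.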

Finally, we validate such a nominal scale by following Granovetter’s theory, which claims that weak ties connect nodes from different communities, whereas the strong ones link nodes from the same community. In other words, weak ties are acquaintances and provide access to novel information, while strong ties represent relationships with people whose social circles overlap. In order to follow this theory, we remove weak and strong ties, one class at a time, and analyze the effect of such removals on the co-authorship social networks.

Neighborhood overlap and absolute frequency of interaction

In this section, we first present four cases in which neighborhood overlap and absolute frequency of interaction cannot be solely used to measure tie strength. Then, we empirically show their relationship on three different networks.

Four motivating cases

We have empirically studied different co-authorship social networks and identified four cases in which existing metrics cannot be solely used to measure tie strength. Such study considers three different networks and the two main metrics: neighborhood overlap (NO) and absolute frequency of interaction (W).

Case 1: a pair of collaborators without any common neighbor. One of the problems of using only NO to measure the strength of ties is when an author has a high frequency of collaboration with another author but they do not have any common neighbor. In this case, the NO is zero, which does not represent reality. Figure 1 exemplifies this case. Another problem here is that NO and W present contradictory results. Analyzing NO, the pair A,C is a bridge as the strength of co-authorship is very weak. At the same time, W may indicate that such tie is not very weak. Therefore, considering both metrics is better to analyze how strong a tie is.

Fig. 1 Case 1: no common co-author
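Case 1 can be reproduced with a short sketch of neighborhood overlap, written here as the number of shared neighbors divided by the number of neighbors of either endpoint, excluding the endpoints themselves. The toy network and the frequency value are hypothetical, and returning zero when neither endpoint has another neighbor is an assumption of this sketch.

```python
def neighborhood_overlap(neighbors, i, j):
    """Shared neighbors of i and j over all their neighbors, excluding i and j."""
    common = neighbors[i] & neighbors[j]
    others = (neighbors[i] | neighbors[j]) - {i, j}
    # Assumption: an isolated pair (no other neighbor) gets an overlap of 0.
    return len(common) / len(others) if others else 0.0

# Hypothetical network in the spirit of Case 1: A and C publish together
# often (W = 10) but share no co-author.
neighbors = {"A": {"B", "C"}, "B": {"A"}, "C": {"A", "D"}, "D": {"C"}}
w_ac = 10
no_ac = neighborhood_overlap(neighbors, "A", "C")

# A triad, by contrast, reaches the maximum overlap on each of its edges.
triad = {"X": {"Y", "Z"}, "Y": {"X", "Z"}, "Z": {"X", "Y"}}
no_xy = neighborhood_overlap(triad, "X", "Y")
```

Here the tie between A and C has an overlap of zero despite ten joint publications, so the two metrics disagree about the same tie.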

Case 2: determining if two collaborators are from the same community or not. One problem in measuring the strength of ties using only W is that such a metric provides a simplistic view of the relationship: it is not possible to know whether the relationship is intracommunity or not. This case is exemplified by Fig. 2. Since ties with low W may be intracommunity and ties with high W may be intercommunity, using only W is not enough to assess how weak or strong a tie is (i.e., it does not allow properly verifying Granovetter’s theory [17], in which weak ties serve as bridges in the network).

Fig. 2 Case 2: no community information (a, b)

Case 3: little collaboration between a pair of collaborators and plenty of common neighbors. In this case, NO and W give values with opposite meaning, i.e., high NO and low W. Such results make it hard to define tie strength. Certainly, it depends on the analysis of the context. However, following Granovetter’s theory, such tie should be strong. Figure 3 gives an example of this case.

Fig. 3 Case 3: many common co-authors

Case 4: results with extreme values. Here, the problem arises when NO or W has extreme values that may not represent reality. Figure 4a shows a maximum value of NO, because the edge is part of a triad. Nevertheless, the value of W for the same edge is very small, which means that the tie is not necessarily very strong. Figure 4b presents a similar situation, but W is very high and NO has the minimum value (zero). In both situations, defining a tie as weak or strong based on only one of the metrics may provide a misleading interpretation.

Fig. 4 Case 4: results too small/high (a, b)

Based on these four cases, we claim that developing a new metric for tie strength is necessary. Then, after experimentally analyzing both metrics in the “Analysis of NO and W over different networks” section, we introduce a new one in the “Tieness: a new metric for the strength of ties” section.

Analysis of NO and W over different networks

We now analyze the relationship between neighborhood overlap and absolute frequency of interaction on DBLP, PubMed, and APS. As we consider co-authorship social networks, we refer to absolute frequency of interaction as co-authorship frequency, which measures the number of publications that a pair of researchers has co-authored. Table 2 presents the correlation between both metrics for each dataset considering three coefficients: Pearson evaluates the linear relationship between two variables, whereas Kendall and Spearman are rank-based coefficients that measure the monotonic, possibly non-linear, association between two variables [1, 18].

Table 2 The correlation coefficients between neighborhood overlap and co-authorship frequency

Overall, the correlation between neighborhood overlap and co-authorship frequency is small for the three coefficients. Therefore, the two metrics show no substantial monotonic or linear dependence in the three datasets. In other words, both metrics are important to measure the strength of ties as they capture different characteristics of the social network.
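For readers who want to reproduce this kind of comparison, the following is a plain-Python sketch of the three coefficients. It is not the code used in this study: the function names are hypothetical, the Kendall variant shown is tau-b (which accounts for ties), and the toy vectors are invented to contrast linear and monotonic association.

```python
from math import sqrt

def pearson(x, y):
    """Pearson coefficient: strength of the linear relationship."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in order[start:end + 1]:
            ranks[k] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def kendall_tau_b(x, y):
    """Kendall's tau-b, O(n^2) pairwise version that accounts for ties."""
    concordant = discordant = ties_x = ties_y = 0
    n = len(x)
    for a in range(n):
        for b in range(a + 1, n):
            dx, dy = x[a] - x[b], y[a] - y[b]
            if dx == 0 and dy == 0:
                continue  # tied in both variables: ignored by tau-b
            if dx == 0:
                ties_x += 1
            elif dy == 0:
                ties_y += 1
            elif dx * dy > 0:
                concordant += 1
            else:
                discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))

# Hypothetical data: y grows monotonically but not linearly with x.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
```

On such data the rank-based coefficients reach their maximum while Pearson stays below it, which is why the three coefficients are reported side by side.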

Tieness: a new metric for the strength of ties

Motivated by the problems generated by using neighborhood overlap and co-authorship frequency (coAfrequency, a short name for the absolute frequency of interaction in the co-authorship social network context) alone to measure tie strength, we now introduce a new metric called tieness. Specifically, tieness results from a combination of a modification of neighborhood overlap (called modified neighborhood overlap), which captures the social circle of the nodes involved in a tie, and co-authorship frequency, which represents the absolute number of publications common to a pair of researchers, as shown by Eq. 1.

$$ \text{tieness}_{i,j} = \frac{\left|\mathcal{N}\left(v_{i}\right) \cap \mathcal{N}\left(v_{j}\right)\right| + 1}{1 + \left|\mathcal{N}\left(v_{i}\right) \cup \mathcal{N}\left(v_{j}\right) - \left\{v_{i}, v_{j}\right\}\right|} \, \text{coAfrequency}_{i,j} $$
(1)

where \(\mathcal {N}(v_{i})\) represents the co-authors (neighbors) of researcher \(v_{i}\) and \(\mathcal {N}(v_{j})\) the co-authors of \(v_{j}\). Note that we add one to the numerator of neighborhood overlap to indicate that there is a link between \(v_{i}\) and \(v_{j}\). This solves the problem that arises when a pair of authors does not have any co-author in common. We then add one to the denominator to keep the ratio in proportion. Also, for unweighted social networks, the tieness value is the same as the modified neighborhood overlap.

Regarding the computational cost of tieness, the most expensive operations are the intersection, \(O(\min (|\mathcal {N}(v_{i})|,|\mathcal {N}(v_{j})|))\), and the union, \(O(|\mathcal {N}(v_{i})|+|\mathcal {N}(v_{j})|)\), both using hash tables. Thus, the time complexity of tieness is \(O(\max (|\mathcal {N}(v_{i})|,|\mathcal {N}(v_{j})|))\). This follows from a property of Big O notation [10]: \(O(\min (|\mathcal {N}(v_{i})|,|\mathcal {N}(v_{j})|)) + O(|\mathcal {N}(v_{i})|+|\mathcal {N}(v_{j})|) = O(\min (|\mathcal {N}(v_{i})|,|\mathcal {N}(v_{j})|)+|\mathcal {N}(v_{i})|+|\mathcal {N}(v_{j})|) = O(\max (\min (|\mathcal {N}(v_{i})|,|\mathcal {N}(v_{j})|),|\mathcal {N}(v_{i})|,|\mathcal {N}(v_{j})|)) = O(\max (|\mathcal {N}(v_{i})|,|\mathcal {N}(v_{j})|))\).

A problem of Eq. 1 is that coAfrequency is a non-normalized metric, i.e., the weights in the datasets are not in the range 0 to 1. To solve this problem, we considered two normalization methods: dividing by the norm (equal to the Euclidean distance) of the set of weights, which can be seen as a vector [2], and the unity-based normalization (see endnote 5). The first method is not appropriate, because the norm of the coAfrequency vector is very high, which reduces most of the weights to the order of \(10^{-4}\). The second method fits the data within unity, so that all values fall in the range 0 to 1. However, it is sometimes important to choose a different range for the data, and the unity-based normalization also allows normalizing the data within a selected range \([a, b]\). Thus, let the co-authorship frequencies of all edges in the social network be defined as a vector coAfrequency, in which each data point k is the value of one edge. Then, the unity-based normalization is computed by

$${} \begin{aligned} &||\text{coAfrequency}_{i,j}||\\ &\quad= a + \frac{\left(\text{coAfrequency}_{k} - \min(\text{coAfrequency})\right) (b-a)}{\max(\text{coAfrequency}) - \min(\text{coAfrequency})} \end{aligned} $$
(2)

where \(\text{coAfrequency}_{k}\) is the k-th value in the vector coAfrequency (the edge k that links \(v_{i}\) and \(v_{j}\)), min(coAfrequency) is the minimum co-authorship frequency over all edges in the social network, and max(coAfrequency) is the maximum one. Moreover, a and b define the range into which the co-authorship frequency is normalized. Here, we select a=1 and b=2, because with the range [0,1] an edge with the minimum co-authorship frequency (1 before normalization) would be mapped to zero, which would annul the value of neighborhood overlap in the product. Thus, the range [1,2] guarantees that the co-authorship frequency can indeed contribute to increasing the value of tieness.
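A short sketch of Eq. 2 makes the choice of range concrete. The function name and the sample frequencies are hypothetical, and mapping all values to a when every frequency is equal is an assumption of this sketch, since Eq. 2 is undefined in that case.

```python
def unity_based_normalize(values, a=1.0, b=2.0):
    """Rescale values linearly so that min -> a and max -> b (Eq. 2)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: when all frequencies are equal, map them to a.
        return [a for _ in values]
    return [a + (v - lo) * (b - a) / (hi - lo) for v in values]

# Hypothetical co-authorship frequencies of five ties.
co_a_frequency = [1, 1, 2, 5, 9]
in_one_two = unity_based_normalize(co_a_frequency)            # a = 1, b = 2
in_zero_one = unity_based_normalize(co_a_frequency, 0.0, 1.0)  # a = 0, b = 1
```

With the range [0,1], the two ties with a single joint publication are mapped to zero and would cancel any neighborhood term they multiply; with [1,2] they are mapped to one and leave that term unchanged.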

The normalized co-authorship frequency is incorporated in Eq. 3. Without the final division, \(\text{tieness}_{i,j}\) is in the range [0, 4]; we therefore divide by 4 to put \(\text{tieness}_{i,j}\) in the range [0, 1].

$$ \text{tieness}_{i,j} = \frac{\frac{\left|\mathcal{N}\left(v_{i}\right) \cap \mathcal{N}\left(v_{j}\right)\right| + 1}{1 + \left|\mathcal{N}\left(v_{i}\right) \cup \mathcal{N}\left(v_{j}\right) - \left\{v_{i}, v_{j}\right\}\right|} \ ||\text{coAfrequency}_{i,j}||}{4} $$
(3)

where \(||\text{coAfrequency}_{i,j}||\) is the co-authorship frequency of a pair of researchers \(v_{i}\) and \(v_{j}\), unity-based normalized by Eq. 2.

Tieness is calculated for each edge (pair of nodes) in the social network. Let tieness be a vector that contains \(\text{tieness}_{i,j}\) for each edge k in the social network. Thus, the overall level of tieness in a social network is measured by the average of the tieness values of all edges:

$$ \overline{\text{tieness}} = \frac{1}{|E|} \sum_{k=1}^{|E|} \text{tieness}_{k} $$
(4)

where \(\text{tieness}_{k}\) is the value of tieness for edge k and |E| is the number of edges in the social network. Also, the time complexity of the algorithm to measure the overall tieness is \(O\left (|E| \ \max \left (|\mathcal {N}\left (v_{i}\right)|,|\mathcal {N}\left (v_{j}\right)|\right)\right)\).
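Eqs. 2 to 4 can be put together in a few lines. The sketch below assumes that the endpoints \(v_{i}\) and \(v_{j}\) are excluded from the union before counting, and that all-equal frequencies are mapped to a; the function names and the toy network are hypothetical.

```python
def modified_neighborhood_overlap(neighbors, i, j):
    """(|N(i) & N(j)| + 1) / (1 + |(N(i) | N(j)) - {i, j}|)."""
    common = neighbors[i] & neighbors[j]
    others = (neighbors[i] | neighbors[j]) - {i, j}
    return (len(common) + 1) / (1 + len(others))

def tieness(neighbors, weight, a=1.0, b=2.0):
    """Tieness of every edge, following Eqs. 2 and 3."""
    lo, hi = min(weight.values()), max(weight.values())
    result = {}
    for (i, j), w in weight.items():
        # Unity-based normalization into [a, b]; assumption: if all weights
        # are equal, every edge gets the value a.
        norm = a if hi == lo else a + (w - lo) * (b - a) / (hi - lo)
        result[(i, j)] = modified_neighborhood_overlap(neighbors, i, j) * norm / 4
    return result

# Hypothetical network: A, B, and C form a triangle; C-D is a pendant tie.
neighbors = {"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"}, "D": {"C"}}
weight = {("A", "B"): 5, ("A", "C"): 1, ("B", "C"): 1, ("C", "D"): 3}

values = tieness(neighbors, weight)
mean_tieness = sum(values.values()) / len(values)  # Eq. 4
```

Each edge needs one set intersection and one set union, so the loop over all edges mirrors the overall cost stated above.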

In order to understand how tieness represents ties in an SN, Table 3 shows the tieness values for each case study. In Case 1, tieness gives a small value that indicates the presence of interactions (unlike neighborhood overlap, which is zero). However, analyzing only the final result of tieness for Cases 1, 2, and 3 is not enough to identify whether a pair of researchers is intracommunity or not. Also, regarding Case 4, tieness is the same as the normalized co-authorship frequency when neighborhood overlap is zero and \(2 \cdot ||\text{coAfrequency}||\) when neighborhood overlap is one. In the Regular Case, when neighborhood overlap and co-authorship frequency agree in indicating that a tie is strong, tieness also provides a high value that may represent a strong tie. Such results alone cannot be used to identify whether the tie belongs to a community and whether it is a bridge.

Table 3 Tieness for each case study and an extra case study representing the situation when NO and coAfrequency are in accordance

Indeed, an advantage of our new metric is that the values of tie strength are more distinct, which allows better differentiating the strength of a tie and establishing different levels of tie strength. Moreover, we can consider the values of the modified neighborhood overlap and co-authorship frequency separately to evaluate the final result of tieness. Thus, the definition of a nominal scale is necessary to identify when a tie is weak or strong.

We define a nominal scale to tieness by comparing the modified neighborhood overlap and co-authorship frequency. In doing so, we follow concepts discussed by Easley and Kleinberg [13]: a weak tie has a small neighborhood overlap and a strong tie has a large one.

Fig. 5 shows the ECDFs and quartiles for neighborhood overlap, co-authorship frequency, modified neighborhood overlap, and tieness. The analysis of the ECDFs shows that co-authorship frequency provides many repeated values for the strength of ties, as 50% of the data are equal to 1. On the other hand, neighborhood overlap, modified neighborhood overlap, and tieness provide different results for each quartile. Furthermore, the ECDFs of neighborhood overlap are very different from one dataset to another. For example, the values of the APS ECDF differ from those of the PubMed ECDF. However, the ECDFs of modified neighborhood overlap and tieness have similar values across the different datasets. This result may indicate that tieness is less sensitive to the dataset and better distinguishes the relationship between nodes.

Fig. 5 ECDF of each metric (a-d). In this scenario, modified neighborhood overlap and tieness metrics have more distinct values through the quartiles

Having studied such distributions, we may now consider the values of the quartiles to define a nominal scale. In other words, the quartiles of the distributions help to identify when a tie is weak or strong and whether it connects different communities. Equation 5 shows the nominal scale for tieness based on the quartiles. Note that such a scale is also valid for an unweighted social network, because modified neighborhood overlap has the same values as tieness at the second and third quartiles.

$$ \begin{cases} \mathbf{weak}, \text{tieness} \leqslant 0.10\\ \mathbf{moderate}, 0.10 < \text{tieness} < 0.43 \\ \mathbf{strong}, 0.43 \leqslant \text{tieness} \end{cases} $$
(5)
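The nominal scale of Eq. 5 translates directly into a small classification function. The function name and the sample values below are hypothetical; only the two thresholds come from Eq. 5.

```python
def tie_class(tieness_value):
    """Map a tieness value to the nominal scale of Eq. 5."""
    if tieness_value <= 0.10:
        return "weak"
    if tieness_value < 0.43:
        return "moderate"
    return "strong"

# Hypothetical tieness values, including both boundary values.
labels = [tie_class(t) for t in (0.05, 0.10, 0.25, 0.43, 0.50)]
```

Note that the boundary values belong to the outer classes: 0.10 is still weak and 0.43 is already strong.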

Results and discussion

In order to validate the proposed nominal scale, we verify if Granovetter’s theory governs the social network and the strength of ties with such values. Given that weak ties are bridges that connect different parts of the network, his theory claims the network tends to be more disconnected when weak ties are removed (i.e., the number of connected components tends to increase). Hence, we analyze the number of connected components in the social network after removing weak and strong ties.
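The removal experiment can be sketched as follows, assuming that every edge already carries a label from the nominal scale. The union-find helper, the function names, and the toy network (with labels assigned by hand rather than computed) are hypothetical and only illustrate the procedure.

```python
def count_components(nodes, edges):
    """Number of connected components, using union-find."""
    parent = {v: v for v in nodes}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for u, v in edges:
        parent[find(u)] = find(v)
    return len({find(v) for v in nodes})

def components_after_removal(nodes, labeled_edges, label):
    """Components left after dropping every edge that carries the label."""
    kept = [edge for edge, tag in labeled_edges.items() if tag != label]
    return count_components(nodes, kept)

# Hypothetical network: two triangles joined by a single weak tie (C, D).
nodes = ["A", "B", "C", "D", "E", "F"]
labeled_edges = {
    ("A", "B"): "strong", ("A", "C"): "moderate", ("B", "C"): "moderate",
    ("C", "D"): "weak",
    ("D", "E"): "strong", ("D", "F"): "moderate", ("E", "F"): "moderate",
}
without_weak = components_after_removal(nodes, labeled_edges, "weak")
without_strong = components_after_removal(nodes, labeled_edges, "strong")
```

In this toy network, removing the weak tie splits the graph in two, whereas removing the strong ties leaves it connected, which is the behavior that Granovetter’s theory predicts.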

Tables 4, 5, 6, and 7 present the number of edges and connected components after removing weak and strong ties in each dataset. Also, we show results when the strength of ties is measured by tieness (weighted SN) and by modified neighborhood overlap (considering the SN as unweighted). According to these tables, when weak ties are removed, the number of connected components is higher than when strong ties are removed. Also, there are differences between the results for modified neighborhood overlap and tieness, which are caused by the co-authorship frequency. Moreover, the number of removed edges is larger when weak ties are removed. Indeed, the larger number of connected components may be explained by the larger removal of bridging edges.

Table 4 DBLP articles: number of connected components when weak and strong ties are removed from the social network
Table 5 DBLP inproceedings: number of connected components when weak and strong ties are removed from the social network
Table 6 PubMed: number of connected components when weak and strong ties are removed from the social network
Table 7 APS: number of connected components when weak and strong ties are removed from the social network

We now compare the ratio of the number of connected components to the number of edges for tieness and modified neighborhood overlap when weak and strong ties are removed from the social network. Table 8 presents these ratios. Their analysis shows that the number of connected components per edge is larger when weak ties are removed. Thus, the results support the validity of the nominal scale. Moreover, as the removal of weak ties (defined according to the nominal scale) breaks the connected components of the social network, tieness is indeed able to identify whether a tie connects different communities.

Table 8 Proportion between the number of connected components and the number of edges in the social networks when weak and strong ties are removed

Furthermore, we note that the different research areas considered (Computer Science, Medicine, and Physics) present similar behavior. Weak ties outnumber strong ones when they are measured by tieness. This results from networks whose nodes are not very well clustered (regarding their neighbors). In order to verify it, we analyze the clustering coefficient (see endnote 6) of the four co-authorship social networks. The results show that the highest clustering coefficient is that of PubMed (equal to 0.357) and the smallest one is that of DBLP inproceedings (equal to 0.16). Thus, the clustering coefficient of the four networks is small, which helps explain the low tieness for the pairs of researchers.

Although tieness is able to better differentiate the strength of ties when compared to neighborhood overlap and co-authorship frequency, it has limitations. One of them is that tieness classifies a tie as strong only when both the modified neighborhood overlap and the weight are very high. Thus, few ties are classified as strong. A solution is to change the nominal scale, but that requires further analyses of the social networks. Another limitation is applying tieness to co-authorship social networks from research areas in which collaboration among researchers is not a common practice. For example, in sociology, the level of collaboration is low [4]. Nonetheless, this limitation is intrinsic to the definition of co-authorship networks, which should contain a good number of connections for any proper analysis.

Moreover, defining a nominal scale is very hard, because it requires considering different parameters of the data. Here, the nominal scale of tieness relies on a simplifying assumption: it considers only the values of the ECDFs and percentiles. Another possibility is to define the nominal scale by combining tieness with different properties of the ties in the social networks in a mathematical model. Then, the nominal scale would be more complete but more complex as well.

Related work

Many studies address tie strength in social networks [4, 6, 8, 16, 17, 31]. Following Granovetter’s theory [17], ties are weak when they serve as bridges in the network by connecting users from different groups, and strong when they link individuals in the same group. These studies motivate our work, which uses different networks to corroborate their insights: distinct relationships play different roles, ties have a large impact at the micro-macro level in the network depending on their strength, tie strength influences patterns of communication, and so on.

Specifically, those studies consider the strength of ties in different social networks. For example, Pappalardo et al. [28] propose a definition of tie strength by measuring the interaction between two individuals over three different social channels: Facebook, Twitter, and Foursquare. Also on Facebook, Gilbert and Karahalios [15] classify friendship strength based on variables from interaction history (e.g., inbox messages exchanged, days since first or last communication), whereas Kahanda and Neville [19] map four different categories of features: transactional (such as picture postings and groups), network-transactional (which considers the interaction between a pair of users and the overall interaction of these two users with the remaining users), topological (e.g., node degree and number of shared neighbors), and attribute-based features (such as gender and interests). From a different perspective and on a different network (Twitter), McGee et al. [23] study whether geographic distance influences the strength of ties among users by considering users’ friends, followers, and recent tweets.

Overall, those methods require a history of interactions (messages on timeline, tweets, shared check-ins, etc.) to build a predictive model or to measure tie strength. Nonetheless, Wiese et al. [31] show that the accuracy of methods based only on interaction history may be misleading. Then, Zignani et al. [33] disregard history and classify Facebook ties as interactive (strong) or non-interactive (weak) at their creation time. They consider topological features, interaction-graph features, and temporal features in supervised learning classifiers. In summary, these more recent studies stress the importance of developing metrics based on information other than interaction history.

Whereas all the aforementioned studies rely on datasets from social networks that include interaction among people, there are also studies on datasets without such information. Specifically, for academic social networks, the data available comes from collaboration between authors and/or publications [9, 11]. Lacking the widely used social interaction data, such networks require new and better topological features. Hence, Table 9 shows different topological properties that have been used to measure tie strength in such a context. We emphasize that neighborhood overlap is the most used metric for such measurement. Also, note that we present the clustering coefficient, which is not a metric for a pair of nodes but is commonly used to measure the strength of a node in the social network regarding its neighbors. The clustering coefficient is computed for a node i and a node j. Then, the clustering coefficient of both nodes is used to measure the strength of the tie.

Table 9 Given two nodes i and j, there are different metrics that can be used to measure the strength of ties

In this context, we propose a new topological feature and a nominal scale that help to measure tie strength in co-authorship social networks. Our new metric is based on neighborhood overlap and the absolute frequency of interaction among researchers. It differs from the existing ones [4, 13, 19, 25-27, 33, 34] by combining these two simple metrics commonly used to measure tie strength. Also, tieness is well suited to networks without much information available, such as academic social networks. Thus, this work is a step forward on social network metrics.

Conclusion

In the context of academic social networks, we identified problems with using solely neighborhood overlap or absolute frequency of interaction to measure the strength of co-authorship ties. Then, we presented a new metric to measure such tie strength, called tieness, which has relatively low computational cost and can be applied to other social network types (since tieness is a topological feature). Also, the definition of tieness comes with a nominal scale that allows identifying when a tie is weak or strong and whether it links researchers from different communities. The main limitation of the new metric is that the network must have nodes collaborating with each other.

We have performed empirical studies considering the networks from three different areas of expertise (Computer Science, Medicine, and Physics). Overall, our analyses showed that tieness provides more distinct values through the ties than neighborhood overlap and absolute frequency of interaction. Such distinction is important to better compare how strong (weak) a tie is regarding another one. We also observed similar behavior through the three different research areas.

Furthermore, all four co-authorship social networks are dominated by weak ties. This is because most pairs of researchers have few shared neighbors and a small co-authorship frequency. Therefore, tieness classifies as strong ties only pairs of researchers with very high neighborhood overlap and co-authorship frequency.

As future work, we plan to consider temporal aspects and other topological properties as features to a computational model to automatically define the strength of co-authorship ties. We also plan to improve the nominal scale by considering different properties from the co-authorship social networks. The datasets supporting the analyses of this article are publicly available at http://www.dcc.ufmg.br/~mirella/projs/apoena/.

Endnotes

1 An initial version of this work was published in [3]. It evaluates the metric over only one dataset and discusses its relation to the quality of publication venues, which is not presented here.

2 DBLP: http://dblp.uni-trier.de.

3 PubMed: http://www.ncbi.nlm.nih.gov.

4 APS: http://www.aps.org.

5 Etzkorn, B. “Data normalization and standardization.” BE BLOG [Online]. Available: http://www.benetzkorn.com/2011/11/data-normalization-and-standardization (2011).

6 Clustering coefficient measures the proportion of nodes’ neighbors that can be reached by other neighbors [13], i.e., it also considers the connectivity among neighbors.