In this section we construct the classification models used to estimate the degree of homophily from the community features. In Sect. 4.1 we start with the Skype contact graph, which describes a specific instantiation of the analyzed problem, namely Social Engagement. In this scenario we are interested in using topological, geographical and temporal network features to estimate the average engagement of each community with two Skype products, video and chat. In Sect. 4.2 we analyze the LastFM graph and shift our attention to a different formulation of our original problem: Service Engagement. Here, we want to estimate the average community level of music listening, i.e., how much users in the same community use, on average, the LastFM scrobbler (estimated by each user's total number of listenings). Finally, in Sect. 4.3 we address the problem of estimating the degree of homophily w.r.t. the education level of Google+ users within communities.
Skype: user engagement
We use the topological, geographical and temporal features described above to classify the level of engagement of social communities with respect to the chat and video activity features. To this purpose, we build a supervised classifier that assigns communities to two possible categories: high level of engagement or low level of engagement. We address two different scenarios: (1) a balanced class scenario where the two classes have the same percentage of population and (2) an unbalanced class scenario, where we consider an uneven population distribution.
Balanced scenario
We consider two classes of user engagement for each of the two activity features (chat and video): low engagement and high engagement. To transform the two continuous activity features into discrete variables we partition the range of values at the median of their distribution. This produces, for each variable to predict, two equally populated classes: (1) low engagement, ranging in the interval [0, median] and (2) high engagement, ranging in the interval [median, 31] (Footnote 4). To perform classification we use stochastic gradient descent (SGD), and we evaluate its performance through the area under the ROC curve (AUC) and the overall accuracy. The ROC curve illustrates the performance of a binary classifier and is created by plotting the true positive rate (tpr, also called sensitivity) versus the false positive rate (fpr, also called fallout or 1-specificity) at various threshold settings. The overall accuracy is instead the proportion of true results (both true positives and true negatives) in the population. Moreover, in a preliminary testing phase the classification step was also repeated using a random forest model built upon C4.5: given the similar performance observed, the more intuitive interpretation of the obtained results and the lower execution time, we decided to show only the results obtained by SGD.
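The discretization and the two evaluation measures described above can be sketched as follows. This is a minimal illustration in plain Python, not the code used for the experiments: the function names, the toy activity values and the toy scores are hypothetical, and assigning values equal to the median to the low class is an assumption.

```python
from statistics import median


def median_split(values):
    """Label each value 0 (low engagement) or 1 (high engagement).

    Assumption: values equal to the median fall in the low class.
    """
    m = median(values)
    return [1 if v > m else 0 for v in values]


def roc_auc(labels, scores):
    """AUC computed as the probability that a randomly chosen positive
    receives a higher score than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy(labels, predictions):
    """Proportion of true results (true positives and true negatives)."""
    correct = sum(y == p for y, p in zip(labels, predictions))
    return correct / len(labels)


# Toy activity values in the range 0..31 for eight hypothetical communities.
activity = [0, 2, 5, 9, 12, 20, 27, 31]
labels = median_split(activity)

# Hypothetical classifier scores, thresholded at 0.5 to obtain predictions.
scores = [0.1, 0.3, 0.2, 0.6, 0.4, 0.7, 0.9, 0.8]
predictions = [1 if s >= 0.5 else 0 for s in scores]

auc = roc_auc(labels, scores)
acc = accuracy(labels, predictions)
```

The AUC is threshold-free, since it depends only on the ranking induced by the scores, whereas the accuracy depends on the chosen threshold (0.5 in this toy example).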
Table 3 Skype: AUC and accuracy (within brackets) produced by the SGD method in the balanced scenario, for video and chat features
We train the SGD classifier with a logistic error function (Tsuruoka et al. 2009; Zhang 2004), using the implementation provided by the sklearn Python library (Footnote 5). We execute 5 iterations, shuffling the data before each of them, and impose the elastic-net penalty with \(\alpha =0.0001\) and l1-ratio = 0.05. The adoption of the elastic-net penalty results in some feature weights being set to zero, thus eliminating less important features.
We apply fivefold cross-validation for learning and testing. Table 3 shows the AUC produced by the SGD method on the features extracted from the community sets produced by the four algorithms (for HDemon and Louvain only the two best performing community sets are reported). HDemon produces the best performance, both in terms of AUC and overall accuracy, for all three activity features. Louvain, conversely, performs poorly, and it is outperformed by the simpler BFS and ego-network algorithms. This result suggests that modularity optimization approaches, like Louvain, are not effective when categorizing group-based user engagement, because their resolution limit causes the creation of huge communities (Fortunato and Barthélemy 2007). As the level of the Louvain hierarchy increases, and hence the modularity increases, both the AUC and the overall accuracy decrease. In the experiments, indeed, the first Louvain hierarchical level outperforms the last level, even though the latter has the highest modularity. Figure 2 shows the features to which the SGD method assigns a weight higher than 0.2 or lower than \(-0.2\) (i.e., the most discriminative features for the classification process).
HDemon distributes the weights in a less skewed way, while the other algorithms tend to give high importance to a limited subset of the extracted features. Moreover, only a few Louvain features have a weight higher than 0.2 or lower than \(-0.2\) (see Fig. 3d), confirming that a modularity approach produces communities with weak predictive power with respect to user engagement. An interesting phenomenon also emerges: independently of the chosen community discovery approach, the most relevant class of features for the classification process seems to be the topological one (i.e., the sum of the absolute values of the SGD weights for the features belonging to this class is always greater than the same sum for community formation and geographical features combined).
In particular, degree, density, community size and clustering-related measures often appear among the most heavily weighted features. Figure 4 shows the relationships between the average community size, the average community density and the AUC value produced by the SGD method on the community sets which reach the best performance in the balanced scenario. The best performance is obtained for the HDemon community sets, which constitute a compromise between the micro- and the macro-level of network granularity. When the average size of the communities is too low, as for the ego-network level, we lose information about the surroundings of nodes and do not capture the inner homophily hidden in the social context. On the other hand, when communities become too large, as in the case of communities produced by Louvain, we mix together different social contexts and lose definition. Communities expressing a good trade-off between size and density, as in the case of the HDemon algorithm, reach the best performance in the problem of estimating user engagement.
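The comparison between feature classes described above can be sketched as follows. The weight values and the assignment of features to classes are made up for illustration (only the 0.2 threshold and the feature class names come from the text), and the function names are hypothetical.

```python
def class_importance(weights, feature_class):
    """Sum the absolute SGD weights of the features in each feature class."""
    totals = {}
    for name, w in weights.items():
        cls = feature_class[name]
        totals[cls] = totals.get(cls, 0.0) + abs(w)
    return totals


def discriminative(weights, threshold=0.2):
    """Keep features whose weight is above threshold or below -threshold."""
    return {name: w for name, w in weights.items() if abs(w) > threshold}


# Hypothetical weights, NOT values reported in the paper.
weights = {
    "density": 0.45, "deg_avg": -0.30, "size": 0.25,
    "E_s": 0.15, "dist_avg": -0.22, "T_f": 0.10,
}
feature_class = {
    "density": "topological", "deg_avg": "topological", "size": "topological",
    "E_s": "geographical", "dist_avg": "geographical", "T_f": "formation",
}

totals = class_importance(weights, feature_class)
selected = discriminative(weights)
topological_dominates = (
    totals["topological"] > totals["geographical"] + totals["formation"]
)
```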
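To make the size/density trade-off concrete, the sketch below computes the average size and average density of a set of (possibly overlapping) communities. It assumes an undirected graph and the standard density definition 2m / (n(n-1)), which may differ in detail from the definition used for the feature D; the toy graph and the function names are hypothetical.

```python
def community_density(nodes, edges):
    """Density of the subgraph induced by `nodes` in an undirected graph."""
    nodes = set(nodes)
    n = len(nodes)
    if n < 2:
        return 0.0
    # Count each internal undirected edge once, ignoring self-loops.
    internal = {
        frozenset((u, v))
        for u, v in edges
        if u in nodes and v in nodes and u != v
    }
    return 2 * len(internal) / (n * (n - 1))


def average_size_and_density(communities, edges):
    """Average community size and average community density."""
    sizes = [len(set(c)) for c in communities]
    densities = [community_density(c, edges) for c in communities]
    return sum(sizes) / len(sizes), sum(densities) / len(densities)


# Toy graph with two overlapping communities sharing node 3.
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5)]
communities = [{1, 2, 3}, {3, 4, 5}]

avg_size, avg_density = average_size_and_density(communities, edges)
```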
Unbalanced scenario
We also address an unbalanced scenario, where we use the 75th percentile as the upper bound of the low engagement class, which thus contains 75 % of the observations, and put the remaining 25 % of the observations in the high engagement class. Table 4 describes the results produced by the SGD method in the unbalanced scenario, using the same features and community discovery approaches discussed before. The baseline method for the unbalanced scenario is the majority classifier: it reaches an AUC of 0.75 by assigning each item to the majority class (the low engagement class). We observe that, regardless of the community set used, the SGD method (as well as random forest) is not able to improve significantly on the baseline classifier for video. Conversely, the results obtained by SGD for the chat feature outperform the baseline when we adopt the HDemon, ego-network and BFS community sets, reaching an AUC of 0.83.
In order to provide additional insights into the models built with the different CD algorithms, we also compute the precision and recall measures with respect to the minority class (see Table 5). Looking at these measures enables us to understand the advantage of using SGD to correctly identify instances of the less predictable class. Moreover, we can observe that choosing the 75th percentile leads to a very difficult classification setup: the instances belonging to the minority class often represent outliers, so the classifier has very few examples from which to learn the model. Here the baseline is the minority classifier, which reaches a precision of 25 % by assigning each community item to the minority class (the high engagement one). We observe that the SGD method outperforms the baseline classifier on all the community sets (reaching values in the range [.33, .57]). HDemon and ego-networks are the community sets which lead to the best precision, on the video features and the chat feature, respectively.
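The percentile-based split can be sketched as follows. The nearest-rank percentile definition, the function names and the toy values are assumptions made for illustration. The sketch also computes the accuracy of a classifier that always predicts the majority class, which by construction equals the share of the majority class.

```python
import math


def percentile_nearest_rank(values, q):
    """q-th percentile using the nearest-rank definition (an assumption)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q / 100 * len(ordered)))
    return ordered[rank - 1]


def percentile_split(values, q=75):
    """Label values above the q-th percentile 1 (high), the others 0 (low)."""
    cut = percentile_nearest_rank(values, q)
    return [1 if v > cut else 0 for v in values]


def majority_baseline_accuracy(labels):
    """Accuracy obtained by always predicting the most frequent class."""
    ones = sum(labels)
    return max(ones, len(labels) - ones) / len(labels)


# Toy activity values for eight hypothetical communities.
activity = [3, 1, 8, 5, 2, 7, 4, 6]
labels = percentile_split(activity, q=75)
baseline_accuracy = majority_baseline_accuracy(labels)
```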
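Precision and recall on the minority class, together with the minority baseline, can be illustrated with the sketch below. The labels and predictions are toy values and the function name is hypothetical.

```python
def precision_recall(labels, predictions, positive=1):
    """Precision and recall with respect to the `positive` class."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == positive and p == positive)
    fp = sum(1 for y, p in pairs if y != positive and p == positive)
    fn = sum(1 for y, p in pairs if y == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy labels: 25 % of the communities belong to the high engagement class.
labels = [0, 0, 0, 0, 0, 0, 1, 1]

# Minority baseline: every community is assigned to the minority class.
baseline = [1] * len(labels)
baseline_precision, baseline_recall = precision_recall(labels, baseline)

# Hypothetical model output: fewer positive predictions, better precision.
model = [0, 0, 0, 1, 0, 0, 1, 0]
model_precision, model_recall = precision_recall(labels, model)
```

In this toy case the baseline has precision 0.25 and recall 1, while the hypothetical model trades recall for precision.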
Table 4 Skype: AUC and accuracy (within brackets) produced by the SGD method in the unbalanced scenario, for the video and chat features
Table 5 Skype: precision and recall (within brackets) produced by the SGD model for the video and chat features in the unbalanced scenario
In order to measure the effectiveness of SGD we report the lift chart, which shows the ratio between the results obtained with the built model and the ones obtained by a random classifier. The charts in Fig. 5 are visual aids for measuring SGD’s performance on the community sets: the greater the area between the lift curve and the baseline, the better the model. We observe that HDemon performs better than the competitors for the video features. For the chat features, the community sets produced by the three naive algorithms outperform those of the other two CD algorithms. For all three activity features, Louvain reaches the worst performance, as in the balanced scenario.
As done for the balanced scenario in Fig. 3, we report the features having weight greater than 0.2 or lower than \(-0.2\). In contrast with the results presented in the previous section, where topological features always show the highest relative importance for the classification process, in this scenario community formation and geographical features are the ones that ensure greater descriptive power. As previously observed, the minority class identified by a 75th percentile split is mostly composed of particular, rare community instances. This affects the relative importance of temporal and geographical information: the results suggest that the more active a community is, the more significant its geographical and temporal bounds are. Finally, in Fig. 6 we show the relationships between the average community size, the average community density and the AUC value produced by the SGD method on the community sets which reach the best performance in the unbalanced scenario. We can observe that, in this setting, the algorithms producing communities with small average size and high density are the ones that yield SGD models with higher AUC. In particular, HDemon in both its instantiations outperforms the other approaches.
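A cumulative lift curve of the kind described above can be computed as in the following sketch. It assumes that items are ranked by decreasing classifier score and that the random classifier is represented by the overall rate of positive items; the function name and the toy data are hypothetical.

```python
def lift_curve(labels, scores):
    """Cumulative lift: positive rate among the top-k ranked items divided
    by the overall positive rate (the expected rate of a random classifier).

    Returns a list of (fraction of items considered, lift) points.
    """
    base_rate = sum(labels) / len(labels)
    if base_rate == 0:
        raise ValueError("at least one positive item is required")
    ranked = [
        y for _, y in sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    ]
    points = []
    hits = 0
    for k, y in enumerate(ranked, start=1):
        hits += y
        points.append((k / len(ranked), (hits / k) / base_rate))
    return points


# Toy data: two positive items out of eight, both ranked near the top.
labels = [1, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1, 0.05]
points = lift_curve(labels, scores)
```

A lift of 1 corresponds to the random baseline, so the curve always ends at 1 once all items are considered.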
Skype community characterization
From our analysis a well-defined trend emerges: among the compared methodologies, in both the balanced and unbalanced scenarios, HDemon is the best in bounding homophily, producing communities that guarantee useful insights into the product engagement level. For this reason, starting from the communities extracted by this bottom-up overlapping approach, we computed the Pearson correlation of all the defined features against the final class label (high/low engagement). As shown in Fig. 7a, when splitting the video engagement using the 50th percentile we are able to identify as highly active the communities having high country entropy \(E_s\) as well as high geographical distance among their users \(dist_{avg}\), and whose formation is recent (i.e., whose first user has joined the network recently, \(T_f\), as well as the last one, \(IT_{l,f}\)). Moreover, video active communities tend to be composed of users having on average low degree, as shown by \(deg_{avg}^{all}\) and \(deg_{max}^C\). Conversely, looking at Fig. 7b we can notice that communities which exhibit high chat engagement can be described by persistent structures (i.e., social groups for which the inter-arrival time \(IT_{l,f}\) from the first to the last user is high), composed of users showing almost the same connectivity (in particular having high degree) and sparse social connections (low clustering coefficient CC, low density D and high radius). We also calculate the same correlations for the 75th percentile split: in contrast with the new results for the chat engagement (Fig. 7d), which do not differ significantly from the ones discussed for the balanced scenario, in this setting the highly active video communities show new peculiarities. In Fig. 7c we observe how the level of engagement inversely correlates with the community radius (and diameter) and directly correlates with density.
This variation describes highly active video communities as a specific and homogeneous subclass of small and dense network structures composed of users who live in different countries (high geographical entropy \(E_s\)).
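The correlation analysis can be sketched as follows: each community feature is correlated with the binary class label (0 = low, 1 = high engagement). The feature values are invented for illustration and the function name is hypothetical; only the feature names echo the symbols used in the text.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined for constant sequences
    return cov / math.sqrt(var_x * var_y)


# Hypothetical feature values for four communities and their class labels.
labels = [0, 0, 1, 1]
features = {
    "E_s": [0.1, 0.2, 0.8, 0.9],      # country entropy
    "deg_avg": [9.0, 8.0, 2.0, 1.0],  # average degree
}

correlations = {name: pearson(vals, labels) for name, vals in features.items()}
```

In this toy example the entropy feature correlates positively with high engagement and the average degree negatively, mirroring the kind of reading made on Fig. 7.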
LastFM: service engagement
For the LastFM scenario we want to understand whether the topological features of a community can predict its engagement with the service, measured by the total number of listenings of the users in the community. To do that we transform the problem into a binary classification task by assigning each community to one of two classes: low volume of listenings or high volume of listenings. As for the Skype network, we address two different scenarios: (1) a balanced class scenario where the two classes have the same percentage of population (50th percentile split) and (2) an unbalanced class scenario (75th percentile split) where we consider an uneven class distribution.
Balanced scenario
The results reported in Table 6 highlight how, in contrast with Skype, Louvain produces the best performance in predicting the volume of listenings (both in AUC and accuracy). This trend is also evident from Fig. 8: Louvain shows lower average density and lower average size than the other algorithms, albeit obtaining the highest AUC. The Ego-Nets approach produces the worst performance highlighting how, in a balanced scenario, the community-based approach improves the prediction of the engagement.
Table 6 LastFM: AUC and accuracy (within brackets) produced by the best classifier in the balanced scenario, for the average total listenings feature
Unbalanced scenario
In the unbalanced scenario the low volume of listenings class accounts for 75 % of the dataset. Tables 7 and 8 show two main results. On the one hand, HDemon produces the best performance, reaching an AUC = .78 (Table 7), a considerable improvement with respect to the baseline classifier (.25). Figure 9 shows that HDemon communities are the ones whose topological attributes better discriminate between the high volume and low volume listenings classes. On the other hand, the Ego-Nets algorithm produces the best precision on the minority class (Table 8). In any case, all the algorithms outperform the baseline precision on the minority class (0.25), even though they show a rather low recall (while the baseline by definition has recall = 1).
Table 7 LastFM: AUC and accuracy (within brackets) produced by the best classifier in the unbalanced scenario, for the average total listening feature
Table 8 LastFM: precision and recall (within brackets) produced by the best classifier for the average total listenings feature in the unbalanced scenario
Google+: community homogeneity
In this scenario we investigate the ability of topological features to explain whether a community is composed of users having a homogeneous level of education. As done before, we see the problem as a binary classification task, i.e., each community is assigned to one of two classes: (1) homogeneous or (2) heterogeneous education level. The target feature is built by computing the node label entropy \(e_i\) for each community \(c_i\): if \(e_i\rightarrow 0\) the community users have the same education level; conversely, if \(e_i\rightarrow 1\) they show heterogeneous education levels. The chosen target feature distributes almost equally on all the partitions made, following a normal distribution. We address two different scenarios: (1) a balanced class scenario where the two classes have the same percentage of population (50th percentile split) and (2) an unbalanced class scenario (75th percentile split), where we consider an uneven class assignment (raising the threshold level for homogeneous communities).
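A node label entropy bounded in [0, 1] can be obtained, for instance, by normalizing the Shannon entropy of the education labels within a community. The sketch below assumes this normalization (division by the logarithm of the number of possible education levels), which is one common choice and is not necessarily the exact definition of \(e_i\); the function name, the `n_levels` parameter and the labels are hypothetical.

```python
import math
from collections import Counter


def label_entropy(labels, n_levels):
    """Shannon entropy of the node labels, normalized to [0, 1].

    Assumption: the entropy is divided by log(n_levels), where n_levels
    is the number of possible education levels.
    """
    if n_levels < 2 or not labels:
        return 0.0
    n = len(labels)
    counts = Counter(labels)
    h = sum(-(c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(n_levels)


# Toy communities described by the education level of their users.
homogeneous = ["college", "college", "college", "college"]
mixed = ["high_school", "college", "high_school", "college"]
skewed = ["college", "college", "high_school", "graduate"]

e_homogeneous = label_entropy(homogeneous, n_levels=2)
e_mixed = label_entropy(mixed, n_levels=2)
e_skewed = label_entropy(skewed, n_levels=3)
```

Under this assumption a community whose users all share one education level has entropy 0, and a community whose users are spread evenly over all levels has entropy 1.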
Balanced scenario
As done for LastFM, since the dataset has moderate size we applied an ensemble of classification approaches and report the results obtained by the best performer. The results reported in Table 9 highlight how, contrary to what was observed on Skype, Louvain guarantees the best performance (both in AUC and accuracy). This trend is evident in Fig. 10: Louvain seems to better capture the degree of homophily because, due to the scale problem that affects modularity-based approaches, it outputs huge communities (whose entropy tends to 1) and tiny communities (whose entropy tends to 0).
Table 9 Google+: AUC and accuracy (within brackets) produced by the best classifier (SGD) applied to the Google+ topological features in the balanced scenario
The reduced quality of prediction obtained by HDemon and ego-network highlights the complexity of the problem: ego-networks guarantee smaller and denser communities, but fail in recovering all the positive instances (low recall on the homogeneous class, \(\simeq 0.41\)); HDemon reaches a higher recall but, due to the higher average sizes of the identified communities, lacks in precision (\(\simeq 0.52\)).
Unbalanced scenario
We applied the same strategy to address a more complex scenario: in this setting the homogeneous level of education is assigned only to communities having node label entropy in the range [0, 0.25]. In other words, we are searching for the most homogeneous communities.
Table 10 Google+: AUC and accuracy (within brackets) produced by the best classifier (decision tree) applied to the Google+ topological features in the unbalanced scenario
Table 11 Google+: precision and recall (within brackets) produced by the best classifier (decision tree) applied to the Google+ topological features in the unbalanced scenario
Tables 10 and 11 show that the best classification is reached when the HDemon communities are used. As expected, Louvain's performance decreases when focusing on the minority class (which contains small- and medium-sized communities). Table 11 gives a very clear picture of the complexity of the problem itself: all the proposed community discovery algorithms outperform the baseline precision on the minority class; however, their recall is quite low (while the baseline, by definition, has recall = 1). Again, Fig. 11 shows that HDemon communities are the best in discriminating between homogeneous and heterogeneous user education levels.