1 Introduction

Since research on academic achievement began to emerge as a field in the 1960s, it has guided educational policies on admissions and dropout prevention [1]. Although much of the literature has focused on higher education, the knowledge obtained on behavioral phenomena observed in colleges and universities can potentially guide research on student behavior in primary and secondary schools. A number of behavioral patterns have been linked to academic performance, such as time allocation [2], active social ties [3], sleep duration and sleep quality [4], or participation in sport activity [5]. Most of the existing studies, however, suffer from biases and limitations often associated with surveys and self-reports [6, 7], particularly when measuring social networks [8–11].

Here we investigate the performance of 538 students within a novel dataset collected as part of the Copenhagen Network Study (CNS), with data collection ongoing for more than two years [12]. Due to the scale of the CNS, and the inclusion of directly observed data from smartphones in place of self-reports, we are able to mitigate some of the limitations encountered in existing ‘traditional’ studies. The strength of the CNS data is the high-resolution multi-channel measures for social interactions, including person-to-person proximity (using Bluetooth scans), calls and text messages, activity on online social networks (Facebook), and mobility traces.

The aim of our study was to better understand the impact of individual and network factors on our ability to distinguish between groups of students based on their performance. That is, we wanted to identify the ways in which low performers are significantly different from high performers and vice versa. We divide this goal into three specific objectives:

  (i) Identify individual and network factors that correlate with students’ performances.

  (ii) Analyze the importance of different sets of features for supervised learning models to classify students as low, moderate, or high performers.

  (iii) Investigate significant differences among performance groups for the most important individual and network features.

2 Related work

2.1 Individual behavior

Through a variety of methods, a large number of studies have investigated the factors that determine academic performance. Vandamme et al. [13] analyzed a broad range of individual characteristics concerning personal history, behavior, and perception. Similarly, the StudentLife study [14] used smartphones to collect data on student activity, social behavior, personality, and mental health. Both research groups observed correlations between performance and all feature categories, building a case that factors influencing academic performance are not limited to a single aspect of an individual’s life. Nghe et al. [15] reframed the problem as a prediction task: using data to predict performance in a population of undergraduate and postgraduate students at two different institutions. Using a wide range of features, they predicted GPA after third year with high accuracy. One of the features included GPA after the second year; in this work we show that even without the knowledge of past achievements it is possible to explain the students’ performance levels to a large extent. Furthermore, prior research has emphasized the positive influence of attending classes [16–19]. The study by Crede et al. [19] concludes that attendance is the most accurate known predictor of academic performance; see [20] for a more detailed analysis of the impact of class attendance on academic performance based on the CNS data.

Cao et al. [21] analyzed behavioral data from the digital records of nearly 19,000 students’ smart cards, such as entering and leaving the library, having a meal in the cafeteria, or taking a shower in the dormitory. They conclude that the students’ orderness (regularity of daily activities) is a strong predictor of academic performance. Our approach shares some similarities with [21], but the key difference is that we have investigated not only individual behavior but also the students’ social environment.

2.2 Individual traits

A large body of research at the intersection of psychology and education has investigated the relationship between personality and performance, as pioneered by [22]. Many personality traits were found to be linked to academic success: among the dimensions of the well-studied Big-Five Inventory [23], Conscientiousness (positive) and Neuroticism (negative) displayed the strongest correlation with academic performance [24–52]. The other three dimensions showed only very weak or no correlation. Furthermore, the characteristics Self Esteem [53], Satisfaction with Life [54, 55], and Positive Affect Schedule [56] were also found to be positively correlated, while Stress [57, 58], Depression [59–61], and Locus of Control [54, 55] showed a negative effect on academic achievements.

2.3 Online social media

Only a few prior studies have investigated the impact of social media activity on academic performance, despite the growing availability of such data and undisputed presence of these media in our daily lives. The majority of existing studies found a decrease in academic performance with increasing time spent on social media [62–69]. However, not all studies confirm this result. In some studies, time spent on social media was found to be unrelated to academic performance [70, 71] or even to have a positive effect on performance [72, 73].

2.4 Social interactions

There is a growing interest in the relationship between social interactions (especially online social interactions) and academic performance [3, 74–92]. Two approaches dominate the relevant literature. The first focuses on the relation between a student’s own performance and that of their peers [74–81], based on a hypothesis of similarity in peer achievement. The similarity between pairs of individuals connected via social ties is attributed to various aspects: selection into friendships by similarity (i.e., homophily); influence by social peers (also known as peer effect); and correlated shocks (e.g., being exposed to the same teacher). As noted by [74, 93], the issue of separating these effects is inherently difficult. The second approach emphasizes the positive influence of having a central position in the social network between students [85–90]. The majority of results in the existing research which measure social networks are, however, based on self-reports and therefore subject to various biases [8–11] that are in many ways mitigated by using smartphones to measure the social network [94]. However, it should be noted that surveys and observational studies often measure very different aspects of reality. For instance, in the case of assessing tie strengths, observational studies may be more accurate in quantifying duration and frequency variables of a relationship, while surveys can provide qualitative insights into depth and intimacy [95, 96].

3 Materials and methods

3.1 Data collection and preprocessing

Results presented in this paper are based on the data collected in the Copenhagen Network Study (CNS) [12]. In the CNS, dedicated smartphones were handed out to students at the Technical University of Denmark (DTU) and used as their primary phones for two years. During this period various data types were recorded: Bluetooth scans, call and text message metadata, Facebook activity logs, and mobility traces. Additionally, participating students answered a survey on personality at the beginning of the study. Because participants could exit the experiment at any point, the number of participants varied over time. We investigate the data from 538 undergraduate students for whom we have complete data.

The raw data records are cleaned and transformed to meaningful information before the analysis. Bluetooth scans are used to estimate person-to-person interactions corresponding to a physical distance of up to 10 m (30 ft) between participants. While physical proximity is not a perfect proxy for person-to-person interactions, there is evidence that the proximity interactions are predictive of friendship in online social networks and communication using phone calls and text messages [97–99].

Facebook data was obtained via the Facebook Graph API, and contains both static friendship connections as well as various interactions on the social network. All types of interactions are treated equally. Private messages, however, are unavailable since they cannot be obtained from Facebook using the official Graph API.

The location data on the smartphones has varying accuracy depending on the providing sensor. The accuracy of the collected position can vary between a few meters for GPS locations, to hundreds of meters for cell tower location. We group the location data into 15-minute bins and use the median location of all data points with an accuracy below 80 m. In order to compute attendance we combined the smartphone locations with the person-to-person proximity obtained from Bluetooth scans. A detailed description of the method can be found in a companion paper [20].
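The binning step described above can be sketched as follows. This is a minimal illustration, not the CNS pipeline: the tuple layout of a location fix and the function name `bin_locations` are hypothetical, and the "median location" is assumed to be the per-coordinate median of latitude and longitude.

```python
from collections import defaultdict
from statistics import median

BIN_SECONDS = 15 * 60     # 15-minute bins, as stated in the text
MAX_ACCURACY_M = 80.0     # keep only fixes with reported accuracy below 80 m


def bin_locations(fixes):
    """Group location fixes into 15-minute bins and take the median position.

    fixes: iterable of (unix_time, lat, lon, accuracy_m) tuples (hypothetical layout).
    Returns {bin_start_time: (median_lat, median_lon)}.
    """
    bins = defaultdict(list)
    for t, lat, lon, accuracy in fixes:
        if accuracy < MAX_ACCURACY_M:
            bin_start = int(t // BIN_SECONDS) * BIN_SECONDS
            bins[bin_start].append((lat, lon))
    # Assumption: the median is taken separately for latitude and longitude.
    return {
        start: (median(p[0] for p in points), median(p[1] for p in points))
        for start, points in sorted(bins.items())
    }
```

Bins without any sufficiently accurate fix are simply absent from the result, so downstream code has to decide how to treat missing intervals.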

We considered social interactions of five different channels: proximity, Facebook (friendships + interactions), calls, and text messages. For each channel we created a network to model the social relations. Note that these models are based only on the interactions among participants of the CNS. Interactions with any people outside the study were not considered. Importantly, for the proximity networks we excluded all meetings that took place during class time in order to eliminate effects caused by class co-attendance. Section B in Additional file 1 discusses further details of the creation of these network models. In the remainder of this paper, the direct neighbors in those networks are referred to as ‘peers’.

The students’ course grades were provided by DTU administration. Only courses using the Danish 7-point grading scale were considered. This scale consists of the grades 12, 10, 7, 4, 02, 00, and −3 with 12 being the best grade and 00 and −3 indicating that the student failed. The positive weighted mean grades (term or cumulative) were converted to the standard GPA scale ranging from 4.0 (best) to 0.0 (worst). Every negative mean grade was set to 0.0. Only students attending at least three courses were considered. Figure 1 illustrates the distribution of the 538 cumulative GPAs. It shows a left-skewed distribution with a mean GPA of 2.5. More information about the student population can be found in Section A of Additional file 1.

Figure 1 Distribution of cumulative GPAs. Histogram of the 538 cumulative GPAs, showing a left-skewed distribution with a mean GPA of 2.5

In order to increase the stability of the results we applied bootstrap resampling. Analyses were performed on 100 bootstrap samples, where each has the same size as the original sample. We report as results the mean of the bootstrap analyses with approximated standard errors described by the Standard Error of the Mean.
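A minimal sketch of this resampling scheme is given below. The function name `bootstrap`, the fixed seed, and the reading of the Standard Error of the Mean as the standard deviation of the bootstrap estimates divided by the square root of the number of resamples are assumptions made for illustration.

```python
import random
from statistics import mean, stdev


def bootstrap(sample, statistic, n_boot=100, seed=0):
    """Return (mean, standard error) of `statistic` over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(sample)
    estimates = []
    for _ in range(n_boot):
        # Draw n items with replacement: same size as the original sample.
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        estimates.append(statistic(resample))
    # Assumption: SEM = standard deviation of the estimates / sqrt(n_boot).
    sem = stdev(estimates) / n_boot ** 0.5
    return mean(estimates), sem
```

Any analysis that maps a sample to a single number (a correlation coefficient, a classification accuracy) can be passed as `statistic`.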

3.2 Feature sets

To account for the different explanatory power of the individual and network aspects, we constructed four feature sets, each representing a certain aspect of life and corresponding to a specific level of information: personality, individual, network and combined.

3.2.1 Personality features

The personality features contain 16 individual personality traits obtained from questionnaires that the study participants had to fill in before receiving a phone.

3.2.2 Individual features

The individual feature set combines the 16 personality traits with behavioral and personal variables. Behavioral variables include average class attendance and the Facebook activity level (log of average number of posts per week). In terms of personal information, we added the students’ gender and their study year to the feature set. Information about the sociological background of the students was not available to us.

3.2.3 Network features

For the network features we consider metrics from five different networks, each based on a different channel (texts, calls, proximity, Facebook interactions, and Facebook friendships). Despite the large number of possible features to extract from networks, we considered only the metrics that follow the main approaches found in the literature, such as the mean GPA of peers, centrality, and the fraction of low and high performing peers. However, further aspects, such as deviation, skewness, or entropy of peers’ GPAs, would undoubtedly be interesting for future investigations.
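For a single channel, the peer-based metrics named above (mean GPA of peers and the fractions of low and high performing peers) could be computed roughly as follows. The container layout and the name `peer_features` are hypothetical, and the treatment of students without observed peers is an assumption.

```python
from statistics import mean


def peer_features(student, neighbors, gpa, group):
    """Peer-based features of one student in one interaction network.

    neighbors: {student: set of direct neighbors ('peers')}
    gpa:       {student: cumulative GPA}
    group:     {student: 'low' | 'moderate' | 'high'}
    """
    peers = [p for p in neighbors.get(student, ()) if p in gpa]
    if not peers:
        return None  # assumption: no features for students without observed peers
    return {
        "mean_peer_gpa": mean(gpa[p] for p in peers),
        "frac_low_peers": sum(group[p] == "low" for p in peers) / len(peers),
        "frac_high_peers": sum(group[p] == "high" for p in peers) / len(peers),
    }
```

Calling this once per channel yields the per-network versions of each metric.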

The structure of the interaction networks provides further insight into how students’ position in their social environment is correlated with performance. Therefore, we evaluated different centrality measures (Footnote 1). Overall, the degree centrality displayed the strongest correlation and was therefore used as a feature in our analyses.
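Degree centrality is straightforward to compute from an undirected edge list, as the sketch below shows. Normalizing by n − 1 (the maximum possible degree) is an assumption; the text does not state whether raw or normalized degrees were used.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality from an iterable of undirected (u, v) pairs."""
    neighbors = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            neighbors[u].add(v)
            neighbors[v].add(u)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    # Assumption: divide by the maximum possible degree, n - 1.
    return {node: len(peers) / (n - 1) for node, peers in neighbors.items()}
```

Because neighbors are stored in sets, repeated interactions between the same pair count as a single tie.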

3.2.4 Combined features

The combined feature set contains all 20 individual features and all 20 network features yielding a total of 40 features. See Table 1 for a complete list of features in each category. More details including descriptive statistics can be found in Section E of Additional file 1.

Table 1 Feature sets for data-driven modeling

3.3 Approach

We use machine learning techniques to evaluate the importance of different factors on the academic performance of students. Specifically, we create supervised learning models and evaluate their performance on classifying students as low, moderate, or high performers. This framework allows us to compare our results to related work, in particular, the works by Vandamme et al. [13] and Nghe et al. [15]. Furthermore, this approach makes it easier to detect significant differences between the individual performance groups. In contrast to classical statistical modeling with tests of significance, machine learning uses a hypothesis-free approach that allows us to model complex interactions driven by the data [100]. We evaluate the model performance based on the mean classification accuracy of 100 independent 10-fold cross-validations.

A key point to emphasize here is that while classifying students’ performance levels based on current behavior might be useful in a practical context (for example to identify students in need of extra support), it is not our primary reason for using machine learning in the current study. Rather, we use machine learning as a tool for ranking and comparing features. That is, the more predictive a given feature is, the more important it is for describing performance. By training our models on features arising from many categories, previously only studied independently, we can begin to understand their relative importance, as well as their interplay in terms of academic performance.

4 Results

The following results are reported in three stages. First, we perform an ANOVA F-test on all features to identify the most important features for dividing students into performance groups. Then we utilize supervised learning models to investigate the importance and interplay of the different feature categories. Based on the results of the first two stages, we then conduct an in-depth analysis of the most expressive impact factors of each category. Our primary focus is on the social behavioral features which have only been considered to a limited extent in previous studies.

4.1 Analysis of variance

Figure 2 shows the feature importance for features achieving significance of \(p < 0.001\) obtained from an ANOVA F-test (Footnote 2). Although all feature categories are correlated with academic performance, the result indicates that features which describe the social networks of students have the highest explanatory power. In general, network properties dominate the results with more than half of the significant features corresponding to this category. A potential explanation for the high impact of social relations is that the network connections may act as a proxy for previous performance, since the network features include information on the grades of others. The fraction of low performing peers as well as the mean GPA of peers contacted over text messages and calls display the highest explanatory power (Footnote 3). Class attendance proves to be the most important individual feature and, moreover, would be the most important feature overall if we had no information on anyone’s grades. Centrality in the proximity network is also found to be a significant descriptor with moderate importance. Among personality traits, only self-esteem and conscientiousness have significant explanatory power.
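For one feature, the statistic behind such a ranking is the one-way ANOVA F value, computed over the feature values of the low, moderate, and high performers. The sketch below shows the textbook formula only; the function name `anova_f` is hypothetical and the p-value computation is omitted.

```python
from statistics import mean


def anova_f(groups):
    """One-way ANOVA F statistic for a list of groups (each a list of numbers)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    group_means = [mean(g) for g in groups]
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of the observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))
```

A larger F value means the feature separates the performance groups more clearly relative to its spread within each group.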

Figure 2 Feature importance ranking. Results from ANOVA F-test for 3-class classification. Features which did not achieve sufficient significance (\(p \geq0.001\)) are omitted

4.2 Supervised learning

In order to better understand the importance and interplay of different factors on the academic performance we utilized supervised learning techniques. We created models based on the different feature sets to classify the students as low, moderate, and high performers according to their GPAs. Each of those three groups contains the same number of students, corresponding to a baseline accuracy of 33.33%.
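The division into three equally sized performance groups can be sketched as below. The name `assign_tertiles` is hypothetical, and breaking GPA ties by sort order (so that group sizes stay as equal as possible) is an assumption.

```python
def assign_tertiles(gpa):
    """Map each student to 'low', 'moderate', or 'high' by GPA rank.

    gpa: {student: GPA}
    """
    ranked = sorted(gpa, key=gpa.get)  # ascending GPA
    n = len(ranked)
    labels = ("low", "moderate", "high")
    # Assumption: ties are split by sort order to keep the groups balanced.
    return {student: labels[3 * i // n] for i, student in enumerate(ranked)}
```

With balanced groups, always predicting a single class is correct for about one third of the students, which is the baseline quoted above.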

We use Linear Discriminant Analysis (LDA) to find an optimal model that separates the three performance classes. Figure 3 illustrates the mean results of 100 independent 10-fold cross-validations. The results show that the LDA model solely based on personality features exceeds the baseline performance by about 9 percentage points (pps). Adding the four additional individual features (behavior + background info) improves the model’s performance by a further 5.2 pps. Using network features instead of individual features results in a performance of about 19 pps above baseline. Combining individual and network features yields a superior model with about 57.9% accuracy, roughly 25 pps above baseline. Figure 4 shows its achieved in-class precision and recall values along with the corresponding \(F_{1}\) values. As the results indicate, once the GPA class is provided, the model has high predictive power among the low and high performers (compared to that of the moderate performers) with \(F_{1}\) values of 0.649 and 0.626, respectively.
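The in-class precision, recall, and \(F_{1}\) values follow the standard definitions, which can be written out as a short sketch. The function name `class_metrics` and the label strings are illustrative; the example labels in the usage note are made up and are not results from the study.

```python
def class_metrics(y_true, y_pred, label):
    """Precision, recall, and F1 of one class, treating it as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)  # correctly assigned
    fp = sum(t != label and p == label for t, p in pairs)  # wrongly assigned
    fn = sum(t == label and p != label for t, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    return precision, recall, 2 * precision * recall / (precision + recall)
```

For instance, `class_metrics(["low", "low", "high"], ["low", "high", "high"], "low")` evaluates the low-performer class on three toy predictions.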

Figure 3 Model performances on the different feature sets. Bars show the classification accuracy of the different LDA models

Figure 4 Precision-recall curve. Dots represent the model performance in the low (red), moderate (green) and high (blue) performer classes. Dashed lines mark the profile of constant \(F_{1}\) corresponding to the measured values for the specific class

4.3 Feature analysis

4.3.1 Individual behavior

Among the considered individual effects, class attendance was found to have the highest impact on academic performance. A correlation coefficient of \(r_{S} = 0.294\) for cumulative GPAs was determined (\(p < 0.001\)). An in-depth analysis of the observed class attendance patterns along with a detailed description of the method to measure attendance in the CNS dataset is discussed in [20].
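The coefficient \(r_{S}\) is a Spearman rank correlation, i.e. the Pearson correlation of the ranks of the two variables. A minimal sketch with average ranks for ties is shown below; the function names are hypothetical and the significance test is omitted.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation of two equally long sequences."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the computation, the coefficient captures any monotonic relation between attendance and GPA, not just a linear one.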

The Facebook activity level measures the average number of published posts. Since the activity levels change significantly over time we consider each semester separately and use the corresponding term GPAs as measure for academic performance. This gives us up to four data points per student (one for each semester of the data collection period) for this analysis. In Fig. 5 students are divided into three groups of equal size according to their activity levels. As Fig. 5(a) shows, the distribution of posts among students is heavy-tailed, with the vast majority of students publishing fewer than 3 posts in a typical week. The distribution of term GPA values in the different tertiles reveals that, on average, students with lower activity perform better (see Fig. 5(b)). To statistically evaluate the variation in the distribution over the different tertiles, we performed a Kruskal–Wallis H-test. This test rejected, with \(p<0.001\), the global null hypothesis that the medians of the groups are all equal. A follow-up Dunn multiple comparison test with Bonferroni correction revealed pair-wise differences among the tertiles: all pairs are significantly different from each other (\(p<0.001\)). Thus, groups with different levels of Facebook activity have significantly different academic performances.
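The Kruskal–Wallis H statistic used here can be sketched as follows. The implementation is a textbook version with tie correction, not the code used in the study; the closed-form p-value relies on the chi-square approximation and holds only for three groups (two degrees of freedom), which matches the tertile setting. The follow-up Dunn test is not shown.

```python
import math


def kruskal_wallis(groups):
    """Return (H, p) for three groups of observations (lists of numbers)."""
    if len(groups) != 3:
        raise ValueError("the closed-form p-value below only holds for three groups")
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    rank_of, tie_term, i = {}, 0, 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + j) / 2 + 1  # tied values share the mean rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    rank_sums = [sum(rank_of[x] for x in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups))
    h -= 3 * (n + 1)
    h /= 1 - tie_term / (n ** 3 - n)  # tie correction
    # Chi-square survival function with 2 degrees of freedom: exp(-H / 2).
    return h, math.exp(-h / 2)
```

The sketch assumes that not all observations are identical; otherwise the tie correction divides by zero.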

Figure 5 Facebook usage and performance in the tertiles. (a) Division of students into three groups of equal size according to their active Facebook updates. Each box represents a single tertile, width corresponds to the span of Facebook activity in the specific group and the x-position shows the mean term GPA. (b) Grade distribution inside each Facebook activity class

4.3.2 Social interactions

Based on the results presented in Fig. 2 and Fig. 3 we conclude that a student’s performance can be accurately inferred from the achievements of their peers. This effect was consistently observed across different communication and interaction channels, as shown in Fig. 6. There, each channel is represented by a separate line illustrating the mean correlation of the members of each performance group and their respective peers. We can observe that regardless of the channel considered, each curve shows a strong increasing trend. This is further quantified in Table 2 which displays the corresponding correlation coefficients on the individual level. The most pronounced effect is observed for calls and text messages, which are considered to be proxies for strong social ties because this type of connection requires effort to initiate and maintain [101].

Figure 6 Similarity in academic performance for social ties. Curves show the mean GPAs of every performance group and their peers from different communication channels

Table 2 Correlation between the cumulative GPA of the students and the mean cumulative GPA of their peers based on different communication channels. Corresponding p-values are below 0.001

Interestingly, these channels are not dominant in the case of centrality measures. Here, proximity interactions displayed the strongest correlation among all channels. However, we found weak to moderate positive correlations in all social networks, in agreement with the existing literature [85–90].

We further assessed the validity of pairwise similarity in the network by focusing exclusively on social ties based on text messages. Figure 7 shows a scatter plot of the correlation between the own GPA and mean GPA of the texting peers for every student in the dataset. Once again, we observe a clear linear trend; the trend is especially strong in the region where the majority of the students is located (GPAs in the range between 2 and 3). In Fig. 8 we divided the population into tertiles based on the GPA and calculated the fraction of text messages exchanged with members of the different groups. Beyond the correlation, we can see that the students’ communication in each group is dominated by members of the same group. This observation further underlines the importance of the social environment for academic success.
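The group-level breakdown of text messages can be sketched as follows. The message record layout and the name `message_shares` are hypothetical, and counting each message once from the perspective of each endpoint is an assumption about how "exchanged" messages are tallied.

```python
from collections import Counter


def message_shares(messages, group):
    """Share of messages each performance group exchanges with every group.

    messages: iterable of (sender, receiver) pairs
    group:    {student: 'low' | 'moderate' | 'high'}
    Returns {own_group: {peer_group: fraction}}.
    """
    counts = {}
    for a, b in messages:
        if a in group and b in group:  # only ties between study participants
            # Assumption: a message counts once for each of its two endpoints.
            counts.setdefault(group[a], Counter())[group[b]] += 1
            counts.setdefault(group[b], Counter())[group[a]] += 1
    return {
        own: {peer: c / sum(cnt.values()) for peer, c in cnt.items()}
        for own, cnt in counts.items()
    }
```

Each inner dictionary sums to one, so it can be read as one histogram per performance group.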

Figure 7 Correlation between performance of strong peers. For each student, we show their cumulative GPA versus the mean GPA of their peers obtained by their text messages. Color denotes density of points in arbitrary units

Figure 8 Own academic performance and peers’ academic performance. Each histogram displays how students distribute their text messages exchanged with others over the various performance groups. Groups are defined by tertiles based on their cumulative GPA

5 Discussion

For the participants of the CNS, we found that the peers’ academic performance has a strong explanatory power for academic performance of individuals. We observed this effect across different channels of social interactions with calls and text messages showing the strongest correlations, further emphasizing the phenomenon. As mentioned in the literature review, this effect could be caused by either peer effects (adaptation) or homophily (selection). It should be noted that GPA information is used here as target and, in aggregated form, also as a network feature. This allows us to analyze and understand the relationships among peers, but should be taken into account when framing the problem as a prediction task.

We found network centrality to have a positive correlation with academic performance, in agreement with the literature [85–90]. However, among all types of interaction networks, only proximity networks exhibited a strong effect. A possible limitation in measuring centrality is that the mere physical proximity of two individuals does not necessarily involve direct communication. Nevertheless, it is reasonable to expect an increased level of information exchange in a group of individuals if they are in close proximity, which was the case in our dataset (Footnote 4).

Consistent with findings in existing literature, we found that class attendance showed the strongest correlation with academic performance when we consider only individual effects [16, 18, 19, 102–106]. We also found that Facebook activity has a negative relation to academic performance, also in agreement with the majority of the studies that investigated Facebook and social media usage [62–69]. We note, however, that our data is limited to Facebook activities such as posting a status update or uploading a picture, and that we have no information regarding ‘passive’ Facebook usage, such as scrolling and reading. Also, our data does not include direct messages, which may constitute a relevant fraction of communications performed via the social network site.

The analysis of the different personality traits revealed that two characteristics, namely conscientiousness and self-esteem, have considerable explanatory power for academic success. These two traits reached a correlation coefficient between 0.2 and 0.3 corresponding to the upper limit achievable for any correlation with a personality trait, according to Mischel [107]. The impact of other investigated characteristics could not be confirmed with proper significance. These results agree with existing literature [24–53].

In the supervised learning experiment we achieved a classification accuracy of around 25 percentage points above baseline, a result similar to that of Vandamme et al. [13]. While the classification accuracy is similar, comparing our results with theirs is difficult because of the very different feature sets and experimental setups. Vandamme et al. [13] use nearly ten times as many features to build a model as we did. In addition, the accuracy of Vandamme et al. [13] is driven by using prior achievement (grades), which is known to be a strong predictor of performance (e.g. due to persistence of skill and motivation). We note here that a potential reason for the similarity in performance to Vandamme et al. [13] could be that the network features used in our study include the grades of others in the network. Thus, if the network homophily with respect to academic performance is sufficiently strong, the average performance of others could serve as a proxy for each individual’s academic achievements.

Networks originating from different channels were treated separately because each network provides different information. For future studies it could be interesting to combine them and create multiplex network models which capture interactions across multiple channels and provide more information about the actual tie strength.

In summary, our findings—together with the results in the literature—emphasize that there is a considerable dependence of academic performance on personality and social environment. This experiment is by no means an attempt to be exhaustive of the possibilities for impact factors. Rather, we hope that this demonstration will stir interest to further study the impact of the social environment on academic success, as well as the interplay of individual and network factors.

5.1 Limitations

Although we utilized wider and more detailed data than most other studies, our approach also has important limitations which need to be taken into account. First, we only observed students from a single, technical, Danish university. For this reason, the findings may not be generalizable to students at other institutions, of other academic disciplines or with other demographics. Furthermore, only a subset of all the students at DTU participated in our study—for first year students the rate was around 40%. Although we observed a high degree of variation with respect to behavioral and network measures as well as academic performance, our sample may not be representative of the whole student population. Our measures of ego-networks and model estimates reflect only the smaller (and not closed) community of students in the CNS within the larger population of students.

Although direct measures overcome a lot of the limitations of surveys and self-reports, they continue to be affected by standard concerns over observational data, including selection bias, information bias, and confounding [108]. In particular, confounding plays a big role in our study as there are many factors that we were unable to capture but that are known to affect academic performance directly or to interact with other observed factors. For instance, many socio-economic variables have been identified as good predictors for academic achievements [109–112] but unfortunately such data was not available to us. There was also some tendency of selection into the study as the average student in the study tends to achieve higher grades than non-participants [113]. Furthermore, investigations on the CNS data have revealed that findings differ slightly for men and women [114].

Social network observations were limited to phone calls/texts, meetings, and Facebook activities. Although these are arguably some of the most important means of communication, some students may communicate via other smartphone apps. Our method of inferring attendance is also subject to some noise (as thoroughly discussed in [20]). Furthermore, it does not imply in-class participation nor attention to the taught material.

Although we have identified many factors that correlate with academic performance, we make no claims regarding causality. The question of establishing causality from purely observational data is far from trivial. Thus, although it is beyond the scope of this work, we consider this question promising and interesting for future research.