Having observed that users are exposed to politically diverse news articles far more by their friends than directly by media sources, we now test the PoNS model and examine which of the four factors (gratification, selective exposure, socialization, and trust & intimacy) are more strongly associated with the chance of sharing political news. We use a binomial logistic regression, which models the probability that a user retweets a given news article based on twelve predictors extracted across the four factors. Predictors with skewed distributions undergo a logarithmic transformation. The dependent variable is thus binary: whether or not a given user retweets a given news article.
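For reference, a logistic regression of this kind has the standard form sketched below. The notation is an assumption introduced here for illustration (y for the retweet indicator of a user u and article a, x_i for the predictors, β_i for their coefficients); it is not taken from the original formulation.

```latex
\[
  \Pr\left(y_{u,a} = 1\right)
    = \operatorname{logit}^{-1}\!\Big(\beta_0 + \sum_{i} \beta_i x_i\Big),
  \qquad
  \operatorname{logit}^{-1}(z) = \frac{1}{1 + e^{-z}}
\]
```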
Since our data only includes positive cases (that is, cases in which people share news articles), we need to augment our dataset with under-sampled negative cases: we add an equal number of negative cases, that is, a set of random news article-and-user pairs. By construction, the resulting sample is balanced (the response variable is split 50-50), and the accuracy of a random prediction model would thus be 50%. We model the retweeting probability as a linear combination of the predictive variables, plus terms for interactions. We use the first seven and a half months of our data to calculate the independent variables and the last two weeks of data for the test; the latter contain 14,309 retweeting cases. Adding the same number of random negative cases, we use 28,618 cases to build the model.
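The balancing step can be sketched as follows. This is a minimal illustration, not the procedure actually used: the function name, the input format, and the choice to reject random pairs that coincide with an observed retweet are all assumptions.

```python
import random

def build_balanced_sample(positives, users, articles, seed=0):
    """Pair each observed retweet with one random non-retweeted (user, article) pair.

    positives: iterable of (user, article) pairs that were actually retweeted.
    Returns a shuffled list of (user, article, label), label 1 for retweets, 0 otherwise.
    Assumption: a random pair that matches an observed retweet is rejected and redrawn.
    """
    rng = random.Random(seed)
    positive_set = set(positives)
    users = list(users)
    articles = list(articles)
    if 2 * len(positive_set) > len(users) * len(articles):
        raise ValueError("not enough non-retweeted pairs to balance the sample")
    negatives = set()
    while len(negatives) < len(positive_set):
        pair = (rng.choice(users), rng.choice(articles))
        if pair not in positive_set:
            negatives.add(pair)
    sample = [(u, a, 1) for u, a in sorted(positive_set)]
    sample += [(u, a, 0) for u, a in sorted(negatives)]
    rng.shuffle(sample)
    return sample

# Hypothetical toy data: three observed retweets among 3 users and 4 articles.
toy_positives = [("u1", "a1"), ("u2", "a3"), ("u3", "a2")]
toy_sample = build_balanced_sample(
    toy_positives, ["u1", "u2", "u3"], ["a1", "a2", "a3", "a4"]
)
```

With 14,309 positive cases, a routine of this kind would yield the 28,618 cases mentioned above.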
The results of the logistic regressions are reported in Table 4. The coefficients tell us the extent to which the corresponding predictors explain the retweeting behavior, and the p-values indicate whether the coefficients are statistically significant. To show how well the model fits the data, we use the Hosmer-Lemeshow (H-L) test of ‘goodness-of-fit’ and report its p-value. Note that with the H-L test, the higher the p-value, the better the model fits the data. The test divides subjects into deciles based on predicted probabilities and computes a chi-square statistic from the observed and expected frequencies. A probability (p) value is then computed from the chi-square distribution to test the fit of the logistic model. If the p-value of the H-L test is greater than 0.05, as we want for well-fitting models, we fail to reject the null hypothesis that there is no difference between observed and model-predicted values, implying that the model’s estimates fit the data at an acceptable level. That is, well-fitting models show non-significance on the goodness-of-fit test, indicating that model predictions are not significantly different from observed values.
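The test described above can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the code behind Table 4: the function names are hypothetical, groups are formed by equal-size cuts of the sorted predictions, and the p-value uses the closed form of the chi-square survival function that holds only for an even number of degrees of freedom (10 groups give 8).

```python
import math

def chi2_sf_even_df(x, df):
    """Survival function of the chi-square distribution for a positive, even df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form only holds for a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total

def hosmer_lemeshow(y_true, y_prob, groups=10):
    """Return (chi-square statistic, p-value) of the Hosmer-Lemeshow test."""
    # Sort cases by predicted probability only, keeping ties in input order.
    pairs = sorted(zip(y_prob, y_true), key=lambda pair: pair[0])
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(y for _, y in chunk)   # observed retweets in the group
        expected = sum(p for p, _ in chunk)   # retweets predicted by the model
        mean_p = expected / size
        denom = size * mean_p * (1.0 - mean_p)
        if denom > 0:
            stat += (observed - expected) ** 2 / denom
    return stat, chi2_sf_even_df(stat, groups - 2)
```

A p-value above 0.05 returned by `hosmer_lemeshow` would be read, as in the text, as no significant difference between observed and predicted frequencies.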
Logistic regression coefficients cannot be interpreted directly on the scale of the data, because the model is nonlinear on the probability scale. To ease the interpretation of the logistic regression coefficients β, one can apply the ‘divide by 4’ rule, which is applicable when the probabilities (i.e., values of the outcome variable) are close to 0.5, as is the case for our data. To see how, take a predictor x (e.g., whether or not the article is about politics), its regression coefficient β, and the outcome variable (e.g., the probability of retweeting the article). Since the slope of the logistic curve is maximized at its center point, where it equals β/4, one can divide the logistic regression coefficient by 4 to get an upper bound on how much a unit difference in x would change the outcome variable. If β is, for example, 0.8, then articles about politics are likely to be retweeted with a probability up to 20% (0.8/4 = 0.2) higher than articles on any other subject.
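The rule can be checked numerically. The sketch below compares the β/4 bound with the exact change in probability for a one-unit change in x starting from a probability of 0.5; the function names are hypothetical, and the coefficient values are the ones quoted in the text of this section.

```python
import math

def logistic(z):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def divide_by_4_bound(beta):
    """Upper bound on the probability change per unit change in x."""
    return beta / 4.0

def exact_change_from_center(beta):
    """Exact probability change for a one-unit change in x, starting at p = 0.5."""
    return logistic(beta) - 0.5

for beta in (0.93, -0.8, -2.09, 0.13, 0.31):
    print(f"beta={beta:+.2f}  bound={divide_by_4_bound(beta):+.4f}  "
          f"exact={exact_change_from_center(beta):+.4f}")
```

The bound is tight for small coefficients and increasingly loose for large ones, which is worth keeping in mind when reading the larger effects reported below.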
6.1 General news sharing
We first investigate the generic news sharing pattern. For comparison, we consider retweeting cases not only of political news but also of other kinds of news. Table 4 reports the results of the logistic regression: the ‘original model’ column fits the original dataset, while the ‘revised model’ column includes only the significant predictors whose signs remain unchanged compared to those of the original model. Both models fit the data better than the null model: the prediction error rate of our model is only 0.19, while that of the null model is 0.5. Below we discuss the findings.
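The comparison of error rates can be illustrated as follows. This is a sketch under assumptions: the 0.5 classification threshold and the definition of the null model as always predicting the majority class are not stated in the text, and the function names are hypothetical.

```python
def error_rate(y_true, y_prob, threshold=0.5):
    """Share of cases whose thresholded prediction differs from the observed label."""
    wrong = sum((p >= threshold) != bool(y) for y, p in zip(y_true, y_prob))
    return wrong / len(y_true)

def null_error_rate(y_true):
    """Error rate of a model that always predicts the majority class."""
    share_positive = sum(y_true) / len(y_true)
    return min(share_positive, 1.0 - share_positive)

# Hypothetical toy labels and predicted probabilities on a balanced sample.
toy_labels = [1, 1, 0, 0]
toy_probs = [0.9, 0.4, 0.2, 0.1]
toy_error = error_rate(toy_labels, toy_probs)
toy_null_error = null_error_rate(toy_labels)
```

On a balanced sample the null error rate is 0.5 by construction, which is the baseline the 0.19 figure is compared against.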
Gratification: The F1 feature is statistically significant, while F2 is not. The number of repeated exposures to the same article (F1) is positively correlated with retweeting the news, which suggests that an article needs to be informative to be retweeted. The positive coefficient of 0.93 indicates that one extra exposure to the article increases one’s retweeting probability by up to 23% (0.93/4 = 0.23). On the other hand, what a user generally likes is not correlated with what the user shares (F2). This finding runs counter to what has been found in more traditional settings, where one major motivation for consuming and sharing news is entertainment.
Selective exposure: Both F3 and F4 are statistically significant. People tend to retweet news articles in subject areas other than politics: the negative coefficient for F3 (−0.8) indicates that a user is up to 20% less likely to retweet political articles than other types of news. When it comes to articles about politics, one is considerably more likely to retweet those expressing political views one agrees with (F4, 0.72). This suggests that although Twitter allows the flow of politically diverse news articles, people have a strong tendency to retweet only what matches their views.
Socialization: We find that what one’s followers are interested in (F5) is positively related to what one chooses to share (0.57). This finding is in line with other work in that social interaction is a key factor that encourages information sharing in the online world. Trying to please one’s friends may be particularly important on Twitter.
Trust and intimacy: The results show that all the variables except for F10 are statistically significant, although only a few are more than mildly correlated. Source credibility (F7) is negatively correlated with retweeting (−2.09), which indicates that a user is up to 52% more likely to retweet news articles that come from media sources than from friends. However, news from a friend with whom one has a mutual relationship (F8) has a 3% higher probability of being retweeted (0.13). To a limited extent, one is also likely to preferentially retweet news coming from popular friends (F9). Finally, political news is unlikely to be shared, yet a user is 8% more likely to share a political article given that it was shared by a friend (F13, 0.31). This peer pressure effect held even for friends who had opposing political views (F12, −0.37).
The regression analysis also lets us determine the relative importance of the four factors underlying the 13 predictors, in the following order: trust & intimacy, gratification, selective exposure, and socialization. Significant factors are ranked based on how much they change the retweeting probability and are summarized in Figure 5. The two columns report the results for the original dataset and the politically balanced dataset, respectively. Across the two datasets, the impact of each factor varies in scale, but its sign (positive or negative) does not, which speaks for the validity of the results.
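One way to arrive at such an ordering is sketched below. It ranks each factor by the largest divide-by-4 effect among its predictors; this ranking criterion is an assumption made for illustration, and the dictionary contains only the coefficients quoted in the text of this section, not the full contents of Table 4.

```python
# Coefficients quoted in the text, keyed by predictor and tagged with their factor.
quoted_coefficients = {
    "F1": ("gratification", 0.93),
    "F3": ("selective exposure", -0.80),
    "F4": ("selective exposure", 0.72),
    "F5": ("socialization", 0.57),
    "F7": ("trust & intimacy", -2.09),
    "F8": ("trust & intimacy", 0.13),
    "F12": ("trust & intimacy", -0.37),
    "F13": ("trust & intimacy", 0.31),
}

def rank_factors(coefficients):
    """Rank factors by their largest absolute divide-by-4 effect (assumed criterion)."""
    strongest = {}
    for factor, beta in coefficients.values():
        effect = abs(beta) / 4.0
        strongest[factor] = max(strongest.get(factor, 0.0), effect)
    return sorted(strongest.items(), key=lambda item: item[1], reverse=True)

factor_ranking = rank_factors(quoted_coefficients)
for factor, effect in factor_ranking:
    print(f"{factor}: up to {effect:.2f} change in retweeting probability")
```

Under this criterion the quoted coefficients reproduce the order given above, although Figure 5 may rely on a different aggregation.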
To sum up, the credibility of a news outlet (trust & intimacy) and the informativeness or the enjoyment of the articles themselves (gratification) are the two strongest factors that motivate people to share news. Socialization plays a role in choosing news topics to a certain extent: what a user shares depends on what the user’s friends like. People share political news less frequently than other types of news; however, when they do so, the political stances of the articles are likely to match those of the users (selective exposure) or of their friends. As one might expect, one’s own taste is a strong motivation for sharing a news article. However, the above results also suggest that social relationships affect media consumption in notable ways.
6.2 Political news sharing
Next, we focus on the specific question of whether users retweet articles differently depending on the article’s political views. We consider two situations: one in which a news article matches the retweeter’s political views (3,379 positive retweeting cases), and one in which it does not match (701 negative retweeting cases). We run a logistic regression for these two cases separately and report the results in Table 5. For the two regressions, the likelihood ratio tests were significant at the 5% level. In both cases, the strongest predictor is numexposures, which is the number of times the retweeter has been exposed to the article. If the article agrees with the retweeter’s political views, then the article does not necessarily agree with the followers’ political views (−0.44) and is likely to come from reciprocal friends (0.14), who might happen to have diverse political views (−0.27).
In contrast, if the article disagrees with the retweeter’s political views, then the article is likely to be of interest to the retweeter (1.23) but not necessarily to the followers (−0.36), to match the followers’ political preferences (1.546), to come from friends who have different political views (−1.60), and to come from friends with whom one has a mutual relationship (0.52). This means that, when people decide to retweet political articles, they do care about their online social relationships (e.g., who shared the article and who the audience is). When an article contrasts with their views, the social context becomes more significant. As such, contextualizing the news reading experience could offer ways of nudging people to accept a variety of political views.