Political theorists have long argued that exposure to diverse perspectives is vital to a robust civil society and to the development of citizens (Kahne et al. 2012). Democratic discourse requires engaging with people who hold different perspectives, and while networked technologies hold great promise for bringing people together, new technologies also have a frightening capacity to separate people into ideologically segregated online spaces.

Massive Open Online Courses (MOOCs) are potential sites for students to engage with peers who hold differing beliefs, but the scope and scale of discussions among thousands of students make tracking these interactions in detail a challenging task for faculty and course teams. In this work, we have prototyped a series of computational measures of engagement across difference: measures that could be deployed in near real-time in courses to give faculty some insight into the nature of participant interactions within and across political affiliations.

Background and Context

Internet researchers have posed two competing theories for how people confront differences on the Web (Gardner and Davis 2013). One theory holds that the Internet is a series of “silos” where individuals seek out media and communities that conform to their established beliefs (Pariser 2012). Another theory holds that the Internet contains many interest-driven spaces that serve as ideological “bridges” (Rheingold 2000), where people attracted to these interest-driven spaces can be diverse across many dimensions. Previous research has examined how technology-enabled platforms - social networks, web browsing, news aggregation - affect consumption of political content (Garrett 2009; Gentzkow and Shapiro 2011; Athey and Mobius 2012; Flaxman et al. 2016; Quattrociocchi et al. 2016). This prior work generally suggests a picture of technology at odds with healthy civic discourse, in which users seek and find ideologically homogenous news sources, rather than exploring the diversity of available viewpoints (Sunstein 2017; though see Boxell et al. 2017). The media environment surrounding the 2016 U.S. Presidential Election shows a stark example of these divides (Faris et al. 2017).

The rise of open online education has potentially offered a new pathway for students to join communities of diverse learners and actively participate in political discourse with others (Reich et al. 2014). Demographic research into massive open online courses (MOOCs) has shown that these courses are among the most diverse “classrooms” in the world, with students of different ages, levels of education, and life circumstances (Chuang and Ho 2016). MOOCs have the potential to bridge the geographic patterns that divide students in brick-and-mortar schools along ideological lines (Orfield et al. 2012). But this optimism is tempered by several important questions that have yet to be answered empirically. Does the demographic diversity in MOOCs translate into ideological diversity? Do students use these learning communities to encounter and consider different perspectives, or are student interactions limited to communicating primarily with other students who share pre-existing ideological perspectives?

This work extends a budding literature on the discourse within online learning environments and beyond. Peer interactions are central to the pedagogical designs of online education (Siemens 2005). However, most of the previous research on discourse in MOOCs from edX and Coursera has focused on how in-course language relates to student persistence and dropout (Koutropoulos et al. 2012; Wen et al. 2014; Yang et al. 2015).

In our work, rather than focusing on how forum activity and language use predict student performance, we are interested in peer discussion as an educational end in itself. Effective citizenship education and a healthy civic sphere require opportunities for public deliberation (Della Carpini et al. 2004). One prerequisite for healthy public deliberation in MOOCs is that people with differing beliefs should engage one another directly, which we can evaluate by measuring the political beliefs of students and examining whether students with different beliefs respond to one another in forums. We can also investigate the quality of deliberation by examining language use directly among political partisans. Political psychologists have observed that political partisans often shape debates through the use of competing “framings” for issues (Lakoff 2014), such as defining estate taxes as death taxes, or referring to tax cuts as tax relief (or, in the education policy space, the recent shift away from “vouchers” and towards “funding that follows the student”). We hypothesize that extensive partisan framing would lead to divergent language, whereas convergent language would indicate that these kinds of rhetorical moves (often aimed at winning arguments rather than understanding others) were not excessively shaping forum discourse. Other interpretations of language convergence are certainly possible. For example, Welbers and de Nooy (2014) use conversational accommodation theory, which observes that people adjust to their conversation partner’s verbal and non-verbal behaviors, to evaluate convergence in an ethnic group discussion forum. Conversational alignment is an important line of inquiry in linguistics, and new computational methods are advancing the field (Doyle and Frank 2016). Our work extends a normative perspective on this work, identifying the “inadequacies of existing discourse relative to an ideal model of democratic deliberation” (Gastil 1992).
Our data allow us to test whether MOOCs breed a discussion that aligns ideologically distant students in their forum behavior and in their language use. These questions of how discourse in online forums promotes democratic deliberation have been addressed previously through hand-coding of forum interactions (Loveland and Popescu 2011), and we build on this literature by developing fully automated approaches.

In this paper we tackle this broad agenda by providing novel evidence for four research questions. First, do MOOCs attract enrollment from an ideologically diverse student body? Second, is that diversity reflected in students’ participation? Third, do students selectively interact in the forums (via comment or up-vote) based on partisanship? Finally, do students converge on a shared language, or does their discourse remain divided along partisan lines? Each of these questions identifies a unique and necessary prerequisite for engagement across differences in MOOCs.

The rest of this paper proceeds in three parts. We first introduce the two courses, their students, and their political beliefs. Then, we describe the student interactions in course forums. Finally, we analyze the text of the posts, to evaluate the degree to which students with diverse political beliefs converge on a shared language and interact constructively with one another.


Our data are collected from two online courses run by HarvardX on the edX platform. Each course was taught by a Harvard professor and modeled on a real campus course. Course material was released in weekly chapters over 3–4 months. Students were also asked to complete a pre-course survey at the beginning of the course, which included measures of political ideology. Both classes also had a discussion forum, in which students were asked to post regularly as part of the requirements for completing the course. However, the course administrators did not specifically check the students’ posts to confirm their completion credit. Additionally, while this directive might affect the volume of posting, the nature of those posts is still determined by the students - including the contents of the posts, and the placement of posts within the forum discussion.


Saving Schools was a course about U.S. education policy and reform offered by HarvardX on the edX platform that ran from September 2014 to March 2015. The course was taught by Paul Peterson, Director of the Program on Education Policy and Governance at Harvard University and Editor-in-Chief of Education Next, a journal of opinion and research. The course was designed around Peterson’s (2010) book Saving Schools and consisted of four mini-courses based on chapters of the book: “History and Politics of U.S. Education,” “Teaching Policies,” “Accountability and National Standards,” and “School Choice.”

Each mini-course was 5–6 weeks long, with content released in weekly bundles according to topic. Each week included a package of materials, such as video lectures, assigned reading, multiple choice questions, and discussion forums. For example, in the second Saving Schools module, “Teaching Policies,” the weekly modules included discussions of “Teacher Compensation” and “Class Size Reduction.” The “Teacher Compensation” module included three video lectures with the homework questions “are teachers paid too little?”; “are teachers paid too much?”; and “are teachers paid the wrong way?” Students were then instructed to read two opposing Education Next pieces on teacher pay and to respond in the forums to a discussion prompt on that topic. Some weeks, students were split into discussion cohorts by letter of last name or date of birth. Learners earning a certificate were required to post at least once in the discussion forum each week, which they confirmed through an honor-based self-assessment.

The politics of U.S. education reform do not perfectly align with conservative/liberal distinctions, but the education policy preferences of the professor - Paul Peterson - are generally associated with conservative positions. His journal, Education Next, is considered one of the leading publications for conservative viewpoints on education policy issues, and executive editor Martin West was an educational advisor to Mitt Romney’s presidential campaign. Prof. Peterson is a proponent of free market reforms, school and teacher accountability, charter schools, and standardized testing; and he has been critical of policies advocated by labor unions and school boards. Our informal assessment of Saving Schools is that Prof. Peterson provides multiple perspectives on issues and gives each side a fair hearing, though he also makes clear his own, generally conservative, policy preferences.

American Government was a course about the institutions of American politics that ran from September 2015 to January 2016, taught by Harvard Kennedy School of Government faculty member Thomas Patterson. Patterson is an expert in media and public opinion. The course topics included constitutional structures, political parties, the role of the media, and other elements of the national political system. The course contained 24 modules released over four months. Each module included discussion questions; for instance, in the first unit on dynamics of American power, students were asked to discuss the prompt: “‘Money is power’ in the American system. Explain some of the ways that money is used to exert influence and who benefits as a result.” As with Saving Schools, students confirmed their participation in forums through an honor-based self-assessment. The course also used mechanisms to divide students into discussion forum cohorts by last name. Our assessment is that the selection of course topics conveys some bias or emphasis on center-left interests (concerns with income inequality and money in politics, for example) in the context of a largely non-partisan explication of how American government functions. If Saving Schools tilts right, while largely providing a balanced perspective on issues, American Government tilts left, while largely remaining non-partisan.

Population of Interest

While total enrollment for these two online courses was 30,006, most of these enrollees did not actually engage with the course content - only 16,169 did anything more than enroll through the course website. Of those students, only 7,204 started the pre-course survey, which we treat as one indicator of introductory activity. In Table 1, we use data collected from outside the survey to compare the demographic composition of the survey respondents, relative to non-respondents.

Table 1 Demographics of course participants

Among this population of survey-takers, 45.8% reported being from the United States. While these courses drew diverse international interest, we focus on these U.S. students in all our analyses below, for three reasons. Most practically, the survey in Saving Schools did not display the ideology questions to non-U.S. students (by instructor design). Second, these U.S. students form a clear plurality of the student body, and the subject matter mostly focuses on U.S. politics. Finally, there may be cultural differences in the ideological foundations of partisanship, and our measures may not capture the true diversity in our international participants’ points of view. Therefore, our theoretical interpretation of ideological differences will be most clear among the subset of students that are from the United States.

Political Beliefs

Students’ ideology was measured during the pre-course survey in both courses. Almost all U.S. students who took the survey completed these items. However, the measures were different in each course, as we describe below. Broadly, to analyze American Government (n = 1,258) we used a single measure of generic political leaning, while in analyzing Saving Schools (n = 1,315) we used a four-item measure of political leaning within specific topics in education policy, which was transformed into a single ideology dimension. In all our analyses we use the continuous ideology measure. However, we also divide the ideology dimension into a tripartite categorization (i.e. “liberal”, “moderate”, “conservative”) for graphical representation.

Saving Schools

Students were given a set of questions used previously in a nationally representative poll on education policy (Peterson et al. 2014; Education Next 2015). Many education policy issues do not map on to typical left/right divides in American politics, so we chose the four questions that were most strongly correlated with broader measures of political partisanship: questions about school taxes, school vouchers, unions, and teacher tenure. In Table 2 we show the responses in our target sample (i.e. US-based survey respondents who posted in the forum), along with the responses from the original, nationally-representative survey. In general, our sample covered a wide range of political viewpoints across the US population.

Table 2 Responses by Saving Schools U.S. forum posters to education policy questions. Answers from the original nationally representative poll in italics

These responses generally mapped onto a single dimension of partisanship, as shown in Table 3 (note that the tenure question has an opposite sign from the others). That is, people who were generally on the “left” in terms of education policy were more likely to support higher taxes, tenure, and unions, while people who were generally on the “right” were more likely to support vouchers. This aligned with our understanding of the general terms of debate in current policy, so we condensed these four measures into a single-dimensional measure of ideology. We standardized the responses to each question (by subtracting the mean and dividing by the standard deviation), reverse-scored the tenure question, and averaged the four measures into a single ideology index. Additionally, we confirm that our results are robust across alternate mappings to a one-dimensional ideology scale (such as the first principal component from a principal component analysis). To create tripartite categories, we divided this continuous measure into equally-sized terciles - that is, with a third of all students in each bucket. Due to the liberal skew of the student population, this meant that the liberal third was more strongly liberal than the conservative third was conservative.
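The index construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for the paper: the question names, the toy responses, the use of the population standard deviation, and the convention that lower scores are more liberal are all hypothetical choices made for the example. Ties at a tercile boundary are broken by input order.

```python
from statistics import mean, pstdev

def zscores(values):
    """Standardize: subtract the mean, divide by the (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

def ideology_index(responses, reverse=("tenure",)):
    """responses: dict of question name -> list of numeric answers, one per student.
    Questions named in `reverse` are reverse-scored after standardizing."""
    standardized = {}
    for question, answers in responses.items():
        z = zscores(answers)
        standardized[question] = [-v for v in z] if question in reverse else z
    n_students = len(next(iter(standardized.values())))
    return [mean(standardized[q][i] for q in standardized) for i in range(n_students)]

def terciles(scores, labels=("liberal", "moderate", "conservative")):
    """Split students into three equally sized groups by rank (assumes low = liberal)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    groups = [None] * len(scores)
    for rank, i in enumerate(order):
        groups[i] = labels[rank * 3 // len(scores)]
    return groups

# Hypothetical toy responses for six students (not data from the courses).
toy = {
    "taxes":    [1, 2, 3, 4, 5, 3],
    "vouchers": [1, 2, 3, 4, 5, 3],
    "unions":   [1, 2, 3, 4, 5, 3],
    "tenure":   [5, 4, 3, 2, 1, 3],  # opposite sign, so it is reverse-scored
}
toy_index = ideology_index(toy)
toy_groups = terciles(toy_index)
```

Because the groups are defined by rank, a skewed sample still yields equal-sized buckets, which is why the most liberal third can sit further from the midpoint than the most conservative third.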

Table 3 Correlation matrix of ideology measures in Saving Schools

American Government

Students were asked a single item which targeted a more general measure of ideology, taken from the World Values Survey (2009). Participants answered the following question:

“In political matters, people talk of ‘the left’ and ‘the right.’ How would you place your views on this scale, generally speaking?”

They responded on a ten-point scale, and the distribution of responses is shown in Table 4, along with the results from the most recent US sample of the World Values Survey. In general, our sample provides good coverage of the ideological spectrum. The central tendency of the distribution is, on average, more left-leaning than the population at large. However, our analyses are more concerned with the range of views, and these results suggest that the student body does contain a diversity of viewpoints. To create tripartite categories, we divided the discrete scale points into three approximately equal groups: 1–3 (“liberal”), 4–5 (“moderate”), and 6–10 (“conservative”).
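For concreteness, the grouping of scale points can be written as a small function. The function name is hypothetical; the cut points are the ones stated above.

```python
def tripartite(score):
    """Map a 1-10 left-right self-placement to one of three groups."""
    if not 1 <= score <= 10:
        raise ValueError("score must be between 1 and 10")
    if score <= 3:
        return "liberal"
    if score <= 5:
        return "moderate"
    return "conservative"
```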

Table 4 Ideology responses by American Government U.S. forum posters. Answers from the original nationally representative poll in italics

These results address the first of our four research questions concerning who participates in politically-themed MOOCs. That is, both classes managed to attract students that represent a diverse range of political beliefs, across the ideological spectrum. Though the demographic diversity of MOOCs has been well-established, to our knowledge, this is the first direct evidence for ideological diversity. And this is a necessary (but not sufficient) condition for engagement across differences. In the remainder of the paper, we merge these data with the forum activity logs to test our three remaining research questions, which concern ideologically-driven participation and interaction in the course discussion forums.

Forum Results

The structure and function of the discussion forums in both classes were identical, and this structure is displayed in Fig. 1. The top level of the forums included “threads”, and the forum structure allowed for up to two other levels of posting below each thread. Specifically, students could post “replies” to an original post, and each reply could be followed by an arbitrarily long list of “comments” in chronological order. Thus, the discussion threads had three levels of responses: initial posts, replies to initial posts, and comments on replies. Additionally, students were allowed to “upvote” threads and replies (but not comments), which promoted posts and replies to a higher position in the thread.

Fig. 1 Thread Structure

In general, each thread was self-contained, with no interaction across threads. Likewise, the replies within each thread were also self-contained, in that replies almost never responded to other replies. Thus, almost all student-to-student interactions were nested as comments within each reply, since, as we shall see, the vast majority of initial posts with active discussion were those started by course staff. After a student posted a reply, other students could interact with that reply by giving an upvote or adding a comment.

In our analyses, we assume that every post is directed towards the post above it in the thread - all replies are directed to posts, and all comments are directed to replies. This interpretation is in line with the intent of the commenting platform design. But in practice this was not always the case. For instance, some commenters direct their comments towards each other, rather than towards the post or reply above. In addition, some posts address a number of posters in a thread simultaneously (e.g. “let me try to synthesize four perspectives in this thread”) rather than referring to one person in particular. Still other posts were off-topic, and directed to no one at all. In these cases, the raw trace data from the forum might not match the posters’ intent.

To evaluate the extent of this mismatch - and to perform other basic qualitative coding tasks, like removing off-topic posts - we developed a new software tool: Discourse (Kindel et al. 2017). This tool handles many common formats for forum data, and allows individual posts to be read and rated within the context of the other posts in the thread that preceded them. We had a team of coders (at least two per post) read and rate 24,556 posts in the course-focused threads (see below) from both courses. We asked coders to classify posts according to the writers’ intent in the context of the conversation. In general, their results confirmed that the metadata structure of the forums was close to accurate, with over 90% agreement between coder consensus and trace data. In fact, the trace data were as consistent with coders as the coders were with one another. So throughout we report analyses using the raw trace data. But we confirm that our results are substantively robust across other analytical strategies - either by assuming instead that the qualitative codes are ground truth, or else by focusing only on posts where humans and trace data agree.

Our first analytical strategy involves testing the “assortativity” of these reply-comment and reply-upvote interactions - namely, whether forum activity is self-sorted into ideologically consistent groups. To perform this test, we compare whether the ideology of a reply poster is at all predictive of the comments and upvotes they eventually receive (collapsed across the order of all threads and replies). Our second analytical strategy involves testing the partisan distinctiveness of individual posts. That is, we strip the forum context from replies and comments and treat each substantive forum post as an independent document, so that we can compare the language used by students with opposing ideologies.

Thread Types

Across both courses, we observe 2,125 threads, containing 16,522 replies, 9,889 comments, and 2,566 upvotes. But forum threads serve many different purposes in MOOCs (Stump et al. 2013; Wen et al. 2014), and forum actions were not distributed evenly across threads. Accordingly, we first consider two top-level categories of threads, based on who created the thread (student vs. course team) and the contents of the thread (administrative vs. low partisan salience vs. high partisan salience). The results of this categorization are shown in Table 5, and described below. For later analyses, we focus on course-generated content threads, and exclude both student-generated and administrative threads.

Table 5 Categorization of Thread Types. Cells indicate counts across both courses. Average forum actions per thread (comments, replies, upvotes) in parentheses

Course Vs. Student Threads

The most salient distinction is between “course threads” that were generated by a member of the teaching team, and “student threads” that were generated by the students themselves. For each chapter, the course team created a top-level “thread” in the discussion forum, and the class was usually broken into thirds (based on username or birthdate) to create smaller communities, which is a common MOOC practice (Baek and Shore 2016). The top post in these course threads provided a question about the most recent chapter, and students were encouraged to participate. Students started threads for two reasons. The first was to generate new topics of conversation. The second was that students sometimes tried to reply to a teaching-team thread, but accidentally created a new thread. The distributions in forum activity across these thread types were non-overlapping. Compared to student threads, course threads on average garnered far more replies (76.1 vs. 0.3), comments (40.7 vs. 0.2) and upvotes (9.9 vs. 0.1). The counts for student-generated threads were not much higher when we exclude all the orphaned threads that received zero comments or replies - specifically, even the subset of threads that had any activity still received only 1.5 replies, 0.9 comments, and 0.3 upvotes per thread, on average.

Thread Contents

The forum threads could also be categorized in terms of their function and content. Some serve an administrative role, such as having students introduce themselves, making announcements or providing feedback from the course staff, or gathering logistical or technical questions about the course platform. The vast majority of threads focused on the content of the course itself. But even then, some threads touched on topics that were more political in nature, while others focused on less controversial topics. Though the raw activity levels were similar across thread types, we were particularly interested in how political ideology affects more- and less-politically-salient threads.

All threads (student- and course-generated) were grouped based on content into three categories. First, administrative threads were partitioned by the authors, and set aside. Second, content threads were divided into high and low political salience, according to the presence of a salient issue mentioned in the question posted by the course team to start the thread. These were coded by a research assistant, and confirmed by the authors’ own readings of the post contents. In Saving Schools, this coding captured the presence of themes in U.S. educational policy that were described in the course as controversial or worthy of further policy discussion, such as high stakes testing, the No Child Left Behind Act (NCLB), the Common Core, teachers’ unions, and charter schools, including the policies used in the ideology questions above. In American Government, this coding captured the presence of any of the twelve issues identified as politically controversial in the Cooperative Congressional Election Study (Schaffner and Ansolabehere 2015), such as abortion, gun control, or tax rates. In both courses, the remaining threads were focused on conceptual or comprehension questions about educational and political institutions, and did not fall into an issue category. Accordingly, all threads that received an issue code were identified as having high partisan salience, while all threads that did not receive an issue code (and were not previously labeled as administrative) were identified as low partisan salience.

Ideology and Forum Participation

Of all the students who posted to the forums at least once, 39% (i) were from the U.S. and (ii) answered the ideology scale in the pre-course survey. These students accounted for 42% of forum activity in the focal (i.e. non-administrative, course-created) threads. We did not try to infer the ideology of students who did not report their ideology. So the students that met all of these criteria formed the effective sample for all the following analyses of how partisanship affects forum participation.

Our second research question was whether students across the ideological spectrum were participating at similar rates. We tested this by comparing the rates of the three main forum activities (replying, commenting, and upvoting) across the entire course, among the U.S. students who reported their ideology. This analysis found essentially no relationship between political ideology and total forum activity (rank order correlation: rτ = −.019), and this result was consistent when we looked separately at replies (rτ = −.024), comments (rτ = −.015), and upvotes (rτ = .007). These relationships between ideology and forum activities are also plotted in Fig. 2 at the person level, separated by activity type.
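A rank-order correlation of this kind can be computed directly from pairs of (ideology, activity count) observations. The sketch below implements Kendall's tau-b, which adjusts for the many ties that arise with count data. Treating the reported rτ as tau-b is an assumption made for illustration, since the exact variant is not specified here, and the function name is hypothetical.

```python
from itertools import combinations
from math import sqrt

def kendall_tau_b(x, y):
    """Kendall's tau-b: (C - D) / sqrt((C + D + Tx) * (C + D + Ty)),
    where Tx and Ty count pairs tied only on x or only on y."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied on both variables: contributes to neither term
        elif dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    return (concordant - discordant) / denom if denom else float("nan")
```

The pairwise loop is quadratic in the number of students, which is adequate for samples of a few thousand. A value near zero, as reported above, means that knowing a student's ideology says almost nothing about how active that student is.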

Fig. 2 Ideology and forum activity in substantive course threads

It is possible that this diversity of activity at the course level might not translate to the thread level. That is, it is possible that liberals and conservatives participated equally overall, but did so within distinct sets of ideologically segregated threads. To investigate this, we calculated the average ideology of the posters (comments and replies) in every substantive course thread. These averages are plotted in Fig. 3 - the x axis represents the average ideology of the posters (with zero being equal balance between liberals and conservatives), and the y axis represents the effective sample size of each thread (i.e. the number of posts for which the poster’s ideology is known). The results show that almost all the threads in both classes had a balanced ideological contribution, relative to the distribution of individual posters. There are some conservative-leaning threads, but they are small (under fifty posts) and limited to a small number of partisan topics. This result provides further evidence that these courses were rich in opportunities for students of different ideologies to interact with one another. For students to build “bridges” in online courses, it is first necessary that forum threads include a range of political perspectives, and this condition seems to hold in our data.
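The thread-level summary amounts to grouping posts by thread, dropping posts whose author has no ideology score, and computing a mean, a count, and a standard error for each thread. The following sketch assumes a hypothetical input format of (thread id, ideology) pairs and uses the sample standard deviation for the standard error; neither detail is taken from the original analysis.

```python
from collections import defaultdict
from math import sqrt
from statistics import mean, stdev

def thread_summary(posts):
    """posts: iterable of (thread_id, ideology) pairs, ideology is None when unknown.
    Returns thread_id -> {'n': effective sample size, 'mean': ..., 'se': ...}."""
    by_thread = defaultdict(list)
    for thread_id, ideology in posts:
        if ideology is not None:  # only posts with a known ideology count
            by_thread[thread_id].append(ideology)
    summary = {}
    for thread_id, scores in by_thread.items():
        n = len(scores)
        # The standard error is undefined for a single post.
        se = stdev(scores) / sqrt(n) if n > 1 else float("nan")
        summary[thread_id] = {"n": n, "mean": mean(scores), "se": se}
    return summary
```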

Fig. 3 Ideological distribution of posters in substantive course threads. The y axis shows the effective sample size (i.e. posts for which ideology is known) and the x axis shows the average ideology among those posts. Error bars +/− 1 SE

Ideology and Forum Interactions

These classes contain a diversity of viewpoints. But do students actually engage with their ideological opposites? We investigated the role of students’ ideology in student-to-student forum interactions, specifically how other students respond to the replies at the top of each thread. There are two primary kinds of interaction - students can either upvote the replies directly, or they can write comments to the thread. Across both courses, we observe 400 reply-upvote pairs and 2,914 reply-comment pairs in which (a) both students were from the U.S., (b) both students answered the ideology scale, and (c) the pair occurred in a course-team-generated thread focused on course content. The analyses below focus on this sample of forum interactions.


Upvoting was uncommon, and only 10% of the replies in the focal sample received an upvote, for an average of 8.3 per thread. In all our analyses we remove all upvotes that students gave to themselves, since this could not reflect engagement with differing perspectives. We also dropped all upvote events that were immediately followed by an “unvote” event, since that was most likely the result of a mis-click rather than a true endorsement of the post.
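The two cleaning rules can be expressed as a single pass over a chronological event log. The event format below (dictionaries with 'voter', 'reply_id', 'reply_author', and 'action' keys) is hypothetical. As a simplifying assumption, the sketch lets an unvote cancel the same voter's standing upvote on the same reply without checking how much time passed between the two events.

```python
def clean_upvotes(events):
    """events: chronologically ordered dicts with keys 'voter', 'reply_id',
    'reply_author', and 'action' ('upvote' or 'unvote').
    Returns the upvote events that survive both cleaning rules."""
    standing = {}  # (voter, reply_id) -> position of the standing upvote in `kept`
    kept = []
    for event in events:
        key = (event["voter"], event["reply_id"])
        if event["action"] == "upvote":
            if event["voter"] == event["reply_author"]:
                continue  # rule 1: drop self-upvotes
            standing[key] = len(kept)
            kept.append(event)
        elif event["action"] == "unvote" and key in standing:
            kept[standing.pop(key)] = None  # rule 2: retract the cancelled upvote
    return [e for e in kept if e is not None]
```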

If ideology had no impact on upvoting behavior, we would expect to see no correlation between the upvoters’ ideology and the original poster’s ideology. However, we do see some evidence of ideology-based assortativity in upvoting. In particular, American Government posters were more likely to gather upvotes from people with shared ideology than not (rank-order correlation: rτ = .121, z(213) = 2.0, p < .05), though this relationship was not apparent in Saving Schools (rτ = −.019, z(155) = 0.43, p = .669). In both classes we did not find moderation by thread type - that is, the assortativity of upvotes was constant within each class across substantive and partisan threads. We visualize these relationships in Fig. 4. For all focal threads in each course, we divide all identified upvote-reply pairs according to the tripartite classification (i.e. liberal, moderate, and conservative) of both the upvoter and the original poster. This produces nine ideological pairings (3 × 3) for each course. These results suggest that the ideological sorting in American Government is stronger among conservatives, who upvote other conservative posts more than liberal posts, unlike liberals, who more evenly spread their upvotes across the ideological spectrum. We are reluctant to interpret the differences between classes, or between liberals and conservatives, since there are perhaps many class design choices that could affect these distributions.
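The nine pairings can be tabulated by counting (upvoter group, poster group) pairs. The sketch below also converts each upvoter group's row into shares, which is one plausible way to compare how liberals, moderates, and conservatives spread their upvotes. The function names are hypothetical, and the normalization is an assumption that need not match the quantity plotted in Fig. 4.

```python
from collections import Counter

GROUPS = ("liberal", "moderate", "conservative")

def pairing_table(pairs):
    """pairs: iterable of (upvoter_group, poster_group) tuples.
    Returns a nested dict of counts covering all nine pairings."""
    counts = Counter(pairs)
    return {up: {po: counts[(up, po)] for po in GROUPS} for up in GROUPS}

def row_shares(table):
    """Share of each upvoter group's upvotes going to each poster group."""
    shares = {}
    for up, row in table.items():
        total = sum(row.values())
        shares[up] = {po: (c / total if total else 0.0) for po, c in row.items()}
    return shares
```

Under no assortativity, each row of shares would resemble the overall distribution of poster groups. Assortativity appears as extra weight on the diagonal.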

Fig. 4 Distribution of upvotes per reply in substantive course threads


Commenting was more frequent than upvoting. The average thread had 45 comments, and 17% of all replies received at least one comment. Among these, we exclude students who comment on their own replies, and students without ideology scores, to focus on the 2,587 reply-comment pairs that allow us to measure ideological influences on forum behavior.

Comments were not evenly distributed across replies - 83% of replies received zero comments, and most of the others received only a single comment. However, some replies spawned comment sections that were much longer. Much of this heterogeneity was simply due to timing. Longer threads mostly sprang from the first or earliest replies, while replies posted later on were typically ignored (most likely due to the design of the interface, which displayed earlier posts more prominently). Furthermore, these longer threads often included commenters interacting with one another. These complexities could affect our ability to measure ideology-driven connectivity between the main reply and individual commenters, especially later commenters. Accordingly, we also analyze ideological assortativity among only the first comments for each reply, as a robustness check.

The distribution of comment length was also uneven. Specifically, we noticed that many comments were disproportionately short, and this was especially true in the American Government threads - 34% of all comments in that course were under four words long, while that was true of only 2% of comments in Saving Schools (and less than 1% of other posts). Upon closer inspection, almost all of these short comments were simple content-free agreement (e.g. “well said”, “I agree”, “good points”, and so on). We suspect that many of these were written so that students could claim participation credit. This credit-seeking might also explain why a small fraction of posters were unaware of the topic of discussion (and in some cases, obviously plagiarized).

To filter these out, we recruited a team of six human coders to go through the course-created threads and, using Discourse, manually label each reply and comment as either (a) substantive, (b) a short yes, or (c) off-topic. Each thread was assigned to at least two independent coders, who each labelled every post in the thread, following the same order in which the posts were originally written. Any disagreements in the labels given by the first two coders were resolved by an independent third coder. Using the final labels, we removed the off-topic posts (0.5% of all posts), and separately counted the substantive comments (64% of on-topic posts) and short yes comments (36% of on-topic comments) as distinct forms of interaction between reply poster and comment poster.

In Table 6, we report rank-order correlations (with 95% confidence intervals) of ideology-based sorting among different subsets of reply-comment pairs culled from the course-created content threads. Following the analyses of reply-upvote pairs, these tests are all non-parametric correlations between the original posters’ ideology and their commenters’ ideology. Here again, a positive correlation between poster and commenter would indicate more siloing, on average, while a zero or negative correlation would indicate engagement across difference. In general, we find little evidence for ideology-based sorting in the forum comments. That is, most of these correlations are not significantly different from zero, which suggests that partisanship has little aggregate effect on how commenters sort themselves among various replies. In particular, the high partisan salience threads in both classes contain a range of post-comment pairings, while the low partisan salience threads may have some modest ideological sorting among their substantive comments.
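As an illustration of the assortativity test, the sketch below computes a rank-order correlation between posters' and commenters' ideology scores. It follows the Table 6 description in using a Spearman correlation (a Pearson correlation on average ranks, so that tied scores are handled); the function names are hypothetical, confidence intervals are omitted, and this is not the code used in the study.

```python
import math

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-indexed
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def spearman(poster_ideology, commenter_ideology):
    """Positive values indicate siloing; zero or negative, mixing."""
    return pearson(average_ranks(poster_ideology),
                   average_ranks(commenter_ideology))
```

Each element of the two input lists corresponds to one reply-comment pair, so a poster who receives several comments contributes several pairs.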

Table 6 Ideological sorting in reply-comment pairs, by comment type. Values reflect Spearman correlations between poster's and commenter's ideology scores, with 95% confidence intervals

Partisan assortativity can also be represented graphically, as in Fig. 5. Here we display the percentage of substantive comments (i.e. the first column of Table 6) as divided by the tripartite ideology of the poster and the commenter. We divide these results by the partisanship of the thread in which each poster-commenter pair appears, and in general they agree with the analyses presented in Table 6. Posters often receive comments from students with divergent ideology. In particular, threads on subjects with high partisan salience seem to induce more interactions between students with different views of the world. These results are consistent with our hypothesis that MOOCs might provide a meaningful space for interactions between people with opposing views, rather than providing yet another echo chamber on the Internet.

Fig. 5 Distribution of comments per reply in substantive course threads

Forum Language

In this section, we consider our final research question: how does ideology influence the language students use? The results so far have relied on tracking data, to simply show that partisan opponents interact in the discussion forums. However, tracking data cannot tell us the nature of that interaction. Do they diverge along partisan lines and talk past one another? Or do they converge on a shared language? To answer these questions we turn to the contents of the forum posts.

Descriptive Statistics

After removing students who were not from the US and students who did not answer the ideology questions, the course-generated non-administrative threads in the two course forums included 4,516 posts (Saving Schools) and 5,452 posts (American Government), where posts comprise replies and substantive comments. The posts in this dataset provided us with the opportunity to model the language that partisans use in discussion with one another. The distribution of the length of these posts is given in Fig. 6. The average post across both courses was 125 words long (SD = 116). Main replies were somewhat longer, on average (m = 142, SD = 125), than the substantive comments they received (m = 91, SD = 89).

In our analyses of partisan language to follow, we collapse the thread structure to consider each post as an independent event. This removes from analysis all information about the order of posts, whether a post was a comment or a reply, and to which reply each comment was directed. We do, however, account for some thread-level information by considering how content systematically varies across course chapters. Each thread used here was posted for students to specifically discuss one chapter from their course, and included a chapter-related content question from the course administrators as the top-level post. There were 25 chapters in Saving Schools, and 24 chapters in American Government, and each chapter received 2–6 threads (depending on whether the threads were subdivided based on username, or whether the entire class was funneled into the same thread). We expected (and confirmed) that there would be differences in word use between the threads of each chapter that were orthogonal to the partisanship of individual posters, and that might cloud our ability to detect generic markers of partisanship. In both the classification results and topic modeling results below, we pre-process the text features to remove those that were particularly concentrated in only one chapter of a course (e.g. “cognitive skills” and “merit pay” in Saving Schools, or “equal protection clause” and “invisible primary” in American Government) so that the algorithm can prioritize the detection of partisan-leaning features that generalize across many threads and discussions.

Fig. 6 Distribution of word count per post in the focal set of course threads

Partisan Distinctiveness

To construct an initial test for the existence of a linguistic partisan divide, we treated the forum data from each course as a standard supervised learning problem (Jurafsky and Martin 2009; Grimmer and Stewart 2013). Specifically, we first extracted a wide set of features from the text of the forum posts (as above, the data from the two classes were kept separate). We then used those feature counts as the inputs into penalized linear regression algorithms, each of which estimated a model that could best predict each poster’s self-reported ideology (Groseclose and Milyo 2005; Gentzkow and Shapiro 2010). If opposed students were simply talking past one another, we would expect the language they use to reliably reveal their ideology. That is, we would expect it to be relatively easy to detect a person’s ideology, because they would process the course material through a biased partisan filter. On the other hand, if students were converging on a collaborative discourse, we would expect few linguistic markers of partisanship in the students’ posts.

Feature Extraction

We followed a typical “bag of words” approach, in which we simply counted the most common words and phrases from each post, removing all information about the order in which those words and phrases appear. The text from each post was parsed according to the following steps (Feinerer et al. 2008; Benoit and Paul 2017). In order, the text was converted to lowercase; then contractions were expanded; then punctuation was removed. Common stop words (“and”, “the”, and so on) were also dropped. The remaining words were stemmed using the standard Porter stemmer, and then grouped into “ngrams” - groups of one, two, or three sequential word stems. For example, “state and local government” would be parsed into six stemmed ngrams [“state”, “local”, “govern”, “state local”, “local govern”, and “state local govern”].
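The parsing steps above can be sketched as follows. This is a simplified illustration rather than the pipeline used in the study: the stop word list is a tiny placeholder, contraction expansion is skipped, and `toy_stem` is a crude, hypothetical stand-in for the Porter stemmer (it strips only a few suffixes), chosen so that the example needs no external libraries.

```python
import re

# Tiny illustrative stop word list (an assumption, not the list used in the paper).
STOP_WORDS = {"and", "the", "a", "an", "of", "to", "in"}

def toy_stem(word):
    """Crude stand-in for the Porter stemmer: strips a few common suffixes."""
    for suffix in ("ment", "ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def extract_ngrams(text, max_n=3, stem=toy_stem):
    """Lowercase, strip punctuation, drop stop words, stem, and build ngrams."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # punctuation and digits become spaces
    stems = [stem(w) for w in text.split() if w not in STOP_WORDS]
    ngrams = []
    for n in range(1, max_n + 1):
        for i in range(len(stems) - n + 1):
            ngrams.append(" ".join(stems[i:i + n]))
    return ngrams

print(extract_ngrams("State and local government"))
```

With these placeholder components, the phrase from the example in the text yields the same six stemmed ngrams. A real analysis would pass a proper Porter stemmer as the `stem` argument.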

To focus on the most prevalent features, ngrams which appeared in fewer than 1% of all posts were excluded. Additionally, any ngram which was concentrated only in one particular chapter (i.e. 80% of all occurrences are in a single chapter) was also dropped - this was true of 24 ngrams in Saving Schools, and 90 ngrams in American Government. The end result of this process was a “feature count matrix”, in which each post was assigned a row, while each ngram feature was assigned a column, and the value of each cell represented the number of times that ngram appeared in that post. Both classes provided rich vocabularies, with 1,192 ngrams in Saving Schools and 983 ngrams in American Government. However, this dataset is sparse – specifically, 96% of the cells are zero, since most posts only included a few of the full set of ngrams.
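A minimal sketch of the two filters and the resulting feature count matrix is given below. The function name and input format are hypothetical, and the chapter filter is implemented as "at least 80% of occurrences in one chapter", which is one reading of the rule described above.

```python
from collections import Counter, defaultdict

def build_feature_matrix(posts, chapters,
                         min_doc_share=0.01, max_chapter_share=0.80):
    """posts: one list of ngrams per post; chapters: one chapter label per post.

    Returns (vocab, matrix), where matrix[i][j] counts vocab[j] in post i.
    """
    doc_freq = Counter()                   # number of posts containing each ngram
    chapter_counts = defaultdict(Counter)  # occurrences of each ngram per chapter
    per_post = []
    for ngrams, chapter in zip(posts, chapters):
        counts = Counter(ngrams)
        per_post.append(counts)
        doc_freq.update(counts.keys())
        for gram, count in counts.items():
            chapter_counts[gram][chapter] += count

    vocab = []
    for gram in sorted(doc_freq):
        # Filter 1: drop ngrams appearing in too few posts.
        if doc_freq[gram] / len(posts) < min_doc_share:
            continue
        # Filter 2: drop ngrams concentrated in a single chapter.
        total = sum(chapter_counts[gram].values())
        if max(chapter_counts[gram].values()) / total >= max_chapter_share:
            continue
        vocab.append(gram)

    matrix = [[counts[gram] for gram in vocab] for counts in per_post]
    return vocab, matrix
```

Because most cells are zero, a production version would normally store the matrix in a sparse format; a dense list of lists is used here only to keep the sketch self-contained.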

Model Estimation

These steps processed the unstructured text into a high-dimensional set of features, and we needed to determine how those features could be chosen and weighted to best distinguish the writers’ ideology. We followed a bottom-up approach, using a common method, the LASSO, implemented in the glmnet package in R (Tibshirani 1996; Friedman et al. 2010). This algorithm estimates a linear regression, and shrinks the effective feature space by imposing a constraint on the total absolute size of the coefficients across all features. The size of that constraint is determined empirically, by minimizing out-of-sample error via cross-validation within the training set. This process reduces many coefficients in the regression to exactly zero, leaving a smaller set with non-zero coefficients in the model.
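To make the shrinkage step concrete, the sketch below solves the LASSO in its standard penalized form, minimizing (1/2n)·||y − Xβ||² + λ·||β||₁, by cyclic coordinate descent. It is not the glmnet implementation used in the study: the function names are hypothetical, there is no intercept (the outcome is assumed to be centered), and the penalty `lam` is supplied by the caller rather than chosen by cross-validation.

```python
def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold; values inside the band become 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def lasso_coordinate_descent(X, y, lam, n_iter=200):
    """X: list of rows (posts) of feature counts; y: centered ideology scores."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    residual = list(y)  # equals y - X @ beta while beta is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # feature never occurs; coefficient stays at zero
            # Covariance of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * beta[j])
                      for i in range(n)) / n
            new_beta = soft_threshold(rho, lam) / col_sq[j]
            delta = new_beta - beta[j]
            if delta != 0.0:
                for i in range(n):
                    residual[i] -= X[i][j] * delta
                beta[j] = new_beta
    return beta
```

The soft-thresholding step is what sets weakly predictive coefficients to exactly zero, which is the feature-selection behavior described above.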

To estimate the algorithm’s accuracy, we used a nested cross-validation procedure (Stone 1974; Varma and Simon 2006). The entire dataset was randomly divided into ten folds of equal size. To produce out-of-sample predictions for each fold, a classification model was trained and tuned on the other nine folds, and applied directly to the held-out data to predict the ideology of those posts. To smooth out the random fluctuation across folds, we performed this whole procedure five times, and averaged across all five predictions to determine a final predicted partisanship for each post. The predicted partisanship of all posts in both classes is plotted against the actual partisanship of the author in Fig. 7. We also fit a loess regression, which is shown in the figure (with 95% confidence intervals). These results suggest that there is indeed some predictive distinction that can be made between ideologically opposed students. However, the relationship between predicted and actual partisanship is not especially strong. Additionally, it seems to be somewhat stronger in American Government (rτ = .179, z(5452) = 19, p < .001) than in Saving Schools (rτ = .077, z(4516) = 7.7, p < .001). These results provide evidence for the existence of some modest ideological divisions in the language of forum posters.
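The outer loop of this procedure can be sketched generically. In the illustration below, `fit` and `predict` are hypothetical caller-supplied functions; the inner tuning loop (choosing the penalty by cross-validation within the training folds) is assumed to happen inside `fit`, which is what makes the overall procedure nested. Folds are made as equal in size as the sample allows.

```python
import random

def repeated_kfold_predictions(X, y, fit, predict, k=10, repeats=5, seed=0):
    """Average out-of-sample predictions over several random k-fold splits."""
    rng = random.Random(seed)
    n = len(X)
    totals = [0.0] * n
    for _ in range(repeats):
        order = list(range(n))
        rng.shuffle(order)
        folds = [order[i::k] for i in range(k)]
        for held_out in folds:
            held = set(held_out)
            train_idx = [i for i in order if i not in held]
            # The model never sees the held-out posts during fitting or tuning.
            model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
            for i in held_out:
                totals[i] += predict(model, X[i])
    return [total / repeats for total in totals]
```

Every post receives exactly one out-of-sample prediction per repetition, so dividing by `repeats` gives the averaged prediction that is compared with the author's self-reported ideology.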

Fig. 7 Distribution of predicted versus actual partisanship from language classifier. A loess regression line is also plotted, with 95% CI. All units are in standardized ideology scores, with higher values indicating more conservative posters, and lower values indicating liberal posters

To highlight the features that distinguished partisan posts in this algorithm, we conducted a separate analysis that considered the partisan distinctiveness of each feature individually. Specifically, we calculated two commonly used statistics that capture strength of association - variance-weighted log-odds ratio and mutual information - to evaluate the relationship between feature frequency and ideology in each class (Monroe et al. 2008). In Fig. 8 we plot these two metrics against one another for every word in both classes, which gives a visual representation of the words and phrases that were the most distinctively partisan in our data. The words towards the upper corners of these plots are among the most distinctive. Some of these distinctive words do carry partisan connotations - for example, in American Government liberals were more likely to discuss “corporations”, while conservatives were more likely to discuss the “constitution”. However, the linguistic differences represented here are modest, and do not cleave along familiar partisan lines, for the most part. This provides additional evidence that the weak results of the classifier above represent a property of our data, and not just the limitations of our algorithm. That is, the language of posters does not diverge sharply along partisan lines.
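The two statistics can be illustrated for a simplified two-group comparison (for example, liberal versus conservative posters). The sketch below uses one common form of the variance-weighted log-odds ratio with a uniform smoothing prior, in the spirit of Monroe et al. (2008), and the mutual information between word presence and group membership. The exact variant, prior, and treatment of ideology used in the study are not stated here, so the function names, the `prior` value, and the `vocab_size` default are all assumptions.

```python
import math

def weighted_log_odds(count_a, total_a, count_b, total_b,
                      prior=0.5, vocab_size=1000):
    """z-scored log-odds ratio of a word's use in group A versus group B.

    count_*: occurrences of the word in each group; total_*: all word tokens.
    """
    prior_total = prior * vocab_size
    log_odds_a = math.log((count_a + prior)
                          / (total_a + prior_total - count_a - prior))
    log_odds_b = math.log((count_b + prior)
                          / (total_b + prior_total - count_b - prior))
    delta = log_odds_a - log_odds_b
    variance = 1.0 / (count_a + prior) + 1.0 / (count_b + prior)
    return delta / math.sqrt(variance)

def mutual_information(a_with, a_without, b_with, b_without):
    """Mutual information (in bits) between word presence and group.

    Arguments are post counts: group A with/without the word, then group B.
    """
    n = a_with + a_without + b_with + b_without
    cells = (
        (a_with, a_with + a_without, a_with + b_with),
        (a_without, a_with + a_without, a_without + b_without),
        (b_with, b_with + b_without, a_with + b_with),
        (b_without, b_with + b_without, a_without + b_without),
    )
    total = 0.0
    for joint, group_total, word_total in cells:
        if joint > 0:  # empty cells contribute nothing
            total += (joint / n) * math.log2(joint * n / (group_total * word_total))
    return total
```

The log-odds statistic is signed, so it indicates which group favors a word, while mutual information is non-negative and captures only how informative the word is. This is why plotting one against the other separates direction from strength.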

Fig. 8 Partisan distinctiveness of words in both classes. The variance-weighted log odds ratio of each word is given on the x axis, capturing the direction of partisan distinction in each word. The mutual information of each word is on the y axis, an absolute measure of distinctiveness

Partisan Topics

One limitation of the ngram-level analysis above is that it is too granular. That is, the number of ideas and themes in the forum is much smaller than the number of unique ngrams. Thus, it is possible our ngram-based analysis could miss the effect of partisanship on broader trends in topic use. To examine this, we use a form of unsupervised text analysis from the topic modeling tradition (Blei et al. 2003) called the Structural Topic Model (STM; Roberts et al. 2014; Reich et al. 2015; Roberts et al. 2016a). Topic models are designed to identify sets of words, “topics,” that tend to occur together. This reduces the high-dimensional space of “all words used in the forums” to a more manageable space of recurring common themes, which we can then map onto partisan ideology. The STM estimates the relationship between post metadata (here, whether the post was written by a liberal, moderate, or conservative) and the proportion of the post belonging to a particular topic. For each course we estimated a separate model to evaluate whether particular topics are more likely to be discussed by students from one side of the political ideology scale. Differences in the distribution of topic usage by one partisan group may be evidence of fracturing discussions.

For each class, we processed the text of the posts using the same steps as above, with the exception that we only use unigrams (and not bigrams and trigrams) in the topic model. We estimated a separate 30-topic model for each class, using a spectral initialization procedure (Arora et al. 2013; Roberts et al. 2016b). After the model was estimated we extracted seven representative words from each topic, using the FREX metric (Bischof and Airoldi 2012; Roberts et al. 2014). The resulting topic word lists are given in Tables 7 and 8. We also estimated the partisan lean of each topic, by comparing the difference in prevalence of each topic among liberals and conservatives (as defined by the tripartite metric described above). These estimates are plotted in Fig. 9, and suggest that all partisan differences in topic use amount to less than 2% of all posts. Consistent with the classifier results, the topic model estimates suggest that for the most part, students with conflicting ideology still converge around similar topics.

Table 7 Most distinctive words from topics in Saving Schools
Table 8 Most distinctive words from topics in American Government
Fig. 9 Distribution of estimated change in prevalence of topics between liberal- and conservative-authored posts. Point estimates and 95% confidence intervals are shown

For example, in Saving Schools even topics that one would expect to be politically charged, such as racial achievement gaps (Saving Schools #29) or Common Core (Saving Schools #26), seem to be evenly distributed across the ideological spectrum. The result holds in American Government as well. Topics that are politically divisive, such as the Supreme Court (American Government #10) and national security (American Government #25), are discussed at essentially equal rates by both liberals and conservatives. These results suggest that for many of these politically sensitive topics, students are converging around shared language and concepts.

These topic models also suggest that there were in fact some topics where language diverged across partisan lines. In Saving Schools, one large ideological division was among the topics that focused on teachers. Left-leaning writers focused more often on teachers’ certifications (Saving Schools #3) and time commitments (Saving Schools #1). Right-leaning writers instead focused more often on teachers’ compensation (Saving Schools #22) and school board governance (Saving Schools #18). In American Government, the most distinctive topics for liberals were economic issues, such as international trade (American Government #17) and interest group lobbying (American Government #5). Interestingly, many of the most conservative topics were not substantive, but simply captured the syntactic structures used to state abstract principles (American Government #4) and to delineate disagreement (American Government #29), perhaps because conservative students recognized themselves to be a minority. There was also some modest partisan taunting (Grimmer and King 2011). For example, some right-leaning students in Saving Schools explicitly complained about left-leaning policy makers in education (Saving Schools #11), and left-leaning students in American Government voiced complaints about religious conservatives (American Government #13).

Overall, though, these partisan-leaning topics were rare, and not especially heated based on our own reading of the posts. Instead, the general pattern seemed to reflect an open and diverse conversation that welcomed views from across the political spectrum, and brought opposed students together around common language in much of the discussion forums. For transparency, we selected some representative posts from the topics mentioned in the main text here, and included them in the Appendix.

Linguistic Interaction Style

So far, our analyses have considered the language of each post in isolation, evaluating the contents of each post with respect to the poster’s own ideology. However, this does not reflect the context in which many posts are generated. As Fig. 5 shows, many posts are direct comments on other students’ posts, and these comments come from students across the ideological spectrum. That is, liberals leave comments on posts from both conservatives and liberals, and conservatives’ comments are similarly distributed. These post-comment pairs, then, can span a range of ideological distance - some comments are ideologically close to their parent post (“intra-party” comments) while other comments are ideologically distant from their parent post (“cross-party” comments). Does this ideological distance between comment and parent affect the contents of the comments themselves?

Pair-Level Data

Though commenters were not randomly assigned to parents, our dataset can provide some insight into these interactions. However, our effective sample size is limited because this analysis requires that both the parent post and comment be written by someone in our target sample (from the U.S., answered the ideology survey question, etc.). We also decided to remove pairs that involved a moderate poster to highlight the contrast in ideological distance, leaving only intra-party pairs (i.e. liberal-liberal or conservative-conservative) and cross-party pairs (i.e. liberal-conservative or conservative-liberal). All in all, we found 569 such parent-comment pairs in the substantive course-created threads in Saving Schools, and 297 pairs in American Government. To increase our power, we pooled the courses together, for a total dataset of 866 parent-comment pairs.

Stylistic Measures

Our sample was not large enough to conduct a bottom-up analysis of the words that distinguished ideological distance in the same way that our earlier analyses distinguished liberals and conservatives. Instead we draw on the literature to test whether established markers of linguistic style are more common in intra-party or cross-party pairs. In particular, we focus on three kinds of linguistic styles that might relate to engagement across difference. First, we considered the emotional content of the posts (often called “sentiment analysis”) by tallying the use of positively- and negatively-valenced words in the comments, as defined by a commonly used dictionary (Pennebaker et al. 2007). Second, we considered several markers of linguistic complexity, including: the average word count; the Flesch-Kincaid readability score, a measure of syllable- and sentence-level complexity (Kincaid et al. 1975); and vocabulary depth, as measured by the (reverse-scored) average frequency of the words used in the comment (Brysbaert and New 2009). Third, we considered markers that might reflect accommodation in the comments. One simple marker of accommodation is the presence of hedging language in the comment post (Hübler 1983; Jason 1988). We also explore more complex measures of accommodation, by measuring two kinds of matching in word use between the parent and comment post (Giles et al. 1991; Doyle and Frank 2016). One version measures stylistic matching, by similarity in function word use - broad categories of word classes such as pronouns, negations, quantifiers, and expletives (Ireland et al. 2011). Another version measures semantic matching, by similarity in use of topics, measured by the (reverse-scored) Hellinger distance between the distribution of topics in the parent and comment post (Blei et al. 2003).
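The two matching measures can be sketched as follows. Both functions are illustrations under stated assumptions rather than the code used in the study: the Hellinger distance is reverse-scored here as one minus the distance (the exact reverse-scoring is not specified above), and function word matching uses the per-category formula 1 − |a − b| / (a + b + 0.0001), averaged over categories, which is our understanding of the measure in Ireland et al. (2011).

```python
import math

def hellinger_similarity(topics_parent, topics_comment):
    """Semantic matching: 1 minus the Hellinger distance between two
    topic distributions (each a list of proportions summing to 1)."""
    squared = sum((math.sqrt(p) - math.sqrt(q)) ** 2
                  for p, q in zip(topics_parent, topics_comment))
    return 1.0 - math.sqrt(0.5 * squared)

def function_word_matching(rates_parent, rates_comment):
    """Stylistic matching: average similarity in the usage rate of each
    function word category (e.g. pronouns, negations, quantifiers)."""
    scores = [1.0 - abs(a - b) / (a + b + 0.0001)
              for a, b in zip(rates_parent, rates_comment)]
    return sum(scores) / len(scores)
```

Both scores reach 1 when the parent and comment are identical on the measure, and the Hellinger-based score reaches 0 when the two posts share no topics at all.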


For each of the measures listed above, we calculated a value for each parent-comment pair in our data. However, our dataset was somewhat imbalanced, in that liberal commenters were somewhat over-represented in the cross-party pairs, relative to the intra-party pairs. Conceptually we were most interested in the relationship between these linguistic style markers and each comment’s ideological distance from its parent, holding constant the raw ideology of the commenter. To test this we conducted weighted regressions, with the weight on each observation inversely proportional to the prevalence of the ideological configuration of the parent-comment pair (i.e. liberal-liberal; liberal-conservative; conservative-liberal; conservative-conservative). This put equal weight on each of the four configurations in our estimates, eliminating any mechanical correlation between ideology and ideological distance.
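The weighting scheme can be illustrated with a short sketch. With a single binary predictor and an intercept, the weighted least-squares slope equals the difference between the weighted group means, so the example computes that difference directly. The configuration labels ("LL", "LC", "CL", "CC") and function names are hypothetical, and standard errors are omitted.

```python
from collections import Counter

def inverse_prevalence_weights(configs):
    """Weight each pair by 1 / (number of pairs sharing its configuration)."""
    counts = Counter(configs)
    return [1.0 / counts[c] for c in configs]

def weighted_cross_party_gap(outcomes, configs, cross_configs=("LC", "CL")):
    """Weighted mean outcome in cross-party pairs minus intra-party pairs."""
    weights = inverse_prevalence_weights(configs)
    sums = {True: 0.0, False: 0.0}
    totals = {True: 0.0, False: 0.0}
    for value, config, weight in zip(outcomes, configs, weights):
        is_cross = config in cross_configs
        sums[is_cross] += weight * value
        totals[is_cross] += weight
    return sums[True] / totals[True] - sums[False] / totals[False]
```

Because each configuration's weights sum to one, an over-represented configuration (for example, liberal-liberal pairs) cannot dominate the intra-party mean.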

We summarize these estimates in Table 9, by reporting the results of a series of regressions that included the binary intra- vs. cross-party pair variable as a predictor, and each (standardized) linguistic style measure as an outcome. We do see a highly significant relationship between ideological distance and word count - that is, commenters who were ideologically opposed to the writer of the parent post wrote shorter comments, on average. There was also a modest trend in function word matching, which was more common among intra-party commenters. These results are not, by themselves, definitive evidence for engagement, because dictionary-based methods can in some cases be context dependent, and unreliable in small samples. However, these results corroborate the main conclusions of the other analyses here, and suggest that the conversations held in these MOOC forums offer a rich, engaging source of civic discourse for students.

Table 9 Estimated relationship between measures of linguistic style in comments and ideological distance from their parent posts. Positive coefficients indicate higher levels in comments from cross-party pairs, compared to intra-party pairs. (* = p < .1; ** = p < .05; *** = p < .01)

General Discussion

Open online courses attract a wide diversity of students, and this diversity can potentially serve many of the institutional goals of MOOCs. In this paper we present results that point to a neglected benefit of this diversity: these courses may allow students to bridge political differences and interact with their ideological opponents. Contrary to the concerns of observers that the internet has become a place of echo chambers and silos (Sunstein 2017), we find evidence that, at least in these two examples, online courses are a space in which people with different political opinions can learn and engage together.

We found that the student body contained participants with diverse policy preferences. Only a subset of participants chose to engage in online forum discussions, but the subset that did so had a range of political ideologies. Within forums, we found that most threads contained a balanced proportion of liberal and conservative posters, and that liberals and conservatives directly responded to each other’s posts. We argue that these online courses present a case study where at least the pre-conditions of deliberative discourse appear to be met.

Additionally, text analysis of student forum posts suggests that students with different political beliefs tend to discuss many similar topics in roughly equal proportion. However, we still found evidence of some partisan division in some of the topics they discussed. In particular, discussions did seem to diverge more around issues related to teachers’ contracts in Saving Schools, and economic issues in American Government. These divisions were modest, however, and combined with the tracking data, these results suggest that, generally speaking, students did not segregate themselves within rhetorical frames that inhibited meaningful discussion. Finally, we found that the linguistic style of comments was not meaningfully different between those that replied to intra-party posts, and those that replied to cross-party posts.

These results fit into a broader research framework driven by two categorizations that could be applied to any online community that attracts diverse participation. These categories form a 2 × 2 matrix that maps onto our latter two research questions, and summarizes the ways in which political differences might affect online discussion forums, shown in Fig. 10. The bottom left quadrant describes forums where people with different political beliefs separate into silos and use different language; these are the echo chambers of Internet discourse. The top left quadrant describes integrated threads in which partisans use different language; these are spaces where students with different beliefs talk past one another. In the bottom right quadrant, students discuss topics using a shared language, but they divide themselves into conversational silos with like-minded others. In the top right quadrant is the ideal condition of “deliberative discourse”, where people with diverse beliefs converse together, using a common vocabulary. Here we focus primarily on describing and measuring these possible categories of discussions. In future work we hope to understand what causes these different categories to emerge, and what might be done to promote deliberative discourse, at the expense of the other types of discussions.

Fig. 10 Two-by-two schematic of dimensions of engagement in online discussion forums

One clear limitation of these results is that we can only analyze the content of MOOC discussions among students who endogenously choose to enroll. This means that we cannot determine whether the generally positive intellectual climate in these discussion forums is inherent to the MOOC format, or whether these MOOCs attract a particularly open-minded and reasonable student body (or some combination thereof). It is hard in our sample to detect the extent of selection effects, in part because the individual-level determinants of civil discourse may be unobservable. Furthermore, we do not at present have comparable data from some other reference population. Intuitively, one could easily imagine situations in which MOOC enrollment is positively correlated with some latent propensity to be civil to one’s ideological opponents. On the other hand, one could also imagine situations in which even well-intentioned students might fail to bridge across partisan divides. We cannot conclude whether the MOOCs’ student selection or the MOOCs themselves are independently necessary to produce constructive civic discourse. However, our results suggest that these two factors are jointly sufficient to produce constructive civic discourse.

Another important limitation of this current work is that we lack a “ground truth” measure of partisanship with which to calibrate our measures of engagement across political difference. For instance, our analysis of the forum text suggests some (but not total) partisan division in language use. Are these divisions large or small? Though we try to provide some context, ultimately we do not have a perfect benchmark. Additionally, we have presented analyses that draw from many established approaches to analyzing open-ended text, and all of these approaches rely on simplifications and assumptions to quantify the unstructured data from the forums. However, these are not exhaustive, and the results we present here do not preclude the possibility of other linguistic differences that might be better captured by other modeling choices or feature sets. We hope that our research is a starting point to spur new work, as other datasets and language models are adopted in the scientific community.

One important advantage of the methods we use here is that they are generalizable across contexts. That is, the same analytical framework could be applied in other settings where the structure and contents of discussions are tracked, including settings in which other dimensions of interest (such as gender or ethnicity) are the focus of diversity efforts. In parallel with this basic research, our future efforts include building a dashboard that incorporates these analyses in a standard suite of tools. This dashboard could be used by administrators to monitor engagement across difference (or lack thereof) in their own class discussion forums, informing classroom policy in real time. We also hope that in future research, we might use experimental interventions explicitly designed to increase engagement across political differences, and evaluate how the measures we describe here respond to those interventions.

Although many online discussion spaces tend towards partisan division, our results suggest that MOOCs stand out from that trend, and can provide a space where students’ exposure to divergent perspectives can be enriched. Ultimately, our hope is that greater research and attention to non-cognitive and civic outcomes in MOOCs can broaden the conversation about the purposes of open online learning. Historically, public education has served the purpose of developing young people not only for professions but also for their roles as citizens in civil society. MOOC research should engage with questions as broad as our hopes for higher education.