Introduction

The global COVID-19 pandemic significantly impacted all levels of education. In response to quarantine orders, universities and colleges shut down campuses and shifted to fully remote instruction to sustain teaching and learning activities (Strielkowski, 2020; Sun et al., 2020). Navigating learning in this unpredictable time was challenging. Educators and learners had to deal with the stress and anxiety induced by the health crisis while striving to engage in online courses and navigate digital learning systems (Pokhrel & Chhetri, 2021). The lack of interaction and emotional support under physical isolation can expose students to greater challenges in online learning than in regular times (Elmer et al., 2020). This underscores the need to understand what changes took place in the online learning environment that might have influenced learner experience during the COVID-19 pandemic (Mishra et al., 2020). Studies have shown that the contingency shift to remote learning also promoted the digital revolution in higher education and spurred creative solutions for cultivating an interactive and engaging virtual learning environment (Strielkowski, 2020; Pokhrel & Chhetri, 2021).

Social and cognitive engagement are important factors in online learning (Ouyang & Chang, 2019). Online discussion forums play a critical role in enabling communication and collaboration in remote settings and self-paced asynchronous courses. They promote student-centric learning and peer interactions on specific topics with instructor guidance (Desai et al., 2021). Studies on Massive Open Online Courses (MOOCs) have found that discussion forums increase social presence and peer interactions, which enhance learners’ psychological experience and learning outcomes. While discussion forums are an integral part of MOOCs for question answering and creating classroom communities because of the large student-to-teacher ratio, at accredited universities the use of discussion forums has remained optional and is largely determined by instructors. In previous years, although learning management systems (LMS) were widely adopted by higher education institutions for managing course materials, the discussion forum remained a supplementary component to in-class lectures and discussions. One potential incentive for instructors to adopt discussion forums during remote instruction is that, because discussion forums are already embedded in the LMS, they require less technological competency than other social facilitation tools. Studies suggest that discussion forums offer opportunities for information sharing and for developing learners’ soft skills (Suryaningsih, 2021), and support online assignments and questions (Mustadi et al., 2021). While there are reasons to believe that discussion forums could foster social connectedness and a learning community during the COVID-19 pandemic, little is known about the specific content of the discourse and about how learners’ social and cognitive engagement in discussion forums changed from before to after the contingency shift.

The goal of this study is to explore the emergent changes in student engagement in online discussion forums as a consequence of the COVID-19 pandemic and fully remote instruction. Specifically, we aim to first provide a high-level view of how learners' participation in online discussion, characterized by social and cognitive presence, has changed, and then complement it with evidence of what topics emerged and whether students talked about the pandemic in this formal learning environment. The findings aim to shed light on how teachers and students responded to a contemporary event that impacted teaching and learning, and how educational technology spaces such as discussion forums afforded these adaptations. This study utilizes educational data mining techniques to harvest insights on teaching and learning at scale in a non-intrusive manner. We examine the topics and linguistic characteristics of students' posts in the discussion forum within a subset of classes that consistently used discussion forums, to control for course characteristics. The novelty of this study, compared to previous exploratory work on learner engagement in discussion forums, is our attempt to contextualize changes in learner interactions and examine emergent discourse attributed to the impact of the COVID-19 pandemic. The contribution of this work is threefold. First, we show how emergent machine learning methods can help us assess and monitor learning, and how educational technology spaces such as discussion forums afford different learning needs. As a methodological contribution, we discuss the potential of a non-intrusive way, compared to survey and interview measures, to sample learning experience and examine the impact of COVID-19 on learning behavior at scale. Second, we shed light on social, cognitive, and affective aspects of learning at scale, offering both quantitative and qualitative evidence to demonstrate not only what students talked about, but how they interacted with their peers. Lastly, we discuss the implications of the study for teaching and learning in higher education beyond the COVID era.

The remaining sections are structured as follows. We first provide an overview of related work in educational data mining, where researchers leverage large-scale educational discourse to gain insights into learning behavior. Then, we provide an account of research on engagement in online discussion forums and identify constructs that are relevant for contextualizing teaching and learning in such environments. We describe the methods we used to quantify the linguistic characteristics of learner discourse and to extract topics relevant to COVID-19 within discussion forums during the early stages of remote learning. In the results section, we illustrate the change in sentiment during the academic year as well as the topics and sentiment characteristics associated with discourse around COVID-19. We present further interpretation of the results and implications in the discussion section.

Related Work

Educational Discourse and Text Mining

According to Vygotsky’s (1978) sociocultural learning theory, learners bring their personal experiences into learning as they interact and build social relationships with others. From the psycholinguistic perspective, lexical expressions in writing and conversations disclose subtleties about people’s internal thoughts, attitudes, and emotions (Pennebaker et al., 2003; Tausczik & Pennebaker, 2010). The value of peer interaction in discussion forums is well documented (Gilbert & Dabbagh, 2005; Ziegler et al., 2014). Asynchronous online discussions promote articulation, reflection, social negotiation, and meaningful discourse that demonstrates critical thinking skills by relating course content to prior knowledge and experience in web-based or online learning environments. Language not only provides a window into learners’ cognitive processes but also characterizes the quality of classroom interactions (Bransford et al., 1999; Cazden, 1988). As the amount of student data generated online increased substantially during COVID-19, manually inspecting discussion content and its linguistic properties is no longer viable.

In the context of education, applied computational techniques are valuable tools for detecting changes in the student population in response to the COVID-19 pandemic. A growing body of research has centered on mining educational data to generate information that drives decision making in instructional activities (Romero & Ventura, 2013). Emergent subcommunities such as learning analytics, educational data mining (EDM), artificial intelligence in education (AIED), and learning at scale (L@S) combine interdisciplinary knowledge to enhance our understanding of learner behavior and learner experience. In a time of crisis and sweeping changes in higher education, being able to gather timely insights is particularly crucial for helping teachers and students develop appropriate teaching and learning strategies. Computational techniques provide an effective and efficient means to do so. For instance, student discourse provides critical information about learner experience and engagement, and computerized text analysis allows us to delve into learner-generated text both in depth and at scale.

To tackle the challenge of large unstructured educational text, natural language processing (NLP) techniques, such as sentiment analysis and topic modeling, offer a means to process and extract patterns from large-scale data (Romero & Ventura, 2017). The application of NLP in social media studies and business product reviews is particularly prevalent as an effective way to understand public discourse and sentiment at scale (Xue et al., 2020). In the field of education, a plethora of research has also demonstrated the benefits of text mining in addressing educational questions (Dowell & Kovanovic, 2022; Lemay et al., 2021). For instance, studies have applied topic modeling to understand people's concerns about and attitudes towards online learning during COVID-19 (Mujahid et al., 2021). Other researchers model learner-generated posts to surface insights around learning in formal and informal pedagogical environments. Chopra et al. (2022) constructed topic chains by connecting semantically similar topics across months, demonstrating the temporal changes in learner discourse in the discussion forum during the pandemic. Sentiment analysis has been used widely to evaluate students’ reflective writing (Chong et al., 2020) and student feedback (Altrabsheh et al., 2013; Lundqvist et al., 2020). It has also been used to track learners’ emotional trajectories (Gkontzis et al., 2017; Munezero et al., 2013; Neumann & Linzmayer, 2021) and to predict learning performance and satisfaction (Hew et al., 2020). Recently, Peng & Xu (2020) combined topic modeling and sentiment analysis to reveal significant differences between completers and non-completers in the discourse behaviors found in MOOC course reviews. They emphasized the importance of examining implicit discourse behavior (i.e., focused topics, topics' emotional tendencies, and behavioral patterns) beyond explicit interaction features (e.g., clickstream). To further understand sentiments expressed on specific topics, sentiment analysis can be combined with topic modeling to provide more contextualized details (Dolianiti et al., 2019a, b).

The contingent shift to online learning due to the COVID-19 pandemic brought considerable opportunities for advancing research on educational phenomena and for seeking solutions to challenges in higher education (Adedoyin & Soykan, 2023). Using learning analytics and EDM techniques (e.g., sentiment analysis and topic modeling), we aim to provide insights into students’ learning processes and experiences in Learning Management Systems (LMS).

Discussion Forum and Community of Inquiry Framework

In the literature on asynchronous discussion forums, the Community of Inquiry (CoI) framework, which consists of three main components (cognitive, social, and teaching presence), is commonly used to study forum activity (Garrison et al., 2001). According to the framework, social presence refers to the process by which learners engage in interpersonal interactions and coordinate efforts with peers. The concept was later extended to reflect an individual’s ability to identify with the community (Garrison, 2009). From the learning sciences perspective, social constructivists emphasize the significance of interactions with peers and instructors that facilitate knowledge co-construction (Andrews, 2012). As such, facilitating social presence can be particularly crucial in distance education, where facial expressions, body language, and auditory cues are lacking (Swan, 2010). In a way, social presence can be seen as “the degree to which a person is perceived as a ‘real person’” and serves as a predictor of satisfaction in a computer-mediated environment (Gunawardena & Zittle, 1997). Research on social presence in online courses has further established its association with learner satisfaction and academic achievement (Joksimović et al., 2015; Kang et al., 2014), emphasizing the value of promoting social presence in online courses.

Cognitive presence represents higher-order thinking and constructing meaning through active reflection. Cognitive presence can be achieved when students link new concepts to past knowledge and reflect on the application of what they learned in class to real-life scenarios (Kilis & Yıldırım, 2018). Previously, research suggested that increased cognitive and social presence is beneficial to learning outcomes and psychological experience for online courses (Garrison & Arbaugh, 2007). A review also suggests that more purposeful interaction should be facilitated in future distance education (Abrami et al., 2011). However, there has been little empirical evidence to suggest how interaction patterns in asynchronous discussion forums changed during distance learning under the impact of COVID-19.

In response to COVID-19, courses that originally leveraged online discussion forums as an extension of the classroom became more dependent on this digital environment for enabling student-to-teacher and peer-to-peer interactions when classes moved fully online. Students’ cognitive presence might also be seen in how they incorporate real-life events into critical discussions of classroom content. In addition, students might seek to connect with others more eagerly during remote instruction to meet their needs for social connection and belonging. Aside from using survey instruments to directly measure CoI, studies have utilized computational linguistic programs such as Linguistic Inquiry and Word Count (LIWC) to reflect socio-cognitive processes in discussion forum posts (Lin et al., 2020). In particular, the cognitive process variable contains words that describe cognitive processes and higher-order thinking (Moore et al., 2019; Pennebaker et al., 2015). A previous study has shown that LIWC-based analysis offers distinct proxies of cognitive presence (Joksimović et al., 2014). The word categories social, which describes social content, and affect, which indicates emotional expression or self-disclosure, can be seen as signals of social presence (Ferreira et al., 2020).

Current Study

The current study seeks to examine the changes in discourse in online asynchronous discussion forums prior to and after the shift to remote learning and the declaration of the global pandemic. We aim to shed light on the role of the discussion forum as reflected in student posts. We take an exploratory approach to look at how social and cognitive presence in the forums changed and what topics emerged. We leveraged the alignment of the COVID-19 timeline and the instructional timeline to approximate the shock (i.e., the abrupt change in instructional activity) introduced at the end of the Winter quarter and prior to the start of the Spring quarter. While news about COVID-19 began to circulate at the beginning of the year, the spread of the virus only gained more serious attention in the United States in March 2020. The outbreak was declared a global pandemic in March 2020, around the end of the Winter 2020 quarter (Cucinotta & Vanelli, 2020). As the government-issued stay-at-home order took effect, the university took immediate action to shift to fully remote learning for the Spring 2020 quarter. Since the pandemic announcement coincided with the transition between the Winter and Spring quarters, we treat it as a cutoff to observe whether students’ discourse in Spring differed from that observed in Fall and Winter. Figure 1 illustrates a timeline of the disease development with respect to instructional activities.

Fig. 1 Timeline of COVID-19 development and instructional activity

Specifically, we focus our investigation on the changes in social and cognitive presence in repeatedly offered courses. To control for course content and course-level characteristics, we selected a set of courses that were offered every quarter and consistently used online discussion forums at the higher education institution from which we obtained the data. Additionally, we zoom in on the emergent discourse around COVID-19 and observe how social, cognitive, and affective components were manifested in those posts. Our current study investigates the following research questions:

  • RQ1) How did social and cognitive presence in online discussion forums change before and after the transition to fully remote learning?

  • RQ2) Were topics around COVID-19 present in the discussion forum?

  • RQ3) What were the characteristics of COVID-related discussions in the forum?

We provide several hypotheses for the research questions. First, we hypothesize that social presence might increase, under the assumption that discussion forums were used more heavily to facilitate social interactions and compensate for the lack of physical interactions. Research has shown that online discussion boards can effectively motivate student-student and student-instructor interactions (Bernard et al., 2009), especially in remote situations. Ashokkumar and Pennebaker (2021) found that language indicating social connections and cognitive processes in a social forum was amplified following days when the growth rate of COVID-19 infections was higher, reflecting people’s attempts to better understand and process the issues they were facing and to seek comfort from social ties when faced with threats. We were cognizant that LMS forums differ in nature from social media forums, in that cognitive processes are inherently high in a learning context. We therefore hypothesize that cognitive presence in student discourse might remain the same or slightly decrease, under the assumption that instructors might have become more lenient and posed less complex problems for students to address in the discussions, considering learners’ limited mental capacity in this challenging time. Second, we expect some direct discussion of COVID-19 to be present in the discussion forum. Although forums tend to revolve around course content, COVID-19 and quarantine, as collective experiences, should be detectable and might take up a non-trivial place in forum discussions. Lastly, we expect discourse around COVID-19 topics to be particularly high in social presence. Students might have been more actively seeking social connections to cope with the challenging situation and build a classroom community as a social buffer.

Method

Data

Our data were retrieved from an online learning management system at a large public university in the United States. Specifically, we obtained the discussion forum posts that naturally occurred during the Fall 2019, Winter 2020, and Spring 2020 quarters across all courses. In addition to the forum posts, we retrieved the timestamp at which each post was created, the student ID, the course and term information associated with the post, and the topic messages. We first removed posts that were not in English. We then filtered messages based on course-level characteristics. Two inclusion criteria were used to determine whether a post would be selected. First, we aggregated forum posts within a course and filtered out courses that had fewer than 30 posts in a given quarter. Second, in order to eliminate course-level differences, we retained courses that, based on course code, had been repeatedly offered and actively utilized discussion forums across all three quarters of the 2019–2020 academic year. The resulting courses are relatively consistent in curriculum design, course requirements, class size, and other course characteristics. The reason for restricting the sample to courses that consistently used discussion forums is to reduce other potential confounding factors, such as the novelty effect when a technological platform is first introduced to a class. As such, changes in students’ engagement in discussion forums in Spring are more likely to reflect instructional or learning changes under the impact of the COVID-19 pandemic. In total, 64 courses (22 in Fall, 20 in Winter, and 22 in Spring) remained in the dataset, consisting of writing seminars from the Humanities and English department. The courses are open to all students in the university and partially satisfy lower-division writing requirements. Since the writing requirement is a part of the General Education requirement that applies to all university students, students enrolled in these classes were predominantly first-year students. The course requirements and curriculum design are also relatively consistent across the quarters, as these are foundational courses. We retrieved a total of 15,263 student posts from these classes. The total counts of forum posts for the Fall, Winter, and Spring quarters were 4245, 4384, and 6634, respectively. The average posts per course were 192 in Fall, 219 in Winter, and 301 in Spring. Posts by instructors and teaching assistants were obtained but not included in the main analysis.
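
The two inclusion criteria described above can be sketched as follows. This is a minimal illustration rather than the pipeline actually used: the record layout (the keys `course_code` and `term`), the helper name `select_courses`, and the toy data are assumptions made for the example; only the threshold of 30 posts and the three quarters come from the text.

```python
from collections import Counter

TERMS = ("Fall 2019", "Winter 2020", "Spring 2020")
MIN_POSTS = 30  # courses with fewer posts in a given quarter are excluded


def select_courses(posts, terms=TERMS, min_posts=MIN_POSTS):
    """Return course codes that meet the posting threshold in every term.

    `posts` is an iterable of dicts with the hypothetical keys
    'course_code' and 'term' (one dict per forum post).
    """
    counts = Counter((p["course_code"], p["term"]) for p in posts)
    codes = {code for code, _ in counts}
    # A Counter returns 0 for a missing (course, term) pair, so a course
    # that was not offered in some quarter fails the check automatically.
    return {
        code
        for code in codes
        if all(counts[(code, term)] >= min_posts for term in terms)
    }


# Toy data (hypothetical): WR101 is active in all three quarters,
# WR102 has too few posts in Winter, and WR103 is not offered in Spring.
toy_posts = (
    [{"course_code": "WR101", "term": t} for t in TERMS for _ in range(40)]
    + [{"course_code": "WR102", "term": "Fall 2019"} for _ in range(35)]
    + [{"course_code": "WR102", "term": "Winter 2020"} for _ in range(10)]
    + [{"course_code": "WR102", "term": "Spring 2020"} for _ in range(50)]
    + [{"course_code": "WR103", "term": "Fall 2019"} for _ in range(60)]
    + [{"course_code": "WR103", "term": "Winter 2020"} for _ in range(60)]
)

selected = select_courses(toy_posts)
print(sorted(selected))
```

Only the course that clears the threshold in every quarter is retained, which mirrors the intent of keeping courses that used the forum consistently across the academic year.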

Participants

We obtained administrative data on enrollment and matched them with the student IDs associated with the forum posts. In the dataset for analysis, a total of 1441 students enrolled in writing courses contributed to the forum posts. The enrollments by term were 482 in Fall, 460 in Winter, and 532 in Spring. As students are required to take two lower-division writing courses, students may enroll in another course in a different quarter. As such, we observe a small discrepancy between the total number of enrollments and the total number of unique students. According to the demographic data, the sample includes 744 female and 697 male students. Of those participants who reported their race and ethnicity (1384 out of 1441), 11.05% were White, 3.32% were Black or African American, 58.60% were Asian, 26.51% were Hispanic or Latino, and less than 1% identified as American Indian or Pacific Islander. Students’ enrollment status was primarily Freshman (96.6%), with a few transfer students.

Quantifying Online Presence and COVID-19 Presence in Discussion Forum

To characterize forum posts and identify COVID-19 relevant discourse in the entire forum, we employed several text mining methods. Figure 2 presents how we transformed posts and generated linguistic measures across psychological and semantic dimensions for further quantitative analysis. To capture social and cognitive presence, we used a lexicon-based computational linguistic program, Linguistic Inquiry and Word Count (LIWC). We explain the rationale for the selection of LIWC properties for social and cognitive presence in the following section. We also performed a sentiment analysis to capture the positive or negative valence of discussion posts through sentiment scores. To explore the semantic dimension of learner discourse, we applied Top2Vec (Angelov, 2020), an unsupervised topic modeling technique, to examine latent topics in the corpus. We performed a semantic search using the keywords “covid” and “quarantine” to examine whether topics relevant to COVID-19 and quarantine emerged as prevalent topics in the forum. We then examined social and cognitive presence in the quarantine and covid posts to determine whether COVID-19 related discussions were distinct from other forum discussions. We provide further details of each method in the following subsections.

Fig. 2 Graphical representation of analysis workflow

Linguistic Inquiry and Word Count (LIWC)

Linguistic Inquiry and Word Count (LIWC) is an extensively validated dictionary-based tool for capturing psychological and linguistic properties in text (Tausczik & Pennebaker, 2010). Each discussion forum post was considered a single document for analysis. We applied the LIWC2015 program to process each forum post, and the tool returned a score for each word category. LIWC captures a wide range of psychological dimensions and has been shown to reflect, at the linguistic level, changes in people’s emotion, cognition, and social connections during the early months of the COVID-19 outbreak (Ashokkumar & Pennebaker, 2021). In the context of online discussion forums, previous literature suggests that discussion posts that contain more affective, interactive, and cohesive components demonstrate high social presence (Rourke et al., 1999; Hostetter, 2013). Ferreira et al. (2020) further suggest that several sociolinguistic indicators in LIWC can capture social presence automatically. As such, we selected LIWC categories that are representative of social presence, including social processes and affective processes.

Both social and affective processes have previously been considered features for constructing social presence (André et al., 2021). For cognitive presence, we included cognitive process and Analytic as indicators. Cognitive process captures higher-order thinking and critical thinking skills through words associated with causation, self-reflection, uncertainty, differentiation, and so on (Moore et al., 2019). LIWC’s cognitive processing score has been found to have high levels of predictive validity and has been used empirically for automatic classification of cognitive presence (Kovanović et al., 2016; Ferreira et al., 2020). Analytical thinking signifies formal and logical language that results from cognitive processes (Pennebaker et al., 2014). Table 1 shows the subcategories and example words of the non-summary variables according to the LIWC2015 dictionary (Pennebaker et al., 2015). Note that analytical thinking is a summary variable that is calculated from standardized scores based on large comparison corpora and is a non-transparent variable in the dictionary.
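
For the non-summary variables, a LIWC category score is the percentage of words in a text that match the category's word list. The sketch below illustrates that idea only. The three word lists are small hypothetical stand-ins, since the LIWC2015 dictionary is proprietary and far larger, and the function name `category_scores` is an assumption for this example.

```python
import re

# Hypothetical mini-dictionaries; the LIWC2015 dictionary is proprietary
# and far larger, and also supports word-stem wildcards not modeled here.
CATEGORIES = {
    "social": {"friend", "talk", "they", "we", "family"},
    "affect": {"happy", "sad", "worried", "love", "cried"},
    "cogproc": {"because", "think", "know", "maybe", "but"},
}


def category_scores(text, categories=CATEGORIES):
    """Return, per category, the percentage of words found in its word list."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return {name: 0.0 for name in categories}
    return {
        name: 100.0 * sum(word in vocab for word in words) / len(words)
        for name, vocab in categories.items()
    }


post = "I think we talk less because my family is worried"
scores = category_scores(post)
print(scores)
```

Because each score is normalized by post length, posts of different lengths can be compared directly, which is what makes post-level means meaningful.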

Table 1 LIWC properties for social and cognitive presence

Sentiment Analysis

Sentiment analysis is a common text mining technique for revealing people’s opinions, attitudes, and emotions toward an individual, event, or topic. In order to identify student opinions or attitudes in COVID-19 specific discussions, we created a sentiment score for each post using VADER (Hutto & Gilbert, 2014). VADER is a rule-based model for characterizing sentiment valence in written documents. We chose this method because VADER is interpretable, empirically validated, computationally efficient compared to BERT or other deep-learning-based models, and highly accurate in capturing sentiment (Hilmy et al., 2019; Rääf et al., 2021). VADER has been widely applied across domains, including education. While LIWC also has categories representing positive and negative emotions, VADER is more sensitive and nuanced because it incorporates lexical features such as emoticons, sentiment-related acronyms and initialisms, and commonly used slang with sentiment value (Hutto & Gilbert, 2014). Since discussion forum posts resemble the format of microblogs - the type of text VADER was developed for and is most attuned to - this approach is viable for our data. We used VADER within the Natural Language Toolkit (NLTK) library in Python to produce a normalized compound score for each post. The compound score sums the lexicon ratings of the words in a post and normalizes the result to a value from −1 (extremely negative) to 1 (extremely positive). The closer a compound score is to 1, the more positive the text.
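
The normalization behind the compound score can be sketched as below. VADER's reference implementation maps the summed valence x to x / sqrt(x² + α) with α = 15. The function name `normalize_compound` and the example valence values are illustrative assumptions, and the rule-based adjustments that VADER applies before summing (negation, intensifiers, punctuation, and so on) are omitted.

```python
import math


def normalize_compound(valence_sum, alpha=15.0):
    """Map an unbounded sum of word valences into the range [-1, 1]."""
    score = valence_sum / math.sqrt(valence_sum * valence_sum + alpha)
    return max(-1.0, min(1.0, score))


# Hypothetical per-word valence ratings for one post (not real lexicon values).
word_valences = [1.9, 2.5, -0.4]
compound = normalize_compound(sum(word_valences))
print(round(compound, 4))
```

In NLTK, the corresponding value is available as the `"compound"` entry of `SentimentIntensityAnalyzer().polarity_scores(text)`, once the `vader_lexicon` resource has been downloaded.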

Top2Vec Modeling

To extract student discourse on COVID-19, we employed an unsupervised topic modeling technique named Top2Vec (Angelov, 2020). This modeling approach automatically detects topics present in text and determines the optimum number of topics. The Top2Vec algorithm works on the assumption that many semantically similar documents are indicative of an underlying topic. It creates jointly embedded document and word vectors using Doc2Vec (Le & Mikolov, 2014), projects the document vectors into a lower-dimensional embedding with a dimension reduction technique (Uniform Manifold Approximation and Projection), and finds dense areas of documents. It then calculates the topic vector, which is the centroid of the document vectors in the original dimension, and identifies the n closest word vectors to the resulting topic vector. The original paper (Angelov, 2020) provides further algorithmic intuitions behind the model. Compared to traditional topic modeling such as LDA, an advantage of this unsupervised approach is that the model takes into account the semantic relationships in text and produces results at a more granular level. Instead of using a bag-of-words (BoW) representation of documents, which ignores the ordering and semantics of words, Top2Vec leverages joint document and word semantic embedding to find topic vectors. Moreover, the author points out a common misconception: topics are often thought of as discrete values (e.g., politics, science, art), when in reality they can be further subdivided into many subtopics. Unlike traditional topic modeling methods such as LDA, Top2Vec does not rely on human input or parameter tuning during the training process.
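
The last two steps of the algorithm, computing a topic vector as the centroid of a cluster's document vectors and then finding the closest word vectors, can be illustrated with a toy sketch. It assumes the dense cluster has already been found, uses hypothetical two-dimensional vectors in place of learned Doc2Vec embeddings, and the helper names are made up for this example; it is not the Top2Vec implementation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def topic_words(topic_vec, word_vecs, n):
    """Return the n words whose vectors are closest to the topic vector."""
    ranked = sorted(
        word_vecs,
        key=lambda word: cosine(topic_vec, word_vecs[word]),
        reverse=True,
    )
    return ranked[:n]


# Hypothetical embeddings: three documents from one dense cluster and
# four words that share the same embedding space.
cluster_docs = [[0.9, 0.1], [0.8, 0.3], [1.0, 0.2]]
word_vecs = {
    "quarantine": [0.9, 0.2],
    "lockdown": [0.8, 0.1],
    "essay": [0.1, 0.9],
    "thesis": [0.2, 1.0],
}

topic_vec = centroid(cluster_docs)
closest = topic_words(topic_vec, word_vecs, n=2)
print(closest)
```

Because documents and words live in the same space, the words nearest the centroid serve as a readable label for the cluster.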

While it is more efficient to use a universal sentence encoder or other pre-trained embeddings, we trained the Doc2Vec model from scratch because of the novelty of COVID-19 vocabulary. Among the speed settings “fast-learn”, “learn”, and “deep-learn”, we chose “learn” to balance training speed and vector quality. We consider each post a document and the entire collection of posts the corpus. Once the model was trained, we ranked the top 10 most prominent topics by topic size and used the topic words to interpret the topics. We further located specific topics using keywords, as Top2Vec allows for topic and document searches by keyword. We ran two separate searches with the keywords “quarantine” and “covid / coronavirus” and extracted the 5 topics that were most semantically relevant to each search term. Top2Vec returns a list of topics with an index and a topic score for each topic. The topic score is the cosine similarity between the topic and the search keywords. We then retrieved a list of 50 words under each topic, ranked by word score (the cosine similarity of the word to the topic). We generated word clouds for the top 5 topics most relevant to “quarantine” and “covid / coronavirus”. The size of each word in a word cloud is determined by its word score, signifying the importance of the word in the topic. For further analysis of posts, we retrieved the 20 most relevant documents in each of the top 5 topics. We refer to the posts retrieved from the five quarantine topics as “quarantine posts” and to the posts retrieved from the five covid topics as “covid posts” in the subsequent sections.
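
The keyword search described above amounts to two rankings by cosine similarity: topics against the keyword vector, then documents against each selected topic vector. The sketch below shows that logic on hypothetical two-dimensional vectors. The function names `rank_topics` and `top_documents` are assumptions for this example rather than the Top2Vec API, and the toy uses 2 topics and 2 documents where the study used 5 and 20.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def rank_topics(keyword_vec, topic_vecs, num_topics):
    """Return (topic_id, topic_score) pairs, most similar topic first."""
    scored = [
        (topic_id, cosine(keyword_vec, vec))
        for topic_id, vec in topic_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num_topics]


def top_documents(topic_vec, doc_vecs, num_docs):
    """Return the ids of the documents closest to a topic vector."""
    ranked = sorted(
        doc_vecs,
        key=lambda doc_id: cosine(topic_vec, doc_vecs[doc_id]),
        reverse=True,
    )
    return ranked[:num_docs]


# Hypothetical vectors standing in for learned embeddings.
keyword_vec = [1.0, 0.2]  # e.g. the word vector for "quarantine"
topic_vecs = {0: [0.9, 0.2], 1: [0.1, 1.0], 2: [0.7, 0.6]}
doc_vecs = {"post_a": [0.8, 0.1], "post_b": [0.2, 0.9], "post_c": [0.7, 0.5]}

matches = rank_topics(keyword_vec, topic_vecs, num_topics=2)
best_topic_id = matches[0][0]
relevant_posts = top_documents(topic_vecs[best_topic_id], doc_vecs, num_docs=2)
print([topic_id for topic_id, _ in matches], relevant_posts)
```

A high topic score therefore means only that a topic is semantically close to the search term, not that the topic is large; topic size is a separate quantity.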

Statistical Analysis

To address our first research question, we compared the characteristics of forum posts made in Spring with those made in Fall and Winter. As depicted in Fig. 1, the alignment of COVID-19 development and instructional activities suggests that the beginning of the Spring quarter marks the transition to fully remote instruction. We generated a binary variable indicating whether a post was made during or prior to remote instruction. We conducted Welch's two-sample t-tests to compare the means of the selected LIWC properties (as shown in Table 1) and sentiment scores between posts in the Spring quarter and posts in the Fall and Winter quarters.

To address the second research question, we turn to topic score, which indicates the importance of a given topic amongst other topics, to observe whether COVID-19 emerges as one of the most representative topics in the entire corpus. To address our third research question, we repeated the same analysis as in RQ1 to compare differences between covid-related posts and other posts.
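
The comparison used for RQ1 and RQ3 can be sketched as follows: posts are split by a binary remote-instruction indicator, and Welch's t statistic is computed without assuming equal variances. The scores below are hypothetical toy values, not study data, and the software used for the actual analysis is not stated in the text, so this standard-library version is only an illustration of the formula.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df


# Hypothetical (term, LIWC social score) pairs, one per post.
posts = [
    ("Fall", 8.0), ("Fall", 9.5), ("Winter", 7.0), ("Winter", 10.0),
    ("Spring", 11.0), ("Spring", 9.0), ("Spring", 12.5), ("Spring", 10.5),
]

# Binary indicator: Spring posts were made during remote instruction.
remote = [score for term, score in posts if term == "Spring"]
pre_remote = [score for term, score in posts if term != "Spring"]

t_stat, df = welch_t(remote, pre_remote)
print(f"t = {t_stat:.3f}, df = {df:.2f}")
```

With SciPy available, `scipy.stats.ttest_ind(remote, pre_remote, equal_var=False)` performs the same Welch test and also returns a p-value.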

Results

Before reporting the results for our research questions, we highlight some contextual information at the course and term level that situates the interpretation of results. First, we observed an increase in the overall number of posts (NFall = 4245, NWinter = 4384, NSpring = 6634) in the discussion forums across all courses. The number of students who generated these posts was 482, 460, and 532 in Fall, Winter, and Spring, respectively. The number of courses included in the dataset was 22, 20, and 22 in each term. We calculated posting activity per course by dividing the total number of posts in a given quarter by the total number of courses in that quarter. As shown in Table 2, the number of forum posts per course increased from 192 in Fall and 219 in Winter to 301 in Spring. This indicates a potential increase in the active use of discussion forums for instructional purposes, or more active engagement from students during the remote instruction period.
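The per-course figures follow directly from the counts reported above. The short sketch below reproduces them; truncating to whole posts is an assumption on our part that matches the reported values (the exact ratios are about 192.95, 219.20, and 301.55).

```python
# Post and course counts per quarter, as reported in the text.
posts = {"Fall": 4245, "Winter": 4384, "Spring": 6634}
courses = {"Fall": 22, "Winter": 20, "Spring": 22}

# Integer division truncates to whole posts (assumed rounding rule).
posts_per_course = {quarter: posts[quarter] // courses[quarter] for quarter in posts}

for quarter, value in posts_per_course.items():
    print(f"{quarter}: {value} posts per course")
```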

Table 2 Enrollment, Course, and Posting Count by Quarter

Overall Online Presence

Regarding our first research question, we found evidence of changes in social and cognitive presence in the discussion forum. Table 3 displays the outcome of the analysis based on LIWC. Our results suggest a significant increase in overall social presence in the online forums. Specifically, when we compared the linguistic characteristics of forum posts between the periods of in-person instruction (i.e. Fall and Winter) and fully remote instruction (i.e. Spring), the two main linguistic categories that represent social presence (i.e. social, affect) showed significant differences. First, we found significantly more prominent social language in Spring (M = 9.94, SD = 5.97) than in Fall and Winter (M = 9.20, SD = 5.41). We also observed stronger affective language in the discussion forum during remote learning (M = 5.57, SD = 4.56) than prior to remote learning (M = 5.07, SD = 3.30).

Table 3 Differences between Spring posts and Fall/Winter posts in social and cognitive presence captured by LIWC variables. The Welch test is reported because Levene’s test indicated that the homogeneity of variances assumption was not met. Bold indicates p < .05

Secondly, consistent with our hypothesis, our results showed a similar overall level of cognitive presence before and after the shift to fully remote instruction, as signaled by LIWC’s cognitive processing category. This suggests students engaged at a similar level of cognitive reasoning in their posts. However, we did find evidence of a decrease in analytical thinking, marked by a decrease in analytical expression. Analytical thinking dropped from Fall and Winter (M = 72.15, SD = 23.98) to Spring (M = 70.26, SD = 25.17). While this does not mean students were less cognitively invested in the discussion assignments, it does indicate that students used less formal and less complex expression during remote instruction. This aligns with other literature suggesting a decrease in analytical expression and people’s tendency to use simpler and more straightforward expressions during COVID-19. We further elaborate on the implications of this result in the discussion section. With respect to the sentiment analysis results, Spring quarter posts showed a significantly higher compound score, indicating an overall more positive sentiment in these posts. We computed Cohen’s d to estimate between-subjects effects for the grouped data. We interpreted Cohen’s d effect sizes using a variation on Sawilowsky’s extension of Cohen’s original scheme (Sawilowsky, 2009; Cohen, 1992; Windsor et al., 2019). According to this scheme, the effect sizes on social process (|d| = 0.12) and affective process (|d| = 0.12) were small, and the effect sizes on cognitive process (|d| = 0.02) and on analytical process and sentiment analysis (|d| = 0.08) were very small.
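To make the effect size concrete, the sketch below computes Cohen’s d from summary statistics using the pooled standard deviation, which is one common definition; the text does not state which variant was used. The group sizes are an assumption: we take them to equal the post counts reported earlier (4245 + 4384 for Fall and Winter, 6634 for Spring), which may differ from the sample actually analyzed.

```python
import math

def cohens_d(mean1, sd1, n1, mean2, sd2, n2):
    """Cohen's d using the pooled standard deviation of two groups."""
    pooled_variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_variance)

# Assumed group sizes, taken from the reported post counts per quarter.
n_fall_winter = 4245 + 4384
n_spring = 6634

# Analytical thinking: Fall/Winter versus Spring, means and SDs as reported.
d_analytic = cohens_d(72.15, 23.98, n_fall_winter, 70.26, 25.17, n_spring)
print(f"d = {d_analytic:.2f}")  # d = 0.08
```

Under these assumptions the result rounds to 0.08, in line with the effect size reported for analytical process.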

COVID-19 Topic Analysis

To further illustrate emergent learner engagement in the discussion forum during remote instruction, we examined the prominent topics across the Fall, Winter, and Spring quarters. First, we found that COVID-19 was among the main discussion topics. Table 4 lists the top 10 topics and the topic words within each topic. The topics are shown in order of significance, ranked by topic size. Casual interaction related to quarantine is one of the more prominent topics detected in the corpus.

Table 4 List of top 10 topics and representative topic words

We found that the 5 most relevant topics associated with “quarantine” are generally associated with casual interaction and students sharing personal interests or experiences. We visualized the most semantically related topic words in these topics through word clouds in Fig. 3. A full list of the topic words (listed in order of importance to the topic) and topic scores (the cosine similarity of each topic to the search keyword) is available in the supplementary material. From the word clouds below, we can see that students were using pronouns for friends, family, and pets. There is also evidence of the types of activities and hobbies students engaged in, such as “beach”, “walks”, and “trips”. We can also observe emotional words such as “shocking”, “surprised”, and “miss” that indicate the sharing of feelings.

Fig. 3

Word cloud visualizations of the top 5 topics extracted by the keyword “quarantine”. Note. The size of words correlates with the importance of the words in the given text. The larger and bolder the words are, the more important they are to the topic

Similarly, we extracted the five most relevant topics retrieved with the keyword “covid / coronavirus”. Two of the topics overlapped with the previous search, which was unsurprising given the close semantic relatedness of the two search terms and indicates that they were used interchangeably in similar contexts. However, we did observe some differences in the other three topics compared with the topic search results for “quarantine”. The group of covid topics expands beyond casual interaction and places a stronger emphasis on discussion of societal events and opinion expression. For instance, in topic 66, the keywords indicate discussion of misinformation and fake news during the COVID-19 outbreak; topic 99 alludes to the social justice protests ignited by the death of George Floyd caused by police brutality in May 2020 and the Black Lives Matter movement that took place as the coronavirus continued to spread. Topic 31 indicated timeline and discourse on politics and social media.

To characterize the discourse around COVID-19 topics and address how students were talking about them, we extracted the top 20 posts from each of the 5 most relevant topics associated with “covid / coronavirus” and “quarantine” separately, resulting in 100 “covid” posts and 100 “quarantine” posts. We then analyzed these posts both quantitatively and qualitatively. First, we compared the same LIWC properties as above (i.e. analytic, social, cognitive process, affect) and the compound sentiment of covid posts against non-covid posts using t-tests. Our results suggest no significant difference that distinguishes covid posts from the rest. However, we found that covid posts are on average … Next, we repeated the analysis on quarantine posts and non-quarantine posts. Our results suggest that quarantine posts show no significant differences in social process, but significantly higher affective process (t(98.73) = -2.06, p = .042, d = 0.24), lower cognitive process (t(99.42) = 2.43, p = .017, d = 0.24), and significantly lower analytical language (t(99) = 3.32, p < .001, d = 0.35) compared to other posts. Additionally, quarantine-related posts show a significantly higher compound score from sentiment analysis (t(99.5) = -2.57, p = .012, d = −0.24), indicating an overall more positive sentiment in these posts. Table 5 shows the differences in social and cognitive presence between quarantine-related posts and non-quarantine-related posts. The effect sizes for affective and cognitive processes, compound scores (|d| = 0.24), and analytical process (|d| = 0.35) were considered medium according to the new scheme for effect size interpretation (Sawilowsky, 2009). There was no evident difference between the characteristics of posts extracted under covid topics.

Table 5 Comparison between posts extracted by keyword “quarantine” and the rest of the discussion forum posts across LIWC variables representing social and cognitive presence.

To gain contextualized insights, we sampled several posts to provide more qualitative details amongst quarantine and covid related posts. We observed that students shared lived experiences, built connections, and showed empathy with peers. This discourse was associated with high positive sentiment valence. Under covid related topics, students frequently expressed gratitude, shared wishes and hope for others, and provided encouragement and support. For instance, one student acknowledged the challenges their peers were going through by sharing their own experience:

“I hope that quarantine is not as stressful as it initially was for you. My allergies began to act up around the same time as this whole mess so I totally understand you for that. I’m glad you made it back home in time and that you were able to reunite with your family.”

In another example, one student expressed empathy to the respondent and established common ground:

“…Thank you for sharing your online class experience and your time during quarantine. I’m also really worried about the online class as well, but I believe it’s going to be a good experience. I love to workout, so maybe we can have a virtual workout together during the quarantine time? haha anyways, nice to meet you, and hope you stay healthy! :)”

This suggests high social presence among the posts, with content oriented around building social rapport.

In addition, we observed posts where students provided constructive support or feedback to peers. For example, a student wrote:

“I really like your topic! I found it interesting and think it’s a great idea to relate the current situation to the first amendment. I think adding research to the pros and cons of social distancing would flow nicely with your work. Adding more context to COVID-19 would be a great addition too. You could explain what its is, the effects it has, and its severity…”

In another example, a student emphasized the strengths in their peer’s work and pointed to actionable areas that they might improve on:

I believe that your connection to the pandemic occurring right now and using COVID-19 as an example is really strong, and showcases how the problem is present right as we speak of it. Using this crisis and stating how it is changing how our communities are functioning is good, and I like how you touched upon the misrepresentation it may display. I believe that delving in a bit keeping into the COVID-19 situation in connection with the racial discrimination occurring against Asians right now would have been a really good example.

Other posts centered on critical discussion of COVID-19 in connection with other social issues. For example, one student discussed the COVID-19 situation in relation to healthcare within the prison system:

With COVID going on right now I focused on how we could help inmates and one of the main solutions right now is focusing on the safe release of inmates. However, I’ve found that that isn’t enough, because prisons would still be overcrowded and it wouldn’t prevent the spread of COVID that much.

Another student discussed the racial discrimination against Asian Americans amplified by the pandemic:

“…As a result of COVID-19, there has been an increase in discriminatory actions, and the cause of this is due to authorities of higher power, such as politicians, giving rise to the association of COVID-19 with China, causing people to also associate China with all Asians and Asian-Americans in the US…”

These posts are characterized by positive sentiment, and suggest a type of discourse that demonstrates both high social presence and high cognitive presence.

Discussion

We found an overall increase in social presence, characterized by increased social processes, affective processes, and positive valence in student discourse. Overall, this result demonstrates the impact of the COVID-19 pandemic on people’s expression of emotions that was also found on other online platforms (Monzani et al., 2021). However, unlike social media, where negative sentiments were pervasive, the sentiments expressed in discussion forums were positive. This indicates that learners might have increased social connectivity in Spring through positive language, which was corroborated by our qualitative analysis. Upon further investigating the content of these posts, we found that students established positive interactions by showing rapport and empathy amongst themselves. Interestingly, our results show little change in cognitive processes in learner discourse but a significant decrease in the analytical property. This may imply that while learners remained actively engaged in critical thinking and reasoning, they expressed it in a less analytical manner (i.e. language that is less formal, logical, and hierarchical). A possible explanation is that disruptive events could reduce analytic thinking and invoke more personal and informal language (Seraj et al., 2021). Monzani et al. (2020) corroborated this with evidence that emotional tone and analytic thinking were lower in the first two months of the pandemic, which was characterized by uncertainty. Markowitz (2023) further suggests that one’s interest and motivation to think, not cognitive ability, drives the reliable and robust connection between analytic thinking and cognition. Taken together, we may consider the change in analytical thinking as signaling a drop in the incentive to engage in a linguistically complex manner, which calls for future examination of its association with students’ psychological experience.

Regarding our second research question, since the set of courses we focus on have discussion forum participation as a pedagogical component, the hypothesis is that student discourse should remain consistent if nothing changed except the courses moving to an online format. However, we found that the emergent COVID-19 discourse took up a non-trivial space in the corpus, as indicated by topic rank. The presence of COVID-19 content in the discussions signals a few possible changes in teaching and learning. First, instructors may have facilitated interactions related to COVID-19 experiences. Second, the facilitation may have remained unchanged (e.g. “Introduce yourself to class”), while learners organically leveraged the discussion forum to foster peer connection or reasoning around COVID-19. Under either explanation, we can infer that there was an emergent need for social connectivity during online instruction in Spring, whether perceived by instructors or inherently expressed by students. Interpreted together with the results for RQ1, this increased social presence suggests an increase in socialization activity in learning during COVID-19.

Regarding our third research question, we found nuances in the way students talked about COVID-19. While there was some overlap between the topics retrieved with “covid” and with “quarantine” (topic 103 and topic 1), the remaining topics were different. We can observe from the word cloud visualizations, and from the topic words provided in the appendix, that the forum post content varies with the search term. Quarantine posts center on personal experience sharing and building social connections, whereas covid posts involve more objective discussion and sense making of these contemporary events. Quarantine posts showed significantly stronger affective language and more positive sentiment than non-quarantine posts. Our qualitative analysis corroborates this finding. After examining the details of quarantine posts, we found prominent themes of building resilience, sharing struggles and personal frustrations, and showing social rapport. However, we did not observe a significant difference in social process between quarantine and non-quarantine posts. This suggests that quarantine messages were not a main contributor to the high social process in Spring. Instead, we could infer that the overall increase in social connectivity in the Spring forum came from students seeking to build social connections in various ways, not exclusively through messages related to quarantine. Interestingly, the results show a significant drop in both cognitive and analytic processes for quarantine-related posts. This could indicate that the conversations on quarantine were less formal and less cognition-oriented.

Measuring learners’ online engagement in discussion forums can provide valuable information to educators for adapting their pedagogical practice to learning needs or taking necessary interventions. Overall, our findings underscore the important role discussion forums can serve in remote instruction, specifically by affording classroom community building and social connectedness, and by engaging students in cognitive and analytical thinking. We identified posts that expressed social connection needs, which could imply the necessity of facilitating a supportive classroom community. The increased social presence in the discussion forum during the Spring quarter expressed learners’ psychological need for social connection, and the discussion forum served as a channel for them to interact with peers. We found student discourse that demonstrates rapport-building and emotional buffering, which could be beneficial for coping with social isolation during quarantine. Discussion forums could also help instructors identify specific challenges students experience. For instance, students reported that some of the biggest challenges during the lockdown included loneliness from social isolation and a lack of support in learning. Our study also has implications for the application of educational technology, particularly for assessment. For instance, an overall decline in analytical language in Spring might signal that students had limited capacity for processing complex subjects on top of the need to make sense of COVID-19 impacts. While many learning management systems are now equipped with automatic assessment based on learners’ language, we should be careful in interpreting such assessments and take external factors into account.

There are several limitations to our study. First, the differences in social and cognitive presence represented through LIWC between Spring and non-Spring quarters yielded small effect sizes. Considering that psychological language markers tend to have modest effect sizes (Holtzman et al., 2019), we argue that the trends and patterns still provide meaningful information on how overall discourse in the discussion forum changed before and after the shift to remote instruction. The effect sizes for the comparison between quarantine-relevant posts and other posts were moderate, distinguishing quarantine posts as more positive and less analytic than non-quarantine posts. Second, the sample of our courses was limited to a set of first-year writing seminars. This selection was intentional in order to ensure course level characteristics stay relatively salient, although it does not capture a wide range of disciplines and domains. We may also safely assume that teachers of these courses were familiar with facilitating discussion using online forums. As such, our results may not represent the activities in courses that were plunged online and used discussion forums for the first time. For courses that were already online and remained online during the pandemic, discussion forum activities could look different as well. Despite these course level differences, our study mainly focuses on capturing the changes to courses that transitioned from in-person to fully remote instruction due to the disruption of COVID-19, where the discussion forum’s role may also have shifted from an additional environment for out-of-classroom interactions to a more central space for online presence. Lastly, our current study focuses only on learner discourse, primarily peer interaction in the discussion forum. This might not paint a full picture of online learning because we did not investigate the teacher’s role or teacher-student interactions. However, by inferring from learner discourse, our study can indirectly reflect teacher facilitation. For instance, we may infer from learner discussion that instructors prompted students to introduce themselves and share their experience in quarantine. Future studies should focus on teacher-student interactions to gain further insights into instructors’ role in moderating and facilitating discussion forum activities.

Conclusion

Learner engagement can be an important indicator of academic performance and interest in online courses. The application of artificial intelligence to automatically assess learner engagement at scale yields valuable information on teaching and learning activities. Our study examines the changes in learners’ social and cognitive presence in an asynchronous online discussion forum before and after the onset of the COVID-19 lockdown. The emergent discussions around COVID-19 and quarantine experiences that we detected in the early Spring quarter suggest that discussion forums can serve as a rich source for understanding learner experience and emergent learning needs through online peer interactions. We demonstrate an analysis that combines multiple text mining techniques to harvest insights from learner discourse, sampling discussions surrounding specific topics for further qualitative analysis. We suggest that combining latent semantic content with linguistic characteristics provides richer contextual detail on learner interactions. This analytic process can be adapted to explore other research questions with different kinds of textual data. For instance, one may examine the impact of a change in instructional strategy or course requirements for using an LMS, applying this analysis procedure to examine the effects of such a change on learners’ online engagement before and after it takes place. We may also analyze discourse characteristics and further connect them to learning outcomes and psychological wellbeing. In future studies, we plan to expand this analysis to the entire discussion forum across disciplines and to examine the link between discourse features and survey responses, so as to verify whether social, cognitive, and affective language predicts learners’ psychological experience (e.g. stress, perceived support) during COVID-19. The same analysis can also be applied to courses offered exclusively online, where the level of asynchronous discussion might be higher, to see whether similar patterns can be found in learner discourse. Lastly, our study suggests that discussion forums hold promise for promoting genuine personal connection and building classroom community. We found that students engaged in non-trivial discussions of COVID-19 in either a socially oriented or cognitively oriented manner. These discussions indicated a need for social connection and sense making, and either emerged organically or were facilitated by instructors. This suggests that teaching practices should be quickly adapted to meet learners’ future needs, and that instructors should pay close attention to fostering meaningful learning experiences. As more and more university courses remain online or hybrid to provide flexibility to students, instructors may consider facilitating meaningful social interaction by incorporating shared lived experiences or critical reflections on contemporary events, rather than superficial connections for participation points in online discussion forums.