Introduction

In January 2020, the novel coronavirus disease (COVID-19) broke out in mainland China with the key characteristics of a pandemic virus, and rapidly spread to many cities in the country within several weeks. Soon, the World Health Organisation (WHO) declared it a public health emergency of international concern. In the event of prolonged public health threats, such as COVID-19, online learning courses, platforms and resources have been shown to be timely and effective in serving as alternative outlets for learning (Cumbers, 2020; Schwartz & Bayles, 2012). In recognition of this situation, the Chinese central government disseminated online higher education and K-12 education-specific pandemic influenza guidance and recommended the use of distance courses for all grade levels. As a consequence, almost all schools and institutions had to rely on electronic forms of learning to reach students and continue their learning. However, this transition to online learning involved a lot more than flipping a switch. Proponents of online learning have posited that moving face-to-face classes online could become an important component of academic continuity planning (Allen & Seaman, 2010). Opponents, by contrast, have pointed out the increasing difficulty of creating individually tailored, differentiated instruction for each student in online environments (Gillett-Swan, 2017), and have noted that students may need additional motivation, scaffolding, organization, and self-discipline to be successful in their online learning endeavors (Jacob & Radhai, 2016).

Public opinion has attracted considerable scholarly attention. While online learning boomed during the pandemic, either on a voluntary or an involuntary basis, public attention has been drawn to this transformation in teaching and learning during this critical period. The purpose of this study was thus to track the changes in perceiving, understanding and experiencing online learning before, during and after COVID-19, using social media data. While efforts have been made to analyse such large-scale data on various topics during the pandemic, such as online healthcare forums (Jelodar et al., 2020), events related to COVID-19 (Wahbeh et al., 2020), and social prejudice (Croucher et al., 2020), relatively less attention has been paid to online education during this outbreak using large open datasets, with a few exceptions such as student attitudes towards it (Unger & Meiran, 2020) and user satisfaction (Chen et al., 2020). Given the limited number of studies examining public responses to online education from a longitudinal perspective, we attempted to gather data from social media on this topic and compare the public’s opinions before, during and after the pandemic. Specifically, how positive (or negative) did the Chinese public feel about online learning, and what was their utmost concern about online learning? Given the concerns about the presence and impact of a “gender gap” on Internet access and usage (Colley & Maltby, 2008), one significant area of research over the last decade in examining social media perception and usage has focused on gender differences. As Ong & Lai (2006) argued, providing more detailed information about users’ views across gender is increasingly important for learning technology stakeholders. Thus, it is important to examine whether their views on online learning varied across periods of time and across gender. The findings would help address the limited literature on the adoption of online education in emergencies and inform efforts to improve education from more perspectives.

Literature review

Online education in China

Online education is not new to the educational community in this century, with rapid developments in technology (McBrien et al., 2009). Accessibility, affordability, flexibility, learning pedagogy, and life-long learning are some buzzwords related to online education (Dhawan, 2020). Compared to the days whereby online learning was taken as an optional mode for enriching or optimizing traditional teaching and learning, it is now emerging as a necessity, especially in emergency events. Many universities and schools around the globe have fully digitalized their operations with an understanding of the dire need of the pandemic. During this tough time, the concern is not about whether online education can ensure quality education; rather, it is more about how schools and institutions will be able to make effective use of online learning in such a massive manner (Carey, 2020).

Online education in China has increased exponentially during the outbreak. With the world’s largest student population of 176 million K-12 students, online education tools have been a natural progression for China, with impressive fiber, broadband and Internet coverage across the country and over 800 million Chinese citizens being Internet users (Kologrivaya & Shleifer, 2020). During the outbreak, online classes have been the main solution for schools and higher institutions to adapt to the changing situation, with an overnight shift of normal classrooms into e-classrooms. While some foresee that the unplanned and rapid move to online learning will result in poor learning experiences that are unconducive to sustained growth due to little professional training, insufficient bandwidth, and insufficient preparation, others are expecting to witness a new hybrid model of education with significant benefits. As Tao, the Vice President of Tencent Cloud and Vice President of Tencent Education, posited, “the integration of information technology in education will be further accelerated and that online education will eventually become an integral component of school education” (Li & Lalani, 2020).

Public opinions on new forms of technologies in education

It has been widely recognized that understanding the processes through which the public makes sense of technologies and develops responses to them is critical for the design and coordination of mechanisms for public engagement and participation (Macnaghtena et al., 2019). Researchers have empirically confirmed the importance of examining public opinion on new forms of technologies in education and identified relevant factors and issues (Zhou, 2020). For example, Giannakoulopoulos et al. (2019) tracked public opinion about online education by analyzing tweets on e-learning and found a prevalence of one-way promotional material over bidirectional discussions among users, which suggested a need for quality control of educational information on social media. More recently, Fu et al. (2020) used Python to analyze GDELT and Twitter data in order to examine public opinion about online education during COVID-19.

The growing theoretical literature concerning public opinion towards emerging technologies revealed three main approaches: individuals’ general level of knowledge and attentiveness, trust in institutional actors and regulatory bodies, and individuals’ values/ethical considerations (Weldon & Laycock, 2009). This echoed Kardooni et al.’s (2018) framework which comprised four variables: cost, knowledge, trust and intention to adopt a certain type of technology. The “Deficit Model” discussed individuals’ knowledge and understanding of new technologies with the main assumption that the public would embrace new technologies if it were more knowledgeable about the benefits and limited risks of the technology (Sturgis & Allum, 2004). Under this assumption, a concerted effort to raise public awareness and knowledge will be necessary to increase support for the adoption of technology (Bodmer, 1985).

The second approach, trust in institutional actors and stakeholders, concerns the extent to which the public relies on trusted institutional actors and stakeholders when deciding whether to support new technologies (Priest et al., 2003). In other words, individuals tend to look to trusted official sources to help them make decisions, particularly when they perceive a personal lack of knowledge in that area. This has been confirmed in previous studies (Barnett et al., 2007; Durant & Legge, 2005). To this extent, the public’s trust in technology stakeholders, government rules and regulatory bodies carries more weight than knowledge about science per se (Weldon & Laycock, 2009). The third approach, individuals’ ethical concerns and core values, includes their religious and moral inclinations. These orientations towards technology are embedded in spiritual beliefs, worldviews, and value positions (Cook & Fairweather, 2005). People’s affective orientations towards specific technologies have been found to be critical heuristics that help guide individual judgments (Weldon & Laycock, 2009), and greatly increase the explanation of variance in individual perceptions about technologies (Sjoberg, 2000).

The above theories provided guidelines when we sought to examine public opinions about technology. However, in the case of COVID-19, the use of technology in education is not an option but the only outlet that sustains learning opportunities. Public agencies, political entities, and research institutions may enable and constrain learning during a disease outbreak crisis (Muller-Seitz & Macpherson, 2014). Relative to users’ knowledge and religious/moral beliefs about new technology, the enforcement of online education by government and stakeholders of education in China carries the most weight, which could greatly affect the formation of opinions about online forms of learning. As Blumberg (2008) posited, the political power still dominates the country (China), despite the growth, openness, and diversification of religious beliefs. As such, the rudimentary form of a general model of public opinion towards online education in China is well reflected in the second approach as reviewed above. The extent to which the public trusts in government policies and technology stakeholders becomes more critical in determining how well or badly the public can accept, engage, and experience alternative learning modes.

Public perceptions about online education during crisis events

Recent three-factor theory and its extended factor theory identified a range of factors that influenced public opinion on education network. Each factor received a specific weight through the analytic hierarchy process, among which crisis events was assigned the second highest weight right after noumenon of public opinion (Li et al., 2021). Previous research on Ebola outbreaks has identified that knowledge and attitudes of students on disease transmission can help inform best practices for public health (Holakouie-Naeieni et al., 2015). During the SARS and H1N1 outbreaks of 2002 and 2008, Hong Kong implemented complete online schooling (Barbour et al., 2011), yet quite limited research was conducted regarding how students were impacted when schools had to close unexpectedly, indefinitely, and switched to online learning communities during a catastrophic and an urgent event (Unger & Meiran, 2020).

During COVID-19, several studies were conducted with large-scale social media datasets with the purpose of revealing how the public perceived and experienced online education. For example, Persada et al. (2020) explored public perceptions of online learning applications in Indonesia using Twitter data. Results showed that Indonesian students had a positive attitude towards online learning, and even had a significant willingness to recommend an online learning application when they found it useful to learn subjects taught at school. However, their study was conducted before the outbreak (from March to June 2019). The most relevant study was conducted by Chen et al. (2020) who collected online user experience data of seven major online education platforms in China within two periods: Nov 16, 2019 to Dec 16, 2019 and Feb 17, 2020 to March 17, 2020. They found that before the outbreak, users were concerned about the access speed, reliability, and timeliness of video information transmission of the platform, and the user experience of the Zoom Cloud platform was the best among all candidate platforms. After the outbreak, users mainly focused on course management, communication and interaction, learning and technical support services of the platform, and the user experience of the platform was the most important. Nonetheless, their data were restricted by the chosen learning platforms and did not address the semantic dimension of public opinions as expressed in the social networking platform. This present study aimed to fill in these research gaps.

Gender differences in social media on online education

Research on gender effects on the adoption of online learning has become more widespread in recent years (Hilao & Wichadee, 2017; Park et al., 2019). Classic research on gender differences in decision-making processes has indicated significant differences in schematic processing by males and females (Bem & Allen, 1974). In view of this difference, it is not surprising that males and females respond differently to the emergence of new types of technology. In a recent meta-analysis by Cai et al. (2017), males held more favorable attitudes toward technology use than females, and there was only minimal reduction in the gender attitudinal gap in general. But when the general attitude was broken down into different dimensions, smaller gender differences in attitudes towards technology use were only observed in affect (about personal feelings and emotions such as comfort, anxiety, or personal liking associated with technology use) and self-efficacy (about one's ability in utilizing technology), but not in belief (concerning the usefulness of technology and its social impact).

With regards to online education in particular, Park et al. (2019) verified the moderating effect of gender difference between perceived usefulness of technology and intention to use it for learning— the effect of perceived usefulness on intention to use was greater for males, suggesting that men tend to be highly task-oriented when it comes to the adoption of technology. Other researchers also found the effect of gender on the determinants of adopting a new technology. For example, Moghavvemia et al. (2017) found that the effect of social influence was more salient in females when forming the intention to use a new technology, since females tended to be more sensitive to the opinion of others. In addition, females were more concerned about the existence of support and facilitating conditions to use e-learning. We believe that the findings from this line of research would help us better understand the adoption mechanism and gender differences in employing technology for online learning. As such, an investigation of gender differences in the public’s attitudes and thoughts toward online education would also inform policy makers, school leaders and online learning designers of how to encourage and improve learning processes against potential gender obstacles.

Present study

Microblogs have been used extensively to collect public opinions on different topics, such as responses to public emergencies (Fang et al., 2020), food safety (Zheng et al., 2020), MOOCs (Zhou, 2020), global climate change (Dahal et al., 2019), and so forth. As users post updates about themselves or share information on such platforms (Naaman et al., 2010), microblogs can convey information reflecting the emotional states of their authors. Microblogs make it easy for the public to express opinions on different issues (Dong et al., 2017). The large-scale data generated from these platforms allow researchers to examine whether the results generated by smaller-scale studies stand up to scrutiny, enable investigations of samples drawn at random from large populations, or investigate questions that can only be answered by larger datasets (Kimmons & Veletsianos, 2018). These public sources reveal important technical, social, institutional, pedagogical, and other related challenges surrounding online education and can inform future research about important topics of the highest societal and public importance (Kovanović et al., 2015).

According to the 2020 Weibo User Development Report, Weibo had 523 million active users. Despite social media censorship in China, the relatively liberal nature of this social networking platform allows users to openly share opinions on the advantages and disadvantages of any educational ideology. With the aim to solicit the public opinions about the adoption of online learning in China during COVID-19, we chose Weibo, the Chinese equivalent of Twitter, for data collection. As such, Weibo is considered as encompassing microscopic instantiations of public sentiment and opinions.

This study distinguishes itself from all existing studies in online education in at least three aspects. First, the analysis of large-scale datasets enables a comprehensive view of the topic under investigation. It is timely as this information is needed for different stakeholders to assess, monitor, and even predict the acceptance, utilization and efficiency of online education. Second, during this critical historical period, the three distinctive periods allow us to present a chronological view of public opinions on online education. Through this view, a roadmap can be identified for future research and development based on public demand, especially in response to emergency occasions. Third, simply moving a learning community online does not mean that it automatically becomes less aggressive or free of the gender-related problems that plague traditional classrooms (Anderson, 2006). Potential gender differences in online education-related microblogs were examined in this study based on the sentiment values during different periods. Scholars have found that female online students had to cope with insufficient interaction with teachers, problems or frustration with the technology (Müller, 2008), and asynchronous text-based online discussion may well suit females who are shy or reticent or who prefer more time to digest the content (Gunn et al., 2003).

As such, the current study was guided by the following research questions:

(1) What was the overall online education-related microblog activity before, during and after the pandemic, respectively?

(2) What was the pattern of the sentiment values of public opinion on online education before, during and after the pandemic, respectively?

(3) Were there any gender differences in overall online education-related microblog activity and the pattern of sentiment values before, during and after the pandemic, respectively?

(4) What were the main topics as shown in online education-related microblogs before, during and after the pandemic, respectively?

Method

Data collection

Microblog data were extracted with a self-developed algorithm using the following search terms: 在线教育 (online education), 远程教育 (distance education), 远程学习 (long-distance learning), 互联网教育 (internet education), 电子学习 (e-learning), 手机教学 (mobile teaching), 网课 (online course/class), 多媒体学习 (multimedia learning), 在线课程 (online course), 在线学习社区 (online learning community), 网络教育 (web education), 网络教学 (web-based teaching), 微课教育 (Microlecture), 函授教育 (correspondence education), 联机学习 (online learning), 在线学习 (online learning), 网络学习 (e-learning), 网上学习 (web-based learning), 慕课 (MOOC), 网易公开课 (NetEase Online Open Courses), 学堂在线 (XuetangX), 沪江 (Hujiang), 酷学习 (Kuxuexi), 可汗学院 (Khan Academy), 好大学在线 (CNMOOC), 中国大学 (Chinese University), 新浪公开课 (Sina open course), 茱莉亚公开课 (Juilliard open course), CCTV中国公开课 (CCTV openCLA), 万门大学 (One-Man University), 百度传课 (Baidu Chuanke), 腾讯课堂 (Tencent Classroom), 优酷教育 (YouKu education), YY教育 (YY Education), 人民网公开课 (People’s Online Course), 优课联盟 (University Open Online Course), 高校一体化教学平台 (university integrated teaching platform), EduCoder在线实践教学平台 (EduCoder online practical teaching platform), 高校邦 (Gaoxiaobang), and 优学院 (Ulearning).

As reported in the White Paper by The State Council Information Office of the People’s Republic of China (2020), the outbreak of the COVID-19 pandemic was first revealed in July 2019 at Wuhan, Hubei Province. Since then, clusters of pneumonia cases of unknown etiology gradually emerged and confirmed cases were subsequently reported in other Chinese regions. A few months later, as the virus carriers traveled around the country, a nationwide spread signaled the peak of COVID-19 from 10th January 2020. Although a lockdown was imposed on Wuhan with travel restrictions, the number of daily confirmed cases rapidly reached 15,152 on 12th February. Efforts were made to consolidate gains in virus control, and reported cases were generally brought under control. At the beginning of May 2020, Chinese President Xi Jinping concluded that China had achieved a major strategic success in COVID-19 control efforts, and stressed that nationwide virus prevention was to be conducted on an ongoing basis. Schools were reopened and people resumed work while control measures remained in place until 30th November, by which time a “new normal” had been achieved. Therefore, public responses to online education were collected from Weibo according to these three milestones: from 1st July, 2019 to 9th January, 2020 (pre-pandemic phase); from 10th January, 2020 to 30th April, 2020 (amid-pandemic phase); and from 1st May, 2020 to 30th Nov, 2020 (post-pandemic phase), respectively.
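The three collection windows above can be expressed as a small lookup. The following is a minimal sketch rather than the authors' own code: the function name `assign_phase` and the `PHASES` table are hypothetical, and only the date boundaries are taken from the text.

```python
from datetime import date

# Phase boundaries as stated in the text (inclusive on both ends).
PHASES = [
    ("pre-pandemic", date(2019, 7, 1), date(2020, 1, 9)),
    ("amid-pandemic", date(2020, 1, 10), date(2020, 4, 30)),
    ("post-pandemic", date(2020, 5, 1), date(2020, 11, 30)),
]


def assign_phase(posted_on):
    """Return the phase label for a posting date, or None if it falls outside the study window."""
    for label, start, end in PHASES:
        if start <= posted_on <= end:
            return label
    return None


# Example: a microblog posted in mid-March 2020 falls in the amid-pandemic phase.
print(assign_phase(date(2020, 3, 15)))
```

Keeping the boundaries in one table makes it easy to check that the windows are contiguous and do not overlap.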

Within each specified timeframe, the relevant data were first collected and then processed using the MS Excel application. A second-level cleansing of the dataset was conducted by filtering out microblogs by zombie accounts (automated accounts with the purpose of inflating follower counts). All user identity details were removed to ensure anonymity and unbiased analysis. The final dataset included 212,335 microblogs by 87,100 users from 1st July, 2019 to 9th January, 2020, 1,048,575 microblogs by 538,021 users from 10th January, 2020 to 30th April, 2020, and 561,004 microblogs by 287,963 users from 1st May, 2020 to 30th Nov, 2020.

We retrieved gender information from user profiles on Weibo, which describe a series of personalized features of a specific user. These profiles present useful attributes including gender, age, occupation, and location (Yu & Yao, 2017). Although self-reported information might suffer from unintentional or intentional errors, this gender information retrieval method has been used widely in past research on social media (e.g., Li et al., 2014a, b; Yang et al., 2013). Other user information was not considered due to its unavailability (such as occupation) or incompleteness (such as age).

Data analysis strategies

As the process of determining the attitude or polarity of opinions (Liu, 2010), sentiment analysis has been applied to textual forms of opinions such as Facebook posts and comments (Hajhmida & Oueslati, 2021), public perceptions of chemistry (Guerris et al., 2020), (Santos et al., 2018), and the impact of research articles (Hassan et al., 2020), to name a few, to reveal behavioral and affective trends, and explain people's sentiments, attitudes, beliefs, and emotions (Persada et al., 2020). Sentiment analysis can be applied at three levels of detail: document, sentence, and entity/aspect (Dragoni et al., 2019). In this study, we adopted the BERT (Bidirectional Encoder Representations from Transformers) model with sentence pair input for an aspect-based sentiment analysis. BERT is a pre-trained language representation model based on deep learning techniques, proposed by a Google AI team in 2018 (Devlin et al., 2019). Unlike other language representation models, BERT can generate deep bidirectional representations from unlabeled input text. The model can achieve outstanding results for text recognition and classification (Sun et al., 2019): it encodes information from a huge amount of unlabeled data and is then fine-tuned on small supervised datasets specifically designed for certain tasks (Pota et al., 2021).

The sentiment analysis started with a pre-processing phase to remove spammers and commercially motivated posts from the dataset. The BERT model was then applied to the text in its pre-trained version, because pre-trained models avoid the time-consuming and resource-intensive process of training a model on microblogs from scratch, and allow effort to be focused on fine-tuning alone (Pota et al., 2021). The microblog data were divided into positive, negative, and neutral categories, and each microblog was tagged with a numerical sentiment value of −1, 0, or +1 to indicate negative, neutral, or positive polarity, respectively.
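The tagging scheme can be illustrated with a short sketch that maps category labels to the −1, 0, +1 values and summarises them as percentages, the form in which sentiment results are typically reported. This is an illustration under assumptions: the classifier output is taken as a plain list of label strings, and the names `LABEL_TO_VALUE` and `polarity_shares` are hypothetical, not part of the study's pipeline.

```python
from collections import Counter

# Numerical sentiment values as described in the text.
LABEL_TO_VALUE = {"negative": -1, "neutral": 0, "positive": 1}


def polarity_shares(labels):
    """Return the percentage of microblogs at each sentiment value (-1, 0, +1)."""
    values = [LABEL_TO_VALUE[label] for label in labels]
    total = len(values)
    if total == 0:
        # No microblogs: report zero shares rather than dividing by zero.
        return {-1: 0.0, 0: 0.0, 1: 0.0}
    counts = Counter(values)
    return {value: 100.0 * counts.get(value, 0) / total for value in (-1, 0, 1)}


# Hypothetical classifier output for four microblogs.
example = ["neutral", "neutral", "negative", "positive"]
shares = polarity_shares(example)
print(shares)
```

Computing the shares separately for each phase, and for each gender within a phase, would give comparisons of the kind reported in the sentiment results.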

In addition to sentiment analysis, content analysis was conducted to identify popular topics in online education-related microblogs. Content analysis has been well applied in the analysis of microblogs and online discussions on various topics, such as public transit (Casas & Delmelle, 2017), financial market (Li et al., 2018), and higher education (Pizarro Milian & Rizk, 2019). To provide deeper insight into how the community perceives online education in the Chinese educational landscape, especially over the period of COVID-19, microblog content was examined more closely to obtain a dynamic picture of how people reacted, and what elaborative ideas were contained in the microblogs and whether online education really remedied the urgent educational needs during the pandemic.

For content analysis, natural language processing (NLP) was first used for clustering messages related to a topic to identify the topics of concern (Zolnoori et al., 2019). The Latent Dirichlet Allocation (LDA) algorithm (Bicalho et al., 2017) was adopted for this purpose as it has often been used to automatically identify the latent semantic topics in unstructured textual data, such as tweets or microblogs (Chen et al., 2018). The “Gensim” package (Řehůřek & Sojka, 2010) in Python was used to implement the LDA model. We then used the Hierarchical Dirichlet Process (HDP, Teh et al. 2006) to infer the number of topics from the data (Bertalan & Ruiz, 2019; Wong et al., 2019). To extract the discussed topics from the dataset, the dataset was first cleaned with Gensim by tokenising each sentence into a list of words and removing punctuation and unnecessary characters. With the help of Gensim’s phrases model, we created bigram and trigram models in preparation for the next step of building the dictionary and corpus. LDA models were then built with different numbers of topics. Each topic was a combination of keywords, and each keyword contributed a certain weight to the topic. A flowchart is provided in Fig. 1 which describes the data analysis procedure.
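The preparation steps described above (tokenising, merging frequent word pairs, building a dictionary and a bag-of-words corpus) can be sketched in plain Python. This is a simplified stand-in and not Gensim itself: Gensim's phrases model scores candidate pairs statistically, whereas the sketch merges any adjacent pair that reaches a raw count threshold. All function names, the threshold, and the sample posts are hypothetical, and the input is assumed to be already segmented into whitespace-separated words.

```python
from collections import Counter


def tokenise(sentence):
    """Lowercase, replace punctuation with spaces, and split on whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in sentence.lower())
    return cleaned.split()


def merge_bigrams(docs, min_count=2):
    """Join adjacent token pairs that occur at least min_count times across all documents."""
    pair_counts = Counter(pair for doc in docs for pair in zip(doc, doc[1:]))
    frequent = {pair for pair, n in pair_counts.items() if n >= min_count}
    merged_docs = []
    for doc in docs:
        merged, i = [], 0
        while i < len(doc):
            if i + 1 < len(doc) and (doc[i], doc[i + 1]) in frequent:
                merged.append(doc[i] + "_" + doc[i + 1])
                i += 2
            else:
                merged.append(doc[i])
                i += 1
        merged_docs.append(merged)
    return merged_docs


def build_dictionary(docs):
    """Map each distinct token to an integer id, in order of first appearance."""
    token_to_id = {}
    for doc in docs:
        for token in doc:
            token_to_id.setdefault(token, len(token_to_id))
    return token_to_id


def to_bow(doc, token_to_id):
    """Return a sorted list of (token_id, count) pairs for one document."""
    return sorted(Counter(token_to_id[t] for t in doc).items())


posts = [
    "Online course today, homework later!",
    "The online course was long.",
    "Homework again...",
]
docs = merge_bigrams([tokenise(p) for p in posts])
dictionary = build_dictionary(docs)
corpus = [to_bow(d, dictionary) for d in docs]
print(docs[0], corpus[0])
```

A topic model then operates on `corpus`, with `dictionary` used to translate token ids back into keywords when the top tokens of each topic are inspected.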

Fig. 1 A flowchart of the data analysis process

Results

General description

Figure 2 shows the total number of microblogs about online education before, during and after COVID-19, respectively. In general, the number of microblogs increased dramatically during the pandemic, with the peak reaching 430,566 in March 2020. After the pandemic, the number of microblogs about online education dropped again to a level similar to that before the pandemic. The lowest number of microblogs (around 9000) appeared in Nov 2020. A closer look at the data showed that the dramatic increase in the number of microblogs about online education largely came from female users. Before the pandemic, females and males seemed to be equally “indifferent” to online education, but this pattern changed greatly during the pandemic. As shown in Fig. 3, the number of microblogs from females boomed during the pandemic whereas the number of microblogs from male users varied only to some extent. After the pandemic, the gender gap diminished, as indicated by the number of microblogs, which returned to the previous pattern.

Fig. 2 Total number of microblogs about online education before, during and after COVID-19

Fig. 3 Change in sentiment values of microblogs about online education before, during and after COVID-19

Sentiment analysis

The sentiment data were reorganized according to the three main periods of time to identify the public opinion trend over COVID-19. Figure 4 shows the percentages of public opinions based on sentiment value. Overall, the public expressed neutral perceptions of online education (on average 79.03%) with an extremely small portion of negative perceptions (on average 4.67%) before the pandemic. During the pandemic, however, the percentage of Weibo users holding neutral views about online education dropped to 33.04% while those with negative views increased to 51.63%. The percentage of positive views stayed the same. After the pandemic, the pattern largely returned to the previous state, with neutral views in the majority. Figure 5 provides a gender-based comparison of the change in positive, neutral, and negative public opinions. Over the outbreak, females’ negative opinions about online education increased from 5.53 to 19.17%, and did not drop very much after the pandemic (16.21%). Their positive views showed a steadily decreasing trend over the pandemic. In comparison, males seemed to hold a positive view on online education, regardless of the phase. Although the percentage of negative opinions increased over the outbreak for both genders, the share of negative opinions among females was almost double that among males.

Fig. 4 Total number of microblogs about online education before, during and after COVID-19 by gender

Fig. 5 Comparison across gender in the numbers of microblogs based on sentiment values before, during and after COVID-19

Content analysis

There is no universal rule for determining the number of topics existing in a dataset. To find the optimal number of topics, we followed most studies (Cha & Cho, 2012; Kobayashi, 2014; Vu et al., 2019) by using perplexity, which is based on the probability of an unseen test set normalized by the number of words, to evaluate the goodness of the LDA model (Neishabouri & Desmarais, 2020). To calculate the perplexity, we first trained and evaluated an LDA model on the dataset. This routine was then repeated for models with different numbers of topics, so that it became clear which number led to the lowest perplexity (Jacobi et al., 2016), as a lower perplexity indicates a better prediction (Blei et al., 2003). The value of perplexity for each number of topics is represented as “weight” in Fig. 6. For the pre-pandemic dataset, the optimal number of topics was set to 2 because the perplexity stopped decreasing once it reached 0.024. Similarly, the optimal number of topics was set to 4 for the amid-pandemic dataset because the weight stopped decreasing once it reached 0.028, and to 7 for the post-pandemic dataset because the weight stopped decreasing once it reached 0.020. Beyond these points, as the number of topics increased, the granularity of the topics became finer without contributing much more to our understanding of the content of each topic. We examined the top 15 tokens from each topic to represent the theme. Table 1 lists the word-level topics as identified by LDA.
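The selection rule described above can be made concrete with a short sketch. It uses the standard definition of perplexity, the exponential of the negative average log-probability per held-out word, and then picks the first number of topics after which the score stops decreasing. The function names and the example scores are hypothetical and are not the values reported in this study.

```python
import math


def perplexity(word_probs):
    """Perplexity of a held-out set, given the model probability of each word token."""
    if not word_probs:
        raise ValueError("need at least one word probability")
    log_likelihood = sum(math.log(p) for p in word_probs)
    # Normalise by the number of words, then exponentiate the negative average.
    return math.exp(-log_likelihood / len(word_probs))


def stop_point(scores_by_k):
    """Return the first number of topics k after which the score no longer decreases."""
    ks = sorted(scores_by_k)
    for current, nxt in zip(ks, ks[1:]):
        if scores_by_k[nxt] >= scores_by_k[current]:
            return current
    return ks[-1]


# A model that assigns probability 0.25 to every held-out word is as uncertain
# as a uniform choice among four words, so its perplexity is (approximately) 4.
uniform_score = perplexity([0.25, 0.25, 0.25, 0.25])

# Hypothetical scores for models with 1 to 4 topics: the score stops falling after k = 2.
best_k = stop_point({1: 0.9, 2: 0.5, 3: 0.6, 4: 0.4})
print(uniform_score, best_k)
```

Stopping at the first point where the score no longer falls favours a small, interpretable number of topics over the global minimum, which matches the preference for coarser topics stated in the text.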

Fig. 6 Optimal number of topics by LDA

Table 1 Topics generated by LDA analysis (translated into English)

We integrated semantically related information into one theme. This led to a more abstract categorization of the content compared to the machine-detected categories. Before the pandemic, the first topic was mainly concerned with the purposes and functions of online education, in a positive tone. Participants also mentioned the instrumental value of learning online, such as for graduate study. The second topic was more related to personal experience of using online education, both in a positive (e.g., success) and negative (e.g., pressure) way. During the pandemic, the number of topics increased to 4. Given the outbreak, the terms that emerged reflected a special concern with schools, teachers, students and curricula. For example, Topic 1 in this period explicitly included such terms as “prevention and control”, “pandemic”, and “China”. This reflected the close relationship between online education and the external environment at the macro level. Topic 2 was more related to the users of online education, including teachers and students, and their education and future. Topic 3 was more personal, covering aspects such as personal mood and growth. The last topic during this period was more school-related, including terms such as “homework”, “course”, and “internet surfing”.

After the pandemic, despite the decreasing number of microblogs, the topics that emerged from Weibo were even more varied. Among the seven topics, Topics 1 and 4 were more political than the others, as shown by the key terms “resisting”, “education bureau” and “resistance”, which responded to the government's suspension of schools during the outbreak. Topic 2 was about resuming face-to-face classes after the pandemic. Topic 3 seemed to be more concerned with one’s life at a general level in relation to the use of online education, such as personal growth, life taste, and so on. Topic 5 touched on the multiple consequences of adopting online education, such as learning English and popularizing science. Topic 6 was more related to how students made fuller or better use of online education, mentioning instructors and studying, with a particular focus on personal experiences such as working hard. Topic 7 comprised technology-related posts about online education, with terms such as “tablet” and “zoom in/out”. Positive emotional terms mainly appeared in Topic 3, as shown by the keywords “passion, active”, whereas some negative emotions, such as “anxiety”, were expressed in Topic 5.

Discussion

This study examined Chinese public responses to online education based on social media data. To address the first research question (Research Question 1) about the overall trend over COVID-19, the results suggested that public interest in distance education remained fairly stable both before and after the pandemic, as reflected in the steady number of microblogs. When we entered the pandemic phase, the number of microblogs increased almost threefold within January 2020. The peak was seen in March 2020, when the whole country closed schools and institutions switched to online learning completely. This shows that emergency situations influence public concerns about education to a great extent (Li et al., 2021). When the public is engaged in a form of education, they are eager to express their thoughts and opinions. This pattern was particularly salient in females (Research Question 3). Further examination of the limited information on users’ age showed that over 80% of the users who were engaged in Weibo activities were post-90s and post-00s females, who were most likely in their student phase of life. Yan et al. (2021) reported that females were more vulnerable to developing mental or psychological problems in response to life stressors or potentially traumatic events. Anxious individuals tended to use social media more often to actively seek a path to adapt to the current pandemic situation (Cauberghe et al., 2021).

With regard to the second question (Research Question 2) on the sentiment values of public opinion on distance education in China, the results revealed that the share of positive views on adopting online forms of teaching and learning remained steadily low (between 10 and 20%), regardless of the phase of COVID-19. The most dramatic change in sentiment was found in the percentage of those holding negative perspectives towards online education. During the pandemic, this percentage was more than 10 times that before the outbreak. This result aligned with a recent study on Chinese user satisfaction with selected online education platforms during the pandemic (Chen et al., 2020), but stood in contrast with past studies that examined the public view of online learning such as MOOCs. In a series of time-series studies (Zhou, 2020; Costello et al., 2016), negative opinions of MOOCs seemed to remain stable and accounted for only a small portion. Research has shown that remote learning can be as good as or better than in-person learning for the students who chose it (Fitter et al., 2020). It appeared that the public became more negative when they were “forced” to engage in and adapt to online education on an involuntary basis during the pandemic. This can be explained by self-determination theory, which holds that one’s autonomy in making decisions about learning (e.g., choosing the learning mode) determines learners’ attitudes to a great extent (Deci & Ryan, 1985). Indeed, learning theories have articulated the downside of requiring learners to adopt a given mode of learning, which jeopardizes their motivation as well as their attitude. People feel more autonomous when they are able to choose between options (Katz & Assor, 2007). This sense of autonomy promotes successful learning experiences and satisfaction (Ghanizadeh, 2016).

To explore this issue further, we examined potential gender differences in microblog activities based on the sentiment patterns (Research Question 3). The results revealed that females’ negative opinions were the main driver of the above observation during the pandemic, with the percentage of females nearly double that of males. This was in line with Cai et al.’s (2017) meta-analysis and Shapiro et al.’s (2017) sentiment analysis on gender and attitudes toward technology use, which found that males held more favourable attitudes toward technology use than females. It appeared that female Weibo users grew in number and became more negative while engaged in online education during the outbreak, and this trend even continued into the post-pandemic phase. More recently, Prowse et al. (2021) also observed that female students were more likely than male students to find the transition to online learning difficult during this critical period and to report that COVID-19 had negatively impacted their learning.

Subsequent content analyses sought to address the potential model to be developed based on the content of microblogs (Research Question 4). The keywords extracted from the Weibo dataset by LDA analysis generated different user profiles, which can be classified into a threefold typology based on views of how distance education functioned over COVID-19. Before the pandemic, the positive tone in both topics confirmed the power of online education to help people raise their standard of living (as seen in taking postgraduate and English language exams) in the hope of bringing positive changes to their lives. We thus profiled them as “affirmative” users. During the pandemic, more attention was paid to school, curricula, teachers and students. Apparently, online education was integrated into daily practice during this special phase, rather than being an option for furthering one’s study and growth as it was before the pandemic. When online education became a dominant mode of teaching and learning, the public was more concerned with its curriculum design and learning effectiveness. We thus profiled them as “achievement-based” users. After the pandemic, the public attended to different aspects of online education. Some people cared more about the sociopolitical consequences of adopting online education and suspending or re-opening schools. Some reflected on how this decision affected life in general. Others were more concerned with the technological aspects of learning online, while the rest still viewed it as part of school education. We thus profiled them as “critical” users.

Caution is needed when interpreting our results. First, microblogs are not the only outlet through which the public can voice its opinions. Many netizens also opted for the more private WeChat Moments for such purposes (Li et al., 2014a, b). Thus, although a considerably large number of microblogs were analyzed, it could be argued that the sample was still insufficient to generalize the observed trends. Personal websites, chat boards, discussion forums, and other types of social media can also provide invaluable information for a more solid conclusion. Further, one may argue that the topics obtained through LDA might not be optimal and that different hyperparameters could have generated more coherent topics. Unfortunately, the literature on this issue gives very little advice on how such optimization can be achieved (Péladeau & Davoodi, 2018). While Mimno et al. (2011) proposed to optimize LDA through internal or external coherence measures, optimizing topic modeling this way tended to favor topics lacking independence. Last, the analysis of key terms and word frequency alone may not be able to capture the full picture. As Tsou (2015) posited, studying such public data from social media needs to include the complexities of human dynamics and spatiotemporal analysis. In future studies, other data sources, such as surveys, could be considered (O’Connor et al., 2010). To analyze and visualize social media data more effectively, we could combine, integrate, and cross-reference multiple data sources and explore their dynamic spatiotemporal patterns in visual graphs (such as a visual cluster analysis, Zhao et al., 2020, or word shift graphs, Cody et al., 2015). By doing so, we can generate big ideas, big impacts, and big changes toward the future development of research in this area.

Conclusions and implications

Online education has traditionally been viewed as an alternative pathway, one that is particularly well suited to adult learners seeking higher education opportunities. However, the emergence of the COVID-19 pandemic has required educators and students across all levels of education to adapt quickly to virtual courses. The pandemic has forced the world to engage in the ubiquitous use of virtual learning (Lockee, 2021). Despite the above limitations, the user-generated large-scale dataset in this study provides a holistic view and can be effectively used to understand and possibly improve learners’ experiences online by capturing their most important concerns, compared to purely instrument-driven survey-based studies (e.g., Bakhmat et al., 2021) which restrict the expression of public opinion to a certain reference frame and small-scale studies involving only selected online learning platforms (e.g., Chen, Peng, et al., 2020).

Academic implications

The current study makes several unique contributions to the existing online education literature. First, this study focuses on the context of China, which was largely neglected in previous studies of public views of online education (Giannakoulopoulos et al., 2019; Fu et al., 2020). The present study provides insights and offers diverse perspectives for online education research so as to enable the field to take a more active role in societal conversations of interest. The scale of the datasets yields a large volume of data, and the analysis of these data (Kimmons & Veletsianos, 2018) provides a useful and timely tool to assess the orientation of Chinese public opinion toward new forms of teaching and learning.

Another innovation of this research is the attention to the pandemic period, a historic phase. To our knowledge, this is the first attempt to analyse public opinions about online education in China during this period. By attending to the emotional orientations of online posts over the outbreak, researchers are better informed of the developmental trend of online interaction over such an abrupt transition to online learning. Subsequent gender analyses highlighted the need to attend to females’ versus males’ attitudes and thoughts about online education. It was encouraging to see that more females in China were active in voicing their perceptions and thoughts, although they expressed them in a more negative manner than their male counterparts, which opens new territory for revisiting this issue.

Last, the user profiles generated by our LDA analyses pointed to the need to attend to the variation of public opinion about online education both temporally and laterally, as well as to its direction, when we attempt to establish such a model. As past models of public opinion tended to focus on one temporal dimension or on multiple dimensions within the same timeframe, we proposed the possibility that the public opinion model could vary substantially across periods of time, and that the context within each period makes a case for re-examining the content and trend of public opinion about the same issue or phenomenon.

Practical implications

The findings also have important implications for different stakeholders in online education. First, and fundamentally, the results from the current study confirm that Chinese users as a whole became more critical of the idea of online education, especially after the pandemic. The experience of a forced and hurried review of instructional approaches provided a rare opportunity to reconsider and reformulate strategies that best facilitate learning within the affordances and constraints of the online context (Lockee, 2021). In particular, the expectation that online learners be highly self-disciplined and self-regulated is difficult to meet (Gillett-Swan, 2017). “Sleepy” and “anxious” also appeared among the keywords in our content analysis. Thus, a greater variety of online teaching and learning activities is needed to maximize the value of “staring at screen” during lengthy online sessions.

While we acknowledge the value and benefit of online education, the negative views shown in our datasets alert us to remain critical of it. As online education promises to be a pervasive force that can meet our emergent needs during unexpected events, questions about how to reach the financially disadvantaged population should be central to the field, as we do not want to sacrifice the effort to develop a more just and fair educational system. One topic that emerged from our content analysis suggested that access to technology is a major concern in enabling this form of education. This is an even bigger challenge than the concern with how to improve online teaching and learning pedagogy.