1 Introduction

Massive open online courses (MOOCs), which deliver high-quality education with unprecedented cost-effectiveness and worldwide accessibility [1], have led researchers to explore diverse topics such as MOOC classification, learning engagement, and concept recommendation. For example, drawing on 102,184 reviews across 401 MOOCs, Chen et al. [2] devised DNN-powered models to automatically differentiate a set of semantic groupings. Wei et al. [3] examined the correlations between motivation, perceived support, engagement, and self-regulated learning strategies concerning learners’ perceived outcomes in MOOCs using an online survey of 546 participants. Gong et al. [4] focused on concept recommendation driven by reinforcement learning and heterogeneous information networks, leveraging the interactions between users and knowledge concepts, and among users, courses, videos, and concepts. Gong et al. [5] proposed an attention-driven, heterogeneous graph convolutional deep knowledge recommender designed to suggest knowledge concepts within MOOCs; the recommender harnessed content and contextual information to learn entity representations through graph convolution networks.

The proliferation of MOOCs has generated concerns about and prompted extensive research on courses’ quality and effectiveness. Currently, MOOC websites provide only coarse overall ratings, making it difficult to differentiate between courses, particularly when several courses display the same overall rating. There is therefore a pressing need for detailed insights into the performance of MOOCs against different evaluation criteria [6]. As learners have diverse needs, some may prioritize assessment aspects, while others may value interaction. By displaying course performance across different dimensions, learners can identify courses that excel in the criteria most relevant to their preferences.

Traditional top-down methods for MOOC course evaluation include surveys, expert interviews, and literature reviews [7,8,9]. Survey-based methods are frequently used to explore learner engagement, satisfaction, and the intention to continue with courses. For instance, Dai et al. [9] revealed that learners’ attitudes significantly impacted their intention to continue, and that MOOC instructors should be cautious in their course promotions to avoid overemphasizing benefits. Similarly, based on 622 structured questionnaires from undergraduate students in Malaysia, Albelbisi et al. [8] revealed that (1) system quality positively influenced satisfaction, (2) satisfaction and service quality positively affected self-regulated learning, and (3) system quality affected self-regulated learning through satisfaction. Interviews, although involving fewer participants, offer the advantage of flexible and in-depth questioning that yields deeper insights into learners’ motivations, course completion, and satisfaction levels. For instance, Zhu et al. [7] carried out semi-structured interviews with 15 online learners, revealing that learners’ contentment was influenced by course design factors (such as well-structured organization) and instructional methodologies (such as instructor presence). However, surveys and interviews suffer from delayed feedback, time and cost constraints, expert-centricity, and a lack of learner perspectives. To address these limitations, adopting a scientific approach for systematically and objectively assessing the quality of online courses becomes crucial. Such an approach is pivotal in diagnosing and improving online course quality.

Due to the progress in big data and text mining methodologies, scholars have shifted their focus towards utilizing online course review data to obtain valuable insights for enhancing course quality [10]. This approach facilitates a more comprehensive understanding of learners’ emotions, actions, course acceptance, platform comparisons, and prevailing trends, contributing to a better grasp of the crucial elements influencing MOOC success [11]. However, existing methods for determining online course quality evaluation indicators and/or their weights often rely on group decision-making [1, 12, 13], resulting in poor adaptability and applicability for fine-grained evaluation of different types of courses. Expert simulation evaluations and the use of pre-determined indicators may not accurately reflect learners’ experiences and requirements, resulting in a lack of learner-centeredness in course evaluation. Thus, there is a need to leverage crowdsourced data from MOOC platforms, artificial intelligence (AI), and text analysis techniques to empower researchers to effectively tap into the collective wisdom and expertise of learners to enhance course design and learner contentment [14].

To address deficiencies in prior studies, this study focuses on learners’ learning experiences and needs, breaking away from the reliance on experiential judgment and excessive human subjectivity in traditional online course quality research. The objective is to leverage unstructured data from student feedback texts, utilizing text mining and hierarchical structure modeling to develop an intelligent evaluation model and implementation plan for online course quality.

Accordingly, the present study formulates three research questions (RQs):

  • RQ1: What factors will be included in the multi-criteria decision-making framework? What are the similarities and differences between the selection of knowledge- and skill-seeking courses?

  • RQ2: What are the most influential factors in course selection? What are the similarities and differences between the selection of knowledge- and skill-seeking courses?

  • RQ3: What are the ratings of courses and how do they rank according to sub-criteria and criteria?

This research endeavors to answer these questions by presenting a novel crowdsourcing technique that utilizes text mining and the analytic hierarchy process (AHP) to automatically evaluate MOOCs. The evaluation is based on the aggregation of learners’ reviews collected from 169 MOOCs on Class Central. Considering RQ1, this study integrates topics (sub-criteria) identified through topic modeling and interpreted under the framework of transactional distance and technology acceptance theories to form the hierarchical structure of MOOC evaluation criteria. Regarding RQ2, this study leverages the probability distribution of topics (sub-criteria) identified through topic modeling to weight the relative importance of the criteria. Subsequently, addressing RQ3, based on the established hierarchical structure and the respective criterion weights, this study employs AHP to rank the online courses and determine their overall relative advantage as well as their relative advantage within each criterion. By doing so, this study offers a fine-grained, text-mining-based analysis methodology for large-scale online course quality evaluation, providing the technical support needed for further investigations in this area.

2 MOOC evaluation based on review mining

Currently, online course quality evaluation research primarily focuses on the construction of evaluation indicators and models. These studies often rely on traditional methods such as literature review, questionnaire surveys, and expert scoring. However, these methods are time-consuming and costly, and they are subject to experiential evaluations and subjective interventions, resulting in poor adaptability of indicators and difficulty in conducting detailed evaluations for different course types [12, 15].

Consequently, developing theory-informed approaches to assessing the quality of online courses by mining student feedback has garnered considerable interest. The main achievements in this field involve using topic modeling to automatically identify latent evaluation topics from text data [16,17,18]. By combining topic modeling findings with theoretical examination, course evaluation indicators can be formulated. These indicators can then be hierarchically aggregated according to their affiliations to form a multi-level analytical structural model, which establishes a theoretical framework for course evaluation that authentically reflects learners’ demands and effectively guides evaluation practice.

Traditional research on course quality evaluation often relies on qualitative group decision-making and expert simulation evaluation methods, neglecting the role of learners as the primary stakeholders. Moreover, the use of pre-determined indicator factors often results in measurement items that fail to accurately reflect learners’ learning experiences and needs, making it difficult to achieve large-scale, normalized, and continuous course quality monitoring. However, learners’ perceived learning experiences are crucial references for online course design and quality improvement. Therefore, it is necessary to focus on learners’ experiences and needs and use text mining in course evaluations to provide important foundations for the evaluation of online teaching effectiveness from the learners’ perspective [1].

However, existing research on online course quality evaluation using text mining often relies on subjective weighting methods such as AHP [19] to obtain indicator weights [1]; such methods depend heavily on expert opinions and judgments and fail to reflect learners’ true experiences and needs. In contrast, using the estimated popularity of evaluation topics obtained via topic modeling as the weights of course evaluation indicators captures learners’ levels of interest, achieves automated customization of indicator weights, and enables automatic ranking of the courses to be evaluated [20].

3 Data preparation

3.1 Review dataset collection

The MOOC reviews were scraped from Class Central using a self-developed crawler and parsed into Excel files for further processing. After removing duplicates and MOOCs with fewer than 20 review comments, non-English reviews were identified and excluded using the Python package “langid”, a standalone language identification tool. TextBlob, a Python library, was then used to automatically check and correct spelling; for instance, “I havv goood speling” was corrected to “I have good spelling”. Two types of courses were considered in this study. The first relates to Art, Design, and Humanities, which is generally a “knowledge-seeking” domain. The second relates to Computer Science, Engineering, and Programming, which is generally a “skill-seeking” domain. As a result, a total of 52,881 reviews were obtained.
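To illustrate, the language filtering and spelling-correction steps described above can be sketched in Python. This is a minimal sketch, assuming the scraped reviews sit in a pandas DataFrame with a column named “review” (the DataFrame layout and column name are illustrative assumptions, not details from the study); langid and TextBlob are the packages named in the text.

```python
import langid                  # standalone language identification
import pandas as pd
from textblob import TextBlob  # spelling checking and correction


def keep_english_and_correct(df: pd.DataFrame, text_col: str = "review") -> pd.DataFrame:
    """Drop non-English reviews, then spell-correct the remaining ones."""
    # langid.classify returns a (language_code, score) tuple
    is_english = df[text_col].apply(lambda t: langid.classify(t)[0] == "en")
    english = df[is_english].copy()
    # TextBlob(...).correct() returns a spelling-corrected copy of the text
    english[text_col] = english[text_col].apply(lambda t: str(TextBlob(t).correct()))
    return english


# The example given in the text:
print(TextBlob("I havv goood speling").correct())  # -> "I have good spelling"
```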

3.2 Helpful review identification

Among the 52,881 reviews, 4407 had a helpful vote value. For these 4407 reviews, this study followed O’Mahony and Smyth [21] in defining the top 75%, ranked by the number of helpful votes received, as helpful reviews, while the rest were treated as unhelpful. An examination of 50 randomly selected unhelpful reviews suggested that they mostly lacked valuable information pertaining to specific aspects of MOOC courses. Examples included: “great class”, “the class was amazing”, “omg so good”, “I loved this class”, “one of the best I have ever taken”, “strongly recommended”, and “I don’t know what to say”. As a result, 3305 helpful reviews (labeled “1”) and 1102 unhelpful reviews (labeled “0”) were randomly divided into training and testing datasets to train and test a classifier based on a Naive Bayes model with word-level TF-IDF features, which was then used to automatically predict labels (“1” or “0”) for the 48,474 reviews without a helpful vote value. The prediction results, combined with the previously labeled sample, comprised the D1 and D2 datasets for (1) Art, Design, and Humanities and (2) Computer Science, Engineering, and Programming courses, respectively. The numbers of courses and reviews for the D1 and D2 datasets are presented in Table 1. Specifically, the D1 dataset comprised 63 courses and 6940 helpful reviews, accounting for 37.28% and 13.44% of the total numbers of courses and helpful reviews included, respectively. The D2 dataset comprised 106 courses and 44,697 helpful reviews, accounting for 62.72% and 86.56% of the totals, respectively.
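A hedged sketch of this classification step is shown below, using scikit-learn’s MultinomialNB and TfidfVectorizer as one plausible realization of “a Naive Bayes model with word-level TF-IDF”; the exact implementation, split ratio, and hyperparameters used in the study are not reported, so those shown here are assumptions.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB


def label_unvoted_reviews(labeled_texts, labels, unlabeled_texts):
    """Train on the vote-labeled reviews (1 = helpful, 0 = unhelpful) and
    predict labels for reviews that received no helpful votes."""
    # word-level TF-IDF features
    vectorizer = TfidfVectorizer(analyzer="word")
    X = vectorizer.fit_transform(labeled_texts)

    # hold out a test split to check the classifier (the ratio is an assumption)
    X_train, X_test, y_train, y_test = train_test_split(
        X, labels, test_size=0.25, random_state=42, stratify=labels)

    clf = MultinomialNB().fit(X_train, y_train)
    print(classification_report(y_test, clf.predict(X_test)))

    # predict "1"/"0" for the reviews without a helpful vote value
    return clf.predict(vectorizer.transform(unlabeled_texts))
```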

Table 1 Number of courses and reviews for the D1 and D2 datasets

3.3 Data preprocessing

To preprocess the data for topic modeling, six sequential steps were executed: (1) tokenization and exclusion of special characters, (2) normalization, (3) elimination of stop words, (4) lemmatization, (5) term selection, and (6) formation of a term-document matrix. In step 1, every sentence extracted from the review comments was split into a word list, with special characters and punctuation excluded. Step 2 converted uppercase letters to lowercase. Step 3 played a pivotal role in the preprocessing process: numbers, punctuation, symbols, and stop words (e.g., “me”, “I”, “or”, “him”, “a”, and “they”) were excluded, as they “appear frequently and are insufficiently specific to represent document content (p. 976)” [22]. Step 4 analyzed the vocabulary and morphological aspects of words, with the primary objective of removing inflectional endings to obtain the root form. For instance, the lemma of “courses” is “course”, “assessing” is transformed into “assess”, and “mice” becomes “mouse”. This lemmatization process holds great importance in text mining. Another common method for reducing terms to their base form is stemming. However, stemming was not utilized, as it often collapses derivationally related words [23]. For instance, the stem of “organized” would be “organ”, while the lemma remains “organize”. In this case, stemming may lead to challenges in correctly interpreting term stems. After lemmatization, less significant terms were excluded in step 5 using TF-IDF. In step 6, we created a term-document matrix representing the occurrence of terms (rows) in documents (columns), thereby constituting the corpus for the STM algorithm.
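The six steps can be sketched as follows. This is a minimal sketch, assuming NLTK for stop-word removal and lemmatization and scikit-learn for the TF-IDF-based term selection; the thresholds (min_df, max_features) are illustrative placeholders rather than values reported in the study.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer
from sklearn.feature_extraction.text import TfidfVectorizer

nltk.download("stopwords")
nltk.download("wordnet")

stop_words = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()


def preprocess(review: str) -> list:
    # Steps 1-2: tokenize, drop special characters/punctuation/numbers, lowercase
    tokens = re.findall(r"[a-z]+", review.lower())
    # Step 3: remove stop words
    tokens = [t for t in tokens if t not in stop_words]
    # Step 4: lemmatize ("courses" -> "course"); stemming is deliberately avoided
    return [lemmatizer.lemmatize(t) for t in tokens]


# Steps 5-6: TF-IDF-based term selection and the term-document matrix
# (min_df and max_features are placeholders, not values from the study)
vectorizer = TfidfVectorizer(tokenizer=preprocess, lowercase=False,
                             min_df=5, max_features=5000)
# reviews = [...]                            # helpful reviews from D1 or D2
# tdm = vectorizer.fit_transform(reviews).T  # terms as rows, documents as columns
```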

4 Methods

4.1 Creating STM models

In our analysis of MOOC reviews, we adopted the structural topic model (STM) [24] as it offers notable advancements compared to latent Dirichlet allocation (LDA). LDA and STM are both Bayesian generative frameworks employed in the field of topic modeling. They posit that a topic embodies a probability distribution over words, and each document comprises a blend of topics spanning the entire corpus [25]. However, STM presents significant enhancements by incorporating document-level structural information, which influences the prevalence of topics at the document level (i.e., per-document topic proportions) and the distribution of words within a topic (i.e., topic-word distributions). This approach allows for a more targeted examination of the influence of covariates on the text content.

Figure 1 presents a graphical representation of the technical distinctions between STM and LDA. Each variable is depicted as a node. The nodes without shading indicate latent variables, whereas the nodes with shading denote observable variables. The rectangular shapes denote replication: \(n\in \{1,2,\ldots,N\}\) signifies the indexing of words within a document; \(k\in \{1,2,\ldots,K\}\) is the indexing of each topic, based on a predefined number of topics, \(K\); and \(d\in \{1,2,\ldots,D\}\) denotes the indexing of documents. Additionally, Fig. 1 depicts that solely node \(w\) (i.e., words within documents) is directly measurable in the two models. Therefore, the primary objective lies in deducing the concealed topic information from the observable words, \(W\), and producing matrices: document-topic proportions, \(\theta\), and topic-word distributions, \(\beta\).

Fig. 1 Plate diagram comparison of LDA and STM adapted from [26]

In this study, the textual information from the helpful reviews in D1 and D2 served as the input for topic modeling. For both datasets, we generated 36 distinct STM models by varying \(K\) within the range of 5 to 50.

4.2 Exclusivity and coherence measures

This study selected models by considering exclusivity and semantic coherence metrics, alongside manual verification. Mimno et al. [27] introduced semantic coherence as a measure closely associated with pointwise mutual information [28]. This criterion reaches its peak when the most likely words within a particular topic frequently appear together. Mimno et al. demonstrated a strong correlation between semantic coherence and human evaluations of topic quality. To formalize this, consider \(D({v}_{i},{v}_{j})\) as the count of documents in which words \({v}_{i}\) and \({v}_{j}\) co-occur. The semantic coherence for topic \(k\) is given by Roberts et al. [29] as Eq. (1), where \(M\) represents the top \(M\) most likely words within topic \(k\). Each model’s aggregate coherence score is computed by determining the coherence of each topic separately and subsequently averaging these individual values.

$$C_k = \mathop \sum \limits_{i = 2}^M \mathop \sum \limits_{j = 1}^{i - 1} \log \left( {\frac{{D\left( {v_i ,v_j } \right) + 1}}{{D\left( {v_j } \right)}}} \right)$$
(1)
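For concreteness, Eq. (1) can be evaluated directly from a document-term matrix. The study computes these scores with the stm R package’s manyTopics function, as noted later in this section; the NumPy sketch below is only an illustrative re-implementation of the formula, with the matrix layout assumed.

```python
import numpy as np


def semantic_coherence(dtm: np.ndarray, top_word_ids: list) -> float:
    """Semantic coherence of one topic, following Eq. (1).

    dtm: documents x vocabulary count matrix
    top_word_ids: indices of the M most probable words of the topic,
                  ordered from most to least probable
    """
    present = (dtm > 0).astype(int)
    doc_freq = present.sum(axis=0)     # D(v_j): number of documents containing v_j
    co_occur = present.T @ present     # D(v_i, v_j): joint document counts

    score = 0.0
    for i in range(1, len(top_word_ids)):
        for j in range(i):
            v_i, v_j = top_word_ids[i], top_word_ids[j]
            score += np.log((co_occur[v_i, v_j] + 1) / doc_freq[v_j])
    return score

# A model-level score is the average of the per-topic coherences.
```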

Exclusivity measures the degree to which the primary words within a topic are unique to that specific topic and not prevalent among others. This value is essentially an average, calculated for each top word, of the probability of that word within the topic divided by the total probability of that word across all topics. The FREX metric [30] quantifies exclusivity while also considering word frequency. FREX is the weighted harmonic mean of a word’s rank in terms of exclusivity and frequency [29], as shown in Eq. (2). ECDF stands for the empirical cumulative distribution function, \(\omega\) is the weight (typically set to 0.7 to prioritize exclusivity), \(k\in K\) denotes the \(k\)th topic, \(v\) signifies the word being evaluated, and \(\beta\) refers to the topic-word distribution for that specific topic. The cumulative distribution function of a real-valued random variable \(X\), computed at \(x\), indicates the probability of \(X\) assuming a value less than or equal to \(x\); the ECDF is the corresponding distribution derived from the sampled dataset rather than the entire population.

$${\text{FREX}}_{k,v} = \left( {\frac{\omega }{{{\text{ECDF}}\left( {\beta_{k,v} /\sum_{j = 1}^K \beta_{j,v} } \right)}} + \frac{1 - \omega }{{{\text{ECDF}}\left( {\beta_{k,v} } \right)}}} \right)^{ - 1}$$
(2)
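Again, the stm package computes FREX internally; the following NumPy sketch merely evaluates Eq. (2) from a topic-word matrix \(\beta\) to make the rank-based ECDF explicit (the matrix layout is assumed).

```python
import numpy as np


def frex(beta: np.ndarray, omega: float = 0.7) -> np.ndarray:
    """FREX scores (Eq. 2) for every topic-word pair.

    beta: K x V matrix of topic-word probabilities (each row sums to 1)
    omega: weight on exclusivity, 0.7 as noted in the text
    """
    def ecdf(values: np.ndarray) -> np.ndarray:
        # empirical CDF of each entry, evaluated within its own vector
        ranks = np.argsort(np.argsort(values)) + 1
        return ranks / len(values)

    # exclusivity: a word's probability in topic k relative to all topics
    exclusivity = beta / beta.sum(axis=0, keepdims=True)

    scores = np.zeros_like(beta)
    for k in range(beta.shape[0]):
        scores[k] = 1.0 / (omega / ecdf(exclusivity[k]) +
                           (1.0 - omega) / ecdf(beta[k]))
    return scores
```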

In this study, both the coherence and exclusivity of each topic within a model were computed using the manyTopics function of the stm R package [29]. These values were then averaged across all topics to derive the model’s overall score. Models exhibiting elevated exclusivity and semantic coherence were typically favored. For D1, among the 36 estimated models, eight (K = 19, 20, 21, 22, 23, 24, 25, and 26) surpassed the others and were chosen for further comparison. Subsequently, a qualitative assessment of these eight models was performed by examining terms and reviews to identify the most suitable model, which turned out to be the one with 22 topics. Hence, based on both quantitative and qualitative evaluations, the model with 22 topics was selected as the final choice. Employing a similar approach, the optimal topic model for D2 was found to be the one with 24 topics. Figures 2 and 3 present the outcomes of model diagnosis using exclusivity and coherence for D1 and D2.

Fig. 2 Model diagnosis based on exclusivity and coherence for the D1 dataset

Fig. 3 Model diagnosis based on exclusivity and coherence for the D2 dataset

4.3 Topic representation and assignment

The findings from the optimal models on D1 and D2 were used to identify the discriminant terms (i.e., the most commonly employed words), organized in descending order of their frequency within each topic. For every topic, the topic proportion was computed as \(P_k=\left(\sum_{d}\theta_{d,k}\right)/D\), based on the matrix linking course reviews and topics in terms of proportions. In this equation, \(P_k\) represents the overall prevalence of the \(k\)th topic within the data corpus, \(\theta_{d,k}\) represents the prevalence of the \(k\)th topic in the \(d\)th course review, and \(D\) indicates the total number of course reviews.

To provide a more concise view of the topics, we needed to condense and categorize them. As indicated in Chang et al. [31], accomplishing this task necessitates human judgments and interventions to assign appropriate labels or titles to the topics. This is determined by the semantic relatedness of important terms within the topic matrix, ensuring a more accurate representation of the topic contents. In the quest to capture shared opinions in review comments, [32] identified six coding categories derived from transactional distance theory to grasp learners’ encounters in MOOCs, consisting of “structure”, “videos”, “instructors”, “content”, “interaction”, and “assessment”. In line with technology acceptance theories, Du and Gao [33] developed four primary factors, encompassing usefulness, enjoyment, technicality, and effort, as potential indicators of AI-based educational application adoption. Grounded on previous studies, this study included nine factors (i.e., “assessment”, “content”, “effort”, “enjoyment”, “faculty”, “interaction”, “structure”, “technicality”, and “usefulness”) as predetermined labels (criteria) to understand MOOC learners’ concerns. Subsequently, based on the nine criteria, along with the semantic similarity of crucial terms within the topics, two experienced experts specializing in MOOC education deduced and condensed sub-labels (sub-criteria) for each topic in the optimal models for D1 and D2. This labeling process adhered rigorously to a procedure widely adopted in prior topic modeling studies (e.g., [34, 35]). To maximize precision, this study also involved multiple rounds of in-depth discussions among the two experts until a consensus was reached. Topics that merely described the course content without offering any opinions or evaluations on its quality were omitted. To maintain consistency, the proportions (\({P}_{k}\)) of the included topics were adjusted by normalizing their entries and dividing each entry by the sum of all entries. Finally, 13 relevant topics were retained for D1, and 18 relevant topics were retained for D2.
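Computationally, the topic proportions and the subsequent adjustment of the retained topics reduce to a few lines over the model’s \(\theta\) matrix. The sketch below assumes \(\theta\) is available as a D x K NumPy array; the retained-topic indices are placeholders.

```python
import numpy as np


def topic_proportions(theta: np.ndarray) -> np.ndarray:
    """P_k = (sum_d theta_{d,k}) / D, i.e. the column means of theta."""
    return theta.mean(axis=0)


def adjusted_proportions(p: np.ndarray, retained_ids: list) -> np.ndarray:
    """Keep only the retained (evaluation-relevant) topics and renormalize
    their proportions so that they sum to one."""
    kept = p[retained_ids]
    return kept / kept.sum()


# e.g. for D1, 13 of the 22 topics were retained (indices here are placeholders):
# adjusted_d1 = adjusted_proportions(topic_proportions(theta_d1), retained_ids_d1)
```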

As a result, the nine criteria, along with their sub-criteria, were utilized to highlight learners’ major concerns. For example, this study introduced the predetermined label of “faculty” to encompass a comprehensive range of factors directly related to instructors within the MOOC context. The sub-labels under “faculty” encapsulate various aspects related to instructors, for example, presentation style, passion, humor, teaching methodologies, overall experience, and the collaborative dynamics within the faculty team. This study also introduced the predetermined label of “technicality” to encapsulate essential components shaping the quality of MOOC courses. This label was structured with two sub-labels, “complexity” and “flexibility”, each of which involves considerations regarding the role and impact of technological factors such as videos within MOOCs [33]. Specifically, within the “complexity” sub-label, we incorporated technical aspects that are central to the quality of video content in MOOCs, for example, video resolution, streaming capabilities, interactivity, and accessibility features. Furthermore, the “flexibility” sub-label was designed to address the adaptability and user-centric features of video content within MOOCs, for example, the ability of videos to cater to diverse learning styles and their responsiveness across devices.

  • “assessment”: assignment, autograder

  • “content”: problem-solving, explanation, use of cases/examples

  • “effort”: perceived difficulty, perceived workload

  • “enjoyment”: pleasure, satisfaction

  • “faculty”: instructor humor, instructor presentation, instructor passion, teaching style, instructor experience, faculty team

  • “interaction”: interaction

  • “structure”: course description

  • “technicality”: complexity, flexibility

  • “usefulness”: knowledge enhancement, job preparedness, beginner friendliness

4.4 Determining criteria, weights, and alternatives in AHP

This stage holds immense significance when facing a MOOC selection dilemma. Utilizing the outcomes of topic modeling for D1 and D2, we ascertain the criteria, sub-criteria, and their respective weights. Based on this information, two decision hierarchies can be created, as exemplified in the selection of MOOC courses for Art, Design, and Humanities as well as Computer Science, Engineering, and Programming, respectively. The weight assigned to each criterion is the total of the weights allocated to its sub-criteria (i.e., topic labels), computed as \(P_{cr}=\sum_{k\in cr}P_{k}\).

In this research, only MOOC courses with over 100 reviews and an overall rating of ≥ 4.7 stars (where 1 signifies terrible and 5 denotes excellent) were considered during the course selection process. This study chose to analyze courses with an overall rating of 4.7 or above because of the common challenge users encounter when attempting to choose among courses with marginal differences in overall ratings. Our method aims to encourage users to delve deeper into the evaluation process for the initially selected highly rated courses. Thus, we specifically focused on comparing highly rated courses across fine-grained dimensions. Also, considering the caliber of the reviews and the amount of data needed for assessing these courses, we opted for courses with a score of ≥ 4.7 stars that had garnered at least 100 reviews; this threshold ensures that each course has accumulated sufficient feedback for assessment. Accordingly, nine courses each from Art, Design, and Humanities and from Computer Science, Engineering, and Programming were used as alternatives in the AHP (Tables 2 and 3).
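The course filter itself is straightforward; a sketch assuming a pandas DataFrame of course metadata with columns “reviews” and “rating” (illustrative names, not from the study) is:

```python
import pandas as pd


def select_alternatives(courses: pd.DataFrame) -> pd.DataFrame:
    """Keep courses with at least 100 reviews and an overall rating of >= 4.7 stars."""
    mask = (courses["reviews"] >= 100) & (courses["rating"] >= 4.7)
    return courses[mask]
```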

Table 2 Appointed alternatives for Art, Design, and Humanity courses
Table 3 Appointed alternatives for Computer Science, Engineering, and Programming courses

4.5 Course ranking via AHP

Once the hierarchical structure for MOOC selection was established, the prioritization process determined the comparative importance of the MOOCs. All alternatives were evaluated by learners from Class Central, making this a collective decision-making analysis.

For each course, the aggregated score for the \(k\)th topic (sub-criterion) is calculated as \(a_k=\sum_{c}r_c p_{ck}\), where \(r_c\) represents the overall rating score provided by the \(c\)th reviewer (i.e., learner), and \(p_{ck}\) denotes the proportion of the \(c\)th review comment assigned to the \(k\)th topic. Consequently, the score for each criterion is the sum of the scores of its sub-criteria, \(a_{cr}=\sum_{k\in cr}a_k\). Subsequently, the pairwise comparison matrices were formulated from the acquired data, satisfying the reciprocal property in Eq. (3). Each pairwise comparison matrix consists of n × n elements, where n denotes the number of factors under consideration.

$$a_{ji} = \frac{1}{{a_{ij} }},\quad {\text{where}}\,i,j = 1,2,3, \ldots ,n$$
(3)

Using a nine-point scale, each element denotes the assessment of the comparative significance of two factors. Element \(a_{ij}\) signifies the extent to which the \(i\)th factor dominates the \(j\)th factor; this value is placed in the \(i\)th row and \(j\)th column, while its reciprocal is placed in the \(j\)th row and \(i\)th column. The resulting comparison matrix is shown in Eq. (4). To derive the priorities for each criterion, denoted as \(\mathrm{priorities}_{cr}\), we first compute the principal right eigenvector of each matrix and then normalize its entries by dividing each by their collective total. Afterward, the overall priority of each MOOC course can be determined by computing the weighted sum \(\sum_{cr} ({P}_{cr}\times \mathrm{priorities}_{cr})\).

$$A = \begin{pmatrix} a_{11} & 1/a_{21} & \cdots & 1/a_{n1} \\ a_{21} & a_{22} & \cdots & 1/a_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix}$$
(4)
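The whole ranking pipeline (criterion scores, reciprocal pairwise matrices, eigenvector priorities, and the weighted sum) can be sketched with NumPy as follows. This is a minimal sketch under the assumption that the review-level ratings and topic proportions are already available as arrays; it is not the authors’ released implementation.

```python
import numpy as np


def criterion_score(ratings: np.ndarray, topic_props: np.ndarray, sub_ids: list) -> float:
    """a_cr: sum over reviews of r_c * p_ck, summed over the criterion's topics."""
    return float((ratings[:, None] * topic_props[:, sub_ids]).sum())


def pairwise_matrix(scores: np.ndarray) -> np.ndarray:
    """Reciprocal comparison matrix with a_ij = w_i / w_j (Eqs. 3 and 4)."""
    w = np.asarray(scores, dtype=float)
    return w[:, None] / w[None, :]


def priorities(matrix: np.ndarray) -> np.ndarray:
    """Principal right eigenvector, normalized so its entries sum to one."""
    eigvals, eigvecs = np.linalg.eig(matrix)
    principal = np.abs(np.real(eigvecs[:, np.argmax(np.real(eigvals))]))
    return principal / principal.sum()


def overall_priorities(scores_by_criterion: dict, weights: dict) -> np.ndarray:
    """Weighted sum over criteria: sum_cr P_cr * priorities_cr."""
    n = len(next(iter(scores_by_criterion.values())))
    total = np.zeros(n)
    for cr, scores in scores_by_criterion.items():
        total += weights[cr] * priorities(pairwise_matrix(np.asarray(scores)))
    return total
```

Because each matrix built this way is perfectly consistent (every entry is a ratio of the same underlying scores), the eigenvector step simply recovers the normalized criterion scores, which matches the procedure described above.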

5 Results

5.1 Topic identification

The outcomes of the optimal STM models on D1 and D2 can be found in Figs. 4 and 5 and Tables 10 and 11 in the “Appendix”. From the topic modeling results for D1 and D2, we extracted the criteria, sub-criteria (referred to as effective topic labels), and adjusted weights (termed adjusted topic proportions). The eight criteria considered in selecting Art, Design, and Humanities courses are “assessment”, “content”, “effort”, “enjoyment”, “faculty”, “interaction”, “structure”, and “usefulness”. Furthermore, some criteria have sub-criteria. For instance, the criterion of “usefulness” encompasses the sub-criteria of “knowledge enhancement”, “job preparedness”, and “beginner friendliness”. As for the selection of Computer Science, Engineering, and Programming courses, nine criteria were considered: “assessment”, “content”, “effort”, “enjoyment”, “faculty”, “interaction”, “structure”, “technicality”, and “usefulness”. Similar to the previous case, certain criteria possess sub-criteria at the subsequent level. For instance, the criterion of “faculty” consists of the sub-criteria of “instructor humor”, “instructor presentation”, “instructor passion”, and “teaching style”.

Fig. 4 Extracted valid topics and their adjusted weights for D1

Fig. 5 Extracted valid topics and their adjusted weights for D2

Using the identified criteria and sub-criteria, two decision hierarchies were formulated, as illustrated in Figs. 6 and 7, for MOOC selection. Next, by referring to the adjusted weights (adjusted topic proportions) presented in Fig. 4, the weight of the criterion “usefulness” is computed as follows: 0.0569 (adjusted weight of “knowledge enhancement”) + 0.1681 (adjusted weight of “job preparedness”) + 0.0815 (adjusted weight of “beginner friendliness”) = 0.3065. For criteria without any sub-criteria, such as “interaction”, the adjusted weight of “interaction”, which is 0.0506, is used directly. The weights for other criteria are as follows: 0.1211 (“effort”), 0.0924 (“assessment”), 0.0996 (“content”), 0.2697 (“faculty”), 0.0318 (“structure”), and 0.0285 (“enjoyment”).
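As a quick check of the aggregation rule \(P_{cr}=\sum_{k\in cr}P_k\), the “usefulness” weight for D1 can be reproduced from the sub-criterion weights quoted above (the dictionary below simply restates those numbers):

```python
# Adjusted sub-criterion (topic) weights for D1, as read from Fig. 4
usefulness_subweights = {
    "knowledge enhancement": 0.0569,
    "job preparedness":      0.1681,
    "beginner friendliness": 0.0815,
}

# P_cr is the sum of the weights of its sub-criteria
print(round(sum(usefulness_subweights.values()), 4))   # 0.3065

# Criteria without sub-criteria keep their topic weight directly,
# e.g. "interaction" = 0.0506.
```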

Fig. 6 Decision hierarchy for Art, Design, and Humanities course selection

Fig. 7 Decision hierarchy for Computer Science, Engineering, and Programming course selection

Similarly, by examining the adjusted weights (adjusted topic proportions) presented in Fig. 5, the weights of the criteria are as follows: 0.1581 (“enjoyment”), 0.0792 (“structure”), 0.1963 (“faculty”), 0.1619 (“assessment”), 0.0703 (“content”), 0.0425 (“interaction”), 0.0998 (“usefulness”), 0.1403 (“effort”), and 0.0517 (“technicality”).

5.2 Course ranking

The scores for each criterion in the case of Art, Design, and Humanities courses are presented in Table 4. Eight sets of pairwise comparison matrices were utilized to evaluate and rank the quality of courses within Art, Design, and Humanities, as listed in Tables 5 and 6, and Tables 12, 13, 14, 15, 16, 17 and 18 in the “Appendix”. Let the \(n\)-by-\(n\) pairwise comparison matrix be denoted as \(A=\left({a}_{ij}\right)\), where \(i,j=\mathrm{1,2},\dots ,n\). The entry \({a}_{ij}\) is determined by \({w}_{i}/{w}_{j}\), where \({w}_{i}\) and \({w}_{j}\) represent the weights of alternatives \(i\) and \(j\) with respect to certain criteria. For the purpose of comparison, the nine courses are enumerated on the left-hand side and at the top, and an evaluation is conducted to assess the degree of dominance of courses on the left over those at the top. For instance, in Table 5, when we compare course A1 on the left with course A2 at the top with respect to the “assessment” criterion, we cross-reference Table 4, where the “assessment” ratings for A1 and A2 are 93.058 and 170.572. Consequently, we record the value 0.546 (93.058/170.572) in the cell located at the intersection of the first row and second column in Table 5, while the value 1.833 (170.572/93.058) is automatically populated in the cell at the intersection of the second row and first column in Table 5. In this scenario, it is observed that the quality of course A1, specifically regarding the “assessment” criterion on the left, does not surpass that of course A2 at the top.

Table 4 Scores for each criterion in Art, Design, and Humanity course selection
Table 5 Pairwise comparison matrix for the alternatives regarding assessment in Art, Design, and Humanity course selection
Table 6 Numerical values for ratings of the alternatives in Art, Design, and Humanity course selection

To obtain the priorities for each table, this study calculated the principal right eigenvector of each matrix and normalized its values by dividing each entry by the total sum. This allows each course to be ranked according to its priority for each criterion. The last column in Table 5 indicates the ranking positions of the nine courses for the criterion “assessment”. The ranking results were determined by the priorities for “assessment” presented in the penultimate column of Table 5. The results show that A3 is ranked at the top regarding “assessment”, followed by A2 and A5, while A7 is ranked the lowest.

In Table 6, this study determined the priorities by computing the weighted sum of the per-criterion priorities. For instance, the overall priority of A1 is calculated as follows: 0.09236552 (the weight of “assessment”) × 0.0804106 + 0.0995969 (the weight of “content”) × 0.0448651 + 0.12108072 (the weight of “effort”) × 0.07660304 + 0.02847555 (the weight of “enjoyment”) × 0.0702063 + 0.26967681 (the weight of “faculty”) × 0.24214347 + 0.05056552 (the weight of “interaction”) × 0.07088712 + 0.03175832 (the weight of “structure”) × 0.06318701 + 0.30648066 (the weight of “usefulness”) × 0.0448499 = 0.10780717. Our method ranked the alternatives for Art, Design, and Humanities course selection as presented in column 2 of Table 6. The ranking results were determined by the priorities in column 3 of Table 6, showing that A3 is ranked at the top, followed by A2 and A1, while A7 is ranked the lowest.
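The weighted sum for A1 can be reproduced directly from the numbers quoted above:

```python
# Criterion weights (P_cr) and A1's per-criterion priorities, as quoted above
weights = {"assessment": 0.09236552, "content": 0.0995969,
           "effort": 0.12108072, "enjoyment": 0.02847555,
           "faculty": 0.26967681, "interaction": 0.05056552,
           "structure": 0.03175832, "usefulness": 0.30648066}
a1_priorities = {"assessment": 0.0804106, "content": 0.0448651,
                 "effort": 0.07660304, "enjoyment": 0.0702063,
                 "faculty": 0.24214347, "interaction": 0.07088712,
                 "structure": 0.06318701, "usefulness": 0.0448499}

overall_a1 = sum(weights[c] * a1_priorities[c] for c in weights)
print(round(overall_a1, 6))   # approximately 0.107807, as reported in Table 6
```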

Analogous computations were conducted to determine the rankings of the alternatives in the selection of Computer Science, Engineering, and Programming courses, as displayed in Table 7. A total of nine pairwise comparison matrices were formulated using data from Table 8 and Tables 19, 20, 21, 22, 23, 24, 25 and 26 in the “Appendix”, aiming to evaluate the Computer Science, Engineering, and Programming courses based on the criteria of “enjoyment”, “structure”, “faculty”, “assessment”, “content”, “interaction”, “usefulness”, “effort”, and “technicality”. For example, the last column in Table 8 indicates the ranking positions of the nine courses for the criterion “enjoyment”. The ranking results were determined by the priorities for “enjoyment” presented in the penultimate column of Table 8. The results show that B3 is ranked at the top regarding “enjoyment”, followed by B8 and B7, while B6 is ranked the lowest.

Table 7 Score for each criterion in Computer Science, Engineering, and Programming course selection
Table 8 Pairwise comparison matrix for alternatives regarding enjoyment in Computer Science, Engineering, and Programming course selection

The rankings of the alternatives for Computer Science, Engineering, and Programming course selection are presented in column 2 of Table 9. The ranking results were determined by the priorities in column 3 of Table 9, showing that B3 is ranked at the top, followed by B8 and B7, while B6 is ranked the lowest.

Table 9 Numerical values for ratings of the alternatives in Computer Science, Engineering, and Programming course selection

6 Discussion

6.1 RQ1: What factors will be included in the multi-criteria decision-making framework? What are the similarities and differences between the selection of knowledge- and skill-seeking courses?

This study revealed notable distinctions between subjects. The decision hierarchies presented in Figs. 6 and 7 offer valuable insights into the factors essential for multi-criteria decision-making in course selection. For both Art, Design, and Humanities (D1) and Computer Science, Engineering, and Programming (D2) courses, eight shared criteria were considered: “assessment”, “content”, “effort”, “usefulness”, “enjoyment”, “faculty”, “interaction”, and “structure”. These criteria are widely acknowledged for their impact on MOOC learners' enrollment, satisfaction, and reuse intention [32, 36, 37].

The sole distinction lies in the inclusion of “technicality” for Computer Science, Engineering, and Programming courses. The “technicality” refers to the non-monetary cost related to the use of technology and plays a crucial role in users’ technology adoption [38]. It consists of two sub-criteria: “complexity” and “flexibility”. Complexity pertains to the ease of use of technology, wherein a user-friendly system requiring minimal physical and mental effort leads to a smoother and more satisfying learning experience. Flexibility involves the adaptability of MOOCs regarding time, space, and learning pace, enabling learners to personalize their learning experience, which enhances their satisfaction [39].

Regarding the eight shared criteria between D1 and D2 courses, inconsistencies were observed in the sub-criteria. For “assessment”, D1 only includes “assignment” as a sub-criterion, while D2 adds “autograder”. This finding supports the research by Ramesh et al. [40], highlighting the significance of auto-gradable hands-on programming assignments in scalable programming education. The “autograder”, providing automatic feedback on programming exercises, enhances learners' satisfaction and understanding of code [41].

Another disparity lies in the inclusion of “explanation” as a sub-criterion under the “content” criterion for D2, which is not present in D1. Learners in Computer Science, Engineering, and Programming courses value the clear explanation of complex concepts, making it essential for their continued learning [42]. The ability to explain such abstract scientific concepts in an accessible manner is crucial, emphasizing the importance of problem-based learning and explicit explanations to enhance participation in online courses [43].

Furthermore, an illustrative case is the consideration of instructor humor as a sub-criterion within the “faculty” category for D2, while it is absent in D1. According to Watson et al. [44], this aspect of instructors, humor, is thought to contribute to heightening the emotional aspect of social interaction, referring to the extent to which users using computer-assisted communication systems experience an emotional connection with one another [45]. Extensive literature elucidates the significant influence of both instructor presence and social presence on learners’ satisfaction with online learning (e.g., [46, 47]). In the realm of Computer Science, Engineering, and Programming courses, where abstract and enigmatic concepts often surface, instructors’ humor plays a particularly crucial role in inspiring learners to actively engage with and grasp these notions.

6.2 RQ2: What are the most influential factors in course selection? What are the similarities and differences between the selection of knowledge- and skill-seeking courses?

Among the criteria with a weight of more than 10%, “effort” and “faculty” appeared in both D1 and D2. When considering distinctions, the evaluations of the criteria “assessment”, “enjoyment”, and “usefulness” differ between D1 and D2.

6.2.1 Effort

Regarding Art, Design, and Humanities courses, the “effort” criterion contributes 12.11% to the overall evaluation process, while for the selection of Computer Science, Engineering, and Programming courses, it carries a weight of 14.03%.

The effort is inherently intertwined with the technological aspects mentioned previously. As MOOCs are designed with multifaceted functionalities to meet users’ expectations, their adaptation requires certain extents of mental and physical exertion, for example, time, energy, financial commitment, engagement, and utilization of resources. Through our topic modeling analysis, we identified perceived difficulty and perceived workload as indicators of perceived effort.

Perceived workload pertains to students’ perception of the workload and time commitment required for completing course tasks [32]. Perceived difficulty refers to a student’s perception of how easy or challenging it is to comprehend the course content [48]. Before enrolling, learners often evaluate the course description, syllabus, and any available information about the workload and course difficulty. When learners perceive the workload and course difficulty as high, they are likely to perceive the effort required to complete tasks as substantial. Consequently, if the perceived workload and course difficulty align with their expectations, available time, and resources, learners are more inclined to enroll. However, if learners anticipate a high workload and course difficulty that surpasses their capacity or conflicts with their goals, it can result in increased stress, anxiety, and reduced satisfaction, and may impact learners’ decisions to enroll in a MOOC. Our findings align with the observations in typical in-person classroom settings. Howell and Buck [49] reported a significant association between workload and student satisfaction and engagement in face-to-face learning. Nonetheless, the correlation between perceived workload and perceived effort may vary depending on individual factors, such as prior knowledge and learning preferences.

6.2.2 Faculty

Regarding the “faculty” criterion, it contributes 26.97% and 19.63% to the evaluation of Art, Design, and Humanities, and Computer Science, Engineering, and Programming courses, respectively. Our topic modeling analysis results show that within the context of MOOCs, the “faculty” criterion consists of six sub-criteria: instructor experience, faculty team, instructor passion, instructor humor, instructor presentation, and teaching style. These factors have been extensively emphasized in the literature as crucial elements in MOOCs (e.g., [6, 50, 51]).

The significance of “faculty” as a factor influencing MOOC learners’ satisfaction can be attributed to several reasons. Firstly, experienced instructors possess vast knowledge and expertise, effectively guiding students, addressing queries, and providing real-world examples. This enriches the learning experience and fosters learner satisfaction. Secondly, a cohesive and supportive faculty team creates a positive environment. MOOC platforms offer interactive tools that assist the faculty team, including instructors and teaching assistants, in problem-solving and facilitating learning inquiry [52], thereby enhancing personalized attention to students. Thirdly, passionate instructors go beyond the curriculum and inspire and motivate students. Their enthusiasm encourages exploration and deeper understanding, resulting in an engaging and enjoyable learning experience and heightened satisfaction [53]. Fourthly, the appropriate use of humor by instructors creates a relaxed atmosphere, alleviating tension, fostering rapport, and establishing a positive emotional connection with learners. These positive emotions contribute to a more satisfying learning experience. Fifthly, effective presentation skills, such as clear communication, well-organized lectures, and the skillful use of visual aids, significantly impact learner satisfaction. Instructors who deliver information coherently and engagingly facilitate the comprehension of complex concepts and maintain student interest. Additionally, accommodating diverse learning styles through adaptable teaching methods creates an inclusive and effective learning environment [54]. By catering to individual needs and employing various instructional approaches, instructors enhance learner satisfaction by promoting understanding and practical application of course material.

Overall, the “faculty” criterion encompasses various factors that directly influence instructional quality and the overall learning experience. Fulfilling these criteria enhances learner engagement, motivation, and satisfaction in MOOCs.

6.2.3 Assessment

In terms of the criterion “assessment”, its weight is 16.19% in D2, while it constitutes only 9.24% of the evaluation process in D1. In the context of MOOCs, the significance of assessment is widely acknowledged. For example, Bali [55] proposed that MOOC assessment provides learners with opportunities to evaluate their acquired knowledge and apply their learned skills. Likewise, Hew [43] emphasized the significance of interactive learning and applying knowledge to improve engagement. The correlation between motivation and evaluation is supported by Zimmerman [56], who discovered that individuals with particular learning objectives tend to assess their performance. Scholars have also observed that providing self-assessment tools to learners aids in developing critical thinking abilities [57].

Especially in Computer Science, Engineering, and Programming MOOCs, learners often discuss assessment-related matters. This aligns with the considerable interest among researchers in exploring effective assessment strategies to enhance computer science and programming education [58]. Although auto-graders in MOOCs have improved fairness and efficiency, challenges remain, such as potentially inaccurate auto-grading and a lack of personalization. To optimize programming education, it is suggested to support inline-anchored discussions addressing auto-grader complaints and integrate program visualization tools that demonstrate how auto-graders work into forums. Additionally, the assessment features characteristic of MOOCs, such as assignments and quizzes, still fall short of meeting the requirements of effective computer science and programming instruction. To foster proficient programming skills, learners must receive timely and accurate feedback on their solutions and answers after completing practical programming exercises and real-world programming tasks.

6.2.4 Enjoyment

Concerning the criterion “enjoyment”, its weight is 15.81% in D2, while it constitutes only 2.85% of the evaluation process in D1. Cognitive Evaluation Theory suggests that individuals’ actions are shaped by internal components, emphasizing the joy and contentment derived from engaging in a behavior [59]. Enjoyment, referring to “the extent to which the activity of using a product is perceived to be enjoyable in its own right, apart from any performance consequences that may be anticipated (p. 1113)” [60], is considered an inherent and emotional concept that impacts acceptance. This concept comprises two sub-components, pleasure and satisfaction, which gauge learners’ internal drive across various levels of necessities.

Pleasure represents the initial level of positive mental states learners experience in learning [33]. Although MOOCs are not primarily for pleasure-driven intentions, learners may find it interesting, thus motivating continued adoption. Satisfaction is an emotional reaction resulting from achieving a certain objective after usage [61]. When learners are satisfied with the learning outcomes from a MOOC, they have higher intentions for continued use. Howarth et al. [62] found a positive correlation between current learners’ satisfaction and technology adoption by new learners.

In Computer Science, Engineering, and Programming MOOCs, enjoyment plays a pivotal role for two reasons. First, these fields demand high levels of cognitive effort and problem-solving skills. When learners find the activities and learning experiences intrinsically enjoyable, it enhances their motivation to engage with the MOOCs. Enjoyment sustains their interest, curiosity, and enthusiasm, fostering higher levels of engagement and adoption. Second, Computer Science, Engineering, and Programming involve complex concepts, abstract thinking, and technical skills. The learning process can be demanding and require significant effort. Enjoyment serves as a positive emotional response that helps alleviate the perceived difficulty and challenges associated with these subjects. When learners derive joy and pleasure from the learning activities, they are more likely to persist through challenges, remain motivated, and continue adopting MOOCs.

6.2.5 Usefulness

Regarding usefulness, it constitutes 30.65% of the evaluation process in D1, while accounting for only 9.98% in D2. Usefulness, characterized as an external and mental concept, refers to the total worth a user attributes to employing a new technology [38]. Based on our topic modeling results, in the context of MOOCs, usefulness has three sub-criteria: knowledge enhancement, job preparedness, and beginner friendliness. Among these, job preparedness is exclusively present in D1, making up 16.81% of the evaluation process. This somewhat supports [63], who identified job promotion needs and curiosity as primary factors influencing learners’ enrollment in a MOOC. However, motives for MOOC enrollment can vary across different course types. For example, according to a survey by Bayeck [64], 74.6% and 11.9% of respondents enrolled in humanities courses (poetry and music, respectively) due to curiosity and seeking job performance improvement. For MOOCs related to science, health science, and math, only 39% of enrollments were motivated by skill improvement to perform better in jobs.

For learners taking Art, Design, and Humanities MOOCs, potential explanations for their frequent concerns about job preparedness are as follows. Firstly, learners in Art, Design, and Humanities disciplines often prioritize acquiring practical skills that directly apply to their work. They seek assurance that the courses they undertake will equip them with relevant and applicable skills sought after in the job market. It is essential to connect theoretical understanding with real-world implementation, ensuring that the courses they invest time and effort in will have a tangible impact on their professional growth. Secondly, the Art, Design, and Humanities industries are diverse and encompass various sectors, each with its unique requirements and expectations. Learners are concerned about whether the MOOCs they choose align with their career interests and goals. They want to ensure that the course content, projects, and assignments offered reflect current industry practices. The relevance of the course content to their chosen field is crucial for learners to feel confident in job preparedness.

6.3 RQ3: What are the ratings of courses and how do they rank according to sub-criteria and criteria?

The overall rankings of course alternatives in selecting Art, Design, and Humanities as well as Computer Science, Engineering, and Programming courses, as presented in Tables 6 and 9, do not precisely match the overall course ratings from the Class Central website (Tables 2 and 3). There are similarities and differences between them. For instance, according to the reviews’ overall ratings about Art, Design, and Humanities courses in column 3 of Table 2, A2, A5, and A8 are rated as 4.7, which is the lowest among the nine courses. However, our proposed model, as shown in column 2 of Table 6, ranks A5 and A8 in the seventh and eighth positions, respectively, while it places A2 in the second position. Similarly, for the Computer Science, Engineering, and Programming courses, based on the reviews’ overall ratings in column 3 of Table 3, B2 and B8 receive the highest rating of 4.9 among the nine courses. Nevertheless, our proposed model, displayed in column 2 of Table 9, ranks B8 in the second position and B2 in the sixth position.

The inconsistencies between our analysis results and the reviewers’ overall opinions about the courses may arise from the fact that our approach goes beyond merely considering the rough scores given by individual learners. Instead, we employ a more detailed and fine-grained approach by also considering the weights and significance attached to each criterion, determined through group decision-making analysis. This allows us to gain insights into a single course’s rankings based on different criteria and even sub-criteria, which are not available on the Class Central website. The results presented in Table 5 and Tables 12, 13, 14, 15, 16, 17 and 18 in the “Appendix” for Art, Design, and Humanities course selection, and in Table 8 and Tables 19, 20, 21, 22, 23, 24, 25 and 26 in the “Appendix” for Computer Science, Engineering, and Programming course selection, provide a more comprehensive understanding of how courses perform according to various criteria and sub-criteria. In contrast, the overall course ratings presented in Tables 2 and 3 treat all courses with the same rating as equal.

For example, in the case of Art, Design, and Humanities, we ranked “English for the Workplace” (A3) as the best course. By checking Table 5 and Tables 12, 13, 14, 15, 16, 17 and 18 in the “Appendix”, we find that A3 is also ranked the first among all alternatives according to four criteria: “assessment”, “content”, “effort”, and “usefulness”. The weights of these four criteria, from Table 6, are approximately 9.24%, 9.96%, 12.10%, and 30.64%, respectively, collectively constituting about 61.95% of the evaluation process. Therefore, we ranked “English for the Workplace” in the first place. For the other four criteria: “enjoyment”, “faculty”, “interaction”, and “structure”, the courses ranked the first among all alternatives were “English in Early Childhood: Language Learning and Development” (A4), “A Life of Happiness and Fulfillment” (A1), and “Academic Writing” (A2), respectively. Although both A1 and A4 are rated 4.7 on the Class Central website, our analysis shows that they perform differently based on different evaluation criteria. By having this knowledge, learners can select courses with better performance in the criteria they value the most. Therefore, implementing the AHP enables us to distinguish between courses that have the same overall course rating obtained from the Class Central website.

6.4 Implications and suggestions

The results of this investigation illuminate the factors that influence the selection of MOOC courses and offer valuable perspectives for instructors and researchers to optimize course design, instructor involvement, and learner satisfaction in MOOCs.

  1. Course design and presentation

    Instructors of MOOCs should create courses that go beyond being informative and are also captivating, flexible, and well-structured. Moreover, for skill-oriented courses, integrating technical aspects like user-friendliness and adaptability is crucial to enhance learners’ experiences.

  2. Instructor qualities and engagement

    Instructors should cultivate a positive learning environment and motivate learners through their expertise, experience, and enthusiasm. Incorporating humor into instruction can enhance social presence and emotional connection. They should deliver clear, well-organized lectures, effectively utilize visual aids, and embrace adaptable teaching methods catering to diverse learning styles.

  3. Assessment strategies

    Instructors need to carefully consider assessment methods, particularly in skill-oriented courses where auto-graders play a vital role in providing timely feedback and increasing satisfaction. However, they must address potential issues of inaccurate grading and lack of personalization. Introducing program visualization tools and fostering discussions can be beneficial. Assessments should promote active learning and self-evaluation to foster critical thinking skills.

  4. Promoting enjoyment

    Instructors should prioritize creating enjoyable learning experiences, especially in skill-oriented MOOCs like Computer Science, Engineering, and Programming, which can be challenging. Enjoyment positively impacts learners’ emotions, reduces perceived difficulty, and encourages engagement. Interactive and engaging activities enhance motivation and satisfaction.

  5. Addressing job preparedness

    In knowledge-oriented MOOCs like Art, Design, and Humanities, addressing learners’ concerns about job preparedness is crucial. Instructors should ensure courses offer practical and applicable skills aligned with learners’ career interests. Demonstrating the relevance of course content to specific industries and job requirements boosts learners’ confidence and satisfaction.

  6. Enhancing learner engagement

    MOOC platforms should provide personalized course recommendations based on individual preferences and learning styles to enhance engagement. Tailoring courses to meet learners’ valued criteria improves satisfaction and retention. Additionally, allowing learners to evaluate courses based on different criteria facilitates informed decision-making.

  7. Leveraging multi-criteria decision-making

    Researchers and MOOC platforms can employ the multi-criteria decision-making methodologies introduced in this study for comprehensive course evaluations. By considering overall course ratings and individual criteria weights, learners can make informed decisions aligned with their preferences. Researchers can refine the model by adding factors to improve evaluations.

By implementing these insights and recommendations, educators and online learning platforms can enhance the effectiveness and appeal of MOOC experiences, thereby promoting the advancement and enhancement of online education.

6.5 Reflections on research methodologies

This study used courses with an overall rating score of 4.7 or higher, a choice based on observed user behavior on real-world online platforms: users typically gravitate towards courses with higher overall ratings when making selections. The challenge lies in the limited information displayed, since often only the overall course ratings are visible on the MOOC website. Despite the presence of numerous textual reviews from previous learners, manually evaluating each comment is time-consuming. Additionally, learners have diverse priorities, some favoring assessment aspects while others value interaction. This diversity makes it difficult for users to objectively evaluate and select courses based on the available information. Our method aims to encourage users to delve deeper into the evaluation of the initially selected highly rated courses. By providing more granular evaluation criteria and insights into various performance dimensions, learners can identify courses that align closely with their preferences. We therefore focused on comparative analysis among courses with high overall rating scores across different dimensions, mirroring real-world user decision-making scenarios. This is more practical and meaningful, and aligns more closely with user needs during course selection, particularly for popular courses exhibiting minimal differences in overall ratings. Nevertheless, it would be worthwhile in future work to include courses spanning a wider range of overall rating scores for comparison.

Regarding the predetermined labels used to guide the labeling of topics, although we did not create standalone labels for “videos” and “instructors”, which are frequently used in previous studies (e.g., [2, 32]), we covered the dimensions pertinent to instructors and videos within the “technicality” and “faculty” labels. Specifically, the “technicality” label, with its comprehensive sub-labels, was designed to encompass and evaluate the essential technological factors in MOOCs, including video content. By analyzing the technical intricacies (complexity) and adaptive features (flexibility) of MOOC videos, we ensured a holistic evaluation that inherently included the pivotal role and influence of video content on the overall quality of MOOC courses. Similarly, the “faculty” label, with its detailed sub-labels, was used to thoroughly cover the multiple dimensions associated with instructors in MOOC courses. Our approach aimed to holistically evaluate the influence of instructors by dissecting the aspects that collectively shape the perception of faculty quality in the MOOC setting, including instructors’ effectiveness, teaching styles, and collaborative dynamics within the teaching team, thereby providing a comprehensive evaluation. Nevertheless, future work can consider more fine-grained methods for determining predetermined labels, for example, the Delphi method, which involves multiple rounds of questionnaire surveys administered to expert groups.

Regarding the manual analysis of the algorithm-identified topics, this is a prevalent method in topic modeling studies (e.g., [16, 65]), and this study rigorously adhered to a widely adopted procedure from previous studies (e.g., [34, 35]). Specifically, the interpretation of topics was guided by sociotechnical system theory, Moore’s transactional distance theory, and MOOC instructional design practices. To enhance precision, our methodology entailed multiple rounds of in-depth discussion between two experienced MOOC education experts until consensus was reached. While previous topic modeling studies (e.g., [35, 66]) generally consider two annotators sufficient to ensure accuracy and reliability in the labeling process, we acknowledge that involving more annotators could increase rigor. Moreover, future work might consider integrating STM outcomes with automated methods for topic naming, thereby improving the accuracy and efficiency of the interpretative process.

Regarding the determination of models based on exclusivity and coherence scores across a range of 5 to 50 topics, this approach might appear simplistic for finding the most suitable number of topics. However, exclusivity and coherence are widely regarded as effective measures for determining the ideal number of topics in previous topic modeling studies (e.g., [65, 67]). These measures serve as heuristics, providing indicators or scoring systems that help evaluate the quality of the topics generated by models; they aid in selecting an optimal number of topics by assessing the semantic coherence within topics and the distinctiveness between them [68]. Nevertheless, future work can consider combining more advanced heuristic-driven methodologies, such as perplexity metrics, and potentially exploring Bayesian approaches to achieve a more refined and accurate identification of the ideal number of topics.
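As a minimal sketch of this selection heuristic, the Python snippet below sweeps candidate topic numbers and reports a coherence score and a simplified exclusivity proxy for each. It uses gensim’s LDA only as a stand-in for the structural topic models actually fitted in this study, and the exclusivity function shown is an illustrative proxy rather than the exact score reported in the paper.

```python
# Illustrative sketch: sweep candidate topic numbers and score each model.
# gensim's LDA stands in for STM; exclusivity() is a simplified proxy
# (mean share of each top word's probability mass owned by its topic).
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import CoherenceModel, LdaModel

texts = [["course", "video", "clear", "useful"],
         ["assessment", "quiz", "feedback", "useful"],
         ["instructor", "engaging", "video", "clear"]]   # toy tokenized reviews

dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

def exclusivity(lda, top_n=5):
    phi = lda.get_topics()                      # (num_topics, vocab_size)
    scores = []
    for k in range(phi.shape[0]):
        top = np.argsort(phi[k])[::-1][:top_n]  # indices of top words
        scores.append(np.mean(phi[k, top] / phi[:, top].sum(axis=0)))
    return float(np.mean(scores))

for k in range(2, 5):                           # the study swept 5 to 50
    lda = LdaModel(corpus, num_topics=k, id2word=dictionary, random_state=0)
    coherence = CoherenceModel(model=lda, corpus=corpus,
                               coherence="u_mass").get_coherence()
    print(k, round(coherence, 3), round(exclusivity(lda), 3))
```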

Regarding computational complexity, optimizing topic models entails refining parameters and selecting an appropriate model configuration. Specifically, a substantial amount of computing resources is devoted to determining the most suitable models based on exclusivity and coherence scores across a range of topic numbers (from 5 to 50) within the proposed method, requiring hours to process databases D1 and D2. Computing exclusivity and coherence scores for various topic numbers involves running multiple iterations of the topic modeling algorithm for each specified number of topics. This iterative process is computationally intensive compared with other steps in the proposed method, such as the automatic ranking of a sample of courses across various criteria, which completes within seconds. Nevertheless, the overall computational time for the proposed method is much shorter than that of manual coding. However, as dataset sizes expand, encompassing larger vocabularies and broader ranges of topics, the time and resources for model training and inference will also increase, potentially raising scalability concerns. Future endeavors may therefore consider optimizing the scoring algorithms or adopting more computationally efficient approaches for computing exclusivity and coherence scores to alleviate the computational burden of the overall modeling process.
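One possible mitigation, sketched below under the assumption that model fits for different topic numbers are independent, is to distribute the sweep across worker processes. Here fit_and_score is a hypothetical helper standing in for fitting a model with k topics and returning its scores (as in the previous sketch); this parallelization is an illustration, not part of the published pipeline.

```python
# Illustrative only: parallelize the sweep over candidate topic numbers so
# that several models are fitted at once.
from concurrent.futures import ProcessPoolExecutor

def fit_and_score(k):
    # In practice: fit the topic model on the review corpus with k topics
    # and return its (exclusivity, coherence) scores; a trivial placeholder
    # keeps this sketch runnable on its own.
    return (1.0 / k, -float(k))

if __name__ == "__main__":
    ks = list(range(5, 51))                     # the range used in the study
    with ProcessPoolExecutor(max_workers=4) as pool:
        scores = dict(zip(ks, pool.map(fit_and_score, ks)))
    print(scores[5], scores[50])
```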

Overall, the proposed methodology enables effective and efficient analysis of extensive textual data, in stark contrast to manual analysis methods. We recognize its limitations compared with a “true close reading” of texts. Even so, automated text analysis offers tremendous potential for educational researchers grappling with vast volumes of data, facilitating the exploration of MOOC learner-generated texts.

6.6 Limitations and future work

This study has limitations. Specifically, the review data was exclusively obtained from Class Central, a well-known MOOC aggregator website. While Class Central is esteemed, incorporating data from diverse MOOC platforms such as edX, Coursera, and self-developed online learning platforms, and even open-ended MOOC surveys of learners and instructors, would offer additional validation and complement the study’s findings. The present study also concentrated solely on Art, Design, and Humanities as well as Computer Science, Engineering, and Programming MOOCs as representatives of knowledge- and skill-seeking courses. Since courses from different disciplines may display unique characteristics, it is imperative to include MOOCs from a broader spectrum of disciplines, thereby enhancing the applicability and generalizability of the findings and augmenting the model’s effectiveness.

7 Conclusion

This study introduced a novel approach to automatically evaluating MOOCs using a multi-criteria decision-making model that combines text mining and the AHP. The research analyzed reviews from both knowledge-seeking MOOCs, such as Art, Design, and Humanities courses, and skill-seeking MOOCs, such as Computer Science, Engineering, and Programming courses. Criteria prioritized by both types of learners include “assessment”, “content”, “effort”, “usefulness”, “enjoyment”, “faculty”, “interaction”, and “structure”. Skill-seeking learners additionally emphasized “technicality”, encompassing ease of use and flexibility. For both knowledge-seeking and skill-seeking learners, “effort” and “faculty” were deemed crucial. However, skill-seeking learners placed more importance on “assessment” and “enjoyment”, while knowledge-seeking learners valued “usefulness” more highly. The research demonstrated the effectiveness of employing online learner reviews and topic modeling for automated MOOC evaluation.

Several issues in the research design of this study should be noted: (1) the use of data related only to Art, Design, and Humanities and Computer Science, Engineering, and Programming courses with an overall rating score of 4.7 or higher from Class Central; (2) the determination of predetermined labels based on a review of the literature; (3) the manual analysis of the algorithm-identified topics; and (4) the determination of models based on exclusivity and coherence scores across a range of 5 to 50 topics. Future work may address the following issues: first, including a larger range of overall rating scores for courses from different disciplines and diverse MOOC platforms; second, proposing more fine-grained methods, such as the Delphi method, to determine predetermined labels; third, integrating STM outcomes with automated methods for topic naming; and fourth, combining more advanced heuristic-driven methodologies, such as perplexity metrics, and potentially exploring Bayesian approaches to determine the ideal number of topics. Additionally, it is vital to acknowledge that latent topics within course reviews may evolve over time. Future research could therefore focus on developing tools capable of providing real-time analyses of the topics relevant to MOOC learners.

In sum, the proposed automated text analysis methodology offers educational researchers tremendous potential for effective and efficient analysis of vast volumes of textual data, shedding light on course evaluation and selection. The findings have significant implications for optimizing course design, enhancing instructor engagement, and improving learner satisfaction. MOOC platforms can leverage this model to provide personalized course recommendations, ultimately contributing to the advancement of online education and enhancing the MOOC learning experience for learners.