Introduction

Learning management systems (LMS), at times referred to as course management systems (CMS; Malikowski et al., 2007), have emerged as a standard component of higher education institutions for the web-based delivery and management of courses (Al-Busaidi & Al-Shihi, 2012; Turnbull et al., 2019). LMSs have been defined as information systems that generate, distribute, and manage learning content as part of an organization's IT infrastructure (Martins et al., 2019). Given the history of LMS underutilization in many higher education institutions (Kite et al., 2020), it has become increasingly important to find ways of promoting LMS use.

The COVID-19 pandemic highlighted the value of LMSs in facilitating online teaching and learning in colleges and universities. At the height of the global health crisis, LMSs transitioned into a key information and communication tool in higher education institutions that had formerly used LMSs only to supplement traditional classroom-based instruction. Measuring students' LMS experiences could generate valuable information for improving user experience, especially with the current resumption of face-to-face classes. Capturing these experiences may also yield insights that would help promote the continued use and adoption of LMSs in the post-COVID pandemic era. Lastly, the growing appeal of massive open online courses (MOOCs) and open universities (Lemley, 2015) that exclusively rely on LMSs to deliver courses further underscores the importance of capturing students' LMS experience.

In the current study, we identify constructs that are important in measuring the LMS experience of students. Decisions were based on a scoping review of LMS studies that captured LMS experience with the use of quantitative measurement tools (Simon et al., 2023). In this scoping review, we identified 270 concepts measured in the included studies, with 2,327 items in total. The ten concepts measured most often were perceived usefulness, perceived ease of use, behavioural intention, information quality, performance expectancy, social influence, effort expectancy, facilitating conditions, system quality, and self-efficacy. Through this scoping review, we found that it was common practice for LMS researchers to adopt items from pre-existing tools. In the majority of studies, the item selection process was contingent upon the theories and frameworks guiding the research model. While we adhered to the same procedure, our decision process was further informed by findings from qualitative interviews we conducted with students and teachers. Using a hybrid approach to item selection, we applied the results of the qualitative study to supplement existing frameworks used in the educational technology and educational psychology literature. This process entailed modifying items from previous studies to align with our chosen constructs and their definitions (see Supplementary Table 1). We discuss the theories employed and the process of item selection in the section below.

Study framework

We drew from the work of Al-Nuaimi et al. (2022), which employed three theories often utilized in LMS acceptance and adoption studies (Al-Nuaimi & Al-Emran, 2021; Reddy et al., 2021)—the Information Systems Success Model (DeLone & McLean, 1992, 2003), the Technology Acceptance Model (TAM; Davis, 1989), and the Theory of Planned Behavior (TPB; Ajzen, 1991). We employed the updated Information Systems Success Model, which incorporates service quality in addition to the originally identified Information Systems success factors. We also adopted two core variables from the TAM, perceived ease of use and perceived usefulness, as these two have demonstrated explanatory power in various contexts including LMS use (Al-Nuaimi & Al-Emran, 2021). The inclusion of TPB, on the other hand, is an acknowledgment that LMS acceptance and adoption decisions are influenced by attitudinal and social normative factors (Ajzen & Fishbein, 1973). We added further constructs based on findings gleaned from the qualitative interviews we conducted with teachers and students on their use of LMS.

Anchored in the above-mentioned theories, the present study examines essential factors for information systems success (information quality, system quality, and service quality). We related these factors to two fundamental perceptions of LMS that have been extensively used by past researchers—perceived ease of use and usefulness. This is important because research examining the relationship of quality factors to LMS perceived ease of use and usefulness is rather scarce (Al-Nuaimi et al., 2022). The significance of studying external factors that are likely to be associated with perceived ease of use and usefulness stems from a strong literature base showing that these two variables lead to behavioural intention and actual use of a system (Cigdem & Topcu, 2015; Raza et al., 2021; Teo et al., 2019; Watty et al., 2016). The theory of planned behavior suggests a link between attitudes (affected by past interactions and experiences) and behavior (Ajzen, 1991). The social group to which an individual belongs is purported to exert a strong influence on attitude development and on the intention to perform a particular behavior, as is the extent of the individual's belief in their ability to control the behavior in question. Therefore, we additionally included two socio-psychological constructs from the theory, subjective norms and perceived behavioural control. These two factors have been found to be related to behavioural intention and actual LMS use (Huang et al., 2019; Kim et al., 2021).

The Technological Pedagogical Content Knowledge framework (TPACK; Mishra & Koehler, 2006) is a widely used framework for discussing successful integration of Information and Communication Technologies in education (e.g., Hew et al., 2019; Rosenberg & Koehler, 2015; Willermark, 2018). The framework posits that technological knowledge, pedagogical knowledge, content knowledge, and their intersections impact effective technology integration in teachers' practice. Results of the qualitative interviews with LMS users (students and teachers) conducted prior to the selection of constructs for the present study confirmed how the dynamic interaction among technological knowledge, pedagogical style, and course content influences educational technology integration. Hence, in addition to the four constructs mentioned above that are hypothesized to impact behavioural intention to use LMS, we added two variables to the original model—instructor quality and relevance of LMS to content and pedagogy. Instructor quality captures students' perceptions of teachers' technological, pedagogical, and communication skills in the LMS context, while relevance of LMS to content and pedagogy captures students' perceptions of the LMS's relevance to their program and to most aspects of their learning. LMS acceptance research customarily integrates socio-psychological theories and information systems theories. Still, compelling questions remain in motivational research on the use of learning management systems (Huang, 2022). Quite understandably, most models examine the factors that lead to LMS use (Cigdem & Topcu, 2015; Garone et al., 2019; Panigrahi et al., 2018; Raza et al., 2021; Teo et al., 2019; Watty et al., 2016). Interest and research on the use of learning analytics (e.g., digital trace data) to monitor students' behavioural engagement in LMS are also on the rise (Wong et al., 2021; Ye & Pennisi, 2022). But to measure the impact of an educational technology tool, examination must go beyond intention to use and actual use of the tool. Therefore, in measuring the LMS experience of students, we deem it imperative to capture outcomes relevant to learning in the LMS: perceived performance, motivation to learn through the use of an LMS, and self-efficacy for the course through the use of an LMS. These three factors fall under net benefits as defined in DeLone and McLean's updated IS success model (2003). As the variables pertain to the perceived performance, motivation to learn, and self-efficacy of students, our focus is on how the information system, in this case the LMS, benefits the individual.

Past research on LMSs tended either to measure only students' motivation in the learning content (e.g., course, domain) (e.g., Ozonur et al., 2018; Karaoğlan Yılmaz, 2022) or to measure only their intention to use and actual use of a new digital learning tool (e.g., Al-Nuaimi et al., 2022; Koh & Kan, 2020). Without a measure of both, we cannot know whether the learning experience with new technologies actually contributed to the development of motivation in the learning content. Students might show greater attention or effort that contributes to stronger learning gains because of the novelty of the learning tool, but once students get used to it, their engagement might start to diminish (i.e., the novelty effect; Koch et al., 2018). More importantly, previous frameworks could not fully explain why the intention to use and actual use of an LMS or any learning tool connect to students' motivation towards learning. Expectancy-Value Theory (e.g., Eccles & Wigfield, 2002; Eccles et al., 1983; Wigfield et al., 2021) postulates that achievement-related choices are motivated by a combination of students' expectations for success and subjective task value in particular domains. In other words, students are more likely to pursue an activity if they expect to do well and if they value the activity. Hence, we applied Expectancy-Value Theory to formulate constructs that reflect these components. Learning theories such as the four-phase model of interest development (e.g., Hidi & Renninger, 2006; Renninger & Hidi, 2017) and the Model of Domain Learning (e.g., Alexander, 2003), along with empirical evidence (e.g., Fryer et al., 2021), indicate that knowledge, motivation (i.e., interest) and confidence (i.e., self-efficacy) influence one another, and that quantitative and qualitative changes occur in these components during learning activities. Consequently, we added items that capture a student's confidence in their ability to succeed in their academic tasks and to perform well through the use of LMS (self-efficacy in the course through the use of LMS and perceived performance). We also developed items that indicate how useful or enjoyable the student perceives learning to be when using the LMS (motivation to learn through the use of LMS). The value a student places on learning via LMS is assumed to be shaped by their prior experiences, beliefs, and environmental influences. These prior experiences and beliefs are reflected in the constructs of perceived usefulness (as a parallel to utility value in the theory), perceived ease of use (as a parallel to cost), and other constructs (e.g., perceived behavioural control, subjective norms) anchored in the different theories mentioned previously. We argue that authentic assessment of e-learning success should consider the impact of various technological, social, and individual difference variables on learning outcomes relevant to the use of a particular educational technology tool. This study aligns with previous suggestions to combine the assessment of technical features in an LMS with the evaluation of constructs pertaining to learning (Malikowski et al., 2007).

Significance and aims of the study

The Modular Object-Oriented Dynamic Learning Environment (Moodle) was the LMS evaluated in this study. It is the official LMS used by teachers and students at the public university in Hong Kong where the survey was administered. Unlike Blackboard or Canvas, Moodle is an open-source LMS that is free to use, low cost, and flexible. Relative to other platforms, Moodle can more easily be adapted to the unique needs of different faculties. Moodle had 11,289,190 registered users by the end of 2019 (Hill, 2019), and a systematic review found it to be the most common and most preferred open-source LMS (Altinpulluk & Kesim, 2021). Moodle version 3.11 was in use at the time the survey was distributed among students.

The need for this study is justified, first, by the lack of studies on LMS conducted in Asia (Hu et al., 2019; Huang, 2022). This is a point of concern given the monumental increase in the number of students who have access to higher education in the region (Huang, 2012). Second, the significance of examining the factors that impact the success of LMS use in higher education institutions during the pandemic has been underestimated (Al-Nuaimi et al., 2022; Alsabawy et al., 2016). There is a need to revive interest in the systematic validation and evaluation of LMS experiences now that most universities globally have shifted back to face-to-face classes. Third, a number of studies have neglected to report the psychometric properties of the instruments they used (Liu & Tsai, 2011; Zhang et al., 2021). We therefore saw the importance of developing and validating an empirical and theory-based measurement tool to capture the overall LMS experience of university students. Fourth, to the authors' knowledge, there is currently no standard Chinese translation of LMS instruments in the literature. While English is the most widely used language globally, Chinese ranks second, with roughly 1.2 billion speakers (Lane, 2023). For this reason, we aim to assess the psychometric properties of the LMS instrument for use in universities around the globe and to make a Chinese version available as well. As an additional objective, we aim to test the measurement equivalence of the constructs across the English version and the Chinese translation of the scales. The Chinese translation in this study employed traditional Chinese characters, which can easily be converted to simplified Chinese. Should invariance be achieved, the indicators can be said to reflect the same underlying construct across the Chinese and English versions and thus to have the same meaning.

The present study aimed to address three specific research questions relevant to measuring the LMS experience of students using multiple constructs:

Research question 1: Do the selected constructs have sufficient validity and reliability to measure the LMS experience of undergraduate students?

Research question 2: Are the constructs interrelated in a manner consistent with the underlying framework used in the selection of items for measuring LMS experience of undergraduate students?

Research question 3: Are the constructs for LMS student experience measured equivalently across the English and Chinese versions?

Methods

Participants and procedures

The project was reviewed and approved by the university’s Institutional Review Board. A protocol was pre-registered and published in the Open Science Framework (OSF) repository prior to conducting the current study. It can be accessed through the following link: https://doi.org/10.17605/OSF.IO/VAQU7.

Participants were 486 undergraduate students (245 female) from all 10 faculties of a public university in Hong Kong. The English version of the survey was answered by 250 participants, while the Chinese version was answered by 236. Please refer to Supplementary Tables 2 and 3 for the gender, school year, and faculty distributions of participants for the English and Chinese versions of the survey, respectively.

Undergraduate student research assistants from across the university's faculties, managed by a project coordinator, helped recruit respondents for the study. Half of the research assistants were assigned to distribute the link to the English version of the survey, while the other half distributed the link to the Chinese version. Data were collected at specific locations across campus as well as through students' online groups on social media platforms.

Informed consent and data were collected via Qualtrics on mobile devices. Students accessed the survey by scanning a QR code. Before answering the survey, all participants were informed of the project's aims (i.e., [1] to evaluate the use of Moodle in the 10 faculties of the university, and [2] to identify areas for improvement in Moodle, thereby providing direction for promoting the university's online learning strategy). Students were informed that their participation was voluntary, that the data collected from them would be anonymized, and that they were free to withdraw at any stage without negative consequences.

Instruments

The online questionnaire consisted of items organized into three sections. The first section was the consent form, with one item informing participants of the purposes of the study and requesting their willingness to participate. The second section gathered baseline demographic information, including gender, faculty, school year, and experience in using Moodle (i.e., how long they had been using it). The third section contained the 35 items that made up the 14 constructs we selected to measure students' LMS experience. To reduce the impact of ordering effects, the items were presented to participants in random order using the "statement randomization" function of Qualtrics. Items were preceded by the instruction "These are statements about your general experience in using Moodle. Please rate how much each statement matches you." All responses were recorded on a Likert-type scale ranging from 1 (Not at all) to 6 (Completely).

The 35 items measuring students' learning experience in using the LMS were originally developed in English. English-to-Chinese translation was therefore performed by a post-doctoral fellow fluent in both languages, and the translations were independently back-translated by two bilingual research assistants.

Data analysis procedure

To establish an instrument for measuring students' learning experience in using an LMS that suits the higher education context, we gathered cross-sectional data from university students across faculties to verify the reliability and validity of the scale in both Chinese and English. Specifically, we evaluated (1) the internal consistency reliability and (2) the construct validity of the scale.

First, we used composite reliability (CR) to verify the internal consistency reliability of each scale. A CR value greater than 0.70 indicates good reliability (Hair et al., 2010), while a value above 0.60 indicates an acceptable level (Hair et al., 2021). Second, we calculated the average variance extracted (AVE) and the maximum shared variance (MSV) to evaluate the convergent and discriminant validity of the constructs. Convergent validity is achieved when the AVE is equal to or greater than 0.50 and lower than the corresponding CR; an AVE equal to or greater than 0.40 is acceptable if the corresponding CR is greater than 0.60 (Fornell & Larcker, 1981). As for discriminant validity, the AVE should be greater than the MSV (Hair et al., 2010). Correlations among the constructs were calculated to check for multicollinearity; Tabachnick and Fidell (2007) indicated that multicollinearity is suggested when predictors correlate above 0.90.
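To make these decision rules concrete, the following Python sketch computes CR, AVE, and MSV from standardized CFA estimates, using the standard Fornell and Larcker (1981) formulas. The loadings and inter-factor correlations are hypothetical values for illustration, not estimates from the present study.

```python
import numpy as np

def composite_reliability(loadings):
    """CR = (sum of loadings)^2 / ((sum of loadings)^2 + sum of error variances);
    with standardized loadings, each error variance is 1 - loading^2."""
    lam = np.asarray(loadings)
    numerator = lam.sum() ** 2
    return numerator / (numerator + (1 - lam ** 2).sum())

def average_variance_extracted(loadings):
    """AVE = mean of the squared standardized loadings (Fornell & Larcker, 1981)."""
    lam = np.asarray(loadings)
    return (lam ** 2).mean()

def maximum_shared_variance(inter_factor_corrs):
    """MSV for a factor = largest squared correlation with any other factor."""
    r = np.asarray(inter_factor_corrs)
    return (r ** 2).max()

# Hypothetical standardized loadings for one three-item construct
lam = [0.72, 0.65, 0.81]
cr = composite_reliability(lam)             # ~0.77 -> good (> 0.70)
ave = average_variance_extracted(lam)       # ~0.53 -> convergent validity (>= 0.50, < CR)
msv = maximum_shared_variance([0.54, 0.61, 0.47])  # ~0.37 < AVE -> discriminant validity
print(f"CR = {cr:.2f}, AVE = {ave:.2f}, MSV = {msv:.2f}")
```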

Confirmatory Factor Analysis (CFA; Brown, 2015) was conducted, and multiple fit indices were used to assess the structural models: the Root Mean Square Error of Approximation (RMSEA), with estimates below 0.08 and 0.05 indicating acceptable and good fit, respectively (Browne & Cudeck, 1992); the Comparative Fit Index (CFI), with estimates above 0.90 and 0.95 indicating acceptable and good fit, respectively (Marsh et al., 1988); and the Standardized Root Mean Square Residual (SRMR), for which estimates below 0.08 are generally considered a good fit (Hu & Bentler, 1999). Items with factor loadings lower than 0.40 or higher than 0.95 were to be removed (Hair et al., 2010).
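These cut-offs can be summarized in two short helper functions; the sketch below (with hypothetical input values) simply labels each index and flags items against the criteria cited above.

```python
def assess_fit(rmsea: float, cfi: float, srmr: float) -> dict:
    """Label model fit against the cut-offs used in this study:
    RMSEA < .05 good, < .08 acceptable (Browne & Cudeck, 1992);
    CFI > .95 good, > .90 acceptable (Marsh et al., 1988);
    SRMR < .08 good (Hu & Bentler, 1999)."""
    return {
        "RMSEA": "good" if rmsea < 0.05 else ("acceptable" if rmsea < 0.08 else "poor"),
        "CFI": "good" if cfi > 0.95 else ("acceptable" if cfi > 0.90 else "poor"),
        "SRMR": "good" if srmr < 0.08 else "poor",
    }

def items_to_remove(loadings: list) -> list:
    """Flag items whose standardized loadings fall below .40 or exceed .95 (Hair et al., 2010)."""
    return [i for i, lam in enumerate(loadings) if lam < 0.40 or lam > 0.95]

# Hypothetical estimates for illustration
print(assess_fit(rmsea=0.046, cfi=0.93, srmr=0.05))
# {'RMSEA': 'good', 'CFI': 'acceptable', 'SRMR': 'good'}
print(items_to_remove([0.45, 0.81, 0.38]))  # -> [2]
```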

Finally, measurement invariance between the English and Chinese versions of the scale was tested by estimating three nested models: (1) a configural model, in which the factor structure is the same across languages but factor loadings, intercepts, and residual variances are allowed to differ; (2) a metric model, in which the factor loadings are constrained to be equal across languages but the intercepts are allowed to differ; and (3) a scalar model, in which both loadings and intercepts are constrained to be equal across languages. Following Chen (2007), a given level of invariance was deemed unsupported when the chi-square difference test was significant at α = 0.05 and the ∆CFI exceeded 0.01, even if the ∆RMSEA was less than 0.015 and the ∆SRMR was less than 0.03. When a level of invariance failed these criteria, as scalar invariance did, we conducted a partial measurement invariance analysis (Luong & Flake, 2022). A fourth model additionally requires residuals or measurement errors to be equivalent across groups (i.e., residual variance invariance); however, invariance levels beyond scalar invariance represent very strict standards that are often difficult to fulfill empirically (Wang et al., 2018). Therefore, in the current study we only tested configural, metric, and scalar invariance between the Chinese and English versions of the scale.
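A minimal sketch of this decision rule, assuming the fit statistics of each pair of nested models have already been estimated (the difference values below are hypothetical, not the study's results):

```python
def invariance_supported(delta_chisq_p, d_cfi, d_rmsea, d_srmr):
    """Decision rule following Chen (2007): a level of invariance is rejected
    when the chi-square difference test is significant at alpha = .05 AND the
    CFI drops by more than .01. Delta-RMSEA (< .015) and delta-SRMR (< .03)
    are reported as secondary checks but are not decisive under this rule."""
    secondary_ok = (d_rmsea < 0.015) and (d_srmr < 0.03)
    rejected = (delta_chisq_p < 0.05) and (d_cfi > 0.01)
    return {"supported": not rejected, "secondary_checks_ok": secondary_ok}

# Metric vs. configural (hypothetical deltas): invariance holds
print(invariance_supported(delta_chisq_p=0.08, d_cfi=0.004, d_rmsea=0.002, d_srmr=0.010))
# Scalar vs. metric (hypothetical deltas): rejected -> test partial scalar invariance
print(invariance_supported(delta_chisq_p=0.001, d_cfi=0.020, d_rmsea=0.004, d_srmr=0.012))
```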

Results

Reliability and validity of the English version of the survey

Results for the CR, AVE, and MSV of the English version of the survey are presented in Table 1. CR values ranged from 0.60 to 0.81, all meeting the acceptable level and demonstrating internal consistency. AVE estimates ranged from 0.43 to 0.63, except for System Quality. As noted above, an AVE below 0.50 can still be deemed acceptable provided the corresponding CR exceeds 0.60. Convergent validity was therefore achieved for every construct except System Quality. The MSV of the majority of the constructs was also lower than the corresponding AVE, indicating discriminant validity for all constructs except System Quality, Service Quality, and Instructor Quality.

Table 1 CR, AVE, and MSV of the English version of the survey

Table 2 shows the correlations among the 14 constructs of the English version of the survey. Estimates ranged from 0.32 to 0.74, demonstrating weak to moderate correlations and no multicollinearity among the constructs.

Table 2 Correlations among constructs of the English version of the survey (n = 250)

Results of the CFA of the English version of the survey are presented in Table 3. RMSEA and CFI showed acceptable levels of model fit, and SRMR demonstrated good model fit. The factor loadings of the 14 constructs ranged from 0.45 to 0.81, all meeting the acceptable level.

Table 3 Correlations among constructs of the Chinese version of the survey (n = 236)

Reliability and validity of the Chinese version of the scale

Results for the CR, AVE, and MSV of the Chinese version of the survey are presented in Table 4. First, the CRs of most of the constructs were higher than 0.60, meeting the acceptable level of internal consistency; however, Information Quality, Subjective Norms, Behavioural Intention, and Actual Use had CRs below 0.60. With regard to AVE, the estimates for System Quality, Information Quality, and Actual Use were lower than 0.40, while the AVE estimates of the other constructs were higher than 0.40 and lower than the corresponding CR, demonstrating convergent validity. The MSV of System Quality, Information Quality, Service Quality, Perceived Behavioural Control, Subjective Norms, and Behavioural Intention were higher than the corresponding AVE, failing to meet the criteria for discriminant validity.

Table 4 Results of CFA of the Chinese version of the survey

Table 3 shows the correlations among the 14 constructs of the Chinese version of the survey. Estimates ranged from 0.41 to 0.71, demonstrating moderate correlations and no multicollinearity among the constructs.

Results of the CFA of the Chinese version of the survey are presented in Table 4. CFI showed an acceptable level of model fit, whereas the RMSEA and SRMR estimates demonstrated good model fit. Factor loadings of the 14 constructs ranged from 0.44 to 0.81, all meeting the acceptable level (Hair et al., 2010).

Measurement invariance between the English and Chinese versions of the survey

Table 5 shows the results of the measurement invariance tests between the English and Chinese versions of the survey. Despite the significant increase in the chi-square value, there were no significant differences in the CFI, RMSEA, and SRMR between the metric and configural models. Next, the comparison between the metric model and the scalar model showed a significant increase in chi-square, and the decrease in CFI was 0.02; however, there were no significant differences in RMSEA and SRMR. We therefore removed the constraints on the intercepts of items INSQ2 and PBC2 to achieve partial invariance of the scalar model. Comparison between the new model (C2) and the metric model still showed a significant increase in chi-square, but the decrease in CFI fell to 0.01. Hence, we concluded that partial scalar invariance was tenable.

Table 5 Model fit indices of the models with configural (A), metric (B), and scalar (C) measurement invariance between the English and Chinese versions of the survey

Discussion

The present research developed an empirical and theory-based measurement tool, in both English and Chinese, to capture the overall experience of university students in using an LMS. We verified the reliability and validity of both versions of the survey as well as their measurement invariance. We present our findings in the following paragraphs, organized around our research questions, and end the discussion with implications, limitations, and future directions.

Reliability and validity of the English and Chinese versions of the survey

The reliabilities of all scales of the English version of the survey were within the acceptable range. On the other hand, four constructs from the Chinese version of the instrument (i.e., Information Quality, Subjective Norms, Behavioural Intention, and Actual Use) had less than ideal internal consistencies. The relatively low reliability of these four factors can perhaps be attributed to the scales having fewer items than usual (most of the scales were composed of only two to three items). Researchers construct shorter scales to parsimoniously capture the target construct, thereby reducing participants' burden in survey studies. While there are instances when the use of fewer items does not compromise the reliability of shortened versions of established scales, it is often the case that psychometric quality is sacrificed to save time and resources when brief scales are used (Kemper et al., 2019). The relatively low reliability of the brief scales used in the current study demonstrates the trade-offs that occur when pragmatic strategies are employed to measure certain constructs, in this case students' LMS experiences. Another possible explanation for the suboptimal reliability of some of the scales is the use of the "statement randomization" function of Qualtrics to reduce the impact of ordering effects. Nevertheless, the fact that the English version of the scales (and most scales from the Chinese version) still had adequate reliability despite items having been presented in random order can be treated as evidence for the adequacy of the instrument's psychometric properties.

In terms of validity, the results of the CFA for both the English and Chinese versions of the instrument were adequate and generally demonstrated good fit. Convergent validity was achieved for every construct in the English version except System Quality. Criteria for discriminant validity were also met except for the constructs System Quality, Service Quality, and Instructor Quality. Discriminant validity indicates the degree to which a scale does not measure constructs other than the one it was originally intended to measure (Ramayah et al., 2013). System Quality and Service Quality were both based on the Information Systems (IS) Success Model (DeLone & McLean, 1992, 2003), and Instructor Quality is a closely related quality construct, possibly explaining why these three constructs did not meet the criteria for discriminant validity. For the Chinese version, most scales met the criteria for convergent validity, except for three (i.e., System Quality, Information Quality, and Actual Use). Additionally, six of the 14 constructs (i.e., System Quality, Information Quality, Service Quality, Perceived Behavioural Control, Subjective Norms, and Behavioural Intention) failed to meet the criteria for discriminant validity. For a construct to meet the criteria for discriminant validity, it should not correlate strongly with constructs that are theoretically unrelated to it (Crano et al., 2015). This partly explains why the six constructs failed to meet the criteria for this type of validity: theoretically, these six constructs are supposed to be related, since they were drawn from the same models (the IS Success Model for System Quality, Information Quality, and Service Quality; the Theory of Planned Behaviour for Perceived Behavioural Control, Subjective Norms, and Behavioural Intention).

Despite having bilingual English-Chinese speakers translate and back-translate the scales, it is notable that the Chinese version had weaker reliability and validity than the English version. To address this issue, the items in the Chinese version may have to be reviewed and assessed for possible modification before they are employed in future studies. It should also be noted that item one of the System Quality scale, "I am able to access Moodle easily from any device [e.g., tablet, notebook, smart phone (iOS, Android)]", consistently had low factor loadings relative to the other items in the scale (0.45 for the English version and 0.44 for the Chinese version). These values, however, still meet the cut-off for acceptability set by Hair et al. (2010), as do the rest of the factor loadings in both the English and Chinese versions. The value of retaining this particular item is corroborated by findings from the qualitative interviews we conducted with teachers and students about the importance of the convenience and accessibility of Moodle in defining user experience.

Correlations between variables

Correlations between the variables in both the English and Chinese versions of the instrument were all significant and in the expected direction. The strongest correlation in the English version was between System Quality and Perceived Usefulness (0.74), while the weakest was between Motivation to Learn through the Use of LMS and Actual Use (0.32). In the Chinese version, the strongest correlation was between Service Quality and Perceived Performance (0.73), while the weakest was between Motivation to Learn through the Use of LMS and Perceived Behavioural Control (0.41). There is existing evidence for the positive relationship between System Quality and the Perceived Usefulness of LMS (Al-Fraihat et al., 2020; Mailizar et al., 2021). It can also be noted that in the study by Al-Nuaimi et al. (2022), only System Quality (and not Information Quality or Service Quality) had a significant relationship with Moodle's Perceived Usefulness. This suggests that an LMS's efficiency, functionality, and accessibility are the factors with the greatest potential impact on whether students perceive the platform as useful in their learning. On the other hand, we found a relatively weak, albeit significant, positive relationship between Motivation to Learn through the Use of LMS and Actual Use. Our development of a new construct measuring students' motivation to learn through an LMS was prompted by the observed scarcity of recent studies discussing the impact of technology use on academic motivation. Although a review of existing meta-analyses found medium mean effect sizes of technology use on academic motivation (see review by Jansen et al., 2022), there is a need to update the literature to keep up with the pandemic-induced shift to more online technologies in education. In addition, we aimed to fill a gap in previous LMS research, which tended either to measure only students' motivation in the learning content (e.g., course, domain) (e.g., Ozonur et al., 2018; Karaoğlan Yılmaz, 2022) or to measure only their intention to use and actual use of a new digital learning tool (e.g., Al-Nuaimi et al., 2022; Koh & Kan, 2020). Future studies would benefit from employing robust methods such as longitudinal cross-lagged panel designs to investigate student motivation in the context of e-learning (see Fryer et al., 2014; Fryer & Bovee, 2016, 2018). This would allow researchers to establish causal connections and to derive more meaningful insights into how student motivation is developed and sustained in digital learning environments.

For the Chinese version, the strong correlation between Service Quality and Perceived Performance implies that students' perception of the availability and accessibility of training opportunities and technical support from IT staff pertaining to LMS use (Al-Fraihat et al., 2020) is significantly related to their perception of their academic performance while using the LMS. From the perspective of Expectancy-Value Theory, the value a student places on learning via LMS and their expectations of success are shaped by their prior experiences, beliefs, and environmental influences. It is likely that knowing help with the LMS is available whenever they need it bolsters students' confidence to perform well academically through the LMS. Finally, while the relationship between Motivation to Learn through the Use of LMS and Perceived Behavioural Control was relatively weak, the correlation implies that a student's belief in their knowledge of, and control over, their decision to use the LMS could influence their motivation to learn through the platform.

Measurement invariance across the English and Chinese versions

The instrument met the criteria for configural and metric invariance, and for partial scalar invariance. Partial invariance of the scalar model was achieved after the removal of the constraints on the intercepts of item two of the Instructor Quality scale and item two of the Perceived Behavioural Control scale. These two items were chosen because their intercept estimates differed from those of most other items and because they came from scales with more than two items (Steinmetz et al., 2009). Two factors can potentially explain the non-equivalence of these items. One is how the English items were translated into Chinese, which is why we recommend that the items in the Chinese version be reviewed and assessed for possible modification before they are employed in future studies. Another source of variation could be the difference in school year between the samples: the students who responded to the English version of the survey were younger than the respondents to the Chinese version. Nonetheless, achieving measurement invariance at the configural and metric levels and partial invariance at the scalar level indicates that the two versions reflect the same underlying constructs and function equivalently across the two languages (Byrne et al., 1989; Steenkamp & Baumgartner, 1998).

Implications

Learning management systems (LMSs) have emerged as a standard component of higher education institutions for the web-based delivery and management of courses. To improve students' learning experience while using LMSs, it is important to find ways to systematically assess the factors that impact the success of LMS use in higher education with psychometrically sound instruments. Developing and validating a tool for capturing students' LMS experience contributes to this endeavour. Findings demonstrated that, in addition to common constructs previously employed in LMS studies, the newly formed constructs are also important aspects of students' learning experience in using an LMS. Specifically, learning outcome variables (i.e., self-efficacy, learning motivation, performance) in the LMS learning environment were added to this scale. The addition of these constructs followed well-developed learning theories, namely Expectancy-Value Theory, the four-phase model of interest development (e.g., Hidi & Renninger, 2006; Renninger & Hidi, 2017), and the Model of Domain Learning (e.g., Alexander, 2003), which could better explain students' intention to use an LMS. Moreover, the evidence we found for the reliability and validity of the scale points to the potential utility of the instrument in capturing a more comprehensive picture of students' LMS experience.

We must note that traditional Chinese characters were used with the current sample because the study was conducted in Hong Kong, where traditional Chinese characters are used officially instead of simplified Chinese. Producing a simplified-character version in future studies is one potential way to extend the instrument's reach, by making a culturally appropriate instrument available to the majority of Chinese-speaking populations.

Limitations and future directions

Despite the evidence for reliability and validity provided in this study, there are methodological limitations that must be noted as guidance for future research. First, only bivariate correlations were tested in the current study. We acknowledge that future LMS research would benefit from assessing mediators that could explain the mechanisms behind the relationships, and from testing moderators that could establish the conditions under which these relationships are strengthened or weakened. Combining theories to form comprehensive and integrative frameworks is thus an important challenge that educational technology researchers could take on in order to build sound models for testing. Second, the current study used a non-probability convenience sampling method, which limits the generalizability of the results. Future research could therefore consider probability sampling methods (e.g., cluster random sampling, stratified sampling) to more accurately reflect the actual distribution of students in each faculty or program in a particular university setting. Third, as the present study captured only students' LMS experience, future research could extend the LMS evaluation to teachers, who are also important stakeholders and end-users of the platform, in order to capture a more comprehensive and diverse set of perspectives on the LMS experience (Neuman, 2014; Sakala & Chigona, 2020). Fourth, since our sample is limited to students from one higher education institution in Hong Kong, future research could gather data from samples from different regions of the Greater China area (e.g., the Mainland, Macao, Taiwan) and from countries across the globe. Fifth, despite our best efforts to collect data from all faculties, the distribution of respondents in terms of gender and school year was uneven across the various faculties, as was the distribution of respondents across the English and Chinese versions of the survey. This uneven distribution could have affected the results to some extent. Lastly, we note that although student assessments of teaching quality have been included in previous studies (Burić & Frenzel, 2023), students could exhibit cultural and gender biases in their ratings of constructs such as instructor quality. We encourage researchers to consider such potential biases in future studies.

Conclusion

Identifying important constructs for assessing the LMS experience of students requires sufficient contextualization. Through an initial scoping review of the LMS literature (Simon et al., 2023) and qualitative interviews with students and teachers, we chose constructs that aligned with our findings and examined how they fit and supplement existing frameworks employed in the educational technology and educational psychology literature. By adding outcome constructs relevant to learning in the LMS, the current study provides a more comprehensive measure of students' learning experience on the platform. The study is also an attempt to revive interest in the systematic evaluation of LMS experiences in the post-pandemic era, given the apparent underestimation of the importance of systematically assessing the factors that impact the success of LMS use in higher education (Al-Nuaimi et al., 2022; Alsabawy et al., 2016). The availability of culturally appropriate instruments with sound psychometric properties can generate valuable information for two purposes: improving user experience and promoting the continued use of LMSs in the post-COVID pandemic era. By addressing the limitations and building on the findings of this study, researchers can further advance our understanding of LMS experiences and contribute to the development of more effective e-learning systems to support teaching and learning in higher education.