Introduction

With the deepening integration of big data [1] and education, the learning revolution, represented by MOOCs [2], Khan Academy [3] and the flipped classroom [4], has had a strong impact on traditional forms of education and has highlighted the role of large-scale online education in reshaping education—that is, globalized resources, modularized supply, personalized teaching and independent study can be implemented by restructuring and process reengineering. Large-scale online education has made significant breakthroughs in teaching environments, content presentation, teaching modes, and learning evaluations. It also innovates and revolutionizes study approaches and is an effective way to promote educational equity and improve teaching quality. By offering new learning situations and technologies, the unique characteristics of MOOCs have given rise to a new learning model, personalized learning [5, 6]. In a MOOC environment, learners can select and order customized courses according to their own purposes and backgrounds. However, the materials in MOOCs are rather complicated; consequently, learners can easily become trapped in “learning trek” and “cognitive overload” situations [7] while receiving information. Moreover, MOOCs provide the same learning materials and learning activities for all learners and ignore individual differences because they fail to analyze individual learning behaviors. As a result, MOOCs cannot teach students according to their aptitudes [8]. By analyzing online learning behaviors, we can accurately identify a user’s learning characteristics and recommend personalized resources to help them improve their quality of learning. At present, determining how to analyze and identify the learning features of individual users by mining big data is an urgent problem in adaptive learning [9, 10].

In an online learning environment, individual differences between learners are clearly evident in terms of time, learning duration, selected learning content, online interactions, etc. Some learners prefer to complete learning tasks during the daytime, while others prefer learning at night. Some learners participate in discussions or group activities on discussion forums or through other social networking applications, while others prefer quiet study. Among all the individual characteristics, learning style is an important factor that affects learners’ individual differences. In other words, different learners have different tendencies in terms of learning style [11].

To date, researchers have made important contributions regarding how to apply learning styles to online learning, especially in the field of learning style identification and prediction [12,13,14,15]. Most of these studies collected the real behavior data left by learners during the network learning process to build a set of learners’ network learning behaviors and then used data mining algorithms, neural networks or simple calculation rules to automatically detect learning styles; some research results have been obtained in this way. However, the central problem is that these studies were not based on large-scale online learning platforms; consequently, they do not meet the demands of modern online learning platforms for effective learning style detection based on a large number of complex network learner behaviors.

Therefore, it is urgent to solve problems such as how to identify learners’ learning styles, how to guide individual learners to construct learning objectives and plans, and how to recommend specific learning resources that reflect an individual’s needs and abilities. Identifying learning styles through learners’ online behavior and classifying them correctly plays an important role in realizing adaptive learning. In a MOOC environment, the following three important problems must be solved to accurately identify and classify learning styles:

  1. What type of learning style model is suitable in MOOC learning environments?

  2. What is the relationship between learning behavior and learning styles in MOOC learning environments?

  3. What classification method can be used to overcome the problem of inaccurate classification due to the high dimensionality of learning behavior data in MOOC environments?

Through intensive study of the relevant literature concerning learning styles, we found that the learning style model proposed by Felder-Silverman is suitable for MOOC online learning environments. Meanwhile, the relationship between online learning behavior characteristics and learning styles is complex. Through our observations, together with the results obtained from experts, we found that differences in student behavior can be described using the following four dimensions: information processing, perception, input, and understanding. Therefore, the characteristic indicators should be developed based on these four learning behavior dimensions to form the data mining dimensions.

We propose a learning style classification method for MOOCs based on a deep belief network (DBN), called DBNLS. In this method, the inherent characteristics of learners—their learning styles—are adopted as the classification criteria. First, we summarized and analyzed individual learner differences and preferences to build a learning style model suitable for MOOC learning environments. Then, we determined which learning-habit indicators should be used based on expert experience and linked the indicators to learning styles within individual sessions. The deep learning DBN model was used to learn the high-dimensional learning style features and model learning styles to classify students accurately. Finally, we collected network learning behavior data by analyzing the weblogs left during online learning sessions on StarC [16], a MOOC platform used at Central China Normal University. Meanwhile, we also conducted offline empirical research and collected learning style questionnaire data, which we used as training samples to train the DBN model. The trained DBNLS model was applied to classify students’ learning styles. The results show that the method proposed in this paper is superior to the traditional methods.

The main contributions of this paper are as follows:

  1. Learners’ intrinsic characteristics—their learning styles—are introduced as an important standard for learner classification. Simultaneously, the explicit attributes of network behavior are effectively mapped to indicators of these intrinsic learning style characteristics, and these network behavior indicators serve as important DBN inputs for classifying students’ learning styles.

  2. We introduce social interaction factors into the learning style model to capture the characteristics of the network learning environment and the learners’ interactions on the learning platform. The learning style model established in this paper focuses not only on the static learning resources available to students but also on how they interact with others. For example, some students tend to complete tasks through communication and discussion, while others tend to think and work independently.

  3. We introduce a deep learning algorithm into learning style classification in the field of education. This approach effectively overcomes the sharp rise in computational complexity caused by high-dimensional data in traditional classification methods, as well as overfitting.

  4. The research was conducted using a practical online learning activity. We collected and distinguished both online and offline data. The offline data served as training data for the model, which was subsequently used to classify students’ online learning behaviors. The results show that the proposed mechanism is considerably more accurate than the traditional classification model.

The rest of this paper is organized as follows: Related work and background introduces related works and provides background concerning learning styles, the restricted Boltzmann machine (RBM) learning model, and the backpropagation (BP) algorithm. We then present the learning style model and the classification model in Learning style model in MOOCs. We present our DBNLS algorithm and its training model in Learning style detection based on deep belief neural networks. The details of the experimental evaluation are described in Experimental, and this work is concluded in Conclusions.

Related work and background

Research on learning style theory

The concept of “learning style”, first defined by Herbert Thelen, has since evolved into dozens of learning style theories and has been put into practice in the field of education. Driven by teaching philosophies such as “teaching students in accordance with their aptitude” and “learner-centered” instruction, an increasing number of scholars have shifted their focus to learners’ learning styles, which they hope will be fully considered in the MOOC design process. In recent years, the rapid development of online educational tools such as MOOCs has inspired scholars to consider how to reflect different learning styles in online education so that appropriate materials and methods can be suggested accordingly to help improve learning efficiency [11,12,13,14,15, 17].

Theories on learning style have become relatively mature after a long research history, and many scholars have proposed sophisticated learning style models. According to Curry’s learning style model, all learning styles can be classified into three levels: “teaching preference” at the outer level, “information processing mode” at the middle level, and “cognitive style” at the innermost level [18]. Learning style models of this type include those of Kolb [19], Honey and Mumford [20], Dunn [21] and Felder-Silverman [11]. In addition, other models, such as cognitive styles, VARK, and Keefe’s learning style model, propose different definitions of learning style.

Dunn’s learning style model is the representative theory of the outer level of the “onion model”. Dunn was mainly concerned with the stimuli that influence learning activities [21]. These stimuli are related to the learning environment, social environment, physiological factors, psychological factors and emotional factors. However, all these stimuli are highly unstable and easily disturbed, as can readily be observed. In contrast, Kolb was interested in the learning process. He suggested that each learning process goes through four interrelated phases and that learners exhibit different preferences towards these phases [19]. Based on the learning models in [19, 22], and [20], Felder and Silverman proposed a new learning style model [11] that focuses on learners’ individual cognitive characteristics and provides a comprehensive description of learning styles by combining information processing, information perception, information input and information understanding.

Felder also designed the Soloman Learning Style Scale based on the learning style model, which provides a good method for measuring learning styles. As a result, Felder-Silverman’s model is not only widely used in practice but is also suitable for web-based learning environments. Moreover, the Soloman Scale has fairly good reliability and validity and has become popular in the educational field [23, 24]. Although scholars have provided different definitions, they all include the three main characteristics of learning styles. First, learning style varies among individuals, which means that different learners tend to have different learning style preferences. Second, learning style formation is affected by stimuli from both the outside environment and the inner self, such as cultural discrepancies, family factors, educational factors, and physiological factors. Third, learning style affects learning behavior: learners with different learning styles show differences in learning strategies and learning habits.

Because Felder-Silverman’s measurement approach, which is biased toward essential (cognitive) characteristics, fits web learning, and because the model has been used frequently in practice and validated through numerous experiments, as shown in Table 1, this paper chooses Felder-Silverman as the base learning style model [25]. However, characteristics such as the social interactions found in new learning environments such as MOOCs were not considered in Felder-Silverman’s model, even though they should be considered in web learning. Therefore, although based on the Felder-Silverman model, this paper expands it with a social interaction dimension to make it more suitable for a MOOC teaching environment.

Table 1 The frequency with which typical learning style models are used in real environments

Learning style identification

The traditional questionnaire approach to measuring learning style does not fit well in a MOOC teaching environment, primarily because factors such as the subjective consciousness of the interviewees, failure to understand the questions, and the fact that learning preferences are measured at a single point in time negatively affect the accuracy of the results. Therefore, an increasing number of scholars, both domestically and internationally, have instead been studying learning styles through automatic detection methods [26,27,28,29,30]. An automatic detection method detects learners’ learning styles automatically by collecting the real data recorded about learners by web learning platforms and applying data mining, neural networks or simple calculation rules to the set of learning behaviors that arise in web learning contexts [31,32,33,34,35].

The University of Vienna analyzed learning platform data and web logs formed during learning platform sessions to recognize learners’ learning behaviors [31]. In [32], the authors predicted students’ learning styles by analyzing log data using BP neural networks, while [33] used a Bayesian network approach to recognize the learning styles of students attending artificial intelligence online courses and discovered a significant disparity in the accuracy of predicting learning styles from different dimensions. The authors of [34] combined decision tree and hidden Markov models to evaluate learning behaviors and handle difficulties in sequence data, thereby more accurately analyzing sequential and global learning styles in the understanding dimension. A contrast test was conducted in [26] on the teaching efficiency of adaptive learning styles; the results show that the students who attended adaptive courses acquired higher learning efficiency and performed better on examinations. The authors of [28] reported that standard achievement assessments can not only assess students’ learning abilities but also detect individual learning characteristics and predict results. The authors of [29] analyzed and processed the web pages visited by web-based learners to study learning styles, and [30] recorded web-based learners’ learning demands and activities and explored their individualized features to study learning performance assessments.

Both the traditional learning algorithms and the backpropagation with adaptive learning rate (BPAL) network algorithm have two weaknesses. First, the raw data cannot include too many properties: the greater the number of properties, the more difficult it is to compose the corresponding vector, and the computational complexity of the classification algorithm increases exponentially with the vector length. Second, the mapping relations between the properties of the raw data and learning styles cannot be too complex; thus, these algorithms are unsuitable for complex mapping relations. The conventional methods fail to analyze and process the complex relations between web learning behavior data and learning styles. In contrast, deep learning extracts characteristics from vectors in a step-by-step manner, allowing more useful features to be learned by building a machine learning model with multiple hidden layers and enormous amounts of training data, which improves classification and prediction accuracy.

Deep belief networks

Deep belief networks (DBNs) were first proposed by Hinton et al. in 2006 [36]. A DBN analyzes the potential features of texts, images and voice by constructing a multilayer neural network model [37]. The training data proceed through the network layer by layer, and each layer extracts more advanced features than the previous layers. Deep learning has substantial advantages over traditional neural network learning methods in two respects. First, training each layer individually greatly improves training efficiency. Second, the unsupervised layer-wise training avoids the risk, present in traditional neural networks, of becoming trapped in local minima. A DBN model can be built as a combination of multilayer RBMs (unsupervised learning networks) [38] with a BP network (a supervised classifier) [39] or other prediction models.

As shown in Fig. 1, an RBM consists of two layers: a visible layer (visible units) and a hidden layer (hidden units). The connections between neurons have the following characteristics: there are no connections between neurons within the same layer, and the two layers are fully connected, meaning that every neuron in the visible layer is connected to every neuron in the hidden layer. In an RBM, each neuron has only two states: 0 or 1.

Fig. 1
figure 1

RBM network structure

An RBM is an energy-based undirected generative model. Its energy function is formulated as follows for a given set of states (v, h):

$$ {E}_{\theta}\left(\mathrm{v},\mathrm{h}\right)=-{\sum}_{i=1}^{n_v}{a}_i{v}_i-{\sum}_{j=1}^{n_h}{b}_j{h}_j-{\sum}_{i=1}^{n_v}{\sum}_{j=1}^{n_h}{w}_{ji}{v}_i{h}_j, $$
(1)

whose terms are described as follows:

\( \mathrm{v}={\left({v}_1,{v}_2,\cdots, {v}_{n_v}\right)}^T \): the state vector of the visible layer, where vi represents the state of the i-th neuron in the visible layer;

\( \mathrm{h}={\left({h}_1,{h}_2,\cdots, {h}_{n_h}\right)}^T \): the state vector of the hidden layer, where hj represents the state of the j-th neuron in the hidden layer;

\( \mathrm{a}={\left({a}_1,{a}_2,\cdots, {a}_{n_v}\right)}^T\in {\mathrm{R}}^{n_v} \): the bias vector of the visible layer, where ai indicates the bias of the i-th neuron in the visible layer;

\( \mathrm{b}={\left({b}_1,{b}_2,\cdots, {b}_{n_h}\right)}^T\in {\mathrm{R}}^{n_h} \): the bias vector of the hidden layer, where bj represents the bias of the j-th neuron in the hidden layer;

\( W=\left({w}_{ij}\right)\in {\mathrm{R}}^{n_h\times {n}_v} \): the weight matrix between the hidden layer and the visible layer, where wij represents the connection weight between the i-th neuron in the hidden layer and the j-th neuron in the visible layer.

The above shows the component form, but it can be rewritten in matrix form:

$$ {E}_{\theta}\left(v,h\right)=-{\mathrm{a}}^Tv-{\mathrm{b}}^Th-{\mathrm{h}}^TWv. $$
(2)

Using the energy function defined in Formula (1), the joint probability distribution of the state (v, h) can be given as follows:

$$ {P}_{\theta}\left(v,h\right)=\frac{1}{Z_{\theta }}\ast {e}^{-{E}_{\theta}\left(v,h\right)}, $$
(3)

where

$$ {Z}_{\theta }={\sum}_{v,h}{e}^{-{E}_{\theta}\left(v,h\right)}. $$
(4)

Here, Zθ is a normalization factor, also called the partition function.

Through derivation, we obtain

$$ {P}_{\theta}\left({h}_k=1\ |\ v\right)= sigmoid\left({b}_k+{\sum}_{i=1}^{n_v}{w}_{k,i}{v}_i\right) $$
(5)
$$ {P}_{\theta}\left({v}_k=1\ |\ h\right)= sigmoid\left({a}_k+{\sum}_{j=1}^{n_h}{w}_{j,k}{h}_j\right) $$
(6)

The sigmoid function is a commonly used activation function in neural networks and is defined as follows:

$$ sigmoid(x)=\left(\frac{1}{1+{e}^{-x}}\right). $$
(7)
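To make the conditional distributions in Eqs. (5)–(7) concrete, the following is a minimal NumPy sketch of the sampling steps for a binary RBM. It assumes a weight matrix W of shape (n_h, n_v), a visible bias a and a hidden bias b, matching the notation above; the function names are illustrative, not part of the original work.

```python
import numpy as np

def sigmoid(x):
    """Logistic activation, Eq. (7)."""
    return 1.0 / (1.0 + np.exp(-x))

def sample_hidden(v, W, b, rng):
    """Eq. (5): P(h_k = 1 | v) = sigmoid(b_k + sum_i w_{k,i} v_i)."""
    p_h = sigmoid(b + W @ v)                          # W has shape (n_h, n_v)
    return (rng.random(p_h.shape) < p_h).astype(float), p_h

def sample_visible(h, W, a, rng):
    """Eq. (6): P(v_k = 1 | h) = sigmoid(a_k + sum_j w_{j,k} h_j)."""
    p_v = sigmoid(a + W.T @ h)
    return (rng.random(p_v.shape) < p_v).astype(float), p_v
```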

RBM training

Given a training sample, RBM training is intended to adjust the parameters θ = {W, a, b} so that, under these parameters, the probability distribution represented by the RBM fits the training data as closely as possible. The mathematical description is as follows:

Suppose the training sample set is

$$ \mathrm{S}=\left\{{v}^1,{v}^2,\cdots, {v}^{n_s}\right\}, $$
(8)

where ns is the number of training samples, \( {\mathrm{v}}^i=\left({v}_1^i,{v}_2^i,\cdots, {v}_{n_v}^i\right) \), i = 1, 2, ⋯, ns, and the samples are independent and identically distributed. The goal when training the RBM is to maximize the likelihood function.

$$ {\mathcal{L}}_{\theta, S}={\prod}_{i=1}^{n_s}P\left({v}^i\right). $$
(9)

Because the function lnx is strictly monotonically increasing, maximizing \( {\mathcal{L}}_{\theta, S}={\prod}_{i=1}^{n_s}P\left({v}^i\right) \) is equivalent to maximizing \( \ln\ {\mathcal{L}}_{\theta, S} \). Therefore, the goal of training the RBM is to maximize the log-likelihood function:

$$ \mathit{\ln}\ {\mathcal{L}}_{\theta, S}=\mathit{\ln}{\prod}_{i=1}^{n_s}P\left({v}^i\right)={\sum}_{i=1}^{n_s}\mathit{\ln}\ P\left({v}^i\right). $$
(10)

For simplicity, in the following, \( {\mathcal{L}}_{\theta, S} \) is simplified to \( {\mathcal{L}}_S \).

Then, the gradient of the log-likelihood function is

$$ \frac{\partial lnP(v)}{\partial \theta }=-{\sum}_hP\left(h|v\right)\frac{\partial E\left(v,h\right)}{\partial \theta }+{\sum}_{v,h}P\left(v,h\right)\frac{\partial E\left(v,h\right)}{\partial \theta }. $$
(11)

The partial derivatives of the log-likelihood with respect to wi,j, ai and bi are

$$ \frac{\partial lnP(v)}{\partial {w}_{i,j}}\approx P\left({h}_i=1|v\right){v}_j-{\sum}_vP(v)P\left({h}_i=1|v\right){v}_j $$
(12)
$$ \frac{\partial lnP(v)}{\partial {a}_i}\approx {v}_i-{\sum}_vP(v){v}_i $$
(13)
$$ \frac{\partial lnP(v)}{\partial {b}_i}\approx P\left({h}_i=1|v\right)-{\sum}_vP(v)P\left({h}_i=1|v\right). $$
(14)

Contrastive divergence (CD) is a standard method for training an RBM [40]. The steps of the k-step CD algorithm (abbreviated as CD-k) are simple. Specifically, for each v ∈ S, take the initial value v(0) ≔ v and run k steps of Gibbs sampling, where step t (t = 1, 2, ⋯, k) is executed as follows:

Use P(h | v(t − 1)) to sample h(t − 1);

Use P(v | h(t − 1)) to sample v(t).

Then, through the k-step Gibbs sampling, we obtain v(k), which is used to approximate the expectation terms \( {\sum}_vP(v)\left(\cdot \right) \) in (12), (13), and (14), yielding (15), (16), and (17):

$$ \frac{\partial lnP(v)}{\partial {w}_{i,j}}\approx P\left({h}_i=1|{v}^{(0)}\right){v}_j^{(0)}-P\left({h}_i=1|{v}^{(k)}\right){v}_j^{(k)} $$
(15)
$$ \frac{\partial lnP(v)}{\partial {a}_i}\approx {v}_i^{(0)}-{v}_i^{(k)} $$
(16)
$$ \frac{\partial lnP(v)}{\partial {b}_i}\approx P\left({h}_i=1|{v}^{(0)}\right)-P\left({h}_i=1|{v}^{(k)}\right). $$
(17)

In fact, the above approximation can be regarded as using

$$ C{D}_k\left(\theta, v\right)=-{\sum}_hP\left(h|{v}^{(0)}\right)\frac{\partial E\left({v}^{(0)},h\right)}{\partial \theta }+{\sum}_hP\left(h|{v}^{(k)}\right)\frac{\partial E\left({v}^{(k)},h\right)}{\partial \theta } $$
(18)

to approximate (11)

$$ \frac{\partial lnP(v)}{\partial \theta }\approx -{\sum}_hP\left(h|{v}^{(0)}\right)\frac{\partial E\left({v}^{(0)},h\right)}{\partial \theta }+{\sum}_hP\left(h|{v}^{(k)}\right)\frac{\partial E\left({v}^{(k)},h\right)}{\partial \theta }. $$
(19)

In this way, by using the stochastic gradient ascent method to maximize the log-likelihood on the training data, the update criteria for each parameter can be described as follows:

$$ \Delta {W}_{ij}={\left\langle {v}_i{h}_j\right\rangle}_{data}-{\left\langle {v}_i{h}_j\right\rangle}_{recon}=\Delta {W}_{ij}+P\left({h}_i=1|{v}^{(0)}\right){v}_j^{(0)}-P\left({h}_i=1|{v}^{(k)}\right){v}_j^{(k)} $$
(20)
$$ \Delta \ {a}_i={\left\langle {v}_i\right\rangle}_{data}-{\left\langle {v}_i\right\rangle}_{recon}=\Delta \ {a}_i+{v}_i^{(0)}-{v}_i^{(k)} $$
(21)
$$ \Delta {b}_j={\left\langle {h}_j\right\rangle}_{data}-{\left\langle {h}_j\right\rangle}_{recon}=\Delta {b}_j+P\left({h}_j=1|{v}^{(0)}\right)-P\left({h}_j=1|{v}^{(k)}\right). $$
(22)
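To illustrate how the CD-k estimates in Eqs. (15)–(17) feed the updates in Eqs. (20)–(22), the sketch below runs one CD-k pass for a single binary sample and returns the approximate gradients. It reuses the sample_hidden/sample_visible helpers from the earlier sketch (both are illustrative assumptions, not the authors' code); in practice the gradients are accumulated over a batch before W, a and b are updated.

```python
import numpy as np

def cd_k(v0, W, a, b, k, rng):
    """One CD-k pass for a single sample; returns (dW, da, db)."""
    _, p_h0 = sample_hidden(v0, W, b, rng)      # hidden probabilities at v^(0)
    vk = v0
    for _ in range(k):                          # k steps of Gibbs sampling
        hk, _ = sample_hidden(vk, W, b, rng)
        vk, _ = sample_visible(hk, W, a, rng)
    _, p_hk = sample_hidden(vk, W, b, rng)      # hidden probabilities at v^(k)
    dW = np.outer(p_h0, v0) - np.outer(p_hk, vk)   # Eq. (20)
    da = v0 - vk                                   # Eq. (21)
    db = p_h0 - p_hk                               # Eq. (22)
    return dW, da, db
```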

Learning style model in MOOCs

Learning style model

The related studies show that Felder-Silverman’s learning style model is suitable for use in web-based education environments, and the accompanying Soloman questionnaire is relatively sophisticated and popular. From the perspectives of information processing, perception, input, and understanding, the model classifies learning styles into 16 categories: LS = {Procession, Perception, Input, Understanding}, where Procession = {Active/Reflective}, Perception = {Sensitive/Intuitive}, Input = {Visual/Verbal}, and Understanding = {Sequential/Global}. Nevertheless, the model considers only the interactions between learners and learning materials rather than the interactions between students and between students and teachers, and the latter are a defining characteristic of MOOC learning environments.

Opinions from domain experts were adequately considered to build a learning style model suitable for MOOC environments. Based on the Felder-Silverman model and considering the new features available in online learning environments, the learning style model used in this study adds social factors and independent/dependent/competitive types with reference to the measurement of interpersonal factors in the Readiness for Education at a Distance Indicator (READI) [41].

We built a learning style model suitable for MOOCs, called MOOCLS, that comprises five dimensions and provides a comprehensive description of learners in net-based learning environments: MOOCLS = {Procession, Perception, Input, Understanding, Society}, where Procession = {Active/Reflective}, Perception = {Sensitive/Intuitive}, Input = {Visual/Verbal}, Understanding = {Sequential/Global}, and Society = {Social/Solitary}. The dimensions of the model and their performance characteristics are listed in Table 2.

Table 2 The learning style model for MOOCs
Table 3 Matching of learning style dimension and learning behavior

Collecting and mapping of e-learning behaviors

Most of the e-learning behavior data studied in this paper come from StarC, an e-learning platform that serves as a tool for fundamental (K-12) education. StarC has made great contributions to resource sharing nationwide. Two hundred MOOC resources were made available online from March 2015 to August 2016, and 85,345 accounts were opened, 32,987 of which were active. The platform provides various functions for users to learn and communicate, such as knowledge maps, videos, tests, wikis, and instant messaging. All the data produced during the learning process are recorded in the system and constitute a set (F) of users’ learning behavior features. We then map these learning behaviors to the learning styles defined by the MOOCLS model according to the experience of e-learning experts, so that a user’s learning style can be predicted from the user’s learning behavior features, as shown in Table 3.

Active/reflective learning style and e-learning behavior

In the active/reflective dimension, a learning style can be recognized from users’ learning behaviors on forums, wikis, and homework. Generally, active learners tend to have more posts, post replies and post views in the forum, while reflective learners prefer to think in private and tend to read posts instead of posting; these forum behaviors can thus be used to distinguish active from reflective learners. A wiki is an important feature through which teachers can instruct students by providing key information. When a teacher creates an entry, students can revise and extend its definition, connotation and denotation, and each wiki view and edit is recorded in the process. Active learners are inclined to try more wiki functions and continually revise entries. In contrast, reflective learners analyze each question carefully and look up wiki entries more often than they revise or update them. Regarding homework, we collect how far in advance assignments are submitted before the deadline and the number of completed homework problems.

Sensitive/intuitive learning style and e-learning behavior

In this dimension, we collect the number of users’ video views, textbook views, outline views, K-map views, homework attempts, homework reviews, and introduction views. Sensitive learners are more likely to review their homework carefully, spend more time doing homework and work with practical materials such as videos, textbooks and K-maps. Intuitive learners are more creative: they prefer abstract ideas and are willing to spend more time on them. Therefore, an intuitive learning style can be recognized from the visit frequency for abstract resources such as K-maps, outline views, and course introductions.

Visual/verbal learning style and e-learning behavior

The users’ video play time is recorded, as well as the frequency of video views, K-map views, textbook views, wiki views, wiki edits and introduction views. Verbal learners are more inclined to read text materials such as course book listings, course introductions, and wiki entries (both edits and views), while visual learners prefer materials such as graphs, tables and videos. As a result, visual learners’ video play time and the frequency with which they check K-maps tend to be higher.

Sequential/global learning style and e-learning behavior

Outline views, introduction views, button clicks and tables of contents views are collected to help determine whether the learning style is sequential or global. Global learners normally visit comprehensive introduction pages, such as outline views and introduction views, while sequential learners visit fewer such pages. Sequential learners prefer to click buttons, while global learners prefer to navigate by clicking catalogue links.

Social/solitary learning style and e-learning behavior

From a social interaction perspective, learners can be divided into social and solitary types. These two types are distinguishable by the number of posts, forum visits, replies, post views, replies received, views received, self-homework views, chats, and chat replies. Social learners like to communicate with others, while solitary learners prefer to study alone. Therefore, social learners tend to be more active in discussion areas. They communicate with others frequently by posting, replying to posts and using built-in chat tools, and they maintain good relationships with others through their posts, post replies and chats. Solitary learners are much less active than their social counterparts and are more likely to focus on their own homework. These learners have fewer posts, fewer replies and views received, and fewer chat messages.

Description of the learning style test

The learning style test recognizes a learner’s learning style based on learning behavior features. It can be formulated as follows:

$$ F\Rightarrow LS $$

where LS ∈ {00000, 00001, ⋯, 11111} and each bit of the label corresponds to one learning style dimension. For example, 00001 means that the user’s learning style, recognized from the behavior features, is as follows: information processing as Active, information perception as Sensitive, information input as Visual, information understanding as Sequential, and social interaction as Social.
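For illustration only, the snippet below shows one way the five detected preferences could be packed into the 5-bit label used above. The exact 0/1 convention for each dimension follows the questionnaire scoring (Table 5); the convention assumed here (1 for the first pole of each pair) is an assumption made for the example.

```python
# Hypothetical encoding helper; the bit order follows the MOOCLS dimensions:
# processing, perception, input, understanding, society.
DIMENSIONS = [
    ("processing",    ("Active", "Reflective")),
    ("perception",    ("Sensitive", "Intuitive")),
    ("input",         ("Visual", "Verbal")),
    ("understanding", ("Sequential", "Global")),
    ("society",       ("Social", "Solitary")),
]

def encode_style(prefs):
    """prefs maps each dimension name to its detected pole; returns the 5-bit label.
    Assumes '1' marks the first pole of each pair (illustrative convention only)."""
    bits = []
    for name, (pole_one, _pole_zero) in DIMENSIONS:
        bits.append("1" if prefs[name] == pole_one else "0")
    return "".join(bits)

label = encode_style({"processing": "Active", "perception": "Sensitive",
                      "input": "Verbal", "understanding": "Sequential",
                      "society": "Social"})
print(label)  # "11011" under the assumed convention
```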

Learning style detection based on deep belief neural networks

Predicting and identifying users’ learning styles by analyzing a large number of e-learning behaviors is a complicated process because of the complicated interactions in learning and the high dimensionality of the behavior data. Traditional classification approaches cannot identify users’ learning styles effectively and accurately. In this paper, the deep learning DBN model is used to identify users’ learning styles. Although deep learning has achieved great success in other areas, it has still not been widely applied in the field of educational technology.

DBNLS model

The DBNLS (DBN for MOOCLS) model is the core component for identifying learners’ learning styles. DBNLS consists of multiple RBM layers and a BP network layer: the RBMs implement feature extraction, and the BP layer assists in fine-tuning the DBN and predicting learning styles. The core procedure can be formulated as follows. The first step is to obtain a set of original learning behavior data by collecting and preprocessing network behavior data. After training the DBN model with the extracted set, the features and weights of the e-learning activities are determined. Some of the learning styles are determined through questionnaires and used as the labeled reference dataset. The DBN model is then fine-tuned by adding the BP layer so that the resulting DBNLS can identify the learning style model in e-learning activities. The details are shown in Fig. 2.

Fig. 2
figure 2

Learning style detection model based on the deep belief neural network model

The steps of DBNLS are as follows:

  • Step 1: Data preprocessing: Eliminate log records whose values deviate from the normal range, and normalize each learning behavior characteristic value according to the formula \( {x}^{\ast }=\frac{x-\min }{\max -\min } \) (where min and max represent the minimum and maximum values of that characteristic) to obtain the initial dataset DBall (a minimal normalization sketch is given after this list).

  • Step 2: Measure some of the active users’ learning styles through the Index of Learning Styles questionnaire. The scale contains 44 questions covering four dimensions; each dimension includes 11 questions, and each question has 2 alternative answers. Learning styles can be measured accurately using this scale. By labeling the online learning behavior data with the learning styles obtained from the questionnaire, we finally obtained a labeled activity set DBlabel. We then divide this dataset into a training set DBtrain and a testing set DBtest.

  • Step 3: We use the data from DBtrain to train the DBN. After the unsupervised DBN pretraining, the BP layer receives supervised training based on DBtrain.

  • Step 4: We used DBtest to test the performance and effect of the DBNLS model.
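The normalization in Step 1 is a standard column-wise min-max scaling. Below is a minimal sketch, assuming the behavior features are arranged as a learners-by-features matrix; the example values are invented.

```python
import numpy as np

def min_max_normalize(X):
    """Step 1: map each behavior feature to [0, 1] via x* = (x - min) / (max - min)."""
    X = np.asarray(X, dtype=float)
    col_min = X.min(axis=0)
    col_max = X.max(axis=0)
    span = np.where(col_max > col_min, col_max - col_min, 1.0)  # avoid division by zero
    return (X - col_min) / span

# Example: three learners, two behavior counts (e.g., forum posts and video views)
DB_all = min_max_normalize([[3, 120], [10, 45], [0, 300]])
```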

DBNLS model training

After the DBNLS model is trained, the DBN can extract the characteristics of network learning behaviors and identify the learning styles they reflect. Training the DBNLS learning style model consists of adjusting the parameters of each RBM layer to minimize the reconstruction error of the DBNLS model output, so that the model abstracts and extracts the essential features of the sample data. The parameter set of each RBM layer in the DBNLS model is θ = {W, a, b}, where W, a and b are the weight matrix, the bias vector of the visible layer, and the bias vector of the hidden layer, respectively. DBNLS model training is performed layer by layer, and the output of each RBM layer is used as the input to the next layer. Gradient ascent is used during the training of each RBM layer to maximize the likelihood function of the RBM output. When the gradient reaches a reasonable value (or the number of training iterations reaches a preset value), feature extraction is complete, and the DBNLS model has obtained a complete representation of the training data. At this point, the training process ends.

To perform learning style detection, a test sequence of behavior features is input into the trained DBNLS model, and the classifier output of the DBNLS model is used as the prediction result.

According to the description of the DBN model, DBN training can be decomposed into several independent RBM training sessions. The feature vector data are used as the input to fully train the first RBM; then, we fix the weight matrix and biases of the first RBM and use its hidden layer activations as the input vector to the second RBM. After fully training the second RBM, we stack it on top of the first RBM. These steps are repeated until all the RBM models have been trained. Finally, the hidden layer state of the last RBM is used as the input to the BP neural network for supervised fine-tuning. A layer-wise pretraining sketch is given below.
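The following is a minimal sketch of this greedy layer-wise procedure. It assumes a hypothetical train_rbm helper that wraps the CD-k loop sketched earlier and exposes a transform method returning hidden activations; the supervised BP stage is only indicated in a comment.

```python
def pretrain_dbn(X_unlabeled, layer_sizes, k=1, epochs=10, lr=0.1, rng=None):
    """Greedy layer-wise pretraining: each RBM is trained on the hidden
    activations of the RBM below it, as described above."""
    rbms, layer_input = [], X_unlabeled
    for n_hidden in layer_sizes:
        # train_rbm is a hypothetical helper built around the CD-k sketch above
        rbm = train_rbm(layer_input, n_hidden, k=k, epochs=epochs, lr=lr, rng=rng)
        rbms.append(rbm)
        layer_input = rbm.transform(layer_input)  # hidden activations feed the next RBM
    return rbms, layer_input

# The final hidden representation (layer_input) is then fed to a BP classifier,
# e.g., scikit-learn's MLPClassifier, for supervised fine-tuning on the labeled data.
```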

DBNLS training can be roughly divided into two phases. The first phase initializes the model parameters: given the vector training sample set S (|S| = ns), the training cycle T, the learning rate η, and the parameter k of the CD-k algorithm, we specify the numbers of visible and hidden units nv and nh, initialize the bias vectors a and b, and initialize the weight matrix W. The training process is then as follows:

FOR iter = 1, 2, ⋯, T DO

  1. Run CD-k to generate ∆W, ∆a, ∆b.

  2. Update the parameters:

$$ \mathrm{W}=\mathrm{W}+\upeta \left\langle \frac{1}{n_s}\Delta W\right\rangle, a=a+\eta \left\langle \frac{1}{n_s}\Delta a\right\rangle, b=b+\eta \left\langle \frac{1}{n_s}\Delta b\right\rangle . $$

To improve the efficiency and effectiveness of DBNLS training, several important parameters are handled as described below. The DBN classification algorithm is the core part of DBNLS; it determines the learning style identification performance of the whole system. The DBN classification algorithm is shown in Fig. 3.

Fig. 3
figure 3

Sample weblog data

figure a

Small batch data

During the FOR loop in the second training phase, the training sample set S is used to generate Δθ′ (ΔW, Δa, Δb) once per iteration by summing the partial derivatives over the individual samples. By dividing S into tens or hundreds of small batches and processing them individually, the training efficiency can be improved considerably, and the advantages of GPUs and MATLAB matrix multiplication can be exploited. To address the problem that the gradient magnitude differs for small batches of different sizes, we divide the total gradient by the size of the small batch to obtain the average gradient during the parameter update:

$$ {\uptheta}^{\prime }={\theta}^{\prime }+\eta \bullet \frac{\Delta \theta^{\prime }}{B}. $$
(23)

In (23), B denotes the small-batch size. Every small batch should include samples from each class, ideally in equal numbers. The size of B is set to a multiple of 10.
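A small illustrative sketch of this mini-batch handling, assuming the training set is a NumPy array and the per-sample gradients have already been summed into grad_sum:

```python
import numpy as np

def iterate_minibatches(S, batch_size=100, rng=None):
    """Yield shuffled mini-batches of size B (a multiple of 10), as suggested above."""
    idx = np.arange(len(S))
    if rng is not None:
        rng.shuffle(idx)
    for start in range(0, len(S), batch_size):
        yield S[idx[start:start + batch_size]]

def apply_update(theta, grad_sum, lr, batch_size):
    """Eq. (23): divide the accumulated gradient by the batch size before updating."""
    return theta + lr * grad_sum / batch_size
```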

Learning rate

The learning rate η plays a key role in the process of updating neural network parameters because its size directly affects training performance. A large η improves the rate of convergence; however, it can also make the algorithm unstable. In contrast, a smaller η improves stability but slows down convergence [42]. In this paper, a momentum term was added to the parameter update to balance convergence speed and stability.

$$ {\uptheta}^{\prime }=\lambda {\theta}^{\prime }+\eta \bullet \Delta \theta^{\prime }. $$
(24)

In (24), λ is the momentum coefficient. With the momentum term, the parameter update does not follow the current sample’s gradient direction exactly; instead, it is combined with the direction of the previous parameter change.
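The momentum term of Eq. (24) can be written in the usual velocity form; the sketch below assumes grad is the (batch-averaged) gradient from the previous snippet.

```python
def momentum_update(param, velocity, grad, lr, momentum):
    """Eq. (24): blend the current gradient with the previous update direction.
    `momentum` plays the role of the coefficient lambda in the text."""
    velocity = momentum * velocity + lr * grad
    return param + velocity, velocity
```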

Number of hidden layer units

Due to the lack of theoretical guidance, it is difficult to determine the number of hidden units in each RBM of the DBN. Generally, this number can be set according to the numbers of neurons in the input and output layers. If there are too few hidden units, the RBM model will be too weak to capture and summarize the characteristics of the training set. If there are too many hidden units, training will take more time and can cause overfitting.

In this paper, we determined the number of hidden units experimentally. The first step is to set the number of hidden units to a small value and then train on the samples. The results are then judged for reasonableness given that number of hidden units. When the training effect is poor, we increase the number of hidden units until the results reach reasonable values, as sketched below.
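A sketch of this trial-and-error search is shown below, assuming a hypothetical train_and_score helper that trains an RBM with a given number of hidden units and returns the model together with its reconstruction error.

```python
def search_hidden_units(train_and_score, X, start=50, step=50, max_units=1000, tol=0.05):
    """Grow the hidden layer until the reconstruction error is acceptable."""
    rbm, n_hidden = None, start
    while n_hidden <= max_units:
        rbm, err = train_and_score(X, n_hidden)   # hypothetical helper
        if err <= tol:
            break
        n_hidden += step
    return rbm, n_hidden
```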

The prediction of learning style

During DBNLS model training, representations of the different network behavior characteristics are established, and the MOOCLS learning style model is encoded in the parameter sets θ = {W, a, b}. When a new set of network learning behavior characteristics is input, the trained DBNLS can quickly identify which learning style the input behavior characteristics belong to.

Experimental

We conducted a set of experiments to set the parameters and examine the effectiveness of our proposed DBNLS approach for identifying learning styles in terms of accuracy and quality.

The experimental environment

StarC is a free MOOC (Massive Open Online Course) platform open to K-12 education set up by Central China Normal University. It is a research exchange and results application platform of the National Digital Learning Project and Technology Center, and it works to deliver free learning courses designed by first-rate teachers in China, to provide high-quality teaching materials for young students, and to help educators establish an effective online learning community. Educators use the platform to design and release online courses and online teaching evaluations. During online learning on StarC, network behavior interaction data, such as independent learning journals, test performance and learners’ essential information, are produced and stored.

This study is based on the curriculum reform experiment carried out as part of the Suzhou educational informatization reform. Our candidates were more than 500 second-year students from 10 classes at Suzhou Junior High School and Zhenhua Junior High School. All the students were eager to learn. Before the online learning courses formally started, targeted training was delivered by the teachers to help the students understand how to use an online learning platform for independent learning.

Data preparation

In addition to the behavior data collected by the StarC platform, we also collected offline questionnaires. We associated the offline subjects with their online learning behavior data using their names and student numbers. The questionnaires were mainly used as the class labels for the training dataset. The learner behaviors were fed through the DBNLS model, and finally, the classification results were obtained, compared with the class labels, and used to assess the accuracy of the DBNLS model.

The questionnaire data

The questionnaire used in this study is divided into two parts. In the first part, we designed the offline questionnaire based on the acquisition and mapping of online learning behaviors and on the Index of Learning Styles (ILS) [11], which describes the learning style model and provides a mature scale for each of its dimensions. Van Zwanenberg’s research indicates that the ILS scales have good reliability and validity [42]. The online learning style model constructed for this research used all the items of the ILS scales to ensure both the consistency and integrity of the scales. In addition, it adds 11 questions for the social dimension of learning style. The second part of the questionnaire captures personal information. To match the offline respondents to the online data analysis objects, this part acquires personal information such as the testee’s real name, student number, and StarC platform user account, enabling one-to-one correspondence between the students’ learning style information and their online learning behavior data.

The scoring formula for this scale has three steps. First, the a and b answers to the questions corresponding to each dimension of the questionnaire are classified. Second, the total numbers of a and b answers in each dimension are determined. Finally, the larger sum is compared with the smaller sum, and the difference is combined with the letter of the larger sum; this result is recorded in the last column. (The difference indicates the strength of the learning style tendency, and the letter denotes the type of learning style.) The items measuring social contact and solitude in the social dimension were designed with reference to learning-styles-online.com [43]. Each of these questions adopts a three-level scale for measurement, where “0” means unmatched and “1” means matched. The form of the questionnaire is shown in Table 4.
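As a concrete example of this three-step scoring rule, the sketch below scores a single ILS dimension from its 11 answers; the answer string is invented.

```python
def score_ils_dimension(answers):
    """Score one ILS dimension from its 11 answers ('a' or 'b'): count each letter,
    take the difference, and report it with the letter of the larger count."""
    count_a = answers.count("a")
    count_b = answers.count("b")
    if count_a >= count_b:
        return f"{count_a - count_b}a"
    return f"{count_b - count_a}b"

print(score_ils_dimension(list("aababaaaaab")))  # "5a" (8 a's vs. 3 b's)
```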

Table 4 Index of learning styles

To allow the questionnaire to serve as the class labels for the DBNLS model, its measurement items needed to be designed carefully. There are 44 questions corresponding to the Index of Learning Styles (11 for each of its four dimensions) and 11 questions for the social dimension, for a total of 55 questions. The specific question content refers to [43, 44] and the Learning Style Self-Test Questionnaire, as shown in the supplementary material. With the assistance of the head teachers of the subject classes, all the questionnaires were repeatedly confirmed and examined to ensure that the questionnaire data matched the platform data. In total, 520 questionnaires were collected, of which 508 were valid.

According to Table 5, the measurement result “9a” in the information processing dimension means that the subject belongs to the “active type” and has a strong learning style tendency. The measurement result “5b” in the information perception dimension means that the subject belongs to the “sensitive type” and has a moderate learning style tendency. The information input and information understanding dimensions can be analyzed in the same way. For the social dimension, a 1 means the subject belongs to the “social” learning style. In summary, the measurement result is 10101, i.e., 21 in decimal. The learning style corresponding to the online learning behavior data is then obtained by comparing the data trained through the DBNLS model with the questionnaire results.

Table 5 Measurement results of the subjects

Network behavior data

The data produced by the learners during the whole learning process are stored in a MySQL database, in MongoDB, and in journal (log) files. The user table stores students’ personal information, including birth date, grade, gender and learner type, while the journal file records the students’ online interactive learning data (the students’ personal data were anonymized). As shown in Fig. 3, this table reflects only part of the learning data from one student; typically, there will be hundreds of records after a student finishes an entire course.

The processing and selection of data have a large effect on the learning style detection results; thus, these aspects are core steps in the learning style detection process. For this study, we preprocessed the log files and extracted the core property information: course information, forum information, wiki information and assignment submission information.

After communicating with four veteran teachers who participated in the experiment, we confirmed the five learning style dimensions and the corresponding Internet learning behaviors, as shown in Table 2. Accordingly, we collected, from the forum aspect, the numbers of visits, posts, readers, repliers, posts replied to and posts read; from the wiki aspect, the wiki entries edited, the number of changed wiki entries and the number of wiki entries read; from the homework aspect, how far in advance assignments were submitted, the number of exercises, the number of times assignments were performed, and homework checking and reviewing; from the instant messaging aspect, the numbers of messages sent, messages replied to and message replies; from the video aspect, the durations and times at which videos were played; and 24 other parameter values related to other aspects, such as the numbers of visits to the course outline, course abstract, course catalog, course teaching materials and knowledge map, and the number of navigational clicks to move to the “previous/next page”. These are considered the core characteristic attributes.

We also obtained core personal information, such as the learners’ names and student IDs by linking the student ID to the userInfo table in the “school in CCNU” database, as shown in Fig. 4.

Fig. 4
figure 4

Unlabeled network learning behavior data

The preprocessing of data

In this paper, the learning style data comprise two parts: training data and test data. The training data are divided into 30,000 unlabeled training records and 6150 labeled training records; in total, there are more than 30,000 records covering the 32 types of learning styles. The test data comprise more than 2050 records.

The network learning behavior data, as discussed in the previous section, come from all the learners’ network behavior in the 10 courses on the “normal school” platform. These learners consist of two groups: 820 students who completed the learning style questionnaire and many other online learners who did not. Based on the student IDs and usernames, we associated the students’ learning style questionnaires with the unlabeled network learning behavior data. Thus, we extracted 8200 different Internet learning behavior records. Then, we integrated the questionnaire data into the network learning behavior records to form 8200 labeled learning style data items, as shown in Fig. 5; the learning style tags include _type information.

Fig. 5
figure 5

Tagged data

To improve the quality of the data, we eliminated some invalid learning behavior data. Then, we randomly divided the 8200 labeled learning style records into two groups at a 3:1 ratio: 6150 records were used as labeled training data for supervised model training, denoted train_label, and the remaining 2050 records were used as a test dataset, denoted test_label, to test the effect of the DBNLS model. Finally, the train_label and test_label datasets were used to train and test the DBNLS model.

The other part of the data, the unlabeled data, was collected from students with unknown learning styles who had also taken the 10 courses on the “normal school” platform. There are approximately 30,000 records in this part. An example of the form of the data is shown in Fig. 6. Compared with the labeled learning style data, the unlabeled data do not include the last field (the learning style type) because these learners’ styles are unknown. We used this part of the learning style data for unsupervised DBNLS model training.

Fig. 6
figure 6

Performance comparisons of different dimensions

DBNLS classifier evaluation

To evaluate the effectiveness of the DBNLS model, we used the standard Index of Learning Styles (ILS) questionnaire to collect the students’ learning style data. As explained before, these students were attending specific courses on the StarC course platform; thus, their online learning behavior data were also collected. We correlated each student’s learning style data with their online learning behaviors to form the learning style detection data, as shown in Table 6. We separated the data into two parts for training and testing the DBNLS model classifier.

Table 6 The results of the experiment

Table 6 demonstrates the results of the DBNLS model. The column headers give the ground-truth learning styles of the students, the row headers indicate the learning styles predicted by DBNLS, and the values in the table show the number of instances in which a student with the i-th learning style was falsely predicted as having the j-th learning style, where 1 ≤ i ≤ 32 and 1 ≤ j ≤ 32:

$$ \mathrm{Dp}=\raisebox{1ex}{${\sum}_{i=1}^n\left( TN+ TP\right)$}\!\left/ \!\raisebox{-1ex}{$n$}\right.. $$
(25)

To further evaluate the DBNLS model, we computed the classification accuracy, where TN and TP in Eq. (25) represent the true negative and true positive instances, respectively, and n is the total number of items in the dataset. As shown in Tables 7 and 8, the accuracy scores for the procession, perception, input, understanding, and sociality dimensions were 84%, 81%, 89%, 69%, and 79%, respectively. A small per-dimension accuracy sketch is given below.
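For clarity, per-dimension accuracy in the spirit of Eq. (25) can be computed directly from the 5-bit labels introduced earlier. The sketch below uses invented values and is not the evaluation code used in the experiments.

```python
def dimension_accuracy(y_true_bits, y_pred_bits, dim):
    """Share of learners whose bit for dimension `dim` is predicted correctly
    (true positives plus true negatives over all n learners)."""
    correct = sum(t[dim] == p[dim] for t, p in zip(y_true_bits, y_pred_bits))
    return correct / len(y_true_bits)

# Example with 5-bit learning style labels (illustrative values only)
truth = ["00000", "10101", "01011"]
preds = ["00000", "10001", "01011"]
print(dimension_accuracy(truth, preds, dim=2))  # input dimension -> 0.666...
```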

Table 7 The classifier results
Table 8 Confusion matrix

In addition, we applied a BP neural network to perform the same classification experiments as the DBNLS model. The results show that DBNLS clearly performs better than the traditional BP neural network. García et al. used a Bayesian network to classify four types of learning styles; the average accuracy of DBNLS was also better than that achieved by the Bayesian network.

We found that, compared with the traditional BP neural network model, the standard deviation of the DBNLS model’s accuracy is 0.1033, which is quite stable. As shown in Table 9, the standard deviation of the accuracy of the BP model is 0.8019, which is relatively large and indicates model uncertainty. Because the DBNLS model uses unsupervised pretraining, the RBM weight matrices are adjusted to near-optimal values before fine-tuning. Therefore, the stability of DBNLS is greater than that of the BP neural network model.

Table 9 The stability of the results

We further analyzed each data dimension and found that the detection rate for sequential/global learning styles in the understanding dimension was significantly lower. By analyzing the student logs, we found that most students read the whole course—that is, there was no difference between the syllabus and course profiles in terms of the number of visits. Moreover, most students did not skip units and read each unit in its entirety. This phenomenon can be observed in the results obtained for the understanding dimension, where no global learners were discovered.

To summarize the study results, we conclude that DBNLS performs better than traditional methods for complex learning style classifications that involve more properties.

Conclusions

The differences between learners are determined by each learner’s previous knowledge of the subject matter, learning style, learning characteristics, preferences and goals. Accurately classifying online students through network learning behavior analysis enables more effective personalized support for students seeking information and learning in an online context. This paper proposes a learning style identification and classification method based on a DBN, called DBNLS. First, we built a student learning style model for MOOCs based on the existing Felder-Silverman model and expert experience. According to the proposed learning style model and its implications, we evaluated the correlations between network learning behaviors and learning styles within individual sessions; then, we used the deep learning DBN model to learn those learning style features and model learning styles to classify students accurately.

We conducted several experiments on an actual educational dataset to verify the proposed method. First, behavioral patterns in the various learning style dimensions were determined by conducting an experiment with learning content based on the ILS theory of Felder and Soloman (1996) and the Readiness for Education At a Distance Indicator. We collected actual network learning behavior data from a MOOC learning platform and labeled the collected learning style data using ILS theory to analyze and classify them. Then, we used those data to train our DBNLS model. Compared with traditional classification methods (BP and BN), the proposed method achieves good accuracy and performance. However, the accuracy of the predictions for the understanding learning style dimension is not yet satisfactory.

In the future, we plan to collect more learning behaviors to improve the accuracy of predictions for the understanding learning style dimension. We then plan to apply DBNLS to other personalized learning systems, such as personalized recommender systems and personalized learning navigation.