1 Introduction

Knowledge tracing is an essential and classical problem in intelligent education systems. By tracing the knowledge transition process, we can recommend specific educational items to a student based on his or her weak knowledge concepts. Existing methods approach knowledge tracing from both educational psychology and data mining perspectives, such as Item Response Theory (IRT) [7], Bayesian Knowledge Tracing (BKT) [1], the Performance Factors Analysis (PFA) framework [9] and Deep Knowledge Tracing (DKT) [10]. These models have proved effective but still have limitations: they do not systematically consider the impact of different attributes of the exercises themselves on the knowledge tracing problem. Exercise Enhanced Knowledge Tracing (EKT) [5] is the first method to take the exercise text and an attention mechanism into consideration. However, EKT extracts text features by feeding the exercise text directly into a neural network, which fails to capture the hierarchical features of an exercise (Fig. 1).

Fig. 1. The illustration of hierarchical features of exercise

Fig. 2. Exercise hierarchical feature enhanced framework

2 Exercise Hierarchical Feature Enhanced Framework

Framework Overview. The knowledge tracing task can be summarized as follows: in an online educational system, suppose we have M students and E exercises in total. Given any learner's exercise record \( X =\{(q_{1},r_{1}),(q_{2},r_{2}),\ldots ,(q_{m},r_{m})\}\), predict the learner's performance on \(q_{t+1}\). Here \((q_{t},r_{t})\) denotes that the learner practices question \(q_{t}\) and obtains result \(r_{t}\) at step t. The entire structure of the framework is shown in Fig. 2. In order to dig deeper into the information in the exercise text, we first utilize BERT [2] to generate an embedding vector \(v_{b}\). We then feed it into three subsystems to generate the knowledge distribution \(v_{t} \in R^{K}\), the semantic feature \(s_{t}\) and the question difficulty \(d_{t}\) separately. Let \(\varphi (s_{t})\) be the one-hot encoding of the semantic cluster to which the question at step t belongs. Finally, we concatenate \(v_{t}\), \(\varphi (s_{t})\), \(d_{t}\), and \(r_{t}\) into \(x_{t}\) and feed \(x_{t}\) into a sequence model.
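As a minimal sketch of how the per-step input \(x_t\) could be assembled (the tensor names, dimensions, and PyTorch usage below are our own assumptions, not the authors' released code):

```python
import torch

# Assumed sizes: K knowledge concepts, C semantic clusters.
K, C = 50, 912

def build_step_input(v_t, s_cluster_id, d_t, r_t):
    """Concatenate knowledge distribution, semantic-cluster one-hot,
    difficulty, and response into the step input x_t."""
    phi_s = torch.zeros(C)
    phi_s[s_cluster_id] = 1.0                    # one-hot phi(s_t)
    d = torch.tensor([float(d_t)])               # scalar difficulty
    r = torch.tensor([float(r_t)])               # correct (1) / incorrect (0)
    return torch.cat([v_t, phi_s, d, r], dim=0)  # x_t in R^{K + C + 2}

x_t = build_step_input(torch.rand(K), s_cluster_id=17, d_t=0.42, r_t=1)
print(x_t.shape)  # torch.Size([964]) with these assumed sizes
```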

Subsystems Introduction. Two text classification systems, named KDES and DFES, are designed to predict the knowledge distribution and the difficulty of an exercise, respectively. The semantic feature extractor system (SFES) can be treated as an unsupervised clustering problem. The input of all three systems is the BERT encoding of the exercise text. In KDES and DFES, the knowledge concepts labeled by teachers and the correct rate of a question [4] serve as ground truth, and both are predicted with TextCNN [8]. In the KDES system, we use the softmax output of the trained classifier to represent the knowledge distribution of an exercise. In the DFES system, we predict difficulty with a neural network in order to alleviate the cold-start problem. In the SFES system, we cluster the inputs with hierarchical clustering based on the cosine distance between semantic vectors [6].
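A minimal sketch of the SFES step, assuming a Hugging Face BERT checkpoint and scipy's hierarchical clustering; the model name, [CLS] pooling, and distance threshold are illustrative assumptions, not the authors' exact setup:

```python
import torch
from transformers import AutoTokenizer, AutoModel
from scipy.cluster.hierarchy import linkage, fcluster

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    """Encode exercise texts and use the [CLS] vector as the sentence embedding."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = encoder(**batch).last_hidden_state
    return out[:, 0, :].numpy()

texts = ["Solve x^2 - 4 = 0.", "Find the roots of x^2 = 9.", "Compute 3 + 5."]
vectors = embed(texts)

# Agglomerative clustering with average linkage on cosine distance.
Z = linkage(vectors, method="average", metric="cosine")
labels = fcluster(Z, t=0.3, criterion="distance")  # cut at a cosine-distance threshold
print(labels)
```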

$$\begin{aligned} h_{t},c_{t} = \mathrm{LSTM}(x_{t},h_{t-1},c_{t-1};\theta ) \end{aligned}$$
(1)
$$\begin{aligned} y_{t}= \sigma (W_{yh} \cdot h_{t}+b_{y}) \end{aligned}$$
(2)
$$\begin{aligned} loss = -\sum _{t}\big (r_{t+1}\log (y_{t}^{T}\cdot {\varphi (s_{t+1})})+(1-r_{t+1}) \log (1-y_{t}^{T}\cdot {\varphi (s_{t+1})})\big ) \end{aligned}$$
(3)

Modeling Process. In the propagation stage, as shown in Eq. 1, we combine \(x_{t}\) with the learner's previous hidden state \(h_{t-1}\) and use an RNN to obtain the current hidden state \(h_{t}\). Here we use an LSTM as the RNN variant, since it better preserves long-term dependencies in the exercise sequence [3]. Finally, we use \(h_{t}\) to predict \(y_{t}\), which contains the student's estimated mastery of each semantic feature. The dimension of \(y_{t}\) equals the total number of semantic clusters produced by the SFES system. \(\theta , W_{yh}\), and \(b_{y}\) in the equations are model parameters. The goal of training is to minimize the negative log-likelihood of the observed sequence of student response logs (Eq. 3).
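A minimal PyTorch sketch of the sequence model in Eqs. (1)-(3); layer sizes, batching, and the alignment of next-step targets are assumptions for illustration only:

```python
import torch
import torch.nn as nn

class EHFKTSequenceModel(nn.Module):
    """Sketch of the sequence model: LSTM over x_t, then a sigmoid output layer."""
    def __init__(self, input_dim, hidden_dim, num_clusters):
        super().__init__()
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, num_clusters)   # W_yh, b_y

    def forward(self, x):                  # x: (batch, T, input_dim)
        h, _ = self.lstm(x)                # Eq. (1)
        return torch.sigmoid(self.out(h))  # Eq. (2): y_t for every step

def sequence_loss(y, next_cluster_onehot, next_response):
    """Eq. (3): pick y_t^T . phi(s_{t+1}) and score it against r_{t+1}.
    The caller aligns y at step t with the cluster/response of step t+1."""
    p = (y * next_cluster_onehot).sum(dim=-1).clamp(1e-6, 1 - 1e-6)
    return -(next_response * p.log()
             + (1 - next_response) * (1 - p).log()).sum()
```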

3 Experiment

3.1 Experimental Setting

Since there is no open dataset that provides exercising records together with text information, we derive an experimental dataset containing 132,179 students and 91,449,914 answer records from a large real-world online education system, aixuexi.com.

Table 1. The result of clustering

The baselines of the experiments are as follows: BKT, which is based on Bayesian inference; DKT, which uses recurrent neural networks to model student learning; EKTA, which incorporates exercise text features and an attention mechanism into the recurrent neural network; EHFKT_K/S/D, simplified versions of EHFKT that contain only the KDES/SFES/DFES system, where the input is the concatenation of the problem encoding and the output of the respective system; and EHFKT_T, which contains all subsystems but diagnoses the transition of knowledge mastery, while EHFKT diagnoses the transition of the mastery of semantic features.

3.2 Experimental Results

Hierarchical Clustering Result. The SFES system uses BERT and hierarchical clustering to obtain semantic features of questions. Figure 3 visualizes the clustering results of 11,410 questions; the y-axis corresponds to the classification threshold and the x-axis corresponds to each exercise. Table 1 reports the clustering result when the number of clusters \(\lambda _{s}\) is 912.
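For reference, cutting the dendrogram so that exactly \(\lambda_s\) clusters remain can be done as sketched below; the random vectors are a toy stand-in for the real BERT embeddings of the 11,410 questions:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

vectors = np.random.rand(2000, 768)               # placeholder for exercise embeddings
Z = linkage(vectors, method="average", metric="cosine")

lambda_s = 912                                    # target number of clusters (Table 1)
cluster_ids = fcluster(Z, t=lambda_s, criterion="maxclust")
print(len(set(cluster_ids)))                      # roughly lambda_s clusters
```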

Fig. 3. Hierarchical clustering result

Fig. 4. AUC of EHFKT series

EHFKT Result. In this part, we divide the dataset into a training set with 105,744 learners' logs and a test set with 26,435 learners' logs. Figure 4 shows the evolution of AUC during the training process, and Table 2 shows the overall comparison results on this task. The results indicate that EHFKT outperforms the baseline models, from which we draw several conclusions. First, in the knowledge tracing task, adding hierarchical features yields a better representation of questions. Second, tracing the mastery of semantic clusters predicts students' performance more precisely, because exercises in the same cluster share similar knowledge distribution, difficulty, and semantics. Finally, the result also demonstrates the instability of tracing knowledge mastery alone, since the difficulty of an exercise is then left unmodeled.

Table 2. Evaluation metrics of different deep learning methods
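For completeness, the AUC metric used in Table 2 and Fig. 4 can be computed as follows; the labels and scores shown here are hypothetical placeholders, with real values coming from the held-out responses and the predicted probabilities \(y_{t}^{T}\cdot \varphi (s_{t+1})\):

```python
from sklearn.metrics import roc_auc_score

y_true = [1, 0, 1, 1, 0, 1]                     # actual next responses (toy values)
y_score = [0.81, 0.35, 0.62, 0.74, 0.28, 0.55]  # predicted probabilities (toy values)
print("AUC:", roc_auc_score(y_true, y_score))
```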

4 Conclusions

In this article, we propose a novel knowledge tracing framework that extracts the knowledge distribution, semantic features and difficulty from exercises. Besides, we introduce the diagnosis of the semantic features of questions into knowledge tracing, which leads to more accurate performance prediction. Although the meaning of these semantic clusters is not directly interpretable, in the future we will try to extract the shared meaning of the exercises in the same cluster with text summarization techniques, so as to make the data-driven clustering results more understandable to humans.