1 Introduction

Online education is currently popular because of the school closures caused by the outbreak of COVID-19 [25]. Since the beginning of 2020, almost all students around the world have experienced online study. Massive open online courses (MOOCs) have become the main method of online learning. Major MOOC platforms, for example, EdX and Coursera, are collecting historical learning data from an increasing number of students. Thus, discovering knowledge from MOOC data is a promising approach to improving online learning quality.

Data mining is the non-trivial process of identifying valid, novel, potentially useful, and ultimately, understandable patterns from an extremely large volume of data. As one of the main data mining tasks, pattern mining [35] discovers various interesting, useful, and unexpected patterns efficiently and effectively. Itemsets [32], sequential patterns (SPs) [12], and sub-graphs [7] are typical patterns discovered in pattern mining.

In this paper, data mining, specifically pattern mining, is used to discover knowledge hidden in MOOC data. Online learning activities involve temporal factors; hence, SPs play an important role, and mining SPs from learners’ historical data is a promising approach to improving online learning quality.

Given a sequence database, the problem of SP mining (SPM) is to discover subsequences whose supports are no lower than a user-specified minimum support [12]. Many algorithms have been proposed, most of which focus on developing efficient strategies for identifying all SPs, which can be categorized into three broad classes: Apriori-based [33], vertical database format [41], and projection-based pattern growth algorithms [27]. Generally, numerous SPs are discovered by typical SPM algorithms, which makes it difficult for people to identify meaningful results. To address this limitation, various constraints, such as gap [26] and discreteness constraints [38], are used to discover effective and actionable SPs.

Recently, SPM has been successfully applied in fields such as vehicle trajectory prediction [42] and electronic medicine [24]. Among these application fields, online education is currently the most promising domain because of the school closures caused by the outbreak of COVID-19. Considering the characteristics of MOOC data, flexible constraints are incorporated into typical SPM algorithms to discover meaningful SPs that improve learning quality.

The “Course Recommendation” dataset provided by the MoocData platform is used throughout this paper. The dataset was collected from XuetangX, one of the largest MOOC platforms in China. Originally used for course recommendation, the dataset contains 82,535 course enrollment sequences recorded from October 1, 2016 to March 31, 2018. The characteristics of this dataset are shown in Table 1.

Table 1 Characteristics of the Course Recommendation dataset

Considering the dataset shown in Table 1, the major parts of this study are as follows:

First, the importance of SPs is evaluated from three aspects: the lengths of the enrollment sequences that contain them, the variance of the enrollment dates within them, and the enrollment moments within a day. These three aspects are modeled using three constraints.

Second, to make the constrained mining process efficient, the three constraints are integrated into the support, which is the most general parameter for evaluating SPs, to develop a new parameter called support with flexible constraints (SFC). It is proved that SFC also satisfies the downward closure property.

Third, using breadth-first traversal and depth-first traversal, two algorithms for mining SPs with flexible constraints are proposed and explained. In these two algorithms, SFC is used to replace the support directly.

Finally, extensive experiments were conducted on MOOC data. The results demonstrated that the proposed algorithms effectively reduced the number of discovered results with acceptable efficiency and memory consumption.

The remainder of this paper is organized as follows: Related work is described in Section 2, and the SPM problem is defined in Section 3. In Section 4, the three constraints, in addition to their rationality, are discussed. The mining algorithms are described in detail in Section 5. The experimental results are presented and analyzed in Section 6. Finally, conclusions are drawn in Section 7.

2 Related work

In this section, first, applications of data mining for MOOC data are reviewed. Then, studies on constraint-based SPM are discussed.

2.1 Data mining from MOOC data

Mining knowledge from MOOC data not only helps instructors to improve their teaching materials and methods but also helps learners to access more appropriate courses or learning paths [1]. Data mining from MOOC data is receiving increasing attention, particularly with the rise of online learning during the COVID-19 pandemic. Learning behavior understanding [16], dropout prediction [8], and personalized learning [43] are typical data mining tasks that use MOOC data.

SPM has become an effective tool for analyzing students’ online learning behaviors. Fournier-Viger et al. [10] used SPM techniques to mine frequent action sequences and associations between these sequences in a set of recorded usage of the RomanTutor by novices, intermediates, and experts. Using the discovered SPs, learners’ actions were tracked, and suggestions were provided to improve the learners’ experience. Kinnebrew et al. used SPM and action abstraction to identify important learning behaviors of students in different groups [19]. In their method, both sequence support and instance support were used to evaluate the resulting SPs.

Using SPs to recommend MOOC teaching materials is a promising approach [34]. Taking a student's sequence of past courses, Wang and Zaïane [36] implemented a course recommender system based on three sequence-related approaches, including SPM. Wong et al. used SPM to verify the effect of self-regulated learning (SRL) [37]. Specifically, SPM was used to explore whether differences exist between learners who viewed the SRL-prompt videos and those who did not. The results demonstrated that the SRL-prompt viewers tended to follow the sequential structure of the course provided by the instructor, whereas this was less likely in the group of SRL-prompt non-viewers.

Different from the above-mentioned SPM-based methods, the object of analysis in the present paper is course enrollment MOOC data rather than device usage, learning behavior, or video-viewing data.

2.2 Constraint-based SPM

Many SPM algorithms have been proposed to discover frequent SPs (FSPs) [15], high utility SPs [31], negative SPs [5], and SPs from data streams [18].

In many application domains (e.g., music genre classification [29]), SPs confined by predefined constraints are more meaningful than general SPs. A constraint is an additional set of criteria that the user provides to indicate more precisely the types of patterns to be found. This idea has been used since the beginning of SPM research, starting with the GSP algorithm [33]. For constraint-based SPM, the approach used to push the constraints deep into the mining process is important [23].

Time constraints, generally including gap and duration constraints, are the most widely used constraints in SPM. The gap constraint restricts the minimum and maximum amount of time between two consecutive itemsets within an SP, whereas the duration constraint restricts the maximum time difference between the first and last itemsets of an SP. Li et al. proposed two gap-constrained algorithms [21]: Gap-BIDE and Gap-Connect. The former mines closed gap-constrained subsequences from a set of input sequences, and the latter discovers repetitive gap-constrained subsequences from a single input sequence. Wu et al. solved the problem of SPM with periodic wildcard gaps using the Nettree data structure [39]. Sqn2Vec [26] and NegPSpan [14] are SPM algorithms that use time constraints, and TRuleGrowth [11] is an algorithm for mining sequential rules with a sliding-window constraint.

Length constraints, which restrict the minimum/maximum number of items per SP, are also commonly used in SPM. cSpade [41] incorporates max-gap, max-span, and length constraints. The length-decreasing support constraint was proposed by Seno and Karypis [30]. Their SLPMiner algorithm finds all FSPs under a support threshold that decreases as a function of pattern length; thus, long SPs, which usually have lower supports, can also be discovered. WSLPMiner is another SPM algorithm with a length-decreasing support constraint [40].

Aggregate constraints are imposed on an aggregate of the items in an SP, where the aggregate function can involve, for example, the average, sum, minimum, or maximum. Chen et al. proposed the PTAC algorithm to discover SPs with tough aggregate constraints [2]. Their algorithm uses two strategies, avoiding unnecessary item checks and unnecessary projected database generation, to improve efficiency and reduce memory consumption.

Other typical constraints used for SPM also exist, such as the item constraint [6], discreteness constraint [38], and norm constraint [4].

3 Preliminaries

Let Σ be a set of courses. An item is represented as a pair (c, t), where c ∈ Σ is a course and t is the enrollment time of c. A sequence S =  < (c1, t1), (c2, t2), …, (cn, tn) > is a list of time-ordered items, where for any 1 ≤ i < j ≤ n, ti < tj holds. The length of sequence S, denoted by |S|, is the total number of items in S. S[i] (1 ≤ i ≤ n) denotes the ith item in S, and S[i].c and S[i].t are the course and enrollment time of S[i], respectively. It should be noted that, at each time, only a single item rather than an itemset is used in this paper. This is because students can only enroll on one course at one time in the MOOC data used in this study.
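For concreteness, this data model can be written down as follows (a minimal Java sketch; the type and field names are ours, not from the paper or the dataset):

```java
import java.util.List;

// One enrollment: a course c paired with its enrollment time t.
// Here the time is stored as epoch seconds; the dataset records a
// date and a time of day.
record Item(String course, long time) {}

// An input sequence: a time-ordered list of items. By the definition
// above, enrollment times strictly increase within a sequence, and
// each time carries a single item rather than an itemset.
record Sequence(int sid, List<Item> items) {
    int length() { return items.size(); }        // |S|
    Item get(int i) { return items.get(i - 1); } // S[i], 1-indexed as in the text
}
```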

A sequence S =  < (c1, t1), (c2, t2), …, (cn, tn) > is called a subsequence of another sequence S' =  < (c'1, t'1), (c'2, t'2), …, (c'm, t'm) > (n ≤ m), and S' a super-sequence of S, denoted by S ⊑ S', if there exist integers 1 ≤ i1 < … < in ≤ m such that S[1].c = S'[i1].c, S[2].c = S'[i2].c, …, S[n].c = S'[in].c. The ordered list of pairs < S'[i1], S'[i2], …, S'[in] > is called an occurrence of S in S', denoted by Occ(S, S'). If there exists at least one item (cj, tj) ∈ S' such that (cj, tj) ∉ S, S is called a proper subsequence of S', or S' a proper super-sequence of S, denoted by S ⊏ S'.

A sequence database SDB is a set of 2-tuples (sid, IS), where sid is called a sequence-id and IS an input sequence. A tuple (sid, IS) in a sequence database SDB is said to contain a sequence S if S is a subsequence of IS. For the MOOC data used in this paper, each sequence S has at most one occurrence in an input sequence IS.

The number of tuples in a sequence database SDB containing sequence S is called the support of S, denoted by sup(S). The set of input sequences in tuples of SDB containing sequence S is called the support set of S, denoted by sup_set(S).

Consider two input sequences ISX and ISY containing S. The enrollment times in Occ(S, ISX) are generally not equal to those in Occ(S, ISY). Thus, the enrollment times are omitted in the mining results in this paper. Formally, S =  < c1, c2, …, cn > is called an SP, where c1, c2, …, cn are time-ordered courses without specific enrollment times. S.ci (1 ≤ i ≤ n) denotes the ith course of S.

Let min_sup be the user-specified minimum support threshold. An SP S is an FSP in the sequence database SDB if sup(S)≥ min_sup. The frequent SPM problem is to find the complete set of FSPs in SDB with respect to min_sup.

Consider the example sequence database in Table 2. To make the explanation simple and clear, the enrollment time of each item in all the input sequences is omitted. IS1, IS2, and IS4 contain the SP S =  < Data structure, Operating system > , and input sequences of these three tuples comprise sup_set(S). Thus, if the support threshold min_sup = 2, < Data structure, Operating system > is an FSP.

Table 2 Example sequence database

Different from traditional classroom teaching, learners on the same MOOC course may be significantly different in age, prerequisite knowledge, and learning objectives. For example, IS3 is different from the other four input sequences in the example sequence database because IS3 includes a non-computing course, whereas the other four sequences are all composed of computing courses. Furthermore, some learners may also enroll on many courses without a clear relationship. Thus, mining SPs directly in MOOC data may lead to an extremely large number of uninteresting patterns using substantial computational time and space.

Constraint-based mining may overcome the above-mentioned difficulties because constraints usually confine the patterns to be found to a particular subset that satisfies some strong conditions. Moreover, fewer resulting SPs also reduce the search space, thereby leading to an efficient mining process with small memory consumption. The challenge is how to push the constraints deep into the mining process rather than using constraints to filter the results after all SPs are discovered.

4 Flexible constraints

To determine the interestingness of SPs, three flexible constraints are considered from the perspective of the number of course enrollments within the input sequences, span of enrollment days, and specific enrollment time within a day. To improve efficiency, we push these constraints into the mining process by proving the downward closure property.

4.1 Length constraint

First, the lengths of the enrollment sequences were considered; their distribution is long tailed. Figure 1 shows the distribution of sequence lengths in the Course Recommendation dataset.

Fig. 1 Distribution of sequence lengths in the Course Recommendation dataset

Figure 1 shows that most sequence lengths are short. Specifically, 37.76% of the sequences have lengths equal to 3, 18.91% of the sequences have lengths equal to 4, 13.81% of the sequences have lengths equal to 5, and only five sequences are longer than 200. This phenomenon illustrates that school education is still the most important channel for people to acquire knowledge, although MOOCs are playing an increasingly important role in learning. Thus, most learners resort to MOOCs as an auxiliary learning method when they encounter problems that they need to solve using knowledge covered by online courses. Learners who have enrolled on multiple courses, or even hundreds of courses, may be platform testers or staff of relevant management departments.

This indicates that most MOOC learners enroll on only a few courses, whereas enrollment on a large number of courses occurs infrequently. Thus, the argument in this study is that the support contributions of short and long sequences should not be treated equally: the support contributed by long sequences is not as important as that contributed by short sequences. To model this fact, the length constraint is defined.

Definition 1 (Length constraint)

Let SDB be the sequence database and S be an SP. The length constraint of S with respect to IS ∈ sup_set(S) is defined as

$$LC\left(S,IS\right)=\exp\left(-\left|IS\right|/max\_L\right),$$
(1)

where max_L is the maximum length of all input sequences in SDB.

In this study, the length of the input sequence is divided by the maximum length of all input sequences to ensure that the value of |IS| / max_L is in the range \((0,1]\), which prevents the decay of the constraint from being too large. To push the length constraint into the mining process, it is incorporated into the support.

Definition 2 (Support with length constraint)

The support with length constraint (SLC) of an SP S is defined as

$${sup}_{L}\left(S\right)={\sum }_{IS\in sup\_set\left(S\right)}LC\left(S,IS\right)={\sum }_{IS\in sup\_set\left(S\right)}\mathrm{exp}\left(-\left|IS\right|/max\_L\right)$$
(2)

The SLC in Definition 2 reflects that the support contribution of an input sequence decays as its length increases; hence, the SLC never exceeds the general support stated in Section 3. The rationality is verified in Lemma 1.
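As a concrete illustration of Definitions 1 and 2, the following Java sketch computes LC and SLC, assuming the lengths of the sequences in sup_set(S) have already been collected; the class and method names are ours:

```java
import java.util.List;

final class LengthConstraint {
    // LC(S, IS) = exp(-|IS| / max_L), per Definition 1 (Eq. (1)).
    static double lc(int sequenceLength, int maxL) {
        return Math.exp(-(double) sequenceLength / maxL);
    }

    // SLC, per Definition 2 (Eq. (2)): the sum of LC over all input
    // sequences containing S. supSetLengths holds |IS| for each
    // IS in sup_set(S); maxL is the maximum length over the whole SDB.
    static double supL(List<Integer> supSetLengths, int maxL) {
        double sum = 0;
        for (int len : supSetLengths) sum += lc(len, maxL);
        return sum;
    }
}
```

Because each LC term lies in (0, 1], the computed sum never exceeds the plain support, in line with Lemma 1 below.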

Lemma 1

Let SDB be the sequence database and S be an SP. Then, supL(S)  ≤ sup(S).

Proof. Suppose that m input sequences in SDB contain S; that is, there are m input sequences in sup_set(S), and sup(S) = m. For any IS ∈ sup_set(S), |IS|≤ max_L. Thus, 0 < (|IS| / max_L)  ≤ 1. Hence, 0 < exp(-|IS| / max_L)  ≤ 1; that is, 0 < LC(S, IS)  ≤ 1. Because there are m input sequences in sup_set(S), \({\sum }_{IS\in sup\_set(S)}LC\left(S,IS\right)\le m\); that is, supL(S) ≤ sup(S).□

The next lemma shows that the support with length constraint satisfies the downward closure property, which is an effective tool for reducing the search space, and is widely used in SPM.

Lemma 2

For any two SPs SX and SY, if SX ⊑ SY, supL(SY)  ≤ supL(SX).

Proof. Because SX ⊑ SY, sup_set(SY) ⊆ sup_set(SX). There are two cases:

  • (1) If sup_set(SY) = sup_set(SX), supL(SY) = supL(SX).

  • (2) If sup_set(SY) ⊂ sup_set(SX), there exist input sequences that are contained in sup_set(SX) but not in sup_set(SY). Thus,

    $$\begin{aligned}{sup}_L\left(S_Y\right)&=\sum\nolimits_{IS\in sup\_set\left(S_Y\right)\wedge IS\in sup\_set\left(S_X\right)}LC\left(S_Y,IS\right)\\&=\sum\nolimits_{IS\in sup\_set\left(S_Y\right)\wedge IS\in sup\_set\left(S_X\right)}LC\left(S_X,IS\right)\\&<\sum\nolimits_{IS\in sup\_set\left(S_Y\right)\wedge IS\in sup\_set\left(S_X\right)}LC\left(S_X,IS\right)+\sum\nolimits_{IS'\not\in sup\_set\left(S_Y\right)\wedge IS'\in sup\_set\left(S_X\right)}LC\left(S_X,IS'\right)\\&={sup}_L\left(S_X\right).\end{aligned}$$

According to the above discussion, supL(SY)  ≤ supL(SX).□

Lemma 2 shows that the length constraint can be pushed into the mining process to speed up the discovery of SPs.

4.2 Discreteness constraint

Next, the discreteness constraint is proposed; it describes how the enrollment times of a pattern’s occurrence vary from their mean.

Consider an SP S =  < Database, Data mining > in the example sequence database shown in Table 2. Both IS3 and IS5 contain S. To explain the discreteness constraint, the specific enrollment date of each course of IS3 and IS5 is provided. Examples with enrollment dates are shown in Table 3.

Table 3 Two sequences with enrollment dates

To engage learners in the MOOC platform, small discreteness among enrollment dates is preferred. From this point of view, for the same SP S, IS5 contributes more to sup(S) than IS3. For IS5, the mean date of two enrollment dates of S is 2017/2/22, and the distance between both enrollment dates and the mean date is 4 days. For IS3, the mean date of the two enrollment dates of S is 2017/4/2, and the distance between both enrollment dates and the mean date is 37 days. To model this assumption, the discreteness constraint is defined.

Definition 3 (Discreteness constraint)

Let S =  < c1, c2, …, cn > be an SP, and let IS ∈ sup_set(S) be an input sequence; then there exist integers 1 ≤ i1 < i2 < … < in ≤ |IS| such that S.c1 = IS[i1].c, S.c2 = IS[i2].c, …, S.cn = IS[in].c. The discreteness constraint of S with respect to IS is defined as

$$DC\left(S,IS\right)=\exp\left(-{\textstyle\sum_{j=1}^n}\left(IS\left[{i_{j}}\right].t-\overline{IS\left[{i_{n}}\right].t}\right)^2\right),$$
(3)

where

$$\overline{IS\left[i_n\right].t}=\frac1n{\textstyle\sum_{j=1}^n}IS\left[{i_{j}}\right].t.$$
(4)

From Definition 3, the discreteness constraint indicates how widely enrollment times in a sequence’s occurrence vary. If enrollment times vary greatly from the mean time of a sequence’s occurrence, the constraint is small. To simplify the calculation, only the enrollment dates are considered and the specific enrollment moments are omitted when computing the discreteness constraints.

To push the discreteness constraint into the mining process, it is incorporated into the support.

Definition 4 (Support with discreteness constraint)

Let S =  < c1, c2, …, cn > be an SP. The support with discreteness constraint (SDC) of S is defined as

$${sup}_D\left(S\right)=\sum\nolimits_{IS\in sup\_set\left(S\right)}DC\left(S,IS\right)=\sum\nolimits_{IS\in sup\_set\left(S\right)}\exp\left(-\sum\nolimits_{j=1}^n\left(IS\left[i_j\right].t-\overline{IS\left[i_n\right].t}\right)^2\right).$$
(5)

The function exp(-x) is monotone decreasing. To avoid supD(S) becoming too small, min–max normalization is used to rescale the enrollment time into the range [0, 1] before the discreteness constraint and SDC are calculated. It can also be proved that the SDC is lower than the general support stated in Section 3.
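The following Java sketch shows one way to compute DC and SDC with the min-max normalization described above; the helper names and the use of days as the raw time unit are our assumptions:

```java
final class DiscretenessConstraint {
    // Min-max normalization of raw enrollment dates (in days) into [0, 1],
    // applied before DC is computed, as described above.
    static double[] normalize(long[] days, long minDay, long maxDay) {
        double span = Math.max(1, maxDay - minDay);
        double[] t = new double[days.length];
        for (int i = 0; i < days.length; i++) t[i] = (days[i] - minDay) / span;
        return t;
    }

    // DC(S, IS) = exp(-sum_j (t_j - mean)^2), per Definition 3 (Eq. (3)),
    // where the t_j are the normalized enrollment times of the occurrence
    // of S in IS.
    static double dc(double[] t) {
        double mean = 0;
        for (double v : t) mean += v;
        mean /= t.length;
        double sumSq = 0;
        for (double v : t) sumSq += (v - mean) * (v - mean);
        return Math.exp(-sumSq);
    }

    // SDC, per Definition 4 (Eq. (5)): one normalized time vector per
    // input sequence in sup_set(S).
    static double supD(double[][] occurrenceTimes) {
        double sum = 0;
        for (double[] t : occurrenceTimes) sum += dc(t);
        return sum;
    }
}
```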

Lemma 3

Let SDB be the sequence database and S be an SP. Then, supD(S)  ≤ sup(S).

Proof. Suppose that m input sequences in SDB contain S; that is, there are m input sequences in sup_set(S) and sup(S) = m. For any IS ∈ sup_set(S), \(\sum_{j=1}^{n}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)}^{2}\)≥ 0. Thus, 0 < \(\mathrm{exp}\left(-\sum_{j=1}^{n}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)}^{2}\right)\)  ≤  1; that is, 0 < DC(S, IS)  ≤ 1. Because there are m input sequences in sup_set(S), \({\sum }_{IS\in sup\_set(S)}DC\left(S,IS\right) \le m\); that is, supD(S)  ≤ sup(S). □

The SDC also satisfies the downward closure property, which is proved in Lemma 4.

Lemma 4

For any two SPs, SX and SY, if SX ⊑ SY, supD(SY)  ≤ supD(SX).

Proof. Because SX ⊑ SY, there are two cases.

  • (1) If SX = SY, supD(SY) = supD(SX).

  • (2) If SX ⊏ SY, first consider the case in which |SY| =|SX|+ 1. Let SX =  < c1, c2, …, cn > and SY =  < c1, c2, …, cn, cn+1 > . For an input sequence IS containing both SX and SY, there exist integers i1 < … < in < in+1 such that IS[i1].c = c1, …, IS[in].c = cn, IS[in+1].c = cn+1. Then \(DC\left({S}_{Y},IS\right)=\mathrm{exp}\left(-\sum_{j=1}^{n+1}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n+1}\right].t}\right)}^{2}\right)\) and \(DC\left({S}_{X},IS\right)=\mathrm{exp}\left(-\sum_{j=1}^{n}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)}^{2}\right)\). According to Eq. (4),

    $$\begin{array}{c}\begin{array}{cc}\overline{IS\left[i_{n+1}\right].t}&=\frac1{n+1}\left(\sum_{j=1}^nIS\left[i_j\right].t+IS\left[i_{n+1}\right].t\right)=\frac n{n+1}\frac1n\sum_{j=1}^nIS\left[i_j\right].t+\frac1{n+1}IS\left[i_{n+1}\right].t\end{array}\\=\frac n{n+1}\overline{IS\left[i_n\right].t}+\frac1{n+1}IS\left[i_{n+1}\right].t=\frac{n+1-1}{n+1}\overline{IS\left[i_n\right].t}+\frac1{n+1}IS\left[i_{n+1}\right].t\\=\overline{IS\left[i_n\right].t}+\frac1{n+1}\left(IS\left[i_{n+1}\right].t-\overline{IS\left[i_n\right].t}\right).\end{array}$$
    (6)

By substitution with Eq. (6),

$$\begin{gathered} \sum\limits_{j = 1}^{n + 1} {(IS[i_{j} ].t - \overline{{IS[i_{n + 1} ].t}} )^{2} } = \sum\limits_{j = 1}^{n + 1} {(IS[i_{j} ].t - \overline{{IS[i_{n} ].t}} - \frac{1}{n + 1}(IS[i_{n + 1} ].t - \overline{{IS[i_{n} ].t}} ))^{2} } \hfill \\ = \sum\limits_{j = 1}^{n + 1} {((IS[i_{j} ].t - \overline{{IS[i_{n} ].t}} )^{2} - \frac{{2(IS[i_{j} ].t - \overline{{IS[i_{n} ].t}} )(IS[i_{n + 1} ].t - \overline{{IS[i_{n} ].t}} )}}{n + 1} + \frac{{(IS[i_{n + 1} ].t - \overline{{IS[i_{n} ].t}} )^{2} }}{{(n + 1)^{2} }})} \hfill \\ = \sum\limits_{j = 1}^{n + 1} {(IS[i_{j} ].t - \overline{{IS[i_{n} ].t}} )^{2} - \frac{{2(IS[i_{n + 1} ].t - \overline{{IS[i_{n} ].t}} )}}{n + 1}\sum\limits_{j = 1}^{n + 1} {(IS[i_{j} ].t - \overline{{IS[i_{n} ].t}} )} + \frac{{(IS[i_{n + 1} ].t - \overline{{IS[i_{n} ].t}} )^{2} }}{n + 1} \, {.}} \hfill \\ \end{gathered}$$
(7)

For the first term on the right-hand side of the last expression in Eq. (7),

$${\textstyle\sum_{j=1}^{n+1}}\left(IS\left[i_j\right].t-\overline{IS\left[i_n\right].t}\right)^2={\textstyle\sum_{j=1}^n}\left(IS\left[i_j\right].t-\overline{IS\left[i_n\right].t}\right)^2+\left(IS\left[i_{n+1}\right].t-\overline{IS\left[i_n\right].t}\right)^2$$
(8)

For the second term on the right-hand side of the last expression in Eq. (7),

$$\begin{array}{c}-\frac{2\left(IS\left[{i}_{n+1}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)}{n+1}\sum_{j=1}^{n+1}\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)\\ \begin{array}{cc}=& -\frac{2(IS\left[{i}_{n+1}\right].t-\overline{IS\left[{i}_{n}\right].t})}{n+1}\left(\sum_{j=1}^{n}\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)+\left(IS\left[{i}_{n+1}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)\right)\end{array}\end{array}$$
(9)

Because \(\sum_{j=1}^n\left(IS\left[i_j\right].t-\overline{IS\left[i_n\right].t}\right)=\sum_{j=1}^nIS\left[i_j\right].t-n\times\overline{IS\left[i_n\right].t}=0,\)

$$-\frac{2\left(IS\left[i_{n+1}\right].t-\overline{IS\left[i_n\right].t}\right)}{n+1}{\textstyle\sum_{j=1}^{n+1}}\left(IS\left[i_j\right].t-\overline{IS\left[i_n\right].t}\right)=-\frac{2\left(IS\left[i_{n+1}\right].t-\overline{IS\left[i_n\right].t}\right)^2}{n+1}$$
(10)

Substituting Eqs. (8) and (10) into Eq. (7) yields

$$\begin{array}{c}\sum_{j=1}^{n+1}\left(IS\left[i_j\right].t-\overline{IS\left[i_{n+1}\right].t}\right)^2\\\begin{array}{cc}=&\sum_{j=1}^n\left(IS\left[i_j\right].t-\overline{IS\left[i_n\right].t}\right)^2+\frac n{n+1}\left(IS\left[i_{n+1}\right].t-\overline{IS\left[i_n\right].t}\right)^2\end{array}.\end{array}$$
(11)

Hence, \(\sum_{j=1}^{n+1}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n+1}\right].t}\right)}^{2}\ge \sum_{j=1}^{n}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)}^{2}\). Thus, \(\mathrm{exp}\left(-\sum_{j=1}^{n+1}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n+1}\right].t}\right)}^{2}\right)\le \mathrm{exp}\left(-\sum_{j=1}^{n}{\left(IS\left[{i}_{j}\right].t-\overline{IS\left[{i}_{n}\right].t}\right)}^{2}\right)\); that is, DC(SY, IS)  ≤ DC(SX, IS). Thus,

$${\sum }_{IS\in sup\_set({S}_{Y})\bigwedge IS\in sup\_set({S}_{X})}DC\left({S}_{Y},IS\right)\le {\sum }_{IS\in sup\_set({S}_{Y})\bigwedge IS\in sup\_set({S}_{X})}DC\left({S}_{X},IS\right)$$
(12)

Because SX ⊏ SY, sup_set(SY) ⊆ sup_set(SX) holds. If sup_set(SY) = sup_set(SX), Eq. (12) implies that supD(SY)  ≤ supD(SX). If sup_set(SY) ⊂ sup_set(SX), there exist input sequences contained in sup_set(SX) but not in sup_set(SY). Thus,

$$ \begin{aligned}{sup}_D\left(S_Y\right)&={\textstyle\sum_{IS\in sup\_set\left({S_Y}\right)\wedge IS\in sup\_set\left({S_X}\right)}DC\left({S_Y},IS\right)}\\&\leq\sum\nolimits_{IS\in sup\_set\left({S_Y}\right)\wedge IS\in sup\_set\left({S_X}\right)}DC\left({S_X},IS\right)\\&<\sum\nolimits_{IS\in sup\_set\left({S_Y}\right)\wedge IS\in sup\_set\left({S_X}\right)}DC\left({S_X},IS\right)+\sum\nolimits_{IS'\not\in sup\_set\left({S_Y}\right)\wedge IS'\in sup\_set\left({S_X}\right)}DC\left({S_X},IS'\right)\\&=sup_{D}\left({S_X}\right)\end{aligned} $$

According to the above discussion, supD(SY)  ≤ supD(SX) when |SY| =|SX|+ 1.

When |SY| =|SX|+ m (m > 1), (m − 1) SPs S1, S2,…, Sm−1 can be identified such that SX ⊏ S1 ⊏ S2 ⊏…⊏ Sm−2 ⊏ Sm−1 ⊏ SY and |SY| =|Sm−1|+ 1 =|Sm−2|+ 2 = … =|S1|+ m − 1 =|SX|+ m. Similar to the case in which |SY| =|SX|+ 1, supD(SY)  ≤ supD(Sm−1)  ≤ supD(Sm−2)  ≤ … ≤ supD(S1)  ≤ supD(SX).

According to the above discussion, supD(SY)  ≤ supD(SX) when SX ⊑ SY.□

Lemma 4 shows that the discreteness constraint can also be pushed into the mining process to speed up the discovery of SPs.

4.3 Validity constraint

Finally, the validity constraint is proposed, which distinguishes serious learning enrollments from casual ones. This constraint also concerns the enrollment time, specifically, the moment within a day.

Consider IS3 and IS5 in the example sequence database in Table 2. The specific enrollment moment is shown in Table 4. It should be noted that the format of Table 4 is the same as the original format of the Course Recommendation dataset. To simplify the explanation, some information was omitted in the previous examples.

Table 4 Two input sequences with specific enrollment times

The motivation for defining the validity constraint is that enrollments during normal working hours are often generated by learners who have a strong desire to learn, whereas enrollments during non-working hours are often generated by learners who simply want to gain some basic knowledge. For IS3 and IS5 in Table 4, although both contain the SP S =  < Database, Data mining > , the enrollment moments for IS3 are during working hours, whereas the enrollment moments for IS5 are during non-working hours (early morning and midnight). It is assumed that IS3 contributes more to sup(S) than IS5. To model this assumption, the validity constraint is defined. In this paper, enrollment during normal working hours is called valid enrollment and enrollment during non-working hours is called casual enrollment. For example, if normal working hours are set to the period 8:00–22:59 and non-working hours to the period 23:00–7:59, S has two valid enrollments in IS3 and two casual enrollments in IS5.

Definition 5 (Validity constraint)

Let S =  < c1, c2, …, cn > be an SP. Suppose that IS ∈ sup_set(S) is an input sequence. The validity constraint of S with respect to IS is defined as

$$VC\left(S,IS\right)=\exp\left(-num\_l/max\_L\right),$$
(13)

where num_l is the number of casual enrollments of S in IS and max_L is the maximum length of all input sequences in SDB.

From Definition 5, the validity constraint distinguishes between standard learning behavior and casual learning behavior. For S =  < Database, Data mining > , VC(S, IS3) = 1, which indicates that sup(S) does not decay in IS3 with respect to enrollment moments because both enrollments are valid enrollments.
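A Java sketch of the validity computation follows. The working-hours boundaries are those of the running example; making them constants of the class is our choice, not part of the definition:

```java
import java.time.LocalTime;

final class ValidityConstraint {
    // Working-hours boundaries from the running example:
    // 8:00-22:59 is a valid enrollment; 23:00-7:59 is a casual one.
    static final LocalTime START = LocalTime.of(8, 0);
    static final LocalTime END = LocalTime.of(23, 0);

    static boolean isCasual(LocalTime t) {
        return t.isBefore(START) || !t.isBefore(END);
    }

    // VC(S, IS) = exp(-num_l / max_L), per Definition 5 (Eq. (13)),
    // where num_l counts the casual enrollments of S's occurrence in IS.
    static double vc(LocalTime[] occurrenceMoments, int maxL) {
        int numL = 0;
        for (LocalTime t : occurrenceMoments)
            if (isCasual(t)) numL++;
        return Math.exp(-(double) numL / maxL);
    }
}
```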

To push the validity constraint into the mining process, it is incorporated into the support.

Definition 6 (Support with validity constraint)

Let S be an SP. The support with validity constraint (SVC) of S is defined as

$${sup}_V\left(S\right)=\sum\nolimits_{IS\in sup\_set\left(S\right)}VC\left(S,IS\right)=\sum\nolimits_{IS\in sup\_set\left(S\right)}\exp\left(-num\_l/max\_L\right).$$
(14)

It can also be proved that the SVC is lower than the general support stated in Section 3.

Lemma 5

Let SDB be the sequence database and S be an SP. Then, supV(S)  ≤ sup(S).

Proof. Suppose that m input sequences in SDB contain S; that is, there are m input sequences in sup_set(S) and sup(S) = m. For any IS ∈ sup_set(S), num_l(S, IS) / max_L ≥ 0. Thus, 0 < exp(-num_l(S, IS) / max_L)  ≤ 1; that is, 0 < VC(S, IS)  ≤ 1. Because there are m input sequences in sup_set(S), \({\sum }_{IS\in sup\_set(S)}VC\left(S,IS\right)\le m\); that is, supV(S)  ≤ sup(S). □

The SVC also satisfies the downward closure property, which is proved in Lemma 6.

Lemma 6

For any two SPs SX and SY, if SX ⊑ SY, supV(SY)  ≤ supV(SX).

Proof. Because SX ⊑ SY, there are two cases.

  • (1) If SX = SY, supV(SY) = supV(SX).

  • (2) If SX ⊏ SY, for an input sequence IS containing both SX and SY, num_l(SX, IS)/max_L ≤ num_l(SY, IS)/max_L. Thus, exp(-num_l(SX, IS)/max_L)  ≥ exp(-num_l(SY, IS)/max_L); that is,

    $$VC\left(S_X,IS\right)\geq VC\left(S_Y,IS\right).$$
    (15)

Because SX ⊏ SY, sup_set(SY) ⊆ sup_set(SX) holds. If sup_set(SY) = sup_set(SX),

$$\begin{array}{c}\begin{array}{cc}{sup}_V\left(S_Y\right)&=\sum\nolimits_{IS\in sup\_set\left(S_Y\right)}VC\left(S_Y,IS\right)=\sum\nolimits_{IS\in sup\_set\left(S_X\right)}VC\left(S_Y,IS\right)\end{array}\\\leq\sum\nolimits_{IS\in sup\_set\left(S_X\right)}VC\left(S_X,IS\right)\\={sup}_V\left(S_X\right).\end{array}$$

If sup_set(SY) ⊂ sup_set(SX), there exist input sequences contained in sup_set(SX) but not in sup_set(SY). Thus,

$$ \begin{aligned}{sup}_V\left(S_Y\right)&={\textstyle\sum_{IS\in sup\_set\left({S_Y}\right)\wedge IS\in sup\_set\left({S_X}\right)}}VC\left({S_Y},IS\right)\\&\leq{\textstyle\sum_{IS\in sup\_set\left({S_Y}\right)\wedge IS\in sup\_set\left({S_X}\right)}}VC\left({S_X},IS\right)\\&<{\textstyle\sum_{IS\in sup\_set\left({S_Y}\right)\wedge IS\in sup\_set\left({S_X}\right)}}VC\left({S_X},IS\right)+{\textstyle\sum_{IS{'}\not\in sup\_set\left({S_Y}\right)\wedge IS{'}\in sup\_set\left({S_X}\right)}}VC\left({S_X},IS{'}\right)\\&={sup_V}\left({S_{X}}\right)\end{aligned} $$

According to the above discussion, supV(SY)  ≤ supV(SX) when SX ⊑ SY.□

Lemma 6 shows that the validity constraint can be pushed into the mining process to speed up the discovery of SPs.

4.4 Constraint integration

To speed up the SPM process, the length, discreteness, and validity constraints are flexibly integrated into a single parameter that replaces the general support.

Definition 7 (SFC)

Let S be an SP. The SFC of S is defined as

$${sup}_{FC}\left(S\right)=\alpha \times {sup}_{L}\left(S\right)+\beta \times {sup}_{D}\left(S\right)+\gamma \times {sup}_{V}\left(S\right),$$
(16)

where α (0 ≤ α ≤ 1) is the length factor, β (0 ≤ β ≤ 1) is the discreteness factor, and γ (0 ≤ γ  ≤ 1) is the validity factor such that

$$\alpha +\beta +\gamma =1.$$
(17)

For an SP S, supFC(S) reflects the decay of sup(S) caused by the input sequences in sup_set(S): their lengths, the variances of the enrollment dates within them, and their enrollment moments within a day. If the lengths of the input sequences in sup_set(S) are short, the variances of the enrollment dates are small, and there are few casual enrollments, then there are more opportunities to discover S using the proposed algorithms.
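The integration in Eq. (16) is a weighted combination of the three constrained supports, as the following Java sketch shows (the explicit weight check is ours):

```java
final class FlexibleSupport {
    // sup_FC(S) = alpha * sup_L(S) + beta * sup_D(S) + gamma * sup_V(S),
    // per Definition 7 (Eq. (16)), with alpha + beta + gamma = 1 (Eq. (17)).
    static double supFC(double supL, double supD, double supV,
                        double alpha, double beta, double gamma) {
        if (Math.abs(alpha + beta + gamma - 1.0) > 1e-9)
            throw new IllegalArgumentException("alpha + beta + gamma must equal 1");
        return alpha * supL + beta * supD + gamma * supV;
    }
}
```

For instance, the experiments in Section 6 use α = 1/6, β = 3/6, and γ = 2/6.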

It can also be proved that the SFC never exceeds the general support.

Theorem 1

Let SDB be the sequence database and S =  < c1, c2, …, cn > be an SP. Then, supFC(S)  ≤ sup(S).

Proof. Let IS be an input sequence and IS ∈ sup_set(S). There exist integers 1 ≤ i1 < … < in such that S.c1 = IS[i1].c, S.c2 = IS[i2].c, …, S.cn = IS[in].c. According to Lemma 1,

$$0<LC\left(S,IS\right)=\exp\left(-\left|IS\right|/max\_L\right)\leq1.$$
(18)

Similarly, according to Lemmas 3 and 5,

$$0<DC\left(S,IS\right)=\exp\left(-\sum_{j=1}^n\left(IS\left[i_j\right].t-\overline{IS\left[i_n\right].t}\right)^2\right)\leq1,$$
(19)
$$0<VC\left(S,IS\right)=\exp\left(-num\_l\left(S,IS\right)/max\_L\right)\leq1.$$
(20)

Assume LC(S, IS)  ≥ DC(S, IS) and LC(S, IS)  ≥ VC(S, IS). Then,

$$\begin{array}{c}\alpha\times LC\left(S,IS\right)+\beta\times DC\left(S,IS\right)+\gamma\times VC\left(S,IS\right)\\\leq\alpha\times LC\left(S,IS\right)+\beta\times LC\left(S,IS\right)+\gamma\times LC\left(S,IS\right)\\=\left(\alpha+\beta+\gamma\right)\times LC\left(S,IS\right).\end{array}$$

According to Eq. (17),

$$\alpha\times LC\left(S,IS\right)+\beta\times DC\left(S,IS\right)+\gamma\times VC\left(S,IS\right)\leq1.$$
(21)

For the other two cases, (1) DC(S, IS)  ≥ LC(S, IS) and DC(S, IS)  ≥ VC(S, IS) and (2) VC(S, IS)  ≥ LC(S, IS) and VC(S, IS)  ≥ DC(S, IS), it can be concluded that Eq. (21) holds similarly.

Suppose that m input sequences in SDB contain S; that is, there are m input sequences in sup_set(S), and sup(S) = m. According to Eq. (21),

$$\begin{array}{c}\begin{array}{cc}{sup}_{FC}\left(S\right)&=\alpha\times{sup}_L\left(S\right)+\beta\times{sup}_D\left(S\right)+\gamma\times{sup}_V\left(S\right)\end{array}\\=\sum\nolimits_{IS\in sup\_set\left(S\right)}\left(\alpha\times LC\left(S,IS\right)+\beta\times DC\left(S,IS\right)+\gamma\times VC\left(S,IS\right)\right)\\\leq m=\mathit{sup}\left(S\right)\mathit.\end{array}$$

According to the above discussion, supFC(S)  ≤ sup(S). □

Using SFC to replace the support can guarantee mining efficiency because it also satisfies the downward closure property.

Theorem 2

For any two SPs SX and SY, if SX ⊑ SY, supFC(SY)  ≤ supFC(SX).

Proof. According to Lemma 2, supL(SY)  ≤ supL(SX). Because 0 ≤ α ≤ 1,

$$\alpha \times {sup}_{L}\left({S}_{Y}\right)\le \alpha \times {sup}_{L}\left({S}_{X}\right)$$
(22)

Similarly, according to Lemmas 4 and 6,

$$\beta\times{sup}_D\left(S_Y\right)\leq\beta\times{sup}_D\left(S_X\right),$$
(23)
$$\gamma\times{sup}_V\left(S_Y\right)\leq\gamma\times{sup}_V\left(S_X\right).$$
(24)

Summing Eqs. (22), (23), and (24) yields

$$\begin{array}{c}\begin{array}{cc}{sup}_{FC}\left({S}_{Y}\right)& =\alpha \times {sup}_{L}\left({S}_{Y}\right)+\beta \times {sup}_{D}\left({S}_{Y}\right)+\gamma \times {sup}_{V}\left({S}_{Y}\right)\end{array}\\ \le \alpha \times {sup}_{L}\left({S}_{X}\right)+\beta \times {sup}_{D}\left({S}_{X}\right)+\gamma \times {sup}_{V}\left({S}_{X}\right)\\ ={sup}_{FC}\left({S}_{X}\right)\end{array}$$

According to the above discussion, supFC(SY)  ≤ supFC(SX) if SX ⊑ SY.□

Using Theorem 2, when an SP’s SFC is found to be lower than the minimum support threshold, all its super patterns can be safely pruned when using the proposed algorithms.

Given the above discussion, the problem to be solved is redefined as follows: Given a positive integer min_sup as the minimum support threshold, an SP S is a flexible-constraint-based SP (FCSP) in the sequence database SDB if supFC(S)  ≥ min_sup. An FCSP with length l is called an l-FCSP. The flexible-constraint-based SPM (FCSPM) problem is to find the complete set of FCSPs with respect to SDB and min_sup.

Theorem 3

Let S_FCSP and S_FSP be the sets of FCSPs and FSPs with respect to the same min_sup, respectively. Then, S_FCSP ⊆ S_FSP.

Proof. For ∀ S ∈ S_FCSP, supFC(S)  ≥ min_sup. According to Theorem 1, supFC(S)  ≤ sup(S). Hence, sup(S)  ≥ min_sup, and S ∈ S_FSP. Thus, S_FCSP ⊆ S_FSP.□

From Theorem 3, the set of FCSPs is a subset of the set of FSPs when the same threshold is set.

5 Algorithm description

To discover FCSPs, two algorithms are proposed. One traverses the search space level-by-level and is called SPM using flexible constraints level-wisely (SPM-FC-L), and the other traverses the search space using recursive projections and is called SPM using flexible constraints by projection (SPM-FC-P). The SPM-FC-L algorithm is convenient to implement, whereas SPM-FC-P is more efficient. Because it was proved in Section 4.4 that SFC satisfies the downward closure property, as does the support, the support is replaced by SFC in both algorithms directly.

5.1 SPM-FC-L algorithm

To replace the support with SFC, it is natural to discover the FCSPs based on the GSP algorithm [33]. Algorithm 1 describes the proposed SPM-FC-L for mining FCSPs.

Algorithm 1 SPM-FC-L

In Algorithm 1, FCSPs with single items are first discovered on Line 1. FSk denotes the set of FCSPs with length k. The initial value of k is set to one on Line 2. The main loop discovers all FCSPs using a candidate generation-and-test methodology (Lines 3–7). On Line 4, the function candidate_gen (described in Algorithm 2) is called to generate candidates with length (k + 1). CSk denotes the set of candidate FCSPs with length k. On Line 5, only candidates with SFC no lower than min_sup are kept. The counter k is incremented by one on Line 6. Finally, on Line 8, all the discovered FCSPs are returned.
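For illustration, the level-wise loop of Algorithm 1 can be sketched as follows in Java. The helpers oneFcsps, candidateGen, and sfc are stand-ins for the subroutines described above (a real SFC computation also needs the enrollment times, which this simplified database type omits); this is a sketch, not the authors' implementation:

```java
import java.util.ArrayList;
import java.util.List;

final class SpmFcL {
    // Level-wise mining with SFC used in place of the support (Algorithm 1).
    static List<List<String>> mine(List<List<String>> sdb, double minSup) {
        List<List<String>> result = new ArrayList<>();
        List<List<String>> fsK = oneFcsps(sdb, minSup);       // Line 1: 1-FCSPs
        while (!fsK.isEmpty()) {                              // Lines 3-7
            result.addAll(fsK);
            List<List<String>> next = new ArrayList<>();
            for (List<String> cand : candidateGen(fsK))       // Line 4
                if (sfc(cand, sdb) >= minSup) next.add(cand); // Line 5: SFC test
            fsK = next;                                       // Line 6: k = k + 1
        }
        return result;                                        // Line 8
    }

    // Stubs for the subroutines described in the text (sketch only).
    static List<List<String>> oneFcsps(List<List<String>> sdb, double minSup) {
        throw new UnsupportedOperationException("sketch only");
    }
    static List<List<String>> candidateGen(List<List<String>> fsK) {
        throw new UnsupportedOperationException("sketch only");
    }
    static double sfc(List<String> cand, List<List<String>> sdb) {
        throw new UnsupportedOperationException("sketch only");
    }
}
```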

Algorithm 2 The candidate_gen function

The function candidate_gen generates the candidate FCSPs with length (k + 1) by joining two k-FCSPs that share the same first (k − 1) courses. For each such pair of FCSPs, two candidates can be generated. A candidate is retained only if all its subsequences are FCSPs, which follows from the downward closure property of SFC. Different from typical SPM algorithms, which use both itemset-extension and sequence-extension to generate new candidates, only sequence-extension is considered here. This is because there is only one course enrollment at each time in the Course Recommendation dataset used in this paper.
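The join step of candidate_gen can be sketched as follows (Java). The method name and types are ours; the downward closure pruning of candidates with infrequent subsequences is noted in a comment but omitted:

```java
import java.util.ArrayList;
import java.util.List;

final class CandidateJoin {
    // Join two k-FCSPs that share their first (k-1) courses; each such
    // pair yields two (k+1)-candidates, one per order of the differing
    // last courses (sequence-extension only, as discussed above).
    static List<List<String>> join(List<String> p, List<String> q) {
        int k = p.size();
        if (q.size() != k || !p.subList(0, k - 1).equals(q.subList(0, k - 1)))
            return List.of();                            // not joinable
        String x = p.get(k - 1), y = q.get(k - 1);
        if (x.equals(y)) return List.of();               // identical patterns
        List<String> c1 = new ArrayList<>(p); c1.add(y); // <c1..ck-1, x, y>
        List<String> c2 = new ArrayList<>(q); c2.add(x); // <c1..ck-1, y, x>
        // A candidate would further be kept only if all its k-subsequences
        // are FCSPs (downward closure); that pruning check is omitted here.
        return List.of(c1, c2);
    }
}
```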

5.2 SPM-FC-P algorithm

In this section, another FCSP mining algorithm, SPM-FC-P, is proposed that uses the recursive sequence database projection approach. To explain the algorithm, the following concepts of sequence database projection are introduced.

Let SX =  < c1, c2, …, cn > and SY =  < c'1, c'2, …, c'm > be two SPs. SY is called a prefix of SX if (1) m < n and (2) there exist integers 1 ≤ i1 < i2 < … < im < n such that \(S_Y.c'_1 = S_X.c_{i_1}\), \(S_Y.c'_2 = S_X.c_{i_2}\), …, \(S_Y.c'_m = S_X.c_{i_m}\). \(S_Z = <c_{i_m+1}, c_{i_m+2}, \dots, c_n>\) is called the suffix of SX with respect to prefix SY, and is denoted by SZ = SX / SY.

For example, SP SY =  < Data structure, Operating system > is a prefix of SX =  < Data structure, Introduction to logic, Operating system, Linear algebra, Introduction to big data > , and SZ =  < Linear algebra, Introduction to big data > is the suffix of SX with respect to SY.

Let S be an SP in a sequence database SDB. The S-projected database, denoted by SDB|S, is the collection of suffixes of input sequences in SDB with respect to prefix S.

The sequence database in Table 2 is considered as an example. Consider S =  < Data structure, Operating system > . The S-projected database is shown in Table 5.

Table 5 S-projected database in the example sequence database

According to the above concepts, Algorithm 3 describes the proposed SPM-FC-P for mining FCSPs.

Algorithm 3 SPM-FC-P

In the S-projected database, all 1-FCSPs are enumerated on Line 1. Then the main loop (Lines 2–7) generates new FCSPs by appending each 1-FCSP to the current FCSP. On Line 3, a 1-FCSP is appended after the last item of the current FCSP to form a new FCSP. Following previous pattern-growth-based SPM algorithms [27], the SFC of the new SP is the same as the SFC of the appended item in the projected database; thus, the new SP is also an FCSP. The newly formed FCSP is output on Line 4, and its projected database is constructed on Line 5. On Line 6, the SPM-FC-P procedure is called to generate FCSPs recursively. It should be noted that, when SPM-FC-P is called for the first time, S is the empty sequence and SDB|S is SDB itself.
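The recursion of Algorithm 3 can be sketched as follows (Java). frequentOneFcsps (the 1-FCSPs of a projected database) and the simplified projection, which ignores enrollment times, are illustrative stand-ins rather than the authors' code:

```java
import java.util.ArrayList;
import java.util.List;

final class SpmFcP {
    // Recursive pattern growth with SFC (Algorithm 3). On the first call,
    // prefix is empty and projectedDb is the whole SDB.
    static void mine(List<String> prefix, List<List<String>> projectedDb,
                     double minSup, List<List<String>> out) {
        for (String c : frequentOneFcsps(projectedDb, minSup)) { // Line 1
            List<String> grown = new ArrayList<>(prefix);        // Line 3: append c
            grown.add(c);
            out.add(grown);                                      // Line 4: output
            mine(grown, project(projectedDb, c), minSup, out);   // Lines 5-6
        }
    }

    // The c-projected database: the suffix after the first occurrence of c
    // in each sequence that contains c (non-empty suffixes only).
    static List<List<String>> project(List<List<String>> db, String c) {
        List<List<String>> out = new ArrayList<>();
        for (List<String> s : db) {
            int i = s.indexOf(c);
            if (i >= 0 && i + 1 < s.size()) out.add(s.subList(i + 1, s.size()));
        }
        return out;
    }

    // SFC counting over the projected database is omitted in this sketch.
    static List<String> frequentOneFcsps(List<List<String>> db, double minSup) {
        throw new UnsupportedOperationException("sketch only");
    }
}
```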

5.3 Summary of the proposed algorithms

Discovering SPs from MOOC learning data is important for improving the online learning experience. To the best of the authors’ knowledge, this is the first work on extracting constraint-based SPs from MOOC data. The novelty of the two proposed algorithms can be summarized as follows.

First, the interestingness of the resulting SPs is measured from three perspectives: the number of courses in which students were enrolled, the date span of course enrollments, and the specific enrollment moment within a day. Thus, the problem of the extremely large number of results produced by typical FSP mining can be alleviated to a great extent. Additionally, the FCSPs are more meaningful than FSPs, which are evaluated by frequency only.

Second, the downward closure property was proved to hold for SFC. Thus, the two algorithms for mining FCSPs are not only easy to implement but also effective in reducing the extremely large search space. Therefore, the efficiency of both SPM-FC-L and SPM-FC-P is comparable with that of their counterpart level-wise and projection-based SPM algorithms.

Finally, the three constraints, that is, length constraint, discreteness constraint, and validity constraint, were also all proved to satisfy the downward closure property. Hence, these three constraints can be used separately according to the specific application scenario, which makes the two proposed algorithms suitable for general usage.

6 Experimental results

In this section, the performance of the proposed algorithms is evaluated by comparing them with two general SPM algorithms, GSP [33] and PrefixSpan [27], and one constraint-based sequential rule mining algorithm, TRuleGrowth [11]. The source code of each algorithm was downloaded from the SPMF data mining library [9]. To run GSP, PrefixSpan, and TRuleGrowth on the Course Recommendation dataset, the dataset was transformed by deleting the specific enrollment times and retaining the order of course enrollments within each sequence. It should be noted that TRuleGrowth is an algorithm with a sliding-window constraint for mining partially ordered sequential rules. For a fair comparison, the part of TRuleGrowth that calculates confidence was disabled. Thus, in Sections 6.1 and 6.2, only the time and memory required for TRuleGrowth to mine the SPs are recorded; the time and memory required to generate rules from the SPs are ignored. Similarly, the number of results reported for TRuleGrowth is the number of discovered SPs rather than the number of sequential rules.

The experiments were conducted on a computer with a 2-Core 1.80 GHz CPU and 8 GB memory running 64-bit macOS Mojave (macOS 10.14). The programs were written in Java. It should be noted that the support used for evaluation was the ratio of the number of input sequences containing the target pattern to the total number of input sequences in the sequence database; that is, the support values used in experiments were in the range [0, 1].

In the proposed model, the length factor α (0 ≤ α ≤ 1), discreteness factor β (0 ≤ β≤ 1), and validity factor γ (0 ≤ γ  ≤ 1) had to be set to appropriate values. First, the approximate ranges of these parameters were outlined, and then their optimal values were determined using progressive refinement. For all the experiments, α = 1/6, β = 3/6, and γ = 2/6.

6.1 Runtime

First, the efficiency of the algorithms was evaluated by measuring the runtime while varying the minimum support threshold. Because there was only one dataset, the same dataset was tested using two groups of minimum support thresholds in the experiments in Sections 6.1 to 6.3.

In Fig. 2, the efficiency of the five algorithms can be categorized into two groups. Generally, the three projection-based algorithms (PrefixSpan, TRuleGrowth, and SPM-FC-P) were faster than the two level-wise algorithms (GSP and SPM-FC-L). This is consistent with the existing consensus in the field of pattern mining; that is, pattern-growth-based algorithms are more efficient because numerous candidates and multiple database scans can be avoided effectively. The two proposed algorithms demonstrated efficiency comparable with their counterpart algorithms. Specifically, SPM-FC-L was slightly faster than GSP, and SPM-FC-P was slightly slower than PrefixSpan and slightly faster than TRuleGrowth.

Fig. 2 Comparison of execution times

In this set of experiments, the two proposed algorithms were not faster than PrefixSpan, which can be explained as follows. The lower efficiency of SPM-FC-L was caused by its level-wise search space traversal, whereas SPM-FC-P was slightly slower than PrefixSpan mainly because it had to calculate the three constraints in addition to the corresponding supports and then integrate them into the SFC.

6.2 Memory consumption

The memory usage of the five algorithms was also compared. The results are shown in Fig. 3.

Fig. 3 Comparison of memory usage

The plots of the results for this set of comparisons can also be divided into two categories that are similar to the results in Fig. 2. For the two level-wise algorithms, the proposed SPM-FC-L algorithm consumed less memory than the GSP algorithm, on average, whereas for the three projection-based algorithms, the memory consumption of the proposed SPM-FC-P algorithm was less than that of PrefixSpan, and comparable with that of TRuleGrowth. For example, when the minimum support threshold was 0.4%, SPM-FC-P saved nearly half the memory compared with PrefixSpan. This was mainly because a considerable number of SPs were not FCSPs when using SFC. Thus, fewer results for the proposed algorithms could avoid unnecessary join operations and database projections, which led to less memory consumption.

For SPM-FC-P and TRuleGrowth, the memory consumption of SPM-FC-P was worse than that of TRuleGrowth in the first set of experiments, but better in the second set. The results were closely related to the number of discovered SPs: on average, SPM-FC-P consumed more memory than TRuleGrowth when it discovered more FCSPs than TRuleGrowth discovered SPs, and less memory when it discovered fewer. This is also verified by the comparison of the number of discovered patterns in Section 6.3.

6.3 Number of discovered patterns

The number of SPs discovered by our algorithms was also compared with the number of SPs discovered by the other three algorithms. The results are shown in Fig. 4. Because SPM-FC-L and SPM-FC-P returned the same results, and GSP and PrefixSpan returned the same results, the results for the comparison were discovered using SPM-FC-P (FCSPs) and PrefixSpan (FSPs), respectively. Because the SPs discovered by TRuleGrowth, used for extracting partially ordered sequential rules (POSRs), are different from both FCSPs and FSPs, this type of SP is denoted by partially ordered SPs (POSPs) in this set of experiments.

Fig. 4 Number of discovered patterns

Figure 4 shows that the number of FCSPs was always smaller than the number of FSPs. This reflects that the flexible constraints present fewer results to users according to the characteristics of MOOC data. Generally, the more results found, the more results the proposed algorithms could filter out. For example, when min_sup was 0.05%, the numbers of FSPs and FCSPs were at their maximum, and the number of FCSPs was 6,837 less than the number of FSPs.

For the results discovered by TRuleGrowth, the number of FCSPs was sometimes less than the number of POSPs, but more often it was greater. The reason is that POSPs are used for generating POSRs pair by pair: within each pair, one POSP is tested as the antecedent and the other as the consequent, and the items in each POSP are unordered. A large number of permutations of SPs caused by different item orders are thereby avoided, which reduces the number of resulting POSPs accordingly.

6.4 Impact of a single constraint

The two proposed algorithms measure the importance of FCSPs with SFC, which is the integration of SLC, SDC, and SVC. To show the effect of each constraint, the performance of each proposed algorithm was compared with that of its counterpart that uses only one constraint.

As discussed in Sections 6.1 and 6.2, the performance of SPM-FC-L was lower than that of SPM-FC-P. Therefore, the comparison between the four level-wise algorithms was conducted using the group of high thresholds. The three level-wise algorithms that use only the length, discreteness, and validity constraints are denoted by SPM-LC-L, SPM-DC-L, and SPM-VC-L, respectively. The comparison between the four projection-based algorithms was conducted on the group of low thresholds. The three projection-based algorithms that use only the length, discreteness, and validity constraints are denoted by SPM-LC-P, SPM-DC-P, and SPM-VC-P, respectively.

The runtime, memory consumption, and the number of discovered SPs were compared, and the middle threshold of each threshold group was used, that is, 0.5% for level-wise algorithms and 0.07% for projection-based algorithms. The comparison results are shown in Tables 6 and 7.

Table 6 Performance comparison for the level-wise algorithms
Table 7 Performance comparison for the projection-based algorithms

Tables 6 and 7 show that the algorithms that considered only the length constraint (SPM-LC-L and SPM-LC-P) performed best, the algorithms that considered only the discreteness constraint (SPM-DC-L and SPM-DC-P) performed worst, and the performance of the two proposed algorithms (SPM-FC-L and SPM-FC-P), which use all three constraints, was between that of the single-constraint algorithms. Compared with the other two constraints, the length constraint is the easiest to calculate. Furthermore, the value of SLC decreases as the length of the input sequence increases. Without considering the actual meaning of the discovered SPs, these two features of SLC made the two length-constraint-based algorithms perform best, on average.

6.5 Pattern analysis

From Theorem 3, any FCSP is also an FSP. To show the effect of the constraints, two typical FSPs that were not FCSPs were analyzed.

When min_sup was set to 0.75%, S1 =  < Literature management and information analysis, Traditional Chinese medicine health preservation > was discovered as an FSP, but not an FCSP. To analyze the reason for this, two random input sequences containing S1 are shown in Table 8.

Table 8 Two input sequences containing S1

For the two selected input sequences, ISα was long and contained two casual enrollments, whereas ISβ was a typical input sequence that satisfied all three constraints (length, discreteness, and validity). Similarly, other input sequences containing S1 reduced the SFC because of their lengths, discreteness, and validity; hence, S1 was filtered out by the proposed algorithms.

As another example, when min_sup was set to 0.3%, S2 =  < Ideological and moral cultivation and legal basis, Fiscal policy and tax reform, Traditional Chinese rites > was discovered as an FSP, but not an FCSP. Similarly, two input sequences containing S2 were randomly selected, and are shown in Table 9.

Table 9 Two input sequences containing S2

From Table 9, supFC(S2) was reduced by ISγ for two reasons: ISγ was long, which led to a small contribution to supFC(S2), and its discreteness was high, which also made the contribution small. For ISδ, in addition to high discreteness, two of the three items of S2 were casual enrollments. Thus, the contribution of ISδ to supFC(S2) was also small.

To further analyze the interestingness of the resulting SPs, the differences between the FCSP results and the results discovered using only one constraint were compared. This was achieved by checking the results discussed in Section 6.4. When the minimum threshold was set to 0.06%, two interesting FCSPs that were not discovered by any single constraint were selected. They were S3 =  < Surgical nursing, Discipline studies in nursing, Community nursing, Geriatric nursing > and S4 =  < Surgical nursing, Community nursing, Gynecology nursing > . Both S3 and S4 consist of nursing courses. They are certainly interesting and useful for people who want to study nursing, medicine, or related courses.

The above pattern analysis has illustrated that the proposed constraints can effectively filter patterns that are deemed to be less interesting.

7 Conclusions and future work

MOOCs are changing education at present. SPM is an effective tool for analyzing the historical behavior of numerous online learners. By analyzing the characteristics of MOOC data, flexible constraints were designed from the perspectives of the length of enrollment sequences, the span of enrollment dates, and the enrollment moments. To push these constraints deep into the mining process, the SFC was designed step by step, and it was proved that this new parameter also satisfies the downward closure property, which greatly and effectively reduces the search space. Two algorithms, SPM-FC-L and SPM-FC-P, were proposed for the breadth-first and depth-first traversal of the search space, respectively. The experimental results demonstrated that the proposed algorithms discover fewer results than general FSP mining, and that their efficiency and memory consumption are comparable with those of classical SPM algorithms.

To the best of the authors’ knowledge, there has been very little research on SPM from MOOC data, let alone on incorporating constraints. The proofs of the downward closure property allow the three constraints to be used together or individually according to the real-world problem. Therefore, the two proposed algorithms are meaningful both for improving the design of MOOCs and for improving learners’ learning quality.

Designing more efficient algorithms to discover FCSPs by proposing novel search space traversal and pruning strategies will be attempted in future work. Furthermore, FCSPs will be used instead of FSPs to recommend more suitable learning resources to learners. Other potential interesting future work includes feature selection in actionable SPs [22], visualization of FCSPs [3, 13], and mining FCSP with a deep neural network [17, 20, 28, 44].