
1 Introduction

Time series data are ubiquitous in practical applications, ranging from health care [3] and action recognition [10] to financial markets [15] and urban traffic control [16]. How to extract features from time series effectively is a popular research topic [4, 5, 7, 9, 13]. However, feature extraction from time series remains a challenging task due to the potentially complex non-linear dynamical system behind the data.

Recently, the Random Projection Filter Bank (RPFB) [5] was proposed as a generic and simple approach to extracting features from time series data. RPFB is a set of randomly generated stable autoregressive filters that are convolved with the input time series to generate features. These features can then be used by any conventional machine learning algorithm for tasks such as time series prediction and classification with time series data. Different filters in RPFB extract different aspects of the time series, and together they provide a reasonably good summary of it.

However, a large number of random filters inevitably contains redundancy, which increases the computational cost of the classifier. Moreover, in some cases, redundant features make the classifier's performance worse. How to reduce redundant features (i.e., how to estimate the quality of each filter) is therefore an important issue. In this paper, aiming to reduce the number of redundant filters, we propose a way to distil the filters of RPFB, named D-RPFB, which uses a set of specific rules to select the filters that are most capable of guiding the classifier to better performance. D-RPFB reduces the number of redundant and even potentially misleading filters, thus improving the quality of the features provided to the classifier, which directly improves the classifier's learning ability and yields better performance.

2 Preliminaries

A crucial step in the distillation of RPFB is measuring the quality of a specific filter. To do so, we use entropy [6]. Since entropy is not very common in time series analysis, we first introduce the concept briefly before formally proposing our D-RPFB.

Entropy [6] is often used in information theory and statistics to measure the uncertainty of a random variable. When the logarithm is taken to base M, the number of categories, the entropy of a categorical variable is a real number between 0 and 1. Its value indicates the degree of uncertainty of the variable: when the entropy equals 0, the variable is completely certain, without any randomness; when the entropy equals 1, its uncertainty peaks. This property makes it possible to use entropy to measure the quality of a classification subset when a classifier uses a single feature extracted by a certain filter to classify instances. The smaller the entropy of a subset, the more the feature extracted by the filter helps the classifier complete the clustering; conversely, a larger entropy value indicates that the feature extracted by the filter may lead to confused classification results.
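As a concrete illustration, the following minimal sketch (Python with NumPy; `subset_entropy` is our illustrative name, not code from the paper) computes this normalized entropy for the labels in a classification subset:

```python
import numpy as np

def subset_entropy(labels, n_classes):
    """Entropy of the class labels in a subset, normalized to [0, 1]
    by taking the logarithm to base n_classes."""
    counts = np.bincount(labels, minlength=n_classes)
    p = counts[counts > 0] / counts.sum()
    return float(-np.sum(p * np.log(p)) / np.log(n_classes))

print(subset_entropy(np.array([0, 0, 0, 0]), 2))  # pure subset: 0.0
print(subset_entropy(np.array([0, 0, 1, 1]), 2))  # maximally mixed subset: 1.0
```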

3 Proposed Method D-RPFB

3.1 Brief Review of Random Projection Filter Bank

The idea behind RPFB is to randomly generate many simple dynamical systems (i.e., \(\frac{1}{1-Z^{'}_{n}z^{-1}}\) denotes a simple dynamical system with a given pole \(Z^{'}_{n}\), where \(z^{-1}\) denotes the unit delay operator in the z-transform domain [11]) that together can approximate the optimal dynamical system with high accuracy.

To do this, we first determine the number of filters in the filter bank. Given the number of filters N, we draw N random real or complex numbers \(Z^{'}_{1}, \cdots , Z^{'}_{N}\) from inside the unit circle (so that every filter is stable) to construct a filter bank defined by \(\phi (z^{-1}) = ( \frac{1}{1-Z^{'}_{1}z^{-1}}, \cdots , \frac{1}{1-Z^{'}_{N}z^{-1}})\), which contains N random projection filters. Then, we pass each input time series through every filter in RPFB (i.e., convolve them) and generate N features for each time series at each time step. For example, if each input time series has length T, we obtain \(N\times T\) features after passing it through RPFB. Finally, we can feed the obtained features into different classifiers to perform time series classification.
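As an illustration of this pipeline, here is a minimal sketch in Python (using NumPy and SciPy; `rpfb_features` is an illustrative name) that draws stable random poles and filters a univariate series. We keep only the real part of each response for simplicity; the original RPFB may handle the complex outputs differently:

```python
import numpy as np
from scipy.signal import lfilter

def rpfb_features(x, n_filters=20, seed=0):
    """Pass a univariate time series through a bank of random stable
    one-pole filters 1 / (1 - p * z^-1), |p| < 1, yielding n_filters
    features per time step (N * T features in total)."""
    rng = np.random.default_rng(seed)
    # Draw complex poles uniformly inside the unit circle so every filter is stable.
    radius = np.sqrt(rng.uniform(0.0, 1.0, n_filters))
    angle = rng.uniform(0.0, 2.0 * np.pi, n_filters)
    poles = radius * np.exp(1j * angle)
    # Each filter is the IIR recursion y[t] = x[t] + p * y[t-1].
    feats = np.stack([lfilter([1.0], [1.0, -p], x) for p in poles])
    return feats.real  # one simple choice; discards the imaginary part
```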

3.2 The Distillation of Random Projection Filter Bank

Introducing Entropy into Time Series. Entropy is used in the traditional ID3 decision tree algorithm [12] for feature selection, which motivates us to use entropy to evaluate the quality of a certain filter. However, in ID3 [12], entropy is only applicable to discrete variables. To solve this issue, we use an extra classifier to introduce entropy into time series and thereby evaluate the quality of a certain filter. In general, if each input time series has length T, we obtain T features over time after passing it through a certain filter. We then input these T features into a classifier to get a classification result. In this way, each time series example receives a classification result, which turns a certain filter into a discrete variable. On this basis, we propose an evaluation method that combines entropy with the classification result to evaluate the quality of a certain filter.

Computation of Subset Uncertainty and Evaluation of Filters. After RPFB generates the filters, each filter is scored with the proposed evaluation algorithm. The overall flow is shown in Algorithm 1. First, from the training data we randomly select the same number of instances per category to form a data set \(D_{m}\), avoiding unbalanced samples. For each filter in RPFB, we randomly select half of the instances in \(D_{m}\) as training data \(D_{t}\) and the other half as validation data \(D_{v}\), then pass both through the filter to extract the corresponding features (denoted \(F_t\) and \(F_v\)). Second, we fit the classifier on \(F_t\). When the remaining features \(F_v\) are classified, each category (M categories in total) produces a corresponding classification subset \(D^{'}_{m}\), which may contain instances that truly belong to category m as well as instances that do not. Third, we calculate the uncertainty of each subset \(D^{'}_{m}\) by its entropy. A small entropy means that \(D^{'}_{m}\) contains many instances of the same category, i.e., the feature extracted by the filter can guide the classifier to complete the clustering of the time series. However, the clustering result alone cannot tell whether a filter is really effective: if a subset \(D^{'}_{m}\) contains many instances of one category that do not belong to category m, the feature extracted by the filter is quite bad and misleads the classifier. Therefore, we also consider the classification accuracy as the second characteristic of each subset \(D^{'}_{m}\). In this way, the two important measurements, the clustering effect and the classification accuracy, are both taken into account; they are equally important for evaluating the quality of the feature extracted by a filter. D-RPFB therefore calculates the evaluation value of a certain filter as follows:

$$\begin{aligned} H(D^{'}_{m}) = -\sum _{k=1}^{M}p_{k}\log {p_{k}} \end{aligned}$$
(1)
$$\begin{aligned} Recall_{D^{'}_{m}} = \frac{TP}{TP+FN} \end{aligned}$$
(2)
$$\begin{aligned} E_{\phi _{i}} = \sum _{m=1}^M\left( 1-H(D^{'}_{m})\right) \times Recall_{D^{'}_{m}} \end{aligned}$$
(3)

where \(H(D^{'}_{m})\) is the entropy of the classification subset \(D^{'}_{m}\) produced by filter i, M is the total number of categories, \(p_k\) is the proportion of instances of category k in the classification subset \(D^{'}_{m}\) (the logarithm is taken to base M so that \(H(D^{'}_{m}) \in [0,1]\)), TP is the number of samples of category m classified correctly, \(TP+FN\) is the total number of samples of category m, \(Recall_{D^{'}_{m}}\) is the recall associated with the classification subset \(D^{'}_{m}\), and \(E_{\phi _{i}}\) is the total evaluation value E of the i-th filter.
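A minimal sketch of this per-filter evaluation, assuming integer class labels 0, ..., M-1 and using logistic regression as the auxiliary classifier (the paper does not pin down which classifier Algorithm 1 uses; `evaluate_filter` and its argument names are illustrative):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def evaluate_filter(F_t, y_t, F_v, y_v, n_classes):
    """Score one filter by Eqs. (1)-(3): fit a classifier on the filter's
    training features, then combine subset purity (1 - entropy) with the
    per-class recall on the validation features."""
    clf = LogisticRegression(max_iter=1000).fit(F_t, y_t)
    y_pred = clf.predict(F_v)
    E = 0.0
    for m in range(n_classes):
        subset = y_v[y_pred == m]  # true labels inside classification subset D'_m
        if subset.size == 0:
            continue
        p = np.bincount(subset, minlength=n_classes) / subset.size
        p = p[p > 0]
        H = -np.sum(p * np.log(p)) / np.log(n_classes)  # Eq. (1), base-M logarithm
        in_m = y_v == m
        recall = np.mean(y_pred[in_m] == m) if in_m.any() else 0.0  # Eq. (2)
        E += (1.0 - H) * recall  # Eq. (3)
    return E
```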

4 Experiments

To verify that the proposed D-RPFB can reduce the redundancy of numerous filters while keeping or even improving classification performance, we evaluate it on time series data from three different areas with three traditional classifiers (i.e., LR, SVM, and RF), comparing against RPFB. First, we investigate the effect of the proposed evaluation method for measuring the quality of a specific filter. Then, we show experimental results on two other time series data sets. Finally, we analyze the screening percentage of the filters to empirically decide how many filters should be retained.

4.1 Analyzing the Effect of the Proposed Evaluation on the Star Curve Data Set

The evaluation method in Eq. (3) for measuring the quality of a certain filter plays an important role in our D-RPFB. We first investigate its effect on the Star curve data set [1]. We assess Eq. (3) by answering the question: can we use Eq. (3) to construct three filter banks corresponding to excellent, inferior, and average quality, and observe the corresponding performance on the test set? If so, the proposed evaluation method is considered effective.

Our experimental scheme is as follows. First, a sufficient number of filters are generated to form an initial filter bank. Then, we feed part of the training data and the initial filter bank into the evaluation method to obtain the evaluation value of every filter by Eq. (3). Third, we sort the filters by their evaluation values and divide them into four intervals (i.e., \(0<E<0.25\) for worst, \(0.25<E<0.5\) for worse, \(0.5<E<0.75\) for better, and \(0.75<E<1\) for best). Finally, we construct three filter banks of 200 filters each, corresponding to excellent, inferior, and average distributions, by randomly selecting a specific number of filters from each interval, as sketched below. The corresponding distributions are shown in Fig. 1.
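As an illustration, a bank with a prescribed distribution can be assembled from the evaluation values as follows (the per-interval counts are hypothetical placeholders; the actual counts are those plotted in Fig. 1):

```python
import numpy as np

def assemble_bank(E, counts_per_interval, rng=None):
    """Select filter indices so that their evaluation values follow a target
    histogram over the intervals [0,.25), [.25,.5), [.5,.75), [.75,1]."""
    rng = rng or np.random.default_rng(0)
    interval = np.digitize(E, [0.25, 0.5, 0.75])  # interval index 0..3 per filter
    chosen = []
    for b, n in enumerate(counts_per_interval):   # counts must sum to the bank size
        candidates = np.flatnonzero(interval == b)
        chosen.extend(rng.choice(candidates, size=n, replace=False))
    return np.asarray(chosen)

# Hypothetical "excellent" distribution: most filters drawn from high-E intervals.
# bank = assemble_bank(E, counts_per_interval=(10, 20, 70, 100))
```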

Fig. 1. The number of filters with different evaluation values in the three groups of filter banks.

Fig. 2. Comparison of classification performance among the three filter-bank distributions with three classifiers.

Figure 2 clearly shows the ability of the evaluation method to distinguish high-quality filters from inferior ones. Generally speaking, the classification error of the inferior distribution is far higher than that of the average distribution, and the classification error of the excellent distribution is lower than that of the average distribution on all three classifiers, which shows that the proposed evaluation method can effectively distinguish high-quality from low-quality filters.

4.2 Detection of Bearing Defects

To compare D-RPFB and RPFB, we employ the bearing defect detection data set [5] used by RPFB. We extract 40 time series of length 3333 from each class for filter screening and testing. First, we select 15 time series per category (45 in total over 3 categories) to screen the filters. Next, we generate a set of filter banks, each of which is used by both D-RPFB and RPFB. In RPFB, a filter bank keeps all of its filters, participates in the classification of the time series, and finally produces a classification error rate. In D-RPFB, the filter bank is first screened and then participates in the classification. In this case, if the classification error rate of D-RPFB is no higher than that of RPFB, it verifies that D-RPFB can reduce the number of redundant and even potentially misleading filters while achieving comparable or better performance.
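The screening step itself can be read as a top-ranked selection on the evaluation values; a minimal sketch under that reading (`distil_filters` is an illustrative name):

```python
import numpy as np

def distil_filters(E, keep_ratio=0.75):
    """D-RPFB screening: rank filters by their evaluation value E (Eq. (3))
    and keep only the top keep_ratio fraction, dropping the rest."""
    n_keep = int(round(len(E) * keep_ratio))
    return np.argsort(E)[::-1][:n_keep]  # indices of the retained filters
```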

Fig. 3. Comparison of classification performance between RPFB and D-RPFB with different classifiers on the bearing defect detection data set [5].

In our experiment, we empirically retain 75% of the filters (i.e., we reduce the number of filters in RPFB by 25%) in D-RPFB. As shown in Fig. 3, the error rates of both D-RPFB and RPFB decrease as the number of filters increases. On this data set, SVM provides a lower error rate than LR or RF; this holds for both D-RPFB and RPFB. On the one hand, the error rates of RPFB and D-RPFB are relatively high when the number of filters is small, and D-RPFB is worse than RPFB. This implies that RPFB has a limited ability to summarize the time series when there are only a few filters; meanwhile, the distillation mechanism of D-RPFB removes filters of relatively poor quality and leaves even fewer, which reduces the accuracy of D-RPFB. On the other hand, as the number of filters increases, the error rates of both D-RPFB and RPFB decrease, but D-RPFB declines faster. This is because D-RPFB gradually obtains, through the screening mechanism, the filters that can accurately summarize the time series and removes some filters that would have a misleading effect. In RPFB, because there is no screening mechanism to distinguish redundant and misleading filters, some inefficient filters hinder the classifier from achieving better performance.

Fig. 4. Comparison of classification performance between RPFB and D-RPFB with different classifiers on the heart rate data set [5].

4.3 Heart Rate Classification

To further show that D-RPFB can improve classification performance, we use the heart rate data set [5] employed by RPFB. It contains two time series of length 1800, belonging to categories A and B respectively. We first divide the category-A time series into 30 short time series of length 60, 15 of which form the training set and 15 the test set, and we apply the same operation to the category-B time series. After dividing the two long time series, we obtain 30 training time series (15 of category A and 15 of category B) and 30 test time series (likewise 15 of each category). Then, we generate a set of filter banks, each of which is used by both D-RPFB and RPFB. Finally, as before, RPFB feeds all generated filters to the classifier, while D-RPFB feeds only the screened filters.
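The data preparation amounts to windowing each long series; a minimal sketch, assuming consecutive non-overlapping windows (the paper does not state the windowing explicitly):

```python
import numpy as np

series_a = np.random.randn(1800)  # placeholder for the category-A heart rate series

# Split into 30 non-overlapping windows of length 60; the first 15 windows
# form the training set and the last 15 the test set. The same operation
# is applied to the category-B series.
windows_a = series_a.reshape(30, 60)
train_a, test_a = windows_a[:15], windows_a[15:]
```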

In this experiment, we again empirically retain 75% of the filters (i.e., we reduce the number of filters in RPFB by 25%) in D-RPFB. As shown in Fig. 4, with a small number of filters, the performance of D-RPFB is again inferior to RPFB, which implies that there is no need for distillation when the number of filters is very small. However, as the number of filters increases, most points on the classification error curve of D-RPFB lie below the curve of RPFB, and even the points that do not are not much higher than the original method. That is to say, the numerous filters randomly generated by RPFB are indeed redundant and include some misleading filters. D-RPFB distils the filters obtained by RPFB to remove redundant or misleading ones, obtains a higher-quality filter bank to feed the classifier, and thereby achieves better performance.

Fig. 5. Classification performance obtained with different classifiers under different retainment percentages in D-RPFB.

4.4 Analyzing the Choice of the Screening Percentage of the Filters on the Hand Profile Data Set

How many filters should be kept to obtain a good summary of the input time series remains a question. The results reported above are for the case of 75% retainment (i.e., a screening percentage of 25%). In this section, we analyze the choice of the screening percentage on the Hand profile data set [1]. We first generate 200 filters and then adjust the retained-filter ratio by selecting the highest-ranking filters, obtaining the corresponding results.
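The sweep itself is a simple loop over retainment ratios; a self-contained sketch with placeholder evaluation values and hypothetical sweep points:

```python
import numpy as np

rng = np.random.default_rng(0)
E = rng.uniform(0.0, 1.0, 200)  # placeholder for the 200 filters' evaluation values

for keep_ratio in (0.25, 0.5, 0.75, 0.8, 1.0):  # hypothetical sweep points
    n_keep = int(round(len(E) * keep_ratio))
    retained = np.argsort(E)[::-1][:n_keep]     # top-ranked filters
    # ... classify using only the retained filters and record the error rate ...
```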

As shown in Fig. 5, if the number of retained filters is too small, the features extracted by these filters may not provide a good summary of the input time series, resulting in worse performance. As the retainment percentage increases, performance improves. Combined with the conclusions of Sects. 4.2 and 4.3, this indicates that among these 200 filters there is little redundant or misleading information that could harm performance. We can see that the original time series is already well summarized at 80% retainment (i.e., a screening percentage of 20%), because the benefit from retaining more filters is already very small. Besides, retaining more filters means more running time when combined with a specific classifier. So, in our experiments, we start from 80% retainment, make further adjustments, and finally retain 75% (i.e., a screening percentage of 25%) to obtain better performance.

5 Conclusion

In this paper, we proposed the distillation of the random projection filter bank (D-RPFB) for time series classification, an improvement of the random projection filter bank (RPFB). Before directly applying the features generated by the numerous randomly generated autoregressive filters that are convolved with the input time series, we add a filter screening step to the original method to select the filters that are most capable of guiding the classifier to better performance. We evaluated D-RPFB on time series data from three different areas with three traditional classifiers. Extensive experimental results demonstrate that D-RPFB can remove redundant and even potentially misleading filters, thus improving the quality of the features provided to the classifier, which directly improves the classifier's learning ability and yields better performance.