Introduction

In many real-world applications, a test sample may come from an unknown class that is not present in the training set. Such samples can be regarded as novelties or anomalies with respect to the known classes, since they lie far away from the distributions of the known classes [1, 2]. This problem is termed novelty detection or anomaly detection. When there is more than one known class, it is called multi-class supervised novelty detection [3, 4]. Novelty detection is widely used in the pattern recognition community. For instance, traffic police want to find illegal traffic flows [5], ophthalmologists want to detect retinal damage [6], cyber security experts need to monitor cyber-intrusions among massive visits [7, 8], and engineers need to analyze big data in the Internet of Things (IoT) [9]; further applications include detecting abrupt changes in air temperature [10], unknown pixels in hyperspectral images [11], and medical ultrasound image analysis [12, 13], to name just a few. During the past several decades, work on supervised novelty detection has mainly focused on training sets that contain only one known class [14, 15], which is called one-class classification [16, 17]. A one-class classifier can only tell us whether a test sample is normal or not [18, 19]. When the training set contains more than one known class, one must either treat all known classes as a single superclass [20] or learn several one-class classifiers [21]. In the former case, a multi-class classifier is still needed to tell us which class a test sample comes from if it is not a novelty.

The kernel null Foley–Sammon transform (KNFST) [3] can handle one-class classification as well as multi-class supervised novelty detection. For multi-class supervised novelty detection, KNFST learns a single model that can tell whether a test sample is a novelty and, if it is a normal sample, which class it comes from. KNFST maps the samples of each class to a single point in a reproducing kernel Hilbert space (RKHS) via null projection directions. Let c denote the number of classes. The label of a test sample is then determined by the minimum distance to the c mapped points; if the distances to all mapped points are very large, the sample is a novelty. However, KNFST only captures global information and neglects the local geometrical structure. It may fail when the local geometrical structure is complex, an issue that also arises in classification [22] and ordinal regression [23]. To address this issue, we propose a manifold learning-novelty detection (MLND) method in which a manifold graph is introduced to regularize the within-class and total scatter matrices, respectively. The manifold graph is used to depict the local geometrical structure. The experimental results demonstrate that MLND is superior to KNFST on several toy and benchmark datasets. The main contributions are summarized in the following three points.

  • First, we introduce a manifold into the within-class scatter and the total scatter to depict the local structure within each class for the Foley–Sammon transform (or Fisher discriminant analysis).

  • Second, a new criterion for the projection directions is proposed, which requires the regularized within-class scatter to be zero and the regularized total scatter to be greater than zero in the projected space.

  • Third, the manifold regularized Foley–Sammon transform is used as a novelty detection method and evaluated on several toy and benchmark datasets.

The rest of this paper is organized as follows. A brief review of supervised novelty detection and the kernel null Foley–Sammon transform (KNFST) is provided in “Related work”. A manifold regularized null Foley–Sammon transform (NFST) is proposed in “Manifold regularized NFST”. In “Experiments and simulations”, we evaluate manifold learning-novelty detection (MLND) on two toy datasets, eight benchmark datasets, and the Gestures dataset. The last section is “Discussion and conclusion”.

Related work

A review of supervised novelty detection

Supervised novelty detection predicts whether a test sample comes from an unknown class by learning a model from a training set consisting of a large number of labelled samples. When the labelled samples follow the i.i.d. assumption, they can be regarded as coming from a single class, and supervised novelty detection reduces to a one-class classification problem [2]. For instance, Schölkopf et al. [16] proposed to find a decision hyperplane that maximizes the margin between the samples in a reproducing kernel Hilbert space (RKHS) and the origin; Tax and Duin [17] proposed to find a hypersphere that encloses most of the training samples with minimum volume; Ruff et al. [24] proposed a one-class classifier based on deep learning; and Iosifidis et al. [25] used extreme learning machines for one-class classification, to name just a few.

When the labelled samples come from a mixture of distributions, they belong to several known classes, and the task becomes a multi-class supervised novelty detection problem [3]. Compared with multi-class classification, multi-class supervised novelty detection can identify whether a test sample is from an unknown class and, if it comes from a known class, which class it belongs to. One way to solve multi-class supervised novelty detection is to treat all known classes as a superclass and learn a one-class classifier to detect whether a test sample is from an unknown class. If the test sample is not from an unknown class, a multi-class classifier is then trained to predict which class it comes from [20]. Obviously, this approach needs two models: a one-class classifier and a multi-class classifier. Additionally, the one-class classifier is affected by the complex structure of the superclass. Another way is to train several one-class classifiers, each associated with one known class [21]. Training several models raises many issues, such as longer training time and more parameters to tune. To address these issues, some researchers proposed to learn a single model that can simultaneously identify whether a test sample is from an unknown class and, if it is from a known class, which class it comes from. For instance, Bodesheim et al. [3] proposed a multi-class supervised novelty detection method in which the samples of each class are mapped to a single point in a reproducing kernel Hilbert space (RKHS) via null projection directions (NPDs). Zhang et al. [26] proposed a semi-supervised version of KNFST and used it for person re-identification. Liu et al. [27] proposed a kernel null space discriminant analysis for incremental supervised novelty detection. Huang et al. [28] used an incremental KNFST for person re-identification. Ali and Chaudhuri [29] combined maximum margin metric learning with the null space for supervised novelty detection. What these methods have in common is that they all adopt the null space technique. However, the null space only considers global information and neglects the local geometrical structure. To solve this issue, we propose a manifold learning-based supervised novelty detection method in which the local geometrical structure is depicted by a manifold.

Recap of Kernel null Foley–Sammon transform (KNFST)

The Foley–Sammon transform, also called the Fisher transform or linear discriminant analysis (LDA), maximizes the between-class scatter and minimizes the within-class scatter simultaneously. Let \(\mathbf {X}_j,j=1,\dots ,c\) denote the set of samples belonging to class j; let \(\mathbf {S}_w\), \(\mathbf {S}_b\), and \(\mathbf {S}_t\) denote the within-class, between-class, and total scatter matrices, respectively; and let \(\varvec{\varphi }\in \mathbb {R}^D\) be one direction of the discriminant subspace. The Fisher discriminant criterion is written as follows.

$$\begin{aligned} J(\varvec{\varphi })=\frac{\varvec{\varphi }^T\mathbf {S}_b\varvec{\varphi }}{\varvec{\varphi }^T\mathbf {S}_w\varvec{\varphi }}. \end{aligned}$$
(1)

Maximizing Eq. (1) can be done via solving a generalized eigenvalue problem as follows:

$$\begin{aligned} \mathbf {S}_b\varvec{\varphi }=\lambda \mathbf {S}_w\varvec{\varphi }. \end{aligned}$$
(2)

The eigenvectors \(\varvec{\varphi }_1,\dots ,\varvec{\varphi }_k\) associated with the k largest eigenvalues \(\lambda _1,\dots ,\lambda _k\) are selected as the discriminant directions.
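As a concrete illustration, the following is a minimal NumPy/SciPy sketch of Eq. (2). It assumes a \(D\times N\) data matrix whose columns are samples and a length-N label vector; the small ridge added to \(\mathbf {S}_w\) is only a numerical convenience and is not part of the original formulation.

```python
import numpy as np
from scipy.linalg import eigh

def fisher_directions(X, y, k, ridge=1e-6):
    """Solve the generalized eigenproblem S_b phi = lambda S_w phi (Eq. 2).
    X: D x N data matrix (columns are samples), y: length-N labels."""
    D, N = X.shape
    mu = X.mean(axis=1, keepdims=True)
    Sw = np.zeros((D, D))
    Sb = np.zeros((D, D))
    for c in np.unique(y):
        Xc = X[:, y == c]
        mu_c = Xc.mean(axis=1, keepdims=True)
        Sw += (Xc - mu_c) @ (Xc - mu_c).T
        Sb += Xc.shape[1] * (mu_c - mu) @ (mu_c - mu).T
    # a small ridge keeps S_w positive definite for the generalized solver
    evals, evecs = eigh(Sb, Sw + ridge * np.eye(D))
    order = np.argsort(evals)[::-1]          # largest eigenvalues first
    return evecs[:, order[:k]]               # k discriminant directions
```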

In the null Foley–Sammon transform (NFST), a direction should make the within-class scatter zero and the between-class scatter positive. Therefore, Eq. (1) becomes \(J(\varvec{\varphi })=\infty \), which gives the best separability. The solution of NFST should satisfy

$$\begin{aligned} \varvec{\varphi }^T\mathbf {S}_w\varvec{\varphi }=0 \quad \mathrm{{and}} \quad \varvec{\varphi }^T\mathbf {S}_b\varvec{\varphi }>0. \end{aligned}$$
(3)

The directions satisfying Eq. (3) are called null projection directions (NPDs). Eq. (3) is equivalent to

$$\begin{aligned} \varvec{\varphi }^T\mathbf {S}_w\varvec{\varphi }=0 \quad \mathrm{{and}} \quad \varvec{\varphi }^T\mathbf {S}_t\varvec{\varphi }>0. \end{aligned}$$
(4)

Here, \(\mathbf {S}_t\) is the total scatter, \(\mathbf {S}_t=\mathbf {S}_w+\mathbf {S}_b\). The samples in the same class are mapped to a single point due to \(\varvec{\varphi }^T\mathbf {S}_w\varvec{\varphi }=0\). An illustration of NFST is shown in Fig. 1.

Fig. 1

An illustration of NFST. The stars, x-marks, and pluses belong to class 1, class 2, and class 3, respectively. The left sub-figure shows the original space and the right sub-figure shows the null space. The samples from the same class in the original space are mapped to a single point in the null space

Figure 1 illustrates a three-class problem. Here, c1, c2, and c3 are the mapped points of class 1, class 2, and class 3 in the null space. For a test sample, the associated label is decided by the minimum distance to the points c1, c2, and c3. If the test sample is far away from all of these points, it comes from an unknown class.

In both FST and NFST, the within-class scatter and total scatter only capture global information and neglect the local geometrical structure. In this paper, we adopt a manifold to regularize the within-class scatter and the total scatter in order to describe the local structure within each class.

Manifold regularized NFST

Manifold learning for novelty detection

Manifold learning assumes that if two data points are close in the original distribution, they are also close in the projected subspace. In this paper, we use neighborhood preserving embedding (NPE) to describe the local geometrical structure in the within-class scatter and the total scatter. We then propose a regularized within-class scatter and a regularized total scatter to replace the within-class scatter and the total scatter, respectively.

Definition 1

(Regularized within-class scatter) Given a dataset \(\mathbf {X}\in \mathbb {R}^{N\times D}\) with associated labels \(\mathbf {Y}\) (\(y_i\in \{1,\dots ,c\}\)), let \(\mathbf {X}_j\) consist of all samples belonging to class j. The regularized within-class scatter is defined as

$$\begin{aligned} \mathbf {S}_\mathrm{{wreg}}&=\sum \limits _{i=1}^N\left( \alpha \left( \mathbf {x}_i-\varvec{\mu }_j \right) +\left( 1-\alpha \right) \left( \mathbf {x}_i-\sum \limits _{p=1}^N W_{i,p}\mathbf {x}_p\right) \right) \nonumber \\&\quad \times \left( \alpha \left( \mathbf {x}_i-\varvec{\mu }_j \right) +\left( 1-\alpha \right) \left( \mathbf {x}_i-\sum \limits _{p=1}^N W_{i,p}\mathbf {x}_p\right) \right) ^T. \end{aligned}$$
(5)

Here, \(\varvec{\mu }_j\) is the mean of the class that \(\mathbf {x}_i\) belongs to, and \(\mathbf {W}\) is an adjacency graph. If \(\mathbf {x}_p\) is one of the k-nearest neighbors of \(\mathbf {x}_i\) and has the same label as \(\mathbf {x}_i\), there is an edge between \(\mathbf {x}_i\) and \(\mathbf {x}_p\) (\(W_{i,p}\ne 0\)); otherwise, \(W_{i,p}=0\).

Eq. (5) can be rewritten as follows:

$$\begin{aligned} \mathbf {S}_\mathrm{{wreg}}&=\sum \limits _{i=1}^N\left( \mathbf {x}_i-\alpha \dfrac{1}{N_j}\sum \limits _{\mathbf {x}_p\in \mathbf {X}_j}\mathbf {x}_p - \left( 1-\alpha \right) \sum \limits _{p=1}^N W_{i,p}\mathbf {x}_p\right) \nonumber \\&\quad \times \left( \mathbf {x}_i-\alpha \dfrac{1}{N_j}\sum \limits _{\mathbf {x}_p\in \mathbf {X}_j}\mathbf {x}_p - \left( 1-\alpha \right) \sum \limits _{p=1}^N W_{i,p}\mathbf {x}_p\right) ^T. \end{aligned}$$
(6)

Let \(\mathbf {I}\) be an \(N\times N\) identity matrix and \(\mathbf {L}\) be a block diagonal matrix whose j-th block has size \(N_j\times N_j\) with all elements equal to \(\dfrac{1}{N_j}\). Then \(\mathbf {S}_\mathrm{{wreg}}=\mathbf {X}\left( \alpha \left( \mathbf {I}-\mathbf {L}\right) +\left( 1-\alpha \right) \left( \mathbf {I}-\mathbf {W}\right) \right) \left( \alpha \left( \mathbf {I}-\mathbf {L}\right) +\left( 1-\alpha \right) \left( \mathbf {I}-\mathbf {W}\right) \right) ^T\mathbf {X}^T=\mathbf {X}\left( \mathbf {I}-\left( \alpha \mathbf {L}+\left( 1-\alpha \right) \mathbf {W}\right) \right) \left( \mathbf {I}-\left( \alpha \mathbf {L}+\left( 1-\alpha \right) \mathbf {W}\right) \right) ^T\mathbf {X}^T\) holds.

Let \(\mathbf {X}_w=\mathbf {X}\left( \mathbf {I}-\left( \alpha \mathbf {L}+\left( 1-\alpha \right) \mathbf {W}\right) \right) \). Then, the regularized within-class scatter is rewritten as \(\mathbf {S}_\mathrm{{wreg}}=\mathbf {X}_w\mathbf {X}_w^T\).
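For illustration, here is a minimal NumPy sketch of the matrix form above. It assumes the columns-as-samples convention \(\mathbf {X}\in \mathbb {R}^{D\times N}\) (so that \(\mathbf {X}_w\mathbf {X}_w^T\) is \(D\times D\)) and that row i of the weight matrix \(\mathbf {W}\) holds the reconstruction weights of \(\mathbf {x}_i\), hence the transpose on \(\mathbf {W}\).

```python
import numpy as np

def regularized_within_scatter(X, y, W, alpha):
    """S_wreg = X_w X_w^T with X_w = X (I - (alpha*L + (1-alpha)*W)), cf. Definition 1.
    X: D x N matrix (columns are samples), y: length-N labels,
    W: N x N NPE weight matrix whose row i reconstructs x_i."""
    D, N = X.shape
    L = np.zeros((N, N))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        L[np.ix_(idx, idx)] = 1.0 / len(idx)         # entries 1/N_j within class j
    M = np.eye(N) - (alpha * L + (1 - alpha) * W.T)  # W.T matches sum_p W[i,p] x_p
    Xw = X @ M
    return Xw @ Xw.T, Xw
```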

The weights are computed by minimizing the following objective function.

$$\begin{aligned} \min \quad \left\| \mathbf {x}_i-\sum \limits _{j=1}^N W_{i,j}\mathbf {x}_j\right\| ^2 \quad \mathrm{{s.t.}} \quad \sum \limits _{j=1}^N W_{i,j}=1,\quad i=1,\dots ,N. \end{aligned}$$
(7)

The term \( \sum \nolimits _{\mathbf {x}_j\in KNN(\mathbf {x}_i),\, c(\mathbf {x}_j)= c(\mathbf {x}_i)} W_{i,j}\mathbf {x}_j\) can be regarded as the weighted mean of the k-nearest neighbors of \(\mathbf {x}_i\). The details of solving Eq. (7) can be found in [30].
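A minimal sketch of this weight computation is given below, following the standard constrained least-squares solution of the local reconstruction problem described in [30]. The same-class k-nearest-neighbor restriction and the columns-as-samples convention are carried over from the sketch above; the small regularization of the local Gram matrix is a common numerical safeguard and is not part of Eq. (7).

```python
import numpy as np

def npe_weights(X, y, k, reg=1e-3):
    """Reconstruction weights of Eq. (7): each sample is approximated by its
    k nearest neighbors of the same class, with weights summing to one.
    X: D x N (columns are samples); returns the N x N matrix W."""
    D, N = X.shape
    W = np.zeros((N, N))
    for i in range(N):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        d = np.linalg.norm(X[:, same] - X[:, [i]], axis=0)
        nbrs = same[np.argsort(d)[:k]]
        Z = X[:, nbrs] - X[:, [i]]                    # shift neighbors to the origin
        G = Z.T @ Z
        G += reg * np.trace(G) * np.eye(len(nbrs))    # regularize the local Gram matrix
        w = np.linalg.solve(G, np.ones(len(nbrs)))
        W[i, nbrs] = w / w.sum()                      # enforce the sum-to-one constraint
    return W
```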

Definition 2

(Regularized total scatter) Given a dataset \(\mathbf {X}\in \mathbb {R}^{N\times D}\). The associated label is \(\mathbf {Y}\) (\(y_i\in \{1,\dots ,c\}\)). \(\mathbf {X}_j\) consists of all samples belonging to class j. The regularized total scatter is defined as

$$\begin{aligned}&\mathbf {S}_\mathrm{{treg}}=\sum \limits _{i=1}^N\left( \beta \left( \mathbf {x}_i-\varvec{\mu }\right) +\left( 1-\beta \right) \left( \mathbf {x}_i-\varvec{\mu }'\right) \right) \nonumber \\&\quad \left( \beta \left( \mathbf {x}_i-\varvec{\mu }\right) +\left( 1-\beta \right) \left( \mathbf {x}_i-\varvec{\mu }'\right) \right) ^T \end{aligned}$$
(8)

Here, \(\varvec{\mu }\) is the mean of all training samples, and \(\varvec{\mu }'\) and \(\varvec{\mu }'_j\) are defined in Eqs. (9) and (10), respectively.

$$\begin{aligned}&\varvec{\mu }'=\frac{1}{N_c}\sum \limits _{j=1}^{N_c}\varvec{\mu }'_j. \end{aligned}$$
(9)
$$\begin{aligned}&\varvec{\mu }'_j=\frac{1}{N_j}\sum \limits _{\mathbf {x}_i \in C_j}\sum \limits _{\mathbf {x}_h\in KNN(\mathbf {x}_i)} W_{i,h}\mathbf {x}_h. \end{aligned}$$
(10)

Eq. (8) can be rewritten as follows:

$$\begin{aligned}&\mathbf {S}_\mathrm{{treg}}=\sum \limits _{i=1}^N\left( \mathbf {x}_i-\left( \beta \varvec{\mu }+\left( 1-\beta \right) \varvec{\mu }'\right) \right) \nonumber \\&\quad \left( \mathbf {x}_i-\left( \beta \varvec{\mu }+\left( 1-\beta \right) \varvec{\mu }'\right) \right) ^T \end{aligned}$$
(11)
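The following is a minimal NumPy sketch of Eqs. (8)–(11), under the same columns-as-samples and row-weight conventions as the earlier sketches; the class-wise averaging of Eqs. (9) and (10) is implemented literally.

```python
import numpy as np

def regularized_total_scatter(X, y, W, beta):
    """S_treg of Eq. (11): centre every sample on beta*mu + (1-beta)*mu',
    where mu is the global mean and mu' averages the per-class means of the
    NPE-reconstructed samples (Eqs. 9 and 10). X: D x N, W: N x N weights."""
    D, N = X.shape
    mu = X.mean(axis=1)
    recon = X @ W.T                               # column i holds sum_h W[i,h] x_h
    class_means = [recon[:, y == c].mean(axis=1) for c in np.unique(y)]
    mu_prime = np.mean(class_means, axis=0)       # Eq. (9)
    centre = beta * mu + (1 - beta) * mu_prime
    Xt = X - centre[:, None]                      # Eq. (11) in matrix form
    return Xt @ Xt.T, Xt
```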

The NPDs for regularized within-class scatter and regularized total scatter satisfy the following conditions.

$$\begin{aligned} \varvec{\varphi }^T\mathbf {S}_\mathrm{{wreg}}\varvec{\varphi }=0 \quad \mathrm{{and}} \quad \varvec{\varphi }^T\mathbf {S}_\mathrm{{treg}}\varvec{\varphi }>0. \end{aligned}$$
(12)

Since \(\mathbf {S}_\mathrm{{wreg}}\) is a positive semidefinite matrix and can be represented as \(\mathbf {S}_\mathrm{{wreg}}=\mathbf {X}_w\mathbf {X}_w^T\), we obtain \({{\varvec{\varphi }}^{T}}{{\mathbf {X}}_{w}}\mathbf {X}_{w}^{T}\varvec{\varphi }=0\Rightarrow {{({{\mathbf {X}_w}^{T}}\varvec{\varphi })}^{T}}({{\mathbf {X}_w}^{T}}\varvec{\varphi })=0\Rightarrow {{\mathbf {X}_w}^{T}}\varvec{\varphi }=0\). Multiplying both sides on the left by \({{\mathbf {X}_w}}\) gives \({{\mathbf {X}}_{w}}{{\mathbf {X}}_{w}}^{T}\varvec{\varphi }=0\Rightarrow \mathbf {S}{_\mathrm{{wreg}}}\varvec{\varphi }=0\).

Conversely, any solution of \({{\mathbf {S}}_\mathrm{{wreg}}}\varvec{\varphi }=0\) clearly satisfies \({{\varvec{\varphi }}^{T}}\mathbf {S}{_\mathrm{{wreg}}}\varvec{\varphi }=0\), i.e., \(\mathbf {S}{_\mathrm{{wreg}}}\varvec{\varphi }=0\Rightarrow {{\varvec{\varphi }}^{T}}\mathbf {S}{_\mathrm{{wreg}}}\varvec{\varphi }=0\).

Combining the two directions, we obtain \({{\varvec{\varphi }}^{T}}\mathbf {S}{_\mathrm{{wreg}}}\varvec{\varphi }=0\Leftrightarrow \mathbf {S}{_\mathrm{{wreg}}}\varvec{\varphi }=0\).

Let \(\mathbf {Z}_w=\left\{ \mathbf {z}\vert \mathbf {S}_\mathrm{{wreg}}\mathbf {z}=0\right\} \) be the null space of \(\mathbf {S}_\mathrm{{wreg}}\), \(\mathbf {Z}_t=\left\{ \mathbf {z}\vert \mathbf {S}_\mathrm{{treg}}\mathbf {z}=0\right\} \) be the null space of \(\mathbf {S}_\mathrm{{treg}}\), and \(\mathbf {Z}_t^\perp \) be the orthogonal complement space of \(\mathbf {Z}_t\). The NPDs satisfy

$$\begin{aligned} \varvec{\varphi }\in \mathbf {Z}_t^\perp \cap \mathbf {Z}_w. \end{aligned}$$
(13)

In order to ensure \(\varvec{\varphi }\in \mathbf {Z}_t^\perp \) , each \(\varvec{\varphi }\) can be represented as

$$\begin{aligned} \varvec{\varphi }=\gamma _1\varvec{\theta }_1+\gamma _2\varvec{\theta }_2+\dots +\gamma _m\varvec{\theta }_m=\mathbf {Q}\varvec{\gamma }. \end{aligned}$$
(14)

Here, \(\mathbf {Q}=(\varvec{\theta }_1,\varvec{\theta }_2,\dots ,\varvec{\theta }_m)\) and \(\varvec{\gamma }=(\gamma _1,\gamma _2,\dots ,\gamma _m)\). Since \(\mathbf {S}_\mathrm{{treg}}\) is positive semidefinite, the set of directions satisfying \(\varvec{\varphi }^T\mathbf {S}_\mathrm{{treg}}\varvec{\varphi }>0\) is exactly \(\mathbf {Z}_t^\perp \) (\(\mathbf {Z}_t=\left\{ \mathbf {z}\vert \mathbf {S}_\mathrm{{treg}}\mathbf {z}=0\right\} \)). Let \(\varvec{\theta }_1,\varvec{\theta }_2,\dots ,\varvec{\theta }_m\) be a basis of the subspace spanned by \(\mathbf {x}_1-(\beta \varvec{\mu }+(1-\beta )\varvec{\mu }'),\dots ,\mathbf {x}_N-(\beta \varvec{\mu }+(1-\beta )\varvec{\mu }')\), which can be obtained by principal component analysis (PCA).

Substituting Eq. (14) into \(\varvec{\varphi }^T\mathbf {S}_\mathrm{{wreg}}\varvec{\varphi }=0\) yields \((\mathbf {Q\gamma })^T\mathbf {S}_\mathrm{{wreg}}(\mathbf {Q\gamma })=0\). It is equivalent to the following eigenproblem.

$$\begin{aligned} \left( \mathbf {Q}^T\mathbf {S}_\mathrm{{wreg}}\mathbf {Q}\right) \varvec{\gamma }=0. \end{aligned}$$
(15)

Due to \(\mathbf {X}_w=\mathbf {X}\left( \mathbf {I}-\left( \alpha \mathbf {L}+\left( 1-\alpha \right) \mathbf {W}\right) \right) \) and \(\mathbf {S}_\mathrm{{wreg}}=\mathbf {X}_w\mathbf {X}_w^T\), Eq. (15) can be rewritten as follows.

$$\begin{aligned} \mathbf {H}\mathbf {H}^T\varvec{\gamma }=0. \end{aligned}$$
(16)

Here, \(\mathbf {H}=\mathbf {Q}^T\mathbf {X}_w=\mathbf {Q}^T\mathbf {X}(\mathbf {I}-(\alpha \mathbf {L}+(1-\alpha )\mathbf {W}))\). Eq. (16) is an eigenproblem. After solving Eq. (16) for \(\varvec{\gamma }_1,\varvec{\gamma }_2,\dots ,\varvec{\gamma }_l\), the null projection directions \(\varvec{\varphi }_1,\varvec{\varphi }_2,\dots ,\varvec{\varphi }_l\) can be obtained from Eq. (14).

The vectors \(\varvec{\gamma }_1,\varvec{\gamma }_2,\dots ,\varvec{\gamma }_l\) are associated with different eigenvalues \(\lambda _1,\lambda _2,\dots ,\lambda _l\) (\(\lambda _i\ne \lambda _j\)). From Eq. (16), we can obtain \(\varvec{\gamma }_i^T\mathbf {H}\mathbf {H}^T=\lambda _i\varvec{\gamma }_i^T,\varvec{\gamma }_j^T\mathbf {H}\mathbf {H}^T=\lambda _j\varvec{\gamma }_j^T\Rightarrow \varvec{\gamma }_i^T\mathbf {H}\mathbf {H}^T\varvec{\gamma }_j=\lambda _i\varvec{\gamma }_i^T\varvec{\gamma }_j,\varvec{\gamma }_j^T\mathbf {H}\mathbf {H}^T\varvec{\gamma }_i=\lambda _j\varvec{\gamma }_j^T\varvec{\gamma }_i \Rightarrow 0=\left( \lambda _i-\lambda _j\right) \varvec{\gamma }_i^T\varvec{\gamma }_j\). Due to \(\lambda _i\ne \lambda _j\), \(\varvec{\gamma }_i^T\varvec{\gamma }_j=0\) holds. Therefore, \(\varvec{\gamma }_i\) and \(\varvec{\gamma }_j\) are orthogonal.

In Eq. (14), the matrix \(\mathbf {Q}\) is obtained from PCA, so its column vectors are orthonormal. We can obtain \(\varvec{\varphi }_i^T\varvec{\varphi }_j=\left( \mathbf {Q}\varvec{\gamma }_i\right) ^T\left( \mathbf {Q}\varvec{\gamma }_j\right) =\varvec{\gamma }_i^T\mathbf {Q}^T\mathbf {Q}\varvec{\gamma }_j=\varvec{\gamma }_i^T\varvec{\gamma }_j=0\). Therefore, the directions obtained from Eq. (14) are orthogonal as well.

Let \(\mathbf {P}\) be a matrix whose columns are null projection directions \(\varvec{\varphi }_1,\dots ,\varvec{\varphi }_l\) (\(l<N\)). A test sample \(\mathbf {x}\) is mapped into null space via Eq. (17) and scored via Eq. (18).

$$\begin{aligned}&\mathbf {x}^{\star }=\mathbf {P}^T\mathbf {x}. \end{aligned}$$
(17)
$$\begin{aligned}&\mathrm{{Score}}(\mathbf {x})=\min \limits _{1\le j \le c} \mathrm{{dist}}(\mathbf {x}^{\star },\mathbf {t}_j). \end{aligned}$$
(18)

Here, \(\mathbf {t}_j\) is the mapped point of class j in the null space. The \(\mathrm{{Score}}(\mathbf {x})\) is the novelty score of \(\mathbf {x}\), which reflects how likely the test sample is to come from an unknown class. When it is very large, \(\mathbf {x}\) is a novelty with high probability.

The procedure for finding the NPDs is summarized in Algorithm 1.

[Algorithm 1]
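As an informal companion to Algorithm 1, the sketch below strings together the illustrative helpers defined earlier (npe_weights, regularized_within_scatter, regularized_total_scatter); it is one reading of Eqs. (12)–(18) under the stated conventions, not the authors' Matlab implementation.

```python
import numpy as np

def mlnd_npds(X, y, W, alpha, beta, tol=1e-10):
    """Null projection directions satisfying Eq. (12), via Eqs. (14)-(16)."""
    _, Xw = regularized_within_scatter(X, y, W, alpha)
    _, Xt = regularized_total_scatter(X, y, W, beta)
    # basis Q of Z_t^perp: non-null directions of the centred data (PCA step)
    U, s, _ = np.linalg.svd(Xt, full_matrices=False)
    Q = U[:, s > tol * s.max()]
    # null space of Q^T S_wreg Q, i.e. zero-eigenvalue eigenvectors of H H^T (Eq. 16)
    H = Q.T @ Xw
    evals, evecs = np.linalg.eigh(H @ H.T)
    Gamma = evecs[:, evals < tol * max(evals.max(), 1.0)]
    return Q @ Gamma                              # Eq. (14): phi = Q gamma

def novelty_score(P, X_train, y_train, x):
    """Eqs. (17)-(18): project the test sample and take the minimum distance
    to the per-class target points t_j."""
    targets = [(P.T @ X_train[:, y_train == c]).mean(axis=1) for c in np.unique(y_train)]
    z = P.T @ x
    return min(np.linalg.norm(z - t) for t in targets)
```

In such a sketch, one would first compute W = npe_weights(X, y, k) and then P = mlnd_npds(X, y, W, alpha, beta) before scoring test samples.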

Compared with the null Foley–Sammon transform (NFST), the extra cost of MLND comes from solving Eq. (7) and calculating the regularized within-class scatter and the regularized total scatter. The remaining cost of MLND is the same as that of NFST.

Kernel form manifold learning for novelty detection

MLND assumes that \(\mathbf {S}_\mathrm{{wreg}}\) is singular. When \(\mathbf {S}_\mathrm{{wreg}}\) has full rank, the samples are mapped into a reproducing kernel Hilbert space (RKHS) via the kernel trick to avoid the null space \(\mathbf {Z}_w\) being empty. The mapping of a sample \(\mathbf {x}\) in the RKHS is represented as \(\varPhi (\mathbf {x})\), where \(\varPhi \) is an implicit function. The inner product of the mappings of two samples can be calculated via a kernel function, defined as \(k\left( \mathbf {x}_i,\mathbf {x}_j\right) =\langle \varPhi (\mathbf {x}_i),\varPhi (\mathbf {x}_j)\rangle \), such as the Radial Basis Function (RBF) kernel \(k\left( \mathbf {x}_i,\mathbf {x}_j\right) =\exp \left( -\frac{\Vert \mathbf {x}_i-\mathbf {x}_j\Vert ^2}{2\sigma ^2}\right) \). Obviously, \(\mathbf {S}_\mathrm{{wreg}}\) is then defined in a high-dimensional feature space and is no longer a \(D\times D\) matrix; for instance, it becomes an infinite-dimensional operator when the RBF kernel is adopted.

Let \(\widetilde{\varPhi }(\mathbf {x}_i)=\varPhi (\mathbf {x}_i)-\Big (\frac{\beta }{N}\sum \nolimits _{j=1}^N\varPhi (\mathbf {x}_j)+\frac{1-\beta }{N}\sum \nolimits _{j=1}^N\sum \nolimits _{\mathbf {x}_h\in KNN(\mathbf {x}_j)}W_{j,h}\varPhi (\mathbf {x}_h)\Big )\), \(\widetilde{\mathbf {K}}=(\mathbf {I}-(\beta \mathbf {1}_N+(1-\beta )\mathbf {1}_N\mathbf {W}))\mathbf {K}(\mathbf {I}-(\beta \mathbf {1}_N+(1-\beta )\mathbf {1}_N\mathbf {W}))^T\), \(\widetilde{\mathbf {X}}=[\varPhi (\mathbf {x}_1),\dots ,\varPhi (\mathbf {x}_N)]\), and \(\mathbf {K}\) be the kernel matrix with \(K(i,j)=\langle \varPhi (\mathbf {x}_i),\varPhi (\mathbf {x}_j)\rangle \). Then, \(\mathbf {S}_\mathrm{{treg}}\) in the RKHS is rewritten as follows

$$\begin{aligned} \mathbf {S}_\mathrm{{treg}}=\sum _{i=1}^N\widetilde{\varPhi }(\mathbf {x}_i)\widetilde{\varPhi }(\mathbf {x}_i)^T. \end{aligned}$$
(19)

The eigenvector \(\varvec{\theta }_j\) in the high-dimensional feature space lies in the span of \(\widetilde{\varPhi }(\mathbf {x}_1),\dots ,\widetilde{\varPhi }(\mathbf {x}_N)\), and there exist coefficients \(\delta _{1,j},\dots ,\delta _{N,j}\) satisfying the following equation.

$$\begin{aligned} \varvec{\theta }_j=\sum _{i=1}^N\delta _{i,j}\widetilde{\varPhi }(\mathbf {x}_i). \end{aligned}$$
(20)

The eigenvalues and eigenvectors of \(\mathbf {S}_\mathrm{{treg}}\) satisfy

$$\begin{aligned} \lambda _j\varvec{\theta }_j=\mathbf {S}_\mathrm{{treg}}\varvec{\theta }_j. \end{aligned}$$
(21)

Then,

$$\begin{aligned} \lambda _j\widetilde{\varPhi }(\mathbf {x}_i)^T\varvec{\theta }_j=\widetilde{\varPhi }(\mathbf {x}_i)^T\mathbf {S}_\mathrm{{treg}}\varvec{\theta }_j,\quad i=1,\dots ,N. \end{aligned}$$
(22)

Substituting Eqs. (19) and (20) into Eq. (22), we can obtain

$$\begin{aligned} \lambda _j\widetilde{\mathbf {K}}\varvec{\delta }_j=\widetilde{\mathbf {K}}\widetilde{\mathbf {K}}\varvec{\delta }_j. \end{aligned}$$
(23)

Here, \(\varvec{\delta }_j\) is the vector form of \(\delta _{1,j},\dots ,\delta _{N,j}\) and can be obtained by solving the following eigenvalue problem

$$\begin{aligned} \lambda _j\varvec{\delta }_j=\widetilde{\mathbf {K}}\varvec{\delta }_j. \end{aligned}$$
(24)

Since \(\langle \varvec{\theta }_j,\varvec{\theta }_j\rangle =\varvec{\delta }_j^T\widetilde{\mathbf {K}}\varvec{\delta }_j=\lambda _j\langle \varvec{\delta }_j,\varvec{\delta }_j\rangle \), the orthonormal basis of \(\mathbf {S}_\mathrm{{treg}}\) in high dimensional space is represented as follows:

$$\begin{aligned} \widetilde{\varvec{\theta }}_j=\sum \limits _{i=1}^N\widetilde{\delta }_{i,j}\widetilde{\varPhi }(\mathbf {x}_i). \end{aligned}$$
(25)

Here \(\widetilde{\delta }_{i,j}=\lambda _j^{-\frac{1}{2}}\delta _{i,j}\), so Eq. (25) can be evaluated implicitly through the kernel function. By introducing Eq. (25) and inner products in the reproducing kernel Hilbert space (RKHS), the matrix \(\mathbf {H}\) is rewritten as follows.

$$\begin{aligned}&\mathbf {H}=\left( (\mathbf {I}-(\beta \mathbf {1}_N+(1-\beta )\mathbf {1}_N\mathbf {W}))\widetilde{\mathbf {V}}\right) ^T \nonumber \\&\quad \mathbf {K}\left( \alpha \left( \mathbf {I}-\mathbf {L} \right) +\left( 1-\alpha \right) \left( \mathbf {I}-\mathbf {W}\right) \right) . \end{aligned}$$
(26)

Here, \(\widetilde{\mathbf {V}}=(\varvec{\theta }_1,\dots ,\varvec{\theta }_l)\). Then, substituting Eq. (26) into Eq. (16), we can obtain \(\varvec{\gamma }_j\) in the RKHS. The final null projection directions in the RKHS are obtained as

$$\begin{aligned} \varvec{\varphi }_j=\left( (\mathbf {I}-\mathbf {1}_N)\widetilde{\mathbf {V}}\right) \varvec{\gamma }_j. \end{aligned}$$
(27)

Let \(\mathbf {P}=[\varvec{\varphi }_1,\dots ,\varvec{\varphi }_l]\). In kernel MLND, a test sample \(\mathbf {x}_{\star }\) is mapped into the null space through \(\mathbf {K}_{\star }^T\mathbf {P}\), where \(\mathbf {K}_{\star }\) is the vector \([k(\mathbf {x}_1,\mathbf {x}_{\star });\dots ;k(\mathbf {x}_N,\mathbf {x}_{\star })]\). The novelty score of \(\mathbf {x}_{\star }\) is the minimum distance between its mapped point and the mapped point of each class. The procedure of kernel MLND is summarized in Algorithm 2.

[Algorithm 2]
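The sketch below is one possible reading of Algorithm 2 in NumPy. It assumes that \(\mathbf {1}_N\) denotes the \(N\times N\) matrix with all entries \(1/N\), that \(\widetilde{\mathbf {V}}\) is represented by the coefficient vectors \(\widetilde{\varvec{\delta }}_j\) of Eq. (25), that the centring of Eq. (26) is reused in Eq. (27), and that row i of \(\mathbf {W}\) stores the weights reconstructing \(\mathbf {x}_i\); none of these conventions is spelled out explicitly in the text, so treat the code as illustrative rather than as the authors' implementation.

```python
import numpy as np

def kernel_basis_coefficients(K_tilde, tol=1e-10):
    """Eqs. (24)-(25): eigendecompose the centred kernel matrix and rescale the
    coefficients by lambda^{-1/2} so the implicit basis vectors are orthonormal."""
    evals, evecs = np.linalg.eigh(K_tilde)
    keep = evals > tol * evals.max()                           # drop (near-)zero eigenvalues
    return evecs[:, keep] / np.sqrt(evals[keep])

def kernel_mlnd_fit(K, y, W, alpha, beta, tol=1e-10):
    """Kernel MLND training: returns the coefficient matrix P of the null
    projection directions and the per-class target points in the null space."""
    N = K.shape[0]
    ones = np.full((N, N), 1.0 / N)
    Ct = np.eye(N) - (beta * ones + (1 - beta) * ones @ W.T)   # total-scatter centring
    L = np.zeros((N, N))
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        L[np.ix_(idx, idx)] = 1.0 / len(idx)
    Cw = np.eye(N) - (alpha * L + (1 - alpha) * W.T)           # within-scatter centring
    K_tilde = Ct @ K @ Ct.T
    V = kernel_basis_coefficients(K_tilde, tol)                # basis of Z_t^perp
    H = (Ct @ V).T @ K @ Cw                                    # Eq. (26)
    evals, evecs = np.linalg.eigh(H @ H.T)
    Gamma = evecs[:, evals < tol * max(evals.max(), 1.0)]      # null space of H H^T
    P = (Ct @ V) @ Gamma                                       # Eq. (27), coefficient form
    targets = {c: (K[:, y == c].T @ P).mean(axis=0) for c in np.unique(y)}
    return P, targets

def kernel_mlnd_score(P, targets, K_star):
    """Map a test sample via K_star^T P and return the minimum distance to a class."""
    z = K_star @ P
    return min(np.linalg.norm(z - t) for t in targets.values())
```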

When the parameters are set to \(\alpha =1\) and \(\beta =1\), kernel MLND degenerates to KNFST. When \(\alpha =0\) and \(\beta =0\), \(\mathbf {S}_\mathrm{{wreg}}\) is determined solely by the LLE-style manifold term. Compared with the kernel null Foley–Sammon transform (KNFST), the extra cost of MLND comes from solving Eq. (7); the time complexity of the remaining part is the same as that of KNFST.

Experiments and simulations

In this section, manifold learning novelty detection (MLND) is evaluated on several datasets. Here, we use kernel MLND. The code of MLND is implemented in Matlab 2018b. To verify the validity of MLND, we compare it with several state-of-the-art null space methods, including KNFST [3], Local KNFST [31], and NK3ML [29]. The codes of KNFST, Local KNFST, and NK3ML are provided by their authors.

The generalized histogram intersection kernel (HIK) is used as the kernel function in KNFST, Local KNFST, and NK3ML. For a fair comparison, MLND also adopts the HIK, which is defined as \(k(\mathbf {x}_i,\mathbf {x}_j)=\exp (2\kappa _\mathrm{{HIK}}(\mathbf {x}_i,\mathbf {x}_j)-\kappa _\mathrm{{HIK}}(\mathbf {x}_i,\mathbf {x}_i)-\kappa _\mathrm{{HIK}}(\mathbf {x}_j,\mathbf {x}_j))\), where \(\kappa _\mathrm{{HIK}}(\mathbf {x}_i,\mathbf {x}_j)=\sum \limits _{d=1}^D\min (x_{i,d},x_{j,d})\).
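A minimal NumPy sketch of this generalized HIK is given below; it assumes nonnegative (histogram-like) features, under which \(\kappa _\mathrm{{HIK}}(\mathbf {x},\mathbf {x})\) reduces to the sum of the entries of \(\mathbf {x}\).

```python
import numpy as np

def hik(A, B):
    """Histogram intersection kernel kappa(x, y) = sum_d min(x_d, y_d).
    A: D x n and B: D x m matrices (columns are samples); returns n x m."""
    return np.array([[np.minimum(A[:, i], B[:, j]).sum()
                      for j in range(B.shape[1])] for i in range(A.shape[1])])

def generalized_hik(A, B):
    """Generalized HIK: k(x, y) = exp(2*kappa(x, y) - kappa(x, x) - kappa(y, y))."""
    K = hik(A, B)
    kxx = A.sum(axis=0)                     # kappa(x, x) for nonnegative features
    kyy = B.sum(axis=0)
    return np.exp(2 * K - kxx[:, None] - kyy[None, :])
```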

First, we adopt an EMG dataset to demonstrate the effectiveness of MLND on real data; then, two toy datasets are used to further evaluate MLND; lastly, several benchmark datasets collected from the UCI repository or the LIBSVM website [32] are adopted for further evaluation. The experimental results are reported in terms of AUC value, ROC curve, and accuracy. The AUC value and ROC curve are used to evaluate the novelty detection methods: the higher the AUC value, the better the novelty detector. Accuracy is defined as the ratio of correctly predicted normal samples to all normal samples. It is used to measure the classification performance of multi-class supervised novelty detection on normal samples.

Table 1 The experimental results of gestures recognition

Experiments on EMG dataset

In this section, we use Gestures, an electromyogram (EMG) signal dataset, to verify MLND. The signals are collected via a MYO Thalmic bracelet worn on the user's forearm. The bracelet is equipped with eight sensors that collect myographic signals simultaneously. The raw signals are from 36 subjects, and each subject performs two series. Each series contains six or seven basic gestures: hand at rest, hand clenched in a fist, wrist flexion, wrist extension, radial deviations, ulnar deviations, and extended palm. In this experiment, we only consider the former six gestures, since the extended palm is not performed by some subjects. An illustration of the signals of the former six gestures is shown in Fig. 2. The horizontal axis label indicates the channel from which the signal is collected; each channel is associated with one sensor of the MYO Thalmic bracelet.

Fig. 2

The signals of hand gestures. In the first row, the signals correspond to hand at rest, hand clenched in a fist, and wrist flexion, respectively. In the second row, the signals correspond to wrist extension, radial deviations, and ulnar deviations, respectively. Channels ch1 to ch8 are associated with the eight sensors of the MYO Thalmic bracelet

Different from previous gesture recognition work [33], this paper casts gesture recognition as a multi-class supervised novelty detection problem in order to identify unknown gestures. Besides the seven basic gestures, some of the signals are not marked as any basic gesture. In this section, we use hand at rest, hand clenched in a fist, wrist flexion, wrist extension, radial deviations, and ulnar deviations as the normal classes. The extended palm and the unmarked signals are used as anomalies. Therefore, the task of gesture recognition becomes recognizing whether an EMG signal comes from a basic hand gesture and, if so, which basic hand gesture it belongs to. Obviously, this is a multi-class supervised novelty detection problem. Gesture recognition is widely used in robot control [34, 35] and traffic control [36, 37].

We use a 200 ms sampling window, and adjacent windows overlap with a 100 ms step. We then generate 30,240 normal samples (5,040 samples per class) and 10,000 abnormal samples as novelties. The normal samples are divided equally into two parts: one part is used as the training set, and the other part together with the abnormal samples is used as the test set. The features from the eight channels are reorganized into an \(800\times 1\) vector.

In MLND, we directly set the parameters as (\(\alpha =0.5\), \(\beta =0.5\)). The number of nearest neighbors in Definitions 1 and 2 is directly set to 20 to avoid the extra cost of tuning parameters. To reduce randomness, we repeat the experiment 30 times. The results are reported in mean ± std. form in terms of AUC value and accuracy in Table 1.

From Table 1, it can be found that the average AUC value of MLND reaches 0.9251, which is higher than those of KNFST, Local KNFST, and NK3ML; the average accuracy of MLND reaches 93.87%, which is also higher than those of KNFST, Local KNFST, and NK3ML. The ROC curve of one trial is drawn in Fig. 3.

Fig. 3

The ROC curves on Gestures. The green dashed, dotted, dash-dot, and red dashed lines represent the ROC curves of KNFST, Local KNFST, NK3ML, and MLND, respectively

From Fig. 3, the ROC curve of MLND is still superior to those of KNFST, Local KNFST, and NK3ML; MLND performs better than these three methods on Gestures.

Furthermore, we also consider the influence of the parameter k in Definitions 1 and 2 on the performance of MLND. The parameter k ranges from 10 to 100 in steps of 10. Here, the parameters \(\alpha \) and \(\beta \) are both directly set to 0.5. The curve of the AUC value versus the parameter k is shown in the left sub-figure of Fig. 4, and the curve of the accuracy versus the parameter k is shown in the right sub-figure of Fig. 4.

Fig. 4

The performance with different parameter k in MLND. Left: the curve between the AUC value and the parameter k; right: the curve between the accuracy and the parameter k

From the results in Fig. 4, it can be found that both the AUC value and the accuracy decrease as the number of nearest neighbors k increases when \(k>30\). The reason is that the manifold is meant to depict a small region; when the neighborhood is too large, the manifold assumption becomes invalid. When \(k=20\), the AUC value reaches its peak (\(AUC=0.9152\)). When \(k=30\), the accuracy reaches its peak (\(accuracy=94.65\%\)). In our experience, the parameter k in MLND should not be set too large. In the following experiments, the parameter k is directly set to 20.

Experiments on toy datasets

In this subsection, we evaluate MLND on two toy datasets. The first one contains three normal classes and the second one contains two normal classes. In toy 1, the samples in \(\mathbf {X}_j,j=1,2,3\) follow the distributions below:

$$\begin{aligned} \mathbf {X}_j=\mathbf {N}_j. \end{aligned}$$
(28)

Here, \(\mathbf {N}_1\thicksim N\left( [\begin{matrix} 0&0 \end{matrix}],\left[ \begin{matrix} 0.5^2 &{} 0 \\ 0 &{} 1.25^2\end{matrix}\right] \right) \), \(\mathbf {N}_2\thicksim N\left( [\begin{matrix} 2&0 \end{matrix}],\left[ \begin{matrix} 0.5^2 &{} 0 \\ 0 &{} 1.25^2\end{matrix}\right] \right) \), \(\mathbf {N}_3\thicksim N\left( [\begin{matrix} 4&0 \end{matrix}],\left[ \begin{matrix} 0.5^2 &{} 0 \\ 0 &{} 1.25^2\end{matrix}\right] \right) \). An illustration of toy 1 is shown in Fig. 5a.
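For reproducibility, a minimal NumPy sketch that draws the normal classes of toy 1 according to Eq. (28) is shown below; the novelties are not specified by Eq. (28) and are therefore omitted, and the sample sizes follow the description later in this subsection.

```python
import numpy as np

rng = np.random.default_rng(0)                  # fixed seed, only for illustration
cov = np.diag([0.5 ** 2, 1.25 ** 2])            # shared covariance of Eq. (28)
means = [[0, 0], [2, 0], [4, 0]]                # class means of N_1, N_2, N_3
X_train = np.hstack([rng.multivariate_normal(m, cov, 200).T for m in means])
y_train = np.repeat([1, 2, 3], 200)             # 200 training samples per class
```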

In toy 2, the samples in \(\mathbf {X}_j,j=1,2\) follow the distributions below:

$$\begin{aligned} \mathbf {X}_j=\mathbf {N}_j+\left[ 1-\frac{x_{i,2}^3}{25}+\epsilon \right] . \end{aligned}$$
(29)

Here, \(\mathbf {N}_1\thicksim N\left( [\begin{matrix} 0&0 \end{matrix}],\left[ \begin{matrix} 2^2 &{} 0 \\ 0 &{} 2^2\end{matrix}\right] \right) \), \(\mathbf {N}_2\thicksim N\left( [\begin{matrix} 3&3 \end{matrix}],\left[ \begin{matrix} 2^2 &{} 0 \\ 0 &{} 2^2\end{matrix}\right] \right) \), and \(\epsilon \thicksim N\left( 0,0.25^2\right) \). An illustration of toy 2 is shown in Fig. 5b.

Fig. 5

The illustration of toy datasets. In a, the samples from class 1, 2, and 3 are denoted as pluses, stars, and x-marks, respectively; the novelties are denoted as circles. In b, there are two normal classes in which the samples are denoted as pluses and stars, respectively; the circles are novelties

In toy 1, we generate 600 samples for the training set (200 samples per class) and 2000 samples for the test set (500 samples per class and 500 novelties). In toy 2, we generate 400 samples for the training set (200 samples per class) and 1500 samples for the test set (500 samples per class and 500 novelties). To reduce randomness, we repeat the experiments 30 times. The AUC value and accuracy are reported in the form of mean ± std. in Table 2.

Table 2 The experimental results of toy datasets

In Table 2, the sixth column reports the results of MLND with fine-tuned parameters \(\alpha ,\beta \), which are tuned via grid search in the range from 0.1 to 1 with a step of 0.1. The seventh column of Table 2 reports the results of MLND with fixed parameters \(\alpha =0.5,\beta =0.5\).

For Toy 1, the average AUC value of MLND is 0.9589 when tuning the parameters \(\alpha ,\beta \) via grid search and 0.9492 when \(\alpha =0.5,\beta =0.5\). For Toy 2, the average AUC value of MLND is 0.9314 when tuning the parameters \(\alpha ,\beta \) via grid search and 0.9249 when \(\alpha =0.5,\beta =0.5\). The average AUC value of MLND is higher than those of KNFST, Local KNFST, and NK3ML even when the parameters are directly set as \(\alpha =0.5,\beta =0.5\).

For Toy 1, the average accuracy of MLND is 96.71% when tuning the parameters \(\alpha ,\beta \) via grid search and 95.17% when \(\alpha =0.5,\beta =0.5\). For Toy 2, the average accuracy of MLND is 93.47% when tuning the parameters \(\alpha ,\beta \) via grid search and 93.28% when \(\alpha =0.5,\beta =0.5\). The average accuracy of MLND is also higher than those of KNFST, Local KNFST, and NK3ML even when the parameters are directly set as \(\alpha =0.5,\beta =0.5\).

This is because MLND considers both the global information and the local structure within each class. The ROC curves of Toy 1 and Toy 2 are shown in Fig. 6.

Fig. 6

The ROC curves of toy datasets. a Toy 1; b Toy 2. The green dashed, dotted, dashdot, red dashed, and magenta dotted lines represent the ROC curves of KNFST, Local KNFST, NK3ML, MLND with fine-tuned parameters, and MLND with fixed parameters (\(\alpha =0.5\), \(\beta =0.5\)), respectively

From Fig. 6, we can draw the same conclusion as from the AUC values and accuracies for Toy 1 and Toy 2.

Experiments on benchmark datasets

In this subsection, we compare MLND with KNFST, Local KNFST, and NK3ML on several benchmark datasets collected from the UCI repository and the LIBSVM website [32]. The details of these datasets are listed in Table 3.

Table 3 The details of benchmark datasets

These datasets are reorganized to suit the evaluation of multi-class supervised novelty detection. For DNA, protein, satimage, and shuttle, we remove one class from the training set and add its samples to the test set. For pendigits, poker, SVHN, and usps, we remove five classes from the training set and add their samples to the test set. The parameters of MLND are directly set as \(\alpha =0.5\), \(\beta =0.5\), and \(k=20\). The AUC value and accuracy are reported in Tables 4 and 5, respectively.

Table 4 The AUC value results of benchmark datasets
Table 5 The accuracy results of benchmark dataset

In Tables 4 and 5, the last row gives the win-loss-tie (W-L-T) counts of the AUC value and the accuracy, respectively, with MLND used as the base method. From Table 4, it can be found that the AUC value of MLND is higher than that of KNFST on eight datasets, than that of Local KNFST on eight datasets, and than that of NK3ML on seven datasets. From Table 5, it can be found that the accuracy of MLND is higher than that of KNFST on eight datasets, than that of Local KNFST on eight datasets, and than that of NK3ML on six datasets. MLND is superior to KNFST, Local KNFST, and NK3ML on most of these benchmark datasets.

Discussions and conclusion

In this paper, we propose a manifold learning-based novelty detection method. Manifold learning novelty detection (MLND) can be regarded as an improvement of the kernel null Foley–Sammon transform (KNFST). In MLND, we first introduce a manifold into the within-class scatter and the total scatter to depict the local geometrical structure within each class; then we map the samples of each class to a single point via null projection directions. Compared with KNFST, MLND considers both the global information and the local geometrical structure within each class. Therefore, MLND can overcome the weakness of KNFST caused by ignoring the local geometrical structure. We evaluate MLND on an EMG gesture dataset, two toy datasets, and eight benchmark datasets. The experimental results demonstrate that MLND is superior to KNFST and its two improved variants, Local KNFST and NK3ML.