Introduction

In the information era, vast amounts of data are generated every moment in fields such as education, medical care, social media, and business. However, most of these data cannot be used directly because they contain impurities and redundancy, so data preprocessing such as data cleaning and data transformation has received increasing attention. Raw data often contain noise and irrelevant background, and this extra information can degrade downstream tasks such as classification and regression. Dimensionality reduction (DR) [1, 2] is one part of data preprocessing, and several important fields rely on it. For example, in object detection [3], the captured images often have ultra-high resolution, so DR is needed to reduce the data dimension and allow the subsequent algorithms to run more smoothly [4]. In iterative learning control, which is closely related to robotics [5], DR algorithms are also used in the preprocessing stage to reduce the computational complexity and computation time of the iterative learning algorithm [6]. The main goal of DR is to find the most representative features or to extract low-dimensional features from the high-dimensional space, thereby addressing the curse of dimensionality [7, 8]. To date, a variety of DR methods have been proposed to remove redundant, insignificant, or noisy information from raw data, for example, supervised DR methods represented by linear discriminant analysis (LDA) [9], semi-supervised DR methods represented by semi-supervised discriminant analysis (SDA) [10], and unsupervised DR methods represented by principal component analysis (PCA) [11].

DR methods are generally divided into two categories: linear and nonlinear methods [12]. Principal component analysis (PCA) and linear discriminant analysis (LDA) are two of the most representative linear dimensionality reduction techniques. PCA was applied to face recognition by Turk and Pentland in 1991 under the name Eigenface [11]. Its mathematical foundations are the properties of the covariance matrix and the special meaning of its eigenvectors, and PCA can be implemented in several ways, such as eigenvalue decomposition, latent variable analysis, and factor analysis [13,14,15]. LDA originates from Fisher's work in 1936 and is known as Fisherface in face recognition. Its main idea is to maximize the ratio of the between-class scatter to the within-class scatter, thereby maximizing the separability between classes. However, both PCA and LDA are linear methods.

To address nonlinear structure, a large number of manifold learning algorithms have been proposed [16], such as Laplacian eigenmaps (LE) [17], locally linear embedding (LLE) [18], Isomap [19], and local tangent space alignment (LTSA) [20]. All of these methods obtain a low-dimensional embedding, regarded as the best representative subspace, from the high-dimensional nonlinear structure of the data. The idea of LE is to use the graph Laplacian to find the optimal low-dimensional representation that preserves the local neighborhood information of the original manifold. LLE is based on the idea that each data point together with a fixed number of its closest neighbors forms a locally linear patch of the manifold, and each data point is then reconstructed from its neighbors in the corresponding subspace. The main objective of Isomap is to preserve the pairwise similarity or dissimilarity between data points on the manifold, and it builds on multi-dimensional scaling (MDS) [21, 22]. The idea of LTSA is to represent the local geometry of the high-dimensional manifold by the tangent space in the neighborhood of each data point and then align those tangent spaces to construct a global coordinate system for the nonlinear manifold.

However, all the DR methods mentioned above face the “out-of-sample” problem [23]: they are defined only on the training set and cannot be applied to test samples. To solve this problem, many linearized versions of manifold learning methods have been proposed. For example, Isomap projection (IsoP) [24] is the linearization of Isomap, locality preserving projection (LPP) [25] is the linearization of LE, neighborhood preserving embedding (NPE) [26] is the linearization of LLE, and linear local tangent space alignment (LLTSA) [27] is the linearization of LTSA. Specifically, LLTSA first projects the data set into the PCA subspace to avoid matrix singularity and discard redundant information. Second, for every point it represents the set of its k nearest neighbors (KNN) by a selection matrix. Third, it finds a low-dimensional embedding of the high-dimensional data through a linear mapping that preserves the structure of the original manifold. LLTSA not only uses tangent spaces to preserve the manifold structure as LTSA does, but also solves the “out-of-sample” problem by providing a linear mapping that is applicable to both the training set and the test set.

The objective of LLTSA is to seek a low-dimensional embedding that preserves the local geometric structure of the original manifold data points. More specifically, LLTSA constructs a neighborhood graph for each data point via its k nearest neighbors (KNN), computes a local linear approximation of the data using the tangent space, and then obtains the optimal mapping by minimizing the alignment error from the high-dimensional manifold to the low-dimensional feature space. In addition, the eigenproblem of molecular alignment [28] is related to LLTSA: LLTSA aligns local tangent spaces in the low-dimensional space, while the eigenproblem formulated for molecular alignment distinguishes different molecular arrangements; both aim to place items in the proper position in space. Furthermore, several improved methods based on LLTSA already exist. For example, adaptive linear local tangent space alignment (ALLTSA) [29] addresses the difficulty of choosing the best k in the neighborhood selection. Orthogonal linear local tangent space alignment (OLLTSA) [30] eliminates redundant information by constraining the basis vectors to be orthogonal. Warp linear local tangent space alignment [31] constructs a curved local tangent space measure to improve performance. An improved linear local tangent space alignment algorithm based on principal component analysis (PLLTSA) [32] improves LLTSA by considering not only the local geometric structure of the data set but also the global structure of the samples. Weighted linear local tangent space alignment (WLLTSA) [33] is a recently proposed variant that approximates the local tangent space in each neighborhood with a weighted version of PCA instead of conventional PCA.

As the conventional LLTSA method is unsupervised, several supervised variants have been proposed that exploit label information to obtain a more discriminative low-dimensional embedding. By taking label information into account and redefining the distance matrix, the discriminant linear local tangent space alignment algorithm (DLLTSA) [34] and supervised linear local tangent space alignment (S-LLTSA) [35] were proposed, and many modified algorithms build on this idea. For example, adaptive discriminant linear local tangent space alignment (ADLLTSA) [36], proposed by Lv, adds adaptive neighborhood selection to DLLTSA; orthogonal discriminant linear local tangent space alignment (ODLLTSA) [37] orthogonalizes the subspace generated by DLLTSA; and marginal discriminant linear local tangent space alignment (MDLLTSA) [38] improves LLTSA by considering the intraclass and interclass margins.

However, LLTSA and all of its extended versions obtain the embedding by considering only the one-way mapping from the high-dimensional space to the low-dimensional space. This mapping allows the embedded low-dimensional points to partly preserve the local neighborhood information of the original samples, but information about the original high-dimensional space may be lost, so the embedding may not “represent” the original samples accurately and effectively.

To address this problem, this paper presents a novel LLTSA method, named LLTSA-AE (LLTSA with autoencoder), based on the encoder–decoder paradigm. Specifically, while maintaining the neighborhood structure of the samples, the data points in the high-dimensional manifold space are encoded into low-dimensional points by the conventional LLTSA projection model. A decoder is then used to reconstruct the original high-dimensional points from the low-dimensional points, and the error between the original space and the reconstructed space is minimized. The original LLTSA algorithm considers only the one-way mapping from the high-dimensional space to the low-dimensional space, and its modified versions all focus on enhancing this one-way mapping. In contrast, the proposed method has an additional reconstruction stage, which forces the low-dimensional data to retain as much information as possible about the original high-dimensional data, so the embedded low-dimensional data “represent” the original samples more accurately and effectively.

The rest of this paper proceeds as follows: in “The related works”, we review the LLTSA method and the autoencoder. In “Linear local tangent space alignment with autoencoder”, we propose the novel LLTSA method based on the encoder–decoder paradigm. In “Experimental results”, we conduct experiments to evaluate the new method. The conclusion and future work are given in “Conclusion”.

The related works

Local tangent space alignment

LLTSA is the linearized version of LTSA, so the LTSA algorithm is described first. Suppose there are l original samples \(\varvec{x}_{1},\varvec{x}_{2},\ldots ,\varvec{x}_{l}\) in the space \({\mathbb {R}}^n\) and denote \(\varvec{X}=\left[ \varvec{x}_{1},\varvec{x}_{2},\ldots ,\varvec{x}_{l} \right] \).

  1.

    Construct neighborhoods: for each point, find its k nearest neighbors in the original high-dimensional input space based on the Euclidean distance.

  2.

    Extract local coordinates: to preserve the local structure of the data points, the local coordinates are computed by solving the optimization problem in Eq. (1).

    $$\begin{aligned} \min _{\overline{\varvec{x}}_{i},\{\varvec{\theta }_{j}^{(i)}\},\varvec{Q}_{i}^{T}\varvec{Q}_{i}=\varvec{I}}\ \sum \limits _{j=1}^{k_{i}}\left\| \varvec{x}_{ij}-\left( \overline{\varvec{x}}_{i}+\varvec{Q}_{i}\varvec{\theta }_{j}^{(i)}\right) \right\| ^{2} \end{aligned}$$
    (1)

    where \(\varvec{Q}_{i}\) is an orthonormal basis matrix of the tangent space. The solution of Eq. (1) is \(\overline{\varvec{x}}_{i}=\frac{1}{k_{i}}\sum \nolimits _{j=1}^{k_{i}}\varvec{x}_{ij}\) and \(\varvec{\theta }_{j}^{(i)}=\varvec{Q}_{i}^{T}(\varvec{x}_{ij}-\overline{\varvec{x}}_{i})\), which are the local coordinates of \(\varvec{x}_{ij}\). This procedure is actually a local principal component analysis.

  3.

    Align local coordinates: to preserve as much of the local geometry as possible in the low-dimensional feature space, the reconstruction error from the high-dimensional manifold space to the low-dimensional feature space is minimized, i.e.,

    $$\begin{aligned} \min _{\varvec{T},\varvec{T}\varvec{T}^{T}=\varvec{I}}\ \sum \limits _{i=1}^{l}\ \min _{\varvec{c}_{i}\in {\mathbb {R}}^{d},\varvec{L}_{i}\in {\mathbb {R}}^{d\times d}}\ \frac{1}{k_{i}}\sum \limits _{j=1}^{k_{i}}\left\| \varvec{t}_{ij}-\left( \varvec{c}_{i}+\varvec{L}_{i}\varvec{\theta }_{j}^{(i)}\right) \right\| ^{2} \end{aligned}$$
    (2)

    where \(\varvec{T}=\left[ \varvec{t}_{1},\ldots ,\varvec{t}_{l} \right] \in {\mathbb {R}}^{d\times l}\) contains the target global coordinates. Furthermore, Eq. (2) can be transformed into the following eigenvalue problem through some algebraic manipulation.

    $$\begin{aligned} \min _{\varvec{T}\varvec{T}^{T}=\varvec{I}} {\text {tr}}\left( \varvec{T}\varvec{\phi }\varvec{T}^{T}\right) \end{aligned}$$
    (3)

    where the alignment matrix \(\varvec{\phi } = \sum \nolimits _{i=1}^{l}{\frac{1}{{k}_{i}}}\varvec{S}_{i}\varvec{\phi }_{i}\varvec{S}_{i}^{T}\) is symmetric positive semidefinite, \(\varvec{S}_{i}\) is the 0–1 selection matrix determined by \(\varvec{T}\varvec{S}_{i}=\left[ \varvec{t}_{i1},\ldots ,\varvec{t}_{ik_{i}} \right] =\varvec{T}_{i}\), and \(\varvec{\phi }_{i}\) is an orthogonal projection matrix. A numerical sketch of these three steps is given after this list.
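To make the three steps concrete, the following is a minimal NumPy sketch (our own illustration under simplifying assumptions, not the implementation used later in the paper): a brute-force neighbor search, a local PCA via `numpy.linalg.svd`, and the accumulation of the alignment matrix with the \(1/k_i\) weighting from the definition of \(\varvec{\phi }\).

```python
import numpy as np

def ltsa_embedding(X, k, d):
    """Rough sketch of LTSA steps 1-3 for a data matrix X of shape (n, l).

    Returns the alignment matrix Phi and the d-dimensional global coordinates T.
    """
    n, l = X.shape
    Phi = np.zeros((l, l))
    # Step 1: k-nearest-neighbor search under the Euclidean distance
    # (each neighborhood here includes the point itself).
    dist = np.linalg.norm(X[:, :, None] - X[:, None, :], axis=0)
    for i in range(l):
        idx = np.argsort(dist[i])[:k]
        Xi = X[:, idx]
        Xi_c = Xi - Xi.mean(axis=1, keepdims=True)      # x_ij - x_bar_i
        # Step 2: local PCA via SVD; Q_i spans the tangent space and
        # Theta holds the local coordinates theta_j^(i) of Eq. (1).
        U, _, Vt = np.linalg.svd(Xi_c, full_matrices=False)
        Q_i = U[:, :d]
        Theta = Q_i.T @ Xi_c
        # Step 3: local contribution to the alignment matrix. With
        # G_i = [e/sqrt(k), V_i], Phi_i = I - G_i G_i^T is an orthogonal
        # projection (the Phi_i appearing in the alignment matrix).
        G_i = np.hstack([np.ones((k, 1)) / np.sqrt(k), Vt[:d].T])
        Phi_i = np.eye(k) - G_i @ G_i.T
        Phi[np.ix_(idx, idx)] += Phi_i / k              # 1/k_i weighting
    # The global coordinates T minimize tr(T Phi T^T) under T T^T = I: take the
    # eigenvectors of Phi for the 2nd- to (d+1)-th-smallest eigenvalues
    # (the smallest one corresponds to the trivial constant vector).
    eigvals, eigvecs = np.linalg.eigh(Phi)
    T = eigvecs[:, 1:d + 1].T
    return Phi, T
```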

Linear local tangent space alignment

Although LTSA can learn the underlying manifold structure, it is defined only on the training set and cannot map unseen samples. The LLTSA method was proposed to solve this problem.

LLTSA aims to find an explicit dimensionality reduction mapping that is applicable to both the training set and the test set:

$$\begin{aligned} {{\varvec{y}}_{i}}={{\varvec{A}}^{T}}{{\varvec{x}}_{i}} \end{aligned}$$
(4)

With the linear mapping (4), the projection matrix of LLTSA is obtained by solving the following problem:

$$\begin{aligned} {{\varvec{A}}_{{\text {opt}}}}\varvec{=}\underset{{{\varvec{A}}^{T}}\varvec{XH}{{\varvec{X}}^{T}}\varvec{A=I}}{\mathop {\arg \min }}\,{\text {tr}}\left( {{\varvec{A}}^{T}}\varvec{XHBH}{{\varvec{X}}^{T}}\varvec{A} \right) \end{aligned}$$
(5)

where \({{\varvec{A}}^{T}}\varvec{XH}{{\varvec{X}}^{T}}\varvec{A}=\varvec{I}\) is a constraint, \(\varvec{B}=\varvec{SW}{{\varvec{W}}^{T}}{{\varvec{S}}^{T}}\), \(\varvec{S}=\left[ {{\varvec{S}}_{1}},{{\varvec{S}}_{2}},\ldots ,{{\varvec{S}}_{l}} \right] \), \({{\varvec{S}}_{i}}\) is the 0–1 selection matrix determined by \(\varvec{Y}{{\varvec{S}}_{i}}={{\varvec{Y}}_{i}}\), \(\varvec{Y}=\left[ {{\varvec{y}}_{1}},\ldots ,{{\varvec{y}}_{l}} \right] \), and the \({{\varvec{y}}_{i}}\) are the global coordinates. Moreover, \(\varvec{W}={\text {diag}}\left( {{\varvec{W}}_{1}},\ldots ,{{\varvec{W}}_{l}} \right) \) with \({{\varvec{W}}_{i}}={{\varvec{H}}_{k}}\left( \varvec{I}-{{\varvec{V}}_{i}}\varvec{V}_{i}^{T} \right) \), where \({{\varvec{H}}_{k}}=\varvec{I}-\varvec{e}{{\varvec{e}}^{T}}/k\) is the centering matrix and \({{\varvec{V}}_{i}}\) is the matrix of the d right singular vectors of \({{\varvec{X}}_{i}}{{\varvec{H}}_{k}}\) corresponding to its d largest singular values.

The objective function (5) can be solved by the Lagrange multiplier method, which transforms it into the following generalized eigenvalue problem:

$$\begin{aligned} \varvec{XHBH}{{\varvec{X}}^{T}}\varvec{a}=\lambda \varvec{XH}{{\varvec{X}}^{T}}\varvec{a} \end{aligned}$$
(6)

Let \({{\varvec{a}}_{1}},\ldots ,{{\varvec{a}}_{d}}\) be the solutions of (6) ordered according to their eigenvalues \({\lambda }_{1}\le \cdots \le {\lambda }_{d}\); the mapping computed by the above process is then \({{\varvec{A}}_{{\text {LLTSA}}}}=({{\varvec{a}}_{1}},\ldots ,{{\varvec{a}}_{d}})\). Since PCA is first applied to avoid singularity, the ultimate transformation matrix is:

$$\begin{aligned} \varvec{A}={{\varvec{A}}_{{\text {PCA}}}}{{\varvec{A}}_{{\text {LLTSA}}}} \end{aligned}$$
(7)
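Assuming the matrices \(\varvec{H}\) and \(\varvec{B}\) defined above and the PCA projection \(\varvec{A}_{\text {PCA}}\) have already been formed, the LLTSA projection follows from the generalized eigenvalue problem (6). A minimal SciPy sketch (illustrative, not the authors' implementation):

```python
import numpy as np
from scipy.linalg import eigh

def lltsa_projection(X, H, B, A_pca, d):
    """Solve Eq. (6) and compose the final mapping of Eq. (7).

    X: samples already projected into the PCA subspace, shape (m, l);
    H: centering matrix (l, l); B = S W W^T S^T (l, l); A_pca: (n, m).
    """
    M = X @ H @ B @ H @ X.T        # left-hand matrix of Eq. (6)
    C = X @ H @ X.T                # constraint matrix; PCA keeps it nonsingular
    eigvals, eigvecs = eigh(M, C)  # generalized eigenvalues in ascending order
    A_lltsa = eigvecs[:, :d]       # eigenvectors of the d smallest eigenvalues
    return A_pca @ A_lltsa         # Eq. (7): A = A_PCA A_LLTSA

# A raw sample x is then embedded as y = A.T @ x, as in Eq. (4).
```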

Autoencoder

The standard autoencoder is a fully connected neural network with an input layer, a hidden layer, and an output layer (i.e., two weight layers). The input and hidden layers constitute the encoder, and the hidden and output layers constitute the decoder. The encoder encodes the input data into a new feature representation, and the decoder decodes this representation to reconstruct the input data. The autoencoder trains its weight parameters by minimizing the error between the reconstruction and the original input, so as to obtain the optimal feature representation of the input data.

Many improved autoencoders have been proposed, including the stacked autoencoder [39, 40], sparse autoencoder [41], convolutional autoencoder [42], and variational autoencoder [43].

For an autoencoder, if the number of hidden-layer nodes is smaller than the number of input-layer nodes, the model is called undercomplete; otherwise, it is called overcomplete. If the activation function of the hidden layer is linear, the autoencoder is called a linear autoencoder. The method proposed in this paper is based on a linear, undercomplete autoencoder with only one hidden layer.
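As a concrete illustration, the following NumPy sketch trains a linear, undercomplete autoencoder with tied weights (decoder weight equal to the transpose of the encoder weight) by plain gradient descent on the reconstruction error; the data, code dimension, learning rate, and iteration count are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 100))       # 20-dimensional inputs, 100 samples
n, d = X.shape[0], 5                     # hidden (code) size d < n: undercomplete
A = 0.01 * rng.standard_normal((n, d))   # tied weights: encoder A^T, decoder A
lr = 1e-2

def recon_error(A, X):
    """Mean squared reconstruction error of the tied-weight linear autoencoder."""
    return np.sum((X - A @ (A.T @ X)) ** 2) / X.shape[1]

print("initial error:", recon_error(A, X))
for _ in range(500):
    Y = A.T @ X                          # encoder: hidden representation
    R = X - A @ Y                        # residual of the decoder output
    # Gradient of the mean squared reconstruction error w.r.t. the tied weight A.
    grad = -2.0 * (R @ X.T @ A + X @ R.T @ A) / X.shape[1]
    A -= lr * grad
print("final error:", recon_error(A, X))
```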

Linear local tangent space alignment with autoencoder

Framework

In this section, the framework of the proposed LLTSA method is described. The new method can be divided into two stages based on the encoder–decoder paradigm.

The first stage is the encoding stage, which uses the conventional projection model of LLTSA. The model maps each high-dimensional data point \({{\varvec{x}}_{i}}\) to a low-dimensional point \({{\varvec{y}}_{i}}\) through the linear mapping \({{\varvec{y}}_{i}}={{\varvec{A}}^{T}}{{\varvec{x}}_{i}}\). From the perspective of the linear autoencoder, this mapping can be regarded as encoding each high-dimensional data point into a low-dimensional one. At the same time, the mapping preserves the local neighborhood information of the original samples, i.e., if the original samples \({{\varvec{x}}_{i}}\) and \({{\varvec{x}}_{j}}\) are “close”, then the embedded points \({{\varvec{y}}_{i}}\) and \({{\varvec{y}}_{j}}\) are also “close”.

The second stage is to decode the low-dimensional embedded point \({{\varvec{y}}_{i}}\) back to the original data space with the linear autoencoder. Let \({{\varvec{{\hat{x}}}}_{i}}\) be the reconstruction of the original data point \({{\varvec{x}}_{i}}\) and \({{\varvec{A}}^{*}}\) be the weight matrix of the decoder. Mathematically, the decoding process may be formulated as:

$$\begin{aligned} {{\varvec{{\hat{x}}}}_{i}}={{\varvec{A}}^{*}}{{\varvec{y}}_{i}} \end{aligned}$$

To simplify the model, the weights of encoder and decoder in autoencoder can be tied as introduced in Ref. [44], i.e., \({{\varvec{A}}^{*}}={{\left( {{\varvec{A}}^{T}} \right) }^{T}}=\varvec{A}\). Thus, the decoding process may be rewritten as:

$$\begin{aligned} {{\varvec{{\hat{x}}}}_{i}}=\varvec{A}{{\varvec{y}}_{i}} \end{aligned}$$

Above all, the reconstruction error between the original manifold data \({{\varvec{x}}_{i}}\) and the reconstructed data \({{\varvec{{\hat{x}}}}_{i}}\) produced by the autoencoder should be minimized, which makes the low-dimensional embedded points “represent” the original samples more accurately and effectively. Figure 1 shows the architecture of the proposed LLTSA-AE method.

Fig. 1

The architecture of the proposed LLTSA-AE method

As described above, the proposed LLTSA-AE method is divided into two stages. The conventional LLTSA projection is regarded as the first stage; the second stage decodes the low-dimensional data back to the high-dimensional data space, and the reconstruction error between the original data and the reconstructed data is minimized.
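The two stages can be written compactly. Given a learned projection matrix \(\varvec{A}\) of size \(n\times d\), a hypothetical sketch of the encoding, decoding, and per-sample reconstruction error is:

```python
import numpy as np

def encode(A, X):
    """Stage 1 (LLTSA projection): Y = A^T X, mapping (n, l) data to (d, l)."""
    return A.T @ X

def decode(A, Y):
    """Stage 2 (tied-weight decoder): X_hat = A Y reconstructs the inputs."""
    return A @ Y

def reconstruction_errors(A, X):
    """Per-sample squared errors ||x_i - A A^T x_i||^2 minimized by the second stage."""
    R = X - decode(A, encode(A, X))
    return np.sum(R ** 2, axis=0)
```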

Compared with LLTSA and its modified versions such as ALLTSA, OLLTSA, and WLLTSA, the proposed LLTSA-AE considers not only the error from the high-dimensional space to the low-dimensional space but also the reconstruction error between the reconstructed space and the original space, which corresponds to the decoding stage of the autoencoder. That is, our method obtains the low-dimensional data by considering a two-way mapping, rather than the one-way mapping used by the traditional LLTSA algorithm and its variants. The original LLTSA algorithm obtains the low-dimensional embedding only by minimizing the error from the high-dimensional space to the low-dimensional space, and the other modified methods also concentrate solely on optimizing this one-way mapping. For example, ALLTSA searches for a better neighborhood size k to compute a better mapping, OLLTSA eliminates redundant information of the original manifold by constraining the basis vectors to be orthogonal, and WLLTSA enhances the mapping by replacing the original PCA algorithm with weighted PCA. By additionally modeling the reverse mapping, the proposed LLTSA-AE can therefore “represent” the original samples more accurately and effectively than the conventional LLTSA and its other variants.

Based on this idea, the new method is named LLTSA-AE (linear local tangent space alignment with autoencoder).

The objective function

To implement the LLTSA-AE algorithm, the objective function of the LLTSA-AE is proposed in this section.

The first stage of LLTSA-AE is the conventional projection model of LLTSA, which preserves the local neighborhood information of the samples. It can be formulated as minimizing Eq. (5), which is used as the first term of the objective function of the new method, i.e.,

$$\begin{aligned} {{{\mathcal {L}}}_{1{\text {st}}}}(\varvec{A})={\text {tr}}\left( {{\varvec{A}}^{\varvec{T}}}\varvec{XHBH}{{\varvec{X}}^{\varvec{T}}}\varvec{A} \right) \end{aligned}$$
(8)

In addition, the original constraint of Eq. (5), \({{\varvec{A}}^{T}}\varvec{XH}{{\varvec{X}}^{T}}\varvec{A}=\varvec{I}\), is imposed in a relaxed form as the second term:

$$\begin{aligned} {{{\mathcal {L}}}_{2{\text {nd}}}}(\varvec{A})={\text {tr}}\left( {{\varvec{A}}^{\varvec{T}}}\varvec{XH}{{\varvec{X}}^{\varvec{T}}}\varvec{A} \right) -d \end{aligned}$$
(9)

where the parameter d is the dimensionality of the target low-dimensional space, as described in “Local tangent space alignment”.

The second stage of LLTSA-AE reconstructs the original data and minimizes the error between the original data and the reconstructed data. The objective function of this stage is written as follows:

$$\begin{aligned} {{{\mathcal {L}}}_{3{\text {rd}}}}(\varvec{A}) = {\text {tr}}\left( \left( \varvec{I}-\varvec{A}{{\varvec{A}}^{T}} \right) \varvec{X}{{\varvec{X}}^{T}}{{\left( \varvec{I}-\varvec{A}{{\varvec{A}}^{T}} \right) }^{T}} \right) \end{aligned}$$
(10)

Finally, combining Eqs. (8), (9), and (10), the objective function of the LLTSA-AE algorithm is obtained. The new method finds the optimal projection matrix \(\varvec{A}\) by minimizing this objective function:

$$\begin{aligned} {\mathcal {L}}(\varvec{A})= & {} {{{\mathcal {L}}}_{1{\text {st}}}}(\varvec{A})+\lambda {{{\mathcal {L}}}_{2{\text {nd}}}}(\varvec{A})+\gamma {{{\mathcal {L}}}_{3{\text {rd}}}}(\varvec{A}) \nonumber \\= & {} {\text {tr}}\left( {{\varvec{A}}^{T}}\varvec{XHBH}{{\varvec{X}}^{T}}\varvec{A} \right) +\lambda \left( {\text {tr}}\left( {{\varvec{A}}^{T}}\varvec{XH}{{\varvec{X}}^{T}}\varvec{A} \right) -d \right) \nonumber \\{} & {} +\gamma {\text {tr}}\left( \left( \varvec{I}-\varvec{A}{{\varvec{A}}^{T}} \right) \varvec{X}{{\varvec{X}}^{T}}{{\left( \varvec{I}-\varvec{A}{{\varvec{A}}^{T}} \right) }^{T}} \right) \end{aligned}$$
(11)

where \(\lambda \) and \(\gamma \) are balance parameters reflecting the importance of the corresponding terms.
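For reference, the three terms of Eq. (11) can be evaluated directly from their matrix forms. A NumPy sketch, assuming \(\varvec{X}\), \(\varvec{H}\), \(\varvec{B}\), \(\varvec{A}\), \(\lambda \), and \(\gamma \) are given:

```python
import numpy as np

def lltsa_ae_objective(A, X, H, B, lam, gamma):
    """Evaluate Eq. (11): L(A) = L_1st(A) + lam * L_2nd(A) + gamma * L_3rd(A)."""
    d = A.shape[1]
    L1 = np.trace(A.T @ X @ H @ B @ H @ X.T @ A)     # Eq. (8): alignment term
    L2 = np.trace(A.T @ X @ H @ X.T @ A) - d         # Eq. (9): relaxed constraint term
    P = np.eye(X.shape[0]) - A @ A.T
    L3 = np.trace(P @ X @ X.T @ P.T)                 # Eq. (10): reconstruction term
    return L1 + lam * L2 + gamma * L3
```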

Justification

As the first stage is the conventional LLTSA algorithm, the justification of Eqs. (8) and (9) is not discussed here; it can be found in detail in [27].

The second stage of LLTSA-AE reconstructs the original data and minimizes the error between each input data point \({{\varvec{x}}_{i}}\) and its reconstruction \({{\varvec{{\hat{x}}}}_{i}}\). It can be formulated as:

$$\begin{aligned} \sum \limits _{i=1}^{l}{\left\| {{\varvec{x}}_{i}}-{{{\varvec{{\hat{x}}}}}_{i}} \right\| _{2}^{2}} \end{aligned}$$
(12)

By considering Eq. (4), the linear mapping of LLTSA, the reconstructed point \({{\varvec{{\hat{x}}}}_{i}}\) can be further expressed as:

$$\begin{aligned} {{\varvec{{\hat{x}}}}_{i}}=\varvec{A}{{\varvec{y}}_{i}}=\varvec{A}{{\varvec{A}}^{T}}{{\varvec{x}}_{i}} \end{aligned}$$
(13)

Equation (12) serves as the third term of the objective function of LLTSA-AE and can be rewritten as:

$$\begin{aligned} \begin{aligned} {{{\mathcal {L}}}_{3{\text {rd}}}}(\varvec{A})&=\sum \limits _{i=1}^{l}{\left\| {{\varvec{x}}_{i}}-\varvec{A}{{\varvec{A}}^{T}}{{\varvec{x}}_{i}} \right\| _{2}^{2}}\\&=\sum \limits _{i=1}^{l}{\left\| \left( \varvec{I}-\varvec{A}{{\varvec{A}}^{T}} \right) {{\varvec{x}}_{i}} \right\| _{2}^{2}}\\&={\text {tr}}\left( \left( \varvec{I}-\varvec{A}{{\varvec{A}}^{T}} \right) \varvec{X}{{\varvec{X}}^{T}}{{\left( \varvec{I}-\varvec{A}{{\varvec{A}}^{T}} \right) }^{T}} \right) \end{aligned} \end{aligned}$$
(14)
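The last equality in Eq. (14) is the standard rewriting of a sum of squared norms as a trace, which can be checked numerically on random data (sizes chosen arbitrarily for the check):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((30, 80))     # n = 30 features, l = 80 samples
A = rng.standard_normal((30, 6))      # projection matrix of size n x d

lhs = sum(np.sum((X[:, i] - A @ (A.T @ X[:, i])) ** 2) for i in range(X.shape[1]))
P = np.eye(30) - A @ A.T
rhs = np.trace(P @ X @ X.T @ P.T)
assert np.isclose(lhs, rhs)           # the two sides of Eq. (14) agree
```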

It is worth mentioning that \(\left\| \varvec{A} \right\| _{F}^{2}\) regularization is not included in our model. It is unnecessary because the weights of the encoder and decoder are tied, i.e., \({{\varvec{A}}^{*}}=\varvec{A}\). If the norm \(\left\| \varvec{A} \right\| _{F}^{2}\) were large, the low-dimensional projection produced by the encoder would have large values, and then, in the decoding stage, multiplying this projection by the matrix \(\varvec{A}\) would yield a poor reconstruction. That is, the \(\left\| \varvec{A} \right\| _{F}^{2}\) regularization is automatically handled by the reconstruction constraint [44].

Optimization

The formulation of LLTSA is linear and can be transformed into a generalized eigenvalue problem. However, the objective function of LLTSA-AE is nonlinear and hard to solve directly. Thus, stochastic gradient descent with momentum is employed to obtain the optimal matrix \(\varvec{A}\).

The algorithm mainly includes three steps:

  1.

    Calculate the stochastic gradient of the objective function according to Eq. (15):

    $$\begin{aligned} \nabla {\mathcal {L}}(\varvec{A}_{t}) =\nabla {{{\mathcal {L}}}_{1{\text {st}}}}(\varvec{A}_{t})+ \lambda \nabla {{{\mathcal {L}}}_{2{\text {nd}}}}(\varvec{A}_{t})+\gamma \nabla {{{\mathcal {L}}}_{3{\text {rd}}}}(\varvec{A}_{t}).\nonumber \\ \end{aligned}$$
    (15)

    where t is the iteration index and

    $$\begin{aligned} \begin{aligned}&\nabla {{{\mathcal {L}}}_{1{\text {st}}}}(\varvec{A}_{t})=2\varvec{X}_{i_t}\varvec{H}_{i_t}\varvec{B}_{i_t}\varvec{H}_{i_t}{{\varvec{X}_{i_t}}^{T}}\varvec{A}_{t}, \\&\nabla {{{\mathcal {L}}}_{2{\text {nd}}}}(\varvec{A}_{t})=2\varvec{X}_{i_t}\varvec{H}_{i_t}{{\varvec{X}_{i_t}}^{T}}\varvec{A}_{t}, \\&\nabla {{{\mathcal {L}}}_{3{\text {rd}}}}(\varvec{A}_{t})=-4\left( \varvec{I}-\varvec{A}_{t}{{\varvec{A}_{t}}^{T}} \right) \varvec{X}_{i_t}{{\varvec{X}_{i_t}}^{T}}\varvec{A}_{t}. \end{aligned} \end{aligned}$$

    where \(i_t \in \{1,\ldots ,l\}\) is the index of the sample randomly selected according to a uniform distribution at iteration t.

  2.

    Calculate the accumulated historical gradient by the following formula:

    $$\begin{aligned} \varvec{v}_t = \rho \varvec{v}_{t-1} + \alpha \nabla {\mathcal {L}}(\varvec{A}_t) \end{aligned}$$
    (16)

    where \(\rho \in (0, 1)\) is the momentum coefficient, with a default value of 0.9 in the experiments, and \(\alpha \) is the learning rate, set to \(5\times 10^{-3}\) by default in the following experiments.

  3.

    Using the accumulated gradient, update the matrix \(\varvec{A}\) with the following formula until the optimal matrix \(\varvec{A}\) is found (a compact sketch of this update loop is given after the list):

    $$\begin{aligned} {\varvec{A}_{t+1}=\varvec{A}_{t}- \varvec{v}_t} \end{aligned}$$
    (17)
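A compact sketch of the update loop described in steps 1–3, using single-sample stochastic gradients and classical momentum; the lists `X_list`, `H_list`, and `B_list` holding the per-sample matrices \(\varvec{X}_{i_t}\), \(\varvec{H}_{i_t}\), \(\varvec{B}_{i_t}\) are an assumed data layout for illustration, not the authors' MATLAB/SGDLibrary implementation.

```python
import numpy as np

def sgd_momentum_lltsa_ae(X_list, H_list, B_list, A0, lam, gamma,
                          rho=0.9, alpha=5e-3, n_iter=200, seed=0):
    """Momentum SGD for the objective in Eq. (11), following steps 1-3 above."""
    rng = np.random.default_rng(seed)
    A = A0.copy()
    v = np.zeros_like(A0)
    for t in range(n_iter):
        i = rng.integers(len(X_list))        # uniformly selected sample index i_t
        Xi, Hi, Bi = X_list[i], H_list[i], B_list[i]
        # Step 1: stochastic gradient of Eq. (15).
        g1 = 2 * Xi @ Hi @ Bi @ Hi @ Xi.T @ A
        g2 = 2 * Xi @ Hi @ Xi.T @ A
        P = np.eye(Xi.shape[0]) - A @ A.T
        g3 = -4 * P @ Xi @ Xi.T @ A
        g = g1 + lam * g2 + gamma * g3
        # Step 2: accumulate the historical gradient, Eq. (16).
        v = rho * v + alpha * g
        # Step 3: update the projection matrix, Eq. (17).
        A = A - v
    return A
```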

The algorithm of LLTSA-AE is summarized in Algorithm 1.

Algorithm 1 The LLTSA-AE algorithm

After the above algorithm, the optimal projection matrix \(\varvec{A}\) is obtained. The solution matrix \(\varvec{A}\) not only satisfies the formulation of conventional LLTSA but also takes the reconstruction stage into consideration. Thus, the low-dimensional data point \({{\varvec{y}}_{i}}\) computed by the linear mapping \({{\varvec{y}}_{i}}={\varvec{A}^{T}}{{\varvec{x}}_{i}}\) can “represent” the original high-dimensional data point \({{\varvec{x}}_{i}}\) accurately.

The parameter analysis

Two free parameters \(\lambda \) and \(\gamma \) (see Eq. (11)) are contained in the proposed LLTSA-AE algorithm.

In Eq. (11), if \(\gamma = 0\), Eq. (11) reduces to

$$\begin{aligned} {\mathcal {L}}={\text {tr}}\left( {{\varvec{A}}^{T}}\varvec{XHBH}{{\varvec{X}}^{T}}\varvec{A} \right) +\lambda \left( {\text {tr}}\left( {{\varvec{A}}^{T}}\varvec{XH}{{\varvec{X}}^{T}}\varvec{A} \right) -d \right) \nonumber \\ \end{aligned}$$
(18)

Equation (18) is actually another way of solving the objective function of conventional LLTSA, because the reconstruction term is no longer present. Since the aim of LLTSA is to find the eigenvectors corresponding to the minimum eigenvalues, the parameter \(\lambda \) should take relatively small values, which is validated by the following experiments.

In Eq. (11), the first term \({{{\mathcal {L}}}_{1{\text {st}}}}\) can be regarded as the local-structure-preserving term, with its coefficient fixed to 1. The third term \({{{\mathcal {L}}}_{3{\text {rd}}}}\) can be regarded as the reconstruction term, and the parameter \(\gamma \) reflects the weight of the reconstruction. If \(0<\gamma <1\), the reconstruction term is weighted less than the local-structure-preserving term.

Experimental results

In this section, we conduct experiments on the Handwritten Alphadigits dataset, the FERET (Face Recognition Technology) dataset, the GT (Georgia Tech) face dataset, and the Yale face dataset. The proposed LLTSA-AE is compared with LPP, NPE, LLTSA, ALLTSA, OLLTSA, PLLTSA, and WLLTSA.

For LPP, NPE, LLTSA, and the improved variants of LLTSA, PCA is first used to reduce the dimension of the original samples. For all datasets, 98% of the principal components are retained for LPP and NPE, and 70% for LLTSA and its improved variants in this step.

The accuracy used in the experiments is defined as:

$$\begin{aligned} {\text {ACC}}=\frac{N^{CC}}{N} \end{aligned}$$

where N is the total number of classified subjects, \(N^{CC}\) is the number of correctly classified subjects, and ACC is the accuracy [45]. Accuracy is widely used in scientific research: it is intuitive and clear, and the performance of a model can be seen at a glance. Its disadvantage is that it may be misleading on unbalanced datasets. However, the datasets used in this paper are all balanced, so accuracy is a suitable index that both represents the performance of the models well and keeps the comparison intuitive and clear.

We implement the algorithms in MATLAB R2021a, taking advantage of its matrix computation capabilities and rich toolboxes, on a computer with an Intel i7-8750H CPU and 16 GB RAM. The optimization routines of our algorithm are based on an easy-to-use optimization library, SGDLibrary [46], a pure-MATLAB library of stochastic optimization algorithms. The library contains many optimizers, including traditional SGD and SGD with classical momentum, as well as other algorithms such as Adam and RMSProp.

Experimental datasets

Fig. 2

Some samples in the experiments. The first line is the Handwritten Alphadigits dataset. The second line is the FERET dataset. The third line is the GT dataset. The fourth line is the Yale dataset

Handwritten Alphadigits dataset. This dataset consists of the 10 digits “0” through “9” and the 26 capital letters “A” through “Z”, with 39 examples of each class. Each sample is a handwritten image; in the experiment, the image size is 20\(\times \)16 pixels.

The FERET face dataset [47] was constructed by the Army Research Laboratory. It contains up to 200 subjects with 7 images per person. Notably, the photos of the same person vary in expression, lighting, posture, and age. In the experiment, the images are resized to 32\(\times \)32 pixels.

The GT face dataset [48] contains face images of 50 persons taken between 06/01/99 and 11/15/99, with 15 images per person. The images contain frontal and tilted faces with different facial expressions, lighting conditions, and scales. In the experiment, the images are resized to 32\(\times \)32 pixels.

The Yale face dataset [49] was created at the Yale Center for Computational Vision and Control. It contains 165 grayscale images of 15 individuals, with 11 samples per individual. The images vary in facial expression, lighting, and with/without glasses. In this experiment, all images are aligned based on eye coordinates and are cropped and scaled to 24\(\times \)24 pixels.

Some sample images from the Handwritten Alphadigits, FERET, GT, and Yale datasets are shown in Fig. 2.

Parameter settings

Fig. 3

The recognition accuracy in four datasets w.r.t the parameter \(\lambda \) (\(\gamma \)) while \(\gamma \) (\(\lambda \)) fixed within long ranges

Fig. 4

The recognition accuracy in four datasets w.r.t the parameter \(\lambda \) (\(\gamma \)) while \(\gamma \) (\(\lambda \)) fixed within short ranges

In this section, we conduct experiments on the four datasets to find the optimal configuration of the parameters \(\lambda \) and \(\gamma \) for the LLTSA-AE method. The search range of \(\lambda \) and \(\gamma \) is chosen based on experience. Since the algorithm is iterative, the values of the terms in the objective function are large at the beginning of the iterations and become smaller after many iterations. After comprehensive consideration, we therefore take a compromise range of [0, 50]. This range (referred to as the long range) is used to identify the trend of each parameter, after which an optimal short range is determined: \(\lambda \) and \(\gamma \) are then searched over the short ranges [0, 4] and [0, 6], respectively, to obtain their optimal values. In the experiment, the numbers of training samples per class for the four datasets, Handwritten Alphadigits, FERET, GT, and Yale, are 9, 5, 11, and 8, respectively. The neighborhood size is k = 10, and the subspace dimension is 40.

When the parameter \(\gamma \) is fixed, the recognition rate curves w.r.t. the parameter \(\lambda \) on the four datasets are shown in Figs. 3a and 4a. When the parameter \(\lambda \) is fixed, the recognition rate curves w.r.t. the parameter \(\gamma \) are shown in Figs. 3b and 4b. According to Figs. 3 and 4: (1) for the parameter \(\lambda \), the recognition accuracy changes only slightly in the range [0, 3] and then declines clearly after 3; (2) for the parameter \(\gamma \), the recognition accuracy increases with \(\gamma \) within the range [0, 2] and then remains steady after 2. We can therefore conclude that the optimal value of \(\lambda \) usually lies among the smaller values, since the accuracy declines steadily and substantially after 3, and that the optimal value of \(\gamma \) is usually greater than 2, beyond which the recognition rate becomes stable. Therefore, we conduct a grid search for the two parameters over [0, 4] and [0, 6], which completely covers the optimal parameter values obtained in the experiment while avoiding meaningless search operations.
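The grid search over the short ranges can be organized as below; `evaluate_accuracy` is a hypothetical callable that trains LLTSA-AE with a given \((\lambda ,\gamma )\) pair and returns the recognition accuracy, and the grid step of 0.5 is an assumption (the step size is not stated above).

```python
import numpy as np

def grid_search(evaluate_accuracy, lam_grid=np.arange(0.0, 4.5, 0.5),
                gamma_grid=np.arange(0.0, 6.5, 0.5)):
    """Exhaustive search for lambda in [0, 4] and gamma in [0, 6]."""
    best_lam, best_gamma, best_acc = None, None, -np.inf
    for lam in lam_grid:
        for gamma in gamma_grid:
            acc = evaluate_accuracy(lam, gamma)   # hypothetical training + evaluation
            if acc > best_acc:
                best_lam, best_gamma, best_acc = lam, gamma, acc
    return best_lam, best_gamma, best_acc
```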

Table 1 The parameter configuration of the LLTSA-AE method on four experimental datasets
Table 2 Best average recognition accuracy (in percent), standard deviation, and the optimal dimension (in parentheses) on the Handwritten Alphadigits dataset

According to the recognition rate curves in Fig. 4, we obtain the optimal configuration of the parameters \(\lambda \) and \(\gamma \), which is listed in Table 1 and used in the following experiments.

The experiments proceed as follows. First, p samples of each subject are selected randomly to form the training set, and the remaining samples are used as the test set. Second, with p fixed, the reduced dimension is varied over [10, 100] with a step size of 5. Third, with p and the reduced dimension fixed, the neighborhood size k is increased from 5 to 25 with an interval of 5. Fourth, the best recognition rate over the k values is taken as the recognition rate for the current p and reduced dimension. Finally, for each reduced dimension and each p, we obtain the corresponding recognition rate.

We regard the above process as one cycle. For a given p, we run 10 cycles, which yields 10 recognition rates for each subspace dimension; their average is taken as the recognition rate for the current p and subspace dimension. Finally, the best average recognition rate over the chosen reduced dimensions is reported as the final result for p training samples.
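A sketch of this evaluation protocol in Python; `fit_and_score` is a hypothetical callable that trains LLTSA-AE on the training indices with the given subspace dimension and neighborhood size k and returns the test recognition rate.

```python
import numpy as np

def random_split_per_class(labels, p, rng):
    """Randomly choose p training indices per class; the rest form the test set."""
    train = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        train.extend(rng.choice(idx, size=p, replace=False))
    train = np.asarray(train)
    test = np.setdiff1d(np.arange(len(labels)), train)
    return train, test

def run_protocol(labels, p, fit_and_score, dims=range(10, 105, 5),
                 ks=range(5, 30, 5), n_cycles=10, seed=0):
    """Average recognition rate per subspace dimension over n_cycles random splits;
    for each dimension, the best accuracy over the neighborhood sizes k is kept."""
    rng = np.random.default_rng(seed)
    dims = list(dims)
    acc = np.zeros((n_cycles, len(dims)))
    for c in range(n_cycles):
        train, test = random_split_per_class(labels, p, rng)
        for j, dim in enumerate(dims):
            acc[c, j] = max(fit_and_score(train, test, dim, k) for k in ks)
    mean_acc = acc.mean(axis=0)                  # average over the 10 cycles
    best = int(np.argmax(mean_acc))
    return dims[best], mean_acc[best]            # optimal dimension, best average rate
```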

Experimental results

On the four datasets, the best recognition rates, standard deviations, and optimal dimensions of the above eight methods are reported as follows.

Fig. 5

The recognition rates versus the subspace dimension on the Handwritten Alphadigits dataset (9 training samples)

Table 3 Best average recognition accuracy (in percent), standard deviation, and the optimal dimension (in parentheses) on the FERET dataset
Table 4 Best average recognition accuracy (in percent), standard deviation, and the optimal dimension (in parentheses) on the GT dataset

We first evaluate the performance of LLTSA-AE on the Handwritten Alphadigits dataset, which contains the 26 capital letters “A” through “Z”. The results are shown in Table 2 and Fig. 5.

As shown in Table 2, LLTSA-AE achieves the highest accuracy among the compared methods. Specifically, the accuracy of LLTSA-AE exceeds that of the original LLTSA by 10.68, 9.65, and 9.4%. Moreover, as the dimension increases, the accuracy of LLTSA-AE rises steadily, while other methods such as LLTSA and LPP obtain lower accuracy at higher dimensions.

The FERET dataset with 3, 4, and 5 training samples per subject is then used to evaluate the algorithm.

Fig. 6

The recognition rates versus the subspace dimension on the FERET dataset (5 training samples)

According to Table 3, LLTSA-AE also obtains the best performance. The recognition rates of LLTSA-AE are 53.50, 65.13, and 67.45%, which are higher than those of LLTSA by 10.2, 20.63, and 14.03%, respectively. Figure 6 also shows that the proposed method can extract more representative information at higher dimensions. It is worth noting that the proposed method does not perform as well as some other methods at lower dimensions; this is mainly because the stronger compression imposed by the LLTSA encoding at low dimensions may impair the reconstruction in the decoding stage.

The experimental results on the GT dataset are shown in Table 4 and Fig. 7.

Fig. 7

The recognition rates versus the subspace dimension on the GT dataset (11 training samples)

Table 5 Best average recognition accuracy (in percent), standard deviation, and the optimal dimension (in parentheses) on the Yale dataset

Based on Table 4 and Fig. 7, the recognition rate of LLTSA-AE is much higher than that of the conventional LLTSA algorithm. As shown in Fig. 7, the accuracy of LLTSA-AE still improves steadily at high dimensions, while the other methods show a fluctuating trend.

Finally, the Yale face dataset is employed to test the proposed algorithm, and the results are shown in Table 5 and Fig. 8.

Fig. 8

The recognition rates versus the subspace dimension on the Yale dataset (8 training samples)

On the Yale face dataset, LLTSA-AE still obtains the highest recognition rate among all compared methods. In detail, the recognition rates of LLTSA-AE are 76.19, 83.02, and 86.67%, which are higher than those of LLTSA by 11.05, 10.89, and 12.39%, respectively.

Fig. 9

The convergence curve of LLTSA-AE

The convergence of the proposed LLTSA-AE is also investigated. The convergence curves of LLTSA-AE on the four datasets are presented in Fig. 9, where the numbers of training samples are 9, 5, 11, and 8 for the Handwritten Alphadigits, FERET, GT, and Yale datasets, respectively, the neighborhood size is k = 10, and the subspace dimension is 40. As can be seen from Fig. 9, LLTSA-AE converges within only 30 steps on all four datasets. In addition, we briefly investigate the computation time of the algorithm by averaging over 10 runs under the same parameter settings as above. The time for each training and classification run on the Handwritten Alphadigits, FERET, GT, and Yale datasets is 0.13, 2.13, 2.51, and 0.61 s, respectively. Since only one training process is required to obtain the mapping in real-world applications, the computation time is acceptable.

Analysis

From the above experiments, the following conclusions can be drawn: (1) as an improved version of LLTSA, LLTSA-AE achieves a large improvement on all datasets; specifically, compared with the conventional LLTSA, LLTSA-AE improves the recognition rate by about 9, 14, 7, and 12% on the Handwritten Alphadigits, FERET, GT, and Yale datasets, respectively. (2) LLTSA-AE is also compared with LPP, NPE, LLTSA, and improved versions of LLTSA such as ALLTSA and OLLTSA; it shows improvements of varying degrees over all of them and achieves the best overall performance. (3) LLTSA-AE performs especially well at higher dimensions. This is mainly because lower dimensions imply stronger compression, which loses more information and makes it hard to recover the original information accurately; this in turn degrades the reconstruction stage and lowers the recognition rate. (4) As can be seen from Figs. 5, 6, 7, and 8, the recognition rates of LLTSA-AE significantly outperform those of the other methods, which shows that LLTSA-AE has the best performance.

Conclusion

LLTSA and all its variants consider only the one-way mapping from the high-dimensional space to the low-dimensional space, so the projected low-dimensional data may not effectively “represent” the original data. In this paper, a novel LLTSA method called LLTSA-AE (linear local tangent space alignment with autoencoder), based on the encoder–decoder paradigm, is proposed. LLTSA-AE aims to obtain an optimal linear mapping from the high-dimensional space to the low-dimensional space by considering the two-way mapping between the two spaces, which makes the obtained low-dimensional data represent the original samples accurately and effectively. The main idea of LLTSA-AE is to take the conventional projection of LLTSA as the encoding stage and to use a decoder to reconstruct the original data from the projected low-dimensional data. Compared with LLTSA, the reconstruction stage of LLTSA-AE makes the low-dimensional features “represent” the original samples more accurately and effectively. The experiments on the Handwritten Alphadigits, FERET, GT, and Yale datasets show that LLTSA-AE achieves the best performance among the compared methods.

The idea of LLTSA-AE can be further extended to other manifold learning methods, such as neighborhood preserving embedding (NPE). Moreover, it does not conflict with other extensions of LLTSA, which means the idea can also be employed in other improved versions of LLTSA.

The classical autoencoder is simple and easy to implement, but its learning ability is limited for complex learning systems, and several improved autoencoders have therefore been proposed. For example, the stacked autoencoder improves the learning ability by increasing the number of layers, and the sparse autoencoder addresses the problem that the traditional autoencoder may lose its ability to learn useful features automatically when the hidden layer has too many nodes. The autoencoder used in our algorithm can thus be further extended to these improved autoencoders. Furthermore, extending the autoencoder to other neural networks and combining them with the LLTSA algorithm is also a theoretically feasible direction.

Declarations

Novelty statement Based on the encoder–decoder paradigm, a new LLTSA method called LLTSA-AE (linear local tangent space alignment with autoencoder) is proposed in this paper. In this method, the conventional LLTSA projection is regarded as the encoding stage, and a decoder is used to reconstruct the original high-dimensional data from the projected low-dimensional data. Since LLTSA and all its variants consider only the one-way mapping from the high-dimensional space to the low-dimensional space, their projected low-dimensional data may not “represent” the original data accurately and effectively. Based on the encoder–decoder paradigm, the proposed LLTSA-AE not only makes the low-dimensional embedding “represent” the original data more accurately and effectively but also preserves the original manifold structure. The experimental results on the Handwritten Alphadigits, FERET, Georgia Tech (GT), and Yale datasets show that LLTSA-AE outperforms LLTSA and several of its representative variants.