Heuristic rank selection with progressively searching tensor ring network

Li, Nannan; Pan, Yu; Chen, Yaran; Ding, Zixiang; Zhao, Dongbin; Xu, Zenglin

doi:10.1007/s40747-021-00308-x

Heuristic rank selection with progressively searching tensor ring network

Original Article
Open access
Published: 17 March 2021

Volume 8, pages 771–785, (2022)
Cite this article

Download PDF

You have full access to this open access article

Complex & Intelligent Systems Aims and scope Submit manuscript

Heuristic rank selection with progressively searching tensor ring network

Download PDF

Nannan Li^1,2^na1,
Yu Pan³^na1,
Yaran Chen^1,2,
Zixiang Ding^1,2,
Dongbin Zhao ORCID: orcid.org/0000-0001-8218-9633^1,2 &
…
Zenglin Xu^3,4

2133 Accesses
22 Citations
Explore all metrics

Abstract

Recently, tensor ring networks (TRNs) have been applied in deep networks, achieving remarkable successes in compression ratio and accuracy. Although highly related to the performance of TRNs, rank selection is seldom studied in previous works and usually set to equal in experiments. Meanwhile, there is not any heuristic method to choose the rank, and an enumerating way to find appropriate rank is extremely time-consuming. Interestingly, we discover that part of the rank elements is sensitive and usually aggregate in a narrow region, namely an interest region. Therefore, based on the above phenomenon, we propose a novel progressive genetic algorithm named progressively searching tensor ring network search (PSTRN), which has the ability to find optimal rank precisely and efficiently. Through the evolutionary phase and progressive phase, PSTRN can converge to the interest region quickly and harvest good performance. Experimental results show that PSTRN can significantly reduce the complexity of seeking rank, compared with the enumerating method. Furthermore, our method is validated on public benchmarks like MNIST, CIFAR10/100, UCF11 and HMDB51, achieving the state-of-the-art performance.

Deep Convolutional Neural Network Compression Method: Tensor Ring Decomposition with Variational Bayesian Approach

Article Open access 13 March 2024

Framework for the Training of Deep Neural Networks in TensorFlow Using Metaheuristics

A Collective Neurodynamic Optimization Approach to Nonnegative Tensor Decomposition

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Deep neural networks have made great successes in various areas, such as image classification [6, 15, 21], autonomous driving [1, 14, 30], game artificial intelligence [16, 20] and so on [13, 24, 25]. However, parameters redundancy leads to two major drawbacks for deep neural networks: (1) difficult training, and (2) poor ability to run on resource-constrained devices (e.g., mobile phones [7] and internet of things (IoT) devices [11]). To address these problems, Tensor Ring (TR) has been introduced to deep neural networks. With a ring-like structure as shown in Fig. 3, TR can significantly reduce the parameters of convolutional neural network (CNN) [26] and recurrent neural network (RNN) [17], and even can achieve better results than uncompressed models in some tasks. Thus, tensor ring is increasingly being researched.

However, as the crucial component of tensor ring, setting of rank (e.g. $R_{0} \sim R_{3}$ in Fig. 3) is seldom investigated. In most of the existing works, it merely sets to be equal in whole network [26]. Such an equal setting requires multiple manual attempts for a feasible rank value and often leads to a weak result. Fortunately, as shown in our synthetic experiment, we discover the relationship between the rank distribution and its performance. Experimental results demonstrate the link that part of rank elements with good performance will gather to the interest region. Then we extend this phenomenon to build our Hypothesis 1. Utilizing the hypothesis, we design a heuristic algorithm to explore the potential power of tensor ring.

Specifically, we propose progressive searching tensor ring network (PSTRN) inspired by neural architecture search (NAS) [31]. Similarly, our approach is divided into three parts,

search space: combinations of rank element candidates for TRN in evolutionary phase;
search strategy: the Non-dominated Sorted Genetic Algorithm-II (NSGA-II) [3] to search rank;
performance estimation strategy: stochastic gradient descent to train TRN.

The overall framework of PSTRN is illustrated in Fig. 1. In the searching process, we initialize search space first. Then through evolutionary phase, we derive optimized rank within search space. Next, to draw near interest region, the proposed approach shrinks the bound of search space to the around of optimized rank during progressive phase. By alternately executing evolutionary phase and progressive phase, our algorithm can find rank with high performance. Additionally, on large-scale models (i.e. ResNet20/32 [6] and WideResNet28-10 [29]), the performance estimation is time-consuming, which is harmful to search speed. So we employ a weight inheritance strategy [18] to accelerate the evaluation of rank.

Experimental results prove that PSTRN can obtain optimal rank of TRN according to Hypothesis 1. And our algorithm can compress LeNet5 [12] with compression ratio as 16x and 0.49% error rate in MNIST [4] image classification task. In TR-ResNets, our approach can achieve state-of-the-art performance on CIFAR10 and CIFAR100 [9]. PSTRN also exceeds TR-LSTM models that set rank elements equal on HMDB51 and UCF11. Furthermore, compared with the enumerating method, our work can greatly reduce the complexity of seeking rank. Overall, our contributions can be summarized as follows:

1.
PSTRN can search rank automatically instead of manual setting. At the meantime, The time cost is reduced significantly by progressively searching, compared with an enumerating method.
2.
To speed up the search on large-scale model, our proposed method adopts weight inheritance into the search process. And the proposed method achieves about $200 \times $ speed-up ratio on classification tasks of CIFAR10/100 datasets.
3.
As a heuristic approach based on Hypothesis 1, our algorithm can achieve better performance with fewer parameters than existing works. All the experimental results demonstrate the rationality of the hypothesis that is first found by us.

Background

In this section, we will introduce the tensor background and some related works that consist of rank fixed method and rank selection method. The rank fixed method is the work that sets rank manually, while rank selection method means the work of learning the rank.

Tensor background

In this part, we would like to introduce the background of tensor.

Notation

A tensor is a high-order array. In this paper, a d-order tensor $\varvec{{\mathcal {T}}} \in {\mathbb {R}}^{L_1\times L_2 \cdots \times L_d} $ is denoted by a boldface Euler script letter. With all subscripts fixed, each element of a tensor is expressed as: $\varvec{{\mathcal {T}}}_{l_1,l_2,\ldots l_d}\in {\mathbb {R}}$. Given a subset of subscripts, we can get a sub-tensor. For example, given a subset $\{L_1=l_1, L_2=l_2\}$, we can obtain a sub-tensor $\varvec{{\mathcal {T}}}_{l_1, l_2} \in {\mathbb {R}}^{L_3 \cdots \times L_d}$. Figure 2 draws the tensor diagrams that present the graphical notations and the essential operations.

Tensor contraction

Tensor contraction can be performed between two tensors if some of their dimensions are matched. As shown in Fig. 2b, given two 4-order tensors $\varvec{{\mathcal {A}}}\in {\mathbb {R}}^{I_1\times I_2\times I_3\times I_4}$ and $\varvec{{\mathcal {B}}} \in {\mathbb {R}}^{J_1\times J_2 \times J_3 \times J_4}$, when $I_3 = D_1 = J_1$ and $I_4 = D_2 =J_2$, the contraction between these two tensors results in a tensor with the size of $I_1\times I_2 \times J_3 \times J_4$, where the matching dimension is reduced, as shown in equation:

$$\begin{aligned} (\varvec{{\mathcal {A}}}\varvec{{\mathcal {B}}})_{i_1, i_2, j_3, j_4} = \sum _{m=1}^{D_1}\sum _{n=1}^{D_2} \varvec{{\mathcal {A}}}_{i_1, i_2, m, n }\varvec{{\mathcal {B}}}_{m, n, j_3, j_4}. \end{aligned}$$

(1)

Tensorization

Given a matrix $\mathbf {M} \in {\mathbb {R}}^{ I \times O}$, we transfer it into a new tensor

$$\begin{aligned} \varvec{{\mathcal {C}}} \in {\mathbb {R}}^{I_0 \times I_1 \times \cdots \times I_{M-1} \times O_0 \times O_1 \times \cdots \times O_{N-1}} \end{aligned}$$

satisfying the equation:

$$\begin{aligned} \prod _{i=1}^{M}I_i = I,\quad \prod _{j=1}^{N}O_j = O, \end{aligned}$$

(2)

where M, N are the number of the input nodes and output nodes respectively. Therefore, a corresponding element of $\mathbf {W}_{i, o}$ is $\varvec{{\mathcal {C}}}_{i_0, \ldots , i_{m-1}, o_0, \ldots , o_{n-1}}$, where $i \in \{1, \dots , I\}$, $o \in \{1, \ldots , O\}$, $i_*\in \{1, \dots , I_*\}$, $o_*\in \{1, \ldots , O_*\}$ are indexes, following the rule^{Footnote 1}

$$\begin{aligned} i = \sum ^{M}_{u=1}{\prod ^{i_u-1}_{v=1}{I_v}(i_u-1)}, o = \sum ^{N}_{u=1}{\prod ^{o_u-1}_{v=1}{O_v}(o_u-1)}. \end{aligned}$$

(3)

Tensor ring format (TRF)

TRF is constructed with a series of 3-order nodes linked one by one, forming a ring-like structure. The TRF of a d-order tensor can be formulated as

$$\begin{aligned} \varvec{{\mathcal {T}}}_{l_1,l_2,\ldots , l_d} = \sum ^{R_0, R_1, \ldots , R_{d-1}}_{r_0, r_1, \ldots , r_{d-1}} \varvec{{\mathcal {Z}}}^{(1)}_{r_0,l_1, r_1}\varvec{{\mathcal {Z}}}^{(2)}_{r_1, l_2, r_2} \dots \varvec{{\mathcal {Z}}}^{(d)}_{r_{d-1}, l_d,r_0}, \end{aligned}$$

where ${\mathbf {R}} = \{R_i | i \in \{0, 1, \dots , d-1\}\}$ denotes the rank of TRF, and the symbol $\varvec{{\mathcal {Z}}}$ represents the tensor ring node. Figure 3 shows a graph structure of a simple TRF.

Through replacing layers(e.g. convolutional layer, fully-connected layer) of a network with TRF, we can derive a TRN.

Rank fixed

Tensor ring decomposition has been successfully applied to the compression of deep neural networks. Wenqi et al. [26] compress both the fully connected layers and the convolutional layers of CNN with the equal rank elements for whole network. Yu et al. [17] replace the over-parametric input-to-hidden layer of LSTM with TRF, when dealing with high-dimensional input data. Rank of these models are determined via multiple manual attempts by manipulation, which requires much time.

Rank selection

In this part, we would like to introduce the works of rank selection. Yerlan et al. [28] formulate the low-rank compression problem as a mixed discrete-continuous optimization jointly over the rank elements and over the matrix elements. Zhiyu et al. [2] propose a novel rank selection scheme for tensor ring, which apply deep deterministic policy gradient to control the selection of rank. Their algorithms calculate the optimal rank directly from the trained weight matrix without the analysis of rank. Different from them, our approach is inspired by the relevance between the rank distribution and performance of Hypothesis 1, towards a better result.

Methodology

To verify the optimization of PSTRN on TRN, we choose two most commonly used deep neural networks for evaluation, i.e. tensor ring CNN (TR-CNN) and tensor ring LSTM (TR-LSTM).

In the section, we first present preliminaries of TR-CNN and TR-LSTM, including graphical illustrations of the two TR-based models. Then we elaborate on evolutionary phase and progressive phase of PSTRN. The implementation of weight inheritance will be given in final.

Preliminaries

TR-CNN

For a convolutional core $\varvec{{\mathcal {C}}} \in {\mathbb {R}}^{K \times K \times C_\mathrm{in} \times C_\mathrm{out}}$ where K denotes the kernel size, $C_{in}$ means the input channel and $C_{out}$ represents the output channel. We first reshape it as $\hat{\varvec{{\mathcal {C}}}} \in {\mathbb {R}}^{K \times K \times I_1 \times \cdots \times I_{\alpha } \times O_1 \times \cdots \times O_{\beta }}$, satisfying the rule

$$\begin{aligned} C_\mathrm{in} = \prod ^{\alpha }_{i=1}{I_i},\quad C_\mathrm{out} = \prod ^{\beta }_{j=1}{O_j}. \end{aligned}$$

(4)

Then we decompose it into input nodes $\varvec{{\mathcal {U}}}^{(i)} \in {\mathbb {R}}^{{R_{i-1}}\times {I_{i}}\times {R_{i}}}, i \in \{1, 2, \ldots , {\alpha }\}$, output nodes $\varvec{{\mathcal {V}}}^{(j)} \in {\mathbb {R}}^{{R_{{\alpha }+j}}\times {O_{i}}\times {R_{{\alpha }+j+1}}}, j \in \{1, 2, \ldots , {\beta }\}$ and one convolutional node $\varvec{{\mathcal {G}}} \in {\mathbb {R}}^{K\times K\times {R_{\alpha }}\times {R_{{\alpha }+1}}}$, where $R_{{\alpha }+{\beta }+1} = R_0$. An instance (${\alpha }=2$, ${\beta }=2$) is illustrated in Fig. 4a. And the compression ratio of TR-CNN is calculated as

$$\begin{aligned} C_\mathrm{CNN} = \frac{K^2C_\mathrm{in}C_\mathrm{out}}{\sum ^{\alpha }_{i=1}{R^2{I_{i}}} +\sum ^{\beta }_{j=1}{R^2{O_{j}}} +K^2R^2}, \end{aligned}$$

(5)

where R is a simplification of rank element. The TR-CNN is proposed by Wenqi et al. [26].

TR-LSTM

By replacing each matrix of the affine matrices $\mathbf {W}_{*} \in {\mathbb {R}}^{I \times O}$ of input vector $x \in {\mathbb {R}}^I$ with TRF in LSTM, we implement the TR-LSTM model as introduced by Yu et al. [17]. Similar to TR-CNN, the nodes are combined by input nodes $\varvec{{\mathcal {U}}}^{(i)}$ and output nodes $\varvec{{\mathcal {V}}}^{(j)}$, and the decomposition needs to follow

$$\begin{aligned} I = \prod ^{\alpha }_{i=1}{I_i},\quad O = \prod ^{\beta }_{j=1}{O_j}. \end{aligned}$$

(6)

A 6-node example is shown in Fig. 4b. Compression ratio of TR-LSTM can be computed as

$$\begin{aligned} C_\mathrm{RNN} = \frac{IO}{\sum ^{\alpha }_{i=1}{R^2I_i}+\sum ^{\beta }_{j=1}{R^2O_j}}. \end{aligned}$$

(7)

Progressive searching tensor ring network

In our search process, the rank ${\mathbf {R}}$ of a TRN is formulated as

$$\begin{aligned} {\mathbf {R}} = \{R_{0}, R_{1}, \ldots , R_{d-1} | R_{*} \in \{r_1, r_2, \ldots , r_{m}\}\}, \end{aligned}$$

(8)

where d is the number of rank elements, $r_{*}$ is a rank element candidate, and m is the quantity of rank element candidates. Full combinations of the rank elements (i.e. state space) can be calculated as

$$\begin{aligned} S_{\mathrm{state}} = {m}^d. \end{aligned}$$

(9)

Next, we would like to introduce Hypothesis 1, the extension of the aforementioned gathering phenomenon.

Hypothesis 1

When a shape-fixed TRN performs well, part or all of its rank elements are sensitive and each of them will tend to aggregate in a narrow region, which is called interest region.

According to Hypothesis 1, the optimal rank can be found in the interest region. It is a more efficient and accurate way to find a optimal rank in interest region rather than a much wider range of the whole rank element candidates. Thus, we build an pipeline of PSTRN to achieve the purpose by two alternative procedures:

Evolutionary phase: finding good models in the search space and locating the interest region through well-performed models.
Progressive phase: calculating the width of a rough approximation of interest region and defining search space within this region.

Through these two procedures, the rank of a TRN will approach the interest region and be optimized. Additionally, we apply weight inheritance to accelerate the training process. The pseudocode of the algorithm is shown as below, where P is the number of progressive phase, and G is the generations of each evolutionary phase.

Evolutionary phase

As described in Hypothesis 1 that well-performed models aggregate in interest region, good models keep a high probability of appearing in interest region. Therefore, we determine interest region around the models with high performance.

In PSTRN, we adopt multi-objectives genetic algorithm NSGA-II [3] to search for TR-based models with high performance and few parameters.

A typical genetic algorithm requires two prerequisites, the genetic representation of solution domain (i.e. search space), and the fitness functions (i.e. classification accuracy and compression ratio) that is used to evaluate each individual. In the process, an individual means the rank ${\mathbf {R}}$ and each rank element $R_{*}$ is in $\{\hat{r}_1, \hat{r}_2, \ldots , \hat{r}_n\}$ that is sampled from the whole rank element candidates. The search space is a sub-space of the state space and can be calculated as

$$\begin{aligned} S_{\mathrm{search}} = n^d. \end{aligned}$$

(10)

The method of choosing the search space will be introduced in the progressive phase. Classification accuracy is obtained by testing the model on the test dataset. And compression ratio of TR-CNN and TR-LSTM are calculated by Eqs. (5) and (7), respectively.

The key idea of the genetic algorithm is to evolve individuals via some genetic operations. At each evolutionary generation, the selection process preserves strong individuals as a population and then sorts them according to their fitness function, while eliminating weak ones. The retained strong individuals reproduce new children through mutation and crossover with a certain probability. After this, we obtain the new population consists of the new children and the retained strong individuals. The new population executes the evolution to derive next generation. When the termination condition is met, evolutionary phase stops and optimization of the rank will be completed. Eventually, taking the top-k individuals into consider, we derive the most promising rank element $\hat{R}_{*}$ by

$$\begin{aligned} \hat{R}_{*} = floor\left( \frac{1}{k}{\sum ^k_{i=1}{R_{*, i}}}\right) , \end{aligned}$$

(11)

where $R_{*, i}$ is a rank element of the i-th individual and floor denotes the rounding down operation. The interest region is around the $\hat{R}_{*}$.

Progressive phase

Progressive phase is used to determine the next search space as shown in Fig. 5. At the begining of the PSTRN, we first obtain initial search space via sampling from the state space at equal intervals as below:

$$\begin{aligned} \{R_\mathrm{min}+b_{1}, R_\mathrm{min}+2b_{1}, \ldots , R_\mathrm{min}+nb_{1}\}, \end{aligned}$$

(12)

where $R_\mathrm{min}$ is the minimum of rank element candidates, and $b_{1}$ is the initial sampling interval. Then through carrying out evolutionary phase within initial search space, we derive the promising rank

$$\begin{aligned} {\hat{\mathbf {R}}} = \{\hat{R}_{0,1}, \hat{R}_{1,1}, \ldots , \hat{R}_{d-1,1}\}, \end{aligned}$$

(13)

where $\hat{R}_{i,j}, i \in \{0, 1, 2, \ldots , d-1\}, j \in \{2, 3, \dots , P\}$ denotes the ith promising rank element in jth evolutionary phase. Based on the optimized rank, our PSTRN shrinks bound of search space to

Low bound: $\min (\hat{R}_{i,j-1}-s_{j}, R_{min})$,
High bound: $\max (\hat{R}_{i,j-1}+s_{j}, R_{max})$,

where $R_\mathrm{max}$ is the maximum of rank element candidates, and $\{s_j | j\in \{2, 3, \ldots , P\}\}$ is the offset coefficient and usually sets to $b_{j-1}$. Thus, the rank element candidates of the next search space can be expressed as

$$\begin{aligned} \{\hat{R}_{i,j-1}-s_{j}+b_{j}, \hat{R}_{i,j-1}-s_{j}+2b_{j}, \ldots , \hat{R}_{i,j-1}-s_{j}+nb_{j}\}, \end{aligned}$$

(14)

where $b_{j}$ is the sampling interval of the jth progressive phase, satisfying

$$\begin{aligned} b_{j+1} \le b_j, j \in \{1, 2, \ldots , P\}. \end{aligned}$$

(15)

The interval $b_{j}$ is gradually reduced, and when $b_{j}$ decreases to 1, the progressive phase will stop.

In addition, considering the above Hypothesis 1 cannot be proved by theory, the progressive genetic algorithm may fall into local optima. Therefore, we adds an exploration mechanism to the algorithm. Concretely speaking, except for the initial phase, the algorithm has a $10\%$ probability to choose rank within the search space in the previous evolutionary phase.

In the above evolutionary phase, the solution domain is one of the key components. Generally speaking, it will try to cover all possible states. Such an excessive solution domain may lead to the divergence of search algorithm. Compared with full state space, our algorithm can improve the search process in computational complexity significantly.

Weight inheritance

During evolutionary phase, to validate the performance, the searched TRN needs to be fully trained, which is the most time-consuming part of the search process. On MNIST, we can train searched TR-LeNet5 from scratch because of its fast convergence. But the training speed is slow on ResNets. Thus, we employ weight inheritance as a performance estimation acceleration strategy, which is inspired by the architecture evolution [18].

In our algorithm, to inherit trained weight directly, the rank $R=\{R^k_i | i \in [0, 1, \ldots , d-1]\}$ of kth layer needs to follow

$$\begin{aligned} R^k_0=R^k_1=\cdots =R^k_{d-1}=V_k. \end{aligned}$$

(16)

Obviously, the number of rank elements to be searched is reduced to k from d. For the kth layer, we will load the checkpoint whenever possible. Namely, if the kth layer matches $V_k$, the weights are preserved. Such a method is called warm-up.

During search process, we directly inherit the weights trained in warm-up stage and fine-tune the weights for each searched TRN. Instead of training from scratch, fine-tuning the trained weights can greatly resolve the time-consuming problem. For example, training ResNet20 on CIFAR10 from scratch requires about 200 epochs. On the contrary, our training with fine-tuning only needs 1 epoch, which brings the acceleration of 200$\times $.

Experiments

In this section, we conduct experiments to verify the effectiveness of PSTRN. First, to display the relation between the rank elements and performance of TR-based models, we design the synthetic experiment. Then we estimate the effect of the searched TR-based models on prevalent benchmarks. The optimization objectives of NSGA-II [3] are classification performance and compression ratio, namely PSTRN-M. In addition, to gain the TR-based model with high performance, we also conduct optimization algorithm PSTRN-S that only consider classification accuracy. In the tables of all experiments, the best results of the compressing models in the same magnitude are denoted in bold. All the experiments are implemented on Nvidia Tesla V100 GPUs.^{Footnote 2}

Synthetic experiment

Previous works of rank search lack of heuristic method, so they derive rank elements depending on decomposition, which limits the exploration of searching rank. Hypothesis 1 would bring a promising way to solve this problem, and we would like show the phenomenon of interest region in a synthetic experiment.

Experimental setting Given a low-rank weight matrix $\mathbf{W} \in {\mathbb {R}}^{144\times 144}$. We generate 5000 samples, and each dimension follows the normal distribution, i.e. $x \sim {\mathcal {N}}(0, 0.05\mathbf{I} )$, where $\mathbf{I} \in {\mathbb {R}}^{144}$ is the identity matrix. Then we generate the label y according to $y = \mathbf{W} (x+\epsilon )$ for each x, where $\epsilon \sim {\mathcal {N}}(0, 0.05\mathbf{I} )$ is a random Gaussian noise. Data pairs of $\{x, y\}$ constitute the dataset. We divide it into 4000 samples as the training set and 1000 samples as the testing set. For the model, we constructed the TR-linear model by replacing the $\mathbf{W} \in {\mathbb {R}}^{144 \times 144}$ with a TRF $\in {\mathbb {R}}^{12 \times 12 \times 12 \times 12}$. Then we train the TR-linear model with different ranks to completion, and validate the performance on the testing set with mean-square error (MSE) between the prediction $\hat{y}$ and label y. The rank is denoted as ${\mathbf {R}} = \{R_0, R_1, R_2, R_3 | R_{*} \in \{3, 4, \dots , 15\}\}$.

In the experiment, optimizer sets to Adam with a learning rate $1e-2$, MSE is adopted as loss function and batch size is 128. The total epoch is 100, and the learning rate decreases 90% every 30 epoch. For a comparison, we run the enumerating results as the baseline, which needs $13^4=28561$ times training.

Experimental results Figure 6 shows the distribution of top-100 rank elements sorted by the value of loss. The size of the circle denotes the number of models who has the same two rank elements. And the circle color represents ranking. It shows that it is not ideal to set each rank element the same arbitrarily. We calculate the mean $\mu $(3.6) and standard variance $\delta $(0.96) of top-100 models, and derive the interest region $[\mu - \delta , \mu + \delta ]$([2.64, 4.56]). Obviously, $R_1$ mostly distributes in the interest region. Should be noted that other rank elements do not show an apparent phenomenon, for the reason that they do not play a critical role in the performance. Our model can find the interest region that is important to the ability of models and achieve good results.

As shown in Fig. 7a, the approximation of interest region gradually approaches groundtruth, which demonstrates that PSTRN can locate interest region precisely. As illustrated in Fig. 7b, our model can find the best rank in the second phase, which proves the powerful capacity of PSTRN algorithm. Compared with 28561 enumerating results, we only need $n\_gen \times pop\_size \times P=10 \times 20 \times 3=600$ times training, which is much smaller. $pop\_size$ and $n\_gen$ are the population size and the number of generations, respectively. Undoubtedly, our PSTRN can find the optimal rank efficiently and precisely.

Table 1 The comparison of PSTRN and NSGA-II

Heuristic rank selection with progressively searching tensor ring network

Abstract

Similar content being viewed by others

Deep Convolutional Neural Network Compression Method: Tensor Ring Decomposition with Variational Bayesian Approach

Framework for the Training of Deep Neural Networks in TensorFlow Using Metaheuristics

A Collective Neurodynamic Optimization Approach to Nonnegative Tensor Decomposition

Introduction

Background

Tensor background

Notation

Tensor contraction

Tensorization

Tensor ring format (TRF)

Rank fixed

Rank selection

Methodology

Preliminaries

TR-CNN

TR-LSTM

Progressive searching tensor ring network

Hypothesis 1

Evolutionary phase

Progressive phase

Weight inheritance

Experiments

Synthetic experiment

Experiments on MNIST and Fashion MNIST

Experiments on CIFAR10 and CIFAR100

Experiments on HMDB51 and UCF11

Remark

Conclusion

Notes

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding authors

Ethics declarations

Conflict of interest

Availability of data and material

Code availability

Additional information

Publisher's Note

Appendix

Appendix

A.1: TR-LeNet5 on MNIST and Fashion MNIST

A.2: TR-ResNets on CIFAR10 and CIFAR100

A.3: TR-LSTM on HMDB51

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation