Matrix variate deep belief networks with CP decomposition algorithm and its application

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Deep belief networks (DBNs) are used in many applications such as image processing and pattern recognition. However, they require the input data to be vectorized, which destroys the high-dimensional structure and discards valuable spatial information. The classical DBN is built from restricted Boltzmann machines (RBMs) with full connectivity between the visible and hidden units, so it has a large number of parameters and therefore needs a large number of training samples, which are often difficult to obtain in practice. To address this problem, this paper proposes a matrix-variate deep belief network (MVDBN) model built from matrix-variate RBMs whose parameters are constrained by a canonical polyadic (CP) decomposition. An MVDBN is composed of two or more matrix-variate restricted Boltzmann machines (MVRBMs), whose input and latent variables are both in matrix form. MVDBNs have far fewer model parameters, which allows deeper architectures that learn better features, are less prone to overfitting, and are easier to train accurately. We demonstrate the capability of MVDBNs on handwritten digit classification and medical image processing, and we also extend MVRBMs to a multimodal setting for image super-resolution.
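To make the parameter saving concrete, the sketch below shows a matrix-variate RBM whose four-way weight tensor is constrained by a rank-R CP decomposition. This is a minimal NumPy illustration, not the authors' implementation: the class name CPMVRBM, the factor matrices A, B, C, D, and the factorization W[i, j, k, l] = sum_r A[i, r] B[j, r] C[k, r] D[l, r] are assumptions made for exposition; the paper's exact formulation may differ.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class CPMVRBM:
        """Matrix-variate RBM with a rank-R CP-factored weight tensor (illustrative sketch).

        The 4-way weight tensor W[i, j, k, l] linking a visible matrix X (I x J)
        to a hidden matrix Y (K x L) is assumed to factor as
            W[i, j, k, l] = sum_r A[i, r] * B[j, r] * C[k, r] * D[l, r],
        so only (I + J + K + L) * R parameters are stored instead of I*J*K*L.
        """

        def __init__(self, I, J, K, L, R, seed=0):
            self.rng = np.random.default_rng(seed)
            scale = 0.01
            self.A = scale * self.rng.standard_normal((I, R))  # CP factor over visible rows
            self.B = scale * self.rng.standard_normal((J, R))  # CP factor over visible columns
            self.C = scale * self.rng.standard_normal((K, R))  # CP factor over hidden rows
            self.D = scale * self.rng.standard_normal((L, R))  # CP factor over hidden columns
            self.bv = np.zeros((I, J))  # visible bias matrix
            self.bh = np.zeros((K, L))  # hidden bias matrix

        def hidden_prob(self, X):
            # s[r] = a_r^T X b_r collapses the visible matrix against factor r;
            # the hidden pre-activation is then bh + C diag(s) D^T.
            s = np.einsum('ir,ij,jr->r', self.A, X, self.B)
            return sigmoid(self.bh + self.C @ np.diag(s) @ self.D.T)

        def visible_prob(self, Y):
            # t[r] = c_r^T Y d_r collapses the hidden matrix against factor r.
            t = np.einsum('kr,kl,lr->r', self.C, Y, self.D)
            return sigmoid(self.bv + self.A @ np.diag(t) @ self.B.T)

        def cd1_step(self, X, lr=0.1):
            # One contrastive-divergence (CD-1) step, sketched for the bias
            # terms only; gradients for A, B, C, D follow the same pattern.
            ph = self.hidden_prob(X)
            h = (self.rng.random(ph.shape) < ph).astype(float)
            Xr = self.visible_prob(h)
            self.bv += lr * (X - Xr)
            self.bh += lr * (ph - self.hidden_prob(Xr))
            return Xr

Under this assumed factorization the model stores (I + J + K + L) * R parameters rather than the I * J * K * L of an unconstrained four-way tensor, which is the source of the sample-efficiency argument in the abstract; stacking two or more such MVRBMs, each trained greedily on the hidden matrices of the layer below, yields the MVDBN.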

Notes

  1. http://yann.lecun.com/exdb/mnist/.

  2. https://github.com/rasmusbergpalm/DeepLearnToolbox/.

  3. http://decsai.ugr.es/cvg/dbimagenes/gbio256.php.

  4. http://decsai.ugr.es/cvg/dbimagenes/c512.php.


Acknowledgements

We thank Dr. Zaiwen Liu and all our reviewers for their valuable discussions and suggestions.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) [Grant Number 11901319].

Author information

Corresponding author

Correspondence to Guanglei Qi.

Additional information

Communicated by I. IDE.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

X. Hou, G. Qi: Joint first author.

About this article

Cite this article

Hou, X., Qi, G. Matrix variate deep belief networks with CP decomposition algorithm and its application. Multimedia Systems 26, 571–583 (2020). https://doi.org/10.1007/s00530-020-00666-5
