An Unsupervised Deep Learning Framework via Integrated Optimization of Representation Learning and GMM-Based Modeling

  • Jinghua Wang
  • Jianmin Jiang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11361)


While supervised deep learning has achieved great success in a range of applications, relatively little work has studied the discovery of knowledge from unlabeled data. In this paper, we propose an unsupervised deep learning framework as a potential solution to the problem that existing deep learning techniques require large labeled data sets for training. The proposed framework introduces a new principle of jointly learning deep representations and GMM (Gaussian Mixture Model)-based deep modeling, and an integrated objective function is proposed to implement this principle. In comparison with existing work in similar areas, our objective function has two learning targets, which are jointly optimized to achieve the best possible unsupervised learning and knowledge discovery from unlabeled data sets. Maximizing the first target enables the GMM to model the data representations as well as possible, with each Gaussian component corresponding to a compact cluster; maximizing the second target enhances the separability of the Gaussian components and hence increases the inter-cluster distances. As a result, the compactness of clusters is significantly enhanced by reducing the intra-cluster distances, and the separability is improved by increasing the inter-cluster distances. Extensive experimental results show that the proposed method improves clustering performance compared with benchmark methods.
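The two learning targets described in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the actual formulas, so everything here is an assumption made for illustration: a diagonal-covariance GMM, the mean squared distance between component means as a stand-in for the separability target, and the hypothetical names `gmm_log_likelihood`, `mean_separation`, `joint_objective` and the weight `lam`. It is not the authors' objective function.

```python
import numpy as np


def gmm_log_likelihood(Z, weights, means, variances):
    """Mean log-likelihood of representations Z under a diagonal GMM.

    Z: (n, d) representations, weights: (k,), means: (k, d), variances: (k, d).
    Assumption: diagonal covariances, chosen only to keep the sketch short.
    """
    diff = Z[:, None, :] - means[None, :, :]                        # (n, k, d)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (k,)
    log_pdf = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_joint = np.log(weights)[None, :] + log_pdf                  # (n, k)
    # Numerically stable log-sum-exp over the k components.
    m = log_joint.max(axis=1, keepdims=True)
    log_mix = m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))
    return log_mix.mean()


def mean_separation(means):
    """Average squared Euclidean distance between distinct component means.

    Assumption: a simple proxy for 'separability of the Gaussian components'.
    """
    diff = means[:, None, :] - means[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=2)
    upper = np.triu_indices(means.shape[0], k=1)   # each pair counted once
    return sq_dist[upper].mean()


def joint_objective(Z, weights, means, variances, lam=0.1):
    """Sum of the two targets; `lam` is a hypothetical trade-off weight."""
    return gmm_log_likelihood(Z, weights, means, variances) + lam * mean_separation(means)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two synthetic clusters standing in for learned representations.
    Z = np.vstack([rng.normal(-2.0, 0.5, size=(50, 2)),
                   rng.normal(2.0, 0.5, size=(50, 2))])
    weights = np.array([0.5, 0.5])
    means = np.array([[-2.0, -2.0], [2.0, 2.0]])
    variances = np.full((2, 2), 0.25)
    print(joint_objective(Z, weights, means, variances))
```

In a full framework the representations `Z` would come from a deep network and both targets would be maximized by gradient ascent over the network and GMM parameters. A raw distance term like this one is unbounded, so a practical version would presumably need some normalization or constraint.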


Unsupervised clustering · Representation learning · Gaussian Mixture Model · Deep learning



The authors wish to acknowledge the financial support from: (i) Natural Science Foundation China (NSFC) under the Grant No. 61620106008; (ii) Natural Science Foundation China (NSFC) under the Grant No. 61802266; and (iii) Shenzhen Commission for Scientific Research & Innovations under the Grant No. JCYJ20160226191842793.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Research Institute for Future Media Computing, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China
