Online Learning of Deep Hybrid Architectures for Semi-supervised Categorization

  • Alexander G. Ororbia II
  • David Reitter
  • Jian Wu
  • C. Lee Giles
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9284)


A hybrid architecture is presented that is capable of online learning from both labeled and unlabeled samples. It combines generative and discriminative objectives to derive a new variant of the Deep Belief Network: the Stacked Boltzmann Experts Network. The model's training algorithm builds on principles developed for hybrid discriminative Boltzmann machines and composes deep architectures in a greedy, layer-wise fashion, exploiting the model's inherent "layer-wise ensemble" nature to perform classification. We (1) compare this architecture against a hybrid denoising-autoencoder version of itself as well as several other models, and (2) investigate training in the context of an incremental learning procedure. The best-performing hybrid model, the Stacked Boltzmann Experts Network, consistently outperforms all others.
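The hybrid Boltzmann machines the abstract refers to pair a discriminative objective, the cross-entropy of p(y|x), with a generative term weighted by a mixing coefficient and typically estimated with contrastive divergence. The discriminative part of a classification RBM admits a closed form: for each class y, the score is the label bias plus a sum of softplus terms over the hidden units. The sketch below illustrates only that closed-form predictive distribution; the parameter names and shapes are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return np.logaddexp(0.0, z)

def predict_proba(x, W, U, b_h, b_y):
    """p(y|x) for a classification RBM (closed form).

    Assumed shapes (hypothetical, for illustration):
      W   : (H, D) visible-to-hidden weights
      U   : (H, K) label-to-hidden weights
      b_h : (H,)   hidden biases
      b_y : (K,)   label biases
    For each class y:
      score(y) = b_y[y] + sum_j softplus(b_h[j] + U[j, y] + W[j] @ x)
    """
    pre = b_h[:, None] + U + (W @ x)[:, None]   # (H, K) pre-activations
    scores = b_y + softplus(pre).sum(axis=0)    # (K,) unnormalized log-probs
    scores -= scores.max()                      # stabilize the softmax
    p = np.exp(scores)
    return p / p.sum()
```

A hybrid training step would minimize -log p(y|x) plus alpha times a generative term (e.g., a CD-1 approximation of -log p(x, y)); the online setting in the paper applies such updates one sample at a time.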


Keywords: Restricted Boltzmann machines, Denoising autoencoders, Semi-supervised learning, Incremental learning, Hybrid architectures





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Alexander G. Ororbia II (1)
  • David Reitter (1)
  • Jian Wu (1)
  • C. Lee Giles (1)
  1. College of Information Sciences and Technology, The Pennsylvania State University, State College, USA
