
Autonomous Learning of Representations

  • Technical Contribution
KI - Künstliche Intelligenz

Abstract

Besides the core learning algorithm itself, a major question in machine learning is how best to encode the training data so that the learning technique can learn from it efficiently and generalize to novel data. While classical approaches often rely on a hand-coded data representation, autonomous representation learning, or feature learning, plays a major role in modern learning architectures. This contribution gives an overview of different principles of autonomous feature learning and illustrates two of them with recent examples: autonomous metric learning for sequences, and autonomous learning of a deep representation for spoken language.
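One common way to make a distance between sequences learnable is to treat the costs of an edit distance as free parameters. The minimal sketch below, in plain Python, shows only that parameterized distance; the names `edit_distance`, `sub_cost` and `gap_cost` are hypothetical, and the rule by which a metric learner would adapt the costs (for example, by optimizing a classification loss) is deliberately not shown and is not claimed to be the authors' method.

```python
from typing import Dict, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str],
                  sub_cost: Dict[Tuple[str, str], float],
                  gap_cost: float = 1.0) -> float:
    """Alignment (Levenshtein-style) distance with parametric costs.

    sub_cost maps a (symbol_from_a, symbol_from_b) pair to its replacement
    cost; pairs not listed default to 0 for identical symbols, 1 otherwise.
    These entries (and gap_cost) are what a metric learner would adapt.
    """
    n, m = len(a), len(b)
    # d[i][j] holds the cheapest cost of aligning a[:i] with b[:j].
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + gap_cost  # delete all of a[:i]
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + gap_cost  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            x, y = a[i - 1], b[j - 1]
            rep = sub_cost.get((x, y), 0.0 if x == y else 1.0)
            d[i][j] = min(d[i - 1][j - 1] + rep,   # match or replace
                          d[i - 1][j] + gap_cost,  # delete x from a
                          d[i][j - 1] + gap_cost)  # insert y into a
    return d[n][m]


if __name__ == "__main__":
    # With default costs, "cat" and "cut" differ by one replacement.
    print(edit_distance("cat", "cut", {}))  # 1.0
    # Hypothetical learned costs that treat 'a' and 'u' as near-equivalent.
    learned = {("a", "u"): 0.1, ("u", "a"): 0.1}
    print(edit_distance("cat", "cut", learned))  # 0.1
```

The point of the design is that the dynamic program stays fixed while the cost table changes: adapting a few substitution costs reshapes which sequences count as close, which is exactly the kind of representation choice the abstract proposes to learn rather than hand-code.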




Acknowledgments

This work was supported in part by the Deutsche Forschungsgemeinschaft under contract nos. Ha 3455/9-1 and Ha 2719/6-1 within the Priority Program SPP1527 "Autonomous Learning".

Author information


Correspondence to Oliver Walter.


About this article


Cite this article

Walter, O., Haeb-Umbach, R., Mokbel, B. et al. Autonomous Learning of Representations. Künstl Intell 29, 339–351 (2015). https://doi.org/10.1007/s13218-015-0372-1


  • DOI: https://doi.org/10.1007/s13218-015-0372-1
