
Representation Learning for Underdefined Tasks

  • Jean-François Bonastre
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11896)

Abstract

In the neural network galaxy, the large majority of approaches and research effort is dedicated to well-defined tasks, such as recognizing an image of a cat or discriminating noise from speech recordings. For this kind of task, it is easy to write a labeling reference guide in order to obtain training and evaluation data with a ground truth. But for a large set of high-level human tasks, and particularly for tasks related to the artistic field, the task itself is not easy to define: only the result is known, and it is difficult or impossible to write such a labeling guide. We refer to this kind of problem as an “underdefined task”. In this presentation, a methodology based on representation learning is proposed to tackle this class of problems, and a practical example is shown in the domain of voice casting for voice dubbing.
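The presentation itself contains no code; the sketch below is a minimal, hypothetical illustration (in PyTorch) of the kind of Siamese-style similarity model a representation-learning approach to voice casting could rely on: two speaker embeddings pass through a shared encoder, and a small head scores whether the dubbed voice is an acceptable match for the original. Every name, dimension and architectural detail is an assumption made for illustration, not a description of the authors' actual system.

```python
# Hypothetical sketch: a Siamese-style similarity model for voice casting.
# Assumes pre-extracted fixed-size speaker embeddings (e.g. 256-dim vectors);
# all names and dimensions below are illustrative, not taken from the paper.
import torch
import torch.nn as nn

class SiameseVoiceSimilarity(nn.Module):
    """Scores whether a dubbed voice is a plausible match for an original voice."""

    def __init__(self, embed_dim: int = 256, hidden_dim: int = 128):
        super().__init__()
        # Shared encoder: maps a speaker embedding to a task-specific representation.
        self.encoder = nn.Sequential(
            nn.Linear(embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim),
        )
        # Binary decision head applied to the concatenated pair representation.
        self.classifier = nn.Sequential(
            nn.Linear(2 * hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, original: torch.Tensor, dubbed: torch.Tensor) -> torch.Tensor:
        # Both sides go through the same (weight-shared) encoder.
        h_orig = self.encoder(original)
        h_dub = self.encoder(dubbed)
        pair = torch.cat([h_orig, h_dub], dim=-1)
        return self.classifier(pair).squeeze(-1)  # logit: "good match" vs. "bad match"

# Toy usage: random tensors standing in for real original/dubbed voice embeddings.
model = SiameseVoiceSimilarity()
original = torch.randn(8, 256)              # batch of original-voice embeddings
dubbed = torch.randn(8, 256)                # batch of candidate dubbed-voice embeddings
labels = torch.randint(0, 2, (8,)).float()  # 1 = acceptable casting pair, 0 = not

loss = nn.BCEWithLogitsLoss()(model(original, dubbed), labels)
loss.backward()  # a real training loop (optimizer step, real data) would follow
```

In an actual voice-casting setup, the embeddings and the pairing labels would come from annotated professional dubbing data rather than random tensors; the sketch only shows the weight-sharing structure that lets one learned representation serve both sides of the original/dubbed pair.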

Keywords

Representation learning · Underdefined task · Knowledge distillation · Transfer learning · Voice casting · Voice dubbing

Notes

Acknowledgment and Credits

The work on voice casting for voice dubbing was supported by the Avignon University foundation “Pierre Berge” PhD program and by the ANR TheVoice project, ANR-17-CE23-0025 (DIGITAL VOICE DESIGN FOR THE CREATIVE INDUSTRY).

The main part of the presented work on voice casting was done by Adrien Gresse during his PhD. Some ongoing parts come directly from Mathias Quillot’s ongoing PhD. Both provided a large part of the figures and tables of this presentation.


Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Avignon University, Avignon, France
