Central Audio-Library of the University of Novi Sad

Conference paper
Part of the Studies in Computational Intelligence book series (SCI, volume 868)


This paper presents the project Central Audio Library of the University of Novi Sad (CABUNS), which aims at the automated creation of audio editions of textbooks, presentations and other course material using text-to-speech synthesis in the Serbian language. The paper describes the architecture and features of the developed system from the point of view of both the teachers and assistants who upload course material to the CABUNS server and the students who download audio editions and listen to them (and view them) on their computers and mobile phones. Examples of the first audio editions of textbooks and PowerPoint presentations for the course Acoustics and Audio Engineering are presented. The paper also analyzes the advantages and drawbacks of this new learning technology, which has the potential to contribute greatly to the quality of higher education as well as education at other levels. Finally, the paper presents the most recent results in the development of text-to-speech synthesis that enable voice conversion, which should soon make it possible to produce an audio edition in the voice of the author of the textbook or the person who delivers the lecture.


Keywords: Audio library; Audio editions of textbooks; New digital learning technologies; Text-to-speech synthesis; Deep neural networks; Voice conversion



The work described in this paper was supported in part by the Ministry of Education, Science and Technological Development of the Republic of Serbia, within the project “Development of Dialogue Systems for Serbian and Other South Slavic Languages”, and the Provincial Secretariat for Higher Education and Scientific Research, within the project “Central Audio-Library of the University of Novi Sad”, No. 114-451-2570/2016-02.



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  1. Department for Power, Electronic and Telecommunication Engineering, Faculty of Technical Sciences, University of Novi Sad, Novi Sad, Serbia
  2. Department for Music Production and Sound Design, Academy of Arts, Alfa BK University, Belgrade, Serbia
  3. Computer Programming Agency Code85 Odžaci, Odžaci, Serbia
