Cross-Gender and Age Speech Conversion Using Hidden Markov Model Based on Cepstral Coefficients Conversion

  • Conference paper
Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 746)

Abstract

Animated films often feature child characters and therefore need children aged 5–10 for dubbing. To reduce cost, speech conversion can support the dubbing of children’s speech. In this research we propose a method for converting an adult’s speech into a child’s speech. The contribution of this study is the design of a signal processing algorithm that performs this conversion, using a Hidden Markov Model (HMM) based on cepstral coefficient conversion. The input consists of speech from source speakers and target speakers uttering similar sentences. Pitch (f0) and cepstral features are extracted for the conversion process, and an HMM is used as the modeling method. The system output is a converted speech signal whose characteristics are similar to those of the target speech signal. In testing, the best-performing HMM configuration used 4 states. The highest increase in cepstral Root Mean Square Error (RMSE) between before and after conversion was 32.35%, with an average of 25.83% over 400 samples. A Mean Opinion Score (MOS) test was rated on a scale from 1 (converted speech is very dissimilar to the target speech) to 5 (converted speech is very similar to the target speech). It yielded an average of 2.505 for similarity and 2.805 for quality, from 30 respondents. The proposed method is expected to be useful in the animation film industry for simplifying the dubbing process and making it more efficient.
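
The abstract does not spell out how the cepstral RMSE percentage is computed. The sketch below is one plausible reading, not the authors' procedure: it assumes the cepstral frames of each utterance are already time-aligned with the target, takes the RMSE of the source against the target ("before") and of the converted speech against the target ("after"), and reports the relative change as a reduction in percent. The function names and the toy frames are hypothetical.

```python
import math


def cepstral_rmse(frames_a, frames_b):
    """RMSE over all coefficients of two aligned sequences of cepstral frames.

    Assumption: both sequences have the same number of frames and the same
    number of coefficients per frame (i.e. they are already time-aligned).
    """
    if not frames_a or len(frames_a) != len(frames_b):
        raise ValueError("sequences must be non-empty and equally long")
    squared_error = 0.0
    count = 0
    for frame_a, frame_b in zip(frames_a, frames_b):
        if len(frame_a) != len(frame_b):
            raise ValueError("frames must have the same number of coefficients")
        for x, y in zip(frame_a, frame_b):
            squared_error += (x - y) ** 2
            count += 1
    return math.sqrt(squared_error / count)


def relative_change_percent(rmse_before, rmse_after):
    """Relative reduction of RMSE in percent (assumed sign convention)."""
    return 100.0 * (rmse_before - rmse_after) / rmse_before


# Toy, made-up frames with two cepstral coefficients each (not data from the paper).
source = [[1.0, 2.0], [3.0, 4.0]]
target = [[2.0, 3.0], [4.0, 5.0]]
converted = [[1.5, 2.5], [3.5, 4.5]]

rmse_before = cepstral_rmse(source, target)
rmse_after = cepstral_rmse(converted, target)
change = relative_change_percent(rmse_before, rmse_after)
print(f"before={rmse_before:.3f} after={rmse_after:.3f} change={change:.2f}%")
```

With real recordings the frames would first have to be aligned in time, which this sketch leaves out. The per-utterance percentages could then be averaged over the whole test set.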

Author information

Corresponding author

Correspondence to Raditiana Patmasari.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Gultom, M.A.H., Patmasari, R., Wijayanto, I., Hadiyoso, S. (2021). Cross-Gender and Age Speech Conversion Using Hidden Markov Model Based on Cepstral Coefficients Conversion. In: Triwiyanto, Nugroho, H.A., Rizal, A., Caesarendra, W. (eds) Proceedings of the 1st International Conference on Electronics, Biomedical Engineering, and Health Informatics. Lecture Notes in Electrical Engineering, vol 746. Springer, Singapore. https://doi.org/10.1007/978-981-33-6926-9_13

  • DOI: https://doi.org/10.1007/978-981-33-6926-9_13

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-33-6925-2

  • Online ISBN: 978-981-33-6926-9

  • eBook Packages: Engineering, Engineering (R0)