Reconstruction of the Human Persona in 3D from Voice, and its Reverse

Chapter in: Profiling Humans from their Voice

Abstract

It seems magical that we are at a point in time where it is possible to discuss the accurate, in-vacuo generation of a three-dimensional image of the human form from the voice signal alone. From the discussion in this book so far, it should be evident that both direct and indirect relationships exist between voice and the human form. For example, voice can be indirectly related to bone structure. At the same time, it can be directly related to the person’s height, weight, age, gender and many other factors. These relationships can be transformed into predictive mechanisms. From predictions of body dimensions and weight, the person’s body mass index may be deduced; from predictions of the skull type and the length of the vocal tract, the person’s likely skeletal proportions can be deduced.
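
As a small illustration of the deduction step mentioned above, the sketch below computes a body mass index from a predicted height and weight, using the standard definition (weight in kilograms divided by the square of height in metres). The function name and the example values are hypothetical and are not taken from the chapter; how the height and weight would actually be predicted from voice is outside the scope of this sketch.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Return BMI = weight / height^2 (kg / m^2), the standard definition."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical predicted values, for illustration only.
predicted_weight_kg = 70.0
predicted_height_m = 1.75
bmi = body_mass_index(predicted_weight_kg, predicted_height_m)
print(f"Deduced BMI: {bmi:.1f}")
```

Any error in the predicted height enters the result squared, so in practice the uncertainty of the two predictions would need to be carried through to the deduced index.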

Author information

Correspondence to Rita Singh.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Singh, R. (2019). Reconstruction of the Human Persona in 3D from Voice, and its Reverse. In: Profiling Humans from their Voice. Springer, Singapore. https://doi.org/10.1007/978-981-13-8403-5_9

  • DOI: https://doi.org/10.1007/978-981-13-8403-5_9

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-8402-8

  • Online ISBN: 978-981-13-8403-5

  • eBook Packages: Engineering, Engineering (R0)
