Reconstruction of the Human Persona in 3D from Voice, and its Reverse

Singh, Rita

doi:10.1007/978-981-13-8403-5_9

Abstract

It seems magical that we have reached a point where it is possible to discuss the accurate, in-vacuo generation of a three-dimensional image of the human form from the voice signal alone. From the discussion in this book so far, it should be evident that both direct and indirect relationships exist between voice and the human form. For example, voice can be indirectly related to bone structure, while at the same time being directly related to the person’s height, weight, age, gender and many other factors. These relationships can be transformed into predictive mechanisms. From predictions of body dimensions and weight, the person’s body mass index may be deduced; from predictions of skull type and vocal tract length, the person’s likely skeletal proportions can be deduced.
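
As a small illustration of the deduction step mentioned above, the sketch below computes body mass index from a height and a weight using the standard definition (weight in kilograms divided by the square of height in metres). The function name and the numeric inputs are hypothetical placeholders standing in for the output of a voice-based predictor; they are not taken from the chapter, and the chapter's own method may differ.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Standard BMI: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


# Hypothetical values, assumed to come from some upstream voice-based predictor.
predicted_height_m = 1.75
predicted_weight_kg = 70.0

bmi = body_mass_index(predicted_weight_kg, predicted_height_m)
print(f"Estimated BMI: {bmi:.1f}")
```

Any error in the predicted height is amplified in the result because height enters the formula squared, so an uncertainty range on the inputs would be worth carrying through in practice.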

Author information

Authors and Affiliations

Carnegie Mellon University, Pittsburgh, PA, USA
Rita Singh

About this chapter

Cite this chapter

Singh, R. (2019). Reconstruction of the Human Persona in 3D from Voice, and its Reverse. In: Profiling Humans from their Voice. Springer, Singapore. https://doi.org/10.1007/978-981-13-8403-5_9

DOI: https://doi.org/10.1007/978-981-13-8403-5_9
Published: 19 June 2019
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8402-8
Online ISBN: 978-981-13-8403-5
eBook Packages: Engineering, Engineering (R0)
