Abstract
It seems magical that we can now discuss the accurate, in-vacuo generation of a three-dimensional image of the human form from the voice signal alone. From the discussion in this book so far, it should be evident that both direct and indirect relationships exist between voice and the human form. For example, voice can be indirectly related to bone structure, and at the same time directly related to the person’s height, weight, age, gender and many other factors. These relationships can be turned into predictive mechanisms. From predictions of body dimensions and weight, the person’s body mass index may be deduced; from predictions of skull type and vocal tract length, the person’s likely skeletal proportions can be deduced.
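The body mass index step mentioned in the abstract can be illustrated with a minimal sketch. It uses the standard definition of BMI (weight in kilograms divided by the square of height in metres). The function name and the input values are hypothetical placeholders for voice-derived estimates; they are not taken from the chapter.

```python
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Standard BMI: weight in kilograms divided by height in metres squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / (height_m ** 2)


# Hypothetical values standing in for predictions made from a voice signal.
predicted_height_m = 1.75
predicted_weight_kg = 70.0

bmi = body_mass_index(predicted_weight_kg, predicted_height_m)
print(round(bmi, 1))  # prints 22.9
```

Because BMI divides by the square of height, an error in the predicted height affects the result more strongly than the same relative error in the predicted weight.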
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Singh, R. (2019). Reconstruction of the Human Persona in 3D from Voice, and its Reverse. In: Profiling Humans from their Voice. Springer, Singapore. https://doi.org/10.1007/978-981-13-8403-5_9
DOI: https://doi.org/10.1007/978-981-13-8403-5_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8402-8
Online ISBN: 978-981-13-8403-5
eBook Packages: Engineering, Engineering (R0)