
Mechanisms for Profiling

Chapter in Profiling Humans from their Voice

Abstract

So how is profiling actually done? Most of this book has been dedicated to developing the basic understanding needed for it. We have seen that knowledge of how a parameter affects the vocal production mechanism can help us identify the most relevant representations from which to extract the information needed for profiling. We have also seen how such knowledge helps us reason about why certain parameters may exert confusable influences on the voice signal. All of this knowledge can then help us design more targeted methods for discovering features that are highly effective for profiling.
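To make this concrete, the sketch below extracts two such representations from a recording: frame-level fundamental frequency (f0), which is shaped largely by vocal-fold physiology, and MFCCs, which coarsely summarize the vocal-tract configuration. It is a minimal illustration of the general workflow, not the method developed in this book; the librosa calls and the file name voice.wav are assumptions.

```python
# A minimal sketch (not this book's method): extracting two candidate
# representations for profiling. Assumes the librosa library is
# installed and that voice.wav (a hypothetical recording) exists.
import numpy as np
import librosa

# Load the recording at its native sampling rate.
y, sr = librosa.load("voice.wav", sr=None)

# Frame-level fundamental frequency (f0) estimates: f0 is largely
# determined by vocal-fold physiology, so it carries speaker traits
# such as sex, age, and body size.
f0 = librosa.yin(y, fmin=50, fmax=400, sr=sr)

# MFCCs: a coarse summary of the vocal-tract transfer function,
# which reflects vocal-tract geometry and articulation.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)

# Pool frame-level values into one fixed-length feature vector that a
# downstream classifier or regressor for a profile parameter could use.
features = np.concatenate([
    [f0.mean(), f0.std()],
    mfcc.mean(axis=1),
    mfcc.std(axis=1),
])
print(features.shape)  # (28,)
```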


Notes

  1. The variables are more appropriately called explanatory variables, since they may not be independent of one another.
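As a hypothetical numerical illustration of this point (all quantities assumed), consider two would-be "independent" variables that might jointly explain a vocal parameter, such as a speaker's age and height: because height is partly explained by age, the two covary, and "explanatory variables" is the more accurate term.

```python
# Hypothetical illustration (all numbers assumed): two explanatory
# variables can themselves be strongly correlated, which is why
# calling them "independent variables" is misleading.
import numpy as np

rng = np.random.default_rng(0)
n = 1000

# Simulated speaker ages (years); height (cm) grows with age until
# roughly adulthood, so the two variables covary by construction.
age = rng.uniform(10, 60, n)
height = 115 + 3.2 * np.minimum(age, 18) + rng.normal(0, 6, n)

# The correlation between the two "explanatory" variables is far
# from zero: they are not independent of one another.
print(np.corrcoef(age, height)[0, 1])
```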



Author information

Correspondence to Rita Singh.


Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

About this chapter


Cite this chapter

Singh, R. (2019). Mechanisms for Profiling. In: Profiling Humans from their Voice. Springer, Singapore. https://doi.org/10.1007/978-981-13-8403-5_8


  • DOI: https://doi.org/10.1007/978-981-13-8403-5_8

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-8402-8

  • Online ISBN: 978-981-13-8403-5

  • eBook Packages: Engineering, Engineering (R0)
