Abstract
So how is profiling actually done? Most of this book has been dedicated to developing the basic understanding needed for it. We have seen that the knowledge of how a parameter affects the vocal production mechanism can help us identify the most relevant representations from which we may extract the information needed for profiling. We have also seen how such knowledge can help us reason out why certain parameters may exert confusable influences on the voice signal. All of this knowledge can then help us design more targeted methods to discover features that are highly effective for profiling.
Notes
1. The variables are more appropriately called explanatory variables, since they may not be independent of one another.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Singh, R. (2019). Mechanisms for Profiling. In: Profiling Humans from their Voice. Springer, Singapore. https://doi.org/10.1007/978-981-13-8403-5_8
DOI: https://doi.org/10.1007/978-981-13-8403-5_8
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-8402-8
Online ISBN: 978-981-13-8403-5
eBook Packages: Engineering, Engineering (R0)