Abstract
This paper reports a study of automatic attitude recognition on a collection of over 500 segments from our video blog data. We annotated and analysed three different attitudinal states of the speakers. We then extracted and analysed prosodic and visual features relevant to the classification task, and used machine learning methods to gain a better understanding of the feature sets and their contribution to the prediction model.
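The abstract does not list the exact features or model settings, so what follows is only a minimal, hypothetical Python sketch (standard library only) of the two steps it describes. The first step collapses frame-level prosodic tracks into segment-level functionals. The second estimates how much each feature contributes to classification. The feature set (F0 mean, SD and range, plus mean intensity), the convention that unvoiced frames carry F0 = 0, and every function name are assumptions. For the contribution analysis the sketch substitutes a leave-one-feature-out ablation built on a leave-one-out nearest-centroid classifier. This is a deliberately simple stand-in, not the random forest or LIBSVM models named in the keywords.

```python
import math
import statistics
from typing import Dict, List, Sequence


def prosodic_functionals(f0_hz: Sequence[float], intensity_db: Sequence[float]) -> Dict[str, float]:
    """Collapse frame-level F0 and intensity tracks into one feature set per segment.

    Assumption (hypothetical convention): unvoiced frames have F0 == 0 and are skipped.
    """
    voiced = [f for f in f0_hz if f > 0]
    if len(voiced) < 2:
        raise ValueError("segment needs at least two voiced frames")
    return {
        "f0_mean": statistics.mean(voiced),
        "f0_sd": statistics.stdev(voiced),
        "f0_range": max(voiced) - min(voiced),
        "int_mean": statistics.mean(intensity_db),
    }


def zscore(rows: List[List[float]]) -> List[List[float]]:
    """Standardise each feature column so Hz and dB values share one scale.

    Constant columns are only centred. For brevity the statistics come from
    the whole set rather than from each training fold.
    """
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]


def loo_accuracy(X: List[List[float]], y: List[str], cols: Sequence[int]) -> float:
    """Leave-one-out accuracy of a nearest-centroid classifier on the given columns."""
    correct = 0
    for i in range(len(X)):
        centroids: Dict[str, List[float]] = {}
        for label in sorted(set(y)):  # sorted order gives deterministic tie-breaking
            members = [[X[j][c] for c in cols]
                       for j in range(len(X)) if j != i and y[j] == label]
            if members:
                centroids[label] = [statistics.mean(v) for v in zip(*members)]
        x = [X[i][c] for c in cols]
        pred = min(centroids, key=lambda lab: math.dist(x, centroids[lab]))
        correct += pred == y[i]
    return correct / len(X)


def feature_ablation(X: List[List[float]], y: List[str], names: Sequence[str]) -> Dict[str, float]:
    """Accuracy lost when each feature is dropped (larger drop = more useful feature)."""
    all_cols = list(range(len(names)))
    full = loo_accuracy(X, y, all_cols)
    return {
        name: full - loo_accuracy(X, y, [c for c in all_cols if c != k])
        for k, name in enumerate(names)
    }
```

Ablation with a simple distance-based classifier keeps the example dependency-free. With a random forest, the model's own impurity-based importances or permutation importance would serve the same purpose of ranking feature contributions.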
Keywords
- Attitude prediction
- Audio and visual feature analysis
- Machine learning
- Random forest
- LibSVM
Notes
- 1.
- 2. https://www.youtube.com/user/nigahiga, https://www.youtube.com/user/JustinJamesHughes, https://www.youtube.com/user/uncuthashbrown, https://www.youtube.com/user/kevjumba, https://www.youtube.com/user/tyleroakley, https://www.youtube.com/user/ConnorFranta, https://www.youtube.com/user/TimothyDeLaGhetto2, https://www.youtube.com/user/DavidSoComedy, https://www.youtube.com/user/michaelbalalis, https://www.youtube.com/user/shane.
Acknowledgments
This work is supported by the English Language and Literature Department, UPSI, Ministry of Education Malaysia, Center for Global Intelligent Content (CNGL) at TCD and the Speech Communication Laboratory at TCD.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Madzlan, N.A., Huang, Y., Campbell, N. (2015). Automatic Classification and Prediction of Attitudes: Audio - Visual Analysis of Video Blogs. In: Ronzhin, A., Potapova, R., Fakotakis, N. (eds) Speech and Computer. SPECOM 2015. Lecture Notes in Computer Science, vol 9319. Springer, Cham. https://doi.org/10.1007/978-3-319-23132-7_12
DOI: https://doi.org/10.1007/978-3-319-23132-7_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-23131-0
Online ISBN: 978-3-319-23132-7
eBook Packages: Computer Science, Computer Science (R0)