Abstract
This paper proposes a system that uses multimodal techniques to automatically estimate oral presentation skills. It is based on a set of features from three sources: audio, gesture, and PowerPoint slides. Machine learning techniques are used to classify each presentation into two classes (high vs. low quality) and into three classes (low-, average-, and high-quality presentations). Around 448 multimodal recordings from the MLA'14 dataset were used for training and evaluating three different 2-class and 3-class classifiers. The classifiers were evaluated on each feature type independently and on all features combined. The best accuracy of the 2-class systems is 90.1%, achieved by an SVM trained on audio features; the best accuracy of the 3-class systems is 75%, achieved by a random forest trained on slide features. Combining the three feature types into one vector improves the accuracy of all systems by around 5%.
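The pipeline described above, concatenating per-presentation feature vectors from the three modalities (early fusion) and training an SVM and a random forest, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the feature values are random placeholders, the dimensionalities are invented, and the paper's actual audio, gesture, and slide features are not reproduced here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 448  # number of presentations, as in the MLA'14 setup described above

# Hypothetical per-presentation feature vectors; dimensions are placeholders,
# not the feature sets used in the paper.
audio = rng.normal(size=(n, 20))
gesture = rng.normal(size=(n, 10))
slides = rng.normal(size=(n, 8))
labels = rng.integers(0, 2, size=n)  # 2-class task: low (0) vs. high (1)

# Early fusion: concatenate the three feature types into one vector.
fused = np.hstack([audio, gesture, slides])

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
rf = RandomForestClassifier(n_estimators=100, random_state=0)

for name, clf in [("SVM", svm), ("Random forest", rf)]:
    scores = cross_val_score(clf, fused, labels, cv=5)
    print(f"{name}: mean 5-fold accuracy {scores.mean():.2f}")
```

On random labels the accuracies hover around chance; the point of the sketch is the fusion step, where evaluating on `fused` rather than on `audio`, `gesture`, or `slides` alone mirrors the comparison reported in the abstract.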
© 2017 Springer International Publishing AG
Cite this paper
Hanani, A., Al-Amleh, M., Bazbus, W., Salameh, S. (2017). Automatic Estimation of Presentation Skills Using Speech, Slides and Gestures. In: Karpov, A., Potapova, R., Mporas, I. (eds.) Speech and Computer. SPECOM 2017. Lecture Notes in Computer Science, vol. 10458. Springer, Cham. https://doi.org/10.1007/978-3-319-66429-3_17
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-66428-6
Online ISBN: 978-3-319-66429-3