Abstract
In the Internet era, a tremendous amount of textual and audio data is available, and it has become important to develop methods that retrieve the most important information quickly and efficiently. Extracting a summary manually is tedious and time-consuming. A good summarization technique captures all the important points and topics of a speech or document without leaving out valuable information. Summarizing a speech without losing its context has long been a challenge for programmers. This paper explores a method that divides a long speech into multiple shorter sub-speeches and summarizes each one individually to generate an efficient and precise summary. Each sub-speech is further processed to predict the speaker's emotion at various points during the speech. These individual emotions are then used to classify the speaker's overall emotion across the speech.
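The two structural steps named in the abstract, splitting a speech into topic-wise sub-speeches and combining per-segment emotions into one overall label, can be illustrated with a minimal sketch. The abstract does not specify the chapter's actual segmentation, summarization, or emotion models, so everything below is an assumption chosen for illustration: the word-overlap (Jaccard) similarity, the `threshold` value, the function names, and the majority vote.

```python
from collections import Counter


def jaccard(a, b):
    """Word-overlap similarity between two sentences (0.0 to 1.0).

    Assumption: a stand-in for whatever sentence similarity the chapter uses.
    Punctuation is not stripped, to keep the sketch short.
    """
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def split_into_segments(sentences, threshold=0.1):
    """Start a new segment whenever adjacent sentences share too few words."""
    if not sentences:
        return []
    segments = [[sentences[0]]]
    for prev, cur in zip(sentences, sentences[1:]):
        if jaccard(prev, cur) < threshold:
            segments.append([cur])      # similarity dropped: new topic
        else:
            segments[-1].append(cur)    # same topic: extend current segment
    return segments


def overall_emotion(segment_emotions):
    """Majority vote over per-segment labels; ties go to the label seen first."""
    if not segment_emotions:
        return None
    return Counter(segment_emotions).most_common(1)[0][0]


sentences = [
    "the cat sat on the mat",
    "the cat likes the mat",
    "stock prices fell sharply today",
    "stock prices may recover",
]
segments = split_into_segments(sentences)

# Hypothetical per-segment labels, as an emotion classifier might return them.
emotion = overall_emotion(["neutral", "happy", "happy", "sad"])
```

In a fuller pipeline each segment would be passed to a summarizer and to an audio emotion classifier; the majority vote is only one simple way to reduce the per-segment predictions to a single label.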
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Anand, A., Choudhary, H., Singhania, A., Manuraj, A., Jayashree, R. (2023). Topic-Wise Speech Summarization with Emotion Classification. In: Kumar, A., Ghinea, G., Merugu, S., Hashimoto, T. (eds) Proceedings of the International Conference on Cognitive and Intelligent Computing. Cognitive Science and Technology. Springer, Singapore. https://doi.org/10.1007/978-981-19-2358-6_39
DOI: https://doi.org/10.1007/978-981-19-2358-6_39
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2357-9
Online ISBN: 978-981-19-2358-6
eBook Packages: Computer Science, Computer Science (R0)