Abstract
In the age of the Internet, we are generating documents (both written and spoken) at an unprecedented rate. This rate of document creation, together with the number of already existing documents, makes manual processing time-consuming and costly to the point of infeasibility. We therefore need automatic methods suitable for processing written as well as spoken documents. One crucial part of document processing is partitioning documents into segments based on the topic being discussed. An obvious application is partitioning a news broadcast into individual news stories. One of the first steps in doing so is identifying shifts in the topic framework, in other words, finding the time interval in which the announcer moves from one news story to the next. Because transitions between news stories are often accompanied by easily identifiable audio cues (e.g. a signal) and visual cues (e.g. a change in graphics), this is not a particularly difficult task. In other cases, however, the solution is far less obvious. Here, we approach the task for spoken dialogues (interviews). One particular difficulty of these dialogues is that the interlocutors often switch between languages. Because of this (and in the hope of contributing to the generality of our method), we carry out topic change detection in a content-free manner, focusing on speaker roles and prosodic features. To process these features we employ neural networks, and we demonstrate that, with proper classifier combination methods, this can lead to a detection performance competitive with the state of the art.
Acknowledgements
The research reported in this paper was conducted with the support of the Hungarian Scientific Research Fund (OTKA), grants #K116938 and #K116402. Support from the Ministry of Human Capacities, Hungary (grant 20391-3/2018/FEKUSTRAT) is also acknowledged.
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Kovács, G., Szekrényes, I. (2020). Applying Neural Network Techniques for Topic Change Detection in the HuComTech Corpus. In: Hunyadi, L., Szekrényes, I. (eds) The Temporal Structure of Multimodal Communication. Intelligent Systems Reference Library, vol 164. Springer, Cham. https://doi.org/10.1007/978-3-030-22895-8_8
DOI: https://doi.org/10.1007/978-3-030-22895-8_8
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22894-1
Online ISBN: 978-3-030-22895-8
eBook Packages: Intelligent Technologies and Robotics (R0)