Applying Neural Network Techniques for Topic Change Detection in the HuComTech Corpus

Kovács, György; Szekrényes, István

doi:10.1007/978-3-030-22895-8_8

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 164)

Abstract

In the age of the Internet, we are generating documents (both written and spoken) at an unprecedented rate. This rate of document creation, together with the number of already existing documents, makes manual processing time-consuming and costly to the point of infeasibility. This is why we need automatic methods suitable for processing written as well as spoken documents. One crucial part of processing documents is partitioning them into segments based on the topic being discussed. A self-evident application would be, for example, partitioning a news broadcast into individual news stories. One of the first steps in doing so is identifying the shifts in the topic framework, or in other words, finding the time interval where the announcer changes from one news story to the next. Naturally, as the transitions between news stories are often accompanied by easily identifiable audio (e.g. signal) and visual (e.g. change in graphics) cues, this would not be a particularly difficult task. However, in other cases the solution to this problem is far less obvious. Here, we approach this task for the case of spoken dialogues (interviews). One particular difficulty of these dialogues is that the interlocutors often switch between languages. Because of this (and in the hope of contributing to the generality of our method), we carried out topic change detection in a content-free manner, focusing on speaker roles and prosodic features. For the processing of these features we employ neural networks, and demonstrate that, with the proper classifier combination methods, this can lead to a detection performance that is competitive with the state of the art.

Acknowledgements

The research reported in this paper was conducted with the support of the Hungarian Scientific Research Fund (OTKA) grants #K116938 and #K116402. Support from the Ministry of Human Capacities, Hungary (grant 20391-3/2018/FEKUSTRAT) is also acknowledged.

Author information

Authors and Affiliations

Research Institute for Linguistics of the Hungarian Academy of Sciences, Budapest, Hungary
György Kovács
MTA SZTE Research Group on Artificial Intelligence, Szeged, Hungary
György Kovács
Embedded Internet Systems Lab, Luleå University of Technology, Luleå, Sweden
György Kovács
Institute of Philosophy, University of Debrecen, Debrecen, Hungary
István Szekrényes

Corresponding author

Correspondence to István Szekrényes.

Editor information

Editors and Affiliations

Department of General and Applied Linguistics, University of Debrecen, Debrecen, Hungary
Laszlo Hunyadi
Institute of Philosophy, University of Debrecen, Debrecen, Hungary
István Szekrényes

About this chapter

Cite this chapter

Kovács, G., Szekrényes, I. (2020). Applying Neural Network Techniques for Topic Change Detection in the HuComTech Corpus. In: Hunyadi, L., Szekrényes, I. (eds) The Temporal Structure of Multimodal Communication. Intelligent Systems Reference Library, vol 164. Springer, Cham. https://doi.org/10.1007/978-3-030-22895-8_8

DOI: https://doi.org/10.1007/978-3-030-22895-8_8
Published: 25 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-22894-1
Online ISBN: 978-3-030-22895-8
eBook Packages: Intelligent Technologies and Robotics (R0)
