Applying Neural Network Techniques for Topic Change Detection in the HuComTech Corpus

The Temporal Structure of Multimodal Communication

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 164)

Abstract

In the age of the Internet, we are generating documents (both written and spoken) at an unprecedented rate. This rate of document creation, together with the number of already existing documents, makes manual processing time-consuming and costly to the point of infeasibility. We therefore need automatic methods suited to processing written as well as spoken documents. One crucial part of document processing is partitioning documents into segments based on the topic being discussed. A self-evident application is partitioning a news broadcast into individual news stories. One of the first steps in doing so is identifying shifts in the topic framework, or in other words, finding the time interval where the announcer moves from one news story to the next. Naturally, as transitions between news stories are often accompanied by easily identifiable audio (e.g. signal) and visual (e.g. change in graphics) cues, this would not be a particularly difficult task. However, in other cases the solution to this problem is far less obvious. Here, we approach this task for the case of spoken dialogues (interviews). One particular difficulty of these dialogues is that the interlocutors often switch between languages. Because of this (and in the hope of contributing to the generality of our method), we carry out topic change detection in a content-free manner, focusing on speaker roles and prosodic features. To process these features we employ neural networks, and we demonstrate that, with proper classifier combination methods, this can lead to a detection performance that is competitive with the state of the art.

Acknowledgements

The research reported in the paper was conducted with the support of the Hungarian Scientific Research Fund (OTKA) grants #K116938 and #K116402. The support of the Ministry of Human Capacities, Hungary (grant 20391-3/2018/FEKUSTRAT) is also acknowledged.

Corresponding author

Correspondence to István Szekrényes.

Copyright information

© 2020 Springer Nature Switzerland AG

Cite this chapter

Kovács, G., Szekrényes, I. (2020). Applying Neural Network Techniques for Topic Change Detection in the HuComTech Corpus. In: Hunyadi, L., Szekrényes, I. (eds) The Temporal Structure of Multimodal Communication. Intelligent Systems Reference Library, vol 164. Springer, Cham. https://doi.org/10.1007/978-3-030-22895-8_8
