
Segmentation, indexing and retrieval of TV broadcast news bulletins using Gaussian mixture models and vector quantization codebooks

International Journal of Speech Technology

Abstract

In this paper, we propose a two-stage approach for segmenting TV broadcast news bulletins into sequences of news stories, and we use codebooks derived from vector quantization to retrieve the segmented stories. In the first stage, speaker-specific characteristics of the news reader, present in the initial headlines of the bulletin, are used for gross-level segmentation. Because the reader's speech during the headlines is mixed with background music, this first-stage segmentation may not be accurate. In the second stage, its errors are corrected by exploiting speaker-specific information captured from the individual news stories other than the headlines. In this work, speaker-specific information is represented by mel-frequency cepstral coefficients (MFCCs) and captured by Gaussian mixture models (GMMs). The proposed two-stage segmentation method is evaluated on manually segmented TV broadcast news bulletins. The evaluation results show that about 93 % of the news stories are correctly segmented, 7 % are missed and 6 % are spurious. For navigating the bulletins, a quick navigation indexing method is developed based on speaker change points. The performance of the proposed two-stage segmentation and quick navigation methods is evaluated using GMMs and neural network models. For retrieving target news stories from a news corpus, sequences of codebook indices derived from vector quantization are explored. The proposed retrieval approach is evaluated using queries of different sizes, and the results indicate that retrieval accuracy is proportional to the size of the query.
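The two-stage idea can be illustrated with a short NumPy sketch. This is a minimal illustration under our own assumptions, not the authors' implementation: MFCC extraction is skipped and synthetic Gaussian frames stand in for MFCC vectors, the headline span is assumed to be known in advance, and the helper names (`fit_diag_gmm`, `two_stage_change_points`, etc.), the 2-component diagonal-covariance GMM, the 50-frame window and the 5th-percentile threshold are all hypothetical choices. Stage 1 models the news reader from the music-corrupted headline frames; stage 2 retrains the model only on story windows that stage 1 attributed to the reader and rescores the stories. Windows where the reader/non-reader label flips are returned as speaker change points.

```python
# Hypothetical sketch of two-stage, GMM-based news-reader segmentation (not the paper's code).
import numpy as np


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def component_log_probs(x, weights, means, variances):
    """log w_k + log N(x_t | mu_k, diag(var_k)) for every frame t and component k."""
    diff = x[:, None, :] - means[None, :, :]  # (T, K, D)
    ll = -0.5 * (np.log(2 * np.pi * variances)[None] + diff**2 / variances[None]).sum(axis=2)
    return np.log(weights)[None, :] + ll  # (T, K)


def fit_diag_gmm(x, k=2, iters=30, seed=0):
    """Fit a diagonal-covariance GMM to feature frames x (T x D) with plain EM."""
    rng = np.random.default_rng(seed)
    t = len(x)
    means = x[rng.choice(t, size=k, replace=False)].copy()
    variances = np.tile(x.var(axis=0), (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(iters):
        log_p = component_log_probs(x, weights, means, variances)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])  # E-step: responsibilities
        nk = resp.sum(axis=0) + 1e-10  # M-step: re-estimate weights, means, variances
        weights = nk / t
        means = (resp.T @ x) / nk[:, None]
        variances = np.maximum((resp.T @ x**2) / nk[:, None] - means**2, 1e-3)
    return weights, means, variances


def frame_scores(x, gmm):
    """Per-frame log-likelihood of x under the GMM."""
    return logsumexp(component_log_probs(x, *gmm), axis=1)


def reader_windows(x, gmm, threshold, win):
    """True for each non-overlapping window whose mean frame score exceeds threshold."""
    n = len(x) // win
    return np.array([frame_scores(x[i * win:(i + 1) * win], gmm).mean() > threshold
                     for i in range(n)])


def two_stage_change_points(headline_feats, story_feats, win=50, pct=5.0):
    """Return frame indices (within story_feats) where the reader/non-reader label flips."""
    # Stage 1: model the reader from headline frames (speech mixed with music).
    gmm1 = fit_diag_gmm(headline_feats)
    thr1 = np.percentile(frame_scores(headline_feats, gmm1), pct)
    mask = reader_windows(story_feats, gmm1, thr1, win)
    if not mask.any():
        return []
    # Stage 2: retrain on story windows labelled as the reader (no music) and rescore.
    clean = np.concatenate([story_feats[i * win:(i + 1) * win] for i in np.flatnonzero(mask)])
    gmm2 = fit_diag_gmm(clean)
    thr2 = np.percentile(frame_scores(clean, gmm2), pct)
    mask = reader_windows(story_feats, gmm2, thr2, win)
    return [i * win for i in range(1, len(mask)) if mask[i] != mask[i - 1]]


# Synthetic stand-in for 13-dimensional MFCC frames (assumption, not real data).
rng = np.random.default_rng(1)
D = 13
headlines = rng.normal(0.5, np.sqrt(1.5), (300, D))  # reader speech + "music" distortion


def reader(n):
    return rng.normal(0.0, 1.0, (n, D))


def report(n):
    return rng.normal(3.0, 1.0, (n, D))


stories = np.vstack([reader(200), report(200), reader(200), report(200)])
print(two_stage_change_points(headlines, stories))  # [200, 400, 600]
```

Here a flip from non-reader back to reader (frame 400) would mark the start of a new story, while every flip is a candidate jump point for quick navigation. On real bulletins the window length and threshold would need tuning.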
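Retrieval with vector-quantization codebooks can likewise be sketched under stated assumptions. A plain k-means codebook stands in for whatever VQ training the paper uses, each story is converted into a sequence of codebook indices, and a query is scored against a story by the best fraction of position-wise agreeing indices over all alignments. The matching rule, the 16-codeword codebook, the synthetic data and the function names (`train_codebook`, `quantize`, `match_score`, `retrieve`) are hypothetical illustrations, not the authors' method.

```python
# Hypothetical sketch of codebook-index retrieval (not the paper's code).
import numpy as np


def quantize(frames, codebook):
    """Map each frame to the index of its nearest codeword (squared Euclidean distance)."""
    d2 = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def train_codebook(frames, size=16, iters=20, seed=0):
    """Plain k-means codebook, used here as a stand-in for VQ codebook training."""
    rng = np.random.default_rng(seed)
    codebook = frames[rng.choice(len(frames), size=size, replace=False)].copy()
    for _ in range(iters):
        idx = quantize(frames, codebook)
        for j in range(size):
            members = frames[idx == j]
            if len(members):  # keep the old codeword if its cell is empty
                codebook[j] = members.mean(axis=0)
    return codebook


def match_score(query_idx, story_idx):
    """Best fraction of agreeing indices over all alignments of the query inside the story."""
    q, s = len(query_idx), len(story_idx)
    if q > s:
        return 0.0
    return max(float(np.mean(story_idx[o:o + q] == query_idx)) for o in range(s - q + 1))


def retrieve(query_frames, story_frames, codebook):
    """Return the index of the best-matching story and all story scores."""
    q = quantize(query_frames, codebook)
    scores = [match_score(q, quantize(f, codebook)) for f in story_frames]
    return int(np.argmax(scores)), scores


# Synthetic stand-in for MFCC frames of five stories (assumption, not real data).
rng = np.random.default_rng(2)
D = 13
stories = [rng.normal(rng.normal(0, 3, D), 1.0, (200, D)) for _ in range(5)]
codebook = train_codebook(np.vstack(stories))
query = stories[2][50:90] + rng.normal(0, 0.05, (40, D))  # slightly perturbed excerpt
best, scores = retrieve(query, stories, codebook)
print(best)  # 2
```

Intuitively, a longer query gives a longer index sequence that is less likely to agree by chance with an unrelated story, which fits the reported dependence of retrieval accuracy on query size.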


Fig. 1
Fig. 2
Fig. 3



Author information

Correspondence to K. Sreenivasa Rao.


About this article

Cite this article

Rao, K.S., Pachpande, K. Segmentation, indexing and retrieval of TV broadcast news bulletins using Gaussian mixture models and vector quantization codebooks. Int J Speech Technol 17, 259–269 (2014). https://doi.org/10.1007/s10772-014-9229-5



  • DOI: https://doi.org/10.1007/s10772-014-9229-5
