Using Related Text Sources to Improve Classification of Transcribed Speech Data

  • Niraj Shrestha
  • Elias Moons
  • Marie-Francine Moens
Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 921)


Today’s content, including user-generated content, is increasingly found in multimedia formats. It is known that speech data are sometimes transcribed incorrectly, especially when they are spoken by voices on which the transcribers have not been trained or when they contain unfamiliar words. A familiar mining task that helps in storage, indexing and retrieval is automatic classification with predefined category labels. Although state-of-the-art classifiers such as neural networks, support vector machines (SVM) and logistic regression classifiers perform quite satisfactorily when categorizing written text, their performance degrades when they are applied to speech data transcribed by automatic speech recognition (ASR), because of transcription errors such as insertion and deletion of words, grammatical errors and words that are simply transcribed wrongly. In this paper, we show that incorporating content from related written sources in the training of the classification model is beneficial. We focus especially on comparing different representations that make this integration possible, such as representations of speech data that embed content from the written text, and simple concatenation of speech and written content. In addition, we qualitatively demonstrate that these representations, to a certain extent, indirectly correct the transcription noise.
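
As an illustration of the simplest integration strategy mentioned above, concatenation of speech and written content, the following minimal sketch builds one bag-of-words feature vector from an ASR transcript and a related written text. It is not the authors' implementation: the abstract does not specify the features or tokenization, so the whitespace tokenizer, the function names `tokenize` and `concat_representation`, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple, assumed tokenizer)."""
    return text.lower().split()


def concat_representation(transcript, related_text):
    """Count words over the ASR transcript concatenated with the related written text."""
    return Counter(tokenize(transcript) + tokenize(related_text))


# Hypothetical toy data: a noisy ASR transcript ("mister" instead of "minister")
# and a related written source on the same topic.
transcript = "the prime mister spoke about the economy"
related = "The prime minister discussed the economy in parliament"

features = concat_representation(transcript, related)
```

In this toy case the misrecognized word "mister" stays in the feature vector, but the correct word "minister" from the written source is now present as well, so a classifier trained on such vectors has access to the intended term. This is only meant to convey the intuition behind the noise-compensation effect described in the abstract.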


Keywords: Speech data, Word embeddings



Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Niraj Shrestha (1)
  • Elias Moons (1)
  • Marie-Francine Moens (1)
  1. Department of Computer Science, KU Leuven, Leuven, Belgium
