Multi-stage Classification for Audio Based Activity Recognition

  • José Lopes
  • Charles Lin
  • Sameer Singh
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4224)


Context recognition in indoor and outdoor surroundings is an important area of research for the development of autonomous systems. This work describes an approach to the classification of audio signals found in both indoor and outdoor environments. Several audio features are extracted from raw signals. We analyze the relevance and importance of these features and use that information to design a multi-stage classifier architecture. Our results show that the multi-stage classification scheme is superior to a single-stage classifier, achieving an 80% success rate on a 7-class problem.
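The multi-stage idea described above can be sketched as a coarse decision followed by a fine-grained one within the chosen branch. The minimal example below is an illustrative assumption, not the paper's actual design: the feature names (`energy`, `zcr`), the indoor/outdoor split rule, the class labels, and the toy centroids are all hypothetical, and the per-branch classifiers are simple nearest-centroid rules standing in for the paper's trained classifiers.

```python
# Sketch of a multi-stage (hierarchical) audio classifier.
# All feature names, thresholds, labels, and centroids are illustrative.

def stage_one(features):
    """Coarse split: route a feature vector to 'indoor' or 'outdoor'
    using a single discriminative feature (here: signal energy)."""
    return "outdoor" if features["energy"] > 0.5 else "indoor"

# One nearest-centroid classifier per branch; centroids are toy values
# in (energy, zero-crossing rate) feature space.
CENTROIDS = {
    "indoor": {"speech": (0.2, 0.8), "music": (0.3, 0.2)},
    "outdoor": {"traffic": (0.9, 0.1), "wind": (0.7, 0.6)},
}

def stage_two(branch, features):
    """Fine-grained decision within the branch chosen by stage one."""
    x = (features["energy"], features["zcr"])

    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(x, centroid))

    return min(CENTROIDS[branch], key=lambda label: sq_dist(CENTROIDS[branch][label]))

def classify(features):
    branch = stage_one(features)
    return branch, stage_two(branch, features)

print(classify({"energy": 0.85, "zcr": 0.15}))  # ('outdoor', 'traffic')
```

The appeal of this architecture is that each stage only has to solve a small, well-separated problem: a feature that is noisy for the full 7-class task can still be highly discriminative within one branch.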


Keywords: Gaussian Mixture Model · Audio Signal · Independent Component Analysis · Blind Source Separation · Audio Feature





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • José Lopes (1)
  • Charles Lin (1)
  • Sameer Singh (1)
  1. Research School of Informatics, Loughborough University, Loughborough, UK
