Machine Learning, Volume 58, Issue 2–3, pp 127–149

Automatic Feature Extraction for Classifying Audio Data

  • Ingo Mierswa
  • Katharina Morik


Today, many private households as well as broadcasting or film companies own large collections of digital music recordings. These are time series that differ from, e.g., weather reports or stock market data. The task is normally classification, not prediction of the next value or recognition of a shape or motif. New methods for extracting features that allow audio data to be classified have been developed. However, developing appropriate feature extraction methods is tedious, particularly because every new classification task requires tailoring the feature set anew.

This paper presents a unifying framework for feature extraction from value series. Operators of this framework can be combined into feature extraction methods automatically, using a genetic programming approach. The construction of features is guided by the performance of the learning classifier that uses the features. Our approach to automatic feature extraction requires a balance between the completeness of the methods on the one hand and the tractability of searching for appropriate methods on the other. In this paper, some theoretical considerations illustrate this trade-off. After the feature extraction, a second process learns a classifier from the transformed data. The practical use of the methods is shown by two types of experiments: classification of genres and classification according to user preferences.
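The idea of combining operators into a feature extraction method and scoring the result by classifier performance can be illustrated with a small sketch. It is not the authors' implementation: the operator names, the fixed-length chain representation, the nearest-class-mean classifier, the toy data, and the simple mutate-and-keep search (a much reduced stand-in for genetic programming) are all assumptions made for illustration.

```python
import random
import statistics

# Hypothetical operator set (an assumption, not the paper's operators).
# Transformations map a value series to a value series.
TRANSFORMS = {
    "identity": lambda s: list(s),
    "diff": lambda s: [b - a for a, b in zip(s, s[1:])],
    "abs": lambda s: [abs(x) for x in s],
}
# Functionals map a value series to a single feature value.
FUNCTIONALS = {
    "mean": statistics.fmean,
    "variance": statistics.pvariance,
    "max": max,
}

def extract(method, series):
    """Apply a chain of transformations, then one functional."""
    transforms, functional = method
    for name in transforms:
        series = TRANSFORMS[name](series)
    return FUNCTIONALS[functional](series)

def fitness(method, data):
    """Training accuracy of a nearest-class-mean classifier on the feature.

    Using training accuracy keeps the sketch short; a held-out estimate
    would be the more careful choice.
    """
    feats = [(extract(method, s), label) for s, label in data]
    labels = sorted({label for _, label in feats})
    means = {l: statistics.fmean([f for f, lab in feats if lab == l]) for l in labels}
    hits = sum(min(means, key=lambda l: abs(means[l] - f)) == lab for f, lab in feats)
    return hits / len(feats)

def mutate(method, rng):
    """Replace either one transformation or the functional at random."""
    transforms, functional = list(method[0]), method[1]
    if rng.random() < 0.5:
        transforms[rng.randrange(len(transforms))] = rng.choice(list(TRANSFORMS))
    else:
        functional = rng.choice(list(FUNCTIONALS))
    return (tuple(transforms), functional)

def evolve(data, generations=50, chain_length=2, seed=0):
    """Keep a single candidate method and accept mutations that are not worse."""
    rng = random.Random(seed)
    best = (tuple(rng.choice(list(TRANSFORMS)) for _ in range(chain_length)),
            rng.choice(list(FUNCTIONALS)))
    best_fit = fitness(best, data)
    for _ in range(generations):
        cand = mutate(best, rng)
        cand_fit = fitness(cand, data)
        if cand_fit >= best_fit:
            best, best_fit = cand, cand_fit
    return best, best_fit

def toy_data():
    """Two made-up classes with equal means but different local variation."""
    smooth = [([i + o for i in range(8)], "smooth") for o in range(3)]
    rough = [([(7 if i % 2 else 0) + o for i in range(8)], "rough") for o in range(3)]
    return smooth + rough

if __name__ == "__main__":
    method, score = evolve(toy_data())
    print(method, score)
```

In this toy setting the two classes share the same mean, so a method has to pick up on local variation (for example by differencing) to separate them; the search only finds such a method because the classifier's accuracy is used as its fitness.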


analysis of audio data, feature extraction, time series transformations, music recommender systems



Copyright information

© Springer Science + Business Media, Inc. 2005

Authors and Affiliations

  1. Artificial Intelligence Unit, University of Dortmund, Germany
