Generic Audio Data Segmentation and Indexing

  • Tong Zhang
  • C.-C. Jay Kuo
Part of The Springer International Series in Engineering and Computer Science book series (SECS, volume 606)


In the coarse-level segmentation and indexing stage, audio data are segmented and classified into basic audio types. The classification is based on morphological and statistical analysis of the temporal curves of three short-time features (the energy function, the average zero-crossing rate, and the fundamental frequency), together with the spectral peak tracks of the audio signal. Threshold-based heuristic rules, derived empirically, guide the classification procedures. The approach is therefore generic and model-free, and it can be applied to audio recorded under a wide range of conditions. The scheme is illustrated in Figure 4.1.
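As a rough illustration of two of the features named above, the following minimal Python sketch computes per-frame short-time energy and zero-crossing rate curves and applies one threshold rule to them. It is not the authors' procedure: the frame length, hop size, feature definitions, the `is_silence` rule, and both threshold values are assumptions chosen only for the example, since the abstract does not specify them.

```python
import math


def frames(signal, frame_len, hop):
    """Yield successive fixed-length frames; a trailing partial frame is dropped."""
    for start in range(0, len(signal) - frame_len + 1, hop):
        yield signal[start:start + frame_len]


def short_time_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def feature_curves(signal, frame_len=256, hop=128):
    """Return the per-frame energy and zero-crossing rate curves as two lists."""
    energy, zcr = [], []
    for frame in frames(signal, frame_len, hop):
        energy.append(short_time_energy(frame))
        zcr.append(zero_crossing_rate(frame))
    return energy, zcr


def is_silence(energy, zcr, energy_thresh=1e-4, zcr_thresh=0.1):
    """Hypothetical threshold rule; both thresholds are placeholders, not values from the chapter."""
    return energy < energy_thresh and zcr < zcr_thresh


# Synthetic input: one second of a 440 Hz tone followed by one second of digital silence.
rate = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(rate)]
quiet = [0.0] * rate

energy, zcr = feature_curves(tone + quiet)
labels = [is_silence(e, z) for e, z in zip(energy, zcr)]
print(f"{sum(labels)} of {len(labels)} frames flagged as silence")
```

In a rule-based scheme of this kind, further rules would be stacked on the same curves (and on fundamental-frequency and spectral-peak features, which are omitted here) to separate the remaining audio types.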


  • Fundamental Frequency
  • Environmental Sound
  • Speech Segment
  • Music Background
  • Music Piece
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



Copyright information

© Springer Science+Business Media New York 2001

Authors and Affiliations

  • Tong Zhang (1)
  • C.-C. Jay Kuo (2)
  1. Integrated Media Systems Center, University of Southern California, Los Angeles, USA
  2. Department of Electrical Engineering-Systems, University of Southern California, Los Angeles, USA
