
Structural and Semantic Modeling of Audio for Content-Based Querying and Browsing

  • Conference paper

Part of the Lecture Notes in Computer Science book series (LNAI, volume 4027)

Abstract

A typical content-based audio management system deals with three aspects, namely audio segmentation and classification, audio analysis, and content-based retrieval of audio. In this paper, we integrate these three aspects into a single framework and propose an efficient method for flexible querying and browsing of auditory data. More specifically, we use two robust feature sets, MPEG-7 Audio Spectrum Flatness (ASF) and Mel Frequency Cepstral Coefficients (MFCC), as the underlying features to improve content-based retrieval accuracy, since each feature set has advantages for different types of audio (e.g., music and speech). The proposed system offers a wide range of ways to query and browse audio data by content, such as querying and browsing for a chorus section or sound effects, and query-by-example. In addition, clients can express their queries as point, range, and k-nearest-neighbor queries, which are particularly significant in the multimedia domain.
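The three query forms named above (point, range, and k-nearest neighbor) can be illustrated with a minimal sketch. It assumes, purely for illustration, that each audio segment is summarized by one fixed-length feature vector (for instance averaged MFCC or ASF values), that dissimilarity is Euclidean distance, and that the database is scanned linearly. The paper's actual representation, distance measure, and index structure may differ; `FeatureDB`, `point_query`, `range_query`, and `knn_query` are hypothetical names.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical database: segment id -> fixed-length feature vector
# (assumed, e.g., to be per-segment averaged MFCC or ASF values).
FeatureDB = Dict[str, Sequence[float]]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def point_query(db: FeatureDB, q: Sequence[float], eps: float = 1e-9) -> List[str]:
    """Point query: segments whose vector equals q, up to a tiny tolerance eps."""
    return [sid for sid, v in db.items() if euclidean(v, q) <= eps]


def range_query(db: FeatureDB, q: Sequence[float], radius: float) -> List[str]:
    """Range query: segments within `radius` of q, nearest first."""
    hits = sorted((euclidean(v, q), sid) for sid, v in db.items())
    return [sid for d, sid in hits if d <= radius]


def knn_query(db: FeatureDB, q: Sequence[float], k: int) -> List[Tuple[str, float]]:
    """k-NN query: the k segments closest to q, as (id, distance) pairs."""
    best = heapq.nsmallest(k, ((euclidean(v, q), sid) for sid, v in db.items()))
    return [(sid, d) for d, sid in best]


if __name__ == "__main__":
    # Toy vectors, invented for illustration only.
    db = {
        "speech_01": [0.1, 0.2, 0.3],
        "music_07": [0.9, 0.8, 0.7],
        "chorus_03": [0.85, 0.8, 0.75],
    }
    print(point_query(db, [0.1, 0.2, 0.3]))       # ['speech_01']
    print(range_query(db, [0.9, 0.8, 0.7], 0.1))  # ['music_07', 'chorus_03']
    print(knn_query(db, [0.0, 0.2, 0.3], 2))
```

A linear scan costs one distance computation per stored segment for every query; a practical system would usually put a multidimensional index in front of the database, but the semantics of the three query types stay the same.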

Keywords

  • Similarity Matrix
  • Range Query
  • Point Query
  • Audio Data
  • Audio Analysis

These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
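Browsing for a chorus section depends on locating repeated material, and "Similarity Matrix" appears among the keywords above. One generic way to expose repetition is sketched below under stated assumptions: frame-level feature vectors (assumed here to be, e.g., MFCC or ASF frames) are compared pairwise with cosine similarity to form a self-similarity matrix, and each diagonal lag is scored by its mean similarity. This is a textbook illustration, not the authors' algorithm; `self_similarity` and `best_repetition_lag` are hypothetical names.

```python
import math
from typing import List, Optional, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def self_similarity(frames: List[Sequence[float]]) -> List[List[float]]:
    """S[i][j] = similarity between frame i and frame j (symmetric matrix)."""
    n = len(frames)
    return [[cosine(frames[i], frames[j]) for j in range(n)] for i in range(n)]


def best_repetition_lag(
    S: List[List[float]], min_lag: int = 1
) -> Tuple[Optional[int], float]:
    """Return the lag whose diagonal S[i][i+lag] has the highest mean similarity.

    A high score suggests the material tends to repeat after `lag` frames.
    Returns (None, -inf) if the matrix is too small for any lag >= min_lag.
    """
    n = len(S)
    best_lag: Optional[int] = None
    best_score = float("-inf")
    for lag in range(min_lag, n):
        diag = [S[i][i + lag] for i in range(n - lag)]
        score = sum(diag) / len(diag)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag, best_score
```

The lag scan only reveals the dominant repetition period; finding where a repeated segment such as a chorus begins and ends requires further processing along the chosen diagonal, and very large lags rest on few frame pairs, so a real system would typically bound or weight them.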

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Sert, M., Baykal, B., Yazıcı, A. (2006). Structural and Semantic Modeling of Audio for Content-Based Querying and Browsing. In: Larsen, H.L., Pasi, G., Ortiz-Arroyo, D., Andreasen, T., Christiansen, H. (eds) Flexible Query Answering Systems. FQAS 2006. Lecture Notes in Computer Science, vol 4027. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11766254_27

  • DOI: https://doi.org/10.1007/11766254_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-34638-8

  • Online ISBN: 978-3-540-34639-5

  • eBook Packages: Computer Science, Computer Science (R0)
