A Fuzzy Logic Approach for Content-Based Audio Classification and Boolean Retrieval

  • Chapter
Fuzzy Logic and the Internet

Part of the book series: Studies in Fuzziness and Soft Computing ((STUDFUZZ,volume 137))

Summary

Since the invention of fuzzy sets and the maturing of fuzzy logic theory, fuzzy logic systems have been widely applied in fields such as fuzzy control and data mining. New application areas for fuzzy logic are also being explored as other technologies emerge. One booming technology today is the Internet, with its fast-growing number of users and rich content. As large data storage and fast networks become available, multimedia content such as images, video, and audio is increasing rapidly. To search and index these media effectively, various content-based multimedia retrieval systems have been studied.

In this chapter, we introduce a fuzzy logic approach for hierarchical content-based audio classification and boolean retrieval, which is intuitive due to the fuzzy nature of human perception of audio, especially audio clips of mixed types. The fuzzy nature of audio search lies in the facts that (1) both the query and the target are approximations of the user's memory and desire and (2) exact matching is sometimes impossible or impractical. Therefore, fuzzy logic systems are a natural choice for audio classification and retrieval.

The fuzzy tree classifier is the core of the hierarchical content-based audio classification. First, audio features are extracted from the audio samples in the database. Suitable features are then selected and used as input to a constructed fuzzy inference system (FIS). The outputs of the FIS are two types of hierarchical audio classes. The membership functions and rules are derived from the distributions of the audio features. At the first level of the hierarchy, the FIS discriminates non-speech and music sounds. At the second level, music and speech are separated. One particular sound, the telephone ring, is also recognized at this level. In the prototype system, classification down to the fourth level has been explored. Hence we can use multiple FISs to form the 'fuzzy tree' for retrieval of different types of audio clips. With this approach, we can classify and retrieve generic audio using fewer features and less computation time than other existing approaches.
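The idea of chaining several FISs into a tree can be illustrated with a minimal sketch. Everything concrete in it is an assumption made for illustration and is not taken from the chapter: the feature names `f1` and `f2`, the ramp-shaped membership functions and their breakpoints, the use of `min` for rule conjunction, the complement for the second class, and the class layout (which only loosely follows the levels described above).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Features = Dict[str, float]  # hypothetical normalised features in [0, 1]


def ramp_up(x: float, lo: float, hi: float) -> float:
    """Membership rising linearly from 0 at `lo` to 1 at `hi`."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


@dataclass
class FuzzyNode:
    """One simplified FIS in the tree: yields a degree for each of two classes."""
    labels: Tuple[str, str]
    left_degree: Callable[[Features], float]  # degree of labels[0]
    children: Tuple[Optional["FuzzyNode"], Optional["FuzzyNode"]] = (None, None)

    def classify(self, feats: Features) -> List[Tuple[str, float]]:
        """Descend the tree, returning (label, degree) for each level visited."""
        left = self.left_degree(feats)
        right = 1.0 - left  # assumption: the two classes are complementary
        idx = 0 if left >= right else 1
        path = [(self.labels[idx], max(left, right))]
        child = self.children[idx]
        if child is not None:
            path.extend(child.classify(feats))
        return path


# Second level: a single illustrative rule,
# "IF f1 is high AND f2 is high THEN speech", with AND taken as min.
speech_vs_music = FuzzyNode(
    labels=("speech", "music"),
    left_degree=lambda f: min(ramp_up(f["f1"], 0.2, 0.6),
                              ramp_up(f["f2"], 0.3, 0.7)),
)

# First level: only the left branch is refined further.
root = FuzzyNode(
    labels=("speech_or_music", "other"),
    left_degree=lambda f: ramp_up(f["f1"], 0.2, 0.6),
    children=(speech_vs_music, None),
)

if __name__ == "__main__":
    print(root.classify({"f1": 0.9, "f2": 0.8}))
    print(root.classify({"f1": 0.1, "f2": 0.8}))
```

In a real system the membership functions and rules would be derived from the measured feature distributions, as the text describes, rather than set by hand as here.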

As for retrieval, existing content-based audio retrieval systems usually adopt the query-by-example mechanism to search for desired audio files. However, a single audio sample often cannot express the user's needs adequately. To overcome this problem, more audio files can be chosen as queries, either provided by the user or obtained through feedback during searching. Accordingly, we present a different scheme to handle content-based audio retrieval with multiple queries. The queries are linked by boolean operators, so retrieval can be treated as a boolean search problem. We build a framework that handles the three basic boolean operators, AND, OR, and NOT, with concepts adopted from fuzzy logic. Experiments have shown that boolean search can be helpful in audio retrieval.
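One common way to combine several example queries with boolean operators is to treat each query's similarity score as a fuzzy membership degree. The sketch below assumes the standard min, max, and complement operators for AND, OR, and NOT; the chapter's actual framework may define them differently. The similarity measure, the two-dimensional feature vectors, and the clip names are also hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def similarity(a: Vector, b: Vector) -> float:
    """Map Euclidean distance into (0, 1]; identical vectors score 1.0.
    This particular mapping is an assumption chosen for illustration."""
    return math.exp(-math.dist(a, b))


def fuzzy_and(*degrees: float) -> float:
    return min(degrees)


def fuzzy_or(*degrees: float) -> float:
    return max(degrees)


def fuzzy_not(degree: float) -> float:
    return 1.0 - degree


def rank(database: Dict[str, Vector],
         score: Callable[[Vector], float]) -> List[Tuple[str, float]]:
    """Score every clip with the combined query and sort best-first."""
    scored = [(name, score(vec)) for name, vec in database.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical feature vectors for three clips and two example queries.
database = {
    "clip_a": (0.0, 0.0),
    "clip_b": (1.0, 0.0),
    "clip_c": (3.0, 0.0),
}
q1 = (0.0, 0.0)
q2 = (1.0, 0.0)


def like_q1_and_not_q2(vec: Vector) -> float:
    """Boolean query: similar to q1 AND NOT similar to q2."""
    return fuzzy_and(similarity(vec, q1), fuzzy_not(similarity(vec, q2)))


results = rank(database, like_q1_and_not_q2)

if __name__ == "__main__":
    for name, degree in results:
        print(f"{name}: {degree:.3f}")
```

Because every operator returns a degree in [0, 1], compound queries can be nested freely, and the final degree gives a ranking rather than a hard accept/reject decision.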

Copyright information

© 2004 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Liu, M., Wan, C., Wang, L. (2004). A Fuzzy Logic Approach for Content-Based Audio Classification and Boolean Retrieval. In: Loia, V., Nikravesh, M., Zadeh, L.A. (eds) Fuzzy Logic and the Internet. Studies in Fuzziness and Soft Computing, vol 137. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39988-9_7

  • DOI: https://doi.org/10.1007/978-3-540-39988-9_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-05770-0

  • Online ISBN: 978-3-540-39988-9

  • eBook Packages: Springer Book Archive
