Telecommunication Systems, Volume 9, Issue 3–4, pp 375–391

A pattern classification proposal for object‐oriented audio coding in MPEG‐4

  • Francesco Beritelli
  • Salvatore Casale
  • Marco Russo


The future MPEG‐4 standard will adopt an object‐oriented encoding strategy whereby an audio source is encoded at a very low bit‐rate by adapting a suitable coding scheme to the local characteristics of the signal. One of the most delicate issues in this approach is that the overall performance of the audio encoder greatly depends on the accuracy with which the input signal is classified. This paper shows that the difficult problem of audio classification for object‐oriented coding can be effectively solved by selecting a salient set of acoustic parameters and adopting a fuzzy model for each audio object, obtained by a soft computing‐hybrid learning tool. The audio classifier proposed operates at two levels: recognition of the class to which the input signal belongs (talkspurt, music, noise, signaling tones) and then recognition of the subclass to which it belongs. The results obtained show that fuzzy logic is a valid alternative to the matching techniques of a traditional pattern recognition approach.
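The two-level scheme described above can be illustrated with a minimal sketch. This is not the authors' classifier. The abstract does not list the acoustic parameters, the subclasses, or the rule bases, so the feature names (`zcr`, `flatness`), the subclass labels (`voiced`, `unvoiced`), the Gaussian membership functions, and every numeric value below are hypothetical. The rules are also set by hand here, whereas the paper obtains its fuzzy models with a hybrid learning tool.

```python
import math


def gaussian(x, center, width):
    """Membership degree of x in a Gaussian fuzzy set, in (0, 1]."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)


def rule_strength(features, rule):
    """Firing strength of one rule: fuzzy AND (minimum) over its antecedents."""
    return min(gaussian(features[name], c, w) for name, (c, w) in rule.items())


def classify(features, rules):
    """Return the label whose rule fires most strongly, plus all strengths."""
    strengths = {label: rule_strength(features, rule) for label, rule in rules.items()}
    return max(strengths, key=strengths.get), strengths


# Hypothetical level-1 rule base: one rule per class named in the abstract.
# Each antecedent maps a feature name to (center, width) of a Gaussian set.
CLASS_RULES = {
    "talkspurt": {"zcr": (0.30, 0.15), "flatness": (0.30, 0.15)},
    "music": {"zcr": (0.15, 0.10), "flatness": (0.15, 0.10)},
    "noise": {"zcr": (0.60, 0.20), "flatness": (0.80, 0.15)},
    "signaling tones": {"zcr": (0.05, 0.05), "flatness": (0.02, 0.05)},
}

# Hypothetical level-2 rule bases, keyed by level-1 class.
# Only one class is given subclasses in this illustration.
SUBCLASS_RULES = {
    "talkspurt": {
        "voiced": {"zcr": (0.15, 0.10)},
        "unvoiced": {"zcr": (0.50, 0.15)},
    },
}


def classify_two_level(features):
    """Level 1 picks the class; level 2 picks a subclass if any are defined."""
    top, _ = classify(features, CLASS_RULES)
    sub_rules = SUBCLASS_RULES.get(top)
    sub = classify(features, sub_rules)[0] if sub_rules else None
    return top, sub


if __name__ == "__main__":
    print(classify_two_level({"zcr": 0.2, "flatness": 0.3}))
    print(classify_two_level({"zcr": 0.6, "flatness": 0.8}))
```

Choosing the class by maximum firing strength is one simple defuzzification choice. A learned model would replace the hand-set centers and widths, and possibly the rule structure itself.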


Keywords (machine-generated, not supplied by the authors): Fuzzy Logic; Input Signal; Encode Strategy; Fuzzy Model; Acoustic Parameter





Copyright information

© Kluwer Academic Publishers 1998

Authors and Affiliations

  • Francesco Beritelli (1)
  • Salvatore Casale (1)
  • Marco Russo (1)
  1. Istituto di Informatica e Telecomunicazioni, University of Catania, Catania, Italy
