Identifying perceptually congruent structures for audio retrieval

  • Conference paper
Interactive Distributed Multimedia Systems and Telecommunication Services (IDMS 1998)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1483)

Abstract

The relatively low-cost access to large amounts of multimedia data, for example over the WWW, has created an increasing demand for multimedia data management. Audio data, however, has received relatively little research attention, mainly because it poses unique problems. Specifically, the unstructured nature of current audio representations considerably complicates content-based retrieval and, especially, browsing. This paper attempts to address this oversight by developing a representation based on the inherent, perceptually congruent structure of audio data. It first surveys the pertinent issues, including some of the limitations of current unstructured audio representations and of the existing retrieval systems based on them. It then discusses the benefits of a structured representation and the perceptual issues relevant to identifying the underlying structure of an audio data stream. Finally, the structured representation is described and its possible applications to retrieval and browsing are outlined.

Editor information

Thomas Plagemann, Vera Goebel

Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Melih, K., Gonzalez, R. (1998). Identifying perceptually congruent structures for audio retrieval. In: Plagemann, T., Goebel, V. (eds) Interactive Distributed Multimedia Systems and Telecommunication Services. IDMS 1998. Lecture Notes in Computer Science, vol 1483. Springer, Berlin, Heidelberg. https://doi.org/10.1007/BFb0055311

  • DOI: https://doi.org/10.1007/BFb0055311

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-64955-7

  • Online ISBN: 978-3-540-49914-5

  • eBook Packages: Springer Book Archive
