Development of an audio-visual database system for human identification

  • C. B. Bargale
  • S. Chaudhuri
  • P. Bhattacharyya
Systems and Applications
Part of the Lecture Notes in Computer Science book series (LNCS, volume 1206)


Database systems dealing with textual content have been in use for a long time. A database management system (DBMS) allows convenient and efficient storage and retrieval of huge amounts of data. Traditional databases are designed to handle alphanumeric data efficiently, but fail to manage complex data such as audio and video. One-dimensional audio data and two-dimensional image data can be stored in the form of binary large objects (BLOBs) with no emphasis on their contents. Textual information can be attached to BLOBs for retrieval, but mere textual information is insufficient for describing the rich contents of such data. There is therefore a need to extend the capabilities of such information management systems to handle both audio and visual data. The contents of such data items can be extracted in the form of features, which can then be used to distinguish amongst instances of these data types.
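The contrast drawn above can be illustrated with a minimal sketch: a relational table that stores a raw BLOB alongside extracted content features, so that queries can go beyond attached free text. The schema and feature columns here (`nose_width`, `pitch_hz`) are illustrative assumptions, not the paper's actual schema.

```python
import sqlite3

# Minimal sketch (assumed schema): a BLOB column holds the raw media,
# while extracted content features become ordinary queryable columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE person (
        name        TEXT,
        face_image  BLOB,     -- raw 2-D image data, contents opaque to the DBMS
        description TEXT,     -- textual information attached to the BLOB
        nose_width  REAL,     -- content features extracted from the image/audio
        pitch_hz    REAL
    )
""")
conn.execute("INSERT INTO person VALUES (?, ?, ?, ?, ?)",
             ("alice", b"\x89PNG...", "frontal face photo", 3.1, 210.0))

# A content-based query that the plain BLOB-plus-text design cannot answer:
rows = conn.execute(
    "SELECT name FROM person WHERE nose_width BETWEEN 3.0 AND 3.2"
).fetchall()
print(rows)  # [('alice',)]
```

With only the `description` column, retrieval would be limited to keyword matching; the feature columns make the media's content directly searchable.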

This paper describes how the relational data model can be extended to retrieve face images and audio data in the form of utterances of letters of the alphabet. Face images are characterized by the sizes of facial features, e.g. the nose and lips, and by the inter-feature distances. The audio data is characterized by pitch, formants and LPC coefficients. The purpose of the paper is to develop an automated system for human identification based on audio-visual querying. The system allows a query to be partly audio, partly visual and partly textual.
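One way to picture retrieval over such features is a weighted distance between a query's feature vector and each stored record, combining face measurements with audio measurements. The following is an illustrative sketch, not the paper's actual matching algorithm; the feature names, weights, and data are hypothetical.

```python
import math

# Hypothetical feature vectors: face features (feature sizes,
# inter-feature distances) concatenated with audio features (e.g. pitch).
def feature_distance(query, record, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (q - r) ** 2
                         for q, r, w in zip(query, record, weights)))

def retrieve(query, database, weights, top_k=3):
    """Rank database entries by feature similarity to the query."""
    ranked = sorted(database,
                    key=lambda rec: feature_distance(query, rec["features"], weights))
    return [rec["name"] for rec in ranked[:top_k]]

# Toy records: (nose_width, lip_width, eye_distance, pitch_hz)
db = [
    {"name": "alice", "features": [3.1, 5.0, 6.2, 210.0]},
    {"name": "bob",   "features": [3.5, 5.4, 6.8, 120.0]},
    {"name": "carol", "features": [3.0, 4.9, 6.1, 205.0]},
]
weights = [1.0, 1.0, 1.0, 0.01]  # down-weight pitch, which has a larger scale

print(retrieve([3.05, 4.95, 6.15, 208.0], db, weights, top_k=2))
```

Because each feature contributes independently to the distance, a query can be "partly audio, partly visual": missing modalities can simply be given zero weight.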





Copyright information

© Springer-Verlag Berlin Heidelberg 1997

Authors and Affiliations

  • C. B. Bargale (1)
  • S. Chaudhuri (1)
  • P. Bhattacharyya (2)

  1. Department of Electrical Engineering, Indian Institute of Technology, Powai, Bombay 400 076
  2. Department of Computer Science and Engineering, Indian Institute of Technology, Powai, Bombay 400 076
