Development of an audio-visual database system for human identification
Database systems dealing with textual content have been in use for a long time. A database management system (DBMS) allows convenient and efficient storage and retrieval of large amounts of data. Traditional databases are designed to handle alphanumeric data efficiently, but they fail to manage complex data such as audio and video. One-dimensional audio data and two-dimensional image data can be stored as binary large objects (BLOBs), but this places no emphasis on their contents. Textual information can be attached to BLOBs for retrieval, but textual information alone is insufficient to describe the rich contents of such data. There is therefore a need to extend the capabilities of such information management systems to handle both audio and visual data. The contents of these data items can be extracted in the form of features, which can then be used to distinguish among instances of these data types.
This paper describes how the relational data model can be extended to retrieve face images and audio data in the form of spoken letters of the alphabet. Face images are characterized by the sizes of different objects (e.g. the nose and lips) and by the inter-object distances. The audio data is characterized by pitch, formants and linear predictive coding (LPC) coefficients. The purpose of the paper is to develop an automated system for human identification based on audio-visual querying. The system allows a query to be partly audio, partly visual and partly textual.
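
As a rough illustration of the kind of mixed query described above, the following Python sketch stores textual attributes and extracted features as ordinary columns of a relational table and ranks candidates by feature distance. Everything here is an assumption made for illustration and is not taken from the paper: the table layout, the column names (`nose_width`, `pitch_hz` and so on), the `identify` function, and the use of a relative Euclidean distance as the similarity measure. Feature extraction from the raw image and audio is assumed to have happened already.

```python
import math
import sqlite3

# Hypothetical schema: textual attributes plus numeric features that are
# assumed to have been extracted from the face image and the utterance.
TEXT_COLS = ("name", "gender")
FACE_COLS = ("nose_width", "lip_width", "eye_distance")
AUDIO_COLS = ("pitch_hz", "formant1_hz")
FEATURE_COLS = FACE_COLS + AUDIO_COLS


def make_demo_db():
    """Create an in-memory database with a few made-up records."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, gender TEXT, "
        + ", ".join(f"{c} REAL" for c in FEATURE_COLS) + ")"
    )
    rows = [
        ("alice", "F", 3.0, 5.0, 6.0, 210.0, 700.0),
        ("bob", "M", 3.5, 5.5, 6.5, 120.0, 600.0),
        ("carol", "F", 3.1, 5.1, 6.2, 220.0, 720.0),
    ]
    conn.executemany(
        "INSERT INTO person (name, gender, " + ", ".join(FEATURE_COLS)
        + ") VALUES (?, ?, ?, ?, ?, ?, ?)",
        rows,
    )
    return conn


def identify(conn, text=None, face=None, audio=None, top_k=3):
    """Return up to top_k (name, distance) pairs, nearest first.

    text:  exact-match constraints on textual columns, applied in SQL.
    face:  partial dict of face features supplied by the query.
    audio: partial dict of audio features supplied by the query.
    Query feature values are assumed to be positive (they are used as divisors).
    """
    text = text or {}
    query = {**(face or {}), **(audio or {})}
    if set(text) - set(TEXT_COLS) or set(query) - set(FEATURE_COLS):
        raise ValueError("unknown column in query")

    # The textual part narrows the candidate set with an ordinary WHERE clause.
    sql = "SELECT name, " + ", ".join(FEATURE_COLS) + " FROM person"
    if text:
        sql += " WHERE " + " AND ".join(f"{col} = ?" for col in text)
    candidates = conn.execute(sql, list(text.values())).fetchall()

    # The audio-visual part ranks candidates using only the supplied features.
    results = []
    for name, *values in candidates:
        stored = dict(zip(FEATURE_COLS, values))
        dist = math.sqrt(
            sum(((stored[c] - q) / q) ** 2 for c, q in query.items())
        )
        results.append((name, dist))
    results.sort(key=lambda pair: (pair[1], pair[0]))
    return results[:top_k]


if __name__ == "__main__":
    db = make_demo_db()
    matches = identify(
        db,
        text={"gender": "F"},
        face={"nose_width": 3.1, "eye_distance": 6.2},
        audio={"pitch_hz": 218.0},
    )
    for name, dist in matches:
        print(f"{name}: {dist:.3f}")
```

Dividing each difference by the query value is only one simple way to keep features with different units (distances versus frequencies in Hz) on a comparable scale; a real system would more likely normalize against statistics of the stored data or use per-feature weights.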