A Method for Photograph Indexing Using Speech Annotation

  • Jiayi Chen
  • Tele Tan
  • Philippe Mulhem
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 2195)


We explore the feasibility of using speech input to index a large volume of digital photographs. As a natural medium for communicating about images, speech can complement existing content-based techniques, thereby improving the reliability and usability of image retrieval systems. We introduce a methodology for image indexing based on speech annotation. Speech recognition tools such as Dragon NaturallySpeaking can be adapted to perform the main task of speech-to-text transcription. Using structured speech rather than free-form speech in a limited system can further boost transcription accuracy. We also introduce the idea of using N-best lists from the speech recognition output to improve recognition performance. The transcribed text is used to populate the metadata of the corresponding photograph. A photo query strategy is implemented to confirm the performance of the proposed technique for photo indexing and retrieval.
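
The pipeline summarized in the abstract (structured speech, N-best lists, metadata population, query) could be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the "who ... where ... when ..." template, the function names, and the substring-based query are all assumptions made here, since the abstract does not give the actual annotation syntax or retrieval strategy.

```python
import re

# Hypothetical structured-speech template: "who <names> where <place> when <time>".
# The real annotation syntax is not given in the abstract.
PATTERN = re.compile(r"^who (?P<who>.+?) where (?P<where>.+?) when (?P<when>.+)$")


def parse_structured(hypothesis):
    """Return a metadata dict if the hypothesis fits the template, else None."""
    match = PATTERN.match(hypothesis.lower().strip())
    return match.groupdict() if match else None


def index_photo(n_best):
    """Use the highest-ranked hypothesis that parses.

    n_best is assumed to be ordered best-first, as a recognizer's
    N-best list usually is.
    """
    for hypothesis in n_best:
        metadata = parse_structured(hypothesis)
        if metadata is not None:
            return metadata
    return None


def search(index, field, term):
    """Return ids of photos whose metadata field contains the query term."""
    term = term.lower()
    return [pid for pid, meta in index.items() if term in meta.get(field, "")]


# The top hypothesis misrecognizes "where" as "wear", so it fails the
# template; the second hypothesis parses and supplies the metadata.
n_best = [
    "who john wear sentosa when last june",
    "who john where sentosa when last june",
]
index = {"photo_001": index_photo(n_best)}
matches = search(index, "where", "Sentosa")
```

The point of the sketch is that a constrained template gives a cheap way to reject misrecognized hypotheses, so a lower-ranked entry in the N-best list can replace a top-ranked one that does not fit the expected structure.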


Keywords (added by machine, not by the authors): Speech Recognition; Image Retrieval; Photograph Indexing; Image Retrieval System; High Recognition Accuracy




Copyright information

© Springer-Verlag Berlin Heidelberg 2001

Authors and Affiliations

  • Jiayi Chen (1)
  • Tele Tan (1)
  • Philippe Mulhem (2)

  1. Information-Base Functions Lab, Real World Computing Partnership, Kent Ridge Digital Labs, Singapore
  2. IPAL-CNRS, Singapore
