Universal Access in the Information Society, Volume 8, Issue 3, pp 199–218

Towards computer-vision software tools to increase production and accessibility of video description for people with vision loss

  • Langis Gagnon
  • Samuel Foucher
  • Maguelonne Heritier
  • Marc Lalonde
  • David Byrns
  • Claude Chapdelaine
  • James Turner
  • Suzanne Mathieu
  • Denis Laurendeau
  • Nath Tan Nguyen
  • Denis Ouellet
Long Paper


This paper presents the status of an R&D project targeting the development of computer-vision tools to assist humans in generating and rendering video description for people with vision loss. Three principal issues are discussed: (1) production practices, (2) needs of people with vision loss, and (3) current system design, core technologies and implementation. The paper reports the main conclusions of consultations with producers of video description regarding their practices and with end users regarding their needs, as well as an analysis of described productions that led to the proposal of a video description typology. The current status of a prototype software application (the audio-vision manager) is also presented; it uses many computer-vision technologies (shot transition detection, key-frame identification, key-face recognition, key-text spotting, visual motion, gait/gesture characterization, key-place identification, key-object spotting and image categorization) to automatically extract visual content, associate textual descriptions and add them to the audio track with a synthetic voice. A proof of concept is also briefly described for a first adaptive video description player, which allows end users to select various levels of video description.
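To illustrate the simplest of the computer-vision components listed above, shot transition detection is commonly approached by comparing colour or grey-level histograms of consecutive frames and declaring a hard cut when the distance exceeds a threshold. The following is a minimal sketch of that idea, not the authors' implementation; the threshold value and the synthetic frames are illustrative assumptions.

```python
import numpy as np

def hist_diff(frame_a, frame_b, bins=16):
    """L1 distance between normalized grey-level histograms of two frames."""
    h_a, _ = np.histogram(frame_a, bins=bins, range=(0, 256))
    h_b, _ = np.histogram(frame_b, bins=bins, range=(0, 256))
    return np.abs(h_a / h_a.sum() - h_b / h_b.sum()).sum()

def detect_cuts(frames, threshold=0.5):
    """Return the indices i where a hard cut is declared between
    frame i-1 and frame i (threshold chosen for illustration only)."""
    return [i for i in range(1, len(frames))
            if hist_diff(frames[i - 1], frames[i]) > threshold]

# Synthetic example: five dark frames followed by five bright frames
# should yield a single detected cut at index 5.
dark = [np.full((32, 32), 20, dtype=np.uint8)] * 5
bright = [np.full((32, 32), 220, dtype=np.uint8)] * 5
print(detect_cuts(dark + bright))  # [5]
```

Gradual transitions (dissolves, fades) defeat this pairwise scheme and are usually handled by comparing histograms over a sliding window rather than adjacent frames only.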


Keywords: e-Accessibility · Video description · Video indexing · Computer vision



This work is supported in part by the Department of Canadian Heritage (http://www.pch.gc.ca) through Canadian Culture Online, and the Ministère du développement économique, de l’innovation et de l’exportation (MDEIE) of the Gouvernement du Québec (http://www.mdeie.gouv.qc.ca). The authors are very grateful to the reviewers for their constructive comments, which helped improve the quality of the paper.



Copyright information

© Springer-Verlag 2009

Authors and Affiliations

  • Langis Gagnon (1)
  • Samuel Foucher (1)
  • Maguelonne Heritier (1)
  • Marc Lalonde (1)
  • David Byrns (1)
  • Claude Chapdelaine (1)
  • James Turner (2)
  • Suzanne Mathieu (2)
  • Denis Laurendeau (3)
  • Nath Tan Nguyen (3)
  • Denis Ouellet (3)

  1. R&D Department, Computer Research Institute of Montreal (CRIM), Montreal, Canada
  2. École de bibliothéconomie et des sciences de l’information, Université de Montréal, Montreal, Canada
  3. Department of Electrical and Computer Engineering, Laval University, Quebec, Canada