Ontological inference for image and video analysis

  • Christopher Town
Original Paper

Abstract

This paper presents an approach to designing and implementing extensible computational models for perceiving systems, based on knowledge-driven joint inference. These models can integrate different sources of information both horizontally (multi-modal and temporal fusion) and vertically (bottom-up and top-down) by incorporating prior hierarchical knowledge expressed as an extensible ontology.

Two implementations of this approach are presented. The first consists of a content-based image retrieval system that allows users to search image databases using an ontological query language. Queries are parsed using a probabilistic grammar and Bayesian networks to map high-level concepts onto low-level image descriptors, thereby bridging the ‘semantic gap’ between users and the retrieval system. The second application extends the notion of ontological languages to video event detection. It is shown how effective high-level state and event recognition mechanisms can be learned from a set of annotated training sequences by incorporating syntactic and semantic constraints represented by an ontology.

Keywords: Ontologies; Perceptual inference; Content-based image retrieval; Video analysis; Knowledge-based computer vision

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. University of Cambridge Computer Laboratory, Cambridge, UK
