A Correlation Approach for Automatic Image Annotation

  • David R. Hardoon
  • Craig Saunders
  • Sandor Szedmak
  • John Shawe-Taylor
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4093)


The automatic annotation of images presents a particularly complex problem for machine learning researchers. In this work we experiment with semantic models and multi-class learning for the automatic annotation of query images. We represent the images using scale invariant transformation descriptors in order to account for similar objects appearing at slightly different scales and transformations. The resulting descriptors are utilised as visual terms for each image. We first aim to annotate query images by retrieving images that are similar to the query image. This approach rests on the assumption that similar images are annotated similarly. We then propose an image annotation method that learns a direct mapping from image descriptors to keywords. We compare the semantic-based methods of Latent Semantic Indexing (LSI) and Kernel Canonical Correlation Analysis (KCCA) with a recently proposed vector label based learning method known as Maximum Margin Robot.
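The retrieval-based step described in the abstract (annotate a query with the keywords of the images most similar to it) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each image is already a vector of visual-term counts, uses a plain truncated SVD as the Latent Semantic Indexing projection, and ranks keywords by similarity-weighted votes. The function names, the toy count matrix, and the keywords are all hypothetical.

```python
import numpy as np

def lsi_fit(X, k):
    """Return a rank-k latent basis for a (images x visual terms) count matrix."""
    # Thin SVD: X = U diag(s) Vt; the top-k right singular vectors span the latent space.
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return Vt[:k].T  # shape: (n_terms, k)

def project(X, V):
    """Project rows of X into the latent space and L2-normalise them."""
    Z = X @ V
    norms = np.linalg.norm(Z, axis=1, keepdims=True)
    return Z / np.maximum(norms, 1e-12)  # guard against zero vectors

def annotate(query, X_train, keywords, k=2, n_neighbours=2, n_labels=2):
    """Rank keywords for one query by similarity-weighted votes of its neighbours."""
    V = lsi_fit(X_train, k)
    Z_train = project(X_train, V)
    z_query = project(query[None, :], V)[0]
    sims = Z_train @ z_query                    # cosine similarity in latent space
    nearest = np.argsort(-sims)[:n_neighbours]  # indices of most similar images
    votes = {}
    for i in nearest:
        for word in keywords[i]:
            votes[word] = votes.get(word, 0.0) + sims[i]
    ranked = sorted(votes, key=lambda w: (-votes[w], w))
    return ranked[:n_labels]

# Hypothetical toy data: 4 annotated images described by 4 visual terms.
X_train = np.array([[3, 2, 0, 0],
                    [2, 3, 0, 0],
                    [0, 0, 3, 2],
                    [0, 0, 2, 3]], dtype=float)
keywords = [["sky", "plane"], ["sky", "bird"],
            ["grass", "tiger"], ["grass", "horse"]]
query = np.array([4, 3, 0, 0], dtype=float)

labels = annotate(query, X_train, keywords)
print(labels)
```

In this toy setting the query shares visual terms only with the first two images, so their keywords dominate the vote. A kernel-based method such as KCCA would replace the SVD projection with directions learned jointly from the image and keyword views.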


  • Support Vector Machine
  • Singular Value Decomposition
  • Query Image
  • Scale Invariant Feature Transformation
  • Image Annotation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • David R. Hardoon (1)
  • Craig Saunders (1)
  • Sandor Szedmak (2)
  • John Shawe-Taylor (1)

  1. University of Southampton, ISIS Research Group, Southampton, U.K.
  2. Department of Computer Science, University of Helsinki, Helsinki, Finland
