Knowledge and Information Systems, Volume 10, Issue 2, pp 135–162

CoMMA: a framework for integrated multimedia mining using multi-relational associations

  • Ankur M. Teredesai
  • Muhammad A. Ahmad
  • Juveria Kanodia
  • Roger S. Gaborski
Regular Paper


Generating captions or annotations automatically for still images is a challenging task. Traditionally, techniques involving higher-level (semantic) object detection and complex feature extraction have been employed for scene understanding, and text descriptions are then generated for a given image on the basis of this understanding. In this paper, we pose the auto-annotation problem as one of multi-relational association rule mining, where the relations exist between image-based features and textual annotations. The central idea is to combine low-level image features such as color, orientation, and intensity with the corresponding text annotations to generate association rules across multiple tables using multi-relational association mining. Subsequently, we use these association rules to auto-annotate test images.
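The idea can be illustrated with a toy, single-table sketch (not the paper's multi-relational algorithm): each transaction joins quantized image-feature items with caption words, and rules of the form {feature items} → word are kept when they clear support and confidence thresholds. The item names, thresholds, and Apriori-style counting here are illustrative assumptions only.

```python
from itertools import combinations
from collections import defaultdict

def mine_rules(transactions, min_support=2, min_conf=0.6):
    """Mine simple rules {feature items} -> annotation word from
    transactions that join image-feature items with caption words."""
    # Count supports of all itemsets up to size 3 (toy exhaustive pass).
    counts = defaultdict(int)
    for t in transactions:
        items = sorted(t)
        for k in (1, 2, 3):
            for combo in combinations(items, k):
                counts[combo] += 1
    rules = []
    for itemset, supp in counts.items():
        if supp < min_support or len(itemset) < 2:
            continue
        for i, consequent in enumerate(itemset):
            # Only keep rules whose consequent is an annotation word.
            if not consequent.startswith("word="):
                continue
            antecedent = itemset[:i] + itemset[i + 1:]
            confidence = supp / counts[antecedent]
            if confidence >= min_conf:
                rules.append((antecedent, consequent, confidence))
    return rules

# Toy data: quantized image features joined with caption words.
transactions = [
    {"color=blue", "intensity=high", "word=sky"},
    {"color=blue", "orientation=horizontal", "word=sky"},
    {"color=green", "intensity=low", "word=grass"},
    {"color=blue", "intensity=high", "word=sky"},
]
rules = mine_rules(transactions)
```

In this sketch the join into one transaction table produces many such trivial co-occurrence rules; the multi-relational formulation described above is motivated precisely by pruning them.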

In this paper, we also present a multi-relational extension to the FP-tree algorithm that accomplishes the association rule mining task efficiently. The motivation for using multi-relational association rule mining on multimedia data is to exploit the potential offered by multiple descriptions of the same image (such as multiple people labeling the same image differently). Moreover, multi-relational association rule mining benefits the auto-annotation process by pruning the trivial associations that would be generated if text and image features were combined in a single table through a join. We discuss these issues and the results of our auto-annotation experiments on different test sets. Another contribution of this paper is highlighting the need for robust evaluation metrics for the image annotation task. We propose several applicable scoring techniques and evaluate the performance of the different algorithms to study the utility of these techniques. A detailed analysis of the datasets used and the performance results concludes the paper.
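As one concrete example of a scoring technique for the annotation task, a generic set-overlap measure (not necessarily one of the metrics proposed in the paper) compares predicted and ground-truth annotation words per image:

```python
def annotation_score(predicted, truth):
    """Set-based precision, recall, and F1 for one image's annotations."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Two of three predicted words match the ground truth.
p, r, f = annotation_score(["sky", "grass", "tree"], ["sky", "tree", "water"])
```

Set-based overlap ignores word order and duplicates, which matches the unordered nature of image annotations but cannot credit near-synonyms; that limitation is part of why robust annotation metrics remain an open issue.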


Keywords: Image captioning · Multimedia data mining · Auto-annotation · Multi-relational association rule mining · FP-Growth · Multi-relational FP-Growth · Text-based image retrieval




Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • Ankur M. Teredesai (1)
  • Muhammad A. Ahmad (1)
  • Juveria Kanodia (1)
  • Roger S. Gaborski (1)

  1. Department of Computer Science, Rochester Institute of Technology, Rochester, USA
