Information Systems Frontiers, Volume 16, Issue 5, pp. 787–799

Concept-concept association information integration and multi-model collaboration for multimedia semantic concept detection

  • Tao Meng
  • Mei-Ling Shyu


Recent developments in digital camera technology and the popularity of social network websites such as Facebook and YouTube have created huge amounts of multimedia data. Multimedia information is ubiquitous and essential in many applications. To fill the gap between low-level data and high-level application requirements (the so-called semantic gap), advanced methods and tools are needed to automatically mine and annotate high-level concepts, so that low-level features can be associated directly with high-level concepts. It has been shown that concept-concept association can be effective in bridging the semantic gap in multimedia data. In this paper, a concept-concept association information integration and multi-model collaboration framework is proposed to enhance high-level semantic concept detection from multimedia data. Several experiments are conducted, and the comparison results demonstrate that the proposed framework outperforms the compared approaches in terms of Mean Average Precision (MAP).
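The results above are reported as Mean Average Precision (MAP). As a reference point, the sketch below computes non-interpolated average precision (AP) for each concept over a ranked list of shots, then averages the AP values across concepts. It is a generic illustration of the metric, not the authors' evaluation code: the function names and toy scores are hypothetical, and benchmark campaigns such as TRECVID may use variants (for example, inferred AP or AP over only the top-ranked shots) that would produce different numbers.

```python
from typing import Dict, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated average precision for a single concept.

    scores: detection score per shot (higher means more confident).
    labels: ground truth per shot, 1 if the shot contains the concept, else 0.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    # Rank shots by descending score; sorted() is stable, so ties keep input order.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank  # precision at the rank of this relevant shot
    # Divide by the number of relevant shots; a concept with none scores 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    results: Dict[str, Tuple[Sequence[float], Sequence[int]]]
) -> float:
    """MAP: arithmetic mean of the per-concept AP values."""
    if not results:
        raise ValueError("no concepts to evaluate")
    aps = [average_precision(s, y) for s, y in results.values()]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy data: four shots, two concepts.
    toy = {
        "sky": ([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]),  # AP = (1/1 + 2/3) / 2 = 5/6
        "car": ([0.2, 0.7, 0.6, 0.4], [1, 0, 0, 1]),  # AP = (1/3 + 2/4) / 2 = 5/12
    }
    print(f"MAP = {mean_average_precision(toy):.4f}")  # prints "MAP = 0.6250"
```

Because MAP weights every concept equally, a framework that improves detection of a few hard concepts can raise MAP noticeably even when frequent concepts are already detected well.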


Keywords: Multi-model collaboration, Semantic gap, Association information integration, Concept detection



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, University of Miami, Coral Gables, USA
