Multimedia Tools and Applications, Volume 63, Issue 2, pp 357–385

SHIATSU: tagging and retrieving videos without worries



The dramatic growth of video content over modern media channels (such as the Internet and mobile phone platforms) directs the interest of media broadcasters towards the topics of video retrieval and content browsing. Several video retrieval systems benefit from the use of semantic indexing based on content, since it allows an intuitive categorization of videos. However, indexing is usually performed through manual annotation, thus introducing potential problems such as ambiguity, lack of information, and non-relevance of index terms. In this paper, we present SHIATSU, a complete system for video retrieval which is based on the (semi-)automatic hierarchical semantic annotation of videos exploiting the analysis of visual content; videos can then be searched by means of attached tags and/or visual features. We experimentally evaluate the performance of SHIATSU on two different real video benchmarks, proving its accuracy and efficiency.
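The abstract says that, once annotated, videos can be searched by their attached tags, by their visual features, or by both. The sketch below shows one way such a combined query could work. It assumes each video carries a set of tags and a fixed-length visual feature vector, and that vectors are compared by Euclidean distance with a plain linear scan. The `Video` record and the `search` function are hypothetical illustrations. They are not SHIATSU's actual implementation, which this page does not describe.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field


@dataclass
class Video:
    """Hypothetical record: an identifier, its attached tags, and a visual feature vector."""
    video_id: str
    tags: set[str] = field(default_factory=set)
    features: tuple[float, ...] = ()


def euclidean(a: tuple[float, ...], b: tuple[float, ...]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def search(videos: list[Video], tags=None, query_features=None, k: int = 10) -> list[Video]:
    """Keep videos that carry every required tag (if tags are given), then
    rank the survivors by visual distance to the query (if a vector is given)."""
    required = set(tags or [])
    candidates = [v for v in videos if required <= v.tags]
    if query_features is not None:
        # Linear scan over all candidates. A real system would likely use a metric index instead.
        candidates.sort(key=lambda v: euclidean(v.features, query_features))
    return candidates[:k]
```

A tag-only query keeps the original order of the matching videos, because Python's sort is stable and is never called in that case. A feature-only query ranks the whole collection by similarity. Supplying both arguments narrows the candidates by tag first and then ranks them by their visual features.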


Keywords: Content-based video annotation · Video segmentation · Hierarchical semantic video annotation · Visual features



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Ilaria Bartolini (1)
  • Marco Patella (1)
  • Corrado Romani (1)
  1. DEIS, Alma Mater Studiorum Università di Bologna, Bologna, Italy
