Multimedia Tools and Applications, Volume 62, Issue 1, pp 111–137

Interactive multi-user video retrieval systems

  • Marco Bertini
  • Alberto Del Bimbo
  • Andrea Ferracani
  • Lea Landucci
  • Daniele Pezzatini


In this paper we present two interactive multi-user systems for video search and browsing. The first is composed of web applications that allow multi-user interaction in a distributed environment; these applications are based on the Rich Internet Application paradigm and are designed to achieve the responsiveness and interactivity typical of a desktop application. The second system implements a multi-user collaborative application within a single location, exploiting multi-touch devices. Both systems use the same backend, based on a service-oriented architecture (SOA) that provides services for automatic and manual annotation, and an ontology-based video search and browsing engine. Ontology-based browsing lets users inspect the content of video collections; user queries are expanded through ontology reasoning. User-centered field trials of the systems, conducted to assess user experience and satisfaction, have shown that the approach followed to design these interfaces is highly appreciated by professional archivists and people working on multimedia.
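As an illustration of what expanding a query through ontology reasoning can look like, the following minimal sketch expands a query concept to all of its sub-concepts before matching it against video annotations. It is not the engine described in the paper: the toy ontology, the annotation table, and the function names `expand_query` and `search` are all hypothetical, and the only reasoning performed here is a traversal of the sub-concept hierarchy.

```python
from collections import deque

# Hypothetical toy ontology: each concept maps to its direct sub-concepts.
# Concept names are illustrative only and are not taken from the paper.
SUBCONCEPTS = {
    "vehicle": ["car", "airplane"],
    "car": ["police_car"],
    "airplane": [],
    "police_car": [],
}

# Hypothetical annotations: video id -> concepts annotated on that video.
ANNOTATIONS = {
    "v1": ["police_car"],
    "v2": ["person"],
    "v3": ["airplane", "person"],
}


def expand_query(concept, subconcepts):
    """Return the concept followed by all of its descendants, breadth-first."""
    expanded = []
    seen = {concept}
    queue = deque([concept])
    while queue:
        current = queue.popleft()
        expanded.append(current)
        for child in subconcepts.get(current, []):
            if child not in seen:  # guard against repeated or cyclic links
                seen.add(child)
                queue.append(child)
    return expanded


def search(concept, annotations, subconcepts):
    """Return sorted ids of videos annotated with the concept or a descendant."""
    terms = set(expand_query(concept, subconcepts))
    return sorted(
        video_id
        for video_id, labels in annotations.items()
        if terms & set(labels)
    )


if __name__ == "__main__":
    print(expand_query("vehicle", SUBCONCEPTS))
    print(search("vehicle", ANNOTATIONS, SUBCONCEPTS))
```

In this sketch a query for "vehicle" also retrieves videos annotated only with a more specific concept such as "police_car", which is the practical benefit of expansion over exact label matching. A real ontology reasoner would typically handle richer relations than the plain hierarchy assumed here.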


Video retrieval, Usability, Multimedia ontologies, Natural interaction



The authors thank Giuseppe Becchi for his work on software development. This work was partially supported by the EU IST IM3I project (contract FP7-222267).



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Marco Bertini (1)
  • Alberto Del Bimbo (1)
  • Andrea Ferracani (1)
  • Lea Landucci (1)
  • Daniele Pezzatini (1)

  1. Media Integration and Communication Center, University of Florence, Florence, Italy
