Multimedia Tools and Applications, Volume 37, Issue 2, pp 135–167

DocMIR: An automatic document-based indexing system for meeting retrieval

  • Ardhendu Behera
  • Denis Lalanne
  • Rolf Ingold


This paper describes the DocMIR system, which automatically captures, analyzes and indexes meetings, conferences, lectures and similar events by taking advantage of the documents projected during them (e.g. slideshows, budget tables, figures). For instance, the system can record a lecture and automatically index it according to the presented slides and their contents. For indexing, the system requires neither specific software installed on the presenter’s computer nor any conscious intervention by the speaker during the presentation. The only material required by the system is the speaker’s electronic presentation file. Even if this file is not provided, the system still temporally segments the presentation and offers a simple storyboard-like browsing interface. The system runs on several capture boxes connected to cameras and microphones that record events synchronously. Once the recording is over, indexing is performed automatically by analyzing the content of the captured video containing the projected documents: the system detects scene changes, identifies the documents, computes their duration and extracts their textual content. Each captured image is identified against a repository containing all original electronic documents, captured audio–visual data and metadata created during post-production. The identification is based on the documents’ signatures, which hierarchically structure features from both the layout structure and the color distribution of the document images. Video segments are finally enriched with the textual content of the identified original documents, which further facilitates querying and retrieval without using OCR. The signature-based indexing method proposed in this article is robust, works with low-resolution images, and can be applied to several other applications, including real-time document recognition, multimedia IR and augmented reality systems.
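The indexing steps named in the abstract (detect scene changes, identify each document, compute its duration) can be illustrated with a minimal Python sketch. It is not the DocMIR implementation: the abstract does not specify the algorithms, so a plain feature vector stands in for the hierarchical layout-and-color signature, an L1 distance with a fixed threshold stands in for scene-change detection, and nearest-neighbour search stands in for identification. All names (`segment_frames`, `identify`, `index_recording`, the `threshold` and `fps` parameters) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical stand-in for a document's visual signature: a flat feature vector.
Signature = Sequence[float]


def l1_distance(a: Signature, b: Signature) -> float:
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def segment_frames(frames: List[Signature], threshold: float) -> List[Tuple[int, int]]:
    """Split a frame sequence into stable segments.

    A new segment starts whenever two consecutive frames differ by more than
    `threshold` (an assumed tuning parameter). Returns half-open
    (start, end) frame-index pairs.
    """
    if not frames:
        return []
    segments = []
    start = 0
    for i in range(1, len(frames)):
        if l1_distance(frames[i - 1], frames[i]) > threshold:
            segments.append((start, i))
            start = i
    segments.append((start, len(frames)))
    return segments


def identify(signature: Signature, repository: Dict[str, Signature]) -> str:
    """Return the id of the repository document with the closest signature."""
    return min(repository, key=lambda doc_id: l1_distance(signature, repository[doc_id]))


def index_recording(
    frames: List[Signature],
    repository: Dict[str, Signature],
    fps: float,
    threshold: float,
) -> List[Tuple[str, float, float]]:
    """Return (doc_id, start_seconds, duration_seconds) for each segment."""
    index = []
    for start, end in segment_frames(frames, threshold):
        # Use the middle frame of the segment as its representative image.
        representative = frames[(start + end) // 2]
        doc_id = identify(representative, repository)
        index.append((doc_id, start / fps, (end - start) / fps))
    return index


if __name__ == "__main__":
    repository = {"slide-1": [1.0, 0.0, 0.0], "slide-2": [0.0, 1.0, 0.0]}
    frames = [[0.9, 0.1, 0.0]] * 3 + [[0.1, 0.8, 0.1]] * 2
    print(index_recording(frames, repository, fps=1.0, threshold=0.5))
```

With the toy data above, the first three frames are matched to `slide-1` and the last two to `slide-2`. A real system would replace the feature vectors with signatures extracted from the captured images and would need a threshold tuned to capture noise.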


Keywords: Meeting recordings; Automated meeting indexing and retrieval; Low-resolution document identification; Multimedia content extraction; Multimedia IR





Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Department of Informatics, University of Fribourg, Fribourg, Switzerland
