Multimedia Systems, Volume 12, Issue 4–5, pp 439–457

Meeting browsing

State-of-the-art review
  • Matt-M. Bouamrane
  • Saturnino Luz
Regular Paper

Abstract

Meeting to discuss and share information, take decisions and allocate tasks is a central aspect of human activity. Computer-mediated communication offers enhanced possibilities for synchronous collaboration by allowing seamless capture of meetings, thus relieving participants from time-consuming documentation tasks. However, in order for meeting systems to be truly effective, they must allow users to efficiently navigate and retrieve information of interest from recorded meetings. In this article, we review the state of the art in multimedia segmentation, indexing and browsing techniques and show how existing meeting browser systems build on these techniques and integrate various modalities to meet their users’ information needs.

Keywords

Multimedia segmentation · Indexing and retrieval · Multimodal meeting browsers

Copyright information

© Springer-Verlag 2006

Authors and Affiliations

  1. Department of Computer Science, Trinity College Dublin, Dublin, Ireland