Multimedia Systems, Volume 13, Issue 2, pp 89–102

An analytical evaluation of search by content and interaction patterns on multimodal meeting records

  • Matt-M. Bouamrane
  • Saturnino Luz


It has been suggested that combining content-based indexing with automatically generated temporal metadata might help improve search and browsing of recordings of computer-mediated collaborative activities such as on-line meetings, which are characterised by extensive multimodal communication. This paper presents an analytical evaluation of the effectiveness of these techniques as implemented through automatic speech recognition and temporal mapping. In particular, it assesses the extent to which this strategy can help uncover contextual relationships between audio and text segments in recorded remote meetings. Results show that even simple temporal mapping can effectively support retrieval of recorded audio segments, improve retrieval performance in situations where speech recognition alone would have exhibited prohibitively high word error rates, and provide a basic form of semantic adaptation.
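The temporal mapping described above can be sketched as follows: each text operation in the shared artefact carries a timestamp, and each recorded speech segment carries a start and end time, so a text hit can be mapped to every speech segment whose interval overlaps a tolerance window around the edit. All names and the window size below are illustrative assumptions, not the paper's actual implementation.

```python
from dataclasses import dataclass

@dataclass
class SpeechSegment:
    """A recorded audio segment with speaker and time span (seconds)."""
    speaker: str
    start: float
    end: float

def map_edit_to_speech(edit_time, segments, window=2.0):
    """Return speech segments whose interval overlaps a tolerance
    window around a timestamped text edit."""
    lo, hi = edit_time - window, edit_time + window
    return [s for s in segments if s.start <= hi and s.end >= lo]

segments = [
    SpeechSegment("A", 10.0, 14.5),
    SpeechSegment("B", 15.0, 21.0),
    SpeechSegment("A", 40.0, 47.0),
]

# A text edit made at t = 14.2 s maps to the two overlapping segments,
# so a keyword match on the text retrieves the related audio even if
# speech recognition on those segments was unreliable.
related = map_edit_to_speech(14.2, segments)
```

The overlap test is the simplest interval relation; a fuller treatment would distinguish relations such as "before", "during", and "overlaps" to rank candidate segments.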


Keywords: Automatic speech recognition · Speech segment · Topic search · Topic shift · Meeting data

These keywords were added by machine and not by the authors. The process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag 2007

Authors and Affiliations

  1. School of Computer Science, University of Manchester, Manchester, UK
  2. Department of Computer Science, Trinity College Dublin, Dublin, Ireland
