International Journal on Digital Libraries, Volume 5, Issue 4, pp 287–298

Accessing the spoken word

  • Jerry Goldman
  • Steve Renals
  • Steven Bird
  • Franciska de Jong
  • Marcello Federico
  • Carl Fleischhauer
  • Mark Kornbluh
  • Lori Lamel
  • Douglas W. Oard
  • Claire Stewart
  • Richard Wright
Regular contribution


Spoken-word audio collections cover many domains, including radio and television broadcasts, oral narratives, governmental proceedings, lectures, and telephone conversations. The collection, access, and preservation of such data are stimulated by political, economic, cultural, and educational needs. This paper outlines the major issues in the field, reviews the current state of technology, examines the rapidly changing policy issues relating to privacy and copyright, and presents issues relating to the collection and preservation of spoken audio content.


Spoken document retrieval, Preservation, Copyright, Speech technology, Content annotation





Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • Jerry Goldman (1)
  • Steve Renals (2)
  • Steven Bird (3, 4)
  • Franciska de Jong (5)
  • Marcello Federico (6)
  • Carl Fleischhauer (7)
  • Mark Kornbluh (8)
  • Lori Lamel (9)
  • Douglas W. Oard (10)
  • Claire Stewart (11)
  • Richard Wright (12)

  1. Department of Political Science, Northwestern University, USA
  2. CSTR and School of Informatics, University of Edinburgh, UK
  3. LDC, University of Pennsylvania, USA
  4. Dept. of Computer Science, University of Melbourne, Australia
  5. CTIT, University of Twente, The Netherlands
  6. ITC-IRST, Trento, Italy
  7. Library of Congress, USA
  8. MATRIX and Department of History, Michigan State University, USA
  9. LIMSI-CNRS, Orsay, France
  10. College of Information Studies/UMIACS, University of Maryland, USA
  11. Library, Northwestern University, USA
  12. BBC Information and Archives, UK
