International Journal on Digital Libraries, Volume 5, Issue 4, pp 309–316

Emerging language technologies and the rediscovery of the past: a research agenda

  • Gregory Crane
  • Kalina Bontcheva
  • Jeffrey A. Rydberg-Cox
  • Clifford Wulfman
Regular contribution

Abstract

During 2001 and 2002, our Delos/NSF working group explored the possibilities that emerging language technologies open up for teaching, learning, and research in the broad area of cultural heritage. On the one hand, emerging language technologies will profoundly redefine the research and teaching of all those working with cultural heritage languages. At the same time, developers of language technology would also benefit from exploring the needs of new audiences and new collections. While multilingual technologies may ultimately prove the most revolutionary, this report focuses on monolingual technologies such as information extraction, summarization, and other aspects of document understanding. In this paper, we describe some of the audiences affected and technologies to be evaluated and argue for the creation of venues where the application of these technologies to cultural heritage materials can be rigorously evaluated. The potential impact of language technologies for our understanding of the past will emerge over a long period of time and will doubtless include many techniques not covered here. We make no claim to a comprehensive survey. Our goal is to provide enough information to suggest the potential importance of these new technologies.

Keywords

  • Language technologies
  • Hypertext
  • Digital libraries
  • Cultural heritage information systems

Copyright information

© Springer-Verlag 2005

Authors and Affiliations

  • Gregory Crane (1)
  • Kalina Bontcheva (2)
  • Jeffrey A. Rydberg-Cox (3)
  • Clifford Wulfman (4)

  1. Tufts University, Boston, USA
  2. University of Sheffield, Sheffield, UK
  3. University of Missouri at Kansas City, Kansas City, USA
  4. Brown University, Providence, USA
