Computers and the Humanities, Volume 33, Issue 1–2, pp 11–30

XML and the TEI

  • Steven DeRose


Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made with respect to the re-encoding of HTML documents and to the structuring of the overall corpus.

Keywords: XML, SGML, TEI, markup languages




Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Steven DeRose, Silver Spring, USA
