Computers and the Humanities, Volume 33, Issue 1–2, pp 11–30

XML and the TEI

  • Steven DeRose


Electronic texts are claimed to exhibit features distinct from their more tangible cousins. The Snapshot project aims to observe and capture language usage in an electronic medium by creating an open corpus of World Wide Web documents. These documents are re-encoded using the TEI guidelines to create a flexible, persistent and portable data repository. This report gives an overview of the decisions made in re-encoding HTML documents and in structuring the overall corpus.

Keywords: XML, SGML, TEI, markup languages



Copyright information

© Kluwer Academic Publishers 1999

Authors and Affiliations

  • Steven DeRose (1)
  1. Silver Spring, USA
