Parsing TEI XML

Jockers, Matthew L.; Thalken, Rosamond

doi:10.1007/978-3-030-39643-5_12

Part of the book series: Quantitative Methods in the Humanities and Social Sciences (QMHSS)

Abstract

This chapter introduces readers to parsing XML in R, with an emphasis on TEI-encoded XML.

Notes

1. Notice the different path here. The XML version of Moby Dick is located in a different sub-directory of the main “TAWR2.”
2. <TEI xmlns = "http://www.tei-c.org/ns/1.0">.
3. Notice that the xpath argument now includes the tei prefix.
4. A node inside of another node is often referred to as a “child” node.
5. As long as we are on this subject, the editors also decided that the “Etymology” and “Extracts” that come before the famous “Call me Ishmael” should not be treated as chapters either. What those sections are, exactly, is something for scholars to debate.
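
Notes 2 through 4 can be illustrated with a minimal sketch. It assumes the xml2 package and uses a small, hypothetical TEI fragment in place of the book's Moby Dick file. The `div1` element, the `type='chapter'` attribute, and all variable names are illustrative assumptions rather than code taken from the chapter.

```r
library(xml2)

# A toy TEI fragment for illustration only (not the book's XML file).
tei_text <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div1 type="chapter" n="1">
      <head>Loomings</head>
      <p>Call me Ishmael.</p>
    </div1>
    <div1 type="chapter" n="2">
      <head>The Carpet-Bag</head>
      <p>I stuffed a shirt or two into my old carpet-bag.</p>
    </div1>
  </body></text>
</TEI>'

doc <- read_xml(tei_text)

# Bind the prefix "tei" to the namespace declared on the root element (note 2).
ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# An XPath without the prefix does not match the namespaced elements.
no_prefix <- xml_find_all(doc, "//div1[@type='chapter']")

# The same XPath with the tei prefix finds the chapter nodes (note 3).
chapters <- xml_find_all(doc, "//tei:div1[@type='chapter']", ns = ns)

# Each <head> is a child node of its <div1> (note 4).
titles <- xml_text(xml_find_first(chapters, "./tei:head", ns = ns))

length(no_prefix)
length(chapters)
titles
```

Comparing the two lengths shows why the prefix matters: elements inside a default namespace are only reachable through an XPath that names that namespace.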

Author information

Authors and Affiliations

College of Arts and Sciences, Washington State University, Pullman, WA, USA
Matthew L. Jockers
Digital Technology and Culture Program, Washington State University, Pullman, WA, USA
Rosamond Thalken

About this chapter

Cite this chapter

Jockers, M.L., Thalken, R. (2020). Parsing TEI XML. In: Text Analysis with R. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-39643-5_12

DOI: https://doi.org/10.1007/978-3-030-39643-5_12
Published: 31 March 2020
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39642-8
Online ISBN: 978-3-030-39643-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)