Abstract
This chapter introduces readers to parsing XML in R, with an emphasis on TEI-encoded XML.
Notes
1. Notice the different path here. The XML version of Moby Dick is located in a different sub-directory of the main “TAWR2” directory.
2. <TEI xmlns = "http://www.tei-c.org/ns/1.0">.
3. Notice that the xpath argument now includes the tei prefix.
4. A node inside of another node is often referred to as a “child” node.
5. As long as we are on this subject, the editors also decided that the “Etymology” and “Extracts” that come before the famous “Call me Ishmael” should not be treated as chapters either. What those sections are, exactly, is something for scholars to debate.
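The notes on the TEI namespace, the tei prefix in the xpath argument, and child nodes can be illustrated with a minimal sketch. It assumes the xml2 package, which the notes do not name, and it uses a small made-up TEI fragment in place of the book's Moby Dick file. The element name div1, the type="chapter" attribute, and the object names are illustrative assumptions.

```r
# A sketch only: assumes the xml2 package is installed. The TEI fragment is a
# made-up stand-in for the real file, and div1/type="chapter" are assumptions.
library(xml2)

tei_text <- '<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text>
    <body>
      <div1 type="chapter" n="1">
        <head>Loomings</head>
        <p>Call me Ishmael.</p>
      </div1>
      <div1 type="chapter" n="2">
        <head>The Carpet-Bag</head>
        <p>I stuffed a shirt or two into my old carpet-bag.</p>
      </div1>
    </body>
  </text>
</TEI>'

# read_xml() accepts literal XML as well as a file path.
xml_doc <- read_xml(tei_text)

# Bind the prefix "tei" to the namespace declared on the root element.
tei_ns <- c(tei = "http://www.tei-c.org/ns/1.0")

# Without a prefix, the XPath asks for un-namespaced elements and matches nothing.
no_prefix <- xml_find_all(xml_doc, "//div1[@type='chapter']")

# With the tei prefix, the same query matches the chapter nodes.
chapters <- xml_find_all(xml_doc, "//tei:div1[@type='chapter']", ns = tei_ns)

# Each <head> is a child node of its <div1>; a relative XPath selects it.
titles <- xml_text(xml_find_first(chapters, "tei:head", ns = tei_ns))

# xml_children() returns the element children of a single node.
first_children <- xml_name(xml_children(chapters[[1]]))
```

Comparing length(no_prefix) with length(chapters) shows why the prefix matters: elements in a document with a default namespace are only reachable through a prefix bound to that namespace.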
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Jockers, M.L., Thalken, R. (2020). Parsing TEI XML. In: Text Analysis with R. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-030-39643-5_12
DOI: https://doi.org/10.1007/978-3-030-39643-5_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-39642-8
Online ISBN: 978-3-030-39643-5
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)