Abstract
This chapter introduces readers to parsing XML in R with an emphasis on TEI encoded XML.
Notes
- 1.
See Eder, Maciej. “Mind your corpus: systematic errors in authorship attribution.” In Conference Abstracts of the 2012 Digital Humanities Conference, Hamburg, Germany. http://www.dh2012.uni-hamburg.de/conference/programme/abstracts/mind-your-corpus-systematic-errors-in-authorship-attribution/.
- 2.
While it is possible to download all of the available packages, doing so would take a long time and would clog your installation with many irrelevant features. R is a multipurpose platform used in a huge range of disciplines, including biostatistics, network analysis, economics, data mining, geography, and hundreds of other fields and subfields. This diversity in the user community is one of the great advantages of R and of open-source software more generally. The diversity of options, however, can be daunting to the novice user. To make matters more unnerving, the online R user community is notoriously specialized and siloed, and it can appear rather impatient with newcomers who ask simple questions. Having said that, the online community is also an incredible resource that you must not ignore. Because R packages are typically written by programmers with some ad hoc motivation behind their coding, the packages are frequently weak on documentation and generally assume some, if not extensive, familiarity with the programmer’s academic discipline (even when the package has applications that cross disciplinary boundaries).
- 3.
Notice the different path here. The XML version of Moby Dick is located in a different subdirectory of the main.
- 4.
A node nested inside another node is often referred to as a “child” node.
- 5.
Notice that the chap.title object is another type of list, which is why further bracketed subsetting is required to get at the text contents.
- 6.
They won’t be exactly the same because they come from slightly different sources.
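Notes 4 and 5 can be illustrated with a minimal sketch. It assumes the XML package is installed, and it uses a small made-up TEI-like fragment (the `tei.text` string is hypothetical, not the Moby Dick file used in the chapter). Only the name `chap.title` comes from the notes; the other object names are illustrative.

```r
library(XML)  # assumption: the XML package is installed

# A tiny hypothetical fragment standing in for one TEI chapter
tei.text <- '<div1 type="chapter"><head>Chapter 1</head><p>Call me Ishmael.</p></div1>'

# Parse the string (rather than a file) into an internal XML document
doc <- xmlParse(tei.text, asText = TRUE)
chapter <- xmlRoot(doc)

# <head> and <p> sit inside <div1>, so they are its "child" nodes
child.names <- names(xmlChildren(chapter))

# xmlElementsByTagName returns a list, even when only one node matches
chap.title <- xmlElementsByTagName(chapter, "head")

# Double-bracket subsetting pulls the node out of the list;
# xmlValue then returns the text the node contains
title.text <- xmlValue(chap.title[[1]])
```

Because `chap.title` is a list, calling `xmlValue` on it directly would not reach the node itself; the `[[1]]` step is what exposes the single `head` node to `xmlValue`.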
Copyright information
© 2014 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Jockers, M.L. (2014). Text Quality, Text Variety, and Parsing XML. In: Text Analysis with R for Students of Literature. Quantitative Methods in the Humanities and Social Sciences. Springer, Cham. https://doi.org/10.1007/978-3-319-03164-4_10
DOI: https://doi.org/10.1007/978-3-319-03164-4_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-03163-7
Online ISBN: 978-3-319-03164-4
eBook Packages: Mathematics and Statistics; Mathematics and Statistics (R0)