Abstract
In this chapter, we explore approaches to parsing XML content within R and extracting content from the various types of elements in an XML document. The primary approach is to parse the document into a hierarchical tree object. We show how the tree representation of an XML document (described in Chapter 2) can be treated as a list in R, which makes it easy to navigate the document's nodes and branches. In addition, we demonstrate how to use functions in the XML package that are designed to work with different elements of the tree, such as functions for accessing node names, text content, attribute values, and namespaces. Subsequent chapters introduce XPath (Chapter 4), a powerful XML technology for locating content in an XML document, and describe more complex strategies for extracting XML content (Chapter 5).
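As a rough illustration of the list-like navigation and accessor functions mentioned above, the following is a minimal sketch rather than an example taken from the chapter. It assumes the XML package is installed; the tiny `catalog` document and all variable names are hypothetical and exist only for this illustration.

```r
library(XML)  # assumption: the XML package is installed

# Hypothetical document, invented for this sketch. It is written without
# whitespace between elements so that no blank text nodes can appear.
txt <- paste0(
  '<catalog>',
  '<book id="b1" lang="en"><title>First</title><price>10.5</price></book>',
  '<book id="b2" lang="fr"><title>Second</title><price>7</price></book>',
  '</catalog>'
)

doc  <- xmlParse(txt, asText = TRUE)  # parse the string into a tree
root <- xmlRoot(doc)                  # top-level node of the tree

root_name <- xmlName(root)            # name of the root element
n_books   <- xmlSize(root)            # number of child nodes

# List-style indexing: by position for the first child, by name below it
first  <- root[[1]]
title1 <- xmlValue(first[["title"]])  # text content of <title>
id1    <- xmlGetAttr(first, "id")     # a single attribute value
attrs1 <- xmlAttrs(first)             # all attributes as a named character vector

# Apply a function to every child of the root, much like sapply() on a list
titles <- xmlSApply(root, function(node) xmlValue(node[["title"]]))
prices <- as.numeric(xmlSApply(root, function(node) xmlValue(node[["price"]])))
```

Indexing with `[[` by position or by element name mirrors ordinary list access in R, which is why the tree can be walked without any special query language. Chapter 4's XPath is the alternative when the location of the content is not known in advance.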
Copyright information
© 2014 Springer Science+Business Media New York
About this chapter
Cite this chapter
Nolan, D., Lang, D.T. (2014). Parsing XML Content. In: XML and Web Technologies for Data Sciences with R. Use R!. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7900-0_3
DOI: https://doi.org/10.1007/978-1-4614-7900-0_3
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7899-7
Online ISBN: 978-1-4614-7900-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)