Parsing XML Content

Nolan, Deborah; Lang, Duncan Temple

doi:10.1007/978-1-4614-7900-0_3

Part of the book series: Use R! (USE R)

Abstract

In this chapter, we explore approaches to parsing XML content within R and extracting content from the various types of elements in the XML document. The primary approach is to parse an XML document into a hierarchical tree object. We show how the tree representation of an XML document (described in Chapter 2) can be treated as a list in R, which makes it easy to navigate nodes and branches in the XML document. In addition, we demonstrate how to use functions in the XML package that are designed to work with different elements of the tree, such as functions for accessing node names, text content, attribute values, and namespaces. Subsequent chapters introduce XPath (Chapter 4), a powerful XML technology for locating content in an XML document, and describe more complex strategies for extracting XML content (Chapter 5).
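
As a rough illustration of the approach the abstract describes, the following minimal sketch parses a small XML string into a tree with the XML package and then navigates it with list-style indexing. The sample document and its element and attribute names (quakes, quake, mag, place, id) are hypothetical and are not taken from the chapter; only the XML package functions (xmlParse, xmlRoot, xmlName, xmlSize, xmlAttrs, xmlGetAttr, xmlValue, xmlSApply) are assumed to be available.

```r
library(XML)  # the R package the abstract refers to

# Hypothetical sample document (not from the chapter), built without
# whitespace between elements so that every child is an element node.
txt <- paste0(
  '<quakes source="example">',
  '<quake id="q1"><mag>4.5</mag><place>Offshore</place></quake>',
  '<quake id="q2"><mag>5.1</mag><place>Inland</place></quake>',
  '</quakes>'
)

doc  <- xmlParse(txt, asText = TRUE)  # parse the string into a tree
root <- xmlRoot(doc)                  # top-level node of the tree

root_name  <- xmlName(root)           # name of the root node
n_quakes   <- xmlSize(root)           # number of child nodes
root_attrs <- xmlAttrs(root)          # named character vector of attributes

# Treat the tree like a list: index children by position or by name.
first     <- root[[1]]
first_id  <- xmlGetAttr(first, "id")
first_mag <- as.numeric(xmlValue(first[["mag"]]))

# Apply a function to every child of the root, much like sapply() on a list.
ids  <- xmlSApply(root, xmlGetAttr, "id")
mags <- as.numeric(xmlSApply(root, function(q) xmlValue(q[["mag"]])))
```

Indexing with [[ ]] by position or by child name is what makes the tree behave like a nested R list here; for larger documents, the XPath approach mentioned for Chapter 4 is likely to be more convenient than stepping through nodes one level at a time.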

Author information

Authors and Affiliations

Department of Statistics, University of California, Berkeley, CA, USA
Deborah Nolan
Department of Statistics, University of California, Davis, CA, USA
Duncan Temple Lang

About this chapter

Cite this chapter

Nolan, D., Lang, D.T. (2014). Parsing XML Content. In: XML and Web Technologies for Data Sciences with R. Use R!. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7900-0_3

DOI: https://doi.org/10.1007/978-1-4614-7900-0_3
Published: 12 November 2013
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-7899-7
Online ISBN: 978-1-4614-7900-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)