Parsing XML Content

XML and Web Technologies for Data Sciences with R

Part of the book series: Use R! ((USE R))

Abstract

In this chapter, we explore approaches to parsing XML content within R and extracting content from the various types of elements in the XML document. The primary approach is to parse an XML document into a hierarchical tree object. We show how the tree representation of an XML document (described in Chapter 2) can be treated as a list in R, which makes it easy to navigate nodes and branches in the XML document. In addition, we demonstrate how to use functions in the XML package that are designed to work with different elements of the tree, e.g., functions for accessing node names, text content, attribute values, namespaces, etc. Subsequent chapters introduce XPath (Chapter 4), a powerful XML technology for locating content in an XML document, and describe more complex strategies for extracting XML content (Chapter 5).
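
The workflow outlined in the abstract can be sketched roughly as follows. This is a minimal illustration, not code taken from the chapter: the small catalog document is hypothetical, and it assumes the XML package is installed. It parses a string into a tree, indexes the tree like a list, and reads node names, attribute values, and text content.

```r
library(XML)

# Hypothetical sample document, written on one line so that no
# whitespace-only text nodes appear between elements.
doc_text <- paste0(
  "<catalog>",
  "<book id='b1'><title>First</title></book>",
  "<book id='b2'><title>Second</title></book>",
  "</catalog>"
)

# Parse the string into a tree and get the top-level node.
doc  <- xmlParse(doc_text, asText = TRUE)
root <- xmlRoot(doc)

root_name <- xmlName(root)   # name of the root element
n_books   <- xmlSize(root)   # number of child nodes

# List-style indexing: by position, then by child name.
first_book  <- root[[1]]
first_id    <- xmlGetAttr(first_book, "id")
first_title <- xmlValue(first_book[["title"]])

# Apply a function to every child of the root, as with sapply on a list.
titles <- unname(xmlSApply(root, function(node) xmlValue(node[["title"]])))

print(root_name)
print(first_id)
print(titles)
```

Indexing with `[[` mirrors ordinary R list access, which is the sense in which the tree can be treated as a list; XPath, mentioned at the end of the abstract, is a separate mechanism that this sketch does not cover.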

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Nolan, D., Lang, D.T. (2014). Parsing XML Content. In: XML and Web Technologies for Data Sciences with R. Use R!. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7900-0_3
