An Introduction to XML

XML and Web Technologies for Data Sciences with R

Part of the book series: Use R! ((USE R))

Abstract

This chapter aims to give a reasonably comprehensive definition of, and motivation for, the various aspects of the generic XML language, and to illustrate these aspects with some existing XML dialects or vocabularies. We describe elements, attributes, child elements, and the hierarchical structure of XML. We explain what it means for an XML document to be “well-formed” and how to identify errors in a document’s structure. We discuss the use of namespaces and end with a brief discussion of validating documents with respect to DTDs and XML Schema. Readers already familiar with all aspects of XML can skip this chapter and read about the functions used to work with XML in R, which are the subject of Chapters 3, 4, 5, and 6.
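
As a rough illustration of the terms used in this abstract, the sketch below builds a tiny XML document and inspects its elements, attributes, child elements, and one namespaced element, then checks well-formedness by attempting to parse. It is not taken from the chapter: the document content, the namespace URI http://example.org/meta, and the helper is_well_formed are all assumptions made for illustration, and it uses Python's standard xml.etree.ElementTree module rather than the R functions the book covers.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: a root element with an attribute, two child
# elements, and one child in a made-up namespace bound to the prefix "m".
DOC = """<rates date="2011-01-03" xmlns:m="http://example.org/meta">
  <rate currency="USD">1.3348</rate>
  <rate currency="JPY">108.75</rate>
  <m:source>illustrative data</m:source>
</rates>"""


def is_well_formed(text):
    """Return True if text parses as XML, False on a structural error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


root = ET.fromstring(DOC)

# Attributes are read with get(); child elements are found by tag name.
rates = {r.get("currency"): float(r.text) for r in root.findall("rate")}

# A namespaced element is looked up via a prefix-to-URI mapping.
source = root.find("m:source", {"m": "http://example.org/meta"})

# Mismatched closing tags break well-formedness.
broken = "<a><b></a></b>"
```

Here is_well_formed(DOC) succeeds while is_well_formed(broken) fails, because the closing tags in broken are not properly nested. Note that this only tests well-formedness; validation against a DTD or XML Schema is a separate step that this sketch does not attempt.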

Copyright information

© 2014 Springer Science+Business Media New York

About this chapter

Cite this chapter

Nolan, D., Lang, D.T. (2014). An Introduction to XML. In: XML and Web Technologies for Data Sciences with R. Use R!. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-7900-0_2
