The Problem with XSD Binary Floating Point Datatypes in RDF

Part of the Lecture Notes in Computer Science book series (LNCS,volume 13261)

Abstract

The XSD binary floating point datatypes are regularly used for precise numeric values in RDF. However, using these datatypes for knowledge representation can systematically impair data quality and, compared to the XSD decimal datatype, increases the probability that data processing produces false results. We argue that in most cases the XSD decimal datatype is better suited to represent numeric values in RDF. A survey of actual datatype usage on the relevant subset of the December 2020 Web Data Commons dataset, containing 19 453 060 341 literals from real web data, substantiates the practical relevance of the described problem: 29%–68% of binary floating point values are distorted due to the datatype.
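The kind of distortion described in the abstract can be illustrated with a minimal sketch. It assumes that Python's built-in `float` (IEEE 754 binary64) stands in for the value space of `xsd:double` and that `decimal.Decimal` stands in for `xsd:decimal`; the variable names are illustrative and not taken from the paper.

```python
from decimal import Decimal

# Python floats are IEEE 754 binary64 values, used here as a stand-in
# for xsd:double (assumption for illustration).
double_value = 0.1

# Decimal(float) exposes the exact binary value that is actually stored.
exact_double = Decimal(double_value)

# A decimal representation keeps the lexical value unchanged.
decimal_value = Decimal("0.1")

# Simple arithmetic already diverges in the binary representation.
double_sum_matches = (0.1 + 0.2 == 0.3)
decimal_sum_matches = (Decimal("0.1") + Decimal("0.2") == Decimal("0.3"))

print(exact_double)         # the stored binary approximation of 0.1
print(decimal_value)        # the intended value
print(double_sum_matches)   # comparison on binary doubles
print(decimal_sum_matches)  # comparison on decimals
```

The stored binary value differs slightly from the written value 0.1, and the equality check on the sums only holds in the decimal case.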

Keywords

  • Data Quality
  • Datatypes
  • Floating Point Numbers
  • Knowledge Graphs
  • Numerical Stability
  • RDF
  • XSD

Notes

  1. integer, long, int, short, byte, nonNegativeInteger, positiveInteger, unsignedLong, unsignedInt, unsignedShort, unsignedByte, nonPositiveInteger, and negativeInteger.

  2. As the XSD recommendation refers to the IEEE 754-2008 version of the standard, we do not refer to the subsequent IEEE 754-2019 version.

  3. https://microformats.org.

  4. https://html.spec.whatwg.org/multipage/microdata.html.

  5. http://schema.org, current version 13.0.

  6. https://wikiba.se/, SPARQL endpoint example: https://query.wikidata.org.

  7. Example:

  8. https://virtuoso.openlinksw.com/.

  9. https://jena.apache.org/.

  10. https://protege.stanford.edu/.

  11. Lexical mappings (roundTiesToEven rounding scheme): \(73.1 \rightarrow 73.099\,998\,474\,1\ldots\) and \(0.1 \rightarrow 0.100\,000\,001\,4\ldots\). Interval calculations: \(73.099\,998\,474\,1\ldots \pm 0.100\,000\,001\,4\ldots\).

  12. http://webdatacommons.org/structureddata/#results-2020-1.

  13. https://commoncrawl.org/2020/10/september-2020-crawl-archive-now-available/.

  14. https://github.com/jsonld-java/jsonld-java/blob/v0.13.1/core/src/main/java/com/github/jsonldjava/core/RDFDataset.java#L673.

  15. Prefixes used for results presentation: dcterms: http://purl.org/dc/terms/, dv: http://rdf.data-vocabulary.org/#, gr: http://purl.org/goodrelations/v1#, rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#, rev: http://purl.org/stuff/rev#, schema: http://schema.org/, use: http://search.yahoo.com/searchmonkey-datatype/use/, vcard: http://www.w3.org/2006/vcard/ns#, xsd: http://www.w3.org/2001/XMLSchema#.

  16. https://web.archive.org/web/20200919100939/https://open.nrw/dataset/telefonverzeichnis-alphabetisch-oktober-2019-odp.

  17. \( \frac{\sum_{\text{T10NIFP literals}} \text{Measures 1 \& 2}}{\sum_{\texttt{xsd:double}, \texttt{xsd:float} \text{ literals}} \text{Measure 3}} \approx 0.29 \), \( \frac{\sum_{\texttt{xsd:double}, \texttt{xsd:float} \text{ literals}} \text{Measures 1 \& 2}}{\sum_{\texttt{xsd:double}, \texttt{xsd:float} \text{ literals}} \text{Measure 3}} \approx 0.68 \).
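The lexical mappings and the interval calculation given in note 11 can be reproduced with a short sketch. It assumes that IEEE 754 binary32 stands in for `xsd:float` and that the interval is computed exactly on the mapped values; the helper `to_float32` is hypothetical, and the comparison against 73.2 is added here for illustration only.

```python
import struct
from decimal import Decimal, getcontext

# Enough precision to add and subtract exact binary32 values without rounding.
getcontext().prec = 50

def to_float32(lexical: str) -> Decimal:
    """Map a decimal lexical form to its binary32 value and return it exactly.

    Hypothetical helper: float() rounds to binary64 first, and packing with
    "f" then rounds to binary32. For the short literals used here this agrees
    with direct rounding to binary32; that is an assumption in general.
    """
    (rounded,) = struct.unpack("<f", struct.pack("<f", float(lexical)))
    return Decimal(rounded)

value = to_float32("73.1")        # mapped value of 73.1
uncertainty = to_float32("0.1")   # mapped value of 0.1

# Interval bounds computed on the mapped (distorted) values.
lower = value - uncertainty
upper = value + uncertainty

# Bound intended by the written values 73.1 and 0.1.
intended_upper = Decimal("73.1") + Decimal("0.1")

print(value)
print(uncertainty)
print(lower, upper)
print(upper < intended_upper)
```

Under these assumptions the interval built from the mapped values is shifted slightly downward, so its upper bound falls just short of 73.2.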

Acknowledgments

Many thanks to Alsayed Algergawy, Felicitas Löffler, Samira Babalou, Sheeba Samuel, Sirko Schindler, Eberhard Zehendner, and the first author’s supervisor Birgitta König-Ries, as well as 10 anonymous reviewers for very helpful comments on earlier drafts of this manuscript.

Author information

Contributions

Study conception and design, analysis and interpretation of results, and draft manuscript preparation were performed by Jan Martin Keil. Data collection was performed by Merle Gänßinger and Jan Martin Keil. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Jan Martin Keil.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Keil, J.M., Gänßinger, M. (2022). The Problem with XSD Binary Floating Point Datatypes in RDF. In: , et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_10

  • DOI: https://doi.org/10.1007/978-3-031-06981-9_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-06980-2

  • Online ISBN: 978-3-031-06981-9

  • eBook Packages: Computer Science, Computer Science (R0)