Integration of Data on Substance Properties Using Big Data Technologies and Domain-Specific Ontologies

  • Adilbek Erkimbaev
  • Vladimir Zitserman
  • Georgii Kobzev
  • Andrey Kosinov
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 822)

Abstract

A new technology for storing and categorizing heterogeneous data on the properties of matter is proposed. The availability of a multitude of heterogeneous data from a variety of sources justifies the use of Apache Spark, one of the popular toolkits for Big Data processing. Its role in the proposed technology is to manage an extensive data warehouse stored as text files in JSON format. The first stage of the technology converts primary resources (relational databases, digital archives, Web portals, etc.) to a standardized JSON document form. The advantages of the JSON format are the ability to store data and metadata within a single text document, readability for both humans and computers, and support for the hierarchical structures needed to represent complex and irregular data. Such data structures arise from the possible expansion of the subject area: new types of materials, an extended nomenclature of properties, and so on. For the semantic integration of resources converted to the JSON format, a repository of subject-oriented ontologies is used. Data search in the JSON document store is implemented through a combination of SPARQL and SQL queries. The former, addressed to the ontology repository, let the user browse and search for relevant and related concepts. The latter, addressed to the JSON document sets, retrieve the required data from the document bodies using the capabilities of Apache Spark SQL. The efficiency of the developed technology is tested on problems of thermophysical data integration, which are characterized by a complex logical structure.
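The conversion stage and the data-retrieval step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the table layout, the field names (`metadata`, `data`, `units`), the substance labels and the numeric values are all hypothetical placeholders, and a plain Python loop stands in for the Spark SQL query so that the example runs with the standard library alone.

```python
import json
import sqlite3

# Hypothetical relational source. Labels and numbers are placeholders,
# not real thermophysical data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE measurement ("
    "substance TEXT, temperature_k REAL, conductivity REAL, source TEXT)"
)
conn.executemany(
    "INSERT INTO measurement VALUES (?, ?, ?, ?)",
    [
        ("sample-A", 300.0, 1.0, "source-1"),
        ("sample-A", 400.0, 2.0, "source-1"),
        ("sample-B", 300.0, 3.0, "source-2"),
    ],
)


def rows_to_documents(connection):
    """Group flat rows into one hierarchical document per substance.

    Each document keeps metadata and data side by side (assumed layout).
    """
    docs = {}
    cursor = connection.execute(
        "SELECT substance, temperature_k, conductivity, source "
        "FROM measurement ORDER BY substance, temperature_k"
    )
    for substance, temperature_k, conductivity, source in cursor:
        doc = docs.setdefault(
            substance,
            {
                "metadata": {
                    "substance": substance,
                    "property": "thermal conductivity",
                    "units": {"temperature": "K", "value": "W/(m*K)"},
                    "sources": [],
                },
                "data": [],
            },
        )
        if source not in doc["metadata"]["sources"]:
            doc["metadata"]["sources"].append(source)
        doc["data"].append({"temperature": temperature_k, "value": conductivity})
    return list(docs.values())


def select_values(json_lines, substance, t_min, t_max):
    """Return (temperature, value) pairs for one substance in a range.

    A stand-in for the SQL query over the JSON document set.
    """
    results = []
    for line in json_lines:
        doc = json.loads(line)
        if doc["metadata"]["substance"] != substance:
            continue
        for point in doc["data"]:
            if t_min <= point["temperature"] <= t_max:
                results.append((point["temperature"], point["value"]))
    return results


documents = rows_to_documents(conn)
# One JSON document per line, a layout that line-oriented readers accept.
json_lines = [json.dumps(doc) for doc in documents]
selected = select_values(json_lines, "sample-A", 250.0, 350.0)
```

With PySpark available, the same one-document-per-line file could presumably be loaded with `spark.read.json`, registered through `createOrReplaceTempView`, and queried with `spark.sql`; the nested `data` array would then need to be flattened, for example with `explode`, before filtering on temperature.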

Keywords

Thermophysical properties; Semi-structured data; JSON format; Ontology

Notes

Acknowledgments

The work is supported by the Russian Scientific Foundation, grant 14-50-00124.

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Adilbek Erkimbaev (1)
  • Vladimir Zitserman (1)
  • Georgii Kobzev (1)
  • Andrey Kosinov (1)

  1. Joint Institute for High Temperatures, Russian Academy of Sciences, Moscow, Russia
