Abstract
Common approaches and technologies for storing and processing digital data in various disciplines are analyzed. It is shown that, regardless of the specific subject area, working with large data sets obtained from experiments or modeling requires similar methodological support, which involves data curation, metadata support, and annotation of data genesis and quality. The interdisciplinary field called “The properties of materials and substances” is analyzed as an example of a discipline that actively uses digital data. New approaches to the integration of data with heterogeneous properties are investigated; these approaches take into account variations in data structure that depend on the class of substance, the state of the sample, the experimental conditions, and other factors.
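As a rough illustration of the kind of record this implies, the sketch below pairs a fixed core (substance, property, value, unit) with annotations for data genesis and quality, plus a free-form set of conditions whose fields vary by class of substance. The schema, the field names, the helper `condition_keys_by_class`, and all numeric values are assumptions made for illustration; none of them comes from the article.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Set


@dataclass
class PropertyRecord:
    """One property value with its metadata (hypothetical schema)."""
    substance: str
    substance_class: str                  # e.g. "bulk metal", "nanomaterial"
    property_name: str
    value: float
    unit: str
    source: str                           # data genesis: where the value came from
    uncertainty: Optional[float] = None   # quality annotation, same unit as value
    # Fields that differ by substance class, sample state, or experiment.
    conditions: Dict[str, Any] = field(default_factory=dict)


def condition_keys_by_class(records: List[PropertyRecord]) -> Dict[str, Set[str]]:
    """Collect the condition fields actually used under each substance class."""
    keys: Dict[str, Set[str]] = {}
    for rec in records:
        keys.setdefault(rec.substance_class, set()).update(rec.conditions)
    return keys


# Placeholder values only; they are not measurements and not from the article.
records = [
    PropertyRecord(
        "copper", "bulk metal", "thermal conductivity", 100.0, "W/(m*K)",
        source="placeholder source", uncertainty=1.0,
        conditions={"temperature_K": 300},
    ),
    PropertyRecord(
        "carbon nanotube", "nanomaterial", "thermal conductivity", 200.0, "W/(m*K)",
        source="placeholder source",
        conditions={"temperature_K": 300, "diameter_nm": 1.4,
                    "sample_state": "single suspended tube"},
    ),
]

for substance_class, keys in sorted(condition_keys_by_class(records).items()):
    print(substance_class, sorted(keys))
```

Keeping the variable part in a separate mapping is one possible design choice: records of different substance classes can then share a single core structure, and the set of condition fields seen per class shows where the structures diverge.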
Additional information
Original Russian Text © A.O. Erkimbaev, V.Yu. Zitserman, G.A. Kobzev, 2017, published in Nauchno-Tekhnicheskaya Informatsiya, Seriya 2: Informatsionnye Protsessy i Sistemy, 2017, No. 9, pp. 9–22.
Cite this article
Erkimbaev, A.O., Zitserman, V.Y. & Kobzev, G.A. The intensive use of digital data in modern natural science. Autom. Doc. Math. Linguist. 51, 201–213 (2017). https://doi.org/10.3103/S0005105517050028
DOI: https://doi.org/10.3103/S0005105517050028