Abstract
A review of scientific publications on the acquisition and use of big data in modern analytical chemistry is presented. Such data are characterized by large volume, high flow rate, and variety. They are generated and processed in the analysis of biosamples and samples of other origin by chromatography and mass spectrometry. Big data obtained by these techniques enable multianalyte sample analysis, although the detection, identification, and quantification characteristics are not satisfactory for all analytes. Even simple analytical systems can accumulate large volumes of data. A huge body of information is contained in big chemical databases, which are indispensable in non-target analysis. Candidates for identification are selected with regard to the prevalence (citation rate) of chemicals, and identification relies on reference mass spectral libraries. Methods of data processing, analysis, and presentation (statistics, chemometrics) evolve as the volume of information grows. The technical characteristics of computers and computer networks are improving at an accelerating pace, creating potential for new methods of data analysis and opening new possibilities for interlaboratory cooperation.
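The abstract states that candidate selection for identification weighs the prevalence (citation rate) of chemicals and that identification relies on reference mass spectral libraries. The Python sketch below shows one hypothetical way of combining the two criteria: a cosine (normalized dot-product) similarity search of a query spectrum against library spectra, followed by ordering of the surviving candidates by a reference count used as a prevalence proxy. The `Candidate` structure, the 0.7 score threshold, the prevalence-first ordering, and all spectra and counts are illustrative assumptions, not the authors' published procedure.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Dict, List, Tuple

# Centroided spectrum: {nominal m/z: relative intensity}. Toy data only.
Spectrum = Dict[int, float]


@dataclass
class Candidate:
    name: str           # hypothetical candidate identifier
    spectrum: Spectrum  # reference spectrum, e.g. taken from a mass spectral library
    references: int     # prevalence proxy (assumed), e.g. number of literature references


def cosine(a: Spectrum, b: Spectrum) -> float:
    """Cosine (normalized dot-product) similarity of two spectra; 0.0 if either is empty."""
    dot = sum(a[mz] * b[mz] for mz in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query: Spectrum, candidates: List[Candidate],
                    min_score: float = 0.7) -> List[Tuple[str, float, int]]:
    """Drop candidates whose library spectrum matches poorly (hypothetical threshold),
    then list the rest from most to least prevalent, breaking ties by match score."""
    hits = [(c.name, cosine(query, c.spectrum), c.references) for c in candidates]
    hits = [h for h in hits if h[1] >= min_score]
    return sorted(hits, key=lambda h: (-h[2], -h[1]))


# Usage with invented spectra and reference counts.
query = {41: 30.0, 55: 100.0, 69: 45.0}
library = [
    Candidate("isomer_A", {41: 28.0, 55: 100.0, 69: 50.0}, references=1200),
    Candidate("isomer_B", {41: 30.0, 55: 100.0, 69: 45.0}, references=15),
    Candidate("isomer_C", {43: 100.0, 58: 60.0}, references=5000),
]
for name, score, refs in rank_candidates(query, library):
    print(name, round(score, 3), refs)
```

With these toy inputs, `isomer_C` is discarded because it shares no peaks with the query, and `isomer_A` is listed ahead of the exact spectral match `isomer_B` because it is far more frequently cited. Ranking by score first and prevalence second is an equally plausible variant; which ordering is appropriate depends on how reliable the library match is for the compound class at hand.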
Notes
“Information” and “data” are often treated as synonyms. More precisely, “data” are “raw material”, and “information” is processed data (see, for example, [2]).
REFERENCES
Eckschlager, K. and Danzer, K., Information Theory in Analytical Chemistry, New York: Wiley, 1994.
Harris, J., Data, Information, and Knowledge Management. http://www.ocdqblog.com/home/data-information-and-knowledge-management.html. Accessed October 1, 2018.
Williams, A.J. and Pence, H.E., Chem. Int., 2017, vol. 39, no. 3, p. 9.
May, J.C. and McLean, J.A., Annu. Rev. Anal. Chem., 2016, vol. 9, p. 387.
Szymańska, E., Anal. Chim. Acta, 2018, vol. 1028, p. 1.
Tauler, R. and Parastar, H., Angew. Chem., Int. Ed. Engl., 2018. https://doi.org/10.1002/anie.201801134
Kalidindi, S.R. and De Graef, M., Annu. Rev. Mater. Res., 2015, vol. 45, p. 171.
Andreu-Perez, J., Poon, C.C., Merrifield, R.D., Wong, S.T., and Yang, G.Z., IEEE J. Biomed. Health Inf., 2015, vol. 19, no. 4, p. 1193.
Pence, H.E. and Williams, A.J., J. Chem. Educ., 2016, vol. 93, no. 3, p. 504.
Chiang, L., Lu, B., and Castillo, I., Annu. Rev. Chem. Biomol. Eng., 2017, vol. 8, p. 63.
Haug, K., Salek, R.M., and Steinbeck, C., Curr. Opin. Chem. Biol., 2017, vol. 36, p. 58.
Pluskal, T. and Yanagida, M., Cold Spring Harbor Protocols, 2016, vol. 2016, no. 12, p. 1044.
Veselkov, K., Sleeman, J., Claude, E., Vissers, J.P., Galea, D., Mroz, A., Laponogov, I., Towers, M., Tong, R., Mirnezami, R., Takats, Z., Nicholson, J., and Langridge, J.I., Sci. Rep., 2018, vol. 8, no. 1, p. 4053.
Wright, D.A., US Patent 8935101, 2015.
Michalski, A., Cox, J., and Mann, M., J. Proteome Res., 2011, vol. 10, no. 4, p. 1785.
Apte, J.S., Messier, K.P., Gani, S., Brauer, M., Kirchstetter, T.W., Lunden, M.M., Marshall, J.D., Portier, C.J., Vermeulen, R.C.H., and Hamburg, S.P., Environ. Sci. Technol., 2017, vol. 51, no. 12, p. 6999.
Bandodkar, A.J., Jeerapan, I., and Wang, J., ACS Sens., 2016, vol. 1, no. 5, p. 464.
Koydemir, H.C. and Ozcan, A., Annu. Rev. Anal. Chem., 2018, vol. 11, no. 1, p. 127.
Mil’man, B.L. and Zhurkovich, I.K., Analitika (Analytics), 2017, no. 5, p. 30.
CAS content. http://www.cas.org/about/cas-content. Accessed July 31, 2019.
Substance identity in REACH. EU Final Report, 2016. https://op.europa.eu/en/publication-detail/-/publication/b31a7b23-b544-11e7-837e-01aa75ed71a1/language-en. Accessed October 2, 2018.
PubChem. https://pubchem.ncbi.nlm.nih.gov/search. Accessed July 31, 2019.
ChemSpider. http://www.chemspider.com. Accessed July 31, 2019.
ZINC15. http://zinc15.docking.org. Accessed July 31, 2019.
Milman, B.L. and Zhurkovich, I.K., TrAC, Trends Anal. Chem., 2017, vol. 97, p. 179.
Milman, B.L. and Kovrizhnych, M.A., Fresenius’ J. Anal. Chem., 2000, vol. 367, no. 7, p. 629.
Milman, B.L., Anal. Chem., 2002, vol. 74, no. 7, p. 1484.
Milman, B.L., J. Chem. Inf. Model., 2005, vol. 45, no. 5, p. 1153.
Mil’man, B.L., Vvedenie v khimicheskuyu identifikatsiyu (Introduction to Chemical Identification), St. Petersburg: VVM, 2008.
Milman, B.L., Chemical Identification and Its Quality Assurance, Berlin: Springer, 2011.
How many proteins exist in human body? http://www.innovateus.net/health/how-many-proteins-exist-human-body. Accessed October 2, 2018.
Mass spectral libraries (NIST 17 and Wiley libraries). http://www.sisweb.com/software/ms/wiley.htm. Accessed October 2, 2018.
Guijas, C., Montenegro-Burke, J.R., Domingo-Almenara, X., Palermo, A., Warth, B., Hermann, G., Koellensperger, G., Huan, T., Uritboonthai, W., Aisporna, A.E., Wolan, D.W., Spilker, M.E., Benton, H.P., and Siuzdak, G., Anal. Chem., 2018, vol. 90, no. 5, p. 3156.
The Global Natural Product Social Molecular Networking (GNPS). https://gnps.ucsd.edu/ProteoSAFe/gnpslibrary.jsp?library=all. Accessed October 3, 2018.
MONA, MassBank of North America. http://mona.fiehnlab.ucdavis.edu. Accessed November 17, 2018.
MassBank. https://massbank.eu/MassBank. Accessed October 3, 2018.
Spectral Database for Organic Compounds. https://sdbs.db.aist.go.jp/sdbs/cgi-bin/cre_index.cgi. Accessed November 17, 2018.
HighChem Spectral Tree. http://www.highchem.com/index.php/81-massfrontier. Accessed November 17, 2018.
PeptideAtlas Overview. http://www.peptideatlas.org/overview.php. Accessed October 3, 2018.
X!HUNTER Annotated Spectrum Library. http://thegpm.org/HUNTER/index.html. Accessed October 3, 2018.
Griss, J., Foster, J.M., Hermjakob, H., and Vizcaino, J.A., Nat. Methods, 2013, vol. 10, no. 2, p. 95.
NIST Libraries of Peptide Tandem Mass Spectra. https://chemdata.nist.gov/dokuwiki/doku.php?id=peptidew:start. Accessed November 17, 2018.
Kind, T., Tsugawa, H., Cajka, T., Ma, Y., Lai, Z., Mehta, S.S., Wohlgemuth, G., Barupal, D.K., Showalter, M.R., Arita, M., and Fiehn, O., Mass Spectrom. Rev., 2017, vol. 37, no. 4, p. 513.
Blaženović, I., Kind, T., Ji, J., and Fiehn, O., Metabolites, 2018, vol. 8, no. 2, p. 31.
Dumancas, G.G., Bello, G.A., Hughes, J., Murimi, R., Viswanath, L.C., Orndorff, C.O., Dumancas, G.F., and Dell, J.D., in Handbook of Research on Big Data Storage and Visualization Techniques, Segall, R. and Cook, J., Eds., Hershey, PA: IGI Global, 2018, p. 873. https://doi.org/10.4018/978-1-5225-3142-5.ch030
Dubrov, A.M., Mkhitaryan, V.S., and Troshin, L.I., Mnogomernye statisticheskie metody (Multivariate Statistical Methods), Moscow: Finansy i statistika, 1998.
Krallinger, M., Rabal, O., Lourenço, A., Oyarzabal, J., and Valencia, A., Chem. Rev., 2017, vol. 117, no. 12, p. 7673.
Postma, G.J. and Kateman, G., J. Chem. Inf. Comput. Sci., 1993, vol. 33, no. 3, p. 350.
Schneider, N., Lowe, D.M., Sayle, R.A., Tarselli, M.A., and Landrum, G.A., J. Med. Chem., 2016, vol. 59, no. 9, p. 4385.
Milman, B.L., Gostev, V.V., and Dmitriev, A.V., J. Anal. Chem., 2018, vol. 73, no. 13, p. 1217.
Milman, B.L. and Zhurkovich, I.K., Mass Spectrom. Lett., 2018, vol. 9, no. 3, p. 73.
Sample size calculator. http://www.surveysystem.com/sscalc.htm. Accessed October 3, 2018.
Nazipova, N.N., Isaev, E.A., Kornilov, V.V., Pervukhin, D.V., Morozova, A.A., Gorbunov, A.A., and Ustinin, M.N., Mat. Biol. Bioinf., 2017, vol. 12, no. 1, p. 102.
Alyass, A., Turcotte, M., and Meyre, D., BMC Med. Genomics, 2015, vol. 8, no. 1, p. 33.
Schymanski, E.L., Ruttkies, C., Krauss, M., Brouard, C., Kind, T., Dührkop, K., Allen, F., Vaniya, A., Verdegem, D., Böcker, S., Rousu, J., Shen, H., Tsugawa, H., Sajed, T., Fiehn, O., Ghesquière, B., and Neumann, S., J. Cheminf., 2017, vol. 9, p. 22.
Blaženović, I., Kind, T., Torbašinović, H., Obrenović, S., Mehta, S.S., Tsugawa, H., Wermuth, T., Schauer, N., Jahn, M., Biedendieck, R., Jahn, D., and Fiehn, O., J. Cheminf., 2017, vol. 9, p. 32.
Blaženović, I., Kind, T., Sa, M.R., Ji, J., Vaniya, A., Wancewicz, B., Roberts, B.S., Torbašinović, H., Lee, T., Mehta, S.S., Showalter, M.R., Song, H., Kwok, J., Jahn, D., Kim, J., and Fiehn, O., Anal. Chem., 2019, vol. 91, no. 3, p. 2155.
Dasenaki, M.E., Bletsou, A.A., Koulis, G.A., and Thomaidis, N.S., J. Agric. Food Chem., 2015, vol. 63, no. 18, p. 4493.
Robert, C., Gillard, N., Brasseur, P.Y., Pierret, G., Ralet, N., Dubois, M., and Delahaut, P., Food Addit. Contam., Part A, 2013, vol. 30, no. 3, p. 443.
Malachová, A., Sulyok, M., Beltrán, E., Berthiller, F., and Krska, R., J. Chromatogr. A, 2014, vol. 1362, p. 145.
Dzuman, Z., Zachariasova, M., Veprikova, Z., Godula, M., and Hajslova, J., Anal. Chim. Acta, 2015, vol. 863, p. 29.
Pérez-Ortega, P., Lara-Ortega, F.J., García-Reyes, J.F., Gilbert-López, B., Trojanowicz, M., and Molina-Díaz, A., Talanta, 2016, vol. 160, p. 704.
Fu, Y., Zhou, Z., Kong, H., Lu, X., Zhao, X., Chen, Y., Chen, J., Wu, Z., Xu, Z., Zhao, C., and Xu, G., Anal. Chem., 2016, vol. 88, no. 17, p. 8870.
Gago-Ferrero, P., Borova, V., Dasenaki, M.E., and Thomaidis, N.S., Anal. Bioanal. Chem., 2015, vol. 407, no. 15, p. 4287.
Additional information
Translated by E. Rykova
Cite this article
Milman, B.L., Zhurkovich, I.K. Big Data in Modern Chemical Analysis. J Anal Chem 75, 443–452 (2020). https://doi.org/10.1134/S1061934820020124