Encyclopedia of Big Data

Living Edition
| Editors: Laurie A. Schintler, Connie L. McNeely


  • Colin L. Bird
  • Jeremy G. Frey
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-32001-4_260-1

Chemistry has always been data-dependent, but as computing power has increased, chemical science has become increasingly data-intensive, a development recognized by several contributors to the book edited by Hey, Tansley, and Tolle, The Fourth Paradigm (Hey et al. 2009). In one article, chemistry is given as one example of “a genuinely new kind of computationally driven, interconnected, Web-enabled science.”

The study of chemistry can be perceived as endeavoring to obtain big information – big in the sense of significance – from data relating to molecules, which are small in the physical sense. The transition from data to big information is perhaps well illustrated by the role of statistical mechanics as we see the move from modeling departures from ideal gas behavior through to the measurement of single molecule properties: a journey of simple information about lots of similar molecules to complex information about individual molecules, paralleled in the development of machine...

Further Readings

  1. Andersen, J. L., Flamm, C., Merkle, D., & Stadler, P. F. (2014). Generic strategies for chemical space exploration. International Journal of Computational Biology and Drug Design, 7(2–3), 225–258.
  2. Araki, M., Gutteridge, A., Honda, W., Kanehisa, M., & Yamanishi, Y. (2008). Prediction of drug–target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24(13), i232–i240.
  3. Banck, M., Hutchison, G. R., James, C. A., Morley, C., O’Boyle, N. M., & Vandermeersch, T. (2011). Open Babel: An open chemical toolbox. Journal of Cheminformatics, 3, 33.
  4. Barge, L. M., Cardoso, S. S., Cartwright, J. H., Cooper, G. J., Cronin, L., Doloboff, I. J., Escribano, B., Goldstein, R. E., Haudin, F., Jones, D. E., Mackay, A. L., Maselko, J., Pagano, J. J., Pantaleone, J., Russell, M. J., Sainz-Díaz, C. I., Steinbock, O., Stone, D. A., Tanimoto, Y., Thomas, N. L., & Wit, A. D. (2015). From chemical gardens to chemobrionics. Chemical Reviews, 115(16), 8652–8703.
  5. Barrett, S. J., & Langdon, W. B. (2006). Advances in the application of machine learning techniques in drug discovery, design and development. In A. Tiwari, R. Roy, J. Knowles, E. Avineri, & K. Dahal (Eds.), Applications of soft computing. Advances in intelligent and soft computing (Vol. 36). Berlin/Heidelberg: Springer.
  6. Belianinov, A., et al. (2015). Big data and deep data in scanning and electron microscopies: Deriving functionality from multidimensional data sets. Advanced Structural and Chemical Imaging, 1, 6. https://doi.org/10.1186/s40679-015-0006-6.
  7. Benz, R. W., Baldi, P., & Swamidass, S. J. (2008). Discovery of power-laws in chemical space. Journal of Chemical Information and Modeling, 48(6), 1138–1151.
  8. Bolstad, E. S., Coleman, R. G., Irwin, J. J., Mysinger, M. M., & Sterling, T. (2012). ZINC: A free tool to discover chemistry for biology. Journal of Chemical Information and Modeling, 52(7), 1757–1768.
  9. Bolton, E., Bryant, S. H., Chen, J., Fu, G., Gindulyte, A., Han, L., He, J., He, S., Kim, S., Shoemaker, B. A., Thiessen, P. A., Wang, J., Yu, B., & Zhang, J. (2016). PubChem substance and compound databases. Nucleic Acids Research, 44, D1202–D1213.
  10. Bon, R. S., & Waldmann, H. (2010). Bioactivity-guided navigation of chemical space. Accounts of Chemical Research, 43(8), 1103–1114.
  11. Butte, A., & Chen, B. (2016). Leveraging big data to transform target selection and drug discovery. Clinical Pharmacology and Therapeutics, 99(3), 285–297.
  12. Buytaert, W., El-khatib, Y., Macleod, C. J., Reusser, D., & Vitolo, C. (2015). Web technologies for environmental Big Data. Environmental Modelling and Software, 63, 185–198.
  13. Clarke, P., Coveney, P. V., Heavens, A. F., Jäykkä, J., Korn, A., Mann, R. G., McEwen, J. D., Ridder, S. D., Roberts, S., Scanlon, T., Shellard, E. P., Yates, J. A., & Royal Society (2016).  https://doi.org/10.1098/rsta.2016.0153.
  14. Dekker, A., Ennis, M., Hastings, J., Harsha, B., Kale, N., Matos, P. D., Muthukrishnan, V., Owen, G., Steinbeck, C., Turner, S., & Williams, M. (2013). The ChEBI reference database and ontology for biologically relevant chemistry: Enhancements for 2013. Nucleic Acids Research, 41, D456–D463.
  15. Edwards, M., Aldea, M., & Belisle, M. (2015). Big Data is changing the environmental sciences. Environmental Perspectives, 1. Available from http://www.exponent.com/files/Uploads/Documents/Newsletters/EP_2015_Vol1.pdf.
  16. Ekins, S., Tkachenko, V., & Williams, A. J. (2012). Towards a gold standard: Regarding quality in public domain chemistry databases and approaches to improving the situation. Drug Discovery Today, 17(13–14), 685–701.
  17. Frey, J. G., & Bird, C. L. (2011). Web-based services for drug design and discovery. Expert Opinion on Drug Discovery, 6(9), 885–895.
  18. Frey, J. G., & Bird, C. L. (2013). Cheminformatics and the semantic web: Adding value with linked data and enhanced provenance. Wiley Interdisciplinary Reviews: Computational Molecular Science, 3(5), 465–481. https://doi.org/10.1002/wcms.1127.
  19. Gartner. From the Gartner IT glossary: What is Big Data? Available from https://www.gartner.com/it-glossary/big-data.
  20. Gilson, M. K., Liu, T., & Nicola, G. (2012). Public domain databases for medicinal chemistry. Journal of Medicinal Chemistry, 55(16), 6987–7002.
  21. Groth, P. T., Gray, A. J., Goble, C. A., Harland, L., Loizou, A., & Pettifer, S. (2014). API-centric linked data integration: The open phacts discovery platform case study. Web Semantics: Science, Services and Agents on the World Wide Web, 29, 12–18.
  22. Hall, R. J., Murray, C. W., & Verdonk, M. L. (2017). The fragment network: A chemistry recommendation engine built using a graph database. Journal of Medicinal Chemistry, 60(14), 6440–6450. https://doi.org/10.1021/acs.jmedchem.7b00809.
  23. Han, Y., Horlacher, O., Kuhn, S., Luttmann, E., Steinbeck, C., & Willighagen, E. L. (2003). The Chemistry Development Kit (CDK): An open-source Java library for chemo-and bioinformatics. Journal of Chemical Information and Computer Sciences, 43(2), 493–500.
  24. Hartung, T. (2016). Making big sense from big data in toxicology by read-across. ALTEX, 33(2), 83–93.
  25. Hey, A., Tansley, S., & Tolle, K. (Eds.). (2009). The fourth paradigm, data-intensive scientific discovery. Redmond: Microsoft Research. ISBN 978-0-9825442-0-4.
  26. https://home.cern/. Accessed 30 Oct 2017.
  27. https://lcls.slac.stanford.edu/. Accessed 30 Oct 2017.
  28. https://pubchem.ncbi.nlm.nih.gov/. Accessed 30 Oct 2017.
  29. https://www.ccdc.cam.ac.uk. Accessed 30 Oct 2017.
  30. http://www.RDKit.org. Accessed 30 Oct 2017.
  31. https://www.xfel.eu/. Accessed 30 Oct 2017.
  32. ICIS Chemical Business. (2013). Big data and the chemical industry. Available from https://www.icis.com/resources/news/2013/12/13/9735874/big-data-and-the-chemical-industry/.
  33. Jessop, D. M., Adams, S. E., Willighagen, E. L., Hawizy, L., & Murray-Rust, P. (2011). OSCAR4: A flexible architecture for chemical text-mining. Journal of Cheminformatics, 3, 41. https://doi.org/10.1186/1758-2946-3-41.
  34. Kaestner, M. (2016). Big Data means big opportunities for chemical companies. KPMG REACTION, 16–29.
  35. Lowe, G. (1995). Combinatorial chemistry. Chemical Society Reviews, 24, 309–317. https://doi.org/10.1039/CS9952400309.
  36. Lundia, S. R. (2015). How big data is influencing chemical manufacturing. Available from https://www.chem.info/blog/2015/05/how-big-data-influencing-chemical-manufacturing.
  37. Mohimani, H., et al. (2017). Dereplication of peptidic natural products through database search of mass spectra. Nature Chemical Biology, 13, 30–37. https://doi.org/10.1038/nchembio.2219.
  38. Pence, H. E., & Williams, A. J. (2016). Big data and chemical education. Journal of Chemical Education, 93(3), 504–508. https://doi.org/10.1021/acs.jchemed.5b00524.
  39. Coveney, P. V., Dougherty, E. R., & Highfield, R. R. (2016). Big data need big theory too. Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, 374(2080), 20160153.
  40. Ramakrishnan, R., Dral, P. O., Rupp, M., & Anatole von Lilienfeld, O. (2015). Big data meets quantum chemistry approximations: The Δ-machine learning approach. Journal of Chemical Theory and Computation, 11(5), 2087–2096. https://doi.org/10.1021/acs.jctc.5b00099.
  41. Reymond, J. (2015). The chemical space project. Accounts of Chemical Research, 48(3), 722–730.
  42. Sayle, R. A., Batista, J., & Grant, A. (2013). An efficient maximum common subgraph (MCS) searching of large chemical databases. Journal of Cheminformatics, 5(1), O15. https://doi.org/10.1186/1758-2946-5-S1-O15.
  43. Schneider, N., Lowe, D. M., Sayle, R. A., Tarselli, M. A., & Landrum, G. A. (2016). Big data from pharmaceutical patents: A computational analysis of medicinal chemists’ bread and butter. Journal of Medicinal Chemistry, 59(9), 4385–4402. https://doi.org/10.1021/acs.jmedchem.6b00153.
  44. Spek, A. L. (2009). Structure validation in chemical crystallography. Acta Crystallographica. Section D, Biological Crystallography.
  45. Swain, M. C., & Cole, J. M. (2016). ChemDataExtractor: A toolkit for automated extraction of chemical information from the scientific literature. Journal of Chemical Information and Modeling, 56(10), 1894–1904. https://doi.org/10.1021/acs.jcim.6b00207.
  46. Szymański, P., Marcowicz, M., & Mikiciuk-Olasik, E. (2012). Adaptation of high-throughput screening in drug discovery – Toxicological screening tests. International Journal of Molecular Sciences, 13, 427–452. https://doi.org/10.3390/ijms13010427.
  47. Tetko, I. V., Engkvist, O., Koch, U., Reymond, J.-L., & Chen, H. (2016). BIGCHEM: Challenges and opportunities for big data analysis in chemistry. Molecular Informatics, 35, 615.
  48. Tormay, P. (2015). Big data in pharmaceutical R&D: Creating a sustainable R&D engine. Pharmaceutical Medicine, 29(2), 87–92.
  49. Whitesides, G. M. (2015). Reinventing chemistry. Angewandte Chemie, 54(11), 3196–3209.
  50. Yeguas, V., & Casado, R. (2014). Big Data issues in computational chemistry, 2014 international conference on future internet of things and cloud. Available from http://ieeexplore.ieee.org/abstract/document/6984225/.
  51. Zhu, H., et al. (2014). Big data in chemical toxicity research: The use of high-throughput screening assays to identify potential toxicants. Chemical Research in Toxicology, 27(10), 1643–1651. https://doi.org/10.1021/tx500145h.

Copyright information

© Springer International Publishing AG 2018

Authors and Affiliations

  1. Department of Chemistry, University of Southampton, Southampton, UK