Private and public efforts infuse artificial intelligence into materials research

A research team from Rice University used flash heating to convert scrap waste into high-quality graphene, as reported in Advanced Materials. To tune this tricky chemical process, Jacob Beckham and colleagues first collected data from hundreds of prior experiments, which had generated tens of thousands of spectrograms. But the prospect of manually poring over such a large data set and then tweaking experiment parameters accordingly is daunting. The team, led by James M. Tour, overcame this laborious endeavor with help from a widely available “XGBoost” algorithm, which delivered a highly accurate machine learning (ML) model from experimental data and helped the researchers arrive at the ultimate experimental approach much more quickly and efficiently.
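To give a feel for the kind of algorithm involved, the sketch below implements plain gradient boosting with one-split decision trees ("stumps") in pure Python. It is not XGBoost itself, which adds regularization, second-order gradient information, and deeper trees, and it is not the Rice team's actual workflow. The two input parameters and the quality score are synthetic placeholders invented for this illustration.

```python
# Minimal gradient boosting for regression using depth-1 trees (stumps).
# Illustrative only: this is NOT XGBoost and NOT the published workflow.
# All data below are synthetic placeholders.

def fit_stump(rows, residuals):
    """Return the single-feature threshold split with the lowest squared error."""
    best = None
    for feature in range(len(rows[0])):
        values = sorted(set(row[feature] for row in rows))
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [r for row, r in zip(rows, residuals) if row[feature] <= threshold]
            right = [r for row, r in zip(rows, residuals) if row[feature] > threshold]
            left_mean = sum(left) / len(left)
            right_mean = sum(right) / len(right)
            error = (sum((r - left_mean) ** 2 for r in left)
                     + sum((r - right_mean) ** 2 for r in right))
            if best is None or error < best[0]:
                best = (error, feature, threshold, left_mean, right_mean)
    return None if best is None else best[1:]

def predict_stump(stump, row):
    feature, threshold, left_mean, right_mean = stump
    return left_mean if row[feature] <= threshold else right_mean

def fit_boosted(rows, targets, rounds=50, learning_rate=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(targets) / len(targets)
    predictions = [base] * len(targets)
    stumps = []
    for _ in range(rounds):
        residuals = [t - p for t, p in zip(targets, predictions)]
        stump = fit_stump(rows, residuals)
        if stump is None:  # no feature has more than one distinct value
            break
        stumps.append(stump)
        predictions = [p + learning_rate * predict_stump(stump, row)
                       for p, row in zip(predictions, rows)]
    return base, stumps, learning_rate

def predict(model, row):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, row) for s in stumps)

def mean_squared_error(predicted, targets):
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(targets)

# Synthetic "experiments": two hypothetical process parameters per run and
# a made-up quality score that depends on both.
rows = [[a, b] for a in range(5) for b in range(5)]
targets = [2.0 * a + (5.0 if b >= 3 else 0.0) for a, b in rows]

model = fit_boosted(rows, targets)
baseline_mse = mean_squared_error([model[0]] * len(targets), targets)
model_mse = mean_squared_error([predict(model, row) for row in rows], targets)
print(f"baseline MSE: {baseline_mse:.3f}, boosted MSE: {model_mse:.3f}")
```

Each boosting round fits a small tree to whatever error the current ensemble still makes, so the model improves incrementally. In practice one would hold out some experiments to check that the fitted model generalizes rather than comparing errors on the training data alone.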

figure a

Artificial intelligence (AI) and machine learning (ML) innovations allow researchers to simulate anticipated materials properties, enabling them to weed out dead-end approaches more efficiently and, instead, pursue more fruitful avenues. Credit: Kristin Persson/Scientific American.

This is just one example of the increasing reliance on artificial intelligence (AI) tools in materials research, a field conventionally defined by beakers, microscopes, and other “hard science” benchtop attributes. Academia, government, and industry alike are turning to data science to extract every drop of knowledge from increasingly complex, immense volumes of experimental results. This knowledge, in turn, can guide scientists to the next iteration of experiments, and so on.

figure b

Researchers conducted a literature search of 3 million publications for LiCoO2 and LiMn2O4 battery cathode materials. Using an artificial neural network, they predicted which words should co-occur with one another. This AI-driven algorithm can be used to identify gaps in the state of the art and predict future directions for materials research. Credit: V. Tshitoyan et al., Nature 571, pp. 95–98 (2019).

According to an IBM estimate, by 2020, the world had generated over 43 trillion gigabytes of data—an increase of 300% since 2005. And the rate is increasing exponentially: 90% of all currently existing data is less than two years old, and over 2.5 quintillion bytes are produced each day.

figure c

As microscope and detector technologies have improved over the last century, the rate of data acquired with these materials characterization techniques has grown exponentially. This continued growth calls for new methods to process collected images and deduce information from them if scientists are to continue reaping the benefits of improved microscopy techniques. CCD, charge-coupled device; TEAM, transmission electron aberration-corrected microscope. Credit: S.R. Spurgeon et al., Nature Materials 20, pp. 274–279 (2021).

Information generated from scientific research is no exception. Materials science data inundate researchers in numerous forms: numerical data sets from spectrometers and synchrotrons, electron microscope images, spectra, and diffraction patterns, and text from publications and news articles. Some of this information is easy to analyze, needs no cleanup, and is neatly assembled into fully filled data tables. An overwhelming majority of data, however, is a mess. It is nonnumerical and unstructured (such as pictures or strings of words) and needs significant cleaning, along with reasonable accounting for missing pieces, before it can be of any use to a researcher.
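As a small illustration of what "cleaning" and "accounting for missing pieces" can mean in practice, the sketch below parses a messy table of text entries into numbers and fills the gaps with each column's median. The field names, the values, and the choice of median imputation are all assumptions made for this example; real pipelines often need domain-specific rules.

```python
from statistics import median

# Text markers treated as "no value recorded" (an assumption for this sketch).
MISSING_TOKENS = {"", "n/a", "na", "nan", "-"}

def parse_number(entry):
    """Return a float, or None when the entry is missing or not numeric."""
    if entry is None:
        return None
    text = str(entry).strip().lower()
    if text in MISSING_TOKENS:
        return None
    try:
        return float(text)
    except ValueError:
        return None

def clean(records, fields):
    """Parse each field and fill gaps with that field's median, flagging every fill."""
    parsed = [{f: parse_number(r.get(f)) for f in fields} for r in records]
    medians = {}
    for f in fields:
        known = [p[f] for p in parsed if p[f] is not None]
        medians[f] = median(known) if known else None
    cleaned = []
    for record, values in zip(records, parsed):
        row = {"sample": record["sample"]}
        for f in fields:
            missing = values[f] is None
            row[f] = medians[f] if missing else values[f]
            row[f + "_imputed"] = missing  # keep track of which values were filled in
        cleaned.append(row)
    return cleaned

# Hypothetical raw records with invented field names and values.
raw_records = [
    {"sample": "s1", "measurement_1": "1.0", "measurement_2": "10"},
    {"sample": "s2", "measurement_1": "n/a", "measurement_2": " 20 "},
    {"sample": "s3", "measurement_1": "3.0", "measurement_2": None},
    {"sample": "s4", "measurement_1": "bad value", "measurement_2": "40"},
]

cleaned = clean(raw_records, ["measurement_1", "measurement_2"])
for row in cleaned:
    print(row)
```

Recording an explicit flag for every filled-in value keeps later analysis honest: a model can be trained with or without the imputed entries to see how much they matter.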

Moreover, much of the data traversing the Internet is of low quality and unlikely to yield breakthrough discoveries. Some of it is downright falsified. Filtering this junk out of scientists’ analyses is no trivial feat. But materials scientists are now putting data science innovations in artificial intelligence and machine learning to use to better analyze good-quality information (or eliminate bad-quality information), and build, test, and use optimized algorithms to make predictions or suggest experimental approaches.

Although researchers have relied on high-performance computing to solve first-principles equations since the 1990s, use of data science to drive materials research is a relatively nascent field. In an interview with MRS Bulletin, University of Tennessee, Knoxville Professor Sergei Kalinin highlighted the sea change that occurred shortly after 2010, when applications of deep learning to extract information from images collected during experiments became possible.

“In microscopy, we often limit ourselves to purely qualitative analysis of imaging data, [be] it [the] atomic structure of [a] two-dimensional material or ferroelectric domain pattern. At the same time, these data contain fundamental information on the physics of materials, for example force fields between the atoms or the free energy functional of the ferroelectric,” says Kalinin. “Correspondingly, extracting this information from observational data is a challenge. Machine learning methods allow addressing the first step, converting observations into materials-specific descriptors such as atomic positions and identities, and ultimately offer the promise of learning [the] physics behind observation[s]. However, adoption of machine learning methods requires building [a] scientific community of shared data and codes, allowing to build upon prior work as a community.”

Although code sharing in the AI/ML community became more commonplace 3–4 years ago, challenges in standardizing and reducing data remain: raw data become useful only once they are processed and reduced with universally accepted algorithms and stored in universal formats that other researchers can access and analyze. Acceptance and coordination of such a common language between researchers worldwide remain a challenge. Kalinin identified two additional challenges that slow down the advent of ML in materials science: The discipline (1) relies on small quantities of data and (2) heavily uses past knowledge to establish needed cause–effect relationships. These factors challenge the ability to build experience into artificial intelligence algorithms.

Machine learning tackles materials science challenges

Microscopes and spectrometers generate large quantities of data, so machine learning tools are instrumental for quickly extracting key takeaways from this information. Current materials research challenges are ever-increasing in complexity, and analysis done by hand can neither keep pace with the expected rate of discoveries nor realize the full potential of cutting-edge instruments that are currently available.

In an interview with MRS Bulletin, Pacific Northwest National Laboratory Materials Scientist and University of Washington Affiliate Professor Steven R. Spurgeon provided an overview of the machine learning development that is necessary to accelerate materials development. First, ML approaches must be developed to collect and fuse different types of complementary data to obtain representative statistics. Once this hurdle is cleared, ML will be able to advance to guide and direct experiments, use machine-driven reasoning to process data streams, and build meaningful models that derive fundamental correlation and causation.

Spurgeon says, “The process of materials discovery and design stands to be reshaped by AI/ML, but we must first solve fundamental problems in implementation, reasoning in novel scenarios, and physically meaningful interpretation.” Spurgeon also reflected on equipment-dictated lower barriers to entry for data science: While first-principles calculations benefit from pretrained machine learning models and can suffice on standalone central processing units (CPUs), emerging higher fidelity algorithms need more advanced graphics processing units (GPUs) and high-performance computing directly integrated into analytical instrumentation.

Innovation in materials science requires not only profound extraction of information from extremely complex, nonstandardized, and nonuniform data, but also the ability to do so on the spur of the moment and extremely quickly. This need is most evident at the US Department of Energy (DOE) Office of Science Scientific User Facilities, a collection of 28 research institutions around the country that, each year, empower nearly 33,000 scientists with access to some of the world’s most advanced tools of modern science. These include light sources, supercomputers, neutron sources, and numerous other instruments that are essential for cutting-edge materials innovation.

At a 2019 DOE Basic Energy Sciences Roundtable, a group of subject matter experts outlined the key innovations that AI/ML will drive in the fields of neutron, photon, and nanoscale sciences. The resulting report focused on abilities to (1) extract useful findings from large data sets; (2) do so in real time to maximize output from user facilities; (3) help design, control, and execute experiments using these high-end instruments; and (4) develop a shared data infrastructure for the scientific community to benefit from all collected data.

“Many of DOE’s user facilities are leading the way in the use of AI/ML to guide experiments and perform real-time, physics-based analysis at unprecedented scale,” Spurgeon says. “There is a tremendous opportunity to apply these approaches more broadly to harness atomically precise materials synthesis and characterization for quantum information science, energy storage, and other emerging technologies.”

For researchers to use data from microscopes or synchrotrons, information needs to be assembled into widely accessible repositories. According to a 2020 Chemistry of Materials summary, at that time, 16 such databases had been assembled by the scientific community. Although all of them include thermal properties, 11 also include mechanical properties and 10 databases include structural and electronic information. Whereas most of them make their data available to scientists under a Creative Commons or public domain license, and one is free without preconditions, four require paid subscriptions. As of 2020, the Cambridge Structural Database and the International Centre for Diffraction Data each included over 1 million materials records.

But scientists like Kristin Persson from Lawrence Berkeley National Laboratory (Berkeley Labs) place a stronger emphasis on the richness of data in each database. Persson told MRS Bulletin that the Materials Project resource, which DOE has recently designated a Public Reusable Research (PuRE) Data Resource, has 250,000 users (with 10,000 unique daily visitors) and delivers 5–45 million new data records each day to its users. The Materials Project currently encompasses millions of materials property data points and enables robust correlation and cross-comparison of structural-chemistry trends.

Persson says, “People come to the Materials Project for the quality, accessibility, and usefulness of the data, in particular the broad spectrum of calculated materials properties.” Persson further sketched out the challenges of the next step of evolution of this database, which will incorporate reactions and reactive interfaces as data features and account for large-space physical changes to materials.

Government and industry see promise

As the nascent AI-driven materials science field still depends on federally supported research efforts, its funding and scope fall under the US Government’s basic science budgetary portfolio, and, notably, its emerging AI development efforts. As noted in a May 2021 Congressional Research Service report on artificial intelligence, federal nondefense AI research totaled USD$1.5 billion in fiscal year (FY) 2021—an almost USD$1 billion increase since FY 2018. Among the leaders in this area were the US National Science Foundation (NSF) with USD$457 million, US Department of Agriculture (USDA) with USD$128 million, and DOE with USD$84 million. Among US Department of Defense (DoD) spending, which had allocated USD$5 billion for AI research in FY 2021, the Defense Advanced Research Projects Agency (DARPA) received USD$568 million, while the DoD newly created Joint Artificial Intelligence Center was funded at a USD$132 million level.

Initial efforts to develop AI commenced in the 1950s and underwent several ebb and flow cycles in the subsequent decades. Availability of big data, powerful computing, and improved machine learning have put this field on a persistent growth trajectory over the past decade. The materials science community took notice of AI’s potential in 2011 when the Materials Genome Initiative (MGI) was instituted. In its 2014 strategic plan, the MGI highlighted its aims to integrate experiments with computation and theory, improve data access, and develop the workforce. A 2018 progress report on the MGI noted that, in collaboration with DoD, DOE, NSF, and the National Institute of Standards and Technology (NIST), it has generated almost USD$270 billion for the US economy and bolstered national defense, renewable energy, and supercomputing research. It is credited with accelerating development of consumer products like the Apple Watch.

Despite sustained, decades-long research and interest in this field, the concept of “artificial intelligence” was codified in law in the United States only as recently as 2019, in the National Defense Authorization Act. The National Artificial Intelligence Initiative Act of 2020 further clarified the definition and scope of AI as a machine-based system that can make predictions or decisions for human-defined objectives.

The February 2019 Executive Order “Maintaining American Leadership in Artificial Intelligence” set up the principles for the National Artificial Intelligence Initiative (NAII). A 2018 White House summit on AI for American Industry, which was attended by 100 professionals, helped bring this policy to life. The National Artificial Intelligence Initiative Office (NAIIO), which reports to the Office of Science and Technology Policy (OSTP) at the White House, aims to drive AI innovation for national security, health, and the economy, and, in service of those goals, bolster programs for a technology-apt workforce.

This federal governmental framework sets up a relationship with relevant congressional committees and governmental agencies to allocate funding for these R&D activities. A Select Committee, which is chaired by OSTP and a rotating federal agency member, consists of the most senior R&D officials in the US Government and represents the whole of government. As part of the NAII set forth in the 2019 executive order, the NAIIO plans to develop government-wide standards for this technology and assist the Office of Management and Budget with guidance on AI regulation in the private sector. The Networking and Information Technology Research and Development Program acts to coordinate federal funding for AI and report these allocations to Congress. Its membership, which encompasses 25 government agencies, invests USD$7 billion annually in advanced networking and information technology capabilities.

A 2022 NAII progress report further highlighted the efforts of eight different US Government agencies that provide support for students and early-career researchers in AI. Since May 2022, researchers were able to apply for funding for a cumulative total of seven National AI Research Institutes established under the NAII. The joint effort, which NIST, USDA, DoD, and the IBM Corporation are overseeing, will allocate USD$140 million of funding for 4–5-year cross-cutting AI development endeavors.

The DOE, which also oversees development of some of the country’s most powerful supercomputers, has a long-running track record of investing in computational work that enables science discoveries. Its Scientific Discovery through Advanced Computing (SciDAC) program, whose partnerships seek to bring computational solutions to challenging problems in physical sciences and several other fields, started in 2001 and has been re-competed four times since. SciDAC involves all six DOE Office of Science programs: Advanced Scientific Computing Research (ASCR), Basic Energy Sciences, Biological and Environmental Research, Fusion Energy Sciences, High-Energy Physics, and Nuclear Physics. Together with the Office of Nuclear Energy, these programs aim to dramatically accelerate progress in scientific computing that delivers breakthrough scientific results through partnerships composed of applied mathematicians, computer scientists, and scientists from other disciplines.

DOE’s Office of Basic Energy Sciences (BES) also funds the ASCR effort to bolster computational and networking capabilities that simulate and predict complex physical phenomena. In FY 2020, ASCR received USD$980 million, while SciDAC received USD$69 million. BES also directly invests into data science to advance chemical and materials sciences, in addition to its SciDAC partnership with ASCR.

In an interview with MRS Bulletin, Program Manager Matthias Graf highlighted 19 data science awards made in 2019 and 10 awards in 2021. These awards aim to bring forth new AI techniques and tools, such as neural networks, for fundamental discoveries through data-driven models of complex chemical or materials systems whose macroscopic properties depend on collective behavior across multiple time and length scales. For example, awards were made in the fields of chemical separation, catalysis, grain-boundary growth, alloys, quantum magnetism, superconductivity, and electron, neutron, and photon spectroscopies. Graf says, “The AI/ML revolution adds a new tool to scientists’ computational toolbox to accelerate discovery, time to solution, and understanding of fundamental chemical and materials properties and processes, not achievable otherwise.” Notably, every DOE awardee receives access to DOE’s computing resources. This arrangement maximizes opportunities for every research team to obtain success while unconstrained by CPU or GPU limitations.

As an investor of over USD$500 million into AI each year (41% of all nondefense AI federal research funding in 2021), NIST is also an important driver in development of AI-infused materials science. In August 2020, as part of its USD$220 million plan to establish 11 Artificial Intelligence Institutes, the agency announced its first five selections. Each of these will receive USD$20 million for five years of collaborative research. One of them, the Molecule Maker Lab Institute, is led by a team at the University of Illinois at Urbana-Champaign and includes three additional academic partners. It will aim to develop new AI-enabled tools to accelerate automated chemical synthesis and advance the pace of discovery of novel materials and bioactive compounds. But NIST’s role goes beyond funding or directly carrying out research. As the flagship US agency for developing technology standards, NIST launched an effort to construct a research data framework to optimally handle artificial intelligence deliverables. The February 2021 Core Summary report sets forth best practices to store, curate, and manage scientific data.

The DoD is also keenly interested in opportunities that data science can afford to solve essential materials challenges. A 2018 report from the Institute for Defense Analyses, which is one of DoD’s federally funded R&D centers, underscored the relevance of data science to predict performance in extreme environments. Materials with enhanced corrosion and radiation resistance, high-temperature thermal barriers, and high-entropy alloys are all relevant to defense applications.

The report also highlighted the need for DoD-relevant materials databases. And while, according to a March 2022 Government Accountability Office report, most of DoD’s 685 AI projects focus on new weapons systems development, defense science initiatives are investing in ML-driven materials science to address the department’s needs. Two DARPA programs, Accelerated Molecular Discovery and Make-It, investigate the use of computation approaches to predict chemical synthesis routes and accelerate development of new molecules. In 2021, the US Army Research Laboratory launched its High-Throughput Materials Discovery for Extreme Conditions effort, which aims to leverage machine learning to accelerate discovery of extreme materials and meet the US Army’s Modernization Priorities.

While federal agencies fund research and innovation, companies see economic potential in AI-driven materials discoveries and are financing this field. Private enterprise is driven by the need to streamline and accelerate product development. As Lawrence Berkeley National Laboratory Scientist Anubhav Jain stated in a recent AI workshop, the pathway from invention to commercialization took 15–24 years in 1995, an unsustainable figure in today’s world. But the ability to run calculations 5–6 orders of magnitude faster using GPUs, and to rely on AI to conduct 1 million tests with the computing power of one, can significantly cut into these lead times. According to a Congressional Research Service report, US private investment into AI totaled USD$23.6 billion in 2020, more than that of any other country during that time. In November 2017, the company BitReFine Group predicted that AI would add USD$15 trillion to the industry and manufacturing economy by 2030. According to its estimate, companies that used data science grew 50% faster than those that eschewed it.

In a fusion of private and public interests, the Toyota Research Institute (TRI), which was founded in 2016, has significant investments into AI to accelerate the materials design timeline. TRI is partnering with universities and national laboratories to drive innovation in materials for energy storage to bring forth better batteries that reduce cost, improve durability, and maximize environmental sustainability.

In an interview with MRS Bulletin, TRI’s Energy and Materials Director Brian Storey shared the vision of the organization: “In an economy that relies on less fossil fuels, new energy materials are a competitive advantage. Better materials mean lower cost, higher durability, and more sustainable options. Accelerating materials development timelines is critical to ensure we decarbonize our global economy.”

Among the Institute’s research goals is to develop AI models that specifically infer the state of a material based on its processing history, a necessary tool to gauge the health of electrodes in batteries of electric vehicles. TRI Engineer Jens Hummelshøj further reflected on the metrics by which AI can be evaluated: “If we think of innovation as invention plus adoption, then one possible metric is the rate of adoption of these new research practices and tools. Are they used by scientists and engineers in their daily work? Another possible metric is based on the amount of investment in the field. There are a number of startups that are raising money, and there are large companies like Toyota that are investing internally in such AI/ML efforts. This investment is based on an expected potential future return.”

Conclusion and outlook

Notwithstanding a sprawling network of federal agencies, research centers, and government policy initiatives, data-driven materials science is still an emerging field that constitutes a small—but growing—part of the AI enterprise. The United States is not alone in its quest for new artificial intelligence capabilities. A 2019 Science article estimated that worldwide AI funding hit USD$35 billion in 2019—a 44% increase from 2018. China put USD$9.9 billion of private equity into this field in 2020. This infusion aligns with its 2017 National AI plan, which has projected a USD$140 billion AI market by the end of this decade. Moreover, while US investments into AI constituted 30% of the worldwide total in 2013–2020, China’s share was 60% during that time. The EU had committed to investing USD$20 billion into AI by 2020. Factors such as sustained funding levels, clear paths to market, availability of computing power, strength of AI-trained workforce, and readiness of commercialization partners to realize AI designs will shape this international competition.

For all the buzz around AI-driven materials innovation, the rest of the world will appreciate results more once these designs escape their cyberspace-bound, in silico environment and make their way to the physical realm. And even though this field is new, machine learning has already helped bring better products to store shelves. When, in 2005, Duracell queried Kristin Persson and Gerbrand Ceder (Berkeley Labs) about choice of cathode chemistries for their batteries, the team computationally screened 130,000 possibilities. The researchers delivered a 200-member short list and saved the company from wasting countless hours fruitlessly synthesizing and testing the 129,800 ultimately useless options. Due to these efforts, Duracell’s optimized alkaline battery, Optimum, hit the store shelves in 2019. As this field matures, more exciting developments could be making a similar leap in the coming years.

Corresponding author

Correspondence to Boris Dyatkin.


Cite this article

Dyatkin, B. Private and public efforts infuse artificial intelligence into materials research. MRS Bulletin (2022).


  • Government policy and funding
  • Artificial intelligence
  • Machine learning
  • Materials genome