Machine learning as a tool to engineer microstructures: Morphological prediction of tannin-based colloids using Bayesian surrogate models

Jin, Soo-Ah; Kämäräinen, Tero; Rinke, Patrick; Rojas, Orlando J.; Todorović, Milica

doi:10.1557/s43577-021-00183-4

Impact Article
Open access
Published: 28 February 2022

Volume 47, pages 29–37, (2022)

MRS Bulletin

Abstract

Oxidized tannic acid (OTA) is a useful biomolecule with a strong tendency to form complexes with metals and proteins. In this study we open the possibility of extending the application of OTA when assembled as supramolecular systems, which typically exhibit functions that correlate with shape and associated morphological features. We used machine learning (ML) to selectively engineer OTA into particles encompassing one-dimensional to three-dimensional constructs. We employed Bayesian regression to correlate colloidal suspension conditions (pH and pK_a) with the size and shape of the assembled colloidal particles. Fewer than 20 experiments were found to be sufficient to build surrogate model landscapes of OTA morphology in the experimental design space, which were chemically interpretable and showed predictive power on data not used to build them. We produced multiple property landscapes from the experimental data, helping us to infer solutions that would satisfy, simultaneously, multiple design objectives. The balance between data efficiency and the depth of information delivered by ML approaches testifies to their potential to engineer particles, opening new prospects in the emerging field of particle morphogenesis, impacting bioactivity, adhesion, interfacial stabilization, and other functions inherent to OTA.

Impact statement

Tannic acid is a versatile bio-derived material employed in coatings, surface modifiers, and emulsion and growth stabilizers, which also imparts mild anti-viral health benefits. Our recent work on the crystallization of oxidized tannic acid (OTA) colloids opens the route toward further valuable applications, but here the functional properties tend to depend strongly on particle morphology. In this study, we eschew trial-and-error morphology exploration of OTA particles in favor of a data-driven approach. We digitalized the experimental observations and input them into a Gaussian process regression algorithm to generate morphology surrogate models. These help us to visualize particle morphology in the design space of material processing conditions, and thus determine how to selectively engineer one-dimensional or three-dimensional particles with targeted functionalities. We extend this approach to visualize other experimental outcomes, including particle yield and particle surface-to-volume ratio, which are useful for the design of products based on OTA particles. Our findings demonstrate the use of data-efficient surrogate models for general materials engineering purposes and facilitate the development of next-generation OTA-based applications.

Introduction

Tannic acid (TA) is an abundant and versatile bio-based material, which readily affords synthetic pathways for the isolation of its elementary building blocks. TA contains many hydroxyl groups, allowing it to form complexes with different macromolecules via hydrogen-bonding, hydrophobic and cation-π interactions.^1,2 Abundant hydroxyl groups make TA highly soluble and stable in aqueous solutions. In alkaline conditions, TA undergoes oxidation^3,4 and produces oxidized tannic acid (OTA) followed by oligomerization. Concomitant oligomerization of OTA leads to the formation of compounds with higher molecular weight and thereby decreases the solubility of the substance.⁴ In this form, OTA can interact with different molecules and serve as coatings,^3,5 surface modifiers^1,6 and emulsion stabilizers,^1,3,6,7 or act as stabilizing and reducing agents to aid in inorganic nanoparticle growth,^8,9,10 all the while imparting beneficial biological functionality.^11,12 For instance, tannic acid has recently been shown to suppress SARS-CoV-2 as a dual inhibitor of the viral main protease.¹³ All these favorable aspects of OTA and other phenolic particles have fueled research into a wide spectrum of applications.¹⁴

OTA can also be crystallized into particles with structural properties that are highly sensitive to the experimental synthesis. Previously, Bhangu et al.¹⁰ developed a sonochemical method to chemically transform amorphous tannic acid into nano-/micro-sized crystalline particles without the use of reagents or organic solvents. They obtained OTA particles of different size and shape by simply varying ultrasonic parameters. Kämäräinen et al.⁴ further presented a facile and scalable protocol to prepare OTA of varying morphologies by altering the TA oxidation conditions. The dimensions, shapes, and the yield of these crystalline particles were highly sensitive to initial TA concentration, reaction time, initial pH, and pK_a of the base.

While OTA particulate constructs can facilitate a range of new applications, particle morphology is a key consideration. In many high surface area systems that incorporate particulate matter, particle morphology and size are major contributors to their overall performance through, for example, relationships between morphology and packing,¹⁵ percolation,¹⁶ rheology,¹⁷ and bioactivity.¹⁸ Consequently, morphology plays an important role in many applications ranging from heterogeneous catalysts¹⁹ and electrochemical cells^20,21 to drug delivery systems.²²

In this work, we employ machine learning (ML) to explore the morphology landscape of OTA particles in the chemical design space of processing conditions. As illustrated in Figure 1, we start with OTA synthesis experiments and digitalize them into data points for particle morphology. We apply Gaussian process regression (GPR),²³ an ML tool for supervised learning, to compute a surrogate model for OTA morphology. Based on the morphology model in the design space of material fabrication, we consider which particle shapes are available and learn how to tune the processing conditions to achieve an optimal outcome for a targeted application.

GPR has been employed in materials science for experimental materials design,^24,25,26,27,28,29,30 often in combination with Bayesian optimization.^31,32,33 Given data within the phase space of N design parameters, GPR produces the statistically most likely N-dimensional landscape, which serves as a surrogate model of a target property.²³ Gaussian processes (GPs) are capable of good data interpolation, allowing us to build good quality surrogate models with relatively few data points. They produce smooth and continuous landscapes that reflect the continuous chemical process underpinning the data, and can account for experimental uncertainties as data noise. All of these characteristics make GPR well suited to experimental applications.

The previous study of OTA particle fabrication employed principal component analysis^34,35 (PCA, an ML tool for unsupervised learning) on experimental data to ascertain that pH and pK_a used in the OTA solution correlate most strongly with particle shape. We proceed to consider OTA morphology in the two-dimensional (2D) search space of pH and pK_a. Sample characterization was performed by scanning electron microscopy (SEM) imaging. To digitalize the particle shape information, we quantified the physical dimensions allowed by OTA simple crystalline habits and took note of experimental uncertainties.

While PCA is a versatile tool, it was unable to offer further insight into morphology types, nor indicate optimal processing conditions. Conversely, GPR allowed us to visualize OTA particle morphology as a function of pH and pK_a and delivered a chemically interpretable model. Based on the morphology landscape, our objective was to drive the morphology of particles from one-dimensional (1D) to three-dimensional (3D) shapes. Moreover, by extracting particle yield and volume from each experiment we were able to generate surrogate models for multiple experimental properties at no further cost, allowing us to pursue multi-target tuning of OTA particulate structures. In this article, we present the entire workflow necessary to carry out supervised ML applications on experimental data, with the aim to motivate similar work in the community. Data-efficient ML tools from computer science have the potential to renew experimental practices in materials engineering and boost the search for advanced sustainable compounds.

Materials and methods

OTA particle synthesis

OTA particles were synthesized using the protocol reported previously.⁴ Briefly, aqueous tannic acid solution (2% w/v) was prepared by adding tannic acid powder (1701.20 g/mol, Sigma-Aldrich) into Milli-Q water and vigorously stirring (magnetic bar) until completely dissolved. The pH of the solution was adjusted to a desired pH value with either 1 M KOH, 45% (CH₃)₃N, 1 M NaOH, 0.5 M Na₃PO₄ or 25% NH₄OH (see Figure 2a). Alkaline conditions required for the OTA synthesis reaction to proceed made us select pH > 7, but the base choice was varied widely and resulted in a pK_a in the range [9.25, 14.9]. All chemicals were reagent-grade and purchased from Sigma-Aldrich. Solutions were covered with perforated Parafilm and were shaken continuously with an orbital shaker for 14 h. All reactions were carried out at room temperature. The grown and precipitated OTA particles were collected and stored at room temperature for further characterization. Despite the simplicity in particle fabrication, multiple experiments were needed to accurately define the conditions that resulted in the given morphology. This required arduous experimentation, as well as time, since each setup produced a specific morphology depending on the reaction conditions.

SEM image analysis

The synthesized OTA particles were imaged using a field-emission SEM (Sigma VP, Zeiss, Germany) with Schottky emitter at 1.5 kV without stage bias. For this purpose, aqueous suspensions of the OTA particles were cast onto pre-cleaned silicon wafers, dried in ambient laboratory conditions and sputter-coated with 4 nm Pd/Au. All imaging was performed on the same day with the OTA suspensions freshly prepared. Collected SEM images were then analyzed using ImageJ software³⁶ to measure the dimensions of the particles typically numbering in the tens (Figure 2b). We measured the length, width and height of OTA particles as d₁, d₂, and d₃, such that d₁ > d₂ > d₃. The average values are reported here as the best estimate of particle dimensions. Standard deviations were recorded to estimate the experimental uncertainty on particle dimensions. All data points, error analysis, and the SEM images are presented in the Supplementary Material document.

Gaussian process regression algorithm

GPR is a kernel-based algorithm for supervised regression that relies on GP models to represent black box functions.²³ Given data and the GP prior, Bayes’ rule is applied to compute the GP posterior. The GP posterior mean serves as the surrogate model, the statistically most likely form of the unknown function. The GP posterior variance supplies a local measure of confidence in the model, typically rising in regions of search space where data are scarce and decreasing in well-explored regions.
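
For reference, the GP posterior described above has a standard closed form. The notation below (training inputs $X$, labels $\mathbf{y}$, kernel function $k$, kernel matrix $K$, noise standard deviation $\sigma_n$, query point $x_*$) is introduced here for illustration and is not taken from the article; a zero-mean prior and Gaussian noise are assumed:

$$ \mu(x_*) = \mathbf{k}_*^{\top} \left( K + \sigma_n^{2} I \right)^{-1} \mathbf{y}, \qquad \sigma^{2}(x_*) = k(x_*, x_*) - \mathbf{k}_*^{\top} \left( K + \sigma_n^{2} I \right)^{-1} \mathbf{k}_* $$

Here $K_{ij} = k(x_i, x_j)$ and $(\mathbf{k}_*)_i = k(x_*, x_i)$. The subtracted term in the variance becomes small when $x_*$ lies far from all training points, which is why the posterior variance rises in sparsely sampled regions.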

For GPR fitting we used an uninformative zero mean GP prior and the radial basis function (RBF) kernel to obtain smooth and continuous landscapes. Data noise was Gaussian-distributed with zero mean. To make the model more robust, we applied inverse gamma priors on the hyperparameters of the kernel, the length scale and variance. During regression, the two hyperparameters were fitted in an automated way by maximizing marginal likelihood: this standard GPR procedure ensures that the results do not depend on manual hyperparameter choices.²³
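
The fitting procedure can be illustrated with a minimal, dependency-free sketch. It is not the BOSS implementation: the data points are invented placeholders, the hyperparameter search is a coarse grid over the length scale only (standing in for a proper optimizer), and the inverse gamma priors are omitted. All function names are hypothetical.

```python
import math

def rbf(a, b, variance, lengthscale):
    """RBF kernel between two points given as tuples."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return variance * math.exp(-0.5 * sq / lengthscale ** 2)

def cholesky(A):
    """Lower-triangular Cholesky factor of a symmetric positive-definite matrix."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Solve L z = b by forward substitution."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_lower_transposed(L, z):
    """Solve L^T a = z by back substitution."""
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a

def gp_fit(X, y, variance, lengthscale, noise):
    """Factorize K + noise^2 I and precompute alpha = (K + noise^2 I)^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], variance, lengthscale) + (noise ** 2 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    L = cholesky(K)
    alpha = solve_lower_transposed(L, solve_lower(L, y))
    return L, alpha

def gp_predict(x_star, X, L, alpha, variance, lengthscale):
    """Posterior mean and variance of the zero-mean GP at x_star."""
    k_star = [rbf(x_star, xi, variance, lengthscale) for xi in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    return mean, max(variance - sum(vi * vi for vi in v), 0.0)

def log_marginal_likelihood(y, L, alpha):
    """Log marginal likelihood of the data under the fitted GP."""
    n = len(y)
    return (-0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
            - sum(math.log(L[i][i]) for i in range(n))
            - 0.5 * n * math.log(2.0 * math.pi))

# Placeholder (pH, pK_a) -> y data, invented for illustration only.
X = [(7.5, 9.25), (9.0, 12.0), (11.0, 14.8), (10.0, 9.8)]
y = [0.9, 0.4, 0.1, 0.5]
variance, noise = 1.0, 0.05          # assumed kernel variance and noise level
candidates = [0.5, 1.0, 1.5, 2.0, 3.0]  # assumed length-scale grid

def lml_for(lengthscale):
    L, alpha = gp_fit(X, y, variance, lengthscale, noise)
    return log_marginal_likelihood(y, L, alpha)

best_ls = max(candidates, key=lml_for)  # maximize marginal likelihood
L, alpha = gp_fit(X, y, variance, best_ls, noise)
mean_train, var_train = gp_predict(X[0], X, L, alpha, variance, best_ls)
mean_far, var_far = gp_predict((100.0, 100.0), X, L, alpha, variance, best_ls)
```

Far from all data the prediction reverts to the zero prior mean with the full prior variance, while at a training point the mean stays close to the observed label and the variance is small.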

To compute the surrogate model, we carried out GPR implemented in the Bayesian Optimization Structure Search (BOSS) code. BOSS is an open-source Python code^37,38 for performing GPR and Bayesian optimization (BO) tasks to solve problems in materials science.^39,40,41,42 It can read pre-recorded data sets or acquire data on-the-fly with acquisition functions. BOSS post-processing capabilities allowed us to construct surrogate model landscapes and analyze their features.

Results and discussion

We employed 10 experimental data points on crystallized OTA particles collected by Kämäräinen et al.⁴ to initialize the GPR model. In a departure from earlier work, the prospect of supervised learning required us to carry out experimental data analytics and consider different experimental outcomes, as well as measurement uncertainties. Supervised learning calls for a clear outcome, or label, so samples with ill-defined morphologies were not included in the ML model. Another key part of data digitalization was the conversion of experimental observations into customized descriptors for OTA particle morphology.

We started by analyzing the OTA particle morphology landscapes obtained in the 2D search space of pH and pK_a for shape predictions. To test the predictive power of the model, we performed seven more experiments in key regions of the design space. The additional data also served to refine the morphology model. We validated the morphology landscapes against all experimental data collected, including the samples which were not employed in building the model. Finally, we demonstrated how additional property models for particle yield and volume were built from the same set of experiments and considered multi-objective materials design.

Experimental data set

The experimental data set was adapted for GPR supervised learning by presenting each point in [x, y] pair format. Here x is the location in the design space of OTA particle processing conditions, and y is the label, the morphology design objective for which we construct the surrogate model. Depending on the number of design parameters, x can be N-dimensional. In this work, x = (x₁, x₂) with x₁ assigned the pH of the solution and x₂ the value of base strength pK_a. We limited the design space of the processing conditions (pH, pK_a) to the range of ([7.0, 12.2], [9.0, 15.5]) to reflect the range of the processing conditions within which the experiments were realized.

The morphology of particles was quantified from their measured dimensions (d₁, d₂, d₃). To facilitate comparison between data points, the particle dimension data was scaled by the magnitude of the leading dimension (normalizing the longest dimension to 1.0 for each data point). We defined the morphology label y as:

$$ y = \frac{d_{2}}{d_{1}} + \frac{d_{3}}{d_{1}}; \qquad d_{1} = 1.0 \to y = d_{2} + d_{3}. \tag{1} $$

This label allows us to distinguish between 1D and 3D morphology conditions as follows:

$$ y = \begin{cases} 0, & d_{1} \gg d_{2}, d_{3} \quad (1\mathrm{D}) \\ 1, & d_{3} \ll d_{1}, d_{2} \quad (2\mathrm{D}) \\ 2, & d_{1} \cong d_{2} \cong d_{3} \quad (3\mathrm{D}). \end{cases} \tag{2} $$

While the 1D–3D signal difference across the realistic particles may be considerably lower than the ideal [0, 2] range, the choice of a physically meaningful property as label y allowed us to formulate interpretable surrogate models and gain immediate insight from GPR applications.
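
The label definition can be made concrete with a short sketch. The function name and the three example shapes are illustrative assumptions, not measured data from the study.

```python
def morphology_label(dims):
    """Return y = d2/d1 + d3/d1 for three particle dimensions (same units, any order)."""
    d1, d2, d3 = sorted(dims, reverse=True)  # enforce d1 >= d2 >= d3
    if d3 <= 0:
        raise ValueError("particle dimensions must be positive")
    return d2 / d1 + d3 / d1

# Idealized limiting shapes (invented values for illustration)
needle = morphology_label((100.0, 1.0, 1.0))    # close to 0: 1D
platelet = morphology_label((10.0, 10.0, 0.1))  # close to 1: 2D
cube = morphology_label((5.0, 5.0, 5.0))        # equal to 2: 3D
```

Because the label is a ratio, it is independent of the absolute particle size, so particles of very different length can be compared on the same scale.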

Next, we review the range of experimental outcomes and discuss their suitability as input for ML application. Unlike in computational research where a numerical result is guaranteed, any experimental data point may result in one of the following outcomes of experimental fabrication, illustrated in Figure 3a: (a) no particle precipitate; (b) non-quantifiable, ill-defined particle morphology; (c) good quality precipitates with quantifiable dimensions; and (d) multi-morphology precipitates. Too many experimental observations in the first two categories would suggest that the chosen design variables are not the key drivers of the material synthesis, and that the experimental design space needs further consideration.

In our work, 74% of experiments (17 data points) resulted in quantifiable sample morphology. A further 22% (5 data points) featuring ill-defined particle morphology could not be employed in building the model, but served to verify the model predictions. In one case, we observed OTA samples that featured two distinct particle morphologies in comparable yields (Figure 3d). Such a case indicates a saddle-point in the chemical design space, a two-phase region where both morphologies are in coexistence, and should be approached with caution. Here, we characterized the two morphologies and computed their arithmetic average label y: such treatment reflected the dichotomy in the design space and supplied this information to the model.

Experimental uncertainties are common in any practical work, and must be carefully considered. In our study, there were uncertainties associated with both OTA sample fabrication and characterization. While we made every effort to fix all aspects of OTA particle synthesis apart from pH and pK_a, unaccounted differences in ambient conditions such as relative humidity could influence the evaporation rate during the experiments, affecting particle yields and morphologies. Changes in impurity content could also affect the observed morphologies. OTA particle dimensions were measured based on visual assignment of particle boundaries: these may introduce minor uncertainties into the mapping from design space to experimental outcome that are difficult to quantify. Irregular particle sizes in our experiments allowed us to perform a statistical analysis of particle dimensions (and thus morphologies). The standard deviations per particle dimension were combined to compute the overall uncertainty ∆ on the morphology label y. Since this quantity reflects the knowability of data, it was adopted to represent all sources of experimental error and served as data noise in the GPR surrogate model (see Supplementary Material for full details). For the precipitate yield, a conservative estimate of 5% variation was assumed.
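
One plausible way to combine the per-dimension standard deviations into an uncertainty on y is first-order error propagation. The sketch below assumes independent errors on d₁, d₂, and d₃; the exact procedure used in the study is given in the Supplementary Material and may differ. The function name and the numerical values are hypothetical.

```python
import math

def label_uncertainty(d, sd):
    """First-order error propagation for y = (d2 + d3) / d1.

    d  : (d1, d2, d3) mean particle dimensions
    sd : (s1, s2, s3) standard deviations, assumed independent
    """
    d1, d2, d3 = d
    s1, s2, s3 = sd
    y = (d2 + d3) / d1
    # Partial derivatives of y with respect to each dimension
    dy_dd1 = -(d2 + d3) / d1 ** 2
    dy_dd2 = 1.0 / d1
    dy_dd3 = 1.0 / d1
    delta = math.sqrt((dy_dd1 * s1) ** 2 + (dy_dd2 * s2) ** 2 + (dy_dd3 * s3) ** 2)
    return y, delta

# Invented example: 10% relative scatter on each dimension
y_example, delta_example = label_uncertainty((10.0, 2.0, 1.0), (1.0, 0.2, 0.1))
```

In this example y is 0.3 and the combined uncertainty is about 0.037, dominated by the scatter in the longest dimension.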

Morphology landscapes in the design space

Based on GPR, we computed the initial surrogate model for OTA particle morphology in the 2D pH-pK_a design space shown in Figure 4a. The continuous morphology landscape features areas of interest associated with low y signal (1D) and high y signal (3D) structures. It also indicates that there are regions of design space where no data have been collected and where the model may be less reliable.

The minimum of the surrogate model in Figure 4a suggests that high-pH combined with high-pK_a produced OTA particles with the most strongly pronounced 1D character ($d_{1} \gg d_{2} , d_{3}$). Conversely, low pH solutions most likely produced 3D particles. To verify these predictions, we sampled further data points at the edges of the design space at pH < 7.8 and pH > 11, and also at low pK_a values, where data had been sparse. The GPR model that was re-trained with 7 additional experimental points is presented in Figure 4b.

The refined surrogate model for OTA particle morphology retains many of the features of the previous GPR fit in Figure 4a. The predicted high-pH and high-pK_a conditions for 1D particles remain unchanged. However, the region specific to 3D structures (high y values) is now enhanced, shifting to lower pK_a values. The refined landscape suggests that only low-pH and low-pK_a processing conditions give rise to 3D particles. The relatively low value of the morphology signal y throughout the design space indicates that many experimental outcomes are 1D-like. Particles that are 2D-like may form only in the region of chemical space that neighbors the 3D structural conditions.

Model validation and predictive power

To extract predictions from the surrogate model, we coarse-grained the landscape into several categories assuming linear progression from 1D to 3D. As illustrated in Figure 5, this allows us to define regions of design space where experiments would reliably produce 1D, 2D, and 3D OTA particles. We observe that 1D and 3D regions of design space are clear and well separated. The model predicts that solution pH and pK_a are directly correlated: 1D particles are obtained when their values are both high, and 3D when they are both low. In contrast, the 2D particle region spans a limited non-convex area in design space that conforms to the 3D particle region. This implies that 2D particles are difficult to synthesize. The greatest portion of design space was associated with 1D-type structures. The resulting model prediction is that when pH and pK_a are inversely correlated, 1D-like or 1D-2D mixed morphology particles are expected to occur.
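
The coarse-graining step can be sketched as equal-width binning of the model value between the landscape minimum and maximum. The number of bins and the category names below are assumptions chosen for illustration; the article does not state the exact thresholds.

```python
def coarse_grain(y, y_min, y_max, names=("1D", "1D-2D", "2D", "2D-3D", "3D")):
    """Map a surrogate-model value y to one of len(names) equal-width bins.

    y_min and y_max are the extremes of the landscape; the bin count and
    labels are illustrative assumptions.
    """
    if y_max <= y_min:
        raise ValueError("y_max must exceed y_min")
    frac = (y - y_min) / (y_max - y_min)
    idx = int(frac * len(names))
    idx = max(0, min(idx, len(names) - 1))  # clamp values at or beyond the extremes
    return names[idx]

low = coarse_grain(0.0, 0.0, 1.0)    # "1D"
mixed = coarse_grain(0.3, 0.0, 1.0)  # "1D-2D"
mid = coarse_grain(0.5, 0.0, 1.0)    # "2D"
high = coarse_grain(1.0, 0.0, 1.0)   # "3D"
```

Applying such a function to every grid point of the landscape yields a categorical map of the design space.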

In the next step, we validate our model predictions by cross-referencing SEM images of OTA particles with the particle morphology landscape. Figure 6 portrays the landscape overlaid with SEM image data from the area of design space where the OTA particle synthesis was carried out. Images outlined in red represent cases of non-quantifiable particle dimensions (ill-defined morphology), which were not included in the model construction. The case of dual particle morphologies is indicated in green.

It is immediately clear that the predictions regarding 1D and 3D particle formation were correct. 1D landscape regions are associated with very long needle-like particles (up to 0.1 mm), where the design condition $d_{1} \gg d_{2} , d_{3}$ is best satisfied. 1D-like regions exhibit a different 1D morphology where the particles are short and matchstick-like. In some cases, the short 1D particles agglomerate into a larger mass where the morphology is not easily identified. These data were not included in the surrogate model, and yet they correlate well with the mixed morphology 1–2D and 2–3D regions of the landscape. The same is true of the dual morphology data points, which correctly occur in the mixed 1D–2D section of the landscape.

SEM images reveal few examples of 3D particles obtained in these experiments, about 25% of the total. Even fewer are the 2D particle cases, which present mostly as domino-like platelet structures. As predicted by the surrogate model, 1D particles dominate the design space: short matchstick-like structures are the most common experimental outcome. At intermediate pH and pK_a values, there is a risk of particle aggregation: matchsticks combining into disordered bundles and coral-like growth are observed.

OTA particle yield and functional properties

Having demonstrated that GPR surrogate models for OTA particle morphology have good predictive power, we turn our attention to other experimental information. With each data point, we recorded the yield of the dried OTA colloidal content. The measurement of particle dimensions further allowed us to analyze and engineer other functional properties such as particle size, volume or surface area. The leading particle length in experiments varied in the range 0.4–130 µm, suggesting that experimental conditions can be used to tailor the particle size to diverse applications. We focused on the ratio of particle surface area to its volume: surface-based chemical processes underpin many technological applications, so maximizing surface area per volume (A/V) complements particle morphology control as an important design objective.
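
As an illustration of how A/V follows from the measured dimensions, the sketch below assumes each particle is a rectangular cuboid with edges d₁, d₂, and d₃. This geometry is an assumption made here for simplicity; the article does not specify the shape model it used, and the example dimensions are invented.

```python
def surface_to_volume(d1, d2, d3):
    """Surface-area-to-volume ratio of a cuboid with edge lengths d1, d2, d3.

    Returns A/V in inverse length units (e.g. 1/um for dimensions in um).
    """
    area = 2.0 * (d1 * d2 + d1 * d3 + d2 * d3)
    volume = d1 * d2 * d3
    return area / volume

# Invented example shapes, dimensions in micrometres
av_cube = surface_to_volume(1.0, 1.0, 1.0)        # 6.0
av_needle = surface_to_volume(100.0, 1.0, 1.0)    # about 4.02
av_platelet = surface_to_volume(10.0, 10.0, 0.1)  # about 20.4
```

For a cuboid the ratio reduces to 2(1/d₁ + 1/d₂ + 1/d₃), so the smallest dimension dominates and thin platelets give the largest values.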

The GPR surrogate model for OTA particle yield is presented in Figure 7a. The irregular features in this landscape suggest that particle yield is strongly correlated with the base employed in the solution, rather than the pK_a value. For example, applying LiOH (pK_a 13.8) to OTA leads to relatively high yields, about 60%, but NaOH (pK_a 14.8) causes the yield to drop below 10%. This observation suggests that particle yield may be better correlated with a different property of the base, such as its size. Solution pH does play a role in the particle yield, with the largest yields observed in the pH range of 8–11.

The OTA particle A/V landscape, illustrated in Figure 7b, presents a central region where the A/V ratio is very high. These mid-range pH and pK_a conditions are associated with 2D particles, where experimental data are scarce. OTA particles synthesized in these conditions tend to produce 2D-like lamellar forms that agglomerate into 3D structures (see Figure 6 for SEM images). It was difficult to measure the shape of these particles, so they were not included in the surrogate model. Nevertheless, such samples clearly had the highest A/V ratio, and this was correctly predicted by the A/V surrogate model despite the paucity of data.

Extracting several surrogate models from the same experimental data (at no additional cost) allows us to cross-reference different properties and infer the conditions that would satisfy several design objectives at once. For example, a high yield of 3D particles can be obtained with NH₄OH at a low pH of 7. The highest yield of 1D OTA particles can be achieved with KOH at pH = 10–11, which also produces the largest particles with the most surface area exposed. 1D particles with high A/V ratio could be produced at very high pK_a, but at relatively low yields. In further work, different label variables can be arithmetically combined into composite labels and landscapes.

Discussion and outlook

The purpose of this work was to evaluate the predictive power of GPR on a small experimental data set; therefore, we deliberately constrained the dimensionality of the problem, which also produced interpretable surrogate models. OTA particle morphology is certainly affected by other experimental parameters. Nevertheless, the good predictive power of surrogate models in the relatively simple 2D design space demonstrated that pH and pK_a alone are sufficient to control particle morphology, in agreement with the earlier PCA result. Unfortunately, PCA was unable to provide insights into the morphology variation that could be achieved with surrogate models.

The morphology landscape portrays a very clear trend, but we are unable to interpret it using scientific intuition. The bottom-up OTA particle synthesis is a result of complex self-assembly where OTA particles coordinate into secondary supramolecular structures, which form tertiary nanofilaments and these assemble into quaternary mesoscopic crystals. It is very difficult to develop any inkling about the outcome of such an intricate procedure, or about how processing conditions might affect it. Instead, the data-driven landscape can guide further research into the chemical processes behind such outcomes and advance fundamental understanding.

Surrogate models are of general value in materials design because they span all design space, are chemically intuitive and interpretable. It is difficult to establish the criteria for quantitative accuracy of surrogate models. Our work shows that qualitative accuracy already translates to good predictive power, marked by the good visual agreement between the morphology landscapes and the SEM images. OTA samples with ill-defined morphology (not included in the GPR) were particularly important in validating the model predictions. The correspondence of these mixed morphology samples with the appropriate regions on the map demonstrates that good quality ML predictions can be achieved in areas where no experiments were previously performed or included in the model.

The sensitivity of OTA particles to their processing conditions made them an ideal test case for this study, but they remain a challenging material to work with. The composition as well as the molecular structure of tannins are dependent on the source they were extracted from.^43,44 In other words, the plant species and their physiological state dictate the polydispersity and molecular weight, giving rise to inevitable heterogeneity, which complicates the processing and characterization of the materials. The relatively high experimental uncertainties translated into data noise that amounted to 10% of the entire GPR model corrugation. Such noise did not impair the predictive power of the models in this study, but in other work experimental errors could lead to distorted models and less optimal fits.

The convergence of GP models is an important concern in experimental work where data set sizes are small. Typically, an iterative convergence procedure is followed. Here, the addition of a further seven data points intended to verify model predictions had a small effect on the qualitative features of the model, so we stopped short of additional experiments. We note that good quality fits can be obtained with small data sets in the case of simple landscapes (few extrema) and very limited problem dimensionality (2D), thus avoiding the curse of dimensionality.⁴⁵ The need for additional data can also be evaluated from the values of the GPR posterior variance, which tends to decrease with more data included in the model. We considered the OTA morphology model variance after 10 and 17 experimental points (see Figure S3). In this work, the relatively large experimental uncertainties translated into large values of GP variance, which remained unchanged with the addition of more data. This finding indicates that in GPR applications to experimental data, where large noise maintains high variance, GP posterior variance might not be a useful measure of model confidence. However, the variance could be used to guide additional experiments.

In further work, our GPR-based approach could be extended to active learning material design workflows. In BO,^32,33 GPR variance is exploited by acquisition functions to select the sampling location that would most enhance the data set. Acquisition functions balance data exploration (searching less-visited areas of phase space) with data exploitation (searching near optimum points in phase space) to attain search objectives with relatively few data points. Search objectives can be learning the entire landscape or minimizing and maximizing materials properties across the search space.
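
The exploration and exploitation trade-off can be illustrated with a generic lower confidence bound (LCB) acquisition function for a minimization objective. This is a common textbook choice and is not claimed to be the acquisition function used in BOSS; the candidate names, posterior values, and the `kappa` parameter are hypothetical.

```python
import math

def lower_confidence_bound(mean, variance, kappa=2.0):
    """LCB for minimization: low values are promising (low mean) or uncertain (high variance)."""
    return mean - kappa * math.sqrt(max(variance, 0.0))

def next_experiment(candidates, posterior, kappa=2.0):
    """Pick the candidate with the smallest LCB.

    candidates : iterable of design points
    posterior  : callable returning (mean, variance) of the surrogate at a point
    """
    return min(candidates, key=lambda x: lower_confidence_bound(*posterior(x), kappa=kappa))

# Invented posterior values at three candidate conditions
toy_posterior = {
    "A": (0.2, 0.0),     # low mean, well explored
    "B": (0.5, 0.04),    # higher mean, very uncertain
    "C": (0.3, 0.0025),  # intermediate
}
explore_pick = next_experiment(toy_posterior, toy_posterior.get, kappa=2.0)
exploit_pick = next_experiment(toy_posterior, toy_posterior.get, kappa=0.0)
```

With `kappa=2.0` the uncertain point B is selected (exploration), whereas `kappa=0.0` reduces the rule to choosing the lowest predicted mean, point A (exploitation).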

By demonstrating that GPR performs well with experimental data related to OTA morphology design, this study opens the route toward BO with experimental data in engineering colloids. Integrating BO into experimental work is challenging,^46,47,48 but there are many benefits.^49,50 With acquisition functions guiding the selection of experiments, good predictive power of machine learning could be achieved with fewer experimental data points, facilitating the study of complex N-dimensional design spaces with more design variables. Moreover, BO allows us to drive experimental data collection towards materials with preferred functional properties (morphological, mechanical or chemical) within the search space. The ML-guided search can thus replace the trial-and-error experimental approach in materials design.

Conclusions

Supramolecular OTA constructs present a prospect of novel applications for this versatile and bioactive material. Controlling particle morphology will help us purpose the OTA particulates toward certain functions and application areas. This study combined materials engineering with GPR supervised machine learning to correlate the processing conditions of OTA colloidal solution with the morphology of the resulting dry OTA particles. The Bayesian surrogate model landscapes revealed the variation of particle morphology in the design space, illustrating the fabrication conditions needed to achieve different particle shapes. The main finding from the OTA morphology landscape is that severe processing conditions (high pH and pK_a) give rise to extended 1D particles with high surface area per volume ratios. Reducing the severity of the solution produces smaller, compact 3D shapes.

Despite the relatively small data set size and large experimental uncertainty, the data-driven morphology landscape was in good agreement with OTA sample images. It exhibited considerable predictive power on samples that were not originally included in the model, marking the potential for predictive materials design. From the same set of experiments, we built surrogate models for OTA particle shape, yield, and surface-to-volume ratio, and cross-referenced them to demonstrate how multiple design objectives could be satisfied at once.
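Cross-referencing several property landscapes can be pictured as intersecting constraint regions on a shared grid of processing conditions. The sketch below illustrates this idea only. The two landscapes are hypothetical analytic functions rather than the GPR models or data of this study, the design variables are normalized stand-ins for the two processing conditions, and the thresholds and the name `feasible_region` are arbitrary assumptions.

```python
import numpy as np

# Normalized design variables on a shared grid (stand-ins for the two processing conditions).
x1, x2 = np.meshgrid(
    np.linspace(0.0, 1.0, 51), np.linspace(0.0, 1.0, 51), indexing="ij"
)

# Hypothetical surrogate-model means; in practice these would be GPR predictions.
landscapes = {
    "aspect_ratio": 1.0 + 4.0 * x1 * x2,
    "yield": 1.0 - 0.6 * (x1 - 0.5) ** 2 - 0.4 * x2,
}

# Design objectives expressed as lower bounds (arbitrary illustrative thresholds).
targets = {"aspect_ratio": 2.0, "yield": 0.6}

def feasible_region(landscapes, targets):
    """Boolean mask of grid points where every landscape meets its lower bound."""
    mask = np.ones_like(next(iter(landscapes.values())), dtype=bool)
    for name, lower in targets.items():
        mask &= landscapes[name] >= lower
    return mask

mask = feasible_region(landscapes, targets)

# Within the feasible region, select the conditions with the highest aspect ratio.
score = np.where(mask, landscapes["aspect_ratio"], -np.inf)
i, j = np.unravel_index(np.argmax(score), score.shape)
print("feasible fraction of design space:", mask.mean())
print("selected conditions (x1, x2):", x1[i, j], x2[i, j])
```

A fuller treatment would also propagate the GPR uncertainty of each landscape into the mask, for example by requiring the lower confidence bound, rather than the mean, to exceed each threshold.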

Mapping processing conditions directly to experimental properties of materials constitutes a practical approach to ML-led materials engineering, free of human bias. Such procedures could not only supplant experimental trial-and-error approaches, but also guide further research into the mechanisms of crystallization and self-assembly in complex materials, opening innovative engineering routes toward new phases of matter.

Data availability

All raw and processed data employed in this study are presented in the Supplementary Material document.

References

Z. Hu, H.S. Marway, H. Kasem, R. Pelton, E.D. Cranston, Dried and redispersible cellulose nanocrystal pickering emulsions. ACS Macro Lett. 5, 185 (2016). https://doi.org/10.1021/acsmacrolett.5b00919
A.E. Hagerman, K.M. Riedl, G.A. Jones, K.N. Sovik, N.T. Ritchard, P.W. Hartzfeld, T.L. Riechel, High molecular weight plant polyphenolics (tannins) as biological antioxidants. J. Agric. Food Chem. 46, 1887 (1998). https://doi.org/10.1021/jf970975b
S. Gharehkhani, N. Ghavidel, P. Fatehi, Kraft lignin-tannic acid as a green stabilizer for oil/water emulsion. ACS Sustain. Chem. Eng. 7, 2370 (2019). https://doi.org/10.1021/acssuschemeng.8b05193
T. Kämäräinen, M. Ago, L.G. Greca, B.L. Tardy, M. Müllner, L.S. Johansson, O.J. Rojas, Morphology-controlled synthesis of colloidal polyphenol particles from aqueous solutions of tannic acid. ACS Sustain. Chem. Eng. 7, 16985 (2019). https://doi.org/10.1021/acssuschemeng.9b02378
T.S. Sileika, D.G. Barrett, R. Zhang, K.H.A. Lau, P.B. Messersmith, Colorless multifunctional coatings inspired by polyphenols found in tea, chocolate, and wine. Angew. Chem. Int. Ed. 52, 10766 (2013). https://doi.org/10.1002/anie.201304922
Z. Hu, R.M. Berry, R. Pelton, E.D. Cranston, One-pot water-based hydrophobic surface modification of cellulose nanocrystals using plant polyphenols. ACS Sustain. Chem. Eng. 5, 5018 (2017). https://doi.org/10.1021/acssuschemeng.7b00415
V. Tulyathan, R.B. Boulton, V.L. Singleton, Oxygen uptake by gallic acid as a model for similar reactions in wines. J. Agric. Food Chem. 37, 844 (1989). https://doi.org/10.1021/jf00088a002
A. Dutta, S.K. Dolui, Tannic acid assisted one step synthesis route for stable colloidal dispersion of nickel nanostructures. Appl. Surf. Sci. 257, 6889 (2011). https://doi.org/10.1016/j.apsusc.2011.03.025
J. Scoccia, M.D. Perretti, D.M. Monzón, F.P. Crisóstomo, V.S. Martín, R. Carrillo, Sustainable oxidations with air mediated by gallic acid: Potential applicability in the reutilization of grape pomace. Green Chem. 18, 2647 (2016). https://doi.org/10.1039/c5gc02966j
S.K. Bhangu, R. Singla, E. Colombo, M. Ashokkumar, F. Cavalieri, Sono-transformation of tannic acid into biofunctional ellagic acid micro/nanocrystals with distinct morphologies. Green Chem. 20, 816 (2018). https://doi.org/10.1039/c7gc03163g
K.T. Chung, T.Y. Wong, C.I. Wei, Y.W. Huang, Y. Lin, Tannins and human health: A review. Crit. Rev. Food Sci. Nutr. 38, 421 (1998). https://doi.org/10.1080/10408699891274273
B. Badhani, N. Sharma, R. Kakkar, Gallic acid: A versatile antioxidant with promising therapeutic and industrial applications. RSC Adv. 5, 27540 (2015). https://doi.org/10.1039/c5ra01911g
S.-C. Wang, Y. Chen, Y.-C. Wang, W.-J. Wang, C.-S. Yang, C.-L. Tsai, M.-H. Hou, H.-F. Chen, Y.-C. Shen, M.-C. Hung, Tannic acid suppresses SARS-CoV-2 as a dual inhibitor of the viral main protease and the cellular TMPRSS2 protease. Am. J. Cancer Res. 10, 4538 (2020)
H. Ejima, J.J. Richardson, F. Caruso, Metal-phenolic networks as a versatile platform to engineer nanomaterials and biointerfaces. Nano Today 12, 136 (2017). https://doi.org/10.1016/j.nantod.2016.12.012
V.N. Manoharan, Colloidal matter: Packing, geometry, and entropy. Science (2015). https://doi.org/10.1126/science.1253751
J. Lin, H. Chen, W. Xu, Geometrical percolation threshold of congruent cuboidlike particles in overlapping particle systems. Phys. Rev. E (2018). https://doi.org/10.1103/PhysRevE.98.012134
T. Moberg, K. Sahlin, K. Yao, S. Geng, G. Westman, Q. Zhou, K. Oksman, M. Rigdahl, Rheological properties of nanocellulose suspensions: Effects of fibril/particle dimensions and surface characteristics. Cellulose 24, 2499 (2017). https://doi.org/10.1007/s10570-017-1283-0
A. Albanese, P.S. Tang, W.C.W. Chan, The effect of nanoparticle size, shape, and surface chemistry on biological systems. Annu. Rev. Biomed. Eng. 14, 1 (2012). https://doi.org/10.1146/annurev-bioeng-071811-150124
Y. Xu, M. Cao, Q. Zhang, Recent advances and perspective on heterogeneous catalysis using metals and oxide nanocrystals. Mater. Chem. Front. 5, 151 (2021). https://doi.org/10.1039/d0qm00549e
Q. Zhang, G. Cao, Hierarchically structured photoelectrodes for dye-sensitized solar cells. J. Mater. Chem. 21, 6769 (2011). https://doi.org/10.1039/c0jm04345a
M. Chen, Y. Zhang, L. Xing, Y. Liao, Y. Qiu, S. Yang, W. Li, Morphology-conserved transformations of metal-based precursors to hierarchically porous micro-/nanostructures for electrochemical energy conversion and storage. Adv. Mater. 29, 1607015 (2017). https://doi.org/10.1002/adma.201607015
J.A. Champion, Y.K. Katare, S. Mitragotri, Particle shape: A new design parameter for micro- and nanoscale drug delivery carriers. J. Control. Release 121, 3 (2007). https://doi.org/10.1016/j.jconrel.2007.03.022
M. Seeger, Gaussian processes for machine learning. Int. J. Neural Syst. 14, 69 (2004). https://doi.org/10.1142/S0129065704001899
R. Batra, L. Song, R. Ramprasad, Emerging materials intelligence ecosystems propelled by machine learning. Nat. Rev. Mater. 44, 1 (2020)
X. Wang, N. Rai, B.M. Pereira, A. Eetemadi, I. Tagkopoulos, Accelerated knowledge discovery from omics data by optimal experimental design. Nat. Commun. 11, 611 (2020)
R. Yuan, Y. Tian, D. Xue, D. Xue, Y. Zhou, X. Ding, J. Sun, T. Lookman, Accelerated search for BaTiO₃-based ceramics with large energy storage at low fields using machine learning and experimental design. Adv. Sci. 6, 1901395 (2019)
Z. Ren, S. Tian, T. Heumueller, E. Birgersson, F. Lin, A. Aberle, S. Sun, I.M. Peters, R. Stangl, C.J. Brabec, T. Buonassisi, F. Oviedo, H. Xue, M. Thway, K. Zhang, N. Li, J.D. Perea, M. Layurova, Y. Wang, 2019 IEEE 46th Photovoltaics Specialist Conference (Chicago, June 16–21, 2019), pp. 3054–3058
P.V. Balachandran, B. Kowalski, A. Sehirlioglu, T. Lookman, Experimental search for high-temperature ferroelectric perovskites guided by two-step machine learning. Nat. Commun. 9, 1 (2018)
F. Häse, L.M. Roch, C. Kreisbeck, A. Aspuru-Guzik, PHOENICS: A universal deep Bayesian optimizer. ACS Cent. Sci. 4, 1134 (2018)
L. Himanen, A. Geurts, A.S. Foster, P. Rinke, Data-driven materials science: Status, challenges, and perspectives. Adv. Sci. 6, 1900808 (2019). https://doi.org/10.1002/advs.201900808
P.I. Frazier, J. Wang, “Bayesian Optimization for Materials Design,” in Information Science for Materials Discovery and Design, Springer Series in Materials Science, vol. 225, T. Lookman, F.J. Alexander, K. Rajan, Eds. (Springer, Cham, Switzerland, 2015), pp. 45–75. https://doi.org/10.1007/978-3-319-23871-5_3
J. Snoek, H. Larochelle, R.P. Adams, Practical Bayesian optimization of machine learning algorithms. Adv. Neural Inf. Process. Syst. 4, 2951 (2012)
E. Brochu, V.M. Cora, N. de Freitas, A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning. Preprint, arXiv:1012.2599. [Cs.LG]. (2010). http://arxiv.org/abs/1012.2599
K. Pearson, LIII. On lines and planes of closest fit to systems of points in space. Philos. Mag. J. Sci. 2, 559 (1901). https://doi.org/10.1080/14786440109462720
H. Hotelling, Relations between two sets of variates. Biometrika 28, 321 (1936)
C.A. Schneider, W.S. Rasband, K.W. Eliceiri, NIH Image to ImageJ: 25 years of image analysis. Nat. Methods. 9, 671 (2012). https://doi.org/10.1038/nmeth.2089
Bayesian Optimization Structure Search (BOSS) code (2020). https://cest-group.gitlab.io/boss/index.html. Accessed 21 Jan 2021
GPy, SheffieldML (n.d.). http://sheffieldml.github.io/GPy/. Accessed 21 Jan 2021
L. Fang, E. Makkonen, M. Todorović, P. Rinke, X. Chen, Efficient amino acid conformer search with Bayesian optimization. J. Chem. Theory Comput. (2021). https://doi.org/10.1021/acs.jctc.0c00648
M. Todorović, M.U. Gutmann, J. Corander, P. Rinke, Bayesian inference of atomistic structure in functional materials. npj Comput. Mater. 5, 35 (2019)
J. Järvi, P. Rinke, M. Todorović, Detecting stable adsorbates of (1S)-camphor on Cu(111) with Bayesian optimization. Beilstein J. Nanotechnol. 11, 1577 (2020)
A.T. Egger, L. Hörmann, A. Jeindl, M. Scherbela, V. Obersteiner, M. Todorović, P. Rinke, O.T. Hofmann, Charge transfer into organic thin films: A deeper insight through machine-learning-assisted structure search. Adv. Sci. 7, 2000992 (2020)
L. Mouls, J.P. Mazauric, N. Sommerer, H. Fulcrand, G. Mazerolles, Comprehensive study of condensed tannins by ESI mass spectrometry: Average degree of polymerisation and polymer distribution determination from mass spectra. Anal. Bioanal. Chem. 400, 613 (2011). https://doi.org/10.1007/s00216-011-4751-7
L. Mouls, V. Hugouvieux, J.P. Mazauric, N. Sommerer, G. Mazerolles, H. Fulcrand, How to gain insight into the polydispersity of tannins: A combined MS and LC study. Food Chem. 165, 348 (2014). https://doi.org/10.1016/j.foodchem.2014.05.121
G. James, D. Witten, T. Hastie, R. Tibshirani, An Introduction to Statistical Learning, 1st ed. (Springer, New York, 2013). https://doi.org/10.1007/978-1-4614-7138-7
A.E. Gongora, B. Xu, W. Perry, C. Okoye, P. Riley, K.G. Reyes, E.F. Morgan, K.A. Brown, A Bayesian experimental autonomous researcher for mechanical design. Sci. Adv. 6(15), eaaz1708 (2020)
L.M. Roch, F. Häse, C. Kreisbeck, T. Tamayo-Mendoza, L.P.E. Yunker, J.E. Hein, A. Aspuru-Guzik, ChemOS: Orchestrating autonomous experimentation. Sci. Robot. (2018). https://doi.org/10.1126/scirobotics.aat5559
R. Kurchin, G. Romano, T. Buonassisi, Bayesim: A tool for adaptive grid model fitting with Bayesian inference. Comput. Phys. Commun. 239, 161 (2019)
M.M. Flores-Leonar, L.M. Mejía-Mendoza, A. Aguilar-Granda, B. Sanchez-Lengeling, H. Tribukait, C. Amador-Bedolla, A. Aspuru-Guzik, Materials acceleration platforms: On the way to autonomous experimentation. Curr. Opin. Green Sustain. Chem. 25, 100370 (2020). https://doi.org/10.1016/j.cogsc.2020.100370
R. Shimizu, S. Kobayashi, Y. Watanabe, Y. Ando, T. Hitosugi, Autonomous materials synthesis by machine learning and robotics. APL Mater. 8, 111110 (2020)

Acknowledgments

This project received partial funding from the Academy of Finland via the Artificial Intelligence for Microscopic Structure Search (AIMSS) Project No. 316601 and the European Research Council (ERC) under the European Union’s Horizon 2020 research and innovation programme (Grant Agreement No. 788489). We are grateful for the support by the Academic Flagship programme under the FinnCERES Materials Bioeconomy and the FCAI Center for Artificial Intelligence as well as the Canada Excellence Research Chair initiative (OJR). The facilities and technical support provided by Aalto University at OtaNano – Nanomicroscopy Center (Aalto-NMC) are also acknowledged.

Funding

Open access funding provided by Aalto University. This project received partial funding from the Academy of Finland via the Artificial Intelligence for Microscopic Structure Search (AIMSS) Project No. 316601 and the European Union’s Horizon 2020 program under the ERC Advanced Grant Agreement No. 788489, “BioElCell.”

Author information

Authors and Affiliations

Department of Chemical & Biomolecular Engineering, North Carolina State University, Raleigh, NC, 27695, USA
Soo-Ah Jin
Department of Bioproducts and Biosystems, Aalto University, Vuorimiehentie 1, P.O. Box 16300, 00076, Espoo, Aalto, Finland
Tero Kämäräinen & Orlando J. Rojas
Department of Applied Physics, Aalto University, P.O. Box 11100, 00076, Aalto, Finland
Patrick Rinke & Milica Todorović
Bioproducts Institute, Departments of Chemical & Biological Engineering, Chemistry, and Wood Science, 2360 East Mall, The University of British Columbia, Vancouver, BC, V6T 1Z3, Canada
Orlando J. Rojas
Department of Mechanical and Materials Engineering, University of Turku, 20014, Turku, Finland
Milica Todorović

Contributions

S.J. and T.K. performed all experimental work. M.T. performed all computational work and wrote the manuscript. M.T., O.R., and P.R. conceived the study. All authors participated in refining the manuscript.

Corresponding authors

Correspondence to Orlando J. Rojas or Milica Todorović.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 1312 kb)

Rights and permissions

Open access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jin, SA., Kämäräinen, T., Rinke, P. et al. Machine learning as a tool to engineer microstructures: Morphological prediction of tannin-based colloids using Bayesian surrogate models. MRS Bulletin 47, 29–37 (2022). https://doi.org/10.1557/s43577-021-00183-4

Accepted: 14 August 2021
Published: 28 February 2022
Issue Date: January 2022
DOI: https://doi.org/10.1557/s43577-021-00183-4
