
How important is microstructural feature selection for data-driven structure-property mapping?

  • Research Letter

MRS Communications

Abstract

Data-driven approaches now allow for systematic mapping of microstructure to properties. In particular, we now have diverse approaches to “featurize” microstructures, creating a large pool of machine-readable descriptors for subsequent structure-property analysis. We explore three questions in this work: (a) Can a small subset of features be selected to train a good structure-property predictive model? (b) Is this subset agnostic to the choice of feature selection algorithm? (c) Can the addition of expert-identified features improve model performance? Using a canonical dataset, we answer all three questions in the affirmative.
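Question (a) can be made concrete with greedy forward selection, one of the approaches listed under Abbreviations (\(\widehat{d}^{\text {FS}}\), features "derived via forward selection"). The following is a minimal pure-Python sketch, not the authors' code: it assumes an ordinary-least-squares model with an intercept, in-sample \(R^2\) as the selection score, and synthetic data. All function and variable names are hypothetical.

```python
# Greedy forward feature selection: a minimal sketch, NOT the paper's pipeline.
# Assumptions (hypothetical): OLS model with intercept, in-sample R^2 as score,
# synthetic data in which only features 0 and 3 drive the property.
import random


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r2_of_subset(X, y, cols):
    """Fit y ~ 1 + X[:, cols] via the normal equations and return in-sample R^2."""
    rows = [[1.0] + [x[c] for c in cols] for x in X]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, pred))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(X, y, max_features):
    """Repeatedly add the feature that most increases R^2 until max_features are chosen."""
    selected, remaining = [], list(range(len(X[0])))
    while remaining and len(selected) < max_features:
        best = max(remaining, key=lambda c: r2_of_subset(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


# Usage on synthetic data (hypothetical): six candidate features, two informative.
rng = random.Random(0)
X = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(200)]
y = [3.0 * x[0] - 2.0 * x[3] + rng.gauss(0.0, 0.1) for x in X]
print(forward_select(X, y, 2))
```

Each step refits the model once per remaining candidate, so the cost grows with the number of candidates times the subset size. In practice one would score candidates on held-out folds rather than in-sample \(R^2\), which always rewards adding features.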

Graphical abstract


Figure 1
Figure 2
Figure 3

Data availability

The source code for the analysis is available on GitHub: https://github.com/owodolab/FeatureEngineeringOPV.

Notes

  1. We use the words “features” and “descriptors” interchangeably.

  2. P3HT:PCBM denotes a blend of poly(3-hexylthiophene) and 1-(3-methoxycarbonyl)-propyl-1-phenyl-[6,6]C61.

  3. Expert-enriched features are scaled through standardization (Z-score normalization) before the feature engineering step to eliminate bias toward the subset with the highest variance.

  4. Completeness is difficult to confirm even for hypothesis-driven selection approaches.

  5. Results in the Supplementary Information additionally report the normalized mean absolute errors (NMAEs) for fivefold cross-validation and prediction.

  6. Here, we use monomial functions of order two (\(q = 2\)). See the Supplementary Information for more results.
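Notes 3, 5, and 6 name three routine numerical steps: Z-score standardization, fivefold validation scored by NMAE, and order-two monomial augmentation. The sketch below shows one plausible reading of each in pure Python. The exact definitions are assumptions, not taken from the paper: population standard deviation for the Z-score, all monomials of total degree 1 to \(q\) (squares and pairwise products for \(q = 2\)), NMAE taken as the MAE divided by the mean absolute true value, and contiguous, unshuffled folds. All function names are hypothetical.

```python
# Hedged sketch of the preprocessing/evaluation steps in notes 3, 5 and 6.
# Assumed definitions (not from the paper): population std for Z-scores,
# monomials of total degree 1..q, NMAE = MAE / mean(|y_true|), unshuffled folds.
from itertools import combinations_with_replacement
from math import prod
from statistics import mean, pstdev


def standardize(columns):
    """Z-score each feature column so no column dominates merely by its variance."""
    out = []
    for col in columns:
        mu = mean(col)
        sigma = pstdev(col) or 1.0  # guard: a constant column maps to all zeros
        out.append([(v - mu) / sigma for v in col])
    return out


def monomial_augment(features, q=2):
    """Return every monomial of the features with total degree 1..q."""
    idx = range(len(features))
    return [
        prod(features[i] for i in combo)
        for degree in range(1, q + 1)
        for combo in combinations_with_replacement(idx, degree)
    ]


def nmae(y_true, y_pred):
    """Mean absolute error normalized by the mean absolute true value (assumed form)."""
    mae = mean(abs(t - p) for t, p in zip(y_true, y_pred))
    return mae / mean(abs(t) for t in y_true)


def kfold_indices(n, k=5):
    """Split sample indices 0..n-1 into k contiguous folds for k-fold cross-validation."""
    bounds = [round(i * n / k) for i in range(k + 1)]
    return [list(range(bounds[i], bounds[i + 1])) for i in range(k)]
```

For two features \([2, 3]\) and \(q = 2\), `monomial_augment` returns \([2, 3, 4, 6, 9]\): the originals, both squares, and the cross product. Standardizing before augmentation keeps the squared terms on comparable scales.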

Abbreviations

\(\mathcal {X}\) :

Raw dataset with microstructures

\(\widehat{d_i}\) :

Salient feature

\(\widehat{d}\) :

Vector of salient features

\(\widehat{d}^E\) :

Vector of salient features derived by the expert

\(\widehat{d}^{\prime \text {mRMR}}\) :

Vector of salient features derived using the mRMR method from \(d^{\prime }\)

\(\widehat{d}^{\text {FS}}\) :

Vector of salient features derived via forward selection from \(d\)

\(\widehat{d}^{\text {mRMR}}\) :

Vector of salient features derived using the mRMR method from \(d\)

\(\widehat{d}^{\text {RF}}\) :

Vector of salient features derived via the random forest method from \(d\)

\(\widehat{f}\) :

Salient features derived via PCA from \(f\)

\(\widehat{f}^{\prime }\) :

Salient features derived via PCA from \(f^{\prime }\)

\(\widehat{x}\) :

Salient features derived via PCA from \(X\)

\(\widetilde{d}\) :

Vector of monomial-augmented salient features

\(\widetilde{d}^{\prime \text {mRMR}}\) :

Vector of monomial-augmented \(\widehat{d}^{\prime \text {mRMR}}\)

\(\widetilde{d}^{\text {mRMR}}\) :

Vector of monomial-augmented \(\widehat{d}^{\text {mRMR}}\)

\(\widetilde{f}\) :

Monomial-augmented \(\widehat{f}\)

\(\widetilde{f}^{\prime }\) :

Monomial-augmented \(\widehat{f}^{\prime }\)

\(\widetilde{M}_4^{\prime }\) :

SP model mapping \(\widetilde{f}^{\prime }\) to \(J_{\text{sc}}\)

\(\widetilde{M}_4\) :

SP model mapping \(\widetilde{f}\) to \(J_{\text{sc}}\)

\(\widetilde{M}_1\) :

SP model mapping \(\widetilde{d}^{\prime \text {mRMR}}\) to \(J_{\text{sc}}\)

\(A\) :

Influence coefficients capturing the SP map

\(d\) :

Vector of descriptors

\(d^{\prime }\) :

Expert-enriched vector of descriptors

\(d_i\) :

Descriptor/feature

\(F_i\) :

Autocorrelation array of microstructure \(X_i\)

\(F\) :

Function mapping the salient features to property

\(F_i^{\prime }\) :

Autocorrelation array of microstructure \(X_i\), with local states enriched by expert knowledge

\(J_{\text{sc}}\) :

Short-circuit current density (\(\text{A/m}^2\))

\(m(s)\) :

State of the microstructure at location \(s\) in \(X\)

\(M_1\) :

SP model mapping \(\widehat{d}^{\text {mRMR}}\) to \(J_{\text{sc}}\)

\(M_1^{\prime }\) :

SP model mapping \(\widehat{d}^{\prime \text {mRMR}}\) to \(J_{\text{sc}}\)

\(M_2\) :

SP model mapping \(\widehat{d}^{\text {FS}}\) to \(J_{\text{sc}}\)

\(M_2^{\prime }\) :

SP model mapping \(\widehat{d}^{\prime \text {FS}}\) to \(J_{\text{sc}}\)

\(M_3\) :

SP model mapping \(\widehat{d}^{\text {RF}}\) to \(J_{\text{sc}}\)

\(M_3^{\prime }\) :

SP model mapping \(\widehat{d}^{\prime \text {RF}}\) to \(J_{\text{sc}}\)

\(M_4\) :

SP model mapping the salient features derived using a low-dimensional embedding of \(f\) to \(J_{\text{sc}}\)

\(M_4^{\prime }\) :

SP model mapping the salient features derived using a low-dimensional embedding of \(f^{\prime }\) to \(J_{\text{sc}}\)

\(M_5\) :

SP model mapping \(\widehat{x}\) to \(J_{\text{sc}}\)

\(M_E\) :

SP model derived by the expert

\(N\) :

Total number of microstructures in \(\mathcal {X}\)

\(P\) :

Material property of interest, here \(J_{\text{sc}}\)

\({\text{PC}}_{i}\) :

Principal component

\(q\) :

Order of monomial functions

\(\text{RL0}\) :

Representation layer zero: raw data

\(\text{RL1}\) :

Representation layer one: input features

\(\text{RL2}\) :

Representation layer two: salient features

\(X\) :

Microstructure data point in \(\mathcal {X}\)

\(\mathcal{X}^f\) :

Dataset featurized using a machine-derived approach

\(\mathcal{X}^{f\prime }\) :

Dataset featurized using a machine-derived approach and enriched with expert knowledge

\(S\) :

Size of the salient feature vector

\(\widetilde{S}\) :

Size of the extended (monomial-augmented) salient feature vector
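The entries \(F_i\) and \(m(s)\) describe an autocorrelation array computed from the local state of a microstructure. As a hedged illustration only, assume a 2D binary microstructure in which \(m(s) = 1\) marks one phase, periodic boundaries, and the definition \(F(r) = \frac{1}{|S|}\sum_{s} m(s)\, m(s + r)\) over all \(|S|\) pixel locations. A direct, unoptimized implementation (function name hypothetical, not the paper's featurization code) is:

```python
# Periodic two-point autocorrelation of a 2D binary microstructure (sketch).
# Assumptions (hypothetical): m[i][j] in {0, 1} marks one phase; periodic
# boundaries; direct O(|S|^2) double sum chosen for clarity over speed.


def autocorrelation_2d(m):
    """Return F with F[r1][r2] = mean over pixels s of m(s) * m(s + (r1, r2))."""
    ny, nx = len(m), len(m[0])
    n_pixels = nx * ny
    return [
        [
            sum(
                m[i][j] * m[(i + r1) % ny][(j + r2) % nx]
                for i in range(ny)
                for j in range(nx)
            ) / n_pixels
            for r2 in range(nx)
        ]
        for r1 in range(ny)
    ]


# Usage: vertical stripes of one phase, one pixel wide.
stripes = [[1, 0, 1, 0]] * 4
print(autocorrelation_2d(stripes)[0])
```

At \(r = 0\) the value equals the volume fraction of the marked phase, since \(m(s)^2 = m(s)\) for a binary indicator. The double sum scales as \(O(|S|^2)\); FFT-based implementations are the usual choice for realistic image sizes.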


Acknowledgments

This work was supported by the National Science Foundation (1906344 and 1910539). BG acknowledges support from the ONR MURI ONR N00014-19-12453. OW and HL acknowledge the support provided by the Center for Computational Research at the University at Buffalo. BY and SK acknowledge support from NIST 70NANB18H039 (program manager Dr. James Warren).

Author information


Corresponding author

Correspondence to Olga Wodo.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary Information


Supplementary file 1 (PDF, 876 KB)


About this article


Cite this article

Liu, H., Yucel, B., Wheeler, D. et al. How important is microstructural feature selection for data-driven structure-property mapping? MRS Communications 12, 95–103 (2022). https://doi.org/10.1557/s43579-021-00147-4



  • DOI: https://doi.org/10.1557/s43579-021-00147-4
