Abstract
Data-driven approaches now allow for systematic mapping of microstructure to properties. In particular, we now have diverse approaches to “featurize” microstructures, creating a large pool of machine-readable descriptors for subsequent structure-property analysis. We explore three questions in this work: (a) Can a small subset of features be selected to train a good structure-property predictive model? (b) Is this subset agnostic to the choice of feature selection algorithm? And (c) can the addition of expert-identified features improve model performance? Using a canonical dataset, we answer in the affirmative for all three questions.
Data availability
The source code for the analysis is available on GitHub: https://github.com/owodolab/FeatureEngineeringOPV.
Notes
We use the words “features” and “descriptors” interchangeably.
P3HT:PCBM is a blend of poly(3-hexylthiophene) (P3HT) and 1-(3-methoxycarbonyl)-propyl-1-phenyl-[6,6]C61 (PCBM).
Expert-enriched features are scaled through standardization (or Z-score normalization) before the feature engineering step to eliminate bias toward the subset with highest variance.
Completeness is difficult to confirm even for hypothesis-driven selection approaches.
Results in Supplementary Information additionally report the normalized mean absolute errors (NMAEs) for fivefold validation and prediction.
Here, we use the monomial function of order two. See Supplementary Information for more results.
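Two of the notes above describe simple transformations: Z-score standardization of the descriptors and augmentation of the salient features with monomials of order two. The following is a minimal NumPy sketch of both steps, not the authors' implementation. The function names `standardize` and `monomial_augment` are hypothetical, and the choices to include cross terms and to omit the constant term are assumptions made for illustration.

```python
from itertools import combinations_with_replacement

import numpy as np


def standardize(d):
    """Z-score each column: subtract its mean and divide by its standard deviation."""
    mean = d.mean(axis=0)
    std = d.std(axis=0)
    # Assumption: constant columns are left centered rather than divided by zero.
    std = np.where(std == 0, 1.0, std)
    return (d - mean) / std


def monomial_augment(d_hat, q=2):
    """Append all monomials of total degree 2..q built from the columns of d_hat.

    The original features (degree 1) are kept; no constant term is added.
    """
    n_features = d_hat.shape[1]
    columns = [d_hat[:, j] for j in range(n_features)]
    for degree in range(2, q + 1):
        for idx in combinations_with_replacement(range(n_features), degree):
            # Product of the selected columns, e.g. d_1 * d_1 or d_1 * d_2.
            columns.append(np.prod(d_hat[:, list(idx)], axis=1))
    return np.column_stack(columns)


if __name__ == "__main__":
    # Toy data: 3 microstructures, 2 salient features (values are illustrative only).
    d_hat = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
    z = standardize(d_hat)
    d_tilde = monomial_augment(z, q=2)
    print(d_tilde.shape)
```

Under these assumptions, a vector of \(S\) salient features with \(q = 2\) grows to \(\widetilde{S} = S + S(S+1)/2\) entries, so two features become five.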
Abbreviations
- \(\mathcal {X}\): Raw dataset with microstructures
- \(\widehat{d_i}\): Salient feature
- \(\widehat{d}\): Vector of salient features
- \(\widehat{d}^E\): Vector of salient features derived by the expert
- \(\widehat{d}^{\prime \text {mRMR}}\): Vector of salient features derived using the mRMR method from \(d^{\prime }\)
- \(\widehat{d}^{\text {FS}}\): Vector of salient features derived via forward selection from \(d\)
- \(\widehat{d}^{\text {mRMR}}\): Vector of salient features derived using the mRMR method from \(d\)
- \(\widehat{d}^{\text {RF}}\): Vector of salient features derived via random forest method from \(d\)
- \(\widehat{f}\): Salient features derived via PCA from \(f\)
- \(\widehat{f}^{\prime }\): Salient features derived via PCA from \(f^{\prime }\)
- \(\widehat{x}\): Salient features derived via PCA from \(X\)
- \(\widetilde{d}\): Vector of monomial-augmented salient features
- \(\widetilde{d}^{\prime \text {mRMR}}\): Vector of monomial-augmented \(\widehat{d}^{\prime \text {mRMR}}\)
- \(\widetilde{d}^{\text {mRMR}}\): Vector of monomial-augmented \(\widehat{d}^{\text {mRMR}}\)
- \(\widetilde{f}\): Monomial-augmented \(\widehat{f}\)
- \(\widetilde{f}^{\prime }\): Monomial-augmented \(\widehat{f}^{\prime }\)
- \(\widetilde{M_4^{\prime }}\): SP model mapping \(\widetilde{f}^{\prime }\) to \(J_{\text{sc}}\)
- \(\widetilde{M_4}\): SP model mapping \(\widetilde{f}\) to \(J_{\text{sc}}\)
- \(\widetilde{M}_1\): SP model mapping \(\widetilde{d}^{\prime \text {mRMR}}\) to \(J_{\text{sc}}\)
- \(A\): Influence coefficients capturing the SP map
- \(d\): Vector of descriptors
- \(d^{\prime }\): Expert-enriched vector of descriptors
- \(d_i\): Descriptor/feature
- \(F_i\): Autocorrelation array of microstructure \(X_i\)
- \(F\): Function mapping the salient features to property
- \(F_i^{\prime }\): Autocorrelation array of microstructure \(X_i\) with state enriched by expert knowledge
- \(J_{\text{sc}}\): The short circuit current (\(\text{A/m}^2\))
- \(m(s)\): A state of the microstructure at location \(s\) in \(X\)
- \(M_1\): SP model mapping \(\widehat{d}^{\text {mRMR}}\) to \(J_{\text{sc}}\)
- \(M_1^{\prime }\): SP model mapping \(\widehat{d}^{\prime \text {mRMR}}\) to \(J_{\text{sc}}\)
- \(M_2\): SP model mapping \(\widehat{d}^{\text {FS}}\) to \(J_{\text{sc}}\)
- \(M_2^{\prime }\): SP model mapping \(\widehat{d}^{\prime \text {FS}}\) to \(J_{\text{sc}}\)
- \(M_3\): SP model mapping \(\widehat{d}^{\text {RF}}\) to \(J_{\text{sc}}\)
- \(M_3^{\prime }\): SP model mapping \(\widehat{d}^{\prime \text {RF}}\) to \(J_{\text{sc}}\)
- \(M_4\): SP model mapping the salient features derived using low dimensional embedding of \(f\)
- \(M_4^{\prime }\): SP model mapping the salient features derived using low dimensional embedding of \(f^{\prime }\)
- \(M_5\): SP model mapping \(\widehat{x}\) to \(J_{\text{sc}}\)
- \(M_E\): SP model derived by the expert
- \(N\): Total number of microstructures in \(\mathcal {X}\)
- \(P\): Material property of interest, here \(J_{\text{sc}}\)
- \({\text{PC}}_{i}\): Principal component
- \(q\): Order of monomial functions
- \(\text{RL0}\): Representation layer zero: raw data
- \(\text{RL1}\): Representation layer one: input features
- \(\text{RL2}\): Representation layer two: salient features
- \(X\): Microstructure data point in \(\mathcal {X}\)
- \(\mathcal{X}^f\): Dataset featurized using machine-derived approach
- \(\mathcal{X}^{f\prime }\): Dataset featurized using machine-derived approach and enriched with expert knowledge
- \(S\): Size of salient features vector
- \(\widetilde{S}\): Size of salient extended features vector
Acknowledgments
This work was supported by the National Science Foundation (1906344 and 1910539). BG acknowledges support from the ONR MURI (ONR N00014-19-12453). OW and HL acknowledge the support provided by the Center for Computational Research at the University at Buffalo. BY and SK acknowledge support from NIST 70NANB18H039 (program manager Dr. James Warren).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Cite this article
Liu, H., Yucel, B., Wheeler, D. et al. How important is microstructural feature selection for data-driven structure-property mapping? MRS Communications 12, 95–103 (2022). https://doi.org/10.1557/s43579-021-00147-4
DOI: https://doi.org/10.1557/s43579-021-00147-4