Regression with small data sets: a case study using code surrogates in additive manufacturing

Kamath, Chandrika; Fan, Ya Ju

doi:10.1007/s10115-018-1174-1

Regular Paper
Published: 01 March 2018

Volume 57, pages 475–493, (2018)
Knowledge and Information Systems

1594 Accesses
24 Citations

Abstract

There has been an increasing interest in recent years in the mining of massive data sets whose sizes are measured in terabytes. However, there are some problems where collecting even a single data point is very expensive, resulting in data sets with only tens or hundreds of samples. One such problem is that of building code surrogates, where a computer simulation is run using many different values of the input parameters and a regression model is built to relate the outputs of the simulation to the inputs. A good surrogate can be very useful in sensitivity analysis, uncertainty analysis, and in designing experiments, but the cost of running expensive simulations at many sample points can be high. In this paper, we use a problem from the domain of additive manufacturing to show that even with small data sets we can build good quality surrogates by appropriately selecting the input samples and the regression algorithm. Our work is broadly applicable to simulations in other domains and the ideas proposed can be used in time-constrained machine learning tasks, such as hyper-parameter optimization.
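The surrogate-building workflow described in the abstract (choose input samples, run the simulation at each, fit a regression model) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `toy_simulation` is a hypothetical cheap stand-in for an expensive code, the sample points come from a basic Latin hypercube design, and the regressor is a simple Gaussian-weighted (Nadaraya-Watson) kernel average with an arbitrarily chosen bandwidth. It is not the authors' code and does not reproduce their data or results.

```python
import math
import random


def latin_hypercube(n_samples, n_dims, rng):
    """Return n_samples points in [0, 1)^n_dims, one per stratum in each dimension."""
    columns = []
    for _ in range(n_dims):
        # One random point inside each of n_samples equal-width strata, then shuffle
        # so that strata are paired at random across dimensions.
        col = [(i + rng.random()) / n_samples for i in range(n_samples)]
        rng.shuffle(col)
        columns.append(col)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


def toy_simulation(x):
    """Hypothetical stand-in for an expensive simulation (assumption, not a real model)."""
    return math.sin(2.0 * x[0]) + 0.5 * x[1] ** 2


def kernel_regression(train_x, train_y, query, bandwidth=0.15):
    """Nadaraya-Watson estimate: Gaussian-weighted average of the training outputs."""
    weights = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        weights.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, train_y)) / total


rng = random.Random(0)

# "Run the simulation" at a small number of carefully spread input points.
train_x = latin_hypercube(50, 2, rng)
train_y = [toy_simulation(x) for x in train_x]

# Check the surrogate on fresh random inputs it has not seen.
test_x = [(rng.random(), rng.random()) for _ in range(200)]
errors = [abs(kernel_regression(train_x, train_y, q) - toy_simulation(q)) for q in test_x]
mean_abs_error = sum(errors) / len(errors)
print(f"mean absolute error on 200 test points: {mean_abs_error:.3f}")
```

With a real simulation, each call to `toy_simulation` would be the expensive step, which is why the choice of the 50 sample points and of the regression method matters.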

References

[1] ACME (2016) Accelerated climate modeling for energy web page. https://climatemodeling.science.energy.gov/projects/accelerated-climate-modeling-energy
[2] Atkeson C, Schaal SA, Moore AW (1997) Locally weighted learning. AI Rev 11:75–133
[3] Austin PC, Steyerberg EW (2015) The number of subjects per variable required in linear regression analyses. J Clin Epidemiol 68:627–636
[4] Babyak MA (2004) What you see may not be what you get: a brief, non-technical introduction to overfitting in regression-type models. Psychosom Med 66:411–421
[5] Bennett KP, Mangasarian OL (1992) Robust linear programming discrimination of two linearly inseparable sets. Optim Methods Softw 1(1):23–34
[6] Bergstra J, Bengio Y (2012) Random search for hyper-parameter optimization. J Mach Learn Res 13:281–305
[7] Beuth J et al (2013) Process mapping for qualification across multiple direct metal additive manufacturing processes. In: Bourell D (ed) International solid freeform fabrication symposium, an additive manufacturing conference. University of Texas at Austin, Austin, Texas, pp 655–665
[8] Breiman L, Friedman JH, Olshen RA, Stone CJ (1984) Classification and regression trees. CRC Press, Boca Raton
[9] Burl MC et al (2006) Automated knowledge discovery from simulators. In: Proceedings, Sixth SIAM international conference on data mining, pp 82–93
[10] Carreira-Perpiñán MA (1996) A review of dimension reduction techniques. Technical Report CS-96-09, Department of Computer Science, University of Sheffield, UK
[11] Chang C-C, Lin C-J (2011) LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2:27:1–27:27. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm
[12] Chapelle O, Vapnik V, Bengio Y (2002) Model selection for small sample regression. Mach Learn 48(1):9–23
[13] Committee on Mathematical Foundations of Verification, Validation, and Uncertainty Quantification; Board on Mathematical Sciences and Their Applications, Division on Engineering and Physical Sciences, National Research Council (2012) Assessing the reliability of complex models: mathematical and statistical foundations of verification, validation, and uncertainty quantification. The National Academies Press, Washington
[14] Eagar T, Tsai N (1983) Temperature-fields produced by traveling distributed heat-sources. Weld J 62:S346–S355
[15] Fang K-T, Li R, Sudjianto A (2005) Design and modeling for computer experiments. Chapman and Hall/CRC Press, Boca Raton
[16] Friedman JH (1991) Multivariate adaptive regression splines. Ann Stat 19(1):1–67
[17] GPy (2012) GPy: A Gaussian process framework in python. http://github.com/SheffieldML/GPy
[18] Guo Y, Graber A, McBurney RN, Balasubramanian R (2010) Sample size and statistical power considerations in high-dimensionality data settings: a comparative study of classification algorithms. BMC Bioinf 11:447
[19] Isaksson A, Wallman M, Goransson H, Gustafsson M (2008) Cross-validation and bootstrapping are unreliable in small sample classification. Pattern Recogn Lett 29:1960–1965
[20] Kamath C (2009) Scientific data mining: a practical perspective. Society for Industrial and Applied Mathematics (SIAM), Philadelphia
[21] Kamath C (2016) Data mining and statistical inference in selective laser melting. Int J Adv Manuf Technol 86:1659–1677
[22] Kamath C, Cantú-Paz E (2001) Creating ensembles of decision trees through sampling. In: Proceedings of the 33rd symposium on the interface: computing science and statistics
[23] Kamath C, El-dasher B, Gallegos GF, King WE, Sisto A (2014) Density of additively-manufactured, 316L SS parts using laser powder-bed fusion at powers up to 400 W. Int J Adv Manuf Technol 74:65–78
[24] Kleijnen JPC (2008) Design and analysis of simulation experiments. Springer, New York
[25] Mitchell DP (1991) Spectrally optimal sampling for distribution ray tracing. Comput Graph 25(4):157–164
[26] Oehlert GW (2000) A first course in design and analysis of experiments. W. H. Freeman. http://users.stat.umn.edu/~gary/Book.html
[27] Owen AB (2003) Quasi-Monte Carlo sampling. Course notes from Siggraph course. http://www-stat.stanford.edu/~owen/reports/
[28] Owen AB (1998) Latin supercube sampling for very high-dimensional simulations. ACM Trans Model Comput Simul 8(1):71–102
[29] Qian Y et al (2016) Uncertainty quantification in climate modeling and projection. Bull Am Meteorol Soc 97(5):821–824
[30] Rasmussen CE, Williams CKI (2006) Gaussian processes for machine learning. MIT Press, Cambridge
[31] Rokach L (2010) Pattern classification using ensemble methods. World Scientific Publishing, Singapore
[32] Rokach L, Maimon O (2014) Data mining with decision trees: theory and applications. World Scientific Publishing, Singapore
[33] Rudy J (2013) Py-earth. https://contrib.scikit-learn.org/py-earth/
[34] Schölkopf B, Smola AJ, Williamson RC, Bartlett PL (2000) New support vector algorithms. Neural Comput 12(5):1207–1245
[35] Shiflet AB, Shiflet GW (2006) Introduction to computational science: modeling and simulation for the sciences. Princeton University Press, Princeton
[36] Vapnik VN (1995) The nature of statistical learning theory. Springer, New York
[37] Verhaeghe F, Craeghs T, Heulens J, Pandalaers L (2009) A pragmatic model for selective laser melting with evaporation. Acta Mater 57:6006–6012
[38] Yadroitsev I, Gusarov A, Yadroitsava I, Smurov I (2010) Single track formation in selective laser melting of metal powders. J Mater Process Technol 210:1624–1631

Acknowledgements

The results in this paper were generated using codes we developed for regression trees and LWKR, as well as public domain codes for MARS [33], SVR [11], and GP [17]. The Eagar–Tsai data were generated using a code developed by David Macknelly. We thank the anonymous reviewers for their feedback. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.

Author information

Authors and Affiliations

Lawrence Livermore National Laboratory, 7000 East Avenue, Livermore, CA, 94551, USA
Chandrika Kamath & Ya Ju Fan

Corresponding author

Correspondence to Chandrika Kamath.

About this article

Cite this article

Kamath, C., Fan, Y.J. Regression with small data sets: a case study using code surrogates in additive manufacturing. Knowl Inf Syst 57, 475–493 (2018). https://doi.org/10.1007/s10115-018-1174-1

Received: 31 August 2017
Revised: 12 January 2018
Accepted: 26 January 2018
Published: 01 March 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s10115-018-1174-1
