Abstract
This chapter focuses on the knowledge-driven dimension reduction aspect of mechanistic data science. Two types of dimension reduction methods are introduced: clustering and reduced order modeling. Clustering reduces the total number of data points in a dataset by grouping similar data points into clusters. The data points within a cluster are considered to be more similar to each other than to data points in other clusters. There are multiple methods and algorithms for clustering. In the first part of this chapter, three clustering algorithms are presented, ranging from entry level to advanced level: Jenks natural breaks, K-means clustering, and the self-organizing map (SOM). In the second part of this chapter, Singular Value Decomposition (SVD) and Principal Component Analysis (PCA) are introduced as reduced order modeling techniques that reduce the number of features by eliminating redundant and dependent features, leading to a new set of principal features. The resulting model is called a reduced order model. Proper Generalized Decomposition (PGD), a higher order extension of PCA, is also introduced.
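The two kinds of reduction described in the abstract can be contrasted in a minimal sketch. This is not the chapter's own code: the synthetic dataset, the helper names `kmeans` and `pca_svd`, and all parameter values are assumptions made purely for illustration. K-means replaces many data points with a few cluster centroids (fewer rows), while PCA computed through the SVD replaces dependent features with a smaller set of principal features (fewer columns).

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Basic Lloyd-style K-means (illustrative helper, not from the chapter)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance from every point to every centroid, shape (n_points, k).
        dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each centroid to the mean of its members; keep it if it has none.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels


def pca_svd(data, n_components):
    """PCA through the SVD of the mean-centered data (illustrative helper)."""
    centered = data - data.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]        # principal directions (rows)
    scores = centered @ components.T      # data expressed in principal features
    explained = s**2 / np.sum(s**2)       # fraction of variance per component
    return scores, components, explained[:n_components]


# Synthetic data (assumed): two groups of 50 points with two independent
# features, plus a third feature that is an exact linear combination of them.
rng = np.random.default_rng(42)
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2))
group_b = rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2))
xy = np.vstack([group_a, group_b])
data = np.column_stack([xy, xy[:, 0] + 2.0 * xy[:, 1]])

# Clustering: 100 data points are summarized by 2 centroids.
centroids, labels = kmeans(data, k=2)
print(centroids.shape)  # (2, 3)

# Reduced order modeling: 3 features are replaced by 2 principal features.
scores, components, explained = pca_svd(data, n_components=2)
print(scores.shape)     # (100, 2)
print(explained.sum())  # close to 1, since the third feature is redundant
```

Because the third feature here is fully dependent on the other two, two principal components retain essentially all of the variance. With real data the retained fraction depends on how strongly the features are correlated, and the number of clusters and components would need to be chosen for the problem at hand.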
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Liu, W.K., Gan, Z., Fleming, M. (2021). Knowledge-Driven Dimension Reduction and Reduced Order Surrogate Models. In: Mechanistic Data Science for STEM Education and Applications. Springer, Cham. https://doi.org/10.1007/978-3-030-87832-0_5
DOI: https://doi.org/10.1007/978-3-030-87832-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87831-3
Online ISBN: 978-3-030-87832-0
eBook Packages: Mathematics and Statistics (R0)