Abstract
An efficient method for establishing squeeze casting process parameters from past data is essential, but designing these parameters is difficult when the data contain many missing values. Conventional missing-data approaches incur additional computational cost when applied to high-dimensional, multivariable data with missing values, especially correlated material process data. Because the relationship between material composition and process parameters resembles that between users and the information that interests them, this paper proposes a missing-data imputation method based on a clustering-based collaborative filtering (ClubCF) algorithm. Data samples are divided into two groups: those with missing values and those without. K-means clustering based on a canopy algorithm is applied to the samples without missing values to obtain k subclasses, whose values are then used to fill the samples with missing values through user-based collaborative filtering with Pearson similarity. The method was evaluated on squeeze casting process parameter data for aluminum alloys with missing values, and further comparative experiments were carried out to characterize its performance and features. Two indicators, the mean absolute error and the standard deviation, were used to quantify the imputation performance, which was compared with that of three conventional methods (mean interpolation, regression interpolation, and the expectation maximization algorithm). The results indicate that the proposed approach is effective and outperforms the conventional methods in processing high-dimensional correlated data.
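The pipeline summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the single-threshold canopy pass used to seed k-means, the `t2` and `top_n` parameters, the similarity-weighted average used for filling, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def canopy_centers(X, t2):
    """Simplified canopy pass (assumption): take a point as a center,
    discard every point within distance t2 of it, and repeat."""
    remaining = list(range(len(X)))
    centers = []
    while remaining:
        c = X[remaining[0]]
        centers.append(c)
        remaining = [i for i in remaining if np.linalg.norm(X[i] - c) > t2]
    return np.array(centers)

def assign(X, centers):
    """Index of the nearest center for every row of X."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    return d.argmin(axis=1)

def kmeans(X, centers, n_iter=50):
    """Plain Lloyd iterations starting from the canopy centers."""
    for _ in range(n_iter):
        labels = assign(X, centers)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return assign(X, centers), centers

def pearson(a, b):
    """Pearson correlation of two vectors; 0.0 if either is constant."""
    a = a - a.mean()
    b = b - b.mean()
    denom = np.sqrt((a @ a) * (b @ b))
    return float(a @ b / denom) if denom > 0 else 0.0

def impute_row(row, X_complete, labels, centers, top_n=3):
    """Fill NaNs in `row` from its most similar neighbours in the nearest cluster."""
    obs = ~np.isnan(row)
    # Nearest cluster, measured on the observed features only.
    j = np.linalg.norm(centers[:, obs] - row[obs], axis=1).argmin()
    members = X_complete[labels == j]
    if len(members) == 0:          # empty cluster: fall back to all complete rows
        members = X_complete
    sims = np.array([pearson(row[obs], m[obs]) for m in members])
    order = np.argsort(sims)[::-1][:top_n]
    w = np.clip(sims[order], 0.0, None)   # ignore negatively correlated rows
    filled = row.copy()
    if w.sum() > 0:
        filled[~obs] = (w[:, None] * members[order][:, ~obs]).sum(axis=0) / w.sum()
    else:
        filled[~obs] = members[:, ~obs].mean(axis=0)
    return filled

def mae(imputed, true):
    """Mean absolute error between imputed and held-out true values."""
    return float(np.mean(np.abs(np.asarray(imputed) - np.asarray(true))))

# Toy data (hypothetical, not from the paper): two groups of complete samples.
X_complete = np.array([[1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 3.1, 4.1], [0.9, 1.9, 2.9, 3.9],
                       [10.0, 8.0, 6.0, 4.0], [10.2, 8.1, 6.1, 4.2], [9.8, 7.9, 5.9, 3.8]])
true_row = np.array([1.05, 2.05, 3.05, 4.05])
incomplete = true_row.copy()
incomplete[2] = np.nan             # hide one value to imitate a missing entry

labels, centers = kmeans(X_complete, canopy_centers(X_complete, t2=3.0))
filled = impute_row(incomplete, X_complete, labels, centers)
error = mae(filled[2:3], true_row[2:3])
print(filled, error)
```

In this sketch the canopy pass only decides how many clusters there are and where k-means starts, so k does not have to be chosen by hand. Pearson similarity is computed across features of differing scale, so real process data would normally be standardized first.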
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant Nos. 51965006, 51875209), Guangxi Natural Science Foundation (Grant No. 2018GXNSFAA050111), and the Open Fund of the National Engineering Research Center of Near-Net-Shape Forming for Metallic Materials (Grant No. 2019001).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
About this article
Cite this article
Deng, J., Ye, Z., Shan, L. et al. Imputation Method Based on Collaborative Filtering and Clustering for the Missing Data of the Squeeze Casting Process Parameters. Integr Mater Manuf Innov 11, 95–108 (2022). https://doi.org/10.1007/s40192-021-00248-x
DOI: https://doi.org/10.1007/s40192-021-00248-x