Abstract
In data analysis, methods that examine the relationships between variables, such as correlation analysis and regression analysis, are very important. They are used to analyze influence and causal relationships among variables, and they also serve as the basis for statistical analysis. Furthermore, they are an essential preliminary step for machine learning methods such as deep learning: when the inputs and outputs of a deep learning model are analyzed, highly correlated variables are selected first, and causal relationships are usually examined first with a basic method such as regression analysis. When data are observed as fuzzy data carrying ambiguous information, however, the complexity of the data makes it difficult to propose a unique method for these analyses. The application of fuzzy theory to correlation analysis for such data has not been studied effectively, and the studies that exist were conducted for cases in which the data are not general fuzzy data or are handled by interval estimation. As a result, the effectiveness of fuzzy theory has not been highlighted. Variable selection, which identifies the important variables, is an essential step in multiple regression analysis. A variable that is significant in simple regression analysis may not be significant in multiple regression analysis because of its relationship with other variables. Therefore, not every variable that affects the dependent variable can be used as an independent variable, and multiple regression analysis involves excluding some variables. Until now, however, fuzzy multiple regression analysis has been carried out without a variable selection method, and the significance of important variables has received little emphasis.
In this paper, a fuzzy correlation coefficient and a multiple fuzzy regression analysis with a variable selection method are proposed. First, defuzzification and fuzzy ordering are defined. A fuzzy correlation coefficient is then proposed using the \(L_{2}\) distance. Next, fuzzy sums of squares are defined in order to construct an F-statistic for testing the significance of the regression model. Using this F-statistic, the fuzzy \(R^{2}\), and the fuzzy RMSE, several variable selection methods based on a distance approach are proposed. For the data analysis, foreign exchange reserve data and house price data of South Korea, which are important indicators of economic crisis, are used. Financial data are mostly recorded as closing values, but a closing value cannot represent the whole of the given period. We therefore treat the financial data as fuzzy data, in which the fluctuation within the period is regarded as vagueness inherent in the data. The proposed fuzzy correlation coefficient and the variable selection methods for fuzzy regression analysis are applied to the foreign exchange reserve and house price data together with several financial variables.
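As a rough illustration of the two ideas above (treating period data as fuzzy numbers and measuring correlation through an \(L_{2}\)-type distance), the following Python sketch may help. It is not the definition used in the paper, which the abstract does not state. It assumes triangular fuzzy numbers written as (left, mode, right), builds each one from the minimum, closing value, and maximum of a period, and uses the inner product on these three components, whose induced squared distance is the Diamond-type sum of squared endpoint differences. The function names and the sample figures are hypothetical.

```python
import math

# Assumption: a fuzzy observation is a triangular fuzzy number (left, mode, right).

def to_triangular(values, closing):
    """Summarise one period as (minimum, closing value, maximum)."""
    return (min(values), closing, max(values))

def inner(a, b):
    """Inner product on the three components of two triangular numbers."""
    return sum(x * y for x, y in zip(a, b))

def fuzzy_mean(sample):
    """Componentwise mean of a list of triangular fuzzy numbers."""
    n = len(sample)
    return tuple(sum(t[k] for t in sample) / n for k in range(3))

def centered(sample):
    """Componentwise deviations from the mean, treated as vectors in R^3."""
    m = fuzzy_mean(sample)
    return [tuple(t[k] - m[k] for k in range(3)) for t in sample]

def fuzzy_correlation(xs, ys):
    """Correlation built from distance-based sums of squares (illustrative only)."""
    cx, cy = centered(xs), centered(ys)
    sxy = sum(inner(a, b) for a, b in zip(cx, cy))
    sxx = sum(inner(a, a) for a in cx)  # sum of squared distances to the mean
    syy = sum(inner(b, b) for b in cy)
    return sxy / math.sqrt(sxx * syy)

# Made-up figures for three periods, for illustration only.
reserves = [
    to_triangular([410.0, 415.0, 412.0], closing=412.0),
    to_triangular([418.0, 425.0, 421.0], closing=421.0),
    to_triangular([430.0, 441.0, 436.0], closing=436.0),
]
prices = [
    to_triangular([98.0, 101.0, 100.0], closing=100.0),
    to_triangular([102.0, 106.0, 104.0], closing=104.0),
    to_triangular([108.0, 113.0, 111.0], closing=111.0),
]
r = fuzzy_correlation(reserves, prices)
print(round(r, 4))
```

By the Cauchy-Schwarz inequality the value returned lies between -1 and 1, so it can be read like an ordinary correlation coefficient; a different distance or a different fuzzy-number shape would change the sums of squares but not the overall structure.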
Data Availability
The data described in this article are openly available from the Bank of Korea at http://ecos.bok.or.kr and from the KB real estate data bank at https://kbland.kr/.
Acknowledgements
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1A2C1A01011131).
Funding
National Research Foundation of Korea, No. 2020R1A2C1A01011131, Jin Hee Yoon
About this article
Cite this article
Yoon, J.H., Kim, D.J. & Koo, Y.Y. Novel Fuzzy Correlation Coefficient and Variable Selection Method for Fuzzy Regression Analysis Based on Distance Approach. Int. J. Fuzzy Syst. 25, 2969–2985 (2023). https://doi.org/10.1007/s40815-023-01546-6
DOI: https://doi.org/10.1007/s40815-023-01546-6