Abstract
In supervised machine learning, the choice of loss function implicitly assumes a particular noise distribution over the data. For example, the widely used mean squared error (MSE) loss assumes Gaussian noise. The choice of loss function during training and testing affects the performance of artificial neural networks (ANNs), and MSE is known to yield substandard performance in the presence of outliers. The Cauchy loss function (CLF) assumes Cauchy-distributed noise, and is therefore potentially better suited to data with outliers. This paper aims to determine the extent of the robustness and generalisability of the CLF compared to MSE. CLF and MSE are assessed on several handcrafted regression problems, and on a real-world regression problem with artificially simulated outliers, in the context of ANN training. CLF yielded results that were either comparable to or better than those of MSE, with a few notable exceptions.
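To illustrate the contrast between the two losses, the following is a minimal sketch (not the paper's experimental setup): MSE grows quadratically in the residual, while the Cauchy loss, derived from the Cauchy negative log-likelihood up to constants, grows only logarithmically. The scale parameter `c` and the toy residuals below are assumptions for this example.

```python
import numpy as np

def mse_loss(residuals):
    # Mean squared error: assumes Gaussian noise; grows quadratically,
    # so a single large residual can dominate the loss
    return np.mean(residuals ** 2)

def cauchy_loss(residuals, c=1.0):
    # Cauchy loss (Cauchy negative log-likelihood up to constants):
    # grows only logarithmically, so outliers contribute far less
    return np.mean(np.log(1.0 + (residuals / c) ** 2))

clean = np.array([0.1, -0.2, 0.15])          # small, well-behaved residuals
with_outlier = np.append(clean, 10.0)        # one simulated outlier

# The outlier inflates MSE by orders of magnitude,
# while the Cauchy loss is affected far less
print(mse_loss(with_outlier) / mse_loss(clean))
print(cauchy_loss(with_outlier) / cauchy_loss(clean))
```

The quadratic growth of MSE is what makes it sensitive to outliers; bounding the influence of large residuals, as the logarithmic Cauchy loss does, is the robustness property the paper investigates.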
Supported by the NRF Thuthuka Grant Number 13819413.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Mlotshwa, T., van Deventer, H., Bosman, A.S. (2022). Cauchy Loss Function: Robustness Under Gaussian and Cauchy Noise. In: Pillay, A., Jembere, E., Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2022. Communications in Computer and Information Science, vol 1734. Springer, Cham. https://doi.org/10.1007/978-3-031-22321-1_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22320-4
Online ISBN: 978-3-031-22321-1
eBook Packages: Computer Science, Computer Science (R0)