Abstract
Classification margins are commonly used to estimate the generalization ability of machine learning models. We present an empirical study of these margins in artificial neural networks. The literature typically relies on a single global estimate of margin size; in this work, we point out seldom-considered nuances of classification margins. Notably, we demonstrate that certain types of training samples are modelled with consistently small margins while affecting generalization in different ways. By showing a link with the minimum distance to a different-target sample and the remoteness of samples from one another, we provide a plausible explanation for this observation. We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.
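The "minimum distance to a different-target sample" referred to above can be made concrete with a short sketch. This is an illustration only, not the paper's measurement pipeline: the function name and the choice of plain input-space Euclidean distance are our assumptions.

```python
import numpy as np

def min_distance_to_other_class(X, y):
    """For each sample, the Euclidean distance to the nearest sample
    with a different label: a simple proxy for how close a sample
    sits to samples of other classes."""
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)          # guard against round-off negatives
    d2[y[:, None] == y[None, :]] = np.inf  # ignore same-label pairs
    return np.sqrt(d2.min(axis=1))
```

Samples for which this distance is small are exactly the ones one would expect to be forced into small classification margins, regardless of whether they help or hurt generalization.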
M. W. Theunissen and C. Mouton contributed equally.
Notes
1. In practice, we optimize for the squared Euclidean distance in order to reduce the computational cost of gradient calculations, but report the unsquared distance in all cases.
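The footnote's trick rests on two facts: the gradient of the squared distance avoids the norm (and is well-defined everywhere), and both objectives share the same minimizer, so the unsquared distance can be recovered with a single square root at the end. A minimal sketch (the function names are ours):

```python
import numpy as np

def grad_squared_distance(x, x0):
    # d/dx ||x - x0||^2 = 2 (x - x0): no square root, no division,
    # and well-defined even when x == x0.
    return 2.0 * (x - x0)

def grad_distance(x, x0):
    # d/dx ||x - x0|| = (x - x0) / ||x - x0||: requires a norm per
    # gradient step and is undefined at x == x0.
    diff = x - x0
    return diff / np.linalg.norm(diff)
```

The two gradients are parallel wherever both exist, which is why optimizing the squared distance subject to the same boundary constraint yields the same optimum; reporting sqrt of the optimal squared objective gives the unsquared margin.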
Acknowledgements
We thank and acknowledge the Centre for High Performance Computing (CHPC), South Africa, for providing computational resources for this research project.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Theunissen, M.W., Mouton, C., Davel, M.H. (2022). The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs. In: Pillay, A., Jembere, E., Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2022. Communications in Computer and Information Science, vol 1734. Springer, Cham. https://doi.org/10.1007/978-3-031-22321-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22320-4
Online ISBN: 978-3-031-22321-1