Abstract
Classification margins are commonly used to estimate the generalization ability of machine learning models. We present an empirical study of these margins in artificial neural networks. The literature typically relies on a single global estimate of margin size; in this work, we point out seldom-considered nuances of classification margins. Notably, we demonstrate that certain types of training samples are modelled with consistently small margins while affecting generalization in different ways. By showing a link with the minimum distance to a different-target sample and the remoteness of samples from one another, we provide a plausible explanation for this observation. We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.
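The "minimum distance to a different-target sample" referred to above can be made concrete with a short sketch. This is an illustration only, not the paper's measurement pipeline: the function name and the choice of plain input-space Euclidean distance are our assumptions.

```python
import numpy as np

def min_distance_to_other_class(X, y):
    """For each sample, the Euclidean distance to the nearest sample
    with a different label: a simple proxy for how close a sample
    sits to samples of other classes."""
    # Pairwise squared distances via ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b
    sq = np.sum(X**2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    d2 = np.maximum(d2, 0.0)          # guard against round-off negatives
    d2[y[:, None] == y[None, :]] = np.inf  # ignore same-label pairs
    return np.sqrt(d2.min(axis=1))
```

Samples for which this distance is small are exactly the ones one would expect to be forced into small classification margins, regardless of whether they help or hurt generalization.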
M. W. Theunissen and C. Mouton contributed equally.
Notes
1. In practice, we optimize for the squared Euclidean distance in order to reduce the computational cost of gradient calculations, but report the unsquared distance in all cases.
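The footnote's trick rests on two facts: the gradient of the squared distance avoids the norm (and is well-defined everywhere), and both objectives share the same minimizer, so the unsquared distance can be recovered with a single square root at the end. A minimal sketch (the function names are ours):

```python
import numpy as np

def grad_squared_distance(x, x0):
    # d/dx ||x - x0||^2 = 2 (x - x0): no square root, no division,
    # and well-defined even when x == x0.
    return 2.0 * (x - x0)

def grad_distance(x, x0):
    # d/dx ||x - x0|| = (x - x0) / ||x - x0||: requires a norm per
    # gradient step and is undefined at x == x0.
    diff = x - x0
    return diff / np.linalg.norm(diff)
```

The two gradients are parallel wherever both exist, which is why optimizing the squared distance subject to the same boundary constraint yields the same optimum; reporting sqrt of the optimal squared objective gives the unsquared margin.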
Acknowledgements
We thank and acknowledge the Centre for High Performance Computing (CHPC), South Africa, for providing computational resources for this research project.
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Theunissen, M.W., Mouton, C., Davel, M.H. (2022). The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs. In: Pillay, A., Jembere, E., Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2022. Communications in Computer and Information Science, vol 1734. Springer, Cham. https://doi.org/10.1007/978-3-031-22321-1_6
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-22320-4
Online ISBN: 978-3-031-22321-1