The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs

  • Conference paper
  • In: Artificial Intelligence Research (SACAIR 2022)

Abstract

Classification margins are commonly used to estimate the generalization ability of machine learning models. We present an empirical study of these margins in artificial neural networks. A global estimate of margin size is usually used in the literature. In this work, we point out seldom considered nuances regarding classification margins. Notably, we demonstrate that some types of training samples are modelled with consistently small margins while affecting generalization in different ways. By showing a link with the minimum distance to a different-target sample and the remoteness of samples from one another, we provide a plausible explanation for this observation. We support our findings with an analysis of fully-connected networks trained on noise-corrupted MNIST data, as well as convolutional networks trained on noise-corrupted CIFAR10 data.
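
For illustration only (not the authors' code): the "minimum distance to a different-target sample" referred to above can be computed with a brute-force nearest-neighbour query over the training set. A minimal sketch in NumPy, assuming flattened inputs X and integer labels y (both names are ours):

    import numpy as np

    def min_dist_to_other_class(X, y):
        """For each sample, the Euclidean distance to the nearest sample with a different label."""
        X = X.reshape(len(X), -1).astype(np.float64)
        # Brute-force pairwise distances; adequate for a few thousand samples.
        dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        same_label = y[:, None] == y[None, :]
        dists[same_label] = np.inf  # ignore same-label pairs, including each sample itself
        return dists.min(axis=1)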

M. W. Theunissen and C. Mouton—Equal contribution.


Notes

  1. In practice, we optimize for the squared Euclidean distance in order to reduce the computational cost of gradient calculations, but report on the unsquared distance in all cases.
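
As an illustration of this note (a minimal sketch, not the authors' exact optimizer): the squared distance is what the optimizer sees, and the square root is only taken when reporting. The PyTorch objects model, x, cls_a and cls_b below are assumed names of ours.

    import torch

    def boundary_distance(model, x, cls_a, cls_b, steps=500, lr=0.01, penalty=10.0):
        # Illustrative penalty formulation: move a copy of x onto the decision boundary
        # between classes cls_a and cls_b by minimizing the *squared* Euclidean distance
        # plus a penalty term that vanishes when the two class logits are equal.
        x_b = x.clone().detach().requires_grad_(True)
        opt = torch.optim.Adam([x_b], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            logits = model(x_b.unsqueeze(0)).squeeze(0)
            sq_dist = torch.sum((x_b - x) ** 2)                  # squared distance: cheaper gradients
            off_boundary = (logits[cls_a] - logits[cls_b]) ** 2  # zero on the boundary
            (sq_dist + penalty * off_boundary).backward()
            opt.step()
        return torch.sqrt(torch.sum((x_b - x) ** 2)).item()      # report the unsquared distance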

Acknowledgements

We thank and acknowledge the Centre for High Performance Computing (CHPC), South Africa, for providing computational resources for this research project.

Author information

Corresponding author

Correspondence to Marthinus Wilhelmus Theunissen.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Theunissen, M.W., Mouton, C., Davel, M.H. (2022). The Missing Margin: How Sample Corruption Affects Distance to the Boundary in ANNs. In: Pillay, A., Jembere, E., Gerber, A. (eds) Artificial Intelligence Research. SACAIR 2022. Communications in Computer and Information Science, vol 1734. Springer, Cham. https://doi.org/10.1007/978-3-031-22321-1_6

  • DOI: https://doi.org/10.1007/978-3-031-22321-1_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-22320-4

  • Online ISBN: 978-3-031-22321-1

  • eBook Packages: Computer Science, Computer Science (R0)
