Abstract
Traditional signal processing methods, which rely on mathematical models of the data, have been cast aside in favour of deep neural networks, which require vast amounts of data. Since the theoretical sample complexity is nearly impossible to evaluate, the required number of examples is usually estimated with crude rules of thumb. However, these rules only suggest when the networks should work; they do not relate them to the traditional methods. In particular, an interesting question is: how much data is required for neural networks to be on par with, or if possible outperform, the traditional model-based methods? In this work, we empirically investigate this question in two simple examples, where the data is generated according to precisely defined mathematical models and where well-understood optimal or state-of-the-art data-agnostic mathematical solutions are known. The first problem is deconvolving one-dimensional Gaussian signals; the second is estimating a circle’s radius and location in random grayscale images of disks. By training various networks, either naive custom-designed ones or well-established architectures, with varying amounts of training data, we find that the networks require tens of thousands of examples to match the traditional methods, whether they are trained from scratch or with transfer learning or fine-tuning.
Notes
- 1.
The loss function is actually scaled to \(\tfrac{1}{D}\textrm{MSE}_{\textrm{train}}\), as is commonly done in practice.
- 2.
Selected on the validation set.
- 3.
Except f and b, which are slightly correlated to ensure a minimal contrast \(|f-b|>\delta \).
- 4.
In our tests, we take \(D=201\) implying that \(r\sim \mathcal {U}([10,40])\) and \(c\sim \mathcal {U}([-50, 50]^2)\).
- 5.
We use the simplest ones, VGG11 and ResNet18, as larger ones are unnecessary here.
- 6.
To help the networks converge, the radius and centre coordinates are scaled to \([-1,1]\) using \(r_s = \tfrac{8}{D-1}(r-\tfrac{D-1}{8})\) and \(c_s = \tfrac{2}{D-1}c\). In all plots and numbers provided in this paper, the results are rescaled to the original scale, i.e. r and c rather than \(r_s\) and \(c_s\).
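As a sanity check, the scaling in this note and its inverse can be sketched as follows (a minimal sketch; the helper names are ours and not from the paper, and we assume \(D=201\) as in the tests above):

```python
import numpy as np

D = 201  # image side length used in the paper's tests

def scale_targets(r, c):
    """Map radius r and centre c into [-1, 1] for training,
    following r_s = 8/(D-1) * (r - (D-1)/8) and c_s = 2/(D-1) * c."""
    r_s = 8.0 / (D - 1) * (r - (D - 1) / 8.0)
    c_s = 2.0 / (D - 1) * np.asarray(c, dtype=float)
    return r_s, c_s

def unscale_targets(r_s, c_s):
    """Invert the scaling to recover r and c in pixel units."""
    r = (D - 1) / 8.0 * r_s + (D - 1) / 8.0
    c = (D - 1) / 2.0 * np.asarray(c_s, dtype=float)
    return r, c
```

With \(r\sim \mathcal {U}([10,40])\) and \(c\sim \mathcal {U}([-50,50]^2)\), the scaled targets land in \([-0.6, 0.6]\) and \([-0.5, 0.5]\) respectively, i.e. safely inside \([-1,1]\).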
References
Alwosheel, A., van Cranenburgh, S., Chorus, C.G.: Is your dataset big enough? Sample size requirements when using artificial neural networks for discrete choice analysis. J. Choice Model. 28, 167–182 (2018)
Anthony, M., Bartlett, P.L.: Neural Network Learning: Theoretical Foundations. Cambridge University Press, Cambridge (1999)
Bai, B., Yang, F., Chai, L.: Point flow edge detection method based on phase congruency. In: 2019 Chinese Automation Congress (CAC), pp. 5853–5858. IEEE (2019)
Bengio, Y., Lecun, Y., Hinton, G.: Deep learning for AI. Commun. ACM 64(7), 58–65 (2021)
Butcher, J.: Numerical Methods for Ordinary Differential Equations. Wiley, Hoboken (2008). https://books.google.co.il/books?id=opd2NkBmMxsC
Dagès, T., Lindenbaum, M., Bruckstein, A.M.: From compass and ruler to convolution and nonlinearity: On the surprising difficulty of understanding a simple CNN solving a simple geometric estimation task. arXiv preprint arXiv:2303.06638 (2023)
Elad, M.: Sparse and Redundant Representations: From Theory to Applications in Signal and Image Processing. Springer, Cham (2010)
Geirhos, R., Narayanappa, K., Mitzkus, B., Thieringer, T., Bethge, M., Wichmann, F.A., et al.: Partial success in closing the gap between human and machine vision. Adv. Neural. Inf. Process. Syst. 34, 23885–23899 (2021)
Geirhos, R., Temme, C.R., Rauber, J., Schütt, H.H., Bethge, M., Wichmann, F.A.: Generalisation in humans and deep neural networks. In: Advances in Neural Information Processing Systems, vol. 31 (2018)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT press, Cambridge (2016)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. Commun. ACM 60(6), 84–90 (2017)
Lakshmanan, V., Robinson, S., Munn, M.: Machine Learning Design Patterns. O’Reilly Media, Sebastopol (2020)
LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
LeCun, Y., Boser, B., Denker, J., Henderson, D., Howard, R., Hubbard, W., et al.: Handwritten digit recognition with a back-propagation network. In: Advances in Neural Information Processing Systems, vol. 2 (1989)
Mohamed, A.R., Dahl, G., Hinton, G.: Deep belief networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications, vol. 1, p. 39 (2009)
Novikoff, A.B.: On convergence proofs for perceptrons. Technical report Stanford Research Institute, Menlo Park, CA, USA (1963)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S., Ma, S., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Vapnik, V.N., Chervonenkis, A.Y.: On the uniform convergence of relative frequencies of events to their probabilities. In: Vovk, V., Papadopoulos, H., Gammerman, A. (eds.) Measures of Complexity, pp. 11–30. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-21852-6_3
Wiener, N.: Extrapolation, Interpolation, and Smoothing of Stationary Time Series: With Engineering Applications, vol. 113. MIT press, Cambridge (1949)
Yang, F., Cohen, L.D., Bruckstein, A.M.: A model for automatically tracing object boundaries. In: 2017 IEEE International Conference on Image Processing (ICIP), pp. 2692–2696. IEEE (2017)
Acknowledgements
This work is in part supported by the French government under management of Agence Nationale de la Recherche as part of the "Investissements d’avenir" program, reference ANR-19-P3IA-0001 (PRAIRIE 3IA Institute).
A Pointflow Implementation Details
The pointflow dynamics are implemented by discretising time and approximating the time derivative with a forward finite difference scheme, although this could be improved with a fourth-order Runge-Kutta implementation [5]. Given the small magnitudes of the fields, we found that a large time step \(dt = 50\) works well. We define three thresholds: \(\tau _l = 0.9\) for \(C_l\), \(\tau _s = 10^{-6}\) for \(C_s\), and \(\tau _{len}=0.001\). We consider \(C_l\) to have looped if a point reaches a previous point within squared Euclidean distance \(\tau _l\), while the trajectory between the two looping points contains at least one point at squared distance at least \(\tau _l\) from both of them. A trajectory is stuck if it reaches a point where the current flow V has a small magnitude, \(\Vert V\Vert _2^2\le \tau _s\). Each flow is run for \(N_i = 1000\) iterations, and trajectories shorter than \(\tau _{len}\) are discarded, e.g. trajectories of type \(C_s\). We used \(\sigma _{Pf} = 5\) to blur out the noise before computing the fields. The implemented pointflow algorithm for finding contours in our circle images is presented in Algorithm 1.
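The forward-Euler integration and the two stopping criteria described above can be sketched as follows. This is a minimal sketch, not the paper's implementation: the flow field `flow` is a stand-in callable (the actual fields are derived from the blurred image), and the loop test is a direct transcription of the text's conditions.

```python
import numpy as np

DT = 50.0        # time step dt from the text
TAU_L = 0.9      # squared-distance threshold for loop detection (C_l)
TAU_S = 1e-6     # squared-magnitude threshold for a stuck trajectory (C_s)
N_ITER = 1000    # maximum number of iterations N_i per trajectory

def trace_trajectory(p0, flow, dt=DT, n_iter=N_ITER):
    """Forward-Euler integration p <- p + dt * V(p), stopping when the
    trajectory loops back on itself (C_l) or gets stuck (C_s)."""
    traj = [np.asarray(p0, dtype=float)]
    for _ in range(n_iter):
        v = np.asarray(flow(traj[-1]), dtype=float)
        if np.dot(v, v) <= TAU_S:        # stuck: flow magnitude too small
            return np.array(traj), "stuck"
        p = traj[-1] + dt * v
        # Loop test: p is within tau_l of an earlier point q, and some
        # point between q and p lies at squared distance >= tau_l of both.
        for i, q in enumerate(traj):
            d = p - q
            if np.dot(d, d) <= TAU_L:
                if any(np.dot(p - m, p - m) >= TAU_L
                       and np.dot(q - m, q - m) >= TAU_L
                       for m in traj[i + 1:]):
                    traj.append(p)
                    return np.array(traj), "looped"
        traj.append(p)
    return np.array(traj), "maxiter"
```

On a weak rotational field the trajectory closes on itself and is reported as looped; on a vanishing field it is immediately reported as stuck.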
After computing the list of contours \(\mathcal {C}\) in the image I, we estimate the radius from the average curve length, \(\hat{r} = \tfrac{1}{2\pi |\mathcal {C}|}\sum _{i=1}^{|\mathcal {C}|} \textrm{length}(\mathcal {C}_i)\). Since averaging the contour points did not yield the best estimate of the circle centre, we estimate it instead by least squares. The equation of a circle, \((x-c_x)^2 + (y-c_y)^2 = r^2\), can be rewritten as \(\theta _1x + \theta _2y + \theta _3 = x^2 + y^2\), where \(\theta _1 = 2c_x\), \(\theta _2 = 2c_y\), and \(\theta _3 = r^2 - c_x^2 - c_y^2\). For each contour we can thus estimate \(\theta = (\theta _1, \theta _2, \theta _3)^\top \) by least squares as \(\hat{\theta } = (A^\top A)^{-1}A^\top B\), with \(A_{i,:} = (x_i, y_i, 1)\) and \(B_i = x_i^2 + y_i^2\), where i ranges over the computed points on the contour. From \(\hat{\theta }\) we obtain \(\hat{c} = (\tfrac{\hat{\theta }_1}{2}, \tfrac{\hat{\theta }_2}{2})\). The final centre estimate is the average of this estimate over all contours. Note that we could also estimate r from \(\hat{\theta }_3\), but we found that it did not outperform the length-based strategy, so we do not use it.
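The least-squares step above can be sketched for a single contour as follows (a minimal sketch assuming the contour is given as an (n, 2) array of points; the function name is ours). The radius returned here is the \(\theta _3\)-based alternative mentioned at the end of the paragraph, not the length-based estimate actually used.

```python
import numpy as np

def fit_circle(points):
    """Least-squares fit of theta1*x + theta2*y + theta3 = x^2 + y^2,
    returning the centre (theta1/2, theta2/2) and a radius estimate."""
    pts = np.asarray(points, dtype=float)
    A = np.column_stack([pts[:, 0], pts[:, 1], np.ones(len(pts))])
    B = pts[:, 0] ** 2 + pts[:, 1] ** 2
    theta, *_ = np.linalg.lstsq(A, B, rcond=None)  # solves min ||A theta - B||
    c = np.array([theta[0] / 2.0, theta[1] / 2.0])
    r = np.sqrt(theta[2] + c[0] ** 2 + c[1] ** 2)  # from theta3 = r^2 - |c|^2
    return c, r
```

On noise-free points of a true circle the fit recovers the centre and radius exactly, up to numerical precision.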
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Dagès, T., Cohen, L.D., Bruckstein, A.M. (2023). A Model is Worth Tens of Thousands of Examples. In: Calatroni, L., Donatelli, M., Morigi, S., Prato, M., Santacesaria, M. (eds) Scale Space and Variational Methods in Computer Vision. SSVM 2023. Lecture Notes in Computer Science, vol 14009. Springer, Cham. https://doi.org/10.1007/978-3-031-31975-4_17
Print ISBN: 978-3-031-31974-7
Online ISBN: 978-3-031-31975-4