Abstract
We explore the potential for using a nonsmooth loss function based on the max-norm in the training of an artificial neural network without hidden layers. We hypothesise that this may lead to superior classification results in some special cases where the training data are either very small or the class size is disproportional. Our numerical experiments performed on a simple artificial neural network with no hidden layer appear to confirm our hypothesis.
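The loss in question can be illustrated with a short sketch (this code is not from the article; the data, weights, and the plain affine model standing in for the no-hidden-layer network are hypothetical placeholders). For a network without hidden layers the model reduces to an affine map, and the uniform (max-norm) loss is the largest absolute residual over the training set, in contrast to the smooth mean squared error:

```python
import numpy as np

def mse_loss(w, b, X, y):
    # Smooth mean squared error over the training residuals.
    r = X @ w + b - y
    return np.mean(r ** 2)

def uniform_loss(w, b, X, y):
    # Nonsmooth uniform (Chebyshev / max-norm) loss: the largest
    # absolute residual over the whole training set.
    r = X @ w + b - y
    return np.max(np.abs(r))

# Toy data standing in for a two-class training set (labels 0/1).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = rng.integers(0, 2, size=50).astype(float)

w, b = np.zeros(3), 0.0
print(mse_loss(w, b, X, y), uniform_loss(w, b, X, y))
```

Because the uniform loss is driven by a single worst-case residual, it is nonsmooth and weights every training point equally regardless of class frequency, which is the intuition behind its use on small or imbalanced training sets.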
Acknowledgements
The authors would like to thank the anonymous referees for their helpful recommendations on improving this manuscript.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. This research was supported by the Australian Research Council (ARC) through the Discovery Project "Solving hard Chebyshev approximation problems through nonsmooth analysis" (DP180100602).
Ethics declarations
Conflict of interest
The authors declare no competing interests.
Additional information
Communicated by: Andrew C. Eberhard
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Datasets
1.1.1 TwoLeadECG
The MIT-BIH Long-Term ECG data collection comes from the well-known PhysioNet database and comprises seven long-term ECG recordings with carefully evaluated beat annotations. We use the TwoLeadECG dataset, which comes from the final (seventh) set of recordings in this collection. It has two signal classes: Class 1 contains signals of type signal 0, and Class 2 contains signals of type signal 1. The basic purpose is to discriminate between these two groups of signals.
1.1.2 SonyAIBORobotSurface1
The SONY AIBO Robot is a small, dog-shaped robot equipped with multiple sensors. In the experimental setting, the robot walked on two different surfaces: carpet and cement. Class 1 consists of the data recorded when the robot walked on the carpet, and Class 2 consists of the data recorded when it walked on the cement floor. The main goal is to identify the type of floor that the robot walked on.
1.1.3 ToeSegmentation1
The ToeSegmentation data are derived from the CMU Graphics Lab Motion Capture Database (CMU). Motions in the database containing the keyword walk are classified by their motion descriptions into two categories. The first is the normal walk (Class 1), with only walk in the motion descriptions. The other is the abnormal walk (Class 2), with the motion descriptions containing: hobble walk, walk wounded leg, walk on toes bent forward, hurt leg walk, drag bad leg walk or hurt stomach walk. In the abnormal walks, the actors are pretending to have difficulty walking normally. ToeSegmentation1 contains the coordinates of the x-axis.
1.1.4 WormsTwoClass
Caenorhabditis elegans is a roundworm commonly used as a model organism in the study of genetics. The movement of these worms is known to be a useful indicator for understanding behavioural genetics. There are five variants of worms: N2, goa-1, unc-1, unc-38 and unc-63. N2 is the wild type (i.e. normal), and the other four are mutant strains. This dataset comprises 258 traces of worm movements, and each worm is classified as either wild type (Class 1) or one of the four mutant types (Class 2).
1.1.5 PhalangesOutlinesCorrect
This dataset is designed to test the efficacy of hand and bone outline detection and whether these outlines could be helpful in bone age prediction. Algorithms were applied to images to automatically extract the hand outlines and then the outlines of three bones of the middle finger (proximal, middle and distal phalanges), and three human evaluators labelled the output of the image outlining as correct or incorrect. If all three evaluators agreed that a data point was valid, it was labelled as correct; hence, Class 2 contains correctly identified data points, whereas Class 1 contains incorrectly identified data points.
1.1.6 Strawberry
Food spectrographs are used in chemometrics to classify food types, a task that has obvious applications in food safety and quality assurance. The classes are strawberry (authentic samples) and non-strawberry (adulterated strawberries and other fruits), obtained using Fourier transform infrared (FTIR) spectroscopy with attenuated total reflectance (ATR) sampling.
1.1.7 Earthquakes
The earthquake classification problem involves predicting whether a major event is about to occur based on the most recent readings in the surrounding area. The data are taken from the Northern California Earthquake Data Center; each data point is an averaged reading over 1 h, with the first reading taken on 1 December 1967 and the last in 2003. This single time series was then transformed into a classification problem by first defining a major event as any reading of over 5 on the Richter scale. Major events are often followed by aftershocks; the physics of these is well understood, and their detection is not the objective of this exercise. Hence, a positive case is taken to be one where a major event is not preceded by another major event for at least 512 hours. To construct a negative case, instances are considered where there is a reading below 4 (to avoid blurring the boundary between major and non-major events) that is preceded by at least 20 non-zero readings in the previous 512 hours (to avoid trivial negative cases). None of the cases overlap in time. This dataset consists of 368 negative cases (Class 1) and 93 positive cases (Class 2).
1.1.8 PowerCons
The PowerCons dataset contains the individual household electric power consumption over 1 year, distributed into two season classes: warm (Class 1) and cold (Class 2), depending on whether the power consumption is recorded during the warm seasons (from April to September) or the cold seasons (from October to March).
1.1.9 Computers
These problems were taken from data recorded as part of a government-sponsored study called Powering the Nation. The intention was to collect behavioural data about how consumers use electricity within the home, in order to help reduce the UK's carbon footprint. The data contain readings from 250 households, sampled at two-minute intervals over a month. Classes are Desktop (Class 1) and Laptop (Class 2).
1.2 Experiments and results
We start the experiments with the original training and testing sets. We compare the classification accuracy computed by the MATLAB Deep Learning Toolbox, which uses the mean squared error (MSE) loss, with the classification accuracy computed by the uniform approximation-based loss function. The results are given in Table 15.
One can see that the uniform approximation is more accurate for the TwoLeadECG, SonyAIBORobotSurface1 and ToeSegmentation1 datasets, whose training sets are smaller than their test sets, while MSE is much more accurate for all the other datasets. We now swap the training and testing sets for datasets 4, 5, 6 and 7, since their training sets are larger than their testing sets. The results are presented in Table 16.
In the rest of the experiments, the original training and testing sets remain swapped for datasets 4, 5, 6 and 7. For all other datasets (that is, datasets 1, 2, 3, 8 and 9), the original training set is used as the training set.
Now, we consider reduced training sets which contain an equal number of representatives from each class. In particular, the first 10 points from each class were chosen to create a training set of size 20. However, since the training sets of datasets 1, 2 and 3 are small, for these datasets only the first 5 points from each class were chosen, creating a training set of size 10. The results are presented in Table 17.
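The balanced reduced training sets described above can be sketched as follows (an illustrative helper, not the authors' code; the toy labels below are hypothetical):

```python
import numpy as np

def first_k_per_class(X, y, k):
    # Reduced training set: the first k points from each class,
    # giving a balanced subset of size 2*k.
    idx = np.concatenate([np.flatnonzero(y == c)[:k] for c in (1, 2)])
    return X[idx], y[idx]

# Toy labelled data standing in for one of the datasets.
y = np.array([1, 2, 1, 1, 2, 2, 1, 2])
X = np.arange(len(y)).reshape(-1, 1)

Xs, ys = first_k_per_class(X, y, 2)
print(ys)  # two points from each class
```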
Our next step is to reduce the training set by taking unequal numbers of points from each class. The size of the training set is 20 for datasets 4, 5, 6, 7, 8 and 9: we take 18 points from Class 1 and 2 points from Class 2. The results are presented in Table 18. For datasets 2 and 4, the size of the training set is reduced to 10: 8 points from Class 1 and 2 points from Class 2. For dataset 3, the training set contains only 8 points: 6 points from Class 1 and 2 points from Class 2. These different configurations are due to the varying sizes of the training sets and to the number of points representing each class. The results are in Table 19.
Now, we consider the symmetric situation where the training set contains 20 points for datasets 4, 5, 6, 7, 8 and 9: 2 points from Class 1 and 18 points from Class 2. The results are presented in Table 20. For the same reasons as above, we use different configurations for the remaining datasets. For datasets 2 and 4, the size of the training set is 10: 2 points from Class 1 and 8 points from Class 2. For dataset 3, the training set contains only 8 points: 2 points from Class 1 and 6 points from Class 2. Results are in Table 21.
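The unbalanced reduced training sets can be built in the same spirit (again an illustrative sketch under assumed toy data, not the paper's code), taking unequal numbers of leading points from the two classes:

```python
import numpy as np

def unbalanced_subset(X, y, n1, n2):
    # First n1 points labelled Class 1 followed by the first n2
    # points labelled Class 2 (e.g. 18 and 2, or 2 and 18).
    idx = np.concatenate([np.flatnonzero(y == 1)[:n1],
                          np.flatnonzero(y == 2)[:n2]])
    return X[idx], y[idx]

# Hypothetical alternating labels for demonstration.
y = np.array([1, 2, 1, 2, 1, 2, 1, 2, 1, 2])
X = np.arange(10).reshape(-1, 1)

Xs, ys = unbalanced_subset(X, y, 3, 1)  # 3 from Class 1, 1 from Class 2
print(ys)
```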
Now, we present the results when the training set points are chosen randomly. In Table 22, we present the results for datasets 5, 6, 7, 8 and 9 when the training set contains 50 randomly selected points. In Table 23, 20 points are selected randomly to generate the training set; these results apply only to the above-mentioned datasets whose training set (the original testing set) is larger than 20.
We finally present the results of the experiments when the training set contains 10 randomly selected points. This experiment applies to all the datasets that we considered. The results are in Table 24.
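A reproducible way to draw such random training subsets is sketched below; the paper does not specify the sampling scheme or seed, so both are assumptions made purely for illustration:

```python
import numpy as np

def random_training_set(X, y, n, seed=0):
    # Draw n training points uniformly at random without replacement.
    # The fixed seed is only for reproducibility of this sketch.
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(y), size=n, replace=False)
    return X[idx], y[idx]

# Hypothetical two-class data of 60 points.
y = np.repeat([1, 2], 30)
X = np.arange(60).reshape(-1, 1)

Xs, ys = random_training_set(X, y, 10)
print(len(ys))
```

Sampling without replacement guarantees 10 distinct training points, but, unlike the stratified constructions above, it does not control how many points come from each class.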
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Peiris, V., Roshchina, V. & Sukhorukova, N. Artificial neural networks with uniform norm-based loss functions. Adv Comput Math 50, 31 (2024). https://doi.org/10.1007/s10444-024-10124-9