
A Method for Optimal Division of Data Sets for Use in Neural Networks

  • Conference paper
Knowledge-Based Intelligent Information and Engineering Systems (KES 2005)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 3684)

Abstract

Neural networks are used to find a generalised solution from a sample set drawn from a problem domain. When only a small sample is available, the correct division of data between the training, testing and validation sets is crucial to the performance of the resulting trained network. Data are often divided uniformly between the three sets. We propose an alternative method for the optimal division of the data, based on empirical evidence from experiments with artificial data. The method is tested on real-world data sets, with encouraging results.
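The abstract contrasts its proposed method with the common practice of dividing the sample uniformly between the three sets. The paper's optimal division is not reproduced on this page, but the conventional uniform split it critiques can be sketched as follows; the function and variable names are illustrative, not from the paper.

```python
import random

def uniform_three_way_split(samples, seed=0):
    """Shuffle `samples` and divide them into three near-equal parts:
    training, testing and validation sets."""
    rng = random.Random(seed)
    shuffled = samples[:]          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    third = len(shuffled) // 3
    train = shuffled[:third]
    test = shuffled[third:2 * third]
    validation = shuffled[2 * third:]  # remainder absorbs any leftover items
    return train, test, validation

train, test, validation = uniform_three_way_split(list(range(90)))
print(len(train), len(test), len(validation))  # 30 30 30
```

The paper's argument is that, for small samples, replacing this fixed one-third allocation with an empirically tuned division can improve the trained network's performance.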





Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Crowther, P.S., Cox, R.J. (2005). A Method for Optimal Division of Data Sets for Use in Neural Networks. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science (LNAI), vol 3684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11554028_1


  • DOI: https://doi.org/10.1007/11554028_1

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-28897-8

  • Online ISBN: 978-3-540-31997-9

  • eBook Packages: Computer Science (R0)
