Abstract
Neural networks are used to find a generalised solution from a sample set of a problem domain. When only a small sample is available, the division of the data between the training, testing and validation sets is crucial to the performance of the resulting trained network. The data are often divided uniformly among the three sets. We propose an alternative method for the optimal division of the data, based on empirical evidence from experiments with artificial data. The method is tested on real-world data sets, with encouraging results.
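To make the three-way division concrete, the sketch below partitions a sample set into training, testing and validation subsets. The fractions shown (60/20/20) are illustrative defaults only, not the optimal ratios the paper derives; the function name and parameters are hypothetical.

```python
import random

def split_dataset(samples, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle and partition samples into training, testing and
    validation subsets; whatever remains after the training and
    testing portions goes to validation.

    NOTE: the default fractions are illustrative, not the ratios
    recommended by the paper.
    """
    rng = random.Random(seed)  # fixed seed for a reproducible split
    shuffled = samples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train_frac)
    n_test = int(n * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation

# Example: 100 samples split 60/20/20
train, test, val = split_dataset(list(range(100)))
print(len(train), len(test), len(val))
```

Varying `train_frac` and `test_frac` is exactly the knob the paper's method tunes: rather than a uniform division, the fractions are chosen from empirical evidence about how split proportions affect the trained network's performance.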
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
Cite this paper
Crowther, P.S., Cox, R.J. (2005). A Method for Optimal Division of Data Sets for Use in Neural Networks. In: Khosla, R., Howlett, R.J., Jain, L.C. (eds) Knowledge-Based Intelligent Information and Engineering Systems. KES 2005. Lecture Notes in Computer Science(), vol 3684. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11554028_1
DOI: https://doi.org/10.1007/11554028_1
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-28897-8
Online ISBN: 978-3-540-31997-9