Abstract
Cross-validation is a statistical approach for validating predictive methods, classification models, and clustering techniques. It assesses the reliability and stability of the results of a statistical analysis (e.g., predictions, classifications, forecasts) using independent datasets. For prediction of trend, association, clustering, and classification, a model is usually trained on one dataset (training data) and subsequently tested on new data (testing or validation data). Statistical internal cross-validation uses iterative bootstrapping to define the test datasets, evaluates the predictive performance of the model, and assesses how well the model avoids overfitting. Overfitting occurs when a predictive or classification model describes random error, i.e., fits the noise components of the observations, instead of the actual underlying relationships and salient features in the data.
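The train/test idea described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not from the chapter: the data are synthetic (a noisy line), the model is a simple k-nearest-neighbor regressor, and the test sets are defined by k-fold partitioning rather than by the bootstrap resampling the abstract mentions. The helper names (`knn_predict`, `mse`, `k_fold_mse`) are hypothetical and chosen only for this illustration.

```python
import random
import statistics


def knn_predict(train, x, k):
    """Predict y at x as the mean response of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return statistics.fmean(y for _, y in nearest)


def mse(train, test, k):
    """Mean squared error on `test` of a k-NN model built from `train`."""
    return statistics.fmean((knn_predict(train, x, k) - y) ** 2 for x, y in test)


def k_fold_mse(data, k_neighbors, n_folds=5, seed=0):
    """Average held-out error over n_folds disjoint test sets (k-fold CV)."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    # Every observation lands in exactly one fold.
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        # Train on all folds except the one held out for testing.
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        scores.append(mse(train, test, k_neighbors))
    return statistics.fmean(scores)


# Synthetic data (an assumption for illustration): y = 2x + Gaussian noise.
rng = random.Random(42)
data = [(i / 10, 2.0 * (i / 10) + rng.gauss(0, 1.0)) for i in range(100)]

for k in (1, 5):
    train_err = mse(data, data, k)   # error on the data used for fitting
    cv_err = k_fold_mse(data, k)     # error on held-out folds
    print(f"k={k}: training MSE={train_err:.3f}, cross-validated MSE={cv_err:.3f}")
```

With one neighbor, each training point is its own nearest neighbor, so the training error is exactly zero: the model reproduces the noise. The cross-validated error is positive, and the gap between the two is the kind of overfitting signal that held-out evaluation is meant to expose.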
© 2018 Ivo D. Dinov
About this chapter
Cite this chapter
Dinov, I.D. (2018). Prediction and Internal Statistical Cross Validation. In: Data Science and Predictive Analytics. Springer, Cham. https://doi.org/10.1007/978-3-319-72347-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-72346-4
Online ISBN: 978-3-319-72347-1
eBook Packages: Computer Science, Computer Science (R0)