Validation is the assessment of the quality of a predictive model, in accordance with the scientific paradigm in the natural sciences: a model that is able to make accurate predictions (the position of a planet in two weeks’ time) is, in some sense, a “correct” description of reality. In many applications in the natural sciences, unfortunately, validation is hard to do: chemical and biological processes often exhibit significant variation that is unrelated to the model parameters. An example is the circadian rhythm: metabolomic samples, whether from animals or plants, will show very different characteristics when taken at different time points. When the experimental metadata on the exact time point of sampling are missing, it will be very hard to ascribe differences in metabolite levels to differences between patients and controls, or between different varieties of the same plant. Only a rigorous and consistent experimental design can prevent this kind of fluctuation. Moreover, biological variation between individuals often dominates measurement variation. The bigger the variation, the more important it is to have enough samples for validation. Only in this way can reliable error estimates be obtained.
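As a rough illustration of why the number of validation samples matters, the following sketch simulates repeated validation sets of a given size and measures how much the estimated error rate scatters from one set to the next. The true error rate of 0.2, the set sizes of 20 and 200, and the function name `spread_of_error_estimate` are assumptions chosen for illustration only; they do not come from the text.

```python
import random
import statistics


def spread_of_error_estimate(n_samples, true_error=0.2, n_repeats=2000, seed=1):
    """Standard deviation of the estimated error rate over many simulated
    validation sets, each containing n_samples independent predictions.

    true_error is a hypothetical misclassification probability (assumption).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        # Each prediction is wrong with probability true_error.
        n_wrong = sum(rng.random() < true_error for _ in range(n_samples))
        estimates.append(n_wrong / n_samples)
    return statistics.stdev(estimates)


# Compare a small and a large validation set (sizes are illustrative).
spread_small = spread_of_error_estimate(20)
spread_large = spread_of_error_estimate(200)
print(f"n = 20:  spread of error estimate = {spread_small:.3f}")
print(f"n = 200: spread of error estimate = {spread_large:.3f}")
```

Under these assumptions the scatter shrinks roughly with the square root of the number of validation samples, so a tenfold larger validation set gives an error estimate that is about three times more stable.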
Keywords: Random Forest, Bootstrap Sample, Gini Index, Classification, Percentile Method