The proliferation of machine-learning-based property prediction methods (e.g. QSAR, QSPR, …) has raised the question of how reliable an individual prediction is. This has led to the development of methods for estimating the reliability of a model-based prediction.

There are two principal approaches to meeting this demand: estimating the expected deviation from the prediction (e.g. with Gaussian processes), or classifying each compound according to whether or not the model is applicable to it. The latter approach has become known as estimating the applicability domain (AD) [1, 2] of a model. One drawback of AD estimation methods is that most of them are based on the spatial embedding of the training dataset in the descriptor space. These algorithms are therefore not directly suited to modelling the applicability domain of kernel-based predictors, which work in an extremely high-dimensional implicit feature space.
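To illustrate what an AD estimate based on the spatial embedding of the training set can look like, the following is a minimal sketch of a range-based (bounding box) check in plain Python. The function names `fit_ranges` and `in_range_domain` and the toy descriptor values are hypothetical and chosen for illustration only; they are not taken from the study.

```python
from typing import List, Sequence, Tuple


def fit_ranges(train: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Return the (min, max) of every descriptor over the training set."""
    return [(min(col), max(col)) for col in zip(*train)]


def in_range_domain(x: Sequence[float],
                    ranges: Sequence[Tuple[float, float]]) -> bool:
    """A compound is inside the AD if every descriptor lies in its training range."""
    return all(lo <= v <= hi for v, (lo, hi) in zip(x, ranges))


# Toy descriptor vectors (hypothetical values, not real molecular descriptors).
train = [[1.0, 10.0], [2.0, 30.0], [3.0, 20.0]]
ranges = fit_ranges(train)

print(in_range_domain([2.5, 15.0], ranges))  # inside both descriptor ranges
print(in_range_domain([4.0, 15.0], ranges))  # first descriptor exceeds its range
```

Such a check needs explicit descriptor coordinates, which is why it cannot be applied directly in the implicit feature space of a kernel.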

In this study we examined to what extent standard descriptor-based AD models can be used to describe the applicability domain of a predictor based on the optimal assignment kernel [3]. We split the popular Huuskonen [4] logS dataset 2:1 into a training and a test set and compared several standard AD methods [1, 2] (range-based, convex hull, leverage, …) with regard to the correlation of the estimated AD with the test error. The results indicate that it is possible to estimate the applicability domain of a kernel-based model using classical descriptor encodings of the molecules. Furthermore, the results show that there are significant differences between the methods. In our application the geometrical convex hull approach was superior.
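The idea behind the geometrical convex hull approach can be sketched in two dimensions: a test compound is considered inside the AD if its descriptor vector lies within the convex hull of the training descriptors. The code below is an illustrative assumption, not the implementation used in the study; it uses the standard monotone chain algorithm, the helper names (`cross`, `convex_hull`, `in_hull`) are hypothetical, and real descriptor spaces have many more dimensions than two.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points: List[Point]) -> List[Point]:
    """Monotone chain: hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def in_hull(p: Point, hull: List[Point]) -> bool:
    """True if p lies inside or on the boundary of a counter-clockwise hull."""
    n = len(hull)
    if n < 3:
        return False
    return all(cross(hull[i], hull[(i + 1) % n], p) >= 0 for i in range(n))


# Toy 2D training descriptors (hypothetical values).
train = [(0.0, 0.0), (4.0, 0.0), (0.0, 4.0), (1.0, 1.0)]
hull = convex_hull(train)

print(in_hull((1.0, 1.0), hull))  # inside the training hull
print(in_hull((3.0, 3.0), hull))  # within each descriptor's range, but outside the hull
```

The second query point shows how the convex hull can be stricter than a per-descriptor range check: both of its coordinates fall between 0 and 4, yet it lies outside the region actually spanned by the training data.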