The aim of this study was to analyse the predictive power of patient characteristics with respect to length of stay. A further aim was to use patient characteristics to predict episodes with extreme length of stay. Significant effects on length of stay were found for seven patient characteristics. Affective disorder as main diagnosis, severity of disease and chronicity of disease were associated with longer length of stay. Danger to others, substance-related disorders as main diagnosis, the daily requirement of somatic care and male gender were associated with shorter length of stay. The cross-validated fit of the model was medium (R2 = 14 %) and the RMSE in out-of-sample prediction was comparatively high (40 days). Furthermore, the accuracy of out-of-sample prediction of extreme length of stay was low according to common standards.
Strengths and weaknesses of the study
A strength of this study was the relatively large number of patient characteristics available from the electronic medical records. Moreover, these data were consistent because their completeness and plausibility were routinely checked and requests to complete missing documentation were sent out in a timely manner. Furthermore, the patient-related variables used for prediction were documented at admission. Hence, they were external factors and therefore appropriate measures for prospective payment systems.
A limitation of this study was its single-site design. The results therefore reflect the care provided at the study site and potentially incorporate local idiosyncrasies. This raises the issue of generalizability. However, the study sample was relatively large, with care provided at eight wards for 738 inpatient episodes across all main diagnostic groups. Therefore, a rather comprehensive picture was acquired. Nevertheless, it remains uncertain whether specific patient variables would have other effects on length of stay under different clinical circumstances and with different types of patients.
Results in relation to prior research
The direction of effects found in significant coefficients of patient characteristics was as expected and in agreement with related findings of prior research and clinical experience. This applied to affective disorders, severity of disease, chronicity of disease, male gender and substance-related disorders. The direction of effects found for danger to others and daily requirement of somatic care might appear less intuitive and is discussed below.
The decreasing effect of documented danger to others on length of stay might appear to contradict the intuitive assumption that more dangerous patients should require longer stays. However, Warnke et al. found that patients with increased hostility experienced significantly shorter stays. Boot et al. found aggression to be a significant predictor of short stays. Moreover, Lansing et al. found no differences in length of stay between patients who are dangerous to others and patients who are not. A possible explanation for the shorter rather than longer length of stay associated with documented danger to others in the present study is a concurrent lack of treatment compliance, which could have led to earlier discharges. This effect could have been robust to controlling for discharge against medical advice, since lack of compliance might have resulted in consented discharges. Furthermore, since the condition of dangerousness was documented at admission, stabilisation of acute crises during the first days of stay might have allowed early discharges.
Furthermore, the present study found the daily requirement of somatic care to be associated with shorter length of stay. This is counterintuitive and in contrast to most prior research. For instance, Lyketsos et al. found that somatic illness documented as ‘a focus of care’ was related to longer length of stay in psychiatric inpatients. Sloan et al. and Schubert et al. found somatic comorbidities to be associated with longer length of stay in depressed patients. A probably relevant difference between these studies and the present study is that the former were carried out in the USA, where overall length of stay was substantially shorter. For instance, the mean length of stay in Lyketsos et al. ranged between 7.5 days and 13.2 days, and somatic illness as ‘a focus of care’ was associated with an increase of 3.2 days. While the requirement to alleviate acute somatic ailments before discharge might have prolonged the relatively short stays in these studies, other factors related to somatic ailments, such as patients' difficulties in participating in therapeutic measures, might have shortened the relatively long stays in the present study. Comparable results of decreasing length of stay related to medical comorbidities were also found by Ismail et al., but only for patients with dementia.
The model showed a ‘medium’ to ‘large’ fit (R2 = 22 %) in the complete sample according to common standards in the social sciences and in comparison to prior research findings. Of twenty studies included in a recent systematic review, eight studies showed better fit, ten studies showed lower fit and two studies did not report the respective statistics.
Split-sample cross-validation was carried out by using the half of episodes admitted first for estimation and the other half for validation. This approach was taken instead of splitting at random in order to come closer to the concept of applying the model to external patients, who might be subject to changes in clinical circumstances. As expected, the model fit was better in the estimation sample (R2 = 28 %) than in the validation sample (R2 = 14 %). Furthermore, estimating the model for the first half of admitted patients yielded a better fit than estimating it for the complete sample (R2 = 22 %). A potential reason for this difference might be the higher mean and variance of length of stay in patients admitted at the end of the year, i.e. a mean of 64 days in quarter four compared to 55 days in quarters one to three (p.003).
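The chronological split described above can be illustrated with a minimal sketch. It assumes an ordinary least-squares model fitted with NumPy, which is not necessarily the model specification used in this study. The function names, the three synthetic predictors and the simulated lengths of stay are hypothetical and serve only to show the mechanics: sort episodes by admission, estimate on the first half, and compare R2 in both halves.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def chronological_split_validation(admission_order, X, y):
    """Fit OLS on the earlier half of episodes, validate on the later half."""
    # Sort episodes by admission so that the split is temporal, not random.
    order = np.argsort(admission_order, kind="stable")
    X, y = X[order], y[order]
    half = len(y) // 2

    # Add an intercept column and estimate coefficients on the first half only.
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design[:half], y[:half], rcond=None)

    r2_estimation = r_squared(y[:half], design[:half] @ coef)
    r2_validation = r_squared(y[half:], design[half:] @ coef)
    return coef, r2_estimation, r2_validation


# Synthetic illustration only: these are NOT the study data.
rng = np.random.default_rng(0)
n_episodes = 738
admission_order = rng.permutation(n_episodes)
X = rng.normal(size=(n_episodes, 3))  # three hypothetical characteristics
y = 55 + X @ np.array([10.0, -6.0, 4.0]) + rng.normal(scale=30.0, size=n_episodes)

coef, r2_est, r2_val = chronological_split_validation(admission_order, X, y)
print(f"R2 estimation: {r2_est:.2f}, R2 validation: {r2_val:.2f}")
```

Because the coefficients are estimated on the first half only, R2 in the validation half is not bounded below by zero and is usually lower than in the estimation half, which matches the pattern reported above.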
A more informative but far less frequently reported measure of predictive accuracy is the root mean squared error (RMSE), which measures the typical deviation of the modelled from the observed length of stay. The RMSE of 40 days found by the present study in out-of-sample prediction was high with respect to a median of 56 days and an interquartile range of 59 days in the validation sample (not shown in results). However, the poor performance in predicting individual episodes’ length of stay alongside strong and highly significant covariates was not a contradiction but has been found to be a common trait of results from predictive modelling.
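To make the comparison of the RMSE with the median and interquartile range concrete, the following minimal sketch computes the RMSE and relates it to both quantities. The function names and the toy values are hypothetical; only the figures of 40, 56 and 59 days in the closing note are taken from the text above.

```python
import numpy as np


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((observed - predicted) ** 2)))


def rmse_in_context(observed, predicted):
    """Relate the RMSE to the median and interquartile range of the outcome."""
    q1, median, q3 = np.percentile(observed, [25, 50, 75])
    error = rmse(observed, predicted)
    return {
        "rmse": error,
        "rmse_over_median": error / median,
        "rmse_over_iqr": error / (q3 - q1),
    }


# Toy values in days, for illustration only (not study data).
observed_days = [12, 35, 56, 80, 140]
predicted_days = [40, 50, 60, 65, 90]
summary = rmse_in_context(observed_days, predicted_days)
print(summary)
```

With the figures reported above, the RMSE corresponds to roughly 0.71 of the median (40/56) and 0.68 of the interquartile range (40/59), which illustrates why the prediction of individual episodes was judged to be imprecise.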