Abstract
Continuous populations are routinely grouped in social, economic, medical, and technical research. Grouping, however, discards much of the information that the continuous population provides. The median split, which many researchers still use, and its generalization to an equiprobable k-group split are especially inefficient. Here, this loss of information is investigated analytically and numerically for several typical symmetric and skewed population distributions that are common in applications. Distribution parameters, numbers of groups, and split methods are taken from theoretical considerations and real data sets. Losses, which sometimes exceed 50%, can be reduced by optimal grouping.
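To give a feel for the size of such losses, the following is a minimal sketch and not the authors' method. It assumes a standard normal population and assumes that efficiency is measured as the share of the total variance retained by the group means; the abstract does not state the efficiency measure used in the article, so this choice, like the function name `retained_variance_equiprobable`, is an illustrative assumption.

```python
import math
from statistics import NormalDist

# Assumption: the population is standard normal, so the total variance is 1.
STD_NORMAL = NormalDist(0.0, 1.0)


def retained_variance_equiprobable(k: int) -> float:
    """Share of the variance of a standard normal variable that is kept
    by the group means under an equiprobable k-group split (k >= 2)."""
    if k < 2:
        raise ValueError("k must be at least 2")
    # Interior cut points are the i/k quantiles; the outer bounds are infinite.
    cuts = (
        [-math.inf]
        + [STD_NORMAL.inv_cdf(i / k) for i in range(1, k)]
        + [math.inf]
    )
    p = 1.0 / k  # every group has the same probability
    retained = 0.0
    for lo, hi in zip(cuts, cuts[1:]):
        # The density vanishes at the infinite bounds.
        pdf_lo = 0.0 if math.isinf(lo) else STD_NORMAL.pdf(lo)
        pdf_hi = 0.0 if math.isinf(hi) else STD_NORMAL.pdf(hi)
        # Conditional mean of N(0, 1) on (lo, hi) is (pdf(lo) - pdf(hi)) / p.
        group_mean = (pdf_lo - pdf_hi) / p
        retained += p * group_mean ** 2
    return retained


if __name__ == "__main__":
    for k in (2, 3, 4, 5):
        share = retained_variance_equiprobable(k)
        print(f"k={k}: retained share {share:.4f}, loss {1.0 - share:.4f}")
```

Under these assumptions the median split (k = 2) retains 2/π ≈ 0.637 of the variance, a loss of about 36%. The sketch covers only equiprobable cut points on a symmetric distribution; the skewed distributions and the optimal cut points that the abstract mentions are outside its scope.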
Cite this article
Knüppel, L., Hermsen, O. Median split, k-group split, and optimality in continuous populations. AStA Adv Stat Anal 94, 53–74 (2010). https://doi.org/10.1007/s10182-010-0122-5
DOI: https://doi.org/10.1007/s10182-010-0122-5