U-statistics with conditional kernels for incomplete data models

  • Ao Yuan
  • Mihai Giurcanu
  • George Luta
  • Ming T. Tan


For incomplete data models, the classical U-statistic estimator of a functional parameter of the underlying distribution cannot be computed directly since the data are not fully observed. To estimate such a functional parameter, we propose a U-statistic using a substitution estimator of the conditional kernel given the observed data. This kernel estimator is obtained by substituting the non-parametric maximum likelihood estimator for the underlying distribution function in the expression of the conditional kernel. We study the asymptotic properties of the proposed U-statistic for several incomplete data models, and in a simulation study, we assess the finite sample performance of the Mann–Whitney U-statistic with conditional kernel in the current status model. The analysis of a real-world data set illustrates the application of the proposed methods in practice.
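The construction described in the abstract can be sketched in code for a symmetric kernel of degree two. This is a minimal illustration under stated assumptions, not the authors' implementation or the interface of any package: it assumes the estimate of the underlying distribution is already available as a discrete distribution (support points and masses), that each incomplete observation only tells us an interval containing the unobserved value, and that all function names (conditional_weights, current_status_intervals, u_stat_conditional) are hypothetical. Computing the non-parametric maximum likelihood estimator itself is not shown.

```python
from itertools import combinations

import numpy as np


def conditional_weights(support, mass, lower, upper):
    """Conditional pmf of X given lower < X <= upper under a discrete estimate.

    `support` and `mass` describe the (assumed precomputed) estimate of the
    underlying distribution; the interval is what the observed data reveal.
    """
    support = np.asarray(support, dtype=float)
    mass = np.asarray(mass, dtype=float)
    inside = (support > lower) & (support <= upper)
    w = np.where(inside, mass, 0.0)
    total = w.sum()
    if total <= 0:
        raise ValueError("estimated distribution puts no mass on the interval")
    return w / total


def current_status_intervals(times, deltas):
    """Map current status data (C_i, delta_i) to intervals containing X_i.

    delta_i = 1 means X_i <= C_i; delta_i = 0 means X_i > C_i.
    """
    return [(-np.inf, c) if d == 1 else (c, np.inf)
            for c, d in zip(times, deltas)]


def u_stat_conditional(intervals, support, mass, kernel):
    """U-statistic whose kernel is replaced by its estimated conditional mean.

    `kernel` must be symmetric, take two arguments, and broadcast over arrays.
    """
    support = np.asarray(support, dtype=float)
    # K[s, t] = kernel evaluated at every pair of support points.
    K = kernel(support[:, None], support[None, :])
    # Row i holds the conditional pmf of X_i given its observed interval.
    W = np.array([conditional_weights(support, mass, lo, hi)
                  for lo, hi in intervals])
    n = len(intervals)
    if n < 2:
        raise ValueError("need at least two observations")
    total = 0.0
    for i, j in combinations(range(n), 2):
        # Unobserved values of different subjects are conditionally independent,
        # so the conditional kernel is a weighted double sum over the support.
        total += W[i] @ K @ W[j]
    return total / (n * (n - 1) / 2)


def variance_kernel(x, y):
    """Symmetric kernel whose U-statistic is the unbiased sample variance."""
    return 0.5 * (x - y) ** 2


# Illustrative inputs only: a made-up three-point estimate and two observations.
support = [1.0, 2.0, 4.0]
mass = [0.2, 0.3, 0.5]
intervals = current_status_intervals(times=[2.0, 1.0], deltas=[1, 0])
estimate = u_stat_conditional(intervals, support, mass, variance_kernel)
```

When every interval isolates a single support point, the conditional weights become point masses and the function reduces to the classical complete-data U-statistic, which is a useful sanity check on the sketch.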


Keywords: U-statistics; Censored data; Incomplete data models; Non-parametric MLE



Copyright information

© The Institute of Statistical Mathematics, Tokyo 2015

Authors and Affiliations

  • Ao Yuan (1), corresponding author
  • Mihai Giurcanu (2)
  • George Luta (1)
  • Ming T. Tan (1)

  1. Department of Biostatistics, Bioinformatics and Biomathematics, Georgetown University, Washington, USA
  2. Department of Statistics, University of Florida, Gainesville, USA
