On the Inference of Partially Correlated Data with Applications to Public Health Issues

Part of the ICSA Book Series in Statistics book series (ICSABSS)


Correlated or matched data are frequently collected under many study designs in applied sciences such as the social, behavioral, economic, biological, medical, epidemiologic, health, public health, and drug development sciences, both to make the design more efficient and to control for potential confounding factors in the study. Matching observational or experimental study subjects commonly raises challenges of availability and cost. Researchers frequently encounter situations where, because of missing responses, the observed sample consists of a combination of correlated and uncorrelated data. Ignoring cases with missing responses when analyzing the data will introduce bias in the inference and reduce the power of the testing procedure. As such, the importance of developing new statistical inference methods and new modeling approaches for partially correlated data has grown over the past few decades. These methods attempt to account for the special nature of partially correlated data.

In this chapter, we provide several methods for comparing the means of two Gaussian distributions in the two-sample location problem under the assumption of partially dependent observations. For categorical data, we investigate tests of homogeneity for partially matched-pair data, including different methods of combining tests of homogeneity based on the Pearson chi-square test and the McNemar chi-square test. We also introduce several nonparametric testing procedures that combine all cases in the study.
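As a rough illustration of what combining tests of homogeneity can look like, the following Python sketch computes a McNemar p-value from the matched pairs and a Pearson chi-square p-value from the unmatched subjects, then combines them with the inverse chi-square (Fisher) method and the Tippett method. It is a generic sketch, not necessarily the exact procedure developed in the chapter. The function names and the counts are hypothetical, the outcome is assumed to be binary, no continuity correction is applied, and the two p-values are assumed to be independent because they come from different subjects.

```python
import math

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def mcnemar_p(b, c):
    """McNemar test (no continuity correction) from the two discordant-pair counts."""
    stat = (b - c) ** 2 / (b + c)
    return chi2_sf_df1(stat)

def pearson_2x2_p(a, b, c, d):
    """Pearson chi-square test of homogeneity for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2_sf_df1(stat)

def fisher_combined_p(pvalues):
    """Inverse chi-square method: -2 * sum(log p) is chi-square with 2k df under H0."""
    k = len(pvalues)
    half = -sum(math.log(p) for p in pvalues)  # half of the combined statistic
    # Closed-form upper tail of a chi-square with an even number (2k) of df.
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def tippett_combined_p(pvalues):
    """Tippett method: combined p-value based on the smallest of k p-values."""
    k = len(pvalues)
    return 1.0 - (1.0 - min(pvalues)) ** k

# Hypothetical counts, for illustration only (not data from the chapter).
p_matched = mcnemar_p(b=15, c=5)             # discordant pairs among matched subjects
p_unmatched = pearson_2x2_p(30, 20, 18, 32)  # rows: the two unmatched groups
p_fisher = fisher_combined_p([p_matched, p_unmatched])
p_tippett = tippett_combined_p([p_matched, p_unmatched])
print(p_matched, p_unmatched, p_fisher, p_tippett)
```

The sketch does not guard against degenerate inputs such as no discordant pairs, an empty table margin, or a p-value of exactly zero. A weighted combination, such as the weighted chi-square test listed among the keywords, would need weights for the matched and unmatched parts, which are not specified here.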


McNemar test; Pearson chi-square test; Inverse chi-square method; Weighted chi-square test; Tippett method; Partially matched-pair; Case–control and matching studies; T-test; Z-test; Power of the test; p-Value of the test; Efficiency; Matched pairs sign test; Sign test; Wilcoxon signed-rank test; Correlated and uncorrelated data



We are grateful to the Center for Child & Adolescent Health for providing us with the 2003 National Survey of Children’s Health. Also, we would like to thank the referees and the associate editor for their valuable comments which improved the manuscript.



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  1. Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, USA
  2. Department of Biostatistics, Jiann-Ping Hsu College of Public Health, Georgia Southern University, Statesboro, USA
