Abstract
A common practice in statistics is to log-transform highly skewed data and construct confidence intervals for the population average from the transformed data. However, a confidence interval computed from log-transformed data is an interval for the geometric average, not the arithmetic average, and neglecting this distinction can lead to misleading conclusions. In this paper, we consider an approach based on a regression between the two sample averages to convert the confidence interval for the geometric average into a confidence interval for the arithmetic average of the original, untransformed data. The proposed approach is substantially simpler to implement than existing methods. An extensive Monte Carlo and bootstrap simulation study suggests that it outperforms them in terms of coverage probability, even at very small sample sizes. Analyses of several real data examples support the simulation findings.
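The sketch below makes the geometric-versus-arithmetic distinction concrete. It is only an illustration of the problem and is not the regression-based method proposed in the paper. It computes a normal-approximation interval on the log scale and exponentiates the endpoints. The resulting interval is centred multiplicatively on the geometric mean, exp(mean of the logs), which for right-skewed data lies below the arithmetic mean. Several choices here are assumptions made for brevity:

* the helper name `log_scale_interval`;
* the made-up sample;
* the use of a z quantile instead of a t quantile.

```python
import math
import statistics


def log_scale_interval(data, level=0.95):
    """Exponentiated normal-approximation CI computed on log-transformed data.

    Hypothetical helper for illustration only: it uses a z quantile rather
    than a t quantile, which is reasonable only for moderate to large samples.
    """
    logs = [math.log(x) for x in data]  # requires strictly positive data
    n = len(logs)
    mean_log = statistics.fmean(logs)
    se_log = statistics.stdev(logs) / math.sqrt(n)
    z = statistics.NormalDist().inv_cdf(0.5 + level / 2)
    # Back-transforming the endpoints gives an interval for exp(E[log X]),
    # i.e. the population geometric mean, not the arithmetic mean.
    return math.exp(mean_log - z * se_log), math.exp(mean_log + z * se_log)


# Illustrative right-skewed sample (made up for this example).
sample = [1, 2, 4, 8, 16, 32, 64, 128]

arith = statistics.fmean(sample)
geo = statistics.geometric_mean(sample)
lo, hi = log_scale_interval(sample)

print(f"arithmetic mean = {arith:.3f}")
print(f"geometric mean  = {geo:.3f}")
print(f"back-transformed 95% interval = ({lo:.3f}, {hi:.3f})")
# The interval is symmetric around log(geometric mean) on the log scale,
# so the geometric mean of its endpoints recovers the sample geometric mean.
print(f"sqrt(lo * hi) = {math.sqrt(lo * hi):.3f}")
```

For this sample the geometric mean (about 11.31) is well below the arithmetic mean (31.875). The back-transformed interval is centred on the former, so reporting it as an interval for the arithmetic mean would misstate what it estimates.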
Notes
We use “average” or “mean” interchangeably.
The use of bootstrapping was also suggested by one of the reviewers.
Acknowledgements
The author thanks the Editor-in-Chief and referees for their valuable comments and constructive suggestions. This work was supported by Fundação para a Ciência e a Tecnologia, Grant UIDB/00315/2020.
Ethics declarations
Conflict of interest
The author declares no conflict of interest.
About this article
Cite this article
Curto, J.D. Inference about the arithmetic average of log transformed data. Stat Papers 64, 179–204 (2023). https://doi.org/10.1007/s00362-022-01315-x
DOI: https://doi.org/10.1007/s00362-022-01315-x