Inference about the arithmetic average of log transformed data

  • Regular Article
  • Statistical Papers

Abstract

A common practice in statistics is to log-transform highly skewed data and construct confidence intervals for the population average from the transformed data. However, a confidence interval computed from log-transformed data is an interval for the geometric average, not the arithmetic average, and neglecting this distinction can lead to misleading conclusions. In this paper, we consider an approach based on a regression of the two sample averages to convert the confidence interval for the geometric average into a confidence interval for the arithmetic average of the original untransformed data. The proposed approach is substantially simpler to implement than existing methods, and an extensive Monte Carlo and bootstrap simulation study suggests that it outperforms them in terms of coverage probability, even at very small sample sizes. Real data examples are also analyzed, and they support the simulation findings.
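The distinction between the two averages can be illustrated with a small simulation. The sketch below is not the regression approach proposed in the paper; it only shows, under stated assumptions, that exponentiating an interval built on the log scale targets the geometric average. The assumptions are ours: the data are simulated from a lognormal distribution with parameters `mu = 0` and `sigma = 1`, the interval uses a large-sample normal quantile rather than a Student t quantile, and the helper name `back_transformed_ci` is hypothetical.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def back_transformed_ci(data, level=0.95):
    """Exponentiate a normal-approximation CI for the mean of log(data).

    Hypothetical helper for illustration only. The resulting interval is
    on the original scale but estimates the geometric average.
    """
    logs = [math.log(x) for x in data]
    n = len(logs)
    center = fmean(logs)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half_width = z * stdev(logs) / math.sqrt(n)
    return math.exp(center - half_width), math.exp(center + half_width)


# Assumed data-generating process: lognormal with mu = 0, sigma = 1.
rng = random.Random(0)
mu, sigma = 0.0, 1.0
sample = [rng.lognormvariate(mu, sigma) for _ in range(5000)]

# Population values for a lognormal distribution.
geometric_mean = math.exp(mu)                  # exp(0) = 1
arithmetic_mean = math.exp(mu + sigma**2 / 2)  # exp(0.5), about 1.649

low, high = back_transformed_ci(sample)
sample_mean = fmean(sample)

print(f"back-transformed CI: ({low:.3f}, {high:.3f})")
print(f"population geometric mean:  {geometric_mean:.3f}")
print(f"population arithmetic mean: {arithmetic_mean:.3f}")
print(f"sample arithmetic mean:     {sample_mean:.3f}")
```

In this setup the whole interval lies well below the arithmetic average, so reading it as an interval for the arithmetic average would be misleading, which is the problem the abstract describes.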

Notes

  1. We use “average” and “mean” interchangeably.

  2. The use of bootstrapping was also suggested by one of the reviewers.

Acknowledgements

The author thanks the Editor-in-Chief and referees for their valuable comments and constructive suggestions. This work was supported by Fundação para a Ciência e a Tecnologia, Grant UIDB/00315/2020.

Author information

Corresponding author

Correspondence to José Dias Curto.

Ethics declarations

Conflict of interest

The author declares that he has no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Curto, J.D. Inference about the arithmetic average of log transformed data. Stat Papers 64, 179–204 (2023). https://doi.org/10.1007/s00362-022-01315-x

  • DOI: https://doi.org/10.1007/s00362-022-01315-x
