A semiparametric model for compositional data analysis in presence of covariates on the simplex

Test

Abstract

Compositional data arise naturally as multivariate observations whose components are proportions of some whole. Such observations are common in disciplines such as geology, biology, ecology, economics and chemistry. Because of the unit-sum constraint, specialized statistical methods are required to analyze these data. Dirichlet distributions were originally used to study compositional data, even though this family is not appropriate (see Aitchison, 1986) because of its extreme independence properties. Aitchison (1982) sought to provide a viable alternative by employing the logistic normal distribution to analyze such constrained data. However, this family does not include the Dirichlet class and is therefore unable to address the issue of extreme independence. In this paper the generalized Liouville family is investigated as a model for compositional data in the presence of covariates. This class permits distributions that admit negative or mixed correlation, contains non-Dirichlet distributions with non-positive correlation, and overcomes deficits in the Dirichlet class. Semiparametric Bayesian methods are proposed to estimate the probability density. Predictive distributions are used to assess the performance of the model. The methods are illustrated on a real data set.
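
The two classical models mentioned in the abstract can be illustrated with a small Python sketch. It is not taken from the paper: the function names and the parameter vector are hypothetical, chosen only for illustration. It computes the exact pairwise correlation of a Dirichlet vector, which is negative for every pair of components, draws a composition satisfying the unit-sum constraint, and applies the additive log-ratio transform that underlies Aitchison's logistic normal model.

```python
import math
import random


def dirichlet_correlation(alpha, i, j):
    """Exact correlation between parts i and j (i != j) of a Dirichlet(alpha) vector."""
    a0 = sum(alpha)
    # Follows from Cov = -a_i a_j / (a0^2 (a0 + 1)) and
    # Var = a_i (a0 - a_i) / (a0^2 (a0 + 1)).
    return -math.sqrt(alpha[i] * alpha[j] / ((a0 - alpha[i]) * (a0 - alpha[j])))


def sample_dirichlet(alpha, rng):
    """Draw one composition by normalizing independent Gamma(alpha_k, 1) variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def alr(x):
    """Additive log-ratio transform: log of each part relative to the last part."""
    return [math.log(v / x[-1]) for v in x[:-1]]


def alr_inverse(y):
    """Map an unconstrained vector back to the simplex."""
    expd = [math.exp(v) for v in y] + [1.0]
    total = sum(expd)
    return [v / total for v in expd]


if __name__ == "__main__":
    alpha = [2.0, 3.0, 5.0]          # hypothetical parameters
    rng = random.Random(0)
    x = sample_dirichlet(alpha, rng)
    print("composition:", x, "sum:", sum(x))
    print("corr(X1, X2):", dirichlet_correlation(alpha, 0, 1))
    print("alr(x):", alr(x))
```

Whatever positive parameters are supplied, `dirichlet_correlation` returns a negative value, which is the restriction that motivates looking beyond the Dirichlet class.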

References

  • Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society B, 44:139–177.

  • Aitchison, J. (1985). A general class of distributions on the simplex. Journal of the Royal Statistical Society B, 47:136–146.

  • Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman and Hall, London.

  • Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. Petrov and F. Csaki, eds., Proceedings of International Symposium on Information Theory, pp. 267–281. Akademiai Kiado, Budapest.

  • Barndorff-Nielsen, O. and Jorgensen, B. (1991). Some parametric models on the simplex. Journal of Multivariate Analysis, 39:106–116.

  • Bhansali, R. and Downham, D. (1977). Some properties of the order of an autoregressive model selected by a generalization of Akaike’s FPE criterion. Biometrika, 64:541–551.

  • Breslow, N. (1974). Covariate analysis of censored survival data. Biometrics, 30:89–99.

  • Coakley, J. and Rust, B. (1968). Sedimentation in an arctic lake. Journal of Sedimentary Petrology, 38:1290–1300.

  • Cox, D. (1972). Regression models and life tables. Journal of the Royal Statistical Society B, 34:187–220.

  • Devroye, L. (1986). Non-uniform Random Variate Generation. Springer-Verlag, New York.

  • Edwards, J. (1922). A Treatise on the Integral Calculus, vol. II. MacMillan, New York.

  • Fahrmeir, L. (1994). Dynamic modelling and penalized likelihood estimation for discrete time survival data. Biometrika, 81:317–330.

  • Gelfand, A. and Mallick, B. (1995). Bayesian analysis of proportional hazards models built from monotone functions. Biometrics, 51:841–848.

  • Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determination using predictive distributions with implementation via sampling-based methods (with discussion). In J. Bernardo, J. Berger, and A. Dawid, eds., Proceedings of the Fourth Valencia International Meeting on Bayesian Statistics, pp. 147–167. Oxford University Press, Oxford.

  • Gupta, R. and Richards, D. S. P. (1987). Multivariate Liouville distributions. Journal of Multivariate Analysis, 23:233–256.

  • Gupta, R. and Richards, D. S. P. (1992a). Multivariate Liouville distributions, II. Probability and Mathematical Statistics, 12:291–309.

  • Gupta, R. and Richards, D. S. P. (1992b). Multivariate Liouville distributions, III. Journal of Multivariate Analysis, 43:29–57.

  • Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109.

  • Iyengar, M. and Dey, D. (1998). Bayesian analysis of compositional analysis. Environmetrics, 9:657–671.

  • Metropolis, N., Rosenbluth, A., Rosenbluth, M. and Teller, A. (1953). Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087–1091.

  • Nelder, J. and Wedderburn, R. (1972). Generalized linear models. Journal of the Royal Statistical Society A, 135:370–384.

  • Raftery, A. and Lewis, S. (1992). How many iterations in the Gibbs sampler? In J. Bernardo, A. Smith, A. Dawid, and J. Berger, eds., Proceedings of the Fourth Valencia International Meeting on Bayesian Statistics. Oxford University Press, Oxford.

  • Rayens, W. and Srinivasan, C. (1994). Dependence properties of generalized Liouville distributions on the simplex. Journal of the American Statistical Association, 89:1465–1470.

  • Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6:461–464.

  • Sinha, D. and Dey, D. (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92:1195–1212.

  • Wedderburn, R. (1976). On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika, 63:27–32.

  • West, M. and Harrison, J. (1989). Bayesian Forecasting and Dynamic Models. Springer-Verlag, New York.

  • Whittaker, E. and Watson, G. (1952). A Course of Modern Analysis. Cambridge University Press, Cambridge.

Corresponding author

Correspondence to Dipak K. Dey.

About this article

Cite this article

Iyengar, M., Dey, D.K. A semiparametric model for compositional data analysis in presence of covariates on the simplex. Test 11, 303–315 (2002). https://doi.org/10.1007/BF02595709

  • DOI: https://doi.org/10.1007/BF02595709
