Abstract
Compositional data arise naturally as multivariate observations whose components are proportions of some whole. Such observations are common in disciplines such as geology, biology, ecology, economics and chemistry. Because of the unit-sum constraint on compositional data, specialized statistical methods are required to analyze them. Dirichlet distributions were originally used to study compositional data, even though this family of distributions is inappropriate (see Aitchison, 1986) because of its extreme independence properties. Aitchison (1982) sought to provide a viable alternative to existing methods by using the logistic normal distribution to analyze such constrained data. However, this family does not include the Dirichlet class and is therefore unable to address the issue of extreme independence. In this paper, the generalized Liouville family is investigated as a model for compositional data in the presence of covariates. This class permits distributions that admit negative or mixed correlation, contains non-Dirichlet distributions with non-positive correlation, and overcomes deficits in the Dirichlet class. Semiparametric Bayesian methods are proposed to estimate the probability density. Predictive distributions are used to assess the performance of the model. The methods are illustrated on a real data set.
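As a rough illustration of the background the abstract refers to (not the method of the paper), the following sketch draws Dirichlet compositions, checks the unit-sum constraint, estimates the pairwise correlation between two parts, and applies the additive log-ratio transform that underlies the logistic normal approach. The parameter vector `alpha`, the sample size, and all function names are hypothetical choices made for this example only.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet(alpha) composition by normalizing independent gamma variates."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def alr(composition):
    """Additive log-ratio transform: log of each part relative to the last part."""
    last = composition[-1]
    return [math.log(part / last) for part in composition[:-1]]


def correlation(xs, ys):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


rng = random.Random(0)
alpha = [2.0, 3.0, 5.0]  # hypothetical parameters, chosen only for illustration
draws = [sample_dirichlet(alpha, rng) for _ in range(5000)]

# Every draw lies on the simplex: its parts sum to one.
unit_sum_ok = all(abs(sum(d) - 1.0) < 1e-9 for d in draws)

# Pairwise correlation between the first two parts of the Dirichlet draws.
corr_12 = correlation([d[0] for d in draws], [d[1] for d in draws])

# The transformed data are unconstrained and have one fewer coordinate.
alr_draws = [alr(d) for d in draws]
```

Under a Dirichlet model the correlation between any two parts is always negative, so `corr_12` comes out below zero whatever positive `alpha` is chosen. This rigidity is one reason richer families on the simplex are of interest.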
References
Aitchison, J. (1982). The statistical analysis of compositional data. Journal of the Royal Statistical Society, B, 2:139–177.
Aitchison, J. (1985). A general class of distributions on the simplex. Journal of the Royal Statistical Society, B, 47:136–146.
Aitchison, J. (1986). The Statistical Analysis of Compositional Data. Chapman and Hall, London.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. Petrov and F. Czaki, eds., Proceedings of International Symposium on Information Theory, pp. 267–281. Academia Kiado, Budapest.
Barndorff-Nielsen, O. and Jorgensen, B. (1991). Some parametric models on the simplex. Journal of Multivariate Analysis, 39:106–116.
Bhansali, R. and Downham, D. (1977). Some properties of the order of an autoregressive model selected by a generalization of Aitkin’s FPE criterion. Biometrika, 64:541–551.
Breslow, N. (1974). Covariate analysis of censored survival data. Biometrics, 30:89–99.
Coakley, J. and Rust, B. (1968). Sedimentation in an arctic lake. Journal of Sedimentary Petrology, 38:1290–1300.
Cox, D. (1972). Regression models and life tables. Journal of the Royal Statistical Society, B, 34:187–220.
Devroye, L. (1986). Non-uniform Random Variate Generation. Springer-Verlag, New York.
Edwards, J. (1922). A Treatise on the Integral Calculus, vol. II. MacMillan, New York.
Fahrmeir, L. (1994). Dynamic modelling and penalized likelihood estimation for discrete time survival data. Biometrika, 81:317–330.
Gelfand, A. and Mallick, B. (1995). Bayesian analysis of proportional hazards models built from monotone functions. Biometrics, 51:841–848.
Gelfand, A. E., Dey, D. K., and Chang, H. (1992). Model determining using predictive distributions with implementation via sampling-based methods (with discussion). In J. Bernardo, J. Berger, and A. Dawid, eds., Proceedings of the Fourth Valencia International Meeting on Bayesian Statistics, pp. 147–167. Oxford University Press, Oxford.
Gupta, R. and Richards, D. S. P. (1987). Multivariate Liouville distributions. Journal of Multivariate Analysis, 43:233–256.
Gupta, R. and Richards, D. S. P. (1992a). Multivariate Liouville distributions, II. Probability and Mathematical Statistics, 12:291–309.
Gupta, R. and Richards, D. S. P. (1992b). Multivariate Liouville distributions, III. Journal of Multivariate Analysis, 43:29–57.
Hastings, W. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57:97–109.
Iyengar, M. and Dey, D. (1998). Bayesian analysis of compositional analysis. Environmetrics, 9:657–671.
Metropolis, N., Rosenbluth, A., Rosenbluth, M. and Teller, A. (1953). Equations of state calculations by fast computing machines. Journal of Chemical Physics, 21:1087–1091.
Nelder, J. and Wedderburn, R. (1972). Generalized linear models. Journal of the Royal Statistical Society, A, 135:370–384.
Raftery, A. and Lewis, S. (1992). How many iterations in the Gibbs sampler? In J. Bernardo, A. Smith, A. Dawid, and J. Berger, eds., Proceedings of the Fourth Valencia International Meeting on Bayesian Statistics. Oxford University Press, Oxford.
Rayens, W. and Srinivasan, C. (1994). Dependence properties of generalized Liouville distributions on the simplex. Journal of the American Statistical Association, 89:1465–1470.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6:461–464.
Sinha, D. and Dey, D. (1997). Semiparametric Bayesian analysis of survival data. Journal of the American Statistical Association, 92:1195–1212.
Wedderburn, R. (1976). On the existence and uniqueness of the maximum likelihood estimates for certain generalized linear models. Biometrika, 63:27–32.
West, M. and Harrison, J. (1989). Bayesian Forecasting and Dynamic Models. Springer-Verlag, New York.
Whittaker, E. and Watson, G. (1952). A Course in Modern Analysis. Cambridge University Press, Cambridge.
Cite this article
Iyengar, M., Dey, D.K. A semiparametric model for compositional data analysis in presence of covariates on the simplex. Test 11, 303–315 (2002). https://doi.org/10.1007/BF02595709
DOI: https://doi.org/10.1007/BF02595709
Keywords
- Compositional data
- Markov chain Monte Carlo methods
- Posterior predictive distribution
- Semiparametric density estimation