The statistical analysis of multivariate serological frequency data
Data in the form of frequencies are common in genetics, for example in serology. Examples are provided by the ABO group, the Rhesus group, and also DNA data. The statistical analysis of tables of frequencies is carried out with the available methods of multivariate analysis, usually with three principal aims. The first is to seek meaningful relationships between the components of a data set, the second is to examine relationships between the populations from which the data have been obtained, and the third is to reduce dimensionality. The third aim is usually realized by means of bivariate scatter diagrams of scores computed from a multivariate analysis. The multivariate statistical analysis of tables of frequencies cannot safely be carried out by standard multivariate procedures, because such tables represent compositions, whose parts sum to a constant, and are therefore embedded in simplex space, a subspace of full space. Appropriate procedures for simplex space are compared and contrasted with simple standard methods of multivariate analysis (“raw” principal component analysis). The study shows that the differences between a log-ratio model and a simple logarithmic transformation of proportions may not be very great, particularly as regards graphical ordinations, but important discrepancies do occur. The divergences between logarithmically based analyses and the analysis of raw data are, however, great. Published data on Rhesus alleles observed for Italian populations are used to exemplify the subject.
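The three analyses compared in the abstract can be sketched as follows. This is a minimal illustration, not the authors' procedure: it assumes that the log-ratio model is Aitchison's centred log-ratio (clr) transformation, that principal component scores are obtained by a singular value decomposition of the column-centred data, and that all frequencies are strictly positive. The helper names (`closure`, `clr`, `pca_scores`) and the frequency table are hypothetical and serve only to make the example run; they are not the Italian Rhesus data.

```python
import numpy as np

def closure(x):
    """Rescale each row to sum to 1, so every row is a composition."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=1, keepdims=True)

def clr(p):
    """Centred log-ratio: log of each part minus the row mean of the logs.

    Assumes strictly positive parts; zeros need separate treatment.
    """
    logp = np.log(p)
    return logp - logp.mean(axis=1, keepdims=True)

def pca_scores(z, n_components=2):
    """Principal component scores from an SVD of the column-centred data."""
    zc = z - z.mean(axis=0)
    u, s, _ = np.linalg.svd(zc, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical allele frequencies: 5 populations (rows) x 4 alleles (columns).
# These numbers are invented for illustration only.
freqs = closure([
    [0.45, 0.35, 0.12, 0.08],
    [0.50, 0.30, 0.15, 0.05],
    [0.40, 0.38, 0.10, 0.12],
    [0.55, 0.25, 0.14, 0.06],
    [0.42, 0.36, 0.11, 0.11],
])

raw_scores = pca_scores(freqs)          # "raw" PCA on the proportions
log_scores = pca_scores(np.log(freqs))  # simple logarithmic transformation
clr_scores = pca_scores(clr(freqs))     # log-ratio (clr) analysis
```

Plotting the first two columns of each score matrix against one another gives the bivariate ordinations whose agreement or disagreement the abstract discusses.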