Biological Theory, Volume 4, Issue 1, pp 21–28

Megavariate Genetics: What You Find Is What You Go Looking For

  • Clive E. Bowman


The subjectivity or “purpose dependency” of measurement in biology is discussed using examples from high-dimensional medical genetic research. The human observer and study designer tacitly determine the numerical and graphical representation of biological simplicity or complexity through their choice of ascertainment (sampling frame), of which numbers to measure, of referential basis, and of statistical learning formalism and feature search, and also through their selection of display styles (cognitive analogies) for all these quantifications.


Keywords: aesthetic · complexity · fit · human choice · imagination · model · Occam’s razor · relativism · simplicity · sufficiency




Copyright information

© Konrad Lorenz Institute for Evolution and Cognition Research 2009

Authors and Affiliations

  1. School of Biological Sciences, University of Reading, Reading, UK
