A Bioinformatic Platform for a Bayesian, Multiphased, Multilevel Analysis in Immunogenomics

  • P. Antal
  • A. Millinghoffer
  • G. Hullám
  • G. Hajós
  • Cs. Szalai
  • A. Falus
Part of the Immunomics Reviews: book series (IMMUN, volume 3)


The accumulation of electronically accessible data and knowledge poses theoretical and practical challenges for study design and statistical data analysis. A central challenge is the fusion of data and knowledge, which consists of using the results of earlier high-throughput measurements of genetic variations, microRNA, and gene expression levels together with biological knowledge bases. We investigate this fusion in the phases of study design, data analysis, and interpretation; specifically, we present methodologies and bioinformatic tools in the Bayesian framework to deepen, lengthen, and broaden it. First, we give an overview of Bayesian decision support for the design of partial genetic association studies (GASs) that incorporates domain literature, knowledge bases, and the results of analyses of earlier studies. Second, we present a Bayesian multilevel analysis (BMLA) for GASs, which performs an integrated analysis at the univariate and multivariate levels and at the level of interactions. Third, we present a Bayesian logic to support interpretation, which integrates the results of data analysis with factual domain knowledge. Finally, we discuss the advantages of the Bayesian framework in coping with small sample sizes, the fusion of data and knowledge, the challenges of multiple testing, meta-analysis, and positive results bias (i.e., the communication of scientific uncertainty). The genomics of asthma serves as the application domain.
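
To make the univariate level of such an analysis concrete, the following is a minimal sketch of a Bayesian test of association for a single SNP. It is not the authors' BMLA method, which works with Bayesian networks; it is a simplified Beta-Binomial model comparison chosen for illustration. The function names, the uniform Beta(1, 1) prior on allele frequency, and the `prior_assoc` parameter are all assumptions introduced here. The sketch compares a model with one shared minor-allele frequency for cases and controls against a model with separate frequencies, and returns the posterior probability of the latter.

```python
from math import lgamma, exp


def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def log_marginal(k, n, a=1.0, b=1.0):
    """Log marginal likelihood of an allele sequence with k minor alleles
    among n, integrating the allele frequency over a Beta(a, b) prior
    (hypothetical default: uniform prior)."""
    return log_beta(a + k, b + n - k) - log_beta(a, b)


def posterior_association(k_case, n_case, k_ctrl, n_ctrl, prior_assoc=0.5):
    """Posterior probability that cases and controls have different
    minor-allele frequencies, given a prior probability of association."""
    # M0: no association, one shared allele frequency for all samples.
    log_m0 = log_marginal(k_case + k_ctrl, n_case + n_ctrl)
    # M1: association, independent allele frequencies per group.
    log_m1 = log_marginal(k_case, n_case) + log_marginal(k_ctrl, n_ctrl)
    bayes_factor = exp(log_m1 - log_m0)
    prior_odds = prior_assoc / (1.0 - prior_assoc)
    post_odds = prior_odds * bayes_factor
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Small sample: 40 alleles per group (illustrative counts, not real data).
    print(posterior_association(30, 40, 5, 40))   # strongly different counts
    print(posterior_association(10, 40, 10, 40))  # identical counts
```

In this sketch the posterior stays a graded probability even for small samples, and `prior_assoc` is the place where knowledge from literature or earlier studies could enter as a prior; how the chapter's tools actually derive such priors is not specified here.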


Bayesian Network; Genetic Association Study; Bayesian Model Average; Markov Blanket; Factual Source
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



We thank Yves Moreau for his insightful suggestion to apply the SNP study design system for prior generation in our Bayesian data analysis. This work was supported by grants from the OTKA National Scientific Research Fund (PD-76348), NKTH TECH_08-A1/2-2008-0120 (Genagrid), and the János Bolyai Research Scholarship of the Hungarian Academy of Sciences (P. Antal).



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  • P. Antal (1)
  • A. Millinghoffer
  • G. Hullám
  • G. Hajós
  • Cs. Szalai
  • A. Falus
  1. Department of Measurement and Information Systems, Budapest University of Technology and Economics, Budapest, Hungary
