
Graph_sampler: a simple tool for fully Bayesian analyses of DAG-models

Abstract

Bayesian networks (BNs) are widely used graphical models for drawing statistical inferences about directed acyclic graphs (DAGs). We present Graph_sampler, a fast, free C-language software package for structural inference on BNs. Graph_sampler takes a fully Bayesian approach in which the marginal likelihood of the data and prior information about the network structure are combined. The software handles both continuous and discrete data, and a different model is formulated depending on the data type. It also offers a wide variety of structure priors that can describe either global or local properties of the graph structure. Depending on the structure prior selected, a wide range of prior values can be specified, making the prior either informative or uninformative. We also propose a new and much faster jumping-kernel strategy for the Metropolis–Hastings algorithm. The distributed C source code is compact and fast, and uses little memory and disk storage. We performed several analyses on different simulated data sets and on synthetic as well as real networks to assess the performance of Graph_sampler.


Acknowledgements

S. Datta is funded by a Ph.D. studentship from the French Ministry of Research. The research leading to these results has received funding from the Innovative Medicines Initiative Joint Undertaking, under Grant Agreement No. 115439 (StemBANCC), whose resources are composed of a financial contribution from the European Union Seventh Framework Programme (FP7/2007-2013) and in-kind contributions from EFPIA companies. This publication reflects only the authors' views, and neither the IMI JU nor EFPIA nor the European Commission is liable for any use that may be made of the information contained therein.

Author information

Corresponding author

Correspondence to Sagnik Datta.

Appendix: Graph_sampler installation

Graph_sampler is free software that can be redistributed or modified under the terms of the GNU General Public License as published by the Free Software Foundation. It is both an inference and a simulation tool for DAGs: it can simulate random graphs, either general directed graphs or DAGs, and, in the case of BNs, it infers their probable structure through the joint use of priors and data on node values.

Graph_sampler is written in ANSI-standard C and can be compiled on any system with an ANSI C compliant compiler. The GNU gcc compiler (freeware) is highly recommended, and the automated compilation script provided (a Makefile) can be used if the standard 'make' command is available. The 'lex' and 'yacc' tools are needed only if you want to modify the input file parser. The full software, along with its manual, can be downloaded from:

https://sites.google.com/site/utcchairmmbsptp/software
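
For reference, on a Unix-like system the downloaded archive can typically be unpacked and built with commands such as the following (the archive and directory names are illustrative and may differ from the actual release):

gunzip graph_sampler.tar.gz
tar xf graph_sampler.tar
cd graph_sampler
make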

Once downloaded, the software should be decompressed using the 'gunzip' and 'tar' commands, as illustrated above; other archiving tools can also be used. Graph_sampler is then compiled with the 'make' command, after which it is ready to run. Running Graph_sampler requires an input file specifying the simulation parameters. On Unix, the command-line syntax for the executable is:

graph_sampler [input-file [output-prefix]]

where the brackets indicate optional arguments. If no input file and/or output prefix is specified, the program uses the defaults. The default input file is script.txt, and the output files created depend on the parameters specified in the input file. The default output file names are best_graph.out, graph_samples.out, degree_count.out, motifs_count.out, edge_p.out and results_mcmc.bin.
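
For example, assuming an input file named my_script.txt in the current directory (a purely illustrative name), Graph_sampler could be invoked in any of the following ways:

graph_sampler
graph_sampler my_script.txt
graph_sampler my_script.txt run1_

The first form falls back on the default script.txt; the second reads my_script.txt and writes the default output files listed above; the third additionally supplies run1_ as the output prefix, which is presumably used in naming the output files (see the manual for the exact naming convention).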

A Graph_sampler input file is a text (ASCII) file that obeys a relatively simple syntax (see the manual). The values of all the predefined variables in the input file should be properly defined; the description and allowed range of each variable are given in the manual. In case of improper assignment of values, Graph_sampler posts error messages at runtime.


Cite this article

Datta, S., Gayraud, G., Leclerc, E. et al. Graph_sampler: a simple tool for fully Bayesian analyses of DAG-models. Comput Stat 32, 691–716 (2017). https://doi.org/10.1007/s00180-017-0719-1

