Abstract
Bayesian networks (BNs) are widely used graphical models for drawing statistical inference about directed acyclic graphs (DAGs). We present Graph_sampler, a fast, free C-language software for structural inference on BNs. Graph_sampler uses a fully Bayesian approach that takes into account both the marginal likelihood of the data and prior information about the network structure. The software handles both continuous and discrete data, with a separate model formulated for each data type. It also provides a wide variety of structure priors, which can describe either global or local properties of the graph structure. Depending on the type of structure prior selected, a wide range of prior values can be set, making the prior either informative or uninformative. We propose a new and much faster jumping kernel strategy for the Metropolis–Hastings algorithm. The distributed C source code is compact and fast, and requires little memory and disk storage. We carried out several analyses on simulated data sets and on synthetic as well as real networks to assess the performance of Graph_sampler.
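To make the sampling idea concrete, the following is a minimal Python sketch of a generic Metropolis–Hastings sampler over DAG structures. It is not the Graph_sampler code and does not reproduce the jumping kernel proposed in the paper: the single-edge toggle proposal, the independent Bernoulli edge prior, and all function names (is_acyclic, log_prior, mh_edge_frequencies) are assumptions made for illustration, and the marginal likelihood is supplied by the caller.

```python
import math
import random


def is_acyclic(adj):
    """Return True if the 0/1 adjacency matrix describes a graph with no directed cycle."""
    n = len(adj)
    indegree = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    ready = [j for j in range(n) if indegree[j] == 0]
    visited = 0
    while ready:  # Kahn's algorithm: repeatedly remove nodes with no parents
        u = ready.pop()
        visited += 1
        for v in range(n):
            if adj[u][v]:
                indegree[v] -= 1
                if indegree[v] == 0:
                    ready.append(v)
    return visited == n


def log_prior(adj, p_edge=0.2):
    """Assumed prior: each possible edge is present independently with probability p_edge."""
    n = len(adj)
    n_edges = sum(map(sum, adj))
    n_possible = n * (n - 1)
    return n_edges * math.log(p_edge) + (n_possible - n_edges) * math.log(1.0 - p_edge)


def mh_edge_frequencies(n_nodes, log_marginal_likelihood, n_iter, seed=0):
    """Run MH with a symmetric single-edge toggle; return the sampled edge frequencies."""
    rng = random.Random(seed)
    adj = [[0] * n_nodes for _ in range(n_nodes)]  # start from the empty graph
    current = log_prior(adj) + log_marginal_likelihood(adj)
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for _ in range(n_iter):
        i, j = rng.sample(range(n_nodes), 2)  # ordered pair with i != j
        adj[i][j] ^= 1  # propose adding or deleting the edge i -> j
        accepted = False
        if is_acyclic(adj):
            proposed = log_prior(adj) + log_marginal_likelihood(adj)
            if rng.random() < math.exp(min(0.0, proposed - current)):
                current, accepted = proposed, True
        if not accepted:
            adj[i][j] ^= 1  # undo the toggle (rejected or cyclic proposal)
        for a in range(n_nodes):
            for b in range(n_nodes):
                counts[a][b] += adj[a][b]
    return [[c / n_iter for c in row] for row in counts]
```

With a constant likelihood, for example `mh_edge_frequencies(3, lambda adj: 0.0, 1000)`, the chain samples from the prior restricted to DAGs. A real analysis would pass a marginal likelihood computed from the data.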
Acknowledgements
S. Datta is funded by a Ph.D. studentship from the French Ministry of Research. The research leading to these results has received funding from the Innovative Medicines Initiative Joint Undertaking, under Grant Agreement No. 115439 (StemBANCC), resources of which are composed of a financial contribution from the European Union Seventh Framework Programme (FP7/2007-2013) and in-kind contributions from EFPIA companies. This publication reflects only the author’s views and neither the IMI JU nor EFPIA nor the European Commission is liable for any use that may be made of the information contained therein.
Appendix: Graph_sampler installation
Graph_sampler is freely available software that can be redistributed or modified under the terms of the GNU General Public License as published by the Free Software Foundation. It is both an inference and a simulation tool for DAGs, and can simulate random graphs for general directed graphs as well as for DAGs. For BNs, the probable structure is inferred through the joint use of priors and data on node values.
Graph_sampler is written in ANSI-standard C and can be compiled on any system that has an ANSI C compliant compiler. The GNU gcc compiler (free software) is recommended, and the automated compilation script (the Makefile) can be used if the standard 'make' command is available. To modify the input file parser, 'lex' and 'yacc' are recommended. The full software, along with the manual, can be downloaded from:
https://sites.google.com/site/utcchairmmbsptp/software
Once downloaded, the software should be decompressed using the 'gunzip' and 'tar' commands; other archiving tools can also be used. Graph_sampler is then compiled with the 'make' command, after which it is ready to run. To run Graph_sampler, an input file specifying the simulation parameters should be provided. On Unix, the command-line syntax for running the executable is:
graph_sampler [input-file [output-prefix]]
where the brackets indicate optional arguments. If the input file and/or the output prefix are not specified, the program uses the defaults. The default input file is script.txt, and the output files created depend on the parameters specified in the input file. The default output file names are best_graph.out, graph_samples.out, degree_count.out, motifs_count.out, edge_p.out and results_mcmc.bin.
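The nested brackets mean that an output prefix can only be given when an input file is also given. The short Python sketch below spells out that rule by building the argument list for a call to the executable. The helper name build_command and the example file names are hypothetical, and the sketch assumes the compiled program is reachable as graph_sampler; it does not model how the program uses the prefix internally.

```python
DEFAULT_INPUT_FILE = "script.txt"  # default named in the text above


def build_command(input_file=None, output_prefix=None, executable="graph_sampler"):
    """Build the argument list for: graph_sampler [input-file [output-prefix]]."""
    if output_prefix is not None and input_file is None:
        # The second positional argument cannot appear without the first.
        raise ValueError("output-prefix requires an explicit input-file")
    command = [executable]
    if input_file is not None:
        command.append(input_file)
        if output_prefix is not None:
            command.append(output_prefix)
    return command


def effective_input_file(input_file=None):
    """Return the input file the program would read, applying the documented default."""
    return DEFAULT_INPUT_FILE if input_file is None else input_file
```

For example, `build_command("my_model.txt", "run1")` yields `["graph_sampler", "my_model.txt", "run1"]`, which could be handed to `subprocess.run` once the program has been compiled.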
A Graph_sampler input file is a text (ASCII) file that obeys a relatively simple syntax (see the manual). Values of all the predefined variables in the input file should be properly defined. The description and range of each variable are given in the manual. If values are improperly assigned, Graph_sampler posts error messages at runtime.
Cite this article
Datta, S., Gayraud, G., Leclerc, E. et al. Graph_sampler: a simple tool for fully Bayesian analyses of DAG-models. Comput Stat 32, 691–716 (2017). https://doi.org/10.1007/s00180-017-0719-1
DOI: https://doi.org/10.1007/s00180-017-0719-1