Abstract
Since big data contain more comprehensive probability distributions and richer causal relationships than conventional small datasets, discovering Bayesian network (BN) structure from big datasets is becoming increasingly valuable for modeling and reasoning under uncertainty in many areas. When faced with big data, however, most current BN structure learning algorithms have limitations. First, learning BN structure from big datasets is computationally expensive and often ends in failure. Second, given an arbitrary dataset as input, it is very difficult to choose, from numerous candidates, one algorithm that consistently achieves good learning accuracy. To address these issues, we introduce a novel approach called Adaptive Bayesian network Learning (ABNL). ABNL begins with an adaptive sampling process that extracts a sufficiently large data partition from any big dataset for fast structure learning. Then, ABNL feeds the data partition to different learning algorithms to obtain a collection of BN structures. Lastly, ABNL adaptively chooses the structures and merges them into a final network structure using an ensemble method. Experimental results on four big datasets show that ABNL achieves significantly better performance than whole-dataset learning and more accurate results than the baseline algorithms.
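The final ensemble step can be illustrated with a minimal sketch. The abstract does not specify how ABNL selects or merges structures, so everything here is an assumption made for illustration: structures are represented as lists of directed edges, an edge is kept when at least `min_votes` candidate structures contain it, and an edge that would introduce a directed cycle is skipped. The names `merge_structures`, `creates_cycle`, and `min_votes` are hypothetical and not taken from the paper.

```python
from collections import Counter


def creates_cycle(edges, new_edge):
    """Return True if adding new_edge (u, v) to the directed edges would form a cycle."""
    u, v = new_edge
    # Adding u -> v closes a cycle exactly when u is already reachable from v.
    stack, seen = [v], set()
    while stack:
        node = stack.pop()
        if node == u:
            return True
        if node in seen:
            continue
        seen.add(node)
        stack.extend(child for parent, child in edges if parent == node)
    return False


def merge_structures(structures, min_votes):
    """Merge candidate edge lists by simple voting (an assumed rule, not ABNL's)."""
    # Count how many candidate structures contain each directed edge.
    votes = Counter(edge for structure in structures for edge in set(structure))
    merged = []
    # Visit the most-voted edges first; break ties by edge name for determinism.
    for edge, count in sorted(votes.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_votes and not creates_cycle(merged, edge):
            merged.append(edge)
    return merged


# Three hypothetical structures, as if learned by different algorithms
# from the same sampled data partition.
candidates = [
    [("A", "B"), ("B", "C")],
    [("A", "B"), ("C", "B")],
    [("A", "B"), ("B", "C"), ("A", "C")],
]
print(merge_structures(candidates, min_votes=2))
```

With `min_votes=2`, only edges found by at least two of the three hypothetical learners survive. A weighted vote, for example one that scores each candidate structure first, would be a natural variation, but that is likewise a design choice not confirmed by the abstract.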
Acknowledgments
This work was supported by the Natural Science Foundation of Jiangsu Province, China (Grant No. BK20141420 and Grant No. BK20140857).
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Tang, Y., Zhang, Q., Liu, H., Wang, W. (2017). Adaptive Bayesian Network Structure Learning from Big Datasets. In: Bao, Z., Trajcevski, G., Chang, L., Hua, W. (eds) Database Systems for Advanced Applications. DASFAA 2017. Lecture Notes in Computer Science, vol 10179. Springer, Cham. https://doi.org/10.1007/978-3-319-55705-2_12
DOI: https://doi.org/10.1007/978-3-319-55705-2_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-55704-5
Online ISBN: 978-3-319-55705-2
eBook Packages: Computer Science, Computer Science (R0)