
Generalized Bayesian Structure Learning from Noisy Datasets

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11448)

Abstract

In recent years, the open data movement has made more and more open datasets available worldwide. However, the quality of these datasets poses problems for learning models. This study focuses on learning the Bayesian network structure from datasets that contain noise. A novel approach called GBNL (Generalized Bayesian Structure Learning) is proposed. GBNL first uses a greedy algorithm to obtain an appropriate sliding window size for any dataset; it then leverages a difference array-based method to quickly improve data quality by locating the noisy data sections and removing them. GBNL can not only evaluate the quality of a dataset but also effectively reduce the noise in the data. We conduct experiments to evaluate GBNL on five large datasets, and the experimental results validate the accuracy and generalizability of the approach.
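
The abstract does not say how GBNL applies the difference array, so the following is only a minimal sketch of the general technique, not the authors' implementation. It assumes the window size has already been chosen (the greedy selection step is not shown) and that some per-window test decides whether a section is noisy. The names `drop_noisy_sections`, `is_noisy`, and `has_invalid` are hypothetical, and the binary-domain check is a stand-in for whatever quality measure the paper uses.

```python
from typing import Callable, List, Sequence

Row = Sequence[int]


def drop_noisy_sections(
    rows: Sequence[Row],
    window: int,
    is_noisy: Callable[[Sequence[Row]], bool],
) -> List[Row]:
    """Remove every row covered by at least one window flagged as noisy."""
    n = len(rows)
    if window <= 0 or window > n:
        raise ValueError("window must be between 1 and len(rows)")

    # diff[i] is the change in coverage count at row i. The extra slot
    # absorbs the decrement of a window that ends at the last row.
    diff = [0] * (n + 1)
    for start in range(n - window + 1):
        if is_noisy(rows[start:start + window]):
            diff[start] += 1            # coverage begins at `start`
            diff[start + window] -= 1   # and ends just past the window

    kept: List[Row] = []
    covered = 0
    for i, row in enumerate(rows):
        covered += diff[i]  # prefix sum: noisy windows covering row i
        if covered == 0:
            kept.append(row)
    return kept


def has_invalid(window_rows: Sequence[Row]) -> bool:
    """Hypothetical noise test: any value outside the binary domain {0, 1}."""
    return any(v not in (0, 1) for row in window_rows for v in row)
```

With `rows = [[0, 1], [1, 0], [-1, 1], [0, 0], [1, 1]]` and `window = 2`, the two windows containing the invalid row are flagged, and `drop_noisy_sections(rows, 2, has_invalid)` returns `[[0, 1], [1, 1]]`. Each flagged window costs two array updates regardless of its length, and a single prefix-sum pass recovers the per-row coverage, which is why a difference array suits the quick removal of whole sections.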

Acknowledgments

This work was supported by the Key Technologies Research and Development Program of China (2017YFC0405805-04).

Author information

Corresponding author

Correspondence to Yan Tang.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

Tang, Y., Chen, Y., Ge, G. (2019). Generalized Bayesian Structure Learning from Noisy Datasets. In: Li, G., Yang, J., Gama, J., Natwichai, J., Tong, Y. (eds) Database Systems for Advanced Applications. DASFAA 2019. Lecture Notes in Computer Science, vol 11448. Springer, Cham. https://doi.org/10.1007/978-3-030-18590-9_11

  • DOI: https://doi.org/10.1007/978-3-030-18590-9_11

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-18589-3

  • Online ISBN: 978-3-030-18590-9

  • eBook Packages: Computer Science, Computer Science (R0)
