A New Method for Vertical Parallelisation of TAN Learning Based on Balanced Incomplete Block Designs

  • Anders L. Madsen
  • Frank Jensen
  • Antonio Salmerón
  • Martin Karlsen
  • Helge Langseth
  • Thomas D. Nielsen
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8754)

Abstract

The framework of Bayesian networks is a widely used formalism for performing belief update under uncertainty. Structure-restricted Bayesian network models such as the Naive Bayes model and the Tree-Augmented Naive Bayes (TAN) model have shown impressive performance on classification tasks. However, when the number of variables or the amount of data is large, learning a TAN model from data can be time-consuming. In this paper, we introduce a new method for parallel learning of a TAN model from large data sets. The method computes the conditional mutual information scores between pairs of variables given the class variable in parallel, with the computations organised using balanced incomplete block designs. The results of a preliminary empirical evaluation of the proposed method on large data sets show that a significant performance improvement is possible through parallelisation using the method presented in this paper.
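To make the pair-scoring idea concrete, the following minimal Python sketch shows how conditional mutual information scores for attribute pairs can be computed block by block, with the pairs grouped according to the blocks of a balanced incomplete block design (BIBD) and each block scored on a separate worker. The function names (conditional_mutual_information, score_block, parallel_pairwise_scores), the use of Python's multiprocessing pool, and the example Fano-plane (7, 3, 1) design are illustrative assumptions; this is a sketch of the general technique, not the authors' implementation.

```python
# Sketch: attribute pairs are grouped into the blocks of a BIBD, each block is
# scored on a separate worker, and the merged conditional mutual information
# scores can then drive the Chow-Liu maximum-spanning-tree step of TAN learning.
from collections import Counter
from itertools import combinations
from math import log
from multiprocessing import Pool

def conditional_mutual_information(data, i, j, c):
    """Estimate I(X_i; X_j | X_c) from counts over the rows of `data`."""
    n = len(data)
    n_xyz = Counter((row[i], row[j], row[c]) for row in data)
    n_xz = Counter((row[i], row[c]) for row in data)
    n_yz = Counter((row[j], row[c]) for row in data)
    n_z = Counter(row[c] for row in data)
    cmi = 0.0
    for (x, y, z), nxyz in n_xyz.items():
        # p(x,y,z) * log[ p(x,y|z) / (p(x|z) p(y|z)) ] in count form
        cmi += (nxyz / n) * log((nxyz * n_z[z]) / (n_xz[(x, z)] * n_yz[(y, z)]))
    return cmi

def score_block(args):
    """Score every attribute pair contained in one BIBD block."""
    data, block, class_idx = args
    return {(i, j): conditional_mutual_information(data, i, j, class_idx)
            for i, j in combinations(sorted(block), 2)}

def parallel_pairwise_scores(data, bibd_blocks, class_idx, workers=4):
    """Distribute blocks over worker processes; every attribute pair occurs in
    at least one block, so merging the per-block results covers all pairs."""
    with Pool(workers) as pool:
        partial = pool.map(score_block,
                           [(data, block, class_idx) for block in bibd_blocks])
    scores = {}
    for block_scores in partial:
        scores.update(block_scores)  # pairs repeated when lambda > 1 overwrite
    return scores

# Hypothetical example design: the Fano plane is a (7, 7, 3, 3, 1)-BIBD, so its
# seven blocks cover every pair of 7 attributes exactly once (class index = 7).
FANO_BLOCKS = [{0, 1, 2}, {0, 3, 4}, {0, 5, 6}, {1, 3, 5},
               {1, 4, 6}, {2, 3, 6}, {2, 4, 5}]
```

The appeal of the BIBD grouping is load balance: every block contains the same number of attributes, so every worker scores the same number of pairs while all pairs are still covered.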

Keywords

Bayesian networks, TAN, parallel learning



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Anders L. Madsen (1, 2)
  • Frank Jensen (1)
  • Antonio Salmerón (3)
  • Martin Karlsen (1)
  • Helge Langseth (4)
  • Thomas D. Nielsen (2)
  1. HUGIN EXPERT A/S, Aalborg, Denmark
  2. Department of Computer Science, Aalborg University, Denmark
  3. Department of Mathematics, University of Almería, Spain
  4. Department of Computer and Information Science, Norwegian University of Science and Technology, Norway
