Abstract
We focus on the problem of efficient learning of dependency trees. Once grown, such a tree can be used as a special case of a Bayesian network, for probability density function (PDF) approximation, and for many other purposes. Given the data, a well-known algorithm can fit an optimal tree in time that is quadratic in the number of attributes and linear in the number of records. We show how to modify it to exploit partial knowledge about edge weights. Experimental results show running time that is near-constant in the number of records, without significant loss in accuracy of the generated trees.
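The well-known quadratic-time procedure the abstract refers to is, presumably, the Chow–Liu algorithm: weight every attribute pair by its empirical mutual information, then take a maximum-weight spanning tree. A minimal sketch (function names are our own, not from the paper; columns are discrete-valued attribute sequences):

```python
from itertools import combinations
from collections import Counter
from math import log

def mutual_information(xs, ys):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in joint.items())

def chow_liu_tree(columns):
    """Maximum-weight spanning tree over attributes, with each edge (i, j)
    weighted by the mutual information of columns i and j.

    Uses Kruskal's algorithm with union-find; computing all pairwise
    weights is the quadratic-in-attributes, linear-in-records step."""
    m = len(columns)
    edges = sorted(((mutual_information(columns[i], columns[j]), i, j)
                    for i, j in combinations(range(m), 2)), reverse=True)
    parent = list(range(m))

    def find(a):  # union-find root with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a

    tree = []
    for _w, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # adding (i, j) creates no cycle
            parent[ri] = rj
            tree.append((i, j))
    return tree
```

The paper's contribution modifies this baseline so that most mutual-information weights need not be computed from every record, which is where the near-constant dependence on the number of records comes from.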
Work done at Carnegie Mellon University. This research was sponsored by the National Science Foundation (NSF) under grants no. ACI-0121671 and no. DMS-9873442.
Cite this article
Pelleg, D., Moore, A. Dependency trees in sub-linear time and bounded memory. The VLDB Journal 15, 250–262 (2006). https://doi.org/10.1007/s00778-005-0170-8