Abstract
We introduce a distance measure based on the idea that two vectors are similar if they lead to similar predictive probability distributions. The approach avoids the scaling problem inherent in many alternative techniques, because it automatically transforms the original attribute space into a probability space in which all values lie between 0 and 1. The method is also flexible: it handles different attribute types (discrete or continuous) within a single consistent framework. To study the validity of the measure, we ran a series of experiments on publicly available data sets. The empirical results demonstrate that the unsupervised distance measure is sensible, in that it can be used to discover the hidden clustering structure of the data.
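The general idea can be illustrated with a small sketch. It is not the model or the distance used in the paper, which the abstract does not specify. It assumes a hypothetical two-component mixture with one discrete attribute (`colour`) and one continuous attribute (`income`), made-up parameters, and total variation as the distance between the two predictive distributions. Each vector is mapped to a probability distribution over the hidden component, so the raw scale of `income` never enters the distance directly and the result always lies between 0 and 1.

```python
import math

# Hypothetical two-component mixture. All names and parameters below are
# illustrative assumptions, not values taken from the paper.
PRIOR = [0.5, 0.5]
# Discrete attribute "colour": P(value | component).
COLOUR = [{"red": 0.8, "blue": 0.2}, {"red": 0.1, "blue": 0.9}]
# Continuous attribute "income": Gaussian (mean, std) per component.
INCOME = [(20000.0, 5000.0), (60000.0, 10000.0)]


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution with the given mean and std at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def predictive(vector):
    """Posterior distribution over the hidden component given one vector."""
    colour, income = vector
    joint = [
        PRIOR[k] * COLOUR[k][colour] * gaussian_pdf(income, *INCOME[k])
        for k in range(len(PRIOR))
    ]
    total = sum(joint)
    return [j / total for j in joint]


def distance(u, v):
    """Total variation distance between the two predictive distributions.

    Total variation is an assumption made for this sketch; any distance
    between probability distributions could be substituted.
    """
    p, q = predictive(u), predictive(v)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


a = ("red", 21000.0)
b = ("red", 23000.0)
c = ("blue", 62000.0)
print(distance(a, b), distance(a, c))
```

Under these assumed parameters, `a` and `b` are both explained almost entirely by the first component and end up close together, while `c` is explained by the second component and ends up far from both, even though no attribute was rescaled by hand.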
Copyright information
© 2000 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kontkanen, P., Lahtinen, J., Myllymäki, P., Tirri, H. (2000). An Unsupervised Bayesian Distance Measure. In: Blanzieri, E., Portinale, L. (eds) Advances in Case-Based Reasoning. EWCBR 2000. Lecture Notes in Computer Science, vol 1898. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-44527-7_14
DOI: https://doi.org/10.1007/3-540-44527-7_14
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-67933-2
Online ISBN: 978-3-540-44527-2
eBook Packages: Springer Book Archive