Abstract
The problem is to predict a value y ∈ Y (output, class) from an observed value of a vector x ∈ X (predictors, inputs, attributes), where the relation between y and x is given by (empirical) data D = {(x_i, y_i): i = 1,…,N}, listing N observed pairs. We propose an estimation algorithm that partitions D into clusters {Ω_1,…,Ω_m}, based on a distance function in X × Y. For each cluster Ω_i we compute the centroid y̅_i of its y-values, and denote the X-projection of Ω_i by Ω_i^x. Prediction of y given x ∈ X is done by assigning the point x to a nearest projected cluster, say Ω_i^x, and using y̅_i as the estimate for y. Numerical tests show that the method, in its basic general form, gives accurate predictions for well-known data sets.
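The procedure described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the clustering method or the distance function, so the code below makes several assumptions: it uses plain k-means with Euclidean distance on the joint points (x, y), it measures "nearest projected cluster" by the distance from x to the centroid of each cluster's X-projection, and it initializes deterministically. The function names (`cluster_joint`, `fit`, `predict`) are hypothetical and are not taken from the paper.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples (assumed metric)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mean(points):
    """Coordinate-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def cluster_joint(data, m, iters=50):
    """Cluster the joint points z = (x, y) in X x Y.

    Assumption: k-means stands in for the paper's clustering step.
    data is a list of (x_tuple, y_float) pairs; returns non-empty clusters.
    """
    z = [tuple(x) + (y,) for x, y in data]
    # Deterministic initialization: m points spread evenly through the data.
    centers = [z[(i * len(z)) // m] for i in range(m)]
    groups = [list(z)]
    for _ in range(max(1, iters)):
        groups = [[] for _ in range(m)]
        for p in z:
            j = min(range(m), key=lambda k: dist(p, centers[k]))
            groups[j].append(p)
        # Keep the old center if a cluster becomes empty.
        new_centers = [mean(g) if g else centers[k] for k, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    return [g for g in groups if g]

def fit(data, m):
    """For each cluster, store the centroid of its X-projection and its y-centroid."""
    model = []
    for g in cluster_joint(data, m):
        c = mean(g)
        model.append((c[:-1], c[-1]))
    return model

def predict(model, x):
    """Assign x to the nearest projected cluster and return that cluster's y-centroid."""
    _, y_bar = min(model, key=lambda entry: dist(x, entry[0]))
    return y_bar

# Toy data (made up for illustration): two well-separated groups.
toy = [((0.0,), 0.0), ((0.2,), 0.0), ((0.4,), 0.0),
       ((10.0,), 1.0), ((10.2,), 1.0), ((10.4,), 1.0)]
toy_model = fit(toy, m=2)
```

With this toy data, `predict(toy_model, (1.0,))` returns the y-centroid of the cluster near x = 0, and `predict(toy_model, (9.0,))` returns that of the cluster near x = 10. Representing each projected cluster by its centroid is the simplest choice; a distance from x to the nearest member of the projected cluster would be another reasonable reading of "nearest".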
Cite this article
Ben-Israel, A., Levin, Y. An Estimation Algorithm using Distance Clustering of Data. OPSEARCH 38, 443–455 (2001). https://doi.org/10.1007/BF03398650