An Estimation Algorithm using Distance Clustering of Data

Ben-Israel, Adi; Levin, Yuri

doi:10.1007/BF03398650


Invited Paper
Published: 14 December 2017

OPSEARCH, Volume 38, pages 443–455 (2001)



Abstract

The problem is to predict a value $y \in Y$ (output, class) from an observed value of a vector $x \in X$ (predictors, inputs, attributes), where the relation between $y$ and $x$ is given by empirical data $D = \{(x_i, y_i) : i = 1, \dots, N\}$ listing $N$ observed pairs. We propose an estimation algorithm using a classification of $D$ into clusters $\{\Omega_1, \dots, \Omega_m\}$, based on a distance function in $X \times Y$. For each cluster $\Omega_i$, compute the centroid $\bar{y}_i$ of $y$, and denote the $X$-projection of $\Omega_i$ by $\Omega_i^x$. Prediction of $y$ given $x \in X$ is done by assigning the point $x$ to a nearest projected cluster, say $\Omega_i^x$, and using $\bar{y}_i$ as the estimate for $y$. Numerical tests show that the method, in its basic general form, gives accurate predictions for well-known data sets.
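The cluster-then-predict scheme in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the distance is assumed to be Euclidean, the clustering of $D$ in $X \times Y$ is done with plain k-means (Lloyd's iterations) as a stand-in for the paper's clustering method, and "nearest projected cluster" is assumed to mean the cluster whose centroid's $X$-part is closest to $x$. The function names `fit_clusters` and `predict` are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def _dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance (an assumed choice of distance function)."""
    return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))


def _mean(points: List[Sequence[float]]) -> Tuple[float, ...]:
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[k] for p in points) / n for k in range(len(points[0])))


def fit_clusters(data: List[Tuple[Point, float]], m: int,
                 iters: int = 100, seed: int = 0) -> List[Tuple[Point, float]]:
    """Cluster the pairs (x_i, y_i) in X x Y and return, for each cluster,
    (centroid of its X-projection, centroid y-bar_i of y).

    Hypothetical stand-in: plain k-means on the joined vectors (x, y).
    Requires m <= len(data).
    """
    joined = [tuple(x) + (y,) for x, y in data]
    rng = random.Random(seed)
    centers = rng.sample(joined, m)
    for _ in range(iters):
        # Assignment step: each joined point goes to its nearest center.
        groups: List[List[Point]] = [[] for _ in range(m)]
        for p in joined:
            j = min(range(m), key=lambda c: _dist(p, centers[c]))
            groups[j].append(p)
        # Update step: recompute centers; keep the old center if a cluster empties.
        new_centers = [_mean(g) if g else centers[c] for c, g in enumerate(groups)]
        if new_centers == centers:
            break
        centers = new_centers
    # Split each joined centroid into its X-part and its y-part (y-bar_i).
    return [(c[:-1], c[-1]) for c in centers]


def predict(model: List[Tuple[Point, float]], x: Point) -> float:
    """Assign x to the nearest projected cluster and return its y-bar_i.
    'Nearest' is measured to the X-part of each cluster centroid (assumption)."""
    _, y_bar = min(model, key=lambda cy: _dist(x, cy[0]))
    return y_bar


if __name__ == "__main__":
    # Toy data (invented for illustration): two well-separated groups in X.
    data = [((0.0,), 0.0), ((0.2,), 0.0), ((0.1,), 0.0),
            ((5.0,), 1.0), ((5.2,), 1.0), ((4.9,), 1.0)]
    model = fit_clusters(data, m=2)
    print(predict(model, (0.3,)))  # 0.0
    print(predict(model, (4.8,)))  # 1.0
```

Clustering on the joined vectors $(x, y)$ rather than on $x$ alone is what lets the output values shape the clusters; prediction then needs only the $X$-projections, since $y$ is unknown for a new point. For a class-valued $y$, $\bar{y}_i$ would be a mean of class codes, so a rounding or majority-vote step would likely be needed; that choice is not specified here.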



Author information

Authors and Affiliations

RUTCOR, Rutgers Center for Operations Research, Rutgers University, 640 Bartholomew Rd, Piscataway, NJ 08854-8003, USA
Adi Ben-Israel & Yuri Levin


About this article

Cite this article

Ben-Israel, A., Levin, Y. An Estimation Algorithm using Distance Clustering of Data. OPSEARCH 38, 443–455 (2001). https://doi.org/10.1007/BF03398650

Issue Date: October 2001
DOI: https://doi.org/10.1007/BF03398650
