An Estimation Algorithm using Distance Clustering of Data

  • Invited Paper
  • OPSEARCH

Abstract

The problem is to predict a value $y \in Y$ (output, class) from an observed value of a vector $x \in X$ (predictors, inputs, attributes), where the relation between $y$ and $x$ is given by empirical data $D = \{(x_i, y_i) : i = 1, \ldots, N\}$ listing $N$ observed pairs. We propose an estimation algorithm that classifies $D$ into clusters $\{\Omega_1, \ldots, \Omega_m\}$ based on a distance function in $X \times Y$. For each cluster $\Omega_i$, compute the centroid $\bar{y}_i$ of its $y$-values, and denote the $X$-projection of $\Omega_i$ by $\Omega_i^x$. Prediction of $y$ given $x \in X$ is done by assigning the point $x$ to a nearest projected cluster, say $\Omega_i^x$, and using $\bar{y}_i$ as the estimate for $y$. Numerical tests show that the method, in its basic general form, gives accurate predictions for well-known data sets.
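
The steps in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the clustering procedure or the distance function, so the sketch assumes plain k-means with Euclidean distance in the joint space, and it assumes that the distance from a point to a projected cluster is measured to the centroid of that projection. The names `cluster_joint`, `fit`, and `predict` and the toy data are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def mean(points):
    """Component-wise mean of a non-empty list of tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def cluster_joint(data, m, iters=50):
    """Cluster the pairs (x, y) in the joint space X x Y.

    Assumption: plain k-means (Lloyd iterations) with Euclidean distance,
    seeded with m evenly spaced data points (requires m <= N). The paper's
    own clustering procedure and distance function may differ.
    """
    z = [tuple(x) + (y,) for x, y in data]   # points in X x Y
    step = max(1, len(z) // m)
    centers = [z[i * step] for i in range(m)]
    labels = [0] * len(z)
    for _ in range(iters):
        # Assign every joint point to its nearest current center.
        labels = [min(range(m), key=lambda j: dist(p, centers[j])) for p in z]
        new_centers = []
        for j in range(m):
            members = [p for p, l in zip(z, labels) if l == j]
            new_centers.append(mean(members) if members else centers[j])
        if new_centers == centers:           # converged
            break
        centers = new_centers
    return labels

def fit(data, m):
    """Per non-empty cluster: (centroid of its X-projection, centroid of its y-values)."""
    labels = cluster_joint(data, m)
    model = []
    for j in range(m):
        xs = [tuple(x) for (x, _), l in zip(data, labels) if l == j]
        ys = [y for (_, y), l in zip(data, labels) if l == j]
        if xs:
            model.append((mean(xs), sum(ys) / len(ys)))
    return model

def predict(model, x):
    """Assign x to the nearest projected cluster and return that cluster's y-centroid.

    Assumption: "nearest" is measured to the centroid of the projected cluster.
    """
    _, y_hat = min(model, key=lambda c: dist(tuple(x), c[0]))
    return y_hat

# Toy data (made up for illustration): two well-separated groups.
data = [((0.0, 0.1), 1.0), ((0.2, 0.0), 1.2), ((0.1, 0.2), 0.8),
        ((5.0, 5.1), 10.0), ((5.2, 4.9), 10.4), ((4.9, 5.0), 9.6)]
model = fit(data, m=2)
```

With this toy data, `predict(model, (0.1, 0.0))` returns a value close to 1.0 and `predict(model, (5.1, 5.0))` a value close to 10.0. Clustering in the joint space rather than in the input space alone means that points with similar inputs but very different outputs tend to fall into different clusters, which is the feature of the method that the sketch is meant to show.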



Cite this article

Ben-Israel, A., Levin, Y. An Estimation Algorithm using Distance Clustering of Data. OPSEARCH 38, 443–455 (2001). https://doi.org/10.1007/BF03398650

  • DOI: https://doi.org/10.1007/BF03398650
