Unsupervised and supervised data classification via nonsmooth and global optimization

Abstract

We examine various methods for data clustering and data classification that are based on the minimization of the so-called cluster function and its modifications. These functions are nonsmooth and nonconvex. We use the discrete gradient method for their local minimization. We also consider a combination of this method with the cutting angle method for global minimization. We present and discuss the results of numerical experiments.

Additional information

This research was supported by the Australian Research Council.

About this article

Cite this article

Bagirov, A.M., Rubinov, A.M., Soukhoroukova, N.V. et al. Unsupervised and supervised data classification via nonsmooth and global optimization. Top 11, 1–75 (2003). https://doi.org/10.1007/BF02578945

  • DOI: https://doi.org/10.1007/BF02578945
