One-dimensional center-based l 1-clustering method

Abstract

Motivated by the method for solving center-based Least Squares—clustering problem (Kogan in Introduction to clustering large and high-dimensional data, Cambridge University Press, 2007; Teboulle in J Mach Learn Res 8:65–102, 2007) we construct a very efficient iterative process for solving a one-dimensional center-based l 1—clustering problem, on the basis of which it is possible to determine the optimal partition. We analyze the basic properties and convergence of our iterative process, which converges to a stationary point of the corresponding objective function for each choice of the initial approximation. Given is also a corresponding algorithm, which in only few steps gives a stationary point and the corresponding partition. The method is illustrated and visualized on the example of looking for an optimal partition with two clusters, where we check all stationary points of the corresponding minimizing functional. Also, the method is tested on the basis of large numbers of data points and clusters and compared with the method for solving the center-based Least Squares—clustering problem described in Kogan (2007) and Teboulle (2007).

This is a preview of subscription content, access via your institution.

References

  1. 1

    Banerjee A., Merugu S., Dhillon I.S., Ghosh J.: Clustering with Bregman divergences. J. Mach. Learn. Res. 6, 1705–1749 (2005)

    MathSciNet  MATH  Google Scholar 

  2. 2

    Ben-Israel, A., Iyigun, C.: Probabilistic D-clustering. J. Classif. 25 (2007). doi:10.1007/s00357-007-0021-y

  3. 3

    Boley D.L.: Principal direction divisive partitioning. Data Min. Knowl. Discov. 2, 325–344 (1998)

    Article  Google Scholar 

  4. 4

    Boyd D.L., Vandenberghe L.: Convex Optimization. Cambridge University Press, Cambridge (2004)

    MATH  Google Scholar 

  5. 5

    Chaovalitwongse, W.A., Butenko, S., Pardalos, P.M.: Clustering Challenges in Biological Networks. World Scientific (2009)

  6. 6

    Clarke F.H.: Generalized gradients an applications. Trans. Am. Math. Soc. 205, 247–262 (1975)

    MATH  Article  Google Scholar 

  7. 7

    Clarke F.H.: Optimization and Nonsmooth Analysis. SIAM, Philadelphia (1990)

    MATH  Book  Google Scholar 

  8. 8

    Cupec R., Grbić R., Sabo K., Scitovski R.: Three points method for searching the best least absolute deviations plane. Appl. Math. Comput. 215, 983–994 (2009)

    MathSciNet  MATH  Article  Google Scholar 

  9. 9

    Demidenko E.: Criteria for unconstrained global optimization. J. Optim. Theory Appl. 136, 375–395 (2008)

    MathSciNet  MATH  Article  Google Scholar 

  10. 10

    Dhillon I.S., Mallela S., Kumar R.: A divisive information-theoretic feature clustering algorithm for text classification. J. Mach. Learn. Res. 3, 1265–1287 (2003)

    MATH  Google Scholar 

  11. 11

    Dhillon, I.S., Guan, Y., Kulis, B.: Kernel k-means, spectral clustering and normalized cuts. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), August 22–25, 2004, Seattle, Washington, USA, pp. 551–556 (2004)

  12. 12

    Domínguez, E., Muñoz, J.: Applying bio-inspired techniques to the p-median problem. IWANN 2005; Computational Intelligence Bioinspired Syst., 8th Int. Workshop Artificial Neural Networks. Lecture Notes in Computer Science, pp. 67–74. Springer, Berlin (2005)

  13. 13

    Finkel D.E., Kelley C.T.: Additive scaling and the \({\tt DIRECT}\) algorithm. J. Glob. Optim. 36, 597–608 (2006)

    MathSciNet  MATH  Article  Google Scholar 

  14. 14

    Floudas C.A., Gounaris C.E.: A review of recent advances in global optimization. J. Glob. Optim. 45, 3–38 (2009)

    MathSciNet  MATH  Article  Google Scholar 

  15. 15

    Gaviano M., Lera D.: A global minimization algorithm for Lipschitz functions. Optim. Lett. 2, 1–13 (2008)

    MathSciNet  MATH  Article  Google Scholar 

  16. 16

    Gan G., Ma C., Wu J.: Data Clustering: Theory, Algorithms, and Applications. SIAM, Philadelphia (2007)

    MATH  Book  Google Scholar 

  17. 17

    Hansen E.R., Walster G.W.: Global Optimization Using Interval Analysis, 2nd edn, Revised and Expanded. Marcel Dekker, New York (2004)

    Google Scholar 

  18. 18

    Iyigun, C.: Probabilistic Distance Clustering. Dissertation, Graduate School, New Brunswick, Rutgers (2007)

  19. 19

    Iyigun C., Ben-Israel A.: A generalized Weiszfeld method for the multi-facility location problem. Oper. Res. Lett. 38, 207–214 (2010)

    MathSciNet  MATH  Article  Google Scholar 

  20. 20

    Jones D.R., Perttunen C.D., Stuckman B.E.: Lipschitzian optimization without the Lipschitz constant. JOTA 79, 157–181 (1993)

    MathSciNet  MATH  Article  Google Scholar 

  21. 21

    Kogan J.: Introduction to Clustering Large and High-Dimensional Data. Cambridge University Press, Cambridge (2007)

    MATH  Google Scholar 

  22. 22

    Kogan, J., Nicholas, C., Wiacek, M.: Hybrid Clustering of large high dimensional data. In: Castellanos, M., Berry, M.W. (eds.) Proceedings of the Workshop on Text Mining. SIAM (2007)

  23. 23

    Kogan, J., Teboulle, M.: Scaling clustering algorithms with Bregman distances. In: Berry, M.W., Castellanos, M. (eds.) Proceedings of the Workshop on Text Mining at the Sixth SIAM International Conference on Data Mining (2006)

  24. 24

    Kogan J., Nicholas C., Wiacek M.: Hybrid clustering with divergences. In: Berry, M.W., Castellanos, M. (eds.) Survey of Text Mining: Clustering, Classification, and Retrieval, 2nd edn, Springer, Berlin (2007)

    Google Scholar 

  25. 25

    Leisch F.: A toolbox for K-centroids cluster analysis. Comput. Stat. Data Anal. 51, 526–544 (2006)

    MathSciNet  MATH  Article  Google Scholar 

  26. 26

    Littau D., Boley D.L.: Clustering very large data sets with PDDP. In: Kogan, J., Nicholas, C., Teboulle, M. (eds.) Grouping Multidimensional Data: Recent Advances in Clustering, pp. 99–126. Springer, New York (2006)

    Chapter  Google Scholar 

  27. 27

    Pan S., Chen J.S.: Two unconstrained optimization approaches for the Euclidean k-centrum location problem. Appl. Math. Comput. 189, 1368–1383 (2007)

    MathSciNet  MATH  Article  Google Scholar 

  28. 28

    Pardalos P.M., Hansen P.: Data Mining and Mathematical Programming. American Mathematical Society, Providence (2008)

    MATH  Google Scholar 

  29. 29

    Petrić J., Zlobec S.: Nonlinear Programming (in Croatian). Naučna knjiga, Beograd (1989)

    Google Scholar 

  30. 30

    Piyavskij, S. A.: An algorithm for finding the absolute extremum of a function. J. Comput. Math. Math. Phys. 12, 888–896 (1972, in Russian)

    Google Scholar 

  31. 31

    Reese, J.: Solution Methods for the p-Median Problem: An Annotated Bibliography. Wiley (2006)

  32. 32

    Rodrígues-Chia A.M., Espejo I., Drezner Z.: On solving the planar k-centrum problem with Euclidean distances. Eur. J. Oper. Res. 207, 1169–1186 (2010)

    Article  Google Scholar 

  33. 33

    Sabo K., Scitovski R., Vazler I.: Searching for a best LAD-solution of an overdetermined system of linear equations motivated by searching for a best LAD-hyperplane on the basis of given data. J. Optim. Theory Appl. 149, 293–314 (2011)

    MathSciNet  MATH  Article  Google Scholar 

  34. 34

    Sabo K., Scitovski R., Vazler I., Zekić-Sušac M.: Mathematical models of natural gas consumption. Energy Convers. Manag. 52, 1721–1727 (2011)

    Article  Google Scholar 

  35. 35

    Sabo K., Scitovski R.: The best least absolute deviations line—properties and two efficient methods. ANZIAM J. 50, 185–198 (2008)

    MathSciNet  MATH  Article  Google Scholar 

  36. 36

    Schöbel A., Scholz D.: The big cube small cube solution method for multidimensional facility location problems. Comput. Oper. Res. 37, 115–122 (2010)

    MathSciNet  MATH  Article  Google Scholar 

  37. 37

    Späth, H.: Cluster-Formation und Analyse. R. Oldenburg Verlag, München (1983)

  38. 38

    Su, Z., Kogan, J., Nicholas, C.: Constrained clustering with k-means type algorithms. In: Berry, M.W., Kogan, J. (eds.) Text Mining Applications and Theory, pp. 81–103. Wiley, Chichester (2010)

  39. 39

    Teboulle M.: A unified continuous optimization framework for center-based clustering methods. J. Mach. Learn. Res. 8, 65–102 (2007)

    MathSciNet  MATH  Google Scholar 

  40. 40

    Vazler, I., Sabo, K., Scitovski, R.: Weighted median of the data in solving least absolute deviations problems. Commun. Stat. Theory Methods, Art ID 539750 (LSTA-2010-0354) (2011, to appear)

  41. 41

    Vivek E.P., Sudha N.: Robust Hausdorff distance measure for face recognition. Pattern Recognit. 40, 431–442 (2007)

    MATH  Article  Google Scholar 

Download references

Author information

Affiliations

Authors

Corresponding author

Correspondence to Rudolf Scitovski.

Additional information

This work is supported by the Ministry of Science, Education and Sports, Republic of Croatia, through research grant 235-2352818-1034.

Rights and permissions

Reprints and Permissions

About this article

Cite this article

Sabo, K., Scitovski, R. & Vazler, I. One-dimensional center-based l 1-clustering method. Optim Lett 7, 5–22 (2013). https://doi.org/10.1007/s11590-011-0389-9

Download citation

Keywords

  • Clustering
  • Data mining
  • Optimization
  • Weighted median problem