Pairwise Data Clustering and Applications

  • Conference paper
Computing and Combinatorics (COCOON 2003)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2697)

Abstract

Data clustering is an important theoretical topic and a sharp tool for various applications. Its main objective is to partition a given data set into clusters such that data within the same cluster are more similar to each other, with respect to certain measures, than to data in other clusters. In this paper, we study the pairwise data clustering problem with pairwise similarity/dissimilarity measures that need not satisfy the triangle inequality. Using a criterion called the minimum normalized cut, we model the pairwise data clustering problem as a graph partition problem. The graph partition problem based on minimizing the normalized cut is known to be NP-hard. We present a polynomial-time ((4 + o(1)) ln n)-approximation algorithm for the minimum normalized cut problem. We also give a more efficient algorithm for this problem at the cost of a slightly worse approximation ratio. Further, our scheme yields a polynomial-time ((2 + o(1)) ln n)-approximation algorithm for computing the sparsest cuts in edge-weighted and vertex-weighted undirected graphs, improving the previously best known approximation ratio by a constant factor.
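
To make the normalized cut criterion concrete, the following is a minimal sketch and not the approximation algorithm of the paper. It assumes the commonly used Shi and Malik form of the criterion, Ncut(A, B) = cut(A, B)/assoc(A, V) + cut(A, B)/assoc(B, V), which the abstract itself does not spell out, and it assumes the pairwise similarities are given as a symmetric weight matrix. The names `normalized_cut`, `brute_force_min_ncut`, and `example_weights` are hypothetical. The exhaustive search takes exponential time and is meant only for very small inputs; it illustrates the objective whose NP-hardness motivates approximation.

```python
from itertools import combinations


def normalized_cut(weights, part_a):
    """Normalized cut of the bipartition (part_a, rest), assuming the
    Shi-Malik form: cut/assoc(A, V) + cut/assoc(B, V)."""
    n = len(weights)
    a = set(part_a)
    b = set(range(n)) - a
    # Total weight of edges crossing the partition.
    cut = sum(weights[i][j] for i in a for j in b)
    # Total weight from each side to all vertices.
    assoc_a = sum(weights[i][j] for i in a for j in range(n))
    assoc_b = sum(weights[i][j] for i in b for j in range(n))
    if assoc_a == 0 or assoc_b == 0:
        return float("inf")  # degenerate side with no incident weight
    return cut / assoc_a + cut / assoc_b


def brute_force_min_ncut(weights):
    """Exhaustively search all bipartitions (exponential time)."""
    n = len(weights)
    best_value, best_part = float("inf"), None
    # Keep vertex 0 in part A so each bipartition is enumerated once;
    # part A has between 1 and n - 1 vertices, so both sides are non-empty.
    for size in range(0, n - 1):
        for rest in combinations(range(1, n), size):
            part = (0,) + rest
            value = normalized_cut(weights, part)
            if value < best_value:
                best_value, best_part = value, part
    return best_value, best_part


# Hypothetical example: two triangles {0, 1, 2} and {3, 4, 5} with
# similarity 1.0 inside each triangle and a weak 0.1 link between 2 and 3.
example_weights = [[0.0] * 6 for _ in range(6)]
for group in ((0, 1, 2), (3, 4, 5)):
    for i, j in combinations(group, 2):
        example_weights[i][j] = example_weights[j][i] = 1.0
example_weights[2][3] = example_weights[3][2] = 0.1

best_value, best_part = brute_force_min_ncut(example_weights)
```

On this example the search separates the two triangles, since cutting only the weak link is far cheaper relative to each side's total weight than isolating any smaller group. Nothing in the sketch relies on the triangle inequality, consistent with the setting described in the abstract.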

This research was supported in part by the 21st Century Research and Technology Fund from the State of Indiana.

The work of this author was supported in part by the Computing and Information Technology Center, and by the Faculty Research Council, University of Texas-Pan American, Edinburg, Texas, USA.

The work of this author was supported in part by the National Science Foundation under Grants CCR-9623585 and CCR-9988468.




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wu, X., Chen, D.Z., Mason, J.J., Schmid, S.R. (2003). Pairwise Data Clustering and Applications. In: Warnow, T., Zhu, B. (eds) Computing and Combinatorics. COCOON 2003. Lecture Notes in Computer Science, vol 2697. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-45071-8_46


  • DOI: https://doi.org/10.1007/3-540-45071-8_46

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-40534-4

  • Online ISBN: 978-3-540-45071-9

  • eBook Packages: Springer Book Archive
