Discrete & Computational Geometry, Volume 40, Issue 4, pp 537–560

Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem

  • Jie Gao
  • Michael Langberg
  • Leonard J. Schulman

Abstract

The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in ℝ^d, incomplete data objects correspond to affine subspaces (lines or Δ-flats). With this motivation we study the problem of finding the minimum intersection radius r(ℒ) of a set of lines or Δ-flats ℒ: the least r such that there is a ball of radius r intersecting every flat in ℒ. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher-dimensional flats, primarily because “distances” between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly’s theorem. This “intrinsic-dimension” Helly theorem states: for any family ℒ of Δ-dimensional convex sets in a Hilbert space, there exist Δ+2 sets ℒ′⊆ℒ such that r(ℒ)≤2r(ℒ′). Based upon this we present an algorithm that computes a (1+ε)-core set ℒ′⊆ℒ, |ℒ′|=O(Δ^4/ε), such that the ball centered at a point c with radius (1+ε)r(ℒ′) intersects every element of ℒ. The running time of the algorithm is O(n^(Δ+1) d poly(Δ/ε)). For the case of lines or line segments (Δ=1), the (expected) running time of the algorithm can be improved to O(nd poly(1/ε)). We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.
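
As an illustration only, and not the algorithm of the paper, the following Python sketch makes two notions from the abstract concrete for lines (Δ=1): the radius a ball centered at a given point c needs in order to intersect every line, and the failure of the triangle inequality for "distances" between lines. The function names and the representation of a line as a (point, direction) pair are assumptions made for this example.

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist_point_to_line(c, p, u):
    """Euclidean distance from point c to the line {p + t*u}; u need not be unit length."""
    w = [ci - pi for ci, pi in zip(c, p)]
    t = _dot(w, u) / _dot(u, u)          # parameter of the orthogonal projection of c
    residual = [wi - t * ui for wi, ui in zip(w, u)]
    return math.sqrt(_dot(residual, residual))

def intersection_radius_at(c, lines):
    """Smallest radius of a ball centered at c that meets every line in `lines`."""
    return max(dist_point_to_line(c, p, u) for p, u in lines)

def dist_line_to_line(line1, line2):
    """Minimum distance between two lines, each given as a (point, direction) pair."""
    (p1, u1), (p2, u2) = line1, line2
    w = [x - y for x, y in zip(p1, p2)]
    a, b, c = _dot(u1, u1), _dot(u1, u2), _dot(u2, u2)
    d, e = _dot(u1, w), _dot(u2, w)
    det = a * c - b * b
    if det <= 1e-12 * a * c:             # (nearly) parallel directions
        return dist_point_to_line(p2, p1, u1)
    s = (b * e - c * d) / det            # closest-point parameter on line1
    t = (a * e - b * d) / det            # closest-point parameter on line2
    diff = [wi + s * x - t * y for wi, x, y in zip(w, u1, u2)]
    return math.sqrt(_dot(diff, diff))

# Two skew lines in R^3: the x-axis, and a line parallel to the y-axis at height z = 2.
skew_lines = [((0, 0, 0), (1, 0, 0)), ((0, 0, 2), (0, 1, 0))]
radius = intersection_radius_at((0, 0, 1), skew_lines)

# Triangle inequality fails: A meets B, B meets C, yet A and C are 5 apart.
A = ((0, 0, 0), (1, 0, 0))   # x-axis
B = ((0, 0, 0), (0, 1, 0))   # y-axis
C = ((0, 5, 0), (1, 0, 0))   # parallel to the x-axis, shifted along y
d_ab, d_bc, d_ac = dist_line_to_line(A, B), dist_line_to_line(B, C), dist_line_to_line(A, C)
```

Here `intersection_radius_at` only evaluates a fixed candidate center; the minimum intersection radius r(ℒ) is the minimum of this quantity over all centers c, which the sketch does not attempt to compute.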

Keywords

Clustering, k-center, Core set, Incomplete data, Helly theorem, Approximation, Inference


Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Jie Gao (1)
  • Michael Langberg (2)
  • Leonard J. Schulman (3)

  1. Department of Computer Science, Stony Brook University, Stony Brook, USA
  2. Computer Science Division, The Open University of Israel, Raanana, Israel
  3. Department of Computer Science, California Institute of Technology, Pasadena, USA