Discrete & Computational Geometry

Volume 40, Issue 4, pp 537–560

Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem

Abstract

The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in ℝ^d, incomplete data objects correspond to affine subspaces (lines or Δ-flats). With this motivation we study the problem of finding the minimum intersection radius r(ℒ) of a set of lines or Δ-flats ℒ: the least r such that there is a ball of radius r intersecting every flat in ℒ. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher-dimensional flats, primarily because “distances” between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly’s theorem. This “intrinsic-dimension” Helly theorem states: for any family ℒ of Δ-dimensional convex sets in a Hilbert space, there exist Δ+2 sets ℒ′⊆ℒ such that r(ℒ) ≤ 2r(ℒ′). Based upon this we present an algorithm that computes a (1+ε)-core set ℒ′⊆ℒ, |ℒ′| = O(Δ^4/ε), such that the ball centered at a point c with radius (1+ε)r(ℒ′) intersects every element of ℒ. The running time of the algorithm is O(n^(Δ+1) d poly(Δ/ε)). For the case of lines or line segments (Δ=1), the (expected) running time of the algorithm can be improved to O(nd poly(1/ε)). We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.
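The two geometric notions in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: it only evaluates, for lines (Δ=1), the radius a ball at a given center needs in order to touch every line, and it shows one configuration in which pairwise line "distances" violate the triangle inequality. The helper names (`point_line_distance`, `line_line_distance`, `intersection_radius_at`) and the representation of a line as a (point, direction) pair are assumptions made for this illustration.

```python
import math

# A line is represented as (point, direction); this encoding is an
# assumption of the sketch, not something the paper prescribes.

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _norm(a):
    return math.sqrt(_dot(a, a))

def point_line_distance(c, line):
    """Euclidean distance from point c to the line (p, u)."""
    p, u = line
    w = _sub(c, p)
    t = _dot(w, u) / _dot(u, u)          # coefficient of the projection onto u
    return _norm(tuple(wi - t * ui for wi, ui in zip(w, u)))

def line_line_distance(line1, line2, eps=1e-12):
    """Smallest distance between two lines (zero if they intersect)."""
    (p1, u), (p2, v) = line1, line2
    w = _sub(p1, p2)
    a, b, c = _dot(u, u), _dot(u, v), _dot(v, v)
    d, e = _dot(u, w), _dot(v, w)
    denom = a * c - b * b                # zero exactly when the lines are parallel
    if denom <= eps * a * c:
        t, s = 0.0, e / c
    else:
        t = (b * e - c * d) / denom
        s = (a * e - b * d) / denom
    gap = tuple(wi + t * ui - s * vi for wi, ui, vi in zip(w, u, v))
    return _norm(gap)

def intersection_radius_at(center, lines):
    """Radius a ball centered at `center` needs to intersect every line."""
    return max(point_line_distance(center, line) for line in lines)

# Three lines in R^3: A and C are parallel at distance 1; B crosses both.
A = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
B = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))
C = ((0.0, 0.0, 1.0), (1.0, 0.0, 0.0))

d_ab = line_line_distance(A, B)
d_bc = line_line_distance(B, C)
d_ac = line_line_distance(A, C)
violates_triangle = d_ac > d_ab + d_bc

# A center halfway between A and C (and lying on B) touches all three lines.
radius = intersection_radius_at((0.0, 0.0, 0.5), [A, B, C])
```

Here A and C are both at distance zero from B yet at distance 1 from each other, so no triangle inequality can hold for these line-to-line distances. A ball of radius 0.5 centered midway between A and C intersects all three lines, and no smaller ball can, since any ball touching both A and C must have diameter at least 1.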

Keywords

Clustering · k-center · Core set · Incomplete data · Helly theorem · Approximation · Inference

Copyright information

© Springer Science+Business Media, LLC 2008

Authors and Affiliations

  • Jie Gao (1)
  • Michael Langberg (2)
  • Leonard J. Schulman (3)

  1. Department of Computer Science, Stony Brook University, Stony Brook, USA
  2. Computer Science Division, The Open University of Israel, Raanana, Israel
  3. Department of Computer Science, California Institute of Technology, Pasadena, USA
