Abstract
The analysis of incomplete data is a long-standing challenge in practical statistics. When, as is typical, data objects are represented by points in ℝ^d, incomplete data objects correspond to affine subspaces (lines or Δ-flats). With this motivation we study the problem of finding the minimum intersection radius r(ℒ) of a set of lines or Δ-flats ℒ: the least r such that there is a ball of radius r intersecting every flat in ℒ. Known algorithms for finding the minimum enclosing ball for a point set (or clustering by several balls) do not easily extend to higher-dimensional flats, primarily because “distances” between flats do not satisfy the triangle inequality. In this paper we show how to restore geometry (i.e., a substitute for the triangle inequality) to the problem, through a new analog of Helly’s theorem. This “intrinsic-dimension” Helly theorem states: for any family ℒ of Δ-dimensional convex sets in a Hilbert space, there exist Δ+2 sets ℒ′⊆ℒ such that r(ℒ)≤2r(ℒ′). Based upon this we present an algorithm that computes a (1+ε)-core set ℒ′⊆ℒ, |ℒ′|=O(Δ^4/ε), such that the ball centered at a point c with radius (1+ε)r(ℒ′) intersects every element of ℒ. The running time of the algorithm is O(n^(Δ+1) d poly(Δ/ε)). For the case of lines or line segments (Δ=1), the (expected) running time of the algorithm can be improved to O(n d poly(1/ε)). We note that the size of the core set depends only on the dimension of the input objects and is independent of the input size n and the dimension d of the ambient space.
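To make the failure of the triangle inequality concrete, the following is a minimal Python sketch, not the paper's algorithm. It assumes each line is given as a hypothetical pair (p, u) of a point and a nonzero direction vector, and all function names are illustrative. It computes the distance between two lines, and the smallest radius of a ball centered at a given point c that meets every line in a set (an upper bound on r(ℒ) for any choice of c).

```python
import math

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist_point_to_line(c, p, u):
    """Euclidean distance from point c to the line {p + t*u} (u nonzero)."""
    w = [ci - pi for ci, pi in zip(c, p)]
    t = _dot(w, u) / _dot(u, u)              # parameter of the projection of c
    resid = [wi - t * ui for wi, ui in zip(w, u)]
    return math.sqrt(_dot(resid, resid))

def dist_line_to_line(p1, u1, p2, u2):
    """Minimum distance between the lines {p1 + s*u1} and {p2 + t*u2}."""
    w = [x - y for x, y in zip(p1, p2)]
    a, b, c = _dot(u1, u1), _dot(u1, u2), _dot(u2, u2)
    d, e = _dot(u1, w), _dot(u2, w)
    det = a * c - b * b                      # zero exactly when lines are parallel
    if det <= 1e-12 * a * c:
        s, t = 0.0, e / c                    # parallel: fix s, project onto line 2
    else:
        s = (b * e - c * d) / det            # solve the 2x2 normal equations
        t = (a * e - b * d) / det
    gap = [wi + s * x - t * y for wi, x, y in zip(w, u1, u2)]
    return math.sqrt(_dot(gap, gap))

def intersection_radius_at(c, lines):
    """Smallest radius of a ball centered at c that intersects every line."""
    return max(dist_point_to_line(c, p, u) for p, u in lines)

# Three lines in R^3: A and B are parallel and 10 apart; C crosses both.
A = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
B = ((0.0, 0.0, 10.0), (1.0, 0.0, 0.0))
C = ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0))

d_AB = dist_line_to_line(*A, *B)
d_AC = dist_line_to_line(*A, *C)
d_CB = dist_line_to_line(*C, *B)
r_mid = intersection_radius_at((0.0, 0.0, 5.0), [A, B, C])
```

Here C touches both A and B, so d_AC and d_CB are both zero while d_AB is 10: the "distance" from A to B exceeds the sum of the distances through C. A ball centered midway between A and B needs radius 5 to meet all three lines.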
Additional information
Communicated by Pankaj Agarwal.
An extended abstract appeared in ACM–SIAM Symposium on Discrete Algorithms, 2006.
Work was done when J. Gao was with Center for the Mathematics of Information, California Institute of Technology.
Work was done when M. Langberg was a postdoctoral scholar at the California Institute of Technology. Research supported in part by NSF grant CCF-0346991.
Research of L.J. Schulman supported in part by an NSF ITR and the Okawa Foundation.
Cite this article
Gao, J., Langberg, M. & Schulman, L.J. Analysis of Incomplete Data and an Intrinsic-Dimension Helly Theorem. Discrete Comput Geom 40, 537–560 (2008). https://doi.org/10.1007/s00454-008-9107-5
DOI: https://doi.org/10.1007/s00454-008-9107-5