Algorithmica

, Volume 20, Issue 2, pp 136–150 | Cite as

Efficient Randomized Algorithms for the Repeated Median Line Estimator

  • J. Matoušek
  • D. M. Mount
  • N. S. Netanyahu

Abstract.

The problem of fitting a straight line to a finite collection of points in the plane is an important problem in statistical estimation. Recently there has been a great deal of interest is robust estimators, because of their lack of sensitivity to outlying data points. The basic measure of the robustness of an estimator is its breakdown point, that is, the fraction (up to 50%) of outlying data points that can corrupt the estimator. One problem with robust estimators is that achieving high breakdown points (near 50%) has proved to be computationally demanding. In this paper we present the best known theoretical algorithm and a practical subquadratic algorithm for computing a 50% breakdown point line estimator, the Siegel or repeated median line estimator. We first present an O(n\log n) randomized expected-time algorithm, where n is the number of given points. This algorithm relies, however, on sophisticated data structures. We also present a very simple O(n log2 n) randomized algorithm for this problem, which uses no complex data structures. We provide empirical evidence that, for many realistic input distributions, the running time of this second algorithm is actually O(n log n) expected time.

Key words. Repeated median estimator, Line fitting, Robust estimators, Randomized algorithms, Computational geometry. 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

Copyright information

© 1998 Springer-Verlag New York Inc.

Authors and Affiliations

  • J. Matoušek
    • 1
  • D. M. Mount
    • 2
  • N. S. Netanyahu
    • 3
  1. 1.Department of Applied Mathematics, Charles University, Prague, Czech Republic.CZ
  2. 2.Department of Computer Science and Institute for Advanced Computer Studies, University of Maryland, College Park, MD 20742, USA.US
  3. 3.Center for Automation Research, University of Maryland, College Park, MD 20742, USA and Center of Excellence in Space Data and Information Sciences, Code 930.5, NASA Goddard Space Flight Center, Greenbelt, MD 20771, USA.US

Personalised recommendations