Least Absolute Deviations pp 1-36

# Generalities

Chapter

## Abstract

Given n points (x_{i}, y_{i}) ∈ R^{k+1}, the least absolute deviation (LAD) fitting problem is to find a minimizer, ĉ ∈ R^{k}, of the AD distance function

$$
\begin{aligned}
f(c) &= \sum_{i=1}^{n} \Bigl| y_i - \sum_{j=1}^{k} c_j x_{ij} \Bigr| \\
&\equiv \sum_{i=1}^{n} \bigl| y_i - \langle c, x_i \rangle \bigr| = \sum_{i=1}^{n} \bigl| r_i(c) \bigr| \\
&\equiv \| y - Xc \|_1 \equiv \| r(c) \|_1
\end{aligned}
\tag{1}
$$

where y ∈ R^{n}, x_{i} is the i^{th} row of the n × k matrix X, and r(c) = y − Xc is the vector of residuals. Every c ∈ R^{k} defines a hyperplane *π*_{c} = {(x,y) ∈ R^{k+1}: y = <c,x>}. Thus ĉ determines a hyperplane that best fits the n points in the LAD sense. If the first column of X is composed of 1’s, ĉ_{1} is the intercept in an equation relating y to the remaining x’s; otherwise, the fit passes through the origin.

## Keywords

Convex Combination, Directional Derivative, Historical Background, Weighted Median, Mathematical Background
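As a small illustration of the distance function (1) defined in the abstract, the sketch below evaluates f(c) as the sum of absolute residuals for a given coefficient vector. The function name `lad_objective` and the sample data are hypothetical and chosen only for illustration; the first column of `X` is all 1's, so `c[0]` plays the role of the intercept.

```python
def lad_objective(X, y, c):
    """Return f(c) = sum_i |y_i - <c, x_i>|, where x_i is the i-th row of X."""
    total = 0.0
    for row, yi in zip(X, y):
        # Inner product <c, x_i> gives the fitted value for observation i.
        fitted = sum(cj * xij for cj, xij in zip(c, row))
        total += abs(yi - fitted)  # absolute residual |r_i(c)|
    return total


# Hypothetical data: four points, with a column of 1's for the intercept.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.0, 1.0, 2.0, 10.0]

f_line = lad_objective(X, y, [0.0, 1.0])   # the line y = x
f_flat = lad_objective(X, y, [1.5, 0.0])   # the horizontal line y = 1.5
print(f_line, f_flat)
```

In this made-up data set the line y = x passes through three of the four points, so only the outlying observation contributes to f, which is why it scores lower than the horizontal line.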

## References

- 1. Many alternatives to the term LAD have been used. As a general rule, numerical analysis literature has referred to the problem of minimizing ||y − Xc||, y ∈ R^{n}, c ∈ R^{k}, k ≤ n, as L_{1}, discrete L_{1}, or *ℓ*_{1} curve-fitting [e.g., Abdelmalek (1971, 1974, 1975), Anderson and Steiger (1982), Armstrong and Godfrey (1979), Barrodale and Roberts (1973, 1974), Bartels and Conn (1978), Bartels, Conn and Sinclair (1978), Osborne and Watson (1971), Robers and Ben-Israel (1969), Spyropoulos, Kiountouzis and Young (1972)]. Within the same literature, “overdetermined equation systems in the L_{1} norm”, or some quite similar terminology, also appears.
- Here is a partial list of the other terms that have been used, with some of the references to them: Absolute Deviation Curve-fitting (AD) [Armstrong and Frome (1976), Pfaffenberger and Dinkel (1978), Schlossmacher (1973)]; Least Absolute Error (LAE) [Bassett (1973)]; Least Absolute Residuals Regression (LAR) [Rosenberg and Carlson (1970, 1977)]; Least Absolute Value Regression (LAV) [Armstrong, Elam and Hultz (1977), Armstrong and Frome (1976, 1977), Barrodale and Roberts (1977), Bartels and Conn (1977), Gentle (1977)]; Least Deviations (LD) [Karst (1958)]; Least Total Deviation (LTD) [Davies (1966)]; Minimum Absolute Deviation Regression (MAD) [Ashar and Wallace (1963), Gallant and Gerig (1974), Harris (1950), Kanter and Steiger (1974, 1977), Sharpe (1971)]; Minimum Deviation (MD) [Rhodes (1930)]; Minimum Sum of Absolute Errors (MSAE) [Narula and Wellington (1977a, b, c)]; Sum of Absolute Deviations (SAD) [Rao and Shrinivasan (1962)]; Sum of Absolute Errors (SAE) [Orveson (1969)]; Sum of Absolute Value of Deviations (SAV) [Singleton (1940)]. Perhaps LAV is closest to “least squares”. We used LAD because it is more euphonic. Some others who agreed with that name are Armstrong and Frome (1976), Gentle, Kennedy and Sposito (1977), An and Chen (1982), and Gross and Steiger (1979), perhaps for the same reason.
- 2. There is a brief historical survey of LAD in Gentle (1977). It touches upon aspects that do not appear in Section 1.
- 3. The treatment of degeneracy is complicated. Some aspects are discussed in Chapter 7. Sadovski (1974) is aware of the problem; his code attempts to diagnose cycling. Seneta and Steiger (1983) characterize degeneracy, but bypass the problems of dealing with it once diagnosed.
- 4. Part of Theorem 2.2 and its proof is due to M.R. Osborne.
- 5. Theorem 3.3 was stated without proof in Gentle, Kennedy and Sposito (1977). They give an example for the necessity of an intercept term that is somewhat misleading because in it, x_{i} ≠ 0, all i.
- 6. Gentle, Kennedy, and Sposito (1977) claim that in fitting y = a+bx to (x_{i},y_{i}), i=1,...,n, there are at most 4 extreme LAD fits. This is false: (-2,3/2), (-2,-3/2), (-1,1), (-1,-1), (1,1), (1,-1), (2,3/2), (2,-3/2) has 6 extreme fits. As n → ∞ the number of extreme fits, even for k=2, is unbounded.
- 7. Theorem 3.4 was stated without proof in Bloomfield (1982). Theorem 3.5 is usually established via linear programming theory. This direct proof seems to be new. Also, the explicit accounting for degeneracy in Theorem 3.8 seems to be new and Theorem 3.9 is original.
- 8. The weighted medians may be computed efficiently in several ways. If the ratios are kept in a heap [see, e.g., Knuth (1975)], elements may be removed sequentially and the weights tested until the weighted median is obtained. Chambers’ (1971) partial quicksort may be modified to test the balance of weights at the current partition element. Both these methods have average complexity n log(n). If the ratio and corresponding weight sequences are stored as a single sequence of points in the plane using a quad-tree, the partial heapsort technique for removing and testing elements should yield a linear algorithm.
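The heap-based approach mentioned in the note on weighted medians can be sketched as follows. This is a minimal illustration, not the authors' code: the function names `weighted_median` and `lad_slope_through_origin` are hypothetical, and the use of Python's `heapq` module is an assumption made for convenience. The second function illustrates where ratios and weights arise in the simplest case, k = 1 with no intercept, using the identity Σ|y_i − c x_i| = Σ|x_i|·|y_i/x_i − c| for x_i ≠ 0, so that a minimizing c is a weighted median of the ratios y_i/x_i with weights |x_i|.

```python
import heapq


def weighted_median(ratios, weights):
    """Return a weighted median of `ratios`, given positive `weights`.

    Ratios are removed from a heap in increasing order and their weights
    accumulated until at least half of the total weight has been reached.
    """
    if len(ratios) != len(weights) or len(ratios) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    heap = list(zip(ratios, weights))
    heapq.heapify(heap)                       # build the heap of (ratio, weight)
    half = sum(weights) / 2.0
    running = 0.0
    while heap:
        ratio, weight = heapq.heappop(heap)   # next smallest ratio
        running += weight
        if running >= half:                   # balance of weights reached
            return ratio
    raise ValueError("weights must be positive")


def lad_slope_through_origin(x, y):
    """Return a c minimizing sum_i |y_i - c * x_i| (k = 1, no intercept)."""
    # Points with x_i = 0 add a constant |y_i| to the sum, so they are skipped.
    pairs = [(yi / xi, abs(xi)) for xi, yi in zip(x, y) if xi != 0]
    if not pairs:
        raise ValueError("at least one x_i must be non-zero")
    ratios, weights = zip(*pairs)
    return weighted_median(ratios, weights)


slope = lad_slope_through_origin([1.0, 2.0, 3.0], [1.0, 4.0, 3.0])
print(slope)
```

In the worst case every element is popped, so this sketch costs O(n log n); it does not attempt the linear-time variant that the note conjectures.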

## Copyright information

© Birkhäuser Boston, Inc. 1983