Abstract
We consider the subspace proximity problem: given a vector \({\varvec{x}} \in {\mathbb {R}}^n\) and a basis matrix \(V \in {\mathbb {R}}^{n \times m}\), the objective is to determine whether \({\varvec{x}}\) is close to the subspace spanned by V. Although the problem is solvable by linear programming, this is time-consuming, especially when n is large. In this paper, we propose a quick tester that solves the problem correctly with high probability. Our tester runs in time independent of n and can be used as a sieve before computing the exact distance between \({\varvec{x}}\) and the subspace. The number of coordinates of \({\varvec{x}}\) queried by our tester is \(O(\frac{m}{\epsilon }\log \frac{m}{\epsilon })\), where \(\epsilon \) is an error parameter, and we show almost matching lower bounds. Through experiments on synthetic and real data sets, we demonstrate the scalability and applicability of our tester.
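The abstract does not spell out the sampling procedure, so the following is only a minimal sketch: it assumes the tester queries a uniform random sample of coordinates and solves the restricted \(\ell_\infty\) regression as a linear program. The function names (`linf_distance`, `proximity_tester`), the threshold parameter `delta`, and the constant in the sample size are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import linprog

def linf_distance(x, V):
    """Exact l_inf distance from x to span(V), via the standard LP:
    minimize t subject to -t <= (V c - x)_i <= t for all i."""
    n, m = V.shape
    obj = np.zeros(m + 1)
    obj[-1] = 1.0  # minimize the sup-norm bound t
    A_ub = np.block([[V, -np.ones((n, 1))],
                     [-V, -np.ones((n, 1))]])
    b_ub = np.concatenate([x, -x])
    res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * (m + 1))
    return res.fun

def proximity_tester(x, V, eps, delta, rng):
    """Hypothetical sampling sieve: query s = O((m/eps) log(m/eps))
    coordinates of x and test the restricted l_inf distance against
    an application-chosen threshold delta."""
    n, m = V.shape
    s = min(n, int(np.ceil((m / eps) * np.log(m / eps + 1) * 4)))
    idx = rng.choice(n, size=s, replace=False)
    return linf_distance(x[idx], V[idx, :]) <= delta
```

On a sampled subproblem of size s, the LP takes time depending only on m and \(\epsilon\), never on n, which is the point of the sieve.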
Notes
\(T_{\mathrm {LIN}}(m,m)\) is at most \(m^\omega \), where \(\omega <2.3728639\) [25].
One may notice that the runtime slightly increases as n increases when m is small (\(m=5,20\)). This may be due to computational overhead, such as memory caching, which can dominate when the number of queries is small.
A tester with the non-negative constraint is discussed in Sect. 4.3.
One may think that, if we have the with-sunglasses images in the training phase (obtaining V), our tester is not greatly advantageous because we can simply train a classifier to discriminate between with- and without-sunglasses images. This is incorrect. Even in such a case, our tester is meaningful in terms of time complexity: although the classifier requires O(n) time, our tester works in constant time.
In actual use cases, we first fix \(\gamma \) or the number of queries depending on the available computational resources, and then determine \({\varDelta }\) to achieve the best classification performance. We can choose \({\varDelta }\) using standard machine-learning methods, such as cross-validation.
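The note above leaves the choice of \({\varDelta}\) to standard model selection. As one possible concrete recipe (the helper `choose_delta` and its inputs are hypothetical, not from the paper), one can sweep candidate thresholds over held-out distance estimates with known labels and keep the most accurate one:

```python
import numpy as np

def choose_delta(distances, labels, candidates):
    """Pick the threshold Delta maximizing accuracy on held-out data.
    distances: estimated l_inf distances for validation vectors.
    labels:    1 if the vector truly lies near the subspace, else 0.
    candidates: thresholds to try (e.g. a coarse grid)."""
    best_delta, best_acc = None, -1.0
    for d in candidates:
        # predict "near the subspace" whenever the distance is <= d
        acc = np.mean((distances <= d).astype(int) == labels)
        if acc > best_acc:
            best_delta, best_acc = d, acc
    return best_delta
```

Within cross-validation, this sweep would simply be repeated per fold and the accuracies averaged before picking \({\varDelta}\).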
References
Achlioptas, D.: Database-friendly random projections: Johnson–Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66(4), 671–687 (2003)
Alon, N., Dar, S., Parnas, M., Ron, D.: Testing of clustering. SIAM J. Discrete Math. 16(3), 393–417 (2003)
Alon, N., Fischer, E., Newman, I., Shapira, A.: A combinatorial characterization of the testable graph properties: It’s all about regularity. SIAM J. Comput. 39(1), 143–167 (2009)
Balcan, M.-F., Li, Y., Woodruff, D.P., Zhang, H.: Testing matrix rank, optimally. In: Proceedings of the 30th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 727–746 (2019)
Barrodale, I., Phillips, C.: Algorithm 495: solution of an overdetermined system of linear equations in the Chebychev norm [F4]. ACM Trans. Math. Softw. 1(3), 264–270 (1975)
Bingham, E., Mannila, H.: Random projection in dimensionality reduction: applications to image and text data. In: Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 245–250 (2001)
Borgs, C., Chayes, J., Lovász, L., Sós, V.T., Szegedy, B., Vesztergombi, K.: Graph limits and parameter testing. In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (STOC), pp. 261–270 (2006)
Carroll, J.D., Chang, J.-J.: Analysis of individual differences in multidimensional scaling via an \(N\)-way generalization of “Eckart–Young” decomposition. Psychometrika 35(3), 283–319 (1970)
Chandrasekaran, K., Cheraghchi, M., Gandikota, V., Grigorescu, E.: Local testing for membership in lattices. In: Proceedings of the 36th IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science (FSTTCS), pp. 46:1–46:14 (2016)
Clarkson, K.L.: Las Vegas algorithms for linear and integer programming when the dimension is small. J. ACM 42(2), 488–499 (1995)
Deerwester, S., Dumais, S., Furnas, G., Landauer, T., Harshman, R.: Indexing by latent semantic analysis. J. Am. Soc. Inf. Sci. 41(6), 391–407 (1990)
Ding, H., Wang, J.: Recurrent neural networks for minimum infinity-norm kinematic control of redundant manipulators. IEEE Trans. Syst. Man Cybern. Part A 29(3), 269–276 (1999)
Farebrother, R.W.: The historical development of the linear minimax absolute residual estimation procedure 1786–1960. Comput. Stat. Data Anal. 24(4), 455–466 (1997)
Goldreich, O., Ron, D.: Property testing in bounded degree graphs. Algorithmica 32(2), 302–343 (2002)
Guestrin, C., Koller, D., Parr, R.: Max-norm projections for factored MDPs. In: Proceedings of the 17th International Joint Conference on Artificial Intelligence (IJCAI), pp. 673–680 (2001)
Har-Peled, S.: Geometric Approximation Algorithms. American Mathematical Society, Providence (2011)
Harsanyi, J.C., Chang, C.-I.: Hyperspectral image classification and dimensionality reduction: an orthogonal subspace projection approach. IEEE Trans. Geosci. Remote Sens. 32, 779–785 (1994)
Harshman, R.: Foundations of the PARAFAC procedure: models and conditions for an “explanatory” multi-modal factor analysis. UCLA Working Papers in Phonetics, p. 16 (1970)
Haussler, D., Welzl, E.: Epsilon-nets and simplex range queries. In: Proceedings of the 2nd Annual Symposium on Computational geometry (SoCG), pp. 61–71 (1986)
Hayashi, K., Yoshida, Y.: Minimizing quadratic functions in constant time. In: Proceedings of the 30th Annual Conference on Neural Information Processing Systems (NIPS), pp. 2217–2225 (2016)
Hotelling, H.: Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24, 417–441 (1933)
Hubert, M., Rousseeuw, P., Vanden Branden, K.: ROBPCA: a new approach to robust principal component analysis. Technometrics 47, 64–79 (2005)
Kahl, F., Hartley, R.: Multiple-view geometry under the \(L_\infty \)-norm. IEEE Trans. Pattern Anal. Mach. Intell. 30(9), 1603–1617 (2008)
Krauthgamer, R., Sasson, O.: Property testing of data dimensionality. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 18–27 (2003)
Le Gall, F.: Powers of tensors and fast matrix multiplication. In: Proceedings of the 39th International Symposium on Symbolic and Algebraic Computation (ISSAC), pp. 296–303 (2014)
LeCun, Y., Bottou, L., Bengio, Y., Haffner, P.: Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998)
Lee, D.D., Seung, H.S.: Learning the parts of objects by non-negative matrix factorization. Nature 401(6755), 788–791 (1999)
Li, Y., Wang, Z., Woodruff, D.P.: Improved testing of low rank matrices. In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), pp. 691–700 (2014)
Martínez, A., Benavente, R.: The AR face database. Technical report 24, Computer Vision Center (1998)
Ostrowski, A.: Sur l’Approximation du déterminant de Fredholm par les déterminants des systèmes d’équations linéaires. Arkiv för matematik, astronomi och fysik. Almqvist & Wiksells, Stockholm (1938)
Pearson, K.: On lines and planes of closest fit to systems of points in space. Philos. Mag. 2, 559–572 (1901)
Rice, J.R., White, J.S.: Norms for smoothing and estimation. SIAM Rev. 6(3), 243–256 (1964)
Rubinfeld, R., Sudan, M.: Robust characterizations of polynomials with applications to program testing. SIAM J. Comput. 25(2), 252–271 (1996)
Stiefel, E.: Note on Jordan elimination, linear programming and Tschebyscheff approximation. Numer. Math. 2, 1–17 (1960)
Stromberg, A.J.: Computing the exact least median of squares estimate and stability diagnostics in multiple linear regression. SIAM J. Sci. Comput. 14(6), 1289–1299 (1993)
Torgerson, W.S.: Multidimensional scaling: I. Theory and method. Psychometrika 17, 401–419 (1952)
Woodroofe, M.: On the maximum deviation of the sample density. Ann. Math. Stat. 38(2), 475–481 (1967)
Yoshida, Y.: A characterization of locally testable affine-invariant properties via decomposition theorems. In: Proceedings of the 46th Annual ACM Symposium on Theory of Computing (STOC), pp. 154–163 (2014)
Yoshida, Y.: Gowers norm, function limits, and parameter estimation. In: Proceedings of the 27th Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pp. 1391–1406 (2016)
Yu, B.: Density estimation in the \(l^\infty \) norm for dependent data with applications to the Gibbs sampler. Ann. Stat. 21(2), 711–735 (1993)
Acknowledgements
We thank anonymous referees for helpful comments and providing an alternative algorithm for the subspace proximity problem explained in Sect. 4.4. Y.Y. is supported by JSPS KAKENHI Grant Number JP17H04676.
Cite this article
Hayashi, K., Yoshida, Y. Testing Proximity to Subspaces: Approximate \(\ell _\infty \) Minimization in Constant Time. Algorithmica 82, 1277–1297 (2020). https://doi.org/10.1007/s00453-019-00642-0