Abstract
Fitting linear regression models can be computationally expensive in large-scale data analysis when both the sample size and the number of variables are large. Random projections are widely used as a dimension reduction tool in machine learning and statistics. We discuss applications of random projections in linear regression, developed to reduce computational costs, and give an overview of theoretical guarantees for the generalization error. The combination of random projections with least squares regression can be shown to achieve recovery guarantees similar to those of ridge regression and principal component regression. We also discuss possible improvements from averaging over multiple random projections, an approach that lends itself naturally to parallel implementation.
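To make the procedure concrete, the following is a minimal sketch of compressed least squares with averaging over multiple random projections; the Gaussian choice of projection matrix and the function names are our own illustration, not code from the chapter.

import numpy as np

def compressed_ls(X, y, d, rng):
    # One draw of compressed least squares: compress the p columns of X
    # to d dimensions with a random Gaussian projection, solve ordinary
    # least squares in the compressed space, and map back to R^p.
    n, p = X.shape
    Phi = rng.normal(size=(p, d)) / np.sqrt(d)   # random projection matrix
    gamma, *_ = np.linalg.lstsq(X @ Phi, y, rcond=None)
    return Phi @ gamma                           # estimate of beta in R^p

def averaged_compressed_ls(X, y, d, m=20, seed=0):
    # Average the estimator over m independent projections. The draws are
    # independent, so this loop parallelizes trivially, which is the point
    # made above about parallel implementation.
    rng = np.random.default_rng(seed)
    return np.mean([compressed_ls(X, y, d, rng) for _ in range(m)], axis=0)

Each least squares solve now costs O(nd^2) instead of O(np^2), which is the source of the computational savings when d ≪ p; forming XΦ costs O(npd), which fast Johnson-Lindenstrauss transforms reduce further.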
References
Achlioptas, D.: Database-friendly random projections: Johnson-Lindenstrauss with binary coins. J. Comput. Syst. Sci. 66 (4), 671–687 (2003)
Ailon, N., Chazelle, B.: Approximate nearest neighbors and the fast Johnson-Lindenstrauss transform. In: Proceedings of the 38th Annual ACM Symposium on Theory of Computing (2006)
Blocki, J., Blum, A., Datta, A., Sheffet, O.: The Johnson-Lindenstrauss transform itself preserves differential privacy. In: 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science (FOCS), pp. 410–419. IEEE, Washington, DC (2012)
Cook, R.D.: Detection of influential observation in linear regression. Technometrics 19, 15–18 (1977)
Dasgupta, S., Gupta, A.: An elementary proof of a theorem of Johnson and Lindenstrauss. Random Struct. Algoritm. 22, 60–65 (2003)
Dhillon, P.S., Foster, D.P., Kakade, S.: A risk comparison of ordinary least squares vs ridge regression. J. Mach. Learn. Res. 14, 1505–1511 (2013)
Dhillon, P., Lu, Y., Foster, D.P., Ungar, L.: New subsampling algorithms for fast least squares regression. In: Burges, C.J.C., Bottou, L., Welling, M., Ghahramani, Z., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 26, pp. 360–368. Curran Associates, Inc. (2013). http://papers.nips.cc/paper/5105-new-subsampling-algorithms-for-fast-least-squares-regression.pdf
Indyk, P., Motwani, R.: Approximate nearest neighbors: towards removing the curse of dimensionality. In: Proceedings of the 30th Annual ACM Symposium on Theory of Computing (1998)
Johnson, W., Lindenstrauss, J.: Extensions of Lipschitz mappings into a Hilbert space. In: Contemporary Mathematics: Conference on Modern Analysis and Probability (1984)
Kabán, A.: A new look at compressed ordinary least squares. In: 2013 IEEE 13th International Conference on Data Mining Workshops, pp. 482–488 (2013). doi:10.1109/ICDMW.2013.152, ISSN:2375-9232
Lu, Y., Dhillon, P.S., Foster, D., Ungar, L.: Faster ridge regression via the subsampled randomized Hadamard transform. In: Proceedings of the 26th International Conference on Neural Information Processing Systems, pp. 369–377. Curran Associates Inc., Lake Tahoe (2013). http://dl.acm.org/citation.cfm?id=2999611.2999653
Mahoney, M.W., Drineas, P.: CUR matrix decompositions for improved data analysis. Proc. Natl. Acad. Sci. 106 (3), 697–702 (2009)
Maillard, O.-A., Munos, R.: Compressed least-squares regression. In: Bengio, Y., Schuurmans, D., Lafferty, J.D., Williams, C.K.I., Culotta, A. (eds.) Advances in Neural Information Processing Systems, vol. 22, pp. 1213–1221. Curran Associates, Inc. (2009). http://papers.nips.cc/paper/3698-compressed-least-squares-regression.pdf
Marzetta, T., Tucci, G., Simon, S.: A random matrix-theoretic approach to handling singular covariance estimates. IEEE Trans. Inf. Theory 57 (9), 6256–6271 (2011)
McWilliams, B., Krummenacher, G., Lučić, M., Buhmann, J.M.: Fast and robust least squares estimation in corrupted linear models. In: Advances in Neural Information Processing Systems, vol. 27 (2014)
McWilliams, B., Heinze, C., Meinshausen, N., Krummenacher, G., Vanchinathan, H.P.: LOCO: distributing ridge regression with random projections. arXiv preprint arXiv:1406.3469 (2014)
Tropp, J.A.: Improved analysis of the subsampled randomized Hadamard transform. arXiv:1011.1595v4 [math.NA] (2010)
Zhang, L., Mahdavi, M., Jin, R., Yang, T., Zhu, S.: Recovering optimal solution by dual random projection. arXiv preprint arXiv:1211.3046 (2012)
Zhou, S., Lafferty, J., Wasserman, L.: Compressed and privacy-sensitive sparse regression. IEEE Trans. Inf. Theory 55 (2), 846–866 (2009). doi:10.1109/TIT.2008.2009605. ISSN:0018-9448
Appendix
In this section we give proofs of the statements from the section on theoretical results.

Theorem 1 ([10]) Assume a fixed design and Rank(X) ≥ d. Then the AMSE (4) can be bounded above by
Proof (Sketch)
Finally, a rather lengthy but straightforward calculation leads to
which proves the statement above. □
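To make the bounded quantity concrete, here is a small Monte Carlo sketch that estimates the AMSE by simulation. Taking the AMSE in (4) to be the expected in-sample prediction error \(\mathbb{E}\,\|X\beta - X\hat{\beta}_{d}\|_{2}^{2}/n\), with the expectation over the noise and the projection, is our reading of the definition, and the function name is ours.

import numpy as np

def amse_compressed_ls(X, beta, sigma, d, n_rep=500, seed=0):
    # Estimate E || X beta - X beta_hat_d ||^2 / n by averaging over
    # independent draws of the noise and of the random projection.
    rng = np.random.default_rng(seed)
    n, p = X.shape
    errs = np.empty(n_rep)
    for r in range(n_rep):
        y = X @ beta + sigma * rng.normal(size=n)     # fixed design, fresh noise
        Phi = rng.normal(size=(p, d)) / np.sqrt(d)    # fresh projection
        gamma, *_ = np.linalg.lstsq(X @ Phi, y, rcond=None)
        errs[r] = np.sum((X @ (beta - Phi @ gamma)) ** 2) / n
    return errs.mean()

Sweeping d in such a simulation makes the trade-off behind the bound visible: a larger d reduces the projection error but increases the variance contribution of the noise.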
Theorem 2 Assume Rank(X) ≥ d. Then the AMSE (4) can be bounded above by
where
Proof
We have for all \(v \in \mathbb{R}^{p}\)
which we can minimize over the whole set \(\mathbb{R}^{p}\):
This last expression can be calculated along the same lines as in Theorem 1:
where \(\varSigma = X'X\). Next we minimize the above expression with respect to v: we take the derivative with respect to v and set it to zero. This yields
Hence we have
which is elementwise equal to
Define the notation \(s = \mathrm{trace}(\varSigma)\). We now plug this back into the original expression and get
Combining the summands yields the expression for \(w_{i}\) given in the theorem. □
Theorem 3 Assume Rank(X) ≥ d. Then the MSE (4) equals
Furthermore, we have
Proof
Calculating the expectation yields
Going through these terms, we get:
The first term in the last line equals \(\sum_{i=1}^{p} \beta_{i}^{2}\lambda_{i}^{2}/\eta_{i}\). The second can be calculated in two ways, both relying on the cyclic property of the trace operator:
Adding the first version to the expectation from above, we get the exact expected mean-squared error. Setting both versions equal, we obtain the equation
□
Theorem 4 Assume Rank(X) ≥ d. Then there exists a real number \(\tau \in [d^{2}/p,\, d]\) such that the AMSE of \(\hat{\beta}_{d}\) can be bounded from above by
where the \(w_{i}\)'s are given as
and
Proof
First, a simple calculation [10] using the closed-form solution gives the following equation:
Now, using the corollary from the last section, we can bound the second term in the following way:
For the first term we write
Now note that, since \(\lambda_{i}/\eta_{i} \leq 1\), we have
and thus we obtain the upper bound
For the lower bound of τ we consider an optimization problem. Denote \(t_{i} = \frac{\lambda_{i}}{\eta_{i}}\); then we want to find \(t \in \mathbb{R}^{p}\) such that
under the restrictions that
The problem is symmetric in each coordinate, and thus \(t_{i} = c\) for all i. Plugging this into the linear sum gives \(c = d/p\), and evaluating the quadratic term gives the result claimed in the theorem. □
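For concreteness, a worked version of this final step follows. The displayed constraint set is our reconstruction, chosen to be consistent with the symmetry argument and the values \(c = d/p\) and \(\tau \in [d^{2}/p, d]\) above, and should be read as a sketch rather than the chapter's exact display.

\[
\min_{t \in \mathbb{R}^{p}} \sum_{i=1}^{p} t_{i}^{2}
\quad \text{subject to} \quad \sum_{i=1}^{p} t_{i} = d, \qquad 0 \leq t_{i} \leq 1.
\]

By symmetry the minimizer has \(t_{i} = c\) for all i, so \(pc = d\), i.e. \(c = d/p\), and the quadratic term evaluates to

\[
\sum_{i=1}^{p} t_{i}^{2} = p \left(\frac{d}{p}\right)^{2} = \frac{d^{2}}{p},
\]

the lower end of the interval. The upper end follows from \(t_{i} \leq 1\), since then \(\sum_{i} t_{i}^{2} \leq \sum_{i} t_{i} = d\).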
© 2017 Springer International Publishing AG
Cite this chapter
Thanei, GA., Heinze, C., Meinshausen, N. (2017). Random Projections for Large-Scale Regression. In: Ahmed, S.E. (ed.) Big and Complex Data Analysis. Contributions to Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-41573-4_3