Abstract
In the previous chapter, we discussed situations where we had only one independent variable (X) and evaluated its relationship with a dependent variable (Y). This chapter goes beyond that and deals with the analysis of situations where we have more than one X (predictor) variable, using a technique called multiple regression. As with simple regression, the objective here is to specify mathematical models that can describe the relationship between Y and more than one X and that can be used to predict the outcome at given values of the predictors. As we did in Chap. 14, we focus on linear models.
Notes
- 1.
In fact, we would have a plane or hyperplane, since we have multiple dimensions. We will use the term line in this text for simplicity.
- 2.
For those of you who know what this means, you would need to invert a matrix by hand!
- 3.
If we accept the null hypothesis, we would typically abandon formal statistical analysis, since we have accepted that "the X's as a group (or, the X's collectively) do not provide us with predictive value about Y"; in which case, what more can be said?
- 4.
At this point we would drop X2 from consideration and repeat the regression without the X2 data. However, leaving it in the model/equation for the moment allows several salient points about the methodology to be made more effectively over the next several pages. We explicitly discuss this issue later.
- 5.
We are making an analogy to R. That is, imagine that a "unit of information" is equivalent to .01 of R. There are 100 units of information about Y, labeled 1–100. Obviously, if an X, or group of X's, provided all 100 units of information, it would be equivalent to having an R of 1.
- 6.
In SPSS and JMP, we can enter a column of data as, for example, M and F, for the two sexes. However, we advise the reader not to do so, for the richness of the output is greater when we convert the letters to 0's and 1's.
- 7.
We would, in general, not be pleased to have 12 X's and n = (only) 15. This is true even though all 12 X's are extremely unlikely to enter the stepwise regression. There is too much opportunity to "capitalize on chance" and find variables showing up as significant when they really are not. This possibility is a criticism of the stepwise regression technique and is discussed further in "Improving the User Experience through Practical Data Analytics," by Fritz and Berger, Morgan Kaufmann, page 259.
- 8.
JMP and SPSS include some options for "directions" or "methods" when performing stepwise regression. Forward is equivalent to stepwise, except that once a variable is included, it cannot be removed. Remove is stepwise in reverse; that is, the initial equation contains all the variables, and each step removes the least significant one (not available in JMP). Backward is similar to remove, although we cannot reintroduce a variable once it is removed from the equation. JMP also has mixed, a procedure that alternates between forward and backward. The authors recommend stepwise and, while preferring it, are not strongly against remove. We are not certain why anyone would prefer either forward or backward; these two processes remove the "guarantee" that all non-significant variables (using p = .10, usually) are deleted from the model/equation.
- 9.
While standardized coefficients provide an indication of the relative importance of the variables in a stepwise regression, this would not necessarily be the case in a "regular" multiple regression. This is because there can be large amounts of multicollinearity in a regular multiple regression, while this element is eliminated to a very large degree in the stepwise process.
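As an aside on the "directions" discussed in Note 8: base R can automate this kind of search with its step() function, which alternates forward and backward moves (comparable to JMP's mixed direction). Note, however, that step() selects terms by AIC rather than by p-values, so its result can differ from the p-value-based procedure shown in the appendix. A minimal sketch, assuming the rating data frame and rating_model object constructed in the appendix:

```r
# Start from the intercept-only model and let step() add/drop terms by AIC.
rating_none <- lm(y ~ 1, data = rating)
rating_step <- step(rating_none,
                    scope = formula(rating_model),  # largest model considered
                    direction = "both")             # alternate forward/backward
summary(rating_step)
```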
Electronic Supplementary Material
Example 15.5 (XLS 59 kb)
Appendix
Example 15.8 Faculty Ratings using R
To analyze the faculty ratings example, we can import the data as we have done previously or create our own in R.
> x1 <- c(1, 4, 4, 2, 4, 4, 4, 5, 4, 3, 4, 4, 3, 4, 3)
> x2 <- c(4, 4, 3, 3, 4, 4, 4, 5, 3, 3, 3, 3, 3, 3, 4)
> x3 <- c(4, 3, 4, 4, 4, 3, 5, 5, 4, 4, 4, 4, 3, 3, 4)
> x4 <- c(4, 4, 4, 4, 4, 3, 4, 4, 3, 3, 3, 3, 3, 3, 2)
⋮
> x12 <- c(3, 2, 1, 2, 3, 2, 3, 2, 2, 3, 1, 2, 2, 2, 1)
> y <- c(4, 4, 3, 3, 4, 4, 4, 5, 4, 3, 4, 3, 3, 3, 4)
> rating <- data.frame(x1, x2, x3, x4, …, x12, y)
First, let’s see how we perform a multiple regression analysis. The functions used are the ones we already know:
> rating_model <- lm(y~x1+x2+x3+x4+…+x12, data=rating)
> summary(rating_model)

Call:
lm(formula = y~x1+x2+x3+x4+…+x12, data=rating)

Residuals:
       1        2        3        4        5        6
 0.01552 -0.10636 -0.01592 -0.04003  0.14890 -0.02140
       7        8        9       10       11       12
-0.04565  0.01751 -0.06493  0.05061  0.21131 -0.22315
      13       14       15
 0.02642  0.07319 -0.02603
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -0.40784    0.84199  -0.484    0.676
x1           0.26856    0.19360   1.387    0.300
x2           0.01166    0.31473   0.037    0.974
x3           0.31028    0.21674   1.432    0.289
x4           0.02993    0.43669   0.069    0.952
x5          -0.17622    0.16670  -1.057    0.401
x6           0.20136    0.42008   0.479    0.679
x7           0.05440    0.14016   0.388    0.735
x8           0.09736    0.24867   0.392    0.733
x9           0.17106    0.14630   1.169    0.363
x10          0.27376    0.19890   1.376    0.303
x11          0.10341    0.32860   0.315    0.783
x12          0.00783    0.38118   0.021    0.985
Residual standard error: 0.2705 on 2 degrees of freedom
Multiple R-squared: 0.9726, Adjusted R-squared: 0.8079
F-statistic: 5.906 on 12 and 2 DF, p-value: 0.1538
Our model is obtained as follows:
> rating_model

Call:
lm(formula = y~x1+x2+x3+x4+…+x12, data=rating)

Coefficients:
(Intercept)          x1          x2          x3          x4          x5
   -0.40784     0.26856     0.01166     0.31028     0.02993    -0.17622
         x6          x7          x8          x9         x10         x11
    0.20136     0.05440     0.09736     0.17106     0.27376     0.10341
        x12
    0.00783
There are different ways a stepwise regression can be performed in R. Here, we demonstrate a semi-automated procedure that uses the p-value as the selection criterion. Unlike other software packages, R requires us to select which variable will be included or excluded at each step. First, we create a model that contains only the intercept (denoted "1" in R) and none of the independent variables:
> rating_none <- lm(y~1, data=rating)
Then, using the add1() or drop1() functions, we can add or remove single terms from the model. This is done as follows:
> add1(rating_none, formula(rating_model), test="F")
Single term additions

Model:
y ~ 1
       Df Sum of Sq    RSS     AIC F value    Pr(>F)
<none>              5.3333 -13.511
x1      1    0.5178 4.8155 -13.043  1.3978  0.258258
x2      1    3.7984 1.5349 -30.194 32.1717 7.643e-05 ***
x3      1    0.9496 4.3837 -14.452  2.8161  0.117186
x4      1    0.1786 5.1548 -12.022  0.4503  0.513918
x5      1    0.2976 5.0357 -12.372  0.7683  0.396645
x6      1    2.7083 2.6250 -22.145 13.4127  0.002869 **
x7      1    0.1190 5.2143 -11.850  0.2968  0.595116
x8      1    2.8161 2.5172 -22.773 14.5434  0.002151 **
x9      1    0.3592 4.9741 -12.557  0.9388  0.350278
x10     1    2.9207 2.4126 -23.410 15.7378  0.001609 **
x11     1    3.9062 1.4271 -31.286 35.5839 4.705e-05 ***
x12     1    0.0160 5.3173 -11.556  0.0392  0.846154
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Next, we select the variable with the smallest p-value (in this case, X11) and introduce it into our intercept-only model:
> rating_best <- lm(y~1+x11, data=rating)
> add1(rating_best, formula(rating_model), test="F")
Single term additions

Model:
y ~ 1 + x11
       Df Sum of Sq     RSS     AIC F value  Pr(>F)
<none>              1.42708 -31.286
x1      1   0.15052 1.27656 -30.958  1.4149 0.25724
x2      1   0.47429 0.95279 -35.346  5.9735 0.03093 *
x3      1   0.22005 1.20703 -31.798  2.1877 0.16488
x4      1   0.10665 1.32043 -30.451  0.9693 0.34430
x5      1   0.00125 1.42584 -29.299  0.0105 0.92013
x6      1   0.02708 1.40000 -29.574  0.2321 0.63861
x7      1   0.11905 1.30804 -30.593  1.0922 0.31659
x8      1   0.68192 0.74517 -39.033 10.9814 0.00618 **
x9      1   0.04419 1.38289 -29.758  0.3835 0.54732
x10     1   0.05887 1.36821 -29.918  0.5164 0.48616
x12     1   0.00453 1.42256 -29.334  0.0382 0.84834
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
We keep doing this until there are no significant variables left:
> rating_best <- lm(y~1+x11+x8, data=rating)
> add1(rating_best, formula(rating_model), test="F")
Single term additions

Model:
y ~ 1 + x11 + x8
       Df Sum of Sq     RSS     AIC F value Pr(>F)
<none>              0.74517 -39.033
x1      1  0.011724 0.73344 -37.271  0.1758 0.6831
x2      1  0.156982 0.58818 -40.581  2.9358 0.1146
x3      1  0.072753 0.67241 -38.574  1.1902 0.2986
x4      1  0.024748 0.72042 -37.540  0.3779 0.5512
x5      1  0.012667 0.73250 -37.290  0.1902 0.6712
x6      1  0.020492 0.72468 -37.451  0.3110 0.5882
x7      1  0.001921 0.74325 -37.072  0.0284 0.8691
x9      1  0.007752 0.73742 -37.190  0.1156 0.7402
x10     1  0.049515 0.69565 -38.064  0.7830 0.3952
x12     1  0.009649 0.73552 -37.228  0.1443 0.7113
Since all the remaining variables are non-significant, we stop the stepwise process and, using X8 and X11, fit our final model:
> rating_final <- lm(y~x8+x11, data=rating)
> rating_final

Call:
lm(formula = y~x8+x11, data=rating)

Coefficients:
(Intercept)           x8          x11
     0.9209       0.3392       0.6011
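The fitted equation is thus Ŷ = 0.9209 + 0.3392·X8 + 0.6011·X11. As a quick sketch of how the final model would be used for prediction (the new ratings below are hypothetical values chosen for illustration):

```r
# Predicted overall rating for a hypothetical faculty member
# who scored 4 on question 8 and 4 on question 11.
new_faculty <- data.frame(x8 = 4, x11 = 4)
predict(rating_final, newdata = new_faculty)
# By hand, with the rounded coefficients: 0.9209 + 0.3392*4 + 0.6011*4 ≈ 4.68
```

predict() uses the full-precision coefficients stored in rating_final, so its answer may differ very slightly from the hand calculation.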
Copyright information
© 2018 Springer International Publishing AG
Cite this chapter
Berger, P.D., Maurer, R.E., Celli, G.B. (2018). Multiple Linear Regression. In: Experimental Design. Springer, Cham. https://doi.org/10.1007/978-3-319-64583-4_15
DOI: https://doi.org/10.1007/978-3-319-64583-4_15
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-64582-7
Online ISBN: 978-3-319-64583-4
eBook Packages: Mathematics and Statistics (R0)