
Introduction to Simple Regression

A chapter in Experimental Design

Abstract

In previous chapters, we have had data for which there has been a dependent variable (Y) and an independent variable (X – even though, to be consistent with the notation that is close to universal in the field of experimental design, we have been using factor names, A, B, etc., or “column factor” and “row factor,” instead of, literally, the letter X). The latter has been treated mostly as a categorical variable, whether actually numerical/metric or not. Often, we have had more than one independent variable. Assuming only one independent variable, if we want to say it this way (and we do!), we can say that we have had n (X, Y) pairs of data, where n is the total number of data points. With more than one independent variable, we can say that we have n (X1, X2, …, Y) data points.


Notes

  1.

    Unless the scatter diagram indicates dramatically otherwise, we usually consider first a straight line, since it is the simplest functional form.

  2.

    The correlation coefficient is calculated as

    $$ r=\frac{\sum_{i=1}^n\left({X}_i-\overline{X}\right)\left({Y}_i-\overline{Y}\right)}{\left(n-1\right){s}_X{s}_Y} $$

    where \( s_X \) and \( s_Y \) are the sample standard deviations of the independent and the dependent variables, respectively, and, of course, \( \overline{X} \) and \( \overline{Y} \) are their sample means. Note that r is the same regardless of which variable is labeled as independent or dependent.

  3.

    There are an infinite number of possible lines – so that REALLY would take up our free time! Until the availability of personal computers with software packages such as Excel, we would find the parameter values by evaluating equations by hand.

  4.

    It can be shown that the LS line is unique.

  5.

    Confidence intervals for predictions are covered in the next chapter.
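The “evaluating equations” mentioned in Note 3 refers to the closed-form least-squares solution. For reference (a standard result, stated here in the notation of Note 2), the slope and intercept of the LS line are

$$ {b}_1=\frac{\sum_{i=1}^n\left({X}_i-\overline{X}\right)\left({Y}_i-\overline{Y}\right)}{\sum_{i=1}^n{\left({X}_i-\overline{X}\right)}^2},\kern2em {b}_0=\overline{Y}-{b}_1\overline{X} $$

where b1 and b0 denote, respectively, the slope and intercept of the unique LS line of Note 4. Comparing with the formula in Note 2, the numerator of b1 is the same as that of r, and the denominator equals \( \left(n-1\right){s}_X^2 \); hence, equivalently, \( {b}_1=r\;{s}_Y/{s}_X \).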


Appendix

Example 14.9 Trends in Selling Toys using R

To analyze the same example, we can import the data as we have done previously or create our own in R. We will demonstrate the second option here – after all, it is more fun! This is how it is done:

> advertisement <- c(10, 20, 30, 40, 50, 60, 70, 80)
> sales <- c(35, 80, 56, 82, 126, 104, 177, 153)
> toy <- data.frame(advertisement, sales)

A quick inspection will show us the data frame was successfully created.

> toy
  advertisement sales
1            10    35
2            20    80
3            30    56
4            40    82
5            50   126
6            60   104
7            70   177
8            80   153

Using the plot() function, we can generate a scatter plot of our data, shown in Fig. 14.10. The correlation analysis is shown next.

> plot(toy, pch=16, cex=1.0, main="Sales vs. Advertisement")

Fig. 14.10 Scatter plot in R

> cor(toy, method="pearson")
              advertisement     sales
advertisement     1.0000000 0.9060138
sales             0.9060138 1.0000000

> cor.test(toy$advertisement, toy$sales, method="pearson")

        Pearson's product-moment correlation

data:  toy$advertisement and toy$sales
t = 5.2434, df = 6, p-value = 0.001932
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.5568723 0.9830591
sample estimates:
      cor
0.9060138
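As a quick check (our own sketch, not part of the original output), the value reported by cor() can be reproduced directly from the formula in Note 2:

```r
# Re-create the toy data from the appendix
advertisement <- c(10, 20, 30, 40, 50, 60, 70, 80)
sales <- c(35, 80, 56, 82, 126, 104, 177, 153)

# r = sum((X - Xbar)(Y - Ybar)) / ((n - 1) * sX * sY), as in Note 2
n <- length(advertisement)
r_manual <- sum((advertisement - mean(advertisement)) *
                (sales - mean(sales))) /
            ((n - 1) * sd(advertisement) * sd(sales))
round(r_manual, 7)  # 0.9060138, matching cor()
```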

Now, we will perform a regression analysis using the lm() function.

> toy_regression <- lm(sales~advertisement, data=toy)
> summary(toy_regression)

Call:
lm(formula = sales ~ advertisement, data = toy)

Residuals:
    Min      1Q  Median      3Q     Max
-24.393 -13.027  -7.434  17.336  30.762

Coefficients:
              Estimate Std. Error t value Pr(>|t|)
(Intercept)    21.3214    17.1861   1.241  0.26105
advertisement   1.7845     0.3403   5.243  0.00193 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 22.06 on 6 degrees of freedom
Multiple R-squared:  0.8209,	Adjusted R-squared:  0.791
F-statistic: 27.49 on 1 and 6 DF,  p-value: 0.001932

> anova(toy_regression)
Analysis of Variance Table

Response: sales

 

              Df  Sum Sq Mean Sq F value   Pr(>F)
advertisement  1 13375.0 13375.0  27.494 0.001932 **
Residuals      6  2918.9   486.5
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

To include the regression line in the scatter plot (shown in Fig. 14.11), we can use any of the following commands:

> abline(21.3214, 1.7845)
> abline(lm(sales~advertisement))
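Rather than transcribing the rounded estimates by hand, the coefficients can also be pulled from the fitted model with coef() – a small sketch of our own, re-creating the data from the appendix so it is self-contained:

```r
# Re-create the toy data and the fitted model from the appendix
advertisement <- c(10, 20, 30, 40, 50, 60, 70, 80)
sales <- c(35, 80, 56, 82, 126, 104, 177, 153)
toy <- data.frame(advertisement, sales)
toy_regression <- lm(sales ~ advertisement, data = toy)

b <- coef(toy_regression)  # named vector: (Intercept), advertisement
round(b, 4)                # 21.3214 and 1.7845, as in the summary

plot(toy, pch = 16)
abline(toy_regression)     # abline() also accepts the lm object itself
```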

Fig. 14.11 Scatter plot with the regression line in R


Copyright information

© 2018 Springer International Publishing AG


Cite this chapter

Berger, P.D., Maurer, R.E., Celli, G.B. (2018). Introduction to Simple Regression. In: Experimental Design. Springer, Cham. https://doi.org/10.1007/978-3-319-64583-4_14
