
Abstract

This chapter introduces the reader to data modelling using least-squares regression analysis through a simplified framework consisting of three iterative steps: model selection, parameter estimation, and model validation. This framework forms the foundation for all subsequent chapters. Model selection focuses on choosing an appropriate description of the data set given both physical and mathematical constraints. This chapter focuses on deterministic models, while subsequent chapters treat stochastic or more complex models. Parameter estimation seeks to determine the values of the parameters for the given model and data set. Different approaches, including ordinary linear regression, weighted linear regression, and nonlinear regression, are examined in detail. Theoretical results are provided as necessary to illustrate the need for some of the components of the analysis, and detailed summaries listing all the required formulae are provided after each section. Finally, model validation, which consists of two components, residual testing and model adequacy testing, is explained in detail. Suggestions for corrective actions are provided for commonly encountered issues in model validation, and detailed examples illustrate the different methods and approaches. By the end of the chapter, the reader should be familiar with the regression analysis framework and be able to apply it to complex, real-life examples.


Notes

  1.

    For two column vectors a and b, the dot product a · b can be defined as the matrix multiplication $a^T b$ or $b^T a$.

  2.

    A detailed example on solving the nonlinear regression problem is given in Sect. 7.8.2, Nonlinear Regression Example for MATLAB®, and Sect. 8.7.2, Nonlinear Regression Example for Excel®.


Appendix A3: Nonmatrix Solutions to the Linear, Least-Squares Regression Problem


A.1 Nonmatrix Solution for the Ordinary, Least-Squares Case

The nonmatrix solution applies only to a simple model that can be written as

$$ y=a+bx $$
(3.A1)

Note that x can be replaced by f(x) here and in all the following equations.

The ordinary, least-squares problem can be solved by first computing the following two quantities:

$$ s_x^2=\frac{m\sum x^2-\left(\sum x\right)^2}{m},\qquad s_y^2=\frac{m\sum y^2-\left(\sum y\right)^2}{m} $$
(3.A2)

Then, the linear regression coefficients can be calculated as follows:

$$ \widehat{b}=\frac{m\sum xy-\sum x\sum y}{m\sum x^2-\left(\sum x\right)^2},\qquad \widehat{a}=\frac{\sum y-\widehat{b}\sum x}{m} $$
(3.A3)

The correlation coefficient is calculated using

$$ R^2=\frac{\left[m\sum xy-\left(\sum x\right)\left(\sum y\right)\right]^2}{\left[m\sum x^2-\left(\sum x\right)^2\right]\left[m\sum y^2-\left(\sum y\right)^2\right]} $$
(3.A4)
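As a quick numerical illustration, the summation formulas of Eqs. (3.A3) and (3.A4) can be sketched in Python (the function name `ols_line` is illustrative, not from the text):

```python
from math import fsum

def ols_line(x, y):
    """Fit y = a + b*x using the summation formulas of Eqs. (3.A3)-(3.A4).

    Returns (a_hat, b_hat, r_squared). Illustrative sketch only.
    """
    m = len(x)
    sx, sy = fsum(x), fsum(y)
    sxx = fsum(xi * xi for xi in x)
    syy = fsum(yi * yi for yi in y)
    sxy = fsum(xi * yi for xi, yi in zip(x, y))
    # Eq. (3.A3): slope and intercept
    b_hat = (m * sxy - sx * sy) / (m * sxx - sx ** 2)
    a_hat = (sy - b_hat * sx) / m
    # Eq. (3.A4): correlation coefficient
    r2 = (m * sxy - sx * sy) ** 2 / ((m * sxx - sx ** 2) * (m * syy - sy ** 2))
    return a_hat, b_hat, r2
```

For data lying exactly on y = 2 + 3x, the fit recovers a = 2, b = 3 with R² = 1.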

The standard deviation of the model is given as

$$ \widehat{\sigma}=\sqrt{\frac{s_y^2-\widehat{b}^2 s_x^2}{m-2}} $$
(3.A5)

The standard deviation for coefficient b is given as

$$ \widehat{\sigma}\sqrt{\left[\left(\mathcal{A}^T\mathcal{A}\right)^{-1}\right]_{22}}=s_b=\frac{\widehat{\sigma}}{s_x} $$
(3.A6)

The standard deviation of coefficient a is given as

$$ \widehat{\sigma}\sqrt{\left[\left(\mathcal{A}^T\mathcal{A}\right)^{-1}\right]_{11}}=s_a=\frac{\widehat{\sigma}}{s_x}\sqrt{\frac{\sum x^2}{m}} $$
(3.A7)

The confidence interval for the mean response at a value of $x_d$ is given by

$$ \widehat{y}\pm t_{1-\frac{\alpha}{2},\,m-2}\,\widehat{\sigma}\sqrt{\frac{1}{m}+\frac{\left(x_d-\frac{1}{m}\sum x_i\right)^2}{s_x^2}} $$
(3.A8)

The confidence interval for the prediction at a value of $x_d$ is given by

$$ \widehat{y}\pm t_{1-\frac{\alpha}{2},\,m-2}\,\widehat{\sigma}\sqrt{1+\frac{1}{m}+\frac{\left(x_d-\frac{1}{m}\sum x_i\right)^2}{s_x^2}} $$
(3.A9)

The total sum of squares would then be calculated using

$$ TSS=\sum_{i=1}^m\left(y_i-\overline{y}\right)^2=\sum y^2-\frac{1}{m}\left(\sum y\right)^2 $$
(3.A10)
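The inference quantities above can be sketched as follows. Note the assumptions: the model standard deviation is taken as the square root of $(s_y^2-\widehat{b}^2 s_x^2)/(m-2)$ so that it carries the units of y, the t critical value is passed in by the caller (obtainable, e.g., from scipy.stats.t.ppf), and the function name `ols_inference` is illustrative:

```python
from math import fsum, sqrt

def ols_inference(x, y, b_hat, t_crit):
    """Standard errors and mean-response CI half-width for y = a + b*x."""
    m = len(x)
    # s_x^2 and s_y^2, Eq. (3.A2), written as corrected sums of squares
    sx2 = fsum(xi * xi for xi in x) - fsum(x) ** 2 / m
    sy2 = fsum(yi * yi for yi in y) - fsum(y) ** 2 / m
    # Model standard deviation, Eq. (3.A5); square root assumed (see lead-in)
    sigma = sqrt((sy2 - b_hat ** 2 * sx2) / (m - 2))
    s_b = sigma / sqrt(sx2)                              # Eq. (3.A6)
    s_a = s_b * sqrt(fsum(xi * xi for xi in x) / m)      # Eq. (3.A7)
    xbar = fsum(x) / m

    def ci_mean_halfwidth(xd):                           # Eq. (3.A8)
        return t_crit * sigma * sqrt(1 / m + (xd - xbar) ** 2 / sx2)

    return sigma, s_b, s_a, ci_mean_halfwidth
```

The half-width is narrowest at the mean of the x data and grows as $x_d$ moves away from it.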

A.2 Nonmatrix Solution for the Weighted, Least-Squares Case

The nonmatrix solution applies only to a simple model that can be written as

$$ y={a}_w+{b}_wx $$
(3.A11)

Note that x can be replaced by f(x) here and in all the following equations.

The weighted, least-squares problem can be solved by first computing the following two quantities:

$$ s_{x_w}^2=\frac{\sum w\sum wx^2-\left(\sum wx\right)^2}{\sum w},\qquad s_{y_w}^2=\frac{\sum w\sum wy^2-\left(\sum wy\right)^2}{\sum w} $$
(3.A12)

Then, the linear regression coefficients can be calculated as follows:

$$ \widehat{b}_w=\frac{\left(\sum w\right)\sum wxy-\sum wx\sum wy}{\left(\sum w\right)\sum wx^2-\left(\sum wx\right)^2},\qquad \widehat{a}_w=\frac{\sum wy-\widehat{b}_w\sum wx}{\sum w} $$
(3.A13)
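Eq. (3.A13) can likewise be sketched in Python (the function name `wls_line` is illustrative, not from the text):

```python
from math import fsum

def wls_line(x, y, w):
    """Weighted least-squares fit of y = a_w + b_w*x via Eq. (3.A13)."""
    sw = fsum(w)
    swx = fsum(wi * xi for wi, xi in zip(w, x))
    swy = fsum(wi * yi for wi, yi in zip(w, y))
    swxx = fsum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = fsum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    # Eq. (3.A13): weighted slope and intercept
    b_w = (sw * swxy - swx * swy) / (sw * swxx - swx ** 2)
    a_w = (swy - b_w * swx) / sw
    return a_w, b_w
```

With all weights equal, the formulas reduce to the ordinary least-squares result of Eq. (3.A3); for data lying exactly on a line, the fit is exact regardless of the weights.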

The correlation coefficient is calculated using

$$ R^2=\frac{\left[\left(\sum w\right)\sum wxy-\left(\sum wx\right)\left(\sum wy\right)\right]^2}{\left[\left(\sum w\right)\sum wx^2-\left(\sum wx\right)^2\right]\left[\left(\sum w\right)\sum wy^2-\left(\sum wy\right)^2\right]} $$
(3.A14)

The standard deviation of the model is given as

$$ \widehat{\sigma}_w=\sqrt{\frac{s_{y_w}^2-\widehat{b}_w^2 s_{x_w}^2}{m-2}} $$
(3.A15)

The standard deviation of coefficient $b_w$ is given as

$$ \widehat{\sigma}_w\sqrt{\left[\left(\mathcal{A}^T\mathcal{W}\mathcal{A}\right)^{-1}\right]_{22}}=s_{b_w}=\frac{\widehat{\sigma}_w}{s_{x_w}} $$
(3.A16)

The standard deviation of coefficient $a_w$ is given as

$$ \widehat{\sigma}_w\sqrt{\left[\left(\mathcal{A}^T\mathcal{W}\mathcal{A}\right)^{-1}\right]_{11}}=s_{a_w}=\frac{\widehat{\sigma}_w}{s_{x_w}}\sqrt{\frac{\sum wx^2}{\sum w}} $$
(3.A17)

The confidence interval for the mean response at a value of $x_d$ is given by

$$ \widehat{y}\pm t_{1-\frac{\alpha}{2},\,m-2}\,\widehat{\sigma}_w\sqrt{\frac{1}{\sum w_i}+\frac{\left(x_d-\frac{1}{\sum w_i}\sum w_i x_i\right)^2}{s_{x_w}^2}} $$
(3.A18)

The confidence interval for the prediction at a value of $x_d$ is given by

$$ \widehat{y}\pm t_{1-\frac{\alpha}{2},\,m-2-n_{\sigma}}\,\widehat{\sigma}_w\sqrt{\frac{1}{w_d}+\frac{1}{\sum w_i}+\frac{\left(x_d-\frac{1}{\sum w_i}\sum w_i x_i\right)^2}{s_{x_w}^2}} $$
(3.A19)

It should be noted that the predicted weight at the given point, $w_d$, should be determined from a model with $n_\sigma$ unknown parameters.

The total sum of squares would then be calculated using

$$ TSS=\sum_{i=1}^m w_i\left(y_i-\overline{y}\right)^2=\sum wy^2-\frac{1}{\sum w_i}\left(\sum wy\right)^2 $$
(3.A20)

where $\overline{y}=\frac{1}{\sum w_i}\sum w_i y_i$ is the weighted mean of the data.
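The two sides of Eq. (3.A20) agree when the mean is taken as the weighted mean $\sum w_i y_i/\sum w_i$; a quick numerical check (the function name `weighted_tss` is illustrative):

```python
from math import fsum

def weighted_tss(y, w):
    """Evaluate both sides of Eq. (3.A20) for comparison."""
    sw = fsum(w)
    swy = fsum(wi * yi for wi, yi in zip(w, y))
    ybar = swy / sw                                           # weighted mean
    # Left-hand side: direct weighted sum of squared deviations
    direct = fsum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, y))
    # Right-hand side: computational shortcut
    shortcut = fsum(wi * yi * yi for wi, yi in zip(w, y)) - swy ** 2 / sw
    return direct, shortcut
```

Both expressions return the same value up to floating-point rounding, which makes the shortcut form convenient for hand or spreadsheet calculation.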


Copyright information

© 2015 Springer International Publishing Switzerland


Cite this chapter

Shardt, Y.A.W. (2015). Regression. In: Statistics for Chemical and Process Engineers. Springer, Cham. https://doi.org/10.1007/978-3-319-21509-9_3
