Analytics with Simple Regression to Identify Drivers and Forecast

Fraser, Cynthia

doi:10.1007/978-3-031-42555-4_4

Cynthia Fraser²

74 Accesses

Abstract

Analytics from regression can easily create a long range forecast based on trend. Regression and correlation, upon which regression is based, reflect linear association between two variables. Regression quantifies the influence of a continuous, independent driver x on a continuous dependent, performance variable y. In the case of a trend focused forecast, the driving variable x is time period. In later chapters, focus will be expanded to both explain how an independent decision variable x drives a dependent performance variable y and also in predicting performance y to compare the impact of alternate decision variable x values. X is also called a predictor, since from x we can predict y. Here, focus is on prediction of performance or response y from knowledge of the driver, time period, x.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Log in via an institution

Chapter: USD 29.95; Price excludes VAT (USA)

eBook: USD 89.00; Price excludes VAT (USA)

Hardcover Book: USD 119.99; Price excludes VAT (USA)

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Author information

Authors and Affiliations

McIntire School of Commerce, University of Virginia, Charlottesville, VA, USA
Cynthia Fraser

Authors

Cynthia Fraser
View author publications
You can also search for this author in PubMed Google Scholar

4.1 Electronic Supplementary Material(s)

Gun ownership master (XLSX 26 kb)

Hotels DC regression master (XLSX 37 kb)

TSC master (XLSX 37 kb)

Ulta trend forecast master (XLSX 37 kb)

Appendices

Excel 4.1: Analytics to Build a Simple Linear Regression Model with Cross Sectional Data

Gun Ownership

Use regression analysis to explore the linear influence of a states population on gun ownership in a state. Data are in Gun Ownership in the States.xlsx.

To confirm that the relationship is approximately linear, first insert a scatterplot of gun ownership by population.

Select the data in columns B and C, then Insert, Scatterplot. Right click a data point and choose Add Trendline:

A screenshot of a scatterplot of guns in thousands has a positive correlation. A list of options pops up on the right over the graph. Add trendline option is selected. The fill and outline options are at the top.

Activate the Analysis Tool Pak Add In: File, Options, Add Ins, Analysis ToolPak.

A pop-up window titled add-ins has four checkboxes labeled analysis tool Pak, analysis tool Pak V B A, Euro currency tools, and solver add-in. Analysis tool Pak is enabled. O K, cancel, browse, and automation buttons are on the right. O K is selected.

Call regression in the Analysis ToolPak: Data, Data Analysis, Regression. Enter the dependent variable cells in the Y Range, c1:c52. Enter the independent variable cells in the X Range, b1:b52. Choose Labels and Residuals. OK.

A pop-up window titled regression includes the input options, output options, residuals, and normal probability options. O K, cancel, and help buttons are on the right. O K is selected.

An Excel sheet has 3 tables of summary output. The column headers of 1. Regression statistics have multiple R, R square, Adjusted R, Standard, and observation. 2. d f, S S , M S, F and significance F. 3. coefficient, standard error, t s t a t, P-value, lower 95%, and upper 95%.

Add cells in column F and then find the one tail pvalue in F18 by dividing the two tail pvalue by 2:

An Excel sheet has a table with 5 columns and 2 rows. The column headers are coefficient, standard error, t s t a t, P-value, and one tail p value. The last cell of the first row is empty and the second row, 3.22388 E negative 11, is selected.

To express the regression model in an equation, use the equation editor. Choose Insert, then Equation (not from the drop down menu):

A screenshot of the menu bar highlights the insert menu. The equation is selected.

In the template that appears, enter the dependent variable name, and about half way through the name, go to the equation design tab and choose Accent, then last choice in the top row, to produce a hat. The hat means “predicted,” and is important to include:

An Excel sheet highlights the design menu in the menu bar. The accent is selected in the design options. A grid of accents appears on the right. A table with 9 columns and 2 rows is on the left.

Use the right arrow key to move out of the accent space and complete the dependent variable name, followed by parentheses containing units. Before entering the closing parenthesis, choose subscript:

An Excel sheet highlights the design menu in the menu bar. The e power x script is selected in the design options. A grid of subscripts and superscripts appears on the right.

Enter the closing parenthesis and subscript i, the equal sign, and the intercept, with units in parentheses:

A formula in the highlighted rectangular box reads guns owned of K subscript i = 37 of K.

Enter the slope with parentheses for units. The units are in K per M people, so a fraction is needed in the parentheses:

An Excel sheet highlights the design menu in the menu bar. The x by y fraction is selected in the design options. A list of fractions and common fractions appears on the right, and a table is on the left.

Enter K and M people in the fraction, followed by the multiplication sign:

An Excel sheet highlights the design menu in the menu bar. A list of symbols with the multiplication sign is highlighted. A table with 7 columns and 2 rows is at the bottom.

Enter the independent variable name, with units in parentheses and a subscript i:

$$ \mathrm{Guns}\ \mathrm{O}\hat{w} ned{(K)}_i=37(K)+16.9\left(\frac{K}{M\ people}\right)\times population{(M)}_i $$

The equation will be useful, since it presents the partworths, two in this case, which are in comparable units and together sum to predicted values.

Excel 4.2: Analytics to Build a Simple Linear Regression Model with Time Series Data

Tractor Supply Company

Use regression analysis to explore the linear influence of trend on sales across a time series sample of 11 years.

Be sure to exclude the two most recent datapoints from 2018 and 2019, so that you can later validate your model. The Input Y Range are the cells with sales. The Input X Range are the cells with corresponding years, 2009 through 2017. Specify Labels and Residuals.

A pop-up window titled regression includes the input and output options and residual options. O K, cancel, and help buttons are on the right. O K is selected. The table with 2 columns and 10 rows on the left highlights the second column. The column headers are t and net sales.

Regression statistics
Multiple R	0.9996
R Square	0.9992
Adjusted R Square	0.9991
Standard error	0.042
Observations	9
ANOVA
	df	SS	MS	F	Significance F
Regression	1	16.02	16.02	9173.	3.6E-12
Residual	7	0.012	0.002
Total	8	16.03
Coefficients		Standard error	t Stat	P-value	Lower 95%	Upper 95%
Intercept	−1035	10.9	−95.3	3.7E-12	−1061	−1009
Year	0.517	0.0054	95.8	3.6E-12	0.50	0.53

Observation	Predicted net sales ($B)	Residuals
1	3.14	0.056
2	3.66	−0.061
3	4.18	0.022
4	4.69	0.006
5	5.21	−0.011
6	5.73	−0.028
7	6.24	−0.044
8	6.76	0.039
9	7.28	0.022

Assess Residuals

Assess the residuals by finding the Durbin Watson statistic to test presence of positive autocorrelation, unaccounted for trend, shifts, shocks or cycles: sumxmy2(c25:c32,c26:c33)/sumsq(c25:c33).

An Excel sheet has a table with 4 columns and 9 rows. The column headers are observation, predicted net sales, residuals, and D W. The residuals column is highlighted.

Compare DW with the lower and upper critical values for n = 9, K = 1: dL = .82 dU = 1.32.

in http://berument.bilkent.edu.tr/DW.pdf

A table with the critical values for the Durbin-Watson statistic has 5 columns and 4 rows. The column headers are n, k = 1. k = 2. k = 3, and k = 4. The k columns are divided into d L and d U. Level of significance alpha = 0.05.

Next, plot the residuals by time period.

Copy the year from the data sheet and paste to the left of the residuals in the regression sheet.

Select the years and residuals, then insert a scatterplot.

An Excel sheet has a table with 2 columns and 9 rows on the left. The column headers are year and residuals. The residual plot on the right has a W-shaped line between 2008 and 2018 with a peak of 0.060 in 2009 and a trough of negative 0.060 in 2010.

Right click the vertical axis and choose Format Axis to change major units to the regression standard error (in B7).

An Excel sheet has a residual plot with an irregular W-shaped line between 2008 and 2018 on the left. A window titled Format Axis on the right indicates the axis options with bounds, units, and horizontal axis crosses.

Validate with Excel

Copy the data and paste next to residuals, and then use the regression equation with the coefficients (always in B17 and B18) to find predicted sales. (Use f4, function 4, to lock cell references so that your equation will use the coefficients to make predicted values in each row.)

The Intercept ($B) is = B17 f4.

The trend part worth ($B) is = B18 f4 *F25.

Predicted sales ($B) are = SUM(H25:I25)

Select H25:J25 and down fill.

An Excel sheet includes a table of residual output with 10 columns and 6 rows. The column headers are observation, predicted net sales, year, residuals, D W, year, net sales, intercept p w, and predicted sales.

Adjust the regression standard error to enable creation of individual prediction interval bounds.

In F36, find the mean year used in the regression: =average(F25:F33).

In F37, find the sum of squared deviations from the mean year: =DEVSQ(F25:F33).

An Excel sheet has a table with 4 columns and 3 rows. The years in the third column are highlighted. The formula for the average value is entered in the cell below.

Find the standard error adjustment for each year.

In K25: =SQRT(1 + 1/$B$8 + (F25-$F$36)^2/($B$8–1)/($F$37)).

Downfill.

An Excel sheet includes a formula of square root, followed by a column of 9 values from 1.056963442 to 1.079655763.

Adjust the regression standard error for each year: K25*$B$7.

Downfill.

An Excel sheet has a table with 3 columns and 11 rows. The column headers are predicted sales, s e adjustment, and set. The value 1.065529697 in the second cell and a formula in the third cell of the first row are highlighted.

Find the critical t value from the critical t for residual degrees of freedom (always in B13): =T.INV.2 T(.05,B13).

An Excel sheet includes the adjusted R, standard error, and observation. The ANOVA table at the bottom has 3 columns and 3 rows. The column headers are d f, S S, and F. The row reads regression, residual, and total. The d f value7 of residual is selected.

Find the margin of error for each year, multiplying the critical t times the adjusted standard error.

In M25, locking the cell reference to C7 with function 4: =$C$7*L25.

Downfill.

An Excel sheet has a table with 3 columns and 11 rows. The column headers are s e adjustment, set, and met. The value 0.0445 in the second cell and the formula in the third cell of the first row are highlighted.

To test the predictive validity of the model, find the lower 95% and upper 95% prediction interval bounds for each year by adding and subtracting the margin of error to and from predicted revenues.

Downfill.

Compare actual exports for 2018 and 2019, in F34 and F35, with 95% prediction interval bounds.

An Excel sheet has a table with 9 columns and 11 rows. The column headers net sales, intercept p w, trend p w, predicted sales, s e adjustment, set, met, lower pi, and upper pi. The lower and upper pi values of net sales =7.9 and 8.4 are highlighted.

Recalibrate and Present Fit and Forecast

Recalibrate to update your fit and forecast. Run regression again, this time including the two most recent datapoints.

Update your fit and forecast by reusing formulas for predicted sales, the standard error adjustment, critical t, margins of error, and the lower and upper 95% individual prediction interval bounds from the first regression.

Select year t and net sales, hold Cntl or Command down, and, with your mouse, select the lower and upper prediction interval cells and request a scatterplot of the fit and forecast.

Remove the actual sales line.

Select the Series Net Sales and right click to format the series.

Choose the paint can icon, Line, No Line.

A pop-up window titled Format Data Series indicates the series options with line and marker symbols. The drop-down options of the line are no line, solid line, gradient line, and automatic. No line option is enabled.

Change the color of one of the prediction interval lines to match the other and remove markers from both prediction intervals: Markers, Marker Options, none.

Add a vertical axis title and chart title. Right click on the vertical axis, Format Axis, then adjust the axis to make better use of white space.

A 2-line and dot plot of sales versus years from 2009 to 2024 plots an increasing trend in proximity, indicating the lower and upper pi. The dot plot of net sales lies between the 2 lines.

Case 4.1: DC Hotel Prices

Do hotel prices reflect Guest Ratings? Use Hotels DC.xlsx to investigate with simple regression.

(a)
Plot prices by guest ratings and add a trendline:
(b)
Copy and paste the coefficient section of your regression results beginning in row 16, and showing the driver name, coefficients, and one tail pvalue.
(c)
Present your regression equation. Be sure to include the hat, units, and subscripts. Round to two or three significant digits.
(d)
Interpret the driver coefficient:
(e)
Which hotels have prices much higher than predicted?
(f)
Which hotels have prices much lower than predicted?

Case 4.2: Forecast of Ulta Net Sales

An investor seeks a forecast of Ulta’s net sales through 2024. Use the data in Ulta Forecast.xlsx to build a regression model and make the forecast.

(a)
Plot net sales by year and add a trendline:
(b)
Copy and past the coefficient section of your regression results beginning in row 16, showing the driver name, coefficients and one tail pvalue.
(c)
Present your regression equation. Be sure to include the hat, units, and subscripts. Round to two or three significant digits.
(d)
Interpret the driver coefficient:
(e)
Present a scatterplot of your fit and forecast. Label the vertical axis, add a stand alone title, remove unwelcome zeros, and adjust the axes to make good use of white space.

Rights and permissions

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Fraser, C. (2024). Analytics with Simple Regression to Identify Drivers and Forecast. In: Business Statistics for Competitive Advantage with Excel and JMP . Springer, Cham. https://doi.org/10.1007/978-3-031-42555-4_4

Download citation

DOI: https://doi.org/10.1007/978-3-031-42555-4_4
Published: 05 March 2024
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42554-7
Online ISBN: 978-3-031-42555-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)

Publish with us

Policies and ethics

Analytics with Simple Regression to Identify Drivers and Forecast

Abstract

Access this chapter

Author information

Authors and Affiliations

4.1 Electronic Supplementary Material(s)

Gun ownership master (XLSX 26 kb)

Hotels DC regression master (XLSX 37 kb)

TSC master (XLSX 37 kb)

Ulta trend forecast master (XLSX 37 kb)

Appendices

Excel 4.1: Analytics to Build a Simple Linear Regression Model with Cross Sectional Data

Gun Ownership

Excel 4.2: Analytics to Build a Simple Linear Regression Model with Time Series Data

Tractor Supply Company

Assess Residuals

Validate with Excel

Recalibrate and Present Fit and Forecast

Case 4.1: DC Hotel Prices

Case 4.2: Forecast of Ulta Net Sales

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation