Abstract
Analytics from regression can easily create a long range forecast based on trend. Regression and correlation, upon which regression is based, reflect linear association between two variables. Regression quantifies the influence of a continuous, independent driver x on a continuous dependent, performance variable y. In the case of a trend focused forecast, the driving variable x is time period. In later chapters, focus will be expanded to both explain how an independent decision variable x drives a dependent performance variable y and also in predicting performance y to compare the impact of alternate decision variable x values. X is also called a predictor, since from x we can predict y. Here, focus is on prediction of performance or response y from knowledge of the driver, time period, x.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Author information
Authors and Affiliations
4.1 Electronic Supplementary Material(s)
Appendices
Excel 4.1: Analytics to Build a Simple Linear Regression Model with Cross Sectional Data
Gun Ownership
Use regression analysis to explore the linear influence of a states population on gun ownership in a state. Data are in Gun Ownership in the States.xlsx.
To confirm that the relationship is approximately linear, first insert a scatterplot of gun ownership by population.
Select the data in columns B and C, then Insert, Scatterplot. Right click a data point and choose Add Trendline:
Activate the Analysis Tool Pak Add In: File, Options, Add Ins, Analysis ToolPak.
Call regression in the Analysis ToolPak: Data, Data Analysis, Regression. Enter the dependent variable cells in the Y Range, c1:c52. Enter the independent variable cells in the X Range, b1:b52. Choose Labels and Residuals. OK.
Add cells in column F and then find the one tail pvalue in F18 by dividing the two tail pvalue by 2:
To express the regression model in an equation, use the equation editor. Choose Insert, then Equation (not from the drop down menu):
In the template that appears, enter the dependent variable name, and about half way through the name, go to the equation design tab and choose Accent, then last choice in the top row, to produce a hat. The hat means “predicted,” and is important to include:
Use the right arrow key to move out of the accent space and complete the dependent variable name, followed by parentheses containing units. Before entering the closing parenthesis, choose subscript:
Enter the closing parenthesis and subscript i, the equal sign, and the intercept, with units in parentheses:
Enter the slope with parentheses for units. The units are in K per M people, so a fraction is needed in the parentheses:
Enter K and M people in the fraction, followed by the multiplication sign:
Enter the independent variable name, with units in parentheses and a subscript i:
The equation will be useful, since it presents the partworths, two in this case, which are in comparable units and together sum to predicted values.
Excel 4.2: Analytics to Build a Simple Linear Regression Model with Time Series Data
Tractor Supply Company
Use regression analysis to explore the linear influence of trend on sales across a time series sample of 11 years.
Be sure to exclude the two most recent datapoints from 2018 and 2019, so that you can later validate your model. The Input Y Range are the cells with sales. The Input X Range are the cells with corresponding years, 2009 through 2017. Specify Labels and Residuals.
Regression statistics | ||||||
Multiple R | 0.9996 | |||||
R Square | 0.9992 | |||||
Adjusted R Square | 0.9991 | |||||
Standard error | 0.042 | |||||
Observations | 9 | |||||
ANOVA | ||||||
df | SS | MS | F | Significance F | ||
Regression | 1 | 16.02 | 16.02 | 9173. | 3.6E-12 | |
Residual | 7 | 0.012 | 0.002 | |||
Total | 8 | 16.03 | ||||
Coefficients | Standard error | t Stat | P-value | Lower 95% | Upper 95% | |
Intercept | −1035 | 10.9 | −95.3 | 3.7E-12 | −1061 | −1009 |
Year | 0.517 | 0.0054 | 95.8 | 3.6E-12 | 0.50 | 0.53 |
Observation | Predicted net sales ($B) | Residuals |
---|---|---|
1 | 3.14 | 0.056 |
2 | 3.66 | −0.061 |
3 | 4.18 | 0.022 |
4 | 4.69 | 0.006 |
5 | 5.21 | −0.011 |
6 | 5.73 | −0.028 |
7 | 6.24 | −0.044 |
8 | 6.76 | 0.039 |
9 | 7.28 | 0.022 |
Assess Residuals
Assess the residuals by finding the Durbin Watson statistic to test presence of positive autocorrelation, unaccounted for trend, shifts, shocks or cycles: sumxmy2(c25:c32,c26:c33)/sumsq(c25:c33).
Compare DW with the lower and upper critical values for n = 9, K = 1: dL = .82 dU = 1.32.
in http://berument.bilkent.edu.tr/DW.pdf
Next, plot the residuals by time period.
Copy the year from the data sheet and paste to the left of the residuals in the regression sheet.
Select the years and residuals, then insert a scatterplot.
Right click the vertical axis and choose Format Axis to change major units to the regression standard error (in B7).
Validate with Excel
Copy the data and paste next to residuals, and then use the regression equation with the coefficients (always in B17 and B18) to find predicted sales. (Use f4, function 4, to lock cell references so that your equation will use the coefficients to make predicted values in each row.)
The Intercept ($B) is = B17 f4.
The trend part worth ($B) is = B18 f4 *F25.
Predicted sales ($B) are = SUM(H25:I25)
Select H25:J25 and down fill.
Adjust the regression standard error to enable creation of individual prediction interval bounds.
In F36, find the mean year used in the regression: =average(F25:F33).
In F37, find the sum of squared deviations from the mean year: =DEVSQ(F25:F33).
Find the standard error adjustment for each year.
In K25: =SQRT(1 + 1/$B$8 + (F25-$F$36)^2/($B$8–1)/($F$37)).
Downfill.
Adjust the regression standard error for each year: K25*$B$7.
Downfill.
Find the critical t value from the critical t for residual degrees of freedom (always in B13): =T.INV.2 T(.05,B13).
Find the margin of error for each year, multiplying the critical t times the adjusted standard error.
In M25, locking the cell reference to C7 with function 4: =$C$7*L25.
Downfill.
To test the predictive validity of the model, find the lower 95% and upper 95% prediction interval bounds for each year by adding and subtracting the margin of error to and from predicted revenues.
Downfill.
Compare actual exports for 2018 and 2019, in F34 and F35, with 95% prediction interval bounds.
Recalibrate and Present Fit and Forecast
Recalibrate to update your fit and forecast. Run regression again, this time including the two most recent datapoints.
Update your fit and forecast by reusing formulas for predicted sales, the standard error adjustment, critical t, margins of error, and the lower and upper 95% individual prediction interval bounds from the first regression.
Select year t and net sales, hold Cntl or Command down, and, with your mouse, select the lower and upper prediction interval cells and request a scatterplot of the fit and forecast.
Remove the actual sales line.
Select the Series Net Sales and right click to format the series.
Choose the paint can icon, Line, No Line.
Change the color of one of the prediction interval lines to match the other and remove markers from both prediction intervals: Markers, Marker Options, none.
Add a vertical axis title and chart title. Right click on the vertical axis, Format Axis, then adjust the axis to make better use of white space.
Case 4.1: DC Hotel Prices
Do hotel prices reflect Guest Ratings? Use Hotels DC.xlsx to investigate with simple regression.
-
(a)
Plot prices by guest ratings and add a trendline:
-
(b)
Copy and paste the coefficient section of your regression results beginning in row 16, and showing the driver name, coefficients, and one tail pvalue.
-
(c)
Present your regression equation. Be sure to include the hat, units, and subscripts. Round to two or three significant digits.
-
(d)
Interpret the driver coefficient:
-
(e)
Which hotels have prices much higher than predicted?
-
(f)
Which hotels have prices much lower than predicted?
Case 4.2: Forecast of Ulta Net Sales
An investor seeks a forecast of Ulta’s net sales through 2024. Use the data in Ulta Forecast.xlsx to build a regression model and make the forecast.
-
(a)
Plot net sales by year and add a trendline:
-
(b)
Copy and past the coefficient section of your regression results beginning in row 16, showing the driver name, coefficients and one tail pvalue.
-
(c)
Present your regression equation. Be sure to include the hat, units, and subscripts. Round to two or three significant digits.
-
(d)
Interpret the driver coefficient:
-
(e)
Present a scatterplot of your fit and forecast. Label the vertical axis, add a stand alone title, remove unwelcome zeros, and adjust the axes to make good use of white space.
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Fraser, C. (2024). Analytics with Simple Regression to Identify Drivers and Forecast. In: Business Statistics for Competitive Advantage with Excel and JMP . Springer, Cham. https://doi.org/10.1007/978-3-031-42555-4_4
Download citation
DOI: https://doi.org/10.1007/978-3-031-42555-4_4
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42554-7
Online ISBN: 978-3-031-42555-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)