Abstract
In this chapter, 0–1 indicator or “dummy” variables are used to incorporate segment differences, shocks, or structural shifts into regression models. With cross sectional data, indicators can be used to incorporate the unique responses of particular groups or segments. With time series data, indicators can be used to account for external shocks or structural shifts. Indicators also offer one option to account for seasonality or cyclicality in time series.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Author information
Authors and Affiliations
6.1 Electronic Supplementary Material(s)
Appendices
Excel 6.1: Analytics to Quantify the Impact of Segment Differences with Cross Sectonal Data
SUV Weight and Fuel Economy
First, plot mpg by weight to confirm that their relationship is approximately linear. Hybrids are in the first ten rows. First, select data in those rows in columns D and E, the insert a scatterplot. Then, select data in the remaining rows in columns D and E and insert a scatterplot. Copy one scatterplot and past it into the second. Right click on an observation in each of the two series and Add Trendline, as Fig. 6.8 illustrates.
Run a two driver model, with weight and the hybrid indicator in the X range. Add the one tail pvalues:
To see the relative importance of weight and type of engine, create the partworths, the portion of the regression equation due to each of the two drivers. Copy weight and the hybrid indicator from the data sheet and paste next to the residuals. Then find both of the two part worths (pw), the weight pw, and the hybrid pw, that together contribute to predicted mpg. The hybrid pw will be the hybrid coefficient in b18, locked with function 4, times the hybrid indicator. The weight pw will be the weight coefficient in b19 times weight of each vehicle. Lock the reference to the coefficient with function 4, so that for each SUV, the formula reads the coefficient:
Below the pws, find the minimums and maximums, and from those, the ranges. Add the driver names above the ranges.
Illustrate the importances that you just found with a bar graph (insert a column chart and choose 2D bar). Add axis labels and a stand alone title. Adjust the horizontal axis to begin at zero. Change the color of the weight bar to red, since weight has a negative impact, as shown in Fig. 6.9. (Click on that bar, then click Format, Fill, Red.)
Excel 6.2: Analytics to Quantify the Impact of Shifts or Shocks to Forecast with Time Series
Kroger Nonfuel Sales
First, following best modelling practice, hide the two most recent datapoints to enable later validation. Then, plot sales by year. Select year and sales and insert a scatterplot.
Add a linear trend line by right clicking on a datapoint and choosing Add Trendline.
The plot reveals that the trend is approximately linear.
Build the Validation Model
Run the one driver model in regression. The Input Y Range are the cells containing nonfuel sales, and the Input X Range are the cells containing the year. Add the one tail pvalue by dividing the two tail pvalue provided by Excel by 2.
Assess Residuals
First, copy the column of years from the data page and paste to the left of residuals. Then request a plot. Change the major units in the plot to the standard error, 2.9.
The sale of the convenience store business is apparent in 2018 and 2019.
Next, calculate Durbin Watson and compare with the online critical values (n = 10, k = 1), dL = .88 and dU = 1.32.
Add a Convenience Store Indicator and Build a Two Driver Model
Convenience stores were owned by Kroger through 2017. Account for that shift from their sale by adding an indicator variable next to years in the data sheet, setting the indicator to 1 in available years through 2017, and to 0 in 2018 onward:
Run a two driver regression model. The Input X Range is now cells with years and with the convenience store indicator: A1:B11. Add the one tail pvalues.
Assess Residuals
Copy and paste years from the data sheet to the left of residuals and insert a scatterplot. Rescale the major units to standard errors, 1.8.
The residuals appear to be pattern free, and all are no further than two standard errors from zero.
Calculate Durbin Watson and compare with online critical values (n = 10, k = 2) dL = .70 and dU = 1.64.
Validate the Model with JMP
Copy year and nonfuel sales and paste next to residuals. Then, use JMP to find the lower and upper 95% individual prediction interval bounds.
Copy the year, convenience stores and nonfuel sales columns, through year 2024 and paste into JMP.
Choose File, New, Data Table.
Then, in the data template, Edit, Paste with column names:
Select rows of 2020 and 2021, right click and choose the Hide and Exclude icon to run the validation model:
To run regression, select Analyze, Fit Model:
Drag nonfuel sales to the Y box. Drag year and cs to the Construct Model Effects box. Change Emphasis to Minimal Report. Click Keep dialog open. Run.
In the regression output, click the red triangle at the top, Save Columns, Indiv Confidence Interval. This will add the 95% lower and upper individual prediction intervals to the JMP data file.
Select and copy the 95% prediction interval bounds and paste into the excel two driver regression sheet.
The model underestimates nonfuel sales in 2020. The Pandemic boosted Kroger’s sales as consumers avoided restaurants and dined at home. To account for this “shock,” add a Pandemic indicator in the data sheet, equal to zero before 2020, and equal to one 2020 onwards. (Note this assumes that the impact of the Pandemic will continue through 2024. We can easily change this assumption if it is believed that consumers will return to restaurants and buy less for home consumption in later years.)
Recalibrate with Excel and JMP
To recalibrate the model and update coefficients and forecasts, run the three driver model, this time using all of the available data. The Input Y Range will now be D1:D13. The Input X Range will now be A1:C13. Add the one tail pvalues.
In JMP, again select 2020 and 2021 rows, right click, and again choose Hide and Exclude to unhide the two most recent datapoints. Add a new column Pandemic, zero through 2019 and one in 2020 onwards.
Rerun regression, adding Pandemic to the Construct Model Effects box. Save Columns, Indiv Confidence Intervals. Copy the 95% prediction interval bounds and paste next to year and nonfuel sales in the excel regression sheet.
Select year, revenues, and 95% lower and upper individual prediction interval bounds. Then insert a scatterplot of the fit and forecast. Add a vertical axis title, a stand alone chart title, adjust axes to make good use of white space, color both prediction interval bounds the same.
Case 6.1: Fuel Economy of SUVs by Country of Design
US car manufacturers are challenged to design competitive vehicles that can achieve the new 40 miles per gallon required by 2025. Use the data in SUV mpg by Region.xlsx to build a model to account for variation in SUVs. Account for engine type, hybrid or conventional, as well as region of design, US or outside the US.
-
(a)
Copy and paste cells from the coefficient section of your regression, showing driver name, coefficient and one tail pvalues: (4)
-
(b)
Copy and paste your regression equation. Include a hat, show units, round to two or three significant digits. Split into two or three lines if needed, to avoid running off the page. (And use a font size large enough to read, such as 12 pt.) (4)
-
(c)
Interpret the driver coefficient estimates. (4)
-
(d)
How much lighter would a conventional SUV designed in the U.S. need to be to be as fuel efficient as a conventional SUV designed elsewhere? (2)
-
(e)
Produce a bar graph of the importances of the three drivers. Be sure to label axes, provide a stand alone chart title, set the horizontal axis minimum to zero. Color drivers with negative influence red. Use 12 pt. font. (4)
Case 6.2: Forecast of Sephora Revenues
An investor seeks a forecast of Sephora revenues through 2025.
Recent events may have influenced revenues:
-
Class action suit. In 2014, four Asian customers filed a class action suit alleging race discrimination. Sephora had discovered that some Asia customers were buying in bulk and reselling in Asian markets. Sephora deactivated some Asian accounts. This may have been a temporary shock or it may have longer term impacts.
-
Smart phone app. In 2016 Sephora introduced a smart phone app which enabled customers to try cosmetics virtually.
-
Mexico. In 2018, Sephora entered Mexico.
-
The Pandemic. The pandemic may have hurt sales growth in 2020. Stores temporarily closed. Any Pandemic impact is thought to be temporary.
-
Build a model of revenues to explain variation and forecast through 2025. Data are in Trend Forecast of Sephora.xlsx.
-
(a)
Copy and paste the coefficient section of your final, recalibrated model, showing variable names, coefficients and one tail pvalues: (4)
-
(b)
Present your equation. Include a hat, show units, round to two or three significant digits. Split into two or three lines if needed to prevent the equation from running off of the page. (And use a font size large enough to read, such as 12 pt.) (4)
-
(c)
Interpret each of the driver coefficients: (4)
Show your recalibrated fit and forecast scatterplot, below. Paste directly, not as a picture. Adjust axes to make good use of white space, label the vertical axis, add a stand alone chart title, and remove unwelcome zeros. Use 12 pt. font. (4).
Rights and permissions
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Fraser, C. (2024). Analytics to Account for Segment or Scenario Differences. In: Business Statistics for Competitive Advantage with Excel and JMP . Springer, Cham. https://doi.org/10.1007/978-3-031-42555-4_6
Download citation
DOI: https://doi.org/10.1007/978-3-031-42555-4_6
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42554-7
Online ISBN: 978-3-031-42555-4
eBook Packages: Mathematics and StatisticsMathematics and Statistics (R0)