Skip to main content

Analytics to Account for Segment or Scenario Differences

  • Chapter
  • First Online:
Business Statistics for Competitive Advantage with Excel and JMP
  • 73 Accesses

Abstract

In this chapter, 0–1 indicator or “dummy” variables are used to incorporate segment differences, shocks, or structural shifts into regression models. With cross sectional data, indicators can be used to incorporate the unique responses of particular groups or segments. With time series data, indicators can be used to account for external shocks or structural shifts. Indicators also offer one option to account for seasonality or cyclicality in time series.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Hardcover Book
USD 119.99
Price excludes VAT (USA)
  • Durable hardcover edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Author information

Authors and Affiliations

Authors

6.1 Electronic Supplementary Material(s)

Appendices

Excel 6.1: Analytics to Quantify the Impact of Segment Differences with Cross Sectonal Data

SUV Weight and Fuel Economy

First, plot mpg by weight to confirm that their relationship is approximately linear. Hybrids are in the first ten rows. First, select data in those rows in columns D and E, the insert a scatterplot. Then, select data in the remaining rows in columns D and E and insert a scatterplot. Copy one scatterplot and past it into the second. Right click on an observation in each of the two series and Add Trendline, as Fig. 6.8 illustrates.

Fig. 6.8
A scatterplot of miles per gallon versus weight by the type of S U Vs has two plots in a declining trend between 3000 and 5500 on the x-axis. The negative slopes denote the trend of the plots.

Mpg by weight by type of SUV

Run a two driver model, with weight and the hybrid indicator in the X range. Add the one tail pvalues:

A screenshot of an Excel sheet has the summary output, with 3 tables for regression statistics, A N O V A, and weight and hybrid indicator of residual. Regression statistics consists of 5 rows. A N O V A consists of regression, residual, and total values.

To see the relative importance of weight and type of engine, create the partworths, the portion of the regression equation due to each of the two drivers. Copy weight and the hybrid indicator from the data sheet and paste next to the residuals. Then find both of the two part worths (pw), the weight pw, and the hybrid pw, that together contribute to predicted mpg. The hybrid pw will be the hybrid coefficient in b18, locked with function 4, times the hybrid indicator. The weight pw will be the weight coefficient in b19 times weight of each vehicle. Lock the reference to the coefficient with function 4, so that for each SUV, the formula reads the coefficient:

A screenshot of an Excel sheet provides the values of weight, hybrid, and weight p w from rows 26 to 43. Columns F and G are highlighted.

Below the pws, find the minimums and maximums, and from those, the ranges. Add the driver names above the ranges.

A screenshot of an Excel sheet. The rows from 128 to 133 consist of maximum, minimum, hybrid, range, and weight values.

Illustrate the importances that you just found with a bar graph (insert a column chart and choose 2D bar). Add axis labels and a stand alone title. Adjust the horizontal axis to begin at zero. Change the color of the weight bar to red, since weight has a negative impact, as shown in Fig. 6.9. (Click on that bar, then click Format, Fill, Red.)

Fig. 6.9
A horizontal bar graph of M P G driver versus contribution to predicted M P G. The title reads the type of engine is more influential. The bar of weight has a low value of 11 and the bar of hybrid has a high value of 11.7 on the x-axis. The values are approximated.

Contribution of drivers

Excel 6.2: Analytics to Quantify the Impact of Shifts or Shocks to Forecast with Time Series

Kroger Nonfuel Sales

First, following best modelling practice, hide the two most recent datapoints to enable later validation. Then, plot sales by year. Select year and sales and insert a scatterplot.

Add a linear trend line by right clicking on a datapoint and choosing Add Trendline.

A scatter and line graph for nonfuel sales. A best-fit line extends between (2010, 70) and (2019, 110) by passing through (2014, 90). The plots are scattered in and around the line. The values are approximated.

The plot reveals that the trend is approximately linear.

Build the Validation Model

Run the one driver model in regression. The Input Y Range are the cells containing nonfuel sales, and the Input X Range are the cells containing the year. Add the one tail pvalue by dividing the two tail pvalue provided by Excel by 2.

A screenshot of an Excel sheet has 3 tables. Table 1 has regression and statistics values from rows 4 to 8. Table 2 of A N O V A has regression, residual, and total values from rows 12 to 14. Table 3 includes values of coefficient, standard errors, and p-value for intercept and year in rows 17 and 18.

Assess Residuals

First, copy the column of years from the data page and paste to the left of residuals. Then request a plot. Change the major units in the plot to the standard error, 2.9.

A line graph plots the sales of the convenience. A curve extends between (2010, 0) and (2019, negative 4.0) by passing through (2012, negative 3.2), (2013, 1.5), (2014, 2.0), (2015, 3), (2016, 2.9), (2017, 2.5), and (2018, negative 1.5). The values are approximated.

The sale of the convenience store business is apparent in 2018 and 2019.

Next, calculate Durbin Watson and compare with the online critical values (n = 10, k = 1), dL = .88 and dU = 1.32.

A screenshot of an Excel sheet includes residual output, with values of d L, d U, observation, non-fuel sales, year, residuals, and D W from rows 22 to 34.

Add a Convenience Store Indicator and Build a Two Driver Model

Convenience stores were owned by Kroger through 2017. Account for that shift from their sale by adding an indicator variable next to years in the data sheet, setting the indicator to 1 in available years through 2017, and to 0 in 2018 onward:

A screenshot of an Excel sheet consists of year, c s, and non-fuel sales values from rows 2 to 16.

Run a two driver regression model. The Input X Range is now cells with years and with the convenience store indicator: A1:B11. Add the one tail pvalues.

A screenshot of an Excel sheet has 3 tables. Table 1 has regression and statistics values from rows 4 to 8. Table 2 of ANOVA has regression, residual, and total values from rows 12 to 14. Table 3 includes values of coefficient, standard errors, and p-value for intercept, year, and c s from rows 17 to 19.

Assess Residuals

Copy and paste years from the data sheet to the left of residuals and insert a scatterplot. Rescale the major units to standard errors, 1.8.

A line graph titled residuals. A curve extends between (2010, 1.6) and (2019, negative 1.5) by passing through (2012, negative 3.2), (2014, 0.8), (2016, 0.1), and (2018, 1.6). The values are approximated.

The residuals appear to be pattern free, and all are no further than two standard errors from zero.

Calculate Durbin Watson and compare with online critical values (n = 10, k = 2) dL = .70 and dU = 1.64.

A screenshot of an Excel sheet includes residual output, with values of observation, non-fuel sales, year, residuals, and D W from rows 26 to 35.

Validate the Model with JMP

Copy year and nonfuel sales and paste next to residuals. Then, use JMP to find the lower and upper 95% individual prediction interval bounds.

Copy the year, convenience stores and nonfuel sales columns, through year 2024 and paste into JMP.

Choose File, New, Data Table.

A screenshot of J M P home window. The file option is selected, followed by the new and data table.

Then, in the data template, Edit, Paste with column names:

A screenshot of an Excel sheet. The menu bar consists of options named file, edit, tables, rows, cols, D O E, analyze, graph, tools, and view. A table of untitled 4 is open with the values of the c s and nonfuel sales from the years 2010 to 2024.

Select rows of 2020 and 2021, right click and choose the Hide and Exclude icon to run the validation model:

A screenshot of an Excel sheet. A list of options pops up by right-clicking a cell, in which the hide and exclude option is selected.

To run regression, select Analyze, Fit Model:

A screenshot of the window titled, untitled 4 J M P Pro. The menu bar consists of a file, edit, tables, rows, cols, D O E, and analyze. The fit model option is selected from the analyze menu.

Drag nonfuel sales to the Y box. Drag year and cs to the Construct Model Effects box. Change Emphasis to Minimal Report. Click Keep dialog open. Run.

A screenshot of the fit model window. It includes the model specification and the settings of pick role variables and construct model effects. The three columns at the left are labeled year, c s, and nonfuel sales.

In the regression output, click the red triangle at the top, Save Columns, Indiv Confidence Interval. This will add the 95% lower and upper individual prediction intervals to the JMP data file.

A screenshot of a menu from which the save columns option is selected. It opens a sub-menu titled, response nonfuel sales in billion dollars. From the sub menu, the individual confidence interval option is selected. A pop-up reads, the confidence interval for an individual realization of the response.
A screenshot of a window titled, untitled 4 J M P pro. The menu bar consists of file, edit, tables, rows, cols, D O E, analyze, graph, tools, view, window, and help options. It provides the values of the lower 95% individual nonfuel sales and the upper 95% individual nonfuel sales.

Select and copy the 95% prediction interval bounds and paste into the excel two driver regression sheet.

A screenshot of an Excel sheet provides the year, nonfuel sales, lipi, and uipi data from rows 26 to 40.

The model underestimates nonfuel sales in 2020. The Pandemic boosted Kroger’s sales as consumers avoided restaurants and dined at home. To account for this “shock,” add a Pandemic indicator in the data sheet, equal to zero before 2020, and equal to one 2020 onwards. (Note this assumes that the impact of the Pandemic will continue through 2024. We can easily change this assumption if it is believed that consumers will return to restaurants and buy less for home consumption in later years.)

Recalibrate with Excel and JMP

To recalibrate the model and update coefficients and forecasts, run the three driver model, this time using all of the available data. The Input Y Range will now be D1:D13. The Input X Range will now be A1:C13. Add the one tail pvalues.

A screenshot of an Excel sheet has 3 tables. Table 1 has regression and statistics values from rows 4 to 8. Table 2 of ANOVA has regression, residual, and total values from rows 12 to 14. Table 3 includes values of coefficient and standard error for intercept, year, c s, and pandemic from rows 17 to 20.

In JMP, again select 2020 and 2021 rows, right click, and again choose Hide and Exclude to unhide the two most recent datapoints. Add a new column Pandemic, zero through 2019 and one in 2020 onwards.

A screenshot of an Excel sheet. The column headers are year, c s, nonfuel sales, lower 95% individual nonfuel sales, upper 95% individual nonfuel sales, and pandemic. The values are listed from rows 3 to 15.

Rerun regression, adding Pandemic to the Construct Model Effects box. Save Columns, Indiv Confidence Intervals. Copy the 95% prediction interval bounds and paste next to year and nonfuel sales in the excel regression sheet.

A screenshot of an Excel sheet provides the year, nonfuel sales, lipi, and uipi data from rows 27 to 40.

Select year, revenues, and 95% lower and upper individual prediction interval bounds. Then insert a scatterplot of the fit and forecast. Add a vertical axis title, a stand alone chart title, adjust axes to make good use of white space, color both prediction interval bounds the same.

A line and scatter plot of non-fuel sales versus years from 2010 to 2024. Two lines of lipi and uipi follow a similar trend that rises, with a slight fall in 2018. The plots of non-fuel sales follow a similar trend and are plotted in between the lines.

Case 6.1: Fuel Economy of SUVs by Country of Design

US car manufacturers are challenged to design competitive vehicles that can achieve the new 40 miles per gallon required by 2025. Use the data in SUV mpg by Region.xlsx to build a model to account for variation in SUVs. Account for engine type, hybrid or conventional, as well as region of design, US or outside the US.

  1. (a)

    Copy and paste cells from the coefficient section of your regression, showing driver name, coefficient and one tail pvalues: (4)

  2. (b)

    Copy and paste your regression equation. Include a hat, show units, round to two or three significant digits. Split into two or three lines if needed, to avoid running off the page. (And use a font size large enough to read, such as 12 pt.) (4)

  3. (c)

    Interpret the driver coefficient estimates. (4)

  4. (d)

    How much lighter would a conventional SUV designed in the U.S. need to be to be as fuel efficient as a conventional SUV designed elsewhere? (2)

  5. (e)

    Produce a bar graph of the importances of the three drivers. Be sure to label axes, provide a stand alone chart title, set the horizontal axis minimum to zero. Color drivers with negative influence red. Use 12 pt. font. (4)

Case 6.2: Forecast of Sephora Revenues

An investor seeks a forecast of Sephora revenues through 2025.

Recent events may have influenced revenues:

  • Class action suit. In 2014, four Asian customers filed a class action suit alleging race discrimination. Sephora had discovered that some Asia customers were buying in bulk and reselling in Asian markets. Sephora deactivated some Asian accounts. This may have been a temporary shock or it may have longer term impacts.

  • Smart phone app. In 2016 Sephora introduced a smart phone app which enabled customers to try cosmetics virtually.

  • Mexico. In 2018, Sephora entered Mexico.

  • The Pandemic. The pandemic may have hurt sales growth in 2020. Stores temporarily closed. Any Pandemic impact is thought to be temporary.

  • Build a model of revenues to explain variation and forecast through 2025. Data are in Trend Forecast of Sephora.xlsx.

  1. (a)

    Copy and paste the coefficient section of your final, recalibrated model, showing variable names, coefficients and one tail pvalues: (4)

  2. (b)

    Present your equation. Include a hat, show units, round to two or three significant digits. Split into two or three lines if needed to prevent the equation from running off of the page. (And use a font size large enough to read, such as 12 pt.) (4)

  3. (c)

    Interpret each of the driver coefficients: (4)

Show your recalibrated fit and forecast scatterplot, below. Paste directly, not as a picture. Adjust axes to make good use of white space, label the vertical axis, add a stand alone chart title, and remove unwelcome zeros. Use 12 pt. font. (4).

Rights and permissions

Reprints and permissions

Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Check for updates. Verify currency and authenticity via CrossMark

Cite this chapter

Fraser, C. (2024). Analytics to Account for Segment or Scenario Differences. In: Business Statistics for Competitive Advantage with Excel and JMP . Springer, Cham. https://doi.org/10.1007/978-3-031-42555-4_6

Download citation

Publish with us

Policies and ethics