Introduction

Marketing analytics, or data-driven marketing, enables managers to make decisions based on insights from analyzing data rather than relying upon their intuition or past experiences (Iacobucci et al. 2019). New technologies and increasing data availability in marketing (Petrescu and Krishen 2018; Van Auken 2015) enable the implementation of an evidence-based approach to marketing decision-making that promises better and less risky decisions (Wedel and Kannan 2016). For example, data-driven marketing uses data to estimate demand functions depending on prices and advertising budgets and derives optimal prices through profit maximization.

Successful marketing analytics teaching requires that future decision-makers, i.e., today’s marketing students, learn (i) how to analyze data and (ii) how to turn the insights from this analysis into marketing decisions.

Hence, such decisions require skills in econometrics and marketing. However, many multivariate data analysis textbooks primarily cover theoretical aspects of data analysis (e.g., Stock and Watson 2019; Wooldridge 2013), putting less emphasis on turning the econometric insights into (optimal) substantive decisions. In contrast, textbooks on substantive marketing topics focus on marketing management and less on data analysis (e.g., Kotler et al. 2019).

In the vein of more specialized marketing analytics textbooks (e.g., Mizik and Hanssens 2018), this article aims to bridge this gap between econometrics and substantive marketing knowledge by suggesting a marketing analytics case study that connects both skills.

This article aims to introduce and disseminate a marketing analytics case study that covers a fundamental task of marketing—deriving optimal prices—and sets this decision into context with several other typical marketing questions: measurement of advertising effects, marketing mix allocation, and performance-based compensation of sales representatives.

By analyzing a simulated data set including sales, prices, and other marketing mix variables, students learn the theory and practical application of standard econometric methods, such as the multivariate linear regression, and use its results to determine optimal prices, discuss advertising allocations to different marketing channels, and estimate the impact of those marketing decisions on sales and profit.

We assess the exercise’s efficacy as a 60-min final exam by analyzing the performance of 134 students in an undergraduate marketing analytics class. The exercise yields approximately normally distributed student performance across all sub-exercises, with a mean close to half of the achievable points. Although the exercise is very challenging to solve in 60 min, the almost perfect solutions of a few students show that its difficulty is well-calibrated.

On an individual sub-exercise level, our empirical analysis shows that the sub-exercises vary in difficulty, as reflected by the distribution of student performance. These results indicate which sub-exercises confront students with the biggest challenge and suggest where educators could focus their teaching efforts.

Our article first introduces the didactic setting, including the case study’s pedagogical vision, learning goals, structure, and theoretical background. After introducing the context, we provide the full case study and detailed solutions with a rubric that enables educators to use the exercise as a case study, a (graded) assignment, or an exam. The article proceeds with a discussion of the case study’s implementation. Before concluding, we assess and discuss the case study’s efficacy by presenting the graded results from 134 exam submissions. We provide data, simulation codes, exercise templates, and solutions in the accompanying repository, https://github.com/lukas-jue/marketing-analytics-exercise.

Background of the case study

Description of pedagogical vision

We pursue the following two pedagogical aims with this case study:

  (a) equip students and future marketing managers with the quantitative intuition and econometric toolkit to analyze marketing data, and

  (b) enable students to use those quantitative skills to derive better marketing decisions.

Fundamentally, we want students to understand in which marketing setting which econometric method fits best. For example, this case study does not ask students to “estimate a linear regression and interpret its parameters” but instead asks them to derive optimal prices based on the available data. Hence, students need to identify the appropriate method themselves. Through this didactic approach, we aim to prepare the students for their future work by simulating a realistic working environment where part of the challenge is identifying the path toward a solution.

Description of learning goals

Table 1 describes the learning goals of our case study, which correspond to the two pedagogical aims stated above. In the first column, we list the marketing learning goals, which equip students with substantive marketing skills. The second column describes the analytical or econometric techniques students need to fulfill the corresponding marketing learning goal.

Table 1 Link between the Case Study’s Learning Goals in Marketing and Analytics

We structure the case study to achieve those stated learning goals through the following seven parts:

  1. Starting with simple descriptive statistics and visualizations to understand the data and potential problems,

  2. continuing with specifying the correct demand function and modeling it through an appropriate multivariate linear regression,

  3. using the regression results to compute price and advertising budget elasticities of demand,

  4. using the estimated demand function to derive optimal prices,

  5. estimating the impact of changes in the advertising budget allocation on optimal prices,

  6. using the econometric results to evaluate proposed marketing budget allocations, and

  7. assessing which performance characteristics of sales representatives could justify different compensations for individual salespersons based on their expected profit (derived from the demand function).

This ordering follows a typical analytics workflow—from descriptively analyzing the data to modeling and deciding based on those results.

Considerations from the literature for successful case study development

Because our case study should be suitable for both exams and teaching, we consult the literature to derive characteristics of a successful exam exercise and teaching case study, respectively.

Characteristics of a successful exam exercise

One of the prominent theories in psychometrics and student testing is the Item Response Theory. This framework aids in constructing exams that accurately measure the latent student ability. To achieve this aim, Item Response Theory models students’ probability of correctly solving a sub-exercise as a function of the exercise’s difficulty, among other parameters (e.g., DeMars 2010). Constructing an exam with sub-exercises of varying difficulty enables the educator to differentiate between different levels of student ability.
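For illustration only, one widely used specification is the two-parameter logistic model. The article does not commit to a particular model, so the choice of model and the symbols below are assumptions: \({\theta }_{i}\) denotes the latent ability of student \(i\), \({b}_{j}\) the difficulty of sub-exercise \(j\), and \({a}_{j}\) its discrimination.

$$P\left({\text{correct}}_{ij}\right)=\frac{1}{1+\mathrm{exp}\left(-{a}_{j}\left({\theta }_{i}-{b}_{j}\right)\right)}$$

Under this sketch, a student whose ability equals the difficulty of a sub-exercise solves it with a probability of 0.5, and a more difficult sub-exercise (larger \({b}_{j}\)) lowers this probability for every student.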

Hence, in line with the requirements of Item Response Theory, our exam aims to feature sub-exercises of varying difficulty such that they can discriminate well between different levels of student performance.

Although assessing the exercise’s capability of measuring the latent ability remains beyond this manuscript’s scope, we posit that the full exam should be sufficiently difficult to distinguish granularly between degrees of student performance, but not so difficult that the exercise lacks differential power. Hence, the results of the exam should

  (a) yield an approximately normal distribution of achieved points, thereby reflecting the assumed normal distribution of students’ abilities,

  (b) with students’ performances covering the entire range of possible outcomes.

While these two empirical characteristics refer to the full set of sub-exercises, the exam should feature sub-exercises from a broad range of difficulties. Hence, the exam results should feature

  (c) sub-exercises with a comparatively high degree of correct submissions (i.e., relatively easy sub-exercises), and

  (d) sub-exercises with a comparatively low degree of correct submissions (i.e., relatively challenging sub-exercises).

Finally, we test in our empirical section whether those requirements hold in an exam setting. There, we present and discuss the distribution of student performance on our exercise.

Additionally, we designed the exercise to feature high construct validity (e.g., Peter 1981). Transferring the role of construct validity in marketing research to an educational setting, a successful marketing analytics exercise should mirror the substantive marketing and analytics requirements demanded in students’ future careers. Hence, the exercise should reflect a typical marketing analytics task commonly faced in practice and academia—thereby correctly assessing students’ abilities in this field. We ensure this requirement through the topic choice (pricing) and the required analytics techniques (data visualization, regressions, and optimization).

Characteristics of a successful teaching case study

While the previous theoretical considerations apply primarily to the exam setting, we also aim to provide a useful case study for teaching purposes. Hence, the following theoretical considerations inspired the development of the case study.

First, and congruent to exhibiting a high construct validity, a successful case study for teaching purposes should exhibit constructive alignment (Biggs 1996). Hence, the case study should mirror the course’s intended learning outcomes. We designed the case study to feature high constructive alignment with a typical marketing analytics course, as reflected by the marketing and analytics learning goals in Table 1.

Second, besides the case study mirroring the academic course’s learning goals, it should also mirror what students expect in the real world. Hence, the case study should be an “authentic assessment,” meaning that students can expect a similar case once they graduate and practice marketing analytics (Montano et al., forthcoming).

Third, the case study should be sufficiently difficult to provide ample learning opportunities. Hence, some of the case study’s sub-exercises should be relatively challenging even for exceptional students, enabling discussions in which a teacher explains the underlying non-trivial approach to solve the case study correctly.

After introducing the exercise’s pedagogical vision, learning goals, and theoretical background, we now introduce the case study and provide detailed solutions, including a grading rubric.

Description and solution of the case study

Your B2B firm sells an energy-efficient heater fan and has divided the country into 100 comparable sales territories, each staffed exclusively by a female sales representative. The sales representatives differ according to their professional experience (measured by the number of years they have been in sales) and whether they have an engineering degree.

The management recently stated that 50% of the sales representatives have at least five years of professional experience in sales, and at least one-third of the sales representatives also have an engineering degree.

The firm’s management randomly varied the prices across the sales territories.

In addition, the management commissioned an advertising agency to support the sales representatives through marketing and provided a dedicated budget. The agency should use this marketing budget for telephone marketing and social media marketing, with an approximately equal allocation towards the two channels (i.e., 50% phone calls and 50% social media marketing). So far, the firm’s management has randomly varied the agency’s budget across the sales territories.

The variable cost per unit of a heater fan is $200.

In the file data_sales.csv, you receive information that helps you answer the questions. The data set includes 100 rows, each describing one of the 100 sales territories with the following variables:

Table: Description of the data set data_sales.csv

| Variable | Description |
| --- | --- |
| quantity | Number of sold heater fans in the respective sales territory |
| price | Price of the heater fan sold in the respective sales territory |
| experience | Number of years the sales representative has been working in sales |
| engineer | Binary variable indicating whether the sales representative has an engineering degree (= 1) or not (= 0) |
| gender | Gender of the sales representative, female (= 1) or other (= 0) |
| location | Number of letters of the sales representative’s birthplace (e.g., 7 if the birthplace is “Chicago”) |
| budget_agency | Budget used by the advertising agency, which is either spent for phone or social media marketing, in $ |
| budget_phone | Budget used for telephone marketing, in $ |
| budget_social_media | Budget used for social media marketing, in $ |

Exercise (1): Exploratory data analysis

For each part of this first exercise, please answer the question by analyzing the data set through one visualization and one non-visual descriptive analysis.

We do not require custom labels for the visualizations.

(a)

Does the firm only employ female sales representatives? (2 points).

figure a

Yes, because the summary statistics and the barplot show that the data set only includes gender = 1, corresponding to females.

(b)

Do the prices differ across sales territories? (2 points).

figure b

Yes, prices differ across sales territories. For example, the summary statistics exhibit different values for the minimum ($209.4) and maximum ($369.4). Additionally, the histogram shows that prices vary across sales territories.

(c)

Is the management’s statement regarding the professional experience of their sales representatives correct? (2 points).

figure c

No, because the summary statistics and the boxplot show that the median of experience is 4. If management’s statement (“50% of the sales representatives have at least five years of professional experience in sales”) were true, we would expect the median of experience to be at least 5.

An alternative and equally correct solution is to count the observations that have fewer than 5 years of experience:

figure d
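The two checks can be sketched as follows. The article's own solution uses R; this is a Python analogue, and the eight values in `experience` are hypothetical toy data, not the actual data set.

```python
from statistics import median

# Hypothetical toy values for the variable `experience` (years in sales);
# the real data set contains 100 sales territories.
experience = [1, 2, 3, 4, 4, 5, 6, 8]

# Check 1: a median below 5 suggests that fewer than half of the
# representatives have at least five years of experience.
median_experience = median(experience)

# Check 2: the direct count settles the question.
share_at_least_5 = sum(x >= 5 for x in experience) / len(experience)
statement_holds = share_at_least_5 >= 0.5

print(median_experience, share_at_least_5, statement_holds)
```

The direct count is the more robust of the two checks because it does not depend on how the median treats ties in the middle of the distribution.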

(d)

Is the management’s statement regarding the background of their sales representatives correct? (2 points).

figure e

Management’s statement is incorrect because we only observe that 28% of sales representatives hold an engineering degree, not at least 33%.

$$\frac{28}{72 + 28} = 0.28 < \frac{1}{3} \approx 33\%.$$

(e)

Does the advertising agency’s budget differ across the sales territories? (2 points).

figure f

Yes, the budgets differ. The summary statistics show that the minimum budget across all sales territories is $329, and the maximum is $2,056. Additionally, the histogram shows that the budget varies strongly across sales territories.

(f)

Did the advertising agency achieve the aim of approximately equally allocating the budget to the two channels (telephone and social media marketing)? (2 points).

figure g

Yes, all sales territories received almost equal telephone and social media marketing budgets. This insight is easiest to observe in the scatterplot. The visualization displays the almost perfect linear relationship and shows that the social media budget approximately equals the telephone marketing budget. The correlation coefficient is almost 1, and both budgets’ average values are also almost equal.

Note: Displaying the individual distributions of the two variables, e.g., via a histogram, is insufficient because a similar distribution of two variables is a necessary but not a sufficient condition for similar budgets in all sales territories.
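A minimal sketch of the territory-level check follows, written in Python rather than the R of the article's solution. The budget values and the helper `pearson` are hypothetical illustrations. The last step shows why comparing histograms is insufficient: reversing one variable keeps its distribution but breaks the match within each territory.

```python
from math import sqrt

# Hypothetical toy budgets in $ for five territories (not the actual data).
budget_phone = [170.0, 410.0, 600.0, 795.0, 1020.0]
budget_social_media = [165.0, 400.0, 610.0, 805.0, 1030.0]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


# Share of the phone budget in each territory's total agency budget.
phone_share = [p / (p + s) for p, s in zip(budget_phone, budget_social_media)]

# Same values, different territories: identical histogram, no equal split.
reversed_social = list(reversed(budget_social_media))

print(round(pearson(budget_phone, budget_social_media), 3))
print([round(share, 3) for share in phone_share])
print(round(pearson(budget_phone, reversed_social), 3))
```

A correlation close to 1 alone would also be compatible with one budget being a fixed multiple of the other, which is why the sketch additionally inspects the share per territory.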

(g)

Were the sales representatives born in at least 8 different cities? (2 points).

figure h

Yes, because the data set includes 11 different values of location, which implies that the sales representatives were born in at least 11 cities.

| Step | Task | Max. Points | Achieved Points |
| --- | --- | --- | --- |
| 1-a | Correct visualization (1 point), correct non-visual descriptive statistic (1 point) | 2 | |
| 1-b | see above | 2 | |
| 1-c | see above | 2 | |
| 1-d | see above | 2 | |
| 1-e | see above | 2 | |
| 1-f | see above | 2 | |
| 1-g | see above | 2 | |

Exercise (2): Estimating a demand function

Estimate a linear demand function. Justify which variables you include and which ones you ignore. Discuss the influence of price and professional experience on the quantity and how confident you are about their influence. (14 points).

Solution

Step 1: Identify the variables to include in the linear regression equation.

Include all variables except:

  • gender, because it only has the value 1 and, thus, no variation.

  • location, because there is no plausible reason why the character length of the sales representative’s city of birth should impact sales.

  • budget_phone and budget_social_media, because

    • the two variables are almost perfectly collinear (50% share each and correlation is almost 1), and

    • the sum of both variables is a linear combination corresponding to the third variable budget_agency, leading to perfect multicollinearity: budget_agency = budget_phone + budget_social_media.

Step 2: Use a linear regression to estimate the linear demand function.

figure i
figure j
figure k
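The article estimates the demand function with R's lm(). As a rough Python analogue using only the standard library, the sketch below solves the normal equations for ordinary least squares. The helper names, the eight observations, and the coefficients in `true_beta` are all hypothetical; the toy quantities are generated without noise, so the estimates simply recover `true_beta`.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(r[i] * r[j] for r in x) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical regressors: (price, experience, engineer, budget_agency).
rows = [
    (210.0, 1, 0, 400.0), (250.0, 4, 1, 900.0), (290.0, 2, 0, 1500.0),
    (310.0, 7, 1, 600.0), (330.0, 3, 0, 2000.0), (360.0, 5, 1, 1200.0),
    (275.0, 6, 0, 1800.0), (345.0, 8, 1, 350.0),
]
# Hypothetical coefficients (intercept first) used to build the toy data.
true_beta = [1200.0, -3.0, 9.0, 1.0, 0.02]
quantity = [true_beta[0] + sum(b * v for b, v in zip(true_beta[1:], r))
            for r in rows]

beta_hat = ols(rows, quantity)
print([round(b, 3) for b in beta_hat])
```

In practice, a statistics package also reports standard errors and p-values, which the interpretation in Step 3 relies on; this sketch only illustrates how the coefficient estimates arise.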

Conceptual notes concerning model selection:

When deciding on the appropriate model for this exercise, students need to consider three crucial factors to ensure a correct estimation:

  • Correctly defining a demand function: the defined regression equation must feature quantity as the dependent variable. Additionally, price and experience must serve as independent variables to determine how they impact the quantity.

  • Omitted variable bias: failing to control for a variable that correlates with the dependent and other independent variable(s) leads to biased coefficient estimates. For example, omitting experience or one of the advertising budget variables could bias the coefficient estimate of price if the omitted variable(s) correlate with price and the dependent variable.

    However, this exercise introduces the simplifying assumption of randomly set prices, thereby ruling out endogenous prices through omitted variable bias. Randomizing prices might be challenging to implement in firms outside of narrow A/B tests. If the educator decides to omit randomization of prices as a simplifying element of the exercise, students must discuss the impact of endogeneity on the coefficient estimation and potential remedies.

  • Multicollinearity: Including two or more independent variables that strongly correlate with each other inflates the standard errors of those variables’ coefficient estimates and makes the estimates unstable. If this correlation is very high, estimated coefficients might exhibit opposing signs, even though the actual effects would be in the same direction. In the extreme case of perfect collinearity, one variable is an exact linear combination of others, which makes the regression estimation impossible because the design matrix of all independent variables has less than full rank.
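The rank argument can be illustrated with a short Python sketch (the article's solution itself uses R). The function `matrix_rank` and the budget values are hypothetical; the point is only that adding budget_agency = budget_phone + budget_social_media as a fourth column does not raise the rank of the design matrix.

```python
def matrix_rank(rows, tol=1e-9):
    """Rank of a matrix via Gaussian elimination with partial pivoting."""
    m = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(m), len(m[0])
    rank = 0
    for col in range(n_cols):
        if rank == n_rows:
            break
        pivot = max(range(rank, n_rows), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < tol:
            continue  # no usable pivot: column depends on earlier columns
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n_rows):
            factor = m[r][col] / m[rank][col]
            for c in range(col, n_cols):
                m[r][c] -= factor * m[rank][c]
        rank += 1
    return rank


# Hypothetical budgets in $ (not the actual data).
budget_phone = [170.0, 410.0, 600.0, 795.0, 1020.0]
budget_social_media = [165.0, 400.0, 610.0, 805.0, 1030.0]
budget_agency = [p + s for p, s in zip(budget_phone, budget_social_media)]

# Columns: intercept, budget_phone, budget_social_media, budget_agency.
with_total = [[1.0, p, s, a] for p, s, a in
              zip(budget_phone, budget_social_media, budget_agency)]
without_total = [row[:3] for row in with_total]

print(matrix_rank(with_total), matrix_rank(without_total))
```

Both matrices have rank 3, so the four-column version is rank-deficient and its coefficients cannot be uniquely determined.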

Model selection:

Model 1 is the preferred model according to the justification from Step 1. In the incorrect model 4, we see the effects of collinearity between the two variables budget_phone and budget_social_media (as observed through their correlation, which is very close to 1) on the estimated coefficients. The coefficients in model 4 have different signs, even though the overall effect (model 1) and the effect of each variable alone in models 2 and 3 are positive. Additionally, the sum of the coefficients of the variables budget_phone and budget_social_media in model 4 is similar in magnitude to the total effect, i.e., the coefficient of the variable budget_agency in model 1.

Step 3: Interpretation of Model 1.

If the price increases by $1, the (expected) quantity decreases by 2.954 units.

For each additional year of work experience, the (expected) quantity increases by 9.034 units.

Both coefficients display p-values smaller than 1%. Hence, for both coefficients, we can confidently reject the null hypothesis that the coefficient equals zero. Formulated differently, we are confident that the estimated effect differs from zero.

| Step | Task | Max. Points | Achieved Points |
| --- | --- | --- | --- |
| 2–1 | Correct justification of which dependent and independent variables to include in the linear regression | 6 (6 points for the correct model. 1 point deduction for each incorrectly included or excluded variable. 0.5 points deduction if the correct decision but not justified.) | |
| 2–2 | Estimate regression with lm() and print summary() | 2 | |
| 2–3 | Interpret the estimated coefficient of price correctly | 3 (1.5 points for magnitude and statistical significance, each) | |
| 2–4 | Interpret the estimated coefficient of experience correctly | 3 (1.5 points for magnitude and statistical significance, each) | |

Exercise (3): Compute advertising and price elasticities of demand

Use your model to compute the price elasticity of demand and the (advertising) budget elasticity of demand. Interpret these elasticities and assess whether your computed elasticities are consistent with economic intuition. (12 points).

Solution

First, note that elasticities in a linear demand model are not constant. Hence, the elasticity depends on the value of the independent variable at which it is evaluated. Students must pick a sensible value for both variables (advertising budget and price) to compute the elasticities. A good choice would be to select the average values.

figure l
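As a minimal sketch of the computation, the point elasticity of a linear demand function is the coefficient multiplied by the ratio of the variable's value to the quantity. The Python snippet below (not the article's R code) uses the reported price coefficient of Model 1; the evaluation points are hypothetical and are not the data set's actual means.

```python
def point_elasticity(beta, x, quantity):
    """Point elasticity of a linear demand function: beta * x / quantity."""
    return beta * x / quantity


beta_price = -2.954  # price coefficient reported for Model 1

# Hypothetical evaluation point (not the data set's actual means).
price, quantity_at_price = 300.0, 400.0
elasticity = point_elasticity(beta_price, price, quantity_at_price)

# Moving up the same hypothetical demand line: the price rises and the
# quantity falls, so the elasticity grows in absolute value.
higher_price = 350.0
lower_quantity = quantity_at_price + beta_price * (higher_price - price)
elasticity_high = point_elasticity(beta_price, higher_price, lower_quantity)

print(round(elasticity, 3), round(elasticity_high, 3))
```

The two values differ although the coefficient is the same, which illustrates why the choice of the evaluation point matters.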

Interpretation of the elasticities:

At the average budget, a 1% increase in advertising budget yields a 0.06% increase in demand. Thus, the value of the advertising elasticity is 0.06. A positive advertising elasticity makes economic sense because more advertising should yield higher demand. The value is low but broadly in line with findings from meta-analytical studies (e.g., Sethuraman et al. 2011: 0.11).

At the average price, a 1% increase in price yields a 2.29% decrease in demand. So, the value of the price elasticity is −2.29. The sign of this value is plausible because a higher price yields a lower demand. The magnitude also aligns with findings from meta-analytical studies (e.g., Tellis 1988: −1.76; Bijmolt et al. 2005: −2.62).

| Step | Task | Max. Points | Achieved Points |
| --- | --- | --- | --- |
| 3–1 | Mentioning that elasticities from a linear demand function are not constant | 2 | |
| 3–2 | Sensible choice of values at which to compute the elasticities (e.g., mean, median, profit-maximizing value) | 2 | |
| 3–3 | Correct computation of the two elasticities | 4 | |
| 3–4 | Correct interpretation of the two computed elasticities | 2 | |
| 3–5 | Correct evaluation of the two computed elasticities’ plausibility | 2 | |

Exercise (4): Determine the optimal price

A sales representative with three years of professional experience and an engineering degree operates in a sales territory. In addition, the advertising agency spends a budget of $2,000, allocating 50% to telephone and 50% to social media marketing. Determine the optimal price in this setting and explain the steps you take to arrive at your result. Assume fixed costs of zero. What is the contribution margin per unit and the firm’s profit? (20 points).

Solution

Step 1: Write the demand function with regression coefficients from the most preferred model.

Notation (\(equation\)/code):

  • \(q\)/q: quantity sold

  • \({q}_{c}\)/q_c: quantity sold without price term, constant

  • \(p\)/p: price

  • \({c}_{v}\)/c_v: variable costs

    $$\widehat{q}={\widehat{\beta }}_{0}+{\widehat{\beta }}_{1}\cdot p+{\widehat{\beta }}_{2}\cdot experience+{\widehat{\beta }}_{3}\cdot engineer+{\widehat{\beta }}_{4}\cdot budget\_agency$$
    $$\widehat{q}={\widehat{q}}_{c}+{\widehat{\beta }}_{1}\cdot p$$

where \(\widehat{q}\) is the expected quantity and \({\widehat{q}}_{c}\) is the expected quantity without considering the price term:

$${\widehat{q}}_{c}={\widehat{\beta }}_{0}+{\widehat{\beta }}_{2}\cdot experience+{\widehat{\beta }}_{3}\cdot engineer+{\widehat{\beta }}_{4}\cdot budget\_agency$$

Step 2: Compute \({\widehat{q}}_{c}\), the expected quantity without the impact of the price.

figure m

Step 3: Set up the cost and profit functions.

Cost function

$$c\left(\widehat{q}\right)={c}_{v}\cdot \widehat{q}={c}_{v}\cdot \left({\widehat{q}}_{c}+{\widehat{\beta }}_{1}\cdot p\right)$$

Profit function

$$\pi =p\cdot \widehat{q}-c\left(\widehat{q}\right)=p\cdot \left({\widehat{q}}_{c}+{\widehat{\beta }}_{1}\cdot p\right)-\left[{c}_{v}\cdot \left({\widehat{q}}_{c}+{\widehat{\beta }}_{1}\cdot p\right)\right]=p{\widehat{q}}_{c}+{p}^{2}{\widehat{\beta }}_{1}-{c}_{v}{\widehat{q}}_{c}-{c}_{v}{\widehat{\beta }}_{1}p$$

Step 4: Take the first derivative of the profit function with respect to \(p\) and set it equal to zero

$$\frac{\partial \pi }{\partial p}={\widehat{q}}_{c}+2p{\widehat{\beta }}_{1}-{c}_{v}{\widehat{\beta }}_{1}=0$$

Step 5: Rearrange the function to get the optimal price \({p}^{*}\)

$${p}^{*}=\frac{1}{2}\left({c}_{v}-\frac{{\widehat{q}}_{c}}{{\widehat{\beta }}_{1}}\right)$$
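As a brief supplementary check, which the steps here do not spell out, the second derivative of the profit function confirms that this price is a maximum and not a minimum:

$$\frac{{\partial }^{2}\pi }{\partial {p}^{2}}=2{\widehat{\beta }}_{1}<0\quad \text{if}\quad {\widehat{\beta }}_{1}<0$$

The condition holds for any downward-sloping demand function, including Model 1 with its negative price coefficient.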

Step 6: Compute the optimal price by inserting the computed values from the above steps.

figure n

Step 7: Compute the profit contribution per unit.

figure o

Step 8: Compute the (total) profit.

figure p

The optimal price is $318.27. The profit contribution per unit amounts to $118.27. The firm sells 349 units of the heater fan, yielding a profit of $41,305.69.
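Steps 5 to 8 can be sketched in a few lines of Python (the article's own solution uses R). The price coefficient and the variable cost come from the text; the value of `q_c` is a hypothetical placeholder because the article computes it from the unrounded regression output, so the printed numbers will not exactly reproduce the reported ones.

```python
def optimal_price(q_c, beta_price, c_v):
    """Profit-maximizing price p* = 0.5 * (c_v - q_c / beta_price)."""
    return 0.5 * (c_v - q_c / beta_price)


def profit(p, q_c, beta_price, c_v):
    """Profit (p - c_v) * q with linear demand q = q_c + beta_price * p."""
    return (p - c_v) * (q_c + beta_price * p)


beta_price = -2.954  # price coefficient reported for Model 1
c_v = 200.0          # variable cost per unit from the case description
q_c = 1290.0         # hypothetical placeholder, not the article's value

p_star = optimal_price(q_c, beta_price, c_v)
margin = p_star - c_v                  # profit contribution per unit
quantity = q_c + beta_price * p_star   # expected quantity at p*

print(round(p_star, 2), round(margin, 2), round(quantity, 1),
      round(profit(p_star, q_c, beta_price, c_v), 2))
```

Evaluating `profit` slightly below and above `p_star` is a quick way for students to verify numerically that the derived price is indeed the maximum.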

| Step | Task | Max. Points | Achieved Points |
| --- | --- | --- | --- |
| 4–1 | Description: Set up demand function with regression coefficients | 2 | |
| 4–2 | Description: Constant demand \({\widehat{q}}_{c}\) includes all terms from the regression equation, excluding the price term | 2 | |
| 4–3 | Description: Set up the profit function and insert the demand function into the profit function | 2 | |
| 4–4 | Description: Take the first derivative of the profit function with respect to \(p\) and set it equal to zero | 2 | |
| 4–5 | Description: Rearrange and solve for the optimal price \({p}^{*}\) | 2 | |
| 4–6 | State the correct formula for the optimal price | 2 | |
| 4–7 | Compute the expected quantity (constant) without the price term correctly | 2 | |
| 4–8 | Compute the optimal price correctly | 2 | |
| 4–9 | Compute the profit contribution per unit correctly | 2 | |
| 4–10 | Compute the firm’s profit correctly | 2 | |

Exercise (5): Impact of changes on optimal prices

Answer the following two sub-questions by ticking the correct answer.

(a)

How does the optimal price change if a sales representative without an engineering degree works in the sales territory instead of a sales representative with an engineering degree? (2 points).

The optimal price…

  • increases

  • decreases

  • remains about the same

  • no statement possible

Solution

The optimal price remains about the same.

Explanation (not required):

A change in the value of engineerYes causes a parallel shift of the demand curve by the size of its regression coefficient (upwards if the coefficient is positive, downwards if negative). If the coefficient is not statistically significantly different from zero, we cannot conclude that the demand curve shifts at all.

In our case, the parameter estimate of engineerYes is statistically insignificantly different from zero. Hence, we can expect that the demand curve does not change much, implying that the optimal price remains the same.

The following computation of the optimal price for a sales representative without an engineering degree supports this claim:

figure q

compared to the optimal price with an engineering degree.

figure r

The optimal price of a sales representative without an engineering degree is $318.11, and with an engineering degree is $318.27. The price difference is only $0.16.

Conclusion: The difference between the two optimal prices is very small, and the underlying coefficient of engineerYes is not statistically significantly different from zero (\(p>0.1\)). Hence, the optimal price remains about the same.

(b)

How does the optimal price change if a sales representative with only one year of professional experience works in the sales territory instead of a sales representative with three years of professional experience? (2 points).

The optimal price…

  • increases

  • decreases

  • remains about the same

  • no statement possible

Solution

The optimal price decreases.

Explanation (not required):

Similar to the above exercise, we must consider how the demand function changes. The demand function changes significantly because the experience regression parameter is statistically significantly positive.

Hence, a decrease in the level of experience from 3 to 1 (= -2) years causes a parallel downward shift of the demand function. This change leads to lower demand. Thus, the optimal price decreases.

The following computation of the optimal price with one year of experience supports this claim.

figure s

compared with the optimal price with three years of experience:

figure t

The optimal price for a sales representative with one year of experience is $315.21, and with three years of experience, it is $318.27. So, the price decreases by $3.06.

Conclusion: the optimal price decreases by $3.06, in line with the expectation, because the coefficient on the variable experience is statistically significantly different from zero (\(p<0.01\)).
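The size of this change follows directly from the optimal price formula of Exercise 4: a shift of the constant demand term by Δq_c moves the optimal price by −Δq_c / (2·β1). The Python sketch below (the article's solution uses R) applies this to the rounded coefficients reported for Model 1; the function name is a hypothetical helper.

```python
def optimal_price_shift(delta_q_c, beta_price):
    """Change in p* = 0.5 * (c_v - q_c / beta_price) when q_c shifts."""
    return -0.5 * delta_q_c / beta_price


beta_price = -2.954      # price coefficient reported for Model 1
beta_experience = 9.034  # experience coefficient reported for Model 1

# Moving from three years to one year of experience shifts q_c downwards.
delta_q_c = beta_experience * (1 - 3)
shift = optimal_price_shift(delta_q_c, beta_price)

print(round(shift, 2))
```

With these rounded inputs, the shift is about −3.06, in line with the reported price difference.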

| Step | Task | Max. Points | Achieved Points |
| --- | --- | --- | --- |
| 5-a | Correct answer | 2 | |
| 5-b | Correct answer | 2 | |

Exercise (6): Marketing budget allocation

The management wants to know whether it should spend the marketing budget on telephone marketing or social media marketing. Which recommendation can you give based on your analyses? (2 points).

Solution

In this case, no recommendation is possible because the firm allocated approximately 50% of the budget to telephone marketing and 50% to social media marketing in each sales territory. Hence, the two budget variables, budget_phone and budget_social_media, are almost perfectly collinear, which prevents a reliable estimation of each variable’s separate effect on the quantity sold.

| Step | Task | Max. Points | Achieved Points |
| --- | --- | --- | --- |
| 6 | Correct answer | 2 | |

Exercise (7): Performance and compensation of sales representatives

(a)

Do female sales representatives sell more than non-female sales representatives? (2 points).

Answer this question based on your previous results.

Solution

No answer is possible because the data set only includes female sales representatives.

(b)

Do business outcomes justify paying sales representatives with an engineering degree a higher salary than those without an engineering degree? (2 points).

Answer this question based on your previous results.

Solution

No, because the expected quantity sold does not differ significantly between sales representatives with and without an engineering degree (see engineerYes). While the estimated coefficient is positive, it is not statistically significantly different from zero (\(p>0.1\)). So, we do not expect a substantial increase in the quantity sold that justifies a higher salary for sales representatives with an engineering degree.

Decrease in estimated sales without an engineering degree:

figure u

Decrease of profit:

figure v

The profit decreases by $112.86. However, this effect is very small compared to the profit of $41,305.69. Additionally, the coefficient estimate of engineerYes is not statistically significantly different from zero (\(p>0.1\)).

(c)

Do business outcomes justify paying sales representatives with more professional experience a higher salary than those with less professional experience? (2 points).

Answer this question based on your previous results.

Solution

Yes, because the expected quantity increases by 9.04 units for each additional year of experience. The profit contribution per unit is $118.27, so 9.04 additionally sold units increase the profit by $1,068.97 (= 9.04 * $118.27).

Additionally, the estimated coefficient on the variable experience is statistically significantly different from zero (\(p<0.01\)). So, we expect a substantial increase in the quantity sold with each additional year of experience, which justifies a higher salary for more experienced sales representatives.
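The profit arithmetic in (b) and (c) can be retraced in a few lines. The following is a minimal sketch in Python rather than the R used in the article's solution. It relies only on the rounded figures quoted above, so its result for (c) differs from the reported $1,068.97 in the last digits. All variable names are illustrative.

```python
# Rounded figures quoted in the solutions to Exercise 7 (b) and (c).
unit_margin = 118.27           # profit contribution per unit, in $
experience_coef = 9.04         # additional units sold per year of experience
profit_baseline = 41_305.69    # profit, in $
profit_drop_engineer = 112.86  # profit decrease without an engineering degree, in $

# (c) Additional profit per extra year of professional experience.
profit_per_year = experience_coef * unit_margin

# (b) Size of the engineering-degree effect relative to the overall profit.
engineer_share = profit_drop_engineer / profit_baseline

print(f"Profit per extra year of experience: ${profit_per_year:,.2f}")
print(f"Engineering effect as share of profit: {engineer_share:.2%}")
```

With the rounded inputs, the sketch prints $1,069.16 for (c) and 0.27% for (b), which supports the verbal argument: the experience effect is economically meaningful per year, whereas the engineering effect is negligible relative to the overall profit.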

| Step | Task | Max. Points | Achieved Points |
|---|---|---|---|
| 7-a | Correct answer, stating that no conclusion is possible | 2 | |
| 7-b | Correct answer, including stating that the coefficient is not statistically significantly different from zero | 2 (0.5 points deduction if no level of significance is defined) | |
| 7-c | Correct conclusion based on regression estimates | 2 | |

Discussion of the case study’s implementation

Description of target audience and requirements

Previous course requirements

The present case study targets students from the second undergraduate year onward who have a basic understanding of statistics and econometrics (e.g., through first-year statistics or econometrics courses) and marketing (e.g., an introductory marketing course). Additionally, the case study requires the fundamental calculus skill of taking the first derivative of a profit function and solving for the optimal price. Students typically learn such skills in an introductory microeconomics course.

Software requirements

To complete this case study, students need to be familiar with the R programming language (R Core Team 2023) or another similar statistical programming language such as Python or Stata. In our solution using R, we use “Base R,” i.e., the core functionalities of the R language available without installing additional packages.

The only exception used in the solution is the package stargazer, which conveniently creates regression tables for multiple model specifications (Hlavac 2022). Using this package makes a quick model comparison feasible.

If educators want to place less emphasis on the calculus requirement of the optimal pricing decision, symbolic programming packages could provide a shortcut, making the case study accessible to students who are not comfortable taking the first derivative by hand. One such package is calculus (Guidotti 2022). For example, students can take the first derivative of the profit function from Exercise 4,

$$\begin{aligned} \pi & = p \cdot \hat{q} - c\left( {\hat{q}} \right) \\ & = p \cdot \left( {\hat{q}_{c} + \hat{\beta }_{1} \cdot p} \right) - \left[ {c_{v} \cdot \left( {\hat{q}_{c} + \hat{\beta }_{1} \cdot p} \right)} \right] \\ & = p\hat{q}_{c} + p^{2} \hat{\beta }_{1} - c_{v} \hat{q}_{c} - c_{v} \hat{\beta }_{1} p \\ \end{aligned}$$

with the following R code:

figure w
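For reference, the result that such a symbolic derivative should return can also be worked out by hand from the profit function above. The following is a sketch, reading \(\hat{q}_{c}\) as the part of predicted demand that does not depend on the price and assuming a negative estimated price coefficient (\(\hat{\beta}_{1}<0\)), so that the second derivative \(2\hat{\beta}_{1}\) is negative and the stationary point is a profit maximum:

$$\begin{aligned} \frac{\partial \pi}{\partial p} & = \hat{q}_{c} + 2 \hat{\beta}_{1} p - c_{v} \hat{\beta}_{1} = 0 \\ \Rightarrow \quad p^{*} & = \frac{c_{v} \hat{\beta}_{1} - \hat{q}_{c}}{2 \hat{\beta}_{1}} = \frac{c_{v}}{2} - \frac{\hat{q}_{c}}{2 \hat{\beta}_{1}} \end{aligned}$$

Students who use a symbolic package can compare its output against this first-order condition term by term.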

Scope of the case study

In this case study, students learn to handle threats to the correct estimation of coefficients in linear regression analysis and understand how those threats apply when estimating a demand function. This case study assesses three critical factors when estimating a demand function:

  • correctly defining a demand function,

  • considering omitted variable bias as one source of endogeneity,

  • recognizing multicollinearity as a reason for unreliable coefficient estimates.

Beyond ensuring that students select the correct variables from the data set, we avoid assessing students’ comprehension of endogeneity problems when estimating demand as a function of prices by stating that management sets prices randomly. Random variation enables us to estimate the causal effect easily. While such a pricing setting can occur in the context of A/B tests or larger experimental studies, it might not be very realistic. However, this shortcut enables us to derive a simpler case study that focuses on data-driven pricing, i.e., turning statistical insights into pricing decisions. If the corresponding course discusses endogeneity in detail, educators can drop the simplifying assumption of random prices.
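Why random prices simplify the estimation can be illustrated with a small simulation. The following is a minimal sketch in Python (the article's solution uses R) with an entirely hypothetical demand function and made-up parameter values. It compares a simple regression of quantity on price when prices are set at random with one where prices move together with an omitted advertising variable.

```python
import random

def ols_slope(x, y):
    """Slope of a simple regression of y on x (covariance over variance)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    return cov_xy / var_x

random.seed(1)
n = 2000
TRUE_PRICE_EFFECT = -5.0  # hypothetical causal effect of price on quantity

def demand(price, advertising):
    # Hypothetical demand function; advertising is omitted from the regression.
    return 1000 + TRUE_PRICE_EFFECT * price + 3.0 * advertising + random.gauss(0, 10)

advertising = [random.uniform(0, 100) for _ in range(n)]

# Scenario 1: management sets prices at random, independent of advertising.
price_random = [random.uniform(50, 150) for _ in range(n)]
qty_random = [demand(p, a) for p, a in zip(price_random, advertising)]

# Scenario 2: prices are higher where advertising is higher (confounding).
price_confounded = [50 + 0.5 * a + random.gauss(0, 5) for a in advertising]
qty_confounded = [demand(p, a) for p, a in zip(price_confounded, advertising)]

slope_random = ols_slope(price_random, qty_random)
slope_confounded = ols_slope(price_confounded, qty_confounded)
print(f"Estimated price effect, random prices:     {slope_random:.2f}")
print(f"Estimated price effect, confounded prices: {slope_confounded:.2f}")
```

In this setup, the estimate under random prices stays close to the assumed effect of -5, while the confounded estimate is pulled upward because the omitted advertising variable raises both prices and sales. Educators who drop the random-price assumption could use a comparison of this kind to motivate the discussion of endogeneity.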

While we do not focus on endogeneity, we assess students’ understanding of another potential threat to the unbiased identification of demand function parameters: multicollinearity. While students sometimes perceive the problem and consequences of multicollinearity as theoretical and abstract, our setting outlines a marketing scenario that will likely occur in practice.

Students must critically assess the variables included in the data set and disregard variables that are highly or even perfectly collinear with other potential independent variables. However, a simple look at correlations between independent variables is not enough in this setting: students have to combine their econometric understanding of collinearity with economic intuition.

In our case, two advertising channels receive almost, but not exactly, equal budgets. The sum of those two budgets yields the total budget. First, students must identify that the former two variables add up to the latter variable, so including all three variables produces perfect collinearity. Second, students should identify that the two budget variables correlate highly but not perfectly. As highlighted in the incorrect model specifications, this high correlation leads to severe multicollinearity, producing opposite-sign coefficient estimates for the impact of advertising on sales, while including only one budget variable (the correct specification) always yields a positive estimate. Third, students should understand why multicollinearity occurs in many settings. For example, the management’s request to split the budget 50:50 is a request that could easily occur in a real-world setting.
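The two collinearity problems described above can be made tangible with simulated budgets. The following is a minimal sketch in Python (the article's solution uses R). The budget ranges, the size of the deviations from the 50:50 split, and all variable names are hypothetical and do not come from the case study's data set. With only two regressors, the variance inflation factor reduces to 1 / (1 - r²), where r is their correlation.

```python
import math
import random

def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov_xy / math.sqrt(var_x * var_y)

random.seed(42)
n = 200  # hypothetical number of sales territories

# Hypothetical total budgets, split roughly 50:50 with small deviations.
budget_total = [random.uniform(1000, 5000) for _ in range(n)]
budget_phone = [0.5 * t + random.gauss(0, 20) for t in budget_total]
budget_social = [t - p for t, p in zip(budget_total, budget_phone)]

# Perfect collinearity: phone + social reproduces the total budget exactly
# (up to floating-point error), so all three cannot enter one regression.
max_gap = max(abs(p + s - t)
              for p, s, t in zip(budget_phone, budget_social, budget_total))

# High but imperfect collinearity between the two channel budgets.
r = pearson(budget_phone, budget_social)
vif = 1 / (1 - r ** 2)

print(f"Largest deviation of phone + social from total: {max_gap:.2e}")
print(f"Correlation between channel budgets: {r:.4f}")
print(f"Variance inflation factor: {vif:.1f}")
```

A variance inflation factor far above the common rule-of-thumb threshold of 10 signals that the separate channel effects cannot be estimated precisely, which matches the unstable, opposite-sign coefficients described in the text.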

Corresponding marketing analytics class

The preceding marketing analytics class teaches students how to use a statistical programming language—in our case, R, but software such as Python or Stata works equally well—to analyze data and turn these results into marketing decisions.

This class focuses on an experiential and active learning approach that prioritizes students “getting their hands dirty.” That means we provide ample time and case studies in class for students to apply the econometric techniques themselves, interpret the econometric results in a managerially relevant way, and critically discuss the results in students’ small peer groups and larger class conversations.

We structure the proposed case study and the associated marketing analytics class so that students, most of whom have only basic econometric or statistical programming experience, quickly lose any fear of statistics or programming they may have. Hence, we prioritize interactive learning opportunities through in-class case studies with teaching staff readily available for questions. By giving quick, individual feedback, the instructors can lead the student groups to quick results and instill a positive “can do” mentality, which is critical for learning new quantitative techniques that students often perceive as difficult.

Discussion questions for educators

Table 2 provides educators with initial questions to lead a classroom discussion about the case study. Those questions should prompt the students to solve the case study successfully. In addition, the questions aim to induce students to think about how and under which circumstances they can transfer the case study’s methods to other scenarios.

Table 2 Teaching questions by topic

Application and results of the case study as an exam exercise

Setup of the exam

We initially developed and applied the case study as a computer-based final exam for second- and third-year undergraduate students in business and economics. The course is a mandatory marketing analytics class for undergraduates majoring in “Management and Marketing” at a large public university. During the semester, students obtained substantive marketing and analytics (econometrics) skills corresponding to the learning goals in Table 1. The presented exercise matches some of the most essential skills expected of a quantitative marketing analyst. Hence, we ensure the constructive alignment of the exercise with what students can expect once they graduate.

The final exam was administered in 2023 to 134 students at the end of the course. We used a computer classroom with all necessary software pre-installed to ensure a controlled exam setting and equal opportunity, irrespective of student hardware. Consistent with the course’s learning goals, we intended to avoid students having to memorize easily accessible information, such as specific code snippets or function names for statistical procedures. Hence, the exam mode was “open book,” allowing students to reference all physical and digital materials, such as existing R code from the preceding class. Exam supervisors ensured that no communication among students took place. Students had 60 min to solve the exam question. The exercise discussed in this article was part of a 90-min exam that contained an additional (smaller) exercise.

Demographic characteristics of the students

This subsection discusses which students took the exam. The examination office strictly anonymizes student data, so we only obtained a student identifier without the student’s name or demographic characteristics. Hence, we cannot make a precise statement about the demographics of students in this exam cohort.

However, course evaluations (N = 24) taken before the exam suggest that 50–60% of the students in the corresponding class are female, and the age distribution centers around 21–22 years. Approximately nine out of ten students are enrolled in a business and economics undergraduate degree, with the remainder taking the course as part of a business minor in another degree, such as (business) pedagogy or other social sciences. Students majoring in business and economics must have passed introductory statistics, mathematics, microeconomics, and marketing courses and completed at least three semesters before they can enroll in this marketing analytics course. Most students attended it in their fourth or fifth semester of an undergraduate program scheduled for completion in six semesters. Nevertheless, a few students attended the course later than the fifth semester, in line with the observation that many students require more than six semesters to finish their studies.

Results of the exam

Figure 1 visualizes the distribution of achieved points from the 134 exam submissions. Student performance approximately follows a normal distribution with a mean of 27.2 points and a standard deviation of 11.1 points. The average student thus correctly solved 27.2/60 = 45.3% of the exam. The minimum score was 0 points (some students prefer failing the exam to receiving a poor passing grade, which creates an incentive to submit an empty solution). The best-performing student achieved 57.5 out of 60 points.

Fig. 1 Distribution of student performance for the exercise in an exam setting (N = 134)

Figure 2 displays the distribution of the student performance by sub-exercise. This visualization serves as a proxy for a sub-exercise’s difficulty—the lower the share of achieved points relative to the achievable points, the more difficult the sub-exercise was to solve for the students.

Fig. 2 Distribution of student performance by sub-exercise. Note: Due to the exam’s time constraints, the implemented exam consisted of all discussed sub-exercises except for Exercise 3 (Price Elasticity). Hence, we present the empirical results from all but this sub-exercise.

Ranking the sub-exercises by this metric shows that students performed best in Exercise 2, which asked them to estimate a demand function through a regression. The most challenging was Exercise 6, which required students to decide on the optimal budget allocation under multicollinearity. However, the share of achieved points is only an imperfect measure of difficulty because all students received the sub-exercises in the same order. Hence, students might have had less time to solve the sub-exercises towards the end of the exam.

Discussion of the applicability as an exam and implications for other settings

The substantial variation in Fig. 2’s distributions shows that our case study’s sub-exercises exhibit various degrees of difficulty, enabling the educator to assess students’ performance at a granular level.

Nevertheless, the comparatively low average number of achieved points (27.2/60 = 45.3%) shows that it is challenging for the given student population to solve the full exercise correctly in 60 min. Still, the occasional almost perfect scores show that very strong students can solve the full exercise correctly within 60 min. Hence, we conclude that the exercise is well-calibrated to measure varying degrees of student performance.

These results from using the exercise as an exam hold implications for other settings earlier in the student’s learning journey, such as take-home or in-class case study assignments.

First, students need sufficient time to work on the exercises. Assuming a lower familiarity with the subject than students writing an exam, we recommend that students in a take-home case study or in-class group work have at least 120 min to solve the case study as part of a learning experience.

Second, the exam results show in which sub-exercises students struggle to find the correct solution. Exercise 6 (marketing budget allocation under high multicollinearity) and Exercise 4 (determining the optimal price) were the most challenging parts of the case study. Hence, it might be sensible for educators to spend most of their time discussing those sub-exercises and the underlying concepts.

While our analysis shows the promise of the case study in teaching and assessing marketing analytics, it cannot determine whether this case study maximizes student learning. This article aims to make this case study freely available to educators as a first step in improving learning outcomes in marketing analytics. Hence, we encourage further research comparing our case study’s efficacy to other teaching and assessment approaches in marketing analytics.

Summary

This article described a case study to teach or assess marketing analytics skills. The presented case study combines substantive marketing skills with econometric methods, conveying how data-driven marketing can create value for firms and enable better decision-making. Through accessing our open-source repository, marketing educators can use this case study, its solution, and the accompanying data (accessible via https://github.com/lukas-jue/marketing-analytics-exercise) as a template for in-class group work, homework, or computer-based exams.