Abstract
Retailers often co-locate spatially to draw consumers, even though co-location increases price competition. The paper develops a structural model of entry and location choice that isolates the agglomeration benefit of co-location, after controlling for pure differentiation rationales for co-location such as (1) high demand and/or low cost at the location, (2) zoning restrictions, and (3) format differentiation that minimizes the need for spatial differentiation. We augment the entry and location choice data used in the literature with revenue and price data to help identify the agglomeration effect. We introduce a new approach to obtaining zoning data across a large number of markets that should be of general interest for a large stream of spatial location applications. We find that agglomeration benefits explain a significant fraction of observed co-location. While zoning restrictions have little direct impact on co-location, in combination with the agglomeration benefit, they explain a surprisingly large fraction of observed co-location.
Notes
Some empirical evidence of the benefits of spatial co-location can be found in Fox et al. [20] and Watson [37]. Vitorino [35] finds evidence for inter-store spillovers in a particular kind of retail cluster: shopping malls. In a mall setting, however, firms only make a strategic entry decision; they do not face the tradeoff of whether to co-locate with rivals or spatially differentiate from them.
For example, Fox et al. [20] use data from a multi-outlet panel to study consumers’ shopping behavior and its impact on store revenues. However, their data is from a single major metropolitan market.
Orhun [27] attempts to control for location-specific common profit shocks. However, with only choice data, one can only model latent profits whose errors have to be normalized for estimation. For instance, Orhun [27] assumed that the common profit shocks follow a standard normal distribution.
We do not have store entry dates which are required to solve a dynamic choice game. However, our model can be extended to a dynamic set-up similar to Aguirregabiria and Vicentini [2] who have proposed a dynamic model of an oligopoly industry characterized by spatial competition.
See Aguirregabiria et al. [4] for a discussion of the distinction between multiple equilibria in the model and multiple equilibria in the data.
If we do not impose such a cap on the maximum distance, then the estimation becomes very cumbersome and slow, as our dataset includes several large markets, each with a large number of locations and CBGs.
We use per capita income for convenience. Alternatively, one could use more directly relevant variables, such as per capita expenditure on grocery.
We use sales-weighted prices across all categories in a store as the price index of the store.
We normalize the log of profit from not entering a market (the outside option) to zero; this implies that the profit for the outside option is normalized to 1. The log transformation also implies that the profit in Eq. (3) is restricted to being non-negative.
Another application of the NPL approach for a static game can be found in Ellickson and Misra [17].
Many of the fixed points may be identical.
Su and Judd [32] suggest using a Mathematical Programming with Equilibrium Constraints approach that finds the parameter estimates and the equilibrium CCPs simultaneously. However, like the parallel-NPL, this approach also relies on multiple runs with different starting values to find different equilibria. Hence, its ability to find the global optimum in problems that have a large action space (as in our entry and location choice problem) is unclear.
Kasahara and Shimotsu [24] suggest the following procedure for selecting the value of \(\delta\): Simulate a sequence \({\left\{{\widetilde{P}}_{n}\right\}}_{n=0}^{N}\) by iterating the transformed mapping for different values of \(\delta\), say for \(\delta \in \left\{\mathrm{0.1,0.2},...,0.9\right\}\). Then pick the value of \(\delta\) that leads to the smallest value of the mean of across n = 1,…, N.
We have weekly product category-level price index data for a one-year period for 27 grocery product categories and for each store that belongs to the store chain (\({\mathrm{pr}}_{cts}=\sum\limits_{\forall i\in c}\sum\limits_{\forall u\in i}{w}_{ciuts}*p{r}_{ciuts};\) where \({w}_{ciuts}\) is the revenue share of UPC, u, of item, i, within product category, c, for week t in store s). To construct store-level price indices, we adopt an approach similar to Chevalier et al. (2003, p. 22). That is, we aggregate over the product categories and weeks to form a store-level price index (\({\mathrm{pr}}_{s}=\sum\limits_{c=1}^{27}\sum\limits_{t=1}^{52}{w}_{cts}*{\mathrm{pr}}_{cts};\) where \({w}_{cts}\) is the dollar share of category c in week t in store s).
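The store-level aggregation in this note can be illustrated with a minimal Python sketch. It assumes that the dollar shares \({w}_{cts}\) are computed relative to a store's total dollar sales over all category-weeks, so that they sum to one within a store; the function name and the input layout are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def store_price_index(rows):
    """Return {store: pr_s}, the sales-weighted mean of category-week price indices.

    rows: iterable of (store, category, week, price_index, dollar_sales).
    Assumption: w_cts = dollar_sales_cts / (total dollar sales of store s),
    so pr_s = sum(sales * price_index) / sum(sales).
    """
    total_sales = defaultdict(float)
    weighted_sum = defaultdict(float)
    for store, _category, _week, price_index, dollar_sales in rows:
        total_sales[store] += dollar_sales
        weighted_sum[store] += dollar_sales * price_index
    # Skip stores with no recorded sales to avoid dividing by zero.
    return {
        store: weighted_sum[store] / total
        for store, total in total_sales.items()
        if total > 0
    }


# Illustrative (made-up) data: two category-weeks for one store.
example_rows = [
    ("A", "milk", 1, 2.0, 100.0),
    ("A", "bread", 1, 4.0, 300.0),
]
example_index = store_price_index(example_rows)
```

In the made-up example, the category with three times the dollar sales receives three times the weight in the store's index.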
A comparison of the market configurations in 2001 and 2008 showed that the number of stores in these markets increased by less than 10%, from 399 to 438.
In this paper, distance between two points always refers to the great-circle distance.
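For readers who want to reproduce such distances, the sketch below computes a great-circle distance in miles with the haversine formula. The paper does not state which formula or Earth radius was used, so both the haversine form and the mean radius of 3958.8 mi. are assumptions made for illustration.

```python
import math

EARTH_RADIUS_MI = 3958.8  # assumed mean Earth radius in miles


def great_circle_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    # Haversine of the central angle between the two points.
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot near antipodes.
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(min(1.0, a)))
```

Under these assumptions, one degree of latitude corresponds to roughly 69 mi.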
A pixel point is one of the individual dots that make up a graphical image. Each pixel point combines red, green, and blue phosphors to create a specific color.
Note that competition between stores in neighboring locations cannot explain the absence of big-box stores in a location as we are considering big-box stores across any segment of the retail industry.
A comparison of results (not presented here) across different specifications of the maximum distance that consumers may travel for shopping, Rad, suggested that a distance of 5 mi. was sufficient. Rad values of 6 mi. and above did not change the parameter estimates or increase the likelihood value significantly (as judged by the AIC and BIC criteria). On the other hand, Rad values of 4 mi. and below resulted in significantly different estimates for some model parameters and also gave significantly smaller likelihood values.
For this counterfactual simulation, we count a store within 1 mi. of a rival as a co-located store. It is plausible that two such stores belong to two neighboring 1-mi.² block retail locations whose commercial centers happen to be within 1 mi. of each other.
We acknowledge that there is a potential selection bias because we only observe revenue data for locations that were chosen. Ellickson and Misra [16] propose a selection correction function in their application where supermarkets choose from one of three pricing strategies. However, their approach suffers from a curse of dimensionality in cases where the cardinality of firms’ action space is large, as is the case for firms choosing from multiple locations within a market.
References
Aguirregabiria V, Mira P (2005) A genetic algorithm for the structural estimation of games with multiple equilibria. Working paper
Aguirregabiria V, Vicentini G (2012) Dynamic spatial competition between multi-store firms. Working paper
Aguirregabiria V, Mira P (2007) Sequential estimation of dynamic discrete games. Econometrica 75(1):1–53
Aguirregabiria V, Bajari P, Draganska M, Einav L, Horsky D, Misra S, Narayanan S, Orhun Y, Reiss P, Seim K, Singh V, Thomadsen R, Zhu T (2008) Discrete choice models with strategic interactions. Mark Lett 19:399–416
Aradillas-López A (2020) The econometrics of static games. Ann Rev Econ 12(1):135–165
Arentze TA, Oppewal OH, Timmermans HJP (2005) A multipurpose shopping trip model to assess retail agglomeration effects. J Mark Res 42(February):109–115
Bajari P, Benkard L, Levin J (2007) Estimating dynamic models of imperfect competition. Econometrica 75(5):1331–1370
Berry S (1992) Estimation of a model of entry in the airline industry. Econometrica 60:889–917
Bester H (1998) Quality uncertainty mitigates product differentiation. RAND J Econ 29(Winter):828–844
Bresnahan T, Reiss P (1991) Entry and competition in concentrated markets. J Polit Econ 99:977–1009
Ciliberto F, Tamer E (2009) Market structure and multiple equilibria in airline markets. Econometrica 77(6):1791–1828
de Paula A (2013) Econometric analysis of games with multiple equilibria. Ann Rev Econ 5(1):107–131
Draganska M, Mazzeo M, Seim K (2009) Beyond plain vanilla: modeling joint product assortment and pricing decisions. Quant Mark Econ 7(2):105–146
Dudey M (1990) Competition by choice: the effect of consumer search on firm location decisions. Am Econ Rev 80(5):1092–1104
Ellickson PB, Houghton S, Timmins C (2013) Estimating network economies in retail chains: a revealed preference approach. RAND J Econ 44(2):169–193
Ellickson PB, Misra S (2012) Enriching interactions: incorporating revenue and cost data into static discrete games. Quant Mark Econ 10:1–26
Ellickson PB, Misra S (2008) Supermarket pricing strategies. Mark Sci 27(5):811–828
Fischer JH, Harrington JE (1996) Product variety and firm agglomeration. RAND J Econ 27:281–309
Fox J (2007) Semiparametric estimation of multinomial discrete choice models using a subset of choices. RAND J Econ 38(4):1002–1019
Fox E, Postrel S, McLaughlin A (2007) The impact of retail location on retailer revenues: an empirical investigation. Working paper
Holmes T (2011) The diffusion of Wal-Mart and economies of density. Econometrica 79(1):252–301
Homer C, Huang C, Yang L, Wylie B, Coan M (2004) Development of a 2001 national land-cover database for the United States. Photogramm Eng Remote Sens 70(7):829–840
Jia P (2008) What happens when Wal-Mart comes to town: an empirical analysis of the discount retailing industry. Econometrica 76(6):1263–1316
Kasahara H, Shimotsu K (2012) Sequential estimation of structural models with a fixed point constraint. Econometrica 80:2303–2319
Konishi H (2005) Concentration of competing retail stores. J Urban Econ 58:488–512
Mazzeo M (2002) Product choice and oligopoly market structure. RAND J Econ 33:221–242
Orhun Y (2013) Spatial differentiation in the supermarket industry: the role of common information. Quant Mark Econ 11:3–37
Pakes A, Ostrovsky M, Berry S (2007) Simple estimators for the parameters of discrete dynamic games, with entry/exit examples. RAND J Econ 38:373–399
Pesendorfer M, Schmidt-Dengler P (2008) Asymptotic least squares estimators for dynamic games. Rev Econ Stud 75:901–928
Seim K (2006) An empirical model of firm entry with endogenous product-type choices. RAND J Econ 37(3):619–640
Stahl K (1982) Differentiated products, consumer search, and locational oligopoly. J Ind Econ 31(1–2):97–113
Su C, Judd KL (2012) Constrained optimization approaches to estimation of structural models. Econometrica 80:2213–2230
Thomadsen R (2007) Product positioning and competition: the role of location in the fast food industry. Mark Sci 26(6):792–804
Varian HR (1980) A model of sales. Am Econ Rev 70:651–659
Vitorino MA (2012) Empirical entry games with complementarities: an application to the shopping center industry. J Mark Res 49(2):175–191
Vogelmann JE, Howard SM, Yang L, Larson CR, Wylie BK, Van Driel JN (2001) Completion of the 1990’s national land cover data set for the conterminous United States. Photogramm Eng Remote Sens 67(6):650–662
Watson R (2005) Entry and location choice in eyewear retailing. Mimeo, University of Texas-Austin
Wolinsky A (1983) Retail trade concentration due to consumers’ imperfect information. Bell J Econ 14(1):275–282
Zhu T, Singh V, Manuszak M (2009) Market structure and competition in the retail discount industry. J Mark Res 46(4):453–466
Zhu T, Singh V (2009) Spatial competition with endogenous location choices: an application to discount retailing. Quant Mark Econ 7(1):1–35
Marshall A (1920) Principles of Economics, 8th edn. Macmillan, London
Appendices
Appendix 1
Expected total number of competing stores in a location:
The expectation that a location will have stores with other formats besides format-f:
Expected number of format-f’ rivals in distance band b around a format-f store that is in location l (interformat competition):
where \({\mathcal{l}}_{lb}\) is the set of locations in distance band b around location l.
Expected number of format-f rivals in distance band b around a format-f store that is in location l (intraformat competition):
When accounting for the number of rivals with the same format, we need to discount the choice probability of the focal firm, conditional on its decision to enter the market:
Note that the probability that an f-format firm enters the market is simply \(\sum_{l=1}^{{l}_{m}}{p}_{fl}\). Hence, the probability \(\left({p}_{fj}\left|f\mathrm{ Enters }m\right.\right)\) is given by \({p}_{fj}/\sum_{l=1}^{{l}_{m}}{p}_{fl}\) and Eq. (36) can be rewritten as:
Appendix 2
Step 0: Initial population:
Generate a set of T vectors of starting values for retailers’ beliefs about rivals’ CCPs for location choices, \(\left[{\overline{P} }_{0}^{1};{\overline{P} }_{0}^{2};...;{\overline{P} }_{0}^{T}\right]\). Also, create an initial guess for the parameter vector, \(\theta \left(=\left\{\alpha ,\beta ,\gamma ,\sigma ,\rho \right\}\right)\).
Step 1: Locally contractive, q-NPL iteration:
For the likelihood maximization, set up an internal loop to do the following for each of the T CCP vectors:
Given the current parameter values, pick a large number of Halton draws of price, revenue, and cost shocks for all retail locations. Obtain the location choice probabilities (Eqs. 20–22) and the market-specific cost parameters (Eqs. 26 and 27). Next, calculate the price indices of firms, sans the unobserved component, for the chosen locations and with the observed configuration of stores in 2001. Compare the price estimates with the price data to obtain the price shocks at the chosen locations of the store chain for which we have price data, \(\left({\overline{\omega }}_{\mathrm{obv}}^{\mathrm{pr}}\left|\theta \right.\right)\). Also, calculate the revenues of stores by integrating over the distribution of unobserved price shocks, sans the unobserved revenue component, for the chosen locations and with the observed store configuration in 2008. Compare the revenue estimates with the revenue data for all stores to obtain the revenue shocks of firms in their chosen locations, \(\left({\overline{\omega }}_{\mathrm{obv}}^{r}\left|\theta \right.\right)\). We now have all the components of the likelihood function.
Maximize the pseudo likelihood (Eq. 28) to obtain a set of T vectors of parameter estimates, \({\Theta }_{n}^{t}=\underset{\Theta }{\mathrm{argmax}}\left(L\left({\overline{P} }_{n-1}^{t},\Theta \right)\right)\), and a new population of CCPs using the q-NPL operator: \({\widehat{\overline{P}} }_{n}^{t}={\Lambda }^{q}\left({\overline{P} }_{n-1}^{t},{\Theta }_{n}^{t}\right)\).
Within each market, normalize the CCPs for each store format so that the CCPs of all formats add up to one. Essentially, for each format-f, and market location l, we have:
Step 2: Selection of Parents:
Based on their fitness, draw, with replacement, T “mother” CCP vectors and T “father” CCP vectors from the set, \(\left[{\widehat{\overline{P}} }_{n}^{1};{\widehat{\overline{P}} }_{n}^{2};...;{\widehat{\overline{P}} }_{n}^{T}\right]\) and form couples or Parents. CCPs with high likelihood values, \(L\left({\widehat{\overline{P}} }_{n}^{t},{\Theta }_{n}^{t}\right)\), and those closer to convergence (Absolute value of \(\left({\widehat{\overline{P}} }_{n}^{t}-{\overline{P} }_{n-1}^{t}\right)\) closer to zero) are considered more fit to continue. In our problem, we use the following fitness criterion:
where, \({\lambda }_{1}\) and \({\lambda }_{2}\) are small positive constants. The tth CCP vector gets selected with the probability:
Now, we have the set of couples: \(\left[\left({\widehat{\overline{P}} }_{n}^{1^{\prime}},{\widehat{\overline{P}} }_{n}^{1^{\prime\prime}}\right);\left({\widehat{\overline{P}} }_{n}^{2^{\prime}},{\widehat{\overline{P}} }_{n}^{2^{\prime\prime}}\right);...;\left({\widehat{\overline{P}} }_{n}^{T^{\prime}},{\widehat{\overline{P}} }_{n}^{T^{\prime\prime}}\right)\right]\)
Step 3: Crossover and mutation
Obtain an offspring from each couple as follows:
where D is a vector of indicators for the identity of the parent who provides each element of the CCPs. Its elements are i.i.d. with \(\mathrm{Pr}\left({D}_{j}=1\right)=0.5\) for the jth element. \({Z}_{n}\) is another vector of indicators that identifies the elements of the CCPs that undergo mutation. Its elements are also i.i.d. with \(\mathrm{Pr}\left({Z}_{jn}=1\right)=0.5/\sqrt{n}\). Hence, with multiple iterations, as we get closer to the global optimum, we allow the number of mutations to reduce to zero. Finally, \({\delta }_{n}\) is a vector whose elements represent the magnitude of a mutation. It is also defined such that its elements go to zero with multiple iterations. Specifically, we use: \({\delta }_{jn}\sim U\left(-0.5/\sqrt{n}, 0.5/\sqrt{n}\right)\).
As with step 1, within each market, again normalize the CCPs so that the CCPs of all formats add up to one. Now, we have the new set of CCPs, \(\left[{\overline{P} }_{n}^{1};{\overline{P} }_{n}^{2};...;{\overline{P} }_{n}^{T}\right]\).
Iterate steps 1–3 until the set of CCPs converges.
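Steps 2 and 3 can be illustrated with a minimal Python sketch. The fitness criterion, the selection probability, and the offspring equation are not reproduced in the text above, so the sketch makes three assumptions: fitness values are supplied by the caller, a CCP vector is selected with probability proportional to its fitness, and an offspring element is taken from the mother or the father with equal probability and then shifted by the mutation term when its mutation indicator equals one. Clipping the result to [0, 1] is a further assumption, and all function names are hypothetical.

```python
import random


def select_parents(population, fitness, rng):
    """Step 2: draw, with replacement, T mothers and T fathers.

    Assumption: selection probability is proportional to the supplied fitness.
    """
    size = len(population)
    mothers = rng.choices(population, weights=fitness, k=size)
    fathers = rng.choices(population, weights=fitness, k=size)
    return list(zip(mothers, fathers))


def crossover_and_mutate(mother, father, n, rng):
    """Step 3: build one offspring CCP vector at iteration n (n >= 1)."""
    scale = 0.5 / n ** 0.5  # mutation probability and mutation bound both shrink with n
    child = []
    for m_j, f_j in zip(mother, father):
        value = m_j if rng.random() < 0.5 else f_j  # D_j: which parent supplies element j
        if rng.random() < scale:                    # Z_jn: does element j mutate?
            value += rng.uniform(-scale, scale)     # delta_jn: size of the mutation
        child.append(min(max(value, 0.0), 1.0))     # assumed clipping to a valid probability
    return child


def next_population(population, fitness, n, rng):
    """One pass of steps 2 and 3; the renormalization within markets is not shown."""
    couples = select_parents(population, fitness, rng)
    return [crossover_and_mutate(m, f, n, rng) for m, f in couples]
```

Because both the mutation probability and the mutation bound scale with \(1/\sqrt{n}\), offspring in late iterations differ from their parents only through crossover, which is the behavior the text describes.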
Appendix 3
Price index data are not critical for identification because our model parameters can be potentially identified with only revenue data as we are exploiting the variation in the spatial distribution of consumers around stores, the locations of stores relative to consumers, and the locations of stores relative to competitors. We illustrate the intuition using a stylized example.
Consider two markets (illustrated below), each with two consumers, C1 and C2, and two firms, F1 and F2. Consumers’ distances from F1 and F2 are the same in both markets (\(\sqrt{6}\) mi. and \(\sqrt{10}\) mi.). So, any difference in the observed revenue of F1 in the two markets (similarly for F2) must be because of a difference in the price index of F1 in these markets, which in turn will be attributed to the difference in the location of F1 relative to F2 in these markets (2 mi. vs. 4 mi.).
This stylized example illustrates that even without any price data, the variations in observed revenue and the distances of stores from their respective rivals can potentially help identify the competition parameters in our price index function. Furthermore, format-specific scaling parameters (i.e., intrinsic ability parameters in the price index function and customer value parameters in the volume function) can also be identified from the magnitude of revenue and its response to variations in distances from rivals versus its response to variations in consumer characteristics or distances from consumers. Therefore, price index data may not be critical if we have a large number of sample markets so that the data are likely to be sufficiently rich in terms of the spatial distribution of consumers around stores, the locations of stores relative to consumers, and the locations of stores relative to competitors. But when working with a limited number of sample markets, augmenting the revenue data with even a limited amount of price index data can give us more precise estimates.
Appendix 4
The following steps explain the technical operations involved in extracting commercial land use pixel point data from NLCD. This is the authors’ original approach; a more efficient approach may be possible.
1. Open NLCD data in ArcGIS
2. Zoom in to the market area of interest and select the data frame for further processing
3. Change coordinate system to WGS 1984
4. Reclassify the raster data to show only commercial land pixel points
5. Convert the reclassified raster data into Point Features and save them as a Shapefile
6. Convert the saved Shapefile into a kml file using shp2kml software. The kml file can be opened in Google Earth (GE), allowing us to see the pixel point data on GE
7. Make a copy of the saved kml file and rename the file from “.kml” to “.xml.” This xml file can be opened in Excel, and the spreadsheet will show the coordinates (latitude and longitude) of each pixel point, which may be used for further analysis
8. The count of these pixel points within each 1-mi.² block market location gives a measure of the intensity of commercial activity in the location, and the mean of the coordinates of the pixel points within the location gives the commercial center of the location
In its classification of land types, NLCD 2001 combines high-density residential land with commercial land, but NLCD 1992 separates them. Hence, we match the two datasets using ArcGIS software to separate the pixel data for all residential land areas from land areas with commercial activity in 2001. We are able to do this separation because land areas that were high-density residential in 1992 are unlikely to convert to commercial land areas by 2001, and vice versa. In the rare instances where an area that was low-density residential in 1992 was classified as commercial land in the 2001 data, we do a quick visual inspection of the geographical area using Google Earth to confirm whether that area is truly commercial land or has converted into high-density residential land.
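Steps 7 and 8 of the procedure can also be scripted instead of being carried out in Excel. The Python sketch below is one possible way to do so and is not the authors' workflow. It assumes that each pixel point is stored in a kml `coordinates` element in the usual `longitude,latitude[,altitude]` order, and it takes a caller-supplied function, here called `location_of` (a hypothetical name), that maps a point to its 1-mi.² block location.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def read_kml_points(kml_text):
    """Return a list of (lat, lon) tuples from every coordinates element."""
    root = ET.fromstring(kml_text)
    points = []
    for elem in root.iter():
        # Match on the tag suffix so that any kml namespace version is accepted.
        if elem.tag.endswith("coordinates") and elem.text:
            for token in elem.text.split():
                lon, lat = token.split(",")[:2]  # kml stores longitude first
                points.append((float(lat), float(lon)))
    return points


def summarize_by_location(points, location_of):
    """Count the pixel points and average their coordinates within each location.

    location_of: hypothetical function (lat, lon) -> location id, or None
    when the point falls outside every location of interest.
    """
    groups = defaultdict(list)
    for lat, lon in points:
        loc = location_of(lat, lon)
        if loc is not None:
            groups[loc].append((lat, lon))
    summary = {}
    for loc, pts in groups.items():
        count = len(pts)
        center = (sum(p[0] for p in pts) / count, sum(p[1] for p in pts) / count)
        summary[loc] = {"count": count, "center": center}
    return summary
```

The `count` entry corresponds to the intensity of commercial activity in a location, and the `center` entry corresponds to its commercial center as defined in step 8.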
About this article
Cite this article
Datta, S., Sudhir, K. The Agglomeration-Differentiation Tradeoff in Spatial Location Choice. Cust. Need. and Solut. 10, 2 (2023). https://doi.org/10.1007/s40547-023-00135-w
DOI: https://doi.org/10.1007/s40547-023-00135-w