1 Introduction

The traditional discrete choice model in marketing relies on the assumption that the consumer is aware of all products on the market and their characteristics (e.g., prices, product attributes) and makes a purchase decision among them. Since the work of Howard and Sheth (1969), however, the concept of the consideration set has been well recognized in the marketing literature. The notion of the consideration set posits that, instead of being omniscient, consumers restrict their attention to a subset of products on the market and purchase a product within that set by picking the alternative with the highest utility. This large and growing literature includes, among many others, studies by Ratchford et al. (1980), Roberts et al. (1989), Hauser and Wernerfelt (1990), Roberts and Lattin (1991), and Siddarth et al. (1995).

In the recent literature, the formation of the consideration set has been modeled as the result of costly consumer search; i.e., consumers are a priori uncertain about certain characteristics of the product (e.g., price or the “match” between the individual consumer and product) and spend resources (temporal, psychological, etc.) to resolve that uncertainty.Footnote 1 The presence of such search costs leads consumers to obtain information only on a subset of products in the market—the consideration set (Mehta et al., 2003; Kim et al., 2010; Kim et al., 2016; Honka and Chintagunta, 2017)—since the consumer trades off the benefits and the costs of searching. Following the search literature in economics and marketing (Stigler, 1961; Weitzman, 1979; Ratchford et al., 1980), models of consideration set formation based on search theory have assumed either simultaneous (Stigler, 1961) or sequential (Weitzman, 1979) search.

In the context of online search, for example, a consumer submits a search query on a website (e.g., Expedia.com or Amazon.com) and receives a list of products. Since such a list often reveals only partial information on the listed products, the consumer has to browse the list and click on a subset of products to reveal more information on those products. In terms of the search model, the act of clicking corresponds to searching, and the set of clicked products is viewed as the consideration set (Kim et al., 2010; Ursu, 2018; Chen and Yao, 2016; De Los Santos et al., 2012), from which the consumer makes a choice. A majority of the recent literature on online search has used the sequential search model (SSM) by Weitzman (1979) to model the formation of the consideration set.Footnote 2

In this paper, we propose a novel likelihood simulator for SSM that exploits the assumption that search costs vary at the consumer-product level and are unobserved (and hence, random) from the researcher’s perspective. We first lay out the details on how to simulate the likelihood for SSM in which consumers search to resolve the uncertainty in the match value. Then, with an extensive simulation study, we validate the proposed method by comparing the parameter recovery of the proposed method and other likelihood simulators that have been used in the literature. In addition, we show that the proposed method works with an alternative search cost distribution and that it recovers consumers’ relative preferences (i.e., ratios between parameters) even if the utility function and/or the search cost distribution is mis-specified. Lastly, we validate the proposed method by applying it to the field data from Expedia and comparing its out-of-sample prediction with other simulators’. There, we also show that the position effect and the search cost have larger impacts on the products in the lower part of the product list than on those in the upper part.

In the traditional discrete choice literature, it is straightforward to construct the probability of the observed choice under the assumption that the consumer is fully informed and considers all products in the category, conditional on the chosen structure of the choice model (e.g., logit). Thus, one can formulate the likelihood function for a series of choices made by, say, a panel of consumers and then estimate the parameters of the model by maximizing the likelihood function. In contrast, applying the consumer search model to the formation of consideration sets and estimating it have been challenging, since there is no closed-form solution for the joint probability of search and purchase. Thus, researchers have resorted to simulation-based methods to estimate the model parameters, including the crude frequency simulator (CFS) (Chen and Yao, 2016; Ghose et al., 2019), the kernel-smoothed frequency simulator (KSFS) (Honka and Chintagunta, 2017; Ursu, 2018; Elberg et al., 2019), and recently developed Geweke-Hajivassiliou-Keane-based (GHK) (Hajivassiliou et al., 1996) simulators (Jiang et al., 2021).

Those simulators differ in what serves as the approximated joint probability of search and purchase. First, CFS makes a large number of random draws, without any restriction, of stochastic components that drive the randomness in search and purchase and computes the proportion of these draws that satisfy the conditions stipulated by SSM. This proportion then serves as the approximated joint probability. KSFS is similar to CFS, but, instead of using the proportions directly as probabilities, it smooths them out via a kernel. The resulting kernel-smoothed value is treated as the approximated probability. Lastly, GHK-type simulators make random draws of one quantity, such as the reservation utility or the realized utility, without any restriction and then sequentially make random draws of other quantities that satisfy the search set conditions. This approach then approximates the joint probability of search and purchase with the assumed error distribution and numerical integration.
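
To make the contrast concrete, the sketch below (Python; purely illustrative, with a logistic kernel and scaling factor s as one common smoothing choice) approximates a simple binary probability \(\Pr (u_1 > u_2)\) by CFS and by KSFS. Because the random draws are held fixed across parameter values, as in simulated maximum likelihood, the CFS approximation is a step function of the parameters while the KSFS approximation is smooth, a point we return to below.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 1000
# Common random numbers: the same draws are reused at every parameter value,
# as in simulated maximum likelihood estimation.
eps1 = rng.standard_normal(n_draws)
eps2 = rng.standard_normal(n_draws)

def pr_cfs(mu1, mu2):
    """CFS: share of draws satisfying u1 > u2; a step function of (mu1, mu2)."""
    return np.mean(mu1 + eps1 > mu2 + eps2)

def pr_ksfs(mu1, mu2, s=5.0):
    """KSFS: replace the indicator with a logistic kernel with scaling factor s;
    smooth in (mu1, mu2) but biased for finite s."""
    return np.mean(1.0 / (1.0 + np.exp(-s * ((mu1 + eps1) - (mu2 + eps2)))))

# True value: Phi((0.5 - 0.0) / sqrt(2)) ~ 0.64
print(pr_cfs(0.5, 0.0), pr_ksfs(0.5, 0.0))
```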

The crucial difference is that the first two simulators do not force the random draws for the match values or the search costs to satisfy the search set conditions, while the GHK-type simulators enforce these conditions in every objective function evaluation. Therefore, CFS and KSFS are computationally much simpler to implement than GHK-type simulators are, since the latter require the researcher to make random draws from distributions truncated according to the search set conditions. In addition, for the same reason, CFS and KSFS are flexible in how the researcher introduces the randomness of the model, while GHK-type simulators require a careful selection of the source of the stochasticity, as it dictates which other model components need to be randomly drawn from truncated distributions.

The proposed method (henceforth, PM) is a variation of the GHK-type simulator, and its key distinction from previous GHK-type approaches is that the search cost serves as the source of randomness in the search order, instead of an additive error term that the consumer observes prior to search in addition to the match value (a “pre-search” error term). This specification allows a more precise identification of model parameters because two stochastic components, namely the match value and the search cost, are responsible for the randomness in the purchase decision and the search order, respectively, and do not interact with each other.

From a conceptual perspective, in comparison with KSFS, the simulated likelihood from the proposed method represents the “true” joint probability of search and purchase conditional on the model. In addition, from a computational perspective, by constructing a smooth, well-behaved likelihood, the proposed method allows the use of gradient-based optimization routines, thereby estimating the model parameters faster than other simulators do. On the other hand, CFS approximates the probability with the proportion among random draws, implying that the simulated likelihood is in fact a step function. Hence, CFS requires a much larger number of random draws, which offsets its computational simplicity, and it is challenging to use gradient-based optimization software in this case.

Although the above advantages are not specific to the proposed method but are shared with other GHK-type simulators, the proposed method is distinguished from other GHK-type simulators. By attributing 1) the randomness in the search order to the stochastic search costs and 2) the randomness in the purchase decision (conditional on search) to the match values, the proposed method identifies the model parameters with a single optimization. On the other hand, the approach used by Jiang et al. (2021), another GHK-type simulator, requires a grid search to avoid the “local optima problem” and identify the variance of the pre-search error term. A problem may arise from the grid search, since the comparison of in-sample fit and the comparison of out-of-sample prediction may point to different final choices for the variance of the pre-search error.

Our paper focuses on the case in which consumers search for their normally distributed match values, but our method can be generalized to other cases, including consumers searching for other product features (e.g., price) (Zhang et al., 2018; Elberg et al., 2019; De Los Santos et al., 2012) and/or the search objective following a different distribution, as long as the empirical model directly applies Weitzman (1979)’s framework to individual-level search and purchase data. Other likelihood-based approaches suggested in the literature include those by Kim et al. (2016) (using aggregate-level search and market share data) and Moraga-González et al. (2015) (using a combination of individual and aggregate data). Our approach is based on the commonly observed data structure wherein consumers’ search and purchase behavior are recorded at the individual level and so represents a direct translation of Weitzman (1979)’s results into an empirical likelihood.

2 Related literature

As our paper focuses on the empirical application of SSM, we classify the existing literature that employs the same model into three groups by the choice of the likelihood simulator. First, Chen and Yao (2016) and Ghose et al. (2019) estimate SSM by constructing the likelihood with CFS. Chen and Yao (2016) study how the refinement of product listings impacts consumer search behavior and market structure by applying SSM to individual-level online hotel search data. Ghose et al. (2019) use their estimates from SSM applied to online hotel search data to show that online search engines can enhance revenues by curating information on the product listings page.

Honka and Chintagunta (2017) identify whether consumers engage in simultaneous or sequential search in the auto insurance category by exploiting how prices in consideration sets follow different patterns across search methods. Then, they show that an incorrectly assumed search method over-predicts the market share of the largest company. Ursu (2018), by applying SSM to online hotel search data, demonstrates that the position within a product list impacts consumers’ search decisions but does not affect the purchase decision conditional on search. Elberg et al. (2019) apply SSM to the search path of consumers in a retail setting and find that consumers are more likely to purchase products on promotion after they are exposed to deep discounts than to shallow discounts. Yavorsky et al. (2021) combine mobile device location data and automobile registration data to show that dealership visits are beneficial to potential buyers of automobiles. These studies construct the simulated likelihoods for SSM with the KSFS proposed by McFadden (1989).

Jiang et al. (2021) estimate SSM to find that recommending sellers and their offerings to consumers is more effective than sending out discount coupons, and that both sellers and consumers will benefit if sellers are provided with information on consumer search behavior. Their likelihood simulator, like ours, forces the random draws to satisfy the search set conditions at every objective function evaluation by making random draws from distributions truncated according to the search set conditions and the current parameters. Although our approach shares its basic intuition with Jiang et al. (2021), the two approaches differ in how the stochasticity is introduced into the reservation utilities (i.e., what component of the reservation utility is random from the researcher’s perspective), whose order determines the order of search. Specifically, Jiang et al. (2021) add a pre-search shock to the utility function that is observed only by consumers prior to search, in addition to the match value, which is revealed only after the search. On the other hand, our method assumes that consumer-product-specific search costs are unobserved by researchers and therefore random from the researcher’s perspective. Henceforth, we refer to approaches that introduce a pre-search error term as Jiang et al. (2021)’s method (JM).Footnote 3

As an alternative approach, Kim et al. (2016) study the online search behavior with aggregate data from an online retail website. They utilize view rank data (the ranking of products in the number of views conditional on viewing a focal product), conditional share data (the choice shares of products in the category, conditional on viewing a focal product), and sales rank data (the ranking of products in the number of sales). Given parameters, they make predictions on view rank, conditional share, and sales of products and derive the likelihood from the measurement error assumed on those measures.

Approaches other than SSM have been applied to examine consumer search behavior. De Los Santos et al. (2017) model consumers who update their prior beliefs via Bayesian updating. Agarwal et al. (2011) model the click probability with a logit model to demonstrate that the top position in online advertising is not necessarily the profit-maximizing position. Chan and Park (2015) model the consumer’s decision in an online advertising environment as an ad-clicking stage and a search stage to show that consumers can be classified into multiple segments, each with distinct search behavior, and that the value of ad positions depends not only on the advertiser’s position but also on the composition of online consumers. Gardete and Antill (2019) model the consumer’s incremental decision to acquire more information on a used-car dealer website to measure the impact of information provision on conversion. Greminger (2022) builds a model that incorporates consumers’ decision to discover more products into the search problem and finds the optimal policy for the search and discovery problem. Greminger (2021) builds on the model to study the implications of various rankings for consumer welfare and the search intermediary’s revenue. While these papers represent important contributions to the search literature, given our focus on the structural application of Weitzman (1979)’s SSM, we exclude these studies from our subsequent analysis and discussion.

3 Utility and the sequential search model

3.1 Utility

Following the literature, we specify consumer i’s utility \(u_{i,j}\) for product j as follows.

$$\begin{aligned} u_{i,j}&= X_{i,j} \mathbf {\theta }_i + \epsilon _{i,j} = \mu _{i,j} + \epsilon _{i,j}, \;\; j = 1,\cdots , J \nonumber \\ u_{i,0}&= \mu _0 + \epsilon _{i,0}, \quad \epsilon _{i,j} \sim N(0,1), \quad \mathbf {\theta }_i \sim N(\bar{\theta }, \Sigma _\theta ) \end{aligned}$$
(1)

where \(X_{i,j}\) is a row vector of product j’s characteristics presented to consumer i, and \(\mathbf {\theta }_i\) is a vector of consumer i’s preferences. We assume that \(X_{i,j}\) is known to the consumer prior to search. In online search, \(X_{i,j}\) represents the information presented after the search query but before clicking on a particular alternative. Thus, \(\mu _{i,j}\) represents the deterministic expected utility of consumer i from alternative j that is known to the consumer prior to search. \(\epsilon _{i,j}\) denotes the consumer’s idiosyncratic preference or “match value,” which is assumed to be iid across consumers and products and to follow the standard normal distribution for identification purposes.

We assume that consumers know the distribution of \(\epsilon \) as above but do not know the exact value of \(\epsilon _{i,j}\) prior to search. Therefore, consumers search through products to resolve the uncertainty in \(\epsilon _{i,j}\) and thereby reveal the true utility \(u_{i,j}\) (also referred to as realized utility). Consumer preferences are allowed to be heterogeneous in the model and are assumed to follow a normal distribution with mean \(\bar{\theta }\) and diagonal covariance matrix \(\Sigma _\theta \). Lastly, the consumer has an outside option indexed by 0, whose utility \(u_{i,0}\) follows the normal distribution with mean \(\mu _0\) and variance equal to 1, the same as the variance of \(\epsilon _{i,j}\).

In the empirical analysis, depending on the structure of the data, \(u_{i,0}\) should be assumed either to be known prior to the first search or to be revealed by the first search. In the case of online search, if the data include consumers who submit search queries and leave the website without clicking on any option, then one should assume that \(u_{i,0}\) is known prior to the first search, because such consumers find that the marginal benefit of the first search is not greater than the cost of search and choose the outside option without searching (clicking) any products (see Chen and Yao (2016); Honka and Chintagunta (2017)).Footnote 4 On the other hand, if the data include only consumers who make at least one click, then one should assume that the first search reveals the realized utilities of both the outside option and the searched product (see Kim et al. (2016)).Footnote 5

In this paper, we assume that \(u_{i,0}\) is revealed after the first search, along with the realized utility of the searched product, as the data used in the field-data validation contain only consumers who make at least one click. After the first click, the outside option serves as another inside option. That is, when deciding to continue searching or not, consumers compare the highest realized utility of the searched options and the outside option to the highest reservation utility of non-searched products. Once the stopping criterion is met, consumers make purchase decisions by choosing a product with the highest realized utility among searched products including the outside option.

We assume that search costs vary at the consumer-product level and are therefore stochastic from the researcher’s perspective.Footnote 6 We discuss the assumption from two perspectives. First, from the conceptual perspective, since search costs represent the unwillingness of consumers to click on a product and to gather and process additional information on the product, there could be individual, product-related, psychological, or temporal factors that result in variations across consumers and products. For example, Kim et al. (2010) and Chen and Yao (2016) assume that there are observable reasons for differences across consumers, for instance demographic differences, whereas others assume these differences to be unobserved (see, e.g., Hortaçsu and Syverson (2004) and Jolivet and Turon (2018)). In terms of product-related factors, there could be aspects related to brand awareness and familiarity that influence both preferences and the willingness to search. For example, Mehta et al. (2003) assume that advertising influences search costs, and since advertising can vary by brand and consumers can be differentially exposed to such advertising, search costs can vary across products and consumers. From a psychological or temporal perspective, Ursu et al. (2022), for example, point out that there could be factors such as fatigue that affect consumers; to the extent that such fatigue influences consumers differentially and varies across products, it can also contribute to unobservable differences in search costs across consumers and products.

From a methodological perspective, one can think of our specification as being similar to previous work such as Jiang et al. (2021) in the sense that we have an unobservable factor that varies across consumers and products (see page 225, Eq. (2) of Jiang et al. (2021)). Further, similar to random coefficients models that have unobserved heterogeneity in preferences, one can think of search costs as being another such aspect of unobserved heterogeneity.

The assumption that search costs vary at the consumer-product level plays an important role in constructing the simulated likelihood (see Section 3.3), since it induces the randomness in the reservation utilities and thus in the search order and the stopping condition. Although the assumption seems reasonable, the data may restrict the researcher’s ability to identify such search costs, as in Honka (2014), who assumes that search costs are invariant across consumers and products. Further, one may assume that search costs are heterogeneous across consumers but homogeneous within a consumer, provided that the reservation utilities retain some other stochastic component. In addition, one may specify the search costs as a deterministic function of consumer-product-specific variables to control for search cost heterogeneity at the consumer-product level, again provided that the previous condition is satisfied.

Importantly, we assume that search costs \(c_{i,j}\) and match values \(\epsilon _{i,j}\) are independent. We further assume that search costs follow an iid exponential distribution with mean \(\bar{c}\), which is reparameterized as follows to ensure it takes a positive value.

$$c_{i,j} \sim \text {Exp}(\bar{c}), \;\; \text { where } \bar{c} = \exp (c_0)$$

We note that, in the literature, a log-normal distribution is a popular choice for the search cost distribution (see Kim et al. (2010); De Los Santos et al. (2017); Morozov et al. (2021); Jiang et al. (2021)), and we find that the proposed method works with a log-normal distribution as well (see Section 5.2.4). However, we find that the log-normal distribution is sometimes numerically unstable during estimation, because the likelihood approximation involves evaluating the cdf at small positive numbers and making random draws from a truncated distribution whose upper or lower bound is far from the mean. In the field-data validation, we use a shifted exponential distribution to accommodate the position effect.

3.2 Optimal sequential search algorithm

Weitzman (1979) shows that if the search objectives are uncorrelated across products, the optimal sequential search strategy can be summarized by the following three stages.

1. Selection: Consumers compute the reservation utilities of all products and begin searching in descending order of the reservation utility. Each search resolves the uncertainty about the match value, revealing the realized utility of that product.

2. Stopping: Consumers stop searching if the expected marginal benefit of search is less than the search cost.

3. Choice: After stopping, consumers purchase the option in the search set that has the highest realized utility. Note that the searched set includes the outside option.
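
These three stages translate directly into a simulation routine. The sketch below is a minimal illustration, taking the reservation utilities z and realized utilities u as given (their construction from search costs is derived next) and assuming, as in Section 3.1, that the outside option’s utility is revealed by the first search; the function name and the use of -1 for the outside option are our own conventions.

```python
import numpy as np

def weitzman_search(z, u, u0):
    """Simulate Weitzman's optimal policy for one consumer.

    z, u : arrays of reservation and realized utilities for the J products.
    u0   : realized utility of the outside option, revealed by the first search.
    Returns the ordered list of searched product indices and the chosen index,
    with -1 denoting the outside option.
    """
    order = np.argsort(-z)              # Selection: descending reservation utility
    searched, best = [], -np.inf
    for j in order:
        if searched and best >= z[j]:   # Stopping: best realized utility so far
            break                       # exceeds the next reservation utility
        searched.append(int(j))
        best = max(best, u[j], u0)      # the first search also reveals u0
    # Choice: highest realized utility among searched options and the outside option
    utilities = {j: u[j] for j in searched}
    utilities[-1] = u0
    return searched, max(utilities, key=utilities.get)

searched, choice = weitzman_search(np.array([1.2, 0.4, 0.9]),
                                   np.array([0.3, 1.5, -0.2]), u0=0.1)
print(searched, choice)   # [0, 2, 1] 1
```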

Product j’s reservation utility \(z_j\) is defined to be the level of utility that makes a consumer indifferent between searching and not searching product j. That is, if a consumer has already found an option k with a realized utility higher than the reservation utility of j, then j will not be searched. On the other hand, a consumer will search j if and only if its reservation utility is higher than the current maximum of already-searched options’ realized utilities.

Following Kim et al. (2010), we can derive the relationship between product j’s search cost \(c_j\) and reservation utility \(z_j\) by equating \(c_j\) with \(B_j\), the expected marginal benefit of searching product j when the current maximum realized utility equals \(z_j\), as follows.

$$\begin{aligned} c_j&= B_j(z_j) = \int _{z_j}^\infty (u_j - z_j)f_j(u_j)du_j \nonumber \\&= (1- F_j (z_j)) E[u_j - z_j |u_j>z_j] \nonumber \\&= \left( 1 - \Phi \left( \frac{z_j - \mu _j}{\sigma }\right) \right) \left[ \mu _j - z_j + \sigma \frac{\phi (\frac{z_j - \mu _j}{\sigma })}{1 - \Phi (\frac{z_j - \mu _j}{\sigma })}\right] \end{aligned}$$
(2)

where \(f_j(\cdot )\) and \(F_j(\cdot )\) denote the probability density function (pdf) and the cumulative distribution function (cdf) of \(u_j\), respectively, \(\Phi (\cdot )\) and \(\phi (\cdot )\) denote cdf and pdf of the standard normal distribution respectively, and \(u_j \sim N(\mu _j, \sigma ^2)\). In principle, the first two lines of Eq. (2) hold for any arbitrary distribution of \(u_j\). The normality assumption allows us to further simplify the (implicit) equation. If we define \(\gamma _j \equiv \frac{c_j}{\sigma }\) and \(\eta _j \equiv \frac{z_j - \mu _j}{\sigma }\), then Eq. (2) can be rewritten as

$$\begin{aligned} \gamma _j&= (1 - \Phi (\eta _j))\left[ \frac{\phi (\eta _j)}{1 - \Phi (\eta _j)} - \eta _j\right] \nonumber \\&= \phi (\eta _j) - \eta _j(1-\Phi (\eta _j))\nonumber \\&= g(\eta _j) \end{aligned}$$
(3)

\(g(\cdot )\) is a mapping from the standardized reservation utility \(\eta _j\) to the scaled search cost \(\gamma _j\). The derivative of \(\gamma _j\) with respect to \(\eta _j\), shown below, confirms that g is a monotonically decreasing one-to-one mapping, so its inverse \(g^{-1}\) exists.

$$\begin{aligned} \frac{d\gamma _j}{d\eta _j} =&-(1 - \Phi (\eta _j)) \in (-1, 0) \end{aligned}$$

Therefore, given \(\mu _j\), one can use 1) \(g(\cdot )\) to find \(c_j\) implied by \(z_j\) and 2) \(g^{-1}(\cdot )\) to find \(z_j\) corresponding to \(c_j\) as follows.Footnote 7

$$\begin{aligned} c_j = g\left( \frac{z_j - \mu _j}{\sigma }\right) \times \sigma , \quad \quad z_j = \mu _j + \sigma \times g^{-1}\left( \frac{c_j}{\sigma }\right) \end{aligned}$$

The function g plays a crucial role in the construction of the likelihood. The difficulty of estimating a sequential search model stems from the fact that the reservation utility has no closed-form distribution under the usual probit or logit settings. However, the aforementioned features of function g allow us to use the search cost distribution to compute the distribution of reservation utilities. To illustrate, we can rewrite the probability \(\Pr (z<a)\) in terms of g and the distribution of the search cost as below.

$$\begin{aligned} \Pr (z< a)&= \Pr (\mu + \sigma \eta< a)\nonumber \\&= \Pr \left( \eta < \frac{a - \mu }{\sigma }\right) \nonumber \\&= \Pr \left( \gamma> g\left( \frac{a - \mu }{\sigma }\right) \right) \nonumber \\&= \Pr \left( c > \sigma g\left( \frac{a - \mu }{\sigma }\right) \right) \end{aligned}$$
(4)

Note that the inequality changes its direction in the third line as g is a decreasing function.
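
Since g has no closed-form inverse, \(g^{-1}\) must be computed numerically, and any root-finder works because g is strictly decreasing. Below is a minimal sketch in Python with SciPy; the bracketing interval is a practical choice rather than part of the model. In estimation, one might tabulate g once on a grid and interpolate to avoid repeated root-finding.

```python
import numpy as np
from scipy.stats import norm, expon
from scipy.optimize import brentq

def g(eta):
    """Scaled search cost implied by a standardized reservation utility, Eq. (3)."""
    return norm.pdf(eta) - eta * (1.0 - norm.cdf(eta))

def g_inv(gamma, lo=-15.0, hi=15.0):
    """Numerical inverse of g; a root-finder suffices since g is strictly decreasing."""
    return brentq(lambda eta: g(eta) - gamma, lo, hi)

def pr_z_below(a, mu, c_bar, sigma=1.0):
    """Pr(z < a) = Pr(c > sigma * g((a - mu) / sigma)) with c ~ Exp(mean c_bar), Eq. (4)."""
    return expon.sf(sigma * g((a - mu) / sigma), scale=c_bar)

print(g_inv(0.1))                          # ~0.90, since g(0.90) ~ 0.10
print(pr_z_below(1.5, mu=0.0, c_bar=0.5))  # ~0.94
```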

3.3 Stochastic component of reservation utility

The assumption that search costs are stochastic and vary at the consumer-product level plays an important role in our likelihood construction. First, we note that the reservation utilities must have a stochastic component for researchers to derive the probability of search order. For example, consider the following utility function and search cost specification, in which search costs are consumer-specific and homogeneous across products within a consumer.

$$\begin{aligned} u_{i,j} = X_{i,j} \mathbf {\theta }_i + \epsilon _{i,j}=\mu _{i,j} + \epsilon _{i,j}, \quad c_{i,j} = c_{i} \;\;\forall \; j, \quad \epsilon _{i,j} \sim N(0,1) \end{aligned}$$

The specification above leads to the following reservation utility.

$$\begin{aligned} z_{i,j}=\mu _{i,j} + g^{-1}\left( c_i\right) \end{aligned}$$

Note that the difference between the deterministic expected utility \(\mu _{i,j}\) and the reservation utility \(z_{i,j}\) is equal to \(g^{-1}\left( c_i\right) \) and is identical across products. Therefore, the order of \(z_{i,j}\)’s is identical to the order of \(\mu _{i,j}\)’s, implying that, given the parameter estimates and \(X_{i,j}\), the order of reservation utilities and the order of search are deterministic and have no randomness even from the researcher’s perspective. In this case, consumers will search in the same order whenever presented with the same set of products. Hence, the reservation utility must have a component that is unobserved by researchers and varies at the consumer-product-level. In our approach, search costs serve as such a stochastic component that induces randomness in the search order.
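
This point can be verified in two lines: adding the common constant \(g^{-1}(c_i)\) to every \(\mu _{i,j}\) leaves the ranking unchanged (the values below are arbitrary).

```python
import numpy as np

mu = np.array([0.2, 1.1, -0.4])   # deterministic expected utilities
shift = 0.8                       # common g^{-1}(c_i), identical across products
# The search order (descending reservation utility) equals the order of mu:
assert (np.argsort(-(mu + shift)) == np.argsort(-mu)).all()
```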

Another approach found in the literature to introduce a stochastic component into the search order is to add a pre-search shock to the utility function (e.g., Yavorsky et al. (2021), Jiang et al. (2021), Morozov (2023)) and let search costs be homogeneous across products within a consumer \((c_{i,j} = c_i\; \forall \; j)\). The pre-search shock is known to consumers prior to search but not observed by researchers. Then, the utility is given by

$$\begin{aligned} u_{i,j}&= X_{i,j} \mathbf {\theta }_i + e_{i,j} + \epsilon _{i,j} =\mu _{i,j} + e_{i,j} + \epsilon _{i,j}, \quad \epsilon _{i,j}\sim N(0,1), \quad e_{i,j} \sim N(0, \sigma _e^2) \end{aligned}$$

where \(e_{i,j}\) is the pre-search preference shock. Then, the reservation utility will be given as follows.

$$z_{i,j}=\mu _{i,j}+ e_{i,j} + g^{-1}\left( c_i\right) $$

Since \(e_{i,j}\) is unobserved by researchers and varies at the consumer-product-level, the reservation utility order is no longer entirely determined by the expected utility \(\mu _{i,j}\), and therefore the search order now is random from the researcher’s perspective.

A shortcoming of this approach, however, is that it is difficult to jointly identify the search cost parameters and \(\sigma _e\). Since the SD of \(\epsilon _{i,j}\) is usually fixed at 1, the scale of reservation utilities is jointly determined by \(c_i\) and \(\sigma _e\), while \(\sigma _e\) is also passed to the realized utilities and affects their scale. Thus, if \(\sigma _e\) is to be estimated (instead of being fixed), then the scales of both reservation utilities and realized utilities cannot be determined, and researchers are forced to take an additional step for identification. For instance, Jiang et al. (2021) undertake a grid search “to avoid the local optima problem” by estimating the model with \(\sigma _e\) fixed at some candidate values and picking the \(\sigma _e\) that yields the highest maximized likelihood.

On the other hand, PM introduces randomness into the reservation utility by exploiting the assumption that search costs vary across products and consumers, so the scale of reservation utilities is determined by the standard deviation of the search cost. Since only the deterministic part of the reservation utility \(\mu _{i,j}\) is passed to realized utilities, the scale of realized utilities is unaffected by the search cost but determined by the SD of \(\epsilon _{i,j}\), fixed at 1, thereby allowing researchers to identify and estimate all model parameters with a single likelihood maximization.

These different choices of the stochastic component have an implication on the correlation between the purchase decision and the search order. Let us assume that there are only two products, j and k, with the same expected utilities \((\mu _{i,j} = \mu _{i,k})\) and that a consumer is observed to search j and then k. In this case, PM leads to the following inequalities.

$$\begin{aligned} \mu _{i,j} + g^{-1}(c_{i,j}) > \mu _{i,k} + g^{-1}(c_{i,k})&\Rightarrow c_{i,j}< c_{i,k}\\ \mu _{i,j} + \epsilon _{i,j}< \mu _{i,k} + g^{-1}(c_{i,k})&\Rightarrow \epsilon _{i,j} < g^{-1}(c_{i,k}) \end{aligned}$$

On the other hand, JM produces the following inequalities.

$$\begin{aligned} \mu _{i,j} + e_{i,j} + g^{-1}(c_{i})> \mu _{i,k} +e_{i,k} + g^{-1}(c_{i})&\Rightarrow e_{i,j} > e_{i,k}\\ \mu _{i,j} + e_{i,j}+ \epsilon _{i,j}< \mu _{i,k} + e_{i,k}+ g^{-1}(c_{i})&\Rightarrow \epsilon _{i,j} < (e_{i,k}-e_{i,j}) + g^{-1}(c_{i}) \end{aligned}$$

Then, the following inequalities must be satisfied in order for the first-searched product j to be chosen by the consumer under PM and JM, respectively.

$$\begin{aligned} \text {PM: }&\mu _{i,j} + \epsilon _{i,j}> \mu _{i,k} + \epsilon _{i,k} \Rightarrow \epsilon _{i,k}< \epsilon _{i,j}\\ \text {JM: }&\mu _{i,j} + e_{i,j}+\epsilon _{i,j} > \mu _{i,k} + e_{i,k}+\epsilon _{i,k} \Rightarrow \epsilon _{i,k} < \epsilon _{i,j}+e_{i,j} - e_{i,k} \end{aligned}$$

Given the same \(\epsilon _{i,j}\) that satisfies inequalities above, the first-searched product j is more likely to be chosen by the consumer under JM than under PM because \((e_{i,j} - e_{i,k}) > 0\).
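
This pointwise comparison can also be checked in aggregate by simulation. The sketch below conditions, by rejection, on the event that j is searched first and k is searched second, with \(\mu _{i,j} = \mu _{i,k} = 0\) and illustrative values for \(\bar{c}\) and \(\sigma _e\), and then compares how often j is chosen under each specification; the \(g^{-1}\) implementation mirrors the one in Section 3.2.

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

rng = np.random.default_rng(1)
n, c_bar, sigma_e = 10_000, 0.1, 0.5   # illustrative values; n kept modest
                                       # because g_inv is applied element-wise

def g(eta):
    return norm.pdf(eta) - eta * (1.0 - norm.cdf(eta))

g_inv = np.vectorize(lambda gam: brentq(lambda e: g(e) - gam, -20.0, 20.0))

eps_j, eps_k = rng.standard_normal(n), rng.standard_normal(n)

# PM: product-specific search costs drive the search order (mu_j = mu_k = 0)
eta_j = g_inv(rng.exponential(c_bar, n))
eta_k = g_inv(rng.exponential(c_bar, n))
keep = (eta_j > eta_k) & (eps_j < eta_k)        # j searched first, then k searched
pm_share = np.mean(eps_j[keep] > eps_k[keep])   # share with j chosen

# JM: pre-search shocks drive the search order; the search cost is consumer-specific
e_j, e_k = sigma_e * rng.standard_normal(n), sigma_e * rng.standard_normal(n)
eta_c = g_inv(rng.exponential(c_bar, n))
keep = (e_j > e_k) & (eps_j < (e_k - e_j) + eta_c)
jm_share = np.mean((e_j + eps_j)[keep] > (e_k + eps_k)[keep])

print(pm_share, jm_share)   # jm_share typically exceeds pm_share
```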

4 Likelihood and estimation

4.1 Joint probability of search and purchase

In this section, we translate the decision of search and purchase under SSM into a set of inequalities. The joint probability depends on the order of search, the length of the search sequence, the purchased product, and the purchased product’s position within the search sequence. We let \(S_i\) denote the ordered search set of consumer i, \(D_i\) the purchased product, \(r_{i,k}\) the index of the product that has the k-th highest reservation utility of all products on the market (e.g., \(r_{i,1}\) is the index of the product with the highest reservation utility), \(K_i\) the total number of products searched by consumer i, and J the total number of products available on the market. For a comprehensive treatment, we need to consider three cases—(1) \(K_i = 1\), (2) \(1< K_i< J\), and (3) \(K_i = J\).

4.1.1 Case 1—single product searched (\(K_i = 1\))

If \(J=3\) and a consumer terminates his search process after searching, say, Product 2 \((S_i=\{2\})\) and does not purchase any product \((D_i = 0)\), then the joint probability of his search and the purchase decision will be given as follows.

$$\begin{aligned}&\Pr (S_i = \{2\}, D_i = 0) \nonumber \\ =&\Pr (\underbrace{z_{i,2}> \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {Selection}} \; \bigcap \; \underbrace{u_{i,0}> \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {Stopping}} \;\bigcap \; \underbrace{u_{i,0}> u_{i,2}}_{\text {Choice}}) \nonumber \\ =&\Pr (\min (z_{i,2}, u_{i,0})> \max _{l\notin \{S_i\cup 0\}} z_{i,l} \;\bigcap \; u_{i,0}> u_{i,2})\nonumber \\ =&\Pr (u_{i,0}> z_{i,2} \;\bigcap \; z_{i,2}> \max _{l\notin \{S_i\cup 0\}} z_{i,l} \;\bigcap \; u_{i,0}> u_{i,2})\nonumber \\&+ \Pr (z_{i,2}> u_{i,0} \;\bigcap \; u_{i,0}> \max _{l\notin \{S_i\cup 0\}} z_{i,l} \;\bigcap \; u_{i,0} > u_{i,2}) \end{aligned}$$
(5)

Searching Product 2 first means that it has the highest reservation utility. The choice of the outside option implies \(u_{i,0} > u_{i,2}\), and terminating after the first search suggests that \(u_{i,0}\) is higher than the maximum of non-searched products’ reservation utilities. We need to break the probability down into two cases, as done after the last equality—1) \(\min (u_{i,0}, z_{i,2})= z_{i,2}\Rightarrow u_{i,0} > z_{i,2}\) and 2) \(\min (u_{i,0}, z_{i,2})=u_{i,0} \Rightarrow z_{i,2} > u_{i,0}\)—since there is no closed-form distribution of \(\min (z_{i,j}, u_{i,j'})\) in this setting.

4.1.2 Case 2—multiple, but not all products searched (\(1<K_i<J\))

If \(J=3\) and a consumer searches Products 2 and then 3 and purchases Product 3, the joint probability of the search set and the purchase decision will be given as follows.

$$\begin{aligned}&\Pr (S_i = \{2, 3\}, D_i = 3) \nonumber \\ =&\Pr (\underbrace{z_{i,2}> z_{i,3}> \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {Selection}} \;\bigcap \;\underbrace{u_{i,3}> \max (u_{i,2}, u_{i,0})}_{\text {Choice}} \;\bigcap \;\underbrace{u_{i,3}> \max _{l\notin \{S_i\cup 0\}} z_{i,l} \;\bigcap \; z_{i,3}> \max (u_{i,2}, u_{i,0})}_{\text {Stopping}})\nonumber \\ =&\Pr (z_{i,2}> z_{i,3} \;\bigcap \; \min (z_{i,3}, u_{i,3})> \max (u_{i,0},u_{i,2}) \;\bigcap \; \min (z_{i,3}, u_{i,3})>\max _{l\notin \{S_i\cup 0\}} z_{i,l}) \nonumber \\ =&\Pr (z_{i,3}> u_{i,3} \;\bigcap \; z_{i,2}> z_{i,3} \;\bigcap \; u_{i,3}> u_{i,0} \;\bigcap \; u_{i,3}> u_{i,2} \;\bigcap \; u_{i,3}> \max _{l\notin \{S_i\cup 0\}} z_{i,l}) \nonumber \\&+\Pr (u_{i,3}> z_{i,3} \;\bigcap \; z_{i,2}> z_{i,3}\;\bigcap \; z_{i,3}> u_{i,0} \;\bigcap \; z_{i,3}> u_{i,2} \;\bigcap \; z_{i,3} > \max _{l\notin \{S_i\cup 0\}} z_{i,l}) \end{aligned}$$
(6)

If the purchased product coincides with the last-searched product \((D_i = r_{i,K_i})\), one will need to break down the probability into two cases, just as in the \(K_i = 1\) case. If \(1< K_i < J\) and \(D_i \ne r_{i,K_i}\), the second probability after the last equality sign of Eq. (6) is unnecessary, since \(z_{i,r_{K_i}} > u_{i,D_i}\) is implied by the search set conditions.

4.1.3 Case 3—all products searched (\(K_i = J\))

If \(J=3\) and a consumer searches Products 2, 3, and then 1 and purchases Product 1, then the probability of his search and purchase decision will be given as follows.

$$\begin{aligned}&\Pr (S_i = \{2,3,1\}, D_i = 1) \nonumber \\ =&\Pr (\underbrace{z_{i,2}> z_{i,3}> z_{i,1}}_{\text {Selection}} \;\bigcap \;\underbrace{u_{i,1}> \max (u_{i,2}, u_{i,3}, u_{i,0})}_{\text {Choice}} \;\bigcap \;\underbrace{z_{i,1}> \max (u_{i,2}, u_{i,3}, u_{i,0})}_{\text {Stopping}})\nonumber \\ =&\Pr (z_{i,2}> z_{i,3}> z_{i,1} \;\bigcap \; \min (z_{i,1}, u_{i,1})> \max (u_{i,2}, u_{i,3}, u_{i,0})) \nonumber \\ =&\Pr (z_{i,1}> u_{i,1} \;\bigcap \; z_{i,2}> z_{i,3}> z_{i,1} \;\bigcap \; u_{i,1}> u_{i,2} \;\bigcap \; u_{i,1}> u_{i,3} \;\bigcap \; u_{i,1}> u_{i,0}) \nonumber \\&+ \Pr (u_{i,1}> z_{i,1} \;\bigcap \; z_{i,2}> z_{i,3}> z_{i,1} \;\bigcap \; z_{i,1}> u_{i,2} \;\bigcap \; z_{i,1}> u_{i,3} \;\bigcap \; z_{i,1} > u_{i,0}) \end{aligned}$$
(7)

As in the \(1<K_i<J\) case, if \(D_i \ne r_{i,K_i}\), then only the first probability after the last equality will be needed to approximate the joint probability of search and purchase.

4.2 Simulating likelihood

We outline the construction of our sample likelihood. The likelihood of search sequences and purchase decisions can be expressed as follows.

$$\begin{aligned} \mathcal {L}&= \prod _i \Pr (S_i = s, D_i = j^* |\Theta , X_i) \nonumber \\&= \prod _i E[\mathbb {I}(S_i = s, D_i = j^*|\Theta , X_i)] \nonumber \\&= \prod _i \int _\xi \mathbb {I}(S_i = s, D_i = j^* |\textbf{u}_i(\xi _i|X_i), \textbf{z}_i(\xi _i|X_i) ) dF(\xi |\Theta ) \end{aligned}$$
(8)

where \(\Theta = \{\bar{\theta }, \Sigma _\theta , \mu _0, c_0\}\), \(\xi _i = \{\mathbf {\theta }_i, \textbf{c}_i, \mathbf {\epsilon }_i\}\) (\(\textbf{c}_i = [c_{i,1},\cdots ,c_{i,J}]'\), \(\mathbf {\epsilon }_i = [\epsilon _{i,0},\epsilon _{i,1}\cdots ,\epsilon _{i,J}]'\)), and \(X_i\) is the matrix of characteristics of all products available to consumer i prior to search. \(\textbf{u}_i(\xi _i |X_i)\) and \(\textbf{z}_i(\xi _i |X_i)\) are vectors of consumer i’s realized utilities and reservation utilities, respectively, given \(\xi _i\) and \(X_i\). We use Monte Carlo integration to approximate Eq. (8).

$$\begin{aligned} \mathcal {L}&\approx \prod _{i} \frac{1}{n_d}\sum _{d=1}^{n_d} \Pr (S_i = s, D_i = j^* | \textbf{u}_i^{(d)}, \textbf{z}_i^{(d)}, \Theta , X_i) \end{aligned}$$
(9)

where \(\textbf{u}_i^{(d)}\) and \(\textbf{z}_i^{(d)}\) denote vectors of utilities and reservation utilities, respectively, computed at the d-th draw of \(\xi _i\) given \(\Theta \), and \(n_d\) is the number of random draws per observation. More specifically, given \(\xi _i^{(d)} = \{\mathbf {\theta }_i^{(d)}, \textbf{c}_i^{(d)}, \mathbf {\epsilon }_i^{(d)}\}\), we use \(\textbf{u}_i^{(d)} = X_{i} \mathbf {\theta }_i^{(d)}+\mathbf {\epsilon }_i^{(d)}\) and \(\textbf{z}_i^{(d)} = X_{i} \mathbf {\theta }_i^{(d)} + g^{-1}(\textbf{c}_i^{(d)})\).

We now discuss how to simulate the likelihood for SSM. PM shares its basic principle with the Geweke-Hajivassiliou-Keane (GHK) simulator for the probit model—recursively making random draws while enforcing inequality constraints. The basic idea behind PM is that, beginning with the random draw of the realized utility of the purchased product, we can exploit the stochastic search cost assumption and the function g to calculate the probabilities of the search set conditions and to simulate other quantities that satisfy them. The construction of the simulated likelihood can be summarized as follows.

1. Given the current parameter values, make a random draw of the realized utility of the purchased product, denoted by \(u_i^*\).

2. Given \(u_i^*\), compute the probability that \(u_i^* \ge u_{i,j'} \;\; \forall j' \in \{S_i \cup 0\}\).

3. Using Eq. (4), compute the probability of the inequality between \(u_i^*\) and \(z_{i,r_{K_i}}\) and randomly draw \(z_{i,r_{K_i}}\) from the distribution of c truncated in accordance with the inequality.

4. Compute the probability that non-searched products’ reservation utilities are lower than \(z_{i,r_{K_i}}\) or \(u_i^*\), depending on the search sequence and purchase decision.

5. For \(l = K_i, K_i-1,\cdots , 2\), compute the probability of the inequality \(z_{i,r_l} < z_{i,r_{l-1}}\), and, given a random draw of \(z_{i,r_l}\), make a random draw of \(z_{i,r_{l-1}}\) that satisfies the inequality.

6. Take the product of the probabilities from the steps above.

7. Repeat the steps above \(n_d\) times for each observation and take the average across random draws. The average serves as the approximated probability of the observed search and purchase decision.

Once the joint probability of search and purchase decision of every consumer in the data is approximated as above, one can use the standard simulated maximum likelihood estimation (SMLE) procedure to obtain the model parameter estimates \(\hat{\Theta }\).

$$\hat{\Theta } = {\underset{\Theta }{\arg \max }} \prod _{i} \frac{1}{n_d}\sum _{d=1}^{n_d} \text {Pr}\left( S_i = s, D_i = j^*|\textbf{u}_i^{(d)}, \textbf{z}_i^{(d)}, \Theta , X_i\right) $$
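
Schematically, the outer SMLE loop is the same regardless of which simulator supplies the per-consumer probability. In the sketch below, simulate_prob is a smooth toy placeholder rather than the PM routine of Section 4.3; the essential ingredients are that the random draws are held fixed across objective evaluations (common random numbers) and that simulated probabilities are floored before taking logs.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
data = rng.standard_normal(100)           # toy observations
draws = rng.standard_normal((100, 50))    # held fixed across evaluations (CRN)

def simulate_prob(theta, obs, u_draws):
    """Toy placeholder for the per-consumer simulated probability (smooth in theta)."""
    return np.mean(1.0 / (1.0 + np.exp(-(obs - theta[0] + u_draws))))

def neg_log_lik(theta):
    probs = [simulate_prob(theta, obs, d) for obs, d in zip(data, draws)]
    return -np.sum(np.log(np.maximum(probs, 1e-300)))   # floor before log

theta_hat = minimize(neg_log_lik, x0=np.zeros(1), method="BFGS").x
print(theta_hat)
```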

4.2.1 Notation

Before we present a step-by-step procedure for likelihood simulation, we clarify the notation. As we begin the construction of the simulated likelihood by making a random draw of the realized utility of the purchased product, we write probability statements conditional on that realized utility. For example, the first term of Eq. (5), where Product 2 is searched \((S_i = \{2\})\) and the outside option is chosen \((D_i=0)\), is denoted as follows for better readability.

$$\begin{aligned}&\Pr \left( u_{i,0}> u_{i,2} \;\bigcap \; u_{i,0}> z_{i,2} \;\bigcap \; z_{i,2}> \max _{l\notin \{S_i\cup 0\}} z_{i,l}\right) \\ =&\Pr \bigg (\underbrace{u_{i,0}> u_{i,2}}_{\text {A}} \;\bigcap \; \underbrace{u_{i,0}> z_{i,2}}_{\text {B}} \;\bigcap \; \underbrace{z_{i,2} > \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {C}}\bigg )\\ =&\Pr (A \cap B \cap C) \end{aligned}$$

Technically, the probability above is calculated by evaluating the following integral.

$$\begin{aligned}&\Pr \left( u_{i,0}> u_{i,2} \;\bigcap \; u_{i,0}> z_{i,2} \;\bigcap \; z_{i,2} > \max _{l\notin \{S_i\cup 0\}} z_{i,l}\right) \nonumber \\ =&\int _{-\infty }^\infty \int _{-\infty }^{u_{i,0}}\int _{-\infty }^{u_{i,0}}\int _{-\infty }^{z_{i,2}} f_0(u_{i,0}) f_2(u_{i,2}) h_2(z_{i,2}) h_m(m_i) dm_i dz_{i,2} du_{i,2} du_{i,0} \nonumber \\ =&\int _{-\infty }^\infty f_0(u_{i,0}) \left[ \int _{-\infty }^{u_{i,0}} f_2 (u_{i,2})du_{i,2}\right] \quad \left[ \int _{-\infty }^{u_{i,0}}\int _{-\infty }^{z_{i,2}} h_2(z_{i,2}) h_m(m_i)dm_i dz_{i,2}\right] du_{i,0} \nonumber \\ =&\int _{-\infty }^\infty f_0(u_{i,0}) F_2(u_{i,0})\int _{-\infty }^{u_{i,0}} h_2(z_{i,2}) \int _{-\infty }^{z_{i,2}} h_m(m_i)dm_i dz_{i,2} du_{i,0} \nonumber \\ =&\int _{-\infty }^\infty f_0(u_{i,0}) F_2 (u_{i,0}) \int _{-\infty }^{u_{i,0}} h_2(z_{i,2}) H_m(z_{i,2})dz_{i,2} du_{i,0} \end{aligned}$$
(10)

where \(m_i \equiv \max _{l\notin \{S_i\cup 0\}}z_{i,l}\), and \(f_l\), \(h_l\), and \(h_m\) denote pdf of \(u_{i,l}\), \(z_{i,l}\), and \(m_i\), respectively, and \(F_l\) and \(H_m\) denote cdf of \(u_{i,l}\) and \(m_i\), respectively.

Next, assume that we know the value of the realized utility of the purchased product, \(u_{i,0}^{(d)}\), and the value of the reservation utility of Product 2, \(z_{i,2}^{(d)}\), such that inequality B is satisfied. Then, the integrand of the outer integral in Eq. (10) can be expressed by

$$\begin{aligned}&f_0(u_{i,0}^{(d)}) F_2 (u_{i,0}^{(d)}) \int _{-\infty }^{u_{i,0}^{(d)}} h_2(z_{i,2}) H_m(z_{i,2}^{(d)})dz_{i,2} \\ =&f_0(u_{i,0}^{(d)}) F_2 (u_{i,0}^{(d)}) H_2(u_{i,0}^{(d)}) H_m(z_{i,2}^{(d)}) \end{aligned}$$

Then, the probability in Eq. (10) can be approximated by Monte Carlo integration.

$$\begin{aligned}&\Pr \bigg (\underbrace{u_{i,0}> u_{i,2}}_{\text {A}} \;\bigcap \; \underbrace{u_{i,0}> z_{i,2}}_{\text {B}} \;\bigcap \; \underbrace{z_{i,2}> \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {C}}\bigg ) \nonumber \\ \approx&\frac{1}{n_d} \sum _{d=1}^{n_d} F_2 (u_{i,0}^{(d)}) H_2(u_{i,0}^{(d)}) H_m(z_{i,2}^{(d)}|u_{i,0}^{(d)} > z_{i,2}^{(d)}) \end{aligned}$$
(11)

One can see that, given a random draw of \(u_{i,0}^{(d)}\), inequalities A and B are independent, as \(\epsilon _{i,j}\) and \(c_{i,j}\) are assumed to be independent, while inequalities B and C are correlated, since we make a random draw of \(z_{i,2}^{(d)}\) that satisfies inequality B and the same random draw is plugged into \(H_m\) to calculate the probability of inequality C. Following this structure, we reflect such dependency as follows.

$$\begin{aligned}&\Pr \bigg (\underbrace{u_{i,0}> u_{i,2}}_{\text {A}} \;\bigcap \; \underbrace{u_{i,0}> z_{i,2}}_{\text {B}} \;\bigcap \; \underbrace{z_{i,2} > \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {C}}\bigg )\\ =&\Pr (A) \Pr (B) \Pr (C|B) \end{aligned}$$

In the following example, the inequalities represented by letters are conditional on the random draw of the realized utility of the purchased product.

4.3 Example—case 1 \((K_i = 1)\)

As an illustration, we present how to simulate the probability for Case 1 presented in Eq. (5). The joint probability of the search sequence and the purchase decision can be expressed as the product of conditional probabilities as follows.

$$\begin{aligned} \Pr (S_i = \{2\}, D_i = 0) =&\Pr (\underbrace{u_{i,0}> u_{i,2}}_{\text {A}} \;\bigcap \; \underbrace{u_{i,0}> z_{i,2}}_{\text {B}} \;\bigcap \; \underbrace{z_{i,2}> \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {C}})\nonumber \\&+ \Pr (\underbrace{u_{i,0}> u_{i,2}}_{\text {A}} \;\bigcap \; \underbrace{z_{i,2}> u_{i,0}}_{\text {D}} \;\bigcap \; \underbrace{u_{i,0} > \max _{l\notin \{S_i\cup 0\}} z_{i,l}}_{\text {E}})\nonumber \\ =&\Pr (A)\Pr (B)\Pr (C|B) + \Pr (A)\Pr (D)\Pr (E) \end{aligned}$$
(12)

The first term of the right hand side of Eq. (12) is approximated by the following steps.

1. Make a random draw of \(\epsilon _{i,0}^{(d)}\) and \(\mathbf {\theta }_i^{(d)}\) to form \(\mu _{i,j}^{(d)} \; \forall \; j\) and \(u_{i,0}^{(d)}\).

2. Compute the probability of inequality A, given \(u_{i,0}^{(d)}\).

    $$\Pr \left( A^{(d)}\right) = \Pr \left( \epsilon _{i,2} < u_{i,0}^{(d)} - \mu _{i,2}^{(d)}\right) $$

3. Given \(u_{i,0}^{(d)}\) and \(\mu _{i,2}^{(d)}\), compute \(\bar{\eta }_{i,2}^{(d)}\), the upper bound of \(\eta _{i,2}\) implied by inequality B, and then the corresponding \(\underline{\gamma }_{i,2}^{(d)}\), the implied lower bound of \(\gamma _{i,2}\), using function g.

    $$\bar{\eta }_{i,2}^{(d)} = u_{i,0}^{(d)} - \mu _{i,2}^{(d)}, \qquad \underline{\gamma }_{i,2}^{(d)} = g\left( \bar{\eta }_{i,2}^{(d)}\right) $$

    Note that the upper bound of \(\eta \) corresponds to the lower bound of \(\gamma \), as function g is a decreasing function.

4. Compute the probability of inequality B using the distributional assumption on \(c_{i,j}\).

    $$\Pr \left( B^{(d)}\right) = \Pr \left( c_{i,2} > \underline{\gamma }_{i,2}^{(d)}\right) $$

5. Given \(\underline{\gamma }_{i,2}^{(d)}\), make a random draw of \(c_{i,2}^{(d)}\) from its distribution left-truncated at \(\underline{\gamma }_{i,2}^{(d)}\) (a sampling sketch follows this list) and calculate the corresponding \(\eta _{i,2}^{(d)}\) by using \(g^{-1}\) to form the reservation utility of Product 2, \(z_{i,2}^{(d)}\), which satisfies inequality B by construction.

    $$c_{i,2}^{(d)} \sim \text {Exp}(\bar{c}|c_{i,2}^{(d)} > \underline{\gamma }_{i,2}^{(d)}), \qquad \eta _{i,2}^{(d)} = g^{-1}\left( c_{i,2}^{(d)}\right) ,\qquad z_{i,2}^{(d)} = \mu _{i,2}^{(d)} + \eta _{i,2}^{(d)}$$

6. Given \(z_{i,2}^{(d)}\), \(\forall \; l \notin \{S_i \cup 0\}\), calculate \(\bar{\eta }_{i,l}^{(d)}\), the upper bound of \(\eta _{i,l}\) implied by inequality C, and then the corresponding \(\underline{\gamma }_{i,l}^{(d)}\), the implied lower bound for \(c_{i,l}\), and calculate the probability of inequality C.

    $$\begin{aligned} \bar{\eta }_{i,l}^{(d)} = z_{i,2}^{(d)} - \mu _{i,l}^{(d)}, \quad \underline{\gamma }_{i,l}^{(d)} = g\left( \bar{\eta }_{i,l}^{(d)}\right) , \quad \Pr \left( C^{(d)}|B^{(d)}\right) = \prod _{l\notin \{S_i\cup 0\}} \Pr \left( c_{i,l} > \underline{\gamma }_{i,l}^{(d)}\right) \end{aligned}$$

7. The product of the probabilities above approximates the probability given one set of random draws. Repeat the steps above for \(d=1, \cdots , n_d\). The average of the probabilities across d is the approximated probability of the first term in Eq. (5).

    $$\begin{aligned} \Pr (u_{i,0}> u_{i,2}&\;\bigcap \; u_{i,0}> z_{i,2}\;\bigcap \; z_{i,2} > \max _{l\notin \{S_i\cup 0\}} z_{i,l}) \\ \approx&\frac{1}{n_d}\sum _{d} \Pr \left( A^{(d)}\right) \Pr \left( B^{(d)}\right) \Pr \left( C^{(d)}|B^{(d)}\right) \end{aligned}$$
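
Step 5 above requires a draw from an exponential distribution left-truncated at \(\underline{\gamma }_{i,2}^{(d)}\). By memorylessness this is simply the truncation point plus a fresh exponential draw, and inverse-cdf sampling from a fixed uniform keeps the draw a smooth function of \(\bar{c}\) across likelihood evaluations. A minimal sketch:

```python
import numpy as np

def draw_trunc_exp(lower, c_bar, u):
    """Draw c ~ Exp(mean c_bar) conditional on c > lower, via the inverse cdf.

    u is a Uniform(0,1) number; holding u fixed across likelihood evaluations
    (common random numbers) keeps the draw smooth in c_bar.
    """
    return lower - c_bar * np.log1p(-u)   # memorylessness: lower + Exp(c_bar)

rng = np.random.default_rng(0)
c_d = draw_trunc_exp(lower=0.3, c_bar=0.5, u=rng.uniform())
```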

The second term of Eq. (5) is approximated by a similar process.

1. Compute the probability of \(D^{(d)}\) as the complement of \(B^{(d)}\).

    $$ \Pr \left( D^{(d)}\right) = 1 - \Pr \left( B^{(d)}\right) $$

2. Retain the random draws of \(u_{i,0}^{(d)}\) and \(\mu _{i,l}^{(d)}\) from the first-term approximation. Given \(u_{i,0}^{(d)}\) and \(\mu _{i,l}^{(d)}\), \(\forall \; l \notin \{S_i \cup 0\}\), calculate \(\bar{\eta }_{i,l}^{(d)}\), the upper bound of \(\eta _{i,l}\) implied by inequality E, and the corresponding \(\underline{\gamma }_{i,l}^{(d)}\), the implied lower bound for \(c_{i,l}\), and calculate the probability of E.

    $$\begin{aligned} \bar{\eta }_{i,l}^{(d)} = u_{i,0}^{(d)} - \mu _{i,l}^{(d)}, \quad \underline{\gamma }_{i,l}^{(d)} = g\left( \bar{\eta }_{i,l}^{(d)}\right) , \quad \Pr \left( E^{(d)}\right) = \prod _{l\notin \{S_i\cup 0\}} \Pr \left( c_{i,l} > \underline{\gamma }_{i,l}^{(d)}\right) \end{aligned}$$

3. Taking the product of the probabilities and averaging across \(d = 1,\cdots , n_d\) yields the approximated probability of the second term.

    $$\begin{aligned} \Pr (u_{i,0}> u_{i,2}&\;\bigcap \; z_{i,2}> u_{i,0} \;\bigcap \; u_{i,0} > \max _{l\notin \{S_i\cup 0\}} z_{i,l})\\ \approx&\frac{1}{n_d}\sum _{d} \Pr \left( A^{(d)}\right) \Pr \left( D^{(d)}\right) \Pr \left( E^{(d)}\right) \end{aligned}$$

The sum of two approximated probabilities represents the simulated joint probability of the search sequence and the purchase decision.

$$\begin{aligned} \Pr (S_i = \{2\}, D_i = 0) \approx&\frac{1}{n_d}\sum _{d} \Pr \left( A^{(d)}\right) \Pr \left( B^{(d)}\right) \Pr \left( C^{(d)}|B^{(d)}\right) \\&+\frac{1}{n_d}\sum _{d} \Pr \left( A^{(d)}\right) \Pr \left( D^{(d)}\right) \Pr \left( E^{(d)}\right) \end{aligned}$$
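
Putting the pieces together, the following sketch implements the Case 1 simulator of Eq. (12) for the running example (Product 2 of J = 3 searched, outside option chosen). Preferences are kept homogeneous for brevity, so \(\mu _{i,j}\) enters as a fixed input, and the bracketing bounds in g_inv are a practical choice.

```python
import numpy as np
from scipy.stats import norm, expon
from scipy.optimize import brentq

def g(eta):
    return norm.pdf(eta) - eta * (1.0 - norm.cdf(eta))

def g_inv(gamma):
    return brentq(lambda e: g(e) - gamma, -20.0, 20.0)

def pr_case1(mu, mu0, c_bar, searched=1, n_d=100, seed=0):
    """Simulated Pr(S_i = {searched}, D_i = 0), Eq. (12); `searched` is 0-based."""
    rng = np.random.default_rng(seed)
    others = np.delete(np.arange(len(mu)), searched)
    total = 0.0
    for _ in range(n_d):
        u0 = mu0 + rng.standard_normal()                  # step 1: draw u_{i,0}
        pr_A = norm.cdf(u0 - mu[searched])                # step 2: Pr(A)
        gam_lo = g(u0 - mu[searched])                     # step 3: implied bound
        pr_B = expon.sf(gam_lo, scale=c_bar)              # step 4: Pr(B)
        c_d = gam_lo - c_bar * np.log1p(-rng.uniform())   # step 5: truncated draw
        z_d = mu[searched] + g_inv(c_d)
        pr_C = np.prod(expon.sf(g(z_d - mu[others]), scale=c_bar))  # step 6: Pr(C|B)
        pr_D = 1.0 - pr_B                                 # second term: Pr(D)
        pr_E = np.prod(expon.sf(g(u0 - mu[others]), scale=c_bar))   # Pr(E)
        total += pr_A * (pr_B * pr_C + pr_D * pr_E)
    return total / n_d

# Running example: J = 3, Product 2 (index 1) searched, outside option chosen
print(pr_case1(mu=np.array([0.0, 0.5, -0.2]), mu0=0.8, c_bar=0.3))
```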

The probabilities from Eqs. (6)-(7) are approximated in a similar manner. We refer interested readers to Appendix A.

4.4 Identification

As mentioned in Section 3.3, the proposed method has an advantage in parameter identification since the stochastic component of the reservation utility, which stems from the search cost, does not appear in the realized utility, which has its own stochastic component \(\epsilon \). By having two stochastic components, one responsible for randomness in the reservation utility and the other for randomness in the realized utility, the preference parameters are identified from the purchase decision, a typical identification strategy in the literature. First, the vector of mean utility parameters \(\bar{\theta }\) is identified by the correlation between the product attributes and the product’s purchase popularity. The unobserved preference heterogeneity \(\Sigma _\theta \) is identified by the heterogeneity in the composition of searched products across consumers. For instance, if there is a search sequence that contains only high-priced items and another that contains only low-priced items, one can infer that consumers differ in their price sensitivities.

Then, given the preference parameters, the mean search cost \(\bar{c}\) is identified by the distribution of the search sequence lengths. As an illustration, assume that there are only two products, A and B, available and that there are only two search sequences observed in the data, \(S_1 = \{A\}\) and \(S_2 = \{A,B\}\), which imply the following inequalities, respectively.

$$\begin{aligned} S_1 = \{A\} \Rightarrow u_A> z_B&\Rightarrow \mu _A + \epsilon _A> \mu _B + \eta _B \Rightarrow \eta _B< \mu _A - \mu _B + \epsilon _A\\ S_2 = \{A,B\} \Rightarrow u_A< z_B&\Rightarrow \mu _A + \epsilon _A < \mu _B + \eta _B \Rightarrow \eta _B > \mu _A - \mu _B + \epsilon _A \end{aligned}$$

If one observes a higher proportion of \(S_2\), one can infer that \(\eta = g^{-1}(c)\) takes a larger value more often, or equivalently that the search cost takes a smaller value more often. Accordingly, the distribution of the lengths of search sequences, combined with the fixed SD of \(\epsilon \), identifies the mean and variance of \(\eta \), which translate to the mean and variance of the search cost given the distributional assumption. The same identification strategy applies to two-parameter distributions, such as the log-normal distribution. In addition, since the search order is partially determined by the expected utilities, given the search cost parameters, the identification of preference parameters becomes more precise as the information from the purchase decision is combined with the information from the search decision.
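
The comparative static behind this argument is easy to illustrate numerically: a smaller mean search cost makes \(\eta = g^{-1}(c)\) large more often and so raises the share of the longer sequence \(S_2\). A sketch with \(\mu _A = \mu _B = 0\) (parameter values are illustrative):

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import brentq

def g(eta):
    return norm.pdf(eta) - eta * (1.0 - norm.cdf(eta))

g_inv = np.vectorize(lambda gam: brentq(lambda e: g(e) - gam, -20.0, 20.0))

def share_s2(c_bar, n=10_000, seed=0):
    """Share of consumers who also search B (S_2 = {A, B}), with mu_A = mu_B = 0."""
    rng = np.random.default_rng(seed)
    eta_a = g_inv(rng.exponential(c_bar, n))
    eta_b = g_inv(rng.exponential(c_bar, n))
    eps_a = rng.standard_normal(n)
    first_a = eta_a > eta_b                     # condition on A searched first
    return np.mean(eta_b[first_a] > eps_a[first_a])

print(share_s2(0.5), share_s2(0.05))   # smaller mean cost -> larger share of S_2
```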

5 Simulation study validation

We present the results from the extensive simulation studies to validate the proposed likelihood simulator. The first objective of these simulation studies is to show that PM recovers true parameter values more precisely than other simulators do, including KSFS, CFS, JM, and importance sampling (IS).Footnote 8 We also show that PM works well with the log-normal distribution assumed on search costs, a common distributional assumption in the literature. Second, we show that PM closely approximates the true joint probability of search and purchase. Lastly, we show that PM is robust to model mis-specification, including mis-specification of the utility function and/or the search cost distribution, in that PM consistently estimates the ratio between utility parameters.Footnote 9

5.1 Model setup

The realized utility \(u_{i,j}\), the reservation utility \(z_{i,j}\), and the search costs \(c_{i,j}\) for PM, CFS, and KSFS are specified as follows.

$$\begin{aligned} u_{i,j}&= \alpha _{j} + X_{1,i,j}\theta _{1,i} + X_{2,i,j}\theta _{2} + \epsilon _{i,j}, \;\;\quad u_{i,0} = \mu _0 + \epsilon _{i,0} \nonumber \\ z_{i,j}&= \alpha _{j} + X_{1,i,j}\theta _{1,i} + X_{2,i,j}\theta _{2} + g^{-1}\left( c_{i,j}\right) \nonumber \\ c_{i,j}&\sim \text {Exp}(\bar{c}), \quad \;\; \theta _{1,i} \sim N(\bar{\theta }_1, \sigma _{\theta _1}^2), \;\;\quad \epsilon _{i,j} \sim N(0,1) \end{aligned}$$
(13)

where \(\alpha _j\)’s are product intercepts, \(\theta \)’s are coefficients for product characteristics \(X_1\) and \(X_2\), and \(\mu _0\) is the mean preference for the outside option. We allow \(\theta _{1,i}\) to be heterogeneous across consumers, normally distributed with mean \(\bar{\theta }_1\) and SD \(\sigma _{\theta _1}\). Note that \(\alpha _1\) is fixed at zero for identification purposes. In addition, as assumed in Section 3, the realized utility of the outside option \(u_{i,0}\) is assumed to be revealed by the first search.Footnote 10

The utility for Jiang et al. (2021)’s method (JM) is specified as follows.

$$\begin{aligned} u_{i,j}&= \alpha _{j} + X_{1,i,j}\theta _{1,i} + X_{2,i,j}\theta _{2} + e_{i,j} + \epsilon _{i,j}, \;\;\quad u_{i,0} = \mu _0 + \epsilon _{i,0}\nonumber \\ z_{i,j}&= \alpha _{j} + X_{1,i,j}\theta _{1,i} + X_{2,i,j}\theta _{2} + e_{i,j} + g^{-1}\left( c_i\right) \nonumber \\ c_{i}&\sim \text {Exp}(\bar{c}), \quad \;\; \theta _{1,i} \sim N(\bar{\theta }_1, \sigma _{\theta _1}^2), \;\;\quad \epsilon _{i,j} \sim N(0,1),\;\; \quad e_{i,j}\sim N(0,\sigma _e^2) \end{aligned}$$
(14)

JM’s specification differs from the above in that 1) the utility includes the pre-search error term \(e_{i,j}\) and 2) the search costs are consumer-specific \((c_{i,j} = c_i \; \forall j)\). Under both specifications, consumers face uncertainty in the value of \(\epsilon _{i,j}\).Footnote 11 Jiang et al. (2021) use the log-normal distribution for search costs, but in this part, we use the exponential distribution so that the main difference between the two specifications is how randomness enters the reservation utility.

5.2 Comparison of parameter recovery

In this section, we compare various likelihood simulators based on how they recover the parameter values used in the data-generating process (DGP). We first compare PM, KSFS, CFS, and JM by simulating the search and purchase decisions with the same parameter values and estimating the model with those likelihood simulators. From the first study, we conclude that JM comes closest to PM in terms of parameter recovery, and we further compare the parameter recovery of PM and JM in a second simulation study in which each dataset is generated by parameter values randomly drawn from uniform distributions. Next, we compare the parameter recovery of PM and of IS, proposed by Morozov et al. (2021), which requires all preference parameters to be heterogeneous across consumers. Lastly, we demonstrate that PM recovers true parameter values when a log-normal distribution is used in the DGP and is correctly assumed in the estimation.

We base our comparison of likelihood simulators mainly on parameter recovery because the precise recovery of parameter values in simulation settings is the basic requirement for a likelihood simulator to be applied with confidence in empirical settings. Even if a likelihood simulator produced a better in-sample fit and out-of-sample prediction in a simulation setting despite biased parameter estimates, however unlikely, the bias would raise concerns about the validity of predictions and further inferences, including price elasticities and consumer welfare effects, in empirical settings. Therefore, in simulation settings, in which true parameter values are known, parameter recovery should take precedence over in-sample fit and out-of-sample prediction.

5.2.1 Parameter recovery with fixed parameter values

In this study, we use one set of parameter values to simulate search and purchase decisions of 1,000 consumers according to SSM with \(J=5\) and an outside option (no purchase). We generate 100 such distinct datasets by using the same parameters but varying \(X_{i,j}\)’s and random draws of \(c_{i,j}\), \(\epsilon _{i,j}\), and \(e_{i,j}\). For each dataset, we construct the simulated likelihood with PM, KSFS, CFS, and JM and estimate the model parameters to obtain 100 sets of parameter estimates.Footnote 12

Table 1 Monte Carlo estimation results with fixed true parameter values

To explore the performance of the other methods, we present estimates from 1) KSFS with different scaling factors \((s = 1,5,10,\text { and } 25)\), 2) CFS with different numbers of random draws \((n_d = 50, 100, 150, \text { and }200)\), and 3) JM with different \(\sigma _e\)’s. Note that since CFS computes the proportion of one variable while fixing another, the number of random draws made per observation is in fact \(n_d^2\). We use \(n_d=100\) for PM, KSFS, and JM.Footnote 13 For JM, we generate the data with \(\sigma _e=0.5\) and estimate the model with \(\sigma _e\) fixed at different values \((\sigma _e \in \{0.25, 0.5, 1, 1.5\})\) to emulate the grid search required to identify \(\sigma _e\) (see Jiang et al. (2021)). Lastly, we estimate the log values of \(\sigma _{\theta _1}\) and \(\bar{c}\) to ensure positive values and use the origin as the starting value for the optimization of all simulators.Footnote 14

Table 1 summarizes the results. The first row shows the parameter values used to generate the data, and the first 10 columns show the means (SDs in parentheses) of estimates across datasets. The next column shows the average (wall-clock) run time in minutes taken to complete the estimation. Table 1 shows that PM and JM with the correct \(\sigma _e\) perform better than CFS and KSFS do: the PM and \(\text {JM}_{0.5}\) estimates have 1) means closer to the true parameter values and 2) smaller SDs than the KSFS and CFS estimates, and 3) PM and JM take less time to converge. We discuss the mechanisms and issues of the other simulators in Section 5.4. Here, we conclude that JM is the only likelihood simulator whose performance comes close to PM’s and move on to a further comparison of these two simulators.Footnote 15

Table 2 Monte Carlo estimation results with randomly drawn true parameter values

5.2.2 Parameter recovery with random parameter values

In this study, instead of using the same parameter values to generate all datasets, we randomly draw the true parameter values from uniform distributions and, for each set of parameters, generate a dataset of 1,000 consumers’ search and purchase decisions for PM and another for JM according to the respective DGPs. Then, we estimate the model with PM and JM. We repeat this process 100 times and collect the deviations of the estimates from the true parameter values. The first two rows of Table 2 show the lower and upper bounds of the uniform distributions from which the true parameter values are drawn. For instance, we randomly draw \(\alpha _j\) from \(U(-1,1)\) and use the drawn value to generate the data. In the subsequent rows, we report the root-mean-square error (RMSE) and the median absolute deviation (MAD) of the estimates.

Note that the JM datasets are generated with \(\sigma _e=0.5\), but the model is also estimated with \(\sigma _e\) fixed at other values (0.25, 1, and 1.5) to emulate the grid search. Table 2 shows that PM has RMSEs and MADs that are consistently low across parameters, whereas the JM results with \(\sigma _e\)’s other than 0.5 show much larger RMSEs and MADs. Although the \(\text {JM}_{0.5}\) estimates are comparable to PM’s, the last two parameters, namely the outside option mean utility \(\mu _0\) and the log of the mean search cost \(\log \bar{c}\), show much larger deviations than the PM estimates do. Therefore, we conclude that PM recovers the true parameter values more precisely than JM does and that PM’s parameter recovery does not depend on a particular set of true parameter values.

5.2.3 Comparison with importance sampling

Another recent approach to estimating SSM, proposed by Morozov et al. (2021), utilizes importance sampling (IS). This approach avoids making random draws of variables whose distributions are truncated in accordance with the search set conditions, which is computationally expensive, by instead making random draws from another density, called a proposal density, that does not depend on the model parameters. The approach then counts the proportion of those random draws that satisfy all the search set conditions, weighting each draw by the ratio of the model density to the proposal density. This weighted proportion serves as the simulated likelihood of a search and purchase decision.Footnote 16
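The following toy example (ours; deliberately much simpler than Morozov et al. (2021)’s implementation) illustrates the mechanics: a tail probability under a model density is estimated with draws from a fixed proposal density that does not depend on the parameter, with each draw weighted by the density ratio.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)

def is_tail_prob(theta, lo, n=100_000):
    # Draws come from the proposal h = N(0, 2^2) and never depend on theta,
    # so the same draws can be reused as the optimizer updates theta.
    x = rng.normal(0.0, 2.0, n)
    w = norm.pdf(x, loc=theta, scale=1.0) / norm.pdf(x, loc=0.0, scale=2.0)
    return np.mean((x > lo) * w)        # weighted indicator ~ Pr_f(x > lo)

# Compare with the closed form 1 - Phi(lo - theta):
print(is_tail_prob(theta=0.5, lo=1.0), 1 - norm.cdf(1.0 - 0.5))
```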

To compare the parameter recovery by PM and IS, we simulate search and purchase decisions by 1,000 consumers with the following model specification. Note that IS needs all preference parameters, in addition to the search cost, to be heterogeneous across consumers.

$$\begin{aligned} u_{i,j}&= \alpha _{i,j} + X_{i,j,1}\theta _{i,1} + X_{i,j,2}\theta _{i,2} + \epsilon _{i,j}, \quad u_{i,0} = \mu _{i,0} + \epsilon _{i,0}, \quad c_{i,j} \sim \text {Exp}(\bar{c})\\ \mathbf {\alpha }_i&\sim N(\bar{\alpha }, \sigma _\alpha ^2 \textbf{I}_{J-1}), \; \theta _{i,k} \sim N(\bar{\theta }_k, \sigma _{\theta _k}^2) \text { for } k = 1,2, \; \epsilon _{i,j} \sim N(0,1) \text { for } j = 0,1,\cdots , J \end{aligned}$$

where \(\mathbf {\alpha }_i = [\alpha _{i,2},\cdots ,\alpha _{i,5}]'\) and \(\bar{\alpha } = [\bar{\alpha }_{2}, \cdots , \bar{\alpha }_{5}]'\). Note that \(\alpha _{i,1}\) is fixed at zero for identification purposes. To keep the simulation simple, we assume that the product intercepts are uncorrelated and share the same heterogeneity \(\sigma _\alpha ^2\). We create 100 distinct datasets and estimate the model with PM and IS. Table 3 summarizes the true parameter values used to generate the data and the mean and SD of the parameter estimates across datasets.

Table 3 Parameter Recovery of PM and Importance Sampling

While PM recovers the parameter values, the estimates from IS exhibit biases. We attribute these biases to the similarity between IS and CFS: although smoothed, a proportion of random draws is used as the simulated likelihood. In addition, the implementation of IS is computationally simpler than that of PM since it avoids making random draws from truncated distributions, but IS requires a much larger number of random draws per observation, offsetting the computational simplicity. In this particular example, we make \(100^2\) random draws per observation for IS and 100 for PM.

5.2.4 Proposed method with log-normally distributed search costs

In Section 3, we choose the exponential distribution as the search cost distribution, since the log-normal distribution can be numerically unstable when calculating the cdf and making random draws from the truncated distribution (a minimal illustration appears at the end of this subsection). However, we do not preclude applying PM to cases in which a log-normal distribution is assumed for search costs, and therefore, in this section, we present the parameter recovery by PM in such a case. We first create 100 distinct datasets of search and purchase decisions by 1,000 consumers whose search costs are log-normally distributed. Then, we estimate the model parameters by applying PM with the correctly assumed search cost distribution. Note that the DGP below is identical to the one used in Section 5.2.1 except for the search cost distribution.

$$\begin{aligned} u_{i,j}&= \alpha _{j} + X_{1,i,j}\theta _{1,i} + X_{2,i,j}\theta _{2} + \epsilon _{i,j}, \;\;\quad u_{i,0} = \mu _0 + \epsilon _{i,0} \nonumber \\ z_{i,j}&= \alpha _{j} + X_{1,i,j}\theta _{1,i} + X_{2,i,j}\theta _{2} + g^{-1}\left( c_{i,j}\right) \nonumber \\ \log c_{i,j}&\sim N(\bar{c}, \sigma _c^2), \quad \;\; \theta _{1,i} \sim N(\bar{\theta }_1, \sigma _{\theta _1}^2), \;\;\quad \epsilon _{i,j} \sim N(0,1) \end{aligned}$$
(15)

Table 4 presents the means and SDs of estimates across datasets. It shows that PM precisely recovers the true parameter values even with log-normally distributed search costs, provided that the distribution is correctly assumed, implying that the identification strategy works well and does not hinge on a particular distributional assumption on search costs. Therefore, researchers may apply PM to estimate SSM with either the exponential or the log-normal distribution assumed on search costs.
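As a side note on the numerical caveat above, the following minimal sketch (ours, with deliberately extreme truncation bounds) illustrates why inverse-transform draws from a truncated log-normal distribution can be fragile: the cdf values at the truncation points can be numerically indistinguishable, in which case the draws degenerate.

```python
import numpy as np
from scipy.stats import expon, lognorm

rng = np.random.default_rng(4)

def truncated_draws(dist, lo, hi, n):
    # Inverse-transform sampling restricted to (lo, hi): map uniforms on
    # (F(lo), F(hi)) through the inverse cdf. If F(hi) - F(lo) underflows
    # to zero, every draw collapses to the same (possibly infinite) value.
    a, b = dist.cdf(lo), dist.cdf(hi)
    return dist.ppf(rng.uniform(a, b, n))

print(truncated_draws(expon(scale=0.1), 0.5, 0.7, 3))  # well behaved
print(truncated_draws(lognorm(s=1.0), 1e4, 1e5, 3))    # deep tail: degenerates
```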

5.3 Probability approximation

The last column of Table 1, named “SSE”, shows the sum of squared errors of the probabilities approximated by PM, KSFS, CFS, and JM. To compute these numbers, we simulate the search and purchase decisions of 100 million consumers who are exposed to the same product characteristics \((X_{i,j}=X_j,\; \forall \; i)\). Then, we collect the unique combinations of the search set and the purchase decision \((S_l, D_l)\) and count the number of occurrences of each combination. We treat the relative frequency as the true joint probability of \((S_l,D_l)\) since there are no closed-form solutions for such probabilities. Then, we approximate the probability of each \((S_l, D_l)\) by feeding the true parameter values into the four likelihood simulators. SSEs are calculated from the observed frequencies and the approximated probabilities as follows.

$$ SSE_M = \sum _l \left( \tilde{\Pr }(S_l, D_l|\Theta ^*,M) - {\Pr }^*(S_l, D_l)\right) ^2 $$

where l is the index for a unique combination of search and purchase decision, \(\Theta ^*\) the true parameters, and M the choice of likelihood simulator. \(\tilde{\Pr }(S_l, D_l|\Theta ^*,M)\) is the approximated probability of \((S_l, D_l)\) given \(\Theta ^*\) and M, and \(\Pr ^*(S_l, D_l)\) the observed frequency of \((S_l, D_l)\).
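In code, this computation amounts to the following short sketch (ours; container names are hypothetical), where each key is a unique \((S_l, D_l)\) combination.

```python
from collections import Counter

def empirical_freq(decisions):
    # decisions: iterable of (frozenset search set, purchase) pairs from the
    # large-scale simulation; returns relative frequencies Pr*(S_l, D_l).
    counts = Counter(decisions)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def sse(freq, approx):
    # approx[k]: probability of combination k returned by a likelihood simulator
    return sum((approx[k] - p_star) ** 2 for k, p_star in freq.items())
```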

First, all cases of KSFS and CFS show larger SSEs than PM’s. Note that KSFS with the scaling factor equal to 1 shows the largest SSE even though its mean estimates are, among the KSFS estimates, closest to the true parameter values; moreover, increasing \(n_d\) for CFS does not appear to improve SSEs. Second, the SSEs for JM decrease as \(\sigma _e\) increases. While \(\text {JM}_{1}\) and \(\text {JM}_{1.5}\) show SSEs smaller than PM’s, their inability to recover the true parameters outweighs the smaller SSEs of the approximated probabilities. Therefore, we conclude that PM has a strong advantage over the other simulators in that, besides the parameter recovery and the run time, it closely approximates the true probabilities of search and purchase decisions.

5.4 Discussion of other simulators

In this section, we briefly discuss how KSFS, CFS, and JM work and what potential issues they have.

Table 4 Parameter Recovery by PM with Log-Normally Distributed Search Costs

5.4.1 Kernel-smoothed frequency simulator

We first describe how KSFS constructs the simulated likelihood with an example from Eq. (6) in which \(S_i = \{2, 3\}\), \(D_i = 3\), and \(J=3\), whose joint probability of search and purchase is reproduced below.

$$\begin{aligned} \Pr (S_i = \{2, 3\}, D_i = 3) = \Pr (&z_{i,2}> z_{i,3} \bigcap \min (z_{i,3}, u_{i,3})> u_{i,0} \\&\bigcap \min (z_{i,3}, u_{i,3})> u_{i,2} \bigcap \min (z_{i,3}, u_{i,3})>\max _{l\notin \{S_i\cup 0\}} z_{i,l}) \nonumber \end{aligned}$$
(16)

The construction of the KSFS likelihood begins with making \(n_d\) random draws \(u_{i,j}^{(d)}\) and \(z_{i,j}^{(d)}\) \((d=1,\cdots ,n_d)\) for all alternatives from their unrestricted (non-truncated) distributions with the current guess of parameters. Then, as given below, the draws are plugged into expressions, converted from Eq. (16), that return positive values if the inequalities in Eq. (16) are satisfied.

$$\begin{aligned} w_{i,1}^{(d,1)} = z_{i,2}^{(d)} - \max (z_{i,1}^{(d)}, z_{i,3}^{(d)}),&\quad w_{i,1}^{(d,2)} = z_{i,3}^{(d)} - z_{i,1}^{(d)}, \quad w_{i,2}^{(d)} = u_{i,3}^{(d)} - z_{i,1}^{(d)}, \nonumber \\ w_{i,3}^{(d)} = u_{i,3}^{(d)} - \max (u_{i,2}^{(d)}, u_{i,0}^{(d)}),&\quad w_{i,4}^{(d)} = z_{i,3}^{(d)} - \max (u_{i,2}^{(d)}, u_{i,0}^{(d)}) \end{aligned}$$
(17)

Then, the \(w_{i,k}^{(d)}\) values are plugged into a kernel, for which a popular choice in the literature is the cdf of a scaled multivariate logistic distribution (Honka and Chintagunta, 2017; Ursu, 2018), with some positive scaling factor s.Footnote 17 The average of the kernel-smoothed values across the \(u_{i,j}^{(d)}\) and \(z_{i,j}^{(d)}\) draws approximates the probability of consumer i’s search and purchase decision.

$$\begin{aligned} \Pr (S_i, D_i) \!\approx \! \frac{1}{n_d}\sum _{d=1}^{n_d} P_i^{(d)}, \quad \text {where } P_i^{(d)} \!=\! \frac{1}{1+e^{-s w_{i,1}^{(d,1)}}+e^{-s w_{i,1}^{(d,2)}}+\sum _{k=2}^4 e^{-s w_{i,k}^{(d)}}} \end{aligned}$$
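For concreteness, a minimal sketch (ours; function and variable names are illustrative) of this smoothing step for the example is given below.

```python
import numpy as np

def ksfs_kernel(z1, z2, z3, u0, u2, u3, s):
    # The five expressions from Eq. (17); each is positive iff the
    # corresponding search set condition in Eq. (16) holds.
    w = np.array([
        z2 - max(z1, z3),     # product 2 searched first: z_2 is the largest z
        z3 - z1,              # z_3 exceeds the non-searched product's z_1
        u3 - z1,              # min(z_3, u_3) > z_1, u-part
        u3 - max(u2, u0),     # purchased product 3 beats u_2 and u_0
        z3 - max(u2, u0),     # min(z_3, u_3) > max(u_2, u_0), z-part
    ])
    return 1.0 / (1.0 + np.exp(-s * w).sum())   # scaled multivariate logistic cdf

# Pr(S_i, D_i) is approximated by averaging ksfs_kernel over n_d draw sets;
# a violated condition (negative w) is penalized more heavily as s grows.
```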

The mechanism underlying KSFS is intuitive: a set of random draws that violates inequalities in Eq. (16) (i.e., search set conditions), thereby returning negative values in equations in Eq. (17), is penalized by the kernel and the scaling factor. However, KSFS and GHK-type likelihood simulators differ in how the penalty is imposed. In the GHK-type simulator, the penalty is imposed in the form of lowered likelihood as set forth by the assumed distributions of the model components (e.g. distribution of \(\epsilon _{i,j}\) and \(c_{i,j}\) in PM). On the other hand, in KSFS, the penalty is imposed by a kernel and scaling factors, which are not a part of the underlying model specification. Therefore, if the kernel and the scaling factor fail to properly represent the behavior of the distributions of model components that govern DGP, the parameter estimates from KSFS will be biased.

Accordingly, Table 1 shows that the KSFS estimates are sensitive to the choice of scaling factor. KSFS estimates with smaller scaling factors have means closer to the true parameter values but larger SDs across iterations. As the scaling factor increases, the means of the KSFS estimates tend to move toward zero, perhaps because the optimizer steers away from regions where the smoothed probability underflows to zero (so that the objective function returns infinity), which happens more easily as the scaling factor increases. Such an impact of the scaling factor on the estimates suggests that one should be cautious when applying KSFS to real-world data, since scaling factors are difficult to calibrate when, unlike in simulation settings, true parameter values are unknown.

5.4.2 Crude frequency simulator

CFS, another simulator used to estimate SSM, makes random draws of \(c_{i,j}\) and constructs \(z_{i,j}\) with the current guess of parameters. Then, for each random draw of \(z_{i,j}\), it makes random draws of \(u_{i,j}\) from distributions truncated in accordance with the search set conditions. The proportion of random draws that satisfies the choice and the stopping rules serves as the simulated joint probability of search and purchase.
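Schematically (our illustration; satisfies() is a hypothetical predicate encoding the choice and stopping rules), CFS reduces to counting:

```python
def cfs_prob(draw_z, draw_u_given_z, satisfies, n_d=50):
    # Proportion of n_d x n_d paired draws satisfying the choice and
    # stopping rules; for finite n_d this is a step function of the
    # parameters, which is why the objective can be non-smooth.
    hits = 0
    for _ in range(n_d):
        z = draw_z()
        for _ in range(n_d):
            hits += satisfies(z, draw_u_given_z(z))
    return hits / (n_d * n_d)
```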

Since the proportion among random draws approximates probabilities under CFS, one needs to make a sufficiently large number of random draws to ensure the smoothness of the objective function. However, as shown in Table 1, increasing the number of random draws does not necessarily guarantee estimates with smaller bias and spread. Making even more random draws, 1,000 per dimension, for example, might yield better estimates, but such a modification will impose a much heavier computational burden, diminishing the advantage of CFS’s computational simplicity. Note also that, even with only 50 random draws per dimension, CFS takes the longest time to converge.

5.4.3 Jiang et al. (2021)’s method

JM makes random draws of \(c_i\) and \(e_{i,j}\) to construct \(z_{i,j}\) with the current guess of parameters. Then, beginning from the maximum of non-searched products’ reservation utilities, JM sequentially makes random draws of other quantities, whose distributions are truncated by the search set conditions, and computes the joint probability of search and purchase with the distributions assumed on two error terms \(e_{i,j}\) and \(\epsilon _{i,j}\).

In Table 1, JM with the correct \(\sigma _e\) (\(\text {JM}_{0.5}\)) shows a performance comparable to PM’s, although PM’s estimates are closer to the true values and PM takes a shorter time to converge. However, the issue with JM is that the identification of \(\sigma _e\) requires a grid search. We confirm that, of the four \(\sigma _e\) values in Table 1, the correct \(\sigma _e=0.5\) yields a maximized log likelihood that is consistently higher than those with the other \(\sigma _e\) values, given that the same pseudo-random number generator is used across \(\sigma _e\)’s.

Although we confirm that JM with the correct \(\sigma _e\) performs better than with the other three values, we also note that those values are not in the close neighborhood of the true value. If the grid search is to be used with confidence, the correct \(\sigma _e\) should consistently yield the highest maximized log likelihood even among \(\sigma _e\) values in the close neighborhood of the true value. To check whether this is the case, we estimate the model with the same simulated data while fixing \(\sigma _e\) at values from 0.4 to 0.6 in 0.01 increments. We repeat this process for all 100 datasets and count how many times each \(\sigma _e\) value achieves the highest maximized likelihood among all \(\sigma _e\)’s.

Fig. 1 Number of times \(\sigma _e\) achieves the highest maximized likelihood

Figure 1 shows these counts. Surprisingly, the correct \(\sigma _e=0.5\) achieves the highest maximized likelihood only two times out of 100. In fact, the \(\sigma _e\) values that most often achieve the highest maximized likelihood turn out to be 0.44 and 0.46 (14 times each), followed by 0.45 (12 times) and 0.40 (10 times). Though the difference between those two values (0.44 and 0.46) and the true value (0.5) seems small and harmless, such a bias even in this simple simulation setting implies that precise inference with JM in empirical applications may be challenging.

Table 5 Cases of model mis-specifications
Table 6 PM estimates with mis-specified model

5.5 Robustness to model mis-specification

So far, we have compared the performance of the likelihood simulators assuming that the model is correctly specified so that it coincides with the DGP. To further demonstrate the advantage of PM, in this section, we present simulation study results that show how PM can still be useful even if the model is mis-specified and therefore differs from the DGP. We consider the six cases of model mis-specification summarized in Table 5, with two potential sources of mis-specification: the stochastic component in the reservation utility and the search cost distribution. For instance, in Case 1, the data are generated, as in Jiang et al. (2021), with the pre-search error term \(e_{i,j}\) as the stochastic component in the reservation utility and exponentially distributed search costs that are homogeneous within consumer, but the model is estimated, as under PM, with no pre-search error and with exponentially distributed heterogeneous search costs \(c_{i,j}\) as the stochastic component. For each case, we create 100 datasets and estimate the parameters for each dataset. For DGPs with the pre-search error, we use \(\sigma _e = 0.5\).

Table 6 presents the means of estimates across datasets (SDs in parentheses). The first column under each case shows the parameter values used in the DGP. Table 6 shows that mis-specification of the utility function and/or the search cost distribution biases the estimates, even though the bias may not be statistically significant in some cases.

The potential of PM even when the model is mis-specified can be seen in Table 7, which shows the means of the parameter estimates divided by the estimate of \(\alpha _2\). Regardless of the source of stochasticity and the search cost distribution used in the DGP, the ratios of the preference parameters \((\alpha _j, \bar{\theta }_1, \theta _2)\) to \(\alpha _2\) are consistently estimated. This result indicates that even when the model is mis-specified, PM can be used to examine consumers’ relative preferences toward product features or the relative effectiveness of the marketing mix from consumer search data.Footnote 18

Table 7 PM estimate ratios with mis-specified model

6 Field data validation

In this section, we validate PM with field data. We first describe the data and lay out the model. Then, we present the estimation results with three likelihood simulators—PM, JM, and KSFS—and compare their out-of-sample predictions.Footnote 19 Lastly, we use PM estimates to compute the search cost and the position effect elasticities of search.

6.1 Data

This dataset is provided by one of the leading online travel agencies, Expedia.Footnote 20 On its website, after a consumer submits a search query, he is presented with an ordered list of hotels, called an impression. The original data include 136,886 hotels in 23,715 destinations in 172 countries presented in 399,344 impressions. In the data, we observe which hotels are clicked and purchased by consumers, and each impression has at least one click. We note that the dataset does not reveal the order in which hotels are clicked. While this information is crucial in estimating the model, over 90% of impressions have only one click, and therefore, we assume that consumers click first on hotels that are located closer to the top of the impression, as in Ursu (2018).Footnote 21

An impression presents 1) star ratings, 2) review scores, 3) chains, 4) location scores, 5) prices per night, and 6) ongoing promotions of listed hotels. The star rating (from 0 to 5) is assigned by Expedia according to the type of hotel (e.g. motel, hotel, upscale hotel), the level of luxury, and the amenities provided. The review score (from 0 to 5) is the average review score from consumers who made reservations for the hotel on Expedia in the past. The chain variable reveals to consumers which hotel chain (e.g. Hilton, Marriott) a hotel belongs to, but we, as researchers, observe not the identity of the chain, but whether the hotel belongs to a chain. The location score (from 0 to 7) is assigned by Expedia to summarize how central a hotel’s location is, what amenities surround it, and so on. These four variables are invariant over the sample period. The price per night is the pre-tax price presented to consumers. Lastly, the promotion indicator shows whether the hotel is on a promotion.

In 70% of impressions, the positions of hotels are determined by Expedia’s proprietary algorithm, and in the remaining impressions, Expedia randomizes the positions of hotels that meet the search criteria submitted by consumers. It is also randomly determined which sorting algorithm is applied to an impression following each search query. To remove concerns about the endogeneity of the order of hotels within an impression, we restrict the estimation sample to randomly sorted impressions. Consumers may sort or filter hotels presented to them, but the dataset contains only impressions to which consumers do not apply such functions.

We also remove impressions that contain any hotel whose price is too low (below $10 per night) or too high (above $1,000 per night) so that the estimation sample represents a “typical” search environment. In addition, to correct any errors in the price, we remove impressions with a purchase whose total cost exceeds 130% of the price per night multiplied by the length of stay.Footnote 22 As the original data contain over 20,000 destinations, we consider only the four destinations with the most impressions. Lastly, in order to mitigate the effect of the varying number of hotels per impression, we include only impressions whose lengths equal the two most frequent lengths for each destination (e.g. if the two most frequent impression lengths are 31 and 32 for Destination 1, we only consider impressions for that destination that have 31 or 32 hotels). Table 8 summarizes the data included in the estimation sample.

Table 8 Data summary statistics

6.2 Model

We follow Ursu (2018) to specify consumer i’s utility from the hotel at the j-th position as follows.

$$\begin{aligned} u_{i,j} = X_{i,j} \theta + \epsilon _{i,j} = \mu _{i,j} + \epsilon _{i,j}, \quad \epsilon _{i,j} \sim N(0, 1) \end{aligned}$$

where \(X_{i,j}\) is a vector that contains the hotel’s star rating, review score, location score, price, chain indicator, and promotion indicator. Since the data contain more than 50,000 hotels, including hotel fixed effects in the model is not feasible. As in Section 3, we assume that consumers face uncertainty in the match value \(\epsilon _{i,j}\). The realized utility from no purchase, assumed to be revealed by the first search, is given by

$$\begin{aligned} u_{i,0} = \mu _0 + \epsilon _{i,0}, \quad \epsilon _{i,0} \sim N(0,1) \end{aligned}$$

Note that we assume that consumers have homogeneous preferences (i.e., \(\theta _i = \theta ,\; \forall i\)).Footnote 23 Upon receiving an impression, consumer i observes the expected utilities \(\mu _{i,j}\) of all hotels in the impression and begins the search process to reveal the match values \(\epsilon _{i,j}\) and thereby the realized utilities \(u_{i,j}\).

To do so, the consumer has to incur a search cost for each click on a hotel. Ursu (2018) points out that hotels closer to the bottom of an impression get fewer clicks than those closer to the top. To incorporate this pattern, she specifies that the search cost increases as the hotel position moves down, and this increase in the search cost is called the “position effect”.

To achieve a similar goal, we use the “shifted” exponential distribution. The shifted exponential distribution with location parameter \(L > 0\) and scale parameter \(\lambda \) is the same as the exponential distribution with parameter \(\lambda \), except that it is shifted by L toward positive infinity. It has mean \(\lambda + L\) and standard deviation \(\lambda \). Its pdf is given as follows.

$$\begin{aligned} f(x|\lambda , L) = \frac{1}{\lambda } \exp \left( -\frac{x-L}{\lambda }\right) , \quad x \ge L \end{aligned}$$

We incorporate the position effect into the search cost by specifying the search cost for the hotel at the j-th position to follow the shifted exponential distribution given below.

$$c_{i,j}\sim \text {ShiftExp}(\gamma _0, \gamma _1\times p_j) \text { for } c_{i,j} \ge \gamma _1\times p_j, \; p_j = 1+\frac{j}{40}$$

where \(\gamma _0\) represents the mean base search cost and \(\gamma _1\) is the position effect on the mean search cost. \(p_{j}\) is the position variable, a rescaling of the hotel’s position in the raw data. Under this specification, the mean search cost at the j-th position is \(\gamma _0 + \gamma _1\times p_j\), and the standard deviation of search costs is constant at \(\gamma _0\) across positions.Footnote 24 We use this specification to estimate the model with PM and KSFS.
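As a quick sketch (ours, with illustrative parameter values), search costs under this specification can be drawn by inverse-transform sampling, and the simulated mean and SD match \(\gamma _0 + \gamma _1 p_j\) and \(\gamma _0\).

```python
import numpy as np

rng = np.random.default_rng(3)

def draw_search_costs(gamma0, gamma1, J, n):
    # Inverse cdf of the shifted exponential: c = L - lambda * log(1 - q),
    # with scale lambda = gamma0 and location L = gamma1 * p_j.
    p = 1 + np.arange(1, J + 1) / 40          # transformed position variable
    q = rng.uniform(size=(n, J))
    return gamma1 * p - gamma0 * np.log(1 - q)

c = draw_search_costs(gamma0=0.05, gamma1=0.02, J=30, n=200_000)
print(c.mean(axis=0)[:3])   # approx gamma0 + gamma1 * p_j for j = 1, 2, 3
print(c.std(axis=0)[:3])    # approx gamma0 at every position
```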

For JM, we make two modifications in the utility and the search cost specifications. First, JM introduces randomness in the reservation utility via a pre-search error term \(e_{i,j}\) that is observed by consumers prior to search. Therefore, the utility function for JM is given by

$$\begin{aligned} u_{i,j}&= X_{i,j} \theta + e_{i,j} + \epsilon _{i,j}, \quad u_{i,0} = \mu _0 + \epsilon _{i,0}\\ \epsilon _{i,j}&\sim N(0,1), \quad e_{i,j}\sim N(0,\sigma _e^2) \end{aligned}$$

Second, JM specifies that the search costs are common across products within a consumer, so that \(e_{i,j}\) is the only product-specific source of randomness in the reservation utility. Hence, as given below, we model the consumer-specific search cost \(c_i\) as drawn from the exponential distribution with mean \(\gamma _0\), and the position effect is added deterministically to \(c_i\) to form the consumer-product-specific search cost \(c_{i,j}\).

$$\begin{aligned} c_{i,j}&= c_i + \gamma _1 \times p_{j}, \quad c_i \sim \text {Exp}(\gamma _0) \end{aligned}$$

6.3 Estimation results

We estimate the model with PM, JM, and KSFS, and Table 9 presents the model parameter estimates. The standard errors in parentheses are obtained by the bootstrap resampling method (Efron and Tibshirani, 1993) with \(B = 200\).

Table 9 Estimation results

To choose the final estimates for JM, we perform a grid search over \(\sigma _e\) by estimating the model with \(\sigma _e\) equal to values from 0.3 to 0.7 in 0.05 increments and choosing the set of estimates whose \(\sigma _e\) achieves the highest maximized log likelihood. For KSFS, we choose the estimates obtained with the scaling factor equal to 1, since they show RMSEs of the number of clicks on each position that are consistently lower than those of the other scaling factors across destinations, as presented in Table 10.Footnote 25

One should note that KSFS does not allow the calibration of scaling factors based on the maximized likelihood in this case. The likelihood of SSM is complicated and involves several quantities (i.e., search costs, realized utilities, and reservation utilities). Therefore, most draws of the realized and reservation utilities, made without enforcing the search set conditions, are bound to violate those conditions and thereby return negative values for the expressions in Eq. (17), which are penalized by the scaling factor. It follows that the maximized likelihood is likely to decline as the positive scaling factor increases, as shown in Web Appendix 2.1.

We now discuss the PM estimates from Table 9 and then point out how the KSFS and JM estimates differ from them. The PM estimates indicate that consumers prefer hotels that are on promotion and have higher star ratings, higher location scores, and lower prices. The coefficient estimates for the chain indicator show mixed signs across destinations. If the data revealed the identities of chains, we could include chain fixed effects in the model and compare consumer preferences across chains. The estimates of the outside option’s mean utility have large magnitudes, reflecting the small proportion of impressions concluding with purchases in all destinations. Lastly, the position effect is positive and significant across destinations, implying that it becomes more difficult for hotels to be searched (i.e., clicked) as their positions move downward in the impression. In fact, we estimate that moving down by one position has an impact on the reservation utilities equivalent to a price increase of $1.94 for Destination 1, $0.48 for Destination 2, $1.43 for Destination 3, and $0.53 for Destination 4.

Fig. 2 Probability of click by position

Figure 2 visualizes the PM estimates of the position effect on consumers’ search decisions. To separate out the position effect, we assume that all alternatives in an impression have the same expected utility: we compute the expected utilities of all hotels within a destination with the PM estimates from Table 9 and set every hotel’s expected utility equal to the median. For each destination, we simulate the search decisions of 10,000 consumers and count the number of clicks on each position. This process is repeated 100 times, and Fig. 2 shows the average click probability at each position with the 95% confidence band. For instance, in Destination 1, 5% of consumers are predicted to click on the first hotel in the impression, while only 1.5% are predicted to click on the last hotel.

The figure shows that, given that all hotels within an impression have the same expected utilities, the rate of decrease in the click probability is fairly constant across positions. In Destination 1, on average, a hotel’s click probability is 3.5% lower than that of the hotel right above it, and in the other destinations, the click probability decreases by 1.5% as a hotel moves down by one slot.

The JM estimates differ from the PM estimates in several ways. For instance, in Destinations 1 and 4, the JM estimates of the coefficient for the promotion indicator are insignificant, while the PM estimates are positive and significant. In addition, the base search cost \(\gamma _0\) and position effect \(\gamma _1\) estimates across all destinations are smaller with JM than with PM. In fact, the JM estimates of \(\gamma _1\) are all insignificant, while the PM estimates are all significant. Most of the preference parameter estimates \(\theta \) from KSFS show much smaller magnitudes than the PM estimates, and the base search cost estimates from KSFS, too, are much smaller than the PM estimates. Since the maximized likelihoods do not permit a direct comparison across simulators, we cannot decide which simulator best fits the data solely from Table 9, and therefore, in the next section, we compare the out-of-sample predictions of these simulators.

6.4 Out-of-sample prediction

In this section, we use stratified ten-fold cross-validation (Kohavi, 1995) to compare the three likelihood simulators’ out-of-sample predictions. The impressions of a destination are partitioned into ten equal-sized subgroups so that each partition has the same proportion of impressions concluding with purchases as the original set of impressions. Then, we estimate the model with nine partitions and make R sets of predictions of search and purchase decisions using the data from the remaining partition. This process is repeated until each subgroup has been used as the test set. Then, we compare 1) the total number of purchases, 2) the total number of searches, and 3) the number of clicks on each position from the test set and from the prediction, and we summarize the results with RMSEs. \(RMSE_{p}^{(d,m)}\) denotes the RMSE of the total number of purchases for destination d and likelihood simulator m, \(RMSE_{s}^{(d,m)}\) the RMSE of the total number of searches, and \(RMSE_{q}^{(d,m)}\) the RMSE of the number of clicks on each position.

$$\begin{aligned} RMSE_{p}^{(d,m)}&= \frac{1}{K}\sum _{k=1}^{K}\sqrt{\frac{1}{R}\sum _{r=1}^R\left( \tilde{n}_{p,k,r}^{(d,m)} - n_{p,k}^{(d)}\right) ^2} \\ RMSE_{s}^{(d,m)}&= \frac{1}{K}\sum _{k=1}^{K}\sqrt{\frac{1}{R}\sum _{r=1}^R\left( \tilde{n}_{s,k,r}^{(d,m)} - n_{s,k}^{(d)}\right) ^2} \\ RMSE_{q}^{(d,m)}&= \frac{1}{K}\sum _{k=1}^{K}\sqrt{\frac{1}{Q\times R}\sum _{r=1}^R\sum _{q=1}^Q\left( \tilde{n}_{q,k,r}^{(d,m)} - n_{q,k}^{(d)}\right) ^2} \end{aligned}$$
where k is the index for the partition used as the test set, and K is the total number of partitions. \(\tilde{n}_{p,k,r}^{(d,m)}\) is the total number of purchases for destination d predicted with estimates from likelihood simulator m in prediction set r for test set k, and \(n_{p,k}\) is the total number of purchases observed in test set k. \(\tilde{n}_{s,k,r}^{(d,m)}\) and \(n_{s,k}\) are defined similarly for the total number of searches. \(\tilde{n}_{q,k,r}^{(d,m)}\) is the total number of clicks on position q in destination d predicted with estimates from likelihood simulator m in prediction set r for test set k, and \(n_{q,k}\) is the total number of clicks on position q observed in test set k.

Table 10 Mean RMSEs from cross-validation
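For concreteness, a small sketch (ours; the array names are hypothetical) of the \(RMSE_p\) and \(RMSE_s\) computation is given below.

```python
import numpy as np

def rmse_total(pred, obs):
    # pred: shape (K, R), predicted totals per test fold k and prediction set r;
    # obs: shape (K,), observed totals per test fold.
    # Inner root-mean-square over r, then average over the K folds,
    # as in the formulas above.
    return np.sqrt(np.mean((pred - obs[:, None]) ** 2, axis=1)).mean()
```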

Table 10 presents these RMSEs. In general, PM has the smallest RMSEs of the three likelihood simulators, demonstrating that the PM estimates best predict consumers’ search behavior. Although KSFS with the scaling factor equal to 1 shows \(RMSE_p\)’s that are smaller than PM’s across destinations, it is outperformed by PM in terms of the other RMSEs. Next, \(RMSE_s\) and \(RMSE_q\) from JM are consistently larger than those from PM, while a few \(RMSE_p\)’s from JM are smaller than those from PM. We note here that the out-of-sample prediction of JM improves as the assumed value of \(\sigma _e\) increases. This pattern reveals a potential problem of the grid search because it contradicts the pattern from the in-sample fit comparison across \(\sigma _e\)’s (see Table 8 in Web Appendix 2.1), in which the maximized log likelihood increases as \(\sigma _e\) decreases. This contradiction shows that researchers may arrive at different choices of \(\sigma _e\) depending on whether they use the in-sample fit or the out-of-sample prediction as the criterion.

To summarize the process needed to use JM in an empirical setting, one would have to estimate the same model with various candidate values of \(\sigma _e\) and compare the in-sample fit based on the maximized log likelihood; alternatively, one may perform and compare out-of-sample predictions for those \(\sigma _e\) values. Importantly, the conclusions based on the in-sample fit and the out-of-sample prediction may not coincide. If KSFS were to be used, a comparison of out-of-sample predictions would be needed to choose the proper scaling factor. These multiple steps required to obtain final estimates with JM and KSFS highlight the simplicity of PM, which requires only a single optimization.

6.5 Search cost and position effect elasticities of search

Now that we have established that PM predicts consumer search behavior out of sample better than KSFS and JM do, in this section, as an additional analysis, we examine how the number of searches on each position is affected by changes in the search cost \(c_{i,j}\) and the position effect \(\gamma _1\). We summarize the results with the search cost elasticity of search \(e_{c,j}\) and the position effect elasticity of search \(e_{\gamma _1, j}\). Since no closed-form solution for the search probability is available, we approximate these elasticities with a simulation-based method. First, for each impression, we simulate the search and purchase decisions 10,000 times with the PM estimates from Table 9 and count the number of clicks on each position. Then, we simulate the search and purchase decisions with the search costs \(c_{i,j}\) reduced by 10% and count the number of clicks on each position. We repeat this process for all impressions and approximate the elasticities as follows.

$$\begin{aligned} e_{c, j}&= \frac{\partial n_j}{\partial c_{i,j}} \times \frac{c_{i,j}}{n_j}\nonumber \\&\approx \frac{1}{N} \sum _{i=1}^N\frac{(\tilde{n}_{i,j} - n_{i,j})/n_{i,j}}{(\tilde{c}_{i,j} - c_{i,j})/c_{i,j}} \end{aligned}$$
(18)

where \(c_{i,j}\) is the simulated search cost for position j of impression i, and \(\tilde{c}_{i,j}\) is the perturbed search cost (i.e., \(c_{i,j}\) reduced by 10%). \(n_{i,j}\) is the predicted number of clicks on position j of impression i from 10,000 iterations at \(c_{i,j}\), and \(\tilde{n}_{i,j}\) is the same quantity simulated with \(\tilde{c}_{i,j}\). N denotes the total number of impressions.
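A compact sketch (ours; the click-count arrays are hypothetical outputs of the simulation just described) of the approximation in Eq. (18) is given below.

```python
import numpy as np

def search_cost_elasticity(n_base, n_pert, shrink=0.10):
    # n_base, n_pert: shape (N, J) click counts per impression and position,
    # simulated at baseline costs and at costs reduced by `shrink` (10%).
    # (c_tilde - c) / c = -shrink for every cost draw, so Eq. (18) reduces to
    # averaging the percentage change in clicks divided by -shrink.
    # (Positions with zero baseline clicks would need special handling.)
    return np.mean((n_pert - n_base) / n_base / (-shrink), axis=0)
```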

Similarly, the position effect elasticity of search \(e_{\gamma _1, j}\) is approximated as follows.

$$\begin{aligned} e_{\gamma _1, j} \approx \frac{1}{N}\sum _{i=1}^N \frac{(\tilde{n}_{i,j} - n_{i,j})/n_{i,j}}{(\tilde{c}_{i,j} - c_{i,j})/\gamma _1}\times p_j \end{aligned}$$
(19)

Note that we use inverse transform sampling for the search costs, whose derivative with respect to \(\gamma _1\) is given below.

$$\begin{aligned} c_{i,j}&= F_c^{-1}(q_{i,j}|\gamma _0, \gamma _1\times p_j), \; \text { where } q\sim U(0,1) \nonumber \\&= \gamma _1\times p_j - \gamma _0 \log (1-q_{i,j}) \nonumber \\ \frac{\partial c_{i,j}}{\partial \gamma _1}&= p_j \end{aligned}$$
(20)

where \(F_c^{-1}(\cdot |\gamma _0, \gamma _1\times p_j)\) is the inverse cdf of the shifted exponential distribution with inverse rate parameter \(\gamma _0\) and location parameter \(\gamma _1\times p_j\).

Fig. 3 Search cost and position effect elasticities of search

Figure 3 presents the approximated elasticities. First, interestingly, hotels in the top few positions show positive position effect elasticities. As the position effect becomes larger, the gap between the reservation utilities of upper-positioned hotels and those of lower-positioned hotels widens, so consumers become less willing to click on lower-positioned hotels; upper-positioned hotels therefore receive more clicks as the position effect increases. Second, except for the first few positions with positive position effect elasticities, both elasticities tend to increase in magnitude as the hotel position moves downward, implying that hotels in lower positions are more susceptible to changes in the search cost and the position effect. However, the position effect elasticity grows at a faster rate than the search cost elasticity as the hotel position moves down, and the position effect elasticity is at least 70% greater than the search cost elasticity at the bottom of an impression.

The search intermediary (Expedia in this context) can exploit this result in designing its sorting algorithm and/or search result display to induce clicks on different parts of an impression. For instance, if Expedia wants to increase the number of clicks in the lower part of impressions, then it will be far more effective to facilitate navigating down the search result page, thereby reducing the position effect, than to reduce the search cost. On the other hand, reducing the search cost and increasing the position effect at the same time will increase the number of clicks in the top positions more than taking each action separately will.

7 Conclusion and future research

In this paper, we propose a likelihood simulator for the sequential search model that leverages the assumption that search costs are stochastic and vary at the consumer-product level. Simulating the likelihood for SSM has been challenging because the joint probability of the search and purchase decision does not have a closed-form solution. The proposed method circumvents the problem by converting the distribution of reservation utilities to that of search costs and computing the probability of the inequalities involving the reservation utility with the search cost distribution. We lay out the procedure to simulate the likelihood and validate the proposed method with simulated data and field data.

In the simulation study validation, we show that PM precisely recovers the true parameter values and closely approximates the true probabilities of the search and purchase decisions. PM’s precise estimation of model parameters stems from two factors: 1) it allows two stochastic components, one responsible for the randomness in the reservation utility and the other for the randomness in the realized utility, and 2) the likelihood is computed with the distributions of those two components rather than with the raw proportion of random draws (as in CFS) or with such a proportion smoothed by a kernel that is not part of the model (as in KSFS). In addition, we show that PM works well with an alternative search cost distribution that is widely used in the literature. Lastly, PM allows researchers to examine the relative effectiveness of the marketing mix even when the model is mis-specified, as it precisely recovers the ratios of consumer preference parameters.

In the field data validation, we estimate SSM with the likelihoods constructed by PM, KSFS, and JM. Since the comparison of in-sample fit is not feasible across simulators, we perform cross-validation to demonstrate that PM produces better out-of-sample predictions on consumer search behavior than KSFS and JM do. Lastly, using the estimates from PM, we calculate the search cost elasticity and the position effect elasticity of search on each position and infer that the alternatives in the lower part of the ranking are more sensitive to changes in the search cost and the position effect. We also find that lower-positioned hotels are more sensitive to changes in the position effect than to those in the search cost.

This paper contributes to the search literature as it proposes a likelihood simulator that yields model parameter estimates without requiring any additional steps. In contrast, to calibrate the scaling factors for KSFS, one needs to first estimate the model with various potential scaling factors and compare the out-of-sample prediction results because the direct comparison of maximized likelihoods across scaling factors is not permitted. For JM, a grid search is required to identify the variance of the assumed pre-search error term. While one can compare either the maximized likelihoods or the out-of-sample prediction to choose the pre-search error term, the conclusion drawn from one criterion may differ from the conclusion drawn from the other, as in our field data validation. On the other hand, the proposed likelihood simulator precisely estimates the model parameters with a single optimization.

We see a potential avenue for future research from a methodological perspective. Since PM relies on the purchase decisions and the order of searches to identify the model parameters, devising a method to identify the model with only the purchase decision and the identities of searched products, without the search order, as in Honka and Chintagunta (2017), would be another contribution to the literature. From a substantive perspective, as existing studies that employ the sequential search model usually assume that consumers are aware of all products in the list, approaches that relax such an assumption, for instance, by incorporating consumers’ decisions to scroll and consumer learning about the distribution of observable characteristics, will more accurately reflect consumer search behavior and give researchers a deeper understanding of it.Footnote 26