R package DCchoice for dichotomous choice contingent valuation: a contribution to open scientific software and its impact

The R package DCchoice is designed to mitigate programing-related barriers to the application of dichotomous choice contingent valuation (DCCV) methods in empirical studies. Since its release in 2014, DCchoice has been updated. This paper introduces the current version of DCchoice which supports single-, one-and-one-half-, and double-bounded DCCVs, with and without a spike. Additionally, the willingness-to-pay and its confidence intervals can be calculated for a representative respondent as well as for a user-defined specific respondent using the current version. The associated web tutorial and R Commander plug-in for basic usage of DCchoice are also available. DCchoice has advanced DCCV applications in various fields.

source software (F/OSS) occupies an especially important role within the statistical and data science field. F/OSS enhances the reproducibility of research results, allows innovations in techniques and approach to be quickly validated and shared, and provides the basis for educating the next generation of researchers and analysts across many fields (Carmichael & Marron, 2018). There are various languages and/or software products released under F/OSS licenses, but R (R Core Team, 2021) is the most popular dedicated statistical computer language. 1 R is used widely across all science disciplines, and also in the humanities when data analysis constitutes a major activity (for the evolution of R, see Chambers, 2020). The popularity of R is fundamentally linked to the ease of installation, and its expandability, both of which are facilitated by the Comprehensive R Archive Network (CRAN). CRAN is the official R site for distributing R and its add-on packages. R can be expanded by installing an add-on package, which is a collection of functions, dataset, and help documents for a specific topic. CRAN connects package users and developers effectively: users can easily install add-on packages to their computer from CRAN by executing a single command, and developers can publish their packages to their (potential) users via CRAN (for the features and roles of CRAN in detail, see, Hornik, 2012;Mora-Cantallops et al., 2020). As of 23 May 2022, 18,591 packages were available on CRAN.
DCchoice  is an add-on R package that provides functions for analyzing responses to contingent valuation (CV) questions available from CRAN. CV involves asking people about their willingness-to-pay (WTP) for a good or service, and CV has been used extensively to value non-market goods and services (Carson, 2012). Applications of CV methods are varied, and this diversity can be seen across examples considering: the benefits of water quality improvement (Whitehead, 1995(Whitehead, , 2015; consumer values for food labeling information (Loureiro & Hine, 2004); citizen value of vaccines (Gracía & Cerda, 2020); and patient value of time (Verbooy et al., 2018).
CV methods can be implemented in a number of ways (Carson & Hanemann, 2005), but the most popular approach is the dichotomous choice contingent valuation (DCCV) method. The classical DCCV implementation is the single-bounded (SB) DCCV method (Bishop & Heberlein, 1979), and this approach asks respondents whether they would be willing to pay a 'bid' amount of money for a change in a good or service. Varying bid values across respondents reveals the WTP for the change.
The first version of DCchoice was released from R-Forge in 2014, and since 2015 revised versions have been distributed from CRAN. DCchoice was developed to mitigate programing-related barriers associated with implementing DCCV methods for empirical studies. Both parametric and non-parametric methods have been used to analyze the responses to DCCV questions (Carson & Hanemann, 2005). Parametric approaches generally fit discrete choice models to the dataset using a maximum likelihood estimation method. SB-DCCV survey data can be analyzed with standard logit or probit model specifications, However, when analyzing the responses to other format questions users typically need to write code chunks to: (a) define the log-likelihood function corresponding to the model to be fitted, and (b) search parameter values that maximize the log-likelihood function under the responses using an optimization function. The non-parametric estimation approach also needs support from software packages because, for example, the proportion of YES for each bid should be adjusted when the relationship between bids and the proportion of YES is a non-monotonic sequence. Additional lines of code are also needed to calculate the mean/median WTP and associated confidence intervals from the fitted models, even when using the SB-DCCV model. Writing the required code to estimate DCCV models is thus challenging, and the need for this technical skill is a barrier to implementing DCCV.
An early version of DCchoice is detailed in Aizaki et al. (2014), but over time the features of DCchoice have been substantially updated in response to user requests; the available models have been expanded; and the functionality regarding the calculation of WTPs extended. This paper provides an overview of DCCV methods (Sect. 2); describes the current version of DCchoice, and documents the differences between the initial and current versions of the package (Sect. 3); illustrates how to use DCchoice (Sect. 4); provides evidence on the impact of DCchoice on CV applications (Sect. 5); and then presents some concluding comments (Sect. 6).

Outline of DCCV formats
In addition to the SB-DCCV format, DCCV has two other major variants: the doublebounded (DB) DCCV format (Carson, 1985;Hanemann, 1985); and the one-and-onehalf-bounded (OOHB) DCCV format (Cooper et al., 2002). Each model can also be implemented either with or without a spike at zero (Kriström, 1997). A model with a spike at zero allows for respondents to have zero WTP for the good because they are not in the relevant 'market'. A spike model is analogous to the corner solution outcome in private good markets, where even when the price is zero, the good is not purchased.
The SB-DCCV asks respondents whether they would be willing to pay a bid for a change in a good or service (Fig. 1a). The bid is randomly assigned from a list of bid values defined in advance. The relationship between the bids (BID1) and their responses (proportion of YES [R1 = 1] responses) can be modeled statistically. The model reveals the WTP for a change in the good or service.
The DB-DCCV involves respondents answering a DCCV question in SB format twice (Fig. 1b). Although the bid shown to respondents in the first stage (BID1) is assigned randomly from the list of bids in the same way as the SB-DCCV, the bid in the second stage is determined by the response to the first stage. Specifically, the second bid value is higher (BID2H) than the first when the first response is YES (R1 = 1) they are willing-to-pay the amount, and lower (BID2L) than the first, when the first response is NO (R1 = 0). Thus, there are four possible patterns of responses to the DB-DCCV question: YES-YES (R1 = R2 = 1), YES-NO (R1 = 1, R2 = 0), NO-YES (R1 = 0, R2 = 1), and NO-NO (R1 = R2 = 0).
With the OOHB-DCCV format (Fig. 1c), after answering the first stage of the DCCV question, only respondents who satisfy a condition are requested to answer the second-stage question in the SB-DCCV format. The steps of the OOHB question are as follows: (1) a list of bid ranges [BIDLj, BIDHj] (j = 1, 2, …, J), where BIDLj < BIDHj, Fig. 1 Steps in single-bounded-, double-bounded-, and one-and-one-half-bounded dichotomous choice contingent valuation questions with and without a spike question (gray background) is designed; (2) one of the bid ranges is randomly assigned to the respondent; (3) the respondent is asked whether they would pay the amount (the first stage) randomly selected from the two bids; and (4) they are asked to answer the second stage if they satisfy either condition-(a) their answer in the first stage is YES (R1 = 1) when the lower bid (BIDL) is presented in the first stage; and (b) their answer in the first stage is NO (R1 = 0) when the higher bid (BIDH) is presented in the first stage. There are then six possible responses to the OOHB-DCCV question: NO (R1 = 0), YES-NO (R1 = 1, R2 = 0), and YES-YES (R1 = R2 = 1) when the lower bid is shown in the first stage; and YES (R1 = 1), NO-YES (R1 = 0, R2 = 1), and NO-NO (R1 = R2 = 0) when the higher bid is shown in the first stage.
The spike model implementation assumes a non-zero probability of a zero WTP for a good or service, and a zero probability of a negative WTP (Kriström, 1997). To implement this model variation an additional question asking respondents whether their WTP is positive (S = 1) or not (S = 0) is used. Only respondents who have a positive WTP are asked to answer the DCCV question. The spike model assumption has been considered in SB-, DB-, and OOHB-DCCV studies (Kriström, 1997;Yoo & Kwak, 2002;Kwak et al., 2013).

Roles of DCchoice
DCchoice provides functions for preparing a dataset, fitting models, and calculating the confidence intervals of WTP estimates (Fig. 2). A raw dataset, including responses to the DCCV question, where rows corresponding to respondents (a respondent per row) and columns corresponding to variables or questions, is prepared using a spreadsheet software package. The raw dataset then is read into R as a data frame. If the format of the raw dataset is a contingency-table of bids and responses to the DCCV question, a function in DCchoice is used to convert the data into a data frame in respondent per row format.
There are six functions for parametric estimation: the three main formats SB-, DB-, and OOHB-DCCV, each with and without a spike. The function for parametric SB-DCCV estimation without a spike uses the function glm in the package stats, while the remaining five functions depend on the function optim in stats. Functions for nonparametric estimation approaches for SB-, DB-, and OOHB-DCCV responses follow the Kaplan-Meier-Turnbull (KMT) method (Carson & Hanemann, 2005), which is implemented using the function icfit in package interval (Fay & Shaw, 2010) internally. A function for Krsitröm's non-parametric estimation method for SB-DCCV data (Kriström, 1990) is also provided.
Outputs from the parametric and non-parametric estimation functions have corresponding object classes such as "sbchoice" and "turnbull.sb". The print, summary, and plot methods for these object classes are provided. Applying the summary method to the output from the estimation functions returns WTP estimates.
With parametric estimation the confidence intervals for WTPs are calculated using the functions krCI and bootCI, where krCI implements Krinsky and Robb's parametric bootstrap approach (Krinsky & Robb, 1986, 1990 and bootCI implements the non-parametric bootstrap approach. Both approaches simulate a large number of parameter vectors, construct empirical distributions of the WTP from the replicated Fig. 2 Workflow of analyzing DCCV data with DCchoice parameter vectors, and find the confidence intervals of the WTP according to the corresponding percentiles of the empirical distribution. Specifically, the Krinsky and Robb approach replicates parameter vectors from a multivariate normal distribution with the estimated parameters, and variance and covariance matrix of the estimated parameters, while the non-parametric bootstrap approach creates datasets by resampling the original dataset and fits the model to the bootstrapped datasets, resulting in replicated parameter vectors. Table 1 provides a reconciliation of the functions in the initial and current versions of DCchoice. The function ct2df, which converts a dataset that is in contingencytable format to a dataset in one respondent per row format was unavailable in the initial  Lim et al., 2014) to be prepared, and so is helpful for creating DCCV examples for teaching. The initial version of DCchoice had functions for fitting parametric SB-and DB-DCCV models (without a spike), and non-parametric models. The current version has additional functions for fitting OOHB-DCCV parametric and non-parametric models, and spike SB-, DB-, and OOHB-DCCV parametric models. This is a material increase in package functionality. In the current version of DCchoice, confidence intervals for WTP estimates from non-parametric KMT models can also be calculated. The overall improvement in the types of models that can be fitted is one of the main user benefits available due to updating DCchoice.

DCchoice: from initial to current version
In the initial version of DCchoice, functions for calculating the WTP and associated confidence intervals assumed only a representative individual, and used mean values of covariates in the estimated model when solving for key values. The current version of DCchoice can calculate the WTP and associated confidence intervals for a userdefined individual, expressed via combinations of different values of the covariates (see Sects. 3.3 and 4 for details). This functionality enables the comparison of WTPs for a target good or service among various types of respondents. As such comparisons are highly valuable in empirical CV studies, this change represents a substantial increase in package functionality.

Main functionalities of DCchoice
This section details the functions for fitting parametric models and calculating the confidence intervals of WTPs, which is typically the primary focus in any DCCV study.
As detailed in Table 2, the ordinal (non-spike) and spike SB-, DB-, and OOHB-DCCV functions have common arguments. Under the general situation of applying DCCV methods, these models can be fitted by setting only three arguments: dist, A character string setting the error distribution in the model, which takes "logistic", "normal", "log-logistic", "log-normal", or "weibull". This is invalid for spike model functions par A vector of initial parameters over which the optimization is carried out. This is invalid for the SB-DCCV function data, and formula. For non-spike models, five error distributions are available: logistic, log-logistic, normal, log-normal, and Weibull. For spike models, the error distribution is limited to the logistic distribution, and so the argument dist is not available for spike models. Most empirical studies using spike models assume a logistic distribution. A data frame containing the variables used for fitting models is assigned to the argument data. The model formula, which is assigned to the argument formula, must be expressed according to the class "Formula" (Zeileis & Croissant, 2010) that extends the base class "formula". For the non-spike model functions sbchoice, dbchoice, and oohbchoice, the model formula consists of three parts. The first part is the left-hand side of the tilde (~) and contains the response variable(s). For the SB format, the response variable is a single variable, such as R1 in Fig. 1a. For the DB and OOHB formats, there are two response variables, corresponding to the two stages of the DCCV question, such as R1 and R2 in Fig. 1bc. The right-hand side of the tilde is divided into two parts with a vertical bar (|): the part before the vertical bar contains covariates such as respondent characteristic variables, and the part after the vertical bar contains a single bid variable such as BID1 for the SB (Fig. 1a), two bid variables such as BID1 and BID2 for the DB (Fig. 1b), or BIDL and BIDH for the OOHB (Fig. 1c). For example, when the covariates are respondent age and gender, the model formula for sbchoice, dbchoice, and oohbchoice are given, respectively, as: where AGE is a variable detailing the respondent's age; and FEM is a dummy variable taking the value 1 for female respondents and 0 otherwise. When there are no covariates, the corresponding part contains only a value of 1, as follows: For spike models, the part to the left of the tilde is divided into two parts with a vertical bar: the part before the vertical bar contains the response variable(s), and the part after the vertical bar contains a dummy variable showing the response to the spike question regarding whether their WTP is positive or not (i.e., the variable S in Fig. 1). For the spike models, the model formula for the models with the covariates mentioned above is modified as follows: The functions krCI and bootCI for calculating the confidence intervals of WTPs have common arguments, except for nsim and nboot (Table 3). The effects of the two arguments are, however, similar: the value assigned to the argument corresponds to the number of simulated WTPs. The output from the two functions contains vectors of simulated WTPs (the length of the vectors corresponds to the value assigned to the argument nsim or nboot) and a matrix of point estimates and confidence intervals of WTPs. The corresponding print method displays only the point estimates and confidence intervals of the WTPs. A significant feature of each function is the calculation of WTPs and confidence intervals for a specific individual, when a data frame of covariates containing values corresponding to the individual is assigned to the argument individual (see the example below).

Simple example
This section demonstrates how WTP for improving the environmental conditions in a location can be estimated with each spike model. We assume that respondents are selected randomly from the residents of the location, and DCCV questions in three formats, SB, DB, and OOHB, are used. The datasets are modified versions of CarsonSB, CarsonDB, and oohbsyn that are available with DCchoice. Code chunks for creating the datasets, model fitting, and estimating WTPs are available in the online supplementary file. Additional examples are available via a free web tutorial (Aizaki & Fogarty, 2019) and the help in DCchoice . The variables used in the example are the same as those defined in the previous sections, however, bid variables are typed in lowercase letters, such as bid1. Figure 3 displays the code chunks for the SB spike model and the outputs. After fitting the model (line 1), executing the summary method with the output from sbspike (line 2) returns the estimates and the corresponding statistics, summary statistics of the model, and point estimates of mean and median WTPs for a representative respondent (lines 3-31). The function spikeCoef calculates and returns a spike coefficient using the output from the spike model function (lines 32-34). Note, a value of one minus the spike coefficient corresponds closely to the y-axis intercept (Kriström, 1997). Applying the function krCI to the output from the spike Fig. 3 Fitting a SB spike model, calculating a spike coefficient, estimating mean and median WTPs, and plotting the fitted model model function under three different settings of the arguments calculates the mean and median WTPs and their confidence intervals for the representative respondent (lines 35-39), female respondent of 30 years old (lines 40-44), and male respondent of 60 years old (lines 45-49), respectively. The plot method draws the probability of paying the assigned bid for the representative respondent (line 50), ranging from 0 to the maximum bid value (Fig. 4). For the results of the DB and OOHB spike models, execute the supplementary file in R.

Impact
In response to user requests, DCchoice has been substantially updated and improved since its first release in 2014. The major updates are summarized as follows: (1) addition of a function for fitting OOHB-DCCV models; (2) addition of functions for fitting SB, DB, and OOHB spike models; and (3) the ability to calculate WTP and confidence intervals for respondents with different characteristics. There is an additional package mded (Aizaki, 2015) that is available to test whether a WTP differs from another on the basis of empirical distributions of the WTPs (Poe et al., 1997(Poe et al., , 2005 that are provided using krCI and bootCI in DCchoice (Lorber et al., 2021). Accordingly, using DCchoice with other CRAN packages makes R a free platform for implementing comprehensive DCCV analysis. A CRAN package RcmdrPlugin.DCCV (Aizaki, 2021), which is a plug-in package for R Commander (Fox, 2005(Fox, , 2017Fox & Bouchet-Valat, 2020), has been developed to enable the use of some parametric functions in DCchoice without writing R code chunks. DCchoice also serves as a base package for developing functions for fitting new models (Qiu et al., 2020) and calculating the confidence intervals of WTPs more quickly (Howard, 2018). Furthermore, a free web tutorial for DCchoice (Aizaki & Fogarty, 2019) is provided under the Creative Commons license. Thus, DCchoice plays a core role in the activities that induce new material for DCCV applications. The number of DCchoice downloads from the RStudio CRAN mirror continues to increases (Fig. 5) and the package is increasingly used in published research (Lin et al., 2017;Lorber et al., 2021;Morello et al., 2019;Mostafa, 2016;Tokunaga et al., 2020;Whitehead, 2018), education (Polomé, 2020), and practice (Turpie et al., 2017). These applications indicate the impact of DCchoice in various fields.

Conclusion
DCchoice is easily installed from CRAN. The functions in DCchoice are designed to be accessible to novice R users, via an R Commander plug-in. For more advanced applications, the model formula differs only slightly from that used in linear regression models, such as the formula structure used for the lm function in the stats package, and is consistent with many other extension packages. DCchoice has a wide range of applicability in various fields and is updated regularly. A web tutorial also serves to provide additional user support, and assist with the adoption of DCCV methods.