Counterfactual approach
Our approach was to build counterfactual regression models, based on data from before the Dory movie release, for different stages of the purchase process for P. hepatus (Dory’s species). We used these models to project the expected value of each variable of interest after the movie release and compared the observed values with this counterfactual expectation. This approach aligns conceptually with the synthetic control method, previously used in conservation science to investigate the impact of protected areas on deforestation (Sills et al. 2015). We built three counterfactual models.
The first model focused on global Google search frequency, a type of indicator increasingly used in the biodiversity conservation literature to measure societal interest in a species or topic (Nghiem et al. 2016). This information-seeking behaviour fits into the research stage of the purchase funnel (Fig. 1), as it is an example of an action that may be undertaken before deciding whether to make a purchase.
The second model focused on aquarium fish imports into the US, a metric that reflects purchasing behaviour, the last stage of the purchase funnel model and the one that most closely relates to actual impacts on fish populations. Approximately 100 000 P. hepatus are imported yearly into the US from 25 countries, although the vast majority originate from Indonesia and the Philippines (www.aquariumtradedata.org). The species is the most traded in its family, accounting for about half of all imports of Acanthuridae fish into the USA. The species Acanthurus japonicus and Acanthurus leucosternon make up the majority of the remaining imports (www.aquariumtradedata.org). There are no restrictions on the trade in this species.
The third model focused on visitation patterns to US aquaria, as we aimed to explore other potential behavioural outcomes of the “decision” stage of the purchase funnel model (Fig. 1), which could have led to outcomes other than purchases. Because “Finding Dory” is a movie targeted at young audiences and P. hepatus is considered a species for advanced hobbyists, children or their parents may opt for an aquarium visit as a substitute for the time and resources needed to keep P. hepatus. In fact, past qualitative research into the motivations of visitors to aquaria in Canada has found that some visitors mention movies, including Finding Nemo, as motivations for visiting (Briseno-Garzon et al. 2007).
We fit each model in a Bayesian framework to allow us to treat the predictions probabilistically and derive conclusions about the probability of effects of the movie release on purchase or likelihood of purchase. We fit all models with the probabilistic programming language Stan (Carpenter et al. 2017) using the packages brms (Bürkner 2017) and rstan (Stan Development 2018) for the statistical software R (R Core Team 2018). In all cases, we sampled from the posterior with 2000 iterations on each of four chains and discarded the first half as warm-up. We checked that the chains were consistent with convergence by inspecting trace plots, ensuring that \( \hat{R} \) (the potential scale reduction factor) was less than 1.05 for all parameters, and ensuring that the minimum effective sample size \( n_{\text{eff}} \) was greater than 100 for all parameters (Gelman et al. 2014).
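The \( \hat{R} \) check can be illustrated with a minimal Python sketch of the split-\( \hat{R} \) statistic described in Gelman et al. (2014). This is an illustration only: the analysis itself was run in R with brms and rstan, and the function name `split_rhat` and the simulated chains below are hypothetical.

```python
import random
import statistics


def split_rhat(chains):
    """Split-R-hat for one parameter.

    `chains` is a list of equal-length lists of post-warm-up draws.
    """
    half = len(chains[0]) // 2
    # Split every chain in two so that drift within a chain is also detected.
    halves = []
    for chain in chains:
        halves.append(chain[:half])
        halves.append(chain[half:2 * half])
    n = half
    chain_means = [statistics.fmean(h) for h in halves]
    # W: mean of the within-chain variances.
    within = statistics.fmean([statistics.variance(h) for h in halves])
    # B: between-chain variance, scaled by the chain length.
    between = n * statistics.variance(chain_means)
    var_plus = (n - 1) / n * within + between / n
    return (var_plus / within) ** 0.5


rng = random.Random(1)
# Hypothetical example: four chains with 1000 retained draws each.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Two chains stuck around a different value, mimicking non-convergence.
stuck = [[rng.gauss(shift, 1) for _ in range(1000)] for shift in (0, 0, 3, 3)]

print("well mixed:", round(split_rhat(mixed), 3))
print("stuck:     ", round(split_rhat(stuck), 3))
```

Under the threshold stated above, the first set of chains would pass the check (\( \hat{R} < 1.05 \)) and the second would not.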
Online searches
For this analysis, we first selected the species that would form the counterfactual. We selected species that were biologically close to P. hepatus, that shared key physical traits, such as size and broadly similar vivid colours (Fig. S1), and that were traded in the ornamental fish market. These included tang species from the genera Acanthurus (A. coeruleus, A. leucosternon, A. nigricans, A. sohal and A. japonicus) and Zebrasoma (Zebrasoma xanthurum), both of which share the family Acanthuridae with Paracanthurus. We retrieved the relative global Google search frequency for each species’ scientific name, from January 1, 2014 to July 10, 2017 on a weekly basis, using the R package gtrendsR (Massicotte and Eddelbuettel 2018) on January 21, 2018 (Fig. S2). We used the scientific names as keywords to avoid overlaps in the common names of different species, for example two species sharing the common name “blue tang”. We included the option “low_search_volume = TRUE”, which also includes searches from regions of the world with lower search volume. We fit the models as beta regressions:
$$ \begin{aligned} y_{i} &\sim \text{Beta}\left( \mu_{i}, \phi \right), \\ \text{logit}\left( \mu_{i} \right) &= \mathbf{X}_{i} \boldsymbol{\beta}, \end{aligned} $$
where
$$ {\text{logit}}\left( {\mu_{i} } \right) = { \log }\left( {\frac{{\mu_{i} }}{{1 - \mu_{i} }}} \right), $$
and \( y_{i} \) represents the observed proportion of global Google searches for P. hepatus in week \( i \), \( \mathbf{X}_{i} \) represents a vector of predictors, with each element representing either the relative search frequency of a counterfactual species or a first-order interaction between these frequencies, and \( \boldsymbol{\beta} \) represents a vector of regression parameters including an intercept. We substituted 0.99 for the single week that had the maximum observed search frequency for P. hepatus (1.0) to avoid the complexity of a one-inflated beta regression, since the beta distribution cannot model 0s or 1s. We used the mean-precision parameterization of the beta distribution commonly used for regression purposes, which replaces the usual \( {\text{Beta}}\left( {p,q} \right) \) with \( {\text{Beta}}\left( {\mu ,\phi } \right) \) by setting \( \mu = p/\left( {p + q} \right) \) and \( \phi = p + q \) (Ferrari and Cribari-Neto 2004). We placed weakly informative priors of \( {\text{Normal}}\left( {0, 2} \right) \) on the slope coefficients, \( {\text{Normal}}\left( {0, 10} \right) \) on the intercept coefficient, and Half-Student-t(3, 0, 25) on the beta precision parameter \( \phi \).
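The link between the two parameterizations of the beta distribution can be made concrete with a short Python sketch. It inverts \( \mu = p/(p+q) \) and \( \phi = p+q \) to recover the shape parameters as \( p = \mu\phi \) and \( q = (1-\mu)\phi \). The function names and the example values of the linear predictor and precision are hypothetical and are not taken from the fitted model.

```python
import math


def inv_logit(eta):
    """Inverse of the logit link: maps a linear predictor to a mean in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def beta_shapes(mu, phi):
    """Convert mean-precision (mu, phi) to the usual shape parameters (p, q)."""
    return mu * phi, (1.0 - mu) * phi


def beta_variance(mu, phi):
    """Variance of Beta(mu, phi): shrinks as the precision phi grows."""
    return mu * (1.0 - mu) / (1.0 + phi)


eta = -1.0   # hypothetical value of the linear predictor X_i * beta
phi = 20.0   # hypothetical precision
mu = inv_logit(eta)
p, q = beta_shapes(mu, phi)

print(f"mu = {mu:.4f}, p = {p:.4f}, q = {q:.4f}")
print(f"variance = {beta_variance(mu, phi):.5f}")
```

Because the variance is \( \mu(1-\mu)/(1+\phi) \), larger values of \( \phi \) correspond to observations that cluster more tightly around the mean.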
Imports
To reflect purchase patterns, we used monthly import data for 2015 and 2016, provided by Quality Marine, one of the largest ornamental fish wholesalers in the US (www.qualitymarine.com). We focused on the US as it was the largest market for the movie Finding Dory and was, therefore, the country where an impact was most likely. For this analysis, we used the same counterfactual species and overall approach detailed in the previous section (Fig. S3). We excluded Z. xanthurum and A. sohal because the time series for these species were overwhelmingly dominated by zeros (Fig. S3). We modelled imports of P. hepatus with a negative binomial GLM (Generalized Linear Model) as:
$$ \begin{aligned} y_{i} &\sim \text{NegativeBinomial}\left( \mu_{i}, \phi \right), \\ \log \left( \mu_{i} \right) &= \mathbf{X}_{i} \boldsymbol{\beta}, \end{aligned} $$
where \( y_{i} \) represents the observed number of imports of P. hepatus in month \( i \), \( \mathbf{X}_{i} \) represents a vector of predictors, with each element representing the number of imports for a counterfactual species in that month, and \( \boldsymbol{\beta} \) represents a vector of regression parameters including an intercept. We used the “NB2” parameterization of the negative binomial, in which the variance increases quadratically with the mean \( \left( {{\text{variance}} = \mu + \mu^{2} /\phi } \right) \). That relationship is controlled by the dispersion parameter \( \phi \), with smaller values of \( \phi \) corresponding to more dispersed data (Hilbe 2011). We placed weakly informative priors of \( {\text{Normal}}\left( {0, 2} \right) \) on the slope coefficients, \( {\text{Normal}}\left( {0, 10} \right) \) on the intercept coefficient, and Half-Student-t(3, 0, 25) on the negative binomial dispersion parameter \( \phi \).
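The role of \( \phi \) in the NB2 parameterization can be checked numerically. The Python sketch below evaluates \( \mu + \mu^{2}/\phi \) for several values of \( \phi \) and maps \( (\mu, \phi) \) to the classical size and success-probability form of the negative binomial. The function names and the example mean are hypothetical and do not come from the import data.

```python
def nb2_variance(mu, phi):
    """Variance under the NB2 parameterization: mu + mu^2 / phi."""
    return mu + mu ** 2 / phi


def nb2_to_size_prob(mu, phi):
    """Map (mu, phi) to the classical form: size r = phi, success probability p."""
    return phi, phi / (phi + mu)


mu = 100.0  # hypothetical expected monthly import count
for phi in (1.0, 10.0, 100.0):
    r, p = nb2_to_size_prob(mu, phi)
    # The classical form has mean r(1 - p)/p and variance r(1 - p)/p^2.
    classical_mean = r * (1 - p) / p
    classical_var = r * (1 - p) / p ** 2
    print(f"phi = {phi:>5}: variance = {nb2_variance(mu, phi):.1f}, "
          f"classical mean = {classical_mean:.1f}, "
          f"classical variance = {classical_var:.1f}")
```

As \( \phi \) grows, the \( \mu^{2}/\phi \) term vanishes and the variance approaches the Poisson value \( \mu \).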
Aquarium visits
We used monthly aquarium attendance reported by 20 public and private institutions in the USA. These institutions varied considerably in size, with annual visitation ranging between 10 000 and 2 400 000 visitors. Because of a data sharing agreement, we present only an analysis of the visitor counts summed across all aquaria. We modelled the natural logarithm of the number of aquarium visits each month from January 1, 2006 to December 1, 2016 with an additive model as:
$$ \begin{aligned} \log \left( y_{i} \right) &\sim \text{Normal}\left( \mu_{i}, \sigma^{2} \right), \\ \mu_{i} &= \mathbf{M}_{i} \boldsymbol{\beta}_{1} + s\left( T_{i} \right), \end{aligned} $$
where \( y_{i} \) represents aquarium visits, in units of 100,000, at time point \( i \), \( \mathbf{M}_{i} \) represents a factor predictor for month (either a vector of 0s for the base month, January, or a vector of 0s with a single 1 indicating the respective month), and \( s\left( {T_{i} } \right) \) represents a smooth function over time. We represented time \( T \) as the decimal date (e.g. February 1st, 2014 = 2014.085), standardized by centring on its mean and scaling by two times its standard deviation to produce coefficients on an appropriate scale for the priors. The symbol \( \boldsymbol{\beta}_{1} \) represents a vector of coefficients and \( \sigma \) represents the residual standard deviation. We placed weakly informative priors of \( {\text{Normal}}\left( {0, 2} \right) \) on the slope coefficients, \( {\text{Normal}}\left( {0, 10} \right) \) on the intercept coefficient, and Half-Student-t(3, 0, 2) on \( \sigma \) and on the standard deviation of the parameter describing the wiggliness of the spline smoother (Bürkner 2017).
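The construction of the time covariate can be sketched in Python as follows. The sketch assumes that the decimal date is the year plus the fraction of that year elapsed at the start of the day, and that the standard deviation is the sample standard deviation; both are assumptions about the details, and the function names are hypothetical.

```python
from datetime import date
import statistics


def decimal_date(d):
    """Year plus the fraction of that year elapsed at the start of day d."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days


def standardize_2sd(values):
    """Centre on the mean and divide by two sample standard deviations."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / (2 * sd) for v in values]


# First day of every month from January 2006 to December 2016.
months = [date(year, month, 1) for year in range(2006, 2017) for month in range(1, 13)]
t = [decimal_date(d) for d in months]
t_std = standardize_2sd(t)

print(round(decimal_date(date(2014, 2, 1)), 3))  # matches the example in the text
print(len(months), round(min(t_std), 3), round(max(t_std), 3))
```

After this scaling the covariate has a mean of 0 and a standard deviation of 0.5, which keeps the associated coefficients on a scale compatible with the \( {\text{Normal}}\left( {0, 2} \right) \) priors.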