Introduction

Catch per unit effort (CPUE) is commonly used to reflect the annual trends in stock abundance (N) and is expressed using a general relationship: CPUE = qN, where q represents a constant catchability coefficient (Maunder and Punt 2004). Nominal CPUE, however, is seldom proportional to actual annual trends in stock abundance, as nominal CPUE includes bias and error (Maunder and Punt 2004). The assumption of being proportional is also violated if a fishery frequently changes the target strategy during the year through changes in the operation area and season as well as changes in gear configurations that commonly affect the catchability coefficient. The effects of such various factors must therefore be removed from the nominal CPUE. This manipulation, referred to as CPUE standardization (Maunder and Punt 2004), is typically achieved using generalized linear models (GLMs), generalized linear mixed models (GLMMs), and generalized additive models (GAMs) (Zuur et al. 2009).

Many fisheries commonly capture multiple species simultaneously. For example, longline fisheries generally target a particular species in a fishing operation and typically use a specific fishing strategy that includes operational area, depth of set, gear material, and/or bait type (Hoyle et al. 2014). This targeting behavior is referred to as “target strategy” in this study. Although logbook records of longline fisheries provide a variety of information about their fishing practices, target species are seldom recorded. This fact makes it difficult to directly remove the effects of changes in the target strategy and achieve CPUE standardization for the longline fishery.

Researchers proposed several methods that incorporate a target strategy in the model as an explanatory variable (Biseau 1998; Carvalho et al. 2010; Chang et al. 2011; He et al. 1997; Hiraoka et al. 2016). The number of hooks per basket (HPB) is commonly used to identify the target species in CPUE analysis (Hoyle et al. 2014), because the number of HPB strongly correlates with the depth of hook distribution (Bigelow et al. 2006), which in turn affects the nominal CPUE of different species. For example, Japanese longline fisheries frequently use deep-sets for tunas and shallow-sets for swordfish Xiphias gladius and blue shark Prionace glauca (Hiraoka et al. 2016). Hoyle et al. (2014), however, questioned the usage of HPB as an indicator of target species for a number of reasons, one being that the number of HPB was available only for some limited periods and fisheries.

The catch composition is also commonly used to estimate a target strategy (Biseau 1998; Carvalho et al. 2010; Chang et al. 2011; He et al. 1997). There is a concern, however, that its direct use may cause bias in the CPUE analysis, as composition data include information on stock abundance and these data are used as the response variable (Chang et al. 2011).

A ranking method (RM) was also developed and applied to CPUE standardization for yellowfin tuna Thunnus albacares caught by a Taiwanese deep-sea longline fishery in the tropical Indian Ocean (Chang et al. 2011). The aim of this method was to extract the targeting operation, because the catchability coefficient of targeting operations is higher than that of non-targeting operations, and the targeting operations would be ranked higher when these operations are arranged in descending order based on their respective annual operational catch amounts (Chang et al. 2011; Hiraoka et al. 2016). Although RMs have been applied to actual stock assessments (Hiraoka et al. 2016), the performance of adjusting the target effect has not yet been evaluated.

A finite mixture model (FMM) is also used to standardize the CPUE. The FMM combines two or more probability density functions (McLachlan and Peel 2004). Because the properties of the individual probability density functions can be combined in a mixture model to simultaneously estimate target species and annual trends in stock abundance, we focus on the potential of the FMM in this study. The FMM was applied to an Irish mid-water pair trawl fleet targeting albacore tuna Thunnus alalunga in the northeastern Atlantic Ocean (Cosgrove et al. 2014). The basic assumption behind the FMM is that different fleet strategies, such as those of fisheries that search for fish and congregate at fish aggregations, cause different catchability coefficients.

The directed residual mixture (DRM) method (Okamura et al. 2017) was developed to adjust the target effect based on the FMM, which assumes that the components of the mixture (i.e. modes of targeting) have different covariate relationships with the catchability coefficient. The DRM method outperformed several traditional methods, including cluster analysis (He et al. 1997), in a simulation study (Okamura et al. 2017). Our concern, however, is that the DRM method assumes a lognormal model, which is appropriate for catch weight data or for a fishery such as a trawl fishery in which the catch amount is large. The performance of lognormal models has never been investigated for count data from longline fisheries with low catch numbers.

Although several methods for target effect in CPUE analysis have been developed and applied to actual stock assessments as mentioned above, a performance evaluation in terms of ability to identify the target strategy and estimate the abundance trends has never been conducted in a cross-sectional manner. The suitability of these methods needs to be compared using numerical simulations to manifest the potential biases and variances that may be introduced under a range of different multispecies abundance trends (Hoyle et al. 2014).

Our objectives are to (1) use an FMM to investigate the possibility of identifying the target strategy and estimating the abundance indices, (2) use a simulation study to compare the performance of the FMM with commonly used alternative methods in the presence of target and area effects, and (3) discuss the potential of each method as a suitable methodology in the modeling of CPUE standardization for longline fisheries targeting multiple species.

Materials and methods

A numerical simulation is used to evaluate the accuracy and precision of the annual trends in CPUE estimates for each method. Robustness and limitations of each method are also investigated with consideration given to target change over years, multispecies abundance trends, and different catchability by operational areas (e.g., eastern and western areas). Figure 1 provides a flowchart of the methodology of the numerical simulation.

Fig. 1 Flowchart of the simulation study

Definition of target strategy

Catch data are generated by mimicking actual Japanese longline fishery data at Kesennuma, on the Pacific coast in northeastern Japan. The main targets of this fishery are swordfish, blue shark, and tunas such as bigeye Thunnus obesus and albacore. The gear configurations such as line materials and depth of hooks are tailored to the target species (Hiraoka et al. 2016). It is therefore assumed that the longline fishery frequently changes the target strategy and that the target species are mutually exclusive. Based on these assumptions, the fishing operations at this fishery are grouped into three target strategies:

  1. Strategy I: Target species is Species 1 (i.e. blue shark) only,

  2. Strategy II: Target species is Species 2 (i.e. swordfish) only,

  3. Strategy III: Target species is neither Species 1 nor 2.

The proportions of the three target strategies in year i (\(p_{i,\text{Strategy I}}\), \(p_{i,\text{Strategy II}}\), and \(p_{i,\text{Strategy III}}\); \(\sum_{\theta} p_{i,\theta} = 1\), \(\theta \in \{\text{Strategy I}, \text{Strategy II}, \text{Strategy III}\}\)) are given based on skipper records of the longline fishery in Kesennuma between 2004 and 2008. The target strategy of each operation is assigned using the Monte Carlo method assuming a multinomial distribution with probabilities \(p_{i,\text{Strategy I}}\), \(p_{i,\text{Strategy II}}\), and \(p_{i,\text{Strategy III}}\). The annual transition of target species is expressed using a linear equation:

$$p_{{i,{\text{Strategy I}}}}^{{}} = a + b{\text{Year}}_{i} ,$$
(1)

where a and b are fixed values of the intercept and slope, respectively. Based on the skipper records, the first-year value \(p_{1,\text{Strategy I}}\) is arbitrarily set to 1.5 times the average ratio of operations targeting blue shark (0.514 \(\times\) 1.5 = 0.772), and this value decreases linearly from 0.772 to 0.257 over the 10 years, i.e., a = 0.828, b = − 0.057. The major target strategy therefore shifts gradually from Strategy I to Strategy II over the years (Fig. 2). The ratio of Strategy III, \(p_{i,\text{Strategy III}}\), is fixed at 0.100 over 10 years across all scenarios (Fig. 2). The ratio of Strategy II is given as \(p_{i,\text{Strategy II}} = 1 - (p_{i,\text{Strategy I}} + p_{i,\text{Strategy III}})\).
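The strategy proportions and the per-operation assignment can be sketched as follows. This is a minimal illustration using the values of a, b, and the Strategy III share given in the text; the function names and the random seed are hypothetical, and `random.choices` stands in for one multinomial draw per operation.

```python
import random

A, B = 0.828, -0.057      # intercept and slope of Eq. 1 (values from the text)
P_STRATEGY_III = 0.100    # fixed share of Strategy III

def strategy_proportions(year):
    """Return (p_I, p_II, p_III) for simulation year 1..10."""
    p1 = A + B * year
    p3 = P_STRATEGY_III
    p2 = 1.0 - (p1 + p3)  # Strategy II takes the remainder
    return p1, p2, p3

def assign_strategies(year, n_ops, rng):
    """Draw one strategy label per operation with the year's proportions."""
    labels = ["I", "II", "III"]
    return rng.choices(labels, weights=strategy_proportions(year), k=n_ops)

rng = random.Random(1)    # hypothetical seed, for reproducibility only
for year in (1, 10):
    p1, p2, p3 = strategy_proportions(year)
    print(year, round(p1, 3), round(p2, 3), round(p3, 3))
labels_year1 = assign_strategies(1, 10_000, rng)
```

Strategy II is computed as the remainder so that the three proportions sum to one in every year.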

Fig. 2 Annual changes in the proportion of different strategy groups

Data generation

The annual abundance trends over the 10-year period are generated using the following equation:

$$\begin{gathered} N_{i + 1, \rho } = N_{i, \rho } \times e^{{r_{\rho } }} , \hfill \\ i \in \left\{ {1,2, \ldots , 9} \right\}, \hfill \\ \rho \in { }\left\{ {{\text{Sp}}1,{\text{ Sp}}2} \right\} \hfill \\ \end{gathered}$$
(2)

where \(N_{i, \rho }\) is the abundance of Species 1 or 2 in year i, and \(r_{\rho }\) is the annual change rate of abundance. Two scenarios of annual abundance trends (i.e. increasing or decreasing) are assumed for each of Species 1 and 2. \(N_{10, \rho }\) is set to be 30% larger than \(N_{1, \rho }\) in the increasing scenario and 30% smaller than \(N_{1, \rho }\) in the decreasing scenario. The ratio of 30% is given in accordance with the actual minimum and maximum ranges of abundance estimates for swordfish and blue shark in the North Pacific (Brodziak and Ishimura 2011; Hiraoka et al. 2016). Combining the two species gives a total of four scenarios (i.e. increasing and increasing, increasing and decreasing, decreasing and increasing, and decreasing and decreasing) (Fig. 3).
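As an illustration of Eq. 2, the sketch below derives the annual rate from the stated 30% change over the nine yearly steps, which gives r = ln(1.3)/9 for the increasing scenario and r = ln(0.7)/9 for the decreasing one. The starting abundance of 1.0 is an assumption made here for illustration (a relative scale); the text does not report the value of N in year 1.

```python
import math

N_YEARS = 10

def growth_rate(relative_change):
    """Annual rate r such that N_10 = N_1 * (1 + relative_change)."""
    return math.log(1.0 + relative_change) / (N_YEARS - 1)

def abundance_series(n1, relative_change):
    """Abundance for years 1..10 following N_{i+1} = N_i * exp(r)."""
    r = growth_rate(relative_change)
    series = [n1]
    for _ in range(N_YEARS - 1):
        series.append(series[-1] * math.exp(r))
    return series

# n1 = 1.0 is an assumed relative starting abundance
increasing = abundance_series(1.0, +0.30)
decreasing = abundance_series(1.0, -0.30)
```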

Fig. 3 Scenarios of annual trends in abundance (increasing, decreasing) for Species 1 (Sp1; solid line) and Species 2 (Sp2; dashed line) in the simulation study. Panels (1)–(4) show the scenarios (Table 3)

The mean catch amount (\(\mu_{i,\text{Sp1}}\), \(\mu_{i,\text{Sp2}}\)) is given using the following general relationship (Maunder and Punt 2004):

$$\begin{gathered} \mu_{i, \rho ,\kappa ,\alpha } = q_{\rho } z_{\kappa ,\rho } D_{\alpha ,\rho } N_{i,\rho } E, \hfill \\ i \in \left\{ {1,2, \ldots , 10} \right\}, \hfill \\ \rho \in \left\{ {{\text{Sp}}1,{\text{ Sp}}2} \right\}, \hfill \\ \kappa \in \left\{ {{\text{tar}},{\text{ non}}} \right\}, \hfill \\ \alpha \in \left\{ {{\text{Area}}1,{\text{ Area}}2} \right\} \hfill \\ \end{gathered}$$
(3)

where \(q_{\rho }\) is the catchability coefficient per hook with neither target effect nor area effect, \(z_{\kappa ,\rho }\) represents the effect of targeting behavior on \(q_{\rho }\), \(D_{\alpha ,\rho }\) is the relative density by area, and E is the fishing effort. Descriptions of the notations are shown in Table 1. The value of \(q_{\text{Sp1}}\) was set to 0.004 and the value of \(q_{\text{Sp2}}\) was set to 0.002. These values are based on data from longline fisheries that mainly catch blue shark and swordfish (Hiraoka et al. 2016; Brodziak and Ishimura 2011).

Table 1 Descriptions of the notations used in the simulation study

The subscript \(\kappa\) (tar or non) indicates whether the species is targeted or not targeted, respectively. The values of \(z_{\text{tar},\text{Sp1}}\) and \(z_{\text{tar},\text{Sp2}}\) were set to 1, and \(z_{\text{non},\text{Sp1}}\) and \(z_{\text{non},\text{Sp2}}\) were set to 0.2.

The subscript \(\alpha\) indicates fishing area (Area1 or Area2), and \(D_{\alpha ,\rho }\) represents the relative density of species \(\rho\) in that area. The area of each operation is assigned by a Bernoulli trial with success probability \(v_{i}\), the expected proportion of operations in Area1 in year i, and \(D_{\alpha ,\rho }\) takes the value of the assigned area. The value of \(v_{i}\) varies over the years to simulate the yearly transition of fishing areas:

$$v_{i} = c + d{\text{Year}}_{i} ,$$
(4)

where c and d are the intercept and slope, respectively. We used three scenarios of area effect (A, B, and C). In all three scenarios, the major target strategy shifts from Strategy I to Strategy II over the years, and the ratio of Strategy III is fixed over years. In Scenario A, we assume no area effect, i.e. \(D_{{{\text{Area}}1, \rho }} = 1\) and \(D_{{{\text{Area}}2, \rho }} = 1\). In Scenarios B and C, we assume \(D_{{{\text{Area}}1, \rho }} = 0.5\) and \(D_{{{\text{Area}}2, \rho }} = 1\). In Scenario B, the value of \(v_{i}\) is fixed at 0.5 over years, i.e. c = 0.5 and d = 0. In Scenario C, the value of \(v_{i}\) increases on a yearly basis from 0.1 to 0.9, i.e. c = 0.011 and d = 0.089. The difference between the variables \(z_{\kappa ,\rho }\) and \(D_{\alpha ,\rho }\) is that the former is an unobserved (latent) variable, whereas the latter is an observed variable.
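The three area-effect scenarios can be written down compactly as in the sketch below, which encodes the stated values of D, c, and d and assigns an area to one operation by a Bernoulli trial. The names are hypothetical, and the c and d entries for Scenario A are placeholders chosen here, since with equal densities the area assignment does not change the mean catch.

```python
import random

# Relative densities (Area1, Area2) and the coefficients of Eq. 4.
# The c and d values for Scenario A are placeholders (an assumption).
AREA_SCENARIOS = {
    "A": {"D": (1.0, 1.0), "c": 0.5, "d": 0.0},
    "B": {"D": (0.5, 1.0), "c": 0.5, "d": 0.0},
    "C": {"D": (0.5, 1.0), "c": 0.011, "d": 0.089},
}

def area1_probability(scenario, year):
    """v_i = c + d * Year_i, the expected share of operations in Area1."""
    s = AREA_SCENARIOS[scenario]
    return s["c"] + s["d"] * year

def draw_density(scenario, year, rng):
    """Bernoulli trial for the area; returns (area label, relative density)."""
    d_area1, d_area2 = AREA_SCENARIOS[scenario]["D"]
    if rng.random() < area1_probability(scenario, year):
        return "Area1", d_area1
    return "Area2", d_area2
```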

The values of \(\mu_{i, \rho ,\kappa ,\alpha }\) on an operation used in each abundance scenario are shown in Fig. 4. The fishing effort E was fixed at 2000 hooks for all operations.
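A small sketch of the mean catch calculation, using the parameter values stated in the text. The abundance argument is on an assumed relative scale (1.0 in the examples), because the absolute values of N are not given here; the function and dictionary names are hypothetical.

```python
Q = {"Sp1": 0.004, "Sp2": 0.002}   # catchability per hook, by species
Z = {"tar": 1.0, "non": 0.2}       # effect of targeting on catchability
EFFORT = 2000                      # hooks per operation

def mean_catch(species, targeting, density, abundance):
    """Mean catch per operation: q * z * D * N * E."""
    return Q[species] * Z[targeting] * density * abundance * EFFORT

# Abundance of 1.0 is an assumed relative value
targeted_sp1 = mean_catch("Sp1", "tar", density=1.0, abundance=1.0)
untargeted_sp1 = mean_catch("Sp1", "non", density=1.0, abundance=1.0)
```

With these settings a non-targeting operation has one fifth of the mean catch of a targeting operation, which is the contrast the mixture model has to detect.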

Fig. 4 Mean catch (number of catch) on an operation. The black line denotes mean catch by an operation with higher catchability coefficient for the species of interest. The gray line denotes mean catch by an operation with lower catchability coefficient for the species of interest. The solid line and dashed line show Species 1 and 2, respectively. For example, the black solid line shows the mean catch by an operation targeting Species 1

The catch amount “Catch” on operation s is generated as a random variable following a Poisson distribution with mean \(\mu_{i, \rho ,\kappa ,\alpha }\):

$$\begin{gathered} {\text{Catch}}_{s,i, \rho ,\kappa ,\alpha } \sim {\text{Poisson}}\left( {\mu_{i, \rho ,\kappa ,\alpha } } \right) \hfill \\ s = 1,2, \ldots , 10000 \hfill \\ \end{gathered}$$
(5)

The annual sample size (i.e. the number of fishing operations) is set to 10,000. The simulation is repeated 100 times for each combination of four scenarios of abundance and three scenarios of area effect.
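The Poisson draw of Eq. 5 can be sketched with the standard library alone. The sampler below uses Knuth's multiplication method, which is a choice made for this illustration and is adequate only for small means; the mean of 8.0 and the seed are assumed example values.

```python
import math
import random

def poisson_draw(mu, rng):
    """Draw one Poisson(mu) variate (Knuth's method, small mu only)."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def simulate_catches(mu, n_ops, rng):
    """Catch numbers for n_ops operations sharing the same mean."""
    return [poisson_draw(mu, rng) for _ in range(n_ops)]

rng = random.Random(42)                        # hypothetical seed
catches = simulate_catches(8.0, 10_000, rng)   # one year of operations
sample_mean = sum(catches) / len(catches)
```

In a full replicate, each operation would receive its own mean according to its year, strategy, and area before the draw.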

Model structure for CPUE standardization

The eight candidates of CPUE standardization methods and two control scenarios for comparison are summarized in Table 2. The descriptions of each method are as follows:

Table 2 Summary of CPUE standardization model structures with Poisson and lognormal error distribution

(i) Nominal CPUE (Nominal)

The CPUE is calculated without considering target effect.

(ii) Nominal CPUE with positive catch (Nominal-P)

The CPUE is calculated using only positive catches. It is assumed that the catchability coefficient and the catch number for a target species are both greater than zero. The objective for using this method was that if it were possible to successfully choose the dataset of only a homogeneous condition of a target strategy, the CPUEs estimated from the dataset would not be affected by the target strategy. Nominal-P is applied in actual stock assessments in Japan, e.g., in the CPUE analyses for Japanese bottom-trawl fisheries targeting Alaska pollock Gadus chalcogrammus (Hamatsu et al. 2017) and brown sole Pseudopleuronectes herzensteini (Yamashita et al. 2019) in the Pacific Ocean.

(iii) Two-step delta-lognormal method (Delta)

One of the conventional methods to estimate CPUE is the two-step delta-lognormal method (Lo et al. 1992). This model is commonly used when most of the errors are effectively modeled with a lognormal distribution, although the catches include zeros. This model is used in particular to address catch data containing a large proportion of zeros due to target effect, i.e. the species of interest was not targeted. This approach is generally applied to actual stock assessments, e.g., Pacific bluefin tuna Thunnus orientalis (Ichinokawa et al. 2014), albacore (ISC 2017), and blue marlin Makaira nigricans (Forrestal et al. 2019a).

In this analysis, we applied the two-step delta-lognormal method (Lo et al. 1992) to catch data that were generated based on Poisson distribution, and evaluated its performance. The proportion of positive catches was modeled using a binomial GLM with a logit link function, and the positive CPUEs were modeled with a normal linear model for log-transformed positive CPUE.
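To make the two components concrete, the sketch below computes a delta-type annual index for the special case in which year is the only (categorical) covariate. In that case the binomial GLM's fitted probability equals the observed proportion of positive catches in each year, and the normal linear model's fitted value equals the yearly mean of log positive CPUE, so no model-fitting library is needed. The back-transformation by a plain exponential, without a lognormal bias correction, is an assumption of this sketch and not necessarily the procedure used in the study.

```python
import math

def delta_index(cpue_by_year):
    """cpue_by_year: {year: [CPUE per operation, zeros allowed]}.

    Returns {year: proportion positive * exp(mean log positive CPUE)}.
    """
    index = {}
    for year, values in cpue_by_year.items():
        positives = [v for v in values if v > 0]
        if not positives:
            index[year] = 0.0      # no positive catch in this year
            continue
        p_positive = len(positives) / len(values)
        mean_log = sum(math.log(v) for v in positives) / len(positives)
        index[year] = p_positive * math.exp(mean_log)
    return index

example = delta_index({1: [0.0, 0.0, 2.0, 2.0], 2: [1.0, 1.0, 1.0, 1.0]})
```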

(iv) Ranking method (RM)

The CPUE is standardized using RM in a GLM with Poisson error distribution with the main effects of year and rank. The RM divides the generated data into 10 groups in accordance with the order of nominal CPUE for Species 2 at each 10th percentile (Hiraoka et al. 2016). The group number “10” signifies that the nominal CPUE for Species 2 is at the highest rank (Rank 10), indicating that Species 2 is the main target species; conversely, Rank 1 contains the operations with the lowest nominal CPUE for Species 2. Rank was modeled as a categorical variable. In all subsequent methods using rank, we treated rank in the same manner. Method (v), RM-intrc, below, estimates the mean CPUE of all ranks, whereas method (vi), RM-intrc-Rank1, estimates only the mean CPUE of Rank 1.
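The grouping step can be sketched as below, with Rank 1 holding the operations with the lowest nominal CPUE for Species 2. The function name is hypothetical, and two details are assumptions of this sketch: ties are broken by the original order of the operations, and the function ranks whatever list it receives, so ranking within each year would require one call per year.

```python
def assign_ranks(sp2_cpue, n_groups=10):
    """Return a rank (1..n_groups) per operation; Rank 1 = lowest Sp2 CPUE."""
    n = len(sp2_cpue)
    # Indices of operations sorted by ascending Species 2 CPUE (stable sort)
    order = sorted(range(n), key=lambda i: sp2_cpue[i])
    ranks = [0] * n
    for position, op in enumerate(order):
        ranks[op] = position * n_groups // n + 1
    return ranks

ranks = assign_ranks([float(v) for v in range(100)])
```

The resulting rank would then enter the GLM as a categorical covariate alongside year.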

(v) Ranking method with the interaction term (RM-intrc)

The CPUE is standardized using RM in a GLM with the interaction term between rank and year.

(vi) Estimating the mean CPUE of Rank 1 using RM-intrc (RM-intrc-Rank1)

The nominal CPUE at Rank 1 for Species 2 (the lower nominal CPUE for Species 2) is standardized using the GLM with the output of RM-intrc. The operations of Rank 1 theoretically contain a large proportion of operations that mainly target Species 1. The mean CPUE is expected to be robust, because the catchability coefficients for Species 1 are relatively high and less variable in operations targeting Species 1, irrespective of changes in abundance.

(vii) Directed residual mixture (DRM) model

The CPUE is estimated using the DRM method (Okamura et al. 2017). To deal with the zeros, we used an additive constant, 0–13 (Appendix F in ESM.txt). The explanatory variable (i.e. year) was treated as a continuous variable, unlike the categorical variables of the other models (i.e. year and target strategies in RM and FMM, and control models CP and CL described below). The regression model, which is a three-component mixture model similar to the FMM, was implemented for the catches of Species 1 and 2.

(viii) Finite mixture model (FMM)

The CPUE is estimated using the FMM, which directly estimates the target strategy from a dataset containing multiple species (Cosgrove et al. 2014). The regression model, which is a three-component mixture model, is used to estimate the CPUE (i.e. number of fish in the catch) of Species 1 and 2, and to separate the components into the three target strategies. Details are described in Online Appendix A. The FMM is carried out using R version 3.3.1 (R Core Team 2018) and “flexmix” package version 2.3–13 (Leisch 2004; Grün and Leisch 2007, 2008).
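The idea of labeling operations through a mixture can be illustrated with a deliberately reduced example: a two-component Poisson mixture for the catch of a single species, fitted by the expectation–maximization (EM) algorithm in plain Python. This is not the three-component, two-species regression model of the study and does not use the flexmix package; the starting values, the seed, and the simulated means (8.0 for targeting and 1.6 for non-targeting operations, with 70% targeting) are assumptions chosen for the illustration.

```python
import math
import random

def poisson_logpmf(k, lam):
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def poisson_draw(mu, rng):
    """Knuth's method; adequate for the small means used here."""
    limit = math.exp(-mu)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k

def fit_poisson_mixture(counts, n_iter=200):
    """EM for a two-component Poisson mixture (requires a positive mean).

    Returns (weights, means, post), post[s] = P(component 0 | count s).
    """
    n = len(counts)
    mean = sum(counts) / n
    weights = [0.5, 0.5]
    lams = [0.5 * mean, 1.5 * mean]   # crude starting values (assumption)
    post = [0.5] * n
    for _ in range(n_iter):
        # E-step: posterior membership probabilities
        for s, k in enumerate(counts):
            a = math.log(weights[0]) + poisson_logpmf(k, lams[0])
            b = math.log(weights[1]) + poisson_logpmf(k, lams[1])
            m = max(a, b)
            ea, eb = math.exp(a - m), math.exp(b - m)
            post[s] = ea / (ea + eb)
        # M-step: update mixing proportions and component means
        w0 = sum(post)
        w1 = n - w0
        lams = [sum(p * k for p, k in zip(post, counts)) / w0,
                sum((1 - p) * k for p, k in zip(post, counts)) / w1]
        weights = [w0 / n, w1 / n]
    return weights, lams, post

rng = random.Random(7)                               # hypothetical seed
is_target = [rng.random() < 0.7 for _ in range(2000)]
counts = [poisson_draw(8.0 if t else 1.6, rng) for t in is_target]
weights, lams, post = fit_poisson_mixture(counts)
```

The posterior probabilities in `post` play the role of the estimated target labels; in the study these are estimated jointly with the year effects rather than in a separate step.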

(ix) Complete data with Poisson model (CP)

CP is one of two control scenarios we used for comparison. The CPUE is estimated using a GLM with a Poisson distribution for the complete dataset (target strategy is known). Although such a dataset is unrealistic, comparing the results of this method with those of other methods enables us to effectively show the degree of bias and variance that the other methods would produce.

(x) Complete data with lognormal model (CL)

CL is the other control scenario. Here, the CPUE is estimated using a GLM with a lognormal distribution for the complete dataset. To deal with the zeros, we again used an additive constant, 0–13 (Appendix F in ESM.txt). As with the CP model, we used this method to examine and validate the suitability of the assumption for lognormality inherent in the DRM method (Okamura et al. 2017) in the modeling of count data with low catch numbers.

The least squares mean (LSMEAN; SAS Institute Inc. 2009) was calculated to estimate the annual CPUE for each method. In calculating the annual CPUE estimated by Delta, we used the products of the predicted year effects (i.e. LSMEAN) from the binomial and lognormal components.

Model evaluation and comparison

In the performance evaluation for the 10 models, the annual trend residual (ATR) was used to represent variance, bias, and hyper-stability (the annual trend would be flattened compared to the actual trend) for the annual trends in abundance estimates. The ATR is defined as:

$${\text{ATR}}_{i} = \left( {{\text{log}}\left( {\frac{{{\text{CPUE}}_{i} }}{{{\text{CPUE}}_{1} }}} \right) - {\text{log}}\left( {\frac{{N_{i} }}{{N_{1} }}} \right)} \right)/\left( {i - 1} \right),$$
(6)

where the first and second terms in the numerator indicate the estimated and true annual trends in stock abundance, respectively.

A zero value of ATR indicates that the estimated annual trend is unbiased, while a positive or negative value indicates the direction of the bias. The whiskers in the box plots are visual illustrations of variance, a measure of the precision of the estimated trend. The calculation of ATR includes just one species, Sp1.
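A direct transcription of Eq. 6 is sketched below. Because the denominator is i − 1, the ATR is defined only from the second year onward, so the function returns values for i = 2 to n. The function name and the example series are hypothetical.

```python
import math

def annual_trend_residual(cpue, abundance):
    """ATR_i for i = 2..n; inputs are year-ordered (index 0 = year 1)."""
    atr = {}
    for idx in range(1, len(cpue)):
        i = idx + 1                                  # year number
        estimated = math.log(cpue[idx] / cpue[0])
        true = math.log(abundance[idx] / abundance[0])
        atr[i] = (estimated - true) / (i - 1)
    return atr

abundance = [1.0, 1.1, 1.2, 1.3]
proportional = annual_trend_residual([2.0 * n for n in abundance], abundance)
hyperstable = annual_trend_residual([2.0, 2.0, 2.0, 2.0], abundance)
```

An index proportional to abundance gives an ATR of zero in every year, whereas a flat index under increasing abundance (hyper-stability) gives negative values.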

Results

Although the ATR of each method was different among the three scenarios of area effect A, B and C (Fig. 5a–c), it was similar across the four scenarios of the abundance trend (the four panels in Fig. 5a–c). Although the performance of the CP was slightly better than the FMM, the performance of the FMM was still comparable with the CP; their ATRs were almost zero with low variances (Fig. 5a–c). These results indicated that the FMM capably adjusted the target effect in the CPUE estimates. Overall, the performance of the FMM was the best of the eight methods.

Fig. 5 Box plots and violin plots of ATR (annual trend residual) from the Monte Carlo iterations against the ten CPUE standardization methods (Nominal, Nominal-P, Delta, RM, RM-intrc, RM-intrc-Rank1, DRM, FMM, CP, and CL; see Table 2). The bold black lines in the box plots show medians of ATR, rectangles show the interquartile ranges of ATR, the whiskers extend to the most extreme data that are not considered outliers, and the black points show outliers. The width of the violin plot displays the frequency distribution of iterations that produced a certain value of ATR. The plots show trends in abundance for Species 1 (Sp1; rows) and Species 2 (Sp2; columns). The horizontal dotted line denotes an ATR of zero, indicating that the trends of annual CPUE estimates are unbiased. Panels (1)–(4) show the abundance scenarios, and a–c show the area effect for Scenarios A, B, and C, respectively (see Table 3)

Table 3 Control parameters used for 12 scenarios in the simulations

The performance of RM-intrc-Rank1 was the second-best in Scenario A (Fig. 5a) and Scenario B (Fig. 5b). Its performance, however, was poor in Scenario C (Fig. 5c).

The performances of the Nominal, Nominal-P, Delta, RM, RM-intrc, and CL were poor, and these ATRs had large values (Fig. 5a–c). In Scenario C, however, the biases of Nominal and Nominal-P were small. These results do not imply that the Nominal and Nominal-P performed well, as they were affected by bias that was offset by the decrement of Strategy I (targeting Species 1) and the increment of Area1 (higher density area).

A bimodal distribution resulting from two very different patterns was observed in the ATRs of the DRM: in one pattern, the ATRs showed very good performance, while in the other, they showed significant bias (Fig. 5a–c).

Discussion

Comparison of performance among eight methods on target strategy

Our simulation study revealed that (1) the FMM had a high potential to directly identify the target species, (2) the FMM also performed well in adjusting the target effect in the estimation of annual abundance trends, and (3) the relative performances of the other methods (i.e. Nominal, Nominal-P, Delta, the other RMs, and DRM) were poor in all scenarios. These results suggested that the FMM was more appropriate than the other methods to adjust the target effect with lower bias and variances in CPUE standardization. The incorporation of area effect in the model had a negative impact on the performance of all methods except for the FMM (Fig. 5b, c). The FMM was the best method among the eight methods for addressing area effect.

The performance of RM-intrc-Rank1 was superior to that of the other two RMs (Fig. 5a, b) for two reasons: (1) across the simulation period, the proportion of operations targeting Species 1 was greater at Rank 1 than at the other ranks, which favored the use of the mean CPUE of Rank 1, and (2) a large proportion of the catch at Rank 1 over the simulation period consisted of Species 1.

The performance of RM-intrc-Rank1 was poor in the scenario of severe area effect (Fig. 5c). This result can be attributed to the limitations of this method in estimating an indirect indicator of target species to extract targeting operations. In our simulation, the RM could extract the operations of Strategy I into Rank 1 when the catchability for Species 2 changed only due to target strategy. An additional effect, however, could change the performance. For example, when the catchability for Species 2 decreases in some areas regardless of target strategy, some operations of Strategy II could be incorrectly grouped into Rank 1. Such incorrect grouping was the cause of the poor performance of the RM.

While the RM was applied to the CPUE standardization for North Pacific blue shark (Hiraoka et al. 2016), the FMM has not yet been applied to this stock. A future application of the FMM to this stock would be a useful way to improve accuracy and precision in the estimation of abundance indices.

To explain the details of the bias by year, we show the annual trend of ATR for Scenario C, which had the largest annual change in bias (Fig. 6). The biases of the other scenarios (A and B) are shown in Appendix B in Online Resource 1. The annual ATR for Scenario C indicated that the bias in the 10th year was smaller than that in the second year (Fig. 6). In the scenarios used in the simulation, the second year had the largest proportion of operations with Strategy I. Because the catch data were generated from a Poisson distribution, a larger proportion of Strategy I increased both the average value and the variance of the catch. The ATR of the second year thus had the highest variance. In general, as Hilborn and Walters (1992) have stated, the accuracy and precision of CPUE estimates would increase if data were collected from a large number of operations with a high catchability coefficient. In contrast, our study showed the opposite result, i.e., the more operations with a high catchability coefficient, the higher the variances in the estimates became. Nevertheless, the FMM had the lowest variability and provided a level of performance comparable to that of the CP (i.e. the model with full information).

Fig. 6 Annual ATR (annual trend residual) for Scenarios C1–4 (see Table 3). Box plots and violin plots display the distributions of ATR from the Monte Carlo iterations against the ten CPUE standardization methods (Nominal, Nominal-P, Delta, RM, RM-intrc, RM-intrc-Rank1, DRM, CP, CL, and FMM; see Table 2). The horizontal dotted line denotes an ATR of zero, indicating that the trends of annual CPUE estimates are unbiased

In the Introduction, we mentioned that target species are seldom recorded. If some vessels provided information on their target species, this information would be beneficial for validating the estimates of the FMM. Also, such auxiliary information might be useful for diagnostics in determining success or failure of labeling target strategy. An example of these diagnostics is shown in Appendix B in Online Resource 1.

The DRM method had large biases and variances (Fig. 5a–c). The results indicated that the accuracy and precision of the abundance index estimated by DRM could vary depending on data generation iterations, even under the same scenario with the same settings. One of the possible reasons for this seems to lie in the difference between lognormal and Poisson as the distribution of catch amount. In addition, the DRM is known to have a limitation in that the normal linear regressions used in this method might not work well if the sample size is small, as in the count data of tunas and billfishes in longline catches, and the recommendation was thus made to extend this method to the GLMs and their variants (Okamura et al. 2017).

The poorer performance of Nominal-P compared to all the other methods (Fig. 5a, b) might have been caused by ignoring all zero-catches, including true zero-catches for the main target species, in the CPUE analysis. Since the use of only positive catches is equivalent to assuming homogeneous target strategies, the method does not seem appropriate for adjusting the target effect. However, Nominal-P was applied in actual stock assessments for Japanese bottom-trawl fisheries targeting Alaska pollock Gadus chalcogrammus (Hamatsu et al. 2017) and brown sole Pseudopleuronectes herzensteini (Yamashita et al. 2019) in the Pacific Ocean. The application of Nominal-P to actual stocks must proceed with caution, as zero-catches can be caused by factors other than targeting behavior, and their removal can cause a bias. If caution is not taken, the stock assessment results could become misleading.

Comparison between one-stage and two-stage methods

K-means cluster analysis (He et al. 1997) is commonly used to adjust the target effect in the CPUE standardization of fishery data (e.g., Carvalho et al. 2010). The analysis separates the observations into k clusters in which each observation belongs to only one group. In the application of the k-means cluster analysis, the target strategy is estimated directly while assuming that the change in the species composition of the catch is caused by the target change. This means that the following two assumptions must hold: (1) year-to-year variation in the abundance of each species is sufficiently small and (2) relative densities among fishing areas are homogeneous. For example, suppose we have a dataset from a fishery in which targeting strategy did not change over time. If the abundances of each species caught in the fishery fluctuated from year to year, the annual catch of each species would change. In addition, if catch data were collected from multiple areas with different relative densities, the catch amounts would differ by area. It is therefore not appropriate to apply k-means cluster analysis to such datasets, and the clusters estimated by the analysis may not represent the target species. Moreover, k-means cluster analysis has a further issue: CPUE standardization using the analysis divides the error into two stages, the first being the estimation of the target species and the second being the CPUE standardization using the estimated target species.

A variety of other methods on target strategy have been proposed. The hierarchical clustering method (Ward hclust) was used to standardize the CPUE of longline data, and the CPUEs were applied to the stock assessments of tuna species in tuna-RFMOs (e.g., Hoyle et al. 2015, 2016; Lee et al. 2018). Winker et al. (2014) used principal component analysis (PCA) to adjust the target effect. Clustering by catch, however, cannot remove the target effect if the catch rate of each species varies widely by year or location.

The FMM and DRM estimate the target strategy and abundance trend without dividing the error into multiple stages. In their estimation process, the expectation–maximization (EM) algorithm provides considerably clearer and more easily identifiable discrimination for the catch data, as the error rate of strategies estimated by k-means cluster analysis was higher than that of the FMM (Table B1 in Appendix B and Table C in Appendix C in Online Resource 1). Similarly, Okamura et al. (2017) demonstrated that the k-means cluster analysis had a larger bias in the CPUE estimates and that the DRM method outperformed it. These results suggest that k-means cluster analysis is not the best way to adjust the target effect.

The FMM is especially advantageous in its ability to use data on multiple species as response variables in a single model. This means that the FMM can simultaneously estimate both the annual trends in target strategies and the annual trends in the relative stock abundance of multiple species while removing the influence of other factors. The FMM thus performed better than all other methods and enabled us to estimate the unobserved variable (i.e. target strategy).

Extension of the simulation framework

We assumed that the catchability coefficient was the same over operations within the same target/non-target species in this study. Since the purpose of this study is to investigate the possibility of estimating target species using FMM, we assumed the simplest situation. Some of the reasons for the remarkably high performance of FMM shown in this study may be attributed to the current simulation framework: (1) the use of data from two species with contrasting catchability coefficients, and (2) the lack of fluctuation in the catchability coefficient due to random changes from vessel to vessel.

If factors that interfere with the estimation of the mixture distribution are included in the model, the estimation performance of the FMM might deteriorate. For example, if the mean values of the mixture distribution are similar, or if the mean value fluctuates due to random changes from vessel to vessel, it could be more difficult for the FMM to converge, resulting in greater uncertainty in labeling the target strategy (Target, Non-target) in the mixture model. As an example, we demonstrated the performance of the FMM when the mean values of the mixture distribution were similar (Appendix D in Online Resource 1). In this example, the performance of both the FMM and the other methods deteriorated, although the FMM outperformed the other methods. In cases where the catchability coefficient varies from vessel to vessel, the performance of the FMM might be worse. Further simulation studies are essential to determine the extent of the deterioration.

Since these factors cannot be ignored in actual fishery data, it will be necessary to incorporate as many fish species as possible (i.e. to increase the volume of data in which catch rates vary depending on the target effect), and to incorporate random effects (e.g., vessel effect) in the model to estimate target strategy and abundance simultaneously. It will also be necessary to conduct simulations to examine the changes in performance caused by these factors.

In our simulation study, the spatial structure of the fishery was quite simple and only two categories of area effect were used to standardize the CPUE. The VAST (Vector Autoregressive Spatio-Temporal) software package for R (Thorson 2019), which makes it possible to analyze fishery data using a spatiotemporal delta-GLMM (Thorson et al. 2015), was recently developed and is now commonly used globally (e.g., Kai et al. 2017; Grüss et al. 2019; Hsu et al. 2020) to predict spatial changes in species distribution and temporal variations in population range and density, based on spatial and temporal autocorrelation among catch rates and correlations with various biotic and abiotic environmental factors (Thorson 2019). The spatiotemporal delta-GLMM is another one-stage method that is clearly advantageous over two-stage methods. In practice, the spatiotemporal delta-GLMM has been applied to the estimation of abundance from multispecies fishery data while accounting for spatiotemporal variation and fisher targeting (Thorson et al. 2017). If sufficient spatiotemporal data of multiple species are available, it would be beneficial in future work to apply VAST to the catch data of longline fisheries as an alternative approach to addressing the target effect.

On the other hand, we have a small concern about the operation of VAST. For the FMM, we can divide clusters by each operational unit, and the accuracy of the estimated target strategy can be checked by comparing it with the record of the target strategy, even if only a portion of the data are available. This validation process, however, is difficult to conduct using VAST.

Applying lognormal distribution to count data

Various studies based on numerical simulations showed that lognormal models performed well (Dick 2004; Forrestal et al. 2019a, b; Lynch et al. 2012) even when the error distribution assumptions were inappropriate for the data (e.g., count data). In this study, however, the CL method performed worse than the FMM, as its error distribution assumption was inappropriate. Even when sufficient knowledge of the target strategy is available, if the error distribution assumption is inappropriate, the variance of the estimate may be large. The FMM, which estimates the target strategy assuming the correct error distribution, performed better than CL. Based on these results, we suggest that lognormal error distribution might need to be checked to see whether it is appropriate for the CPUE analyses of count data. The performance of estimation might be improved by checking error distribution.

Conclusion

The FMM is recommended for adjusting the target effect in CPUE standardization for longline data, as the FMM is statistically robust and enables realistic modeling. However, we focused solely on the extraction of annual abundance trends from catch data affected by the target strategy, using a simple model. In future work, it would be essential to consider various factors (such as seasons and explicit spatial structures) in simulations, and to undertake practical applications to confirm the usefulness of this method.