Background

The development and progress in nematology has been accompanied by the relevant achievements made in the statistical use of its various aspects, whether basic or applied. Admittedly, nematologists utilizing statistics can offer the advances and application of statistical approaches to a broad range of topics in their various nematology-related disciplines. These methods encompass various directions in nematology but basically address the design of field, laboratory, and greenhouse bioassays and experiments, the setting and analysis of data from those tests, and the interpretation of their outputs (Nelson 1985; Abd-Elgawad 1998). Hence, in the absence of statistical analysis, all of the nematological collected data wouldn’t mean much. To name but a few, relevant statistics may serve a number of functions such as comparing means of variables, measuring regression and correlation, and predicting change in variables. Consequently, the types of statistical tests used in nematology may comprise analysis of variance (ANOVA), t-tests, regression tests, indices of dispersion, chi-squares, and other ones. Eventually, these tests help furnish nematologists with insights about procedures and operations which are either too big, too microscopic, or too many to be analyzed and interpreted by approaches other than statistics. Basically, the major role of statistics in nematology is to assist in setting up bioassays and experiments and test hypotheses. However, this latter might sometimes be attenuated in proportion to the complexity (addition of extra-treatments, many factors affecting the targeted goal, use of raw rather than sterile soil where number and type of microorganisms are controlled, etc.) of experimental designs. Therefore, nematologists should always utilize the most simplest and proper experimental designs while searching other approaches that can benefit from advanced statistical applications.

In this respect, specific statistical tests are utilized in nematology to study spatial (horizontal and vertical) and temporal distribution of nematodes (Been and Schomaker 2013), measure reliability of methods which apply molecular identification on nematode samples (Dritsoulas et al. 2019), predictions of crop damage caused by plant-parasitic nematodes (PPNs) (Ferris 1984) and precise interpretation of data and results. Additionally, some statistical functions and concepts can also help choose optimum sample size or which species to study from a polyspecific nematode community (Abd-Elgawad et al. 2016), or inter-laboratory proficiency tests to examine and compare nematode counts (Berg et al. 2014). With the recent advances in nematology, many experiments may need the interpretation of large datasets that are too large or too complicated for scientists to analyze by hand. Therefore, scientists may utilize statistical software in the laboratory specifically designed for data processing. Using such software programs often requires preliminary training. Nematologists should be careful to use appropriate software programs to help them with nematode sampling and statistics in order to keep their findings meaningful (Abd-Elgawad 2020). On the other hand, some nematologists can create, develop, and modify these computer programs based on results from different situations and/or modification in the rule of thumb.

This review presents highlights for common misuses of statistics in nematological research and reporting. It also throws light on less recognized but useful statistical applications such as comparing accuracy between two groups of molecularly examined samples and optimizing screening of nematode-host suitability studies. Finally, this paper comments on a shift (Abd-Elgawad 2020) in the thumb rule value for the randomness of Taylor’s Power Law (TPL) (Taylor 1961).

Main text

Misuses in experimental planning and analysis and interpretation of data

Basically, the process of planning an experiment so that data can be adequately analyzed by statistical methods should include the following steps to reach valid and objective conclusions (Montgomery 2001): (1) recognition of and statement of the problem, (2) selection of the response variable(s), (3) choice of factors, levels and range, (4) choice of adequate experimental design, (5) performance of the experiment, and (6) careful statistical analysis of the data especially when software programs are used. Any violation of one or more of these processes may result in misuses of the obtained data. Moreover, the whole process must rely on the researcher's experience and his perfect knowledge of other relevant researches and publications. Strikingly, various degrees of such an experience, related to differences in the size of the sampling unit and the index of nematode dispersion used have led to misinterpretation of the data (Abd-Elgawad 2019). Valid as these units and indices are, they sometimes lack in the exactitude of the applied parameter. Hence a few nematologists tend to utilize several indices to the same entomopathogenic nematode (EPN) statistics to characterize their spatial distribution patterns in order to lessen the disadvantages of both issues. For instance, Wilson et al. (2003) employed unfamiliar dispersion indices in Nematology. They applied spatial analysis by distance indices (SADIE) that consider the geographic coordinates of the sampled nematode location. Then, Spiridonov et al. (2007) applied SADIE in addition to Lloyd’s index. Later, Bal et al. (2017) implemented several indices for perfection of statistical interpretation of the data. They harnessed SADIE in addition to Moran’s I and Geary’s c as two indicators of autocorrelation among spatial EPN distributions. Moreover, common sampling processes of EPNs may result in erratic results. For example, comparing different sampling methods which applied the same index (i.e., SADIE) to examine EPN distributions for better conceiving of various patterns is highlighted (Table 1). Dissimilarity in the size of the sampling area may account for various statistical interpretations of the obtained data (Been and Schomaker 2013; Abd-Elgawad 2019). In this vein, when the area or size of the soil sampling unit progressively increased, the measured dispersion of the same EPN population accordingly change from random to contagious and finally regular (Fig. 1). In other words, the shift in the spatial distribution pattern of the nematodes is definitely dictated by the area of the sampling unit (Fig. 2). Moreover, checking the pattern of nematode-infective juveniles (IJs)-natural distribution revealed apparent discrepancy among the used indices (Spiridonov et al. 2007). The authors speculated that the discrepancy may be related to using a scale finer than the commonly used one for EPN aggregation. On the other hand, Wilson et al. (2003) attributed uniform distribution pattern of EPN examined by software SADIE program to the symmetrical nature of their sampled plots as well. Therefore, matching different values for the same index of aggregation by SADIE (Ia) using unequal sampling units in various regions (Table 1) should be cautiously interpreted. Thus, the difference in volume/area of the samplers (Fig. 2) can likely impact the obtained distribution patterns of EPNs. In this respect, In Holland, this unit was a 20 cm diameter borer/sampler (Taylor 1999), but as narrow as a 2 × 15 cm deep core in New Jersey, USA (Wilson et al. 2003). Another shift in its square shape and area occurred as its surface was 5 × 5 cm when a square auger was used to 5 or 10 cm deep at Merelbeke, Belgium (Spiridonov et al. 2007). Moreover, cylindrical bait trap of about 3 cm in diameter and 10 cm tall was employed in Ohio, USA (Bal et al. 2017), but a shovel is used to sample from approximately 30 cm diameter and 25 cm deep spots in El-Beheira governorate, Egypt (Abd-Elgawad 2014).

Table 1 Indices of aggregation (Ia)* values recorded at different countries on entomopathogenic nematode (EPN) populations with various sampling methods and units (Abd-Elgawad 2019)
Fig. 1
figure 1

Four circles (a, b, c, d) with consequent different sample sizes showing different distribution patterns of the same nematode population at the surface of the sample. These may be parts of the largest circle which has a contagious distribution with regularly distributed clumps of entomopathogenic nematodes (Abd-Elgawad 2019)

Fig. 2
figure 2

Various samplers with different sizes used as tools to take nematode samples. a A spade to collect large samples of fairly undisturbed soils and roots, b A trowel to obtain samples of medium volumes, and c Soil tubes (augers) to get many sub-samples (cores)

In addition to the above-mentioned confusion in determining the exact values for indices of nematode dispersion concerning the spatial and temporal distribution patterns for nematode ecology studies, there is another issue for the proper nematode transformation counts. It is apparent that the basic types of statistical tests utilized in nematology fall into four categories: correlation, regression, comparison of means, and nonparametric. Means comparisons statistically specify the difference between the means of two or more sets of variables/datasets. Correlation coefficient indicates how closely two or more variables are related. Regressions analyze if a shift in one variable can predict shift in another. Nonparametric tests are utilized for datasets that don’t satisfy the pre-requisites for parametric analysis tests. Hence, numerical values should be subjected to different transformation methods as a pre-requisite to comply with assumptions needed for these three basic types of parametric statistical analyses. For example, if such values indicate random or aggregated distribution, then PPN or EPN counts should be root-squared or log-transformed, respectively, before the statistical analyses, to fulfill the pre-requisites of normal distribution, and consequently apply such statistical approaches. In this vein, Abd-Elgawad (1992) found that the use of log transformed data led to identical statistical interpretation of three experiments analyzed according to nematode dispersion in TPL. Both log and TPL transformations significantly decreased the heterogeneity of variances (P ≤ 0.05 and P ≤ 0.01) approximating the pre-requisite of equal variance underlying parametric statistical analysis. Nonetheless, TPL equations could better reduce differences between treatment variance than log transformation. So, such a difference in indices of nematode dispersion for the nematode population may give rise to problems of inadequate statistical reporting and misinterpretation of the data (Abd-Elgawad 2019). Eventually, sampling unit should be standardized with a norm model to obtain samples that can offer logic comparative evaluations. This is also recommended as it will allow future reviews to be so analytical that they can build on previous studies for such objective and more reliable investigations.

Other misuses have been raised by an overdependence on computer software without careful consideration of the calculations which the computer is ought to perform (Nelson and Rawlings 1983). Testa and Shields (2017) implemented low labor in vivo mass rearing approach for EPNs to decrease their cost of production for an area-wide biocontrol project of arable crops. They found that total cost of rearing one billion alive IJs that can be applied in the field is about $350 USD ($0.35 per million IJs). They valued the cost structure relying on easily obtainable Galleria larvae as the rearing susceptible insects. In order to culture 1 billion, 109, alive IJs require 12,500 Galleria larvae, because each larva can yield on average 80 million IJs under the preferred culturing temperatures. However, the output of their assessing (12,500 × 80 million) is 1012 not 109 alive IJs. The insects are sold in 450 ml plastic containers with 250 Galleria larvae. So, Testa and Shields (2017) estimated the sum as 1 billion, which is apparently a misprint and should be 1012, IJs which needs 50 containers of insects with a price ranging between $5 and 7 per container handled (50 × $6 = $300). Factually, one Galleria larva can culture 68,000 to 100,000 IJs (Elson J. Shields, Cornell University, Ithaca, NY, Pers. Comm). Hence, Abd-Elgawad (2017) noted that the discrepancy in their assessment of cost is apparent. They stated that total labor which comprises time to furnish the IJs to inoculate the 50 containers, inoculation of a container, checking containers in the culturing process, and gathering the IJs for application on the spot is in the range 3–4 h (4 h × $12/h = $48). Inexpensive as it may be, the right estimate to culture one billion alive IJs ready for practical use is about $0.875. Nonetheless, the actual expense should be $875 ha−1 when the same applied rate is 2.5 × 109 EPN ha−1 with correct estimations.

Common and sound uses of statistics in nematology

Such uses consider the planning stages of experimentation as crucial steps. They can rightly furnish for the sound objective(s) and consequent conclusions. Therefore, the above-mentioned steps for the process of such planning should be cared for. In this respect, nematologists should correctly choose the simplest experimental designs which are capable of achieving their objectives among many others (Montgomery 2001). They commonly utilize rational variables. For example, for studying the effect of certain ecological factors on nematodes, they should choose variables that are more impacted by the settings in which the nematodes occur. These factors are more useful than those that are less subject to ecological effects. Likewise, morphological, morphometric, and molecular characters and ratios that vary greatly among nematode populations in relation to their variations within populations are obviously more helpful for systematic and taxonomic studies than are those that offer lesser variability among nematode populations in proportion to their statistical variances within populations. Furthermore, the treatments which can offer the data/information required by the question under study should be accurately selected. Sound conception of comparisons among these treatments should be determined ahead of time; at the planning phase. While random lay out should assign the treatments to adequate experimental units, replication is direly required to assess experimental error and to stabilize the estimates of treatment means (Nelson 1985).

Nematologists usually apply statistics to look for indications of effects and they rely on the data to supply them with estimates of the magnitudes of these effects. So, there has also been some revision in our thinking about some concepts in order to soundly use and extensively apply statistical techniques to biological problems. For example, the use of large plots in field experiments because the variance of large plots is small was formerly recommended. Now, we recommend the use of small plots with a compensating rise in number of replications to utilize the available resources (Nelson and Rawlings 1983). Furthermore, researchers can often avoid the use of an inappropriate error term for testing or for modeling standard errors especially with the common use of statistical computing packages that use a default error term. By this is meant, for instance, that all terms not comprised in the linear equation spelled out in the software are pooled into an error term which the computer uses for statistical tests and estimates of precision. In a very large proportion of the cases, the tests of significance automatically supplied by computer packages are incorrect. So, each nematologist using computer software is responsible for the incorrectness of the analysis that may unintentionally occur (Nelson and Rawlings 1983). Clearly, nematologists need to be sure that the appropriate error term is being utilized in all cases; i.e., the analysis of variance is being practiced via a software program or a desk calculator. On the other hand, some resulting data may have variance heterogeneity. To resolve such an issue, in some cases, a transformation such as the log-transformation may be utilized (Abd-Elgawad and Hasabo 1995). Another approach is to group the data into sets which have homogeneous variance and conduct separate and independent analyses of variance for each set (Nelson and Rawlings 1983). A precise check of the data types will be useful to define if the priori biological model is appropriate or if the types display that some other equation would be more adequate.

Further statistical considerations in nematode research and reporting

As the field of statistics is the science of learning from data, it has been used so and should further advance nematological types of research and relevant applications. Nematologists have been using statistics to help them utilize the proper methods to gather the data, implement the correct analyses, and effectively display the results. Furthermore, statistical processes should be harnessed to support novel methodologies and discoveries in nematology, make decisions based on sound data, and make predictions. Therefore, statistics can open new avenues to grasp subject(s) much more deeply. For example, one of the most effective, economical, ecologically benign treatments to reduce yield losses from nematode diseases is to use pathogen-resistant plant cultivars (Sasser et al. 1984). A recent approach is to use biochemical markers to rate pathogen-plant suitability especially for the most economically significant pathogens with potential genes for resistance via statistical approaches (e.g., Abd-Elgawad and Molinari 2008; Abd-Elgawad et al. 2012). On the other hand, pitfalls of classical approaches for rating PPN-host suitability were discussed (Roberts 1992; Abd-Elgawad 2003). In this vein, for better categorization, Abd-Elgawad (1991)’s report provided nematode-host suitability designation in which statistical analysis played a key role in terms of different levels of statistical probability (P) values (Table 2). The P value always gives the probability of observing the evidence given the null hypothesis (H0). This hypothesis assumes non-significant difference between specified populations; i.e., any noted difference being due to sampling or/and experimental error. Thus, the P value is the probability, under the assumption of no association or no effect (H0), of getting a result equal to or more extreme than what was actually found (Goodman 2005). Hence, the first step toward a shift in quantitative interpretation of P values is its categorization into more than two levels (Held and Ott 2018). Therefore, Abd-Elgawad (1991) used the following P values as measures of evidence for H0: if P ≈ 0.2 there is “a suggestion of evidence of insignificant nematode damage” agrees with H0, if P ≈ 0.1 there is “a suggestion of evidence of significant nematode damage” against H0, if P ≈ 0.05 there is “modest evidence of significant nematode damage” against H0, and if P ≈ 0.01 there is “strong evidence of significant nematode damage” against H0. Thus, categorization of P values into levels of evidence against H0 could support other factors (Reports 1992; Akinsanya and Afolami 2019) usually employed in root-knot nematode-host suitability designation (Table 2).

Table 2 A rating for host suitability designation to Meloidogyne spp

Dritsoulas et al. (2019) also found that metabarcoding, which has terrific potential to characterize soil communities of EPNs and their natural enemies at lower expense (because species specific tools are not required) than use of quantitative polymerase chain reaction (qPCR). Interestingly, they demonstrated that all EPN species detected by qPCR were also recognized by metabarcoding. So, they harnessed statistics to measure sampling accuracy based on gathered data from each method by the fit of TPL. Metabarcoding showed significantly better sample reliability, because its coefficient of determination (r2) for the linear regression equation of TPL was considerably higher than that of qPCR. Furthermore, analysis of variance of EPN abundance in different habitats resulted in the same conclusions. In order to grasp the subject more soundly, they also applied canonical correspondence analysis which revealed the same abiotic variables (pH, and clay content) as most relevant factors to the community variation in both data sets. Eventually, such statistical inferences could help to determine the most cost-effective and reliable means of assessing soil food webs of techniques currently available (Dritsoulas et al. 2019). Furthermore, these conclusions implied a value-added approach to pest management programs.

Another example is related to nematode ecology which uses TPL as an empirical law (e.g., Duncan and Phillips 2009). As a rule of thumb, this law generally refers to a procedure or standard, based on practical experience rather than theory (Clapp et al. 2011). The law has been broadly applied as a principle in different disciplines other than nematology. Nevertheless, in nematology, it is used to describe and quantify nematode distribution patterns in Egypt (Abd-Elgawad 2014, 2016) and elsewhere (Duncan and Phillips 2009; Been and Schomaker 2013). Thus, recent nematological investigations could integrate theoretical and empirical methods to improve TPL application. This was materialized in sufficient data which supported a shift in the thumb rule value for the randomness of TPL (Abd-Elgawad 2020).

Conclusions

Statistical planning of bioassays and experiments should be a pre-consideration in nematology. Common misuses of statistics in research and reporting should be avoided. On the contrary, useful conclusions from nematological data via upgraded and modern statistical analyses should be further drawn. Statistical analyses always develop and consequently can help in optimizing nematological conclusions. Reckoning with the careful and recent application of statistics, one may expect that data analyses and outputs will be scholarly better assessed in near future to achieve considerable advances in nematology. Nonetheless, the identification of research priorities for harnessing statistics in sustainable agriculture in the context of grasping nematode relevant ecology, biology, and interaction with other microorganisms and agricultural inputs may hasten such advances.