1 Introduction

The term "sample size" refers to the number of subjects, patients, or units included in a biomedical investigation. Sample size determination is the process of deciding how many observations or replicates to include in a statistical sample. Careful consideration of sample size is crucial in any empirical study that aims to draw inferences about a population from a sample. The primary objective of a sample size calculation is to determine how many subjects are needed to estimate unknown clinical parameters, or to detect treatment effects or associations, once data collection is complete [1].

Recently, biomedical studies have faced heightened scrutiny within the context of evidence-based medicine. The strength of evidence is directly contingent on the statistical soundness of the supporting research [2]. In statistical terms, a "population" encompasses the entire group, while a "target population" refers to a subset of participants possessing specific demographic and clinical characteristics, which are the focus of the intervention research. A "sample" denotes a portion of the target population selected for inclusion in the study [3].

Determining the sample size is a pivotal step in the research process, because an appropriately sized study yields more accurate findings that can be generalized more broadly. To apply sample size formulas effectively, a researcher needs a solid grasp of the significance level, study power, effect size, margin of error, prevalence (proportion) in the population, and design effect [4].

2 Sample size calculation: basic statistical concepts

2.1 Null hypothesis and alternative hypothesis

The null hypothesis, denoted H0, is a statement that runs contrary to the researcher's or experimenter's expectation: it posits that there is no genuine relationship between the variables, or no difference between the groups being compared. For instance, in a clinical trial assessing a new drug, the null hypothesis might state that the new drug has, on average, the same effect as the current drug. The alternative hypothesis, denoted H1, states the effect that the statistical test is designed to detect; in the same trial, it could propose that the new drug has a different average effect from the existing drug. The alternative hypothesis thus expresses the finding the investigator anticipates. Alternative hypotheses fall into two groups: directional (one-sided), which specify the direction of the effect, and non-directional (two-sided), which do not.

2.2 Significance levels

The P value, denoted p, is a key metric in statistical hypothesis testing. It is the probability of obtaining a result at least as extreme as the one observed, assuming that the null hypothesis is true. A P value less than or equal to the chosen significance level α is deemed statistically significant. Commonly used significance levels are 0.1, 0.05, and 0.01; α is the probability of rejecting the null hypothesis when it is in fact true (a Type I error) that the investigator is willing to accept [5]. The P value can also be viewed as a measure of how compatible the data are with the statistical model used to analyze them, ranging from 0 (complete incompatibility) to 1 (full compatibility) [6]. Typically, a P value of 0.05 or less is regarded as "statistically significant", meaning that a result this extreme would be unlikely if the null hypothesis were true. Conversely, a P value greater than 0.05 is considered "nonsignificant", meaning that the observed result may reasonably be explained by random variability [6].
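To make the definition concrete, the following minimal Python sketch computes a two-sided P value for a test statistic that is assumed to follow a standard normal distribution under H0 (a z-test). The function names are hypothetical, and other tests (t, chi-square) would use their own reference distributions.

```python
from statistics import NormalDist  # Python 3.8+

def two_sided_p(z: float) -> float:
    """Probability, under H0, of a standard normal statistic at least as extreme as |z|."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

def is_significant(z: float, alpha: float = 0.05) -> bool:
    """A result is statistically significant when p <= alpha."""
    return two_sided_p(z) <= alpha

for z in (1.0, 1.96, 2.5):
    print(z, round(two_sided_p(z), 4), is_significant(z))
```

A statistic of 1.96 gives p of about 0.05, which is why 1.96 reappears as the critical value in the sample size formulas.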

2.3 Study power

Sample size calculation influences the generalizability of study findings to the broader population. Statistical power is the probability of detecting a true effect, that is, of correctly rejecting a false null hypothesis. Higher power reduces the chance of a Type II error (β), the error of concluding that there is no effect when one exists, and thus lowers the likelihood of false-negative conclusions. Power is denoted 1 − β, and in most clinical trials a power of 0.8 (80%) is considered adequate for detecting a statistically significant difference. With 80% power, there remains a 20% chance of failing to detect a difference that truly exists [4].
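The critical values used in the formulas of the following sections (1.96 for α = 0.05, 2.58 for α = 0.01, and 0.84 for 80% power) are quantiles of the standard normal distribution. A minimal sketch using Python's standard library, assuming a two-sided test; the function names are hypothetical:

```python
from statistics import NormalDist  # Python 3.8+

STD_NORMAL = NormalDist()  # mean 0, SD 1

def z_alpha(alpha: float) -> float:
    """Two-sided critical value Z(1 - alpha/2)."""
    return STD_NORMAL.inv_cdf(1 - alpha / 2)

def z_beta(power: float) -> float:
    """Z(1 - beta) for the desired power (1 - beta)."""
    return STD_NORMAL.inv_cdf(power)

print(round(z_alpha(0.05), 2))  # 1.96
print(round(z_alpha(0.01), 2))  # 2.58
print(round(z_beta(0.80), 2))   # 0.84
```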

2.4 Effect size

An effect size is a numerical measure of the strength of the relationship between two variables in a population. It may be a statistic calculated from sample data, or a parameter value for a hypothetical population [7]. Effect sizes can be expressed in terms of various parameters, including a regression coefficient in a regression analysis, a difference in means, or the risk of a specific event (such as a heart attack) occurring [8].

2.5 Prevalence rate in the population

Prevalence denotes the proportion of individuals in a specific population who have a particular disease or characteristic at a specific point in time or over a defined period [9]. In epidemiology, prevalence differs from incidence in that it counts all existing cases in the population at the time of assessment, both new and pre-existing, whereas incidence counts only new cases [9]. Point prevalence is the prevalence observed at a particular moment: the percentage of individuals with a specific disease or attribute on a specific date. Period prevalence is the prevalence over a specified interval: the proportion of individuals who have a particular illness or condition at any point within that interval [9].

$$\mathrm{Prevalence\, of\, disease}= \frac{\mathrm{all\, new \,and\, pre}-\mathrm{existing\, cases\, in\, a\, given\, period}}{\mathrm{ population\, during\, the\, same\, period}} \times {10}^{{\text{n}}}$$
(1)
$$\mathrm{Prevalence\, of \,attribute}= \frac{\mathrm{persons\, having\, particular\, attribute\,in\, a\,given\,period}}{\mathrm{Population\, during\, the\, same \,period}} \times {10}^{{\text{n}}}$$
(2)

The multiplier 10^n is usually 1 or 100 for common attributes, and might be 1,000, 100,000, or even 1,000,000 for rare attributes and most diseases.

Example for calculating prevalence in a population. An evaluation of 2000 pregnant women attending a hospital in Kano, Nigeria, found 785 women taking supplements at least twice a week during the second trimester. Calculate the prevalence of frequent supplement use in this group.

$$\mathrm{Prevalence}= (785 / 2000) \times 100 = 0.3925 \times 100 \approx 39\mathrm{\%}$$

Therefore, the prevalence of supplement use in this group is approximately 39%.
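Eqs. (1) and (2) reduce to a single ratio scaled by 10^n. A minimal Python sketch (the function name is hypothetical) reproducing the supplement-use example:

```python
def prevalence(cases: int, population: int, multiplier: float = 100) -> float:
    """Cases (new plus pre-existing) divided by the population, times 10^n."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * multiplier

print(round(prevalence(785, 2000), 2))  # 39.25 (percent)
```

For a rare disease, the same function can be called with `multiplier=100_000` to report cases per 100,000 population.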

2.6 Incidence rate

An incidence rate measures how quickly new cases of a disease occur in a population. It is the ratio of the number of new cases to the total time that the population is at risk (person-time), that is, the sum of the time each person was observed. Incidence, which can be expressed as a risk or as an incidence rate, thus reflects the number of new cases of disease over a given period.

$$\mathrm{Incidence\, rate}= \frac{\mathrm{Number\, of\, new\, cases\, of\, disease}}{\mathrm{Sum\, of\, the\, time\, each\, person\, was\, observed}}$$
(3)

Examples: Calculating Incidence Rates in the population.

Example: In 2019, 130,000 new cases of HIV were reported in Nigeria, whose population that year was estimated at 200,000,000. Calculate the incidence rate of HIV in Nigeria in 2019.

$$\mathrm{Incidence\, rate}= \frac{\mathrm{Number\, of\, new\, cases\, of\, disease}}{\mathrm{Total\, population} \times \mathrm{timeframe}} \times {10}^{{\text{n}}}$$
(4)

$$\mathrm{Incidence\, rate}= \frac{130{,}000}{200{,}000{,}000 \times 1} \times 100{,}000 = 65$$

The incidence rate is therefore 65 new cases of HIV per 100,000 population per year.
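A minimal sketch of Eq. (4) in Python. It assumes every person in the population is observed for the whole timeframe, so that person-time equals population × time; the function name is hypothetical.

```python
def incidence_rate(new_cases: int, population: int, years: float = 1.0,
                   per: float = 100_000) -> float:
    """New cases per `per` person-years, with person-time = population * years."""
    person_time = population * years
    return new_cases / person_time * per

rate = incidence_rate(130_000, 200_000_000)
print(f"{rate:.0f} new cases per 100,000 population per year")
```

If individuals are followed for different lengths of time, `person_time` should instead be the sum of each person's observed time, as in Eq. (3).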

2.7 Margin of error

The margin of error quantifies the random sampling variability in the results of a survey; it is the half-width of the confidence interval around an estimate. A larger margin of error means less certainty that a survey's findings accurately represent the broader population. A large margin of error arises when the sample is too small to capture the population accurately, when the variable being measured is highly variable, or both.
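For a proportion estimated from a simple random sample, the usual normal-approximation margin of error is d = Z1−α/2 √(p(1 − p)/n); rearranging it for n gives the cross-sectional formula in Section 3. The sketch below (hypothetical function name, simple random sampling assumed) shows how d shrinks as the sample grows:

```python
import math

def margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation confidence interval for a proportion."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (126, 504):
    print(n, round(margin_of_error(0.09, n), 3))
```

Because d scales with 1/√n, quadrupling the sample size halves the margin of error.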

2.8 Standard deviation (SD) in the population

The standard deviation measures the spread of values within a dataset. A small standard deviation indicates that the data points are closely clustered around the mean (also known as the expected value), whereas a large standard deviation indicates that they are widely dispersed [10]. A standard deviation close to 0 implies that the data points lie near the mean, whereas a high standard deviation implies that they are spread over a wide range above and below it [11].
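A small illustration, using two hypothetical datasets with the same mean and Python's `statistics.stdev` (the sample standard deviation, with n − 1 in the denominator):

```python
import statistics

tight = [4, 5, 6]   # hypothetical values clustered near the mean
wide = [1, 5, 9]    # hypothetical values spread far from the mean

for name, data in (("tight", tight), ("wide", wide)):
    print(name, statistics.mean(data), statistics.stdev(data))
```

Both datasets have a mean of 5, but their standard deviations are 1 and 4.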

3 Sample size calculation for cross-sectional studies/surveys

A cross-sectional study, also called a transversal study, prevalence study, or cross-analysis, is a type of research that analyzes data from a population, or a representative subset of it, at a specific point in time. The sample size needed to estimate a prevalence in such a study can be calculated as follows:

$$\mathrm{Sample\, size }({\text{n}}) =\frac{{{(Z}_{1-\frac{\alpha }{2}})}^{2}(p)(q)}{ {(d)}^{2}}$$
(5)

where

n = sample size.

Z1−α/2 = critical value of the standard normal distribution for the chosen confidence level (1.96 at 95% confidence, i.e. a 5% significance level or Type I error rate, and 2.58 at 99% confidence).

p = expected prevalence, usually based on previous research; q = 1 − p.

d = margin of error (precision).

Example for calculating sample size from a cross-sectional survey. Calculate the sample size required to study the prevalence of liver disease in the capital city of a country at 95% CI and a 5% margin of error, if a previous study found the prevalence of liver disease in the population to be 9%.

Given:

Z1−α/2 = 1.96.

p = 9% = 0.09.

q = 1 − 0.09 = 0.91.

d = 5% = 0.05.

Applying

$$\mathrm{Sample \,size }\left({\text{n}}\right)=\frac{{{(Z}_{1-\frac{\alpha }{2}})}^{2}\left(p\right)\left(q\right)}{ {\left(d\right)}^{2}}$$
$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{{\left(1.96\right)}^{2}\left(0.09\right)\left(0.91\right)}{ {\left(0.05\right)}^{2}}=125.85$$

n = 126 (rounded up to the next whole number).

Therefore, the investigator needs at least 126 subjects.
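Eq. (5) translates directly into code. The minimal Python sketch below (function names are hypothetical) assumes Z = 1.96 as in the example and always rounds up, since a fractional subject cannot be recruited:

```python
import math

def round_up(x: float) -> int:
    """Round up to a whole subject; rounding to 6 decimals first stops
    floating-point noise (e.g. 98.00000000000001) from adding an extra subject."""
    return math.ceil(round(x, 6))

def n_prevalence(p: float, d: float, z: float = 1.96) -> int:
    """Eq. (5): n = z^2 * p * q / d^2, with q = 1 - p."""
    q = 1 - p
    return round_up(z ** 2 * p * q / d ** 2)

print(n_prevalence(p=0.09, d=0.05))  # 126
```

When no previous estimate of p is available, p = 0.5 is a common conservative choice because it maximizes p(1 − p); with d = 0.05 it gives 385.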

4 Calculating sample size for a quantitative variable

Quantitative variables are characteristics that can be counted or numerically measured in the population, such as a person's height, weight, age, arm length, blood pressure, temperature, glucose level, hemoglobin content, and cholesterol level. The sample size needed to estimate the mean of such a variable can be calculated using the formula below.

$$\mathrm{Sample \,size }\left({\text{n}}\right)=\frac{{{(Z}_{1-\frac{\alpha }{2}})}^{2}{(SD)}^{2}}{ {\left(d\right)}^{2}}$$
(6)

where

Z1−α/2 = critical value of the standard normal distribution for the chosen confidence level.

SD = standard deviation.

d = margin of error (precision).

Example for calculating sample size with a quantitative variable. Calculate the sample size required to study the fasting blood sugar level in the population of a city at 95% confidence, if the acceptable margin of error for the mean is 2 mmol/L and the standard deviation of fasting blood sugar in previous studies was 6.5 mmol/L.

Given:

Z1−α/2 = 1.96.

SD = 6.5 mmol/L.

d = 2 mmol/L.

By employing Eq. (6):

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{{{(Z}_{1-\frac{\alpha }{2}})}^{2}{(SD)}^{2}}{ {\left(d\right)}^{2}}$$
$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{{(1.96)}^{2}{(6.5)}^{2}}{ {\left(2\right)}^{2}}=40.58$$

n = 41 (rounded up).

In this case, the researcher needs at least 41 subjects.
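A minimal Python sketch of Eq. (6), with Z fixed at 1.96 as in the example (function names are hypothetical):

```python
import math

def round_up(x: float) -> int:
    """Round up to a whole subject, ignoring floating-point noise beyond 6 decimals."""
    return math.ceil(round(x, 6))

def n_mean(sd: float, d: float, z: float = 1.96) -> int:
    """Eq. (6): n = z^2 * SD^2 / d^2."""
    return round_up(z ** 2 * sd ** 2 / d ** 2)

print(n_mean(sd=6.5, d=2))  # 41
```

Halving the margin of error to 1 mmol/L roughly quadruples the requirement, to 163 subjects.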

5 Sample size calculation for case–control studies

A case–control study, also known as a case–reference study, compares two existing groups that differ in outcome: cases, who have the condition of interest, and controls, who do not. Its primary purpose is to investigate whether exposure to a suspected causative factor is associated with the outcome, and it is commonly used to identify factors contributing to a particular health condition. Examples include studies of the relationship between tobacco smoking and lung cancer, alcohol consumption and liver disease, and smoking and mortality. There are two main types of case–control studies:

Qualitative case–control study: the exposure is categorical (exposed versus not exposed), as in studies of tobacco smoking and lung cancer or of exposure versus non-exposure to carcinogens; the sample size is based on the proportions exposed in each group.

Quantitative case–control study: the exposure is measured on a numerical scale (for example, daily intake), and the sample size is based on the difference in mean exposure between cases (disease) and controls (non-disease).

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{(r+1)\,p(1-p)\,{({Z}_{1-\beta }+{Z}_{1-\frac{\alpha }{2}})}^{2}}{r{\left({p}_{1}-{p}_{2}\right)}^{2}}$$
(7)

where

n = required number of cases (with r = 1, the number in each group).

r = ratio of controls to cases.

p = average proportion exposed, (p1 + p2)/2.

p1 = proportion exposed among cases.

p2 = proportion exposed among controls.

z1-β = standard normal value corresponding to the desired power (0.84 for 80% power).

z1-α/2 = critical value for the chosen significance level (1.96 at 95% CI).

Calculation of sample size from a qualitative case–control study. Calculate the sample size required to investigate the relationship between coffee consumption and the risk of heart disease at 95% CI and 80% power, assuming that the expected proportion exposed is 20% among cases and 8% among controls.

Given:

r = 1.

p = (0.20 + 0.08)/2 = 0.14.

z1-β = 0.84.

z1-α/2 = 1.96.

p1 = 0.20.

p2 = 0.08.

By employing Eq. (7):

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{(r+1)\,p(1-p)\,{({Z}_{1-\beta }+{Z}_{1-\frac{\alpha }{2}})}^{2}}{r{\left({p}_{1}-{p}_{2}\right)}^{2}}$$
$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{(1+1)(0.14)(1-0.14){(0.84+1.96)}^{2}}{1{\left(0.20-0.08\right)}^{2}}=131.10$$

n = 132 (rounded up).

From the calculations above, the researcher needs at least 132 cases and 132 controls.
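A minimal Python sketch of the case–control formula for proportions, written with the difference p1 − p2 in the denominator (the usual form for comparing two proportions) and with z values fixed at 1.96 and 0.84 to match the worked example. The function names are hypothetical.

```python
import math

def round_up(x: float) -> int:
    """Round up to a whole subject, ignoring floating-point noise beyond 6 decimals."""
    return math.ceil(round(x, 6))

def n_case_control_props(p1: float, p2: float, r: float = 1,
                         z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Cases needed (controls = r * cases); p1, p2 = proportions exposed."""
    p = (p1 + p2) / 2  # average proportion exposed, as in the worked example
    n = (r + 1) * p * (1 - p) * (z_beta + z_alpha) ** 2 / (r * (p1 - p2) ** 2)
    return round_up(n)

cases = n_case_control_props(p1=0.20, p2=0.08)
print(cases)  # 132
```

Recruiting more controls per case (r > 1) reduces the number of cases required, which helps when cases are scarce.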

Calculation of sample size from a quantitative case–control study. Calculate the sample size required to study the amount of monosodium glutamate consumption associated with kidney damage at 95% CI and 80% power, if a previous study reported a mean difference in monosodium glutamate consumption of 2 mg/day between the case and control groups and an SD of 10 mg/day.

Given that:

r = 1.

SD = 10.

z1-β = 0.84.

z1-α/2 = 1.96.

d = 2 mg/day.

Using the following equation:

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{(r+1){(SD)}^{2}{({Z}_{1-\beta }+{Z}_{1-\frac{\alpha }{2}})}^{2}}{r{\left(d\right)}^{2}}$$
(8)
$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{(1+1){(10)}^{2}{(0.84+1.96)}^{2}}{1{\left(2\right)}^{2}}=392$$

n = 392.

The researcher, therefore, needs at least 392 cases and 392 controls.
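A minimal Python sketch of Eq. (8), again with z values fixed at 1.96 and 0.84 (function names are hypothetical). The rounding guard matters here: the floating-point result is a hair above 392, and a plain ceiling would wrongly return 393.

```python
import math

def round_up(x: float) -> int:
    """Round up to a whole subject, ignoring floating-point noise beyond 6 decimals."""
    return math.ceil(round(x, 6))

def n_case_control_means(sd: float, d: float, r: float = 1,
                         z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Eq. (8): cases needed (controls = r * cases) to detect a mean difference d."""
    n = (r + 1) * sd ** 2 * (z_beta + z_alpha) ** 2 / (r * d ** 2)
    return round_up(n)

print(n_case_control_means(sd=10, d=2))  # 392
```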

6 Sample size estimation for cohort studies

A cohort study is a type of medical research aimed at understanding the causes of diseases and how various risk factors influence health outcomes. It is a form of longitudinal study that follows a group of individuals who share a common characteristic (a cohort) and observes them at different points in time. Cohort studies are one of the fundamental designs in epidemiology and are used across diverse fields, including social science, nursing, medicine, psychology, and other fields that rely on statistical evidence.

While clinical trials mainly serve to evaluate the effectiveness of newly developed drugs before their market release, epidemiological analysis of how risk factors influence disease occurrence plays a crucial role in identifying the causes of diseases and in providing pre-clinical evidence on the viability of preventive measures.

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{{\left({Z}_{1-\frac{\alpha }{2}}\sqrt{\left(1+\frac{1}{m}\right)p(1-p)}+{Z}_{1-\beta }\sqrt{\frac{{p}_{0}(1-{p}_{0})}{m}+{p}_{1}(1-{p}_{1})}\right)}^{2}}{{\left({p}_{0}-{p}_{1}\right)}^{2}}$$
(9)

where

n = number of exposed subjects required (with m = 1, the number in each group).

m = number of unexposed (control) subjects per exposed subject.

p0 = probability of the outcome in the unexposed (control) group.

p1 = probability of the outcome in the exposed group.

p = pooled proportion; with m = 1 this is the average of p0 and p1.

z1-β = standard normal value corresponding to the desired power (0.84 for 80% power).

z1-α/2 = critical value for the chosen significance level (1.96 at 95% CI).

Example of calculating sample size for a cohort study. Calculate the sample size needed to assess the association between tobacco smoking and the risk of mortality at 95% CI and 80% power, with equal numbers of exposed and control subjects, if a previous study indicated that the risk of mortality is 15% in tobacco smokers and 24% in the control group.

Given that:

m = 1.

p = (0.15 + 0.24)/2 = 0.195.

z1-β = 0.84.

z1-α/2 = 1.96.

p0 = 0.24.

p1 = 0.15.

Applying Eq. (9):

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{{\left({Z}_{1-\frac{\alpha }{2}}\sqrt{\left(1+\frac{1}{m}\right)p(1-p)}+{Z}_{1-\beta }\sqrt{\frac{{p}_{0}(1-{p}_{0})}{m}+{p}_{1}(1-{p}_{1})}\right)}^{2}}{{\left({p}_{0}-{p}_{1}\right)}^{2}}$$
$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{{\left(1.96\sqrt{2\left(0.195\right)\left(1-0.195\right)}+0.84\sqrt{0.24\left(1-0.24\right)+0.15\left(1-0.15\right)}\right)}^{2}}{{\left(0.24-0.15\right)}^{2}}=302.69$$

n = 303 (rounded up).

Based on this calculation, 303 smokers and 303 controls are needed to assess the association between tobacco smoking and the risk of mortality.
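A minimal Python sketch of the cohort formula, written with the sum p0(1 − p0)/m + p1(1 − p1) under the second square root (the standard two-proportion form for unequal group sizes) and with z values fixed at 1.96 and 0.84 to match the worked example. The pooled proportion (p1 + m·p0)/(m + 1) used for m ≠ 1 is an assumption of this sketch; with m = 1 it is the simple average used in the text. Function names are hypothetical.

```python
import math

def round_up(x: float) -> int:
    """Round up to a whole subject, ignoring floating-point noise beyond 6 decimals."""
    return math.ceil(round(x, 6))

def n_cohort(p0: float, p1: float, m: float = 1,
             z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Exposed subjects needed (unexposed = m * exposed).

    p0 = outcome probability in the unexposed (control) group,
    p1 = outcome probability in the exposed group.
    """
    p = (p1 + m * p0) / (m + 1)  # pooled proportion; the plain average when m = 1
    term_alpha = z_alpha * math.sqrt((1 + 1 / m) * p * (1 - p))
    term_beta = z_beta * math.sqrt(p0 * (1 - p0) / m + p1 * (1 - p1))
    return round_up((term_alpha + term_beta) ** 2 / (p0 - p1) ** 2)

print(n_cohort(p0=0.24, p1=0.15))  # 303
```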

7 Sample size estimation for comparative studies

Comparative studies involve the examination, comparison, and contrast of various subjects or concepts. These studies employ both quantitative and qualitative methods to assess similarities and/or differences between two groups of subjects and/or patients. For instance, if a researcher aims to investigate the effects of an antimalarial drug, they will partition their subjects into two groups: one receiving the antimalarial drug and the other receiving a placebo for comparison.

Quantitative comparative studies

(1) Calculate the sample size required to study the effect of a new antihypertensive drug that is expected to reduce blood pressure by 8 mmHg, if a previous study with other drugs reported a standard deviation of 20 mmHg. Assume a 5% level of significance and 80% study power.

Given that:

SD = 20 mmHg.

d = 8 mmHg.

z1–β = 0.84.

z1–α/2 = 1.96

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{2{(SD)}^{2}{({Z}_{1-\beta }+{Z}_{1-\frac{\alpha }{2}})}^{2}}{{\left(d\right)}^{2}}$$
(10)
$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{2{(20)}^{2}{(0.84+1.96)}^{2}}{{\left(8\right)}^{2}}=98$$

n = 98.

Therefore, the investigator needs at least 98 subjects per group.

(2) A clinician wants to compare a new antidiabetic drug with a placebo and considers a reduction in blood sugar of 10 mg/dl relative to placebo to be clinically significant. A previous study recorded an SD of 20 mg/dl. Calculate the sample size if the study is to be conducted at 95% CI with 80% power.

Given that:

SD = 20 mg/dl.

d = 10 mg/dl.

z1-β = 0.84.

z1-α/2 = 1.96.

Applying

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{2{(SD)}^{2}{({Z}_{1-\beta }+{Z}_{1-\frac{\alpha }{2}})}^{2}}{{\left(d\right)}^{2}}$$
(11)
$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{2{(20)}^{2}{(0.84+1.96)}^{2}}{{\left(10\right)}^{2}}=62.72$$

n = 63 (rounded up).

Therefore, the investigator needs at least 63 subjects per group.
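Both quantitative examples use the same formula, so one small Python sketch covers them; the function names are hypothetical and the z values are fixed at 1.96 and 0.84 as in the text. The rounding guard keeps the first result at exactly 98 despite floating-point noise.

```python
import math

def round_up(x: float) -> int:
    """Round up to a whole subject, ignoring floating-point noise beyond 6 decimals."""
    return math.ceil(round(x, 6))

def n_two_means(sd: float, d: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Subjects per group: n = 2 * SD^2 * (z_beta + z_alpha)^2 / d^2."""
    return round_up(2 * sd ** 2 * (z_beta + z_alpha) ** 2 / d ** 2)

print(n_two_means(sd=20, d=8))   # 98 (antihypertensive example, mmHg)
print(n_two_means(sd=20, d=10))  # 63 (antidiabetic example, mg/dl)
```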

Qualitative comparative studies

(1) A previous study indicated that 35% of patients with AIDS die within a stipulated time. A clinician considers that a new antiretroviral therapy would be clinically significant if it reduced mortality over the same period to 24%. The effect size is therefore the difference between the proportions, 0.35 − 0.24 = 0.11. Calculate the sample size required at a 5% level of significance and 80% study power.

Given that:

p = (0.35 + 0.24)/2 = 0.295.

p1 = 0.35.

p2 = 0.24.

z1-β = 0.84.

z1-α/2 = 1.96.

Applying:

$$\mathrm{Sample\, size }\left({\text{n}}\right)=\frac{2p(1-p){({Z}_{1-\beta }+{Z}_{1-\frac{\alpha }{2}})}^{2}}{{\left({p}_{1}-{p}_{2}\right)}^{2}}$$
(12)
$$\mathrm{Sample \,size }\left({\text{n}}\right)=\frac{2(0.295)(1-0.295){(0.84+1.96)}^{2}}{{\left(0.35-0.24\right)}^{2}}=269.51$$

n = 270 (rounded up).

Therefore, the investigator needs at least 270 patients per group.
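A minimal Python sketch of Eq. (12), using the average of the two proportions as p and z values of 1.96 and 0.84 as in the example (function names are hypothetical):

```python
import math

def round_up(x: float) -> int:
    """Round up to a whole subject, ignoring floating-point noise beyond 6 decimals."""
    return math.ceil(round(x, 6))

def n_two_props(p1: float, p2: float, z_alpha: float = 1.96, z_beta: float = 0.84) -> int:
    """Eq. (12): patients per group to detect the difference p1 - p2."""
    p = (p1 + p2) / 2  # average of the two proportions
    return round_up(2 * p * (1 - p) * (z_beta + z_alpha) ** 2 / (p1 - p2) ** 2)

print(n_two_props(0.35, 0.24))  # 270
```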

8 Sample size formula for animal studies

Taking into account both ethical considerations and financial constraints, it is imperative to conduct experiments with a limited number of animals, but with meticulously designed protocols to ensure precise data analysis. It is strongly advised that researchers engage a statistician in the early stages of experimental design. Furthermore, embarking on any experiment without a clear understanding of how the results will be assessed is inadvisable [12]. The power analysis approach, akin to its application in human studies, is often the preferred method for determining sample size in animal research. Additionally, the 'resource equation' method stands as an equally valid alternative for ascertaining the appropriate sample size in studies involving animals [2, 12, 14, 15]. This approach aligns with the overarching goal of conducting research that is scientifically sound, cost-effective, and ethically responsible.

The resource equation [13] is given as

$$E= \left(the\, total \,number\, of\, experimental\, units\right)- \left(the\, total \,number\, of\, treatment\, groups\right)$$
(13)

where E, the error degrees of freedom, should lie between 10 and 20 [16].

Power analysis. Power, significance level, sidedness (one- or two-sided test), standard deviation, effect size, and sample size are six mathematically related quantities in a power analysis. If the first five are known, the sixth (normally the sample size) can be calculated [16].

Sample size calculation for animal study having one experimental group and one control

(I) For an experiment with one experimental group and one control group to be compared using an independent t-test, the number of animals can be calculated using the formula below.

$$df = {\text{N}} - {\text{s}} = {\text{sn}} - {\text{s}} = {\text{s}}\left({\text{n}} - 1\right)$$
(14)

where N denotes the total number of subjects, s the number of groups, n the number of subjects per group, and df the error degrees of freedom. Rearranging the formula gives n:

$$\mathrm{n }=\mathrm{ df}/\mathrm{s }+ 1$$
(15)

The df in the formula is replaced by the lower (10) and upper (20) limits of its acceptable range to obtain the minimum and maximum numbers of animals per group:

For minimum, n = 10/s + 1 and for maximum, n = 20/s + 1.

Example. Calculate the minimum and maximum number of animals required per group to test a new antidiarrheal drug against a control, if the research is designed to have 2 groups. Take df as 10 and 20, respectively.

$$\mathrm{n }=\mathrm{ df}/\mathrm{s }+ 1$$
(16)
$$ \begin{aligned} Minimum = {\text{n}} = & \frac{{df}}{{\text{s}}} + 1 \\ = & \frac{{10}}{2} + 1 \\ = & 6\;animals\;per\;group, \\ \end{aligned} $$
$$Total\, sample\, size= 6\times 2=12$$
$$ \begin{aligned} Maximum = {\text{n}} = & \frac{{df}}{{\text{s}}} + 1 \\ = & \frac{{20}}{2} + 1 \\ = & 11\;animals\;per\;group, \\ \end{aligned} $$
$$Total\, sample\, size=11\times 2=22$$
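A minimal Python sketch of the resource equation for two groups (Eqs. 13 and 15). The function names are hypothetical, and rounding a non-whole n up to the next animal is an assumption that follows the rounding used in the later examples.

```python
import math

def error_df(total_units: int, groups: int) -> int:
    """Eq. (13): E = total experimental units - total treatment groups."""
    return total_units - groups

def animals_per_group(groups: int, df: int) -> int:
    """Eq. (15): n = df / s + 1, rounded up to a whole animal."""
    return math.ceil(df / groups + 1)

s = 2
n_min, n_max = animals_per_group(s, 10), animals_per_group(s, 20)
print(n_min, n_min * s, error_df(n_min * s, s))  # 6 12 10
print(n_max, n_max * s, error_df(n_max * s, s))  # 11 22 20
```

The last column confirms that both designs keep E within the recommended range of 10 to 20.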

Sample size calculation for animal studies having one control group with two or more experimental groups

(II) For an experiment with one control group and two or more experimental groups treated with the same compound at varying concentrations, to be analyzed at the end using ANOVA, the formula below can be used.

$$df = \left({\text{N}} - 1\right)\left({\text{v}} - 1\right)$$
(17)

where N is the total number of subjects and v is the number of repeated measurements.

$$\mathrm{N }=\mathrm{ df}/(\mathrm{v }- 1) + 1$$
(18)

To obtain the minimum and maximum numbers of animals required, substitute 10 and 20, respectively, for df in the formulas below:

$$Minimum= {\text{N}} =\frac{ df}{{\text{v}} - 1}+ 1=\frac{10}{v-1}+1$$
(19)
$$Maximum=\mathrm{ N} =\frac{ df}{{\text{v}} - 1}+ 1=\frac{20}{v-1}+1$$
(20)

Example. Calculate the sample size required to test the toxicity of different doses of cisplatin (4.5, 5.5, 6.5, and 7.5 mg/kg) against a control, so that v = 5 (four doses plus the control).

$$ \begin{aligned} Minimum = {\text{N}} = & \frac{{10}}{{v - 1}} + 1 \\ = & \frac{{10}}{{\left( {5 - 1} \right)}} + 1 \\ = & \frac{{10}}{4} + 1 \\ = & 2.5 + 1 = 3.5 \\ \end{aligned} $$

3.5 is rounded up to 4.

$$ \begin{aligned} Maximum = {\text{N}} = & \frac{{20}}{{v - 1}} + 1 \\ = & \frac{{20}}{{5 - 1}} + 1 \\ = & \frac{{20}}{4} + 1 \\ = & 5 + 1 = 6 \\ \end{aligned} $$

Since this experiment may require sacrificing animals, the total sample size required can be calculated as follows:

$$Minimum = N\times v =4 \times 5= 20\, animals$$
$$Maximum= N\times v =6\times 5 =30 \,animals$$

As this experiment involves cisplatin, which is toxic, it is advisable to use the maximum number of animals, as some may die during the experiment. In this case, the 30 animals may be divided into 5 groups of 6 animals each.
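A minimal Python sketch of Eqs. (19) and (20) for the cisplatin example. The function name is hypothetical, and rounding up follows the example, where 3.5 becomes 4.

```python
import math

def subjects_repeated(v: int, df: int) -> int:
    """Eqs. (19)-(20): N = df / (v - 1) + 1, rounded up to a whole animal."""
    return math.ceil(df / (v - 1) + 1)

v = 5  # four cisplatin doses plus the control
n_min, n_max = subjects_repeated(v, 10), subjects_repeated(v, 20)
print(n_min, n_min * v)  # 4 20
print(n_max, n_max * v)  # 6 30
```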

The animal sample size for one control group and two or more experimental groups treated with different compounds

(III) For an experiment with one control group and two or more experimental groups treated with two different compounds at varying concentrations, to be analyzed at the end using ANOVA, the formula below can be used.

$$\mathrm{n }=\frac{\mathrm{ df}}{\mathrm{gv }}+ 1$$
(21)

where df denotes the error degrees of freedom, N the total number of subjects, g the number of groups, n the number of subjects per group, and v the number of repeated measurements.

Example. Diabetes is induced in mice with a single dose of freshly prepared streptozotocin (45 mg/kg body weight). Calculate the sample size needed to test the hypoglycemic effect of Moringa oleifera extract in diabetic and non-diabetic mice with the following groups (G): G1, non-diabetic (normal saline); G2, diabetic untreated; G3, diabetic (100 mg/kg); G4, diabetic (400 mg/kg); G5, diabetic (800 mg/kg); and G6, positive control (metformin 42 mg/kg).

Note that the groups can be considered as four treatment categories (negative control, positive control, treated groups, and diabetic untreated), so g = 4, and the number of repeated measurements is taken as v = 6.

$$\mathrm{n }=\mathrm{ df}/\mathrm{gv }+ 1$$
(22)
$$ \begin{aligned} Minimum = {\text{n}} = & \frac{{df}}{{{\text{gv}}}} + 1 \\ = & \frac{{10}}{{4 \times 6}} + 1 \\ = & \frac{{10}}{{24}} + 1 \\ = & 0.4 + 1 = 1.4\sim 2 \\ \end{aligned} $$
$$ \begin{aligned} Maximum = {\text{n}} = & \frac{{df}}{{{\text{gv}}}} + 1 \\ = & \frac{{20}}{{4 \times 6}} + 1 \\ = & \frac{{20}}{{24}} + 1 \\ = & 0.8 + 1 = 1.8\sim 2 \\ \end{aligned} $$

In this case, the minimum and maximum coincide: n = 2.

$$Animals\, per\, group = n \times g =2\times 4 = 8$$

As the experiment will involve sacrificing animals, the total sample size can be given as

$$Minimum =Maximum = (n \times g) \times v = 8 \times 6 =48\, animals$$

Therefore, the 48 animals can be divided into 6 groups of 8 animals each.
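A minimal Python sketch of Eq. (21) for the Moringa oleifera example, with g = 4 and v = 6 as defined in the text. The function name is hypothetical, and rounding up follows the example.

```python
import math

def n_two_factor(g: int, v: int, df: int) -> int:
    """Eq. (21): n = df / (g * v) + 1, rounded up to a whole animal."""
    return math.ceil(df / (g * v) + 1)

g, v = 4, 6  # treatment categories and repeated measurements, as in the example
n_min, n_max = n_two_factor(g, v, 10), n_two_factor(g, v, 20)
per_group = n_max * g   # animals in each of the six groups
total = per_group * v   # animals across all six groups
print(n_min, n_max, per_group, total)  # 2 2 8 48
```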

Software and online calculators for computing sample size. Many software packages and online calculators are available for calculating sample size; some of them are presented in Table 1.

Table 1 Software and online calculators for computing sample size

9 Conclusion

An adequate sample size is crucial for ensuring that research findings are precise and can be generalized to the broader population. In biomedical science, particularly in clinical and animal research, maintaining a high standard of quality is imperative, so that conclusions about the causes, pathophysiological mechanisms, prevention, and treatment of disease are both relevant and reliable. Achieving this rigor requires careful determination of sample size, which in turn contributes to the validity and robustness of research outcomes. This approach is fundamental to advancing our understanding of diseases and conditions, and ultimately to more effective and targeted healthcare interventions.