Introduction

In commerce, any negotiation puts buyer and supplier in direct conflict (Footnote 1). Although the exchange of products and services can take place either with legal contracts or as informal agreements promoting the welfare of all participants, the main characteristic of negotiation is the attempt of one adversary to gain more than the other. Even in honest and open negotiations with a relatively free flow of well-defined objectives among all participants, there are still differences between the antagonisms of buyers and sellers. Each adversary is an independent decision maker, at least in theory, capable of assuming responsibility for her own decisions. In the commerce of large lots of standardized goods, statistical modeling and the concepts of probability can distinguish between different points of view, recognizing and revealing the conflicts inherent in negotiations. Consequently, to ensure the quality of large lots, each party may require different contractual sampling plans which specify lot size (N), sample size (n) and the maximum number of defective parts (c) in the sample that still allows for lot acceptance. The formal notation for such a plan is PL(N, n, c).

The main objective of this paper is to discuss the relationship between acceptance sampling and formal hypothesis tests as developed by Neyman and Pearson (NP). Considering that the pioneering work of Dodge and Romig (DR) (1929) in acceptance sampling, which has survived decades of academic debate and practice, arrived before the formalization of hypothesis testing by NP, the question is why bring hypothesis testing into the discussion at all. Throughout the rest of this paper, we will attempt to show that, if used appropriately, hypothesis testing offers a more logically complete structure for decision-making and therefore leads to better decisions.

It is common in the literature (for example, Shmueli 2011; Hund 2014), but not in the original pioneering work of DR, to associate consumer and producer (Footnote 2) risk with the probabilities of type II and type I error, respectively. In our approach, we shall go further and develop the generalization that both consumer and producer feel the cost of both errors. In other words, we shall explicitly allow for two type I errors and two type II errors depending on the perspective of the consumer or the producer. We will show that the decision-making process may be compromised by the commonly used simplification that type I error is felt only by the producer and, in like manner, type II error only by the consumer. Hypothesis testing from NP, by offering a common theoretical structure, can produce a better understanding of the application of sampling procedures and their results (Footnote 3). In a series of examples, we show that measures of risk become more reliable and that risk itself is lowered.

Deming (1986) was opposed to acceptance sampling. He argued that inspection by sampling leads to the erroneous acceptance of bad product as a natural and inevitable result of any commercial process, which in turn leads to the abandonment of continuous process improvement at the heart of the organization. Deming’s position that inspection should be either abandoned altogether or applied with 100% intensity has been debated in the literature (see Chyu and Yu 2006, for review and a Bayesian approach to the question), and his position is supported by some. Even though acceptance sampling is only a simple counting exercise with no analysis for uncovering the causes of non-conforming quality (Footnote 4), our position is that acceptance sampling should be an integral part of the commercial–industrial process and that, even when perfect confidence reigns between buyer and seller, sampling itself should never be abandoned. Deming (1975), however, was very much in accord with statistical studies by random sampling that are restricted to inferring well-defined characteristics of large populations, just not as a procedure for continuous quality improvement.

In the next section, we discuss traditional acceptance sampling emphasizing those concepts modified in the rest of the text. Sections “Lot tolerance percent defective (LTPD) in consumer risk” and “Acceptable quality limit (AQL) in producer risk” will present the traditional relationships between AQL and producer risk, and LTPD and consumer risk. Section “A unique sampling plan for both parties—DR tradition” closes the discussion of traditional acceptance sampling offering the possibilities of constructing sampling plans that are unique for both producer and consumer. In section “Acceptance sampling via hypothesis tests”, we will lay out our interpretation of NP hypothesis testing and its connection to acceptance sampling. The next two sections will attempt a synthesis of basic concepts in NP hypothesis testing and acceptance sampling. We then propose new procedures for the solution to unique sampling plans that simultaneously satisfy producer and consumer. Finally, the last two sections present conclusions and ideas for future work in the area. A series of appendices offer review material for statistical concepts frequently used in acceptance sampling, including R snippets that give a brief description of the R code used in figures and tables.

Traditional acceptance sampling

Considering that the pioneering work of DR comes earlier than the formalization of hypothesis testing by NP, the question is why acceptance sampling should integrate hypothesis testing at all. We will show that hypothesis testing can offer a common structure that generalizes and clarifies some issues in the application of acceptance sampling.

DR formally introduced inspection sampling in 1929 and in fact only from the viewpoint of the consumer. The priority given to the consumer will be an important ingredient for the discussion of hypothesis testing in this paper. They mention producer risk only marginally. In 1944, they emphasize even more clearly their position that consumer risk is their first priority (Dodge and Romig 1944).

The first requirement for the method will, therefore, be in the form of a definite assurance against passing any unsatisfactory lot that is submitted for inspection. […] For the first requirement, there must be specified at the outset a value for the lot tolerance percent defective (LTPD) as well as a limit to the probability of accepting any submitted lot of unsatisfactory quality. The latter has, for convenience, been termed the Consumer’s Risk….

Both consumer and producer are concerned with the quality of the lot measured by the fraction (p = X/N) of defective items. Values of p close to zero indicate that the lot is high quality. In traditional acceptance sampling, it is natural to assume that the producer requires a relatively low maximum value for p to guarantee that the lot is, in fact, acceptable to the consumer. The producer calls this limiting value for p the acceptable quality level (AQL). Even though management and business strategy determine the value of AQL, it should reflect the actual value of quality reached by the producer. A value of AQL lower than what the production line can deliver will lead to a sequence of rejections. On the other hand, the consumer in question will allow for a limiting value of p that is the maximum defective rate tolerable to the consumer, who calls this value the LTPD (Footnote 5). Any value of p greater than LTPD signifies that the consumer will reject the lot as low quality. Both producer and consumer know that AQL should be substantially lower than LTPD; this signifies that lots have relatively high quality when they leave the producer and avoid rejection by the buyer.

The classification rule in traditional acceptance sampling is relatively simple: a lot is acceptable if in a sample of size n the number of defective parts x is less than or equal to a predetermined cutoff value c. The inequality x ≤ c identifying a high-quality sample signifies that it is very likely that the lot also possesses an acceptable level of quality. On the other hand, if x is greater than c (x > c), identifying a low-quality sample, then it is likely that the lot is also of low quality. The practice of acceptance sampling determines the values of c and n that reduce the probability of error to an acceptable level. In other words, the intention of acceptance sampling is to minimize the probability of wrongly classifying the lot. Equation (1) represents the conditional probability of the sample indicating unacceptably low quality x > c when, in fact, the lot is high quality \(p \le {\text{AQL}}\), a false positive (FP) (Footnote 6). The cost of rejecting good lots falls more heavily on the producer than on the consumer.

$$P\left( {x > c/p \le {\text{AQL}}} \right) = P\left( {\text{FP}} \right) = {\text{probability}}\;{\text{of}}\;{\text{a}}\;{\text{false}}\;{\text{positive}}\;\left( {\text{FP}} \right)$$
(1)

The equation illustrates that the frequency of FPs depends on the chosen values of c and AQL. For example, if they were chosen to result in P(FP) = 5%, then, in the universe of high-quality lots, 5% of all samples would indicate in error that the lot was unacceptable. DR label Eq. (1) as producer risk. The producer who rejects a good product is creating a problem that in fact does not exist, perhaps even stopping the assembly line to find solutions to difficulties only imagined. Traditional acceptance sampling refers to Eq. (1), the probability of type I error, as α. We emphasize that DR never associated producer and consumer risk with type I and type II error.

When the sample includes a small number of defective items (x ≤ c), this indicates to the buyer that the lot is high quality. Equation (2) represents the conditional probability of the sample indicating high quality when, in fact, the lot is unacceptable (p > LTPD). This condition represents a false negative (FN).

$$P\left( {x \le c/p > {\text{LTPD}}} \right) = P\left( {\text{FN}} \right) = {\text{probability}}\;{\text{of}}\;{\text{a}}\;{\text{false}}\;{\text{negative}}\left( {\text{FN}} \right)$$
(2)

DR labeled Eq. (2) as consumer risk since acquiring bad product would harm assembly lines or retail with low-quality inputs and merchandise. In traditional acceptance sampling, the probability of type II error (Eq. (2)) takes the name of β. In the application of acceptance sampling, the producer and consumer predetermine the acceptable values for P(FN) and P(FP) along with LTPD and AQL. The solution for n and c, called a sampling plan (or sample design) PL(n, c), is mathematically determined from the binomial or Poisson distribution. All of this information is summarized in Table 1.

Table 1 Traditional lot classification in acceptance sampling with sample results compared to the real state

The application of acceptance sampling procedures falls into three simple steps:

  1. Determine the values for the size of the sample n and the cutoff limit c.

  2. Draw the sample and count the number of defective items x in it.

  3. Classify the lot:

    a. If x ≤ c then accept the lot as likely high quality, but this may be wrong for β percent of bad lots.

    b. If x > c then reject the lot as likely low, unacceptable quality, but this may be wrong for α percent of good lots.
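The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lot contents, the fixed random seed and the function name `classify_lot` are hypothetical, and the values n = 200 and c = 0 are borrowed from the plan PL(3000, 200, 0) discussed later in the text.

```python
import random

def classify_lot(sample, c):
    """Steps 2 and 3: count defectives x in the sample, then compare x with c."""
    x = sum(1 for defective in sample if defective)
    decision = "accept" if x <= c else "reject"
    return decision, x

# Step 1 (assumed values): sample size n and cutoff c, as in plan PL(3000, 200, 0).
n, c = 200, 0

# Hypothetical lot of N = 3000 items; True marks a defective item (15 defectives).
lot = [True] * 15 + [False] * 2985

rng = random.Random(1)       # fixed seed so the draw is reproducible
sample = rng.sample(lot, n)  # step 2: draw n items without replacement
decision, x = classify_lot(sample, c)
print(decision, x)
```

The decision depends only on x and c; the probabilities α and β attached to it come from the sampling distribution of x, not from this rule itself.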

Later on, our considerations on hypothesis testing will require modifications in the above steps. Appendix “Operating characteristic curve (OCC)” presents the operating characteristic curve (OCC). The curve is a standard statistical tool for understanding and constructing sampling plans and appears several times in this article.

Lot tolerance percent defective (LTPD) in consumer risk

The consumer defines the LTPD as the maximum acceptable rate of poor quality. The sampling plan, if well thought out, possesses values for n and c that indicate little chance of acceptance if p is greater than the LTPD tolerated by the consumer. Specifically, this means that the probability of error P(x ≤ c/p ≥ LTPD) = β is very small. In other words, the consumer protects herself against poor quality by choosing an adequate sampling plan that keeps her risk at a low and tolerable level for undesirable levels of p.

In Fig. 1 the sampling plan is PL(3000, 200, 0), remembering that it is generally beneficial to the buyer to have a very small c. In this example, LTPD is 1%. With p equal to 1% or greater (quality worse), there is a probability of still accepting the lot equal to 0.125 or less. Depending upon the necessities and market power of the buyer, consumer risk of less than 12.5% may be required. It is important to emphasize that the sampling plans analyzed in this section follow the cumulative hypergeometric distribution (Appendix “Hypergeometric distribution”).
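The acceptance probability quoted for Fig. 1 can be checked directly from the hypergeometric distribution. The sketch below is not the R code of the appendices; it uses only Python's standard library, the function name `accept_prob` is ours, and it assumes the number of defectives in the lot is p × N rounded to a whole number.

```python
from math import comb

def accept_prob(N, n, c, p):
    """P(x <= c) under the hypergeometric distribution for plan PL(N, n, c)."""
    D = round(p * N)  # assumption: defectives in the lot, rounded to an integer
    hits = sum(comb(D, k) * comb(N - D, n - k) for k in range(min(c, n, D) + 1))
    return hits / comb(N, n)

# Consumer risk at LTPD = 1% for the plan PL(3000, 200, 0) of Fig. 1.
consumer_risk = accept_prob(N=3000, n=200, c=0, p=0.01)
print(round(consumer_risk, 3))  # 0.125
```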

Fig. 1
figure 1

Hypergeometric sampling plan as OCC for PL(3000, 200, 0) and LTPD = 0.01

Along the OCC, the pair of values LTPD and P(LTPD) signifies a single point. There are several configurations of PL(N, n, c) compatible with a given pair of values for LTPD, P(LTPD), each configuration producing different shapes for the OCC. The choice of configuration, in practice, is not as free as it seems. Technology and the commercial terms of the negotiation usually impose lot size N. The value of c usually does not flee too far from zero. In the end, only sample size n remains unknown. We discuss this question further in what follows.
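When N and c are fixed in this way, the remaining unknown n can be found by a simple search. The following sketch assumes a hypothetical consumer risk limit of 10% together with N = 3000, c = 0 and LTPD = 1% (30 defectives in the lot); the function names are ours and the linear search is only one of several possible strategies.

```python
from math import comb

def accept_prob(N, n, c, D):
    """Hypergeometric P(x <= c) when the lot of size N holds D defectives."""
    hits = sum(comb(D, k) * comb(N - D, n - k) for k in range(min(c, n, D) + 1))
    return hits / comb(N, n)

def smallest_n(N, c, D, max_risk):
    """Smallest sample size n whose acceptance probability is at most max_risk."""
    for n in range(c + 1, N + 1):
        if accept_prob(N, n, c, D) <= max_risk:
            return n
    return None  # no sample size meets the limit

# Assumed inputs: N = 3000, c = 0, LTPD = 1% (D = 30), risk limit 10%.
n_min = smallest_n(N=3000, c=0, D=30, max_risk=0.10)
print(n_min, accept_prob(3000, n_min, 0, 30))
```

The search relies on the acceptance probability falling as n grows for fixed c, so the first n that meets the limit is also the smallest.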

Table 2 shows new calculations for consumer sampling plans defined by P(LTPD) and LTPD. The columns labeled letter, N, n, and c are common to most sampling standards. Shmueli (2011) uses ANSI/ASQC Z1.4 and ISO 2859 (1999, 1985) extensively. Note that in the table adequate sampling plans for the consumer are not abundant. There are few plans that produce a risk factor less than 10%. They appear mostly in the last three lines of the table. Table 3 produces comparable results for producer risk. This exercise in comparing consumer and producer risk serves to demonstrate the difficulty for two bargaining parties to find one unique plan that would satisfy the minimum risk requirements of both simultaneously. We will return to this topic after the discussion of producer risk.

Table 2 Traditional consumer sampling plans and consumer risk with the hypergeometric distribution
Table 3 Traditional producer sampling plans and producer risk with the hypergeometric distribution; Appendix “Producer sampling plans recalculated with the hypergeometric distribution” has a complete table of producer risk

Acceptable quality limit (AQL) in producer risk

Producer error comes from the idea that the producer suffers more from the rejection of good lots than the acceptance of bad ones. To calculate producer risk, the producer must decide upon the value of AQL. If p ≤ AQL, then the batch is defined as good, and likewise, if p > AQL lots are considered non-compliant. Well-chosen AQL and corresponding sampling plans reduce producer risk and therefore increase the probability of not rejecting good lots. The producer should offer items that bring high levels of satisfaction to the consumer and consequently renewed contracts. This means that AQL should always be less than LTPD.

In Fig. 2, the sampling plan is PL(3000, 10, 0), and AQL is 0.5%. For p equal to 0.5%, the probability of accepting the lot is equal to P(AQL) = 0.951. Since the sum of the probabilities of accepting the good lot and rejecting the good lot is equal to unity (see the definition of α in Table 1), the probability of rejection of the good lot is 0.049 (= 1 − 0.951). If p is less than the AQL of 0.5%, high quality is present; the producer is more likely to accept the lot. Remember that the probability of rejecting good lots is producer risk. In Fig. 2, the horizontal line P(AQL) divides the vertical axis at 0.951, and the part above that point up to the limit of one is the producer risk 0.049. In industry, a producer risk 1 − P(AQL) below 5% is very attractive and usual for acceptance sampling.
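As a worked check of these figures, and assuming the lot holds AQL × N = 0.005 × 3000 = 15 defective items, the hypergeometric probability of finding no defectives in a sample of 10 is

$$P({\text{AQL}}) = \frac{\binom{15}{0}\binom{2985}{10}}{\binom{3000}{10}} = \prod_{i = 0}^{9} \frac{2985 - i}{3000 - i} \approx 0.951,\quad 1 - P({\text{AQL}}) \approx 0.049$$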

Fig. 2
figure 2

Hypergeometric sampling plan with the OCC for PL(3000, 10, 0) and AQL = 0.005

The pair AQL and P(AQL) may correspond to several sampling plans and, consequently, several OCCs. In the next section, we will investigate the difficulties in finding a single sampling plan that would allow both producer and consumer to keep their own distinct values of AQL and LTPD, and the risk factors P(LTPD) and {1 − P(AQL)}, all for one unique sampling plan, PL(N, n, c). If such a plan exists, one unique inspection somewhere between producer and consumer would satisfy all parties and reduce inspection costs.

Table 3 has the same sampling plans as Table 2 but from the point of view of the producer. The table has an additional set of sampling plans in the last columns where c has comparatively larger values than c in the other columns. Only in the last column of the table are there sampling plans that would satisfy the requirements of the producer. Plans where c = 5 or 6 have risk factors that are less than 10%, even though in practice a risk limit of 5% for the producer is much more common.

Recognizing that the points of view of the producer or consumer come from different perspectives, the value chosen for c is of pivotal importance. The producer will want a c that is relatively large, admitting the possibility of accepting bad lots so that there is no rejection of good lots. The consumer, on the other hand, will desire a low value for c, consequently rejecting some good lots but better guaranteeing the acceptance of only good lots. Therefore, it will be difficult if not impossible to find values for c that will serve the desires of both adversaries.

A unique sampling plan for both parties—DR tradition

A solution for a unique sampling plan for buyer and seller might be determined based on five predetermined parameters and the search for appropriate values for n and c. The five predetermined parameters are:

  • LTPD and P(LTPD), desired values for the consumer,

  • AQL and [1 − P(AQL)], desired values for the producer, and

  • N, the lot size (same for both parties), necessary as a parameter of the hypergeometric probability function H{}.

Following Eq. 6 in the appendix,

$${\text{Consumer}}\;{\text{risk}} = P({\text{LTPD}}) = {\text{H}}\left\{ {{\text{PL}}\left( {N,n,c} \right),{\text{LTPD}}} \right\}$$
(3)
$${\text{Producer}}\;{\text{risk}} = 1 - P({\text{AQL}}) = 1- {\text{H}}\left\{ {{\text{PL}}(N,n,c),\;{\text{AQL}}} \right\}$$
(4)
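Equations (3) and (4) and the search for n and c can be sketched as follows. This is a minimal illustration rather than the procedure used for the figures: the function names, the coarse grid of sample sizes (steps of 100) and the rounding of p × N to a whole number of defectives are all assumptions made here.

```python
from math import comb

def H(N, n, c, p):
    """Hypergeometric P(x <= c) for plan PL(N, n, c) at lot fraction defective p."""
    D = round(p * N)  # assumption: defectives in the lot, rounded to a whole number
    top = sum(comb(D, k) * comb(N - D, n - k) for k in range(min(c, n, D) + 1))
    return top / comb(N, n)

def plan_risks(N, n, c, aql, ltpd):
    consumer_risk = H(N, n, c, ltpd)       # Eq. (3)
    producer_risk = 1.0 - H(N, n, c, aql)  # Eq. (4)
    return consumer_risk, producer_risk

def find_plan(N, aql, ltpd, max_consumer, max_producer, n_step=100):
    """Return the first (n, c) on a coarse grid of n that meets both risk limits."""
    for n in range(n_step, N, n_step):
        for c in range(n + 1):
            consumer_risk, producer_risk = plan_risks(N, n, c, aql, ltpd)
            if consumer_risk > max_consumer:
                break  # raising c only raises consumer risk, so try a larger n
            if producer_risk <= max_producer:
                return n, c
    return None

# Both risks limited to 5%, with AQL = 0.5% and LTPD = 1% for a lot of 3000.
plan = find_plan(N=3000, aql=0.005, ltpd=0.01, max_consumer=0.05, max_producer=0.05)
print(plan, plan_risks(3000, plan[0], plan[1], 0.005, 0.01))
```

A finer grid (smaller `n_step`) may return a smaller sample size at the cost of more evaluations.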

It is common practice in all standards for commerce and industry to work with consumer and producer risks at maximum values of either 5 or 10%. The common risk percentages produce four cases illustrated in Table 4.

Table 4 Common risk settings for maximum permissible values for consumer and producer risk

In case 5–5, both producer risk [1 − P(AQL)] and consumer risk P(LTPD) are 5%. In Fig. 3 AQL and LTPD are set at 0.005 and 0.01, respectively. We have drawn the corresponding ROC curve (see Appendix “The receiver operating characteristics (ROC) curve”) using the hypergeometric function for c = 10, 11, 12, 13. The plan PL(3000, 1600, 11) satisfies the risk conditions specified by buyer and seller, that both risks be less than 5%. Because of the discreteness of the probability function, consumer risk is 4.9% and producer risk is 3.2% at c = 11. Along the ROC curve, the value of c changes and accordingly the values of α and β. For example, the plan PL(3000, 1600, 12) is supported by α = 0.7% and β = 9.9%. This last plan is much better for the producer and much worse for the consumer. The higher ROC curve originates from the binomial distribution with the same sampling plan; nevertheless, due to the mathematics of the binomial, consumer and supplier risks result in much larger values, greater than 10%. The binomial deceives the decision makers into seeing almost double the risk where it does not exist.
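The trade-off along the curve, and the gap between the two distributions, can be reproduced with a short sketch. It assumes 15 and 30 defectives in the lot for AQL and LTPD, respectively, and uses function names of our own choosing; it does not draw the ROC curve itself, only the risk pairs behind it.

```python
from math import comb

def hyper_cdf(N, D, n, c):
    """Hypergeometric P(x <= c): sampling without replacement from a finite lot."""
    top = sum(comb(D, k) * comb(N - D, n - k) for k in range(min(c, n, D) + 1))
    return top / comb(N, n)

def binom_cdf(n, p, c):
    """Binomial P(x <= c): treats each sampled item as an independent trial."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(min(c, n) + 1))

N, n, aql, ltpd = 3000, 1600, 0.005, 0.01
D_aql, D_ltpd = round(aql * N), round(ltpd * N)  # assumed whole-number defectives

rows = []
for c in range(10, 14):
    rows.append({
        "c": c,
        "alpha_hyper": 1 - hyper_cdf(N, D_aql, n, c),  # producer risk
        "beta_hyper": hyper_cdf(N, D_ltpd, n, c),      # consumer risk
        "alpha_binom": 1 - binom_cdf(n, aql, c),
        "beta_binom": binom_cdf(n, ltpd, c),
    })

for r in rows:
    print(r["c"], *(round(r[key], 3) for key in
                    ("alpha_hyper", "beta_hyper", "alpha_binom", "beta_binom")))
```

Because the sample here is more than half of the lot, ignoring the finite lot size, as the binomial does, inflates both tail probabilities.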

Fig. 3
figure 3

ROC curve hypergeometric and binomial sampling plan PL(3000, 1600, c) for Case 5-5 from Table 4. AQL = 0.5% and LTPD = 1.0%

Case 5–10, illustrated in Fig. 4, is the one most often encountered in practice: consumer risk at 10% and producer risk at 5%. Buyers (who are uninterested in or unaware of the disadvantages) apply sampling plans that follow these risk levels even though they are prejudicial to the buyer himself. For AQL and LTPD at 0.005 and 0.01, respectively, the plan PL(3000, 1400, 10) satisfies the risk conditions specified. Buyer risk is 0.098 and producer risk is 0.034. This plan is slightly easier to apply than case 5–5, given the smaller sample n and acceptance number c.

Fig. 4
figure 4

ROC curve hypergeometric and binomial sampling plans PL(3000, 1400, c): case 5-10 from Table 4 AQL = 0.5% and LTPD = 1.0%

Case 10–5, in Fig. 5, represents a sampling plan that pleases the buyer and demonstrates his market power by putting the seller at a disadvantage. This case is actually quite frequent when the seller is a small or medium-sized establishment and the buyer is a large retailing or manufacturing firm; producer risk [1 − P(AQLp)] has been placed at 10% while consumer risk P(LTPDc) remains at 5%. AQL continues to be 0.005. The resulting unique sampling plan is PL(3000, 1400, 9). The buyer should be very pleased with this plan, represented by a risk factor of 4.7%, while on the other side, the supplier finds his position weakened, as he is obligated to produce at a relatively high-quality rate AQL of 0.005 and must confront a risk factor of 9.7%. Once again, the difference is large between the outcomes of the hypergeometric and the binomial probability functions.

Fig. 5
figure 5

ROC curve hypergeometric and binomial sampling plans PL(3000, 1400, c): advantageous to the consumer Case 10-5 AQL = 0.5% and LTPD = 1.0%

The last case in Table 4, where both consumer and producer risks are 10%, is not analyzed due to its very rare occurrence.

What is unclear in traditional acceptance sampling is the necessity of linking AQL exclusively to the producer and LTPD exclusively to the consumer. In reality, the consumer should also be preoccupied with a value of AQL, as should the producer with LTPD. We also question why type I error is always associated with the producer as producer risk, and likewise, the same question arises with consumer risk which is necessarily associated with type II error. The resolution of these questions is new to the literature and the remainder of this article will elaborate a response. In the next sections, we show that hypothesis test concepts from NP are relevant to practical applications of acceptance sampling, but only if the specific nature of the decision maker is taken into account.

Acceptance sampling via hypothesis tests

Historically, the work of Dodge and Romig (1929) appeared before the concepts of hypothesis testing received wide acceptance in practice. Their work depends exclusively on probability functions, and the probabilistic interpretation of the concepts of producer and consumer risk some years before Neyman and Pearson (1933) offered their seminal interpretation of type I and type II error.

DR worked in industry and commerce and, subsequently, the design of acceptance sampling they developed, because of the innate conflict between buyers and sellers, was strictly applicable to this environment. Our review of hypothesis testing is at most a simple skeleton of the relevant area of scientific methodology, better elaborated in works like Rice (1995, chapter 9) and the original work of Neyman and Pearson (1933). Nevertheless, our interpretation of acceptance sampling in light of hypothesis testing is new to the literature. First, we will concentrate on the nature and definition of the null hypothesis.

Simply stated, a hypothesis is a clear statement of a characteristic of a population and usually its numerical value, or of a relationship among characteristics (something happens associated with something else), that may or may not be true. It carries with itself a doubt that calls for evaluation. Hypotheses are not unique but come in pairs (or multiples not reviewed here) of exclusive statements in the sense that if one statement is true then the other statement is false. When the decision maker judges one of the hypotheses as true, he necessarily judges the other as false. The lot is conforming or non-conforming. Vaccination drives reached the target population or not. Your candidate is winning the election campaign or is not winning. The accused is either innocent or guilty.

From the viewpoint of the decision maker, the consequences of incorrectly rejecting one of the hypotheses are usually more severe than those of incorrectly rejecting the other. As we have seen above, lots are either conforming or non-conforming, and for the consumer for instance, incorrectly accepting the non-conforming lot committing the false negative can be disastrous. In such a case, the null hypothesis is the statement that costs the most when wrongly judged (Rice 1995). This nomenclature serves to organize relevant social or industrial questions or laboratory experiments. The null carries the symbol Ho, the alternative hypothesis Ha. From the consumer’s point of view, therefore, the null hypothesis is that the lot is non-conforming. Rejecting this null when it is true incurs extremely high costs for the consumer. In similar fashion but from the producer point of view, the null hypothesis is that the lot is conforming, because as mentioned already, rejecting this null has extremely high costs for the producer. We illustrate these differences in Table 5.

Table 5 Hypotheses and the decision maker

The hypothesis test attempts to classify the lot by accepting or rejecting the null usually by examining a small random sample. In Table 5, the decision maker indicates states of the null by examining a small sample of the population and consequently accepting or rejecting the null hypothesis. For the purpose of this article, we follow statistical methodology; a random sample from the relevant population indicates the state of the null. However, other methods are available outside the realm of Statistics, like flipping a coin or throwing seashells in a basket. In the population itself, the null is, in reality, either true or false, even though this condition in the population is unknown to the decision maker, and continues to be unknown even after the sampling procedure. As shown in Table 6, the result of the acceptance sampling procedure can have one of four possible results.

Table 6 Occurrences for hypothesis tests

Two quadrants are labeled as correct, and the other two as errors. In general, we would like to maximize the probability of falling into the correct boxes and minimize the probability of error. Following NP, accepting as true the false null is a type II error, whereas rejecting a true null is type I error. The exact definition of the null hypothesis is crucial, and as stated above should be defined as the condition that incurs the highest cost if chosen in error. The choice of which of the two hypotheses is to be the null depends, therefore, on the decision maker, and how he perceives the distinctive costs of the two errors.

In acceptance sampling, the statistical test of the validity of the null hypothesis is based on the relationship between x and c, given the value of n. When the null hypothesis suffers rejection, the researcher makes an inference as to the population value of the characteristic in the hypothesis test. However, the value of the characteristic is not a point estimate but rather only a probabilistic generalization of a region of values inferred from x and c. In other words, the rejection of the null does not imply anything about the point value of p itself other than its role in determining the conformance of the lot. Even the construction of confidence intervals does not supply isolated point estimates of the population parameters but rather an interval of probable variation around the point estimate.

We have assumed in the discussion above that the null for the producer is that the lot is conforming, or analogously the production line is stable producing a good product. The engineer who tries to correct problems that do not exist (he rejects the null when it is true) is wasting precious time in worthless activities. This is a type I error, the basis of the p value (Footnote 7) to the statistician and equivalent to producer risk for the industrial engineer. Increasing the value of the cutoff c will decrease the probability of type I error by making rejection of the lot more difficult. However, increasing the value of c makes acceptance easier and therefore will increase the probability of type II error, βP. Considering that type II error is relatively less important for the producer, the tradeoff tends to be attractive for the producer. From the consumer’s side of the story and contrary to the producer, the null should be that the lot is non-conforming, in other words, that the consumer should naturally distrust the quality of the lot or process.

In traditional acceptance sampling, consumer risk has been exclusively the subject of the consumer, and producer risk the subject of the producer. Nonetheless, there is no conceptual reason to restrict each risk factor to only one adversary in the negotiation process. Logically, there is no reason why the producer should not recognize and react to the probability of accepting the bad lot, which has been called up to now consumer risk. Accepting the bad lot is certainly a problem for the producer, although, as described earlier, a problem of secondary intensity. Likewise, rejecting the good lot and committing a false positive is also a problem for the consumer but of only moderate intensity. From the viewpoint of the consumer, we have LTPDc and P(LTPDc) and furthermore AQLc and P(AQLc) (Footnote 8). Here the consumer feels both risks, primarily the probability of accepting bad lots P(LTPDc), and less intensely the probability of rejecting good lots [1 − P(AQLc)]. The consumer can and should construct his sampling plan using both risks, recognizing that less consumer risk should be his objective since its repercussions are more costly, while he tolerates more secondary risk. Specifically, the consumer, for example, could use a P(LTPDc) of 3% and a [1 − P(AQLc)] of 10%.

The producer could follow analogous procedures. The producer will apply not only the risk pair AQLp and [1 − P(AQLp)], as would be traditional, but also the risk pair LTPDp and P(LTPDp), recognizing that the producer suffers from his own secondary risk even though to a lesser degree. For example, the producer could set [1 − P(AQLp)] to 1% and P(LTPDp) to 10%.

The correct routine for hypothesis testing is that first, we elaborate the hypothesis by conceptualizing an important characteristic of the population, and only then, in a second step, are the relevant probabilities of the resulting sampled data calculated. More importantly, the state of the hypothesis in the population is usually unknown, and will remain that way forever. Of course, one day in the future, end users will know the quality of the lot with certainty, depending upon the availability of all appropriate data. Nevertheless, even after ample time has passed, lot quality will remain elusive.

This section has been a first attempt at generalizing risk factors to both players. We have allowed consumers to recognize producer risk and producers may now acknowledge consumer risk. However, we have kept the two decision makers each as a self-determining unit. In later sections, we attempt to generalize acceptance sampling to the case of both risks applying to both producer and consumer simultaneously.

Acceptance sampling from the viewpoint of the decision maker

As seen above, the definition of the null depends on point of view. The producer must decide on a limiting value for AQLp above which the lot is unacceptable. In other words, if the fraction defective p is less than or equal to AQLp (p ≤ AQLp), then the lot is defined by the producer as conforming. On the other hand, for the consumer, (p ≤ LTPDc) defines the conforming lot. Under no circumstances should we assume that the values of AQLp and LTPDc are equal, nor should they be, given that they come from distinct decision makers on opposite sides of the negotiation.

Each decision maker should weigh the importance of two risks when constructing his own sampling plan: a primary risk based on his own null hypothesis and a secondary risk based on his own alternative hypothesis. For the consumer, the relation p ≤ AQLc defines secondary risk. Likewise, p > LTPDp defines the secondary risk for the producer. Consequently, Eqs. (1) and (2) can be rewritten as the following, featuring either the viewpoint of the producer emphasizing AQLp and LTPDp, or the consumer emphasizing LTPDc and AQLc, respectively.Footnote 9

$$P\left( {x > c/p \le {\text{AQL}}_{\text{P}} } \right) = P\left( {\text{FP}} \right) = \alpha_{\text{P}}$$
(1a)
$$P\left( {x \le c/p > {\text{LTPD}}_{\text{P}} } \right) = P\left( {\text{FN}} \right) = \beta_{\text{P}}$$
(2a)
$$P\left( {x \le c/p > {\text{LTPD}}_{\text{C}} } \right) = P\left( {\text{FN}} \right) = \alpha_{\text{C}}$$
(1b)
$$P\left( {x > c/p \le {\text{AQL}}_{\text{C}} } \right) = P\left( {\text{FP}} \right) = \beta_{\text{C}}$$
(2b)

The probabilities of type I error (αP and αC), here called primary risks, are given in Eqs. (1a) and (1b). Both equations represent the rejection of the respective null when it is true. Similarly, the probabilities of type II error (βP and βC), here called secondary risks, are given in Eqs. (2a) and (2b). These four equations collapse back to the original Eqs. (1) and (2) by assuming unique values of c and n and assuming AQLp = AQLc and LTPDc = LTPDp. In that case, αP = βC and αC = βP. Considering that the producer and the consumer are independent decision makers, there is no reason to expect these equalities in the real world. It is essential for the logic of this paper to understand the relative importance of FP and FN for the decision makers. For producers, FPs are more important, and for consumers, FNs are more important, as illustrated in the next tables. One further relation completes the concepts for hypothesis testing: α < β, reflecting the higher costs of type I error.

Table 7 elaborates the point of view of the consumer, and Table 8 that of the producer. At the buyer’s warehouse, the inspection will indicate the high or low quality of the lot. In some very rare cases, the buyer accepts the lot without inspection, implying total confidence between buyer and seller, but considering the rarity of mutual confidence in the marketplace, the buyer usually undertakes some kind of inspection. The desire of the buyer at the moment of inspection is to maximize the probability of accepting good lots, a true negative (TN), or rejecting bad ones, a true positive (TP). There is a tradeoff between the probability of erroneously accepting the bad lot, a false negative (FN), and the probability of erroneously rejecting the good lot, a false positive (FP). An appropriate probability function can measure these probabilities and form the basis for the construction of sampling plans.
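As a rough numerical illustration of Eqs. (1a) to (2b), the following sketch evaluates the four risks with the hypergeometric distribution. It rests on assumptions that are not stated in the equations themselves: each risk is evaluated at the boundary of its condition (p = AQL for the good lot, p = LTPD for the bad lot), and the number of defectives in the lot is taken as the rounded product p·N. The function names are hypothetical, and the plans PL(3000, 1400, 10) and PL(3000, 1400, 9) with AQL = 0.5% and LTPD = 1% are used only as example inputs.

```python
from math import comb

def hypergeom_cdf(c, N, D, n):
    """P(x <= c) when n items are drawn without replacement
    from a lot of N items containing D defectives."""
    upper = min(c, n, D)
    favourable = sum(comb(D, k) * comb(N - D, n - k) for k in range(upper + 1))
    return favourable / comb(N, n)

def risks(N, n, c, aql, ltpd):
    """Return (P(FP), P(FN)) for the plan PL(N, n, c).
    Assumption: risks are evaluated at the boundary values p = AQL and p = LTPD."""
    d_good = round(aql * N)    # defectives in a lot exactly at AQL
    d_bad = round(ltpd * N)    # defectives in a lot exactly at LTPD
    p_fp = 1.0 - hypergeom_cdf(c, N, d_good, n)   # P(x > c / p = AQL)
    p_fn = hypergeom_cdf(c, N, d_bad, n)          # P(x <= c / p = LTPD)
    return p_fp, p_fn

# Producer: FP is the primary risk (alpha_P), FN the secondary risk (beta_P).
alpha_p, beta_p = risks(3000, 1400, 10, 0.005, 0.01)

# Consumer: FN is the primary risk (alpha_C), FP the secondary risk (beta_C).
beta_c, alpha_c = risks(3000, 1400, 9, 0.005, 0.01)

print(f"producer: alpha_P = {alpha_p:.4f}, beta_P = {beta_p:.4f}")
print(f"consumer: alpha_C = {alpha_c:.4f}, beta_C = {beta_c:.4f}")
```

The same function serves both parties; only the interpretation of which probability is primary and which is secondary changes with the point of view.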

Table 7 Alternative occurrences for consumer

For the consumer, FN is an error of disastrous proportions, and as stated above, DR calls the probability of this occurrence consumer risk. Accepting the defective lot means placing inadequate material on the assembly line or on the store shelves. On the other hand, the consumer who rejects good lots commits FP, a lesser error, even though the result may be costly in terms of unnecessary replacement costs and delays. DR and all subsequent literature in acceptance sampling assume that the risk of rejecting acceptable lots has no relevance to the consumer. Even though this assumption may have been necessary to simplify the probability calculations in the early 1900s, today’s computing tools have made it largely unnecessary. Logically, the sampling plan that satisfies the consumer will have as a sample cutoff a very small number of defective items c and a relatively large sample size n, facilitating rejection. This means that some good lots will be judged non-conforming and rejected, but this consequence is less troublesome to the consumer. For instance, when a multinational makes purchases from a small supplier, the multinational sets c at zero, compelling the small supplier to rectify some lots unjustly rejected (Squeglia 1994). Logically, since its repercussions are more severe, the decision maker should hold primary risk to lower levels than secondary risk.

Table 8 describes the point of view of the producer. The null for the producer is that the lot is conforming. In this case, type I error means that the lot is actually conforming but has been rejected by the producer. As always and by definition, type I error is more costly than type II error.

Table 8 Alternative occurrences for producer

Immediately after the fabrication of the lot, but before dispatching it to the consumer, an inspection should occur to verify its quality level p. The interest of the producer is to dispatch lots with acceptable quality to assure customer satisfaction and future transactions. When an inspection by the producer results in rejected lots, the common practice is to apply 100% inspection to the rejected lot while it is still in the factory, replacing all bad parts. From the producer’s point of view, the major worry is the probability of the rejection of good lots, P(x > c/p ≤ AQLp). These are false positives (FP), known as producer risk: a false alarm calling the producer to action where action is not necessary. The producer’s priority is to avoid the rejection of good lots, and, consequently, the sampling plan includes a relatively large c. Clearly, with c large, the producer judges some bad lots as conforming, but as emphasized above, this error is less troublesome for the producer.

Calculating the unique plan that satisfies both parties, all risks

This section illustrates what happens when taking into account both consumer and producer simultaneously, and assumes that each adversary worries about both kinds of risk, the probability of accepting the bad lot and the probability of rejecting the good lot. Moreover, each adversary will prioritize one of the two risks and downgrade the importance of the other. In other words, we should select only those sampling plans where α < β. This generalization of acceptance sampling is not present in the literature.

As stated earlier, the possibility of letting bad product pass through undetected is only a minor nuisance for the producer, and so he tolerates this error at higher levels of risk. In Table 9, we allow the values of α and β to vary and report the revised plans. Sample size shows itself to be very sensitive to β risk. When allowing for higher values of secondary β risk, these new configurations can greatly reduce the cost of sampling. A small secondary risk of 5% requires a sample size of 1600, whereas allowing for more secondary risk brings sample size down to 1000.

The hypergeometric distribution is the basis of all of the calculations in Table 9, with two exceptions commented on later. Producer and consumer fix primary risks at less than 5%, and secondary risks vary. Lot size is 3000 units. LTPD is 1% and AQL is 0.5%.Footnote 10 The sampling plan PL(N, n, c) should satisfy these five parameters. The resulting sample sizes are much smaller than the lot size of 3000. We have devoted two lines in Table 9 to the binomial distribution to illustrate its inaccuracy. In the first line, the sample size is equal to the size of the lot, an illogical result since sampling by definition requires samples smaller than the population. Inspection of 100% of the lot is not sampling, and therefore there is no need to specify a value for c. This illogical result is due to the large variance implied by the binomial distribution, as explained in Appendix “Hypergeometric distribution”. Generally, the binomial yields pessimistic results; measures of risk and dispersion are overestimated. Such pessimism could discourage the use of sampling procedures and even lead to the abandonment of sampling programs.
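One way such plans could be searched for is sketched below. This is not necessarily the procedure behind Table 9: it is a minimal brute-force search, under the assumption that the risks are evaluated at p = AQL and p = LTPD with the number of defectives rounded to an integer. The helper names (`log_comb`, `pmf`, `accept_prob`, `find_plan`) are hypothetical, and the hypergeometric probabilities are computed through log-gamma functions for speed, which is an implementation choice made here.

```python
from math import exp, lgamma

def log_comb(a, b):
    """Natural log of the binomial coefficient C(a, b), for 0 <= b <= a."""
    return lgamma(a + 1) - lgamma(b + 1) - lgamma(a - b + 1)

def pmf(k, N, D, n):
    """Hypergeometric P(x = k): n items drawn from a lot of N with D defectives."""
    if k < 0 or k > D or n - k < 0 or n - k > N - D:
        return 0.0
    return exp(log_comb(D, k) + log_comb(N - D, n - k) - log_comb(N, n))

def accept_prob(c, N, D, n):
    """P(x <= c), the probability that the lot is accepted."""
    return sum(pmf(k, N, D, n) for k in range(c + 1))

def find_plan(N, aql, ltpd, max_fp, max_fn):
    """Smallest n, with its cutoff c, such that
    P(reject / p = AQL) <= max_fp and P(accept / p = LTPD) <= max_fn."""
    d_good, d_bad = round(aql * N), round(ltpd * N)
    for n in range(1, N + 1):
        # Smallest c that keeps the rejection of good lots within max_fp;
        # a larger c would only raise the acceptance of bad lots.
        c = 0
        while c < min(n, d_good) and 1.0 - accept_prob(c, N, d_good, n) > max_fp:
            c += 1
        fp = max(0.0, 1.0 - accept_prob(c, N, d_good, n))
        fn = min(1.0, accept_prob(c, N, d_bad, n))
        if fp <= max_fp and fn <= max_fn:
            return n, c, fp, fn
    return None

# Producer view: primary risk (FP) at most 5%, secondary risk (FN) at most 10%.
print("producer:", find_plan(3000, 0.005, 0.01, max_fp=0.05, max_fn=0.10))
# Consumer view: primary risk (FN) at most 5%, secondary risk (FP) at most 10%.
print("consumer:", find_plan(3000, 0.005, 0.01, max_fp=0.10, max_fn=0.05))
```

Tightening the secondary limit in either call forces the search toward larger n, which is the sensitivity of sample size to β described above.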

Table 9 Hypergeometric sampling plans PL(3000, n, c) for several values of α and β, N = 3000, LTPDp is 1%; AQLp is 0.5%

Figure 6 shows the appropriate ROC curves for the consumer sampling plan PL(3000, 1400, 9) and the producer sampling plan PL(3000, 1400, 10). The dialog between the two parties, in this case, is not so difficult since the two plans are very similar. The two parties could use the same sample of 1400 and proceed with the counting of defective items. The sampling procedure would be inconclusive in only one case, when x = 10. The cutoff value for the producer is 10 (x ≤ 10), meaning that the producer will see the lot as good quality, whereas the consumer will see this value of x as demonstrating the low quality of the lot (x > 9). The appearance of exactly 10 defective items in the sample will occur in only 5% of the lots and therefore should not cause extreme conflict between parties.
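The shared-sample procedure just described can be sketched as a small decision rule. The function names are hypothetical. Since the lot quality behind the 5% figure is not stated in this passage, the sketch simply evaluates the probability of exactly 10 defectives at two assumed quality levels, p = AQL (15 defectives in the lot) and p = LTPD (30 defectives in the lot).

```python
from math import comb

def joint_decision(x, c_consumer=9, c_producer=10):
    """Compare the defect count x of a shared sample with both cutoffs."""
    consumer_accepts = x <= c_consumer
    producer_accepts = x <= c_producer
    if consumer_accepts and producer_accepts:
        return "accept"
    if not consumer_accepts and not producer_accepts:
        return "reject"
    return "inconclusive"   # the parties disagree

def prob_exactly(x, N, D, n):
    """Hypergeometric P(exactly x defectives in a sample of n)."""
    return comb(D, x) * comb(N - D, n - x) / comb(N, n)

for x in (9, 10, 11):
    print(x, joint_decision(x))

# Assumed lot qualities: exactly at AQL (D = 15) and exactly at LTPD (D = 30).
for D in (15, 30):
    print(f"D = {D}: P(x = 10) = {prob_exactly(10, 3000, D, 1400):.4f}")
```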

Fig. 6 ROC curve sampling plans for producer PL(3000, 1400, 10) and consumer PL(3000, 1400, 9)

The examples above have all used N = 3000 for didactic reasons only. In Table 10, we have collected results that permit population size to vary. Noteworthy in the table is the similarity of the plans for the last three population sizes of 20,000, 30,000, and 40,000. Sample size and cutoff values are all the same, PL(N, 3000, c = 20, 21), but risk factors are not: as population size increases, risk factors also increase. This once again shows the necessity of using the hypergeometric distribution for obtaining more accurate results, even with relatively large populations.
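The dependence of risk on lot size for a fixed sample can be checked with a short calculation. The sketch below holds n = 3000 and c = 20 fixed and evaluates both risks for the three largest lot sizes, assuming as before that the risks are evaluated at p = AQL = 0.5% and p = LTPD = 1%. The choice of c = 20 rather than 21 and the helper name `accept_prob` are assumptions made here for illustration.

```python
from math import comb

def accept_prob(c, N, D, n):
    """P(x <= c) for n items drawn without replacement from a lot of N with D defectives."""
    ways = sum(comb(D, k) * comb(N - D, n - k) for k in range(min(c, n, D) + 1))
    return ways / comb(N, n)

n, c = 3000, 20              # fixed sample size and cutoff
aql, ltpd = 0.005, 0.01      # assumed evaluation points for the two risks

rows = []
for N in (20_000, 30_000, 40_000):
    d_good, d_bad = round(aql * N), round(ltpd * N)
    p_reject_good = 1.0 - accept_prob(c, N, d_good, n)   # P(x > c / p = AQL)
    p_accept_bad = accept_prob(c, N, d_bad, n)           # P(x <= c / p = LTPD)
    rows.append((N, p_reject_good, p_accept_bad))
    print(f"N = {N}: P(reject good) = {p_reject_good:.4f}, "
          f"P(accept bad) = {p_accept_bad:.4f}")
```

Because the sample is a smaller fraction of a larger lot, the finite-population correction weakens and both tail probabilities grow with N.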

Table 10 Hypergeometric sampling plans PL(N, n, c) for several lot sizes given LTPDp is 1%; AQLp is 0.5%

Conclusions

The paper has shown that sampling plans should be true to the data and the situation they represent. When lot sizes are finite, statistical approximations may lead to serious estimation errors. Since the process of sampling is inherently error prone, the sampling process should employ only the most accurate data available, including lot size. The priority for researchers in any area should be the utilization of the most appropriate formulations, like the hypergeometric distribution in sampling plans, so that unnecessary additions to inherent errors do not occur.

There are several directions for advancing this research, for example, the important area of Bayesian statistics, which is not mentioned in the paper. This was not because we wanted to devalue its appropriateness for sampling plans, but rather because its contribution to the questions of this paper, which are based entirely on traditional frequentist statistics, would be doubtful. Bayesian approaches will be an important part of future research. The work of Suresh and Sangeetha (2011) is especially interesting in this regard. Finally, we suggest that the application of the R package Shiny (shiny.rstudio.com, RSTUDIO 2013) could facilitate the day-to-day operation of the approach proposed here. Contributions to the literature have already gone in this direction (Hund 2014).

We intend to expand this research into the areas of public health and political polling. Public health has seen extensive applications of traditional acceptance sampling in lot quality assurance sampling (LQAS), but in our view, this area would benefit greatly from a reformulation following a careful reassessment of NP. The public health literature for LQAS is ambiguous on the issue of defining the null hypothesis, depending on the author and the area under scrutiny (Biedron et al. 2010; Rhoda et al. 2010; Pagano and Valadez 2010). We suggest that the sampling plans should recognize and emphasize the viewpoint of the target population, as Rhoda et al. (2010) argue. For example, in the case of a vaccination campaign where coverage is of major interest, the population (analogous to the consumer) should suggest a null that would define the cutoff where coverage is inadequate. A sample that indicates the target population is receiving adequate levels of coverage, when in fact coverage is inadequate, is a very serious error indeed.

For political polling, the decision maker’s perspective can come from the candidate (consumer of consulting services) or from the pollster (supplier, producer). The null for the candidate is that his adversary is winning the political campaign, analogous to the null of the buyer that lots are of bad quality. If, in fact, the adversary is winning but the candidate believes that his own campaign is winning, the result for the candidate could be very costly. Falsely believing his campaign is ahead, he may diminish his efforts and consequently fall behind even more. This is type I error for the candidate. Public health and polling have already seen some progress in applying the new methods proposed here (Samohyl 2015, 2016).