
On Detecting Some Defective Items in Group Testing

  • Conference paper
Computing and Combinatorics (COCOON 2023)

Abstract

Group testing is an approach aimed at identifying up to d defective items among a total of n elements. This is accomplished by examining subsets to determine if at least one defective item is present. We focus on the problem of identifying a subset of \(\ell < d\) defective items. We develop upper and lower bounds on the number of tests required to detect \(\ell \) defective items in both the adaptive and non-adaptive settings while considering scenarios where no prior knowledge of d is available, and situations where some non-trivial estimate of d is at hand.

When no prior knowledge of d is available, we prove a lower bound of \( \varOmega (\frac{\ell \log ^2n}{\log \ell +\log \log n})\) tests in the randomized non-adaptive settings and an upper bound of \(O(\ell \log ^2 n)\) for the same settings. Furthermore, we demonstrate that any non-adaptive deterministic algorithm must make \(\varTheta (n)\) tests, signifying a fundamental limitation in this scenario. For adaptive algorithms, we establish tight bounds in different scenarios. In the deterministic case, we prove a tight bound of \(\varTheta (\ell \log {(n/\ell )})\). Moreover, in the randomized settings, we derive a tight bound of \(\varTheta (\ell \log {(n/d)})\).

When d, or at least some non-trivial estimate of d, is known, we prove a tight bound of \(\varTheta (d\log (n/d))\) for the deterministic non-adaptive settings, and \(\varTheta (\ell \log (n/d))\) for the randomized non-adaptive settings. In the adaptive case, we present an upper bound of \(O(\ell \log (n/\ell ))\) for the deterministic settings, and a lower bound of \(\varOmega (\ell \log (n/d)+\log n)\). Additionally, we establish a tight bound of \(\varTheta (\ell \log (n/d))\) for the randomized adaptive settings.


Notes

  1. A lower bound on the number of tests when the algorithm knows d exactly is also a lower bound when the algorithm knows only some estimate of d or does not know d at all.

References

  1. Ahlswede, R., Deppe, C., Lebedev, V.: Finding one of d defective elements in some group testing models. Probl. Inf. Transm. 48, 04 (2012)

  2. Balding, D.J., Bruno, W.J., Torney, D., Knill, E.: A comparative survey of non-adaptive pooling designs. In: Speed, T., Waterman, M.S. (eds.) Genetic Mapping and DNA Sequencing. The IMA Volumes in Mathematics and its Applications, vol. 81, pp. 133–154. Springer, New York, NY (1996). https://doi.org/10.1007/978-1-4612-0751-1_8

  3. Bar-Noy, A., Hwang, F.K., Kessler, I., Kutten, S.: A new competitive algorithm for group testing. Discret. Appl. Math. 52(1), 29–38 (1994)

  4. Bshouty, N.H.: Lower bound for non-adaptive estimation of the number of defective items. In: 30th International Symposium on Algorithms and Computation, ISAAC 2019, December 8–11, 2019, Shanghai University of Finance and Economics, Shanghai, China, pp. 2:1–2:9 (2019)

  5. Bshouty, N.H., Bshouty-Hurani, V.E., Haddad, G., Hashem, T., Khoury, F., Sharafy, O.: Adaptive group testing algorithms to estimate the number of defectives. In: Algorithmic Learning Theory, ALT 2018, 7–9 April 2018, Lanzarote, Canary Islands, Spain, pp. 93–110 (2018)

  6. Bshouty, N.H., Diab, N., Kawar, S.R., Shahla, R.J.: Non-adaptive randomized algorithm for group testing. In: International Conference on Algorithmic Learning Theory, ALT 2017, 15–17 October 2017, Kyoto University, Kyoto, Japan, pp. 109–128 (2017)

  7. Bshouty, N.H., Haddad, G., Haddad-Zaknoon, C.A.: Bounds for the number of tests in non-adaptive randomized algorithms for group testing. In: Chatzigeorgiou, A., et al. (eds.) SOFSEM 2020. LNCS, vol. 12011, pp. 101–112. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-38919-2_9

  8. Cabrera Alvargonzalez, J.J., et al.: Pooling for SARS-CoV-2 control in care institutions. BMC Inf. Dis. 20(1), 1–6 (2020)

  9. Chen, H., Hwang, F.K.: Exploring the missing link among d-separable, d\(^-\)-separable and d-disjunct matrices. Discret. Appl. Math. 155(5), 662–664 (2007)

  10. Cheng, Y., Du, D., Xu, Y.: A zig-zag approach for competitive group testing. INFORMS J. Comput. 26(4), 677–689 (2014)

  11. Damaschke, P., Muhammad, A.S.: Randomized group testing both query-optimal and minimal adaptive. In: Bieliková, M., Friedrich, G., Gottlob, G., Katzenbeisser, S., Turán, G. (eds.) SOFSEM 2012. LNCS, vol. 7147, pp. 214–225. Springer, Heidelberg (2012). https://doi.org/10.1007/978-3-642-27660-6_18

  12. Dorfman, R.: The detection of defective members of large populations. Ann. Math. Stat. 14(4), 436–440 (1943)

  13. Du, D., Hwang, F.K.: Pooling Design and Nonadaptive Group Testing: Important Tools for DNA Sequencing. World Scientific Publishing Company, Singapore (2006)

  14. Du, D., Hwang, F.K.: Competitive group testing. Discret. Appl. Math. 45(3), 221–232 (1993)

  15. Du, D., Park, H.: On competitive group testing. SIAM J. Comput. 23(5), 1019–1025 (1994)

  16. Du, D., Xue, G., Sun, S., Cheng, S.: Modifications of competitive group testing. SIAM J. Comput. 23(1), 82–96 (1994)

  17. Du, D.-Z., Hwang, F.K.: Combinatorial Group Testing and Its Applications. World Scientific Publishing, Singapore (1993)

  18. Du, D.-Z., Hwang, F.K.: Pooling Designs and Nonadaptive Group Testing: Important Tools for DNA Sequencing. World Scientific Publishing, Singapore (2006)

  19. D'yachkov, A.G., Rykov, V.V.: Bounds on the length of disjunctive codes. Probl. Peredachi Inf. 18(3), 7–13 (1982)

  20. Eis-Hübinger, A.M.: Ad hoc laboratory-based surveillance of SARS-CoV-2 by real-time RT-PCR using minipools of RNA prepared from routine respiratory samples. J. Clin. Virol. 127, 104381 (2020)

  21. Falahatgar, M., Jafarpour, A., Orlitsky, A., Pichapati, V., Suresh, A.T.: Estimating the number of defectives with group testing. In: IEEE International Symposium on Information Theory, ISIT 2016, Barcelona, Spain, 10–15 July 2016, pp. 1376–1380. IEEE (2016)

  22. Füredi, Z.: On r-cover-free families. J. Comb. Theory Ser. A 73(1), 172–173 (1996)

  23. Gollier, C., Gossner, O.: Group testing against COVID-19. Covid Economics, pp. 32–42, April 2020

  24. Haddad-Zaknoon, C.A.: Heuristic random designs for exact identification of defectives using single round non-adaptive group testing and compressed sensing. In: The Fourteenth International Conference on Bioinformatics, Biocomputational Systems and Biotechnologies, BIOTECHNO 2022 (2022)

  25. Hwang, F.K.: A method for detecting all defective members in a population by group testing. J. Amer. Stat. Assoc. 67, 605–608 (1972)

  26. Katona, G.O.: Finding at least one excellent element in two rounds. J. Stat. Planning Inf. 141(8), 2946–2952 (2011)

  27. Kautz, W., Singleton, R.: Nonrandom binary superimposed codes. IEEE Trans. Inf. Theory 10(4), 363–377 (1964)

  28. Kuppusamy, P., Bharathi, V.: Human abnormal behavior detection using CNNs in crowded and uncrowded surveillance - a survey. Meas. Sens. 24, 100510 (2022)

  29. Liang, W., Zou, J.: Neural group testing to accelerate deep learning. In: IEEE International Symposium on Information Theory, ISIT 2021. IEEE (2021)

  30. Mentus, C., Romeo, M., DiPaola, C.: Analysis and applications of adaptive group testing methods for COVID-19. medRxiv (2020)

  31. Porat, E., Rothschild, A.: Explicit nonadaptive combinatorial group testing schemes. IEEE Trans. Inf. Theory 57(12), 7982–7989 (2011)

  32. Roth, R.M.: Introduction to Coding Theory. Cambridge University Press, Cambridge (2006)

  33. Ruszinkó, M.: On the upper bound of the size of the r-cover-free families. J. Comb. Theory Ser. A 66(2), 302–310 (1994)

  34. Schlaghoff, J., Triesch, E.: Improved results for competitive group testing. Comb. Probab. Comput. 14(1–2), 191–202 (2005)

  35. Shani-Narkiss, H., Gilday, O.D., Yayon, N., Landau, I.D.: Efficient and practical sample pooling for high-throughput PCR diagnosis of COVID-19. medRxiv (2020)

  36. Sobel, M., Groll, P.A.: Group testing to eliminate efficiently all defectives in a binomial sample. Bell Syst. Tech. J. 38, 1179–1252 (1959)

  37. Wang, W., Siau, K.: Artificial intelligence, machine learning, automation, robotics, future of work and future of humanity: a review and research agenda. J. Database Manage. (JDM) 30(1), 61–79 (2019)

  38. Wolf, J.: Born again group testing: multiaccess communications. IEEE Trans. Inf. Theory 31(2), 185–191 (1985)

  39. Wu, J., Cheng, Y., Du, D.: An improved zig zag approach for competitive group testing. Discret. Optim. 43, 100687 (2022)

  40. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1492–1500 (2017)

  41. Yelin, I., et al.: Evaluation of COVID-19 RT-qPCR test in multi-sample pools. medRxiv (2020)


Author information

Correspondence to Catherine A. Haddad-Zaknoon.

Appendices

A Known Results for Detecting All the Defective Items

The following results are known for detecting all the d defective items; see the table in Fig. 3.

Fig. 3. Results for the test complexity of detecting the d defective items. The lower bounds are given in the \(\varOmega \)-symbol and the upper bounds in the O-symbol.

  • In (1) and (2) (in the table in Fig. 3), the algorithm is deterministic adaptive, and d is known in advance to the algorithm. The best lower bound is the information-theoretic lower bound \(\log {n\atopwithdelims ()d}\ge d\log (n/d)+\varOmega (d)\). Hwang in [25] gives a generalized binary splitting algorithm that makes \(\log {n\atopwithdelims ()d}+d-1=d\log (n/d)+O(d)\) tests.

  • In (3) and (4), the algorithm is deterministic adaptive, and d is unknown to the algorithm. The upper bound \(d\log (n/d)+O(d)\) follows from [3, 10, 14, 15, 16, 34, 39], and the best constant currently known in O(d) is \(5-\log 5\approx 2.678\) [39]. The lower bound follows from (2). In [5], Bshouty et al. show that estimating the number of defective items within a constant factor requires at least \(\varOmega (d\log (n/d))\) tests.

  • In (5) and (6), the algorithm is randomized adaptive, and d is known in advance. The upper bound follows from (1). The lower bound follows from Yao's principle with the information-theoretic lower bound.

  • In (7) and (8), the algorithm is randomized adaptive, and d is unknown to the algorithm. The upper bound \(d\log (n/d)+O(d)\) follows from (3). The lower bound follows from (6).

  • In (9) and (10), the algorithm is deterministic non-adaptive, and d is known in advance to the algorithm. The lower bound \(\varOmega (d^2\log n/\log d)\) is proved in [9, 19, 22, 33]. A polynomial time algorithm that constructs a scheme making \(O(d^2\log n)\) tests was first given by Porat and Rothschild [31].

  • In (11) and (12), the algorithm is deterministic non-adaptive and d is unknown to the algorithm. In [4], Bshouty shows that estimating the number of defective items within a constant factor requires at least \(\varOmega (n)\) tests. The upper bound is the trivial bound of testing all the items individually.

  • In (13) and (14), the algorithm is randomized non-adaptive, and d is known in advance to the algorithm. The lower bound follows from (6). The upper bound is \(O(d\log (n/d))\). The constant in the O-symbol was studied in [2, 6, 7, 13] and the references therein. The best constant known in the O-symbol is \(\log e\approx 1.443\) [7].

  • In (15) and (16), the algorithm is randomized non-adaptive, and d is unknown to the algorithm. The lower bound \(\varOmega (n)\) follows from Yao's principle and the fact that, for a uniformly random \(i\in [n]\), detecting the defective items \([n]\backslash \{i\}\) requires at least \(\varOmega (n)\) tests. The upper bound is the trivial bound of testing all the items individually.

B Applications

In many cases, the detection of a specific number of defective items, \(\ell \), is of utmost importance due to system limitations or operational requirements. For instance, in scenarios like blood tests or medical facilities with limited resources such as ventilators, doctors, beds, or medicine supply, it becomes crucial to employ algorithms that can precisely identify \(\ell \) defectives instead of detecting all potential cases. This targeted approach offers significant advantages in terms of efficiency, as the time required to detect only \(\ell \) defective items is generally much shorter than the time needed to identify all defects. By focusing on any subset of \(\ell \) defectives, the algorithms proposed in this paper offer more efficient procedures.

B.1 Identifying a Subset of Samples that Exhibit a PCR-Detectable Syndrome

Polymerase Chain Reaction (PCR) testing is a widely used laboratory technique in molecular biology. The technique amplifies specific segments of DNA or RNA in a sample, thereby allowing for the detection, quantification and analysis of these specific genetic sequences [13, 24, 41]. PCR tests can be designed to identify various organisms, including pathogens such as viruses or bacteria (e.g. COVID-19), by targeting their unique DNA or RNA signatures. Although PCR tests are costly and time-consuming, they are extensively utilized in a wide range of fields, including medical diagnostics, research laboratories, forensic analysis, and other applications that demand accurate and sensitive detection of genetic material; this popularity is primarily attributed to their exceptional accuracy. To enhance the efficiency and cost-effectiveness of PCR testing, group testing methodologies can be applied. Applying group testing to PCR involves combining multiple samples into a single test sample, also called the group test, which is then examined. If the screening indicates infection, then at least one of the original samples is infected. Conversely, if the combined sample exhibits no sign of infection, then none of the individual samples is infected. Typically, PCR tests are conducted by specialized machines capable of performing approximately 96 tests simultaneously, and each test run can span several hours. Therefore, when applying group testing to accelerate the PCR process, it is recommended to employ non-adaptive methodologies.
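The pooling rule described above (a group test is positive if and only if the pool contains at least one infected sample) can be sketched in a few lines of Python; the data here (12 samples, two infected, pools of size 4) is purely illustrative:

```python
def pooled_test(pool, infected):
    """A group test is positive iff the pool contains at least one infected sample."""
    return any(s in infected for s in pool)

# Hypothetical instance: 12 samples, samples 3 and 7 infected.
samples = list(range(12))
infected = {3, 7}

# Split the samples into pools of size 4 and test each pool.
pools = [samples[i:i + 4] for i in range(0, 12, 4)]
results = [pooled_test(p, infected) for p in pools]
print(results)  # [True, True, False]
```

A negative pool clears all of its samples at once, which is where the savings over individual testing come from.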

Assume that a scientific experiment needs to be conducted on a group of study participants to examine the efficacy of a new drug developed to treat a disease related to a bacterial or viral infection. Suppose that a PCR test is required to check whether a participant is affected by the disease or not. Moreover, assume that the number of participants who volunteered for the experiment is n and that the incidence rate of the infection among them, denoted by p, is known in advance. Therefore, an approximation of the number of infected participants can be derived from n and p; denote this value by d. In situations where logistic constraints necessitate selecting a limited number of infected individuals, specifically \(\ell \le d\), to participate in the experiment, a non-adaptive group testing algorithm that identifies \(\ell \) defectives (virus carriers) from n samples when d is known can be employed.

B.2 Abnormal Event Detection in Surveillance Camera Videos

Efficiently detecting abnormal behavior in surveillance camera videos plays a vital role in combating crime. These videos are comprised of a sequence of continuous images, often referred to as frames. The task of identifying suspicious behavior within a video is equivalent to searching for abnormal behavior within a collection of frames. Training deep neural networks (DNNs) to automate suspicious image recognition is currently a widely adopted approach for the task [28, 37, 40]. By utilizing the trained DNN, it becomes possible to classify a new image and determine whether it exhibits suspicious characteristics or not. However, once the training process is complete, there are further challenges to address, especially when a substantial number of images needs to be classified via the trained network. In this context, inference is the process of utilizing the trained model to make predictions on new data that was not part of the training phase. Due to the complexity of the DNN, inference can cost hundreds of seconds of GPU time for a single image. Long inference time poses challenges in scenarios where real-time or near-real-time processing is required, prompting the need for optimizing and accelerating the inference process.

The detection of abnormal behavior in surveillance camera videos is often characterized by an imbalanced distribution of frames portraying abnormal behavior, also called abnormal frames, in relation to the total number of frames within the video. Denote the total number of frames in a video by n and the number of abnormal frames by d. To identify suspicious behavior in a video, the goal is to find at least one abnormal frame among the d frames. In most cases, we cannot assume any non-trivial upper bound or estimation of any kind for d. Therefore, applying non-adaptive group testing algorithms for finding \(\ell <d\) defectives when d is unknown best suits this task.

It is unclear, however, how group testing can be applied to instances like images. Liang and Zou [29] proposed three different methods for pooling image instances: 1) merging samples in the pixel space, 2) merging samples in the feature space, and 3) merging samples hierarchically and recursively at different levels of the network. For each grouping method, they provide network enhancements that ensure that the group testing paradigm continues to hold. This means that a positive prediction is inferred on a group if and only if it contains at least one positive image (abnormal frame).

C Useful Lemmas

We will use the following version of Chernoff's bound.

Lemma 5

Chernoff's Bound. Let \(X_1,\ldots , X_m\) be independent random variables taking values in \(\{0, 1\}\). Let \(X=\sum _{i=1}^mX_i\) denote their sum, and let \(\mu = \textbf{E}[X]\) denote its expected value. Then

$$\begin{aligned} \Pr [X>(1+\lambda )\mu ]\le \left( \frac{e^{\lambda }}{(1+\lambda )^{(1+\lambda )}}\right) ^{\mu }\le e^{-\frac{\lambda ^2\mu }{2+\lambda }}\le {\left\{ \begin{array}{ll} e^{-\frac{\lambda ^2\mu }{3}} &{} \text{ if } 0< \lambda \le 1 \\ e^{-\frac{\lambda \mu }{3}} &{} \text{ if } \lambda >1. \end{array}\right. } \end{aligned}$$
(1)

In particular,

$$\begin{aligned} \Pr [X>\varLambda ]\le \left( \frac{e\mu }{\varLambda }\right) ^{\varLambda }.\end{aligned}$$
(2)

For \(0\le \lambda \le 1\) we have

$$\begin{aligned} \Pr [X<(1-\lambda )\mu ]\le \left( \frac{e^{-\lambda }}{(1-\lambda )^{(1-\lambda )}}\right) ^{\mu }\le e^{-\frac{\lambda ^2\mu }{2}}. \end{aligned}$$
(3)
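As a quick sanity check (not part of the original text), the lower-tail bound (3) can be compared against a Monte Carlo estimate for a sum of fair coin flips; the parameters below are arbitrary:

```python
import random
from math import exp

# Monte Carlo sanity check of the lower-tail bound (3):
# Pr[X < (1 - lam) * mu] <= exp(-lam^2 * mu / 2) for X a sum of m fair coin flips.
random.seed(1)
m, lam = 1000, 0.2
mu = m * 0.5                      # E[X] for fair coins
trials = 500
hits = sum(
    1 for _ in range(trials)
    if sum(random.random() < 0.5 for _ in range(m)) < (1 - lam) * mu
)
empirical = hits / trials
print(empirical <= exp(-lam ** 2 * mu / 2) + 0.01)  # True
```

With these parameters the bound is roughly \(e^{-10}\), and the empirical tail frequency is essentially zero, consistent with (3).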

D Proofs forĀ Non-adaptive Settings

In this section, we give the proofs of all the results stated in Sect. 3. We restate the theorems and lemmas for convenience.

D.1 Deterministic Algorithms

In this subsection, we prove Theorem 3. This proves result (12) in Fig. 1. Result (11) follows from the algorithm that tests every item individually. Moreover, we give a detailed proof of Lemma 3.

Lemma 3. Let \(s\le cr\) for some constant \(1/2<c<1\). Let

$$t=O\left( \frac{r\log (n/r)+\log (1/\delta )}{\log (1/c)}\right) .$$

Consider a \(t\times n\) 0-1-matrix M where \(M_{i,j}=1\) with probability 1/r. Then, with probability at least \(1-\delta \), M is an (r, s)-restricted weight one \(t\times n\)-matrix. In particular, there is an (r, s)-restricted weight one \(t\times n\)-matrix with \(t=O\left( \frac{r\log (n/r)}{\log (1/c)}\right) .\)

Proof

Consider any r columns \(J=\{j_1,\ldots ,j_{r}\}\) in M. Let \(A_J\) be the event that the columns J in M do not contain at least s distinct weight one vectors. For every \(i\in [t]\), the probability that \((M_{i,j_1},\ldots ,M_{i,j_r})\) is of weight 1 is \({r\atopwithdelims ()1}(1/r)(1-1/r)^{r-1}\ge 1/2\). In every such row, the entry that is equal to 1 is distributed uniformly at random over J. Let \(m_J\) be the number of such rows. Given \(m_J=m\), the probability that the columns J in M do not contain at least s distinct weight one vectors is \(\Pr [A_J|m_J=m]\le {r\atopwithdelims ()s-1}\left( \frac{s-1}{r}\right) ^{m}\le 2^rc^{m}.\) Since \(\textbf{E}[m_J]\ge t/2\), by Chernoff's bound (Lemma 5), \(\Pr \left[ m_J<\frac{t}{4}\right] \le 2^{-t/16}.\) Therefore, the probability that M is not an (r, s)-restricted weight one \(t\times n\)-matrix is at most

$$\begin{aligned} \Pr [(\exists J\subset [n], |J|=r)\, A_J]&\le {n\atopwithdelims ()r} \Pr [A_J]\le {n\atopwithdelims ()r} \left( \Pr \left[ A_J\,\big |\,m_J\ge \frac{t}{4}\right] +\Pr \left[ m_J<\frac{t}{4}\right] \right) \\ &\le {n\atopwithdelims ()r}\left( 2^rc^{t/4}+2^{-t/16}\right) \le {n\atopwithdelims ()r}2^{r+1}c^{t/16}\le \delta . \end{aligned}$$

\(\square \)

Theorem 3. If d is unknown, then any non-adaptive deterministic algorithm that detects one defective item must make at least \(\varOmega (n)\) tests.

Proof

Consider any non-adaptive deterministic algorithm \(\mathcal{A}\) that detects one defective item. Let M be a 0-1-matrix of size \(t\times n\) whose rows correspond to the tests of \(\mathcal{A}\). Suppose that for the set of defective items \(I_0=[n]\) the algorithm outputs \(i_1\), for the set \(I_1=[n]\backslash \{i_1\}\) it outputs \(i_2\), for \(I_2=[n]\backslash \{i_1,i_2\}\) it outputs \(i_3\), and so on. Obviously, \(\{i_1,\ldots ,i_n\}=[n]\). Now, since the output for \(I_0\) is distinct from the output for \(I_1\), M must have a row that is equal to 1 in entry \(i_1\) and zero elsewhere. Since the output for \(I_1\) is distinct from the output for \(I_2\), M must have a row that is equal to 1 in entry \(i_2\) and zero in the entries \([n]\backslash \{i_1,i_2\}\), and so on. Therefore, M must have at least n rows. \(\square \)

D.2 Random Algorithms

In this subsection, we give detailed proofs of results (13)–(15) in Fig. 1, which are summarized in Subsect. 3.2. We start by proving the following lemma:

Lemma 6

There is a non-adaptive deterministic algorithm that makes \(t=\log n+0.5\log \log n+O(1)\) tests, decides whether \(d\le 1\), and, if \(d=1\), detects the defective item.

Proof

We define a 0-1-matrix M, where the rows of the matrix correspond to the tests of the algorithm. The size of the matrix is \(t\times n\), where t is the smallest integer such that \(n\le {t\atopwithdelims ()\lfloor t/2\rfloor }\) and its columns contain distinct Boolean vectors of weight \(\lfloor t/2\rfloor \). Therefore \(t=\log n+0.5\log \log n+O(1)\).

Now, if there are no defective items, we get 0 in all the answers of the tests. If there is only one defective item, and it is \(i\in [n]\), then the vector of answers to the tests is equal to column i of M. If there is more than one defective item, then the weight of the vector of answers is greater than \(\lfloor t/2\rfloor \). \(\square \)
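The construction in the proof of Lemma 6 can be sketched as follows. The code builds, for an illustrative n, the smallest t with \(n\le {t\atopwithdelims ()\lfloor t/2\rfloor }\), takes distinct columns of weight \(\lfloor t/2\rfloor \), and decodes the OR of the defective columns as in the proof; the test answers are simulated directly from a hypothetical defective set:

```python
from itertools import combinations
from math import comb

def build_matrix(n):
    """Smallest t with n <= C(t, t//2); columns are distinct supports of weight t//2."""
    t = 1
    while comb(t, t // 2) < n:
        t += 1
    cols = []
    for support in combinations(range(t), t // 2):
        if len(cols) == n:
            break
        cols.append(set(support))
    return t, cols

def decode(t, cols, defectives):
    """The answer vector is the OR (union) of the defective columns.

    Weight 0 means d = 0; a vector equal to some column means d = 1 (and names
    the item); any union of two or more distinct equal-weight columns has
    weight > t//2, which reveals d > 1.
    """
    answer = set().union(*(cols[i] for i in defectives)) if defectives else set()
    if not answer:
        return "d=0"
    for i, c in enumerate(cols):
        if c == answer:
            return f"d=1, item {i}"
    return "d>1"

t, cols = build_matrix(10)          # n = 10 gives t = 5, columns of weight 2
print(decode(t, cols, set()))       # d=0
print(decode(t, cols, {4}))         # d=1, item 4
print(decode(t, cols, {2, 7}))      # d>1
```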

Theorem 4 shows result (13) in Fig. 1:

Theorem 4. Suppose some integer D is known in advance to the algorithm where \(d/4\le D\le 4d\). There is a polynomial time non-adaptive randomized algorithm that makes \(O(\ell \log (n/d)+\log (1/\delta )\log (n/d))\) tests and, with probability at least \(1-\delta \), detects \(\ell \) defective items.

Proof

If \(\ell \ge D/32\), then the non-adaptive randomized algorithm that finds all the defective items makes \(O(d\log (n/d))=O(\ell \log (n/d))\) tests. So, we may assume that \(\ell <D/32\le d/8\).

Let \(\ell \le d/8\). The algorithm runs \(t=O(\ell +\log (1/\delta ))\) iterations. At each iteration, it uniformly at random chooses each element in \(X=[n]\) with probability 1/(2D) and puts it in \(X'\). If \(|X'|>4n/D\), then it continues to the next iteration. If \(|X'|\le 4n/D\), then it uses the algorithm in Lemma 6 to detect whether \(X'\) contains exactly one defective item, and if it does, it detects the item. If \(X'\) contains no defective item or more than one defective item, it continues to the next iteration.

Although the presentation of the above algorithm is adaptive, it is clear that all the iterations can be run non-adaptively.

Let A be the event that \(X'\) contains exactly one defective item. The probability of A is

$$\Pr [A]={d\atopwithdelims ()1}\frac{1}{2D}\left( 1-\frac{1}{2D}\right) ^{d-1}\ge \frac{1}{10}.$$

Since \(\textbf{E}[|X'|]=n/(2D)\), by Chernoff's bound (Lemma 5)

$$\Pr [|X'|>4n/D]\le \left( \frac{e^7}{8^8}\right) ^{n/(2D)}\le \frac{1}{20}.$$

Therefore,

$$\Pr [A \text{ and } |X'|\le 4n/D]\ge \frac{1}{20}.$$

Now, assuming A occurs, the defective in \(X'\) is distributed uniformly at random over the d defective items. Since \(\ell <d/8\), at each iteration, as long as the algorithm does not get \(\ell \) defective items, the probability of getting a new defective item in the next iteration is at least 7/8. Let \(B_i\) be the event that, in iteration i, the algorithm gets a new defective item. Then

$$\Pr [B_i]=\frac{7}{8}\Pr [A \text{ and } |X'|\le 4n/D]\ge \frac{7}{160}.$$

By Chernoff's bound (Lemma 5), after \(O(\ell +\log (1/\delta ))\) iterations, with probability at least \(1-\delta \), the algorithm detects \(\ell \) defective items.

Therefore, by Lemma 6, the test complexity of the algorithm is

$$O((\ell +\log (1/\delta ))\log {|X'|})=O\left( \ell \log \frac{n}{d}+\log (1/\delta )\log \frac{n}{d}\right) . $$

\(\square \)
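The sampling loop from the proof of Theorem 4 can be simulated as below. This is an illustration only: a direct membership check stands in for the Lemma 6 sub-tests, the defective set and parameters are hypothetical, and the non-adaptive packaging of the iterations is not modeled:

```python
import random

def find_ell_defectives(n, defectives, D, ell, iters, seed=0):
    """Sketch of the sampling loop in the proof of Theorem 4.

    Each iteration samples every item independently with probability 1/(2*D);
    an iteration succeeds when the sample contains exactly one defective item,
    in which case Lemma 6 would identify it (here: a direct check).
    """
    rng = random.Random(seed)
    found = set()
    for _ in range(iters):
        if len(found) >= ell:
            break
        sample = [i for i in range(n) if rng.random() < 1 / (2 * D)]
        hit = [i for i in sample if i in defectives]
        if len(hit) == 1:  # exactly one defective in the sample
            found.add(hit[0])
    return found

defectives = {3, 17, 42, 99}   # hypothetical defective set, d = 4
found = find_ell_defectives(n=1000, defectives=defectives, D=4, ell=2, iters=200)
print(found <= defectives and len(found) <= 2)  # True
```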

The following lower bound follows from Theorem 12 in Sect. 4. This is result (14) in Fig. 1.

Theorem 5. Let \(\ell \le d\le n/2\) and d be known in advance to the algorithm. Any non-adaptive randomized algorithm that, with probability at least 2/3, detects \(\ell \) defective items must make at least \(\ell \log (n/d)-1\) tests.

In [4, 11, 21], the following is proved:

Lemma 7

There is a polynomial time non-adaptive randomized algorithm that makes \(O(\log (1/\delta )\log n)\) tests and, with probability at least \(1-\delta \), finds an integer D that satisfies \(d/2<D<2d\).

Result (15) in Fig. 1 is summarized in Theorem 6.

Theorem 6. Let \(c<1\) be any constant, \(\ell \le n^c\), and d be unknown to the algorithm. There is a polynomial time non-adaptive randomized algorithm that makes \(O(\ell \log ^2n+\log (1/\delta )\log ^2n)\) tests, and with probability at least \(1-\delta \), detects \(\ell \) defective items.

Proof

We make all the tests of the non-adaptive algorithm that, with probability at least \(1-\delta /2\), 1/4-estimates d, i.e., finds an integer D such that \(d/4<D<4d\). By Lemma 7, this can be done with \(O(\log (1/\delta )\log n)\) tests.

We also make all the tests of the non-adaptive algorithms that, with probability at least \(1-\delta /2\), detect \(\ell \) defective items for all \(d=2^i\ell \), \(i=1,2,\ldots ,\log (n/\ell )\). By Theorem 4, this can be done with

$$O\left( \sum _{i=1}^{\log (n/\ell )}\ell \log \frac{n}{2^i\ell }+\log \frac{2}{\delta }\log \frac{n}{2^i\ell }\right) =O((\ell +\log (1/\delta ))\log ^2n)$$

tests. \(\square \)

E Proofs for Adaptive Settings

In this section, we give the proofs of the theorems that appeared in Sect. 4. We restate the theorems for convenience.

E.1 Deterministic Algorithms

In the following, we give proofs of all the theorems in Sect. 4.1.

Theorem 8. Let \(d\ge \ell \). There is a polynomial time adaptive deterministic algorithm that detects \(\ell \) defective items and makes at most \(\ell \log (n/\ell )+3\ell =O(\ell \log (n/\ell ))\) tests.

Proof

We first split the items \(X=[n]\) into \(\ell \) disjoint sets \(X_1,\ldots ,X_\ell \) of (almost) equal sizes (each of size \(\lfloor n/\ell \rfloor \) or \(\lceil n/\ell \rceil \)). Then we use the binary search algorithm (binary splitting algorithm) for each i to detect all the defective items in \(X_i\) until we get \(\ell \) defective items.

Each binary search takes at most \(\lceil \log (n/\ell )\rceil +1\) tests, and testing all the sets \(X_i\) takes at most \(\ell \) tests. \(\square \)

The following theorem summarizes the lower bound (2) in Fig. 1. We remind the reader that when we say that d is known in advance to the algorithm, we mean that an estimate D that satisfies \(d/4\le D\le 4d\) is known to the algorithm. The following lower bound holds even if the algorithm knows d exactly in advance.

Theorem 9. Let \(\ell \le d\le n/2\) and d be known in advance to the algorithm. Any adaptive deterministic algorithm that detects \(\ell \) defective items must make at least \(\max (\ell \log (n/d), \log n-1)=\varOmega (\ell \log (n/d)+\log n)\) tests.

Proof

Let A be an adaptive deterministic algorithm that detects \(\ell \) defective items. Let \(L_1,\ldots ,L_t\) be all the possible \(\ell \)-subsets of X that A outputs. Since the algorithm is deterministic, the test complexity of A is at least \(\log t\). Since \(L_i\subseteq I\) (the set of d defective items), each \(L_i\) can be an output of at most \({n-\ell \atopwithdelims ()d-\ell }\) sets I. Since the number of possible sets of defective items I is \({n\atopwithdelims ()d}\), we have

$$t\ge \frac{{n\atopwithdelims ()d}}{{n-\ell \atopwithdelims ()d-\ell }}\ge \frac{n(n-1)\cdots (n-\ell +1)}{d(d-1)\cdots (d-\ell +1)}\ge \left( \frac{n}{d}\right) ^\ell .$$

Therefore the test complexity of A is at least \(\log t\ge \ell \log (n/d).\)

We now show that \(t> n-d\). Suppose, to the contrary, that \(t\le n-d\). Choose any \(x_i\in L_i\) and consider any \(S\subseteq X\backslash \{x_i|i\in [t]\}\) of size d. For the set of defective items \(I=S\), the algorithm outputs some \(L_i\), \(i\in [t]\). Since \(x_i\notin S\), we have \(L_i\not \subseteq S\), a contradiction. Therefore, \(t> n-d\) and \(\log t> \log (n-d)\ge \log (n/2)=\log n-1\). \(\square \)
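As a numeric illustration (not part of the proof), the counting inequality \({n\atopwithdelims ()d}/{n-\ell \atopwithdelims ()d-\ell }\ge (n/d)^\ell \) from the proof of Theorem 9 can be checked for sample values:

```python
from math import comb

# Numeric check of the counting bound in the proof of Theorem 9;
# the values of n, d, ell below are arbitrary.
n, d, ell = 100, 10, 3
ratio = comb(n, d) / comb(n - ell, d - ell)
print(ratio >= (n / d) ** ell)  # True
```

Here the ratio telescopes to \(\frac{100\cdot 99\cdot 98}{10\cdot 9\cdot 8}=1347.5\), which indeed exceeds \((100/10)^3=1000\).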

Note that the upper bound \(O(\ell \log (n/\ell ))\) in Theorem 8 asymptotically matches the lower bound \(\varOmega (\ell \log (n/d))\) in Theorem 9 when \(d=n^{o(1)}\).

The following theorem proves result (4) in Fig. 3.

Theorem 10. Let \(\ell \le d\le n/2\) and d be unknown to the algorithm. Any adaptive deterministic algorithm that detects \(\ell \) defective items must make at least \(\ell \log (n/\ell )\) tests.

Proof

Since the algorithm works for any d, we let \(d=4\ell \). Then, by the first bound in Theorem 9, the result follows.   \(\square \)

E.2 Randomized Algorithms

In this subsection, we present the results on the test complexity of adaptive randomized algorithms. The following theorem proves the upper bound when d is known in advance to the algorithm. This proves result (5) in Fig. 1.

Theorem 11. Let \(\ell \le d/2\). Suppose some integer D is known in advance to the algorithm, where \(d/4\le D\le 4d\). There is a polynomial time adaptive randomized algorithm that makes \(\ell \log (n/d)+\ell \log \log (1/\delta )+O(\ell )\) tests and, with probability at least \(1-\delta \), detects \(\ell \) defective items.

Proof

Let \(c=32\log (2/\delta )\). If \(D<c\ell \), we can use the deterministic algorithm in Theorem 8. The test complexity is \(\ell \log (n/\ell )+2\ell \le \ell \log (cn/D)+2\ell =\ell \log (n/d)+\ell \log \log (1/\delta )+O(\ell )\).

If \(D\ge c\ell \), then the algorithm chooses each element of X independently at random with probability \(c\ell /D\le 1\) and puts the chosen items in \(X'\). If \(|X'|\le 3c\ell n/D\), then it deterministically detects \(\ell \) defective items in \(X'\) using Theorem 8.

Let \(Y_i\) be an indicator random variable that is 1 if the ith defective item is in \(X'\) and 0 otherwise. Then \(\textbf{E}[Y_i]=c\ell /D\). The number of defective items in \(X'\) is \(Y=Y_1+\cdots +Y_d\) and \(\mu :=\textbf{E}[Y]=cd\ell /D\ge c\ell /4\). By Chernoff's bound (Lemma 5), we have \(\Pr [Y<\ell ]\le e^{-(1-4/c)^2c\ell /8}<e^{-c\ell /32}\le \delta /2\). Also, \(\textbf{E}[|X'|]=c\ell n/D\), and by Chernoff's bound (Lemma 5), \(\Pr [|X'|>3c\ell n/D]\le (e/3)^{3c\ell n/D}\le \delta /2.\) Therefore, with probability at least \(1-\delta \), the number of defective items in \(X'\) is at least \(\ell \) and \(|X'|\le 3c\ell n/D\). Therefore, with probability at least \(1-\delta \), the algorithm detects \(\ell \) defective items.

Since \(|X'|\le 3c\ell n/D\le 12c\ell n/d\), by Theorem 8, the test complexity is at most \(\ell \log (|X'|/\ell )+2\ell =\ell \log (n/d)+\ell \log \log (1/\delta )+O(\ell ).\)   \(\square \)
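The subsampling step in the proof above can be sketched as follows; the function name is ours, and `c` follows the proof's choice \(c=32\log(2/\delta)\). On the resulting sample one would then run the deterministic algorithm of Theorem 8.

```python
import math
import random

def subsample(n, ell, D, delta, rng):
    """Keep each item of X = [n] independently with probability c*ell/D.
    With probability at least 1 - delta, the sample contains at least ell
    defective items and has size at most 3*c*ell*n/D."""
    c = 32 * math.log2(2 / delta)
    p = c * ell / D
    if p >= 1:  # the D < c*ell case: use the deterministic algorithm directly
        return list(range(n))
    return [x for x in range(n) if rng.random() < p]
```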

We now prove the lower bound when d is known in advance to the algorithm. This proves results (6) and (8) in Fig. 1, which are summarized in Theorem 12.

Theorem 12. Let \(\ell \le d\le n/2\) and d be known in advance to the algorithm. Any adaptive randomized algorithm that, with probability at least 2/3, detects \(\ell \) defective items must make at least \(\ell \log (n/d)-1\) tests.

Proof

We use Yao's principle in the standard way. Let A(s, I) be any adaptive randomized algorithm that, with probability at least 2/3, detects \(\ell \) defective items. Here s is the random seed, and I is the set of defective items. Let X(s, I) be an indicator random variable that equals 1 if A(s, I) returns a subset \(L\subseteq I\) of size \(\ell \) and 0 otherwise. Then for every I, \(\textbf{E}_s[X(s,I)]\ge 2/3\). Therefore, \(\textbf{E}_s[\textbf{E}_I[X(s,I)]]=\textbf{E}_I[\textbf{E}_s[X(s,I)]]\ge 2/3\), where the distribution in \(\textbf{E}_I\) is the uniform distribution. Thus, there is a seed \(s_0\) such that \(\textbf{E}_I[X(s_0,I)]\ge 2/3\). That is, for at least \(2{n\atopwithdelims ()d}/3\) sets of defective items I, the deterministic algorithm \(A(s_0,I)\) returns \(L\subseteq I\) of size \(\ell \). Now, similar to the proof of Theorem 9, the algorithm \(A(s_0,I)\) makes at least

$$\log \frac{\frac{2}{3}{n\atopwithdelims ()d}}{{n-\ell \atopwithdelims ()d-\ell }}\ge \ell \log (n/d)-1.$$

   \(\square \)

In particular,

Theorem 13. Let \(\ell \le d\le n/2\) and d be unknown to the algorithm. Any adaptive randomized algorithm that, with probability at least 2/3, detects \(\ell \) defective items must make at least \(\ell \log (n/d)-1\) tests.

We now prove the upper bound when d is unknown to the algorithm. This proves result (7) in Fig. 1.

Theorem 14. Let \(\ell \le d/2\) and d be unknown to the algorithm. There is a polynomial time adaptive randomized algorithm that makes \(\ell \log (n/d)+\ell \log \log (1/\delta )+O(\ell +\log \log (\min (n/d,d))+\log (1/\delta ))\) tests and, with probability at least \(1-\delta \), detects \(\ell \) defective items.

Proof

We first estimate d to within a factor of 2, with probability at least \(1-\delta /2\). By Lemma 9, this can be done with \(2\log \log (n/d)+O(\log (1/\delta ))\) tests. Then, by Theorem 11, the result follows.   \(\square \)

F Estimating d

The following lemma follows from [5, 21].

Lemma 8

Let \(\epsilon <1\) be any positive constant. There is a polynomial time adaptive algorithm that makes an expected \(O(\log \log d+\log (1/\delta ))\) tests and, with probability at least \(1-\delta \), outputs D such that \((1-\epsilon )d\le D\le (1+\epsilon )d\).

Here, we use a similar technique to prove:

Lemma 9

Let \(\epsilon <1\) be any positive constant. There is a polynomial time adaptive algorithm that makes an expected \(O(\log \log (\min (d,n/d))+\log (1/\delta ))\) tests and, with probability at least \(1-\delta \), outputs D such that \((1-\epsilon )d\le D\le (1+\epsilon )d\).

To prove LemmaĀ 9, we first prove:

Lemma 10

Let \(\epsilon <1\) be any positive constant. There is a polynomial time adaptive algorithm that makes an expected \(O(\log \log (n/d)+\log (1/\delta ))\) tests and, with probability at least \(1-\delta \), outputs D such that \((1-\epsilon )d\le D\le (1+\epsilon )d\).

We first give an algorithm that makes an expected \(O(\log \log (n/d))\) tests and outputs a D that, with probability at least \(1-\delta \), satisfies

$$\begin{aligned} \frac{\delta d^2}{4n\log ^2(2/\delta )}\le D\le d. \end{aligned}$$
(4)

The algorithm is:

1. \(\lambda \leftarrow 2\).
2. Let each \(x\in [n]\) be chosen to be in the test Q with probability \(1-2^{-\lambda /n}\).
3. If \(T_I(Q)=0\), then \(\lambda \leftarrow \lambda ^2\) and return to step 2.
4. \(D\leftarrow \delta n/(4\lambda ).\)
5. Output D.
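A runnable sketch of this doubling procedure follows. The test is simulated by sampling only the d defective items, which is equivalent since only they affect \(T_I(Q)\); the function name and this simulation shortcut are ours.

```python
import random

def coarse_estimate(n, d, delta, rng):
    """Square lam until a random test is positive, then output
    D = delta*n/(4*lam).  Each item enters the test independently with
    probability 1 - 2**(-lam/n), so Pr[T_I(Q) = 0] = 2**(-d*lam/n)."""
    lam = 2.0
    while True:
        p = 1 - 2 ** (-lam / n)
        # T_I(Q) = 1 iff at least one of the d defectives falls in Q
        if any(rng.random() < p for _ in range(d)):
            return delta * n / (4 * lam)
        lam = lam * lam
```

Since \(\lambda \ge 2\) always, the output never exceeds \(\delta n/8\); the lemma below shows it is also not much smaller than d, with probability at least \(1-\delta\).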

We now prove

Lemma 11

We have

$$\Pr \left[ \frac{\delta d^2}{4n\log ^2(2/\delta )}\le D\le d\right] \ge 1-\delta .$$

Proof

Let \(\lambda _i=2^{2^i}\) and \(Q_i\) be a set where each \(x\in [n]\) is chosen to be in \(Q_i\subseteq [n]\) with probability \(1-2^{-\lambda _i/n}\), \(i=0,1,\ldots \). Let \(i'\) be such that \(\lambda _{i'}< \delta n/(4d)\) and \(\lambda _{i'+1}\ge \delta n/(4d)\). Let \(D=\delta n/(4\lambda _j)\) be the output of the algorithm, where \(\lambda _j\) is the final value of \(\lambda \). Then, since \(\lambda _i\le \lambda _{i+1}/2\), we have \(\lambda _{i'-t}< \delta n/(2^{t+2}d)\) and

$$\begin{aligned} \Pr [D> d]=&\Pr [\delta n/(4\lambda _j)> d]=\Pr [\lambda _j< \delta n/(4d)]=\Pr [j\in \{0,1,\ldots ,i'\}] \\ =&\Pr [T_I(Q_0)=1 \vee T_I(Q_1)=1 \vee \cdots \vee T_I(Q_{i'})=1]\le \sum _{i=0}^{i'} \Pr [T_I(Q_i)=1]\\ =& \sum _{i=0}^{i'} (1-2^{-d\lambda _i/n})\le \sum _{i=0}^{i'} \frac{d\lambda _i}{n}\le \cdots +\frac{\delta }{8}+\frac{\delta }{4}\le \frac{\delta }{2}. \end{aligned}$$

Also, since \(\lambda _j>a\) implies \(\lambda _{j-1}>\sqrt{a}\),

$$\begin{aligned} \Pr \left[ D<\frac{\delta d^2}{4n\log ^2(2/\delta )}\right] =&\Pr \left[ \lambda _j\ge \frac{n^2}{d^2}\log ^2\frac{2}{\delta }\right] \\ =&\Pr \left[ T_I(Q_{j-1})=0\ \wedge \ \lambda _j\ge \frac{n^2}{d^2}\log ^2\frac{2}{\delta }\right] \\ \le & 2^{-d\lambda _{j-1}/n}\le 2^{-\log (2/\delta )}=\frac{\delta }{2}. \end{aligned}$$

This completes the proof.   \(\square \)

Lemma 12

The expected number of tests of the algorithm is \(\log \log (n/d)+O(1)\).

Proof

Let k be such that \((n/d)^{2}> \lambda _k\ge n/d\). The probability that the algorithm makes \(k+t+1\) tests is less than

$$2^{-d\lambda _{k+t}/n}=2^{-d\lambda _k^{2^t}/n}\le 2^{-(n/d)^{2^t-1}}.$$

Therefore, the expected number of tests of the algorithm is at most \(k+O(1)\). Since \(\lambda _k=2^{2^k}< (n/d)^2\), we have \(k=\log \log (n/d)+O(1)\).   \(\square \)

We now give another adaptive algorithm that, given that (4) holds, makes \(\log \log (n/d)+O(\log \log (1/\delta ))\) tests and, with probability at least \(1-\delta \), outputs \(D'\) that satisfies \(d\delta /8\le D'\le 8d/\delta \).

By (4), we have

$$1\le \frac{d}{D}\le H:=\sqrt{\frac{4\log ^2(2/\delta )}{\delta }\frac{n}{D}}.$$

Let \(\tau =\lceil \log (1+\log H)\rceil \). Then \(1\le d/D\le 2^{2^\tau -1}\) and \(0\le \log (d/D)\le 2^\tau -1\).

Consider an algorithm that, given a hidden number \(0\le i\le 2^\tau -1\), binary searches for i with queries of the form "Is \(i>m\)?". Consider the tree \(T(\tau )\) that represents all the possible runs of this algorithm, with nodes labeled with m. See, for example, the tree T(4) in Fig. 4.

Fig. 4. The tree T(4), which represents all the runs of the binary search algorithm for \(0\le i\le 15\). Suppose we search for the hidden number \(i=9\). We start from the tree's root, and the first query is "Is \(i>7.5\)?". The answer is yes, and we move to the right son of the root. The following query is "Is \(i>11.5\)?"; the answer is no, and we move to the left son. Etc.

We will do a binary search for an integer close to \(\log (d/D)\) in the tree \(T(\tau )\).

The algorithm is the following:

1. Let \(\ell =0\); \(r=2^\tau -1\).
2. While \(\ell \not =r\) do:
3. Let \(m=(\ell +r)/2\).
4. Let each \(x\in [n]\) be chosen to be in the test Q with probability \(1-2^{-1/(2^mD)}\).
5. If \(T_I(Q)=1\), then \(\ell \leftarrow \lceil m\rceil \); else \(r\leftarrow \lfloor m \rfloor \).
6. Output \(D':=D2^\ell \).
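A sketch of this binary search follows, with \(\tau\) computed from H as in the text. As before, the test is simulated through the defective items only (our shortcut); variable names are ours.

```python
import math
import random

def refine_estimate(n, d, D, delta, rng):
    """Binary search for l ~ log2(d/D) over {0, ..., 2**tau - 1}.
    A node labeled m tests a random set with inclusion probability
    1 - 2**(-1/(2**m * D)), so Pr[T_I(Q) = 0] = 2**(-d/(2**m * D))."""
    H = math.sqrt(4 * math.log2(2 / delta) ** 2 / delta * n / D)
    tau = math.ceil(math.log2(1 + math.log2(H)))
    lo, hi = 0, 2 ** tau - 1
    while lo != hi:
        m = (lo + hi) / 2  # may be a half-integer, as in the tree T(tau)
        p = 1 - 2 ** (-1 / (2 ** m * D))
        if any(rng.random() < p for _ in range(d)):  # T_I(Q) = 1
            lo = math.ceil(m)
        else:
            hi = math.floor(m)
    return D * 2 ** lo
```

The output is always between D and \(D2^{2^\tau -1}\); the analysis below shows it lies in \([d\delta /8,\, 8d/\delta ]\) with probability at least \(1-\delta\).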

We first prove

Lemma 13

Consider \(T(\tau )\) for some integer \(\tau \), an integer \(0\le i\le 2^\tau -1\), and the path \(P_i\) in \(T(\tau )\) from the root to the leaf i. Then:

1. \(P_i\) passes through a node labeled with \(i-1/2\), and the next node in \(P_i\) is its right son.

2. \(P_i\) passes through a node labeled with \(i+1/2\), and the next node in \(P_i\) is its left son.

Proof

If the path does not go through the node labeled with \(i-1/2\) (resp. \(i+1/2\)), then, in the search, we cannot distinguish between i and \(i-1\) (resp. \(i+1\)). Obviously, if we search for i and reach the node labeled with \(i-1/2\) (resp. \(i+1/2\)), the next node in the binary search is the right (resp. left) son.   \(\square \)

Now, by Lemma 13, if the algorithm outputs \(\ell '\), then there is a node labeled with \(m=\ell '-1/2\) that the algorithm went through, and the answer to the test was 1; that is, the algorithm continued to the right son. Therefore,

$$\begin{aligned} \Pr \left[ D'>\frac{8d}{\delta }\right] =& \Pr \left[ D2^\ell >\frac{8d}{\delta }\right] =\Pr \left[ \ell >\log \frac{d}{D}+\log \frac{8}{\delta }\right] \\ =& \sum _{\ell '=\lceil \log (d/D)+\log (8/\delta )\rceil }^{2^\tau -1} \Pr [\ell =\ell ']\\ =& \sum _{\ell '=\lceil \log (d/D)+\log (8/\delta )\rceil }^{2^\tau -1} \Pr [\text {Answer in node labeled with }m=\ell '-1/2 \text { is }1]\\ =&\sum _{\ell '=\lceil \log (d/D)+\log (8/\delta )\rceil }^{2^\tau -1} 1-2^{-d/(2^{\ell '-1/2}D)}\\ \\ \le &\sum _{\ell '=\lceil \log (d/D)+\log (8/\delta )\rceil }^{2^\tau -1} \frac{d}{D2^{\ell '-1/2}}\le \frac{\delta }{4}+\frac{\delta }{8}\cdots \le \frac{\delta }{2}.\\ \end{aligned}$$

Similarly, by Lemma 13, if the algorithm outputs \(\ell '\), then there is a node labeled with \(m=\ell '+1/2\) that the algorithm went through, and the answer to the test was 0. Therefore,

$$\begin{aligned} \Pr \left[ D'<\frac{\delta d}{8}\right] =& \Pr \left[ D2^\ell <\frac{\delta d}{8}\right] =\Pr \left[ \ell <\log \frac{d}{D}-\log \frac{8}{\delta }\right] \\ =& \sum _{\ell '=0}^{\lfloor \log (d/D)-\log (8/\delta )\rfloor } \Pr [\ell =\ell ']\\ =& \sum _{\ell '=0}^{\lfloor \log (d/D)-\log (8/\delta )\rfloor }\Pr [\text {Answer in node labeled with }m=\ell '+1/2 \text { is }0]\\ =&\sum _{\ell '=0}^{\lfloor \log (d/D)-\log (8/\delta )\rfloor } 2^{-d/(D2^{\ell '+1/2})}\\ \\ \le &2^{-4/\delta }+2^{-8/\delta }+2^{-16/\delta }+\cdots \le \frac{\delta }{4}+\frac{\delta }{8}+\cdots =\frac{\delta }{2}.\\ \end{aligned}$$

Therefore, with probability at least \(1-\delta \), \(D'\) satisfies \(d\delta /8\le D'\le 8d/\delta \).

Lemma 14

The number of tests of the algorithm is \(\log \log (n/d)+O(\log \log (1/\delta ))\).

Proof

Since, by (4), \({\delta d^2}/({4n\log ^2(2/\delta )})\le D\), the number of tests is

$$\begin{aligned} \tau +1\le & \log \log H+3\\ \le & 3+\log \log \sqrt{\frac{4\log ^2(2/\delta )}{\delta }\frac{n}{D}} \\ \le & 3+\log \log \left( \frac{4\log ^2\frac{2}{\delta }}{\delta }\cdot \frac{n}{d}\right) =\log \log \frac{n}{d}+O\left( \log \log \frac{1}{\delta }\right) . \end{aligned}$$

   \(\square \)

Finally, given \(D'\) that satisfies \(d\delta /8\le D'\le 8d/\delta \), Falahatgar et al. [21] presented an algorithm that, for any constant \(\epsilon >0\), makes \(O(\log (1/\delta ))\) tests and, with probability at least \(1-\delta \), returns an integer \(D''\) that satisfies \((1-\epsilon )d\le D''\le (1+\epsilon )d\).

By Lemmas 11, 12, and 14, Lemma 10 follows.

One way to prove Lemma 9 is to run the algorithms of Lemma 8 and Lemma 10 in parallel, one step in each algorithm, and halt when one of them halts. Another way is to use the following result.

Lemma 15

Let d and m be integers, and let \(0<\epsilon \le 1\) be any real number. There is a non-adaptive randomized algorithm that makes \(O((1/\epsilon ^2)\log (1/\delta ))\) tests and:

  • If \(d<m\), then, with probability at least \(1-\delta \), the algorithm returns 0.

  • If \(d>(1+\epsilon )m\), then, with probability at least \(1-\delta \), the algorithm returns 1.

  • If \(m\le d\le (1+\epsilon )m\), then the algorithm may return either 0 or 1.

Proof

Consider a random test \(Q\subseteq X\) where each \(x\in X\) is chosen to be in Q with probability \(1-(1+\epsilon )^{-1/(m\epsilon )}\). The probability that \(T_I(Q)=0\) is \((1+\epsilon )^{-d/(m\epsilon )}\). We have

$$\begin{aligned} \Pr [T_I(Q)=0|d<m]-\Pr [T_I(Q)=0|d>(1+\epsilon )m]\ge & (1+\epsilon )^{-1/\epsilon }-(1+\epsilon )^{-(1+\epsilon )/\epsilon }\\ =&(1+\epsilon )^{-1/\epsilon }\frac{\epsilon }{1+\epsilon }\\ \ge & \frac{\epsilon }{2e}. \end{aligned}$$

By Chernoff's bound (Lemma 5), we can, with probability at least \(1-\delta \), estimate \(\Pr [T_I(Q)=0]\) up to an additive error of \(\epsilon /(8e)\) using \(O((1/\epsilon ^2)\log (1/\delta ))\) tests. If the estimate is at least \((1+\epsilon )^{-(1+\epsilon )/\epsilon }+\epsilon /(4e)\), we output 0; otherwise, we output 1. This implies the result.   \(\square \)

Now, to prove Lemma 9, we first run the algorithm in Lemma 15 with \(m=\sqrt{n}\) and \(\epsilon =1\). If the output is 0 (so, with probability at least \(1-\delta \), \(d\le 2\sqrt{n}\)), we run the algorithm in Lemma 8. Otherwise, we run the algorithm in Lemma 10.
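The Lemma 15 tester can be sketched as follows. The number of repetitions and its constant are our choice for the \(O((1/\epsilon^2)\log(1/\delta))\) bound, the function name is ours, and the test is again simulated through the d defective items only.

```python
import math
import random

def threshold_test(d, m, eps, delta, rng):
    """Return 0 if (w.h.p.) d < m and 1 if (w.h.p.) d > (1+eps)*m, by
    estimating Pr[T_I(Q) = 0] over random tests in which each item is
    included with probability 1 - (1+eps)**(-1/(m*eps))."""
    p = 1 - (1 + eps) ** (-1 / (m * eps))
    reps = math.ceil((128 * math.e ** 2 / eps ** 2) * math.log(2 / delta))
    zeros = sum(
        1 for _ in range(reps)
        if not any(rng.random() < p for _ in range(d))  # T_I(Q) = 0
    )
    threshold = (1 + eps) ** (-(1 + eps) / eps) + eps / (4 * math.e)
    # a large fraction of negative tests indicates few defectives (d < m)
    return 0 if zeros / reps >= threshold else 1
```

With \(m=\sqrt{n}\) and \(\epsilon =1\), output 0 routes us to the Lemma 8 algorithm and output 1 to the Lemma 10 algorithm, as described above.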


Copyright information

Ā© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG


Cite this paper

Bshouty, N.H., Haddad-Zaknoon, C.A. (2024). On Detecting Some Defective Items in Group Testing. In: Wu, W., Tong, G. (eds) Computing and Combinatorics. COCOON 2023. Lecture Notes in Computer Science, vol 14422. Springer, Cham. https://doi.org/10.1007/978-3-031-49190-0_18
