Advertisement

Algorithmica

, Volume 81, Issue 9, pp 3765–3802 | Cite as

Improving and Extending the Testing of Distributions for Shape-Restricted Properties

  • E. FischerEmail author
  • O. Lachish
  • Y. Vasudev
Article
  • 13 Downloads

Abstract

Distribution testing deals with what information can be deduced about an unknown distribution over \(\{1,\ldots ,n\}\), where the algorithm is only allowed to obtain a relatively small number of independent samples from the distribution. In the extended conditional sampling model, the algorithm is also allowed to obtain samples from the restriction of the original distribution on subsets of \(\{1,\ldots ,n\}\). In 2015, Canonne, Diakonikolas, Gouleakis and Rubinfeld unified several previous results, and showed that for any property of distributions satisfying a “decomposability” criterion, there exists an algorithm (in the basic model) that can distinguish with high probability distributions satisfying the property from distributions that are far from it in the variation distance. We present here a more efficient yet simpler algorithm for the basic model, as well as very efficient algorithms for the conditional model, which until now was not investigated under the umbrella of decomposable properties. Additionally, we provide an algorithm for the conditional model that handles a much larger class of properties. Our core mechanism is an algorithm for efficiently producing an interval-partition of \(\{1,\ldots ,n\}\) that satisfies a “fine-grain” quality. We show that with such a partition at hand we can avoid the search for the “correct” partition of \(\{1,\ldots ,n\}\).

Keywords

Testing Distributions Decomposability Conditional samples 

Notes

References

  1. 1.
    Alon, N., Andoni, A., Kaufman, T., Matulef, K., Rubinfeld, R., Xie, N.: Testing k-wise and almost k-wise independence. In: Proceedings of the Thirty-Ninth Annual ACM Symposium on Theory of Computing (New York, NY, USA), STOC ’07, ACM, pp. 496–505 (2007)Google Scholar
  2. 2.
    Acharya, J., Canonne, C.L., Kamath, G.: A chasm between identity and equivalence testing with conditional queries. In: Garg, N., Jansen, K., Rao, A., Rolim, J.D.P., (eds.) Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques, APPROX/RANDOM 2015, August 24–26, 2015, Princeton, NJ, USA, LIPIcs, vol. 40, Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik, pp. 449–466 (2015)Google Scholar
  3. 3.
    Acharya, J., Daskalakis, C., Kamath, G.: Optimal testing for properties of distributions. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015, December 7–12, 2015, Montreal, Quebec, Canada, pp. 3591–3599 (2015)Google Scholar
  4. 4.
    Batu, T., Fortnow, L., Fischer, E., Kumar, R., Rubinfeld, R., White, P.: Testing random variables for independence and identity. In: 42nd Annual Symposium on Foundations of Computer Science, FOCS 2001, 14–17 October 2001, Las Vegas, Nevada, USA, pp. 442–451 (2001)Google Scholar
  5. 5.
    Batu, T., Fortnow, L., Rubinfeld, R., Smith, W.D., White, P.: Testing that distributions are close. In: 41st Annual Symposium on Foundations of Computer Science, FOCS 2000, 12–14 November 2000, Redondo Beach, California, USA, pp. 259–269. IEEE Computer Society (2000)Google Scholar
  6. 6.
    Batu, T., Kumar, R., Rubinfeld, R.: Sublinear algorithms for testing monotone and unimodal distributions,. In: Babai, L. (ed.) Proceedings of the 36th Annual ACM Symposium on Theory of Computing, Chicago, IL, USA, June 13–16, 2004, pp. 381–390. ACM (2004)Google Scholar
  7. 7.
    Canonne, C.L.: A survey on distribution testing: your data is big. But is it blue? Electron. Colloq. Comput. Complex. (ECCC) 22, 63 (2015)Google Scholar
  8. 8.
    Canonne, C.L., Diakonikolas, I., Gouleakis, T., Rubinfeld, R.: Testing shape restrictions of discrete distributions. In: 33rd Symposium on Theoretical Aspects of Computer Science, STACS 2016, February 17–20, 2016, Orléans, France, pp. 25:1–25:14 (2016)Google Scholar
  9. 9.
    Chakraborty, S., Fischer, E., Goldhirsh, Y., Matsliah, A.: On the power of conditional samples in distribution testing. SIAM J. Comput. 45(4), 1261–1296 (2016)MathSciNetCrossRefzbMATHGoogle Scholar
  10. 10.
    Canonne, C.L., Ron, D., Servedio, R.A.: Testing probability distributions using conditional samples. SIAM J. Comput. 44(3), 540–616 (2015)MathSciNetCrossRefzbMATHGoogle Scholar
  11. 11.
    Diakonikolas, I.: Learning structured distributions. In: Handbook of Big Data, p. 267 (2016)Google Scholar
  12. 12.
    Diakonikolas, I., Kane, D.M.: A new approach for testing properties of discrete distributions. In: IEEE 57th Annual Symposium on Foundations of Computer Science, FOCS 2016, 9–11 October 2016, Hyatt Regency, New Brunswick, New Jersey, USA, pp. 685–694 (2016)Google Scholar
  13. 13.
    Diakonikolas, I., Lee, H.K., Matulef, K., Onak, K., Rubinfeld, R., Servedio, R.A., Wan, A.: Testing for concise representations, In: 48th Annual IEEE Symposium on Foundations of Computer Science (FOCS 2007), October 20–23, 2007, Providence, RI, USA, Proceedings, pp. 549–558 (2007)Google Scholar
  14. 14.
    Goldreich, O., Ron, D.: On testing expansion in bounded-degree graphs. Electron. Colloquium Comput. Complex. (ECCC) 7(20) (2000)Google Scholar
  15. 15.
    Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Am. Stat. Assoc. 58, 13–30 (1963). MR 0144363MathSciNetCrossRefzbMATHGoogle Scholar
  16. 16.
    Indyk, P., Levi, R., Rubinfeld, R.: Approximating and testing k-histogram distributions in sub-linear time. In: Proceedings of the 31st ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2012, Scottsdale, AZ, USA, May 20–24, 2012, pp. 15–22 (2012)Google Scholar
  17. 17.
    Paninski, L.: A coincidence-based test for uniformity given very sparsely sampled discrete data. IEEE Trans. Inf. Theory 54(10), 4750–4755 (2008)MathSciNetCrossRefzbMATHGoogle Scholar
  18. 18.
    Servedio, R.A.: Testing by implicit learning: a brief survey. In: Property Testing—Current Research and Surveys [outgrow of a workshop at the Institute for Computer Science (ITCS) at Tsinghua University, January 2010], pp. 197–210 (2010)Google Scholar
  19. 19.
    Valiant, G., Valiant, P.: A CLT and tight lower bounds for estimating entropy. Electron. Colloq. Comput. Complex. (ECCC) 17, 179 (2010)Google Scholar
  20. 20.
    Valiant, G., Valiant, P.: An automatic inequality prover and instance optimal identity testing. SIAM J. Comput. 46(1), 429–455 (2017)MathSciNetCrossRefzbMATHGoogle Scholar

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  1. 1.Faculty of Computer ScienceIsrael Institute of Technology (Technion)HaifaIsrael
  2. 2.Birkbeck, University of LondonLondonUK
  3. 3.Department of Computer Science and EngineeringIndian Institute of Technology MadrasChennaiIndia

Personalised recommendations