Theory of Computing Systems, Volume 48, Issue 2, pp. 428–442

Sublinear Time Algorithms for Earth Mover’s Distance

  • Khanh Do Ba
  • Huy L. Nguyen
  • Huy N. Nguyen
  • Ronitt Rubinfeld

Abstract

We study the problem of estimating the Earth Mover’s Distance (EMD) between probability distributions when given access only to samples of the distributions. We give closeness testers and additive-error estimators over domains in $[0,1]^d$ whose sample complexities are independent of domain size, which makes even continuous distributions over infinite domains testable. Instead, our algorithms depend on the dimension of the domain space and on the required quality of the result. We also prove lower bounds for closeness testing, showing these dependencies to be essentially optimal. Additionally, we ask whether there are natural classes of distributions that admit algorithms with better dependence on the dimension, and show that this is indeed the case for highly clusterable data. Lastly, we consider a variant of the EMD defined over tree metrics instead of the usual $\ell_1$ metric, and give tight upper and lower bounds.
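To make the tree-metric variant concrete: on a rooted tree with edge weights $w(e)$, the EMD between two distributions $P$ and $Q$ on the nodes has a well-known closed form, $\mathrm{EMD}_T(P,Q) = \sum_e w(e)\,\lvert P(T_e) - Q(T_e)\rvert$, where $T_e$ is the subtree hanging below edge $e$ (all net mass in $T_e$ must cross $e$). The Python sketch below evaluates this formula on the empirical distributions of two sample sets. It is a plug-in illustration only, assuming the tree is given explicitly as parent and edge-weight dictionaries; it is not the authors' tester or estimator, it says nothing about how many samples suffice, and the name `tree_emd` and its input format are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def tree_emd(parent: Dict[Hashable, Hashable],
             weight: Dict[Hashable, float],
             p_samples: Sequence[Hashable],
             q_samples: Sequence[Hashable]) -> float:
    """Plug-in EMD between the empirical distributions of two sample sets
    supported on the nodes of a single rooted, edge-weighted tree.

    Hypothetical input format: parent[v] is the parent of node v (the root
    has no entry), and weight[v] is the length of the edge (v, parent[v]).
    """
    if not p_samples or not q_samples:
        raise ValueError("both sample sets must be non-empty")

    # Net empirical mass at each node: P-frequency minus Q-frequency.
    net: Dict[Hashable, float] = {}
    for v, c in Counter(p_samples).items():
        net[v] = net.get(v, 0.0) + c / len(p_samples)
    for v, c in Counter(q_samples).items():
        net[v] = net.get(v, 0.0) - c / len(q_samples)

    nodes = set(parent) | set(parent.values())
    unknown = set(net) - nodes
    if unknown:
        raise ValueError(f"samples on nodes not in the tree: {unknown}")

    def depth(v: Hashable) -> int:
        # Number of edges from v up to the root.
        d = 0
        while v in parent:
            v = parent[v]
            d += 1
        return d

    # subtree[v] accumulates P(T_v) - Q(T_v), the net mass in v's subtree.
    subtree = {v: net.get(v, 0.0) for v in nodes}
    total = 0.0
    # Process deepest nodes first, so every child is folded into its
    # parent before the parent's own edge is charged.
    for v in sorted(nodes, key=depth, reverse=True):
        if v in parent:
            # The net imbalance below edge (v, parent[v]) must cross it.
            total += weight[v] * abs(subtree[v])
            subtree[parent[v]] += subtree[v]
    return total


# Usage: root r with leaves a (edge weight 1) and b (edge weight 2).
# All P-mass sits at a and all Q-mass at b, so it travels a -> r -> b.
print(tree_emd({"a": "r", "b": "r"}, {"a": 1.0, "b": 2.0},
               ["a", "a"], ["b", "b"]))  # 3.0
```

Processing nodes in order of decreasing depth is a simple way to obtain all subtree sums in one pass; for very deep trees one would precompute depths once instead of walking to the root for every node.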

Keywords

Sublinear-time algorithms · Property testing · Earth Mover’s Distance

Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Khanh Do Ba (1)
  • Huy L. Nguyen (2)
  • Huy N. Nguyen (1)
  • Ronitt Rubinfeld (1)

  1. MIT CSAIL, Cambridge, USA
  2. MIT, Cambridge, USA