Edit Distance to Monotonicity in Sliding Windows

  • Ho-Leung Chan
  • Tak-Wah Lam
  • Lap-Kei Lee
  • Jiangwei Pan
  • Hing-Fung Ting
  • Qin Zhang
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7074)

Abstract

Given a stream of items, each associated with a numerical value, its edit distance to monotonicity is the minimum number of items to remove so that the remaining items are non-decreasing with respect to the numerical value. The space complexity of estimating the edit distance to monotonicity of a data stream has become well understood over the past few years. Motivated by applications in network quality monitoring, we extend the study to estimating the edit distance to monotonicity of a sliding window covering the w most recent items in the stream for any w ≥ 1. We give a deterministic algorithm that returns an estimate within a factor of (4 + ε) using \(O(\frac{1}{\epsilon^2} \log^2(\epsilon w))\) space.
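To make the definition concrete, the following is a minimal exact baseline, not the small-space algorithm of the paper. It relies on the standard fact that the edit distance to monotonicity of a sequence equals its length minus the length of its longest non-decreasing subsequence. The function names are hypothetical, and the sketch stores the whole window, so it uses space linear in w.

```python
from bisect import bisect_right
from collections import deque


def edit_distance_to_monotonicity(values):
    """Minimum number of deletions that leaves a non-decreasing sequence."""
    # tails[k] is the smallest possible last value of a non-decreasing
    # subsequence of length k + 1 found so far.
    tails = []
    for v in values:
        pos = bisect_right(tails, v)  # bisect_right lets equal values extend a run
        if pos == len(tails):
            tails.append(v)
        else:
            tails[pos] = v
    return len(values) - len(tails)


def sliding_window_distances(stream, w):
    """Yield the exact distance for the window of the w most recent items."""
    window = deque(maxlen=w)  # illustrative only: keeps all w items
    for v in stream:
        window.append(v)
        yield edit_distance_to_monotonicity(list(window))
```

For example, on the stream 1, 3, 2, 4, 0 with w = 3, the windows (1, 3, 2), (3, 2, 4) and (2, 4, 0) each need one deletion. Such a baseline can serve as a reference when checking the output of an approximate estimator.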

We also extend the study in two directions. First, we consider a stream where each item is associated with a value from a partially ordered set. We give a randomized (4 + ε)-approximate algorithm using \(O(\frac{1}{\epsilon^2} \log(\epsilon^2 w) \log w)\) space. Second, we consider an out-of-order stream where each item is associated with a creation time and a numerical value, and items may arrive out of order with respect to their creation times. The goal is to estimate the edit distance to monotonicity with respect to the numerical value of items arranged in the order of creation times. We show that any randomized constant-approximate algorithm requires linear space.
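The two extensions can also be illustrated with an exact, quadratic-time sketch. This is a reference computation under stated assumptions, not the randomized algorithm or the lower-bound construction of the paper. It assumes that, for a partially ordered set, the remaining items must form a chain in stream order, that `leq` is a caller-supplied partial order (hypothetical parameter), and that creation times are distinct. All function names are hypothetical.

```python
def poset_edit_distance(items, leq):
    """Exact distance when values come from a partially ordered set.

    leq(a, b) must return True when a precedes or equals b in the order.
    Assumption: the kept items must form a chain in stream order.
    """
    n = len(items)
    # best[j] is the length of the longest chain subsequence ending at item j.
    best = [1] * n
    for j in range(n):
        for i in range(j):
            # Transitivity of leq makes checking consecutive pairs sufficient.
            if leq(items[i], items[j]) and best[i] + 1 > best[j]:
                best[j] = best[i] + 1
    return n - max(best, default=0)


def out_of_order_distance(timed_items):
    """Exact distance for (creation_time, value) pairs that arrive out of order.

    Assumption: creation times are distinct, so sorting gives a unique order.
    """
    ordered = [value for _, value in sorted(timed_items, key=lambda p: p[0])]
    return poset_edit_distance(ordered, lambda a, b: a <= b)
```

With divisibility as the partial order, the items 2, 3, 6, 4, 12 contain a longest chain of length 3 (for instance 2, 6, 12), so two items must be removed. For the out-of-order case, the arrivals (2, 5), (1, 1), (3, 3) are first arranged by creation time as 1, 5, 3, which needs one deletion. Note that this sketch buffers every item, which is in line with the linear-space requirement stated above for the out-of-order model.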

Keywords

Data Stream · Edit Distance · Approximate Algorithm · Creation Time · Stream Model

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ho-Leung Chan (1)
  • Tak-Wah Lam (1)
  • Lap-Kei Lee (2)
  • Jiangwei Pan (1)
  • Hing-Fung Ting (1)
  • Qin Zhang (2)

  1. Department of Computer Science, University of Hong Kong, Hong Kong
  2. MADALGO, Department of Computer Science, Aarhus University, Denmark
