Fast Exact Algorithm to Solve Continuous Similarity Search for Evolving Queries

  • Tomohiro Yamazaki
  • Hisashi Koga
  • Takahisa Toda
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10648)

Abstract

We study the continuous similarity search problem for evolving queries, which has recently been formulated. Given a data stream and a database composed of n sets of items, the goal is to maintain the top-k sets most similar to the query, which evolves over time and consists of the latest W items in the data stream. For this problem, the previous exact algorithm adopts a pruning strategy which, at the present time T, selects the candidates for the top-k most similar sets from past similarity values and computes the similarity values only for them. This paper proposes a new exact algorithm which shortens the execution time by computing the similarity values only for sets whose similarity values at T can differ from those at time \(T-1\). We identify such sets quickly with frequency-based inverted lists (FIL). Moreover, we derive each similarity value at T in O(1) time by updating the previous value computed at time \(T-1\). Experimentally, our exact algorithm runs one order of magnitude faster than the previous exact algorithm.
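
The update idea in the abstract can be illustrated with a minimal sketch. The abstract gives neither the similarity measure nor the layout of the FIL, so the following are assumptions made for illustration only: similarity is taken to be the multiset Jaccard coefficient, and the FIL is taken to map a pair (item, c) to the ids of the database sets that contain at least c copies of that item. The class name `EvolvingQuerySearch` and all of its methods are hypothetical. When an item enters or leaves the window, only the sets in one such list can change, and each of them is updated by a single increment or decrement.

```python
from collections import Counter, defaultdict, deque
import heapq


class EvolvingQuerySearch:
    """Illustrative sketch only; not the authors' implementation."""

    def __init__(self, database, window):
        self.sets = [Counter(s) for s in database]
        self.sizes = [sum(c.values()) for c in self.sets]
        self.window = window              # W: number of latest items in the query
        self.recent = deque()             # current query contents, oldest first
        self.q_count = Counter()          # frequency of each item in the query
        self.inter = [0] * len(self.sets) # multiset intersection size per set
        # Assumed FIL layout: (item, c) -> ids of sets with >= c copies of item.
        self.fil = defaultdict(list)
        for sid, counts in enumerate(self.sets):
            for item, freq in counts.items():
                for c in range(1, freq + 1):
                    self.fil[(item, c)].append(sid)

    def _adjust(self, item, c, delta):
        # Only sets holding at least c copies of item are affected,
        # and each is updated in constant time.
        for sid in self.fil.get((item, c), ()):
            self.inter[sid] += delta

    def push(self, item):
        """Advance the stream by one item (time T-1 -> T)."""
        self.recent.append(item)
        self.q_count[item] += 1
        self._adjust(item, self.q_count[item], +1)
        if len(self.recent) > self.window:
            old = self.recent.popleft()
            self._adjust(old, self.q_count[old], -1)
            self.q_count[old] -= 1

    def similarity(self, sid):
        # Multiset Jaccard: |S ∩ Q| / (|S| + |Q| - |S ∩ Q|).
        union = self.sizes[sid] + len(self.recent) - self.inter[sid]
        return self.inter[sid] / union if union else 0.0

    def top_k(self, k):
        # Simplification: rescans all sets instead of maintaining the top-k.
        return heapq.nlargest(k, range(len(self.sets)), key=self.similarity)


db = [["a", "b"], ["a", "a", "c"], ["d"]]
search = EvolvingQuerySearch(db, window=3)
for x in ["a", "a", "b", "d"]:
    search.push(x)
print(search.top_k(2))
```

Note that `top_k` here rescans every set for simplicity; it shows only how the per-set values stay current and makes no claim about how the paper maintains the top-k result itself.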

Keywords

  • Data stream
  • Evolving query
  • Set similarity search
  • Inverted lists

Notes

Acknowledgments

This work was supported by JSPS KAKENHI Grant Number JP15K00148, 2016.

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Tomohiro Yamazaki (1)
  • Hisashi Koga (1)
  • Takahisa Toda (1)
  1. Graduate School of Informatics and Engineering, The University of Electro-Communications, Chofu, Japan
