The VLDB Journal, Volume 25, Issue 6, pp 767–790

Efficient discovery of longest-lasting correlation in sequence databases

  • Yuhong Li
  • Leong Hou U
  • Man Lung Yiu
  • Zhiguo Gong
Regular Paper

DOI: 10.1007/s00778-016-0432-7

Cite this article as:
Li, Y., U, L.H., Yiu, M.L. et al. The VLDB Journal (2016) 25: 767. doi:10.1007/s00778-016-0432-7

Abstract

The search for similar subsequences is a core module for various analytical tasks in sequence databases. Typically, similarity computations require users to specify a subsequence length. However, there is no robust way to choose a length that suits different application needs. In this study, we examine a new query that returns the longest-lasting highly correlated subsequences in a sequence database, which is particularly helpful for analyses that lack prior knowledge of the query length. A baseline, yet expensive, solution is to calculate the correlations for every possible subsequence length. To boost performance, we study a space-constrained index that provides a tight correlation bound for subsequences of similar lengths and offsets by means of intraobject and interobject grouping techniques. To the best of our knowledge, this is the first index to support a normalized distance metric for subsequences of arbitrary length. In addition, we study the use of a smart cache for disk-resident data (e.g., millions of sequence objects) and a graphics processing unit (GPU)-based parallel processing technique for frequently updated data (e.g., nonindexable streaming sequences) to compute the longest-lasting highly correlated subsequences. Extensive experimental evaluation on both real and synthetic sequence datasets verifies the efficiency and effectiveness of our proposed methods.
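
The baseline mentioned in the abstract can be illustrated with a small brute-force sketch. The abstract does not state the formal query definition, so the following rests on assumptions that are not taken from the paper: only two sequences are compared, the subsequences are aligned at the same offset in both, similarity is measured by Pearson correlation, and "highly correlated" means reaching a user-supplied threshold. The names `pearson`, `longest_correlated_window`, `threshold`, and `min_len` are hypothetical. The sketch shows why checking every length and offset is expensive; it is not the indexed method proposed by the authors.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length lists; 0.0 if either is constant."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def longest_correlated_window(s, t, threshold, min_len=2):
    """Brute force over every length (longest first) and every offset.

    Assumption (not from the paper): windows are aligned, i.e. they start
    at the same offset in both sequences.
    Returns (length, offset) of the first qualifying window found, which is
    the longest one, or None if no window reaches the threshold.
    """
    n = min(len(s), len(t))
    for length in range(n, min_len - 1, -1):
        for offset in range(0, n - length + 1):
            window_s = s[offset:offset + length]
            window_t = t[offset:offset + length]
            if pearson(window_s, window_t) >= threshold:
                return length, offset
    return None


if __name__ == "__main__":
    s = [1, 2, 3, 4, 5, 0, 9, 1]
    t = [2, 4, 6, 8, 10, 7, -3, 5]
    print(longest_correlated_window(s, t, threshold=0.99))  # (5, 0)
```

For sequences of length n, this sketch examines on the order of n² windows and recomputes each correlation from scratch, giving roughly cubic cost for a single pair of sequences. That cost is what motivates bounding the correlation for groups of similar lengths and offsets instead of evaluating each window individually.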

Keywords

  • Time series analysis
  • Similarity search
  • Longest-lasting correlated subsequences

Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Yuhong Li (1)
  • Leong Hou U (1)
  • Man Lung Yiu (2)
  • Zhiguo Gong (1)
  1. Department of Computer and Information Science, University of Macau, Macau SAR, China
  2. Department of Computing, Hong Kong Polytechnic University, Hong Kong SAR, China