Advertisement

Fast network discovery on sequence data via time-aware hashing

  • Tara Safavi
  • Chandra Sripada
  • Danai Koutra
Regular Paper
  • 53 Downloads

Abstract

Discovering and analyzing networks from non-network data is a task with applications in fields as diverse as neuroscience, genomics, climate science, economics, and more. In domains where networks are discovered on multiple time series, the most common approach is to compute measures of association or similarity between all pairs of time series. The nodes in the resultant network correspond to time series, which are linked by edges weighted according to the association scores of their endpoints. Finally, the fully connected network is thresholded such that only the edges with stronger weights remain and the desired sparsity level is achieved. While this approach is feasible for small datasets, its quadratic (or higher) time complexity does not scale as the individual time series length and the number of compared series increase. Thus, to circumvent the inefficient and wasteful intermediary step of building a fully connected graph before network sparsification, we propose a fast network discovery approach based on probabilistic hashing. Our methods emphasize consecutiveness, or the intuition that time series following similar fluctuations in longer time-consecutive intervals are more similar overall. Evaluation on real data shows that our method can build graphs nearly 15 times faster than baselines (when the baselines do not run out of memory), while achieving accuracy comparable to, or better than, baselines in task-based evaluation. Furthermore, our proposals are general, modular, and may be applied to a variety of sequence similarity search tasks.

Keywords

Network discovery Brain networks Networks Hashing LSH Time series Sequences Knowledge discovery 

Notes

Acknowledgements

We thank the anonymous reviewers for their useful comments and suggestions. This material is based upon work supported by the National Science Foundation under Grant No. IIS 1743088, Trove. AI, Google, and the University of Michigan. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation or other funding parties. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation here on.

References

  1. 1.
    Akoglu L, Tong H, Koutra D (2015) Graph based anomaly detection and description: a survey. Data Min Knowl Discov 29(3):626–688MathSciNetCrossRefGoogle Scholar
  2. 2.
    Andoni A, Indyk P (2008) Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions. CACM 51(1):117–122CrossRefGoogle Scholar
  3. 3.
    Ashkenazy Y, Ivanov PC, Havlin S, Peng C-K, Goldberger AL, Stanley HE (2001) Magnitude and sign correlations in heartbeat fluctuations. Phys Rev Lett 86(9):1900–1903CrossRefGoogle Scholar
  4. 4.
    Balakrishnan N, Koutras M (2002) Runs and scans with applications. Wiley, HobokenzbMATHGoogle Scholar
  5. 5.
    Bassett D, Bullmore E (2009) Human brain networks in health and disease. Curr Opin Neurol 22(4):340–347CrossRefGoogle Scholar
  6. 6.
    Bayardo RJ, Ma Y, Srikant R (2007) Scaling up all pairs similarity search. In: Proceedings of the 16th international conference on world wide web, pp 131–140Google Scholar
  7. 7.
    Brugere I, Gallagher B, Berger-Wolf TY (2018) Network structure inference, a survey: motivations, methods, and applications. ACM Comput Surv (CSUR) 51(2):24CrossRefGoogle Scholar
  8. 8.
    Bullmore E, Sporns O (2009) Complex brain networks: graph theoretical analysis of structural and functional systems. Nat Rev Neurosci 10(3):186–198CrossRefGoogle Scholar
  9. 9.
    Center for Biomedical Research Excellence (2012) http://fcon\_1000.projects.nitrc.org/indi/retro/cobre.htmlGoogle Scholar
  10. 10.
    Chaudhuri S, Ganti V, Kaushik R (2006) A primitive operator for similarity joins in data cleaning. In: Proceedings of the 22nd international conference on data engineering. ICDE ’06Google Scholar
  11. 11.
    Chen Y, Keogh E, Hu B, Begum N, Bagnall A, Mueen A, Batista G (2015) The UCR time series classification archive. www.cs.ucr.edu/~eamonn/time_series_data/. Accessed 1 Jan 2017
  12. 12.
    Dai Z, He Y (2014) Disrupted structural and functional brain connectomes in mild cognitive impairment and Alzheimer’s disease. Neurosci Bull 30(2):217–232CrossRefGoogle Scholar
  13. 13.
    Davidson I, Gilpin S, Carmichael O, Walker P (2013) Network discovery via constrained tensor analysis of fmri data. In: KDD, pp 194–202Google Scholar
  14. 14.
    Dong W, Moses C, Li K (2011) Efficient k-nearest neighbor graph construction for generic similarity measures. In: Proceedings of the 20th international conference on World wide web, ACM, pp 577–586Google Scholar
  15. 15.
    Friston KJ (2011) Functional and effective connectivity: a review. Brain Connect 1(1):13–36MathSciNetCrossRefGoogle Scholar
  16. 16.
    Hallac D, Park Y, Boyd S, Leskovec J (2017) Network inference via the time-varying graphical lasso. In: ‘KDD’Google Scholar
  17. 17.
    Heimann M, Lee W, Pan S, Chen K, Koutra D (2018) Hashalign: Hash-based alignment of multiple graphs. In: Advances in knowledge discovery and data mining—22nd Pacific-Asia conference, PAKDD 2018, Melbourne, VIC, Australia, June 3–6, 2018, Proceedings, Part III, pp 726–739Google Scholar
  18. 18.
    Iglesias F, Kastner W (2013) Analysis of similarity measures in times series clustering for the discovery of building energy patterns. Energies 6(2):579–597CrossRefGoogle Scholar
  19. 19.
    Indyk P, Motwani R (1998) Approximate nearest neighbors: towards removing the curse of dimensionality. In: ‘STOC’, pp 604–613Google Scholar
  20. 20.
    Jäkel F, Schlkopf B, Wichmann F (2008) Similarity, kernels, and the triangle inequality. J Math Psychol 52(5):297–303MathSciNetCrossRefGoogle Scholar
  21. 21.
    Kale DC, Gong D, Che Z, Liu Y, Medioni G, Wetzel R, Ross P (2014) An examination of multivariate time series hashing with applications to health care. In: ICDM, pp 260–269Google Scholar
  22. 22.
    Keogh E, Pazzani M (1999) An indexing scheme for fast similarity search in large time series databases. In: SSDM, pp 56–67Google Scholar
  23. 23.
    Kim YB, Hemberg E, O’Reilly U-M (2016) Stratified locality-sensitive hashing for accelerated physiological time series retrieval. In: EMBCGoogle Scholar
  24. 24.
    Kim YB, O’Reilly U-M (2015) Large-scale physiological waveform retrieval via locality-sensitive hashing. In: EMBC, pp 5829–5833Google Scholar
  25. 25.
    Koutra D, Faloutsos C (2017) Individual and collective graph mining: principles, algorithms, and applications. In: Synthesis lectures on data mining and knowledge discovery. Morgan and Claypool PublishersGoogle Scholar
  26. 26.
    Koutra D, Shah N, Vogelstein JT, Gallagher B, Faloutsos C (2016) Deltacon: principled massive-graph similarity function with attribution. TKDD 10(3):28:1–28:43CrossRefGoogle Scholar
  27. 27.
    Kuo C-T, Wang X, Walker P, Carmichael O, Ye J, Davidson I (2015) Unified and contrasting cuts in multiple graphs: application to medical imaging segmentation. In: KDD, pp 617–626Google Scholar
  28. 28.
    Leskovec J, Rajaraman A, Ullman JD (2014) Mining of massive datasets. Cambridge University Press, CambridgeCrossRefGoogle Scholar
  29. 29.
    Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: SIGMOD, pp 2–11Google Scholar
  30. 30.
    Liu Y, Safavi T, Dighe A, Koutra D (2018) Graph summarization methods and applications: a survey. ACM Comput Surv 51(3):62:1–62:34CrossRefGoogle Scholar
  31. 31.
    Luo C, Shrivastava A (2016) SSH (Sketch, Shingle, and Hash) for indexing massive-scale time series. In: NIPS time series workshopGoogle Scholar
  32. 32.
    Martínez V, Berzal F, Cubero J-C (2016) A survey of link prediction in complex networks. ACM Comput Surv 49(4):69:1–69:33CrossRefGoogle Scholar
  33. 33.
    Müller M (2007) Information retrieval for music and motion. Springer, New YorkCrossRefGoogle Scholar
  34. 34.
    Onnela J-P, Kaski K, Kertsz J (2004) Clustering and information in correlation based financial networks. Eur Phys J B 38:353–362CrossRefGoogle Scholar
  35. 35.
    Park H-J, Friston K (2013) Structural and functional brain networks: from connections to cognition. Science 342(6158):579–589CrossRefGoogle Scholar
  36. 36.
    Ratanamahatana C, Keogh E, Bagnall AJ, Lonardi S (2005) A novel bit level time series representation with implication of similarity search and clustering. In: PAKDD, pp 771–777Google Scholar
  37. 37.
    Satterthwaite T, Elliott M, Ruparel K, Loughead J, Prabhakaran K, Calkins M, Hopson R, Jackson C, Keefe J, Riley M, Mentch F, Sleiman P, Verma R, Davatzikos C, Hakonarson H, Gur R, Gur R (2014) Neuroimaging of the Philadelphia neurodevelopmental cohort. Neuroimage 86:544–553CrossRefGoogle Scholar
  38. 38.
    Scharwächter E, Geier F, Faber L, Müller E (2018) Low redundancy estimation of correlation matrices for time series using triangular bounds. In: Pacific-Asia conference on knowledge discovery and data mining. Springer, pp 458–470Google Scholar
  39. 39.
    Shah N, Koutra D, Jin L, Zou T, Gallagher B, Faloutsos C (2017) On summarizing large-scale dynamic graphs. IEEE Data Eng Bull 40(3):75–88Google Scholar
  40. 40.
    Shuman DI, Narang SK, Frossard P, Ortega A, Vandergheynst P (2013) The emerging field of signal processing on graphs: extending high-dimensional data analysis to networks and other irregular domains. IEEE Signal Process Mag 30(3):83–98CrossRefGoogle Scholar
  41. 41.
    Tsitsulin A, Mottin D, Karras P, Bronstein AM, Müller E (2018) Netlsd: hearing the shape of a graph. In: Proceedings of the 24th ACM SIGKDD international conference on knowledge discovery and data mining, KDD 2018, London, UK, August 19–23, 2018, pp 2347–2356Google Scholar
  42. 42.
    Yang S, Sun Q, Ji S, Wonka P, Davidson I, Ye J (2015) Structural graphical lasso for learning mouse brain connectivity. In: KDD, pp 1385–1394Google Scholar
  43. 43.
    Yeh C-CM, Zhu Y, Ulanova L, Begum N, Ding Y, Dau HA, Silva DF, Mueen A, Keogh E (2016) Matrix profile i: all pairs similarity joins for time series: a unifying view that includes motifs, discords and shapelets. In: 2016 IEEE 16th international conference on data mining (ICDM), pp 1317–1322Google Scholar
  44. 44.
    Zhang Y-M, Huang K, Geng G, Liu C-L (2013) Fast kNN graph construction with locality sensitive hashing. In: ECML PKDD, pp 660–674Google Scholar

Copyright information

© Springer-Verlag London Ltd., part of Springer Nature 2018

Authors and Affiliations

  1. 1.Computer Science and EngineeringUniversity of MichiganAnn ArborUSA
  2. 2.Psychiatry and PhilosophyUniversity of MichiganAnn ArborUSA

Personalised recommendations