Multi-type clustering using regularized tensor decomposition

Abstract

Geospatial analytics increasingly rely on data fusion methods to extract patterns from data; however, robust results are difficult to achieve because of the need for spatial and temporal regularization and because of latent structures within the data. Tensor decomposition is a promising approach because it can accommodate the multidimensional structure of data (e.g., trajectory information about users, locations, and time periods). To address these challenges, we introduce Multi-Type Clustering using Regularized tensor Decomposition (MCRD), a method for data analysis that provides insight not only into groupings within data types (e.g., clusters of users) but also into the interactions between data types (e.g., clusters of users and locations) in the latent features of complex multi-type datasets. MCRD combines two innovations. First, a tensor representing spatiotemporal data is decomposed using a novel regularization method that accounts for structure within the data. Second, within- and cross-type groups are found by applying novel hypergraph community detection methods to the decomposition results. Experiments on both synthetic and real trajectory data demonstrate MCRD’s capacity to reveal within- and cross-type groupings in data, and MCRD outperforms related methods, including tensor decomposition without regularization, tensor unfolding, Laplacian regularization, and tensor block models. The robust and versatile analysis provided by combining the new regularization and clustering techniques outlined in this paper likely has utility in geospatial analytics beyond the movement applications explicitly studied.
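The two-stage idea summarized above (decompose a regularized tensor, then group entities using the factor matrices) can be sketched in plain Python. This is a minimal, hypothetical illustration, not the authors' MCRD code: a simple ridge (Tikhonov) penalty stands in for the paper's novel regularizer, which the abstract does not specify, and assigning each entity to its dominant component stands in for the hypergraph community detection step. The function names (`cp_als_ridge`, `mttkrp`, `hard_clusters`, etc.) and the toy users x locations x time tensor are assumptions made for illustration.

```python
import random

# Hypothetical sketch: a ridge penalty stands in for MCRD's own regularizer,
# and argmax assignment stands in for hypergraph community detection.


def gram(F):
    """Return F^T F for a factor matrix stored as a list of rows."""
    R = len(F[0])
    return [[sum(row[r] * row[s] for row in F) for s in range(R)] for r in range(R)]


def solve(G, b):
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(G[r]) + [b[r]] for r in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def mttkrp(X, factors, mode):
    """Unfolded tensor X_(mode) times the Khatri-Rao product of the other two factors."""
    R = len(factors[0][0])
    out = [[0.0] * R for _ in range(len(factors[mode]))]
    for i, plane in enumerate(X):
        for j, fiber in enumerate(plane):
            for k, x in enumerate(fiber):
                if x == 0.0:
                    continue  # trajectory tensors are typically sparse
                idx = (i, j, k)
                for r in range(R):
                    p = x
                    for m in range(3):
                        if m != mode:
                            p *= factors[m][idx[m]][r]
                    out[idx[mode]][r] += p
    return out


def cp_als_ridge(X, rank, lam=1e-3, iters=100, seed=0):
    """CP-ALS for a 3-way tensor minimizing ||X - [[A, B, C]]||^2 + lam * sum ||F||^2."""
    rng = random.Random(seed)
    dims = (len(X), len(X[0]), len(X[0][0]))
    factors = [[[rng.random() for _ in range(rank)] for _ in range(d)] for d in dims]
    for _ in range(iters):
        for mode in range(3):
            P, Q = (gram(factors[m]) for m in range(3) if m != mode)
            # Normal equations F (P * Q + lam I) = MTTKRP, with * elementwise
            G = [[P[r][s] * Q[r][s] + (lam if r == s else 0.0) for s in range(rank)]
                 for r in range(rank)]
            # G is symmetric, so each row f_i of F solves G f_i = m_i
            factors[mode] = [solve(G, row) for row in mttkrp(X, factors, mode)]
    return factors


def reconstruct(factors):
    A, B, C = factors
    R = len(A[0])
    return [[[sum(A[i][r] * B[j][r] * C[k][r] for r in range(R)) for k in range(len(C))]
             for j in range(len(B))] for i in range(len(A))]


def relative_error(X, Y):
    """||X - Y||_F / ||X||_F for two 3-way tensors of equal shape."""
    pairs = [(x, y) for px, py in zip(X, Y) for fx, fy in zip(px, py) for x, y in zip(fx, fy)]
    return (sum((x - y) ** 2 for x, y in pairs) / sum(x ** 2 for x, _ in pairs)) ** 0.5


def hard_clusters(F):
    """Assign each entity (row) to the component with the largest |loading|."""
    return [max(range(len(row)), key=lambda r: abs(row[r])) for row in F]


if __name__ == "__main__":
    # Toy users x locations x time-periods tensor with two disjoint groups
    X = [[[0.0, 0.0] for _ in range(4)] for _ in range(4)]
    for u in (0, 1):
        for loc in (0, 1):
            X[u][loc][0] = 1.0  # users 0-1 visit locations 0-1 in period 0
    for u in (2, 3):
        for loc in (2, 3):
            X[u][loc][1] = 1.0  # users 2-3 visit locations 2-3 in period 1
    A, B, C = cp_als_ridge(X, rank=2, lam=1e-6, iters=300)
    print("relative error:", relative_error(X, reconstruct([A, B, C])))
    print("user groups:", hard_clusters(A), "location groups:", hard_clusters(B))
```

Because a ridge penalty acts on each factor row independently, every ALS step reduces to one small symmetric solve per row. Penalties that couple rows, such as the Laplacian regularization the abstract compares against, would instead turn each factor update into a larger linear system over the whole factor matrix.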

Figs. 1–22

Availability of data and material

While the synthetic data is not publicly available at this time, the method used to create it is described in Section 4.1. For the real-world data, we used a portion of the Porto dataset available at http://www.geolink.pt/ecmlpkdd2015-challenge/dataset.html.

Code Availability

The code is not publicly available at this time.

Notes

  1. Because it is an internal clustering index, the CH criterion is not as meaningful for comparing the results of clustering elements based on factor matrices with different numbers of factors.

  2. Several MATLAB packages for basic tensor operations are used [2, 3, 33].

  3. The data can be found at: http://www.geolink.pt/ecmlpkdd2015-challenge/dataset.html.

  4. The code used for this can be found at https://github.com/ike002jp/npartite.
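Note 1 rests on the fact that the Caliński–Harabasz (CH) criterion [5] is an internal index: it is computed only from the points being clustered and their labels, so its scale depends on the feature space in which the points live. The plain-Python sketch below (the function name `calinski_harabasz` and the toy factor rows are illustrative, not the paper's code) scores the same two-group labeling on two hypothetical sets of factor-matrix rows, one with two factors and one with an added third factor.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz index of a hard clustering.

    points: list of equal-length numeric sequences (e.g., rows of a factor matrix)
    labels: cluster label for each point
    Returns (B / (k - 1)) / (W / (n - k)), where B is the between-cluster and
    W the within-cluster sum of squared distances.
    """
    n = len(points)
    clusters = sorted(set(labels))
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need 2 <= number of clusters < number of points")
    dim = len(points[0])
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for c in clusters:
        members = [p for p, lab in zip(points, labels) if lab == c]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((centroid[d] - overall[d]) ** 2 for d in range(dim))
        within += sum((p[d] - centroid[d]) ** 2 for p in members for d in range(dim))
    if within == 0.0:
        return float("inf")  # perfectly compact clusters (a convention of this sketch)
    return (between / (k - 1)) / (within / (n - k))


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    two_factors = [(0, 0), (0, 1), (4, 0), (4, 1)]
    three_factors = [(0, 0, 1), (0, 1, 0), (4, 0, 0), (4, 1, 1)]
    print(calinski_harabasz(two_factors, labels))    # 32.0
    print(calinski_harabasz(three_factors, labels))  # 16.0
```

The labels are identical, yet the score halves because the extra column adds within-cluster spread without separating the groups any further; CH values computed on factor matrices of different rank are therefore not on a common scale.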

References

  1. Acar E, Kolda TG, Dunlavy DM (2011) All-at-once optimization for coupled matrix and tensor factorizations. arXiv:1105.3422

  2. Bader BW, Kolda TG (2006) Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Trans on Math Softw 32(4):635–653. https://doi.org/10.1145/1186785.1186794

  3. Bader BW, Kolda TG et al (2015) MATLAB Tensor Toolbox version 2.6. Available online. http://www.sandia.gov/tgkolda/TensorToolbox/

  4. Battaglino C, Ballard G, Kolda TG (2018) A practical randomized CP tensor decomposition. SIAM J Matrix Anal Appl 39(2):876–901

  5. Caliński T, Harabasz J (1974) A dendrite method for cluster analysis. Communications in Statistics-theory and Methods 3(1):1–27

  6. Castro PS, Zhang D, Chen C, Li S, Pan G (2013) From taxi GPS traces to social and community dynamics: a survey. ACM Computing Surveys (CSUR) 46(2):1–34

  7. Chi EC, Gaines BR, Sun WW, Zhou H, Yang J (2018) Provable convex co-clustering of tensors. arXiv:1803.06518

  8. Comon P, Luciani X, De Almeida AL (2009) Tensor decompositions, alternating least squares and other tales. J Chemom: A J Chemom Soc 23(7-8):393–405

  9. Danon L, Diaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech: Theory and Exp 2005(09):P09008

  10. Gauvin L, Panisson A, Cattuto C (2014) Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach. PLoS One 9(1):e86028

  11. Grauwin S, Sobolevsky S, Moritz S, Gódor I, Ratti C (2015) Towards a comparative science of cities: Using mobile traffic records in New York, London, and Hong Kong. In: Computational approaches for urban environments, Springer, pp 363–387

  12. Haass MJ, Van Benthem MH, Ochoa EM (2014) Tensor analysis methods for activity characterization in spatiotemporal data. Sandia Tech Report SAND2014–1825

  13. Hong D, Kolda TG, Duersch JA (2018) Generalized canonical polyadic tensor decomposition. arXiv:1808.07452

  14. Ikematsu K, Murata T (2013) A fast method for detecting communities from tripartite networks. In: Int conf on soc inform, Springer, pp 192–205

  15. Ioannidis VN, Zamzam AS, Giannakis GB, Sidiropoulos ND (2018) Coupled graphs and tensor factorization for recommender systems and community detection. arXiv:1809.08353

  16. Kolda TG, Bader BW (2009) Tensor decompositions and applications. SIAM Rev 51(3):455–500. https://doi.org/10.1137/07070111x

  17. Lee DD, Seung HS (2001) Algorithms for non-negative matrix factorization. pp 556–562

  18. Li X, Li M, Gong YJ, Zhang XL, Yin J (2016) T-DesP: Destination prediction based on big trajectory data. IEEE Trans on Intell Transp Syst 17(8):2344–2354

  19. Lin YR, Sun J, Castro P, Konuru R, Sundaram H, Kelliher A (2009) MetaFac: community discovery via relational hypergraph factorization. In: Proc of the 15th ACM SIGKDD int conf on knowl discov and data min, ACM, pp 527–536

  20. Liu JX, Wang D, Gao YL, Zheng CH, Xu Y, Yu J (2017) Regularized non-negative matrix factorization for identifying differentially expressed genes and clustering samples: a survey. IEEE/ACM Trans on Computl Biolog and Bioinform 15(3):974–987

  21. Liu L, Andris C, Ratti C (2010) Uncovering cabdrivers’ behavior patterns from their digital traces. Comput Environ Urban Syst 34(6):541–548

  22. Liu Y, Li Z, Xiong H, Gao X, Wu J (2010) Understanding of internal clustering validation measures. In: 2010 IEEE International conference on data mining, IEEE, pp 911–916

  23. Moreira-Matias L, Gama J, Ferreira M, Mendes-Moreira J, Damas L (2016) Time-evolving OD matrix estimation using high-speed GPS data streams. Expert Systems with Applications 44:275–288

  24. Moreira-Matias L, Gama J, Ferreira M, Moreira J, Damas L (2013) Predicting taxi-passenger demand using streaming data. IEEE Trans on Intell Transp Syst 14:1393–1402. https://doi.org/10.1109/TITS.2013.2262376

  25. Murtagh F (1983) A survey of recent advances in hierarchical clustering algorithms. The Comput J 26(4):354–359

  26. Narita A, Hayashi K, Tomioka R, Kashima H (2012) Tensor factorization using auxiliary information. Data Min and Knowl Discov 25(2):298–324

  27. Neubauer N, Obermayer K (2010) Community detection in tagging-induced hypergraphs. In: Workshop on inform in netw. New York University NY, USA, pp 24–25

  28. Ouvrard X, Goff JL, Marchand-Maillet S (2017) Adjacency and tensor representation in general hypergraphs part 1: e-adjacency tensor uniformisation using homogeneous polynomials. arXiv:1712.08189

  29. Phithakkitnukoon S, Veloso M, Bento C, Biderman A, Ratti C (2010) Taxi-aware map: Identifying and predicting vacant taxis in the city. In: International joint conference on ambient intelligence, Springer, pp 86–95

  30. Shashua A, Hazan T (2005) Non-negative tensor factorization with applications to statistics and computer vision. In: Proc of the 22nd int conf on mach learn, ACM, pp 792–799

  31. Sun L, Axhausen KW (2016) Understanding urban mobility patterns with a probabilistic tensor factorization framework. Transp Res Part B: Methodol 91:511–524

  32. Takeuchi K, Tomioka R, Ishiguro K, Kimura A, Sawada H (2013) Non-negative multiple tensor factorization. In: 2013 IEEE 13th int conf on data min, IEEE, pp 1199–1204

  33. Vervliet N, Debals O, Sorber L, Van Barel M, De Lathauwer L (2016) Tensorlab 3.0. https://www.tensorlab.net. Available online

  34. Wang M, Zeng Y (2019) Multiway clustering via tensor block models. In: Adv in neural inf process syst, pp 713–723

  35. Wang Y, Zheng Y, Xue Y (2014) Travel time estimation of a path using sparse trajectories. In: Proc of the 20th ACM SIGKDD int conf on knowl discov and data min, ACM, pp 25–34

  36. Wu R, Luo G, Jin Q, Shao J, Lu CT (2020) Learning evolving user’s behaviors on location-based social networks. GeoInformatica, pp 1–31

  37. Wu T, Benson AR, Gleich DF (2016) General tensor spectral co-clustering for higher-order data. In: Adv in neural inf process syst, pp 2559–2567

  38. Xu Y, Yin W (2013) A block coordinate descent method for regularized multiconvex optimization with applications to nonnegative tensor factorization and completion. SIAM J on Imaging Sci 6(3):1758–1789

  39. Yao L, Sheng QZ, Qin Y, Wang X, Shemshadi A, He Q (2015) Context-aware point-of-interest recommendation using tensor factorization with social regularization. In: Proc of the 38th int ACM SIGIR conf on res and dev in inf retr, ACM, pp 1007–1010

  40. Yılmaz KY, Cemgil AT, Simsekli U (2011) Generalised coupled tensor factorisation. In: Adv in neural inf process syst, pp 2151–2159

  41. Zheng Y (2015) Trajectory data mining: an overview. ACM Trans on Intell Syst Technol (TIST) 6(3):29

  42. Zheng Y, Liu T, Wang Y, Zhu Y, Liu Y, Chang E (2014) Diagnosing New York City’s noises with ubiquitous data. In: Proc of the 2014 ACM int jt conf on pervasive and ubiquitous comput, ACM, pp 715–725

  43. Zheng Y, Liu Y, Yuan J, Xie X (2011) Urban computing with taxicabs. In: Proceedings of the 13th international conference on Ubiquitous computing, pp 89–98

  44. Zheng Y, Zhou X (2011) Computing with spatial trajectories. Springer Science & Business Media

Acknowledgements

This work was supported by the US Army Engineer Research and Development Center, Geospatial Research Engineering basic research program. Any opinions expressed in this paper are those of the authors, and are not to be construed as official positions of the funding agency or the Department of the Army unless so designated by other authorized documents.

Funding

This work was supported by the U.S. Army Engineer Research and Development Center, Geospatial Research Engineering basic research program.

Author information

Corresponding author

Correspondence to Charlotte L. Ellison.

Ethics declarations

Competing interests

The authors have no conflicts of interest or competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ellison, C.L., Fields, W.R. Multi-type clustering using regularized tensor decomposition. Geoinformatica (2022). https://doi.org/10.1007/s10707-021-00457-8

  • Received:

  • Revised:

  • Accepted:

  • Published:

  • DOI: https://doi.org/10.1007/s10707-021-00457-8

Keywords

  • Spatiotemporal reasoning
  • Trajectory analysis
  • Tensor
  • CP Decomposition
  • Co-Clustering