
Functional Data Clustering Analysis via the Learning of Gaussian Processes with Wasserstein Distance

  • Conference paper
  • Conference: Neural Information Processing (ICONIP 2020)
  • Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 12533)

Abstract

Functional data clustering analysis has become an urgent and challenging task in the era of big data. In this paper, we propose a new framework for functional data clustering analysis that adopts a structure similar to the k-means algorithm for conventional clustering analysis. Under this framework, we clarify three issues: how to represent functions, how to measure distances between functions, and how to compute centers of functions. We utilize Gaussian processes to represent clusters of functions, which are observed as sample curves or trajectories on a finite set of sample points. Moreover, we adopt the Wasserstein distance to measure the similarity between Gaussian distributions. With this choice, the centers of the Gaussian processes can be computed analytically and efficiently. To demonstrate the effectiveness of the proposed method, we compare it with existing competitive clustering methods on synthetic datasets, and the results are encouraging. We finally apply the proposed method to three real-world datasets with satisfactory results.
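The k-means-style procedure the abstract describes can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names, the naive fixed-point iteration for the barycenter covariance, and the random initialization are all assumptions made for the sketch.

```python
import numpy as np
from scipy.linalg import sqrtm

def w2_squared(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    r = np.real(sqrtm(S2))
    cross = np.real(sqrtm(r @ S1 @ r))
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))

def barycenter(means, covs, iters=30):
    """Wasserstein barycenter of Gaussians: the barycenter mean is the
    arithmetic average of the means; the covariance is obtained by a
    naive fixed-point iteration on S = (1/n) * sum_i (S^{1/2} S_i S^{1/2})^{1/2}."""
    m = np.mean(means, axis=0)
    S = np.mean(covs, axis=0)  # initial guess
    for _ in range(iters):
        r = np.real(sqrtm(S))
        S = np.mean([np.real(sqrtm(r @ C @ r)) for C in covs], axis=0)
    return m, S

def w2_kmeans(gaussians, k, iters=20, seed=0):
    """k-means over a list of (mean, cov) pairs under the W2 distance;
    cluster centers are Wasserstein barycenters of the assigned members."""
    rng = np.random.default_rng(seed)
    centers = [gaussians[i] for i in rng.choice(len(gaussians), k, replace=False)]
    labels = np.zeros(len(gaussians), dtype=int)
    for _ in range(iters):
        # Assignment step: nearest center in squared W2 distance.
        for i, (m, S) in enumerate(gaussians):
            labels[i] = int(np.argmin([w2_squared(m, S, cm, cS) for cm, cS in centers]))
        # Update step: recompute each center as the cluster barycenter.
        for j in range(k):
            members = [gaussians[i] for i in range(len(gaussians)) if labels[i] == j]
            if members:
                centers[j] = barycenter([m for m, _ in members], [S for _, S in members])
    return labels, centers
```

For Gaussians with equal covariances the barycenter covariance is fixed immediately, so the update step reduces to averaging means, mirroring ordinary k-means.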


Notes

  1. The details here are not essential; the definition of the Wasserstein 2-distance between Gaussian measures suffices for the development of this work. We state the formal definition for completeness.
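For reference, the Wasserstein 2-distance between two Gaussian measures has a well-known closed form. This formula is standard background from the optimal-transport literature, stated here for the reader's convenience rather than quoted from the paper:

```latex
W_2^2\big(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\big)
  = \|m_1 - m_2\|_2^2
  + \operatorname{tr}\!\Big(\Sigma_1 + \Sigma_2
      - 2\big(\Sigma_2^{1/2}\,\Sigma_1\,\Sigma_2^{1/2}\big)^{1/2}\Big)
```

When \(\Sigma_1\) and \(\Sigma_2\) commute, the trace term simplifies to \(\operatorname{tr}\big((\Sigma_1^{1/2}-\Sigma_2^{1/2})^2\big)\).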


Acknowledgements

This work was supported by the National Key R&D Program of China (2018YFC0808305).

Author information

Correspondence to Jinwen Ma.

Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Li, T., Ma, J. (2020). Functional Data Clustering Analysis via the Learning of Gaussian Processes with Wasserstein Distance. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_33

  • DOI: https://doi.org/10.1007/978-3-030-63833-7_33

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-63832-0

  • Online ISBN: 978-3-030-63833-7

  • eBook Packages: Computer Science (R0)
