Abstract
Functional data clustering has become an urgent and challenging task in the era of big data. In this paper, we propose a new framework for functional data clustering analysis that adopts a structure similar to the k-means algorithm in conventional clustering analysis. Under this framework, we clarify three issues: how to represent functions, how to measure distances between functions, and how to calculate centers of functions. We utilize Gaussian processes to represent clusters of functions, which are observed as sample curves or trajectories on a finite set of sample points. Moreover, we adopt the Wasserstein distance to measure the similarity between Gaussian distributions. With this choice of distance, the centers of Gaussian processes can be calculated analytically and efficiently. To demonstrate the effectiveness of the proposed method, we compare it with existing competitive clustering methods on synthetic datasets, and the obtained results are encouraging. We finally apply the proposed method to three real-world datasets with satisfactory results.
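Once each curve and each cluster center is represented by a Gaussian on a common grid of sample points, the assignment and update steps described in the abstract reduce to two standard computations on multivariate Gaussians. The snippet below is a minimal sketch of those computations, not the authors' implementation: it evaluates the closed-form 2-Wasserstein distance between two Gaussians and approximates a Wasserstein barycenter with the usual fixed-point iteration. The function names, the initial covariance guess, and the tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm


def gaussian_w2(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    root_S1 = np.real(sqrtm(S1))
    cross = np.real(sqrtm(root_S1 @ S2 @ root_S1))
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


def gaussian_barycenter(means, covs, weights, n_iter=50, tol=1e-8):
    """Wasserstein barycenter of the Gaussians N(means[i], covs[i]).

    The barycenter mean is the weighted average of the means; the covariance
    is obtained by the standard fixed-point iteration for Gaussian barycenters.
    """
    m_bar = sum(w * m for w, m in zip(weights, means))
    S = np.mean(covs, axis=0)  # any symmetric positive-definite start works
    for _ in range(n_iter):
        root = np.real(sqrtm(S))
        inv_root = np.linalg.inv(root)
        T = sum(w * np.real(sqrtm(root @ C @ root)) for w, C in zip(weights, covs))
        S_next = inv_root @ T @ T @ inv_root
        if np.linalg.norm(S_next - S) < tol:
            return m_bar, S_next
        S = S_next
    return m_bar, S
```

In a k-means-style loop, each curve's Gaussian representation would be assigned to the cluster center that minimizes gaussian_w2, and each center would then be recomputed with gaussian_barycenter over the Gaussians assigned to it.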
Notes
1. The details here are not so important; the definition of the Wasserstein 2-distance between Gaussian measures suffices for the development of this work. We present the formal definition here for completeness.
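For reference, the closed-form expression of the Wasserstein 2-distance between two Gaussian measures, a standard result from the optimal transport literature (here $\mathcal{N}(m,\Sigma)$ denotes a Gaussian with mean $m$ and covariance $\Sigma$), reads:

```latex
W_2^2\bigl(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\bigr)
  = \lVert m_1 - m_2 \rVert^2
    + \operatorname{tr}\!\Bigl(\Sigma_1 + \Sigma_2
        - 2\bigl(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\Bigr)
```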
Acknowledgements
This work was supported by the National Key R&D Program of China (2018YFC0808305).
Cite this paper
Li, T., Ma, J. (2020). Functional Data Clustering Analysis via the Learning of Gaussian Processes with Wasserstein Distance. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_33
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7