Abstract
Functional data clustering has become an urgent and challenging task in the era of big data. In this paper, we propose a new framework for functional data clustering analysis that adopts a structure similar to the k-means algorithm in conventional clustering analysis. Under this framework, we clarify three issues: how to represent functions, how to measure distances between functions, and how to calculate centers of functions. We utilize Gaussian processes to represent clusters of functions, which are observed as sample curves or trajectories on a finite set of sample points. Moreover, we adopt the Wasserstein distance to measure the similarity between Gaussian distributions. With this choice of distance, the centers of Gaussian processes can be calculated analytically and efficiently. To demonstrate the effectiveness of the proposed method, we compare it with existing competitive clustering methods on synthetic datasets, and the obtained results are encouraging. We finally apply the proposed method to three real-world datasets with satisfactory results.
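Once each curve and each cluster center is represented by a Gaussian on a common grid of sample points, the assignment and update steps described in the abstract reduce to two standard computations on multivariate Gaussians. The snippet below is a minimal sketch of those computations, not the authors' implementation: it evaluates the closed-form 2-Wasserstein distance between two Gaussians and approximates a Wasserstein barycenter with the usual fixed-point iteration. The function names, the initial covariance guess, and the tolerance are illustrative assumptions.

```python
import numpy as np
from scipy.linalg import sqrtm


def gaussian_w2(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between N(m1, S1) and N(m2, S2)."""
    root_S1 = np.real(sqrtm(S1))
    cross = np.real(sqrtm(root_S1 @ S2 @ root_S1))
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2.0 * cross))


def gaussian_barycenter(means, covs, weights, n_iter=50, tol=1e-8):
    """Wasserstein barycenter of the Gaussians N(means[i], covs[i]).

    The barycenter mean is the weighted average of the means; the covariance
    is obtained by the standard fixed-point iteration for Gaussian barycenters.
    """
    m_bar = sum(w * m for w, m in zip(weights, means))
    S = np.mean(covs, axis=0)  # any symmetric positive-definite start works
    for _ in range(n_iter):
        root = np.real(sqrtm(S))
        inv_root = np.linalg.inv(root)
        T = sum(w * np.real(sqrtm(root @ C @ root)) for w, C in zip(weights, covs))
        S_next = inv_root @ T @ T @ inv_root
        if np.linalg.norm(S_next - S) < tol:
            return m_bar, S_next
        S = S_next
    return m_bar, S
```

In a k-means-style loop, each curve's Gaussian representation would be assigned to the cluster center that minimizes gaussian_w2, and each center would then be recomputed with gaussian_barycenter over the Gaussians assigned to it.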
Notes
1. The details here are not so important; the definition of the Wasserstein 2-distance between Gaussian measures suffices for the development of this work. We present the formal definition here for completeness.
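For reference, the closed-form expression of the Wasserstein 2-distance between two Gaussian measures, a standard result from the optimal transport literature (here $\mathcal{N}(m,\Sigma)$ denotes a Gaussian with mean $m$ and covariance $\Sigma$), reads:

```latex
W_2^2\bigl(\mathcal{N}(m_1,\Sigma_1),\,\mathcal{N}(m_2,\Sigma_2)\bigr)
  = \lVert m_1 - m_2 \rVert^2
    + \operatorname{tr}\!\Bigl(\Sigma_1 + \Sigma_2
        - 2\bigl(\Sigma_1^{1/2}\,\Sigma_2\,\Sigma_1^{1/2}\bigr)^{1/2}\Bigr)
```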
Acknowledgements
This work was supported by the National Key R&D Program of China (2018YFC0808305).
Cite this paper
Li, T., Ma, J. (2020). Functional Data Clustering Analysis via the Learning of Gaussian Processes with Wasserstein Distance. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_33
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7