
Simultaneous dimension reduction and clustering via the NMF-EM algorithm


Abstract

Mixture models are among the most popular tools for clustering. However, when the dimension and the number of clusters are large, estimating the clusters becomes challenging, as does their interpretation. Restrictions on the parameters can be used to reduce the dimension; an example is given by mixtures of factor analyzers (MFA) for Gaussian mixtures. The extension of MFA to non-Gaussian mixtures is not straightforward. We propose a new constraint on the parameters of non-Gaussian mixture models: the K component parameters are combinations of elements from a small dictionary of, say, H elements, with \(H \ll K\). Including a nonnegative matrix factorization (NMF) step in the EM algorithm allows us to estimate the dictionary and the parameters of the mixture simultaneously. We propose the acronym NMF-EM for this algorithm, which is implemented in the R package nmfem. This approach is motivated by the clustering of passengers from ticketing data: we apply NMF-EM to data from two Transdev public transport networks. In this case, the words are easily interpreted as typical slots in a timetable.
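In matrix form, and with illustrative notation (not necessarily that of the paper), the constraint can be written as follows: gathering the \(D\)-dimensional component parameters \(\theta_1,\dots,\theta_K\) column-wise in a matrix \(\Theta \in \mathbb{R}_+^{D\times K}\), we impose

\[ \Theta = \Lambda W, \qquad \Lambda \in \mathbb{R}_+^{D\times H}, \quad W \in \mathbb{R}_+^{H\times K}, \quad H \ll K, \]

so that each \(\theta_k = \sum_{h=1}^{H} w_{hk}\,\lambda_h\) is a nonnegative combination of the \(H\) dictionary elements (the "words") \(\lambda_1,\dots,\lambda_H\).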



Acknowledgements

We would like to thank the anonymous Referees and the Associate Editor for their constructive comments and suggestions. We also thank Denis COUTROT and Nadir MEZIANI from Transdev for their support and comments on previous versions of this work.

Author information


Corresponding author

Correspondence to Pierre Alquier.

Additional information


Léna Carel: This paper was written when the first author was a Ph.D. student at ENSAE Paris funded by the Transdev Group. Both authors acknowledge the Transdev Group for funding, and for providing the data used in this paper.

Appendix

1.1 Analysis of the clusters of users

As noted above, we have no personal information in our data, so we cannot describe the users in each cluster individually. However, for each transaction made, we have the encrypted card number and the transport ticket used, so we can recover, for each card, the most used transport ticket during the period. This provides interesting information, as some fare schemes are associated with age ranges (Young, Senior...) and with time periods (Unit, Annual or Monthly Subscription). Let us now describe each cluster in terms of age ranges (Fig. 8a–c).
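As a minimal illustration of this step, the base-R sketch below recovers the most used ticket type per card; the table and column names (transactions, card_id, ticket_type) are hypothetical and do not come from the actual Transdev data.

```r
# Hypothetical transaction log: one row per validation, with the encrypted
# card number and the fare product (ticket type) used for that validation.
transactions <- data.frame(
  card_id     = c("A", "A", "A", "B", "B"),
  ticket_type = c("Annual", "Annual", "Unit", "Senior", "Senior")
)

# For each card, keep the most frequently used ticket type over the period.
most_used   <- function(x) names(which.max(table(x)))
main_ticket <- tapply(transactions$ticket_type, transactions$card_id, most_used)
main_ticket  # A: "Annual", B: "Senior"
```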

Fig. 8 Age range analysis of the clusters

Adults are more present in clusters 7 and 9, which are clusters with check-ins mostly in the morning. People benefiting from half-price fares are present in every cluster, but with the highest rates in clusters 2, 3, 4 and 5. Children (4–6) are not very present on the network, but they are more represented in clusters 1, 5 and 9. Young travelers (6–25) are more present in clusters 1 and 4; these clusters correspond to school time slots. Clusters 8 and 10 contain large rates of seniors and free travelers. As these clusters have profiles with diffuse travel during the week, and as free travelers are unemployed or low-income people, these groupings make sense.

Figure 9 shows the distribution of transport ticket types across clusters. Unit products are mostly used in clusters 8 and 10, which are the clusters with many seniors and free travelers. As they do not have work or school obligations, they likely use unit products for occasional trips. Clusters 1, 3, 4 and 9, which have mostly school-related profiles, show a large majority of annual subscribers. A possible interpretation is that schoolchildren and students are captive users of public transportation and have to use the network to go to class every day; buying an annual pass is then more advantageous than buying any other product type.

Fig. 9 Transportation ticket type analysis of the clusters

As described in Sect. 4.1, we kept only users whose first trip of the day is made at the same station at least \(50\%\) of the study time. That main "morning station" is called the "home station", as it gives us an estimate of where users live. Figures 10 and 11 show the shares of clusters by home station, that is, the share of travelers identified as belonging to each cluster living near each station.
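A minimal base-R sketch of this filtering rule is given below; the column names are hypothetical, and the actual preprocessing of Sect. 4.1 may differ in its details.

```r
# Hypothetical check-in table: card, calendar day, time and boarding station.
checkins <- data.frame(
  card_id = c("A", "A", "A", "A", "B", "B"),
  day     = as.Date(c("2017-03-06", "2017-03-06", "2017-03-07", "2017-03-08",
                      "2017-03-06", "2017-03-07")),
  time    = c("07:45", "17:30", "07:50", "08:01", "09:10", "14:00"),
  station = c("S1", "S9", "S1", "S2", "S3", "S4")
)

# First check-in of each (card, day): the "morning station" of that day.
checkins     <- checkins[order(checkins$card_id, checkins$day, checkins$time), ]
first_of_day <- checkins[!duplicated(checkins[, c("card_id", "day")]), ]

# Share of days spent at the modal morning station, for each card.
home_share <- tapply(first_of_day$station, first_of_day$card_id,
                     function(s) max(table(s)) / length(s))

# Keep only the cards whose main morning station covers at least 50% of the days;
# that station is the "home station" of the user.
kept_cards <- names(home_share)[home_share >= 0.5]
```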

Fig. 10 Share of clusters per home station: clusters 1–6

Fig. 11 Share of clusters per home station: clusters 7–10

We note that:

  1. Cluster 1: travelers are over-represented at peripheral stations.

  2. Cluster 2: no particular pattern observed.

  3. Cluster 3: no particular pattern observed.

  4. Cluster 4: a few stations show an over-representation of cluster 4.

  5. Cluster 5: over-representation of the cluster at two stations in the north.

  6. Cluster 6: no particular pattern observed.

  7. Cluster 7: one station is \(100\%\) represented by cluster 7; as only one user is assigned to this station, no particular pattern is observed.

  8. Cluster 8: the cluster is over-represented at one station in the city center and at another further away.

  9. Cluster 9: cluster 9 is over-represented at a few stations in the center.

  10. Cluster 10: the cluster is over-represented in the poorest neighborhoods of the city.

1.2 Stations profile clustering

Clustering the different stations of the network allows us to better understand the different types of stations and to group them by temporal similarity. As we have a small number of stations (475), it is not safe to proceed as described above for the users clustering; indeed, a K larger than 6 or 7 leads to very small clusters. Instead, we fixed H and K a priori to 3 and 5, respectively.
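For concreteness, here is a minimal base-R sketch of the NMF-EM principle for a multinomial mixture: an E-step computing the responsibilities, followed by an M-step in which a few multiplicative (KL-type) NMF updates fit \(\Lambda W\) to the matrix of expected counts. It only illustrates the idea; it is not the implementation of the nmfem package, and the function name nmf_em below is ours.

```r
# x: n x D matrix of nonnegative counts (e.g. check-ins per time slot),
# K: number of clusters, H: size of the dictionary (H << K).
nmf_em <- function(x, K, H, n_iter = 100, eps = 1e-10) {
  n <- nrow(x); D <- ncol(x)
  # Random initialisation; columns of Lambda (D x H) and W (H x K) are
  # normalised so that Theta = Lambda %*% W has columns in the simplex.
  Lambda <- matrix(runif(D * H), D, H); Lambda <- sweep(Lambda, 2, colSums(Lambda), "/")
  W      <- matrix(runif(H * K), H, K); W      <- sweep(W, 2, colSums(W), "/")
  pi_k   <- rep(1 / K, K)
  for (it in 1:n_iter) {
    Theta <- Lambda %*% W                               # D x K component parameters
    # E-step: responsibilities under the multinomial mixture (log scale).
    logp <- x %*% log(Theta + eps)
    logp <- sweep(logp, 2, log(pi_k), "+")
    logp <- logp - apply(logp, 1, max)
    resp <- exp(logp); resp <- resp / rowSums(resp)     # n x K
    # M-step: mixture weights, then expected counts per cluster.
    pi_k <- colMeans(resp)
    A <- t(x) %*% resp                                  # D x K expected counts
    # A few multiplicative KL-NMF updates so that Lambda %*% W fits A.
    for (j in 1:5) {
      Th     <- Lambda %*% W + eps
      Lambda <- Lambda * ((A / Th) %*% t(W)) / (matrix(1, D, K) %*% t(W) + eps)
      Th     <- Lambda %*% W + eps
      W      <- W * (t(Lambda) %*% (A / Th)) / (t(Lambda) %*% matrix(1, D, K) + eps)
    }
    # Renormalise so that every column of Theta remains a probability vector.
    Lambda <- sweep(Lambda, 2, colSums(Lambda), "/")
    W      <- sweep(W, 2, colSums(W), "/")
  }
  list(pi = pi_k, Lambda = Lambda, W = W, Theta = Lambda %*% W, resp = resp)
}
```

For the station data of this section, a call could then look like res <- nmf_em(x, K = 5, H = 3), where x would be the matrix of check-in counts per station and time slot; the columns of res$Lambda would play the role of the words shown in Fig. 12.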

Fig. 12 Words obtained by NMF-EM on stations data with \(K=5\) and \(H=3\)

The three words obtained are shown in Fig. 12. The first time component is characterized by check-ins at 7 and 8 a.m.; we will call it the "morning component". The second time component shows check-ins at 4 and 5 p.m. on Mondays, Tuesdays, Thursdays and Fridays, and check-ins at 12 p.m. on Wednesdays; we will name it the "end of school component". The third component shows check-ins at 6 p.m., on Wednesday afternoons, on Saturdays and during off-peak periods; this component will be called the "off-peak component".

Fig. 13 Clusters obtained by NMF-EM on stations data with \(K=5\) and \(H=3\)

Figure 13 shows the five clusters. Stations in cluster 1 are stations with check-ins only in the morning, at 7 or 8 a.m.; these stations are likely in residential areas. In cluster 2, the stations have check-ins all day long, but with the highest probabilities during peaks. Stations in cluster 3 have check-ins in the morning and at the end of school; they are likely to be near schools in residential areas. Stations in cluster 4 have check-ins only at end-of-school times, so these stations are probably near schools. Finally, stations in cluster 5 are quite similar to the ones in cluster 1: a large majority of check-ins are made in the morning (7 or 8 a.m.). The only difference is that check-ins during the rest of the day are more likely in cluster 5 than in cluster 1.

Open data from the French National Institute of Statistics and Economic Studies (INSEE) allow us to introduce contextual information. Firstly, a database containing socioeconomic data on a 200 m \(\times \) 200 m grid is available; we used two of its indicators: the number of inhabitants and the percentage of households living in collective housing per tile. Secondly, we used a database referencing and geolocating every French company or administration, from which we obtained the number of employees per tile. By clustering the tiles of the study area, we obtained different groups of areas that allow us to analyse the stations more finely. Table 3 describes the mean tile of each cluster.

Table 3 Description of the tile clusters
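The clustering procedure used for the tiles is not specified in this appendix; as one plausible illustration, a k-means on the standardised tile indicators could look as follows (tile_data and its columns are hypothetical).

```r
# Hypothetical tile table built from the INSEE grid and company databases:
# inhabitants, share of collective housing and number of employees per tile.
set.seed(1)
tile_data <- data.frame(
  inhabitants        = rpois(200, 150),
  collective_housing = runif(200),
  employees          = rpois(200, 80)
)

# Standardise the indicators and cluster the tiles into four groups
# (k-means is only one plausible choice of clustering method).
km <- kmeans(scale(tile_data), centers = 4, nstart = 20)

# Mean tile per cluster, on the original scale (cf. Table 3).
aggregate(tile_data, by = list(cluster = km$cluster), FUN = mean)
```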
Fig. 14 Map of the stations; the opacity of each point is proportional to the adequacy between the station and the clusters

Fig. 15 Words obtained by NMF-EM on users data with \(K=10\) and \(H=7\)

Fig. 16 Clusters obtained by NMF-EM on users data with \(K=10\) and \(H=7\)

As the tiles contained in clusters 1 and 2 are those with the smallest number of employees, they can be described as residential areas. Moreover, the percentage of collective housing allows us to distinguish them: cluster 1 has more households living in collective housing than cluster 2. We will therefore refer to tiles from cluster 1 as residential areas with collective housing, and to tiles from cluster 2 as residential areas with individual housing. Since both the number of inhabitants and the number of employees are high, tiles from cluster 3 will be referred to as mixed areas. Finally, as the number of employees in cluster 4 is very large, we will refer to these tiles as business areas.

The panels of Fig. 14 show the geographical distribution of the five clusters. In Fig. 14a, we observe the stations contained in cluster 1, which groups stations with check-ins only in the morning; on the map, these stations are distant from the city center and are mainly located in residential areas. Figure 14b shows the stations of cluster 2, which have check-ins all day long with stronger attendance during peak periods; these stations are mainly located in the city center. Figure 14c, d look alike: clusters 3 and 4 both carry the "end of school" component, and the points on the map are close to educational establishments. Figure 14e shows the stations from cluster 5; these stations have check-ins all day long, but most are made in the morning. Looking at the map, we cannot identify any significant pattern.

1.3 Passengers profile clustering on another network

To assess the effectiveness of the algorithm, we also applied it to another network, located in the Netherlands. Applying the same model selection method as in Sect. 4.2, we obtained the optimal values \(K=10\) and \(H=7\). Figures 15 and 16 show the profiles of the words and of the clusters obtained, respectively.

The interpretation of the words is:

  1. Word 1: travels at 6 or 7 a.m. and, slightly, around 4 p.m. during the week.

  2. Word 2: travels during the week-end.

  3. Word 3: diffuse travel habits from 8 a.m. to 4 p.m., Mondays to Fridays.

  4. Word 4: travels at 7 a.m. on weekdays.

  5. Word 5: diffuse habits with the highest probabilities from 5 p.m. to 12 a.m. during the week.

  6. Word 6: diffuse habits from 9 a.m. to 5 p.m., with the highest probability at 1 p.m., Mondays to Saturdays.

  7. Word 7: travels at 8 a.m. and 5 p.m.

We can interpret the cluster as follows:

  1. Cluster 1: diffuse habits from 9 a.m. to 5 p.m., with the highest probability at 1 p.m., Mondays to Saturdays.

  2. Cluster 2: travels at 6 or 7 a.m. and at 4 or 5 p.m. during the week.

  3. Cluster 3: diffuse habits from 7 a.m. to 6 p.m. on weekdays.

  4. Cluster 4: diffuse travel habits from 9 a.m. to 11 p.m.

  5. Cluster 5: travels at 7 or 8 a.m., with diffuse habits during the afternoon.

  6. Cluster 6: travels at 8 a.m. and 5 p.m.

  7. Cluster 7: diffuse travel habits from 7 a.m. to 5 p.m., Mondays to Fridays.

  8. Cluster 8: diffuse habits from 8 a.m. to 4 p.m. during the week.

  9. Cluster 9: travels during the week-end.

  10. Cluster 10: travels at 7 or 8 a.m. and around 4 p.m.


Cite this article

Carel, L., Alquier, P. Simultaneous dimension reduction and clustering via the NMF-EM algorithm. Adv Data Anal Classif 15, 231–260 (2021). https://doi.org/10.1007/s11634-020-00398-4

