Knowledge and Information Systems, Volume 29, Issue 2, pp 457–478

Convex non-negative matrix factorization for massive datasets

  • Christian Thurau
  • Kristian Kersting
  • Mirwaes Wahabzada
  • Christian Bauckhage
Regular paper


Non-negative matrix factorization (NMF) has become a standard tool in data mining, information retrieval, and signal processing. It factorizes a non-negative data matrix into two non-negative matrix factors that contain basis elements and linear coefficients, respectively. The columns of the first factor are often interpreted as “cluster centroids” of the input data, and the columns of the second factor as cluster membership indicators. When analyzing data such as collections of gene expressions, documents, or images, it is often beneficial to ensure that the resulting cluster centroids are meaningful, for instance, by restricting them to be convex combinations of data points. However, known approaches to convex-NMF suffer from high computational costs and are therefore hardly applicable to large-scale data analysis problems. This paper presents a new framework for convex-NMF that allows for an efficient factorization of data matrices with millions of data points. Motivated by the simple observation that each data point can be expressed as a convex combination of vertices of the data convex hull, we require the basis elements to be vertices of the data convex hull. The benefits of convex-hull NMF are twofold. First, as the number of data points grows, the expected size of the convex hull, i.e. the number of its vertices, grows much more slowly than the dataset. Second, distance-preserving low-dimensional embeddings allow us to efficiently sample the convex hull and hence to quickly determine candidate vertices. Our extensive experimental evaluation on large datasets shows that convex-hull NMF compares favorably to convex-NMF in terms of both speed and reconstruction quality. We demonstrate that our method can easily be applied to large-scale, real-world datasets, in our case consisting of 750,000 DBLP entries, 4,000,000 digital images, and 150,000,000 votes on World of Warcraft® guilds, respectively.
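The abstract describes the pipeline only at a high level, so the following is a minimal NumPy sketch of the general idea, not the authors' algorithm. It assumes one data point per column of X and makes three substitutions that are our own: random one-dimensional projections stand in for the paper's distance-preserving embeddings (for a Gaussian direction, the extreme point of a projection is almost surely a hull vertex), a farthest-first heuristic picks k basis vertices among the candidates, and projected gradient descent fits coefficients whose columns lie on the probability simplex, so every point is approximated by a convex combination of the chosen vertices. All function names (candidate_vertices, select_basis, project_to_simplex, convex_coefficients) are hypothetical.

```python
# Illustrative sketch only: function names and heuristics are hypothetical,
# not taken from the paper.
import numpy as np


def candidate_vertices(X, n_directions=50, seed=0):
    """Indices of columns of X (one data point per column) that are extreme
    along random 1-D projections. For a Gaussian direction u, the maximizer
    (and minimizer) of u^T x over distinct points is almost surely a vertex
    of the convex hull. Random projections stand in for the paper's
    distance-preserving embeddings."""
    rng = np.random.default_rng(seed)
    U = rng.standard_normal((n_directions, X.shape[0]))
    P = U @ X  # (n_directions, n): projection of every point on every direction
    idx = np.concatenate([P.argmax(axis=1), P.argmin(axis=1)])
    return np.unique(idx)


def select_basis(X, cand, k):
    """Hypothetical heuristic: pick up to k candidates by farthest-first
    traversal, starting from the candidate farthest from the data mean."""
    C = X[:, cand]
    mean = X.mean(axis=1, keepdims=True)
    chosen = [int(np.argmax(np.linalg.norm(C - mean, axis=0)))]
    dmin = np.linalg.norm(C - C[:, [chosen[0]]], axis=0)
    while len(chosen) < min(k, C.shape[1]):
        nxt = int(np.argmax(dmin))
        if dmin[nxt] == 0.0:  # only duplicates of chosen points remain
            break
        chosen.append(nxt)
        dmin = np.minimum(dmin, np.linalg.norm(C - C[:, [nxt]], axis=0))
    return cand[chosen]


def project_to_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex
    (sort-based algorithm)."""
    k, m = V.shape
    u = -np.sort(-V, axis=0)  # each column sorted in descending order
    css = np.cumsum(u, axis=0) - 1.0
    j = np.arange(1, k + 1).reshape(-1, 1)
    cond = u - css / j > 0
    rho = k - 1 - np.argmax(cond[::-1], axis=0)  # last row where cond holds
    theta = css[rho, np.arange(m)] / (rho + 1)
    return np.maximum(V - theta, 0.0)


def convex_coefficients(X, W, n_iter=500):
    """Minimize 0.5 * ||X - W H||_F^2 with every column of H on the simplex,
    so each data point is approximated by a convex combination of W's columns.
    Plain projected gradient with step 1 / ||W||_2^2."""
    k, n = W.shape[1], X.shape[1]
    H = np.full((k, n), 1.0 / k)
    step = 1.0 / np.linalg.norm(W, 2) ** 2
    for _ in range(n_iter):
        grad = W.T @ (W @ H - X)
        H = project_to_simplex(H - step * grad)
    return H


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    V = np.array([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])  # triangle vertices
    X = np.hstack([V, V @ rng.dirichlet(np.ones(3), size=200).T])  # + interior points
    W = X[:, select_basis(X, candidate_vertices(X), k=3)]
    H = convex_coefficients(X, W)
    print("reconstruction error:", np.linalg.norm(X - W @ H))
```

In the toy example every point lies inside the triangle spanned by the first three columns, so only those columns can be extreme along any projection and the convex coefficients reconstruct the data. The sketch also illustrates why hull-based selection scales: candidate search costs one matrix product per batch of directions, and the number of distinct candidates is bounded by the number of hull vertices rather than by the number of data points.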


Keywords: Matrix factorization, Low-rank approximation, Data mining, Information retrieval, Large-scale data analysis



Copyright information

© Springer-Verlag London Limited 2010

Authors and Affiliations

  • Christian Thurau (1)
  • Kristian Kersting (1)
  • Mirwaes Wahabzada (1)
  • Christian Bauckhage (1)
  1. Fraunhofer IAIS, Schloss Birlinghoven, Sankt Augustin, Germany
