Descriptive matrix factorization for sustainability: Adopting the principle of opposites

Abstract

Climate change, the global energy footprint, and strategies for sustainable development have become topics of considerable political and public interest. The public debate is informed by an exponentially growing amount of data, and diverse partisan interests shape how that data is interpreted. We therefore believe that data analysis methods are called for that provide results which are intuitively understandable even to non-experts. Moreover, such methods should be efficient so that non-expert users can perform their own analyses at low expense in order to understand the effects of different parameters and influential factors. In this paper, we discuss a new technique for factorizing data matrices that meets both of these requirements. The basic idea is to represent a set of data by means of convex combinations of extreme data points. Descriptions in terms of extremes, or opposites, often accommodate human cognition. In contrast to established factorization methods, the approach presented in this paper can also determine over-complete bases. At the same time, convex combinations allow for highly efficient matrix factorization. Based on techniques adopted from the field of distance geometry, we derive a linear-time algorithm to determine suitable basis vectors for factorization. Using several environmental and developmental data sets as examples, we discuss the performance and characteristics of the proposed approach and validate that significant efficiency gains are obtainable without a decrease in performance compared to existing convexity-constrained approaches.
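To make the central idea concrete, the following minimal Python sketch illustrates what a representation by convex combinations of extreme data points can look like. It is an illustration under stated assumptions, not the algorithm derived in the paper: the function names `select_extremes` and `convex_weights` are hypothetical, the selection rule (greedily pick the point farthest from the affine hull of the points chosen so far, which grows the volume of the spanned simplex) and the Frank-Wolfe solver for the coefficients are stand-ins chosen for brevity, and the toy data is made up. Each selection step is a single pass over the data, but this simple variant can return at most one more basis vector than the data has dimensions, so it does not demonstrate over-complete bases.

```python
import math

def sq_norm(v):
    return sum(x * x for x in v)

def select_extremes(points, k):
    """Greedily pick up to k extreme points (hypothetical helper, not the paper's method)."""
    n, dim = len(points), len(points[0])
    mean = [sum(p[d] for p in points) / n for d in range(dim)]
    # The point farthest from the mean is a vertex of the convex hull.
    first = max(range(n), key=lambda i: sq_norm([x - m for x, m in zip(points[i], mean)]))
    chosen = [first]
    # Residuals hold each point's offset from the affine hull of the chosen points.
    resid = [[x - o for x, o in zip(p, points[first])] for p in points]
    while len(chosen) < k:
        nxt = max(range(n), key=lambda i: sq_norm(resid[i]))
        height = math.sqrt(sq_norm(resid[nxt]))
        if height < 1e-12:  # all remaining points already lie in the affine hull
            break
        u = [x / height for x in resid[nxt]]
        chosen.append(nxt)
        for r in resid:  # remove the newly covered direction from every residual
            dot = sum(a * b for a, b in zip(r, u))
            for d in range(dim):
                r[d] -= dot * u[d]
    return chosen

def convex_weights(x, basis, iters=200):
    """Frank-Wolfe fit of weights h >= 0 with sum(h) == 1 so that sum_j h[j]*basis[j] ~ x."""
    k, dim = len(basis), len(x)
    h = [1.0 / k] * k
    for _ in range(iters):
        approx = [sum(h[j] * basis[j][d] for j in range(k)) for d in range(dim)]
        resid = [a - b for a, b in zip(approx, x)]
        # Up to a factor of 2, the gradient w.r.t. h[j] is <resid, basis[j]>.
        grad = [sum(r * b for r, b in zip(resid, bj)) for bj in basis]
        j = min(range(k), key=lambda i: grad[i])
        direction = [b - a for a, b in zip(approx, basis[j])]
        denom = sq_norm(direction)
        if denom < 1e-18:
            break
        # Exact line search along the segment towards basis[j], clipped to [0, 1].
        t = -sum(r * v for r, v in zip(resid, direction)) / denom
        t = max(0.0, min(1.0, t))
        if t == 0.0:
            break
        h = [(1.0 - t) * hj for hj in h]
        h[j] += t
    return h

# Toy data (made up): three corner points and three points inside their triangle.
points = [(0, 0), (4, 0), (0, 3), (1, 1), (2, 1), (1, 2)]
idx = select_extremes(points, 3)
basis = [points[i] for i in idx]
for p in points:
    print(p, [round(w, 3) for w in convex_weights(p, basis)])
```

Because the weights are non-negative and sum to one, each row of output can be read as a mixture: how much a data point resembles each of the selected extremes. This is the interpretability property the abstract appeals to.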

Author information

Corresponding author

Correspondence to Christian Thurau.

Additional information

Responsible editors: Katharina Morik, Kanishka Bhaduri and Hillol Kargupta.

About this article

Cite this article

Thurau, C., Kersting, K., Wahabzada, M. et al. Descriptive matrix factorization for sustainability: Adopting the principle of opposites. Data Min Knowl Disc 24, 325–354 (2012). https://doi.org/10.1007/s10618-011-0216-z

  • DOI: https://doi.org/10.1007/s10618-011-0216-z

Keywords

  • Matrix factorization
  • Convex combinations
  • Distance geometry
  • Large-scale data analysis