
A Matrix Factorization Framework for Jointly Analyzing Multiple Nonnegative Data Sources

Chapter in: Data Mining for Service

Part of the book series: Studies in Big Data (SBD, volume 3)

Abstract

Nonnegative matrix factorization (NMF) based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this chapter, we propose a novel joint matrix factorization framework which can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social-media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds that of existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable in a wider data mining context wherever one needs to exploit the mutual and individual knowledge present across multiple data sources.


Notes

  1. http://www.blogger.com/

  2. http://www.flickr.com/services/api/

  3. http://code.google.com/apis/youtube/overview.html

  4. Fixed at 0.05 for selecting the tags with more than 5 % weight in a topic.

References

  1. Ando, R., Zhang, T.: A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817–1853 (2005)


  2. Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern Information Retrieval. Addison-Wesley, Reading (1999)


  3. Berry, M., Browne, M.: Email surveillance using non-negative matrix factorization. Comput. Math. Org. Theor. 11(3), 249–264 (2005)


  4. Cilibrasi, R., Vitanyi, P.: The Google similarity distance. IEEE Trans. Knowl. Data Eng. 19(3), 370–383 (2007)


  5. Golder, S., Huberman, B.: Usage patterns of collaborative tagging systems. J. Inf. Sci. 32(2), 198 (2006)


  6. Gu, Q., Zhou, J.: Learning the shared subspace for multi-task clustering and transductive transfer classification. In: 9th IEEE International Conference on Data Mining (ICDM'09), pp. 159–168. IEEE (2009)


  7. Gupta, S., Phung, D., Adams, B., Tran, T., Venkatesh, S.: Nonnegative shared subspace learning and its application to social media retrieval. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1169–1178. ACM (2010)


  8. Gupta, S., Phung, D., Adams, B., Venkatesh, S.: Regularized nonnegative shared subspace learning. Data Min. Knowl. Disc. 26(1), 57–97 (2011)


  9. Ji, S., Tang, L., Yu, S., Ye, J.: A shared-subspace learning framework for multi-label classification. ACM Trans. Knowl. Disc. Data 4(2), 1–29 (2010)


  10. Kankanhalli, M., Rui, Y.: Application potential of multimedia information retrieval. Proc. IEEE 96(4), 712–720 (2008)


  11. Lee, D., Seung, H.: Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 13, 556–562 (2001)


  12. Lin, C.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)


  13. Lin, Y., Sundaram, H., De Choudhury, M., Kelliher, A.: Temporal patterns in social media streams: theme discovery and evolution using joint analysis of content and context. In: IEEE International Conference on Multimedia and Expo, 2009: ICME 2009, pp. 1456–1459 (2009)


  14. Mardia, K.V., Bibby, J.M., Kent, J.T.: Multivariate Analysis. Academic Press, New York (1979)


  15. Marlow, C., Naaman, M., Boyd, D., Davis, M.: HT06, tagging paper, taxonomy, flickr, academic article, to read. In: Proceedings Hypertext’06, pp. 31–40 (2006)


  16. Shahnaz, F., Berry, M., Pauca, V., Plemmons, R.: Document clustering using nonnegative matrix factorization. Inf. Process. Manage. 42(2), 373–386 (2006)


  17. Si, S., Tao, D., Geng, B.: Bregman divergence based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 22(7), 929–942 (2009)


  18. Sigurbjörnsson, B., Van Zwol, R.: Flickr tag recommendation based on collective knowledge. In: Proceeding of the 17th International Conference on World Wide Web, pp. 327–336. ACM, New York (2008)


  19. Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267–273 (2003)


  20. Yan, R., Tesic, J., Smith, J.: Model-shared subspace boosting for multi-label classification. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 834–843. ACM (2007)


  21. Yang, Y., Xu, D., Nie, F., Luo, J., Zhuang, Y.: Ranking with local regression and global alignment for cross media retrieval. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 175–184. ACM (2009)


  22. Yi, Y., Zhuang, Y., Wu, F., Pan, Y.: Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. Language 1520, 9210 (2008)



Author information

Corresponding author

Correspondence to Sunil Kumar Gupta.


Appendix

1.1 Proof of Convergence

We prove the convergence of the multiplicative updates given by Eqs. (10) and (11). We avoid lengthy derivations and provide only a sketch of the proof. Following Ref. [11], an auxiliary function \(G(w,w^{t})\) is defined as an upper bound on the cost function \(J(w)\) that is tight at the current iterate, i.e. \(G(w,w^{t})\ge J(w)\) and \(G(w,w)=J(w)\). For our MS-NMF case, we prove the following lemma, extended from Ref. [11]:
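Such an auxiliary function guarantees convergence because the update \(w^{t+1}=\arg \min _{w}G(w,w^{t})\) makes the cost non-increasing, via the standard chain of inequalities from Ref. [11]:

$$ J(w^{t+1})\le G(w^{t+1},w^{t})\le G(w^{t},w^{t})=J(w^{t}) $$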

Lemma.

If \(\left( W_{\nu }\right) _{p}\) is the \(p\)th row of matrix \(W_{\nu }\), \(\nu \in S\left( n,i\right) \), and \(C\left( \left( W_{\nu }\right) _{p}\right) \) is the diagonal matrix whose \(\left( l,k\right) \)th element is

$$ C_{lk}\left( \left( W_{\nu }\right) _{p}\right) =\mathbf {1}_{l,k}\frac{\left( \sum \limits _{i\in \nu }\lambda _{i}H_{i,\nu }\left( \sum \limits _{u\in S\left( n,i\right) }H_{i,u}^{\mathsf {T}}\left( W_{u}\right) _{p}\right) \right) _{l}}{\left( W_{\nu }\right) _{pl}} $$

then

$$ G\left( \left( W_{\nu }\right) _{p},\left( W_{\nu }\right) _{p}^{t}\right) =J\left( \left( W_{\nu }\right) _{p}^{t}\right) +\left( \left( W_{\nu }\right) _{p}-\left( W_{\nu }\right) _{p}^{t}\right) ^{\mathsf {T}}\nabla _{\left( W_{\nu }\right) _{p}^{t}}J\left( \left( W_{\nu }\right) _{p}^{t}\right) \\ +\frac{1}{2}\left( \left( W_{\nu }\right) _{p}-\left( W_{\nu }\right) _{p}^{t}\right) ^{\mathsf {T}}C\left( \left( W_{\nu }\right) _{p}^{t}\right) \left( \left( W_{\nu }\right) _{p}-\left( W_{\nu }\right) _{p}^{t}\right) $$

is an auxiliary function for \(J\left( \left( W_{\nu }\right) _{p}\right) \), the cost function defined for the \(p\)th row of the data.

Proof.

The second derivative of \(J\left( \left( W_{\nu }\right) _{p}\right) \) is \(\nabla _{\left( W_{\nu }\right) _{p}}^{2}J\left( \left( W_{\nu }\right) _{p}\right) =\sum _{i\in \nu }\lambda _{i}H_{i,\nu } H_{i,\nu }^{\mathsf {T}}\). Comparing the expression for \(G\left( \left( W_{\nu }\right) _{p},\left( W_{\nu }\right) _{p}^{t}\right) \) in the lemma with the Taylor expansion of \(J\left( \left( W_{\nu }\right) _{p}\right) \) at \(\left( W_{\nu }\right) _{p}^{t}\) (which is exact, since \(J\) is quadratic in \(\left( W_{\nu }\right) _{p}\)), it can be seen that all we need to prove is the following

$$ \left( \left( W_{\nu }\right) _{p}-\left( W_{\nu }\right) _{p}^{t}\right) ^{\mathsf {T}}T_{W_{\nu }}\left( \left( W_{\nu }\right) _{p}-\left( W_{\nu }\right) _{p}^{t}\right) \ge 0 $$

where \(T_{W_{\nu }}\triangleq C\left( \left( W_{\nu }\right) _{p}^{t}\right) -\sum _{i\in \nu }\lambda _{i}H_{i,\nu }H_{i,\nu }^{\mathsf {T}}\). Similar to Ref. [11], instead of showing this directly, we show the positive semidefiniteness of the rescaled matrix \(M\) with elements

$$\begin{aligned} M_{lk}\left( \left( W_{\nu }\right) _{p}^{t}\right)&= \left( W_{\nu }\right) _{pl}^{t}\left( T_{W_{\nu }}\right) _{lk}\left( W_{\nu }\right) _{pk}^{t} \end{aligned}$$

which is equivalent, since the quadratic form above equals \(z^{\mathsf {T}}Mz\) for \(z_{l}=\left( \left( W_{\nu }\right) _{p}-\left( W_{\nu }\right) _{p}^{t}\right) _{l}/\left( W_{\nu }\right) _{pl}^{t}\). Thus we have to show that \(z^{\mathsf {T}}Mz\) is nonnegative for every \(z\). To avoid a lengthy derivation, we only show the main step here:

$$\begin{aligned} z^{\mathsf {T}}Mz&=\sum _{l,k}z_{l}\left( W_{\nu }\right) _{pl}^{t}\left( T_{W_{\nu }}\right) _{lk}\left( W_{\nu }\right) _{pk}^{t}z_{k}\\&=\sum _{l}z_{l}^{2}\left( W_{\nu }\right) _{pl}^{t}\left( \sum _{i\in \nu }\lambda _{i}H_{i,\nu }\sum _{u\in S\left( n,i\right) ,u\ne \nu }H_{i,u}^{\mathsf {T}}\left( W_{u}\right) _{p}\right) _{l}\\&\quad +\sum _{l,k}\left( W_{\nu }\right) _{pl}^{t}\left( \sum _{i\in \nu }\lambda _{i}\left( H_{i,\nu }H_{i,\nu }^{\mathsf {T}}\right) _{lk}\right) \left( W_{\nu }\right) _{pk}^{t}\frac{\left( z_{l}-z_{k}\right) ^{2}}{2}\ge 0 \end{aligned}$$

Both terms are nonnegative because every factor involved is nonnegative.

Minimizing \(G\left( \left( W_{\nu }\right) _{p},\left( W_{\nu }\right) _{p}^{t}\right) \) with respect to \(\left( W_{\nu }\right) _{p}\) at iteration \(t\), and comparing the resulting update with the gradient-descent update of Eq. (8), we obtain the step size \(\eta _{\left( W_{\nu }\right) _{lk}^{t}}\) given in Eq. (9).
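To make the multiplicative-update idea concrete, the following is a minimal Python sketch of the single-source Lee–Seung updates from Ref. [11], the special case that the MS-NMF updates of Eqs. (10) and (11) generalize to multiple sources with shared and individual subspaces. The function name, the Frobenius-norm objective, and the small stabilizing constant eps are illustrative assumptions, not the chapter's actual algorithm.

import numpy as np

def nmf_multiplicative(X, rank, n_iters=200, eps=1e-10):
    # Plain Lee-Seung multiplicative updates for X ~= W @ H under the
    # Frobenius norm; both factors stay nonnegative because each update
    # multiplies by a ratio of nonnegative terms.
    n, m = X.shape
    rng = np.random.default_rng(0)
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iters):
        H *= (W.T @ X) / (W.T @ W @ H + eps)   # H <- H * (W^T X) / (W^T W H)
        W *= (X @ H.T) / (W @ H @ H.T + eps)   # W <- W * (X H^T) / (W H H^T)
    return W, H

# Usage: factorize a small nonnegative matrix; the reconstruction error is
# non-increasing over iterations, mirroring the auxiliary-function argument.
X = np.random.default_rng(1).random((20, 30))
W, H = nmf_multiplicative(X, rank=5)
print(np.linalg.norm(X - W @ H))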


Copyright information

© 2014 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Gupta, S.K., Phung, D., Adams, B., Venkatesh, S. (2014). A Matrix Factorization Framework for Jointly Analyzing Multiple Nonnegative Data Sources. In: Yada, K. (ed.) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_10


  • DOI: https://doi.org/10.1007/978-3-642-45252-9_10

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-45251-2

  • Online ISBN: 978-3-642-45252-9

