Abstract
Nonnegative matrix factorization based methods provide one of the simplest and most effective approaches to text mining. However, their applicability is mainly limited to analyzing a single data source. In this chapter, we propose a novel joint matrix factorization framework which can jointly analyze multiple data sources by exploiting their shared and individual structures. The proposed framework is flexible enough to handle the arbitrary sharing configurations encountered in real-world data. We derive an efficient algorithm for learning the factorization and show that its convergence is theoretically guaranteed. We demonstrate the utility and effectiveness of the proposed framework in two real-world applications: improving social media retrieval using auxiliary sources, and cross-social-media retrieval. Representing each social media source by its textual tags, we show for both applications that retrieval performance exceeds that of existing state-of-the-art techniques. The proposed solution provides a generic framework and is applicable in a wider data mining context wherever one needs to exploit mutual and individual knowledge present across multiple data sources.
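To make the idea concrete, the following is a minimal sketch of a joint factorization of two nonnegative data sources with a shared basis plus source-specific bases, using Lee–Seung style multiplicative updates. It is not the chapter's exact MS-NMF algorithm; all names, dimensions, and the two-source sharing configuration are illustrative assumptions.

```python
import numpy as np

def joint_nmf(X1, X2, k_shared, k1, k2, n_iter=200, eps=1e-9, seed=0):
    """Sketch: X1 ~ [Ws|W1] H1 and X2 ~ [Ws|W2] H2, all factors nonnegative.
    Ws is the shared basis; W1, W2 are individual bases. The shared basis
    pools gradient information from both sources in its update."""
    rng = np.random.default_rng(seed)
    m, ks = X1.shape[0], k_shared
    Ws = rng.random((m, ks))
    W1 = rng.random((m, k1))
    W2 = rng.random((m, k2))
    H1 = rng.random((ks + k1, X1.shape[1]))
    H2 = rng.random((ks + k2, X2.shape[1]))
    loss = []
    for _ in range(n_iter):
        A1, A2 = np.hstack([Ws, W1]), np.hstack([Ws, W2])
        # coefficient updates (standard multiplicative rule per source)
        H1 *= (A1.T @ X1) / (A1.T @ A1 @ H1 + eps)
        H2 *= (A2.T @ X2) / (A2.T @ A2 @ H2 + eps)
        H1s, H1i = H1[:ks], H1[ks:]
        H2s, H2i = H2[:ks], H2[ks:]
        # shared basis: numerator and denominator pool both sources
        A1, A2 = np.hstack([Ws, W1]), np.hstack([Ws, W2])
        Ws *= (X1 @ H1s.T + X2 @ H2s.T) / (A1 @ H1 @ H1s.T + A2 @ H2 @ H2s.T + eps)
        # individual bases use only their own source
        A1, A2 = np.hstack([Ws, W1]), np.hstack([Ws, W2])
        W1 *= (X1 @ H1i.T) / (A1 @ H1 @ H1i.T + eps)
        W2 *= (X2 @ H2i.T) / (A2 @ H2 @ H2i.T + eps)
        A1, A2 = np.hstack([Ws, W1]), np.hstack([Ws, W2])
        loss.append(np.linalg.norm(X1 - A1 @ H1) ** 2
                    + np.linalg.norm(X2 - A2 @ H2) ** 2)
    return Ws, W1, W2, H1, H2, loss
```

Each block update is non-increasing in the squared Frobenius objective, so the recorded loss decreases monotonically (up to the small `eps` smoothing).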
Notes
fixed at 0.05 for selecting the tags with more than 5 % weight in a topic.
References
Ando, R., Zhang, T.: A framework for learning predictive structures from multiple tasks and unlabeled data. J. Mach. Learn. Res. 6, 1817–1853 (2005)
Baeza-Yates, R., Ribeiro-Neto, B., et al.: Modern Information Retrieval. Addison-Wesley, Reading (1999)
Berry, M., Browne, M.: Email surveillance using non-negative matrix factorization. Comput. Math. Org. Theor. 11(3), 249–264 (2005)
Cilibrasi, R., Vitanyi, P.: The Google similarity distance. IEEE Trans. Knowl. Data Eng. 19(3), 370–383 (2007)
Golder, S., Huberman, B.: Usage patterns of collaborative tagging systems. J. Inf. Sci. 32(2), 198 (2006)
Gu, Q., Zhou, J.: Learning the shared subspace for multi-task clustering and transductive transfer classification. In: 9th IEEE International Conference on Data Mining: ICDM'09, pp. 159–168. IEEE (2009)
Gupta, S., Phung, D., Adams, B., Tran, T., Venkatesh, S.: Nonnegative shared subspace learning and its application to social media retrieval. In: Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1169–1178. ACM (2010)
Gupta, S., Phung, D., Adams, B., Venkatesh, S.: Regularized nonnegative shared subspace learning. Data Min. Knowl. Disc. 26(1), 57–97 (2011)
Ji, S., Tang, L., Yu, S., Ye, J.: A shared-subspace learning framework for multi-label classification. ACM Trans. Knowl. Disc. Data 4(2), 1–29 (2010)
Kankanhalli, M., Rui, Y.: Application potential of multimedia information retrieval. Proc. IEEE 96(4), 712–720 (2008)
Lee, D., Seung, H.: Algorithms for non-negative matrix factorization. Adv. Neural Inf. Process. Syst. 13, 556–562 (2001)
Lin, C.: Projected gradient methods for nonnegative matrix factorization. Neural Comput. 19(10), 2756–2779 (2007)
Lin, Y., Sundaram, H., De Choudhury, M., Kelliher, A.: Temporal patterns in social media streams: theme discovery and evolution using joint analysis of content and context. In: IEEE International Conference on Multimedia and Expo, 2009: ICME 2009, pp. 1456–1459 (2009)
Mardia, K.V., Bibby, J.M., Kent, J.T.: Multivariate Analysis. Academic Press, New York (1979)
Marlow, C., Naaman, M., Boyd, D., Davis, M.: HT06, tagging paper, taxonomy, flickr, academic article, to read. In: Proceedings Hypertext’06, pp. 31–40 (2006)
Shahnaz, F., Berry, M., Pauca, V., Plemmons, R.: Document clustering using nonnegative matrix factorization. Inf. Process. Manage. 42(2), 373–386 (2006)
Si, S., Tao, D., Geng, B.: Bregman divergence based regularization for transfer subspace learning. IEEE Trans. Knowl. Data Eng. 22(7), 929–942 (2009)
Sigurbjörnsson, B., Van Zwol, R.: Flickr tag recommendation based on collective knowledge. In: Proceeding of the 17th International Conference on World Wide Web, pp. 327–336. ACM, New York (2008)
Xu, W., Liu, X., Gong, Y.: Document clustering based on non-negative matrix factorization. In: Proceedings of the 26th ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 267–273 (2003)
Yan, R., Tesic, J., Smith, J.: Model-shared subspace boosting for multi-label classification. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 834–843. ACM (2007)
Yang, Y., Xu, D., Nie, F., Luo, J., Zhuang, Y.: Ranking with local regression and global alignment for cross media retrieval. In: Proceedings of the 17th ACM International Conference on Multimedia, pp. 175–184. ACM (2009)
Yi, Y., Zhuang, Y., Wu, F., Pan, Y.: Harmonizing hierarchical manifolds for multimedia document semantics understanding and cross-media retrieval. Language 1520, 9210 (2008)
Appendix
1.1 Proof of Convergence
We prove the convergence of the multiplicative updates given by Eqs. (10) and (11). We avoid lengthy derivations and only provide a sketch of the proof. Following Ref. [11], an auxiliary function \(G(w,w^{t})\) is defined as an upper bound on \(J(w)\) that coincides with it at \(w=w^{t}\). For our MS-NMF case, we prove the following lemma extended from Ref. [11]:
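For completeness, the reason an auxiliary function yields a monotone algorithm is the standard chain of inequalities from Ref. [11]:

```latex
% G is an auxiliary function for J if
%   G(w, w') >= J(w) for all w, and  G(w, w) = J(w).
% Minimizing G at each step, w^{t+1} = \arg\min_w G(w, w^t), gives
\begin{equation*}
J\left(w^{t+1}\right) \;\le\; G\left(w^{t+1}, w^{t}\right)
\;\le\; G\left(w^{t}, w^{t}\right) \;=\; J\left(w^{t}\right),
\end{equation*}
% so the objective J is non-increasing under the updates.
```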
Lemma.
If \(\left( W_{\nu }\right) _{p}\) is the \(p\)th row of matrix \(W_{\nu }\), \(\nu \in S\left( n,i\right) \), and \(C\left( \left( W_{\nu }\right) _{p}\right) \) is the diagonal matrix with its \(\left( l,k\right) \)th element
$$C\left(\left(W_{\nu}\right)_{p}^{t}\right)_{lk}=\delta_{lk}\,\frac{\left(\left(W_{\nu}\right)_{p}^{t}\sum_{i\in\nu}\lambda_{i}H_{i,\nu}H_{i,\nu}^{\mathsf{T}}\right)_{k}}{\left(\left(W_{\nu}\right)_{p}^{t}\right)_{k}}$$
then
$$G\left(\left(W_{\nu}\right)_{p},\left(W_{\nu}\right)_{p}^{t}\right)=J\left(\left(W_{\nu}\right)_{p}^{t}\right)+\left(\left(W_{\nu}\right)_{p}-\left(W_{\nu}\right)_{p}^{t}\right)\nabla J\left(\left(W_{\nu}\right)_{p}^{t}\right)^{\mathsf{T}}+\frac{1}{2}\left(\left(W_{\nu}\right)_{p}-\left(W_{\nu}\right)_{p}^{t}\right)C\left(\left(W_{\nu}\right)_{p}^{t}\right)\left(\left(W_{\nu}\right)_{p}-\left(W_{\nu}\right)_{p}^{t}\right)^{\mathsf{T}}$$
is an auxiliary function for \(J\left( \left( W_{\nu }\right) _{p}\right) \), the cost function defined for the \(p\)th row of the data.
Proof.
The second derivative of \(J\left( \left( W_{\nu }\right) _{p}\right) \) is \(\nabla _{\left( W_{\nu }\right) _{p}}^{2}J\left( \left( W_{\nu }\right) _{p}\right) =\sum _{i\in \nu }\lambda _{i}H_{i,\nu } H_{i,\nu }^{\mathsf {T}}\). Comparing the expression for \(G\left( \left( W_{\nu }\right) _{p},\left( W_{\nu }\right) _{p}^{t}\right) \) in the lemma with the Taylor series expansion of \(J\left( \left( W_{\nu }\right) _{p}\right) \) at \(\left( W_{\nu }\right) _{p}^{t}\), it can be seen that all we need to prove is the following
$$\left(\left(W_{\nu}\right)_{p}-\left(W_{\nu}\right)_{p}^{t}\right)T_{W_{\nu}}\left(\left(W_{\nu}\right)_{p}-\left(W_{\nu}\right)_{p}^{t}\right)^{\mathsf{T}}\ge 0,$$
where \(T_{W_{\nu }}\triangleq C\left( \left( W_{\nu }\right) _{p}^{t}\right) -\sum _{i\in \nu }\lambda _{i}H_{i,\nu }H_{i,\nu }^{\mathsf {T}}\). Similar to Ref. [11], instead of showing this directly, we show the positive semidefiniteness of the matrix \(E\) with elements
$$E_{lk}=\left(\left(W_{\nu}\right)_{p}^{t}\right)_{l}\left(T_{W_{\nu}}\right)_{lk}\left(\left(W_{\nu}\right)_{p}^{t}\right)_{k}.$$
For positive semidefiniteness of the matrix \(E\), we have to show that \(z^{\mathsf {T}}Ez\ge 0\) for every nonzero \(z\). To avoid a lengthy derivation, we only show the main step here:
$$z^{\mathsf{T}}Ez=\frac{1}{2}\sum_{l,k}\left(\sum_{i\in\nu}\lambda_{i}H_{i,\nu}H_{i,\nu}^{\mathsf{T}}\right)_{lk}\left(\left(W_{\nu}\right)_{p}^{t}\right)_{l}\left(\left(W_{\nu}\right)_{p}^{t}\right)_{k}\left(z_{l}-z_{k}\right)^{2}\ge 0,$$
where nonnegativity follows because every factor in the sum is nonnegative.
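The semidefiniteness claim is also easy to sanity-check numerically. A minimal sketch with made-up sizes, where `lam`, `H`, and the positive row `w` stand in for \(\lambda_i\), \(H_{i,\nu}\), and \(\left(W_{\nu}\right)_{p}^{t}\):

```python
import numpy as np

rng = np.random.default_rng(0)
k, n = 4, 7
# nonnegative factors H_i and weights lambda_i, as in the lemma (illustrative sizes)
H = [rng.random((k, n)) for _ in range(2)]
lam = [0.7, 1.3]
S = sum(l * Hi @ Hi.T for l, Hi in zip(lam, H))  # sum_i lambda_i H_i H_i^T
w = rng.random(k) + 0.1                          # current row, kept strictly positive
C = np.diag((w @ S) / w)                         # diagonal majorizer from the lemma
T = C - S                                        # T_{W_nu}
E = np.outer(w, w) * T                           # E_lk = w_l (T)_lk w_k
min_eig = np.linalg.eigvalsh(E).min()
assert min_eig > -1e-10                          # E is positive semidefinite
```

Note that \(T_{W_{\nu}}\) annihilates the current row by construction, so \(E\) has a zero eigenvalue: the bound is semidefinite, not strictly definite.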
At the local minimum of \(G\left( \left( W_{\nu }\right) _{p},\left( W_{\nu }\right) _{p}^{t}\right) \) at iteration \(t\), by comparing \(\nabla _{\left( W_{\nu }\right) _{p}^{t}}G\left( \left( W_{\nu }\right) _{p},\left( W_{\nu }\right) _{p}^{t}\right) \) with the gradient-descent update of Eq. (8), we get the step size \(\eta _{\left( W_{\nu }\right) _{lk}^{t}}\) as in Eq. (9).
Copyright information
© 2014 Springer-Verlag Berlin Heidelberg
About this chapter
Cite this chapter
Gupta, S.K., Phung, D., Adams, B., Venkatesh, S. (2014). A Matrix Factorization Framework for Jointly Analyzing Multiple Nonnegative Data Sources. In: Yada, K. (eds) Data Mining for Service. Studies in Big Data, vol 3. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45252-9_10
DOI: https://doi.org/10.1007/978-3-642-45252-9_10
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-45251-2
Online ISBN: 978-3-642-45252-9
eBook Packages: Engineering (R0)