Data Mining and Knowledge Discovery, Volume 21, Issue 2, pp 327–343

Predicting labels for dyadic data

Open Access
Article

Abstract

In dyadic prediction, the input consists of a pair of items (a dyad), and the goal is to predict the value of an observation related to the dyad. Special cases of dyadic prediction include collaborative filtering, where the goal is to predict ratings associated with (user, movie) pairs, and link prediction, where the goal is to predict the presence or absence of an edge between two nodes in a graph. In this paper, we study the problem of predicting labels associated with dyad members. Special cases of this problem include predicting characteristics of users in a collaborative filtering scenario, and predicting the label of a node in a graph, which is a task sometimes called within-network classification or relational learning. This paper shows how to extend a recent dyadic prediction method to predict labels for nodes and labels for edges simultaneously. The new method learns latent features within a log-linear model in a supervised way, to maximize predictive accuracy for both dyad observations and item labels. We compare the new approach to existing methods for within-network classification, both experimentally and analytically. The experiments show, surprisingly, that learning latent features in an unsupervised way is superior for some applications to learning them in a supervised way.
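The idea of learning latent features that serve both dyad observations and item labels can be illustrated with a minimal sketch. This is not the paper's model: it assumes binary edges and binary node labels, scores an edge by the dot product of the two nodes' latent vectors, scores a label by a logistic function of the node's latent vector, and fits everything by plain gradient descent on a joint log-loss. The names `fit_joint` and `joint_loss`, the toy two-community graph, and all hyperparameter values are hypothetical choices made only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def joint_loss(U, w, b, A, edge_mask, y, label_mask, lam):
    """Negative log-likelihood of observed edges and labels, plus an L2 penalty."""
    eps = 1e-12  # guards against log(0)
    P = sigmoid(U @ U.T)    # predicted edge probabilities
    q = sigmoid(U @ w + b)  # predicted label probabilities
    edge_ll = edge_mask * (A * np.log(P + eps) + (1 - A) * np.log(1 - P + eps))
    label_ll = label_mask * (y * np.log(q + eps) + (1 - y) * np.log(1 - q + eps))
    penalty = lam * ((U ** 2).sum() + (w ** 2).sum())
    return -(edge_ll.sum() + label_ll.sum()) + penalty

def fit_joint(A, edge_mask, y, label_mask, k=2, lam=0.01, lr=0.05, steps=500, seed=0):
    """Learn latent node features U jointly from edges and from the known labels."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    U = 0.1 * rng.standard_normal((n, k))  # latent features, one row per node
    w = np.zeros(k)                        # label weights on the latent features
    b = 0.0
    losses = []
    for _ in range(steps):
        losses.append(joint_loss(U, w, b, A, edge_mask, y, label_mask, lam))
        G = edge_mask * (sigmoid(U @ U.T) - A)     # edge residuals on observed dyads
        r = label_mask * (sigmoid(U @ w + b) - y)  # label residuals on labeled nodes
        grad_U = (G + G.T) @ U + np.outer(r, w) + 2 * lam * U
        grad_w = U.T @ r + 2 * lam * w
        grad_b = r.sum()
        U = U - lr * grad_U
        w = w - lr * grad_w
        b = b - lr * grad_b
    losses.append(joint_loss(U, w, b, A, edge_mask, y, label_mask, lam))
    return U, w, b, losses

# Toy graph (an assumption for illustration): two communities of four nodes,
# fully connected inside each community and with no edges between them.
n = 8
community = np.array([1, 1, 1, 1, 0, 0, 0, 0])
A = (community[:, None] == community[None, :]).astype(float)
np.fill_diagonal(A, 0.0)
edge_mask = 1.0 - np.eye(n)  # every off-diagonal dyad is observed

# Node labels equal community membership, but only nodes 0 and 4 are labeled.
y = community.astype(float)
label_mask = np.zeros(n)
label_mask[[0, 4]] = 1.0

U, w, b, losses = fit_joint(A, edge_mask, y, label_mask)
label_prob = sigmoid(U @ w + b)  # includes predictions for the unlabeled nodes
print(np.round(label_prob, 2))
```

In this sketch the edge term shapes the latent features for every node, so the two known labels are enough to classify the remaining nodes. Setting `label_mask` to all zeros while fitting `U`, and training the label classifier separately afterwards, would correspond to learning the features in an unsupervised way, which is the alternative the abstract compares against.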

Keywords

Dyadic prediction; Collaborative filtering; Link prediction; Social networks; Within-network classification; Relational learning

Copyright information

© The Author(s) 2010

Authors and Affiliations

  1. Department of Computer Science and Engineering, University of California, San Diego, USA
