Predicting labels for dyadic data
In dyadic prediction, the input consists of a pair of items (a dyad), and the goal is to predict the value of an observation related to the dyad. Special cases of dyadic prediction include collaborative filtering, where the goal is to predict ratings associated with (user, movie) pairs, and link prediction, where the goal is to predict the presence or absence of an edge between two nodes in a graph. In this paper, we study the problem of predicting labels associated with dyad members. Special cases of this problem include predicting characteristics of users in a collaborative filtering scenario and predicting the label of a node in a graph, a task sometimes called within-network classification or relational learning. This paper shows how to extend a recent dyadic prediction method to predict labels for nodes and labels for edges simultaneously. The new method learns latent features within a log-linear model in a supervised way, to maximize predictive accuracy for both dyad observations and item labels. We compare the new approach to existing methods for within-network classification, both experimentally and analytically. Surprisingly, the experiments show that for some applications, learning latent features in an unsupervised way is superior to learning them in a supervised way.
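To make the idea of latent features shared between dyad prediction and label prediction concrete, here is a heavily simplified sketch. It is not the paper's model. It assumes binary dyad observations, binary labels on row items only, logistic (two-class log-linear) likelihoods, and full-batch gradient descent; the names `JointLatentModel`, `mu` (weight of the label loss), and `reg` (L2 penalty) are hypothetical. Leaving some rows unlabeled mimics within-network classification.

```python
import math
import random


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


class JointLatentModel:
    """Hypothetical sketch: latent vectors shared by a dyad model and a label model.

    P(y_ij = 1) = sigmoid(alpha_i . beta_j)   (binary dyad observation)
    P(z_i  = 1) = sigmoid(w . alpha_i + b)    (binary label on row item i)
    """

    def __init__(self, n_rows, n_cols, k=4, seed=0):
        rng = random.Random(seed)
        self.alpha = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_rows)]
        self.beta = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(n_cols)]
        self.w = [0.0] * k  # label weights on the row latent features
        self.b = 0.0        # label bias (not regularized)

    def p_dyad(self, i, j):
        return sigmoid(dot(self.alpha[i], self.beta[j]))

    def p_label(self, i):
        return sigmoid(dot(self.w, self.alpha[i]) + self.b)

    def loss(self, dyads, labels, mu=1.0, reg=0.01):
        """Dyad log-loss + mu * label log-loss + 0.5 * reg * squared L2 norm."""
        eps = 1e-12
        total = 0.0
        for i, j, y in dyads:
            p = self.p_dyad(i, j)
            total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
        for i, z in labels.items():
            p = self.p_label(i)
            total -= mu * (z * math.log(p + eps) + (1 - z) * math.log(1 - p + eps))
        params = [x for vec in self.alpha + self.beta for x in vec] + self.w
        return total + 0.5 * reg * sum(x * x for x in params)

    def step(self, dyads, labels, lr=0.05, mu=1.0, reg=0.01):
        """One full-batch gradient step on the joint objective."""
        k = len(self.w)
        ga = [[reg * x for x in vec] for vec in self.alpha]
        gb = [[reg * x for x in vec] for vec in self.beta]
        gw = [reg * x for x in self.w]
        g_bias = 0.0
        for i, j, y in dyads:
            r = self.p_dyad(i, j) - y  # derivative of the log-loss w.r.t. the score
            for d in range(k):
                ga[i][d] += r * self.beta[j][d]
                gb[j][d] += r * self.alpha[i][d]
        for i, z in labels.items():
            r = mu * (self.p_label(i) - z)
            for d in range(k):
                ga[i][d] += r * self.w[d]  # label supervision reaches the latent features
                gw[d] += r * self.alpha[i][d]
            g_bias += r
        # Apply all updates only after every gradient has been computed.
        for vec, grad in zip(self.alpha + self.beta, ga + gb):
            for d in range(k):
                vec[d] -= lr * grad[d]
        for d in range(k):
            self.w[d] -= lr * gw[d]
        self.b -= lr * g_bias


# Made-up toy data: rows and columns 0-3 form one group, 4-7 another.
group = [0, 0, 0, 0, 1, 1, 1, 1]
dyads = [(i, j, int(group[i] == group[j])) for i in range(8) for j in range(8)]
labels = {0: 1, 1: 1, 4: 0, 5: 0}  # rows 2, 3, 6, 7 are left unlabeled

model = JointLatentModel(8, 8, k=4, seed=0)
for _ in range(500):
    model.step(dyads, labels)
print([round(model.p_label(i), 2) for i in (2, 3, 6, 7)])
```

Because the label loss is differentiated with respect to `alpha_i`, the latent features in this sketch are learned in a supervised way. Setting `mu=0` and fitting a separate classifier on the learned `alpha_i` afterwards would instead give an unsupervised two-stage variant, the kind of alternative that the abstract reports can work better for some applications.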
Keywords: Dyadic prediction, Collaborative filtering, Link prediction, Social networks, Within-network classification, Relational learning
The authors thank Lei Tang for gracious help with running the code for SocDim and for answering several questions about it. The authors also thank David Blei for providing the senator dataset.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.