Abstract
We introduce Neural Choice by Elimination, a new framework that integrates deep neural networks into probabilistic sequential choice models for learning to rank. Given a set of items to choose from, the elimination strategy starts with the full item set and iteratively removes the least worthy item from the remaining subset. We prove that choice by elimination is equivalent to marginalizing out random Gompertz latent utilities. Coupled with the choice model are the recently introduced Highway Networks, which can approximate arbitrarily complex rank functions. We evaluate the proposed framework on a large-scale public dataset with over 425K items, drawn from the Yahoo! Learning to Rank Challenge, and show that the proposed method is competitive with state-of-the-art learning-to-rank methods.
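The elimination strategy in the abstract can be sketched as follows. This is a minimal illustration, not the paper's model: the function names `eliminate_rank` and `elimination_prob` are hypothetical, scores stand in for learned neural utilities, and the softmax over negated scores is only an illustrative surrogate for the Gompertz-based elimination probabilities derived in the paper.

```python
import math

def eliminate_rank(scores):
    """Rank items by repeated elimination: at each step, remove the item
    with the lowest score from the remaining subset. The elimination
    order, reversed, gives a ranking from best to worst."""
    remaining = dict(enumerate(scores))  # item index -> score
    eliminated = []
    while remaining:
        worst = min(remaining, key=remaining.get)
        eliminated.append(worst)
        del remaining[worst]
    return eliminated[::-1]  # best item first

def elimination_prob(scores, item):
    """Illustrative probability that `item` is eliminated first: a softmax
    over negated scores, so lower-scored items are more likely to go
    (an assumption here, not the paper's Gompertz marginal)."""
    weights = [math.exp(-s) for s in scores]
    return math.exp(-scores[item]) / sum(weights)
```

For example, `eliminate_rank([0.1, 2.0, 1.5])` first eliminates item 0, then item 2, then item 1, so the returned ranking is `[1, 2, 0]` from best to worst.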
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Tran, T., Phung, D., Venkatesh, S. (2016). Neural Choice by Elimination via Highway Networks. In: Cao, H., Li, J., Wang, R. (eds) Trends and Applications in Knowledge Discovery and Data Mining. PAKDD 2016. Lecture Notes in Computer Science(), vol 9794. Springer, Cham. https://doi.org/10.1007/978-3-319-42996-0_2
Print ISBN: 978-3-319-42995-3
Online ISBN: 978-3-319-42996-0