Learning a Semantic Space of Web Search via Session Data

  • Lidong Bing
  • Zheng-Yu Niu
  • Wai Lam
  • Haifeng Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9994)


In Web search, a user starts with an information need and issues an initial query. The user then clicks some of the retrieved URLs and, if not satisfied, issues further queries. We argue that Web search is governed by a hidden semantic space in which each element involved, such as a query or a URL, has a projection in the form of a vector. Each action in the search procedure, i.e., issuing a query or clicking a URL, results from the interaction of these elements in the space. In this paper, we aim to uncover such a semantic space of Web search, one that uniformly captures the hidden semantics of search queries, URLs, and other elements. We propose the session2vec and session2vec+ models to learn vectors in this space from search session data, where a search session is regarded as an instantiation of an information need and retains the interaction information between queries and URLs. The vectors are learned on a large query log from a search engine, and their efficacy is examined on several tasks.
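To make the idea of a shared vector space for queries and URLs concrete, the following is a minimal sketch, not the session2vec or session2vec+ model itself, whose objective the abstract does not specify. It assumes a generic word2vec-style skip-gram with negative sampling, treats each session as an ordered list of elements, and uses hypothetical toy sessions with `q:` and `u:` prefixes to mark queries and clicked URLs. All function names, hyperparameters, and data are illustrative assumptions.

```python
import math
import random


def session_pairs(session, window=2):
    """Yield (target, context) pairs for elements that co-occur within
    `window` positions of each other in one session."""
    for i, target in enumerate(session):
        lo, hi = max(0, i - window), min(len(session), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield target, session[j]


def train(sessions, dim=8, epochs=50, lr=0.05, negatives=2, window=2, seed=0):
    """Assumed objective: skip-gram with negative sampling over session
    elements. Returns one vector per query or URL."""
    rng = random.Random(seed)
    vocab = sorted({e for s in sessions for e in s})
    vec = {e: [rng.uniform(-0.5, 0.5) / dim for _ in range(dim)] for e in vocab}
    ctx = {e: [0.0] * dim for e in vocab}  # separate context vectors
    for _ in range(epochs):
        for s in sessions:
            for target, context in session_pairs(s, window):
                samples = [(context, 1.0)]  # the observed pair is the positive
                for _ in range(negatives):
                    noise = rng.choice(vocab)
                    if noise != target and noise != context:
                        samples.append((noise, 0.0))
                for other, label in samples:
                    v, c = vec[target], ctx[other]
                    score = sum(a * b for a, b in zip(v, c))
                    pred = 1.0 / (1.0 + math.exp(-score))
                    g = lr * (label - pred)  # gradient step on the logistic loss
                    for k in range(dim):
                        v_k = v[k]
                        v[k] += g * c[k]
                        c[k] += g * v_k
    return vec


def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Hypothetical toy sessions: each is one information need, with queries
# and clicked URLs in the order they occurred.
toy_sessions = [
    ["q:cheap flights", "u:flights.example", "q:flight deals", "u:deals.example"],
    ["q:python tutorial", "u:docs.example", "q:learn python"],
]
vectors = train(toy_sessions)
print(cosine(vectors["q:cheap flights"], vectors["u:flights.example"]))
```

Because queries and URLs share one vocabulary here, any two elements can be compared directly with `cosine`, which is the property the abstract describes as a uniform semantic space. On data this small the resulting similarities are not meaningful; the sketch only shows the data flow from sessions to vectors.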



Copyright information

© Springer International Publishing AG 2016

Authors and Affiliations

  • Lidong Bing (1), corresponding author
  • Zheng-Yu Niu (2)
  • Wai Lam (3)
  • Haifeng Wang (2)

  1. Tencent Inc., Shenzhen, China
  2. Baidu Inc., Beijing, China
  3. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong, China
