A Query Log Analysis of Dataset Search

  • Emilia Kacprzak
  • Laura M. Koesten
  • Luis-Daniel Ibáñez
  • Elena Simperl
  • Jeni Tennison
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10360)


Data is one of the most important digital assets in the world and its availability on the web is increasing. To use it effectively, we need tools that can retrieve the most relevant datasets to match our information needs. Web search engines are not well suited for this task, as they are designed primarily for documents, not data. In this paper, we present the first query log analysis for dataset search, based on logs of four national open data portals. Our aim is to gain a better understanding of the typical users of these portals and the types of queries they issue, and frame the findings in the broader context of dataset search. The logs suggest that queries issued on data portals differ from those issued to web search engines in their length and structure. From the analysis we could also infer that the portals are used exploratively, rather than to answer focused questions. These insights can inform the design of more effective dataset retrieval technology, and improve the user experience of data portals.


Query log analysis · Dataset search · User behaviour



This project is supported by the European Union Horizon 2020 program under the Marie Sklodowska-Curie grant agreement No. 642795.



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Emilia Kacprzak (1, 2)
  • Laura M. Koesten (1, 2)
  • Luis-Daniel Ibáñez (1)
  • Elena Simperl (1)
  • Jeni Tennison (2)

  1. University of Southampton, Southampton, UK
  2. The Open Data Institute, London, UK
