A Query Log Analysis of Dataset Search
Data is one of the most important digital assets in the world and its availability on the web is increasing. To use it effectively, we need tools that can retrieve the most relevant datasets to match our information needs. Web search engines are not well suited for this task, as they are designed primarily for documents, not data. In this paper, we present the first query log analysis for dataset search, based on logs of four national open data portals. Our aim is to gain a better understanding of the typical users of these portals and the types of queries they issue, and frame the findings in the broader context of dataset search. The logs suggest that queries issued on data portals differ from those issued to web search engines in their length and structure. From the analysis we could also infer that the portals are used exploratively, rather than to answer focused questions. These insights can inform the design of more effective dataset retrieval technology, and improve the user experience of data portals.
Keywords: Query log analysis, Dataset search, User behaviour
This project is supported by the European Union Horizon 2020 program under the Marie Sklodowska-Curie grant agreement No. 642795.