Discovering Unpredictably Related Words from Logs of Scholarly Repositories for Grouping Similar Queries
As the number of institutional repositories increases, more and more people, including non-researchers, access academic content on them via search engines. Unlike those of researchers, the user models of non-researchers are not yet well understood, although non-researchers may use queries quite different from those of researchers. A good way to understand their search behavior is to categorize the search queries of non-researchers into groups. As a first step, this chapter is devoted to finding related query words from the logs of scholarly repositories. In particular, we try to find words that are related from the viewpoint of non-researchers; in this sense, these words are unpredictably related. A simple way to do this with the access log is to treat queries that lead to the same paper as related. However, this is challenging because an academic paper generally receives only a small number of accesses, while the accesses to a single paper bring many kinds of query words. Instead, we expand the relationships between query words and papers and use a graph-based algorithm, in which query words and papers are vertices, to find related words. In experiments using more than 400,000 accesses recorded at a major portal site for Japanese theses, we show that we can find related words with respect to specific disciplines if these words appear frequently. These words seem to be of interest to non-researchers, and hence we cannot say that they are unrelated in the usual sense. This result suggests that we can obtain related words if we enrich the relationships between technical terms using background knowledge, such as dictionaries.
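The abstract does not spell out the graph algorithm, so what follows is only a minimal sketch, assuming that the access log can be reduced to (query word, paper) pairs and that the graph-based step is a random walk on the word-paper graph scored by truncated hitting time, as the keywords suggest. It does not reproduce the chapter's relationship-expansion step; the function names, the input format, the toy log, and the truncation length `steps` are all hypothetical.

```python
from collections import defaultdict


def build_bipartite_graph(access_log):
    """Build a weighted, undirected word-paper graph from an access log.

    `access_log` is a hypothetical iterable of (query_word, paper_id) pairs,
    one pair per access; an edge weight counts how often the pair occurs.
    """
    adjacency = defaultdict(lambda: defaultdict(float))
    for word, paper in access_log:
        word_node, paper_node = ("word", word), ("paper", paper)
        adjacency[word_node][paper_node] += 1.0
        adjacency[paper_node][word_node] += 1.0
    return adjacency


def transition_probabilities(adjacency):
    """Normalize each vertex's edge weights into random-walk probabilities."""
    probs = {}
    for node, neighbours in adjacency.items():
        total = sum(neighbours.values())
        probs[node] = {nbr: w / total for nbr, w in neighbours.items()}
    return probs


def truncated_hitting_time(probs, target, steps=20):
    """Expected number of steps (capped at `steps`) to first reach `target`.

    Recurrence: h_0(i) = 0; h_t(target) = 0;
    h_t(i) = 1 + sum_j p(i -> j) * h_{t-1}(j) for i != target.
    Vertices that cannot reach `target` end up with h = steps.
    """
    h = {node: 0.0 for node in probs}
    for _ in range(steps):
        h = {
            node: 0.0 if node == target
            else 1.0 + sum(p * h[nbr] for nbr, p in probs[node].items())
            for node in probs
        }
    return h


def related_words(access_log, query_word, steps=20, top_k=5):
    """Rank other query words by truncated hitting time to `query_word`."""
    probs = transition_probabilities(build_bipartite_graph(access_log))
    target = ("word", query_word)
    h = truncated_hitting_time(probs, target, steps)
    candidates = [
        (time, node[1])
        for node, time in h.items()
        # keep word vertices only, and drop words that cannot reach the
        # target within `steps` moves
        if node[0] == "word" and node != target and time < steps
    ]
    return [word for _, word in sorted(candidates)[:top_k]]


if __name__ == "__main__":
    # Toy log for illustration only; not data from the chapter.
    toy_log = [
        ("radiation", "paper_a"), ("contamination", "paper_a"),
        ("contamination", "paper_b"), ("environment", "paper_b"),
        ("poetry", "paper_c"),
    ]
    print(related_words(toy_log, "radiation"))  # ['contamination', 'environment']
```

A lower hitting time means that a walk starting from a word reaches the query word sooner. Words that share a paper with the query word (the simple co-access criterion) sit two steps away, while the walk can also reach words connected only through longer word-paper chains, which is one way to cope with papers that receive few accesses.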
Keywords: Institutional repositories, Access log mining, Random walk, Hitting time, Query expansion
This work was supported by JSPS KAKENHI Grant-in-Aid for Scientific Research (B), Number 23300087.