Implementing Recommendations in the PATHS System
In this paper we describe the design and implementation of non-personalized recommendations in the PATHS system. This system allows users to explore items from Europeana in new ways. Recommendations of the type “people who viewed this item also viewed this item” are powered by pairs of viewed items mined from Europeana. However, due to limited usage data only 10.3 % of items in the PATHS dataset have recommendations (4.3 % of item pairs visited more than once). Therefore, “related items”, a form of content-based recommendation, are offered to users based on identifying similar items. We discuss some of the problems with implementing recommendations and highlight areas for future work in the PATHS project.
Keywords: Digital libraries, Recommendations, Europeana
Increasingly, recommender systems are being used to assist users with information discovery by bringing relevant content to their attention. They are part of a wider set of techniques for providing personalization: the tailoring of systems or services to the specific needs of individual users or communities [1, 2]. Recommendation mechanisms provide advice on objects depending on the user context or profile. They can be broadly classified by the strategy they employ (content-based or collaborative filtering) and by the recipient of the recommendations (individual user or group recommendations). Recommender functionality (and personalization more generally) has proven useful when providing information access to cultural heritage.
The EU-funded PATHS (Personalized Access to Cultural Heritage) project [4, 5] is investigating ways of assisting users with exploring a large collection of cultural heritage material taken from Europeana, the European aggregator for museums, archives, libraries, and galleries. A prototype system has been developed that includes novel functionality for exploring the collection based on Google Maps-style interfaces and data-driven taxonomies, and that supports the manual creation of guided tours or paths. Another aspect being explored is the use of recommendations to promote information discovery. To date we have been exploring non-personalized recommendations based on item-to-item co-occurrences. These provide recommendations of the kind “people who viewed this item also viewed this item.” Co-occurrence information (items that have been viewed consecutively in the same session) has been mined from a sample of Europeana logs to power the recommendations. Additionally, we provide links to “related items”, a form of content-based recommendation, based on identifying ‘similar’ items and classifying the type of relation. In this paper we describe our recommendation work to date, difficulties in implementing recommendations and our plans for future work.
2 The PATHS System
2.1 Implementing “people who viewed this also viewed this”
We implemented a mechanism to automatically download transaction logs for the main Europeana portal on a daily basis. Currently we use a 6-month sample of logs (1 Jan to 30 June 2012), but have collected almost 2 years of data. We applied standard pre-processing: removal of lines not relating to user actions (e.g. requests for cascading style sheets and images), removal of non-human actions (e.g. robots), session segmentation (based on a 30 min timeout between actions) and classification of requests (e.g. viewing an item). The 30 min period of inactivity was selected based on previous research [6, 7], but we recognize that a fixed timeout has limitations for reliably detecting sessions and warrants further investigation [8].
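The session segmentation step described above can be illustrated with a minimal sketch. It assumes each log request has already been reduced to a `(user_id, timestamp, action)` tuple; the tuple layout, the `user_id` field and the function name are hypothetical and are not taken from the PATHS code base. A gap of more than 30 minutes between two consecutive actions of the same user is treated as a session boundary.

```python
from datetime import datetime, timedelta

# 30 min inactivity timeout, as stated in the text.
SESSION_TIMEOUT = timedelta(minutes=30)


def segment_sessions(requests, timeout=SESSION_TIMEOUT):
    """Group requests into sessions per user.

    `requests` is an iterable of (user_id, timestamp, action) tuples
    (a hypothetical layout). Returns a dict mapping each user_id to a
    list of sessions, where a session is a list of (timestamp, action).
    """
    sessions_by_user = {}
    # Sort by user, then by time, so each user's actions are in order.
    for user_id, ts, action in sorted(requests, key=lambda r: (r[0], r[1])):
        sessions = sessions_by_user.setdefault(user_id, [])
        last_ts = sessions[-1][-1][0] if sessions else None
        if last_ts is not None and ts - last_ts <= timeout:
            sessions[-1].append((ts, action))   # continue current session
        else:
            sessions.append([(ts, action)])     # start a new session
    return sessions_by_user


t0 = datetime(2012, 1, 1, 9, 0)
example_log = [
    ("u1", t0, "item1"),
    ("u1", t0 + timedelta(minutes=10), "item2"),
    ("u1", t0 + timedelta(minutes=55), "item3"),  # 45 min gap: new session
    ("u2", t0 + timedelta(minutes=5), "item4"),
]
example_sessions = segment_sessions(example_log)
```

In this example user `u1` ends up with two sessions, because the third request arrives 45 minutes after the second. Whether a gap of exactly 30 minutes closes a session is a choice made here for illustration only.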
In total, the processed data consists of 14,164,379 requests (3,245,766 sessions), with 53.7 % of requests for item views. We filter out those sessions without any request for items that map to the PATHS dataset. This results in 102,525 sessions (3.2 % of the initial log) with 208,584 item requests. For each session we extract sequences of 2 consecutively viewed items (ignoring all other request types). For example, for the action sequence item1→item2→search1→item3 we would extract the sequences item1→item2 and item2→item3. We ignored pairs containing repeated items (i.e. item1=item2). This resulted in 55,521 different pairs of items and an average of 1.82 recommendations per item.
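The pair extraction described above can be sketched as follows. The representation of a session as a list of `(request_type, identifier)` tuples and all function names are assumptions made for illustration. Non-item requests are dropped before pairing, so item2→search1→item3 still yields item2→item3, and pairs of identical items are skipped.

```python
from collections import Counter


def extract_item_pairs(session):
    """Return consecutive (viewed, next_viewed) item pairs for one session.

    `session` is an ordered list of (request_type, identifier) tuples
    (a hypothetical layout). Requests that are not item views are ignored.
    """
    items = [ident for kind, ident in session if kind == "item"]
    # Pair each item with its successor, skipping repeated items.
    return [(a, b) for a, b in zip(items, items[1:]) if a != b]


def count_pairs(sessions):
    """Count how often each ordered item pair occurs across sessions."""
    counts = Counter()
    for session in sessions:
        counts.update(extract_item_pairs(session))
    return counts


def recommend(pair_counts, item, min_count=1):
    """Items viewed directly after `item`, most frequent first."""
    candidates = [
        (n, b) for (a, b), n in pair_counts.items()
        if a == item and n >= min_count
    ]
    return [b for n, b in sorted(candidates, key=lambda c: (-c[0], c[1]))]


example_session = [
    ("item", "item1"),
    ("item", "item2"),
    ("search", "search1"),
    ("item", "item3"),
]
example_pairs = extract_item_pairs(example_session)
pair_counts = count_pairs([example_session, [("item", "item2"), ("item", "item3")]])
```

Here `example_pairs` contains item1→item2 and item2→item3, matching the example in the text. The `min_count` parameter is a hypothetical knob for requiring that a pair be observed several times before it is shown as a recommendation.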
2.2 Implementing “Related Items”
For the “related items” functionality, the similarity between each pair of items is computed using a state-of-the-art approach based on Latent Dirichlet Allocation over the text, allowing users to quickly find related items when browsing. An evaluation dataset was crowd-sourced to enable us to assess this approach. In addition, a typed similarity approach is implemented to determine the ‘type’ of the relation, such as similar author, location, date, event, people involved or subject. With this extra functionality, users know why the system is making the suggestion, an aspect considered important for recommender systems [10]. The approach is a combination of simple similarity heuristics, based on the appropriate metadata fields, and a linear regression [11]. The latter method improved the results considerably, obtaining second position among several contenders at an open evaluation exercise.
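As a rough illustration of similarity over topic models, the sketch below ranks items by the cosine similarity of their topic distributions. This is an assumption for illustration only: the text does not state which similarity measure is applied to the Latent Dirichlet Allocation output, and the topic vectors, item identifiers and function names here are all hypothetical.

```python
import math


def cosine_similarity(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(x * y for x, y in zip(p, q))
    norm = math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q))
    return dot / norm if norm else 0.0


def most_similar(topics, item_id, k=5):
    """Return up to k other items ranked by similarity to `item_id`.

    `topics` maps item identifiers to topic distributions, e.g. as
    inferred by an LDA model trained on the item text (assumed input).
    """
    scores = [
        (cosine_similarity(topics[item_id], vec), other)
        for other, vec in topics.items()
        if other != item_id
    ]
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [other for _, other in scores[:k]]


# Made-up topic distributions over three topics.
example_topics = {
    "a": [0.9, 0.1, 0.0],
    "b": [0.8, 0.2, 0.0],
    "c": [0.0, 0.1, 0.9],
}
related_to_a = most_similar(example_topics, "a", k=2)
```

In this toy example item `b` ranks above item `c` for item `a`, because `a` and `b` concentrate on the same topic.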
As with most cultural heritage systems, the amount of interaction data generated by users of the PATHS system is insufficient for implementing “people who viewed this also viewed this” functionality, due to data sparseness. Therefore, we exploit usage information from a more widely used system (Europeana), but restrict the data to only those items we index. However, even with data from a more widely used system we can only make recommendations for 10.3 % of items in the PATHS dataset.
A further issue is that only 2,407 pairs of items (4.3 %) are viewed more than once, which may be the threshold at which recommendations are acceptable. Therefore, we are also working on extracting more pairs of items based on transitivity (e.g. for the sequence item1→item2→item3 we could also assume a relation exists between item1→item3) and duality (e.g. for the pair item1→item2 we could also extract item2→item1). Another approach to deal with data sparseness could be to map each item to a semantic category and then make recommendations at a higher level than the item, i.e. suggest pairs of items for the same subject that are viewed consecutively.
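The transitivity and duality ideas above can be sketched in a few lines. This is a hypothetical illustration of work the text describes as in progress, not the project's implementation; the `max_gap` and `symmetric` parameters are assumptions introduced here. With `max_gap=2`, an item is paired with the next two items in the session, which covers the item1→item3 example; `symmetric=True` adds the reversed pair.

```python
def expand_pairs(items, max_gap=2, symmetric=True):
    """Extract item pairs from an ordered list of viewed items.

    max_gap=1 gives consecutive pairs only; max_gap=2 also links items
    separated by one intermediate view (the transitivity example).
    symmetric=True also adds the reversed pair (duality).
    """
    pairs = set()
    for i, a in enumerate(items):
        # Look ahead at most `max_gap` positions within the session.
        for b in items[i + 1 : i + 1 + max_gap]:
            if a == b:
                continue  # pairs of repeated items are ignored
            pairs.add((a, b))
            if symmetric:
                pairs.add((b, a))
    return pairs


viewed = ["item1", "item2", "item3"]
directed = expand_pairs(viewed, max_gap=2, symmetric=False)
both_ways = expand_pairs(viewed, max_gap=2, symmetric=True)
```

For the three-item sequence, `directed` holds item1→item2, item2→item3 and the inferred item1→item3, while `both_ways` additionally holds the three reversed pairs. Inferred pairs are weaker evidence than observed ones, so a real system might weight them lower; that weighting is not shown here.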
One approach we adopt in the current prototype is the use of additional content-based recommendations. These “related items” help to alleviate the problem of insufficient usage data. Combinations of approaches are commonly used to overcome the limitations of using collaborative filtering and content-based approaches independently. Planned further work includes evaluating recommendations in a controlled lab-based setting and in field trials. We are also developing personalized recommendations based on a session-based user model (i.e. the user profile is built up during a session from items viewed) and using PageRank to identify items of interest.
This paper discusses the implementation of non-personalized recommendations at the item-level in the PATHS system, which assists users with exploring Europeana. Recommendations of the form “people who viewed this item also viewed this item” are powered by mining co-occurrences of items viewed in Europeana. To complement these recommendations, and alleviate some of the issues with data sparseness, we also implemented “related items” functionality. We discuss some of the issues with implementing non-personalized recommendations, in addition to avenues for further work on personalized recommendations in the PATHS system.
The research leading to these results was carried out as part of the PATHS project (http://paths-project.eu) funded by the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement no. 270082.
- 4. Agirre, E., et al.: PATHS: a system for accessing cultural heritage collections. In: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (ACL’13), Sofia, Bulgaria, 4–9 August 2013, pp. 151–156 (2013)
- 5. Fernie, K., et al.: PATHS: personalising access to cultural heritage spaces. In: Proceedings of 18th International Conference on Virtual Systems and Multimedia (VSMM 2012), pp. 469–474 (2012)
- 7. Catledge, L., Pitkow, J.: Characterizing browsing strategies in the world-wide web. In: Proceedings of the Third International World-Wide Web Conference on Technology, Tools and Applications, vol. 27 (1995)
- 8. Jones, R., Klinkner, K.: Beyond the session timeout: automatic hierarchical segmentation of search topics in query logs. In: Proceedings of the 17th ACM Conference on Information and Knowledge Management (CIKM’08), pp. 699–708. ACM, New York (2008)
- 10. Sinha, R., Swearingen, K.: The role of transparency in recommender systems. In: Proceedings of the Conference of Human Factors in Computing Systems, 20–25 April 2002, Minneapolis, MN, pp. 830–831. ACM, New York (2002)
- 11. Agirre, E., et al.: UBC UOS-TYPED: regression for typed-similarity. In: Proceedings of the Second Joint Conference on Lexical and Computational Semantics (*SEM 2013), vol. 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity, Atlanta, Georgia, 13–14 June 2013, pp. 132–137 (2013)