1 Introduction

Recommender systems are intelligent systems that exploit users' item preferences, expressed as explicit or implicit feedback, to identify novel items that are likely to be interesting and relevant to the user [7]. By incorporating such systems into augmented reality (AR), recommendations can be visualized and superimposed on the user's view of the real world, creating an appealing user experience.

In this work, we developed a mobile application for the Niederdorf old town in Zurich that shows personalized recommendations and useful information about points of interest (POIs) in AR. Our main motivation was to exploit contextual information and user-item preferences to generate personalized POI suggestions for each user in AR. Such a system can narrow down the overwhelming number of restaurants and shops available to explore and spare users from information that is irrelevant to them. We also created a navigation system that helps users find the recommended places. In the next section, we describe the AR experience and present the different components of our prototype.

2 Description of Our Prototype

Our prototype runs on Android and iOS devices that support augmented reality. When new users open the application for the first time, they complete a short registration process: after entering their name, age and gender, they fill out the Five-Item Personality Inventory (FIPI) questionnaire [6], which measures their personality traits. Our recommender system uses all of this information to provide a personalized experience for each user.
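Before the registration data can serve as model input, it has to be turned into features. The following is a minimal sketch of one possible encoding, not the prototype's actual code: the age-group boundaries and the 1 to 7 answer scale assumed for the FIPI items are hypothetical, as are the names `Registration`, `age_group` and `encode_profile`.

```python
from dataclasses import dataclass

# The five traits of the Five-Factor Model, one FIPI item each.
FFM_TRAITS = ("openness", "conscientiousness", "extraversion",
              "agreeableness", "neuroticism")

# Hypothetical age groups: (inclusive upper bound, label).
AGE_GROUPS = ((24, "age:<25"), (34, "age:25-34"), (49, "age:35-49"))


@dataclass
class Registration:
    name: str      # collected at sign-up, but not used as a model feature here
    age: int
    gender: str
    traits: dict   # trait name -> raw questionnaire score


def age_group(age: int) -> str:
    """Map an age to its (hypothetical) age-group label."""
    for upper, label in AGE_GROUPS:
        if age <= upper:
            return label
    return "age:50+"


def encode_profile(reg: Registration, low: float = 1.0, high: float = 7.0) -> dict:
    """Return {attribute: weight}: one-hot age group and gender, plus trait scores."""
    features = {age_group(reg.age): 1.0, f"gender:{reg.gender.lower()}": 1.0}
    for trait in FFM_TRAITS:
        # Rescale the raw score from the assumed [low, high] scale to [0, 1].
        features[f"ffm:{trait}"] = (reg.traits[trait] - low) / (high - low)
    return features
```

Keeping categorical attributes as one-hot entries and traits as real-valued weights yields a sparse attribute dictionary that an attribute-aware factorization model can consume directly.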

Once the user profile is created, the user can visit the area of the Niederdorf old town in Zurich that our prototype supports and point their mobile device at the surrounding buildings. If the application manages to localize the device within the environment containing the POIs (see Sect. 2.2), rotating icons appear in front of restaurants and shops, as shown in Fig. 1a. Users interested in a particular POI can click on the corresponding icon to open an additional window. This closeup view shows details about the POI such as its name, description, rating and some user reviews (see Fig. 1b). For better usability, users can also perform a pinch gesture to switch the closeup view from world space (where the window is attached to the real world) to screen space (where the window has a fixed position on the screen). This helps when the text on the window is too small to read because the user is standing far away from it.

The application also provides a map that shows the user's current position and all POI locations. As with the rotating icons that float in front of the buildings, users can click on the map icons to open a closeup view with more information about the corresponding POI. The map also shows which alleyways of the old town are supported by our application.

Fig. 1. Screenshots of the AR experience (Color figure online)

The core feature of our AR experience is a recommender system (see Sect. 2.1) that generates personalized POI recommendations for each user. These recommendations must be presented in AR in a way that is intuitive and easy to understand. To this end, we designed a virtual 3D signpost that users can place at any time by pointing the camera at the floor and tapping on the screen. Each POI suggestion is visualized as a signboard on the signpost that displays the name, distance and type (restaurant or shop) of the POI and points towards its location (see Fig. 1c). Clicking on a signboard opens the closeup view described above and reveals a navigation line showing the entire path to the suggested POI. If users like one of the recommendations, they can simply follow the corresponding navigation line to reach the destination (see Fig. 1d, yellow line). Users can also give feedback to the recommender system by clicking a like button in the closeup view when they are satisfied with a suggestion. Our system takes these explicit user-item preferences into account for future recommendations.
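A signboard needs two numbers per recommendation: how far away the POI is and which way to point. The sketch below computes both from latitude/longitude pairs with the haversine formula and the initial great-circle bearing. This is an illustrative assumption: the prototype positions its content in the Unity digital twin, so it may derive the direction from local scene coordinates and report the walking distance along the navigation path rather than the straight-line distance computed here. `signboard_text` is a hypothetical helper.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters


def distance_and_bearing(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters and initial bearing in degrees (clockwise
    from north) from point 1 (the user) to point 2 (the recommended POI)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    # Haversine formula for the central angle between the two points.
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    distance = 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))
    # Forward azimuth, normalized to [0, 360).
    y = math.sin(dlam) * math.cos(phi2)
    x = math.cos(phi1) * math.sin(phi2) - math.sin(phi1) * math.cos(phi2) * math.cos(dlam)
    bearing = (math.degrees(math.atan2(y, x)) + 360.0) % 360.0
    return distance, bearing


def signboard_text(name, poi_type, distance_m):
    """Text for one signboard: POI name, type and rounded distance."""
    return f"{name} ({poi_type}), {round(distance_m)} m"
```

If the scene's forward axis were aligned with geographic north, the bearing could be used directly as the signboard's yaw angle.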

2.1 Recommender System

To elicit user preferences, we used an active learning strategy proposed by [4] that exploits the user's personality traits as defined by the Five-Factor Model (FFM). This approach can identify items to present to the user even when the user has not rated any items, which is always the case in our AR experience when a new user creates a profile. Previous research [5] has shown that exploiting such additional information about the user is effective for identifying potentially useful items. Moreover, personality-based active learning approaches have also been applied in other tourism applications [2, 7].
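To make personality-based cold-start item selection concrete, the following sketch picks POIs to present to a new user by finding existing users with similar FFM profiles and aggregating their ratings. It is a simplified stand-in for illustration, not the active learning strategy of [4]; the similarity measure (inverse Euclidean distance), the neighborhood size and all names are assumptions.

```python
import math


def similarity(a, b):
    """Inverse Euclidean distance between two FFM score vectors (1.0 = identical)."""
    return 1.0 / (1.0 + math.dist(a, b))


def items_to_present(new_traits, user_traits, ratings, k=3, neighbors=10):
    """Choose k POIs to show a new user who has not rated anything yet.

    new_traits:  FFM score vector of the new user
    user_traits: {user_id: FFM score vector} of existing users
    ratings:     {user_id: {poi_id: rating}} given by existing users
    """
    # Existing users whose personality is closest to the new user's.
    nearest = sorted(((similarity(new_traits, t), u) for u, t in user_traits.items()),
                     reverse=True)[:neighbors]
    # Similarity-weighted sum of the neighbors' ratings for every POI.
    scores = {}
    for sim, u in nearest:
        for poi, r in ratings.get(u, {}).items():
            scores[poi] = scores.get(poi, 0.0) + sim * r
    # Highest score first; ties broken by POI id for determinism.
    ranked = sorted(scores, key=lambda poi: (-scores[poi], poi))
    return ranked[:k]
```

The new user's feedback on the presented POIs (for example, likes) then becomes the first explicit training data for the rating model.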

For the recommendation logic, let \(u \in U\) denote a user and \(i \in I\) a POI. Let \(r_{ui}\) be the rating that user u gave to POI i, and \(r^*_{ui}\) the rating predicted by a model for a POI i whose true rating \(r_{ui}\) is unknown. We implemented an extended version of the matrix factorization (MF) model [9], one of the most widely used techniques for building collaborative filtering models. In the MF model, each POI i and each user u are associated with f-dimensional real vectors \(q_i\) and \(p_u\), respectively, and in its basic form a rating is predicted by their inner product, \(r^*_{ui} = q_i^\top p_u\). Let P denote the user-latent factor matrix and Q the item-latent factor matrix. The model parameters, i.e., the vector representations of the users and items (and other parameters such as item or user biases, if present), are learned by minimizing the error of the model predictions on a training set of ratings [8]. We further enhanced the user representation by introducing model parameters that represent the known user attributes: age group, gender and the scores for the FFM personality traits. As mentioned above, these attributes are collected during user registration. The model parameters are learned by minimizing the associated regularized squared error function through stochastic gradient descent. More details about the objective function and the model update rules can be found in [1, 3].
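The exact objective and update rules are given in [1, 3]. As a rough illustration only, the sketch below assumes a biased MF model in which each user attribute a (age group, gender or an FFM trait) has its own latent vector \(y_a\), added to the user vector with a weight \(w_{ua}\): \(r^*_{ui} = \mu + b_u + b_i + q_i^\top (p_u + \sum_a w_{ua} y_a)\). Parameters are learned by stochastic gradient descent on the squared error plus an L2 penalty with weight \(\lambda\) (`reg`). The class name `AttributeMF` and all hyperparameters are hypothetical.

```python
import numpy as np


class AttributeMF:
    """Biased MF whose user vector is p_u plus a weighted sum of attribute vectors y_a."""

    def __init__(self, n_users, n_items, n_attrs, f=8, lr=0.01, reg=0.02, seed=0):
        rng = np.random.default_rng(seed)
        self.P = rng.normal(0.0, 0.1, (n_users, f))  # user-latent factor matrix
        self.Q = rng.normal(0.0, 0.1, (n_items, f))  # item-latent factor matrix
        self.Y = rng.normal(0.0, 0.1, (n_attrs, f))  # one latent vector per user attribute
        self.bu = np.zeros(n_users)  # user biases
        self.bi = np.zeros(n_items)  # item biases
        self.mu = 0.0                # global mean rating, set in fit()
        self.lr, self.reg = lr, reg
        self.attrs = {}  # user index -> list of (attribute index, weight w_ua)

    def _user_vec(self, u):
        # p_u + sum_a w_ua * y_a
        vec = self.P[u].copy()
        for a, w in self.attrs.get(u, []):
            vec += w * self.Y[a]
        return vec

    def predict(self, u, i):
        return self.mu + self.bu[u] + self.bi[i] + self.Q[i] @ self._user_vec(u)

    def fit(self, ratings, epochs=50):
        """SGD on the L2-regularized squared error; ratings = [(u, i, r), ...]."""
        self.mu = float(np.mean([r for _, _, r in ratings]))
        for _ in range(epochs):
            for u, i, r in ratings:
                pu = self._user_vec(u)
                qi = self.Q[i].copy()
                err = r - (self.mu + self.bu[u] + self.bi[i] + qi @ pu)
                self.bu[u] += self.lr * (err - self.reg * self.bu[u])
                self.bi[i] += self.lr * (err - self.reg * self.bi[i])
                self.Q[i] += self.lr * (err * pu - self.reg * qi)
                self.P[u] += self.lr * (err * qi - self.reg * self.P[u])
                for a, w in self.attrs.get(u, []):
                    self.Y[a] += self.lr * (err * w * qi - self.reg * self.Y[a])
        return self
```

Because the attribute vectors are shared across users, a newly registered user without any ratings still receives an informative representation from \(\sum_a w_{ua} y_a\), which is what makes attributes useful in the cold-start setting.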

2.2 Localization

For location-aware AR applications, the virtual content must be properly aligned with the real-world scene. This requires precise localization techniques that estimate the position and rotation of the user's device with respect to the environment of interest. If the localization is inaccurate, the virtual content is misplaced, which breaks the immersion of the AR experience. In this work, we used visual localization methods, which estimate the pose of a camera from its images.

Specifically, we used the Immersal software development kit (SDK) for the Unity engine, an AR solution that lets developers spatially map real-world locations and then augment them with digital content. One of the main benefits of the Immersal SDK is its support for very large spaces and its scalability to entire cities. To create our AR experience, we first visited the Niederdorf old town and spatially mapped all the buildings by taking numerous images from different viewpoints. Each set of images was then sent to the Immersal cloud service, which generated a 3D point cloud and a textured mesh of the mapped location by finding and matching distinct visual features in the submitted images. Additionally, each point cloud was tagged with the GPS coordinates of the corresponding location so that the application can narrow down the point clouds of nearby POIs, which speeds up the localization process. While the point clouds are only needed for localization, we used the textured meshes to create a digital twin of the Niederdorf old town inside the Unity editor. These meshes served as a point of reference that allowed us to carefully define the position, rotation and scale of each piece of virtual content and to set up the navigation logic for the personalized signpost.
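The GPS tags act as a coarse pre-filter: only point clouds close to the device need to be considered for visual matching. The following is a minimal sketch of such a filter, assuming each point cloud carries a single latitude/longitude pair; `TaggedMap`, `candidate_maps`, the radius and the limit are hypothetical, and the Immersal SDK or cloud service may well perform this filtering itself. It uses an equirectangular distance approximation, which is adequate at the scale of an old town.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0


@dataclass
class TaggedMap:
    map_id: int
    lat: float  # GPS latitude attached to the point cloud
    lon: float  # GPS longitude attached to the point cloud


def approx_distance_m(lat1, lon1, lat2, lon2):
    """Equirectangular approximation; accurate enough over a few hundred meters."""
    mean_lat = math.radians((lat1 + lat2) / 2)
    dx = math.radians(lon2 - lon1) * math.cos(mean_lat)
    dy = math.radians(lat2 - lat1)
    return EARTH_RADIUS_M * math.hypot(dx, dy)


def candidate_maps(maps, lat, lon, radius_m=150.0, limit=3):
    """IDs of the closest GPS-tagged point clouds within radius_m of the device."""
    ranked = sorted((approx_distance_m(lat, lon, m.lat, m.lon), m.map_id) for m in maps)
    return [map_id for dist, map_id in ranked if dist <= radius_m][:limit]
```

Limiting the candidates matters because GPS in narrow alleyways can be off by tens of meters, so the radius should be generous while the expensive visual matching stays restricted to a handful of point clouds.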

When users visit the Niederdorf old town, our prototype periodically captures the current camera frame and sends it to the Immersal cloud service, which tries to match it with the point cloud of one of the nearby POIs. If there is a match, the cloud service returns a projection matrix from which the current position and orientation of the device can be extracted. Once localization is complete, the application aligns the digital version of the old town with the real one.
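Extracting the pose and aligning the digital twin follow standard camera geometry. The sketch below assumes the service's result can be written as a 3×4 extrinsic matrix \([R \mid t]\) that maps map coordinates to camera coordinates; the actual response format of the Immersal service is not described here. Under that assumption, the camera center in map coordinates is \(-R^\top t\) and its orientation is \(R^\top\). Given the device pose tracked by the AR session at the same moment, composing the two poses yields the transform that places the digital twin into the session. Handedness conventions (Unity uses a left-handed frame) are ignored, and the function names are hypothetical.

```python
import numpy as np


def camera_pose_from_extrinsics(Rt):
    """Rt: 3x4 [R|t] mapping map coordinates to camera coordinates.
    Returns the camera-to-map pose as a 4x4 homogeneous matrix."""
    R, t = Rt[:, :3], Rt[:, 3]
    pose = np.eye(4)
    pose[:3, :3] = R.T        # camera orientation expressed in the map frame
    pose[:3, 3] = -R.T @ t    # camera center in map coordinates
    return pose


def align_map_to_session(session_T_camera, map_T_camera):
    """4x4 transform that places the digital twin (map frame) into the AR session frame."""
    return session_T_camera @ np.linalg.inv(map_T_camera)
```

Applying the returned transform to the root of the digital twin moves every piece of virtual content at once, so individual POI icons and signposts never need to be repositioned separately.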

Implementation Details: As mentioned before, we developed our AR application using the Unity engine, a popular platform for developing video games and 3D applications. We implemented our recommender system in Python as a server application using the Flask web framework and deployed it to Google Cloud Run in a Docker container. Finally, we stored all the POI data, user profiles and user-item preferences with Amazon Web Services. Our source code is publicly available so that the application can be extended to other cities.
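The recommender server can be pictured as a small HTTP API that the Unity client calls. The following is a minimal sketch only: the routes `/recommendations/<user_id>` and `/likes/<user_id>`, the payload fields and the in-memory dictionaries (standing in for the trained model and the AWS-hosted data) are hypothetical. The one deployment-specific detail is that Cloud Run tells the container which port to listen on through the `PORT` environment variable.

```python
import os

from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical in-memory stand-ins: predicted scores per user and stored likes.
# In the prototype, these would come from the trained model and from AWS.
PREDICTED = {"u1": {"poi_a": 4.6, "poi_b": 3.9, "poi_c": 4.2}}
LIKES = {}


@app.route("/recommendations/<user_id>", methods=["GET"])
def recommendations(user_id):
    # Number of POIs to return, e.g. one per signboard on the signpost.
    n = request.args.get("n", default=3, type=int)
    scores = PREDICTED.get(user_id, {})
    top = sorted(scores, key=scores.get, reverse=True)[:n]
    return jsonify({"user_id": user_id, "pois": top})


@app.route("/likes/<user_id>", methods=["POST"])
def add_like(user_id):
    # Record an explicit positive preference sent when the like button is clicked.
    poi_id = request.get_json()["poi_id"]
    LIKES.setdefault(user_id, set()).add(poi_id)
    return jsonify({"user_id": user_id, "liked": poi_id}), 201


if __name__ == "__main__":
    # Cloud Run passes the port to listen on via the PORT environment variable.
    app.run(host="0.0.0.0", port=int(os.environ.get("PORT", 8080)))
```

The built-in development server is used here for brevity; a container image for production would typically start the app with a WSGI server such as gunicorn instead.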

3 Conclusion and Future Work

In this work, we designed a personalized AR experience that adapts the content shown in AR to each user's profile and interests. In particular, we presented our prototype, a mobile AR application that shows only personalized and relevant POIs to users visiting a touristic city by exploiting and modeling user preferences elicited in AR.

With the rapidly increasing computing power of smartphones, AR technologies have enabled increasingly stable, immersive and accessible experiences for consumers in recent years. In future work, we want to investigate how such applications scale to multiple cities and to analyze in more depth localization techniques [10] that combine object detection and prediction models.