1 Introduction

The ubiquity of personal mobile devices gives consumers access to a wealth of online product and shopping information at their fingertips. Even while holding a product inside a retail shop, consumers can browse detailed product information, look up reviews, and compare prices online through their devices. However, in terms of the interaction experience, there is a distinct gap between what users see and touch physically in the shop and the digital content they read on their mobile devices. One approach to reducing this division of physical and digital information spaces is to combine the two, making digital content accessible directly on physical objects.

Augmented Reality (AR), or Mixed Reality (MR), is an interactive technology that combines real and virtual content registered in 3D [1]. Mobile AR is currently one of the most common forms, as it is easily accessible to anyone with a mobile phone, and it has been used extensively to enhance a variety of retail experiences [2]. Its main drawbacks, however, are that users need to hold the device in their hand and must switch it “on” to use it, creating an extra barrier to activation.

Mixed Reality delivered through head-mounted displays (HMDs) has the potential to overcome these challenges: it allows for a more immersive experience and removes the need to hold any device in the hand. Hardware such as the Microsoft HoloLens has become increasingly popular as devices become more portable and untethered, giving users more freedom while wearing them. In the near future, when such devices become even more compact and less cumbersome, e.g. taking the form of a normal pair of glasses, MR will be ubiquitous. It is this future vision towards which this work is aimed.

In this paper, we propose the use of continuous context awareness, natural user actions, and augmented physical products to provide users with digital content relevant to their context of use. This combination enables a novel, entertaining, and personalized interactive retail shopping experience for consumers. We contribute to the research community by presenting our MR-Shoppingu concept and design guidelines.

2 Related Work

Context is defined as “any information that can be used to characterize the situation of an entity” [3]. Based on contextual information such as location, activities, or user actions, content more relevant to the specific situation can be presented to the user.

Välkkynen et al. presented a handheld mobile AR system that takes location context into account [5]: depending on whether the user is at home or at the store, different content is overlaid on top of product packaging. In ShelfTorchlight [4], as a mobile camera projector is moved over a product shelf, coloured circles are projected onto the products, green for products that suit the user and red for ones that do not.

An advantage of using an HMD is that it is always “on” and can potentially capture, or be aware of, not only the activity and location of the user, but also the activity and location of the products themselves. We also aim to let users simply walk up and use the system, without prior knowledge of how it works.

3 System Design and Implementation

In this section, we describe the design and implementation of our proof-of-concept prototype, MR-Shoppingu. To create an interactive in-store shopping experience that enhances physical products with augmented online content, we envision that users interact with physical products naturally, without any special input; the system continuously detects these user actions and reacts by augmenting the physical products with relevant information. To achieve this, our system is guided by the following design requirements:

  1. Continuous Context Awareness – the system is continuously aware of the context surrounding its users and the products that they are interacting with;

  2. Natural User Actions – users only need their natural gestures and physical actions, without having to learn a new interface or interaction method; the system should react to users’ everyday activities and actions;

  3. Incorporate Online Capabilities and Content – make use of digital capabilities, such as information and functionality currently available only online (e.g. virtual bookmarks and reviews), and combine them with the physical shopping experience, bridging the gap between the physical and digital worlds.

Using these design requirements, we aim to provide relevant information and recommendations to the user at the appropriate time.

MR-Shoppingu is an application for the Microsoft HoloLens, built with Unity, Microsoft Visual Studio, the HoloToolkit, and the Vuforia SDK. The design requirements are realized in our system through four main components:

  • Context detection – Context is detected from the activities between the user, the objects, and the situation in the scene. We divide the possible context into six pre-defined states: idle, approach, gaze, grab, reverse, return (Fig. 1). Predominantly, the distance between the user and the object is used to distinguish the states, e.g. the distance is largest at idle and smallest at grab and reverse. Sequence information across the states is also used to identify the context, e.g. grab must happen before reverse. A sketch of this state logic is given after this list.

    Fig. 1. (a) A user using our system. (b–f) Screenshots taken from the user’s view, showing the six states of MR-Shoppingu: idle, (b) approach, (c) gaze, (d) grab, (e) reverse, (f) return.

  • Gaze detection – The gaze direction provided by the HoloToolkit is used in conjunction with Vuforia to determine when products are being gazed at (see the gaze and facing-side sketch after this list).

  • Product recognition and tracking – An image of each side of the product is first scanned manually and registered as a cuboid target; Vuforia then returns the object’s location in 3D space. The side of the object facing the user can then be calculated and used to determine whether the object is in the reverse state (see the gaze and facing-side sketch after this list).

  • Visualisation – Text and basic geometry are used in our current system; videos, text descriptions, and text reviews are pre-defined for each state and each product (see the content-lookup sketch after this list).
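
The following is a minimal sketch of how the distance- and sequence-based context detection described above could be structured as a Unity C# component. The state names mirror Fig. 1; the class layout, threshold values, and helper method names are illustrative assumptions, not the actual MR-Shoppingu implementation.

using UnityEngine;

// Hypothetical sketch of the context-detection state machine. The distance
// between the user (HMD camera) and the tracked product is the primary cue;
// sequence constraints (e.g. grab before reverse) are encoded in the transitions.
public enum ShoppingState { Idle, Approach, Gaze, Grab, Reverse, Return }

public class ContextStateMachine : MonoBehaviour
{
    [SerializeField] private Transform product;            // tracked product pose (e.g. from Vuforia)
    [SerializeField] private float approachRadius = 2.0f;  // metres; assumed threshold
    [SerializeField] private float grabRadius = 0.5f;      // metres; assumed threshold

    public ShoppingState State { get; private set; } = ShoppingState.Idle;

    void Update()
    {
        float distance = Vector3.Distance(Camera.main.transform.position, product.position);

        switch (State)
        {
            case ShoppingState.Idle:       // largest user-product distance
                if (distance < approachRadius) State = ShoppingState.Approach;
                break;
            case ShoppingState.Approach:
                if (IsGazingAtProduct()) State = ShoppingState.Gaze;
                else if (distance >= approachRadius) State = ShoppingState.Idle;
                break;
            case ShoppingState.Gaze:
                if (distance < grabRadius) State = ShoppingState.Grab;
                break;
            case ShoppingState.Grab:       // smallest user-product distance
                if (IsBackFacingUser()) State = ShoppingState.Reverse;   // reverse only after grab
                else if (distance >= grabRadius) State = ShoppingState.Return;
                break;
            case ShoppingState.Reverse:
                if (distance >= grabRadius) State = ShoppingState.Return;
                break;
            case ShoppingState.Return:
                if (distance >= approachRadius) State = ShoppingState.Idle;
                break;
        }
    }

    // Placeholders for the checks sketched in the next listing.
    private bool IsGazingAtProduct() { return false; }
    private bool IsBackFacingUser() { return false; }
}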
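
The gaze and facing-side checks referenced above could, under the same assumptions, be written with plain Unity camera and physics calls; in the actual system the HoloToolkit gaze provider and Vuforia's tracked cuboid pose play these roles, and the method names below are hypothetical.

using UnityEngine;

// Illustrative gaze and facing-side checks. Plain Unity calls stand in for the
// HoloToolkit gaze direction and the Vuforia object tracker used in MR-Shoppingu.
public static class ProductQueries
{
    // Gaze detection: cast a ray along the user's head direction and test whether
    // it hits the product's collider.
    public static bool IsGazedAt(Collider productCollider, float maxDistance = 3f)
    {
        Transform head = Camera.main.transform;
        RaycastHit hit;
        if (Physics.Raycast(head.position, head.forward, out hit, maxDistance))
            return hit.collider == productCollider;
        return false;
    }

    // Facing-side check: the product is registered as a cuboid, so its local +Z
    // axis can be treated as the front face. If the front points away from the
    // user, the back of the package is facing them (the "reverse" state).
    public static bool IsBackFacingUser(Transform product)
    {
        Vector3 toUser = (Camera.main.transform.position - product.position).normalized;
        return Vector3.Dot(product.forward, toUser) < 0f;
    }
}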
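
Finally, the per-product, per-state content used by the Visualisation component could be organised as a simple lookup table keyed by product and state (reusing the hypothetical ShoppingState enum from the first sketch); the class and field names are assumptions for illustration only.

using System.Collections.Generic;
using UnityEngine.Video;

// Hypothetical content table: each (product, state) pair maps to the pre-defined
// text, review, or video to display for that combination.
public class ContentEntry
{
    public string Text;      // e.g. description, price/rating summary, or top review
    public VideoClip Video;  // e.g. provenance video shown in the reverse state
}

public class ContentCatalogue
{
    private readonly Dictionary<string, ContentEntry> entries = new Dictionary<string, ContentEntry>();

    private static string Key(string productId, ShoppingState state)
    {
        return productId + ":" + state;
    }

    public void Register(string productId, ShoppingState state, ContentEntry entry)
    {
        entries[Key(productId, state)] = entry;
    }

    // Returns the pre-defined content for this product in the current state,
    // or null if nothing should be shown (e.g. in the idle state).
    public ContentEntry Lookup(string productId, ShoppingState state)
    {
        ContentEntry entry;
        return entries.TryGetValue(Key(productId, state), out entry) ? entry : null;
    }
}

A controller component could then call Lookup whenever the state machine changes state and render the returned text or video next to the tracked product.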

4 User Scenario

To demonstrate the use of our proof-of-concept system, we walk through one specific user scenario that it is designed to support:

  (a) In a café, a shelf with a variety of coffee beans is on display (idle).

  (b) As the user walks closer to the shelf, the Peru bag is recommended to them, shown as a message displayed next to the product; the Peru bag is recommended because the system knows which cup of coffee the user has just purchased in the café (approach).

  (c) As the user gazes at the various products, each product is “highlighted” with a bounding box, along with its price and rating information (gaze).

  (d) As the user picks up the Peru bag, a further description of the product and the top user review are shown (grab).

  (e) The user turns the product around for further information on the back. The system plays a video showing provenance information about the coffee plantation in Peru and how the coffee beans are processed (reverse).

  (f) As the user places the bag down, the system detects that they may no longer be interested in it and recommends another bag nearby (return).

5 Conclusion and Future Work

In this paper, we proposed MR-Shoppingu, a novel mixed-reality interactive in-store shopping experience that enhances physical products by combining continuous context awareness, natural user actions, and augmented online content relevant to the user and their context of use at that particular time. This combination may help to increase the efficiency and certainty of purchases, and enables a more personalized and entertaining experience for consumers.

As our next step, we aim to conduct a user study to investigate how much users prefer purchasing with MR-Shoppingu and how effective it is in helping consumers shop in physical retail shops. In the long term, we hope to make the system more flexible and robust by detecting objects automatically and recognizing products through real-time searches of online product databases.