
1 Introduction

Owing to the increasing availability and adoption of mobile smart devices, users can enjoy online shopping from any location with network connectivity. The applications users access can record a significant amount of individual information, enabling convenient shopping experiences personalized to each user's context and preferences. However, despite efforts to optimize visualization for small screens, visual representations that support marketing engagement for e-commerce items on mobile devices remain limited.

In recent years, digital signage has penetrated a significant number of indoor and outdoor public spaces, gradually replacing traditional printed or electric (bulb/LED) billboards in train and bus stations, department stores, office entrances, on building walls, and along many city streets. This trend is mainly driven by the decreasing cost of deploying large LCD displays, which are well suited to visualizing dynamic information that can be customized and scheduled according to marketers' preferences. These displays offer attractive visualizations of various types of content that easily attract pedestrians' attention, thereby serving as strategic entry points for advertising and e-commerce. Interactive digital signage with large displays is suitable for dynamically presenting extensive information to a large number of users. The common practice for maximizing content exposure in digital signage is to divide the screen canvas into several sections according to content type or genre. Such multi-section signage offers a clean layout design that supports simultaneous multi-channel content delivery.

Many users are hesitant to interact with public digital signage due to privacy concerns. Other people near the display can easily see private activities and content, which makes users reluctant to input personal information such as names, passwords, credit card numbers, and addresses. Marketers have also observed that, depending on the signage's location and content, users are hesitant even to show interest in publicly visible signage, thereby limiting the strategic effectiveness of deploying advertisements on digital signage. We believe that an effective solution that maximizes limited resources and user experience is to share a large display for visualizing extensive information while presenting personalized content on each user's private display.

Research aiming to close the gap between public signage and users' private mobile devices has been conducted very actively in the Human-Computer Interaction (HCI) field [18]. In this work, we specifically address content bridging for multi-section digital signage. Search query keywords, URLs, QR codes, and other 1D/2D barcodes are still widely used to give users an affordance for accessing the detailed information of a specific content item. However, excessive use of text, visual codes, and URLs is incompatible with multi-section signage due to spatial and design constraints.

In this paper, we report GlassNage, an approach that addresses the aforementioned issues by implementing signage layout recognition based on computer vision techniques, including corner, line, edge, and line segment detection on images captured by the user's device. These 2D features are used to generate a signage layout map, which is matched against a pre-learned layout map to determine which layout is in use and, further, to correct perspective via homography estimation. To grab a specific content item, the user selects a section within the captured layout using the device's interface, which in turn triggers a request to the content server for the corresponding information, keyed by a timestamp and a unique section ID. An actual usage scenario is depicted in Fig. 1.

Fig. 1.

We report our approach to realizing content transfer from multi-section digital signage to a user's private display, such as a Glass-like wearable system. We use a computer vision approach that includes corner, line, edge, and line segment detection to identify the signage layout within the user's camera view.

Using this approach, we remove the requirement to pre-learn content for section-specific object recognition, thus allowing the content of each section to change dynamically. In real-world practice, such dynamic content includes seasonal scheduling, real-time updates, changing content sources/channels, and so on. Moreover, our layout recognition runs in real time at up to 8 fps at 720p resolution (1280 × 720) on a Glass-like wearable device with an OMAP4430 dual-core System-on-Chip (SoC) and 2 GB RAM, running the Android 4.4.2 operating system.

In this paper, we contribute to the HCI community by presenting the GlassNage prototype design and implementation, as well as the results of a user study evaluation. In addition, we discuss limitations, insights, and design principles for future work and for other researchers with similar interests in bridging public content for private consumption.

2 Related Work

Various research and development efforts on online-to-offline (and vice versa) shopping using signage have been undertaken. TESCO deployed a trial advertising campaign that leveraged static QR codes on product displays in a subway station [7]. Users could access product websites by scanning the QR codes with their devices. However, despite recent advancements in QR code and other 1D/2D barcode technology [9], these codes remain visually perceivable cues that can clutter signage design. Moreover, in multi-section signage with dynamically scheduled content, placing a visual code on each section of the layout is not feasible.

A recent survey has highlighted the need for research on interactive digital signage that utilizes both public and private screens [1]. Turner [2] proposed cross-device eye-based interaction, a content-sharing mechanism that combines user gaze information with mobile input modalities to enable content transfer between public and personal displays in close proximity. However, this system requires special devices to detect the public display and map user gaze information onto a screen. SWINGNAGE [8] is a gesture-based mobile interaction system for a distant public display that focuses on item search and comparison on the public display; users interact with content on the display using private devices. However, users are required to perform device pairing, which is based on detecting user gestures made with a mobile device using a depth camera. Due to camera line-of-sight issues and the limitations of vision-based user tracking, this pairing mechanism is not feasible when multiple users connect to the display simultaneously from a relatively distant location.

Touch-Projector [4, 6] and Shoot & Copy [5] demonstrate methods that leverage a camera-equipped mobile device to recognize content shown on a larger public screen, an approach referred to as mobile interaction through video. Although GlassNage partially shares the underlying computer vision methodology, it focuses on applicability and deployment scaling in commercial signage; hence, we extend the system design and implementation to omit the requirement for a central processing server, i.e., layout recognition is performed on the user's private device. GlassNage targets commercial or marketing digital signage; therefore, we focus on protecting user privacy and enhance the system to provide detailed information about a specific content item on the user's private display.

3 GlassNage Implementation

3.1 Interaction Scenario Using the GlassNage Framework

We highlight the framework of our proposed approach in Fig. 2. The signage content is developed on top of the Adobe Integrated Runtime (Adobe AIR) platform. We construct a signage content scheduler server to control visualization across multiple displays, as well as to handle requests sent from the GlassNage app.

Fig. 2.

The workflow of our proposed approach: (a) the signage display is managed by a content scheduling server; when (b) a user is interested in the content of a specific section of the signage, s/he uses the GlassNage app installed on a wearable Glass device to perform layout recognition and section selection (c). The selected signage section ID, paired with a timestamp, is (d) sent to the server to obtain relevant information about the selected content (e). The user can then perform follow-up actions on his/her private display.

When a user is interested in a specific content item on the digital signage, s/he uses the GlassNage app to take an image that captures a large part of the signage. The application then performs signage layout recognition and overlays the recognized layout on the user's private display. The user then selects the section for which s/he wishes to obtain more detailed information. The selected section ID, paired with a timestamp, is sent to the server, which returns a URL linking to a web page containing detailed information about the respective content. The GlassNage app visualizes the received URL to notify the user that this information is available.
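To make this exchange concrete, the following Java sketch shows one way the client could issue such a request using Android's bundled org.json and java.net classes. The endpoint, field names, and helper method are our own illustrative assumptions, not GlassNage's actual wire format.

```java
import org.json.JSONObject;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class ContentRequest {
    // Hypothetical endpoint of the content scheduling server.
    private static final String SERVER = "http://signage-server.example/api/content";

    /** Sends the selected section ID and a timestamp; returns the content URL. */
    public static String fetchContentUrl(String signageId, String layoutId,
                                         int sectionId, long timestampMillis) throws Exception {
        JSONObject body = new JSONObject()
                .put("signageId", signageId)
                .put("layoutId", layoutId)
                .put("sectionId", sectionId)
                .put("timestamp", timestampMillis);

        HttpURLConnection conn = (HttpURLConnection) new URL(SERVER).openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setDoOutput(true);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body.toString().getBytes(StandardCharsets.UTF_8));
        }
        // The server resolves (layoutId, sectionId, timestamp) against its
        // schedule and replies with the URL of the content's detail page.
        try (Scanner in = new Scanner(conn.getInputStream(), "UTF-8")) {
            return new JSONObject(in.useDelimiter("\\A").next()).getString("url");
        }
    }
}
```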

3.2 Digital Signage Layout Design Approach

We describe our layout design approach in Fig. 3. We define a static layout consisting of multiple sections and assign a unique ID to each section. For visual enhancement, we create a signage frame that functions as a placeholder for each section and increases the visual affordance of the multi-section design. We select a background design that largely preserves the visual characteristics of the predefined static layout. Based on this layout, we assign a content source to each section. Lastly, we fill the layout with the assigned content for final visualization. Our prototype incorporates 11 sections (Fig. 3, left), including content such as a café interior and exterior image slideshow, a food and beverage menu image slideshow, a special offer image slideshow, news (text and video), and a weather forecast. We chose this layout mainly to represent a typical multi-section signage model deployed in cafés; additionally, we intended to create a more challenging recognition problem for testing purposes.
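As an illustration of what such a static layout could look like in code, the following Java sketch declares sections as uniquely identified rectangles in coordinates normalized to the signage canvas. The class, field names, and coordinates are hypothetical, not the prototype's actual definition.

```java
import android.graphics.RectF;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical static layout description: each section is a uniquely
 *  identified rectangle in coordinates normalized to the signage canvas. */
public class SignageLayout {
    public final String layoutId;
    public final Map<Integer, RectF> sections = new LinkedHashMap<>();

    public SignageLayout(String layoutId) { this.layoutId = layoutId; }

    public SignageLayout addSection(int id, float left, float top, float right, float bottom) {
        sections.put(id, new RectF(left, top, right, bottom));
        return this;
    }

    /** Example: three of the 11 café-layout sections (coordinates invented). */
    public static SignageLayout cafeLayout() {
        return new SignageLayout("cafe-v1")
                .addSection(1, 0.00f, 0.00f, 0.50f, 0.30f)  // interior/exterior slideshow
                .addSection(2, 0.50f, 0.00f, 1.00f, 0.30f)  // menu slideshow
                .addSection(3, 0.00f, 0.30f, 0.50f, 0.60f); // special offers
    }
}
```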

Fig. 3.

We describe the signage layout design process necessary for compatibility with the GlassNage framework. First, we define a static content layout that serves as ground truth for our layout recognition. Second, we fill the layout with a background design that preserves the visual characteristics of the content layout. Third, we assign a content source to each section of the signage using a server-side application; the content in each section can be dynamically scheduled. Lastly, we fill the frame with actual content.

3.3 Layout Recognition

We implement a client-server protocol to realize layout recognition and content distribution. Our layout recognition software was implemented within the GlassNage mobile application, which was mostly written in Java, with some parts in C++ for computationally intensive functions. We utilize a camera-equipped Glass-like wearable as the target device, considering that it features a see-through optical head-mounted display that enhances the user experience of public-to-private content retrieval.

We aggregate visual features such as corners, lines, edges, and line segments to obtain a computational model of the signage layout based on the relative positioning of these features. We use these 2D features because of their low computation time, which is crucial for deploying the algorithm on a mobile device with limited computing resources. First, we apply this approach to the base content layout (Fig. 3, left) to form a ground-truth content layout map, which is then matched against features aggregated by applying the same approach to the camera frame. We implemented the matching using FLANN [10]. We give an overview of our layout recognition approach in Fig. 4(b)–(d).
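The following OpenCV-for-Java sketch conveys the flavor of such a pipeline under our own simplifications: ORB keypoints stand in for the paper's aggregate of corners, lines, edges, and line segments; the matcher is FLANN-based, as in the paper; and the resulting correspondences feed a RANSAC homography for perspective correction. The actual GlassNage feature aggregation is more involved, and the names here are ours.

```java
import org.opencv.calib3d.Calib3d;
import org.opencv.core.*;
import org.opencv.features2d.DescriptorMatcher;
import org.opencv.features2d.ORB;
import java.util.ArrayList;
import java.util.List;

/** Simplified single-frame layout recognition: matches features of the
 *  ground-truth layout map against the camera frame and estimates the
 *  homography that maps the layout onto the frame. Inputs are grayscale. */
public class LayoutRecognizer {
    private final ORB orb = ORB.create(500);
    private final DescriptorMatcher matcher =
            DescriptorMatcher.create(DescriptorMatcher.FLANNBASED);

    public Mat recognize(Mat layoutGray, Mat frameGray) {
        MatOfKeyPoint kpLayout = new MatOfKeyPoint(), kpFrame = new MatOfKeyPoint();
        Mat descLayout = new Mat(), descFrame = new Mat();
        orb.detectAndCompute(layoutGray, new Mat(), kpLayout, descLayout);
        orb.detectAndCompute(frameGray, new Mat(), kpFrame, descFrame);

        // FLANN's default index expects float descriptors; ORB's are binary,
        // so convert before matching.
        descLayout.convertTo(descLayout, CvType.CV_32F);
        descFrame.convertTo(descFrame, CvType.CV_32F);

        MatOfDMatch matches = new MatOfDMatch();
        matcher.match(descLayout, descFrame, matches);

        // Collect matched point pairs and estimate the perspective transform.
        List<Point> src = new ArrayList<>(), dst = new ArrayList<>();
        KeyPoint[] kl = kpLayout.toArray(), kf = kpFrame.toArray();
        for (DMatch m : matches.toArray()) {
            src.add(kl[m.queryIdx].pt);
            dst.add(kf[m.trainIdx].pt);
        }
        MatOfPoint2f srcPts = new MatOfPoint2f(src.toArray(new Point[0]));
        MatOfPoint2f dstPts = new MatOfPoint2f(dst.toArray(new Point[0]));
        return Calib3d.findHomography(srcPts, dstPts, Calib3d.RANSAC, 5.0);
    }
}
```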

Fig. 4.

We developed a mobile app, GlassNage, that performs the following functions: (1) layout recognition from an image captured with a camera-equipped device, (2) perspective correction, (3) letting the user select the section from which s/he wants to receive further information, and (4) visualizing a website containing the content's detailed information. We describe our layout recognition approach in (b)–(e).

3.4 User’s Selection Method and Visualizing the Result

An example of the user's view after successful layout recognition is depicted in Fig. 4(e). The green outline visualizes the detected signage layout, and sections are highlighted with colored shading. To select a section, the user can browse through sections with flick gestures on the Glass device's touch interface. On other mobile devices such as smartphones, the user can simply tap the desired color-shaded section.

After the user's selection is confirmed, the GlassNage mobile application sends a request to the content distribution server for a URL that refers to details or further information related to the selected section (depicted in Fig. 2(d)). This request contains the signage ID, layout ID, section ID, and a timestamp (recorded when the user's selection is confirmed). The server side of our system also manages content scheduling; therefore, pairing a specific timestamp and layout section ID with its respective content is straightforward. After receiving and processing the request, the server sends a URL to the client GlassNage application. The user is then presented with the URL on their private head-mounted display and can choose to browse the detailed content in the browser.
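On the server side, resolving such a request reduces to a time-indexed lookup: because the server manages the schedule, the content on air in a given section at a given timestamp is uniquely determined. A minimal Java sketch of this lookup follows, with hypothetical data structures and names.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical schedule store: for each section, a time-ordered map from the
 *  moment a content item went on air to the URL of its detail page. */
public class ContentSchedule {
    private final Map<Integer, NavigableMap<Long, String>> schedule = new ConcurrentHashMap<>();

    /** Registers that `url` is shown in `sectionId` starting at `fromMillis`. */
    public void put(int sectionId, long fromMillis, String url) {
        schedule.computeIfAbsent(sectionId, id -> new TreeMap<>()).put(fromMillis, url);
    }

    /** Returns the URL on air in `sectionId` at `timestampMillis`, i.e., the
     *  entry with the latest start time not after the timestamp. */
    public String resolve(int sectionId, long timestampMillis) {
        NavigableMap<Long, String> bySection = schedule.get(sectionId);
        if (bySection == null) return null;
        Map.Entry<Long, String> e = bySection.floorEntry(timestampMillis);
        return e == null ? null : e.getValue();
    }
}
```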

4 Evaluation

4.1 GlassNage Application Statistics

To evaluate the performance of the GlassNage mobile application, we used the Android application analysis tools available in the Android SDK. We deployed GlassNage on a Glass-like wearable device with an OMAP4430 dual-core System-on-Chip (SoC) and 2 GB RAM, running the Android 4.4.2 operating system. Our layout recognition runs in real time at up to 8 fps at 1280 × 720 (720p) resolution. The device's camera has a 54.8-degree horizontal and 42.5-degree vertical Angle-of-View. A series of quantitative and qualitative user studies was conducted to test the feasibility of the GlassNage approach; the results are presented in the following subsections.

4.2 Focus of the User Study

In our experiments, we used the GlassNage mobile application deployed on a Glass-like wearable device. We identified a usability issue when a user tries to frame the content s/he is actually seeing within the camera frame: the user's natural Field-of-Vision (FoV) does not align well with the camera's Angle-of-View (AoV). This is mainly caused by the following:

1. There is only a single camera (i.e., not stereoscopic), which cannot compensate for 3D gaze.

2. The camera is positioned at the front-right of the device frame, which does not match the centroid of the human FoV.

3. The camera has a relatively narrow AoV (54.8 degrees horizontal and 42.5 degrees vertical) compared to the roughly 124-degree total FoV of the human.
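Taken together, the camera's horizontal AoV covers only about 54.8/124 ≈ 44% of the human FoV figure above, so a substantial part of what the user perceives never enters the camera frame.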

We illustrate this finding in Fig. 5. Based on this finding, we first focused on indicative factors of users' targeting behaviour and their accuracy. The second focus of our study was to assess users' perceived workload when using GlassNage.

Fig. 5.

We identify an issue in framing a user's Field-of-Vision (FoV, shown in blue shading) against the Glass device camera's Angle-of-View (AoV, shown in red shading). The figure illustrates where these two viewing angles overlap. We can observe that when a user looks horizontally straight ahead, part of the camera's AoV falls outside the user's typical paracentral and near-peripheral FoV. In the vertical case, some parts of the user's near-peripheral FoV (when considering eye rotation) are not covered by the static camera AoV.

4.3 Quantitative User Study

We conducted a series of quantitative user studies to examine users' behaviour as well as their accuracy when trying to align their perceptual FoV with the Glass device's camera AoV.

Participants. We recruited six participants for this study, all from outside our research organization: 4 male and 2 female, aged 25.4 ± 4.21 years. All participants were familiar with the Glass-like wearable device and were confident wearing it, viewing the display, and interacting with the side-mounted touch panel.

Procedures. First, we instructed the participants to stand 2 meters in front of a 60-inch monitor pivoted vertically (resembling the setup depicted in Fig. 1, left). The monitor displayed the signage content described in Fig. 3; we assigned 11 sections to the signage, presenting content related to the coffee shop menu, news, weather, etc. Second, we instructed the participants to comfortably center their head posture and FoV and to focus their eye gaze on a designated section, promptly followed by taking a picture with the Glass-like wearable device's camera app. Each participant performed this task for all 11 sections of the signage and repeated the series of tasks 5 times.

Data Statistics. For each signage section, we obtained 5 images from each participant, i.e., 30 images in total capturing the same signage section. The images were taken with the 5 MP camera of the Glass-like wearable device at 2528 × 1856 pixel resolution. We imported the images from the device's internal storage to a desktop PC for further analysis, without changing their size or aspect ratio.

Processing and Analysis. First, we locate the centroid of each image, (xc, yc) = (1264, 928). We then locate the target signage section in the image, extract its centroid (xs, ys), and calculate the Euclidean distance between the image centroid (xc, yc) and the captured section centroid (xs, ys).
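Concretely, the per-image targeting error is the Euclidean distance d = √((xs − xc)² + (ys − yc)²), computed for each of the 30 images of a section; Table 1 reports the resulting absolute deviations.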

Results. We compile the results in Table 1, which shows a comprehensive comparison of how users' targeting behaviour and accuracy are affected by the size and position of the signage sections. Sections 1, 2, 3, 8, and 10 represent larger signage section areas; in these sections, users' center-targeting absolute deviations were relatively high. Sections 6, 7, 9, and 11 represent horizontally wide sections; notably, for sections 6 and 7, the absolute deviations were the highest of all sections. Interestingly, sections with small areas, such as 4, 5, and 9, had relatively low absolute deviations.

Table 1. The absolute deviations (mean ± std) of gazing towards a target

Insights and Design Implications. In addition to assessing participants' accuracy in aligning their perceptual FoV with the Glass device's camera AoV, we observed participants' behavior during the section-targeting study. We compile our insights below:

1. Participants tended to fine-adjust their framing when targeting sections with small areas (e.g., sections 4, 5, and 9). We therefore conclude that users are more cautious during such targeting (hence their lower overall absolute deviations) than when targeting sections with larger areas.

2. Sections with larger areas gave users more immediate confidence in the targeting task; however, the absolute deviations were relatively high. We therefore need to incorporate a more deviation-tolerant framing procedure into the GlassNage app, or into any other system that relies on Glass-mounted camera capture.

3. Horizontally wide sections were quite difficult for users to target when user targeting is compared against the section's centroid. This is mainly due to users' spatial perception when framing such sections: users tend to be satisfied with a framing even when it is not horizontally centered.

4. Overall, using our Glass-like wearable device, we learned that a more sophisticated alignment method is desirable to support users' perceptual matching between their FoV and the Glass device's camera AoV. The current GlassNage implementation mitigates this issue by letting the user first capture an image and then perform layout recognition, allowing the image to contain the section of interest as well as other important landmark features.

4.4 Qualitative User Study

A qualitative user satisfaction study was conducted to test the usability of GlassNage. We recruited the same participants as in the quantitative user study.

Procedures. Each participant was provided with a Glass-like wearable device pre-installed with the GlassNage application. S/he was given a brief introduction followed by a set of instructions on how to use GlassNage, immediately followed by interacting with the app and the digital signage. The experimenter intervened when specific questions were asked or when the instructions were misunderstood. After the exercise, the participants answered questions related to perceived workload from the NASA-TLX assessment [11].

In addition, “Was GlassNage hard to learn?” was always added to the questionnaire to gain insight into GlassNage's learning curve. This was followed by several general questions about GlassNage, shown below:

1. Did GlassNage make the content browsing experience more enjoyable?

2. Did you feel that fetching content items through GlassNage is more effective than previously available methods?

3. Do you have any additional comments?

All the questions above were rated on a Likert scale (1: strongly agree – 5: strongly disagree).

Results. Table 2 shows the participants' ratings of perceived workload (NASA-TLX) in the user study.

Table 2. The ratings (mean ± std) of the NASA-TLX questions

The results in Table 2 show that participants felt positive while using GlassNage. The average ratings for the subscales “mental demand”, “effort”, and “frustration” were the lowest, at 3.61 ± 1.52, 3.24 ± 1.42, and 3.64 ± 1.25, respectively. The participants also responded positively that GlassNage was not difficult to learn (4.23 ± 1.52). Finally, the participants felt that overall GlassNage performed well (1.21 ± 1.51).

Responses to the general questions suggested that participants agreed that GlassNage made the content browsing experience more enjoyable (1.02 ± 0.24) and felt that GlassNage is more effective (1.43 ± 0.84) than previous methods of content fetching.

Many comments addressed technical issues in the application, such as:

1. Implement finger-pointing gesture recognition to select content rather than using the touch panel.

2. Implement a faster signage section recognition framework.

3. Include a function to “push” information to public signage.

In addition to the feedback obtained from users, we observed that some participants initially had difficulty capturing a frame of their desired section. This is consistent with the issue raised in the quantitative user study subsection and motivates us to explore ways to mitigate this problem.

5 Discussion

With GlassNage, we explored a simple design that incorporates signage layout recognition to assist users in selecting a specific part of dynamic signage content. We recognize that this simple design may only work for particular interface designs consisting of distinct tiles with clear contours. However, we believe this work contributes by highlighting how much can be achieved with a minimal computational strategy, and we discuss the limitations of this strategy.

One limitation is that borders between content items must remain visible in order to be recognizable as lines or line segments. We argue that visible borders can be incorporated into the signage design itself, thereby mitigating the visual sense of a frame. In the case of content that itself appears as 2D features, our layout features incorporate the correlation between lines, corners, and line segments, thereby mitigating false negatives. Temporal comparison across multiple frames can also be implemented to filter out non-static lines, corners, and line segments.
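As one possible realization of such a temporal filter (our sketch, not GlassNage's implementation), line segments detected by a probabilistic Hough transform can be kept only if they reappear, within a small endpoint tolerance, in consecutive frames:

```java
import org.opencv.core.Mat;
import org.opencv.imgproc.Imgproc;
import java.util.ArrayList;
import java.util.List;

/** Keeps only line segments that persist across consecutive frames,
 *  filtering out segments caused by moving content. (Sketch only;
 *  thresholds and tolerance are invented.) */
public class StaticSegmentFilter {
    private static final double TOL = 8.0;   // endpoint tolerance in pixels
    private List<double[]> previous = new ArrayList<>();

    /** Detects segments in an edge image via probabilistic Hough and
     *  returns those also present (within TOL) in the previous frame. */
    public List<double[]> update(Mat edges) {
        Mat lines = new Mat();
        Imgproc.HoughLinesP(edges, lines, 1, Math.PI / 180, 60, 40, 5);

        List<double[]> current = new ArrayList<>();
        for (int i = 0; i < lines.rows(); i++) current.add(lines.get(i, 0));

        List<double[]> stable = new ArrayList<>();
        for (double[] seg : current) {
            for (double[] prev : previous) {
                if (dist(seg, prev) < TOL) { stable.add(seg); break; }
            }
        }
        previous = current;
        return stable;
    }

    // Sum of endpoint displacements between two segments (x1, y1, x2, y2).
    private static double dist(double[] a, double[] b) {
        return Math.hypot(a[0] - b[0], a[1] - b[1]) + Math.hypot(a[2] - b[2], a[3] - b[3]);
    }
}
```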

6 Conclusion and Future Work

We presented GlassNage, an approach that supports dynamic content transfer from publicly available large-display, multi-section digital signage to a user's private display. We implemented a series of computer vision techniques to detect and recognize the content layout. We conducted quantitative and qualitative user studies to evaluate users' targeting behavior, users' perceived workload, and the usability of the GlassNage framework. We also highlighted insights and interface design implications aggregated from the user study results and observations of participants' behavior. In future iterations of this research, we plan to explore non-distracting layout frame visualization and appropriate interface designs to accommodate smoother section selection.