Abstract
We report our approach to supporting dynamic content transfer from publicly available large-display digital signage to users’ private displays, specifically Glass-like wearable devices. We aim to address issues concerning dynamic multimedia signage in which the content is divided into several sections. This type of signage has become increasingly popular because it maximizes content exposure. In contrast to prior research, our approach excludes computer-vision-based object recognition and instead identifies how content is laid out in the digital signage. We incorporate techniques to recognize basic layout features, including corners, lines, edges, and line segments, which are obtained from a camera frame taken by users with their own devices. These layout features are then combined to generate a signage layout map, which is compared to a pre-learned layout map for position detection and perspective correction using homography estimation. To grab a specific content item, users choose a section within the captured layout using the device’s interface, which in turn creates a request to the content server to send the respective content information based on a timestamp and a unique section ID. In this paper, we describe implementation details, report user study results, and conclude with a discussion of our implementation experiences as well as highlighting future work.
Keywords
- Digital signage
- Public display
- Public-to-private
- Multi section
- Layout recognition
- Computer vision
- Visual features
- Line segment
- User study
1 Introduction
Due to the increasing availability and adoption of mobile smart devices, users can enjoy online shopping from any location with network access. Users generally access applications that record a significant amount of individual information; thus, they can engage in convenient, personalized shopping experiences relative to their context and preferences. However, despite efforts to optimize visualization for small screens, visual representation to support marketing engagement with e-commerce items on mobile devices remains limited.
In recent years, digital signage has penetrated a significant number of indoor and outdoor public spaces and is gradually replacing traditional printed or electric (bulb/LED) billboards in train and bus stations, department stores, office entrances, on building walls, and on many city streets. This trend is mainly driven by the decreasing cost of deploying large LCD displays, which are practically suitable for visualizing dynamic information that can be customized and scheduled according to marketers’ preferences. These displays offer attractive visualizations of various types of content that easily attract pedestrians’ attention, thereby serving as strategic entry points for advertising and e-commerce. Interactive digital signage with large displays is suitable for dynamically presenting extensive information to a large number of users. The common practice to maximize content exposure in digital signage is to divide the screen canvas into several sections according to content type or genre. This multi-section signage offers a clean layout design supporting simultaneous multi-channel content delivery.
A significant number of users are hesitant to interact with public digital signage due to privacy concerns. Other people near the display can easily see private activities and content, which makes users reluctant to input personal information such as names, passwords, credit card numbers, and addresses. Marketers have also observed that, depending on the digital signage’s location and content, users are hesitant even to show interest in publicly visible signage, thereby limiting the strategic effectiveness of deploying advertisements on digital signage. We believe that an effective solution to maximize limited resources and user experience is to share a large display for visualizing extensive information while displaying personalized content on the users’ private displays.
Research aiming to close the gap between public signage and users’ private mobile devices has been conducted very actively in the Human-Computer Interaction (HCI) field [1–8]. In this work, we specifically address content bridging for multi-section digital signage. Search query keywords, URLs, and QR codes, as well as other 1D/2D barcodes, are still widely used to give users an affordance to access a specific content item’s detailed information. However, excessive use of text, visual codes, and URLs is incompatible with multi-section signage due to spatial and design constraints.
In this paper, we report GlassNage, an approach to address the aforementioned issues by implementing signage layout recognition based on computer vision techniques, including corner, line, edge, and line segment detection in images captured on the user’s device. These 2D features are used to generate a signage layout map, which is then matched with a pre-learned layout map to determine which layout is being used and, further, to correct the perspective using homography estimation. To grab a specific content item, users choose a section within the captured layout using the device’s interface, which in turn creates a request to the content server to send the respective information based on a timestamp and a unique section ID. The actual usage scenario is depicted in Fig. 1.
Using this approach, we remove the requirement to pre-learn content for section-specific object recognition, thus allowing dynamic changes of the content in each section. In real-world practice, such dynamic content includes seasonal scheduling, real-time updates, changing content sources/channels, and so on. Moreover, our layout recognition performs in real time at up to 8 fps for 720p resolution (1280 × 720), deployed on a Glass-like wearable device with an OMAP4430 dual-core System-on-Chip (SoC) with 2 GB RAM, running the Android 4.4.2 operating system.
In this paper, we contribute to the HCI community by presenting the GlassNage prototype design and implementation, as well as user study evaluation results. In addition, we discuss limitations, insights, and design principles for future work and for other researchers with similar interests in bridging public content for private consumption.
2 Related Work
Various research and development efforts in online-to-offline (and vice versa) shopping approaches using signage have been undertaken. TESCO deployed a trial advertising campaign that leveraged static QR codes for product displays in a subway station [7]. Users could access product websites by scanning the QR codes with their devices. However, despite recent advancements in QR codes and other 1D/2D barcode technology [9], these codes remain visually perceivable cues that potentially clutter signage design. Moreover, in multi-section, dynamically scheduled signage, putting visual codes on each section of the layout is not a feasible solution.
A recent survey highlighted the need for research on interactive digital signage that utilizes both public and private screens [1]. Turner [2] proposed cross-device eye-based interaction, a content-sharing mechanism that combines user gaze information with mobile input modalities to enable content transfer between public and personal displays in close proximity. However, this system requires special devices to detect the public display and map user gaze information to the screen. SWINGNAGE [8] is a gesture-based mobile interaction system for a distant public display that focuses on item search and comparison on the public display. Users interact with content on the display using private devices. However, users are required to perform device pairing, which is based on the detection of user gestures with a mobile device using a depth camera. Due to camera line-of-sight issues and vision-based user-tracking limitations, this pairing mechanism is not feasible when multiple users simultaneously connect to the display from a relatively distant location.
Touch-Projector [4, 6] and Shoot & Copy [5] highlight a method that leverages a camera-equipped mobile device to recognize content being visualized on a larger public screen; this method is referred to as mobile interaction through video. Although GlassNage partially shares the underlying computer vision methodology, it focuses on applicability and deployment scaling in commercial signage; hence we extend the system design and implementation to omit the requirement of a central processing server, i.e., layout recognition is performed on the user’s private device. GlassNage targets commercial or marketing digital signage; therefore, we focus on users’ privacy protection and enhance the system to provide detailed information about a specific content item on the users’ private display.
3 GlassNage Implementation
3.1 Interaction Scenario Using GlassNage Framework
We highlight the framework of our proposed approach in Fig. 2. The signage content is developed on top of the Adobe Integrated Runtime (Adobe AIR) platform. We constructed a signage content scheduler server to control visualization for multiple displays, as well as to handle requests sent from the GlassNage app.
When a user is interested in a specific content item on the digital signage, s/he uses the GlassNage app to take an image that captures a large part of the signage. The application subsequently performs signage layout recognition and then overlays the recognized layout on the user’s private display. The user is then able to select the section for which s/he wishes to obtain more detailed information. The selected section ID, paired with a timestamp, is sent to the server to obtain a URL linking to a web page with detailed information about the respective content. The GlassNage app visualizes the received URL to notify the user that this information is available.
3.2 Digital Signage Layout Design Approach
We describe our layout design approach in Fig. 3. We define a static layout consisting of multiple sections and assign a unique ID to each section. For visual enhancement, we create a signage frame that functions as a placeholder for each section and increases the visual affordance of the multi-section design. We select a background design that largely preserves the visual characteristics of the predefined static layout. Based on this layout, we assign a content source to each section. Lastly, we fill the layout with the assigned content for the final visualization. Our prototype incorporates 11 sections (Fig. 3, left), including content such as a café interior and exterior image slideshow, a food and beverage menu image slideshow, a special offer image slideshow, news (text and video), and a weather forecast. We chose this layout mainly to represent a typical model of multi-section signage deployed in cafés. Additionally, we intended to create a more challenging recognition problem for testing purposes.
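To make the layout definition concrete, the multi-section structure described above can be sketched as a list of sections carrying unique IDs. The paper does not publish its data structures, so the following Python sketch, with hypothetical section names and coordinates, is purely illustrative:

```python
# Illustrative sketch (not the paper's actual code): a static layout as
# a list of sections, each with the unique ID that is later sent to the
# content server. Coordinates and content labels are hypothetical.
from dataclasses import dataclass

@dataclass
class Section:
    section_id: int  # unique ID referenced by the content server
    x: int           # top-left x in layout coordinates
    y: int           # top-left y in layout coordinates
    w: int           # section width
    h: int           # section height

    def contains(self, px: int, py: int) -> bool:
        return (self.x <= px < self.x + self.w
                and self.y <= py < self.y + self.h)

# A simplified 3-section layout (the actual prototype uses 11 sections).
LAYOUT = [
    Section(1, 0, 0, 640, 360),      # e.g. image slideshow
    Section(2, 640, 0, 640, 180),    # e.g. news
    Section(3, 640, 180, 640, 180),  # e.g. weather forecast
]

def section_at(px: int, py: int):
    """Return the section ID under a selected point, or None."""
    for s in LAYOUT:
        if s.contains(px, py):
            return s.section_id
    return None
```

In such a scheme, a tap or flick selection maps straightforwardly to a section ID via a point-in-rectangle test.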
3.3 Layout Recognition
We implement a client-server protocol to realize layout recognition and content distribution. Our layout recognition software is implemented within the GlassNage mobile application, which is mostly written in Java, with some parts in C++ for computationally intensive functions. We utilize a camera-equipped Glass-like wearable as the target device, considering that it has a see-through optical head-mounted display that enhances the user experience of public-to-private content retrieval.
We aggregate visual features such as corners, lines, edges, and line segments to obtain a computational model of the signage layout based on the relative positioning of these features. We use these 2D features because of their low computation time, which is crucial for deploying the algorithm on a mobile device with limited computing resources. We first apply this approach to the base content layout (Fig. 3, left) to form the ground-truth content layout map, which is matched against features aggregated by applying the same approach to the camera frame. We implement the matching using FLANN [10]. We describe the overview of our layout recognition approach in Fig. 4(b)–(d).
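As a simplified illustration of the matching step, the sketch below scores line segments from a camera frame against candidate pre-learned layout maps by nearest-midpoint distance. The actual implementation detects corners, edges, and line segments and matches them with FLANN [10]; the segment representation and brute-force scoring here are assumptions made for the sake of a small, self-contained example:

```python
# Pure-Python sketch of layout matching (NOT the paper's algorithm):
# each layout map is a list of line segments ((x1, y1), (x2, y2)) in
# normalized [0, 1] coordinates; the camera frame's segments are scored
# against each pre-learned map and the best-scoring layout ID is chosen.
import math

def midpoint(seg):
    (x1, y1), (x2, y2) = seg
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)

def match_score(frame_segs, layout_segs):
    """Mean distance from each frame-segment midpoint to the nearest
    layout-segment midpoint; lower means a better match."""
    total = 0.0
    for seg in frame_segs:
        mx, my = midpoint(seg)
        total += min(math.hypot(mx - lx, my - ly)
                     for lx, ly in map(midpoint, layout_segs))
    return total / len(frame_segs)

def recognize_layout(frame_segs, learned_layouts):
    """Return the ID of the pre-learned layout map that best explains
    the segments extracted from the camera frame."""
    return min(learned_layouts,
               key=lambda lid: match_score(frame_segs, learned_layouts[lid]))
```

In a real deployment, the brute-force `min` search would be replaced by an approximate nearest-neighbor index such as FLANN, and the matched correspondences would feed a homography estimation for perspective correction.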
3.4 User’s Selection Method and Visualizing the Result
An example of the user’s view after successful layout recognition is depicted in Fig. 4(e). The green outline visualizes the detected signage layout, and sections are highlighted with colored shading. To select a section, the user can browse through sections using flick gestures on the Glass device’s touch interface. On other mobile devices such as smartphones, users can simply tap the desired color-shaded section.
After the user’s selection has been confirmed, the GlassNage mobile application sends a request to the content distribution server for a URL that refers to details or further information related to the selected section (depicted in Fig. 2(d)). This request contains data such as the signage ID, layout ID, section ID, and a timestamp (obtained when the user’s selection is confirmed). The server side of our system also manages content scheduling; therefore, pairing a specific timestamp with a layout section ID and its respective content is straightforward. After receiving and processing this request, the server sends a URL to the client GlassNage mobile application. The user is then presented with the URL on their private heads-up display and can choose to browse the detailed content using the browser.
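The request/response exchange above might be sketched as follows. The field names, JSON encoding, and schedule format are assumptions, since the paper specifies only which data items are exchanged (signage ID, layout ID, section ID, timestamp, and the returned URL):

```python
# Hedged sketch of the client request and server-side resolution.
# The schedule entries and URLs below are made up for illustration.
import json

# Hypothetical server-side schedule: (section_id, start_ts, end_ts, URL).
SCHEDULE = [
    (4, 0, 1000, "http://example.com/menu/espresso"),
    (4, 1000, 2000, "http://example.com/menu/latte"),
]

def build_request(signage_id, layout_id, section_id, timestamp):
    """Client side: serialize the user's selection into a payload."""
    return json.dumps({"signage_id": signage_id, "layout_id": layout_id,
                       "section_id": section_id, "timestamp": timestamp})

def resolve_request(payload):
    """Server side: pair the timestamp and section ID with the content
    scheduled at that moment and return its URL (None if nothing
    matches)."""
    req = json.loads(payload)
    for sec, start, end, url in SCHEDULE:
        if sec == req["section_id"] and start <= req["timestamp"] < end:
            return url
    return None
```

Because the server owns the schedule, the client never needs to know what content was on screen; the timestamp alone disambiguates dynamically scheduled content within a section.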
4 Evaluation
4.1 GlassNage Application Statistics
To evaluate the GlassNage mobile application’s performance, we used the Android application analysis tools available in the Android SDK. We deployed GlassNage on a Glass-like wearable device with an OMAP4430 dual-core System-on-Chip (SoC) with 2 GB RAM, running the Android 4.4.2 operating system. Our layout recognition performs in real time at up to 8 fps for 1280 × 720 (720p) resolution. The camera of this device has a 54.8-degree horizontal and 42.5-degree vertical angle of view. A series of quantitative and qualitative user studies was conducted to test the feasibility of the GlassNage approach. The results are presented in the following subsections.
4.2 Focus of the User Study
In our experiments, we used the GlassNage mobile application deployed on a Glass-like wearable device. We identified a usability issue when a user tries to frame the content that they are actually seeing within the camera frame, i.e., the user’s natural Field-of-Vision (FoV) does not align well with the camera’s Angle-of-View (AoV). This is mainly caused by the following:
1. There is only a single camera, i.e., not stereoscopic, which cannot compensate for 3D gaze.
2. The camera is positioned at the front-right of the frame, which does not match the centroid of the human FoV.
3. The camera has a relatively narrow AoV (54.8 degrees horizontal and 42.5 degrees vertical) compared to the total 124-degree FoV of the human.
We illustrate this finding in Fig. 5. Based on it, we first focused on indicative factors of users’ targeting behaviour and their accuracy. The second focus of our study was to assess users’ perceived workload when using GlassNage.
4.3 Quantitative User Study
We conducted a series of quantitative user studies to test users’ behaviour and accuracy when trying to align their perceptual FoV with the Glass device’s camera AoV.
Participants. We recruited six participants for this study, all from outside our research organization (4 male, 2 female; age 25.4 ± 4.21 years). All participants were familiar with the Glass-like wearable device and were confident wearing it, viewing the display, and interacting with the side-mounted touch panel.
Procedures. First, we instructed the participants to stand 2 meters in front of a 60-inch monitor pivoted vertically (resembling the setup depicted in Fig. 1, left). The monitor displayed the signage content described previously in Fig. 3. We assigned 11 sections in the signage and presented content related to a coffee shop menu, news, weather, etc. Second, we instructed the participants to comfortably center their head posture and FoV and to focus their eye gaze on a designated section, promptly followed by taking a picture with the Glass-like wearable device’s camera app. We asked each participant to perform this task for all 11 sections of the signage and to repeat the series of tasks 5 times.
Data Statistics. For each signage section, we obtained 5 images from each participant, i.e., 30 images in total capturing the same signage section. The images were taken using the 5 MP camera of the Glass-like wearable device at 2528 × 1856 pixel resolution. We imported the images from the device’s internal storage to a desktop PC for further analysis. We did not change the size or aspect ratio of the images.
Processing and Analysis. First, we locate the centroid of each image, (xc, yc) = (1129, 928). We then locate the target signage section in the image, extract its centroid (xs, ys), and calculate the Euclidean distance between the image centroid (xc, yc) and the captured section centroid (xs, ys).
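This per-section deviation measure is straightforward to compute; the sketch below illustrates it in Python (the example coordinates in the test are made up, while the image centroid is the value reported above):

```python
# Sketch of the deviation analysis: Euclidean distance between the
# image centroid and the captured section centroid, averaged over the
# repeated captures of one section.
import math

IMAGE_CENTROID = (1129, 928)  # (xc, yc) as reported in the paper

def deviation(section_centroid, image_centroid=IMAGE_CENTROID):
    """Euclidean distance between (xs, ys) and (xc, yc)."""
    (xs, ys), (xc, yc) = section_centroid, image_centroid
    return math.hypot(xs - xc, ys - yc)

def mean_deviation(section_centroids):
    """Average deviation over the repeated captures of one section."""
    devs = [deviation(c) for c in section_centroids]
    return sum(devs) / len(devs)
```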
Results. We compile the results in Table 1, which shows a comprehensive comparison of how users’ targeting behaviour and accuracy are affected by the size and position of the signage sections. Sections 1, 2, 3, 8, and 10 represent larger signage section areas; in these sections, users’ center-targeting absolute deviations were relatively high. Sections 6, 7, 9, and 11 represent horizontally wide sections; notably, for sections 6 and 7, the absolute deviations were the highest of all sections. Interestingly, sections with small areas, such as 4, 5, and 9, had relatively low absolute deviations.
Insights and Design Implications. We observed participants’ behavior during the signage section targeting study, in addition to assessing participants’ accuracy in aligning their perceptual FoV with the Glass device’s camera AoV. We compile our insights below:
1. Participants tended to fine-adjust their framing when targeting sections with a small area (e.g., sections 4, 5, and 9). We conclude that users are more cautious during targeting (hence their overall absolute deviations are lower) compared to targeting sections with a larger area.
2. Sections with a larger area give users more instant confidence in the targeting task; however, the absolute deviations are relatively high. Therefore, we need to incorporate a more deviation-permissive framing procedure into the GlassNage app, or into any other system that relies on Glass-mounted camera capture.
3. Horizontally wide sections are quite difficult for users to target when comparing users’ targeting with the section’s centroid. This is mainly due to users’ spatial perception when framing such sections: users are more likely to be satisfied with their framing even though it is not horizontally centered.
4. Overall, using our Glass-like wearable device, we learned that a more sophisticated alignment method is desirable to support users’ perceptual matching between their FoV and the Glass device’s camera AoV. In the current GlassNage implementation, we mitigate this issue by letting the user first capture an image and then performing layout recognition. By doing so, we allow users to capture an image that contains their section of interest as well as other important landmark features.
4.4 Qualitative User Study
A qualitative user satisfaction study was conducted to test the usability of GlassNage. We recruited the same participants as in the quantitative user study.
Procedures. Each participant was provided with a Glass-like wearable device that was pre-installed with GlassNage application. S/he was then given a brief introduction followed by a set of instructions on how to use GlassNage. This was immediately followed by asking each participant to interact with the app and digital signage. The experimenter intervened when specific questions were asked, or when the instructions were misunderstood. After the exercise, the participants were asked questions related to the perceived workload from the NASA-TLX assessment [11].
In addition, the question “Was GlassNage hard to learn?” was always added to the questionnaire to gain insight into GlassNage’s learning curve. This was followed by several general questions about GlassNage, shown below:
1. Did GlassNage make the content browsing experience more enjoyable?
2. Did you feel that fetching content items through GlassNage is more effective than previously available methods?
3. Do you have any additional comments?
All of the questions above were rated on a Likert scale (1: strongly agree – 5: strongly disagree).
Results. Table 2 shows the participants’ rating on the perceived workload (NASA-TLX) of the user study.
The results in Table 2 show that participants felt positive while using GlassNage. The average ratings for the subscales “mental demand”, “effort”, and “frustration” were the lowest, at 3.61 ± 1.52, 3.24 ± 1.42, and 3.64 ± 1.25, respectively. The participants also indicated positively that GlassNage was not difficult to learn (4.23 ± 1.52). Finally, the participants felt that overall GlassNage performed well (1.21 ± 1.51).
The general questions suggested that subjects agree that GlassNage did make the content browsing experience more enjoyable (1.02 ± 0.24) and also felt that GlassNage is more effective (1.43 ± 0.84) than previous methods of content fetching.
Many comments were given regarding technical issues in the application, such as:
1. Implement finger-pointing gesture recognition to select content, rather than using the touch panel.
2. Implement a faster signage section recognition framework.
3. Include a function to “push” information to the public signage.
In addition to the feedback obtained from users, we observed that some participants initially had difficulty capturing a frame of their desired section. This is coherent with the issue we raised in the quantitative user study subsection and motivates us to explore ways to mitigate this problem.
5 Discussion
With GlassNage, we explored a simplistic design that incorporates signage layout recognition to assist users in selecting a specific part of dynamic signage content. We realize that our simplistic design may only work for very particular interface designs consisting of distinct tiles with clear contours. However, we believe this work contributes by highlighting the maximal result achievable with a minimal computational strategy, and by discussing the limitations of this particular strategy.
A limitation remains that the borders between content items must stay visible in order to be recognizable as lines or line segments. We argue that visible borders can be incorporated into the signage design itself, thereby mitigating the visual sense of a frame. In the case of content that itself appears as 2D features, our layout features incorporate the correlation between lines, corners, and line segments, thereby mitigating false negatives. Temporal comparison between multiple frames can also be implemented to filter out non-static lines, corners, and line segments.
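A minimal sketch of such a temporal filter follows, assuming each detected segment is summarized by its midpoint and compared across recent frames within a pixel tolerance (both the representation and the tolerance are assumptions, not part of the paper's implementation):

```python
# Sketch of temporal filtering: keep only line segments that reappear
# (within a tolerance) in every one of the recent frames, discarding
# segments produced by moving content.
import math

def is_static(segment_mid, frames, tol=5.0):
    """A segment is considered static if every recent frame contains a
    segment whose midpoint lies within `tol` pixels of it."""
    mx, my = segment_mid
    return all(
        any(math.hypot(mx - x, my - y) <= tol for x, y in frame)
        for frame in frames)

def filter_static(current, previous_frames, tol=5.0):
    """Return midpoints from the current frame that persisted across
    all previous frames."""
    return [m for m in current if is_static(m, previous_frames, tol)]
```

Only the persistent, layout-defining segments would then be passed on to layout matching, suppressing spurious features from slideshow or video content.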
6 Conclusion and Future Work
We presented GlassNage, an approach to support dynamic content transfer from publicly available, large-display, multi-section digital signage to users’ private displays. We implemented a series of computer vision techniques to detect and recognize the content layout. We conducted quantitative and qualitative user studies to evaluate users’ targeting behavior, users’ perceived workload, and the usability of the GlassNage framework. We also highlighted insights and interface design implications aggregated from the user study results and observation of participants’ behavior. In future iterations of this research, we plan to explore non-distracting layout frame visualization and to find an appropriate interface design to accommodate smoother section selection.
References
She, J., Crowcroft, J., Fu, H., Li, F.: Convergence of interactive displays with smart mobile devices for effective advertising: a survey. ACM Trans. Multimedia Comput. Commun. Appl. 10(2), Article 17, 16 pp. (2014) doi:10.1145/2557450
Turner, J.: Cross-device eye-based interaction. In: Proceedings of the Adjunct Publication of the 26th Annual ACM Symposium on User Interface Software and Technology (UIST 2013 Adjunct), pp. 37–40. ACM, New York, NY, USA (2013). doi:10.1145/2508468.2508471
von Gioi, R.G., Jakubowicz, J., Morel, J.-M., Randall, G.: LSD: a line segment detector. Image Process. Line 2(2012), 35–55 (2012). http://dx.doi.org/10.5201/ipol.2012.gjmr-lsd
Boring, S., Baur, D., Butz, A., Gustafson, S., Baudisch, P.: Touch projector: mobile interaction through video. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2010), pp. 2287–2296. ACM, New York, NY, USA (2010). doi:10.1145/1753326.1753671
Boring, S., Altendorfer, M., Broll, G., Hilliges, O., Butz, A.: Shoot and copy: phonecam-based information transfer from public displays onto mobile phones. In: Proceedings of the 4th International Conference on Mobile Technology, Applications, and Systems and the 1st International Symposium on Computer Human Interaction in Mobile Technology (Mobility 2007), pp. 24–31. ACM, New York, NY, USA (2007). doi:10.1145/1378063.1378068
Boring, S., Gehring, S., Wiethoff, A., Blöckner, A.M., Schöning, J., Butz, A.: Multi-user interaction on media facades through live video on mobile devices. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2011), pp. 2721–2724. ACM, New York, NY, USA (2011). doi:10.1145/1978942.1979342
The QR Code Tesco Store: From Concept to Reality. http://2d-code.co.uk/tesco-qr-code-store/. Accessed 20 Feb 2015
Yamaguchi, T., Fukushima, H., Tatsuzawa, S., Nonaka, M., Takashima, K., Kitamura, Y.: SWINGNAGE: gesture-based mobile interactions on distant public displays. In: Proceedings of the 2013 ACM International Conference on Interactive Tabletops and Surfaces (ITS 2013), pp. 329–332 (2013). ACM, New York, NY, USA. doi:10.1145/2512349.2514596
Chu, H.-K., Chang, C.-S., Lee, R.-R., Mitra, N.J.: Halftone QR codes. ACM Trans. Graph. 32(6), Article 217, 8 pp. (2013). doi:10.1145/2508363.2508408
Muja, M., Lowe, D.: Fast approximate nearest neighbors with automatic algorithm configuration. In: Proceedings of the International Conference on Computer Vision Theory and Applications, pp. 331–340 (2009)
Hart, S.G., Staveland, L.E.: Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. Adv. Psychol. 52, 139–183 (1988). North-Holland, ISSN 0166-4115, ISBN 9780444703880
© 2015 Springer International Publishing Switzerland

Cite this paper: Mujibiya, A. (2015). GlassNage: Layout Recognition for Dynamic Content Retrieval in Multi-Section Digital Signage. In: Streitz, N., Markopoulos, P. (eds) Distributed, Ambient, and Pervasive Interactions. DAPI 2015. Lecture Notes in Computer Science, vol 9189. Springer, Cham. https://doi.org/10.1007/978-3-319-20804-6_31

Print ISBN: 978-3-319-20803-9. Online ISBN: 978-3-319-20804-6