Encyclopedia of Computer Graphics and Games

Living Edition
| Editors: Newton Lee

Interactive Augmented Reality Pop-Up Book with Natural Gesture Interaction for Handheld

  • Muhammad Nur Affendy Nor’a (Email author)
  • Ajune Wanis Ismail
  • Mohamad Yahya Fekri Aladin
Living reference work entry
DOI: https://doi.org/10.1007/978-3-319-08234-9_365-1



Handheld augmented reality (AR) has been widely used on smart, portable devices in applications such as education, games, visual experiences, and information visualization. However, most handheld AR applications do not fully support natural interaction, and existing 3D pop-up books rely on touch-based input to interact with 3D content. This entry therefore describes the fundamentals of designing an interactive AR pop-up book with natural gesture interaction using the real hand. Real-hand gesture tracking in handheld AR is explored to examine how the user's hands can be tracked in real time, and the gesture interaction that allows the user to interact directly with virtual objects is described. Interacting with the 3D objects on the pop-up book with bare hands feels more realistic to the user.


Augmented reality (AR) is a technology that allows computer-generated information, including text, video, 2D virtual images, and 3D virtual objects, to be overlaid onto the real-world environment in real time (Ismail and Sunar 2013). The main motivation for developing AR applications is to merge the virtual world into the real world and provide users with an information-enhanced environment (Billinghurst et al. 2008). The connection between these two worlds once seemed impossible, but it has since become an attraction with overwhelming potential. Usually, the virtual elements are generated by computer and overlaid onto the real world to enhance the user's sensory perception of the augmented world they are seeing or interacting with.

Nowadays, AR technology is used widely in entertainment, military training, engineering design, robotics, manufacturing, and other industries. AR brings many advantages to performing a task, especially when it involves design and planning. AR supports 3D object manipulation and can provide natural user interaction techniques (Ismail and Sunar 2013). Developers take advantage of AR technologies because they can help perform real tasks virtually and easily, reduce the cost of real tasks, and solve many issues that cannot be remedied in the real world.

The level of immersion of virtual and real objects in an AR application refers to the merging of real and virtual worlds to produce AR environments and visualizations where real and digital objects coexist and interact in real time (Azuma et al. 2001). According to Ismail and Sunar (2013), the tracking process is very important in developing an AR application and running it in real time. The main requirements for trackers are high accuracy and low latency at a reasonable cost. Tracking objects in the scene amounts to calculating the pose between the camera and the objects; virtual objects can then be projected into the scene using this pose.
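To make the projection step concrete, the following is a minimal sketch of how a tracked pose (rotation R, translation t) places a virtual point into the camera image via a pinhole model. The intrinsic values and the identity pose are illustrative assumptions, not data from this entry.

```python
def project_point(R, t, p_world, fx, fy, cx, cy):
    """Transform a world-space point into camera space with the tracked
    pose (rotation R, translation t), then project it through a simple
    pinhole camera model to pixel coordinates."""
    # Camera-space point: p_cam = R * p_world + t
    p_cam = [sum(R[i][j] * p_world[j] for j in range(3)) + t[i]
             for i in range(3)]
    x, y, z = p_cam
    if z <= 0:
        return None  # point is behind the camera, so it is not drawn
    # Perspective divide, then map through the camera intrinsics
    return (fx * x / z + cx, fy * y / z + cy)

# Identity rotation, camera 2 units in front of the marker origin;
# the focal lengths and principal point are illustrative values
R = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
t = [0, 0, 2]
print(project_point(R, t, [0, 0, 0], 800, 800, 320, 240))  # marker origin -> image centre
```

A tracked marker would supply R and t every frame; the same projection then keeps the virtual object registered on the page as the camera moves.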

Augmented Reality Handheld Interface

Three main fundamentals can be identified: tracking, display technology, and interaction (Billinghurst et al. 2008). Tracking is one of the fundamental enabling technologies in AR, and it still has many unsolved problems (Ismail and Sunar 2013). Interaction technique issues in mobile AR and multimodal AR are becoming more prominent. In vision-based interaction, hand and fingertip tracking and hand gesture recognition methods are widely used to provide an easy way to interact with virtual objects in AR (Chun and Lee 2012). A real-time vision-based approach was introduced to dynamically manipulate overlaid virtual objects in a marker-less AR system using the bare hand with a single camera (Cohen et al. 1989). It is natural that collision between the human hand and the augmented object can occur during manipulation of a virtual 3D object. In AR, however, the collision happens between a virtual object and a real object, so the collision detection approach may differ from the ways of the real world. Most handheld AR applications do not apply natural interaction; user interaction is mostly touch-based (Kim and Lee 2016). Therefore, this entry describes the interaction in an interactive pop-up book with natural gesture interaction using the real hand in a handheld interface.

The existing AR book, generally known as the magic book, contains 3D virtual and animated content registered on real book pages, mimicking a traditional "pop-up book" (Markouzis and Fessakis 2015). An AR pop-up book overlays virtual content onto the pages of a physical pop-up book. A current AR book that uses a similar metaphor is the MagicBook (Billinghurst et al. 2001), which offers the user the ability to experience the full reality-virtuality continuum because it can change modes between AR and VR. Through the AR display, the user sees the augmented scene, and they can switch the view mode to an immersive virtual environment. Another application that adopted the AR book metaphor is the AR coloring book (Clark et al. 2011), which aims at augmenting an educational coloring book with user-generated AR virtual content.

Four interaction techniques for handheld interfaces have recently been explored: touch-based interaction (Kim and Lee 2016), midair gesture-based interaction (Vuibert et al. 2015), device-based interaction (Samini and Palmerius 2016), and direct interaction (Hilliges et al. 2018). Traditional touch-based interaction methods for handheld AR cannot provide intuitive 3D interaction because they lack natural gesture input with real-time depth information (Bai et al. 2013). Therefore, this entry aims to illustrate the design of natural interaction techniques in 3D space for handheld AR devices. Positions and movements of the user's fingertips correspond to manipulations of the virtual objects in the AR scene (Bai et al. 2013).

Augmented Reality Pop-Up Book

There are four phases carried out to develop the AR pop-up book, described in the following subsections.

Phase 1: Defining Interactivity and Storytelling for AR Pop-Up Book

Interactivity in an interactive book arises when it contains a story and activities that require the user to perform actions and interact. A real pop-up book offers many advantages, but when transformed into a more digital and interactive experience, the book offers far more than a pile of heavy paper. Digital books have recently been widely restructured and recycled, enhancing the reading experience and making it more interactive than conventional printed books. The main advantage of a digital book is that it can be customized to meet the reader's expectations (Markouzis and Fessakis 2015). This phase is conducted to design and construct the 3D contents for the AR pop-up book. The animated 3D objects are developed during this phase, since the physical pop-up book is not in digital form; it is a fully printed copy.

Interactive storytelling enables the user to take part in and affect the plot of the story, creating a new genre of narration that is much more engaging and adaptive. Levels of interactive storytelling range from a simple branching plot to fully dynamic narration models. Interactive storytelling constitutes a new genre of literature that promises considerable learning effectiveness. This stage also defined the appropriate 3D animation to be applied to the virtual objects so that the visuals are more appealing and interesting. The storytelling was chosen based on a currently available conventional pop-up book entitled Beauty and the Beast. This physical fairytale pop-up book provides the storytelling, and we transformed the existing printed pop-up book into an AR transitional and tangible format in order to measure the AR experience.

Phase 2: Setting Up AR-Handheld User Interface

This phase includes determining the display technique, the tracking technique, and the interaction method, and it focuses on setting up the handheld AR interface as shown in Fig. 1. The user interface for an AR application that uses the "pop-up book" metaphor has been designed. To create a good AR presentation, the crucial part of this stage is ensuring that the virtual environment is displayed in correct alignment so it merges with the real environment. The display technique chosen is a handheld display device. The tracking technique applied in this project is feature-based: feature-based tracking registers the virtual element on top of the real marker. Sensor-based tracking was also used, since depth data is required to recognize the user's real hand gesture features. These elements were prepared and examined before proceeding with the next stage, the development of the AR pop-up book.
Fig. 1

Setting up of the AR-handheld interface

The diagram illustrates the hardware configuration. To overlay virtual elements on top of the real environment, the 3D object data are loaded and bound with 2D textures. A handheld device is chosen as the AR display technology. The standard vision-based tracking system recognizes the inputs, the marker and the user's hand: it recognizes the registered marker before the appropriate 3D object is loaded onto the scene. The user's hand is captured by the leap motion device (Guna et al. 2014). The user interacts with the AR environment using their bare hand as the interaction tool. The application recognizes one of the user's hands to interact with the virtual object, while the other hand holds the handheld device. Users interact with the virtual animation by performing a defined gesture that is recognized by the system.

Phase 3: Pop-Up Book Feature Tracking


The main challenge in the AR pop-up book application is to ensure that the registration and hand tracking problems are effectively solved. The AR pop-up book utilizes hand gesture recognition as an interaction tool in the AR environment, and the tracking library tracks the pages of the pop-up book using a feature-based tracking technique.

Figure 2 shows the natural feature tracking process. The original RGB image is captured and converted to features so the camera recognizes it as the target image. The printed color image in Fig. 2a shows the original state of the marker. The marker is then converted to gray scale using image processing, as shown in Fig. 2b, before being processed into an image target in the form of features, as shown in Fig. 2c. The features are recognized by the system as a unique identification: the system detects the marker and registers it with a virtual element. A virtual cube, for example, appears on top of the marker after the camera recognizes it. The AR user interface uses this tracking process to display animation on top of the pop-up book; in this project, the edges of the real pop-up book are converted into features.
Fig. 2

Natural feature tracking process. (a) RGB image. (b) Gray-scale image. (c) Feature points
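The gray-scale conversion step of this pipeline can be sketched as follows. This is a minimal illustration using the ITU-R BT.601 luma weights, a common but assumed choice; the entry does not state which weighting its tracking library uses.

```python
def to_grayscale(rgb_pixels):
    """Convert RGB pixels to gray scale with the ITU-R BT.601 luma
    weights, a standard weighting in image-processing pipelines."""
    return [round(0.299 * r + 0.587 * g + 0.114 * b)
            for (r, g, b) in rgb_pixels]

# White, pure red, and black pixels
print(to_grayscale([(255, 255, 255), (255, 0, 0), (0, 0, 0)]))  # [255, 76, 0]
```

The feature detector then works on this single-channel image, extracting corner-like points such as the pop-up book's page edges.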

Phase 4: Developing Hand Gesture Interaction

This phase focuses on exploring the gesture interaction for the user to interact with the AR pop-up book. The study of the pop-up book concept and its interactivity processes was carried out in Phase 3. To enhance the realism of the AR environment for the conventional pop-up book, we merge the AR pop-up book with live characters so the story elements of the pop-up book come alive. A character follows the user's hand movement, and the story elements activate animation effects once the user's hands touch them. To actualize these realism effects, user interaction must precisely hit the characters. To look more natural, users can use their bare hands to make direct contact with the virtual elements.

Hand gesture recognition is therefore one of the crucial parts of this project, as it acts as the input metaphor for the user to interact with virtual objects in the AR environment. The sensor-based tracking device, the leap motion, allows the application to read depth data and to track the position of the user's hand in the real world and map it into the virtual world (Guna et al. 2014). 3D hand skeleton-based interaction uses a leap motion sensor attached to the front or back of a mobile device to provide simultaneous manipulation of 3D AR objects. By capturing the hand skeleton and identifying 3D finger positions and orientations, we can support more natural hand gesture-based interaction in an AR scene. In addition to the 3D translation-only tasks of previous work, simultaneous 3D translation and 3D rotation make it possible to alter the location, pose, and size of virtual objects with hand gestures. As shown in Fig. 3, the leap motion reads depth data during recognition and produces positions and orientations; it tracks the position of the user's hand in the real world and maps it into the virtual world. To display the virtual hand skeleton, a modeling process is required, and to enable interaction cues, a rigid body is applied to the 3D model of the virtual hands. Once this process is completed, the gesture inputs are created.
Fig. 3

Hand gesture recognition method
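The mapping of a tracked joint from the sensor's frame into the AR scene's frame can be sketched as a scale-and-offset transform. The scale (millimetres to metres) and zero offset below are illustrative placeholder values, not calibration data from this project.

```python
def leap_to_world(p_mm, scale=0.001, offset=(0.0, 0.0, 0.0)):
    """Map a Leap Motion joint position (millimetres, sensor origin)
    into the AR scene's frame (metres, scene origin). The scale and
    offset values are illustrative placeholders, not from the entry."""
    return tuple(c * scale + o for c, o in zip(p_mm, offset))

# A fingertip 100 mm right of, 200 mm above, and 50 mm behind the sensor
print(leap_to_world((100.0, 200.0, -50.0)))
```

In practice the offset (and any axis flip between the sensor's and the engine's coordinate conventions) would come from the calibration between the sensor mount and the handheld camera.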

Natural Gesture Interaction

This section explains natural gesture interaction, which is divided into the following phases.

Phase 1: Acquiring Gesture Inputs

Three gesture inputs have been defined: TouchGesture, SwipeGesture, and CircleGesture. TouchGesture causes a virtual object to call an appropriate animation as feedback once it is touched. SwipeGesture represents a virtual object being swiped, while CircleGesture is retrieved and updated whenever the user performs a circling gesture at a designated position in the AR environment and calls the appropriate animation.

Figure 4 shows the flow of acquiring gesture inputs. The process starts when the leap motion device detects the user's hand with its sensor, and the gestures are identified in pose detection. A signal is then sent to start the skeleton calibration, which leads to skeleton tracking. The gestures used in this project are grabbing to grasp an object, pointing to select a menu item, a palm-up gesture to activate the menu, and pinching to rescale a 3D object. The next process is to develop the natural gestures to interact with virtual objects in the AR pop-up book. The real hand gestures are captured by the leap motion device, and a recognition process is executed to obtain depth data from the leap motion sensor-based tracking system.
Fig. 4

Flow of acquiring gesture inputs

The SwipeGesture is a gesture input where the user swipes their index finger to interact with a virtual object in the AR environment. In this project, the gesture is defined by calculating the velocity and speed of the tip of the index finger and by detecting the collision between the finger and an interactable virtual object.
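The speed test at the core of this definition can be sketched as follows, assuming fingertip positions in metres sampled once per tracking frame. The 1.0 m/s threshold is an illustrative value, not one taken from this project.

```python
import math

def is_swipe(p_prev, p_curr, dt, min_speed=1.0):
    """Classify fingertip motion between two tracking frames as a swipe
    when the tip's speed exceeds a threshold in metres per second.
    The 1.0 m/s threshold is an illustrative assumption."""
    speed = math.dist(p_prev, p_curr) / dt  # displacement over frame time
    return speed >= min_speed

print(is_swipe((0.0, 0.0, 0.0), (0.06, 0.0, 0.0), dt=0.033))   # fast lateral motion -> True
print(is_swipe((0.0, 0.0, 0.0), (0.005, 0.0, 0.0), dt=0.033))  # slow drift -> False
```

A full implementation would also check that the swipe passes through (or collides with) the target object before firing its animation.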

The CircleGesture is a gesture input where the user makes a circling motion with their index finger to enable certain features in the AR environment and interact with the virtual object. The gesture is defined by calculating the vector, magnitude, and angle of the circle based on the position of the tip of the user's index finger. The flow in Fig. 5 is executed to calculate the angle.
Fig. 5

Flow of calculating the CircleGesture angle
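The angle calculation can be sketched as accumulating the angle swept by the fingertip around a centre point. This is a minimal 2D illustration under the assumption that samples lie roughly in the plane of the circling motion; the entry's own calculation may differ in detail.

```python
import math

def swept_angle(points, centre):
    """Accumulate the angle swept by the fingertip around a centre point
    (2D samples in the plane of the circling motion), unwrapping each
    step so crossings of the -pi/pi boundary are handled correctly."""
    angles = [math.atan2(y - centre[1], x - centre[0]) for (x, y) in points]
    total = 0.0
    for a0, a1 in zip(angles, angles[1:]):
        d = a1 - a0
        if d > math.pi:        # unwrap a clockwise boundary crossing
            d -= 2 * math.pi
        elif d < -math.pi:     # unwrap a counter-clockwise crossing
            d += 2 * math.pi
        total += d
    return total

# Fingertip samples tracing a quarter circle around the origin
pts = [(math.cos(a), math.sin(a)) for a in (0.0, 0.5, 1.0, math.pi / 2)]
print(round(math.degrees(swept_angle(pts, (0.0, 0.0)))))  # 90
```

Once the accumulated angle passes a full turn (360 degrees), the gesture can be reported as a completed circle and the associated animation triggered.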

The TouchGesture is a gesture input where the user touches a virtual object with their index finger to enable certain features in the AR environment and interact with that object. In this project, the gesture is defined by detecting a collision whenever the collider on the tip of the user's index finger collides with an interactable virtual object in the AR environment.
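The collision test behind TouchGesture can be sketched as a sphere-sphere overlap check between the fingertip collider and the object's collider, the same test a physics engine such as Unity's performs internally. The collider radii here are illustrative values in metres, not measurements from the project.

```python
import math

def is_touching(tip_pos, tip_radius, obj_pos, obj_radius):
    """Sphere-sphere overlap test between the fingertip collider and a
    virtual object's collider. Radii are illustrative values in metres."""
    return math.dist(tip_pos, obj_pos) <= tip_radius + obj_radius

print(is_touching((0.0, 0.0, 0.0), 0.01, (0.015, 0.0, 0.0), 0.01))  # overlapping -> True
print(is_touching((0.0, 0.0, 0.0), 0.01, (1.0, 0.0, 0.0), 0.01))    # far apart -> False
```

When the test returns true, the touched object's feedback animation is invoked.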

Phase 2: Integrating Gesture with Handheld AR

In this phase, the gesture interaction technique is integrated with the handheld AR scene. The hand gesture interaction technique has been developed for the user to interact with the AR pop-up book. To transmit the signal from the leap motion gesture tracking device to the application, a network protocol is needed; to actualize this, we enable multiplayer networking as shown in Fig. 6. The network protocol in PUN (Photon Unity Networking) (Network 2015) is used, so gesture inputs can be sent to and received by the handheld AR application.
Fig. 6

Flow of data transmitting using PUN

Through the PUN network protocol implemented at this stage, the user's hand tracking data (position and rotation) from the real world is sent by the desktop (sender) to the handheld device (client, or receiver). A Photon network always uses a master server and one or more game servers. The master server manages the currently available games and does matchmaking; once a room is found or created, the actual gameplay is handled on a game server. All servers run on dedicated machines.
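The payload sent over this link can be sketched as a serialized pose message. This is a minimal round-trip illustration with JSON; the field names are hypothetical and are not PUN's actual wire format, which handles serialization internally.

```python
import json

def encode_hand_pose(position, rotation):
    """Serialize one tracked hand pose (position vector and quaternion
    rotation) into a JSON payload for the desktop-to-handheld link.
    The field names are illustrative, not PUN's actual wire format."""
    return json.dumps({"pos": list(position), "rot": list(rotation)})

def decode_hand_pose(payload):
    """Recover the pose on the receiving (handheld) side."""
    data = json.loads(payload)
    return tuple(data["pos"]), tuple(data["rot"])

msg = encode_hand_pose((0.1, 0.25, -0.3), (0.0, 0.0, 0.0, 1.0))
print(decode_hand_pose(msg))  # the pose round-trips unchanged
```

On the handheld side, the decoded position and rotation drive the virtual hand skeleton each frame.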

Phase 3: Executing Gesture Interaction in AR Pop-Up Book

The leap motion is attached to the back of the smartphone and needs to be triggered and well-connected so that hand tracking and gesture interaction can be enabled. Gesture recognition is achieved using the leap motion controller, which detects the hand gesture or hand signal as shown in Fig. 7. The hand gesture in the real world is recognized by the controller as shown in Fig. 7a, while the hand gesture in the virtual world is produced as shown in Fig. 7b.
Fig. 7

Gesture signal transferred to handheld device. (a) Real hand gesture. (b) Virtual gesture inputs

The virtual hand is the representative of the real hand. Each gesture detected by the leap motion sensor can be seen on the monitor; thus, every hand gesture such as swiping, pinching, or pointing in the real world is mirrored by the virtual hand. This eases system development and gives the user an immersive feeling of realism. The handheld device captures the user's bare hand to work with real hand gestures in the handheld AR scene, as presented in Fig. 8.
Fig. 8

Gesture signal transferred to handheld device

The handheld device's camera synchronized the video input (720p HD resolution, 25 frames per second). It was placed in alignment with the physical pop-up book (the marker image) and the leap motion device attached to the handheld device (an Android Samsung) to detect the user's real hand skeleton, as shown in Fig. 9a. The handheld screen displays the view of the AR scene. On top of the table, the pop-up book was demonstrated with the user's fingertip as a controller to obtain the reference point of the augmentation, as presented in Fig. 9b.
Fig. 9

AR pop-up book on the handheld screen. (a) User interacts with the AR pop-up book. (b) A swipe gesture brings the character alive

Problems and Restriction

The AR pop-up book is demonstrated in this entry as an interactive AR environment that enables users to play with the storytelling. The gesture interaction allows the user to interact directly with the virtual objects; interacting with the 3D objects that appear on top of the pop-up book with bare hands feels more realistic to the user. However, several problems arise regarding real-time 3D gesture sensing in the AR pop-up book. The first problem is the accuracy of hand detection: when the hands move into certain positions, accuracy is lost. Tracking accuracy is vital to ensure intuitive user interaction with the virtual elements (Lv et al. 2015). The second problem is that the user feels detached from the AR environment because of the indirect interaction method. These problems persist especially where the precision of hand detection affects performance. It is natural that collision between the human hand and the augmented object can occur when manipulating a virtual 3D object; in AR, however, the collision happens between a virtual object and a real object, so the collision detection approach may differ from the real world. In the user's observation with a handheld device, the screen is often restricted and can sometimes be rotated between portrait and landscape. A handheld is small enough to hold and operate in one hand; nevertheless, the user cannot use both hands, since one hand must hold the device.

Based on the development stages described in the previous sections, the guidelines emphasize developing a handheld AR interface for an AR pop-up book application that applies natural gesture interaction instead of the touchscreen. This entry explains the development of the AR pop-up book but does not study education pedagogy; the development stresses using AR technology to turn the physical book into a more appealing and interesting handheld AR application, with the virtual environment overlaid on the physical book in real time. The potential for educational purposes can be explored in future work. Further work on the usability of user interaction can also be carried out, such as invoking multimodal interaction, which may make the AR pop-up book more interactive when speech input complements gesture. Multimodal interaction is seen as advancing interaction techniques in AR and can improve the user's experience (Ismail and Sunar 2014; Piumsomboon et al. 2014). Handheld AR has been widely used on smart, portable devices in applications such as education, games, visual experiences, and information visualization, yet most handheld applications use touch-based interaction. This entry explored real hand gesture tracking in handheld AR to examine how it tracks the user's hands in real time and described the gesture interaction that allows the user to interact directly with virtual objects; with their bare hands, users find interacting with 3D objects more realistic.


  1. Azuma, R., Baillot, Y., Behringer, R., Feiner, S., Julier, S., MacIntyre, B.: Recent advances in augmented reality. IEEE Comput. Graph. Appl. 21(6), 34–47 (2001)Google Scholar
  2. Bai, H., Gao, L., El-Sana, J., Billinghurst, M.: Markerless 3D gesture-based interaction for handheld augmented reality interfaces. In Mixed and Augmented Reality (ISMAR), 2013 IEEE International Symposium on, pp. 1–6. IEEE (2013)Google Scholar
  3. Billinghurst, M., Kato, H., Poupyrev, I.: Tangible augmented reality. ACM SIGGRAPH ASIA 2008 Courses, 7, pp. 1–10 (2008)Google Scholar
  4. Billinghurst, M., Kato, H., Poupyrev, I.: The MagicBook: a transitional AR interface. Comput. Graph. 25(5), 745–753 (2001)CrossRefGoogle Scholar
  5. Chun, J., Lee, S.: A vision-based 3D hand interaction for marker-based AR. Int J Multimed Ubiquit Eng. 7(3), 51–58 (2012)Google Scholar
  6. Clark, A., Dünser, A., Grasset, R.: An interactive augmented reality coloring book. In: Mixed and Augmented Reality (ISMAR), 2011 10th IEEE International Symposium on, pp. 259–260. IEEE (2011)Google Scholar
  7. Cohen, P.R., Dalrymple, M., Moran, D.B., Pereira, F.C., Sullivan, J.W., Cohen, P.R., Sullivan, J.W.: Synergistic use of direct manipulation and natural language. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems Wings for the Mind – CHI ’89, vol. 20, pp. 227–233. ACM Press, New York (1989)CrossRefGoogle Scholar
  8. Guna, J., Jakus, G., Pogačnik, M., Tomažič, S., Sodnik, J.: An analysis of the precision and reliability of the leap motion sensor and its suitability for static and dynamic tracking. Sensors. 14(2), 3702–3720 (2014)CrossRefGoogle Scholar
  9. Hilliges, O., Kim D., Izadi S., Molyneaux D., Hodges S.E., Butler D.A.: Augmented reality with direct user interaction. U.S. Patent 9,891,704, issued February 13 (2018)Google Scholar
  10. Ismail, A.W., Sunar, M.S.: Intuitiveness 3D objects interaction in augmented reality using S-PI algorithm. Indones J Electr Eng Comput Sci. 11(7), 3561–3567 (2013)Google Scholar
  11. Ismail, A.W., Sunar, M.S.: Multimodal fusion: gesture and speech input in augmented reality environment. In: Computational Intelligence in Information Systems: Proceedings of the Fourth INNS Symposia Series on Computational Intelligence in Information Systems (INNS-CIIS 2014), vol. 331, p. 245. Springer, Cham (2014)CrossRefGoogle Scholar
  12. Kim, M., Lee, J.Y.: Touch and hand gesture-based interactions for directly manipulating 3D virtual objects in mobile augmented reality. Multimed. Tools Appl. 75, 16529 (2016)CrossRefGoogle Scholar
  13. Lv, Z., Halawani, A., Feng, S., Ur Réhman, S., Li, H.: Touch-less interactive augmented reality game on vision-based wearable device. Pers. Ubiquit. Comput. 19(3–4), 551–567 (2015)CrossRefGoogle Scholar
  14. Markouzis, D., Fessakis, G.: Interactive storytelling and mobile augmented reality applications for learning and entertainment – a rapid prototyping perspective. In: Interactive Mobile Communication Technologies and Learning (IMCL), 2015 International Conference on, pp. 4–8. IEEE (2015)Google Scholar
  15. Network, P.U.: How to Create an Online Multiplayer Game with Photon Unity Networking (2015)Google Scholar
  16. Piumsomboon, T., Altimira, D., Kim, H., Clark, A., Lee, G., Billinghurst, M.: Grasp-Shell vs gesture-speech: a comparison of direct and indirect natural interaction techniques in augmented reality. In ISMAR 2014 – IEEE International Symposium on Mixed and Augmented Reality – Science and Technology 2014, Proceedings, pp. 73–82 (2014)Google Scholar
  17. Samini, A., Palmerius, K.L.: A study on improving close and distant device movement pose manipulation for hand-held augmented reality. In The 22nd ACM Symposium on Virtual Reality Software and Technology (VRST), Munich, Germany, November 02-04, 2016 (pp. 121–128). ACM Press (2016)Google Scholar
  18. Vuibert, V., Stuerzlinger, W., Cooperstock, J.R.: Evaluation of docking task performance using mid-air interaction techniques. In: Proceedings of the 3rd ACM Symposium on Spatial User Interaction (pp. 44–52). ACM (2015)Google Scholar

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Muhammad Nur Affendy Nor’a¹,² (Email author)
  • Ajune Wanis Ismail¹,²
  • Mohamad Yahya Fekri Aladin¹,²
  1. Mixed and Virtual Reality Research Lab, Vicubelab, Universiti Teknologi Malaysia, Johor Bahru, Malaysia
  2. School of Computing, Faculty of Engineering, Universiti Teknologi Malaysia, Johor Bahru, Malaysia