Abstract
Interactive tabletop projections overlay an interface onto the spatial environment and enable new, frameless fields of interaction. The continually evolving technology has potential in several application areas. In contrast to traditional invasive technology setups such as displays, capacitive foils, or infrared grids, projected systems do not require any assembly on the interaction surface. This overview presents the current state of the art in touch tracking methods for tabletop projections and identifies the potential of such environments as a new generation of interfaces for current and future research in this emerging field. It reviews selected novel methods for sensor-based touch input on flat-surface projections, including a timeline of recent work and a summary of their characteristics, primary strengths, and drawbacks. Methods using machine learning are promising and linked to research in mid-air fingertip recognition, while implementations using depth and infrared sensors remain more sophisticated in terms of fingertip recognition and accuracy.
1 Introduction
Unlike traditional invasive technologies such as displays, foils, or infrared (IR) grids, which can only detect interactions on an assembled surface, tabletop interactions integrate an existing environment into the field of interaction. This integration allows a high degree of freedom to incorporate elements into the environment and provides new interaction possibilities that can fundamentally change traditional user interfaces. Projections free us from the notion of a fixed image format, as the physical environment offers the natural boundaries of interaction; they enable the fusion of the physical world with interaction and projection possibilities, without invasive assemblies in the environment.
This review covers methods for multi-touch tracking with an image projected onto a flat surface (e.g., a table or wall) using a projection device (e.g., a projector) and sensors (e.g., a depth sensor). One of the research topics of the Institute for Interactive Technologies at the University of Applied Sciences and Arts Northwestern Switzerland lies within the scope of interactive blended projections. The institute has introduced “Live Paper” [5], an interactive tabletop projection-based environment built on the findings of Xiao et al. [18], to enhance user interaction scenarios. The field of tabletop projections is continuously developing in terms of interaction and interface design, with potential in multiple areas of application. So far, development has been limited by the capabilities of existing sensors and projection technologies; with new sensors and methods now available, there is potential for further progress.
This review intends to determine the current state of touch tracking for tabletop projections for use in future projects and to identify the state of the art and the potential of such environments as a new generation of interfaces for current and future research at the institute. It examines which methods are relevant to the next development steps and identifies trends for future research. The main research question, “What is the current state and where lies the potential for current research?”, should cover emerging trends and identify further research areas for this application. The following criteria must be fulfilled by a published paper to be included in this review:
- published between 2015 and 2019
- projection-based on a non-moving surface (e.g., wall or table)
- no assembly of the surface needed
- multiple fingertip detection independent of the sensor angle (i.e., a fingertip behind another fingertip should be recognized if it is in the detection area)
This review is not intended to cover the history of projection-based touch tracking; where required, papers published before 2015 or not meeting the selection criteria are mentioned. The review emerged from a systematic overview of different methods in tabletop projections. As the selected papers do not provide sufficient statistical data and are difficult to compare, this review excludes statistical comparisons and follows a narrative approach.
2 Overview of Interactive Tabletop Projection Methods
Table 1 shows an overview of milestones in touch tracking for tabletop projections meeting the criteria. For a more detailed summary of the history of projected interfaces, see Xiao et al. [16].
Comparability. Unfortunately, not all methods and measurements can be compared with each other, as they use different sensors and platforms, or not all results are published. For example, Chai et al. [2] state that they could not compare their results to Xiao et al. [18] because of the limitations of the environment and sensors. Figure 1 shows the citation graph of the chosen papers, revealing that Lee et al. [10] did not refer to any of the other selected papers, which makes their work difficult to compare. To give an overview, Table 2 shows the primary subjective strengths and drawbacks of the selected papers together with their characteristics. Fujinawa et al. [6], Gao et al. [7], Gregor et al. [8], and Wu et al. [15] published too little technical information to be included.
2.1 Review of Selected Methodical Approaches
As the approaches differ on a conceptual level (e.g., depth segmentation or hand pose estimation) as well as in their use of separate or combined sensors, we identify the individual tasks in touch recognition, which also structure the following sections. Each section contains the approaches of selected papers that made notable public contributions to that topic. A touch needs to be detected (see Sect. 2.2). Such a detected touch can be associated with an approximate region (see Sect. 2.3) and reduced to a touchpoint, whose accuracy is measured as the distance from the intended target point to the estimated point (see Sect. 2.4). The detected touchpoint can be associated with the left or right hand (see Sect. 2.5) or further classified by fingertip, such as the thumb, index finger, middle finger, ring finger, or little finger (see Sect. 2.6).
2.2 Touch Detection
The challenge of touch detection is sensing when a finger has physically contacted a surface [18, p. 1]. Cadena et al. [1] compare the average depth of the assumed touchpoint and its neighboring pixels (to cope with noise) to a threshold above the learned background. Both Cadena et al. [1] and Xiao et al. [18] apply hysteresis to avoid rapid changes in the touch state. Xiao et al. [18] distinguish hovering from contact by including the neighborhood around the estimated touchpoint before comparing it to the threshold. Lee et al. [10] determine the touch region by averaging the depth values of several measurements, to reduce sensor noise, and comparing them to the initially calibrated background using a threshold [10, p. 4]. They then apply 3 × 3 block patterns to the non-background neighboring pixels of a touch region that lie below a threshold in the depth image to detect the touchpoint. Zhang et al. [19] propose a finger-modeling approach (see Sect. 2.3) to determine whether a fingertip contacts the surface, comparing the depth values of the fingertip to those of the area around it.
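To make the shared thresholding-and-hysteresis idea concrete, the following is a minimal Python/NumPy sketch under our own assumptions; the neighborhood size and millimetre thresholds are illustrative values and are not taken from the cited papers.

```python
import numpy as np

# Illustrative thresholds in millimetres; the reviewed papers tune such
# values per sensor, so the numbers here are assumptions for this sketch.
TOUCH_DOWN_MM = 8.0   # a finger closer to the surface than this counts as contact
TOUCH_UP_MM = 14.0    # it must lift beyond this height again to release

def height_above_surface(depth, background, point, win=2):
    """Average height of a small neighborhood above the learned background,
    averaging neighboring pixels to cope with depth noise (cf. [1, 18])."""
    y, x = point
    patch_depth = depth[y - win:y + win + 1, x - win:x + win + 1]
    patch_bg = background[y - win:y + win + 1, x - win:x + win + 1]
    return float(np.mean(patch_bg - patch_depth))  # > 0 means above the surface

def update_touch_state(is_down, depth, background, point):
    """Hysteresis: separate touch-down and touch-up thresholds avoid rapid
    toggling of the touch state, as described for [1, 18]."""
    h = height_above_surface(depth, background, point)
    if not is_down and h < TOUCH_DOWN_MM:
        return True
    if is_down and h > TOUCH_UP_MM:
        return False
    return is_down
```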
2.3 Touch Region Detection
Touch region detection describes how the regions of interest (ROI), such as hands and fingers, are extracted while omitting the background. Xiao et al. [16, 18] divide touch tracking systems into background-modeling and finger-modeling approaches. A background-modeling approach detects contact with the surface by comparing current depth values with a background map [18, pp. 2–3]. In contrast, finger-modeling approaches do not require background data because they segment fingers based on their characteristics [18, pp. 2–3].
Wilson [14] suggested a static background model, comparing the surface depth values using a minimal and a maximal threshold. Choi et al. [4], Son et al. [13], and Lee et al. [10] likewise capture a static background depth image once. Cadena et al. [1] extend such methods by using the arithmetic mean of each depth pixel to calibrate the background model initially. Chai et al. [2] use the first 30 frames to build the background model and calculate a per-pixel depth-difference background histogram. Lee et al. [10] allow the background depth values to be obtained either as the average, which is faster but can be incorrect if not enough depth values are available, or as the mode, which is slower because sorting is required. Xiao et al. [18] introduced a statistical model of the background, inspired by “WorldKit” [17], to enable dynamic updating and thus allow changes on the projection surface. Son et al. [13] detect the ROI using thresholds and extract the three largest connected components. Zhang et al. [19] remove static objects from the background to extract the hand blobs in the foreground image.
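As an illustration of the background-modeling approaches above, the following Python sketch (assuming NumPy and OpenCV) calibrates a per-pixel mean background and extracts the largest connected foreground components; the height limits and the number of kept components are our own illustrative assumptions, not values from the cited papers.

```python
import numpy as np
import cv2

def calibrate_background(depth_frames):
    """Per-pixel arithmetic mean over an initial set of depth frames,
    in the spirit of Cadena et al. [1]; Chai et al. [2] similarly use
    the first 30 frames to build their model."""
    return np.mean(np.stack(depth_frames, axis=0), axis=0)

def extract_hand_blobs(depth, background, min_mm=4.0, max_mm=250.0, keep=3):
    """Foreground pixels between a minimal and maximal height above the
    background (cf. Wilson [14]); the largest connected components are
    kept as candidate hand regions (cf. Son et al. [13])."""
    height = background - depth  # height above the surface
    mask = ((height > min_mm) & (height < max_mm)).astype(np.uint8)
    n, labels, stats, _ = cv2.connectedComponentsWithStats(mask, connectivity=8)
    # Sort the non-background components (label 0 is background) by area.
    order = np.argsort(stats[1:, cv2.CC_STAT_AREA])[::-1] + 1
    return [(labels == i).astype(np.uint8) for i in order[:keep]]
```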
2.4 Touchpoint Estimation
Both Xiao et al. [18] and Cadena et al. [1] complement the depth sensor, due to its noise, with IR information to better detect edge contours. Cadena et al. [1] extract the farthest point with respect to the center of the hand, additionally use a k-curvature algorithm to support closed fists or joined fingers, and correct the touchpoint towards the edges extracted from the IR image. Xiao et al. [18] extract the farthest point with respect to the center of each finger; if this fails, forward projection using the arm and hand position is used. Son et al. [13] compute the skeleton of the connected components using morphological operations, calculate the centroid and 20 corresponding extremal points using Dijkstra’s algorithm, and test them against an introduced complementary fingertip model with a cost function. Lee et al. [10] determine the touchpoint using a bounding box and 3 × 3 block patterns. The touch path is corrected using a predicted position, the measured position, and a weight parameter to counter the slow response in case of a sudden change in touchpoint movement. Fujinawa et al. [6] introduce interaction modes for different hand poses: a convolutional neural network (CNN) detects positions on the hand (e.g., fingertips, finger joints, center of the palm), obtains the touch position by comparing the depth value of each fingertip to the background depth, and selects an interaction mode (such as the index posture) to reduce false fingertip detections. Matsubara et al. [11] propose a method that detects touch from the shape of the finger’s shadow using two IR lights.
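The following Python/OpenCV sketch illustrates the farthest-point idea shared by several of these methods, together with a simple weighted path correction; it is a simplified rendering under our own assumptions that omits the k-curvature, IR-edge, and prediction details of the cited approaches, and the weight value is illustrative.

```python
import numpy as np
import cv2

def farthest_contour_point(hand_mask):
    """Touchpoint candidate as the contour point farthest from the hand
    centre, the basic idea behind [1, 18]; their refinements with
    k-curvature and IR-edge correction are omitted in this sketch."""
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_NONE)
    contour = max(contours, key=cv2.contourArea).reshape(-1, 2)
    m = cv2.moments(hand_mask, binaryImage=True)
    centre = np.array([m["m10"] / m["m00"], m["m01"] / m["m00"]])
    distances = np.linalg.norm(contour - centre, axis=1)
    return tuple(contour[np.argmax(distances)])  # (x, y) in image coordinates

def smooth_touch_path(predicted, measured, weight=0.6):
    """Blend of predicted and measured position with a weight parameter,
    in the spirit of the path correction of Lee et al. [10]; the value
    0.6 is an illustrative assumption."""
    predicted, measured = np.asarray(predicted), np.asarray(measured)
    return tuple(weight * measured + (1 - weight) * predicted)
```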
2.5 Arm, Hand, and Fingertip Detection
Cadena et al. [1] extract the arm–hand contours from the ROI and use the k-means algorithm to create two clusters, classifying the area containing more points as the hand region. Xiao et al. [18] use flood-fill segmentation to form a hierarchy containing arm, hand, finger, and fingertip and to reject finger-like objects. Xiao et al. [18] use only the IR edges to fill fingertips, as the fingers merge with noise in the depth image, but provide a fallback to depth-only touch tracking if the IR image is unusable (e.g., holes in the edge image). Fujinawa et al. [6] distinguish hands from objects and detect the positions of fingertips, finger joints, and the center of the palm using their CNN. Zhang et al. [19] extract the hand region using the IR and depth images and apply a modified convex hull algorithm to detect candidate fingertips on the hand contour.
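As a rough illustration of hull-based fingertip candidate detection, the following Python/OpenCV sketch uses plain convexity defects on the hand contour; it is not the modified convex hull algorithm of Zhang et al. [19], and the defect-depth threshold is an assumption for this sketch.

```python
import cv2

def fingertip_candidates(hand_mask, min_defect_depth_px=20.0):
    """Candidate fingertips from convexity defects of the hand contour,
    a plain convex-hull variant of the idea in Zhang et al. [19]; their
    modified hull algorithm and IR/depth fusion are not reproduced here."""
    contours, _ = cv2.findContours(hand_mask, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    contour = max(contours, key=cv2.contourArea)
    hull_idx = cv2.convexHull(contour, returnPoints=False)
    defects = cv2.convexityDefects(contour, hull_idx)
    tips = []
    if defects is not None:
        for start, end, _far, depth in defects[:, 0]:
            # Deep defects correspond to the valleys between fingers; the
            # hull points bordering them are kept as fingertip candidates.
            if depth / 256.0 > min_defect_depth_px:  # depth is fixed-point (1/256 px)
                tips.append(tuple(contour[start][0]))
                tips.append(tuple(contour[end][0]))
    return tips
```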
2.6 Fingertip Classification
Knowing which finger triggered the touch can bring extended functionality to a user (e.g., using the thumb as an eraser). As Choi et al. [4] stated, “determining the identity of a fingertip is more difficult than detecting fingertips” [4, p. 1488]. Choi et al. [4] noted that “index fingers and thumbs are the most frequently used fingers for human-computer interaction” and classify fingertips into the (left and right) index and thumb fingertips using cascaded random forests and a score function [4, p. 1487]. Chai et al. [2] developed a deep CNN-based hand pose estimation to identify fingertips.
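For illustration only, the following Python sketch (assuming scikit-learn) trains a plain random forest on synthetic fingertip descriptors; it is not the cascaded-random-forest pipeline with a score function of Choi et al. [4], and all features and labels are placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Toy illustration of random-forest fingertip classification; the
# synthetic features and labels below are placeholders only.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(400, 16))    # e.g. depth-patch descriptors of detected fingertips
y_train = rng.integers(0, 4, size=400)  # 0/1: left/right index, 2/3: left/right thumb

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

X_new = rng.normal(size=(1, 16))        # descriptor of a newly detected fingertip
print(clf.predict(X_new))               # predicted fingertip identity
```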
3 Conclusion
Concerning the main research question, “What is the current state and where lies the potential for current research?”, touch tracking on tabletop projections using different sensors and methods has been reviewed. The analysis of the selected papers suggests an increasing and promising trend towards using machine learning to estimate hand posture. These approaches are closely related to mid-air recognition, an emerging area of current research that will also provide results for touch tracking in interactive tabletop projections. Nevertheless, implementations using background modeling are more refined in terms of fingertip recognition and accuracy. Different sensors (e.g., depth and IR sensors) are used, or even combined, to make surfaces touchable. Since it is challenging to select a specific method for an applied solution, a testing framework would ensure the comparability of methods in the domain of interactive tabletop projections. It would be interesting to use low-power computing devices to enable further implementations in the emerging area of the Internet of Things (IoT). New technological possibilities, such as solid-state LIDAR or radar, could also provide new opportunities for the application of interactive tabletop projections. The classification of fingers and hands, as well as touch gestures, can improve touch recognition and offer new functionality for touch input (e.g., using the little finger as an eraser). Combining the mature results of sensor processing with new insights from anatomy and machine learning, or even new sensors (e.g., radar), would create more accurate and functional input systems.
References
Cadena, A., Carvajal, R., Guaman, B., Granda, R., Pelaez, E., Chiluiza, K.: Fingertip detection approach on depth image sequences for interactive projection system. In: 2016 IEEE Ecuador Technical Chapters Meeting, ETCM 2016, pp. 1–6. Institute of Electrical and Electronics Engineers Inc., October 2016. https://doi.org/10.1109/ETCM.2016.7750827. http://ieeexplore.ieee.org/document/7750827/
Chai, Z., Shilkrot, R.: Enhanced touchable projector-depth system with deep hand pose estimation. CoRR abs/1812.11090 (2018). http://arxiv.org/abs/1812.11090
Cheng, J., Wang, Q., Song, R., Wu, X.: Fingertip-based interactive projector-camera system. Sig. Process. 110, 54–66 (2015). https://doi.org/10.1016/j.sigpro.2014.08.043
Choi, O., Son, Y.J., Lim, H., Ahn, S.C.: Co-recognition of multiple fingertips for tabletop human-projector interaction. IEEE Trans. Multimed. 21(6), 1487–1498 (2019). https://doi.org/10.1109/TMM.2018.2880608. https://ieeexplore.ieee.org/document/8528493/
Dolata, M., et al.: Welcome, computer! How do participants introduce a collaborative application during face-to-face interaction? In: Lamas, D., Loizides, F., Nacke, L., Petrie, H., Winckler, M., Zaphiris, P. (eds.) INTERACT 2019. LNCS, vol. 11748, pp. 600–621. Springer, Cham (2019). https://doi.org/10.1007/978-3-030-29387-1_35
Fujinawa, E., Goto, K., Irie, A., Wu, S., Xu, K.: Occlusion-aware hand posture based interaction on tabletop projector. In: UIST 2019 Adjunct - Adjunct Publication of the 32nd Annual ACM Symposium on User Interface Software and Technology, pp. 113–115. Association for Computing Machinery, Inc., October 2019. https://doi.org/10.1145/3332167.3356890
Gao, Y., Huang, C.M.: PATI: A projection-based augmented table-top interface for robot programming. In: International Conference on Intelligent User Interfaces, Proceedings IUI, vol. Part F1476, pp. 345–355. Association for Computing Machinery (2019). https://doi.org/10.1145/3301275.3302326
Gregor, D., Prucha, O., Rocek, J., Kortan, J.: Digital playgroundz. In: ACM SIGGRAPH 2017 VR Village, SIGGRAPH 2017, pp. 1–2 (2017). https://doi.org/10.1145/3089269.3089288. http://dl.acm.org/citation.cfm?doid=3089269.3089288
Laput, G., Harrison, C.: SurfaceSight: a new spin on touch, user, and object sensing for IoT experiences. In: Conference on Human Factors in Computing Systems - Proceedings. Association for Computing Machinery, May 2019. https://doi.org/10.1145/3290605.3300559
Lee, D.S., Kwon, S.K.: Virtual touch sensor using a depth camera. Sensors (Switzerland) 19(4) (2019). https://doi.org/10.3390/s19040885
Matsubara, T., Mori, N., Niikura, T., Tano, S.: Touch detection method for non-display surface using multiple shadows of finger. In: 2017 IEEE 6th Global Conference on Consumer Electronics, GCCE 2017, vol. 2017-January, pp. 1–5. Institute of Electrical and Electronics Engineers Inc., December 2017. https://doi.org/10.1109/GCCE.2017.8229364
Murugappan, S., Vinayak, Elmqvist, N., Ramani, K.: Extended multitouch: recovering touch posture and differentiating users using a depth camera. In: UIST 2012 - Proceedings of the 25th Annual ACM Symposium on User Interface Software and Technology, pp. 487–496 (2012). https://doi.org/10.1145/2380116.2380177
Son, Y.J., Choi, O., Lim, H., Ahn, S.C.: Depth-based fingertip detection for human-projector interaction on tabletop surfaces. In: 2016 IEEE International Conference on Consumer Electronics-Asia, ICCE-Asia 2016. Institute of Electrical and Electronics Engineers Inc., January 2017. https://doi.org/10.1109/ICCE-Asia.2016.7804809
Wilson, A.D.: Using a depth camera as a touch sensor. In: ACM International Conference on Interactive Tabletops and Surfaces, ITS 2010, pp. 69–72 (2010). https://doi.org/10.1145/1936652.1936665
Wu, Q., Wang, J., Wang, S., Su, T., Yu, C.: MagicPAPER. In: ACM SIGGRAPH 2019 Posters, SIGGRAPH 2019, pp. 1–2. ACM Press, New York (2019). https://doi.org/10.1145/3306214.3338575. https://dl.acm.org/citation.cfm?doid=3306214.3338575
Xiao, R.: SIGCHI outstanding dissertation award: on-world computing. In: Conference on Human Factors in Computing Systems - Proceedings, pp. 1–4. ACM Press, New York (2019). https://doi.org/10.1145/3290607.3313774. http://dl.acm.org/citation.cfm?doid=3290607.3313774
Xiao, R., Harrison, C., Hudson, S.E.: WorldKit: rapid and easy creation of ad-hoc interactive applications on everyday surfaces. In: Conference on Human Factors in Computing Systems - Proceedings, pp. 879–888 (2013). https://doi.org/10.1145/2470654.2466113
Xiao, R., Hudson, S., Harrison, C.: DIRECT: making touch tracking on ordinary surfaces practical with hybrid depth-infrared sensing. In: Proceedings of the 2016 ACM International Conference on Interactive Surfaces and Spaces: Nature Meets Interactive Surfaces, ISS 2016, pp. 85–94. Association for Computing Machinery, Inc., November 2016. https://doi.org/10.1145/2992154.2992173
Zhang, L., Matsumaru, T.: Near-field touch interface using time-of-flight camera. J. Robot. Mechatron. 28(5), 759–775 (2016). https://doi.org/10.20965/jrm.2016.p0759. https://www.fujipress.jp/jrm/rb/robot002800050759