1 Introduction

Graphical user interfaces (GUIs) that utilize touchscreens as an input modality have become ubiquitous. In everyday life, we can observe many occasions where people use touchscreens to interact with technology. Touchscreens enable flexible, context-sensitive adaptation of the GUI while providing an immediate form of interaction without requiring additional mediating input devices (e.g., trackpads or keyboards). Thus, they have become the primary way to interact with many technologies [1], for example, modern smartphones, tablets, in-car navigation systems, or vending machines. In the aviation domain, too, the use of touchscreens is gaining momentum, as their flexibility and direct manipulation support offer both display and control input [2].

These interfaces have also found applications in the context of space travel. At the International Space Station (ISS), a new tablet-based system, the IPV Tablet system, has recently been deployed to support the ISS crew in daily tasks, including scientific experiments, spacewalk preparations, and maintenance of life support and other systems [3]. The IPV Tablet system allows the crew to flexibly access task details via a mobile tablet directly at the location of an experiment or maintenance task [3]. Another prominent example is one of SpaceX’s most recent spacecraft for crewed flight, the Crew Dragon, which successfully launched in May 2020 and features an almost entirely touchscreen-based interface [4].

The great potential of touchscreen-based interfaces for future developments in space travel implies the need for sound performance evaluations with these input devices, because human performance is critical for mission success during spaceflight. For example, maneuvering a spacecraft or efficiently interacting with a complex data-rich user interface requires adequate cognitive and sensorimotor skills. Thus, it is crucial to understand how conditions that occur during space missions, specifically microgravity, might affect human performance. This concerns the mission’s start phase, in which astronauts need to adapt to physiological conditions in space. Furthermore, in the light of planned long-term space missions, their sheer duration may influence performance [5]. While previous studies have investigated human performance during microgravity in tasks that require pointing or similar movements [6,7,8,9,10], most of them only considered tasks with non-interactive objects or “conventional” input modalities (e.g., trackballs, switches, and knobs). While a recent study by Holden et al. [11] shed first light on the effects of microgravity on task performance with touch devices, more research appears to be necessary to broaden and consolidate our understanding of this critical subject. Thus, the present study aims to contribute to this research topic by exploring multidirectional tapping performance with touchscreen-based interfaces in a pointing task during a microgravity analog, namely head-down tilt bed rest (HDT).

A highly influential tool for studying pointing movements is Fitts’ law, a model describing the relationship between movement time and the size and distance of a target [12, 13]. The original model and its variations have been shown to apply in various situations [14,15,16]. However, so far Fitts’ law has not been used to describe tapping performance under long-term physiologically constraining conditions such as HDT. It is unclear to what extent the relationship suggested by Fitts’ law might be altered through these conditions. Hence, in addition to performance investigation, the present study examines the validity of Fitts’ law in microgravity analogs.

This paper offers three main contributions. First, we provide findings on Fitts’ law with touch devices under HDT conditions and extend the existing literature on the ubiquity of Fitts’ law in various contexts. Second, the present study adds to the limited knowledge on the effects of space travel on human performance with touch devices. This is highly relevant given the increasingly frequent application of touchscreen-based interfaces in space travel and elsewhere. Finally, the paper offers directions for future research regarding human performance with touch devices in microgravity and microgravity analogs.

2 Related work

2.1 Task performance in microgravity and microgravity analogs

The influence of microgravity can be studied during actual spaceflights as well as during parabolic flights, which provide brief near-weightless environments. Moreover, several methodologies can be used as microgravity analogs (i.e., Earth-based simulations of microgravity) to estimate the effects of microgravity on the human body [17]. They are a resource-efficient approach to studying a relatively large number of participants over an extended period. A commonly applied microgravity analog is HDT, which mimics several characteristics of the microgravity environment in space, including cephalad fluid shift, cardiovascular deconditioning, and body unloading [17]. While HDT is less reliable in simulating certain other effects on the human body during a space mission (e.g., spinal dysfunction and radiation hazards), at present, HDT with normal volunteers is the most common analog for microgravity simulation [18, 19].

Routines and tasks conducted during space travel often entail target acquisition via pointing movements (e.g., pressing a physical button on an instrument panel or tapping an interaction element on a touchscreen). In this context, studies have shown that microgravity can adversely affect human performance, particularly in tasks related to sensorimotor skills, including pointing movements.

Berger et al. [6] found that the movement duration of horizontal arm pointing movements to visual targets increased during short-term and long-term spaceflight. Additionally, movement peak velocities as well as acceleration and deceleration peak values decreased compared to baseline measures. This is in line with findings by Bock et al. [8], where participants pointed at visual targets during spaceflight without seeing their hands. The speed of pointing movements also declined significantly compared to conditions of terrestrial gravity. Another study by Bock et al. [7] investigated the accuracy of pointing movements toward mirror-viewed targets throughout parabolic flights. Results indicate that pointing movements were less accurate during microgravity compared to normal gravity. Worse performance in these tasks might also correlate with an increase in reaction time, which has been observed in the context of parabolic flights [9] and HDT [20]. Moreover, a study by Steinberg et al. [10] examined performance in a more realistic complex task instead of simple laboratory-type skills. They implemented a realistic instrument-control task in which participants controlled a simulated power plant via knobs and switches during short-term microgravity intervals of parabolic and level flight [10]. Among other findings, control efficiency (operationalized as net earnings of the simulated power plant) substantially decreased in microgravity, and hand velocity slowed down compared to normal gravity [10].

A recent study by Holden et al. [11] investigated the effects of space travel on fine-motor performance with touch devices for the first time. In the study, astronauts completed a test battery before, during, and after a standard duration (6-month) ISS mission [11]. Compared to a ground-based control group, astronauts demonstrated a decline in fine motor skills during gravitational transitions, specifically in the first week after reaching orbit and following landing [11]. However, no decline in motor skills was observed during the remainder of the in-flight period after the first week of adaptation [11].

In summary, previous studies suggest a decrease in human performance in tasks that require pointing or similar movements. However, despite the increasingly common application and great potential of touch devices for space travel, studies that investigate how prior results transfer to interactions involving touchscreen-based interfaces are scarce. Research on this subject is particularly relevant as previous studies have shown that, in general, interaction via touch can differ from interaction via indirect input devices (e.g., computer mouse) in terms of various parameters such as input speed and error rate [21,22,23]. A recent study by Holden et al. [11] provided first valuable insights into the effects of microgravity on task performance with touch devices. As these findings are based on a single paper, more research appears to be pertinent to solidify and expand knowledge on this topic.

2.2 Assessing human performance with touchscreen-based interfaces: Fitts’ law and performance in target acquisition tasks

A standard approach to investigate performance with touchscreen-based interfaces comprises tasks originating from works by Fitts [12]. Several decades ago, a study by Fitts [12] laid the foundation for what is now commonly known as Fitts’ law. Fitts’ law is a model that predicts the duration of pointing movements in target acquisition tasks. That might include pointing to a physical object as well as pointing movements to interact with a GUI via indirect (e.g., a mouse) or direct input (e.g., finger tap on a touchscreen). Fitts’ law uses the distance (also referred to as amplitude) to the target as well as its size as parameters. In general, the time to reach a target increases with a smaller target and a greater distance to the target.

The original model and its variations have been shown to apply for different input devices [16], user groups of different age and gender [14] as well as specific contexts, such as underwater environments [15]. For example, Kerr [15] compared performance in a tapping task under water and on land. The study revealed that movement times were faster on land than under water; however, Fitts’ law was still found to be valid in both conditions.

Thus, Fitts’ law has become a highly influential tool for research and design in human–computer interaction and ergonomics [13]. Also, specific to touch interactions, Fitts’ law models have been successfully applied in various contexts. This includes modeling and evaluating two-dimensional touchscreen tapping [24], tapping fatigability [25], thumb-based versus finger-based input [26], one-handed smartphone interaction [27], or input behavior when using two fingers on a multi-touch device [28]. However, until now, Fitts’ law has not been investigated as to how it can model tapping performance under long-term physiologically constraining conditions represented by HDT. Therefore, it is currently not possible to determine whether such conditions might alter the strength of the relationship suggested by Fitts’ law.

Fitts’ motivation for his studies was to investigate whether human performance in target acquisition tasks could be quantified [12, 29]. He designed a target acquisition task in which participants were instructed to alternately tap on two rectangular target plates with a metal-tipped stylus as quickly as possible while avoiding mistakes [12]. Width and distance of the target plates were systematically varied in one dimension. Based on empirical findings from his experiments, Fitts [12] introduced a metric to determine the difficulty of a task, the index of difficulty (ID). Inspired by an analogy from information theory in which the distance to the center of a target (A; in inches) corresponds to the signal and the width of a target (W; in inches) corresponds to noise, the ID quantifies the difficulty of a target acquisition task in bits. To quantify performance in such tasks, Fitts also introduced Throughput, which is the ratio between ID and movement time (i.e., the time to reach the target) [12].
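For orientation, the original formulation is commonly written as shown below. This is the standard textbook form rather than a quotation from [12]; the symbol MT for movement time and the regression constants a and b are notation assumed here for illustration.

$$ID={\log}_{2}\left(\frac{2A}{W}\right), \qquad MT=a+b\cdot ID, \qquad Throughput=\frac{ID}{MT}$$

Under this form, halving the target width W or doubling the distance A each adds one bit to the ID.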

Researchers after Fitts have introduced several variants and modifications of the original model and respective metrics. This includes the introduction of the Shannon formulation based on information theory [13, 30, 31] to improve predictive quality of the model. While the theoretical foundation of the Shannon formulation has been disputed [32, 33], this form is most frequently applied for calculating the ID [13]. Moreover, as Throughput does not explicitly consider Error Rates (i.e., the proportion of trials in which a target is missed), these are often reported as a supplementary performance metric [13]. Another important addition is using effective values to account for the spatial variability in selection coordinates [34,35,36].

The formalizations of Throughput and Error Rate can be used to compare the performance characteristics of different non-keyboard input devices (e.g., mice, trackpads, touchscreens). The International Organization for Standardization (ISO) has adopted this approach in ISO 9241-9, which contains guidelines and testing procedures for evaluating non-keyboard input devices, including one-directional tapping, multidirectional tapping, dragging, tracing, and free-handwriting [37]. The tasks described in ISO 9241-9 have become the preferred way to evaluate non-keyboard input devices [38] and have been used to assess various devices, including videogame controllers [39], touchscreen-based interfaces for older adults [40], or input devices in aircraft cockpits [41].

Although tasks based on ISO 9241-9 are typically used to compare performance for different input devices, they also appear to be a suitable methodology to assess performance with only one device under varying experimental conditions, as they are easy to standardize and replicate and provide a theoretical foundation for performance assessment. Thus, to investigate changes in task performance with touchscreen-based interfaces during HDT, we chose to implement a multidirectional tapping task based on ISO 9241-9. We specifically chose a multidirectional tapping task as opposed to other ISO 9241-9 tasks because it requires pointing movements in two dimensions, representing a highly universal movement pattern when interacting with touchscreen-based interfaces. Here, we consider four measurements: Movement Time, Throughput, Error Count (respectively Error Rate), and Touch Duration (these measurements are further described in the Method section). In contrast to other indicators, Touch Duration (i.e., the duration of a participant touching the screen when selecting a target) is not an established performance indicator in the context of multidirectional tapping tasks. However, Touch Duration was included in our study because it quantifies a small but central part of the motion sequence during the interaction. Thus, it possibly allows us to capture more subtle effects of simulated microgravity on the interaction with touch devices.

3 Research questions and hypothesis

To sum up, prior research generally suggests that pointing performance decreases in microgravity and microgravity analogs. However, studies focusing on touch device usage in this context are scarce, despite the rising relevance of touchscreen-based interaction technology in aerospace and findings from previous research showing that, in general, the interaction with touch devices is not equivalent to traditional input devices [21,22,23]. Our analysis of state-of-the-art methodological approaches reveals that the ISO 9241-9 multidirectional tapping task is a useful methodology for investigating performance with touch devices. Another conclusion from previous research is that the validity of Fitts’ law has not been investigated for touch device interaction in microgravity or microgravity analogs.

To fill these gaps, the present study investigates whether multidirectional tapping performance with touchscreen-based interfaces is affected by (long-term) HDT. Moreover, it aims to examine the validity of Fitts’ law in HDT. Regarding these objectives, the following research questions (RQs) are formulated:

RQ 1: To what extent is the relationship of Movement Time and ID as suggested by Fitts’ law altered in HDT conditions?

RQ 2: How does the performance (in terms of Throughput, Movement Time, Error Rate, and Touch Duration) change during the transition from normal conditions to HDT?

RQ 3: How does performance (in terms of Throughput, Movement Time, Error Rate, and Touch Duration) evolve during long-term HDT?

4 Method

4.1 Sample

24 healthy participants aged 24 to 55 years (M = 33.33, SD = 9.38) took part in the study. 16 individuals (66.67%) identified themselves as male and eight (33.33%) as female. On a 7-point Likert scale, participants reported high familiarity with touch devices (M = 6.25, SD = 0.94). Affinity for technology interaction (ATI) was reported by the sample with M = 3.65 and SD = 1.20 on a 6-point Likert scale (see Sect. 4.5 for further details).

4.2 Materials

4.2.1 Multidirectional tapping task

A multidirectional tapping task adapted from ISO 9241-9 was implemented [37]. The technical implementation was based on the “FittsTouch” software [42]. The goal of the task was to select highlighted targets on a touchscreen as quickly as possible while avoiding mistakes.

A task session was structured into blocks and trials. Each block consisted of 11 targets to select, laid out in a circle (see Fig. 1). The selection of a target corresponded to one trial. Thus, each block comprised 11 trials in total. At the beginning of a block, the first target to select was highlighted. After selection, highlighting was set to the target opposite to the previously selected target. Selection continued in a rotating pattern until all targets were selected once [29]. Missing a target was registered as an error, indicated by highlighting the respective target in red. In this case, participants had to correctly select the corresponding target before the software advanced to the next target.
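The rotating selection pattern can be sketched as follows. This is a minimal illustration, not the FittsTouch implementation: the function names, the index convention (targets numbered counterclockwise from 0), the fixed step of (n + 1) / 2, and the use of the target distance as the circle diameter are all assumptions made for the example.

```python
import math


def target_order(n_targets: int = 11, start: int = 0) -> list[int]:
    """Return target indices in selection order (assumes an odd target count).

    Each step jumps roughly across the circle, so consecutive targets are
    approximately opposite each other and every target is visited once.
    """
    if n_targets % 2 == 0:
        raise ValueError("this sketch assumes an odd number of targets")
    step = (n_targets + 1) // 2  # hypothetical step size: 6 for 11 targets
    return [(start + k * step) % n_targets for k in range(n_targets)]


def target_centers(n_targets: int, distance_px: float) -> list[tuple[float, float]]:
    """Place targets evenly on a circle whose diameter is the target distance."""
    radius = distance_px / 2.0
    return [
        (radius * math.cos(2 * math.pi * i / n_targets),
         radius * math.sin(2 * math.pi * i / n_targets))
        for i in range(n_targets)
    ]


order = target_order(11)
centers = target_centers(11, 589.76)
print(order)  # [0, 6, 1, 7, 2, 8, 3, 9, 4, 10, 5]
```

Because the step size and the number of targets share no common factor, the sequence covers all 11 targets exactly once per block.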

Fig. 1 Example task conditions with varying distance and width of the targets. a Example task with low difficulty (small distance, large target width), b example task with high difficulty (large distance, small target width)

Between blocks, the task was paused, and participants continued with the next block by tapping a button on the screen. A task session consisted of 16 blocks. All trials within a given block used the same combination of distance and width of the targets. Distance and width were systematically varied between blocks in a randomized order. To provide a realistic variability in task difficulty [13], four different values for distance (294.88, 589.76, 884.64, and 1179.52 pixels) and width (39.32, 78.63, 157.27, and 235.90 pixels) were chosen, resulting in 16 unique combinations of distance and width with the ID ranging from 1.17 to 4.95.
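The stated ID range can be reproduced from the distance and width values with a short sketch. It assumes the nominal (not effective) distances and widths and the Shannon formulation of the ID; the variable and function names are illustrative only.

```python
import math
from itertools import product

# Nominal target distances and widths in pixels, as listed in the text.
DISTANCES_PX = [294.88, 589.76, 884.64, 1179.52]
WIDTHS_PX = [39.32, 78.63, 157.27, 235.90]


def index_of_difficulty(distance: float, width: float) -> float:
    """Shannon formulation of the index of difficulty, in bits."""
    return math.log2(distance / width + 1.0)


# One ID per combination of distance and width (16 combinations in total).
ids = {
    (a, w): index_of_difficulty(a, w)
    for a, w in product(DISTANCES_PX, WIDTHS_PX)
}

print(len(ids), round(min(ids.values()), 2), round(max(ids.values()), 2))
# 16 1.17 4.95
```

The easiest condition pairs the smallest distance with the largest width, and the hardest pairs the largest distance with the smallest width.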

4.2.2 Hardware

The multidirectional tapping task was completed on an HTC Nexus 9 tablet running the Android operating system (version 7.1.1). The screen diagonal of the tablet measured 226 mm (8.9 inches) with a resolution of 2048 × 1536 pixels and a pixel density of 287 dpi. Questionnaires were administered using the same device. To allow for comfortable, reproducible positioning of the tablet during the study, a tablet stand (Nbryte® Tablift®) was used (see Fig. 2a).

Fig. 2 a Example of HTC Nexus 9 tablet mounted on the Nbryte® Tablift® tablet stand. b Schematic representation of the study setup in HDT and non-HDT conditions

4.3 Research design

Data were collected as part of the Artificial Gravity Bed Rest–European Space Agency (AGBRESA) study, conducted at the German Aerospace Center (DLR) in Cologne, Germany. The study was conducted in accordance with the Declaration of Helsinki and was registered at the German Clinical Trials Register under number DRKS00015677. The protocol was approved by the ethics commissions of the Medical Association North Rhine (number 2018143) and the National Aeronautics and Space Administration (NASA; Johnson Space Center, Houston, United States of America). Participants were recruited via social media and other types of media advertisements. Subjects who stated their willingness to participate in DLR-based studies and passed an initial routine screening on key inclusion and exclusion criteria were selected to participate in the study. Detailed inclusion and exclusion criteria are described in Appendix A.

The study design itself was defined by NASA and the European Space Agency (ESA). DLR implemented their specifications and conducted the study according to international guidelines [43]. Accordingly, our approach to data collection for the multidirectional tapping task was subjected to the general study design.

The study employed a full within-subjects design in which all participants completed 60 days of HDT with measurements before, during, and after HDT. During HDT, participants were put into a −6° head-down tilt position (i.e., the foot end of the bed was tilted upwards at a −6° angle) and were not permitted to sit up or stand up (see Fig. 2). Due to limited resources (e.g., the number of available beds), the study was conducted in two successive parts with 12 participants each. Additional details of the study design that are beyond the scope of the present study can be found in [44].

4.4 Procedure

Data were obtained at six measurement points (Time of Measurement). Each participant took part in one task session per measurement point. Accordingly, individual participants completed 1056 trials in total (6 measurement points × 16 blocks × 11 trials). The baseline measurement was carried out 12 days before HDT (BDC−12). The next measurements followed 5 days (HDT5), 21 days (HDT21), and 56 days (HDT56) after the start of HDT. Finally, the last two measurements were performed 4 days (R+4) and 90 days (R+90) after HDT. Data at R+4 were obtained for only 12 participants due to corruption of collected data.

During the study, all measurements were taken while lying down (see Fig. 2), during HDT at −6° head-down tilt position, before and after at 0°. At BDC−12, the tablet was positioned on the tablet stand on the participant’s bed. Participants had to place their elbows on a pad. The tablet position was adjusted so that the participants did not have to lift their heads and could reach all areas of the screen with their dominant hand without lifting their elbow from the pad. The position was then photographed to reproduce it as accurately as possible for the following measurement points. Before the multidirectional tapping task at BDC−12, subjects filled in the first questionnaire on the tablet. For the multidirectional tapping task, participants were instructed to select the highlighted targets as quickly as possible without making mistakes using only their dominant hand. At each measurement point, participants answered another questionnaire on the same tablet after completing the task. While filling in questionnaires, they were allowed to use both hands. All instructions and questionnaires were administered in German.

4.5 Measures

All data needed to compute the metrics regarding the multidirectional tapping task were automatically collected via the “FittsTouch” software [42].

4.5.1 Throughput

Throughput is an indicator of the speed and accuracy of selections. Calculation of Throughput followed recommendations by MacKenzie [29], Soukoreff and MacKenzie [38], and the ISO 9241-9 standard [37]. Hence, Throughput (in bits/s) was computed as an aggregate measure for each block. For calculating the underlying ID, the Shannon formulation with the effective distance \({A}_{e}\) and effective width \({W}_{e}\) of the targets was used.

$$ID=\log_{2}\left(\frac{A_{e}}{W_{e}}+1\right) \tag{1}$$
$$Throughput=\frac{ID}{\text{Movement Time}} \tag{2}$$

Thus, higher Throughput using effective values indicates better performance in terms of both speed and accuracy of selections.
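A per-block computation along these lines might look like the following sketch. It assumes the convention commonly used in the Fitts' law literature, in which the effective width is 4.133 times the standard deviation of the selection coordinates along the movement axis and the effective distance is the mean of the actual movement distances. The function name, argument names, and sample values are hypothetical and are not taken from the study data.

```python
import math
import statistics


def effective_throughput(
    amplitudes_px: list[float],
    deviations_px: list[float],
    movement_times_s: list[float],
) -> float:
    """Throughput (bits/s) for one block, using effective values.

    amplitudes_px:    actual movement distance of each trial
    deviations_px:    signed offset of each touch point from the target
                      center, measured along the movement axis
    movement_times_s: movement time of each trial in seconds
    """
    a_e = statistics.mean(amplitudes_px)            # effective distance
    w_e = 4.133 * statistics.stdev(deviations_px)   # effective width (assumed convention)
    id_e = math.log2(a_e / w_e + 1.0)               # Shannon formulation, Eq. 1
    return id_e / statistics.mean(movement_times_s) # Eq. 2


# Hypothetical block with four trials, for illustration only.
tp = effective_throughput(
    amplitudes_px=[600.0, 600.0, 600.0, 600.0],
    deviations_px=[-10.0, 10.0, -10.0, 10.0],
    movement_times_s=[0.5, 0.5, 0.5, 0.5],
)
print(round(tp, 2))
```

A wider scatter of touch points increases the effective width and therefore lowers the effective ID, so a participant who is fast but imprecise does not obtain a higher Throughput than one who is slower and more accurate.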

4.5.2 Movement time

As a supplementary measure to Throughput, Movement Time was considered indicating the speed of selections. As measured in the present study, it corresponds to the time between the onset of a target to select and the subsequent selection of a given target.

4.5.3 Error rate

An error corresponds to missing a given target. The Error Rate is the number of first selections that were erroneous as a ratio of the total number of targets in a block [41]. Accordingly, a participant’s Error Count in a block could vary between 0 and 11 (number of trials per block).

4.5.4 Touch duration

Touch Duration refers to the duration of a participant touching the screen when selecting an individual target (i.e., the duration of physical contact of the participant’s finger and the tablet’s screen during target selection).

4.5.5 Additional measurements

For sample description, data on participants’ Gender, Age, Previous Experience with Touch Devices, and ATI were gathered. These variables were collected at BDC−12, before participants completed the multidirectional tapping task. As part of the AGBRESA study, which focused on a multitude of human physiological, neurological, and behavioral spaceflight outcomes, additional variables were obtained that are not reported in the context of this paper.

Previous Experience with Touch Devices was measured with a single item (“How familiar are you with touch displays (e.g., smartphones, tablets, ticket machines)?”) on a 7-point Likert scale from 1 (not familiar at all) to 7 (very familiar).

ATI was measured using the ATI scale [45]. The scale contains 9 statements (e.g., “I like to occupy myself in greater detail with technical systems”) about a person’s tendency to actively engage with technology [45]. These items are rated on a 6-point Likert scale ranging from 1 (completely disagree) to 6 (completely agree). Internal consistency of the ATI scale was good, with α = 0.87.

4.6 Data processing and analysis

Data processing and statistical analyses were conducted using R 4.0.2 [46]. Consistent with the recommendations by Soukoreff and MacKenzie [38], Throughput, Movement Time, Error Rate, and Touch Duration were analyzed as mean aggregates by block.

As the collected measurements are susceptible to erroneous and extreme values that might negatively affect the validity of further statistical procedures, outlier analyses were performed in two steps. First, univariate outliers for Movement Time and Touch Duration were investigated by block. Following data exploration, observations (i.e., individual trials) located more than three interquartile ranges (IQRs) away from the median of the respective variable were removed before aggregation by block. Second, outlier analysis was performed on the aggregated data. Based on inspection of the univariate distributions of Movement Time, Touch Duration, and Throughput by Time of Measurement, only observations within two IQRs from the median of the respective variables were retained for further analyses. Investigation of outliers was not performed for Error Rate, as the variable is bounded between 0 and 1.
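The median-and-IQR rule used in both steps can be sketched as follows. The analysis itself was run in R, so this Python version is only an illustration: the function name, the quartile method, and the sample values are assumptions, and the exact quantile definition used in the study is not stated in the text.

```python
import statistics


def remove_outliers(values: list[float], k: float = 3.0) -> list[float]:
    """Keep only values within k interquartile ranges of the median.

    k = 3.0 corresponds to the trial-level step and k = 2.0 to the
    step applied to the aggregated data.
    """
    # Quartiles via linear interpolation between data points (assumed method).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    median = statistics.median(values)
    lower, upper = median - k * iqr, median + k * iqr
    return [v for v in values if lower <= v <= upper]


# Hypothetical Movement Times (ms) for one block; 2000 ms is an extreme value.
trials = [400, 420, 410, 430, 415, 425, 405, 2000]
print(remove_outliers(trials, k=3.0))
# [400, 420, 410, 430, 415, 425, 405]
```

Centering the bounds on the median rather than the mean keeps a single extreme trial from shifting the acceptance range.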

Changes in Throughput, Movement Time, and Touch Duration were assessed with linear mixed effects models (LMMs) implemented via the lme4 and lmerTest packages [47, 48]. Separate models were fitted for each dependent variable. The models were specified with Time of Measurement as a categorical fixed effect. By-subject and by-item random intercepts (i.e., varying intercepts for participants and unique combinations of target distance and width) and their interaction were defined as random effects. Significance of the predictors in the final model was based on F tests employing the Satterthwaite method for denominator degrees of freedom [48,49,50].

The effect on Error Rate was examined by first transforming Error Rate into a count bounded between 0 and 11 (Error Count). Data were then analyzed with Error Count as a dependent variable in a generalized LMM with a zero-inflated Poisson distribution for the response via the glmmTMB package [51]. Like models for other dependent variables, the model was fitted with Time of Measurement as a fixed effect and varying intercepts for subject, item, and their interaction as random effects. Significance of predictors was determined with likelihood ratio tests.

If Time of Measurement was a significant predictor in a given model, differences between individual levels of Time of Measurement were explored. Accordingly, pairwise contrasts for the estimated marginal means of the factor levels were computed using the emmeans package [52]. Family-wise error rate was controlled by adjusting obtained p-values using the correction method introduced by Holm [53] as this method is valid under arbitrary assumptions [46]. For all statistical comparisons, a threshold of α = 0.05 (two-tailed) was adopted to determine statistical significance.
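For readers unfamiliar with Holm's step-down procedure, a minimal sketch is given below. It is a generic illustration with hypothetical p-values, not the emmeans implementation used in the analysis, and the function name is an assumption.

```python
def holm_adjust(p_values: list[float]) -> list[float]:
    """Return Holm-adjusted p-values in the original order."""
    m = len(p_values)
    # Process p-values from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is multiplied by m, the next by m - 1, and so on.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted value never falls below an earlier one.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical p-values from three pairwise contrasts.
adjusted = holm_adjust([0.01, 0.04, 0.03])
print([round(p, 2) for p in adjusted])  # [0.03, 0.06, 0.06]
```

With α = 0.05, only the first of these three hypothetical contrasts would remain significant after adjustment.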

5 Results

5.1 Stability of Fitts’ law in HDT

To compare the stability of Fitts’ law in HDT and non-HDT conditions, mean Movement Times were calculated for combinations of unique values of ID and Time of Measurement. Subsequently, separate linear regression models were fitted for each Time of Measurement, with Movement Time as the dependent variable and ID as a predictor. Figure 3 visualizes the resulting models, including the respective regression equation and R². As seen in Fig. 3, the estimated relationships are very similar across models, with slopes ranging from 143 to 163 and R² ranging from 0.94 to 0.96. The maximum difference in R² between non-HDT and HDT conditions only amounts to 0.01. This indicates that the strength of the relationship between the ID and Movement Time is similar in HDT compared to non-HDT conditions.
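The per-measurement regression can be illustrated with an ordinary least-squares fit of Movement Time on ID. The sketch below uses synthetic, perfectly linear data chosen only to make the output easy to check; the values are not study results, and the function name and the millisecond unit are assumptions.

```python
def fit_fitts_line(ids: list[float], movement_times: list[float]) -> tuple[float, float, float]:
    """Least-squares fit of MT = intercept + slope * ID; returns (slope, intercept, R²)."""
    n = len(ids)
    mean_id = sum(ids) / n
    mean_mt = sum(movement_times) / n
    sxx = sum((x - mean_id) ** 2 for x in ids)
    sxy = sum((x - mean_id) * (y - mean_mt) for x, y in zip(ids, movement_times))
    slope = sxy / sxx
    intercept = mean_mt - slope * mean_id
    # R² = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(ids, movement_times))
    ss_tot = sum((y - mean_mt) ** 2 for y in movement_times)
    return slope, intercept, 1.0 - ss_res / ss_tot


# Synthetic example: MT (ms) = 200 + 150 * ID, for one hypothetical measurement point.
example_ids = [1.0, 2.0, 3.0, 4.0, 5.0]
example_mts = [350.0, 500.0, 650.0, 800.0, 950.0]
slope, intercept, r_squared = fit_fitts_line(example_ids, example_mts)
print(round(slope, 1), round(intercept, 1), round(r_squared, 2))  # 150.0 200.0 1.0
```

Repeating such a fit for each Time of Measurement and comparing slopes and R² values corresponds to the comparison summarized in Fig. 3.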

Fig. 3 Relationship between Movement Time and ID for each Time of Measurement. HDT5 = measurement 5 days after start of HDT phase. HDT21 = measurement 21 days after start of HDT phase. HDT56 = measurement 56 days after start of HDT phase. R+4 = measurement 4 days after end of HDT phase. R+90 = measurement 90 days after end of HDT phase

5.2 Changes in performance during HDT

For each performance indicator, a separate LMM was fitted with Time of Measurement as a within-subject predictor (for additional details, refer to Sect. 4.6). Regarding the models for Throughput, Error Count, and Touch Duration, inspection of model diagnostics did not indicate deviations from model assumptions. For the LMM for Movement Time, residuals of the model did not appear to be normally distributed based on visual inspection and tests for normality. However, as prior research has demonstrated high robustness of LMMs to violations of normality, no transformations of the response variable were applied [54].

Significant effects of Time of Measurement were found for all four performance indicators: Throughput, F(5, 1675.2) = 45.16, p < 0.001; Movement Time, F(5, 1605.81) = 72.465, p < 0.001; Error Count, χ²(5) = 15.10, p < 0.01; Touch Duration, F(5, 1631.26) = 14.15, p < 0.001. Figure 4 shows the estimated marginal means of the performance indicators for individual levels of Time of Measurement and respective pairwise contrasts visualized by comparison arrows [52]. Additional tables with detailed coefficients for estimated marginal means and pairwise contrasts can be found in Appendix B.

Fig. 4 Estimated marginal means of performance indicators with comparison arrows [52] by individual levels of Time of Measurement. A Throughput, B Movement Time, C Error Count, D Touch Duration. If an arrow from one mean overlaps with an arrow from another mean, the difference is not significant [52]. Significance threshold α = 0.05, adjusted based on Holm’s method [53]. HDT5 = measurement 5 days after start of HDT phase. HDT21 = measurement 21 days after start of HDT phase. HDT56 = measurement 56 days after start of HDT phase. R+4 = measurement 4 days after end of HDT phase. R+90 = measurement 90 days after end of HDT phase

For Throughput, estimates indicate that it steadily increased over the study period, regardless of HDT. In general, all estimates are significantly different from BDC−12 indicating substantially higher Throughput later in the study. For example, compared to BDC−12, average Throughput is estimated to increase by 0.49 bits/s, and at R+90, an increase by almost 0.95 bits/s is estimated. Moreover, while this does not apply to all comparisons, some contrasts comparing HDT with recovery conditions are significant as well, with consistently higher Throughput in recovery conditions.

The estimated marginal means of Movement Time by Time of Measurement indicate that Movement Time decreased during the study period. All pairwise contrasts involving BDC−12 and HDT5 were significant, with positive sign of the estimated differences. Moreover, no significant differences were found for other contrasts. That is to say, Movement Time significantly decreased until HDT21, and no further substantial change was observed subsequent to HDT21.

Regarding Error Count, estimated marginal means indicated a slightly increased Error Count in HDT and recovery conditions. Corresponding pairwise contrasts revealed significant differences between BDC−12 and HDT conditions. In contrast, differences between BDC−12, R+4 and R+90 were not significant. Moreover, contrasts comparing HDT and recovery conditions did not reach statistical significance either.

Finally, pairwise contrasts revealed significant changes in Touch Duration for HDT5, HDT21, HDT56 compared to BDC−12, indicating an elevated average Touch Duration during HDT. No significant shift in Touch Duration was found for R+4 and R+90 compared to BDC−12. Beyond this, recovery conditions were associated with significantly lower Touch Duration compared to HDT conditions; however, this pattern did not apply to all pairs of conditions.
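The pairwise contrasts reported in this section were adjusted with Holm’s method [53]. As an illustration of how this step-down adjustment works, the following minimal sketch computes Holm-adjusted p-values in plain Python. The function name `holm_adjust` and the raw p-values are hypothetical and serve illustration only; they are not taken from the study, and the sketch does not reproduce the analysis software actually used.

```python
def holm_adjust(p_values):
    """Return Holm-adjusted p-values in the original order.

    The smallest raw p-value is multiplied by m, the next by m - 1, and so on;
    a running maximum keeps the adjusted values monotone, and each value is
    capped at 1.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Hypothetical raw p-values for four contrasts (NOT values from the study).
raw = [0.001, 0.04, 0.03, 0.20]
adj = holm_adjust(raw)
significant = [p < 0.05 for p in adj]
```

With six levels of Time of Measurement there are 15 possible pairwise contrasts per indicator, so the smallest raw p-value in such a family would be multiplied by 15 before being compared with α = 0.05.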

6 Discussion

Touchscreen-based interfaces are increasingly common in the context of space travel. Understanding how conditions that occur during space travel, specifically microgravity, might affect human performance is important for successfully conducting space missions. However, studies that investigated performance in tasks with touchscreen-based interfaces in microgravity or microgravity analogs are scarce. Moreover, while Fitts’ law has been studied in many different contexts, its stability in microgravity analogs has been largely neglected by previous research. The present study adds to the body of existing research by addressing both gaps through a multidirectional tapping task with touch devices in HDT.

6.1 Summary of findings relating to the research questions

In the following, the main findings from our study are summarized in relation to the respective research questions. Concerning the comparison of Fitts’ law in HDT and non-HDT conditions (RQ 1), our study reveals that the strength of the relationship between Movement Time and ID as suggested by Fitts’ law in its original form (Fitts, 1954) is highly similar in HDT and non-HDT conditions (the maximum difference in R² between non-HDT and HDT conditions amounts to 0.01).
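The kind of comparison underlying RQ 1 can be sketched as follows: Movement Time is regressed on ID separately for each condition, and the resulting R² values are compared. The sketch assumes Fitts’ original formulation, ID = log2(2D/W), with D the movement distance and W the target width. All names (`index_of_difficulty`, `fit_line`) and all data values are hypothetical and are not taken from the study.

```python
import math


def index_of_difficulty(distance, width):
    """Index of difficulty in bits, Fitts' original form: log2(2D / W)."""
    return math.log2(2 * distance / width)


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical distance/width pairs (in pixels) giving IDs of 2, 4, and 6 bits.
ids = [index_of_difficulty(d, w) for d, w in [(128, 64), (256, 32), (512, 16)]]

# Hypothetical mean movement times in seconds (NOT data from the study).
mt_non_hdt = [0.42, 0.61, 0.83]
mt_hdt = [0.40, 0.62, 0.80]

_, _, r2_non_hdt = fit_line(ids, mt_non_hdt)
_, _, r2_hdt = fit_line(ids, mt_hdt)
r2_difference = abs(r2_non_hdt - r2_hdt)
```

A small `r2_difference` indicates that the linear relationship between Movement Time and ID is about equally strong in both conditions. As noted in the limitations, this is a relative comparison and says nothing about whether a different model variant would fit better overall.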

Regarding the initial transition from non-HDT to HDT (RQ 2), performance in terms of Throughput and Movement Time improved, while performance in terms of Error Count deteriorated. Touch Duration also increased with the transition to HDT.

Finally, regarding the temporal evolution of performance change (RQ 3), findings indicate that Throughput generally increased in the period of the study regardless of HDT, while Movement Time decreased. Moreover, Error Count was significantly higher in all HDT conditions compared to BDC−12, while differences between BDC−12 and recovery conditions were not significant. However, no differences between HDT and recovery conditions were found either. For Touch Duration, the analysis revealed elevated Touch Duration during all three HDT conditions compared to BDC−12. At the same time, recovery conditions and BDC−12 did not differ significantly in terms of Touch Duration, while most contrasts comparing HDT and recovery conditions reached statistical significance.

Possible explanations for these ambiguous findings regarding RQ 2 and RQ 3 (i.e., better performance in HDT for Movement Time and Throughput but worse performance in HDT regarding Error Count and Touch Duration) are explored in the following sections.

6.2 Interpretation and contextualization of results

A primary challenge in interpreting the results regarding changes in task performance concerns the apparent effects of adaptation. In particular, findings for Throughput and Movement Time imply better performance in terms of speed and accuracy regardless of HDT. In contrast, Error Count (and Touch Duration) was higher during HDT compared to BDC−12, which indicates worse task performance. These conflicting results might be in part explained by some form of performance stabilization. Participants might have been more hesitant when solving the task at BDC−12, thus showing lower Throughput but making fewer errors. However, as Error Count remained stable after HDT5 while Throughput and Movement Time further improved following HDT5, we conclude that improvements in Throughput and Movement Time are only partially attributable to this form of performance stabilization. Instead, further performance improvement is best explained by practice effects (i.e., participants became more efficient at solving the task through practice but still maintained a similar Error Count).

Moreover, findings regarding improved performance stand in contrast to previous studies showing that human movement tends to slow down during microgravity [6, 8, 10] and is less accurate [7]. Most previous studies investigated performance in short-term microgravity, where participants had less time to adapt to the new conditions and task requirements (in contrast to long-term HDT). Thus, this inconsistency might be explained by adaptation to experimental conditions. Results from the present study align with findings by Shehab et al. [55], who investigated performance in several tasks, including tracking ability with a trackpad during HDT. They found a trend of performance improvement for all tasks. In light of steady improvements in Throughput and Movement Time and considering practice effects in the present study, we broadly conclude that performance in multidirectional tapping tasks with touchscreen-based interfaces remains rather stable despite HDT, given that there is enough time to adapt to task requirements and experimental conditions.

This conclusion is in line with Holden et al.'s findings [11] that performance deterioration mainly concerns gravitational transitions. The performance decline at gravitational transitions observed by Holden et al. [11] is also mirrored by increased Error Count and Touch Duration compared to baseline in the present study (although not evident for Movement Time and Throughput likely due to practice effects as discussed above).

When comparing findings from the present study to results by Holden et al. [11], a main difference appears to be that data from the present study do not indicate a performance drop-off in early post-HDT (equivalent to early post-flight conditions). We suggest two possible reasons for this difference: First, Holden et al. [11] corrected all performance metrics for the estimated learning effect (based on data from their ground-based control group). However, the inclusion of a control group was not feasible in our study, which prevented us from making this correction. Consequently, practice effects in our study may have masked negative effects of gravitational transitions. Second, in contrast to Holden et al.'s study [11], our early post-HDT measurements were taken four days after HDT, while their measurements were collected zero to three days after landing. Furthermore, Holden et al. [11] obtained their late post-flight measurements between five and 30 days after landing, whereas our final measurements were collected 90 days following HDT. Based on these differences, a tentative conclusion is that the first few days after transition may be particularly critical for performance degradation. However, this hypothesis requires confirmation by future research.

In conclusion, while the lack of a ground-based control group complicates interpretation of findings, results from the present study support evidence by Holden et al. [11] showing that performance tends to stabilize after a prolonged time in microgravity conditions. Furthermore, in line with Holden et al. [11], our study also indicates performance drop-offs for transitions from non-HDT to HDT conditions (at least for Error Count and Touch Duration). In contrast to Holden et al. [11], performance drop-offs post-HDT (respectively post-flight) were not observed in the present study. This might be because practice effects masked such drop-offs and because performance degradation mainly affects the first few days after landing (as described above).

6.3 Limitations

The present study gives insights into the validity of Fitts’ law and performance with touchscreen-based interfaces during HDT. However, several limitations need to be considered.

First, it should be noted that the comparison of model fits between HDT and non-HDT conditions employed in the present study to investigate Fitts’ law is relative in nature. Hence, this approach does not allow nuanced statements regarding the validity of Fitts’ law for the multidirectional tapping task itself. This is important to note as previous research has suggested several modelling variants that may improve predictive accuracy compared to Fitts’ law in its original form [56,57,58,59].

Second, the study employed only a within-subjects design with no control group. While the inclusion of a control group would have been beneficial, this was not possible, as our approach to data collection was subject to the general study design (see section Research Design). This presents a limitation, as confounding influences (particularly practice effects) cannot be ruled out effectively.

Third, data at R+4 were only obtained for 12 participants due to corrupted data. While LMMs can produce unbiased estimates when data are missing at random, this missingness still reduces statistical power and thus is associated with possible false-negative findings.

6.4 Future research

Regarding the validity of Fitts’ law in space, future research should investigate if variations of Fitts’ original model (e.g., [56,57,58,59]) prove useful to further improve model accuracy in microgravity and microgravity analogs.

Moreover, more studies are needed to further deepen the understanding of the effects of space travel on human performance with touch devices. First, we discuss implications for future research in the light of advantages and disadvantages of HDT studies in comparison to studies during real spaceflights. On the one hand, HDT allows for a more controlled study setting and the inclusion of more participants due to lower costs. For example, in the study by Holden et al. [11] (conducted during actual spaceflights), only seven astronauts took part, and times of measurement were not fully consistent across subjects due to operational constraints. On the other hand, studies during real spaceflights allow for greater ecological validity. Given this trade-off, we suggest studies in both controlled HDT and real spaceflight settings for further research. In any case, these studies should include control groups (with randomization or matching), if possible, to allow for more confident conclusions regarding causal relationships.

As tasks conducted during a space mission are often more complex, requiring a combination of skills (e.g., sensorimotor skills, strategic decision making [10]), future studies should also use more realistic, complex tasks (see also [11]). Ideally, such experiments should employ user interfaces that resemble displays in contemporary spacecrafts to further increase external validity of findings.

Moreover, beyond developing measures to counteract possible negative effects of gravitational transitions and providing adaptive post-landing applications to minimize errors, as suggested by Holden et al. [11], further research is crucial to determine the exact duration of adverse effects during gravitational transitions.

Future research could also expand the range of biomechanical parameters examined (e.g., by including touch pressure) to more comprehensively characterize human performance with touch devices. Such research could have important implications for non-invasive prediction of an astronaut's mental state (e.g., stress levels), building on findings from previous research conducted on Earth (e.g., [60, 61]).

Finally, future experiments comparing different devices (touch-based or non-touch-based) in microgravity and microgravity analogs could provide further insight regarding the suitability of specific input devices for space travel. Existing studies on the comparison of input devices for aviation (e.g., [41]) provide valuable methodological references for this purpose.

7 Conclusion

In this paper, we presented a unique study investigating Fitts’ law in HDT on the one hand and the influence of HDT on human performance with touch devices on the other. This is a highly relevant research topic, given the increasingly frequent application of touchscreen-based interfaces in space travel. The study revealed that the relationship between movement time and ID suggested by Fitts’ law is of similar magnitude in HDT conditions compared to non-HDT. This finding complements existing literature demonstrating the validity of Fitts’ law in manifold contexts (e.g., [14,15,16]).

Furthermore, in line with Holden et al. [11], we found some indication that effects of HDT on performance with touch devices tend to be rather small, given that there is enough time for adaptation. Also taking previous findings by Holden et al. [11] into account, performance decline appears to primarily concern gravitational transitions. Thus, the present study provides another empirical building block to solidify knowledge on the practical use of touch devices for space travel. As open questions remain, future research directions motivated by our results were discussed to further deepen the understanding of the subject.