1 Introduction

Daily movement behaviour data are mostly examined in terms of their association with human health and wellbeing. They are usually reported as amounts of time spent in various activities during a certain period of observation. When 24-h data are available, the partition is generally made in terms of sleep, sedentary behaviour (SB), and physical activity (PA) of different intensities: light (LPA), moderate (MPA) and vigorous (VPA). We denote the corresponding vector of variables as MB = (Sleep, SB, LPA, MPA, VPA). Time-use data are compositional in nature because they consist of co-dependent positive parts or fractions of a whole period of observation carrying relative information. This implies that the interest lies not in the absolute values of the compositional parts but in the ratios between them. As a consequence, the particular scale in which the time-use composition is measured (typically hours/day, minutes/day or similar, or directly percentages or proportions) is irrelevant. The logratio approach provides a consistent statistical methodology for the analysis of compositional data (Aitchison 1986; Pawlowsky-Glahn et al. 2015).

There has been an increasing awareness of the suitability of compositional methods to conduct statistical analysis in public health research. The basic concepts of compositional descriptive statistics, visualisation, and linear regression models based on orthonormal logratio (olr) coordinates (often referred to as isometric logratio (ilr) coordinates) in the context of time-use data were demonstrated in Chastin et al. (2015). Further studies followed applying those concepts and presenting other techniques within the logratio framework in the area (Dumuid et al. 2017a, 2017; Štefelová et al. 2018; McGregor et al. 2020; Pelclová et al. 2020). In relation to the work presented here, standard partial least squares (PLS) regression has been used to deal with the multicollinearity problem in raw 24-h movement behaviour data when investigating their relationship with a health parameter (e.g. Aadland et al. (2018)). This certainly circumvents the technical issue; however, as with ordinary regression analysis omitting one category of time use for the same purpose, it still ignores the relative nature of time-use data entirely and focuses on the absolute information. This implies, for example, the assumption that the potential benefits of increasing VPA are the same regardless of the initial condition of the individual. Within the compositional framework, time-use data are instead conceptualised as carriers of relative information, where this information and the interplay between time-use categories are what matter. Hence, PLS regression in this context, as is equally the case for ordinary regression analysis and its robust alternatives (Štefelová et al. 2018), should be applied in logratio coordinates to account for these features (Hinkle and Rayens 1995; Gallo 2010; Kalivodová et al. 2015; Wang et al. 2010; Chen et al. 2021; Štefelová et al. 2021).

As for the graphical representation of time-use compositions, ternary diagrams are commonly used as a counterpart of scatterplots for ordinary data (Chastin et al. 2015). A ternary diagram is an equilateral triangle that allows 3-part compositions to be visualised. In the context of the current work, the vertices represent the three time-use behaviours and the observed compositions are displayed by points. Those lying close to a vertex have higher percentages in the behaviour that is represented by that vertex, whereas points lying around the centre of the triangle have similar percentages in all three behaviours. Thus, the compositions are plotted at distances from the edges corresponding to their values in the respective compositional parts. However, similarly to the ordinary scatterplot, its practical usefulness is limited as the number of compositional parts increases, since only 3-part subsets (subcompositions) can be displayed at once. Thus, if we consider a 5-part MB composition, 10 ternary diagrams would be needed to visualise all possible combinations of 3-part subcompositions. Moreover, ternary diagrams display the original compositional data as standard (positive-constrained) observations, which could lead to biased conclusions in relation to the data structure and variability, specifically concerning groupings in the data and presence of outlying observations (Filzmoser et al. 2018), as the difference observed with the naked eye may not precisely correspond to the actual dissimilarity between observations. The reason is that compositions obey a particular geometry, the so-called Aitchison geometry (Pawlowsky-Glahn and Egozcue 2001; Billheimer et al. 2001), which, unlike the commonly assumed Euclidean real space geometry, accounts for their scale invariance and relative nature. The Aitchison geometry has Euclidean vector space structure and thus allows compositional data to be expressed in coordinates with respect to an orthonormal basis.
Accordingly, it is possible to transfer the statistical processing of the data into the ordinary Euclidean real space while preserving the distances and angles of the original data. Such olr coordinate representation can thus be used for a visually reliable graphical representation (Pawlowsky-Glahn et al. 2015). Then, and even more in the case of larger compositions, data dimensionality reduction techniques such as compositional variants of principal components analysis (PCA) or PLS regression (with this latter considering the relationship with an outcome variable) can be used to obtain a more insightful visualisation of compositions through a biplot display based on their logratio coordinates.

Moreover, it is common in compositional data analysis to express the input data in a so-called balance coordinate system (another form of olr coordinates) (Egozcue and Pawlosky-Glahn 2005; Egozcue and Pawlowsky-Glahn 2019; Martín-Fernández 2019). Balance coordinates represent contrasts between two groups of parts of the composition and their particular expression depends on the orthonormal basis used as reference (one amongst infinitely many possible). Importantly, this can be tailored to some extent so that the collection of balances associated with it includes some contrast(s) which have a practical interpretation (see the sequential binary partition procedure described in Sect. 2). However, not all balances of interest in a study can generally be obtained from just one such coordinate system. Consequently, several of the balances generated from a single orthogonal basis are often meaningless for the practitioner, with this issue being more likely as the size of the composition at hand increases. This might have discouraged a wider adoption of balance coordinates within time-use epidemiology where, with just a few exceptions (see e.g. McGregor et al. (2020)), the so-called pivot coordinates approach (Fišerová and Hron 2011; Filzmoser et al. 2018) has been mostly applied so far (Dumuid et al. 2017; Štefelová et al. 2018; Dumuid et al. 2020). The latter formally considers orthogonal rotations between olr coordinate systems so that the dominance of one behaviour relative to the others is sequentially isolated for statistical assessment.

The issues discussed above and the combination of the ideas around balance and pivot coordinates lead to the introduction here of the concept of pivoting balances and its implementation to facilitate a synthetic and meaningful graphical display of compositions and their relationships with outcome variables. This work thus enables the combination of balance-like coordinates into the same modelling and visualisation task, here PLS regression and biplot, including those that cannot be obtained from a single balance coordinate system. In particular, this is implemented with the objective of providing improved visualisation of complex time-use patterns in relation to adiposity. The method relies on the flexibility of the concept of pivot coordinates, which has also been exploited recently to define orthonormal pairwise logratio representations (Hron et al. 2021) and to implement weighted statistical analysis schemes (Hron et al. 2021). Firstly, we discuss compositional balances as a way to construct interpretable logratio coordinate representations of time-use compositions, considering the ordinal character of the wake-time components. Next, the basic ideas behind PLS regression and the associated biplot display are developed, including their extension to the compositional case based on the novel concept of pivoting balances. Finally, the proposed method is applied to examine the association between 24-h movement behaviour patterns and adiposity in a sample of Czech school-aged girls, particularly assessing how the distribution of time between behaviours relates to healthier conditions.

2 Balance coordinates for compositional data

Compositional data are essentially characterised by the scale invariance property (Pawlowsky-Glahn et al. 2015; Filzmoser and Hron 2019). This means that multiplying a composition by a positive constant does not alter the relative information conveyed by the (log)ratios between its parts. Scale invariance implies that compositional data are formally defined on a sample space consisting of equivalence classes of proportional vectors (Barceló-Vidal and Martín-Fernández 2016). This clashes with the direct application of ordinary statistical methods, which assume data on an absolute scale obeying the standard Euclidean geometry of the real space. However, it has been shown that ordinary methods can still be applied after mapping compositions into the real Euclidean space through e.g. olr coordinates (Egozcue et al. 2003; Egozcue and Pawlowsky-Glahn 2019; Martín-Fernández 2019). These olr coordinates are obtained with respect to an orthonormal basis in the Aitchison geometry of compositional data and facilitate transferring the statistical results back to the original sample space by inverse mapping (Egozcue et al. 2003).

Given a D-part composition \({\varvec{x}} = (x_1, \ldots , x_D)^\top\), a procedure known as sequential binary partition (SBP) can be applied to construct \(D-1\) (the actual degrees of freedom of a D-part composition) customised olr coordinates called (compositional) balances consisting of a real vector \({\varvec{b}} = ( b_1, \ldots , b_{D-1})^\top\) (Egozcue and Pawlosky-Glahn 2005). In the first step of the SBP process, the entire collection of compositional parts is split into two disjoint subsets, with each subset being summarised by the geometric mean of its components and going into the numerator and denominator respectively of a normalised logratio. This defines the first balance \(b_1\). The resulting subsets are sequentially further split in the same way until only one-part subsets remain, leading to the subsequent balances \(b_2, \ldots , b_{D-1}\).

The general formula for a balance coordinate is as follows:

$$\begin{aligned} b_j = \sqrt{\frac{r_j s_j}{r_j + s_j}} \ln \frac{\root r_j \of { \prod _{i=1}^{r_j}x^{+}_{j_{i}}}}{\root s_j \of { \prod _{i=1}^{s_j}x^{-}_{j_{i}}}} = {\varvec{g}}_j^\top \ln ({\varvec{x}}), \quad j=1,\ldots , D-1, \end{aligned}$$
(1)

where \(x^{+}_{j_{i}}\) and \(x^{-}_{j_{i}}\) refer to the parts selected for the numerator and denominator, respectively, of the jth balance, while \(r_j\) and \(s_j\) stand for the respective number of parts in each subset; the vector \({\varvec{g}}_j\) corresponds to the jth row of a \((D-1, D)\)-matrix of logcontrast coefficients \({\textbf{G}}\) with elements

$$\begin{aligned}&g_{jd} = \left\{ \begin{array}{ll} +\frac{1}{r_j} \sqrt{\frac{r_j s_j}{r_j + s_j}}, &{} \text { if } x_d \in \{ x^{+}_{j_{i}}, \ i = 1, \ldots , r_j \}, \\ -\frac{1}{s_j} \sqrt{\frac{r_j s_j}{r_j + s_j}}, &{} \text { if } x_d \in \{ x^{-}_{j_{i}}, \ i = 1, \ldots , s_j \}, \\ 0 &{} \text { otherwise }, \end{array} \right. \\&j = 1, \ldots , D-1; \ d = 1, \ldots , D. \end{aligned}$$

It holds that \({\textbf{G}} {\textbf{G}}^\top = {\textbf{I}}_{D-1}\), where \({\textbf{I}}_{D-1}\) is the \((D-1)\)-dimensional identity matrix.
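As an illustration of the construction above, the following minimal Python sketch builds \({\textbf{G}}\) from a sign-coded SBP and checks the orthonormality property numerically. The function name `sbp_to_contrast_matrix`, the particular partition and the example day are hypothetical choices made for this sketch only; they are not taken from the study.

```python
import numpy as np

def sbp_to_contrast_matrix(signs):
    """Turn an SBP sign matrix (rows = balances, entries +1/-1/0) into G."""
    signs = np.asarray(signs, dtype=float)
    G = np.zeros_like(signs)
    for j, row in enumerate(signs):
        r = np.sum(row > 0)              # number of parts in the numerator
        s = np.sum(row < 0)              # number of parts in the denominator
        norm = np.sqrt(r * s / (r + s))  # normalising constant of balance j
        G[j, row > 0] = norm / r
        G[j, row < 0] = -norm / s
    return G

# Hypothetical SBP for (Sleep, SB, LPA, MPA, VPA), for illustration only.
SBP = [
    [+1, -1, -1, -1, -1],   # Sleep vs. SB, LPA, MPA, VPA
    [ 0, +1, -1, -1, -1],   # SB vs. LPA, MPA, VPA
    [ 0,  0, +1, -1, -1],   # LPA vs. MPA, VPA
    [ 0,  0,  0, +1, -1],   # MPA vs. VPA
]
G = sbp_to_contrast_matrix(SBP)

# Made-up day in minutes (sums to 1440); the scale does not affect the balances.
x = np.array([480.0, 600.0, 300.0, 45.0, 15.0])
b = G @ np.log(x)           # balances b_1, ..., b_4 as in Eq. (1)
```

Because every row of \({\textbf{G}}\) sums to zero, rescaling `x` (for example to proportions) leaves `b` unchanged, in line with the scale invariance property.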

Balance coordinates are then usually interpreted, as their name indicates, in terms of a balance (contrast) between two subsets of components. The use of balances is particularly advantageous to define interpretable contrasts between parts of the composition according to the scientific questions of interest and based on domain-specific knowledge (e.g. active versus non-active behaviours (McGregor et al. 2020)).

Note that different partitions lead to different systems of balances but, as for any olr coordinate representation, they are orthogonal rotations of each other that simply represent the information in the composition in a different way (Egozcue and Pawlosky-Glahn 2005). So, given two different vectors of balances \({\varvec{b}} = {\textbf{G}} \ln {\varvec{x}}\) and \({\varvec{b}}^* = {\textbf{G}}^* \ln {\varvec{x}}\), it holds that \({\varvec{b}}^* = {\textbf{G}}^* {\textbf{G}}^\top {\varvec{b}}\).
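The rotation relationship can be checked numerically. The sketch below is only an illustration under assumed inputs: the helper `contrast_matrix`, the two partitions and the example composition are hypothetical.

```python
import numpy as np

def contrast_matrix(signs):
    """Logcontrast matrix G for a sign-coded SBP (rows = balances)."""
    signs = np.asarray(signs, dtype=float)
    r = (signs > 0).sum(axis=1, keepdims=True)   # numerator sizes
    s = (signs < 0).sum(axis=1, keepdims=True)   # denominator sizes
    norm = np.sqrt(r * s / (r + s))
    return np.where(signs > 0, norm / r, 0.0) + np.where(signs < 0, -norm / s, 0.0)

# Two hypothetical SBPs for the same 5-part composition.
G_a = contrast_matrix([[+1, -1, -1, -1, -1],
                       [ 0, +1, -1, -1, -1],
                       [ 0,  0, +1, -1, -1],
                       [ 0,  0,  0, +1, -1]])
G_b = contrast_matrix([[+1, +1, -1, -1, -1],
                       [+1, -1,  0,  0,  0],
                       [ 0,  0, +1, -1, -1],
                       [ 0,  0,  0, +1, -1]])

x = np.array([480.0, 600.0, 300.0, 45.0, 15.0])  # made-up composition
b_a = G_a @ np.log(x)
b_b = G_b @ np.log(x)

R = G_b @ G_a.T          # rotation taking system a into system b
b_b_rotated = R @ b_a
```

Here `R` is an orthogonal matrix, so the Euclidean norm of the balance vector (the Aitchison norm of the composition) is the same in both systems.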

A popular type of balances are the so-called pivot coordinates (Fišerová and Hron 2011), where one part (placed in the numerator) is set against the remaining ones (in the denominator). Note that the role of a single compositional part relative to all the others is highlighted in the first pivot coordinate. Thus, by placing one particular part each time in the numerator of the first pivot coordinate, we can construct D different olr coordinate systems and extract the first coordinate from each one of them. In more detail, denoting composition \({\varvec{x}}\) rearranged so that the lth part is put at the first place as \({\varvec{x}}^{(l)} = \left( x_{1}^{(l)}, \ldots , x_{D}^{(l)} \right) ^\top = (x_{l}, x_{1}, \ldots , x_{l-1}, x_{l+1}, \ldots , x_{D})^\top , \ l=1, \ldots , D\), the corresponding set of pivot coordinates given by a real vector \({\varvec{z}}^{(l)} = \left( z_{1}^{(l)}, \ldots , z_{D-1}^{(l)} \right) ^\top\) is constructed as

$$\begin{aligned} z_{j}^{(l)} =\sqrt{\frac{D-j}{D-j+1}} \ln \frac{x_{j}^{(l)}}{\root D-j \of {\prod _{d=j+1}^D x_{d}^{(l)}}} \quad j=1, \ldots , D-1, \ l=1, \ldots , D. \end{aligned}$$

Note that the first pivot coordinate \(z_{1}^{(l)}\) isolates all the information about part \(x_l\) relative to the others. It is interesting to note the relationship between the set of first pivot coordinates \(z_{1}^{(1)}, \ldots , z_{1}^{(D)}\) and the well-known clr coefficients

$$\begin{aligned} {\varvec{c}} = (c_1, \ldots , c_D)^\top = \left( \ln \frac{x_{1}}{\root D \of { \prod _{d=1}^{D}x_{d}}}, \ldots , \ln \frac{x_{D}}{\root D \of { \prod _{d=1}^{D}x_{d}}} \right) ^\top . \end{aligned}$$

Clr coefficients are not olr coordinates but coefficients with respect to a generating system on the simplex, and they lead to a singular covariance matrix (Aitchison 1986). It holds that \(c_l = \sqrt{(D-1)/D} \cdot z_1^{(l)}, \ l=1, \ldots , D\) and, therefore, \(z_{1}^{(1)} + \ldots + z_{1}^{(D)} = 0\). Moreover, for general balances, it holds that \({\varvec{b}} = {\textbf{G}} {\varvec{c}}\), which consequently also applies to pivot coordinates.
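The relationship between first pivot coordinates and clr coefficients can be verified with a short sketch. The function names and the example composition below are assumptions made for illustration.

```python
import numpy as np

def pivot_coordinates(x):
    """Pivot coordinates z_1, ..., z_{D-1} of x, with parts in the given order."""
    logx = np.log(np.asarray(x, dtype=float))
    D = logx.size
    z = np.empty(D - 1)
    for j in range(1, D):                 # j = 1, ..., D-1 as in the text
        rest = logx[j:]                   # logs of parts j+1, ..., D
        z[j - 1] = np.sqrt((D - j) / (D - j + 1)) * (logx[j - 1] - rest.mean())
    return z

def first_pivot_coordinates(x):
    """z_1^{(l)} for l = 1, ..., D: move part l to the front, keep z_1."""
    x = np.asarray(x, dtype=float)
    out = np.empty(x.size)
    for l in range(x.size):
        reordered = np.concatenate(([x[l]], x[:l], x[l + 1:]))
        out[l] = pivot_coordinates(reordered)[0]
    return out

def clr(x):
    """Centred logratio coefficients."""
    logx = np.log(np.asarray(x, dtype=float))
    return logx - logx.mean()

x = np.array([480.0, 600.0, 300.0, 45.0, 15.0])   # made-up composition
D = x.size
z_first = first_pivot_coordinates(x)
c = clr(x)
```

With these definitions, `c` equals `np.sqrt((D - 1) / D) * z_first` up to rounding, and the first pivot coordinates sum to zero.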

2.1 Pivoting balances and their use with movement behaviour compositions

Although the process described above based on pivot coordinates has been commonly used in practice (Filzmoser et al. 2018), it might not necessarily be the best strategy when there is a natural ordering across compositional parts, as is the case for time-use compositions. A pivot coordinate associated with a part regarded within the ordering as an “intermediate level” could potentially aggregate logratios having opposite associations with the outcome variable. It then seems reasonable to consider more general balances that represent relevant trade-offs between subsets of parts instead of just one against the others (McGregor et al. 2020). In particular, we are interested here in balances that take into consideration all the parts available and their order. That is, we consider different first balances, each adding one more intense activity part to the subset in the numerator of the balance, and then investigate how these additions affect the results. By construction, it is not possible to obtain all these balances from a single SBP; different olr coordinate systems must be considered instead. Hence, following a strategy analogous to the one used with pivot coordinates, we can consider L olr coordinate systems to construct desirable balances \({\varvec{b}}^{(l)} = \left( b_{1}^{(l)}, \ldots , b_{D-1}^{(l)} \right) ^\top\), with \(l=1,\ldots , L\) (the superscript here refers to the balance coordinate system considered), that in turn isolate a balance of interest in the first coordinates \(b_{1}^{(l)}\). Accordingly, we denote the respective matrix of logcontrast coefficients as \({\textbf{G}}^{(l)}\). The number of coordinate systems L to consider will depend on the specific data set. In the following, and because of its conceptual analogy with pivot coordinates, we will refer to the balances resulting from these coordinate systems as pivoting balances.

Focusing on our context of application, it is generally accepted that a higher health benefit is obtained from more physically demanding activities, whereas the role of sleep is less clear (Felsö et al. 2017). We therefore suggest the following coordinate systems with different initial partitions into two subgroups. Starting with sleep against the rest of behaviours, the subset of parts in the numerator is subsequently enlarged from the least to the most intense activities; and vice versa, sleep in the numerator is subsequently accompanied by the other activities from the most to the least intense. The remaining three balances in each coordinate system are constructed arbitrarily by following the SBP rules. This gives rise to \(L=7\) coordinate systems with pivoting balances \({\varvec{b}}^{(l)} = \left( b_{1}^{(l)}, b_{2}^{(l)}, b_{3}^{(l)}, b_{4}^{(l)}\right) ^\top , \ l=1,\ldots , 7\). We give the coordinates of interest, i.e. \(b_{1}^{(1)}, \ldots , b_{1}^{(7)}\), the symbolic notation Sleep_SB.LPA.MPA.VPA, Sleep.SB_LPA.MPA.VPA, Sleep.SB.LPA_MPA.VPA, Sleep.SB.LPA.MPA_VPA, Sleep.VPA_SB.LPA.MPA, Sleep.MPA.VPA_SB.LPA, and Sleep.LPA.MPA.VPA_SB (i.e. using an underscore to separate the parts in the numerator and the denominator of the logratio and a point symbol to separate the parts within each subgroup). Table 1 illustrates an exemplary SBP to obtain the required set of balances.

Table 1 Exemplary SBP for MB composition which results in the required pivoting balance systems with the (first) balance of interest as noted on the left. Parts chosen for the numerator and denominator of the jth balance are coded \(+\) and \(-\), respectively; 0 indicates that the part is not included in the respective balance

As for the interpretation, e.g. the balance Sleep_SB.LPA.MPA.VPA compares time spent in sleep with the average (computed by the geometric mean) of time spent in waking-time behaviours, Sleep.SB_LPA.MPA.VPA is a contrast of average time spent in inactive behaviours against average time spent in physical activities, and so on. Accordingly, positive values of the balance Sleep_SB.LPA.MPA.VPA mean dominance of sleep over the averaged contributions of the waking-time components, and vice versa. Similarly, positive values of the balance Sleep.SB_LPA.MPA.VPA imply a dominance of averaged contributions of inactive behaviours over averaged contributions of physical activity. Note that the reciprocal balances, i.e. the balances resulting from swapping the subsets of behaviours in the numerator and denominator of the logratio, differ only in sign. Thus, the arrangement of the subsets of behaviours in the balance can be chosen according to the intended interpretation.
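To make the symbolic notation concrete, the following sketch evaluates the seven first pivoting balances directly from their names for one hypothetical day. The helper `balance_from_name` and the minutes used are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

PARTS = ["Sleep", "SB", "LPA", "MPA", "VPA"]
NAMES = [
    "Sleep_SB.LPA.MPA.VPA", "Sleep.SB_LPA.MPA.VPA", "Sleep.SB.LPA_MPA.VPA",
    "Sleep.SB.LPA.MPA_VPA", "Sleep.VPA_SB.LPA.MPA", "Sleep.MPA.VPA_SB.LPA",
    "Sleep.LPA.MPA.VPA_SB",
]

def balance_from_name(name, composition):
    """Balance 'numerator_denominator', with parts separated by points."""
    num, den = (side.split(".") for side in name.split("_"))
    r, s = len(num), len(den)
    # Mean of logs = log of the geometric mean of each group.
    log_num = np.mean([np.log(composition[p]) for p in num])
    log_den = np.mean([np.log(composition[p]) for p in den])
    return np.sqrt(r * s / (r + s)) * (log_num - log_den)

# Made-up day in minutes; any positive rescaling gives the same balances.
day = dict(zip(PARTS, [480.0, 600.0, 300.0, 45.0, 15.0]))
first_balances = {name: balance_from_name(name, day) for name in NAMES}
```

Swapping the two sides of a name, e.g. `"SB.LPA.MPA.VPA_Sleep"`, returns the same value with the opposite sign, which is the reciprocal-balance property noted above.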

The approach based on balances, where the groups are represented by the respective geometric means, can be compared with the case where the groups are characterized by amalgamation (summing) of parts in both groups, called summated logratios (Greenacre 2020). In principle, the idea of pivoting logratio coordinates as discussed above could be used with summated logratios as well, but the resulting coordinate systems would not be orthogonal rotations of each other anymore, which is a crucial requirement for the PLS regression modelling introduced below. Another important point is the difference in interpretation. While the parts in each group are amalgamated when using summated logratios, using balances instead implies that all possible logratios between parts from both groups are aggregated (Hron et al. 2020),

$$\begin{aligned} b_j = \sqrt{\frac{r_j s_j}{r_j + s_j}} \ln \frac{\root r_j \of { \prod _{i=1}^{r_j}x^{+}_{j_{i}}}}{\root s_j \of { \prod _{k=1}^{s_j}x^{-}_{j_{k}}}} = \sqrt{\frac{1}{r_j s_j (r_j + s_j)}} \sum _{i=1}^{r_j} \sum _{k=1}^{s_j} \ln \frac{x^{+}_{j_{i}}}{x^{-}_{j_{k}}} , \quad j=1,\ldots , D-1, \end{aligned}$$

which honours the fact that all relevant information in compositional data is contained in (pairwise) logratios. Balances stress that each part has a role in the relationship with the response. Accordingly, when investigating the possible effect on the health outcome of changes in the balance as measured through the respective regression coefficient, for instance by increasing the value of the balance by one, such a change implies multiplying the parts aggregated by the geometric mean in the numerator of the balance by the same constant. Note that the result here involves the normalising constant of the balance which guarantees orthonormality. Alternatively, interpretation can be facilitated by omitting the normalising constants of the balances and considering binary instead of natural logarithms (Müller et al. 2018), so that the orthonormality of coordinates is relaxed to orthogonality. In this case, an increase of one in the balance corresponds to each part in the numerator being doubled and, consequently, the values of all ratios whose logs are aggregated into the balance being doubled. For an interpretation of a regression coefficient in terms of e.g. min/day, say for the balance Sleep.VPA_SB.LPA.MPA, the coefficient would account for how the health outcome changes when the time allocations to Sleep and VPA are doubled with respect to SB, LPA and MPA.
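Both the pairwise-logratio form of a balance and the unit-increase interpretation can be checked numerically. The sketch below uses hypothetical function names and made-up minutes for the balance Sleep.VPA_SB.LPA.MPA; the multiplicative factor for the normalised balance follows from solving \(\sqrt{r_j s_j/(r_j+s_j)} \ln k = 1\) for k.

```python
import numpy as np

def balance(num, den):
    """Normalised balance: contrast of log geometric means (natural logs)."""
    num, den = np.asarray(num, float), np.asarray(den, float)
    r, s = num.size, den.size
    return np.sqrt(r * s / (r + s)) * (np.log(num).mean() - np.log(den).mean())

def balance_from_pairs(num, den):
    """Same balance as a scaled sum of all r*s pairwise logratios."""
    num, den = np.asarray(num, float), np.asarray(den, float)
    r, s = num.size, den.size
    pairwise = np.log(num[:, None] / den[None, :])   # r x s matrix of logratios
    return pairwise.sum() / np.sqrt(r * s * (r + s))

def log2_balance(num, den):
    """Un-normalised balance with binary logs (orthogonal, not orthonormal)."""
    return np.log2(num).mean() - np.log2(den).mean()

num = np.array([480.0, 15.0])          # Sleep, VPA (made-up minutes)
den = np.array([600.0, 300.0, 45.0])   # SB, LPA, MPA (made-up minutes)
r, s = num.size, den.size

# Factor by which numerator parts are multiplied for a unit increase
# of the normalised balance.
unit_factor = np.exp(np.sqrt((r + s) / (r * s)))
```

For the binary-log version, doubling the numerator parts raises `log2_balance` by exactly one, which is the simpler interpretation described above.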

3 PLS regression and biplot

PLS regression enjoys wide popularity in areas such as chemometrics (Höskuldson 1988; Martens 2001), especially in the case where the number of explanatory variables is significantly larger than the number of observations. It aims to fit the relationship between response variable(s) and potentially many and/or highly correlated explanatory variables by finding a small number of uncorrelated latent factors that synthesise the relationship in lower dimensions. The underlying assumption is that the observed data are generated by a process driven by this small number of latent factors, also known as PLS components. The values on the PLS components (scores) are linear combinations of the explanatory variables with parameters (loadings) determined in such a way that they maximize the covariance between the response and the explanatory variables. Regression coefficients associated with the original explanatory variables can be deduced from the PLS components. Even if PLS regression is particularly useful for the analysis of high-dimensional data, it offers other features that make the method also appealing for datasets with a relatively small to moderate number of explanatory variables, as is commonly the case for movement behaviour data. These include the capacity to handle multicollinearity and highly correlated explanatory variables, the ability to separate main information from noise, the absence of distributional assumptions for the error terms and, what is particularly relevant for our purposes, the possibility to visualize data in low dimensions via a PLS biplot that, unlike common PCA biplots, takes into account the relationship between outcome and explanatory variables.

In this work we focus on PLS regression with just one response variable, commonly known as PLS1 in practice (in contrast with PLS2 used for the multivariate response case). For brevity, we refer to PLS1 simply as PLS in the following.

In order to describe PLS regression, we consider two data structures: a column vector \({\textbf{y}}\) of size n and a matrix \({\textbf{F}}\) of size \(n \times m\) called design matrix. The vector \({\textbf{y}}\) represents values of the response variable on n objects, whereas the columns of \({\textbf{F}}\) describe m explanatory variables measured on the same n objects. The data are assumed to be mean-centred. The aim of PLS is to find a linear relationship of the form

$$\begin{aligned} {\textbf{y}} = {\textbf{F}} \varvec{\beta } + {\textbf{e}} \end{aligned}$$
(2)

by estimating a vector \(\varvec{\beta }\) of m unknown regression coefficients. The vector \({\textbf{e}}\) represents an error term with n components. PLS decomposes the design matrix as

$$\begin{aligned} {\textbf{F}} = {\textbf{T}} {\textbf{P}}^\top + \mathbf {E_{F}}, \end{aligned}$$

where \(\mathbf {E_{F}}\) is an error matrix, \({\textbf{T}}\) is a score matrix of size \(n \times a\), and \({\textbf{P}}\) is a loading matrix of size \(m \times a\), with a being the number of PLS components. The latter is usually selected based on cross-validated (CV) prediction performance assessed by root mean squared error of prediction (RMSEP) and coefficient of determination R\(^2\) (Varmuza and Filzmoser 2009). One procedure to determine the optimal number of PLS components is the randomisation test approach (van der Voet 1994). In brief, given a reference model chosen according to the absolute minimum in the CV curve, the procedure tests for the significance of increments of the squared prediction error from using models with fewer components. The selected model is the one with the smallest number of components that is not significantly worse than the reference model. In the following, \({\textbf{t}}_j\) and \({\textbf{p}}_j\), with \(j=1,\ldots ,a\), will denote the jth column of \({\textbf{T}}\) and \({\textbf{P}}\) respectively.

A number of numerical methods have been proposed to estimate the PLS model coefficients so that the covariance between the scores and the response is maximised. The NIPALS algorithm is a popular choice (Varmuza and Filzmoser 2009) and it can be summarised in the following steps.

For \(j=1,\ldots ,a\):

  1. \({\textbf{v}}_j = \dfrac{{\textbf{F}}_j^\top {\textbf{y}}}{\left( {\textbf{y}}^\top {\textbf{y}} \right) }\), with \({\textbf{F}}_1 = {\textbf{F}}\),

  2. \({\textbf{w}}_j = \dfrac{{\textbf{v}}_j}{\sqrt{{\textbf{v}}_j^\top {\textbf{v}}_j}}\),

  3. \({\textbf{t}}_j = {\textbf{F}}_j {\textbf{w}}_j\),

  4. \({\textbf{p}}_j = \dfrac{{\textbf{F}}_j^\top {\textbf{t}}_j}{\left( {\textbf{t}}_j^\top {\textbf{t}}_j \right) }\),

  5. \(u_j = \dfrac{{\textbf{y}}^\top {\textbf{t}}_j}{\left( {\textbf{t}}_j^\top {\textbf{t}}_j \right) }\),

  6. \({\textbf{F}}_{j+1} = {\textbf{F}}_j - {\textbf{t}}_j {\textbf{p}}_j^\top\).

The regression coefficients are then estimated by

$$\begin{aligned} \hat{\varvec{\beta }} = {\textbf{W}} \left( {\textbf{P}}^\top {\textbf{W}} \right) ^{-1} {\textbf{u}}, \end{aligned}$$

where the matrix \({\textbf{W}}\) is formed by columns \({\textbf{w}}_j\), \({\textbf{P}}\) by columns \({\textbf{p}}_j\) and the column vector \({\textbf{u}}\) by elements \(u_j, \ j = 1, \ldots , a\).
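The steps listed above translate almost line by line into code. The following is a minimal NumPy sketch rather than a reference implementation: the function name `pls1_nipals` and the simulated data are assumptions for illustration, and inputs are taken to be mean-centred and unscaled.

```python
import numpy as np

def pls1_nipals(F, y, a):
    """PLS1 via the six NIPALS steps; F and y must be mean-centred."""
    F_j = np.array(F, dtype=float)       # working copy, deflated in step 6
    y = np.asarray(y, dtype=float)
    n, m = F_j.shape
    W, P = np.zeros((m, a)), np.zeros((m, a))
    T, u = np.zeros((n, a)), np.zeros(a)
    for j in range(a):
        v = F_j.T @ y / (y @ y)          # step 1
        w = v / np.sqrt(v @ v)           # step 2: unit-length weight vector
        t = F_j @ w                      # step 3: scores
        p = F_j.T @ t / (t @ t)          # step 4: loadings
        u[j] = y @ t / (t @ t)           # step 5
        F_j = F_j - np.outer(t, p)       # step 6: deflation
        W[:, j], P[:, j], T[:, j] = w, p, t
    beta = W @ np.linalg.solve(P.T @ W, u)   # W (P^T W)^{-1} u
    return beta, T, P, W

# Simulated, mean-centred data (made up for this sketch).
rng = np.random.default_rng(0)
F = rng.normal(size=(50, 4))
F = F - F.mean(axis=0)
y = F @ np.array([1.0, -2.0, 0.5, 0.0]) + rng.normal(scale=0.1, size=50)
y = y - y.mean()

beta, T, P, W = pls1_nipals(F, y, a=2)
```

As a sanity check, when a equals the number of explanatory variables (and \({\textbf{F}}\) has full column rank) the PLS coefficients coincide with the ordinary least squares solution.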

The NIPALS algorithm, as well as most of the other popular methods for PLS regression (e.g. kernel or SIMPLS algorithms), produces uncorrelated scores (i.e. \({\textbf{t}}_j^\top {\textbf{t}}_k = 0\) for \(j \ne k\), with \(j,k=1,\dots ,a\); with the steps above the scores are mutually orthogonal but not scaled to unit length) (Varmuza and Filzmoser 2009).

The individual statistical significance of the explanatory variables is determined by the bootstrap method. Let \({\hat{\mu }}_{k}\), \({\hat{\sigma }}_{k}\), \({\hat{\gamma }}_{k (\alpha /2)}\) and \({\hat{\gamma }}_{k (1-\alpha /2)}\) denote, respectively, the bootstrap mean, standard deviation, and \((\alpha /2)\)- and \((1-\alpha /2)\)-quantiles of the coefficient estimate \({\hat{\beta }}_{k}\), \(k=1,\ldots ,m\), over S bootstrap resamples. The estimated bootstrap standardised coefficients are computed as \({\hat{\mu }}_{k}/{\hat{\sigma }}_{k}\) and the \(100(1-\alpha ) \%\) bootstrap confidence intervals for standardised coefficients as \(({\hat{\gamma }}_{k (\alpha /2)}/{\hat{\sigma }}_{k}, {\hat{\gamma }}_{k (1-\alpha /2)}/{\hat{\sigma }}_{k})\). The kth explanatory variable is considered statistically significant (at significance level \(\alpha\)) if the respective confidence interval does not include zero.
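The bootstrap summary described above can be sketched as follows. This is an illustration under stated assumptions: the function `bootstrap_standardised` is hypothetical, observations are resampled with replacement as whole rows, and ordinary least squares is used as a stand-in estimator so that the block is self-contained; in practice `fit` would be a PLS fit with the chosen number of components.

```python
import numpy as np

def bootstrap_standardised(fit, F, y, S=500, alpha=0.05, seed=0):
    """Bootstrap standardised coefficients, their CIs and significance flags."""
    rng = np.random.default_rng(seed)
    n = len(y)
    draws = np.empty((S, F.shape[1]))
    for b in range(S):
        idx = rng.integers(0, n, size=n)          # resample rows with replacement
        Fb, yb = F[idx], y[idx]
        # Re-centre each resample before fitting, as the model assumes centred data.
        draws[b] = fit(Fb - Fb.mean(axis=0), yb - yb.mean())
    mu = draws.mean(axis=0)
    sigma = draws.std(axis=0, ddof=1)
    q_low, q_high = np.quantile(draws, [alpha / 2, 1 - alpha / 2], axis=0)
    significant = (q_low > 0) | (q_high < 0)      # CI excludes zero
    return mu / sigma, q_low / sigma, q_high / sigma, significant

# Stand-in estimator (assumption): least squares instead of PLS.
def ols_fit(F, y):
    return np.linalg.lstsq(F, y, rcond=None)[0]

# Simulated data: only the first variable has a clear effect.
rng = np.random.default_rng(42)
F = rng.normal(size=(100, 3))
y = 2.0 * F[:, 0] + rng.normal(size=100)

std_coef, ci_low, ci_high, significant = bootstrap_standardised(ols_fit, F, y)
```

Dividing the quantiles by the same positive \({\hat{\sigma }}_{k}\) does not change whether the interval covers zero, so significance can equally be read from the unstandardised quantiles.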

Finally, the two-dimensional PLS biplot displays scores and loadings corresponding to the first two PLS components (Oyedele and Gardner-Lubbe 2015). That is, the representation of the n observations (using points) is given by the rows of the matrix \({\textbf{T}}_{2}=({\textbf{t}}_1, {\textbf{t}}_2)\) and the representation of the m explanatory variables (using arrows from the origin) is given by the rows of the matrix \({\textbf{P}}_{2}=({\textbf{p}}_1, {\textbf{p}}_2)\). The scores represent the projection of the observations onto the space defined by the PLS components, while the loadings represent the effect of the explanatory variables on the directions of the projections. Therefore, a PLS biplot provides a single graphical representation of the observations alongside the explanatory variables which, unlike ordinary biplots based on PCA, accounts for the relationship with the response variable, as noted above. The observations in the direction of an arrow are characterised by higher values on the corresponding explanatory variable. The sign of the relationship with the outcome variable determines the direction of the arrow.

3.1 Compositional PLS regression and biplot based on pivoting balances

Where compositional explanatory variables are present, PLS regression is applied to an adequate logratio coordinate representation (Gallo 2010; Kalivodová et al. 2015; Wang et al. 2010). Unlike in previous literature, we do not represent compositional data in just one coordinate system and instead extend the concept of pivot coordinates to a more general setting as described in Sect. 2.1. This makes it possible to directly generate the balance logratio coordinates that are of interest in the study at hand.

We consider a D-part composition alongside q other non-compositional covariates as explanatory variables, with the composition represented by pivoting balances, i.e. by balance coordinates \(b_{1}^{(l)}, \ldots , b_{D-1}^{(l)}\) from the lth coordinate system, \(l=1,\ldots ,L\). Specifically, recall that the interest is in each of the first balances \(b_{1}^{(l)}\) that includes all the parts in the given arrangement (see Sect. 2.1). Then the following L regression models are considered:

$$\begin{aligned} {\textbf{y}} = {\textbf{F}}^{(l)} \varvec{\beta }^{(l)} + {\textbf{e}}, \quad l=1,\ldots ,L, \end{aligned}$$
(3)

where the vector \({\textbf{y}}\) stands for the values of the response variable and the columns of the matrix \({\textbf{F}}^{(l)}\) combine values on the lth set of balances and the q non-compositional covariates for the n observations. The vector \(\varvec{\beta }^{(l)} = \left( \beta _1^{(l)},\ldots , \beta _{D-1}^{(l)}, \beta _D, \ldots , \beta _{D-1+q} \right) ^\top\) is the corresponding vector of regression coefficients and \({\textbf{e}}\) is the error term. Following the PLS approach described above, from the model fit resulting from each of the L coordinate systems, the focus is on the coefficient estimate of the first balance \(b_{1}^{(l)}\). So the estimates of interest can be summarised in the vector \(\left( {\hat{\beta }}_1^{(1)}, \ldots , {\hat{\beta }}_1^{(L)}, {\hat{\beta }}_D, \ldots , {\hat{\beta }}_{D-1+q} \right)\). Note that the coefficients associated with the non-compositional covariates are invariant to the specific choice of balances due to the orthogonality of the coordinate representation and linearity of PLS regression (Helland 2010). This property also leads to the fact that the decomposition of the matrix \({\textbf{F}}^{(l)}\) yields the same score matrix \({\textbf{T}}\) in each of the L models; although of course different first \(D-1\) rows of the matrix of loadings

$$\begin{aligned} {\textbf{P}}^{(l)} = \begin{pmatrix} p_{11}^{(l)} &{} \cdots &{} p_{a1}^{(l)} \\ \vdots &{} \ddots &{} \vdots \\ p_{1, D-1}^{(l)} &{} \cdots &{} p_{a, D-1}^{(l)} \\ p_{1D} &{} \cdots &{} p_{aD} \\ \vdots &{} \ddots &{} \vdots \\ p_{1, D-1+q} &{} \cdots &{} p_{a, D-1+q} \end{pmatrix} \end{aligned}$$

are obtained from each model.

For the PLS biplot, we display only loadings corresponding to the first balance from each coordinate system and the loadings corresponding to non-compositional variables. That is, we use the matrix of scores corresponding to the first two PLS components \({\textbf{T}}_{2}=({\textbf{t}}_1, {\textbf{t}}_2)\) for the visualisation of the observations. Denoting by \({\textbf{P}}_{2}^{(l)}=({\textbf{p}}_1^{(l)}, {\textbf{p}}_2^{(l)})\) the matrix of loadings corresponding to the first two PLS components in the lth coordinate system, the first row of \({\textbf{P}}_{2}^{(l)}\) is used for the representation of the balance \(b_{1}^{(l)}\) with \(l=1, \ldots , L\); and the last q rows from any given \({\textbf{P}}_{2}^{(l)}\) are used to visualise the q non-compositional covariates. When interpreting the biplot, similarly to the case of PCA biplots (Kynčlová et al. 2016), we need to take into account that the loadings are generated from different PLS models. Then, observations in the direction of an arrow are characterised by higher (absolute) values on the corresponding balance (and hence by dominance of the parts in the numerator over those in the denominator of the logratio). Nevertheless, note that due to the way the PLS biplot is constructed, i.e. using first balances from different coordinate systems, the usual interpretation of the angles between arrows (loadings) in terms of degree of association between the corresponding balances would be misleading. For example, we cannot say that two balances are highly correlated based on the fact that their two arrows nearly overlap. However, in such a case, we can still refer to proximity between the balances, in the sense that having them pointing in the same direction indicates that they have a similar relationship with the response variable. This follows from the fact that the same scores are produced by all balance coordinate systems (as is the case for any olr coordinate system) and the corresponding matrices of loadings are orthogonal rotations of each other.

Building on this relationship between coordinate systems, a model can be fitted in an arbitrary balance coordinate system and used to obtain the regression coefficients and loadings for all the L models. Namely, it holds that

$$\begin{aligned} \left( {\hat{\beta }}_1^{(l)}, \ldots , {\hat{\beta }}_{D-1}^{(l)} \right) ^\top = {\textbf{G}}^{(l)} \left( {\textbf{G}}^{(k)} \right) ^\top \left( {\hat{\beta }}_1^{(k)}, \ldots , {\hat{\beta }}_{D-1}^{(k)} \right) ^\top \end{aligned}$$

and

$$\begin{aligned} \begin{pmatrix} p_{11}^{(l)} &{} \cdots &{} p_{a1}^{(l)} \\ \vdots &{} \ddots &{} \vdots \\ p_{1, D-1}^{(l)} &{} \cdots &{} p_{a, D-1}^{(l)} \end{pmatrix} = {\textbf{G}}^{(l)} \left( {\textbf{G}}^{(k)} \right) ^\top \begin{pmatrix} p_{11}^{(k)} &{} \cdots &{} p_{a1}^{(k)} \\ \vdots &{} \ddots &{} \vdots \\ p_{1, D-1}^{(k)} &{} \cdots &{} p_{a, D-1}^{(k)} \end{pmatrix}, \quad k, l = 1, \ldots , L, \end{aligned}$$

where \({\textbf{G}}^{(k)}\) and \({\textbf{G}}^{(l)}\) stand for the respective matrices of log-contrast coefficients.
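The change of coordinate system above can be illustrated numerically. The following is a minimal Python sketch rather than the authors' implementation: it assumes that each balance system is coded by the sign matrix of a sequential binary partition and that balances use the standard normalising constants. The three-part composition, the two partitions and the coefficient values are hypothetical.

```python
import math

def contrast_matrix(signs):
    """Turn a sign-coded sequential binary partition (+1 numerator,
    -1 denominator, 0 not involved) into a log-contrast matrix G."""
    G = []
    for row in signs:
        r = sum(1 for v in row if v > 0)   # number of parts in the numerator
        s = sum(1 for v in row if v < 0)   # number of parts in the denominator
        up = math.sqrt(s / (r * (r + s)))
        down = -math.sqrt(r / (s * (r + s)))
        G.append([up if v > 0 else down if v < 0 else 0.0 for v in row])
    return G

def matvec(A, x):
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Two hypothetical balance systems for a 3-part composition (x1, x2, x3).
G_k = contrast_matrix([[1, -1, -1], [0, 1, -1]])   # x1 | x2.x3, then x2 | x3
G_l = contrast_matrix([[-1, 1, -1], [1, 0, -1]])   # x2 | x1.x3, then x1 | x3

beta_k = [0.5, -0.2]                          # hypothetical estimates in system k
clr_from_k = matvec(transpose(G_k), beta_k)   # (G^(k))^T beta^(k)
beta_l = matvec(G_l, clr_from_k)              # G^(l) (G^(k))^T beta^(k)
clr_from_l = matvec(transpose(G_l), beta_l)   # back-transform from system l

print(beta_l)
```

Both systems map back to the same vector of coefficients on the clr scale, which is what makes it sufficient to fit a single model.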

Consequently, we can obtain results for all the L models from one model, where compositions are expressed in clr coefficients instead of balances (note that unlike for ordinary least squares (LS) regression, collinearity between clr variables is not a problem for PLS regression). The regression coefficient estimates and the loadings for the non-compositional covariates, as well as the matrix of scores, are the same for any of the L models using balances. Denoting the regression coefficient estimates for the clr coefficients by \({\hat{\beta }}_1^{\star }, \ldots , {\hat{\beta }}_{D}^{\star }\) and the elements in the first D rows of the loading matrix by \(p_{ij}^{\star }\), it holds that

$$\begin{aligned} \left( {\hat{\beta }}_1^{(l)}, \ldots , {\hat{\beta }}_{D-1}^{(l)} \right) ^\top = {\textbf{G}}^{(l)} \left( {\hat{\beta }}_1^{\star }, \ldots , {\hat{\beta }}_{D}^{\star } \right) ^\top \end{aligned}$$

and

$$\begin{aligned} \begin{pmatrix} p_{11}^{(l)} &{} \cdots &{} p_{a1}^{(l)} \\ \vdots &{} \ddots &{} \vdots \\ p_{1, D-1}^{(l)} &{} \cdots &{} p_{a, D-1}^{(l)} \end{pmatrix} = {\textbf{G}}^{(l)} \begin{pmatrix} p_{11}^{\star } &{} \cdots &{} p_{a1}^{\star } \\ \vdots &{} \ddots &{} \vdots \\ p_{1D}^{\star } &{} \cdots &{} p_{aD}^{\star } \end{pmatrix}, \quad l = 1, \ldots , L. \end{aligned}$$

Then, the relevant information about the L first balances can be obtained as

$$\begin{aligned} \begin{pmatrix} {\hat{\beta }}_1^{(1)} \\ \vdots \\ {\hat{\beta }}_{1}^{(L)} \end{pmatrix} = \begin{pmatrix} h_{1}^{(1, k)} &{} \cdots &{} h_{D-1}^{(1, k)} \\ \vdots &{} \ddots &{} \vdots \\ h_{1}^{(L, k)} &{} \cdots &{} h_{D-1}^{(L, k)} \end{pmatrix} \begin{pmatrix} {\hat{\beta }}_1^{(k)} \\ \vdots \\ {\hat{\beta }}_{D-1}^{(k)} \end{pmatrix} = \begin{pmatrix} g_{11}^{(1)} &{} \cdots &{} g_{1D}^{(1)} \\ \vdots &{} \ddots &{} \vdots \\ g_{11}^{(L)} &{} \cdots &{} g_{1D}^{(L)} \end{pmatrix} \begin{pmatrix} {\hat{\beta }}_1^{\star } \\ \vdots \\ {\hat{\beta }}_{D}^{\star } \end{pmatrix} \end{aligned}$$

and the composed matrix of loadings, with \(a=2\) for 2-dimensional graphical representation, is

$$\begin{aligned} \begin{pmatrix} p_{11}^{(1)} &{} p_{21}^{(1)} \\ \vdots &{} \vdots \\ p_{11}^{(L)} &{} p_{21}^{(L)} \end{pmatrix} = \begin{pmatrix} h_{1}^{(1, k)} &{} \cdots &{} h_{ D-1}^{(1, k)} \\ \vdots &{} \ddots &{} \vdots \\ h_{1}^{(L, k)} &{} \cdots &{} h_{ D-1}^{(L, k)} \end{pmatrix} \begin{pmatrix} p_{11}^{(k)} &{} p_{21}^{(k)} \\ \vdots &{} \vdots \\ p_{1, D-1}^{(k)} &{} p_{2, D-1}^{(k)} \end{pmatrix} = \begin{pmatrix} g_{11}^{(1)} &{} \cdots &{} g_{1D}^{(1)} \\ \vdots &{} \ddots &{} \vdots \\ g_{11}^{(L)} &{} \cdots &{} g_{1D}^{(L)} \end{pmatrix} \begin{pmatrix} p_{11}^{\star } &{} p_{21}^{\star } \\ \vdots &{} \vdots \\ p_{1D}^{\star } &{} p_{2D}^{\star } \end{pmatrix}, \end{aligned}$$

where

$$\begin{aligned} h_{j}^{(l, k)} = \sum _{i=1}^{D} g_{1i}^{(l)} g_{ji}^{(k)}, \quad j = 1, \ldots , D-1; \ k, l = 1, \ldots , L. \end{aligned}$$

Finally, note that for pivot coordinates it holds that \({\hat{\beta }}_1^{(l)} = \sqrt{D/(D-1)} \cdot {\hat{\beta }}_l^{\star }\) and \(p_{j1}^{(l)} = \sqrt{D/(D-1)} \cdot p_{jl}^{\star }\), with \(l = 1, \ldots , L\) and \(j=1,\ldots , a\). This is analogous to the case of loadings in principal component analysis (Kynčlová et al. 2016). The relationships between PLS regression results from different coordinate systems are examined in more detail in Appendix A.
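The pivot-coordinate shortcut can be checked with a few lines of Python. This is a sketch under stated assumptions: the clr coefficient values are hypothetical and are assumed to sum to zero, and the first row of the log-contrast matrix is taken to be the usual one for a pivot coordinate, with the pivot part set against the geometric mean of the remaining parts.

```python
import math

D = 5
# Hypothetical clr coefficient estimates; they are assumed to sum to zero.
beta_clr = [0.30, -0.10, 0.05, -0.40, 0.15]

def first_pivot_row(l, D):
    """First row of the log-contrast matrix when part l (0-based) is the pivot."""
    row = [-1.0 / math.sqrt(D * (D - 1))] * D
    row[l] = math.sqrt((D - 1) / D)
    return row

scale = math.sqrt(D / (D - 1))
for l in range(D):
    # Coefficient of the first pivot coordinate via the log-contrast row ...
    via_contrast = sum(g * b for g, b in zip(first_pivot_row(l, D), beta_clr))
    # ... and via the rescaled clr coefficient.
    via_shortcut = scale * beta_clr[l]
    print(l + 1, round(via_contrast, 6), round(via_shortcut, 6))
```

The two printed columns agree for every part, so under the zero-sum assumption no explicit change of coordinates is needed for pivot coordinates.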

4 Case study: examining the association of 24-h movement behaviours with adiposity

The association between the 24-h time-use composition MB = (Sleep,SB,LPA,MPA,VPA) and adiposity is investigated in a sample of 414 Czech school-aged girls. Fat mass (FM) is used as an adiposity-related indicator and was measured using multi-frequency bio-electrical impedance analysis (InBody 720, InBody Co., Seoul, Korea). The 24-h time-use data were collected using wrist-worn tri-axial ActiGraph accelerometers (ActiGraph Corp., Pensacola, FL, USA), wGT3X-BT and GT9X Link for children and adolescents, respectively. Raw data were processed with the GGIR package (version 1.10-7) in the R system for statistical computing (Migueles et al. 2019). Time spent in SB, LPA, MPA and VPA was classified using cut-points for the non-dominant wrist (SB: \(< 36\) mg; LPA: \(36-200\) mg; MPA: \(201-706\) mg; VPA: \(\ge 707\) mg, where mg stands for milli gravity-based acceleration units (Hildebrand et al. 2017)). The default algorithm guided by the participants' sleep log was used to detect sleep time (van Hees et al. 2015). Additionally, age and height were recorded. The age of the participants ranged from 8.1 to 19 years (with mean \(\pm\) standard deviation equal to \(13.8 \pm 2.9\) years). The height ranged from 117.7 to 181.9 cm (\(157.7 \pm 12.2\) cm). Fat mass ranged from 1.1 to 38.7 kg (\(12.5 \pm 6.6\) kg). The compositional mean for MB, computed as the vector of geometric means of its parts and re-scaled to be expressed in min/day, was (491.3, 676.0, 231.9, 38.2, 2.5). The study was approved by the Ethics Committee of the Faculty of Physical Culture, Palacký University Olomouc (reference number: 19/2017). It was conducted in accordance with the ethical principles of the 1964 Declaration of Helsinki and its later amendments. Parents or guardians provided written informed consent for the participation of their children in the study.
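As an illustration of how a compositional mean of this kind is obtained, the following Python sketch computes part-wise geometric means and re-scales them to a chosen total. The three records are hypothetical and are not taken from the study data; the function names are likewise illustrative.

```python
import math

def closure(x, kappa):
    """Re-scale the parts so that they add up to kappa."""
    total = sum(x)
    return [kappa * v / total for v in x]

def compositional_mean(rows, kappa=1440.0):
    """Vector of part-wise geometric means, closed to kappa."""
    n = len(rows)
    geo = [math.exp(sum(math.log(r[j]) for r in rows) / n)
           for j in range(len(rows[0]))]
    return closure(geo, kappa)

# Hypothetical records (Sleep, SB, LPA, MPA, VPA) in min/day.
toy = [
    [500.0, 660.0, 240.0, 36.0, 4.0],
    [470.0, 700.0, 225.0, 43.0, 2.0],
    [510.0, 655.0, 230.0, 42.0, 3.0],
]
centre_min = compositional_mean(toy)
# The same data in hours/day, closed to 24 h.
centre_h = compositional_mean([[v / 60.0 for v in r] for r in toy], kappa=24.0)
print([round(v, 1) for v in centre_min])
```

Expressing the same records in hours/day gives the same composition up to the unit of the total, in line with the scale invariance of compositional data.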

To gain an initial insight into the problem at hand, we can visualise the (centred) data using ternary diagrams. Instead of plotting all 10 possible combinations of 3-part subcompositions, we display the 3-part composition of sleep, SB and the amalgamated parts of PA; as well as the 3-part subcomposition consisting of the PA parts only (Fig. 2 in Appendix B). The data points are coloured according to the individuals’ fat mass. We can already observe some patterns from this limited visualisation. Particularly, it indicates that a higher proportion of time spent in SB is associated with higher fat mass, whereas the association points in the opposite direction for VPA.

In the following, we illustrate how applying the proposed PLS regression biplot based on pivoting balances provides a more comprehensive view. As justified in Sect. 2.1, \(L=7\) regression models (3) are considered with FM (in log-scale) set as the response variable and the composition MB playing the role of explanatory variable, the latter being represented by sets of pivoting balances \(\left( b_{1}^{(l)}, b_{2}^{(l)}, b_{3}^{(l)}, b_{4}^{(l)}\right) , \ l=1,\ldots , 7\), resulting from seven balance coordinate systems. Naturally, fat mass depends on height and age (with these two being highly correlated for individuals of school age). Accordingly, age and height (both mapped into real space using a log-transformation) are included as covariates in the models. PLS regression modelling is conducted as detailed in the previous section. Based on the randomised test approach, retaining the first two PLS components to produce the PLS biplot is considered adequate (CV RMSEP \(= 0.49\) and CV R\(^2 = 0.32\)). Thus, estimates of the regression coefficients are derived from the first two PLS components for each of the 7 regression models. In practice, following the relationships between olr coordinate systems detailed in Sect. 3.1, a single model is fitted and all the other estimates are derived from it.
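To make the notion of a first pivoting balance concrete, the sketch below evaluates one of them, Sleep.VPA_SB.LPA.MPA, at the compositional mean reported earlier. It assumes the standard balance definition (normalised logratio of the geometric means of the numerator and denominator parts); the exact definition used in Sect. 2.1 is not reproduced here, and the function name is illustrative.

```python
import math

def balance(x, num, den):
    """Balance between the parts in `num` (numerator) and `den` (denominator),
    assuming the standard normalising constant sqrt(r*s/(r+s))."""
    r, s = len(num), len(den)
    g_num = math.exp(sum(math.log(x[p]) for p in num) / r)
    g_den = math.exp(sum(math.log(x[p]) for p in den) / s)
    return math.sqrt(r * s / (r + s)) * math.log(g_num / g_den)

# Compositional mean of MB in min/day, as reported in the case study.
mean_mb = {"Sleep": 491.3, "SB": 676.0, "LPA": 231.9, "MPA": 38.2, "VPA": 2.5}
mean_mb_hours = {k: v / 60.0 for k, v in mean_mb.items()}

b = balance(mean_mb, ["Sleep", "VPA"], ["SB", "LPA", "MPA"])
b_hours = balance(mean_mb_hours, ["Sleep", "VPA"], ["SB", "LPA", "MPA"])
print(round(b, 3), round(b_hours, 3))
```

The value does not depend on whether time is recorded in minutes or hours, since only ratios between parts enter the balance.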

Table 2 shows the bootstrap standardised regression coefficients estimated from 1000 bootstrap resamples (including the respective 95% bootstrap confidence intervals) for each of the first pivoting balances \(b_1^{(l)},\ l=1,\ldots ,7\), as indicated in Table 1, as well as for the additional covariates. Considering the usual \(5\%\) significance level threshold, statistically significant balances having a negative relationship with FM are the following: Sleep_SB.LPA.MPA.VPA, VPA_Sleep.SB.LPA.MPA, Sleep.VPA_SB.LPA.MPA, Sleep.MPA.VPA_SB.LPA and Sleep.LPA.MPA.VPA_SB. That is, the significant balances are those including SB in the denominator and Sleep and/or VPA in the numerator, whereas the non-significant ones confront Sleep against VPA. Thus, the results suggest that for obesity prevention (in school-aged girls) it would be beneficial to spend relatively less time sitting and more time sleeping and doing high-intensity physical activity. Moreover, it is suggested that the two non-active behaviours (sleep and SB) have opposite associations with fat. That is, sleep, unlike SB, would play a positive role in fat reduction. Finally, both non-compositional covariates (age and height) are positively associated with FM, as would be expected.

Table 2 Bootstrap standardised coefficients (and \(95\%\) confidence intervals) from PLS regression fit to fat mass on movement behaviour pivoting balances with age and height as covariates (boldface indicates statistically significant results at 5% level)

It is worth mentioning that for this case study applying ordinary LS regression instead of PLS in a similar way (i.e. sequentially on the corresponding pivoting balance coordinate systems plus the covariates) leads to analogous final conclusions in terms of statistical significance for all the variables except for age, which is not identified as a relevant variable (see Table 5 in Appendix B). This counter-intuitive result is just a consequence of the high correlation between age and height, which is not handled well by LS regression but is handled adequately by PLS.

Furthermore, the construction of the PLS biplot based on the first pivoting balances (together with the non-compositional variables) from the seven coordinate systems provides further insight into which movement behaviour patterns within the 24-h period are associated with lower adiposity and would therefore be more recommendable (Fig. 1). The first two PLS components explain \(92.22\%\) of the variance of the explanatory variables and \(33.32\%\) of the variance of the response variable (i.e. the corresponding coefficient of determination for this model is 0.3332). The arrows representing the variables are coloured according to the sign of their respective associations with the FM outcome (positive in red and negative in blue for statistically significant associations, grey for non-significant associations). Note that extending the arrow of a balance backwards from the origin would represent the direction of increasing value of the corresponding reciprocal balance. A colour gradient is used to distinguish the points according to the individuals’ FM (in log-scale). The points are fairly well distinguished according to the relationships of the significant variables with the FM outcome along both PLS component axes. The arrows of the variables with significantly positive regression coefficients point roughly towards the top-right quadrant, which includes mainly individuals having higher fat mass. The variables with significantly negative regression coefficients point in the opposite direction (i.e. roughly towards the bottom-left quadrant), where the individuals with lower fat mass mostly concentrate. A few outlying cases are observed in the bottom-right quadrant, particularly two cases of lower fat mass corresponding to very young individuals with a low level of physical activity. We can see that the first PLS component is largely related to the relative amount of VPA, while the second one is determined rather by the relative amount of sleep (plus age and height). The balance that most clearly indicates the division between lower and higher fat mass is Sleep.VPA_SB.LPA.MPA. So increasing the ratios of both VPA and sleep to SB, LPA and MPA is an even better strategy for obesity prevention than only increasing VPA with respect to the other behaviours. From the results obtained via PLS regression and the associated biplot we can therefore draw the conclusion that VPA is the most beneficial behaviour for fat reduction; however, getting enough sleep is also relevant. Furthermore, the most beneficial action would be to spend more time in these activities at the expense of sitting, whereas there is no meaningful evidence of a beneficial impact of lighter PA.

Fig. 1

PLS biplot based on compositional balances from seven pivoting balance coordinate systems. The lighter the colour of the point, the higher the fat mass level of the corresponding individual. Statistically significant variables in positive (resp. negative) direction are coloured in red (resp. blue); grey indicates non-significant variables. The dashed lines indicate the origin for the first and second PLS component axes (PLS comp. 1 and PLS comp. 2). The first two PLS components explain \(92.22\%\) of the variance of the explanatory variables (resp. \(33.32\%\) of the variance of the response variable): \(88.50\%\) by PLS comp. 1 and \(3.72\%\) by PLS comp. 2 (resp. \(20.49\%\) by PLS comp. 1 and \(12.83\%\) by PLS comp. 2)

4.1 Comparison with pivot coordinates and PCA

For comparison, we can also look at the results obtained from PLS regression performed on pivot coordinates (Table 6 and Fig. 3 in Appendix B; the notation has been simplified as there is no need to detail the parts in the denominator). The results indicate that only those pivot coordinates that also appear among the pivoting balances are significant, and the PLS biplot leads to similar conclusions. However, note that if e.g. increasing MPA were also beneficial for fat reduction, the pivot coordinate representing MPA against the geometric mean of the others (which also aggregates logratios to VPA and sleep) would hardly give this information. Instead, the balance Sleep.MPA.VPA_SB.LPA could provide it when compared to Sleep.VPA_SB.LPA.MPA. Furthermore, we can compare the resulting PLS biplots with appropriate PCA biplots (Figs. 4 and 5 in Appendix B). Note that the relationships between PCA scores and loadings from different coordinate systems are analogous to those in the PLS case. Here, the beneficial impact of VPA for obesity prevention is depicted, but the role of sleep is not that easily deduced.

4.2 Isotemporal substitution analysis

By performing isotemporal substitution analysis (Dumuid et al. 2017a), we can look at how reallocation of time (in an absolute sense) from one behaviour to another would impact the outcome variable. Table 3 shows the estimated change in FM associated with a 1-min reallocation between movement behaviours. These estimates were computed as follows: (1) PLS regression coefficient estimates obtained from each bootstrap resample were used to compute the relative difference between the predicted FM from the reallocated mean composition and the FM from the original mean composition (with parts adding to \(\kappa = 24\) h); (2) the mean across bootstrap resamples was used as the final estimate of the change (including a confidence interval based on the \((\alpha /2)\)- and \((1-\alpha /2)\)-quantiles of the distribution).
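Step (1) can be sketched in Python for a single set of coefficients. This is an illustrative sketch only: the clr coefficients are hypothetical (they are not the estimates behind Table 3), the response is assumed to be the natural logarithm of FM, and the covariates are assumed to be held fixed so that they cancel in the difference. In the procedure described above, the same computation would be repeated for each bootstrap resample.

```python
import math

PARTS = ["Sleep", "SB", "LPA", "MPA", "VPA"]
mean_mb = [491.3, 676.0, 231.9, 38.2, 2.5]    # min/day, compositional mean from the case study
beta_clr = [-0.10, 0.20, 0.02, 0.03, -0.15]   # hypothetical clr coefficients (zero sum)

def clr(x):
    """Centred logratio coefficients of a composition."""
    logs = [math.log(v) for v in x]
    centre = sum(logs) / len(logs)
    return [v - centre for v in logs]

def reallocate(x, source, target, minutes):
    """Move `minutes` from one behaviour to another, keeping the total fixed."""
    y = list(x)
    y[PARTS.index(source)] -= minutes
    y[PARTS.index(target)] += minutes
    return y

def percent_change(x_old, x_new, beta):
    """Relative difference (in %) between predicted FM at x_new and at x_old,
    assuming a model for log(FM) that is linear in the clr coefficients."""
    delta = sum(b * (new - old)
                for b, new, old in zip(beta, clr(x_new), clr(x_old)))
    return 100.0 * (math.exp(delta) - 1.0)

sb_to_vpa = reallocate(mean_mb, "SB", "VPA", 1.0)
change = percent_change(mean_mb, sb_to_vpa, beta_clr)
print(round(change, 2))
```

Because the response is modelled on the log scale, the difference in predictions is exponentiated to obtain a relative, rather than absolute, change in FM.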

At the usual \(5\%\) significance level, statistically significant changes were obtained for time reallocations between VPA and any of the remaining activities and for time reallocation between sleep and SB. Hence, this supports the idea that obesity can be prevented by increasing VPA even at the expense of sleep. Moreover, note that to compare e.g. the impact of a time exchange of SB and VPA against SB and sleep, we should take into account that, in relative terms, 1 min of VPA means something completely different than 1 min of sleep. That is, taking our mean composition as reference, 1 min corresponds to \(40\%\) of VPA, whereas \(40\%\) of sleep is about 200 min. Estimating the change in FM for a 200-min reallocation from sleep to SB (respectively from SB to sleep) we obtain a \(31.17\% \ (20.35, 43.84)\%\) increase (respectively a change of \(-20.72\% \ (-26.78, -13.69)\%\), i.e. a decrease).

Table 3 Estimated percentage changes in FM (with \(95\%\) confidence intervals) associated with 1 min reallocation between movement behaviours (statistically significant results at \(\alpha = 0.05\) indicated in boldface)

Therefore, it seems reasonable to also examine the effect of relative time reallocations (something that in fact aligns with what pivoting balances are meant for). That is, looking at the difference in the outcome variable when, instead of a part \(x_j\), \(j=1,\ldots , D\), we consider \(R \cdot x_j\), with R being a proportional factor. Note that in order to have all the parts adding up to \(\kappa = 24\) h, the remaining parts \(x_k\), with \(k=1,\ldots , D\) and \(\ k\ne j\), need to be multiplied by \((\kappa - Rx_j)/(\kappa -x_j)\). If two parts \(x_i\) and \(x_j,\) with \(\ i, j =1,\ldots , D\) and \(\ i \ne j\), are multiplied by R, then the remaining parts \(x_k\), for \(\ k=1,\ldots , D\) and \(\ k\ne i,j\), need to be multiplied by \([\kappa - R(x_i+x_j)]/[\kappa -(x_i+x_j)]\), and so on for more parts. Using this procedure, Table 4 shows the results from increasing a part by \(10\%\), i.e. using \(R=1.1\), while the remaining parts are proportionally decreased. These results confirm that the relative increase in both VPA and sleep with respect to the remaining behaviours leads to an even larger decrease in FM than only increasing VPA at the expense of the remaining behaviours.
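The relative reallocation rule can be written compactly. The Python sketch below applies it to the compositional mean reported in the case study, first closed to exactly \(\kappa = 1440\) min because the published values are rounded; the function name and the index convention are illustrative assumptions.

```python
def relative_reallocation(x, increased, R, kappa):
    """Multiply the parts indexed by `increased` by R and shrink the
    remaining parts proportionally so that the total stays at kappa."""
    moved = sum(x[i] for i in increased)
    shrink = (kappa - R * moved) / (kappa - moved)
    return [R * v if i in increased else shrink * v for i, v in enumerate(x)]

kappa = 1440.0                                # 24 h expressed in minutes
raw = [491.3, 676.0, 231.9, 38.2, 2.5]        # Sleep, SB, LPA, MPA, VPA (rounded means)
x = [kappa * v / sum(raw) for v in raw]       # close the rounded means to exactly kappa

more_vpa = relative_reallocation(x, {4}, 1.1, kappa)            # +10% VPA
more_sleep_vpa = relative_reallocation(x, {0, 4}, 1.1, kappa)   # +10% sleep and VPA
print([round(v, 1) for v in more_vpa])
print([round(v, 1) for v in more_sleep_vpa])
```

The ratios between the parts that are not increased are left unchanged, which is what distinguishes this from a one-to-one exchange of minutes between two behaviours.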

Table 4 Estimated percentage changes in FM (with \(95\%\) confidence intervals) associated with \(10\%\) increase in given movement behaviours (statistically significant results at \(\alpha =0.05\) indicated in boldface)

5 Final remarks

This work introduces and demonstrates the potential of the new concept of pivoting balance coordinates. In combination with a tailored compositional PLS regression biplot display, we devise an advanced method to investigate and visualise movement behaviour patterns and their association with health outcomes. This type of low-dimensional representation becomes particularly relevant as the 24-h movement behaviours are increasingly described at a higher level of granularity thanks to the use of new accelerometer technologies. The proposed method combines advantages of the pivot coordinate and the balance coordinate logratio approaches to deal with compositional data. In particular, only those balances which are relevant for the analysis are considered and the mutual orthonormality of the coordinate systems allows a unique PLS solution to be achieved. The user just needs to be aware that the pivoting balances come from different coordinate systems and adapt the interpretation of their relationships accordingly.

The practical use of the method has been illustrated using a data set on 24-h movement behaviour and adiposity in Czech school-aged girls. The findings stress the relevant role that time-use exchanges combining both VPA and sleep can play in obesity prevention. The study emphasises the relevance of considering all movement behaviours within the day, while accounting for their compositional character, to better understand the combined effects of movement behaviours on obesity within the paediatric population.

Building on previous developments, this work demonstrates the flexibility of the pivot coordinates approach (Hron et al. 2020). In principle, any logratio of interest (even beyond the concept of balances) can be set to occupy the pivotal position and be complemented by other coordinates to achieve orthonormality of the resulting olr coordinate system. We expect further methodological progress following this approach in the near future.

All computations in this work were performed in the R environment for statistical computing (R Core Team 2021), using the packages compositions (van den Boogaart and Tolosana-Delgado 2008) and pls (Mevik and Wehrens 2007) for specific tasks. The computing routines implementing the proposed method are available at https://github.com/StefelovaN/Balance-based-PLS-biplot/.