Human beings constantly move the eyes to sample visual information of interest from the environment. Eye fixations deliver inputs with the highest resolution to the human visual cortex from the fovea, as well as blurry, low-spatial-frequency information from peripheral vision (Rayner, 1998). Thus, isolating statistically where and how long fixations are deployed to process visual information is of particular interest to behavioral researchers, psychologists, and neuroscientists. Moreover, fixation mapping has a wide range of practical applications in determining marketing strategies and the understanding of consumer behaviour (Duchowski, 2002).

Conventional eye movement data analyses rely on the estimation of probabilities of occurrence of fixations and saccades (or their characteristics, such as duration or length) within predefined regions of interest (ROIs), which are at best defined a priori—but often also defined a posteriori, on the basis of data exploration, which inflates the Type I error rate. Another issue with ROIs is of course that other important information not included in the ROI is discarded. In a continuous effort to circumvent the limitations of the ROI approach (for a detailed discussion on this point, see Caldara & Miellet, 2011), we previously developed an unbiased, data-driven approach to compute statistical fixation maps of eye movements: the iMap toolbox (Caldara & Miellet, 2011). From the very first version, the toolbox was developed as a MATLAB open source toolbox freely available for download online. The previous versions (1 and 2) made use of Gaussian smoothing and the random field theory as a statistical engine (Caldara & Miellet, 2011), which is one of the standard methods applied in statistical analyses for functional magnetic resonance imaging (fMRI) data (Penny, Friston, Ashburner, Kiebel, & Nichols, 2011). Version 3 introduced pixel-wise t test and bootstrap clustering in order to generate self-contained statistical maps (Miellet, Lao, & Caldara, 2014). However, all of the previous versions of iMap still suffered from a major limitation: They could only contrast two conditions at a time.

A major revision of the toolbox was necessary to enable the analysis of more complex experimental designs routinely used in the field. One of the most suitable and obvious statistical solutions to overcome this problem would be to implement a general linear model, a widespread approach in both behavioral and neuroimaging data analyses. In fact, many modern procedures for hypothesis testing, such as the t test, analysis of variance (ANOVA), regression, and so forth, belong to the family of general linear models. However, eye movement data are a sparse product of visual perceptual sampling. Unlike neuroimaging data, eye movement data contain many empty cells with little to no data points across the tested space (e.g., all of the pixels in an image). This caveat engenders a statistical problem when the same statistical inference procedure is applied on each pixel, regardless of whether or not its data are missing. To account for the sparseness and the high variation of spatial eye movement data, we developed a specific novel approach for smoothed fixation maps, which was inspired by the statistical framework implemented in diverse state-of-the-art neuroimaging data-processing toolboxes: statistical parametric mapping (SPM; Penny et al., 2011), Fieldtrip (Oostenveld, Fries, Maris, & Schoffelen, 2011), and LIMO EEG (Pernet, Chauveau, Gaspar, & Rousselet, 2011). In the simplest case, users can apply a massive univariate, pixel-wise linear mixed model (LMM) on the smoothed fixation data with the subject considered as a random effect, which offers the flexibility to code for multiple between- and within-subjects comparisons. Our approach allows users to perform all possible linear contrasts for the fixed effects (main effects, interactions, etc.) from the resulting model coefficients and the estimated covariance.
Importantly, we also introduced a novel nonparametric statistical test based on resampling (permutation and bootstrap spatial clustering) to assess the statistical significance of the linear contrasts (Pernet, Latinus, Nichols, & Rousselet, 2015; Winkler, Ridgway, Webster, Smith, & Nichols, 2014).

In the next section, we briefly describe the key concepts of the LMM approach. We then introduce the novel nonparametric statistical approach on the fixed effects that we implemented in iMap4, which uses a resampling procedure and spatial clustering. We also report a validation of the proposed resampling procedures, and illustrate how iMap4 can be used, with both a subset of data from a previous study and computer-simulated data. Finally, we give an overview of future development and discuss technical insights on eye fixation mapping.

Linear mixed models

In this part, we outline the key elements and concepts of LMMs in comparison with general linear models (GLM) and hierarchical linear models (HLM). Mixed models represent a complex topic, and the discussion of many underlying mathematical details goes beyond the scope of this article. For general, thoughtful introductions to mixed models, users of the toolbox should refer to Raudenbush and Bryk (2002) and McCulloch, Searle, and Neuhaus (2011). Users may also wish to consult the documentation and help files of the LinearMixedModel class in the MATLAB Statistics Toolbox for details about parameter estimation and the available methods.

Statistical hypothesis testing methods that make use of the analysis of variance (regression, t test, ANOVA, analysis of covariance, etc.) are the most popular methods of data analysis in many fields of research. Commonly used in psychology and neuroimaging studies, these methods could all be written as particular cases of GLM:

$$ \begin{array}{c}\hfill {y}_i={\beta}_1{x}_{1i}+{\beta}_2{x}_{2i}+\dots +{\beta}_t{x}_{ti}+{\varepsilon}_i,\kern1.25em \hfill \\ {}\hfill {\upvarepsilon}_i \sim \mathrm{N}\left(0,{\sigma}^2\right),\hfill \end{array} $$

where \( y_i \) is the ith experimental measure and \( \beta_1, \beta_2, \dots, \beta_t \) are the model coefficients. The error term \( \varepsilon_i \) is normally distributed with mean 0 and variance \( \sigma^2 \). Alternatively, the GLM (Eq. 1) could be expressed in matrix form:

$$ \begin{array}{c}\hfill \mathrm{Y}=\mathrm{X}\beta +\varepsilon, \kern1.5em \hfill \\ {}\hfill \varepsilon \sim \mathrm{N}\left(0,\ {\sigma}^2I\right),\hfill \end{array} $$

where matrix \( \mathrm{X} = [x_1, x_2, \dots, x_t] \) is the design matrix, and I is an n-by-n identity matrix (n being the total number of observations). Usually, one of the columns in X is a column of ones, so that the model includes a constant or intercept coefficient that represents the overall mean. It is worth noting that the design matrix could be parameterized in different ways. In conventional psychology or behavioral research, a sigma-restricted parameterization is often applied. In a sigma-restricted design matrix, X has full column rank (so that \( {\mathrm{X}}^{\mathrm{T}}\mathrm{X} \) is invertible), and the degrees of freedom are equal to the number of columns. In comparison, many types of neuroimaging analysis software prefer a cell-mean model or an overparameterized design matrix at the single-subject level (Penny et al., 2011; Pernet et al., 2011). With an overparameterized design matrix, the solution to Eq. 2 is obtained by applying the pseudo-inverse of the design matrix X to the response vector Y. The form of the design matrix is important, since it codes different experimental designs and the intended statistical testing. In iMap4, the design matrix of the fixed effect can be a cell-mean model, a sigma-restricted model (for the Type III ANOVA), or the offset from a reference model (for the Type I ANOVA).
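To make the two parameterizations concrete, the following minimal sketch builds a cell-mean and a sigma-restricted design matrix for a single hypothetical three-level factor. It is written in Python with NumPy purely for illustration; the function names and the toy factor are assumptions and are not part of iMap4. Both matrices span the same column space, so they yield the same fitted values and differ only in how the coefficients are interpreted.

```python
import numpy as np

def cell_mean_design(levels, n_levels):
    """One indicator column per level (cell-mean coding, no intercept)."""
    X = np.zeros((len(levels), n_levels))
    X[np.arange(len(levels)), levels] = 1.0
    return X

def sigma_restricted_design(levels, n_levels):
    """Intercept plus n_levels - 1 effect-coded columns (last level coded -1)."""
    cm = cell_mean_design(levels, n_levels)
    effects = cm[:, :-1] - cm[:, [-1]]
    return np.column_stack([np.ones(len(levels)), effects])

# Hypothetical factor: 3 levels, 2 observations per level.
levels = np.array([0, 0, 1, 1, 2, 2])
X_cm = cell_mean_design(levels, 3)
X_sr = sigma_restricted_design(levels, 3)

# Hat matrices: identical because the column spaces are identical.
H_cm = X_cm @ np.linalg.pinv(X_cm)
H_sr = X_sr @ np.linalg.pinv(X_sr)
```

In the cell-mean coding each coefficient is a condition mean, whereas in the sigma-restricted coding the first coefficient is the grand mean and the remaining ones are deviations from it.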

The coefficient estimates \( \widehat{\boldsymbol{\beta}} \) can be found easily by ordinary least squares or other, more robust methods. Finally, statistical inferences on the model estimates can be expressed in different forms, depending on the type of design matrix. In the case of sigma-restricted parameterization, we can separate the design matrix \( {\mathrm{X}}^{\mathrm{sr}} \) and the vector of parameters \( {\beta}^{\mathrm{sr}} \) into two parts (Kherad-Pajouh & Renaud, 2010):

$$ {\mathrm{X}}^{\mathrm{sr}}=\left[{\mathrm{X}}_1\kern0.5em {\mathrm{X}}_2\right], {\beta}^{\mathrm{sr}}=\left[\begin{array}{c}\hfill {\beta}_1\hfill \\ {}\hfill {\beta}_2\hfill \end{array}\right] $$

where \( {\mathrm{X}}_1 \) and \( {\beta}_1 \) are the components of interest, with the corresponding hypotheses:

$$ {\mathrm{H}}_0\ :\ {\boldsymbol{\beta}}_1 = 0\kern0.5em \mathrm{versus}\kern0.5em {\mathrm{H}}_1\ :\ {\boldsymbol{\beta}}_1\ \ne\ 0. $$

Given the Gaussian distribution of the error ε and the existence of the inverse or generalized inverse of the design matrix \( {\mathrm{X}}^{\mathrm{sr}} \), we can obtain the statistic for the F test by means of ANOVA with the following equations (for simplicity, in Eqs. 4–8 we denote \( \mathrm{X} = {\mathrm{X}}^{\mathrm{sr}} \)):

$$ \mathrm{H}=\mathrm{X}{\left({\mathrm{X}}^{\mathrm{T}}\mathrm{X}\right)}^{-}{\mathrm{X}}^{\mathrm{T}} $$
$$ {\mathrm{X}}_{\mathrm{resid}}=\left(I-{\mathrm{X}}_2{\left({\mathrm{X}}_2^{\mathrm{T}}{\mathrm{X}}_2\right)}^{-}{\mathrm{X}}_2^{\mathrm{T}}\right){\mathrm{X}}_1 $$
$$ {\mathrm{H}}_{\mathrm{resid}}={\mathrm{X}}_{\mathrm{resid}}{\left({\mathrm{X}}_{\mathrm{resid}}^{\mathrm{T}}{\mathrm{X}}_{\mathrm{resid}}\right)}^{-}{\mathrm{X}}_{\mathrm{resid}}^{\mathrm{T}} $$
$$ \mathrm{dfe}=\mathrm{Number}\ \mathrm{of}\ \mathrm{observations}-\mathrm{rank}\left(\mathrm{X}\right) $$
$$ F=\frac{{\mathrm{Y}}^{\mathrm{T}}{\mathrm{H}}_{\mathrm{resid}}\mathrm{Y}/\mathrm{rank}\left({\mathrm{X}}_1\right)}{{\mathrm{Y}}^{\mathrm{T}}\left(I-\mathrm{H}\right)\mathrm{Y}/\mathrm{dfe}}, $$

where H represents the hat matrix of the linear model in Eq. 2; it projects the response vector Y onto the column space of X. \( {\mathrm{H}}_{\mathrm{resid}} \) is the hat matrix of the hypothesis in Eq. 3, and dfe is the error (residual) degrees of freedom of the model. Under the null hypothesis, F has a Fisher–Snedecor distribution \( \mathcal{F}\;\left(\mathrm{rank}\left({\mathrm{X}}_1\right),\;\mathrm{d}\mathrm{f}\mathrm{e}\right) \).

As a comparison, with an overparameterized or cell-mean design matrix \( {\mathrm{X}}^{\mathrm{cm}} \) and the vector of parameters \( {\beta}^{\mathrm{cm}} \), the various effects are tested through linear combinations of the coefficients \( {\beta}^{\mathrm{cm}} \). For example, the equivalent of the hypothesis in Eq. 3 could be expressed as:

$$ {\mathrm{H}}_0\ :\ \mathbf{c}\ast {\boldsymbol{\beta}}^{\boldsymbol{cm}} = 0\kern0.5em \mathrm{versus}\kern0.5em {\mathrm{H}}_1\ :\ \mathbf{c}\ast {\boldsymbol{\beta}}^{\boldsymbol{cm}}\ \ne\ 0, $$

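where \( \mathrm{rank}\left(\mathbf{c}\right) = \mathrm{rank}\left({\mathrm{X}}_1\right) \) and \( \mathbf{c}\ast {\beta}^{\mathrm{cm}} = {\beta}_1 \) in the sigma-restricted parameterization model. The related F test is then given by the quadratic form of the linear contrast matrix c and the inverse of the covariance matrix of \( {\beta}^{\mathrm{cm}} \) (for simplicity, in Eqs. 10 and 11 we denote \( \mathrm{X} = {\mathrm{X}}^{\mathrm{cm}} \)):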

$$ \mathrm{M}\mathrm{S}\mathrm{E}={\mathbf{Y}}^{\mathbf{T}\ }\left(\boldsymbol{I} - \mathbf{H}\right)\ \mathbf{Y}/\left(\mathrm{d}\mathrm{f}\mathrm{e}\right) $$
$$ F=\frac{{\left(\mathbf{c}\ast {\boldsymbol{\beta}}^{\boldsymbol{cm}}\right)}^{\mathbf{T}}{\left(\mathrm{M}\mathrm{S}\mathrm{E}\ast \mathbf{c}{\left({\mathbf{X}}^{\mathbf{T}}\mathbf{X}\right)}^{-}{\mathbf{c}}^{\mathbf{T}}\right)}^{-}\left(\mathbf{c}\ast {\boldsymbol{\beta}}^{\boldsymbol{cm}}\right)\kern1.25em }{\mathrm{rank}\left(\mathrm{c}\right)}, $$

where H and dfe are computed using Eqs. 4 and 7, respectively. Moreover, it could be proved that Eqs. 8 and 11 are equivalent. The related details and mathematical proofs could be found in many textbooks (e.g., Christensen, 2011).
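The equivalence between the hat-matrix form (Eqs. 4–8) and the contrast form (Eqs. 10 and 11) can be checked numerically. The sketch below is a minimal Python/NumPy illustration on simulated one-way data; the function names, the random data, and the particular contrast matrix are assumptions made for the example, and the Moore–Penrose pseudo-inverse is used as the generalized inverse.

```python
import numpy as np

def f_by_hat_matrices(Y, X1, X2):
    """F statistic for H0: beta_1 = 0 using hat matrices (Eqs. 4-8)."""
    X = np.column_stack([X1, X2])
    n = len(Y)
    H = X @ np.linalg.pinv(X)                       # hat matrix of the full model
    X_resid = X1 - X2 @ np.linalg.pinv(X2) @ X1     # X1 with X2 partialled out
    H_resid = X_resid @ np.linalg.pinv(X_resid)     # hat matrix of the hypothesis
    dfe = n - np.linalg.matrix_rank(X)
    numerator = Y @ H_resid @ Y / np.linalg.matrix_rank(X1)
    denominator = Y @ (np.eye(n) - H) @ Y / dfe
    return numerator / denominator

def f_by_contrast(Y, X, c):
    """F statistic for H0: c @ beta = 0 using the quadratic form (Eqs. 10-11)."""
    n = len(Y)
    beta = np.linalg.pinv(X) @ Y
    resid = Y - X @ beta
    dfe = n - np.linalg.matrix_rank(X)
    mse = resid @ resid / dfe
    cov_cb = mse * (c @ np.linalg.pinv(X.T @ X) @ c.T)
    cb = c @ beta
    return cb @ np.linalg.pinv(cov_cb) @ cb / np.linalg.matrix_rank(c)

# Simulated one-way design: 3 conditions, 4 observations each (toy data).
rng = np.random.default_rng(0)
levels = np.repeat([0, 1, 2], 4)
Y = rng.normal(size=12) + np.array([0.0, 1.0, 2.0])[levels]

X_cm = np.eye(3)[levels]                 # cell-mean design
X1 = X_cm[:, :2] - X_cm[:, [2]]          # sigma-restricted effect columns
X2 = np.ones((12, 1))                    # intercept
c = np.array([[1.0, -1.0, 0.0],          # contrasts testing equal cell means
              [0.0, 1.0, -1.0]])

F_hat = f_by_hat_matrices(Y, X1, X2)
F_contrast = f_by_contrast(Y, X_cm, c)
```

Both routes return the same one-way ANOVA F value for the main effect, up to floating-point error.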

The GLM \( \mathrm{Y}=\mathrm{X}\beta +\varepsilon \) could be easily extended into a generalized form with \( \varepsilon \sim \mathrm{N}\left(0,\ {\sigma}^2V\right) \), where V is some known positive definite matrix. Moreover, if a more specific structure of the error ε is available, the GLM (Eq. 2), which has one random-effect term (the error ε), could be further extended into a mixed model. Mixed models include additional random-effect terms that can represent the clusters or classes. In a typical neuroimaging study, this could be the subjects or groups. In the following example, we consider a simplified case in which only the subject is considered as the additional random effect. This type of model is one of the most widely used models in both fMRI and electroencephalography (EEG).

As a demonstration, here we consider a random intercept and slope model, with both the intercept (i.e., the overall mean of each subject) and the slope (i.e., the differences among conditions within each subject) varying independently. This type of HLM, or so-called two-level linear model, takes the form of an expansion of Eq. 1 into:

$$ \begin{array}{c}\hfill {y}_{ij}={\beta}_{1j}{x}_{1ij}+{\beta}_{2j}{x}_{2ij}+\dots +{\beta}_{tj}{x}_{tij}+{\upvarepsilon}_{ij},\kern1em {\upvarepsilon}_{ij} \sim \mathrm{N}\ \left(0,\ {\sigma}^2\right)\hfill \\ {}\hfill\ {\beta}_{1j} = {\beta}_{10}+{b}_{1j}, \kern1em {b}_{1j} \sim \mathrm{N}\ \left(0,\ {\sigma}_1^2\right)\hfill \\ {}\hfill \vdots \hfill \\ {}\hfill\ {\beta}_{tj} = {\beta}_{t0}+{b}_{tj}, \kern1.5em {b}_{tj} \sim \mathrm{N}\ \left(0,\ {\sigma}_t^2\right),\hfill \end{array} $$

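where j stands for the jth subject. After substituting the subject-level parameters in the first-level model, Eq. 12 becomes

$$ {y}_{ij}={\beta}_{10}{x}_{1ij}+\dots +{\beta}_{t0}{x}_{tij}+{b}_{1j}{x}_{1ij}+\dots +{b}_{tj}{x}_{tij}+{\varepsilon}_{ij}. $$

If we express the subject-level predictor \( {x}_{tij} \) in the random effects by the term \( {z}_{tij} \), we get the LMM

$$ {y}_{ij}={\beta}_{10}{x}_{1ij}+\dots +{\beta}_{t0}{x}_{tij}+{b}_{1j}{z}_{1ij}+\dots +{b}_{tj}{z}_{tij}+{\varepsilon}_{ij}, $$

which corresponds to the standard form of LMMs: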

$$ \mathrm{Y}=\mathrm{X}\beta +\mathrm{Z}b+\varepsilon $$

where \( b\sim \mathrm{N}\left(0,\ {\sigma}^2D\right) \), \( \varepsilon \sim \mathrm{N}\left(0,\ {\sigma}^2I\right) \), b and ε are independent of each other, and \( {\sigma}^2D \) is the covariance matrix for the random effects. In the example here, D would be a j-by-j identity matrix. An alternative form of Eq. 12, as applied in LIMO EEG or SPM, can be found in Friston, Stephan, Lund, Morcom, and Kiebel (2005, Eq. 1).

HLMs are specific cases of LMMs. In a mixed model, factors are not necessarily hierarchical. Moreover, crossed factors between fixed effects and random effects are much easier to model in mixed than in hierarchical models. In addition, the fixed effects and random effects are estimated simultaneously in mixed models, which is not always the case in hierarchical models.

Parameter estimation in mixed models is much more complicated than in GLM or HLM. Assuming that the model in Eq. 13 has the error covariance matrix R, \( \mathrm{v}\mathrm{a}\mathrm{r}\left(\mathbf{Y}|\boldsymbol{b}\right)=\boldsymbol{R} \), this model is equivalent to \( \mathbf{Y}\sim \mathrm{N}\left(\boldsymbol{X}\boldsymbol{\beta },\;\boldsymbol{V}\right),\;\boldsymbol{V}=\boldsymbol{Z}\boldsymbol{D}{\boldsymbol{Z}}^{\boldsymbol{T}}+\boldsymbol{R} \). The estimation of the fixed effects β requires prior knowledge of V, which is usually unavailable. In practice, the variance component V is commonly replaced by an estimate \( \widehat{\boldsymbol{V}} \) based on one of several approaches, such as ANOVA, maximum likelihood (ML) estimation, restricted maximum likelihood (ReML) estimation, or Monte Carlo approximation (McCulloch, Searle, & Neuhaus, 2011; Pinheiro & Bates, 2000). In general, the model-fitting procedure of LMM is implemented in major statistical packages (e.g., R and Stata) by solving Henderson’s mixed model equations. iMap4 calls the MATLAB class LinearMixedModel from the Statistics Toolbox (versions R2013b or above) to estimate the coefficients (fixed effect β and random effect b) and the covariance matrix V with various options; key concepts with regard to parameter estimation can be found in the MATLAB documentation. In brief, model coefficients are estimated by ML or ReML, and the pattern of the covariance matrix of the random effects (D) could take the form of a full covariance matrix, a diagonal covariance matrix, or another symmetric structure.
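As an illustration of why V is needed, the sketch below assembles the marginal covariance \( \boldsymbol{V}=\boldsymbol{Z}\boldsymbol{D}{\boldsymbol{Z}}^{\boldsymbol{T}}+\boldsymbol{R} \) for a random-intercept model and computes the generalized least squares estimate of β. It is a Python/NumPy sketch under strong assumptions: the variance components are taken as known (in practice they are estimated by ML or ReML, as in the LinearMixedModel class), D is treated here as the covariance of b directly, and all names and values are hypothetical.

```python
import numpy as np

def marginal_covariance(Z, D, R):
    """Covariance of Y after integrating out the random effects: Z D Z' + R."""
    return Z @ D @ Z.T + R

def gls_fixed_effects(Y, X, V):
    """Generalized least squares estimate: (X' V^-1 X)^-1 X' V^-1 Y."""
    Vinv_X = np.linalg.solve(V, X)
    Vinv_Y = np.linalg.solve(V, Y)
    return np.linalg.solve(X.T @ Vinv_X, X.T @ Vinv_Y)

# Toy layout: 4 subjects, 3 observations per subject, random intercept only.
n_subjects, n_obs = 4, 3
Z = np.kron(np.eye(n_subjects), np.ones((n_obs, 1)))   # subject indicators
D = 2.0 * np.eye(n_subjects)                            # assumed intercept variance
R = 1.0 * np.eye(n_subjects * n_obs)                    # assumed residual variance
V = marginal_covariance(Z, D, R)

# Fixed effects: intercept plus one centered within-subject covariate.
X = np.column_stack([np.ones(n_subjects * n_obs),
                     np.tile([-1.0, 0.0, 1.0], n_subjects)])
beta_true = np.array([1.0, 0.5])
Y = X @ beta_true                                       # noise-free, for checking
beta_hat = gls_fixed_effects(Y, X, V)
```

Observations from the same subject share the intercept variance in V, whereas observations from different subjects are uncorrelated; this block structure is what distinguishes the LMM from an ordinary GLM fit.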

Statistical inferences in LMM are also much more complex than in a GLM. In a balanced design, or with the variance component V known, hypothesis testing of the fixed effect follows Eq. 8 or 11 as an exact test. However, in an unbalanced design with random effects, no exact F statistics are available, since biases in the estimation usually result in an unknown distribution of F (Kherad-Pajouh & Renaud, 2015). Although F and t values are available as approximate tests in most statistical packages, Baayen, Davidson, and Bates (2008) discouraged the use of t or F statistics, and especially the reporting of p values, in mixed models. Other approaches have also been proposed. For example, likelihood ratio tests could be performed to test composite hypotheses by comparing the desired model with the reduced model. However, there are many constraints on the application of likelihood ratio tests (e.g., the method of model fitting and the selection of the reduced model). Moreover, running multiple copies of similar LMMs is computationally expensive, especially in the context of pixel-wise testing, such as in iMap4.

Besides the practical problem of statistical inferences with LMM, another main challenge in the application of LMM to spatial eye movement data is the Type I error from multiple comparisons. To resolve these issues, we adopted resampling techniques for null-hypothesis statistical testing, as is suggested in neuroimaging analysis with GLM or HLM (Pernet et al., 2015; Winkler et al., 2014). Nonparametric statistics using Monte Carlo simulation are ideal for both parameter estimation and hypothesis testing (Baayen et al., 2008; Kherad-Pajouh & Renaud, 2015). In iMap4, we adapted a simplified version of the permutation test suggested by Winkler et al. (2014) and a bootstrap clustering method similar to the one applied in LIMO EEG (Pernet et al., 2011). Details of the proposed algorithms and preliminary validation results are described in the following section.

Pixel-wise modeling and spatial clustering

Although the generation mechanism of eye movement data is still largely under debate, recent theories and applications suggest that a spatial model is the most appropriate for the statistical analysis of fixations, especially their location distribution. For example, Barthelmé, Trukenbrod, Engbert, and Wichmann (2013) recommended using the point process framework to infer how fixations are distributed in space. Although we endorse this fruitful approach and its Bayesian nature, here we aimed to resolve this problem from the opposite perspective. Instead of inferring from the spatial distribution of the fixations, we inferred on each location in the search space (i.e., each pixel within the eyetracker’s recordable range or each pixel in the visible stimuli). In other words, we addressed the question: “How long is this pixel being fixated (or what is the probability of this pixel being fixated) as a function of the experimental conditions?” By formally applying mixed models independently on each pixel, we have

$$ \mathbf{Y}(s)=\mathbf{X}\boldsymbol{\beta } (s)+\mathbf{Z}\boldsymbol{b}(s)+\boldsymbol{\varepsilon} (s) $$

for each location \( s\in D \) of the search space.

The complete procedure as implemented in iMap4 is explained in Fig. 1. The eye movement data for each participant are concatenated into one input data matrix. iMap4 first partitions the data matrix into a fixation characteristic matrix (red box) and an experiment condition information matrix (green box). The fixation characteristic matrix contains a fixation’s spatial location (x and y), the fixation duration, and an order index of each fixation. The experiment condition matrix contains an index of each subject, an index of each trial/item, and the different levels of each experimental condition. Fixation durations are then projected into the two-dimensional space according to their x- and y-coordinates at the single-trial level. iMap4 then smooths the fixation duration map by convolving it with a two-dimensional Gaussian kernel function:

$$ \mathrm{Kernel} \sim \mathrm{N}\left(0,{\sigma}^2I\right), $$

where I is a two-by-two identity matrix and the full width at half maximum (FWHM) of the kernel is 1° of visual angle as the default setting.

Fig. 1. Illustration of the basic processing steps implemented in iMap4. The input data matrix is partitioned into an eye movement matrix and a predictor matrix. Fixation durations are projected into the two-dimensional space according to their x- and y-coordinates at the single-trial level for each participant. The experimental information of each trial is also summarized in a predictor table. Subsequently, the sparse representation of the fixation duration map is smoothed by convolving it with a two-dimensional Gaussian kernel function, \( \mathrm{kernel}\sim \mathrm{N}\left(0,\;{\sigma}^2\boldsymbol{I} \right) \). After estimating the fixation bias of each condition independently for all observers (by taking the expected values across trials within the same condition), iMap4 models the 3-D smoothed fixation map (item*xSize*ySize) independently for each pixel using an LMM. The result is saved as a MATLAB structure in LMMmap. iMap4 offers many parametric and nonparametric methods for hypothesis testing and multiple-comparison correction.

This step is essential to account for the spatial uncertainty of eye movement recordings (both mechanical and physiological) and the sparseness of the fixation locations. The Gaussian kernel could also be replaced by other 2-D spatial filters to best suit the research question.
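The projection and smoothing steps can be sketched as follows. This is a minimal Python/NumPy illustration, not the iMap4 implementation: the function names, the 41-by-41 pixel search space, and the conversion factor of 10 pixels per degree are assumptions for the example. The only fixed relation is the standard one between the FWHM of a Gaussian and its standard deviation, \( \mathrm{FWHM}=2\sqrt{2\ln 2}\;\sigma \).

```python
import numpy as np

def fwhm_to_sigma(fwhm):
    """Convert a Gaussian full width at half maximum to a standard deviation."""
    return fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian, truncated at about 3 standard deviations."""
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def fixation_duration_map(xs, ys, durations, shape):
    """Accumulate fixation durations at their pixel coordinates (row=y, col=x)."""
    duration_map = np.zeros(shape)
    np.add.at(duration_map, (ys, xs), durations)   # repeated pixels are summed
    return duration_map

def smooth_map(duration_map, sigma):
    """Separable 2-D Gaussian smoothing: convolve rows, then columns."""
    k = gaussian_kernel_1d(sigma)
    rows = np.apply_along_axis(np.convolve, 1, duration_map, k, mode="same")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="same")

PIXELS_PER_DEGREE = 10.0                       # hypothetical display geometry
sigma_px = fwhm_to_sigma(1.0 * PIXELS_PER_DEGREE)

# Two fixations (200 ms and 100 ms) landing on the same pixel of one trial.
raw = fixation_duration_map(np.array([20, 20]), np.array([20, 20]),
                            np.array([200.0, 100.0]), (41, 41))
smoothed = smooth_map(raw, sigma_px)
```

With a FWHM of 10 pixels, the smoothed value 5 pixels away from the fixated pixel is half of the peak value, and the total fixation duration in the map is preserved as long as the kernel stays within the image borders.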

The resulting smoothed fixation map is a 3-D matrix. The last two dimensions of the fixation matrix are the sizes of the stimuli or search space. The information of each entry in the first dimension is stored in a predictor table, which is generated from the experiment condition matrix. Each experiment condition can be coded at the single-trial level in the predictor table, or as one entry by taking the average map across trials.

In addition, iMap4 provides a robust estimation option by applying Winsorization in order to limit extreme values in the smoothed fixation matrix. The goal here is to reduce the effect of any potential outliers. Additional options include: spatial normalization (z-scored map or probability map), spatial down-sampling (linear transformation using imresize in MATLAB) to optimize computing speed, and mask creation to exclude irrelevant pixels.
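The robust estimation and normalization options can be illustrated with a short sketch. The Python/NumPy code below shows one common way to Winsorize values and to z-score a map; the percentile cutoffs and function names are assumptions chosen for the example and do not necessarily match the defaults used in iMap4.

```python
import numpy as np

def winsorize(values, lower_pct=5.0, upper_pct=95.0):
    """Clip values outside the given percentiles to the percentile values."""
    low, high = np.percentile(values, [lower_pct, upper_pct])
    return np.clip(values, low, high)

def zscore_map(fixation_map):
    """Spatial normalization: zero mean and unit standard deviation over pixels."""
    return (fixation_map - fixation_map.mean()) / fixation_map.std()

# Toy values at one pixel across ten trials, with one extreme observation.
pixel_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 1000.0])
limited = winsorize(pixel_values, 10.0, 90.0)

# Toy 2-D map normalized across space.
toy_map = np.arange(12, dtype=float).reshape(3, 4)
normalized = zscore_map(toy_map)
```

Winsorization keeps every observation but pulls extreme values toward the bulk of the distribution, so a single very long fixation has a bounded influence on the model fit.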

The resulting 3-D fixation matrix is then modeled in an LMM as the response variable. The results are saved as a MATLAB structure (LMMmap, as in the examples below). The fields of LMMmap are nearly identical to the output from the LinearMixedModel class. For each modeled pixel, iMap4 saves the model criterion, variances explained, error sum of squares, coefficient estimates and their covariance matrix for both fixed and random effects, and the ANOVA results for the LMM. Additional modeling specifications, as well as other model parameters, including the LMM’s formula, the design matrices for the fixed and random effects, and the residual degrees of freedom, are also saved in LMMmap. Linear contrasts and other analyses based on variance or covariance can be performed afterward from the model-fitting information. Any other computation on the LinearMixedModel output can also be replicated with LMMmap.

One of the crucial assumptions of pixel-wise modeling is that all pixels are independent and identically distributed. Of course, this assumption is never satisfied, either before or after smoothing. To ensure valid inferences on activity patterns in a large 2-D pixel space, we applied nonparametric statistics to resolve the biases in parameter estimation and problems arising from multiple comparisons. We developed two resampling-based statistical hypothesis-testing methods for the fixed-effect coefficients: a universal permutation test and a universal bootstrap clustering test.

The resampling tests on the model coefficient for fixed effects β operate on the fixed-effect-related variances. To do so, we simply removed the variance associated with the random effects from the response matrix:

$$ {\mathbf{Y}}_{\mathrm{fixed}}(s)=\mathbf{X}\boldsymbol{\beta } (s)+\boldsymbol{\varepsilon} (s)=\mathbf{Y}(s)-\mathbf{Z}\boldsymbol{b}(s), $$

for each location \( s\in D \) of the search space.

For any permutation test, iMap4 performs the following algorithm on \( {\mathbf{Y}}_{\mathrm{fixed}} \) for each pixel.

Algorithm 1

For a given hypothesis or linear contrast c (as in Eq. 9), iMap4

  • Performs a linear transformation on the design matrix X to get a new design matrix M that can be partitioned as \( \mathbf{M}=\left[{\mathbf{M}}_1,\ {\mathbf{M}}_2\right] \). iMap4 then computes the new coefficients by applying the pseudo-inverse of M to \( {\mathbf{Y}}_{\mathrm{fixed}} \). The design matrix M is created so that the original hypothesis test is equivalent to a test on the \( {\mathbf{M}}_1 \) coefficients. The matrix transformation and partition follow the algorithm described in Winkler et al. (2014, Appx. A).

  • Computes the residuals related to the hypothesis by subtracting the variance accounted for by \( {\mathbf{M}}_2 \) from \( {\mathbf{Y}}_{\mathrm{fixed}} \), to get \( {\mathbf{Y}}_{rr} \).

  • Fits \( {\mathbf{Y}}_{rr} \) to M by solving \( {\mathbf{Y}}_{rr}=\mathbf{M}{\boldsymbol{\beta}}_m+\boldsymbol{\varepsilon} \), and gets the statistical value \( {\mathbf{F}}_{rr} \) of \( {\mathbf{M}}_1 \) according to Eqs. 10 and 11. Note that to replicate the original hypothesis testing on the fixed effect, the new contrast c’ is just used to partition M into \( {\mathbf{M}}_1 \) and \( {\mathbf{M}}_2 \).

  • Permutes the rows of the design matrix M to obtain the new design matrix \( {\mathbf{M}}^{*} \).

  • Fits \( {\mathbf{Y}}_{rr} \) to \( {\mathbf{M}}^{*} \) and gets the \( {{\mathbf{F}}_{rr}}^{*} \) of \( {{\mathbf{M}}_1}^{*} \).

  • Repeats the previous two steps a large number of times (k resamplings/repetitions), and the p value is then defined as in Eq. 16. Importantly, the family-wise error rate (FWER) corrected p value is computed by comparing the largest \( {{\mathbf{F}}_{rr}}^{*} \) across all tested pixels in one resampling with the original \( {\mathbf{F}}_{rr} \):

    $$ p=\frac{\left(\#\ {{\mathbf{F}}_{\boldsymbol{rr}}}^{*}\ge\ {\mathbf{F}}_{\boldsymbol{rr}}\right)}{k}. $$

Algorithm 1 is a simplified version of Winkler et al. (2014, Algorithm 1): The resampling table includes permutation but not sign-flipping, which assumes the errors to be independent and symmetric. Thus, the underlying assumptions are stronger than with classical permutations, which require only exchangeable errors (Winkler et al., 2014).

Importantly, this test is exact only under a balanced design with no missing values and only subjects as a random effect. As was previously shown in Kherad-Pajouh and Renaud (2015), a general and exact permutation approach for mixed-model designs should be performed on modified residuals that have up to second-moment exchangeability. This is done to satisfy the important assumptions for repeated measures ANOVA: normality and the sphericity of errors. However, there are strict requirements to achieve this goal: careful transformation and partition of both the fixed- and random-effects design matrices, and removal of the random effects related to \( {\mathbf{M}}_2 \) (Kherad-Pajouh & Renaud, 2015). In iMap4, we perform an approximate version by removing all random effects, to increase the efficiency and speed of the large number of resampling computations in our pixel-wise modeling algorithm. Validation and simulation datasets indeed showed that the sensitivity and the false alarm rate of the proposed algorithm were not compromised.
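The steps of Algorithm 1 can be sketched in a few lines. The Python/NumPy code below is a simplified illustration and not the iMap4 implementation: it assumes that the random effects have already been removed from the response (Eq. 15), that the transformed design matrix is already available as a partition into a part of interest and a nuisance part (the transformation of Winkler et al., 2014, Appx. A, is not reproduced), and that the columns of interest have full column rank. Function names and simulated data are hypothetical.

```python
import numpy as np

def f_map(Y, M, n_interest):
    """Pixel-wise OLS F statistic for the first n_interest columns of M."""
    n = Y.shape[0]
    beta = np.linalg.pinv(M) @ Y                        # (columns, pixels)
    resid = Y - M @ beta
    dfe = n - np.linalg.matrix_rank(M)
    mse = (resid ** 2).sum(axis=0) / dfe                # one MSE per pixel
    c = np.eye(M.shape[1])[:n_interest]                 # selects M1 coefficients
    cov_unit = c @ np.linalg.pinv(M.T @ M) @ c.T
    cb = c @ beta
    quad = np.einsum("ip,ij,jp->p", cb, np.linalg.pinv(cov_unit), cb)
    return quad / (mse * n_interest)

def permutation_fwer_pvalues(Y_fixed, M1, M2, k=500, seed=0):
    """FWER-corrected p values from the maximum F across pixels (Eq. 16)."""
    rng = np.random.default_rng(seed)
    M = np.column_stack([M1, M2])
    q = M1.shape[1]
    # Remove the variance accounted for by the nuisance part M2.
    Y_rr = Y_fixed - M2 @ (np.linalg.pinv(M2) @ Y_fixed)
    F_obs = f_map(Y_rr, M, q)
    max_F = np.empty(k)
    for i in range(k):
        M_star = M[rng.permutation(M.shape[0])]         # permute rows of M
        max_F[i] = f_map(Y_rr, M_star, q).max()         # largest F over pixels
    p_fwer = (max_F[:, None] >= F_obs[None, :]).mean(axis=0)
    return F_obs, p_fwer

# Simulated data: 24 observations, 6 pixels, a group effect at pixel 0 only.
rng = np.random.default_rng(1)
n, n_pixels = 24, 6
group = np.repeat([1.0, -1.0], n // 2)
M1 = group[:, None]                                     # effect of interest
M2 = np.ones((n, 1))                                    # intercept (nuisance)
Y_fixed = rng.normal(size=(n, n_pixels))
Y_fixed[:, 0] += 3.0 * group
F_obs, p_fwer = permutation_fwer_pvalues(Y_fixed, M1, M2, k=500)
```

Because each permuted map contributes only its largest F value to the null distribution, a pixel is declared significant only if its observed F exceeds what the most extreme pixel would typically reach by chance, which is what controls the family-wise error rate.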

Algorithm 2

iMap4 performs the following algorithm on \( {\mathbf{Y}}_{\mathrm{fixed}} \) for each pixel as the bootstrap clustering approach.

  • For each unique categorical variable, iMap4 removes the conditional expectations from \( {\mathbf{Y}}_{\mathrm{fixed}} \) for each pixel. A random shuffling is then performed on the centered data to acquire \( {\mathbf{Y}}_c \), so that any potential covariance is also disrupted. This is done to construct the true empirical null-hypothesis distribution in which all elements and their linear combinations in \( {\mathbf{Y}}_c \) have expected values equal to 0.

  • Randomly draws with replacement from \( \left\{\mathbf{X},\ \mathbf{Z},\ {\mathbf{Y}}_c\right\} \) equal numbers of subjects \( \left\{{\mathbf{X}}^{*},\ {\mathbf{Z}}^{*},\ {{\mathbf{Y}}_c}^{*}\right\} \).

  • Fits \( {{\mathbf{Y}}_c}^{*} \) to \( {\mathbf{X}}^{*} \) by solving \( {{\mathbf{Y}}_c}^{*}={\mathbf{X}}^{*}{\boldsymbol{\beta}}^{*}+\boldsymbol{\varepsilon} \). For a given hypothesis or linear contrast c (as in Eq. 9), iMap4 computes the statistical value \( {\mathbf{F}}^{*} \) according to Eqs. 10 and 11, and its parametric p value under the GLM framework.

  • Thresholds the statistical maps \( {\mathbf{F}}^{*} \) at \( {p}^{*}\le .05 \) and records the desired maximum cluster characteristics across all significant clusters. The cluster characteristics considered are cluster mass (summed F value within a cluster), cluster extent (size of the cluster), and cluster density (mean F value).

  • Repeats the previous three steps a large number of times, to get the distribution of the cluster characteristic under the null hypothesis.

  • Thresholds the original statistical map F at \( p\le .05 \) and compares the selected cluster characteristic with the value of the null distribution corresponding to the 95th percentile. Any cluster with the chosen characteristic larger than this threshold is considered significant.

The bootstrap clustering approach is identical to the bootstrap procedure described by Pernet et al. (2011; Pernet et al., 2015) if only a subject intercept is considered as the random effect. In addition, Algorithm 2 extends the philosophy and approach presented by Pernet et al. (2011; Pernet et al., 2015) to nonhierarchical mixed-effect models.
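The cluster-level part of Algorithm 2 (thresholding a statistical map, measuring cluster mass, and comparing it with the bootstrap null distribution) can be sketched as below. This Python/NumPy code is an illustration only: it uses 4-connectivity to define clusters, which is an assumption rather than a documented iMap4 setting, it takes the null distribution of maximum cluster masses as given instead of running the bootstrap, and all names and values are hypothetical.

```python
import numpy as np

def cluster_masses(F, significant):
    """Summed F value of each 4-connected cluster of significant pixels."""
    visited = np.zeros(F.shape, dtype=bool)
    n_rows, n_cols = F.shape
    masses = []
    for r0, c0 in zip(*np.nonzero(significant)):
        if visited[r0, c0]:
            continue
        stack, mass = [(r0, c0)], 0.0
        visited[r0, c0] = True
        while stack:                                   # flood fill one cluster
            r, c = stack.pop()
            mass += F[r, c]
            for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                inside = 0 <= rr < n_rows and 0 <= cc < n_cols
                if inside and significant[rr, cc] and not visited[rr, cc]:
                    visited[rr, cc] = True
                    stack.append((rr, cc))
        masses.append(mass)
    return masses

def max_cluster_mass(F, significant):
    """Statistic recorded for each bootstrap sample under the null."""
    masses = cluster_masses(F, significant)
    return max(masses) if masses else 0.0

def significant_clusters(F, significant, null_max_masses, alpha=0.05):
    """Keep clusters whose mass exceeds the (1 - alpha) null percentile."""
    threshold = np.percentile(null_max_masses, 100.0 * (1.0 - alpha))
    kept = [m for m in cluster_masses(F, significant) if m > threshold]
    return kept, threshold

# Toy F map with two clusters; pixels with F > 4 stand in for p <= .05.
F = np.zeros((5, 5))
F[0, 0], F[0, 1] = 5.0, 6.0
F[3, 3], F[3, 4], F[4, 4] = 10.0, 10.0, 10.0
mask = F > 4.0
masses = cluster_masses(F, mask)

# Placeholder null distribution of maximum cluster masses (not real output).
null_max = np.arange(1.0, 21.0)
kept, threshold = significant_clusters(F, mask, null_max)
```

In this toy case only the larger cluster survives, since its mass exceeds the 95th percentile of the placeholder null distribution whereas the smaller one does not. Cluster extent or cluster density could be substituted by counting pixels or averaging F within each cluster.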

It is worth noting that we implemented in iMap4 a high-performance algorithm to minimize the computational demands of the large amount of resampling. The model fitting in both resampling approaches makes use of ordinary least squares. The inversion of the covariance matrices (required for Eq. 11) is computed on the upper triangular factor of the Cholesky decomposition. Calculation of the quadratic form (as in Eq. 11) for all pixels is optimized by constructing a sparse matrix of the inverse of the covariance matrix. More details of these algebraic simplifications can be found in the imapLMMresample function in iMap4.
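The idea of evaluating the quadratic form through a Cholesky factor, rather than through an explicit matrix inverse, can be illustrated as follows. This Python/NumPy sketch is not taken from imapLMMresample; the function name and the toy covariance matrix are assumptions. NumPy returns the lower triangular factor, whose transpose is the upper triangular factor mentioned above.

```python
import numpy as np

def quadratic_form_cholesky(x, cov):
    """Compute x' cov^-1 x via the Cholesky factorization cov = L L'."""
    L = np.linalg.cholesky(cov)        # lower triangular; L.T is the upper factor
    z = np.linalg.solve(L, x)          # a dedicated triangular solver would be cheaper
    return z @ z                       # equals x' cov^-1 x

# Toy positive definite covariance matrix and contrast estimate.
rng = np.random.default_rng(0)
A = rng.normal(size=(3, 3))
cov = A @ A.T + 3.0 * np.eye(3)
x = np.array([1.0, 2.0, 3.0])

q_chol = quadratic_form_cholesky(x, cov)
q_direct = x @ np.linalg.inv(cov) @ x  # reference value via explicit inversion
```

Both values agree to floating-point precision; the factorization route avoids forming the inverse explicitly, which matters when the same computation is repeated for every pixel and every resample.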

Other multiple-comparison correction methods, such as Bonferroni correction, false discovery rate, or random field theory (RFT), could also be applied. A threshold-free cluster enhancement algorithm could also be applied on the statistical (F-value) maps as an option after the permutation and bootstrap clustering procedures (Smith & Nichols, 2009).
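For comparison, two of the parametric corrections mentioned above can be written compactly. The Python/NumPy sketch below implements Bonferroni correction and the Benjamini–Hochberg step-up procedure; the latter is one common false discovery rate procedure and is an assumption here, since the specific FDR variant is not stated in the text. Function names and p values are hypothetical.

```python
import numpy as np

def bonferroni(p_values, alpha=0.05):
    """Reject where p <= alpha divided by the number of tests."""
    p = np.asarray(p_values, dtype=float)
    return p <= alpha / p.size

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure controlling the FDR at alpha."""
    p = np.asarray(p_values, dtype=float)
    flat = p.ravel()
    order = np.argsort(flat)
    m = flat.size
    passed = flat[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        last = np.nonzero(passed)[0].max()     # largest rank meeting the bound
        reject[order[: last + 1]] = True       # reject all smaller p values too
    return reject.reshape(p.shape)

# Toy p values for five pixels.
p_toy = np.array([0.001, 0.012, 0.025, 0.2, 0.9])
reject_bonferroni = bonferroni(p_toy)
reject_fdr = benjamini_hochberg(p_toy)
```

In this toy case Bonferroni retains only the smallest p value, whereas the FDR procedure retains the three smallest, which illustrates the difference in stringency between controlling the family-wise error rate and controlling the proportion of false discoveries.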

We performed a validation study to assess the Type I error rate when applying the permutation and bootstrap clustering approaches for hypothesis testing. We used a balanced repeated measures ANOVA design with a two-level between-group factor and a three-level within-group factor. A total population of 134 observers (67 in each group) was drawn from previous face-viewing eye movement studies. We centered the cell means for the whole dataset to obtain the validation dataset under the null hypothesis (similar to Step 1 in Algorithm 2). Thus, we used real data to warrant realistic distributions and centered them to ensure that \( {\mathrm{H}}_0 \) was true. Any significant output from iMap4 performed on this dataset would be considered as a false alarm (Type I error).

The validation procedure was as follows. We first randomly sampled without replacement a balanced number of subjects from both groups. We then ran iMap4 under the default settings and performed hypothesis testing on the two main effects and the interaction. To estimate the FWER, we computed the frequency of significant output under different statistics and multiple-comparison correction (MCC) settings. Preliminary results based on 1,000 randomizations with sample sizes of n ∈ {8, 16, 32, 64} showed that with an alpha of .05, the FWERs were indeed all under .05 using nonparametric statistics (see Fig. 2b for the permutation test, and Fig. 2c and d for the bootstrap clustering test). More simulations considering a wider range of scenarios will be required to understand fully the behavior of the proposed approaches, although the cluster statistics are likely to behave as in Pernet et al. (2015).

Fig. 2. Validation results of the proposed resampling procedure for statistical inference. (a) Family-wise error rates (FWERs) using the uncorrected parametric p values. All FWERs are significantly above .05. (b) FWERs using the permutation approach (Algorithm 1). (c) FWERs using the proposed bootstrap-clustering approach (Algorithm 2), thresholded on cluster mass. (d) FWERs using the proposed bootstrap-clustering approach (Algorithm 2), thresholded on cluster extent. Notice that the FWERs of panels a and b are computed at the pixel level (i.e., the proportions of false-positive pixels across simulations), and the FWERs of panels c and d are calculated at the test level (i.e., the percentages of any false positives per test for the 1,000 simulations). Error bars show the standard errors.

Graphical user interface (GUI) and command line handling

iMap4 runs on MATLAB 2013b and above, since it requires some essential functions and classes from the Image Processing Toolbox and Statistics Toolbox in these versions. iMap4 will execute in parallel on multicores or distributed workers, when available.

We recommend that users install iMap4 as a MATLAB application. Users can call iMap4 directly in the MATLAB command window after installation. A general GUI opens when >>iMAP is called in the command window or when the app is launched (Fig. 3a). Users can then import the fixation data, load a preprocessed data matrix for LMM, or display the modeling results and perform statistical hypothesis testing. These main steps have their own independent GUIs: Create Fixation Matrix (Fig. 3b), Linear Mixed Model (Fig. 3c), and Display Results (Fig. 3d). Although most features of iMap4 can be accessed via these GUIs, we encourage advanced users to use the command line, especially to specify the additional options of the LinearMixedModel class. A short example of the command-line handling of the main functions is shown in Fig. 3e. A user guidebook containing the instructions for each step can be accessed via the Help button. We have also provided datasets with tutorial files to explain practically how to use iMap4. As a demonstration, three examples based on real and simulated data are given in the next section. MATLAB scripts of the examples are part of the iMap4 installation package.

Fig. 3

The main graphical user interfaces of iMap4 (a–d) and example command lines handling the core functions (e). For more details, please refer to the online guidebook and demonstration codes

Applications to real and simulation data

In the following examples, we illustrate iMap4’s flexibility and power with two real data sets and a computer simulation. All material and codes presented here are available in the iMap4 installation package.

Example 1

We first consider a subset of participants from Bovet, Lao, Bartholomée, Caldara, and Raymond (2016), as a demonstration of the analysis procedure in iMap4. A step-by-step demonstration is available in the user guidebook and example code.

In short, the dataset consists of eye movement data from 20 male observers during a gaze-contingent study. Observers viewed computer-rendered female bodies in different conditions and performed a behavioral task (i.e., subjective ratings of bodily attractiveness). This was a within-subjects design with two experimental manipulations: the viewing condition (three levels: 2° spotlight, 4° spotlight, or natural viewing) and body orientation (two levels: front view or back view). The aim of the study was to evaluate the use of visual information for bodily attractiveness evaluation in the male observers. Other details of the experiment can be found in the article.

Fixation durations were projected into the two-dimensional space according to their coordinates at the single-trial level. The fixation duration maps were first smoothed at 1° of visual angle. We used the “estimated” option by taking the expected values across trials within the same condition independently for each observer. To reduce the computation time, we down-sampled the fixation map to 256*205 pixels and applied a mask to only model the pixels with average durations longer than half of the minimum fixation duration input.
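The projection and smoothing steps can be sketched as follows. This is a minimal Python illustration, not the iMap4 implementation: the function names, the zero padding at the borders, and the use of a kernel standard deviation given directly in pixels (rather than derived from degrees of visual angle) are all assumptions. Down-sampling and masking are left out.

```python
import math


def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = int(math.ceil(3 * sigma))
    k = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(k)
    return [v / total for v in k]


def smooth_rows(img, kernel):
    """Convolve every row with the kernel (zero padding outside the image)."""
    radius = len(kernel) // 2
    out = []
    for row in img:
        n = len(row)
        out.append([
            sum(kernel[j + radius] * row[x + j]
                for j in range(-radius, radius + 1) if 0 <= x + j < n)
            for x in range(n)
        ])
    return out


def transpose(img):
    return [list(col) for col in zip(*img)]


def fixation_duration_map(fixations, height, width, sigma):
    """fixations: iterable of (x, y, duration_ms) with integer pixel coordinates."""
    raw = [[0.0] * width for _ in range(height)]
    for x, y, dur in fixations:
        if 0 <= x < width and 0 <= y < height:
            raw[y][x] += dur  # accumulate duration at the fixated pixel
    kernel = gaussian_kernel(sigma)
    # separable 2-D Gaussian: smooth along x, then along y
    return transpose(smooth_rows(transpose(smooth_rows(raw, kernel)), kernel))


fixations = [(10, 8, 250.0), (11, 8, 180.0), (30, 20, 320.0)]
fmap = fixation_duration_map(fixations, height=32, width=48, sigma=2.0)
total = sum(map(sum, fmap))
```

Because the kernel is normalized, the smoothed map keeps the total fixation duration as long as the fixations lie far enough from the image borders.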

Before proceeding to the modeling step, we visualized the preprocessed fixation maps and the descriptive statistics to get a sense of the data. For each of the categorical conditions, iMap4 outputs the mean fixation map for each level. Descriptive statistics for the following eye movement measures are saved in a matrix and will be plotted in a histogram or boxplot: number of fixations, sum of fixation durations (total viewing time), mean fixation duration, total path length (total eye movement distance in pixels), and mean path length. See Fig. 4 for an example of the descriptive-results output.
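The five descriptive measures can be computed for a single trial as in the sketch below. It is an illustrative Python example, not iMap4 code; the function name `describe_trial` and the definition of mean path length as the average distance between consecutive fixations are assumptions.

```python
import math


def describe_trial(fixations):
    """fixations: list of (x, y, duration_ms) tuples in temporal order."""
    n = len(fixations)
    total_dur = sum(d for _, _, d in fixations)
    # Euclidean distance between each pair of consecutive fixations, in pixels
    steps = [
        math.hypot(x2 - x1, y2 - y1)
        for (x1, y1, _), (x2, y2, _) in zip(fixations, fixations[1:])
    ]
    path = sum(steps)
    return {
        "n_fixations": n,
        "total_duration": total_dur,
        "mean_duration": total_dur / n if n else float("nan"),
        "total_path_length": path,
        "mean_path_length": path / len(steps) if steps else float("nan"),
    }


stats = describe_trial([(0, 0, 200.0), (3, 4, 300.0), (3, 10, 100.0)])
```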

Fig. 4

Descriptive results from iMap4 on the real dataset. (a) Five eye movement measures plotted in histograms. In this case, fixation durations are in milliseconds and path lengths are in pixels. (b) Mean fixation maps of all levels of the categorical conditions

We applied a full model on the fixation duration map without any spatial normalization:

$$ \begin{array}{l}\mathrm{Pixel}\_{\mathrm{Intensity}}_{\left(x,\ y\right)} \sim \mathrm{Viewing}\ \mathrm{condition}+\mathrm{Body}\ \mathrm{orientation}+\mathrm{Viewing}\ \mathrm{condition}\ast \mathrm{Body}\ \mathrm{orientation}\hfill \\ {}\quad +\left(\mathrm{fixation}\ \mathrm{duration}\left|\mathrm{subject}\right.\right),\kern1.5em x,y\in \mathrm{fixation}\ \mathrm{map}\ \mathrm{resolution}.\hfill \end{array} $$

Notice that the mean fixation duration for each condition and subject was treated as a random effect to control for the variation across individuals. The parameters were fitted with restricted maximum likelihood estimation (ReML).
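For readers less familiar with the formula notation, the model fitted independently at each pixel can be written in the standard matrix form of an LMM. The symbols below (design matrices X and Z, fixed-effect coefficients β, random-effect coefficients b, covariance matrix G, and residual variance σ²) follow common textbook notation and are not taken from the iMap4 documentation:

$$ \mathbf{y}_{\left(x,\ y\right)}=\mathbf{X}\boldsymbol{\beta}_{\left(x,\ y\right)}+\mathbf{Z}\mathbf{b}_{\left(x,\ y\right)}+\boldsymbol{\varepsilon}_{\left(x,\ y\right)},\kern1.5em \mathbf{b}_{\left(x,\ y\right)}\sim N\left(\mathbf{0},\mathbf{G}\right),\kern0.5em \boldsymbol{\varepsilon}_{\left(x,\ y\right)}\sim N\left(\mathbf{0},{\sigma}^2\mathbf{I}\right). $$

Here the response vector stacks the intensities of one pixel across all observations, the columns of X code the fixed effects (the two factors and their interaction), and the columns of Z code the subject-specific random effects.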

We encourage users to interpret the result from iMap4 in the following way. First, check the model fitting by displaying the model criteria. For example, Fig. 5a shows the R² values or multiple-correlation coefficients, which represent the proportions of variability in the fixation matrix explained by the fitted model. Interpretation of the result should be drawn with caution if the R² values are too low. The users can then proceed to test their hypotheses, such as through ANOVA or linear contrast, and perform multiple-comparisons corrections (Fig. 5b and c). A post-hoc analysis is applicable if any interaction is present, or if any condition contains multiple levels. The user can select one or more significant area(s) as data-driven ROI(s) for the post-hoc analysis. iMap4 performs t tests between any pairs of categorical conditions within this ROI by using the raw input values from the nonsmoothed fixation matrix (Fig. 5d). In addition, users can compute the above-average or above-chance fixation intensity for each categorical predictor (Fig. 5e).

Fig. 5

iMap4 results for Bovet et al. (2016) with different output styles. (a) Ordinary R² values for the fitted model. (b) ANOVA results of the main effects and interaction. Here the intensity represents the F values. iMap4 only displays significant maps. (c) Statistical results of the linear contrast [2° spotlight–natural viewing] in the back view condition. Here the F value is represented on a contour map. (d) Post-hoc analysis in the selected mask. The mask is generated from the significant region of the body orientation effect (left panel). The t test results are shown in the matrix in the right panel (labeled conditions on the y-axis minus those labeled on the x-axis). Only significant results are shown (p < .05, Bonferroni corrected). (e) One-tailed t tests against the average over all fixation intensities for the 2° spotlight front view and 2° spotlight back view conditions. The solid black lines contain the significant regions for all of the panels above

Example 2

As a second demonstration, we reanalyzed the full dataset from one of our previous studies, Miellet, He, Zhou, Lao, and Caldara (2012).

Previous studies testing Western Caucasian (WC) and East Asian (EA) observers had shown that people deploy different eye movement strategies during free viewing of faces. WC observers fixated systematically toward the eyes and mouth, following a triangular pattern, whereas EA observers predominantly fixated at the center of the face (Blais, Jack, Scheepers, Fiset, & Caldara, 2008; Caldara, Zhou, & Miellet, 2010). Moreover, human observers can flexibly adjust their eye movement strategies to adapt to environmental constraints, as has been shown using different gaze-contingent paradigms (Caldara, Zhou, & Miellet, 2010; Miellet et al., 2012). In our 2012 study, we tested two groups of observers in a face task in which their foveal vision was restricted by a blind spot. This was a mixed design with the culture of the observers as the between-subjects factor (WCs or EAs) and the blind spot size as the within-subjects factor (four levels: natural viewing, 2° blindspot, 5° blindspot, or 8° blindspot). For more details of the experiment, please refer to Miellet et al. (2012).

Using iMap4, we created the single-trial 2-D fixation duration map and smoothed it at 1° of visual angle. Importantly, to keep in line with Miellet et al. (2012), spatial normalization was performed by z-scoring the fixation map across all pixels independently for each trial (the results are identical without spatial normalization in this example). We also applied a mask generated with the default option. No down-sampling was performed. We then applied a full model on the single-trial fixation duration map, using the “single-trial” option in iMap4:
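Per-trial z-scoring across pixels can be sketched as below. This is an illustrative Python example rather than iMap4 code; the function name `zscore_map` and the use of the population standard deviation are assumptions (a sample standard deviation would only rescale the result).

```python
import statistics


def zscore_map(fmap):
    """Z-score one trial's fixation map across all of its pixels."""
    flat = [v for row in fmap for v in row]
    mu = statistics.mean(flat)
    sd = statistics.pstdev(flat)
    if sd == 0:
        raise ValueError("a constant map cannot be z-scored")
    return [[(v - mu) / sd for v in row] for row in fmap]


short_trial = [[0.0, 100.0], [200.0, 100.0]]
long_trial = [[0.0, 300.0], [600.0, 300.0]]  # same spatial pattern, three times the duration
z_short = zscore_map(short_trial)
z_long = zscore_map(long_trial)
```

The two toy trials differ only in overall viewing time, and their z-scored maps coincide. After this normalization, only the relative spatial distribution of fixation durations enters the model.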

$$ \begin{array}{l}\mathrm{Pixel}\_{\mathrm{Intensity}}_{\left(x,\ y\right)} \sim \mathrm{Observer}\ \mathrm{culture}+\mathrm{Blindspot}\ \mathrm{size}+\mathrm{Observer}\ \mathrm{culture}\ast \mathrm{Blindspot}\ \mathrm{size}\hfill \\ {}\quad +\left(1\left|\mathrm{subject}\right.\right),\kern1.5em x,y\in \mathrm{fixation}\ \mathrm{map}\ \mathrm{resolution}.\hfill \end{array} $$

Only the subject predictor was treated as a random effect, and the model was fitted using ML.

After model fitting, we performed an ANOVA to test the two main effects and their interaction. We applied a bootstrap clustering test using a cluster density of 1,000 as the criterion. We found a significant interaction and a main effect of blind spot size, but not a main effect of culture (see Fig. 6a). This result replicates the findings in Miellet et al. (2012). Moreover, by performing a linear contrast of the model coefficients, we reproduced Fig. 2 from Miellet et al. (2012). The results using iMap4 are shown in Fig. 6b.

Fig. 6

iMap4 results for Miellet et al. (2012). (a) ANOVA results of the linear mixed model. (b) Replication of the Fig. 2 results for Miellet et al. (2012), using linear contrasts of the model coefficients. The solid black lines contain the significant regions for all of the panels above

Example 3

We also used simulated data to illustrate the use of iMap4 with a continuous predictor. We created a dataset and manually introduced an effect between the numbers of fixations and the subjective rating on a single-trial level. Moreover, to maximize the simulation’s efficiency, different linear relationships were introduced simultaneously. For each subject, we generated a data matrix through the two following steps:

  • In a 4*4 grid, we introduced a different linear relationship in each cell between fixation number and subjective rating. Figure 7a shows the linear relationships we introduced for one subject. We varied the slope and the strength of the linear association. The correlation was strongest on the top row (r = .9), and there was no correlation on the bottom row (r = 0). The slope varied among [1, 0.4, –0.2, –0.8] across the columns. Note that each dot on a scatterplot represents one trial, and the dots with the same rating (value on the x-axis) across subplots belong to the same trial. The resulting matrices after this step were a one-dimensional array Rating and a two-dimensional matrix P (matrix size: 16 * number of trials)

    Fig. 7

    iMap4 results on the simulation dataset. (a) Linear relationships being introduced into the 4*4 grid. The x-axis shows the z-scored rating, and the y-axis shows the expected number of fixations. The slopes between y and x are the same within each column ([1, 0.4, –0.2, –0.8], respectively), whereas the correlation rho is the same within each row ([0.9, 0.6, 0.3, 0], respectively). (b) One realization of a random trial for one subject. The left panel shows the raw fixation locations; the right panel shows the smoothed fixation number map. (c) The average fixation map across all trials for the 20 subjects. (d) Estimated relationships between rating and fixation number (regression coefficients). The black circles indicate statistical significance

  • The spatial locations of fixations were generated using linear Gaussian random fields. For each trial, we created a Gaussian mixture model gm using the gmdistribution class in MATLAB. The Gaussian mixture model gm contains 16 (4*4) 2-D Gaussian distribution components. The center of each component aligned with the center of each grid, and the covariance was an identity matrix with 1° of visual angle on the diagonal. Crucially, the mixing proportion of each component was decided by the column of the specific trial in P. A number of random fixations were then generated from this Gaussian mixture model gm. See Fig. 7b for a realization of one random trial for one subject.
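The second step above can be sketched as follows. This is a Python stand-in for the MATLAB `gmdistribution` workflow, not the simulation code shipped with iMap4; the function name `simulate_trial`, the cell size, and the standard deviation in pixels are hypothetical values chosen for illustration, and `weights` plays the role of one column of P.

```python
import random


def simulate_trial(weights, n_fix, grid=4, cell_size=100.0, sd=30.0, rng=None):
    """Draw fixation locations from a mixture of grid*grid isotropic 2-D Gaussians.

    weights: mixing proportions, one per grid cell in row-major order
    Returns the fixation coordinates and the index of the component of each draw.
    """
    rng = rng or random.Random()
    # one Gaussian component centered on each grid cell
    centers = [
        ((col + 0.5) * cell_size, (row + 0.5) * cell_size)
        for row in range(grid) for col in range(grid)
    ]
    # pick a component for each fixation according to the mixing proportions
    chosen = rng.choices(range(len(centers)), weights=weights, k=n_fix)
    fixations = [
        (rng.gauss(centers[c][0], sd), rng.gauss(centers[c][1], sd))
        for c in chosen
    ]
    return fixations, chosen


rng = random.Random(1)
weights = [0.0] * 16
weights[5] = 3.0   # cell in the second row, second column
weights[10] = 1.0  # cell in the third row, third column
fixations, components = simulate_trial(weights, n_fix=58, rng=rng)
```

Cells with a larger mixing proportion receive more fixations on average, which is how a trial-level rating can drive the expected number of fixations in each cell.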

The dataset contained 20 subjects performing 100 trials, each with an average fixation number of 58.02. Figure 7c shows the average map for fixation numbers. We fitted a simple model with ReML:

$$ \mathrm{Pixel}\_{\mathrm{Intensity}}_{\left(x,\ y\right)} \sim 1+\mathrm{Rating}+\left(1\left|\mathrm{subject}\right.\right),\kern1.5em x,y\in \mathrm{screen}\ \mathrm{resolution}. $$

The significant regression coefficients of Rating are shown in Fig. 7d. iMap4 accurately rejected the null hypothesis for most conditions when there was a significant relationship. For the most robust effect (r = .9), iMap4 accurately estimated the coefficients. It also correctly reported a null result for r = 0. Moreover, iMap4 did not report any significant effect for the weakest relationship (slope = –0.2, r = .3), due to the lack of power. Indeed, further simulations showed that increasing the numbers of fixations, trials, or subjects would lead to significance.

Discussion and future developments

In the present article, we have reported a major update of iMap, a toolbox for statistical fixation mapping of eye movement data. While keeping unchanged the general data-driven philosophy of iMap, we significantly improved the underlying statistical engine, by incorporating pixel-wise LMMs and a variety of robust nonparametric statistics. Crucially, the new analysis pipeline allows for the testing of complex designs while controlling for a wide range of random factors. We also implemented a full GUI to make this approach more accessible to MATLAB beginners. Examples from empirical and computer-simulated datasets showed that this approach has a slightly conservative FWER under H0, while remaining highly sensitive to actual effects (e.g., Fig. 6d). The present method represents a significant advance in eye movement data analysis, particularly for analyzing experimental designs using normalized visual stimuli. In fact, iMap4 uses a statistical inference method similar to those in fMRI and magnetoencephalography/EEG analysis. The interpretation of the statistical maps is simply done by looking at which stimulus features/pixels relate to the significant areas (after multiple-comparison correction). This procedure is similar to the interpretation of fMRI results: after a significant region is revealed, we can use its spatial coordinates to check in which part of the cortex the region activated above chance level is located.

As a powerful statistical tool, LMMs are gaining popularity in psychological research and have previously been applied in eye movement studies (e.g., Kliegl, Masson, & Richter, 2010). Similarly, particular cases of LMM, such as HLM or two-level models, are now standard data-processing approaches in neuroimaging studies. As a general version of HLMs, LMMs are much more flexible and powerful than other multilevel models. Most importantly, exactly the same LMM could be applied to behavior, eye movement, and neuroimaging data, bridging these different measures to allow drawing more direct and complete conclusions.

However, there are both theoretical and practical challenges in using LMM for the statistical spatial mapping of fixation data. First, the fixation locations are too sparse to directly apply pixel-wise modeling. As in previous versions of iMap, we used spatial smoothing of the fixation locations, a preprocessing step necessary to account for the measurement error of eyetrackers and the imprecision of the physiological system (i.e., the human eye). The second issue is selecting the appropriate hypothesis testing for LMM and the multiple-comparison problems caused by modeling a massive number of pixels in nonbalanced designs. We addressed this issue by applying nonparametric statistics based on resampling and spatial clustering. Another important challenge is the constraint of computational resources. Parameter estimations using LMM, the pixel-wise modeling approach, and resampling techniques are very computationally demanding and time-consuming. To produce a useful but also usable tool, we adapted many advanced and novel algorithms, such as parallel computing. Preprocessing options such as down-sampling and applying a mask also significantly decrease the computational time of iMap4.

The comparison among ROIs/areas of interest, iMap 2.0, and the current version

In classical eye movement data analyses, particularly those considering fixation locations, the main challenge for statistically identifying the regions that have been fixated above chance level lies in the fact that we are facing a high-dimensional data space. Mathematically, each pixel represents one dimension that could be potentially important. However, it is trivial to say that many of these dimensions are redundant and could be reduced to a particular set of representations or features. In other words, eye fixation data points are embedded in a high-dimensional pixel space, but they actually occupy only a subspace with much lower dimensionality (Belkin & Niyogi, 2003). Indeed, in similar high-dimensional datasets, a low-dimensional structure is often assumed and is naturally the main focus for investigation. Thus, by arbitrarily choosing one or multiple ROIs, one can represent the high-dimensional dataset as a low-dimensional manifold. The fixation map thus projects into this manifold, and all the pixels within the same ROI are then considered as being in the same dimension. In this case, each ROI represents one feature. Such a method is comparable to early neural networks and many other linear dimension reduction methods in the machine-learning literature with hand-coded features (LeCun, Haffner, Bottou, & Bengio, 1999; Sorzano, Vargas, & Montano, 2014).

The early versions of iMap (1 and 2) adopted a similar logic, but relied on RFT to isolate data-driven features. Therefore, the fixation bias in each pixel was projected into a lower-dimensional subspace, resulting in fixation clusters. Second-level statistics were then computed at the cluster level instead of the pixel level to perform statistical inference (Miellet, Lao, & Caldara, 2014).

From iMap 3 onward, we took a very different approach. We used spatial clustering and multiple-comparison correction to avoid the use of second-level statistics to perform statistical inference. In iMap4, the fixation bias is similarly modeled on each pixel using a flexible yet powerful statistical model: the LMM. The LMM, in combination with nonparametric statistics and a spatial clustering algorithm, directly isolates the significant pixels. As a result, the iMap4 outputs can be interpreted intuitively and straightforwardly at the map level (i.e., by visualizing the areas reaching significance from the tested hypothesis).
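The two cluster statistics used for thresholding (extent and mass) can be computed from a thresholded statistical map as in the sketch below. It is a generic Python illustration, not the iMap4 clustering routine: the 4-connectivity rule, the definition of mass as the sum of the statistic within a cluster, and the function name `clusters_above` are assumptions.

```python
from collections import deque


def clusters_above(stat_map, threshold):
    """Return (extent, mass) for each 4-connected cluster of suprathreshold pixels."""
    h, w = len(stat_map), len(stat_map[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0] or stat_map[y0][x0] <= threshold:
                continue
            # breadth-first search over the cluster containing (y0, x0)
            queue, extent, mass = deque([(y0, x0)]), 0, 0.0
            seen[y0][x0] = True
            while queue:
                y, x = queue.popleft()
                extent += 1              # number of pixels in the cluster
                mass += stat_map[y][x]   # summed statistic in the cluster
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and stat_map[ny][nx] > threshold):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            found.append((extent, mass))
    return found


f_map = [
    [0.5, 4.2, 5.0, 0.1],
    [0.3, 4.8, 0.2, 0.0],
    [0.1, 0.2, 0.4, 6.1],
]
clusters = clusters_above(f_map, threshold=4.0)
```

In a bootstrap-clustering test, the same statistic would be computed on each resampled null map, and the observed clusters would be compared against the distribution of the largest null cluster.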

Parameter settings and statistical choices

Our aim was and still is the development of a data-driven and fully automated analysis tool. However, even in iMap4 some of the parameters in the analysis rely on a user’s expertise and subjective choices, which thus should be considered carefully before use. These parameters include the kernel size for the smoothing procedure, the spatial down-sampling and masking, the spatial normalization, and the choice of statistics.

The rationale for determining the kernel size for the smoothing procedure has been previously discussed (Caldara & Miellet, 2011), and the majority of the arguments we put forward in this previous article still hold true. Here, we would remind users that the spatial smoothing procedure mainly resolves the sparseness of fixation data. It also partially accounts for the spatial covariance, which is ignored in univariate pixel-wise modeling. Finally, it accounts for the recording errors from eyetrackers, such as drift during the calibration, pupil-size variations, and so forth.

We also recommend that users perform down-sampling and apply a mask before modeling their data. This step is important for reducing computational demands (time, memory, etc.). In general, we recommend that the down-sampling factor not be bigger than half of the smoothing kernel size. In other words, if the FWHM of the Gaussian kernel is 10 pixels, the rescale factor should not exceed 5. We are currently running further simulations and validations to investigate the best parameters under different settings, and hopefully will provide a statistical data-driven solution for this choice in future updates.
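The rule of thumb can be made concrete with a small helper. This is a Python sketch with hypothetical function names, not part of iMap4. The conversion between FWHM and standard deviation is the standard Gaussian identity, FWHM = 2·sqrt(2·ln 2)·σ.

```python
import math


def fwhm_to_sigma(fwhm):
    """Standard deviation of a Gaussian with the given full width at half maximum."""
    return fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def max_rescale_factor(fwhm_pixels):
    """Upper bound on the down-sampling factor under the half-kernel rule of thumb."""
    return fwhm_pixels / 2.0


sigma = fwhm_to_sigma(10.0)       # about 4.25 pixels
limit = max_rescale_factor(10.0)  # 5.0
```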

Spatial normalization (via a z-scored or probability map) is available as an option in iMap4. Spatial normalization used to be a standard preprocessing procedure in previous versions of iMap. However, the hypotheses tested on raw fixation duration/number maps are fundamentally different from their spatially normalized versions. Importantly, after spatial normalization, the interpretation of results should be drawn on a spatially relative bias instead of on the absolute differences. Of course, if the viewing duration in each trial is constant within an experiment, spatial normalization will not make any difference.

For iMap4 we developed two main, nonparametric statistics based on resampling techniques. It is worth noting that different applicability comes with the choice of permutation tests versus bootstrap spatial-clustering tests. In our own experience during empirical and simulation studies, permutation tests are more sensitive for studies with small sample sizes; the bootstrap-clustering approach usually gives more homogeneous results but is biased toward bigger clusters. We suggest that users adopt a “wisdom of crowds” approach and look at the agreement among different approaches before concluding on the data analysis (Marbach et al., 2012). Nonconvergent results should be interpreted carefully.

An alternative to pixel-wise approaches

In recent years, other frameworks have also been developed to model eyetracking data (Boccignone, 2015). One such approach is the aforementioned Poisson point process model (Barthelmé et al., 2013). It is a well-established statistical model when the point (fixation) occurrence is the main concern. Under some transformation, the Poisson point process model of fixation occurrences could be expressed and modeled as a logistic regression, making it straightforward to apply using conventional statistical software (Barthelmé & Chopin, 2015). For example, Nuthmann and Einhäuser (2015) made use of logistic mixed models to determine the influence of low- and high-level visual properties in scene images on eye movements. Moreover, smooth effect and spatial covariants could be captured by applying regression splines in a generalized additive model, as demonstrated in Barthelmé and Chopin (2015).

Importantly, the point process model addresses different questions than does iMap. It is most appropriate when the effect of spatial location is considered irrelevant, a nuisance effect, or a fixed intercept (see, e.g., Barthelmé & Chopin, 2015; Nuthmann & Einhäuser, 2015). As a comparison, in iMap the parameters of interest are location specific, varying from pixel to pixel. In other words, the differences or effects among different conditions are location-specific, forming a complex pattern in two dimensions. These high-dimensional effects are more naturally and easily modeled using a pixel-wise model, as in iMap4.

Conclusion and future development

In conclusion, we have presented an advanced eye movement analysis approach using LMMs and nonparametric statistics: iMap4. This method is implemented in MATLAB with a user-friendly interface. We aimed to provide a framework for analyzing spatial eye movement data with the most sophisticated statistical modeling to date. The procedure described in the present article currently represents our best attempt to conform to conventional null-hypothesis testing, while providing options for robust statistics. We are still working on many improvements, including functions to compare different fitted models, statistics on the random-effect coefficients, and replacing LMMs with generalized LMMs for modeling fixation numbers (Bolker, Brooks, Clark, Geange, Poulsen, Stevens, & White, 2009). In the future, we will also switch our focus to Bayesian statistics and the generative model (such as the Gaussian process) in an effort to develop a unified model of statistical inference for eye movement data (Jaynes & Bretthorst, 2003).