Cognitive, Affective, & Behavioral Neuroscience, Volume 13, Issue 4, pp 690–702

fMRI reliability: Influences of task and experimental design



As scientists, we must understand not only the power of our research tools to yield results, but also their ability to obtain similar results over time. This study investigates how common decisions made during the design and analysis of a functional magnetic resonance imaging (fMRI) study can influence the reliability of the statistical results. To that end, we gathered back-to-back test–retest fMRI data during an experiment involving multiple cognitive tasks (episodic recognition and two-back working memory) and multiple fMRI experimental designs (block, event-related genetic sequence, and event-related m-sequence). Using these data, we investigated the relative influences of task, design, statistical contrast (task vs. rest, target vs. nontarget), and statistical thresholding (unthresholded, thresholded) on fMRI reliability, as measured by the intraclass correlation coefficient (ICC). We also used data from a second study to investigate test–retest reliability after an extended, six-month interval. We found that all of the factors above were statistically significant, but that they had varying levels of influence on the observed ICC values. We also found that these factors could interact, increasing or decreasing the relative reliability of certain Task × Design combinations. The results suggest that fMRI reliability is a complex construct whose value may be increased or decreased by specific combinations of factors.
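The reliability measure named above, the ICC, can be illustrated with a short sketch. The abstract does not say which ICC variant was computed, so the example assumes the two-way mixed, consistency form commonly written ICC(3,1), computed from the between-subjects mean square (BMS) and the residual mean square (EMS) as (BMS − EMS) / (BMS + (k − 1)·EMS). The function name `icc_3_1` and the example scores are hypothetical and are not taken from the study.

```python
def icc_3_1(scores):
    """ICC(3,1) for a table of n subjects (rows) by k sessions (columns).

    Assumption: two-way mixed, consistency-type ICC. The abstract does not
    state which ICC form the authors used.
    """
    n = len(scores)        # number of subjects (or voxels)
    k = len(scores[0])     # number of sessions (test, retest, ...)

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_sessions = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_subjects - ss_sessions

    bms = ss_subjects / (n - 1)             # between-subjects mean square
    ems = ss_error / ((n - 1) * (k - 1))    # residual mean square

    # Undefined (division by zero) when every score in the table is identical.
    return (bms - ems) / (bms + (k - 1) * ems)


# Hypothetical test-retest values: each row is one subject, columns are sessions.
consistent = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]   # same ordering, constant shift
reversed_order = [[1.0, 3.0], [2.0, 2.0], [3.0, 1.0]]  # ordering flips at retest

print(icc_3_1(consistent))
print(icc_3_1(reversed_order))
```

Because this form measures consistency rather than absolute agreement, a constant shift between sessions does not lower the value; only changes in the relative ordering or spacing of subjects do. An absolute-agreement variant would penalize the shift as well.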


fMRI statistics Reliability 

Supplementary material

Figure S1 (PDF, 603 kb): 13415_2013_195_MOESM1_ESM.pdf
Figure S2 (PDF, 3033 kb): 13415_2013_195_MOESM2_ESM.pdf
Figure S3 (PDF, 2970 kb): 13415_2013_195_MOESM3_ESM.pdf
Table S1 (PDF, 23 kb): 13415_2013_195_MOESM4_ESM.pdf



Copyright information

© Psychonomic Society, Inc. 2013

Authors and Affiliations

  1. Department of Psychology, University of California at Santa Barbara, Santa Barbara, USA
