25 years of inflated false-positives / by Siobhán Cronin

3T Toshiba scanner drawing by author

While statistical analysis forms the backbone of experimental research, it is not often that statisticians step forward as the heroes of our innovation narratives. Yet here we have a study that boldly questions the statistical premises underpinning the validity of over two decades of fMRI research.

Eklund A, Nichols TE, Knutsson H. Cluster failure: Why fMRI inferences for spatial extent have inflated false-positive rates. Proceedings of the National Academy of Sciences. May 17, 2016 [Full Text]

This study calls attention to inflated false-positive rates embedded in the leading neuroimaging software platforms (SPM, FSL, AFNI). In particular, the results call into question the validity of the spatial-extent inferences reported in studies using these platforms, which the authors estimate at a staggering 40,000 studies. In fact, the authors assert that in the 25-year history of fMRI research, this is the first study to validate its core statistical methods using real data.

As a quick refresher, the false-positive rate is the expected value of the false-positive ratio, arrived at by dividing the number of false positives (FP) by the sum of false positives and true negatives (TN): FP / (FP + TN). In shorthand, we typically say the false-positive rate is the probability of falsely rejecting the null hypothesis. For instance, if our null hypothesis states there is no difference between the taste of apples and apples, a false positive would lead us to reject the null and conclude that identical apples somehow taste different.
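As a minimal sketch of that definition (the counts below are made up for illustration, not numbers from the study):

```python
# Hypothetical confusion-matrix counts, purely to make the definition concrete.
false_positives = 5    # null is true, but the test rejects it
true_negatives = 95    # null is true, and the test correctly retains it

# False-positive rate: FP / (FP + TN)
fpr = false_positives / (false_positives + true_negatives)
print(f"False-positive rate: {fpr:.2%}")  # -> 5.00%
```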

The authors performed roughly 3 million task group analyses using resting-state data from 499 healthy subjects. Where they expected a familywise false-positive rate of 5%, the most commonly used software packages (SPM, FSL, AFNI) produced false-positive rates of up to 70%.
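The validation logic is worth spelling out: run many group analyses on data that contain no true group effect and count how often at least one "significant" result appears; that fraction is the empirical familywise false-positive rate, to be compared against the nominal 5%. Below is a deliberately simplified sketch of that logic, using independent Gaussian voxels and a Bonferroni correction rather than the cluster-extent RFT correction the paper actually evaluates; every number here is made up for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

n_analyses = 1000   # repeated null group analyses (toy scale)
n_per_group = 20    # subjects per group
n_voxels = 500      # a tiny "brain" of independent voxels

family_wise_errors = 0
for _ in range(n_analyses):
    # Both groups come from the same distribution, so any "significant"
    # voxel is by construction a false positive.
    group_a = rng.standard_normal((n_per_group, n_voxels))
    group_b = rng.standard_normal((n_per_group, n_voxels))
    _, p = stats.ttest_ind(group_a, group_b, axis=0)

    # Bonferroni-corrected threshold targeting a 5% familywise rate.
    if (p < 0.05 / n_voxels).any():
        family_wise_errors += 1

print(f"Empirical familywise false-positive rate: "
      f"{family_wise_errors / n_analyses:.1%}")
```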

So, what are these statistical methods anyway, and why was this overlooked?

Well, they have to do with how these software platforms model spatial autocorrelation, and with the assumptions required to justify the use of Gaussian random-field theory (RFT) for FWE-corrected voxelwise and clusterwise inferences. Specifically, these assumptions are that 1) the “spatial smoothness of the fMRI signal is constant over the brain”, and 2) “the spatial autocorrelation function has a specific shape (a squared exponential)”. For further reading on these assumptions, the authors cite Hayasaka & Nichols’ 2003 article in NeuroImage, Validating cluster size inference: Random field and permutation methods.
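To make the second assumption concrete: a squared-exponential (Gaussian-shaped) autocorrelation function says the correlation between two voxels falls off with their distance d as exp(-d^2 / (2*sigma^2)). The paper's empirical finding is that real fMRI noise has heavier tails than this, i.e. more long-range correlation than that shape allows. Here is a small sketch of the assumed shape; the 8 mm smoothness value and the standard FWHM-to-sigma conversion are illustrative choices of mine, not parameters taken from the paper.

```python
import numpy as np

def squared_exponential_acf(distance_mm, fwhm_mm=8.0):
    """Gaussian-shaped ('squared exponential') spatial autocorrelation:
    correlation decays as exp(-d^2 / (2 * sigma^2))."""
    sigma = fwhm_mm / (2.0 * np.sqrt(2.0 * np.log(2.0)))  # FWHM -> sigma
    return np.exp(-distance_mm**2 / (2.0 * sigma**2))

for d in np.arange(0.0, 25.0, 5.0):  # voxel separations in mm
    print(f"{d:4.0f} mm -> assumed correlation {squared_exponential_acf(d):.3f}")
```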

The authors observed “heavy tails” of false clusters in resting-state fMRI data, which appeared to become more pronounced after spatial smoothing. They then checked whether these clusters appeared “randomly in the brain”, as the spatial-smoothness assumption would predict. Instead, their maps of voxelwise cluster frequency showed the posterior cingulate to be the most heavily clustered region and white matter to be the least covered. This posterior cingulate “hot spot” led them to conclude “that violation of the stationary smoothness assumption may also be contributing to the excess of false positives”. The study goes on to test whether false clusters are equally inflated in task-based fMRI data, which indeed proved to be the case in four task datasets downloaded from OpenfMRI.
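The voxelwise cluster-frequency map itself is just bookkeeping: for each analysis, mark which voxels fall inside a significant cluster, then sum those binary maps across analyses. Under stationary smoothness the counts should be roughly uniform across the brain; a concentration like the posterior cingulate hot spot is evidence against the assumption. A minimal sketch of that accumulation, with random placeholder masks standing in for real cluster maps:

```python
import numpy as np

rng = np.random.default_rng(1)
shape = (8, 8, 8)     # toy brain volume
n_analyses = 200

# Placeholder masks: one binary volume per analysis marking the voxels that
# fell inside a "significant" cluster. In the real workflow these would be
# the cluster maps produced by each group analysis.
cluster_masks = rng.random((n_analyses, *shape)) < 0.02

# Voxelwise cluster frequency: how often each voxel was flagged.
frequency_map = cluster_masks.sum(axis=0)

hot_spot = np.unravel_index(frequency_map.argmax(), shape)
print("Most frequently flagged voxel:", hot_spot)
print("Flagged in", frequency_map.max(), "of", n_analyses, "analyses")
```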

What do we make of this?

The research community will undoubtedly debate this article’s methods in the months to come as it works to fully digest the implications, and my hope is that additional statisticians will weigh in, as well as the software developers themselves.

In the meantime, here are a few talking points we can chew on:

  • Open data culture continues to provide researchers with the data access they need to conduct the meta-analyses necessary to validate research paradigms and field-wide statistical practices. Such work strengthens science as a whole, yet the commodification and siloing of knowledge by institutions continues to thwart this progress. What will it take to shift the economics of research in favor of open data?
  • This article begs the question: how can we in the research community take responsibility for validating the code and addressing bugs in the software we depend on? Is the information we need too sewn up by the groups who design and distribute that software? Could further developing open source research tools provide us with greater agency and accuracy? In the meantime, how common is it for researchers to publish the platform version they used in their analysis (which would be useful when bugs are identified that could affect previously published data)?
  • If we are doing our science right, our work will invariably complicate the research we conducted ten years ago, and should definitely complicate the research of our predecessors. Tools advance in their precision. Knowledge advances in its nuanced treatment of complex phenomena. Statistical analysis and computational strength grow up alongside our growing disciplines. How might we build toward a more robust read-write-revise global science culture? One that is comfortable with the historical erosion of certainty that the advance of knowledge demands? How can this be done in a climate where funding and job security are so intimately connected to research outcomes?

These are horizons for continued discussion and collaboration. In the meantime, the authors point to some great projects I encourage you to check out, including the OpenfMRI project already mentioned, as well as the 1,000 Functional Connectomes Project and NeuroVault.org.