Leanpub: Publish Early, Publish Often

Chapter 4 - Brain mapping: A conceptual overview

What is a brain map?

Understanding the basics of brain mapping is increasingly important for a broad segment of society as brain images make their way into media, medical practices, courtrooms, advertisements, and other sectors of public life. However, without an explanation of the process and some of the ground rules, it is not obvious how brain images are constructed or what they can and cannot tell us about the brain and the mind.

Both functional and structural imaging rely on the construction of brain maps, which are maps of localized signals. We can map many types of brain signals, and these relate to many external (outside-the-brain) conditions and outcomes. However, different types of brain maps rely on many of the same principles and underlying assumptions. We devote this chapter to a conceptual overview of how researchers construct brain maps, what we can learn from them, and what some of those assumptions and limitations are.

Brain maps like those shown in Figure 4.1 are, generally speaking, statistical constructions. In some cases, brain images display actual data values; this is typical in neuroradiology, in which experts ‘read’ an image and come up with an opinion or diagnosis. However, in most scientific areas, researchers want to make quantitative inferences, which means statistically comparing image data across conditions or individuals and then showing maps of the statistical results. We often call this practice statistical parametric mapping. Such maps show the brain areas where researchers have deemed some effect of interest statistically significant.

Types of maps

The types of processes that researchers map to local brain regions or networks are numerous. They include:

Effects of experimental manipulations
Correlations with behavior, clinical status, or other person-level outcomes
Correlations with performance or other within-person variables
Correlations between brain areas and other specific areas
Brain areas that are part of a group of areas (e.g. a cluster or network)

Accordingly, a first question to ask about any brain map is what effect it actually maps.

Types of inference

A second question to ask is to whom the map applies: a single individual or a population of individuals? Some maps, like those in the top panel of Figure 4.2, can be constructed from the data of a single individual scanned repeatedly. We refer to these as single-subject maps. These maps are common in some sub-fields, such as vision science or primate neuroimaging, and are increasingly present in clinical and legal applications. Researchers can construct single-subject maps by comparing data from one condition (e.g. one experimental task) with another across repeated measurements, thereby testing statistical significance in each brain region or ‘voxel’ (a small three-dimensional cube of brain). Alternatively, researchers can construct these maps by comparing an individual with a population of other individuals. If the statistics are valid (a big if!), such maps can say something useful about how an individual’s brain differs from others’ brains.
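To make the individual-versus-population comparison concrete, here is a minimal Python sketch that turns one person’s voxel values into a map of z-scores relative to a reference group. The data are hypothetical, and the sketch assumes the reference group is large and representative enough that its mean and standard deviation can be trusted, which is exactly the ‘big if’ above.

```python
import statistics

def individual_vs_population_z(individual, population):
    """Return one z-score per voxel: how far the individual's value lies from
    the reference group's mean, in units of the group's standard deviation."""
    z_map = []
    for v, value in enumerate(individual):
        group_values = [person[v] for person in population]
        mean = statistics.mean(group_values)
        sd = statistics.stdev(group_values)  # sample SD across reference individuals
        z_map.append((value - mean) / sd)
    return z_map

# Hypothetical data: 3 voxels measured in 5 reference individuals and 1 individual of interest.
population = [
    [1.0, 2.0, 0.5],
    [1.2, 2.1, 0.4],
    [0.8, 1.9, 0.6],
    [1.1, 2.2, 0.5],
    [0.9, 1.8, 0.5],
]
individual = [1.0, 2.0, 1.5]

z_map = individual_vs_population_z(individual, population)
print([round(z, 2) for z in z_map])
```

Only the third voxel stands out. With a reference group as small as this one, the group’s own mean and SD are themselves noisy, so plain z-scores overstate how unusual the individual is.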

Figure 4.2. Examples of single-subject maps (top row) and group-level maps (bottom row).

However, single-subject maps cannot tell us much about the brain’s general organization: researchers cannot use them to make population inferences, which are claims about how the brain functions in general. To make such claims, researchers must scan a group of participants and conduct statistical tests that explicitly evaluate how likely the findings are to generalize to new individuals. We refer to these as group-level maps; they identify brain areas that show consistent effects across individuals. The bottom panel of Figure 4.2 shows a schematic view of a group-level map’s construction. Technically speaking, such maps require a statistical procedure often called random effects analysis, because the statistical model treats each participant as a random effect. We will return to this in more detail in later chapters. The map shown in Figure 4.1 is a group-level brain activity map.
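A common way to approximate a random effects analysis is the two-stage ‘summary statistics’ approach: estimate each participant’s [task - control] contrast at each voxel, then test whether the mean of those contrasts across participants differs from zero. The sketch below shows only the second stage, assuming the first-level contrasts have already been estimated; the numbers are made up. The key point is that the denominator of the t statistic uses the variability between participants, which is what supports generalization to new individuals.

```python
import math
import statistics

def one_sample_t(values):
    """t = mean / (SD / sqrt(n)): is the population mean different from zero?"""
    n = len(values)
    return statistics.mean(values) / (statistics.stdev(values) / math.sqrt(n))

def group_t_map(contrasts):
    """contrasts[s][v] is participant s's first-level [task - control] contrast
    at voxel v. Returns one group-level t-value per voxel (df = n_subjects - 1)."""
    n_voxels = len(contrasts[0])
    return [one_sample_t([subject[v] for subject in contrasts]) for v in range(n_voxels)]

# Hypothetical first-level contrasts: 4 participants x 2 voxels.
contrasts = [
    [0.5, 0.1],
    [0.7, -0.2],
    [0.6, 0.3],
    [0.4, -0.1],
]
t_map = group_t_map(contrasts)
print([round(t, 2) for t in t_map])
```

Voxel 0 shows a consistent effect across participants; voxel 1 does not, because its contrasts change sign from person to person.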

Maps can vary widely in what they reflect, but they all share the same basic distinction between single-subject and population inference. For example, Figure 4.3 shows maps of three different kinds of brain connectivity. In this case, the colored regions do not show the significant effects of interest; instead, the lines connecting regions show the effects, indicating significant functional associations across regions. The left map comes from a dynamic causal model (DCM), which examines relationships among regions by analyzing dynamic regional changes from one second to the next while controlling for other regions and experimental task variables. Its lines show significant associations at a population level. The center map comes from a method that identifies the most likely connections among regions and their variations across time. The connections its lines identify are not necessarily individually significant. This is common with many multivariate map types, so one must be careful to draw the correct inference: regions associated with a ‘network’ are not necessarily all significantly associated. Finally, the map at the right shows a large-scale network in which each colored circle represents a brain region or system and each line shows significant associations across studies. Clearly, knowing how a map was constructed and at what level it was analyzed is crucial for understanding what it means.

Figure 4.3. Maps illustrating three different kinds of brain connectivity.

Fundamental assumptions and principles

To make statistical maps of all kinds, we rely on the assumption that the brain signals we measure reflect both effects of interest and noise. Researchers further assume that the noise is independent of the effects of interest (i.e. “random”). Repeated measurements in which the noise varies independently and stochastically allow us to obtain an average map that contains the true effect and reduces noise to a minimum. Because the noise varies randomly around the true effect, it ‘averages out’: the more data we collect, the closer the average noise gets to zero, as long as the noise is independent of the effect of interest.
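A small simulation illustrates why averaging helps. Each simulated measurement is a fixed true effect plus independent Gaussian noise (both values are arbitrary choices for illustration). As n grows, the average drifts toward the true effect, and its expected error, the standard error, shrinks in proportion to 1/sqrt(n).

```python
import math
import random
import statistics

TRUE_EFFECT = 0.5  # hypothetical true signal at one voxel
NOISE_SD = 2.0     # hypothetical noise level, independent of the effect

def observed_average(n, rng):
    """Average of n measurements, each equal to the true effect plus independent Gaussian noise."""
    return statistics.mean(TRUE_EFFECT + rng.gauss(0.0, NOISE_SD) for _ in range(n))

rng = random.Random(0)  # fixed seed so the run is reproducible
for n in (4, 16, 64, 256, 1024):
    estimate = observed_average(n, rng)
    standard_error = NOISE_SD / math.sqrt(n)  # expected spread of the average around the truth
    print(f"n={n:5d}  average={estimate:+.3f}  standard error={standard_error:.3f}")
```

The shrinking error depends entirely on the independence assumption: if part of the ‘noise’ varied systematically with the condition, the average would converge to the wrong value no matter how much data we collected.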

Consider the example in Figure 4.4. The brain (we show one representative horizontal slice here) contains some areas with a true effect, shown in blue. Perhaps this is a working memory task that requires people to maintain more versus less information in their minds; the blue areas are concentrated in frontal and parietal cortico-striatal networks. We observe a mixture of the true effects (signal) plus random noise, shown here in red.

Figure 4.4. (Top) A single slice of the brain contains some areas with a true effect, shown in blue. We observe a mixture of the true effects (signal) plus random noise, in red here. Statistical tests are used to infer which voxels show true effects. (Bottom) Three common data types that go into such maps: task-related group analyses that compare a task of interest to a control task; brain-behavior correlations; and the average accuracy in predicting a stimulus category or behavior from each voxel’s local multivariate patterns of brain activity.

Importantly, this noise is non-zero even when averaged across the observed data, so we must first separate it from the signal and then decide which areas really show the effect. We do this with a statistical test that compares each voxel’s observed effect with its noise level (i.e. signal/noise). Common statistics, including T-scores, F-values, and Z-scores, are all examples of such signal-to-noise ratios. We then compare the resulting statistic value with an assumed distribution to obtain each voxel’s p-value. The p-value is the probability of observing a statistic value (e.g. a T-score) as extreme as or more extreme than the one actually observed under the null hypothesis, that is, if there is no true effect. The lower the p-value, the stronger the evidence against the null hypothesis. We threshold the map by comparing each voxel’s p-value with a fixed cutoff, and infer that voxels below the cutoff show true effects. Because a map involves many tests (typically one per voxel), researchers often set a very high bar for significance (i.e. low p-values) by correcting for multiple comparisons.
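The sketch below walks through these steps for a handful of voxels: convert each voxel’s t-value to a two-sided p-value, then compare it with a threshold, optionally Bonferroni-corrected for the number of tests. To stay within Python’s standard library, the p-value is computed by numerically integrating the t density with Simpson’s rule; in practice one would call a statistics library instead. The t-values and degrees of freedom are hypothetical, and Bonferroni is used only because it is the simplest multiple comparisons correction; mapping software usually offers others.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_sided_p(t, df, steps=2000):
    """P(|T| >= |t|) under the null hypothesis, via Simpson's rule (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3          # probability that 0 <= T <= |t|
    return max(0.0, 1.0 - 2.0 * area)

def threshold_map(t_map, df, alpha=0.05, bonferroni=True):
    """Return (significant flags, p-values) for each voxel."""
    cutoff = alpha / len(t_map) if bonferroni else alpha
    p_map = [two_sided_p(t, df) for t in t_map]
    return [p <= cutoff for p in p_map], p_map

# Hypothetical group t-values for 4 voxels from 20 participants (df = 19).
t_map = [8.5, 2.5, 0.2, -4.0]
significant, p_map = threshold_map(t_map, df=19)
uncorrected, _ = threshold_map(t_map, df=19, bonferroni=False)
print("p-values:", [f"{p:.4f}" for p in p_map])
print("uncorrected:", uncorrected)   # cutoff 0.05 at each voxel
print("Bonferroni:", significant)    # cutoff 0.05 / 4 at each voxel
```

The second voxel passes the uncorrected threshold but not the corrected one, which is the price of testing many voxels at once.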

When we use standard statistic values like T-scores and compare them with their canonical, assumed distributions, we are using parametric statistics. When we use the data themselves to estimate the null-hypothesis distribution, which often involves fewer assumptions, we are using nonparametric statistics.
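One common nonparametric approach for a group [task - control] contrast is a sign-flipping permutation test. If there is no true effect and the contrasts are distributed symmetrically around zero (an assumption of this sketch), each participant’s contrast is as likely to be positive as negative, so flipping signs generates the null distribution from the data themselves. With few participants we can enumerate every sign pattern exactly. The contrasts here are hypothetical, and the test statistic is simply the mean.

```python
import itertools
import statistics

def sign_flip_p(contrasts):
    """Exact two-sided sign-flip permutation p-value for the mean contrast at one voxel."""
    observed = abs(statistics.mean(contrasts))
    n_extreme = 0
    n_total = 0
    for signs in itertools.product((1, -1), repeat=len(contrasts)):
        flipped_mean = statistics.mean(s * c for s, c in zip(signs, contrasts))
        n_total += 1
        # Small tolerance so that sign patterns tied with the observed value are counted.
        if abs(flipped_mean) >= observed - 1e-12:
            n_extreme += 1
    return n_extreme / n_total

# Hypothetical contrasts from 8 participants at one voxel (2**8 = 256 sign patterns).
contrasts = [0.5, 0.7, 0.6, 0.4, 0.8, 0.3, 0.55, 0.65]
print(sign_flip_p(contrasts))  # prints 0.0078125, i.e. 2/256
```

With many participants, enumerating all 2^n patterns becomes impractical, and a random subset of sign flips is used instead.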

To construct brain maps, we usually test each voxel in the brain separately, ignoring other voxels’ potential influence. This is the case whether one maps activations in response to a task, structural differences between groups, or the functional correlation of areas with a ‘seed’ region of interest. It is a big assumption that the rest of the brain doesn’t matter, so many multivariate analyses relax this assumption in certain ways (depending on the specifics of the multivariate model). However, the assumption is in some ways quite useful, as we can interpret one brain area’s effects independently of other areas’ responses. For example, a brain map that correlates activity levels in an anger-induction task with self-reported anger levels can provide a simple picture of which areas are associated with anger, and so can be a starting point for more sophisticated models.

This basic brain mapping procedure applies to the vast majority of published neuroimaging findings, including both structural and functional imaging using MRI and PET. Figure 4.4’s bottom panels show three common data types that go into such maps. On the left, the statistical brain map’s voxels reflect a task-related group analysis that compares a task of interest to a control task. Each data point that goes into the test at a given voxel (the circles) is the [task - control] contrast magnitude from one participant; the null hypothesis here is that the population’s mean [task - control] difference is zero. The center map shows a brain-behavior correlation in which the test statistic is the correlation between the activity levels (often in a [task - control] contrast) and an external outcome, as in the anger example above. The right map shows an “information-based mapping” test in which the test statistic is the average accuracy in predicting a stimulus category or behavior from each voxel’s local multivariate patterns of brain activity. In all of these cases, the above principles and assumptions apply.
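The center panel’s test can be sketched in a few lines: at each voxel, correlate participants’ contrast values with an external outcome such as self-reported anger, then convert the correlation r into a t statistic with n - 2 degrees of freedom, t = r * sqrt((n - 2) / (1 - r^2)). All data below are hypothetical.

```python
import math
import statistics

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_map(contrasts, outcome):
    """contrasts[s][v]: participant s's [task - control] value at voxel v.
    outcome[s]: participant s's external measure (e.g. self-reported anger).
    Returns per-voxel r and t (df = n - 2); assumes no voxel has |r| == 1."""
    n = len(outcome)
    r_map, t_map = [], []
    for v in range(len(contrasts[0])):
        r = pearson_r([subject[v] for subject in contrasts], outcome)
        r_map.append(r)
        t_map.append(r * math.sqrt((n - 2) / (1 - r * r)))
    return r_map, t_map

# Hypothetical data: 5 participants x 2 voxels, plus one anger rating per participant.
contrasts = [[0.1, 0.4], [0.3, 0.1], [0.2, 0.5], [0.5, 0.2], [0.6, 0.3]]
anger = [1, 2, 3, 4, 5]
r_map, t_map = correlation_map(contrasts, anger)
print([round(r, 2) for r in r_map], [round(t, 2) for t in t_map])
```

The resulting t-values then go through the same p-value and thresholding steps as any other statistic map.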

Bringing prior information to bear: Anatomical hypotheses

Regardless of the type of map constructed and the variables involved, researchers’ basic question is, “Is there some effect at this location?” As Figure 4.5 shows, researchers can apply hypothesis tests to each brain voxel or to a set of voxels in pre-defined regions of interest (ROIs). They can also apply hypothesis tests to voxels in a single ROI or to signals averaged over voxels in one or more ROIs. These examples illustrate a progression from conducting many tests across the brain to performing few tests, a movement that depends on the prior information brought to bear to constrain hypotheses.

Figure 4.5. Researchers can apply hypothesis tests to each brain voxel, to a set of voxels in pre-defined regions of interest (ROIs), to voxels in a single ROI, or to signals averaged over voxels in one or more ROIs, depending on the prior information brought to bear to constrain hypotheses.

The more tests researchers perform, the more stringent the correction for multiple comparisons must be if they are to interpret all significant results as ‘real’ findings. As the threshold becomes more stringent, statistical power (the chance of finding a true effect if it exists) drops, often dramatically, so more and more true activations are missed. In the extreme case in which there is only one ROI and the signal in its voxels is averaged, researchers perform only one test and do not need multiple comparisons correction.
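How quickly power falls can be estimated with a normal approximation for a one-sided test of a group mean. The effect size (d = 0.5), the sample size (20 participants), and the figure of 100,000 voxel-wise tests are illustrative assumptions, not values from any particular study.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, SD 1

def power_one_sided(effect_size, n, alpha):
    """Approximate power of a one-sided test that a group mean is > 0.

    Normal approximation: effect_size is the true mean divided by the
    between-participant SD (Cohen's d), n is the number of participants.
    """
    z_crit = STANDARD_NORMAL.inv_cdf(1 - alpha)
    return 1 - STANDARD_NORMAL.cdf(z_crit - effect_size * math.sqrt(n))

N_TESTS = 100_000  # hypothetical number of voxel-wise tests
for label, alpha in [("uncorrected", 0.05),
                     ("p < 0.001", 0.001),
                     ("Bonferroni", 0.05 / N_TESTS)]:
    print(f"{label:12s} alpha={alpha:.1e}  power={power_one_sided(0.5, 20, alpha):.3f}")
```

Under these assumptions, power falls from roughly 72% at an uncorrected threshold of 0.05 to well under 1% after Bonferroni correction across 100,000 tests.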

Researchers need not limit a priori hypotheses to single regions; it is also possible to specify a pattern of interest, in which an average or a weighted average is taken across a set of brain regions and a single test is performed. Figure 4.6 shows an example from a working memory study. We first defined a pattern of interest based on previous working memory studies from neurosynth.org, an online repository of over 10,000 studies’ activation results. Then we applied the pattern to working memory-related maps from two participant groups, a group exposed to a social evaluative threat (SET) stressor and a control group, by calculating a weighted average of voxel activity within the pattern of interest. Applying the pattern allowed us to (a) establish that, in our study, working memory produced robust activation in the pattern expected from previous studies and (b) test for SET effects on working memory-related activation without needing multiple comparisons correction.
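Applying a pattern of interest reduces each participant’s map to a single number, so the group comparison becomes one test. The sketch below uses hypothetical pattern weights and voxel values (they are not the neurosynth.org pattern from the study) and a Welch two-sample t statistic for the comparison; the original analysis may have used a different test.

```python
import math
import statistics

def pattern_expression(voxel_values, weights):
    """Weighted average of one participant's voxel activity within the pattern."""
    return sum(w * x for w, x in zip(weights, voxel_values)) / sum(weights)

def welch_t(group_a, group_b):
    """Two-sample t statistic without assuming equal variances."""
    var_a = statistics.variance(group_a) / len(group_a)
    var_b = statistics.variance(group_b) / len(group_b)
    return (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(var_a + var_b)

# Hypothetical pattern weights over 4 voxels (the last voxel lies outside the pattern).
weights = [0.6, 0.3, 0.1, 0.0]

# Hypothetical working memory contrast maps, one row per participant.
set_group = [[0.9, 0.8, 0.5, 1.2], [1.1, 0.7, 0.6, -0.3], [0.8, 0.9, 0.4, 0.1], [1.0, 0.6, 0.5, 0.7]]
control_group = [[0.5, 0.4, 0.3, 1.0], [0.6, 0.3, 0.2, -0.5], [0.4, 0.5, 0.3, 0.2], [0.5, 0.2, 0.4, 0.9]]

set_scores = [pattern_expression(m, weights) for m in set_group]
control_scores = [pattern_expression(m, weights) for m in control_group]
t = welch_t(set_scores, control_scores)  # one test, so no multiple comparisons correction
print([round(s, 2) for s in set_scores], [round(s, 2) for s in control_scores], round(t, 2))
```

Note that the fourth voxel, outside the pattern, has no influence on the scores, however large its values are.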

Figure 4.6. An example from a working memory study. A pattern of interest was created based on previous working memory studies. The pattern was applied to data from two groups (one exposed to a stressor and a control group) by calculating a weighted average of voxel activity in the pattern of interest. This allowed for a test of stressor effects on working memory-related activation without requiring multiple comparison correction.

There are thus many benefits to specifying anatomical hypotheses a priori. However, when we specify a priori hypotheses, we must truly specify the region or pattern in advance, based on data whose errors are independent of the dataset used to test the effect; otherwise, the p-values and the inferences will not be valid. We suspect there are many unreported cases of post-hoc “a priori” selection of ROIs.
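A short simulation shows why the selection data must be independent. Every simulated voxel is pure noise, yet choosing the 10 voxels with the largest mean in one dataset and then measuring them in that same dataset suggests a sizeable ‘effect’, while measuring the same voxels in an independent dataset gives an estimate near zero. The sample size, voxel count, and ROI size are arbitrary choices for illustration.

```python
import random
import statistics

def simulate_subjects(n_subjects, n_voxels, rng):
    """Pure-noise data: no voxel has a true effect."""
    return [[rng.gauss(0.0, 1.0) for _ in range(n_voxels)] for _ in range(n_subjects)]

def voxel_means(data):
    """Mean across participants at every voxel."""
    return [statistics.mean(subject[v] for subject in data) for v in range(len(data[0]))]

rng = random.Random(42)
selection_data = simulate_subjects(20, 1000, rng)
independent_data = simulate_subjects(20, 1000, rng)

selection_means = voxel_means(selection_data)
# "ROI" = the 10 voxels with the largest mean in the selection data.
roi = sorted(range(len(selection_means)), key=lambda v: selection_means[v], reverse=True)[:10]

circular_estimate = statistics.mean(selection_means[v] for v in roi)
independent_means = voxel_means(independent_data)
independent_estimate = statistics.mean(independent_means[v] for v in roi)
print(f"same data: {circular_estimate:.2f}   independent data: {independent_estimate:.2f}")
```

The inflated ‘same data’ estimate comes entirely from selecting on noise, which is why post-hoc ROI selection invalidates the resulting p-values.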

Types of inference: What brain maps can and cannot tell us

What can we infer from thresholded brain maps of all types, regardless of whether they concern anatomy, neurochemistry, or functional activation? What we can make inferences about is rather specific and may not be exactly what you expect. Below, we discuss inferences about brain effects, a term that applies to many types of images, including task-based activation, anatomical relationships with behavior, and molecular imaging maps. First we discuss inferences about brain effects’ presence, size, and location. Then we discuss forward and reverse inference, which concern inferences about the brain and about our psychological states (or other outcomes), respectively.

Inferences about the presence, size, and location of effects

The basic brain mapping procedure involves a hypothesis test of significance at each voxel. This allows us to reject, for a subset of voxels, the null hypothesis of no effect in favor of an alternative hypothesis. That alternative hypothesis, however, is not very precise: it is merely that there is some non-zero effect.

As we will see in the next chapter, this does not let us conclude anything about how big or how meaningful the effects are; attempts to do so using standard hypothesis testing procedures can be highly misleading. At best, then, brain maps can allow inference that a set of significant voxels has some effect, but not how much effect.

Standard brain maps are also not very good for determining which voxels do versus do not show effects, so they cannot show us the complete pattern of activity (or structural effects, etc.) across the brain. This is primarily because of the stringent thresholds usually used to limit false positive findings. Current thresholding procedures do not optimally balance the number of false positives and false negatives (missed findings).

Standard brain maps are also not particularly good for determining precisely where in the brain the effects are. This may seem very surprising, as researchers nearly always interpret thresholded brain maps in terms of where the most statistically significant results lie. The trouble is that brain maps provide confidence statements (which researchers use as a guide to how strongly to believe in an effect) about whether each voxel is significant, but not about the significant voxels’ locations. They provide a ‘yes/no’ value for whether a significant effect appears at each voxel. Inferences about result locations, then, are heuristic rather than quantitative.

This limitation becomes intuitive if we consider the brain map in Figure 4.1. The map contains significant activation (yellow) in the ventrolateral prefrontal cortex (vlPFC), marked with a red arrow. Imagine repeating this experiment. What are the chances that the exact same voxels in vlPFC would be active? Or that the most active voxel would fall in the exact same location? We do not know. Standard mapping procedures do not provide p-values or confidence intervals on the activation’s location or shape. However, we know from meta-analyses like the one in Figure 4.7 that the location of the peak voxel will likely be quite variable, possibly around plus or minus 1–1.5 cm. Incidentally, Figure 4.7 does show spatial 95% confidence intervals for the across-study mean location for positive (green) and negative (red) emotions, drawn as 3-D ellipsoids. In addition to noise-related uncertainty about local effects’ locations and shapes, we must keep in mind that artifacts and imprecision in anatomical alignment can also cause mis-localized effects. All brain images have an intrinsic point-spread function, a blurring of a true effect at one local brain point into a broader ‘blob’ of observable signal. BOLD images in particular are susceptible to arterial inflow and draining vein artifacts; they are also typically overlaid on an anatomical reference image that may not perfectly align with the functional map.
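The question of whether the most active voxel would fall in the same place can be explored with a toy simulation: a single smooth blob of true activation along a one-dimensional strip of voxels, observed with independent noise in each of many simulated replications. The blob shape, noise level, and number of replications are assumptions chosen for illustration; real data add smoothing, spatially correlated noise, and alignment error, none of which this sketch models.

```python
import math
import random
import statistics

N_VOXELS = 100
TRUE_PEAK = 50    # location of the true effect's maximum
WIDTH = 8.0       # spatial spread of the true blob, in voxels
NOISE_SD = 0.25   # independent noise at every voxel

true_signal = [math.exp(-((v - TRUE_PEAK) ** 2) / (2 * WIDTH ** 2)) for v in range(N_VOXELS)]

def observed_peak(rng):
    """Location of the maximum in one noisy replication of the experiment."""
    observed = [s + rng.gauss(0.0, NOISE_SD) for s in true_signal]
    return max(range(N_VOXELS), key=lambda v: observed[v])

rng = random.Random(7)
peaks = [observed_peak(rng) for _ in range(200)]
print("mean peak:", statistics.mean(peaks),
      " SD:", round(statistics.stdev(peaks), 2),
      " range:", min(peaks), "to", max(peaks))
```

Even in this idealized setting the peak location varies from one replication to the next, and a thresholded map from any single replication gives no confidence interval for that variation.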

Figure 4.7. An illustration of the variability in the location and shape of activation.

The upshot of all this is that though we can make inferences about certain areas’ activity, we must be cautious about over-interpreting size and location of significant findings and about the completeness of the picture thresholded maps provide.

If these types of inferences sound limited, we agree! Standard brain maps are very limited; we devote much of the next chapter to further unpacking their limitations. Fortunately, emerging alternative methods avoid some of standard brain maps’ problems. These include (a) specific types of multivariate pattern analysis that build predictive models and (b) spatial models that we can use to make inferences about the location of effects.

Forward and reverse inference

Inferences drawn from brain maps have another limit. Typically, researchers either (a) induce a psychological state by manipulating experimental variables or (b) observe a behavior of interest or other outcome. Then researchers assume that the state or behavior is known and make inferences about the statistical reliability of brain activity given (or conditional on) the state or behavior. In Bayesian terms, we infer the probability of brain activity given a psychological state or behavior, or P(Brain | Psy). This is forward inference, which can tell us about how the brain functions under different psychological or behavioral conditions but not much about the psychological state or behavior itself (see Figure 4.8).

Standard brain maps provide information on forward inferences. Though we expressed them above in terms of probability, the same concept applies to effect size measurements: the stronger a brain map’s statistical effects, the more likely we are to observe a significant result.

Figure 4.8. An illustration of forward and reverse inference.

Why can’t standard brain maps teach us much about psychological states? Forward inferences take psychological states as given. They do not tell us how brain measures constrain our theories of which psychological processes are engaged. For that, the inference we want concerns P(Psy | Brain), the probability (or, heuristically, the strength) of a psychological process’s engagement given activity in a particular brain region or pattern. The neuroimaging literature calls this reverse inference. Though related through Bayes’ Rule, forward and reverse inference are not the same thing, qualitatively or quantitatively.

In logic, fallacious reverse inference is called ‘affirming the consequent’. For example, assume this statement is true: ‘If one is a dog, then one loves ice cream’, or P(Ice Cream | Dog) = 1 for short. Then, given that Mary loves ice cream, i.e. P(Ice Cream) = 1, one might erroneously infer that Mary is a dog. The problem is that all dogs love ice cream, but not all ice cream lovers are dogs: P(Ice Cream | Dog) = 1 does not imply that P(Dog | Ice Cream) = 1.

Standard brain maps’ limitations in constraining psychological theory have led many researchers to be critical of neuroimaging, often rightly so. Papers that make fallacious reverse inferences litter the neuroimaging literature; a typical example is inferring that long-term memory processes were engaged (Psy) because the hippocampus was activated (Brain). In fact, some psychologists have argued that neuroimaging has not yet taught us anything about the mind.

Reverse inference is actually possible; it is a major piece of the puzzle in constraining psychological (and behavioral and clinical) theory with brain measures. To understand how, let’s revisit forward and reverse inference from a diagnostic testing perspective. P(Brain | Psy) is the ‘hit rate’ of significant activity given a psychological state; testing theory calls it sensitivity. In a standard test, e.g. a diagnostic test for a disease, Brain is analogous to having a positive diagnostic test, and Psy to having the disease. P(Psy | Brain) is the test’s positive predictive value: how likely one is to have the disease given a positive test. High positive predictive value requires both high sensitivity and high specificity, where high specificity means a low probability of a positive test if one does not have the disease (in brain imaging terms, low P(Brain | ~Psy), where ~ means ‘not’). Positive predictive value also depends on how common the disease, or the psychological state, is to begin with. To use a brain example, before we can infer that hippocampal activity implies memory involvement, we must first show that hippocampal activity is specific to memory and that other processes do not activate it.
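Bayes’ Rule makes these dependencies explicit: P(Psy | Brain) = P(Brain | Psy) P(Psy) / [P(Brain | Psy) P(Psy) + P(Brain | ~Psy) P(~Psy)]. The sketch below plugs in hypothetical hit and false-positive rates for a region such as the hippocampus; the numbers are invented to show how strongly specificity drives the result.

```python
def positive_predictive_value(sensitivity, false_positive_rate, prior):
    """P(Psy | Brain) from Bayes' Rule.

    sensitivity: P(Brain | Psy), the hit rate.
    false_positive_rate: P(Brain | ~Psy), i.e. 1 - specificity.
    prior: P(Psy), how often the psychological state occurs in the first place.
    """
    hits = sensitivity * prior
    false_alarms = false_positive_rate * (1 - prior)
    return hits / (hits + false_alarms)

# Hypothetical numbers: the region activates in 90% of memory tasks,
# and memory is engaged in 20% of the tasks considered.
print(positive_predictive_value(0.9, 0.30, 0.2))  # region also active in 30% of non-memory tasks
print(positive_predictive_value(0.9, 0.05, 0.2))  # region rarely active otherwise
```

With the same 90% sensitivity, the positive predictive value rises from about 0.43 to about 0.82 when the false-positive rate falls from 30% to 5%. In the ice cream example, P(Ice Cream | Dog) = 1 plays the role of perfect sensitivity, which on its own says nothing about P(Dog | Ice Cream).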

Thus, to make reverse inferences about psychological states, we must estimate the relative probabilities of a defined set of psychological hypotheses given the data, typically using Bayes’ Rule. This requires analysts to construct brain maps of multiple (ideally many) psychological conditions and formally assess the brain findings’ positive predictive value.
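Extending the same rule to a defined set of candidate states requires an estimate of P(Brain | Psy) for every state in the set, which is why maps of many conditions are needed. The sketch assumes the states are mutually exclusive and exhaustive, and uses invented likelihoods and equal priors.

```python
def posterior_over_states(likelihoods, priors):
    """Bayes' Rule over a set of mutually exclusive, exhaustive states.

    likelihoods: {state: P(observed brain pattern | state)}
    priors: {state: P(state)}
    Returns {state: P(state | observed brain pattern)}.
    """
    unnormalized = {s: likelihoods[s] * priors[s] for s in likelihoods}
    evidence = sum(unnormalized.values())  # P(observed brain pattern)
    return {s: p / evidence for s, p in unnormalized.items()}

# Hypothetical values estimated from maps of three conditions, with equal priors.
likelihoods = {"memory": 0.8, "pain": 0.6, "attention": 0.5}
priors = {"memory": 1 / 3, "pain": 1 / 3, "attention": 1 / 3}
posterior = posterior_over_states(likelihoods, priors)
print({s: round(p, 3) for s, p in posterior.items()})
```

Although the observed pattern is most likely under ‘memory’, the posterior for memory is only about 0.42, because the pattern is also fairly likely under the other two states.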

In addition to assessing positive predictive value, analysts can optimize maps and models of brain function so that they respond strongly and specifically to particular classes of psychological events, behaviors, or other prompts. This is the goal of an increasing number of studies that use multivariate pattern analysis with machine learning or statistical learning algorithms. This is a promising direction; we devote a great deal of space to these techniques later in the book.

The ability to infer a psychological process’s presence or strength is important in its own right. It opens up various possibilities for testing and constraining psychological theories, or at least their biological bases. In some cases, valid reverse inferences could allow researchers to infer processes that are otherwise problematic or impossible to measure confidently. These states include, among others, being in pain, experiencing an emotion, lying or hiding information, and engaging in cognitive work. Researchers can use reverse inferences to probe the unconscious and to help study mental processes in cognitively impaired, very young, or otherwise unresponsive individuals. Finally, comparing brain markers for different psychological processes could allow us to develop new typologies of mental processes (including emotion, memory, and other processes) which, regardless of whether they match our heuristic psychological categories, may have their own diagnostic value.

In conclusion, standard brain maps provide specific types of inference about brain activity. Though there are a number of fundamental limits to these inferences, new techniques are circumventing many of those limitations and providing a more complete range of inferences about the brain and mind.

In the next chapter, we further explore those limitations, some ways that researchers exploit brain maps to support erroneous conclusions, and how you can become a savvy consumer of neuroimaging results.
