Research – Blank Lab

The overall aim of our research is to understand how the human brain combines expectations and sensory information. Our ability to successfully communicate with other people is an essential skill in everyday life. Therefore, unravelling how the human brain can derive meaning from acoustic speech signals and recognize our communication partner based on seeing a face represents an important scientific endeavour.

Speech recognition depends both on the clarity of the acoustic input and on what we expect to hear. For example, in noisy listening conditions, listeners presented with identical speech input can differ in their perception of what was said. Similarly, in face recognition, brain responses to faces depend on expectations and do not simply reflect the presented facial features.

These findings for speech and face recognition are compatible with the more general view that perception is an active process in which incoming sensory information is interpreted with respect to expectations. The neural mechanisms supporting such integration of sensory signals and expectations, however, remain to be identified. Conflicting theoretical and computational models have been suggested for how, when, and where expectations and new sensory information are combined.

Predictions in speech perception

Opposing serial effects of stimulus and choice in speech perception scale with context variability

Here, we disentangled stimulus and choice history, revealing opposing effects on perception: choice history attracts and stimulus history repels current speech perception. The repulsive effect of stimulus history was only revealed once choice history was accounted for. In addition, stability in the speaker context strengthens the repulsive effect of stimulus history.

In this study, we investigated serial effects on the perception of auditory vowel stimuli across three experimental setups with different degrees of context variability. Aligning with recent findings in visual perception, our results confirm the existence of two distinct processes in serial dependence: a repulsive sensory effect coupled with an attractive decisional effect. Importantly, our study extends these observations to the auditory domain, demonstrating parallel serial effects in audition. Furthermore, we uncover context variability effects, revealing a linear pattern for the repulsive perceptual effect and a quadratic pattern for the attractive decisional effect. These findings support the presence of adaptive sensory mechanisms underlying the repulsive effects, while higher-level mechanisms appear to govern the attractive decisional effect. The study provides valuable insights into the interplay of attractive and repulsive serial effects in auditory perception and contributes to our understanding of the underlying mechanisms.
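The opposing effects described above can be sketched as a logistic choice model in which the previous stimulus and the previous choice enter with opposite signs. This is a minimal illustration, not the model fitted in the paper: the function name, the -1 to +1 coding, and all weight values are assumptions chosen for readability.

```python
import math


def p_choose_category_a(stimulus, prev_stimulus, prev_choice,
                        w_stim=4.0, w_prev_stim=-1.0, w_prev_choice=0.8):
    """Probability of reporting vowel category A on the current trial.

    All inputs are coded from -1 to +1 (+1 = clearly category A).
    The weights are hypothetical: a negative weight on the previous
    stimulus gives a repulsive effect, a positive weight on the previous
    choice gives an attractive effect.
    """
    drive = (w_stim * stimulus
             + w_prev_stim * prev_stimulus
             + w_prev_choice * prev_choice)
    return 1.0 / (1.0 + math.exp(-drive))


ambiguous = 0.0  # current stimulus halfway between the two vowels

# Repulsion: same previous choice, different previous stimulus.
after_a_stimulus = p_choose_category_a(ambiguous, prev_stimulus=+1, prev_choice=+1)
after_b_stimulus = p_choose_category_a(ambiguous, prev_stimulus=-1, prev_choice=+1)

# Attraction: same previous stimulus, different previous choice.
after_a_choice = p_choose_category_a(ambiguous, prev_stimulus=0.0, prev_choice=+1)
after_b_choice = p_choose_category_a(ambiguous, prev_stimulus=0.0, prev_choice=-1)

print(f"P(A) after A-like stimulus: {after_a_stimulus:.2f}")
print(f"P(A) after B-like stimulus: {after_b_stimulus:.2f}")
print(f"P(A) after choosing A:      {after_a_choice:.2f}")
print(f"P(A) after choosing B:      {after_b_choice:.2f}")
```

Because the previous stimulus and the previous choice usually point in the same direction, their opposite-signed contributions partly cancel in this sketch. That is one way to see why a repulsive stimulus effect can stay hidden until choice history is modelled separately.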

Ufer, C., & Blank, H. (2024). Opposing serial effects of stimulus and choice in speech perception scale with context variability. iScience.
https://doi.org/10.1016/j.isci.2024.110611

The pupil dilation response as an indicator of visual cue uncertainty and auditory outcome surprise

Expectations can be induced by cues that indicate the probability of following sensory events. The information provided by cues may differ and hence lead to different levels of uncertainty about which event will follow. In this experiment, we employed pupillometry to investigate whether the pupil dilation response to visual cues varies depending on the level of cue-associated uncertainty about a following auditory outcome.

Also, we tested whether the pupil dilation response reflects the amount of surprise about the subsequently presented auditory stimulus. In each trial, participants were presented with a visual cue (face image) which was followed by an auditory outcome (spoken vowel). After the face cue, participants had to indicate by keypress which of three auditory vowels they expected to hear next. We manipulated the cue-associated uncertainty by varying the probabilistic cue-outcome contingencies: One face was most likely followed by one specific vowel (low cue uncertainty), another face was equally likely followed by either of two vowels (intermediate cue uncertainty) and the third face was followed by all three vowels (high cue uncertainty). Our results suggest that pupil dilation in response to task-relevant cues depends on the associated uncertainty, but only for large differences in the cue-associated uncertainty. Additionally, in response to the auditory outcomes, the pupil dilation scaled negatively with the cue-dependent probabilities, likely signalling the amount of surprise.
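One way to make the two quantities in this design concrete is to treat cue uncertainty as the Shannon entropy of a cue's outcome distribution, and surprise as the negative log probability of the vowel that was actually heard. The sketch below is illustrative only: the contingency values and function names are assumptions, since the exact probabilities and the measures used in the paper are not stated here.

```python
import math


def entropy_bits(probs):
    """Shannon entropy (in bits) of a discrete outcome distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def surprisal_bits(p):
    """Surprisal (in bits) of an outcome that had probability p given the cue."""
    return -math.log2(p)


# Hypothetical cue-outcome contingencies for the three face cues
# (probabilities of vowel 1, vowel 2, vowel 3). Assumed values.
cues = {
    "low": [0.8, 0.1, 0.1],           # one vowel is most likely
    "intermediate": [0.45, 0.45, 0.1],  # two vowels are equally likely
    "high": [1 / 3, 1 / 3, 1 / 3],      # all three vowels are equally likely
}

for name, probs in cues.items():
    print(f"{name:>12} cue uncertainty: {entropy_bits(probs):.2f} bits")

# Surprise about an outcome depends on its cue-dependent probability.
for p in (0.8, 0.45, 0.1):
    print(f"outcome with p = {p:.2f}: surprisal {surprisal_bits(p):.2f} bits")
```

Under these assumed values, entropy rises from the low to the high uncertainty cue, and surprisal rises as the outcome probability falls, which matches the direction of the negative scaling with probability reported above.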

Becker, J., Viertler, M., Korn, C., & Blank, H. (2024). The pupil dilation response as an indicator of visual cue uncertainty and auditory outcome surprise. European Journal of Neuroscience, 1-16.
doi.org/10.1111/ejn.16306

Pupil diameter as an indicator of sound pair familiarity after statistically structured auditory sequence

Inspired by recent findings in the visual domain, we investigated whether the stimulus-evoked pupil dilation reflects temporal statistical regularities in sequences of auditory stimuli. Our findings suggest that pupil diameter may serve as an indicator of sound pair familiarity but does not invariably respond to task-irrelevant transition probabilities of auditory sequences.

We conducted two preregistered pupillometry experiments (experiment 1, n = 30, 21 females; experiment 2, n = 31, 22 females). In both experiments, human participants listened to sequences of spoken vowels in two conditions. In the first condition, the stimuli were presented in a random order and, in the second condition, the same stimuli were presented in a sequence structured in pairs. The second experiment replicated the first experiment with a modified timing and number of stimuli presented and without participants being informed about any sequence structure. The sound-evoked pupil dilation during a subsequent familiarity task indicated that participants learned the auditory vowel pairs of the structured condition. However, pupil diameter during the structured sequence did not differ according to the statistical regularity of the pair structure. This contrasts with similar visual studies, emphasizing the susceptibility of pupil effects during statistically structured sequences to experimental design settings in the auditory domain.

Becker, J., Korn, C.W. & Blank, H. (2024) Pupil diameter as an indicator of sound pair familiarity after statistically structured auditory sequence. Scientific Reports, 14, 8739.
doi.org/10.1038/s41598-024-59302-1

Multivariate analysis to unravel predictive processes in speech

Speech perception is heavily influenced by our expectations about what will be said. In this review, we discuss the potential of multivariate analysis as a tool to understand the neural mechanisms underlying predictive processes in speech perception.

First, we discuss the advantages of multivariate approaches and what they have added to the understanding of speech processing, from the acoustic-phonetic form of speech, through syllable identity and syntax, to its semantic content. Second, we suggest that using multivariate techniques to measure informational content across the hierarchically organised speech-sensitive brain areas might enable us to specify the mechanisms by which prior knowledge and sensory speech signals are combined. Specifically, this approach might allow us to decode how different priors, e.g. about a speaker’s voice or about the topic of the current conversation, are represented at different processing stages and how incoming speech is represented differently as a result.

Ufer, C., & Blank, H. (2023). Multivariate analysis of brain activity patterns as a tool to understand predictive processes in speech perception. Language, Cognition and Neuroscience.
doi.org/10.1080/23273798.2023.2166679

Speech perception depends on speaker priors

When different speakers articulate identical words, the physical properties of the produced sounds can vary substantially. Fundamental frequency (F₀) and formant frequencies (F_F), the two main parameters that discriminate between voices, also influence vowel perception. In this study, we investigated how we use speaker prior information when decoding the speech stream.

While, in everyday circumstances, we have no difficulties decoding a speaker’s intended word from the audio stream, listeners benefit from a general familiarity with target speakers when comprehending speech. We used a twofold approach to investigate the influence of speaker expectations on vowel perception. On the one hand, we showed how shifts in F_F and F₀ influenced vowel perception depending on context, i.e., when the F_F and F₀ voice characteristics of the speaker could or could not be anticipated. On the other hand, we observed that different expected speaker F_F and F₀ characteristics influence perception of identical, unshifted vowels in a contrastive manner. In conclusion, our findings support the view that knowledge about a speaker’s voice characteristics influences vowel perception.

Krumbiegel J, Ufer C, Blank H (2022) Influence of voice properties on vowel perception depends on speaker context. The Journal of the Acoustical Society of America 152:820–834. doi.org/10.1121/10.0013363

Prediction errors during speech perception

Perception inevitably depends on combining sensory input with prior expectations, particularly for identification of degraded input. However, the underlying neural mechanism by which expectations influence sensory processing is unclear. Predictive Coding suggests that the brain passes forward the unexpected part of the sensory input while expected properties are suppressed (Prediction Error). However, evidence to rule out the opposite and perhaps more intuitive mechanism, in which the expected part of the sensory input is enhanced (Sharpening), has been lacking.

We investigated the neural mechanisms by which sensory clarity and prior knowledge influence the perception of degraded speech. A univariate measure of brain activity obtained from functional magnetic resonance imaging (fMRI) was in line with both neural mechanisms (Prediction Error and Sharpening). However, combining multivariate fMRI measures with computational simulations allowed us to determine the underlying mechanism. Our key finding was an interaction between sensory input and prior expectations: For unexpected speech, increasing speech clarity increased the amount of information represented in sensory brain areas. In contrast, for speech that matched prior expectations, increasing speech clarity reduced the amount of this information. Our observations were uniquely simulated by a model of speech perception that included Prediction Errors.
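The interaction described above can be illustrated with a deliberately simple toy model. This is not the computational simulation from the paper: the two functions, the 0-to-1 clarity scale, and the gain value are hypothetical choices made purely for illustration.

```python
def prediction_error_signal(clarity, prior_matches):
    """Toy word-specific signal under a prediction-error scheme.

    `clarity` (0..1) stands for the sensory evidence for the spoken word.
    A matching prior predicts that word with strength 1; otherwise the
    prediction for this word is 0. The unit passes on the absolute
    mismatch between input and prediction.
    """
    prediction = 1.0 if prior_matches else 0.0
    return abs(clarity - prediction)


def sharpening_signal(clarity, prior_matches, gain=0.5):
    """Toy word-specific signal under a sharpening scheme.

    The expected part of the input is multiplicatively enhanced, so the
    signal grows with clarity whether or not the prior matches.
    """
    prediction = 1.0 if prior_matches else 0.0
    return clarity * (1.0 + gain * prediction)


print("clarity  prior      prediction_error  sharpening")
for prior_matches in (False, True):
    label = "matching" if prior_matches else "no match"
    for clarity in (0.25, 0.50, 0.75):
        pe = prediction_error_signal(clarity, prior_matches)
        sh = sharpening_signal(clarity, prior_matches)
        print(f"{clarity:7.2f}  {label:9}  {pe:16.2f}  {sh:10.2f}")
```

In this sketch, only the prediction-error scheme reverses the effect of clarity when the prior matches the input, while the sharpening scheme increases with clarity in both cases. That qualitative difference is what makes the reported interaction diagnostic between the two accounts.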

Blank, H. & Davis, M. (2016). Prediction errors but not sharpened signals simulate multivoxel fMRI patterns during speech perception. PLOS Biology, 14(11). doi.org/10.1371/journal.pbio.1002577

Slips of the Ear: When Knowledge Deceives Perception

The ability to draw on past experience is important to keep up with a conversation, especially in noisy environments where speech sounds are hard to hear. However, these prior expectations can sometimes mislead listeners, convincing them that they heard something that a speaker did not actually say.

To investigate the neural underpinnings of speech misperception, we presented participants with pairs of written and degraded spoken words that were either identical, clearly different, or similar-sounding. Reading and hearing similar-sounding words (like kick followed by pick) led to frequent misperception.

Using fMRI, we found that misperception was associated with reduced activity in the left superior temporal sulcus, a brain region critical for processing speech sounds. Furthermore, when perception of speech was more successful, this brain region represented the difference between prior expectations and heard speech (like the initial k/p in kick-pick).

Blank, H., Spangenberg, M., & Davis, M. (2018). Neural Prediction Errors Distinguish Perception and Misperception of Speech. The Journal of Neuroscience, 38(27), 6076-6089.
https://doi.org/10.1523/JNEUROSCI.3258-17.2018

Predictions in face perception

Protocol to study how expectations guide predictive eye movements and information sampling in humans

Here, we provide a detailed protocol about how to explore information sampling with eye tracking.

Predictive coding suggests an active sampling of expected information. We studied visual information sampling during face perception. Participants performed predictive saccades and early fixations of expected features. Expectations guide eye movements toward locations of interest.

Garlichs, A., Lustig, M., Gamer, M., & Blank, H. (2025). Protocol to study how expectations guide predictive eye movements and information sampling in humans. STAR Protocols, 6:103737.
https://doi.org/10.1016/j.xpro.2025.103737

Expectations Guide Predictive Eye Movements and Information Sampling During Face Recognition

We conducted two eye-tracking experiments to investigate how the way we look at faces is modulated by expectations.

Our ability to recognize faces is significantly influenced by context information. Predictive processing theories suggest that our brain uses context to guide where we look for sensory evidence. However, it’s unclear how these expectations affect our visual sampling during face perception. To explore this, we conducted two eye-tracking experiments with 34 participants each, using face morphs with expected and unexpected features. We found that participants made predictive saccades towards expected features and fixated on them more often and longer than on unexpected ones. In face morphs, expected features attracted early eye movements, followed by unexpected ones, showing that both top-down and bottom-up information drive face sampling. These findings highlight that expectations shape face processing by guiding early eye movements to anticipated locations, supporting predictive processing theories.

Garlichs, A., Lustig, M., Gamer, M., & Blank, H. (2024). Expectations Guide Predictive Eye Movements and Information Sampling During Face Recognition. iScience, 27:110920.
https://doi.org/10.1016/j.isci.2024.110920

Prediction error processing and sharpening of expected information across the face-processing hierarchy

The perception and neural processing of sensory information are strongly influenced by prior expectations. The integration of prior and sensory information can manifest through distinct underlying mechanisms: focusing on unexpected input, denoted as prediction error (PE) processing, or amplifying anticipated information via sharpened representation. In this study, we employed computational modeling using deep neural networks combined with representational similarity analyses of fMRI data to investigate these two processes during face perception.

Participants were cued to see face images, some generated by morphing two faces, leading to ambiguity in face identity. We show that expected faces were identified faster and perception of ambiguous faces was shifted towards priors. Multivariate analyses uncovered evidence for PE processing across and beyond the face-processing hierarchy from the occipital face area (OFA), via the fusiform face area, to the anterior temporal lobe, and suggest sharpened representations in the OFA. Our findings support the proposition that the brain represents faces grounded in prior expectations.

Garlichs, A., Blank, H. (2024). Prediction error processing and sharpening of expected information across the face-processing hierarchy. Nature Communications, 15, 3407. doi.org/10.1038/s41467-024-47749-9

Representations of face priors

In everyday life, we face our environment with several prior expectations about what we are going to encounter, for example, whom we are most likely to see at a certain location. These expectations have to be weighted according to their probability, e.g., a student regularly entering our office will be expected with a higher probability than a shy colleague we rarely meet. In this study, we show that the human brain weights face priors according to their certainty in high-level face-sensitive regions.

We used functional magnetic resonance imaging (fMRI) in combination with multivariate methods to test whether the strength of face expectations can be detected alongside expected face identity in face-sensitive regions (i.e., OFA, FFA, and aTL) of the human brain. Participants used scene cues to predict face identities with different probabilities. We found evidence that representations of expected face identities were weighted according to their probability in the high-level face-sensitive aTL.

Blank, H., Alink, A., & Büchel, C. (2023). Multivariate functional neuroimaging analyses reveal that strength-dependent face expectations are represented in higher-level face-identity areas. Communications Biology, 6, 135. doi.org/10.1038/s42003-023-04508-8

Direct structural connections between face and voice areas

By combining fMRI with diffusion-weighted imaging, we showed that the brain is equipped with direct structural connections between face- and voice-recognition areas. These connections can activate learned associations of faces and voices even in unimodal conditions, improving person-identity recognition.

According to hierarchical processing models of person-identity recognition, information from faces and voices is only integrated at later stages after person-identity has been achieved. However, functional neuroimaging studies showed that the fusiform face area was activated by familiar voices during auditory-only speaker recognition. To test for direct structural connections between face- and voice-recognition areas, we localized voice-sensitive areas in anterior, middle, and posterior STS and face-sensitive areas in the fusiform gyrus. Probabilistic tractography revealed evidence for direct structural connections between these regions. These connections seemed to be functionally relevant because they were particularly strong between those areas that were engaged during processing of voice identity in anterior/middle STS in contrast to areas that process less identity-specific, acoustic features in posterior STS.

What kind of information is exchanged between these specialized areas during cross-modal recognition of other individuals? To address this question, we used functional magnetic resonance imaging and a voice-face priming design. In this design, familiar voices were followed by morphed faces that matched or mismatched with respect to identity or physical properties. The results showed that responses in face-sensitive regions were modulated when face identity or physical properties did not match the preceding voice. The strength of this mismatch signal depended on the level of certainty the participant had about the voice identity. This suggests that the voice provided both identity and physical property information to face areas.

Blank, H., Anwander, A., & von Kriegstein, K. (2011). Direct structural connections between voice- and face-recognition areas. The Journal of Neuroscience, 31(36), 12906-12915. https://doi.org/10.1523/JNEUROSCI.2091-11.2011

Blank, H., Kiebel, S. J. & von Kriegstein, K. (2015). How the human brain exchanges information across sensory modalities to recognize other people. Human Brain Mapping, 36(1), 324-39. http://dx.doi.org/10.1002/hbm.22631

Lipreading: How we “hear” with our eyes

In a noisy environment it is often very helpful to see the mouth of the person you are speaking to. When our brain is able to combine information from different sensory sources, for example during lip-reading, speech comprehension is improved.

We investigated this phenomenon in more detail to uncover how visual and auditory brain areas work together during lip-reading. In the experiment, brain activity was measured using functional magnetic resonance imaging while participants heard short sentences. The participants then watched a short silent video of a person speaking. Using a button press, they indicated whether the sentence they had heard matched the mouth movements in the video. If the sentence did not match the video, a part of the brain network that combines visual and auditory information showed greater activity and there were increased connections between the auditory speech region and the STS. How strong the activation was depended on the lip-reading skill of participants: the stronger the activation, the more accurate the responses were. This effect seemed to be specific to the content of speech; it did not occur when the participants had to decide if the identity of the voice and face matched.

Blank, H. & von Kriegstein, K. (2013). Mechanisms of enhancing visual-speech recognition by prior auditory information. Neuroimage, 65, 109-118. http://dx.doi.org/10.1016/j.neuroimage.2012.09.047