Cortical networks for recognition of speech with simultaneous talkers.
Document Type
Article
Abstract
The relative contributions of superior temporal vs. inferior frontal and parietal networks to recognition of speech in a background of competing speech remain unclear, although the contributions themselves are well established. Here, we use fMRI with spectrotemporal modulation transfer function (ST-MTF) modeling to examine the speech information represented in temporal vs. frontoparietal networks for two speech recognition tasks with and without a competing talker. Specifically, 31 listeners completed two versions of a three-alternative forced-choice competing speech task: "Unison" and "Competing", in which a female (target) and a male (competing) talker uttered identical or different phrases, respectively. Spectrotemporal modulation filtering (i.e., acoustic distortion) was applied to the two-talker mixtures and ST-MTF models were generated to predict brain activation from differences in spectrotemporal-modulation distortion on each trial. Three cortical networks were identified based on differential patterns of ST-MTF predictions and the resultant ST-MTF weights across conditions (Unison, Competing): a bilateral superior temporal (S-T) network, a frontoparietal (F-P) network, and a network distributed across cortical midline regions and the angular gyrus (M-AG). The S-T network and the M-AG network responded primarily to spectrotemporal cues associated with speech intelligibility, regardless of condition, but the S-T network responded to a greater range of temporal modulations suggesting a more acoustically driven response. The F-P network responded to the absence of intelligibility-related cues in both conditions, but also to the absence (presence) of target-talker (competing-talker) vocal pitch in the Competing condition, suggesting a generalized response to signal degradation. Task performance was best predicted by activation in the S-T and F-P networks, but in opposite directions (S-T: more activation = better performance; F-P: vice versa). Moreover, S-T network predictions were entirely ST-MTF mediated while F-P network predictions were ST-MTF mediated only in the Unison condition, suggesting an influence from non-acoustic sources (e.g., informational masking) in the Competing condition. Activation in the M-AG network was weakly positively correlated with performance and this relation was entirely superseded by those in the S-T and F-P networks. Regarding contributions to speech recognition, we conclude: (a) superior temporal regions play a bottom-up, perceptual role that is not qualitatively dependent on the presence of competing speech; (b) frontoparietal regions play a top-down role that is modulated by competing speech and scales with listening effort; and (c) performance ultimately relies on dynamic interactions between these networks, with ancillary contributions from networks not involved in speech processing per se (e.g., the M-AG network).
Medical Subject Headings
Male; Humans; Female; Speech; Speech Perception; Cognition; Cues; Acoustics; Speech Intelligibility; Perceptual Masking
Publication Date
9-15-2023
Publication Title
Hearing Research
ISSN
1878-5891
Volume
437
First Page
108856
Last Page
108856
PubMed ID
37531847
Digital Object Identifier (DOI)
10.1016/j.heares.2023.108856
Recommended Citation
Herrera, Christian; Whittle, Nicole; Leek, Marjorie R; Brodbeck, Christian; Lee, Grace; Barcenas, Caleb; Barnes, Samuel; Holshouser, Barbara; Yi, Alex; and Venezia, Jonathan H, "Cortical networks for recognition of speech with simultaneous talkers." (2023). Clinical Neuropsychology. 299.
https://scholar.barrowneuro.org/neuropsychology/299