Journal articles on the topic 'Temporal Representation in speech'

To see the other types of publications on this topic, follow the link: Temporal Representation in speech.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Temporal Representation in speech.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Mazoyer, B. M., N. Tzourio, V. Frak, A. Syrota, N. Murayama, O. Levrier, G. Salamon, S. Dehaene, L. Cohen, and J. Mehler. "The Cortical Representation of Speech." Journal of Cognitive Neuroscience 5, no. 4 (October 1993): 467–79. http://dx.doi.org/10.1162/jocn.1993.5.4.467.

Abstract:
In this study, we compare regional cerebral blood flow (rCBF) while French monolingual subjects listen to continuous speech in an unknown language, to lists of French words, or to meaningful and distorted stories in French. Our results show that, in addition to regions devoted to single-word comprehension, processing of meaningful stories activates the left middle temporal gyrus, the left and right temporal poles, and a superior prefrontal area in the left frontal lobe. Among these regions, only the temporal poles remain activated whenever sentences with acceptable syntax and prosody are presented.
2

Bhaya-Grossman, Ilina, and Edward F. Chang. "Speech Computations of the Human Superior Temporal Gyrus." Annual Review of Psychology 73, no. 1 (January 4, 2022): 79–102. http://dx.doi.org/10.1146/annurev-psych-022321-035256.

Abstract:
Human speech perception results from neural computations that transform external acoustic speech signals into internal representations of words. The superior temporal gyrus (STG) contains the nonprimary auditory cortex and is a critical locus for phonological processing. Here, we describe how speech sound representation in the STG relies on fundamentally nonlinear and dynamical processes, such as categorization, normalization, contextual restoration, and the extraction of temporal structure. A spatial mosaic of local cortical sites on the STG exhibits complex auditory encoding for distinct acoustic-phonetic and prosodic features. We propose that as a population ensemble, these distributed patterns of neural activity give rise to abstract, higher-order phonemic and syllabic representations that support speech perception. This review presents a multi-scale, recurrent model of phonological processing in the STG, highlighting the critical interface between auditory and language systems.
3

Young, Eric D. "Neural representation of spectral and temporal information in speech." Philosophical Transactions of the Royal Society B: Biological Sciences 363, no. 1493 (September 7, 2007): 923–45. http://dx.doi.org/10.1098/rstb.2007.2151.

Abstract:
Speech is the most interesting and one of the most complex sounds dealt with by the auditory system. The neural representation of speech needs to capture those features of the signal on which the brain depends in language communication. Here we describe the representation of speech in the auditory nerve and in a few sites in the central nervous system from the perspective of the neural coding of important aspects of the signal. The representation is tonotopic, meaning that the speech signal is decomposed by frequency and different frequency components are represented in different populations of neurons. Essential to the representation are the properties of frequency tuning and nonlinear suppression. Tuning creates the decomposition of the signal by frequency, and nonlinear suppression is essential for maintaining the representation across sound levels. The representation changes in central auditory neurons by becoming more robust against changes in stimulus intensity and more transient. However, it is probable that the form of the representation at the auditory cortex is fundamentally different from that at lower levels, in that stimulus features other than the distribution of energy across frequency are analysed.
4

Mikell, Charles B., and Guy M. McKhann. "Categorical Speech Representation in Human Superior Temporal Gyrus." Neurosurgery 67, no. 6 (December 2010): N19–N20. http://dx.doi.org/10.1227/01.neu.0000390615.58208.a8.

5

Chang, Edward F., Jochem W. Rieger, Keith Johnson, Mitchel S. Berger, Nicholas M. Barbaro, and Robert T. Knight. "Categorical speech representation in human superior temporal gyrus." Nature Neuroscience 13, no. 11 (October 3, 2010): 1428–32. http://dx.doi.org/10.1038/nn.2641.

6

Hirahara, Tatsuya. "Internal speech spectrum representation by spatio-temporal masking pattern." Journal of the Acoustical Society of Japan (E) 12, no. 2 (1991): 57–68. http://dx.doi.org/10.1250/ast.12.57.

7

Moore, Brian C. J. "Basic auditory processes involved in the analysis of speech sounds." Philosophical Transactions of the Royal Society B: Biological Sciences 363, no. 1493 (September 7, 2007): 947–63. http://dx.doi.org/10.1098/rstb.2007.2152.

Abstract:
This paper reviews the basic aspects of auditory processing that play a role in the perception of speech. The frequency selectivity of the auditory system, as measured using masking experiments, is described and used to derive the internal representation of the spectrum (the excitation pattern) of speech sounds. The perception of timbre and distinctions in quality between vowels are related to both static and dynamic aspects of the spectra of sounds. The perception of pitch and its role in speech perception are described. Measures of the temporal resolution of the auditory system are described and a model of temporal resolution based on a sliding temporal integrator is outlined. The combined effects of frequency and temporal resolution can be modelled by calculation of the spectro-temporal excitation pattern, which gives good insight into the internal representation of speech sounds. For speech presented in quiet, the resolution of the auditory system in frequency and time usually markedly exceeds the resolution necessary for the identification or discrimination of speech sounds, which partly accounts for the robust nature of speech perception. However, for people with impaired hearing, speech perception is often much less robust.
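Moore's excitation-pattern idea lends itself to a simple numerical illustration: sum the power of a sound within ERB-wide auditory bands across a range of centre frequencies. The sketch below is only a rough approximation under assumed parameters (40 log-spaced channels, the Glasberg and Moore ERB formula, a synthetic tone-in-noise signal), not the model described in the paper.

```python
import numpy as np

def erb(f_hz):
    # Equivalent rectangular bandwidth of the auditory filter (Glasberg & Moore, 1990).
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)

def excitation_pattern(signal, fs, n_channels=40, f_lo=100.0, f_hi=8000.0):
    # Power spectrum of the whole signal (a crude stand-in for auditory filtering).
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    centres = np.geomspace(f_lo, f_hi, n_channels)    # log-spaced centre frequencies
    pattern = np.empty(n_channels)
    for i, fc in enumerate(centres):
        bw = erb(fc)
        in_band = (freqs > fc - bw / 2) & (freqs < fc + bw / 2)
        pattern[i] = spectrum[in_band].sum()           # energy falling within one ERB
    return centres, 10.0 * np.log10(pattern + 1e-12)   # excitation level in dB

# Example: a 500 Hz tone in low-level white noise.
fs = 16000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 500 * t) + 0.05 * np.random.randn(fs)
centres, level_db = excitation_pattern(x, fs)
```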
8

Koromilas, Panagiotis, and Theodoros Giannakopoulos. "Deep Multimodal Emotion Recognition on Human Speech: A Review." Applied Sciences 11, no. 17 (August 28, 2021): 7962. http://dx.doi.org/10.3390/app11177962.

Abstract:
This work reviews the state of the art in multimodal speech emotion recognition methodologies, focusing on audio, text and visual information. We provide a new, descriptive categorization of methods, based on the way they handle the inter-modality and intra-modality dynamics in the temporal dimension: (i) non-temporal architectures (NTA), which do not significantly model the temporal dimension in both unimodal and multimodal interaction; (ii) pseudo-temporal architectures (PTA), which also assume an oversimplification of the temporal dimension, although in one of the unimodal or multimodal interactions; and (iii) temporal architectures (TA), which try to capture both unimodal and cross-modal temporal dependencies. In addition, we review the basic feature representation methods for each modality, and we present aggregated evaluation results on the reported methodologies. Finally, we conclude this work with an in-depth analysis of the future challenges related to validation procedures, representation learning and method robustness.
9

Poeppel, David, William J. Idsardi, and Virginie van Wassenhove. "Speech perception at the interface of neurobiology and linguistics." Philosophical Transactions of the Royal Society B: Biological Sciences 363, no. 1493 (September 21, 2007): 1071–86. http://dx.doi.org/10.1098/rstb.2007.2160.

Abstract:
Speech perception consists of a set of computations that take continuously varying acoustic waveforms as input and generate discrete representations that make contact with the lexical representations stored in long-term memory as output. Because the perceptual objects that are recognized by the speech perception enter into subsequent linguistic computation, the format that is used for lexical representation and processing fundamentally constrains the speech perceptual processes. Consequently, theories of speech perception must, at some level, be tightly linked to theories of lexical representation. Minimally, speech perception must yield representations that smoothly and rapidly interface with stored lexical items. Adopting the perspective of Marr, we argue and provide neurobiological and psychophysical evidence for the following research programme. First, at the implementational level, speech perception is a multi-time resolution process, with perceptual analyses occurring concurrently on at least two time scales (approx. 20–80 ms, approx. 150–300 ms), commensurate with (sub)segmental and syllabic analyses, respectively. Second, at the algorithmic level, we suggest that perception proceeds on the basis of internal forward models, or uses an ‘analysis-by-synthesis’ approach. Third, at the computational level (in the sense of Marr), the theory of lexical representation that we adopt is principally informed by phonological research and assumes that words are represented in the mental lexicon in terms of sequences of discrete segments composed of distinctive features. One important goal of the research programme is to develop linking hypotheses between putative neurobiological primitives (e.g. temporal primitives) and those primitives derived from linguistic inquiry, to arrive ultimately at a biologically sensible and theoretically satisfying model of representation and computation in speech.
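To make the multi-time-resolution claim concrete, the toy sketch below analyses the same waveform with two assumed window lengths (about 25 ms and 200 ms), standing in for concurrent (sub)segmental-scale and syllabic-scale analyses. It is an illustration of the idea only, not the authors' neural model.

```python
import numpy as np
from scipy.signal import stft

fs = 16000
t = np.arange(2 * fs) / fs
# Noise with a slow (roughly syllable-rate) amplitude modulation as a stand-in signal.
speech_like = np.random.randn(t.size) * (1.0 + np.sin(2 * np.pi * 4 * t))

def analyse(x, fs, win_s):
    nper = int(win_s * fs)
    freqs, times, Z = stft(x, fs=fs, nperseg=nper, noverlap=nper // 2)
    return freqs, times, np.abs(Z)

# Concurrent analyses on two assumed timescales (~25 ms and ~200 ms windows).
f_fast, t_fast, mag_fast = analyse(speech_like, fs, 0.025)
f_slow, t_slow, mag_slow = analyse(speech_like, fs, 0.200)
```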
10

Martínez, C., J. Goddard, D. Milone, and H. Rufiner. "Bioinspired sparse spectro-temporal representation of speech for robust classification." Computer Speech & Language 26, no. 5 (October 2012): 336–48. http://dx.doi.org/10.1016/j.csl.2012.02.002.

11

Chandrashekar, H. M., Veena Karjigi, and N. Sreedevi. "Spectro-Temporal Representation of Speech for Intelligibility Assessment of Dysarthria." IEEE Journal of Selected Topics in Signal Processing 14, no. 2 (February 2020): 390–99. http://dx.doi.org/10.1109/jstsp.2019.2949912.

12

Verkholyak, Oxana, Heysem Kaya, and Alexey Karpov. "Modeling Short-Term and Long-Term Dependencies of the Speech Signal for Paralinguistic Emotion Classification." SPIIRAS Proceedings 18, no. 1 (February 11, 2019): 30–56. http://dx.doi.org/10.15622/sp.18.1.30-56.

Abstract:
Recently, Speech Emotion Recognition (SER) has become an important research topic of affective computing. It is a difficult problem, where some of the greatest challenges lie in the feature selection and representation tasks. A good feature representation should be able to reflect global trends as well as temporal structure of the signal, since emotions naturally evolve in time; it has become possible with the advent of Recurrent Neural Networks (RNN), which are actively used today for various sequence modeling tasks. This paper proposes a hybrid approach to feature representation, which combines traditionally engineered statistical features with Long Short-Term Memory (LSTM) sequence representation in order to take advantage of both short-term and long-term acoustic characteristics of the signal, therefore capturing not only the general trends but also temporal structure of the signal. The evaluation of the proposed method is done on three publicly available acted emotional speech corpora in three different languages, namely RUSLANA (Russian speech), BUEMODB (Turkish speech) and EMODB (German speech). Compared to the traditional approach, the results of our experiments show an absolute improvement of 2.3% and 2.8% for two out of three databases, and a comparative performance on the third. Therefore, provided enough training data, the proposed method proves effective in modelling emotional content of speech utterances.
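A minimal sketch of the hybrid idea follows, assuming generic inputs (frame-level features such as MFCCs plus utterance-level statistical functionals) and illustrative layer sizes; it is not the authors' exact architecture.

```python
import torch
import torch.nn as nn

class HybridEmotionClassifier(nn.Module):
    def __init__(self, n_frame_feats=40, n_stat_feats=80, hidden=64, n_classes=4):
        super().__init__()
        self.lstm = nn.LSTM(n_frame_feats, hidden, batch_first=True)
        self.head = nn.Linear(hidden + n_stat_feats, n_classes)

    def forward(self, frames, stats):
        # frames: (batch, time, n_frame_feats); stats: (batch, n_stat_feats)
        _, (h_n, _) = self.lstm(frames)              # long-term temporal summary
        fused = torch.cat([h_n[-1], stats], dim=1)   # combine with global statistics
        return self.head(fused)

frames = torch.randn(8, 300, 40)   # e.g., 300 frames of 40 spectral features
stats = torch.randn(8, 80)         # e.g., means/std-devs of low-level descriptors
logits = HybridEmotionClassifier()(frames, stats)
```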
13

Millman, Rebecca E., Sam R. Johnson, and Garreth Prendergast. "The Role of Phase-locking to the Temporal Envelope of Speech in Auditory Perception and Speech Intelligibility." Journal of Cognitive Neuroscience 27, no. 3 (March 2015): 533–45. http://dx.doi.org/10.1162/jocn_a_00719.

Abstract:
The temporal envelope of speech is important for speech intelligibility. Entrainment of cortical oscillations to the speech temporal envelope is a putative mechanism underlying speech intelligibility. Here we used magnetoencephalography (MEG) to test the hypothesis that phase-locking to the speech temporal envelope is enhanced for intelligible compared with unintelligible speech sentences. Perceptual “pop-out” was used to change the percept of physically identical tone-vocoded speech sentences from unintelligible to intelligible. The use of pop-out dissociates changes in phase-locking to the speech temporal envelope arising from acoustical differences between un/intelligible speech from changes in speech intelligibility itself. Novel and bespoke whole-head beamforming analyses, based on significant cross-correlation between the temporal envelopes of the speech stimuli and phase-locked neural activity, were used to localize neural sources that track the speech temporal envelope of both intelligible and unintelligible speech. Location-of-interest analyses were carried out in a priori defined locations to measure the representation of the speech temporal envelope for both un/intelligible speech in both the time domain (cross-correlation) and frequency domain (coherence). Whole-brain beamforming analyses identified neural sources phase-locked to the temporal envelopes of both unintelligible and intelligible speech sentences. Crucially there was no difference in phase-locking to the temporal envelope of speech in the pop-out condition in either the whole-brain or location-of-interest analyses, demonstrating that phase-locking to the speech temporal envelope is not enhanced by linguistic information.
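The core measurement, cross-correlating the speech temporal envelope with a neural time series, can be sketched as follows. The Hilbert-envelope extraction, sampling rates, and simulated "neural" signal are assumptions for illustration, not the authors' beamforming pipeline.

```python
import numpy as np
from scipy.signal import hilbert, correlate, resample

fs_audio, fs_neural = 16000, 200
speech = np.random.randn(2 * fs_audio)                 # stand-in for a sentence
envelope = np.abs(hilbert(speech))                     # speech temporal envelope
envelope = resample(envelope, 2 * fs_neural)           # downsample to the neural rate

# Fake "neural" signal: a delayed copy of the envelope plus noise.
neural = np.roll(envelope, 20) + 0.5 * np.random.randn(envelope.size)

env_z = (envelope - envelope.mean()) / envelope.std()
neu_z = (neural - neural.mean()) / neural.std()
xcorr = correlate(neu_z, env_z, mode="full") / env_z.size
lags = np.arange(-env_z.size + 1, env_z.size)
best_lag_ms = 1000.0 * lags[np.argmax(xcorr)] / fs_neural   # latency of peak tracking
```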
14

Ghasemzadeh, Hamzeh, Philip C. Doyle, and Jeff Searl. "Image representation of the acoustic signal: An effective tool for modeling spectral and temporal dynamics of connected speech." Journal of the Acoustical Society of America 152, no. 1 (July 2022): 580–90. http://dx.doi.org/10.1121/10.0012734.

Abstract:
Recent studies have advocated for the use of connected speech in clinical voice and speech assessment. This suggestion is based on the presence of clinically relevant information within the onset, offset, and variation in connected speech. Existing works on connected speech utilize methods originally designed for analysis of sustained vowels and, hence, cannot properly quantify the transient behavior of connected speech. This study presents a non-parametric approach to analysis based on a two-dimensional, temporal-spectral representation of speech. Variations along horizontal and vertical axes corresponding to the temporal and spectral dynamics of speech were quantified using two statistical models. The first, a spectral model, was defined as the probability of changes between the energy of two consecutive frequency sub-bands at a fixed time segment. The second, a temporal model, was defined as the probability of changes in the energy of a sub-band between consecutive time segments. As the first step of demonstrating the efficacy and utility of the proposed method, a diagnostic framework was adopted in this study. Data obtained revealed that the proposed method has (at minimum) significant discriminatory power over the existing alternative approaches.
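The two statistical models can be approximated roughly as transition probabilities over a quantised spectrogram, taken along the frequency axis (spectral model) and along the time axis (temporal model). The binning scheme and parameters below are assumptions, not the paper's exact formulation.

```python
import numpy as np
from scipy.signal import spectrogram

fs = 16000
x = np.random.randn(fs)                                # stand-in for connected speech
_, _, S = spectrogram(x, fs=fs, nperseg=256, noverlap=128)
log_s = 10.0 * np.log10(S + 1e-12)
# Quantise sub-band energies into four levels at the quartiles.
levels = np.digitize(log_s, np.quantile(log_s, [0.25, 0.5, 0.75]))

def transition_probabilities(a, b, n_levels=4):
    # Probability of moving from level i to level j between paired observations.
    counts = np.zeros((n_levels, n_levels))
    for i, j in zip(a.ravel(), b.ravel()):
        counts[i, j] += 1
    return counts / counts.sum()

spectral_model = transition_probabilities(levels[:-1, :], levels[1:, :])   # adjacent sub-bands
temporal_model = transition_probabilities(levels[:, :-1], levels[:, 1:])   # adjacent time frames
```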
15

Agnew, Zarinah K., Carolyn McGettigan, and Sophie K. Scott. "Discriminating between Auditory and Motor Cortical Responses to Speech and Nonspeech Mouth Sounds." Journal of Cognitive Neuroscience 23, no. 12 (December 2011): 4038–47. http://dx.doi.org/10.1162/jocn_a_00106.

Abstract:
Several perspectives on speech perception posit a central role for the representation of articulations in speech comprehension, supported by evidence for premotor activation when participants listen to speech. However, no experiments have directly tested whether motor responses mirror the profile of selective auditory cortical responses to native speech sounds or whether motor and auditory areas respond in different ways to sounds. We used fMRI to investigate cortical responses to speech and nonspeech mouth (ingressive click) sounds. Speech sounds activated bilateral superior temporal gyri more than other sounds, a profile not seen in motor and premotor cortices. These results suggest that there are qualitative differences in the ways that temporal and motor areas are activated by speech and click sounds: Anterior temporal lobe areas are sensitive to the acoustic or phonetic properties, whereas motor responses may show more generalized responses to the acoustic stimuli.
16

Crosse, Michael J., and Edmund C. Lalor. "The cortical representation of the speech envelope is earlier for audiovisual speech than audio speech." Journal of Neurophysiology 111, no. 7 (April 1, 2014): 1400–1408. http://dx.doi.org/10.1152/jn.00690.2013.

Abstract:
Visual speech can greatly enhance a listener's comprehension of auditory speech when they are presented simultaneously. Efforts to determine the neural underpinnings of this phenomenon have been hampered by the limited temporal resolution of hemodynamic imaging and the fact that EEG and magnetoencephalographic data are usually analyzed in response to simple, discrete stimuli. Recent research has shown that neuronal activity in human auditory cortex tracks the envelope of natural speech. Here, we exploit this finding by estimating a linear forward-mapping between the speech envelope and EEG data and show that the latency at which the envelope of natural speech is represented in cortex is shortened by >10 ms when continuous audiovisual speech is presented compared with audio-only speech. In addition, we use a reverse-mapping approach to reconstruct an estimate of the speech stimulus from the EEG data and, by comparing the bimodal estimate with the sum of the unimodal estimates, find no evidence of any nonlinear additive effects in the audiovisual speech condition. These findings point to an underlying mechanism that could account for enhanced comprehension during audiovisual speech. Specifically, we hypothesize that low-level acoustic features that are temporally coherent with the preceding visual stream may be synthesized into a speech object at an earlier latency, which may provide an extended period of low-level processing before extraction of semantic information.
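The forward-mapping step resembles a temporal response function estimated by regularised regression over lagged copies of the envelope. The sketch below is a generic ridge-regression version with assumed lag range, regularisation strength, and simulated data; it is not the authors' code.

```python
import numpy as np

fs = 128
n = 60 * fs
# Stand-in speech envelope: smoothed rectified noise.
envelope = np.abs(np.convolve(np.random.randn(n), np.ones(fs // 4) / (fs // 4), mode="same"))
# Fake EEG that lags and smears the envelope.
eeg = np.convolve(envelope, np.hanning(32), mode="same") + np.random.randn(n)

lags = np.arange(0, int(0.3 * fs))                     # 0-300 ms of time lags
X = np.stack([np.roll(envelope, lag) for lag in lags], axis=1)
X[: lags.max(), :] = 0.0                               # discard wrapped-around samples

lam = 100.0                                            # ridge regularisation strength
trf = np.linalg.solve(X.T @ X + lam * np.eye(len(lags)), X.T @ eeg)
predicted_eeg = X @ trf                                # forward-model prediction of the EEG
```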
17

Ding, N., and J. Z. Simon. "Adaptive Temporal Encoding Leads to a Background-Insensitive Cortical Representation of Speech." Journal of Neuroscience 33, no. 13 (March 27, 2013): 5728–35. http://dx.doi.org/10.1523/jneurosci.5297-12.2013.

18

Erb, Julia, Marcelo Armendariz, Federico De Martino, Rainer Goebel, Wim Vanduffel, and Elia Formisano. "Homology and Specificity of Natural Sound-Encoding in Human and Monkey Auditory Cortex." Cerebral Cortex 29, no. 9 (November 3, 2018): 3636–50. http://dx.doi.org/10.1093/cercor/bhy243.

Abstract:
Understanding homologies and differences in auditory cortical processing in human and nonhuman primates is an essential step in elucidating the neurobiology of speech and language. Using fMRI responses to natural sounds, we investigated the representation of multiple acoustic features in auditory cortex of awake macaques and humans. Comparative analyses revealed homologous large-scale topographies not only for frequency but also for temporal and spectral modulations. In both species, posterior regions preferably encoded relatively fast temporal and coarse spectral information, whereas anterior regions encoded slow temporal and fine spectral modulations. Conversely, we observed a striking interspecies difference in cortical sensitivity to temporal modulations: While decoding from macaque auditory cortex was most accurate at fast rates (> 30 Hz), humans had highest sensitivity to ~3 Hz, a relevant rate for speech analysis. These findings suggest that characteristic tuning of human auditory cortex to slow temporal modulations is unique and may have emerged as a critical step in the evolution of speech and language.
19

Shih, Stephanie S., and Sharon Inkelas. "Subsegments and the emergence of segments." Proceedings of the Linguistic Society of America 4, no. 1 (March 15, 2019): 37. http://dx.doi.org/10.3765/plsa.v4i1.4541.

Abstract:
Q Theory proposes that the most granular and basic temporal unit of abstract phonological representation is not the segment, as widely assumed in classic generative phonology, but the quantized subsegment. With a more granular quantization of the speech stream, Q Theory provides phonological grammar with the representational capability to model behaviors that affect both the parts and the wholes of segments. In Q Theory, segments are emergent from strings of subsegments and from subsegmental interactions based on the principles of similarity, proximity, and co-occurrence that already underlie phonological operations. Evidence is presented from linguistic typology, and mechanics are drawn from speech segmentation and recognition. Q Theory makes it possible to develop an advanced theory of complex segments.
20

Tsunada, Joji, Jung Hoon Lee, and Yale E. Cohen. "Representation of speech categories in the primate auditory cortex." Journal of Neurophysiology 105, no. 6 (June 2011): 2634–46. http://dx.doi.org/10.1152/jn.00037.2011.

Abstract:
A “ventral” auditory pathway in nonhuman primates that originates in the core auditory cortex and ends in the prefrontal cortex is thought to be involved in components of nonspatial auditory processing. Previous work from our laboratory has indicated that neurons in the prefrontal cortex reflect monkeys' decisions during categorical judgments. Here, we tested the role of the superior temporal gyrus (STG), a region of the secondary auditory cortex that is part of this ventral pathway, during similar categorical judgments. While monkeys participated in a match-to-category task and reported whether two consecutive auditory stimuli belonged to the same category or to different categories, we recorded spiking activity from STG neurons. The auditory stimuli were morphs of two human-speech sounds (bad and dad). We found that STG neurons represented auditory categories. However, unlike activity in the prefrontal cortex, STG activity was not modulated by the monkeys' behavioral reports (choices). This finding is consistent with the anterolateral STG's role as part of a functional circuit involved in the coding, representation, and perception of the nonspatial features of an auditory stimulus.
21

Dehaene-Lambertz, G. "Cerebral Specialization for Speech and Non-Speech Stimuli in Infants." Journal of Cognitive Neuroscience 12, no. 3 (May 2000): 449–60. http://dx.doi.org/10.1162/089892900562264.

Abstract:
Early cerebral specialization and lateralization for auditory processing in 4-month-old infants was studied by recording high-density evoked potentials to acoustical and phonetic changes in a series of repeated stimuli (either tones or syllables). Mismatch responses to these stimuli exhibit a distinct topography suggesting that different neural networks within the temporal lobe are involved in the perception and representation of the different features of an auditory stimulus. These data confirm that specialized modules are present within the auditory cortex very early in development. However, both for syllables and continuous tones, higher voltages were recorded over the left hemisphere than over the right with no significant interaction of hemisphere by type of stimuli. This suggests that there is no greater left hemisphere involvement in phonetic processing than in acoustic processing during the first months of life.
22

Lucas, Timothy H., Daniel L. Drane, Carl B. Dodrill, and George A. Ojemann. "LANGUAGE REORGANIZATION IN APHASICS." Neurosurgery 63, no. 3 (September 1, 2008): 487–97. http://dx.doi.org/10.1227/01.neu.0000324725.84854.04.

Abstract:
OBJECTIVE: The purpose of this investigation was to determine whether clinical speech deficits after brain injury are associated with functional speech reorganization. METHODS: Across an 18-year interval, 11 patients with mild-to-moderate speech deficits underwent language mapping as part of their treatment for intractable epilepsy. These “aphasics” were compared with 14 matched “control” patients with normal speech who also were undergoing epilepsy surgery. Neuroanatomic data were compared with quantitative language profiles and clinical variables. RESULTS: Cortical lesions were evident near speech areas in all aphasia cases. As expected, aphasic and control patients were distinguished by quantitative language profiles. The groups were further distinguished by the anatomic distribution of their speech sites. A significantly greater proportion of frontal speech sites was found in patients with previous brain injury, consistent with frontal site recruitment. The degree of frontal recruitment varied as a function of patient age at the time of initial brain injury; earlier injuries were associated with greater recruitment. The overall number of speech sites remained the same after injury. Significant associations were found between the number of the speech sites, naming fluency, and the lesion proximity in the temporal lobe. CONCLUSION: Language maps in aphasics demonstrated evidence for age-dependent functional recruitment in the frontal, but not temporal, lobe. The proximity of cortical lesions to temporal speech sites predicted the overall extent of temporal lobe speech representation and performance on naming fluency. These findings have implications for neurosurgical planning in patients with preoperative speech deficits.
23

Lüttke, Claudia S., Matthias Ekman, Marcel A. J. van Gerven, and Floris P. de Lange. "Preference for Audiovisual Speech Congruency in Superior Temporal Cortex." Journal of Cognitive Neuroscience 28, no. 1 (January 2016): 1–7. http://dx.doi.org/10.1162/jocn_a_00874.

Abstract:
Auditory speech perception can be altered by concurrent visual information. The superior temporal cortex is an important combining site for this integration process. This area was previously found to be sensitive to audiovisual congruency. However, the direction of this congruency effect (i.e., stronger or weaker activity for congruent compared to incongruent stimulation) has been more equivocal. Here, we used fMRI to look at the neural responses of human participants during the McGurk illusion—in which auditory /aba/ and visual /aga/ inputs are fused to perceived /ada/—in a large homogenous sample of participants who consistently experienced this illusion. This enabled us to compare the neuronal responses during congruent audiovisual stimulation with incongruent audiovisual stimulation leading to the McGurk illusion while avoiding the possible confounding factor of sensory surprise that can occur when McGurk stimuli are only occasionally perceived. We found larger activity for congruent audiovisual stimuli than for incongruent (McGurk) stimuli in bilateral superior temporal cortex, extending into the primary auditory cortex. This finding suggests that superior temporal cortex prefers when auditory and visual input support the same representation.
24

Lee, Yoonjeong, Jelena Krivokapić, and Ruaridh Purse. "Effects of prosodic structure on the temporal organization of speech and co-speech gestures." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A199. http://dx.doi.org/10.1121/10.0016019.

Abstract:
Although many studies have observed a close relationship between prosodic structure and co-speech gestures, little is understood about cross-modal gestural coordination. The present study examines the relationship between articulatory and co-speech gestures at prosodic boundaries and under prominence, focusing on non-referential manual and eyebrow beat gestures in Korean, a language in which co-speech gestures are virtually unexplored. This study hypothesizes that prosodic structure systematically governs the production of both speech and co-speech gestures and their temporal organization. Multimodal signals of a story reading were collected from eight speakers (5F, 3M). The lips, tongue, and eyebrows were point-tracked using EMA, and the vertical manual movements obtained from a video recording were auto-tracked using a geometrical centroid tracking method. Measurements taken included the duration of intervals from the timepoint of concurrent beat gesture onset and target to 1) consonant gesture onset and target, 2) vowel gesture onset and target, 3) pitch gesture (F0) onset and target, and 4) phrasal boundaries. Results reveal systematic inter-articulator coordination patterns, suggesting that beat gestures co-occurring with speech gestures are recruited to signal information grouping and highlighting. The findings are discussed with reference to the nature of prosodic representation and models of speech planning. [Work supported by NSF.]
25

Straube, Benjamin, Antonia Green, Susanne Weis, Anjan Chatterjee, and Tilo Kircher. "Memory Effects of Speech and Gesture Binding: Cortical and Hippocampal Activation in Relation to Subsequent Memory Performance." Journal of Cognitive Neuroscience 21, no. 4 (April 2009): 821–36. http://dx.doi.org/10.1162/jocn.2009.21053.

Abstract:
In human face-to-face communication, the content of speech is often illustrated by coverbal gestures. Behavioral evidence suggests that gestures provide advantages in the comprehension and memory of speech. Yet, how the human brain integrates abstract auditory and visual information into a common representation is not known. Our study investigates the neural basis of memory for bimodal speech and gesture representations. In this fMRI study, 12 participants were presented with video clips showing an actor performing meaningful metaphoric gestures (MG), unrelated, free gestures (FG), and no arm and hand movements (NG) accompanying sentences with an abstract content. After the fMRI session, the participants performed a recognition task. Behaviorally, the participants showed the highest hit rate for sentences accompanied by meaningful metaphoric gestures. Despite comparable old/new discrimination performances (d′) for the three conditions, we obtained distinct memory-related left-hemispheric activations in the inferior frontal gyrus (IFG), the premotor cortex (BA 6), and the middle temporal gyrus (MTG), as well as significant correlations between hippocampal activation and memory performance in the metaphoric gesture condition. In contrast, unrelated speech and gesture information (FG) was processed in areas of the left occipito-temporal and cerebellar region and the right IFG just like the no-gesture condition (NG). We propose that the specific left-lateralized activation pattern for the metaphoric speech–gesture sentences reflects semantic integration of speech and gestures. These results provide novel evidence about the neural integration of abstract speech and gestures as it contributes to subsequent memory performance.
26

Idsardi, William. "Underspecification in time." Canadian Journal of Linguistics/Revue canadienne de linguistique 67, no. 4 (December 2022): 670–82. http://dx.doi.org/10.1017/cnj.2022.36.

Abstract:
Substance-free phonology or SFP (Reiss 2017) has renewed interest in the question of abstraction in phonology. Perhaps the most common form of abstraction through the absence of substance is underspecification, where some aspects of speech lack representation in memorized representations, within the phonology or in the phonetic implementation (Archangeli 1988, Keating 1988, Lahiri and Reetz 2010 among many others). The fundamental basis for phonology is argued to be a mental model of speech events in time, following Raimy (2000) and Papillon (2020). Each event can have properties (one-place predicates that are true of the event), which include the usual phonological features, and also structural entities for extended events like moras and syllables. Features can be bound together in an event, yielding segment-like properties. Pairs of events can be ordered in time by the temporal logic precedence relation represented by ‘<’. Events, features and precedence form a directed multigraph structure with edges in the graph interpreted as “maybe next”. Some infant bimodal speech perception results are examined using this framework, arguing for underspecification in time in the developing phonological representations.
27

Presacco, Alessandro, Jonathan Z. Simon, and Samira Anderson. "Evidence of degraded representation of speech in noise, in the aging midbrain and cortex." Journal of Neurophysiology 116, no. 5 (November 1, 2016): 2346–55. http://dx.doi.org/10.1152/jn.00372.2016.

Abstract:
Humans have a remarkable ability to track and understand speech in unfavorable conditions, such as in background noise, but speech understanding in noise does deteriorate with age. Results from several studies have shown that in younger adults, low-frequency auditory cortical activity reliably synchronizes to the speech envelope, even when the background noise is considerably louder than the speech signal. However, cortical speech processing may be limited by age-related decreases in the precision of neural synchronization in the midbrain. To understand better the neural mechanisms contributing to impaired speech perception in older adults, we investigated how aging affects midbrain and cortical encoding of speech when presented in quiet and in the presence of a single-competing talker. Our results suggest that central auditory temporal processing deficits in older adults manifest in both the midbrain and in the cortex. Specifically, midbrain frequency following responses to a speech syllable are more degraded in noise in older adults than in younger adults. This suggests a failure of the midbrain auditory mechanisms needed to compensate for the presence of a competing talker. Similarly, in cortical responses, older adults show larger reductions than younger adults in their ability to encode the speech envelope when a competing talker is added. Interestingly, older adults showed an exaggerated cortical representation of speech in both quiet and noise conditions, suggesting a possible imbalance between inhibitory and excitatory processes, or diminished network connectivity that may impair their ability to encode speech efficiently.
28

Roque, Lindsey, Casey Gaskins, Sandra Gordon-Salant, Matthew J. Goupell, and Samira Anderson. "Age Effects on Neural Representation and Perception of Silence Duration Cues in Speech." Journal of Speech, Language, and Hearing Research 62, no. 4S (April 26, 2019): 1099–116. http://dx.doi.org/10.1044/2018_jslhr-h-ascc7-18-0076.

Abstract:
Purpose: Degraded temporal processing associated with aging may be a contributing factor to older adults' hearing difficulties, especially in adverse listening environments. This degraded processing may affect the ability to distinguish between words based on temporal duration cues. The current study investigates the effects of aging and hearing loss on cortical and subcortical representation of temporal speech components and on the perception of silent interval duration cues in speech. Method: Identification functions for the words DISH and DITCH were obtained on a 7-step continuum of silence duration (0–60 ms) prior to the final fricative in participants who are younger with normal hearing (YNH), older with normal hearing (ONH), and older with hearing impairment (OHI). Frequency-following responses and cortical auditory-evoked potentials were recorded to the 2 end points of the continuum. Auditory brainstem responses to clicks were obtained to verify neural integrity and to compare group differences in auditory nerve function. A multiple linear regression analysis was conducted to determine the peripheral or central factors that contributed to perceptual performance. Results: ONH and OHI participants required longer silence durations to identify DITCH than did YNH participants. Frequency-following responses showed reduced phase locking and poorer morphology, and cortical auditory-evoked potentials showed prolonged latencies in ONH and OHI participants compared with YNH participants. No group differences were noted for auditory brainstem response Wave I amplitude or Wave V/I ratio. After accounting for the possible effects of hearing loss, linear regression analysis revealed that both midbrain and cortical processing contributed to the variance in the DISH–DITCH perceptual identification functions. Conclusions: These results suggest that age-related deficits in the ability to encode silence duration cues may be a contributing factor in degraded speech perception. In particular, degraded response morphology relates to performance on perceptual tasks based on silence duration contrasts between words.
29

Leonard, Matthew K., and Edward F. Chang. "Dynamic speech representations in the human temporal lobe." Trends in Cognitive Sciences 18, no. 9 (September 2014): 472–79. http://dx.doi.org/10.1016/j.tics.2014.05.001.

30

Herman, Vimala. "Discourse and time in Shakespeare’s Romeo and Juliet." Language and Literature: International Journal of Stylistics 8, no. 2 (June 1999): 143–61. http://dx.doi.org/10.1177/096394709900800203.

Abstract:
This article explores different verbal resources for the representation of time in drama. Drama as a genre is subjected to different pressures of time, given that the fictional time spans in the dramatic world must be realized within the real time allocated to a performance, a context which a dramatic text necessarily addresses. The temporal scope of plays can be highly expanded or contracted, but whatever option is used in a play, it is the result of the strategic exploitation of different resources. Theatre provides non-verbal means, like lighting and décor, but verbal resources are more dynamic. The verbal dialogue enacts speech events, and speech events are tied to spatio-temporal contexts, which can be transformed via speech use. The article examines various verbal resources like deixis, tense and aspect, lexical choices in clock and calendrical references, and pragmatics in order to explore their productive functions in constructing the complex and dynamic temporal world of one play, Shakespeare’s Romeo and Juliet.
31

Rubinstein, Jay T., and Robert Hong. "Signal Coding in Cochlear Implants: Exploiting Stochastic Effects of Electrical Stimulation." Annals of Otology, Rhinology & Laryngology 112, no. 9_suppl (September 2003): 14–19. http://dx.doi.org/10.1177/00034894031120s904.

Abstract:
Speech perception in quiet with cochlear implants has increased substantially over the past 17 years. If current trends continue, average monosyllabic word scores will be nearly 80% by 2010. These improvements are due to enhancements in speech processing strategies, to the implantation of patients with more residual hearing and shorter durations of deafness, and to unknown causes. Despite these improvements, speech perception in noise and music perception are still poor in most implant patients. These deficits may be partly due to poor representation of temporal fine structure by current speech processing strategies. It may be possible to improve both this representation and the dynamic range of electrical stimulation through the exploitation of stochastic effects produced by high-rate (eg, 5-kilopulse-per-second) pulse trains. Both the loudness growth and the dynamic range of low-frequency sinusoids have been enhanced via this technique. A laboratory speech processor using this strategy is under development. Although the clinical programming for such an algorithm is likely to be complex, some guidelines for the psychophysical and electrophysiological techniques necessary can be described now.
32

Holmes, Stephen D., Christian J. Sumner, Lowel P. O’Mard, and Ray Meddis. "The temporal representation of speech in a nonlinear model of the guinea pig cochlea." Journal of the Acoustical Society of America 116, no. 6 (December 2004): 3534–45. http://dx.doi.org/10.1121/1.1815111.

33

Nikolov, Plamen, Srikanth Damera, Noah Steinberg, Naama Zur, Lillian Chang, Kyle Yoon, Marcus Dreux, Peter Turkeltaub, Josef Rauschecker, and Maximilian Riesenhuber. "350 Investigating the Architecture of Speech Processing Pathways in the Brain." Journal of Clinical and Translational Science 6, s1 (April 2022): 64–65. http://dx.doi.org/10.1017/cts.2022.198.

Abstract:
OBJECTIVES/GOALS: Speech production requires mapping between sound-based and motor-based neural representations of a word – accomplished by learning internal models. However, the neural bases of these internal models remain unclear. The aim of this study is to provide experimental evidence for these internal models in the brain during speech production. METHODS/STUDY POPULATION: 16 healthy human adults were recruited for this electroencephalography (EEG) speech study. 20 English pseudowords were designed to vary on confusability along specific features of articulation (place vs manner). All words were controlled for length and voicing. Three task conditions were performed: speech perception, covert and overt speech production. EEG was recorded using a 64-channel Biosemi ActiveTwo system. EMG was recorded on the orbicularis oris inferior and neck strap muscles. Overt productions were recorded with a high-quality microphone to determine overt production onset. EMG was used to determine covert production onset. Neuroimaging: Representational Similarity Analysis (RSA) was used to probe the sound- and motor-based neural representations over sensors and time for each task. RESULTS/ANTICIPATED RESULTS: Production (motor) and perception (sound) neural representations were calculated using a cross-validated squared Euclidean distance metric. The RSA results in the speech perception task show a strong selectivity around 150ms, which is compatible with recent human electrocorticography findings in human superior temporal gyrus. Parietal sensors showed a large difference for motor-based neural representations, indicating a strong encoding for production related processes, as hypothesized by previous studies on the ventral and dorsal stream model of language. Temporal sensors, however, showed a large change for both motor- and sound-based neural representations. This is a surprising result since temporal regions are believed to be primarily engaged in perception (sound-based) processes. DISCUSSION/SIGNIFICANCE: This study used neuroimaging (EEG) and advanced multivariate pattern analysis (RSA) to test models of production (motor-) and perception (sound-) based neural representations in three different speech task conditions. These results show strong feasibility of this approach to map how the perception and production processes interact in the brain.
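The cross-validated squared Euclidean distance at the heart of the RSA analysis can be sketched by taking the inner product of condition-difference vectors computed from two independent data splits, so that noise cancels in expectation. Everything below (condition count, sensor count, the model RDM) is an illustrative assumption, not the study's pipeline.

```python
import numpy as np
from scipy.stats import spearmanr

n_conditions, n_sensors = 20, 64
split_a = np.random.randn(n_conditions, n_sensors)   # condition means, data split A
split_b = np.random.randn(n_conditions, n_sensors)   # condition means, data split B

def crossval_sq_euclidean(a, b):
    n = a.shape[0]
    rdm = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Inner product of difference vectors from independent splits:
            # the noise terms are uncorrelated, so the distance is unbiased.
            rdm[i, j] = (a[i] - a[j]) @ (b[i] - b[j])
    return rdm

neural_rdm = crossval_sq_euclidean(split_a, split_b)
model_rdm = np.abs(np.subtract.outer(np.arange(n_conditions), np.arange(n_conditions)))
iu = np.triu_indices(n_conditions, k=1)
rho, _ = spearmanr(neural_rdm[iu], model_rdm[iu])     # model-to-neural RDM similarity
```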
34

Mathiak, Klaus, Ingo Hertrich, Wolfgang Grodd, and Hermann Ackermann. "Cerebellum and Speech Perception: A Functional Magnetic Resonance Imaging Study." Journal of Cognitive Neuroscience 14, no. 6 (August 1, 2002): 902–12. http://dx.doi.org/10.1162/089892902760191126.

Abstract:
A variety of data indicate that the cerebellum participates in perceptual tasks requiring the precise representation of temporal information. Access to the word form of a lexical item requires, among other functions, the processing of durational parameters of verbal utterances. Therefore, cerebellar dysfunctions must be expected to impair word recognition. In order to specify the topography of the assumed cerebellar speech perception mechanism, a functional magnetic resonance imaging study was performed using the German lexical items “Boden” ([bodn], Engl. “floor”) and “Boten” ([botn], “messengers”) as test materials. The contrast in sound structure of these two lexical items can be signaled either by the length of the word-medial pause (closure time, CLT; an exclusively temporal measure) or by the aspiration noise of word-medial “d” or “t” (voice onset time, VOT; an intrasegmental cue). A previous study found bilateral cerebellar disorders to compromise word recognition based on CLT whereas the encoding of VOT remained unimpaired. In the present study, two series of “Boden—Boten” utterances were resynthesized, systematically varying either in CLT or VOT. Subjects had to identify both words “Boden” and “Boten” by analysis of either the durational parameter CLT or the VOT aspiration segment. In a subtraction design, CLT categorization as compared to VOT identification (CLT − VOT) yielded a significant hemodynamic response of the right cerebellar hemisphere (neocerebellum Crus I) and the frontal lobe (anterior to Broca's area). The reversed contrast (VOT − CLT) resulted in a single activation cluster located at the level of the supra-temporal plane of the dominant hemisphere. These findings provide first evidence for a distinct contribution of the right cerebellar hemisphere to speech perception in terms of encoding of durational parameters of verbal utterances. Verbal working memory tasks, lexical response selection, and auditory imagery of word strings have been reported to elicit activation clusters of a similar location. Conceivably, representation of the temporal structure of speech sound sequences represents the common denominator of cerebellar participation in cognitive tasks acting on a phonetic code.
35

Saltzman, David I., and Emily B. Myers. "Neural Representation of Articulable and Inarticulable Novel Sound Contrasts: The Role of the Dorsal Stream." Neurobiology of Language 1, no. 3 (August 2020): 339–64. http://dx.doi.org/10.1162/nol_a_00016.

Abstract:
The extent that articulatory information embedded in incoming speech contributes to the formation of new perceptual categories for speech sounds has been a matter of discourse for decades. It has been theorized that the acquisition of new speech sound categories requires a network of sensory and speech motor cortical areas (the “dorsal stream”) to successfully integrate auditory and articulatory information. However, it is possible that these brain regions are not sensitive specifically to articulatory information, but instead are sensitive to the abstract phonological categories being learned. We tested this hypothesis by training participants over the course of several days on an articulable non-native speech contrast and acoustically matched inarticulable nonspeech analogues. After reaching comparable levels of proficiency with the two sets of stimuli, activation was measured in fMRI as participants passively listened to both sound types. Decoding of category membership for the articulable speech contrast alone revealed a series of left and right hemisphere regions outside of the dorsal stream that have previously been implicated in the emergence of non-native speech sound categories, while no regions could successfully decode the inarticulable nonspeech contrast. Although activation patterns in the left inferior frontal gyrus, the middle temporal gyrus, and the supplementary motor area provided better information for decoding articulable (speech) sounds compared to the inarticulable (sine wave) sounds, the finding that dorsal stream regions do not emerge as good decoders of the articulable contrast alone suggests that other factors, including the strength and structure of the emerging speech categories are more likely drivers of dorsal stream activation for novel sound learning.
36

Ohala, John J., Catherine P. Browman, and Louis M. Goldstein. "Towards an articulatory phonology." Phonology Yearbook 3 (May 1986): 219–52. http://dx.doi.org/10.1017/s0952675700000658.

Abstract:
We propose an approach to phonological representation based on describing an utterance as an organised pattern of overlapping articulatory gestures. Because movement is inherent in our definition of gestures, these gestural ‘constellations’ can account for both spatial and temporal properties of speech in a relatively simple way. At the same time, taken as phonological representations, such gestural analyses offer many of the same advantages provided by recent nonlinear phonological theories, and we give examples of how gestural analyses simplify the description of such ‘complex segments’ as /s/–stop clusters and prenasalised stops. Thus, gestural structures can be seen as providing a principled link between phonological and physical description.
37

Sánchez-Hevia, Héctor A., Roberto Gil-Pita, Manuel Utrilla-Manso, and Manuel Rosa-Zurera. "Age group classification and gender recognition from speech with temporal convolutional neural networks." Multimedia Tools and Applications 81, no. 3 (January 2022): 3535–52. http://dx.doi.org/10.1007/s11042-021-11614-4.

Abstract:
This paper analyses the performance of different types of Deep Neural Networks to jointly estimate age and identify gender from speech, to be applied in Interactive Voice Response systems available in call centres. Deep Neural Networks are used, because they have recently demonstrated discriminative and representation capabilities in a wide range of applications, including speech processing problems based on feature extraction and selection. Networks with different sizes are analysed to obtain information on how performance depends on the network architecture and the number of free parameters. The speech corpus used for the experiments is Mozilla’s Common Voice dataset, an open and crowdsourced speech corpus. The results are really good for gender classification, independently of the type of neural network, but improve with the network size. Regarding the classification by age groups, the combination of convolutional neural networks and temporal neural networks seems to be the best option among the analysed, and again, the larger the size of the network, the better the results. The results are promising for use in IVR systems, with the best systems achieving a gender identification error of less than 2% and a classification error by age group of less than 20%.
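A toy version of the temporal-convolutional idea follows, with a shared 1-D CNN encoder over frame-level features and separate gender and age-group heads; the layer sizes, dilation pattern, and number of age groups are assumptions, not the networks evaluated in the paper.

```python
import torch
import torch.nn as nn

class AgeGenderTCN(nn.Module):
    def __init__(self, n_feats=40, n_age_groups=6):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(n_feats, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 64, kernel_size=5, padding=4, dilation=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),               # pool over the time axis
        )
        self.gender_head = nn.Linear(64, 2)
        self.age_head = nn.Linear(64, n_age_groups)

    def forward(self, x):                           # x: (batch, n_feats, time)
        z = self.encoder(x).squeeze(-1)
        return self.gender_head(z), self.age_head(z)

gender_logits, age_logits = AgeGenderTCN()(torch.randn(4, 40, 300))
```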
38

Bautista, John Lorenzo, Yun Kyung Lee, and Hyun Soon Shin. "Speech Emotion Recognition Based on Parallel CNN-Attention Networks with Multi-Fold Data Augmentation." Electronics 11, no. 23 (November 28, 2022): 3935. http://dx.doi.org/10.3390/electronics11233935.

Abstract:
In this paper, an automatic speech emotion recognition (SER) task of classifying eight different emotions was experimented using parallel based networks trained using the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS) dataset. A combination of a CNN-based network and attention-based networks, running in parallel, was used to model both spatial features and temporal feature representations. Multiple Augmentation techniques using Additive White Gaussian Noise (AWGN), SpecAugment, Room Impulse Response (RIR), and Tanh Distortion techniques were used to augment the training data to further generalize the model representation. Raw audio data were transformed into Mel-Spectrograms as the model’s input. Using CNN’s proven capability in image classification and spatial feature representations, the spectrograms were treated as an image with the height and width represented by the spectrogram’s time and frequency scales. Temporal feature representations were represented by attention-based models Transformer, and BLSTM-Attention modules. Proposed architectures of the parallel CNN-based networks running along with Transformer and BLSTM-Attention modules were compared with standalone CNN architectures and attention-based networks, as well as with hybrid architectures with CNN layers wrapped in time-distributed wrappers stacked on attention-based networks. In these experiments, the highest accuracy of 89.33% for a Parallel CNN-Transformer network and 85.67% for a Parallel CNN-BLSTM-Attention Network were achieved on a 10% hold-out test set from the dataset. These networks showed promising results based on their accuracies, while keeping significantly less training parameters compared with non-parallel hybrid models.
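The parallel design can be sketched as a 2-D CNN branch treating the Mel-spectrogram as an image alongside a Transformer-encoder branch over the frame sequence, fused before classification. Dimensions and depths below are assumptions, not the reported architecture.

```python
import torch
import torch.nn as nn

class ParallelCNNTransformer(nn.Module):
    def __init__(self, n_mels=64, n_classes=8, d_model=64):
        super().__init__()
        self.cnn = nn.Sequential(                          # spatial branch over the "image"
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.proj = nn.Linear(n_mels, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(32 + d_model, n_classes)

    def forward(self, mel):                                # mel: (batch, n_mels, time)
        spatial = self.cnn(mel.unsqueeze(1)).flatten(1)    # CNN branch
        temporal = self.transformer(self.proj(mel.transpose(1, 2))).mean(dim=1)
        return self.head(torch.cat([spatial, temporal], dim=1))

logits = ParallelCNNTransformer()(torch.randn(4, 64, 250))
```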
39

Presacco, Alessandro, Jonathan Z. Simon, and Samira Anderson. "Effect of informational content of noise on speech representation in the aging midbrain and cortex." Journal of Neurophysiology 116, no. 5 (November 1, 2016): 2356–67. http://dx.doi.org/10.1152/jn.00373.2016.

Abstract:
The ability to understand speech is significantly degraded by aging, particularly in noisy environments. One way that older adults cope with this hearing difficulty is through the use of contextual cues. Several behavioral studies have shown that older adults are better at following a conversation when the target speech signal has high contextual content or when the background distractor is not meaningful. Specifically, older adults gain significant benefit in focusing on and understanding speech if the background is spoken by a talker in a language that is not comprehensible to them (i.e., a foreign language). To understand better the neural mechanisms underlying this benefit in older adults, we investigated aging effects on midbrain and cortical encoding of speech when in the presence of a single competing talker speaking in a language that is meaningful or meaningless to the listener (i.e., English vs. Dutch). Our results suggest that neural processing is strongly affected by the informational content of noise. Specifically, older listeners' cortical responses to the attended speech signal are less deteriorated when the competing speech signal is an incomprehensible language rather than when it is their native language. Conversely, temporal processing in the midbrain is affected by different backgrounds only during rapid changes in speech and only in younger listeners. Additionally, we found that cognitive decline is associated with an increase in cortical envelope tracking, suggesting an age-related over (or inefficient) use of cognitive resources that may explain their difficulty in processing speech targets while trying to ignore interfering noise.
APA, Harvard, Vancouver, ISO, and other styles
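The cortical envelope tracking analysis mentioned in the Presacco, Simon, and Anderson abstract above presupposes a low-frequency speech envelope. A common recipe, sketched here under assumed parameters rather than the authors' exact pipeline, is to take the magnitude of the analytic signal and low-pass filter it.

```python
# Sketch: extract a low-frequency speech envelope of the kind used in envelope-tracking
# analyses. The Hilbert-magnitude + low-pass recipe and the 8 Hz cutoff are common
# choices, not necessarily those used in the study above.
import numpy as np
from scipy.signal import hilbert, butter, filtfilt

def speech_envelope(audio, fs, cutoff_hz=8.0):
    env = np.abs(hilbert(audio))                          # magnitude of the analytic signal
    b, a = butter(4, cutoff_hz / (fs / 2), btype="low")
    return filtfilt(b, a, env)                            # smooth, low-frequency envelope

fs = 16000
t = np.arange(0, 2.0, 1 / fs)
audio = np.random.randn(t.size) * (1 + np.sin(2 * np.pi * 3 * t))  # toy "speech"
env = speech_envelope(audio, fs)
```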
40

Dollfus, S., G. Josse, M. Joliot, F. Crivello, D. Papathanassiou, B. Mazoyer, and N. Tzourio-Mazoyer. "Speech processing cortical representation: invariance in left superior temporal sulcus and variability in Broca's area." European Psychiatry 17 (May 2002): 220. http://dx.doi.org/10.1016/s0924-9338(02)80938-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Ding, Nai, and Jonathan Z. Simon. "Neural coding of continuous speech in auditory cortex during monaural and dichotic listening." Journal of Neurophysiology 107, no. 1 (January 2012): 78–89. http://dx.doi.org/10.1152/jn.00297.2011.

Full text
Abstract:
The cortical representation of the acoustic features of continuous speech is the foundation of speech perception. In this study, noninvasive magnetoencephalography (MEG) recordings are obtained from human subjects actively listening to spoken narratives, in both simple and cocktail party-like auditory scenes. By modeling how acoustic features of speech are encoded in ongoing MEG activity as a spectrotemporal response function, we demonstrate that the slow temporal modulations of speech in a broad spectral region are represented bilaterally in auditory cortex by a phase-locked temporal code. For speech presented monaurally to either ear, this phase-locked response is always more faithful in the right hemisphere, but with a shorter latency in the hemisphere contralateral to the stimulated ear. When different spoken narratives are presented to each ear simultaneously (dichotic listening), the resulting cortical neural activity precisely encodes the acoustic features of both of the spoken narratives, but slightly weakened and delayed compared with the monaural response. Critically, the early sensory response to the attended speech is considerably stronger than that to the unattended speech, demonstrating top-down attentional gain control. This attentional gain is substantial even during the subjects' very first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.
APA, Harvard, Vancouver, ISO, and other styles
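The spectrotemporal response function modeling described by Ding and Simon above can be illustrated, in simplified single-band form, as time-lagged ridge regression from a stimulus envelope to a neural response. The lag range, sampling rate, and ridge penalty below are illustrative assumptions, not the authors' settings.

```python
# Sketch: estimate a temporal response function (TRF) mapping a stimulus envelope to a
# neural response via time-lagged ridge regression; the full spectrotemporal version
# simply repeats this across spectral bands.
import numpy as np
from sklearn.linear_model import Ridge

def lagged_design(stimulus, n_lags):
    # Column k holds the stimulus delayed by k samples.
    X = np.zeros((stimulus.size, n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[: stimulus.size - k]
    return X

fs = 100                                    # 100 Hz envelope / MEG sampling rate
rng = np.random.default_rng(0)
envelope = rng.standard_normal(60 * fs)     # 60 s of a toy stimulus envelope
true_trf = np.exp(-np.arange(30) / 10.0)    # a decaying "response function"
response = np.convolve(envelope, true_trf)[: envelope.size] \
           + rng.standard_normal(envelope.size)

X = lagged_design(envelope, n_lags=30)      # lags 0-290 ms at 100 Hz
model = Ridge(alpha=10.0).fit(X, response)
estimated_trf = model.coef_                 # compare against true_trf
```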
42

Ojemann, Jeffrey O., and Daniel L. Silbergeld. "Cortical stimulation mapping of phantom limb rolandic cortex." Journal of Neurosurgery 82, no. 4 (April 1995): 641–44. http://dx.doi.org/10.3171/jns.1995.82.4.0641.

Full text
Abstract:
Findings of intraoperative rolandic cortex mapping during awake craniotomy for a tumor in a patient with a contralateral upper-extremity amputation are presented. This patient sustained a traumatic amputation at the mid-humerus 24 years previously. Initially he had experienced rare painless phantom limb sensations but none in the past 10 years. Functional mapping during an awake craniotomy was performed to maximize safe tumor resection. Typical temporal and frontal speech areas were identified; motor representation of face and jaw extended more superiorly than sensory representation. Shoulder movements were evoked more laterally than usual at the superior aspect of the craniotomy. A small region of precentral gyrus, between the jaw and shoulder representations, elicited no detectable effect when stimulated. Somatosensory mapping showed a similar topographical distribution of face and mouth cortex; however, posterior and inferior to the shoulder motor cortex, right arm and hand (phantom) sensations were evoked. Evidence suggests that significant motor reorganization occurs following an amputation, with expansion of neighboring homuncular representations without loss of somatosensory representation, despite a long period of time without any sensation referable to the amputated limb. Contrary to models of sensory cortex plasticity, the plasticity of the adult cortex may be system specific, with reorganization present in motor, but not in sensory, cortical systems.
APA, Harvard, Vancouver, ISO, and other styles
43

Honari-Jahromi, Maryam, Brea Chouinard, Esti Blanco-Elorrieta, Liina Pylkkänen, and Alona Fyshe. "Neural representation of words within phrases: Temporal evolution of color-adjectives and object-nouns during simple composition." PLOS ONE 16, no. 3 (March 4, 2021): e0242754. http://dx.doi.org/10.1371/journal.pone.0242754.

Full text
Abstract:
In language, stored semantic representations of lexical items combine into an infinitude of complex expressions. While the neuroscience of composition has begun to mature, we do not yet understand how the stored representations evolve and morph during composition. New decoding techniques allow us to crack open this very hard question: we can train a model to recognize a representation in one context or time-point and assess its accuracy in another. We combined the decoding approach with magnetoencephalography recorded during a picture naming task to investigate the temporal evolution of noun and adjective representations during speech planning. We tracked semantic representations as they combined into simple two-word phrases, using single words and two-word lists as non-combinatory controls. We found that nouns were generally more decodable than adjectives, suggesting that noun representations were stronger and/or more consistent across trials than those of adjectives. When training and testing across contexts and times, the representations of isolated nouns were recoverable when those nouns were embedded in phrases, but not so if they were embedded in lists. Adjective representations did not show a similar consistency across isolated and phrasal contexts. Noun representations in phrases also sustained over time in a way that was not observed for any other pairing of word class and context. These findings offer a new window into the temporal evolution and context sensitivity of word representations during composition, revealing a clear asymmetry between adjectives and nouns. The impact of phrasal contexts on the decodability of nouns may be due to the nouns’ status as head of phrase—an intriguing hypothesis for future research.
APA, Harvard, Vancouver, ISO, and other styles
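The train-on-one-context-or-time, test-on-another decoding strategy described by Honari-Jahromi and colleagues above is commonly implemented as a temporal generalization analysis. The data shapes, the injected effect, and the linear classifier in the following sketch are toy assumptions, not the study's pipeline.

```python
# Sketch: temporal generalization decoding. Train a classifier on sensor patterns at one
# time point and test it at every other time point; above-chance off-diagonal scores mean
# the representation is stable across time.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n_trials, n_sensors, n_times = 200, 64, 50
X = rng.standard_normal((n_trials, n_sensors, n_times))   # toy MEG epochs
y = rng.integers(0, 2, n_trials)                           # e.g., noun vs. adjective trials
X[y == 1, :8, 20:] += 0.5                                  # inject a late, sustained signal

idx_train, idx_test = train_test_split(np.arange(n_trials), random_state=0)
scores = np.zeros((n_times, n_times))
for t_train in range(n_times):
    clf = LogisticRegression(max_iter=1000).fit(X[idx_train, :, t_train], y[idx_train])
    for t_test in range(n_times):
        scores[t_train, t_test] = clf.score(X[idx_test, :, t_test], y[idx_test])
```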
44

Ter-Mikaelian, Maria, Malcolm N. Semple, and Dan H. Sanes. "Effects of spectral and temporal disruption on cortical encoding of gerbil vocalizations." Journal of Neurophysiology 110, no. 5 (September 1, 2013): 1190–204. http://dx.doi.org/10.1152/jn.00645.2012.

Full text
Abstract:
Animal communication sounds contain spectrotemporal fluctuations that provide powerful cues for detection and discrimination. Human perception of speech is influenced both by spectral and temporal acoustic features but is most critically dependent on envelope information. To investigate the neural coding principles underlying the perception of communication sounds, we explored the effect of disrupting the spectral or temporal content of five different gerbil call types on neural responses in the awake gerbil's primary auditory cortex (AI). The vocalizations were impoverished spectrally by reduction to 4 or 16 channels of band-passed noise. For this acoustic manipulation, an average firing rate of the neuron did not carry sufficient information to distinguish between call types. In contrast, the discharge patterns of individual AI neurons reliably categorized vocalizations composed of only four spectral bands with the appropriate natural token. The pooled responses of small populations of AI cells classified spectrally disrupted and natural calls with an accuracy that paralleled human performance on an analogous speech task. To assess whether discharge pattern was robust to temporal perturbations of an individual call, vocalizations were disrupted by time-reversing segments of variable duration. For this acoustic manipulation, cortical neurons were relatively insensitive to short reversal lengths. Consistent with human perception of speech, these results indicate that the stable representation of communication sounds in AI is more dependent on sensitivity to slow temporal envelopes than on spectral detail.
APA, Harvard, Vancouver, ISO, and other styles
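The spectral impoverishment to 4 or 16 channels of band-passed noise described by Ter-Mikaelian and colleagues is a form of noise vocoding. Below is a minimal sketch; the band edges, filter order, and envelope extraction are illustrative choices rather than the authors' stimulus-generation parameters.

```python
# Sketch: reduce a sound to N channels of band-passed noise (noise vocoding), preserving
# the envelope in each band while discarding spectral detail within bands.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def noise_vocode(audio, fs, n_channels=4, f_lo=100.0, f_hi=7000.0):
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)       # log-spaced band edges
    rng = np.random.default_rng(0)
    carrier = rng.standard_normal(audio.size)
    out = np.zeros_like(audio, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfiltfilt(sos, audio)
        env = np.abs(hilbert(band))                        # band envelope
        out += env * sosfiltfilt(sos, carrier)             # envelope-modulated noise band
    return out

fs = 22050
audio = np.random.randn(fs)                                # 1 s of toy input
vocoded = noise_vocode(audio, fs, n_channels=4)
```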
45

Magrassi, Lorenzo, Giuseppe Aromataris, Alessandro Cabrini, Valerio Annovazzi-Lodi, and Andrea Moro. "Sound representation in higher language areas during language generation." Proceedings of the National Academy of Sciences 112, no. 6 (January 26, 2015): 1868–73. http://dx.doi.org/10.1073/pnas.1418162112.

Full text
Abstract:
How language is encoded by neural activity in the higher-level language areas of humans is still largely unknown. We investigated whether the electrophysiological activity of Broca’s area correlates with the sound of the utterances produced. During speech perception, the electric cortical activity of the auditory areas correlates with the sound envelope of the utterances. In our experiment, we compared the electrocorticogram recorded during awake neurosurgical operations in Broca’s area and in the dominant temporal lobe with the sound envelope of single words versus sentences read aloud or mentally by the patients. Our results indicate that the electrocorticogram correlates with the sound envelope of the utterances, starting before any sound is produced and even in the absence of speech, when the patient is reading mentally. No correlations were found when the electrocorticogram was recorded in the superior parietal gyrus, an area not directly involved in language generation, or in Broca’s area when the participants were executing a repetitive motor task, which did not include any linguistic content, with their dominant hand. The distribution of suprathreshold correlations across frequencies of cortical activity varied depending on whether the sound envelope was derived from words or sentences. Our results suggest that the activity of language areas is organized by sound when language is generated, before any utterance is produced or heard.
APA, Harvard, Vancouver, ISO, and other styles
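The envelope correlation analysis described by Magrassi and colleagues can be sketched as follows: band-limit the cortical recording, extract its envelope, and correlate it with the sound envelope across a range of lags. The high-gamma band, the lag range, and the use of Pearson correlation here are assumptions for illustration, not the authors' exact analysis.

```python
# Sketch: correlate the envelope of band-limited cortical activity with the sound envelope
# of an utterance at a range of lags.
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert
from scipy.stats import pearsonr

def band_envelope(signal, fs, lo, hi):
    sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
    return np.abs(hilbert(sosfiltfilt(sos, signal)))

fs = 1000
rng = np.random.default_rng(2)
ecog = rng.standard_normal(10 * fs)             # 10 s of a toy cortical recording
sound_env = rng.standard_normal(10 * fs)        # toy sound envelope at the same rate

neural_env = band_envelope(ecog, fs, 70, 150)   # assumed high-gamma band
lags_ms = range(-200, 201, 20)
corrs = [pearsonr(np.roll(neural_env, int(l * fs / 1000)), sound_env)[0] for l in lags_ms]
```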
46

Caplan, Spencer, Alon Hafri, and John C. Trueswell. "Now You Hear Me, Later You Don’t: The Immediacy of Linguistic Computation and the Representation of Speech." Psychological Science 32, no. 3 (February 22, 2021): 410–23. http://dx.doi.org/10.1177/0956797620968787.

Full text
Abstract:
What happens to an acoustic signal after it enters the mind of a listener? Previous work has demonstrated that listeners maintain intermediate representations over time. However, the internal structure of such representations—be they the acoustic-phonetic signal or more general information about the probability of possible categories—remains underspecified. We present two experiments using a novel speaker-adaptation paradigm aimed at uncovering the format of speech representations. We exposed adult listeners (N = 297) to a speaker whose utterances contained acoustically ambiguous information concerning phones (and thus words), and we manipulated the temporal availability of disambiguating cues via visually presented text (presented before or after each utterance). Results from a traditional phoneme-categorization task showed that listeners adapted to a modified acoustic distribution when disambiguating text was provided before but not after the audio. These results support the position that speech representations consist of activation over categories and are inconsistent with direct maintenance of the acoustic-phonetic signal.
APA, Harvard, Vancouver, ISO, and other styles
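The phoneme-categorization analysis used by Caplan, Hafri, and Trueswell can be illustrated by fitting a logistic psychometric function along an acoustic continuum and comparing the category boundary across exposure conditions. The continuum, trial counts, and simulated responses below are invented for illustration and do not reproduce the study's data.

```python
# Sketch: fit a logistic curve to "category B" responses along a 7-step acoustic continuum
# for two exposure groups; a shift in the 50% crossover point indicates adaptation.
import numpy as np
from sklearn.linear_model import LogisticRegression

steps = np.tile(np.arange(1, 8), 40).reshape(-1, 1)        # 7-step continuum, 40 reps each
rng = np.random.default_rng(3)

def simulate(boundary):
    p = 1 / (1 + np.exp(-(steps.ravel() - boundary)))      # true psychometric function
    return (rng.random(steps.size) < p).astype(int)        # binary category-B responses

def crossover(responses):
    clf = LogisticRegression().fit(steps, responses)
    return -clf.intercept_[0] / clf.coef_[0, 0]            # step where P(B) = 0.5

print(crossover(simulate(boundary=4.0)))   # e.g., a group with an unshifted boundary
print(crossover(simulate(boundary=4.8)))   # e.g., a group whose boundary has adapted
```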
47

Duncan, Susan D. "Gesture, verb aspect, and the nature of iconic imagery in natural discourse." Gesture 2, no. 2 (December 31, 2002): 183–206. http://dx.doi.org/10.1075/gest.2.2.04dun.

Full text
Abstract:
Linguistic analyses of Mandarin Chinese and English have detailed the differences between the two languages in terms of the devices each makes available for expressing distinctions in the temporal contouring of events — verb aspect and Aktionsart. In this study, adult native speakers of each language were shown a cartoon, a movie, or a series of short action sequences and then videotaped talking about what they had seen. Comparisons revealed systematic within-language covariation of choice of aspect and/or Aktionsart in speech with features of co-occurring iconic gestures. In both languages, the gestures that speakers produced in imperfective aspect-marked speech contexts were more likely to take longer to produce and were more complex than those in perfective aspect speech contexts. Further, imperfective-progressive aspect-marked spoken utterances regularly accompanied iconic gestures in which the speaker’s hands engaged in some kind of temporally-extended, repeating or ‘agitated’ movements. Gestures sometimes incorporated this type of motion even when there was nothing corresponding to it in the visual stimulus; for example, when speakers described events of stasis. These facts suggest that such gestural agitation may derive from an abstract level of representation, perhaps linked to aspectual view itself. No significant between-language differences in aspect- or Aktionsart-related gesturing were observed. We conclude that gestural representations of witnessed events, when performed in conjunction with speech, are not simply derived from visual images, stored as perceived in the stimulus, and transposed as faithfully as possible to the hands and body of the speaker (cf. Hadar & Butterworth, 1997). Rather, such gestures are part of a linguistic-conceptual representation (McNeill & Duncan, 2000) in which verb aspect has a role. We further conclude that the noted differences between the systems for marking aspectual distinctions in spoken Mandarin and English are at a level of patterning that has little or no influence on speech-co-occurring imagistic thinking.
APA, Harvard, Vancouver, ISO, and other styles
48

Moles, John. "The thirteenth oration of Dio Chrysostom: complexity and simplicity, rhetoric and moralism, literature and life." Journal of Hellenic Studies 125 (November 2005): 112–38. http://dx.doi.org/10.1017/s0075426900007138.

Full text
Abstract:
This paper takes the Thirteenth Oration as a test case of many of the questions raised by the career and works of Dio Chrysostom. The speech's generic creativity and philosophical expertise are demonstrated. Historical problems are clarified. Analysis shows how Dio weaves seemingly diverse themes into a complex unity. New answers are given to two crucial interpretative problems. Exploration of Dio's self-representation and of his handling of internal and external audiences and of temporal and spatial relationships leads to the conclusion that he has a serious philosophical purpose: the advocacy of Antisthenic/Cynic paideia in place of the current paideia both of Romans and Athenians. Paradoxically, this clever, ironic and sophisticated speech deconstructs its own apparent values in the interests of simple, practical moralizing.
APA, Harvard, Vancouver, ISO, and other styles
49

Bradski, Gary, Gail A. Carpenter, and Stephen Grossberg. "Working Memory Networks for Learning Temporal Order with Application to Three-Dimensional Visual Object Recognition." Neural Computation 4, no. 2 (March 1992): 270–86. http://dx.doi.org/10.1162/neco.1992.4.2.270.

Full text
Abstract:
Working memory neural networks, called Sustained Temporal Order REcurrent (STORE) models, encode the invariant temporal order of sequential events in short-term memory (STM). Inputs to the networks may be presented with widely differing growth rates, amplitudes, durations, and interstimulus intervals without altering the stored STM representation. The STORE temporal order code is designed to enable groupings of the stored events to be stably learned and remembered in real time, even as new events perturb the system. Such invariance and stability properties are needed in neural architectures which self-organize learned codes for variable-rate speech perception, sensorimotor planning, or three-dimensional (3-D) visual object recognition. Using such a working memory, a self-organizing architecture for invariant 3-D visual object recognition is described. The new model is based on the model of Seibert and Waxman (1990a), which builds a 3-D representation of an object from a temporally ordered sequence of its two-dimensional (2-D) aspect graphs. The new model, called an ARTSTORE model, consists of the following cascade of processing modules: Invariant Preprocessor → ART 2 → STORE Model → ART 2 → Outstar Network.
APA, Harvard, Vancouver, ISO, and other styles
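The order-invariance property that STORE models are designed to achieve can be conveyed with a deliberately simplified toy, shown below. This is not the STORE model's actual dynamics, only an illustration of a short-term memory whose stored pattern encodes presentation order independently of item duration or amplitude; the decay factor and normalization step are invented for the example.

```python
# Toy sketch of an order-encoding short-term memory: each newly presented item receives a
# fixed activation and all stored activations are renormalized, so the relative activation
# pattern encodes presentation order regardless of item duration or intensity. This is an
# illustration of the invariance property described above, NOT the STORE equations of
# Bradski, Carpenter, and Grossberg.
def present_sequence(items, shrink=0.7):
    store = {}                                               # item -> activation
    for item in items:
        store = {k: v * shrink for k, v in store.items()}    # older items decay one step
        store[item] = 1.0                                     # newest item gets full activation
        total = sum(store.values())
        store = {k: v / total for k, v in store.items()}      # normalize total activity
    return store

# Order is recoverable from the activation magnitudes and is unaffected by how long or how
# loudly each item was presented, since only item onsets drive updates.
print(sorted(present_sequence("ABC").items(), key=lambda kv: -kv[1]))   # C > B > A
```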
50

Vougioukas, Konstantinos, Stavros Petridis, and Maja Pantic. "Realistic Speech-Driven Facial Animation with GANs." International Journal of Computer Vision 128, no. 5 (October 13, 2019): 1398–413. http://dx.doi.org/10.1007/s11263-019-01251-8.

Full text
Abstract:
Speech-driven facial animation is the process that automatically synthesizes talking characters based on speech signals. The majority of work in this domain creates a mapping from audio features to visual features. This approach often requires post-processing using computer graphics techniques to produce realistic albeit subject dependent results. We present an end-to-end system that generates videos of a talking head, using only a still image of a person and an audio clip containing speech, without relying on handcrafted intermediate features. Our method generates videos which have (a) lip movements that are in sync with the audio and (b) natural facial expressions such as blinks and eyebrow movements. Our temporal GAN uses 3 discriminators focused on achieving detailed frames, audio-visual synchronization, and realistic expressions. We quantify the contribution of each component in our model using an ablation study and we provide insights into the latent representation of the model. The generated videos are evaluated based on sharpness, reconstruction quality, lip-reading accuracy, synchronization as well as their ability to generate natural blinks.
APA, Harvard, Vancouver, ISO, and other styles
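The three-discriminator design described in the Vougioukas, Petridis, and Pantic abstract can be sketched at the level of its loss structure. The toy tensor shapes, MLP discriminators, and equal loss weights below are illustrative assumptions; the actual networks are convolutional/recurrent and operate on real video frames and audio.

```python
# Skeleton sketch of a generator trained against three discriminators (per-frame realism,
# audio-visual sync, sequence realism), illustrating the multi-discriminator idea above.
import torch
import torch.nn as nn

T, IMG, AUD, Z = 8, 64, 32, 16              # frames, frame dim, audio-feature dim, noise dim

gen = nn.Sequential(nn.Linear(IMG + AUD + Z, 128), nn.ReLU(), nn.Linear(128, T * IMG))
d_frame = nn.Sequential(nn.Linear(IMG, 64), nn.ReLU(), nn.Linear(64, 1))        # one frame
d_sync = nn.Sequential(nn.Linear(IMG + AUD, 64), nn.ReLU(), nn.Linear(64, 1))   # frame + audio
d_seq = nn.Sequential(nn.Linear(T * IMG, 64), nn.ReLU(), nn.Linear(64, 1))      # whole clip

bce = nn.BCEWithLogitsLoss()
still, audio, z = torch.randn(4, IMG), torch.randn(4, AUD), torch.randn(4, Z)
fake_clip = gen(torch.cat([still, audio, z], dim=1))        # (4, T*IMG)
fake_frames = fake_clip.view(4, T, IMG)

real = torch.ones(4, 1)
# Generator loss: fool all three discriminators (equal weights here are illustrative).
g_loss = (bce(d_frame(fake_frames[:, 0]), real)
          + bce(d_sync(torch.cat([fake_frames[:, 0], audio], dim=1)), real)
          + bce(d_seq(fake_clip), real))
g_loss.backward()                            # gradients flow back into the generator
```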