
Journal articles on the topic 'Vocoder'

Consult the top 50 journal articles for your research on the topic 'Vocoder.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Cychosz, Margaret, Matthew B. Winn, and Matthew J. Goupell. "How to vocode: Using channel vocoders for cochlear-implant research." Journal of the Acoustical Society of America 155, no. 4 (April 1, 2024): 2407–37. http://dx.doi.org/10.1121/10.0025274.

Abstract:
The channel vocoder has become a useful tool to understand the impact of specific forms of auditory degradation—particularly the spectral and temporal degradation that reflect cochlear-implant processing. Vocoders have many parameters that allow researchers to answer questions about cochlear-implant processing in ways that overcome some logistical complications of controlling for factors in individual cochlear implant users. However, there is such a large variety in the implementation of vocoders that the term “vocoder” is not specific enough to describe the signal processing used in these experiments. Misunderstanding vocoder parameters can result in experimental confounds or unexpected stimulus distortions. This paper highlights the signal processing parameters that should be specified when describing vocoder construction. The paper also provides guidance on how to determine vocoder parameters within perception experiments, given the experimenter's goals and research questions, to avoid common signal processing mistakes. Throughout, we will assume that experimenters are interested in vocoders with the specific goal of better understanding cochlear implants.
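To make the channel-vocoder recipe that this paper formalizes concrete, here is a minimal sketch in Python (NumPy/SciPy). Every setting in it (logarithmically spaced band edges, Butterworth analysis filters, a 50 Hz envelope cutoff, noise or sine carriers) is an illustrative default of mine, and exactly the kind of parameter the authors argue must be reported:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def channel_vocoder(x, fs, n_ch=8, f_lo=100.0, f_hi=8000.0,
                    carrier="noise", env_cut=50.0, order=4):
    """Minimal channel vocoder: filterbank -> envelopes -> modulated carriers."""
    edges = np.geomspace(f_lo, f_hi, n_ch + 1)        # log-spaced band edges
    env_sos = butter(2, env_cut, "low", fs=fs, output="sos")
    t = np.arange(len(x)) / fs
    y = np.zeros(len(x))
    for lo, hi in zip(edges[:-1], edges[1:]):
        sos = butter(order, [lo, hi], "bandpass", fs=fs, output="sos")
        band = sosfiltfilt(sos, x)                    # analysis band
        env = sosfiltfilt(env_sos, np.abs(band))      # rectify + low-pass
        env = np.maximum(env, 0.0)
        if carrier == "noise":
            c = sosfiltfilt(sos, np.random.randn(len(x)))  # band-limited noise
        else:
            c = np.sin(2 * np.pi * np.sqrt(lo * hi) * t)   # sine at band centre
        y += env * c                                  # re-synthesize this band
    return y / (np.max(np.abs(y)) + 1e-12)            # normalize output level
```

Lowering env_cut removes temporal detail, while reducing n_ch or the analysis filter order coarsens spectral resolution; these are the two axes of degradation that vocoder studies map onto cochlear-implant processing.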
2

Karoui, Chadlia, Chris James, Pascal Barone, David Bakhos, Mathieu Marx, and Olivier Macherey. "Searching for the Sound of a Cochlear Implant: Evaluation of Different Vocoder Parameters by Cochlear Implant Users With Single-Sided Deafness." Trends in Hearing 23 (January 2019): 233121651986602. http://dx.doi.org/10.1177/2331216519866029.

Abstract:
Cochlear implantation in subjects with single-sided deafness (SSD) offers a unique opportunity to directly compare the percepts evoked by a cochlear implant (CI) with those evoked acoustically. Here, nine SSD-CI users performed a forced-choice task evaluating the similarity of speech processed by their CI with speech processed by several vocoders presented to their healthy ear. In each trial, subjects heard two intervals: their CI followed by a certain vocoder in Interval 1 and their CI followed by a different vocoder in Interval 2. The vocoders differed either (i) in carrier type (sinusoidal [SINE], band-filtered noise [NOISE], or pulse-spreading harmonic complex [PSHC]) or (ii) in frequency mismatch between the analysis and synthesis frequency ranges (no mismatch, and two frequency-mismatched conditions of 2 and 4 equivalent rectangular bandwidths [ERBs]). Subjects had to state in which of the two intervals the CI and vocoder sounds were more similar. Despite a large intersubject variability, the PSHC vocoder was judged significantly more similar to the CI than the SINE or NOISE vocoders. Furthermore, the no-mismatch and 2-ERB mismatch vocoders were judged significantly more similar to the CI than the 4-ERB mismatch vocoder. The mismatch data were also interpreted by comparing spiral ganglion characteristic frequencies with electrode contact positions determined from postoperative computed tomography scans. Only one subject demonstrated a pattern of preference consistent with adaptation to the CI sound processor frequency-to-electrode allocation table, and two subjects showed possible partial adaptation. Those subjects with adaptation patterns presented overall small and consistent frequency mismatches across their electrode arrays.
3

Harding, Eleanor, Etienne Gaudrain, Imke Hrycyk, Robert Harris, Barbara Tillmann, Bert Maat, Rolien Free, and Deniz Başkent. "Arousal but not valence: Music emotion categorization in normal hearing and cochlear implanted participants." Journal of the Acoustical Society of America 153, no. 3_supplement (March 1, 2023): A287. http://dx.doi.org/10.1121/10.0018868.

Abstract:
Perceiving acoustic cues that convey music emotion is challenging for cochlear implant (CI) users. Emotional arousal (stimulating/relaxing) can be conveyed by temporal cues such as tempo, while emotional valence (positive/negative) can be conveyed by spectral information salient to pitch and harmony. It is, however, unclear to what extent other temporal and spectral features convey emotional arousal and valence in music, respectively. In 23 normal-hearing participants, we varied the quality of temporal and spectral content using vocoders during a music emotion categorization task, in which musical excerpts conveyed joy (high arousal, high valence), fear (high arousal, low valence), serenity (low arousal, high valence), and sorrow (low arousal, low valence). Vocoder carriers (sinewave/noise) primarily modulated temporal information, and filter orders (low/high) primarily modulated spectral information. Improving the temporal content (using sinewave carriers) and the spectral content (using a high filter order) both improved categorization. Vocoder results were compared to data from 25 CI users performing the same task with non-vocoded musical excerpts. The CI user data showed a similar pattern of errors as observed for the vocoded conditions in normal-hearing participants, suggesting that increasing the quality of temporal information, and not only spectral details, could prove beneficial for CI users’ music emotion perception.
4

Roebel, Axel, and Frederik Bous. "Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet." Information 13, no. 3 (February 23, 2022): 103. http://dx.doi.org/10.3390/info13030103.

Abstract:
The use of the mel spectrogram as a signal parameterization for voice generation is quite recent and linked to the development of neural vocoders. These are deep neural networks that allow reconstructing high-quality speech from a given mel spectrogram. While initially developed for speech synthesis, neural vocoders have now also been studied in the context of voice attribute manipulation, opening new means for voice processing in audio production. However, to be able to apply neural vocoders in real-world applications, two problems need to be addressed: (1) to support use in professional audio workstations, the computational complexity should be small; (2) the vocoder needs to support a large variety of speakers, differences in voice qualities, and a wide range of intensities potentially encountered during audio production. In this context, the present study will provide a detailed description of the Multi-band Excited WaveNet, a fully convolutional neural vocoder built around signal processing blocks. It will evaluate the performance of the vocoder when trained on a variety of multi-speaker and multi-singer databases, including an experimental evaluation of the neural vocoder trained on speech and singing voices. Addressing the problem of intensity variation, the study will introduce a new adaptive signal normalization scheme that allows for robust compensation for dynamic and static gain variations. Evaluations are performed using objective measures and a number of perceptual tests including different neural vocoder algorithms known from the literature. The results confirm that the proposed vocoder compares favorably to the state of the art in its capacity to generalize to unseen voices and voice qualities. The remaining challenges will be discussed.
5

Ausili, Sebastian A., Bradford Backus, Martijn J. H. Agterberg, A. John van Opstal, and Marc M. van Wanrooij. "Sound Localization in Real-Time Vocoded Cochlear-Implant Simulations With Normal-Hearing Listeners." Trends in Hearing 23 (January 2019): 233121651984733. http://dx.doi.org/10.1177/2331216519847332.

Abstract:
Bilateral cochlear-implant (CI) users and single-sided deaf listeners with a CI are less effective at localizing sounds than normal-hearing (NH) listeners. This performance gap is due to the degradation of binaural and monaural sound localization cues, caused by a combination of device-related and patient-related issues. In this study, we targeted the device-related issues by measuring sound localization performance of 11 NH listeners, listening to free-field stimuli processed by a real-time CI vocoder. The use of a real-time vocoder is a new approach, which enables testing in a free-field environment. For the NH listening condition, all listeners accurately and precisely localized sounds according to a linear stimulus–response relationship with an optimal gain and a minimal bias both in the azimuth and in the elevation directions. In contrast, when listening with bilateral real-time vocoders, listeners tended to orient either to the left or to the right in azimuth and were unable to determine sound source elevation. When listening with an NH ear and a unilateral vocoder, localization was impoverished on the vocoder side but improved toward the NH side. Localization performance was also reflected by systematic variations in reaction times across listening conditions. We conclude that perturbation of interaural temporal cues, reduction of interaural level cues, and removal of spectral pinna cues by the vocoder impairs sound localization. Listeners seem to ignore cues that were made unreliable by the vocoder, leading to acute reweighting of available localization cues. We discuss how current CI processors prevent CI users from localizing sounds in everyday environments.
6

Wess, Jessica M., and Joshua G. W. Bernstein. "The Effect of Nonlinear Amplitude Growth on the Speech Perception Benefits Provided by a Single-Sided Vocoder." Journal of Speech, Language, and Hearing Research 62, no. 3 (March 25, 2019): 745–57. http://dx.doi.org/10.1044/2018_jslhr-h-18-0001.

Abstract:
Purpose: For listeners with single-sided deafness, a cochlear implant (CI) can improve speech understanding by giving the listener access to the ear with the better target-to-masker ratio (TMR; head shadow) or by providing interaural difference cues to facilitate the perceptual separation of concurrent talkers (squelch). CI simulations presented to listeners with normal hearing examined how these benefits could be affected by interaural differences in loudness growth in a speech-on-speech masking task. Method: Experiment 1 examined a target–masker spatial configuration where the vocoded ear had a poorer TMR than the nonvocoded ear. Experiment 2 examined the reverse configuration. Generic head-related transfer functions simulated free-field listening. Compression or expansion was applied independently to each vocoder channel (power-law exponents: 0.25, 0.5, 1, 1.5, or 2). Results: Compression reduced the benefit provided by the vocoder ear in both experiments. There was some evidence that expansion increased squelch in Experiment 1 but reduced the benefit in Experiment 2, where the vocoder ear provided a combination of head-shadow and squelch benefits. Conclusions: The effects of compression and expansion are interpreted in terms of envelope distortion and changes in the vocoded-ear TMR (for head shadow) or changes in perceived target–masker spatial separation (for squelch). The compression parameter is a candidate for clinical optimization to improve single-sided deafness CI outcomes.
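The amplitude manipulation in the study above is compact enough to state directly in code: each vocoder channel envelope is raised to a power-law exponent, where exponents below 1 compress and above 1 expand. A small sketch; the peak normalization is my own choice, not the study's calibration:

```python
import numpy as np

def power_law_envelope(env, p):
    """Compress (p < 1) or expand (p > 1) a vocoder channel envelope.

    Normalizing by the peak keeps the maximum level comparable across
    exponents (an illustrative choice for this sketch).
    """
    peak = np.max(env) + 1e-12
    return peak * (env / peak) ** p

# Exponents examined in the study: 0.25, 0.5, 1, 1.5, and 2.
```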
7

Bosen, Adam K., and Michael F. Barry. "Serial Recall Predicts Vocoded Sentence Recognition Across Spectral Resolutions." Journal of Speech, Language, and Hearing Research 63, no. 4 (April 27, 2020): 1282–98. http://dx.doi.org/10.1044/2020_jslhr-19-00319.

Abstract:
Purpose: The goal of this study was to determine how various aspects of cognition predict speech recognition ability across different levels of speech vocoding within a single group of listeners. Method: We tested the ability of young adults (N = 32) with normal hearing to recognize Perceptually Robust English Sentence Test Open-set (PRESTO) sentences that were degraded with a vocoder to produce different levels of spectral resolution (16, eight, and four carrier channels). Participants also completed tests of cognition (fluid intelligence, short-term memory, and attention), which were used as predictors of sentence recognition. Sentence recognition was compared across vocoder conditions, predictors were correlated with individual differences in sentence recognition, and the relationships between predictors were characterized. Results: PRESTO sentence recognition performance declined with a decreasing number of vocoder channels, with no evident floor or ceiling performance in any condition. Individual ability to recognize PRESTO sentences was consistent relative to the group across vocoder conditions. Short-term memory, as measured with serial recall, was a moderate predictor of sentence recognition (ρ = 0.65). Serial recall performance was constant across vocoder conditions when measured with a digit span task. Fluid intelligence was marginally correlated with serial recall, but not sentence recognition. Attentional measures had no discernible relationship to sentence recognition and a marginal relationship with serial recall. Conclusions: Verbal serial recall is a substantial predictor of vocoded sentence recognition, and this predictive relationship is independent of spectral resolution. In populations that show variable speech recognition outcomes, such as listeners with cochlear implants, it should be possible to account for the independent effects of spectral resolution and verbal serial recall in their speech recognition ability. Supplemental Material: https://doi.org/10.23641/asha.12021051
8

Yang, Jing, Jenna Barrett, Zhigang Yin, and Li Xu. "Recognition of foreign-accented vocoded speech by native English listeners." Acta Acustica 7 (2023): 43. http://dx.doi.org/10.1051/aacus/2023038.

Abstract:
This study examined how talker accentedness affects the recognition of noise-vocoded speech by native English listeners and how contextual information interplays with talker accentedness during this process. The listeners included 20 native English-speaking, normal-hearing adults aged between 19 and 23 years. The stimuli were English Hearing in Noise Test (HINT) and Revised Speech Perception in Noise (R-SPIN) sentences produced by four native Mandarin talkers (two males and two females) who learned English as a second language. Two talkers (one of each sex) had a mild foreign accent and the other two had a moderate foreign accent. A six-channel noise vocoder was used to process the stimulus sentences. The vocoder-processed and unprocessed sentences were presented to the listeners. The results revealed that talkers’ foreign accents introduced additional detrimental effects besides spectral degradation and that the negative effect was exacerbated as the foreign accent became stronger. While contextual information played a beneficial role in recognizing mildly accented vocoded speech, the magnitude of the contextual benefit decreased as the talkers’ accentedness increased. These findings revealed the joint influence of talker variability and sentence context on the perception of degraded speech.
9

Bak, Taejun, Junmo Lee, Hanbin Bae, Jinhyeok Yang, Jae-Sung Bae, and Young-Sun Joo. "Avocodo: Generative Adversarial Network for Artifact-Free Vocoder." Proceedings of the AAAI Conference on Artificial Intelligence 37, no. 11 (June 26, 2023): 12562–70. http://dx.doi.org/10.1609/aaai.v37i11.26479.

Abstract:
Neural vocoders based on the generative adversarial neural network (GAN) have been widely used due to their fast inference speed and lightweight networks while generating high-quality speech waveforms. Since the perceptually important speech components are primarily concentrated in the low-frequency bands, most GAN-based vocoders perform multi-scale analysis that evaluates downsampled speech waveforms. This multi-scale analysis helps the generator improve speech intelligibility. However, in preliminary experiments, we discovered that the multi-scale analysis which focuses on the low-frequency bands causes unintended artifacts, e.g., aliasing and imaging artifacts, which degrade the synthesized speech waveform quality. Therefore, in this paper, we investigate the relationship between these artifacts and GAN-based vocoders and propose a GAN-based vocoder, called Avocodo, that allows the synthesis of high-fidelity speech with reduced artifacts. We introduce two kinds of discriminators to evaluate speech waveforms from various perspectives: a collaborative multi-band discriminator and a sub-band discriminator. We also utilize a pseudo quadrature mirror filter bank to obtain downsampled multi-band speech waveforms while avoiding aliasing. According to the experimental results, Avocodo outperforms baseline GAN-based vocoders, both objectively and subjectively, while reproducing speech with fewer artifacts.
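The aliasing artifact that motivates Avocodo's pseudo-QMF analysis is easy to demonstrate: downsampling without an anti-aliasing filter folds high-frequency energy back into the retained band. A generic SciPy illustration (mine, not the paper's code):

```python
import numpy as np
from scipy.signal import decimate

fs = 16000
t = np.arange(fs) / fs
# A 500 Hz tone plus a 7 kHz tone that cannot survive downsampling to 8 kHz.
x = np.sin(2 * np.pi * 500 * t) + np.sin(2 * np.pi * 7000 * t)

naive = x[::2]          # plain decimation: no anti-aliasing filter
safe = decimate(x, 2)   # low-pass filters before downsampling

for name, y in [("naive", naive), ("safe", safe)]:
    spec = np.abs(np.fft.rfft(y)) / len(y)
    freqs = np.fft.rfftfreq(len(y), d=2 / fs)
    alias = spec[np.argmin(np.abs(freqs - 1000))]   # 7 kHz folds onto 1 kHz
    print(f"{name}: aliased 1 kHz component = {alias:.4f}")
```

In the naive branch the 7 kHz tone reappears as a spurious 1 kHz component; a discriminator that only sees such downsampled views can reward exactly this kind of distortion, which is the failure mode the paper targets.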
10

Shi, Yong Peng. "Research and Implementation of MELP Algorithm Based on TMS320VC5509A." Advanced Materials Research 934 (May 2014): 239–44. http://dx.doi.org/10.4028/www.scientific.net/amr.934.239.

Abstract:
A MELP vocoder based on the TMS320VC5509A DSP is designed in this article. First, the MELP algorithm is described in detail; then an approach to modeling the algorithm and realizing it on the DSP is proposed. Finally, a functional simulation of the encoding and decoding system is completed, and the experimental results show that the synthesized signals fit the original ones well and that the speech quality obtained from the vocoder is good.
11

Clark, Graeme, and Peter J. Blamey. "Electrotactile vocoder." Journal of the Acoustical Society of America 90, no. 5 (November 1991): 2880. http://dx.doi.org/10.1121/1.401830.

12

Goupell, Matthew J., Garrison T. Draves, and Ruth Y. Litovsky. "Recognition of vocoded words and sentences in quiet and multi-talker babble with children and adults." PLOS ONE 15, no. 12 (December 29, 2020): e0244632. http://dx.doi.org/10.1371/journal.pone.0244632.

Abstract:
A vocoder is used to simulate cochlear-implant sound processing in normal-hearing listeners. Typically, there is rapid improvement in vocoded speech recognition, but it is unclear if the improvement rate differs across age groups and speech materials. Children (8–10 years) and young adults (18–26 years) were trained and tested over 2 days (4 hours) on recognition of eight-channel noise-vocoded words and sentences, in quiet and in the presence of multi-talker babble at signal-to-noise ratios of 0, +5, and +10 dB. Children achieved poorer performance than adults in all conditions, for both word and sentence recognition. With training, vocoded speech recognition improvement rates were not significantly different between children and adults, suggesting that learning to process speech cues degraded via vocoding shows no developmental differences across these age groups and types of speech materials. Furthermore, this result confirms that the acutely measured age difference in vocoded speech recognition persists after extended training.
13

Ding, Yuntao, Rangzhuoma Cai, and Baojia Gong. "Tibetan speech synthesis based on an improved neural network." MATEC Web of Conferences 336 (2021): 06012. http://dx.doi.org/10.1051/matecconf/202133606012.

Abstract:
Nowadays, Tibetan speech synthesis based on neural networks has become the mainstream synthesis method, and the Griffin-Lim vocoder is widely used in Tibetan speech synthesis because of its relatively simple synthesis. Aiming at the low fidelity of the Griffin-Lim vocoder, this paper uses a WaveNet vocoder instead of Griffin-Lim for Tibetan speech synthesis. The paper first uses convolution operations and an attention mechanism to extract sequence features, then uses linear projection and a feature amplification module to predict the mel spectrogram, and finally uses the WaveNet vocoder to synthesize the speech waveform. Experimental data show that our model achieves better performance in Tibetan speech synthesis.
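For contrast with the WaveNet vocoder adopted here, the Griffin-Lim baseline can be run in a few lines with librosa (assuming librosa 0.7 or later; the example clip and STFT settings are mine):

```python
import librosa
import librosa.feature.inverse

y, sr = librosa.load(librosa.ex("trumpet"))   # bundled example clip

# Mel spectrogram, standing in for what a synthesis front end would predict.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)

# Griffin-Lim reconstruction: invert the mel filterbank to an STFT
# magnitude, then iteratively re-estimate the phase.
y_gl = librosa.feature.inverse.mel_to_audio(mel, sr=sr, n_fft=1024,
                                            hop_length=256, n_iter=32)
```

Because Griffin-Lim only estimates phase from magnitudes, its output carries the characteristic artifacts the paper calls low fidelity; a trained neural vocoder such as WaveNet generates the waveform directly instead.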
14

Eng, Erica, Can Xu, Sarah Medina, Fan-Yin Cheng, René Gifford, and Spencer Smith. "Objective discrimination of bimodal speech using the frequency following response: A machine learning approach." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A91. http://dx.doi.org/10.1121/10.0015651.

Abstract:
Bimodal hearing, which combines a cochlear implant (CI) with a contralateral hearing aid, provides significant speech recognition benefits relative to a monaural CI. Factors predicting bimodal benefit remain poorly understood but may involve extracting fundamental frequency and/or formant information from the non-implanted ear. This study investigated whether neural responses (frequency following responses, FFRs) to simulated bimodal signals can be (1) accurately classified using machine learning and (2) used to predict perceptual bimodal benefit. We hypothesized that FFR classification accuracy would improve with increasing acoustic bandwidth due to greater fundamental and formant frequency access. Three vowels (/e/, /i/, and /ʊ/) with identical fundamental frequencies were manipulated to create five bimodal simulations (vocoder in right ear, lowpass filtered in left ear): Vocoder-only, Vocoder +125 Hz, Vocoder +250 Hz, Vocoder +500 Hz, and Vocoder +750 Hz. Perceptual performance on the BKB-SIN test was also measured using the same five configurations. FFR classification accuracy improved with increasing bimodal acoustic bandwidth. Furthermore, FFR bimodal benefit predicted behavioral bimodal benefit. These results indicate that the FFR may be useful in objectively verifying and tuning bimodal configurations.
15

Shinohara, Yasuaki. "Japanese pitch-accent perception of noise-vocoded sine-wave speech." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A175. http://dx.doi.org/10.1121/10.0015940.

Abstract:
A previous study has demonstrated that speech intelligibility is improved for a tone language when sine-wave speech is noise-vocoded, because noise-vocoding eliminates the quasi-periodicity of sine-wave speech. This study examined whether identification accuracy of Japanese pitch-accent words increases after sine-wave speech is noise-vocoded. The results showed that the Japanese listeners’ identification accuracy significantly increased, but their discrimination accuracy did not show a significant difference between the sine-wave speech and noise-vocoded sine-wave speech conditions. These results suggest that Japanese listeners can auditorily discriminate minimal-pair words using any acoustic cues in both conditions, but quasi-periodicity is eliminated by noise-vocoding so that the Japanese listeners’ identification accuracy increases in the noise-vocoded sine-wave speech condition. The same results were not observed when another way of noise-vocoding was used in a previous study, suggesting that the quasi-periodicity of sine-wave speech needs to be adequately eliminated by a noise-vocoder to show a significant difference in identification.
16

Jacobs, Paul E. "Variable rate vocoder." Journal of the Acoustical Society of America 103, no. 4 (April 1998): 1700. http://dx.doi.org/10.1121/1.421053.

17

Griffin, Daniel W., and Jae S. Lim. "Multiband excitation vocoder." IEEE Transactions on Acoustics, Speech, and Signal Processing 36, no. 8 (August 1988): 1223–35. http://dx.doi.org/10.1109/29.1651.

18

Shinohara, Yasuaki. "Perception of noise-vocoded sine-wave speech of Japanese pitch-accent words." JASA Express Letters 2, no. 8 (August 2022): 085204. http://dx.doi.org/10.1121/10.0013423.

Abstract:
The present study examined whether the identification accuracy of Japanese pitch-accent words increased after the sine-wave speech underwent noise vocoding, which eliminates the quasi-periodicity of the sine-wave speech. The results demonstrated that Japanese listeners were better at discriminating sine-wave speech than noise-vocoded sine-wave speech, with no significant difference in identification between the two conditions. They identify sine-wave pitch-accent words to some extent using acoustic cues other than the pitch accent. The noise vocoder used in the present study might not have been substantially effective for Japanese listeners to show a significant difference in the identification between the two conditions.
19

Jeong, Changhyeon, Hyung-pil Chang, In-Chul Yoo, and Dongsuk Yook. "Wav2wav: Wave-to-Wave Voice Conversion." Applied Sciences 14, no. 10 (May 17, 2024): 4251. http://dx.doi.org/10.3390/app14104251.

Abstract:
Voice conversion is the task of changing the speaker characteristics of input speech while preserving its linguistic content. It can be used in various areas, such as entertainment, medicine, and education. The quality of the converted speech is crucial for voice conversion algorithms to be useful in these various applications. Deep learning-based voice conversion algorithms, which have been showing promising results recently, generally consist of three modules: a feature extractor, feature converter, and vocoder. The feature extractor accepts the waveform as the input and extracts speech feature vectors for further processing. These speech feature vectors are later synthesized back into waveforms by the vocoder. The feature converter module performs the actual voice conversion; therefore, many previous studies separately focused on improving this module. These works combined the separately trained vocoder to synthesize the final waveform. Since the feature converter and the vocoder are trained independently, the output of the converter may not be compatible with the input of the vocoder, which causes performance degradation. Furthermore, most voice conversion algorithms utilize mel-spectrogram-based speech feature vectors without modification. These feature vectors have performed well in a variety of speech-processing areas but could be further optimized for voice conversion tasks. To address these problems, we propose a novel wave-to-wave (wav2wav) voice conversion method that integrates the feature extractor, the feature converter, and the vocoder into a single module and trains the system in an end-to-end manner. We evaluated the efficiency of the proposed method using the VCC2018 dataset.
20

Hodges, Aaron, Raymond L. Goldsworthy, Matthew B. Fitzgerald, and Takako Fujioka. "Transfer effects of discrete tactile mapping of musical pitch on discrimination of vocoded stimuli." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A229. http://dx.doi.org/10.1121/10.0016101.

Abstract:
Many studies have found benefits of using the somatosensory modality to augment sound information for individuals with hearing loss. However, few studies have explored the use of multiple regions of the body sensitive to vibrotactile stimulation to convey discrete F0 information, which is important for music perception. This study explored whether a mapping of musical notes to finger patterns can be learned quickly and transferred to the discrimination of vocoded auditory stimuli. Each of eight diatonic-scale notes was associated with a unique pattern of fingers (digits 2-5) on the dominant hand, to which a pneumatic tactile stimulation apparatus was attached. The study consisted of a pre-test and a post-test with a learning phase in between. During the learning phase, normal-hearing participants had to identify common nursery-song melodies presented with simultaneous auditory-tactile stimulation for about 10 min, using non-vocoded (original) audio. Pre- and post-tests examined stimulus discrimination in four conditions: original audio + tactile, tactile only, vocoded audio only, and vocoded audio + tactile. The audio vocoder used a four-channel cochlear-implant simulation. Our results demonstrated that audio-tactile learning improved participants’ performance on the vocoded audio + tactile tasks. The tactile-only condition also improved significantly, indicating rapid learning of the audio-tactile mapping and its effective transfer.
21

Niu, Qing Yu, Qiang Li, and Qin Jun Shu. "Research and Analysis on the Implementation of MELP Algorithm on DSP." Advanced Materials Research 1030-1032 (September 2014): 1755–59. http://dx.doi.org/10.4028/www.scientific.net/amr.1030-1032.1755.

Abstract:
This paper briefly analyses the principle of the MELP vocoder algorithm, and TI's TMS320C5509 (C5509) DSP is selected as the implementation platform for the 2.4 kbps MELP speech-coding algorithm. To ensure rational and efficient utilization of the limited memory resources, this paper introduces the frame-based processing method of the MELP algorithm and gives a detailed analysis of how to configure the memory space of the selected DSP, by analyzing its memory structure and considering the specific circumstances of the MELP vocoder algorithm. Finally, this paper gives the memory configuration used during the implementation of the MELP vocoder on TI's TMS320C5509 DSP.
22

Tamati, Terrin N., Lars Bakker, Stefan Smeenk, Almut Jebens, Thomas Koelewijn, and Deniz Başkent. "Pupil response to familiar and unfamiliar talkers in the recognition of noise-vocoded speech." Journal of the Acoustical Society of America 151, no. 4 (April 2022): A264. http://dx.doi.org/10.1121/10.0011285.

Abstract:
In some challenging listening conditions, listeners are more accurate at recognizing speech produced by a familiar talker compared to unfamiliar talkers. However, previous studies have found little to no talker-familiarity benefit in the recognition of noise-vocoded speech, potentially due to limitations in the talker-specific details conveyed in noise-vocoded signals. Although no strong effect on performance has been observed, listening to a familiar talker may reduce the listening effort experienced. The current study used pupillometry to assess how talker familiarity could impact the amount of effort required to recognize noise-vocoded speech. Four groups of normal-hearing listeners completed talker familiarity training, each with a different talker. Then, listeners repeated sentences produced by the familiar (training) talker and three unfamiliar talkers. Sentences were mixed with multi-talker babble and were processed with an 8-channel noise-vocoder; SNR was set to a participant’s 50% correct performance level. Preliminary results demonstrate no overall talker-familiarity benefit across training groups. Examining each training group separately showed differences in pupil response for familiar and unfamiliar talkers, but the direction and size of the effect depended on the training talker. These preliminary findings suggest that normal-hearing listeners make use of limited talker-specific details in the recognition of noise-vocoded speech.
23

Asyraf, Muhammad A., and Dhany Arifianto. "Effect of electric-acoustic cochlear implant stimulation and coding strategies on spatial cues of speech signals in reverberant room." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A195. http://dx.doi.org/10.1121/10.0016005.

Abstract:
The comparison of spatial-cue changes across different setups and coding strategies used in cochlear implants (CI) is investigated. In this experiment, we implement three voice-coder setups: bilateral CI, bimodal CI, and electro-acoustic stimulation (EAS). Two well-known coding strategies are used: continuous interleaved sampling (CIS) and spectral peak (SPEAK). Speech signals are convolved with appropriate binaural room impulse responses (BRIRs), creating reverberant spatial stimuli. Five different reverberant conditions (including anechoic) were applied to the stimuli. Interaural level and time differences (ILD and ITD) are evaluated objectively and subjectively, and their relationship with speech intelligibility is observed. A prior objective evaluation with CIS reveals that clarity (C50) is a more important factor in spatial-cue change than reverberation time. Vocoded conditions (bilateral CI) show an increase in ILD (compression has not yet been implemented in the vocoder processing), while the ITD deviates further (decreases) from the midline. Reverberation degrades intelligibility to varying degrees depending on the C50 value, in both unvocoded and vocoded conditions. In the vocoded condition, the degradation of spatial cues was also accompanied by a decrease in the intelligibility of the spatial stimuli.
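The interaural cues evaluated in this study have standard broadband definitions: ILD as the between-ear energy ratio in decibels, and ITD as the lag that maximizes the interaural cross-correlation. A generic sketch (not the authors' implementation):

```python
import numpy as np

def ild_itd(left, right, fs, max_lag_ms=1.0):
    """Broadband ILD (dB) and ITD (s) for a binaural signal pair.

    Positive ITD means the right ear lags, i.e., the source is toward
    the left; the lag search is limited to a physiological +/- 1 ms.
    """
    ild = 10 * np.log10((np.sum(left ** 2) + 1e-12) /
                        (np.sum(right ** 2) + 1e-12))
    max_lag = int(fs * max_lag_ms / 1000)
    lags = np.arange(-max_lag, max_lag + 1)
    xcorr = [np.sum(left[max(0, -l):len(left) - max(0, l)] *
                    right[max(0, l):len(right) - max(0, -l)])
             for l in lags]
    itd = lags[int(np.argmax(xcorr))] / fs
    return ild, itd
```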
24

Taguchi, Tetsu. "Formant pattern matching vocoder." Journal of the Acoustical Society of America 91, no. 3 (March 1992): 1790. http://dx.doi.org/10.1121/1.403749.

25

Taguchi, Tetsu. "Multi‐pulse type vocoder." Journal of the Acoustical Society of America 88, no. 6 (December 1990): 2913. http://dx.doi.org/10.1121/1.399635.

26

Ali, Hussnain, Nursadul Mamun, Avamarie Bruggeman, Ram Charan M. Chandra Shekar, Juliana N. Saba, and John H. L. Hansen. "The CCi-MOBILE Vocoder." Journal of the Acoustical Society of America 144, no. 3 (September 2018): 1872. http://dx.doi.org/10.1121/1.5068238.

27

Hillenbrand, James M., and Robert A. Houde. "A damped sinewave vocoder." Journal of the Acoustical Society of America 104, no. 3 (September 1998): 1835. http://dx.doi.org/10.1121/1.424405.

28

Liu, Ludy. "Fixed point vocoder implementation." Computer Standards & Interfaces 20, no. 6-7 (March 1999): 464–65. http://dx.doi.org/10.1016/s0920-5489(99)91011-5.

29

Gibbs, Bobby E., Joshua G. W. Bernstein, Douglas S. Brungart, and Matthew J. Goupell. "Effects of better-ear glimpsing, binaural unmasking, and spectral resolution on spatial release from masking in cochlear-implant users." Journal of the Acoustical Society of America 152, no. 2 (August 2022): 1230–46. http://dx.doi.org/10.1121/10.0013746.

Abstract:
Bilateral cochlear-implant (BICI) listeners obtain less spatial release from masking (SRM; speech-recognition improvement for spatially separated vs co-located conditions) than normal-hearing (NH) listeners, especially for symmetrically placed maskers that produce similar long-term target-to-masker ratios at the two ears. Two experiments examined possible causes of this deficit, including limited better-ear glimpsing (using speech information from the more advantageous ear in each time-frequency unit), limited binaural unmasking (using interaural differences to improve signal-in-noise detection), or limited spectral resolution. Listeners had NH (presented with unprocessed or vocoded stimuli) or BICIs. Experiment 1 compared natural symmetric maskers, idealized monaural better-ear masker (IMBM) stimuli that automatically performed better-ear glimpsing, and hybrid stimuli that added worse-ear information, potentially restoring binaural cues. BICI and NH-vocoded SRM was comparable to NH-unprocessed SRM for idealized stimuli but was 14%–22% lower for symmetric stimuli, suggesting limited better-ear glimpsing ability. Hybrid stimuli improved SRM for NH-unprocessed listeners but degraded SRM for BICI and NH-vocoded listeners, suggesting they experienced across-ear interference instead of binaural unmasking. In experiment 2, increasing the number of vocoder channels did not change NH-vocoded SRM. BICI SRM deficits likely reflect a combination of across-ear interference, limited better-ear glimpsing, and poorer binaural unmasking that stems from cochlear-implant-processing limitations other than reduced spectral resolution.
30

Ezzine, Kadria, Joseph Di Martino, and Mondher Frikha. "Any-to-One Non-Parallel Voice Conversion System Using an Autoregressive Conversion Model and LPCNet Vocoder." Applied Sciences 13, no. 21 (November 2, 2023): 11988. http://dx.doi.org/10.3390/app132111988.

Abstract:
We present an any-to-one voice conversion (VC) system, using an autoregressive model and LPCNet vocoder, aimed at enhancing the converted speech in terms of naturalness, intelligibility, and speaker similarity. As the name implies, non-parallel any-to-one voice conversion does not require paired source and target speeches and can be employed for arbitrary speech conversion tasks. Recent advancements in neural-based vocoders, such as WaveNet, have improved the efficiency of speech synthesis. However, in practice, we find that the trajectory of some generated waveforms is not consistently smooth, leading to occasional voice errors. To address this issue, we propose to use an autoregressive (AR) conversion model along with the high-fidelity LPCNet vocoder. This combination not only solves the problems of waveform fluidity but also produces more natural and clear speech, with the added capability of real-time speech generation. To precisely represent the linguistic content of a given utterance, we use speaker-independent PPG features (SI-PPG) computed from an automatic speech recognition (ASR) model trained on a multi-speaker corpus. Next, a conversion model maps the SI-PPG to the acoustic representations used as input features for the LPCNet. The proposed autoregressive structure enables our system to produce each prediction step’s output from the acoustic features predicted in the previous step. We evaluate the effectiveness of our system by performing any-to-one conversion pairs between native English speakers. Experimental results show that the proposed method outperforms state-of-the-art systems, producing higher speech quality and greater speaker similarity.
31

Wu, Ya Ting, Y. Y. Zhao, and Fei Yu. "An Improved Echo Cancellation Algorithm with Low Computational Complexity." Applied Mechanics and Materials 303-306 (February 2013): 2042–45. http://dx.doi.org/10.4028/www.scientific.net/amm.303-306.2042.

Abstract:
A low-complexity echo canceller integrated with a vocoder is proposed in this paper to speed up the convergence process. By making full use of the linear prediction parameters retrieved from the decoder and the voice activity detection feature of the vocoder, the new echo canceller avoids the need to calculate decorrelation filter coefficients and prewhiten the received signal separately. Simulation results show the performance improvement of the proposed algorithm in terms of convergence rate and echo return loss enhancement.
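For context, the conventional adaptive echo canceller that such vocoder-integrated schemes speed up is typically a normalized LMS filter. The sketch below is a textbook NLMS loop of my own, not the paper's algorithm, which additionally reuses linear-prediction parameters from the decoder to skip the separate decorrelation and prewhitening steps:

```python
import numpy as np

def nlms_echo_canceller(far, mic, taps=128, mu=0.5, eps=1e-6):
    """Adapt `taps` weights so the filtered far-end signal matches the
    echo in the microphone signal; the residual e is the echo-free output."""
    w = np.zeros(taps)
    e = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far[n - taps:n][::-1]               # recent far-end samples
        e[n] = mic[n] - w @ x                   # error = mic - echo estimate
        w += (mu / (x @ x + eps)) * e[n] * x    # normalized LMS update
    return e, w
```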
32

Dolson, Mark. "The Phase Vocoder: A Tutorial." Computer Music Journal 10, no. 4 (1986): 14. http://dx.doi.org/10.2307/3680093.

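Dolson's tutorial boils down to one loop: take the STFT, measure each bin's deviation from its expected per-hop phase advance to get an instantaneous frequency, and accumulate phase while reading analysis frames at a different rate. A compact time-stretching sketch of that idea (librosa also ships a ready-made librosa.phase_vocoder):

```python
import numpy as np
import librosa

def phase_vocoder(D, rate, hop):
    """Time-stretch an STFT matrix D (bins x frames) by factor 1/rate."""
    n_fft = 2 * (D.shape[0] - 1)
    # Expected phase advance per hop at each bin centre frequency.
    omega = 2 * np.pi * hop * np.arange(D.shape[0]) / n_fft
    steps = np.arange(0, D.shape[1] - 1, rate)      # fractional read positions
    out = np.empty((D.shape[0], len(steps)), dtype=complex)
    phase = np.angle(D[:, 0])
    for i, t in enumerate(steps):
        k, frac = int(t), t - int(t)
        mag = (1 - frac) * np.abs(D[:, k]) + frac * np.abs(D[:, k + 1])
        out[:, i] = mag * np.exp(1j * phase)
        # Wrapped phase difference -> instantaneous-frequency correction.
        dphi = np.angle(D[:, k + 1]) - np.angle(D[:, k]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase += omega + dphi                       # accumulate synthesis phase
    return out

y, sr = librosa.load(librosa.ex("trumpet"))
hop = 512
D = librosa.stft(y, n_fft=2048, hop_length=hop)
y_slow = librosa.istft(phase_vocoder(D, rate=0.5, hop=hop), hop_length=hop)
```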
33

Ketchum, Richard H. "Code excited linear predictive vocoder." Journal of the Acoustical Society of America 91, no. 6 (June 1992): 3594. http://dx.doi.org/10.1121/1.402803.

34

Pereira, M. A. T., and F. A. G. Ferreira. "Simulação de Um Vocoder Digital." Journal of Communication and Information Systems 2, no. 1 (December 30, 1987): 49–66. http://dx.doi.org/10.14209/jcis.1987.3.

35

Sidhu, Jaskirat, Li Xu, and Jing Yang. "Accent rating of noise- and tone-vocoded foreign-accented speech." Journal of the Acoustical Society of America 153, no. 3_supplement (March 1, 2023): A340. http://dx.doi.org/10.1121/10.0019078.

Abstract:
There are abundant acoustic-phonetic cues in speech signals for listeners to encode talker identity. However, speech signals in the real world are often less than optimal due to various adverse listening conditions. Vocoded speech is one type of simplified signal that has less spectral and/or temporal information in comparison to normal speech. The purpose of this study is to examine whether and how listeners’ judgment of talker accent is affected by noise and tone vocoding. Twelve Mandarin-accented English speakers with varying degrees of accentedness and two native English speakers were recorded reading “The Rainbow Passage.” The recorded speech samples from each talker were segmented into small sections that were randomly selected for noise- or tone-excited vocoder processing into 1, 2, 4, 8, and 16 channels. The vocoded and unprocessed speech samples were randomly presented to a group of normal-hearing, monolingual English listeners for accent rating. The listeners judged the degree of talker accent on a 9-point Likert scale, with “1” representing no accent and “9” representing an extremely strong accent. The data are still being collected and analyzed. Results and implications of the present study will be discussed.
36

Lynch, Michael P., Rebecca E. Eilers, D. Kimbrough Oller, Richard C. Urbano, and Patricia J. Pero. "Multisensory Narrative Tracking by a Profoundly Deaf Subject Using an Electrocutaneous Vocoder and a Vibrotactile Aid." Journal of Speech, Language, and Hearing Research 32, no. 2 (June 1989): 331–38. http://dx.doi.org/10.1044/jshr.3202.331.

Abstract:
A congenitally, profoundly deaf adult who had received 41 hours of tactual word recognition training in a previous study was assessed in tracking of connected discourse. This assessment was conducted in three phases. In the first phase, the subject used the Tacticon 1600 electrocutaneous vocoder to track a narrative in three conditions: (a) lipreading and aided hearing (L+H), (b) lipreading and tactual vocoder (L+TV), and (c) lipreading, tactual vocoder, and aided hearing (L+TV+H), Subject performance was significantly better in the L+TV+H condition than in the L+H condition, suggesting that the subject benefitted from the additional information provided by the tactual vocoder. In the second phase, the Tactaid II vibrotactile aid was used in three conditions: (a) lipreading alone, (b) lipreading and tactual aid (L+TA), and (c) lipreading, tactual aid, and aided hearing (L+TA+H). The subject was able to combine cues from the Tactaid II with those from lipreading and aided hearing. In the third phase, both tactual devices were used in six conditions: (a) lipreading alone (L), (b) lipreading and aided hearing (L+H), (c) lipreading and Tactaid II (L+TA), (d) lipreading and Tacticon 1600 (L+TV), (e) lipreading, Tactaid II, and aided hearing (L+TA+H), and (f) lipreading, Tacticon 1600, and aided hearing (L+TV+H). In this phase, only the Tactaid II significantly improved tracking performance over lipreading and aided hearing. Overall, improvement in tracking performance occurred within and across phases of this study.
37

Al-Radhi, Mohammed Salah, Tamás Gábor Csapó, and Géza Németh. "Continuous vocoder applied in deep neural network based voice conversion." Multimedia Tools and Applications 78, no. 23 (September 16, 2019): 33549–72. http://dx.doi.org/10.1007/s11042-019-08198-5.

Abstract:
In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using a deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we have proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. Therefore, this work introduces a new method by using a continuous F0 (contF0) in SVC to avoid alignment errors that may happen in voiced and unvoiced segments and can degrade the converted speech. Our contribution includes the following. (1) We integrate into the SVC framework the continuous vocoder, which provides an advanced model of the excitation signal, by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that the feed-forward deep neural network (FF-DNN) using our vocoder yields high-quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework, to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, have shown that the resulting converted speech with the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
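The core idea of the continuous vocoder, an F0 contour that is defined through unvoiced regions as well, can be approximated by interpolating a standard pitch tracker's output across unvoiced frames. A rough sketch (the paper's contF0 estimator is more sophisticated):

```python
import numpy as np
import librosa

y, sr = librosa.load(librosa.ex("trumpet"))   # stand-in audio
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C7"), sr=sr)

# pyin returns NaN for unvoiced frames; interpolate through them so
# every frame carries an F0 value, avoiding voiced/unvoiced discontinuities.
frames = np.arange(len(f0))
cont_f0 = np.interp(frames, frames[voiced], f0[voiced])
```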
38

Fodor, Ádám, László Kopácsi, Zoltán Ádám Milacski, and András Lőrincz. "Speech De-identification with Deep Neural Networks." Acta Cybernetica 25, no. 2 (December 7, 2021): 257–69. http://dx.doi.org/10.14232/actacyb.288282.

Abstract:
Cloud-based speech services are powerful practical tools but the privacy of the speakers raises important legal concerns when exposed to the Internet. We propose a deep neural network solution that removes personal characteristics from human speech by converting it to the voice of a Text-to-Speech (TTS) system before sending the utterance to the cloud. The network learns to transcode sequences of vocoder parameters, delta and delta-delta features of human speech to those of the TTS engine. We evaluated several TTS systems, vocoders and audio alignment techniques. We measured the performance of our method by (i) comparing the result of speech recognition on the de-identified utterances with the original texts, (ii) computing the Mel-Cepstral Distortion of the aligned TTS and the transcoded sequences, and (iii) questioning human participants in A-not-B, 2AFC and 6AFC tasks. Our approach achieves the level required by diverse applications.
39

Cychosz, Margaret, Kevin Xu, and Qian-Jie Fu. "Effects of spectral smearing on speech understanding and masking release in simulated bilateral cochlear implants." PLOS ONE 18, no. 11 (November 2, 2023): e0287728. http://dx.doi.org/10.1371/journal.pone.0287728.

Abstract:
Differences in spectro-temporal degradation may explain some variability in cochlear implant users’ speech outcomes. The present study employs vocoder simulations on listeners with typical hearing to evaluate how differences in degree of channel interaction across ears affects spatial speech recognition. Speech recognition thresholds and spatial release from masking were measured in 16 normal-hearing subjects listening to simulated bilateral cochlear implants. 16-channel sine-vocoded speech simulated limited, broad, or mixed channel interaction, in dichotic and diotic target-masker conditions, across ears. Thresholds were highest with broad channel interaction in both ears but improved when interaction decreased in one ear and again in both ears. Masking release was apparent across conditions. Results from this simulation study on listeners with typical hearing show that channel interaction may impact speech recognition more than masking release, and may have implications for the effects of channel interaction on cochlear implant users’ speech recognition outcomes.
40

Ming, Yan, Li Zhen Wang, and Xu Jiu Xia. "A Rate of 4kbps Vocoder Based on MELP." Advanced Materials Research 1030-1032 (September 2014): 1638–41. http://dx.doi.org/10.4028/www.scientific.net/amr.1030-1032.1638.

Abstract:
A 4 kbps vocoder based on MELP is presented in this paper. It uses parameter encoding and mixed-excitation technology to ensure speech quality. By adopting scalar quantization of the Line Spectrum Frequencies (LSF), the algorithm reduces storage and computational complexity. Meanwhile, the 4 kbps vocoder adds a new frame type, the transition frame. The classifier can reduce U/V decision errors and avoid excessive switching between voiced and unvoiced frames. A modified bit allocation table is introduced, and PESQ-MOS and coding-time tests show that the synthetic speech quality is improved and reaches communication quality.
41

Ao, Zhen, Feng Li, Qiang Ma, and Guiqing He. "Voice and Position Simultaneous Communication System Based on Beidou Navigation Constellation." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 38, no. 5 (October 2020): 1010–17. http://dx.doi.org/10.1051/jnwpu/20203851010.

Abstract:
Since China's Beidou has a unique two-way short-message communication capability that is not available in other navigation systems such as GPS, a 600 bps vocoder adapted to the Beidou short-message channel is developed. The vocoder adopts a sinusoidal-excitation linear prediction algorithm to achieve voice communication with clear quality. Furthermore, a coordinate compression algorithm for processing positioning information is designed to provide more transmission space for speech-encoded data. Based on the above results, a communication system is realized in which the Beidou navigation system alone is used to complete two-way secure voice and positioning transmission simultaneously. The system first uses a voice conversion program to convert the voice-coded data produced by the vocoder codec module into the Beidou short-message data format; the voice-code analysis program and the latitude-and-longitude analysis program are then used to parse the voice code and location information. Finally, voice communication and positioning transmission are verified experimentally on the Beidou short-message transceiver, and the subjective MOS test score indicates that the way is paved for practical use of Beidou short-message voice communication.
42

Poojary, Nigam R., and K. H. Ashish. "Text To Speech with Custom Voice." International Journal for Research in Applied Science and Engineering Technology 11, no. 4 (April 30, 2023): 4523–30. http://dx.doi.org/10.22214/ijraset.2023.51217.

Abstract:
The Text to Speech with Custom Voice system described in this work has vast applicability in numerous industries, including entertainment, education, and accessibility. The proposed text-to-speech (TTS) system is capable of generating speech audio in custom voices, even those not included in the training data. The system comprises a speaker encoder, a synthesizer, and a WaveRNN vocoder. Multiple speakers from a dataset of clean speech without transcripts are used to train the speaker encoder for a speaker verification process. The reference speech of the target speaker is used to create a fixed-dimensional embedding vector. Using the speaker embedding, the synthesizer network based on Tacotron2 creates a mel spectrogram from text, and the WaveRNN vocoder transforms the mel spectrogram into time-domain waveform samples. These waveform samples are converted to audio, which is the output of our work. The adaptable modular design enables external users to quickly integrate the Text to Speech with Custom Voice system into their products. Additionally, users can edit specific modules and pipeline phases in this work without changing the source code. To achieve the best performance, the speaker encoder, synthesizer, and vocoder must be trained on a variety of speaker datasets.
43

Taguchi, Tetsu. "Pattern matching vocoder using LSP parameters." Journal of the Acoustical Society of America 93, no. 3 (March 1993): 1676. http://dx.doi.org/10.1121/1.406754.

44

Manley, Harold J., and Joseph de Lellis. "Half duplex integral vocoder modem system." Journal of the Acoustical Society of America 79, no. 4 (April 1986): 1198–99. http://dx.doi.org/10.1121/1.393322.

45

Fischman, Rajmil. "The phase vocoder: theory and practice." Organised Sound 2, no. 2 (August 1997): 127–45. http://dx.doi.org/10.1017/s1355771897009060.

46

Dusheng, Wang, Zhang Jiankang, and Fan Changxin. "A single processor multi-rate vocoder." Journal of Electronics (China) 14, no. 1 (January 1997): 59–62. http://dx.doi.org/10.1007/s11767-996-1024-2.

47

Brooks, P. L., B. J. Frost, J. L. Mason, and D. M. Gibson. "Word and Feature Identification by Profoundly Deaf Teenagers Using the Queen's University Tactile Vocoder." Journal of Speech, Language, and Hearing Research 30, no. 1 (March 1987): 137–41. http://dx.doi.org/10.1044/jshr.3001.137.

Abstract:
The experiments described are part of an ongoing evaluation of the Queen's University Tactile Vocoder, a device that allows the acoustic waveform to be felt as a vibrational pattern on the skin. Two prelingually profoundly deaf teenagers reached criterion on a 50-word vocabulary (live voice, single speaker) using information obtained solely from the tactile vocoder with 28.5 and 24.0 hours of training. Immediately following word-learning experiments, subjects were asked to place 16 CVs into five phonemic categories (voiced & unvoiced stops, voiced & unvoiced fricatives, approximants). Average accuracy was 84.5%. Similar performance (89.6%) was obtained for placement of 12 VCs into four phonemic categories. Subjects were able to acquire some general rules about voicing and manner of articulation cues.
48

Gauer, Johannes, Anil Nagathil, Kai Eckel, Denis Belomestny, and Rainer Martin. "A versatile deep-neural-network-based music preprocessing and remixing scheme for cochlear implant listeners." Journal of the Acoustical Society of America 151, no. 5 (May 2022): 2975–86. http://dx.doi.org/10.1121/10.0010371.

Abstract:
While cochlear implants (CIs) have proven to restore speech perception to a remarkable extent, access to music remains difficult for most CI users. In this work, a methodology for the design of deep learning-based signal preprocessing strategies that simplify music signals and emphasize rhythmic information is proposed. It combines harmonic/percussive source separation and deep neural network (DNN) based source separation in a versatile source mixture model. Two different neural network architectures were assessed with regard to their applicability for this task. The method was evaluated with instrumental measures and in two listening experiments for both network architectures and six mixing presets. Normal-hearing subjects rated the signal quality of the processed signals compared to the original both with and without a vocoder which provides an approximation of the auditory perception in CI listeners. Four combinations of remix models and DNNs have been selected for an evaluation with vocoded signals and were all rated significantly better in comparison to the unprocessed signal. In particular, the two best-performing remix networks are promising candidates for further evaluation in CI listeners.
49

McGee, W. F., and Paul Merkley. "A Real-Time Logarithmic-Frequency Phase Vocoder." Computer Music Journal 15, no. 1 (1991): 20. http://dx.doi.org/10.2307/3680383.

50

Pope, S. P., B. Solberg, and R. W. Brodersen. "A single-chip linear-predictive-coding vocoder." IEEE Journal of Solid-State Circuits 22, no. 3 (June 1987): 479–87. http://dx.doi.org/10.1109/jssc.1987.1052754.
