Journal articles on the topic 'Vocoder'

To see the other types of publications on this topic, follow the link: Vocoder.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Vocoder.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Karoui, Chadlia, Chris James, Pascal Barone, David Bakhos, Mathieu Marx, and Olivier Macherey. "Searching for the Sound of a Cochlear Implant: Evaluation of Different Vocoder Parameters by Cochlear Implant Users With Single-Sided Deafness." Trends in Hearing 23 (January 2019): 233121651986602. http://dx.doi.org/10.1177/2331216519866029.

Full text
Abstract:
Cochlear implantation in subjects with single-sided deafness (SSD) offers a unique opportunity to directly compare the percepts evoked by a cochlear implant (CI) with those evoked acoustically. Here, nine SSD-CI users performed a forced-choice task evaluating the similarity of speech processed by their CI with speech processed by several vocoders presented to their healthy ear. In each trial, subjects heard two intervals: their CI followed by a certain vocoder in Interval 1 and their CI followed by a different vocoder in Interval 2. The vocoders differed either (i) in carrier type (sinusoidal [SINE], bandfiltered noise [NOISE], or pulse-spreading harmonic complex [PSHC]) or (ii) in frequency mismatch between the analysis and synthesis frequency ranges (no mismatch, and two frequency-mismatched conditions of 2 and 4 equivalent rectangular bandwidths [ERBs]). Subjects had to state in which of the two intervals the CI and vocoder sounds were more similar. Despite large intersubject variability, the PSHC vocoder was judged significantly more similar to the CI than the SINE or NOISE vocoders. Furthermore, the no-mismatch and 2-ERB mismatch vocoders were judged significantly more similar to the CI than the 4-ERB mismatch vocoder. The mismatch data were also interpreted by comparing spiral ganglion characteristic frequencies with electrode contact positions determined from postoperative computed tomography scans. Only one subject demonstrated a pattern of preference consistent with adaptation to the CI sound processor frequency-to-electrode allocation table, and two subjects showed possible partial adaptation. The subjects with adaptation patterns presented overall small and consistent frequency mismatches across their electrode arrays.
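The 2- and 4-ERB frequency mismatches above are defined on the ERB-number (Cam) scale. A minimal sketch of that scale using the standard Glasberg and Moore formula (an illustration only, not the authors' code; function names are ours):

```python
import math

def hz_to_cam(f_hz: float) -> float:
    """Glasberg & Moore (1990) ERB-number scale (Cam) for a frequency in Hz."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)

def cam_to_hz(cam: float) -> float:
    """Inverse of hz_to_cam: ERB-number back to Hz."""
    return (10.0 ** (cam / 21.4) - 1.0) / 4.37 * 1000.0

def shift_by_erbs(f_hz: float, n_erbs: float) -> float:
    """Shift a frequency upward by n_erbs on the ERB-number scale,
    mimicking a basalward analysis-to-synthesis mismatch."""
    return cam_to_hz(hz_to_cam(f_hz) + n_erbs)
```

For example, shifting a 1 kHz analysis frequency by 2 ERBs lands near 1.29 kHz, and by 4 ERBs near 1.66 kHz, which conveys how quickly a 4-ERB mismatch distorts the place-frequency map.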
APA, Harvard, Vancouver, ISO, and other styles
2

Roebel, Axel, and Frederik Bous. "Neural Vocoding for Singing and Speaking Voices with the Multi-Band Excited WaveNet." Information 13, no. 3 (February 23, 2022): 103. http://dx.doi.org/10.3390/info13030103.

Full text
Abstract:
The use of the mel spectrogram as a signal parameterization for voice generation is quite recent and linked to the development of neural vocoders. These are deep neural networks that allow reconstructing high-quality speech from a given mel spectrogram. While initially developed for speech synthesis, now neural vocoders have also been studied in the context of voice attribute manipulation, opening new means for voice processing in audio production. However, to be able to apply neural vocoders in real-world applications, two problems need to be addressed: (1) To support use in professional audio workstations, the computational complexity should be small, (2) the vocoder needs to support a large variety of speakers, differences in voice qualities, and a wide range of intensities potentially encountered during audio production. In this context, the present study will provide a detailed description of the Multi-band Excited WaveNet, a fully convolutional neural vocoder built around signal processing blocks. It will evaluate the performance of the vocoder when trained on a variety of multi-speaker and multi-singer databases, including an experimental evaluation of the neural vocoder trained on speech and singing voices. Addressing the problem of intensity variation, the study will introduce a new adaptive signal normalization scheme that allows for robust compensation for dynamic and static gain variations. Evaluations are performed using objective measures and a number of perceptual tests including different neural vocoder algorithms known from the literature. The results confirm that the proposed vocoder compares favorably to the state-of-the-art in its capacity to generalize to unseen voices and voice qualities. The remaining challenges will be discussed.
3

Ausili, Sebastian A., Bradford Backus, Martijn J. H. Agterberg, A. John van Opstal, and Marc M. van Wanrooij. "Sound Localization in Real-Time Vocoded Cochlear-Implant Simulations With Normal-Hearing Listeners." Trends in Hearing 23 (January 2019): 233121651984733. http://dx.doi.org/10.1177/2331216519847332.

Full text
Abstract:
Bilateral cochlear-implant (CI) users and single-sided deaf listeners with a CI are less effective at localizing sounds than normal-hearing (NH) listeners. This performance gap is due to the degradation of binaural and monaural sound localization cues, caused by a combination of device-related and patient-related issues. In this study, we targeted the device-related issues by measuring sound localization performance of 11 NH listeners, listening to free-field stimuli processed by a real-time CI vocoder. The use of a real-time vocoder is a new approach, which enables testing in a free-field environment. For the NH listening condition, all listeners accurately and precisely localized sounds according to a linear stimulus–response relationship with an optimal gain and a minimal bias both in the azimuth and in the elevation directions. In contrast, when listening with bilateral real-time vocoders, listeners tended to orient either to the left or to the right in azimuth and were unable to determine sound source elevation. When listening with an NH ear and a unilateral vocoder, localization was impoverished on the vocoder side but improved toward the NH side. Localization performance was also reflected by systematic variations in reaction times across listening conditions. We conclude that perturbation of interaural temporal cues, reduction of interaural level cues, and removal of spectral pinna cues by the vocoder impairs sound localization. Listeners seem to ignore cues that were made unreliable by the vocoder, leading to acute reweighting of available localization cues. We discuss how current CI processors prevent CI users from localizing sounds in everyday environments.
4

Wess, Jessica M., and Joshua G. W. Bernstein. "The Effect of Nonlinear Amplitude Growth on the Speech Perception Benefits Provided by a Single-Sided Vocoder." Journal of Speech, Language, and Hearing Research 62, no. 3 (March 25, 2019): 745–57. http://dx.doi.org/10.1044/2018_jslhr-h-18-0001.

Full text
Abstract:
Purpose: For listeners with single-sided deafness, a cochlear implant (CI) can improve speech understanding by giving the listener access to the ear with the better target-to-masker ratio (TMR; head shadow) or by providing interaural difference cues to facilitate the perceptual separation of concurrent talkers (squelch). CI simulations presented to listeners with normal hearing examined how these benefits could be affected by interaural differences in loudness growth in a speech-on-speech masking task. Method: Experiment 1 examined a target–masker spatial configuration where the vocoded ear had a poorer TMR than the nonvocoded ear. Experiment 2 examined the reverse configuration. Generic head-related transfer functions simulated free-field listening. Compression or expansion was applied independently to each vocoder channel (power-law exponents: 0.25, 0.5, 1, 1.5, or 2). Results: Compression reduced the benefit provided by the vocoder ear in both experiments. There was some evidence that expansion increased squelch in Experiment 1 but reduced the benefit in Experiment 2, where the vocoder ear provided a combination of head-shadow and squelch benefits. Conclusions: The effects of compression and expansion are interpreted in terms of envelope distortion and changes in the vocoded-ear TMR (for head shadow) or changes in perceived target–masker spatial separation (for squelch). The compression parameter is a candidate for clinical optimization to improve single-sided deafness CI outcomes.
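The power-law amplitude mapping described above can be sketched as a per-channel envelope operation. A minimal illustration (the peak-normalized form of the mapping is our assumption, not the authors' implementation):

```python
import numpy as np

def apply_power_law(envelope: np.ndarray, exponent: float) -> np.ndarray:
    """Apply a power-law amplitude mapping to a non-negative channel envelope.
    exponent < 1 compresses the dynamic range, exponent > 1 expands it,
    and exponent == 1 leaves the envelope unchanged (linear growth)."""
    peak = envelope.max()
    if peak == 0:
        return envelope
    # Normalize so the mapping preserves the envelope peak.
    return peak * (envelope / peak) ** exponent

env = np.array([0.1, 0.5, 1.0])
compressed = apply_power_law(env, 0.25)  # quiet samples are boosted
expanded = apply_power_law(env, 2.0)     # quiet samples are attenuated
```

With exponent 0.25, a sample at one-tenth of the channel peak rises to about 0.56 of the peak, while exponent 2 drops it to 0.01 of the peak, which is how compression and expansion distort envelope shape and interaural level relationships.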
5

Bosen, Adam K., and Michael F. Barry. "Serial Recall Predicts Vocoded Sentence Recognition Across Spectral Resolutions." Journal of Speech, Language, and Hearing Research 63, no. 4 (April 27, 2020): 1282–98. http://dx.doi.org/10.1044/2020_jslhr-19-00319.

Full text
Abstract:
Purpose The goal of this study was to determine how various aspects of cognition predict speech recognition ability across different levels of speech vocoding within a single group of listeners. Method We tested the ability of young adults ( N = 32) with normal hearing to recognize Perceptually Robust English Sentence Test Open-set (PRESTO) sentences that were degraded with a vocoder to produce different levels of spectral resolution (16, eight, and four carrier channels). Participants also completed tests of cognition (fluid intelligence, short-term memory, and attention), which were used as predictors of sentence recognition. Sentence recognition was compared across vocoder conditions, predictors were correlated with individual differences in sentence recognition, and the relationships between predictors were characterized. Results PRESTO sentence recognition performance declined with a decreasing number of vocoder channels, with no evident floor or ceiling performance in any condition. Individual ability to recognize PRESTO sentences was consistent relative to the group across vocoder conditions. Short-term memory, as measured with serial recall, was a moderate predictor of sentence recognition (ρ = 0.65). Serial recall performance was constant across vocoder conditions when measured with a digit span task. Fluid intelligence was marginally correlated with serial recall, but not sentence recognition. Attentional measures had no discernible relationship to sentence recognition and a marginal relationship with serial recall. Conclusions Verbal serial recall is a substantial predictor of vocoded sentence recognition, and this predictive relationship is independent of spectral resolution. In populations that show variable speech recognition outcomes, such as listeners with cochlear implants, it should be possible to account for the independent effects of spectral resolution and verbal serial recall in their speech recognition ability. 
Supplemental Material https://doi.org/10.23641/asha.12021051
6

Shi, Yong Peng. "Research and Implementation of MELP Algorithm Based on TMS320VC5509A." Advanced Materials Research 934 (May 2014): 239–44. http://dx.doi.org/10.4028/www.scientific.net/amr.934.239.

Full text
Abstract:
A MELP vocoder based on the TMS320VC5509A DSP is designed in this article. First, the MELP algorithm is explained; then the modeling approach and the process of realizing it on the DSP are proposed. Finally, a functional simulation of the encoding and decoding system is completed. The experimental results show that the synthesized signals fit the original ones well and that the quality of the speech produced by the vocoder is good.
7

Clark, Graeme, and Peter J. Blamey. "Electrotactile vocoder." Journal of the Acoustical Society of America 90, no. 5 (November 1991): 2880. http://dx.doi.org/10.1121/1.401830.

Full text
8

Goupell, Matthew J., Garrison T. Draves, and Ruth Y. Litovsky. "Recognition of vocoded words and sentences in quiet and multi-talker babble with children and adults." PLOS ONE 15, no. 12 (December 29, 2020): e0244632. http://dx.doi.org/10.1371/journal.pone.0244632.

Full text
Abstract:
A vocoder is used to simulate cochlear-implant sound processing in normal-hearing listeners. Typically, there is rapid improvement in vocoded speech recognition, but it is unclear if the improvement rate differs across age groups and speech materials. Children (8–10 years) and young adults (18–26 years) were trained and tested over 2 days (4 hours) on recognition of eight-channel noise-vocoded words and sentences, in quiet and in the presence of multi-talker babble at signal-to-noise ratios of 0, +5, and +10 dB. Children achieved poorer performance than adults in all conditions, for both word and sentence recognition. With training, vocoded speech recognition improvement rates were not significantly different between children and adults, suggesting that improvement in learning how to process speech cues degraded via vocoding does not differ developmentally across these age groups and types of speech materials. Furthermore, this result confirms that the acutely measured age difference in vocoded speech recognition persists after extended training.
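An eight-channel noise vocoder of the kind used in such studies can be sketched as band-splitting, envelope extraction, and noise modulation. A simplified illustration (the filter orders, log-spaced channel edges, and 160 Hz envelope cutoff are our assumptions, not the study's exact processing):

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def noise_vocode(x, fs, n_channels=8, f_lo=100.0, f_hi=8000.0, env_cut=160.0):
    """Minimal noise vocoder: band-split, extract envelopes, modulate noise."""
    rng = np.random.default_rng(0)
    # Log-spaced channel edges between f_lo and f_hi.
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)
    env_sos = butter(2, env_cut, btype="lowpass", fs=fs, output="sos")
    out = np.zeros_like(x, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        band = sosfiltfilt(band_sos, x)
        # Envelope: rectify, then low-pass filter.
        env = np.clip(sosfiltfilt(env_sos, np.abs(band)), 0.0, None)
        # Carrier: noise filtered into the same band, scaled by the envelope.
        noise = sosfiltfilt(band_sos, rng.standard_normal(len(x)))
        out += env * noise
    # Match the overall RMS to the input.
    rms_in = np.sqrt(np.mean(x ** 2))
    rms_out = np.sqrt(np.mean(out ** 2))
    return out * (rms_in / rms_out) if rms_out > 0 else out
```

Reducing `n_channels` degrades spectral resolution while the temporal envelopes survive, which is the degradation these recognition studies manipulate.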
9

Ding, Yuntao, Rangzhuoma Cai, and Baojia Gong. "Tibetan speech synthesis based on an improved neural network." MATEC Web of Conferences 336 (2021): 06012. http://dx.doi.org/10.1051/matecconf/202133606012.

Full text
Abstract:
Nowadays, Tibetan speech synthesis based on neural networks has become the mainstream synthesis method. Among existing approaches, the Griffin-Lim vocoder is widely used in Tibetan speech synthesis because of its relative simplicity. Aiming at the problem of the low fidelity of the Griffin-Lim vocoder, this paper uses a WaveNet vocoder instead of Griffin-Lim for Tibetan speech synthesis. The paper first uses convolution operations and an attention mechanism to extract sequence features, then uses linear projection and a feature amplification module to predict the mel spectrogram, and finally uses the WaveNet vocoder to synthesize the speech waveform. Experimental data show that our model performs better in Tibetan speech synthesis.
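The Griffin-Lim procedure that the paper replaces with WaveNet reconstructs a waveform from a magnitude spectrogram by iterating between the time and frequency domains, keeping the given magnitude and re-estimating only the phase. A compact sketch using SciPy's STFT (the frame parameters and iteration count are arbitrary illustrative choices):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, fs=16000, nperseg=512, n_iter=32, seed=0):
    """Reconstruct a waveform from an STFT magnitude by iteratively
    enforcing the magnitude and re-estimating the phase (Griffin & Lim, 1984)."""
    rng = np.random.default_rng(seed)
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
        _, _, spec = stft(x, fs=fs, nperseg=nperseg)
        # Keep the estimated phase, discard the estimated magnitude.
        spec = spec[:, : mag.shape[1]]
        phase = np.exp(1j * np.angle(spec))
    _, x = istft(mag * phase, fs=fs, nperseg=nperseg)
    return x
```

Because the phase is only approximated, Griffin-Lim output carries audible artifacts; a neural vocoder such as WaveNet instead learns the waveform distribution directly, which is the fidelity gap the paper targets.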
10

Eng, Erica, Can Xu, Sarah Medina, Fan-Yin Cheng, René Gifford, and Spencer Smith. "Objective discrimination of bimodal speech using the frequency following response: A machine learning approach." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A91. http://dx.doi.org/10.1121/10.0015651.

Full text
Abstract:
Bimodal hearing, which combines a cochlear implant (CI) with a contralateral hearing aid, provides significant speech recognition benefits relative to a monaural CI. Factors predicting bimodal benefit remain poorly understood but may involve extracting fundamental frequency and/or formant information from the non-implanted ear. This study investigated whether neural responses (frequency following responses, FFRs) to simulated bimodal signals can be (1) accurately classified using machine learning and (2) used to predict perceptual bimodal benefit. We hypothesized that FFR classification accuracy would improve with increasing acoustic bandwidth due to greater fundamental and formant frequency access. Three vowels (/e/, /i/, and /ʊ/) with identical fundamental frequencies were manipulated to create five bimodal simulations (vocoder in right ear, lowpass filtered in left ear): Vocoder-only, Vocoder +125 Hz, Vocoder +250 Hz, Vocoder +500 Hz, and Vocoder +750 Hz. Perceptual performance on the BKB-SIN test was also measured using the same five configurations. FFR classification accuracy improved with increasing bimodal acoustic bandwidth. Furthermore, FFR bimodal benefit predicted behavioral bimodal benefit. These results indicate that the FFR may be useful in objectively verifying and tuning bimodal configurations.
11

Shinohara, Yasuaki. "Japanese pitch-accent perception of noise-vocoded sine-wave speech." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A175. http://dx.doi.org/10.1121/10.0015940.

Full text
Abstract:
A previous study has demonstrated that speech intelligibility is improved for a tone language when sine-wave speech is noise-vocoded, because noise-vocoding eliminates the quasi-periodicity of sine-wave speech. This study examined whether identification accuracy of Japanese pitch-accent words increases after sine-wave speech is noise-vocoded. The results showed that the Japanese listeners’ identification accuracy significantly increased, but their discrimination accuracy did not show a significant difference between the sine-wave speech and noise-vocoded sine-wave speech conditions. These results suggest that Japanese listeners can auditorily discriminate minimal-pair words using any acoustic cues in both conditions, but quasi-periodicity is eliminated by noise-vocoding so that the Japanese listeners’ identification accuracy increases in the noise-vocoded sine-wave speech condition. The same results were not observed when another way of noise-vocoding was used in a previous study, suggesting that the quasi-periodicity of sine-wave speech needs to be adequately eliminated by a noise-vocoder to show a significant difference in identification.
12

Jacobs, Paul E. "Variable rate vocoder." Journal of the Acoustical Society of America 103, no. 4 (April 1998): 1700. http://dx.doi.org/10.1121/1.421053.

Full text
13

Griffin, Daniel W., and Jae S. Lim. "Multiband excitation vocoder." IEEE Transactions on Acoustics, Speech, and Signal Processing 36, no. 8 (August 1988): 1223–35. http://dx.doi.org/10.1109/29.1651.

Full text
14

Shinohara, Yasuaki. "Perception of noise-vocoded sine-wave speech of Japanese pitch-accent words." JASA Express Letters 2, no. 8 (August 2022): 085204. http://dx.doi.org/10.1121/10.0013423.

Full text
Abstract:
The present study examined whether the identification accuracy of Japanese pitch-accent words increased after the sine-wave speech underwent noise vocoding, which eliminates the quasi-periodicity of the sine-wave speech. The results demonstrated that Japanese listeners were better at discriminating sine-wave speech than noise-vocoded sine-wave speech, with no significant difference in identification between the two conditions. They identify sine-wave pitch-accent words to some extent using acoustic cues other than the pitch accent. The noise vocoder used in the present study might not have been substantially effective for Japanese listeners to show a significant difference in the identification between the two conditions.
15

Hodges, Aaron, Raymond L. Goldsworthy, Matthew B. Fitzgerald, and Takako Fujioka. "Transfer effects of discrete tactile mapping of musical pitch on discrimination of vocoded stimuli." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A229. http://dx.doi.org/10.1121/10.0016101.

Full text
Abstract:
Many studies have found benefits of using the somatosensory modality to augment sound information for individuals with hearing loss. However, few studies have explored the use of multiple regions of the body sensitive to vibrotactile stimulation to convey discrete F0 information, which is important for music perception. This study explored whether a mapping of multiple finger patterns associated with musical notes can be learned quickly and transferred to the discrimination of vocoded auditory stimuli. Each of eight musical diatonic-scale notes was associated with a unique pattern of finger digits 2–5 on the dominant hand, to which a pneumatic tactile stimulation apparatus was attached. The study consisted of a pre- and post-test with a learning phase in between. During the learning phase, normal-hearing participants had to identify common nursery-song melodies presented with simultaneous auditory-tactile stimuli for about 10 min, using non-vocoded (original) audio. Pre- and post-tests examined stimulus discrimination in four conditions: original audio + tactile, tactile only, vocoded audio only, and vocoded audio + tactile. The audio vocoder used a four-channel cochlear-implant simulation. Our results demonstrated that audio-tactile learning improved participants' performance on the vocoded audio + tactile task. Performance in the tactile-only condition also improved significantly, indicating rapid learning of the audio-tactile mapping and its effective transfer.
16

Niu, Qing Yu, Qiang Li, and Qin Jun Shu. "Research and Analysis on the Implementation of MELP Algorithm on DSP." Advanced Materials Research 1030-1032 (September 2014): 1755–59. http://dx.doi.org/10.4028/www.scientific.net/amr.1030-1032.1755.

Full text
Abstract:
This paper briefly analyses the principle of the MELP vocoder algorithm; TI's TMS320C5509 (C5509) DSP is selected as the implementation platform for the 2.4 kbps MELP speech-coding algorithm. To ensure rational and efficient utilization of the limited memory resources, the paper introduces a frame-based processing method for the MELP algorithm and gives a thorough analysis of how to configure the memory space of the selected DSP by analyzing its memory structure and considering the specific requirements of the MELP vocoder algorithm. Finally, the paper gives the memory configuration used during the implementation of the MELP vocoder on the C5509.
17

Tamati, Terrin N., Lars Bakker, Stefan Smeenk, Almut Jebens, Thomas Koelewijn, and Deniz Başkent. "Pupil response to familiar and unfamiliar talkers in the recognition of noise-vocoded speech." Journal of the Acoustical Society of America 151, no. 4 (April 2022): A264. http://dx.doi.org/10.1121/10.0011285.

Full text
Abstract:
In some challenging listening conditions, listeners are more accurate at recognizing speech produced by a familiar talker compared to unfamiliar talkers. However, previous studies have found little to no talker-familiarity benefit in the recognition of noise-vocoded speech, potentially due to limitations in the talker-specific details conveyed in noise-vocoded signals. Although no strong effect on performance has been observed, listening to a familiar talker may reduce the listening effort experienced. The current study used pupillometry to assess how talker familiarity could impact the amount of effort required to recognize noise-vocoded speech. Four groups of normal-hearing listeners completed talker familiarity training, each with a different talker. Then, listeners repeated sentences produced by the familiar (training) talker and three unfamiliar talkers. Sentences were mixed with multi-talker babble and were processed with an 8-channel noise vocoder; the SNR was set to each participant's 50% correct performance level. Preliminary results demonstrate no overall talker-familiarity benefit across training groups. Examining each training group separately showed differences in pupil response for familiar and unfamiliar talkers, but the direction and size of the effect depended on the training talker. These preliminary findings suggest that normal-hearing listeners make use of limited talker-specific details in the recognition of noise-vocoded speech.
18

Asyraf, Muhammad A., and Dhany Arifianto. "Effect of electric-acoustic cochlear implant stimulation and coding strategies on spatial cues of speech signals in reverberant room." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A195. http://dx.doi.org/10.1121/10.0016005.

Full text
Abstract:
A comparison of spatial-cue changes across different setups and coding strategies used in cochlear implants (CI) is investigated. In this experiment, we implement three voice-coder setups: bilateral CI, bimodal CI, and electro-acoustic stimulation (EAS). Two well-known coding strategies are used: continuous interleaved sampling (CIS) and spectral peak (SPEAK). Speech signals are convolved with the appropriate binaural room impulse response (BRIR), creating reverberant spatial stimuli. Five reverberant conditions (including anechoic) were applied to the stimuli. Interaural level and time differences (ILD and ITD) are evaluated objectively and subjectively, and their relationship with speech intelligibility is observed. A prior objective evaluation with CIS reveals that clarity (C50) is a more important factor in spatial-cue change than reverberation time. Vocoded conditions (bilateral CI) show an increase in ILD (compression has not yet been implemented in the vocoder processing) as the ITD value deviates further from (decreases relative to) the midline. Reverberation degrades intelligibility at rates that depend on the C50 value, in both unvocoded and vocoded conditions. In the vocoded condition, the reduction in spatial cues was also accompanied by a reduction in the intelligibility of the spatial stimuli.
19

Taguchi, Tetsu. "Formant pattern matching vocoder." Journal of the Acoustical Society of America 91, no. 3 (March 1992): 1790. http://dx.doi.org/10.1121/1.403749.

Full text
20

Taguchi, Tetsu. "Multi‐pulse type vocoder." Journal of the Acoustical Society of America 88, no. 6 (December 1990): 2913. http://dx.doi.org/10.1121/1.399635.

Full text
21

Ali, Hussnain, Nursadul Mamun, Avamarie Bruggeman, Ram Charan M. Chandra Shekar, Juliana N. Saba, and John H. L. Hansen. "The CCi-MOBILE Vocoder." Journal of the Acoustical Society of America 144, no. 3 (September 2018): 1872. http://dx.doi.org/10.1121/1.5068238.

Full text
22

Hillenbrand, James M., and Robert A. Houde. "A damped sinewave vocoder." Journal of the Acoustical Society of America 104, no. 3 (September 1998): 1835. http://dx.doi.org/10.1121/1.424405.

Full text
23

Liu, Ludy. "Fixed point vocoder implementation." Computer Standards & Interfaces 20, no. 6-7 (March 1999): 464–65. http://dx.doi.org/10.1016/s0920-5489(99)91011-5.

Full text
24

Gibbs, Bobby E., Joshua G. W. Bernstein, Douglas S. Brungart, and Matthew J. Goupell. "Effects of better-ear glimpsing, binaural unmasking, and spectral resolution on spatial release from masking in cochlear-implant users." Journal of the Acoustical Society of America 152, no. 2 (August 2022): 1230–46. http://dx.doi.org/10.1121/10.0013746.

Full text
Abstract:
Bilateral cochlear-implant (BICI) listeners obtain less spatial release from masking (SRM; speech-recognition improvement for spatially separated vs co-located conditions) than normal-hearing (NH) listeners, especially for symmetrically placed maskers that produce similar long-term target-to-masker ratios at the two ears. Two experiments examined possible causes of this deficit, including limited better-ear glimpsing (using speech information from the more advantageous ear in each time-frequency unit), limited binaural unmasking (using interaural differences to improve signal-in-noise detection), or limited spectral resolution. Listeners had NH (presented with unprocessed or vocoded stimuli) or BICIs. Experiment 1 compared natural symmetric maskers, idealized monaural better-ear masker (IMBM) stimuli that automatically performed better-ear glimpsing, and hybrid stimuli that added worse-ear information, potentially restoring binaural cues. BICI and NH-vocoded SRM was comparable to NH-unprocessed SRM for idealized stimuli but was 14%–22% lower for symmetric stimuli, suggesting limited better-ear glimpsing ability. Hybrid stimuli improved SRM for NH-unprocessed listeners but degraded SRM for BICI and NH-vocoded listeners, suggesting they experienced across-ear interference instead of binaural unmasking. In experiment 2, increasing the number of vocoder channels did not change NH-vocoded SRM. BICI SRM deficits likely reflect a combination of across-ear interference, limited better-ear glimpsing, and poorer binaural unmasking that stems from cochlear-implant-processing limitations other than reduced spectral resolution.
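Better-ear glimpsing of the kind idealized by the IMBM can be illustrated on magnitude spectrograms: each time-frequency unit is taken from whichever ear has the better target-to-masker ratio. A toy sketch (the array shapes and additive mixing rule are simplifications of ours, not the study's signal processing):

```python
import numpy as np

def ideal_better_ear_mask(target_left, target_right, masker_left, masker_right):
    """For each time-frequency unit of the magnitude spectrograms, select the
    ear with the better target-to-masker ratio (a simplified idealized
    monaural better-ear masker, IMBM)."""
    eps = 1e-12  # avoid division by zero in silent units
    tmr_left = target_left / (masker_left + eps)
    tmr_right = target_right / (masker_right + eps)
    use_left = tmr_left >= tmr_right
    # Combine target + masker mixtures, taking each unit from the better ear.
    mix_left = target_left + masker_left
    mix_right = target_right + masker_right
    return np.where(use_left, mix_left, mix_right)
```

Presenting such a pre-glimpsed monaural signal removes the need for the listener to perform the across-ear selection, which is how the experiment isolates that ability.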
25

Wu, Ya Ting, Y. Y. Zhao, and Fei Yu. "An Improved Echo Cancellation Algorithm with Low Computational Complexity." Applied Mechanics and Materials 303-306 (February 2013): 2042–45. http://dx.doi.org/10.4028/www.scientific.net/amm.303-306.2042.

Full text
Abstract:
A low-complexity echo canceller integrated with a vocoder is proposed in this paper to speed up the convergence process. By making full use of the linear prediction parameters retrieved from the decoder and the voice activity detection feature of the vocoder, the new echo canceller avoids the need to calculate decorrelation filter coefficients and to prewhiten the received signal separately. Simulation results show the performance improvement of the proposed algorithm in terms of convergence rate and echo return loss enhancement.
26

Dolson, Mark. "The Phase Vocoder: A Tutorial." Computer Music Journal 10, no. 4 (1986): 14. http://dx.doi.org/10.2307/3680093.

Full text
27

Ketchum, Richard H. "Code excited linear predictive vocoder." Journal of the Acoustical Society of America 91, no. 6 (June 1992): 3594. http://dx.doi.org/10.1121/1.402803.

Full text
28

Pereira, M. A. T., and F. A. G. Ferreira. "Simulação de Um Vocoder Digital." Journal of Communication and Information Systems 2, no. 1 (December 30, 1987): 49–66. http://dx.doi.org/10.14209/jcis.1987.3.

Full text
29

Fodor, Ádám, László Kopácsi, Zoltán Ádám Milacski, and András Lőrincz. "Speech De-identification with Deep Neural Networks." Acta Cybernetica 25, no. 2 (December 7, 2021): 257–69. http://dx.doi.org/10.14232/actacyb.288282.

Full text
Abstract:
Cloud-based speech services are powerful practical tools but the privacy of the speakers raises important legal concerns when exposed to the Internet. We propose a deep neural network solution that removes personal characteristics from human speech by converting it to the voice of a Text-to-Speech (TTS) system before sending the utterance to the cloud. The network learns to transcode sequences of vocoder parameters, delta and delta-delta features of human speech to those of the TTS engine. We evaluated several TTS systems, vocoders and audio alignment techniques. We measured the performance of our method by (i) comparing the result of speech recognition on the de-identified utterances with the original texts, (ii) computing the Mel-Cepstral Distortion of the aligned TTS and the transcoded sequences, and (iii) questioning human participants in A-not-B, 2AFC and 6AFC tasks. Our approach achieves the level required by diverse applications.
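The Mel-Cepstral Distortion used in evaluation (ii) above has a standard closed form over aligned mel-cepstral frame sequences. A small sketch (frame alignment is assumed to have been done already, e.g., by the audio alignment techniques the paper evaluates):

```python
import numpy as np

def mel_cepstral_distortion(c_ref, c_syn):
    """Mel-Cepstral Distortion in dB between two time-aligned mel-cepstral
    sequences of shape (frames, order). The 0th (energy) coefficient is
    excluded, as is common practice."""
    diff = c_ref[:, 1:] - c_syn[:, 1:]
    const = 10.0 / np.log(10.0) * np.sqrt(2.0)
    # Per-frame Euclidean distance, scaled to dB, averaged over frames.
    return float(np.mean(const * np.sqrt(np.sum(diff ** 2, axis=1))))
```

Identical sequences give 0 dB; lower values indicate that the transcoded parameters are spectrally closer to the aligned TTS reference.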
30

Lynch, Michael P., Rebecca E. Eilers, D. Kimbrough Oller, Richard C. Urbano, and Patricia J. Pero. "Multisensory Narrative Tracking by a Profoundly Deaf Subject Using an Electrocutaneous Vocoder and a Vibrotactile Aid." Journal of Speech, Language, and Hearing Research 32, no. 2 (June 1989): 331–38. http://dx.doi.org/10.1044/jshr.3202.331.

Full text
Abstract:
A congenitally, profoundly deaf adult who had received 41 hours of tactual word recognition training in a previous study was assessed in tracking of connected discourse. This assessment was conducted in three phases. In the first phase, the subject used the Tacticon 1600 electrocutaneous vocoder to track a narrative in three conditions: (a) lipreading and aided hearing (L+H), (b) lipreading and tactual vocoder (L+TV), and (c) lipreading, tactual vocoder, and aided hearing (L+TV+H). Subject performance was significantly better in the L+TV+H condition than in the L+H condition, suggesting that the subject benefited from the additional information provided by the tactual vocoder. In the second phase, the Tactaid II vibrotactile aid was used in three conditions: (a) lipreading alone, (b) lipreading and tactual aid (L+TA), and (c) lipreading, tactual aid, and aided hearing (L+TA+H). The subject was able to combine cues from the Tactaid II with those from lipreading and aided hearing. In the third phase, both tactual devices were used in six conditions: (a) lipreading alone (L), (b) lipreading and aided hearing (L+H), (c) lipreading and Tactaid II (L+TA), (d) lipreading and Tacticon 1600 (L+TV), (e) lipreading, Tactaid II, and aided hearing (L+TA+H), and (f) lipreading, Tacticon 1600, and aided hearing (L+TV+H). In this phase, only the Tactaid II significantly improved tracking performance over lipreading and aided hearing. Overall, improvement in tracking performance occurred within and across phases of this study.
APA, Harvard, Vancouver, ISO, and other styles
31

Al-Radhi, Mohammed Salah, Tamás Gábor Csapó, and Géza Németh. "Continuous vocoder applied in deep neural network based voice conversion." Multimedia Tools and Applications 78, no. 23 (September 16, 2019): 33549–72. http://dx.doi.org/10.1007/s11042-019-08198-5.

Full text
Abstract:
In this paper, a novel vocoder is proposed for a Statistical Voice Conversion (SVC) framework using deep neural network, where multiple features from the speech of two speakers (source and target) are converted acoustically. Traditional conversion methods focus on the prosodic feature represented by the discontinuous fundamental frequency (F0) and the spectral envelope. Studies have shown that speech analysis/synthesis solutions play an important role in the overall quality of the converted voice. Recently, we have proposed a new continuous vocoder, originally for statistical parametric speech synthesis, in which all parameters are continuous. Therefore, this work introduces a new method by using a continuous F0 (contF0) in SVC to avoid alignment errors that may happen in voiced and unvoiced segments and can degrade the converted speech. Our contribution includes the following. (1) We integrate into the SVC framework the continuous vocoder, which provides an advanced model of the excitation signal, by converting its contF0, maximum voiced frequency, and spectral features. (2) We show that the feed-forward deep neural network (FF-DNN) using our vocoder yields high quality conversion. (3) We apply a geometric approach to spectral subtraction (GA-SS) in the final stage of the proposed framework, to improve the signal-to-noise ratio of the converted speech. Our experimental results, using two male speakers and one female speaker, have shown that the resulting converted speech with the proposed SVC technique is similar to the target speaker and gives state-of-the-art performance as measured by objective evaluation and subjective listening tests.
APA, Harvard, Vancouver, ISO, and other styles
32

Ming, Yan, Li Zhen Wang, and Xu Jiu Xia. "A Rate of 4kbps Vocoder Based on MELP." Advanced Materials Research 1030-1032 (September 2014): 1638–41. http://dx.doi.org/10.4028/www.scientific.net/amr.1030-1032.1638.

Full text
Abstract:
A 4 kbps vocoder based on MELP is presented in this paper. It uses parameter encoding and mixed-excitation technology to ensure speech quality. By adopting scalar quantization of Line Spectrum Frequencies (LSF), the algorithm reduces storage and computational complexity. The 4 kbps vocoder also adds a new frame type, the transition frame. The classifier reduces U/V decision errors and avoids excessive switching between voiced and unvoiced frames. A modified bit allocation table is introduced, and PESQ-MOS and coding-time tests show that the synthetic speech quality is improved and reaches communication quality.
APA, Harvard, Vancouver, ISO, and other styles
33

Ao, Zhen, Feng Li, Qiang Ma, and Guiqing He. "Voice and Position Simultaneous Communication System Based on Beidou Navigation Constellation." Xibei Gongye Daxue Xuebao/Journal of Northwestern Polytechnical University 38, no. 5 (October 2020): 1010–17. http://dx.doi.org/10.1051/jnwpu/20203851010.

Full text
Abstract:
Because China's Beidou system has a unique two-way short-message communication capability not available in other navigation systems such as GPS, a 600 bps vocoder adapted to the Beidou short-message channel is developed. The vocoder adopts a sinusoidal-excitation linear prediction algorithm to achieve voice communication of clear quality. Furthermore, a coordinate compression algorithm for processing positioning information is designed to provide more transmission space for speech-encoded data. On this basis, a communication system is realized in which the Beidou navigation system alone supports simultaneous two-way secure voice and positioning transmission. The system first uses a voice conversion program to convert the voice-coded data produced by the vocoder codec module into the Beidou short-message data format; the voice code analysis program and the latitude-and-longitude analysis program then parse the voice code and location information. Finally, voice communication and positioning transmission are verified experimentally on the Beidou short-message transceiver, and the subjective MOS test scores indicate that the way is paved for practical use of Beidou short-message voice communication.
APA, Harvard, Vancouver, ISO, and other styles
34

Taguchi, Tetsu. "Pattern matching vocoder using LSP parameters." Journal of the Acoustical Society of America 93, no. 3 (March 1993): 1676. http://dx.doi.org/10.1121/1.406754.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Manley, Harold J., and Joseph de Lellis. "Half duplex integral vocoder modem system." Journal of the Acoustical Society of America 79, no. 4 (April 1986): 1198–99. http://dx.doi.org/10.1121/1.393322.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

FISCHMAN, RAJMIL. "The phase vocoder: theory and practice." Organised Sound 2, no. 2 (August 1997): 127–45. http://dx.doi.org/10.1017/s1355771897009060.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Dusheng, Wang, Zhang Jiankang, and Fan Changxin. "A single processor multi-rate vocoder." Journal of Electronics (China) 14, no. 1 (January 1997): 59–62. http://dx.doi.org/10.1007/s11767-996-1024-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Brooks, P. L., B. J. Frost, J. L. Mason, and D. M. Gibson. "Word and Feature Identification by Profoundly Deaf Teenagers Using the Queen's University Tactile Vocoder." Journal of Speech, Language, and Hearing Research 30, no. 1 (March 1987): 137–41. http://dx.doi.org/10.1044/jshr.3001.137.

Full text
Abstract:
The experiments described are part of an ongoing evaluation of the Queen's University Tactile Vocoder, a device that allows the acoustic waveform to be felt as a vibrational pattern on the skin. Two prelingually profoundly deaf teenagers reached criterion on a 50-word vocabulary (live voice, single speaker) using information obtained solely from the tactile vocoder with 28.5 and 24.0 hours of training. Immediately following word-learning experiments, subjects were asked to place 16 CVs into five phonemic categories (voiced & unvoiced stops, voiced & unvoiced fricatives, approximants). Average accuracy was 84.5%. Similar performance (89.6%) was obtained for placement of 12 VCs into four phonemic categories. Subjects were able to acquire some general rules about voicing and manner of articulation cues.
APA, Harvard, Vancouver, ISO, and other styles
39

Gauer, Johannes, Anil Nagathil, Kai Eckel, Denis Belomestny, and Rainer Martin. "A versatile deep-neural-network-based music preprocessing and remixing scheme for cochlear implant listeners." Journal of the Acoustical Society of America 151, no. 5 (May 2022): 2975–86. http://dx.doi.org/10.1121/10.0010371.

Full text
Abstract:
While cochlear implants (CIs) have proven to restore speech perception to a remarkable extent, access to music remains difficult for most CI users. In this work, a methodology for the design of deep learning-based signal preprocessing strategies that simplify music signals and emphasize rhythmic information is proposed. It combines harmonic/percussive source separation and deep neural network (DNN) based source separation in a versatile source mixture model. Two different neural network architectures were assessed with regard to their applicability for this task. The method was evaluated with instrumental measures and in two listening experiments for both network architectures and six mixing presets. Normal-hearing subjects rated the signal quality of the processed signals compared to the original both with and without a vocoder which provides an approximation of the auditory perception in CI listeners. Four combinations of remix models and DNNs have been selected for an evaluation with vocoded signals and were all rated significantly better in comparison to the unprocessed signal. In particular, the two best-performing remix networks are promising candidates for further evaluation in CI listeners.
APA, Harvard, Vancouver, ISO, and other styles
40

Mohammed, Zinah J., and Abdulkareem A. Kadhim. "A Comparative Study of Speech Coding Techniques for Electro Larynx Speech Production." Iraqi Journal of Information and Communication Technology 5, no. 1 (April 29, 2022): 31–41. http://dx.doi.org/10.31987/ijict.5.1.185.

Full text
Abstract:
Speech coding is a method of obtaining a compact representation of speech signals for efficient storage and efficient transmission over band-limited wired or wireless channels. This is usually achieved with an acceptable representation and the fewest possible bits, without degrading perceptual quality. A number of speech coding methods have already been developed, and various speech coding algorithms for speech analysis and synthesis are in use. This paper compares selected coding methods for speech signals produced by an Electro Larynx (EL) device, a device used by cancer patients whose vocal laryngeal cords have been removed. The methods considered are Residual-Excited Linear Prediction (RELP), Code Excited Linear Prediction (CELP), Algebraic Code Excited Linear Prediction (ACELP), Phase Vocoders based on Wavelet Transform (PVWT), Channel Vocoders based on Wavelet Transform (CVWT), and Phase Vocoders based on the Dual-Tree Rational-Dilation Complex Wavelet Transform (PVDT-RADWT). The aim here is to select the best coding approach based on the quality of the reproduced speech. The test signals are speech recorded either directly from normal speakers or produced by the EL device. The performance of each method is evaluated using both objective and subjective listening tests. The results indicate that the PVWT and ACELP coders perform better than the other methods, achieving about 40 dB SNR and a PESQ score of 3 for EL speech, and 75 dB SNR with a PESQ score of 3.5 for normal speech, respectively.
APA, Harvard, Vancouver, ISO, and other styles
41

Apoux, Frédéric, Brittney L. Carter, and Eric W. Healy. "Effect of Dual-Carrier Processing on the Intelligibility of Concurrent Vocoded Sentences." Journal of Speech, Language, and Hearing Research 61, no. 11 (November 8, 2018): 2804–13. http://dx.doi.org/10.1044/2018_jslhr-h-17-0234.

Full text
Abstract:
Purpose The goal of this study was to examine the role of carrier cues in sound source segregation and the possibility to enhance the intelligibility of 2 sentences presented simultaneously. Dual-carrier (DC) processing (Apoux, Youngdahl, Yoho, & Healy, 2015) was used to introduce synthetic carrier cues in vocoded speech. Method Listeners with normal hearing heard sentences processed either with a DC or with a traditional single-carrier (SC) vocoder. One group was asked to repeat both sentences in a sentence pair (Experiment 1). The other group was asked to repeat only 1 sentence of the pair and was provided additional segregation cues involving onset asynchrony (Experiment 2). Results Both experiments showed that not only is the “target” sentence more intelligible in DC compared with SC, but the “background” sentence intelligibility is equally enhanced. The participants did not benefit from the additional segregation cues. Conclusions The data showed a clear benefit of using a distinct carrier to convey each sentence (i.e., DC processing). Accordingly, the poor speech intelligibility in noise typically observed with SC-vocoded speech may be partly attributed to the envelope of independent sound sources sharing the same carrier. Moreover, this work suggests that noise reduction may not be the only viable option to improve speech intelligibility in noise for users of cochlear implants. Alternative approaches aimed at enhancing sound source segregation such as DC processing may help to improve speech intelligibility while preserving and enhancing the background.
APA, Harvard, Vancouver, ISO, and other styles
42

McGee, W. F., and Paul Merkley. "A Real-Time Logarithmic-Frequency Phase Vocoder." Computer Music Journal 15, no. 1 (1991): 20. http://dx.doi.org/10.2307/3680383.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Pope, S. P., B. Solberg, and R. W. Brodersen. "A single-chip linear-predictive-coding vocoder." IEEE Journal of Solid-State Circuits 22, no. 3 (June 1987): 479–87. http://dx.doi.org/10.1109/jssc.1987.1052754.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Yoneguchi, Ryoichi, and Takahiro Murakami. "A Phase Vocoder without Requiring Phase Unwrapping." IEEJ Transactions on Electronics, Information and Systems 138, no. 4 (2018): 352–59. http://dx.doi.org/10.1541/ieejeiss.138.352.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Rakowski, Kathleen, Christine Brenner, and Janet M. Weisenberger. "Evaluation of a 32‐channel electrotactile vocoder." Journal of the Acoustical Society of America 86, S1 (November 1989): S83. http://dx.doi.org/10.1121/1.2027686.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Cowan, Robert S. C. "Electrotactile vocoder using handset with stimulating electrodes." Journal of the Acoustical Society of America 114, no. 5 (2003): 2545. http://dx.doi.org/10.1121/1.1634117.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Fette, Bruce A., and Cynthia A. Jaskie. "Low bit rate vocoder means and method." Journal of the Acoustical Society of America 96, no. 4 (October 1994): 2622. http://dx.doi.org/10.1121/1.410048.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Xiao, Qiang, Liang Chen, and Chao Geng. "Low Bit Rate Speech Coding Using Lattice Vector Quantization and Time-Scale Modification." Advanced Materials Research 383-390 (November 2011): 5111–16. http://dx.doi.org/10.4028/www.scientific.net/amr.383-390.5111.

Full text
Abstract:
This paper presents a low bit rate speech coder based on predictive lattice vector quantization (PLVQ) and time-scale modification (TSM). The coding model of the proposed vocoder is built on MELP, in which bit rate reduction is achieved by taking advantage of PLVQ and TSM techniques. PLVQ is used to encode the speech line spectrum pair (LSP) parameters; it has lower implementation complexity than multi-stage vector quantization (MSVQ) and, moreover, requires no memory for codebook storage. With our speech database, PLVQ can save up to 4 bits/frame compared to unstructured-codebook MSVQ. TSM can change the speed of a speech signal while preserving its perceptual characteristics. By applying TSM as pre- and post-processing, speech coding at a bit rate of about 1.1 kbps can be achieved without modifying the vocoder structure.
APA, Harvard, Vancouver, ISO, and other styles
49

Patro, Chhayakanta, and Lisa Lucks Mendel. "Gated Word Recognition by Postlingually Deafened Adults With Cochlear Implants: Influence of Semantic Context." Journal of Speech, Language, and Hearing Research 61, no. 1 (January 22, 2018): 145–58. http://dx.doi.org/10.1044/2017_jslhr-h-17-0141.

Full text
Abstract:
Purpose: The main goal of this study was to investigate the minimum amount of sensory information required to recognize spoken words (isolation points [IPs]) in listeners with cochlear implants (CIs) and investigate facilitative effects of semantic contexts on the IPs. Method: Listeners with CIs as well as those with normal hearing (NH) participated in the study. In Experiment 1, the CI users listened to unprocessed (full-spectrum) stimuli and individuals with NH listened to full-spectrum or vocoder processed speech. IPs were determined for both groups who listened to gated consonant-nucleus-consonant words that were selected based on lexical properties. In Experiment 2, the role of semantic context on IPs was evaluated. Target stimuli were chosen from the Revised Speech Perception in Noise corpus based on the lexical properties of the final words. Results: The results indicated that spectrotemporal degradations impacted IPs for gated words adversely, and CI users as well as participants with NH listening to vocoded speech had longer IPs than participants with NH who listened to full-spectrum speech. In addition, there was a clear disadvantage due to lack of semantic context in all groups regardless of the spectral composition of the target speech (full spectrum or vocoded). Finally, we showed that CI users (and users with NH with vocoded speech) can overcome such word processing difficulties with the help of semantic context and perform as well as listeners with NH. Conclusion: Word recognition occurs even before the entire word is heard because listeners with NH associate an acoustic input with its mental representation to understand speech. The results of this study provide insight into the role of spectral degradation on the processing of spoken words in isolation and the potential benefits of semantic context. These results may also explain why CI users rely substantially on semantic context.
APA, Harvard, Vancouver, ISO, and other styles
50

Dickinson, Kay. "‘Believe’? Vocoders, digitalised female identity and camp." Popular Music 20, no. 3 (October 2001): 333–47. http://dx.doi.org/10.1017/s0261143001001532.

Full text
Abstract:
In the two or so years since Cher's ‘Believe’ rather unexpectedly became the number one selling British single of 1998, the vocoder effect – which arguably snagged the track such widespread popularity – grew into one of the safest, maybe laziest, means of guaranteeing chart success. Since then, vocoder-wielding tracks such as Eiffel 65's ‘Blue (Da Ba Dee)’ and Sonique's ‘It Feels So Good’ have held fast at the slippery British number one spot for longer than the now-standard one week, despite their artists' relative obscurity. Even chart mainstays such as Madonna (‘Music’), Victoria Beckham (with the help of True Steppers and Dane Bowers) (‘Out of Your Mind’), Steps (‘Summer of Love’) and Kylie Minogue (the back-ups in ‘Spinning Around’) turned to this strange, automated-sounding gimmick which also proved to be a favourite with the poppier UK garage outfits (you can hear it on hits such as Lonyo/Comme Ci Comme Ca's ‘Summer of Love’, for example).
APA, Harvard, Vancouver, ISO, and other styles