Journal articles on the topic 'Statistical Parametric Speech Synthesizer'

To see the other types of publications on this topic, follow the link: Statistical Parametric Speech Synthesizer.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Statistical Parametric Speech Synthesizer.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Szklanny, Krzysztof, and Jakub Lachowicz. "Implementing a Statistical Parametric Speech Synthesis System for a Patient with Laryngeal Cancer." Sensors 22, no. 9 (April 21, 2022): 3188. http://dx.doi.org/10.3390/s22093188.

Abstract:
Total laryngectomy, i.e., the surgical removal of the larynx, has a profound influence on a patient’s quality of life. The procedure results in the loss of the natural voice, which constitutes a significant socio-psychological problem for the patient. The main aim of the study was to develop a statistical parametric speech synthesis system for a patient with laryngeal cancer, on the basis of the patient’s speech samples recorded shortly before the surgery, and to check whether it was possible to generate speech of a quality close to that of the original recordings. The recording made use of a representative corpus of the Polish language, consisting of 2150 sentences. The recorded voice showed signs of dysphonia, which was confirmed by the auditory-perceptual RBH scale (roughness, breathiness, hoarseness) and by acoustical analysis using the AVQI (Acoustic Voice Quality Index). The speech synthesis model was trained using the Merlin repository. Twenty-five experts participated in MUSHRA listening tests, rating the synthetic voice at 69.4 on a 0–100 scale relative to the professional voice-over talent recording, which is a very good result. The authors compared the quality of this synthetic voice to another synthetic-speech model trained on the same corpus, but with speech samples recorded by a voice-over talent. The same experts rated that voice at 63.63, meaning the synthetic voice of the patient with laryngeal cancer obtained a higher score than that of the talent-voice recordings. The method thus enabled the creation of a statistical parametric speech synthesizer for patients awaiting total laryngectomy. As a result, the solution could improve the quality of life as well as the mental wellbeing of the patient.
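The MUSHRA scores reported above (69.4 vs. 63.63) are means over expert ratings on a 0–100 scale. As a minimal illustration of that aggregation only, the sketch below averages hypothetical listener ratings per system; the numbers are invented, not the paper's data.

```python
# Aggregate hypothetical MUSHRA listening-test ratings (0-100 scale) per
# system and compare systems by their mean score, as in the abstract above.
from statistics import mean

# Hypothetical ratings from five listeners; NOT the study's actual data.
ratings = {
    "patient_synthetic": [72, 65, 70, 68, 71],
    "talent_synthetic": [60, 66, 63, 61, 65],
}

mean_scores = {system: mean(r) for system, r in ratings.items()}
best = max(mean_scores, key=mean_scores.get)
print(mean_scores, best)
```

With these invented ratings, the patient's synthetic voice would come out ahead, mirroring the comparison described in the abstract.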
2

Chee Yong, Lau, Oliver Watts, and Simon King. "Combining Lightly-supervised Learning and User Feedback to Construct and Improve a Statistical Parametric Speech Synthesizer for Malay." Research Journal of Applied Sciences, Engineering and Technology 11, no. 11 (December 15, 2015): 1227–32. http://dx.doi.org/10.19026/rjaset.11.2229.

3

Coto-Jiménez, Marvin. "Discriminative Multi-Stream Postfilters Based on Deep Learning for Enhancing Statistical Parametric Speech Synthesis." Biomimetics 6, no. 1 (February 7, 2021): 12. http://dx.doi.org/10.3390/biomimetics6010012.

Abstract:
Statistical parametric speech synthesis based on Hidden Markov Models (HMM) has been an important technique for the production of artificial voices, due to its ability to produce results with high intelligibility and sophisticated features such as voice conversion and accent modification with a small footprint, particularly for low-resource languages where deep learning-based techniques remain unexplored. Despite this progress, the quality of HMM-based results does not reach that of the predominant approaches, based on unit selection of speech segments or on deep learning. One proposal to improve the quality of HMM-based speech has been to incorporate postfiltering stages, which aim to increase quality while preserving the advantages of the process. In this paper, we present a new approach to postfiltering synthesized voices through the application of discriminative postfilters built from several long short-term memory (LSTM) deep neural networks. Our motivation stems from modeling a specific mapping from synthesized to natural speech on those segments corresponding to voiced or unvoiced sounds, due to the different qualities of those sounds and the distinct degradation HMM-based voices can present on each. The paper analyses the discriminative postfilters obtained using five voices, evaluated using three objective measures, including the Mel cepstral distance, and subjective tests. The results indicate the advantages of the discriminative postfilters in comparison with the HTS voice and the non-discriminative postfilters.
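The Mel cepstral distance mentioned above is a standard objective measure for comparing synthesized and natural speech. A minimal sketch of the usual per-frame formula, MCD = (10/ln 10) · sqrt(2 · Σ (c_d − c′_d)²), computed over mel-cepstral coefficients (c0 conventionally excluded by the caller):

```python
# Per-frame mel-cepstral distance (MCD) in dB between two mel-cepstral
# coefficient vectors; identical frames give zero distance.
import math

def mel_cepstral_distance(frame_a, frame_b):
    squared = sum((a - b) ** 2 for a, b in zip(frame_a, frame_b))
    return (10.0 / math.log(10)) * math.sqrt(2.0 * squared)

print(mel_cepstral_distance([0.1, 0.2, -0.3], [0.1, 0.2, -0.3]))  # 0.0
```

In practice the per-frame values are averaged over time-aligned frames of the synthetic and natural utterances.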
4

Coto-Jiménez, Marvin. "Improving Post-Filtering of Artificial Speech Using Pre-Trained LSTM Neural Networks." Biomimetics 4, no. 2 (May 28, 2019): 39. http://dx.doi.org/10.3390/biomimetics4020039.

Abstract:
Several researchers have contemplated deep learning-based post-filters to increase the quality of statistical parametric speech synthesis; these perform a mapping from the synthetic speech to the natural speech, considering the different parameters separately and trying to reduce the gap between them. Long Short-Term Memory (LSTM) neural networks have been applied successfully for this purpose, but there are still many aspects to improve in the results and in the process itself. In this paper, we introduce a new pre-training approach for the LSTM, with the objective of enhancing the quality of the synthesized speech, particularly in the spectrum, in a more efficient manner. Our approach begins with an auto-associative training of one LSTM network, which is then used as an initialization for the post-filters. We show the advantages of this initialization for enhancing the Mel-frequency cepstral parameters of synthetic speech. Results show that, in most cases, this initialization achieves better results in enhancing the statistical parametric speech spectrum than the common random initialization of the networks.
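The pretraining idea above can be sketched in miniature: first train a model auto-associatively (to reconstruct natural features from natural features), then use those weights to initialize a post-filter that maps synthetic features to natural ones. This is a toy illustration with a linear model and synthetic random data standing in for the paper's LSTM and speech features, not the authors' implementation.

```python
# Toy sketch: auto-associative pretraining as initialization for a postfilter.
import numpy as np

rng = np.random.default_rng(0)
natural = rng.normal(size=(200, 8))                      # fake "natural" features
mix = rng.normal(size=(8, 8))
synthetic = 0.3 * natural @ mix + 0.05 * rng.normal(size=(200, 8))

def train(W, x, y, lr=0.01, steps=300):
    """Gradient descent on mean squared error of the linear map x @ W -> y."""
    for _ in range(steps):
        grad = x.T @ (x @ W - y) / len(x)
        W = W - lr * grad
    return W

# Step 1: auto-associative pretraining (reconstruct natural from natural).
W = train(np.zeros((8, 8)), natural, natural)
# Step 2: fine-tune the pretrained weights as a postfilter (synthetic -> natural).
loss_before = np.mean((synthetic @ W - natural) ** 2)
W = train(W, synthetic, natural)
loss_after = np.mean((synthetic @ W - natural) ** 2)
print(loss_after < loss_before)
```

Fine-tuning from the pretrained weights reduces the postfilter's error on this toy data; the paper's point is that such an initialization behaves better than random initialization for the real LSTM post-filters.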
5

Trinh, Son, and Kiem Hoang. "HMM-Based Vietnamese Speech Synthesis." International Journal of Software Innovation 3, no. 4 (October 2015): 33–47. http://dx.doi.org/10.4018/ijsi.2015100103.

Abstract:
In this paper, improving the naturalness of HMM-based speech synthesis for the Vietnamese language is described. With this synthesis method, trajectories of speech parameters are generated from trained Hidden Markov Models, and a final speech waveform is synthesized from those speech parameters. The main objective of the development is to achieve maximum naturalness in the output speech through three key points. First, the system uses a high-quality recorded Vietnamese speech database appropriate for training, especially in the statistical parametric model approach. Second, prosodic information such as tone, POS (part of speech), and features based on characteristics of the Vietnamese language are added to ensure the quality of the synthetic speech. Third, the system uses STRAIGHT, which has shown its ability to produce high-quality voice manipulation and has been successfully incorporated into HMM-based speech synthesis. The results collected show that the speech produced by our system scores best when compared with the other Vietnamese TTS systems trained from the same speech data.
6

Zen, Heiga, Keiichi Tokuda, and Alan W. Black. "Statistical parametric speech synthesis." Speech Communication 51, no. 11 (November 2009): 1039–64. http://dx.doi.org/10.1016/j.specom.2009.04.004.

7

Ekpenyong, Moses, Eno-Abasi Urua, Oliver Watts, Simon King, and Junichi Yamagishi. "Statistical parametric speech synthesis for Ibibio." Speech Communication 56 (January 2014): 243–51. http://dx.doi.org/10.1016/j.specom.2013.02.003.

8

Chen, Sin‐Horng, Saga Chang, and Su‐Min Lee. "A statistical model based fundamental frequency synthesizer for Mandarin speech." Journal of the Acoustical Society of America 92, no. 1 (July 1992): 114–20. http://dx.doi.org/10.1121/1.404276.

9

Takahashi, Satoshi, Yasuaki Satoh, Takeshi Ohno, and Katsuhiko Shirai. "Statistical modeling of dynamic spectral patterns for a speech synthesizer." Journal of the Acoustical Society of America 84, S1 (November 1988): S23. http://dx.doi.org/10.1121/1.2026230.

10

King, Simon. "An introduction to statistical parametric speech synthesis." Sadhana 36, no. 5 (October 2011): 837–52. http://dx.doi.org/10.1007/s12046-011-0048-y.

11

Maia, Ranniery, Masami Akamine, and Mark J. F. Gales. "Complex cepstrum for statistical parametric speech synthesis." Speech Communication 55, no. 5 (June 2013): 606–18. http://dx.doi.org/10.1016/j.specom.2012.12.008.

12

Shannon, M., Heiga Zen, and W. Byrne. "Autoregressive Models for Statistical Parametric Speech Synthesis." IEEE Transactions on Audio, Speech, and Language Processing 21, no. 3 (March 2013): 587–97. http://dx.doi.org/10.1109/tasl.2012.2227740.

13

Holtse, Peter, and Anders Olsen. "SPL: A speech synthesis programming language." Annual Report of the Institute of Phonetics University of Copenhagen 19 (January 1, 1985): 1–42. http://dx.doi.org/10.7146/aripuc.v19i.131806.

Abstract:
This report describes the first version of a high-level computer programming language for experiments with synthetic speech. In SPL, a context-sensitive parser is programmed to recognize linguistic constructs in an input string. Both the structural and phonetic descriptions of the recognized structures may be modified under program control. The final output of an SPL program is a data stream capable of driving a parametric speech synthesizer. The notation used is based on the principles known from Chomsky and Halle's "The Sound Pattern of English". This means that in principle all linguistic constructs are programmed in segmental units. However, SPL provides certain macro facilities for more complicated units such as syllables or words.
14

Yong. "LOW FOOTPRINT HIGH INTELLIGIBILITY MALAY SPEECH SYNTHESIZER BASED ON STATISTICAL DATA." Journal of Computer Science 10, no. 2 (February 1, 2014): 316–24. http://dx.doi.org/10.3844/jcssp.2014.316.324.

15

Khudoyberdiev, Khurshed A. "The Algorithms of Tajik Speech Synthesis by Syllable." ITM Web of Conferences 35 (2020): 07003. http://dx.doi.org/10.1051/itmconf/20203507003.

Abstract:
This article is devoted to the development of a prototype computer synthesizer of Tajik speech from text. The need for such a synthesizer arises because its analogues for other languages not only help people with visual and speech impairments, but also find more and more application in communication technology and in information and reference systems. In the future, such programs will take their proper place in the broad acoustic dialogue of humans with automatic machines and robotics in various fields of human activity. The article describes the prototype Tajik text-to-speech synthesizer developed by the author, which is constructed on the principle of a concatenative synthesizer in which the syllable is chosen as the speech unit; this, in turn, requires the most complete possible description of the variety of Tajik syllables. To study the patterns of the Tajik language associated with the concept of the syllable, the concept of the "syllabic structure of the word" was introduced. The statistical distribution of these structures was obtained, i.e., a correspondence was established between the syllabic structures of words and the frequencies of their occurrence in Tajik texts. An algorithm for breaking Tajik words into syllables is proposed and implemented as a computer program, and a solution to the problem of synthesizing Tajik speech from an arbitrary text is presented. The article describes the computer implementation of the algorithm for the synchronization of words, numbers, characters, and text. For each syllable, the corresponding sound realization is extracted from the "syllable-sound" database, and the sound of the word is then synthesized from the extracted elements.
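The abstract above splits words into syllables before looking each one up in a "syllable-sound" database. As a minimal sketch of one such splitting strategy, the toy syllabifier below starts a new syllable before each consonant-plus-vowel onset; the vowel inventory and the rule are hypothetical simplifications, not the paper's actual Tajik rules.

```python
# Toy syllabifier: break before a consonant that opens a new CV syllable.
# Vowel set and rule are hypothetical, not the paper's Tajik phonology.
VOWELS = set("aeiou")

def syllabify(word):
    syllables, current = [], ""
    for i, ch in enumerate(word):
        starts_cv = ch not in VOWELS and i + 1 < len(word) and word[i + 1] in VOWELS
        if current and starts_cv:
            syllables.append(current)
            current = ""
        current += ch
    if current:
        syllables.append(current)
    return syllables

print(syllabify("salom"))  # ['sa', 'lom'] under these toy rules
```

Each resulting syllable would then index into the recorded syllable database for concatenation.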
16

Fagel, Sascha. "Merging methods of speech visualization." ZAS Papers in Linguistics 40 (January 1, 2005): 19–32. http://dx.doi.org/10.21248/zaspil.40.2005.255.

Abstract:
The author presents MASSY, the MODULAR AUDIOVISUAL SPEECH SYNTHESIZER. The system combines two approaches to visual speech synthesis. Two control models are implemented: a (data-based) di-viseme model and a (rule-based) dominance model, both of which produce control commands in a parameterized articulation space. Analogously, two visualization methods are implemented: an image-based (video-realistic) face model and a 3D synthetic head. Both face models can be driven by both the data-based and the rule-based articulation model. The high-level visual speech synthesis generates a sequence of control commands for the visible articulation. For every virtual articulator (articulation parameter), the 3D synthetic face model defines a set of displacement vectors for the vertices of the 3D objects of the head. The vertices of the 3D synthetic head are then moved by linear combinations of these displacement vectors to visualize articulation movements. For the image-based video synthesis, a single reference image is deformed to fit the facial properties derived from the control commands. Facial feature points and facial displacements have to be defined for the reference image. The algorithm can also use an image database with appropriately annotated facial properties; an example database was built automatically from video recordings. Both the 3D synthetic face and the image-based face generate visual speech that is capable of increasing the intelligibility of audible speech. Other well-known image-based audiovisual speech synthesis systems, like MIKETALK and VIDEO REWRITE, concatenate pre-recorded single images or video sequences, respectively. Parametric talking heads like BALDI control a parametric face with a parametric articulation model. The presented system demonstrates the compatibility of parametric and data-based visual speech synthesis approaches.
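The 3D head described above moves vertices by linear combinations of per-articulator displacement vectors. A minimal numpy sketch of that idea, with a tiny hypothetical mesh and invented articulator names and values:

```python
# Move mesh vertices by a weighted sum of per-articulator displacement fields.
import numpy as np

base_vertices = np.zeros((4, 3))  # a tiny hypothetical 4-vertex mesh
displacements = {
    "jaw_open": np.array([[0.0, -1.0, 0.0]] * 4),   # hypothetical articulators
    "lip_round": np.array([[0.0, 0.0, 1.0]] * 4),
}

def articulate(base, fields, weights):
    """Return vertices displaced by the weighted sum of articulation fields."""
    out = base.copy()
    for name, w in weights.items():
        out += w * fields[name]
    return out

mesh = articulate(base_vertices, displacements, {"jaw_open": 0.5, "lip_round": 0.2})
print(mesh[0])  # first vertex moved to (0, -0.5, 0.2)
```

Animating articulation then amounts to varying the weight vector over time according to the control commands.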
17

Saito, Yuki, Shinnosuke Takamichi, and Hiroshi Saruwatari. "Statistical Parametric Speech Synthesis Incorporating Generative Adversarial Networks." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 1 (January 2018): 84–96. http://dx.doi.org/10.1109/taslp.2017.2761547.

18

Koriyama, Tomoki, and Takao Kobayashi. "Statistical Parametric Speech Synthesis Using Deep Gaussian Processes." IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, no. 5 (May 2019): 948–59. http://dx.doi.org/10.1109/taslp.2019.2905167.

19

Liu, Zheng-Chen, Zhen-Hua Ling, and Li-Rong Dai. "Statistical Parametric Speech Synthesis Using Generalized Distillation Framework." IEEE Signal Processing Letters 25, no. 5 (May 2018): 695–99. http://dx.doi.org/10.1109/lsp.2018.2819886.

20

Zen, Heiga, Mark J. F. Gales, Yoshihiko Nankaku, and Keiichi Tokuda. "Product of Experts for Statistical Parametric Speech Synthesis." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 3 (March 2012): 794–805. http://dx.doi.org/10.1109/tasl.2011.2165280.

21

Brumberg, Jonathan S., and Kevin M. Pitt. "Motor-Induced Suppression of the N100 Event-Related Potential During Motor Imagery Control of a Speech Synthesizer Brain–Computer Interface." Journal of Speech, Language, and Hearing Research 62, no. 7 (July 15, 2019): 2133–40. http://dx.doi.org/10.1044/2019_jslhr-s-msc18-18-0198.

Abstract:
Purpose Speech motor control relies on neural processes for generating sensory expectations using an efference copy mechanism to maintain accurate productions. The N100 auditory event-related potential (ERP) has been identified as a possible neural marker of the efference copy with a reduced amplitude during active listening while speaking when compared to passive listening. This study investigates N100 suppression while controlling a motor imagery speech synthesizer brain–computer interface (BCI) with instantaneous auditory feedback to determine whether similar mechanisms are used for monitoring BCI-based speech output that may both support BCI learning through existing speech motor networks and be used as a clinical marker for the speech network integrity in individuals without severe speech and physical impairments. Method The motor-induced N100 suppression is examined based on data from 10 participants who controlled a BCI speech synthesizer using limb motor imagery. We considered listening to auditory target stimuli (without motor imagery) in the BCI study as passive listening and listening to BCI-controlled speech output (with motor imagery) as active listening since audio output depends on imagined movements. The resulting ERP was assessed for statistical significance using a mixed-effects general linear model. Results Statistically significant N100 ERP amplitude differences were observed between active and passive listening during the BCI task. Post hoc analyses confirm the N100 amplitude was suppressed during active listening. Conclusion Observation of the N100 suppression suggests motor planning brain networks are active as participants control the BCI synthesizer, which may aid speech BCI mastery.
22

Přibil, Jiří, Anna Přibilová, and Jindřich Matoušek. "Automatic statistical evaluation of quality of unit selection speech synthesis with different prosody manipulations." Journal of Electrical Engineering 71, no. 2 (April 1, 2020): 78–86. http://dx.doi.org/10.2478/jee-2020-0012.

Abstract:
Quality of speech synthesis is a crucial issue in comparison of various text-to-speech (TTS) systems. We proposed a system for automatic evaluation of speech quality by statistical analysis of temporal features (time duration, phrasing, and time structuring of an analysed sentence) together with standard spectral and prosodic features. This system was successfully tested on sentences produced by a unit selection speech synthesizer with a male as well as a female voice using two different approaches to prosody manipulation. Experiments have shown that for correct, sharp, and stable results all three types of speech features (spectral, prosodic, and temporal) are necessary. Furthermore, the number of used statistical parameters has a significant impact on the correctness and precision of the evaluated results. It was also demonstrated that the stability of the whole evaluation process is improved by enlarging the used speech material. Finally, the functionality of the proposed system was verified by comparison of the results with those of the standard listening test.
23

Wang, Xin, Shinji Takaki, and Junichi Yamagishi. "Autoregressive Neural F0 Model for Statistical Parametric Speech Synthesis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 8 (August 2018): 1406–19. http://dx.doi.org/10.1109/taslp.2018.2828650.

24

Koriyama, Tomoki, Takashi Nose, and Takao Kobayashi. "Statistical Parametric Speech Synthesis Based on Gaussian Process Regression." IEEE Journal of Selected Topics in Signal Processing 8, no. 2 (April 2014): 173–83. http://dx.doi.org/10.1109/jstsp.2013.2283461.

25

Tao, Jianhua, Keikichi Hirose, Keiichi Tokuda, Alan W. Black, and Simon King. "Introduction to the Issue on Statistical Parametric Speech Synthesis." IEEE Journal of Selected Topics in Signal Processing 8, no. 2 (April 2014): 170–72. http://dx.doi.org/10.1109/jstsp.2014.2309416.

26

Cai, Ming-Qi, Zhen-Hua Ling, and Li-Rong Dai. "Statistical parametric speech synthesis using a hidden trajectory model." Speech Communication 72 (September 2015): 149–59. http://dx.doi.org/10.1016/j.specom.2015.05.008.

27

Saheer, Lakshmi, John Dines, and Philip N. Garner. "Vocal Tract Length Normalization for Statistical Parametric Speech Synthesis." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 7 (September 2012): 2134–48. http://dx.doi.org/10.1109/tasl.2012.2198058.

28

AL-RADHI, Mohammed Salah, Tamás Gábor CSAPÓ, and Géza NÉMETH. "Continuous Noise Masking Based Vocoder for Statistical Parametric Speech Synthesis." IEICE Transactions on Information and Systems E103.D, no. 5 (May 1, 2020): 1099–107. http://dx.doi.org/10.1587/transinf.2019edp7167.

29

Wang, Xin, Shinji Takaki, and Junichi Yamagishi. "Neural Source-Filter Waveform Models for Statistical Parametric Speech Synthesis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 28 (2020): 402–15. http://dx.doi.org/10.1109/taslp.2019.2956145.

30

Reddy, M. Kiran, and K. Sreenivasa Rao. "Excitation modelling using epoch features for statistical parametric speech synthesis." Computer Speech & Language 60 (March 2020): 101029. http://dx.doi.org/10.1016/j.csl.2019.101029.

31

Achanta, Sivanand, and Suryakanth V. Gangashetty. "Deep Elman recurrent neural networks for statistical parametric speech synthesis." Speech Communication 93 (October 2017): 31–42. http://dx.doi.org/10.1016/j.specom.2017.08.003.

32

Yu, Kai, and Steve Young. "Continuous F0 Modeling for HMM Based Statistical Parametric Speech Synthesis." IEEE Transactions on Audio, Speech, and Language Processing 19, no. 5 (July 2011): 1071–79. http://dx.doi.org/10.1109/tasl.2010.2076805.

33

Zen, H., N. Braunschweiler, S. Buchholz, M. J. F. Gales, K. Knill, S. Krstulovic, and J. Latorre. "Statistical Parametric Speech Synthesis Based on Speaker and Language Factorization." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 6 (August 2012): 1713–24. http://dx.doi.org/10.1109/tasl.2012.2187195.

34

Adiga, Nagaraj, and S. R. M. Prasanna. "Acoustic Features Modelling for Statistical Parametric Speech Synthesis: A Review." IETE Technical Review 36, no. 2 (March 21, 2018): 130–49. http://dx.doi.org/10.1080/02564602.2018.1432422.

35

Barra-Chicote, Roberto, Junichi Yamagishi, Simon King, Juan Manuel Montero, and Javier Macias-Guarasa. "Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech." Speech Communication 52, no. 5 (May 2010): 394–404. http://dx.doi.org/10.1016/j.specom.2009.12.007.

36

Chee Yong, Lau, and Tan Tian Swee. "Statistical Parametric Speech Synthesis of Malay Language using Found Training Data." Research Journal of Applied Sciences, Engineering and Technology 7, no. 24 (June 25, 2014): 5143–47. http://dx.doi.org/10.19026/rjaset.7.910.

37

Csapó, Tamás Gábor, and Géza Németh. "Statistical parametric speech synthesis with a novel codebook-based excitation model." Intelligent Decision Technologies 8, no. 4 (June 27, 2014): 289–99. http://dx.doi.org/10.3233/idt-140197.

38

Takamichi, Shinnosuke, Tomoki Toda, Alan W. Black, Graham Neubig, Sakriani Sakti, and Satoshi Nakamura. "Postfilters to Modify the Modulation Spectrum for Statistical Parametric Speech Synthesis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 24, no. 4 (April 2016): 755–67. http://dx.doi.org/10.1109/taslp.2016.2522655.

39

Erro, Daniel, Inaki Sainz, Eva Navas, and Inma Hernaez. "Harmonics Plus Noise Model Based Vocoder for Statistical Parametric Speech Synthesis." IEEE Journal of Selected Topics in Signal Processing 8, no. 2 (April 2014): 184–94. http://dx.doi.org/10.1109/jstsp.2013.2283471.

40

Chen, Ling-Hui, Tuomo Raitio, Cassia Valentini-Botinhao, Zhen-Hua Ling, and Junichi Yamagishi. "A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 23, no. 11 (November 2015): 2003–14. http://dx.doi.org/10.1109/taslp.2015.2461448.

41

Adiga, Nagaraj, Banriskhem K. Khonglah, and S. R. Mahadeva Prasanna. "Improved voicing decision using glottal activity features for statistical parametric speech synthesis." Digital Signal Processing 71 (December 2017): 131–43. http://dx.doi.org/10.1016/j.dsp.2017.09.007.

42

Airaksinen, Manu, Lauri Juvela, Bajibabu Bollepalli, Junichi Yamagishi, and Paavo Alku. "A Comparison Between STRAIGHT, Glottal, and Sinusoidal Vocoding in Statistical Parametric Speech Synthesis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 9 (September 2018): 1658–70. http://dx.doi.org/10.1109/taslp.2018.2835720.

43

Csapo, Tamas Gabor, and Geza Nemeth. "Modeling Irregular Voice in Statistical Parametric Speech Synthesis With Residual Codebook Based Excitation." IEEE Journal of Selected Topics in Signal Processing 8, no. 2 (April 2014): 209–20. http://dx.doi.org/10.1109/jstsp.2013.2292037.

44

Souza, Fernando, and Adolfo Maia Jr. "A Mathematical, Graphical and Visual Approach to Granular Synthesis Composition." Revista Vórtex 9, no. 2 (December 10, 2021): 1–27. http://dx.doi.org/10.33871/23179937.2021.9.2.4.

Abstract:
We show a method for Granular Synthesis Composition based on mathematical modeling of the musical gesture. Each gesture is drawn as a curve generated from a particular mathematical model (or function) and coded as a MATLAB script. The gestures can be deterministic, defined through mathematical time functions, drawn freehand, or even randomly generated. This parametric gesture information is interpreted through OSC messages by a granular synthesizer (Granular Streamer). The musical composition is then realized with the models (scripts) written in MATLAB and exported to a graphical score (Granular Score). The method is amenable to statistical analysis of the granular sound streams and of the final musical composition. We also offer a way to create granular streams based on correlated pairs of grain parameters.
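The abstract above encodes each musical gesture as a curve from a mathematical model whose sampled values then drive grain parameters in the synthesizer. As a minimal sketch of one such deterministic gesture (the particular model, a decaying sine, is an invented example, not one from the paper), here the curve is sampled into per-grain amplitude values:

```python
# Sample a deterministic gesture curve (decaying sine) into per-grain values.
import math

def gesture_curve(n_grains, freq=2.0, decay=1.5):
    """Return n_grains amplitude samples of exp(-decay*t)*sin(2*pi*freq*t), t in [0, 1]."""
    return [
        math.exp(-decay * t) * math.sin(2 * math.pi * freq * t)
        for t in (i / (n_grains - 1) for i in range(n_grains))
    ]

amplitudes = gesture_curve(8)
print(len(amplitudes))  # 8 grain amplitudes, each in [-1, 1]
```

In the paper's workflow, values like these would be sent as OSC messages to the granular synthesizer rather than printed.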
45

Al-Radhi, Mohammed Salah, Tamás Gábor Csapó, and Géza Németh. "Adaptive Refinements of Pitch Tracking and HNR Estimation within a Vocoder for Statistical Parametric Speech Synthesis." Applied Sciences 9, no. 12 (June 16, 2019): 2460. http://dx.doi.org/10.3390/app9122460.

Abstract:
Recent studies in text-to-speech synthesis have shown the benefit of using a continuous pitch estimate; one that interpolates fundamental frequency (F0) even when voicing is not present. However, continuous F0 is still sensitive to additive noise in speech signals and suffers from short-term errors (when it changes rather quickly over time). To alleviate these issues, three adaptive techniques have been developed in this article for achieving a robust and accurate F0: (1) we weight the pitch estimates with state noise covariance using adaptive Kalman-filter framework, (2) we iteratively apply a time axis warping on the input frame signal, (3) we optimize all F0 candidates using an instantaneous-frequency-based approach. Additionally, the second goal of this study is to introduce an extension of a novel continuous-based speech synthesis system (i.e., in which all parameters are continuous). We propose adding a new excitation parameter named Harmonic-to-Noise Ratio (HNR) to the voiced and unvoiced components to indicate the degree of voicing in the excitation and to reduce the influence of buzziness caused by the vocoder. Results based on objective and perceptual tests demonstrate that the voice built with the proposed framework gives state-of-the-art speech synthesis performance while outperforming the previous baseline.
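The first refinement above weights pitch estimates within an adaptive Kalman-filter framework. As a minimal sketch of the underlying idea only (a plain, non-adaptive 1D random-walk Kalman filter with invented noise settings, not the paper's adaptive formulation), the filter below smooths a noisy continuous F0 track:

```python
# Smooth a noisy continuous F0 track with a 1D random-walk Kalman filter.
# process_var and meas_var are hypothetical settings, not the paper's.
import numpy as np

def kalman_smooth_f0(f0_noisy, process_var=4.0, meas_var=100.0):
    estimate, variance = f0_noisy[0], meas_var
    smoothed = [estimate]
    for z in f0_noisy[1:]:
        variance += process_var                  # predict: F0 drifts slowly
        gain = variance / (variance + meas_var)  # weight of the new measurement
        estimate += gain * (z - estimate)        # update with the observation
        variance *= 1.0 - gain
        smoothed.append(estimate)
    return np.array(smoothed)

rng = np.random.default_rng(1)
true_f0 = 120.0 + np.linspace(0.0, 10.0, 100)    # slowly rising pitch contour
noisy = true_f0 + rng.normal(scale=10.0, size=100)
smooth = kalman_smooth_f0(noisy)
print(np.mean((smooth - true_f0) ** 2) < np.mean((noisy - true_f0) ** 2))
```

The filtered track deviates less from the underlying contour than the raw estimates, which is the behaviour the article's adaptive refinements aim to achieve robustly on real speech.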
46

Mazenan, Mohd Nizam, Tan Tian Swee, Tan Hui Ru, and Azran Azhim. "Statistical Parametric Evaluation on New Corpus Design for Malay Speech Articulation Disorder Early Diagnosis." American Journal of Applied Sciences 12, no. 7 (July 1, 2015): 452–62. http://dx.doi.org/10.3844/ajassp.2015.452.462.

47

Juvela, Lauri, Bajibabu Bollepalli, Vassilis Tsiaras, and Paavo Alku. "GlotNet—A Raw Waveform Model for the Glottal Excitation in Statistical Parametric Speech Synthesis." IEEE/ACM Transactions on Audio, Speech, and Language Processing 27, no. 6 (June 2019): 1019–30. http://dx.doi.org/10.1109/taslp.2019.2906484.

48

Maia, Ranniery, and Masami Akamine. "On the impact of excitation and spectral parameters for expressive statistical parametric speech synthesis." Computer Speech & Language 28, no. 5 (September 2014): 1209–32. http://dx.doi.org/10.1016/j.csl.2013.10.001.

49

Yu, Kai, Heiga Zen, François Mairesse, and Steve Young. "Context adaptive training with factorized decision trees for HMM-based statistical parametric speech synthesis." Speech Communication 53, no. 6 (July 2011): 914–23. http://dx.doi.org/10.1016/j.specom.2011.03.003.

50

Raitio, Tuomo, Lauri Juvela, Antti Suni, Martti Vainio, and Paavo Alku. "Phase perception of the glottal excitation and its relevance in statistical parametric speech synthesis." Speech Communication 81 (July 2016): 104–19. http://dx.doi.org/10.1016/j.specom.2016.01.007.

