Journal articles on the topic "Robust speech features"

Consult the top 50 journal articles for your research on the topic "Robust speech features".

1. Huang, Kuo-Chang, Yau-Tarng Juang and Wen-Chieh Chang. "Robust integration for speech features". Signal Processing 86, no. 9 (September 2006): 2282–88. http://dx.doi.org/10.1016/j.sigpro.2005.10.020.

2. Potamianos, Alexandros. "Novel features for robust speech recognition". Journal of the Acoustical Society of America 112, no. 5 (November 2002): 2278. http://dx.doi.org/10.1121/1.4779131.

3. Goh, Yeh Huann, Paramesran Raveendran and Sudhanshu Shekhar Jamuar. "Robust speech recognition using harmonic features". IET Signal Processing 8, no. 2 (April 2014): 167–75. http://dx.doi.org/10.1049/iet-spr.2013.0094.

4. Eskikand, Parvin Zarei and Seyyed Ali Seyyedsalehia. "Robust speech recognition by extracting invariant features". Procedia - Social and Behavioral Sciences 32 (2012): 230–37. http://dx.doi.org/10.1016/j.sbspro.2012.01.034.

5. Dimitriadis, D., P. Maragos and A. Potamianos. "Robust AM-FM features for speech recognition". IEEE Signal Processing Letters 12, no. 9 (September 2005): 621–24. http://dx.doi.org/10.1109/lsp.2005.853050.

6. Harding, Philip and Ben Milner. "Reconstruction-based speech enhancement from robust acoustic features". Speech Communication 75 (December 2015): 62–75. http://dx.doi.org/10.1016/j.specom.2015.09.011.

7. Raj, Bhiksha, Michael L. Seltzer and Richard M. Stern. "Reconstruction of missing features for robust speech recognition". Speech Communication 43, no. 4 (September 2004): 275–96. http://dx.doi.org/10.1016/j.specom.2004.03.007.

8. Onoe, K., S. Sato, S. Homma, A. Kobayashi, T. Imai and T. Takagi. "Bi-Spectral Acoustic Features for Robust Speech Recognition". IEICE Transactions on Information and Systems E91-D, no. 3 (March 1, 2008): 631–34. http://dx.doi.org/10.1093/ietisy/e91-d.3.631.
9. Bansal, Poonam, Amita Dev and Shail Jain. "Robust Feature Vector Set Using Higher Order Autocorrelation Coefficients". International Journal of Cognitive Informatics and Natural Intelligence 4, no. 4 (October 2010): 37–46. http://dx.doi.org/10.4018/ijcini.2010100103.

Abstract: In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further by the Mel filter bank, a log operation and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as MFCC features for clean speech and better than MFCC features for noisy speech.
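The pipeline this abstract describes (discard the low-order autocorrelation lags, window the remaining lags, take the magnitude spectrum as a power-spectrum estimate, then apply a mel filter bank, a log and a DCT) can be sketched roughly as follows. This is a minimal illustration, not the authors' DRHOASS implementation; the parameter values and helper functions are assumptions.

```python
import numpy as np

def autocorr(frame):
    """Biased autocorrelation sequence of one speech frame (lags 0..n-1)."""
    n = len(frame)
    return np.correlate(frame, frame, mode="full")[n - 1:] / n

def mel_filterbank(n_mel, n_fft, sr):
    """Standard triangular mel filter bank over the rfft bins."""
    hz2mel = lambda f: 2595 * np.log10(1 + f / 700)
    mel2hz = lambda m: 700 * (10 ** (m / 2595) - 1)
    pts = mel2hz(np.linspace(0, hz2mel(sr / 2), n_mel + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_mel, n_fft // 2 + 1))
    for i in range(n_mel):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)
    return fb

def dct_ii(x):
    """Type-II DCT computed directly from the definition."""
    n, k = len(x), np.arange(len(x))
    return np.array([np.sum(x * np.cos(np.pi * (k + 0.5) * j / n))
                     for j in range(n)])

def higher_order_cepstra(frame, min_lag=15, n_fft=512, n_mel=20,
                         n_cep=13, sr=8000):
    """Cepstral features from higher-order autocorrelation lags only.

    Lags below `min_lag` (most affected by additive noise) are dropped;
    the magnitude spectrum of the windowed remaining lags serves as a
    power-spectrum estimate, which is mel-filtered, log-compressed and
    decorrelated with a DCT.  All values here are illustrative.
    """
    r = autocorr(frame)[min_lag:]            # keep higher-order lags only
    r = r * np.hamming(len(r))               # window the lag sequence
    spec = np.abs(np.fft.rfft(r, n_fft))     # power-spectrum estimate
    logmel = np.log(mel_filterbank(n_mel, n_fft, sr) @ spec + 1e-10)
    return dct_ii(logmel)[:n_cep]
```

A usage example: `higher_order_cepstra(frame)` on a 256-sample frame yields a 13-dimensional cepstral vector per frame.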
10. Majeed, Sayf A., Hafizah Husain and Salina A. Samad. "Phase Autocorrelation Bark Wavelet Transform (PACWT) Features for Robust Speech Recognition". Archives of Acoustics 40, no. 1 (March 1, 2015): 25–31. http://dx.doi.org/10.1515/aoa-2015-0004.

Abstract: In this paper, a new feature-extraction method is proposed to achieve robustness of speech recognition systems. This method combines the benefits of phase autocorrelation (PAC) with the bark wavelet transform. PAC uses the angle to measure correlation instead of the traditional autocorrelation measure, whereas the bark wavelet transform is a special type of wavelet transform particularly designed for speech signals. The features extracted by this combined method are called phase autocorrelation bark wavelet transform (PACWT) features. The speech recognition performance of the PACWT features is evaluated and compared to the conventional feature extraction method, mel frequency cepstrum coefficients (MFCC), using the TI-Digits database under different noise types and noise levels. This database has been divided into male and female data. The results show that the word recognition rate using the PACWT features for noisy male data (white noise at 0 dB SNR) is 60%, whereas it is 41.35% for the MFCC features under identical conditions.
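The phase-autocorrelation idea that PACWT builds on (measure the angle between a frame and its shifted copy rather than their inner product) can be sketched as below. The circular-shift formulation is an assumption made for illustration, and the bark wavelet stage of the paper is omitted.

```python
import numpy as np

def phase_autocorrelation(frame, max_lag):
    """Angle-based 'correlation': theta(k) = arccos(r(k) / r(0)).

    r(k) is the circular autocorrelation of the frame.  Because a
    circular shift preserves the vector norm, r(k)/r(0) is exactly the
    cosine of the angle between the frame and its shifted copy, so the
    angle theta(k) replaces the raw correlation value.
    """
    r0 = np.dot(frame, frame)
    thetas = []
    for k in range(max_lag + 1):
        rk = np.dot(frame, np.roll(frame, k))
        thetas.append(np.arccos(np.clip(rk / r0, -1.0, 1.0)))
    return np.array(thetas)
```

At lag 0 the angle is zero by construction; larger angles indicate weaker correlation with the shifted frame.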
11. Hsieh, Hsin-Ju, Berlin Chen and Jeih-weih Hung. "Histogram equalization of contextual statistics of speech features for robust speech recognition". Multimedia Tools and Applications 74, no. 17 (March 8, 2014): 6769–95. http://dx.doi.org/10.1007/s11042-014-1929-y.

12. Ouzounov, A. "Mean-Delta Features for Telephone Speech Endpoint Detection". Information Technologies and Control 12, no. 3-4 (December 1, 2014): 36–44. http://dx.doi.org/10.1515/itc-2016-0005.

Abstract: In this paper, a brief summary of the author's research in the field of contour-based telephone speech endpoint detection (ED) is presented. This research includes the development of new robust features for ED (the Mean-Delta feature and the Group Delay Mean-Delta feature) and an assessment of the effect of the analyzed ED features and two additional features on a Dynamic Time Warping fixed-text speaker verification task with short noisy telephone phrases in Bulgarian.
13. Shoiynbek, Aisultan, Kanat Kozhakhmet, Nazerke Sultanova and Rakhima Zhumaliyeva. "The Robust Spectral Audio Features for Speech Emotion Recognition". Applied Mathematics & Information Sciences 13, no. 5 (September 1, 2019): 867–70. http://dx.doi.org/10.18576/amis/130521.

14. Milner, B. P. and S. V. Vaseghi. "Bayesian channel equalisation and robust features for speech recognition". IEE Proceedings - Vision, Image, and Signal Processing 143, no. 4 (1996): 223. http://dx.doi.org/10.1049/ip-vis:19960577.

15. Shahnawazuddin, Syed, Rohit Sinha and Gayadhar Pradhan. "Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition". IEEE Signal Processing Letters 24, no. 8 (August 2017): 1128–32. http://dx.doi.org/10.1109/lsp.2017.2705085.

16. Spille, Constantin, Birger Kollmeier and Bernd T. Meyer. "Combining Binaural and Cortical Features for Robust Speech Recognition". IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, no. 4 (April 2017): 756–67. http://dx.doi.org/10.1109/taslp.2017.2661712.

17. Ikbal, Shajith, Hemant Misra, Hynek Hermansky and Mathew Magimai-Doss. "Phase AutoCorrelation (PAC) features for noise robust speech recognition". Speech Communication 54, no. 7 (September 2012): 867–80. http://dx.doi.org/10.1016/j.specom.2012.02.005.

18. Nishimura, Yoshitaka, Takahiro Shinozaki, Koji Iwano and Sadaoki Furui. "Noise-robust speech recognition using multi-band spectral features". Journal of the Acoustical Society of America 116, no. 4 (October 2004): 2480. http://dx.doi.org/10.1121/1.4784906.
19. Gavat, Inge, Gabriel Costache and Claudia Iancu. "Enhancing robustness of speech recognizers by bimodal features". Facta universitatis - series: Electronics and Energetics 19, no. 2 (2006): 287–98. http://dx.doi.org/10.2298/fuee0602287g.

Abstract: In this paper a robust speech recognizer is presented, based on features obtained from the speech signal and also from an image of the speaker. The features were combined by simple concatenation, and the resulting composite feature vectors were used to train the models for each class. For recognition, classification relies on a very effective algorithm, the multiclass SVM. Under additive noise conditions the bimodal system based on combined features performs better than the unimodal system based only on the speech features, the added information from the image playing an important role in the robustness improvement.
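The fusion scheme in this abstract, concatenating audio and visual feature vectors and training a multiclass SVM, reduces to a few lines. In this sketch, synthetic random features stand in for real MFCC-like audio features and lip-region image features; the dimensions and class layout are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_classes, n_per_class = 3, 40
audio_dim, visual_dim = 13, 8

X_audio, X_visual, y = [], [], []
for c in range(n_classes):
    # synthetic class-dependent features standing in for MFCC-like
    # audio features and image-based (visual) features
    X_audio.append(rng.normal(loc=c, scale=0.5,
                              size=(n_per_class, audio_dim)))
    X_visual.append(rng.normal(loc=-c, scale=0.5,
                               size=(n_per_class, visual_dim)))
    y += [c] * n_per_class

# simple concatenation produces the bimodal feature vector
X = np.hstack([np.vstack(X_audio), np.vstack(X_visual)])
y = np.array(y)

# multiclass SVM (scikit-learn trains one-vs-one classifiers internally)
clf = SVC(kernel="rbf").fit(X, y)
```

The same concatenation step works for any pair of modality-specific feature extractors, as long as the frames are time-aligned.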
20. Lingnan, Ge, Katsuhiko Shirai and Akira Kurematsu. "Approach of features with confident weight for robust speech recognition". Acoustical Science and Technology 32, no. 3 (2011): 92–99. http://dx.doi.org/10.1250/ast.32.92.

21. Hung, Jeih-Weih and Wei-Yi Tsai. "Constructing Modulation Frequency Domain-Based Features for Robust Speech Recognition". IEEE Transactions on Audio, Speech, and Language Processing 16, no. 3 (March 2008): 563–77. http://dx.doi.org/10.1109/tasl.2007.913405.

22. Farooq, O. and S. Datta. "Robust features for speech recognition based on admissible wavelet packets". Electronics Letters 37, no. 25 (2001): 1554. http://dx.doi.org/10.1049/el:20011029.

23. Wang, Shuiping, Zhenmin Tang, Ye Jiang and Ying Chen. "Robust FHPD Features from Speech Harmonic Analysis for Speaker Identification". Applied Mathematics & Information Sciences 7, no. 4 (July 1, 2013): 1591–98. http://dx.doi.org/10.12785/amis/070445.

24. Mitra, Vikramjit, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman and Louis Goldstein. "Robust speech recognition with articulatory features using dynamic Bayesian networks". Journal of the Acoustical Society of America 130, no. 4 (October 2011): 2408. http://dx.doi.org/10.1121/1.3654653.

25. Revathi, A., N. Sasikaladevi, R. Nagakrishnan and C. Jeyalakshmi. "Robust emotion recognition from speech: Gamma tone features and models". International Journal of Speech Technology 21, no. 3 (August 4, 2018): 723–39. http://dx.doi.org/10.1007/s10772-018-9546-1.
26. Seyedin, Sanaz, Seyed Mohammad Ahadi and Saeed Gazor. "New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition". Scientific World Journal 2013 (2013): 1–11. http://dx.doi.org/10.1155/2013/634160.

Abstract: This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of the temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise which remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), which adjusts it to the new proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates the noisy and undesired components more effectively. This modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 recognition task, outperformed the Mel frequency cepstral coefficient (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions.
27. Alabbasi, Hesham A., Ali M. Jalil and Fadhil S. Hasan. "Adaptive wavelet thresholding with robust hybrid features for text-independent speaker identification system". International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (October 1, 2020): 5208. http://dx.doi.org/10.11591/ijece.v10i5.pp5208-5216.

Abstract: The robustness of a speaker identification system over an additive noise channel is crucial for real-world applications. In speaker identification (SID) systems, the features extracted from each speech frame are an essential factor in building a reliable identification system. In clean environments the identification system works well; in noisy environments, additive noise degrades it. To eliminate the problem of additive noise and to achieve high accuracy in speaker identification, a feature extraction algorithm based on speech enhancement and combined features is presented. In this paper, a wavelet thresholding pre-processing stage and feature warping (FW) are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the identification system's robustness against different types of additive noise. A Universal Background Model Gaussian Mixture Model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results showed performance improvement for the proposed feature extraction algorithm compared with conventional features over most noise types and SNR ratios.
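Of the components above, feature warping (FW) is the easiest to sketch: each feature dimension is mapped through its ranks to a standard normal distribution, making the per-dimension statistics insensitive to channel and noise effects. This whole-utterance version is a simplification; the usual formulation warps over a sliding window of a few seconds.

```python
import numpy as np
from scipy.stats import norm, rankdata

def feature_warp(features):
    """Warp each feature dimension to a standard-normal distribution.

    `features` is (n_frames, n_dims).  Each value is replaced by the
    normal quantile of its rank within its dimension, so the warped
    per-dimension distribution is Gaussian regardless of the original
    distribution.  (Whole-utterance version for brevity.)
    """
    n = features.shape[0]
    warped = np.empty(features.shape, dtype=float)
    for d in range(features.shape[1]):
        ranks = rankdata(features[:, d])            # ranks 1..n
        warped[:, d] = norm.ppf((ranks - 0.5) / n)  # rank -> quantile
    return warped
```

After warping, each column has (approximately) zero mean and unit variance whatever the input distribution was, which is the property that makes FW useful under channel mismatch.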
28. Farahat, Mahboubeh and Ramin Halavati. "Noise Robust Speech Recognition Using Deep Belief Networks". International Journal of Computational Intelligence and Applications 15, no. 01 (March 2016): 1650005. http://dx.doi.org/10.1142/s146902681650005x.

Abstract: Most current speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame, or a short window of frames, of coefficients that represents the acoustic input. In these systems acoustic inputs are represented by Mel frequency cepstral coefficients (MFCCs) of a temporal spectrogram, known as frames. But MFCCs are not robust to noise; consequently, with different training and test conditions the accuracy of speech recognition systems decreases. On the other hand, using MFCCs of larger windows of frames in GMMs needs more computational power. In this paper, Deep Belief Networks (DBNs) are used to extract discriminative information from larger windows of frames. Nonlinear transformations lead to high-order, low-dimensional features which are robust to variation of the input speech. Multiple-speaker isolated word recognition tasks with 100 and 200 words in clean and noisy environments have been used to test this method. The experimental results indicate that this new method of feature encoding results in much better word recognition accuracy.
29. Zhang, Zhan, Yuehai Wang and Jianyi Yang. "Accent Recognition with Hybrid Phonetic Features". Sensors 21, no. 18 (September 18, 2021): 6258. http://dx.doi.org/10.3390/s21186258.

Abstract: The performance of voice-controlled systems is usually influenced by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. As accent is a high-level abstract feature that has a profound relationship with language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results demonstrate that our approach can obtain an 8.02% relative improvement compared with the Transformer baseline, showing the merits of the proposed method.
30

KHAN, EMDAD y ROBERT LEVINSON. "ROBUST SPEECH RECOGNITION USING A NOISE REJECTION APPROACH". International Journal on Artificial Intelligence Tools 08, n.º 01 (marzo de 1999): 53–71. http://dx.doi.org/10.1142/s0218213099000051.

Texto completo
Resumen
In this paper, we explore some new approaches to improve speech recognition accuracy in a noisy environment. The key approaches taken are: (a) use no additional data (i.e. use only speakers data, no data for noise) for training and (b) no adaptation phase for noise. Instead of making adaptation in the recognition, preprocessing or both stages, we make a noise tolerant (rejection) speech recognition system where the system tries to reject noise automatically because of its inherent structure. We call our approach a noise rejection-based approach. Noise rejection is achieved by using multiple views and dynamic features of the input sequences. Multiple views exploit more information from the available data that is used for training multiple HMMs (Hidden Markov Models). This makes the training process simpler, faster and avoids the need to use a noise database, which is often difficult to obtain. The dynamic features (added to the HMM using vector emission probabilities) add more information about the input speech during training. Since the values of dynamic features of noise are usually much smaller than that of the speech signal, it helps reject the noise during recognition. Multiple views (we also call these scrambles) can be used at different stages in the recognition processes. This paper explore these possibilities. Also, multiple views of the input sequence are applied to multiple HMMs during recognition and the outcome of the multiple HMMs are combined using maximum evidence criterion. The accuracy of the noise rejection-based approach is further improved by using Higher Level Decision Making (HLD) - our method for data fusion. HLD improves accuracy by efficiently resolving conflicts. The key approaches taken for HLD are: meta reasoning, single cycle training (SCT), confidence factors and view minimization. Our tests show very encouraging results.
Los estilos APA, Harvard, Vancouver, ISO, etc.
31. Lin, Shih-Hsiang, Berlin Chen and Yao-Ming Yeh. "Exploring the Use of Speech Features and Their Corresponding Distribution Characteristics for Robust Speech Recognition". IEEE Transactions on Audio, Speech, and Language Processing 17, no. 1 (January 2009): 84–94. http://dx.doi.org/10.1109/tasl.2008.2007612.

32. Sen, TjongWan, Bambang Riyanto Trilaksono, Arry Akhmad Arman and Rila Mandala. "Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients". ITB Journal of Information and Communication Technology 3, no. 2 (2009): 123–34. http://dx.doi.org/10.5614/itbj.ict.2009.3.2.4.

33. Legoh, Kapang, Utpal Bhattacharjee and T. Tuithung. "Features and Model Adaptation Techniques for Robust Speech Recognition: A Review". Communications on Applied Electronics 1, no. 2 (January 31, 2015): 18–31. http://dx.doi.org/10.5120/cae-1507.

34. Gairola, Atul and Swapna Baadkar. "Hindi Speech Recognition System with Robust Front End-Back End Features". International Journal of Computer Applications 64, no. 1 (February 15, 2013): 42–45. http://dx.doi.org/10.5120/10601-5305.

35. Shen, Peng, Satoshi Tamura and Satoru Hayamizu. "Multistream sparse representation features for noise robust audio-visual speech recognition". Acoustical Science and Technology 35, no. 1 (2014): 17–27. http://dx.doi.org/10.1250/ast.35.17.

36. Bach, Jörg-Hendrik, Jörn Anemüller and Birger Kollmeier. "Robust speech detection in real acoustic backgrounds with perceptually motivated features". Speech Communication 53, no. 5 (May 2011): 690–706. http://dx.doi.org/10.1016/j.specom.2010.07.003.

37. Hung, Jeih-Weih and Lin-Shan Lee. "Optimization of temporal filters for constructing robust features in speech recognition". IEEE Transactions on Audio, Speech and Language Processing 14, no. 3 (May 2006): 808–32. http://dx.doi.org/10.1109/tsa.2005.857801.

38. Fazel, A. and S. Chakrabartty. "Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition". IEEE Transactions on Audio, Speech, and Language Processing 20, no. 4 (May 2012): 1362–71. http://dx.doi.org/10.1109/tasl.2011.2179294.
39. Lee, Moa and Joon-Hyuk Chang. "Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots". Applied Sciences 10, no. 13 (July 2, 2020): 4602. http://dx.doi.org/10.3390/app10134602.

Abstract: Speech recognition for intelligent robots suffers from performance degradation due to ego-noise, which is caused by the motors, fans, and mechanical parts inside the robots, especially when the robot moves or shakes its body. To overcome the problems caused by ego-noise, we propose a robust speech recognition algorithm that uses motor-state information of the robot as an auxiliary feature. For this, we use two deep neural networks (DNNs). Firstly, we design latent features using a bottleneck layer, an internal layer having fewer hidden units than the other layers, to represent whether the motor is operating or not. The latent features maximizing the representation of the motor-state information are generated by taking the motor data and acoustic features as the input of the first DNN. Secondly, once the motor-state-dependent latent features are designed by the first DNN, the second DNN, accounting for acoustic modeling, receives the latent features as input along with the acoustic features. We evaluated the proposed system on the LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rate (WER) is superior to that of a conventional speech recognition system.
40

Li, Naihan, Yanqing Liu, Yu Wu, Shujie Liu, Sheng Zhao y Ming Liu. "RobuTrans: A Robust Transformer-Based Text-to-Speech Model". Proceedings of the AAAI Conference on Artificial Intelligence 34, n.º 05 (3 de abril de 2020): 8228–35. http://dx.doi.org/10.1609/aaai.v34i05.6337.

Texto completo
Resumen
Recently, neural network based speech synthesis has achieved outstanding results, by which the synthesized audios are of excellent quality and naturalness. However, current neural TTS models suffer from the robustness issue, which results in abnormal audios (bad cases) especially for unusual text (unseen context). To build a neural model which can synthesize both natural and stable audios, in this paper, we make a deep analysis of why the previous neural TTS models are not robust, based on which we propose RobuTrans (Robust Transformer), a robust neural TTS model based on Transformer. Comparing to TransformerTTS, our model first converts input texts to linguistic features, including phonemic features and prosodic features, then feed them to the encoder. In the decoder, the encoder-decoder attention is replaced with a duration-based hard attention mechanism, and the causal self-attention is replaced with a "pseudo non-causal attention" mechanism to model the holistic information of the input. Besides, the position embedding is replaced with a 1-D CNN, since it constrains the maximum length of synthesized audio. With these modifications, our model not only fix the robustness problem, but also achieves on parity MOS (4.36) with TransformerTTS (4.37) and Tacotron2 (4.37) on our general set.
Los estilos APA, Harvard, Vancouver, ISO, etc.
41

FAROOQ, O., S. DATTA y M. C. SHROTRIYA. "WAVELET SUB-BAND BASED TEMPORAL FEATURES FOR ROBUST HINDI PHONEME RECOGNITION". International Journal of Wavelets, Multiresolution and Information Processing 08, n.º 06 (noviembre de 2010): 847–59. http://dx.doi.org/10.1142/s0219691310003845.

Texto completo
Resumen
This paper proposes the use of wavelet transform-based feature extraction technique for Hindi speech recognition application. The new proposed features take into account temporal as well as frequency band energy variations for the task of Hindi phoneme recognition. The recognition performance achieved by the proposed features is compared with the standard MFCC and 24-band admissible wavelet packet-based features using a linear discriminant function based classifier. To evaluate robustness of these features, the NOISEX database is used to add different types of noise into phonemes to achieve signal-to-noise ratios in the range of 20 dB to -5 dB. The recognition results show that under noisy background the proposed technique always achieves a better performance over MFCC-based features.
Los estilos APA, Harvard, Vancouver, ISO, etc.
42. Meng, Xiang Tao and Shi Yin. "Speech Recognition Algorithm Based on Nonlinear Partition and GFCC Features". Applied Mechanics and Materials 556-562 (May 2014): 3069–73. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.3069.

Abstract: To speed up and enhance the robustness of a speech recognition system, this paper proposes a speech recognition algorithm based on segment-level GFCC features. In the training and testing stages we use segment-level GFCC features, which are more robust to noise than the widely used MFCC features. Experimental results show that both training time and test time decreased, while the accuracy of the system improved.
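The segment-level idea, pooling frame-level features (GFCC in the paper; any frame features in this sketch) over fixed-length segments to shorten the sequence and smooth out noise, reduces to a mean over non-overlapping groups of frames. Mean pooling is one plausible choice of pooling; the paper's exact segmentation scheme is not reproduced here.

```python
import numpy as np

def segment_features(frames, seg_len):
    """Average non-overlapping groups of `seg_len` frame vectors.

    `frames` is (n_frames, n_dims); trailing frames that do not fill a
    complete segment are dropped.  The result has one vector per
    segment, shortening the sequence by a factor of `seg_len`.
    """
    n_seg = frames.shape[0] // seg_len
    trimmed = frames[:n_seg * seg_len]
    return trimmed.reshape(n_seg, seg_len, -1).mean(axis=1)
```

Shortening the sequence this way directly reduces both training and decoding time, which is the speed-up the abstract reports.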
43. Zvarevashe, Kudakwashe and Oludayo O. Olugbara. "Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm". Intelligent Data Analysis 24, no. 5 (September 30, 2020): 1065–86. http://dx.doi.org/10.3233/ida-194747.

Abstract: Speech emotion recognition has become the heart of most human-computer interaction applications in the modern world. The growing need to develop emotionally intelligent devices has opened up a lot of research opportunities. Most researchers in this field have applied handcrafted features and machine learning techniques to recognizing speech emotion. However, these techniques require extra processing steps, and handcrafted features are usually not robust. They are computationally intensive, and the curse of dimensionality results in low discriminating power. Research has shown that deep learning algorithms are effective for extracting robust and salient features from datasets. In this study, we have developed a custom 2D-convolution neural network that performs both feature extraction and classification of vocal utterances. The network has been evaluated against a deep multilayer perceptron neural network and a deep radial basis function neural network using the Berlin database of emotional speech, the Ryerson audio-visual emotional speech database and the Surrey audio-visual expressed emotion corpus. The described deep learning algorithm achieves the highest precision, recall and F1-scores when compared to other existing algorithms. It is observed that there may be a need to develop customized solutions for different language settings depending on the area of application.
44. Zhou, Bin, Jing Liu and Zheng Pei. "Noise-Robust Voice Activity Detector Based on Four States-Based HMM". Applied Mechanics and Materials 411-414 (September 2013): 743–48. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.743.

Abstract: Voice activity detection (VAD) is increasingly essential in noisy environments for accurate speech recognition. In this paper, we provide a method based on left-right hidden Markov models (HMMs) to identify the start and end of speech. The method builds two models, for non-speech and speech, instead of the existing two states; each model can include several states. We also analyze other features of speech and non-speech, such as pitch index, pitch magnitude and fractal dimension. We compare the VAD results of the proposed algorithm and a two-state HMM. Experiments show that the proposed method performs better than two-state HMMs in VAD, especially in low signal-to-noise ratio (SNR) environments.
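A toy version of the model-based idea, one Gaussian per class over frame log-energy with "sticky" transitions decoded by Viterbi, illustrates why HMM smoothing helps at low SNR: isolated noisy frames cannot flip the label because switching states carries a cost. The paper's multi-state left-right HMMs over richer features are not reproduced here; the means, variances and transition probability below are illustrative assumptions.

```python
import numpy as np

def viterbi_vad(log_energy, mu=(0.0, 3.0), var=(1.0, 1.0), p_stay=0.95):
    """Label frames non-speech (0) / speech (1) by Viterbi decoding.

    One Gaussian per class models the frame log-energy; a sticky
    transition probability discourages rapid label switching.
    """
    log_a = np.log(np.array([[p_stay, 1 - p_stay],
                             [1 - p_stay, p_stay]]))
    mu, var = np.asarray(mu, float), np.asarray(var, float)
    # per-frame Gaussian log-likelihood for each class, shape (n, 2)
    ll = (-0.5 * np.log(2 * np.pi * var)
          - 0.5 * (np.asarray(log_energy, float)[:, None] - mu) ** 2 / var)
    n = ll.shape[0]
    delta = np.zeros((n, 2))              # best path score ending in state j
    back = np.zeros((n, 2), dtype=int)    # backpointers
    delta[0] = np.log(0.5) + ll[0]
    for t in range(1, n):
        scores = delta[t - 1][:, None] + log_a   # scores[i, j]: i -> j
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + ll[t]
    path = np.zeros(n, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(n - 2, -1, -1):
        path[t] = back[t + 1][path[t + 1]]
    return path
```

Raising `p_stay` trades responsiveness at speech boundaries for robustness against frame-level noise, the same trade-off the multi-state models in the paper address with more structure.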
45. Park, Taejin, SeungKwan Beack and Taejin Lee. "Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features". IEIE Transactions on Smart Processing and Computing 3, no. 5 (October 31, 2014): 259–66. http://dx.doi.org/10.5573/ieiespc.2014.3.5.259.

46. Hsieh, C. T., E. Lai and Y. C. Wang. "Robust speech features based on wavelet transform with application to speaker identification". IEE Proceedings - Vision, Image, and Signal Processing 149, no. 2 (2002): 108. http://dx.doi.org/10.1049/ip-vis:20020121.

47. Jeyalakshmi, C., K. Thenmozhi and A. Revathi. "Non-spectral features-based robust speaker independent emotion recognition from speech signal". International Journal of Medical Engineering and Informatics 12, no. 5 (2020): 500. http://dx.doi.org/10.1504/ijmei.2020.10031560.

48. Revathi, A., C. Jeyalakshmi and K. Thenmozhi. "Non-spectral features-based robust speaker independent emotion recognition from speech signal". International Journal of Medical Engineering and Informatics 12, no. 5 (2020): 500. http://dx.doi.org/10.1504/ijmei.2020.109944.

49. Shen, Jia-lin and Wen L. Hwang. "New temporal features for robust speech recognition with emphasis on microphone variations". Computer Speech & Language 13, no. 1 (January 1999): 65–78. http://dx.doi.org/10.1006/csla.1998.0050.

50. Amrous, Anissa Imen, Mohamed Debyeche and Abderrahman Amrouche. "Robust Arabic speech recognition in noisy environments using prosodic features and formant". International Journal of Speech Technology 14, no. 4 (September 23, 2011): 351–59. http://dx.doi.org/10.1007/s10772-011-9113-5.