Journal articles on the topic "Robust speech features"

Below are the 50 best scholarly journal articles on the topic "Robust speech features".

1. Huang, Kuo-Chang, Yau-Tarng Juang, and Wen-Chieh Chang. "Robust integration for speech features." Signal Processing 86, no. 9 (September 2006): 2282–88. http://dx.doi.org/10.1016/j.sigpro.2005.10.020.

2. Potamianos, Alexandros. "Novel features for robust speech recognition." Journal of the Acoustical Society of America 112, no. 5 (November 2002): 2278. http://dx.doi.org/10.1121/1.4779131.

3. Goh, Yeh Huann, Paramesran Raveendran, and Sudhanshu Shekhar Jamuar. "Robust speech recognition using harmonic features." IET Signal Processing 8, no. 2 (April 2014): 167–75. http://dx.doi.org/10.1049/iet-spr.2013.0094.

4. Eskikand, Parvin Zarei, and Seyyed Ali Seyyedsalehia. "Robust speech recognition by extracting invariant features." Procedia - Social and Behavioral Sciences 32 (2012): 230–37. http://dx.doi.org/10.1016/j.sbspro.2012.01.034.

5. Dimitriadis, D., P. Maragos, and A. Potamianos. "Robust AM-FM features for speech recognition." IEEE Signal Processing Letters 12, no. 9 (September 2005): 621–24. http://dx.doi.org/10.1109/lsp.2005.853050.

6. Harding, Philip, and Ben Milner. "Reconstruction-based speech enhancement from robust acoustic features." Speech Communication 75 (December 2015): 62–75. http://dx.doi.org/10.1016/j.specom.2015.09.011.

7. Raj, Bhiksha, Michael L. Seltzer, and Richard M. Stern. "Reconstruction of missing features for robust speech recognition." Speech Communication 43, no. 4 (September 2004): 275–96. http://dx.doi.org/10.1016/j.specom.2004.03.007.

8. ONOE, K., S. SATO, S. HOMMA, A. KOBAYASHI, T. IMAI, and T. TAKAGI. "Bi-Spectral Acoustic Features for Robust Speech Recognition." IEICE Transactions on Information and Systems E91-D, no. 3 (March 1, 2008): 631–34. http://dx.doi.org/10.1093/ietisy/e91-d.3.631.

9. Bansal, Poonam, Amita Dev, and Shail Jain. "Robust Feature Vector Set Using Higher Order Autocorrelation Coefficients." International Journal of Cognitive Informatics and Natural Intelligence 4, no. 4 (October 2010): 37–46. http://dx.doi.org/10.4018/ijcini.2010100103.

Abstract:
In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order autocorrelation coefficients for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used here as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further by the Mel filter bank, a log operation, and the discrete cosine transform to get the cepstral coefficients. These cepstral coefficients are referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as MFCC features on clean speech and better than MFCC features on noisy speech.
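The pipeline this abstract describes (zero out the noise-sensitive low-order autocorrelation lags, window the remaining lags, take the magnitude spectrum as a PSD estimate, then apply mel filtering, a log and a DCT) can be sketched in a few lines. Below is a minimal NumPy/SciPy illustration; the cut-off order, Hamming window and filter-bank sizes are assumptions for illustration, not the paper's settings.

```python
import numpy as np
from scipy.fft import dct

def mel_filterbank(n_filters, n_bins, sr):
    """Minimal triangular mel filter bank, shape (n_filters, n_bins)."""
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mels = np.linspace(0.0, hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_bins - 1) * mel_to_hz(mels) / (sr / 2.0)).astype(int)
    fb = np.zeros((n_filters, n_bins))
    for i in range(n_filters):
        lo, mid, hi = bins[i], bins[i + 1], bins[i + 2]
        fb[i, lo:mid + 1] = np.linspace(0.0, 1.0, mid - lo + 1)
        fb[i, mid:hi + 1] = np.linspace(1.0, 0.0, hi - mid + 1)
    return fb

def hoas_cepstra(frame, sr=8000, drop_orders=10, n_mel=20, n_ceps=13):
    """Cepstra from higher-order autocorrelation lags: zero out the
    noise-sensitive low-order lags, window the rest, use the magnitude
    spectrum as a PSD estimate, then mel filter, log and DCT."""
    n = len(frame)
    r = np.correlate(frame, frame, mode="full")[n - 1:] / n  # lags 0..n-1
    r[:drop_orders] = 0.0                  # discard low-order lags
    spectrum = np.abs(np.fft.rfft(r * np.hamming(n)))
    mel_energies = mel_filterbank(n_mel, len(spectrum), sr) @ spectrum
    return dct(np.log(mel_energies + 1e-10), type=2, norm="ortho")[:n_ceps]

print(hoas_cepstra(np.random.randn(256)).shape)   # (13,)
```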
10. Majeed, Sayf A., Hafizah Husain, and Salina A. Samad. "Phase Autocorrelation Bark Wavelet Transform (PACWT) Features for Robust Speech Recognition." Archives of Acoustics 40, no. 1 (March 1, 2015): 25–31. http://dx.doi.org/10.1515/aoa-2015-0004.

Abstract:
In this paper, a new feature-extraction method is proposed to achieve robustness of speech recognition systems. This method combines the benefits of phase autocorrelation (PAC) with the bark wavelet transform. PAC uses the angle to measure correlation instead of the traditional autocorrelation measure, whereas the bark wavelet transform is a special type of wavelet transform particularly designed for speech signals. The features extracted by this combined method are called phase autocorrelation bark wavelet transform (PACWT) features. The speech recognition performance of the PACWT features is evaluated and compared to the conventional feature extraction method, mel frequency cepstrum coefficients (MFCC), using the TI-Digits database under different noise types and noise levels. The database has been divided into male and female data. The results show that the word recognition rate using the PACWT features for noisy male data (white noise at 0 dB SNR) is 60%, whereas it is 41.35% for the MFCC features under identical conditions.
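The core PAC idea, measuring the angle between a frame and a shifted copy of itself rather than their raw dot product, is easy to sketch. The version below assumes circular shifts (so both vectors share one energy term) and omits the bark wavelet stage entirely; it illustrates the principle, not the authors' implementation.

```python
import numpy as np

def phase_autocorrelation(frame, n_lags=13):
    """PAC coefficients: the angle between the frame and its (circularly)
    shifted copy, rather than their raw dot product. With a circular
    shift both vectors have the same norm, so one energy term suffices."""
    energy = float(np.dot(frame, frame)) + 1e-10
    pac = np.empty(n_lags)
    for k in range(n_lags):
        cos_theta = np.dot(frame, np.roll(frame, k)) / energy
        pac[k] = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    return pac

print(phase_autocorrelation(np.random.randn(256)))
```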
11. Hsieh, Hsin-Ju, Berlin Chen, and Jeih-weih Hung. "Histogram equalization of contextual statistics of speech features for robust speech recognition." Multimedia Tools and Applications 74, no. 17 (March 8, 2014): 6769–95. http://dx.doi.org/10.1007/s11042-014-1929-y.

12. Ouzounov, A. "Mean-Delta Features for Telephone Speech Endpoint Detection." Information Technologies and Control 12, no. 3-4 (December 1, 2014): 36–44. http://dx.doi.org/10.1515/itc-2016-0005.

Abstract:
In this paper, a brief summary of the author's research in the field of contour-based telephone speech Endpoint Detection (ED) is presented. This research includes the development of new robust features for ED (the Mean-Delta feature and the Group Delay Mean-Delta feature) and an estimation of the effect of the analyzed ED features, together with two additional features, on a Dynamic Time Warping fixed-text speaker verification task with short noisy telephone phrases in the Bulgarian language.
13. Shoiynbek, Kozhakhmet, Sultanova, Zhumaliyeva, Aisultan, Kanat, Nazerke, Rakhima. "The Robust Spectral Audio Features for Speech Emotion Recognition." Applied Mathematics & Information Sciences 13, no. 5 (September 1, 2019): 867–70. http://dx.doi.org/10.18576/amis/130521.

14. Milner, B. P., and S. V. Vaseghi. "Bayesian channel equalisation and robust features for speech recognition." IEE Proceedings - Vision, Image, and Signal Processing 143, no. 4 (1996): 223. http://dx.doi.org/10.1049/ip-vis:19960577.

15. Shahnawazuddin, Syed, Rohit Sinha, and Gayadhar Pradhan. "Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition." IEEE Signal Processing Letters 24, no. 8 (August 2017): 1128–32. http://dx.doi.org/10.1109/lsp.2017.2705085.

16. Spille, Constantin, Birger Kollmeier, and Bernd T. Meyer. "Combining Binaural and Cortical Features for Robust Speech Recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, no. 4 (April 2017): 756–67. http://dx.doi.org/10.1109/taslp.2017.2661712.

17. Ikbal, Shajith, Hemant Misra, Hynek Hermansky, and Mathew Magimai-Doss. "Phase AutoCorrelation (PAC) features for noise robust speech recognition." Speech Communication 54, no. 7 (September 2012): 867–80. http://dx.doi.org/10.1016/j.specom.2012.02.005.

18. Nishimura, Yoshitaka, Takahiro Shinozaki, Koji Iwano, and Sadaoki Furui. "Noise‐robust speech recognition using multi‐band spectral features." Journal of the Acoustical Society of America 116, no. 4 (October 2004): 2480. http://dx.doi.org/10.1121/1.4784906.

19. Gavat, Inge, Gabriel Costache, and Claudia Iancu. "Enhancing robustness of speech recognizers by bimodal features." Facta universitatis - series: Electronics and Energetics 19, no. 2 (2006): 287–98. http://dx.doi.org/10.2298/fuee0602287g.

Abstract:
In this paper a robust speech recognizer is presented, based on features obtained from the speech signal and also from an image of the speaker. The features were combined by simple concatenation, resulting in composed feature vectors used to train the models corresponding to each class. For recognition, the classification process relies on a very effective algorithm, namely the multiclass SVM. Under additive noise conditions the bimodal system based on combined features performs better than the unimodal system based only on the speech features, the added information obtained from the image playing an important role in the robustness improvement.
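The fusion scheme described here is plain feature concatenation followed by a multiclass SVM, which a few lines of scikit-learn can illustrate. The feature dimensions and the random stand-in data below are assumptions, not the paper's setup.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
audio_feats = rng.normal(size=(200, 13))   # e.g. 13 cepstral coefficients
visual_feats = rng.normal(size=(200, 8))   # e.g. 8 lip-shape parameters
labels = rng.integers(0, 5, size=200)      # 5 word classes

# Simple concatenation yields the composed (bimodal) feature vectors
bimodal = np.hstack([audio_feats, visual_feats])

# Multiclass SVM (scikit-learn handles the multiclass case internally)
clf = SVC(kernel="rbf").fit(bimodal, labels)
print(clf.predict(bimodal[:3]))
```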
20. Lingnan, Ge, Katsuhiko Shirai, and Akira Kurematsu. "Approach of features with confident weight for robust speech recognition." Acoustical Science and Technology 32, no. 3 (2011): 92–99. http://dx.doi.org/10.1250/ast.32.92.

21. Jeih-Weih Hung and Wei-Yi Tsai. "Constructing Modulation Frequency Domain-Based Features for Robust Speech Recognition." IEEE Transactions on Audio, Speech, and Language Processing 16, no. 3 (March 2008): 563–77. http://dx.doi.org/10.1109/tasl.2007.913405.

22. Farooq, O., and S. Datta. "Robust features for speech recognition based on admissible wavelet packets." Electronics Letters 37, no. 25 (2001): 1554. http://dx.doi.org/10.1049/el:20011029.

23. Wang, Shuiping, Zhenmin Tang, Ye Jiang, and Ying Chen. "Robust FHPD Features from Speech Harmonic Analysis for Speaker Identification." Applied Mathematics & Information Sciences 7, no. 4 (July 1, 2013): 1591–98. http://dx.doi.org/10.12785/amis/070445.

24. Mitra, Vikramjit, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, and Louis Goldstein. "Robust speech recognition with articulatory features using dynamic Bayesian networks." Journal of the Acoustical Society of America 130, no. 4 (October 2011): 2408. http://dx.doi.org/10.1121/1.3654653.

25. Revathi, A., N. Sasikaladevi, R. Nagakrishnan, and C. Jeyalakshmi. "Robust emotion recognition from speech: Gamma tone features and models." International Journal of Speech Technology 21, no. 3 (August 4, 2018): 723–39. http://dx.doi.org/10.1007/s10772-018-9546-1.

26. Seyedin, Sanaz, Seyed Mohammad Ahadi, and Saeed Gazor. "New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition." Scientific World Journal 2013 (2013): 1–11. http://dx.doi.org/10.1155/2013/634160.

Abstract:
This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of a temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of the nonstationary additive noise which remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via a revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), which adjusts it to the newly proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates more effectively the noisy and undesired components. This modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 task for recognition purposes, outperformed the Mel frequency cepstral coefficients (MFCC) baseline, the relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions.
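The abstract does not give the exact weighting function, so the sketch below substitutes a common Wiener-style weight snr / (1 + snr) to illustrate the stated idea of passing high-SNR subbands nearly unchanged while attenuating noise-dominated ones; treat it as a placeholder for the paper's revised distortionless constraint, not a reproduction of it.

```python
import numpy as np

def snr_weighted_spectrum(power_spec, noise_psd):
    """Pass spectral components with high estimated SNR at (near) full
    weight and attenuate noise-dominated ones. A Wiener-style weight
    snr / (1 + snr) stands in for the paper's weighting function."""
    snr = np.maximum(power_spec / (noise_psd + 1e-10) - 1.0, 0.0)
    return power_spec * snr / (1.0 + snr)
```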
27. Alabbasi, Hesham A., Ali M. Jalil, and Fadhil S. Hasan. "Adaptive wavelet thresholding with robust hybrid features for text-independent speaker identification system." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (October 1, 2020): 5208. http://dx.doi.org/10.11591/ijece.v10i5.pp5208-5216.

Abstract:
The robustness of a speaker identification system over an additive noise channel is crucial for real-world applications. In speaker identification (SID) systems, the features extracted from each speech frame are an essential factor for building a reliable identification system. In clean environments the identification system works well; in noisy environments there is additive noise that affects the system. To eliminate the problem of additive noise and to achieve high accuracy in the speaker identification system, a feature extraction algorithm based on speech enhancement and combined features is presented. In this paper, a wavelet thresholding pre-processing stage and feature warping (FW) techniques are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the identification system's robustness against different types of additive noise. A Universal Background Model Gaussian Mixture Model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results showed a performance improvement for the proposed feature extraction algorithm compared with conventional features over most noise types and different SNRs.
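A wavelet-thresholding pre-processing stage of the general kind described above might look as follows. The universal threshold and the 'db4' wavelet are common defaults, not necessarily the paper's adaptive rule, so treat this as a minimal sketch under those assumptions.

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db4", level=4):
    """Soft-threshold wavelet denoising as a pre-processing stage.
    The noise level is estimated from the finest detail band and the
    universal threshold is applied to all detail coefficients."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2.0 * np.log(len(signal)))
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft")
                            for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(signal)]
```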
28. Farahat, Mahboubeh, and Ramin Halavati. "Noise Robust Speech Recognition Using Deep Belief Networks." International Journal of Computational Intelligence and Applications 15, no. 01 (March 2016): 1650005. http://dx.doi.org/10.1142/s146902681650005x.

Abstract:
Most current speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame, or a short window of frames, of coefficients that represents the acoustic input. In these systems the acoustic input is represented by a temporal sequence of Mel Frequency Cepstral Coefficient (MFCC) vectors known as frames. But MFCC is not robust to noise, so with mismatched training and test conditions the accuracy of speech recognition systems decreases. On the other hand, using MFCCs from a larger window of frames in GMMs needs more computational power. In this paper, Deep Belief Networks (DBNs) are used to extract discriminative information from a larger window of frames. Nonlinear transformations lead to high-order, low-dimensional features that are robust to variations of the input speech. Multiple-speaker isolated word recognition tasks with 100 and 200 words in clean and noisy environments have been used to test this method. The experimental results indicate that this new method of feature encoding results in much better word recognition accuracy.
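The "larger window of frames" that the DBN consumes is usually built by stacking each frame with its neighbours. A minimal sketch, assuming a +/-5 frame context (the paper's exact context size is not stated here):

```python
import numpy as np

def stack_context(frames, context=5):
    """Concatenate each frame with its +/- `context` neighbours, giving
    the larger input window from which the DBN learns low-dimensional,
    noise-robust features (edges are padded by repetition)."""
    padded = np.pad(frames, ((context, context), (0, 0)), mode="edge")
    t = len(frames)
    return np.hstack([padded[i:i + t] for i in range(2 * context + 1)])

mfcc = np.random.randn(100, 13)     # 100 frames of 13 MFCCs
print(stack_context(mfcc).shape)    # (100, 143): 11 frames x 13 coefficients
```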
29. Zhang, Zhan, Yuehai Wang, and Jianyi Yang. "Accent Recognition with Hybrid Phonetic Features." Sensors 21, no. 18 (September 18, 2021): 6258. http://dx.doi.org/10.3390/s21186258.

Abstract:
The performance of voice-controlled systems is usually influenced by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. As accent is a high-level abstract feature that has a profound relationship with language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results demonstrate that our approach can obtain an 8.02% relative improvement compared with the Transformer baseline, showing the merits of the proposed method.
30. KHAN, EMDAD, and ROBERT LEVINSON. "ROBUST SPEECH RECOGNITION USING A NOISE REJECTION APPROACH." International Journal on Artificial Intelligence Tools 08, no. 01 (March 1999): 53–71. http://dx.doi.org/10.1142/s0218213099000051.

Abstract:
In this paper, we explore some new approaches to improve speech recognition accuracy in a noisy environment. The key approaches taken are: (a) use no additional data for training (i.e. use only the speakers' data, no data for noise) and (b) use no adaptation phase for noise. Instead of making adaptations in the recognition stage, the preprocessing stage or both, we make a noise-tolerant (rejection) speech recognition system where the system tries to reject noise automatically because of its inherent structure. We call our approach a noise rejection-based approach. Noise rejection is achieved by using multiple views and dynamic features of the input sequences. Multiple views exploit more information from the available data, which is used for training multiple HMMs (Hidden Markov Models). This makes the training process simpler and faster and avoids the need for a noise database, which is often difficult to obtain. The dynamic features (added to the HMM using vector emission probabilities) add more information about the input speech during training. Since the values of the dynamic features of noise are usually much smaller than those of the speech signal, this helps reject the noise during recognition. Multiple views (we also call these scrambles) can be used at different stages in the recognition process; this paper explores these possibilities. Also, multiple views of the input sequence are applied to multiple HMMs during recognition, and the outcomes of the multiple HMMs are combined using a maximum evidence criterion. The accuracy of the noise rejection-based approach is further improved by using Higher Level Decision Making (HLD), our method for data fusion. HLD improves accuracy by efficiently resolving conflicts. The key approaches taken for HLD are: meta reasoning, single cycle training (SCT), confidence factors and view minimization. Our tests show very encouraging results.
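One way to read the maximum-evidence combination of multiple views and multiple HMMs is sketched below: every view is scored by every word model, and the word whose model produces the single strongest score over all views wins. The exact criterion in the paper may differ; this is only an illustration of the idea.

```python
import numpy as np

def classify_max_evidence(log_likes):
    """log_likes[v, m] = log-likelihood that word HMM m assigns to view
    (scramble) v of one utterance; the word whose model produces the
    single strongest score over all views wins."""
    return int(np.argmax(log_likes.max(axis=0)))

scores = np.random.randn(3, 10)      # 3 views scored against 10 word HMMs
print("recognised word:", classify_max_evidence(scores))
```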
31. Lin, Shih-Hsiang, Berlin Chen, and Yao-Ming Yeh. "Exploring the Use of Speech Features and Their Corresponding Distribution Characteristics for Robust Speech Recognition." IEEE Transactions on Audio, Speech, and Language Processing 17, no. 1 (January 2009): 84–94. http://dx.doi.org/10.1109/tasl.2008.2007612.

32. Sen, TjongWan, Bambang Riyanto Trilaksono, Arry Akhmad Arman, and Rila Mandala. "Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients." ITB Journal of Information and Communication Technology 3, no. 2 (2009): 123–34. http://dx.doi.org/10.5614/itbj.ict.2009.3.2.4.

33. Legoh, Kapang, Utpal Bhattacharjee, and T. Tuithung. "Features and Model Adaptation Techniques for Robust Speech Recognition: A Review." Communications on Applied Electronics 1, no. 2 (January 31, 2015): 18–31. http://dx.doi.org/10.5120/cae-1507.

34. Gairola, Atul, and Swapna Baadkar. "Hindi Speech Recognition System with Robust Front End-Back End Features." International Journal of Computer Applications 64, no. 1 (February 15, 2013): 42–45. http://dx.doi.org/10.5120/10601-5305.

35. Shen, Peng, Satoshi Tamura, and Satoru Hayamizu. "Multistream sparse representation features for noise robust audio-visual speech recognition." Acoustical Science and Technology 35, no. 1 (2014): 17–27. http://dx.doi.org/10.1250/ast.35.17.

36. Bach, Jörg-Hendrik, Jörn Anemüller, and Birger Kollmeier. "Robust speech detection in real acoustic backgrounds with perceptually motivated features." Speech Communication 53, no. 5 (May 2011): 690–706. http://dx.doi.org/10.1016/j.specom.2010.07.003.

37. Jeih-Weih Hung and Lin-Shan Lee. "Optimization of temporal filters for constructing robust features in speech recognition." IEEE Transactions on Audio, Speech and Language Processing 14, no. 3 (May 2006): 808–32. http://dx.doi.org/10.1109/tsa.2005.857801.

38. Fazel, A., and S. Chakrabartty. "Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 4 (May 2012): 1362–71. http://dx.doi.org/10.1109/tasl.2011.2179294.

39. Lee, Moa, and Joon-Hyuk Chang. "Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots." Applied Sciences 10, no. 13 (July 2, 2020): 4602. http://dx.doi.org/10.3390/app10134602.

Abstract:
Speech recognition for intelligent robots suffers from performance degradation due to ego-noise. The ego-noise is caused by the motors, fans, and mechanical parts inside the intelligent robots, especially when the robot moves or shakes its body. To overcome the problems caused by the ego-noise, we propose a robust speech recognition algorithm that uses motor-state information of the robot as an auxiliary feature. For this, we use two deep neural networks (DNNs) in this paper. Firstly, we design the latent features using a bottleneck layer (one of the internal layers, having a smaller number of hidden units than the other layers) to represent whether the motor is operating or not. The latent features maximizing the representation of the motor-state information are generated by taking the motor data and acoustic features as the input of the first DNN. Secondly, once the motor-state dependent latent features are designed by the first DNN, the second DNN, accounting for acoustic modeling, receives the latent features as input along with the acoustic features. We evaluated the proposed system on the LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rates (WERs) are superior to those of a conventional speech recognition system.
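A minimal PyTorch sketch of the first-stage bottleneck network: motor data and acoustic features go in, a narrow internal layer yields the motor-state latent features, and a motor on/off head drives training. All layer sizes here are illustrative assumptions, not the paper's configuration.

```python
import torch
import torch.nn as nn

class MotorStateBottleneck(nn.Module):
    """First-stage DNN: acoustic features plus motor data in, a narrow
    bottleneck layer whose activations serve as the motor-state latent
    features, and a motor on/off output used for training."""
    def __init__(self, n_acoustic=40, n_motor=6, n_bottleneck=16):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Linear(n_acoustic + n_motor, 256), nn.ReLU(),
            nn.Linear(256, n_bottleneck), nn.ReLU())   # bottleneck layer
        self.classify = nn.Linear(n_bottleneck, 2)     # motor operating or not

    def forward(self, acoustic, motor):
        z = self.encode(torch.cat([acoustic, motor], dim=-1))
        return self.classify(z), z    # logits plus latent features

# z would then be appended to the acoustic features and fed to the
# second DNN, which does the acoustic modeling.
logits, z = MotorStateBottleneck()(torch.randn(8, 40), torch.randn(8, 6))
```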
40. Li, Naihan, Yanqing Liu, Yu Wu, Shujie Liu, Sheng Zhao, and Ming Liu. "RobuTrans: A Robust Transformer-Based Text-to-Speech Model." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8228–35. http://dx.doi.org/10.1609/aaai.v34i05.6337.

Abstract:
Recently, neural network based speech synthesis has achieved outstanding results: the synthesized audios are of excellent quality and naturalness. However, current neural TTS models suffer from a robustness issue, which results in abnormal audios (bad cases), especially for unusual text (unseen contexts). To build a neural model which can synthesize both natural and stable audios, in this paper we make a deep analysis of why previous neural TTS models are not robust, based on which we propose RobuTrans (Robust Transformer), a robust neural TTS model based on the Transformer. Compared to TransformerTTS, our model first converts input texts to linguistic features, including phonemic features and prosodic features, then feeds them to the encoder. In the decoder, the encoder-decoder attention is replaced with a duration-based hard attention mechanism, and the causal self-attention is replaced with a "pseudo non-causal attention" mechanism to model the holistic information of the input. Besides, the position embedding is replaced with a 1-D CNN, since it constrains the maximum length of the synthesized audio. With these modifications, our model not only fixes the robustness problem, but also achieves an on-par MOS (4.36) with TransformerTTS (4.37) and Tacotron2 (4.37) on our general set.
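One common reading of duration-based hard attention is simple repetition: each phoneme-level encoder vector is copied for its predicted number of output frames, removing the learned encoder-decoder alignment entirely. A sketch under that reading (not necessarily RobuTrans's exact mechanism):

```python
import torch

def expand_by_duration(encoder_out, durations):
    """Duration-based hard attention: repeat each phoneme-level encoder
    vector for its predicted number of output frames instead of learning
    an encoder-decoder attention alignment."""
    return torch.repeat_interleave(encoder_out, durations, dim=0)

phonemes = torch.randn(4, 8)                  # 4 phonemes, 8-dim encodings
durs = torch.tensor([3, 5, 2, 4])             # predicted frames per phoneme
print(expand_by_duration(phonemes, durs).shape)   # torch.Size([14, 8])
```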
41. FAROOQ, O., S. DATTA, and M. C. SHROTRIYA. "WAVELET SUB-BAND BASED TEMPORAL FEATURES FOR ROBUST HINDI PHONEME RECOGNITION." International Journal of Wavelets, Multiresolution and Information Processing 08, no. 06 (November 2010): 847–59. http://dx.doi.org/10.1142/s0219691310003845.

Abstract:
This paper proposes the use of a wavelet transform-based feature extraction technique for a Hindi speech recognition application. The newly proposed features take into account temporal as well as frequency band energy variations for the task of Hindi phoneme recognition. The recognition performance achieved by the proposed features is compared with standard MFCC and 24-band admissible wavelet packet-based features using a linear discriminant function based classifier. To evaluate the robustness of these features, the NOISEX database is used to add different types of noise to the phonemes, giving signal-to-noise ratios in the range of 20 dB to -5 dB. The recognition results show that under noisy backgrounds the proposed technique always achieves better performance than MFCC-based features.
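Features of this general flavour (per-frame energies of wavelet sub-bands, tracked over time) can be sketched as follows; the wavelet choice, decomposition depth and log compression are assumptions for illustration, not the paper's exact recipe.

```python
import numpy as np
import pywt

def subband_energy_trajectories(frames, wavelet="db4", level=3):
    """Log energy of each wavelet sub-band, frame by frame, capturing the
    temporal band-energy variations the features are built on."""
    feats = [[np.sum(c ** 2) for c in pywt.wavedec(f, wavelet, level=level)]
             for f in frames]
    return np.log(np.asarray(feats) + 1e-10)

frames = np.random.randn(50, 256)                  # 50 frames, 256 samples
print(subband_energy_trajectories(frames).shape)   # (50, level + 1)
```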
42. Meng, Xiang Tao, and Shi Yin. "Speech Recognition Algorithm Based on Nonlinear Partition and GFCC Features." Applied Mechanics and Materials 556-562 (May 2014): 3069–73. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.3069.

Abstract:
In order to speed up speech recognition and enhance its robustness, this paper proposes a speech recognition algorithm based on segment-level GFCC features. In the training and testing stages we use segment-level GFCC features, which are more robust to noise than the widely used MFCC features. Experimental results show that both the training time and the test time decreased, while the accuracy of the system improved.
43. Zvarevashe, Kudakwashe, and Oludayo O. Olugbara. "Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm." Intelligent Data Analysis 24, no. 5 (September 30, 2020): 1065–86. http://dx.doi.org/10.3233/ida-194747.

Abstract:
Speech emotion recognition has become the heart of most human computer interaction applications in the modern world. The growing need to develop emotionally intelligent devices has opened up a lot of research opportunities. Most researchers in this field have applied handcrafted features and machine learning techniques to recognising speech emotion. However, these techniques require extra processing steps, and handcrafted features are usually not robust. They are also computationally intensive, and the curse of dimensionality results in low discriminating power. Research has shown that deep learning algorithms are effective at extracting robust and salient features from a dataset. In this study, we have developed a custom 2D-convolution neural network that performs both feature extraction and classification of vocal utterances. The neural network has been evaluated against a deep multilayer perceptron neural network and a deep radial basis function neural network using the Berlin database of emotional speech, the Ryerson audio-visual emotional speech database and the Surrey audio-visual expressed emotion corpus. The described deep learning algorithm achieves the highest precision, recall and F1-scores when compared to other existing algorithms. It is observed that there may be a need to develop customized solutions for different language settings depending on the area of application.
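A compact 2D-CNN of the general kind described, doing feature extraction and classification in one network, might be sketched like this in PyTorch; the layer sizes, the 64x64 input patch and the 7-class output are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

# One network does both feature extraction (convolutions) and
# classification (fully connected head) over log-mel spectrogram patches.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
    nn.Linear(128, 7))                        # e.g. 7 emotion classes

logits = model(torch.randn(4, 1, 64, 64))     # batch of 64x64 patches
print(logits.shape)                           # torch.Size([4, 7])
```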
44. Zhou, Bin, Jing Liu, and Zheng Pei. "Noise-Robust Voice Activity Detector Based on Four States-Based HMM." Applied Mechanics and Materials 411-414 (September 2013): 743–48. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.743.

Abstract:
Voice activity detection (VAD) is increasingly essential in noisy environments for providing accurate performance in speech recognition. In this paper, we provide a method based on a left-right hidden Markov model (HMM) to identify the start and end of speech. The method builds two models, of non-speech and of speech, instead of the existing two states; formally, each model can include several states. We also analyse other features, such as pitch index, pitch magnitude and the fractal dimension of speech and non-speech. We compare the VAD results of the proposed algorithm and a two-state HMM. Experiments show that the proposed method performs better than two-state HMMs in VAD, especially in low signal-to-noise ratio (SNR) environments.
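Collapsing the two models to two macro states, the frame-level decision reduces to Viterbi decoding with a sticky transition prior. A sketch under that simplification (the paper's models have several internal states each, which this omits, and the transition probability is an illustrative value):

```python
import numpy as np

def viterbi_vad(loglik_nonspeech, loglik_speech, p_stay=0.98):
    """Per-frame speech/non-speech labels by Viterbi decoding over two
    macro states, given per-frame log-likelihoods from the two models;
    the sticky transition prior suppresses spurious switches."""
    ll = np.vstack([loglik_nonspeech, loglik_speech])        # (2, T)
    log_a = np.log([[p_stay, 1 - p_stay], [1 - p_stay, p_stay]])
    T = ll.shape[1]
    delta, back = ll[:, 0].copy(), np.zeros((2, T), dtype=int)
    for t in range(1, T):
        trans = delta[:, None] + log_a                       # (from, to)
        back[:, t] = np.argmax(trans, axis=0)
        delta = trans.max(axis=0) + ll[:, t]
    path = np.zeros(T, dtype=int)
    path[-1] = int(np.argmax(delta))
    for t in range(T - 1, 0, -1):
        path[t - 1] = back[path[t], t]
    return path            # 1 = speech frame, 0 = non-speech frame

ll_n, ll_s = np.random.randn(2, 100)    # stand-in per-frame model scores
print(viterbi_vad(ll_n, ll_s)[:20])
```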
45. Park, Taejin, SeungKwan Beack, and Taejin Lee. "Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features." IEIE Transactions on Smart Processing and Computing 3, no. 5 (October 31, 2014): 259–66. http://dx.doi.org/10.5573/ieiespc.2014.3.5.259.

46. Hsieh, C. T., E. Lai, and Y. C. Wang. "Robust speech features based on wavelet transform with application to speaker identification." IEE Proceedings - Vision, Image, and Signal Processing 149, no. 2 (2002): 108. http://dx.doi.org/10.1049/ip-vis:20020121.

47. Jeyalakshmi, C., K. Thenmozhi, and A. Revathi. "Non-spectral features-based robust speaker independent emotion recognition from speech signal." International Journal of Medical Engineering and Informatics 12, no. 5 (2020): 500. http://dx.doi.org/10.1504/ijmei.2020.10031560.

48. Revathi, A., C. Jeyalakshmi, and K. Thenmozhi. "Non-spectral features-based robust speaker independent emotion recognition from speech signal." International Journal of Medical Engineering and Informatics 12, no. 5 (2020): 500. http://dx.doi.org/10.1504/ijmei.2020.109944.

49. Shen, Jia-lin, and Wen L. Hwang. "New temporal features for robust speech recognition with emphasis on microphone variations." Computer Speech & Language 13, no. 1 (January 1999): 65–78. http://dx.doi.org/10.1006/csla.1998.0050.

50. Amrous, Anissa Imen, Mohamed Debyeche, and Abderrahman Amrouche. "Robust Arabic speech recognition in noisy environments using prosodic features and formant." International Journal of Speech Technology 14, no. 4 (September 23, 2011): 351–59. http://dx.doi.org/10.1007/s10772-011-9113-5.
