Journal articles on the topic 'Robust speech features'


Consult the top 50 journal articles for your research on the topic 'Robust speech features.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles in a wide variety of disciplines and organise your bibliography correctly.

1

Huang, Kuo-Chang, Yau-Tarng Juang, and Wen-Chieh Chang. "Robust integration for speech features." Signal Processing 86, no. 9 (September 2006): 2282–88. http://dx.doi.org/10.1016/j.sigpro.2005.10.020.

2

Potamianos, Alexandros. "Novel features for robust speech recognition." Journal of the Acoustical Society of America 112, no. 5 (November 2002): 2278. http://dx.doi.org/10.1121/1.4779131.

3

Goh, Yeh Huann, Paramesran Raveendran, and Sudhanshu Shekhar Jamuar. "Robust speech recognition using harmonic features." IET Signal Processing 8, no. 2 (April 2014): 167–75. http://dx.doi.org/10.1049/iet-spr.2013.0094.

4

Eskikand, Parvin Zarei, and Seyyed Ali Seyyedsalehi. "Robust speech recognition by extracting invariant features." Procedia - Social and Behavioral Sciences 32 (2012): 230–37. http://dx.doi.org/10.1016/j.sbspro.2012.01.034.

5

Dimitriadis, D., P. Maragos, and A. Potamianos. "Robust AM-FM features for speech recognition." IEEE Signal Processing Letters 12, no. 9 (September 2005): 621–24. http://dx.doi.org/10.1109/lsp.2005.853050.

6

Harding, Philip, and Ben Milner. "Reconstruction-based speech enhancement from robust acoustic features." Speech Communication 75 (December 2015): 62–75. http://dx.doi.org/10.1016/j.specom.2015.09.011.

7

Raj, Bhiksha, Michael L. Seltzer, and Richard M. Stern. "Reconstruction of missing features for robust speech recognition." Speech Communication 43, no. 4 (September 2004): 275–96. http://dx.doi.org/10.1016/j.specom.2004.03.007.

8

Onoe, K., S. Sato, S. Homma, A. Kobayashi, T. Imai, and T. Takagi. "Bi-Spectral Acoustic Features for Robust Speech Recognition." IEICE Transactions on Information and Systems E91-D, no. 3 (March 1, 2008): 631–34. http://dx.doi.org/10.1093/ietisy/e91-d.3.631.

9

Bansal, Poonam, Amita Dev, and Shail Jain. "Robust Feature Vector Set Using Higher Order Autocorrelation Coefficients." International Journal of Cognitive Informatics and Natural Intelligence 4, no. 4 (October 2010): 37–46. http://dx.doi.org/10.4018/ijcini.2010100103.

Abstract:
In this paper, a feature extraction method that is robust to additive background noise is proposed for automatic speech recognition. Since the background noise corrupts the autocorrelation coefficients of the speech signal mostly at the lower orders, while the higher-order autocorrelation coefficients are least affected, this method discards the lower-order autocorrelation coefficients and uses only the higher-order ones for spectral estimation. The magnitude spectrum of the windowed higher-order autocorrelation sequence is used as an estimate of the power spectrum of the speech signal. This power spectral estimate is processed further by the Mel filter bank, a log operation, and the discrete cosine transform to get the cepstral coefficients, referred to as the Differentiated Relative Higher Order Autocorrelation Coefficient Sequence Spectrum (DRHOASS). The authors evaluate the speech recognition performance of the DRHOASS features and show that they perform as well as MFCC features for clean speech and outperform them for noisy speech.
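For orientation, a minimal Python sketch of the spectral-estimation pipeline this abstract describes (not the authors' code: the differentiation step behind the "Differentiated" in DRHOASS is omitted, and the lag cutoff, FFT size, and filter-bank settings are invented here):

```python
import numpy as np
import librosa
from scipy.fft import dct

def hoac_cepstra(frame, sr=8000, low_lag_cutoff=15, n_fft=512,
                 n_mels=24, n_ceps=13):
    """Cepstra from the higher-order autocorrelation of one frame."""
    # Autocorrelation for lags 0 .. N-1.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Discard the low-order lags, which additive noise corrupts most.
    r[:low_lag_cutoff] = 0.0
    # Magnitude spectrum of the windowed higher-order lag sequence
    # serves as the power spectral estimate.
    spec = np.abs(np.fft.rfft(r * np.hamming(len(r)), n_fft))
    # Mel filter bank -> log -> DCT, as in ordinary MFCC extraction.
    mel_fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    return dct(np.log(mel_fb @ spec + 1e-10), norm="ortho")[:n_ceps]

ceps = hoac_cepstra(np.random.randn(256))  # placeholder frame
```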
10

Majeed, Sayf A., Hafizah Husain, and Salina A. Samad. "Phase Autocorrelation Bark Wavelet Transform (PACWT) Features for Robust Speech Recognition." Archives of Acoustics 40, no. 1 (March 1, 2015): 25–31. http://dx.doi.org/10.1515/aoa-2015-0004.

Abstract:
In this paper, a new feature-extraction method is proposed to achieve robustness of speech recognition systems. This method combines the benefits of phase autocorrelation (PAC) with the bark wavelet transform. PAC uses the angle to measure correlation instead of the traditional autocorrelation measure, whereas the bark wavelet transform is a special type of wavelet transform that is particularly designed for speech signals. The features extracted by this combined method are called phase autocorrelation bark wavelet transform (PACWT) features. The speech recognition performance of the PACWT features is evaluated and compared to the conventional feature extraction method, mel frequency cepstrum coefficients (MFCC), using the TI-Digits database under different types and levels of noise. This database has been divided into male and female data. The results show that the word recognition rate using the PACWT features for noisy male data (white noise at 0 dB SNR) is 60%, whereas it is 41.35% for the MFCC features under identical conditions.
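A sketch of the angle-based correlation idea; the bark wavelet stage is not shown, and this particular normalization (each lag divided by the frame energy) is an assumption about the PAC measure:

```python
import numpy as np

def phase_autocorrelation(frame):
    """Replace each autocorrelation value by an angle."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Normalizing by r[0] (the frame energy) turns each lag into a
    # cosine similarity; arccos converts it to an angle, which is
    # less sensitive to additive noise than the raw correlation.
    rho = np.clip(r / (r[0] + 1e-12), -1.0, 1.0)
    return np.arccos(rho)
```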
11

Hsieh, Hsin-Ju, Berlin Chen, and Jeih-weih Hung. "Histogram equalization of contextual statistics of speech features for robust speech recognition." Multimedia Tools and Applications 74, no. 17 (March 8, 2014): 6769–95. http://dx.doi.org/10.1007/s11042-014-1929-y.

12

Ouzounov, A. "Mean-Delta Features for Telephone Speech Endpoint Detection." Information Technologies and Control 12, no. 3-4 (December 1, 2014): 36–44. http://dx.doi.org/10.1515/itc-2016-0005.

Abstract:
In this paper, a brief summary of the author's research in the field of contour-based telephone speech Endpoint Detection (ED) is presented. This research includes the development of new robust features for ED, the Mean-Delta feature and the Group Delay Mean-Delta feature, and an estimation of the effect of the analyzed ED features and two additional features in a Dynamic Time Warping fixed-text speaker verification task with short noisy telephone phrases in the Bulgarian language.
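One plausible reading of a mean-delta contour, sketched under the assumption that it averages the magnitudes of standard regression deltas across feature dimensions (the paper's exact definition may differ):

```python
import numpy as np

def delta(feats, K=2):
    """Regression deltas over a (frames x dims) feature matrix."""
    pad = np.pad(feats, ((K, K), (0, 0)), mode="edge")
    denom = 2 * sum(k * k for k in range(1, K + 1))
    return sum(k * (pad[K + k:len(pad) - K + k]
                    - pad[K - k:len(pad) - K - k])
               for k in range(1, K + 1)) / denom

def mean_delta_contour(feats, K=2):
    # One value per frame; thresholding this contour marks
    # candidate speech endpoints.
    return np.abs(delta(feats, K)).mean(axis=1)
```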
13

Shoiynbek, Aisultan, Kanat Kozhakhmet, Nazerke Sultanova, and Rakhima Zhumaliyeva. "The Robust Spectral Audio Features for Speech Emotion Recognition." Applied Mathematics & Information Sciences 13, no. 5 (September 1, 2019): 867–70. http://dx.doi.org/10.18576/amis/130521.

14

Milner, B. P., and S. V. Vaseghi. "Bayesian channel equalisation and robust features for speech recognition." IEE Proceedings - Vision, Image, and Signal Processing 143, no. 4 (1996): 223. http://dx.doi.org/10.1049/ip-vis:19960577.

15

Shahnawazuddin, Syed, Rohit Sinha, and Gayadhar Pradhan. "Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition." IEEE Signal Processing Letters 24, no. 8 (August 2017): 1128–32. http://dx.doi.org/10.1109/lsp.2017.2705085.

16

Spille, Constantin, Birger Kollmeier, and Bernd T. Meyer. "Combining Binaural and Cortical Features for Robust Speech Recognition." IEEE/ACM Transactions on Audio, Speech, and Language Processing 25, no. 4 (April 2017): 756–67. http://dx.doi.org/10.1109/taslp.2017.2661712.

17

Ikbal, Shajith, Hemant Misra, Hynek Hermansky, and Mathew Magimai-Doss. "Phase AutoCorrelation (PAC) features for noise robust speech recognition." Speech Communication 54, no. 7 (September 2012): 867–80. http://dx.doi.org/10.1016/j.specom.2012.02.005.

18

Nishimura, Yoshitaka, Takahiro Shinozaki, Koji Iwano, and Sadaoki Furui. "Noise‐robust speech recognition using multi‐band spectral features." Journal of the Acoustical Society of America 116, no. 4 (October 2004): 2480. http://dx.doi.org/10.1121/1.4784906.

19

Gavat, Inge, Gabriel Costache, and Claudia Iancu. "Enhancing robustness of speech recognizers by bimodal features." Facta universitatis - series: Electronics and Energetics 19, no. 2 (2006): 287–98. http://dx.doi.org/10.2298/fuee0602287g.

Abstract:
In this paper a robust speech recognizer is presented, based on features obtained from the speech signal and also from the image of the speaker. The features were combined by simple concatenation, resulting in composite feature vectors used to train the models corresponding to each class. For recognition, the classification process relies on a very effective algorithm, namely the multiclass SVM. Under additive noise conditions the bimodal system based on combined features performs better than the unimodal system based only on the speech features, with the added information from the image playing an important role in the robustness improvement.
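The combination scheme is simple enough to sketch directly; here the feature matrices and label array are placeholders for real audio-visual data:

```python
import numpy as np
from sklearn.svm import SVC

audio_feats = np.random.randn(200, 39)    # e.g. MFCC-based vectors
visual_feats = np.random.randn(200, 20)   # e.g. lip-region features
labels = np.random.randint(0, 10, size=200)

# Simple concatenation of the two modalities, as described above.
bimodal = np.hstack([audio_feats, visual_feats])

# Multiclass SVM (scikit-learn uses one-vs-one for multiclass SVC).
clf = SVC(kernel="rbf").fit(bimodal, labels)
print(clf.predict(bimodal[:5]))
```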
20

Lingnan, Ge, Katsuhiko Shirai, and Akira Kurematsu. "Approach of features with confident weight for robust speech recognition." Acoustical Science and Technology 32, no. 3 (2011): 92–99. http://dx.doi.org/10.1250/ast.32.92.

21

Hung, Jeih-Weih, and Wei-Yi Tsai. "Constructing Modulation Frequency Domain-Based Features for Robust Speech Recognition." IEEE Transactions on Audio, Speech, and Language Processing 16, no. 3 (March 2008): 563–77. http://dx.doi.org/10.1109/tasl.2007.913405.

22

Farooq, O., and S. Datta. "Robust features for speech recognition based on admissible wavelet packets." Electronics Letters 37, no. 25 (2001): 1554. http://dx.doi.org/10.1049/el:20011029.

23

Wang, Shuiping, Zhenmin Tang, Ye Jiang, and Ying Chen. "Robust FHPD Features from Speech Harmonic Analysis for Speaker Identification." Applied Mathematics & Information Sciences 7, no. 4 (July 1, 2013): 1591–98. http://dx.doi.org/10.12785/amis/070445.

24

Mitra, Vikramjit, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, and Louis Goldstein. "Robust speech recognition with articulatory features using dynamic Bayesian networks." Journal of the Acoustical Society of America 130, no. 4 (October 2011): 2408. http://dx.doi.org/10.1121/1.3654653.

25

Revathi, A., N. Sasikaladevi, R. Nagakrishnan, and C. Jeyalakshmi. "Robust emotion recognition from speech: Gamma tone features and models." International Journal of Speech Technology 21, no. 3 (August 4, 2018): 723–39. http://dx.doi.org/10.1007/s10772-018-9546-1.

26

Seyedin, Sanaz, Seyed Mohammad Ahadi, and Saeed Gazor. "New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition." Scientific World Journal 2013 (2013): 1–11. http://dx.doi.org/10.1155/2013/634160.

Abstract:
This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of the temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of nonstationary additive noise which remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via revised weighting of the sub-band power spectrum values based on the sub-band signal-to-noise ratios (SNRs). This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates the noisy and undesired components more effectively. The modification reduces the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 recognition task, outperformed the Mel frequency cepstral coefficient (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions.
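For orientation, a minimal Capon/MVDR spectral estimate from autocorrelation lags; the paper's temporal filtering of the autocorrelation, perceptual warping, and SNR-based re-weighting of the distortionless constraint are not shown, and the model order and regularization here are invented:

```python
import numpy as np
from scipy.linalg import toeplitz

def mvdr_spectrum(r, order=12, n_freq=257):
    """MVDR power spectrum from autocorrelation lags r[0..order]."""
    R = toeplitz(r[:order + 1])              # Toeplitz autocorr matrix
    R_inv = np.linalg.inv(R + 1e-8 * np.eye(order + 1))
    spec = np.empty(n_freq)
    for i, w in enumerate(np.linspace(0, np.pi, n_freq)):
        e = np.exp(-1j * w * np.arange(order + 1))  # steering vector
        spec[i] = 1.0 / np.real(e.conj() @ R_inv @ e)
    return spec

x = np.random.randn(256)                     # placeholder frame
r = np.correlate(x, x, mode="full")[len(x) - 1:]
print(mvdr_spectrum(r)[:5])
```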
27

Alabbasi, Hesham A., Ali M. Jalil, and Fadhil S. Hasan. "Adaptive wavelet thresholding with robust hybrid features for text-independent speaker identification system." International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (October 1, 2020): 5208. http://dx.doi.org/10.11591/ijece.v10i5.pp5208-5216.

Abstract:
The robustness of speaker identification systems over additive noise channels is crucial for real-world applications. In speaker identification (SID) systems, the features extracted from each speech frame are an essential factor in building a reliable identification system. In clean environments the identification system works well; in noisy environments, additive noise degrades it. To eliminate the problem of additive noise and to achieve high accuracy, a feature extraction algorithm based on speech enhancement and combined features is presented. In this paper, a wavelet thresholding pre-processing stage and feature warping (FW) are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the robustness of the identification system against different types of additive noise. A Universal Background Model Gaussian Mixture Model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results showed that the proposed feature extraction algorithm improves identification performance compared with conventional features over most noise types and SNR levels.
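A common form of the wavelet-thresholding pre-processing stage, sketched with PyWavelets (the paper's adaptive rule, the PNCC/GFCC extraction, feature warping, and UBM-GMM scoring are not shown; the wavelet choice, decomposition level, and universal threshold are assumptions):

```python
import numpy as np
import pywt

def wavelet_denoise(signal, wavelet="db8", level=4):
    """Soft-threshold wavelet denoising of a noisy speech signal."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    # Noise level estimated from the finest detail coefficients
    # (median absolute deviation), then the universal threshold.
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745
    thr = sigma * np.sqrt(2 * np.log(len(signal)))
    coeffs[1:] = [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(signal)]
```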
28

Farahat, Mahboubeh, and Ramin Halavati. "Noise Robust Speech Recognition Using Deep Belief Networks." International Journal of Computational Intelligence and Applications 15, no. 01 (March 2016): 1650005. http://dx.doi.org/10.1142/s146902681650005x.

Abstract:
Most current speech recognition systems use Hidden Markov Models (HMMs) to deal with the temporal variability of speech and Gaussian mixture models (GMMs) to determine how well each state of each HMM fits a frame, or a short window of frames, of coefficients representing the acoustic input. In these systems acoustic inputs are represented by frames of Mel frequency cepstral coefficients (MFCCs). But MFCCs are not robust to noise, so the accuracy of speech recognition systems decreases when training and test conditions differ. On the other hand, using MFCCs from larger windows of frames in GMMs needs more computational power. In this paper, Deep Belief Networks (DBNs) are used to extract discriminative information from larger windows of frames. Nonlinear transformations lead to high-order, low-dimensional features which are robust to variation of the input speech. Multiple-speaker isolated word recognition tasks with 100 and 200 words in clean and noisy environments have been used to test this method. The experimental results indicate that this new method of feature encoding results in much better word recognition accuracy.
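A minimal PyTorch sketch of the underlying idea, extracting low-dimensional features from a context window through a narrow layer (the generative DBN pretraining is omitted, and all layer sizes here are invented):

```python
import torch
import torch.nn as nn

# A context window of frames is pushed through a network whose output
# layer is narrow, yielding low-dimensional features.
context, dims, bottleneck = 11, 13, 40
encoder = nn.Sequential(
    nn.Linear(context * dims, 512), nn.Sigmoid(),
    nn.Linear(512, 512), nn.Sigmoid(),
    nn.Linear(512, bottleneck),          # low-dimensional features
)
frames = torch.randn(8, context * dims)  # batch of stacked MFCC windows
features = encoder(frames)               # shape (8, 40)
```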
29

Zhang, Zhan, Yuehai Wang, and Jianyi Yang. "Accent Recognition with Hybrid Phonetic Features." Sensors 21, no. 18 (September 18, 2021): 6258. http://dx.doi.org/10.3390/s21186258.

Abstract:
The performance of voice-controlled systems is usually influenced by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. As accent is a high-level abstract feature that has a profound relationship with language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results demonstrate that our approach can obtain an 8.02% relative improvement compared with the Transformer baseline, showing the merits of the proposed method.
30

Khan, Emdad, and Robert Levinson. "Robust Speech Recognition Using a Noise Rejection Approach." International Journal on Artificial Intelligence Tools 08, no. 01 (March 1999): 53–71. http://dx.doi.org/10.1142/s0218213099000051.

Abstract:
In this paper, we explore some new approaches to improve speech recognition accuracy in a noisy environment. The key approaches taken are: (a) use no additional data (i.e., only the speakers' data, no data for noise) for training and (b) no adaptation phase for noise. Instead of making adaptations at the recognition stage, the preprocessing stage, or both, we build a noise-tolerant (rejection) speech recognition system that tries to reject noise automatically because of its inherent structure. We call this a noise rejection-based approach. Noise rejection is achieved by using multiple views and dynamic features of the input sequences. Multiple views exploit more information from the available data, which is used for training multiple HMMs (Hidden Markov Models). This makes the training process simpler and faster and avoids the need for a noise database, which is often difficult to obtain. The dynamic features (added to the HMM using vector emission probabilities) add more information about the input speech during training. Since the values of the dynamic features of noise are usually much smaller than those of the speech signal, they help reject the noise during recognition. Multiple views (we also call these scrambles) can be used at different stages of the recognition process, and this paper explores these possibilities. Multiple views of the input sequence are applied to multiple HMMs during recognition, and the outcomes of the multiple HMMs are combined using a maximum-evidence criterion. The accuracy of the noise rejection-based approach is further improved by using Higher Level Decision Making (HLD), our method for data fusion, which improves accuracy by efficiently resolving conflicts. The key approaches taken for HLD are meta reasoning, single cycle training (SCT), confidence factors, and view minimization. Our tests show very encouraging results.
31

Lin, Shih-Hsiang, Berlin Chen, and Yao-Ming Yeh. "Exploring the Use of Speech Features and Their Corresponding Distribution Characteristics for Robust Speech Recognition." IEEE Transactions on Audio, Speech, and Language Processing 17, no. 1 (January 2009): 84–94. http://dx.doi.org/10.1109/tasl.2008.2007612.

32

Sen, TjongWan, Bambang Riyanto Trilaksono, Arry Akhmad Arman, and Rila Mandala. "Robust Automatic Speech Recognition Features using Complex Wavelet Packet Transform Coefficients." ITB Journal of Information and Communication Technology 3, no. 2 (2009): 123–34. http://dx.doi.org/10.5614/itbj.ict.2009.3.2.4.

33

Legoh, Kapang, Utpal Bhattacharjee, and T. Tuithung. "Features and Model Adaptation Techniques for Robust Speech Recognition: A Review." Communications on Applied Electronics 1, no. 2 (January 31, 2015): 18–31. http://dx.doi.org/10.5120/cae-1507.

34

Gairola, Atul, and Swapna Baadkar. "Hindi Speech Recognition System with Robust Front End-Back End Features." International Journal of Computer Applications 64, no. 1 (February 15, 2013): 42–45. http://dx.doi.org/10.5120/10601-5305.

35

Shen, Peng, Satoshi Tamura, and Satoru Hayamizu. "Multistream sparse representation features for noise robust audio-visual speech recognition." Acoustical Science and Technology 35, no. 1 (2014): 17–27. http://dx.doi.org/10.1250/ast.35.17.

36

Bach, Jörg-Hendrik, Jörn Anemüller, and Birger Kollmeier. "Robust speech detection in real acoustic backgrounds with perceptually motivated features." Speech Communication 53, no. 5 (May 2011): 690–706. http://dx.doi.org/10.1016/j.specom.2010.07.003.

37

Hung, Jeih-Weih, and Lin-Shan Lee. "Optimization of temporal filters for constructing robust features in speech recognition." IEEE Transactions on Audio, Speech and Language Processing 14, no. 3 (May 2006): 808–32. http://dx.doi.org/10.1109/tsa.2005.857801.

38

Fazel, A., and S. Chakrabartty. "Sparse Auditory Reproducing Kernel (SPARK) Features for Noise-Robust Speech Recognition." IEEE Transactions on Audio, Speech, and Language Processing 20, no. 4 (May 2012): 1362–71. http://dx.doi.org/10.1109/tasl.2011.2179294.

39

Lee, Moa, and Joon-Hyuk Chang. "Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots." Applied Sciences 10, no. 13 (July 2, 2020): 4602. http://dx.doi.org/10.3390/app10134602.

Abstract:
Speech recognition for intelligent robots suffers from performance degradation due to ego-noise, which is caused by the motors, fans, and mechanical parts inside the robots, especially when the robot moves or shakes its body. To overcome the problems caused by ego-noise, we propose a robust speech recognition algorithm that uses motor-state information of the robot as an auxiliary feature. For this, we use two deep neural networks (DNNs). First, we design latent features using a bottleneck layer, an internal layer with a smaller number of hidden units than the other layers, to represent whether the motor is operating or not. The latent features maximizing the representation of the motor-state information are generated by taking the motor data and acoustic features as the input of the first DNN. Second, once the motor-state-dependent latent features are designed by the first DNN, the second DNN, accounting for acoustic modeling, receives the latent features as input along with the acoustic features. We evaluated the proposed system on the LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rates (WERs) are superior to those of a conventional speech recognition system.
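A rough PyTorch sketch of the two-network layout the abstract describes (all dimensions are invented, and the training objective that makes the first network predict motor state is omitted):

```python
import torch
import torch.nn as nn

acoustic_dim, motor_dim, bottleneck, n_states = 40, 8, 16, 500

# DNN 1: compresses motor data plus acoustics through a bottleneck
# layer into motor-state latent features.
latent_net = nn.Sequential(
    nn.Linear(acoustic_dim + motor_dim, 256), nn.ReLU(),
    nn.Linear(256, bottleneck),          # the latent features
)

# DNN 2: acoustic model fed the acoustics plus the latent features.
acoustic_model = nn.Sequential(
    nn.Linear(acoustic_dim + bottleneck, 512), nn.ReLU(),
    nn.Linear(512, n_states),            # HMM-state posteriors
)

acoustics = torch.randn(4, acoustic_dim)
motor = torch.randn(4, motor_dim)
latent = latent_net(torch.cat([acoustics, motor], dim=1))
posteriors = acoustic_model(torch.cat([acoustics, latent], dim=1))
```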
40

Li, Naihan, Yanqing Liu, Yu Wu, Shujie Liu, Sheng Zhao, and Ming Liu. "RobuTrans: A Robust Transformer-Based Text-to-Speech Model." Proceedings of the AAAI Conference on Artificial Intelligence 34, no. 05 (April 3, 2020): 8228–35. http://dx.doi.org/10.1609/aaai.v34i05.6337.

Abstract:
Recently, neural network based speech synthesis has achieved outstanding results, by which the synthesized audios are of excellent quality and naturalness. However, current neural TTS models suffer from a robustness issue, which results in abnormal audios (bad cases), especially for unusual text (unseen contexts). To build a neural model which can synthesize both natural and stable audio, in this paper we deeply analyze why previous neural TTS models are not robust, and based on this we propose RobuTrans (Robust Transformer), a robust neural TTS model based on the Transformer. Compared to TransformerTTS, our model first converts input texts to linguistic features, including phonemic and prosodic features, then feeds them to the encoder. In the decoder, the encoder-decoder attention is replaced with a duration-based hard attention mechanism, and the causal self-attention is replaced with a "pseudo non-causal attention" mechanism to model the holistic information of the input. In addition, the position embedding is replaced with a 1-D CNN, since it constrains the maximum length of the synthesized audio. With these modifications, our model not only fixes the robustness problem but also achieves a MOS (4.36) on par with TransformerTTS (4.37) and Tacotron2 (4.37) on our general set.
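The abstract does not spell out the duration-based hard attention, but a FastSpeech-style length regulator realizes the same duration-driven hard alignment, so a hedged sketch of that mechanism may help (the function name and dimensions are hypothetical, not RobuTrans internals):

```python
import torch

def expand_by_duration(encoder_out, durations):
    """Repeat each encoder frame by its predicted duration, so every
    decoder step is hard-aligned to exactly one input token."""
    # encoder_out: (T_in, D); durations: (T_in,) integer frame counts.
    return torch.repeat_interleave(encoder_out, durations, dim=0)

enc = torch.randn(5, 256)               # 5 phonemes, 256-dim encodings
dur = torch.tensor([3, 1, 4, 2, 5])     # hypothetical durations
dec_in = expand_by_duration(enc, dur)   # shape (15, 256)
```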
41

Farooq, O., S. Datta, and M. C. Shrotriya. "Wavelet Sub-Band Based Temporal Features for Robust Hindi Phoneme Recognition." International Journal of Wavelets, Multiresolution and Information Processing 08, no. 06 (November 2010): 847–59. http://dx.doi.org/10.1142/s0219691310003845.

Abstract:
This paper proposes the use of a wavelet transform-based feature extraction technique for a Hindi speech recognition application. The newly proposed features take into account temporal as well as frequency band energy variations for the task of Hindi phoneme recognition. The recognition performance achieved by the proposed features is compared with standard MFCC and 24-band admissible wavelet packet-based features using a linear discriminant function based classifier. To evaluate the robustness of these features, the NOISEX database is used to add different types of noise to the phonemes, giving signal-to-noise ratios in the range of 20 dB to -5 dB. The recognition results show that under noisy backgrounds the proposed technique consistently outperforms MFCC-based features.
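A minimal sketch of per-frame wavelet-packet sub-band energies, whose trajectories over frames capture the temporal and frequency-band variation the abstract mentions (sketched with PyWavelets; the wavelet, depth, and log compression are assumptions):

```python
import numpy as np
import pywt

def subband_energy_trajectories(frames, wavelet="db4", level=3):
    """Log energies of wavelet-packet sub-bands, one row per frame."""
    rows = []
    for frame in frames:
        wp = pywt.WaveletPacket(frame, wavelet, maxlevel=level)
        nodes = wp.get_level(level, order="freq")  # 2**level sub-bands
        rows.append([np.sum(np.square(n.data)) for n in nodes])
    # Column j traced over frames is the temporal energy trajectory
    # of sub-band j.
    return np.log(np.asarray(rows) + 1e-10)
```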
42

Meng, Xiang Tao, and Shi Yin. "Speech Recognition Algorithm Based on Nonlinear Partition and GFCC Features." Applied Mechanics and Materials 556-562 (May 2014): 3069–73. http://dx.doi.org/10.4028/www.scientific.net/amm.556-562.3069.

Abstract:
In order to speed up speech recognition systems and enhance their robustness, this paper proposes a speech recognition algorithm based on segment-level GFCC features. In the training and testing stages we use segment-level GFCC features, which are more robust to noise, instead of the widely used MFCC features. Experimental results show that both training time and test time decreased, while the accuracy of the system improved.
43

Zvarevashe, Kudakwashe, and Oludayo O. Olugbara. "Recognition of speech emotion using custom 2D-convolution neural network deep learning algorithm." Intelligent Data Analysis 24, no. 5 (September 30, 2020): 1065–86. http://dx.doi.org/10.3233/ida-194747.

Abstract:
Speech emotion recognition has become the heart of most human computer interaction applications in the modern world. The growing need to develop emotionally intelligent devices has opened up a lot of research opportunities. Most researchers in this field have applied handcrafted features and machine learning techniques to recognise speech emotion. However, these techniques require extra processing steps, and handcrafted features are usually not robust; they are computationally intensive because the curse of dimensionality results in low discriminating power. Research has shown that deep learning algorithms are effective for extracting robust and salient features from datasets. In this study, we have developed a custom 2D-convolution neural network that performs both feature extraction and classification of vocal utterances. The neural network has been evaluated against a deep multilayer perceptron neural network and a deep radial basis function neural network using the Berlin database of emotional speech, the Ryerson audio-visual emotional speech database and the Surrey audio-visual expressed emotion corpus. The described deep learning algorithm achieves the highest precision, recall and F1-scores when compared to other existing algorithms. It is observed that there may be a need to develop customised solutions for different language settings depending on the area of application.
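A small PyTorch sketch of a 2D-CNN that both extracts features and classifies spectrogram "images" (the layer sizes, input shape, and number of emotion classes are invented, not the paper's architecture):

```python
import torch
import torch.nn as nn

n_emotions = 7
# Convolutional stages learn the features; a linear head classifies.
model = nn.Sequential(
    nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, n_emotions),
)
spectrogram = torch.randn(4, 1, 64, 128)  # (batch, 1, mel bins, frames)
logits = model(spectrogram)               # shape (4, 7)
```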
44

Zhou, Bin, Jing Liu, and Zheng Pei. "Noise-Robust Voice Activity Detector Based on Four States-Based HMM." Applied Mechanics and Materials 411-414 (September 2013): 743–48. http://dx.doi.org/10.4028/www.scientific.net/amm.411-414.743.

Abstract:
Voice activity detection (VAD) is increasingly essential in noisy environments to provide accurate speech recognition. In this paper, we provide a method based on a left-right hidden Markov model (HMM) to identify the start and end of speech. The method builds two models, one for non-speech and one for speech, instead of the usual two states; each model can include several states. We also analyse other features, such as pitch index, pitch magnitude and the fractal dimension of speech and non-speech. We compare the VAD results of the proposed algorithm and a two-state HMM. Experiments show that the proposed method performs better than two-state HMMs in VAD, especially in low signal-to-noise ratio (SNR) environments.
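A minimal two-model sketch with hmmlearn (an ergodic Gaussian HMM is used for brevity instead of the paper's left-right topology; the feature dimension and state counts are invented, and the random arrays stand in for real labelled frames):

```python
import numpy as np
from hmmlearn import hmm

# Placeholder per-frame features (e.g. MFCCs, pitch, fractal dimension).
speech_feats = np.random.randn(1000, 14)
noise_feats = np.random.randn(1000, 14)

# One multi-state HMM per class instead of a single two-state model.
speech_hmm = hmm.GaussianHMM(n_components=4).fit(speech_feats)
noise_hmm = hmm.GaussianHMM(n_components=4).fit(noise_feats)

def is_speech(segment):
    # Classify a window of frames by comparing model log-likelihoods.
    return speech_hmm.score(segment) > noise_hmm.score(segment)
```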
45

Park, Taejin, SeungKwan Beack, and Taejin Lee. "Noise Robust Automatic Speech Recognition Scheme with Histogram of Oriented Gradient Features." IEIE Transactions on Smart Processing and Computing 3, no. 5 (October 31, 2014): 259–66. http://dx.doi.org/10.5573/ieiespc.2014.3.5.259.

46

Hsieh, C. T., E. Lai, and Y. C. Wang. "Robust speech features based on wavelet transform with application to speaker identification." IEE Proceedings - Vision, Image, and Signal Processing 149, no. 2 (2002): 108. http://dx.doi.org/10.1049/ip-vis:20020121.

47

Jeyalakshmi, C., K. Thenmozhi, and A. Revathi. "Non-spectral features-based robust speaker independent emotion recognition from speech signal." International Journal of Medical Engineering and Informatics 12, no. 5 (2020): 500. http://dx.doi.org/10.1504/ijmei.2020.10031560.

48

Revathi, A., C. Jeyalakshmi, and K. Thenmozhi. "Non-spectral features-based robust speaker independent emotion recognition from speech signal." International Journal of Medical Engineering and Informatics 12, no. 5 (2020): 500. http://dx.doi.org/10.1504/ijmei.2020.109944.

49

Shen, Jia-lin, and Wen L. Hwang. "New temporal features for robust speech recognition with emphasis on microphone variations." Computer Speech & Language 13, no. 1 (January 1999): 65–78. http://dx.doi.org/10.1006/csla.1998.0050.

50

Amrous, Anissa Imen, Mohamed Debyeche, and Abderrahman Amrouche. "Robust Arabic speech recognition in noisy environments using prosodic features and formant." International Journal of Speech Technology 14, no. 4 (September 23, 2011): 351–59. http://dx.doi.org/10.1007/s10772-011-9113-5.
