Dissertations / Theses on the topic 'Robust speech features'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 45 dissertations / theses for your research on the topic 'Robust speech features.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses in a wide variety of disciplines and organise your bibliography correctly.

1

Saenko, Ekaterina, 1976-. "Articulatory features for robust visual speech recognition." Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/28736.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
Includes bibliographical references (p. 99-105).
This thesis explores a novel approach to visual speech modeling. Visual speech, or a sequence of images of the speaker's face, is traditionally viewed as a single stream of contiguous units, each corresponding to a phonetic segment. These units are defined heuristically by mapping several visually similar phonemes to one visual phoneme, sometimes referred to as a viseme. However, experimental evidence shows that phonetic models trained from visual data are not synchronous in time with acoustic phonetic models, indicating that visemes may not be the most natural building blocks of visual speech. Instead, we propose to model the visual signal in terms of the underlying articulatory features. This approach is a natural extension of feature-based modeling of acoustic speech, which has been shown to increase robustness of audio-based speech recognition systems. We start by exploring ways of defining visual articulatory features: first in a data-driven manner, using a large, multi-speaker visual speech corpus, and then in a knowledge-driven manner, using the rules of speech production. Based on these studies, we propose a set of articulatory features, and describe a computational framework for feature-based visual speech recognition. Multiple feature streams are detected in the input image sequence using Support Vector Machines, and then incorporated in a Dynamic Bayesian Network to obtain the final word hypothesis. Preliminary experiments show that our approach increases viseme classification rates in visually noisy conditions, and improves visual word recognition through feature-based context modeling.
2

Domont, Xavier. "Hierarchical spectro-temporal features for robust speech recognition." Münster Verl.-Haus Monsenstein und Vannerdat, 2009. http://d-nb.info/1001282655/04.

3

Javadi, Ailar. "Bio-inspired noise robust auditory features." Thesis, Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/44801.

Abstract:
The purpose of this work is to investigate a series of biologically inspired modifications to state-of-the-art Mel-frequency cepstral coefficients (MFCCs) that may improve automatic speech recognition results. We have provided recommendations to improve speech recognition results depending on the signal-to-noise ratio levels of the input signals. This work has been motivated by noise-robust auditory features (NRAF). In the feature extraction technique, after a signal is filtered using bandpass filters, a spatial derivative step is used to sharpen the results, followed by an envelope detector (rectification and smoothing) and down-sampling for each filter bank before being compressed. A DCT is then applied to the results of all filter banks to produce features. The Hidden Markov Model Toolkit (HTK) is used as the recognition back-end to perform speech recognition given the features we have extracted. In this work, we investigate the role of filter types, window size, spatial derivative, rectification types, smoothing, down-sampling and compression, and compare the final results to state-of-the-art Mel-frequency cepstral coefficients (MFCCs). A series of conclusions and insights are provided for each step of the process. The goal of this work has not been to outperform MFCCs; however, we have shown that by changing the compression type from log compression to 0.07 root compression we are able to outperform MFCCs for all noisy conditions.
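The compression change that drives the headline result is easy to prototype. The sketch below applies either log or 0.07-root compression to a precomputed filter-bank energy matrix before the DCT; the function name and the surrounding pipeline are illustrative assumptions, not the thesis's code:

import numpy as np
from scipy.fftpack import dct

def auditory_cepstra(fbank_energies, compression="root", root=0.07, n_ceps=13):
    """fbank_energies: (frames, bands) array of non-negative filter-bank energies."""
    safe = np.maximum(fbank_energies, 1e-10)       # avoid log(0) and 0**root edge cases
    if compression == "log":
        compressed = np.log(safe)                  # conventional MFCC-style compression
    else:
        compressed = safe ** root                  # 0.07 root compression, as in the thesis
    # DCT across the band axis decorrelates channels, as in standard MFCC extraction
    return dct(compressed, type=2, axis=1, norm="ortho")[:, :n_ceps]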
4

Schädler, Marc René [author]. "Robust automatic speech recognition and modeling of auditory discrimination experiments with auditory spectro-temporal features / Marc René Schädler." Oldenburg : BIS-Verlag, 2016. http://d-nb.info/1113296755/34.

5

Jancovic, Peter. "Combination of multiple feature streams for robust speech recognition." Thesis, Queen's University Belfast, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.268386.

6

Fairhurst, Harry. "Robust feature extraction for the recognition of noisy speech." Thesis, University of Liverpool, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.327705.

7

Darch, Jonathan J. A. "Robust acoustic speech feature prediction from Mel frequency cepstral coefficients." Thesis, University of East Anglia, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445206.

8

Szymanski, Lech. "Comb filter decomposition feature extraction for robust automatic speech recognition." Thesis, University of Ottawa (Canada), 2005. http://hdl.handle.net/10393/27051.

Abstract:
This thesis discusses the issues of automatic speech recognition in the presence of additive white noise. Comb Filter Decomposition (CFD), a new method for approximating the magnitude of the speech spectrum in terms of its harmonics, is proposed. Three feature extraction methods based on CFD coefficients are introduced. The performance of the method and the resulting features is evaluated using simulated recognition systems with Hidden Markov Model classifiers under additive white noise at varying signal-to-noise ratios. The results are compared with the performance of existing robust feature extraction methods, and show that the proposed method has good potential for automatic speech recognition under noisy conditions.
9

Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.

Abstract:
In automatic speech recognition systems (ASRs), training is a critical phase for the system's success. Communication media, either analog (such as analog landline phones) or digital (VoIP), distort the speaker's speech signal, often in very complex ways: linear distortion occurs in all channels, in either the magnitude or the phase spectrum, and non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects, which produce packet losses, delays and repeated packets. Finally, one cannot really assert what path a signal will take, so error or distortion along the way is almost a certainty. The channel introduces an acoustic mismatch between the speaker's signal and the data on which the ASR was trained, which results in poor recognition performance. The approach so far has been to try to undo the havoc produced by the channels, i.e., to compensate for the channel's behavior. In this thesis, we instead try to characterize the effects of different transmission media and use that characterization as an inexpensive and repeatable way to train ASR systems.
10

Mushtaq, Aleem. "An integrated approach to feature compensation combining particle filters and Hidden Markov Models for robust speech recognition." Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/48982.

Abstract:
The performance of automatic speech recognition systems often degrades in adverse conditions where there is a mismatch between training and testing conditions. This is true for most modern systems, which employ Hidden Markov Models (HMMs) to decode speech utterances. One strategy is to map the distorted features back to clean speech features that correspond well to the features used for training the HMMs. This can be achieved by treating the noisy speech as a distorted version of the clean speech of interest. Under this framework, we can track and consequently extract the underlying clean speech from the noisy signal and use this derived signal to perform utterance recognition. The particle filter is a versatile tracking technique that can often be used where conventional techniques such as the Kalman filter fall short. We propose a particle-filter-based algorithm to compensate the corrupted features according to an additive noise model, incorporating both the statistics from clean speech HMMs and the observed background noise to map noisy features back to clean speech features. Instead of using specific knowledge at the model and state levels from the HMMs, which is hard to estimate, we pool model states into clusters as side information. Since each cluster encompasses more statistics than the original HMM states, there is a higher chance that the newly formed probability density function at the cluster level can cover the underlying speech variation and generate appropriate particle filter samples for feature compensation. Additionally, a dynamic joint tracking framework that monitors the clean speech signal and the noise simultaneously is introduced to obtain good noise statistics. In this approach, the information available from clean speech tracking can be effectively used for noise estimation, and the availability of dynamic noise information enhances the robustness of the algorithm against large fluctuations in noise parameters within an utterance. Testing the proposed particle-filter-based compensation scheme on the Aurora 2 connected digit recognition task, we achieve an error reduction of 12.15% over the best multi-condition trained models using this integrated PF-HMM framework to estimate the cluster-based HMM state sequence information. Finally, we extended the particle filter compensation (PFC) framework to a large-vocabulary recognition task and showed that PFC works well for large-vocabulary systems as well.
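For readers unfamiliar with the machinery, here is a minimal bootstrap particle filter for a single feature dimension under an additive-noise observation model. It is a toy stand-in for the thesis's method, whose particles are drawn from HMM state-cluster densities rather than a random walk:

import numpy as np

def bootstrap_pf(noisy_obs, n_particles=500, proc_std=0.5, obs_std=1.0, seed=0):
    """Estimate a clean trajectory x_t from observations y_t = x_t + noise."""
    rng = np.random.default_rng(seed)
    particles = rng.normal(noisy_obs[0], obs_std, n_particles)
    weights = np.full(n_particles, 1.0 / n_particles)
    estimates = []
    for y in noisy_obs:
        # propagate particles; a random walk stands in for the clean-speech prior
        particles = particles + rng.normal(0.0, proc_std, n_particles)
        # reweight by the observation likelihood under the additive-noise model
        weights = weights * np.exp(-0.5 * ((y - particles) / obs_std) ** 2)
        weights = weights / weights.sum()
        estimates.append(float(np.sum(weights * particles)))
        # resample when the effective sample size collapses
        if 1.0 / np.sum(weights ** 2) < n_particles / 2:
            idx = rng.choice(n_particles, size=n_particles, p=weights)
            particles = particles[idx]
            weights = np.full(n_particles, 1.0 / n_particles)
    return np.array(estimates)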
11

Herms, Robert [Verfasser], Maximilian [Akademischer Betreuer] Eibl, Maximilian [Gutachter] Eibl, and Günter Daniel [Gutachter] Rey. "Effective Speech Features for Cognitive Load Assessment: Classification and Regression / Robert Herms ; Gutachter: Maximilian Eibl, Günter Daniel Rey ; Betreuer: Maximilian Eibl." Chemnitz : Universitätsverlag Chemnitz, 2019. http://d-nb.info/1215909594/34.

12

Shao, Yang. "Sequential organization in computational auditory scene analysis." Columbus, Ohio : Ohio State University, 2007. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1190127412.

13

Suryanarayana, Venkata K. "Spectro-Temporal Features For Robust Automatic Speech Recognition." Thesis, 2009. http://hdl.handle.net/2005/1007.

Abstract:
The speech signal is inherently characterized by its variations in time, which are reflected as variations in frequency. These spectro-temporal changes are due to changes in the vocal tract, intonation, co-articulation and the successive articulation of different phonetic sounds. In this thesis we seek to improve speech recognition performance through better feature parameters based on a non-stationary model of speech. One effective means of modeling a general non-stationary signal is the AM-FM model, which can be extended to speech through a sub-band analysis that mimics auditory analysis. We explore new methods for estimating AM and FM parameters from non-uniform samples of the signal. The non-uniform sampling approach, together with adaptive window estimation, provides an important advantage because of its multi-resolution analysis. We develop several new methods based on zero-crossing (ZC) intervals, local extrema intervals and the signal derivative at ZCs as different sample measures of the signal, and explore their effectiveness for instantaneous frequency (IF) and instantaneous envelope (IE) estimation. For automatic speech recognition, we explore the use of auditory-motivated spectro-temporal information through an auditory filter bank; signal parameters (or features) are derived from the instantaneous energy in each band using the non-linear energy operator over a larger window length. The temporal correlation present in the signal is exploited by applying a DCT and keeping the lower few coefficients to capture the trend of the energy in each band. The DCT coefficients from different frequency bands are concatenated, and further spectral decorrelation is achieved through a KLT (Karhunen-Loeve Transform) of the concatenated feature vector. Changes in the vocal tract are well captured by changes in the formant structure, and to emphasize these details for ASR we define a temporal formant using the AM-FM decomposition of sub-band speech. Uniform wideband non-overlapping filters are used for the sub-band decomposition. The temporal evolution of a formant is represented by the lower-order DCT coefficients of the temporal formant in each band, and its use for ASR is explored. To address the robustness of ASR performance in noisy environmental conditions, we use a hybrid approach that enhances the speech signal using statistical models of the speech and noise. The use of GMMs for statistical speech enhancement has been shown to be effective, and we find that spectro-temporal features derived from enhanced speech provide further improvement in ASR performance.
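As a point of reference for AM-FM decomposition, the instantaneous envelope and frequency of a sub-band signal are commonly obtained from the analytic signal. The thesis instead estimates them from non-uniform samples (zero-crossing intervals, local extrema), which this standard Hilbert-transform sketch does not attempt to reproduce:

import numpy as np
from scipy.signal import hilbert

def am_fm_decompose(subband, fs):
    """Return the instantaneous envelope (AM) and frequency in Hz (FM) of a sub-band signal."""
    analytic = hilbert(subband)                       # x(t) + j * H{x}(t)
    envelope = np.abs(analytic)                       # instantaneous envelope (IE)
    phase = np.unwrap(np.angle(analytic))             # instantaneous phase
    inst_freq = np.diff(phase) * fs / (2.0 * np.pi)   # instantaneous frequency (IF)
    return envelope, inst_freq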
14

Kuo, Hsien-Shun (郭先舜). "Auditory-Based Features for Robust Speech Recognition System." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/17104347666389915184.

Abstract:
Master's thesis, National Cheng Kung University, Department of Electrical Engineering, ROC year 102.
An auditory-based feature extraction algorithm is proposed for enhancing the robustness of automatic speech recognition. In the proposed approach, the speech signal is characterized using a new feature referred to as the Basilar-membrane Frequency-band Cepstral Coefficient (BFCC). In contrast to the conventional Mel-Frequency Cepstral Coefficient (MFCC) method based on a Fourier spectrogram, the proposed BFCC method uses an auditory spectrogram based on a gammachirp wavelet transform in order to more accurately mimic the auditory response of the human ear and improve noise immunity. In addition, a Hidden Markov Model (HMM) is used for both training and testing purposes. The evaluation results obtained using the AURORA 2 noisy speech database show that, compared to the MFCC, Gammatone Wavelet Cepstral Coefficient (GWCC) and Gammatone Frequency Cepstral Coefficient (GFCC) methods, the proposed scheme improves the speech recognition rate by 13%, 17%, and 0.5% on average, respectively, given speech samples with Signal-to-Noise Ratios (SNRs) ranging from -5 to 20 dB.
15

Tu, Wen-Hsiang (杜文祥). "Enhancing Speech Features in Various Domains for Noise-Robust Speech Recognition." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/66756772463135462510.

Abstract:
Doctoral dissertation, National Chi Nan University, Department of Electrical Engineering, ROC year 100.
The performance of an automatic speech recognition (ASR) system is often degraded by the various types of noise and interference in the application environment. In this dissertation, we develop robustness methods specifically for handling additive noise and channel disturbance. In particular, these methods are used to refine the mel-frequency cepstral coefficient (MFCC), one of the most widely used speech feature representations in ASR. First, we discuss the effect of noise in the linear spectral domain of MFCC and present the approach of magnitude spectrum enhancement (MSE) to refine the spectrum of speech signals. Next, a hybrid cepstral statistics normalization method is presented to process the MFCC in the mel-spectral domain. Finally, two novel compensation algorithms, modulation spectrum replacement (MSR) and modulation spectrum filtering (MSF), are provided to enhance the MFCC in the cepstral domain. Recognition experiments conducted on the Aurora-2 connected-digit database show that these methods are capable of improving the recognition accuracy of the MFCC in various noise conditions, and in most cases they perform better than, or at least comparably to, state-of-the-art noise robustness techniques such as Wiener filtering (WF), spectral subtraction (SS), mean and variance normalization (MVN) and histogram equalization (HEQ).
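Among the baselines named here, mean and variance normalization is the simplest to state. A generic per-utterance sketch (not the author's implementation):

import numpy as np

def mvn(features):
    """Mean and variance normalization along time; features: (frames, dims)."""
    mean = features.mean(axis=0)
    std = features.std(axis=0) + 1e-10   # guard against constant dimensions
    return (features - mean) / std

Each cepstral dimension of every utterance is thereby mapped to zero mean and unit variance, which removes much of the bias that differs between training and testing conditions.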
16

Fan, Hao-Teng (范顥騰). "Sub-band Processing in Various Domains of Speech Features for Robust Speech Recognition." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/15896893826936012585.

Abstract:
Doctoral dissertation, National Chi Nan University, Department of Electrical Engineering, ROC year 102.
The environmental mismatch caused by additive noise and/or channel distortion often dramatically degrades the performance of an automatic speech recognition (ASR) system, and plenty of robustness techniques have been developed to reduce this mismatch. This dissertation proposes several novel methods that use sub-band processing in different domains of speech features to improve noise robustness for speech recognition. Briefly speaking, we investigate the noise effect in three domains of speech features and then develop the respective countermeasures. First, we present wavelet threshold de-noising and sub-band feature statistics normalization, which are applied in the temporal domain. Second, two modulation-domain algorithms, sub-band modulation spectrum normalization and modulation spectrum power-law expansion, are developed and evaluated. Finally, we provide a novel scheme that processes the high- and low-pass portions of the spatial-domain features, called weighted sub-band histogram equalization. The presented methods are examined on two databases, Aurora-2 and Aurora-4. The experimental results show that these sub-band methods behave better than the respective full-band methods in most cases, and that they benefit the speech recognition process significantly by improving recognition accuracy under a wide range of noise environments.
17

Lin, Wen-chi (林文琦). "DCT-based Processing of Dynamic Features for Robust Speech Recognition." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/32454204118543401202.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 98.
In this thesis, we explore various properties of cepstral time coefficients (CTC) in speech recognition, and then propose several methods to refine the CTC construction process. It is found that CTC are a filtered version of the mel-frequency cepstral coefficients (MFCC), where the filters come from the discrete cosine transform (DCT) matrix. We modify these DCT-based filters by windowing, removing the DC gain, and varying the filter length. Speech recognition experiments on the Aurora-2 digit database show that the proposed methods enhance the original CTC by improving recognition accuracy, with a relative error reduction of around 20%.
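Reading CTC as DCT-filtered MFCC trajectories, a minimal sketch is a truncated DCT applied over a sliding window along time; the window length and the number of retained coefficients below are illustrative choices:

import numpy as np

def cepstral_time_coefficients(mfcc, win=9, n_coef=3):
    """DCT over a sliding time window of each cepstral trajectory; mfcc: (frames, dims)."""
    n = np.arange(win)
    k = np.arange(n_coef)
    basis = np.cos(np.pi * np.outer(k, n + 0.5) / win)    # DCT-II rows act as FIR filters
    frames = mfcc.shape[0] - win + 1
    out = np.zeros((frames, mfcc.shape[1] * n_coef))
    for t in range(frames):
        out[t] = (basis @ mfcc[t:t + win]).reshape(-1)    # filter every trajectory at once
    return out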
18

Ion, Valentin [author]. "Transmission error robust speech recognition using soft features / by Valentin Ion." 2008. http://d-nb.info/990334589/34.

19

Wang, Ni-Chun (王迺鈞). "Multi-eigenvector-based Features and Related Topics for Robust Speech Recognition." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/01537864568510620829.

20

Hsieh, Hsin-Ju (謝欣汝). "Compensating the speech features in the temporal domain via discrete cosine transform for robust speech recognition." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/00106248218554232326.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 99.
In this thesis, we develop a series of algorithms to improve the noise robustness of speech features based on the discrete cosine transform (DCT). The DCT-based modulation spectra of clean speech feature streams in the training set are employed to generate two sequences representing the reference magnitudes and the magnitude weights, respectively. The two sequences are then used to update the magnitude spectrum of each feature stream in the training and testing sets. The resulting new feature streams show robustness against noise distortion. Experiments conducted on the Aurora-2 digit string database reveal that the proposed DCT-based approaches provide relative error reduction rates of over 25% compared with the baseline system using MVN-processed MFCC features. Experimental results also show that these new algorithms combine well with many other noise robustness methods to produce even higher recognition accuracy.
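A rough sketch of the described update for one cepstral trajectory: take its DCT (the modulation spectrum), pull the magnitude toward a clean-trained reference with per-coefficient weights, and invert. The reference and weight sequences are passed in as arguments here, and the weighted-combination rule is an assumption about the general form, not the thesis's exact formula:

import numpy as np
from scipy.fftpack import dct, idct

def compensate_trajectory(traj, ref_mag, weight):
    """traj, ref_mag, weight: (frames,) arrays; ref_mag/weight come from clean training data."""
    spec = dct(traj, type=2, norm="ortho")              # DCT-based modulation spectrum
    mag, sign = np.abs(spec), np.sign(spec)
    new_mag = weight * ref_mag + (1.0 - weight) * mag   # weighted pull toward the reference
    return idct(sign * new_mag, type=2, norm="ortho")   # back to the temporal domain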
21

Lu, I-Chia (呂宜家). "Exploiting wavelet de-noising in the temporal sequences of features for robust speech recognition." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/03773923951424999965.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 99.
In this thesis, we propose applying wavelet de-noising (WD) techniques to temporal-domain feature sequences to enhance noise robustness and thereby improve the accuracy of noisy speech recognition. In the proposed method, the temporal-domain feature sequence is first processed by a statistics normalization scheme, such as mean and variance normalization (MVN) or cepstral gain normalization (CGN), and then processed by the wavelet de-noising algorithm. We find that the wavelet de-noising procedure can effectively reduce the middle and high modulation-frequency distortion remaining in the statistics-normalized speech features. Experimental results on the Aurora-2 digit database and task show that the above process significantly improves the accuracy of speech recognition in noisy environments. Pairing WD with CMVN/CGN provides about 20% relative error reduction with respect to the MFCC baseline, outperforms CMVN/CGN alone, and pushes the overall recognition rate beyond 90%.
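Wavelet de-noising of a feature trajectory can be sketched with PyWavelets using standard soft thresholding; the universal threshold below is a textbook recipe and not necessarily the thesis's thresholding rule:

import numpy as np
import pywt

def wavelet_denoise(traj, wavelet="db4", level=3):
    """Soft-threshold wavelet de-noising of a 1-D feature trajectory."""
    coeffs = pywt.wavedec(traj, wavelet, level=level)
    sigma = np.median(np.abs(coeffs[-1])) / 0.6745          # noise scale from finest details
    thr = sigma * np.sqrt(2.0 * np.log(len(traj)))          # universal threshold
    coeffs = [coeffs[0]] + [pywt.threshold(c, thr, mode="soft") for c in coeffs[1:]]
    return pywt.waverec(coeffs, wavelet)[:len(traj)]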
22

"Exploitation of phase and vocal excitation modulation features for robust speaker recognition." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075192.

Abstract:
Mel-frequency cepstral coefficients (MFCCs) are widely adopted in speech recognition as well as speaker recognition applications. They are extracted primarily to characterize the spectral envelope of a quasi-stationary speech segment. It has been shown that cepstral features are closely related to the linguistic content of speech. Besides the magnitude-based cepstral features, there are resources in speech, e.g., the phase and the excitation source, that are believed to contain useful properties for speaker discrimination. Moreover, in real situations, large variations exist between the development and application scenarios of a speaker recognition system. These include channel mismatch, recording apparatus mismatch, environmental variation, or even changes in the emotional or health state of speakers. As a consequence, magnitude-based features are insufficient to provide satisfactory and robust speaker recognition accuracy. Therefore, the exploitation of features complementary to MFCCs may provide one solution to alleviate this deficiency from a feature-based perspective.
Speaker recognition (SR) refers to the process of automatically determining or verifying the identity of a person based on his or her voice characteristics. In practical applications, a voice can be used as one of the modalities in a multimodal biometric system, or be the sole medium for identity authentication. The general area of speaker recognition encompasses two fundamental tasks: speaker identification and speaker verification.
Wang, Ning.
Adviser: Pak-Chung Ching.
Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 177-193).
Abstract also in Chinese.
23

Pan, Chi-an (潘吉安). "Study of the Improved Normalization Techniques of Energy-Related Features for Robust Speech Recognition." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/04279772087665285401.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 96.
The rapid development of speech processing techniques has led to their successful use in more and more applications, such as automatic dialing, voice-based information retrieval, and identity authentication. However, unexpected variations in speech signals deteriorate the performance of a speech processing system and thus limit its range of application. Among these variations, the environmental mismatch caused by noise embedded in the speech signal is the major concern of this thesis. We provide a rigorous mathematical analysis of the effects of additive noise on two energy-related speech features, the logarithmic energy (logE) and the zeroth cepstral coefficient (c0). Based on these effects, we propose a new feature compensation scheme, named silence feature normalization (SFN), to improve the noise robustness of the above two features for speech recognition. It is shown that, despite its simplicity of implementation, SFN brings about very significant improvement in noisy speech recognition and behaves better than many well-known feature normalization approaches. Furthermore, SFN can be easily integrated with other noise robustness techniques to achieve even better recognition accuracy.
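The published SFN algorithm is not reproduced here, but the flavor of such energy-feature compensation can be shown with a toy version: split frames into speech and silence by a crude log-energy threshold, then push silence frames to a constant floor so that noise-dominated frames stop varying with the noise level (an illustrative guess at the general scheme only):

import numpy as np

def silence_feature_normalize(log_e, floor_offset=3.0):
    """log_e: (frames,) log-energy sequence of one utterance (illustrative scheme only)."""
    threshold = log_e.min() + 0.5 * (log_e.max() - log_e.min())   # crude speech/silence split
    out = log_e.copy()
    out[log_e < threshold] = log_e.min() - floor_offset           # flatten silence frames
    return out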
24

"Robust speaker recognition using both vocal source and vocal tract features estimated from noisy input utterances." 2007. http://library.cuhk.edu.hk/record=b5893317.

Abstract:
Wang, Ning.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 106-115).
Abstracts in English and Chinese.
Table of contents:
Chapter 1: Introduction (Introduction to Speech and Speaker Recognition; Difficulties and Challenges of Speaker Authentication; Objectives and Thesis Outline).
Chapter 2: Speaker Recognition System (Baseline Speaker Recognition System Overview: Feature Extraction, Pattern Generation and Classification; Performance Evaluation Metric for Different Speaker Recognition Tasks; Robustness of Speaker Recognition System: Speech Corpus CU2C, Noise Database NOISEX-92, Mismatched Training and Testing Conditions; Summary).
Chapter 3: Speaker Recognition System using both Vocal Tract and Vocal Source Features (Speech Production Mechanism: Overview, Acoustic Properties of Human Speech; Source-filter Model and Linear Predictive Analysis; Vocal Tract Features; Vocal Source Features: Overview, Technical Viewpoints; Effects of Noises on Speech Properties; Summary).
Chapter 4: Estimation of Robust Acoustic Features for Speaker Discrimination (Robust Speech Techniques: Noise Resilience, Speech Enhancement; Spectral Subtractive-Type Preprocessing: Noise Estimation, Spectral Subtraction Algorithm; LP Analysis of Noisy Speech: LP Inverse Filtering as a Whitening Process, Magnitude Response of the All-pole Filter in Noisy Conditions, Noise Spectral Reshaping; Distinctive Vocal Tract and Vocal Source Feature Extraction: Vocal Tract Feature Extraction, Source Feature Generation Procedure, Subband-specific Parameterization Method; Summary).
Chapter 5: Speaker Recognition Tasks & Performance Evaluation (Speaker Recognition Experimental Setup: Task Description, Baseline Experiments, Identification and Verification Results; Speaker Recognition using Source-tract Features: Source Feature Selection, Source-tract Feature Fusion, Identification and Verification Results; Performance Analysis).
Chapter 6: Conclusion (Discussion and Conclusion; Suggestion of Future Work).
25

Wang, Shang-Yu (王上瑜). "A Study of Applying Noise-Robust Features in Reduced Frame-Rate Acoustic Models for Speech Recognition." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/63485710426800421992.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 104.
Speech recognition on mobile devices has become increasingly popular in our lives, but it has to meet the twin requirements of high recognition accuracy and low transmission load. One of the most challenging tasks in improving recognition accuracy for real-world applications is to alleviate the noise effect, and one prominent way of reducing the transmission load is to make the speech features as compact as possible. In this study, we evaluate and explore the effectiveness of integrating noise-robust speech feature representations with a reduced frame-rate acoustic model architecture. The noise-robustness algorithms used to improve the features include cepstral mean subtraction (CMS), cepstral mean and variance normalization (MVN), histogram equalization (HEQ), cepstral gain normalization (CGN), MVN plus auto-regressive moving average filtering (MVA) and modulation spectrum power-law expansion (MSPLE). On the other hand, the hidden Markov model (HMM) structure adapted for reduced frame-rate (RFR) speech features, developed by Professor Lee-min Lee, is exploited in our evaluation task. Experiments conducted on the Aurora-2 digit database show that, in the clean noise-free situation, the adapted HMM with RFR features provides recognition accuracy comparable to the non-adapted HMM with full frame-rate (FFR) features, while in noisy situations the noise-robustness algorithms work well in the RFR HMM scenarios and are capable of improving recognition performance even when the RFR down-sampling ratio is as low as 1/4.
26

Kao, Shyh-Jer (高世哲). "Robust Speech Recognition Based on Feature Normalization." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/42377056084343012440.

Abstract:
Master's thesis, National Chiao Tung University, Institute of Communication Engineering, ROC year 95.
In this thesis, some robust speech feature processing algorithms are proposed to improve speech recognition performance in noisy environments. First, well-known robust feature processing algorithms such as mean and variance normalization (MVN) and histogram equalization (HEQ) were implemented on a Mandarin AURORA-like database as the baseline system. Then, a class-based MVA was proposed to further improve speech recognition performance. The class-based MVA algorithm first categorizes the signal into speech and non-speech parts and applies MVA to each class separately. An 82.26% recognition rate was achieved, compared to 81.31% with traditional MVA. Finally, a three-class MVA (voiced, unvoiced and non-speech) was investigated; an 86.25% recognition rate was achieved under ideal voiced/unvoiced/non-speech categorization.
27

Huang, Yung-Sheng (黃永勝). "Robust Speech Recognition: Improved temporal filtering on speech feature coefficients." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/59595195142262613727.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 92.
With the fast expansion of the high-tech industry in recent years, computers have become ever more important to people, and speech-based man-machine interfaces have become one of the most important research topics worldwide, with applications in daily life. The central problem of speech recognition techniques, however, is recognition accuracy: when there is a mismatch between the acoustic conditions of the training and application environments, the performance of a speech recognition system is often seriously degraded. Hence, the robustness of speech recognition techniques with respect to such mismatched acoustic conditions becomes very important. In this thesis, we develop several robust techniques for speech feature coefficients to alleviate the effects of environmental noise and mismatched conditions. The thesis is organized around four parts: (1) a Maximum Mutual Information temporal filter approach; (2) a discussion of the length of data-driven temporal filters; (3) a multi-eigenvector temporal filter approach; and (4) combining temporal filters with nonlinear spectral subtraction. In the first part, a new optimization criterion, Maximum Mutual Information (MMI), is applied in the optimization process to obtain temporal filters. Experiments show that the MMI-derived temporal filters significantly improve recognition performance, as the RASTA temporal filter does; in particular, MMI-derived temporal filters work better with nonstationary noise than with stationary noise. In the second part, we analyze the length of the FIR filters and discuss its effect on recognition accuracy; we find that the optimal lengths for LDA-, PCA-, MCE- and feature-based MCE-derived filters may be taken as 11, 11, 101 and 101, respectively. In the third part, we introduce the multi-eigenvector temporal filter approach, in which the first M eigenvectors obtained by LDA or PCA are weighted by their corresponding eigenvalues, or by the square roots of the eigenvalues, and summed to form the filter coefficients, so as to include more data and produce robust feature coefficients. In the experiments, we find that the highest recognition accuracy is obtained when the LDA and PCA filters are weighted by the square roots of the eigenvalues and by the eigenvalues, respectively. In the last part, we use spectral subtraction to enhance the speech signal in the spectral domain, reducing the influence of noise, and then apply temporal filters in the cepstral domain. Experiments show that when the temporal filters are combined with spectral subtraction, recognition performance is further improved, so combining the two kinds of approaches in different domains is additive.
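Spectral subtraction, used in the last part, is straightforward to sketch: estimate the noise magnitude spectrum from leading (assumed speech-free) frames and subtract it frame by frame with a spectral floor. This is the textbook linear version, not the nonlinear variant used in the thesis:

import numpy as np

def spectral_subtract(frames_mag, n_noise_frames=10, alpha=1.0, beta=0.02):
    """frames_mag: (frames, bins) magnitude spectra of one utterance."""
    noise = frames_mag[:n_noise_frames].mean(axis=0)   # noise estimate from leading frames
    clean = frames_mag - alpha * noise                 # subtract the noise estimate
    return np.maximum(clean, beta * frames_mag)        # floor to avoid negative magnitudes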
28

Lin, Meng-kai (林盟凱). "Feature Exponent Adjustment Methods in Robust Speech Recognition." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/51556065034316819061.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 94.
The performance of a speech recognition system is often degraded by the mismatch between the development and application environments, and one of the major sources of this mismatch is additive noise. The approaches for handling additive noise can be divided into three classes: speech enhancement, robust representation of speech, and compensation of speech models. The methods discussed in this thesis belong to the first class, speech enhancement. A common characteristic of the studied and proposed approaches is the use of exponentiation. When exponentiation is performed on the original mel-frequency cepstral coefficients (MFCC), the resulting method is called cepstral exponent adjustment (CEA). When exponentiation is carried out on the logarithmic spectrum, or directly replaces the logarithm operation during the derivation of MFCC, the resulting algorithms are called exponentiated log-Mel filter-bank spectrum (ExpoMFCC) and root Mel-filter-bank spectrum (RMFCC), respectively. The three methods are applied to obtain new speech features for recognition in noisy environments. Experimental results show that they clearly enhance the robustness of the speech features and thus improve recognition accuracy; moreover, they can be integrated with other robustness techniques to obtain further improvement.
29

陳韋豪. "Feature Normalization Exploiting Spatial-Temporal Distribution Characteristics for Robust Speech Recognition." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/01407714632726246776.

30

Hsieh, Tsung-Hsueh (謝宗學). "Feature Statistics Compensation for Robust Speech Recognition in Additive Noise Environments." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/77143721882774978160.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 95.
Improving the accuracy of a speech recognition system in a mismatched noisy environment has always been a major research issue in the speech processing area. A great number of approaches have been proposed to reduce this environmental mismatch, and one class of them focuses on normalizing the statistics of speech features under different noise conditions. The well-known utterance-based cepstral mean and variance normalization (U-CMVN) and segmental cepstral mean and variance normalization (S-CMVN) both belong to this class. Both make use of the whole utterance, or segments of an utterance, to estimate the statistics, which may not be accurate enough, and neither can be implemented in an online manner. In this thesis, instead of estimating the statistics utterance-wise as in U-CMVN and S-CMVN, we construct two sets of codebooks, called pseudo stereo codebooks, which represent the speech features in clean and noisy environments, respectively. Based on pseudo stereo codebooks, we then develop three feature compensation approaches: cepstral statistics compensation (CSC), linear least squares (LLS) regression, and quadratic least squares (QLS) regression. These new approaches are simple yet very effective, and online implementation is achievable. We apply the three approaches to four different types of cepstral features: mel-frequency cepstral coefficients (MFCC), auto-correlation mel-frequency cepstral coefficients (AMFCC), linear prediction cepstral coefficients (LPCC) and perceptual linear prediction cepstral coefficients (PLPCC). Experiments conducted on the Aurora-2 database show that, for each type of speech feature, the proposed approaches bring very encouraging performance improvements under various noise environments; compared with traditional utterance-based CMVN and segmental CMVN, they provide further improved recognition accuracy.
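Of the three approaches, LLS regression is the most compact to illustrate: fit a linear map from the noisy codebook to the clean one and apply it to test features (a generic least-squares sketch; the thesis's codebook construction is omitted):

import numpy as np

def fit_lls(noisy_codewords, clean_codewords):
    """Fit clean ~ [noisy, 1] @ W from paired codewords, each of shape (n, dims)."""
    X = np.hstack([noisy_codewords, np.ones((len(noisy_codewords), 1))])  # bias column
    W, *_ = np.linalg.lstsq(X, clean_codewords, rcond=None)
    return W

def apply_lls(W, features):
    """Compensate noisy features (frames, dims) with the fitted linear map."""
    X = np.hstack([features, np.ones((len(features), 1))])
    return X @ W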
31

Chu, Po-Han (祝伯翰). "Front-End Feature Processing using Particle Filter for Robust Speech Recognition." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/56355259808505297847.

32

Wu, Szyu (吳思予). "The study of speech feature shaping and normalization in quefrency bands for noise-robust speech recognition." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/53851035103957265947.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 100.
In this study, we develop a novel noise-robustness method, termed weighted sub-band level histogram equalization (WS-HEQ), to improve speech recognition accuracy in noise-corrupted environments. Based on the observation that the high-pass and low-pass portions of the intra-frame cepstral features are of unequal importance for speech recognition at different signal-to-noise ratios (SNRs), WS-HEQ attenuates the high-pass portion in order to highlight the speech components and reduce the effect of noise. Furthermore, we provide four variants of WS-HEQ, which primarily refer to the structure of sub-band level histogram equalization (S-HEQ). In experiments conducted on the Aurora-2 connected US digit database, all four presented variants of WS-HEQ give significant recognition improvements relative to the MFCC baseline in various noise-corrupted situations. WS-HEQ outperforms HEQ in recognition accuracy and behaves better than S-HEQ in most cases. Besides, WS-HEQ can be implemented more efficiently than S-HEQ, since fewer HEQ processes are needed.
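The HEQ building block underneath these variants maps each feature dimension's empirical distribution onto a reference one; a minimal rank-based version with a standard-normal reference is sketched below (generic HEQ, without the weighted sub-band logic of the thesis):

import numpy as np
from scipy.stats import norm

def heq(feature_col):
    """Map one feature dimension (frames,) onto a standard normal via rank-based HEQ."""
    n = len(feature_col)
    ranks = np.argsort(np.argsort(feature_col))   # rank of each frame, 0..n-1
    cdf = (ranks + 0.5) / n                       # empirical CDF value per frame
    return norm.ppf(cdf)                          # inverse CDF of the reference Gaussian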
33

Tu, Wen-Hsiang (杜文祥). "Study on the Voice Activity Detection Techniques for Robust Speech Feature Extraction." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/76966247400637028949.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 95.
The performance of a speech recognition system is often degraded by the mismatch between the development and application environments, and one of the major sources of this mismatch is additive noise. The approaches for handling additive noise can be divided into three classes: speech enhancement, robust speech feature extraction, and compensation of speech models. This thesis focuses on the second class, robust speech feature extraction. Robust feature extraction is often paired with voice activity detection in order to estimate the noise characteristics; a voice activity detector (VAD) is used to discriminate the speech and noise-only portions of an utterance. This thesis primarily investigates the effectiveness of various features for the VAD, including low-frequency spectral magnitude (LFSM), full-band spectral magnitude (FBSM), cumulative quantized spectrum (CQS) and high-pass log-energy. The resulting VAD supplies noise information to two noise-robustness techniques, spectral subtraction (SS) and silence log-energy normalization (SLEN), in order to reduce the influence of additive noise in speech recognition. Recognition experiments conducted on the Aurora-2 database show that the proposed VAD provides accurate noise information, with which the subsequent SS and SLEN significantly improve speech recognition performance in various noise-corrupted environments. These results confirm that an appropriate selection of features for VAD implicitly improves the noise robustness of a speech recognition system.
34

Tseng, Wen-Yu (曾文俞). "Linear Prediction Processing of Feature Time Sequences for Noise-Robust Speech Recognition." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/99175156112060279105.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 101.
This thesis presents a novel method for extracting noise-robust speech features for speech recognition. In the presented method, the algorithm of linear predictive coding (LPC) is applied to the feature time series of mel-frequency cepstral coefficients (MFCC), and the resulting linear predictive version of the feature time series is shown to be more noise-robust than the original, probably because the prediction-error component corresponding to the noise effect is reduced. The LPC process in the presented method is analogous to a temporal filter that emphasizes the speech component of the feature time series. Experiments conducted on the Aurora-2 connected digit database show that the presented approach enhances the noise robustness of various types of features, with significant improvement in recognition performance under a wide range of noise environments.
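A minimal rendering of the idea: fit LPC coefficients to each trajectory by the autocorrelation (Yule-Walker) method and replace the trajectory with its one-step prediction; the order and the absence of windowing are simplifications, not the thesis's exact settings:

import numpy as np
from scipy.linalg import solve_toeplitz

def lpc_predict_trajectory(traj, order=4):
    """Replace a feature trajectory (frames,) with its LPC one-step prediction."""
    x = traj - traj.mean()
    r = np.correlate(x, x, mode="full")[len(x) - 1:len(x) + order]   # lags 0 .. order
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])    # Yule-Walker: solve R a = r[1:]
    pred = np.zeros_like(x)
    for k in range(order):                         # pred[t] = sum_k a[k] * x[t-1-k]
        pred[k + 1:] += a[k] * x[:len(x) - k - 1]
    return pred + traj.mean()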
35

Kao, Yu-chen (高予真). "Distribution-based Feature Normalization with Temporal-Structural Information on Robust Speech Recognition." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/27w69n.

Abstract:
Master's thesis, National Taiwan Normal University, Department of Computer Science and Information Engineering, ROC year 101.
Recently, histogram equalization (HEQ) of speech features has received considerable attention in the area of robust speech recognition because of its relative simplicity and good empirical performance. In this thesis, we present a polynomial variant of spectral histogram equalization (SHE) on the modulation spectra of speech features and a novel extension to the conventional HEQ approach conducted in the cepstral domain. Our HEQ methods have at least two attractive properties. First, polynomial regression of various orders is employed to efficiently perform feature normalization built on the notion of HEQ. Second, not only the contextual distributional statistics but also the dynamics of feature values are taken as input to the presented regression functions for better normalization performance. By doing so, we can to some extent relax the dimension-independence and bag-of-frames assumptions made by conventional HEQ methods. All experiments were carried out on the Aurora-2 corpus and task and further verified on the Aurora-4 corpus and task. The corresponding results demonstrate that the proposed methods achieve considerable word error rate reductions over the baseline systems and offer additional performance gains for AFE-processed features.
36

Chiou, Sheng-chiuan (邱聖權). "Auditory Based Modification of MFCC Feature Extraction for Robust Automatic Speech Recognition." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/9qrexg.

Abstract:
Master's thesis, National Sun Yat-sen University, Institute of Computer Science and Information Engineering, ROC year 97.
The human auditory perception system is much more noise-robust than any state-of-the-art automatic speech recognition (ASR) system, so it is expected that the noise-robustness of speech feature vectors may be improved by employing more human auditory functions in the feature extraction procedure. Forward masking is a phenomenon of human auditory perception in which a weaker sound is masked by a preceding, stronger masker. In this work, two human auditory mechanisms, synaptic adaptation and temporal integration, are implemented as filter functions and incorporated into MFCC feature extraction to model forward masking, and a filter optimization algorithm is proposed to optimize the filter parameters. The performance of the proposed method is evaluated on the Aurora 3 corpus, with the training/testing procedure following the standard setting of the Aurora 3 task. The synaptic adaptation filter achieves a relative improvement of 16.6% over the baseline; the temporal integration and modified temporal integration filters achieve relative improvements of 21.6% and 22.5%, respectively. Combining synaptic adaptation with each of the temporal integration filters yields further improvements of 26.3% and 25.5%. Applying the filter optimization improves the synaptic adaptation filter and the two temporal integration filters to 18.4%, 25.2% and 22.6% relative improvements, respectively; the combined-filter models are also improved, with relative improvements of 26.9% and 26.3%.
37

張志豪. "Robust And Discriminative Feature Extraction Techniques For Large Vocabulary Continuous Speech Recognition." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/90526888062025408931.

Abstract:
Master's thesis, National Taiwan Normal University, Graduate Institute of Computer Science and Information Engineering, ROC year 93.
Speech is the primary and most convenient means of communication between people. With the successful development of ever smaller electronic devices and the popularity of wireless communication and networking, it is widely believed that speech will play a more active role and serve as the major human-machine interface for interaction between people and different kinds of smart devices in the near future. Research on automatic speech recognition (ASR) is therefore receiving growing emphasis, and within it the development of discriminative as well as robust feature extraction approaches for ASR deployed in real, diverse environments has continuously drawn attention over the past two decades. With the above observations in mind, this thesis studies auditory-perception-based feature extraction and data-driven linear feature transformation for robust speech recognition. For auditory-perception-based feature extraction, we extensively compared the conventional Mel-frequency Cepstral Coefficients (MFCC) with the Perceptual Linear Prediction Coefficients (PLPC), and compared various ways to derive and combine their corresponding time-trajectory information. For data-driven linear feature transformation, we started by showing the superior performance of linear discriminant analysis (LDA) over principal component analysis (PCA) for feature transformation in speech recognition. We then investigated several improved approaches, such as heteroscedastic linear discriminant analysis (HLDA) and heteroscedastic discriminant analysis (HDA), which remove the inherent assumption of equal cluster variance in the derivation of LDA. Moreover, we proposed the use of the minimum classification error (MCE) and maximum mutual information (MMI) criteria, respectively, in the optimization of the transformation matrices, in comparison with the maximum likelihood (ML) criterion. Finally, the maximum likelihood linear transformation (MLLT) and other robust techniques, such as feature mean subtraction and/or variance normalization, were further applied. All experiments were carried out on the Mandarin broadcast news corpus (MATBN), with very promising initial results.
38

Tsai, Shang-nien (蔡尚年). "Robust Speech Feature Front-End Processing Techniques Based on Progressive Histogram Equalization." Thesis, 2004. http://ndltd.ncl.edu.tw/handle/70500526488219371940.

39

Yeh, Bing-Feng (葉秉豐). "Gaussian Mixture Model-based Feature Compensation with Application to Noise-robust Speech Recognition." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/69780960016247960787.

Abstract:
Master's thesis, National Sun Yat-sen University, Institute of Computer Science and Information Engineering, ROC year 100.
In this thesis, we propose a new method for noise robustness based on the Gaussian Mixture Model (GMM). The proposed method can estimate the noise feature effectively and reduce the noise effect in a direct fashion, thereby retaining the smoothness and continuity of the original features. In contrast to the traditional feature transformation method of MMSE (Minimum Mean Square Error) estimation, which tries to recover the clean features, the proposed method only needs to find the noise feature, i.e., the margin of the noise effect, and subtract it, achieving a stronger robustness effect than traditional methods. At test time, the input first passes through a trained noise classifier that judges the noise type and SNR; according to the classifier output, the corresponding transformation model is chosen and used to generate the noise feature as a weighted linear combination, and finally simple subtraction achieves the noise reduction. In the experiments, we use the AURORA 2.0 corpus to evaluate noise robustness: the traditional method achieves a 36.8% relative improvement over the baseline, while our method achieves a 52.5% relative improvement, i.e., a 24.9% relative improvement over the traditional method.
40

Tai, Chung-Fu (戴仲甫). "The Improved Techniques of Energy Feature Enhancement and Frame Selection for Robust Speech Recognition." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/28763460961441249370.

Abstract:
Master's thesis, National Chi Nan University, Department of Electrical Engineering, ROC year 94.
The performance of a speech recognition system is often seriously degraded in the presence of noise. In this thesis, additive noise is the main issue, and frame selection techniques together with several compensation approaches for the energy feature are analyzed to improve recognition accuracy. A common advantage of these methods is their low computational complexity; they also help separate the non-speech portion from the whole utterance, so that recognition speed can be further improved. Chapter 3 discusses methods that compensate all cepstral coefficients: spectral subtraction (SS), cepstral mean and variance normalization (CMVN), selective-frame cepstral mean and variance normalization (SFCMVN), log-energy frame selection (LEFS) and dynamic frame selection (DFS). Chapter 4 analyzes methods that deal only with the energy feature: sample-based noise amplitude subtraction (S-NAS), frame energy subtraction (FES), log-energy dynamic range normalization (LERN), energy normalization (EN), log-energy normalization (LEN) and silence log-energy normalization (SLEN). Recognition experiments performed on the Aurora-2 database show that all of the approaches in this thesis enhance recognition performance under various noisy environments. In particular, compensation of the energy feature brings further significant improvements, indicating that the energy component is one of the most salient features for speech recognition.
41

Liu, Cheng-Wei (劉成韋). "A Study on Feature Normalization and Other Improved Techniques for Robust Speech Recognition." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/45415359257552707820.

Abstract:
Master's thesis, National Taiwan Normal University, Graduate Institute of Computer Science and Information Engineering, ROC year 93.
Over thousands of years, human beings have continuously acquired and accumulated knowledge from daily life, and civilization has evolved accordingly; today, however, the rapid development of technology has outpaced that evolution. Huge quantities of multimedia information, such as broadcast radio and television programs, voice mails and digital archives, are continuously growing and filling our computers, networks and lives, so accessing multimedia information at any time and anywhere with small handheld mobile devices is receiving more and more emphasis. Speech is the primary and most convenient means of communication between people, and it will serve as a major human-machine interface for interaction with different kinds of smart devices in the near future. It would therefore be far more comfortable to use speech as the human-machine interface and to automatically transcribe, retrieve and summarize multimedia using the speech information inherent in it. However, speech recognition is usually hampered by complicating factors such as background and channel noise and speaker and linguistic variations, which leave current state-of-the-art recognition systems still far from perfect. With these observations in mind, this thesis makes several attempts to improve current speech robustness techniques, as well as to find a way to integrate them. The experiments were carried out on the Aurora 2.0 database and on Mandarin broadcast news speech collected in Taiwan. Considering the phonetic characteristics of the Chinese language, a modified histogram equalization (MHEQ) approach was first proposed, establishing separate reference histograms for the silence and speech segments (MHEQ-2), or, more precisely, for the silence, INITIAL and FINAL segments of Chinese (MHEQ-3). The proposed approach yields relative improvements above 5.75% and 4.04% over the baseline system and the conventional table-based histogram equalization (THEQ) approach, respectively, in clean environments. Furthermore, spectral entropy features obtained after linear discriminant analysis (LDA) were used to augment the Mel-frequency cepstral features, with considerable initial improvements. Finally, fusion of the above approaches was also investigated, with very promising results.
APA, Harvard, Vancouver, ISO, and other styles
42

Chang, Yang, and 張暘. "Robust Speech Recognition with Two-dimensional Frame-and-feature Weighting and Modulation Spectrum Normalization." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/80127356350852988068.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Communication Engineering
Academic year 100 (2011)
In this thesis we propose a new approach of two-dimensional frame-and-feature weighted Viterbi decoding, performed at the recognizer back-end, for robust speech recognition. The frame weighting is based on a Support Vector Machine (SVM) classifier that considers the energy distribution and cross-correlation spectrum of each frame. The basic idea is that voiced frames with higher harmonicity are in general more reliable than other frames in noisy speech and should therefore be weighted more heavily. The feature weighting is based on an entropy measure that considers the confusion between phoneme classes. The basic idea is that scores obtained with more discriminating features, which cause less confusion between phonemes, should be weighted more heavily. These two weighting schemes along the two dimensions, frames and features, are then properly integrated in Viterbi decoding. Very significant improvements were achieved in extensive experiments performed in the Aurora 4 testing environment for all types of noise and all SNR values.
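The frame-weighting idea can be sketched as a weighted Viterbi pass in which each frame's emission log-likelihood is scaled by a reliability weight, e.g. from an SVM voicing classifier. This minimal Python version covers only the frame dimension; the thesis additionally weights individual features and integrates both schemes.

```python
import numpy as np

def weighted_viterbi(log_emit, log_trans, log_init, frame_w):
    """Viterbi decoding with per-frame weights (frame dimension only).

    log_emit:  (T, S) per-frame, per-state emission log-likelihoods
    log_trans: (S, S) transition log-probabilities
    log_init:  (S,)   initial state log-probabilities
    frame_w:   (T,)   weights in [0, 1]; reliable (voiced, harmonic)
               frames get weights near 1, unreliable ones are
               de-emphasized by shrinking their emission scores.
    """
    T, S = log_emit.shape
    delta = log_init + frame_w[0] * log_emit[0]
    psi = np.zeros((T, S), dtype=int)
    for t in range(1, T):
        scores = delta[:, None] + log_trans      # (S, S) predecessor scores
        psi[t] = scores.argmax(axis=0)           # best predecessor per state
        delta = scores.max(axis=0) + frame_w[t] * log_emit[t]
    path = [int(delta.argmax())]
    for t in range(T - 1, 0, -1):                # backtrack
        path.append(int(psi[t, path[-1]]))
    return path[::-1]
```

Setting all weights to 1 recovers standard Viterbi decoding, so the weighting is a strict generalization of the usual back-end.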
APA, Harvard, Vancouver, ISO, and other styles
43

Liang, Cheng-hao, and 梁振浩. "Vehicle Distance Estimation Using Optical Flow and Speed Up Robust Feature." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/28666570965231154217.

Full text
Abstract:
Master's thesis
National Central University
Department of Computer Science and Information Engineering
Academic year 103 (2014)
Because accident rates on highways have risen, vehicle anti-collision systems have become a major trend, and such systems are now also widely used in driverless vehicles developed by companies such as Apple, Benz, BMW and Audi, with the Google driverless car as a representative example. In recent years, most vehicle anti-collision systems have relied on sensors to prevent collisions, but sensor prices remain high and consumer demand is limited, so building a low-cost anti-collision system is an essential consideration for vendors. The purpose of this thesis is to detect vehicles using a single camera, without any assisting sensors. A candidate is accepted as a vehicle when both its symmetry line and its bottom edge satisfy the detection conditions. After obtaining the location of the vehicle, its image position is converted to a real-world distance through a look-up table, which allows the system to determine whether the forward vehicle is too close. By combining the dominant direction of the optical flow with the detected lane information, the system can also determine whether the vehicle is departing from its lane or whether a nearby vehicle is cutting quickly into the driver's own lane, and in either case it alerts the driver to avoid accidents and ensure driving safety. Three different experiments were conducted to verify the validity of the proposed method, covering candidate vehicle detection, candidate vehicle filtering and judgment, and vehicle tracking. Experimental results demonstrate that the proposed method achieves a better detection rate.
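A minimal sketch of the look-up-table step described above: the detected vehicle's bottom image row is interpolated against a calibration table to estimate real-world distance. All calibration numbers and the warning threshold below are purely hypothetical placeholders, not values from the thesis.

```python
import numpy as np

# Hypothetical calibration: image row of the vehicle's bottom edge vs.
# measured distance in metres. Real values come from calibrating the
# camera on a known scene; these numbers are purely illustrative.
CALIB_ROWS = np.array([700, 600, 520, 470, 440, 420])       # pixels
CALIB_DIST = np.array([5.0, 10.0, 20.0, 35.0, 55.0, 80.0])  # metres

def distance_from_bottom_row(row):
    """Convert the detected vehicle's bottom image row to an estimated
    distance by interpolating the calibration look-up table."""
    # np.interp needs increasing x, so flip the (decreasing) row axis.
    return float(np.interp(row, CALIB_ROWS[::-1], CALIB_DIST[::-1]))

def too_close(row, threshold_m=15.0):
    """Alert if the estimated distance falls below a warning threshold
    (the threshold is an assumption, not from the thesis)."""
    return distance_from_bottom_row(row) < threshold_m
```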
APA, Harvard, Vancouver, ISO, and other styles
44

Fan, Hao-teng, and 范顥騰. "The Study of Sub-band Feature Statistics Compensation Techniques Based on a Discrete Wavelet Transform for Robust Speech Recognition." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/24264693292357990593.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Electrical Engineering
Academic year 97 (2008)
The environmental mismatch caused by additive noise and/or channel distortion often seriously degrades the performance of a speech recognition system. Various robustness techniques have been proposed to reduce this mismatch; one category of them aims to normalize the statistics of speech features in both training and testing conditions. In general, these statistics normalization methods treat the speech feature sequences in a full-band manner, which ignores the fact that different modulation frequency components have unequal importance for speech recognition. With the above observations, this thesis proposes processing the speech feature streams in a sub-band manner. The temporal-domain feature sequence is first decomposed into non-uniform sub-bands using the discrete wavelet transform (DWT), and then each sub-band stream is individually processed by well-known normalization methods such as mean and variance normalization (MVN) and histogram equalization (HEQ). Finally, the feature stream is reconstructed from all of the modified sub-band streams using the inverse DWT. With this process, the components that correspond to the more important modulation spectral bands of the feature sequence can be processed separately. On the Aurora-2 clean-condition training task, the proposed sub-band MVN and HEQ provide relative error rate reductions of 20.32% and 16.39% over conventional MVN and HEQ, respectively. These results reveal that the proposed methods significantly enhance the robustness of speech features in noise-corrupted environments.
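A compact Python sketch of the proposed processing chain, using PyWavelets: decompose one temporal feature trajectory with the DWT, apply MVN to each sub-band coefficient stream, and reconstruct with the inverse DWT. The wavelet family and decomposition depth here are illustrative assumptions, not necessarily the thesis's choices.

```python
import numpy as np
import pywt  # PyWavelets

def mvn(x):
    """Mean and variance normalization of a 1-D sequence."""
    return (x - x.mean()) / max(x.std(), 1e-8)

def subband_mvn(feature_track, wavelet="db4", level=3):
    """Sub-band MVN in the spirit of the thesis: DWT-decompose one
    temporal feature trajectory into non-uniform sub-bands, normalize
    each sub-band stream separately, then reconstruct via inverse DWT."""
    coeffs = pywt.wavedec(feature_track, wavelet, level=level)
    coeffs = [mvn(c) for c in coeffs]        # per-sub-band normalization
    rec = pywt.waverec(coeffs, wavelet)
    return rec[: len(feature_track)]         # trim possible DWT padding
```

The same wrapper could call an HEQ routine per sub-band instead of `mvn`, which is the sub-band HEQ variant the abstract reports.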
APA, Harvard, Vancouver, ISO, and other styles
45

Chen, Chao-chieh, and 陳兆捷. "Design and Implementation of Mandarin Robot Speech Recognition System using Linear Transient/Steady-State Feature Coefficient Alignment." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/03533150119221902400.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Electrical Engineering
Academic year 97 (2008)
This thesis presents the design and implementation of a robotic Mandarin speech recognition system. Because the conventional Dynamic Time Warping (DTW) model commonly used in robot speech recognition requires an extensive amount of computation, we proposed and tested a novel recognition design, the Linear Transient/Steady-State Feature Coefficient Alignment (LTSFCA) model. Relying on the structure of Mandarin pronunciation, which separates speech into transient and steady-state segments, this method fractionates speech into pieces for recognition. In contrast to DTW, it is built on simple principles and requires much less processing power, so it can be applied on simpler and older devices or computers; the spared processing resources can then be used for other concurrent functions such as machine vision, motion planning and manipulator control. We constructed an Interval Boundary Function (IBF) model to shorten the computation required during the linear alignment procedure, and multiple-word recognition was achieved. Categorizing the vocabulary by word length and enlarging the database greatly raised the recognition rate. According to our experimental results, the speech recognition rate was 98.86% over the 35 databases tested, while the rate was 96.43% in speaker-independent tests (2 males and 2 females) using a single training result produced by the author. We designed a "rule of quantization" for commands such as "move forward 149", and a "rule of conjunction" was also built to recognize compound commands such as "move forward then turn to the right" or "start up and accelerate"; however, the recognition rates for these were only roughly 90%. The designed trainer is able to receive multiple inputs of the same speech in a single training session and transform them into Mel-frequency cepstral coefficients (MFCC) for storage or modification.
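To illustrate why linear alignment is so much cheaper than DTW's quadratic warping search, the sketch below resamples each MFCC sequence to a fixed number of frames and compares templates frame by frame. This is a generic linear alignment under assumed parameters, whereas the thesis first splits each Mandarin syllable into transient and steady-state segments before aligning.

```python
import numpy as np

def linear_align(mfcc, target_frames=32):
    """Linearly resample an MFCC sequence (frames x coeffs) to a fixed
    number of frames, the cheap alternative to DTW's nonlinear warping."""
    T = mfcc.shape[0]
    src = np.linspace(0, T - 1, target_frames)
    idx = np.clip(np.round(src).astype(int), 0, T - 1)
    return mfcc[idx]

def template_distance(mfcc_a, mfcc_b, target_frames=32):
    """Frame-by-frame Euclidean distance after linear alignment; the
    recognizer picks the stored template with the smallest distance."""
    a = linear_align(mfcc_a, target_frames)
    b = linear_align(mfcc_b, target_frames)
    return float(np.linalg.norm(a - b, axis=1).sum())
```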
APA, Harvard, Vancouver, ISO, and other styles