Doctoral dissertations on the topic "Speaker recognition systems"

Follow this link to see other types of publications on this topic: Speaker recognition systems.

Create a correct reference in APA, MLA, Chicago, Harvard and many other citation styles

Choose the source type:

Browse the 50 best academic doctoral dissertations on the topic "Speaker recognition systems".

An "Add to bibliography" button is available next to each work in the list. Use it and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a .pdf file and read its abstract online, if the relevant parameters are provided in the record's metadata.

Browse doctoral dissertations from many different fields of study and compile appropriate bibliographies.

1

Neville, Katrina Lee. "Channel Compensation for Speaker Recognition Systems". RMIT University. Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453.

Full text source
Abstract:
This thesis addresses the problem of how best to remedy different types of channel distortion on speech when that speech is to be used in automatic speaker recognition and verification systems. Automatic speaker recognition is when a person's voice is analysed by a machine and the person's identity is determined by comparing speech features to a known set of speech features. Automatic speaker verification is when a person claims an identity and the machine determines whether that claimed identity is correct or whether the person is an impostor. Channel distortion occurs whenever information is sent electronically through any type of channel, whether that channel is a basic wired telephone channel or a wireless channel. The types of distortion that can corrupt the information include time-variant or time-invariant filtering of the information and the addition of 'thermal noise'; both types of distortion can cause varying degrees of error in the information being received and analysed. The experiments presented in this thesis investigate the effects of channel distortion on average speaker recognition rates and test the effectiveness of various channel compensation algorithms designed to mitigate those effects. The speaker recognition system was represented by a basic recognition algorithm consisting of speech analysis, extraction of feature vectors in the form of Mel-cepstral coefficients, and a classification stage based on the minimum distance rule. Two types of channel distortion were investigated: convolutional (lowpass filtering) effects and the addition of white Gaussian noise. Three different methods of channel compensation were tested: Cepstral Mean Subtraction (CMS), RelAtive SpecTrAl (RASTA) processing and the Constant Modulus Algorithm (CMA). The results showed that, for both CMS and RASTA processing, filtering at low cutoff frequencies (3 or 4 kHz) produced improvements in the average speaker recognition rates compared to speech with no compensation. The improvements due to RASTA processing were larger than those achieved with the CMS method. Neither the CMS nor the RASTA method was able to improve the accuracy of the speaker recognition system for cutoff frequencies of 5 kHz, 6 kHz or 7 kHz. In the case of noisy speech, all methods analysed were able to compensate at high SNRs of 40 dB and 30 dB, and only RASTA processing was able to compensate and improve the average recognition rate for speech corrupted with a high level of noise (SNRs of 20 dB and 10 dB).
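The cepstral mean subtraction step evaluated above is simple enough to sketch directly: a stationary convolutional channel adds an approximately constant offset to the cepstral features, so subtracting the per-utterance mean removes it. The following Python/NumPy sketch is illustrative only; the MFCC matrix, the 13-coefficient dimensionality and the minimum-distance speaker templates are assumptions standing in for the thesis's actual front-end and classifier.

    import numpy as np

    def cepstral_mean_subtraction(mfcc):
        """Subtract the utterance-level mean from each cepstral dimension.

        mfcc: array of shape (n_frames, n_coeffs). A time-invariant
        convolutional channel adds a constant offset in the cepstral domain,
        so removing the mean removes that offset.
        """
        return mfcc - mfcc.mean(axis=0, keepdims=True)

    def min_distance_speaker(test_mfcc, speaker_templates):
        """Pick the enrolled speaker whose mean cepstral vector is closest."""
        test_vec = cepstral_mean_subtraction(test_mfcc).mean(axis=0)
        distances = {spk: np.linalg.norm(test_vec - tmpl)
                     for spk, tmpl in speaker_templates.items()}
        return min(distances, key=distances.get)

    # Toy usage with random "features"; real MFCCs would come from a front-end.
    rng = np.random.default_rng(0)
    templates = {"spk%d" % i: rng.normal(size=13) for i in range(3)}
    utterance = rng.normal(size=(200, 13)) + 0.5   # constant offset ~ channel
    print(min_distance_speaker(utterance, templates))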
2

Du Toit, Ilze. "Non-acoustic speaker recognition". Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16315.

Full text source
Abstract:
Thesis (MScIng)--University of Stellenbosch, 2004.
ENGLISH ABSTRACT: In this study the phoneme labels derived from a phoneme recogniser are used for phonetic speaker recognition. The time-dependencies among phonemes are modelled by using hidden Markov models (HMMs) for the speaker models. Experiments are done using first-order and second-order HMMs, and various smoothing techniques are examined to address the problem of data scarcity. The use of word labels for lexical speaker recognition is also investigated. Single word frequencies are counted and the use of various word selections as feature sets is investigated. During April 2004, the University of Stellenbosch, in collaboration with Spescom DataVoice, participated in an international speaker verification competition presented by the National Institute of Standards and Technology (NIST). The University of Stellenbosch submitted phonetic and lexical (non-acoustic) speaker recognition systems and a fused system (the primary system) that fuses the acoustic system of Spescom DataVoice with the non-acoustic systems of the University of Stellenbosch. The results were evaluated by means of a cost model. Based on the cost model, the primary system obtained second and third position in the two categories that were submitted.
AFRIKAANSE OPSOMMING: Hierdie projek maak gebruik van foneem-etikette wat geklassifiseer word deur 'n foneemherkenner en daarna gebruik word vir fonetiese sprekerherkenning. Die tyd-afhanklikhede tussen foneme word gemodelleer deur gebruik te maak van verskuilde Markov modelle (HMMs) as sprekermodelle. Daar word geëksperimenteer met eerste-orde en tweede-orde HMMs en verskeie vergladdingstegnieke word ondersoek om dataskaarsheid aan te spreek. Die gebruik van woord-etikette vir sprekerherkenning word ook ondersoek. Enkelwoordfrekwensies word getel en daar word geëksperimenteer met verskeie woordseleksies as kenmerke vir sprekerherkenning. Gedurende April 2004 het die Universiteit van Stellenbosch in samewerking met Spescom DataVoice deelgeneem aan 'n internasionale sprekerverifikasie kompetisie wat deur die National Institute of Standards and Technology (NIST) aangebied is. Die Universiteit van Stellenbosch het ingeskryf vir 'n fonetiese en 'n woordgebaseerde (nie-akoestiese) sprekerherkenningstelsel, asook 'n saamgesmelte stelsel wat as primêre stelsel dien. Die saamgesmelte stelsel is 'n kombinasie van Spescom DataVoice se akoestiese stelsel en die twee nie-akoestiese stelsels van die Universiteit van Stellenbosch. Die resultate is geëvalueer deur gebruik te maak van 'n koste-model. Op grond van die koste-model het die primêre stelsel tweede en derde plek behaal in die twee kategorieë waaraan deelgeneem is.
3

Shou-Chun, Yin 1980. "Speaker adaptation in joint factor analysis based text independent speaker verification". Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100735.

Full text source
Abstract:
This thesis presents methods for supervised and unsupervised speaker adaptation of Gaussian mixture speaker models in text-independent speaker verification. The proposed methods are based on an approach which is able to separate speaker and channel variability so that progressive updating of speaker models can be performed while minimizing the influence of the channel variability associated with the adaptation recordings. This approach relies on a joint factor analysis model of intrinsic speaker variability and session variability where inter-session variation is assumed to result primarily from the effects of the transmission channel. These adaptation methods have been evaluated under the adaptation paradigm defined under the NIST 2005 speaker recognition evaluation plan which is based on conversational telephone speech.
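The joint factor analysis model referred to in this abstract decomposes a speaker- and session-dependent GMM mean supervector as M = m + Vy + Ux + Dz, where y carries speaker factors and x carries session (channel) factors. The sketch below only illustrates that decomposition with toy dimensions; the matrices V, U, D and the factor sizes are placeholders, not values from the thesis.

    import numpy as np

    # Illustrative dimensions only; real systems use CF-dimensional supervectors
    # (C mixture components x F features) and factor ranks in the hundreds.
    sv_dim, n_speaker_factors, n_channel_factors = 120, 10, 5
    rng = np.random.default_rng(1)

    m = rng.normal(size=sv_dim)                       # UBM mean supervector
    V = rng.normal(size=(sv_dim, n_speaker_factors))  # eigenvoice matrix
    U = rng.normal(size=(sv_dim, n_channel_factors))  # eigenchannel matrix
    D = np.diag(rng.uniform(0.1, 0.3, size=sv_dim))   # diagonal residual term

    def session_supervector(y, x, z):
        """JFA decomposition M = m + V y + U x + D z: the speaker factors y are
        shared across a speaker's sessions, channel factors x vary per session."""
        return m + V @ y + U @ x + D @ z

    y = rng.normal(size=n_speaker_factors)   # speaker identity factors
    z = rng.normal(size=sv_dim)              # speaker-specific residual
    session1 = session_supervector(y, rng.normal(size=n_channel_factors), z)
    session2 = session_supervector(y, rng.normal(size=n_channel_factors), z)
    # Both sessions share y and z; only the channel term differs.
    print(np.linalg.norm(session1 - session2))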
4

Uzuner, Halil. "Robust text-independent speaker recognition over telecommunications systems". Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843391/.

Full text source
Abstract:
Biometric recognition methods, using human features such as voice, face or fingerprints, are increasingly popular for user authentication. Voice is unique in that it is a non-intrusive biometric which can be transmitted over existing telecommunication networks, thereby allowing remote authentication. Current speaker recognition systems can provide high recognition rates on clean speech signals. However, their performance has been shown to degrade in real-life applications such as telephone banking, where speech compression and background noise can affect the speech signal. In this work, three important advancements have been introduced to improve speaker recognition performance where it is affected by coder mismatch, by the aliasing distortion caused by Line Spectral Frequency (LSF) parameter extraction, and by background noise. The first advancement focuses on investigating speaker recognition performance in a multi-coder environment using a Speech Coder Detection (SCD) system, which minimises the mismatch between training and testing data and improves speaker recognition performance. Having reduced the speaker recognition error rates for the multi-coder environment, further investigation of the GSM-EFR speech coder is performed to deal with a particular problem related to the LSF parameter extraction method. It has previously been shown that the classic technique for extraction of LSF parameters in speech coders is prone to aliasing distortion. Low-pass filtering of up-sampled LSF vectors has been shown to alleviate this problem, thereby improving speech quality. In this thesis, as a second advancement, the Non-Aliased LSF (NA-LSF) extraction method is introduced in order to reduce the unwanted effects of the GSM-EFR coder on speaker recognition performance. Another important factor that affects the performance of speaker recognition systems is the presence of background noise. Background noise might severely reduce the performance of the targeted application, such as the quality of the coded speech or the performance of the speaker recognition system. The third advancement was achieved by using a noise canceller to improve speaker recognition performance in mismatched environments with varying background noise conditions. A speaker recognition system with a Minimum Mean Square Error - Log Spectral Amplitudes (MMSE-LSA) noise canceller used as a pre-processor is proposed and investigated to determine the efficiency of noise cancellation on speaker recognition performance using speech corrupted by different background noise conditions. The effects of noise cancellation on speaker recognition performance using coded noisy speech have also been investigated. Keywords: identification, verification, recognition, Gaussian mixture models, speech coding, noise cancellation.
5

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features". Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Full text source
Abstract:
The speech signal is primarily meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person using his/her speech from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are as follows. First, the speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames), and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some of the frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method which does not use the pitch value directly as a feature and which works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
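The MACV idea can be sketched compactly: instead of an explicit pitch value, keep the largest normalized autocorrelation peaks of each frame within the lag range of plausible pitch periods, which is defined for unvoiced frames as well. The number of retained peaks and the 60-400 Hz search range below are assumptions for illustration, not the thesis's exact settings.

    import numpy as np

    def macv_features(frame, fs, n_values=5, f_min=60.0, f_max=400.0):
        """Maximum AutoCorrelation Value (MACV)-style features.

        Returns the n_values largest normalized autocorrelation peaks in the
        lag range corresponding to plausible pitch frequencies. Works for
        voiced and unvoiced frames alike: unvoiced frames simply yield small
        values instead of an undefined pitch.
        """
        frame = frame - frame.mean()
        ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
        if ac[0] <= 0:
            return np.zeros(n_values)
        ac = ac / ac[0]                             # normalize so ac[0] == 1
        lag_min = int(fs / f_max)
        lag_max = min(int(fs / f_min), len(ac) - 1)
        search = ac[lag_min:lag_max]
        top = np.sort(search)[::-1][:n_values]
        return np.pad(top, (0, n_values - len(top)))

    # Toy usage: a 30 ms frame of a 150 Hz periodic signal sampled at 16 kHz.
    fs = 16000
    t = np.arange(int(0.03 * fs)) / fs
    voiced = np.sin(2 * np.pi * 150 * t) + 0.3 * np.sin(2 * np.pi * 300 * t)
    print(macv_features(voiced, fs))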
6

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features". Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Full text source
Abstract:
The speech signal is primarily meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person using his/her speech from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are as follows. First, the speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames), and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some of the frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method which does not use the pitch value directly as a feature and which works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
Full Text
7

Adami, André Gustavo. "Modeling prosodic differences for speaker and language recognition". Full text open access, 2004. http://content.ohsu.edu/u?/etd,19.

Full text source
8

Yu, K. P. "Text dependency and adaptation in training speaker recognition systems". Thesis, Swansea University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.636721.

Full text source
Abstract:
This thesis investigates speaker specific models trained with training sets with a number of different repetitions per text but focusing mainly on the models trained with only a few (less than 3) repetitions. This work aims to assess the abilities of a speaker model as the amount of training data increases while keeping the length of test utterances fixed. This theme is chosen because small data sets are problematic to the training of models for speech and speaker recognition. Small training set sizes regularly occur when training speaker specific models, as it is often difficult to collect a large amount of speaker specific data. In the first part of this work, three speaker recognition approaches, namely vector quantisation (VQ), dynamic time warping (DTW) and continuous density hidden Markov models (CDHMMs) are assessed. These experiments use increasing training set sizes which contain from 1 to 10 repetitions of each text to train each speaker model. Here the intent is to show which approach is most appropriate across the range of available training set sizes, for text-dependent and text-independent speaker recognition. This part concludes by suggesting that the TD DTW approach is best of all the chosen configurations. The second part of the work concerns adaptation using text-dependent CDHMMs. A new approach for adaptation called cumulative likelihood estimation (CLE) is introduced, and compared with the maximum a posteriori (MAP) approach and other benchmark results. The framework is chosen such that only single repetitions of each utterance are available for enrolment and subsequent adaptation of the speaker model. The objective is to assess whether creating speaker models through the use of an adaptation approach is a viable alternative to creating speaker models using stored speaker specific speech. It is concluded that both MAP and CLE are viable alternatives, and CLE in particular can create a model by adapting single repetitions of data which achieves performance as good as or better than that of an equivalent model, such as DTW, which has been trained using an equivalent amount of stored data.
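The text-dependent DTW approach favoured in the first part of the thesis aligns two feature sequences with dynamic time warping and scores them by the normalized path cost. A minimal sketch, assuming Euclidean frame distances and the basic step pattern, is given below.

    import numpy as np

    def dtw_distance(a, b):
        """Dynamic time warping distance between two feature sequences.

        a, b: arrays of shape (n_frames, n_coeffs). Euclidean frame distance
        and the standard (match / insert / delete) step pattern are assumed.
        """
        n, m = len(a), len(b)
        cost = np.full((n + 1, m + 1), np.inf)
        cost[0, 0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = np.linalg.norm(a[i - 1] - b[j - 1])
                cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                     cost[i, j - 1],      # deletion
                                     cost[i - 1, j - 1])  # match
        return cost[n, m] / (n + m)   # length-normalized path cost

    # Toy usage: the same "utterance" spoken at two different speeds.
    rng = np.random.default_rng(2)
    template = rng.normal(size=(50, 12))
    slower = np.repeat(template, 2, axis=0) + 0.05 * rng.normal(size=(100, 12))
    print(dtw_distance(template, slower))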
9

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition". Thesis, Queensland University of Technology, 2001.

Find the full text source
10

Mathan, Luc Stefan. "Speaker-independent access to a large lexicon". Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63773.

Full text source
11

Huliehel, Fakhralden A. "An RBFN-based system for speaker-independent speech recognition". Diss., This resource online, 1995. http://scholar.lib.vt.edu/theses/available/etd-06062008-162619/.

Full text source
12

Castellano, Pierre John. "Speaker recognition modelling with artificial neural networks". Thesis, Queensland University of Technology, 1997.

Find the full text source
13

Sepasian, Mojtaba. "Multibiometric security in wireless communication systems". Thesis, Brunel University, 2010. http://bura.brunel.ac.uk/handle/2438/5081.

Full text source
Abstract:
This thesis explores an application of multibiometrics to secured wireless communications. The media studied for this purpose included Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess performance. Specifically, restriction of access to authorized users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition. First is the enrolment phase, in which a database of watermarked fingerprints with memorable texts, along with voice features based on the same texts, is created by sending them to the server through the wireless channel. Next is the verification stage, at which claimed users (those who claim to be genuine) are verified against the database; it consists of five steps. Initially faced with the identification level, the user is asked to present a fingerprint and a memorable word, the former watermarked into the latter, so that the system can authenticate the fingerprint and verify its validity, retrieving the challenge for an accepted user. The following three steps then involve speaker recognition: the user responds to the challenge with text-dependent voice, the server authenticates the response, and finally the server accepts or rejects the user. In order to implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, an algorithm of five steps has been developed. The first three novel steps, which deal with fingerprint image enhancement (CLAHE with 'Clip Limit', standard deviation analysis and sliding neighborhood), are followed by two further steps for embedding and extracting the watermark in the enhanced fingerprint image utilising the Discrete Wavelet Transform (DWT). In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reducing the transmission time and the dependency of the data on the communication channel, together with no loss of packets. Finally, the obtained results have verified the claims.
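The watermark embedding and extraction stage can be illustrated with a single-level 2-D DWT: the bits of the memorable word are added to one detail sub-band and recovered by comparing sub-bands. This sketch uses PyWavelets and a fixed embedding strength as assumptions; it omits the enhancement steps (CLAHE, standard deviation analysis, sliding neighborhood) described in the thesis.

    import numpy as np
    import pywt

    def embed_watermark(image, bits, alpha=8.0):
        """Hide a binary message in the horizontal-detail DWT sub-band.

        image: 2-D array (e.g. an enhanced fingerprint image).
        bits:  iterable of 0/1 values (e.g. the memorable word in binary).
        alpha: embedding strength; larger is more robust but more visible.
        """
        cA, (cH, cV, cD) = pywt.dwt2(image.astype(float), "haar")
        flat = cH.flatten()
        for i, bit in enumerate(bits):
            flat[i] += alpha if bit else -alpha
        cH_marked = flat.reshape(cH.shape)
        return pywt.idwt2((cA, (cH_marked, cV, cD)), "haar")

    def extract_watermark(original, marked, n_bits):
        """Non-blind extraction: compare sub-bands of original and marked images."""
        _, (cH0, _, _) = pywt.dwt2(original.astype(float), "haar")
        _, (cH1, _, _) = pywt.dwt2(marked.astype(float), "haar")
        diff = (cH1 - cH0).flatten()[:n_bits]
        return [1 if d > 0 else 0 for d in diff]

    rng = np.random.default_rng(3)
    fingerprint = rng.integers(0, 256, size=(128, 128)).astype(float)
    message = [1, 0, 1, 1, 0, 0, 1, 0]
    marked = embed_watermark(fingerprint, message)
    print(extract_watermark(fingerprint, marked, len(message)) == message)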
14

Chan, Siu Man. "Improved speaker verification with discrimination power weighting /". View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20CHANS.

Full text source
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 86-93). Also available in electronic version. Access restricted to campus users.
15

Slomka, Stefan. "Multiple classifier structures for automatic speaker recognition under adverse conditions". Thesis, Queensland University of Technology, 1999.

Find the full text source
16

Cilliers, Francois Dirk. "Tree-based Gaussian mixture models for speaker verification". Thesis, Link to the online version, 2005. http://hdl.handle.net/10019.1/1639.

Full text source
17

Campanelli, Michael R. "Computer classification of stop consonants in a speaker independent continuous speech environment /". Online version of thesis, 1991. http://hdl.handle.net/1850/11051.

Full text source
18

Reynolds, Douglas A. "A Gaussian mixture modeling approach to text-independent speaker identification". Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/16903.

Full text source
19

Pool, Jan. "Investigation of the impact of high frequency transmitted speech on speaker recognition". Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52895.

Full text source
Abstract:
Thesis (MScEng)--Stellenbosch University, 2002.
Some digitised pages may appear illegible due to the condition of the original hard copy.
ENGLISH ABSTRACT: Speaker recognition systems have evolved to a point where near perfect performance can be obtained under ideal conditions, even if the system must distinguish between a large number of speakers. Under adverse conditions, such as when high noise levels are present or when the transmission channel deforms the speech, the performance is often less than satisfying. This project investigated the performance of a popular speaker recognition system, based on Gaussian mixture models, on speech transmitted over a high frequency channel. Initial experiments demonstrated very unsatisfactory results for the baseline system. We investigated a number of robust techniques and implemented and applied some of them in an attempt to improve the performance of the speaker recognition system. The techniques we tested showed only slight improvements. We also investigated the effects of a high frequency channel and single sideband modulation on the speech features used by speech processing systems. The effects that can deform the features, and therefore reduce the performance of speech systems, were identified. One of the effects that can greatly affect the performance of a speech processing system is noise. We investigated some speech enhancement techniques and, as a result, developed a new statistically based speech enhancement technique that employs hidden Markov models to represent the clean speech process.
AFRIKAANSE OPSOMMING: Sprekerherkenning-stelsels het 'n punt bereik waar nabyaan perfekte resultate verwag kan word onder ideale kondisies, selfs al moet die stelsel tussen 'n groot aantal sprekers onderskei. Wanneer nie-ideale kondisies, soos byvoorbeeld hoë ruisvlakke of 'n transmissie kanaal wat die spraak vervorm, teenwoordig is, is die resultate gewoonlik nie bevredigend nie. Die projek ondersoek die werksverrigting van 'n gewilde sprekerherkenning-stelsel, wat gebruik maak van Gaussiese mengselmodelle, op spraak wat oor 'n hoë frekwensie transmissie kanaal gestuur is. Aanvanklike eksperimente wat gebruik maak van 'n basiese stelsel het nie goeie resultate opgelewer nie. Ons het 'n aantal robuuste tegnieke ondersoek en 'n paar van hulle geïmplementeer en getoets in 'n poging om die resultate van die sprekerherkenning-stelsel te verbeter. Die tegnieke wat ons getoets het, het net geringe verbetering getoon. Die studie het ook die effekte wat die hoë-frekwensie kanaal en enkel-syband modulasie op spraak kenmerkvektore, ondersoek. Die effekte wat die spraak kenmerkvektore kan vervorm en dus die werkverrigting van spraak stelsels kan verlaag, is geïdentifiseer. Een van die effekte wat 'n groot invloed op die werkverrigting van spraakstelsels het, is ruis. Ons het spraak verbeterings metodes ondersoek en dit het gelei tot die ontwikkeling van 'n statisties gebaseerde spraak verbeteringstegniek wat gebruik maak van verskuilde Markov modelle om die skoon spraakproses voor te stel.
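The baseline system described in the abstract, a GMM per enrolled speaker scored by average log-likelihood, can be sketched with scikit-learn. The component count, feature dimensionality and synthetic features below are illustrative assumptions, not the thesis's configuration.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def enroll(speaker_features, n_components=8):
        """Fit one diagonal-covariance GMM per enrolled speaker.

        speaker_features: dict mapping speaker id -> (n_frames, n_dims) array.
        """
        models = {}
        for spk, feats in speaker_features.items():
            gmm = GaussianMixture(n_components=n_components,
                                  covariance_type="diag", random_state=0)
            models[spk] = gmm.fit(feats)
        return models

    def identify(models, test_features):
        """Return the speaker whose GMM gives the highest average log-likelihood."""
        scores = {spk: gmm.score(test_features) for spk, gmm in models.items()}
        return max(scores, key=scores.get)

    # Toy usage with synthetic "cepstral" features for three speakers.
    rng = np.random.default_rng(4)
    train = {f"spk{i}": rng.normal(loc=i, scale=1.0, size=(500, 12)) for i in range(3)}
    models = enroll(train)
    test = rng.normal(loc=1, scale=1.0, size=(200, 12))   # drawn like spk1
    print(identify(models, test))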
20

Tsao, Yu. "An ensemble speaker and speaking environment modeling approach to robust speech recognition". Diss., Atlanta, Ga. : Georgia Institute of Technology, 2008. http://hdl.handle.net/1853/26540.

Full text source
Abstract:
Thesis (Ph.D)--Electrical and Computer Engineering, Georgia Institute of Technology, 2009.
Committee Chair: Lee, Chin-Hui; Committee Member: Anthony Joseph Yezzi; Committee Member: Biing-Hwang (Fred) Juang; Committee Member: Mark Clements; Committee Member: Ming Yuan. Part of the SMARTech Electronic Thesis and Dissertation Collection.
21

Brummer, Niko. "Measuring, refining and calibrating speaker and language information extracted from speech". Thesis, Stellenbosch : University of Stellenbosch, 2010. http://hdl.handle.net/10019.1/5139.

Full text source
Abstract:
Thesis (PhD (Electrical and Electronic Engineering))--University of Stellenbosch, 2010.
ENGLISH ABSTRACT: We propose a new methodology, based on proper scoring rules, for evaluating the goodness of pattern recognizers with probabilistic outputs. The recognizers of interest take an input, known to belong to one of a discrete set of classes, and output a calibrated likelihood for each class. This is a generalization of the traditional use of proper scoring rules to evaluate the goodness of probability distributions. A recognizer with outputs in well-calibrated probability distribution form can be applied to make cost-effective Bayes decisions over a range of applications having different cost functions. A recognizer with likelihood output can additionally be employed for a wide range of prior distributions over the to-be-recognized classes. We use automatic speaker recognition and automatic spoken language recognition as prototypes of this type of pattern recognizer. The traditional evaluation methods in these fields, as represented by the series of NIST Speaker and Language Recognition Evaluations, evaluate hard decisions made by the recognizers, which makes these recognizers cost- and prior-dependent. The proposed methodology generalizes that of the NIST evaluations, allowing for the evaluation of recognizers which are intended to be usefully applied over a wide range of applications having variable priors and costs. The proposal includes a family of evaluation criteria, where each member of the family is formed by a proper scoring rule. We emphasize two members of this family: (i) a non-strict scoring rule, directly representing error-rate at a given prior; and (ii) the strict logarithmic scoring rule, which represents information content, or equivalently represents summarized error-rate, or expected cost, over a wide range of applications. We further show how to form a family of secondary evaluation criteria which, by contrasting with the primary criteria, form an analysis of the goodness of calibration of the recognizers' likelihoods. Finally, we show how to use the logarithmic scoring rule as an objective function for the discriminative training of fusion and calibration of speaker and language recognizers.
AFRIKAANSE OPSOMMING: Ons wys hoe om die onsekerheid in die uittree van outomatiese sprekerherkenning- en taalherkenningstelsels voor te stel, te meet, te kalibreer en te optimeer. Dit maak die bestaande tegnologie akkurater, doeltre ender en meer algemeen toepasbaar.
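The logarithmic scoring rule that the thesis advocates is usually summarized over a trial set as Cllr, the log-likelihood-ratio cost. A minimal sketch, assuming the recognizer outputs natural-log likelihood ratios for target and non-target trials, is shown below.

    import numpy as np

    def cllr(target_llrs, nontarget_llrs):
        """Log-likelihood-ratio cost (logarithmic scoring rule, in bits).

        target_llrs / nontarget_llrs: natural-log likelihood ratios produced
        by the recognizer for same-class and different-class trials.
        A perfectly calibrated, perfectly discriminating system approaches 0;
        a system that always outputs LLR = 0 ("knows nothing") scores 1 bit.
        """
        target_llrs = np.asarray(target_llrs, dtype=float)
        nontarget_llrs = np.asarray(nontarget_llrs, dtype=float)
        c_target = np.mean(np.log2(1.0 + np.exp(-target_llrs)))
        c_nontarget = np.mean(np.log2(1.0 + np.exp(nontarget_llrs)))
        return 0.5 * (c_target + c_nontarget)

    # Toy usage: well-separated synthetic scores.
    rng = np.random.default_rng(5)
    tar = rng.normal(loc=4.0, scale=2.0, size=1000)
    non = rng.normal(loc=-4.0, scale=2.0, size=1000)
    print(cllr(tar, non))          # well below 1 bit
    print(cllr([0.0], [0.0]))      # exactly 1.0 bit for the uninformative system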
22

Phythian, Mark. "Speaker identification for forensic applications". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36079/3/__qut.edu.au_Documents_StaffHome_StaffGroupR%24_rogersjm_Desktop_36079_Digitised%20Thesis.pdf.

Full text source
Abstract:
A major application of Speaker Identification (SI) is suspect identification by voice. This thesis investigates techniques that can be used to improve SI technology as applied to suspect identification. Speech Coding techniques have become integrated into many of our modern voice communications systems. This prompts the question - how are automatic speaker identification systems and modern forensic identification techniques affected by the introduction of digitally coded speech channels? Presented in this thesis are three separate studies investigating the effects of speech coding and compression on current speaker recognition techniques. A relatively new Spectral Analysis technique - Higher Order Spectral Analysis (HOSA) - has been identified as a potential candidate for improving some aspects of forensic speaker identification tasks. Presented in this thesis is a study investigating the application of HOSA to improve the robustness of current ASR techniques in the presence of additive Gaussian noise. Results from our investigations reveal that incremental improvements in each of these aspects related to automatic and forensic identification are achievable.
23

Fanner, Robert M. "Analysis and implementation of the speaker adaptation techniques : MAP, MLLR, and MLED". Thesis, Stellenbosch : Stellenbosch University, 2002. http://hdl.handle.net/10019.1/52653.

Full text source
Abstract:
Thesis (MScEng)--University of Stellenbosch, 2002.
ENGLISH ABSTRACT: The topic of this thesis is speaker adaptation, whereby speaker-independent speech models are adapted to more closely match individual speakers by utilising a small amount of data from the targeted individual. Speaker adaptation methods - specifically, the MAP, MLLR and MLED speaker adaptation methods - are critically evaluated and compared. Two novel extensions of the MLED adaptation method are introduced, derived and evaluated. The first incorporates the explicit modelling of the mean speaker model in the speaker-space into the MLED framework. The second extends MLED to use basis vectors modelling inter-class variance for classes of speech models, instead of basis vectors modelling inter-speaker variance. An evaluation of the effect of two different types of feature vector - PLP-cepstra and LPCCs - on the performance of speaker adaptation is made, to determine which feature vector is optimal for speaker-independent systems and the adaptation thereof.
AFRIKAANSE OPSOMMING: Die onderwerp van hierdie tesis is spreker-aanpassing, dit wil sê, die verandering van 'n spreker-onafhanklike spraakmodel om nader aan 'n spreker-afhanklike model vir 'n individu te wees, gegewe 'n klein hoeveelheid spraakdata van die individu. Die volgende sprekeraanpassing-metodes word geëvalueer: MAP, MLLR en MLED. Twee nuwe uitbreidings vir die MLED-metode word beskryf, afgelei en geëvalueer. Die eerste inkorporeer die eksplisiete modellering van die gemiddelde sprekermodel van die sprekerruimte in die MLED metode. Die tweede uitbreiding maak gebruik van basisvektore vir MLED wat vanaf die interklas-variansie tussen 'n stel sprekerklasse in plaas van die interspreker-variansie afgelei is. Die effek van twee tipes kenmerk-vektore - PLP-kepstra en LPCC's - op die prestasie van sprekeraanpassings-metodes word ondersoek, sodat die optimale tipe kenmerk-vektor vir spreker-onafhanklike modelle en hul aanpassing gevind kan word.
24

Barger, Peter James. "Speech processing for forensic applications". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36081/1/36081_Barger_1998.pdf.

Full text source
Abstract:
This thesis examines speech processing systems appropriate for use in forensic analysis. The need for automatic speech processing systems for forensic use is justified by the increasing use of electronically recorded speech for communication. An automatic speaker identification and verification system is described which was tested on data gathered by the Queensland Police Force. Speaker identification using Gaussian mixture models (GMMs) is shown to be useful as an indicator of identity, but not sufficiently accurate to be used as the sole means of identification. It is shown that training GMMs on speech of one language and testing on speech of another language introduces significant bias into the results, which is unpredictable in its effects. This has implications for the performance of the system on subjects attempting to disguise their voices. Automatic gender identification systems are shown to be highly accurate, attaining 98% accuracy, even with very simple classifiers, and when tested on speech degraded by coding or reverberation. These gender gates are useful as initial classifiers in a larger speaker classification system and may even find independent use in a forensic environment. A dual microphone method of improving the performance of speaker identification systems in noisy environments is described. The method gives a significant improvement in log-likelihood scores when its output is used as input to a GMM. This implies that speaker identification tests may be improved in accuracy. A method of automatically assessing the quality of transmitted speech segments using a classification scheme is described. By classifying the difference between cepstral parameters describing the original speech and the transmitted speech, an estimate of the speech quality is obtained.
25

Tomashenko, Natalia. "Speaker adaptation of deep neural network acoustic models using Gaussian mixture model framework in automatic speech recognition systems". Thesis, Le Mans, 2017. http://www.theses.fr/2017LEMA1040/document.

Full text source
Abstract:
Les différences entre conditions d'apprentissage et conditions de test peuvent considérablement dégrader la qualité des transcriptions produites par un système de reconnaissance automatique de la parole (RAP). L'adaptation est un moyen efficace pour réduire l'inadéquation entre les modèles du système et les données liées à un locuteur ou un canal acoustique particulier. Il existe deux types dominants de modèles acoustiques utilisés en RAP : les modèles de mélanges gaussiens (GMM) et les réseaux de neurones profonds (DNN). L'approche par modèles de Markov cachés (HMM) combinés à des GMM (GMM-HMM) a été l'une des techniques les plus utilisées dans les systèmes de RAP pendant de nombreuses décennies. Plusieurs techniques d'adaptation ont été développées pour ce type de modèles. Les modèles acoustiques combinant HMM et DNN (DNN-HMM) ont récemment permis de grandes avancées et surpassé les modèles GMM-HMM pour diverses tâches de RAP, mais l'adaptation au locuteur reste très difficile pour les modèles DNN-HMM. L'objectif principal de cette thèse est de développer une méthode de transfert efficace des algorithmes d'adaptation des modèles GMM aux modèles DNN. Une nouvelle approche pour l'adaptation au locuteur des modèles acoustiques de type DNN est proposée et étudiée : elle s'appuie sur l'utilisation de fonctions dérivées de GMM comme entrée d'un DNN. La technique proposée fournit un cadre général pour le transfert des algorithmes d'adaptation développés pour les GMM à l'adaptation des DNN. Elle est étudiée pour différents systèmes de RAP à l'état de l'art et s'avère efficace par rapport à d'autres techniques d'adaptation au locuteur, ainsi que complémentaire
Differences between training and testing conditions may significantly degrade recognition accuracy in automatic speech recognition (ASR) systems. Adaptation is an efficient way to reduce the mismatch between models and data from a particular speaker or channel. There are two dominant types of acoustic models (AMs) used in ASR: Gaussian mixture models (GMMs) and deep neural networks (DNNs). The GMM hidden Markov model (GMM-HMM) approach has been one of the most common technique in ASR systems for many decades. Speaker adaptation is very effective for these AMs and various adaptation techniques have been developed for them. On the other hand, DNN-HMM AMs have recently achieved big advances and outperformed GMM-HMM models for various ASR tasks. However, speaker adaptation is still very challenging for these AMs. Many adaptation algorithms that work well for GMMs systems cannot be easily applied to DNNs because of the different nature of these models. The main purpose of this thesis is to develop a method for efficient transfer of adaptation algorithms from the GMM framework to DNN models. A novel approach for speaker adaptation of DNN AMs is proposed and investigated. The idea of this approach is based on using so-called GMM-derived features as input to a DNN. The proposed technique provides a general framework for transferring adaptation algorithms, developed for GMMs, to DNN adaptation. It is explored for various state-of-the-art ASR systems and is shown to be effective in comparison with other speaker adaptation techniques and complementary to them
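The core mechanism, feeding GMM-derived features to a DNN so that GMM-style adaptation carries over, can be sketched as follows: train an auxiliary GMM on acoustic frames and use its per-frame log component posteriors as the DNN input. The feature definition, component count and scikit-learn GMM below are simplifying assumptions, not the exact recipe of the thesis.

    import numpy as np
    from sklearn.mixture import GaussianMixture

    def train_frame_gmm(frames, n_components=64):
        """Train an auxiliary GMM on acoustic frames (a stand-in for the UBM)."""
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="diag", random_state=0)
        return gmm.fit(frames)

    def gmm_derived_features(gmm, frames, eps=1e-10):
        """Per-frame log-posteriors of the GMM components, used as DNN input.

        Adapting the auxiliary GMM to a speaker (e.g. by MAP adaptation) would
        change these features without touching the DNN, which is, roughly, the
        transfer mechanism described above.
        """
        posteriors = gmm.predict_proba(frames)
        return np.log(posteriors + eps)

    rng = np.random.default_rng(6)
    training_frames = rng.normal(size=(5000, 13))
    gmm = train_frame_gmm(training_frames, n_components=16)
    utterance = rng.normal(size=(300, 13))
    dnn_input = gmm_derived_features(gmm, utterance)
    print(dnn_input.shape)   # (300, 16): one feature vector per frame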
26

PINHEIRO, Hector Natan Batista. "Verificação de locutores independente de texto: uma análise de robustez a ruído". Universidade Federal de Pernambuco, 2015. https://repositorio.ufpe.br/handle/123456789/18045.

Full text source
Abstract:
O processo de identificação de um determinado indivíduo é realizado milhões de vezes, todos os dias, por organizações dos mais diversos setores. Perguntas como "Quem é esse indivíduo?" ou "É essa pessoa quem ela diz ser?" são realizadas frequentemente por organizações financeiras, sistemas de saúde, sistemas de comércio eletrônico, sistemas de telecomunicações e por instituições governamentais. Identificação biométrica diz respeito ao processo de realizar essa identificação a partir de características físicas ou comportamentais. Tais características são comumente referenciadas como características biométricas e alguns exemplos delas são: face, impressão digital, íris, assinatura e voz. Reconhecimento de locutores é uma modalidade biométrica que se propõe a realizar o processo de identificação pessoal a partir das informações presentes unicamente na voz do indivíduo. Este trabalho foca no desenvolvimento de sistemas de verificação de locutores independente de texto. O principal desafio no desenvolvimento desses sistemas provém das chamadas incompatibilidades que podem ocorrer na aquisição dos sinais de voz. As técnicas propostas para suavizá-las são chamadas de técnicas de compensação e três são os domínios onde elas podem operar: no processo de extração de características do sinal, na construção dos modelos dos locutores e no cálculo do score final do sistema. Além de apresentar uma vasta revisão da literatura do desenvolvimento de sistemas de verificação de locutores independentes de texto, esse trabalho também apresenta as principais técnicas de compensação de características, modelos e scores. Na fase de experimentação, uma análise comparativa das principais técnicas propostas na literatura é apresentada. Além disso, duas técnicas de compensação são propostas, uma do domínio de modelagem e outra do domínio dos scores. A técnica de compensação de score proposta é baseada na Distribuição Normal Acumulada e apresentou, em alguns contextos, resultados superiores aos apresentados pelas principais técnicas da literatura. Já a técnica de compensação de modelo é baseada em uma técnica da literatura que combina dois conceitos: treinamento multi-condicional e Teoria dos Dados Ausentes (Missing Data Theory). A formulação apresentada pelos autores é baseada nos chamados Modelos de União a Posteriori (Posterior Union Models), mas não é completamente adequada para verificação de locutores independente de texto. Este trabalho apresenta uma formulação apropriada para esse contexto que combina os dois conceitos utilizados pelos autores com um tipo de modelagem utilizando UBMs (Universal Background Models). A técnica proposta apresentou ganhos de desempenhos quando comparada à técnica-padrão GMM-UBM, baseada em Modelos de Misturas Gaussianas (GMMs).
The personal identification of individuals is a task executed millions of times every day by organizations from diverse fields. Questions such as "Who is this individual?" or "Is this person who he or she claims to be?" are constantly asked by organizations in financial services, health care, e-commerce, telecommunication systems and governments. Biometric identification is the process of identifying people using their physiological or behavioral characteristics. These characteristics are generally known as biometrics; examples include face, fingerprint, iris, handwriting and speech. Speaker recognition is a biometric modality which performs personal identification using speaker-specific information from the speech. This work focuses on the development of text-independent speaker verification systems. In these systems, speech from an individual is used to verify the claimed identity of that individual, and the verification must occur independently of the pronounced word or phrase. The main challenge in the development of speaker recognition systems comes from the mismatches which may occur in the acquisition of the speech signals. The techniques proposed to mitigate the mismatch effects are referred to as compensation methods. They may operate in three domains: in the feature extraction process, in the estimation of the speaker models, and in the computation of the decision score. Besides presenting a wide description of the main techniques used in the development of text-independent speaker verification systems, this work describes the main feature-, model- and score-based compensation methods. In the experiments, this work shows comprehensive comparisons between the conventional techniques and the alternative compensation methods. Furthermore, two compensation methods are proposed: one operates in the model domain and the other in the score domain. The proposed score-domain compensation method is based on the Normal cumulative distribution function and, in some contexts, outperformed the main score-domain compensation techniques. The model-domain compensation technique proposed in this work is based on a method presented in the literature which combines two concepts: multi-condition training and Missing Data Theory. The formulation proposed by the original authors is based on Posterior Union models and is not completely appropriate for the text-independent speaker verification task. This work proposes a more appropriate formulation for this context which combines the concepts used by those authors with a type of modeling using Universal Background Models (UBMs). The proposed method outperformed the usual GMM-UBM modeling technique, based on Gaussian Mixture Models (GMMs).
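The spirit of the proposed score compensation based on the Normal cumulative distribution can be illustrated by mapping a raw verification score through a Normal CDF fitted to impostor scores for the claimed model. This is only a sketch of the general idea; the thesis's exact formulation may differ, and the impostor-score model below is an assumption.

    import numpy as np
    from scipy.stats import norm

    def cdf_normalized_score(raw_score, impostor_scores):
        """Map a raw verification score through a Normal CDF fitted to
        impostor scores for the claimed model (illustrative normalization).

        The result lies in (0, 1): values near 1 mean the raw score is far
        into the right tail of the impostor distribution.
        """
        mu = np.mean(impostor_scores)
        sigma = np.std(impostor_scores) + 1e-12
        return norm.cdf((raw_score - mu) / sigma)

    rng = np.random.default_rng(7)
    impostors = rng.normal(loc=-1.0, scale=0.5, size=200)  # impostor trials
    print(cdf_normalized_score(0.4, impostors))   # genuine-looking score
    print(cdf_normalized_score(-1.1, impostors))  # impostor-looking score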
27

Lucey, Simon. "Audio-visual speech processing". Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Full text source
Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in-depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
28

Hong, Z. (Zimeng). "Speaker gender recognition system". Master's thesis, University of Oulu, 2017. http://jultika.oulu.fi/Record/nbnfioulu-201706082645.

Full text source
Abstract:
Abstract. Automatic gender recognition through speech is one of the fundamental mechanisms in human-machine interaction. Typical application areas of this technology range from gender-targeted advertising to gender-specific IoT (Internet of Things) applications. It can also be used to narrow down the scope of investigations in crime scenarios. There are many possible methods of recognizing the gender of a speaker. In machine learning applications, the first step is to acquire and convert the natural human voice into a form of machine-understandable signal. Useful voice features can then be extracted and labelled with gender information so that a machine can be trained on them. After that, a new input voice can be captured and processed, and the machine is able to extract the features by pattern modelling. In this thesis, a real-time speaker gender recognition system was designed within the Matlab environment. This system can automatically identify the gender of a speaker by voice. The implementation utilized voice processing and feature extraction techniques to deal with input speech coming from a microphone or a recorded speech file. The response features are extracted and classified, and a machine learning classification method (the Naïve Bayes classifier) is used to distinguish the gender features. The recognition result with gender information is then displayed. The evaluation of the speaker gender recognition system was done in an experiment with 40 participants (half male and half female) in a quite small room. The experiment recorded 400 speech samples by speakers from 16 countries in 17 languages. These 400 speech samples were tested by the gender recognition system and showed considerably good performance, with only 29 recognition errors (92.75% accuracy). In comparison with previous speaker gender recognition systems, most obtained accuracies of no more than 90% and only one obtained 100% accuracy, with a very limited number of testers. We can therefore conclude that the performance of the speaker gender recognition system designed in this thesis is reliable.
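A stripped-down version of the described pipeline is easy to sketch: derive a few per-utterance features and train a Naïve Bayes classifier on gender labels. The features below (an autocorrelation pitch estimate and two spectral statistics) and the synthetic training tones are placeholders for the thesis's actual feature set and recorded data.

    import numpy as np
    from sklearn.naive_bayes import GaussianNB

    def utterance_features(signal, fs):
        """Rough per-utterance features: an autocorrelation pitch estimate,
        the spectral centroid and the spectral spread."""
        signal = signal - np.mean(signal)
        # FFT-based autocorrelation, fast enough for whole utterances.
        spec = np.fft.rfft(signal, n=2 * len(signal))
        ac = np.fft.irfft(np.abs(spec) ** 2)[:len(signal)]
        lag_lo, lag_hi = int(fs / 400), int(fs / 60)
        pitch = fs / (lag_lo + np.argmax(ac[lag_lo:lag_hi]))
        mag = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(len(signal), 1.0 / fs)
        centroid = np.sum(freqs * mag) / (np.sum(mag) + 1e-12)
        spread = np.sqrt(np.sum(((freqs - centroid) ** 2) * mag) / (np.sum(mag) + 1e-12))
        return [pitch, centroid, spread]

    # Toy training data: synthetic "male" (low pitch) and "female" (high pitch) tones.
    fs = 16000
    t = np.arange(fs) / fs
    rng = np.random.default_rng(8)
    X, y = [], []
    for f0, label in [(110, "male"), (210, "female")]:
        for _ in range(20):
            tone = np.sin(2 * np.pi * (f0 + rng.normal(0, 10)) * t)
            X.append(utterance_features(tone + 0.05 * rng.normal(size=len(t)), fs))
            y.append(label)
    clf = GaussianNB().fit(X, y)
    test = np.sin(2 * np.pi * 120 * t)
    print(clf.predict([utterance_features(test, fs)]))   # expected: ['male']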
29

Kamarauskas, Juozas. "Speaker recognition by voice". Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20090615_093847-20773.

Full text source
Abstract:
Questions of speaker recognition by voice are investigated in this dissertation. Speaker recognition systems, their evolution, problems of recognition, systems of features, and questions of speaker modeling and matching used in text-independent and text-dependent speaker recognition are considered as well. A text-independent speaker recognition system was developed during this work. The Gaussian mixture model approach was used for speaker modeling and pattern matching. An automatic method for voice activity detection was proposed. This method is fast and does not require any additional actions from the user, such as indicating examples of the speech signal and noise. A system of features was proposed, consisting of excitation source (glottal) parameters and parameters of the vocal tract. The fundamental frequency was taken as the excitation source parameter, and four formants with three antiformants were taken as parameters of the vocal tract. In order to equate the dispersions of the formants and antiformants, we propose to use them on the mel-frequency scale. The standard mel-frequency cepstral coefficients (MFCC) were also implemented in the recognition system for comparison of the results; these features form the baseline in speech and speaker recognition. The speaker recognition experiments showed that our proposed system of features outperformed the standard mel-frequency cepstral coefficients. The equal error rate (EER) was equal to 5.17% using the proposed... [to full text]
Disertacijoje nagrinėjami kalbančiojo atpažinimo pagal balsą klausimai. Aptartos kalbančiojo atpažinimo sistemos, jų raida, atpažinimo problemos, požymių sistemos įvairovė bei kalbančiojo modeliavimo ir požymių palyginimo metodai, naudojami nuo ištarto teksto nepriklausomame bei priklausomame kalbančiojo atpažinime. Darbo metu sukurta nuo ištarto teksto nepriklausanti kalbančiojo atpažinimo sistema. Kalbėtojų modelių kūrimui ir požymių palyginimui buvo panaudoti Gauso mišinių modeliai. Pasiūlytas automatinis vokalizuotų garsų išrinkimo (segmentavimo) metodas. Šis metodas yra greitai veikiantis ir nereikalaujantis iš vartotojo jokių papildomų veiksmų, tokių kaip kalbos signalo ir triukšmo pavyzdžių nurodymas. Pasiūlyta požymių vektorių sistema, susidedanti iš žadinimo signalo bei balso trakto parametrų. Kaip žadinimo signalo parametras, panaudotas žadinimo signalo pagrindinis dažnis, kaip balso trakto parametrai, panaudotos keturios formantės bei trys antiformantės. Siekiant suvienodinti žemesnių bei aukštesnių formančių ir antiformančių dispersijas, jas pasiūlėme skaičiuoti melų skalėje. Rezultatų palyginimui sistemoje buvo realizuoti standartiniai požymiai, naudojami kalbos bei asmens atpažinime – melų skalės kepstro koeficientai (MSKK). Atlikti kalbančiojo atpažinimo eksperimentai parodė, kad panaudojus pasiūlytą požymių sistemą buvo gauti geresni atpažinimo rezultatai, nei panaudojus standartinius požymius (MSKK). Gautas lygių klaidų lygis, panaudojant pasiūlytą požymių... [toliau žr. visą tekstą]
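The proposal above to express formants and antiformants on the mel scale, so that their dispersions become comparable, can be written down directly with the standard mapping m = 2595 log10(1 + f/700). The sketch below assembles the described eight-dimensional feature vector (fundamental frequency plus four formants and three antiformants); the example frequencies are illustrative, not measured values.

    import numpy as np

    def hz_to_mel(f_hz):
        """Standard mel-scale mapping used to warp formant frequencies."""
        return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)

    def voiced_frame_features(f0, formants_hz, antiformants_hz):
        """Feature vector of the kind described above: fundamental frequency
        plus four formants and three antiformants, the latter two groups
        expressed on the mel scale."""
        return np.concatenate(([f0], hz_to_mel(formants_hz), hz_to_mel(antiformants_hz)))

    # Example values for one voiced frame (illustrative, not measured).
    print(voiced_frame_features(120.0,
                                [500.0, 1500.0, 2500.0, 3500.0],
                                [900.0, 1900.0, 2900.0]))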
30

Nosratighods, Mohaddeseh. "Robust speaker verification system". University of New South Wales, Electrical Engineering & Telecommunications, 2008. http://handle.unsw.edu.au/1959.4/42796.

Full text source
Abstract:
Identity verification or biometric recognition systems play an important role in our daily lives. Applications include Automatic Teller Machines (ATM), banking and share information retrieval, and personal verification for credit cards. Among biometric techniques, authentication of a speaker by his or her voice is of great importance, since it employs a non-invasive approach and is the only available modality in many applications. However, the performance of Automatic Speaker Verification (ASV) systems degrades significantly under adverse conditions which cause recordings from the same speaker to differ. The objective of this research is to investigate and develop robust techniques for performing automatic speaker recognition over various channel conditions, such as telephony and recorded microphone speech. This research is shown to improve the robustness of ASV systems in three main areas: feature extraction, speaker modelling and score normalization. At the feature level, a new set of dynamic features, termed Delta Cepstral Energy (DCE), is proposed instead of traditional delta cepstra; it not only greatly reduces the dimensionality of the feature vector compared with delta and delta-delta cepstra, but is also shown to provide the same performance for matched testing and training conditions on TIMIT and a subset of the NIST 2002 dataset. The concept of speaker entropy, which conveys the information contained in a speaker's speech based on the extracted features, facilitates comparative evaluation of the proposed methods. In addition, Frequency Modulation features are combined in a complementary manner with the Mel Frequency Cepstral Coefficients (MFCCs) to improve the performance of the ASV system under channel variability of various types. The proposed fused system shows a relative reduction of up to 23% in Equal Error Rate (EER) over the MFCC-based system when evaluated on the NIST 2008 dataset. Currently, the main challenge in speaker modelling is channel variability across different sessions. A recent approach to channel compensation based on Support Vector Machines (SVM) is Nuisance Attribute Projection (NAP). The proposed multi-component approach to NAP attempts to compensate for the main sources of inter-session variation through an additional optimization criterion, allowing more accurate estimates of the most dominant channel artefacts and improving system performance under mismatched training and test conditions. Another major issue in speaker recognition is that the variability of score distributions, due to incompletely modelled regions of the feature space, can produce segments of the test speech that are poorly matched to the claimed speaker model. A segment selection technique in score normalization is proposed that relies only on discriminative and reliable segments of the test utterance to verify the speaker. This approach is particularly useful in noisy conditions where speech activity detection is not reliable at the feature level. Another source of score variability comes from the fact that not all phonemes are equally discriminative. To address this, a new score re-weighting technique is applied to likelihood values based on the discriminative level of each Gaussian component, i.e. each particular region of the feature space. It is found that a limited number of Gaussian mixtures, herein termed discriminative components, are responsible for the overall performance, and that inclusion of the other, non-discriminative components may only degrade system performance.
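As context for the score-level fusion mentioned above, the sketch below shows one common way two subsystem scores (here standing in for MFCC- and FM-based scores) can be linearly fused and an Equal Error Rate estimated. The fusion weight and the toy scores are illustrative assumptions, not the thesis configuration.

```python
# Minimal sketch: linear score-level fusion of two subsystems plus EER estimation.
import numpy as np

def fuse_scores(mfcc_scores, fm_scores, alpha=0.7):
    """Linear fusion of an MFCC-based and an FM-based subsystem score (alpha is assumed)."""
    return alpha * np.asarray(mfcc_scores) + (1.0 - alpha) * np.asarray(fm_scores)

def equal_error_rate(target_scores, impostor_scores):
    """Sweep a decision threshold and return the EER (where FAR is closest to FRR)."""
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    far = np.array([(impostor_scores >= t).mean() for t in thresholds])
    frr = np.array([(target_scores < t).mean() for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])

# Toy usage with random scores, purely to show the call pattern.
rng = np.random.default_rng(0)
tgt = fuse_scores(rng.normal(2.0, 1.0, 500), rng.normal(1.5, 1.0, 500))
imp = fuse_scores(rng.normal(0.0, 1.0, 5000), rng.normal(0.0, 1.0, 5000))
print(f"EER ~ {equal_error_rate(tgt, imp):.3f}")
```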
Style APA, Harvard, Vancouver, ISO itp.
31

Tran, Michael. "An approach to a robust speaker recognition system". Diss., This resource online, 1994. http://scholar.lib.vt.edu/theses/available/etd-06062008-164814/.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
32

Ramirez, Jose Luis. "Effects of clipping distortion on an Automatic Speaker Recognition system". Thesis, University of Colorado at Denver, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10112619.

Pełny tekst źródła
Streszczenie:

Clipping distortion is a common problem in audio recording in which an audio signal is recorded at an amplitude beyond the recording system's limits, so that a portion of the acoustic event is not captured. Several government agencies employ Automatic Speaker Recognition (ASR) systems to identify the speaker of an acquired recording. This is done automatically, using an unbiased approach, by running a questioned recording through an ASR system and comparing it to a pre-existing database of voice samples from known speakers. A matching speaker is indicated by a high likelihood score between the questioned recording and one from the known database. It is possible that, while the questioned recording was being made, the speaker spoke too loudly into the recording device, a gain setting was set too high, or post-processing was applied, to the point that clipping distortion was introduced into the recording. Clipping distortion results from the amplitude of an audio signal surpassing the maximum sampling value of the recording system; it affects the quantized audio signal by truncating peaks at the maximum value rather than at the actual amplitude of the input signal. In theory, clipping distortion will negatively affect the likelihood ratios between two compared recordings of the same speaker; this thesis tests that hypothesis. Currently there is no research that serves as a guideline on the limitations of using clipped recordings. This thesis investigates to what degree clipped material affects the performance of a Forensic Automatic Speaker Recognition system.
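For readers unfamiliar with the effect described, here is a minimal sketch of how hard clipping is commonly simulated: samples exceeding the recorder's full-scale value are truncated at that value. The gain, signal and limit used here are illustrative assumptions, not the experimental protocol of the thesis.

```python
# Minimal sketch: simulate hard clipping and measure how much of the signal is clipped.
import numpy as np

def hard_clip(signal, limit=1.0):
    """Truncate sample peaks at +/- limit, as an over-driven recorder would."""
    return np.clip(signal, -limit, limit)

def percent_clipped(signal, limit=1.0):
    """Fraction of samples sitting at the clipping ceiling."""
    return float(np.mean(np.abs(signal) >= limit))

# Toy usage: a sine wave driven past full scale.
t = np.linspace(0, 1, 16000, endpoint=False)
clean = 0.8 * np.sin(2 * np.pi * 220 * t)
clipped = hard_clip(2.5 * clean)          # over-gained, then truncated at +/- 1.0
print(f"{100 * percent_clipped(clipped):.1f}% of samples clipped")
```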

Style APA, Harvard, Vancouver, ISO itp.
33

Necioğlu, Burhan F. "Objectively measured descriptors for perceptual characterization of speakers". Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15035.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
34

Heimark, Erlend. "Authentication: From Passwords to Biometrics : An implementation of a speaker recognition system on Android". Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for telematikk, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-19004.

Pełny tekst źródła
Streszczenie:
We implement a biometric authentication system on the Android platform, based on text-dependent speaker recognition. The Android version used in the application is Android 4.0. The application makes use of the Modular Audio Recognition Framework, from which many of the algorithms are adapted for preprocessing and feature extraction. In addition, we employ the Dynamic Time Warping (DTW) algorithm for the comparison of different voice features. A training procedure is implemented, using the DTW algorithm to align features. Furthermore, we introduce personal thresholds, based on which the performance for each individual user can be further optimized. We have carried out several tests in order to evaluate the performance of the developed system. The tests are performed on 16 persons, with a total of 240 voice samples, 15 from each person. For authentication, one of the optimal trade-offs of the False Acceptance Rate (FAR) and False Rejection Rate (FRR) achieved by the system is shown to be 13% and 12%, respectively. For identification, the system could identify the user correctly at a rate of 81%. Our results show that the system performance in terms of FAR and FRR can be improved significantly by using the training procedure and the personal thresholds.
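A minimal sketch of the DTW comparison and per-user threshold decision described above is given below; the feature dimension, frame counts and threshold value are illustrative assumptions, not the values used in the Android implementation.

```python
# Minimal sketch: DTW alignment between two feature sequences and a per-user threshold decision.
import numpy as np

def dtw_distance(a, b):
    """Classic O(len(a)*len(b)) dynamic time warping with Euclidean frame cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[n, m] / (n + m)   # length-normalised path cost

def verify(test_feats, enrolled_feats, personal_threshold):
    """Accept the claim when the warped distance falls below the user's personal threshold."""
    return dtw_distance(test_feats, enrolled_feats) < personal_threshold

# Toy usage with random 13-dimensional "MFCC" frames.
rng = np.random.default_rng(1)
enrolled, attempt = rng.normal(size=(80, 13)), rng.normal(size=(95, 13))
print(verify(attempt, enrolled, personal_threshold=4.0))
```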
Style APA, Harvard, Vancouver, ISO itp.
35

Alamri, Safi S. "Text-independent, automatic speaker recognition system evaluation with males speaking both Arabic and English". Thesis, University of Colorado at Denver, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1605087.

Pełny tekst źródła
Streszczenie:

Automatic speaker recognition is an important key to speaker identification in media forensics, and as cultures increasingly mix, the number of bilingual speakers is growing all around the world. The purpose of this thesis is to compare text-independent samples of one person using two different languages, Arabic and English, against a single-language reference population. The hope is to begin a design that may be useful in further developing software that can perform accurate text-independent ASR for bilingual speakers speaking either language against a single-language reference population. This thesis took an Arabic model sample and compared it against both Arabic and English samples using an Arabic reference population, all collected from videos downloaded from the Internet. All of the samples were text-independent and enhanced for optimal performance. The data were run through biometric software called BATVOX 4.1, which utilizes MFCC and GMM methods of speaker recognition and identification. Testing through BATVOX 4.1 produced likelihood ratios for each sample, which were evaluated for similarities and differences, trends, and problems that had occurred.
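The likelihood-ratio idea behind such MFCC/GMM comparisons can be sketched as follows, with scikit-learn's GaussianMixture standing in for the commercial BATVOX system; the feature dimension, mixture sizes and synthetic data are illustrative assumptions.

```python
# Minimal sketch: GMM likelihood ratio of a test sample against a speaker model
# and a reference-population model.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmm(features, n_components=16, seed=0):
    return GaussianMixture(n_components=n_components, covariance_type="diag",
                           random_state=seed).fit(features)

def log_likelihood_ratio(test_features, speaker_gmm, reference_gmm):
    """Average per-frame log-likelihood under the speaker model minus the reference population."""
    return speaker_gmm.score(test_features) - reference_gmm.score(test_features)

# Toy usage with random 13-dimensional feature frames.
rng = np.random.default_rng(2)
speaker_model = train_gmm(rng.normal(0.5, 1.0, (2000, 13)))
reference_model = train_gmm(rng.normal(0.0, 1.0, (8000, 13)))
test = rng.normal(0.5, 1.0, (500, 13))
print(f"LLR = {log_likelihood_ratio(test, speaker_model, reference_model):.2f}")
```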

Style APA, Harvard, Vancouver, ISO itp.
36

Raab, Martin. "Real world approaches for multilingual and non-native speech recognition". Berlin Logos-Verl, 2010. http://d-nb.info/1002021049/04.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
37

Al-Ani, Ahmed Karim. "An improved pattern classification system using optimal feature selection, classifier combination, and subspace mapping techniques". Thesis, Queensland University of Technology, 2002.

Znajdź pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
38

Mangayyagari, Srikanth. "Voice recognition system based on intra-modal fusion and accent classification". [Tampa, Fla.] : University of South Florida, 2007. http://purl.fcla.edu/usf/dc/et/SFE0002229.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
39

Bekli, Zeid, i William Ouda. "A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model". Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20122.

Pełny tekst źródła
Streszczenie:
Voice recognition has become a more focused and researched field in the last century, and new techniques to identify speech have been introduced. A part of voice recognition is speaker verification, which is divided into a front-end and a back-end. The first component is the front-end, or feature extraction, where techniques such as Mel-Frequency Cepstrum Coefficients (MFCC) are used to extract the speaker-specific features of a speech signal; MFCC is mostly used because it is based on the known variation of the human ear's critical frequency bandwidth. The second component is the back-end, which handles the speaker modelling. The back-end is based on the Gaussian Mixture Model (GMM) and Gaussian Mixture Model-Universal Background Model (GMM-UBM) methods for enrollment and verification of the specific speaker. In addition, normalization techniques such as Cepstral Mean Subtraction (CMS) and feature warping are also used for robustness against noise and distortion. In this paper, we build a speaker verification system, experiment with varying amounts of training data for the true speaker model, and evaluate the system performance. To further investigate the area of security in a speaker verification system, two methods are compared (GMM and GMM-UBM) to determine which is more secure depending on the amount of training data available. This research therefore contributes to understanding how much data is really necessary for a secure system where the False Positive rate is as close to zero as possible, how the amount of training data affects the False Negative (FN) rate, and how this differs between GMM and GMM-UBM. The results show that an increase in speaker-specific training data will increase the performance of the system. However, too much training data has been shown to be unnecessary, because the performance of the system eventually reaches its highest point; in this case that was around 48 minutes of data, and the results also show that the GMM-UBM models trained with 48 to 60 minutes outperformed the GMM models.
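A minimal sketch of the GMM-UBM enrollment step discussed above (MAP adaptation of the UBM means towards a speaker's enrollment data) is shown below; the relevance factor, mixture size and synthetic features are illustrative assumptions, not the thesis setup.

```python
# Minimal sketch: a UBM trained on pooled speech, with speaker models obtained
# by MAP-adapting the UBM means to that speaker's enrollment data.
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm, speaker_feats, relevance=16.0):
    """Return adapted means: a count-weighted blend of UBM means and speaker statistics."""
    post = ubm.predict_proba(speaker_feats)           # frame-by-component responsibilities
    n_k = post.sum(axis=0)                            # soft counts per component
    f_k = post.T @ speaker_feats                      # first-order statistics
    alpha = (n_k / (n_k + relevance))[:, None]        # adaptation coefficients
    ml_means = f_k / np.maximum(n_k[:, None], 1e-8)
    return alpha * ml_means + (1.0 - alpha) * ubm.means_

# Toy usage with synthetic 13-dimensional features.
rng = np.random.default_rng(3)
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(rng.normal(size=(5000, 13)))                  # pooled background speakers
adapted_means = map_adapt_means(ubm, rng.normal(0.3, 1.0, (600, 13)))
print(adapted_means.shape)                            # (8, 13)
```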
Style APA, Harvard, Vancouver, ISO itp.
40

Carbonell, Noëlle. "Reconnaissance de la parole continue et dialogue homme-machine : acquisition et mise en œuvre d'expertises". Nancy 1, 1991. http://www.theses.fr/1991NAN10423.

Pełny tekst źródła
Streszczenie:
To improve the efficiency of systems for understanding and managing spoken human-machine dialogue in natural language, we chose to enrich the knowledge and information these systems draw upon. To increase the precision of multi-speaker (speaker-independent) acoustic-phonetic decoding of continuous speech, we analysed the skills of an expert spectrogram reader, then formalised the knowledge gathered and integrated it into an expert system for phonetic identification. With regard to prosody, the study of a corpus of spoken human-machine dialogues, combined with an empirical approach, allowed us to locate on the signal, with an error rate below 10%, certain boundaries between contiguous lexical units. In addition, we were able to determine efficient and user-friendly dialogue strategies through the analysis of telephone conversations between an experienced operator and the users of an information service. Through a controlled experiment simulating spoken human-machine dialogues, we showed that the findings of studies on spoken human-human dialogue also apply to human-machine communication. Finally, we propose a specialised architecture that eases the use of the knowledge needed to understand and manage spoken human-machine dialogues that are relatively complex on the cognitive level. Volume 1 presents and discusses our approach and the results obtained; Volume 2 reproduces our main publications.
Style APA, Harvard, Vancouver, ISO itp.
41

Ying-Hao, Chen, i 陳英豪. "Text-independent Speaker Recognition Systems Using GMM in 200 Mandarin Speakers". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/98000557557954695918.

Pełny tekst źródła
Streszczenie:
Master's thesis
Kao Yuan University of Science and Technology
Institute of Electronic Engineering
102
The purpose of this thesis is to establish a text-independent speaker identification system for medium-sized speaker populations. The database is from television recordings and contains voice data from two hundred speakers; the duration of each person's voice is seventy-five seconds. The performance of a text-independent speaker identification system depends on the accurate identification rate and the speed of the identification system. In general, the accurate identification rate is affected by the number of target speakers. This thesis classified a large database with a vector-quantization-based Gaussian mixture model (Vector Quantization Gaussian Mixture Model) so that the accurate identification rate does not drop off rapidly when the population increases significantly. We use a Gaussian mixture model of independent Gaussian distributions to model a particular speaker's features in the broad voice space. Experimental results show that the GMM speaker identification system performs well in the text-independent setting. Besides, the concept of pre-grouping makes GMM speaker training much faster than the traditional grouping method; it can save up to half of the training time. The thesis also compares the performance of the diagonal covariance matrix with the full covariance matrix: although the latter's identification rate is slightly higher than the former's, its training time is more than three times longer.
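The diagonal- versus full-covariance trade-off reported above can be illustrated with a short sketch; the synthetic features and mixture size are assumptions for demonstration only, not the thesis data.

```python
# Minimal sketch: compare training time and model fit for diagonal vs. full covariance GMMs.
import time
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(4)
feats = rng.normal(size=(4000, 13))                  # stand-in for one speaker's MFCC frames

for cov_type in ("diag", "full"):
    start = time.perf_counter()
    gmm = GaussianMixture(n_components=32, covariance_type=cov_type,
                          random_state=0).fit(feats)
    elapsed = time.perf_counter() - start
    print(f"{cov_type:4s}: avg log-likelihood {gmm.score(feats):8.3f}, "
          f"training time {elapsed:.2f}s")
```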
Style APA, Harvard, Vancouver, ISO itp.
42

"An automatic speaker recognition system". Chinese University of Hong Kong, 1989. http://library.cuhk.edu.hk/record=b5886206.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
43

"GMM-based speaker recognition for mobile embedded systems". 2004. http://library.cuhk.edu.hk/record=b6073660.

Pełny tekst źródła
Streszczenie:
Leung Cheung-chi.
"July 2004."
Thesis (Ph.D.)--Chinese University of Hong Kong, 2004.
Includes bibliographical references (p. 77-81).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Mode of access: World Wide Web.
Abstracts in English and Chinese.
Style APA, Harvard, Vancouver, ISO itp.
44

Mohan, Aanchan K. "Combining speech recognition and speaker verification". 2008. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.17528.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
45

Fang, Yu-Chieh, i 方雨傑. "Using Average Spectrum in English text-independent speaker recognition systems". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/26294843116089040140.

Pełny tekst źródła
Streszczenie:
Master's thesis
Kao Yuan University of Science and Technology
Institute of Electrical Engineering
102
Voice recognition technology is increasingly important for speaker identification, because the technique is well established and its applications are widening. Research and data for recognition systems targeting Taiwanese speakers remain insufficient, so this thesis presents several common voice recognition techniques to help complete such a recognition system. The main goal is to create a speaker identification system that applies to unspecific intonation spoken by a small number of speakers. The database comes from a private broadcasting station; the system's features are based on averaged spectra, the recognition measures are the Mahalanobis distance and the Euclidean distance, and the decision is made by the minimum-distance rule. The research is carried out in Matlab, sampling 70 people, each sample containing 60 seconds of voice signal. The result under the Mahalanobis distance is 95% identification accuracy, and 89% under the Euclidean distance.
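A minimal sketch of the minimum-distance rule over averaged spectra, with both Euclidean and Mahalanobis distances, follows; the spectrum length, the synthetic speaker templates and the identity covariance are illustrative assumptions, not the thesis data.

```python
# Minimal sketch: identify the speaker whose averaged spectrum is closest to the probe.
import numpy as np

def mahalanobis(x, mean, cov_inv):
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def identify(test_spectrum, speaker_means, cov_inv=None):
    """Return the speaker label minimising the chosen distance to the test spectrum."""
    if cov_inv is None:                               # Euclidean distance
        dists = {s: float(np.linalg.norm(test_spectrum - m)) for s, m in speaker_means.items()}
    else:                                             # Mahalanobis distance
        dists = {s: mahalanobis(test_spectrum, m, cov_inv) for s, m in speaker_means.items()}
    return min(dists, key=dists.get)

# Toy usage with 64-bin averaged spectra for three speakers.
rng = np.random.default_rng(5)
templates = {f"spk{i}": rng.normal(i, 1.0, 64) for i in range(3)}
probe = templates["spk1"] + rng.normal(0, 0.3, 64)
print(identify(probe, templates))                      # Euclidean
print(identify(probe, templates, cov_inv=np.eye(64)))  # Mahalanobis (identity covariance assumed)
```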
Style APA, Harvard, Vancouver, ISO itp.
46

"Text-independent bilingual speaker verification system". 2003. http://library.cuhk.edu.hk/record=b5891732.

Pełny tekst źródła
Streszczenie:
Ma Bin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 96-102).
Abstracts in English and Chinese.
Abstract --- p.i
Acknowledgement --- p.iv
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Biometrics --- p.2
Chapter 1.2 --- Speaker Verification --- p.3
Chapter 1.3 --- Overview of Speaker Verification Systems --- p.4
Chapter 1.4 --- Text Dependency --- p.4
Chapter 1.4.1 --- Text-Dependent Speaker Verification --- p.5
Chapter 1.4.2 --- GMM-based Speaker Verification --- p.6
Chapter 1.5 --- Language Dependency --- p.6
Chapter 1.6 --- Normalization Techniques --- p.7
Chapter 1.7 --- Objectives of the Thesis --- p.8
Chapter 1.8 --- Thesis Organization --- p.8
Chapter 2 --- Background --- p.10
Chapter 2.1 --- Background Information --- p.11
Chapter 2.1.1 --- Speech Signal Acquisition --- p.11
Chapter 2.1.2 --- Speech Processing --- p.11
Chapter 2.1.3 --- Engineering Model of Speech Signal --- p.13
Chapter 2.1.4 --- Speaker Information in the Speech Signal --- p.14
Chapter 2.1.5 --- Feature Parameters --- p.15
Chapter 2.1.5.1 --- Mel-Frequency Cepstral Coefficients --- p.16
Chapter 2.1.5.2 --- Linear Predictive Coding Derived Cepstral Coefficients --- p.18
Chapter 2.1.5.3 --- Energy Measures --- p.20
Chapter 2.1.5.4 --- Derivatives of Cepstral Coefficients --- p.21
Chapter 2.1.6 --- Evaluating Speaker Verification Systems --- p.22
Chapter 2.2 --- Common Techniques --- p.24
Chapter 2.2.1 --- Template Model Matching Methods --- p.25
Chapter 2.2.2 --- Statistical Model Methods --- p.26
Chapter 2.2.2.1 --- HMM Modeling Technique --- p.27
Chapter 2.2.2.2 --- GMM Modeling Techniques --- p.30
Chapter 2.2.2.3 --- Gaussian Mixture Model --- p.31
Chapter 2.2.2.4 --- The Advantages of GMM --- p.32
Chapter 2.2.3 --- Likelihood Scoring --- p.32
Chapter 2.2.4 --- General Approach to Decision Making --- p.35
Chapter 2.2.5 --- Cohort Normalization --- p.35
Chapter 2.2.5.1 --- Probability Score Normalization --- p.36
Chapter 2.2.5.2 --- Cohort Selection --- p.37
Chapter 2.3 --- Chapter Summary --- p.38
Chapter 3 --- Experimental Corpora --- p.39
Chapter 3.1 --- The YOHO Corpus --- p.39
Chapter 3.1.1 --- Design of the YOHO Corpus --- p.39
Chapter 3.1.2 --- Data Collection Process of the YOHO Corpus --- p.40
Chapter 3.1.3 --- Experimentation with the YOHO Corpus --- p.41
Chapter 3.2 --- CUHK Bilingual Speaker Verification Corpus --- p.42
Chapter 3.2.1 --- Design of the CUBS Corpus --- p.42
Chapter 3.2.2 --- Data Collection Process for the CUBS Corpus --- p.44
Chapter 3.3 --- Chapter Summary --- p.46
Chapter 4 --- Text-Dependent Speaker Verification --- p.47
Chapter 4.1 --- Front-End Processing on the YOHO Corpus --- p.48
Chapter 4.2 --- Cohort Normalization Setup --- p.50
Chapter 4.3 --- HMM-based Speaker Verification Experiments --- p.53
Chapter 4.3.1 --- Subword HMM Models --- p.53
Chapter 4.3.2 --- Experimental Results --- p.55
Chapter 4.3.2.1 --- Comparison of Feature Representations --- p.55
Chapter 4.3.2.2 --- Effect of Cohort Normalization --- p.58
Chapter 4.4 --- Experiments on GMM-based Speaker Verification --- p.61
Chapter 4.4.1 --- Experimental Setup --- p.61
Chapter 4.4.2 --- The number of Gaussian Mixture Components --- p.62
Chapter 4.4.3 --- The Effect of Cohort Normalization --- p.64
Chapter 4.4.4 --- Comparison of HMM and GMM --- p.65
Chapter 4.5 --- Comparison with Previous Systems --- p.67
Chapter 4.6 --- Chapter Summary --- p.70
Chapter 5 --- Language- and Text-Independent Speaker Verification --- p.71
Chapter 5.1 --- Front-End Processing of the CUBS --- p.72
Chapter 5.2 --- Language- and Text-Independent Speaker Modeling --- p.73
Chapter 5.3 --- Cohort Normalization --- p.74
Chapter 5.4 --- Experimental Results and Analysis --- p.75
Chapter 5.4.1 --- Number of Gaussian Mixture Components --- p.78
Chapter 5.4.2 --- The Cohort Normalization Effect --- p.79
Chapter 5.4.3 --- Language Dependency --- p.80
Chapter 5.4.4 --- Language-Independency --- p.83
Chapter 5.5 --- Chapter Summary --- p.88
Chapter 6 --- Conclusions and Future Work --- p.90
Chapter 6.1 --- Summary --- p.90
Chapter 6.1.1 --- Feature Comparison --- p.91
Chapter 6.1.2 --- HMM Modeling --- p.91
Chapter 6.1.3 --- GMM Modeling --- p.91
Chapter 6.1.4 --- Cohort Normalization --- p.92
Chapter 6.1.5 --- Language Dependency --- p.92
Chapter 6.2 --- Future Work --- p.93
Chapter 6.2.1 --- Feature Parameters --- p.93
Chapter 6.2.2 --- Model Quality --- p.93
Chapter 6.2.2.1 --- Variance Flooring --- p.93
Chapter 6.2.2.2 --- Silence Detection --- p.94
Chapter 6.2.3 --- Conversational Speaker Verification --- p.95
Bibliography --- p.102
Style APA, Harvard, Vancouver, ISO itp.
47

Zhang, Danqing. "Multi-speaker isolated digit recognition using artificial neural networks". Master's thesis, 1994. http://hdl.handle.net/1885/145777.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
48

"Speaker recognition using complementary information from vocal source and vocal tract". Thesis, 2005. http://library.cuhk.edu.hk/record=b6074159.

Pełny tekst źródła
Streszczenie:
Experimental results show that source-tract information fusion can also improve the robustness of speaker recognition systems in mismatched conditions. For example, relative improvements of 15.3% and 12.6% have been achieved for speaker identification and verification, respectively.
For speaker verification, a text-dependent weighting scheme is developed. Analysis results show that the source-tract discrimination ratio varies significantly across different sounds due to the diversity of vocal system configurations in speech production. This thesis analyzes the source-tract speaker discrimination ratio for the 10 Cantonese digits, upon which a digit-dependent source-tract weighting scheme is developed. Information fusion with such digit-dependent weights improves the verification performance by a relative 39.6% in matched conditions.
This thesis investigates the feasibility of using both vocal source and vocal tract information to improve speaker recognition performance. Conventional speaker recognition systems typically employ vocal-tract-related acoustic features, e.g. the Mel-frequency cepstral coefficients (MFCC), for discriminative purposes. Motivated by the physiological significance of the vocal source and vocal tract system in speech production, this thesis develops a speaker recognition system that effectively incorporates these two complementary information sources for improved performance and robustness.
This thesis presents a novel approach to representing the speaker-specific vocal source characteristics. The linear predictive (LP) residual signal is adopted as a good representative of the vocal source excitation, in which the speaker-specific information resides in both the time and frequency domains. The Haar transform and the wavelet transform are applied for multi-resolution analyses of the LP residual signal. The resulting vocal source features, namely the Haar octave coefficients of residues (HOCOR) and wavelet octave coefficients of residues (WOCOR), can effectively extract the speaker-specific spectro-temporal characteristics of the LP residual signal. Particularly, with the pitch-synchronous wavelet transform, the WOCOR feature set is capable of capturing the pitch-related low-frequency properties and the high-frequency information associated with pitch epochs, as well as their temporal variations within a pitch period and over consecutive periods. The generated vocal source and vocal tract features are complementary to each other since they are derived from two orthogonal components, the LP residual signal and the LP coefficients. Therefore they can be fused to provide better speaker recognition performance. A preliminary scheme fusing MFCC and WOCOR showed that the identification and verification performance can be improved by 34.6% and 23.6% respectively, both in matched conditions.
To maximize the benefit obtained through the fusion of source and tract information, speaker-discrimination-dependent fusion techniques have been developed. For speaker identification, a confidence measure, which indicates the reliability of the vocal source feature in speaker identification, is derived based on the discrimination ratio between the source and tract features in each identification trial. Information fusion with the confidence measure offers better weighting of the scores given by the two features and avoids possible errors introduced by incorporating source information, thereby improving the identification performance further. Compared with MFCC alone, a relative improvement of 46.8% has been achieved.
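As background to the source features described above, the sketch below extracts an LP residual by inverse-filtering a frame with its own LP model; the frame length, LP order and synthetic frame are illustrative assumptions, and the HOCOR/WOCOR transforms themselves are not reproduced here.

```python
# Minimal sketch: LP analysis by the autocorrelation (Yule-Walker) method and
# inverse filtering to obtain the vocal source (LP residual) signal.
import numpy as np
from scipy.signal import lfilter

def lpc_autocorrelation(frame, order=12):
    """LP coefficients a[1..p] from the frame's autocorrelation sequence."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:len(frame) + order]
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    return np.linalg.solve(R, r[1:order + 1])

def lp_residual(frame, order=12):
    """Inverse-filter the frame with its own LP model to obtain the excitation residual."""
    a = lpc_autocorrelation(frame, order)
    return lfilter(np.concatenate(([1.0], -a)), [1.0], frame)

# Toy usage on a synthetic voiced-like frame (120 Hz tone plus noise at 8 kHz).
rng = np.random.default_rng(6)
t = np.arange(400) / 8000.0
frame = np.sin(2 * np.pi * 120 * t) + 0.1 * rng.normal(size=400)
residual = lp_residual(frame)
print(residual.shape, float(np.var(residual)))
```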
Zheng Nengheng.
"November 2005."
Adviser: Pak-Chung Ching.
Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6647.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references (p. 123-135).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [200-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstracts in English and Chinese.
School code: 1307.
Style APA, Harvard, Vancouver, ISO itp.
49

Chandrasekaran, Aravind. "Efficient methods for rapid UBM training (RUT) for robust speaker verification /". 2008. http://proquest.umi.com/pqdweb?did=1650508671&sid=2&Fmt=2&clientId=10361&RQT=309&VName=PQD.

Pełny tekst źródła
Style APA, Harvard, Vancouver, ISO itp.
50

Moonasar, Viresh. "Hardware implementation of an automatic speaker recognition system using artificial neural networks". Thesis, 2002. http://hdl.handle.net/10321/2714.

Pełny tekst źródła
Streszczenie:
Submitted in fulfillment of the academic requirements for the degree of Master of Technology in Electrical Engineering in the Department of Electronic Engineering, Faculty of Engineering, ML Sultan Technikon of Durban in South Africa, March 2002.
The use of speaker recognition technology in interactive voice response and electronic commerce systems has been limited. This is due to the lack of research attention and published results when compared to all the other areas of speech recognition technologies.
Style APA, Harvard, Vancouver, ISO itp.