Dissertations / Theses on the topic 'Speaker identification systems'

Consult the top 25 dissertations / theses for your research on the topic 'Speaker identification systems.'


1

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University, School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Full text
Abstract:
The speech signal primarily carries the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and it can be used to recognize (identify or verify) a person. This thesis deals with the speaker identification task, i.e., finding the identity of a person from his or her speech among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers, ranging from high-level cues such as the semantics and linguistics of the speech to low-level cues relating to the speaker's vocal tract and voice source characteristics. In modern speaker identification systems, the vocal tract characteristics are generally modeled by cepstral coefficients. Although these coefficients represent vocal tract information well, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers, yet current speaker recognition systems rarely use it, because it cannot be extracted reliably and is not always present in the speech signal. This thesis attempts to utilize this pitch and voicing information for speaker identification. Using a text-independent speaker identification system, it shows that cepstral coefficients perform reasonably well, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not helpful. There are two main reasons why the direct use of pitch fails for speaker recognition. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the unvoiced half.
The problem is then how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). To use pitch information for speaker recognition, these problems must be overcome. What is needed is a method that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch-doubling or pitch-halving estimation errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
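The MACV idea described in this abstract can be sketched roughly as follows. This is an illustrative reconstruction, not the thesis code; the 8 kHz sampling rate, the 50-400 Hz pitch-lag search range, and the five bands are assumptions for the toy example.

```python
import numpy as np

def macv_features(frame, fs=8000, fmin=50.0, fmax=400.0, n_bands=5):
    """Maximum Autocorrelation Value (MACV) features: the peak of the
    normalized autocorrelation inside each of n_bands slices of the
    candidate pitch-lag range. Defined for voiced AND unvoiced frames,
    and insensitive to pitch doubling/halving errors."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / (ac[0] + 1e-12)                   # ac[0] == 1 after this
    lo, hi = int(fs / fmax), int(fs / fmin)     # lags for 400 Hz .. 50 Hz
    edges = np.linspace(lo, hi, n_bands + 1, dtype=int)
    return np.array([ac[a:b].max() for a, b in zip(edges[:-1], edges[1:])])

# A periodic ("voiced") frame yields a large value in the band holding
# its pitch lag; a noise ("unvoiced") frame yields only small values,
# yet the feature vector exists for both - the point of MACV.
t = np.arange(240) / 8000.0
voiced_feats = macv_features(np.sin(2 * np.pi * 100.0 * t))
```

Both frame types thus produce a usable fixed-length feature vector, which is why these features can be appended to the cepstral ones for every frame.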
APA, Harvard, Vancouver, ISO, and other styles
2

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Full text
Abstract:
The speech signal primarily carries the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and it can be used to recognize (identify or verify) a person. This thesis deals with the speaker identification task, i.e., finding the identity of a person from his or her speech among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers, ranging from high-level cues such as the semantics and linguistics of the speech to low-level cues relating to the speaker's vocal tract and voice source characteristics. In modern speaker identification systems, the vocal tract characteristics are generally modeled by cepstral coefficients. Although these coefficients represent vocal tract information well, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers, yet current speaker recognition systems rarely use it, because it cannot be extracted reliably and is not always present in the speech signal. This thesis attempts to utilize this pitch and voicing information for speaker identification. Using a text-independent speaker identification system, it shows that cepstral coefficients perform reasonably well, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not helpful. There are two main reasons why the direct use of pitch fails for speaker recognition. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the unvoiced half.
The problem is then how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). To use pitch information for speaker recognition, these problems must be overcome. What is needed is a method that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch-doubling or pitch-halving estimation errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
Full Text
3

Phythian, Mark. "Speaker identification for forensic applications." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36079/3/__qut.edu.au_Documents_StaffHome_StaffGroupR%24_rogersjm_Desktop_36079_Digitised%20Thesis.pdf.

Full text
Abstract:
A major application of Speaker Identification (SI) is suspect identification by voice. This thesis investigates techniques that can be used to improve SI technology as applied to suspect identification. Speech coding techniques have become integrated into many modern voice communication systems, which prompts the question: how are automatic speaker identification systems and modern forensic identification techniques affected by the introduction of digitally coded speech channels? Presented in this thesis are three separate studies investigating the effects of speech coding and compression on current speaker recognition techniques. A relatively new spectral analysis technique, Higher Order Spectral Analysis (HOSA), has been identified as a potential candidate for improving some aspects of forensic speaker identification tasks. Also presented is a study investigating the application of HOSA to improve the robustness of current ASR techniques in the presence of additive Gaussian noise. Results from our investigations reveal that incremental improvements in each of these aspects related to automatic and forensic identification are achievable.
4

Gedik, Berk. "Evaluation of Text-Independent and Closed-Set Speaker Identification Systems." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239625.

Full text
Abstract:
Speaker recognition is the task of recognizing the speaker of a given speech recording, and it has wide application areas. In this thesis, various machine learning models, such as the Gaussian Mixture Model (GMM), the k-Nearest Neighbor (k-NN) model, and Support Vector Machines (SVM), together with feature extraction methods such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC), are investigated for the speaker recognition task. Combinations of these models and feature extraction methods are evaluated on many datasets varying in the number of speakers and the training data size, so that the performance of the methods in different settings can be analyzed. The results show that the GMM and k-NN methods provide good accuracy and that LPCC performs better than MFCC. The effect of audio recording duration, training data duration, and number of speakers on prediction accuracy is also analyzed.
Speaker recognition refers to techniques that aim to identify a speaker from a recording of his or her voice; these techniques have a wide range of applications. In this thesis, a number of machine learning models are applied to the speaker recognition task: the Gaussian Mixture Model (GMM), k-Nearest Neighbour (k-NN), and the Support Vector Machine (SVM). Different techniques for deriving modelling features are tested, such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC), and their suitability for speaker recognition is examined. Combinations of the above models and techniques are evaluated on many datasets that differ in the number of speakers and the amount of data, so the methods are evaluated and analysed under different conditions. Among the results, both GMM and k-NN give high accuracy, while LPCC gives higher accuracy than MFCC. The effect of the recording length of the individual voices, the total length of the training data, and the number of speakers on the different models is also analysed and presented.
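One of the model-decision pairings evaluated in this thesis, closed-set identification with k-NN, can be sketched as frame-wise nearest-neighbour voting. This is a toy numpy illustration on synthetic two-dimensional "feature frames", not the thesis code; real systems would use MFCC or LPCC frames.

```python
import numpy as np

def knn_identify(train_feats, test_frames):
    """Closed-set speaker ID by frame-wise 1-nearest-neighbour voting:
    every test frame votes for the speaker owning its closest training
    frame; the most-voted speaker is returned."""
    names = list(train_feats)
    bank = np.vstack([train_feats[s] for s in names])
    owner = np.concatenate(
        [np.full(len(train_feats[s]), i) for i, s in enumerate(names)])
    votes = np.zeros(len(names), dtype=int)
    for x in test_frames:
        d = np.sum((bank - x) ** 2, axis=1)   # squared Euclidean distance
        votes[owner[np.argmin(d)]] += 1
    return names[int(np.argmax(votes))]

# Toy check: two well-separated "speakers" in a 2-D feature space.
rng = np.random.default_rng(1)
train = {"alice": rng.normal(0.0, 0.5, (50, 2)),
         "bob":   rng.normal(5.0, 0.5, (50, 2))}
test = rng.normal(5.0, 0.5, (20, 2))          # frames drawn near "bob"
```

Voting over many frames is what makes the decision depend on the utterance length, one of the factors whose effect the thesis measures.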
5

Cohen, Zachary Gideon. "Noise Reduction with Microphone Arrays for Speaker Identification." DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/884.

Full text
Abstract:
The presence of acoustic noise in audio recordings is an ongoing issue that plagues many applications. This ambient background noise is difficult to reduce due to its unpredictable nature. Many single-channel noise reduction techniques exist but are limited in that they may distort the desired speech signal, because the spectral content of the speech and the noise overlap. It is therefore of interest to investigate multichannel noise reduction algorithms that further attenuate noise while attempting to preserve the speech signal of interest. Specifically, this thesis investigates the use of microphone arrays in conjunction with multichannel noise reduction algorithms to aid in speaker identification. Recording a speaker in the presence of acoustic background noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where it is impossible to control the noise environment in which the speech sample is taken, noise reduction algorithms must be developed and applied to clean the speech signal in order to give speaker identification software a chance at a positive identification. Due to the limitations of single-channel techniques, it is of interest to see whether the spatial information provided by microphone arrays can be exploited to aid speaker identification. This thesis explores several time-domain multichannel noise reduction techniques, including delay-sum beamforming, multichannel Wiener filtering, and spatial-temporal prediction filtering. Each algorithm is prototyped, and filter performance is evaluated using various simulations and experiments. A three-dimensional noise model is developed to simulate and compare the performance of the above methods, and experimental results of three data collections are presented and analyzed. The algorithms are compared and recommendations are given for the use of each technique.
Finally, ideas for future work are discussed to improve performance and implementation of these multichannel algorithms. Possible applications for this technology include audio surveillance, identity verification, video chatting, conference calling and sound source localization.
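The simplest of the techniques named above, delay-sum beamforming, can be sketched in a few lines. This is a minimal illustration assuming known integer-sample steering delays and a circularly shifted toy source, not the thesis implementation.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Delay-and-sum beamformer: advance each microphone channel by its
    known (integer-sample) steering delay so the desired source lines up,
    then average. The source adds coherently; uncorrelated noise does not."""
    aligned = np.stack([np.roll(sig, -d) for sig, d in zip(signals, delays)])
    return aligned.mean(axis=0)

# Toy check: one periodic source, four mics, per-mic delay plus noise.
rng = np.random.default_rng(0)
n = 4000
source = np.sin(2 * np.pi * np.arange(n) / 50.0)   # period divides n exactly
delays = [0, 3, 5, 7]
mics = np.stack([np.roll(source, d) + rng.normal(0, 1.0, n) for d in delays])
enhanced = delay_and_sum(mics, delays)
```

With M microphones and independent noise, averaging cuts the noise power by roughly a factor of M while leaving the aligned speech untouched, which is exactly the gain a downstream speaker identifier benefits from.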
6

Chan, Siu Man. "Improved speaker verification with discrimination power weighting /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20CHANS.

Full text
Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 86-93). Also available in electronic version. Access restricted to campus users.
7

Al-Kaltakchi, Musab Tahseen Salahaldeen. "Robust text independent closed set speaker identification systems and their evaluation." Thesis, University of Newcastle upon Tyne, 2018. http://hdl.handle.net/10443/3978.

Full text
Abstract:
This thesis focuses upon text-independent closed-set speaker identification. The contributions relate to evaluation studies in the presence of various types of noise and handset effects. Extensive evaluations are performed on four databases. The first contribution is in the context of the use of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) with original speech recordings from only the TIMIT database. Four main simulations for Speaker Identification Accuracy (SIA) are presented, including different fusion strategies: late fusion (score based), early fusion (feature based), early-late fusion (a combination of feature and score based), late fusion using concatenated static and dynamic features (features with temporal derivatives such as the first-order delta and second-order delta-delta, namely acceleration, features), and finally fusion of statistically independent normalized scores. The second contribution is again based on the GMM-UBM approach. Comprehensive evaluations of the effect of Additive White Gaussian Noise (AWGN) and Non-Stationary Noise (NSN) (with and without a G.712 type handset) upon identification performance are undertaken. In particular, three NSN types with varying Signal to Noise Ratios (SNRs) were tested, corresponding to street traffic, a bus interior, and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were TIMIT, SITW, and NIST 2008; 120 speakers were selected from each database to yield 3,600 speech utterances. The third contribution is based on the use of the I-vector; four combinations of I-vectors with 100 and 200 dimensions were employed. Then, various fusion techniques using maximum, mean, weighted sum, and cumulative fusion with the same I-vector dimension were used to improve the SIA.
Similarly, both interleaving and concatenated I-vector fusion were exploited to produce 200 and 400 I-vector dimensions. The system was evaluated on four different databases using 120 speakers from each. The TIMIT, SITW, and NIST 2008 databases were evaluated for various types of NSN, namely street-traffic NSN, bus-interior NSN, and crowd-talking NSN; the G.712 type handset at 16 kHz was also applied. As recommendations from the study in terms of the GMM-UBM approach, mean fusion is found to yield the overall best performance in terms of SIA with noisy speech, whereas linear weighted sum fusion is overall best for original database recordings. In the I-vector approach, however, the best SIA was obtained from weighted sum and concatenated fusion.
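The late (score-level) fusion rules this abstract compares, mean, maximum, and linear weighted sum, amount to a few lines of array arithmetic. A minimal sketch, assuming per-subsystem score vectors already normalized to a common range:

```python
import numpy as np

def fuse_scores(score_vecs, method="mean", weights=None):
    """Late (score-level) fusion for closed-set speaker ID. score_vecs is a
    list of per-subsystem score vectors over the same enrolled speakers;
    returns the index of the winning speaker after fusion."""
    S = np.vstack(score_vecs)
    if method == "mean":
        fused = S.mean(axis=0)
    elif method == "max":
        fused = S.max(axis=0)
    elif method == "weighted_sum":
        fused = (np.asarray(weights)[:, None] * S).sum(axis=0)
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return int(np.argmax(fused))

# Two subsystems disagree; different fusion rules can pick different speakers.
sys_a = np.array([0.9, 0.1, 0.2])
sys_b = np.array([0.1, 0.8, 0.3])
```

Which rule wins depends on the score statistics, which is why the study recommends mean fusion for noisy speech but weighted-sum fusion for clean recordings.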
8

Reynolds, Douglas A. "A Gaussian mixture modeling approach to text-independent speaker identification." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/16903.

Full text
9

Leis, John W. "Spectral coding methods for speech compression and speaker identification." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36062/7/36062_Digitised_Thesis.pdf.

Full text
Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation, and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification.
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications, where the speech has been highly compressed, or a database of speech material that has been assembled and stored in compressed form. Although these two application areas have the same objective, maximizing the identification rate, the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model based on the use of compressed speech are put forward.
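The product-code structure that distinguishes PCVQ from plain VQ can be sketched as follows: the vector is split into sub-vectors, each quantized against its own small codebook, so the encoder searches several small codebooks instead of one huge joint one. This is a generic toy illustration of the product-code idea, not the thesis's codebook design.

```python
import numpy as np

def pcvq_encode(vec, codebooks):
    """Product-code VQ encode: quantize each sub-vector against its own
    codebook and return the list of per-part indices. Search cost is the
    sum of the small codebook sizes, not their product."""
    indices, pos = [], 0
    for cb in codebooks:                 # cb shape: (n_codewords, sub_dim)
        part = vec[pos:pos + cb.shape[1]]
        indices.append(int(np.argmin(np.sum((cb - part) ** 2, axis=1))))
        pos += cb.shape[1]
    return indices

def pcvq_decode(indices, codebooks):
    """Reconstruct by concatenating the chosen codewords."""
    return np.concatenate([cb[i] for cb, i in zip(codebooks, indices)])

# Toy 4-D example split into two 2-D parts with 4 codewords each.
rng = np.random.default_rng(0)
codebooks = [rng.normal(size=(4, 2)), rng.normal(size=(4, 2))]
target = np.concatenate([codebooks[0][2], codebooks[1][1]])
```

The per-part index streams are also what the thesis's statistical model compresses losslessly to push the rate below 24 bits per frame.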
10

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition." Thesis, Queensland University of Technology, 2001.

Find full text
11

Castellano, Pierre John. "Speaker recognition modelling with artificial neural networks." Thesis, Queensland University of Technology, 1997.

Find full text
12

Slomka, Stefan. "Multiple classifier structures for automatic speaker recognition under adverse conditions." Thesis, Queensland University of Technology, 1999.

Find full text
13

Barger, Peter James. "Speech processing for forensic applications." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36081/1/36081_Barger_1998.pdf.

Full text
Abstract:
This thesis examines speech processing systems appropriate for use in forensic analysis. The need for automatic speech processing systems for forensic use is justified by the increasing use of electronically recorded speech for communication. An automatic speaker identification and verification system is described which was tested on data gathered by the Queensland Police Force. Speaker identification using Gaussian mixture models (GMMs) is shown to be useful as an indicator of identity, but not sufficiently accurate to be used as the sole means of identification. It is shown that training GMMs on speech of one language and testing on speech of another language introduces significant bias into the results, which is unpredictable in its effects. This has implications for the performance of the system on subjects attempting to disguise their voices. Automatic gender identification systems are shown to be highly accurate, attaining 98% accuracy, even with very simple classifiers, and when tested on speech degraded by coding or reverberation. These gender gates are useful as initial classifiers in a larger speaker classification system and may even find independent use in a forensic environment. A dual microphone method of improving the performance of speaker identification systems in noisy environments is described. The method gives a significant improvement in log-likelihood scores when its output is used as input to a GMM. This implies that speaker identification tests may be improved in accuracy. A method of automatically assessing the quality of transmitted speech segments using a classification scheme is described. By classifying the difference between cepstral parameters describing the original speech and the transmitted speech, an estimate of the speech quality is obtained.
14

Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Full text
Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g., acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular for the tasks of isolated-word speech recognition and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually.
The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
15

Hubeika, Valiantsina. "Intersession Variability Compensation in Language and Speaker Identification." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235432.

Full text
Abstract:
Channel and session variability is a very important problem in the speaker recognition task. A number of techniques for channel compensation have recently been presented in many scientific papers. Channel compensation can be implemented in the model domain as well as in the feature and score domains. A relatively new and powerful technique is so-called eigenchannel adaptation for Gaussian Mixture Models (GMM). The disadvantage of this method is that it cannot be applied to other classifiers, such as Support Vector Machines (SVM), GMMs with a different number of Gaussian components, or speech recognition using hidden Markov models (HMM). A solution is an approximation of this method, eigenchannel adaptation in the feature domain. Both techniques, eigenchannel adaptation in the model domain and in the feature domain, are presented in this work for speaker recognition systems. After achieving good results in speaker recognition, the benefit of these techniques was examined for an acoustic language recognition system covering 14 languages. In this task, not only channel variability but also speaker variability has an undesirable effect. Results are presented on the data defined for the 2006 Speaker Recognition Evaluation and the 2007 Language Recognition Evaluation, both organized by the American National Institute of Standards and Technology (NIST).
16

Al-Ani, Ahmed Karim. "An improved pattern classification system using optimal feature selection, classifier combination, and subspace mapping techniques." Thesis, Queensland University of Technology, 2002.

Find full text
17

Huang, Hung-Pin (黃紘斌). "Robust Speaker Identification Systems against Additive Noise and Reverberation." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/82731413646820206152.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Institute of Communications Engineering
ROC academic year 100 (2011)
Conventional speaker recognition systems usually use MFCCs as features; these are low-level features, and their performance is severely degraded by interference such as additive or convolutional noise. High-level features, by contrast, tend to be less accurate in clean conditions but more robust in noisy environments. In this thesis, we first pass the speech waveform through auditory hearing models, then choose 9 sets of outputs after spectro-temporal modulation filtering as our features and apply them to robust speaker recognition. The results show that in reverberant conditions our features perform better once T60 exceeds 0.6 s, and in noisy conditions we obtain a significant improvement over MFCCs for all SNR conditions, as well as superior performance to ANTCCs at low SNRs.
18

Mohan, Aanchan K. "Combining speech recognition and speaker verification." 2008. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.17528.

Full text
19

Lin, Guan-Liang (林冠良). "An MFCC-based Speaker Identification System." Thesis, 2017. http://ndltd.ncl.edu.tw/handle/fx7824.

Full text
Abstract:
Master's thesis
Tunghai University
Department of Computer Science and Information Engineering
ROC academic year 105 (2016)
Nowadays, speech recognition has many practical applications in daily use, such as Apple's Siri, Google's speech recognition system, and voice-operated mobile phones. Speaker identification, by contrast, is still relatively immature. In this paper, we therefore study a speaker identification technique that starts from the original voice signal of a person, e.g., Bob. The voice signal is converted from the time domain to the frequency domain using the Fourier transform. An MFCC-based human auditory filtering model is then used to adjust the energy levels of different frequencies as the quantified characteristics of Bob's voice. Next, the energies are normalized to a logarithmic scale as the feature of the voice signal. Further, the probability density function of a Gaussian mixture model is employed to represent the distribution of the logarithmic characteristics as Bob's specific acoustic model. When receiving the voice of an unknown person, e.g., x, the system processes it with the same procedure and compares the result, x's acoustic model, against the known speakers' acoustic models collected beforehand in an acoustic-model database to decide who the most probable speaker is.
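The final matching step described above, scoring an unknown utterance against every enrolled acoustic model and taking the best, can be sketched with a single diagonal-covariance Gaussian standing in for the full Gaussian mixture model (an assumed simplification for brevity; the frames here are synthetic, not real MFCCs):

```python
import numpy as np

def fit_model(frames):
    """Train a diagonal-covariance Gaussian on a speaker's feature frames
    (a one-component stand-in for the GMM described in the abstract)."""
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def avg_log_likelihood(frames, model):
    """Average per-frame log-likelihood of the frames under the model."""
    mu, var = model
    ll = -0.5 * (np.log(2.0 * np.pi * var) + (frames - mu) ** 2 / var)
    return float(ll.sum(axis=1).mean())

def identify(test_frames, models):
    """Closed-set decision: return the enrolled speaker whose model gives
    the test frames the highest average log-likelihood."""
    return max(models, key=lambda s: avg_log_likelihood(test_frames, models[s]))

# Toy enrollment: two speakers with distinct synthetic "cepstral" statistics.
rng = np.random.default_rng(2)
models = {"bob": fit_model(rng.normal(0.0, 1.0, (300, 12))),
          "eve": fit_model(rng.normal(2.0, 1.0, (300, 12)))}
```

Replacing `fit_model` with a multi-component mixture changes only the training and scoring functions; the argmax decision rule stays the same.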
20

Lyu, Yi-Chen (呂易宸). "Speech Access System based on Speaker Identification." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/49174049479521734264.

Full text
Abstract:
Master's thesis
National Central University
Institute of Electrical Engineering
ROC academic year 99 (2010)
The purpose of this thesis is to design a speech access system with speaker recognition technology that can determine whether a user's voice is valid. Combined with keyword spotting, the system can identify users by name, and with text-to-speech it responds not only with text but also with a human voice. The system's interface, built with the Microsoft Foundation Classes (MFC), makes it easy for users to operate. Because an access control system must work in real time, the time consumed by the chosen methods has to be taken into account: users will not wait long for results. Methods must therefore be selected carefully, since they affect both the recognition rate and the response time, and time is the prerequisite when choosing the appropriate algorithm. Forty participants joined the test, 38 of them target users and the other two impostors. The speaker recognition rate is 94.9%, the false acceptance rate is 0.8%, and the keyword recognition rate is 90.6%. Recognizing a sentence takes about 0.5 seconds on average, so identification meets the real-time requirement.
APA, Harvard, Vancouver, ISO, and other styles
21

Yu, Ming-Der, and 于明德. "A Design of Text Independent Speaker Identification System for Telephone Speech." Thesis, 1997. http://ndltd.ncl.edu.tw/handle/48072274198405347870.

Full text
Abstract:
Master's
National Sun Yat-sen University
Institute of Electrical Engineering
85
A text-independent speaker identification system for telephone speech, based on long-term spectral feature averaging (LTA) and the Karhunen-Loeve transform (KLT), is proposed. The system uses basis functions derived from the KLT to reduce the data volume effectively while preserving most of the identification information for each speaker. A database of 61 male and 72 female Mandarin speakers, recorded from a telephone answering system, was collected for system evaluation. Using the first 28 of the 128 basis functions, the correct classification rate reaches 88% with our special frame selection criterion, but only 73% without it. At the same time, classification time is reduced to about 3% of that required with all 128 features.
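The KLT reduction described here amounts to projecting each 128-dimensional long-term average spectrum onto the leading eigenvectors of the training covariance. A numpy sketch of that step, with random stand-ins for the LTA spectra (the function names and data are illustrative, not the thesis's code):

```python
import numpy as np

def klt_basis(spectra, k=28):
    """Derive KLT basis functions (principal eigenvectors) from training
    long-term average spectra, one row per utterance."""
    centered = spectra - spectra.mean(axis=0)
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # ascending eigenvalues
    order = np.argsort(eigvals)[::-1]        # descending by variance
    return eigvecs[:, order[:k]]             # shape (128, k)

def project(spectrum, basis):
    """Reduce a 128-dimensional LTA spectrum to k KLT coefficients."""
    return spectrum @ basis

rng = np.random.default_rng(1)
train = rng.normal(size=(200, 128))          # stand-in LTA spectra
basis = klt_basis(train, k=28)
coeffs = project(train[0], basis)
print(coeffs.shape)  # -> (28,)
```

Keeping 28 of 128 coefficients is what yields the roughly 97% cut in classification time the abstract reports: the per-speaker distance computations shrink proportionally to the feature dimension.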
APA, Harvard, Vancouver, ISO, and other styles
22

CHIANG, CHANG YEN, and 蔣昌言. "Text-Independent Speaker Identification System Based on Karhunen-Loeve Transform and Quadratic Classifier." Thesis, 1996. http://ndltd.ncl.edu.tw/handle/19284077261651106741.

Full text
APA, Harvard, Vancouver, ISO, and other styles
23

Tsai, Zong-Syuan, and 蔡宗軒. "Speaker Identification System based on Long Term Average Spectrum and Speech Content Distribution." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/10322757746063947883.

Full text
Abstract:
Master's
National Taiwan University
Graduate Institute of Electrical Engineering
100
Timbre is the key characteristic by which humans distinguish one voice from another. This thesis proposes a timbre feature that considers both the Long Term Average Spectrum (LTAS) and the distribution of speech content, and applies it to a speaker identification system. The LTAS is influenced by both the speaker and the content, so the same speaker can produce inconsistent LTAS patterns, and this inconsistency directly degrades the accuracy of speaker identification based on LTAS. To account for the effect of content, this thesis proposes the pseudo LTAS. All Taiwanese Mandarin phonemes are analyzed, the influential phonemes are selected, and their average spectra are stored as the components of the speaker database. When a test speech signal arrives, the system recognizes its content and synthesizes, for each enrolled speaker, a pseudo LTAS weighted by that content. Because the pseudo LTAS and the test signal then share the same content, speaker identification using the pseudo LTAS as the decision pattern is more accurate than using a plain LTAS that ignores content. The resulting system achieves a speaker identification accuracy of 94.2%.
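The pseudo-LTAS synthesis step can be read as a content-weighted average of each speaker's per-phoneme mean spectra. A toy numpy sketch, with two made-up phonemes and two made-up speakers (all names and values are illustrative assumptions):

```python
import numpy as np

def pseudo_ltas(phoneme_spectra, content_weights):
    """Synthesize a content-matched pseudo LTAS for one speaker as the
    weighted average of that speaker's per-phoneme mean spectra.

    phoneme_spectra: dict phoneme -> mean spectrum (1-D array)
    content_weights: dict phoneme -> share of the test utterance
    """
    total = sum(content_weights.values())
    return sum((w / total) * phoneme_spectra[p]
               for p, w in content_weights.items())

def identify(speakers, test_ltas, content_weights):
    """Pick the speaker whose pseudo LTAS is closest to the test LTAS."""
    def dist(spectra):
        return np.linalg.norm(pseudo_ltas(spectra, content_weights) - test_ltas)
    return min(speakers, key=lambda name: dist(speakers[name]))

speakers = {
    "A": {"a": np.full(4, 1.0), "i": np.full(4, 2.0)},
    "B": {"a": np.full(4, 5.0), "i": np.full(4, 6.0)},
}
weights = {"a": 0.5, "i": 0.5}   # phoneme shares recognized in the test speech
test_ltas = np.full(4, 1.4)      # closest to speaker A's content-matched mix
print(identify(speakers, test_ltas, weights))  # -> A
```

Because each candidate's pseudo LTAS is rebuilt from the test utterance's own phoneme distribution, content mismatch is removed from the comparison, which is the source of the accuracy gain the abstract claims.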
APA, Harvard, Vancouver, ISO, and other styles
24

Sanchez, Jose Boris Meyer-Baese Anke. "Speaker identification based on an integrated system combining cepstral feature extraction and vector quantization." Diss., 2005. http://etd.lib.fsu.edu/theses/available/etd-04172005-151234.

Full text
Abstract:
Thesis (M.S.)--Florida State University, 2005.
Advisor: Dr. Anke Meyer-Baese, Florida State University, College of Engineering, Dept. of Electrical Engineering. Title and description from dissertation home page (viewed June 15, 2005). Document formatted into pages; contains vii, 30 pages. Includes bibliographical references.
APA, Harvard, Vancouver, ISO, and other styles
25

古詩峰. "A Large Population Speaker Identification System Based on Wavelet Transform Features by Using Microphone and Telephone Speech Corpus." Thesis, 2003. http://ndltd.ncl.edu.tw/handle/54547071414963915299.

Full text
Abstract:
Master's
Chang Gung University
Institute of Electrical Engineering
91
In this thesis, we establish a large-population speaker identification system based on Gaussian mixture models with wavelet-based features. The experimental corpora are the well-known phonetically balanced TIMIT database for microphone speech and a newly collected Formosa Speech Corpus for telephone speech (FSC-Tel 1000). We achieve a best speaker identification accuracy of 99.127% on TIMIT and 95.5% on FSC-Tel 1000. For the wavelet-based features, our objective is to find a suitable wavelet decomposition tree with which the speech waveform can be transformed into wavelet coefficients. Our experiments show that the choice of decomposition tree plays an important role in the accuracy of a speaker identification system; we found several efficient types of decomposition tree, although the optimal trees have not yet been found. We also implemented a speaker identification system on Windows 2000/XP in which a user registers by speaking only about 25 seconds of speech. In real-time testing, each speaker speaks for only 2 seconds and the system identifies him with high accuracy. Feature extraction consumes most of the time, so a fast extraction algorithm is needed.
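One common form of wavelet-based feature is the log energy of each subband of a wavelet packet tree. A minimal numpy sketch of that idea, using a full three-level Haar decomposition as a stand-in for the decomposition trees the thesis searches over (the wavelet, depth, and frame length are assumptions for the demo):

```python
import numpy as np

def haar_split(x):
    """One level of Haar analysis: approximation and detail subbands."""
    x = x[: len(x) // 2 * 2]
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def wavelet_packet_energies(signal, levels=3):
    """Fully decompose `signal` into 2**levels subbands with a Haar wavelet
    packet tree and return the log energy of each subband as a feature."""
    bands = [np.asarray(signal, dtype=float)]
    for _ in range(levels):
        bands = [part for band in bands for part in haar_split(band)]
    return np.array([np.log(np.sum(b ** 2) + 1e-12) for b in bands])

rng = np.random.default_rng(2)
frame = rng.normal(size=1024)    # one 1024-sample speech frame
feat = wavelet_packet_energies(frame, levels=3)
print(feat.shape)  # -> (8,)
```

Pruning this full tree, i.e., splitting some subbands deeper than others, is exactly the degree of freedom the thesis explores when comparing decomposition trees; the feature vector per frame stays small, which matters given that feature extraction dominates the system's runtime.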
APA, Harvard, Vancouver, ISO, and other styles

To the bibliography