Academic literature on the topic 'Speaker recognition systems'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles

Select a source type:

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speaker recognition systems.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Speaker recognition systems"

1

Gonzalez-Rodriguez, Joaquin. "Evaluating Automatic Speaker Recognition systems: An overview of the NIST Speaker Recognition Evaluations (1996-2014)." Loquens 1, no. 1 (June 30, 2014): e007. http://dx.doi.org/10.3989/loquens.2014.007.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Bouziane, Ayoub, Jamal Kharroubi, and Arsalane Zarghili. "Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models." International Journal of Electrical and Computer Engineering (IJECE) 7, no. 6 (December 1, 2017): 3655. http://dx.doi.org/10.11591/ijece.v7i6.pp3655-3663.

Full text
Abstract:
This paper presents a novel speaker modeling approach for speaker recognition systems. The basic idea of this approach consists of deriving the target speaker model from a personalized background model, composed only of the UBM Gaussian components which are really present in the speech of the target speaker. The motivation behind the derivation of speakers' models from personalized background models is to exploit the observed difference in some acoustic classes between speakers, in order to improve the performance of speaker recognition systems. The proposed approach was evaluated for the speaker verification task using various amounts of training and testing speech data. The experimental results showed that the proposed approach is efficient in terms of both verification performance and computational cost during the testing phase of the system, compared to traditional UBM-based speaker recognition systems.
APA, Harvard, Vancouver, ISO, and other styles
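The component-selection idea in the abstract above can be sketched briefly. The following is a minimal NumPy illustration under the assumption of a diagonal-covariance UBM; the occupancy threshold and the toy data are illustrative, not the authors' implementation.

```python
import numpy as np

def component_posteriors(X, weights, means, variances):
    """Per-frame responsibilities of each diagonal-covariance Gaussian in the UBM."""
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)          # (M,)
    diff = X[:, None, :] - means[None, :, :]                             # (N, M, D)
    log_like = log_norm - 0.5 * (diff ** 2 / variances[None, :, :]).sum(axis=2)
    log_joint = np.log(weights) + log_like                               # (N, M)
    log_joint -= log_joint.max(axis=1, keepdims=True)                    # stabilize exp
    post = np.exp(log_joint)
    return post / post.sum(axis=1, keepdims=True)

def personalized_background_model(X, weights, means, variances, min_occupancy=1.0):
    """Keep only UBM components 'really present' in the target speaker's speech:
    those whose soft frame count (occupancy) exceeds a threshold. Weights of the
    surviving components are renormalized to form the personalized background model."""
    occupancy = component_posteriors(X, weights, means, variances).sum(axis=0)
    keep = occupancy > min_occupancy
    w = weights[keep] / weights[keep].sum()
    return keep, w, means[keep], variances[keep]

# Toy 2-component, 1-D "UBM"; the speaker's frames sit near component 0 only.
weights = np.array([0.5, 0.5])
means = np.array([[0.0], [10.0]])
variances = np.ones((2, 1))
rng = np.random.default_rng(0)
X = 0.1 * rng.standard_normal((50, 1))
keep, w, m, v = personalized_background_model(X, weights, means, variances)
```

On this toy data, only the first component survives and its renormalized weight becomes 1, which is the intended effect: the background model shrinks to the acoustic classes the speaker actually produces.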
3

Singh, Satyanand. "Forensic and Automatic Speaker Recognition System." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (October 1, 2018): 2804. http://dx.doi.org/10.11591/ijece.v8i5.pp2804-2811.

Full text
Abstract:
Automatic Speaker Recognition (ASR) systems have emerged as an important means of confirming identity in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in forensic speaker recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. Algorithm-based systems have been developed for forensic speaker recognition by physicists and forensic linguists to reduce the probability of contextual bias or a preconceived reading of a reference model when judging an unknown audio sample against a suspect. Many researchers continue to develop automatic algorithms in signal processing and machine learning to improve performance, so that the automatic system can establish a speaker's identity as effectively as a human listener. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical developments in automatic speaker recognition over the last decade. I focus on many aspects of automatic speaker recognition (ASR) systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.
APA, Harvard, Vancouver, ISO, and other styles
4

Singh, Mahesh K., P. Mohana Satya, Vella Satyanarayana, and Sridevi Gamini. "Speaker Recognition Assessment in a Continuous System for Speaker Identification." International Journal of Electrical and Electronics Research 10, no. 4 (December 30, 2022): 862–67. http://dx.doi.org/10.37391/ijeer.100418.

Full text
Abstract:
This research article focuses on recognizing speakers in multi-speaker speech. Several speakers take part in every conference, talk, or discussion, and this kind of speech poses its own problems and processing stages. Challenges include the particular noise conditions of the surroundings, the number of participating speakers, speaker distance, microphone equipment, etc. Beyond addressing these hurdles in real time, there are further problems in processing multi-speaker speech. Identifying speech segments, separating the speaking segments, constructing clusters of similar segments, and finally recognizing the speaker from these segments are the common sequential operations in multi-speaker speech recognition. All linked phases of the speech recognition process are discussed with relevant methodologies in this article, along with the common metrics and methods. The paper examines the speech recognition algorithm at its different stages: the voice recognition system is built through phases such as voice filtering, speaker segmentation, and speaker diarization, and speaker recognition is evaluated with 20 speakers.
APA, Harvard, Vancouver, ISO, and other styles
5

Mridha, Muhammad Firoz, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md Abdul Hamid, Md Rashedul Islam, and Yutaka Watanobe. "U-Vectors: Generating Clusterable Speaker Embedding from Unlabeled Data." Applied Sciences 11, no. 21 (October 27, 2021): 10079. http://dx.doi.org/10.3390/app112110079.

Full text
Abstract:
Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of the speech embeddings, which are primarily pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models greatly depends on the domain adaptation policy, and may degrade if trained using inadequate data. This paper introduces a speaker recognition strategy for unlabeled data, which generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy rests on the assumption that a small speech segment contains a single speaker. Based on this assumption, a pairwise constraint is constructed with noise augmentation policies and used to train an AutoEmbedder architecture that generates speaker embeddings. Without relying on a domain adaptation policy, the process produces clusterable speaker embeddings without supervision, termed unsupervised vectors (u-vectors). The evaluation is conducted on two popular English-language speaker recognition datasets, TIMIT and LibriSpeech. A Bengali dataset is also included to illustrate the diversity of domain shifts for speaker recognition systems. Finally, we conclude that the proposed approach achieves satisfactory performance using pairwise architectures.
APA, Harvard, Vancouver, ISO, and other styles
6

Nematollahi, Mohammad Ali, and S. A. R. Al-Haddad. "Distant Speaker Recognition: An Overview." International Journal of Humanoid Robotics 13, no. 02 (May 25, 2016): 1550032. http://dx.doi.org/10.1142/s0219843615500322.

Full text
Abstract:
A distant speaker recognition (DSR) system assumes that the microphones are far from the speaker's mouth and that their position can vary. Furthermore, various challenges and limitations in terms of coloration, ambient noise, and reverberation can make recognition of the speaker difficult. Although applying speech enhancement techniques can attenuate speech distortion components, it may also remove speaker-specific information and increase the processing time in real-time applications. Many efforts are currently being made to develop DSR into commercially viable systems. In this paper, state-of-the-art techniques in DSR such as robust feature extraction, feature normalization, robust speaker modeling, model compensation, dereverberation, and score normalization are discussed as means of overcoming the speech degradation components, i.e., reverberation and ambient noise. Performance results on DSR show that as the speaker-to-microphone distance increases, recognition rates decrease and the equal error rate (EER) increases. Finally, the paper concludes that applying robust features and robust speaker models that vary less with distance can improve DSR performance.
APA, Harvard, Vancouver, ISO, and other styles
7

Garcia‐Romero, Daniel, and Carol Espy‐Wilson. "Automatic speaker recognition: Advances toward informative systems." Journal of the Acoustical Society of America 128, no. 4 (October 2010): 2394. http://dx.doi.org/10.1121/1.3508584.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Padmanabhan, M., L. R. Bahl, D. Nahamoo, and M. A. Picheny. "Speaker clustering and transformation for speaker adaptation in speech recognition systems." IEEE Transactions on Speech and Audio Processing 6, no. 1 (1998): 71–77. http://dx.doi.org/10.1109/89.650313.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Singh, Satyanand. "Bayesian distance metric learning and its application in automatic speaker recognition systems." International Journal of Electrical and Computer Engineering (IJECE) 9, no. 4 (August 1, 2019): 2960. http://dx.doi.org/10.11591/ijece.v9i4.pp2960-2967.

Full text
Abstract:
This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on Bayesian distance metric learning as a feature extractor. In this modeling, I explored the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker label, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminating variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods and, compared to cosine scoring, is insensitive to normalization. The method is very effective in the case of limited training data. The modified supervised i-vector based ASR system is evaluated on the NIST SRE 2008 database. The best combined cosine-score performance, an EER of 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best Bayes_dml performance, an EER of 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayes_dml outperforms the combined, normalized cosine scoring and represents the best reported result for the short2-short3 condition on the NIST SRE 2008 data.
APA, Harvard, Vancouver, ISO, and other styles
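For context on the cosine-scoring baseline that the abstract above compares against, a minimal sketch (illustrative only; real systems length-normalize and channel-compensate the i-vectors first):

```python
import numpy as np

def cosine_score(w_enroll, w_test):
    """Cosine score between two i-vectors: the dot product of the
    length-normalized vectors, the standard fast scoring back-end."""
    w_enroll = np.asarray(w_enroll, dtype=float)
    w_test = np.asarray(w_test, dtype=float)
    return float(w_enroll @ w_test /
                 (np.linalg.norm(w_enroll) * np.linalg.norm(w_test)))

same = cosine_score([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])   # parallel vectors -> 1.0
different = cosine_score([1.0, 0.0], [0.0, 1.0])        # orthogonal vectors -> 0.0
```

A trial is accepted when the score exceeds a threshold tuned on development data; the EER figures quoted in the abstract correspond to the threshold where false accepts equal false rejects.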
10

Kamiński, Kamil A., and Andrzej P. Dobrowolski. "Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features." Sensors 22, no. 23 (December 1, 2022): 9370. http://dx.doi.org/10.3390/s22239370.

Full text
Abstract:
This article presents an Automatic Speaker Recognition (ASR) System that successfully resolves problems such as identification within an open set of speakers and the verification of speakers in difficult recording conditions similar to telephone transmission conditions. The article provides complete information on the architecture of the various internal processing modules of the ASR System. The speaker recognition system proposed in the article has been compared closely to competing systems, achieving improved speaker identification and verification results on a known, certified voice dataset. The ASR System owes this to the dual use of genetic algorithms, both in the feature selection process and in the optimization of the system's internal parameters. The result was also influenced by the proprietary feature generation and the corresponding classification process using Gaussian mixture models. This allowed the development of a system that makes an important contribution to the current state of the art in speaker recognition for telephone transmission applications with known speech coding standards.
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Speaker recognition systems"

1

Neville, Katrina Lee. "Channel Compensation for Speaker Recognition Systems." RMIT University. Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453.

Full text
Abstract:
This thesis attempts to address the problem of how best to remedy different types of channel distortion on speech when that speech is to be used in automatic speaker recognition and verification systems. Automatic speaker recognition is when a person's voice is analysed by a machine and the person's identity is worked out by comparing speech features to a known set of speech features. Automatic speaker verification is when a person claims an identity and the machine determines whether that claimed identity is correct or the person is an impostor. Channel distortion occurs whenever information is sent electronically through any type of channel, whether a basic wired telephone channel or a wireless channel. The types of distortion that can corrupt the information include time-variant or time-invariant filtering and the addition of 'thermal noise'; either can cause varying degrees of error in the information being received and analysed. The experiments presented in this thesis investigate the effects of channel distortion on average speaker recognition rates and test the effectiveness of various channel compensation algorithms designed to mitigate those effects. The speaker recognition system was represented by a basic recognition algorithm consisting of speech analysis, extraction of feature vectors in the form of Mel-cepstral coefficients, and a classification stage based on the minimum distance rule.
Two types of channel distortion were investigated: convolutional (lowpass filtering) effects, and the addition of white Gaussian noise. Three methods of channel compensation were tested: Cepstral Mean Subtraction (CMS), RelAtive SpecTrAl (RASTA) processing, and the Constant Modulus Algorithm (CMA). The results showed that, for both CMS and RASTA processing, filtering at low cutoff frequencies (3 or 4 kHz) produced improvements in the average speaker recognition rates compared to speech with no compensation, with RASTA processing yielding larger improvements than CMS. Neither the CMS nor the RASTA method was able to improve the accuracy of the speaker recognition system for cutoff frequencies of 5 kHz, 6 kHz, or 7 kHz. In the case of noisy speech, all methods analysed were able to compensate at high SNRs of 40 dB and 30 dB, and only RASTA processing was able to compensate and improve the average recognition rate for speech corrupted with a high level of noise (SNRs of 20 dB and 10 dB).
APA, Harvard, Vancouver, ISO, and other styles
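Of the three compensation methods tested in the thesis above, Cepstral Mean Subtraction is simple enough to sketch in a few lines. A minimal NumPy version (illustrative, not the thesis code; the toy data below is synthetic):

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """CMS: subtract each cepstral coefficient's per-utterance mean.
    A stationary (time-invariant) channel multiplies the spectrum, which
    becomes an additive constant in the cepstral domain; subtracting the
    mean removes that constant."""
    cepstra = np.asarray(cepstra, dtype=float)
    return cepstra - cepstra.mean(axis=0, keepdims=True)

rng = np.random.default_rng(1)
clean = rng.standard_normal((100, 12))      # 100 frames x 12 cepstral coefficients
corrupted = clean + np.full(12, 0.7)        # same utterance through a fixed channel
```

After CMS, the clean and channel-corrupted versions of the utterance become identical, which is why the method helps when enrolment and test channels differ; it cannot, however, undo additive noise, consistent with the thesis findings at low SNR.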
2

Du, Toit Ilze. "Non-acoustic speaker recognition." Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16315.

Full text
Abstract:
Thesis (MScIng)--University of Stellenbosch, 2004.
ENGLISH ABSTRACT: In this study the phoneme labels derived from a phoneme recogniser are used for phonetic speaker recognition. The time-dependencies among phonemes are modelled by using hidden Markov models (HMMs) for the speaker models. Experiments are done using first-order and second-order HMMs, and various smoothing techniques are examined to address the problem of data scarcity. The use of word labels for lexical speaker recognition is also investigated. Single-word frequencies are counted, and the use of various word selections as feature sets is investigated. During April 2004, the University of Stellenbosch, in collaboration with Spescom DataVoice, participated in an international speaker verification competition presented by the National Institute of Standards and Technology (NIST). The University of Stellenbosch submitted phonetic and lexical (non-acoustic) speaker recognition systems and a fused system (the primary system) that fuses the acoustic system of Spescom DataVoice with the non-acoustic systems of the University of Stellenbosch. The results were evaluated by means of a cost model. Based on the cost model, the primary system obtained second and third position in the two categories that were submitted.
APA, Harvard, Vancouver, ISO, and other styles
3

Shou-Chun, Yin 1980. "Speaker adaptation in joint factor analysis based text independent speaker verification." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100735.

Full text
Abstract:
This thesis presents methods for supervised and unsupervised speaker adaptation of Gaussian mixture speaker models in text-independent speaker verification. The proposed methods are based on an approach which is able to separate speaker and channel variability so that progressive updating of speaker models can be performed while minimizing the influence of the channel variability associated with the adaptation recordings. This approach relies on a joint factor analysis model of intrinsic speaker variability and session variability where inter-session variation is assumed to result primarily from the effects of the transmission channel. These adaptation methods have been evaluated under the adaptation paradigm defined under the NIST 2005 speaker recognition evaluation plan which is based on conversational telephone speech.
APA, Harvard, Vancouver, ISO, and other styles
4

Uzuner, Halil. "Robust text-independent speaker recognition over telecommunications systems." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843391/.

Full text
Abstract:
Biometric recognition methods, using human features such as voice, face or fingerprints, are increasingly popular for user authentication. Voice is unique in that it is a non-intrusive biometric which can be transmitted over the existing telecommunication networks, thereby allowing remote authentication. Current speaker recognition systems can provide high recognition rates on clean speech signals. However, their performance has been shown to degrade in real-life applications such as telephone banking, where speech compression and background noise can affect the speech signal. In this work, three important advancements have been introduced to improve speaker recognition performance where it is affected by coder mismatch, by the aliasing distortion caused by Line Spectral Frequency (LSF) parameter extraction, and by background noise. The first advancement focuses on investigating speaker recognition performance in a multi-coder environment using a Speech Coder Detection (SCD) system, which minimises the mismatch between training and testing data and improves speaker recognition performance. Having reduced the speaker recognition error rates for the multi-coder environment, further investigation of the GSM-EFR speech coder is performed to deal with a particular problem related to the LSF parameter extraction method. It has been previously shown that the classic technique for extraction of LSF parameters in speech coders is prone to aliasing distortion. Low-pass filtering of up-sampled LSF vectors has been shown to alleviate this problem, thereby improving speech quality. In this thesis, as a second advancement, the Non-Aliased LSF (NA-LSF) extraction method is introduced in order to reduce the unwanted effects of the GSM-EFR coder on speaker recognition performance. Another important factor that affects the performance of speaker recognition systems is the presence of background noise.
Background noise might severely reduce the performance of the targeted application, such as the quality of the coded speech or the performance of the speaker recognition system. The third advancement was achieved by using a noise canceller to improve speaker recognition performance in mismatched environments with varying background noise conditions. A speaker recognition system with a Minimum Mean Square Error - Log Spectral Amplitudes (MMSE-LSA) noise canceller used as a pre-processor is proposed and investigated to determine the efficiency of noise cancellation on speaker recognition performance, using speech corrupted by different background noise conditions. The effects of noise cancellation on speaker recognition performance using coded noisy speech have also been investigated. Keywords: identification, verification, recognition, Gaussian mixture models, speech coding, noise cancellation.
APA, Harvard, Vancouver, ISO, and other styles
5

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Full text
Abstract:
The speech signal is basically meant to carry information about the linguistic message. But it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person, using his/her speech, from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. The thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are as follows. First, speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e., for unvoiced frames).
The problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable. They classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method that does not use the pitch value directly as a feature and that works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
APA, Harvard, Vancouver, ISO, and other styles
6

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Full text
Abstract:
The speech signal is basically meant to carry information about the linguistic message. But it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person, using his/her speech, from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. The thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are as follows. First, speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e., for unvoiced frames).
The problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable. They classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method that does not use the pitch value directly as a feature and that works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
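The MACV features described in the abstract above lend themselves to a short sketch. This NumPy version assumes 16 kHz audio and a lag search range of 40-320 samples (roughly a 50-400 Hz pitch range); both choices, and the toy frames, are illustrative rather than the thesis configuration.

```python
import numpy as np

def macv(frame, min_lag=40, max_lag=320):
    """Maximum Autocorrelation Value: the peak of the normalized
    autocorrelation over a pitch-plausible lag range. It is defined for
    voiced and unvoiced frames alike, and immune to pitch doubling/halving
    errors because no explicit pitch value is ever estimated."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # One-sided autocorrelation: lags 0 .. len(frame)-1
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] == 0.0:
        return 0.0                          # silent frame
    ac = ac / ac[0]                         # normalize so ac[0] == 1
    return float(ac[min_lag:max_lag + 1].max())

fs = 16000
t = np.arange(1024) / fs
voiced = np.sin(2 * np.pi * 150 * t)        # strongly periodic frame
rng = np.random.default_rng(0)
unvoiced = rng.standard_normal(1024)        # aperiodic, noise-like frame
```

A periodic frame yields a MACV near 1 while a noise-like frame yields a small value, so the feature separates voiced from unvoiced material without ever committing to a pitch estimate, which is the property the thesis exploits.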
7

Adami, André Gustavo. "Modeling prosodic differences for speaker and language recognition /." Full text open access at:, 2004. http://content.ohsu.edu/u?/etd,19.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Yu, K. P. "Text dependency and adaptation in training speaker recognition systems." Thesis, Swansea University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.636721.

Full text
Abstract:
This thesis investigates speaker-specific models trained on sets containing different numbers of repetitions per text, focusing mainly on models trained with only a few (fewer than 3) repetitions. The work aims to assess the abilities of a speaker model as the amount of training data increases while keeping the length of the test utterances fixed. This theme is chosen because small data sets are problematic for the training of models for speech and speaker recognition; small training sets regularly occur when training speaker-specific models, as it is often difficult to collect a large amount of speaker-specific data. In the first part of this work, three speaker recognition approaches, namely vector quantisation (VQ), dynamic time warping (DTW), and continuous density hidden Markov models (CDHMMs), are assessed. These experiments use increasing training set sizes, containing from 1 to 10 repetitions of each text, to train each speaker model. The intent is to show which approach is most appropriate across the range of available training set sizes, for text-dependent and text-independent speaker recognition. This part concludes by suggesting that the text-dependent DTW approach is the best of the chosen configurations. The second part of the work concerns adaptation using text-dependent CDHMMs. A new approach to adaptation called cumulative likelihood estimation (CLE) is introduced and compared with the maximum a posteriori (MAP) approach and other benchmark results. The framework is chosen such that only single repetitions of each utterance are available for enrolment and subsequent adaptation of the speaker model. The objective is to assess whether creating speaker models through an adaptation approach is a viable alternative to creating speaker models from stored speaker-specific speech.
It is concluded that both MAP and CLE are viable alternatives, and that CLE in particular can create a model, by adapting on single repetitions of data, that achieves performance as good as or better than that of an equivalent model, such as DTW, trained using an equivalent amount of stored data.
APA, Harvard, Vancouver, ISO, and other styles
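As context for the DTW approach favoured in the abstract above, a minimal dynamic time warping distance between two feature sequences can be sketched as follows. This is an illustrative sketch, not code from the thesis; the function names `dtw_distance` and `euclidean` are our own, and frames are plain lists of floats standing in for, e.g., cepstral vectors.

```python
import math

def euclidean(a, b):
    """Frame-level distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def dtw_distance(ref, test):
    """Accumulated DTW distance aligning a test utterance to a
    reference template, using the standard local path constraints
    (insertion, deletion, match)."""
    n, m = len(ref), len(test)
    INF = float("inf")
    # D[i][j] = minimal accumulated cost aligning ref[:i] with test[:j]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(ref[i - 1], test[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # insertion
                                 D[i][j - 1],      # deletion
                                 D[i - 1][j - 1])  # match
    return D[n][m] / (n + m)  # length-normalised score
```

In template-based text-dependent verification of this kind, a claimed speaker is typically accepted when the score against that speaker's stored template falls below a tuned threshold.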
9

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition." Thesis, Queensland University of Technology, 2001.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Mathan, Luc Stefan. "Speaker-independent access to a large lexicon." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63773.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Speaker recognition systems"

1

Beigi, Homayoon. Fundamentals of speaker recognition. New York: Springer, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Müller, Christian, ed. Speaker classification. Berlin: Springer, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Meisel, William S. The telephony voice user interface: Applications of speech recognition, text-to-speech, and speaker verification over the telephone. Tarzana, CA: TMA Associates, 1998.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
4

Sabourin, Conrad. Computational speech processing: Speech analysis, recognition, understanding, compression, transmission, coding, synthesis, text to speech systems, speech to tactile displays, speaker identification, prosody processing : bibliography. Montréal: Infolingua, 1994.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
5

Russell, M. J. The development of the speaker independent ARM continuous speech recognition system. [London]: Controller, H.M.S.O., 1992.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
6

Gerl, Franz, Wolfgang Minker, and SpringerLink (Online service), eds. Self-Learning Speaker Identification: A System for Enhanced Speech Recognition. Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg, 2011.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
7

Beigi, Homayoon. Fundamentals of Speaker Recognition. Springer, 2016.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
8

Gallardo, Laura Fernández. Human and Automatic Speaker Recognition over Telecommunication Channels. Springer, 2015.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
9

Speaker Classification. Springer, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
10

Müller, Christian. Speaker Classification I: Fundamentals, Features, and Methods. Springer London, Limited, 2007.

Find full text
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Speaker recognition systems"

1

Ghate, P. M., Shraddha Chadha, Aparna Sundar, and Ankita Kambale. "Automatic Speaker Recognition System." In Advances in Intelligent Systems and Computing, 1037–44. New Delhi: Springer India, 2013. http://dx.doi.org/10.1007/978-81-322-0740-5_126.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Katrak, Kayan K., Kanishk Singh, Aayush Shah, Rohit Menon, and V. R. Badri Prasad. "Transformers for Speaker Recognition." In Machine Learning and Autonomous Systems, 49–62. Singapore: Springer Singapore, 2022. http://dx.doi.org/10.1007/978-981-16-7996-4_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Glasser, Avery. "Designing Better Speaker Verification Systems: Bridging the Gap between Creators and Implementers of Investigatory Voice Biometric Technologies." In Forensic Speaker Recognition, 511–27. New York, NY: Springer New York, 2011. http://dx.doi.org/10.1007/978-1-4614-0263-3_18.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Martin, Alvin, Mark Przybocki, and Joseph P. Campbell. "The NIST speaker recognition evaluation program." In Biometric Systems, 241–62. London: Springer London, 2005. http://dx.doi.org/10.1007/1-84628-064-8_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Koolagudi, Shashidhar G., Kritika Sharma, and K. Sreenivasa Rao. "Speaker Recognition in Emotional Environment." In Eco-friendly Computing and Communication Systems, 117–24. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-32112-2_15.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Returi, Kanaka Durga, Y. Radhika, and Vaka Murali Mohan. "A Simple Method for Speaker Recognition and Speaker Verification." In Advances in Intelligent Systems and Computing, 663–72. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-5400-1_64.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Hemakumar, G., and P. Punitha. "Large Vocabulary Speech Recognition: Speaker Dependent and Speaker Independent." In Advances in Intelligent Systems and Computing, 73–80. New Delhi: Springer India, 2015. http://dx.doi.org/10.1007/978-81-322-2250-7_8.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Shulipa, Andrey, Sergey Novoselov, and Yuri Matveev. "Scores Calibration in Speaker Recognition Systems." In Speech and Computer, 596–603. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-43958-7_72.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Hatem, Ahmed Samit, Muthanna J. Adulredhi, Ali M. Abdulrahman, and Mohammed A. Fadhel. "Human Speaker Recognition Based Database Method." In Advances in Intelligent Systems and Computing, 1145–54. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-71187-0_106.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Rajendran, Sindhu, Meghamadhuri Vakil, Praveen Kumar Gupta, Lingayya Hiremath, S. Narendra Kumar, and Ajeet Kumar Srivastava. "An Overview of the Concept of Speaker Recognition." In Intelligent Systems, 107–24. Apple Academic Press, 2019. http://dx.doi.org/10.1201/9780429265020-6.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Speaker recognition systems"

1

Kohler, M. A., W. D. Andrews, J. P. Campbell, and J. Hernández-Cordero. "Phonetic speaker recognition." In Conference Record. Thirty-Fifth Asilomar Conference on Signals, Systems and Computers. IEEE, 2001. http://dx.doi.org/10.1109/acssc.2001.987748.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Badji, Aliou, Youssou Dieng, Ibrahima Diop, Papa Alioune Cisse, and Boubacar Diouf. "Automatic Speaker Recognition (ASR)." In ICIST '20: 10th International Conference on Information Systems and Technologies. New York, NY, USA: ACM, 2020. http://dx.doi.org/10.1145/3447568.3448544.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhu, Jian-wei, Shui-fa Sun, Xiao-li Liu, and Bang-jun Lei. "Pitch in Speaker Recognition." In 2009 Ninth International Conference on Hybrid Intelligent Systems (HIS 2009). IEEE, 2009. http://dx.doi.org/10.1109/his.2009.14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Heck, Larry P., and Dominique Genoud. "Combining speaker and speech recognition systems." In 7th International Conference on Spoken Language Processing (ICSLP 2002). ISCA: ISCA, 2002. http://dx.doi.org/10.21437/icslp.2002-415.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Pandey, Bipul, Alok Ranjan, Rajeev Kumar, and Anupam Shukla. "Multilingual speaker recognition using ANFIS." In 2010 2nd International Conference on Signal Processing Systems (ICSPS). IEEE, 2010. http://dx.doi.org/10.1109/icsps.2010.5555759.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

"SPEAKER RECOGNITION USING DECISION FUSION." In International Conference on Bio-inspired Systems and Signal Processing. SciTePress - Science and Technology Publications, 2008. http://dx.doi.org/10.5220/0001065502670272.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Kadyrov, Shirali, Cemil Turan, Altynbek Amirzhanov, and Cemal Ozdemir. "Speaker Recognition from Spectrogram Images." In 2021 IEEE International Conference on Smart Information Systems and Technologies (SIST). IEEE, 2021. http://dx.doi.org/10.1109/sist50301.2021.9465954.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Fei, Wanchun, Liangjun Xu, and Xingxing Lu. "Speaker recognition on nonstationary characteristics." In 2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2010. http://dx.doi.org/10.1109/fskd.2010.5569783.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Selvan, Karthik, Aju Joseph, and K. K. Anish Babu. "Speaker recognition system for security applications." In 2013 IEEE Recent Advances in Intelligent Computational Systems (RAICS). IEEE, 2013. http://dx.doi.org/10.1109/raics.2013.6745441.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Slyh, Raymond, Eric Hansen, and Brian Ore. "The 2005 AFRL/HEC One-Speaker Detection Systems." In 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop. IEEE, 2006. http://dx.doi.org/10.1109/odyssey.2006.248119.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Speaker recognition systems"

1

Slyh, Raymond E., Eric G. Hansen, and Timothy R. Anderson. AFRL/HECP Speaker Recognition Systems for the 2004 NIST Speaker Recognition Evaluation. Fort Belvoir, VA: Defense Technical Information Center, December 2004. http://dx.doi.org/10.21236/ada430750.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Ferrer, Luciana, Mitchell McLaren, Nicolas Scheffer, Yun Lei, Martin Graciarena, and Vikramjit Mitra. A Noise-Robust System for NIST 2012 Speaker Recognition Evaluation. Fort Belvoir, VA: Defense Technical Information Center, August 2013. http://dx.doi.org/10.21236/ada614010.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Remus, Jeremiah. Advanced Subspace Techniques for Modeling Channel and Session Variability in a Speaker Recognition System. Fort Belvoir, VA: Defense Technical Information Center, March 2012. http://dx.doi.org/10.21236/ada557785.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Issues in Data Processing and Relevant Population Selection. OSAC Speaker Recognition Subcommittee, November 2022. http://dx.doi.org/10.29325/osac.tg.0006.

Full text
Abstract:
In Forensic Automatic Speaker Recognition (FASR), forensic examiners typically compare audio recordings of a speaker whose identity is in question with recordings of known speakers to assist investigators and triers of fact in a legal proceeding. The performance of automated speaker recognition (SR) systems used for this purpose depends largely on the characteristics of the speech samples being compared. Examiners must understand the requirements of the specific systems in use as well as the audio characteristics that impact system performance. Mismatch conditions between the known and questioned data samples are of particular importance, but the need for, and impact of, audio pre-processing must also be understood. The data selected for use in a relevant population can also be critical to the performance of the system. This document describes issues that arise in the processing of case data and in the selection of a relevant population for purposes of conducting an examination using a human-supervised automatic speaker recognition approach in a forensic context. The document is intended to comply with the Organization of Scientific Area Committees (OSAC) for Forensic Science requirements for Technical Guidance Documents.
APA, Harvard, Vancouver, ISO, and other styles
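The abstract's point that the relevant population is critical to system performance can be illustrated with cohort-based score normalisation (z-norm), a standard technique in automatic speaker recognition: a raw comparison score is rescaled against the distribution of scores obtained from a relevant population. This sketch is not taken from the OSAC document; the function name `znorm` is our own.

```python
from statistics import mean, stdev

def znorm(raw_score, cohort_scores):
    """Normalise a questioned-vs-known comparison score against the
    score distribution obtained by scoring the questioned sample
    against a relevant population (cohort), so that scores become
    comparable across recording conditions."""
    return (raw_score - mean(cohort_scores)) / stdev(cohort_scores)
```

With a poorly chosen cohort, the mean and spread of the reference distribution misrepresent the conditions of the case data, which is one concrete way relevant-population selection affects system output.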