Academic literature on the topic 'Speaker recognition'

Author: Grafiati

Published: 4 June 2021

Last updated: 7 July 2024

Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Speaker recognition.'

Journal articles on the topic "Speaker recognition"

Sun, Linhui, Yunyi Bu, Bo Zou, Sheng Fu, and Pingan Li. "Speaker Recognition Based on Fusion of a Deep and Shallow Recombination Gaussian Supervector." Electronics 10, no. 1 (December 25, 2020): 20. http://dx.doi.org/10.3390/electronics10010020.

Abstract:

Extracting a speaker’s personalized feature parameters is vital for speaker recognition, and a single kind of feature cannot fully reflect the speaker’s individual characteristics. To represent the speaker’s identity more comprehensively and improve the speaker recognition rate, we propose a speaker recognition method based on the fusion feature of a deep and shallow recombination Gaussian supervector. In this method, deep bottleneck features are first extracted by a Deep Neural Network (DNN) and used as the input of a Gaussian Mixture Model (GMM) to obtain the deep Gaussian supervector. In parallel, Mel-Frequency Cepstral Coefficients (MFCCs) are input directly to a GMM to extract the traditional Gaussian supervector. Finally, the two categories of features are combined by horizontal dimension augmentation. In addition, to prevent the recognition rate from falling sharply as the number of speakers to be recognized increases, we introduce an optimization algorithm to find the optimal weights before feature fusion. The experimental results indicate that the speaker recognition rate based on the directly fused feature can reach 98.75%, which is 5% and 0.62% higher than the traditional feature and the deep bottleneck feature, respectively. When the number of speakers increases, the fusion feature based on optimized weight coefficients improves the recognition rate by 0.81%. This validates that our proposed fusion method can effectively exploit the complementarity of different types of features and improve the speaker recognition rate.
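The supervector pipeline described in this abstract can be illustrated with a minimal pure-Python sketch. It assumes a diagonal-covariance GMM (a universal background model) whose means are MAP-adapted to one utterance with a relevance factor of 16, a common choice in GMM-supervector work but not stated in the abstract, and it fuses two supervectors by weighted concatenation with a single hypothetical weight `w`. The function names are illustrative; this is not the authors' implementation, and neither the DNN bottleneck extractor nor the weight-optimization algorithm is reproduced.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian evaluated at x."""
    return -0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def map_supervector(frames, weights, means, variances, relevance=16.0):
    """MAP-adapt the GMM means to `frames` and stack them into one supervector."""
    K, D = len(means), len(means[0])
    n = [0.0] * K                      # zeroth-order statistics per component
    f = [[0.0] * D for _ in range(K)]  # first-order statistics per component
    for x in frames:
        logp = [math.log(w) + log_gauss_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logp)  # subtract the max for numerical stability
        post = [math.exp(lp - top) for lp in logp]
        total = sum(post)
        for k in range(K):
            gamma = post[k] / total    # responsibility of component k for frame x
            n[k] += gamma
            for d in range(D):
                f[k][d] += gamma * x[d]
    supervector = []
    for k in range(K):
        alpha = n[k] / (n[k] + relevance)  # data-dependent adaptation coefficient
        for d in range(D):
            data_mean = f[k][d] / n[k] if n[k] > 0 else means[k][d]
            supervector.append(alpha * data_mean + (1 - alpha) * means[k][d])
    return supervector


def fuse(deep_sv, shallow_sv, w=0.5):
    """Hypothetical weighted horizontal concatenation of two supervectors."""
    return [w * v for v in deep_sv] + [(1 - w) * v for v in shallow_sv]
```

In this sketch, components that see many frames move toward the data mean, while components that see almost none keep their background means; the fused vector simply has the combined dimensionality of both supervectors.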

Singh, Satyanand. "Forensic and Automatic Speaker Recognition System." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (October 1, 2018): 2804. http://dx.doi.org/10.11591/ijece.v8i5.pp2804-2811.

Abstract:

Automatic Speaker Recognition (ASR) systems have emerged as an important means of identity confirmation in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. Algorithm-based systems for forensic speaker recognition have been developed by physical scientists and forensic linguists to reduce the likelihood of contextual bias or preconceived judgments when comparing an unknown audio sample with a reference model of a suspect. Many researchers continue to develop automatic algorithms in signal processing and machine learning to improve how effectively a speaker’s identity can be established, with the goal of automatic systems performing on par with human listeners. In this paper, I review the literature on speaker identification by machines and humans, emphasizing the key technical trends in automatic speaker recognition over the last decade. I focus on many aspects of automatic speaker recognition (ASR) systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.

Markowitz, Judith A. "Speaker recognition." Information Security Technical Report 3, no. 1 (January 1998): 14–20. http://dx.doi.org/10.1016/s1363-4127(98)80014-9.

Markowitz, Judith A. "Speaker recognition." Information Security Technical Report 4 (January 1999): 28. http://dx.doi.org/10.1016/s1363-4127(99)80053-3.

Furui, Sadaoki. "Speaker recognition." Scholarpedia 3, no. 4 (2008): 3715. http://dx.doi.org/10.4249/scholarpedia.3715.

O'Shaughnessy, D. "Speaker recognition." IEEE ASSP Magazine 3, no. 4 (October 1986): 4–17. http://dx.doi.org/10.1109/massp.1986.1165388.

Singh, Mahesh K., P. Mohana Satya, Vella Satyanarayana, and Sridevi Gamini. "Speaker Recognition Assessment in a Continuous System for Speaker Identification." International Journal of Electrical and Electronics Research 10, no. 4 (December 30, 2022): 862–67. http://dx.doi.org/10.37391/ijeer.100418.

Abstract:

This research article focuses on recognizing speakers in multi-speaker speech. Conferences, talks, and discussions all involve several speakers, and such speech poses various problems and requires several stages of processing. Challenges include noise in the surroundings, the involvement of multiple speakers, speaker distance, and the microphone equipment. Besides addressing these hurdles in real time, the processing of multi-speaker speech raises its own problems. Identifying speech segments, separating the speaker segments, clustering similar segments, and finally recognizing the speaker from these segments are the common sequential operations in multi-speaker speech recognition. All linked phases of the recognition process are discussed with relevant methodologies, together with the common metrics and methods. The paper examines the speech recognition algorithm at its different stages. Such voice recognition systems are built in several phases: voice filtering, speaker segmentation, speaker diarization, and recognition of the speaker among 20 speakers.
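The first stage listed above, identifying speech segments, can be illustrated with a simple energy-based detector that marks runs of frames whose short-time energy exceeds a fraction of the maximum frame energy. This is a minimal sketch: the frame length of 160 samples (20 ms at 8 kHz), the 10% threshold, and the function name are assumptions chosen for the example, not values from the article, and practical systems usually rely on more robust voice activity detectors.

```python
def speech_segments(samples, frame_len=160, threshold_ratio=0.1):
    """Return (start, end) sample indices of runs of high-energy frames."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(s * s for s in frame) / frame_len)  # mean power
    if not energies or max(energies) == 0.0:
        return []  # empty input or pure silence: no speech detected
    threshold = threshold_ratio * max(energies)
    segments, seg_start = [], None
    for i, e in enumerate(energies):
        if e >= threshold and seg_start is None:
            seg_start = i                      # a speech run begins
        elif e < threshold and seg_start is not None:
            segments.append((seg_start * frame_len, i * frame_len))
            seg_start = None                   # the speech run ends
    if seg_start is not None:                  # run continues to the end
        segments.append((seg_start * frame_len, len(energies) * frame_len))
    return segments
```

The resulting segments would then be passed to the later stages (segmentation by speaker, clustering, and recognition), which this sketch does not cover.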

Gonzalez-Rodriguez, Joaquin. "Evaluating Automatic Speaker Recognition systems: An overview of the NIST Speaker Recognition Evaluations (1996-2014)." Loquens 1, no. 1 (June 30, 2014): e007. http://dx.doi.org/10.3989/loquens.2014.007.

Mannepalli, Kasiprasad, Suman Maloji, Panyam Narahari Sastry, Swetha Danthala, and Durgaprasad Mannepalli. "Text independent emotion recognition for Telugu speech by using prosodic features." International Journal of Engineering & Technology 7, no. 2.7 (March 18, 2018): 594. http://dx.doi.org/10.14419/ijet.v7i2.7.10887.

Abstract:

Human speech conveys different types of information about the speaker and the speech. On the speech production side, the speech signal carries linguistic information, such as the message and the language, as well as information about the speaker’s emotional state, geographical origin, and physiological characteristics. This paper focuses on automatically identifying the emotion of a speaker from a sample of speech. The speech signals considered in this work were collected from Telugu speakers. The features used are pitch, pitch-related prosody, energy, and formants. The overall recognition accuracy obtained in this work is 72%.
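Two of the prosodic features mentioned, pitch and energy, can be estimated frame by frame as in the sketch below. It assumes an autocorrelation pitch estimator that searches lags corresponding to a hypothetical 60 to 400 Hz range; the paper does not say which pitch tracker or frame settings it used, so treat these as illustrative defaults.

```python
def frame_energy(frame):
    """Short-time energy: mean squared amplitude of the frame."""
    return sum(s * s for s in frame) / len(frame)


def autocorr_pitch(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate F0 in Hz as the lag with the largest autocorrelation in range.

    Returns 0.0 when no positive autocorrelation peak is found (e.g. silence).
    """
    min_lag = int(sample_rate / f_max)
    max_lag = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_r = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        r = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return sample_rate / best_lag if best_lag else 0.0
```

For a 200 Hz tone sampled at 8 kHz, the autocorrelation peaks at a lag of 40 samples, giving 8000 / 40 = 200 Hz. Real speech needs voicing decisions and octave-error handling on top of this.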

Lakshmi Prasanna, P. "Attention for the speech of cleft lip and palate in speaker recognition." Open Journal of Pain Medicine 7, no. 1 (December 1, 2023): 7–1. http://dx.doi.org/10.17352/ojpm.000036.

Abstract:

Artificial Intelligence (AI) has become indispensable in many areas, including speaker recognition, voice identification, education, the workplace, and health care. Speaker identification and recognition are accomplished on the basis of a speaker’s voice characteristics. The voice is affected by both intra- and inter-speaker variability. In addition, structural abnormalities can cause resonance problems that seriously affect voice quality. As a result, such speakers may experience difficulties when using AI-based devices. This study investigates the effects of cleft lip and palate (CLP) speech on speaker recognition. The review found that even after surgery, some people with CLP exhibit hypernasality and poor speech intelligibility, depending on the severity of the cleft. The author also found that AI has been applied to surgical procedures. In children with corrected CLP, acoustic analysis revealed poor benchmark performance for speaker identification. The most prevalent type of hypernasality also affects speech intelligibility. Thus, more research on speaker recognition with different algorithms and on hypernasality is essential. Such research can help speakers with CLP use AI freely and without difficulty. Even with the technology’s current flaws, people with CLP can still learn more about using AI.

Dissertations / Theses on the topic "Speaker recognition"

Chatzaras, Anargyros, and Georgios Savvidis. "Seamless speaker recognition." Thesis, KTH, Radio Systems Laboratory (RS Lab), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-159021.

Abstract:

In a technologically advanced society, the average person manages dozens of accounts for e-mail, social networks, e-banking, and other electronic services. As the number of these accounts increases, the need for automatic user identification becomes more essential. Biometrics have long been used to identify people and are the most common (if not the only) method to achieve this task. Over the past few years, smartphones have become frequently used gadgets. These devices have built-in microphones and are commonly used by a single user or a small set of users, such as a couple or a family. This thesis uses a smartphone’s microphone to capture the user’s speech and identify him/her. Existing speaker recognition systems typically prompt the user to provide long voice samples in order to produce accurate results. This results in a poor user experience and discourages users who do not have the patience to go through such a process. The main idea behind the speaker recognition approach presented in this thesis is to provide a seamless user experience in which the recording of the user’s voice takes place in the background. An Android application was developed that silently collects voice samples and performs speaker recognition without requiring extensive user interaction. Two variants of the proposed tool have been developed and are described in depth in this thesis. The open-source framework Recognito is used to perform the speaker recognition task. The analysis of Recognito showed that it is not capable of achieving high accuracy, especially when the voice samples contain background noise. Finally, the comparison between the two architectures showed that they do not differ significantly in terms of performance.
Swedish abstract (translated): In a technologically advanced society, the average person manages dozens of accounts for e-mail, social networks, internet banking, and other electronic services. As the number of accounts increases, the need for automatic identification of the user becomes more essential. Biometrics have long been used to identify people and are the most common (if not the only) method for performing this task. In recent years smartphones have become increasingly common; they give users access to most of their accounts and, to some extent, personalization of the devices based on their social network profiles. These devices have built-in microphones and are often used by a single user or a small group of users, for example a couple or a family. This thesis uses the microphone in a smartphone to record the user’s speech and identify him/her. Existing speaker recognition solutions usually ask the user to provide long voice samples in order to give correct results. This results in a poor user experience and discourages users who do not have the patience to go through such a process. The main idea behind the speaker recognition approach presented in this thesis is to provide a seamless user experience in which the recording of the user’s voice takes place in the background. An Android application has been developed that unobtrusively collects voice samples and performs speaker recognition on them without requiring extensive interaction from the user. Two variants of the tool have been developed, and these are described in detail in this thesis. The open-source framework Recognito is used to perform the speaker recognition. The analysis of Recognito showed that it cannot achieve sufficiently high accuracy, especially when the voice samples contain background noise. Furthermore, the comparison between the two architectures showed that they do not differ significantly in terms of performance.

Vasilakakis, Vasileios. "Forensic speaker recognition: speaker and height estimation techniques." Doctoral thesis, Politecnico di Torino, 2014. http://hdl.handle.net/11583/2551370.

Abstract:

In this work, we analyse some techniques used to perform speaker verification, explaining the steps from feature extraction to the mathematical models used for speaker characterisation and discriminative modelling. The main contributions of the author are a modification of the i-vector generation process, making it either faster or less memory-demanding; a novel way to perform speaker verification using the Pairwise Support Vector Machine; and a new way to perform speaker characterisation by means of Deep Belief Networks. Apart from these contributions, additional work on Automatic Speech-Based Height Estimation is presented, including a baseline model and its subsequent improvement using a Mixture of Experts of neural networks.

Kamarauskas, Juozas. "Speaker recognition by voice." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20090615_093847-20773.

Abstract:

This dissertation investigates speaker recognition by voice. It also considers speaker recognition systems and their evolution, recognition problems, feature systems, and the speaker modeling and matching methods used in text-independent and text-dependent speaker recognition. A text-independent speaker recognition system was developed during this work, using the Gaussian mixture model approach for speaker modeling and pattern matching. An automatic voice activity detection method is proposed; it is fast and does not require any additional actions from the user, such as marking examples of the speech signal and noise. A feature system is also proposed, consisting of excitation source (glottal) parameters and vocal tract parameters: the fundamental frequency serves as the excitation source parameter, and four formants and three antiformants serve as the vocal tract parameters. To equalize the dispersions of the formants and antiformants, we propose expressing them on the mel-frequency scale. Standard mel-frequency cepstral coefficients (MFCCs), which form the baseline in speech and speaker recognition, were also implemented in the system for comparison. Speaker recognition experiments showed that the proposed feature system outperformed standard MFCCs. The equal error rate (EER) was 5.17% using the proposed...
Lithuanian abstract (translated): The dissertation examines questions of speaker recognition by voice. It discusses speaker recognition systems, their development, recognition problems, the variety of feature systems, and the speaker modeling and feature matching methods used in text-independent and text-dependent speaker recognition. During the work, a text-independent speaker recognition system was created. Gaussian mixture models were used to create the speaker models and to compare features. An automatic method for selecting (segmenting) voiced sounds is proposed. This method is fast and does not require any additional actions from the user, such as indicating examples of the speech signal and noise. A feature vector system consisting of excitation signal and vocal tract parameters is proposed. The fundamental frequency of the excitation signal is used as the excitation signal parameter, and four formants and three antiformants are used as vocal tract parameters. To equalize the variances of the lower and higher formants and antiformants, we proposed computing them on the mel scale. For comparison of the results, the standard features used in speech and speaker recognition, mel-frequency cepstral coefficients (MFCC), were implemented in the system. The speaker recognition experiments showed that the proposed feature system gave better recognition results than the standard features (MFCC). The equal error rate obtained using the proposed features...
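The equal error rate reported above is the operating point at which the false rejection rate (FRR) of target trials equals the false acceptance rate (FAR) of non-target trials. A simple way to approximate it is sketched below, assuming that higher scores mean "same speaker" and that sweeping the decision threshold over the observed scores is fine-grained enough; it is an illustrative approximation, not the procedure used in the dissertation.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by sweeping a threshold over all observed scores.

    A trial is accepted when its score is >= the threshold.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(far - frr) < best_gap:  # keep the threshold where FAR ~ FRR
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer
```

With targets [0.9, 0.8, 0.7, 0.4] and non-targets [0.1, 0.2, 0.3, 0.75], a threshold of 0.7 rejects one target and accepts one impostor, so FAR = FRR = 25% and the EER is 0.25. Evaluations usually interpolate between thresholds (for example on a DET curve) rather than taking the nearest observed score.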

Du Toit, Ilze. "Non-acoustic speaker recognition." Thesis, Stellenbosch: University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16315.

Abstract:

Thesis (MScIng)--University of Stellenbosch, 2004.
ENGLISH ABSTRACT: In this study, the phoneme labels derived from a phoneme recogniser are used for phonetic speaker recognition. The time dependencies among phonemes are modelled using hidden Markov models (HMMs) as the speaker models. Experiments are done using first-order and second-order HMMs, and various smoothing techniques are examined to address the problem of data scarcity. The use of word labels for lexical speaker recognition is also investigated: single-word frequencies are counted, and the use of various word selections as feature sets is investigated. During April 2004, the University of Stellenbosch, in collaboration with Spescom DataVoice, participated in an international speaker verification competition presented by the National Institute of Standards and Technology (NIST). The University of Stellenbosch submitted phonetic and lexical (non-acoustic) speaker recognition systems and a fused system (the primary system) that fuses the acoustic system of Spescom DataVoice with the non-acoustic systems of the University of Stellenbosch. The results were evaluated by means of a cost model. Based on the cost model, the primary system achieved second and third place in the two categories that were entered.
AFRIKAANS SUMMARY (translated): This project uses phoneme labels that are classified by a phoneme recogniser and then used for phonetic speaker recognition. The time dependencies between phonemes are modelled using hidden Markov models (HMMs) as speaker models. Experiments are done with first-order and second-order HMMs, and various smoothing techniques are investigated to address data scarcity. The use of word labels for speaker recognition is also investigated. Single-word frequencies are counted, and experiments are done with various word selections as features for speaker recognition. During April 2004, the University of Stellenbosch, in collaboration with Spescom DataVoice, took part in an international speaker verification competition presented by the National Institute of Standards and Technology (NIST). The University of Stellenbosch entered a phonetic and a word-based (non-acoustic) speaker recognition system, as well as a fused system serving as the primary system. The fused system is a combination of Spescom DataVoice’s acoustic system and the two non-acoustic systems of the University of Stellenbosch. The results were evaluated using a cost model. Based on the cost model, the primary system achieved second and third place in the two categories entered.
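Because the phone labels are observed rather than hidden, the core idea of modelling time dependencies between phonemes can be sketched with a first-order Markov chain (a phone bigram model) per speaker, scored against a background model by a log-likelihood ratio. This is a simplification of the HMMs used in the thesis, the add-k smoothing shown is only one of many possible smoothing techniques, and all function names are hypothetical.

```python
import math
from collections import Counter


def train_bigram(phone_seqs, vocab, k=1.0):
    """First-order Markov model over phone labels with add-k smoothing."""
    pair_counts, prev_counts = Counter(), Counter()
    for seq in phone_seqs:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            prev_counts[a] += 1
    V = len(vocab)
    # P(b | a) = (count(a, b) + k) / (count(a) + k * V); each row sums to 1.
    return {(a, b): (pair_counts[(a, b)] + k) / (prev_counts[a] + k * V)
            for a in vocab for b in vocab}


def log_likelihood(model, seq):
    """Sum of log transition probabilities along a phone sequence."""
    return sum(math.log(model[(a, b)]) for a, b in zip(seq, seq[1:]))


def llr_score(speaker_model, background_model, seq):
    """Average per-transition log-likelihood ratio; > 0 favours the speaker."""
    n = max(len(seq) - 1, 1)
    return (log_likelihood(speaker_model, seq)
            - log_likelihood(background_model, seq)) / n
```

Smoothing matters here because a speaker model trained on little data would otherwise assign zero probability to unseen phone transitions, which is the data-scarcity problem the abstract mentions.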

Hong, Z. (Zimeng). "Speaker gender recognition system." Master's thesis, University of Oulu, 2017. http://jultika.oulu.fi/Record/nbnfioulu-201706082645.

Abstract:

Automatic gender recognition through speech is one of the fundamental mechanisms in human-machine interaction. Typical application areas of this technology range from gender-targeted advertising to gender-specific IoT (Internet of Things) applications. It can also be used to narrow down the scope of investigations in crime scenarios. There are many possible methods of recognizing the gender of a speaker. In machine learning applications, the first step is to acquire the natural human voice and convert it into a machine-readable signal. Useful voice features can then be extracted and labelled with gender information so that a machine can be trained on them. After that, new input voice can be captured and processed, and the machine extracts its features and classifies them using the learned pattern model. In this thesis, a real-time speaker gender recognition system was designed in the Matlab environment. The system automatically identifies the gender of a speaker by voice. The implementation uses voice processing and feature extraction techniques to handle input speech from a microphone or a recorded speech file. The resulting features are extracted and classified: a machine learning classifier (Naïve Bayes) distinguishes the gender features, and the recognition result with the gender information is then displayed. The system was evaluated in an experiment with 40 participants (half male and half female) in a fairly small room. The experiment recorded 400 speech samples from speakers from 16 countries in 17 languages. Tested on these 400 samples, the system performed well, with only 29 recognition errors (92.75% accuracy). By comparison, most previous speaker gender recognition systems achieved accuracy of no more than 90%, and only one achieved 100% accuracy, with a very limited number of testers.
We can therefore conclude that the performance of the speaker gender recognition system designed in this thesis is reliable.
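The Naïve Bayes classification step can be sketched with a Gaussian naive Bayes model. The assumption here is that each feature is modelled as an independent Gaussian per class; the thesis does not state which feature set or likelihood model it used, and the single pitch-like feature in the example data is invented for illustration.

```python
import math


def fit_gnb(X, y):
    """Per-class prior, feature means and variances (Gaussian naive Bayes)."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # A tiny floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict_gnb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_post(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return max(model, key=log_post)
```

The "naive" independence assumption lets each feature contribute a separate log-likelihood term that is simply summed. The reported accuracy follows from the error count: (400 - 29) / 400 = 371 / 400 = 92.75%.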

Al-Kilani, Menia. "Voice-signature-based Speaker Recognition." University of the Western Cape, 2017. http://hdl.handle.net/11394/5888.

Abstract:

Magister Scientiae - MSc (Computer Science)
Personal identification and the protection of data are important issues because of the ubiquity of computing, and they have thus become interesting areas of research in computer science. Previously, people used a variety of ways to identify an individual and to protect themselves, their property, and their information, mostly by means of locks, passwords, smartcards, and biometrics. Verifying individuals by their physical or behavioural features is more secure than using other data such as passwords or smartcards, because everyone has unique features that distinguish him or her from others. Furthermore, a person’s biometrics are difficult to imitate or steal. Biometric technologies are a significant component of a comprehensive digital identity solution and play an important role in security. The technologies that support identification and authentication of individuals are based on either their physiological or their behavioural characteristics. Live data, in this instance the human voice, is the topic of this research. The aim is to recognize a person’s voice and to identify the user by verifying that his/her voice matches a record of his/her voice signature in a system’s database. To address the main research question, “What is the best way to identify a person by his/her voice signature?”, design science research was employed. This methodology is used to develop an artefact for solving a problem. Initially, a pilot study was conducted using visual representations of voice signatures, to check whether it is possible to identify speakers without using feature extraction or matching methods. Subsequently, experiments were conducted with 6300 data sets derived from the Texas Instruments and Massachusetts Institute of Technology (TIMIT) audio database.
Two feature extraction methods were considered, mel frequency cepstral coefficients and linear prediction cepstral coefficients, and the Support Vector Machine method was used for classification. The three methods were compared in terms of their effectiveness, and the system using mel frequency cepstral coefficients for feature extraction gave marginally better results for speaker recognition.
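For readers unfamiliar with MFCC extraction, the standard chain (window, power spectrum, mel-spaced triangular filterbank, logarithm, DCT) is sketched below for a single frame. It assumes a Hamming window, 20 filters, 12 coefficients, and the common mel formula 2595 * log10(1 + f / 700). These are conventional defaults rather than parameters from the thesis, and a naive DFT is used only to keep the example dependency-free.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mfcc_frame(frame, sample_rate, n_filters=20, n_ceps=12):
    """MFCCs of one frame: window, power spectrum, mel filterbank, log, DCT-II."""
    N = len(frame)
    # Hamming window reduces spectral leakage at the frame edges.
    windowed = [s * (0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)))
                for n, s in enumerate(frame)]
    n_bins = N // 2 + 1
    power = []
    for k in range(n_bins):  # naive O(N^2) DFT, adequate for a sketch
        re = sum(x * math.cos(2 * math.pi * k * n / N) for n, x in enumerate(windowed))
        im = -sum(x * math.sin(2 * math.pi * k * n / N) for n, x in enumerate(windowed))
        power.append((re * re + im * im) / N)
    # Filter edges equally spaced on the mel scale between 0 Hz and Nyquist.
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bin_hz = [k * sample_rate / N for k in range(n_bins)]
    log_energies = []
    for j in range(1, n_filters + 1):
        lo, mid, hi = edges[j - 1], edges[j], edges[j + 1]
        e = 0.0
        for f, p in zip(bin_hz, power):
            if lo < f <= mid:      # rising slope of the triangle
                e += p * (f - lo) / (mid - lo)
            elif mid < f < hi:     # falling slope of the triangle
                e += p * (hi - f) / (hi - mid)
        log_energies.append(math.log(e + 1e-10))  # floor avoids log(0)
    # DCT-II decorrelates the log filterbank energies; keep the first n_ceps.
    M = n_filters
    return [sum(le * math.cos(math.pi * c * (m + 0.5) / M)
                for m, le in enumerate(log_energies))
            for c in range(n_ceps)]
```

An LPCC front end would replace the filterbank and DCT with linear-prediction analysis followed by the LPC-to-cepstrum recursion. In either case, per-frame vectors like these would be the input to the SVM classifier.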

Oglesby, J. "Neural models for speaker recognition." Thesis, Swansea University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.638359.

Abstract:

In recent years a resurgence of interest in neural modeling has taken place. This thesis examines one such class applied to the task of speaker recognition, with direct comparisons made to a contemporary approach based on vector quantisation (VQ). Speaker recognition systems in general, including feature representations and distance measures, are reviewed. The VQ approach, used for comparisons throughout the experimental work, is described in detail. Currently popular neural architectures are also reviewed and associated gradient-based training procedures examined. The performance of a VQ speaker identification system is determined experimentally for a range of popular speech features, using codebooks of varying sizes. Perceptually-based cepstral features are found to out-perform both standard LPC and filterbank representations. New approaches to speaker recognition based on multilayer perceptrons (MLP) and a variant using radial basis functions (RBF) are proposed and examined. To facilitate the research in terms of computational requirements a novel parallel training algorithm is proposed, which dynamically schedules the computational load amongst the available processors. This is shown to give close to linear speed-up on typical training tasks for up to fifty transputers. A transputer-based processing module with appropriate speech capture and synthesis facilities is also developed. For the identification task the MLP approach is found to give approximately the same performance as equivalent sized VQ codebooks. The MLP approach is slightly better for smaller models, however for larger models the VQ approach gives marginally superior results. MLP and RBF models are investigated for speaker verification. Both techniques significantly out-perform the VQ approach, giving 29.5% (MLP) and 21.5% (RBF) true talker rejections for a fixed 2% imposter acceptance rate, compared to 34.5% for the VQ approach. These figures relate to single digit test utterances. 
Extending the duration of the test utterance is found to significantly improve performance across all techniques. The best overall performance is obtained from RBF models: five digit utterances achieve around 2.5% true talker rejections for a fixed 2% imposter acceptance rate.
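The VQ baseline used for comparison can be sketched as one codebook per speaker, trained with k-means (Lloyd's algorithm), with identification by minimum average quantisation distortion. The thesis may have used a different codebook training procedure (for example, LBG splitting), and the names, iteration count, and seed here are illustrative.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def train_codebook(vectors, size, iters=20, seed=0):
    """Plain k-means (Lloyd's algorithm) codebook for one speaker."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, size)]
    for _ in range(iters):
        clusters = [[] for _ in range(size)]
        for v in vectors:
            nearest = min(range(size), key=lambda j: sq_dist(v, codebook[j]))
            clusters[nearest].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if a cluster empties
                codebook[j] = [sum(c) / len(members) for c in zip(*members)]
    return codebook


def avg_distortion(vectors, codebook):
    """Mean squared distance from each vector to its nearest codeword."""
    return sum(min(sq_dist(v, c) for c in codebook) for v in vectors) / len(vectors)


def identify(test_vectors, codebooks):
    """Pick the speaker whose codebook quantises the test vectors best."""
    return min(codebooks, key=lambda spk: avg_distortion(test_vectors, codebooks[spk]))
```

Larger codebooks model each speaker's feature distribution more finely at the cost of memory and search time, which is the trade-off behind the codebook-size comparisons in the abstract.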

Thompson, J. "Speech variability in speaker recognition." Thesis, Swansea University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.639230.

Abstract:

This thesis is concerned with investigating the effects of variability on the performance of automatic speaker recognition systems. Both speaker-generated variability and variability of the recording environment are examined. Speaker-generated variability (intra-variation) has received less attention than variability of the recording environment and is therefore the main focus of this thesis. Of most concern is the intra-variation of data typically found in co-operative speaker recognition tasks, that is, normally spoken speech collected over a period of months. To assess the scale of recognition errors attributed to intra-variation, errors due to noise degradation are considered first. Additive noise can rapidly degrade recognition performance, so for a more realistic assessment a 'state of the art' noise compensation algorithm is also introduced. Comparisons between noise degradation and intra-variation show intra-variation to be a significant source of recognition errors: at background noise levels of 9 dB SNR or greater, intra-variation is the source of most recognition errors. The level of intra-variation and recognition errors is shown to be highly speaker dependent. Analysis of cepstral variation shows intra-variation to correlate more closely with recognition errors than inter-variation. Recognition experiments and analysis of the glottal pulse shape demonstrate that variation between two recording sessions generally increases as the time gap between the sessions lengthens. Glottal pulse shape also varies within recording sessions, albeit less than between sessions. Others have shown glottal pulse shape to vary for highly stressed speech; it is shown here to vary also for normally spoken speech collected under relatively controlled conditions. It is hypothesized that these variations occur, in part, because of the speaker's anxiety during recording.
Glottal pulse variation is shown to broadly match the hypothesised anxiety profile. The gradual change of glottal pulse variation demonstrates an underlying reason why incremental speaker adaptation can be used for intra-variation compensation. Experiments show that potentially adaptation can reduce speaker identification error rates from 15% to 2.5%.
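To reproduce a noise condition such as the 9 dB SNR mentioned above, a clean signal can be mixed with noise scaled to a target signal-to-noise ratio, where SNR (dB) = 10 * log10(P_signal / P_noise). The sketch below assumes white Gaussian noise and a hypothetical function name; the noise types and compensation algorithm used in the thesis are not reproduced.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, seed=0):
    """Add white Gaussian noise scaled so that 10*log10(Ps/Pn) equals snr_db."""
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Choose the scale so the scaled noise power is p_signal / 10^(snr_db / 10).
    scale = math.sqrt(p_signal / (p_noise * 10 ** (snr_db / 10)))
    return [s + scale * n for s, n in zip(signal, noise)]
```

Scaling by the measured noise power, rather than the nominal unit variance, makes the achieved SNR exact for the generated noise sample instead of only correct on average.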

Mukherjee, Rishiraj. "Speaker Recognition Using Shifted MFCC." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4136.

Abstract:

Speaker recognition is the art of recognizing a speaker from a given database using speech as the only input. In this thesis we discuss a novel approach to detecting speakers. We introduce the concept of shifted MFCC to improve on the performance of previous work, which achieved an accuracy of about 95% at best. We also add further parameters that contributed to improving the efficiency of speaker recognition, and we test our algorithm on both text-dependent and text-independent speech data. Our technique was evaluated on the TIDIGIT database. To further increase the speaker recognition rate at lower FARs, we combined accent information with pitch and higher-order formants. Possible application areas for this work include access control entry systems and devices such as smartphones, laptops, and operating systems. In homeland security applications, speaker accent will also play a critical role in the evaluation of biometric systems, since users will be international. Incorporating accent information into the speaker recognition/verification system is therefore a key focus of our study. The accent incorporation method and shifted MFCC techniques discussed in this work can also be applied to other speaker recognition systems.

Mwangi, Elijah. "Speaker independent isolated word recognition." Thesis, Loughborough University, 1987. https://dspace.lboro.ac.uk/2134/15425.

Abstract:

The work presented in this thesis concerns the recognition of isolated words using a pattern matching approach. In such a system, an unknown speech utterance is transformed into a pattern of characteristic features. These features are then compared with a set of pre-stored reference patterns generated from the vocabulary words, and the unknown word is identified as the vocabulary word whose reference pattern gives the best match. One of the major difficulties in the pattern comparison process is that speech patterns obtained from the same word exhibit non-linear temporal fluctuations and thus a high degree of redundancy. The initial part of this thesis considers various dynamic time warping techniques used for normalizing the temporal differences between speech patterns. Redundancy removal methods are also considered, and their effect on recognition accuracy is assessed. Although dynamic time warping algorithms provide considerable improvement in the accuracy of isolated word recognition schemes, performance is ultimately limited by their poor ability to discriminate between acoustically similar words. Methods for improving the identification rate among acoustically similar words, by using common pattern features for similar-sounding regions, are investigated. Pattern-matching-based, speaker-independent systems can only achieve a high recognition rate by using multiple reference patterns for each vocabulary word, obtained from the utterances of a group of speakers. Using multiple reference patterns, however, leads not only to a large increase in the memory requirements of the recognizer but also to an increase in the computational load.
This thesis proposes a recognition system that overcomes these difficulties by (i) employing vector quantization techniques to reduce the storage of reference patterns, and (ii) eliminating the need for dynamic time warping, which reduces the computational complexity of the system. Finally, a method using fuzzy set theory is proposed for identifying the acoustic structure of an utterance in terms of voiced, unvoiced, and silence segments. This acoustic structure is then used to enhance the recognition accuracy of a conventional isolated word recognizer.
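The starting point of this thesis is template matching with dynamic time warping (DTW). The sketch below illustrates that classic approach. It assumes each utterance is already a list of feature vectors (for example cepstral frames) and uses Euclidean frame distance with the basic symmetric step pattern. The thesis examines several DTW variants, so this is not its specific algorithm, and the helper names are hypothetical.

```python
import math


def frame_distance(a, b):
    """Euclidean distance between two feature vectors (e.g. cepstral frames)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(seq_a, seq_b):
    """Accumulated DTW cost with steps (i-1, j), (i, j-1) and (i-1, j-1)."""
    n, m = len(seq_a), len(seq_b)
    inf = float("inf")
    # cost[i][j]: cheapest alignment of the first i frames of A with the first j of B
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(seq_a[i - 1], seq_b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(unknown, references):
    """Return the vocabulary word whose reference pattern has the lowest DTW cost."""
    return min(references, key=lambda word: dtw_distance(unknown, references[word]))


if __name__ == "__main__":
    # Toy 2-D "feature" sequences; real systems would use e.g. MFCC frames.
    references = {
        "yes": [[0, 0], [1, 1], [2, 2]],
        "no": [[2, 0], [2, 0], [0, 2]],
    }
    # A time-stretched "yes": DTW absorbs the repeated frames at zero extra cost.
    unknown = [[0, 0], [0, 0], [1, 1], [1, 1], [2, 2]]
    print(recognize(unknown, references))  # prints "yes"
```

Practical recognizers usually normalize the accumulated cost by path length and add slope or band constraints; both are omitted here for brevity. The cost of comparing against every stored template is what motivates the vector-quantization alternative described in the abstract.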

Books on the topic "Speaker recognition"

Neustein, Amy, and Hemant A. Patil, eds. Forensic Speaker Recognition. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-0263-3.

Auditory speaker recognition. Hamburg: Buske, 1987.

Beigi, Homayoon. Fundamentals of Speaker Recognition. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-77592-0.

Fundamentals of speaker recognition. New York: Springer, 2011.

Lee, Chin-Hui, Frank K. Soong, and Kuldip K. Paliwal, eds. Automatic Speech and Speaker Recognition. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/978-1-4613-1367-0.

Keshet, Joseph, and Samy Bengio, eds. Automatic Speech and Speaker Recognition. Chichester, UK: John Wiley & Sons, Ltd, 2009. http://dx.doi.org/10.1002/9780470742044.

Zheng, Thomas Fang, and Lantian Li. Robustness-Related Issues in Speaker Recognition. Singapore: Springer Singapore, 2017. http://dx.doi.org/10.1007/978-981-10-3238-7.

Rao, K. Sreenivasa, and Sourjya Sarkar. Robust Speaker Recognition in Noisy Environments. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-07130-5.

Speaker separation and tracking. Konstanz: Hartung-Gorre Verlag, 2006.

Lee, Chin-Hui, Frank K. Soong, and K. K. Paliwal, eds. Automatic Speech and Speaker Recognition: Advanced Topics. Boston: Kluwer Academic Publishers, 1996.

Book chapters on the topic "Speaker recognition"

Zhang, David D. "Speaker Recognition." In Automated Biometrics, 179–201. Boston, MA: Springer US, 2000. http://dx.doi.org/10.1007/978-1-4615-4519-4_9.

Farouk, Mohamed Hesham. "Speaker Recognition." In SpringerBriefs in Electrical and Computer Engineering, 33–35. Cham: Springer International Publishing, 2013. http://dx.doi.org/10.1007/978-3-319-02732-6_8.

Beigi, Homayoon. "Speaker Recognition." In Fundamentals of Speaker Recognition, 543–59. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-77592-0_17.

Furui, Sadaoki. "Speaker Recognition." In Computational Models of Speech Pattern Processing, 132–42. Berlin, Heidelberg: Springer Berlin Heidelberg, 1999. http://dx.doi.org/10.1007/978-3-642-60087-6_14.

Beigi, Homayoon. "Speaker Recognition." In Encyclopedia of Cryptography and Security, 1232–42. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-1-4419-5906-5_747.

Campbell, Joseph P. "Speaker Recognition." In Biometrics, 165–89. Boston, MA: Springer US, 1996. http://dx.doi.org/10.1007/0-306-47044-6_8.

Somogyi, Zoltán. "Speaker Recognition." In The Application of Artificial Intelligence, 185–96. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-60032-7_7.

Beigi, Homayoon. "Speaker Recognition." In Encyclopedia of Cryptography, Security and Privacy, 1–17. Berlin, Heidelberg: Springer Berlin Heidelberg, 2021. http://dx.doi.org/10.1007/978-3-642-27739-9_747-2.

Drygajlo, Andrzej. "From Speaker Recognition to Forensic Speaker Recognition." In Biometric Authentication, 93–104. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-13386-7_8.

Beigi, Homayoon. "Speaker Modeling." In Fundamentals of Speaker Recognition, 525–41. Boston, MA: Springer US, 2011. http://dx.doi.org/10.1007/978-0-387-77592-0_16.

Conference papers on the topic "Speaker recognition"

Shantoash, C., M. Vishal, S. Shruthi, and Gopalsamy N. Bharathi. "Speech Accent Recognition." In International Research Conference on IOT, Cloud and Data Science. Switzerland: Trans Tech Publications Ltd, 2023. http://dx.doi.org/10.4028/p-irai1l.

Abstract:

The Speech Accent Archive demonstrates that accents are systematic rather than merely mistaken speech. This project detects the demographic and linguistic backgrounds of speakers by comparing different speech outputs with the Speech Accent Archive dataset to work out which variables are key predictors of each accent. Given a recording of a speaker reading a known script of English words, the project predicts the speaker's language. It aims to classify various sorts of accents, specifically foreign accents, by the language of the speaker. The project centers on detecting each individual's background from their speech.

Bao, Yinan, Qianwen Ma, Lingwei Wei, Wei Zhou, and Songlin Hu. "Speaker-Guided Encoder-Decoder Framework for Emotion Recognition in Conversation." In Thirty-First International Joint Conference on Artificial Intelligence (IJCAI-22). California: International Joint Conferences on Artificial Intelligence Organization, 2022. http://dx.doi.org/10.24963/ijcai.2022/562.

Abstract:

The emotion recognition in conversation (ERC) task aims to predict the emotion label of an utterance in a conversation. Because dependencies between speakers are complex and dynamic, consisting of both intra- and inter-speaker dependencies, modeling speaker-specific information plays a vital role in ERC. Although researchers have proposed various methods of speaker interaction modeling, these methods cannot explore dynamic intra- and inter-speaker dependencies jointly, which leads to insufficient comprehension of context and further hinders emotion prediction. To this end, we design a novel speaker modeling scheme that explores intra- and inter-speaker dependencies jointly in a dynamic manner. We also propose a Speaker-Guided Encoder-Decoder (SGED) framework for ERC, which fully exploits speaker information for decoding emotion. We use different existing methods as the conversational context encoder of our framework, showing its high scalability and flexibility. Experimental results demonstrate the superiority and effectiveness of SGED.

Tripathi, Supriya, and Smriti Bhatnagar. "Speaker Recognition." In 2012 3rd International Conference on Computer and Communication Technology (ICCCT 2012). IEEE, 2012. http://dx.doi.org/10.1109/iccct.2012.64.

Kohler, M. A., W. D. Andrews, J. P. Campbell, and J. Hernández-Cordero. "Phonetic speaker recognition." In Conference Record. Thirty-Fifth Asilomar Conference on Signals, Systems and Computers. IEEE, 2001. http://dx.doi.org/10.1109/acssc.2001.987748.

Desai, Veena, and Hema A. Murthy. "Distributed speaker recognition." In Interspeech 2004. ISCA: ISCA, 2004. http://dx.doi.org/10.21437/interspeech.2004-536.

Wenndt, Stanley J., and Ronald L. Mitchell. "Familiar speaker recognition." In ICASSP 2012 - 2012 IEEE International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012. http://dx.doi.org/10.1109/icassp.2012.6288854.

Yu, Kin, John S. Mason, and John Oglesby. "Speaker recognition models." In 4th European Conference on Speech Communication and Technology (Eurospeech 1995). ISCA: ISCA, 1995. http://dx.doi.org/10.21437/eurospeech.1995-105.

Andrews, Walter D., Mary A. Kohler, and Joseph P. Campbell. "Phonetic speaker recognition." In 7th European Conference on Speech Communication and Technology (Eurospeech 2001). ISCA: ISCA, 2001. http://dx.doi.org/10.21437/eurospeech.2001-416.

Doddington, George. "Speaker recognition based on idiolectal differences between speakers." In 7th European Conference on Speech Communication and Technology (Eurospeech 2001). ISCA: ISCA, 2001. http://dx.doi.org/10.21437/eurospeech.2001-417.

Schulze, Lucas, Renan Sebem, and Douglas Wildgrube Bertol. "Performance of PSO and GWO Algorithms Applied in Text-Independent Speaker Identification." In Congresso Brasileiro de Inteligência Computacional. SBIC, 2021. http://dx.doi.org/10.21528/cbic2021-98.

Abstract:

In this paper, we analyze the performance of two bio-inspired algorithms, particle swarm optimization (PSO) and grey wolf optimization (GWO), applied to text-independent speaker recognition from voice signals. The complete methodology described in this paper was developed specifically for this work. First, a widely known model of the speaker, based on discrete transfer functions, is determined. Then a method for estimating the input signal is determined. The bio-inspired algorithms are custom-developed and applied to parameterize the transfer functions of the models. The proposed method is composed of three major parts: first, the fitness function used by the bio-inspired algorithms is based on cross-correlation; second, a method for creating a database of speakers' identities is proposed; and third, a method for comparing speaker characteristics is proposed, in order to identify a speaker or distinguish two different speakers. Finally, experiments were conducted with 4 speakers and 2 speech samples each; a representation of each speaker's identity was created with both algorithms, totaling 16 database entries. The experiment comprised a total of 240 runs comparing the entries with each other. All comparison results were accurate: the algorithms identified the speaker even when two different speech samples were compared and, as expected, distinguished between two different speakers.
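The abstract states only that the fitness is based on cross-correlation and that 16 entries were compared in 240 runs. The sketch below is one plausible reading, not the authors' implementation. `xcorr_peak` takes the peak of the energy-normalized cross-correlation over all lags, `fitness` (a hypothetical name) turns it into a cost that an optimizer such as PSO or GWO could minimize, and the last lines check that comparing 16 entries in ordered pairs gives 16 × 15 = 240 comparisons, which is consistent with the reported run count.

```python
import math
from itertools import permutations


def xcorr_peak(x, y):
    """Peak of the energy-normalized cross-correlation of x and y over all lags.

    By the Cauchy-Schwarz inequality the result lies in [-1, 1]; it reaches 1
    when y is a positively scaled, shifted copy of x with no nonzero samples lost.
    """
    energy = math.sqrt(sum(v * v for v in x) * sum(v * v for v in y))
    if energy == 0.0:
        return 0.0
    best = -math.inf
    for lag in range(-(len(y) - 1), len(x)):
        # r(lag) = sum over n of x[n + lag] * y[n], where both indices are valid
        acc = sum(x[n + lag] * y[n] for n in range(len(y)) if 0 <= n + lag < len(x))
        best = max(best, acc / energy)
    return best


def fitness(target, candidate):
    """Hypothetical cost for an optimizer: 0 when the shapes match perfectly."""
    return 1.0 - xcorr_peak(target, candidate)


if __name__ == "__main__":
    recorded = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
    model_out = [1.0, 2.0, 1.0, 0.0, 0.0, 0.0]  # same shape, shifted by two samples
    print(round(fitness(recorded, model_out), 6))  # prints 0.0

    # 4 speakers x 2 speech samples x 2 algorithms = 16 database entries.
    entries = [(spk, utt, alg) for spk in range(4) for utt in range(2) for alg in ("PSO", "GWO")]
    print(len(entries), len(list(permutations(entries, 2))))  # prints "16 240"
```

Taking the peak over all lags makes the score insensitive to a time offset between the recorded signal and the model output, which is one reason a cross-correlation measure can be preferred over a plain sample-by-sample error.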

Reports on the topic "Speaker recognition"

Slyh, Raymond E., Eric G. Hansen, and Timothy R. Anderson. AFRL/HECP Speaker Recognition Systems for the 2004 NIST Speaker Recognition Evaluation. Fort Belvoir, VA: Defense Technical Information Center, December 2004. http://dx.doi.org/10.21236/ada430750.

Wu, Jin Chu, Alvin F. Martin, Craig S. Greenberg, and Raghu N. Kacker. Measurement uncertainties in speaker recognition evaluation. Gaithersburg, MD: National Institute of Standards and Technology, 2010. http://dx.doi.org/10.6028/nist.ir.7722.

Quatieri, T. F., E. Singer, R. B. Dunn, D. A. Reynolds, and J. P. Campbell. Speaker and Language Recognition Using Speech Codec Parameters. Fort Belvoir, VA: Defense Technical Information Center, January 1999. http://dx.doi.org/10.21236/ada526525.

Martinson, E., and W. Lawson. Learning Speaker Recognition Models through Human-Robot Interaction. Fort Belvoir, VA: Defense Technical Information Center, May 2011. http://dx.doi.org/10.21236/ada550036.

Hansen, John H. Robust Speech Processing & Recognition: Speaker ID, Language ID, Speech Recognition/Keyword Spotting, Diarization/Co-Channel/Environmental Characterization, Speaker State Assessment. Fort Belvoir, VA: Defense Technical Information Center, October 2015. http://dx.doi.org/10.21236/ada623029.

Quatieri, T. F. Nonlinear Auditory Modeling as a Basis for Speaker Recognition. Fort Belvoir, VA: Defense Technical Information Center, May 2002. http://dx.doi.org/10.21236/ada402327.

Wu, Jin Chu, Alvin F. Martin, Craig S. Greenberg, and Raghu N. Kacker. Data dependency on measurement uncertainties in speaker recognition evaluation. Gaithersburg, MD: National Institute of Standards and Technology, 2011. http://dx.doi.org/10.6028/nist.ir.7810.

Cieri, Christopher, Joseph P. Campbell, Hirotaka Nakasone, Kevin Walker, and David Miller. The Mixer Corpus of Multilingual, Multichannel Speaker Recognition Data. Fort Belvoir, VA: Defense Technical Information Center, January 2004. http://dx.doi.org/10.21236/ada523534.

Karakowski, Joseph A., and Hai H. Phu. Text Independent Speaker Recognition Using A Fuzzy Hypercube Classifier. Fort Belvoir, VA: Defense Technical Information Center, October 1998. http://dx.doi.org/10.21236/ada354792.

Ferrer, Luciana, Mitchell McLaren, Nicolas Scheffer, Yun Lei, Martin Graciarena, and Vikramjit Mitra. A Noise-Robust System for NIST 2012 Speaker Recognition Evaluation. Fort Belvoir, VA: Defense Technical Information Center, August 2013. http://dx.doi.org/10.21236/ada614010.
