Dissertations / Theses on the topic 'Speaker recognition'
Chatzaras, Anargyros, and Georgios Savvidis. "Seamless speaker recognition." Thesis, KTH, Radio Systems Laboratory (RS Lab), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-159021.
In a technologically advanced society, the average person manages dozens of accounts for e-mail, social networks, Internet banking and other electronic services. As the number of accounts grows, the need for automatic user identification becomes more essential. Biometrics have long been used to identify people and are the most common (if not the only) method for this task. Smartphones have become increasingly widespread in recent years; they give users access to most of their accounts and, to some extent, even personalisation of the devices based on their social-network profiles. These devices have built-in microphones and are often used by a single user or a small group of users, for example a couple or a family. This thesis uses the microphone of a smartphone to record the user's speech and identify him or her. Existing speaker recognition solutions usually ask the user to provide long voice samples in order to give accurate results. This leads to a poor user experience and deters users who lack the patience to go through such a process. The main idea behind the speaker recognition approach presented in this thesis is to provide a seamless user experience in which the recording of the user's voice takes place in the background. An Android application has been developed that unobtrusively collects voice samples and performs speaker recognition on them without requiring extensive user interaction. Two variants of the tool have been developed and are described in detail in this thesis. The open-source framework Recognito is used to perform the speaker recognition. The analysis of Recognito showed that it cannot achieve sufficiently high accuracy, especially when the voice samples contain background noise. Moreover, the comparison between the two architectures showed that they do not differ significantly in performance.
VASILAKAKIS, VASILEIOS. "Forensic speaker recognition: speaker and height estimation techniques." Doctoral thesis, Politecnico di Torino, 2014. http://hdl.handle.net/11583/2551370.
Kamarauskas, Juozas. "Speaker recognition by voice." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20090615_093847-20773.
The dissertation investigates speaker recognition by voice. It discusses speaker recognition systems, their development, recognition problems, the variety of feature sets, and the speaker modelling and feature comparison methods used in text-independent and text-dependent speaker recognition. In the course of this work, a text-independent speaker recognition system was developed. Gaussian mixture models were used for speaker modelling and feature comparison. An automatic method for selecting (segmenting) voiced sounds is proposed. The method is fast and requires no additional actions from the user, such as marking samples of the speech signal and of noise. A feature vector set is proposed consisting of excitation-signal and vocal-tract parameters: the fundamental frequency of the excitation signal is used as the excitation parameter, and four formants and three antiformants are used as the vocal-tract parameters. To equalise the variances of the lower and higher formants and antiformants, we propose computing them on the mel scale. For comparison, standard features used in speech and speaker recognition, mel-frequency cepstral coefficients (MFCC), were also implemented in the system. Speaker recognition experiments showed that the proposed feature set gave better recognition results than the standard features (MFCC). The equal error rate obtained using the proposed feature... [see full text]
Du Toit, Ilze. "Non-acoustic speaker recognition." Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16315.
ENGLISH ABSTRACT: In this study the phoneme labels derived from a phoneme recogniser are used for phonetic speaker recognition. The time-dependencies among phonemes are modelled by using hidden Markov models (HMMs) for the speaker models. Experiments are done using first-order and second-order HMMs, and various smoothing techniques are examined to address the problem of data scarcity. The use of word labels for lexical speaker recognition is also investigated. Single-word frequencies are counted and the use of various word selections as feature sets is investigated. During April 2004, the University of Stellenbosch, in collaboration with Spescom DataVoice, participated in an international speaker verification competition presented by the National Institute of Standards and Technology (NIST). The University of Stellenbosch submitted phonetic and lexical (non-acoustic) speaker recognition systems and a fused system (the primary system) that fuses the acoustic system of Spescom DataVoice with the non-acoustic systems of the University of Stellenbosch. The results were evaluated by means of a cost model. Based on the cost model, the primary system obtained second and third position in the two categories that were submitted.
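As a loose illustration of modelling time-dependencies among phoneme labels, a first-order Markov (bigram) model over label sequences can be trained and scored per speaker. This is a simplification of the HMM approach described in the abstract, with add-one smoothing standing in for the smoothing techniques it mentions; all labels and sequences below are hypothetical:

```python
from collections import defaultdict
import math

def train_bigram_model(label_sequences, smoothing=1.0):
    """Estimate smoothed phoneme-bigram log-probabilities for one speaker."""
    counts = defaultdict(lambda: defaultdict(float))
    vocab = set()
    for seq in label_sequences:
        for prev, cur in zip(seq, seq[1:]):
            counts[prev][cur] += 1.0
            vocab.update((prev, cur))
    model = {}
    for prev in vocab:
        total = sum(counts[prev].values()) + smoothing * len(vocab)
        model[prev] = {cur: math.log((counts[prev][cur] + smoothing) / total)
                       for cur in vocab}
    return model, vocab

def score(model, vocab, seq, floor=-10.0):
    """Average log-likelihood of a test label sequence under a speaker model."""
    logps = [model.get(p, {}).get(c, floor) for p, c in zip(seq, seq[1:])]
    return sum(logps) / max(len(logps), 1)

# Hypothetical phoneme-label sequences for two speakers
spk_a = [list("aabba"), list("abbaa")]
spk_b = [list("bbaab"), list("babba")]
model_a, vocab_a = train_bigram_model(spk_a)
model_b, vocab_b = train_bigram_model(spk_b)
test = list("aabba")
# The test sequence should score higher under the matching speaker's model
print(score(model_a, vocab_a, test) > score(model_b, vocab_b, test))
```

A second-order model, as in the thesis, would condition each label on the two preceding labels instead of one.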
Hong, Z. (Zimeng). "Speaker gender recognition system." Master's thesis, University of Oulu, 2017. http://jultika.oulu.fi/Record/nbnfioulu-201706082645.
Al-Kilani, Menia. "Voice-signature-based Speaker Recognition." University of the Western Cape, 2017. http://hdl.handle.net/11394/5888.
Personal identification and the protection of data are important issues because of the ubiquitousness of computing, and these have thus become interesting areas of research in the field of computer science. Previously, people have used a variety of ways to identify an individual and protect themselves, their property and their information, mostly by means of locks, passwords, smartcards and biometrics. Verifying individuals by their physical or behavioural features is more secure than using other data such as passwords or smartcards, because everyone has unique features that distinguish him or her from others, and a person's biometrics are difficult to imitate or steal. Biometric technologies represent a significant component of a comprehensive digital identity solution and play an important role in security. The technologies that support identification and authentication of individuals are based on either their physiological or their behavioural characteristics. Live data, in this instance the human voice, is the topic of this research. The aim is to recognise a person's voice and to identify the user by verifying that his or her voice matches a record of his or her voice signature in a system database. To address the main research question, "What is the best way to identify a person by his or her voice signature?", design science research was employed. This methodology is used to develop an artefact for solving a problem. Initially, a pilot study was conducted using visual representations of voice signatures, to check whether it is possible to identify speakers without using feature extraction or matching methods. Subsequently, experiments were conducted with 6300 data sets derived from the Texas Instruments and Massachusetts Institute of Technology audio database.
Two methods of feature extraction were considered, mel-frequency cepstrum coefficients and linear prediction cepstral coefficients, and Support Vector Machines were used for classification. The resulting systems were compared in terms of their effectiveness, and the system using mel-frequency cepstrum coefficients for feature extraction gave marginally better results for speaker recognition.
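The mel-frequency cepstrum pipeline mentioned above (framing, power spectrum, mel filterbank, DCT) can be sketched in a few lines of NumPy/SciPy. The frame sizes, filter count and sample rate below are common illustrative defaults, not the values used in the thesis:

```python
import numpy as np
from scipy.fftpack import dct

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    pts = inv_mel(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for j in range(l, c):
            fb[i - 1, j] = (j - l) / max(c - l, 1)
        for j in range(c, r):
            fb[i - 1, j] = (r - j) / max(r - c, 1)
    return fb

def mfcc(signal, sr=16000, frame_len=400, hop=160, n_fft=512,
         n_filters=26, n_ceps=13):
    """Frame the signal, take the power spectrum, apply the mel
    filterbank, and decorrelate the log energies with a DCT."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * np.hamming(frame_len)
                       for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    fb_energy = np.maximum(power @ mel_filterbank(n_filters, n_fft, sr).T, 1e-10)
    return dct(np.log(fb_energy), type=2, axis=1, norm='ortho')[:, :n_ceps]

# One second of synthetic audio standing in for a real utterance
sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
feats = mfcc(sig)
print(feats.shape)   # one 13-coefficient vector per frame
```

The resulting per-frame vectors are what a classifier such as an SVM would then be trained on.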
Oglesby, J. "Neural models for speaker recognition." Thesis, Swansea University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.638359.
Thompson, J. "Speech variability in speaker recognition." Thesis, Swansea University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.639230.
Mukherjee, Rishiraj. "Speaker Recognition Using Shifted MFCC." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4136.
Mwangi, Elijah. "Speaker independent isolated word recognition." Thesis, Loughborough University, 1987. https://dspace.lboro.ac.uk/2134/15425.
Luettin, Juergen. "Visual speech and speaker recognition." Thesis, University of Sheffield, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264432.
Alkilani, Menia Mohamed. "Voice signature based Speaker Recognition." University of the Western Cape, 2017. http://hdl.handle.net/11394/6196.
Mošner, Ladislav. "Microphone Arrays for Speaker Recognition." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2017. http://www.nusl.cz/ntk/nusl-363803.
CUMANI, SANDRO. "Speaker and Language Recognition Techniques." Doctoral thesis, Politecnico di Torino, 2012. http://hdl.handle.net/11583/2496928.
Ho, Ka-Lung. "Kernel eigenvoice speaker adaptation /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?COMP%202003%20HOK.
Includes bibliographical references (leaves 56-61). Also available in electronic version. Access restricted to campus users.
Seymour, R. "Audio-visual speech and speaker recognition." Thesis, Queen's University Belfast, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.492489.
Neville, Katrina Lee. "Channel Compensation for Speaker Recognition Systems." RMIT University, Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453.
Domínguez Sánchez, Carlos. "Speaker Recognition in a handheld computer." Thesis, KTH, Kommunikationssystem, CoS, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-99123.
Handheld devices are widely used; such a device may be a mobile phone, a personal digital assistant (PDA) or a media player. Although these devices are personal, a small set of people may often use a given device, for example a group of friends or a family. For most people the most natural way to communicate is to speak, so a natural way for these devices to know who is using them is to listen to the user's voice, i.e. to recognise the speaker. This project exploits the microphone built into most of these devices and asks whether it is possible to develop an effective speaker recognition system that can operate within their limited resources (compared with a desktop computer). The goal of this speaker recognition is to distinguish between the small set of people who might share a handheld device and those outside this small set. The criteria are therefore that the device should work for any member of this small set and should not work for anyone outside it. Furthermore, within this small set, the device should recognise which specific person is using it. An application for Windows Mobile PDAs has been developed in C++. This application and the underlying theoretical concepts, as well as parts of the code and the results achieved (in terms of accuracy and performance), are presented in this thesis. The experiments conducted in this research show that it is possible to recognise the user based on his or her voice within a small group and, furthermore, to identify which member of the group the user is. This has great potential for automatically configuring devices in a home or office for a specific user. Potentially, a user need only speak within earshot to identify him or herself to the device, which can then configure itself for that user.
Chan, Chit-man (陳哲民). "Speaker-independent recognition of Putonghua finals." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1987. http://hub.hku.hk/bib/B12363091.
Doctoral thesis (Doctor of Philosophy), Electrical and Electronic Engineering.
Deterding, David Henry. "Speaker normalisation for automatic speech recognition." Thesis, University of Cambridge, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.359822.
Park, Alex S. (Alex Seungryong) 1979. "ASR dependent techniques for speaker recognition." Thesis, Massachusetts Institute of Technology, 2002. http://hdl.handle.net/1721.1/87287.
Chan, Chit-man. "Speaker-independent recognition of Putonghua finals /." [Hong Kong : University of Hong Kong], 1987. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12363091.
Vogt, Robert Jeffery. "Automatic speaker recognition under adverse conditions." Thesis, Queensland University of Technology, 2006. https://eprints.qut.edu.au/36195/1/Robert_Vogt_Thesis.pdf.
Al-Ali, Ahmed Kamil Hasan. "Forensic speaker recognition under adverse conditions." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/130783/1/Ahmed%20Kamil%20Hasan_Al-Ali_Thesis.pdf.
Yin, Shou-Chun, 1980. "Speaker adaptation in joint factor analysis based text independent speaker verification." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100735.
Thiruvaran, Tharmarajah. "Automatic speaker recognition using phase based features." University of New South Wales, Electrical Engineering & Telecommunications, 2009. http://handle.unsw.edu.au/1959.4/44705.
Katz, Marcel. "Discriminative classifiers for speaker recognition." Saarbrücken: Südwestdeutscher Verlag für Hochschulschriften, 2009. http://www.vdm-verlag.de.
Elvira, Jose M. "Neural networks for speech and speaker recognition." Thesis, Staffordshire University, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.262314.
McAuley, J. "Subband correlation and robust speech/speaker recognition." Thesis, Queen's University Belfast, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.426761.
Chan, Carlos Chun Ming. "Speaker model adaptation in automatic speech recognition." Thesis, Robert Gordon University, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.339307.
Irvine, David Alexander. "A comparison of some speaker recognition techniques." Thesis, University of Ulster, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.385661.
Iliadi, Konstantina. "Bio-inspired voice recognition for speaker identification." Thesis, University of Southampton, 2016. https://eprints.soton.ac.uk/413949/.
Fér, Radek. "Speaker Recognition Based on Long Temporal Context." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-236121.
Castellano, Pierre John. "Speaker recognition modelling with artificial neural networks." Thesis, Queensland University of Technology, 1997.
Ho, Ching-Hsiang. "Speaker modelling for voice conversion." Thesis, Brunel University, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365076.
Fredrickson, Steven Eric. "Neural networks for speaker identification." Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294364.
Nosratighods, Mohaddeseh. "Robust speaker verification system." University of New South Wales, Electrical Engineering & Telecommunications, 2008. http://handle.unsw.edu.au/1959.4/42796.
Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University, School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.
Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.
Thesis (Masters), Master of Philosophy (MPhil), School of Microelectronic Engineering, Faculty of Engineering and Information Technology.
Baker, Brendan J. "Speaker verification incorporating high-level linguistic features." Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17665/1/Brendan_Baker_Thesis.pdf.
Baker, Brendan J. "Speaker verification incorporating high-level linguistic features." Queensland University of Technology, 2008. http://eprints.qut.edu.au/17665/.
Tran, Michael. "An approach to a robust speaker recognition system." Diss., 1994. http://scholar.lib.vt.edu/theses/available/etd-06062008-164814/.
Farrús Cabeceran, Mireia. "Fusing prosodic and acoustic information for speaker recognition." Doctoral thesis, Universitat Politècnica de Catalunya, 2008. http://hdl.handle.net/10803/31779.
Automatic speaker recognition is the use of a machine to identify an individual from a spoken sentence. Recently, this technology has seen increasing use in applications such as access control, transaction authentication, law enforcement, forensics and system customisation, among others. One of the central questions addressed by this field is what it is in the speech signal that conveys speaker identity. Traditionally, automatic speaker recognition systems have relied mostly on short-term features related to the spectrum of the voice. However, human speaker recognition relies on other sources of information as well; there is therefore reason to believe that these sources can also play an important role in the automatic speaker recognition task, adding complementary knowledge to traditional spectrum-based recognition systems and thus improving their accuracy. The main objective of this thesis is to add prosodic information to a traditional spectral system in order to improve its performance. To this end, several characteristics related to human speech prosody, which is conveyed through intonation, rhythm and stress, are selected and combined with the existing spectral features. Furthermore, this thesis also focuses on the use of additional acoustic features, namely jitter and shimmer, to improve the performance of the proposed spectral-prosodic verification system. Both features are related to the shape and dimension of the vocal tract, and they have been widely used to detect voice pathologies. Since almost all the above-mentioned applications can be used in a multimodal environment, this thesis also aims to combine the voice features used in the speaker recognition system with another biometric identifier, the face, in order to improve the global performance.
To this end, several normalisation and fusion techniques are used, and the final fusion results are improved by applying different fusion strategies based on sequences of several steps. Furthermore, multimodal fusion is also improved by applying histogram equalisation to the unimodal score distributions as a normalisation technique. On the other hand, it is well known that humans are able to identify others by voice even when those voices are disguised. The question arises as to how vulnerable automatic speaker recognition systems are to different voice disguises, such as human imitation or artificial voice conversion, which are potential threats to security systems that rely on automatic speaker recognition. The last part of this thesis finishes with an analysis of the robustness of such systems against human voice imitations and synthetic converted voices, and of the influence of foreign accents and dialects, as a sort of imitation, on auditory speaker recognition.
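A common shape for such score-level normalisation and fusion, shown here only as a generic sketch and not as this thesis's exact pipeline, is to z-normalise each score stream and take a weighted sum; the weights and scores below are invented:

```python
import numpy as np

def z_norm(scores):
    """Normalise a set of scores to zero mean, unit variance."""
    s = np.asarray(scores, dtype=float)
    return (s - s.mean()) / s.std()

def fuse(spectral, prosodic, w=0.7):
    """Weighted-sum fusion of two normalised score streams."""
    return w * z_norm(spectral) + (1 - w) * z_norm(prosodic)

# Hypothetical verification scores for five trials; the two streams
# live on different scales, which is why normalisation comes first.
spectral = [2.1, -0.5, 1.8, -1.2, 0.3]
prosodic = [15.0, 4.0, 9.0, 2.0, 7.0]
fused = fuse(spectral, prosodic)
decisions = fused > 0.0   # threshold would be tuned on development data
print(decisions)
```

Histogram equalisation, mentioned above, replaces the z-norm step by mapping each unimodal score distribution onto a common reference distribution before fusion.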
Khan, Umair. "Self-supervised deep learning approaches to speaker recognition." Doctoral thesis, Universitat Politècnica de Catalunya, 2021. http://hdl.handle.net/10803/671496.
Recent advances in deep learning (DL) for speaker recognition are improving the performance of traditional i-vector based systems. In i-vector based speaker recognition, cosine distance and probabilistic linear discriminant analysis (PLDA) are the two most commonly used scoring techniques. The former is unsupervised, but the latter needs speaker-labelled data, which is not always easily available in practice. This creates a large performance gap between these two scoring techniques. The question is: how can this performance gap be closed without using speaker labels in the background data? In this thesis, the above problem has been addressed using DL techniques without using, or while limiting the use of, labelled data. Three DL-based proposals have been made. In the first, a restricted Boltzmann machine (RBM) based vector representation of speech is proposed for the tasks of speaker clustering and speaker tracking in TV broadcast shows. Experiments on the AGORA database show that in speaker clustering the RBM vectors give a relative improvement of 12%, and in speaker tracking, used only in the speaker identification stage, relative improvements of 11% (cosine) and 7% (PLDA). In the second, DL is used to increase the discriminative power of i-vectors in speaker verification, proposing the use of autoencoders in several ways. First, an autoencoder is used to pre-train a deep neural network (DNN) on a large amount of unlabelled background data, after which a DNN classifier is trained on a reduced set of labelled data.
Second, an autoencoder is trained to transform i-vectors into a new representation that increases their discriminative power. The training is carried out using the nearest-neighbour i-vectors, which are chosen in an unsupervised way. The evaluation was performed on the VoxCeleb-1 database. The results show that the first system gives a relative improvement of 21% over i-vectors, while the second gives a relative improvement of 42%; moreover, if the background data are used at the test stage, a relative improvement of 53% is obtained. In the third, a self-supervised end-to-end speaker verification system is trained. Impostors are used together with the nearest neighbours to form client/impostor pairs without supervision. The architecture is based on a convolutional neural network (CNN) encoder trained as a siamese network with two branches; in addition, another network with three branches is trained with the triplet loss function to extract speaker embeddings. The results show that both the end-to-end system and the speaker embeddings, despite being unsupervised, perform comparably to a supervised baseline. Each of the proposed approaches has its pros and cons. The best result was obtained using the autoencoder with the nearest neighbour, with the drawback that it needs the background i-vectors at test time. Using autoencoder pre-training for the DNN does not have this problem, but it is a semi-supervised approach, i.e. it requires speaker labels for only a small part of the background data. The third proposal does not have these two limitations and performs reasonably well. It is an…
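The unsupervised cosine scoring mentioned above as the label-free baseline for i-vector verification reduces to a normalised inner product between an enrolment vector and a test vector. The vectors below are tiny invented stand-ins for real (typically several-hundred-dimensional) i-vectors:

```python
import numpy as np

def cosine_score(enrol, test):
    """Cosine similarity between two i-vectors (or embeddings)."""
    e, t = np.asarray(enrol, float), np.asarray(test, float)
    return float(e @ t / (np.linalg.norm(e) * np.linalg.norm(t)))

# Hypothetical 4-dimensional i-vectors for one enrolled speaker,
# a same-speaker test utterance and a different-speaker utterance.
enrolled = np.array([0.9, 0.1, -0.3, 0.2])
same_spk = np.array([0.8, 0.2, -0.2, 0.1])
diff_spk = np.array([-0.5, 0.7, 0.4, -0.1])
print(cosine_score(enrolled, same_spk) > cosine_score(enrolled, diff_spk))
```

PLDA scoring, by contrast, needs speaker-labelled background data to learn its within- and between-speaker covariances, which is exactly the gap the thesis addresses.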
Uzuner, Halil. "Robust text-independent speaker recognition over telecommunications systems." Thesis, University of Surrey, 2006. http://epubs.surrey.ac.uk/843391/.
Eriksson, Erik J. "That voice sounds familiar: factors in speaker recognition." Doctoral thesis, Umeå: Department of Philosophy and Linguistics, Umeå University, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1106.
Falk, Jennie, and Gabriella Hultström. "Support Vector Machines for Optimizing Speaker Recognition Problems." Thesis, KTH, Optimeringslära och systemteori, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-103821.
Classification of data has many applications, among them voice recognition. Speaker recognition is a part of speech modelling that deals with the problem of identifying a speaker and verifying a speaker's identity using the characteristic traits of his or her voice. The focus is on finding methods that can separate data, in order then to separate speakers. In this bachelor's thesis, a support vector machine is built for this purpose, as it has been shown to be a good way to separate different kinds of data. The first version is used on data that is linearly separable in two dimensions; it is then extended to separate data that is not linearly separable, by allowing some data points to be misclassified. Finally, the support vector machine is modified to separate data in higher dimensions and to use different kernels in order to produce separating hyperplanes of higher order. The finished version of the support vector machine is then applied to data for a speaker recognition problem. The result of separating two speakers was not satisfactory, although more data from different speakers would give a better result. When, however, another, more complete set of data is used to build the support vector machine, the result is very good.
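The first stage described above, a linear separator on two-dimensional data, can be illustrated with a minimal primal SVM trained by sub-gradient descent on the regularised hinge loss. The data and learning-rate settings below are invented for illustration; the thesis's own construction, with slack variables and kernels, is more general:

```python
import numpy as np

def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Primal linear SVM trained with sub-gradient descent on the
    regularised hinge loss. Labels y must be +1/-1."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (xi @ w + b) < 1.0:          # margin violated
                w += lr * (yi * xi - lam * w)
                b += lr * yi
            else:                                 # only shrink w
                w -= lr * lam * w
    return w, b

# Toy 2-D point clouds standing in for per-speaker voice features
rng = np.random.default_rng(0)
X = np.vstack([rng.normal([2, 2], 0.5, (20, 2)),
               rng.normal([-2, -2], 0.5, (20, 2))])
y = np.array([1] * 20 + [-1] * 20)
w, b = train_linear_svm(X, y)
pred = np.sign(X @ w + b)
print((pred == y).mean())   # near 1.0 on this well-separated toy data
```

Allowing misclassification corresponds to the soft-margin (slack) formulation, and kernels replace the inner product `xi @ w` with a kernel evaluation in the dual.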
Farnes, Karen. "Development of a Speaker Recognition Solution in Vidispine." Thesis, Umeå universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-74180.
Openshaw, J. P. "The effects of additive noise in speaker recognition." Thesis, Swansea University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.638372.
Cox, S. J. "Techniques for rapid speaker adaptation in speech recognition." Thesis, University of East Anglia, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.267271.