Dissertations / Theses: 'Speech biometrics'

1

Sanderson, Conrad. "Automatic Person Verification Using Speech and Face Information." Griffith University. School of Microelectronic Engineering, 2003. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20030422.105519.

Abstract:

Identity verification systems are an important part of our everyday life. A typical example is the Automatic Teller Machine (ATM), which employs a simple identity verification scheme: the user is asked to enter their secret password after inserting their ATM card; if the password matches the one assigned to the card, the user is allowed access to their bank account. This scheme suffers from a major drawback: only the validity of the combination of a certain possession (the ATM card) and certain knowledge (the password) is verified. The ATM card can be lost or stolen, and the password can be compromised. Thus new verification methods have emerged, where the password has either been replaced by, or used in addition to, biometrics such as the person's speech, face image or fingerprints. Apart from the ATM example described above, biometrics can be applied to other areas, such as telephone and internet based banking, airline reservations and check-in, as well as forensic work and law enforcement applications. Biometric systems based on face images and/or speech signals have been shown to be quite effective. However, their performance easily degrades in the presence of a mismatch between training and testing conditions. For speech based systems this is usually in the form of channel distortion and/or ambient noise; for face based systems it can be in the form of a change in the illumination direction. A system which uses more than one biometric at the same time is known as a multi-modal verification system; it is often composed of several modality experts and a decision stage. Since a multi-modal system uses complementary discriminative information, lower error rates can be achieved; moreover, such a system can also be more robust, since the contribution of the modality affected by environmental conditions can be decreased. This thesis makes several contributions aimed at increasing the robustness of single- and multi-modal verification systems.
Some of the major contributions are listed below. The robustness of a speech based system to ambient noise is increased by using Maximum Auto-Correlation Value (MACV) features, which utilize information from the source part of the speech signal. A new facial feature extraction technique is proposed (termed DCT-mod2), which utilizes polynomial coefficients derived from 2D Discrete Cosine Transform (DCT) coefficients of spatially neighbouring blocks. The DCT-mod2 features are shown to be robust to an illumination direction change as well as being over 80 times quicker to compute than 2D Gabor wavelet derived features. The fragility of Principal Component Analysis (PCA) derived features to an illumination direction change is addressed by introducing a pre-processing step utilizing the DCT-mod2 feature extraction. We show that the enhanced PCA technique retains all the positive aspects of traditional PCA (that is, robustness to compression artefacts and white Gaussian noise) while also being robust to the illumination direction change. Several new methods for fusing speech and face information under noisy conditions are proposed; these include a weight adjustment procedure, which explicitly measures the quality of the speech signal, and a decision stage composed of a structurally noise resistant piece-wise linear classifier, which attempts to minimize the effects of noisy conditions via structural constraints on the decision boundary.
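
The weight adjustment idea mentioned in the abstract above can be illustrated with a minimal sketch of weighted-sum opinion fusion. This is not the thesis's procedure: the function names, the assumption that both expert opinions lie in [0, 1], the linear weight rule and the `speech_quality` estimate are all hypothetical and chosen only for illustration.

```python
def fuse_opinions(speech_score: float, face_score: float, speech_quality: float) -> float:
    """Weighted-sum fusion of two expert opinions, each assumed to lie in [0, 1].

    speech_quality is a hypothetical quality estimate in [0, 1]; 1.0 means clean speech.
    """
    if not 0.0 <= speech_quality <= 1.0:
        raise ValueError("speech_quality must lie in [0, 1]")
    # Assumed rule: clean speech shares the weight equally with the face expert,
    # and the speech weight shrinks linearly as the measured quality drops.
    w_speech = 0.5 * speech_quality
    w_face = 1.0 - w_speech
    return w_speech * speech_score + w_face * face_score


def accept_claim(speech_score: float, face_score: float,
                 speech_quality: float, threshold: float = 0.5) -> bool:
    """Accept the identity claim when the fused opinion reaches the threshold."""
    return fuse_opinions(speech_score, face_score, speech_quality) >= threshold
```

With `speech_quality = 0.0` the decision rests entirely on the face expert, which mirrors the abstract's point that the contribution of a degraded modality can be decreased.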

2

Sanderson, Conrad. "Automatic Person Verification Using Speech and Face Information." Thesis, Griffith University, 2003. http://hdl.handle.net/10072/367191.

Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Microelectronic Engineering

3

Rouse, Kenneth Arthur Gilbert Juan E. "Classifying speakers using voice biometrics In a multimodal world." Auburn, Ala, 2009. http://hdl.handle.net/10415/1824.

4

Kotulek, Milan. "Jednoduchý textově nezávislý hlasový zámek - Softwarový systém pro verifikaci mluvčích." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-221256.

Abstract:

This thesis gives a brief introduction to biometrics, leading to the description and design of a speaker verification system based on speech analysis. The designed system first performs basic signal processing and then recognizes vowels in fluent Czech speech. For each vowel found, the observed speech features are calculated. The resulting GUI application was tested on a purpose-built speaker database; its efficiency is approximately 54% for short test utterances and approximately 88% for long test utterances.

5

Melin, Håkan. "Automatic speaker verification on site and by telephone: methods, applications and assessment." Doctoral thesis, KTH, Tal, musik och hörsel, TMH, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4242.

Abstract:

Speaker verification is the biometric task of authenticating a claimed identity by means of analyzing a spoken sample of the claimant's voice. The present thesis deals with various topics related to automatic speaker verification (ASV) in the context of its commercial applications, characterized by co-operative users, user-friendly interfaces, and requirements for small amounts of enrollment and test data. A text-dependent system based on hidden Markov models (HMM) was developed and used to conduct experiments, including a comparison between visual and aural strategies for prompting claimants for randomized digit strings. It was found that aural prompts lead to more errors in spoken responses and that visually prompted utterances performed marginally better in ASV, given that enrollment data were visually prompted. High-resolution flooring techniques were proposed for variance estimation in the HMMs, but results showed no improvement over the standard method of using target-independent variances copied from a background model. These experiments were performed on Gandalf, a Swedish speaker verification telephone corpus with 86 client speakers. A complete on-site application (PER), a physical access control system securing a gate in a reverberant stairway, was implemented based on a combination of the HMM and a Gaussian mixture model based system. Users were authenticated by saying their proper name and a visually prompted, random sequence of digits after having enrolled by speaking ten utterances of the same type. An evaluation was conducted with the 54 out of 56 clients who succeeded in enrolling. Semi-dedicated impostor attempts were also collected. An equal error rate (EER) of 2.4% was found for this system based on a single attempt per session and after retraining the system on PER-specific development data. On parallel telephone data collected using a telephone version of PER, 3.5% EER was found with landline and around 5% with mobile telephones. Impostor attempts in this case were same-handset attempts. Results also indicate that the distributions of false reject and false accept rates over target speakers are well described by beta distributions. A state-of-the-art commercial system was also tested on PER data, with performance similar to that of the baseline research system.
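
The equal error rate (EER) figures quoted above are the operating point at which the false reject rate equals the false accept rate. The following is a minimal sketch of how such a figure can be estimated from two lists of verification scores. It is an illustration only, not the evaluation code used in the thesis; the function name and the convention that higher scores mean "more likely the true speaker" are assumptions.

```python
import numpy as np


def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) at the threshold where FRR and FAR are closest.

    Assumes higher scores indicate a better match to the claimed speaker.
    """
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    best = None
    # Every observed score is a candidate decision threshold.
    for t in np.unique(np.concatenate([target, impostor])):
        frr = np.mean(target < t)      # true speakers wrongly rejected
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        gap = abs(frr - far)
        if best is None or gap < best[0]:
            best = (gap, (frr + far) / 2.0, t)
    return float(best[1]), float(best[2])
```

On small score sets the two error rates rarely cross exactly, so the sketch reports the average of the two rates at the closest threshold.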

6

Válková, Jana. "Formy zadávání a zpracování textových dat a informací v podnikových IS - trendy a aktuální praxe." Master's thesis, Vysoká škola ekonomická v Praze, 2011. http://www.nusl.cz/ntk/nusl-114263.

Abstract:

This thesis introduces readers to the basic forms of entering text and information into a computer and processing them. It also covers the historical context, current trends and future prospects of computer data input technologies and their use in practice. The first part of the thesis summarizes the particular forms of entering and processing text data and information. The following part presents technological trends on the market, concentrating on automatic speech recognition systems and the possibilities of their application in business. The rest of the thesis consists of a survey among Czech IT companies and, based on its results, a recommendation of which technologies should be used as part of information systems.

7

Boško, Božilović. "Биометријско обележје за препознавање говорника: дводимензионална информациона ентропија говорног сигнала." Phd thesis, Univerzitet u Novom Sadu, Fakultet tehničkih nauka u Novom Sadu, 2016. http://www.cris.uns.ac.rs/record.jsf?recordId=101369&source=NDLTD&language=en.

Abstract:

The motivation for the research is to improve automatic speaker recognition regardless of the content of the spoken text. The objective of this dissertation is to define a new biometric feature for text-independent speaker recognition: the two-dimensional information entropy of the speech signal. The new feature is defined exclusively in the time domain, so the computational complexity of the feature extraction algorithm is significantly lower than that of features extracted in the frequency domain. The performance of two-dimensional information entropy was evaluated on a representative set of randomly chosen speakers. It is shown that the proposed feature has small within-speaker variability and large between-speaker variability.
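
The closing claim of the abstract above (small within-speaker and large between-speaker variability) is the kind of property that is often summarized by a variance ratio. The sketch below shows one such ratio for a scalar feature. It is an assumption made for illustration: the dissertation's own evaluation measure is not stated in the abstract, and the function name and the feature values in the usage note are hypothetical.

```python
import numpy as np


def variance_ratio(features_by_speaker):
    """Between-speaker variance of the per-speaker means divided by the
    average within-speaker variance, for a scalar feature.

    features_by_speaker maps a speaker label to a list of feature values.
    """
    values = [np.asarray(v, dtype=float) for v in features_by_speaker.values()]
    if len(values) < 2:
        raise ValueError("need at least two speakers")
    within = np.mean([np.var(v) for v in values])      # spread inside each speaker
    between = np.var([np.mean(v) for v in values])     # spread of the speaker means
    if within == 0.0:
        raise ValueError("within-speaker variance is zero")
    return float(between / within)
```

For example, `variance_ratio({"a": [1.0, 1.2, 0.8], "b": [3.0, 3.2, 2.8]})` gives a ratio well above 1, which is what a speaker-discriminating feature should produce.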

8

Chan, Siu Man. "Improved speaker verification with discrimination power weighting /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20CHANS.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 86-93). Also available in electronic version. Access restricted to campus users.

9

Vlasenko, Andrej. "Studentų emocinės būklės testavimo metu tyrimas panauduojant biometrines technologijas." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120329_153219-37955.

Abstract:

The dissertation investigates the creation of a computer system that uses voice signal features to determine a person's emotional state. A system for measuring pupil diameter is also presented. The main objects of research are emotion recognition from speech and the dynamics of changes in pupil size. The main purpose of the dissertation is to develop methodologies and algorithms for automatically processing and analysing human voice parameters. The resulting algorithms can be used in Stress Management System software. The dissertation also examines the possibilities of identifying a speaker's psychoemotional state by analysing the speaker's voice parameters and the dynamics of pupil size change. The dissertation consists of an introduction, four chapters, conclusions, a list of references and a list of the author's publications on the dissertation topic. The introduction presents the investigated problem, the importance of the thesis and the object of research, and describes the purpose and tasks of the work, the research methodology, its scientific novelty, the practical significance of the results and the defended statements. The introduction ends by presenting the author's publications and conference presentations on the subject of the dissertation and the structure of the dissertation. Chapter 1 presents the Recommended Biometric Stress Management System, founded on the analysis of a person's biometric and physiological features. The System can assist in determining the level of... [see full text]

10

Hartung, Karin. "Biometrical approaches for analysing gene bank evaluation data on barley (Hordeum spec.)." [S.l. : s.n.], 2007. http://nbn-resolving.de/urn:nbn:de:bsz:100-opus-2251.

11

Jagadeesan, Harini. "Design and Verification of Privacy and User Re-authentication Systems." Thesis, Virginia Tech, 2009. http://hdl.handle.net/10919/32394.

Abstract:

In the internet age, privacy and security have become major concerns since an increasing number of transactions are made over an unsecured network. Thus there is a greater chance for private data to be misused. Further, insider attacks can result in loss of valuable data. Hence there arises a strong need for continual, non-intrusive, quick user re-authentication. Previously, a number of studies have been conducted on authentication using behavioral attributes. Currently, few successful re-authentication mechanisms are available since they use either the mouse or the keyboard for re-authentication and target particular applications. However, successful re-authentication still depends on a large number of factors, such as user excitation level and fatigue, and using just the keyboard or the mouse does not mitigate these factors successfully.

Both keyboard and mouse contain valuable, hard-to-duplicate information about the user's behavior. This can be used for analysis and identification of the current user. We propose an application independent system that uses this information for user re-authentication. This system will authenticate the user continually based on his/her behavioral attributes obtained from both the keyboard and mouse operations. This re-authentication system is simple, continual, non-intrusive and easily deployable. To utilize the mouse and keyboard information for re-authentication, we propose a novel heuristic that uses the percentage of mouse-to-keyboard interaction ratio. This heuristic allows us to extract suitable user-behavioral attributes. The extracted data is compared with an already trained database for user re-authentication.

The accuracy of the system is calculated as the ratio of the number of correct identifications to the total number of identifications. At present, the accuracy of the system is around 96% for application based user re-authentication and around 82% for application independent user re-authentication. We perform black box, white box testing and Spec# verification procedures that prove the robustness of the proposed system. On testing POCKET, a privacy protection software for children, it was found that the security of POCKET was inadequate at the user level. Our system enhances POCKET security at the user level and ensures that the child's privacy is protected.
Master of Science
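
The two quantities named in the abstract above, the accuracy measure and the mouse-to-keyboard interaction percentage, can be written down in a few lines. The accuracy formula follows the abstract's own definition. The ratio function is only one plausible reading of "percentage of mouse-to-keyboard interaction ratio"; its name, its inputs (event counts within an observation window) and its exact form are assumptions, not the thesis's definition.

```python
def accuracy(correct_identifications: int, total_identifications: int) -> float:
    """Fraction of identifications that were correct, as defined in the abstract."""
    if total_identifications <= 0:
        raise ValueError("total_identifications must be positive")
    return correct_identifications / total_identifications


def mouse_to_keyboard_percentage(mouse_events: int, keyboard_events: int) -> float:
    """Hypothetical reading of the heuristic: mouse events per keyboard event,
    expressed as a percentage, counted over one observation window."""
    if keyboard_events <= 0:
        raise ValueError("keyboard_events must be positive")
    return 100.0 * mouse_events / keyboard_events
```

Under this reading, a window with 30 mouse events and 120 keystrokes yields 25%, and 96 correct out of 100 identifications yields an accuracy of 0.96.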

12

Mekyska, Jiří. "Identifikace osob pomocí otisku hlasu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2010. http://www.nusl.cz/ntk/nusl-218235.

Abstract:

This work deals with text-dependent speaker recognition in systems where only a few training samples exist. For this purpose, a voice imprint based on different features (e.g. MFCC, PLP, ACW) is proposed. The work first describes how the speech signal is produced and mentions some speech characteristics important for speaker recognition. The next part deals with speech signal analysis, covering preprocessing and feature extraction methods. The following part describes the process of speaker recognition and mentions the evaluation of the methods used for speaker identification and verification. The last theoretical part deals with classifiers suitable for text-dependent recognition: classifiers based on fractional distances, dynamic time warping, dispersion matching and vector quantization. The work concludes with the design and implementation of a system that evaluates all the described classifiers for voice imprints based on different features.
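
Of the classifiers listed in the abstract above, dynamic time warping (DTW) is the easiest to show compactly. The sketch below is a textbook DTW with Euclidean local distance and a nearest-template decision. It is not the thesis's implementation; the function names and the template dictionary are hypothetical, and real systems would feed in per-frame feature vectors such as MFCCs rather than the scalars used in the usage note.

```python
import numpy as np


def dtw_distance(seq_a, seq_b) -> float:
    """Classic DTW cost between two sequences of scalars or feature vectors."""
    a = np.asarray(seq_a, dtype=float)
    b = np.asarray(seq_b, dtype=float)
    if a.ndim == 1:
        a = a[:, None]  # treat scalars as 1-dimensional feature vectors
    if b.ndim == 1:
        b = b[:, None]
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = np.linalg.norm(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor paths.
            cost[i, j] = local + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(cost[n, m])


def identify(test_seq, templates: dict) -> str:
    """Return the label of the enrolled template with the smallest DTW cost."""
    return min(templates, key=lambda name: dtw_distance(test_seq, templates[name]))
```

For example, `identify([1, 2, 3], {"alice": [1, 2, 2, 3], "bob": [5, 5, 5]})` selects `"alice"`, because the repeated `2` in her template is absorbed by the warping path at zero cost.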

13

Kahn, Juliette. "Parole de locuteur : performance et confiance en identification biométrique vocale." Phd thesis, Université d'Avignon, 2011. http://tel.archives-ouvertes.fr/tel-00995071.

Abstract:

This thesis explores the biometric use of speech, which has many applications (security, smart environments, forensics, territorial surveillance, and authentication of electronic transactions). Speech is subject to many constraints that depend on the speaker's origins (geographic, social and cultural) as well as on the speaker's performative goals. The speaker can be regarded as one factor of variation in speech among others. In this work, we present partial answers to the following two questions: (1) Are all speech excerpts from the same speaker equivalent for recognizing that speaker? (2) How are the different sources of variation that directly or indirectly convey speaker specificity structured? We first build a protocol to evaluate the human ability to discriminate a speaker from a speech excerpt, using data from the NIST-HASR 2010 campaign. The task posed in this way is difficult for our listeners, whether naive or more experienced. In this setting, we show that neither the (near-)unanimity of the listeners nor their self-assessment of their judgments guarantees the correctness of the submitted answer. We then quantify the influence of the choice of speech excerpt on the performance of automatic systems. We used two databases, NIST and BREF, and two automatic speaker recognition systems, ALIZE/SpkDet (LIA) and Idento (SRI). The speaker recognition systems, whether based on a UBM-GMM approach or on an i-vector approach, show large performance differences, measured by a rate of variation around the mean EER, Vr (for NIST, VrIdento = 1.41 and VrALIZE/SpkDet = 1.47; for BREF, Vr = 3.11), depending on the choice of training file used for each speaker. These very large performance variations show the sensitivity of automatic systems to the choice of speech excerpts, a sensitivity that must be measured and reduced to make speaker recognition systems more reliable. To explain the importance of the choice of speech excerpts, we look for the cues most relevant for distinguishing the speakers in our corpora by measuring the effect of the Speaker factor on the variance of the cues (h2). F0 depends strongly on the Speaker factor, independently of the vowel. Certain phonemes are more speaker-discriminating: nasal consonants, fricatives, nasal vowels, and close-mid to open oral vowels. This work is a first step toward a more precise study of what the speaker is, both for human perception and for automatic systems. Although we have shown that there is indeed a cepstral difference that leads to more or less effective models, it remains to be understood how to link the speaker to speech production. Finally, following this work, we wish to explore in more detail the influence of language on speaker recognition. Even though our results indicate that in American English and in French the same phoneme categories carry the most speaker information, this point remains to be confirmed and the situation assessed for other languages.

14

Fabík, Vojtěch. "Fantomy pro oftalmologický ultrazvukový systém." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-220047.

Abstract:

In this work we studied ultrasonic imaging systems and their use in ophthalmology, in particular the Nidek 4000 device, and described ophthalmological examination methods. Using the simulation program Field II, we simulated an eye phantom and generated its B-scan and biometry, comparing the effects of different probe centre frequencies and different speeds of sound on the resulting values. We also built phantoms from agarose gel and materials with different properties. On these phantoms we studied the effect of ultrasound velocity on the measurement results and the effect of agarose gel concentration on the velocity of sound, and we created phantoms simulating the human eye. A measurement protocol was created for use in teaching.
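
The dependence of ultrasound biometry on the assumed speed of sound, which the abstract above investigates, follows from the pulse-echo relation: distance = speed × round-trip time / 2. The sketch below works this through for a single echo. The function name and the numeric values in the usage note (a 30 µs round trip, 1550 m/s and 1532 m/s) are illustrative assumptions, not measurements from the thesis.

```python
def echo_distance_mm(round_trip_time_us: float, sound_speed_m_per_s: float) -> float:
    """Reflector depth in millimetres from a pulse-echo round-trip time.

    distance = speed * time / 2, because the pulse travels to the reflector and back.
    (m/s) * (microseconds) gives micrometres, hence the factor 1e-3 for millimetres.
    """
    if round_trip_time_us < 0 or sound_speed_m_per_s <= 0:
        raise ValueError("time must be non-negative and speed positive")
    return sound_speed_m_per_s * round_trip_time_us * 1e-3 / 2.0
```

The same 30 µs echo is read as 23.25 mm at 1550 m/s but as 22.98 mm at 1532 m/s, so a change of about 1% in the assumed speed shifts the measured length by roughly a quarter of a millimetre.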

15

LEE, CHIEN-PENG, and 李建鵬. "Multi-modal Presentation Attacks Detection based on Mouth Dynamic and Speech Biometrics." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/u68nz3.

Abstract:

Master's thesis
National Defense University
Master's Program in Cyber Security
Academic year 107
Biometric technologies are now widely used in daily life thanks to advances in information technology. However, biometric systems still carry a high risk of being deceived; for example, an impostor may pretend to be a legitimate user in order to access the system illegally. This study proposes a countermeasure against the "Video Attack" on face recognition systems, based on a multi-modal method that combines motion detection and speech recognition. Motion is detected over a continuous time span from the mouth aspect ratio (MAR) while the user is talking. The similarity between the talk and the recognized speech is then compared. Score-level fusion is used to fuse these two features, and Decision Tree, Random Forest, k-Nearest Neighbor and Naïve Bayes classifiers are used for classification and testing in the experiments. Experimental results show that the accuracy of the proposed method for Video Attack detection reaches 95.17%. They also show that the proposed multi-modal presentation attack detection method can effectively improve the security of face recognition systems.
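
The mouth aspect ratio (MAR) used for motion detection in the abstract above can be sketched as the mean lip opening divided by the mouth width. The landmark layout below (two mouth corners plus vertically paired upper and lower lip points) and the movement threshold are assumptions made for illustration; the abstract does not state which landmarks or thresholds the study uses.

```python
import math


def mouth_aspect_ratio(left, right, upper, lower) -> float:
    """Mean vertical lip opening divided by mouth width.

    left, right: (x, y) mouth corners.
    upper, lower: equal-length lists of (x, y) lip points, paired vertically.
    """
    width = math.dist(left, right)
    if width == 0 or not upper or len(upper) != len(lower):
        raise ValueError("need a non-degenerate mouth and paired lip points")
    openings = [math.dist(u, l) for u, l in zip(upper, lower)]
    return sum(openings) / (len(openings) * width)


def mouth_is_moving(mar_series, min_range: float = 0.1) -> bool:
    """Hypothetical liveness cue: MAR must vary by at least min_range over the clip."""
    return max(mar_series) - min(mar_series) >= min_range
```

A talking user produces a MAR series that rises and falls from frame to frame, so a small range over the whole clip suggests that the mouth did not move while speech was heard.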

16

Таванець, Назарій Станіславович, and Nazariy Tavanets. "Математичне моделювання мовних сигналів для задач біометричної ідентифікації користувачів." Master's thesis, 2022. http://elartu.tntu.edu.ua/handle/lib/37919.

Abstract:

The thesis is devoted to choosing a mathematical model of speech signals and developing a method of processing them for the task of biometric identification of users. The first section analyzes the state of the problem of biometric identification, in particular identification by speech signal. The second section analyzes known mathematical models of the speech signal and selects a model in the form of a piecewise stationary random process for the task of biometric identification. The third section develops a method of processing speech signals to obtain new informative features for biometric identification.
Contents:
Introduction
1 State of research in biometric identification and authentication
  1.1 The essence of biometric identification and authentication
  1.2 Traditional identification methods
  1.3 Advantages of biometric identification
  1.4 Selected biometric identification methods
    1.4.1 Fingerprint recognition
    1.4.2 Face recognition
    1.4.3 Iris recognition
    1.4.4 Finger vein recognition
    1.4.5 Palm vein pattern recognition
  1.5 Fundamentals of identification by speech signal
  1.6 The essence and types of identification by speech signal
  1.7 Practices of using identification by speech signals
  1.8 Advantages and disadvantages of identification by speech signal
  1.9 Conclusions to Section 1
2 Justification of the choice of a mathematical model of speech signals
  2.1 The nature of speech signals
  2.2 Possibilities of representing speech signals as a stationary random process
  2.3 Choice of a mathematical model of speech signals for the user identification task
  2.4 Conclusions to Section 2
3 Development of a method for identifying users by speech signal
  3.1 Method for identifying a user by speech signal
  3.2 Prospects for using the developed method
  3.3 Conclusions to Section 3
4 Occupational safety and safety in emergency situations
  4.1 Requirements for the premises and workplace when studying speech signals
  4.2 Organization and functioning of the occupational safety management system
Conclusions
List of references


17

Wu, Dalei [Verfasser]. "Discriminative preprocessing of speech : towards improving biometric authentication / vorgelegt von Dalei Wu." 2007. http://d-nb.info/98472317X/34.


18

Adamski, Michal Jerzy. "A speaker recognition solution for identification and authentication." Thesis, 2014. http://hdl.handle.net/10210/11317.


Abstract:

M.Com. (Informatics)
A certain degree of vulnerability exists in traditional knowledge-based identification and authentication access control, as a result of password interception and social engineering techniques. This vulnerability has warranted the exploration of additional identification and authentication approaches such as physical token-based systems and biometrics. Speaker recognition is one such biometric approach that is currently not widely used, due to its inherent technological challenges as well as a scarcity of comprehensive literature and complete open-source projects. This makes it challenging for anyone who wishes to study, develop and improve upon speaker recognition for identification and authentication. In this dissertation, we condense some of the available speaker recognition literature in a manner that provides a comprehensive overall picture of speaker identification and authentication to a wider range of interested audiences. A speaker recognition solution is presented in the form of an open, user-friendly software prototype environment called SRIA (Speaker Recognition Identification Authentication). In SRIA, real users may enrol and perform speaker identification and authentication tasks. SRIA is intended as a platform for understanding speaker recognition and for further research and development.


19

"Text-independent speaker recognition using discriminative subspace analysis." 2012. http://library.cuhk.edu.hk/record=b5549636.


Abstract:

Speaker recognition (SR), which uses the voice to determine a speaker’s identity, is an important and challenging research topic in biometric authentication. Speaker recognition is generally divided into text-dependent and text-independent methods according to the verbal content of the speech signal. It has two major applications. The first is speaker verification, also referred to as speaker authentication, which validates a speaker’s claimed identity from the voice and involves a binary decision. The second is speaker identification, which determines an unknown speaker’s identity from a set of candidate speakers.
In a state-of-the-art speaker recognition system, the speaker model is usually trained by generative methods, which estimate the feature distribution of each speaker from the given data. These methods require a frame-based metric calculation (e.g. probabilities or likelihoods) to reach the final decision, which consumes substantial computing resources and slows real-time response. In addition, many redundant data frames are selected blindly for training when no efficient subspace dimension reduction is applied. To overcome these disadvantages of generative methods and obtain boundary information between individual speakers, we propose applying a discriminative subspace technique for model training and employing simple but efficient distance metrics for decision score calculation.
In this thesis, we present an overview of both conventional and state-of-the-art generative speaker recognition methods (e.g. the Gaussian Mixture Model and Joint Factor Analysis) and analyze their advantages and disadvantages. We also investigate the application of subspace analysis techniques to reduce feature dimensions and computation time. We then propose a novel speaker recognition framework based on nonparametric Fisher’s discriminant analysis, which we name Fishervoice. The objective of the Fishervoice algorithm is to model the intrinsic vocal characteristics in a discriminant subspace, de-emphasizing unwanted noise variations and emphasizing classification boundary information. With the Fishervoice framework, speaker recognition is realized by mapping a test utterance to the Fishervoice subspace and then calculating the score, a simple Euclidean distance, between the test utterance and its reference. We also explore several extensions of the Fishervoice framework for further dimensionality reduction and performance improvement. Furthermore, we investigate various subspace analysis techniques in a low-dimensional total variability space for fast computation. Extensive experiments on two large speaker recognition corpora (XM2VTS and NIST) demonstrate significant improvements of Fishervoice over standard, state-of-the-art approaches for both speaker identification and verification systems.
Detailed summary in vernacular field only.
Jiang, Weiwu.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2012.
Includes bibliographical references (leaves 127-135).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
Abstract
Acknowledgements
Contents
List of Figures
List of Tables
Chapter 1 --- Introduction
Chapter 1.1 --- Overview of Speaker Recognition Systems
Chapter 1.2 --- Motivation
Chapter 1.3 --- Outline of Thesis
Chapter 2 --- Background Study
Chapter 2.1 --- Generative Gaussian Mixture Model (GMM)
Chapter 2.1.1 --- Basic GMM
Chapter 2.1.2 --- The Gaussian Mixture Model-Universal Background Model (GMM-UBM) System
Chapter 2.2 --- Discriminative Subspace Analysis
Chapter 2.2.1 --- Principal Component Analysis
Chapter 2.2.2 --- Linear Discriminant Analysis
Chapter 2.2.3 --- Heteroscedastic Linear Discriminant Analysis
Chapter 2.2.4 --- Locality Preserving Projections
Chapter 2.3 --- Noise Compensation
Chapter 2.3.1 --- Eigenvoice
Chapter 2.3.2 --- Joint Factor Analysis
Chapter 2.3.3 --- Probabilistic Linear Discriminant Analysis
Chapter 2.3.4 --- Nuisance Attribute Projection
Chapter 2.3.5 --- Within-class Covariance Normalization
Chapter 2.4 --- Support Vector Machine
Chapter 2.5 --- Score Normalization
Chapter 2.6 --- Summary
Chapter 3 --- Corpora for Speaker Recognition Experiments
Chapter 3.1 --- Corpora for Speaker Identification Experiments
Chapter 3.1.1 --- XM2VTS Corpus
Chapter 3.1.2 --- NIST Corpora
Chapter 3.2 --- Corpora for Speaker Verification Experiments
Chapter 3.3 --- Summary
Chapter 4 --- Performance Measures for Speaker Recognition
Chapter 4.1 --- Performance Measures for Identification
Chapter 4.2 --- Performance Measures for Verification
Chapter 4.2.1 --- Equal Error Rate
Chapter 4.2.2 --- Detection Error Tradeoff Curves
Chapter 4.2.3 --- Detection Cost Function
Chapter 4.3 --- Summary
Chapter 5 --- The Discriminant Fishervoice Framework
Chapter 5.1 --- The Proposed Fishervoice Framework
Chapter 5.1.1 --- Feature Representation
Chapter 5.1.2 --- Nonparametric Fisher’s Discriminant Analysis
Chapter 5.2 --- Speaker Identification Experiments
Chapter 5.2.1 --- Experiments on the XM2VTS Corpus
Chapter 5.2.2 --- Experiments on the NIST Corpus
Chapter 5.3 --- Summary
Chapter 6 --- Extension of the Fishervoice Framework
Chapter 6.1 --- Two-level Fishervoice Framework
Chapter 6.1.1 --- Proposed Algorithm
Chapter 6.2 --- Performance Evaluation on the Two-level Fishervoice Framework
Chapter 6.2.1 --- Experimental Setup
Chapter 6.2.2 --- Performance Comparison of Different Types of Input Supervectors
Chapter 6.2.3 --- Performance Comparison of Different Numbers of Slices
Chapter 6.2.4 --- Performance Comparison of Different Dimensions of Fishervoice Projection Matrices
Chapter 6.2.5 --- Performance Comparison with Other Systems
Chapter 6.2.6 --- Fusion with Other Systems
Chapter 6.2.7 --- Extension of the Two-level Subspace Analysis Framework
Chapter 6.3 --- Random Subspace Sampling Framework
Chapter 6.3.1 --- Supervector Extraction
Chapter 6.3.2 --- Training Stage
Chapter 6.3.3 --- Testing Procedures
Chapter 6.3.4 --- Discussion
Chapter 6.4 --- Performance Evaluation of the Random Subspace Sampling Framework
Chapter 6.4.1 --- Experimental Setup
Chapter 6.4.2 --- Random Subspace Sampling Analysis
Chapter 6.4.3 --- Comparison with Other Systems
Chapter 6.4.4 --- Fusion with the Other Systems
Chapter 6.5 --- Summary
Chapter 7 --- Discriminative Modeling in Low-dimensional Space
Chapter 7.1 --- Discriminative Subspace Analysis in Low-dimensional Space
Chapter 7.1.1 --- Experimental Setup
Chapter 7.1.2 --- Performance Evaluation on Individual Subspace Analysis Techniques
Chapter 7.1.3 --- Performance Evaluation on Multi-type of Subspace Analysis Techniques
Chapter 7.2 --- Discriminative Subspace Analysis with Support Vector Machine
Chapter 7.2.1 --- Experimental Setup
Chapter 7.2.2 --- Performance Evaluation on LDA+WCCN+SVM
Chapter 7.2.3 --- Performance Evaluation on Fishervoice+SVM
Chapter 7.3 --- Summary
Chapter 8 --- Conclusions and Future Work
Chapter 8.1 --- Contributions
Chapter 8.2 --- Future Directions
Chapter A --- EM Training GMM
Bibliography
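The scoring step described in the abstract of this record (map an utterance into a subspace, then compare it with a reference by distance) can be sketched as follows. This is a minimal illustration under stated assumptions, not the Fishervoice algorithm itself: the projection matrix `W`, the speaker names, the feature vectors and the threshold are all hypothetical, and a real system would learn `W` from training data.

```python
import math

def project(W, x):
    """Map feature vector x into a lower-dimensional subspace: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def euclidean(a, b):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def identify(W, references, test):
    """Speaker identification: pick the enrolled speaker nearest in the subspace."""
    y = project(W, test)
    return min(references,
               key=lambda name: euclidean(project(W, references[name]), y))

def verify(W, reference, test, threshold):
    """Speaker verification: binary accept/reject of a claimed identity."""
    return euclidean(project(W, reference), project(W, test)) <= threshold

# Hypothetical projection that keeps two coordinates and discards the third,
# which stands in here for a nuisance (noise) dimension.
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0]]

# Hypothetical enrolled references and one test utterance (made-up numbers).
references = {"alice": [0.0, 0.0, 5.0], "bob": [3.0, 4.0, -5.0]}
test = [0.5, 0.2, -4.0]

print(identify(W, references, test))
print(verify(W, references["alice"], test, threshold=1.0))
```

Identification compares the test vector against every enrolled reference, whereas verification compares it against a single claimed reference and applies a threshold, which mirrors the two applications distinguished in the abstract.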


20

Hartung, Karin [Verfasser]. "Biometrical approaches for analysing gene bank evaluation data on barley (Hordeum spec.) / presented by Karin Hartung." 2008. http://d-nb.info/987648837/34.

