Doctoral dissertations: "Speaker verification system"

1

Nosratighods, Mohaddeseh. "Robust speaker verification system". University of New South Wales, Electrical Engineering & Telecommunications, 2008. http://handle.unsw.edu.au/1959.4/42796.

Abstract:

Identity verification or biometric recognition systems play an important role in our daily lives. Applications include Automatic Teller Machines (ATM), banking and share information retrieval, and personal verification for credit cards. Among the biometric techniques, authentication of speakers by their voice is of great importance, since it employs a non-invasive approach and is the only available modality in many applications. However, the performance of Automatic Speaker Verification (ASV) systems degrades significantly under adverse conditions which cause recordings from the same speaker to be different. The objective of this research is to investigate and develop robust techniques for performing automatic speaker recognition over various channel conditions, such as telephony and recorded microphone speech. This research is shown to improve the robustness of ASV systems in three main areas: feature extraction, speaker modelling and score normalization. At the feature level, a new set of dynamic features, termed Delta Cepstral Energy (DCE), is proposed instead of traditional delta cepstra. It not only greatly reduces the dimensionality of the feature vector compared with delta and delta-delta cepstra, but is also shown to provide the same performance for matched testing and training conditions on TIMIT and a subset of the NIST 2002 dataset. The concept of speaker entropy, which conveys the information contained in a speaker's speech based on the extracted features, facilitates comparative evaluation of the proposed methods. In addition, Frequency Modulation features are combined in a complementary manner with the Mel Frequency Cepstral Coefficients (MFCCs) to improve the performance of the ASV system under channel variability of various types. The proposed fused system shows a relative reduction of up to 23% in Equal Error Rate (EER) over the MFCC-based system when evaluated on the NIST 2008 dataset.
Currently, the main challenge in speaker modelling is channel variability across different sessions. A recent approach to channel compensation based on Support Vector Machines (SVM) is Nuisance Attribute Projection (NAP). The proposed multi-component approach to NAP attempts to compensate for the main sources of inter-session variation through an additional optimization criterion, to allow more accurate estimates of the most dominant channel artefacts and to improve the system performance under mismatched training and test conditions. Another major issue in speaker recognition is that the variability of score distributions due to incompletely modelled regions of the feature space can produce segments of the test speech that are poorly matched to the claimed speaker model. A segment selection technique in score normalization is proposed that relies only on discriminative and reliable segments of the test utterance to verify the speaker. This approach is particularly useful in noisy conditions where speech activity detection is not reliable at the feature level. Another source of score variability comes from the fact that not all phonemes are equally discriminative. To address this, a new score re-weighting technique is applied to likelihood values based on the discriminative level of each Gaussian component, i.e. each particular region of the feature space. It is found that a limited number of Gaussian mixtures, herein termed discriminative components, are responsible for the overall performance, and that inclusion of the other non-discriminative components may only degrade the system performance.
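The Equal Error Rate quoted in this abstract is the operating point at which the false-acceptance and false-rejection rates coincide. The following is only a minimal sketch of how an EER and a relative reduction can be computed from trial scores; the function names, the threshold sweep and the toy scores are illustrative assumptions and are not taken from the thesis.

```python
def equal_error_rate(target_scores, impostor_scores):
    """Return (eer, threshold) at the threshold where the false-accept
    rate and the false-reject rate are closest to each other."""
    thresholds = sorted(set(target_scores) | set(impostor_scores))
    best = None
    for t in thresholds:
        # A trial is accepted when its score is at or above the threshold.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        frr = sum(s < t for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def relative_reduction(baseline_eer, improved_eer):
    """Fractional improvement of improved_eer over baseline_eer."""
    return (baseline_eer - improved_eer) / baseline_eer


# Toy scores (hypothetical): higher means "more likely the claimed speaker".
targets = [0.9, 0.8, 0.7, 0.35]
impostors = [0.1, 0.2, 0.3, 0.4]
eer, threshold = equal_error_rate(targets, impostors)
```

A relative reduction of 23% means, for instance, that a hypothetical baseline EER of 10% would drop to 7.7%; the abstract does not state the absolute values.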

2

Sarma, Sridevi Vedula. "A segment-based speaker verification system using SUMMIT". Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/43406.

Abstract:

Thesis (M.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.
Includes bibliographical references (p. 75-79).
by Sridevi Vedula Sarma.
M.S.

3

Zhou, Yichao. "Lip password-based speaker verification system with unknown language alphabet". HKBU Institutional Repository, 2018. https://repository.hkbu.edu.hk/etd_oa/562.

Abstract:

Traditional security systems that verify the identity of users based on a password usually face the risk of leaking the password contents. To solve this problem, biometrics such as the face, iris and fingerprint have begun to be widely used to verify the identity of people. However, these biometrics cannot be changed if the database is hacked. What is more, verification systems based on traditional biometrics might be fooled by a fake fingerprint or a photo. Liu and Cheung (Liu and Cheung 2014) have recently initiated the concept of lip password, which is composed of a password embedded in the lip movement and the underlying characteristics of lip motion [26]. Subsequently, a lip password-based system for visual speaker verification has been developed. Such a system is able to detect a target speaker saying the wrong password or an impostor who knows the correct password. That is, only a target user speaking the correct password can be accepted by the system. Nevertheless, it recognizes the lip password with a lip-reading algorithm that needs to know the language alphabet of the password in advance, which may limit its applications. To tackle this problem, in this thesis we study the lip password-based visual speaker verification system with unknown language alphabet. First, we propose a method to verify the lip password based on the key frames of lip movement instead of recognizing the individual password elements, so that the lip password can be verified without knowing the password alphabet beforehand. To detect these key frames, we extract the lip contours and detect the intervals of interest where the lip contours vary significantly. Moreover, to avoid accurate alignment of feature sequences or detection of mouth status, which is computationally expensive, we design a novel overlapping subsequence matching approach to encode the information in lip passwords in the system.
This technique works by sampling the feature sequences extracted from lip videos into overlapping subsequences and matching them individually. The log-likelihoods of the subsequences form the final feature of the sequence and are verified by the Euclidean distance to positive sample centers. We evaluate the two proposed methods on a database that contains a total of 8 kinds of lip passwords, including English digits and Chinese phrases. Experimental results show the superiority of the proposed methods for visual speaker verification. Next, we propose a novel visual speaker verification approach based on diagonal-like pooling and a pyramid structure of lips. We take advantage of the diagonal structure of sparse representation to preserve the temporal order of lip sequences by employing a diagonal-like mask in the pooling stage, and build pyramid spatiotemporal features containing the structural characteristics of the lip password. This approach eliminates the requirement of segmenting the lip password into words or visemes. Consequently, a lip password in any language can be used for visual speaker verification. Experiments show the efficacy of the proposed approach compared with state-of-the-art ones. Additionally, to further evaluate the system, we also develop a prototype of the lip password-based visual speaker verification system. The prototype has a Graphical User Interface (GUI) that makes it easy for users to access.

4

Mtibaa, Aymen. "Towards robust and privacy-preserving speaker verification systems". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAS002.

Abstract:

Speaker verification systems are a key technology in many devices and services such as smartphones, intelligent digital assistants and banking applications. During the COVID-19 pandemic, access control systems based on fingerprint readers or keypads increase the risk of spreading the virus. Consequently, companies are now rethinking their employee access control systems and considering contactless authorization technologies such as speaker verification systems. However, speaker verification systems require the access system to store the speaker models and to have access to the recordings, or to features derived from the speakers' voices, during authentication. This process raises concerns about user privacy and the protection of this sensitive biometric data. An adversary can steal the speakers' biometric information to impersonate the genuine user and gain unauthorized access. Moreover, voice data raise additional confidentiality and privacy problems, because several pieces of personal information related to the speaker's identity, gender, age or health status can be extracted from them. In this context, this doctoral thesis addresses biometric data protection, privacy and security for speaker verification systems based on Gaussian mixture models (GMM), i-vectors and x-vectors as speaker modelling. The objective is to develop speaker verification systems that perform biometric verification while respecting the user's privacy and protecting the user's biometric data.
To that end, we proposed biometric protection schemes to meet the biometric data protection requirements (revocability, diversity and irreversibility) described in the ISO/IEC IS 24745 standard and to improve the robustness of the systems against different attack scenarios.
Speaker verification systems are a key technology in many devices and services like smartphones, intelligent digital assistants, healthcare, and banking applications. Additionally, with the COVID pandemic, access control systems based on fingerprint scanners or keypads increase the risk of virus propagation. Therefore, companies are now rethinking their employee access control systems and considering touchless authorization technologies, such as speaker verification systems. However, speaker verification systems require users to transmit their recordings, features, or models derived from their voice samples, without any obfuscation, over untrusted public networks to be stored and processed on a cloud-based infrastructure. If the system is compromised, an adversary can use this biometric information to impersonate the genuine user and extract personal information. The voice samples may contain information about the user's gender, accent, ethnicity, and health status, which raises several privacy issues. In this context, the present PhD thesis addresses the privacy and security issues for speaker verification systems based on Gaussian mixture models (GMM), i-vector, and x-vector as speaker modeling. The objective is the development of speaker verification systems that perform biometric verification while preserving the privacy and the security of the user. To that end, we proposed biometric protection schemes for speaker verification systems to achieve the privacy requirements (revocability, unlinkability, irreversibility) described in the standard ISO/IEC IS 24745 on biometric information protection and to improve the robustness of the systems against different attack scenarios.

5

Li, Yi. "Speaker Diarization System for Call-center data". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286677.

Abstract:

To answer the question "who spoke when", speaker diarization (SD) is a critical step for many speech applications in practice. The task of our project is building an MFCC-vector-based speaker diarization system on top of a speaker verification (SV) system, which is an existing call-center application that checks the customer's identity from a phone call. Our speaker diarization system uses 13-dimensional MFCCs as features and performs Voice Activity Detection (VAD), segmentation, linear clustering, and hierarchical clustering based on GMM and the BIC score. By applying it, we decrease the Equal Error Rate (EER) of the SV system from 18.1% in the baseline experiment to 3.26% on general call-center conversations. To better analyze and evaluate the system, we also simulated a set of call-center data based on the public ICSI corpus audio database.
To answer the question of who spoke when, speaker diarization (SD) is in practice a critical step for many speech applications. The aim of our project is to build an MFCC-vector-based speaker diarization system on top of a speaker verification (SV) system, an existing call-center program for checking the customer's identity from a phone call. Our diarization system uses 13-dimensional MFCCs as features and performs Voice Activity Detection (VAD), segmentation, linear clustering, and hierarchical clustering based on GMM and the BIC score. By applying it, we reduce the EER (Equal Error Rate) from 18.1% in the baseline experiment to 3.26% for general call-center conversations. To better analyze and evaluate the system, we also simulated a set of call-center data based on the public ICSI corpus audio databases.
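BIC-based clustering, as mentioned in this abstract, decides whether two speech segments are better modelled together or separately. The sketch below is a simplified illustration only: it assumes one-dimensional features and a single Gaussian per segment, whereas the thesis works with 13-dimensional MFCCs and GMMs. The function name `delta_bic`, the `penalty_weight` parameter and the toy data are hypothetical.

```python
import math
from statistics import pvariance


def delta_bic(seg_a, seg_b, penalty_weight=1.0):
    """Delta-BIC between modelling two 1-D segments with two Gaussians
    versus one shared Gaussian. A positive value favours keeping the
    segments apart (different speakers); a negative value favours merging.
    Segments need non-zero variance, otherwise the logarithm is undefined."""
    n_a, n_b = len(seg_a), len(seg_b)
    n = n_a + n_b
    # Maximum-likelihood (population) variances of each hypothesis.
    var_all = pvariance(seg_a + seg_b)
    var_a = pvariance(seg_a)
    var_b = pvariance(seg_b)
    gain = 0.5 * (n * math.log(var_all)
                  - n_a * math.log(var_a)
                  - n_b * math.log(var_b))
    # The two-model hypothesis has two extra free parameters
    # (one mean and one variance) in the 1-D case.
    penalty = 0.5 * penalty_weight * 2 * math.log(n)
    return gain - penalty


speaker_1 = [0.0, 0.1, -0.1, 0.05, -0.05]
speaker_1_again = [0.02, -0.08, 0.07, -0.03, 0.04]
speaker_2 = [5.0, 5.1, 4.9, 5.05, 4.95]

should_split = delta_bic(speaker_1, speaker_2) > 0        # clearly different
should_merge = delta_bic(speaker_1, speaker_1_again) < 0  # similar statistics
```

In agglomerative clustering, the pair of clusters with the lowest delta-BIC would typically be merged first, and merging stops once every remaining pair has a positive value.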

6

Guo, Yunfei. "Personalized Voice Activated Grasping System for a Robotic Exoskeleton Glove". Thesis, Virginia Tech, 2021. http://hdl.handle.net/10919/101751.

Abstract:

Controlling an exoskeleton glove with a highly efficient human-machine interface (HMI) while accurately applying force to each joint remains a hot topic. This paper proposes a fast, secure, accurate, and portable solution to control an exoskeleton glove. This state-of-the-art solution includes both hardware and software components. The exoskeleton glove uses a modified serial elastic actuator (SEA) to achieve accurate force sensing. A portable electronic system is designed based on the SEA to allow force measurement, force application, slip detection, cloud computing, and a power supply to provide over 2 hours of continuous usage. A voice-control-based HMI, referred to as the integrated trigger-word configurable voice activation and speaker verification system (CVASV), is integrated into a robotic exoskeleton glove to perform high-level control. The CVASV HMI is designed for embedded systems with limited computing power to perform voice activation and voice verification simultaneously. The system uses MobileNet as the feature extractor to reduce computational cost. The HMI is tuned to allow better performance in grasping daily objects. This study focuses on applying the CVASV HMI to the exoskeleton glove to perform a stable grasp with force control and slip detection using an SEA-based exoskeleton glove. This research found that using MobileNet as the speaker verification neural network can increase the speed of processing while maintaining similar verification accuracy.
Master of Science
The robotic exoskeleton glove used in this research is designed to help patients with hand disabilities. This thesis proposes a voice-activated grasping system to control the exoskeleton glove. Here, the user can use a self-defined keyword to activate the exoskeleton and use voice to control the exoskeleton. The voice command system can distinguish between different users' voices, thereby improving the safety of the glove control. A smartphone is used to process the voice commands and send them to an onboard computer on the exoskeleton glove. The exoskeleton glove then accurately applies force to each fingertip using a force feedback actuator. This study focused on designing a state-of-the-art human-machine interface to control an exoskeleton glove and perform an accurate and stable grasp.

7

Bekli, Zeid, and William Ouda. "A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model". Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20122.

Abstract:

Voice recognition has become a more focused and researched field in the last century, and new techniques to identify speech have been introduced. A part of voice recognition is speaker verification, which is divided into a front-end and a back-end. The first component is the front-end, or feature extraction, where techniques such as Mel-Frequency Cepstrum Coefficients (MFCC) are used to extract the speaker-specific features of a speech signal. MFCC is mostly used because it is based on the known variations of the human ear's critical frequency bandwidth. The second component is the back-end, which handles the speaker modeling. The back-end is based on the Gaussian Mixture Model (GMM) and Gaussian Mixture Model-Universal Background Model (GMM-UBM) methods for enrollment and verification of the specific speaker. In addition, normalization techniques such as Cepstral Mean Subtraction (CMS) and feature warping are also used for robustness against noise and distortion. In this paper, we build a speaker verification system and experiment with varying the amount of training data for the true speaker model in order to evaluate the system performance. To further investigate the area of security in a speaker verification system, two methods (GMM and GMM-UBM) are compared to determine which is more secure depending on the amount of training data available. This research therefore contributes to understanding how much data is really necessary for a secure system where the False Positive rate is as close to zero as possible, how the amount of training data affects the False Negative (FN) rate, and how this differs between GMM and GMM-UBM. The results show that an increase in speaker-specific training data will increase the performance of the system.
However, too much training data proved to be unnecessary, because the performance of the system eventually reaches its highest point, in this case at around 48 minutes of data. The results also show that the GMM-UBM models trained on 48 to 60 minutes of data outperformed the GMM models.
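In a GMM-UBM back-end of the kind this abstract describes, a verification score is commonly computed as the average log-likelihood ratio between the claimed speaker's model and the universal background model. The sketch below is a hedged illustration with scalar features and hand-picked model parameters; in practice the features would be MFCC vectors and the speaker model would usually be adapted from the UBM. All names and values are assumptions, not the authors' implementation.

```python
import math


def gmm_log_likelihood(x, weights, means, variances):
    """Log-likelihood of a scalar x under a 1-D Gaussian mixture,
    computed with the log-sum-exp trick for numerical stability."""
    logs = [
        math.log(w) - 0.5 * (math.log(2 * math.pi * v) + (x - m) ** 2 / v)
        for w, m, v in zip(weights, means, variances)
    ]
    top = max(logs)
    return top + math.log(sum(math.exp(value - top) for value in logs))


def llr_score(frames, speaker_gmm, ubm):
    """Average per-frame log-likelihood ratio: speaker model versus UBM.
    Each model is a (weights, means, variances) tuple."""
    total = sum(
        gmm_log_likelihood(x, *speaker_gmm) - gmm_log_likelihood(x, *ubm)
        for x in frames
    )
    return total / len(frames)


# Hypothetical two-component models.
ubm = ([0.5, 0.5], [-1.0, 1.0], [1.0, 1.0])
speaker = ([0.5, 0.5], [0.5, 1.5], [1.0, 1.0])

genuine_score = llr_score([0.9, 1.1, 1.4], speaker, ubm)      # frames near the speaker model
impostor_score = llr_score([-1.2, -0.8, -1.0], speaker, ubm)  # frames far from it
```

The claim is accepted when the score exceeds a decision threshold; moving that threshold trades false positives against false negatives, which is the trade-off the abstract studies as a function of training data.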

8

Shou-Chun, Yin 1980. "Speaker adaptation in joint factor analysis based text independent speaker verification". Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=100735.

Abstract:

This thesis presents methods for supervised and unsupervised speaker adaptation of Gaussian mixture speaker models in text-independent speaker verification. The proposed methods are based on an approach which is able to separate speaker and channel variability so that progressive updating of speaker models can be performed while minimizing the influence of the channel variability associated with the adaptation recordings. This approach relies on a joint factor analysis model of intrinsic speaker variability and session variability where inter-session variation is assumed to result primarily from the effects of the transmission channel. These adaptation methods have been evaluated under the adaptation paradigm defined under the NIST 2005 speaker recognition evaluation plan which is based on conversational telephone speech.

9

Chan, Siu Man. "Improved speaker verification with discrimination power weighting /". View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20CHANS.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 86-93). Also available in electronic version. Access restricted to campus users.

10

Cilliers, Francois Dirk. "Tree-based Gaussian mixture models for speaker verification". Thesis, Link to the online version, 2005. http://hdl.handle.net/10019.1/1639.

11

Wan, Qianhui. "Speaker Verification Systems Under Various Noise and SNR Conditions". Thesis, Université d'Ottawa / University of Ottawa, 2017. http://hdl.handle.net/10393/36888.

Abstract:

In speaker verification, mismatches between the training speech and the testing speech can greatly affect the robustness of classification algorithms; these mismatches are mainly caused by changes in the noise types and the signal-to-noise ratios. This thesis aims at finding the most robust classification methods under multi-noise and multiple signal-to-noise ratio conditions. Several well-known state-of-the-art classification algorithms and features in speaker verification are compared by examining the performance of a small-set speaker verification system (e.g. a voice lock for a family). The effect of the testing speech length is also examined. The i-vector/Probabilistic Linear Discriminant Analysis method with compensation strategies is shown to provide stable performance for both previously seen and previously unseen noise scenarios, and a C++ implementation with online processing and multi-threading is developed for this approach.

12

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition". Thesis, Queensland University of Technology, 2001.

13

Phythian, Mark. "Speaker identification for forensic applications". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36079/3/__qut.edu.au_Documents_StaffHome_StaffGroupR%24_rogersjm_Desktop_36079_Digitised%20Thesis.pdf.

Abstract:

A major application of Speaker Identification (SI) is suspect identification by voice. This thesis investigates techniques that can be used to improve SI technology as applied to suspect identification. Speech Coding techniques have become integrated into many of our modern voice communications systems. This prompts the question - how are automatic speaker identification systems and modern forensic identification techniques affected by the introduction of digitally coded speech channels? Presented in this thesis are three separate studies investigating the effects of speech coding and compression on current speaker recognition techniques. A relatively new Spectral Analysis technique - Higher Order Spectral Analysis (HOSA) - has been identified as a potential candidate for improving some aspects of forensic speaker identification tasks. Presented in this thesis is a study investigating the application of HOSA to improve the robustness of current ASR techniques in the presence of additive Gaussian noise. Results from our investigations reveal that incremental improvements in each of these aspects related to automatic and forensic identification are achievable.

14

Slomka, Stefan. "Multiple classifier structures for automatic speaker recognition under adverse conditions". Thesis, Queensland University of Technology, 1999.

15

Leis, John W. "Spectral coding methods for speech compression and speaker identification". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36062/7/36062_Digitised_Thesis.pdf.

Abstract:

This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification.
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
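The vector quantization idea described in this abstract can be illustrated with split VQ, one simple product-code structure in which a parameter vector is cut into sub-vectors and each sub-vector is matched against its own small codebook. This is only a sketch under that assumption: the abstract does not specify the thesis's codebook structure, and the codebooks, split points and function names below are hypothetical.

```python
import math


def nearest_index(vector, codebook):
    """Index of the codeword with the smallest squared Euclidean distance."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
    )


def split_vq_encode(vector, codebooks, split_points):
    """Quantize each sub-vector with its own codebook and return one
    codeword index per sub-vector."""
    bounds = [0] + list(split_points) + [len(vector)]
    parts = [vector[bounds[i]:bounds[i + 1]] for i in range(len(bounds) - 1)]
    return [nearest_index(part, cb) for part, cb in zip(parts, codebooks)]


def bits_per_frame(codebooks):
    """Bits needed to transmit one index per codebook without entropy coding."""
    return sum(math.ceil(math.log2(len(cb))) for cb in codebooks)


def bit_rate(bits, frame_ms=20):
    """Bits per second for a given number of bits per frame."""
    return bits * 1000 / frame_ms


# Two toy codebooks with four 2-D codewords each (2 bits per index).
low_band = [[0, 0], [0, 1], [1, 0], [1, 1]]
high_band = [[2, 2], [2, 3], [3, 2], [3, 3]]

indices = split_vq_encode([0.1, 0.9, 2.1, 2.9], [low_band, high_band], [2])
frame_bits = bits_per_frame([low_band, high_band])
```

For scale, the 24 bits per 20 ms frame cited in the abstract corresponds to 1200 bit/s, and 20 bits per frame to 1000 bit/s. Lossless coding of the index stream, as the abstract proposes, can lower the average further without changing the codebooks.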

16

Barger, Peter James. "Speech processing for forensic applications". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36081/1/36081_Barger_1998.pdf.

Abstract:

This thesis examines speech processing systems appropriate for use in forensic analysis. The need for automatic speech processing systems for forensic use is justified by the increasing use of electronically recorded speech for communication. An automatic speaker identification and verification system is described which was tested on data gathered by the Queensland Police Force. Speaker identification using Gaussian mixture models (GMMs) is shown to be useful as an indicator of identity, but not sufficiently accurate to be used as the sole means of identification. It is shown that training GMMs on speech of one language and testing on speech of another language introduces significant bias into the results, which is unpredictable in its effects. This has implications for the performance of the system on subjects attempting to disguise their voices. Automatic gender identification systems are shown to be highly accurate, attaining 98% accuracy, even with very simple classifiers, and when tested on speech degraded by coding or reverberation. These gender gates are useful as initial classifiers in a larger speaker classification system and may even find independent use in a forensic environment. A dual microphone method of improving the performance of speaker identification systems in noisy environments is described. The method gives a significant improvement in log-likelihood scores when its output is used as input to a GMM. This implies that speaker identification tests may be improved in accuracy. A method of automatically assessing the quality of transmitted speech segments using a classification scheme is described. By classifying the difference between cepstral parameters describing the original speech and the transmitted speech, an estimate of the speech quality is obtained.

17

PINHEIRO, Hector Natan Batista. "Verificação de locutores independente de texto: uma análise de robustez a ruído". Universidade Federal de Pernambuco, 2015. https://repositorio.ufpe.br/handle/123456789/18045.

Abstract:

Previous issue date: 2015-02-25
The process of identifying a given individual is carried out millions of times every day by organizations in a wide range of sectors. Questions such as "Who is this individual?" or "Is this person who they claim to be?" are frequently asked by financial organizations, health systems, e-commerce systems, telecommunication systems and government institutions. Biometric identification is the process of performing this identification from physical or behavioral characteristics. Such characteristics are commonly referred to as biometric characteristics, and examples include face, fingerprint, iris, signature and voice. Speaker recognition is a biometric modality that aims to perform personal identification using only the information present in the individual's voice. This work focuses on the development of text-independent speaker verification systems. The main challenge in developing these systems comes from the so-called mismatches that may occur in the acquisition of the speech signals. The techniques proposed to mitigate them are called compensation techniques, and they can operate in three domains: in the feature extraction process, in the construction of the speaker models and in the computation of the system's final score. Besides presenting an extensive review of the literature on the development of text-independent speaker verification systems, this work also presents the main feature, model and score compensation techniques. In the experimental phase, a comparative analysis of the main techniques proposed in the literature is presented. In addition, two compensation techniques are proposed, one in the modeling domain and the other in the score domain.
The proposed score compensation technique is based on the cumulative normal distribution and, in some contexts, produced better results than the main techniques in the literature. The model compensation technique, in turn, is based on a technique from the literature that combines two concepts: multi-condition training and Missing Data Theory. The formulation presented by its authors is based on so-called Posterior Union Models, but it is not fully suitable for text-independent speaker verification. This work presents a formulation appropriate for this context that combines the two concepts used by those authors with a type of modeling using UBMs (Universal Background Models). The proposed technique showed performance gains when compared with the standard GMM-UBM technique, based on Gaussian Mixture Models (GMMs).
The personal identification of individuals is a task executed millions of times every day by organizations from diverse fields. Questions such as "Who is this individual?" or "Is this person who he or she claims to be?" are constantly made by organizations in financial services, health care, e-commerce, telecommunication systems and governments. Biometric identification is the process of identifying people using their physiological or behavioral characteristics. These characteristics are generally known as biometrics and examples of these include face, fingerprint, iris, handwriting and speech. Speaker recognition is a biometric modality which makes the personal identification by using speaker-specific information from the speech. This work focuses on the development of text-independent speaker verification systems. In these systems, speech from an individual is used to verify the claimed identity of that individual. Furthermore, the verification must occur independently of the pronounced word or phrase. The main challenge in the development of speaker recognition systems comes from the mismatches which may occur in the acquisition of the speech signals. The techniques proposed to mitigate the mismatch effects are referred as compensation methods. They may operate in three domains: in the feature extraction process, in the estimation of the speaker models and in the computation of the decision score. Besides presenting a wide description of the main techniques used in the development of text-independent speaker verification systems, this work presents the description of the main feature-, model- and score-based compensation methods. In the experiments, this work shows comprehensive comparisons between the conventional techniques and the alternatively compensations methods. Furthermore, two compensation methods are proposed: one operates in the model domain and the other in the score-domain. 
The scoredomain proposed compensation method is based on the Normal cumulative distribution function and, in some contexts, outperformed the performance of the main score-domain compensation techniques. On the other hand, the model-domain compensation technique proposed in this work is based on a method presented in the literature which combines two concepts: the multi-condition training and the Missing Data Theory. The formulation proposed by the authors is based on the Posterior Union models and is not completely appropriate for the text-independent speaker verification task. This work proposes a more appropriate formulation for this context which combines the concepts used by the authors with a type of modeling using Universal Background Models (UBMs). The proposed method outperformed the usual GMM-UBM modeling technique, based on Gaussian Mixture Models (GMMs).
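As a rough illustration of what a score-domain compensation built on the normal cumulative distribution function might look like, the sketch below fits a normal distribution to impostor scores for a claimed speaker model and maps a raw verification score through its CDF. The abstract does not give the actual formulation, so the function name `normal_cdf_norm`, the use of impostor scores, and the sample values are assumptions made only for this example.

```python
from statistics import NormalDist, mean, stdev

def normal_cdf_norm(raw_score, impostor_scores):
    """Map a raw score to (0, 1) via the normal CDF of impostor scores.

    Hypothetical helper: fits a normal distribution to the impostor
    scores and returns P(impostor score <= raw_score).
    """
    dist = NormalDist(mean(impostor_scores), stdev(impostor_scores))
    return dist.cdf(raw_score)

# Made-up impostor scores for one claimed speaker model.
impostors = [-1.2, -0.8, -1.0, -0.5, -1.5]
normalized = normal_cdf_norm(0.4, impostors)
```

A score far above the impostor distribution maps close to 1, while a score equal to the impostor mean maps to 0.5, so a single threshold can be shared across speaker models.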

18

Lucey, Simon. "Audio-visual speech processing". Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract:

Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g., acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. 
The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.

19

"Text-independent bilingual speaker verification system". 2003. http://library.cuhk.edu.hk/record=b5891732.

Abstract:

Ma Bin.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 96-102).
Abstracts in English and Chinese.
Abstract
Acknowledgement
Chapter 1 --- Introduction
Chapter 1.1 --- Biometrics
Chapter 1.2 --- Speaker Verification
Chapter 1.3 --- Overview of Speaker Verification Systems
Chapter 1.4 --- Text Dependency
Chapter 1.4.1 --- Text-Dependent Speaker Verification
Chapter 1.4.2 --- GMM-based Speaker Verification
Chapter 1.5 --- Language Dependency
Chapter 1.6 --- Normalization Techniques
Chapter 1.7 --- Objectives of the Thesis
Chapter 1.8 --- Thesis Organization
Chapter 2 --- Background
Chapter 2.1 --- Background Information
Chapter 2.1.1 --- Speech Signal Acquisition
Chapter 2.1.2 --- Speech Processing
Chapter 2.1.3 --- Engineering Model of Speech Signal
Chapter 2.1.4 --- Speaker Information in the Speech Signal
Chapter 2.1.5 --- Feature Parameters
Chapter 2.1.5.1 --- Mel-Frequency Cepstral Coefficients
Chapter 2.1.5.2 --- Linear Predictive Coding Derived Cepstral Coefficients
Chapter 2.1.5.3 --- Energy Measures
Chapter 2.1.5.4 --- Derivatives of Cepstral Coefficients
Chapter 2.1.6 --- Evaluating Speaker Verification Systems
Chapter 2.2 --- Common Techniques
Chapter 2.2.1 --- Template Model Matching Methods
Chapter 2.2.2 --- Statistical Model Methods
Chapter 2.2.2.1 --- HMM Modeling Technique
Chapter 2.2.2.2 --- GMM Modeling Techniques
Chapter 2.2.2.3 --- Gaussian Mixture Model
Chapter 2.2.2.4 --- The Advantages of GMM
Chapter 2.2.3 --- Likelihood Scoring
Chapter 2.2.4 --- General Approach to Decision Making
Chapter 2.2.5 --- Cohort Normalization
Chapter 2.2.5.1 --- Probability Score Normalization
Chapter 2.2.5.2 --- Cohort Selection
Chapter 2.3 --- Chapter Summary
Chapter 3 --- Experimental Corpora
Chapter 3.1 --- The YOHO Corpus
Chapter 3.1.1 --- Design of the YOHO Corpus
Chapter 3.1.2 --- Data Collection Process of the YOHO Corpus
Chapter 3.1.3 --- Experimentation with the YOHO Corpus
Chapter 3.2 --- CUHK Bilingual Speaker Verification Corpus
Chapter 3.2.1 --- Design of the CUBS Corpus
Chapter 3.2.2 --- Data Collection Process for the CUBS Corpus
Chapter 3.3 --- Chapter Summary
Chapter 4 --- Text-Dependent Speaker Verification
Chapter 4.1 --- Front-End Processing on the YOHO Corpus
Chapter 4.2 --- Cohort Normalization Setup
Chapter 4.3 --- HMM-based Speaker Verification Experiments
Chapter 4.3.1 --- Subword HMM Models
Chapter 4.3.2 --- Experimental Results
Chapter 4.3.2.1 --- Comparison of Feature Representations
Chapter 4.3.2.2 --- Effect of Cohort Normalization
Chapter 4.4 --- Experiments on GMM-based Speaker Verification
Chapter 4.4.1 --- Experimental Setup
Chapter 4.4.2 --- The number of Gaussian Mixture Components
Chapter 4.4.3 --- The Effect of Cohort Normalization
Chapter 4.4.4 --- Comparison of HMM and GMM
Chapter 4.5 --- Comparison with Previous Systems
Chapter 4.6 --- Chapter Summary
Chapter 5 --- Language- and Text-Independent Speaker Verification
Chapter 5.1 --- Front-End Processing of the CUBS
Chapter 5.2 --- Language- and Text-Independent Speaker Modeling
Chapter 5.3 --- Cohort Normalization
Chapter 5.4 --- Experimental Results and Analysis
Chapter 5.4.1 --- Number of Gaussian Mixture Components
Chapter 5.4.2 --- The Cohort Normalization Effect
Chapter 5.4.3 --- Language Dependency
Chapter 5.4.4 --- Language-Independency
Chapter 5.5 --- Chapter Summary
Chapter 6 --- Conclusions and Future Work
Chapter 6.1 --- Summary
Chapter 6.1.1 --- Feature Comparison
Chapter 6.1.2 --- HMM Modeling
Chapter 6.1.3 --- GMM Modeling
Chapter 6.1.4 --- Cohort Normalization
Chapter 6.1.5 --- Language Dependency
Chapter 6.2 --- Future Work
Chapter 6.2.1 --- Feature Parameters
Chapter 6.2.2 --- Model Quality
Chapter 6.2.2.1 --- Variance Flooring
Chapter 6.2.2.2 --- Silence Detection
Chapter 6.2.3 --- Conversational Speaker Verification
Bibliography

20

Huang, Wei-Hsun, and 黃威勛. "A Study of Speaker Verification System". Thesis, 2008. http://ndltd.ncl.edu.tw/handle/59811329629420223200.

Abstract:

碩士
大葉大學
電信工程學系碩士班
96
The main purpose of speaker verification is to identify the speaker from information in the voice signal, which requires many processing steps for a computer to capture the differences between signals. In this thesis, Mel-frequency cepstral coefficients (MFCCs) are used as voice features because they match the characteristics of human pronunciation and hearing. The Gaussian mixture model is widely used in text-independent speaker verification. However, differences between speakers' voices are caused not only by different oral cavity shapes and vocal cords but also by articulation speed. Because the Gaussian mixture model does not consider differences in articulation speed, a high-order ergodic Gaussian model is adopted in this thesis to implement the text-independent speaker verification system. The two models are tested under the same conditions, and the results show that the high-order ergodic Gaussian model improves performance: the equal error rate is reduced by 3.8%.
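Since the comparison above is reported as an equal error rate, a minimal sketch of how an EER can be estimated from trial scores may help. It sweeps every observed score as a threshold and reports the point where the false-accept and false-reject rates are closest. The function name and the toy scores are assumptions for illustration and are not taken from the thesis.

```python
def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping thresholds over all observed scores.

    Returns the average of the false-accept and false-reject rates at the
    threshold where the two rates are closest.
    """
    best_gap, eer = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)  # impostors accepted
        frr = sum(s < t for s in genuine) / len(genuine)     # genuine trials rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

# Made-up scores: higher means "more likely the claimed speaker".
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.3, 0.5, 0.2]
eer = equal_error_rate(genuine_scores, impostor_scores)
```

With these toy scores one genuine trial and one impostor trial overlap, so both error rates meet at one in four.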

21

Shiy, Zhi-Hong, and 許志宏. "A Study of Speaker Verification System". Thesis, 2007. http://ndltd.ncl.edu.tw/handle/30030484954050201539.

Abstract:

碩士
清雲科技大學
電機工程研究所
95
With the progress of technology, speech recognition has matured enough for use in daily life. Speaker recognition, a subfield of speech recognition, is often used in security systems. There are many different speaker recognition techniques, each with its strengths and drawbacks. In this thesis, we discuss the differences between these methods. Chapter one introduces the background, motivation, and outline of the thesis. Chapter two discusses the signal processing steps that precede recognition, including framing, endpoint detection, Hamming windowing and feature selection. Chapter three explains two matching methods: dynamic programming and vector quantization. Chapter four discusses the experimental results, which were obtained using Matlab. In our experiments, the speaker features are Mel-frequency parameters, and we vary the experimental parameters to observe their influence on the verification rate. The parameters are the number of samples in each frame, the number of overlapping samples between frames, and the dimension of the Mel-frequency features.

22

劉耀隆. "Parameter choice for speaker verification system and imitating voiceprint verification". Thesis, 2005. http://ndltd.ncl.edu.tw/handle/grmqju.

23

Yu-hong, Li, and 李昱鴻. "An Enhanced Text-Independent Speaker Verification System". Thesis, 2005. http://ndltd.ncl.edu.tw/handle/90462488993922482956.

Abstract:

碩士
國立中興大學
電機工程學系
93
Speaker verification is an important technique in security and crime monitoring. In this thesis, we propose two algorithms to enhance a traditional text-independent speaker verification system. First, an entropy-based algorithm is used for endpoint detection and to determine the test utterance length. Next, a normalized background model is proposed to enhance the verification rate; it differs from the traditional background model in its reduced computation. Experimental results demonstrate that the proposed algorithms are effective for text-independent speaker verification: the normalized background model yields a more compact score distribution and a lower equal error rate, and the results for the entropy-based algorithm show that the improved feature can be used successfully in noisy environments.

24

xing-min-lin and 林幸民. "A Robust Text Dependent Speaker Verification System". Thesis, 2004. http://ndltd.ncl.edu.tw/handle/73367737259637336248.

Abstract:

碩士
國立中興大學
電機工程學系
92
With the prosperity of the computer industry, people have higher demands for security. Speaker verification with a high recognition rate and low cost is therefore indispensable. In general, in a quiet laboratory, speaker verification can reach a high recognition rate. However, the recognition rate can vary a lot across different channels. Improving the recognition rate across different channels is therefore the major issue of this thesis. In the thesis, volume normalization and cepstral normalization are added to increase the speaker verification rate. We tested voice data in quiet and noisy environments, and also tested speech over different channels. Simulation results show that cepstral normalization reduces the channel effect and increases the speaker verification rate. Volume normalization also improves the speaker verification rate in quiet environments.

25

Chung-Ying, Hsieh. "An Improved Speaker Verification System Using Orthogonal GMM". 2006. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0005-1508200621115700.

26

Li, An-Chi, and 李安基. "Noise Reduction for Text Dependent Speaker Verification System". Thesis, 2008. http://ndltd.ncl.edu.tw/handle/28242714521701538505.

Abstract:

碩士
國立中興大學
電機工程學系所
96
Speaker verification systems have matured and their range of applications keeps expanding. Raising the recognition rate is the key issue in speech recognition. In this thesis, we use several noise reduction methods to reduce noise in speech and raise the recognition rate. Two major methods are used to reduce noise for text-dependent speaker verification. Speaker verification experiments were conducted on speech signals taken from the MMLab database, NCHU, using 100 speakers (50 male, 50 female). The tests show that the cepstral mean subtraction (CMS) noise reduction method effectively increases the speaker verification rate. Adding the cepstral weighting (CW) noise reduction method can improve the verification performance.
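Cepstral mean subtraction removes the utterance-level mean of each cepstral coefficient, which cancels a constant offset such as a stationary channel effect. The sketch below shows the operation on plain Python lists. The function name and the toy feature values are assumptions for illustration and do not come from the thesis.

```python
def cepstral_mean_subtraction(frames):
    """Subtract the per-coefficient mean over all frames of one utterance."""
    n = len(frames)
    dims = len(frames[0])
    means = [sum(f[d] for f in frames) / n for d in range(dims)]
    return [[f[d] - means[d] for d in range(dims)] for f in frames]

# Toy utterance: 3 frames, 2 cepstral coefficients per frame.
cepstra = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
normalized = cepstral_mean_subtraction(cepstra)
```

Adding the same constant to every frame of a coefficient leaves the output unchanged, which is why the technique helps when training and test speech pass through different channels.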

27

Chen, Bo-ren, and 陳柏仁. "The Application of Voting to the Speaker Verification System". Thesis, 2007. http://ndltd.ncl.edu.tw/handle/78831977862308500010.

Abstract:

碩士
國立中央大學
電機工程研究所
95
This thesis applies a new score computation method, voting, to the speaker verification system to improve its performance. We combine voting with test normalization and propose four new speaker verification systems, of which the improved hybrid system achieves the greatest improvement. Experimental results show that, compared with the traditional speaker verification system, the improved hybrid system improves EER by up to 3.25% and DCF by up to 0.0402. Compared with the test-normalization speaker verification system, it improves EER by up to 0.59% and DCF by up to 0.0022. The proposed system can complement the test-normalization speaker verification system: it supplies additional speaker information and improves the performance of speaker verification.
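For readers unfamiliar with test normalization (T-norm), the sketch below shows its usual form: the score of a test utterance against the claimed model is standardized by the mean and standard deviation of the scores the same utterance obtains against a cohort of impostor models. The voting scheme itself is not described in the abstract and is not illustrated here. The function name and numbers are assumptions.

```python
from statistics import mean, stdev

def t_norm(raw_score, cohort_scores):
    """Standardize a trial score using the scores that the same test
    utterance obtains against a cohort of impostor models."""
    return (raw_score - mean(cohort_scores)) / stdev(cohort_scores)

# Made-up scores of one test utterance against five cohort models.
cohort = [-2.0, -1.0, 0.0, 1.0, 2.0]
score = t_norm(4.0, cohort)
```

Because the normalization statistics are computed per test utterance, scores from utterances recorded under different conditions become more comparable against a single decision threshold.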

28

Chang, Sheng-Jyun, and 張勝鈞. "Double Feature Extraction for Text Dependent Speaker Verification System". Thesis, 2007. http://ndltd.ncl.edu.tw/handle/12043081480115871273.

Abstract:

碩士
中興大學
電機工程學系所
95
In recent years, the range of applications of speaker verification has expanded and the importance of its study is increasing. In this thesis, we develop a combined feature set and use it in place of the conventional LPC-only or MFCC-only features. Linear Predictive Coding (LPC) and its delta-cepstral coefficients have shown good results in speaker verification. Mel-Frequency Cepstral Coefficients (MFCCs), which use twenty triangular filters to approximate the overall speech features, have also been used in speaker verification for many years. Experimental results show that the new combined LPCC-MFCC feature performs better in a text-dependent speaker verification system.

29

Chang, Su-Yu, and 張蘇瑜. "Speaker Verification System with Converted Speech Spoofing Detection Mechanism". Thesis, 2019. http://ndltd.ncl.edu.tw/handle/et33a6.

Abstract:

碩士
國立中山大學
資訊工程學系研究所
107
In this paper, we implement a speaker verification system that can detect converted-speech attacks by combining representation learning and neural networks. The system is divided into two subsystems: the countermeasure system and the verification system. The countermeasure system detects whether the speech is spoofed speech generated by voice conversion or speech synthesis. The verification system verifies, through voiceprint features, whether the speech is consistent with the identity claimed by the speaker. In the countermeasure system, we use representation learning and transfer learning so that the neural network can learn the features of various kinds of spoofed speech. We first train on data with multiple labels, then fine-tune the model on data with two labels to learn representation vectors of bona fide and spoofed speech, and finally classify with a support vector machine. On the ASVspoof 2019 evaluation set, our system achieves a minimum tandem decision cost of 0.1782 and an equal error rate (EER) of 7.62%. In the speaker verification system, we use a large training set to learn speaker representations and use the learned representations for enrollment and verification. We focus on the text-dependent task; evaluated in a real environment with 20 testers, the system achieves 99% accuracy.

30

(11178210), Li-Chi Chang. "Defending against Adversarial Attacks in Speaker Verification Systems". Thesis, 2021.

Abstract:

With the advance of Internet of Things technologies, smart devices and virtual personal assistants at home, such as Google Assistant, Apple Siri, and Amazon Alexa, have been widely used to control and access different objects like door locks, light bulbs, air conditioners, and even bank accounts, which makes our lives convenient. Because of its ease of operation, voice control has become a main interface between users and these smart devices. To make voice control more secure, speaker verification systems have been researched to apply the human voice as a biometric to accurately identify a legitimate user and prevent illegal access. Recent studies, however, have shown that speaker verification systems are vulnerable to different security attacks such as replay, voice cloning, and adversarial attacks. Among all attacks, adversarial attacks are the most dangerous and very challenging to defend against. Currently, there is no known method that can effectively defend against such an attack in speaker verification systems.

The goal of this project is to design and implement a defense system that is simple, light-weight, and effective against adversarial attacks for speaker verification. To achieve this goal, we study the audio samples from adversarial attacks in both the time domain and the Mel spectrogram, and find that the generated adversarial audio is simply a clean illegal audio with small perturbations that are similar to white noise, but well-designed to fool speaker verification. Our intuition is that if these perturbations can be removed or modified, adversarial attacks can potentially lose their attacking ability. Therefore, we propose to add a plugin-function module to preprocess the input audio before it is fed into the verification system. As a first attempt, we study two opposite plugin functions: denoising, which attempts to remove or reduce perturbations, and noise-adding, which adds small Gaussian noise to an input audio. We show through experiments that both methods can significantly degrade the performance of a state-of-the-art adversarial attack. Specifically, it is shown that denoising and noise-adding can reduce the targeted attack success rate of the attack from 100% to only 56% and 5.2%, respectively. Moreover, noise-adding can slow down the attack 25 times in speed and has a minor effect on the normal operations of a speaker verification system. Therefore, we believe that noise-adding can be applied to any speaker verification system against adversarial attacks. To the best of our knowledge, this is the first attempt in applying the noise-adding method to defend against adversarial attacks in speaker verification systems.
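The noise-adding plugin described above can be pictured with a short sketch: before the audio reaches the verifier, small zero-mean Gaussian noise is added to every sample. The function name, the noise level `sigma`, and the assumption that samples are normalized to the range [-1, 1] are all hypothetical choices for this example and are not taken from the thesis.

```python
import random

def add_gaussian_noise(samples, sigma=0.002, seed=None):
    """Return a copy of the waveform with small zero-mean Gaussian noise.

    sigma is a hypothetical noise level, not a value from the thesis.
    """
    rng = random.Random(seed)
    noisy = [s + rng.gauss(0.0, sigma) for s in samples]
    # Keep samples inside the nominal [-1, 1] range of normalized audio.
    return [max(-1.0, min(1.0, s)) for s in noisy]

# Toy waveform standing in for the input audio.
waveform = [0.0, 0.25, -0.5, 0.75]
defended = add_gaussian_noise(waveform, sigma=0.002, seed=0)
```

In such a design the preprocessed signal, not the raw input, would be passed to the speaker verification system, so a carefully tuned perturbation is disturbed while normal speech is barely changed.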

31

ChiaFeng, Chen, and 陳嘉峰. "A Study on Speaker Verification System Using Hidden Markov Model". Thesis, 2000. http://ndltd.ncl.edu.tw/handle/91019952933404054926.

Abstract:

碩士
國立臺北科技大學
電腦通訊與控制研究所
88
This thesis develops a text-dependent speaker verification system based on the Hidden Markov Model (HMM). The fixed digit-string password utterances are segmented into a sequence of isolated-word units for constructing speaker models by employing a segmental K-means training procedure. In order to improve the performance of speaker verification, normalized log-likelihood scoring is applied, using the specified speaker reference models and speaker background models obtained from a cohort speaker set selected by a similarity measure. Several sets of experimental utterances were used to evaluate the system, including male and female utterances recorded through microphone and telephone networks. Experimental results indicate that with the use of individual speaker background models the best equal error rates (EER) of 0.3% and 4.48% were achieved, respectively, for microphone speech (20 true speakers, 5 impostors) and telephone speech (20 true speakers, 10 impostors).
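One common form of normalized log-likelihood scoring subtracts, from the log-likelihood under the claimed speaker's model, the log of the average likelihood under the cohort (background) models. The sketch below shows that form only. Whether the thesis averages or takes the maximum over the cohort is not stated in the abstract, so the averaging, the function name, and the numbers are assumptions.

```python
import math

def cohort_normalized_score(target_loglik, cohort_logliks):
    """Log-likelihood ratio of the claimed speaker model against the
    average likelihood of the cohort (background) models."""
    # Stable log-mean-exp: factor out the largest cohort log-likelihood.
    m = max(cohort_logliks)
    mean_exp = sum(math.exp(l - m) for l in cohort_logliks) / len(cohort_logliks)
    return target_loglik - (m + math.log(mean_exp))

# Made-up log-likelihoods of one utterance.
score = cohort_normalized_score(-40.0, [-45.0, -45.0, -45.0])
```

The claim would then be accepted when the normalized score exceeds a threshold chosen on development data.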

32

Chang, Jung-Lin, and 張榮霖. "Case Study of CTI System And Speaker Verification Via Telephone". Thesis, 2003. http://ndltd.ncl.edu.tw/handle/28926152976021236089.

Abstract:

碩士
國立高雄第一科技大學
電腦與通訊工程所
91
The opening of the telecommunications market, the establishment of broadband Internet, and the abundance of services and businesses, together with Computer Telephony Integration (CTI), generate the largest economic benefit for enterprises and customers. This thesis aims to further study applied CTI technology. To begin with, it investigates the CTI systems of Genesys and Chain Sea Integration individually. It then develops speaker verification for the telephone system. First, we use Dialogic's speech telephone card to design a speech verification process combined with speaker verification. Next, we set up a member speech database to support the website registration system. The core of the speaker verification is based on Hidden Markov Models (HMMs), which are trained on the member speech database and used to create a threshold level for each member. Members can receive advanced functions and services after verification through website registration and the telephony speaker verification system.

33

Wu, Cheng-Hsiung, and 吳正雄. "Performance Evaluation of Speaker Verification for Mobile Voice-Activated Trading System". Thesis, 2001. http://ndltd.ncl.edu.tw/handle/09854519574455568728.

Abstract:

碩士
國立臺北科技大學
機電整合研究所
89
This thesis investigates the effects of transcoded speech and real GSM speech on the performance of speaker verification for a mobile voice-activated trading system. The transcoded speech for simulation is obtained by transcoding microphone and wired telephone speech databases using various coding schemes. In order to match real-world environments, a GSM speech database consisting of 20 male and 20 female speakers is also collected over the mobile wireless network. Three in-vehicle call environments are considered: stopped cars (0 km/hr) with the engine running, and moving cars at driving speeds of 50 km/hr and 90 km/hr. Each speaker pronounced 40 7-digit strings in each condition. This results in a database of 4800 digit strings, which is suitable for use in related research. A text-dependent Hidden Markov Model-based system is implemented for performance evaluation. Experimental results demonstrate that verification performance on real GSM speech is far worse than on transcoded speech due to channel effects and background noise. Consequently, this investigation provides a useful and practical baseline of performance evaluation for mobile voice-activated trading systems. The results also indicate that the 0 km/hr case yields the best performance in matched conditions; 90 km/hr results in the worst performance in mismatched conditions; and performance for male speakers is always superior to that for female speakers in all conditions. Moreover, we find that the proposed mixed training model improves the performance in some cases.
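The stated database size follows directly from the figures in the abstract, assuming every one of the 40 speakers recorded all 40 strings in each of the three driving conditions:

```latex
\[
(20 + 20)\ \text{speakers} \times 3\ \text{conditions} \times 40\ \text{strings}
  = 4800\ \text{digit strings}
\]
```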

34

Lin, Shiou-De, and 林修德. "Deep Neural Network based Factor Analysis for Robust Speaker Verification System". Thesis, 2013. http://ndltd.ncl.edu.tw/handle/pf7dv7.

Abstract:

碩士
國立臺北科技大學
電腦與通訊研究所
102
The goal of this study is to build a robust speaker verification model. In speaker verification, performance is affected by noise, environment, session variability, etc. i-vector + Linear Discriminant Analysis (LDA) and i-vector + Probabilistic Linear Discriminant Analysis (PLDA) systems have become the state-of-the-art techniques in the speaker verification field. PLDA's speaker model rests on the strong assumption that the data follow a Gaussian distribution, but because of the variability of the data, this assumption does not always hold. We therefore propose a variant system based on deep neural networks. In the model we use (FA-DNN), the hidden layer has a high degree of representational power and contains non-language speaker nodes; at test time, only the contribution of the speaker nodes is considered. In this thesis, three methods are evaluated on SRE14. The experimental results on the min DCF trial show that the relative performance gain of FA-DNN is 9.84%, and the EER of PLDA is 13.25%.

35

Lin, Hung-Lung, and 林宏隆. "Implementation of a Speaker Verification System Using a Neural Network Processor". Thesis, 1993. http://ndltd.ncl.edu.tw/handle/28891650358178728058.

Abstract:

碩士
大同工學院
電機工程研究所
81
Speaker verification is one of the applications of speaker recognition and has practical uses. The goal of this research is to design and implement a real-time speaker verification system using a digital signal processor (DSP96002) and a neural network chip (80170NX). As in a typical speaker recognition system, there are two main parts in our system: feature extraction and a pattern classifier. We take the linear predictive coding (LPC) derived cepstrum as the feature, which is found to have the best performance for speaker recognition. The hardware of our speaker verification system has been implemented successfully and meets the requirement of real-time operation. System performance is evaluated and the system parameters for the highest recognition rate are suggested. The precision of the analog voltage in the circuit is the key to system performance. The experimental results show that our speaker verification system design has high potential and flexibility for practical application.

36

Ho, Hon-Ron, and 何宏榮. "A Study of Speaker Verification on Contactless Smart Card Application System". Thesis, 2001. http://ndltd.ncl.edu.tw/handle/72175373022550969647.

Abstract:

碩士
國立成功大學
工程科學系
89
This thesis deals with the application of contactless smart cards and speaker verification technology. First, a person's speech is recorded through a microphone; the speech waveform data are then processed by a DSP device, which performs FFT and LPC operations. The training results are packed and filtered into a compact speaker voice signature as small as 32 bytes. This signature can then be written into a smart card chip and used for cardholder ID verification through the same speech recorder unit. The speaker ID verification system is not only user friendly but also good for user ID protection. A V-star QUISAR 560 card reader and writer, Mifare contactless smart cards, a compact microphone, a voice direct module and a 64MB RAM Sound Dialog Card are used together with a Pentium III 500 PC in this thesis. The development software includes Borland C++ 3.1, Visual Basic 6.0 and Matlab 5.3 toolkits. The approach can also be applied to secure cardholder verification in future e-business.

37

Yu, Hao Chung, and 俞皓中. "Open Set Classification Based on Tolerance Interval for Speaker Verification System". Thesis, 2002. http://ndltd.ncl.edu.tw/handle/48804952821192817644.

Abstract:

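Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
Academic year 90 (ROC calendar)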
Speaker verification systems solve the problem of verifying whether a given utterance comes from a claimed speaker. This problem is important because an accurate speaker verification system can be applied to many security systems. Compared with other biometric methods such as fingerprint or face recognition, speaker verification systems do not require expensive specialized equipment and are especially effective for remote identity verification. Previously, Reynolds et al. proposed a speaker verification system using Gaussian mixture models, but that system is incomplete because it needs a set of background speaker models, which are constructed from a large speech database covering a variety of speakers. It may not be feasible to obtain such a database in the real world. In this thesis, I propose a new solution for speaker verification called OSCILLO. By applying the statistical technique of tolerance intervals, OSCILLO can verify a speaker's ID without background speaker models. This greatly reduces the size of the whole system and the time needed for both training and testing. We compare OSCILLO and Reynolds' method on three standard speech databases: TCC-300, TIMIT and NIST. The experimental results show that OSCILLO performs well on all three databases.

38

Su, Yu-Jui, and 蘇俞睿. "A study and implementation on Speaker Verification System using Gender Information". Thesis, 2018. http://ndltd.ncl.edu.tw/handle/3jrk4n.

Abstract:

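Master's thesis
National Taiwan University
Graduate Institute of Computer Science and Information Engineering
Academic year 106 (ROC calendar)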
For the speaker verification task, one way to improve a system's accuracy without changing the acoustic modelling algorithm is to use gender-dependent models instead of a gender-independent one. However, since the test speakers' genders are not available, the gender classifier plays an important role, because its accuracy directly affects the performance of the whole speaker verification system. Ensuring that the system maintains good performance under different gender compositions of test speakers is also an important goal. To explore how different uses of gender information affect a speaker verification system, this work implements a speaker verification system that uses i-vectors as speaker features and a PLDA model for scoring, together with three i-vector-based gender classifiers. After a general analysis of the weaknesses of speaker verification systems that use gender-dependent models, we propose several methods for applying gender information, covering both the case where the gender classifier performs well and the case where it performs poorly. We also analyse the performance of each method under different gender compositions of test speakers. Finally, our system achieves better performance than the traditional practice under these different circumstances.

39

Yan, Kuan-Hao, and 管浩延. "The Use of Mixture Endpoint Detection Technique for Text-Dependent Speaker Verification System". Thesis, 2009. http://ndltd.ncl.edu.tw/handle/56600116939838315911.

Abstract:

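Master's thesis
National Chung Hsing University
Graduate Institute of Communication Engineering
Academic year 97 (ROC calendar)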
Speaker verification is used in the area of biometric authentication. The recognition rate is the key issue in the recent development of speech recognition. In this thesis, instead of the traditional endpoint detection method, we develop a new mixture endpoint detection method to increase the recognition rate of speaker verification. We adopt entropy and zero-crossing measures for endpoint detection in order to find the real speech sections. We use the technique to locate a real speech section in noisy data, and then use the zero-crossing measure to detect air sounds in the speech. After these processes, the SNR reflects the real speech level, so the threshold can be set. The mixture endpoint detection technique readily improves the efficiency of the text-dependent speaker verification system.
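
To make the two measures concrete, the sketch below computes a zero-crossing rate and a spectral entropy for a single frame. It assumes that "entropy" in the abstract means the Shannon entropy of the normalised power spectrum, which is a common choice but is not confirmed by the source. The function names, the naive DFT, and the 64-sample test frames are all illustrative, and the thresholds an endpoint detector would apply to these values are left out because the thesis does not state them.

```python
import cmath
import math
import random


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def spectral_entropy(frame):
    """Shannon entropy (in bits) of the normalised power spectrum of one frame."""
    n = len(frame)
    power = []
    for k in range(n // 2 + 1):           # naive DFT, adequate for short frames
        s = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
        power.append(abs(s) ** 2)
    total = sum(power)
    if total == 0.0:                       # all-zero frame carries no spectrum
        return 0.0
    probs = [p / total for p in power if p > 0.0]
    return -sum(p * math.log2(p) for p in probs)


# Usage: a pure tone (structured, speech-like stand-in) versus uniform noise.
tone = [math.sin(2 * math.pi * 4 * t / 64) for t in range(64)]
rng = random.Random(0)
noise = [rng.uniform(-1.0, 1.0) for _ in range(64)]
features_tone = (spectral_entropy(tone), zero_crossing_rate(tone))
features_noise = (spectral_entropy(noise), zero_crossing_rate(noise))
```

A structured signal concentrates its power in a few frequency bins and so has low spectral entropy, while broadband noise spreads power evenly and has high entropy and a high zero-crossing rate. That contrast is what makes the pair useful for separating speech from background.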

40

Wun-Syong Lin and 林文雄. "An Embedded System Design and Implementation for Speaker Independent Single-Words Speech Verification using AMDF-based Pitch Features". Thesis, 2011. http://ndltd.ncl.edu.tw/handle/58222930150850195164.

Abstract:

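Master's thesis
National Cheng Kung University
Department of Electrical Engineering (Master's and Doctoral Program)
Academic year 99 (ROC calendar)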
The speech interface of a human-machine interactive system provides not only a friendly interface but also a direct feedback mechanism for the user. In this thesis, an embedded system is designed and implemented that uses pitch-based single-word speech verification to extend the functionality of speech interactive interfaces. The proposed system is designed especially for environments with limited hardware resources and has the following features: small size, low cost, low power consumption and real-time operation, so it can be widely applied in speech interactive applications. For pitch detection, the average magnitude difference function (AMDF) is adopted to estimate the pitch period feature for speaker-independent utterance verification. We propose an upper-bound strategy to reduce the number of SAA (subtraction, absolute value and accumulation) iterations; this approach reduces the computational load while keeping high AMDF accuracy. The proposed pitch period feature extraction method is implemented on an embedded system with an 8051 MCU, a preamplifier circuit, and AGC and filtering circuits. For single commands with 2 seconds of speech data, the average speech verification accuracy is about 95% at different distances from the speaker to the microphone. The experimental results show that the pitch period detected by the modified AMDF remains reliable. The proposed prototype can be widely applied to human-machine interactive systems such as alarm clocks, intelligent toys, real-time feedback systems and hands-free remote controllers.
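
The following sketch shows one plausible reading of the upper-bound strategy: while the AMDF sum for a candidate lag is being accumulated, the loop stops as soon as the partial sum reaches the smallest complete sum found so far, since that lag can no longer be the minimum. This interpretation, the function name `amdf_pitch_period`, and the test signal are assumptions made for illustration; the thesis may define the bound differently, and its 8051 implementation is not reproduced here.

```python
import math


def amdf_pitch_period(frame, min_lag, max_lag):
    """Return the lag (in samples) that minimises the AMDF of `frame`.

    Every lag uses the same number of terms, so unnormalised sums are comparable.
    """
    n_terms = len(frame) - max_lag
    if n_terms <= 0:
        raise ValueError("frame must be longer than max_lag")
    best_lag, best_sum = min_lag, float("inf")
    for lag in range(min_lag, max_lag + 1):
        acc = 0.0
        for i in range(n_terms):
            acc += abs(frame[i] - frame[i + lag])   # one SAA step
            if acc >= best_sum:                      # upper bound hit: abandon lag
                break
        else:
            # Loop finished without hitting the bound, so this lag is the new best.
            best_lag, best_sum = lag, acc
    return best_lag


# Usage: a 100 Hz sine sampled at 8 kHz has a period of 80 samples.
frame = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(400)]
period = amdf_pitch_period(frame, 20, 120)
```

The early exit never changes which lag is selected, because a partial sum of non-negative terms can only grow. It only skips work, which is the kind of saving that matters on a small microcontroller.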

41

Su, Hua, and 蘇樺. "PSO Algorithm for Speaker Verification Systems". Thesis, 2014. http://ndltd.ncl.edu.tw/handle/64080634241659055321.

Abstract:

Master's thesis
National Central University
Department of Electrical Engineering
Academic year 102 (ROC calendar)
This thesis focuses on speaker verification between a test corpus and registered speaker models. First, the thesis introduces score normalization approaches into the speaker verification system. Then we apply the Particle Swarm Optimization (PSO) algorithm to optimize the model parameters. The main idea of PSO resembles the foraging behaviour of fish. Every particle in PSO has memory, and the algorithm is simple to compute and converges quickly. With the optimized features used to build a more accurate speaker model, the system becomes more discriminative. In addition, the thesis introduces a regression analysis method into the speaker verification system. Regression analysis is a useful statistical analysis method. We build a regression model for each speaker using ordinary least squares estimation and analysis of the coefficient of determination. Experiments show that the proposed method can improve the performance of the speaker verification system.
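
As a rough illustration of the particle memory and simple update rule mentioned in the abstract, here is a minimal global-best PSO. The abstract does not give the objective function or the settings used for the speaker models, so the sphere function, the parameter values, and the name `pso_minimize` are placeholders chosen for this example.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iters=200,
                 inertia=0.73, c_personal=1.5, c_global=1.5, seed=0):
    """Minimise `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # each particle's own memory
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=best_val.__getitem__)
    g_pos, g_val = best_pos[g][:], best_val[g]     # best position seen by the swarm
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity blends momentum, pull to own best, pull to swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_pos[i], best_val[i] = pos[i][:], val
                if val < g_val:
                    g_pos, g_val = pos[i][:], val
    return g_pos, g_val


# Usage: the sphere function stands in for a model-fit criterion.
solution, value = pso_minimize(lambda x: sum(v * v for v in x),
                               dim=2, bounds=(-5.0, 5.0))
```

In a verification setting, the objective would score how well a candidate parameter vector fits or discriminates a speaker's data. Only function evaluations are needed, so no gradient of the model is required.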

42

Mohan, Aanchan K. "Combining speech recognition and speaker verification". 2008. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.17528.

43

Chandrasekaran, Aravind. "Efficient methods for rapid UBM training (RUT) for robust speaker verification /". 2008. http://proquest.umi.com/pqdweb?did=1650508671&sid=2&Fmt=2&clientId=10361&RQT=309&VName=PQD.
