Journal articles on the topic "Speaker verification system"

To see the other types of publications on this topic, follow the link: Speaker verification system.

Create a correct citation in APA, MLA, Chicago, Harvard, and several other styles

Consult the top 50 journal articles for your research on the topic "Speaker verification system".

Next to every source in the list of references, there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference for the chosen source in your preferred citation style: APA, MLA, Harvard, Vancouver, Chicago, etc.

You can also download the full text of the scholarly publication in PDF format and read its abstract online whenever the metadata include this information.

Browse journal articles on a wide variety of disciplines and organize your bibliography correctly.

1

Watari, Masao. "Speaker verification system". Journal of the Acoustical Society of America 91, no. 1 (January 1992): 546. http://dx.doi.org/10.1121/1.402663.

2

Sakoe, Hiroaki. "Speaker verification system". Journal of the Acoustical Society of America 85, no. 5 (May 1989): 2246. http://dx.doi.org/10.1121/1.397806.

3

Uchiyama, Hiroki. "Speaker verification system". Journal of the Acoustical Society of America 95, no. 1 (January 1994): 593. http://dx.doi.org/10.1121/1.408274.

4

Shanmugapriya, P., and Y. Venkataramani. "Analysis of Speaker Verification System Using Support Vector Machine". JOURNAL OF ADVANCES IN CHEMISTRY 13, no. 10 (February 25, 2017): 6531–42. http://dx.doi.org/10.24297/jac.v13i10.5839.

Abstract:
The integration of the GMM supervector and the Support Vector Machine (SVM) has become one of the most popular strategies in text-independent speaker verification. This paper describes the application of the Fuzzy Support Vector Machine (FSVM) to the classification of speakers using GMM supervectors. Supervectors are formed by stacking the mean vectors of GMMs adapted from the UBM by maximum a posteriori (MAP) estimation. GMM supervectors characterize a speaker's acoustic characteristics and are used to develop a speaker-dependent fuzzy SVM model. Introducing fuzzy theory into the support vector machine yields better classification accuracy and requires fewer support vectors. Experiments were conducted on the 2001 NIST speaker recognition evaluation corpus. The performance of the GMM-FSVM-based speaker verification system is compared with the conventional GMM-UBM and GMM-SVM based systems. Experimental results indicate that the fuzzy SVM based speaker verification system with GMM supervectors achieves better performance than the GMM-UBM system.
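The supervector construction this abstract describes can be sketched roughly as follows. This is a minimal illustration under assumed inputs (a precomputed frame-to-component responsibility matrix and a relevance factor `r`), not the authors' implementation:

```python
import numpy as np

def map_adapt_supervector(ubm_means, frames, resp, r=16.0):
    """Relevance-MAP adapt UBM component means toward a speaker's frames,
    then stack the adapted means into a single GMM supervector.

    ubm_means : (C, D) UBM component means
    frames    : (T, D) speaker feature frames
    resp      : (T, C) per-frame component responsibilities (posteriors),
                assumed to come from evaluating the UBM on the frames
    r         : relevance factor controlling how far means move from the UBM
    """
    n_c = resp.sum(axis=0)                       # (C,) soft occupation counts
    f_c = resp.T @ frames                        # (C, D) first-order statistics
    ex = f_c / np.maximum(n_c, 1e-10)[:, None]   # posterior mean per component
    alpha = (n_c / (n_c + r))[:, None]           # data-vs-prior interpolation
    adapted = alpha * ex + (1.0 - alpha) * ubm_means
    return adapted.reshape(-1)                   # (C*D,) supervector
```

A component with no assigned frames keeps its UBM mean, which is what makes supervectors of different utterances directly comparable as SVM inputs.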
5

Gada, Amay, Neel Kothari, Ruhina Karani, Chetashri Badane, Dhruv Gada and Tanish Patwa. "DR-SASV: A deep and reliable spoof aware speech verification system". International Journal on Information Technologies and Security 15, no. 4 (December 1, 2023): 93–106. http://dx.doi.org/10.59035/ffmb8272.

Abstract:
A spoof-aware speaker verification system is an integrated system that is capable of jointly identifying impostor speakers as well as spoofing attacks from target speakers. This type of system largely helps in protecting sensitive data, mitigating fraud, and reducing theft. Research has recently enhanced the effectiveness of countermeasure systems and automatic speaker verification systems separately to produce low Equal Error Rates (EER) for each system. However, work exploring a combination of both is still scarce. This paper proposes an end-to-end solution to address spoof-aware automatic speaker verification (ASV) by introducing a Deep Reliable Spoof-Aware-Speaker-Verification (DR-SASV) system. The proposed system allows the target audio to pass through a “spoof aware” speaker verification model sequentially after applying a convolutional neural network (CNN)-based spoof detection model. The suggested system produces encouraging results after being trained on the ASVSpoof 2019 LA dataset. The spoof detection model gives a validation accuracy of 96%, while the transformer-based speech verification model authenticates users with an error rate of 13.74%. The system surpasses other state-of-the-art models and produces an EER score of 10.32%.
6

Mammone, Richard J. "Speaker identification and verification system". Journal of the Acoustical Society of America 101, no. 2 (February 1997): 665. http://dx.doi.org/10.1121/1.419408.

7

Rabin, Michael D. "Speaker verification system and process". Journal of the Acoustical Society of America 103, no. 6 (June 1998): 3138. http://dx.doi.org/10.1121/1.423030.

8

Milewski, Krzysztof, Szymon Zaporowski and Andrzej Czyżewski. "Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice". Electronics 12, no. 21 (October 30, 2023): 4458. http://dx.doi.org/10.3390/electronics12214458.

Abstract:
The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project assumed creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron, WaveRNN, and GE2E neural networks. The results of attacks using voice cloning were analyzed and discussed in the context of a subjective assessment of cloned voice fidelity. Subjective test results and attempts to authenticate speakers proved that the tested biometric identity verification system might resist voice cloning attacks even if humans cannot distinguish cloned samples from original ones.
9

Bouziane, Ayoub, Jamal Kharroubi and Arsalane Zarghili. "Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models". International Journal of Electrical and Computer Engineering (IJECE) 7, no. 6 (December 1, 2017): 3655. http://dx.doi.org/10.11591/ijece.v7i6.pp3655-3663.

Abstract:
This paper presents a novel speaker modeling approach for speaker recognition systems. The basic idea of this approach consists of deriving the target speaker model from a personalized background model, composed only of the UBM Gaussian components that are really present in the speech of the target speaker. The motivation behind deriving speaker models from personalized background models is to exploit the observed difference in some acoustic classes between speakers, in order to improve the performance of speaker recognition systems. The proposed approach was evaluated on the speaker verification task using various amounts of training and testing speech data. The experimental results showed that the proposed approach is efficient in terms of both verification performance and computational cost during the testing phase of the system, compared to traditional UBM-based speaker recognition systems.
10

Pham, Tuan, and Michael Wagner. "Speaker Verification with Fuzzy Fusion and Genetic Optimization". Journal of Advanced Computational Intelligence and Intelligent Informatics 3, no. 6 (December 20, 1999): 451–56. http://dx.doi.org/10.20965/jaciii.1999.p0451.

Abstract:
Most speaker verification systems are based on similarity or likelihood normalization techniques, as these help to better cope with speaker variability. In conventional normalization, the a priori probabilities of the cohort speakers are assumed to be equal. From this standpoint, we apply the fuzzy integral and genetic algorithms to combine the likelihood values of the cohort speakers, relaxing the assumption of equal a priori probabilities. This approach replaces the conventional normalization term with the fuzzy integral, which acts as a non-linear fusion of the similarity measures of an utterance assigned to the cohort speakers. Furthermore, genetic algorithms are applied to find the optimal fuzzy densities, which are very important for the fuzzy fusion. We illustrate the performance of the proposed approach by testing the speaker verification system with both the conventional and the proposed algorithms using the commercial speech corpus TI46. The results in terms of equal error rates show that the speaker verification system using the fuzzy integral is more favorable than the conventional normalization method.
11

Kamiński, Kamil A., and Andrzej P. Dobrowolski. "Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features". Sensors 22, no. 23 (December 1, 2022): 9370. http://dx.doi.org/10.3390/s22239370.

Abstract:
This article presents the Automatic Speaker Recognition System (ASR System), which successfully resolves problems such as identification within an open set of speakers and the verification of speakers in difficult recording conditions similar to telephone transmission conditions. The article provides complete information on the architecture of the various internal processing modules of the ASR System. The speaker recognition system proposed in the article has been compared closely with other competing systems, achieving improved speaker identification and verification results on a known certified voice dataset. The ASR System owes this to the dual use of genetic algorithms, both in the feature selection process and in the optimization of the system's internal parameters. This was also influenced by the proprietary feature generation and the corresponding classification process using Gaussian mixture models. This allowed the development of a system that makes an important contribution to the current state of the art in speaker recognition for telephone transmission applications with known speech coding standards.
12

Shim, Hye-jin, Jee-weon Jung, Ju-ho Kim and Ha-jin Yu. "Integrated Replay Spoofing-Aware Text-Independent Speaker Verification". Applied Sciences 10, no. 18 (September 10, 2020): 6292. http://dx.doi.org/10.3390/app10186292.

Abstract:
A number of studies have successfully developed speaker verification or presentation attack detection systems. However, studies integrating the two tasks remain in the preliminary stages. In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach. The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning on a common feature. However, through experiments, we hypothesize that the information required for performing speaker verification and presentation attack detection might differ, because speaker verification systems try to remove device-specific information from speaker embeddings, while presentation attack detection systems exploit such information. Therefore, we propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection. This approach has three input components: two speaker embeddings (one each for enrollment and test) and the prediction of presentation attacks. Experiments are conducted using the ASVspoof 2017-v2 dataset, which includes official trials on the integration of speaker verification and presentation attack detection. The proposed back-end approach demonstrates a relative improvement of 21.77% in terms of the equal error rate on integrated trials compared to a conventional speaker verification system.
13

Luo, Hongwei, Yijie Shen, Feng Lin and Guoai Xu. "Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference". Security and Communication Networks 2021 (February 9, 2021): 1–10. http://dx.doi.org/10.1155/2021/6664578.

Abstract:
Speaker verification systems have gained great popularity in recent years, especially with the development of deep neural networks and the Internet of Things. However, the security of speaker verification systems based on deep neural networks has not been well investigated. In this paper, we propose an attack to spoof a state-of-the-art speaker verification system based on the generalized end-to-end (GE2E) loss function, misclassifying illegal users as the authentic user. Specifically, we design a novel loss function to deploy a generator that produces effective adversarial examples with slight perturbation, and then spoof the system with these adversarial examples to achieve our goals. The success rate of our attack reaches 82% when cosine similarity is adopted in the deep-learning-based speaker verification system. Beyond that, our experiments also reported a signal-to-noise ratio of 76 dB, which shows that our attack has higher imperceptibility than previous works. In summary, the results show that our attack not only can spoof the state-of-the-art neural-network-based speaker verification system but, more importantly, can hide from human hearing or machine discrimination.
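As background to the attack described in this abstract, a cosine-similarity back end of the kind being targeted can be sketched as follows. This is a hypothetical minimal verifier; the embeddings and the threshold value are assumptions, not details from the paper:

```python
import numpy as np

def cosine_score(emb_enroll, emb_test):
    """Cosine similarity between two fixed-length speaker embeddings."""
    a = emb_enroll / np.linalg.norm(emb_enroll)
    b = emb_test / np.linalg.norm(emb_test)
    return float(np.dot(a, b))

def verify(emb_enroll, emb_test, threshold=0.7):
    """Accept the test utterance iff the cosine score exceeds the threshold.
    An adversarial example succeeds by nudging the test audio so that its
    embedding crosses this threshold for the target speaker."""
    return cosine_score(emb_enroll, emb_test) >= threshold
```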
14

INOUE, Yoshiaki, and Satoshi KUMAKURA. "Working over Speech: Speaker Verification System 'VoiceGATEII', and Speaker Identification System 'VoiceSync'". Proceedings of the Conference on Information, Intelligence and Precision Equipment: IIP 2000 (2000): 1–4. http://dx.doi.org/10.1299/jsmeiip.2000.1.

15

Sarmah, K. "Comparison Studies of Speaker Modeling Techniques in Speaker Verification System". International Journal of Scientific Research in Computer Science and Engineering 5, no. 5 (October 30, 2017): 75–82. http://dx.doi.org/10.26438/ijsrcse/v5i5.7582.

16

Lv, Gang, and Heming Zhao. "Joint Factor Analysis of Channel Mismatch in Whispering Speaker Verification". Archives of Acoustics 37, no. 4 (December 1, 2012): 555–59. http://dx.doi.org/10.2478/v10168-012-0065-9.

Abstract:
A speaker recognition system based on joint factor analysis (JFA) is proposed to improve whispering speakers' recognition rate under channel mismatch. The system estimated separately the eigenvoice and the eigenchannel before calculating the corresponding speaker and channel factors. Finally, a channel-free speaker model was built to describe a speaker accurately using model compensation. The test results from the whispered speech databases obtained under eight different channels showed that the correct recognition rate of a recognition system based on JFA was higher than that of the Gaussian Mixture Model-Universal Background Model. In particular, the recognition rate in cellphone channel tests increased significantly.
17

Sathiamoorthy, S., R. Ponnusamy and R. Visalakshi. "Performance of Speaker Verification Using CSM and TM". Asian Journal of Computer Science and Technology 7, no. 2 (August 5, 2018): 123–27. http://dx.doi.org/10.51983/ajcst-2018.7.2.1866.

Abstract:
In this paper, we present the performance of a speaker verification system based on features computed from speech recorded using a Close Speaking Microphone (CSM) and a Throat Microphone (TM) in clean and noisy environments. Noise is one of the most complicated problems in speaker verification. Background noises affect the performance of speaker verification using the CSM. To overcome this issue, the TM is used: its transducer, held at the throat, produces a clean signal that is unaffected by background noises. Acoustic features are computed by means of Relative Spectral Transform-Perceptual Linear Prediction (RASTA-PLP). An autoassociative neural network (AANN) is used to model the features and to verify speakers in clean and noisy environments. A new method is presented in this paper for the verification of speakers in clean conditions using the combined CSM and TM. The verification performance of the proposed combined system is significantly better than that of the system using the CSM alone, owing to the complementary nature of the CSM and TM. Evaluating the FAR and FRR values, an EER of about 1.0% is obtained for the combined devices (CSM+TM), and an overall verification rate of 99% is obtained on clean speech.
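The EER reported in abstracts like this one is derived from the FAR and FRR curves as the operating point where the two error rates meet. A simple sketch of that computation (a plain threshold sweep over trial scores, not the authors' evaluation code):

```python
import numpy as np

def compute_eer(target_scores, impostor_scores):
    """Equal error rate: sweep a decision threshold over all observed scores
    and return the point where the false-acceptance rate (FAR) and the
    false-rejection rate (FRR) are closest."""
    target = np.asarray(target_scores, dtype=float)
    impostor = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target, impostor]))
    best_gap, eer = np.inf, 1.0
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostors wrongly accepted
        frr = np.mean(target < t)      # genuine speakers wrongly rejected
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2.0
    return eer
```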
18

Gorban, Igor I. "Crime automatic speaker verification and identification system". Journal of the Acoustical Society of America 102, no. 5 (November 1997): 3165. http://dx.doi.org/10.1121/1.420769.

19

Shanmugapriya, P., and Y. Venkataramani. "Wavelet fuzzy LVQ based speaker verification system". International Journal of Speech Technology 16, no. 4 (March 14, 2013): 403–12. http://dx.doi.org/10.1007/s10772-013-9191-7.

20

Chen, Chaotao, Di Jiang, Jinhua Peng, Rongzhong Lian, Chen Jason Zhang, Qian Xu, Lixin Fan and Qiang Yang. "A Health-friendly Speaker Verification System Supporting Mask Wearing". Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 16004–6. http://dx.doi.org/10.1609/aaai.v35i18.17994.

Abstract:
We demonstrate a health-friendly speaker verification system for voice-based identity verification on mobile devices. The system is built upon a speech processing module, a ResNet-based local acoustic feature extractor and a multi-head attention-based embedding layer, and is optimized under an additive margin softmax loss for discriminative speaker verification. It is shown that the system achieves superior performance no matter whether there is mask wearing or not. This characteristic is important for speaker verification services operating in regions affected by the raging coronavirus pneumonia. With this demonstration, the audience will have an in-depth experience of how the accuracy of bio-metric verification and the personal health are simultaneously ensured. We wish that this demonstration would boost the development of next-generation bio-metric verification technologies.
21

Prasetio, Barlian Henryranu, Hiroki Tamura and Koichi Tanno. "Emotional Variability Analysis Based I-Vector for Speaker Verification in Under-Stress Conditions". Electronics 9, no. 9 (September 1, 2020): 1420. http://dx.doi.org/10.3390/electronics9091420.

Abstract:
Emotional conditions cause changes in the speech production system, producing differences in acoustic characteristics compared to neutral conditions. The presence of emotion degrades the performance of a speaker verification system. In this paper, we propose a speaker modeling approach that accommodates the presence of emotions in speech segments by extracting a compact speaker representation. The speaker model is estimated by following a procedure similar to the i-vector technique, but it treats the emotional effect as the channel variability component. We name this method emotional variability analysis (EVA). EVA represents the emotion subspace separately from the speaker subspace, like the joint factor analysis (JFA) model. The effectiveness of the proposed system is evaluated by comparing it with the standard i-vector system on the speaker verification task of the Speech Under Simulated and Actual Stress (SUSAS) dataset with three different scoring methods. The evaluation focuses on the equal error rate (EER). In addition, we also conducted an ablation study for a more comprehensive analysis of the EVA-based i-vector. Based on the experimental results, the proposed system outperformed the standard i-vector system and achieved state-of-the-art results in the verification task for under-stressed speakers.
22

Kadhim, Imad Burhan, Ali Najdet Nasret and Zuhair Shakor Mahmood. "Enhancement and modification of automatic speaker verification by utilizing hidden Markov model". Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (September 1, 2022): 1397. http://dx.doi.org/10.11591/ijeecs.v27.i3.pp1397-1403.

Abstract:
The purpose of this study is to discuss the design and implementation of automatic speaker verification (ASV) systems. Much depends on the advancement and improvement of ASV applications, especially given the benefits they provide over other biometric approaches. Modern speaker recognition systems rely on statistical models such as the hidden Markov model (HMM), support vector machine (SVM), artificial neural networks (ANN), Gaussian mixture models (GMM), and combined models to identify speakers. Using a French dataset, this study investigates the effectiveness of prompted-text speaker verification. This study constructs a continuous speech system based on HMMs at a context-free, single-mixture monophone level. After that, suitable voice data are used to build the client and world models. To verify speakers, the text-dependent speaker verification system uses sentence HMMs concatenated for the key text. In the verification step, a normalized log-likelihood is computed as the difference between the log-likelihood of the client model, obtained by Viterbi forced alignment, and that of the world model. Finally, a method for determining the verification results is presented.
23

GUOJIE, LI, P. SARATCHANDRAN and N. SUNDARARAJAN. "TEXT-INDEPENDENT SPEAKER VERIFICATION USING MINIMAL RESOURCE ALLOCATION NETWORKS". International Journal of Neural Systems 14, no. 06 (December 2004): 347–54. http://dx.doi.org/10.1142/s0129065704002108.

Abstract:
This paper presents a text-independent speaker verification system based on an online Radial Basis Function (RBF) network referred to as the Minimal Resource Allocation Network (MRAN). MRAN is a sequential learning RBF network in which hidden neurons are added or removed as training progresses. LP-derived cepstral coefficients are used as feature vectors during the training and verification phases. The performance of MRAN is compared with other well-known RBF and Elliptical Basis Function (EBF) based speaker verification methods in terms of error rates and computational complexity on a series of speaker verification experiments. The experiments use data from 258 speakers from the phonetically balanced continuous speech corpus TIMIT. The results show that MRAN produces error rates comparable to other methods with much less computational complexity.
24

Wang, Wei, Jiqing Han, Tieran Zheng, Guibin Zheng and Xingyu Zhou. "Speaker Verification via Modeling Kurtosis Using Sparse Coding". International Journal of Pattern Recognition and Artificial Intelligence 30, no. 03 (February 22, 2016): 1659008. http://dx.doi.org/10.1142/s0218001416590084.

Abstract:
This paper proposes a new model for speaker verification that employs a kurtosis-based statistical method grounded in the sparse coding of the human auditory system. Only a small number of neurons in the primary auditory cortex are activated when encoding acoustic stimuli, and sparse independent events are used to represent the characteristics of these neurons. Each individual dictionary is learned from an individual speaker's samples, where dictionary atoms correspond to cortical neurons. The neuron responses possess the statistical properties of acoustic signals in the auditory cortex, so the activation distribution of an individual speaker's neurons is taken to approximate the characteristics of that speaker. Kurtosis is an efficient way to measure the sparsity of a neuron from its activation distribution, and the vector composed of the kurtosis of every neuron serves as the model characterizing the speaker's voice. The experimental results demonstrate that the kurtosis model outperforms the baseline systems and achieves an effective identity validation function.
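The per-neuron kurtosis model described in this abstract can be sketched as follows. This is a minimal illustration; the activation matrix is an assumed input produced by some sparse-coding front end, not part of the paper's code:

```python
import numpy as np

def kurtosis_model(activations):
    """Per-atom excess kurtosis of sparse-coding activations.

    activations : (N, T) matrix, one row of activation values per dictionary
    atom (neuron) over T frames.  Returns the (N,) kurtosis vector used as
    the speaker model: sparser, more peaked activation distributions yield
    larger kurtosis values.
    """
    a = np.asarray(activations, dtype=float)
    mu = a.mean(axis=1, keepdims=True)
    var = a.var(axis=1)
    fourth = ((a - mu) ** 4).mean(axis=1)
    # Excess kurtosis: 0 for a Gaussian, positive for sparser distributions.
    return fourth / np.maximum(var ** 2, 1e-12) - 3.0
```

Two such kurtosis vectors can then be compared with any vector distance to score a verification trial.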
25

Hao, Zhanjun, Jianxiang Peng, Xiaochao Dang, Hao Yan and Ruidong Wang. "mmSafe: A Voice Security Verification System Based on Millimeter-Wave Radar". Sensors 22, no. 23 (November 29, 2022): 9309. http://dx.doi.org/10.3390/s22239309.

Abstract:
With the increasing popularity of smart devices, users can control their mobile phones, TVs, cars, and smart furniture by using voice assistants, but voice assistants are susceptible to intrusion by outsider speakers or playback attacks. In order to address this security issue, a millimeter-wave radar-based voice security authentication system is proposed in this paper. First, the speaker’s fine-grained vocal cord vibration signal is extracted by eliminating static object clutter and motion effects; second, the weighted Mel Frequency Cepstrum Coefficients (MFCCs) are obtained as biometric features; and finally, text-independent security authentication is performed by the WMHS (Weighted MFCCs and Hog-based SVM) method. This system is highly adaptable and can authenticate designated speakers, resist intrusion by other unspecified speakers as well as playback attacks, and is secure for smart devices. Extensive experiments have verified that the system achieves a 93.4% speaker verification accuracy and a 5.8% miss detection rate for playback attacks.
26

Li, Tingyu. "Speaker Recognition System based on Triplet State Loss Function". Scientific Journal of Technology 5, no. 8 (August 22, 2023): 39–46. http://dx.doi.org/10.54691/sjt.v5i8.5496.

Abstract:
The purpose of this paper is to build a model and design a speaker recognition system by comprehensively reviewing domestic and international research on speaker recognition models and adopting a research method based on deep machine learning. Its main contents and proposed methods are as follows. For data processing, a public dataset is selected and downloaded from its official website; each utterance is preprocessed, Fbank features are extracted and converted to .npy files for storage, turning the speech into a format suitable for model input. A ResCNN architecture based on convolutional neural networks is used to build the model. The model is trained with a triplet loss function to map speech into an embedding space, so cosine similarity is used directly to characterize the distance between two speakers. The speaker verification function provides three different ways to obtain speech: the two acquired utterances are fed into the model, which judges their similarity and gives the result. For speaker identification, three different ways can likewise be used to obtain speech and determine which speaker in the corpus it belongs to. For speaker confirmation, an utterance is played at random, a speaker is selected at random, and the system judges whether the utterance is that speaker's voice.
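The cosine-similarity triplet loss mentioned in this abstract can be sketched in a minimal scalar form. The margin value here is an assumption for illustration, not the paper's setting:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Cosine-similarity triplet loss: the loss is zero once the
    anchor-positive similarity exceeds the anchor-negative similarity
    by at least `margin`, pushing same-speaker embeddings together and
    different-speaker embeddings apart."""
    return max(0.0, cosine(anchor, negative) - cosine(anchor, positive) + margin)
```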
27

Abid Noor, Ali O. "Robust speaker verification in band-localized noise conditions". Indonesian Journal of Electrical Engineering and Computer Science 13, no. 2 (February 1, 2019): 499. http://dx.doi.org/10.11591/ijeecs.v13.i2.pp499-506.

Abstract:
This research paper presents a robust method for speaker verification in noisy environments. The noise is assumed to contaminate certain parts of the voice's frequency spectrum. Therefore, the verification method is based on splitting the noisy speech into subsidiary bands and using a threshold to sense the existence of noise in a specific part of the spectrum, activating an adaptive filter in that part to track changes in the noise's characteristics and remove it. The decomposition is achieved using low-complexity quadrature mirror filters (QMF) in three levels, yielding four non-uniform bands that resemble human auditory perception. Speaker recognition is based on vector quantization (VQ), a template matching technique. Features are extracted from the speaker's voice using the normalized power, in a similar way to the Mel-frequency cepstral coefficients. The performance of the proposed system is evaluated using 60 speakers subjected to five levels of signal-to-noise ratio (SNR), using the total success rate (TSR), false acceptance rate (FAR), false rejection rate (FRR), and equal error rate. The proposed method showed higher recognition accuracy than existing methods in severe noise conditions.
28

A, Ajila. "An Overview on Automatic Speaker Verification System Techniques". International Journal for Research in Applied Science and Engineering Technology 8, no. 4 (April 30, 2020): 355–61. http://dx.doi.org/10.22214/ijraset.2020.4055.

29

Teoh, Andrew Beng Jin, and Lee-Ying Chong. "Secure speech template protection in speaker verification system". Speech Communication 52, no. 2 (February 2010): 150–63. http://dx.doi.org/10.1016/j.specom.2009.09.003.

30

Wang, Jia-Ching, Li-Xun Lian, Yan-Yu Lin and Jia-Hao Zhao. "VLSI Design for SVM-Based Speaker Verification System". IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23, no. 7 (July 2015): 1355–59. http://dx.doi.org/10.1109/tvlsi.2014.2335112.

31

Bae, Ara, and Wooil Kim. "Speaker Verification Employing Combinations of Self-Attention Mechanisms". Electronics 9, no. 12 (December 21, 2020): 2201. http://dx.doi.org/10.3390/electronics9122201.

Abstract:
One of the most recent speaker recognition methods that demonstrates outstanding performance in noisy environments involves extracting the speaker embedding using an attention mechanism instead of average or statistics pooling. In the attention method, speaker recognition performance is improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods to extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination comprising single-head and split-based multi-head attention shows a 5.39% Equal Error Rate (EER). When the single-head and projection-based multi-head attention methods are combined, speaker recognition performance improves by 4.45%, which is the best performance in this work. Our experimental results demonstrate that the attention mechanism reflects the speaker's properties more effectively than average or statistics pooling, and that the speaker verification system can be further improved by employing combinations of different attention techniques.
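Attentive pooling of the kind compared in this abstract can be sketched in its simplest single-head form. The attention parameter `w` stands in for a learned weight vector; this is an illustration, not the authors' model:

```python
import numpy as np

def attentive_pooling(frames, w):
    """Single-head attentive pooling over frame-level features.

    frames : (T, D) frame-level features
    w      : (D,) attention parameter vector (stand-in for a learned weight)
    Returns the (D,) utterance-level embedding: a softmax-weighted average,
    so informative frames contribute more than they would under plain
    average pooling.
    """
    scores = frames @ w                  # (T,) unnormalized attention scores
    scores = scores - scores.max()       # numerical stability before exp
    att = np.exp(scores) / np.exp(scores).sum()
    return att @ frames                  # (D,) weighted average of frames
```

With `w = 0` every frame gets equal weight and the result reduces to average pooling, which is exactly the baseline the attention methods improve on.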
32

Kim, Ju-Ho, Hye-Jin Shim, Jee-Weon Jung and Ha-Jin Yu. "A Supervised Learning Method for Improving the Generalization of Speaker Verification Systems by Learning Metrics from a Mean Teacher". Applied Sciences 12, no. 1 (December 22, 2021): 76. http://dx.doi.org/10.3390/app12010076.

Texte intégral
Résumé :
The majority of recent speaker verification tasks are studied under open-set evaluation scenarios considering real-world conditions. The characteristics of these tasks imply that the generalization towards unseen speakers is a critical capability. Thus, this study aims to improve the generalization of the system for the performance enhancement of speaker verification. To achieve this goal, we propose a novel supervised-learning-method-based speaker verification system using the mean teacher framework. The mean teacher network refers to the temporal averaging of deep neural network parameters, which can produce more accurate and stable representations than the fixed weights at the end of training and is conventionally used for semi-supervised learning. Leveraging the success of the mean teacher framework in many studies, the proposed supervised learning method exploits the mean teacher network as an auxiliary model for better training of the main model, the student network. By learning the reliable intermediate representations derived from the mean teacher network as well as one-hot speaker labels, the student network is encouraged to explore more discriminative embedding spaces. The experimental results demonstrate that the proposed method relatively reduces the equal error rate by 11.61%, compared to the baseline system.
Styles APA, Harvard, Vancouver, ISO, etc.
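The temporal parameter averaging behind the mean teacher is an exponential moving average (EMA) of the student's weights. A minimal sketch with toy parameters (the dictionary layout and decay value are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def ema_update(teacher, student, decay=0.99):
    """Mean-teacher step: every teacher parameter becomes an exponential
    moving average of the corresponding student parameter."""
    return {name: decay * teacher[name] + (1.0 - decay) * student[name]
            for name in teacher}

# Toy one-layer model; in practice these would be network weight tensors.
student = {"w": np.ones((4, 4)), "b": np.zeros(4)}
teacher = {"w": np.zeros((4, 4)), "b": np.zeros(4)}

for _ in range(10):  # ten training steps (student held fixed for clarity)
    teacher = ema_update(teacher, student, decay=0.9)
# teacher["w"] has moved 1 - 0.9**10 of the way toward the student weights
```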
33

Machado, Vieira Filho et de Oliveira. « Forensic Speaker Verification Using Ordinary Least Squares ». Sensors 19, no 20 (10 octobre 2019) : 4385. http://dx.doi.org/10.3390/s19204385.

Texte intégral
Résumé :
In Brazil, the recognition of speakers for forensic purposes still relies on a subjectivity-based decision-making process through a results analysis of untrustworthy techniques. Owing to the lack of a voice database, speaker verification is currently applied to samples specifically collected for confrontation. However, speaker comparative analysis via contested discourse requires the collection of an excessive amount of voice samples for a series of individuals. Further, the recognition system must indicate which of the pre-selected individuals is most compatible with the contested voice. Accordingly, this paper proposes using a combination of linear predictive coding (LPC) and ordinary least squares (OLS) as a speaker verification tool for forensic analysis. The proposed recognition technique establishes confidence and similarity upon which to base forensic reports, indicating verification of the speaker of the contested discourse. This paper thus contributes an accurate, quick alternative method to help verify the speaker. After running seven different tests, this study preliminarily achieved a hit rate of 100% considering a limited dataset (Brazilian Portuguese). Furthermore, the developed method extracts a larger number of formants, which are indispensable for statistical comparisons via OLS. The proposed framework is robust at certain levels of noise, for sentences with the suppression of word changes, and for audio of different quality or even with meaningful differences in duration.
Styles APA, Harvard, Vancouver, ISO, etc.
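One way to read the LPC/OLS combination is that LPC coefficients can themselves be estimated by ordinary least squares, regressing each sample on its predecessors. A sketch on a synthetic second-order process (the signal, order, and coefficient values are invented for illustration, not taken from the paper):

```python
import numpy as np

def lpc_ols(signal, order):
    """Estimate LPC coefficients by ordinary least squares: regress each
    sample on its `order` predecessors (covariance method)."""
    n = len(signal)
    # Column k holds the signal delayed by k+1 samples.
    X = np.column_stack([signal[order - k - 1:n - k - 1] for k in range(order)])
    y = signal[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

# Synthetic AR(2) "speech-like" signal with known coefficients.
rng = np.random.default_rng(1)
true = np.array([1.5, -0.75])
s = np.zeros(5000)
for i in range(2, len(s)):
    s[i] = true[0] * s[i - 1] + true[1] * s[i - 2] + rng.normal(scale=0.1)

est = lpc_ols(s, order=2)   # recovers roughly [1.5, -0.75]
```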
34

Sepulveda Sepulveda, Franklin Alexander, Dagoberto Porras-Plata et Milton Sarria-Paja. « Speaker verification system based on articulatory information from ultrasound recordings ». DYNA 87, no 213 (1 avril 2020) : 9–16. http://dx.doi.org/10.15446/dyna.v87n213.81772.

Texte intégral
Résumé :
Current state-of-the-art speaker verification (SV) systems are known to be strongly affected by unexpected variability present during testing, such as environmental noise or changes in vocal effort. In this work, we analyze and evaluate articulatory information of the tongue's movement as a means to improve the performance of speaker verification systems. We use a Spanish database, where besides the speech signals, we also include articulatory information that was acquired with an ultrasound system. Two groups of features are proposed to represent the articulatory information, and the obtained performance is compared to an SV system trained only with acoustic information. Our results show that the proposed features contain highly discriminative information, and they are related to speaker identity; furthermore, these features can be used to complement and improve existing systems by combining such information with cepstral coefficients at the feature level.
Styles APA, Harvard, Vancouver, ISO, etc.
35

Bharathi, B. « Speaker-specific-text based speaker verification system using spectral and phase based features ». International Journal of Speech Technology 20, no 3 (12 mai 2017) : 465–74. http://dx.doi.org/10.1007/s10772-017-9416-2.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
36

Wang, Meng, Dazheng Feng, Tingting Su et Mohan Chen. « Attention-Based Temporal-Frequency Aggregation for Speaker Verification ». Sensors 22, no 6 (10 mars 2022) : 2147. http://dx.doi.org/10.3390/s22062147.

Texte intégral
Résumé :
Convolutional neural networks (CNNs) have significantly promoted the development of speaker verification (SV) systems because of their powerful deep feature learning capability. In CNN-based SV systems, utterance-level aggregation is an important component, and it compresses the frame-level features generated by the CNN frontend into an utterance-level representation. However, most of the existing aggregation methods aggregate the extracted features across time and cannot capture the speaker-dependent information contained in the frequency domain. To handle this problem, this paper proposes a novel attention-based frequency aggregation method, which focuses on the key frequency bands that provide more information for utterance-level representation. Meanwhile, two more effective temporal-frequency aggregation methods are proposed in combination with the existing temporal aggregation methods. The two proposed methods can capture the speaker-dependent information contained in both the time domain and frequency domain of frame-level features, thus improving the discriminability of speaker embedding. In addition, a powerful CNN-based SV system is developed and evaluated on the TIMIT and Voxceleb datasets. The experimental results indicate that the CNN-based SV system using the temporal-frequency aggregation method achieves a superior equal error rate of 5.96% on Voxceleb compared with the state-of-the-art baseline models.
Styles APA, Harvard, Vancouver, ISO, etc.
37

Takialddin, Al smadi, et Ahmed Handam. « Artificial neural networks for voice activity detection Technology ». Journal of Advanced Sciences and Engineering Technologies 5, no 1 (14 janvier 2022) : 23–31. http://dx.doi.org/10.32441/jaset.05.01.03.

Texte intégral
Résumé :
Currently, the field of voice biometrics is actively developing. It includes two related tasks of recognizing a speaker by voice: the verification task, which checks whether a recording belongs to a particular claimed speaker, and the identification task, which determines the speaker's identity. An open question remains regarding improving the quality of verification and identification algorithms in real conditions and reducing the probability of error. In this work, a voice activity detection (VAD) algorithm is proposed, which is a modification of an algorithm based on pitch statistics. VAD is investigated as a component of a voice-based speaker recognition system, so the main purpose of its work is to improve the quality of the system as a whole. Using the proposed modification of the VAD algorithm and an energy-based VAD algorithm as examples, the influence of the VAD choice on the quality of the speaker recognition system is analyzed.
Styles APA, Harvard, Vancouver, ISO, etc.
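The energy-based VAD that serves as the comparison baseline here can be sketched in a few lines (the frame length and threshold values are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold_db=-30.0):
    """Energy-based VAD: a frame counts as speech when its log energy is
    within `threshold_db` of the loudest frame in the utterance."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energy_db = 10.0 * np.log10(np.mean(frames ** 2, axis=1) + 1e-12)
    return energy_db > energy_db.max() + threshold_db

rng = np.random.default_rng(3)
silence = rng.normal(scale=0.001, size=1600)   # 10 near-silent frames
speech = rng.normal(scale=0.5, size=1600)      # 10 loud frames
decisions = energy_vad(np.concatenate([silence, speech]))
```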
38

Indu D. « A Methodology for Speaker Diazaration System Based on LSTM and MFCC Coefficients ». Journal of Electrical Systems 20, no 6s (2 mai 2024) : 2938–45. http://dx.doi.org/10.52783/jes.3299.

Texte intégral
Résumé :
Research on speaker identification is always challenging. A speaker may be automatically identified by comparing a voice sample with their previously recorded voice, and the machine learning strategy has grown in favor in recent years. Convolutional neural networks (CNNs) and deep neural networks (DNNs) are some of the machine learning techniques that have been employed recently. The article will discuss a successful speaker verification system based on the d-vector to construct a new approach to speaker diarization. In particular, in this article, we use an LSTM to cluster the speech segments using MFCC coefficients and identify the speakers in the diarization system. The proposed system will be evaluated using benchmark performance metrics, and a comparative study will be made with other models. The need to consider the LSTM neural network using acoustic data and linguistic dialect is considered. LSTM networks can produce reliable speaker segmentation outputs.
Styles APA, Harvard, Vancouver, ISO, etc.
39

Suhartono, Suhartono, Fresy Nugroho, Muhammad Faisal, Muhammad Ainul Yaqin et Suyanta Suyanta. « Speaker Recognition in Content-based Image Retrieval for a High Degree of Accuracy ». Bulletin of Electrical Engineering and Informatics 7, no 3 (1 septembre 2018) : 350–58. http://dx.doi.org/10.11591/eei.v7i3.957.

Texte intégral
Résumé :
The purpose of this research is to measure speaker recognition accuracy in Content-Based Image Retrieval. To support this, we use two approaches in the recognition system: identification and verification; identification uses fuzzy Mamdani inference, and verification uses the Manhattan distance. The test results show that the best mean distance is obtained at a size of 32x32, and the best verification distance rate is 965; the speaker recognition system has a standard error of 5% and an accuracy of 95%. From these results, we find an increase in accuracy of almost 2.5%, which is due to the combination of the two approaches.
Styles APA, Harvard, Vancouver, ISO, etc.
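The Manhattan-distance verification step reduces to a thresholded L1 distance between an enrolled template and a probe vector. A minimal sketch (the feature values and threshold below are invented for illustration):

```python
import numpy as np

def manhattan_distance(a, b):
    """L1 (city-block) distance between two feature vectors."""
    return float(np.abs(np.asarray(a, dtype=float)
                        - np.asarray(b, dtype=float)).sum())

def verify(template, probe, threshold):
    """Accept the identity claim when the probe lies within `threshold`
    of the enrolled template."""
    return manhattan_distance(template, probe) <= threshold

template = np.array([1.0, 2.0, 3.0])
accept = verify(template, [1.1, 2.1, 2.9], threshold=0.5)   # distance 0.3
reject = verify(template, [4.0, 0.0, 0.0], threshold=0.5)   # distance 8.0
```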
40

Gorban, Igor I., Nick I. Gorban et Anatoly V. Klimenko. « Crime‐detection automatic speaker verification and identification (CASVI) system ». Journal of the Acoustical Society of America 105, no 2 (février 1999) : 1353. http://dx.doi.org/10.1121/1.426411.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
41

Ramos-Lara, Rafael, Mariano López-García, Enrique Cantó-Navarro et Luís Puente-Rodriguez. « Real-Time Speaker Verification System Implemented on Reconfigurable Hardware ». Journal of Signal Processing Systems 71, no 2 (28 juin 2012) : 89–103. http://dx.doi.org/10.1007/s11265-012-0683-5.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
42

Impedovo, Donato, Giuseppe Pirlo et Mario Petrone. « A multi-resolution multi-classifier system for speaker verification ». Expert Systems 29, no 5 (2 juin 2011) : 442–55. http://dx.doi.org/10.1111/j.1468-0394.2011.00603.x.

Texte intégral
Styles APA, Harvard, Vancouver, ISO, etc.
43

Nirmal, Asmita, Deepak Jayaswal et Pramod H. Kachare. « Statistically Significant Duration-Independent-based Noise-Robust Speaker Verification ». International Journal of Mathematical, Engineering and Management Sciences 9, no 1 (1 février 2024) : 147–62. http://dx.doi.org/10.33889/ijmems.2024.9.1.008.

Texte intégral
Résumé :
A speaker verification system models individual speakers using different speech features to improve their robustness. However, redundant features degrade the system's performance. This paper presents Statistically Significant Duration-Independent Mel Frequency Cepstral Coefficient (SSDI-MFCC) features with the Extreme Gradient Boost classifier to improve the noise robustness of speaker models. Eight statistical descriptors are used to generate signal-duration-independent features, and a statistically significant feature subset is obtained using a t-test. A redeveloped Librispeech database, created by adding noises from the AURORA database to simulate real-world test conditions for speaker verification, is used for evaluation. SSDI-MFCC is compared with Principal Component Analysis (PCA) and a Genetic Algorithm (GA). The comparative results show average equal error rate improvements of 4.93% and 3.48% with SSDI-MFCC over GA-MFCC and PCA-MFCC in clean and noisy conditions, respectively. A significant reduction in verification time is also observed with SSDI-MFCC compared to the complete feature set.
Styles APA, Harvard, Vancouver, ISO, etc.
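The t-test feature selection step can be sketched with a hand-rolled Welch statistic. The data, dimensions, and threshold below are invented for illustration, not taken from the paper:

```python
import numpy as np

def t_statistic(x, y):
    """Welch's t-statistic between two samples of one feature."""
    return (x.mean() - y.mean()) / np.sqrt(
        x.var(ddof=1) / len(x) + y.var(ddof=1) / len(y))

def select_significant(class_a, class_b, t_threshold=2.0):
    """Keep indices of features whose |t| exceeds the threshold, i.e.
    features that actually separate the two classes."""
    return [j for j in range(class_a.shape[1])
            if abs(t_statistic(class_a[:, j], class_b[:, j])) > t_threshold]

rng = np.random.default_rng(2)
# Feature 0 differs between the classes; feature 1 is pure noise and is
# usually (though not always, at the 5% level) discarded.
a = np.column_stack([rng.normal(2.0, 1.0, 200), rng.normal(0.0, 1.0, 200)])
b = np.column_stack([rng.normal(0.0, 1.0, 200), rng.normal(0.0, 1.0, 200)])

kept = select_significant(a, b)   # feature 0 is always retained
```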
44

Selin, M., et Dr K. Preetha Mathew. « Text-independent Speaker Verification Using Hybrid Convolutional Neural Networks ». Webology 18, no 2 (23 décembre 2021) : 756–66. http://dx.doi.org/10.14704/web/v18i2/web18352.

Texte intégral
Résumé :
Automatic speaker verification has been an active research area for more than four decades, and the technology has gradually matured for real applications. In this paper, a hybrid convolutional neural network (CNN) model combining 3D CNN and 2D CNN models is proposed for speaker verification in the text-independent scenario. This novel convolutional neural network architecture was built to capture speaker information and discard non-speaker information at the same time. In the training process, the network is trained to differentiate between the identities of different speakers to establish the background model. The development of the speaker model is one of the important aspects. Most conventional techniques employ the d-vector system to create speaker models by averaging the features collected from the speaker's utterances. Here, a hybrid convolutional neural network model is utilized in the development and registration phases to build a speaker model. The suggested approach outperforms existing speaker verification methods.
Styles APA, Harvard, Vancouver, ISO, etc.
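The conventional d-vector enrollment this abstract refers to, averaging utterance embeddings into a speaker model and scoring by cosine similarity, can be sketched as follows (random vectors stand in for network embeddings):

```python
import numpy as np

def enroll(dvectors):
    """Conventional d-vector enrollment: the speaker model is the
    length-normalized mean of utterance-level embeddings."""
    model = np.mean(dvectors, axis=0)
    return model / np.linalg.norm(model)

def score(model, test_dvector):
    """Cosine similarity between the speaker model and a test embedding."""
    t = np.asarray(test_dvector, dtype=float)
    return float(model @ (t / np.linalg.norm(t)))

rng = np.random.default_rng(6)
enroll_embs = rng.normal(size=(5, 16)) + 3.0     # same-speaker cluster
model = enroll(enroll_embs)
same = score(model, rng.normal(size=16) + 3.0)   # genuine trial
diff = score(model, rng.normal(size=16) - 3.0)   # impostor trial
```

A genuine trial drawn from the same cluster scores well above an impostor trial, which is the decision margin a threshold would exploit.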
45

Mingote, Victoria, Antonio Miguel, Alfonso Ortega et Eduardo Lleida. « Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification ». Applied Sciences 9, no 16 (11 août 2019) : 3295. http://dx.doi.org/10.3390/app9163295.

Texte intégral
Résumé :
In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance from the global average pooling of the temporal dimension. Our system replaces this reduction mechanism by a phonetic phrase alignment model to keep the temporal structure of each phrase, since the phonetic information is relevant in the verification task. Moreover, we can apply a convolutional neural network as a front-end, and, thanks to the alignment process being differentiable, we can train the network to produce a supervector for each utterance that will be discriminative to the speaker and the phrase simultaneously. This choice has the advantage that the supervector encodes the phrase and speaker information, providing good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model using alignment to produce supervectors was evaluated on the RSR2015-Part I database, providing competitive results compared to similar-size networks that make use of global average pooling to extract embeddings. Furthermore, we also evaluated this proposal on RSR2015-Part II. To our knowledge, this system achieves the best results published on this second part.
Styles APA, Harvard, Vancouver, ISO, etc.
46

Khan, Umair, Pooyan Safari et Javier Hernando. « Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows ». Applied Sciences 9, no 13 (9 juillet 2019) : 2761. http://dx.doi.org/10.3390/app9132761.

Texte intégral
Résumé :
Restricted Boltzmann Machines (RBMs) have shown success in both the front-end and back-end of speaker verification systems. In this paper, we propose applying RBMs to the front-end for the tasks of speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because of the lack of data for a test speaker, we propose RBM adaptation to a global model. First, the global model, referred to as the universal RBM, is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated along with the bias vectors and are whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used in the tasks of speaker clustering and speaker tracking. The evaluation was performed on the audio recordings of Catalan TV broadcast shows. The experimental results show that our proposed speaker clustering system gained up to 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. On the other hand, in the task of speaker tracking, our system has a relative improvement of 11% and 7% compared to the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively.
Styles APA, Harvard, Vancouver, ISO, etc.
47

Rudramurthy, M. S., V. Kamakshi Prasad et R. Kumaraswamy. « Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm ». Journal of Intelligent Systems 23, no 4 (1 décembre 2014) : 359–78. http://dx.doi.org/10.1515/jisys-2013-0085.

Texte intégral
Résumé :
The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.
Styles APA, Harvard, Vancouver, ISO, etc.
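The 39-dimension feature described above extends 13 MFCCs with velocity and acceleration terms. The standard regression-based delta computation can be sketched as follows (the window width and toy data are illustrative assumptions):

```python
import numpy as np

def deltas(features, width=2):
    """Regression-based delta coefficients over a +/- `width` window.
    Applied once this gives velocity; applied twice, acceleration."""
    padded = np.pad(features, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    n = len(features)
    return sum(k * (padded[width + k:n + width + k]
                    - padded[width - k:n + width - k])
               for k in range(1, width + 1)) / denom

rng = np.random.default_rng(5)
mfcc = rng.normal(size=(100, 13))  # toy 13-dimension cepstra, 100 frames
d1 = deltas(mfcc)                  # velocity
d2 = deltas(d1)                    # acceleration
full = np.hstack([mfcc, d1, d2])   # the 39-dimension feature
```

On a linearly increasing sequence the interior delta values equal the slope, which is a quick sanity check for the window arithmetic.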
48

Chen, Zesheng, Li-Chi Chang, Chao Chen, Guoping Wang et Zhuming Bi. « Defending against FakeBob Adversarial Attacks in Speaker Verification Systems with Noise-Adding ». Algorithms 15, no 8 (17 août 2022) : 293. http://dx.doi.org/10.3390/a15080293.

Texte intégral
Résumé :
Speaker verification systems use human voices as an important biometric to identify legitimate users, thus adding a security layer to voice-controlled Internet-of-things smart homes against illegal access. Recent studies have demonstrated that speaker verification systems are vulnerable to adversarial attacks such as FakeBob. The goal of this work is to design and implement a simple and lightweight defense system that is effective against FakeBob. We specifically study two opposite pre-processing operations on input audios in speaker verification systems: denoising, which attempts to remove or reduce perturbations, and noise-adding, which adds small noise to an input audio. Through experiments, we demonstrate that both methods are able to weaken the ability of FakeBob attacks significantly, with noise-adding achieving even better performance than denoising. Specifically, with denoising, the targeted attack success rate of FakeBob attacks can be reduced from 100% to 56.05% in GMM speaker verification systems, and from 95% to only 38.63% in i-vector speaker verification systems, respectively. With noise-adding, those numbers can be further lowered down to 5.20% and 0.50%, respectively. As a proactive measure, we study several possible adaptive FakeBob attacks against the noise-adding method. Experiment results demonstrate that noise-adding can still provide a considerable level of protection against these countermeasures.
Styles APA, Harvard, Vancouver, ISO, etc.
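The noise-adding defense amounts to injecting small Gaussian noise at a fixed SNR before the audio reaches the verification backend. A minimal sketch (the sampling rate and SNR value are illustrative assumptions, not the paper's settings):

```python
import numpy as np

def add_defensive_noise(audio, snr_db=20.0, rng=None):
    """Add Gaussian noise at the requested SNR so that fine-grained
    adversarial perturbations such as FakeBob's are partially destroyed."""
    if rng is None:
        rng = np.random.default_rng()
    signal_power = np.mean(audio ** 2)
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    return audio + rng.normal(scale=np.sqrt(noise_power), size=audio.shape)

rng = np.random.default_rng(4)
clean = rng.normal(scale=0.3, size=16000)          # one second at 16 kHz
defended = add_defensive_noise(clean, snr_db=20.0, rng=rng)
```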
49

Jayanthi Kumari, T. R., et H. S. Jayanna. « i-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques ». Journal of Intelligent Systems 29, no 1 (3 mai 2018) : 565–82. http://dx.doi.org/10.1515/jisys-2017-0047.

Texte intégral
Résumé :
In many biometric applications, limited-data speaker verification plays a significant role in practically oriented systems to verify the speaker. The performance of the speaker verification system needs to be improved by applying suitable techniques to the limited data condition, in which both the train and test data durations are only a few seconds. This article shows the importance of the speaker verification system under the limited data condition using feature- and score-level fusion techniques. The baseline speaker verification system uses vocal tract features such as mel-frequency cepstral coefficients and linear predictive cepstral coefficients, and excitation source features such as the linear prediction residual and linear prediction residual phase, along with i-vector modeling techniques on the NIST 2003 data set. In feature-level fusion, the vocal tract features are fused with excitation source features; as a result, an equal error rate (EER) of approximately 4% is obtained on average, compared to the individual feature performance. Further, two different types of score-level fusion are demonstrated. In the first case, the scores of vocal tract features and excitation source features are fused at the score level while the modeling technique remains the same, which provides an average reduction of approximately 2% EER compared to the feature-level fusion performance. In the second case, the scores of different modeling techniques are combined, which results in an EER reduction of approximately 4.5% compared with score-level fusion of different features.
Styles APA, Harvard, Vancouver, ISO, etc.
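Score-level fusion of the kind described reduces to a weighted sum of per-system scores. A minimal sketch with invented scores (not the paper's actual systems or weights):

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Score-level fusion: weighted sum of per-system verification
    scores; each inner list holds one score per trial."""
    scores = np.asarray(score_lists, dtype=float)  # (n_systems, n_trials)
    if weights is None:
        weights = np.full(scores.shape[0], 1.0 / scores.shape[0])
    return np.asarray(weights, dtype=float) @ scores

mfcc_scores = [0.9, 0.2, 0.7]      # hypothetical vocal-tract system
residual_scores = [0.8, 0.4, 0.1]  # hypothetical excitation-source system
fused = fuse_scores([mfcc_scores, residual_scores])  # equal-weight fusion
```

In practice the weights are tuned on a development set so that the stronger system dominates the fused decision.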
50

Mao, Hongwei, Yan Shi, Yue Liu, Linqiang Wei, Yijie Li et Yanhua Long. « Short-time speaker verification with different speaking style utterances ». PLOS ONE 15, no 11 (11 novembre 2020) : e0241809. http://dx.doi.org/10.1371/journal.pone.0241809.

Texte intégral
Résumé :
In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, the promotion of ASV technology is still a very challenging issue, because most technologies are still very sensitive to new, unknown and spoofing conditions. Most previous studies focused on extracting target speaker information from natural speech. This paper aims to design a new ASV corpus with multiple speaking styles and investigate the ASV robustness to these different speaking styles. We first release this corpus on the Zenodo website for public research, in which each speaker has several text-dependent and text-independent singing, humming and normal reading speech utterances. Then, we investigate the speaker discrimination of each speaking style in the feature space. Furthermore, the intra- and inter-speaker variabilities in each speaking style and across speaking styles are investigated in both text-dependent and text-independent ASV tasks. The conventional Gaussian Mixture Model (GMM) and the state-of-the-art x-vector are used to build ASV systems. Experimental results show that the voiceprint information in humming and singing speech is more distinguishable than that in normal reading speech for conventional ASV systems. Furthermore, we find that combining the three speaking styles can significantly improve the x-vector based ASV system, even when only limited gains are obtained by conventional GMM-based systems.
Styles APA, Harvard, Vancouver, ISO, etc.
