Journal articles on the topic 'Speaker verification system'


Consult the top 50 journal articles for your research on the topic 'Speaker verification system.'


1

Watari, Masao. "Speaker verification system." Journal of the Acoustical Society of America 91, no. 1 (January 1992): 546. http://dx.doi.org/10.1121/1.402663.

2

Sakoe, Hiroaki. "Speaker verification system." Journal of the Acoustical Society of America 85, no. 5 (May 1989): 2246. http://dx.doi.org/10.1121/1.397806.

3

Uchiyama, Hiroki. "Speaker verification system." Journal of the Acoustical Society of America 95, no. 1 (January 1994): 593. http://dx.doi.org/10.1121/1.408274.

4

Shanmugapriya, P., and Y. Venkataramani. "Analysis of Speaker Verification System Using Support Vector Machine." JOURNAL OF ADVANCES IN CHEMISTRY 13, no. 10 (February 25, 2017): 6531–42. http://dx.doi.org/10.24297/jac.v13i10.5839.

Abstract:
The integration of the GMM supervector and the Support Vector Machine (SVM) has become one of the most popular strategies in text-independent speaker verification. This paper describes the application of the Fuzzy Support Vector Machine (FSVM) to the classification of speakers using GMM supervectors. Supervectors are formed by stacking the mean vectors of GMMs adapted from the UBM by maximum a posteriori (MAP) estimation. GMM supervectors capture a speaker's acoustic characteristics and are used to build a speaker-dependent fuzzy SVM model. Introducing fuzzy theory into the support vector machine yields better classification accuracy and requires fewer support vectors. Experiments were conducted on the 2001 NIST speaker recognition evaluation corpus. The performance of the GMM-FSVM-based speaker verification system is compared with that of the conventional GMM-UBM and GMM-SVM systems. Experimental results indicate that the fuzzy-SVM-based speaker verification system with GMM supervectors achieves better performance than the GMM-UBM system.
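The supervector construction this abstract relies on (stacking the MAP-adapted component means of a GMM) can be sketched minimally as follows; the shapes and names are illustrative assumptions, not the authors' code:

```python
import numpy as np

def gmm_supervector(adapted_means):
    """Stack the mean vectors of a MAP-adapted GMM into one supervector.

    adapted_means: (n_components, feat_dim) array of component means
    after MAP adaptation of the UBM to one speaker's data.
    Returns a vector of length n_components * feat_dim.
    """
    return np.asarray(adapted_means, dtype=float).reshape(-1)

# Toy example: a 3-component GMM over 2-dimensional features
# yields a 6-dimensional supervector.
sv = gmm_supervector([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
```

The resulting fixed-length vector is what the SVM (or FSVM) classifies, regardless of the utterance's duration.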
5

Gada, Amay, Neel Kothari, Ruhina Karani, Chetashri Badane, Dhruv Gada, and Tanish Patwa. "DR-SASV: A deep and reliable spoof aware speech verification system." International Journal on Information Technologies and Security 15, no. 4 (December 1, 2023): 93–106. http://dx.doi.org/10.59035/ffmb8272.

Abstract:
A spoof-aware speaker verification system is an integrated system capable of jointly identifying impostor speakers and spoofing attacks from target speakers. Such a system largely helps in protecting sensitive data, mitigating fraud, and reducing theft. Research has recently enhanced the effectiveness of countermeasure systems and automatic speaker verification systems separately, producing low Equal Error Rates (EER) for each; however, work exploring their combination is still scarce. This paper proposes an end-to-end solution to spoof-aware automatic speaker verification (ASV) by introducing a Deep Reliable Spoof-Aware Speaker Verification (DR-SASV) system. The proposed system passes the target audio sequentially through a spoof-aware speaker verification model after applying a convolutional neural network (CNN)-based spoof detection model. The system produces encouraging results after being trained on the ASVspoof 2019 LA dataset: the spoof detection model achieves a validation accuracy of 96%, while the transformer-based speech verification model authenticates users with an error rate of 13.74%. The system surpasses other state-of-the-art models, producing an EER of 10.32%.
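As a rough illustration of the sequential design described above (spoof detection first, then speaker verification), the final decision logic might look like the following sketch; both thresholds and all names are hypothetical:

```python
def sasv_decision(spoof_prob, sv_score, spoof_thr=0.5, sv_thr=0.7):
    """Cascade decision: reject outright if the spoof detector fires,
    otherwise accept iff the speaker-verification score clears its
    threshold. Both thresholds are assumed operating points that would
    be tuned on a development set."""
    if spoof_prob >= spoof_thr:
        return False  # flagged as spoofed: reject regardless of SV score
    return sv_score >= sv_thr
```

The cascade means the verification model only ever scores audio the countermeasure has already passed, which is the "spoof aware" property the abstract describes.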
6

Mammone, Richard J. "Speaker identification and verification system." Journal of the Acoustical Society of America 101, no. 2 (February 1997): 665. http://dx.doi.org/10.1121/1.419408.

7

Rabin, Michael D. "Speaker verification system and process." Journal of the Acoustical Society of America 103, no. 6 (June 1998): 3138. http://dx.doi.org/10.1121/1.423030.

8

Milewski, Krzysztof, Szymon Zaporowski, and Andrzej Czyżewski. "Comparison of the Ability of Neural Network Model and Humans to Detect a Cloned Voice." Electronics 12, no. 21 (October 30, 2023): 4458. http://dx.doi.org/10.3390/electronics12214458.

Abstract:
The vulnerability of the speaker identity verification system to attacks using voice cloning was examined. The research project assumed creating a model for verifying the speaker’s identity based on voice biometrics and then testing its resistance to potential attacks using voice cloning. The Deep Speaker Neural Speaker Embedding System was trained, and the Real-Time Voice Cloning system was employed based on the SV2TTS, Tacotron, WaveRNN, and GE2E neural networks. The results of attacks using voice cloning were analyzed and discussed in the context of a subjective assessment of cloned voice fidelity. Subjective test results and attempts to authenticate speakers proved that the tested biometric identity verification system might resist voice cloning attacks even if humans cannot distinguish cloned samples from original ones.
9

Bouziane, Ayoub, Jamal Kharroubi, and Arsalane Zarghili. "Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models." International Journal of Electrical and Computer Engineering (IJECE) 7, no. 6 (December 1, 2017): 3655. http://dx.doi.org/10.11591/ijece.v7i6.pp3655-3663.

Abstract:
This paper presents a novel speaker modeling approach for speaker recognition systems. The basic idea consists of deriving the target speaker model from a personalized background model composed only of the UBM Gaussian components that are actually present in the speech of the target speaker. The motivation behind deriving speakers' models from personalized background models is to exploit the observed difference in some acoustic classes between speakers in order to improve the performance of speaker recognition systems. The proposed approach was evaluated on the speaker verification task using various amounts of training and testing speech data. The experimental results showed that, compared to traditional UBM-based speaker recognition systems, the proposed approach is efficient in terms of both verification performance and computational cost during the testing phase.
10

Pham, Tuan, and Michael Wagner. "Speaker Verification with Fuzzy Fusion and Genetic Optimization." Journal of Advanced Computational Intelligence and Intelligent Informatics 3, no. 6 (December 20, 1999): 451–56. http://dx.doi.org/10.20965/jaciii.1999.p0451.

Abstract:
Most speaker verification systems are based on similarity or likelihood normalization techniques, as these help to better cope with speaker variability. In conventional normalization, the a priori probabilities of the cohort speakers are assumed to be equal. From this standpoint, we apply the fuzzy integral and genetic algorithms to combine the likelihood values of the cohort speakers, relaxing the assumption of equal a priori probabilities. This approach replaces the conventional normalization term with the fuzzy integral, which acts as a non-linear fusion of the similarity measures of an utterance against the cohort speakers. Furthermore, genetic algorithms are applied to find the optimal fuzzy densities, which are crucial for the fuzzy fusion. We illustrate the performance of the proposed approach by testing the speaker verification system with both the conventional and the proposed algorithms on the commercial speech corpus TI46. The results, in terms of equal error rates, show that the speaker verification system using the fuzzy integral compares favorably with the conventional normalization method.
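The normalization being relaxed here can be sketched as a weighted cohort likelihood ratio. The fuzzy integral itself is more involved, so the sketch below substitutes a plain weighted average of cohort likelihoods, an assumption made purely for illustration:

```python
import math

def normalized_score(target_loglik, cohort_logliks, weights=None):
    """Log-likelihood-ratio score of an utterance against a cohort.

    With weights=None every cohort speaker gets equal a priori weight
    (the conventional normalization); unequal weights mimic, in a
    simplified way, relaxing that equal-prior assumption.
    """
    n = len(cohort_logliks)
    if weights is None:
        weights = [1.0 / n] * n
    m = max(cohort_logliks)  # log-sum-exp for numerical stability
    cohort_term = m + math.log(sum(w * math.exp(ll - m)
                                   for w, ll in zip(weights, cohort_logliks)))
    return target_loglik - cohort_term
```

Up-weighting a cohort speaker who matches the utterance well lowers the score, which is exactly the kind of behavior a learned (fuzzy or genetic) weighting can exploit.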
11

Kamiński, Kamil A., and Andrzej P. Dobrowolski. "Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features." Sensors 22, no. 23 (December 1, 2022): 9370. http://dx.doi.org/10.3390/s22239370.

Abstract:
This article presents an Automatic Speaker Recognition (ASR) System that successfully resolves problems such as identification within an open set of speakers and speaker verification in difficult recording conditions similar to telephone transmission. The article provides complete information on the architecture of the system's internal processing modules. The proposed system has been compared closely with competing systems, achieving improved speaker identification and verification results on a known certified voice dataset. The ASR System owes this to the dual use of genetic algorithms, both in the feature selection process and in the optimization of the system's internal parameters, as well as to the proprietary feature generation and corresponding classification process using Gaussian mixture models. This allowed the development of a system that makes an important contribution to the current state of the art in speaker recognition for telephone transmission applications with known speech coding standards.
12

Shim, Hye-jin, Jee-weon Jung, Ju-ho Kim, and Ha-jin Yu. "Integrated Replay Spoofing-Aware Text-Independent Speaker Verification." Applied Sciences 10, no. 18 (September 10, 2020): 6292. http://dx.doi.org/10.3390/app10186292.

Abstract:
A number of studies have successfully developed speaker verification or presentation attack detection systems. However, studies integrating the two tasks remain at a preliminary stage. In this paper, we propose two approaches for building an integrated system for speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach. The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system via multi-task learning on a common feature. However, through experiments, we hypothesize that the information required for speaker verification and for presentation attack detection may differ, because speaker verification systems try to remove device-specific information from speaker embeddings while presentation attack detection systems exploit such information. Therefore, we propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection. This approach has three input components: two speaker embeddings (one each for enrollment and test) and the presentation-attack prediction. Experiments are conducted using the ASVspoof 2017-v2 dataset, which includes official trials on the integration of speaker verification and presentation attack detection. The proposed back-end approach demonstrates a relative improvement of 21.77% in equal error rate on the integrated trials compared to a conventional speaker verification system.
13

Luo, Hongwei, Yijie Shen, Feng Lin, and Guoai Xu. "Spoofing Speaker Verification System by Adversarial Examples Leveraging the Generalized Speaker Difference." Security and Communication Networks 2021 (February 9, 2021): 1–10. http://dx.doi.org/10.1155/2021/6664578.

Abstract:
Speaker verification systems have gained great popularity in recent years, especially with the development of deep neural networks and the Internet of Things. However, the security of speaker verification systems based on deep neural networks has not been well investigated. In this paper, we propose an attack that spoofs a state-of-the-art speaker verification system based on the generalized end-to-end (GE2E) loss function, causing illegitimate users to be misclassified as the authentic user. Specifically, we design a novel loss function to train a generator that produces effective adversarial examples with slight perturbation, and then spoof the system with these examples. The success rate of our attack reaches 82% when cosine similarity is adopted in the deep-learning-based speaker verification system. Beyond that, our experiments also report a signal-to-noise ratio of 76 dB, which shows that our attack is less perceptible than previous works. In summary, the results show that our attack not only spoofs the state-of-the-art neural-network-based speaker verification system but, more importantly, can evade both human hearing and machine discrimination.
14

INOUE, Yoshiaki, and Satoshi KUMAKURA. "Working over Speech : Speaker Verification System "VoiceGATEII", and Speaker Identification System "VoiceSync"." Proceedings of the Conference on Information, Intelligence and Precision Equipment : IIP 2000 (2000): 1–4. http://dx.doi.org/10.1299/jsmeiip.2000.1.

15

Sarmah, K. "Comparison Studies of Speaker Modeling Techniques in Speaker Verification System." International Journal of Scientific Research in Computer Science and Engineering 5, no. 5 (October 30, 2017): 75–82. http://dx.doi.org/10.26438/ijsrcse/v5i5.7582.

16

Lv, Gang, and Heming Zhao. "Joint Factor Analysis of Channel Mismatch in Whispering Speaker Verification." Archives of Acoustics 37, no. 4 (December 1, 2012): 555–59. http://dx.doi.org/10.2478/v10168-012-0065-9.

Abstract:
A speaker recognition system based on joint factor analysis (JFA) is proposed to improve the recognition rate of whispering speakers under channel mismatch. The system separately estimates the eigenvoice and the eigenchannel before calculating the corresponding speaker and channel factors. Finally, a channel-free speaker model is built to describe a speaker accurately using model compensation. Test results on whispered-speech databases recorded over eight different channels showed that the correct recognition rate of the JFA-based system was higher than that of the Gaussian Mixture Model-Universal Background Model. In particular, the recognition rate in cellphone channel tests increased significantly.
17

Sathiamoorthy, S., R. Ponnusamy, and R. Visalakshi. "Performance of Speaker Verification Using CSM and TM." Asian Journal of Computer Science and Technology 7, no. 2 (August 5, 2018): 123–27. http://dx.doi.org/10.51983/ajcst-2018.7.2.1866.

Abstract:
In this paper, we present the performance of a speaker verification system based on features computed from speech recorded with a close-speaking microphone (CSM) and a throat microphone (TM) in clean and noisy environments. Noise is one of the most complicated problems in speaker verification. Background noise degrades the performance of speaker verification using the CSM. To overcome this issue, the TM is used: its transducer is held at the throat, yielding a clean signal unaffected by background noise. Acoustic features are computed by means of Relative Spectral Transform-Perceptual Linear Prediction (RASTA-PLP). An autoassociative neural network (AANN) is used to model the features and to verify speakers in clean and noisy environments. A new method is presented for verifying speakers in clean conditions using the CSM and TM combined. The verification performance of the proposed combined system is significantly better than that of the system using the CSM alone, owing to the complementary nature of the CSM and TM. Evaluating the FAR and FRR values yields an EER of about 1.0% for the combined devices (CSM+TM), and an overall verification rate of 99% is obtained on clean speech.
18

Gorban, Igor I. "Crime automatic speaker verification and identification system." Journal of the Acoustical Society of America 102, no. 5 (November 1997): 3165. http://dx.doi.org/10.1121/1.420769.

19

Shanmugapriya, P., and Y. Venkataramani. "Wavelet fuzzy LVQ based speaker verification system." International Journal of Speech Technology 16, no. 4 (March 14, 2013): 403–12. http://dx.doi.org/10.1007/s10772-013-9191-7.

20

Chen, Chaotao, Di Jiang, Jinhua Peng, Rongzhong Lian, Chen Jason Zhang, Qian Xu, Lixin Fan, and Qiang Yang. "A Health-friendly Speaker Verification System Supporting Mask Wearing." Proceedings of the AAAI Conference on Artificial Intelligence 35, no. 18 (May 18, 2021): 16004–6. http://dx.doi.org/10.1609/aaai.v35i18.17994.

Abstract:
We demonstrate a health-friendly speaker verification system for voice-based identity verification on mobile devices. The system is built upon a speech processing module, a ResNet-based local acoustic feature extractor, and a multi-head attention-based embedding layer, and is optimized under an additive margin softmax loss for discriminative speaker verification. The system is shown to achieve superior performance whether or not the speaker is wearing a mask. This characteristic is important for speaker verification services operating in regions affected by the coronavirus pandemic. With this demonstration, the audience will gain an in-depth experience of how the accuracy of biometric verification and personal health can be ensured simultaneously. We hope this demonstration will boost the development of next-generation biometric verification technologies.
21

Prasetio, Barlian Henryranu, Hiroki Tamura, and Koichi Tanno. "Emotional Variability Analysis Based I-Vector for Speaker Verification in Under-Stress Conditions." Electronics 9, no. 9 (September 1, 2020): 1420. http://dx.doi.org/10.3390/electronics9091420.

Abstract:
Emotional conditions cause changes in the speech production system, producing acoustic characteristics that differ from those of neutral conditions. The presence of emotion degrades the performance of a speaker verification system. In this paper, we propose a speaker modeling method that accommodates the presence of emotion in speech segments by extracting a compact speaker representation. The speaker model is estimated by a procedure similar to the i-vector technique, but it treats the emotional effect as the channel variability component. We name this method emotional variability analysis (EVA). EVA represents the emotion subspace separately from the speaker subspace, like the joint factor analysis (JFA) model. The effectiveness of the proposed system is evaluated by comparing it with a standard i-vector system on the speaker verification task of the Speech Under Simulated and Actual Stress (SUSAS) dataset with three different scoring methods, with evaluation focused on the equal error rate (EER). In addition, we conducted an ablation study for a more comprehensive analysis of the EVA-based i-vector. The experimental results show that the proposed system outperforms the standard i-vector system and achieves state-of-the-art results in the verification task for under-stress speakers.
22

Kadhim, Imad Burhan, Ali Najdet Nasret, and Zuhair Shakor Mahmood. "Enhancement and modification of automatic speaker verification by utilizing hidden Markov model." Indonesian Journal of Electrical Engineering and Computer Science 27, no. 3 (September 1, 2022): 1397. http://dx.doi.org/10.11591/ijeecs.v27.i3.pp1397-1403.

Abstract:
The purpose of this study is to discuss the design and implementation of automatic speaker verification (ASV) systems. Much rides on the advancement and improvement of ASV applications, especially given the benefits they provide over other biometric approaches. Modern speaker recognition systems rely on statistical models such as the hidden Markov model (HMM), support vector machine (SVM), artificial neural networks (ANN), Gaussian mixture models (GMM), and combined models to identify speakers. Using a French dataset, this study investigates the effectiveness of prompted-text speaker verification. A continuous speech system based on HMMs was constructed at a context-free, single-mixture monophone level. Suitable voice data are then used to build the client and world models. To verify speakers, the text-dependent speaker verification system uses sentence HMMs concatenated for the key text. In the verification step, a normalized log-likelihood is determined as the difference between the log-likelihood of the client model, obtained by forced Viterbi alignment, and that of the world model. Finally, a method for determining the verification results is presented.
23

GUOJIE, LI, P. SARATCHANDRAN, and N. SUNDARARAJAN. "TEXT-INDEPENDENT SPEAKER VERIFICATION USING MINIMAL RESOURCE ALLOCATION NETWORKS." International Journal of Neural Systems 14, no. 06 (December 2004): 347–54. http://dx.doi.org/10.1142/s0129065704002108.

Abstract:
This paper presents a text-independent speaker verification system based on an online Radial Basis Function (RBF) network referred to as the Minimal Resource Allocation Network (MRAN). MRAN is a sequential-learning RBF in which hidden neurons are added or removed as training progresses. LP-derived cepstral coefficients are used as feature vectors during the training and verification phases. The performance of MRAN is compared with other well-known RBF- and Elliptical Basis Function (EBF)-based speaker verification methods in terms of error rates and computational complexity on a series of speaker verification experiments. The experiments use data from 258 speakers from the phonetically balanced continuous speech corpus TIMIT. The results show that MRAN produces error rates comparable to the other methods with much less computational complexity.
24

Wang, Wei, Jiqing Han, Tieran Zheng, Guibin Zheng, and Xingyu Zhou. "Speaker Verification via Modeling Kurtosis Using Sparse Coding." International Journal of Pattern Recognition and Artificial Intelligence 30, no. 03 (February 22, 2016): 1659008. http://dx.doi.org/10.1142/s0218001416590084.

Abstract:
This paper proposes a new model for speaker verification that employs a kurtosis-based statistical method built on the sparse coding of the human auditory system. Only a small number of neurons in the primary auditory cortex are activated when encoding acoustic stimuli, so sparse independent events are used to represent the characteristics of the neurons. Each individual dictionary is learned from an individual speaker's samples, with dictionary atoms corresponding to cortex neurons. The neuron responses possess the statistical properties of acoustic signals in the auditory cortex, so the activation distribution of an individual speaker's neurons approximates the characteristics of that speaker. Kurtosis is an efficient measure of a neuron's sparsity derived from its activation distribution, and the vector composed of the kurtosis of every neuron is used as the model characterizing the speaker's voice. The experimental results demonstrate that the kurtosis model outperforms the baseline systems and achieves an effective identity validation function.
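The kurtosis-per-atom model described above can be sketched as follows; this is a minimal illustration that assumes the dictionary learning and sparse coding steps have already produced the activation matrix:

```python
import numpy as np

def kurtosis_model(activations):
    """One excess-kurtosis value per dictionary atom.

    activations: (n_atoms, n_frames) sparse-coding coefficients.
    The kurtosis of each atom's activation distribution measures how
    sparsely that atom fires; the resulting vector of per-atom values
    serves as the speaker model.
    """
    a = np.asarray(activations, dtype=float)
    mu = a.mean(axis=1, keepdims=True)
    var = a.var(axis=1)
    fourth = ((a - mu) ** 4).mean(axis=1)
    return fourth / var ** 2 - 3.0  # Fisher (excess) kurtosis
```

A row that fires rarely but strongly (sparse activation) yields higher kurtosis than a row with evenly spread activations, which is the property the speaker model exploits.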
25

Hao, Zhanjun, Jianxiang Peng, Xiaochao Dang, Hao Yan, and Ruidong Wang. "mmSafe: A Voice Security Verification System Based on Millimeter-Wave Radar." Sensors 22, no. 23 (November 29, 2022): 9309. http://dx.doi.org/10.3390/s22239309.

Abstract:
With the increasing popularity of smart devices, users can control their mobile phones, TVs, cars, and smart furniture with voice assistants, but voice assistants are susceptible to intrusion by outside speakers or playback attacks. To address this security issue, a millimeter-wave radar-based voice security authentication system is proposed in this paper. First, the speaker's fine-grained vocal-cord vibration signal is extracted by eliminating static-object clutter and motion effects; second, weighted Mel-frequency cepstral coefficients (MFCCs) are obtained as biometric features; and finally, text-independent security authentication is performed with the WMHS (weighted MFCCs and HOG-based SVM) method. The system is highly adaptable: it can authenticate designated speakers and resist both intrusion by unspecified speakers and playback attacks, keeping smart devices secure. Extensive experiments have verified that the system achieves 93.4% speaker verification accuracy and a 5.8% miss-detection rate for playback attacks.
26

Li, Tingyu. "Speaker Recognition System based on Triplet State Loss Function." Scientific Journal of Technology 5, no. 8 (August 22, 2023): 39–46. http://dx.doi.org/10.54691/sjt.v5i8.5496.

Abstract:
The purpose of this paper is to build a model and design a speaker recognition system by comprehensively surveying domestic and international research on speaker recognition models and adopting a research method based on deep machine learning. The main contents and proposed methods are as follows. For data processing, a public dataset is first downloaded from its official website; each utterance is preprocessed, Fbank features are extracted and stored as .npy files, and the speech is thereby converted into a format suitable for model input. A model is then built on a ResCNN architecture based on convolutional neural networks. The model is trained with a triplet loss function that maps speech onto a hyperplane, so cosine similarity is used directly to characterize the distance between two speakers. The speaker verification function provides three different ways to obtain speech: the two acquired utterances are input to the model, which judges their similarity and gives the result. The speaker identification model can likewise obtain speech in three different ways and determine which speaker in the corpus an utterance belongs to. For the speaker confirmation model, an utterance is played at random, a speaker is selected at random, and the system judges whether the utterance is that speaker's voice.
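The verification step sketched in this abstract (embed both utterances, then compare with cosine similarity) reduces to something like the following; the threshold value is a hypothetical operating point, not one from the paper:

```python
import numpy as np

def verify(emb_a, emb_b, threshold=0.7):
    """Same-speaker decision from two embeddings produced by a
    triplet-loss-trained network: cosine similarity compared against a
    threshold (0.7 is an assumed value that would be tuned on a
    development set)."""
    a = np.asarray(emb_a, dtype=float)
    b = np.asarray(emb_b, dtype=float)
    cos = float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos, cos >= threshold
```

Because triplet training pulls same-speaker embeddings together and pushes different-speaker embeddings apart, a single similarity threshold suffices for the accept/reject decision.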
27

Abid Noor, Ali O. "Robust speaker verification in band-localized noise conditions." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 2 (February 1, 2019): 499. http://dx.doi.org/10.11591/ijeecs.v13.i2.pp499-506.

Abstract:
This research paper presents a robust method for speaker verification in noisy environments. The noise is assumed to contaminate certain parts of the voice's frequency spectrum. The verification method therefore splits the noisy speech into subsidiary bands and uses a threshold to sense the presence of noise in a specific part of the spectrum, activating an adaptive filter in that band to track changes in the noise's characteristics and remove it. The decomposition is achieved with low-complexity quadrature mirror filters (QMF) in three levels, yielding four non-uniform bands that resemble human auditory perception. Speaker recognition is based on vector quantization (VQ), a template matching technique. Features are extracted from the speaker's voice using normalized power, in a manner similar to the Mel-frequency cepstral coefficients. The performance of the proposed system is evaluated on 60 speakers at five signal-to-noise ratio (SNR) levels, using the total success rate (TSR), false acceptance rate (FAR), false rejection rate (FRR), and equal error rate. The proposed method showed higher recognition accuracy than existing methods in severe noise conditions.
28

A, Ajila. "An Overview on Automatic Speaker Verification System Techniques." International Journal for Research in Applied Science and Engineering Technology 8, no. 4 (April 30, 2020): 355–61. http://dx.doi.org/10.22214/ijraset.2020.4055.

29

Teoh, Andrew Beng Jin, and Lee-Ying Chong. "Secure speech template protection in speaker verification system." Speech Communication 52, no. 2 (February 2010): 150–63. http://dx.doi.org/10.1016/j.specom.2009.09.003.

30

Wang, Jia-Ching, Li-Xun Lian, Yan-Yu Lin, and Jia-Hao Zhao. "VLSI Design for SVM-Based Speaker Verification System." IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23, no. 7 (July 2015): 1355–59. http://dx.doi.org/10.1109/tvlsi.2014.2335112.

31

Bae, Ara, and Wooil Kim. "Speaker Verification Employing Combinations of Self-Attention Mechanisms." Electronics 9, no. 12 (December 21, 2020): 2201. http://dx.doi.org/10.3390/electronics9122201.

Abstract:
One of the most recent speaker recognition methods, which demonstrates outstanding performance in noisy environments, extracts the speaker embedding using an attention mechanism instead of average or statistics pooling. In attention methods, speaker recognition performance is improved by employing multiple heads rather than a single head. In this paper, we propose advanced methods that extract a new embedding by compensating for the disadvantages of the single-head and multi-head attention methods. The combination of single-head and split-based multi-head attention yields a 5.39% Equal Error Rate (EER). When single-head and projection-based multi-head attention are combined, speaker recognition performance improves by 4.45%, the best result in this work. Our experimental results demonstrate that the attention mechanism reflects the speaker's properties more effectively than average or statistics pooling, and that the speaker verification system can be further improved by combining different attention techniques.
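A single-head attentive pooling layer of the kind compared above can be sketched as follows; the learned attention parameter `w` is a toy stand-in for what a trained network would provide:

```python
import numpy as np

def attention_pool(frames, w):
    """Pool frame-level features (n_frames, dim) into one embedding.

    Softmax attention scores weight each frame before averaging; with
    w = 0 the scores are uniform and the result equals plain average
    pooling, which is the baseline attention improves upon.
    """
    frames = np.asarray(frames, dtype=float)
    scores = frames @ np.asarray(w, dtype=float)
    scores = np.exp(scores - scores.max())   # numerically stable softmax
    alpha = scores / scores.sum()
    return alpha @ frames
```

Multi-head variants run several such weightings in parallel (split- or projection-based) and concatenate the results, which is what the combinations in the abstract mix and match.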
32

Kim, Ju-Ho, Hye-Jin Shim, Jee-Weon Jung, and Ha-Jin Yu. "A Supervised Learning Method for Improving the Generalization of Speaker Verification Systems by Learning Metrics from a Mean Teacher." Applied Sciences 12, no. 1 (December 22, 2021): 76. http://dx.doi.org/10.3390/app12010076.

Abstract:
The majority of recent speaker verification tasks are studied under open-set evaluation scenarios that reflect real-world conditions. The characteristics of these tasks imply that generalization toward unseen speakers is a critical capability. This study therefore aims to improve the generalization of the system to enhance speaker verification performance. To achieve this goal, we propose a novel supervised-learning-based speaker verification system using the mean teacher framework. The mean teacher network refers to the temporal averaging of deep neural network parameters, which can produce more accurate and stable representations than the fixed weights at the end of training, and is conventionally used for semi-supervised learning. Leveraging the success of the mean teacher framework in many studies, the proposed supervised learning method exploits the mean teacher network as an auxiliary model for better training of the main model, the student network. By learning the reliable intermediate representations derived from the mean teacher network as well as one-hot speaker labels, the student network is encouraged to explore more discriminative embedding spaces. The experimental results demonstrate that the proposed method reduces the equal error rate by a relative 11.61% compared to the baseline system.
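The temporal averaging at the heart of the mean teacher framework is simply an exponential moving average of the student's parameters; a minimal sketch (the decay value is an assumed, typical choice):

```python
def ema_update(teacher_params, student_params, decay=0.999):
    """One mean-teacher step: each teacher parameter moves toward the
    corresponding student parameter by a (1 - decay) fraction, so the
    teacher is a smoothed average of the student over training."""
    return [decay * t + (1.0 - decay) * s
            for t, s in zip(teacher_params, student_params)]

# After repeated steps the teacher tracks a smoothed trajectory of the
# student rather than its noisy per-step weights.
teacher = [0.0]
for step in range(3):
    teacher = ema_update(teacher, [1.0], decay=0.5)
```

The smoothing is what makes the teacher's intermediate representations stable enough to serve as an auxiliary training target for the student.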
33

Machado, Vieira Filho, and de Oliveira. "Forensic Speaker Verification Using Ordinary Least Squares." Sensors 19, no. 20 (October 10, 2019): 4385. http://dx.doi.org/10.3390/s19204385.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In Brazil, the recognition of speakers for forensic purposes still relies on a subjectivity-based decision-making process through a results analysis of untrustworthy techniques. Owing to the lack of a voice database, speaker verification is currently applied to samples specifically collected for confrontation. However, speaker comparative analysis via contested discourse requires the collection of an excessive amount of voice samples for a series of individuals. Further, the recognition system must inform who is the most compatible with the contested voice from pre-selected individuals. Accordingly, this paper proposes using a combination of linear predictive coding (LPC) and ordinary least squares (OLS) as a speaker verification tool for forensic analysis. The proposed recognition technique establishes confidence and similarity upon which to base forensic reports, indicating verification of the speaker of the contested discourse. Therefore, this paper contributes an accurate, quick alternative method to help verify the speaker. After running seven different tests, this study preliminarily achieved a hit rate of 100% considering a limited dataset (Brazilian Portuguese). Furthermore, the developed method extracts a larger number of formants, which are indispensable for statistical comparisons via OLS. The proposed framework is robust at certain levels of noise, for sentences with the suppression of word changes, and with different quality or even meaningful audio time differences.
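The link between LPC and OLS can be illustrated directly: the predictor coefficients are the least-squares regression of each sample on its predecessors. A sketch on a synthetic AR(2) signal (illustrative only, not the paper's pipeline; the coefficients 0.75 and -0.5 are invented):

```python
import numpy as np

def lpc_ols(signal, order=10):
    """Estimate linear-predictive coefficients by ordinary least squares:
    regress each sample on its `order` immediate predecessors."""
    X = np.column_stack([signal[order - k - 1 : len(signal) - k - 1]
                         for k in range(order)])   # lagged samples
    y = signal[order:]                             # samples to predict
    coefs, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

# synthetic second-order autoregressive signal with known coefficients
rng = np.random.default_rng(1)
s = np.zeros(5000)
for n in range(2, len(s)):
    s[n] = 0.75 * s[n - 1] - 0.5 * s[n - 2] + rng.standard_normal() * 0.1
a = lpc_ols(s, order=2)   # recovers roughly [0.75, -0.5]
```

On real speech, the recovered coefficients describe the vocal tract's resonances (formants), which is what makes them usable as speaker-comparison features.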
34

Sepulveda Sepulveda, Franklin Alexander, Dagoberto Porras-Plata, and Milton Sarria-Paja. "Speaker verification system based on articulatory information from ultrasound recordings." DYNA 87, no. 213 (April 1, 2020): 9–16. http://dx.doi.org/10.15446/dyna.v87n213.81772.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Current state-of-the-art speaker verification (SV) systems are known to be strongly affected by unexpected variability presented during testing, such as environmental noise or changes in vocal effort. In this work, we analyze and evaluate articulatory information of the tongue's movement as a means to improve the performance of speaker verification systems. We use a Spanish database, where besides the speech signals, we also include articulatory information that was acquired with an ultrasound system. Two groups of features are proposed to represent the articulatory information, and the obtained performance is compared to an SV system trained only with acoustic information. Our results show that the proposed features contain highly discriminative information, and they are related to speaker identity; furthermore, these features can be used to complement and improve existing systems by combining such information with cepstral coefficients at the feature level.
35

Bharathi, B. "Speaker-specific-text based speaker verification system using spectral and phase based features." International Journal of Speech Technology 20, no. 3 (May 12, 2017): 465–74. http://dx.doi.org/10.1007/s10772-017-9416-2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Wang, Meng, Dazheng Feng, Tingting Su, and Mohan Chen. "Attention-Based Temporal-Frequency Aggregation for Speaker Verification." Sensors 22, no. 6 (March 10, 2022): 2147. http://dx.doi.org/10.3390/s22062147.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Convolutional neural networks (CNNs) have significantly promoted the development of speaker verification (SV) systems because of their powerful deep feature learning capability. In CNN-based SV systems, utterance-level aggregation is an important component, and it compresses the frame-level features generated by the CNN frontend into an utterance-level representation. However, most of the existing aggregation methods aggregate the extracted features across time and cannot capture the speaker-dependent information contained in the frequency domain. To handle this problem, this paper proposes a novel attention-based frequency aggregation method, which focuses on the key frequency bands that provide more information for utterance-level representation. Meanwhile, two more effective temporal-frequency aggregation methods are proposed in combination with the existing temporal aggregation methods. The two proposed methods can capture the speaker-dependent information contained in both the time domain and frequency domain of frame-level features, thus improving the discriminability of speaker embedding. Besides, a powerful CNN-based SV system is developed and evaluated on the TIMIT and Voxceleb datasets. The experimental results indicate that the CNN-based SV system using the temporal-frequency aggregation method achieves a superior equal error rate of 5.96% on Voxceleb compared with the state-of-the-art baseline models.
37

Takialddin, Al smadi, and Ahmed Handam. "Artificial neural networks for voice activity detection Technology." Journal of Advanced Sciences and Engineering Technologies 5, no. 1 (January 14, 2022): 23–31. http://dx.doi.org/10.32441/jaset.05.01.03.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Currently, the field of voice biometrics is actively developing. It includes two related tasks of recognizing a speaker by voice: identification, which determines the speaker's identity, and verification, which checks whether a phonogram belongs to a particular speaker. An open question remains of how to improve the quality of verification and identification algorithms in real conditions and reduce the probability of error. In this work, a voice activity detection (VAD) algorithm is proposed as a modification of an algorithm based on pitch statistics. VAD is investigated as a component of a voice-based speaker recognition system, so the main purpose of its work is to improve the quality of the system as a whole. Using the proposed modified VAD algorithm and an energy-based VAD algorithm as examples, the influence of VAD choice on the quality of the speaker recognition system is analyzed.
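For reference, the energy-based VAD used as a baseline in such comparisons can be sketched as follows; the frame length and the -30 dB relative threshold are illustrative choices, not values from the paper:

```python
import numpy as np

def energy_vad(signal, frame_len=160, threshold_db=-30.0):
    """Classic energy-based VAD: mark a frame as speech when its
    log-energy exceeds a threshold relative to the loudest frame."""
    n = len(signal) // frame_len
    frames = signal[: n * frame_len].reshape(n, frame_len)
    energy = (frames ** 2).mean(axis=1)
    log_e = 10 * np.log10(energy + 1e-12)          # avoid log of zero
    return log_e > (log_e.max() + threshold_db)     # boolean speech mask

# toy input: one second of near-silence followed by a 220 Hz tone at 8 kHz
rng = np.random.default_rng(2)
silence = rng.standard_normal(1600) * 0.001
speech = np.sin(2 * np.pi * 220 * np.arange(1600) / 8000)
mask = energy_vad(np.concatenate([silence, speech]))
```

Pitch-statistics-based VAD, as modified in this paper, replaces the raw energy criterion with features that are less fooled by broadband noise.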
38

Indu D. "A Methodology for Speaker Diazaration System Based on LSTM and MFCC Coefficients." Journal of Electrical Systems 20, no. 6s (May 2, 2024): 2938–45. http://dx.doi.org/10.52783/jes.3299.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Research on speaker identification is always challenging. A speaker may be identified automatically by comparing a voice sample with previously recorded speech, and the machine learning strategy has grown in favor in recent years. Convolutional neural networks (CNNs) and deep neural networks (DNNs) are among the machine learning techniques employed recently. This article discusses a successful speaker verification system based on the d-vector and constructs a new approach based on speaker diarization. In particular, we use an LSTM to cluster speech segments using MFCC coefficients and identify the speakers in the diarization system. The proposed system is evaluated using benchmark performance metrics, and a comparative study is made with other models. The need to consider the LSTM neural network using acoustic data and linguistic dialect is also discussed. LSTM networks can produce reliable speaker segmentation outputs.
39

Suhartono, Suhartono, Fresy Nugroho, Muhammad Faisal, Muhammad Ainul Yaqin, and Suyanta Suyanta. "Speaker Recognition in Content-based Image Retrieval for a High Degree of Accuracy." Bulletin of Electrical Engineering and Informatics 7, no. 3 (September 1, 2018): 350–58. http://dx.doi.org/10.11591/eei.v7i3.957.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The purpose of this research is to measure speaker recognition accuracy in Content-Based Image Retrieval. To support this, we use two approaches in the recognition system: identification, using fuzzy Mamdani inference, and verification, using Manhattan distance. In the test results, the best mean distance is obtained at size 32x32 and the best verification distance rate is 965; the speaker recognition system has a standard error of 5% and a system accuracy of 95%. From these results, we find an increase in accuracy of almost 2.5%, due to the combination of the two approaches, which adds to the accuracy of speaker recognition.
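Manhattan-distance verification of the kind described can be sketched in a few lines; the feature vectors and the threshold of 0.5 below are invented for illustration:

```python
import numpy as np

def verify(probe, enrolled, threshold):
    """Accept the claimed identity when the Manhattan (L1) distance
    between probe and enrolled feature vectors is below the threshold."""
    d = np.abs(probe - enrolled).sum()
    return d <= threshold, d

enrolled = np.array([0.2, 0.5, 0.1, 0.9])       # stored template
genuine  = np.array([0.25, 0.45, 0.1, 0.85])    # same speaker, small drift
impostor = np.array([0.9, 0.1, 0.8, 0.2])       # different speaker
ok_gen, d_gen = verify(genuine, enrolled, threshold=0.5)
ok_imp, d_imp = verify(impostor, enrolled, threshold=0.5)
```

The threshold trades off false acceptances against false rejections; in practice it is tuned on held-out trials.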
40

Gorban, Igor I., Nick I. Gorban, and Anatoly V. Klimenko. "Crime‐detection automatic speaker verification and identification (CASVI) system." Journal of the Acoustical Society of America 105, no. 2 (February 1999): 1353. http://dx.doi.org/10.1121/1.426411.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Ramos-Lara, Rafael, Mariano López-García, Enrique Cantó-Navarro, and Luís Puente-Rodriguez. "Real-Time Speaker Verification System Implemented on Reconfigurable Hardware." Journal of Signal Processing Systems 71, no. 2 (June 28, 2012): 89–103. http://dx.doi.org/10.1007/s11265-012-0683-5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Impedovo, Donato, Giuseppe Pirlo, and Mario Petrone. "A multi-resolution multi-classifier system for speaker verification." Expert Systems 29, no. 5 (June 2, 2011): 442–55. http://dx.doi.org/10.1111/j.1468-0394.2011.00603.x.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Nirmal, Asmita, Deepak Jayaswal, and Pramod H. Kachare. "Statistically Significant Duration-Independent-based Noise-Robust Speaker Verification." International Journal of Mathematical, Engineering and Management Sciences 9, no. 1 (February 1, 2024): 147–62. http://dx.doi.org/10.33889/ijmems.2024.9.1.008.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
A speaker verification system models individual speakers using different speech features to improve their robustness. However, redundant features degrade the system's performance. This paper presents Statistically Significant Duration-Independent Mel frequency Cepstral Coefficients (SSDI-MFCC) features with the Extreme Gradient Boost classifier for improving the noise-robustness of speaker models. Eight statistical descriptors are used to generate signal-duration-independent features, and a statistically significant feature subset is obtained using a t-test. A redeveloped Librispeech database, built by adding noises from the AURORA database to simulate real-world test conditions for speaker verification, is used for evaluation. The SSDI-MFCC is compared with Principal Component Analysis (PCA) and Genetic Algorithm (GA) feature selection. The comparative results show average equal error rate improvements of 4.93% and 3.48% with SSDI-MFCC over GA-MFCC and PCA-MFCC in clean and noisy conditions, respectively. A significant reduction in verification time is also observed using SSDI-MFCC compared to the complete feature set.
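The t-test-based selection of significant features can be illustrated with a NumPy-only Welch t statistic; the two-group setup, the three toy features, and the threshold of 2.0 are illustrative assumptions, not the paper's exact procedure:

```python
import numpy as np

def welch_t(a, b):
    """Two-sample Welch t statistic, used to keep only features whose
    means differ significantly between two groups of utterances."""
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    return (a.mean() - b.mean()) / np.sqrt(va + vb)

rng = np.random.default_rng(3)
# 3 features over 200 utterances per group: only the first feature
# actually separates the groups; the others are redundant noise
grp_a = rng.standard_normal((200, 3)); grp_a[:, 0] += 2.0
grp_b = rng.standard_normal((200, 3))
t_stats = np.array([abs(welch_t(grp_a[:, j], grp_b[:, j])) for j in range(3)])
selected = t_stats > 2.0     # keep statistically significant features
```

Dropping features that fail the test shrinks the model's input, which is where the reported reduction in verification time comes from.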
44

Selin, M., and Dr K. Preetha Mathew. "Text-independent Speaker Verification Using Hybrid Convolutional Neural Networks." Webology 18, no. 2 (December 23, 2021): 756–66. http://dx.doi.org/10.14704/web/v18i2/web18352.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Automatic speaker verification has been an active research area for more than four decades, and the technology has gradually matured for real applications. In this paper, a hybrid convolutional neural network (CNN) model combining 3D CNN and 2D CNN models is proposed for speaker verification in the text-independent scenario. This novel convolutional neural network architecture was built to capture speaker information and discard non-speaker information at the same time. In the training process, the network is trained to differentiate between different speaker identities to establish the background model. The development of the speaker model is one of the important aspects: most conventional techniques employ the d-vector system to create speaker models by averaging the features collected from the speaker's utterances. Here, a hybrid convolutional neural network model is utilized in the development and registration phases for building a speaker model. The suggested approach outperforms existing methods of speaker verification.
45

Mingote, Victoria, Antonio Miguel, Alfonso Ortega, and Eduardo Lleida. "Supervector Extraction for Encoding Speaker and Phrase Information with Neural Networks for Text-Dependent Speaker Verification." Applied Sciences 9, no. 16 (August 11, 2019): 3295. http://dx.doi.org/10.3390/app9163295.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In this paper, we propose a new differentiable neural network with an alignment mechanism for text-dependent speaker verification. Unlike previous works, we do not extract the embedding of an utterance from the global average pooling of the temporal dimension. Our system replaces this reduction mechanism by a phonetic phrase alignment model to keep the temporal structure of each phrase since the phonetic information is relevant in the verification task. Moreover, we can apply a convolutional neural network as front-end, and, thanks to the alignment process being differentiable, we can train the network to produce a supervector for each utterance that will be discriminative to the speaker and the phrase simultaneously. This choice has the advantage that the supervector encodes the phrase and speaker information providing good performance in text-dependent speaker verification tasks. The verification process is performed using a basic similarity metric. The new model using alignment to produce supervectors was evaluated on the RSR2015-Part I database, providing competitive results compared to similar size networks that make use of the global average pooling to extract embeddings. Furthermore, we also evaluated this proposal on the RSR2015-Part II. To our knowledge, this system achieves the best published results obtained on this second part.
46

Khan, Umair, Pooyan Safari, and Javier Hernando. "Restricted Boltzmann Machine Vectors for Speaker Clustering and Tracking Tasks in TV Broadcast Shows." Applied Sciences 9, no. 13 (July 9, 2019): 2761. http://dx.doi.org/10.3390/app9132761.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Restricted Boltzmann Machines (RBMs) have shown success in both the front end and back end of speaker verification systems. In this paper, we propose applying RBMs to the front end for the tasks of speaker clustering and speaker tracking in TV broadcast shows. RBMs are trained to transform utterances into a vector-based representation. Because of the lack of data for a test speaker, we propose RBM adaptation to a global model. First, the global model—which is referred to as the universal RBM—is trained with all the available background data. Then an adapted RBM model is trained with the data of each test speaker. The visible-to-hidden weight matrices of the adapted models are concatenated along with the bias vectors and are whitened to generate the vector representation of speakers. These vectors, referred to as RBM vectors, were shown to preserve speaker-specific information and are used in the tasks of speaker clustering and speaker tracking. The evaluation was performed on the audio recordings of Catalan TV broadcast shows. The experimental results show that our proposed speaker clustering system gained up to 12% relative improvement, in terms of Equal Impurity (EI), over the baseline system. On the other hand, in the task of speaker tracking, our system has a relative improvement of 11% and 7% compared to the baseline system using cosine and Probabilistic Linear Discriminant Analysis (PLDA) scoring, respectively.
47

Rudramurthy, M. S., V. Kamakshi Prasad, and R. Kumaraswamy. "Speaker Verification Under Degraded Conditions Using Empirical Mode Decomposition Based Voice Activity Detection Algorithm." Journal of Intelligent Systems 23, no. 4 (December 1, 2014): 359–78. http://dx.doi.org/10.1515/jisys-2013-0085.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
The performance of most of the state-of-the-art speaker recognition (SR) systems deteriorates under degraded conditions, owing to mismatch between the training and testing sessions. This study focuses on the front end of the speaker verification (SV) system to reduce the mismatch between training and testing. An adaptive voice activity detection (VAD) algorithm using zero-frequency filter assisted peaking resonator (ZFFPR) was integrated into the front end of the SV system. The performance of this proposed SV system was studied under degraded conditions with 50 selected speakers from the NIST 2003 database. The degraded condition was simulated by adding different types of noises to the original speech utterances. The different types of noises were chosen from the NOISEX-92 database to simulate degraded conditions at signal-to-noise ratio levels from 0 to 20 dB. In this study, widely used 39-dimension Mel frequency cepstral coefficient (MFCC; i.e., 13-dimension MFCCs augmented with 13-dimension velocity and 13-dimension acceleration coefficients) features were used, and Gaussian mixture model–universal background model was used for speaker modeling. The proposed system’s performance was studied against the energy-based VAD used as the front end of the SV system. The proposed SV system showed some encouraging results when EMD-based VAD was used at its front end.
48

Chen, Zesheng, Li-Chi Chang, Chao Chen, Guoping Wang, and Zhuming Bi. "Defending against FakeBob Adversarial Attacks in Speaker Verification Systems with Noise-Adding." Algorithms 15, no. 8 (August 17, 2022): 293. http://dx.doi.org/10.3390/a15080293.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
Speaker verification systems use human voices as an important biometric to identify legitimate users, thus adding a security layer to voice-controlled Internet-of-things smart homes against illegal access. Recent studies have demonstrated that speaker verification systems are vulnerable to adversarial attacks such as FakeBob. The goal of this work is to design and implement a simple and light-weight defense system that is effective against FakeBob. We specifically study two opposite pre-processing operations on input audios in speaker verification systems: denoising, which attempts to remove or reduce perturbations, and noise-adding, which adds small noise to an input audio. Through experiments, we demonstrate that both methods are able to weaken the ability of FakeBob attacks significantly, with noise-adding achieving even better performance than denoising. Specifically, with denoising, the targeted attack success rate of FakeBob attacks can be reduced from 100% to 56.05% in GMM speaker verification systems, and from 95% to only 38.63% in i-vector speaker verification systems, respectively. With noise-adding, those numbers can be further lowered down to 5.20% and 0.50%, respectively. As a proactive measure, we study several possible adaptive FakeBob attacks against the noise-adding method. Experiment results demonstrate that noise-adding can still provide a considerable level of protection against these countermeasures.
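The noise-adding defense itself is straightforward: perturb the input audio with Gaussian noise at a chosen signal-to-noise ratio before it reaches the verifier, so that carefully crafted adversarial perturbations are disrupted. A sketch (the 20 dB SNR and test tone are illustrative, not the paper's settings):

```python
import numpy as np

def noise_add_defense(audio, snr_db=20.0, rng=None):
    """Pre-processing defense: add Gaussian noise scaled so that the
    result has the requested signal-to-noise ratio."""
    if rng is None:
        rng = np.random.default_rng()
    sig_power = np.mean(audio ** 2)
    noise_power = sig_power / (10 ** (snr_db / 10))
    noise = rng.standard_normal(len(audio)) * np.sqrt(noise_power)
    return audio + noise

rng = np.random.default_rng(4)
audio = np.sin(2 * np.pi * 440 * np.arange(8000) / 16000)   # half-second tone
defended = noise_add_defense(audio, snr_db=20.0, rng=rng)
```

The SNR is the knob: lower values disturb adversarial perturbations more aggressively but also degrade genuine-speaker scores, so it must be tuned against both attack success rate and verification accuracy.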
49

Jayanthi Kumari, T. R., and H. S. Jayanna. "i-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques." Journal of Intelligent Systems 29, no. 1 (May 3, 2018): 565–82. http://dx.doi.org/10.1515/jisys-2017-0047.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In many biometric applications, limited-data speaker verification plays a significant role in practice-oriented systems to verify the speaker. The performance of the speaker verification system needs to be improved by applying suitable techniques to the limited data condition. Limited data here means that both train and test data durations are only a few seconds. This article shows the importance of the speaker verification system under the limited data condition using feature- and score-level fusion techniques. The baseline speaker verification system uses vocal tract features like mel-frequency cepstral coefficients and linear predictive cepstral coefficients, and excitation source features like linear prediction residual and linear prediction residual phase, along with i-vector modeling techniques on the NIST 2003 data set. In feature-level fusion, the vocal tract features are fused with excitation source features; as a result, the average equal error rate (EER) is approximately 4% compared to individual feature performance. Further, two different types of score-level fusion are demonstrated. In the first case, the scores of vocal tract features and excitation source features are fused at the score level while the modeling technique remains the same, which provides an average reduction of approximately 2% EER compared to feature-level fusion performance. In the second case, scores of the different modeling techniques are combined, which results in an EER reduction of approximately 4.5% compared with score-level fusion of different features.
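Score-level fusion of two sub-systems can be sketched as a z-normalised weighted sum; the toy trial scores below (standing in for, say, MFCC-based and residual-based system outputs) are invented for illustration:

```python
import numpy as np

def fuse_scores(score_lists, weights=None):
    """Score-level fusion: z-normalise each system's trial scores so they
    are comparable, then combine them with a weighted sum."""
    normed = [(s - s.mean()) / s.std() for s in map(np.asarray, score_lists)]
    weights = weights or [1.0 / len(normed)] * len(normed)
    return sum(w * s for w, s in zip(weights, normed))

# toy scores over four trials from two sub-systems on different scales;
# trials 0 and 2 are target (same-speaker) trials
mfcc_scores = np.array([2.1, 0.3, 1.8, 0.1])
resid_scores = np.array([55.0, 20.0, 48.0, 18.0])
fused = fuse_scores([mfcc_scores, resid_scores])
```

Normalising before summing matters because raw scores from different features or models live on different scales; without it, one system silently dominates the fusion.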
50

Mao, Hongwei, Yan Shi, Yue Liu, Linqiang Wei, Yijie Li, and Yanhua Long. "Short-time speaker verification with different speaking style utterances." PLOS ONE 15, no. 11 (November 11, 2020): e0241809. http://dx.doi.org/10.1371/journal.pone.0241809.

Full text
APA, Harvard, Vancouver, ISO, and other styles
Abstract:
In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, the promotion of ASV technology is still a very challenging issue, because most technologies are still very sensitive to new, unknown and spoofing conditions. Most previous studies focused on extracting target speaker information from natural speech. This paper aims to design a new ASV corpus with multiple speaking styles and investigate the ASV robustness to these different speaking styles. We first release this corpus on the Zenodo website for public research; each speaker has several text-dependent and text-independent singing, humming and normal reading speech utterances. Then, we investigate the speaker discrimination of each speaking style in the feature space. Furthermore, the intra- and inter-speaker variabilities in each speaking style and across speaking styles are investigated in both text-dependent and text-independent ASV tasks. The conventional Gaussian Mixture Model (GMM) and the state-of-the-art x-vector are used to build ASV systems. Experimental results show that the voiceprint information in humming and singing speech is more distinguishable than that in normal reading speech for conventional ASV systems. Furthermore, we find that combining the three speaking styles can significantly improve the x-vector based ASV system, even when only limited gains are obtained by conventional GMM-based systems.
