Journal articles on the topic "Speaker recognition systems"

Consult the 50 best scholarly journal articles on the topic "Speaker recognition systems".

1

Gonzalez-Rodriguez, Joaquin. "Evaluating Automatic Speaker Recognition systems: An overview of the NIST Speaker Recognition Evaluations (1996-2014)". Loquens 1, no. 1 (30.06.2014): e007. http://dx.doi.org/10.3989/loquens.2014.007.

2

Bouziane, Ayoub, Jamal Kharroubi and Arsalane Zarghili. "Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models". International Journal of Electrical and Computer Engineering (IJECE) 7, no. 6 (1.12.2017): 3655. http://dx.doi.org/10.11591/ijece.v7i6.pp3655-3663.

Abstract:
This paper presents a novel speaker modeling approach for speaker recognition systems. The basic idea of this approach consists of deriving the target speaker model from a personalized background model, composed only of the UBM Gaussian components that are actually present in the speech of the target speaker. The motivation behind deriving speaker models from personalized background models is to exploit the observed differences in some acoustic classes between speakers, in order to improve the performance of speaker recognition systems. The proposed approach was evaluated on a speaker verification task using various amounts of training and testing speech data. The experimental results showed that the proposed approach is efficient in terms of both verification performance and computational cost during the testing phase of the system, compared to traditional UBM-based speaker recognition systems.
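The core idea above (keep only the UBM Gaussian components that actually carry posterior mass on the target speaker's frames) can be sketched with a numpy-only toy. This is an illustrative sketch, not the paper's implementation: the UBM sizes, the 2-D "features", and the `min_mass` threshold are invented for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy diagonal-covariance UBM: 8 Gaussian components in a 2-D feature space.
n_comp, dim = 8, 2
ubm_means = rng.normal(0.0, 4.0, size=(n_comp, dim))
ubm_vars = np.full((n_comp, dim), 1.0)
ubm_weights = np.full(n_comp, 1.0 / n_comp)

def log_gauss(frames, means, variances):
    """Per-component diagonal-Gaussian log-densities, shape (T, n_comp)."""
    diff = frames[:, None, :] - means[None, :, :]          # (T, n_comp, dim)
    return -0.5 * np.sum(diff**2 / variances + np.log(2 * np.pi * variances), axis=2)

def personalized_components(frames, min_mass=1.0):
    """Indices of UBM components with real posterior mass on this speaker."""
    log_p = log_gauss(frames, ubm_means, ubm_vars) + np.log(ubm_weights)
    post = np.exp(log_p - log_p.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                # responsibilities
    mass = post.sum(axis=0)                                # soft frame count per component
    return np.flatnonzero(mass >= min_mass)

# Speaker frames drawn around only two of the eight UBM components.
frames = np.vstack([ubm_means[1] + 0.3 * rng.normal(size=(50, dim)),
                    ubm_means[5] + 0.3 * rng.normal(size=(50, dim))])
kept = personalized_components(frames)
```

A speaker model would then be adapted from only the `kept` components rather than the full UBM.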
3

Singh, Satyanand. "Forensic and Automatic Speaker Recognition System". International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (1.10.2018): 2804. http://dx.doi.org/10.11591/ijece.v8i5.pp2804-2811.

Abstract:
Current Automatic Speaker Recognition (ASR) systems have emerged as an important means of confirming identity in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, a process referred to as structured listening. Algorithm-based systems have been developed for forensic speaker recognition by physicists and forensic linguists to reduce the probability of contextual bias or a preconceived understanding of a reference model with respect to the validity of an unknown audio sample and any suspected individual. Many researchers continue to develop automatic algorithms in signal processing and machine learning so that the automatic system can establish the speaker's identity as effectively as a human listener. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical speaker-modeling patterns that have emerged in automatic technology over the last decade. I focus on many aspects of automatic speaker recognition (ASR) systems, including speaker-specific features, speaker models, standard assessment data sets, and performance metrics.
4

Singh, Mahesh K., P. Mohana Satya, Vella Satyanarayana and Sridevi Gamini. "Speaker Recognition Assessment in a Continuous System for Speaker Identification". International Journal of Electrical and Electronics Research 10, no. 4 (30.12.2022): 862–67. http://dx.doi.org/10.37391/ijeer.100418.

Abstract:
This research article focuses on recognizing speakers in multi-speaker speech. Several speakers take part in every conference, talk, or discussion, and this type of speech poses distinct problems and processing stages. Challenges include the particular noise of the surroundings, the number of speakers involved, speaker distance, microphone equipment, etc. In addition to addressing these hurdles in real time, there are also problems in processing the multi-speaker speech itself. Identifying speech segments, separating the speech segments, constructing clusters of similar segments, and finally recognizing the speaker using these segments are the common sequential operations in the context of multi-speaker speech recognition. All linked phases of the speech recognition process are discussed with relevant methodologies in this article, which also examines the common metrics and methods. This paper examined the algorithm of a speech recognition system at different stages. The voice recognition systems are built through many phases, such as voice filtering, speaker segmentation, speaker diarization, and recognition of the speaker, evaluated with 20 speakers.
5

Mridha, Muhammad Firoz, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md Abdul Hamid, Md Rashedul Islam and Yutaka Watanobe. "U-Vectors: Generating Clusterable Speaker Embedding from Unlabeled Data". Applied Sciences 11, no. 21 (27.10.2021): 10079. http://dx.doi.org/10.3390/app112110079.

Abstract:
Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built in two stages: the first stage extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of the speech embeddings, which are typically pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models greatly depends on the domain adaptation policy and may degrade if they are trained using inadequate data. This paper introduces a speaker recognition strategy for unlabeled data, which generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy relies on the assumption that a small speech segment should contain a single speaker. Based on this assumption, a pairwise constraint is constructed with noise augmentation policies and used to train the AutoEmbedder architecture that generates speaker embeddings. Without relying on a domain adaptation policy, the process produces clusterable speaker embeddings in an unsupervised manner, termed unsupervised vectors (u-vectors). The evaluation is conducted on two popular English speaker recognition datasets, TIMIT and LibriSpeech, and a Bengali dataset is included to illustrate the diversity of domain shifts for speaker recognition systems. Finally, we conclude that the proposed approach achieves satisfactory performance using pairwise architectures.
6

Nematollahi, Mohammad Ali, and S. A. R. Al-Haddad. "Distant Speaker Recognition: An Overview". International Journal of Humanoid Robotics 13, no. 02 (25.05.2016): 1550032. http://dx.doi.org/10.1142/s0219843615500322.

Abstract:
A distant speaker recognition (DSR) system assumes that the microphones are far from the speaker's mouth, and their position may vary. Furthermore, various challenges and limitations in terms of coloration, ambient noise, and reverberation make recognizing the speaker difficult. Although applying speech enhancement techniques can attenuate speech distortion components, it may also remove speaker-specific information and increase processing time in real-time applications. Many efforts are currently being made to develop DSR into commercially viable systems. In this paper, state-of-the-art techniques in DSR, such as robust feature extraction, feature normalization, robust speaker modeling, model compensation, dereverberation, and score normalization, are discussed as ways to overcome the speech degradation components, i.e., reverberation and ambient noise. Performance results on DSR show that as the speaker-to-microphone distance increases, recognition rates decrease and the equal error rate (EER) increases. Finally, the paper concludes that applying robust features and robust speaker models that vary less with distance can improve DSR performance.
7

Garcia‐Romero, Daniel, and Carol Espy‐Wilson. "Automatic speaker recognition: Advances toward informative systems." Journal of the Acoustical Society of America 128, no. 4 (October 2010): 2394. http://dx.doi.org/10.1121/1.3508584.

8

Padmanabhan, M., L. R. Bahl, D. Nahamoo and M. A. Picheny. "Speaker clustering and transformation for speaker adaptation in speech recognition systems". IEEE Transactions on Speech and Audio Processing 6, no. 1 (1998): 71–77. http://dx.doi.org/10.1109/89.650313.

9

Singh, Satyanand. "Bayesian distance metric learning and its application in automatic speaker recognition systems". International Journal of Electrical and Computer Engineering (IJECE) 9, no. 4 (1.08.2019): 2960. http://dx.doi.org/10.11591/ijece.v9i4.pp2960-2967.

Abstract:
This paper proposes a state-of-the-art Automatic Speaker Recognition (ASR) system based on a Bayesian distance learning metric as a feature extractor. In this modeling, I explored the constraints on the distance between modified and simplified i-vector pairs from the same speaker and from different speakers. An approximation of the distance metric is used as a weighted covariance matrix from the higher eigenvectors of the covariance matrix, which is used to estimate the posterior distribution of the metric distance. Given a speaker label, I select the data pairs of different speakers with the highest cosine scores to form a set of speaker constraints. This collection captures the most discriminative variability between the speakers in the training data. This Bayesian distance learning approach achieves better performance than the most advanced methods. Furthermore, the method is insensitive to normalization, in contrast to cosine scoring, and is very effective in the case of limited training data. The modified supervised i-vector-based ASR system is evaluated on the NIST SRE 2008 database. The best performance with combined cosine scoring, an EER of 1.767%, was obtained using LDA200 + NCA200 + LDA200, and the best performance of Bayes_dml, an EER of 1.775%, was obtained using LDA200 + NCA200 + LDA100. Bayes_dml outperforms the combined normalized cosine scoring and is the best reported result for the short2-short3 condition on the NIST SRE 2008 data.
10

Kamiński, Kamil A., and Andrzej P. Dobrowolski. "Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features". Sensors 22, no. 23 (1.12.2022): 9370. http://dx.doi.org/10.3390/s22239370.

Abstract:
This article presents an Automatic Speaker Recognition (ASR) system that successfully resolves problems such as identification within an open set of speakers and the verification of speakers in difficult recording conditions similar to telephone transmission conditions. The article provides complete information on the architecture of the various internal processing modules of the ASR system. The speaker recognition system proposed in the article has been compared closely to competing systems, achieving improved speaker identification and verification results on a known certified voice dataset. The ASR system owes this to the dual use of genetic algorithms, both in the feature selection process and in the optimization of the system's internal parameters. This was also influenced by the proprietary feature generation and the corresponding classification process using Gaussian mixture models. This allowed the development of a system that makes an important contribution to the current state of the art in speaker recognition systems for telephone transmission applications with known speech coding standards.
11

Sangwan, Pardeep, and Saurabh Bhardwaj. "A Structured Approach towards Robust Database Collection for Speaker Recognition". Global Journal of Enterprise Information System 9, no. 3 (27.09.2017): 53. http://dx.doi.org/10.18311/gjeis/2017/16123.

Abstract:
Speaker recognition systems are classified according to their database, feature extraction techniques, and classification methods. There is a clear need to work on all dimensions of forensic speaker recognition systems, from the very first phase of database collection to the recognition phase. The present work provides a structured approach to developing a robust speech database collection for an efficient speaker recognition system. The databases required for the two kinds of systems are entirely different: databases for biometric systems are readily available, while databases for forensic speaker recognition are scarce. The paper also presents several databases available for speaker recognition systems.
12

Vryzas, Nikolaos, Nikolaos Tsipas and Charalampos Dimoulas. "Web Radio Automation for Audio Stream Management in the Era of Big Data". Information 11, no. 4 (11.04.2020): 205. http://dx.doi.org/10.3390/info11040205.

Abstract:
Radio is evolving in a changing digital media ecosystem. Audio-on-demand has shaped the landscape of big unstructured audio data available online. In this paper, a framework for knowledge extraction is introduced, to improve discoverability and enrichment of the provided content. A web application for live radio production and streaming is developed. The application offers typical live mixing and broadcasting functionality, while performing real-time annotation as a background process by logging user operation events. For the needs of a typical radio station, a supervised speaker classification model is trained for the recognition of 24 known speakers. The model is based on a convolutional neural network (CNN) architecture. Since not all speakers are known in radio shows, a CNN-based speaker diarization method is also proposed. The trained model is used for the extraction of fixed-size identity d-vectors. Several clustering algorithms are evaluated, having the d-vectors as input. The supervised speaker recognition model for 24 speakers scores an accuracy of 88.34%, while unsupervised speaker diarization scores a maximum accuracy of 87.22%, as tested on an audio file with speech segments from three unknown speakers. The results are considered encouraging regarding the applicability of the proposed methodology.
13

Li, Jiguo, Xinfeng Zhang, Jizheng Xu, Siwei Ma and Wen Gao. "Learning to Fool the Speaker Recognition". ACM Transactions on Multimedia Computing, Communications, and Applications 17, no. 3s (31.10.2021): 1–21. http://dx.doi.org/10.1145/3468673.

Abstract:
Due to the widespread deployment of fingerprint/face/speaker recognition systems, the risks in these systems, especially adversarial attacks, have drawn increasing attention in recent years. Previous research mainly studied adversarial attacks on vision-based systems, such as fingerprint and face recognition, while attacks on speech-based systems have not been well studied yet, although such systems are widely used in daily life. In this article, we attempt to fool a state-of-the-art speaker recognition model and present a speaker recognition attacker, a lightweight multi-layer convolutional neural network that fools the well-trained state-of-the-art speaker recognition model by adding imperceptible perturbations to the raw speech waveform. We find that the speaker recognition system is vulnerable to adversarial attack and achieve a high success rate on both non-targeted and targeted attacks. Besides, we present an effective method that leverages a pretrained phoneme recognition model to optimize the speaker recognition attacker and obtain a tradeoff between attack success rate and perceptual quality. Experimental results on the TIMIT and LibriSpeech datasets demonstrate the effectiveness and efficiency of our proposed model, and the frequency-analysis experiments indicate that high-frequency attacks are more effective than low-frequency attacks, which differs from the conclusion drawn in previous image-based works. Additionally, an ablation study gives more insights into our model.
14

Marini, Marco, Nicola Vanello and Luca Fanucci. "Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria". Sensors 21, no. 19 (27.09.2021): 6460. http://dx.doi.org/10.3390/s21196460.

Abstract:
Within the field of Automatic Speech Recognition (ASR) systems, impaired speech is a major challenge because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This new approach exploits fine-tuning of the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, to improve the performance of a speaker-dependent ASR system. The second aim is to determine whether there is a correlation between a speaker's voice features and the optimal window and shift parameters that minimise the error of an ASR system for that specific speaker. For our experiments, we used both impaired and unimpaired Italian speech. Specifically, we used 30 speakers with dysarthria from the IDEA database and 10 professional speakers from the CLIPS database; both databases are freely available. The results confirm that, if a standard ASR system performs poorly for a speaker with dysarthria, it can be improved by using the new speech analysis, whereas the new approach is ineffective in cases of unimpaired and mildly impaired speech. Furthermore, there is a correlation between some of a speaker's voice features and their optimal parameters.
15

Jayanna, H. S., and B. G. Nagaraja. "An Experimental Comparison of Modeling Techniques and Combination of Speaker – Specific Information from Different Languages for Multilingual Speaker Identification". Journal of Intelligent Systems 25, no. 4 (1.10.2016): 529–38. http://dx.doi.org/10.1515/jisys-2014-0128.

Abstract:
Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario, so countries where English predominates can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries habitually speak multiple languages. An existing speaker identification system may yield poor performance if a speaker's training and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers for multilingual speaker identification, is presented. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and observe that the combined features give better performance in all the crosslingual speaker identification experiments.
16

Lotia, Piyush, and M. R. Khan. "Development of Speech Corpora for Speaker Recognition Systems". i-manager's Journal on Electrical Engineering 4, no. 4 (15.06.2011): 19–25. http://dx.doi.org/10.26634/jee.4.4.1455.

17

Hickt, L. "Speech and speaker recognition". Signal Processing 13, no. 3 (October 1987): 336–38. http://dx.doi.org/10.1016/0165-1684(87)90137-x.

18

Ahmed, Ahmed M., and Aliaa K. Hassan. "Speaker Recognition Systems in the Last Decade – A Survey". Engineering and Technology Journal 39, no. 1B (25.03.2021): 30–40. http://dx.doi.org/10.30684/etj.v39i1b.1589.

Abstract:
Speaker recognition is defined as the process of recognizing a person by his or her voice through specific features extracted from the voice signal. An automatic speaker recognition (ASP) system is a biometric authentication system. In the last decade, many advances have been attained in the speaker recognition field, along with many techniques in the feature extraction and modeling phases. In this paper, we present an overview of the most recent work in ASP technology. The study discusses several ASP modeling techniques, such as the Gaussian mixture model (GMM), vector quantization (VQ), and clustering algorithms. Several feature extraction techniques, such as linear predictive coding (LPC) and Mel-frequency cepstral coefficients (MFCC), are also examined. Finally, as a result of this study, we find that MFCC and GMM can be considered the most successful techniques in the field of speaker recognition so far.
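As a concrete illustration of the VQ/clustering family of methods this survey covers, a minimal vector-quantization speaker identifier can be sketched: each enrolled speaker gets a k-means codebook over their feature frames, and a test utterance is assigned to the speaker whose codebook yields the lowest average distortion. The 2-D "frames" and all sizes below are synthetic stand-ins for real MFCC vectors.

```python
import numpy as np

rng = np.random.default_rng(1)

def kmeans(frames, k, iters=25):
    """Plain k-means; returns a (k, dim) codebook for one speaker."""
    centers = frames[rng.choice(len(frames), size=k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(frames[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = frames[labels == j].mean(axis=0)
    return centers

def avg_distortion(frames, codebook):
    """Mean distance from each frame to its nearest codeword."""
    d = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Hypothetical 2-D "feature frames" for two enrolled speakers.
spk_a = rng.normal([0, 0], 0.5, size=(200, 2))
spk_b = rng.normal([3, 3], 0.5, size=(200, 2))
codebooks = {"A": kmeans(spk_a, k=4), "B": kmeans(spk_b, k=4)}

# Identify unseen frames from speaker A by minimum average distortion.
test_frames = rng.normal([0, 0], 0.5, size=(40, 2))
scores = {s: avg_distortion(test_frames, cb) for s, cb in codebooks.items()}
best = min(scores, key=scores.get)
```

GMM-based systems replace the hard codebook distortion with soft log-likelihoods, but the enroll-then-score structure is the same.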
19

Hussein, Jabbar S., Abdulkadhim A. Salman and Thmer R. Saeed. "Arabic speaker recognition using HMM". Indonesian Journal of Electrical Engineering and Computer Science 23, no. 2 (1.08.2021): 1212. http://dx.doi.org/10.11591/ijeecs.v23.i2.pp1212-1218.

Abstract:
This paper suggests a new system for speaker recognition using the hidden Markov model (HMM) algorithm. Much research has been published on this subject, especially using HMMs. Arabic is a difficult language on which little work has been done; moreover, the work presented here concerns a text-dependent system, for which HMMs are very effective, with the algorithm trained at the word level. One of the problems in such systems is noise, so we take it into consideration by adding additive white Gaussian noise (AWGN) to the speech signals to observe its effect. Here, we used an HMM with a new single-state algorithm, in which two of the model components, i.e., π and A, are removed. This greatly accelerates the training and testing stages of recognition with the lowest memory usage, as shown in this work. The results show an excellent outcome: a 100% recognition rate for the tested data, and about a 91.6% recognition rate with AWGN noise.
20

Lin, Jiang, Yi Yumei, Zhang Maosheng, Chen Defeng, Wang Chao and Wang Tonghan. "A Multiscale Chaotic Feature Extraction Method for Speaker Recognition". Complexity 2020 (2.12.2020): 1–9. http://dx.doi.org/10.1155/2020/8810901.

Abstract:
In speaker recognition systems, feature extraction is a challenging task under noisy environmental conditions. To improve the robustness of the features, we propose a multiscale chaotic feature for speaker recognition. We use a multiresolution analysis technique to capture finer information about different speakers in the frequency domain. Then, we extract the chaotic characteristics of speech based on a nonlinear dynamic model, which helps to improve the discriminability of the features. Finally, we use a GMM-UBM model to develop a speaker recognition system. Our experimental results verified its good performance: under clean-speech and noisy-speech conditions, the EER of our method is reduced by 13.94% and 26.5%, respectively, compared with the state-of-the-art method.
21

Pathak, Bageshree, and Shriyanti Kulkarni. "Speaker Recognition System for Home Security using Raspberry Pi and Python". International Journal of Engineering & Technology 7, no. 4.5 (22.09.2018): 95. http://dx.doi.org/10.14419/ijet.v7i4.5.20019.

Abstract:
The transfer of manual controls to machine controls is automation, and automation is the need of the hour. Home automation is the automation of home systems to create smart homes; it includes security systems, appliance control, and environment control. The increasing need for safety and security has brought biometric security systems to the forefront. Speech, being unique and individualistic, can be used for biometric identification. The proposed system is a prototype that can be deployed for speaker recognition for home security. The system identifies the registered speakers and allows access to a recognized speaker. The system is implemented on the Raspberry Pi platform using the Python language.
22

Brummer, Niko, Lukas Burget, Jan Cernocky, Ondrej Glembek, Frantisek Grezl, Martin Karafiat, David A. van Leeuwen, Pavel Matejka, Petr Schwarz and Albert Strasheim. "Fusion of Heterogeneous Speaker Recognition Systems in the STBU Submission for the NIST Speaker Recognition Evaluation 2006". IEEE Transactions on Audio, Speech, and Language Processing 15, no. 7 (September 2007): 2072–84. http://dx.doi.org/10.1109/tasl.2007.902870.

23

Shah, Shahid Munir, Muhammad Moinuddin and Rizwan Ahmed Khan. "A Robust Approach for Speaker Identification Using Dialect Information". Applied Computational Intelligence and Soft Computing 2022 (7.03.2022): 1–16. http://dx.doi.org/10.1155/2022/4980920.

Abstract:
The present research is an effort to enhance the performance of voice processing systems, in our case the speaker identification system (SIS), by addressing the variability caused by the dialectal variations of a language. We present an effective solution to reduce dialect-related variability in voice processing systems. The proposed method minimizes the system's complexity by reducing the search space during the testing process of speaker identification: the speaker is searched among the speakers of the identified dialect instead of all the speakers present during system training. The study is conducted on the Pashto language, and the voice data samples are collected from native Pashto speakers from specific regions of Pakistan and Afghanistan where Pashto is spoken with different dialectal variations. The task of speaker identification is achieved with the help of a novel hierarchical framework that works in two steps. In the first step, the speaker's dialect is identified; for automated dialect identification, spectral and prosodic features are used in conjunction with a Gaussian mixture model (GMM). In the second step, the speaker is identified using a multilayer perceptron (MLP)-based speaker identification system, which receives aggregated input from the first step, i.e., the identified dialect along with prosodic and spectral features. The robustness of the proposed SIS is compared with traditional state-of-the-art methods from the literature. The results show that the proposed framework is better in terms of average speaker recognition accuracy (84.5% identification accuracy) and consumes 39% less time for the identification of a speaker.
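The two-step search-space reduction described in this abstract can be sketched schematically: classify the dialect first, then match the speaker only within that dialect. Everything below (nearest-centroid scoring, the 2-D "features", the dialect and speaker names) is a hypothetical stand-in for the paper's GMM + MLP pipeline.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: 2 dialects, 3 speakers each, 2-D "prosodic/spectral" features.
dialect_centers = {"north": np.array([0.0, 0.0]), "south": np.array([5.0, 5.0])}
speakers = {
    "north": {f"n{i}": dialect_centers["north"] + rng.normal(0, 0.8, 2) for i in range(3)},
    "south": {f"s{i}": dialect_centers["south"] + rng.normal(0, 0.8, 2) for i in range(3)},
}

def identify(utterance_mean):
    # Step 1: pick the dialect, which shrinks the speaker search space.
    dialect = min(dialect_centers,
                  key=lambda d: np.linalg.norm(utterance_mean - dialect_centers[d]))
    # Step 2: search only among the speakers of that dialect.
    cands = speakers[dialect]
    speaker = min(cands, key=lambda s: np.linalg.norm(utterance_mean - cands[s]))
    return dialect, speaker

# A noisy sample of speaker "n1" should be routed to the "north" candidate set.
probe = speakers["north"]["n1"] + rng.normal(0, 0.1, 2)
```

With D dialects and N speakers overall, step 2 compares against roughly N/D models instead of N, which is where the reported time saving comes from.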
24

Nidhyananthan, S. Selva, Prasad M. and Shantha Selva Kumari R. "Secure Speaker Recognition using BGN Cryptosystem with Prime Order Bilinear Group". International Journal of Information Security and Privacy 9, no. 4 (October 2015): 1–19. http://dx.doi.org/10.4018/ijisp.2015100101.

Abstract:
Speech, being a unique characteristic of an individual, is widely used in speaker verification and speaker identification tasks in applications such as authentication and surveillance, respectively. This paper presents a framework for a secure speaker recognition system using the BGN cryptosystem, in which the system is able to perform the necessary operations without observing the speech input provided by the user during the speaker recognition process. Secure speaker recognition makes use of secure multiparty computation (SMC) based on the homomorphic properties of the cryptosystem. Among the cryptosystems with homomorphic properties, BGN is preferable because it is partially doubly homomorphic: it can perform an arbitrary number of additions and only one multiplication. The main disadvantage of using the BGN cryptosystem, however, is its execution time. In the proposed system, the execution time is reduced by a factor of 12 by replacing the conventional composite-order group with a prime-order group. This leads to efficient secure speaker recognition.
25

Dişken, Gökay, Zekeriya Tüfekci and Ulus Çevik. "Speaker Model Clustering to Construct Background Models for Speaker Verification". Archives of Acoustics 42, no. 1 (1.03.2017): 127–35. http://dx.doi.org/10.1515/aoa-2017-0014.

Abstract:
Conventional speaker recognition systems use the Universal Background Model (UBM) as the imposter model for all speakers. In this paper, speaker models are clustered to obtain better imposter model representations for speaker verification. First, a UBM is trained and speaker models are adapted from the UBM. Then, the k-means algorithm with the Euclidean distance measure is applied to the speaker models. The speakers are divided into two, three, four, and five clusters, and the resulting cluster centers are used as the background models of their respective speakers. Experiments showed that the proposed method consistently produced lower Equal Error Rates (EER) than the conventional UBM approach for 3-, 10-, and 30-second test utterances, as well as under channel mismatch conditions. The proposed method is also compared with the i-vector approach; the three-cluster model achieved the best performance, with a 12.4% relative EER reduction on average compared to the i-vector method. The statistical significance of the results is also given.
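The Equal Error Rate used throughout these comparisons is the operating point where the false-acceptance and false-rejection rates coincide. A minimal sketch of its computation from verification trial scores (the score values below are invented for illustration):

```python
import numpy as np

def equal_error_rate(genuine, impostor):
    """EER: the threshold where false-accept and false-reject rates meet."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    best = (1.0, 0.0)
    for t in thresholds:
        far = np.mean(impostor >= t)   # impostor trials wrongly accepted
        frr = np.mean(genuine < t)     # genuine trials wrongly rejected
        if abs(far - frr) < abs(best[0] - best[1]):
            best = (far, frr)
    return 0.5 * (best[0] + best[1])

# Hypothetical well-separated scores: genuine trials score high, impostors low.
genuine = np.array([2.0, 2.2, 1.8, 2.5, 1.1])
impostor = np.array([0.1, 0.4, 1.3, 0.2, 0.5])
eer = equal_error_rate(genuine, impostor)

# A "12.4% relative EER reduction" as reported above means
# new_eer = old_eer * (1 - 0.124).
```

For these toy scores the rates cross at FAR = FRR = 20%, i.e. an EER of 0.2.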
26

Pirhosseinloo, Shadi, and Farshad Almas Ganj. "Discriminative speaker adaptation in Persian continuous speech recognition systems". Procedia - Social and Behavioral Sciences 32 (2012): 296–301. http://dx.doi.org/10.1016/j.sbspro.2012.01.043.

27

Herbig, Tobias, Franz Gerl, Wolfgang Minker and Reinhold Haeb-Umbach. "Adaptive systems for unsupervised speaker tracking and speech recognition". Evolving Systems 2, no. 3 (3.07.2011): 199–214. http://dx.doi.org/10.1007/s12530-011-9034-1.

28

Karamangala, Narendra, and Ramaswamy Kumaraswamy. "Speaker Recognition in Uncontrolled Environment: A Review". Journal of Intelligent Systems 22, no. 1 (1.03.2013): 49–65. http://dx.doi.org/10.1515/jisys-2012-0020.

Abstract:
Speaker recognition has been an active research area for many years. Methods to represent and quantify the information embedded in a speech signal are termed features of the signal. The features are obtained, modeled, and stored for reference when the system is later tested. Decisions on whether to accept or reject a speaker are made based on the parameters of the data modeling techniques. The real world introduces various degradations that hamper signal quality, such as ambient background noise, reverberation, or multispeaker scenarios. This paper presents a survey of the feature extraction methods, data modeling methods, decision metrics, and preprocessing methods for degraded data that have been used to perform the task of speaker recognition.
29

Lin, Chin-Teng, Hsi-Wen Nein, and Wei-Fen Lin. "SPEAKER ADAPTATION OF FUZZY-PERCEPTRON-BASED SPEECH RECOGNITION". International Journal of Uncertainty, Fuzziness and Knowledge-Based Systems 07, no. 01 (February 1999): 1–30. http://dx.doi.org/10.1142/s0218488599000027.

Abstract:
In this paper, we propose a speech recognition algorithm which utilizes hidden Markov models (HMM) and the Viterbi algorithm to segment the input speech sequence, such that the variable-dimensional speech signal is converted into a fixed-dimensional signal, called a TN vector. We then use the fuzzy perceptron to generate hyperplanes which separate the patterns of each class from the others. The proposed speech recognition algorithm lends itself to speaker adaptation through the idea of "supporting patterns", the patterns closest to the hyperplane. When a recognition error occurs, we include all the TN vectors of the input speech sequence, with respect to the segmentations of all HMM models, as supporting patterns. The supporting patterns are then used by the fuzzy perceptron to tune the hyperplane that would cause correct recognition, and also the hyperplane that resulted in the wrong recognition. Since only two hyperplanes need to be tuned for each recognition error, the proposed adaptation scheme is time-efficient and suitable for on-line adaptation. Although the adaptation scheme cannot guarantee that the wrong recognition is corrected immediately after adaptation, the hyperplanes are tuned iteratively in the direction of correct recognition, and the speed of adaptation can be adjusted by a "belief" parameter set by the user. Several examples are used to show the performance of the proposed speech recognition algorithm and the speaker adaptation scheme.
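The hyperplane-tuning step described above can be sketched as a perceptron-style update, with the "belief" parameter playing the role of a learning rate. All names and numbers here are illustrative stand-ins, not the paper's formulation:

```python
def score(w, b, x):
    """Signed distance-like score of pattern x against hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def tune_hyperplane(w, b, x, target, belief=0.5):
    """Move the hyperplane so the supporting pattern x scores
    higher (target=+1) or lower (target=-1) after the update."""
    w = [wi + belief * target * xi for wi, xi in zip(w, x)]
    b = b + belief * target
    return w, b

# a misrecognized supporting pattern: pull the correct class's hyperplane
# toward it, and push the wrongly winning class's hyperplane away from it
pattern = [1.0, 0.5]
w_correct, b_correct = tune_hyperplane([0.2, -0.1], 0.0, pattern, target=+1)
w_wrong, b_wrong = tune_hyperplane([0.4, 0.3], 0.1, pattern, target=-1)
```

After the two updates, the pattern scores higher under the correct class's hyperplane and lower under the wrong one, which is the direction of adaptation the paper describes.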
30

BENNANI, YOUNÈS. "MULTI-EXPERT AND HYBRID CONNECTIONIST APPROACH FOR PATTERN RECOGNITION: SPEAKER IDENTIFICATION TASK". International Journal of Neural Systems 05, no. 03 (September 1994): 207–16. http://dx.doi.org/10.1142/s0129065794000220.

Abstract:
This paper presents and evaluates a modular/hybrid connectionist system for speaker identification. Modularity has emerged as a powerful technique for reducing the complexity of connectionist systems, allowing a priori knowledge to be incorporated into their design. In problems where training data are scarce, such modular systems are likely to generalize significantly better than a monolithic connectionist system. In addition, modules are not restricted to be connectionist: hybrid systems, with e.g. Hidden Markov Models (HMMs), can be designed, combining the advantages of connectionist and non-connectionist approaches. Text independent speaker identification is an inherently complex task where the amount of training data is often limited. It thus provides an ideal domain to test the validity of the modular/hybrid connectionist approach. An architecture is developed in this paper which achieves this identification, based upon the cooperation of several connectionist modules, together with an HMM module. When tested on a population of 102 speakers extracted from the DARPA-TIMIT database, perfect identification was obtained. Overall, our recognition results are among the best for any text-independent speaker identification system handling this population size. In a specific comparison with a system based on multivariate auto-regressive models, the modular/hybrid connectionist approach was found to be significantly better in terms of both accuracy and speed. Our design also allows for easy incorporation of new speakers.
31

Zanellato, Georges. "Speaker independent isolated word recognition". Signal Processing 21, no. 1 (September 1990): 93–95. http://dx.doi.org/10.1016/0165-1684(90)90029-x.

32

Lambamo, Wondimu, Ramasamy Srinivasagan, and Worku Jifara. "Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition". Applied Sciences 13, no. 1 (31.12.2022): 569. http://dx.doi.org/10.3390/app13010569.

Abstract:
Speaker recognition systems perform very well on datasets without noise or mismatch. However, performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related features plays a crucial role in improving the performance of speaker recognition systems. Gammatone Frequency Cepstral Coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, achieving better performance than Mel Frequency Cepstral Coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used the Mel Spectrogram and similar inputs rather than handcrafted features like MFCC and GFCC. However, the performance of Mel Spectrogram features degrades at high noise ratios and under utterance mismatch. Like the Mel Spectrogram, the Cochleogram is another important input for deep learning speaker recognition models. Like GFCC features, the Cochleogram represents utterances on the Equivalent Rectangular Bandwidth (ERB) scale, which is important in noisy conditions. However, no study has analyzed the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition, and only a few studies have used the Cochleogram to develop speech-based models in noisy and mismatched conditions using deep learning. In this study, the noise robustness of Cochleogram and Mel Spectrogram features in speaker recognition using deep learning models is analyzed at Signal to Noise Ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2DCNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet architectures. The speaker identification and verification performance of both the Cochleogram and the Mel Spectrogram is evaluated. The results show that the Cochleogram performs better than the Mel Spectrogram in both speaker identification and verification under noisy and mismatched conditions.
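The noise conditions above (SNR from −5 dB to 20 dB) are produced by scaling a noise signal before adding it to the clean utterance. A minimal sketch, with signals as plain lists of samples and a crude deterministic noise generator standing in for real recorded noise:

```python
import math

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the clean/noise power ratio equals snr_db, then mix."""
    p_clean = sum(s * s for s in clean) / len(clean)
    p_noise = sum(s * s for s in noise) / len(noise)
    # required noise power for the target SNR: p_clean / p_noise_new = 10^(snr/10)
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(clean, noise)]

clean = [math.sin(2 * math.pi * 100 * t / 8000) for t in range(8000)]
# deterministic pseudo-noise in [-1, 1) via a linear congruential generator
noise = [((t * 1103515245 + 12345) % 2 ** 31) / 2 ** 30 - 1 for t in range(8000)]
noisy = mix_at_snr(clean, noise, snr_db=0)
```

Repeating this for each SNR level in the study's range yields the matched noisy test sets.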
33

Dwijayanti, Suci, Alvio Yunita Putri, and Bhakti Yudho Suprapto. "Speaker Identification Using a Convolutional Neural Network". Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 1 (27.02.2022): 140–45. http://dx.doi.org/10.29207/resti.v6i1.3795.

Abstract:
Speech, a mode of communication between humans and machines, has various applications, including biometric systems for identifying people who have access to secure systems. Feature extraction is an important factor in speech recognition with high accuracy. Therefore, we implemented a spectrogram, which is a pictorial representation of speech in terms of raw features, to identify speakers. These features were input into a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary data samples from 78 speakers, each of whom uttered a number in Bahasa Indonesia. The proposed architecture, CNN-VGG-f, has a learning rate of 0.001, a batch size of 256, and 100 epochs. The results indicate that this architecture can generate a suitable model for speaker identification. A spectrogram was used to determine the best features for identifying the speakers. The proposed method exhibited an accuracy of 98.78%, which is significantly higher than the accuracies of the method involving Mel-frequency cepstral coefficients (MFCCs; 34.62%) and the combination of MFCCs and deltas (26.92%). Overall, CNN-VGG-f with the spectrogram can identify 77 speakers from the samples, validating the usefulness of combining spectrograms and CNNs in speech recognition applications.
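A spectrogram of the kind used as CNN input above is the magnitude of short-time Fourier transforms over successive frames. A minimal, unoptimized pure-Python sketch; the frame length and hop size are arbitrary illustrative values, not the paper's settings:

```python
import cmath, math

def spectrogram(signal, frame_len=64, hop=32):
    """Return one list of |DFT| magnitudes per frame (non-negative bins only)."""
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        mags = []
        for k in range(frame_len // 2 + 1):
            # direct DFT of the frame at bin k (O(N^2); fine for a sketch)
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            mags.append(abs(acc))
        frames.append(mags)
    return frames

# a tone with a period of 8 samples concentrates energy in bin 64/8 = 8
tone = [math.sin(2 * math.pi * n / 8) for n in range(256)]
spec = spectrogram(tone)
```

In practice an FFT and a window function would be used; the resulting 2-D array of magnitudes is what gets fed to the CNN as an image.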
34

Hussien, Emad Ahmed, Mohannad Abid Shehab Ahmed, and Haithem Abd Al-Raheem Taha. "Speech Recognition using Wavelets and Improved SVM". Wasit Journal of Engineering Sciences 1, no. 2 (1.09.2013): 55–78. http://dx.doi.org/10.31185/ejuow.vol1.iss2.13.

Abstract:
Speaker recognition (identification/verification) is the computing task of validating a user's claimed identity using speaker-specific information included in speech waves; that is, it enables voice-based access control for various services. Discrete Wavelet Transform (DWT) based systems for speaker recognition have shown robust results for several years and are widely used in speaker recognition applications. This paper describes a text-independent speaker recognition system that uses the Discrete Wavelet Transform (DWT) for feature extraction and a kernel Support Vector Machine (SVM) as the classification tool, taking the decision through a simplified-class Support Vector Machine approach. The proposed SVM approach can convert local Euclidean distances between frame vectors to angles by projecting these -dimensional vectors together, and obtains the minimum global distance from the non-linearly aligned speech path in order to address audio classification and, hence, sound recognition. The DWT of each frame of the spoken word is used to extract the main features as data code vectors; these data are normalized using a normalized-power algorithm that reduces the number of feature vector coefficients, then scaled and tested against the stored training spoken words to achieve the speaker identification task. The DWT also yields a fixed amount of data that can be readily used by the SVM. Finally, the proposed method is trained and tested on a very large database, with results limited to ten speakers (5 males and 5 females) and words of at most 17 phonemes. Its performance gives accurate and stable results, which raises the algorithm's efficiency and reduces the execution time, with 97% overall accuracy.
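The DWT feature-extraction step can be illustrated with a single level of the Haar transform, the simplest wavelet. The abstract does not name the wavelet family used, so Haar is a stand-in for illustration only:

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: approximation (low-pass) and detail (high-pass)."""
    s = math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

approx, detail = haar_dwt([4.0, 2.0, 1.0, 3.0])
```

The transform is orthonormal, so the signal energy is preserved across the two subbands; subband coefficients (or their energies) are what would be collected into the feature vectors fed to the classifier.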
35

Simić, Nikola, Siniša Suzić, Tijana Nosek, Mia Vujović, Zoran Perić, Milan Savić, and Vlado Delić. "Speaker Recognition Using Constrained Convolutional Neural Networks in Emotional Speech". Entropy 24, no. 3 (16.03.2022): 414. http://dx.doi.org/10.3390/e24030414.

Abstract:
Speaker recognition is an important classification task, which can be solved using several approaches. Although building a speaker recognition model on a closed set of speakers under neutral speaking conditions is a well-researched task and there are solutions that provide excellent performance, the classification accuracy of developed models significantly decreases when applying them to emotional speech or in the presence of interference. Furthermore, deep models may require a large number of parameters, so constrained solutions are desirable in order to implement them on edge devices in Internet of Things systems for real-time detection. The aim of this paper is to propose a simple, constrained convolutional neural network for speaker recognition tasks and to examine its robustness for recognition under emotional speech conditions. We examine three quantization methods for developing a constrained network: the floating-point eight format, ternary scalar quantization, and binary scalar quantization. The results are demonstrated on the recently recorded SEAC dataset.
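The ternary and binary scalar quantization schemes mentioned above can be sketched on a list of weights. The threshold ratio and the mean-magnitude scale are common illustrative choices, not necessarily the paper's exact formulation:

```python
def binary_quantize(weights):
    """Map each weight to +/-scale, where scale is the mean magnitude."""
    scale = sum(abs(w) for w in weights) / len(weights)
    return [scale if w >= 0 else -scale for w in weights]

def ternary_quantize(weights, threshold_ratio=0.5):
    """Map small weights to 0 and the rest to +/-scale."""
    t = threshold_ratio * sum(abs(w) for w in weights) / len(weights)
    kept = [w for w in weights if abs(w) > t]
    scale = sum(abs(w) for w in kept) / len(kept) if kept else 0.0
    return [0.0 if abs(w) <= t else (scale if w > 0 else -scale) for w in weights]

q2 = binary_quantize([0.8, -0.3, 0.05, -0.6])
q3 = ternary_quantize([0.8, -0.3, 0.05, -0.6])
```

Binary quantization stores only a sign per weight plus one scale; ternary quantization additionally zeroes out small weights, trading a little accuracy for sparsity.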
36

Alkhatib, Bassel, and Mohammad Madian Waleed Kamal Eddin. "Voice Identification Using MFCC and Vector Quantization". Baghdad Science Journal 17, no. 3(Suppl.) (8.09.2020): 1019. http://dx.doi.org/10.21123/bsj.2020.17.3(suppl.).1019.

Abstract:
Speaker identification is one of the fundamental problems in speech processing and voice modeling. Its applications include authentication in critical security systems, where the accuracy of the selection matters; large-scale voice recognition applications are a major challenge. Quick search in the speaker database requires fast, modern techniques and relies on artificial intelligence to achieve the desired results from the system. Many efforts are made to achieve this through the establishment of variable-based systems and the development of new methodologies for speaker identification. Speaker identification is the process of recognizing who is speaking using characteristics extracted from the speech waves, such as pitch, tone, and frequency. The speaker models are created and saved in the system environment and used to verify the identity claimed by people accessing the systems, which allows access to various services that are controlled by voice. Speaker identification involves two main parts: the first is feature extraction and the second is feature matching.
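The feature-matching part described above is often implemented as average quantization distortion against each speaker's codebook: the claimed speaker's codebook should quantize the test features with the lowest error. A minimal sketch with hypothetical 2-D features and hand-made codebooks (real ones would come from MFCC vectors and k-means training):

```python
import math

def avg_distortion(features, codebook):
    """Mean distance from each feature vector to its nearest codeword."""
    return sum(min(math.dist(f, c) for c in codebook) for f in features) / len(features)

def identify(features, codebooks):
    """Return the speaker whose codebook quantizes the features with least distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(features, codebooks[spk]))

codebooks = {
    "alice": [(0.0, 0.0), (1.0, 1.0)],
    "bob": [(5.0, 5.0), (6.0, 6.0)],
}
test_features = [(0.9, 1.1), (0.1, -0.1)]
speaker = identify(test_features, codebooks)
```

For verification rather than identification, the same distortion would instead be compared against a decision threshold.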
37

Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. "Racial disparities in automated speech recognition". Proceedings of the National Academy of Sciences 117, no. 14 (23.03.2020): 7684–89. http://dx.doi.org/10.1073/pnas.1915768117.

Abstract:
Automated speech recognition (ASR) systems, which use sophisticated machine-learning algorithms to convert spoken language to text, have become increasingly widespread, powering popular virtual assistants, facilitating automated closed captioning, and enabling digital dictation platforms for health care. Over the last several years, the quality of these systems has dramatically improved, due both to advances in deep learning and to the collection of large-scale datasets used to train the systems. There is concern, however, that these tools do not work equally well for all subgroups of the population. Here, we examine the ability of five state-of-the-art ASR systems—developed by Amazon, Apple, Google, IBM, and Microsoft—to transcribe structured interviews conducted with 42 white speakers and 73 black speakers. In total, this corpus spans five US cities and consists of 19.8 h of audio matched on the age and gender of the speaker. We found that all five ASR systems exhibited substantial racial disparities, with an average word error rate (WER) of 0.35 for black speakers compared with 0.19 for white speakers. We trace these disparities to the underlying acoustic models used by the ASR systems as the race gap was equally large on a subset of identical phrases spoken by black and white individuals in our corpus. We conclude by proposing strategies—such as using more diverse training datasets that include African American Vernacular English—to reduce these performance differences and ensure speech recognition technology is inclusive.
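The word error rate (WER) reported above is the word-level edit distance between reference and hypothesis transcripts, divided by the reference length. A standard dynamic-programming sketch:

```python
def wer(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference and first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i          # deleting all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j          # inserting all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / len(ref)

rate = wer("the cat sat on the mat", "the cat sat mat")
```

An average WER of 0.35 versus 0.19, as found in the study, means roughly one in three words mistranscribed for one group against one in five for the other.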
38

Jati, Arindam, Chin-Cheng Hsu, Monisankha Pal, Raghuveer Peri, Wael AbdAlmageed, and Shrikanth Narayanan. "Adversarial attack and defense strategies for deep speaker recognition systems". Computer Speech & Language 68 (July 2021): 101199. http://dx.doi.org/10.1016/j.csl.2021.101199.

39

Doddington, George R., Mark A. Przybocki, Alvin F. Martin, and Douglas A. Reynolds. "The NIST speaker recognition evaluation – Overview, methodology, systems, results, perspective". Speech Communication 31, no. 2-3 (June 2000): 225–54. http://dx.doi.org/10.1016/s0167-6393(99)00080-1.

40

Siniscalchi, Sabato Marco, Jinyu Li, and Chin-Hui Lee. "Hermitian Polynomial for Speaker Adaptation of Connectionist Speech Recognition Systems". IEEE Transactions on Audio, Speech, and Language Processing 21, no. 10 (October 2013): 2152–61. http://dx.doi.org/10.1109/tasl.2013.2270370.

41

Najdet Nasret Coran, Ali, Prof Dr Hayri Sever, and Dr Murad Ahmed Mohammed Amin. "Acoustic data classification using random forest algorithm and feed forward neural network". International Journal of Engineering & Technology 9, no. 2 (1.07.2020): 582. http://dx.doi.org/10.14419/ijet.v9i2.30815.

Abstract:
Speaker identification systems are designed to recognize a speaker or set of speakers through acoustic analysis. Many approaches are used to perform the acoustic analysis of the speech signal; such systems are generally described in terms of time- and frequency-domain analysis. In this paper, acoustic information is extracted from the speech signals using a combination of MFCC and fundamental frequency methods. The results are classified using two different algorithms, Random Forest and a Feed-Forward Neural Network. The FFNN classifier integrated with the acoustic model yielded a recognition accuracy of 91.4%. The CMU ARCTIC database is used in this work.
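The fundamental-frequency feature can be estimated with a simple autocorrelation peak search, a common textbook method; the abstract does not state which F0 estimator the paper uses, so this is only a representative sketch:

```python
import math

def estimate_f0(signal, sample_rate, f_min=60.0, f_max=400.0):
    """Pick the autocorrelation-peak lag within a plausible pitch range."""
    lag_min = int(sample_rate / f_max)
    lag_max = int(sample_rate / f_min)

    def autocorr(lag):
        return sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))

    # the lag maximizing the autocorrelation is the estimated pitch period
    best_lag = max(range(lag_min, lag_max + 1), key=autocorr)
    return sample_rate / best_lag

# 100 Hz tone sampled at 8 kHz: the pitch period is 80 samples
tone = [math.sin(2 * math.pi * 100 * n / 8000) for n in range(1600)]
f0 = estimate_f0(tone, 8000)
```

On real speech the search would be run per frame and only on voiced frames; the resulting F0 track is then combined with the MFCC features.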
42

Tan, Hao, Le Wang, Huan Zhang, Junjian Zhang, Muhammad Shafiq, and Zhaoquan Gu. "Adversarial Attack and Defense Strategies of Speaker Recognition Systems: A Survey". Electronics 11, no. 14 (12.07.2022): 2183. http://dx.doi.org/10.3390/electronics11142183.

Abstract:
Speaker recognition is the task of identifying the speaker from audio recordings. Recently, advances in deep learning have considerably boosted the development of speech signal processing techniques. Speaker and speech recognition have been widely adopted in applications such as smart locks, smart vehicle-mounted systems, and financial services. However, deep neural network-based speaker recognition systems (SRSs) are susceptible to adversarial attacks, which fool the system into making wrong decisions through small perturbations, and this has drawn the attention of researchers to the security of SRSs. Unfortunately, there is no systematic review work in this domain. In this work, we conduct a comprehensive survey to fill this gap, which covers the development of SRSs and adversarial attacks and defenses against them. Specifically, we first introduce the mainstream frameworks of SRSs and some commonly used datasets. Then, from the perspectives of adversarial example generation and evaluation, we introduce different attack tasks, the prior knowledge of attacks, perturbation objects, perturbation constraints, and attack effect evaluation indicators. Next, we focus on some effective defense strategies, including adversarial training, attack detection, and input refactoring against existing attacks, and analyze their strengths and weaknesses in terms of fidelity and robustness. Finally, we discuss the challenges posed by audio adversarial examples in SRSs and some valuable research topics for the future.
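The small-perturbation attacks surveyed above can be illustrated with the fast gradient sign method (FGSM) on a toy linear scorer, where the gradient is analytic. This is a deliberately simplified stand-in for a deep SRS; the weights, features, and epsilon are all made up for illustration:

```python
def score(x, w):
    """Toy speaker score: dot product of input features with an embedding direction."""
    return sum(xi * wi for xi, wi in zip(x, w))

def fgsm_attack(x, w, epsilon=0.05):
    """For score(x) = w.x, the gradient w.r.t. x is w; step against its sign
    to lower the target speaker's score while staying within +/- epsilon."""
    def sign(v):
        return (v > 0) - (v < 0)
    return [xi - epsilon * sign(wi) for xi, wi in zip(x, w)]

w = [0.5, -1.0, 2.0]   # hypothetical speaker-embedding direction
x = [0.2, 0.1, 0.3]    # hypothetical input features
x_adv = fgsm_attack(x, w, epsilon=0.05)
```

In a real attack the gradient would come from backpropagation through the network and the perturbation would be applied to audio samples, but the structure (a bounded step against the gradient sign) is the same.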
43

Besacier, Laurent, and Jean-François Bonastre. "Subband architecture for automatic speaker recognition". Signal Processing 80, no. 7 (July 2000): 1245–59. http://dx.doi.org/10.1016/s0165-1684(00)00033-5.

44

Nematollahi, Mohammad Ali, Hamurabi Gamboa-Rosales, Mohammad Ali Akhaee, and S. A. R. Al-Haddad. "Robust Digital Speech Watermarking For Online Speaker Recognition". Mathematical Problems in Engineering 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/372398.

Abstract:
A robust and blind digital speech watermarking technique is proposed for online speaker recognition systems, based on the Discrete Wavelet Packet Transform (DWPT) and multiplication to embed the watermark in the amplitudes of the wavelet subbands. In order to minimize the degradation effect of the watermark, the subbands where less speaker-specific information is available (500 Hz–3500 Hz and 6000 Hz–7000 Hz) are selected. Experimental results on the Texas Instruments Massachusetts Institute of Technology (TIMIT), Massachusetts Institute of Technology (MIT), and Mobile Biometry (MOBIO) corpora show that the degradation for speaker verification and identification is 1.16% and 2.52%, respectively. Furthermore, the proposed watermark technique provides sufficient robustness against different signal processing attacks.
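The multiplicative embedding rule can be sketched directly on a vector of subband coefficients. Note this simplified version extracts bits with access to the original coefficients (i.e., non-blind), unlike the blind scheme of the paper; the coefficients, bits, and strength alpha are illustrative:

```python
def embed(coeffs, bits, alpha=0.1):
    """Multiply each subband coefficient by (1 + alpha*b), with b in {-1, +1}."""
    return [c * (1 + alpha * b) for c, b in zip(coeffs, bits)]

def extract(watermarked, coeffs, alpha=0.1):
    """Recover each bit from the ratio of watermarked to original coefficient."""
    return [1 if wc / c > 1 else -1 for wc, c in zip(watermarked, coeffs)]

coeffs = [0.9, -1.2, 0.4, 2.0]   # toy nonzero wavelet subband coefficients
bits = [1, -1, -1, 1]
wm = embed(coeffs, bits)
recovered = extract(wm, coeffs)
```

A small alpha keeps the amplitude change, and thus the impact on speaker recognition accuracy, low, which is why the paper also restricts embedding to subbands carrying little speaker-specific information.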
45

GUO, Wu, Yi-Jie LI, Li-Rong DAI, and Ren-Hua WANG. "Factor Analysis and Space Assembling in Speaker Recognition". Acta Automatica Sinica 35, no. 9 (13.11.2009): 1193–98. http://dx.doi.org/10.3724/sp.j.1004.2009.01193.

46

Hong, Q. Y., and S. Kwong. "A genetic classification method for speaker recognition". Engineering Applications of Artificial Intelligence 18, no. 1 (February 2005): 13–19. http://dx.doi.org/10.1016/j.engappai.2004.08.035.

47

ZERGAT, KAWTHAR YASMINE, and ABDERRAHMANE AMROUCHE. "SVM AGAINST GMM/SVM FOR DIALECT INFLUENCE ON AUTOMATIC SPEAKER RECOGNITION TASK". International Journal of Computational Intelligence and Applications 13, no. 02 (June 2014): 1450012. http://dx.doi.org/10.1142/s1469026814500126.

Abstract:
A major issue for current research on automatic speaker recognition is the effectiveness of speaker modeling techniques, because talkers have their own speaking styles, depending on their specific accents and dialects. This paper investigates the influence of dialect and database size on the text-independent speaker verification task using SVM and hybrid GMM/SVM speaker modeling. The Principal Component Analysis (PCA) technique is used in the front-end part of the speaker recognition system in order to extract the most representative features. Experimental results show that the size of the database has an important impact on the SVM- and GMM/SVM-based speaker verification performance, while the dialect has no significant effect. Applying PCA dimensionality reduction improves the recognition accuracy for both SVM- and GMM/SVM-based recognition systems. However, it did not yield a clear observation about the dialect effect.
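The PCA front-end mentioned above projects features onto directions of maximal variance. A minimal pure-Python sketch that finds the first principal component by power iteration on the covariance matrix; the toy 2-D data and iteration count are illustrative:

```python
import math

def first_principal_component(data, iters=200):
    """Power iteration on the covariance matrix of `data` (rows = samples)."""
    dim = len(data[0])
    means = [sum(row[j] for row in data) / len(data) for j in range(dim)]
    centered = [[row[j] - means[j] for j in range(dim)] for row in data]
    cov = [[sum(r[i] * r[j] for r in centered) / len(data) for j in range(dim)]
           for i in range(dim)]
    # repeatedly multiply by the covariance matrix and renormalize;
    # the vector converges to the dominant eigenvector (first PC)
    v = [1.0] * dim
    for _ in range(iters):
        v = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return v

# points spread along the y = x direction, so the first PC is ~(0.71, 0.71)
data = [[1.0, 1.1], [2.0, 1.9], [3.0, 3.2], [4.0, 3.9], [5.0, 5.0]]
pc1 = first_principal_component(data)
```

Dimensionality reduction then amounts to projecting each centered feature vector onto the top few such components.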
48

Kawabata, Takeshi. "Predictor codebook for speaker-independent speech recognition". Systems and Computers in Japan 25, no. 1 (1994): 37–46. http://dx.doi.org/10.1002/scj.4690250103.

49

Nagaraja, B. G., and H. S. Jayanna. "Multilingual Speaker Identification by Combining Evidence from LPR and Multitaper MFCC". Journal of Intelligent Systems 22, no. 3 (1.09.2013): 241–51. http://dx.doi.org/10.1515/jisys-2013-0038.

Abstract:
In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficients (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification under the constraint of limited data is demonstrated. The LPR is derived from linear prediction analysis, and the LPRP is obtained by dividing the LPR by its Hilbert envelope. The sine-weighted cepstrum estimators (SWCE) with six tapers are considered for multitaper MFCC feature extraction. The Gaussian mixture model–universal background model is used to model each speaker for the different types of evidence. The evidence is then combined at the score level to improve performance. Monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves performance by nearly 8–10% compared with individual evidence.
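Score-level combination, as used above, typically reduces to a weighted sum of per-system scores for each enrolled speaker. A minimal sketch with made-up scores and equal weights (the paper's actual fusion weights are not given here):

```python
def fuse_scores(score_lists, weights=None):
    """Weighted sum of per-system scores, speaker by speaker."""
    n_systems = len(score_lists)
    weights = weights or [1.0 / n_systems] * n_systems
    n_speakers = len(score_lists[0])
    return [sum(w * scores[i] for w, scores in zip(weights, score_lists))
            for i in range(n_speakers)]

# hypothetical per-speaker scores from three subsystems (e.g. MFCC, LPR, LPRP)
mfcc_scores = [2.1, 0.3, -1.0]
lpr_scores = [1.5, 0.9, -0.2]
lprp_scores = [0.8, 1.1, 0.1]
fused = fuse_scores([mfcc_scores, lpr_scores, lprp_scores])
best = max(range(len(fused)), key=lambda i: fused[i])   # identified speaker index
```

The intuition is that the three feature streams make partly independent errors, so a speaker favored by the fused score is more reliable than one favored by any single stream.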
50

Bouziane, Ayoub, Jamal Kharroubi, and Arsalane Zarghili. "Towards an objective comparison of feature extraction techniques for automatic speaker recognition systems". Bulletin of Electrical Engineering and Informatics 10, no. 1 (1.02.2021): 374–82. http://dx.doi.org/10.11591/eei.v10i1.1782.

Abstract:
A common limitation of previous comparative studies on speaker feature extraction techniques lies in the fact that the comparison is done independently of the speaker modeling technique used and its parameters. The aim of the present paper is twofold. Firstly, it reviews the most significant advances in feature extraction techniques used for automatic speaker recognition. Secondly, it evaluates and compares the currently dominant ones using an objective comparison methodology that overcomes the various limitations and drawbacks of previous comparative studies. The results of the experiments carried out underline the importance of the proposed comparison methodology.