Journal articles on the topic "Speaker identification systems"

Follow this link to see other types of publications on this topic: Speaker identification systems.

Create a correct reference in APA, MLA, Chicago, Harvard, and many other citation styles.

Consult the top 50 journal articles for your research on the topic "Speaker identification systems".

Next to every work in the list there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a .pdf file and read its abstract online, whenever the relevant parameters are available in the metadata.

Browse journal articles on a wide variety of disciplines and compile your bibliography correctly.

1

Jayanna, H. S., and B. G. Nagaraja. "An Experimental Comparison of Modeling Techniques and Combination of Speaker-Specific Information from Different Languages for Multilingual Speaker Identification". Journal of Intelligent Systems 25, no. 4 (1.10.2016): 529–38. http://dx.doi.org/10.1515/jisys-2014-0128.

Abstract:
Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario, so countries where English is the dominant language can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries are habituated to speaking multiple languages. An existing speaker identification system may yield poor performance if a speaker's training and test data are in different languages. Thus, developing a robust multilingual speaker identification system is an issue in many countries. In this work, an experimental evaluation of modeling techniques, including self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers, for multilingual speaker identification is presented. The monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and the combined feature gives better performance in all the crosslingual speaker identification experiments.
2

Shah, Shahid Munir, Muhammad Moinuddin, and Rizwan Ahmed Khan. "A Robust Approach for Speaker Identification Using Dialect Information". Applied Computational Intelligence and Soft Computing 2022 (7.03.2022): 1–16. http://dx.doi.org/10.1155/2022/4980920.

Abstract:
The present research is an effort to enhance the performance of voice processing systems, in our case the speaker identification system (SIS) by addressing the variability caused by the dialectical variations of a language. We present an effective solution to reduce dialect-related variability from voice processing systems. The proposed method minimizes the system’s complexity by reducing search space during the testing process of speaker identification. The speaker is searched from the set of speakers of the identified dialect instead of all the speakers present in system training. The study is conducted on the Pashto language, and the voice data samples are collected from native Pashto speakers of specific regions of Pakistan and Afghanistan where Pashto is spoken with different dialectal variations. The task of speaker identification is achieved with the help of a novel hierarchical framework that works in two steps. In the first step, the speaker’s dialect is identified. For automated dialect identification, the spectral and prosodic features have been used in conjunction with Gaussian mixture model (GMM). In the second step, the speaker is identified using a multilayer perceptron (MLP)-based speaker identification system, which gets aggregated input from the first step, i.e., dialect identification along with prosodic and spectral features. The robustness of the proposed SIS is compared with traditional state-of-the-art methods in the literature. The results show that the proposed framework is better in terms of average speaker recognition accuracy (84.5% identification accuracy) and consumes 39% less time for the identification of speaker.
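For readers who want a concrete picture of the two-step scheme summarized in the abstract above (dialect identification first, then speaker identification restricted to that dialect), the sketch below is a rough, hypothetical illustration only: the GMM dialect models, the per-dialect MLP classifiers, and the use of scikit-learn are assumptions made for illustration and not the authors' implementation.

```python
# Hypothetical sketch of a dialect-first speaker identification cascade.
# Feature extraction, model training, and parameters are assumed, not the paper's.
import numpy as np
from sklearn.neural_network import MLPClassifier

def identify_dialect(dialect_gmms: dict, feats: np.ndarray) -> str:
    """Pick the dialect whose fitted GMM gives the highest average log-likelihood."""
    return max(dialect_gmms, key=lambda d: dialect_gmms[d].score(feats))

def identify_speaker(dialect_gmms: dict, speaker_mlps: dict, feats: np.ndarray):
    """Restrict the speaker search to the classifier trained for the identified dialect."""
    dialect = identify_dialect(dialect_gmms, feats)
    mlp: MLPClassifier = speaker_mlps[dialect]          # one classifier per dialect
    probs = mlp.predict_proba(feats).mean(axis=0)       # average frame-level posteriors
    return dialect, mlp.classes_[int(np.argmax(probs))]
```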
3

Singh, Mahesh K., P. Mohana Satya, Vella Satyanarayana, and Sridevi Gamini. "Speaker Recognition Assessment in a Continuous System for Speaker Identification". International Journal of Electrical and Electronics Research 10, no. 4 (30.12.2022): 862–67. http://dx.doi.org/10.37391/ijeer.100418.

Abstract:
This research article focuses on recognizing speakers in multi-speaker speech. Every conference, talk, or discussion involves several speakers, and this type of speech poses distinct problems and processing stages. Challenges include impurities in the surroundings, the number of speakers involved, speaker distance, microphone equipment, etc. In addition to addressing these hurdles in real time, there are also problems in the treatment of multi-speaker speech. Identifying speech segments, separating the speaking segments, constructing clusters of similar segments, and finally recognizing the speaker from these segments are the common sequential operations in the context of multi-speaker speech recognition. All linked phases of the speech recognition process are discussed with relevant methodologies in this article, which also examines the common metrics, methods, and practices. The paper examines the algorithms of the speech recognition system at different stages. The voice recognition system is built through phases such as voice filtering, speaker segmentation, speaker idolization, and recognition of the speaker, evaluated with 20 speakers.
4

EhKan, Phaklen, Timothy Allen, and Steven F. Quigley. "FPGA Implementation for GMM-Based Speaker Identification". International Journal of Reconfigurable Computing 2011 (2011): 1–8. http://dx.doi.org/10.1155/2011/420369.

Abstract:
In today's society, highly accurate personal identification systems are required. Passwords or pin numbers can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals due to different speakers having different resonances of the vocal tract. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes the hardware implementation for classification of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time. This has important potential applications in security and in automated call centre applications. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
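As a point of reference for the MFCC-plus-GMM pipeline described in the abstract above, here is a minimal software sketch of text-independent GMM-based speaker identification. It is not the paper's FPGA design; the use of librosa and scikit-learn, and all parameter values, are assumptions made purely for illustration.

```python
# Minimal sketch: per-speaker GMMs over MFCC frames, scored by average log-likelihood.
# Library choices (librosa, scikit-learn) and parameters are illustrative assumptions.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_features(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Load audio and return an (n_frames, n_mfcc) MFCC matrix."""
    y, sr = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T

def train_speaker_models(train_files: dict, n_components: int = 16) -> dict:
    """Fit one diagonal-covariance GMM per enrolled speaker.
    train_files maps speaker name -> list of enrollment wav paths."""
    models = {}
    for speaker, paths in train_files.items():
        feats = np.vstack([mfcc_features(p) for p in paths])
        models[speaker] = GaussianMixture(n_components=n_components,
                                          covariance_type="diag",
                                          max_iter=200).fit(feats)
    return models

def identify(models: dict, wav_path: str) -> str:
    """Return the enrolled speaker whose GMM best explains the test utterance."""
    feats = mfcc_features(wav_path)
    return max(models, key=lambda s: models[s].score(feats))
```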
5

Alkhatib, Bassel, and Mohammad Madian Waleed Kamal Eddin. "Voice Identification Using MFCC and Vector Quantization". Baghdad Science Journal 17, no. 3(Suppl.) (8.09.2020): 1019. http://dx.doi.org/10.21123/bsj.2020.17.3(suppl.).1019.

Abstract:
Speaker identification is one of the fundamental problems in speech processing and voice modeling. Its applications include authentication in critical security systems, where the accuracy of the selection matters, and large-scale voice recognition applications remain a major challenge. Quick search in a speaker database requires fast, modern techniques and relies on artificial intelligence to achieve the desired results from the system. Many efforts have been made to achieve this through the establishment of variable-based systems and the development of new methodologies for speaker identification. Speaker identification is the process of recognizing who is speaking using characteristics extracted from the speech waves, such as pitch, tone, and frequency. Speaker models are created and saved in the system environment and used to verify the identity claimed by people accessing the systems, which allows access to various services that are controlled by voice. Speaker identification involves two main parts: the first is feature extraction and the second is feature matching.
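To make the vector-quantization idea in the abstract above concrete, the following sketch builds one k-means codebook per speaker and matches a test utterance to the codebook with the lowest average distortion. It is an illustrative stand-in under assumed choices (scikit-learn k-means, MFCC input), not the authors' implementation.

```python
# Illustrative VQ speaker identification: one k-means codebook per speaker,
# matching by minimum average quantization distortion. Parameters are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def train_codebooks(speaker_features: dict, codebook_size: int = 32) -> dict:
    """speaker_features maps speaker -> (n_frames, n_dims) MFCC array."""
    return {spk: KMeans(n_clusters=codebook_size, n_init=10).fit(feats)
            for spk, feats in speaker_features.items()}

def avg_distortion(codebook: KMeans, feats: np.ndarray) -> float:
    """Mean distance from each frame to its nearest codeword."""
    distances = codebook.transform(feats)       # distances to every centroid
    return float(distances.min(axis=1).mean())

def identify(codebooks: dict, feats: np.ndarray) -> str:
    """Pick the speaker whose codebook quantizes the test frames with least distortion."""
    return min(codebooks, key=lambda spk: avg_distortion(codebooks[spk], feats))
```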
6

Dwijayanti, Suci, Alvio Yunita Putri, and Bhakti Yudho Suprapto. "Speaker Identification Using a Convolutional Neural Network". Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 1 (27.02.2022): 140–45. http://dx.doi.org/10.29207/resti.v6i1.3795.

Abstract:
Speech, a mode of communication between humans and machines, has various applications, including biometric systems for identifying people who have access to secure systems. Feature extraction is an important factor in speech recognition with high accuracy. Therefore, we implemented a spectrogram, which is a pictorial representation of speech in terms of raw features, to identify speakers. These features were input into a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary data samples from 78 speakers, each of whom uttered a number in Bahasa Indonesia. The proposed architecture, CNN-VGG-f, has a learning rate of 0.001, a batch size of 256, and 100 epochs. The results indicate that this architecture can generate a suitable model for speaker identification. A spectrogram was used to determine the best features for identifying the speakers. The proposed method exhibited an accuracy of 98.78%, which is significantly higher than the accuracies of the method involving Mel-frequency cepstral coefficients (MFCCs; 34.62%) and the combination of MFCCs and deltas (26.92%). Overall, CNN-VGG-f with the spectrogram can identify 77 speakers from the samples, validating the usefulness of the combination of spectrograms and CNNs in speech recognition applications.
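The abstract above describes feeding spectrograms to a CNN; the toy model below shows the general shape of such a classifier. It is a bare-bones PyTorch sketch under assumed layer sizes, not the CNN-VGG-f architecture evaluated in the paper.

```python
# Toy spectrogram-input CNN for speaker classification (PyTorch).
# Layer sizes and the input shape are illustrative assumptions, not the paper's model.
import torch
import torch.nn as nn

class SpeakerCNN(nn.Module):
    def __init__(self, n_speakers: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(32, n_speakers)
        )

    def forward(self, spectrogram: torch.Tensor) -> torch.Tensor:
        # spectrogram: (batch, 1, freq_bins, time_frames)
        return self.classifier(self.features(spectrogram))

# Example: class logits for a batch of 4 random "spectrograms", 78 enrolled speakers.
logits = SpeakerCNN(n_speakers=78)(torch.randn(4, 1, 128, 256))
```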
7

Khoma, Volodymyr, Yuriy Khoma, Vitalii Brydinskyi, and Alexander Konovalov. "Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library". Sensors 23, no. 4 (13.02.2023): 2082. http://dx.doi.org/10.3390/s23042082.

Abstract:
Diarization is an important task when working with audio data, as it solves the problem of dividing one analyzed call recording into several speech recordings, each of which belongs to one speaker. Diarization systems segment audio recordings by defining the time boundaries of utterances, and typically use unsupervised methods to group utterances belonging to individual speakers, but they do not answer the question "who is speaking?" On the other hand, there are biometric systems that identify individuals on the basis of their voices, but such systems are designed with the prerequisite that only one speaker is present in the analyzed audio recording. However, some applications involve the need to identify multiple speakers that interact freely in an audio recording. This paper proposes two architectures of speaker identification systems based on a combination of diarization and identification methods, which operate on the basis of segment-level or group-level classification. The open-source PyAnnote framework was used to develop the system. The performance of the speaker identification system was verified using the open-source AMI Corpus audio database, which contains 100 h of annotated and transcribed audio and video data. The research method consisted of four experiments to select the best-performing supervised diarization algorithms on the basis of PyAnnote. The first experiment investigated how the selection of the distance function between vector embeddings affects the reliability of identification of a speaker's utterance in a segment-level classification architecture. The second experiment examined the architecture of cluster-centroid (group-level) classification, i.e., the selection of the best clustering and classification methods. The third experiment investigated the impact of different segmentation algorithms on the accuracy of identifying speaker utterances, and the fourth examined embedding window sizes. Experimental results demonstrated that the group-level approach offered better identification results compared to the segment-level approach, while the latter had the advantage of real-time processing.
8

Kamiński, Kamil A., and Andrzej P. Dobrowolski. "Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features". Sensors 22, no. 23 (1.12.2022): 9370. http://dx.doi.org/10.3390/s22239370.

Abstract:
This article presents the Automatic Speaker Recognition System (ASR System), which successfully resolves problems such as identification within an open set of speakers and the verification of speakers in difficult recording conditions similar to telephone transmission conditions. The article provides complete information on the architecture of the various internal processing modules of the ASR System. The speaker recognition system proposed in the article has been compared closely to other competing systems, achieving improved speaker identification and verification results on a known certified voice dataset. The ASR System owes this to the dual use of genetic algorithms, both in the feature selection process and in the optimization of the system's internal parameters. This was also influenced by the proprietary feature generation and the corresponding classification process using Gaussian mixture models. This allowed the development of a system that makes an important contribution to the current state of the art in speaker recognition systems for telephone transmission applications with known speech coding standards.
9

Sarma, Mousmita, and Kandarpa Kumar Sarma. "Vowel Phoneme Segmentation for Speaker Identification Using an ANN-Based Framework". Journal of Intelligent Systems 22, no. 2 (1.06.2013): 111–30. http://dx.doi.org/10.1515/jisys-2012-0050.

Abstract:
Vowel phonemes are a part of any acoustic speech signal. Vowel sounds occur in speech more frequently and with higher energy. Therefore, the vowel phoneme can be used to extract different amounts of speaker-discriminative information in situations where the acoustic information is noise-corrupted. This article presents an approach to identifying a speaker using the vowel sound segmented out of words spoken by the speaker. The work uses a combined self-organizing map (SOM)- and probabilistic neural network (PNN)-based approach to segment the vowel phoneme. The segmented vowel is later used to identify the speaker of the word by matching the patterns with a learning vector quantization (LVQ)-based codebook. The LVQ codebook is prepared by taking features of clean vowel phonemes uttered by the male and female speakers to be identified. The proposed work formulates a framework for the design of a speaker-recognition model for the Assamese language, which is spoken by ∼3 million people in the Northeast Indian state of Assam. The experimental results show that the segmentation success rates obtained using the SOM-based technique are at least 7% higher than those of the discrete wavelet transform-based technique. This increase contributes to an improvement in the overall performance of speaker identification of ∼3% compared with earlier related works.
10

Nagaraja, B. G., and H. S. Jayanna. "Multilingual Speaker Identification by Combining Evidence from LPR and Multitaper MFCC". Journal of Intelligent Systems 22, no. 3 (1.09.2013): 241–51. http://dx.doi.org/10.1515/jisys-2013-0038.

Abstract:
In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficients (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification under the constraint of limited-data conditions is demonstrated. The LPR is derived from linear prediction analysis, and the LPRP is obtained by dividing the LPR by its Hilbert envelope. Sine-weighted cepstrum estimators (SWCE) with six tapers are considered for multitaper MFCC feature extraction. The Gaussian mixture model–universal background model is used for modeling each speaker for the different evidence. The evidence is then combined at the scoring level to improve the performance. The monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves the performance by nearly 8–10% compared with individual evidence.
11

Li, Qiang, and Yan Hong Liu. "SVM-GMM Based Speaker Identification". Advanced Materials Research 1044-1045 (October 2014): 1370–74. http://dx.doi.org/10.4028/www.scientific.net/amr.1044-1045.1370.

Abstract:
Although great success has been achieved in laboratory environments where the training data are sufficient and the surroundings are quiet, speaker identification (SI) in practical use still remains a challenge because of complicated environments. To tackle this challenge, a hybrid Gaussian mixture model-support vector machine (GMM-SVM) system is proposed in this paper. SVMs can do well with less data but are computationally expensive, while GMMs are computationally inexpensive but need more data to perform adequately. In this paper, the SVM and GMM run in parallel in both the training and testing phases, and their judgments are fused to make the final decision: the person with the largest score is identified as the true speaker. A universal background model (UBM) is used in the GMM to improve the recognition accuracy. The system is evaluated on part of the TIMIT database and on a Chinese database recorded by ourselves. Experiments have shown that the method proposed in this paper is effective, and the system has better performance and robustness than the baseline systems.
12

BENNANI, YOUNÈS. "MULTI-EXPERT AND HYBRID CONNECTIONIST APPROACH FOR PATTERN RECOGNITION: SPEAKER IDENTIFICATION TASK". International Journal of Neural Systems 05, no. 03 (September 1994): 207–16. http://dx.doi.org/10.1142/s0129065794000220.

Abstract:
This paper presents and evaluates a modular/hybrid connectionist system for speaker identification. Modularity has emerged as a powerful technique for reducing the complexity of connectionist systems, allowing a priori knowledge to be incorporated into their design. In problems where training data are scarce, such modular systems are likely to generalize significantly better than a monolithic connectionist system. In addition, modules are not restricted to be connectionist: hybrid systems, with e.g. Hidden Markov Models (HMMs), can be designed, combining the advantages of connectionist and non-connectionist approaches. Text independent speaker identification is an inherently complex task where the amount of training data is often limited. It thus provides an ideal domain to test the validity of the modular/hybrid connectionist approach. An architecture is developed in this paper which achieves this identification, based upon the cooperation of several connectionist modules, together with an HMM module. When tested on a population of 102 speakers extracted from the DARPA-TIMIT database, perfect identification was obtained. Overall, our recognition results are among the best for any text-independent speaker identification system handling this population size. In a specific comparison with a system based on multivariate auto-regressive models, the modular/hybrid connectionist approach was found to be significantly better in terms of both accuracy and speed. Our design also allows for easy incorporation of new speakers.
13

Bennani, Younès. "A Modular and Hybrid Connectionist System for Speaker Identification". Neural Computation 7, no. 4 (July 1995): 791–98. http://dx.doi.org/10.1162/neco.1995.7.4.791.

Abstract:
This paper presents and evaluates a modular/hybrid connectionist system for speaker identification. Modularity has emerged as a powerful technique for reducing the complexity of connectionist systems, and allowing a priori knowledge to be incorporated into their design. Text-independent speaker identification is an inherently complex task where the amount of training data is often limited. It thus provides an ideal domain to test the validity of the modular/hybrid connectionist approach. To achieve such identification, we develop, in this paper, an architecture based upon the cooperation of several connectionist modules, and a Hidden Markov Model module. When tested on a population of 102 speakers extracted from the DARPA-TIMIT database, perfect identification was obtained.
14

Lakshmi Prasanna P and Vanlalhriati C. "Is tonal language a problem for speaker identification (SPID)?" World Journal of Advanced Engineering Technology and Sciences 7, no. 2 (30.12.2022): 163–73. http://dx.doi.org/10.30574/wjaets.2022.7.2.0140.

Abstract:
The availability of numerous technologies has led to an increase in the usage of bioinformatics in recent years. Siri, Alexa, and other artificial intelligence systems assist us in our daily lives. Voice recognition systems are used to confirm an individual's identity based on particular elements retrieved from his or her voice. In this regard, the current study attempted to assess the proportion of speaker identification in tonal language speaking persons. The study included 20 participants in the age range from 20 to 40 years. All participants were given a few phrases to speak and were recorded. PRAAT software was used to analyze the obtained data. A vector was developed by using the first two formants, which was then utilized to calculate the percentage for speaker identification. From small sample size to bigger sample size, three groups were formed: A-5, B-10, and C-20 speakers. In a lower sample size, results showed a benchmark of 65% for vowel /i:/, which is better for SPID, 60% for /a:/, which is above chance level, and 45% for /u:/, which is below chance level. The authors stated that increasing the sample size has an influence on speaker identification.
15

Singh, Satyanand. "Forensic and Automatic Speaker Recognition System". International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (1.10.2018): 2804. http://dx.doi.org/10.11591/ijece.v8i5.pp2804-2811.

Abstract:
The current Automatic Speaker Recognition (ASR) system has emerged as an important medium of identity confirmation in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in criminological recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, a practice referred to as structured listening. An algorithm-based system has been developed for the recognition of forensic speakers by physics scientists and forensic linguists to reduce the probability of a contextual bias or pre-centric understanding of a reference model with respect to the validity of an unknown audio sample and any suspicious individual. Many researchers continue to develop automatic algorithms in signal processing and machine learning so that improved performance can effectively establish the speaker's identity, with the automatic system performing on par with human listeners. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical speaker patterns emerging for automatic technology in the last decade. I focus on many aspects of automatic speaker recognition (ASR) systems, including speaker-specific features, speaker models, standard assessment datasets, and performance metrics.
16

Vuppala, Anil Kumar, K. Sreenivasa Rao, and Saswat Chakrabarti. "Improved speaker identification in wireless environment". International Journal of Signal and Imaging Systems Engineering 6, no. 3 (2013): 130. http://dx.doi.org/10.1504/ijsise.2013.054789.
17

Alabbasi, Hesham A., Ali M. Jalil, and Fadhil S. Hasan. "Adaptive wavelet thresholding with robust hybrid features for text-independent speaker identification system". International Journal of Electrical and Computer Engineering (IJECE) 10, no. 5 (1.10.2020): 5208. http://dx.doi.org/10.11591/ijece.v10i5.pp5208-5216.

Abstract:
The robustness of speaker identification systems over additive noise channels is crucial for real-world applications. In speaker identification (SID) systems, the features extracted from each speech frame are an essential factor for building a reliable identification system. In clean environments the identification system works well; in noisy environments there is additive noise, which affects the system. To eliminate the problem of additive noise and to achieve high accuracy in the speaker identification system, a feature extraction algorithm based on speech enhancement and combined features is proposed. In this paper, a wavelet thresholding pre-processing stage and feature warping (FW) techniques are used with two combined features, power normalized cepstral coefficients (PNCC) and gammatone frequency cepstral coefficients (GFCC), to improve the identification system's robustness against different types of additive noise. A universal background model Gaussian mixture model (UBM-GMM) is used for feature matching between the claimed and actual speakers. The results showed a performance improvement of the proposed feature extraction algorithm over conventional features for most types of noise and different SNR ratios.
18

Madhusudhana Rao, T. V., Suribabu Korada, and Y. Srinivas. "Machine hearing system for teleconference authentication with effective speech analysis". International Journal of Knowledge-based and Intelligent Engineering Systems 25, no. 3 (10.11.2021): 357–65. http://dx.doi.org/10.3233/kes-210079.

Abstract:
For speaker identification in a teleconferencing scenario, it is important to determine whether a particular speaker is part of a conference and whether a particular speaker spoke during the meeting. The feature vectors are extracted using MFCC-SDC-LPC. The generalized gamma distribution is used to model the feature vectors. The K-means algorithm is utilized to cluster the speech data. The test speaker is verified as to whether he or she is a participant in the conference. A conference database is generated with 50 speakers. In order to test the model, 20 different speakers not belonging to the conference are also considered. The efficiency of the developed model is compared using various measures such as AR, FAR, and MDR, and the system is tested by varying the number of speakers in the conference. The results show that the model performs robustly.
19

Bageshree Pathak, Dr, and Shriyanti Kulkarni. "Speaker Recognition System for Home Security using Raspberry Pi and Python". International Journal of Engineering & Technology 7, no. 4.5 (22.09.2018): 95. http://dx.doi.org/10.14419/ijet.v7i4.5.20019.

Abstract:
The transfer of manual controls to machine controls is automation. Automation is the need of the hour. Home automation is automation of home systems to create smart homes. It includes security systems, appliance control and environment control. The increasing need for safety and security has brought biometric security systems to the forefront. Speech being unique and individualistic can be used for biometric identification. The proposed system is a prototype which can be fitted for speaker recognition for home security. The system will identify the registered speakers and will allow access to the recognized speaker. The system is implemented on Raspberry pi platform using Python language.
20

Najdet Nasret Coran, Ali, Prof Dr Hayri Sever, and Dr Murad Ahmed Mohammed Amin. "Acoustic data classification using random forest algorithm and feed forward neural network". International Journal of Engineering & Technology 9, no. 2 (1.07.2020): 582. http://dx.doi.org/10.14419/ijet.v9i2.30815.

Abstract:
Speaker identification systems are designed to recognize a speaker or a set of speakers from their acoustic characteristics. Many approaches have been proposed for performing the acoustic analysis of the speech signal; in general terms, these systems rely on time- and frequency-domain analysis. In this paper, acoustic information is extracted from the speech signals using a combination of MFCC and fundamental frequency methods. The results are classified using two different algorithms, a random forest and a feed-forward neural network. The FFNN classifier integrated with the acoustic model achieved a recognition accuracy of 91.4%. The CMU ARCTIC database is used in this work.
21

Özcan, Zübeyir, and Temel Kayıkçıoğlu. "Evaluating MFCC-based speaker identification systems with data envelopment analysis". Expert Systems with Applications 168 (April 2021): 114448. http://dx.doi.org/10.1016/j.eswa.2020.114448.
22

Mohammed, Rawia A., Nidaa F. Hassan, and Akbas E. Ali. "Arabic Speaker Identification System Using Multi Features". Engineering and Technology Journal 38, no. 5A (25.05.2020): 769–78. http://dx.doi.org/10.30684/etj.v38i5a.408.

Abstract:
The performance of speaker identification systems (SIS) has improved because of recent developments in speech processing methods; however, improvement is still required with regard to text-independent speaker identification in the Arabic language. In spite of tremendous progress in applied technology for SIS, it is limited to English and some other languages. This paper aims to design an efficient text-independent SIS for the Arabic language. The proposed system uses speech signal features for speaker identification purposes and includes two phases. The first phase is training, in which a corpus of reference data is built to serve as a reference for comparing and identifying the speaker in the second phase. The second phase is testing, which performs the identification of the speaker. In this system, the features are extracted as follows: Mel frequency cepstrum coefficients (MFCC), mathematical calculations of voice frequency, and the voice fundamental frequency. Machine learning classification techniques (K-nearest neighbors, Sequential Minimal Optimization, and Logistic Model Tree) are used in the classification process. The best classification technique is K-nearest neighbors, which gives the highest precision, 94.8%.
23

Sayoud, Halim, and Siham Ouamour. "Pertinent Prosodic Features for Speaker Identification by Voice". International Journal of Mobile Computing and Multimedia Communications 2, no. 2 (April 2010): 18–33. http://dx.doi.org/10.4018/jmcmc.2010040102.

Abstract:
Most existing speaker recognition systems use "state of the art" acoustic features. However, one can often recognize a speaker only by his or her prosodic features, especially the accent. For this reason, the authors investigate some pertinent prosodic features that can be combined with other classic acoustic features in order to improve the recognition accuracy. The authors have developed a new prosodic model using a modified LVQ (learning vector quantization) algorithm, called MLVQ (modified LVQ). This model is composed of three reduced prosodic features: the mean of the pitch, the original duration, and the low-frequency energy. Since these features are heterogeneous, a new optimized metric has been proposed, called the Optimized Distance for Heterogeneous Features (ODHEF). Speaker identification tests are done on an Arabic corpus because the NIST evaluations showed that speaker verification scores depend on the spoken language and that some of the worst scores were obtained for the Arabic language. Experimental results show good performance of the new prosodic approach.
24

Soloviev, Viktor, Oleg Rybalsky, Vadim Zhuravel, Alexander Shablya, and Evgeny Timko. "TAKING INTO ACCOUNT THE MULTIFACTORIAL CHARACTER OF VOICE CHARACTERISTICS IN THE PROBLEMS OF SPEAKER IDENTIFICATION". Journal of Automation and Information Sciences 5 (1.09.2021): 21–30. http://dx.doi.org/10.34229/1028-0979-2021-5-2.

Abstract:
When the most advanced speaker identification systems are tested on specialized databases, their minimum efficiency, estimated by the error probability at the point of intersection of the error curves, is only a few percent. However, many factors are known to affect the variability of the characteristics of a speaker's voice, each of which has its own influence, different from the others, on the results of identifying the speaker by voice characteristics. The complexity of creating and testing speaker identification systems lies in the need to quantitatively formalize a number of specific factors that affect the characteristics of the voice. The article discusses a proposed method for accounting for the variety of factors affecting the parameters of the speaker's voice characteristics, which provides the fundamental possibility of indirectly accounting for a practically unlimited number of them. According to this method, «atomic» structures are extracted from the speech signals, which depend on the totality of the main factors that affect the speaker identification process. With this method, all significant factors affecting the characteristics of the voice are indirectly taken into account at the level of these structures. Subsequent decisions are made on the combinatorial set of a huge number of these «atomic» structures. «Atomic» speech structures are understood as the spectra of any fragments of any vowel sounds extracted in a 20 ms time window. «Atomic» structures are selected automatically. The proposed method provides a rational consideration of the multifactorial influence of various parameters, since the spectra of these structures are influenced by all the main factors that characterize the individuality of the voice of a particular speaker. The decision on the identity of the voices of the speakers recorded on different phonograms is made on the basis of the combinatorics of the «atomic» spectra of vowel sounds in both phonograms. The method has shown high efficiency in the examination of phonograms of short duration.
25

Baroughi, Alireza Farrokh, Scott Craver, and Daniel Douglas. "Attacks on Speaker Identification Systems Constrained to Speech-to-Text Decoding". Electronic Imaging 2016, no. 8 (14.02.2016): 1–7. http://dx.doi.org/10.2352/issn.2470-1173.2016.8.mwsf-073.
26

Al-Hmouz, Ahmed, Khaled Daqrouq, Rami Al-Hmouz, and Jaafa Alghazo. "Feature Reduction Method for Speaker Identification Systems Using Particle Swarm Optimization". International Journal of Engineering and Technology 9, no. 3 (30.06.2017): 1714–23. http://dx.doi.org/10.21817/ijet/2017/v9i3/170903045.
27

Yamada, Makoto, Masashi Sugiyama, and Tomoko Matsui. "Semi-supervised speaker identification under covariate shift". Signal Processing 90, no. 8 (August 2010): 2353–61. http://dx.doi.org/10.1016/j.sigpro.2009.06.001.
28

Moreno, L. C., and P. B. Lopes. "The Voice Biometrics Based on Pitch Replication". International Journal for Innovation Education and Research 6, no. 10 (31.10.2018): 351–58. http://dx.doi.org/10.31686/ijier.vol6.iss10.1201.

Abstract:
Authentication and security in automated systems have become very necessary these days, and many techniques have been proposed towards this end. One of these alternatives is biometrics, in which human body characteristics are used to authenticate the system user. The objective of this article is to present a method of text-independent speaker identification through the replication of pitch characteristics. Pitch is an important speech feature and is used in a variety of applications, including voice biometrics. The proposed method of speaker identification is based on short segments of speech, namely three seconds for training and three seconds for speaker determination. From these segments pitch characteristics are extracted and used in the proposed replication method for identification of the speaker.
29

El-Shafai, Walid, Marwa A. Elsayed, Mohsen A. Rashwan, Moawad I. Dessouky, Adel S. El-Fishawy, Naglaa F. Soliman, Amel A. Alhussan, and Fathi E. Abd El-Samie. "Optical Ciphering Scheme for Cancellable Speaker Identification System". Computer Systems Science and Engineering 45, no. 1 (2023): 563–78. http://dx.doi.org/10.32604/csse.2023.024375.
30

Nassif, Ali Bou, Noha Alnazzawi, Ismail Shahin, Said A. Salloum, Noor Hindawi, Mohammed Lataifeh, and Ashraf Elnagar. "A Novel RBFNN-CNN Model for Speaker Identification in Stressful Talking Environments". Applied Sciences 12, no. 10 (11.05.2022): 4841. http://dx.doi.org/10.3390/app12104841.

Abstract:
Speaker identification systems perform almost ideally in neutral talking environments. However, these systems perform poorly in stressful talking environments. In this paper, we present an effective approach for enhancing the performance of speaker identification in stressful talking environments based on a novel radial basis function neural network-convolutional neural network (RBFNN-CNN) model. In this research, we applied our approach to two distinct speech databases: a local Arabic Emirati-accent dataset and a global English Speech Under Simulated and Actual Stress (SUSAS) corpus. To the best of our knowledge, this is the first work that addresses the use of an RBFNN-CNN model in speaker identification under stressful talking environments. Our speech identification models select the finest speech signal representation through the use of Mel-frequency cepstral coefficients (MFCCs) as a feature extraction method. A comparison among traditional classifiers such as support vector machine (SVM), multilayer perceptron (MLP), k-nearest neighbors algorithm (KNN) and deep learning models, such as convolutional neural network (CNN) and recurrent neural network (RNN), was conducted. The results of our experiments show that speaker identification performance in stressful environments based on the RBFNN-CNN model is higher than that with the classical and deep machine learning models.
31

Koteswara Rao, P. Rama, Sunitha Ravi, and Thotakura Haritha. "Purging of silence for robust speaker identification in colossal database". International Journal of Electrical and Computer Engineering (IJECE) 11, no. 4 (1.08.2021): 3084. http://dx.doi.org/10.11591/ijece.v11i4.pp3084-3092.

Abstract:
The aim of this work is to develop an effective speaker recognition system for large datasets under noisy environments. The important phases involved in typical identification systems are feature extraction, training, and testing. During the feature extraction phase, the speaker-specific information is processed based on the characteristics of the voice signal. Effective methods are proposed for silence removal in order to achieve accurate recognition under noisy environments. Pitch and pitch-strength parameters are extracted as distinct features from the input speech spectrum. Multilinear principal component analysis (MPCA) is utilized to minimize the complexity of the parameter matrix. Silence removal using the zero-crossing rate (ZCR) and an endpoint detection algorithm (EDA) is applied to the source utterance during the feature extraction phase. These features are then used in the classification phase, where the identification is made on the basis of support vector machine (SVM) algorithms. Forward looking schostic (FOLOS) is the efficient large-scale SVM algorithm employed for effective classification among speakers. The evaluation findings indicate that the proposed methods increase the performance for large amounts of data in noisy environments.
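As a simple illustration of the kind of energy/zero-crossing-rate silence purging mentioned in the abstract above, the sketch below keeps only frames that look like speech. The frame sizes and thresholds are arbitrary assumptions, and this is not the paper's ZCR/EDA procedure.

```python
# Hedged sketch of short-time energy + zero-crossing-rate silence removal.
# Frame length, hop size, and both thresholds are illustrative assumptions only.
import numpy as np

def remove_silence(signal: np.ndarray, frame_len: int = 400, hop: int = 160,
                   energy_thresh: float = 1e-4, zcr_thresh: float = 0.25) -> np.ndarray:
    """Keep frames with noticeable energy, or a high ZCR (to retain unvoiced sounds)."""
    kept = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = float(np.mean(frame ** 2))
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        if energy > energy_thresh or zcr > zcr_thresh:
            kept.append(frame)
    return np.concatenate(kept) if kept else signal
```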
32

Nidhyananthan, S. Selva, Prasad M., and Shantha Selva Kumari R. "Secure Speaker Recognition using BGN Cryptosystem with Prime Order Bilinear Group". International Journal of Information Security and Privacy 9, no. 4 (October 2015): 1–19. http://dx.doi.org/10.4018/ijisp.2015100101.

Abstract:
Speech, being a unique characteristic of an individual, is widely used in speaker verification and speaker identification tasks in applications such as authentication and surveillance, respectively. In this paper, a framework for a secure speaker recognition system using the BGN cryptosystem is presented, in which the system is able to perform the necessary operations without being able to observe the speech input provided by the user during the speaker recognition process. Secure speaker recognition makes use of secure multiparty computation (SMC) based on the homomorphic properties of the cryptosystem. Among the cryptosystems with homomorphic properties, BGN is preferable because it is partially doubly homomorphic, i.e., it can perform an arbitrary number of additions and only one multiplication. The main disadvantage of using the BGN cryptosystem, however, is its execution time. In the proposed system, the execution time is reduced by a factor of 12 by replacing the conventional composite-order group with a prime-order group. This leads to efficient secure speaker recognition.
33

LIANG, QIANHUI, and MIAOLIANG ZHU. "A COMBINED APPROACH TO TEXT-DEPENDENT SPEAKER IDENTIFICATION: COMPARISON WITH PURE NEURAL NET APPROACHES". Journal of Circuits, Systems and Computers 08, no. 02 (April 1998): 273–81. http://dx.doi.org/10.1142/s0218126698000110.

Abstract:
A novel approach to automatic speaker identification (ASI) is presented. Most of the present automatic speaker identification systems based on neural networks have no definite mechanisms to compensate for time distortions due to elocution. Such models have less precise information about the intraspeaker measure. The new combined approach uses both distortion-based and discriminant-based methods. The distortion-based and discriminant-based methods are dynamic time warping (DTW) and artificial neural network (ANN) respectively. This paper compares this new classifier with a pure neural net classifier for speaker identification. The performance of the combined classifier surpasses that of a pure ANN classifier for the conditions tested.
34

Clarkson, T. G., C. C. Christodoulou, Yelin Guan, D. Gorse, D. A. Romano-Critchley, and J. G. Taylor. "Speaker identification for security systems using reinforcement-trained pRAM neural network architectures". IEEE Transactions on Systems, Man and Cybernetics, Part C (Applications and Reviews) 31, no. 1 (2001): 65–76. http://dx.doi.org/10.1109/5326.923269.
35

Soliman, Naglaa F., Zhraa Mostfa, Fathi E. Abd El-Samie, and Mahmoud I. Abdalla. "Performance enhancement of speaker identification systems using speech encryption and cancelable features". International Journal of Speech Technology 20, no. 4 (10.10.2017): 977–1004. http://dx.doi.org/10.1007/s10772-017-9435-z.
36

Lutsenko, K., and K. Nikulin. "VOICE SPEAKER IDENTIFICATION AS ONE OF THE CURRENT BIOMETRIC METHODS OF IDENTIFICATION OF A PERSON". Theory and Practice of Forensic Science and Criminalistics 19, no. 1 (2.04.2020): 239–55. http://dx.doi.org/10.32353/khrife.1.2019.18.

Abstract:
The article deals with the most widespread biometric systems for identifying individuals, including recognition of the speaker's voice in video and sound recordings. The relevance of the topic of personal identification is due to the active informatization of modern society and the increase in flows of confidential information. The branches in which biometric technologies are used and their general characteristics are given, together with an overview of the identification groups that characterize the voice and a division of voice identification systems into the corresponding classes. The main advantages of voice biometrics are considered, such as the simplicity of system implementation; low cost (the lowest among all biometric methods); and the absence of any need for contact, since voice biometrics allows long-range verification, unlike other biometric technologies. An analysis of existing speaker recognition methods that identify a person by a combination of unique voice characteristics was carried out, determining their weak and strong points, on the basis of which the most appropriate method for the problem of text-independent recognition, namely the Gaussian mixture model, was chosen. A prerequisite for the development of speech technologies is a significant increase in computing capabilities and memory capacity combined with a significant reduction in the size of computer systems. The development of mathematical methods that make it possible to perform the necessary processing of an audio signal by extracting informative features from it should also be noted. It has been established that the development of information technologies and the range of practical applications that use voice recognition technologies make this area relevant for further theoretical and practical research.
37

XU, Li-Min. "Research on Robust Speaker Identification Based on Adaptive Histogram Equalization". Acta Automatica Sinica 34, no. 7 (2.03.2009): 752–59. http://dx.doi.org/10.3724/sp.j.1004.2008.00752.
38

Ghezaiel, Wajdi, Amel Ben Slimane, and Ezzedine Ben Braiek. "On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification". International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (1.12.2016): 2766. http://dx.doi.org/10.11591/ijece.v6i6.9844.

Abstract:
Usable speech is a novel concept for processing co-channel speech data. It is proposed to extract minimally corrupted speech that is considered useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable speech extraction method based on pitch information obtained from linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments that have only one detected pitch and remove the others. The detected usable speech was used as input for the speaker identification system. The system is evaluated on co-channel speech, and results show a significant improvement across various target-to-interferer ratios (TIR) for the speaker identification system.
39

Ghezaiel, Wajdi, Amel Ben Slimane, and Ezzedine Ben Braiek. "On Usable Speech Detection by Linear Multi-Scale Decomposition for Speaker Identification". International Journal of Electrical and Computer Engineering (IJECE) 6, no. 6 (1.12.2016): 2766. http://dx.doi.org/10.11591/ijece.v6i6.pp2766-2772.

Abstract:
Usable speech is a novel concept for processing co-channel speech data. It is proposed to extract minimally corrupted speech that is considered useful for various speech processing systems. In this paper, we are interested in co-channel speaker identification (SID). We employ a newly proposed usable speech extraction method based on pitch information obtained from linear multi-scale decomposition by the discrete wavelet transform. The idea is to retain the speech segments that have only one detected pitch and remove the others. The detected usable speech was used as input for the speaker identification system. The system is evaluated on co-channel speech, and results show a significant improvement across various target-to-interferer ratios (TIR) for the speaker identification system.
40

Shim, Hye-jin, Jee-weon Jung, Ju-ho Kim, and Ha-jin Yu. "Integrated Replay Spoofing-Aware Text-Independent Speaker Verification". Applied Sciences 10, no. 18 (10.09.2020): 6292. http://dx.doi.org/10.3390/app10186292.

Abstract:
A number of studies have successfully developed speaker verification or presentation attack detection systems. However, studies integrating the two tasks remain in the preliminary stages. In this paper, we propose two approaches for building an integrated system of speaker verification and presentation attack detection: an end-to-end monolithic approach and a back-end modular approach. The first approach simultaneously trains speaker identification, presentation attack detection, and the integrated system using multi-task learning with a common feature. However, through experiments, we hypothesize that the information required for performing speaker verification and presentation attack detection might differ, because speaker verification systems try to remove device-specific information from speaker embeddings, while presentation attack detection systems exploit such information. Therefore, we propose a back-end modular approach using a separate deep neural network (DNN) for speaker verification and presentation attack detection. This approach has three input components: two speaker embeddings (for enrollment and test, respectively) and the prediction of presentation attacks. Experiments are conducted using the ASVspoof 2017-v2 dataset, which includes official trials on the integration of speaker verification and presentation attack detection. The proposed back-end approach demonstrates a relative improvement of 21.77% in terms of the equal error rate for integrated trials compared to a conventional speaker verification system.
41

BIELOZOROVA, YANA, and KATERYNA YATSKO. "FEATURES OF THE IMPLEMENTATION OF THE SPEAKER IDENTIFICATION SOFTWARE SYSTEM". Computer Systems and Information Technologies, no. 4 (29.12.2022): 34–40. http://dx.doi.org/10.31891/csit-2022-4-5.

Abstract:
The paper proposes an architecture for the speaker identification software system in the form of class and sequence diagrams. The main criteria for assessing the accuracy of speaker identification were studied, and possible sources of loss of identification accuracy were identified, which can be taken into account when building a speaker identification system. A software system based on the proposed architecture and on previously developed identification algorithms and methods was created. The following conclusions can be drawn from the research: approaches to the construction of existing speaker identification systems are considered; the main criteria for assessing the accuracy of speaker identification were investigated and the main sources of accuracy loss during identification were identified; the structure of the speaker identification system is considered, taking into account the identified sources of accuracy loss; an architecture of the speaker identification system is proposed in UML in the form of class and sequence diagrams; and a software system was built that implements speech signal identification according to the methods and algorithm proposed in previous works. The software system uses a ranking method based on three different criteria: calculation of the proximity of the two-dimensional probability density function curves for the fundamental frequency and the location in the spectrum of three frequency ranges extracted from the speech recorded in the speech signal; calculation of the proximity of the probability density function curves for each of these features separately; and calculation of the degree of closeness of the absolute maxima of the formant spectra extracted from the speech recorded in the speech signal.
42

Manikandan, K., and E. Chandra. "Speaker identification analysis for SGMM with k-means and fuzzy C-means clustering using SVM statistical technique". International Journal of Knowledge-based and Intelligent Engineering Systems 25, no. 3 (10.11.2021): 309–14. http://dx.doi.org/10.3233/kes-210073.

Abstract:
Speaker identification takes the speech samples of known speakers and identifies the best match for the input model. The SGMFC method is the combination of the sub-Gaussian mixture model (SGMM) with Mel-frequency cepstral coefficients (MFCC) for feature extraction. The SGMFC method minimizes the error rate, memory footprint, and computational throughput needs of a medium-vocabulary speaker identification system intended for use on portable devices or otherwise. Fuzzy C-means and k-means clustering are used in the SGMM method to attain improved efficiency, and their outcomes are compared with parameters such as precision, sensitivity, and specificity.
43

Shahin, Ismail. "Speaker identification in emotional talking environments based on CSPHMM2s". Engineering Applications of Artificial Intelligence 26, no. 7 (August 2013): 1652–59. http://dx.doi.org/10.1016/j.engappai.2013.03.013.
44

Lambamo, Wondimu, Ramasamy Srinivasagan, and Worku Jifara. "Analyzing Noise Robustness of Cochleogram and Mel Spectrogram Features in Deep Learning Based Speaker Recognition". Applied Sciences 13, no. 1 (31.12.2022): 569. http://dx.doi.org/10.3390/app13010569.

Abstract:
The performance of speaker recognition systems is very good on datasets without noise and mismatch. However, the performance degrades with environmental noise, channel variation, and physical and behavioral changes in the speaker. The type of speaker-related features plays a crucial role in improving the performance of speaker recognition systems. Gammatone frequency cepstral coefficient (GFCC) features have been widely used to develop robust speaker recognition systems with conventional machine learning, achieving better performance than Mel frequency cepstral coefficient (MFCC) features in noisy conditions. Recently, deep learning models have shown better performance in speaker recognition than conventional machine learning. Most previous deep learning-based speaker recognition models have used Mel spectrograms and similar inputs rather than handcrafted features like MFCC and GFCC. However, the performance of Mel spectrogram features degrades at high noise ratios and under mismatch in the utterances. Similar to the Mel spectrogram, the cochleogram is another important feature for deep learning speaker recognition models. Like GFCC features, the cochleogram represents utterances on the equivalent rectangular bandwidth (ERB) scale, which is important in noisy conditions. However, no studies have analyzed the noise robustness of cochleogram and Mel spectrogram features in speaker recognition, and only limited studies have used the cochleogram to develop speech-based models in noisy and mismatched conditions using deep learning. In this study, an analysis of the noise robustness of cochleogram and Mel spectrogram features in speaker recognition using a deep learning model is conducted at signal-to-noise ratio (SNR) levels from −5 dB to 20 dB. Experiments are conducted on the VoxCeleb1 and noise-added VoxCeleb1 datasets using basic 2D CNN, ResNet-50, VGG-16, ECAPA-TDNN, and TitaNet model architectures. The speaker identification and verification performance of both cochleogram and Mel spectrogram features is evaluated. The results show that the cochleogram performs better than the Mel spectrogram in both speaker identification and verification under noisy and mismatched conditions.
45

Van Dommelen, Wim A. "Identification of Twins by Spoken Syllables". Perceptual and Motor Skills 92, no. 1 (February 2001): 8–10. http://dx.doi.org/10.2466/pms.2001.92.1.8.

Abstract:
It is shown that the results of an experiment on speaker identification described by Whiteside and Rixon in 2000 seem to be contradictory and inconclusive. To investigate whether the experiment allows reliable conclusions, re-evaluation of their data using multiple regression techniques is proposed.
46

Ding, Hui, Zhen-Min Tang, Li-Hua Wei, and Yan-Ping Li. "A study on speaker identification based on weighted LS-SVM". Automatic Control and Computer Sciences 43, no. 6 (December 2009): 328–35. http://dx.doi.org/10.3103/s0146411609060066.
47

Hussien, Emad Ahmed, Mohannad Abid Shehab Ahmed, and Haithem Abd Al-Raheem Taha. "Speech Recognition using Wavelets and Improved SVM". Wasit Journal of Engineering Sciences 1, no. 2 (1.09.2013): 55–78. http://dx.doi.org/10.31185/ejuow.vol1.iss2.13.

Abstract:
Speaker recognition (identification/verification) is the computing task of validating a user's claimed identity using speaker-specific information included in speech waves; that is, it enables access control of various services by voice. Discrete wavelet transform (DWT)-based systems for speaker recognition have shown robust results for several years and are widely used in speaker recognition applications. This paper describes a text-independent speaker recognition system that uses the discrete wavelet transform (DWT) for feature extraction and a kernel support vector machine (SVM) as the classification tool, taking the decision through a simplified-class support vector machine approach. The proposed SVM approach can convert local Euclidean distances between frame vectors to angles by projecting these -dimensional vectors together, and obtain the minimum global distance from the non-linearly aligned speech path in order to address audio classification and, hence, sound recognition. The DWT of each frame of the spoken word is used to extract the main features as data code vectors; next, these data are normalized using the normalized power algorithm, which reduces the number of feature vector coefficients; then the data are scaled and tested against the stored training spoken words to achieve the speaker identification task. The DWT also gives a fixed amount of data that can be used modestly by the SVM. Finally, the proposed method is trained and tested on a very large database, with results limited to ten speakers (5 males and 5 females) and words of at most 17 phonemes; its performance gives accurate and stable results, which raises the algorithm's efficiency and reduces the execution time, with 97% overall accuracy.
48

PAL, AMITA, SMARAJIT BOSE, GOPAL K. BASAK, and AMITAVA MUKHOPADHYAY. "SPEAKER IDENTIFICATION BY AGGREGATING GAUSSIAN MIXTURE MODELS (GMMs) BASED ON UNCORRELATED MFCC-DERIVED FEATURES". International Journal of Pattern Recognition and Artificial Intelligence 28, no. 04 (June 2014): 1456006. http://dx.doi.org/10.1142/s0218001414560060.

Abstract:
For solving speaker identification problems, the approach proposed by Reynolds [IEEE Signal Process. Lett.2 (1995) 46–48], using Gaussian Mixture Models (GMMs) based on Mel Frequency Cepstral Coefficients (MFCCs) as features, is one of the most effective available in the literature. The use of GMMs for modeling speaker identity is motivated by the interpretation that the Gaussian components represent some general speaker-dependent spectral shapes, and also by the capability of Gaussian mixtures to model arbitrary densities. In this work, we have initially illustrated, with the help of a new bilingual speech corpus, how the well-known principal component transformation, in conjunction with the principle of classifier combination can be used to enhance the performance of the MFCC-GMM speaker recognition systems significantly. Subsequently, we have emphatically and rigorously established the same using the benchmark speech corpus NTIMIT. A significant outcome of this work is that the proposed approach has the potential to enhance the performance of any speaker recognition system based on correlated features.
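The abstract above combines a principal component transformation with MFCC-based GMM classifiers; the snippet below sketches that general flow (decorrelate the features with PCA, then fit one GMM per speaker). The library calls and parameters are assumptions for illustration and do not reproduce the aggregation scheme proposed in the paper.

```python
# Sketch: PCA-decorrelated MFCC features followed by per-speaker GMM scoring.
# scikit-learn usage and all parameter values are illustrative assumptions.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

def fit_pca_and_gmms(speaker_mfccs: dict, n_pca: int = 12, n_mix: int = 16):
    """speaker_mfccs maps speaker -> (n_frames, n_dims) MFCC array."""
    pca = PCA(n_components=n_pca).fit(np.vstack(list(speaker_mfccs.values())))
    models = {spk: GaussianMixture(n_components=n_mix, covariance_type="diag")
                      .fit(pca.transform(feats))
              for spk, feats in speaker_mfccs.items()}
    return pca, models

def identify(pca: PCA, models: dict, mfcc: np.ndarray) -> str:
    """Score the decorrelated test frames against every speaker's GMM."""
    transformed = pca.transform(mfcc)
    return max(models, key=lambda spk: models[spk].score(transformed))
```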
49

Nayana, P. K., Dominic Mathew, and Abraham Thomas. "Comparison of Text Independent Speaker Identification Systems using GMM and i-Vector Methods". Procedia Computer Science 115 (2017): 47–54. http://dx.doi.org/10.1016/j.procs.2017.09.075.
50

Xu, Limin, and Zhenmin Tang. "Speaker identification using multi-step clustering algorithm with transformation-based GMM". Automatic Control and Computer Sciences 41, no. 4 (August 2007): 224–31. http://dx.doi.org/10.3103/s0146411607040062.