A ready-made bibliography on "Speaker identification systems"

Create an accurate reference in APA, MLA, Chicago, Harvard, and many other citation styles


Browse lists of current articles, books, dissertations, abstracts, and other scholarly sources on "Speaker identification systems".

An "Add to bibliography" button sits next to every work in the bibliography. Use it, and we will automatically create a bibliographic reference to the chosen work in whichever citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the scholarly publication as a .pdf and read its abstract online, when those details are available in the metadata.

Journal articles on "Speaker identification systems"

1. Jayanna, H. S., and B. G. Nagaraja. "An Experimental Comparison of Modeling Techniques and Combination of Speaker-Specific Information from Different Languages for Multilingual Speaker Identification". Journal of Intelligent Systems 25, no. 4 (October 1, 2016): 529–38. http://dx.doi.org/10.1515/jisys-2014-0128.

Abstract:
Most state-of-the-art speaker identification systems work in a monolingual (preferably English) scenario, so English-speaking countries can use such systems efficiently for speaker recognition. However, many countries, including India, are multilingual in nature, and people in such countries habitually speak multiple languages. An existing speaker identification system may perform poorly if a speaker's training and test data are in different languages; developing a robust multilingual speaker identification system is therefore an open issue in many countries. In this work, an experimental evaluation of modeling techniques for multilingual speaker identification is presented, covering self-organizing map (SOM), learning vector quantization (LVQ), and Gaussian mixture model-universal background model (GMM-UBM) classifiers. Monolingual and crosslingual speaker identification studies are conducted using 50 speakers from our own database. The experimental results show that the GMM-UBM classifier gives better identification performance than the SOM and LVQ classifiers. Furthermore, we propose a combination of speaker-specific information from different languages for crosslingual speaker identification, and observe that the combined feature gives better performance in all the crosslingual speaker identification experiments.
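The GMM-UBM scoring this abstract refers to can be sketched generically: each speaker model and the universal background model are diagonal-covariance Gaussian mixtures, and a test utterance goes to the speaker whose model's average log-likelihood, normalised by the UBM's, is highest. The toy one-component models and 2-D "features" below are illustrative assumptions, not the paper's configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def logsumexp(a, axis):
    # Numerically stable log-sum-exp along the given axis.
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis) + np.log(np.sum(np.exp(a - m), axis=axis))

def gmm_avg_loglik(X, w, mu, var):
    """Mean per-frame log-likelihood of frames X (N, D) under a diagonal-covariance
    GMM with weights w (K,), means mu (K, D), variances var (K, D)."""
    d = X[:, None, :] - mu[None, :, :]
    log_comp = -0.5 * (np.sum(d * d / var, axis=2) + np.sum(np.log(2 * np.pi * var), axis=1))
    return logsumexp(log_comp + np.log(w), axis=1).mean()

def identify(X, speaker_models, ubm):
    """GMM-UBM scoring: subtract the UBM log-likelihood from each speaker's
    score and return the index of the best-scoring speaker."""
    ubm_ll = gmm_avg_loglik(X, *ubm)
    scores = [gmm_avg_loglik(X, *m) - ubm_ll for m in speaker_models]
    return int(np.argmax(scores))

# Toy demo: two single-component "speaker models" in a 2-D feature space.
spk0 = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
spk1 = (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]]))
ubm = (np.array([0.5, 0.5]), np.array([[0.0, 0.0], [5.0, 5.0]]),
       np.array([[1.0, 1.0], [1.0, 1.0]]))
X_test = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(200, 2))  # frames "from" speaker 0
print(identify(X_test, [spk0, spk1], ubm))  # → 0
```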
2. Shah, Shahid Munir, Muhammad Moinuddin, and Rizwan Ahmed Khan. "A Robust Approach for Speaker Identification Using Dialect Information". Applied Computational Intelligence and Soft Computing 2022 (March 7, 2022): 1–16. http://dx.doi.org/10.1155/2022/4980920.

Abstract:
The present research is an effort to enhance the performance of voice processing systems, in our case the speaker identification system (SIS), by addressing the variability caused by the dialectal variations of a language. We present an effective solution for reducing dialect-related variability in voice processing systems. The proposed method reduces the system's complexity by shrinking the search space during the testing stage of speaker identification: the speaker is searched among the speakers of the identified dialect rather than among all the speakers seen in system training. The study is conducted on the Pashto language, with voice samples collected from native Pashto speakers in regions of Pakistan and Afghanistan where Pashto is spoken with different dialectal variations. Speaker identification is achieved with a novel hierarchical framework that works in two steps. In the first step, the speaker's dialect is identified; for automated dialect identification, spectral and prosodic features are used in conjunction with a Gaussian mixture model (GMM). In the second step, the speaker is identified using a multilayer perceptron (MLP)-based speaker identification system, which receives aggregated input from the first step, i.e., the identified dialect along with prosodic and spectral features. The robustness of the proposed SIS is compared with traditional state-of-the-art methods from the literature. The results show that the proposed framework is better in terms of average speaker recognition accuracy (84.5% identification accuracy) and consumes 39% less time for speaker identification.
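The two-step hierarchical framework described above can be sketched generically: a dialect classifier first narrows the search space, and only speakers of the identified dialect are scored. The classifier and scorer below are toy stand-ins (plain functions over a scalar "utterance") for the paper's GMM dialect identifier and MLP speaker model.

```python
def hierarchical_identify(x, dialect_classifier, speakers_by_dialect, speaker_scorer):
    """Two-step identification: pick the dialect first, then search only
    among that dialect's enrolled speakers (smaller search space)."""
    dialect = dialect_classifier(x)
    candidates = speakers_by_dialect[dialect]
    return max(candidates, key=lambda s: speaker_scorer(x, s))

# Toy stand-ins: utterances are scalars, speakers are reference points.
speakers_by_dialect = {"north": ["s1", "s2"], "south": ["s3", "s4"]}
ref = {"s1": 1.0, "s2": 2.0, "s3": 7.0, "s4": 9.0}
dialect_of = lambda x: "south" if x > 5 else "north"   # dialect classifier stand-in
score = lambda x, s: -abs(x - ref[s])                  # speaker scorer stand-in

print(hierarchical_identify(8.8, dialect_of, speakers_by_dialect, score))  # → s4
```

Note that only two speaker scores are computed per query instead of four, which is the source of the reported runtime saving.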
3. Singh, Mahesh K., P. Mohana Satya, Vella Satyanarayana, and Sridevi Gamini. "Speaker Recognition Assessment in a Continuous System for Speaker Identification". International Journal of Electrical and Electronics Research 10, no. 4 (December 30, 2022): 862–67. http://dx.doi.org/10.37391/ijeer.100418.

Abstract:
This article focuses on recognizing speakers in multi-speaker speech. Every conference, talk, or discussion involves several speakers, and this type of speech raises its own problems and processing stages. Challenges include impurities unique to the surroundings, the number of speakers involved, speaker distance, microphone equipment, etc. Beyond addressing these hurdles in real time, the processing of multi-speaker speech is itself difficult. The common sequential operations in multi-speaker speech recognition are identifying speech segments, separating the speaking segments, constructing clusters of similar segments, and finally recognizing the speaker from these segments. All linked phases of the speech recognition process are discussed with relevant methodologies in this article, together with the common metrics and methods. The paper examines the speech recognition algorithm at its different stages: the voice recognition system is built through phases such as voice filtering, speaker segmentation, speaker idolization, and recognition of the speaker, evaluated with 20 speakers.
4. EhKan, Phaklen, Timothy Allen, and Steven F. Quigley. "FPGA Implementation for GMM-Based Speaker Identification". International Journal of Reconfigurable Computing 2011 (2011): 1–8. http://dx.doi.org/10.1155/2011/420369.

Abstract:
In today's society, highly accurate personal identification systems are required. Passwords or PINs can be forgotten or forged and are no longer considered to offer a high level of security. The use of biological features, biometrics, is becoming widely accepted as the next level for security systems. Biometric-based speaker identification is a method of identifying persons from their voice. Speaker-specific characteristics exist in speech signals because different speakers have different vocal-tract resonances. These differences can be exploited by extracting feature vectors such as Mel-Frequency Cepstral Coefficients (MFCCs) from the speech signal. A well-known statistical modelling process, the Gaussian Mixture Model (GMM), then models the distribution of each speaker's MFCCs in a multidimensional acoustic space. The GMM-based speaker identification system has features that make it promising for hardware acceleration. This paper describes a hardware implementation of the classification stage of a text-independent GMM-based speaker identification system. The aim was to produce a system that can perform simultaneous identification of large numbers of voice streams in real time, with important potential applications in security and automated call centres. A speedup factor of ninety was achieved compared to a software implementation on a standard PC.
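The MFCC front end named in the abstract (framing, windowing, power spectrum, mel filterbank, log, DCT) can be sketched in plain NumPy. The parameter choices below (26 mel bands, 13 coefficients, 512-point FFT) are common defaults, assumed for illustration, not the paper's FPGA configuration.

```python
import numpy as np

def mfcc(signal, sr=16000, n_fft=512, hop=160, n_mels=26, n_ceps=13):
    """Minimal MFCC extractor (illustrative, not library-grade)."""
    # 1. Frame the signal and apply a Hamming window.
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(n_fft)
    # 2. Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2
    # 3. Triangular mel filterbank.
    hz_to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    log_mel = np.log(power @ fbank.T + 1e-10)
    # 4. DCT-II to decorrelate; keep the first n_ceps coefficients.
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.arange(n_ceps)[:, None] * (2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000.0)  # 1 s of a 440 Hz tone
feats = mfcc(sig)
print(feats.shape)  # one 13-dimensional vector per frame
```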
5. Alkhatib, Bassel, and Mohammad Madian Waleed Kamal Eddin. "Voice Identification Using MFCC and Vector Quantization". Baghdad Science Journal 17, no. 3 (Suppl.) (September 8, 2020): 1019. http://dx.doi.org/10.21123/bsj.2020.17.3(suppl.).1019.

Abstract:
Speaker identification is one of the fundamental problems in speech processing and voice modeling, with applications that include authentication in critical security systems, where accuracy of selection matters. Large-scale voice recognition applications are a major challenge: quick search in a speaker database requires fast, modern techniques and relies on artificial intelligence to achieve the desired results. Many efforts have been made towards this through variable-based systems and new methodologies for speaker identification. Speaker identification is the process of recognizing who is speaking using characteristics extracted from the speech waveform, such as pitch, tone, and frequency. Speaker models are created, saved in the system environment, and used to verify the identity claimed by people accessing the systems, which grants access to various voice-controlled services. Speaker identification involves two main parts: the first is feature extraction and the second is feature matching.
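The two parts named above, feature extraction and feature matching, are commonly realised in VQ-based systems by training one codebook per speaker and matching by minimum average quantisation distortion. A k-means sketch under that assumption (random vectors stand in for real MFCC frames):

```python
import numpy as np

def train_codebook(features, k=8, iters=20, seed=0):
    """Build a VQ codebook for one speaker with plain k-means."""
    rng = np.random.default_rng(seed)
    codebook = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
        assign = d.argmin(axis=1)
        for j in range(k):
            if np.any(assign == j):                      # skip empty cells
                codebook[j] = features[assign == j].mean(axis=0)
    return codebook

def avg_distortion(features, codebook):
    # Mean distance from each frame to its nearest codeword.
    d = np.linalg.norm(features[:, None] - codebook[None], axis=2)
    return d.min(axis=1).mean()

def identify(features, codebooks):
    """Feature matching: pick the speaker whose codebook quantises the
    test frames with the smallest average distortion."""
    return int(np.argmin([avg_distortion(features, cb) for cb in codebooks]))

rng = np.random.default_rng(1)
spk_a = rng.normal(0.0, 1.0, (300, 12))   # stand-in for speaker A's MFCC frames
spk_b = rng.normal(4.0, 1.0, (300, 12))   # stand-in for speaker B's MFCC frames
books = [train_codebook(spk_a), train_codebook(spk_b)]
print(identify(rng.normal(4.0, 1.0, (100, 12)), books))  # → 1
```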
6. Dwijayanti, Suci, Alvio Yunita Putri, and Bhakti Yudho Suprapto. "Speaker Identification Using a Convolutional Neural Network". Jurnal RESTI (Rekayasa Sistem dan Teknologi Informasi) 6, no. 1 (February 27, 2022): 140–45. http://dx.doi.org/10.29207/resti.v6i1.3795.

Abstract:
Speech, a mode of communication between humans and machines, has various applications, including biometric systems for identifying people who have access to secure systems. Feature extraction is an important factor in achieving high-accuracy speech recognition. We therefore used spectrograms, pictorial representations of speech in terms of raw features, to identify speakers. These features were input to a convolutional neural network (CNN), and a CNN-visual geometry group (CNN-VGG) architecture was used to recognize the speakers. We used 780 primary samples from 78 speakers, each uttering a number in Bahasa Indonesia. The proposed architecture, CNN-VGG-f, uses a learning rate of 0.001, a batch size of 256, and 100 epochs. The results indicate that this architecture can generate a suitable model for speaker identification. A spectrogram was used to determine the best features for identifying the speakers. The proposed method achieved an accuracy of 98.78%, significantly higher than the accuracies obtained with Mel-frequency cepstral coefficients (MFCCs; 34.62%) or the combination of MFCCs and deltas (26.92%). Overall, CNN-VGG-f with the spectrogram can identify 77 of the 78 speakers from the samples, validating the usefulness of combining spectrograms and CNNs in speech recognition applications.
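The spectrogram "image" used as CNN input can be sketched as a log-magnitude short-time Fourier transform; the window and hop sizes below are illustrative assumptions, not the paper's settings.

```python
import numpy as np

def log_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude spectrogram: the 2-D (time x frequency) raw feature
    that is fed to the CNN as an image."""
    n_frames = 1 + (len(signal) - n_fft) // hop
    idx = np.arange(n_fft)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hanning(n_fft)          # windowed frames
    return np.log1p(np.abs(np.fft.rfft(frames, axis=1)))

sig = np.sin(2 * np.pi * 1000 * np.arange(8000) / 8000.0)  # 1 kHz tone at 8 kHz
S = log_spectrogram(sig)
print(S.shape)   # (frames, frequency bins)
# The energy concentrates in the bin for 1000 Hz: 1000 / (8000/256) = bin 32.
print(int(S.mean(axis=0).argmax()))  # → 32
```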
7. Khoma, Volodymyr, Yuriy Khoma, Vitalii Brydinskyi, and Alexander Konovalov. "Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library". Sensors 23, no. 4 (February 13, 2023): 2082. http://dx.doi.org/10.3390/s23042082.

Abstract:
Diarization is an important task when working with audio data, as it divides one analyzed call recording into several speech recordings, each of which belongs to one speaker. Diarization systems segment audio recordings by defining the time boundaries of utterances, and typically use unsupervised methods to group utterances belonging to individual speakers, but they do not answer the question "who is speaking?" On the other hand, biometric systems identify individuals on the basis of their voices, but they are designed under the prerequisite that only one speaker is present in the analyzed recording. Some applications, however, involve identifying multiple speakers who interact freely in an audio recording. This paper proposes two architectures of speaker identification systems based on a combination of diarization and identification methods, operating on the basis of segment-level or group-level classification. The open-source PyAnnote framework was used to develop the system, and its performance was verified on the open-source AMI Corpus, which contains 100 h of annotated and transcribed audio and video data. The research method consisted of four experiments to select the best-performing supervised diarization algorithms on the basis of PyAnnote. The first experiment investigated how the choice of distance function between vector embeddings affects the reliability of identifying a speaker's utterance in a segment-level classification architecture. The second examined the cluster-centroid (group-level) classification architecture, i.e., the selection of the best clustering and classification methods. The third investigated the impact of different segmentation algorithms on the accuracy of identifying speaker utterances, and the fourth examined embedding window sizes. Experimental results demonstrated that the group-level approach offered better identification results than the segment-level approach, while the latter had the advantage of real-time processing.
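The segment-level classification architecture, and the distance-function choice examined in the first experiment, reduce to comparing each diarized segment's embedding with enrolled reference embeddings. A cosine-distance sketch, with made-up 3-dimensional vectors standing in for real PyAnnote embeddings:

```python
import numpy as np

def cosine_dist(a, b):
    # 1 - cosine similarity; smaller means more alike.
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def label_segments(seg_embeddings, enrolled):
    """Segment-level classification: assign each diarized segment to the
    enrolled speaker whose reference embedding is nearest in cosine distance."""
    names = list(enrolled)
    labels = []
    for e in seg_embeddings:
        d = [cosine_dist(e, enrolled[n]) for n in names]
        labels.append(names[int(np.argmin(d))])
    return labels

# Hypothetical enrolled references and two segment embeddings.
enrolled = {"alice": np.array([1.0, 0.0, 0.0]), "bob": np.array([0.0, 1.0, 0.0])}
segs = [np.array([0.9, 0.1, 0.0]), np.array([0.2, 1.1, 0.1])]
print(label_segments(segs, enrolled))  # → ['alice', 'bob']
```

The group-level variant would first average the embeddings within each diarization cluster and classify the centroid once per cluster instead of once per segment.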
8. Kamiński, Kamil A., and Andrzej P. Dobrowolski. "Automatic Speaker Recognition System Based on Gaussian Mixture Models, Cepstral Analysis, and Genetic Selection of Distinctive Features". Sensors 22, no. 23 (December 1, 2022): 9370. http://dx.doi.org/10.3390/s22239370.

Abstract:
This article presents the Automatic Speaker Recognition System (ASR System), which successfully resolves problems such as identification within an open set of speakers and the verification of speakers in difficult recording conditions similar to telephone transmission. The article provides complete information on the architecture of the various internal processing modules of the ASR System. The proposed speaker recognition system has been compared closely with competing systems, achieving improved speaker identification and verification results on a known, certified voice dataset. The ASR System owes this to the dual use of genetic algorithms, both in the feature selection process and in the optimization of the system's internal parameters, as well as to the proprietary feature generation and the corresponding classification process using Gaussian mixture models. This allowed the development of a system that makes an important contribution to the current state of the art in speaker recognition for telephone transmission applications with known speech coding standards.
9. Sarma, Mousmita, and Kandarpa Kumar Sarma. "Vowel Phoneme Segmentation for Speaker Identification Using an ANN-Based Framework". Journal of Intelligent Systems 22, no. 2 (June 1, 2013): 111–30. http://dx.doi.org/10.1515/jisys-2012-0050.

Abstract:
Vowel phonemes are a part of any acoustic speech signal. Vowel sounds occur in speech more frequently and with higher energy; therefore, vowel phonemes can be used to extract speaker-discriminative information in situations where the acoustic information is noise-corrupted. This article presents an approach to identifying a speaker using the vowel sound segmented out of words spoken by that speaker. The work uses a combined self-organizing map (SOM)- and probabilistic neural network (PNN)-based approach to segment the vowel phoneme. The segmented vowel is then used to identify the speaker of the word by matching the patterns against a learning vector quantization (LVQ)-based codebook, prepared from features of clean vowel phonemes uttered by the male and female speakers to be identified. The proposed work formulates a framework for a speaker-recognition model of the Assamese language, which is spoken by ∼3 million people in the Northeast Indian state of Assam. The experimental results show that the segmentation success rates obtained with the SOM-based technique are at least 7% higher than those of the discrete wavelet transform-based technique, which improves the overall speaker identification performance by ∼3% compared with earlier related works.
10. Nagaraja, B. G., and H. S. Jayanna. "Multilingual Speaker Identification by Combining Evidence from LPR and Multitaper MFCC". Journal of Intelligent Systems 22, no. 3 (September 1, 2013): 241–51. http://dx.doi.org/10.1515/jisys-2013-0038.

Abstract:
In this work, the significance of combining the evidence from multitaper mel-frequency cepstral coefficients (MFCC), linear prediction residual (LPR), and linear prediction residual phase (LPRP) features for multilingual speaker identification with the constraint of limited data condition is demonstrated. The LPR is derived from linear prediction analysis, and LPRP is obtained by dividing the LPR using its Hilbert envelope. The sine-weighted cepstrum estimators (SWCE) with six tapers are considered for multitaper MFCC feature extraction. The Gaussian mixture model–universal background model is used for modeling each speaker for different evidence. The evidence is then combined at scoring level to improve the performance. The monolingual, crosslingual, and multilingual speaker identification studies were conducted using 30 randomly selected speakers from the IITG multivariability speaker recognition database. The experimental results show that the combined evidence improves the performance by nearly 8–10% compared with individual evidence.

Doctoral dissertations on "Speaker identification systems"

1. Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features". Griffith University, School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Abstract:
The speech signal is primarily meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task, i.e., finding the identity of a person from his/her speech among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers, ranging from high-level cues such as the semantics and linguistics of the speech to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers, but it is rarely used in current speaker recognition systems because it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. The thesis illustrates, through a text-independent speaker identification system, the reasonable performance of cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the unvoiced half, and the problem is how to account for pitch information for unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). To use pitch information for speaker recognition, these problems must be overcome: we need a method that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of estimation errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
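The MACV idea described above can be sketched directly: take the normalized autocorrelation of a frame, restrict it to the plausible pitch-lag range, and keep the largest values as features, which exist for voiced and unvoiced frames alike. The number of peaks and the lag range below are illustrative choices, not the thesis's exact settings.

```python
import numpy as np

def macv_features(frame, sr=16000, fmin=60, fmax=400, n_peaks=3):
    """Maximum autocorrelation value (MACV) features: the n_peaks largest
    normalized-autocorrelation values within the candidate pitch-lag range."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]  # lags >= 0
    ac = ac / (ac[0] + 1e-12)                  # normalize so ac[0] == 1
    lo, hi = sr // fmax, sr // fmin            # lags for 400 Hz .. 60 Hz pitch
    return np.sort(ac[lo:hi])[::-1][:n_peaks]

sr = 16000
t = np.arange(400) / sr
voiced = np.sin(2 * np.pi * 200 * t)           # periodic frame, 200 Hz pitch
rng = np.random.default_rng(0)
unvoiced = rng.normal(size=400)                # noise-like frame
print(macv_features(voiced, sr)[0])            # strong autocorrelation peak
print(macv_features(unvoiced, sr)[0])          # weak autocorrelation peak
```

For the voiced frame the top MACV is high (a clear peak at the 80-sample pitch lag), while for the noise-like frame it stays low, so the feature separates voicing strength without ever committing to an explicit pitch estimate.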
2. Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features". Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Abstract:
The speech signal is primarily meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task, i.e., finding the identity of a person from his/her speech among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers, ranging from high-level cues such as the semantics and linguistics of the speech to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers, but it is rarely used in current speaker recognition systems because it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. The thesis illustrates, through a text-independent speaker identification system, the reasonable performance of cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the unvoiced half, and the problem is how to account for pitch information for unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). To use pitch information for speaker recognition, these problems must be overcome: we need a method that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of estimation errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
3. Phythian, Mark. "Speaker identification for forensic applications". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36079/3/__qut.edu.au_Documents_StaffHome_StaffGroupR%24_rogersjm_Desktop_36079_Digitised%20Thesis.pdf.

Abstract:
A major application of Speaker Identification (SI) is suspect identification by voice. This thesis investigates techniques that can improve SI technology as applied to suspect identification. Speech coding techniques have become integrated into many modern voice communications systems, which prompts the question: how are automatic speaker identification systems and modern forensic identification techniques affected by the introduction of digitally coded speech channels? Presented in this thesis are three separate studies investigating the effects of speech coding and compression on current speaker recognition techniques. A relatively new spectral analysis technique, Higher Order Spectral Analysis (HOSA), has been identified as a potential candidate for improving some aspects of forensic speaker identification tasks; a further study investigates its application to improving the robustness of current ASR techniques in the presence of additive Gaussian noise. Results from our investigations reveal that incremental improvements in each of these aspects of automatic and forensic identification are achievable.
4. Gedik, Berk. "Evaluation of Text-Independent and Closed-Set Speaker Identification Systems". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-239625.

Abstract:
Speaker recognition is the task of recognizing the speaker of a given speech recording, and it has wide application areas. In this thesis, various machine learning models, such as the Gaussian Mixture Model (GMM), k-Nearest Neighbor (k-NN) model, and Support Vector Machine (SVM), and feature extraction methods, such as Mel-Frequency Cepstral Coefficients (MFCC) and Linear Predictive Cepstral Coefficients (LPCC), are investigated for the speaker recognition task. Combinations of these models and feature extraction methods are evaluated on many datasets varying in the number of speakers and the amount of training data, so that the performance of the methods in different settings can be analyzed. The results show that the GMM and k-NN methods provide good accuracy and that LPCC performs better than MFCC. The effects of recording duration, training data duration, and the number of speakers on prediction accuracy are also analyzed.
5. Cohen, Zachary Gideon. "Noise Reduction with Microphone Arrays for Speaker Identification". DigitalCommons@CalPoly, 2012. https://digitalcommons.calpoly.edu/theses/884.

Abstract:
The presence of acoustic noise in audio recordings is an ongoing issue that plagues many applications. This ambient background noise is difficult to reduce due to its unpredictable nature. Many single-channel noise reduction techniques exist, but they are limited in that they may distort the desired speech signal due to the overlapping spectral content of speech and noise. It is therefore of interest to investigate multichannel noise reduction algorithms that further attenuate noise while attempting to preserve the speech signal of interest. Specifically, this thesis investigates the use of microphone arrays in conjunction with multichannel noise reduction algorithms to aid speaker identification. Recording a speaker in the presence of acoustic background noise ultimately limits the performance and confidence of speaker identification algorithms. In situations where the noise environment cannot be controlled, noise reduction algorithms must be developed and applied to clean the speech signal in order to give speaker identification software a chance at a positive identification. Because of the limitations of single-channel techniques, it is of interest to see whether the spatial information provided by microphone arrays can be exploited. This thesis explores several time-domain multichannel noise reduction techniques, including delay-sum beamforming, multichannel Wiener filtering, and spatial-temporal prediction filtering. Each algorithm is prototyped, and filter performance is evaluated through simulations and experiments. A three-dimensional noise model is developed to simulate and compare the performance of the above methods, and experimental results from three data collections are presented and analyzed. The algorithms are compared, and recommendations are given for the use of each technique. Finally, ideas for future work are discussed to improve the performance and implementation of these multichannel algorithms. Possible applications of this technology include audio surveillance, identity verification, video chat, conference calling, and sound source localization.
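Of the multichannel techniques listed, delay-sum beamforming is the simplest to sketch: integer-sample delays align the target speaker across microphones, and averaging then reinforces the speech while uncorrelated noise partially cancels. The delays and noise levels below are made up for illustration; a real system would estimate the delays from the array geometry or cross-correlation.

```python
import numpy as np

def delay_and_sum(channels, delays_samples):
    """Delay-and-sum beamformer with known integer-sample delays:
    shift each channel to align the target, then average."""
    n = min(len(c) - d for c, d in zip(channels, delays_samples))
    aligned = np.stack([c[d:d + n] for c, d in zip(channels, delays_samples)])
    return aligned.mean(axis=0)

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 300 * np.arange(4000) / 8000.0)   # target speech stand-in
delays = [0, 3, 7, 12]                                       # per-mic arrival delays
mics = [np.concatenate([np.zeros(d), clean]) + rng.normal(0, 0.5, 4000 + d)
        for d in delays]                                     # delayed speech + noise

out = delay_and_sum(mics, delays)
noise_in = mics[0][:len(out)] - clean[:len(out)]
noise_out = out - clean[:len(out)]
print(noise_out.std() < noise_in.std())  # residual noise shrinks after averaging
```

Averaging M microphones reduces the uncorrelated-noise standard deviation by roughly a factor of sqrt(M), which is the basic SNR gain the thesis's more sophisticated filters try to improve upon.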
6. Chan, Siu Man. "Improved speaker verification with discrimination power weighting". Hong Kong University of Science and Technology, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20CHANS.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 86-93). Also available in electronic version. Access restricted to campus users.
7. Al-Kaltakchi, Musab Tahseen Salahaldeen. "Robust text independent closed set speaker identification systems and their evaluation". Thesis, University of Newcastle upon Tyne, 2018. http://hdl.handle.net/10443/3978.

Abstract:
This thesis focuses upon text independent closed set speaker identification. The contributions relate to evaluation studies in the presence of various types of noise and handset effects. Extensive evaluations are performed on four databases. The first contribution is in the context of the use of the Gaussian Mixture Model-Universal Background Model (GMM-UBM) with original speech recordings from only the TIMIT database. Four main simulations for Speaker Identification Accuracy (SIA) are presented including different fusion strategies: late fusion (score based), early fusion (feature based) and early-late fusion (combination of feature and score based), late fusion using concatenated static and dynamic features (features with temporal derivatives such as first order derivative delta and second order derivative delta-delta features, namely acceleration features), and finally fusion of statistically independent normalized scores. The second contribution is again based on the GMM-UBM approach. Comprehensive evaluations of the effect of Additive White Gaussian Noise (AWGN), and Non-Stationary Noise (NSN) (with and without a G.712 type handset) upon identification performance are undertaken. In particular, three NSN types with varying Signal to Noise Ratios (SNRs) were tested corresponding to: street traffic, a bus interior and a crowded talking environment. The performance evaluation also considered the effect of late fusion techniques based on score fusion, namely mean, maximum, and linear weighted sum fusion. The databases employed were: TIMIT, SITW, and NIST 2008; and 120 speakers were selected from each database to yield 3,600 speech utterances. The third contribution is based on the use of the I-vector; four combinations of I-vectors with 100 and 200 dimensions were employed. Then, various fusion techniques using maximum, mean, weighted sum and cumulative fusion with the same I-vector dimension were used to improve the SIA.
Similarly, both interleaving and concatenated I-vector fusion were exploited to produce 200 and 400 I-vector dimensions. The system was evaluated with four different databases using 120 speakers from each database. TIMIT, SITW and NIST 2008 databases were evaluated for various types of NSN, namely street-traffic NSN, bus-interior NSN and crowd talking NSN; and the G.712 type handset at 16 kHz was also applied. As recommendations from the study in terms of the GMM-UBM approach, mean fusion is found to yield overall best performance in terms of the SIA with noisy speech, whereas linear weighted sum fusion is overall best for original database recordings. However, in the I-vector approach the best SIA was obtained from the weighted sum and the concatenated fusion.
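The late score-fusion strategies named in the abstract above (mean, maximum, and linear weighted sum of per-classifier scores) can be illustrated with a minimal sketch. This is a generic illustration of score-level fusion, not code from the thesis; the function name, array shapes, and toy scores are assumptions made for the example.

```python
import numpy as np

def fuse_scores(score_lists, method="mean", weights=None):
    """Late (score-level) fusion: combine per-classifier speaker scores
    and identify the speaker with the highest fused score."""
    stacked = np.stack(score_lists)  # shape: (n_classifiers, n_speakers)
    if method == "mean":
        fused = stacked.mean(axis=0)
    elif method == "max":
        fused = stacked.max(axis=0)
    elif method == "weighted":
        w = np.asarray(weights, dtype=float)
        w = w / w.sum()  # normalize the linear weights
        fused = (w[:, None] * stacked).sum(axis=0)
    else:
        raise ValueError(f"unknown fusion method: {method}")
    return int(np.argmax(fused)), fused

# Two classifiers scoring the same utterance against three enrolled speakers
a = np.array([0.2, 0.9, 0.1])
b = np.array([0.3, 0.4, 0.8])
speaker, fused = fuse_scores([a, b], method="mean")  # fused = [0.25, 0.65, 0.45]
```

Mean fusion identifies speaker 1 here, while a weighted sum that trusts the second classifier more (e.g. weights [0.25, 0.75]) would pick speaker 2; this is how the choice of fusion rule can change the identification decision, which is the comparison the thesis evaluates.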
APA, Harvard, Vancouver, ISO, and other styles
8

Reynolds, Douglas A. "A Gaussian mixture modeling approach to text-independent speaker identification". Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/16903.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
9

Leis, John W. "Spectral coding methods for speech compression and speaker identification". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36062/7/36062_Digitised_Thesis.pdf.

Full text source
Abstract:
This thesis investigates aspects of encoding the speech spectrum at low bit rates, with extensions to the effect of such coding on automatic speaker identification. Vector quantization (VQ) is a technique for jointly quantizing a block of samples at once, in order to reduce the bit rate of a coding system. The major drawback in using VQ is the complexity of the encoder. Recent research has indicated the potential applicability of the VQ method to speech when product code vector quantization (PCVQ) techniques are utilized. The focus of this research is the efficient representation, calculation and utilization of the speech model as stored in the PCVQ codebook. In this thesis, several VQ approaches are evaluated, and the efficacy of two training algorithms is compared experimentally. It is then shown that these product-code vector quantization algorithms may be augmented with lossless compression algorithms, thus yielding an improved overall compression rate. An approach using a statistical model for the vector codebook indices for subsequent lossless compression is introduced. This coupling of lossy compression and lossless compression enables further compression gain. It is demonstrated that this approach is able to reduce the bit rate requirement from the current 24 bits per 20 millisecond frame to below 20, using a standard spectral distortion metric for comparison. Several fast-search VQ methods for use in speech spectrum coding have been evaluated. The usefulness of fast-search algorithms is highly dependent upon the source characteristics and, although previous research has been undertaken for coding of images using VQ codebooks trained with the source samples directly, the product-code structured codebooks for speech spectrum quantization place new constraints on the search methodology. The second major focus of the research is an investigation of the effect of low-rate spectral compression methods on the task of automatic speaker identification.
The motivation for this aspect of the research arose from a need to simultaneously preserve the speech quality and intelligibility and to provide for machine-based automatic speaker recognition using the compressed speech. This is important because there are several emerging applications of speaker identification where compressed speech is involved. Examples include mobile communications where the speech has been highly compressed, or where a database of speech material has been assembled and stored in compressed form. Although these two application areas have the same objective - that of maximizing the identification rate - the starting points are quite different. On the one hand, the speech material used for training the identification algorithm may or may not be available in compressed form. On the other hand, the new test material on which identification is to be based may only be available in compressed form. Using the spectral parameters which have been stored in compressed form, two main classes of speaker identification algorithm are examined. Some studies have been conducted in the past on bandwidth-limited speaker identification, but the use of short-term spectral compression deserves separate investigation. Combining the major aspects of the research, some important design guidelines for the construction of an identification model when based on the use of compressed speech are put forward.
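At the core of the quantization schemes this abstract discusses is the basic full-search VQ encoding step: each spectral feature vector is replaced by the index of its nearest codebook entry, and only the indices need be stored or transmitted. A minimal sketch follows; the function name, the toy 2-D data, and the codebook are invented for illustration, and a real coder would use the product-code structured codebooks the thesis describes rather than a single flat codebook.

```python
import numpy as np

def vq_encode(frames, codebook):
    """Full-search VQ: map each feature vector to the index of the
    nearest codeword under squared Euclidean distance."""
    # Pairwise squared distances, shape (n_frames, n_codewords)
    d = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1)

# Toy 2-D "spectral" frames and a 4-entry codebook
codebook = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
frames = np.array([[0.1, 0.1], [0.9, 0.2], [0.2, 0.8]])
indices = vq_encode(frames, codebook)  # one codeword index per frame
```

Transmitting the indices instead of the vectors is what lowers the bit rate; modeling the statistics of the index stream is what enables the further lossless compression gain the abstract reports.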
APA, Harvard, Vancouver, ISO, and other styles
10

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition". Thesis, Queensland University of Technology, 2001.

Find full text source
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Speaker identification systems"

1

Müller, Christian, ed. Speaker classification. Berlin: Springer, 2007.

Find full text source
APA, Harvard, Vancouver, ISO, and other styles
2

Sabourin, Conrad. Computational speech processing: Speech analysis, recognition, understanding, compression, transmission, coding, synthesis, text to speech systems, speech to tactile displays, speaker identification, prosody processing : bibliography. Montréal: Infolingua, 1994.

Find full text source
APA, Harvard, Vancouver, ISO, and other styles
3

Gerl, Franz, Wolfgang Minker, and SpringerLink (Online service), eds. Self-Learning Speaker Identification: A System for Enhanced Speech Recognition. Berlin, Heidelberg: Springer-Verlag Berlin Heidelberg, 2011.

Find full text source
APA, Harvard, Vancouver, ISO, and other styles
4

Dialect Accent Features for Establishing Speaker Identity: A Case Study. Springer, 2012.

Find full text source
APA, Harvard, Vancouver, ISO, and other styles
5

Herbig, Tobias, Franz Gerl, and Wolfgang Minker. Self-Learning Speaker Identification: A System for Enhanced Speech Recognition. Springer, 2013.

Find full text source
APA, Harvard, Vancouver, ISO, and other styles
6

Haslam, Nick. Reliability, Validity, and the Mixed Blessings of Operationalism. Edited by K. W. M. Fulford, Martin Davies, Richard G. T. Gipps, George Graham, John Z. Sadler, Giovanni Stanghellini, and Tim Thornton. Oxford University Press, 2013. http://dx.doi.org/10.1093/oxfordhb/9780199579563.013.0058.

Full text source
Abstract:
The concepts of reliability and validity are fundamental for evaluating psychiatric diagnosis, including the "operationalist" approach pioneered in DSM-III. This chapter explores the complexity of these psychometric concepts and their interrelations. Although reliability constrains validity it does not guarantee it, and pursuing reliability in diagnosis can reduce validity. It is widely believed that the operationalist emphasis on diagnostic reliability has compromised the validity of recent psychiatric classifications. In particular, writers have argued that the drive for atheoretical diagnostic criteria has come at the cost of phenomenological richness and psychodynamic complexity. This chapter argues that although the operationalist turn may have impaired the validity of psychiatric diagnosis in some respects, these criticisms must be balanced by an appreciation of its benefits. In addition, it is suggested that some criticisms rest on a misunderstanding of the goals of operational descriptions. They should be evaluated primarily on pragmatic grounds as identification procedures and judged on their success in serving epistemic and communicative functions. Operational descriptions should not be viewed as comprehensive definitions of clinical phenomena or judged on their failure to encompass the richness and complexity of mental disorders. A diagnostic system is best understood as an intentionally delimited instrument for enabling clinical inference and communication. In essence, it is a simplified pidgin with which clinicians who speak different first languages (theoretical orientations) can conduct their shared business.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Speaker identification systems"

1

Chandra, Mahesh, Pratibha Nandi, Aparajita Kumari, and Shipra Mishra. "Spectral-Subtraction Based Features for Speaker Identification". In Advances in Intelligent Systems and Computing, 529–36. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-12012-6_58.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
2

Takamizawa, Manaka, Satoru Tsuge, Yasuo Horiuchi, and Shingo Kuroiwa. "Same Speaker Identification with Deep Learning and Application to Text-Dependent Speaker Verification". In Human Centred Intelligent Systems, 149–58. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-3455-1_11.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
3

Xu, Yuting, and Hongyan Wang. "Cross-Linguistic Speaker Identification by Monophthongal Vowels". In Advances in Intelligent, Interactive Systems and Applications, 298–305. Cham: Springer International Publishing, 2019. http://dx.doi.org/10.1007/978-3-030-02804-6_40.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
4

Rodarte-Rodríguez, Armando, Aldonso Becerra-Sánchez, José I. De La Rosa-Vargas, Nivia I. Escalante-García, José E. Olvera-González, Emmanuel de J. Velásquez-Martínez, and Gustavo Zepeda-Valles. "Speaker Identification in Noisy Environments for Forensic Purposes". In Lecture Notes in Networks and Systems, 299–312. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-20322-0_21.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
5

Radisavljević, Dušan, Bojan Batalo, Rafal Rzepka, and Kenji Araki. "Text-Based Speaker Identification for Video Game Dialogues". In Lecture Notes in Networks and Systems, 44–54. Cham: Springer International Publishing, 2021. http://dx.doi.org/10.1007/978-3-030-82199-9_4.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
6

Rekik, Ahmed, Achraf Ben-Hamadou, and Walid Mahdi. "Unified System for Visual Speech Recognition and Speaker Identification". In Advanced Concepts for Intelligent Vision Systems, 381–90. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-25903-1_33.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
7

Srivastava, Sumit, Mahesh Chandra, and G. Sahoo. "Phase Based Mel Frequency Cepstral Coefficients for Speaker Identification". In Advances in Intelligent Systems and Computing, 309–16. New Delhi: Springer India, 2016. http://dx.doi.org/10.1007/978-81-322-2757-1_31.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
8

Mazumder, Avirup, Subhayu Ghosh, Swarup Roy, Sandipan Dhar, and Nanda Dulal Jana. "Rectified Adam Optimizer-Based CNN Model for Speaker Identification". In Lecture Notes in Networks and Systems, 155–62. Singapore: Springer Nature Singapore, 2022. http://dx.doi.org/10.1007/978-981-19-0825-5_16.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
9

Soğanci, Ruhsar, Fikret Gürgen, and Haluk Topcuoğlu. "Parallel Implementation of a VQ-Based Text-Independent Speaker Identification". In Advances in Information Systems, 291–300. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004. http://dx.doi.org/10.1007/978-3-540-30198-1_30.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
10

Al-Shamma, Omran, Mohammed A. Fadhel, and Haitham S. Hasan. "Employing FPGA Accelerator in Real-Time Speaker Identification Systems". In Recent Trends in Signal and Image Processing, 125–34. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6783-0_12.

Full text source
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Speaker identification systems"

1

Maqueda, Emmanuel, Javier Alvarez, and Ivan Meza. "Towards forensic speaker identification in Spanish using triplet loss". In LatinX in AI at Neural Information Processing Systems Conference 2020. Journal of LatinX in AI Research, 2020. http://dx.doi.org/10.52591/lxai2020121210.

Full text source
Abstract:
This work explores the use of a triplet loss deep network setting for the forensic identification of speakers in Spanish. Within the framework, we train a convolutional network to produce vector representations of speech spectrogram slices. Then we test how similar these vectors are for a given speaker and how dissimilar they are compared with other speakers. Based on these metrics we propose the calculation of the Likelihood Ratio, which is a cornerstone for forensic identification.
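The triplet objective behind this setting can be illustrated with a small numeric sketch, using plain NumPy rather than the authors' convolutional network; the embedding values and margin below are made up for the example.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss on embedding vectors: same-speaker pairs
    (anchor/positive) should end up closer than different-speaker
    pairs (anchor/negative) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(d_pos - d_neg + margin, 0.0)

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])  # same speaker: embedded nearby
negative = np.array([2.0, 0.0])  # different speaker: embedded far away
loss = triplet_loss(anchor, positive, negative)  # margin satisfied -> 0.0
```

Once training drives this loss toward zero, the same anchor-positive and anchor-negative distances serve as the similarity metrics from which a likelihood ratio can then be computed.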
APA, Harvard, Vancouver, ISO, and other styles
2

Apsingekar, Vijendra Raj, and Phillip L. De Leon. "Efficient speaker identification using distributional speaker model clustering". In 2008 42nd Asilomar Conference on Signals, Systems and Computers. IEEE, 2008. http://dx.doi.org/10.1109/acssc.2008.5074619.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
3

Indumathi, A., and E. Chandra. "Speaker identification using bagging techniques". In 2015 International Conference on Computers, Communications, and Systems (ICCCS). IEEE, 2015. http://dx.doi.org/10.1109/ccoms.2015.7562905.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
4

Jawarkar, N. P., R. S. Holambe, and T. K. Basu. "Speaker Identification Using Whispered Speech". In 2013 International Conference on Communication Systems and Network Technologies (CSNT 2013). IEEE, 2013. http://dx.doi.org/10.1109/csnt.2013.167.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
5

Byun, Sung-Woo, and Seok-Pil Lee. "Implementation of Speaker Identification Using Speaker Localization for Conference System". In The 2nd World Congress on Electrical Engineering and Computer Systems and Science. Avestia Publishing, 2016. http://dx.doi.org/10.11159/mhci16.110.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
6

Abou-Zleikha, Mohamed, Zheng-Hua Tan, Mads Graesboll Christensen, and Soren Holdt Jensen. "A discriminative approach for speaker selection in speaker de-identification systems". In 2015 23rd European Signal Processing Conference (EUSIPCO). IEEE, 2015. http://dx.doi.org/10.1109/eusipco.2015.7362755.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
7

Elmisery, F. A., H. F. Hammed, A. E. Salama, and F. El-Geldawi. "Speaker identification system based on FPGA". In 2005 12th IEEE International Conference on Electronics, Circuits and Systems (ICECS 2005). IEEE, 2005. http://dx.doi.org/10.1109/icecs.2005.4633573.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
8

Oh, Jae C., and Misty Blowers. "Open-set speaker identification with classifier systems". In Defense and Security Symposium, edited by Kevin Schum and Alex F. Sisti. SPIE, 2006. http://dx.doi.org/10.1117/12.668791.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
9

Daqrouq, Khaled, Wael Al-Sawalmeh, Abdel-Rahman Al-Qawasmi, and Ibrahim N. Abu-Isbeih. "Speaker Identification Wavelet Transform based method". In 2008 5th International Multi-Conference on Systems, Signals and Devices. IEEE, 2008. http://dx.doi.org/10.1109/ssd.2008.4632901.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
10

Meftah, Ali, Hassan Mathkour, Mustafa Qamhan, and Yousef Alotaibi. "Speaker Identification in Different Emotional States". In 2020 12th International Symposium on Communication Systems, Networks and Digital Signal Processing (CSNDSP). IEEE, 2020. http://dx.doi.org/10.1109/csndsp49049.2020.9249633.

Full text source
APA, Harvard, Vancouver, ISO, and other styles

Organizational reports on the topic "Speaker identification systems"

1

Jin, Qin, and Yun Wang. Integrated Robust Open-Set Speaker Identification System (IROSIS). Fort Belvoir, VA: Defense Technical Information Center, May 2012. http://dx.doi.org/10.21236/ada562148.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
2

Mizrach, Amos, Michal Mazor, Amots Hetzroni, Joseph Grinshpun, Richard Mankin, Dennis Shuman, Nancy Epsky, and Robert Heath. Male Song as a Tool for Trapping Female Medflies. United States Department of Agriculture, December 2002. http://dx.doi.org/10.32747/2002.7586535.bard.

Full text source
Abstract:
This interdisciplinaray work combines expertise in engineering and entomology in Israel and the US, to develop an acoustic trap for mate-seeking female medflies. Medflies are among the world's most economically harmful pests, and monitoring and control efforts cost about $800 million each year in Israel and the US. Efficient traps are vitally important tools for medfly quarantine and pest management activities; they are needed for early detection, for predicting dispersal patterns and for estimating medfly abundance within infested regions. Early detection facilitates rapid response to invasions, in order to contain them. Prediction of dispersal patterns facilitates preemptive action, and estimates of the pests' abundance lead to quantification of medfly infestations and control efforts. Although olfactory attractants and traps exist for capturing male and mated female medflies, there are still no satisfactorily efficient means to attract and trap virgin and remating females (a significant and dangerous segment of the population). We proposed to explore the largely ignored mechanism of female attraction to male song that the flies use in courtship. The potential of such an approach is indicated by studies under this project. Our research involved the identification, isolation, and augmentation of the most attractive components of male medfly songs and the use of these components in the design and testing of traps incorporating acoustic lures. The project combined expertise in acoustic engineering and instrumentation, fruit fly behavior, and integrated pest management. The BARD support was provided for 1 year to enable proof-of-concept studies, aimed to determine: 1) whether mate-seeking female medflies are attracted to male songs; and 2) over what distance such attraction works. Male medfly calling song was recorded during courtship. 
Multiple acoustic components of male song were examined and tested for synergism with substrate vibrations produced by various surfaces, plates and loudspeakers, with natural and artificial sound playbacks. A speaker-funnel system was developed that focused the playback signal to reproduce as closely as possible the near-field spatial characteristics of the sounds produced by individual males. In initial studies, the system was tested by observing the behavior of females while the speaker system played songs at various intensities. Through morning and early afternoon periods of peak sexual activity, virgin female medflies landed on a sheet of filter paper at the funnel outlet and stayed longer during broadcasting than during the silent part of the cycle. In later studies, females were captured on sticky paper at the funnel outlet. The mean capture rates were 67 and 44%, respectively, during sound emission and silent control periods. The findings confirmed that female trapping was improved if a male calling song was played. The second stage of the research focused on estimating the trapping range. Initial results indicated that the range possibly extended to 70 cm, but additional verification tests remain to be conducted. Further studies are also planned to consider effects of combining acoustic and pheromonal cues.
APA, Harvard, Vancouver, ISO, and other styles