Journal articles on the topic 'Speaker recognition'

To see the other types of publications on this topic, follow the link: Speaker recognition.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Speaker recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles on a wide variety of disciplines and organise your bibliography correctly.

1

Sun, Linhui, Yunyi Bu, Bo Zou, Sheng Fu, and Pingan Li. "Speaker Recognition Based on Fusion of a Deep and Shallow Recombination Gaussian Supervector." Electronics 10, no. 1 (December 25, 2020): 20. http://dx.doi.org/10.3390/electronics10010020.

Abstract:
Extracting a speaker's personalized feature parameters is vital for speaker recognition, and a single kind of feature cannot fully reflect the speaker's personality information. In order to represent the speaker's identity more comprehensively and improve the speaker recognition rate, we propose a speaker recognition method based on the fusion feature of a deep and shallow recombination Gaussian supervector. In this method, deep bottleneck features are first extracted by a Deep Neural Network (DNN) and used as the input of a Gaussian Mixture Model (GMM) to obtain the deep Gaussian supervector. In parallel, we input the Mel-Frequency Cepstral Coefficients (MFCC) directly into a GMM to extract the traditional Gaussian supervector. Finally, the two categories of features are combined in the form of horizontal dimension augmentation. In addition, to prevent the system recognition rate from falling sharply as the number of speakers to be recognized increases, we introduce an optimization algorithm to find the optimal weights before feature fusion. The experimental results indicate that the speaker recognition rate based on the directly fused feature reaches 98.75%, which is 5% and 0.62% higher than the traditional feature and the deep bottleneck feature, respectively. When the number of speakers increases, the fused feature based on optimized weight coefficients improves the recognition rate by a further 0.81%. This validates that the proposed fusion method effectively exploits the complementarity of the different types of features and improves the speaker recognition rate.
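To make the fusion step concrete, the following minimal Python sketch builds a Gaussian supervector for each feature stream and concatenates them, as the abstract describes. The shapes, the weight value, and the stand-in random features are illustrative assumptions, not the paper's data or code.

```python
# Illustrative sketch only: random arrays stand in for real MFCC frames and
# DNN bottleneck features; the weight w is an assumed placeholder.
import numpy as np
from sklearn.mixture import GaussianMixture

def gaussian_supervector(frames: np.ndarray, n_components: int = 8) -> np.ndarray:
    """Fit a GMM on feature frames and stack its means into a supervector."""
    gmm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          random_state=0).fit(frames)
    return gmm.means_.ravel()  # shape: (n_components * feature_dim,)

rng = np.random.default_rng(0)
mfcc_frames = rng.normal(size=(500, 13))        # shallow stream (MFCC)
bottleneck_frames = rng.normal(size=(500, 40))  # deep stream (DNN bottleneck)

shallow_sv = gaussian_supervector(mfcc_frames)
deep_sv = gaussian_supervector(bottleneck_frames)

# "Horizontal dimension augmentation": concatenate the two supervectors,
# optionally weighting each stream before fusion.
w = 0.5  # placeholder for the weight found by the paper's optimization step
fused = np.concatenate([w * shallow_sv, (1.0 - w) * deep_sv])
print(fused.shape)  # (8*13 + 8*40,) = (424,)
```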
2

Singh, Satyanand. "Forensic and Automatic Speaker Recognition System." International Journal of Electrical and Computer Engineering (IJECE) 8, no. 5 (October 1, 2018): 2804. http://dx.doi.org/10.11591/ijece.v8i5.pp2804-2811.

Abstract:
Current Automatic Speaker Recognition (ASR) systems have emerged as an important means of confirming identity in many businesses, e-commerce applications, forensics, and law enforcement. Specialists trained in forensic recognition can perform this task far better by examining a set of acoustic, prosodic, and semantic attributes, an approach referred to as structured listening. Algorithm-based systems for forensic speaker recognition have been developed by physicists and forensic linguists to reduce the probability of contextual bias or a preconceived understanding of a reference model with respect to the validity of an unknown audio sample and any suspected individual. Many researchers continue to develop automatic algorithms in signal processing and machine learning so that improved performance can effectively establish a speaker's identity, with the automatic system performing on par with human listeners. In this paper, I examine the literature on the identification of speakers by machines and humans, emphasizing the key technical approaches that have emerged for automatic technology in the last decade. I focus on many aspects of automatic speaker recognition systems, including speaker-specific features, speaker models, standard assessment datasets, and performance metrics.
3

Markowitz, Judith A. "Speaker recognition." Information Security Technical Report 3, no. 1 (January 1998): 14–20. http://dx.doi.org/10.1016/s1363-4127(98)80014-9.

4

Markowitz, Judith A. "Speaker recognition." Information Security Technical Report 4 (January 1999): 28. http://dx.doi.org/10.1016/s1363-4127(99)80053-3.

5

Furui, Sadaoki. "Speaker recognition." Scholarpedia 3, no. 4 (2008): 3715. http://dx.doi.org/10.4249/scholarpedia.3715.

6

O'Shaughnessy, D. "Speaker recognition." IEEE ASSP Magazine 3, no. 4 (October 1986): 4–17. http://dx.doi.org/10.1109/massp.1986.1165388.

7

Singh, Mahesh K., P. Mohana Satya, Vella Satyanarayana, and Sridevi Gamini. "Speaker Recognition Assessment in a Continuous System for Speaker Identification." International Journal of Electrical and Electronics Research 10, no. 4 (December 30, 2022): 862–67. http://dx.doi.org/10.37391/ijeer.100418.

Abstract:
This research article focuses on recognizing speakers in multi-speaker speech. Every conference, talk, or discussion involves the participation of several speakers, and this kind of speech raises its own problems and processing stages. Challenges include the particular noise of the surroundings, the number of speakers involved, speaker distance, microphone equipment, etc. Besides addressing these hurdles in real time, there are also problems in the treatment of multi-speaker speech itself. Identifying speech segments, separating the speaking segments, constructing clusters of similar segments, and finally recognizing the speaker using these segments are the common sequential operations in the context of multi-speaker speech recognition. All linked phases of the speech recognition process are discussed with relevant methodologies in this article, which also examines the common metrics and methods. The paper examines the algorithms of a speech recognition system at its different stages: such systems are built through phases such as voice filtering, speaker segmentation, speaker isolation, and recognition of the speaker, evaluated here with 20 speakers.
8

Gonzalez-Rodriguez, Joaquin. "Evaluating Automatic Speaker Recognition systems: An overview of the NIST Speaker Recognition Evaluations (1996-2014)." Loquens 1, no. 1 (June 30, 2014): e007. http://dx.doi.org/10.3989/loquens.2014.007.

9

Mannepalli, Kasiprasad, Suman Maloji, Panyam Narahari Sastry, Swetha Danthala, and Durgaprasad Mannepalli. "Text independent emotion recognition for Telugu speech by using prosodic features." International Journal of Engineering & Technology 7, no. 2.7 (March 18, 2018): 594. http://dx.doi.org/10.14419/ijet.v7i2.7.10887.

Abstract:
Human speech conveys different types of information about the speaker and the speech itself. From the speech production side, the speech signal carries linguistic information, such as the meaningful message and the language, together with the emotional, geographical, and physiological characteristics of the speaker. This paper focuses on automatically identifying the emotion of a speaker from a sample of speech. The speech signals considered in this work are collected from Telugu speakers, and features such as pitch, pitch-related prosody, energy, and formants are used. The overall recognition accuracy obtained in this work is 72%.
10

Lakshmi Prasanna, P. "Attention for the speech of cleft lip and palate in speaker recognition." Open Journal of Pain Medicine 7, no. 1 (December 1, 2023): 7–1. http://dx.doi.org/10.17352/ojpm.000036.

Abstract:
Artificial Intelligence (AI) has become indispensable to all people, primarily for speaker recognition, voice identification, education, the workplace, and health care. Identification and recognition of a speaker are accomplished based on the speaker's voice characteristics. The voice is affected by both intra- and inter-speaker variability. In addition, structural abnormalities can alter resonance, which can seriously affect voice quality. As a result, such speakers may experience difficulties when using AI-based devices. This study aims to investigate the effects of cleft lip and palate speech on speaker recognition. The review found that, even after surgery, some people with cleft lip and palate exhibit hypernasality and poor speech intelligibility depending on the severity of the cleft. The author notes that artificial intelligence has been applied to surgical procedures. In children with corrected cleft lip and palate, acoustic analysis revealed poor benchmarking for speaker identification. Hypernasality, the most prevalent symptom, also affects speech intelligibility. Thus, more research on speaker recognition under different algorithms and hypernasality is essential, so that speakers who have CLP can use AI freely and without issues. Even with these limitations, people with CLP can still learn to make use of AI.
11

Aung, Zaw Win. "Automatic Attendance System Using Speaker Recognition." International Journal of Trend in Scientific Research and Development 2, no. 6 (October 31, 2018): 802–6. http://dx.doi.org/10.31142/ijtsrd18763.

12

Zong, Feng. "Speaker Recognition Techniques." Applied Mechanics and Materials 599-601 (August 2014): 1716–19. http://dx.doi.org/10.4028/www.scientific.net/amm.599-601.1716.

Abstract:
This paper reviews the basic concepts and historical development of speaker recognition technology, lists and compares several commonly used feature extraction and pattern matching methods, and discusses the current problems and directions for future development.
13

Campbell, J. P., W. Shen, W. M. Campbell, R. Schwartz, J. F. Bonastre, and D. Matrouf. "Forensic speaker recognition." IEEE Signal Processing Magazine 26, no. 2 (March 2009): 95–103. http://dx.doi.org/10.1109/msp.2008.931100.

14

Li, Tingyu. "Speaker Recognition System based on Triplet State Loss Function." Scientific Journal of Technology 5, no. 8 (August 22, 2023): 39–46. http://dx.doi.org/10.54691/sjt.v5i8.5496.

Abstract:
The purpose of this paper is to build a model and design a speaker recognition system by comprehensively reviewing domestic and international research on speaker recognition models, and by adopting a research method based on deep learning theory. Its main contents and proposed methods are as follows. For data processing, a public dataset is first selected and downloaded from its official website; each utterance in the dataset is preprocessed, Fbank features are extracted, converted to .npy format, and stored in files, so that the speech is in a format suitable for model input. For modeling, a ResCNN architecture based on convolutional neural networks is used. The model is trained with a triplet loss function, which maps speech onto a hypersphere, so cosine similarity is used directly to characterize the distance between two speakers. The speaker verification function provides three different ways to obtain speech: the two acquired utterances are input into the model, which judges their similarity and gives a decision. For speaker identification, the same three ways can be used to obtain an utterance and determine which speaker in the corpus it belongs to. For speaker confirmation, an utterance is played at random, a speaker is randomly selected, and the system judges whether the utterance is that speaker's voice.
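As a rough illustration of the training objective the abstract describes, the sketch below implements a cosine-distance triplet loss over embeddings from a toy network; the tiny linear "embedder", the input shapes, and the margin are assumptions standing in for the paper's ResCNN.

```python
# Toy sketch: a linear layer stands in for ResCNN; shapes and margin are assumed.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Cosine-distance triplet loss: same-speaker pairs score higher."""
    pos = 1 - F.cosine_similarity(anchor, positive)
    neg = 1 - F.cosine_similarity(anchor, negative)
    return F.relu(pos - neg + margin).mean()

embed = torch.nn.Sequential(          # stand-in embedding network
    torch.nn.Flatten(),
    torch.nn.Linear(64 * 40, 128),
)

# Batches of Fbank patches: anchor/positive share a speaker, negative does not.
a, p, n = (torch.randn(8, 64, 40) for _ in range(3))
loss = triplet_loss(embed(a), embed(p), embed(n))
loss.backward()

# Verification scores a trial pair directly by cosine similarity.
score = F.cosine_similarity(embed(a[:1]), embed(p[:1]))
print(float(loss), float(score))
```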
15

Bouziane, Ayoub, Jamal Kharroubi, and Arsalane Zarghili. "Towards an Optimal Speaker Modeling in Speaker Verification Systems using Personalized Background Models." International Journal of Electrical and Computer Engineering (IJECE) 7, no. 6 (December 1, 2017): 3655. http://dx.doi.org/10.11591/ijece.v7i6.pp3655-3663.

Abstract:
This paper presents a novel speaker modeling approach for speaker recognition systems. The basic idea of this approach consists of deriving the target speaker model from a personalized background model, composed only of the UBM Gaussian components that are really present in the speech of the target speaker. The motivation behind deriving speakers' models from personalized background models is to exploit the observed difference in some acoustic classes between speakers, in order to improve the performance of speaker recognition systems. The proposed approach was evaluated on a speaker verification task using various amounts of training and testing speech data. The experimental results showed that the proposed approach is efficient in terms of both verification performance and computational cost during the testing phase of the system, compared to traditional UBM-based speaker recognition systems.
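A minimal sketch of the component-selection idea, assuming a UBM trained with scikit-learn and a soft-count threshold; the threshold value and the synthetic data are assumptions, not the paper's settings.

```python
# Sketch: keep only UBM components that actually fire on the target speaker's
# enrollment frames. Data, sizes, and the threshold are illustrative.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
ubm = GaussianMixture(n_components=64, covariance_type="diag", random_state=0)
ubm.fit(rng.normal(size=(5000, 13)))            # stand-in background data

target_frames = rng.normal(size=(800, 13))      # stand-in enrollment speech
occupancy = ubm.predict_proba(target_frames).sum(axis=0)  # soft count per component

keep = occupancy > 1.0                          # components "really present"
personal_means = ubm.means_[keep]               # personalized background model
print(f"kept {keep.sum()} of {ubm.n_components} components")
```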
16

Mathiyalagan, P., et al. "Deep Learning and Limited Boltzmann Machines for Speaker Recognition." International Journal on Recent and Innovation Trends in Computing and Communication 11, no. 1 (January 31, 2023): 247–50. http://dx.doi.org/10.17762/ijritcc.v11i1.9820.

Abstract:
Speaker recognition has become an essential aspect of modern voice-based systems such as security and authentication applications. In this research, we propose a new method for speaker recognition based on deep learning and limited Boltzmann machines. The method comprises pre-emphasis and overlapping framing, endpoint detection, feature extraction, and training of a deep belief network model using limited Boltzmann machine layers. Softmax classifiers are added in the top layer of the model, and the speaker's phonetic features are input into the model for training. The likelihood probabilities of the speakers' phonetic features are calculated, and the speaker corresponding to the maximum probability is identified as the recognized result. The results show that the proposed method outperforms other state-of-the-art methods, achieving high accuracy and robustness to noise and signal variations.
17

Khadar Nawas, K., Manish Kumar Barik, and A. Nayeemulla Khan. "Speaker Recognition using Random Forest." ITM Web of Conferences 37 (2021): 01022. http://dx.doi.org/10.1051/itmconf/20213701022.

Abstract:
Speaker identification has become a mainstream technology in the field of machine learning that involves determining the identity of a speaker from his or her speech sample. A person's speech contains many features that can be used to discriminate his or her identity. A model that can identify a speaker has wide applications such as biometric authentication, security, forensics, and human-machine interaction. This paper implements a speaker identification system based on Random Forest as a classifier to identify various speakers, using MFCC and RPS as feature extraction techniques. The output obtained from the Random Forest classifier shows promising results. It is observed that the accuracy level is significantly higher with MFCC as compared to the RPS technique on data taken from the well-known TIMIT corpus.
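A runnable sketch of the MFCC-plus-Random-Forest route described above, with synthetic audio standing in for TIMIT utterances; the frame-averaging step and the hyperparameters are assumptions.

```python
# Illustrative only: random noise stands in for real utterances.
import numpy as np
import librosa
from sklearn.ensemble import RandomForestClassifier

def utterance_features(y: np.ndarray, sr: int = 16000) -> np.ndarray:
    """Average the MFCC matrix over time to get one vector per utterance."""
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)  # (13, n_frames)
    return mfcc.mean(axis=1)

rng = np.random.default_rng(0)
X = np.array([utterance_features(rng.normal(size=16000)) for _ in range(40)])
y = np.repeat(np.arange(4), 10)  # 4 speakers, 10 utterances each (assumed split)

clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
print(clf.predict(X[:3]))
```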
18

Huang, X., and K. F. Lee. "On speaker-independent, speaker-dependent, and speaker-adaptive speech recognition." IEEE Transactions on Speech and Audio Processing 1, no. 2 (April 1993): 150–57. http://dx.doi.org/10.1109/89.222875.

19

Lv, Gang, and Heming Zhao. "Joint Factor Analysis of Channel Mismatch in Whispering Speaker Verification." Archives of Acoustics 37, no. 4 (December 1, 2012): 555–59. http://dx.doi.org/10.2478/v10168-012-0065-9.

Abstract:
A speaker recognition system based on joint factor analysis (JFA) is proposed to improve whispering speakers' recognition rate under channel mismatch. The system estimates the eigenvoice and the eigenchannel separately before calculating the corresponding speaker and channel factors. Finally, a channel-free speaker model is built to describe a speaker accurately using model compensation. Test results on whispered speech databases obtained under eight different channels showed that the correct recognition rate of a recognition system based on JFA was higher than that of the Gaussian Mixture Model-Universal Background Model. In particular, the recognition rate in cellphone channel tests increased significantly.
20

Nursholihatun, Erina, Sudi Mariyanto Sasongko, and Abdullah Zainuddin. "IDENTIFIKASI SUARA MENGGUNAKAN METODE MEL FREQUENCY CEPSTRUM COEFFICIENTS (MFCC) DAN JARINGAN SYARAF TIRUAN BACKPROPAGATION [Voice identification using the Mel Frequency Cepstrum Coefficients (MFCC) method and a backpropagation artificial neural network]." DIELEKTRIKA 7, no. 1 (February 29, 2020): 48. http://dx.doi.org/10.29303/dielektrika.v7i1.232.

Abstract:
The voice is a basic human tool of communication. Speaker identification is the process of recognizing the identity of a speaker by comparing the input voice features with the features of each speaker in the database. There are two steps in the speaker identification process: feature extraction and pattern recognition. The feature extraction phase uses the Mel Frequency Cepstrum Coefficients (MFCC) method. Pattern recognition uses a backpropagation artificial neural network that compares the test data with the reference data in the database, based on the variables obtained in the learning process. The results show that the SNR (Signal to Noise Ratio) value determines the success of the speaker recognition system: the higher the SNR, the higher the percentage of correct recognition. The average speaker recognition accuracy on data without added noise is 86%; the highest average accuracy is 92%, on data with an 80 dB SNR level, and the lowest average accuracy is 45%. The rejection rate when testing speakers outside the database is 100%.
21

Mridha, Muhammad Firoz, Abu Quwsar Ohi, Muhammad Mostafa Monowar, Md Abdul Hamid, Md Rashedul Islam, and Yutaka Watanobe. "U-Vectors: Generating Clusterable Speaker Embedding from Unlabeled Data." Applied Sciences 11, no. 21 (October 27, 2021): 10079. http://dx.doi.org/10.3390/app112110079.

Abstract:
Speaker recognition deals with recognizing speakers by their speech. Most speaker recognition systems are built upon two stages: the first stage extracts low-dimensional correlation embeddings from speech, and the second performs the classification task. The robustness of a speaker recognition system mainly depends on the extraction process of the speech embeddings, which are typically pre-trained on a large-scale dataset. As the embedding systems are pre-trained, the performance of speaker recognition models greatly depends on the domain adaptation policy, and may degrade if they are trained using inadequate data. This paper introduces a speaker recognition strategy dealing with unlabeled data, which generates clusterable embedding vectors from small fixed-size speech frames. The unsupervised training strategy involves the assumption that a small speech segment should include a single speaker. Based on this assumption, pairwise constraints are constructed with noise augmentation policies and used to train an AutoEmbedder architecture that generates speaker embeddings. Without relying on a domain adaptation policy, the process produces clusterable speaker embeddings in an unsupervised manner, termed unsupervised vectors (u-vectors). The evaluation is conducted on two popular English speaker recognition datasets, TIMIT and LibriSpeech, and a Bengali dataset is included to illustrate the diversity of domain shifts for speaker recognition systems. Finally, we conclude that the proposed approach achieves satisfactory performance using pairwise architectures.
22

Nematollahi, Mohammad Ali, and S. A. R. Al-Haddad. "Distant Speaker Recognition: An Overview." International Journal of Humanoid Robotics 13, no. 02 (May 25, 2016): 1550032. http://dx.doi.org/10.1142/s0219843615500322.

Abstract:
A distant speaker recognition (DSR) system assumes that the microphones are far away from the speaker's mouth, and the position of the microphones can vary. Furthermore, various challenges and limitations in terms of coloration, ambient noise, and reverberation can create difficulties for recognition of the speaker. Although applying speech enhancement techniques can attenuate speech distortion components, doing so may remove speaker-specific information and increase the processing time in real-time applications. Currently, much effort is being invested in developing DSR for commercially viable systems. In this paper, state-of-the-art techniques in DSR such as robust feature extraction, feature normalization, robust speaker modeling, model compensation, dereverberation, and score normalization are discussed as ways to overcome the speech degradation components, i.e., reverberation and ambient noise. Performance results on DSR show that as the speaker-to-microphone distance increases, recognition rates decrease and the equal error rate (EER) increases. Finally, the paper concludes that applying robust features and robust speaker models that vary less with distance can improve DSR performance.
23

Tsai, Chia-Hung Dylan, Michael Hu, and Stone Cheng. "Enhancing the Recognition of Speakers in Different Distances using Voice Features." INTER-NOISE and NOISE-CON Congress and Conference Proceedings 268, no. 4 (November 30, 2023): 4300–4306. http://dx.doi.org/10.3397/in_2023_0610.

Abstract:
This paper proposes a method to enhance speaker recognition at varying distances by adjusting the reference voice based on voice features. Speaker recognition is the process of identifying an individual based on their voice; it involves analyzing and comparing various acoustic features of a person's voice with the reference voice in the database. Conventional speaker recognition techniques suffer reduced accuracy when speakers are at varying distances. In this work, we found that high-frequency signals tend to decline faster than low-frequency ones with respect to speaker distance. Based on this, we propose a method that utilizes support vector machines (SVM) to classify speaker distance using sound features such as the amplitude sum of high-frequency signals and the dynamic range. Once the speaker distance is determined, the reference signal in the database is adjusted according to the distance before being used for speaker recognition. Mel-Frequency Cepstral Coefficients (MFCC) and Dynamic Time Warping (DTW) were employed as the recognition algorithm. Experiments were conducted with speakers placed at three distances: 0.1, 1, and 2.5 meters from the microphone. The experimental results reveal that signals at frequencies of 4 kHz and above experience a faster decline in amplitude with increasing distance than lower-frequency ones. The recognition results also demonstrate a significant improvement in accuracy.
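The distance-classification step might look like the following sketch: two simple features (high-frequency amplitude sum and dynamic range) feed an SVM that predicts a distance class. The feature definitions, the 4 kHz cutoff, and the synthetic signals are assumptions based on the abstract, not the paper's implementation.

```python
# Illustrative sketch; synthetic noise at different scales stands in for
# recordings made at 0.1 m, 1 m, and 2.5 m.
import numpy as np
from sklearn.svm import SVC

def distance_features(y: np.ndarray, sr: int = 16000) -> np.ndarray:
    spec = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1 / sr)
    hf_sum = spec[freqs >= 4000].sum()   # amplitude sum of >= 4 kHz components
    dyn_range = y.max() - y.min()        # dynamic range
    return np.array([hf_sum, dyn_range])

rng = np.random.default_rng(0)
X = np.array([distance_features(rng.normal(scale=s, size=16000))
              for s in (1.0, 0.5, 0.2) for _ in range(10)])
y = np.repeat([0, 1, 2], 10)             # distance classes: 0.1 m, 1 m, 2.5 m

svm = SVC(kernel="rbf").fit(X, y)
print(svm.predict(X[:3]))                # then adjust the reference per class
```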
24

Chauhan, Neha, Tsuyoshi Isshiki, and Dongju Li. "Enhancing Speaker Recognition Models with Noise-Resilient Feature Optimization Strategies." Acoustics 6, no. 2 (May 14, 2024): 439–69. http://dx.doi.org/10.3390/acoustics6020024.

Abstract:
This paper delves into an in-depth exploration of speaker recognition methodologies, with a primary focus on three pivotal approaches: feature-level fusion, dimension reduction employing principal component analysis (PCA) and independent component analysis (ICA), and feature optimization through a genetic algorithm (GA) and the marine predator algorithm (MPA). This study conducts comprehensive experiments across diverse speech datasets characterized by varying noise levels and speaker counts. Impressively, the research yields exceptional results across different datasets and classifiers. For instance, on the TIMIT babble noise dataset (120 speakers), feature fusion achieves a remarkable speaker identification accuracy of 92.7%, while various feature optimization techniques combined with K nearest neighbor (KNN) and linear discriminant (LD) classifiers result in a speaker verification equal error rate (SV EER) of 0.7%. Notably, this study achieves a speaker identification accuracy of 93.5% and SV EER of 0.13% on the TIMIT babble noise dataset (630 speakers) using a KNN classifier with feature optimization. On the TIMIT white noise dataset (120 and 630 speakers), speaker identification accuracies of 93.3% and 83.5%, along with SV EER values of 0.58% and 0.13%, respectively, were attained utilizing PCA dimension reduction and feature optimization techniques (PCA-MPA) with KNN classifiers. Furthermore, on the voxceleb1 dataset, PCA-MPA feature optimization with KNN classifiers achieves a speaker identification accuracy of 95.2% and an SV EER of 1.8%. These findings underscore the significant enhancement in computational speed and speaker recognition performance facilitated by feature optimization strategies.
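For readers who want the shape of the PCA-plus-KNN route reported above, here is a minimal scikit-learn pipeline; the feature dimensionality, component count, and synthetic data are assumptions, and the metaheuristic feature optimization (GA/MPA) is omitted.

```python
# Sketch of dimension reduction + KNN classification; data is synthetic.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 39))      # one fused feature vector per utterance
y = rng.integers(0, 12, size=120)   # 12 speakers (assumed)

model = make_pipeline(PCA(n_components=20), KNeighborsClassifier(n_neighbors=5))
model.fit(X, y)
print(model.score(X, y))            # training accuracy on the toy data
```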
25

Lee, Yun Kyung, and Jeon Gue Park. "Multimodal Unsupervised Speech Translation for Recognizing and Evaluating Second Language Speech." Applied Sciences 11, no. 6 (March 16, 2021): 2642. http://dx.doi.org/10.3390/app11062642.

Abstract:
This paper addresses an automatic proficiency evaluation and speech recognition for second language (L2) speech. The proposed method recognizes the speech uttered by the L2 speaker, measures a variety of fluency scores, and evaluates the proficiency of the speaker’s spoken English. Stress and rhythm scores are one of the important factors used to evaluate fluency in spoken English and are computed by comparing the stress patterns and the rhythm distributions to those of native speakers. In order to compute the stress and rhythm scores even when the phonemic sequence of the L2 speaker’s English sentence is different from the native speaker’s one, we align the phonemic sequences based on a dynamic time-warping approach. We also improve the performance of the speech recognition system for non-native speakers and compute fluency features more accurately by augmenting the non-native training dataset and training an acoustic model with the augmented dataset. In this work, we augment the non-native speech by converting some speech signal characteristics (style) while preserving its linguistic information. The proposed variational autoencoder (VAE)-based speech conversion network trains the conversion model by decomposing the spectral features of the speech into a speaker-invariant content factor and a speaker-specific style factor to estimate diverse and robust speech styles. Experimental results show that the proposed method effectively measures the fluency scores and generates diverse output signals. Also, in the proficiency evaluation and speech recognition tests, the proposed method improves the proficiency score performance and speech recognition accuracy for all proficiency areas compared to a method employing conventional acoustic models.
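The phoneme-sequence alignment mentioned above rests on dynamic time warping; the sketch below shows the classic DTW recurrence on two short 1-D sequences. The sequences and the absolute-difference cost are illustrative assumptions, not the paper's features.

```python
# Classic DTW cost between two 1-D sequences (illustrative stand-ins for
# phoneme-level feature sequences of a native and an L2 speaker).
import numpy as np

def dtw(a: np.ndarray, b: np.ndarray) -> float:
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])

native = np.array([1.0, 2.0, 3.0, 2.0])
learner = np.array([1.1, 1.9, 2.1, 3.2, 2.0])
print(dtw(native, learner))
```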
26

Vishwakarma, Agrani. "Enhanced Speaker Recognition System." International Journal for Research in Applied Science and Engineering Technology 6, no. 5 (May 31, 2018): 1544–50. http://dx.doi.org/10.22214/ijraset.2018.5251.

27

Wang, William S. Y., and M. R. Schroeder. "Speech and Speaker Recognition." Language 62, no. 3 (September 1986): 706. http://dx.doi.org/10.2307/415501.

28

Hanna, S. A., and Ann Stuart Laubstein. "Speaker‐independent sound recognition." Journal of the Acoustical Society of America 92, no. 4 (October 1992): 2475–76. http://dx.doi.org/10.1121/1.404442.

29

Rodrigues, Luis, and John Kroeker. "Articulatory-based speaker recognition." Journal of the Acoustical Society of America 133, no. 5 (May 2013): 3300. http://dx.doi.org/10.1121/1.4805455.

30

Campbell, J. P. "Speaker recognition: a tutorial." Proceedings of the IEEE 85, no. 9 (1997): 1437–62. http://dx.doi.org/10.1109/5.628714.

31

Hickt, L. "Speech and speaker recognition." Signal Processing 13, no. 3 (October 1987): 336–38. http://dx.doi.org/10.1016/0165-1684(87)90137-x.

32

Singh, Nilu, R. A. Khan, and Raj Shree. "Applications of Speaker Recognition." Procedia Engineering 38 (2012): 3122–26. http://dx.doi.org/10.1016/j.proeng.2012.06.363.

33

Jin, Qin, Tanja Schultz, and Alex Waibel. "Far-Field Speaker Recognition." IEEE Transactions on Audio, Speech and Language Processing 15, no. 7 (September 2007): 2023–32. http://dx.doi.org/10.1109/tasl.2007.902876.

34

Xu, Hui Hong, and Su Chun Gao. "Speaker Recognition Study Based on Optimized Baum-Welch Algorithm." Applied Mechanics and Materials 543-547 (March 2014): 2471–73. http://dx.doi.org/10.4028/www.scientific.net/amm.543-547.2471.

Abstract:
Speaker recognition is a type of biometric recognition technology that identifies a person by his or her voice. This article implements a speaker recognition function on the VC platform using VQ and HMM techniques, and uses a genetic algorithm to improve the Baum-Welch algorithm. Experiments verify that the improved algorithm enhances the recognition effect.
35

Thủy, Đào Thị Lệ, Trinh Van Loan, and Nguyen Hong Quang. "GMM FOR EMOTION RECOGNITION OF VIETNAMESE." Journal of Computer Science and Cybernetics 33, no. 3 (March 20, 2018): 229–46. http://dx.doi.org/10.15625/1813-9663/33/3/11017.

Abstract:
This paper presents the results of GMM-based recognition for four basic emotions of Vietnamese speech: neutral, sadness, anger, and happiness. The characteristic parameters of these emotions are extracted from speech signals and divided into different parameter sets for the experiments. The experiments are carried out under speaker-dependent or speaker-independent and content-dependent or content-independent recognition conditions. The results showed that the recognition scores are rather high when there is a full combination of parameters: MFCC and its first and second derivatives, fundamental frequency, energy, formants and their corresponding bandwidths, spectral characteristics, and F0 variants. On average, the speaker-dependent and content-dependent recognition score is 89.21%. The average score is 82.27% for speaker-dependent and content-independent recognition. For speaker-independent and content-dependent recognition, the average score is 70.35%, and the average score is 66.99% for speaker-independent and content-independent recognition. Information on F0 significantly increased the recognition score.
36

Kholiev, Vladislav O., and Olesia Yu Barkovska. "Improved Speaker Recognition System Using Automatic Lip Recognition." Control Systems and Computers, no. 1 (305) (2024): 38–49. http://dx.doi.org/10.15407/csc.2024.01.038.

Abstract:
The paper is focused on the relevant problem of speech recognition using additional sources besides the voice itself, under conditions in which the quality or availability of audio information is inadequate (for example, in the presence of noise or additional speakers). This is achieved by using automatic lip reading (ALR) methods, which rely on non-acoustic biosignals generated by the human body during speech production. Among the applications of this approach are medical applications, as well as processing voice commands in languages with poor audio conditions. The aim of this work is to create a system for speech recognition based on a combination of a silent speech interface (SSI) and context prediction. To achieve this goal, the following tasks were performed: substantiating a system for recognizing voice commands of a silent speech interface (SSI) based on a combination of two neural network architectures, and implementing a viseme recognition model based on a CNN architecture together with an encoder-decoder LSTM recurrent network model for analyzing and predicting the context of a speaker's speech. The developed system was tested on a chosen dataset. The results show that, across 7 experiments, the recognition error under different conditions averages from 4.34% to 5.12% CER and from 5.52% to 6.06% WER for the proposed ALR system, which is an advantage over the LipNet project, which additionally processes audio data, on the original recordings without noise.
37

Singh, Mahesh K., S. Manusha, K. V. Balaramakrishna, and Sridevi Gamini. "Speaker Identification Analysis Based on Long-Term Acoustic Characteristics with Minimal Performance." International Journal of Electrical and Electronics Research 10, no. 4 (December 30, 2022): 848–52. http://dx.doi.org/10.37391/ijeer.100415.

Abstract:
The identity of a speaker depends on the phonological properties acquired from speech. Mel-Frequency Cepstral Coefficients (MFCC) are well researched for deriving acoustic characteristics. This speaker model is based on a sparse representation and the characteristics of the acoustic features, which are derived from the speaker model and the cartographic representation by the MFCCs. The MFCC is used for text-independent speaker monitoring. Because speaker recognition from sparse representations is problematic, the Gaussian Mixture Model (GMM) mean supervector kernel is proposed for training. Unknown vector modules are resolved using sparsity, and the experiments are based on the TIMIT database. The i-vector algorithm is proposed for the effective improvement of ASR (Automatic Speaker Recognition). Atom Aligned Sparse Representation (AASR) is used to describe the speaker-based model, and Sparse Representation Classification (SRC) is used to describe the speaker recognition report. A robust sparse coding based on Maximum Likelihood Estimation (MLE) is used to address the problem of sparse representation. Robust speaker verification is based on a sparse representation of GMM supervectors.
38

Lim, Seunguook, and Jihie Kim. "SAPBERT: Speaker-Aware Pretrained BERT for Emotion Recognition in Conversation." Algorithms 16, no. 1 (December 22, 2022): 8. http://dx.doi.org/10.3390/a16010008.

Abstract:
Emotion recognition in conversation (ERC) is receiving more and more attention as interactions between humans and machines increase in a variety of services such as chatbots and virtual assistants. As emotional expressions within a conversation can heavily depend on the contextual information of the participating speakers, it is important to capture self-dependency and inter-speaker dynamics. In this study, we propose a new pre-trained model, SAPBERT, that learns to identify speakers in a conversation to capture speaker-dependent contexts and address the ERC task. SAPBERT is pre-trained with three training objectives: Speaker Classification (SC), Masked Utterance Regression (MUR), and Last Utterance Generation (LUG). We investigate whether our pre-trained speaker-aware model can be leveraged to capture speaker-dependent contexts for ERC tasks. Experiments show that our proposed approach outperforms baseline models, demonstrating the effectiveness and validity of our method.
39

Scruggs, Jeffrey L. "Speaker-dependent speech recognition using speaker independent models." Journal of the Acoustical Society of America 104, no. 5 (November 1998): 2558. http://dx.doi.org/10.1121/1.423803.

40

Lee, Bong-Jin, Jeung-Yoon Choi, and Hong-Goo Kang. "Phonetically optimized speaker modeling for robust speaker recognition." Journal of the Acoustical Society of America 126, no. 3 (September 2009): EL100—EL106. http://dx.doi.org/10.1121/1.3204765.

41

Misra, Hemant, Shajith Ikbal, and B. Yegnanarayana. "Speaker-specific mapping for text-independent speaker recognition." Speech Communication 39, no. 3-4 (February 2003): 301–10. http://dx.doi.org/10.1016/s0167-6393(02)00046-8.

42

Marini, Marco, Nicola Vanello, and Luca Fanucci. "Optimising Speaker-Dependent Feature Extraction Parameters to Improve Automatic Speech Recognition Performance for People with Dysarthria." Sensors 21, no. 19 (September 27, 2021): 6460. http://dx.doi.org/10.3390/s21196460.

Abstract:
Within the field of Automatic Speech Recognition (ASR) systems, facing impaired speech is a big challenge because standard approaches are ineffective in the presence of dysarthria. The first aim of our work is to confirm the effectiveness of a new speech analysis technique for speakers with dysarthria. This new approach exploits the fine-tuning of the size and shift parameters of the spectral analysis window used to compute the initial short-time Fourier transform, to improve the performance of a speaker-dependent ASR system. The second aim is to define if there exists a correlation among the speaker’s voice features and the optimal window and shift parameters that minimises the error of an ASR system, for that specific speaker. For our experiments, we used both impaired and unimpaired Italian speech. Specifically, we used 30 speakers with dysarthria from the IDEA database and 10 professional speakers from the CLIPS database. Both databases are freely available. The results confirm that, if a standard ASR system performs poorly with a speaker with dysarthria, it can be improved by using the new speech analysis. Otherwise, the new approach is ineffective in cases of unimpaired and low impaired speech. Furthermore, there exists a correlation between some speaker’s voice features and their optimal parameters.
43

Janybekova, S. T., G. A. Tolganbayeva, and A. A. Sarsembayev. "Распознавание говорящего с помощью глубокого обучения [Speaker recognition using deep learning]." INTERNATIONAL JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, no. 6(6) (March 14, 2022): 85–91. http://dx.doi.org/10.54309/ijict.2022.2.6.011.

Abstract:
This paper discusses the transition from traditional methods to novel deep learning architectures for speaker recognition. The article aims to compare traditional statistical methods and new approaches using deep learning models. To articulate the difference between the discussed approaches, it furthermore describes several recent optimization methods and, since the approaches differ, the several evaluation techniques that exist for them. The review covers the datasets used, the results, the contributions made toward speaker recognition, and the related limitations.
44

Weychan, Radoslaw, Tomasz Marciniak, Agnieszka Stankiewicz, and Adam Dabrowski. "Real Time Recognition Of Speakers From Internet Audio Stream." Foundations of Computing and Decision Sciences 40, no. 3 (September 1, 2015): 223–33. http://dx.doi.org/10.1515/fcds-2015-0014.

Abstract:
In this paper we present an automatic speaker recognition technique that uses lossy (encoded) speech signal streams from Internet radio. We show the influence of the audio encoder (e.g., its bitrate) on the speaker model quality. The model of each speaker was calculated using the Gaussian mixture model (GMM) approach. Both the speaker recognition and the further analysis were realized with short utterances to facilitate real-time processing. The neighborhoods of the speaker models were analyzed with the ISOMAP algorithm. The experiments were based on four 1-hour public debates with 7–8 speakers (including the moderator), acquired from Polish Internet radio services. The presented software was developed in the MATLAB environment.
45

Sarmah, Kshirod. "Speaker Diarization with Deep Learning Techniques." Turkish Journal of Computer and Mathematics Education (TURCOMAT) 11, no. 3 (December 15, 2020): 2570–82. http://dx.doi.org/10.61841/turcomat.v11i3.14309.

Abstract:
Speaker diarization is the task of identifying speakers when different speakers have spoken in an audio or video recording. Artificial intelligence (AI) fields have effectively used Deep Learning (DL) to solve a variety of real-world application challenges. With effective applications in a wide range of subdomains, such as natural language processing, image processing, computer vision, speech and speaker recognition, emotion recognition, cyber security, and many others, DL, a very innovative field of Machine Learning (ML), is quickly emerging as the most potent machine learning technique. DL techniques have recently outperformed conventional approaches in speaker diarization as well as speaker recognition. Speaker diarization assigns classes corresponding to speaker identity to segments of a speech recording, allowing one to determine who spoke when; it is a crucial step in speech processing that divides an audio recording into different speaker regions. An in-depth analysis of speaker diarization utilizing a variety of deep learning algorithms is presented in this research paper. NIST-2000 CALLHOME and our in-house database ALSD-DB are the two voice corpora used for this study's tests. TDNN-based embeddings with x-vectors, LSTM-based embeddings with d-vectors, and finally a fusion of both x-vector and d-vector embeddings are used in the tests for the basic system. For the NIST-2000 CALLHOME database, the LSTM-based embeddings with d-vectors and the embedding fusion of x-vectors and d-vectors show improved performance, with DERs of 8.25% and 7.65%, respectively, and of 10.45% and 9.65% on the local ALSD-DB database.
46

Mami, Yassine, and Delphine Charlet. "Speaker recognition by location in the space of reference speakers." Speech Communication 48, no. 2 (February 2006): 127–41. http://dx.doi.org/10.1016/j.specom.2005.06.014.

47

Gupta, Manish, Shambhu Shankar Bharti, and Suneeta Agarwal. "Gender-based speaker recognition from speech signals using GMM model." Modern Physics Letters B 33, no. 35 (December 16, 2019): 1950438. http://dx.doi.org/10.1142/s0217984919504384.

Abstract:
Speech is a convenient medium for communication among human beings. Speaker recognition is the process of automatically recognizing a speaker by processing the information included in the speech signal. In this paper, a new two-level approach is proposed for speaker recognition from the speech signal. In the first level, the gender of the speaker is recognized, and in the second level the speaker is recognized based on the gender identified at the first level. After recognizing the gender of the speaker, the search space for the second level is reduced to half, as the speaker recognition system searches only the set of speech signals belonging to the identified gender. To identify gender, the gender-specific features Mel Frequency Cepstral Coefficients (MFCC) and pitch are used. The speaker is then recognized using the speaker-specific features MFCC, pitch, and RASTA-PLP. Support Vector Machine (SVM) and Gaussian Mixture Model (GMM) classifiers are used for identifying the gender and recognizing the speaker, respectively. Experiments are performed on speech signals from two databases: "IIT-Madras speech synthesis and recognition" (containing speech samples spoken in English by eight male and eight female speakers from eight different regions) and "ELSDSR" (containing speech samples spoken in English by five males and five females). It is observed experimentally that the two-level approach reduces the time taken for speaker recognition by 30–32% compared to the approach in which the speaker is recognized without identifying the gender (single-level approach). The accuracy of speaker recognition in the proposed approach also improves from 99.7% to 99.9% compared to the single-level approach. It is concluded through the experiments that a speech signal of a minimum 1.12 duration (after neglecting silence parts) is sufficient for recognizing the speaker.
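The two-level search-space reduction can be sketched as below: an SVM first predicts gender from a mean feature vector, then only the speaker models of that gender are scored. All data, sizes, and the use of scikit-learn GMMs are assumptions for illustration, not the paper's implementation.

```python
# Illustrative two-level recognition: gender SVM, then per-speaker GMM scoring
# restricted to the predicted gender. Synthetic frames stand in for MFCCs.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.svm import SVC

rng = np.random.default_rng(0)
frames = {spk: rng.normal(loc=i, size=(300, 13))
          for i, spk in enumerate(["m1", "m2", "f1", "f2"])}
gender = {"m1": 0, "m2": 0, "f1": 1, "f2": 1}

# Level 1: gender classifier on utterance-mean features.
X = np.array([f.mean(axis=0) for f in frames.values()])
svm = SVC().fit(X, [gender[s] for s in frames])

# Level 2: one GMM per speaker, scored only within the predicted gender.
gmms = {s: GaussianMixture(n_components=4, random_state=0).fit(f)
        for s, f in frames.items()}

test = frames["f1"][:50]
g = svm.predict(test.mean(axis=0, keepdims=True))[0]
candidates = [s for s in frames if gender[s] == g]   # search space halved
print(max(candidates, key=lambda s: gmms[s].score(test)))
```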
48

Al-hazaimeh, Obaida, Saleh Ali Alomari, Jamal Alsakran, and Nouh Alhindawi. "Across correlation - new based technique for speaker recognition." International Journal of Academic Research 6, no. 3 (May 30, 2014): 232–39. http://dx.doi.org/10.7813/2075-4124.2014/6-3/a.33.

49

Harish, Dasari. "Speaker Recognition Using MFCC-BPNN-HHO." INTERNATIONAL JOURNAL OF SCIENTIFIC RESEARCH IN ENGINEERING AND MANAGEMENT 08, no. 04 (April 14, 2024): 1–5. http://dx.doi.org/10.55041/ijsrem30717.

Abstract:
Speaker recognition plays a pivotal role in speech processing. This paper proposes an enhancement to the Backpropagation Neural Network (BPNN) by incorporating Harris Hawks Optimization (HHO) for weight optimization, and evaluates its performance compared to the standalone BPNN. Both methods employ Mel Frequency Cepstral Coefficients (MFCC) for feature extraction from input data. The study assesses the proposed system on a dataset comprising 10 speakers, with each providing 10 utterances. Results demonstrate that the integrated MFCC-BPNN-HHO approach outperforms the standalone BPNN, achieving enhanced accuracy in speaker recognition tasks. Specifically, the accuracy of the BPNN-HHO was found to be significantly higher than that of the BPNN alone, indicating the effectiveness of the HHO optimization technique in improving speaker recognition accuracy. This study underscores the potential of integrating optimization algorithms like HHO with BPNN to further refine speaker recognition systems and contribute to advancements in speech processing technology. This approach has promising applications in access control, identity verification and other security-related domains where biometric authentication is essential. Keywords—MFCC, BPNN, HHO
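Without the HHO weight search (which would wrap training in a metaheuristic loop), the MFCC-plus-backpropagation baseline can be sketched with a small MLP; the synthetic features and the layer size are assumptions.

```python
# Baseline sketch: mean-MFCC vectors classified by a backpropagation network.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 13))      # stand-in mean MFCC vector per utterance
y = np.repeat(np.arange(10), 10)    # 10 speakers x 10 utterances, as in the paper

bpnn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
bpnn.fit(X, y)
print(bpnn.score(X, y))             # training accuracy on the toy data
```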
50

Radan, N. H., and K. Sidorov. "Development and Research of a System for Automatic Recognition of the Digits Yemeni Dialect of Arabic Speech Using Neural Networks." Proceedings of Telecommunication Universities 9, no. 5 (November 14, 2023): 35–42. http://dx.doi.org/10.31854/1813-324x-2023-9-5-35-42.

Abstract:
The article describes the results of research on the development and testing of an automatic speech recognition system (SAR) for Arabic digits using artificial neural networks. Sound recordings (speech signals) of the Yemeni dialect of Arabic, recorded in the Republic of Yemen, were used for the research. The SAR is an isolated whole-word recognition system, implemented in two modes: a "speaker-dependent system" (the same speakers are used for training and testing the system) and a "speaker-independent system" (the speakers used for training the system differ from those used for testing it). In the recognition process, the speech signal is cleaned of noise using filters; the signal is then pre-localized, processed, and analyzed with a Hamming window (a time alignment algorithm is used to compensate for differences in pronunciation). Informative features are extracted from the speech signal using mel-frequency cepstral coefficients. The developed SAR provides high recognition accuracy for Arabic digits in the Yemeni dialect: 96.2% (for the speaker-dependent system) and 98.8% (for the speaker-independent system).
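The preprocessing chain described above (framing with a Hamming window, then MFCC extraction) can be sketched as follows; the 25 ms window, 10 ms shift, and fake one-second signal are assumed values for illustration.

```python
# Sketch of Hamming-window framing + MFCC extraction; the signal is synthetic.
import numpy as np
import librosa

sr, win, hop = 16000, 400, 160      # 25 ms window, 10 ms shift at 16 kHz
y = np.random.default_rng(0).normal(size=sr)  # 1 s stand-in for a spoken digit

frames = librosa.util.frame(y, frame_length=win, hop_length=hop).T
windowed = frames * np.hamming(win)  # one Hamming window per frame
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=win, hop_length=hop)
print(windowed.shape, mfcc.shape)
```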