

Dissertations / Theses on the topic 'MFCC'

Author: Grafiati

Published: 4 June 2021

Last updated: 26 October 2023


Consult the top 50 dissertations / theses for your research on the topic 'MFCC.'


1

Mukherjee, Rishiraj. "Speaker Recognition Using Shifted MFCC." Scholar Commons, 2012. http://scholarcommons.usf.edu/etd/4136.


Abstract:

Speaker recognition is the task of identifying a speaker from a given database using speech as the only input. This thesis presents a novel approach to speaker detection. It introduces the concept of shifted MFCC to improve on previous work, which achieved an accuracy of about 95% at best. It also discusses additional parameters that contributed to improving the efficiency of speaker recognition. The algorithm is tested on both text-dependent and text-independent speech data, and the technique was evaluated on the TIDIGIT database. To further increase the speaker recognition rate at lower false acceptance rates (FARs), accent information was combined with pitch and higher-order formants. Possible application areas for this work include any access control entry system; nowadays, a lot of smart phones, laptops, operating systems, etc. have […]. Also, in homeland security applications, speaker accent will play a critical role in the evaluation of biometric systems, since users will be international in nature. Incorporating accent information into the speaker recognition/verification system is therefore a key component of this study. The accent incorporation method and shifted MFCC techniques discussed in this work can also be applied to other speaker recognition systems.


2

Tolunay, Atahan. "Text-Dependent Speaker Verification Implemented in Matlab Using MFCC and DTW." Thesis, Linköpings universitet, Informationskodning, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-60992.


Abstract:

Even though speaker verification is a broad subject, implementations for commercial and personal use are rare. Several problems need to be solved before speaker verification can become more useful. The number of pattern matching and feature extraction techniques is large, and the decision on which ones to use is debatable. One of the main problems of speaker verification in general is the impact of noise. The very popular feature extraction technique MFCC is inherently sensitive to mismatch between training and verification conditions. MFCC is used in many speech recognition applications and is useful beyond text-dependent speaker verification. However, the most reliable verification techniques are text-dependent. One of the most popular pattern matching techniques in text-dependent speaker verification is DTW. Although it has limitations outside text-dependent applications, it is a reliable way of matching templates even with a limited amount of training material. The signal processing techniques MFCC and DTW are explained and discussed in detail, along with a Matlab program in which these techniques have been implemented. The choices made in signal processing, feature extraction and pattern matching are based on discussions of available studies on these topics. The results indicate that it is possible to program text-dependent speaker verification systems that are functional in clean conditions with tools like Matlab.
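The template matching step named in this abstract can be illustrated with a minimal dynamic time warping (DTW) sketch. It is written in Python rather than the Matlab used in the thesis, and the Euclidean frame distance, the unconstrained step pattern and the toy one-dimensional "features" are assumptions made for illustration, not details taken from the thesis.

```python
import math

def dtw_distance(a, b):
    """Accumulated DTW cost between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])  # Euclidean distance between frames
            # extend the cheapest of the three allowed predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Toy feature sequences (hypothetical; real systems would use MFCC vectors).
template = [[0.0], [1.0], [2.0], [1.0]]
same_word_slower = [[0.0], [0.0], [1.0], [2.0], [2.0], [1.0]]
other_word = [[2.0], [2.0], [0.0], [0.0]]

print(dtw_distance(template, same_word_slower))
print(dtw_distance(template, other_word))
```

A verification decision would then compare the accumulated cost against a threshold; how that threshold is chosen is outside what the abstract states.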


3

Krotký, Jan. "Dekodér pro systém detekce klíčových slov." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218176.


Abstract:

The thesis presents the basic characteristics of human speech recognition, describes systems for keyword detection, and then deals with the design of the individual decoder blocks, divided into three chapters. The first chapter describes the operations performed before the signal is divided into frames, and the segmentation itself. The second chapter describes the calculation of short-term energy, the zero-crossing count, and autocorrelation, linear prediction and Mel-frequency cepstral coefficients. The third chapter, which describes the design of the decoder block, covers dynamic time warping and the method based on hidden Markov models. The final part of the thesis describes decoders working with speech and proposes a simple decoder for isolated words, which was designed and tested on the basis of the preceding chapters.
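Two of the frame-level measures listed in this abstract, short-term energy and the zero-crossing count, are simple enough to sketch directly. The frame length, hop size and the convention of treating zero as non-negative are assumptions chosen for illustration.

```python
def split_frames(signal, frame_len, hop):
    """Cut a signal into (possibly overlapping) frames of equal length."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(x * x for x in frame)

def zero_crossings(frame):
    """Number of sign changes between consecutive samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

# Toy signal: a rapidly alternating segment followed by a quiet, steady one.
signal = [1.0, -1.0, 1.0, -1.0, 0.1, 0.1, 0.1, 0.1]
for frame in split_frames(signal, frame_len=4, hop=4):
    print(short_time_energy(frame), zero_crossings(frame))
```

High energy with few zero crossings is a common heuristic for voiced speech, while low energy suggests silence; the decoder in the thesis may combine the measures differently.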


4

Mubarak, Omer Mohsin, Electrical Engineering & Telecommunications, Faculty of Engineering, UNSW. "Speech and music discrimination using short-time features." Awarded by: University of New South Wales. Electrical Engineering & Telecommunications, 2006. http://handle.unsw.edu.au/1959.4/31954.


Abstract:

This thesis addresses the problem of classifying an audio stream as either speech or music, an issue that is receiving increasing attention due to its wide range of applications. Various techniques have been presented in the last decade to discriminate between speech and music. However, their accuracy is still not sufficient, since music can refer to a very broad class of signals due to the large number of musical instruments found in audio data. Performance can be further compromised in noisy conditions, which are unavoidable in some practical situations. This thesis presents an analysis of the feature extraction techniques and classifiers currently in use, followed by the proposal and evaluation of new features for improved classification. These include two novel cepstral features, delta cepstral energy and power spectrum deviation, along with amplitude and frequency modulation features. The modified group delay feature, initially proposed for speech recognition, is also investigated for speech and music discrimination. Experiments were performed using different sets of features, compared among themselves and with conventional MFCCs using error rate criteria and Detection Error Trade-off curves. The proposed cepstral and modulation features are shown to increase the accuracy of the conventional MFCC-based system. However, the modified group delay feature, which has been shown to improve accuracy for speech classification problems, does not contribute much to speech and music discrimination. Among the configurations presented, the optimum one, both modulation features combined with MFCC, resulted in an overall error rate of 6.57%, compared with 7.43% for MFCC alone.
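To put the two error rates quoted at the end of this abstract in perspective, the short calculation below derives the absolute and relative error reduction. Only the two percentages come from the abstract; the variable names are illustrative.

```python
baseline_error = 7.43   # % overall error rate, MFCC alone (from the abstract)
combined_error = 6.57   # % overall error rate, MFCC plus both modulation features

absolute_gain = baseline_error - combined_error
relative_reduction = absolute_gain / baseline_error

print(f"{absolute_gain:.2f} percentage points, {relative_reduction:.1%} relative reduction")
```

The improvement is 0.86 percentage points, which corresponds to a relative error reduction of about 11.6%.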


5

Pan, Linlin. "Research and simulation on speech recognition by Matlab." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-16950.


Abstract:

With the development of multimedia technology, speech recognition has become a research hotspot in recent years, with a wide range of applications. Recognition of a speaker's identity can be classified into identification and verification according to the decision mode. The main work of this thesis is to study the techniques and algorithms of speech recognition and to build a feasible system that simulates it. The research work and achievements are as follows. First, the author surveyed the field of speech recognition. The many existing algorithms can be divided into two categories: direct speech recognition, in which the method recognizes the words directly, and recognition based on a trained model. Second, a usable and reasonable algorithm was selected and studied: the author studied algorithms that extract a word's characteristic parameters based on MFCC (Mel-frequency cepstral coefficients) and train on those parameters with a GMM (Gaussian mixture model). Third, the author wrote a MATLAB program that implements the speech recognition algorithm, using the speech processing toolbox. The whole system comprises signal processing, MFCC characteristic parameter extraction and GMM training modules. Fourth, the results were simulated and analysed. The MATLAB system reads the WAV file, plays it, and then calculates the characteristic parameters automatically. All content of the speech signal is distinguished in the last step.
The author recorded speech from different people to test the system. The simulation results show that when the testing environment is quiet enough and the same speaker makes all 20 recordings, the performance of the algorithm approaches 100% for pairs of words with different and with the same syllables. The result degrades, however, when the test signal contains a certain level of noise, and the simulation system does not produce good output when the reference and test signals are recorded by different speakers.
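The MFCC feature extraction stage mentioned in this abstract is commonly described as windowing, power spectrum, mel filterbank, logarithm and discrete cosine transform. The sketch below follows that common textbook pipeline for a single frame in plain Python; it is not the thesis's MATLAB code, and the filter count, coefficient count, Hamming window and naive DFT are all assumptions made to keep the example self-contained. The GMM training stage is not shown.

```python
import cmath
import math

def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """Hamming-window the frame, then return |DFT|^2 for bins 0..N/2 (naive DFT)."""
    n = len(frame)
    windowed = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        s = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(windowed))
        spec.append(abs(s) ** 2)
    return spec

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bank = []
    for f in range(1, n_filters + 1):
        lo, mid, hi = edges[f - 1], edges[f], edges[f + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            hz = k * sample_rate / n_fft
            if lo < hz <= mid:
                row.append((hz - lo) / (mid - lo))   # rising slope
            elif mid < hz < hi:
                row.append((hi - hz) / (hi - mid))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, sample_rate, n_filters=20, n_coeffs=12):
    """MFCCs of one frame: log mel filterbank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # small constant avoids log(0) for empty or silent bands
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10) for row in bank]
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]

# Usage on a synthetic 440 Hz tone (hypothetical test signal, 8 kHz sampling).
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coeffs = mfcc(frame, 8000)
print(len(coeffs))
```

In a real system the naive DFT would be replaced by an FFT and the function applied to every overlapping frame of the recording, giving one coefficient vector per frame as input to the model training step.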


6

SIQUEIRA, JAN KRUEGER. "CONTINUOUS SPEECH RECOGNITION WITH MFCC, SSCH AND PNCC FEATURES, WAVELET DENOISING AND NEURAL NETWORKS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2011. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=19143@1.


Abstract:

CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
One of the biggest challenges in the field of continuous speech recognition is to develop systems that are robust to additive noise. To that end, this work analyses and tests three techniques. The first extracts features from the voice signal using the MFCC, SSCH and PNCC methods. The second removes noise from the voice signal through wavelet denoising. The third is an original proposal, called feature denoising, that seeks to improve the extracted features using a set of neural networks. Although some of these techniques are already known in the literature, their combination yields several interesting and new results. In particular, the best performance comes from combining PNCC with feature denoising.
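Wavelet denoising, the second technique in this abstract, generally means transforming the signal, shrinking small detail coefficients and inverting the transform. The sketch below shows the idea with a one-level Haar transform and soft thresholding; the wavelet family, the number of levels and the threshold rule actually used in the thesis are not stated in the abstract, so all three are assumptions here.

```python
import math

def haar_denoise(signal, threshold):
    """One-level Haar transform, soft-threshold the details, then invert."""
    assert len(signal) % 2 == 0, "sketch assumes an even number of samples"
    s = math.sqrt(2.0)
    # pairwise averages (approximation) and differences (detail)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    # soft thresholding: shrink every detail coefficient towards zero
    shrunk = [math.copysign(max(abs(d) - threshold, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, shrunk):
        out.extend([(a + d) / s, (a - d) / s])
    return out

noisy = [1.0, 1.2, 5.0, 5.0]
print(haar_denoise(noisy, threshold=10.0))  # details removed: each pair is averaged
print(haar_denoise(noisy, threshold=0.0))   # no shrinkage: input is reconstructed
```

With a threshold of zero the transform is perfectly invertible, so any change in the output comes from the shrinkage step alone.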


7

Dobrovolskis, Martynas. "Šnekos atpažinimas." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2005. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2005~D_20050614_154005-58155.


Abstract:

Voice recognition technologies appeared in a period of general device miniaturization, when technologies were commonly integrated into a single unit. There is no longer space for buttons and displays. A good Lithuanian speech recognition system requires a number of thorough studies. Only after selecting the most efficient speech recognition scheme can software suited to present-day needs be developed. The aim of this paper is to determine how efficiently speech can be recognized using neural networks. MFCC and LPC coefficients were chosen as the parameters characterizing the phonemes. The paper attempts to determine which coefficients lead to the most efficient recognition of phonemes. The programs PRAAT and MatLab were used for testing. A number of phoneme recognition experiments led to the following conclusions: 1. When a neural network is used to recognize isolated sounds and the phonemes are characterized by MFCC or LPC coefficients, the recognition rate does not exceed 90 per cent. This is not enough for quality recognition of Lithuanian speech. 2. With MFCC coefficients, separate phonemes are recognized better than with LPC coefficients. The difference is about 15 per cent. 3. The advantage of LPC coefficients over MFCC is a more even recognition-rate curve...


8

Julien, Eric. "Alignement du chant par rapport à une référence audio en temps réel." Mémoire, Université de Sherbrooke, 2013. http://hdl.handle.net/11143/6184.


Abstract:

With a view to creating a karaoke system that modifies an a cappella sung performance in real time, it is necessary to locate the performer relative to a reference in order to determine the target of a voice modification algorithm. For such a system to work well, the alignment algorithm must exploit the specific characteristics of the voice as fully as possible, use information tied to the spoken text rather than to the artistic aspects of singing, run in real time, and offer the lowest possible latency. To meet these objectives, an alignment system based on Dynamic Time Warping (DTW) was developed. A simple real-time adaptation of the ordinary DTW algorithm that meets the stated objectives is proposed and compared with other approaches reported in the literature. This adaptation gave better results than the other techniques tested. A comparative study of three types of spectral analysis commonly used in automatic speech recognition systems was carried out in the specific context of a singing-voice alignment algorithm. The coefficients evaluated are Mel-frequency Cepstrum Coefficients (MFCC), Warped Discrete Cosine Transform Coefficients (WDCTC) and the coefficients of Perceptual Linear Prediction (PLP) analysis. The results indicate better performance for PLP analysis. Applying a piecewise-linear transformation function to the instantaneous cost matrices makes the alignment more easily distinguishable in the computed cumulative cost matrices. The parameters of the transformation function can be obtained by closed-loop optimization using direct pattern search. An objective function that avoids discontinuities in the mean squared deviation of the alignment is developed.
Several cost matrices can be combined by taking a weighted sum of the transformed instantaneous cost matrices of each parameter considered. The weighting is also obtained by optimization. Several combinations are compared: the best results are obtained with a combination of PLP analysis, the energy level, and their derivatives. The mean deviation from the reference alignment is on the order of 50 ms, with a standard deviation of about 75 ms for the sequences tested. Directions are given for improving the convergence of the algorithm on pairs of audio sequences that are difficult to align, and for obtaining better cost matrices by using other local constraints, by integrating new parameters such as pitch, or by using a database of segmented singing voice to optimize a distance measure.
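The combination step described in this abstract, a weighted sum of transformed instantaneous cost matrices, can be sketched as follows. The two-segment shape of the piecewise-linear function, its parameter values and the weights are hypothetical; in the thesis they are obtained by optimization and are not given in the abstract.

```python
def piecewise_linear(x, knee, low_slope, high_slope):
    """Hypothetical two-segment transform applied to one instantaneous cost."""
    if x <= knee:
        return low_slope * x
    return low_slope * knee + high_slope * (x - knee)

def combine_costs(matrices, weights, transform):
    """Weighted sum of element-wise transformed cost matrices of equal shape."""
    rows, cols = len(matrices[0]), len(matrices[0][0])
    return [[sum(w * transform(m[i][j]) for m, w in zip(matrices, weights))
             for j in range(cols)]
            for i in range(rows)]

# Toy 2x2 instantaneous cost matrices for two feature types (made-up values).
plp_cost = [[0.0, 2.0], [2.0, 0.0]]
energy_cost = [[1.0, 1.0], [1.0, 1.0]]

combined = combine_costs(
    [plp_cost, energy_cost],
    weights=[0.75, 0.25],
    transform=lambda x: piecewise_linear(x, knee=1.0, low_slope=0.5, high_slope=2.0),
)
print(combined)
```

The combined matrix would then be fed to the DTW accumulation in place of a single-feature cost matrix.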


9

Martins, Ana Caroline Vasconcelos. "GluA2 - Glutamatergic Receptor Study: A Molecular Approach." reponame:Repositório Institucional da UFC, 2017. http://www.repositorio.ufc.br/handle/riufc/28258.


Abstract:

Glutamate receptors are the mediators of most excitatory neurotransmission processes in the central nervous system, acting as prominent targets for the treatment of several neurological disorders such as epilepsy, amyotrophic lateral sclerosis, Parkinson's disease and Alzheimer's disease. An improved understanding of how glutamate and other ligands interact with the binding domain of these receptors can therefore bring relevant insights to the development of new ligands. This work aims to study the GluA2–ligand interaction using the structure of GluA2 co-crystallized with the ligands glutamate, AMPA, kainate and DNQX, applying a method based on Density Functional Theory combined with the molecular fractionation with conjugate caps scheme. To account for the fact that the dielectric constant of the GluA2 receptor is not homogeneous, a novel molecular approach was proposed and applied to study the interaction between GluA2 and the ligands glutamate, AMPA, kainate and DNQX. The results obtained with the inhomogeneous model were compared with those obtained using a uniform dielectric function for the GluA2 receptor and with data published in the literature, establishing a more detailed description of the amino acid residues relevant to the protein-ligand binding interaction. Molecular dynamics studies and protein DFT calculations usually consider a fixed value for the protein dielectric function. In this work, when ε = 1 is considered, many amino acid residues seem important, but when the shielding of the dielectric constant was considered, they lost their relevance. The results for the GluA2-ligand total interaction energy and the D1-ligand and D2-ligand total interaction energies also shed some light on the differentiation between full and partial agonists, and between agonists and antagonists.
Additionally, the results allow a hypothesis on the correlation between the Glu705-ligand interaction energy and the ligand action, paving the way for the use of the inhomogeneous dielectric function to study glutamate receptors and other protein-ligand systems. Finally, the results also suggest that, for different ligands, different homogeneous dielectric constants will be able to represent the GluA2-ligand system well, which makes a prior analysis with the inhomogeneous dielectric constant approach necessary.


10

SILVA, HARRY ARNOLD ANACLETO. "INDEPENDENT TEXT ROBUST SPEAKER RECOGNITION IN THE PRESENCE OF NOISE USING PAC-MFCC AND SUB BAND CLASSIFIERS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2011. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=18212@1.


Abstract:

COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
This work proposes the PAC-MFCC feature operating with sub-band classifiers for the task of text-independent speaker identification in noise. The proposed scheme is compared with the features MFCC (Mel-Frequency Cepstral Coefficients), PAC-MFCC (Phase Autocorrelation MFCC) without sub-band classifiers, SSCH (Subband Spectral Centroid Histograms) and TECC (Teager Energy Cepstrum Coefficients). The recognition task used the TIMIT database, which consists of 630 speakers, each speaking 10 utterances of approximately 3 seconds; eight utterances per speaker were used for training and two for testing, giving a total of 1260 test utterances. The performance of these techniques was investigated using different types of noise from the Noisex 92 database at different signal-to-noise ratios. The PAC-MFCC feature with sub-band classifiers was found to achieve the best accuracy among the compared techniques when the signal-to-noise ratio is below 10 dB.
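Evaluations of this kind typically corrupt clean speech with a noise recording scaled to a chosen signal-to-noise ratio. The sketch below shows one standard way to do that scaling; whether the thesis computes power over the whole utterance or over speech-active regions only is not stated, so whole-signal power is an assumption, and the toy signals stand in for TIMIT and Noisex 92 recordings.

```python
import math

def power(x):
    """Mean squared amplitude of a signal."""
    return sum(v * v for v in x) / len(x)

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so that 10*log10(P_speech / P_noise) equals snr_db, then add it."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]

# Toy signals (hypothetical): a sine as "speech", an alternating sequence as "noise".
speech = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
noise = [0.3 if i % 2 == 0 else -0.3 for i in range(100)]

noisy = mix_at_snr(speech, noise, snr_db=10)
added = [y - s for y, s in zip(noisy, speech)]
print(round(10 * math.log10(power(speech) / power(added)), 3))
```

Repeating the mix over a range of SNR values (for example from 20 dB down to 0 dB) gives the kind of accuracy-versus-SNR comparison the abstract refers to.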


11

Anifowose, Olakunle. "DESIGN OF A KEYWORD SPOTTING SYSTEM USING MODIFIED CROSS-CORRELATION IN THE TIME AND THE MFCC DOMAIN." Master's thesis, Temple University Libraries, 2012. http://cdm16002.contentdm.oclc.org/cdm/ref/collection/p245801coll10/id/205117.


Abstract:

Electrical Engineering
M.S.E.E.
A Keyword Spotting System (KWS) is a system that recognizes predefined keywords in spoken utterances or written documents. The objective is to obtain the highest possible keyword detection rate without increasing the number of false detections. The common approach to keyword spotting is the use of a Hidden Markov Model (HMM). These are usually complex systems that require training speech data. The typical HMM approach uses garbage templates or HMM models to match non-keyword speech and non-speech sounds. The purpose of this research is to design a simple Keyword Spotting System. The system is designed to spot English words and should be easily adaptable to other languages. There are many challenges in designing a keyword spotting system, such as variations in pitch, loudness and timbre that make recognition difficult. Utterances can vary widely even for the same speaker. In this research, the use of cross-correlation as an alternative means of detecting keywords in an utterance was investigated. The research also involves modeling a global keyword using a quantized dynamic time warping algorithm, which can function effectively with multiple speakers. The global keyword is an aggregation of the features from several occurrences of the same keyword. The research also investigates the effect of pitch normalization on keyword detection. Cross-correlation as a method for keyword spotting was investigated in both the time and the MFCC domain. In the time domain, the global keyword was cross-correlated with a pitch-normalized utterance. A zero-lag ratio (the ratio of the power around the zero lag of the cross-correlation to the power in the rest of the signal) was computed for each speech frame, and a threshold was then used to determine whether the keyword is present.
In the MFCC domain, the MFCC features of each keyword were computed, normalized and cross-correlated with the normalized MFCC features of portions of the utterance of the same size as the keyword. Cross-correlating the MFCC features of the keyword with those of each portion of the utterance yields a single value between 0 and 1. The portion with the highest value is usually the location of the keyword. Results in the time domain varied from keyword to keyword: some words showed a 60% hit rate, while the average over various keywords from the Call Home database was 41%. Cross-correlation of the keywords and utterances in the MFCC domain yielded a 66% hit rate in tests conducted on all the different keywords in the Call Home and Switchboard corpora. The system accuracy is keyword dependent, with some keywords having an 85% hit rate.
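The MFCC-domain search described here, sliding a keyword-sized window over the utterance and scoring each position by normalized correlation, can be sketched as below. The normalization used (mean removal and unit norm over the flattened window) is an assumption; it yields a score in [-1, 1], and how the thesis arrives at a value between 0 and 1 is not specified in the abstract. The feature values are made up.

```python
import math

def normalise(vec):
    """Remove the mean and scale to unit length (guarding against zero norm)."""
    mean = sum(vec) / len(vec)
    centred = [v - mean for v in vec]
    norm = math.sqrt(sum(v * v for v in centred)) or 1.0
    return [v / norm for v in centred]

def flatten(frames):
    return [v for frame in frames for v in frame]

def spot_keyword(keyword, utterance):
    """Return (best_start_frame, best_score) over all keyword-sized windows."""
    k = normalise(flatten(keyword))
    best = (0, -1.0)
    for start in range(len(utterance) - len(keyword) + 1):
        window = normalise(flatten(utterance[start:start + len(keyword)]))
        score = sum(a * b for a, b in zip(k, window))  # correlation at zero lag
        if score > best[1]:
            best = (start, score)
    return best

# Toy 2-dimensional "MFCC" frames; the keyword occurs at frames 2 to 4.
keyword = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
utterance = [[0.2, 0.1], [0.1, 0.3], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.3, 0.2]]

print(spot_keyword(keyword, utterance))
```

A detection threshold on the best score would decide whether the keyword is reported at all; the abstract's hit rates suggest that this choice is strongly keyword dependent.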
Temple University--Theses


12

Vrba, Václav. "Robustní detekce klíčových slov v řečovém signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-220670.


Abstract:

The master's thesis is divided into two parts: theoretical and practical. The theoretical part focuses on methods for the analysis and detection of speech signals. In the practical part, a system for isolated word recognition was created in Matlab. The system is speaker-independent, separately for men and women. Two speech databases were also created for further use in the aircraft cockpit. Tests and evaluations were also performed with added noise.


13

GORDILLO, CHRISTIAN DAYAN ARCOS. "CONTINUOUS SPEECH RECOGNITION BY COMBINING MFCC AND PNCC ATTRIBUTES WITH SS, WD, MAP AND FRN METHODS OF ROBUSTNESS." PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2013. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=23090@1.


Abstract:

PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
The growing interest in reproducing, through machines, the model that governs everyday human communication has made this one of the most researched and important fields of recent decades. The main challenge of this field, known as speech recognition, is to develop robust systems that reduce the additive noise of the environment in which the speech signal is acquired before that signal is fed to the recognizer. This work therefore presents four ways to improve the performance of continuous speech recognition in the presence of additive noise: Wavelet Denoising and Spectral Subtraction for speech enhancement, and Histogram Mapping and Neural Network Filtering for feature compensation. These methods are applied separately and simultaneously, two at a time, in order to minimize the mismatch caused by noise in the speech signal. Because speech recognizers depend fundamentally on the features used, two feature extraction algorithms are also examined, MFCC and PNCC, which represent the speech signal as a sequence of vectors containing short-time spectral information. The methods are evaluated through experiments using the HTK and Matlab software and the TIMIT (speech) and NOISEX-92 (noise) databases. Two types of tests were carried out. In the first, a reference system based solely on MFCC and PNCC features was assessed, showing how strongly performance degrades as the signal-to-noise ratio decreases. In the second, the reference system was combined with the robustness methods proposed here, comparing the results when the methods act alone and simultaneously. Combining methods simultaneously is not always advantageous; in general, however, the best result is achieved by combining MAP with PNCC features.
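As a rough illustration of the histogram mapping idea used for feature compensation, the sketch below maps each noisy feature value to the clean reference value at the same empirical quantile. The abstract does not describe the mapping the thesis actually implements, so the function name `histogram_map` and the quantile rule are assumptions made only for this example.

```python
import bisect


def histogram_map(noisy_values, reference_values):
    """Map each noisy value to the reference value at the same empirical quantile.

    Hypothetical helper for illustration; not the thesis's implementation.
    """
    sorted_noisy = sorted(noisy_values)
    sorted_ref = sorted(reference_values)
    n, m = len(sorted_noisy), len(sorted_ref)
    mapped = []
    for value in noisy_values:
        rank = bisect.bisect_right(sorted_noisy, value)  # 1..n
        quantile = (rank - 0.5) / n                      # mid-rank quantile in (0, 1)
        index = min(m - 1, int(quantile * m))            # same quantile in the reference
        mapped.append(sorted_ref[index])
    return mapped


# Toy one-dimensional feature track: the noisy values are shifted and scaled.
clean_reference = [0, 1, 2, 3]
noisy = [10, 30, 20, 40]
print(histogram_map(noisy, clean_reference))
```

The mapping is monotone, so the ordering of the feature values is preserved while their distribution is pulled toward the reference.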


14

Al-Ali, Ahmed Kamil Hasan. "Forensic speaker recognition under adverse conditions." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/130783/1/Ahmed%20Kamil%20Hasan_Al-Ali_Thesis.pdf.


Abstract:

The performance of forensic speaker recognition systems degrades significantly in the presence of environmental noise and reverberant conditions. This research developed new techniques to improve forensic speaker recognition performance under these conditions using fusion feature extraction techniques and speech enhancement based on the independent component analysis algorithm. A range of forensic speaker recognition applications will benefit from the research outcomes including criminal investigations and law enforcement agencies.


15

Viana, Hesdras Oliveira. "Descritor de voz invariante ao ruído." Universidade Federal de Pernambuco, 2013. https://repositorio.ufpe.br/handle/123456789/11842.


Abstract:

Extracting speech features is a fundamental step in speech recognition systems. Descriptors are used to extract the signal energy, the fundamental frequency (pitch), and the formant structure, which serve as identifiers for each spoken word. Descriptors such as MFCC (Mel-Frequency Cepstral Coefficient), RASTA-PLP (RelAtive SpecTrAl - Perceptual Linear Predictive), and PNCC (Power Normalized Cepstral Coefficient) are widely used in state-of-the-art speech recognition; however, these techniques do not perform well on samples with noise, speaker variability, and continuous speech. The goal of this work is to develop a speech descriptor that is invariant to noise, environment, and speaker. To that end, we studied the speech descriptors most used in the literature, identifying their advantages and disadvantages under varied conditions. To evaluate the techniques, we used the NOIZEUS (Noisy Speech Corpus) database and two classifiers: HMM (Hidden Markov Models) and SVM (Support Vector Machine). The database contains noise at levels of 0 dB, 5 dB, 10 dB, and 15 dB, recorded in several environments. The classifiers served to validate the speech descriptors. The proposed descriptor, called MINERS (Model Invariant to Noise and Environment and Robust for Speech), achieved the best results among all descriptors evaluated (MFCC, MFCC combined with Wavelet Denoising, RASTA-PLP, and PNCC). The most successful approach was MINERS with the SVM classifier.
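To make the noise levels above concrete, the following minimal Python sketch mixes a noise recording into clean speech at a chosen level, assuming the 0 dB to 15 dB figures are signal-to-noise ratios. The function names and the synthetic test signals are illustrative assumptions; they are not taken from the thesis or from NOIZEUS, which ships already-mixed recordings.

```python
import math
import random


def mix_at_snr(speech, noise, snr_db):
    """Scale `noise` so that speech + noise has the requested SNR in dB."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    # SNR_dB = 10 * log10(p_speech / (gain^2 * p_noise)), solved for gain.
    gain = math.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return [s + gain * n for s, n in zip(speech, noise)]


def measured_snr_db(speech, noisy):
    """Recover the SNR of a mixture when the clean speech is known."""
    p_signal = sum(s * s for s in speech)
    p_residual = sum((y - s) ** 2 for s, y in zip(speech, noisy))
    return 10 * math.log10(p_signal / p_residual)


# Synthetic stand-ins for real recordings: a 200 Hz tone and Gaussian noise.
random.seed(0)
speech = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]

for level in (0, 5, 10, 15):
    noisy = mix_at_snr(speech, noise, level)
    print(level, round(measured_snr_db(speech, noisy), 3))
```

At 0 dB the noise carries as much power as the speech, which is why that condition is the hardest for any descriptor.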


16

Erokyar, Hasan. "Age and Gender Recognition for Speech Applications based on Support Vector Machines." Scholar Commons, 2014. https://scholarcommons.usf.edu/etd/5356.


Abstract:

Automatic age and gender recognition is important for speech applications for several reasons. It can improve human-machine interaction; for example, advertisements can be tailored to the age and gender of the person on the phone. It can help identify suspects in criminal cases, or at least narrow the number of suspects. It can also be used to adapt waiting-queue music, so that different music is played according to the caller's age and gender, and to gather statistics on the age and gender of a specific population. Machine learning is the part of artificial intelligence that aims to learn from data. It has a long history, but limitations such as the cost of computation and inefficient algorithms kept it from being applied to speech recognition tasks. Only in the last decade have researchers started to apply these algorithms to real-world tasks such as speech recognition, computer vision, finance, banking, and robotics. In this thesis, age and gender were recognized using a popular machine learning algorithm, and the performance of the system was compared. The dataset included real-life examples so that the system is adaptable to real-world applications. Digital signal processing techniques were used to remove noise and to extract features from the speech examples. The features used in this work were pitch frequency and cepstral representations. The performance of an age and gender recognition system depends on the speech features used. The fundamental frequency was selected as the first feature: it is the main differentiating factor between male and female speakers, and it also differs between age groups. To obtain the fundamental frequency of a speaker, the harmonic-to-subharmonic ratio method was used: the speech was divided into frames, the fundamental frequency of each frame was calculated, and the mean over all speech frames was taken. The fundamental frequency turns out to be a good discriminator not only of gender but also of age group. Mel Frequency Cepstral Coefficients (MFCC) are a good feature for speech recognition and were therefore also selected. Using MFCC, the age and gender recognition accuracies were satisfactory. As an alternative to MFCC, Shifted Delta Cepstral (SDC) coefficients were used as a speech feature. SDC is extracted from MFCC and has the advantage of being more robust to noisy data, capturing the essential information in noisy speech better. In the experiments, SDC did not give better recognition rates because the dataset did not contain much noise. Lastly, a combination of pitch and MFCC was used to get even better recognition rates. The final fused system has an overall recognition rate of 64.20% on the ELSDSR [32] speech corpus.
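The per-frame pitch estimate followed by a mean over frames can be sketched as below. Note the substitution: the thesis uses the harmonic-to-subharmonic ratio method, whereas this example uses a plain autocorrelation peak search because it fits in a few lines. The function names, frame length, hop size, and search range are all assumptions chosen for the illustration.

```python
import math


def frame_f0(frame, sample_rate, f_min=60.0, f_max=400.0):
    """Estimate the F0 of one frame from the strongest autocorrelation peak."""
    lag_min = int(sample_rate / f_max)
    lag_max = min(int(sample_rate / f_min), len(frame) - 1)
    best_lag, best_value = None, 0.0
    for lag in range(lag_min, lag_max + 1):
        value = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    # None means no positive correlation was found (for example, silence).
    return sample_rate / best_lag if best_lag else None


def speaker_f0(signal, sample_rate, frame_len=400, hop=200):
    """Average the per-frame F0 estimates to get one value per speaker."""
    estimates = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        f0 = frame_f0(signal[start:start + frame_len], sample_rate)
        if f0 is not None:
            estimates.append(f0)
    return sum(estimates) / len(estimates) if estimates else None


# Synthetic 200 Hz tone sampled at 8 kHz as a stand-in for voiced speech.
tone = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(4000)]
print(speaker_f0(tone, 8000))
```

A real system would also discard unvoiced frames before averaging, since they have no meaningful pitch.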


17

Barbosa, Emmanuel Duarte. "Descrição bioquímica quântica do bolsão de interação do ÍON Zn2+ na enzima ALAD humana." PROGRAMA DE PÓS-GRADUAÇÃO EM BIOQUÍMICA, 2016. https://repositorio.ufrn.br/jspui/handle/123456789/21908.


Abstract:

The enzyme delta-aminolevulinic acid dehydratase (ALAD) is a cytosolic metalloprotein essential to several biological processes, since it catalyzes the second step in the formation of porphobilinogen, a precursor of tetrapyrroles (heme, chlorophyll). The enzyme is very sensitive to heavy metals and has classically been used as a biomarker of lead poisoning. It is inhibited when these heavy metals substitute into the metal binding site. In human ALAD, zinc (Zn2+) functionally occupies this site and is essential for coordinating two chains of aminolevulinic acid during enzymatic catalysis. Although many in vitro, in vivo, and in silico studies have demonstrated the importance of Zn2+ at this site, to the best of our knowledge no study had used a quantum approach to elucidate this interaction in more detail. The present work therefore aimed to analyze the missense mutations that affect the zinc binding site and to describe, through quantum methods, the interaction energy between the enzyme and zinc with greater accuracy, using the Molecular Fractionation with Conjugated Caps (MFCC) method to quantify the energy of the amino acid residues located within 8.5 Å of the ligand centroid. Biochemical changes were identified in the monomeric structure of the mutants that result in decreased enzyme activity. A total of 30 residues with varied energy values were found to interact with zinc in the binding pocket. Those with significant (attractive or repulsive) values that are functionally related to enzymatic activity were Lys199, Lys252, Arg209, Arg174, Cys122, Cys124, and Cys132; those relevant to keeping the ion in the binding site were Asp169, Gly130, Gly133, Asp120, and Ser168. It could thus be concluded that, in addition to the nucleophilic (thiolate) groups of Cys122, Cys124, and Cys132, the residues Asp169, Asp120, and Ser168 are fundamental to the composition of the pocket, since they showed high attractive interaction energy with the Zn2+ ion.
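For orientation, the MFCC scheme is commonly written as the following interaction energy between a ligand L (here the Zn2+ ion) and residue R_i, where C_{i-1} and C_{i+1} denote the capped neighboring residues. This is the general form found in the MFCC literature; the abstract does not state which variant, caps, or level of theory the thesis used.

```latex
E(L\text{--}R_i) = E(L + C_{i-1} R_i C_{i+1}) - E(C_{i-1} R_i C_{i+1})
                 - E(L + C_{i-1} C_{i+1}) + E(C_{i-1} C_{i+1})
```

The difference of the first two terms contains the interaction of the ligand with the capped residue; the last two terms subtract the part of that interaction that is due to the caps alone.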


18

Manso, Dalila Nascimento. "Análise molecular da mutação HIS275TIR isolada na Neuraminidase do H1N1 resistente ao oseltamivir." PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIAS BIOLÓGICAS, 2017. https://repositorio.ufrn.br/jspui/handle/123456789/24058.


Abstract:

The most recent influenza pandemic occurred in 2009. Caused by the influenza A (H1N1) strain and popularly known as influenza A or swine flu, it raised concern among global health agencies. Symptoms include fever, cough, and throat inflammation in most cases; some patients, mainly the immunosuppressed, may develop complications that lead to death. The virus is transmitted through person-to-person contact, and its mechanism of infection relies on two surface glycoproteins, hemagglutinin and neuraminidase. Hemagglutinin binds to sialic acid receptors, favoring entry of the virus into target cells, and neuraminidase cleaves sialic acid residues from the cell receptors to which the new viral particles are bound. This cleavage releases the new viral particles, which then invade new cells by means of hemagglutinin. On this basis, drugs were developed to inhibit the action of neuraminidase. These neuraminidase inhibitors interfere with the release of new viral particles, preventing the infection from spreading in the respiratory tract. Among them, oseltamivir is the drug of choice for prophylaxis and treatment of influenza A; however, resistance to this drug has been reported, causing concern among health professionals and government officials. The most common mutation is HIS275TIR, in which histidine is replaced by tyrosine, promoting a series of conformational changes that decrease the affinity of the drug for the virus and give rise to resistance. Using crystallographic data and computational simulation, we calculated the interaction energy of oseltamivir bound to wild-type neuraminidase and to neuraminidase carrying the HIS275TIR mutation, using Density Functional Theory (DFT) and the Molecular Fractionation with Conjugated Caps (MFCC) method. We obtained 115 interacting residues for the wild-type neuraminidase (crystal 4B7R) and 109 for the mutant neuraminidase (crystal 3CL0). The results were evaluated according to the relevance of the energy values for repulsive and attractive energies. The energy calculations confirmed the reduced affinity of the strain containing the HIS275TIR mutation and highlighted the energetic importance of the neuraminidase active site: the residues with the largest energy contributions are found there, which, given its conservation, makes it a target for new drugs. The substitution of histidine by tyrosine led to a series of conformational changes in neighboring amino acids, which caused electrostatic changes resulting in drug resistance. This study makes it possible to better understand the molecular interactions of the mutant neuraminidase and, subsequently, to design new drugs that interact more efficiently with the mutant strains of this virus.


19

Alvarenga, Rodrigo Jorge. "Reconhecimento de comandos de voz por redes neurais." Universidade de Taubaté, 2012. http://www.bdtd.unitau.br/tedesimplificado/tde_busca/arquivo.php?codArquivo=587.


Abstract:

Speech recognition systems are widely used in industry, in the improvement of human operations and procedures, and in entertainment and recreation. The specific objective of this study was to design and develop a speaker-independent voice recognition system capable of identifying voice commands. The main purpose of the system is to control the movements of robots, with applications in industry and in assisting people with physical disabilities. Decision making was performed by a neural network trained with the distinctive features of the speech signal of 16 speakers. The command samples were collected by convenience sampling (with respect to age and sex) to ensure greater discrimination between voice characteristics and thereby achieve generalization of the neural network. Preprocessing consisted of determining the endpoints of each command utterance and adaptive Wiener filtering. Each speech command was segmented into 200 windows with 25% overlap. The features used were the zero-crossing rate, the short-term energy, and the mel-frequency cepstral coefficients. The first two linear predictive coding coefficients and the prediction error were also tested. The classifier was a multilayer perceptron trained with the backpropagation algorithm. Several experiments were performed to choose thresholds, practical values, features, and neural network configurations. The results were considered very good, reaching a recognition rate of 89.16% under the worst-case conditions for the sampling of the commands.
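The segmentation and the two simplest features named above can be sketched as follows. The abstract gives only the window count (200) and the overlap (25%), so the way the window length is derived from the signal length, and all function names, are assumptions made for this illustration.

```python
def split_into_windows(signal, n_windows=200, overlap=0.25):
    """Cut a signal into a fixed number of equal-length, overlapping windows."""
    step_fraction = 1.0 - overlap
    # Total length = win_len + (n_windows - 1) * hop, with hop = win_len * step_fraction.
    win_len = int(len(signal) / (1 + (n_windows - 1) * step_fraction))
    hop = max(1, int(win_len * step_fraction))
    return [signal[k * hop:k * hop + win_len] for k in range(n_windows)]


def zero_crossing_rate(window):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(window, window[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(window) - 1)


def short_term_energy(window):
    """Mean squared amplitude of the window."""
    return sum(x * x for x in window) / len(window)


# Two seconds of a toy alternating signal at an assumed 8 kHz sampling rate.
command = [1.0 if t % 2 == 0 else -1.0 for t in range(16000)]
windows = split_into_windows(command)
features = [(zero_crossing_rate(w), short_term_energy(w)) for w in windows]
print(len(windows), len(windows[0]), features[0])
```

Fixing the number of windows rather than their duration gives every command a feature matrix of the same size, which suits a fixed-input classifier such as a multilayer perceptron.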


20

Matos, Adriano Nogueira. "Extração de características do sinal de voz utilizando análise fatorial verdadeira." Universidade Federal do Amazonas, 2008. http://tede.ufam.edu.br/handle/tede/2959.


Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
Digital processing of the speech signal is used in many computer applications, chief among them speech recognition, synthesis, and coding. All of these applications require the amount of data in the acoustic signal to be reduced so that it can be processed by a computer. Feature extraction from the speech signal, the subject of this study, performs this task. The extracted features should represent the speech signal well and contain no redundancy, in order to maximize the performance of the systems that use them. The Mel Frequency Cepstral Coefficients (MFCC) feature extraction method partially fulfills these requirements, but it is seriously degraded in the presence of noise. The statistical method of Factor Analysis is applied to filter the noise components from the utterances. The experimental results show that this is a competitive method, especially when used to generate robust acoustic models under severe noise conditions.


21

Abraham, Aby. "Continous Speech Recognition Using Long Term Memory Cells." Ohio University / OhioLINK, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=ohiou1377777011.



22

Li, Yi. "Speaker Diarization System for Call-center data." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-286677.


Abstract:

To answer the question of who spoke when, speaker diarization (SD) is a critical step for many practical speech applications. The task of our project is to build an MFCC-vector-based speaker diarization system on top of a speaker verification (SV) system, an existing call-center application that checks a customer's identity from a phone call. Our speaker diarization system uses 13-dimensional MFCCs as features and performs Voice Activity Detection (VAD), segmentation, linear clustering, and hierarchical clustering based on GMM and the BIC score. By applying it, we decrease the Equal Error Rate (EER) of the SV system from 18.1% in the baseline experiment to 3.26% on general call-center conversations. To better analyze and evaluate the system, we also simulated a set of call-center data based on the public ICSI corpus audio database.
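Since the result above is reported as an Equal Error Rate, a minimal sketch of how that metric can be computed from verification scores may help. The function name, the threshold sweep, and the toy scores are assumptions for illustration; the thesis's evaluation code is not described in the abstract.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Return the error rate at the threshold where false accepts and false rejects balance."""
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap, eer = float("inf"), None
    for threshold in thresholds:
        # False acceptance: an impostor trial scores at or above the threshold.
        far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
        # False rejection: a genuine trial scores below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy similarity scores: higher means "more likely the claimed speaker".
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(equal_error_rate(genuine, impostor))
```

With a finite set of trials the two error curves rarely cross exactly, so the sketch reports the midpoint at the closest threshold.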


23

Čermák, Jan. "Rozpoznávání emočních stavů na základě analýzy řečového signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218162.


Abstract:

The thesis focuses on the classification of emotional states in Matlab using neural networks and a classifier based on a combination of Gaussian density functions. It deals with speech signal processing: prosodic and spectral features and MFCC coefficients were extracted from the signal. The work also evaluates the quality of the individual features, from which the most suitable were chosen to provide correct classification of emotional states. Two different methods were used to identify the emotional states. The first was neural networks with differently selected parameters, and the second was the Gaussian mixture model (GMM). In both methods, a database of emotional utterances was divided into a training group and a test group. Testing was speaker-independent. The work also compares the analyzed methods and presents and compares the results. The conclusion proposes the best parameters and the best classifier for recognizing the speaker's emotional state.


24

Vianna, Jéssica de Fátima. "Bioquímica quântica da capreomicina e da estreptomicina em complexo com o ribossomo bacteriano." PROGRAMA DE PÓS-GRADUAÇÃO EM CIÊNCIAS BIOLÓGICAS, 2017. https://repositorio.ufrn.br/jspui/handle/123456789/22614.


Abstract:

Coordena??o de Aperfei?oamento de Pessoal de N?vel Superior (CAPES)
A tuberculose ? uma doen?a bacteriana provocada pelo Mycobacterium tuberculosis, e de acordo com a Organiza??o Mundial de Sa?de, apenas em 2015 foram 10,4 milh?es de novos casos relatados e 1,4 milh?o de mortes. Cresce o n?mero de casos de pacientes infectados com cepas resistentes aos antimicrobianos mais comumente utilizados, fazendo-se necess?rio uso de drogas de segunda-linha. A capreomicina e a estreptomicina encaixam-se nesse grupo, e s?o antibi?ticos que possuem como mecanismo de atua??o a inibi??o da s?ntese proteica. Entretanto, seus mecanismos de liga??o em seus s?tios s?o distintos: a capreomicina ? capaz de se ligar a ambas subunidades ribossomais (30S e 50S), enquanto que a estreptomicina liga-se ? subunidade ribossomal menor (30S), e interage com alguns pontos da prote?na S12. Atrav?s de dados cristalogr?ficos e simula??es computacionais, foi calculada a energia de intera??o da capreomicina e da estreptomicina com cada um dos res?duos constituintes de seus s?tios utilizando a Teoria Funcional da Densidade (DFT) e do M?todo de Fracionamento Molecular com Capas Conjugadas (MFCC). Os resultados revelaram valores energ?ticos de cada nucleot?deo pertencente ao s?tio de liga??o desses dois medicamentos, como tamb?m dos amino?cidos da prote?na S12 com os quais a estreptomicina interage. Assim, para a capreomicina na subunidade 30S, foram avaliados res?duos presentes em um raio de at? 14 ? distantes do f?rmaco, totalizando 44 res?duos; e na subunidade 50S, 30 nucleot?deos foram analisados, e estavam distribu?dos at? o raio de 30 ? de dist?ncia. Com a estreptomicina foram levados em considera??o 60 nucleot?deos distribu?dos at? 12,5 ? de dist?ncia da droga na subunidade 30S, e 25 amino?cidos da prote?na S12 com at? 15 ? de dist?ncia. 
Identificamos tamb?m as contribui??es das liga??es de hidrog?nio e das intera??es hidrof?bicas nas intera??es f?rmaco-receptor; as regi?es dos f?rmacos que mais contribu?ram para as fixa??es desses em seus s?tios de liga??o; como tamb?m a identifica??o dos res?duos que s?o mais associados ?s muta??es e consequente resist?ncia.
Tuberculosis is a disease caused by Mycobacterium tuberculosis, and according to the World Health Organization, only in 2015 occurred 10.4 million new cases reported and 1.4 million deaths. The number of cases of patients infected with antimicrobial resistant strains most used is increasing, requiring the use of second-line drugs. Capreomycin and streptomycin are part of the group, and are antibiotics whose mechanism of action is the inhibition of protein synthesis. However, its binding mechanisms in their sites are distinct: capreomycin is able to bind to both ribosomal (30S and 50S) subunits, whereas streptomycin binds to the smaller ribosomal subunit (30S), and interacts with some points of S12 protein. Through crystallographic data and computational simulations, we calculated the interaction energy of capreomycin and streptomycin with each of the residues component of their sites using the Density Functional Theory (DFT) and Molecular Fractionation with Conjugated Caps (MFCC). The results showed energy values of each nucleotide belonging to binding site of these two drugs, as well as the amino acids of the S12 protein with which streptomycin interacts. Thus, for capreomycin in the 30S subunit, residues present in a radius of up to 14 ? distant from the drug, totaling 44 residues; and in the 50S subunit, 30 nucleotides were analyzed, and were distributed up to the 30? radius distance. Regarding streptomycin, 60 nucleotides distributed up to 12.5 ? away from the drug in the 30S subunit, and 25 amino acids of the S12 protein with up to 15 ? were taken into account. We also identify the contributions of hydrogen bonds and hydrophobic interactions in drug-receptor interactions; the regions of the drugs that most contributed to the anchorages of these in their binding sites; as well as the identification of residues that are most associated with mutations and consequent resistance.
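Note that in this abstract MFCC denotes molecular fractionation with conjugate caps, not mel-frequency cepstral coefficients. As a rough sketch of the per-residue interaction energy in the form it is commonly written in the MFCC literature (the notation is assumed here for illustration and is not taken from the thesis): L is the ligand, R_i the i-th residue of the binding site, and C_{i-1} and C_{i+1} the caps built from the neighbouring residues.

```latex
\[
E_I(L, R_i) = E(L + C_{i-1} R_i C_{i+1}) - E(C_{i-1} R_i C_{i+1})
            - E(L + C_{i-1} C_{i+1}) + E(C_{i-1} C_{i+1})
\]
```

The first two terms give the energy change when the ligand is added to the capped residue; the last two subtract the part of that change that is due to the caps alone, so that only the ligand-residue contribution remains.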

25

Káčerová, Erika. "Odhad formantových kmitočtů pomocí strojového učení." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-400852.

Abstract:

This Master's thesis deals with formant extraction. A system of Matlab scripts is created to generate values of the first three formant frequencies from speech recordings using Praat and Snack (WaveSurfer). Mel Frequency Cepstral Coefficients and Linear Predictive Coefficients are extracted from the audio files and added to the database. This database is then used to train a neural network. Finally, the designed neural network is tested.

26

Dobrotka, Matúš. "Detekce Akustického Prostředí z Řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2018. http://www.nusl.cz/ntk/nusl-385945.

Abstract:

The topic of this thesis is the classification of audio recordings into 15 acoustic scene classes that represent common scenes and places where people are situated on a regular basis. The thesis describes two approaches, based on GMM and on i-vectors, and a fusion of both approaches. The best GMM system scores 60.4% on the evaluation dataset of the DCASE Challenge. The best i-vector system scores 68.4%. The fusion of the GMM system and the best i-vector system achieves a score of 69.3%, which would correspond to 20th place in the overall ranking of the DCASE 2017 Challenge (among 98 systems submitted from all over the world).

27

Lima, Neto José Xavier de. "Bioquímica quântica na diferenciação dos níveis de ativação de receptores AMPA por agonistas parciais Wilardina." Universidade Federal do Rio Grande do Norte, 2015. http://repositorio.ufrn.br/handle/123456789/19861.

Abstract:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES
In the mammalian central nervous system (CNS), fast synaptic transmission between nerve cells is performed primarily by the α-amino-3-hydroxy-5-methyl-4-isoxazolepropionic acid (AMPA) receptor, an ionotropic glutamate receptor that is involved in learning, memory and homeostasis of the nervous system. Impairments in its function are correlated with the development of many brain disorders, such as epilepsy, schizophrenia, autism, Parkinson's and Alzheimer's. Willardiine analogs have proved to be a powerful tool for understanding the activation and desensitization mechanisms of this receptor, because modifying a single atom of the ligand allows varying levels of efficacy to be observed. In this work, taking advantage of the structures of fluorine willardiine (1.35 Å), hydrogen willardiine (1.65 Å), bromine willardiine (1.8 Å) and iodine willardiine (2.15 Å) co-crystallized with the GluA2 receptor, with codes 1MQI, 1MQJ, 1MQH and 1MQG respectively, we sought to differentiate the efficacy of the four ligands energetically. The complexes were subjected to energy calculations based on density functional theory (DFT), within the molecular fractionation with conjugate caps (MFCC) method. The results show a relationship between the energy values and the efficacy order of the willardiines (FW > HW > BrW > IW), and also show the importance of E705, R485, Y450, S654, T655, T480 and P478 as the amino acids that contribute most strongly to the interaction of the four partial agonists. In addition, we outline the behaviour of M708, which is attracted by the FW and HW ligands and repelled by BrW and IW. The data reported in this work allow a better understanding of the AMPA receptor, which may aid the development of new drugs for this system.

28

Bastas, Selin A. "Nocturnal Bird Call Recognition System for Wind Farm Applications." University of Toledo / OhioLINK, 2012. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1325803309.

29

Duarte, Dami Doria Narayana. "Um estudo da relevância da dinâmica espectral na classificação de sons domésticos." Universidade Federal de Sergipe, 2016. https://ri.ufs.br/handle/riufs/5021.

Abstract:

Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq
This work presents a study of the spectral dynamics of audio signals. More specifically, we aim to detect regularities that can be modeled in typical domestic sounds in order to classify them. Our starting point is the work of Sehili et al. [2], in which a household sound classification system based on GMM is proposed. The Sehili system is reproduced in this work as a baseline; following the same experimental protocol, a 73% recognition rate is achieved. Afterwards, three sets of experiments are performed, arranged so that each new approach incorporates a technique that highlights a different aspect of the spectral dynamics. The first technique is the insertion of discrete gradient information of the feature vectors, a strategy aimed at local spectral dynamic analysis, which results in a perceptible increase in recognition rate. The next experiment is conducted with an HMM-based classifier, in which the spectral dynamics should be encoded in the state transition probability matrices. The tests with the HMM do not result in improved recognition rates. The last experiment is based on a feature extraction method proposed by the author, called Patterns of Energy Envelope per Band (PEEB). PEEB is an extractor that highlights the spectral dynamics of the signal inside narrow bands. In domestic sound recognition tests, the classification system based on a combination of PEEB, MFCC and GMM resulted in a significant improvement over all other systems tested. We conclude, based on our results, that the spectral dynamics of the studied dataset play an important role in the classification task. However, the approaches to extracting spectral dynamic information studied in this work are not definitive, since they can clearly be developed further. For example, in the case of PEEB, the recognition rate depends strongly on the sound class, suggesting more elaborate forms of fusion of PEEB and MFCC features for each class.
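The "discrete gradient information of feature vectors" mentioned in this abstract can be illustrated with a minimal sketch. The abstract does not say how the gradient is computed, so the scheme below (a central difference in the interior and a one-sided difference at the edges) and the name `delta_features` are assumptions for illustration, not the author's method.

```python
def delta_features(frames):
    """Append a discrete time-gradient to each feature vector.

    `frames` is a list of equal-length feature vectors (e.g. one MFCC
    vector per analysis frame). Hypothetical helper: uses a central
    difference in the interior and a one-sided difference at the edges.
    """
    n = len(frames)
    out = []
    for t, frame in enumerate(frames):
        lo = max(t - 1, 0)          # previous frame (clamped at the start)
        hi = min(t + 1, n - 1)      # next frame (clamped at the end)
        span = hi - lo              # 2 in the interior, 1 at the edges, 0 if n == 1
        if span == 0:
            grad = [0.0] * len(frame)
        else:
            grad = [(b - a) / span for a, b in zip(frames[lo], frames[hi])]
        out.append(list(frame) + grad)
    return out


if __name__ == "__main__":
    # Three one-dimensional frames; each output row is [value, gradient].
    print(delta_features([[0.0], [1.0], [4.0]]))
```

Each output vector is twice as long as its input, so a classifier such as a GMM sees both the spectral shape of a frame and how that shape is changing locally.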

30

Duarte, Dami Doria Narayana. "Um estudo da relevância da dinâmica espectral na classificação de sons doméstic." Universidade Federal de Sergipe, 2016. http://ri.ufs.br:8080/xmlui/handle/123456789/5021.

Abstract:

Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq
This work presents a study of the spectral dynamics of audio signals. More specifically, we aim to detect regularities that can be modeled in typical domestic sounds in order to classify them. Our starting point is the work of Sehili et al. [2], in which a household sound classification system based on GMM is proposed. The Sehili system is reproduced in this work as a baseline; following the same experimental protocol, a 73% recognition rate is achieved. Afterwards, three sets of experiments are performed, arranged so that each new approach incorporates a technique that highlights a different aspect of the spectral dynamics. The first technique is the insertion of discrete gradient information of the feature vectors, a strategy aimed at local spectral dynamic analysis, which results in a perceptible increase in recognition rate. The next experiment is conducted with an HMM-based classifier, in which the spectral dynamics should be encoded in the state transition probability matrices. The tests with the HMM do not result in improved recognition rates. The last experiment is based on a feature extraction method proposed by the author, called Patterns of Energy Envelope per Band (PEEB). PEEB is an extractor that highlights the spectral dynamics of the signal inside narrow bands. In domestic sound recognition tests, the classification system based on a combination of PEEB, MFCC and GMM resulted in a significant improvement over all other systems tested. We conclude, based on our results, that the spectral dynamics of the studied dataset play an important role in the classification task. However, the approaches to extracting spectral dynamic information studied in this work are not definitive, since they can clearly be developed further. For example, in the case of PEEB, the recognition rate depends strongly on the sound class, suggesting more elaborate forms of fusion of PEEB and MFCC features for each class.

31

Ali, Ahmed Mohamed Abdel Maksoud. "Multi-dialect Arabic broadcast speech recognition." Thesis, University of Edinburgh, 2018. http://hdl.handle.net/1842/31224.

Abstract:

Dialectal Arabic speech research suffers from a lack of labelled resources and standardised orthography. There are three main challenges in dialectal Arabic speech recognition: (i) finding labelled dialectal Arabic speech data, (ii) training robust dialectal speech recognition models from limited labelled data and (iii) evaluating speech recognition for dialects with no orthographic rules. This thesis makes the following three contributions. Arabic Dialect Identification: We mainly deal with Arabic speech without prior knowledge of the spoken dialect. Arabic dialects can be so diverse that one can argue they are different languages rather than dialects of the same language. We have two contributions: First, we use crowdsourcing to annotate a multi-dialectal speech corpus collected from the Al Jazeera TV channel. We obtained utterance-level dialect labels for 57 hours of high-quality speech, drawn from almost 1,000 hours, covering four major varieties of dialectal Arabic (DA): Egyptian, Levantine, Gulf (or Arabic peninsula) and North African (or Moroccan). Second, we build an Arabic dialect identification (ADI) system. We explored two main groups of features, namely acoustic features and linguistic features. For the linguistic features, we look at a wide range of features, addressing words, characters and phonemes. With respect to acoustic features, we look at raw features such as mel-frequency cepstral coefficients combined with shifted delta cepstra (MFCC-SDC), bottleneck features and the i-vector as a latent variable. We studied both generative and discriminative classifiers, in addition to deep learning approaches, namely deep neural networks (DNN) and convolutional neural networks (CNN). In our work, we propose Arabic as a five-class dialect challenge comprising the previously mentioned four dialects as well as Modern Standard Arabic. 
Arabic Speech Recognition: We introduce our effort in building Arabic automatic speech recognition (ASR), and we create an open research community to advance it. This section has two main goals: First, creating a framework for Arabic ASR that is publicly available for research. We describe our effort in building two multi-genre broadcast (MGB) challenges. MGB-2 focuses on broadcast news using more than 1,200 hours of speech and 130M words of text collected from the broadcast domain. MGB-3, however, focuses on dialectal multi-genre data with limited non-orthographic speech collected from YouTube, with special attention paid to transfer learning. Second, building a robust Arabic ASR system and reporting a competitive word error rate (WER) to serve as a potential benchmark for advancing the state of the art in Arabic ASR. Our overall system is a combination of five acoustic models (AM): unidirectional long short-term memory (LSTM), bidirectional LSTM (BLSTM), time delay neural network (TDNN), TDNN layers along with LSTM layers (TDNN-LSTM) and finally TDNN layers followed by BLSTM layers (TDNN-BLSTM). The AMs are trained as purely sequence-trained neural networks using lattice-free maximum mutual information (LFMMI). The generated lattices are rescored using a four-gram language model (LM) and a recurrent neural network with maximum entropy (RNNME) LM. Our official WER is 13%, which is the lowest WER reported on this task. Evaluation: The third part of the thesis addresses our effort in evaluating dialectal speech with no orthographic rules. Our methods learn from multiple transcribers and align the speech hypothesis to overcome the non-orthographic aspects. Our multi-reference WER (MR-WER) approach is similar to the BLEU score used in machine translation (MT). We have also automated this process by learning different spelling variants from Twitter data. 
We mine automatically from a huge collection of tweets in an unsupervised fashion to build more than 11M n-to-m lexical pairs, and we propose a new evaluation metric: dialectal WER (WERd). Finally, we tried to estimate the word error rate (e-WER) with no reference transcription using decoding and language features. We show that our word error rate estimation is robust for many scenarios with and without the decoding features.
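For readers unfamiliar with the metric, the standard single-reference word error rate that this abstract builds on can be sketched as the word-level edit distance divided by the reference length. This is a minimal illustration only: the function name `word_error_rate` and the whitespace tokenisation are assumptions, and it does not implement the thesis's multi-reference (MR-WER), dialectal (WERd) or estimated (e-WER) variants.

```python
def word_error_rate(reference, hypothesis):
    """Standard WER: (substitutions + deletions + insertions) / reference length.

    Hypothetical helper using plain whitespace tokenisation and a
    row-by-row Levenshtein distance over words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because the score depends on exact string matches against one reference, spelling variation in a dialect without orthographic rules is counted as error, which is the limitation the multi-reference and dialectal metrics in the abstract are meant to address.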

32

Kotulek, Milan. "Jednoduchý textově nezávislý hlasový zámek - Softwarový systém pro verifikaci mluvčích." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2015. http://www.nusl.cz/ntk/nusl-221256.

Abstract:

This thesis gives a brief introduction to biometrics, leading to the description and design of a speaker verification system based on speech analysis. The designed system first performs basic signal processing and then recognizes vowels in fluent Czech speech. For each vowel found, the observed speech features are calculated. The resulting GUI application was tested on a purpose-built speaker database; its efficiency is approximately 54 % for short test utterances and approximately 88 % for long test utterances.

33

Costa, Roner Ferreira da. "Bioquímica quântica das estatinas, aspirina e anti-hipertensivos." Universidade Federal do Ceará, 2011. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=6234.

Abstract:

Conselho Nacional de Desenvolvimento Científico e Tecnológico
Cardiovascular diseases (CVDs) comprise a broad spectrum of diseases of the heart and blood vessels (arteries and veins), including coronary artery disease, heart attack, angina, acute coronary syndrome, aortic aneurysm, cardiac arrhythmias, congenital heart disease, heart failure and rheumatic heart disease. Among the main drugs used to treat cardiovascular diseases are: (i) statins, which act by inhibiting 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase in the conversion of HMG-CoA to mevalonate, one of the steps of cholesterol biosynthesis. Clinical trials show that statins can lower low-density lipoprotein (LDL) cholesterol levels by between 20% and 60%, reducing coronary events by up to one third over a five-year period; (ii) aspirin, of which there are more than 400 preparations in the USA and about 20 thousand tonnes are produced annually. After more than a century of clinical practice, aspirin remains the most widely recommended antithrombotic, antipyretic, analgesic and antiproliferative drug. It acts by blocking the biosynthesis of the inflammatory prostanoid hormones through inhibition of the cyclooxygenase enzymes COX-1 and COX-2; (iii) antihypertensives, for which the angiotensin-converting enzyme (ACE) is the main target (ACE inhibitors have been on the market for more than 20 years), aimed at combating high arterial pressure, which causes changes in the blood vessels and the heart muscle and leads to left ventricular hypertrophy, stroke, myocardial infarction, sudden death, renal and heart failure, etc. Arterial hypertension (HTA) or systemic arterial hypertension (HAS), popularly known as high blood pressure, is one of the most prevalent diseases in the modern world.
ACE regulates blood pressure by converting the decapeptide angiotensin I into the potent vasopressor angiotensin II and by inactivating bradykinin; it is a central component of the renin-angiotensin-aldosterone system (RAAS), which controls blood pressure and strongly influences functions related to the heart and kidneys, as well as the contraction of blood vessels. This thesis presents a quantum biochemistry study of statins (atorvastatin, rosuvastatin, cerivastatin, mevastatin, simvastatin and fluvastatin), aspirin/bromoaspirin and antihypertensives (captopril, enalapril, lisinopril, ramipril, trandolapril and perindopril), taking into account crystallographic data for their binding sites in the proteins HMGR, COX-1 (the aspirin site was simulated starting from the bromoaspirin data) and ACE, respectively. The computational simulations were carried out using density functional theory (DFT) in the local density approximation (LDA) with the PWC exchange-correlation functional; the interaction energy between the drugs and the protein residues lying within a binding site of radius r was calculated with the molecular fractionation with conjugate caps (MFCC) method. The results for the statins suggest that: (i) the most (least) effective are atorvastatin and rosuvastatin (simvastatin and fluvastatin), in agreement with clinical experience and with their IC50 inhibitory concentration values; (ii) binding sites with radii of at least 12 Å (beyond the 9.5 Å radius suggested by a strict analysis of the crystallographic data) must be considered so that important residues such as E665, D767 and R702 are included and the efficacies of the statins are correctly explained.
For aspirin/bromoaspirin, a second-order quantum refinement of the crystallographic data was used to show that the binding energies of both to COX-1 are approximately the same, which explains the similar experimental IC50 results. The existence of attractive and repulsive residues is highlighted: Arg120 is the residue that most strongly attracts salicylic acid after acetylation of Ser530, followed by Ala527, Leu531, Leu359 and Ser353; on the other hand, Glu524 is the most effective repulsive residue (with an intensity comparable to that of Arg120), and it had never before been considered an important residue in the aspirin/bromoaspirin binding site of COX-1. Finally, in the case of the antihypertensives, binding-site radii of 16 Å must be considered in order to find that lisinopril and ramipril (trandolapril and perindopril) have the highest (lowest) binding energies, which explains their highest (lowest) inhibition constants among the antihypertensives studied for the ACE of Drosophila melanogaster.

34

Costa, Roner Ferreira da. "Bioquímica quântica das estatinas, aspirina e anti-hipertensivos." Repositório Institucional da UFC, 2011. http://www.repositorio.ufc.br/handle/riufc/12543.

Abstract:

COSTA, Roner Ferreira da. Bioquímica quântica das estatinas, aspirina e anti-hipertensivos. 2011. 185 f. Tese (Doutorado em Física) - Programa de Pós-Graduação em Física, Departamento de Física, Centro de Ciências, Universidade Federal do Ceará, Fortaleza, 2011.
Submitted by Edvander Pires (edvanderpires@gmail.com) on 2015-05-29T22:17:20Z No. of bitstreams: 1 2011_tese_rfcosta.pdf: 5384677 bytes, checksum: b7096c8a3fe046f09eec5640166b7cba (MD5)
Approved for entry into archive by Edvander Pires(edvanderpires@gmail.com) on 2015-05-29T22:18:27Z (GMT) No. of bitstreams: 1 2011_tese_rfcosta.pdf: 5384677 bytes, checksum: b7096c8a3fe046f09eec5640166b7cba (MD5)
Made available in DSpace on 2015-05-29T22:18:27Z (GMT). No. of bitstreams: 1 2011_tese_rfcosta.pdf: 5384677 bytes, checksum: b7096c8a3fe046f09eec5640166b7cba (MD5) Previous issue date: 2011
Cardiovascular diseases (CVDs) comprise a broad spectrum of diseases of the heart and blood vessels (arteries and veins), including coronary artery disease, heart attack, angina, acute coronary syndrome, aortic aneurysm, cardiac arrhythmias, congenital heart disease, heart failure and rheumatic heart disease. The main drugs used to treat cardiovascular diseases include: (i) statins, which act by inhibiting 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase in the conversion of HMG-CoA to mevalonate, one of the steps of cholesterol biosynthesis. Clinical trials show that statins can lower low-density lipoprotein (LDL) cholesterol levels by 20% to 60%, reducing coronary events by up to one third over a five-year period; (ii) aspirin, for which there are more than 400 preparations in the USA and of which about 20 thousand tonnes are produced annually. After more than a century of clinical practice, aspirin remains the most widely recommended antithrombotic, antipyretic, analgesic and antiproliferative drug. It acts by blocking the biosynthesis of inflammatory prostanoid hormones through inhibition of the cyclooxygenase enzymes COX-1 and COX-2; (iii) antihypertensives, for which the angiotensin-converting enzyme (ACE) is the main target (ACE inhibitors have been on the market for more than 20 years), aimed at combating high blood pressure, which causes changes in the blood vessels and heart muscle and leads to left ventricular hypertrophy, stroke, myocardial infarction, sudden death, renal and heart failure, etc. Arterial hypertension, or systemic arterial hypertension, popularly known as high blood pressure, is one of the most prevalent diseases in the modern world.
ACE acts in blood pressure regulation by converting the decapeptide angiotensin I into the potent vasopressor angiotensin II and by inactivating bradykinin; it is a central component of the renin-angiotensin-aldosterone system (RAAS), which controls blood pressure and strongly influences functions related to the heart and kidneys, as well as the contraction of blood vessels. This thesis presents a quantum biochemistry study of statins (atorvastatin, rosuvastatin, cerivastatin, mevastatin, simvastatin and fluvastatin), of aspirin/bromoaspirin, and of antihypertensives (captopril, enalapril, lisinopril, ramipril, trandolapril and perindopril), taking into account crystallographic data for their binding sites in the proteins HMGR, COX-1 (the aspirin site was simulated starting from the bromoaspirin data) and ACE, respectively. The computational simulations were carried out using Density Functional Theory (DFT) in the local density approximation (LDA) with the PWC exchange-correlation functional; the interaction energy between the drugs and the protein residues lying within a binding-site radius r was calculated using the molecular fractionation with conjugate caps (MFCC) method. The results for the statins suggest that: (i) the most (least) effective are atorvastatin and rosuvastatin (simvastatin and fluvastatin), in agreement with clinical practice and with their IC50 inhibitory concentration values; (ii) binding sites with radii of at least 12 Å (beyond the 9.5 Å radius suggested by a strict analysis of the crystallographic data) must be considered so that important residues such as E665, D767 and R702 are included and the efficacies of the statins are correctly explained.
For aspirin/bromoaspirin, a second-order quantum refinement of the crystallographic data was used to show that the binding energies of both to COX-1 are approximately the same, which explains the similar experimental IC50 results. The existence of attractive and repulsive residues is highlighted, showing that Arg120 is the residue that most strongly attracts salicylic acid after acetylation of Ser530, followed by Ala527, Leu531, Leu359 and Ser353; on the other hand, Glu524 is the most effective repulsive residue (with an intensity comparable to that of Arg120), and had never before been considered an important residue in the aspirin/bromoaspirin binding site of COX-1. Finally, in the case of the antihypertensives, binding-site radii of 16 Å must be considered to find that lisinopril and ramipril (trandolapril and perindopril) have the highest (lowest) binding energies, which explains their higher (lower) inhibition constants among the antihypertensives studied for the ACE of Drosophila melanogaster.

35

Kryške, Lukáš. "Rozpoznávání řeči s pomocí nástroje Sphinx-4." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-220655.

Abstract:

This diploma thesis aims to find an effective method for continuous speech recognition. More precisely, it uses speech-to-text recognition for keyword spotting. The solution is applicable to phone call analysis or similar applications. Most of the thesis describes and implements the Sphinx-4 speech recognition framework, which uses hidden Markov models (HMMs) to define a language's acoustic models. It explains how these models can be trained for a new language or a new dialect. Finally, it describes in detail how to implement keyword spotting in Java.

36

Karlsson, David. "Ljudklassificering med Tensorflow och IOT-enheter : En teknisk studie." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-39331.

Abstract:

Artificial intelligence and machine learning have started to become established as recognizable terms to the general public in their daily lives. Applications such as voice recognition and image recognition are widely used in mobile phones and autonomous systems such as self-driving cars. This study examines how one can use this technique to classify sound as a complement to video surveillance in different settings, for example a bus station or other areas that might need monitoring. To do this, a technique called Convolutional Neural Network has been used, since this is a popular architecture for image classification. In this model, every sound has a visual representation in the form of a spectrogram that shows frequencies over time. One of the main goals of this study has been to apply this technique on so-called IoT units to classify sounds in real time, because these units are relatively affordable and require few resources. A Raspberry Pi was used to run a prototype version using TensorFlow and Keras as base APIs. The study's results show which parts are important to consider in order to get a good and reliable system, for example which hardware and software are needed to get started. The results also show which factors are important for streaming live sound and getting reliable results; a classification model's architecture is very important, since different layers and parameters can have a large impact on the end result.
Termer som Artificiell Intelligens och maskininlärning har under de senaste åren börjat etablera sig hos den breda massan och är numera någonting som påverkar nästan alla människors vardagliga liv i någon form. Vanliga användningsområden är röststyrning och bildigenkänning som bland annat används i mobiltelefoner och autonoma system som självkörande bilar med mera. Den här studien utforskar hur man kan använda sig av denna teknik för att kunna klassificera ljud som ett komplement till videoövervakning i olika miljöer, till exempel på en busstation eller andra övervakningsobjekt. För att göra detta har en teknik kallad Convolution Neural Network använts, vilket är en mycket populär arkitektur att använda vid klassificering av bilder. I denna modell har varje ljud fått en visuell representation i form av ett spektogram som visar frekvenser över tid. Ett av huvudmålen med denna studie har varit att kunna applicera denna teknik på så kallade IOT-enheter för att klassificera ljud i realtid. Dessa är relativt billiga och resurssnåla enheter vilket gör dem till ett attraktivt alternativ för detta ändamål. I denna studie används en Raspberry Pi för att köra en prototypversion med Tensorflow & Keras som grund APIer. Studien visar bland annat på vilka moment och delar som är viktiga att tänka på för att få igång ett smidigt och pålitligt system, till exempel vilken hårdvara och mjukvara som krävs för att starta. Den visar också på vilka faktorer som spelar in för att kunna streama ljud med bra resultat, detta då en klassifikationsmodells arkitektur och uppbyggnad kan ha stor påverkan på slutresultatet.

37

Li, Ke. "Analysis of Energy losses of Microbial Fuel Cells (MFCs) and Design of an Innovative Constructed Wetlands-MFC." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1500604673955179.

38

Campos, Victor de Abreu [UNESP]. "Arcabouço para reconhecimento de locutor baseado em aprendizado não supervisionado." Universidade Estadual Paulista (UNESP), 2017. http://hdl.handle.net/11449/151725.

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
A quantidade vertiginosa de conteúdo multimídia acumulada diariamente tem demandado o desenvolvimento de abordagens eficazes de recuperação. Nesse contexto, ferramentas de reconhecimento de locutor capazes de identificar automaticamente um indivíduo pela sua voz são de grande relevância. Este trabalho apresenta uma nova abordagem de reconhecimento de locutor modelado como um cenário de recuperação e usando algoritmos de aprendizado não supervisionado recentes. A abordagem proposta considera Coeficientes Cepstrais de Frequência Mel (MFCCs) e Coeficientes de Predição Linear Perceptual (PLPs) como características de locutor, em combinação com múltiplas abordagens de modelagem probabilística, especificamente Quantização Vetorial, Modelos por Mistura de Gaussianas e i-vectors, para calcular distâncias entre gravações de áudio. Em seguida, métodos de aprendizado não supervisionado baseados em ranqueamento são utilizados para aperfeiçoar a eficácia dos resultados de recuperação e, com a aplicação de um classificador de K-Vizinhos Mais Próximos, toma-se uma decisão quanto a identidade do locutor. Experimentos foram conduzidos considerando três conjuntos de dados públicos de diferentes cenários e carregando ruídos de diversas origens. Resultados da avaliação experimental demonstram que a abordagem proposta pode atingir resultados de eficácia altos. Adicionalmente, ganhos de eficácia relativos de até +318% foram obtidos pelo procedimento de aprendizado não supervisionado na tarefa de recuperação de locutor e ganhos de acurácia relativos de até +7,05% na tarefa de identificação entre gravações de domínios diferentes.
The huge amount of multimedia content accumulated daily has demanded the development of effective retrieval approaches. In this context, speaker recognition tools capable of automatically identifying a person by their voice are of great relevance. This work presents a novel speaker recognition approach modelled as a retrieval scenario and using recent unsupervised learning methods. The proposed approach considers Mel-Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction Coefficients (PLPs) as features, along with multiple modelling approaches, namely Vector Quantization, Gaussian Mixture Models and i-vectors, to compute distances among audio recordings. Next, rank-based unsupervised learning methods are used to improve the effectiveness of the retrieval results and, based on a K-Nearest Neighbors classifier, an identity decision is made. Experiments were conducted on three public datasets from different scenarios, carrying noise from various sources. The experimental results demonstrate that the proposed approach can achieve high effectiveness. In addition, relative effectiveness gains of up to +318% were obtained by the unsupervised learning procedure in a speaker retrieval task, and relative accuracy gains of up to +7.05% in a speaker identification task with recordings from different domains.
FAPESP: 2015/07934-4

39

Urbiš, Oldřich. "Algoritmy rozpoznávání řeči na FPGA/DSP." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2008. http://www.nusl.cz/ntk/nusl-235943.

Abstract:

This master's thesis deals with the design of speech recognition algorithms with regard to the target technology, a platform combining a digital signal processor (DSP) and a field-programmable gate array (FPGA). The speech recognition algorithms include feature extraction of Mel-frequency cepstral coefficients, hidden Markov models, and their evaluation by the Viterbi algorithm.
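
The abstract names evaluation of hidden Markov models by the Viterbi algorithm. A minimal sketch of Viterbi decoding in Python follows, assuming a toy two-state model: the state names, observation symbols and probabilities are invented for illustration and are not taken from the thesis, which targets FPGA/DSP hardware rather than Python.

```python
import math


def viterbi(obs, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for an observation sequence."""
    # best[s] = log-probability of the best state path ending in s
    best = {s: log_start[s] + log_emit[s][obs[0]] for s in states}
    back = []  # one dict of best predecessors per step after the first
    for o in obs[1:]:
        prev, best, pointers = best, {}, {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            p = max(states, key=lambda r: prev[r] + log_trans[r][s])
            best[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            pointers[s] = p
        back.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return best[last], path


def logs(table):
    """Convert a dict of probabilities to log-probabilities."""
    return {k: math.log(v) for k, v in table.items()}


# Hypothetical toy model: silence vs. speech, observed as low/high energy.
STATES = ("sil", "speech")
LOG_START = logs({"sil": 0.8, "speech": 0.2})
LOG_TRANS = {
    "sil": logs({"sil": 0.7, "speech": 0.3}),
    "speech": logs({"sil": 0.2, "speech": 0.8}),
}
LOG_EMIT = {
    "sil": logs({"low": 0.9, "high": 0.1}),
    "speech": logs({"low": 0.2, "high": 0.8}),
}

log_p, path = viterbi(["low", "high", "high"], STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(path, math.exp(log_p))
```

Working in log-probabilities avoids numerical underflow on long observation sequences; a fixed-point hardware implementation would likely make a similar choice, though that is an assumption here.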

40

Židlík, Pavel. "Počítačová analýza sportovních zápasů." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218104.

Abstract:

This work deals with fast analysis of a football match from the audio track of a recording, with the possibility of applying some of the methods to other sports as well. The first aim was to detect the sound of the referee's whistle, which has a specific frequency in its spectrum that lies outside the common speech frequency range. After detection of the harmonic frequency, attention turned to determining the meaning of the whistle. A referee helped with this by describing the number of whistle styles and providing reference samples for whistle classification. A neural network with backpropagation was used to determine the meaning of the whistle. Another cue for detecting important moments of the match was the commentator's fundamental tone. When the commentator is excited by the match, the fundamental tone of his speech increases with every important action of the game. Analysis of this increased fundamental tone was also carried out in this work. The national anthems of the opposing teams are also a significant moment of the match, so anthem detection became another subject of analysis. MFCCs were used to obtain audio signal features, yielding 20 coefficients. These served as the input to a classifier based on a neural network with backpropagation. For easy use of these methods, a graphical user interface was created that gives a well-arranged view of the results and allows selected sections to be replayed.

41

Pelikán, Pavel. "Určení výšky osob z řečového projevu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2013. http://www.nusl.cz/ntk/nusl-220197.

Abstract:

This diploma thesis focuses on determining a person's height from a spoken utterance. The first part of the work evaluates the current state of the field and reviews published studies, whose findings were used in this thesis. The study with the best results in estimating speaker height was chosen, and its experiment was reproduced in this work. A system for estimating speaker height from the speech signal was created. This system was successfully tested using several acoustic features on spoken utterances from the TIMIT database.

42

Almeida, Christiane Raulino. "Extratores de características acústicas inspirados no sistema periférico auditivo." Universidade Federal de Sergipe, 2014. http://ri.ufs.br:8080/xmlui/handle/123456789/5014.

Abstract:

Extracting information from acoustic signals is a common task in signal processing and pattern recognition. Broadly speaking, a processing system's initial task is to obtain a low-dimensional representation of the acoustic signal, extracted through computational methods called feature extractors. This representation aims to present the speech sound in a form more convenient for extracting the information contained in the signal. With this initial task in mind, this work presents a detailed study of three classic feature extraction methods, namely the Mel-Frequency Cepstrum Coefficients (MFCC), the Ensemble Interval Histogram (EIH), and Zero-Crossing with Peak Amplitudes (ZCPA). As part of the literature review, the human peripheral auditory system was also studied, since the EIH and ZCPA methods are based on models of human hearing. Moreover, a new extraction method based on the detection of level crossings was developed, here called Elementary Acoustic Events (EAE). To compare the reviewed and developed methods, two sets of experiments were carried out. First, experiments with additive noise and channel effects were performed to analyse the robustness of the methods. Then, isolated word recognition experiments were performed using Dynamic Time Warping (DTW) alignment. The results suggest that, for the proposed experiments, the proposed method is more robust than the classical methods implemented.
Extrair informações de sinais acústicos é uma tarefa bastante comum dentro das áreas de processamento de sinais e reconhecimento de padrões. De uma maneira geral, os sistemas de processamento têm como tarefa inicial obter uma representação de baixa dimensão do sinal acústico, obtida a partir de métodos computacionais denominados extratores de características. Tal representação propõe apresentar o som da fala de uma forma mais conveniente à tarefa de extração e utilização da informação contida no sinal. Dentro deste contexto, nesta dissertação foi realizado um estudo detalhado de três métodos clássicos para extração de características de sinais acústicos existentes na literatura, a saber: os Mel-Frequency Cepstrum Coefficients (MFCC); o modelo Ensemble Interval Histogram (EIH); e o modelo Zero-Crossing with Peak Amplitudes (ZCPA). Sendo que, ainda para revisão bibliográfica, um estudo do sistema auditivo periférico humano foi realizado, visto que os métodos EIH e ZCPA são baseados em modelos do ouvido humano. Em seguida, um novo método de extração baseado em detecção de cruzamentos de nível foi desenvolvido ao longo do trabalho, denominado Eventos Acústicos Elementares (EAE). Diversos experimentos foram realizados a fim de comparar os métodos clássicos e o método desenvolvido nessa dissertação. Na primeira etapa, foram realizados experimentos com ruídos aditivos e com efeitos convolutivos de canal, para análise de robustez dos métodos. Por fim, referente à segunda etapa da análise comparativa dos métodos, foram realizados experimentos relativos à tarefa de reconhecimento de palavras isoladas, utilizando o método de alinhamento temporal Dynamic Time Warping (DTW). Os resultados obtidos indicam que o método proposto possui maior robustez quando comparado aos métodos clássicos implementados.
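
The isolated word recognition experiments in this abstract rely on Dynamic Time Warping (DTW) alignment. The sketch below shows the textbook DTW recurrence and a nearest-template recogniser in Python. It is an illustration under simplifying assumptions, not the thesis implementation: each frame is a single number rather than a feature vector, and the word templates are hypothetical.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Cumulative cost of the best monotonic alignment between a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cost of aligning the first i frames of a
    # with the first j frames of b
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            # Extend the cheapest of: skip in a, skip in b, or a diagonal match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(query, templates):
    """Return the label of the template closest to the query under DTW."""
    return min(templates, key=lambda label: dtw_distance(query, templates[label]))


# Hypothetical one-dimensional "feature" sequences for two words.
TEMPLATES = {"yes": [1, 3, 4, 3, 1], "no": [2, 2, 1, 1, 0]}
print(recognize([1, 3, 3, 4, 3, 1], TEMPLATES))
```

With real features, each frame would be a vector (for example MFCC, EIH, ZCPA or EAE features) and `dist` would be replaced by a vector distance such as the Euclidean distance.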

43

Ujihara, Rintaro. "Multi-objective optimization for model selection in music classification." Thesis, KTH, Optimeringslära och systemteori, 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-298370.

Abstract:

With the breakthrough of machine learning techniques, research on music emotion classification has made notable progress by combining various audio features with state-of-the-art machine learning models. Still, it is known that how music samples are preprocessed and which classification algorithm is used depend on the data set and the objective of each project. The collaborating company of this thesis, Ichigoichie AB, is currently developing a system to categorize music data into positive/negative classes. To enhance the accuracy of the existing system, this project aims to find the best model through experiments with six audio features (Mel spectrogram, MFCC, HPSS, Onset, CENS, Tonnetz) and several machine learning models, including deep neural network models, for the classification task. For each model, hyperparameter tuning is performed and the model is evaluated according to Pareto optimality with regard to accuracy and execution time. The results show that the most promising model achieved 95% correct classification with an execution time of less than 15 seconds.
I och med genombrottet av maskininlärningstekniker har forskning kring känsloklassificering i musik sett betydande framsteg genom att kombinera olikamusikanalysverktyg med nya maskinlärningsmodeller. Trots detta är hur man förbehandlar ljuddatat och valet av vilken maskinklassificeringsalgoritm som ska tillämpas beroende på vilken typ av data man arbetar med samt målet med projektet. Denna uppsats samarbetspartner, Ichigoichie AB, utvecklar för närvarande ett system för att kategorisera musikdata enligt positiva och negativa känslor. För att höja systemets noggrannhet är målet med denna uppsats att experimentellt hitta bästa modellen baserat på sex musik-egenskaper (Mel-spektrogram, MFCC, HPSS, Onset, CENS samt Tonnetz) och ett antal olika maskininlärningsmodeller, inklusive Deep Learning-modeller. Varje modell hyperparameteroptimeras och utvärderas enligt paretooptimalitet med hänsyn till noggrannhet och beräkningstid. Resultaten visar att den mest lovande modellen uppnådde 95% korrekt klassificering med en beräkningstid på mindre än 15 sekunder.
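
The evaluation criterion described above, Pareto optimality over accuracy and execution time, can be sketched as follows. This is a minimal illustration in Python; the model names and numbers are made up for the example and are not results from the thesis.

```python
def pareto_front(models):
    """Return the names of the non-dominated models.

    models: list of (name, accuracy, seconds) tuples, where higher accuracy
    and lower execution time are both better.
    """
    front = []
    for name, acc, sec in models:
        # A model is dominated if another one is at least as good on both
        # objectives and strictly better on at least one of them.
        dominated = any(
            a >= acc and s <= sec and (a > acc or s < sec)
            for _, a, s in models
        )
        if not dominated:
            front.append(name)
    return front


# Hypothetical candidates: (name, accuracy, execution time in seconds).
CANDIDATES = [
    ("model_a", 0.92, 12.0),
    ("model_b", 0.88, 2.0),
    ("model_c", 0.85, 3.0),
    ("model_d", 0.92, 20.0),
]
print(pareto_front(CANDIDATES))
```

In this toy data, `model_c` is dominated by `model_b` (more accurate and faster) and `model_d` by `model_a` (same accuracy, faster), so the front keeps only the two models that represent a genuine accuracy/time trade-off.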

44

Ulrich, Natalja. "Linguistic and speaker variation in Russian fricatives." Electronic Thesis or Diss., Lyon 2, 2022. http://www.theses.fr/2022LYO20031.

Abstract:

Cette thèse présente une investigation acoustico-phonétique des détails phonétiques des fricatives russes.L'objectif principal était de détecter des corrélats acoustiques porteurs d'infor- mations linguistiques et idiosyncrasiques. Les questions abordées étaient de savoir si le lieu d'articulation, le sexe du locuteur ou son identité peuvent être prédits par des indices acoustiques et quelles mesures acoustiques représentent les indicateurs les plus fiables. En outre, la distribution des caractéristiques spécifiques au locuteur et à la variation inter et intra locuteur à travers les indices acoustiques a été étudiée plus en détail. Le projet a commencé par la création d'une grande base de données audio des fricatives russes. Des enregistrements acoustiques ont été obtenus auprès de 59 locuteurs russes natifs. Le jeu de données résultant est composé de 22 561 occurrences comprenant les fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sj], [ɕ], [vʲ], [zʲ]. Deux analyses ont été menées à partir de cette base de données. Dans la première étude, un échantillon de données de 6320 occurrences (40 locuteurs) a été utilisé. Trois techniques d'extraction acoustisque (à partir du son complet, de la durée du bruit et des fenêtres centrales de 30 ms) ont été sollicitées pour extraire des mesures temporelles et spectrales. En outre, 13 coefficients cepstraux (Mel-Frequency Cepstral Coefficients, MFCC) ont été calculés à partir de la fenêtre centrale de 30 ms. Des classificateurs fondés sur des arbres de décision simples, des forêts aléatoires, des machines à vecteurs de support (Support-vector machine, SVM) et des réseaux neuronaux ont été entraînés et testés pour distinguer trois fricatives non palatalisées [f], [s] et
This thesis presents an acoustic-phonetic investigation of phonetic details in Russian fricatives. The main aim was to detect acoustic correlates that carry linguistic and idiosyncratic information. The questions addressed were whether the place of articulation and the speaker's gender and identity can be predicted from a set of acoustic cues, and which acoustic measures are the most reliable indicators. Furthermore, the distribution of speaker-specific characteristics and inter- and intra-speaker variation across acoustic cues were studied in more detail. The project started with the creation of a large audio database of Russian fricatives. Then, two follow-up analyses were conducted. Acoustic recordings were collected from 59 native Russian speakers. The resulting dataset consists of 22,561 tokens including the fricatives [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ], [vʲ], [zʲ]. The first study employed a data sample of 6320 tokens (from 40 speakers). Temporal and spectral measurements were extracted using three acoustic cue extraction techniques (the full sound, the noise part, and the middle 30 ms window). Furthermore, 13 Mel Frequency Cepstral Coefficients were computed from the middle 30 ms window. Classifiers based on single decision trees, random forests, support vector machines, and neural networks were trained and tested to distinguish between the three non-palatalized fricatives [f], [s] and [ʃ]. The results demonstrate that machine learning techniques are very successful at classifying the Russian voiceless non-palatalized fricatives [f], [s] and [ʃ] using the centre of gravity and the spectral spread, irrespective of contextual and speaker variation. The three acoustic cue extraction techniques performed similarly in terms of classification accuracy (93% and 99%), but the spectral measurements extracted from the noise parts resulted in slightly better accuracy.
Furthermore, Mel Frequency Cepstral Coefficients show marginally higher predictive power than spectral cues (< 2%). This suggests that both spectral measures and Mel Frequency Cepstral Coefficients provide sufficient information for the classification of these fricatives, and that the choice between them depends on the particular research question or application. The second study's dataset consists of 15,812 tokens (59 speakers) that contain [f], [s], [ʃ], [x], [v], [z], [ʒ], [sʲ], [ɕ]. As in the first study, two types of acoustic cues were extracted: 11 acoustic speech features (spectral cues, duration and HNR measures) and 13 Mel Frequency Cepstral Coefficients. Classifiers based on single decision trees and random forests were trained and tested to predict speakers' gender and ID

45

Грушко, Ярослав Володимирович. "Система голосової біометрії, економна до обчислювальних ресурсів." Master's thesis, КПІ ім. Ігоря Сікорського, 2019. https://ela.kpi.ua/handle/123456789/32176.

Abstract:

Мета даної роботи – створити економну до обчислювальних ресурсів систему голосової біометрії. Основною ціллю роботи стали побудова загальної схеми такої системи, визначення її компонент та оптимальних параметрів. Об’єктом дослідження даної магістерської дипломної роботи є розпізнавання голосу людини комп’ютером. Предмет дослідження – голосова біометрія, тобто голосове розпізнавання особи. Спроєктована система складається з трьох основних модулів. Перший модуль – це алгоритм отримання голосового відбитку MFCCs. Другий модуль – це класифікатор, який має навчатися голосовими відбитками отриманими за допомогою першого модуля. Третій, і останній, модуль є верифікатором, який вдруге (після класифікатора) перевіряє правильність визначення особи. Задля підбору параметрів було розроблено окрему систему. Виходячи з підібраних оптимальних параметрів було створено консольний додаток голосової біометрії на мові програмування python та окремий мобільний додаток на java. Точність консольного додатку на вибірці 80 зразків 40-ка різних дикторів склала 93%. При проходженні аутентифікації, коли оброблювалося 6 секунд промови, тривалість роботи консольного додатку склала 2 секунди. Виконано перший етап розроблення стартап-проєкту, а саме, виконано маркетинговий аналіз стартап-проекту.
The purpose of this work is to create a voice biometrics system that is economical in its use of computing resources. The main goal was to build a general scheme of such a system and to determine its components and optimal parameters. The object of study of this master's thesis is the recognition of the human voice by computer. The subject of the study is voice biometrics, i.e. recognition of an individual by voice. The designed system consists of three main modules. The first module is MFCCs, the algorithm that produces an individual voiceprint. The second module is a classifier that is trained on the voiceprints obtained with the first module. The third and last module is a verifier, which checks the identification a second time (after the classifier). A separate system was developed for parameter selection. Based on the selected optimal parameters, a console voice biometrics application in the Python programming language and a separate mobile application in Java were created. The accuracy of the console application on a set of 80 samples from 40 different speakers was 93%. During authentication, when 6 seconds of speech were processed, the console application took 2 seconds to run. The first stage of developing the startup project was completed, namely its marketing analysis.

46

Odehnal, Jiří. "Řízení a měření sportovních drilů hlasem/zvuky." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2019. http://www.nusl.cz/ntk/nusl-399705.

Abstract:

This master's thesis deals with the design and development of a mobile application for the Android platform. The aim of the work is to implement a simple, user-friendly interface that supports and assists the user in training and sports exercises. The thesis also includes the implementation of sound detection to support the exercises and of voice instructions given by the application. In practice, the application should make training exercises more comfortable without forcing the user to hold the mobile device in hand.

47

Hacine-Gharbi, Abdenour. "Sélection de paramètres acoustiques pertinents pour la reconnaissance de la parole." Phd thesis, Université d'Orléans, 2012. http://tel.archives-ouvertes.fr/tel-00843652.

Abstract:

The aim of this thesis is to propose solutions and performance improvements for certain problems in the selection of relevant acoustic parameters for speech recognition. Our first contribution is a new method for selecting relevant parameters, based on an exact expansion of the redundancy between a feature and the features previously selected by a sequential forward search algorithm. The problem of estimating higher-order probability densities is solved by truncating the theoretical expansion of this redundancy at acceptable orders. In addition, we proposed a stopping criterion that fixes the number of selected features as a function of the mutual information approximated at iteration j of the search algorithm. However, estimating mutual information is difficult, since its definition depends on the probability densities of the variables (parameters), whose distribution types are unknown and which must be estimated from a finite set of samples. One approach to estimating these distributions is based on the histogram method, which requires a good choice of the number of bins (histogram cells). We therefore also proposed a new formula for computing the number of bins that minimizes the bias of the entropy and mutual information estimators. This new estimator was validated on simulated data and on speech data. In particular, it was applied to selecting the most relevant static and dynamic MFCC parameters for a connected-word recognition task on the Aurora2 database.
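
The histogram-based estimation of mutual information mentioned in this abstract can be illustrated with a plain plug-in estimator. The sketch below is in Python and makes one important assumption: the number of bins is simply passed in by the caller, whereas the thesis proposes its own bin-count formula, which the abstract does not give and which is therefore not reproduced here.

```python
import math
from collections import Counter


def mutual_information(xs, ys, bins):
    """Plug-in histogram estimate of I(X;Y) in bits.

    `bins` is chosen by the caller (an assumption of this sketch).
    """
    def to_bin(v, lo, hi):
        # Map v to an equal-width bin index in [0, bins - 1].
        if hi == lo:
            return 0
        return min(int((v - lo) / (hi - lo) * bins), bins - 1)

    x_lo, x_hi, y_lo, y_hi = min(xs), max(xs), min(ys), max(ys)
    bx = [to_bin(v, x_lo, x_hi) for v in xs]
    by = [to_bin(v, y_lo, y_hi) for v in ys]
    n = len(xs)
    # Marginal and joint bin counts.
    cx, cy, cxy = Counter(bx), Counter(by), Counter(zip(bx, by))
    # Sum over occupied joint cells of p(x,y) * log2(p(x,y) / (p(x) p(y))).
    return sum(
        (c / n) * math.log2(c * n / (cx[i] * cy[j]))
        for (i, j), c in cxy.items()
    )


# A feature that fully determines the other shares 1 bit with it (2 bins);
# two independent binary features share none.
print(mutual_information([0, 1, 2, 3], [0, 1, 2, 3], bins=2))
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1], bins=2))
```

The plug-in estimate is biased for finite samples, and the bias grows with the number of bins, which is why the choice of bin count matters for feature selection based on mutual information.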

48

Houdek, Miroslav. "Rozpoznání emočního stavu člověka z řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218117.

Abstract:

This master's thesis is concerned with recognizing emotional states and gender on the basis of speech signal analysis. We used various prosodic and cepstral features to describe the speech signal. The text describes non-invasive methods for estimating glottal pulses. The speech features described were implemented in MATLAB. For classification we used a GMM classifier, which models the feature space with Gaussian probability distributions. Furthermore, we constructed a system for recognizing the speaker's emotional state and a system for gender recognition from speech. We tested the success rate of the resulting systems with several features on speech signal segments of various lengths and compared the results. In the last part we tested the influence of speaker and gender on the success of emotional state recognition.

49

Evelyn. "Mediator combined gaseous substrate for electricity generation in microbial fuel cells (MFCs) and potential integration of a MFC into an anaerobic biofiltration system." Thesis, University of Canterbury. Department of Chemical and Process Engineering, 2013. http://hdl.handle.net/10092/10733.

Abstract:

Microbial fuel cells (MFCs) are an emerging energy production technology that converts the chemical energy stored in biologically degradable compounds to electricity at high efficiencies. They have several advantages: they use an inexpensive catalyst, operate under mild reaction conditions (i.e. ambient temperature, normal pressure and neutral pH), and generate power from a wide range of cheap raw materials. These make microbial fuel cells an attractive alternative to other electricity-generating devices. So far, however, the major problem posed by this technology is its low power output, which hinders commercialization. Restricted electron transfer from the bacteria to the anode of an MFC is thought to be one cause of the low power output. Most recent MFC research focuses on using contaminants present in industrial, agricultural, and municipal wastewater as the energy source, with very few studies utilising gaseous substrates. Mediators can be added to MFCs to enhance electron transfer from the microbe to the anode, but they have limited practical applicability in wastewater applications because of the difficulty of recovering the expensive and potentially toxic compound. This thesis describes an investigation of electricity generation in a microbial fuel cell by combining a gaseous substrate with a mediator in the anode compartment. The emphasis is placed on selecting a mediator to improve the electron transfer process for electricity production in an MFC. Methods to improve the performance of a mediator MFC with respect to power and current density are then discussed. This type of MFC is intended for treating gaseous contaminants in an anaerobic biofilter while simultaneously producing electricity. In this study, ethanol was the first gaseous substrate tested for its ability to generate electricity in the MFC.
Various mediators were first compared on the reversibility of their redox reactions and on current production, and the three best mediators were then selected for power production. The highest electrical current production, 12 μA/cm², was obtained and sustained for 24 hrs with N,N,N',N'-tetramethyl-1,4-phenylenediamine, TMPD (N-TMPD), as the mediator using a glassy carbon (GC) electrode. The maximum power density reached 0.16 mW/cm² with a carbon cloth (CC) anode. The absorption of these mediators by the bacterial cells was shown to correlate with the energy production obtained; no N-TMPD was absorbed by the bacterial cells. The 24 hr current production was accompanied by a decrease in the ethanol concentration (i.e. 1.82 g/L); however, ethanol crossover through the proton exchange membrane and ethanol evaporation around the electrodes were most likely the major causes of this decrease. A theoretical coulombic efficiency of 0.005% was calculated for this system. The electrokinetics of the microbially reduced mediator in the ethanol-mediator MFCs were also examined. Two methods, linear sweep voltammetry (LSV) and cyclic voltammetry (CV), were used to obtain the kinetic parameters. The CV method gave a better estimate of the kinetic parameters than the LSV method, because the low concentration of the mediators used affected the Tafel behaviour. All CVs showed quasi-reversible behaviour compared with the CVs in the absence of bacteria, which is thought to be due to the bacteria decreasing the amount of reduced and oxidised mediator available at the surface of the GC electrode. With the same mediator concentration, the highest exchange current density (i_o), 0.13±0.01 mA/cm², was obtained with N-TMPD as the mediator. The power output achieved was also highest (0.008 mW/cm²) with N-TMPD as the mediator. The power density was improved to 0.03 mW/cm² by using a CC electrode.
Another main objective of this thesis is to demonstrate anoxic methane oxidation, which was believed to occur only in marine sediments, and to apply it to power generation in microbial fuel cells. Ferricyanide looked promising as the electron acceptor (and thus as the mediator for the MFC). It was shown that ferricyanide was fully reduced by methanotrophic bacteria with methane as the substrate (versus abiotic and nitrogen controls). The highest reduction rate achieved was 3 × 10⁻³ mM/min.g. This finding was supported by the disappearance of the ferricyanide peak (spectrophotometry at 420 nm), CO₂ production (sensor readings), ferrocyanide formation (cyclic voltammetry), and the absence of any alternative electron acceptor. The total CO₂ produced was 0.015 mmoles from a starting ferricyanide quantity of 0.2 mmoles (after subtraction of an offset value). CV results show that 2.4 mM of ferrocyanide was produced after a total addition of 3 mM ferricyanide to the anoxic methanotrophic suspension. The current and voltage generated in the microbial fuel cell reactor from the reduced ferricyanide confirmed that ferricyanide received electrons from the bacterial metabolism. A maximum power density of 0.02 mW/cm² and an OCV of 0.6 V were obtained with 3 mM ferricyanide using the LSV method.

50

Larsson, Joel. "Optimizing text-independent speaker recognition using an LSTM neural network." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312.

Abstract:

In this paper a novel speaker recognition system is introduced. With advances in computer science, automated speaker recognition has become increasingly popular as an aid in crime investigations and authorization processes. Here, a recurrent neural network approach is used to learn to identify ten speakers within a set of 21 audio books. Audio signals are processed via spectral analysis into Mel Frequency Cepstral Coefficients, which serve as speaker-specific features and are input to the neural network. The Long Short-Term Memory algorithm is examined for the first time within this area, with interesting results. Experiments are conducted to find the optimum network model for the problem. These show that the network learns to identify the speakers well, text-independently, when the recording situation is the same. However, the system has trouble recognizing speakers across different recordings, which is probably due to the noise sensitivity of the speech processing algorithm in use.
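
The abstract does not describe its MFCC pipeline in detail, so the following is only a sketch of one ingredient that MFCC front ends share: warping frequency onto the mel scale and placing filter centers at equal mel spacing. The conversion formula used here (2595 · log10(1 + f/700)) is one common convention and is an assumption, as are the function names, the filter count, and the frequency range. Framing, the Fourier transform, the log filterbank energies, and the final cosine transform are omitted.

```python
import math


def hz_to_mel(f_hz):
    """Convert frequency in Hz to mel (assumed convention: 2595*log10(1+f/700))."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_center_frequencies(n_filters, f_min, f_max):
    """Center frequencies (Hz) of n_filters filters equally spaced in mel.

    The band edges f_min and f_max are not centers themselves, so the mel
    range is divided into n_filters + 1 equal steps.
    """
    m_lo, m_hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (m_hi - m_lo) / (n_filters + 1)
    return [mel_to_hz(m_lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    # Hypothetical settings: 10 filters between 0 Hz and 8 kHz.
    for f in mel_center_frequencies(10, 0.0, 8000.0):
        print(round(f, 1))
```

Because the centers are evenly spaced in mel, they sit close together at low frequencies and further apart at high frequencies, which is what gives the resulting coefficients their perceptually motivated resolution.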
