Dissertations / Theses on the topic 'Mel-Frequency Cepstral coefficients'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the top 45 dissertations / theses for your research on the topic 'Mel-Frequency Cepstral coefficients.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.
Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.
Darch, Jonathan J. A. "Robust acoustic speech feature prediction from Mel frequency cepstral coefficients." Thesis, University of East Anglia, 2008. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.445206.
Edman, Sebastian. "Radar target classification using Support Vector Machines and Mel Frequency Cepstral Coefficients." Thesis, KTH, Optimeringslära och systemteori, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-214794.
In radar applications it is sometimes not enough to know that the system has observed a target when a reflected signal is detected; it is often also of great interest to know what type of object the signal was reflected from. This project investigates the possibility of transforming the reflected signals from raw radar data and using our human senses, more specifically our hearing, to distinguish between different targets, and also a machine-learning approach in which patterns and characteristics of these signals are used to answer the question. More specifically, the investigation is limited to two types of targets: small unmanned aerial vehicles (UAVs) and birds. Complex-valued radar video, also known as I/Q data, is extracted from the aforementioned target types and transformed into real-valued signals by signal-processing methods; these signals are then transformed into audible signals. To classify these signals, features typically used in speech recognition are employed, namely Mel Frequency Cepstral Coefficients, together with two models of a Support Vector Machine classifier. With the linear model, a prediction accuracy of 93.33% was achieved; individually, 93.33% of the UAVs and 93.33% of the birds were correctly classified. With the radial basis model, a prediction accuracy of 98.33% was achieved; individually, 100% of the UAVs and 96.76% of the birds were correctly classified. The project was partly carried out with J. Clemedson [2], whose focus was, as mentioned, to transform these signals into audible signals.
Yang, Chenguang. "Security in Voice Authentication." Digital WPI, 2014. https://digitalcommons.wpi.edu/etd-dissertations/79.
Wu, Qiming. "A robust audio-based symbol recognition system using machine learning techniques." University of the Western Cape, 2020. http://hdl.handle.net/11394/7614.
This research investigates the creation of an audio-shape recognition system that is able to interpret a user’s drawn audio shapes—fundamental shapes, digits and/or letters—on a given surface such as a table-top using a generic stylus such as the back of a pen. The system aims to make use of one, two or three Piezo microphones, as required, to capture the sound of the audio gestures, and a combination of the Mel-Frequency Cepstral Coefficients (MFCC) feature descriptor and Support Vector Machines (SVMs) to recognise audio shapes. The novelty of the system is in the use of piezo microphones which are low cost, light-weight and portable, and the main investigation is around determining whether these microphones are able to provide sufficiently rich information to recognise the audio shapes mentioned in such a framework.
Candel, Ramón Antonio José. "Verificación automática de locutores aplicando pruebas diagnósticas múltiples en serie y en paralelo basadas en DTW (Dynamic Time Warping) y NFCC (Mel-Frequency Cepstral coefficients)." Doctoral thesis, Universidad de Murcia, 2015. http://hdl.handle.net/10803/300433.
This thesis presents the design of a system capable of performing automatic speaker verification, based on modeling with DTW (Dynamic Time Warping) and MFCC (Mel-Frequency Cepstral Coefficients) procedures. Once designed, the system was evaluated both with individual tests, DTW and MFCC separately, and with multiple tests, combining both in series and in parallel, on recordings obtained from the AHUMADA database of the Guardia Civil. All results were examined for their statistical significance, derived from performing a given finite number of tests. Statistical results were obtained for different sizes of the databases used, allowing us to assess their influence on the method, so that its variables can be fixed a priori and the best possible study carried out. Likewise, a forensic study based on the intended purpose is used to identify the best system, in terms of model type and sample size.
Lindstål, Tim, and Daniel Marklund. "Application of LabVIEW and myRIO to voice controlled home automation." Thesis, Uppsala universitet, Signaler och System, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:uu:diva-380866.
Larsson, Alm Kevin. "Automatic Speech Quality Assessment in Unified Communication : A Case Study." Thesis, Linköpings universitet, Programvara och system, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-159794.
Neville, Katrina Lee, and katrina.neville@rmit.edu.au. "Channel Compensation for Speaker Recognition Systems." RMIT University. Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080514.093453.
Alvarenga, Rodrigo Jorge. "Reconhecimento de comandos de voz por redes neurais." Universidade de Taubaté, 2012. http://www.bdtd.unitau.br/tedesimplificado/tde_busca/arquivo.php?codArquivo=587.
Systems for speech recognition have widespread use in industry, in the improvement of human operations and procedures, and in entertainment and recreation. The specific objective of this study was to design and develop a voice recognition system capable of identifying voice commands regardless of the speaker. The main purpose of the system is to control the movement of robots, with applications in industry and in aid of disabled people. We used a decision-making approach, by means of a neural network trained with the distinctive features of the speech of 16 speakers. The samples of the voice commands were collected under the criterion of convenience (age and sex), to ensure greater discrimination between the voice characteristics and to achieve generalization of the neural network. Preprocessing consisted of determining the endpoints of each command signal and of adaptive Wiener filtering. Each speech command was segmented into 200 windows with 25% overlap. The features used were the zero-crossing rate, the short-term energy and the mel-frequency cepstral coefficients. The first two coefficients of linear predictive coding and its error were also tested. The neural network classifier was a multilayer perceptron, trained by the backpropagation algorithm. Several experiments were performed for the choice of thresholds, practical values, features and neural network configurations. Results were considered very good, reaching an acceptance rate of 89.16% under the worst-case conditions for the sampling of the commands.
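The frame-level features this abstract names, zero-crossing rate and short-term energy, can be sketched in a few lines of numpy. This is an illustration rather than the thesis's code: the 8 kHz test tone, the 200-sample frame length and the 25% overlap used here are assumptions.

```python
import numpy as np

def frame_signal(x, frame_len, overlap=0.25):
    """Split a 1-D signal into overlapping frames (25% overlap, as in the abstract)."""
    hop = int(frame_len * (1 - overlap))
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(n)])

def zero_crossing_rate(frames):
    """Fraction of consecutive-sample sign changes in each frame."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)

def short_term_energy(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames ** 2, axis=1)

# Example: a 1 kHz tone sampled at 8 kHz (2 zero crossings per 8-sample period)
t = np.arange(8000) / 8000.0
x = np.sin(2 * np.pi * 1000 * t + 0.1)
frames = frame_signal(x, 200)
zcr = zero_crossing_rate(frames)      # approx. 0.25 for this tone
e = short_term_energy(frames)         # approx. 0.5 for a unit-amplitude sine
```

The per-frame feature vectors (ZCR, energy, cepstral coefficients) would then be concatenated as the input to the perceptron classifier.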
Larsson, Joel. "Optimizing text-independent speaker recognition using an LSTM neural network." Thesis, Mälardalens högskola, Akademin för innovation, design och teknik, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:mdh:diva-26312.
Hrabina, Martin. "VÝVOJ ALGORITMŮ PRO ROZPOZNÁVÁNÍ VÝSTŘELŮ." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2019. http://www.nusl.cz/ntk/nusl-409087.
Zezula, Miroslav. "Online detekce jednoduchých příkazů v audiosignálu." Master's thesis, Vysoké učení technické v Brně. Fakulta strojního inženýrství, 2011. http://www.nusl.cz/ntk/nusl-229484.
Mahajan, Mayur. "Development of a speech recognition system using the Mel Frequency Cepstrum Coefficient method." Thesis, California State University, Long Beach, 2016. http://pqdtopen.proquest.com/#viewpdf?dispub=10141515.
Voice recognition systems have found widespread use in applications such as tele-shopping, tele-banking, information services, home automation, voice message security, and voice call dialing, which allows a driver to make calls safely while driving.
This project presents the development of a high performance speech recognition system using human voice models. Recognizing the behavior of the human ear, the Mel Frequency Cepstral Coefficient (MFCC) method is used to develop the system capability for feature extraction. Vector quantization optimized by the Linde-Buzo-Gray (LBG) algorithm is used for feature matching. Experimental results show that the system has over 90% success rate in the noise-free case, but the system performance deteriorates in the presence of noise. The system, however, has better recognition ability when the noise signal consists of harmonic components, as compared to a non-stationary, non-harmonic signal.
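Vector quantization with the Linde-Buzo-Gray split-and-refine procedure, as used above for feature matching, can be sketched as follows. The toy 2-D clusters stand in for MFCC vectors, and the split factor and iteration counts are illustrative choices, not the thesis's settings.

```python
import numpy as np

def lbg(data, codebook_size, eps=0.01, n_iter=20):
    """Linde-Buzo-Gray: grow a codebook by splitting, refine with Lloyd (k-means) steps."""
    codebook = data.mean(axis=0, keepdims=True)          # start from the global centroid
    while len(codebook) < codebook_size:
        codebook = np.vstack([codebook * (1 + eps),      # split every codeword in two
                              codebook * (1 - eps)])
        for _ in range(n_iter):                          # Lloyd refinement
            d = np.linalg.norm(data[:, None, :] - codebook[None, :, :], axis=2)
            nearest = d.argmin(axis=1)
            for k in range(len(codebook)):
                members = data[nearest == k]
                if len(members):
                    codebook[k] = members.mean(axis=0)
    return codebook

rng = np.random.default_rng(0)
# two toy clusters standing in for a speaker's MFCC feature vectors
data = np.vstack([rng.normal(0.0, 0.1, (100, 2)),
                  rng.normal(3.0, 0.1, (100, 2))])
cb = lbg(data, 2)   # two codewords, one near each cluster center
```

At recognition time, the speaker whose codebook gives the smallest total quantization distortion for the test utterance would be selected.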
Hrušovský, Enrik. "Automatická klasifikace výslovnosti hlásky R." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2018. http://www.nusl.cz/ntk/nusl-377664.
Okuyucu, Cigdem. "Semantic Classification And Retrieval System For Environmental Sounds." Master's thesis, METU, 2012. http://etd.lib.metu.edu.tr/upload/12615114/index.pdf.
Assaad, Firas Souhail. "Biometric Multi-modal User Authentication System based on Ensemble Classifier." University of Toledo / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=toledo1418074931.
Dušil, Lubomír. "Automatické rozpoznávání logopedických vad v řečovém projevu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218161.
Pešek, Milan. "Detekce logopedických vad v řeči." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2009. http://www.nusl.cz/ntk/nusl-218106.
Wang, Yihan. "Automatic Speech Recognition Model for Swedish using Kaldi." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-285538.
Several solutions for automatic transcription exist on the market, but a large share of them do not support Swedish because of its relatively small number of speakers. In this project, automatic transcription for Swedish was built with hidden Markov models and Gaussian mixture models using Kaldi, to enable ICABanken to classify calls to its customer service. A number of model variations with different phoneme combination methods, eigenvalue computations and data processing methods were explored. Word error rate and real-time factor were chosen as evaluation criteria to compare the accuracy and speed of the models. For large-vocabulary continuous transcription, triphones give much better performance than monophones. With the help of transformations, both accuracy and speed improve; the combination of linear discriminant analysis, maximum likelihood linear transform and speaker adaptive training gives the best performance in this implementation. Among the feature extractions, mel-frequency cepstral coefficients contribute to better accuracy, while perceptual linear prediction tends to increase speed.
Лавриненко, Олександр Юрійович, Александр Юрьевич Лавриненко, and Oleksandr Lavrynenko. "Методи підвищення ефективності семантичного кодування мовних сигналів." Thesis, Національний авіаційний університет, 2021. https://er.nau.edu.ua/handle/NAU/52212.
The thesis is devoted to solving an actual scientific and practical problem in telecommunication systems, namely increasing the bandwidth of the semantic speech data transmission channel through efficient coding. The question of increasing the efficiency of semantic coding is formulated as follows: at what minimum rate can the semantic features of speech signals be encoded with a given probability of their error-free recognition? This question is answered in the research, which is an urgent scientific and technical task given the growing trend toward remote interaction between people and robotic technology through speech, where the accuracy of such systems depends directly on the effectiveness of semantic coding of speech signals. The thesis investigates the well-known method of semantic coding of speech signals based on mel-frequency cepstral coefficients, which consists in finding the average values of the coefficients of the discrete cosine transform of the logarithmized energy of the discrete Fourier transform spectrum processed by a triangular filter bank on the mel scale. The problem is that this method does not meet the condition of adaptivity, so the main scientific hypothesis of the study was formulated: the efficiency of semantic coding of speech signals can be increased through the use of an adaptive empirical wavelet transform followed by Hilbert spectral analysis. Coding efficiency means a decrease in the information transmission rate at a given probability of error-free recognition of the semantic features of speech signals, which significantly reduces the required passband and thereby increases the bandwidth of the communication channel.
In the course of proving the formulated scientific hypothesis, the following results were obtained: 1) for the first time, a method of semantic coding of speech signals based on the empirical wavelet transform was developed; it differs from existing methods by constructing sets of adaptive bandpass Meyer wavelet filters followed by Hilbert spectral analysis to find the instantaneous amplitudes and frequencies of the intrinsic empirical mode functions, which determine the semantic features of speech signals and increase the efficiency of their coding; 2) for the first time, the adaptive empirical wavelet transform was applied to problems of multiscale analysis and semantic coding of speech signals, increasing the efficiency of spectral analysis by decomposing high-frequency speech oscillations into their low-frequency components, namely intrinsic empirical modes; 3) the method of semantic coding of speech signals based on mel-frequency cepstral coefficients received further development through the basic principles of adaptive spectral analysis with the empirical wavelet transform, which increases its efficiency. Experimental research in MATLAB R2020b showed that the developed method of semantic coding of speech signals based on the empirical wavelet transform reduces the encoding rate from 320 to 192 bit/s and the required passband from 40 to 24 Hz at a probability of error-free recognition of about 0.96 (96%) and a signal-to-noise ratio of 48 dB, making it 1.6 times more efficient than the existing method.
The results obtained in the thesis can be used to build systems for remote interaction of people and robotic equipment using speech technologies, such as speech recognition and synthesis, voice control of technical objects, low-speed encoding of speech information, voice translation from foreign languages, etc.
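The Hilbert spectral analysis step described above, finding instantaneous amplitudes and frequencies of a mode, reduces to forming the analytic signal. A minimal scipy sketch follows; the 50 Hz test tone is an assumed stand-in for one intrinsic empirical mode, not data from the thesis.

```python
import numpy as np
from scipy.signal import hilbert

fs = 1000.0
t = np.arange(0, 1.0, 1 / fs)
x = 0.5 * np.cos(2 * np.pi * 50 * t)        # stand-in for one empirical mode

analytic = hilbert(x)                        # x + j * H{x}
inst_amp = np.abs(analytic)                  # instantaneous amplitude (envelope)
inst_phase = np.unwrap(np.angle(analytic))
inst_freq = np.diff(inst_phase) * fs / (2 * np.pi)   # instantaneous frequency, Hz
```

For a pure tone the envelope stays near the 0.5 amplitude and the instantaneous frequency near 50 Hz away from the edges; in the method above, these two tracks per mode are what get quantized and transmitted.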
Bekli, Zeid, and William Ouda. "A performance measurement of a Speaker Verification system based on a variance in data collection for Gaussian Mixture Model and Universal Background Model." Thesis, Malmö universitet, Fakulteten för teknik och samhälle (TS), 2018. http://urn.kb.se/resolve?urn=urn:nbn:se:mau:diva-20122.
Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.
Kuo, Yo-zhen, and 郭又禎. "Improved Mel-scale Frequency Cepstral Coefficients for Keyword Spotting Technique." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/27592493670347223949.
國立中央大學
電機工程學系
102
In speech recognition systems, Mel frequency cepstral coefficients (MFCCs) are among the most widely used feature parameters. Because of the wide application of MFCCs in audio signal processing, many studies on their improvement have been presented. In this study, we use a particle swarm optimization algorithm to optimize the weights of the MFCC filter bank, taking the difference between the energy statistics curve of the training speech database and the envelope of the MFCC filter bank as the fitness function. Experimental results show that the proposed MFCC method improves the recognition rate; in noisy-environment experiments it also improves recognition performance.
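A particle swarm optimizer over filter-bank weights of the kind described can be sketched as follows. Everything concrete here is an invented stand-in: the triangular "filters", the target energy curve, the envelope definition, and the PSO constants are illustrative assumptions, not the thesis's data or settings.

```python
import numpy as np

def tri(center, half_width, n_bins):
    """A unit triangular band, standing in for one Mel filter."""
    i = np.arange(n_bins)
    return np.clip(1 - np.abs(i - center) / half_width, 0, None)

rng = np.random.default_rng(1)
n_filters, n_bins = 8, 64
bank = np.array([tri(c, 8.0, n_bins) for c in np.linspace(4, 60, n_filters)])
target = np.linspace(1.0, 0.2, n_bins)          # assumed energy-statistics curve

def fitness(w):
    envelope = (w[:, None] * bank).max(axis=0)   # envelope of the weighted bank
    return float(np.sum((envelope - target) ** 2))

# particle swarm: each position is a candidate weight vector
n_particles, n_steps = 30, 200
pos = rng.uniform(0, 2, (n_particles, n_filters))
vel = np.zeros_like(pos)
pbest, pbest_f = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_f.argmin()].copy()

inertia, c1, c2 = 0.7, 1.5, 1.5                  # common PSO constants
for _ in range(n_steps):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    vel = inertia * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
    pos = pos + vel
    f = np.array([fitness(p) for p in pos])
    improved = f < pbest_f
    pbest[improved], pbest_f[improved] = pos[improved], f[improved]
    gbest = pbest[pbest_f.argmin()].copy()
```

The optimized weights would then rescale the Mel filter bank before the usual log-energy and DCT steps of MFCC extraction.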
Tang, Chu-Liang, and 唐曲亮. "Improved Mel Frequency Cepstral Coefficients Combined with Multiple Speech Features." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/57856949340151071584.
國立中央大學
電機工程學系
103
This thesis studies speech feature extraction and feature compensation in speech recognition. Several speech features are selected for combination; the best is cascading Linear Prediction Cepstral Coefficients (LPCC) with Mel-Frequency Cepstral Coefficients (MFCC). The MFCCs used here are obtained with a Gaussian Mel-frequency band instead of a triangular filter bank. Experiments show that the best combination ratio of LPCC to MFCC is 1:1, and that further improvement is possible if Cepstral Mean and Variance Normalization (CMVN) is added.
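The CMVN step mentioned at the end is straightforward to illustrate. This is a generic sketch with random numbers standing in for cepstral features; the feature sizes are assumptions.

```python
import numpy as np

def cmvn(features):
    """Cepstral mean and variance normalization along the time axis.

    `features` is (n_frames, n_coeffs); each coefficient track is shifted to
    zero mean and scaled to unit variance, reducing channel mismatch."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    return (features - mu) / np.maximum(sigma, 1e-8)

rng = np.random.default_rng(0)
feat = rng.normal(5.0, 2.0, (100, 13))   # stand-in cepstral features
norm = cmvn(feat)
```

Cascading LPCC and MFCC as above would amount to concatenating the two per-frame vectors (e.g. `np.hstack([lpcc, mfcc])`) before normalization.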
林士棻. "Bird songs recognition using two-dimensional Mel-scale frequency cepstral coefficients." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/38302762655714685237.
中華大學
資訊工程學系(所)
94
We propose a method to automatically identify birds from their sounds. First, each syllable corresponding to a piece of vocalization is segmented. The average LPCC (ALPCC), average MFCC (AMFCC), static MFCC (SMFCC), two-dimensional MFCC (TDMFCC), dynamic two-dimensional MFCC (DTDMFCC) and TDMFCC+DTDMFCC over all frames in a syllable are calculated as the vocalization features. Linear discriminant analysis (LDA) is exploited to increase the classification accuracy in a lower-dimensional feature vector space. A clustering algorithm, called the progressive constructive clustering (PCC) algorithm, is used to divide the feature vectors computed from the same bird species into several subclasses. In our experiments, TDMFCC+DTDMFCC achieves average classification accuracies of 90% and 89% for 420 and 561 bird species, respectively.
Lin, Shih-Fen, and 林士棻. "Bird songs recognition using two-dimensional Mel-scale frequency cepstral coefficients." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/94553686394732089037.
HUANG, CHUAN-HAO, and 黃川豪. "Multi-feature Speaker Verification Based on Mel-frequency cepstral coefficients and Formants." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/4nbqev.
CHIANG, MING-DA, and 蔣明達. "Speaker Recognition Using Mel-Scale Frequency Cepstral Coefficients by Time Domain Filtering method." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/13444981721982290438.
中華技術學院
電子工程研究所碩士班
96
According to past papers, we find that algorithms based on Mel-frequency cepstral coefficients (MFCCs) perform better than algorithms based on other feature parameters [1-7]. The Mel-frequency cepstral coefficients are obtained by the following procedure: framing, multiplication by a Hamming window, the fast Fourier transform (FFT), filtering in the frequency domain by a Mel-frequency triangular filter bank, calculation of the logarithmic energy of the filter outputs, and the discrete cosine transform (DCT), which yields the Mel-frequency cepstral coefficients. In this thesis, the conventional frequency-domain filtering procedure [1] for finding the Mel-frequency cepstral coefficients is replaced by a new, direct time-domain filtering procedure. The simulation results show that the performance of our new method and of the previous approach [1] are quite similar.
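The conventional MFCC pipeline this abstract enumerates (framing, Hamming window, FFT, Mel-scale triangular filtering, log energy, DCT) can be sketched end-to-end in numpy. Frame sizes, FFT length and filter counts below are common defaults, not values from the thesis.

```python
import numpy as np

def mel(f):      return 2595 * np.log10(1 + f / 700.0)
def mel_inv(m):  return 700 * (10 ** (m / 2595.0) - 1)

def mel_filter_bank(n_filters, n_fft, fs):
    """Triangular filters spaced uniformly on the mel scale."""
    pts = mel_inv(np.linspace(mel(0), mel(fs / 2), n_filters + 2))
    bins = np.floor((n_fft + 1) * pts / fs).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        bank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        bank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return bank

def dct_ii(x, n_out):
    """DCT-II of the log filter-bank energies (unnormalized basis)."""
    n = x.shape[-1]
    k, i = np.arange(n_out)[:, None], np.arange(n)[None, :]
    basis = np.cos(np.pi * k * (2 * i + 1) / (2 * n))
    return x @ basis.T

def mfcc(signal, fs, frame_len=400, hop=160, n_filters=26, n_ceps=13):
    n_fft = 512
    frames = [signal[i:i + frame_len] * np.hamming(frame_len)
              for i in range(0, len(signal) - frame_len, hop)]
    spec = np.abs(np.fft.rfft(frames, n_fft)) ** 2        # power spectrum
    bank = mel_filter_bank(n_filters, n_fft, fs)
    energies = np.log(spec @ bank.T + 1e-10)              # log Mel energies
    return dct_ii(energies, n_ceps)                       # cepstral coefficients

fs = 16000
t = np.arange(fs) / fs
coeffs = mfcc(np.sin(2 * np.pi * 440 * t), fs)            # (n_frames, 13)
```

The thesis's contribution replaces the `rfft` plus frequency-domain `bank` multiplication with equivalent time-domain filters; the surrounding log-energy and DCT stages stay the same.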
Xu, Sheng-Bin, and 徐勝斌. "Continuous Birdsong Recognition Using Dynamic and Temporal Two-Dimensional Mel-Frequency Cepstral Coefficients." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/21749503795140776068.
中華大學
資訊工程學系(所)
97
In this thesis, we propose an approach for the classification of bird species using fixed-duration sound segments extracted from continuous birdsong recordings. First, each sound segment is divided into a number of overlapping texture windows. Each texture window is classified individually, and a fusion approach then determines the classification result for the input segment. Features derived from the static, transitional, and temporal information of two-dimensional Mel-frequency cepstral coefficients (TDMFCC) are extracted for the classification of each texture window. TDMFCC describes both static and dynamic characteristics of a texture window; dynamic TDMFCC (DTDMFCC) describes sharp transitions within a texture window; and global dynamic TDMFCC (GDTDMFCC) describes long-time temporal variations in a texture window. The concepts of DTDMFCC, which computes local regression coefficients, and GDTDMFCC, which evaluates global contrast information, are integrated to form a new feature vector, called global and local DTDMFCC (GLDTDMFCC). Furthermore, we use principal component analysis (PCA) to reduce the feature dimension, Gaussian mixture models (GMM) to model the sounds of different bird species, and linear discriminant analysis (LDA) to improve the classification accuracy in a lower-dimensional feature vector space. In our experiment, the highest average classification accuracy is 94.62% for the classification of 28 bird species.
Lin, Bo-Zhi, and 林柏志. "Speaker Recognition Algorithm Using Mel-Scale Frequency Cepstral Coefficients with Two Stages Linear Prediction Filters." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/18209732501243789128.
中華技術學院
電子工程研究所碩士班
94
The development of computer and communication technologies has accelerated application demand for speaker recognition and speech recognition. The purpose of this thesis is to present a new algorithm that improves the performance of speaker recognition. The algorithm uses two-stage linear prediction error filters to estimate the spectrogram of the processed speech signal. It then uses a Mel-scale triangular bandpass filter bank to obtain the Mel-scale frequency cepstral coefficients (MFCC) and build the Gaussian mixture model needed for speaker recognition. To verify that the algorithm works well and to compare its performance with other algorithms, we use the Mandarin speech database MAT-400, purchased from the Association for Computational Linguistics and Chinese Language Processing. The experimental results show that the proposed algorithm has the best performance in the case of higher signal-to-noise ratio.
Yang-Ming, Cheng, and 鄭陽銘. "A Mel-Scale Frequency Cepstral Coefficients Speaker Recognition Algorithm Based on Linear Prediction Spectrum Estimation." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/38345345070598427641.
中華技術學院
電子工程研究所碩士班
93
According to past research, spectrum estimation based on linear prediction is more robust than spectrum estimation based on the FFT at lower SNR. In this thesis, we propose a new speaker identification algorithm based on linear prediction spectrum estimation. In this algorithm, the spectrum estimation based on the short-time fast Fourier transform is replaced by linear prediction spectrum estimation; the Mel-scale frequency cepstral coefficients are then obtained using the Mel-scale frequency triangular filter bank. Experimental results show that the new algorithm performs better than the FFT-based algorithm at lower SNR.
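Linear prediction spectrum estimation of the kind described rests on solving the Yule-Walker equations, commonly via the Levinson-Durbin recursion. The sketch below fits a toy AR(1) signal and evaluates the all-pole spectrum; the model order, signal length and AR coefficient are assumptions for illustration.

```python
import numpy as np

def levinson_durbin(r, order):
    """Solve the Yule-Walker equations: autocorrelation r -> LPC coefficients a (a[0]=1)."""
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = sum(a[j] * r[i - j] for j in range(i))
        k = -acc / err                       # reflection coefficient
        new_a = a.copy()
        for j in range(1, i + 1):
            new_a[j] = a[j] + k * a[i - j]
        a, err = new_a, err * (1 - k * k)
    return a, err

def lp_spectrum(a, gain, n_fft=512):
    """All-pole spectrum estimate gain / |A(e^jw)|^2 on n_fft//2 + 1 bins."""
    A = np.fft.rfft(a, n_fft)
    return gain / np.abs(A) ** 2

# toy AR(1) signal: x[n] = 0.9 x[n-1] + e[n]
rng = np.random.default_rng(0)
e = rng.normal(size=10000)
x = np.zeros_like(e)
for n in range(1, len(x)):
    x[n] = 0.9 * x[n - 1] + e[n]

r = np.correlate(x, x, mode='full')[len(x) - 1:] / len(x)
a, err = levinson_durbin(r, order=1)     # recovers a close to [1, -0.9]
spec = lp_spectrum(a, err)               # smooth low-pass spectrum estimate
```

In the algorithm above, this smooth LP spectrum would replace the short-time FFT power spectrum before the Mel filter bank and DCT stages.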
Bowman, Casady. "Perceiving Emotion in Sounds: Does Timbre Play a Role?" Thesis, 2011. http://hdl.handle.net/1969.1/ETD-TAMU-2011-12-10656.
Wu, Sunrise, and 吳尚叡. "Design Time Domain Filter Banks Using Least Squares Method to Calculate the Mel-Frequency Cepstral Coefficients for Speaker Recognition." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/08178129842426697899.
中華技術學院
電子工程研究所碩士班
96
Up to now, the best speaker recognition techniques have been based on the Mel-frequency cepstral coefficients (MFCCs) [1-4,11]. The main procedure for obtaining MFCCs is: framing, Hamming windowing, the FFT (Fast Fourier Transform) [7], filtering by a Mel-scale triangular filter bank, taking the logarithmic energies of the outputs, and the DCT (Discrete Cosine Transform) [1-8]. After these processes, the MFCCs are obtained. The main topic of this thesis is to replace the previous procedure of the FFT [7] plus filtering with a frequency-domain Mel-scale triangular filter bank [15] by filtering with a time-domain Mel-scale triangular filter bank. The time-domain Mel-scale triangular filter bank [1-8,14] is obtained by the least squares method [10,13] and is used to obtain the Mel-frequency cepstral coefficients of the speakers' speech. From the results of our experiments, we find that the successful speaker recognition ratios of the conventional MFCC method [2,3,6,14] and of our new approach are very similar.
Yuan, Hor, and 原禾. "Design Time Domain Filter Banks Using Least Squares Method to Calculate the Mel-Frequency Cepstral Coefficients for Non-Continuous Speech Recognition." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/76162451347630250736.
中華技術學院
電子工程研究所碩士班
97
In speech recognition, the Mel frequency cepstral coefficients (MFCC) are currently popular for both speech recognition and speaker recognition [2,8-11,14,15]. To obtain the MFCC, the main procedure is to filter the speech signal by a set of triangular Mel-scale filters in the frequency domain, take the logarithm of the filter-bank output powers, and then take the discrete cosine transform. In this thesis, the frequency-domain triangular Mel-scale filter bank is replaced by a newly designed time-domain triangular Mel-scale filter bank. The experimental results show that the performance of speech recognition algorithms extracting MFCC with the conventional triangular Mel-scale filter bank and with the newly designed time-domain Mel-scale filter bank are very similar.
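A least-squares design of a time-domain (FIR) filter whose magnitude response matches one triangular Mel-style band can be sketched as follows. The band edges, tap count and frequency-grid size here are illustrative assumptions, not the thesis's design; a full bank would repeat this per band.

```python
import numpy as np

n_taps, n_grid = 65, 256
w = np.linspace(0, np.pi, n_grid)                   # frequency grid (rad/sample)

# desired response: one triangular band (assumed edges)
lo, mid, hi = 0.2 * np.pi, 0.3 * np.pi, 0.4 * np.pi
desired = np.clip(np.minimum((w - lo) / (mid - lo),
                             (hi - w) / (hi - mid)), 0, 1)

# linear-phase FIR with symmetric taps: amplitude = sum_k coef[k] * cos(k*w)
M = (n_taps - 1) // 2
basis = np.cos(np.outer(w, np.arange(M + 1)))       # (n_grid, M+1) cosine basis
coef, *_ = np.linalg.lstsq(basis, desired, rcond=None)

# expand the symmetric half back into the full impulse response
h = np.concatenate([coef[:0:-1] / 2, coef[:1], coef[1:] / 2])

# the filter's magnitude response should track the triangle
H = np.abs(np.fft.rfft(h, 1024))
```

Convolving each frame with such a filter and summing the output power reproduces, in the time domain, what multiplying the FFT spectrum by the triangular band computes in the frequency domain.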
Sujatha, J. "Improved MFCC Front End Using Spectral Maxima For Noisy Speech Recognition." Thesis, 2005. http://etd.iisc.ernet.in/handle/2005/1506.
Lei, Ying, and 雷穎. "Chip Design of Mel Frequency Cepstral Coefficient for Speech Recognition." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/69494964042006361607.
國立暨南國際大學
電機工程學系
94
This thesis proposes a chip design of speech recognition for multimedia systems. It is composed of three cores: a low-power, high-performance fast Fourier transform (FFT) processor, a Mel-scale frequency cepstral coefficient (MFCC) circuit, and a dual-ALU digital signal processing (DSP) processor running a dynamic time warping speech recognition algorithm. The DSP processor had been implemented by a previous researcher; here we mainly propose the FFT processor and the MFCC chip. In the FFT processor, we propose a novel register-array-based pipelined radix-2 structure to reduce power consumption and computation cycles. In the MFCC circuit, we adopt a paired accumulation procedure to reduce the computation of the Mel frequency bank. In addition, we minimize the look-up table size for the logarithm operations and use clock gating to reduce power consumption. The two chips are synthesized with the TSMC 0.18um cell library. The die size of the FFT/IFFT processor is approximately 4.73, and that of the MFCC chip approximately 1.71. Both chips work at 100 MHz.
Liu, Yi-Ming. "The Chip Design of Reconfigurable FFT-Based Mel Frequency Cepstrum Coefficient." 2008. http://www.cetd.com.tw/ec/thesisdetail.aspx?etdun=U0020-1607200815503000.
Liu, Yi-Ming, and 劉益銘. "The Chip Design of Reconfigurable FFT-Based Mel Frequency Cepstrum Coefficient." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/51081196289542485757.
國立暨南國際大學
電機工程學系
96
This thesis proposes a chip design of speech recognition for multimedia systems. It is composed of three cores: a reconfigurable fast Fourier transform (FFT) processor, a Mel-scale frequency cepstrum coefficient (MFCC) circuit, and a dual-ALU digital signal processing (DSP) processor running a dynamic time warping speech recognition algorithm. The FFT processor uses a reconfigurable architecture; we propose a novel register-array-based pipelined radix-2² structure to reduce power consumption and computation cycles. The FFT processor can be widely used in speech recognition, image processing and communication systems. The MFCC chip is synthesized with the TSMC 0.13um cell library; its gate count is about 4767, its latency is about 2.60μs, and it works at 100 MHz.
(6642491), Jingzhao Dai. "SPARSE DISCRETE WAVELET DECOMPOSITION AND FILTER BANK TECHNIQUES FOR SPEECH RECOGNITION." Thesis, 2019.
Speech recognition is widely applied to translation from speech to related text, voice-driven commands, human-machine interfaces and so on [1]-[8]. It has increasingly proliferated into people's lives in the modern age. To improve the accuracy of speech recognition, various algorithms such as artificial neural networks and hidden Markov models have been developed [1], [2].
In this thesis work, the task of speech recognition with various classifiers is investigated. The classifiers employed include the support vector machine (SVM), k-nearest neighbors (KNN), random forest (RF) and convolutional neural network (CNN). Two novel feature extraction methods, sparse discrete wavelet decomposition (SDWD) and bandpass filtering (BPF) based on the Mel filter banks [9], are developed and proposed. To match the diversity of the classification algorithms, both one-dimensional (1D) and two-dimensional (2D) features are obtained. The 1D features are arrays of power coefficients in frequency bands, dedicated to training the SVM, KNN and RF classifiers, while the 2D features are formed over both the frequency domain and temporal variations; each 2D feature consists of the power values in the decomposed bands versus consecutive speech frames. Most importantly, the 2D features with geometric transformations are adopted to train the CNN.
The speech recordings, including male and female speakers, come from a recorded data set as well as a standard data set. First, recordings with little noise and clear pronunciation are processed with the proposed feature extraction methods; after many trials and experiments on this dataset, high recognition accuracy is achieved. These feature extraction methods are then applied to the standard recordings, which have random characteristics, ambient noise and unclear pronunciation. Many experimental results validate the effectiveness of the proposed feature extraction techniques.
Weng, Yu-Sheng, and 翁育生. "The Chip Design of Mel Frequency Cepstrum Coefficient for HMM Speech Recognition." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/80385881218692562375.
Full text
National Cheng Kung University
Department of Electrical Engineering
86
Mel Frequency Cepstrum Coefficients are one kind of speech feature parameter, derived from the physical characteristics of human hearing. They not only model human speech well but are also more straightforward to compute than LPC cepstrum parameters, and they achieve a good recognition rate. Since speech recognition has recently centred on the hidden Markov model (HMM), which performs well in many related applications, that model is used to verify recognition after feature extraction. Our purpose is to implement the MFCC algorithm in hardware, to serve as the feature extraction module of an overall recognition system.
In this thesis we first analyse in detail the computational load of the original MFCC algorithm. A simplified cosine table-lookup method halves both the memory requirement and the number of multiplications. Secondly, by exploiting the mapping between the Mel scale and the linear frequency scale, the multiplications and memory associated with the weighted energy spectrum are likewise cut in half. Finally, the logarithm is computed with a modified partitioned table-lookup method, which preserves the original accuracy while requiring fewer intermediate operations and reducing the required table size by as much as 50%.
The hardware architecture was implemented from the modified algorithm using the TSMC 0.6 μm standard cell library. The chip area is 3.2 × 3.3 mm², packaged with 120 I/O pads; the gate count is about 10,000 and the maximum working frequency is 50 MHz, fully meeting the requirements of real-time speech feature calculation.
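The memory-halving idea behind a cosine table lookup can be sketched in software. The DCT angles π·n·(m+0.5)/M are all multiples of π/(2M), so a table covering half a period (2M entries instead of 4M) suffices; the other half is recovered from cos(x) = -cos(x - π). This is my own illustrative reconstruction of the general technique, not the thesis's exact simplified method.

```python
import numpy as np

def mfcc_dct_lookup(log_energies, n_coeffs):
    """DCT-II of Mel-band log energies using a half-period cosine table."""
    M = len(log_energies)
    # Half-period table: cos(k * pi / (2M)) for k = 0 .. 2M-1.
    # A full period would need 4M entries; symmetry halves the storage.
    table = np.cos(np.arange(2 * M) * np.pi / (2 * M))
    coeffs = np.zeros(n_coeffs)
    for n in range(n_coeffs):
        acc = 0.0
        for m in range(M):
            k = (n * (2 * m + 1)) % (4 * M)  # angle index, period 4M
            if k < 2 * M:
                acc += log_energies[m] * table[k]
            else:
                acc -= log_energies[m] * table[k - 2 * M]  # cos(x) = -cos(x - pi)
        coeffs[n] = acc
    return coeffs
```

In hardware the same index arithmetic replaces a cosine multiplier with a small ROM and a sign bit.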
Chen, Chia-Yu, and 陳佳妤. "The Investigation of Chinese Vowel Recognition for Mel-Frequency Cepstrum Coefficient Feature." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/9jf6bu.
Full text
National Chung Hsing University
Graduate Institute of Statistics
99
This paper mainly discusses speaker-dependent vowel recognition of 337 isolated Mandarin words using Mel-frequency cepstral coefficient (MFCC) features. Three experimental factors are considered: the length of the frame, the duration of the consonant, and the dimension of the speech feature. The k-nearest neighbour method is used for recognition, and the optimal combination of the three experimental factors is found. In the experimental results, the best recognition rate reaches 98.5%.
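The k-nearest neighbour classification used in this and the following theses reduces to comparing a query feature vector against every stored training vector and taking a majority vote. A minimal sketch (names are mine, and the distance is assumed Euclidean, which the abstract does not specify):

```python
import numpy as np

def knn_classify(train_feats, train_labels, query, k=3):
    """Label a query MFCC feature vector by majority vote of its
    k nearest training vectors (Euclidean distance)."""
    dists = np.linalg.norm(train_feats - query, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = [train_labels[i] for i in nearest]
    return max(set(votes), key=votes.count)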
Chen, Wan-Yu, and 陳宛余. "The Investigation of Capturing Mel-Frequency Cepstrum Coefficient Features on Mandarin Consonant Word Recognition." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/33299797950806290067.
Full text
National Chung Hsing University
Graduate Institute of Statistics
99
The aim of this paper is to discuss recognition of the consonants of the 337 Mandarin words, given that the vowel is correctly recognised. Mel-frequency cepstral coefficient (MFCC) features and the k-nearest neighbour (KNN) method are used for recognition. Four experimental factors are considered: the length of the frame, the dimension of the MFCC, the swing of the frame, and the duration of the consonant. The optimal values of the four experimental factors are found, and the highest recognition rate is 95.84%.
Chu, Feng-Seng, and 朱峰森. "Improved Approaches of Processing Perceptual Linear Prediction (PLP) and Mel Frequency Cepstrum Coefficient (MFCC) Parameters for Robust Speech Recognition." Thesis, 2005. http://ndltd.ncl.edu.tw/handle/26578739886453071884.
Full text
Jhong, Jing-Jyue, and 鍾靖爵. "Using the Method of Common Vector to Recognize Isolated Mandarin Word for Speaker-dependent System with Optimal Mel-frequency Cepstrum Coefficient." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/zwc24k.
Full text
National Chung Hsing University
Graduate Institute of Statistics
99
This paper investigates the recognition of 340 isolated Mandarin words using Mel-frequency cepstral coefficient features, among the best-known features for being relatively insensitive to noise. The speech model is constructed by the common vector method, and a two-stage method is also used to improve recognition of similar consonants. The speech database for this experiment was recorded by twelve different speakers, with each isolated Mandarin word recorded ten times. The optimal parameter set yielding the highest recognition rate is found by cross-validation over all parameters. The best average recognition rate in this experiment is 91.80%, with a variance of 0.008.
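The common vector method models each word class by the component shared by all its training vectors: project any training vector onto the orthogonal complement of the class's difference subspace. A rough sketch of the idea, under my own naming and with QR used for the subspace basis (the thesis does not specify the numerics):

```python
import numpy as np

def class_model(class_feats):
    """Common-vector model of one class: an orthonormal basis q of the
    difference subspace, plus the class's common vector."""
    x0 = class_feats[0]
    diffs = (class_feats[1:] - x0).T        # columns span the difference subspace
    q, _ = np.linalg.qr(diffs)              # orthonormal basis
    cvec = x0 - q @ (q.T @ x0)              # remove within-class variation
    return q, cvec

def classify(query, models):
    """Assign the label whose common vector is closest to the query's
    projection onto that class's complement subspace."""
    best_label, best_dist = None, np.inf
    for label, (q, cvec) in models.items():
        residual = query - q @ (q.T @ query)
        d = np.linalg.norm(residual - cvec)
        if d < best_dist:
            best_label, best_dist = label, d
    return best_label
```

Any training vector of a class projects onto exactly the same common vector, which is what makes the distance a clean discriminant.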
Wu, Jhong-Da, and 吳忠達. "Using K-Nearest Neighbor Method and the Optimal Mel-Frequency Cepstrum Coefficient Feature to Recognize Isolated Mandarin Word for Speaker-Dependent System." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/47366663775116977709.
Full text
National Chung Hsing University
Graduate Institute of Statistics
99
This paper mainly discusses speaker-dependent recognition of 337 isolated Mandarin words. The features are Mel-frequency cepstral coefficients (MFCC) and the classifier is k-nearest neighbours (KNN); we try to find the optimal parameters to obtain high recognition performance. Six experimental factors are considered in this work: the length of the frame, the dimension of the MFCC, the number of frames, the weights of the consonant and vowel, the swing of the frame, and the duration of the consonant. The best average recognition rate on the database reaches 91.5%.
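Several of the theses above search over a grid of experimental factors (frame length, MFCC dimension, and so on) for the combination with the best recognition rate. That search can be sketched generically; the factor names and the scoring callback here are illustrative, not taken from any of the theses.

```python
import itertools

def grid_search(evaluate, factor_grid):
    """Exhaustively score every combination of experimental factors.
    `evaluate` maps a settings dict to a recognition rate, e.g. a mean
    cross-validation accuracy; the best settings and rate are returned."""
    names = list(factor_grid)
    best_settings, best_rate = None, -1.0
    for values in itertools.product(*(factor_grid[n] for n in names)):
        settings = dict(zip(names, values))
        rate = evaluate(settings)
        if rate > best_rate:
            best_settings, best_rate = settings, rate
    return best_settings, best_rate
```

With six factors, as in this last thesis, the product over all value lists can grow quickly, which is why the abstracts emphasise finding one optimal combination rather than reporting the full grid.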