
Journal articles on the topic 'MFCC'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'MFCC.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Lankala, Srinija, and Dr M. Ramana Reddy. "Design and Implementation of Energy-Efficient Floating Point MFCC Extraction Architecture for Speech Recognition Systems." International Journal for Research in Applied Science and Engineering Technology 10, no. 9 (September 30, 2022): 1217–25. http://dx.doi.org/10.22214/ijraset.2022.46807.

Abstract:
This brief presents an energy-efficient architecture to extract mel-frequency cepstrum coefficients (MFCCs) for real-time speech recognition systems. Based on the algorithmic properties of MFCC feature extraction, the architecture is designed with floating-point arithmetic units to cover a wide dynamic range with a small bit-width. Moreover, the various operations required in MFCC extraction are examined to optimize the operational bit-width and the lookup tables needed to compute nonlinear functions, such as trigonometric and logarithmic functions. In addition, the dataflow of MFCC extraction is tailored to minimize computation time. As a result, energy consumption is considerably reduced compared with previous MFCC extraction systems.
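For readers unfamiliar with the pipeline being accelerated, the stages it implements (framing, FFT, mel filter bank, logarithm, DCT) can be sketched in a few lines of software. The snippet below uses librosa purely as an illustration; the parameter values are typical defaults, not those of the paper's hardware design.

```python
# Minimal software sketch of the MFCC pipeline (framing, FFT, mel filter bank,
# log, DCT). Illustrative only; not the paper's floating-point hardware design.
import numpy as np
import librosa

sr = 16000
t = np.arange(sr) / sr
y = 0.5 * np.sin(2 * np.pi * 440.0 * t).astype(np.float32)   # stand-in test signal

mfcc = librosa.feature.mfcc(
    y=y, sr=sr,
    n_mfcc=13,        # cepstral coefficients kept after the DCT
    n_fft=512,        # FFT size per frame
    hop_length=160,   # 10 ms hop at 16 kHz
    n_mels=26,        # mel filter bank channels
)
print(mfcc.shape)     # -> (13, n_frames)
```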
2

Chu, Yun Yun, Wei Hua Xiong, Wei Wei Shi, and Yu Liu. "The Extraction of Differential MFCC Based on EMD." Applied Mechanics and Materials 313-314 (March 2013): 1167–70. http://dx.doi.org/10.4028/www.scientific.net/amm.313-314.1167.

Abstract:
Feature extraction is the key to object recognition, and how to obtain effective, reliable characteristic parameters from limited measured data is a question of great importance. This paper presents a method based on Empirical Mode Decomposition (EMD) for extracting Mel Frequency Cepstrum Coefficients (MFCCs) and their first-order difference from original speech signals containing four kinds of emotions (anger, happiness, surprise, and natural) for emotion recognition. The experiments compare the recognition rates of MFCC, differential MFCC (both extracted based on EMD), and their combination, using a Support Vector Machine (SVM) to recognize the emotional identity of the speakers' speech. The results prove that the combination of MFCC and its first-order difference gives the highest recognition rate.
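The feature side of such a system, MFCCs plus their first-order difference (delta), pooled per utterance and classified with an SVM, can be sketched as follows. The EMD decomposition stage of the paper is deliberately omitted here, and the function names and time-averaging pooling are illustrative assumptions.

```python
# Sketch: MFCC + delta-MFCC features pooled per utterance, classified with an SVM.
# The paper's EMD preprocessing is omitted for brevity.
import numpy as np
import librosa
from sklearn.svm import SVC

def emotion_features(y, sr, n_mfcc=13):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    delta = librosa.feature.delta(mfcc)                      # first-order difference
    # one fixed-length vector per utterance: time-averaged MFCC and delta-MFCC
    return np.concatenate([mfcc.mean(axis=1), delta.mean(axis=1)])

def train_emotion_svm(utterances, labels):
    """utterances: list of (signal, sample_rate); labels: one emotion per utterance."""
    X = np.stack([emotion_features(y, sr) for y, sr in utterances])
    return SVC(kernel="rbf").fit(X, labels)
```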
3

Mohammed, Duraid Y., Khamis Al-Karawi, and Ahmed Aljuboori. "Robust speaker verification by combining MFCC and entrocy in noisy conditions." Bulletin of Electrical Engineering and Informatics 10, no. 4 (August 1, 2021): 2310–19. http://dx.doi.org/10.11591/eei.v10i4.2957.

Abstract:
Automatic speaker recognition may achieve remarkable performance in matched training and test conditions. Conversely, results drop significantly in mismatched noisy conditions. Furthermore, feature extraction significantly affects performance. Mel-frequency cepstral coefficients (MFCCs) are the most commonly used features in this field of study. The literature has reported that recognition performance is strongly tied to how closely the training and testing conditions match. Taken together, these facts support strong recommendations for using MFCC features in similar environmental conditions (train/test) for speaker recognition. However, with noise and reverberation present, MFCC performance is not reliable. To address this, we propose a new feature, 'entrocy', for accurate and robust speaker recognition, which we mainly employ to support MFCC coefficients in noisy environments. Entrocy is the Fourier transform of the entropy, a measure of the fluctuation of the information in sound segments over time. Entrocy features are combined with MFCCs to generate a composite feature set, which is tested using the Gaussian mixture model (GMM) speaker recognition method. The proposed method shows improved recognition accuracy over a range of signal-to-noise ratios.
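The abstract defines entrocy only at a high level. One plausible reading, a per-frame entropy sequence followed by a Fourier transform over time, is sketched below as an assumption; it is not the authors' implementation.

```python
# One plausible reading of 'entrocy': entropy per short frame, then the magnitude
# of the Fourier transform of that entropy-over-time sequence. Illustrative only.
import numpy as np
import librosa

def entrocy(y, sr, n_fft=512, hop=160, n_coeffs=16):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2   # power spectrogram
    p = S / (S.sum(axis=0, keepdims=True) + 1e-12)                  # per-frame spectral distribution
    H = -(p * np.log2(p + 1e-12)).sum(axis=0)                       # entropy of each sound segment
    return np.abs(np.fft.rfft(H))[:n_coeffs]                        # fluctuation of information over time
```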
4

Eskidere, Ömer, and Ahmet Gürhanlı. "Voice Disorder Classification Based on Multitaper Mel Frequency Cepstral Coefficients Features." Computational and Mathematical Methods in Medicine 2015 (2015): 1–12. http://dx.doi.org/10.1155/2015/956249.

Abstract:
Mel Frequency Cepstral Coefficients (MFCCs) are widely used to extract essential information from a voice signal and have become a popular feature extractor in audio processing. However, MFCC features are usually calculated from a single window (taper), which is characterized by large variance. This study investigates reducing that variance for the classification of two different voice qualities (normal voice and disordered voice) using multitaper MFCC features. We also compare the performance of newly proposed windowing techniques with the conventional single-taper technique. The results demonstrate that the adapted weighted Thomson multitaper method distinguishes between normal voice and disordered voice better than the conventional single-taper (Hamming window) technique and two newly proposed windowing methods. The multitaper MFCC features may be helpful in identifying voices at risk for a real pathology that has to be proven later.
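A hedged sketch of the multitaper idea follows: several DPSS (Thomson) tapers are applied to each frame and their power spectra are averaged before the mel filter bank, instead of using a single Hamming window. Uniform taper weights are used, the paper's adaptive weighting is omitted, and all parameter values are illustrative.

```python
# Multitaper MFCC sketch: average DPSS eigenspectra per frame, then mel/log/DCT.
import numpy as np
import librosa
from scipy.fft import dct
from scipy.signal.windows import dpss

def multitaper_mfcc(y, sr, n_fft=512, hop=160, n_tapers=6, nw=4.0, n_mels=26, n_mfcc=13):
    tapers = dpss(n_fft, NW=nw, Kmax=n_tapers)                          # (n_tapers, n_fft)
    frames = librosa.util.frame(y, frame_length=n_fft, hop_length=hop)  # (n_fft, n_frames)
    spec = np.mean([np.abs(np.fft.rfft(frames * w[:, None], axis=0)) ** 2
                    for w in tapers], axis=0)                           # averaged eigenspectra
    fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)         # mel filter bank
    return dct(np.log(fb @ spec + 1e-10), type=2, axis=0, norm="ortho")[:n_mfcc]
```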
5

Abdul, Zrar Khalid. "Kurdish Spoken Letter Recognition based on k-NN and SVM Model." Journal of University of Raparin 7, no. 4 (November 30, 2020): 1–12. http://dx.doi.org/10.26750/vol(7).no(4).paper1.

Abstract:
Automatic recognition of spoken letters is one of the most challenging tasks in the area of speech recognition. In this paper, different machine learning approaches, SVM and k-NN, are used to classify the Kurdish alphabet, where both classifiers are fed with two different features: Linear Predictive Coding (LPC) and Mel Frequency Cepstral Coefficients (MFCCs). Moreover, the features are combined to train the classifiers. The experiments are evaluated on a dataset collected by the authors, as there is no standard Kurdish dataset; it consists of 2720 samples in total. The results show that the MFCC features outperform the LPC features, as MFCCs carry more information about the vocal tract. Furthermore, the fusion of the features (MFCC and LPC) does not improve the classification rate significantly.
6

Raychaudhuri, Aryama, Rudra Narayan Sahoo, and Manaswini Behera. "Application of clayware ceramic separator modified with silica in microbial fuel cell for bioelectricity generation during rice mill wastewater treatment." Water Science and Technology 84, no. 1 (June 4, 2021): 66–76. http://dx.doi.org/10.2166/wst.2021.213.

Abstract:
Ceramic separators have recently been investigated as low-cost, robust, and sustainable separators for application in microbial fuel cells (MFC). In the present study, an attempt was made to develop a low-cost MFC employing a clayware ceramic separator modified with silica. The properties of separators with varying silica content (10%–40% w/w) were evaluated in terms of oxygen and proton diffusion. The membrane containing 30% silica exhibited improved performance compared to the unmodified membrane. Two identical MFCs, fabricated using ceramic separators with 30% silica content (MFCS-30) and without silica (MFCC), were operated at a hydraulic retention time of 12 h with real rice mill wastewater with a chemical oxygen demand (COD) of 3,200 ± 50 mg/L. The maximum volumetric power density of 791.72 mW/m3 and coulombic efficiency of 35.77% were obtained in MFCS-30, which were 60.4% and 48.5% higher, respectively, than those of MFCC. The maximum COD and phenol removal efficiencies of 76.2% and 58.2%, respectively, were obtained in MFCS-30. The MFC fabricated with the modified ceramic separator demonstrated higher power generation and pollutant removal. The presence of hygroscopic silica in the ceramic separator improved its performance in terms of hydration properties and proton transport.
7

Huizen, Roy Rudolf, and Florentina Tatrin Kurniati. "Feature extraction with mel scale separation method on noise audio recordings." Indonesian Journal of Electrical Engineering and Computer Science 24, no. 2 (November 1, 2021): 815. http://dx.doi.org/10.11591/ijeecs.v24.i2.pp815-824.

Abstract:
This paper focuses on improving recognition accuracy for noisy audio recordings. For high-quality audio recordings, extraction using the mel frequency cepstral coefficients (MFCC) method produces high accuracy, while for low-quality recordings affected by noise the accuracy is low. Accuracy is improved by investigating the effect of bandwidth on the mel scale. The proposed improvement separates the mel scale into two frequency channels (MFCC dual-channel); for comparison, the mel scale bandwidth is used without separation (MFCC single-channel). Feature analysis uses k-means clustering. The data use noise variance of up to -16 dB. In testing at -16 dB noise, the MFCC single-channel method has an accuracy of 47.5%, while the MFCC dual-channel method has a better accuracy of 76.25%. The next test used adaptive noise cancelling (ANC) to reduce noise before extraction; here the MFCC single-channel method has an accuracy of 82.5% and the MFCC dual-channel method a better accuracy of 83.75%. In testing on high-quality audio recordings, the MFCC single-channel method has an accuracy of 92.5% and the MFCC dual-channel method a better accuracy of 97.5%. The test results show the effect of mel scale bandwidth on accuracy; the MFCC dual-channel method has higher accuracy.
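A hedged sketch of a 'dual-channel' MFCC in the spirit of this abstract: the mel range is split into a low band and a high band, MFCCs are computed per band, and the two sets are stacked. The 2 kHz split point and the parameter names are assumptions for illustration.

```python
# Dual-channel MFCC sketch: separate low-band and high-band mel cepstra, stacked.
import numpy as np
import librosa

def mfcc_dual_channel(y, sr, split_hz=2000, n_mfcc=13):
    low = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, fmin=0, fmax=split_hz)
    high = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc, fmin=split_hz, fmax=sr / 2)
    return np.vstack([low, high])   # (2 * n_mfcc, n_frames)
```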
8

Zhou, Ping, Xiao Pan Li, Jie Li, and Xin Xing Jing. "Speech Emotion Recognition Based on Mixed MFCC." Applied Mechanics and Materials 249-250 (December 2012): 1252–58. http://dx.doi.org/10.4028/www.scientific.net/amm.249-250.1252.

Abstract:
Because the MFCC characteristic parameter has low identification accuracy in speech recognition when the signal contains intermediate- and high-frequency content, this paper puts forward an improved algorithm combining MFCC, Mid-MFCC, and IMFCC. An increase-or-decrease component method is used to calculate the contribution of each order cepstrum component of MFCC, Mid-MFCC, and IMFCC to speech emotion recognition, and the several order cepstrum components with the highest contribution are extracted from the three characteristic parameters to form a new characteristic parameter. The experimental results show that, under the same conditions, the new characteristic parameter achieves a higher recognition rate than the classic MFCC characteristic parameter in speech emotion recognition.
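For orientation, an IMFCC-style 'inverted' filter bank can be sketched by mirroring a standard mel filter bank along the frequency axis so that filters are densest at high frequencies, followed by the usual log and DCT steps. This is a common textbook construction, not necessarily the exact variant used in the paper; the Mid-MFCC variant is omitted.

```python
# Illustrative IMFCC-style feature: mirrored mel filter bank, then log and DCT.
import numpy as np
import librosa
from scipy.fft import dct

def imfcc(y, sr, n_fft=512, hop=160, n_mels=26, n_coeffs=13):
    S = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop)) ** 2
    fb = librosa.filters.mel(sr=sr, n_fft=n_fft, n_mels=n_mels)
    fb_inv = fb[::-1, ::-1]                     # mirror filter centers toward high frequencies
    return dct(np.log(fb_inv @ S + 1e-10), type=2, axis=0, norm="ortho")[:n_coeffs]
```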
9

Sharma, Samiksha, Anupam Shukla, and Pankaj Mishra. "Speech and Language Recognition using MFCC and DELTA-MFCC." International Journal of Engineering Trends and Technology 12, no. 9 (June 25, 2014): 449–52. http://dx.doi.org/10.14445/22315381/ijett-v12p286.

10

G., Rupali, and S. K. Bhatia. "Analysis of MFCC and Multitaper MFCC Feature Extraction Methods." International Journal of Computer Applications 131, no. 4 (December 17, 2015): 7–10. http://dx.doi.org/10.5120/ijca2015906883.

11

Qian, Su Xiang, Fu Xi Liu, and Jian Cao. "Design of Sound Recognition System Based on Modified Neural Network." Applied Mechanics and Materials 278-280 (January 2013): 1178–81. http://dx.doi.org/10.4028/www.scientific.net/amm.278-280.1178.

Abstract:
Sound recognition based on neural networks is a technique that can outperform manual identification. Three kinds of neural network recognition models, adopting MFCC and differential MFCC, are discussed. Based on six kinds of typical gunshots, we design a sound recognition system built on a BP neural network optimized by PSO that uses MFCC and differential MFCC as characteristic quantities to recognize sound signals. In the experiment, PSO is used to optimize the network's initial weights and threshold values. The experimental results show that the BP neural network optimized by PSO, using both the MFCC and differential MFCC characteristic quantities, has a relatively lower error and a relatively faster speed than the other approaches discussed in the article, and the designed system reaches the expected goal.
12

Varma, V. Sai Nitin, and Abdul Majeed K.K. "Advancements in Speaker Recognition: Exploring Mel Frequency Cepstral Coefficients (MFCC) for Enhanced Performance in Speaker Recognition." International Journal for Research in Applied Science and Engineering Technology 11, no. 8 (August 31, 2023): 88–98. http://dx.doi.org/10.22214/ijraset.2023.55124.

Abstract: Speaker recognition, a fundamental capability of software or hardware systems, involves receiving speech signals, identifying the speaker present in the speech signal, and subsequently recognizing the speaker for future interactions. This process emulates the cognitive task performed by the human brain. At its core, speaker recognition begins with speech as the input to the system. Various techniques have been developed for speech recognition, including Mel frequency cepstral coefficients (MFCC), Linear Prediction Coefficients (LPC), Linear Prediction Cepstral coefficients (LPCC), Line Spectral Frequencies (LSF), Discrete Wavelet Transform (DWT), and Perceptual Linear Prediction (PLP). Although LPC and several other techniques have been explored, they are often deemed impractical for real-time applications. In contrast, MFCC stands out as one of the most prominent and widely used techniques for speaker recognition. The utilization of cepstrum allows for the computation of resemblance between two cepstral feature vectors, making it an effective tool in this domain. In comparison to LPC-derived cepstrum features, the use of MFCC features has demonstrated superior performance in metrics such as False Acceptance Rate (FAR) and False Rejection Rate (FRR) for speaker recognition systems. MFCCs leverage the human ear's critical bandwidth fluctuations with respect to frequency. To capture phonetically important characteristics of speech signals, filters are linearly separated at low frequencies and logarithmically separated at high frequencies. This design choice is central to the effectiveness of the MFCC technique. The primary objective of the proposed work is to devise efficient techniques that extract pertinent information related to the speaker, thereby enhancing the overall performance of the speaker recognition system. By optimizing feature extraction methods, this research aims to contribute to the advancement of speaker recognition technology.
13

Prayogi, Yanuar Risah. "Modifikasi Metode MFCC untuk Identifikasi Pembicara di Lingkungan Ber-Noise." JOINTECS (Journal of Information Technology and Computer Science) 4, no. 1 (July 12, 2019): 13. http://dx.doi.org/10.31328/jointecs.v4i1.999.

Abstract:
Several feature extraction methods for speaker identification systems share a weakness: in noisy environments their accuracy drops. The Mel-Frequency Cepstral Coefficient (MFCC) feature extraction method is a speech-signal extraction method that is sensitive to noise. The MFCC method produces high accuracy in clean environments; conversely, in noisy environments the resulting accuracy drops drastically. This study proposes a feature extraction method that combines MFCC with an endpoint detection algorithm. The endpoint detection algorithm separates speech and non-speech regions. Non-speech regions usually contain mostly noise, so they can serve as a source of noise information. The noise information is extracted to obtain the noise frequency magnitudes. Tests of the proposed method yield higher accuracy for all noise types and SNR levels. The accuracy produced by the proposed method is 14.69% higher than the MFCC method, 6.4% higher than the MFCC+Wiener method, and 2.74% higher than the MFCC+Spectral Subtraction (SS) method.
14

Xie, Tao, Xiaodong Zheng, and Yan Zhang. "Seismic facies analysis based on speech recognition feature parameters." GEOPHYSICS 82, no. 3 (May 1, 2017): O23—O35. http://dx.doi.org/10.1190/geo2016-0121.1.

Abstract:
Seismic facies analysis plays an important role in seismic stratigraphy. Seismic attributes have been widely applied to seismic facies analysis. One of the most important steps is to optimize the most sensitive attributes with regard to reservoir characteristics. Using different attribute combinations in multidimensional analyses will yield different solutions. Acoustic waves and seismic waves propagating in an elastic medium follow the same law of physics. The generation process of a speech signal based on the acoustic model is similar to the seismic data of the convolution model. We have developed the mel-frequency cepstrum coefficients (MFCCs), which have been successfully applied in speech recognition, as feature parameters for seismic facies analysis. Information about the wavelet and reflection coefficients is well-separated in these cepstrum-domain parameters. Specifically, information about the wavelet mainly appears in the low-domain part, and information about the reflection coefficients mainly appeared in the high-domain part. In the forward model, the seismic MFCCs are used as feature vectors for synthetic data with a noise level of zero and 5%. The Bayesian network is used to classify the traces. Then, classification accuracy rates versus different orders of the MFCCs are obtained. The forwarding results indicate that high accuracy rates are achieved when the order exceeds 10. For the real field data, the seismic data are decomposed into a set of MFCC parameters. The different information is unfolded in the parameter maps, enabling the interpreter to capture the geologic features of the target interval. The geologic features presented in the three instantaneous attributes and coherence can also be found in the MFCC parameter maps. The classification results are in accordance with the paleogeomorphy of the target interval as well as the known wells. The results from the synthetic data and real field data demonstrate the information description abilities of the seismic MFCC parameters. Therefore, using the speech feature parameters to extract information may be helpful for processing and interpreting seismic data.
15

Phapatanaburi, Khomdet, Wongsathon Pathonsuwan, Longbiao Wang, Patikorn Anchuen, Talit Jumphoo, Prawit Buayai, Monthippa Uthansakul, and Peerapong Uthansakul. "Whispered Speech Detection Using Glottal Flow-Based Features." Symmetry 14, no. 4 (April 8, 2022): 777. http://dx.doi.org/10.3390/sym14040777.

Abstract:
Recent studies have reported that the performance of Automatic Speech Recognition (ASR) technologies designed for normal speech notably deteriorates when it is evaluated by whispered speech. Therefore, the detection of whispered speech is useful in order to attenuate the mismatch between training and testing situations. This paper proposes two new Glottal Flow (GF)-based features, namely, GF-based Mel-Frequency Cepstral Coefficient (GF-MFCC) as a magnitude-based feature and GF-based relative phase (GF-RP) as a phase-based feature for whispered speech detection. The main contribution of the proposed features is to extract magnitude and phase information obtained by the GF signal. In the GF-MFCC, Mel-frequency cepstral coefficient (MFCC) feature extraction is modified using the estimated GF signal derived from the iterative adaptive inverse filtering as the input to replace the raw speech signal. In a similar way, the GF-RP feature is the modification of the relative phase (RP) feature extraction by using the GF signal instead of the raw speech signal. The whispered speech production provides lower amplitude from the glottal source than normal speech production, thus, the whispered speech via Discrete Fourier Transformation (DFT) provides the lower magnitude and phase information, which make it different from a normal speech. Therefore, it is hypothesized that two types of our proposed features are useful for whispered speech detection. In addition, using the individual GF-MFCC/GF-RP feature, the feature-level and score-level combination are also proposed to further improve the detection performance. The performance of the proposed features and combinations in this study is investigated using the CHAIN corpus. The proposed GF-MFCC outperforms MFCC, while GF-RP has a higher performance than the RP. Further improved results are obtained via the feature-level combination of MFCC and GF-MFCC (MFCC&GF-MFCC)/RP and GF-RP(RP&GF-RP) compared with using either one alone. In addition, the combined score of MFCC&GF-MFCC and RP&GF-RP gives the best frame-level accuracy of 95.01% and the utterance-level accuracy of 100%.
16

Lalitha, S., and Deepa Gupta. "An Encapsulation of Vital Non-Linear Frequency Features for Various Speech Applications." Journal of Computational and Theoretical Nanoscience 17, no. 1 (January 1, 2020): 303–7. http://dx.doi.org/10.1166/jctn.2020.8666.

Abstract:
Mel Frequency Cepstral Coefficients (MFCCs) and Perceptual Linear Prediction Coefficients (PLPCs) are widely used nonlinear vocal parameters in the majority of speaker identification, speaker recognition, and speech recognition techniques, as well as in the field of emotion recognition. Since the 1980s, significant effort has been put into the development of these features. Considerations such as the use of appropriate frequency estimation approaches, the design of suitable filter banks, and the selection of preferred features play a vital part in the strength of models employing these features. This article presents an overview of MFCC and PLPC features for different speech applications. Insights such as accuracy metrics, background environment, type of data, and feature size are inspected and summarized with the corresponding key references. In addition, the advantages and shortcomings of these features are discussed. This background work will hopefully contribute a first step toward enhancing MFCC and PLPC features with respect to novelty, higher accuracy, and lower complexity.
17

H. Mohd Johari, N., Noreha Abdul Malik, and K. A. Sidek. "Distinctive features for normal and crackles respiratory sounds using cepstral coefficients." Bulletin of Electrical Engineering and Informatics 8, no. 3 (September 1, 2019): 875–81. http://dx.doi.org/10.11591/eei.v8i3.1517.

Abstract:
Classification of respiratory sounds as normal or abnormal is crucial for screening and diagnosis, since lung-associated diseases can be detected through this technique. With the advancement of computerized auscultation technology, adventitious sounds such as crackles can be detected, and therefore diagnostic tests can be performed earlier. In this paper, Linear Predictive Cepstral Coefficients (LPCC) and Mel-frequency Cepstral Coefficients (MFCC) are used to extract features from normal and crackle respiratory sounds. Statistical computations such as the mean and standard deviation (SD) of the cepstral-based coefficients can differentiate between crackle and normal sounds. The statistical computations of the LPCC and MFCC coefficients show that the mean LPCCs, except for the third coefficient, and the first three coefficients of the MFCC SD provide distinctive features between normal and crackle respiratory sounds. Hence, LPCCs and MFCCs can be used as feature extraction methods for respiratory sounds to classify normal versus crackles as a screening and diagnostic tool.
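The statistical summary described here, per-recording mean and standard deviation of each cepstral coefficient, can be sketched as follows; the analogous LPCC statistics and the final classifier are omitted, and the function name is illustrative.

```python
# Per-recording mean and standard deviation of each MFCC coefficient as features.
import numpy as np
import librosa

def cepstral_stats(y, sr, n_mfcc=13):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)        # (n_mfcc, n_frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])  # 2 * n_mfcc summary values
```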
18

Zhang, Lanyue, Di Wu, Xue Han, and Zhongrui Zhu. "Feature Extraction of Underwater Target Signal Using Mel Frequency Cepstrum Coefficients Based on Acoustic Vector Sensor." Journal of Sensors 2016 (2016): 1–11. http://dx.doi.org/10.1155/2016/7864213.

Abstract:
This paper studies a feature extraction method for underwater target signals using Mel frequency cepstrum coefficients (MFCC) based on an acoustic vector sensor. Pressure and particle-velocity signals of an underwater target are simulated, and MFCC features of the target are extracted to verify the feasibility of the method. A feature extraction experiment on two kinds of underwater targets is carried out, and these targets are classified and recognized by a Backpropagation (BP) neural network using multi-information fusion. The results show that MFCC, first-order differential MFCC, and second-order differential MFCC features can be used as effective features to recognize these underwater targets; the recognition rate using the particle-velocity signal is higher than that using the pressure signal, and it can be further improved by using fused features.
19

Saumard, Matthieu. "Enhancing Speech Emotions Recognition Using Multivariate Functional Data Analysis." Big Data and Cognitive Computing 7, no. 3 (August 25, 2023): 146. http://dx.doi.org/10.3390/bdcc7030146.

Abstract:
Speech Emotions Recognition (SER) has gained significant attention in the fields of human–computer interaction and speech processing. In this article, we present a novel approach to improve SER performance by interpreting the Mel Frequency Cepstral Coefficients (MFCC) as a multivariate functional data object, which accelerates learning while maintaining high accuracy. To treat MFCCs as functional data, we preprocess them as images and apply resizing techniques. By representing MFCCs as functional data, we leverage the temporal dynamics of speech, capturing essential emotional cues more effectively. Consequently, this enhancement significantly contributes to the learning process of SER methods without compromising performance. Subsequently, we employ a supervised learning model, specifically a functional Support Vector Machine (SVM), directly on the MFCC represented as functional data. This enables the utilization of the full functional information, allowing for more accurate emotion recognition. The proposed approach is rigorously evaluated on two distinct databases, EMO-DB and IEMOCAP, serving as benchmarks for SER evaluation. Our method demonstrates competitive results in terms of accuracy, showcasing its effectiveness in emotion recognition. Furthermore, our approach significantly reduces the learning time, making it computationally efficient and practical for real-world applications. In conclusion, our novel approach of treating MFCCs as multivariate functional data objects exhibits superior performance in SER tasks, delivering both improved accuracy and substantial time savings during the learning process. This advancement holds great potential for enhancing human–computer interaction and enabling more sophisticated emotion-aware applications.
20

Singh, Satyanand, and E. G. Rajan. "Vector Quantization Approach for Speaker Recognition using MFCC and Inverted MFCC." International Journal of Computer Applications 17, no. 1 (March 31, 2011): 1–7. http://dx.doi.org/10.5120/2188-2774.

21

Dadula, Cristina P., and Elmer P. Dadios. "Fuzzy Logic System for Abnormal Audio Event Detection Using Mel Frequency Cepstral Coefficients." Journal of Advanced Computational Intelligence and Intelligent Informatics 21, no. 2 (March 15, 2017): 205–10. http://dx.doi.org/10.20965/jaciii.2017.p0205.

Abstract:
This paper presents a fuzzy logic system for audio event detection using mel frequency cepstral coefficients (MFCC). Twelve MFCCs of audio samples were analyzed, and the range of MFCC values, including their histograms, was obtained. These values were normalized so that their minimum and maximum values lie between 0 and 1. Rules were formulated based on the histograms to classify audio samples as normal, gunshot, or crowd panic. Five MFCCs were chosen as input to the fuzzy logic system. The membership functions and rules of the fuzzy logic system are defined based on the normalized histograms of the MFCCs. The system was tested with a total of 150 minutes of normal sounds from different buses and 72 seconds of audio clips of abnormal sounds. The designed fuzzy logic system was able to classify audio events with an average accuracy of 99.4%.
22

Tirta, Luhfita, Joan Santoso, and Endang Setyati. "Pengenalan Lirik Lagu Otomatis Pada Video Lagu Indonesia Menggunakan Hidden Markov Model Yang Dilengkapi Music Removal." Journal of Information System,Graphics, Hospitality and Technology 4, no. 2 (December 12, 2022): 86–94. http://dx.doi.org/10.37823/insight.v4i2.225.

Abstract:
It is important that the audio information in a video can be understood by all members of society, including people with hearing problems, and the most natural way to achieve this is through the use of subtitles. Therefore, the researchers built automatic song lyric recognition for Indonesian song videos using a Hidden Markov Model equipped with music removal. Speech recognition is more accurate with the HMM model equipped with MFCC (81% matched words and 19% WER) than with the LDA + MFCC model (71% matched words and 29% WER) and the DWT + MFCC model (61% matched words and 39% WER). The number of words and voice samples in the Indonesian-language library strongly affects MFCC and CMU Sphinx-4, and the pitch of the input song processed by CMU Sphinx-4 also strongly affects the success rate, because CMU Sphinx-4 is very sensitive to pitches that are too high and to noise in the input song; for this reason the researchers added feature extraction of the audio using MFCC. A small dataset was used first to make sure that the Hidden Markov Model method equipped with MFCC and CMU Sphinx-4 runs well. Building on several previous studies, the final results obtained in this study using the HMM method equipped with MFCC and CMU Sphinx-4 are a training accuracy of 78% and a testing accuracy of 81% for word matching on song videos.
23

Sharif, Afriandy, Opim Salim Sitompul, and Erna Budhiarti Nababan. "Analysis Of Variation In The Number Of MFCC Features In Contrast To LSTM In The Classification Of English Accent Sounds." JOURNAL OF INFORMATICS AND TELECOMMUNICATION ENGINEERING 6, no. 2 (January 25, 2023): 587–601. http://dx.doi.org/10.31289/jite.v6i2.8566.

Abstract:
Various studies have classified English accents using both traditional and modern classifiers. In general, previous research on voice classification and voice recognition has used the MFCC method for voice feature extraction. The stages in this study began with importing the dataset and preprocessing the data, then performing MFCC feature extraction, training the model, testing model accuracy, and displaying a confusion matrix of the model accuracy, after which the classification results were analyzed. The overall results of the 10 tests on the test set show the highest accuracy, 64.96%, for 17 features. The test results also yield some important information: the results for MFCC coefficient counts from twelve to twenty show overfitting, seen in a training process that repeatedly produces high accuracy while the classification testing produces low accuracy. The feature setting in MFCC shows that the higher the number of MFCC features, the larger the resulting sound feature dimension; with such a large number of features, the MFCC method has a weakness in determining the appropriate number of features.
24

Zhu, Qiang, Zhong Wang, Yunfeng Dou, and Jian Zhou. "Whispered Speech Conversion Based on the Inversion of Mel Frequency Cepstral Coefficient Features." Algorithms 15, no. 2 (February 20, 2022): 68. http://dx.doi.org/10.3390/a15020068.

Abstract:
A conversion method based on the inversion of Mel frequency cepstral coefficient (MFCC) features was proposed to convert whispered speech into normal speech. First, the MFCC features of whispered speech and normal speech were extracted and a matching relation between the MFCC feature parameters of whispered speech and normal speech was developed through the Gaussian mixture model (GMM). Then, the MFCC feature parameters of normal speech corresponding to whispered speech were obtained based on the GMM and, finally, whispered speech was converted into normal speech through the inversion of MFCC features. The experimental results showed that the cepstral distortion (CD) of the normal speech converted by the proposed method was 21% less than that of the normal speech converted by the linear predictive coefficient (LPC) features, the mean opinion score (MOS) was 3.56, and a satisfactory outcome in both intelligibility and sound quality was achieved.
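The final inversion step described above, reconstructing a waveform from MFCC features, can be sketched with off-the-shelf tools. The GMM mapping from whispered to normal MFCCs is omitted, and librosa's inversion utilities (inverse DCT, mel inversion, Griffin-Lim phase estimation) stand in for the paper's own inversion procedure.

```python
# Reconstruct a waveform from MFCCs; a stand-in for the paper's MFCC inversion.
import librosa

def mfcc_to_speech(mfcc, sr=16000):
    # inverse DCT -> approximate mel spectrogram -> Griffin-Lim phase recovery
    return librosa.feature.inverse.mfcc_to_audio(mfcc, sr=sr)
```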
25

Rajesh, Sangeetha, and Nalini N. J. "Recognition of Musical Instrument Using Deep Learning Techniques." International Journal of Information Retrieval Research 11, no. 4 (October 2021): 41–60. http://dx.doi.org/10.4018/ijirr.2021100103.

Abstract:
The proposed work investigates the impact of Mel Frequency Cepstral Coefficients (MFCC), Chroma DCT Reduced Pitch (CRP), and Chroma Energy Normalized Statistics (CENS) for instrument recognition from monophonic instrumental music clips using deep learning techniques, Bidirectional Recurrent Neural Networks with Long Short-Term Memory (BRNN-LSTM), stacked autoencoders (SAE), and Convolutional Neural Network - Long Short-Term Memory (CNN-LSTM). Initially, MFCC, CENS, and CRP features are extracted from instrumental music clips collected as a dataset from various online libraries. In this work, the deep neural network models have been fabricated by training with extracted features. Recognition rates of 94.9%, 96.8%, and 88.6% are achieved using combined MFCC and CENS features, and 90.9%, 92.2%, and 87.5% are achieved using combined MFCC and CRP features with deep learning models BRNN-LSTM, CNN-LSTM, and SAE, respectively. The experimental results evidence that MFCC features combined with CENS and CRP features at score level revamp the efficacy of the proposed system.
26

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Optimizing Integrated Features for Hindi Automatic Speech Recognition System." Journal of Intelligent Systems 29, no. 1 (October 1, 2018): 959–76. http://dx.doi.org/10.1515/jisys-2018-0057.

Abstract:
An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use the Mel frequency (MF) cepstral coefficient (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficient (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system using all these combinations was tested. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods: particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows significant improvement over all other optimized combinations.
27

Boulmaiz, Amira, Djemil Messadeg, Noureddine Doghmane, and Abdelmalik Taleb-Ahmed. "Design and Implementation of a Robust Acoustic Recognition System for Waterbird Species using TMS320C6713 DSK." International Journal of Ambient Computing and Intelligence 8, no. 1 (January 2017): 98–118. http://dx.doi.org/10.4018/ijaci.2017010105.

Abstract:
In this paper, a new real-time approach for audio recognition of waterbird species in noisy environments, based on a Texas Instruments DSP, i.e. TMS320C6713 is proposed. For noise estimation in noisy water bird's sound, a tonal region detector (TRD) using a sigmoid function is introduced. This method offers flexibility since the slope and the mean of the sigmoid function can be adapted autonomously for a better trade-off between noise overvaluation and undervaluation. Then, the features Mel Frequency Cepstral Coefficients post processed by Spectral Subtraction (MFCC-SS) were extracted for classification using Support Vector Machine classifier. A development of the Simulink analysis models of classic MFCC and MFCC-SS is described. The audio recognition system is implemented in real time by loading the created models in DSP board, after being converted to target C code using Code Composer Studio. Experimental results demonstrate that the proposed TRD-MFCC-SS feature is highly effective and performs satisfactorily compared to conventional MFCC feature, especially in complex environment.
28

Yang, Xing Hai, Wen Jie Fu, Yu Tai Wang, Jia Ding, and Chang Zhi Wei. "Heart Sound Clustering Based on Supervised Kohonen Network." Applied Mechanics and Materials 138-139 (November 2011): 1115–20. http://dx.doi.org/10.4028/www.scientific.net/amm.138-139.1115.

Abstract:
In this paper, a new method based on a Supervised Kohonen Network (SKN) and Mel-frequency cepstrum coefficients (MFCC) is introduced. MFCCs of the heart sound signal are extracted first, and then features are obtained by calculating the average energy of every MFCC order. Finally, the SKN is used to identify the heart sound. The experimental results show that this algorithm performs well in heart sound clustering and is of significant practical value.
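The feature described here, the average energy of every MFCC order over the recording, can be sketched directly; the Supervised Kohonen Network classifier itself is omitted and the function name is illustrative.

```python
# Average energy of each MFCC order over a heart-sound recording.
import numpy as np
import librosa

def mfcc_order_energy(y, sr, n_mfcc=13):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, n_frames)
    return (mfcc ** 2).mean(axis=1)                          # one energy value per order
```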
29

Panda, Siva Prasad, Rajsekhar Reddy A, and Uttam Prasad Panigrahy. "EVALUATION OF ANTICANCER ACTIVITY OF CUCUMIS CALLOSUS AGAINST EHRLICH’S ASCITES CARCINOMA BEARING MICE." Asian Journal of Pharmaceutical and Clinical Research 11, no. 10 (October 7, 2018): 438. http://dx.doi.org/10.22159/ajpcr.2018.v11i10.27439.

Abstract:
Objective: Our previous research isolated Cucurbitacin B (CuB) and ebenone leucopentaacetate (ELP) from the methanolic fruit extract of Cucumis callosus (MFCC). The fruits of C. callosus (Rottl.) Cogn. (Family: Cucurbitaceae) have traditionally been used for antioxidant, anti-inflammatory, and antidiabetic actions. The objective of this research was to evaluate the in vitro and in vivo anticancer effect of MFCC on Ehrlich Ascites Carcinoma (EAC) cell lines. Methods: The in vitro anticancer assay of MFCC and the standard drug 5-fluorouracil (5-FU) was evaluated using the Trypan blue and 3-(4,5-dimethylthiazol-yl)-2,5-diphenyl tetrazolium bromide methods. The in vivo anticancer activity of MFCC and 5-FU was also assessed, starting 24 h after inoculation of EAC cells (2×10⁶ cells/mouse), based on a toxicity study, for 9 consecutive days. The activity of the extract was assessed by the study of tumor volume, tumor weight, viable and non-viable cell count, hematological parameters, and biochemical estimations. Results: MFCC showed a direct antitumor effect on EAC cells in a dose-dependent manner, with an IG50 value of 0.61 mg/ml. Furthermore, MFCC (350 mg/kg) produced a significant (p<0.01) decrease in tumor volume, tumor weight, and viable cell count of EAC-treated mice. The hematological profile and biochemical estimation assays significantly (p<0.01) reverted to normal levels in MFCC- and 5-FU-treated mice. Conclusion: The anticancer activity of the fruits of C. callosus may be due to the presence of CuB and/or ELP as phytoconstituents, and the activity is comparable to the standard drug 5-FU.
30

Gupta, Shikha, Jafreezal Jaafar, Wan Fatimah wan Ahmad, and Arpit Bansal. "Feature Extraction Using Mfcc." Signal & Image Processing : An International Journal 4, no. 4 (August 31, 2013): 101–8. http://dx.doi.org/10.5121/sipij.2013.4408.

31

Ruan, Peilin, Xu Zheng, Yi Qiu, and Zhiyong Hao. "A Binaural MFCC-CNN Sound Quality Model of High-Speed Train." Applied Sciences 12, no. 23 (November 28, 2022): 12151. http://dx.doi.org/10.3390/app122312151.

Abstract:
The high-speed train (HST) is one of the most important transport tools in China, and the sound quality of its interior noise affects passengers’ comfort. This paper proposes a HST sound quality model. The model combines Mel-scale frequency cepstral coefficients (MFCCs), the most popular spectral-based input parameter in deep learning models, with convolutional neural networks (CNNs) to evaluate the sound quality of HSTs. Meanwhile, two input channels are applied to simulate binaural hearing so that the different sound signals can be processed separately. The binaural MFCC-CNN model achieves an accuracy of 96.2% and outperforms the traditional shallow neural network model because it considers the time-varying characteristics of noise. The MFCC features are capable of capturing the characteristics of noise and improving the accuracy of sound quality evaluations. Besides, the results suggest that the time and level differences in sound signals are important factors affecting sound quality at low annoyance levels. The proposed model is expected to optimize the comfort of the interior acoustic environment of HSTs.
32

Gao, Lun, Tai Fu Li, and Feng Wen. "Application in Extraction of Voice Recognition Characteristic for Relief Algorithm." Advanced Materials Research 765-767 (September 2013): 2772–75. http://dx.doi.org/10.4028/www.scientific.net/amr.765-767.2772.

Abstract:
Addressing the voice recognition problem, this paper describes feature selection for voice signals via the Relief algorithm. Starting from 24-dimensional MFCC parameters, the most important MFCC parameters in the voice signal are identified; under the condition that the recognition ratio does not change greatly, an optimized combination of the 24-dimensional MFCC parameters is obtained, providing a new direction for voice recognition features.
33

Nichting, Thomas J., Maretha Bester, Rohan Joshi, Massimo Mischi, Myrthe van der Ven, Daisy A. A. van der Woude, S. Guid Oei, Judith O. E. H. van Laar, and Rik Vullings. "Evidence and clinical relevance of maternal-fetal cardiac coupling: A scoping review." PLOS ONE 18, no. 7 (July 12, 2023): e0287245. http://dx.doi.org/10.1371/journal.pone.0287245.

Abstract:
Background Researchers have long suspected a mutual interaction between maternal and fetal heart rhythms, referred to as maternal-fetal cardiac coupling (MFCC). While several studies have been published on this phenomenon, they vary in terms of methodologies, populations assessed, and definitions of coupling. Moreover, a clear discussion of the potential clinical implications is often lacking. Subsequently, we perform a scoping review to map the current state of the research in this field and, by doing so, form a foundation for future clinically oriented research on this topic. Methods A literature search was performed in PubMed, Embase, and Cochrane. Filters were only set for language (English, Dutch, and German literature were included) and not for year of publication. After screening for the title and the abstract, a full-text evaluation of eligibility followed. All studies on MFCC were included which described coupling between heart rate measurements in both the mother and fetus, regardless of the coupling method used, gestational age, or the maternal or fetal health condition. Results 23 studies remained after a systematic evaluation of 6,672 studies. Of these, 21 studies found at least occasional instances of MFCC. Methods used to capture MFCC are synchrograms and corresponding phase coherence indices, cross-correlation, joint symbolic dynamics, transfer entropy, bivariate phase rectified signal averaging, and deep coherence. Physiological pathways regulating MFCC are suggested to exist either via the autonomic nervous system or due to the vibroacoustic effect, though neither of these suggested pathways has been verified. The strength and direction of MFCC are found to change with gestational age and with the rate of maternal breathing, while also being further altered in fetuses with cardiac abnormalities and during labor. Conclusion From the synthesis of the available literature on MFCC presented in this scoping review, it seems evident that MFCC does indeed exist and may have clinical relevance in tracking fetal well-being and development during pregnancy.
34

Hidayat, Syahroni. "SPEECH RECOGNITION OF KV-PATTERNED INDONESIAN SYLLABLE USING MFCC, WAVELET AND HMM." Kursor 8, no. 2 (December 12, 2016): 67. http://dx.doi.org/10.28961/kursor.v8i2.63.

Abstract:
The Indonesian language is an agglutinative language which has complex suffixes and affixes attached to its roots. For this reason there is a high possibility of recognizing Indonesian speech based on its syllables. Syllable-based Indonesian speech recognition could reduce the database size and recognize new Indonesian vocabulary that evolves as the language develops. MFCC and WPT Daubechies 3rd-order (DB3) and 7th-order (DB7) methods are used in the feature extraction process, and an HMM with Euclidean distance probability is applied for classification. The results show that the best recognition rates are 75% and 70.8% for the MFCC and WPT methods respectively, obtained from testing with the training data. Meanwhile, for testing with an external data set, the WPT method outperforms the MFCC method, where the best recognition rate is 53.1% for WPT and 47% for MFCC. For MFCC, the accuracy increased as the data length and the frame length increased. In WPT, the increase in accuracy is influenced by the length of the data, the type of wavelet, and the decomposition level. It is also found that as the variation of states increased, the recognition rate for both methods decreased.
35

Hartono, Henry, Viny Christanti Mawardi, and Janson Hendryli. "PERANCANGAN SISTEM PENCARIAN LAGU INDONESIA MENGGUNAKAN QUERY BY HUMMING BERBASIS LONG SHORT-TERM MEMORY." Jurnal Ilmu Komputer dan Sistem Informasi 9, no. 1 (January 18, 2021): 106. http://dx.doi.org/10.24912/jiksi.v9i1.11567.

Abstract:
Song identification and query by humming is an application developed using Mel-frequency cepstral coefficients (MFCC) and the Long Short-Term Memory (LSTM) algorithm. The application's purpose is to detect and recognize humming from the input data. In this application, the humming input is divided into two parts: training audio and test audio. The training audio goes through two process stages, namely recognizing the humming and searching for the unique features of the humming audio. To recognize the humming features, the humming is processed using the MFCC method. After the MFCC features are obtained, they are saved as a vector model, and the extracted features are learned by the LSTM method. For the test audio, the stages are carried out as in the training audio; after the MFCC features are detected, recognition is performed based on the learning done with the LSTM method to obtain output in the form of a song name, which, when successfully recognized and detected, is labeled by the application.
36

Bhalke, Daulappa Guranna, Betsy Rajesh, and Dattatraya Shankar Bormane. "Automatic Genre Classification Using Fractional Fourier Transform Based Mel Frequency Cepstral Coefficient and Timbral Features." Archives of Acoustics 42, no. 2 (June 27, 2017): 213–22. http://dx.doi.org/10.1515/aoa-2017-0024.

Abstract:
This paper presents the automatic genre classification of Indian Tamil music and Western music using Timbral and Fractional Fourier Transform (FrFT)-based Mel Frequency Cepstral Coefficient (MFCC) features. The classifier model for the proposed system has been built using K-NN (K-Nearest Neighbours) and Support Vector Machine (SVM) classifiers. In this work, the performance of various features extracted from music excerpts has been analysed to identify the appropriate feature descriptors for the two major genres of Indian Tamil music, namely Classical music (Carnatic-based devotional hymn compositions) and Folk music, and for the western genres of Rock and Classical music from the GTZAN dataset. The results for Tamil music show that the feature combination of Spectral Roll-off, Spectral Flux, Spectral Skewness and Spectral Kurtosis, combined with Fractional MFCC features, outperforms all other feature combinations, yielding a higher classification accuracy of 96.05%, compared to an accuracy of 84.21% with conventional MFCC. It has also been observed that the FrFT-based MFCC efficiently classifies the two western genres of Rock and Classical music from the GTZAN dataset with a higher classification accuracy of 96.25%, compared to a classification accuracy of 80% with MFCC.
37

Chen, Qianru, Zhifeng Wu, Qinghua Zhong, and Zhiwei Li. "Heart Sound Classification Based on Mel-Frequency Cepstrum Coefficient Features and Multi-Scale Residual Recurrent Neural Networks." Journal of Nanoelectronics and Optoelectronics 17, no. 8 (August 1, 2022): 1144–53. http://dx.doi.org/10.1166/jno.2022.3305.

Abstract:
A rapid and accurate algorithm model of extracting heart sounds plays a vital role in the early detection of cardiovascular disorders, especially for small primary health care clinics. This paper proposes a heart sound extraction and classification algorithm based on static and dynamic combination of Mel-frequency cepstrum coefficient (MFCC) feature extraction and the multi-scale residual recurrent neural network (MsRes-RNN) algorithm model. The standard MFCC parameters represent the static characteristics of the signal. In contrast, the first-order and second-order MFCC parameters represent the dynamic characteristics of the signal. They are extracted and combined to form the MFCC feature representation. Then, the MFCC-based features are fed to a MsRes-RNN algorithm model for feature learning and classification tasks. The proposed classification model can take advantage of the encoded local characteristics extracted from the multi-scale residual neural network (MsResNet) and the long-term dependencies captured by recurrent neural network (RNN). Model estimation experiments and performance comparisons with other state-of-the-art algorithms are presented in this paper. Experiments indicate that a classification accuracy of 93.9% has been achieved on 2016 PhysioNet/CinC Challenge datasets.
38

Sasilo, Ababil Azies, Rizal Adi Saputra, and Ika Purwanti Ningrum. "Sistem Pengenalan Suara Dengan Metode Mel Frequency Cepstral Coefficients Dan Gaussian Mixture Model." Komputika : Jurnal Sistem Komputer 11, no. 2 (August 25, 2022): 203–10. http://dx.doi.org/10.34010/komputika.v11i2.6655.

Abstract:
Biometric technology is becoming a technology trend in many areas of life. Biometric technology uses parts of the human body, which are unique to each individual, as the system's measuring instrument. The voice is a part of the human body that is unique and suitable as a measuring instrument in systems that adopt biometric technology. A voice recognition system is one application of biometric technology that focuses on the human voice. A voice recognition system requires a feature extraction method and a classification method; one feature extraction method is MFCC. MFCC proceeds through the pre-emphasis, frame blocking, windowing, fast Fourier transform, mel frequency wrapping, and cepstrum stages, while the classification method uses GMM by computing the likelihood of similarity between voices. Based on the test results, the MFCC-GMM method has an accuracy of 82.22% under ideal conditions and 66.67% under non-ideal conditions. Keywords: Voice, Recognition, MFCC, GMM, System.
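An MFCC-GMM recognizer in the spirit of this abstract can be sketched as follows: one GMM is trained per speaker on that speaker's MFCC frames, and a test utterance is assigned to the speaker whose GMM gives the highest average log-likelihood. The parameter choices (13 coefficients, 16 diagonal-covariance components) and function names are illustrative assumptions.

```python
# MFCC-GMM sketch: per-speaker GMMs scored by average log-likelihood.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def mfcc_frames(y, sr, n_mfcc=13):
    return librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).T    # (n_frames, n_mfcc)

def train_speaker_models(train_data, n_components=16):
    """train_data: dict mapping speaker name -> list of (signal, sr) recordings."""
    models = {}
    for name, recordings in train_data.items():
        frames = np.vstack([mfcc_frames(y, sr) for y, sr in recordings])
        models[name] = GaussianMixture(n_components, covariance_type="diag").fit(frames)
    return models

def identify(y, sr, models):
    frames = mfcc_frames(y, sr)
    # pick the speaker whose model gives the highest average log-likelihood
    return max(models, key=lambda name: models[name].score(frames))
```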
39

Shivaprasad, Satla, and Manchala Sadanandam. "Optimized Features Extraction from Spectral and Temporal Features for Identifying the Telugu Dialects by Using GMM and HMM." Ingénierie des systèmes d information 26, no. 3 (June 30, 2021): 275–83. http://dx.doi.org/10.18280/isi.260304.

Abstract:
The Telugu language is one of the historical languages and belongs to the Dravidian family. It contains three dialects, named Telangana, Costa Andhra, and Rayalaseema. This paper identifies the dialects of the Telugu language. MFCC, Delta MFCC, and Delta-Delta MFCC are applied, yielding 39-dimensional feature vectors for dialect identification. In addition, ZCR is also applied to identify the dialects; finally, all the MFCC and ZCR features are combined. A standard database is created to identify the dialects of the Telugu language. Different statistical methods such as HMM and GMM are applied for classification. To improve the accuracy of the model, the dimensionality reduction technique PCA is applied to reduce the number of features extracted from the speech signal before they are fed to the models. In this work, an increase in model accuracy is observed with the application of dimensionality reduction.
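The per-frame feature vector described here (13 MFCC + 13 delta + 13 delta-delta, with zero-crossing rate appended) and the PCA reduction step can be sketched as below; the GMM/HMM dialect classifiers themselves are omitted and the function names are illustrative.

```python
# 39 MFCC-family values plus ZCR per frame, followed by PCA dimensionality reduction.
import numpy as np
import librosa
from sklearn.decomposition import PCA

def dialect_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    d1 = librosa.feature.delta(mfcc)                  # delta MFCC
    d2 = librosa.feature.delta(mfcc, order=2)         # delta-delta MFCC
    zcr = librosa.feature.zero_crossing_rate(y)       # (1, n_frames)
    n = min(mfcc.shape[1], zcr.shape[1])              # align frame counts
    return np.vstack([mfcc[:, :n], d1[:, :n], d2[:, :n], zcr[:, :n]]).T  # (n_frames, 40)

def reduce_dim(frames, n_components=20):
    # reduce the per-frame features before fitting GMM/HMM models
    return PCA(n_components=n_components).fit_transform(frames)
```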
40

González, M. L. Jiménez, Carlos Hernández Benítez, Zabdiel Abisai Juarez, Evelyn Zamudio Pérez, Víctor Ángel Ramírez Coutiño, Irma Robles, Luis A. Godínez, and Francisco J. Rodríguez-Valadez. "Study of the Effect of Activated Carbon Cathode Configuration on the Performance of a Membrane-Less Microbial Fuel Cell." Catalysts 10, no. 6 (June 2, 2020): 619. http://dx.doi.org/10.3390/catal10060619.

Abstract:
In this paper, the effect of cathode configuration on the performance of a membrane-less microbial fuel cell (MFC) was evaluated using three different arrangements: an activated carbon bed exposed to air (MFCE), a wetland immersed in an activated carbon bed (MFCW) and a cathode connected to an aeration tower featuring a water recirculation device (MFCT). To evaluate the MFC performance, the efficiency of the organic matter removal, the generated voltage, the power density and the internal resistance of the systems were properly assessed. The experimental results showed that while the COD removal efficiency was in all cases over 60% (after 40 days), the MFCT arrangement showed the best performance since the average removal value was 82%, compared to close to 70% for MFCE and MFCW. Statistical analysis of the COD removal efficiency confirmed that the performance of MFCT is substantially better than that of MFCE and MFCW. In regard to the other parameters surveyed, no significant influence of the different cathode arrangements explored could be found.
APA, Harvard, Vancouver, ISO, and other styles
41

Hidayat, Syahroni, Muhammad Tajuddin, Siti Agrippina Alodia Yusuf, Jihadil Qudsi, and Nenet Natasudian Jaya. "WAVELET DETAIL COEFFICIENT AS A NOVEL WAVELET-MFCC FEATURES IN TEXT-DEPENDENT SPEAKER RECOGNITION SYSTEM." IIUM Engineering Journal 23, no. 1 (January 4, 2022): 68–81. http://dx.doi.org/10.31436/iiumej.v23i1.1760.

Full text
Abstract:
Speaker recognition is the process of recognizing a speaker from his or her speech. It can be used in many aspects of life, such as gaining remote access to a personal device, securing voice-controlled access, and conducting forensic investigations. In speaker recognition, extracting features from the speech is the most critical process: the features represent the speech as unique characteristics that distinguish speech samples from one another. In this research, we propose a combination of Wavelet and Mel Frequency Cepstral Coefficients (MFCC), Wavelet-MFCC, as the feature extraction method, and a Hidden Markov Model (HMM) as the classifier. The speech signal is first decomposed with the Wavelet transform into one level of decomposition, and only the sub-band detail coefficients are used for further extraction with MFCC. The system was applied to 300 speech recordings of 30 speakers uttering "HADIR" in the Indonesian language. K-fold cross-validation was implemented with five folds; in each fold, 80% of the data were used for training and the rest for testing. Based on the testing, the system accuracy obtained with the Wavelet-MFCC combination is 96.67%.
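The Wavelet-MFCC idea can be sketched as below: a one-level discrete wavelet transform, keeping only the detail coefficients, followed by MFCC and a five-fold split. The wavelet family, MFCC settings, file naming scheme, and the per-utterance averaging are assumptions; the paper itself trains HMMs on the frame sequences.

```python
# Sketch: level-1 DWT, keep only the detail coefficients, then MFCC on them.
import librosa
import numpy as np
import pywt
from sklearn.model_selection import KFold

def wavelet_mfcc(path, sr=16000, n_mfcc=13):
    y, sr = librosa.load(path, sr=sr)
    _approx, detail = pywt.dwt(y, "db4")          # one level of decomposition (assumed 'db4')
    mfcc = librosa.feature.mfcc(y=detail, sr=sr // 2, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)                      # one vector per utterance (simplification)

paths = [f"speaker{s:02d}_{i}.wav" for s in range(30) for i in range(10)]  # hypothetical names
X = np.array([wavelet_mfcc(p) for p in paths])
labels = np.repeat(np.arange(30), 10)

for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    X_train, X_test = X[train_idx], X[test_idx]   # train a per-speaker HMM/classifier here
```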
APA, Harvard, Vancouver, ISO, and other styles
42

Alghifari, Muhammad Fahreza, Teddy Surya Gunawan, and Mira Kartiwi. "Speech Emotion Recognition Using Deep Feedforward Neural Network." Indonesian Journal of Electrical Engineering and Computer Science 10, no. 2 (May 1, 2018): 554. http://dx.doi.org/10.11591/ijeecs.v10.i2.pp554-561.

Full text
Abstract:
Speech emotion recognition (SER) is currently a research hotspot due to its challenging nature but bountiful future prospects. The objective of this research is to utilize Deep Neural Networks (DNNs) to recognize human speech emotion. First, the chosen speech features, Mel-frequency cepstral coefficients (MFCCs), were extracted from raw audio data. Second, the extracted speech features were fed into the DNN to train the network. The trained network was then tested on a set of labelled emotional speech audio and the recognition rate was evaluated. Based on the accuracy rate, the number of MFCCs, neurons, and layers was adjusted for optimization. Moreover, a custom-made database is introduced and validated using the optimized network. The optimum configuration for SER is 13 MFCCs, 12 neurons, and 2 layers for 3 emotions, and 25 MFCCs, 21 neurons, and 4 layers for 4 emotions, achieving a total recognition rate of 96.3% for 3 emotions and 97.1% for 4 emotions.
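A minimal sketch of the reported 3-emotion configuration is given below, assuming scikit-learn. Interpreting "12 neurons and 2 layers" as two hidden layers of 12 units each is an assumption, and the feature and label files are hypothetical.

```python
# Sketch of a small feedforward SER classifier over 13 MFCC features per clip.
import numpy as np
from sklearn.neural_network import MLPClassifier

X = np.load("mfcc_means.npy")        # hypothetical array, shape (n_clips, 13)
y = np.load("emotion_labels.npy")    # hypothetical labels for 3 emotion classes

dnn = MLPClassifier(hidden_layer_sizes=(12, 12), activation="relu",
                    max_iter=2000, random_state=0)
dnn.fit(X, y)
print("training accuracy:", dnn.score(X, y))
```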
APA, Harvard, Vancouver, ISO, and other styles
43

Heriyanto, Heriyanto, Sri Hartati, and Agfianto Eko Putra. "EKSTRAKSI CIRI MEL FREQUENCY CEPSTRAL COEFFICIENT (MFCC) DAN RERATA COEFFICIENT UNTUK PENGECEKAN BACAAN AL-QUR’AN." Telematika 15, no. 2 (October 31, 2018): 99. http://dx.doi.org/10.31315/telematika.v15i2.3123.

Full text
Abstract:
Learning to read the Al-Qur'an with the help of an application is very useful for making Qur'anic recitation easier to practice and understand. For checking Qur'anic recitation, MFCC is one method that performs quite well for speech recognition. The method was introduced long ago by Davis and Mermelstein, around 1980. MFCC is a feature extraction method that produces cepstral coefficients per frame, so it can be used in speech recognition processing to improve accuracy. The MFCC stages consist of pre-emphasis, frame blocking, windowing, Fast Fourier Transform (FFT), Mel Frequency Wrapping (MFW), Discrete Cosine Transform (DCT), and cepstral liftering. The recitation-checking results were tested on eleven surahs: Al-Fatihah, Al-Baqarah, Al-Imran, Al-Hadid, Al-Ashr, Ar-Rahman, Al-Alaq, Al-Kautsar, Al-Ikhlas, Al-Falaq, and An-Nas, yielding an average accuracy of 51.8%. Keywords: Voice, Recitation, MFCC, Suitability, Feature Extraction, Reference, Weight, Dominant.
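The stages enumerated in this abstract map directly onto a step-by-step MFCC computation; a sketch with NumPy, SciPy, and librosa follows. The frame sizes, filter count, and lifter parameter are common defaults, not the paper's specific choices.

```python
# Step-by-step MFCC sketch: pre-emphasis, frame blocking, windowing, FFT,
# mel frequency wrapping, DCT, and cepstral liftering.
import numpy as np
import librosa
from scipy.fftpack import dct

def mfcc_stages(y, sr, frame_len=400, hop=160, n_mels=26, n_ceps=13, lifter=22):
    y = np.append(y[0], y[1:] - 0.97 * y[:-1])                        # pre-emphasis
    frames = librosa.util.frame(y, frame_length=frame_len, hop_length=hop).T  # frame blocking
    frames = frames * np.hamming(frame_len)                           # windowing
    spectrum = np.abs(np.fft.rfft(frames, n=512)) ** 2                # FFT power spectrum
    mel_fb = librosa.filters.mel(sr=sr, n_fft=512, n_mels=n_mels)     # mel frequency wrapping
    mel_energy = np.log(spectrum @ mel_fb.T + 1e-10)
    ceps = dct(mel_energy, type=2, axis=1, norm="ortho")[:, :n_ceps]  # DCT -> cepstral coefficients
    n = np.arange(n_ceps)
    ceps *= 1 + (lifter / 2) * np.sin(np.pi * n / lifter)             # cepstral liftering
    return ceps                                                       # (frames, n_ceps)
```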
APA, Harvard, Vancouver, ISO, and other styles
44

CHEN, X. H., and J. Z. H. ZHANG. "MFCC-DOWNHILL SIMPLEX METHOD FOR MOLECULAR STRUCTURE OPTIMIZATION." Journal of Theoretical and Computational Chemistry 03, no. 03 (September 2004): 277–89. http://dx.doi.org/10.1142/s0219633604001045.

Full text
Abstract:
In this paper, an MFCC-downhill simplex method is proposed to study binding structures of small molecules or ions in large molecular complex systems. This method employs the Molecular Fractionation with Conjugated Caps (MFCC) approach in computing inter-molecular energy and implements the downhill simplex algorithm for structural optimization. The method is numerically tested on a system of [KCp(18-crown-6)] to optimize the position of the potassium cation in a fixed coordination sphere. The result of the MFCC-downhill simplex optimization method shows good agreement with both the crystal structure and with the full-system downhill simplex optimized structure. The effect of the initial structure of the simplex and the method/basis-set levels of the quantum chemical calculation on the MFCC-downhill simplex optimization are also discussed. This method should be applicable to structure optimization of large complex molecular systems such as proteins or other biopolymers.
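The downhill simplex (Nelder-Mead) search described here can be illustrated generically with SciPy; the energy function below is a pure placeholder standing in for the MFCC-computed inter-molecular energy, and the coordinates are arbitrary assumptions.

```python
# Generic downhill-simplex sketch: optimize a cation position against an energy function.
import numpy as np
from scipy.optimize import minimize

def interaction_energy(xyz):
    # Placeholder: in the paper this would be the MFCC inter-molecular energy of K+
    # at position xyz inside the fixed 18-crown-6 coordination sphere.
    return np.sum((xyz - np.array([0.1, -0.2, 0.05])) ** 2)

x0 = np.zeros(3)                                    # assumed initial guess for the K+ position
result = minimize(interaction_energy, x0, method="Nelder-Mead",
                  options={"xatol": 1e-4, "fatol": 1e-6})
print("optimized position:", result.x)
```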
APA, Harvard, Vancouver, ISO, and other styles
45

Zhang, Hongxing, Hu Li, Wenxin Chen, and Hongjun Han. "Feature Extraction of Speech Signal Based on MFCC (Mel cepstrum coefficient)." Journal of Physics: Conference Series 2584, no. 1 (September 1, 2023): 012143. http://dx.doi.org/10.1088/1742-6596/2584/1/012143.

Full text
Abstract:
Abstract A smart power plant establishes a modern energy power system to achieve safe, efficient, green, and low-carbon power generation. Its production process can be optimized autonomously: the relevant systems can collect, analyze, judge, and plan their own behavior, and intelligently and dynamically optimize equipment configuration and parameters. This paper focuses on finding the MFCC configuration that gives the best recognition performance in smart power plants. We vary the number of filters and the order of the MFCC and observe how well the resulting MFCC parameters represent the speech signal, using classification accuracy in a neural network as the evaluation metric. The network is built in a Python programming environment, and comparative experiments are used to analyze the influence of each parameter on how well the MFCC features express the speech information.
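The comparative sweep over filter count and MFCC order can be sketched as follows, assuming librosa; the grid values, the sample file, and the commented evaluation helper are illustrative assumptions rather than the paper's setup.

```python
# Sketch: sweep the number of mel filters and the MFCC order, then compare the
# resulting feature sets with the same downstream classifier.
import librosa
import numpy as np

def mfcc_variant(y, sr, n_filters, order):
    return librosa.feature.mfcc(y=y, sr=sr, n_mels=n_filters, n_mfcc=order).mean(axis=1)

y, sr = librosa.load("sample.wav", sr=16000)        # placeholder recording
for n_filters in (20, 26, 40):
    for order in (12, 13, 20):
        feats = mfcc_variant(y, sr, n_filters, order)
        # accuracy = evaluate_network(...)          # hypothetical accuracy evaluation step
        print(n_filters, order, feats.shape)
```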
APA, Harvard, Vancouver, ISO, and other styles
46

Lee, Ji-Yeoun. "Classification between Elderly Voices and Young Voices Using an Efficient Combination of Deep Learning Classifiers and Various Parameters." Applied Sciences 11, no. 21 (October 21, 2021): 9836. http://dx.doi.org/10.3390/app11219836.

Full text
Abstract:
The objective of this research was to develop deep learning classifiers and various parameters that provide an accurate and objective system for classifying elderly and young voice signals. This work focused on deep learning methods, such as the feedforward neural network (FNN) and convolutional neural network (CNN), for the detection of elderly voice signals using mel-frequency cepstral coefficients (MFCCs), linear prediction cepstral coefficients (LPCCs), skewness, and kurtosis parameters. In total, 126 subjects (63 elderly and 63 young) were obtained from the Saarbruecken voice database. The highest performance, 93.75%, was obtained when skewness was added to the MFCC and MFCC delta parameters, although the fusion of the skewness and kurtosis parameters had a positive effect on the overall accuracy of the classification. The results of this study also revealed that the performance of FNN was higher than that of CNN. Most parameters estimated from male data samples demonstrated good performance in terms of gender. Rather than using mixed female and male data, this work recommends the development of separate systems that represent the best performance through each optimized parameter using data from independent male and female samples.
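The best-performing fusion (MFCC, delta MFCC, and skewness) can be sketched as a simple feature vector fed to a small feedforward network; the layer size, the per-recording averaging, and the two-file example list are assumptions for illustration only.

```python
# Sketch: per-recording MFCC and delta-MFCC statistics plus skewness/kurtosis,
# classified with a small feedforward network (elderly vs. young).
import librosa
import numpy as np
from scipy.stats import kurtosis, skew
from sklearn.neural_network import MLPClassifier

def voice_features(path, sr=16000):
    y, sr = librosa.load(path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
    delta = librosa.feature.delta(mfcc)
    return np.concatenate([mfcc.mean(axis=1), delta.mean(axis=1), [skew(y), kurtosis(y)]])

paths, ages = ["elderly_01.wav", "young_01.wav"], [1, 0]   # placeholder file list and labels
X = np.array([voice_features(p) for p in paths])
fnn = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000).fit(X, ages)
```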
APA, Harvard, Vancouver, ISO, and other styles
47

Thanh, Chu Ba, Trinh Van Loan, and Nguyen Hong Quang. "SOME NEW RESULTS ON AUTOMATIC IDENTIFICATION OF VIETNAMESE FOLK SONGS CHEO AND QUANHO." Journal of Computer Science and Cybernetics 36, no. 4 (December 14, 2020): 325–45. http://dx.doi.org/10.15625/1813-9663/36/4/14424.

Full text
Abstract:
Vietnamese folk songs are very rich in genre and content. Automatically identifying Vietnamese folk tunes will contribute to storing and searching for information about these tunes. The paper presents an overview of music genre classification work performed in Vietnam and abroad. For two very popular types of Vietnamese folk songs, Cheo and Quan ho, the paper describes the dataset and the GMM (Gaussian Mixture Model) used to perform identification experiments. The GMM was tested with four parameter sets containing MFCC (Mel Frequency Cepstral Coefficients), energy, the first and second derivatives of MFCC and energy, tempo, intensity, and fundamental frequency. The results showed that the parameters added to the MFCCs contributed significantly to improving identification accuracy when appropriate values of the Gaussian component number M were used. Our experiments also showed that when excerpts averaging only 29.63% of the whole song for Cheo and 38.1% for Quan ho were used, the identification rate was only 3.1% and 2.33% lower, respectively, than with whole songs.
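One way to picture the GMM-based identification with a sweep over the component number M is the sketch below; the feature files are hypothetical stand-ins for frame-level MFCC, energy, and derivative features, and the values of M are assumptions.

```python
# Sketch: one GMM per folk-song class, classification by maximum average log-likelihood,
# with a sweep over the number of Gaussian components M.
import numpy as np
from sklearn.mixture import GaussianMixture

train = {"cheo": np.load("cheo_frames.npy"),       # hypothetical (frames, dims) feature arrays
         "quanho": np.load("quanho_frames.npy")}

for M in (8, 16, 32):
    models = {g: GaussianMixture(n_components=M, covariance_type="diag").fit(X)
              for g, X in train.items()}
    test_frames = np.load("test_song_frames.npy")  # an excerpt or a whole song
    scores = {g: m.score(test_frames) for g, m in models.items()}
    print(M, max(scores, key=scores.get))          # class with the highest likelihood
```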
APA, Harvard, Vancouver, ISO, and other styles
48

Singh, Mahesh K., S. Manusha, K. V. Balaramakrishna, and Sridevi Gamini. "Speaker Identification Analysis Based on Long-Term Acoustic Characteristics with Minimal Performance." International Journal of Electrical and Electronics Research 10, no. 4 (December 30, 2022): 848–52. http://dx.doi.org/10.37391/ijeer.100415.

Full text
Abstract:
Speaker identity depends on phonological properties derived from speech, and Mel-Frequency Cepstral Coefficients (MFCCs) are well studied for deriving these acoustic characteristics. The speaker model is based on a sparse representation of the acoustic features derived from the MFCCs, and MFCC is used for text-independent speaker monitoring. Because recognizing speakers from a sparse representation is difficult, the Gaussian Mixture Model (GMM) mean supervector is proposed as the core representation for training. Unknown vectors are resolved using sparsity, with experiments based on the TIMIT database. The i-vector algorithm is proposed for effective improvement of ASR (Automatic Speaker Recognition). Atom Aligned Sparse Representation (AASR) is used to describe the speaker model, and Sparse Representation Classification (SRC) is used for the speaker recognition decision. A robust sparse coding based on Maximum Likelihood Estimation (MLE) is used to address the sparse-representation problem, yielding strong speaker verification based on a sparse representation of GMM supervectors.
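A loose sketch of a GMM mean supervector is shown below: a per-speaker GMM is fitted directly on MFCC frames and its component means are concatenated into one fixed-length vector. The usual recipe MAP-adapts a universal background model instead, and the i-vector and sparse-coding stages of the paper are omitted here; the feature file and component count are assumptions.

```python
# Simplified GMM mean supervector: concatenate the means of a per-speaker GMM.
import numpy as np
from sklearn.mixture import GaussianMixture

frames = np.load("speaker_mfcc_frames.npy")      # hypothetical (n_frames, 13) MFCC array
gmm = GaussianMixture(n_components=32, covariance_type="diag").fit(frames)
supervector = gmm.means_.ravel()                 # shape (32 * 13,): fixed-length representation
print(supervector.shape)
```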
APA, Harvard, Vancouver, ISO, and other styles
49

Kasim, Anita Ahmad, Muhammad Bakri, Irwan Mahmudi, Rahmawati Rahmawati, and Zulnabil Zulnabil. "Artificial Intelligent for Human Emotion Detection with the Mel-Frequency Cepstral Coefficient (MFCC)." JUITA : Jurnal Informatika 11, no. 1 (May 6, 2023): 47. http://dx.doi.org/10.30595/juita.v11i1.15435.

Full text
Abstract:
Emotions are an important aspect of human communication, and human emotional expression can be identified through the voice. Voice detection, or speech recognition, is a technology that has developed rapidly to help improve human-machine interaction. This study aims to classify emotions by detecting them in human voices. One of the most frequently used methods for voice analysis is the Mel-Frequency Cepstrum Coefficient (MFCC), in which sound waves are converted into several types of representation. Mel-frequency cepstral coefficients (MFCCs) are coefficients that collectively represent the short-term power spectrum of a sound, based on a linear cosine transform of a log power spectrum on a nonlinear mel scale of frequency. The primary data used in this research were recorded by the authors; the secondary data consist of 500 voice recordings from the "Berlin Database of Emotional Speech". MFCC can extract implicit information from the human voice, in particular the emotion the speaker experiences while producing the utterance. In this study, the highest accuracy, 85%, was obtained when training for 10,000 epochs.
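A minimal sketch of emotion classification from MFCC statistics follows, assuming scikit-learn; `max_iter=10000` loosely stands in for the 10,000 training epochs mentioned, the network size is an assumption, and the feature/label files are hypothetical exports from the Berlin corpus.

```python
# Sketch: MFCC-based emotion classification with a small neural network.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

X = np.load("emodb_mfcc_means.npy")   # hypothetical per-clip MFCC features
y = np.load("emodb_labels.npy")       # hypothetical emotion labels
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

clf = MLPClassifier(hidden_layer_sizes=(64,), max_iter=10000).fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```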
APA, Harvard, Vancouver, ISO, and other styles
50

Heriyanto, Heriyanto Heriyanto. "DETEKSI UCAPAN ANGKA SATU SAMPAI SEPULUH BAHASA PALEMBANG MENGGUNAKAN MFCC DAN BOBOT DOMINAN." Telematika 16, no. 1 (April 10, 2019): 52. http://dx.doi.org/10.31315/telematika.v16i1.3024.

Full text
Abstract:
Abstract Detecting speech in a regional language, in this case the Palembang language, involves a unique and distinctive accent. These dialect differences are used to check how precise the detection is and how strongly they affect the accuracy of using MFCC and dominant weights. This study consists of three stages. In the first stage, features of the spoken numbers one to ten are extracted using Mel Frequency Cepstral Coefficients (MFCC). The second stage is the selection of features to be used as feature tables, using the proposed Normalized Dominant Weight (NBD) model with threshold similarity, range, filtering, weight normalization, and dominant weights. The third stage is testing, in which checking is performed by finding similarities in range, filtering, sequential multiplication, and calculation of the Suitability of Uniformity Patterns (CTF). The test results for MFCC with feature selection by dominant-weight normalization reached 70% accuracy, while without feature selection the accuracy was only 42%. Keywords: extraction, weighting, dominant, normalization, range
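For orientation only, the sketch below shows a generic MFCC template-matching approach to spoken-digit detection; it is not the paper's Normalized Dominant Weight method, and the digit file names and cosine-similarity decision are assumptions.

```python
# Generic illustration: averaged MFCC templates per digit, matched by cosine similarity.
import librosa
import numpy as np

def mfcc_template(paths, sr=16000):
    feats = [librosa.feature.mfcc(y=librosa.load(p, sr=sr)[0], sr=sr, n_mfcc=13).mean(axis=1)
             for p in paths]
    return np.mean(feats, axis=0)

# Hypothetical training files: five recordings per digit (1..10).
templates = {d: mfcc_template([f"digit{d}_{i}.wav" for i in range(5)]) for d in range(1, 11)}
test = mfcc_template(["test_digit.wav"])
cos = {d: np.dot(t, test) / (np.linalg.norm(t) * np.linalg.norm(test))
       for d, t in templates.items()}
print("predicted digit:", max(cos, key=cos.get))
```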
APA, Harvard, Vancouver, ISO, and other styles
