Academic literature on the topic 'Perceptual features for speech recognition'
Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles
Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Perceptual features for speech recognition.'
Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the academic publication as a PDF and read its abstract online, whenever these are available in the metadata.
Journal articles on the topic "Perceptual features for speech recognition"
Li, Guan Yu, Hong Zhi Yu, Yong Hong Li, and Ning Ma. "Features Extraction for Lhasa Tibetan Speech Recognition." Applied Mechanics and Materials 571-572 (June 2014): 205–8. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.205.
Haque, Serajul, Roberto Togneri, and Anthony Zaknich. "Perceptual features for automatic speech recognition in noisy environments." Speech Communication 51, no. 1 (January 2009): 58–75. http://dx.doi.org/10.1016/j.specom.2008.06.002.
Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." International Journal of Synthetic Emotions 7, no. 1 (January 2016): 58–68. http://dx.doi.org/10.4018/ijse.2016010105.
Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Optimizing Integrated Features for Hindi Automatic Speech Recognition System." Journal of Intelligent Systems 29, no. 1 (October 1, 2018): 959–76. http://dx.doi.org/10.1515/jisys-2018-0057.
Al Mahmud, Nahyan, and Shahfida Amjad Munni. "Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition." International Journal of Multimedia & Its Applications 12, no. 5 (October 30, 2020): 1–8. http://dx.doi.org/10.5121/ijma.2020.12501.
Kamińska, Dorota. "Emotional Speech Recognition Based on the Committee of Classifiers." Entropy 21, no. 10 (September 21, 2019): 920. http://dx.doi.org/10.3390/e21100920.
Dmitrieva, E., V. Gelman, K. Zaitseva, and A. Orlov. "Psychophysiological features of perceptual learning in the process of speech emotional prosody recognition." International Journal of Psychophysiology 85, no. 3 (September 2012): 375. http://dx.doi.org/10.1016/j.ijpsycho.2012.07.034.
Seyedin, Sanaz, Seyed Mohammad Ahadi, and Saeed Gazor. "New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition." Scientific World Journal 2013 (2013): 1–11. http://dx.doi.org/10.1155/2013/634160.
Kaur, Gurpreet, Mohit Srivastava, and Amod Kumar. "Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks." Journal of Telecommunications and Information Technology 2 (June 29, 2018): 23–31. http://dx.doi.org/10.26636/jtit.2018.119617.
Trabelsi, Imen, and Med Salim Bouhlel. "Feature Selection for GUMI Kernel-Based SVM in Speech Emotion Recognition." International Journal of Synthetic Emotions 6, no. 2 (July 2015): 57–68. http://dx.doi.org/10.4018/ijse.2015070104.
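Several of the journal articles above rely on perceptually motivated frequency warping, e.g. the mel scale behind MFCC features or the auditory scales behind PLP. As a minimal illustration of the idea (a sketch for orientation only, not drawn from any of the cited works), the standard mel mapping and a set of mel-spaced filterbank centre frequencies can be computed as:

```python
import math

def hz_to_mel(f_hz):
    """Standard mel-scale mapping (the common 2595*log10 formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(m):
    """Inverse mel mapping."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_centers(n_filters, f_min_hz, f_max_hz):
    """Centre frequencies (Hz) of n_filters triangular filters
    spaced evenly on the mel scale between f_min_hz and f_max_hz."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Filters end up densely packed at low frequencies, sparse at high ones,
# mimicking the ear's frequency resolution.
centers = mel_filter_centers(26, 0.0, 8000.0)
print(len(centers), round(centers[0], 1), round(centers[-1], 1))
```

The perceptual gain comes from the spacing: equal steps in mel correspond to progressively wider steps in Hz, so low-frequency detail is resolved more finely than high-frequency detail.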
Dissertations / Theses on the topic "Perceptual features for speech recognition"
Haque, Serajul. "Perceptual features for speech recognition." University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2008. http://theses.library.uwa.edu.au/adt-WU2008.0187.
Gu, Y. "Perceptually-based features in automatic speech recognition." Thesis, Swansea University, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.637182.
Chu, Kam Keung. "Feature extraction based on perceptual non-uniform spectral compression for noisy speech recognition." Access full text, abstract and table of contents, 2005. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?mphil-ee-b19887516a.pdf.
Note: "Submitted to Department of Electronic Engineering in partial fulfillment of the requirements for the degree of Master of Philosophy." Includes bibliographical references (leaves 143–147).
Koniaris, Christos. "Perceptually motivated speech recognition and mispronunciation detection." Doctoral thesis, KTH, Tal-kommunikation, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102321.
Note: European Union FP6-034362 research project ACORNS; Computer-Animated language Teachers (CALATea).
Koniaris, Christos. "A study on selecting and optimizing perceptually relevant features for automatic speech recognition." Licentiate thesis, Stockholm : Kungliga Tekniska högskolan, 2009. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-11470.
Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.
Atassi, Hicham. "Rozpoznání emočního stavu z hrané a spontánní řeči [Recognition of emotional state from acted and spontaneous speech]." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2014. http://www.nusl.cz/ntk/nusl-233665.
Temko, Andriy. "Acoustic event detection and classification." Doctoral thesis, Universitat Politècnica de Catalunya, 2007. http://hdl.handle.net/10803/6880.
Full textsortides de diversos sistemes de classificació. Els sistemes de classificació d'events acústics
desenvolupats s'han testejat també mitjançant la participació en unes quantes avaluacions d'àmbit
internacional, entre els anys 2004 i 2006. La segona principal contribució d'aquest treball de tesi consisteix en el desenvolupament de sistemes de detecció d'events acústics. El problema de la detecció és més complex, ja que inclou tant la classificació dels sons com la determinació dels intervals temporals on tenen lloc. Es desenvolupen dues versions del sistema i es proven amb els conjunts de dades de les dues campanyes d'avaluació internacional CLEAR que van tenir lloc els anys 2006 i 2007, fent-se servir dos tipus de bases de dades: dues bases d'events acústics aïllats, i una base d'enregistraments de seminaris interactius, les quals contenen un nombre relativament elevat d'ocurrències dels events acústics especificats. Els sistemes desenvolupats, que consisteixen en l'ús de classificadors basats en SVM que operen dins
d'una finestra lliscant més un post-processament, van ser els únics presentats a les avaluacions
esmentades que no es basaven en models de Markov ocults (Hidden Markov Models) i cada un d'ells
va obtenir resultats competitius en la corresponent avaluació. La detecció d'activitat oral és un altre dels objectius d'aquest treball de tesi, pel fet de ser un cas particular de detecció d'events acústics especialment important. Es desenvolupa una tècnica de millora de l'entrenament dels SVM per fer front a la necessitat de reducció de l'enorme conjunt de dades existents. El sistema resultant, basat en SVM, és testejat amb uns quants conjunts de dades de l'avaluació NIST RT (Rich Transcription), on mostra puntuacions millors que les del sistema basat en GMM, malgrat que aquest darrer va quedar entre els primers en l'avaluació NIST RT de 2006.
Per acabar, val la pena esmentar alguns resultats col·laterals d'aquest treball de tesi. Com que s'ha dut a terme en l'entorn del projecte europeu CHIL, l'autor ha estat responsable de l'organització de les avaluacions internacionals de classificació i detecció d'events acústics abans esmentades, liderant l'especificació de les classes d'events, les bases de dades, els protocols d'avaluació i, especialment, proposant i implementant les diverses mètriques utilitzades. A més a més, els sistemes de detecció
s'han implementat en la sala intel·ligent de la UPC, on funcionen en temps real a efectes de test i demostració.
The human activity that takes place in meeting-rooms or class-rooms is reflected in a rich variety of acoustic events, either produced by the human body or by objects handled by humans, so the determination of both the identity of sounds and their position in time may help to detect and describe that human activity.
Additionally, detection of sounds other than speech may be useful to enhance the robustness of speech technologies like automatic speech recognition. Automatic detection and classification of acoustic events is the objective of this thesis work. It aims at processing the acoustic signals collected by distant microphones in meeting-room or classroom environments to convert them into symbolic descriptions corresponding to a listener's perception of the different sound events that are present in the signals and their sources.

First, the task of acoustic event classification is faced using Support Vector Machine (SVM) classifiers, which are motivated by the scarcity of training data. A confusion-matrix-based variable-feature-set clustering scheme is developed for the multiclass recognition problem and tested on the gathered database. With it, a higher classification rate than the GMM-based technique is obtained, arriving at a large relative average error reduction with respect to the best result from the conventional binary tree scheme. Moreover, several ways to extend SVMs to sequence processing are compared, in an attempt to avoid the drawback of SVMs when dealing with audio data, i.e. their restriction to fixed-length vectors; the dynamic time warping kernels are observed to work well for sounds that show a temporal structure. Furthermore, concepts and tools from fuzzy theory are used to investigate, first, the importance of and degree of interaction among features, and second, ways to fuse the outputs of several classification systems. The developed AEC systems are also tested by participating in several international evaluations from 2004 to 2006, and the results are reported.

The second main contribution of this thesis work is the development of systems for detection of acoustic events. The detection problem is more complex, since it includes both classification and determination of the time intervals where the sound takes place. Two system versions are developed and tested on the datasets of the two CLEAR international evaluation campaigns in 2006 and 2007. Two kinds of databases are used: two databases of isolated acoustic events, and a database of interactive seminars containing a significant number of acoustic events of interest. Our developed systems, which consist of SVM-based classification within a sliding window plus post-processing, were the only submissions not using HMMs, and each of them obtained competitive results in the corresponding evaluation.

Speech activity detection was also pursued in this thesis since it is an especially important particular case of acoustic event detection. An enhanced SVM training approach for the speech activity detection task is developed, mainly to cope with the problem of dataset reduction. The resulting SVM-based system is tested with several NIST Rich Transcription (RT) evaluation datasets, and it shows better scores than our GMM-based system, which ranked among the best systems in the RT06 evaluation.

Finally, it is worth mentioning a few side outcomes of this thesis work. As it has been carried out in the framework of the CHIL EU project, the author has been responsible for the organization of the above-mentioned international evaluations in acoustic event classification and detection, taking a leading role in the specification of acoustic event classes, databases, and evaluation protocols, and, especially, in the proposal and implementation of the various metrics that have been used. Moreover, the detection systems have been implemented in the UPC's smart-room, where they work in real time for purposes of testing and demonstration.
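The "SVM-based classification within a sliding window plus post-processing" scheme described in this abstract can be sketched roughly as follows. This is an illustrative toy, not the thesis implementation: a mean-score threshold stands in for the SVM decision on each window, and the post-processing simply merges overlapping positive windows into event intervals.

```python
def detect_events(frame_scores, win=10, hop=5, threshold=0.5):
    """Toy sliding-window acoustic event detector.

    A real system would run an SVM on features extracted from each
    window; here a mean-score threshold stands in for that decision.
    """
    hits = []
    for start in range(0, max(1, len(frame_scores) - win + 1), hop):
        window = frame_scores[start:start + win]
        if sum(window) / len(window) > threshold:
            hits.append((start, start + win))
    # Post-processing: merge overlapping or adjacent positive windows
    # into contiguous (start_frame, end_frame) event intervals.
    events = []
    for s, e in hits:
        if events and s <= events[-1][1]:
            events[-1] = (events[-1][0], max(events[-1][1], e))
        else:
            events.append((s, e))
    return events

# A burst of high scores in the middle of the signal yields one event.
scores = [0.0] * 20 + [0.9] * 30 + [0.0] * 20
print(detect_events(scores))  # [(20, 50)]
```

The merge step is what turns per-window decisions into the time intervals that the detection task requires, which is exactly the extra complexity the abstract attributes to detection over classification.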
Lileikytė, Rasa. "Quality estimation of speech recognition features." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2012. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2012~D_20120302_090132-92071.
The accuracy of speech recognition systems depends on the features that describe the speech signals and on the properties of the classifiers that use those features. In the traditional approach, recognition accuracy has to be computed anew for every chosen feature set and every classifier type. The amount of such work can be reduced by first evaluating the quality of the candidate features, so research into the quality evaluation of speech signal features was carried out. A method for evaluating the quality of speech recognition features, based on the use of three metrics, was investigated. It is shown that features selected in this way describe the quality of recognition systems in Euclidean space and reduce the amount of work required to evaluate recognition system quality. It is also shown that the complexity of the feature quality evaluation algorithm is O(2R log2 R), whereas the complexity of evaluating the recognition quality of a system that uses a dynamic time warping classifier is O(R^2), where R is the number of speech signal reference vectors. Experimental results confirmed the validity of the presented method for evaluating the quality of speech recognition features.
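The O(R^2) figure quoted in this abstract is the classic dynamic time warping (DTW) bound: comparing two sequences means filling a full alignment cost matrix. A minimal sketch of DTW (illustrative only, not the dissertation's implementation):

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences.

    Filling the (len(a)+1) x (len(b)+1) cost matrix takes
    O(len(a) * len(b)) time, i.e. O(R^2) when both sequences
    contain R reference vectors."""
    INF = float("inf")
    n, m = len(a), len(b)
    d = [[INF] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: insertion, deletion, or match/substitution.
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]

# Time-stretched copies of the same contour align with zero cost.
print(dtw_distance([1, 2, 3], [1, 2, 2, 3]))  # 0.0
```

Because every template comparison costs O(R^2), a cheaper O(2R log2 R) feature-quality screen, as proposed in the dissertation, can substantially cut the total evaluation workload.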
Matthews, Iain. "Features for audio-visual speech recognition." Thesis, University of East Anglia, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.266736.
Books on the topic "Perceptual features for speech recognition"
Rao, K. Sreenivasa, and Shashidhar G. Koolagudi. Emotion Recognition using Speech Features. New York, NY: Springer New York, 2013. http://dx.doi.org/10.1007/978-1-4614-5143-3.
Rao, K. Sreenivasa, and Manjunath K. E. Speech Recognition Using Articulatory and Excitation Source Features. Cham: Springer International Publishing, 2017. http://dx.doi.org/10.1007/978-3-319-49220-9.
Gabsdil, Malte. Automatic classification of speech recognition hypotheses using acoustic and pragmatic features. Saarbrücken: DFKI & Universität des Saarlandes, 2005.
Rao, K. Sreenivasa. Robust Emotion Recognition using Spectral and Prosodic Features. New York, NY: Springer New York, 2013.
Kulshreshtha, Manisha. Dialect Accent Features for Establishing Speaker Identity: A Case Study. Boston, MA: Springer US, 2012.
Rao, K. Sreenivasa, and Manjunath K. E. Speech Recognition Using Articulatory and Excitation Source Features. Springer, 2017.
Leibo, Joel Z., and Tomaso Poggio. Perception. Oxford University Press, 2018. http://dx.doi.org/10.1093/oso/9780199674923.003.0025.
Lee, Lisa. The role of the structure of the lexicon in perceptual word learning. 1993.
Rao, K. Sreenivasa, and Shashidhar G. Koolagudi. Robust Emotion Recognition using Spectral and Prosodic Features. Springer, 2013.
Book chapters on the topic "Perceptual features for speech recognition"
Revathi, A., R. Nagakrishnan, D. Vishnu Vashista, Kuppa Sai Sri Teja, and N. Sasikaladevi. "Emotion Recognition from Speech Using Perceptual Features and Convolutional Neural Networks." In Lecture Notes in Electrical Engineering, 355–65. Singapore: Springer Singapore, 2020. http://dx.doi.org/10.1007/978-981-15-3992-3_29.
Zhang, Linjuan, Longbiao Wang, Jianwu Dang, Lili Guo, and Haotian Guan. "Convolutional Neural Network with Spectrogram and Perceptual Features for Speech Emotion Recognition." In Neural Information Processing, 62–71. Cham: Springer International Publishing, 2018. http://dx.doi.org/10.1007/978-3-030-04212-7_6.
Grau, Antoni, Joan Aranda, and Joan Climent. "Stepwise selection of perceptual texture features." In Advances in Pattern Recognition, 837–44. Berlin, Heidelberg: Springer Berlin Heidelberg, 1998. http://dx.doi.org/10.1007/bfb0033309.
Kaur, Gurpreet, Mohit Srivastava, and Amod Kumar. "Speech Recognition Fundamentals and Features." In Cognitive Computing Systems, 327–48. First edition. Apple Academic Press, 2021. http://dx.doi.org/10.1201/9781003082033-18.
Frasconi, Paolo, Marco Gori, and Giovanni Soda. "Automatic speech recognition with neural networks: Beyond nonparametric models." In Intelligent Perceptual Systems, 104–21. Berlin, Heidelberg: Springer Berlin Heidelberg, 1993. http://dx.doi.org/10.1007/3-540-57379-8_6.
Potapova, Rodmonga, and Liliya Komalova. "Auditory-Perceptual Recognition of the Emotional State of Aggression." In Speech and Computer, 89–95. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-23132-7_11.
Sendlmeier, Walter F. "Primary Perceptual Units in Word Recognition." In Recent Advances in Speech Understanding and Dialog Systems, 165–69. Berlin, Heidelberg: Springer Berlin Heidelberg, 1988. http://dx.doi.org/10.1007/978-3-642-83476-9_16.
So, Stephen, and Kuldip K. Paliwal. "Quantization of Speech Features: Source Coding." In Advances in Pattern Recognition, 131–61. London: Springer London, 2008. http://dx.doi.org/10.1007/978-1-84800-143-5_7.
Karlos, Stamatis, Nikos Fazakis, Katerina Karanikola, Sotiris Kotsiantis, and Kyriakos Sgarbas. "Speech Recognition Combining MFCCs and Image Features." In Speech and Computer, 651–58. Cham: Springer International Publishing, 2016. http://dx.doi.org/10.1007/978-3-319-43958-7_79.
Bimbot, Frédéric, Gérard Chollet, and Jean-Pierre Tubach. "Phonetic features extraction using Time-Delay Neural Networks." In Speech Recognition and Understanding, 299–304. Berlin, Heidelberg: Springer Berlin Heidelberg, 1992. http://dx.doi.org/10.1007/978-3-642-76626-8_31.
Conference papers on the topic "Perceptual features for speech recognition"
Revathi, A., and C. Jeyalakshmi. "Robust speech recognition in noisy environment using perceptual features and adaptive filters." In 2017 2nd International Conference on Communication and Electronics Systems (ICCES). IEEE, 2017. http://dx.doi.org/10.1109/cesys.2017.8321168.
Umakanthan, Padmalochini, and Kaliappan Gopalan. "A Perceptual Masking based Feature Set for Speech Recognition." In Modelling and Simulation. Calgary, AB, Canada: ACTAPRESS, 2013. http://dx.doi.org/10.2316/p.2013.804-024.
Revathi, A., and Y. Venkataramani. "Perceptual Features Based Isolated Digit and Continuous Speech Recognition Using Iterative Clustering Approach." In 2009 First International Conference on Networks & Communications. IEEE, 2009. http://dx.doi.org/10.1109/netcom.2009.32.
Nguyen Quoc Trung and Phung Trung Nghia. "The perceptual wavelet feature for noise robust Vietnamese speech recognition." In 2008 Second International Conference on Communications and Electronics (ICCE). IEEE, 2008. http://dx.doi.org/10.1109/cce.2008.4578968.
Alatwi, Aadel, Stephen So, and Kuldip K. Paliwal. "Perceptually motivated linear prediction cepstral features for network speech recognition." In 2016 10th International Conference on Signal Processing and Communication Systems (ICSPCS). IEEE, 2016. http://dx.doi.org/10.1109/icspcs.2016.7843309.
Biswas, Astik, P. K. Sahu, Anirban Bhowmick, and Mahesh Chandra. "Acoustic feature extraction using ERB like wavelet sub-band perceptual Wiener filtering for noisy speech recognition." In 2014 Annual IEEE India Conference (INDICON). IEEE, 2014. http://dx.doi.org/10.1109/indicon.2014.7030474.
Frolova, Olga, and Elena Lyakso. "Perceptual Features of Speech and Vocalizations of 5–8 Years Old Children with Autism Spectrum Disorders and Intellectual Disabilities: Recognition of the Child's Gender, Age and State." In XVI International interdisciplinary congress "Neuroscience for Medicine and Psychology". LLC MAKS Press, 2020. http://dx.doi.org/10.29003/m1310.sudak.ns2020-16/485-486.
Wu, Chung-Hsien, Yu-Hsien Chiu, and Huigan Lim. "Perceptual speech modeling for noisy speech recognition." In Proceedings of ICASSP '02. IEEE, 2002. http://dx.doi.org/10.1109/icassp.2002.5743735.
Chung-Hsien Wu, Yu-Hsien Chiu, and Huigan Lim. "Perceptual speech modeling for noisy speech recognition." In IEEE International Conference on Acoustics Speech and Signal Processing ICASSP-02. IEEE, 2002. http://dx.doi.org/10.1109/icassp.2002.1005757.
Sezgin, Cenk, Bilge Gunsel, and Canberk Hacioglu. "Audio emotion recognition by perceptual features." In 2012 20th Signal Processing and Communications Applications Conference (SIU). IEEE, 2012. http://dx.doi.org/10.1109/siu.2012.6204799.
Reports on the topic "Perceptual features for speech recognition"
Nahamoo, David. Robust Models and Features for Speech Recognition. Fort Belvoir, VA: Defense Technical Information Center, March 1998. http://dx.doi.org/10.21236/ada344834.