Journal articles on the topic 'Perceptual features for speech recognition'



Consult the top 50 journal articles for your research on the topic 'Perceptual features for speech recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles from a wide variety of disciplines and organise your bibliography correctly.

1

Li, Guan Yu, Hong Zhi Yu, Yong Hong Li, and Ning Ma. "Features Extraction for Lhasa Tibetan Speech Recognition." Applied Mechanics and Materials 571-572 (June 2014): 205–8. http://dx.doi.org/10.4028/www.scientific.net/amm.571-572.205.

Full text
Abstract:
Speech feature extraction is discussed. The Mel frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) methods are analyzed. These two types of features are extracted in a Lhasa Tibetan large-vocabulary continuous speech recognition system, and the recognition results are compared.
APA, Harvard, Vancouver, ISO, and other styles
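For readers who want to try the MFCC side of the comparison in the entry above, here is a minimal sketch in Python using librosa. The synthetic tone, sampling rate and frame settings are illustrative assumptions, and a PLP front end would need a separate implementation (e.g. a Bark-scale filter bank followed by linear-prediction analysis).

```python
# Minimal MFCC front-end sketch (librosa); toy signal stands in for a real utterance.
import numpy as np
import librosa

sr = 16000
y = np.sin(2 * np.pi * 220 * np.arange(sr) / sr).astype(np.float32)  # 1 s stand-in signal

# 13 MFCCs per 25 ms frame with a 10 ms hop, a common ASR setting
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=int(0.025 * sr),
                            hop_length=int(0.010 * sr))

# delta and delta-delta coefficients are usually appended for recognition
feats = np.vstack([mfcc,
                   librosa.feature.delta(mfcc),
                   librosa.feature.delta(mfcc, order=2)])
print(feats.shape)  # (39, n_frames)
```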
2

Haque, Serajul, Roberto Togneri, and Anthony Zaknich. "Perceptual features for automatic speech recognition in noisy environments." Speech Communication 51, no. 1 (January 2009): 58–75. http://dx.doi.org/10.1016/j.specom.2008.06.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Trabelsi, Imen, and Med Salim Bouhlel. "Comparison of Several Acoustic Modeling Techniques for Speech Emotion Recognition." International Journal of Synthetic Emotions 7, no. 1 (January 2016): 58–68. http://dx.doi.org/10.4018/ijse.2016010105.

Full text
Abstract:
Automatic Speech Emotion Recognition (SER) is a current research topic in the field of Human Computer Interaction (HCI) with a wide range of applications. The purpose of a speech emotion recognition system is to automatically classify a speaker's utterances into different emotional states such as disgust, boredom, sadness, neutral, and happiness. The speech samples in this paper are from the Berlin emotional database. Mel frequency cepstrum coefficients (MFCC), linear prediction coefficients (LPC), linear prediction cepstrum coefficients (LPCC), perceptual linear prediction (PLP) and relative spectral perceptual linear prediction (Rasta-PLP) features are used to characterize the emotional utterances using a combination of Gaussian mixture models (GMM) and Support Vector Machines (SVM) based on the Kullback-Leibler divergence kernel. In this study, the effect of feature type and its dimension are comparatively investigated. The best results are obtained with 12-coefficient MFCC. Utilizing the proposed features, a recognition rate of 84% has been achieved, which is close to the performance of humans on this database.
APA, Harvard, Vancouver, ISO, and other styles
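The GMM/SVM combination with a divergence-based kernel described in the entry above can be illustrated with a deliberately simplified sketch: each utterance is summarised by a single diagonal Gaussian over its feature frames (the paper uses full GMMs), and an SVM runs on a precomputed exp(-symmetric-KL) kernel. All data and parameter values below are toy placeholders, not the authors' setup.

```python
# Simplified divergence-kernel SVM for utterance-level emotion classification.
import numpy as np
from sklearn.svm import SVC

def gauss_stats(frames):                     # frames: (n_frames, n_dims)
    return frames.mean(0), frames.var(0) + 1e-6

def sym_kl(p, q):                            # symmetric KL between diagonal Gaussians
    (m1, v1), (m2, v2) = p, q
    kl12 = 0.5 * np.sum(np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1)
    kl21 = 0.5 * np.sum(np.log(v1 / v2) + (v2 + (m2 - m1) ** 2) / v1 - 1)
    return kl12 + kl21

rng = np.random.default_rng(0)
utts = [rng.normal(c, 1.0, size=(200, 12)) for c in (0.0, 0.0, 2.0, 2.0)]
labels = [0, 0, 1, 1]                        # e.g. "neutral" vs "happy"
stats = [gauss_stats(u) for u in utts]

K = np.array([[np.exp(-0.01 * sym_kl(a, b)) for b in stats] for a in stats])
svm = SVC(kernel="precomputed").fit(K, labels)
print(svm.predict(K))
```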
4

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Optimizing Integrated Features for Hindi Automatic Speech Recognition System." Journal of Intelligent Systems 29, no. 1 (October 1, 2018): 959–76. http://dx.doi.org/10.1515/jisys-2018-0057.

Full text
Abstract:
An automatic speech recognition (ASR) system translates spoken words or utterances (isolated, connected, continuous, and spontaneous) into text format. State-of-the-art ASR systems mainly use Mel frequency (MF) cepstral coefficients (MFCC), perceptual linear prediction (PLP), and Gammatone frequency (GF) cepstral coefficients (GFCC) for extracting features in the training phase of the ASR system. Initially, the paper proposes a sequential combination of all three feature extraction methods, taking two at a time. Six combinations, MF-PLP, PLP-MFCC, MF-GFCC, GF-MFCC, GF-PLP, and PLP-GFCC, are used, and the accuracy of the proposed system using all these combinations was tested. The results show that the GF-MFCC and MF-GFCC integrations outperform all other proposed integrations. Further, these two feature vector integrations are optimized using three different optimization methods: particle swarm optimization (PSO), PSO with crossover, and PSO with quadratic crossover (Q-PSO). The results demonstrate that the Q-PSO-optimized GF-MFCC integration shows significant improvement over all other optimized combinations.
APA, Harvard, Vancouver, ISO, and other styles
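Particle swarm optimization, the core of the three optimizers compared in the entry above, can be written in a few lines. The sketch below is a generic PSO loop of the kind that could re-weight an integrated GF-MFCC feature vector; the objective is a stand-in sphere function rather than the authors' recognition accuracy, and all hyperparameters are illustrative.

```python
# Minimal PSO loop; objective(w) would normally score a weighted feature vector.
import numpy as np

def pso(objective, dim, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-1, 1, (n_particles, dim))      # positions (e.g. feature weights)
    v = np.zeros_like(x)                            # velocities
    pbest, pbest_f = x.copy(), np.array([objective(p) for p in x])
    gbest = pbest[pbest_f.argmin()]
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        f = np.array([objective(p) for p in x])
        better = f < pbest_f
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[pbest_f.argmin()]
    return gbest, pbest_f.min()

best_w, best_f = pso(lambda w: np.sum(w ** 2), dim=39)
print(best_f)   # approaches 0 for this toy objective
```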
5

Al Mahmud, Nahyan, and Shahfida Amjad Munni. "Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition." International journal of Multimedia & Its Applications 12, no. 5 (October 30, 2020): 1–8. http://dx.doi.org/10.5121/ijma.2020.12501.

Full text
Abstract:
The performance of various acoustic feature extraction methods has been compared in this work using a Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represent the speech signal; they can be classified into either words or sub-word units such as phonemes. In this work, linear predictive coding (LPC) is first used as the acoustic vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector extraction techniques, such as Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), have also been used; these two methods closely resemble the human auditory system. These feature vectors are then used to train the LSTM neural network. The obtained models of different phonemes are then compared using statistical tools, namely the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of those acoustic features.
APA, Harvard, Vancouver, ISO, and other styles
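The two statistical tools named in the entry above have closed forms when each phoneme model is summarised as a multivariate Gaussian. The sketch below computes the Bhattacharyya distance between two Gaussians and the Mahalanobis distance of a point from one of them; the toy data stand in for MFCC or PLP frames and are not taken from the paper.

```python
# Bhattacharyya and Mahalanobis distances for Gaussian phoneme summaries.
import numpy as np

def bhattacharyya(m1, S1, m2, S2):
    S = 0.5 * (S1 + S2)
    diff = m1 - m2
    term1 = 0.125 * diff @ np.linalg.solve(S, diff)
    term2 = 0.5 * np.log(np.linalg.det(S) /
                         np.sqrt(np.linalg.det(S1) * np.linalg.det(S2)))
    return term1 + term2

def mahalanobis(x, m, S):
    diff = x - m
    return np.sqrt(diff @ np.linalg.solve(S, diff))

rng = np.random.default_rng(1)
a = rng.normal(0.0, 1.0, (500, 13))     # e.g. feature frames of one phoneme
b = rng.normal(0.5, 1.2, (500, 13))     # frames of another phoneme
print(bhattacharyya(a.mean(0), np.cov(a.T), b.mean(0), np.cov(b.T)))
print(mahalanobis(b.mean(0), a.mean(0), np.cov(a.T)))
```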
6

Kamińska, Dorota. "Emotional Speech Recognition Based on the Committee of Classifiers." Entropy 21, no. 10 (September 21, 2019): 920. http://dx.doi.org/10.3390/e21100920.

Full text
Abstract:
This article presents a novel method for emotion recognition from speech based on a committee of classifiers. Different classification methods were juxtaposed in order to compare several alternative approaches for final voting. The research is conducted on three different types of Polish emotional speech: acted out with the same content, acted out with different content, and spontaneous. A pool of descriptors commonly utilized for emotional speech recognition, expanded with sets of various perceptual coefficients, is used as input features. This research shows that the presented approach improves performance with respect to a single classifier.
APA, Harvard, Vancouver, ISO, and other styles
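A committee of classifiers with final voting, as described in the entry above, can be assembled directly in scikit-learn. The following sketch uses three common base classifiers and synthetic two-class data as stand-ins for the perceptual descriptors; it is not the author's exact committee.

```python
# Committee-of-classifiers sketch with soft (probability-averaging) voting.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (100, 20)), rng.normal(1, 1, (100, 20))])
y = np.array([0] * 100 + [1] * 100)      # two emotion classes for brevity

committee = VotingClassifier(
    estimators=[("svm", SVC(probability=True)),
                ("knn", KNeighborsClassifier()),
                ("tree", DecisionTreeClassifier())],
    voting="soft")                        # or "hard" for simple majority voting
committee.fit(X, y)
print(committee.score(X, y))
```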
7

Dmitrieva, E., V. Gelman, K. Zaitseva, and A. Orlov. "Psychophysiological features of perceptual learning in the process of speech emotional prosody recognition." International Journal of Psychophysiology 85, no. 3 (September 2012): 375. http://dx.doi.org/10.1016/j.ijpsycho.2012.07.034.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Seyedin, Sanaz, Seyed Mohammad Ahadi, and Saeed Gazor. "New Features Using Robust MVDR Spectrum of Filtered Autocorrelation Sequence for Robust Speech Recognition." Scientific World Journal 2013 (2013): 1–11. http://dx.doi.org/10.1155/2013/634160.

Full text
Abstract:
This paper presents a novel noise-robust feature extraction method for speech recognition using the robust perceptual minimum variance distortionless response (MVDR) spectrum of the temporally filtered autocorrelation sequence. The perceptual MVDR spectrum of the filtered short-time autocorrelation sequence can reduce the effects of the residue of the nonstationary additive noise which remains after filtering the autocorrelation. To achieve a more robust front-end, we also modify the robust distortionless constraint of the MVDR spectral estimation method via revised weighting of the subband power spectrum values based on the subband signal-to-noise ratios (SNRs), which adjusts it to the new proposed approach. This new function allows the components of the input signal at the frequencies least affected by noise to pass with larger weights and attenuates the noisy and undesired components more effectively. This modification results in a reduction of the noise residuals of the spectrum estimated from the filtered autocorrelation sequence, thereby leading to a more robust algorithm. Our proposed method, when evaluated on the Aurora 2 recognition task, outperformed the Mel frequency cepstral coefficient (MFCC) baseline, relative autocorrelation sequence MFCC (RAS-MFCC), and the MVDR-based features in several different noisy conditions.
APA, Harvard, Vancouver, ISO, and other styles
9

Kaur, Gurpreet, Mohit Srivastava, and Amod Kumar. "Genetic Algorithm for Combined Speaker and Speech Recognition using Deep Neural Networks." Journal of Telecommunications and Information Technology 2 (June 29, 2018): 23–31. http://dx.doi.org/10.26636/jtit.2018.119617.

Full text
Abstract:
Huge growth is observed in the speech and speaker recognition field due to the many artificial intelligence algorithms being applied. Speech is used to convey messages via the language being spoken, emotions, gender and speaker identity. Many real applications in healthcare are based upon speech and speaker recognition, e.g. a voice-controlled wheelchair. In this paper, we use a genetic algorithm (GA) for combined speaker and speech recognition, relying on optimized Mel Frequency Cepstral Coefficient (MFCC) speech features, and classification is performed using a Deep Neural Network (DNN). In the first phase, feature extraction using MFCC is executed and feature optimization is performed using the GA. In the second phase, training is conducted using the DNN. Evaluation and validation of the proposed work model is done in a real environment, and efficiency is calculated on the basis of such parameters as accuracy, precision rate, recall rate, sensitivity, and specificity. Also, this paper presents an evaluation of such feature extraction methods as linear predictive coding coefficients (LPCC), perceptual linear prediction (PLP), Mel frequency cepstral coefficients (MFCC) and relative spectra filtering (RASTA), with all of them used for combined speaker and speech recognition systems. A comparison of different methods based on existing techniques for both clean and noisy environments is made as well.
APA, Harvard, Vancouver, ISO, and other styles
10

Trabelsi, Imen, and Med Salim Bouhlel. "Feature Selection for GUMI Kernel-Based SVM in Speech Emotion Recognition." International Journal of Synthetic Emotions 6, no. 2 (July 2015): 57–68. http://dx.doi.org/10.4018/ijse.2015070104.

Full text
Abstract:
Speech emotion recognition is an indispensable requirement for efficient human-machine interaction. Most modern automatic speech emotion recognition systems use Gaussian mixture models (GMM) and Support Vector Machines (SVM). GMM are known for their performance and scalability in spectral modeling, while SVM are known for their discriminatory power. A GMM supervector characterizes an emotional style by the GMM parameters (mean vectors, covariance matrices, and mixture weights). GMM-supervector SVM benefits from both the GMM and SVM frameworks. In this paper, the GMM-UBM mean interval (GUMI) kernel based on the Bhattacharyya distance is successfully used. CFSSubsetEval combined with best-first and greedy-stepwise search was also utilized on the supervector space in order to select the most important features. This framework is illustrated using Mel-frequency cepstral coefficients (MFCC) and Perceptual Linear Prediction (PLP) features on two different emotional databases, namely the Surrey Audio-Expressed Emotion and the Berlin Emotional Speech databases.
APA, Harvard, Vancouver, ISO, and other styles
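A GMM supervector of the kind described in the entry above is typically built by relevance-MAP adaptation of a universal background model (UBM), stacking the adapted means into one long vector. The sketch below shows that step only; the GUMI kernel and feature selection stages are omitted, the data are synthetic, and the component count and relevance factor are illustrative assumptions.

```python
# GMM mean-supervector sketch via simple relevance-MAP adaptation of a UBM.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
background = rng.normal(0, 1, (5000, 13))          # pooled feature frames (stand-in)
ubm = GaussianMixture(n_components=8, covariance_type="diag",
                      random_state=0).fit(background)

def supervector(frames, ubm, relevance=16.0):
    post = ubm.predict_proba(frames)               # (n_frames, n_components)
    n_k = post.sum(axis=0)                         # soft counts per component
    e_k = (post.T @ frames) / np.maximum(n_k[:, None], 1e-8)
    alpha = n_k / (n_k + relevance)                # adaptation coefficients
    adapted = alpha[:, None] * e_k + (1 - alpha)[:, None] * ubm.means_
    return adapted.ravel()                         # stacked adapted means

utterance = rng.normal(0.3, 1.0, (300, 13))
print(supervector(utterance, ubm).shape)           # (8 * 13,) = (104,)
```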
11

Lalitha, S., and Deepa Gupta. "An Encapsulation of Vital Non-Linear Frequency Features for Various Speech Applications." Journal of Computational and Theoretical Nanoscience 17, no. 1 (January 1, 2020): 303–7. http://dx.doi.org/10.1166/jctn.2020.8666.

Full text
Abstract:
Mel frequency cepstral coefficients (MFCCs) and perceptual linear prediction coefficients (PLPCs) are widely used nonlinear vocal parameters in the majority of speaker identification, speaker and speech recognition techniques, as well as in the field of emotion recognition. Since the 1980s, significant effort has been devoted to the development of these features. Considerations such as the use of appropriate frequency estimation approaches, the design of suitable filter banks, and the selection of preferred features play a vital part in the strength of models employing these features. This article presents an overview of MFCC and PLPC features for different speech applications. Insights such as accuracy metrics, background environment, type of data, and feature size are inspected and summarised with the corresponding key references. In addition, the advantages and shortcomings of these features are discussed. This background work is intended as a step toward the enhancement of MFCC and PLPC with respect to novelty, higher accuracy, and lower complexity.
APA, Harvard, Vancouver, ISO, and other styles
12

Linkai Bu and T. D. Church. "Perceptual speech processing and phonetic feature mapping for robust vowel recognition." IEEE Transactions on Speech and Audio Processing 8, no. 2 (March 2000): 105–14. http://dx.doi.org/10.1109/89.824695.

Full text
APA, Harvard, Vancouver, ISO, and other styles
13

Helali, W., Ζ. Hajaiej, and A. Cherif. "Real Time Speech Recognition based on PWP Thresholding and MFCC using SVM." Engineering, Technology & Applied Science Research 10, no. 5 (October 26, 2020): 6204–8. http://dx.doi.org/10.48084/etasr.3759.

Full text
Abstract:
The real-time performance of Automatic Speech Recognition (ASR) is a big challenge and requires high computing capability and exhaustive memory consumption. Achieving robust performance against inevitable difficult situations such as speaker variations, accents, and noise is a tedious task. It is crucial to develop new and efficient approaches for speech signal feature extraction and pre-processing. In order to fix the high dependency on subsequent processing steps in ASR and enhance the quality of the extracted features, noise robustness can be addressed within the ASR feature extraction block, implicitly removing the need for further specific compensation parameters or data collection. This paper proposes a new robust acoustic feature extraction approach based on a hybrid technique consisting of Perceptual Wavelet Packet (PWP) analysis and Mel Frequency Cepstral Coefficients (MFCCs). The proposed system was implemented on a Raspberry Pi board and its performance was checked in a clean environment, reaching 99% average accuracy. The recognition rate was improved (from 80% to 99%) for the majority of Signal-to-Noise Ratios (SNRs) under real noisy conditions for positive SNRs, with considerably improved results especially for negative SNRs.
APA, Harvard, Vancouver, ISO, and other styles
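The wavelet-packet thresholding stage mentioned in the entry above can be approximated with PyWavelets. The sketch below decomposes a noisy tone into wavelet-packet subbands, soft-thresholds each subband with a universal threshold, and reconstructs the signal; the perceptual (Bark-like) tree layout, the MFCC stage and the SVM back end of the paper are not reproduced, and the signal and threshold rule are illustrative assumptions.

```python
# Wavelet-packet soft-thresholding sketch (PyWavelets) on a synthetic noisy tone.
import numpy as np
import pywt

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noisy = clean + 0.1 * rng.normal(size=clean.shape)

wp = pywt.WaveletPacket(data=noisy, wavelet="db4", mode="symmetric", maxlevel=5)
for node in wp.get_level(5, order="freq"):
    sigma = np.median(np.abs(node.data)) / 0.6745         # crude noise estimate
    thr = sigma * np.sqrt(2 * np.log(len(node.data)))     # universal threshold
    node.data = pywt.threshold(node.data, thr, mode="soft")

denoised = wp.reconstruct(update=False)
print(np.mean((denoised[:len(clean)] - clean) ** 2))       # residual error
```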
14

Burgos, Pepi, Roeland van Hout, and Brigitte Planken. "Matching Acoustical Properties and Native Perceptual Assessments of L2 Speech." Open Linguistics 4, no. 1 (January 1, 2018): 199–226. http://dx.doi.org/10.1515/opli-2018-0011.

Full text
Abstract:
This article analyses the acoustical properties of Dutch vowels produced by adult Spanish learners and investigates how these vowels are perceived by non-expert native Dutch listeners. Statistical vowel classifications obtained from the acoustical properties of the learner vowel realizations were compared to vowel classifications provided by native Dutch listeners. Both types of classifications were affected by the specific set of vowels included as stimuli, an effect caused by the large variability in Spanish learners’ vowel realizations. While there were matches between the two types of classifications, shifts were noted within and between production and perception, depending on the vowel and vowel features. We considered the variability between Spanish learners further by investigating individual patterns in the production and perception data, and linking these to the learners’ proficiency level and multilingual background. We conclude that integrating production and perception data provides valuable insights into the role of different features in adult L2 learning, and how their properties actively interact in the way L2 speech is perceived. A second conclusion is that adaptive mechanisms, signalled by boundary shifts and useful in coping with variability of non-native vowel stimuli, play a role in both statistical vowel classifications (production) and human vowel recognition (perception).
APA, Harvard, Vancouver, ISO, and other styles
15

Smith, Kimberly G., and Daniel Fogerty. "Integration of Partial Information Within and Across Modalities: Contributions to Spoken and Written Sentence Recognition." Journal of Speech, Language, and Hearing Research 58, no. 6 (December 2015): 1805–17. http://dx.doi.org/10.1044/2015_jslhr-h-14-0272.

Full text
Abstract:
Purpose: This study evaluated the extent to which partial spoken or written information facilitates sentence recognition under degraded unimodal and multimodal conditions. Method: Twenty young adults with typical hearing completed sentence recognition tasks in unimodal and multimodal conditions across 3 proportions of preservation. In the unimodal condition, performance was examined when only interrupted text or interrupted speech stimuli were available. In the multimodal condition, performance was examined when both interrupted text and interrupted speech stimuli were concurrently presented. Sentence recognition scores were obtained from simultaneous and delayed response conditions. Results: Significantly better performance was obtained for unimodal speech-only compared with text-only conditions across all proportions preserved. The multimodal condition revealed better performance when responses were delayed. During simultaneous responses, participants received equal benefit from speech information when the text was moderately and significantly degraded. The benefit from text in degraded auditory environments occurred only when speech was highly degraded. Conclusions: The speech signal, compared with text, is robust against degradation likely due to its continuous, versus discrete, features. Allowing time for offline linguistic processing is beneficial for the recognition of partial sensory information in unimodal and multimodal conditions. Despite the perceptual differences between the 2 modalities, the results highlight the utility of multimodal speech + text signals.
APA, Harvard, Vancouver, ISO, and other styles
16

CAI, Shang, Yeming XIAO, Jielin PAN, Qingwei ZHAO, and Yonghong YAN. "Noise Robust Feature Scheme for Automatic Speech Recognition Based on Auditory Perceptual Mechanisms." IEICE Transactions on Information and Systems E95.D, no. 6 (2012): 1610–18. http://dx.doi.org/10.1587/transinf.e95.d.1610.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Nashipudimath, Madhu M., Pooja Pillai, Anupama Subramanian, Vani Nair, and Sarah Khalife. "Voice Feature Extraction for Gender and Emotion Recognition." ITM Web of Conferences 40 (2021): 03008. http://dx.doi.org/10.1051/itmconf/20214003008.

Full text
Abstract:
Voice recognition plays a key role in spoken communication, facilitating identification of the emotions of a person as reflected in the voice. Gender classification through speech is a popular Human Computer Interaction (HCI) method because determining gender by computer is hard. This led to the development of a model for "Voice feature extraction for Emotion and Gender Recognition". The speech signal consists of semantic information and speaker information (gender, age, emotional state), accompanied by noise. Females and males have specific vocal traits because of their acoustical and perceptual variations, along with a variety of emotions which bring their own specific perceptions. In order to explore this area, feature extraction requires pre-processing of data, which is necessary for increasing the accuracy. The proposed model follows steps such as data extraction, pre-processing using a Voice Activity Detector (VAD), feature extraction using Mel-Frequency Cepstral Coefficients (MFCC), feature reduction by Principal Component Analysis (PCA), and a Support Vector Machine (SVM) classifier. The proposed combination of techniques produced better results, which can be useful in the healthcare sector, virtual assistants, security purposes and other fields related to the Human Machine Interaction domain.
APA, Harvard, Vancouver, ISO, and other styles
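The pipeline described in the entry above (VAD, MFCC, PCA, SVM) can be condensed into a short sketch. The energy-based VAD, the synthetic "utterances" and all thresholds below are illustrative placeholders rather than the authors' exact configuration.

```python
# Condensed VAD -> MFCC -> PCA -> SVM pipeline sketch on synthetic data.
import numpy as np
import librosa
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def utterance_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13, n_fft=400, hop_length=160)
    rms = librosa.feature.rms(y=y, frame_length=400, hop_length=160)[0]
    n = min(mfcc.shape[1], len(rms))
    keep = librosa.amplitude_to_db(rms[:n], ref=np.max(rms)) > -35   # crude energy VAD
    return mfcc[:, :n][:, keep].mean(axis=1)          # one vector per utterance

rng = np.random.default_rng(0)
sr = 16000
# two synthetic "classes" stand in for the real gender/emotion recordings
utts = [np.sin(2 * np.pi * f * np.arange(sr) / sr) + 0.05 * rng.normal(size=sr)
        for f in ([120] * 5 + [220] * 5)]
X = np.vstack([utterance_features(u.astype(np.float32), sr) for u in utts])
y = np.array([0] * 5 + [1] * 5)

clf = make_pipeline(PCA(n_components=5), SVC()).fit(X, y)
print(clf.score(X, y))
```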
18

Nair, Vani, Pooja Pillai, Anupama Subramanian, Sarah Khalife, and Dr Madhu Nashipudimath. "Voice Feature Extraction for Gender and Emotion Recognition." International Journal on Recent and Innovation Trends in Computing and Communication 9, no. 5 (May 31, 2021): 17–22. http://dx.doi.org/10.17762/ijritcc.v9i5.5463.

Full text
Abstract:
Voice recognition plays a key role in spoken communication that helps to identify the emotions of a person that reflects in the voice. Gender classification through speech is a widely used Human Computer Interaction (HCI) as it is not easy to identify gender by computer. This led to the development of a model for “Voice feature extraction for Emotion and Gender Recognition”. The speech signal consists of semantic information, speaker information (gender, age, emotional state), accompanied by noise. Females and males have different voice characteristics due to their acoustical and perceptual differences along with a variety of emotions which convey their own unique perceptions. In order to explore this area, feature extraction requires pre-processing of data, which is necessary for increasing the accuracy. The proposed model follows steps such as data extraction, pre-processing using Voice Activity Detector (VAD), feature extraction using Mel-Frequency Cepstral Coefficient (MFCC), feature reduction by Principal Component Analysis (PCA) and Support Vector Machine (SVM) classifier. The proposed combination of techniques produced better results which can be useful in the healthcare sector, virtual assistants, security purposes and other fields related to the Human Machine Interaction domain.
APA, Harvard, Vancouver, ISO, and other styles
19

Davies-Venn, Evelyn, and Pamela Souza. "The Role of Spectral Resolution, Working Memory, and Audibility in Explaining Variance in Susceptibility to Temporal Envelope Distortion." Journal of the American Academy of Audiology 25, no. 06 (June 2014): 592–604. http://dx.doi.org/10.3766/jaaa.25.6.9.

Full text
Abstract:
Background: Several studies have shown that hearing thresholds alone cannot adequately predict listeners’ success with hearing-aid amplification. Furthermore, previous studies have shown marked differences in listeners’ susceptibility to distortions introduced by certain nonlinear amplification parameters. Purpose: The purpose of this study was to examine the role of spectral resolution, working memory, and audibility in explaining perceptual susceptibility to temporal envelope and other hearing-aid compression-induced distortions for listeners with mild to moderate and moderate to severe hearing loss. Research Design: A between-subjects repeated-measures design was used to compare speech recognition scores with linear versus compression amplification, for listeners with mild to moderate and moderate to severe hearing loss. Study Sample: The study included 15 adult listeners with mild to moderate hearing loss and 13 adults with moderate to severe hearing loss. Data Collection/Analysis: Speech recognition scores were measured for vowel-consonant-vowel syllables processed with linear, moderate compression, and extreme compression amplification. Perceptual susceptibility to compression-induced temporal envelope distortion was defined as the difference in scores between linear and compression amplification. Both overall scores and consonant feature scores (i.e., place, manner, and voicing) were analyzed. Narrowband spectral resolution was measured using individual measures of auditory filter bandwidth at 2000 Hz. Working memory was measured using the reading span test. Signal audibility was quantified using the Aided Audibility Index. Multiple linear regression was used to determine the predictive role of spectral resolution, working memory, and audibility benefit on listeners’ susceptibility to compression-induced distortions. Results: For all listeners, spectral resolution, working memory, and audibility benefit were significant predictors of overall distortion scores. For listeners with moderate to severe hearing loss, spectral resolution and audibility benefit predicted distortion scores for consonant place and manner of articulation features, and audibility benefit predicted distortion scores for consonant voicing features. For listeners with mild to moderate hearing loss, the model did not predict distortion scores for overall or consonant feature scores. Conclusions: The results from this study suggest that when audibility is adequately controlled, measures of spectral resolution may identify the listeners who are most susceptible to compression-induced distortions. Working memory appears to modulate the negative effect of these distortions for listeners with moderate to severe hearing loss.
APA, Harvard, Vancouver, ISO, and other styles
20

Cabral, Frederico Soares, Hidekazu Fukai, and Satoshi Tamura. "Feature Extraction Methods Proposed for Speech Recognition Are Effective on Road Condition Monitoring Using Smartphone Inertial Sensors." Sensors 19, no. 16 (August 9, 2019): 3481. http://dx.doi.org/10.3390/s19163481.

Full text
Abstract:
The objective of our project is to develop an automatic survey system for road condition monitoring using smartphone devices. One of the main tasks of our project is the classification of paved and unpaved roads. Since, in practice, recordings will be gathered with various types of vehicle suspension systems and at various speeds, we use the multiple sensors found in smartphones and state-of-the-art machine learning techniques for signal processing. Although it is usually not paid much attention, the results of the classification depend on the feature extraction step. Therefore, we have to carefully choose not only the classification method but also the feature extraction method and its parameters. Simple statistics-based features are most commonly used to extract road surface information from acceleration data. In this study, we evaluated the mel-frequency cepstral coefficients (MFCC) and perceptual linear prediction coefficients (PLP) as a feature extraction step to improve the accuracy of paved and unpaved road classification. Although both MFCC and PLP have been developed in the human speech recognition field, we found that modified MFCC and PLP can be used to improve on the commonly used statistical method.
APA, Harvard, Vancouver, ISO, and other styles
21

Massaro, Dominic W. "Multiple Book Review of Speech perception by ear and eye: A paradigm for psychological inquiry." Behavioral and Brain Sciences 12, no. 4 (December 1989): 741–55. http://dx.doi.org/10.1017/s0140525x00025619.

Full text
Abstract:
This book is about the processing of information in face-to-face communication when a speaker makes both audible and visible information available to a perceiver. Both auditory and visual sources of information are evaluated and integrated to achieve speech perception. The evaluation of the information source provides information about the strength of alternative interpretations, rather than just all-or-none categorical information, as claimed by “categorical perception” theory. Information sources are evaluated independently; the integration process insures that the least ambiguous sources have the most influences on the judgment. Similar processes occur in a variety of other behaviors, ranging from personality judgments and categorization to sentence interpretation and decision making. The experimental results are consistent with a fuzzy logical model of perception, positing three operations in perceptual (primary) recognition: feature evaluation, feature integration, and pattern classification. Continuously valued features are first evaluated, then integrated and matched against prototype descriptions in memory; finally, an identification decision is made on the basis of the relative goodness-of-match of the stimulus information with the relevant prototype descriptions.
APA, Harvard, Vancouver, ISO, and other styles
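The fuzzy logical model of perception mentioned in the entry above prescribes a simple multiplicative integration rule for two alternatives, under which the least ambiguous source dominates the judgment. A tiny numerical illustration, with made-up support values, follows.

```python
# Two-alternative FLMP integration: each modality supplies a continuous degree of support.
def flmp(auditory, visual):
    """P(alternative A) given auditory and visual support in [0, 1]."""
    return (auditory * visual) / (auditory * visual +
                                  (1 - auditory) * (1 - visual))

print(flmp(0.6, 0.5))   # ambiguous visual evidence barely shifts the outcome -> 0.60
print(flmp(0.6, 0.9))   # unambiguous visual evidence dominates              -> ~0.93
```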
22

Dua, Mohit, Rajesh Kumar Aggarwal, and Mantosh Biswas. "Discriminative Training Using Noise Robust Integrated Features and Refined HMM Modeling." Journal of Intelligent Systems 29, no. 1 (February 20, 2018): 327–44. http://dx.doi.org/10.1515/jisys-2017-0618.

Full text
Abstract:
The classical approach to building an automatic speech recognition (ASR) system uses different feature extraction methods at the front end and various parameter classification techniques at the back end. The Mel-frequency cepstral coefficient (MFCC) and perceptual linear prediction (PLP) techniques are the conventional approaches used for many years for feature extraction, and the hidden Markov model (HMM) has been the most obvious selection for feature classification. However, the performance of MFCC-HMM and PLP-HMM-based ASR systems degrades in real-time environments. The proposed work discusses the implementation of a discriminatively trained Hindi ASR system using noise-robust integrated features and a refined HMM model. It sequentially combines MFCC with PLP and MFCC with gammatone-frequency cepstral coefficients (GFCC) to obtain MF-PLP and MF-GFCC integrated feature vectors, respectively. The HMM parameters are refined using a genetic algorithm (GA) and particle swarm optimization (PSO). Discriminative training of the acoustic model using maximum mutual information (MMI) and minimum phone error (MPE) is performed to enhance the accuracy of the proposed system. The results show that discriminative training using MPE with the MF-GFCC integrated feature vector and PSO-HMM parameter refinement gives significantly better results than the other implemented techniques.
APA, Harvard, Vancouver, ISO, and other styles
23

Schädler, Marc R., David Hülsmeier, Anna Warzybok, and Birger Kollmeier. "Individual Aided Speech-Recognition Performance and Predictions of Benefit for Listeners With Impaired Hearing Employing FADE." Trends in Hearing 24 (January 2020): 233121652093892. http://dx.doi.org/10.1177/2331216520938929.

Full text
Abstract:
The benefit in speech-recognition performance due to the compensation of a hearing loss can vary between listeners, even if unaided performance and hearing thresholds are similar. To accurately predict the individual performance benefit due to a specific hearing device, a prediction model is proposed which takes into account hearing thresholds and a frequency-dependent suprathreshold component of impaired hearing. To test the model, the German matrix sentence test was performed in unaided and individually aided conditions in quiet and in noise by 18 listeners with different degrees of hearing loss. The outcomes were predicted by an individualized automatic speech-recognition system where the individualization parameter for the suprathreshold component of hearing loss was inferred from tone-in-noise detection thresholds. The suprathreshold component was implemented as a frequency-dependent multiplicative noise (mimicking level uncertainty) in the feature-extraction stage of the automatic speech-recognition system. Its inclusion improved the root-mean-square prediction error of individual speech-recognition thresholds (SRTs) from 6.3 dB to 4.2 dB and of individual benefits in SRT due to common compensation strategies from 5.1 dB to 3.4 dB. The outcome predictions are highly correlated with both the corresponding observed SRTs (R2 = .94) and the benefits in SRT (R2 = .89) and hence might help to better understand—and eventually mitigate—the perceptual consequences of as yet unexplained hearing problems, also discussed in the context of hidden hearing loss.
APA, Harvard, Vancouver, ISO, and other styles
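The "frequency-dependent multiplicative noise (mimicking level uncertainty)" described in the entry above amounts to perturbing log-domain features with channel-dependent random level offsets before recognition. The sketch below shows that idea on synthetic log-mel features; the band count, uncertainty values and noise model are illustrative assumptions, not the FADE implementation.

```python
# Level-uncertainty sketch: additive-in-dB random perturbation per frequency band.
import numpy as np

def apply_level_uncertainty(log_mel_db, uncertainty_db, rng=None):
    """Perturb log-domain features with frequency-dependent Gaussian noise (in dB)."""
    if rng is None:
        rng = np.random.default_rng(0)
    noise = rng.normal(0.0, 1.0, log_mel_db.shape) * uncertainty_db[:, None]
    return log_mel_db + noise

n_bands, n_frames = 23, 200
features = np.random.default_rng(1).normal(-30, 10, (n_bands, n_frames))
uncertainty = np.linspace(2.0, 10.0, n_bands)     # e.g. more uncertainty at high frequencies
degraded = apply_level_uncertainty(features, uncertainty)
print(degraded.shape)
```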
24

Myronova, T. Yu, and O. V. Kovalevska. "Methods of development orientational skills in a foreign text." Bulletin of Luhansk Taras Shevchenko National University, no. 4 (335) (2020): 195–202. http://dx.doi.org/10.12958/2227-2844-2020-4(335)-195-202.

Full text
Abstract:
The article is devoted to the implementation of a methodical approach to teaching reading in a foreign language to students of non-philological specialties on the basis of specific language material. It rests on the essential characteristics of reading as a type of speech activity grounded in the analysis of the grammatical features contained in the text. The approach to teaching reading covered in the article involves managing the process of developing an orientational basis for educational activities. This method has great advantages, because it helps to develop skills of creative analysis of the semantic content. It also links the language form with the content and eliminates the interference of native and foreign languages by differentiating language representations in different languages. In order to understand a grammatical phenomenon when reading a text, we must be able to recognize this phenomenon by its form and to connect the form with the corresponding meaning. Recognition of grammatical phenomena is based on the characteristic features of these phenomena, which signal their presence. Reading as a communicative process sets the following tasks for the reader: to recognize the graphic form of morphemes, words and sentences and to perceive the content. Skilled reading is characterized by the automatism of perceptual processing of the presented printed material and the adequacy of solving the semantic problems that arise during the implementation of speech activity. Therefore, the way of learning passive grammar should mirror this communicative process: the description of a passive-grammar phenomenon should proceed from the form (its features) to the disclosure of its content, and exercises should be aimed at developing automatic recognition of these features.
APA, Harvard, Vancouver, ISO, and other styles
25

Bach, Jörg-Hendrik, Jörn Anemüller, and Birger Kollmeier. "Robust speech detection in real acoustic backgrounds with perceptually motivated features." Speech Communication 53, no. 5 (May 2011): 690–706. http://dx.doi.org/10.1016/j.specom.2010.07.003.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

Wolfe, Jace, Mila Duke, Erin Schafer, Christine Jones, and Lori Rakita. "Evaluation of Adaptive Noise Management Technologies for School-Age Children with Hearing Loss." Journal of the American Academy of Audiology 28, no. 05 (May 2017): 415–35. http://dx.doi.org/10.3766/jaaa.16015.

Full text
Abstract:
Background: Children with hearing loss experience significant difficulty understanding speech in noisy and reverberant situations. Adaptive noise management technologies, such as fully adaptive directional microphones and digital noise reduction, have the potential to improve communication in noise for children with hearing aids. However, there are no published studies evaluating the potential benefits children receive from the use of adaptive noise management technologies in simulated real-world environments as well as in daily situations. Purpose: The objective of this study was to compare speech recognition, speech intelligibility ratings (SIRs), and sound preferences of children using hearing aids equipped with and without adaptive noise management technologies. Research Design: A single-group, repeated measures design was used to evaluate performance differences obtained in four simulated environments. In each simulated environment, participants were tested in a basic listening program with minimal noise management features, a manual program designed for that scene, and the hearing instruments’ adaptive operating system that steered hearing instrument parameterization based on the characteristics of the environment. Study Sample: Twelve children with mild to moderately severe sensorineural hearing loss. Data Collection and Analysis: Speech recognition and SIRs were evaluated in three hearing aid programs with and without noise management technologies across two different test sessions and various listening environments. Also, the participants’ perceptual hearing performance in daily real-world listening situations with two of the hearing aid programs was evaluated during a four- to six-week field trial that took place between the two laboratory sessions. Results: On average, the use of adaptive noise management technology improved sentence recognition in noise for speech presented in front of the participant but resulted in a decrement in performance for signals arriving from behind when the participant was facing forward. However, the improvement with adaptive noise management exceeded the decrement obtained when the signal arrived from behind. Most participants reported better subjective SIRs when using adaptive noise management technologies, particularly when the signal of interest arrived from in front of the listener. In addition, most participants reported a preference for the technology with an automatically switching, adaptive directional microphone and adaptive noise reduction in real-world listening situations when compared to conventional, omnidirectional microphone use with minimal noise reduction processing. Conclusions: Use of the adaptive noise management technologies evaluated in this study improves school-age children’s speech recognition in noise for signals arriving from the front. Although a small decrement in speech recognition in noise was observed for signals arriving from behind the listener, most participants reported a preference for use of noise management technology both when the signal arrived from in front and from behind the child. The results of this study suggest that adaptive noise management technologies should be considered for use with school-age children when listening in academic and social situations.
APA, Harvard, Vancouver, ISO, and other styles
27

Abid Noor, Ali O. "Robust speaker verification in band-localized noise conditions." Indonesian Journal of Electrical Engineering and Computer Science 13, no. 2 (February 1, 2019): 499. http://dx.doi.org/10.11591/ijeecs.v13.i2.pp499-506.

Full text
Abstract:
This research paper presents a robust method for speaker verification in noisy environments. The noise is assumed to contaminate certain parts of the voice’s frequency spectrum. Therefore, the verification method is based on splitting the noisy speech into subsidiary bands and then using a threshold to sense the existence of noise in a specific part of the spectrum, hence activating an adaptive filter in that part to track changes in the noise’s characteristics and remove it. The decomposition is achieved using low-complexity quadrature mirror filters (QMF) in three levels, thus producing four bands in a non-uniform arrangement that resembles human auditory perception. Speaker recognition is based on vector quantization (VQ), or the template matching technique. Features are extracted from the speaker’s voice using the normalized power in a similar way to the Mel-frequency cepstral coefficients. The performance of the proposed system is evaluated using 60 speakers subjected to five levels of signal-to-noise ratio (SNR) using the total success rate (TSR), false acceptance rate (FAR), false rejection rate (FRR) and equal error rate. The proposed method showed higher recognition accuracy than existing methods in severe noise conditions.
APA, Harvard, Vancouver, ISO, and other styles
28

Kwak, Yuna, Hosung Nam, Hyun-Woong Kim, and Chai-Youn Kim. "Cross-Modal Correspondence Between Speech Sound and Visual Shape Influencing Perceptual Representation of Shape: the Role of Articulation and Pitch." Multisensory Research 33, no. 6 (June 17, 2020): 569–98. http://dx.doi.org/10.1163/22134808-20191330.

Full text
Abstract:
Cross-modal correspondence is the tendency to systematically map stimulus features across sensory modalities. The current study explored cross-modal correspondence between speech sound and shape (Experiment 1), and whether such association can influence shape representation (Experiment 2). For the purpose of closely examining the role of the two factors — articulation and pitch — combined in speech acoustics, we generated two sets of 25 vowel stimuli — pitch-varying and pitch-constant sets. Both sets were generated by manipulating articulation — frontness and height of the tongue body’s positions — but differed in terms of whether pitch varied among the sounds within the same set. In Experiment 1, participants made a forced choice between a round and a spiky shape to indicate the shape better associated with each sound. Results showed that shape choice was modulated according to both articulation and pitch, and we therefore concluded that both factors play significant roles in sound–shape correspondence. In Experiment 2, participants reported their subjective experience of shape accompanied by vowel sounds by adjusting an ambiguous shape in the response display. We found that sound–shape correspondence exerts an effect on shape representation by modulating audiovisual interaction, but only in the case of pitch-varying sounds. Therefore, pitch information within vowel acoustics plays the leading role in sound–shape correspondence influencing shape representation. Taken together, our results suggest the importance of teasing apart the roles of articulation and pitch for understanding sound–shape correspondence.
APA, Harvard, Vancouver, ISO, and other styles
29

Frey, Brendan J., and Geoffrey E. Hinton. "Variational Learning in Nonlinear Gaussian Belief Networks." Neural Computation 11, no. 1 (January 1, 1999): 193–213. http://dx.doi.org/10.1162/089976699300016872.

Full text
Abstract:
We view perceptual tasks such as vision and speech recognition as inference problems where the goal is to estimate the posterior distribution over latent variables (e.g., depth in stereo vision) given the sensory input. The recent flurry of research in independent component analysis exemplifies the importance of inferring the continuous-valued latent variables of input data. The latent variables found by this method are linearly related to the input, but perception requires nonlinear inferences such as classification and depth estimation. In this article, we present a unifying framework for stochastic neural networks with nonlinear latent variables. Nonlinear units are obtained by passing the outputs of linear gaussian units through various nonlinearities. We present a general variational method that maximizes a lower bound on the likelihood of a training set and give results on two visual feature extraction problems. We also show how the variational method can be used for pattern classification and compare the performance of these nonlinear networks with other methods on the problem of handwritten digit recognition.
APA, Harvard, Vancouver, ISO, and other styles
30

Gfeller, Kate, Dingfeng Jiang, Jacob J. Oleson, Virginia Driscoll, and John F. Knutson. "Temporal Stability of Music Perception and Appraisal Scores of Adult Cochlear Implant Recipients." Journal of the American Academy of Audiology 21, no. 01 (January 2010): 028–34. http://dx.doi.org/10.3766/jaaa.21.1.4.

Full text
Abstract:
Background: An extensive body of literature indicates that cochlear implants (CIs) are effective in supporting speech perception of persons with severe to profound hearing losses who do not benefit to any great extent from conventional hearing aids. Adult CI recipients tend to show significant improvement in speech perception within 3 mo following implantation as a result of mere experience. Furthermore, CI recipients continue to show modest improvement as long as 5 yr postimplantation. In contrast, data taken from single testing protocols of music perception and appraisal indicate that CIs are less than ideal in transmitting important structural features of music, such as pitch, melody, and timbre. However, there is presently little information documenting changes in music perception or appraisal over extended time as a result of mere experience. Purpose: This study examined two basic questions: (1) Do adult CI recipients show significant improvement in perceptual acuity or appraisal of specific music listening tasks when tested in two consecutive years? (2) If there are tasks for which CI recipients show significant improvement with time, are there particular demographic variables that predict those CI recipients most likely to show improvement with extended CI use? Research Design: A longitudinal cohort study. Implant recipients return annually for visits to the clinic. Study Sample: The study included 209 adult cochlear implant recipients with at least 9 mo implant experience before their first year measurement. Data Collection and Analysis: Outcomes were measured on the patient's annual visit in two consecutive years. Paired t-tests were used to test for significant improvement from one year to the next. Those variables demonstrating significant improvement were subjected to regression analyses performed to detect the demographic variables useful in predicting said improvement. Results: There were no significant differences in music perception outcomes as a function of type of device or processing strategy used. Only familiar melody recognition (FMR) and recognition of melody excerpts with lyrics (MERT-L) showed significant improvement from one year to the next. After controlling for the baseline value, hearing aid use, months of use, music listening habits after implantation, and formal musical training in elementary school were significant predictors of FMR improvement. Bilateral CI use, formal musical training in high school and beyond, and a measure of sequential cognitive processing were significant predictors of MERT-L improvement. Conclusion: These adult CI recipients as a result of mere experience demonstrated fairly consistent music perception and appraisal on measures gathered in two consecutive years. Gains made tend to be modest, and can be associated with characteristics such as use of hearing aids, listening experiences, or bilateral use (in the case of lyrics). These results have implications for counseling of CI recipients with regard to realistic expectations and strategies for enhancing music perception and enjoyment.
APA, Harvard, Vancouver, ISO, and other styles
31

Sheldon, Claire A., George L. Malcolm, and Jason J. S. Barton. "Alexia With and Without Agraphia: An Assessment of Two Classical Syndromes." Canadian Journal of Neurological Sciences / Journal Canadien des Sciences Neurologiques 35, no. 5 (November 2008): 616–24. http://dx.doi.org/10.1017/s0317167100009410.

Full text
Abstract:
Background: Current cognitive models propose that multiple processes are involved in reading and writing. Objective: Our goal was to use linguistic analyses to clarify the cognitive dysfunction behind two classic alexic syndromes. Methods: We report four experiments on two patients, one with alexia without agraphia following occipitotemporal lesions, and one with alexia with agraphia from a left angular gyral lesion. Results: The patient with occipital lesions had trouble discriminating real letters from foils and his reading varied with word-length but not with linguistic variables such as part of speech, word frequency or imageability. He read pseudo-words and words with regular spelling better, indicating preserved use of grapheme-to-phoneme pronunciation rules. His writing showed errors that reflected reliance on ‘phoneme-to-grapheme’ spelling rules. In contrast, the patient with a left angular gyral lesion showed better recognition of letters, words and their meanings. His reading was better for words with high imageability but displayed semantic errors and an inability to use ‘grapheme-to-phoneme’ rules, features consistent with deep dyslexia. His agraphia showed impaired access to both an internal lexicon and ‘phoneme-to-grapheme’ rules. Conclusion: Some cases of pure alexia may be a perceptual word-form agnosia, with loss of internal representations of letters and words, while the angular gyral syndrome of alexia with agraphia is a linguistic deep dyslexia. The presence or absence of agraphia does not always distinguish between the two; rather, writing can mirror the reading deficits, being more obvious and profound in the case of an angular gyral syndrome.
APA, Harvard, Vancouver, ISO, and other styles
32

Nusbaum, Howard C. "Perceptual expectations, attention, and speech recognition." Journal of the Acoustical Society of America 127, no. 3 (March 2010): 1890. http://dx.doi.org/10.1121/1.3384714.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Aliūkaitė, Daiva, and Danguolė Mikulėnienė. "The narrative of an ordinary member of language community: WHERE and WHY is dialecticity of a locality created." Lietuvių kalba, no. 13 (December 20, 2019): 1–22. http://dx.doi.org/10.15388/lk.2019.22481.

Full text
Abstract:
The paper aims to explore where and why an ordinary member of language community creates the dialecticity of a locality and evaluate whether (and how) the dialect artefact of an ordinary member of language community is related with the dialecticity recognised and estimated by researchers, or, in other words, discuss the interaction of the emic and etic perspectives. The empirical basis for the discussion about the interaction of the emic and etic perspectives is formed on the verbalised and visualised language attitudes of the ordinary members of language community and the data of the text-stimuli perceptions gathered during the project “The Position of Standard Language in the Mental Map of the Lithuanian Language” carried out in 2014–2016 and supplied with the data of the ongoing project “Distribution of Regional Variants and Quasistandard Language at the Beginning of the 21st Century: Perceptual Approach (Perceptual Categorisation of Variants)”, 2017–2019. The respondents of both projects are the first-fourth year grammar school pupils whom the scholars associate with the emic perspective. The first attempt concerned the data related with the verbalised and visualised (in the drawn maps) language attitudes of 1,415 teenagers; the second one analysed the data related with the verbalised and visualised (in the drawn maps) language attitudes of 1,064 youngsters and the data of the perception of the text-stimuli recorded in an adequate dialect. Both projects are interrelated with regard to the subject matter and the pursued goals: in the first case, an attempt was made to analyse the geolinguistic competence of an ordinary member of language community; in the second one, an additional aspect of the perceptual abilities of an ordinary member of language community was considered. During the performance of the two projects the essential criterion for the selection of the locations in the regions of Lithuania to be explored was whether they were (non)marked by dialect. Hence the respondent groups were formed in the regiolect and/or geolect zones, and in the second project the task of the text-stimuli perception had motivated the inclusion of the Lithuanian cities. The problem of how an ordinary member of language community creates the dialecticity of a location has been approached on the basis of the data given in the drawn maps presented in the two projects. The participants of the first project have drawn the so called perceptual isoglosses in two maps, i.e. in one map they have marked the areas where people speak in dialect and, in the other, where standard language was used. Meanwhile, the participants of the second project in their drawn maps related the linguistic homeland with other locations due to the similarity (or simultaneity) of expression. They also had to draw the maps of standard language and, in addition, localise 8 text-stimuli given to them for assessment which contained the 14–19 seconds fragments of spontaneous speech representing various regiolectic zones. To summarise the obtained results, it should be claimed that etic and emic discourses should be essentially related to the cause and effect factor. The narrative of an ordinary member of language community not only reveals the specific interior relationships but is also affected from outside.
Such an insight is determined by the interaction between the created dialecticity of a locality and the dialecticity of localities legitimated in scientific discourse. The results obtained in both projects on perceptual dialectology show that the dialecticity of a locality has been constructed on the basis of adequate etic information: it is obvious from the drawn maps that dialecticity is attracted by the localities that are highly dialect-oriented, i.e. the geolectic and regiolectic areas. This assumption is based on the localisation of the text-stimulus having the most distinct features of dialect which confirms that dialect recognition by the ordinary members of language community does not enter into conflict with the researcher's evaluation from outside. Thus it shows that localities do consistently attract the text-stimuli having the most distinct features of dialect. Meanwhile, as a place of dialect levelling, the capital (or any city) accurately correlates with the NORM reflection of traditional dialectology. The paper summarises that it is not clear yet in what ways the constructors of the narrative from inside are affected by the narratives from outside. There is no tradition formed in the works on perceptual dialectology and no adequate methodological instruments have been devised which might help to find out the sources of knowledge, images and attitudes of the ordinary members of language community. Hence, in order to more clearly describe the relationship between the narrative of a researcher and that of an ordinary member of language community it would be reasonable to move an additional step forward – to expand the instrumentarium and methods of research by including the reflections of the ordinary members of language community regarding the knowledge, images and attitudes that they possess in the field of dialecticity. Thus a new perspective in dialectology should be initiated.
APA, Harvard, Vancouver, ISO, and other styles
34

Zhao, Kai, and Dan Wang. "Research on Speech Recognition Method in Multi Layer Perceptual Network Environment." International Journal of Circuits, Systems and Signal Processing 15 (August 24, 2021): 996–1004. http://dx.doi.org/10.46300/9106.2021.15.107.

Full text
Abstract:
Aiming at the problem of low recognition rates in speech recognition methods, a speech recognition method in a multi-layer perceptual network environment is proposed. In the multi-layer perceptual network environment, the speech signal is processed by a filter using the filter's transfer function. The speech signal is then windowed and framed to remove its silence segments. At the same time, the average energy and the zero-crossing rate of the speech signal are calculated to extract its characteristics. By analyzing the principles of speech signal recognition, the speech recognition process is designed and speech recognition in the multi-layer perceptual network environment is realized. The experimental results show that the speech recognition method designed in this paper has good speech recognition performance.
APA, Harvard, Vancouver, ISO, and other styles
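The pre-processing steps listed in the entry above (windowed framing, short-time average energy and zero-crossing rate, and silence removal) are straightforward to prototype. The sketch below uses a synthetic signal and an arbitrary energy threshold; it illustrates the standard steps, not the paper's implementation.

```python
# Framing, short-time energy and zero-crossing rate with crude silence removal.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hamming(frame_len)             # windowed frames

def short_time_features(frames):
    energy = np.mean(frames ** 2, axis=1)             # average energy per frame
    zcr = np.mean(np.abs(np.diff(np.sign(frames), axis=1)) > 0, axis=1)
    return energy, zcr

rng = np.random.default_rng(0)
sig = np.concatenate([0.01 * rng.normal(size=8000),                      # "silence"
                      np.sin(2 * np.pi * 200 * np.arange(8000) / 16000)])  # "speech"
frames = frame_signal(sig)
energy, zcr = short_time_features(frames)
voiced = frames[energy > 0.1 * energy.max()]           # arbitrary silence threshold
print(len(frames), len(voiced))
```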
35

Axelrod, Scott E. "Speech recognition utilizing multitude of speech features." Journal of the Acoustical Society of America 128, no. 4 (2010): 2259. http://dx.doi.org/10.1121/1.3500788.

Full text
APA, Harvard, Vancouver, ISO, and other styles
36

Allen, Jont B., and Marion Regnier. "SPEECH AND METHOD FOR IDENTIFYING PERCEPTUAL FEATURES." Journal of the Acoustical Society of America 132, no. 4 (2012): 2779. http://dx.doi.org/10.1121/1.4757834.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Mattys, Sven L., and Shekeila D. Palmer. "Divided attention disrupts perceptual encoding during speech recognition." Journal of the Acoustical Society of America 137, no. 3 (March 2015): 1464–72. http://dx.doi.org/10.1121/1.4913507.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Eide, Ellen M. "Speech recognition using discriminant features." Journal of the Acoustical Society of America 126, no. 3 (2009): 1646. http://dx.doi.org/10.1121/1.3230471.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Huang, Chang-Han, and Frank Torsten Bernd Seide. "Tone features for speech recognition." Journal of the Acoustical Society of America 117, no. 5 (2005): 2698. http://dx.doi.org/10.1121/1.1932393.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Bahl, Lalit R. "Speech recognition using dynamic features." Journal of the Acoustical Society of America 102, no. 6 (1997): 3252. http://dx.doi.org/10.1121/1.420242.

Full text
APA, Harvard, Vancouver, ISO, and other styles
41

Shafiro, Valeriy, Daniel Fogerty, Kimberly Smith, and Stanley Sheft. "Perceptual Organization of Interrupted Speech and Text." Journal of Speech, Language, and Hearing Research 61, no. 10 (October 26, 2018): 2578–88. http://dx.doi.org/10.1044/2018_jslhr-h-17-0477.

Full text
Abstract:
Purpose: Visual recognition of interrupted text may predict speech intelligibility under adverse listening conditions. This study investigated the nature of the linguistic information and perceptual processes underlying this relationship. Method: To directly compare the perceptual organization of interrupted speech and text, we examined the recognition of spoken and printed sentences interrupted at different rates in 14 adults with normal hearing. The interruption method approximated deletion and retention of rate-specific linguistic information (0.5–64 Hz) in speech by substituting either white space or silent intervals for text or speech in the original sentences. Results: A similar U-shaped pattern of cross-rate variation in performance was observed in both modalities, with minima at 2 Hz. However, at the highest and lowest interruption rates, recognition accuracy was greater for text than speech, whereas the reverse was observed at middle rates. An analysis of word duration and the frequency of word sampling across interruption rates suggested that the location of the function minima was influenced by perceptual reconstruction of whole words. Overall, the findings indicate a high degree of similarity in the perceptual organization of interrupted speech and text. Conclusion: The observed rate-specific variation in the perception of speech and text may potentially affect the degree to which recognition accuracy in one modality is predictive of the other.
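The interruption manipulation described in the Method (substituting silent intervals for portions of the speech at rates from 0.5 to 64 Hz) can be illustrated with a simple gating sketch. The following snippet is only an approximation under assumed parameters; the 50% duty cycle and the sampling rate are illustrative choices, not values taken from the article.

```python
import numpy as np

def interrupt_speech(x, fs, rate_hz, duty=0.5):
    """Periodically replace portions of a waveform with silence.

    rate_hz is the interruption rate (on/off cycles per second); duty is the
    assumed fraction of each cycle that is retained.
    """
    t = np.arange(len(x)) / fs
    gate = (t * rate_hz) % 1.0 < duty   # True during the retained part of each cycle
    return x * gate

# Example: interrupt a 2-second signal at 2 Hz, the rate with the lowest accuracy in the study.
fs = 16000
speech = np.random.default_rng(1).standard_normal(2 * fs) * 0.05
interrupted = interrupt_speech(speech, fs, rate_hz=2.0)
```

Sweeping rate_hz from 0.5 to 64 reproduces the kind of rate manipulation the study applies to speech; text is interrupted analogously by substituting white space for characters.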
APA, Harvard, Vancouver, ISO, and other styles
42

Jones, Harrison N., Kelly D. Crisp, Maragatha Kuchibhatla, Leslie Mahler, Thomas Risoli, Carlee W. Jones, and Priya Kishnani. "Auditory-Perceptual Speech Features in Children With Down Syndrome." American Journal on Intellectual and Developmental Disabilities 124, no. 4 (July 1, 2019): 324–38. http://dx.doi.org/10.1352/1944-7558-124.4.324.

Full text
Abstract:
Speech disorders occur commonly in individuals with Down syndrome (DS), although data regarding the auditory-perceptual speech features are limited. This descriptive study assessed 47 perceptual speech features during connected speech samples in 26 children with DS. The most severely affected speech features were: naturalness, imprecise consonants, hyponasality, speech rate, inappropriate silences, irregular vowels, prolonged intervals, overall loudness level, pitch level, aberrant oropharyngeal resonance, hoarse voice, reduced stress, and prolonged phonemes. These findings suggest that speech disorders in DS are due to distributed impairments involving voice, speech sound production, fluency, resonance, and prosody. These data contribute to the development of a profile of impairments in speakers with DS to guide future research and inform clinical assessment and treatment.
APA, Harvard, Vancouver, ISO, and other styles
43

Richter, Caitlin, Naomi H. Feldman, Harini Salgado, and Aren Jansen. "Evaluating Low-Level Speech Features Against Human Perceptual Data." Transactions of the Association for Computational Linguistics 5 (December 2017): 425–40. http://dx.doi.org/10.1162/tacl_a_00071.

Full text
Abstract:
We introduce a method for measuring the correspondence between low-level speech features and human perception, using a cognitive model of speech perception implemented directly on speech recordings. We evaluate two speaker normalization techniques using this method and find that in both cases, speech features that are normalized across speakers predict human data better than unnormalized speech features, consistent with previous research. Results further reveal differences across normalization methods in how well each predicts human data. This work provides a new framework for evaluating low-level representations of speech on their match to human perception, and lays the groundwork for creating more ecologically valid models of speech perception.
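The evaluation contrasts speaker-normalized and unnormalized features, but the abstract does not name the normalization techniques used. The sketch below therefore shows only a generic per-speaker mean/variance normalization as one common example of normalizing features across speakers; it should not be read as the authors' method.

```python
import numpy as np

def per_speaker_normalize(features_by_speaker):
    """Z-score each speaker's feature frames with that speaker's own statistics.

    features_by_speaker: dict mapping speaker id -> array of shape (n_frames, n_dims).
    Plain per-speaker mean/variance normalization, shown purely as an illustration.
    """
    normalized = {}
    for speaker, feats in features_by_speaker.items():
        mu = feats.mean(axis=0)
        sigma = feats.std(axis=0) + 1e-8   # avoid division by zero for constant dimensions
        normalized[speaker] = (feats - mu) / sigma
    return normalized
```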
APA, Harvard, Vancouver, ISO, and other styles
44

Small, Larry H. "Listeners' Perceptual Strategies in Word Recognition: Shadowing Misarticulated Speech." Perceptual and Motor Skills 69, no. 3_suppl (December 1989): 1211–16. http://dx.doi.org/10.2466/pms.1989.69.3f.1211.

Full text
Abstract:
The purpose of the study was to examine the perceptual salience of various types of phonetic, lexical, and prosodic information by examining subjects' responses to altered words in a continuous speech-shadowing task. 48 subjects shadowed a prose passage in which the word initial consonant of 14 two-syllable words was altered by either mispronouncing or deleting it. Analysis of responses showed that subjects made use of lexical stress and stressed vowel information during word recognition to cope with the altered auditory signal.
APA, Harvard, Vancouver, ISO, and other styles
45

Small, Larry H. "Listeners’ Perceptual Strategies in Word Recognition: Shadowing Misarticulated Speech." Perceptual and Motor Skills 69, no. 3-2 (December 1989): 1211–16. http://dx.doi.org/10.1177/00315125890693-226.

Full text
Abstract:
The purpose of the study was to examine the perceptual salience of various types of phonetic, lexical, and prosodic information by examining subjects’ responses to altered words in a continuous speech-shadowing task. 48 subjects shadowed a prose passage in which the word initial consonant of 14 two-syllable words was altered by either mispronouncing or deleting it. Analysis of responses showed that subjects made use of lexical stress and stressed vowel information during word recognition to cope with the altered auditory signal.
APA, Harvard, Vancouver, ISO, and other styles
46

Nusbaum, Howard. "Perceptual learning and expectations: Cognitive mechanisms in speech recognition." Journal of the Acoustical Society of America 125, no. 4 (April 2009): 2604. http://dx.doi.org/10.1121/1.4783910.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Thomas-Stonell, Nancy, Ava-Lee Kotler, Herbert Leeper, and Philip Doyle. "Computerized speech recognition: influence of intelligibility and perceptual consistency on recognition accuracy." Augmentative and Alternative Communication 14, no. 1 (January 1998): 51–56. http://dx.doi.org/10.1080/07434619812331278196.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Najnin, Shamima, and Bonny Banerjee. "Speech recognition using cepstral articulatory features." Speech Communication 107 (February 2019): 26–37. http://dx.doi.org/10.1016/j.specom.2019.01.002.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Potamianos, Alexandros. "Novel features for robust speech recognition." Journal of the Acoustical Society of America 112, no. 5 (November 2002): 2278. http://dx.doi.org/10.1121/1.4779131.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Lee, Youngjik, and Kyu-Woong Hwang. "Selecting Good Speech Features for Recognition." ETRI Journal 18, no. 1 (April 1, 1996): 29–40. http://dx.doi.org/10.4218/etrij.96.0196.0013.

Full text
APA, Harvard, Vancouver, ISO, and other styles