To view the other types of publications on this topic, follow the link: Acoustic speech features.

Journal articles on the topic "Acoustic speech features"

Cite a source in APA, MLA, Chicago, Harvard, and other citation styles

Familiarize yourself with the top 50 journal articles for research on the topic "Acoustic speech features".

Next to every entry in the bibliography there is an "Add to bibliography" option. Use it, and the bibliographic reference for the selected work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).

You can also download the full text of the scholarly publication in PDF format and read its online abstract, provided the relevant parameters are available in the metadata.

Browse journal articles across a wide range of disciplines and compile your bibliography correctly.

1

Masih, Dawa A. A., Nawzad K. Jalal, Manar N. A. Mohammed und Sulaiman A. Mustafa. „The Assessment of Acoustical Characteristics for Recent Mosque Buildings in Erbil City of Iraq“. ARO-THE SCIENTIFIC JOURNAL OF KOYA UNIVERSITY 9, Nr. 1 (01.03.2021): 51–66. http://dx.doi.org/10.14500/aro.10784.

Abstract:
The study of mosque acoustics, concerning acoustical features, sound quality for speech intelligibility, and additional practical acoustic criteria, is commonly overlooked. Acoustic quality is vital to the fundamental use of mosques, in terms of contributing toward prayers and worshippers’ appreciation. This paper undertakes a comparative analysis of the acoustic quality level and the acoustical characteristics of two modern mosque buildings constructed in Erbil city. This work investigates and examines the acoustical quality and performance of these two mosques and their prayer halls through room simulation using ODEON Room Acoustics Software, to assess the degree of speech intelligibility according to acoustic criteria relative to the spatial requirements and design guidelines. The sound pressure level and other room-acoustic indicators, such as reverberation time (T30), early decay time, and speech transmission index, are tested. The outcomes demonstrate the quality of acoustics in the investigated mosques under semi-occupied and fully-occupied conditions. The results indicate that the sound quality within both mosques is displeasing when the loudspeakers are off.
2

Vyaltseva, Darya. „Acoustic Features of Twins’ Speech“. Vestnik Volgogradskogo gosudarstvennogo universiteta. Serija 2. Jazykoznanije 16, Nr. 3 (15.11.2017): 227–34. http://dx.doi.org/10.15688/jvolsu2.2017.3.24.

3

Sepulveda-Sepulveda, Alexander, und German Castellanos-Domínguez. „Time-Frequency Energy Features for Articulator Position Inference on Stop Consonants“. Ingeniería y Ciencia 8, Nr. 16 (30.11.2012): 37–56. http://dx.doi.org/10.17230/ingciencia.8.16.2.

Abstract:
Acoustic-to-articulatory inversion offers new perspectives and interesting applications in the speech processing field; however, it remains an open issue. This paper presents a method to estimate the distribution of the articulatory information contained in the stop consonants’ acoustics, whose parametrization is achieved by using the wavelet packet transform. The main focus is on measuring the relevant acoustic information, in terms of statistical association, for the inference of the position of critical articulators involved in stop consonant production. The Kendall rank correlation coefficient is used as the relevance measure. The maps of relevant time–frequency features are calculated for the MOCHA-TIMIT database, from which stop consonants are extracted and analysed. The proposed method obtains a set of time–frequency components closely related to articulatory phenomena, which offers a deeper understanding of the relationship between the articulatory and acoustical phenomena. The relevance maps are tested in an acoustic-to-articulatory mapping system based on Gaussian mixture models, where it is shown that they are suitable for improving the performance of such systems on stop consonants. The method could be extended to other manners of articulation, e.g., fricatives, in order to adapt the present method to acoustic-to-articulatory mapping systems over whole speech.
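
As an illustration only (not the authors' code), the sketch below shows the two ingredients the abstract names: a wavelet-packet parametrization of speech frames and Kendall's rank correlation as the relevance measure between a time-frequency energy and an articulator trajectory. The wavelet choice ('db4'), the decomposition depth, and the synthetic data are assumptions.

```python
import numpy as np
import pywt
from scipy.stats import kendalltau

def wavelet_packet_energies(frame, wavelet="db4", level=4):
    """Energy of each terminal wavelet-packet node, ordered by frequency."""
    wp = pywt.WaveletPacket(data=frame, wavelet=wavelet, maxlevel=level)
    return np.array([np.sum(node.data ** 2) for node in wp.get_level(level, order="freq")])

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 512))   # placeholder acoustic frames
articulator = rng.standard_normal(200)     # placeholder articulator positions (e.g., tongue tip)
X = np.array([wavelet_packet_energies(f) for f in frames])

# Relevance map: |tau| between each time-frequency energy and the articulator position.
relevance = []
for j in range(X.shape[1]):
    tau, _ = kendalltau(X[:, j], articulator)
    relevance.append(abs(tau))
relevance = np.array(relevance)
print("most relevant bands:", relevance.argsort()[::-1][:5])
```
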
4

Ishimoto, Yuichi, und Noriko Suzuki. „Acoustic features of speech after glossectomy“. Journal of the Acoustical Society of America 120, Nr. 5 (November 2006): 3350–51. http://dx.doi.org/10.1121/1.4781416.

5

Shuiskaya, Tatiana V., und Svetlana V. Androsova. „ACOUSTIC FEATURES OF CHILD SPEECH SOUNDS: CONSONANTS“. Theoretical and Applied Linguistics 2, Nr. 3 (2016): 123–37. http://dx.doi.org/10.22250/2410-7190_2016_2_3_123_137.

6

Kobayashi, Maori, Yasuhiro Hamada und Masato Akagi. „Acoustic features in speech for emergency perception“. Journal of the Acoustical Society of America 144, Nr. 3 (September 2018): 1835. http://dx.doi.org/10.1121/1.5068086.

7

Roh, Yong-Wan, Dong-Ju Kim, Woo-Seok Lee und Kwang-Seok Hong. „Novel acoustic features for speech emotion recognition“. Science in China Series E: Technological Sciences 52, Nr. 7 (09.06.2009): 1838–48. http://dx.doi.org/10.1007/s11431-009-0204-3.

8

Yamamoto, Katsuhiko, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita und Tomohiro Nakatani. „Analysis of acoustic features for speech intelligibility prediction models analysis of acoustic features for speech intelligibility prediction models“. Journal of the Acoustical Society of America 140, Nr. 4 (Oktober 2016): 3114. http://dx.doi.org/10.1121/1.4969744.

9

Jiang, Wei, Zheng Wang, Jesse S. Jin, Xianfeng Han und Chunguang Li. „Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network“. Sensors 19, Nr. 12 (18.06.2019): 2730. http://dx.doi.org/10.3390/s19122730.

Abstract:
Automatic speech emotion recognition is a challenging task, owing to the gap between acoustic features and human emotions, and it relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture to extract informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that leads to low emotion recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn the discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance, achieving an accuracy of 64% compared to existing state-of-the-art approaches.
10

Zlokarnik, Igor. „Adding articulatory features to acoustic features for automatic speech recognition“. Journal of the Acoustical Society of America 97, Nr. 5 (Mai 1995): 3246. http://dx.doi.org/10.1121/1.411699.

11

Byun, Sung-Woo, und Seok-Pil Lee. „A Study on a Speech Emotion Recognition System with Effective Acoustic Features Using Deep Learning Algorithms“. Applied Sciences 11, Nr. 4 (21.02.2021): 1890. http://dx.doi.org/10.3390/app11041890.

Abstract:
The goal of the human interface is to recognize the user’s emotional state precisely. In the speech emotion recognition study, the most important issue is the effective parallel use of the extraction of proper speech features and an appropriate classification engine. Well defined speech databases are also needed to accurately recognize and analyze emotions from speech signals. In this work, we constructed a Korean emotional speech database for speech emotion analysis and proposed a feature combination that can improve emotion recognition performance using a recurrent neural network model. To investigate the acoustic features, which can reflect distinct momentary changes in emotional expression, we extracted F0, Mel-frequency cepstrum coefficients, spectral features, harmonic features, and others. Statistical analysis was performed to select an optimal combination of acoustic features that affect the emotion from speech. We used a recurrent neural network model to classify emotions from speech. The results show the proposed system has more accurate performance than previous studies.
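
As a minimal sketch of the kind of frame-level feature set listed in the abstract (F0, MFCCs, and a spectral descriptor) before it is fed to a recurrent model, the snippet below uses librosa; the library choice, parameter values, and file name are assumptions, not the authors' toolchain.

```python
import numpy as np
import librosa

def frame_features(path, sr=16000, n_mfcc=13):
    """Stack F0, MFCCs, and spectral centroid into a (frames, dims) matrix."""
    y, sr = librosa.load(path, sr=sr)
    f0, _, _ = librosa.pyin(y, fmin=65.0, fmax=400.0, sr=sr)    # fundamental frequency
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)      # cepstral features
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)    # one spectral descriptor
    n = min(len(f0), mfcc.shape[1], centroid.shape[1])
    return np.vstack([np.nan_to_num(f0[:n]), mfcc[:, :n], centroid[:, :n]]).T

# features = frame_features("utterance.wav")  # hypothetical emotional speech file
```
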
12

Lang, Haitao, und Jie Yang. „Speech Enhancement Based on Fusion of Both Magnitude/Phase-Aware Features and Targets“. Electronics 9, Nr. 7 (10.07.2020): 1125. http://dx.doi.org/10.3390/electronics9071125.

Abstract:
Recently, supervised learning methods have shown promising performance, especially deep neural network-based (DNN) methods, in the application of single-channel speech enhancement. Generally, those approaches extract the acoustic features directly from the noisy speech to train a magnitude-aware target. In this paper, we propose to extract the acoustic features not only from the noisy speech but also from the pre-estimated speech, noise and phase separately, then fuse them into a new complementary feature for the purpose of obtaining more discriminative acoustic representation. In addition, on the basis of learning a magnitude-aware target, we also utilize the fusion feature to learn a phase-aware target, thereby further improving the accuracy of the recovered speech. We conduct extensive experiments, including performance comparison with some typical existing methods, generalization ability evaluation on unseen noise, ablation study, and subjective test by human listener, to demonstrate the feasibility and effectiveness of the proposed method. Experimental results prove that the proposed method has the ability to improve the quality and intelligibility of the reconstructed speech.
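
For orientation, the sketch below shows only the magnitude/phase view of a noisy signal that such magnitude- and phase-aware targets build on, not the paper's fusion network; the file name and STFT parameters are assumptions.

```python
import numpy as np
import librosa
import soundfile as sf

noisy, sr = librosa.load("noisy.wav", sr=16000)        # hypothetical noisy recording
stft = librosa.stft(noisy, n_fft=512, hop_length=128)
magnitude, phase = np.abs(stft), np.angle(stft)

# A learned magnitude-aware target (e.g., a ratio mask) would modify `magnitude` here;
# as a placeholder, it is left unchanged.
enhanced_mag = magnitude

# Recombine with the (possibly separately estimated) phase and resynthesize.
enhanced = librosa.istft(enhanced_mag * np.exp(1j * phase), hop_length=128)
sf.write("enhanced.wav", enhanced, sr)
```
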
13

Yoon, Yang-soo, Yongxin Li und Qian-Jie Fu. „Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation“. Journal of Speech, Language, and Hearing Research 55, Nr. 1 (Februar 2012): 105–24. http://dx.doi.org/10.1044/1092-4388(2011/10-0325).

14

Harding, Philip, und Ben Milner. „Reconstruction-based speech enhancement from robust acoustic features“. Speech Communication 75 (Dezember 2015): 62–75. http://dx.doi.org/10.1016/j.specom.2015.09.011.

15

Morton, John, Mitchell Sommers, Steven Lulich, Abeer Alwan und Harish Arsikere. „Acoustic features mediating height estimation from human speech“. Journal of the Acoustical Society of America 134, Nr. 5 (November 2013): 4072. http://dx.doi.org/10.1121/1.4830873.

16

ONOE, K., S. SATO, S. HOMMA, A. KOBAYASHI, T. IMAI und T. TAKAGI. „Bi-Spectral Acoustic Features for Robust Speech Recognition“. IEICE Transactions on Information and Systems E91-D, Nr. 3 (01.03.2008): 631–34. http://dx.doi.org/10.1093/ietisy/e91-d.3.631.

17

Tursunov, Anvarjon, Soonil Kwon und Hee-Suk Pang. „Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features“. Applied Sciences 9, Nr. 12 (17.06.2019): 2470. http://dx.doi.org/10.3390/app9122470.

Abstract:
The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in valence dimension (positive and negative). The main reason for this is that some of the discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but are different in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even with the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance of discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. Support vector machine (SVM) and long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. The significant classification performance improvements were achieved using a combination of baseline and the most relevant timbre acoustic features, which were found by applying SFS on a classification of emotions for the Berlin Emotional Speech Database. From extensive experiments, it was found that timbre acoustic features could characterize emotions sufficiently in a speech in the valence dimension.
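
A minimal sketch of sequential forward selection over an acoustic feature matrix with an SVM scorer, in the spirit of the SFS step described above; the scikit-learn selector, placeholder data, and parameter values are assumptions, not the authors' setup.

```python
import numpy as np
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.standard_normal((300, 40))   # e.g., baseline + timbre features per utterance
y = rng.integers(0, 2, 300)          # valence labels: 0 = negative, 1 = positive

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
sfs = SequentialFeatureSelector(svm, n_features_to_select=10, direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature indices:", np.flatnonzero(sfs.get_support()))
```
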
18

de Boer, Janna, Alban Voppel, Frank Wijnen und Iris Sommer. „T59. ACOUSTIC SPEECH MARKERS FOR SCHIZOPHRENIA“. Schizophrenia Bulletin 46, Supplement_1 (April 2020): S253—S254. http://dx.doi.org/10.1093/schbul/sbaa029.619.

Abstract:
Abstract Background Clinicians routinely use impressions of speech as an element of mental status examination, including ‘pressured’ speech in mania and ‘monotone’ or ‘soft’ speech in depression or psychosis. In psychosis in particular, descriptions of speech are used to monitor (negative) symptom severity. Recent advances in computational linguistics have paved the way towards automated speech analyses as a biomarker for psychosis. In the present study, we assessed the diagnostic value of acoustic speech features in schizophrenia. We hypothesized that a classifier would be highly accurate (~ 80%) in classifying patients and healthy controls. Methods Natural speech samples were obtained from 86 patients with schizophrenia and 77 age and gender matched healthy controls through a semi-structured interview, using a set of neutral open-ended questions. Symptom severity was rated by consensus rating of two trained researchers, blinded to phonetic analysis, with the Positive And Negative Syndrome Scale (PANSS). Acoustic features were extracted with OpenSMILE, employing the Geneva Acoustic Minimalistic Parameter Set (GeMAPS), which comprises standardized analyses of pitch (F0), formants (F1, F2 and F3, i.e. acoustic resonance frequencies that indicate the position and movement of the articulatory muscles during speech production), speech quality, length of voiced and unvoiced regions. Speech features were fed into a linear kernel support vector machine (SVM) with leave-one-out cross-validation to assess their value for psychosis diagnosis. Results Demographic analyses revealed no differences between patients with schizophrenia and healthy controls in age or parental education. An automated machine-learning speech classifier reached an accuracy of 82.8% in classifying patients with schizophrenia and controls on speech features alone. Important features in the model were variation in loudness, spectral slope (i.e. the gradual decay in energy in high frequency speech sounds) and the amount of voiced regions (i.e. segments of the interview where the participant was speaking). PANSS positive, negative and general scores were significantly correlated with pitch, formant frequencies and length of voiced and unvoiced regions. Discussion This study demonstrates that an algorithm using quantified features of speech can objectively differentiate patients with schizophrenia from controls with high accuracy. Further validation in an independent sample is required. Employing standardized parameter sets ensures easy replication and comparison of analyses and can be used for cross linguistic studies. Although at an early stage, the field of clinical computational linguistics introduces a powerful tool for diagnosis and prognosis of psychosis and neuropsychiatric disorders in general. We consider this new diagnostic tool to be of high potential given its ease of acquirement, low costs and patient burden. For example, this tool could easily be implemented as a smartphone app to be used in treatment settings.
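
A hedged sketch of the pipeline the abstract outlines, GeMAPS functionals extracted with openSMILE and fed to a linear-kernel SVM with leave-one-out cross-validation; the file list and labels are placeholders, and the opensmile/scikit-learn packages are an assumed toolchain, not the study's code.

```python
import numpy as np
import opensmile
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import SVC

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.GeMAPSv01b,         # GeMAPS functionals
    feature_level=opensmile.FeatureLevel.Functionals,
)

wav_files = [f"speaker_{i:02d}.wav" for i in range(40)]  # hypothetical interview recordings
labels = np.array([1] * 20 + [0] * 20)                   # 1 = patient, 0 = control (placeholder)

X = np.vstack([smile.process_file(f).to_numpy() for f in wav_files])
scores = cross_val_score(SVC(kernel="linear"), X, labels, cv=LeaveOneOut())
print("leave-one-out accuracy:", scores.mean())
```
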
19

Suire, Alexandre, Arnaud Tognetti, Valérie Durand, Michel Raymond und Melissa Barkat-Defradas. „Speech Acoustic Features: A Comparison of Gay Men, Heterosexual Men, and Heterosexual Women“. Archives of Sexual Behavior 49, Nr. 7 (31.03.2020): 2575–83. http://dx.doi.org/10.1007/s10508-020-01665-3.

Abstract:
Abstract Potential differences between homosexual and heterosexual men have been studied on a diverse set of social and biological traits. Regarding acoustic features of speech, researchers have hypothesized a feminization of such characteristics in homosexual men, but previous investigations have so far produced mixed results. Moreover, most studies have been conducted with English-speaking populations, which calls for further cross-linguistic examinations. Lastly, no studies investigated so far the potential role of testosterone in the association between sexual orientation and speech acoustic features. To fill these gaps, we explored potential differences in acoustic features of speech between homosexual and heterosexual native French men and investigated whether the former showed a trend toward feminization by comparing theirs to that of heterosexual native French women. Lastly, we examined whether testosterone levels mediated the association between speech acoustic features and sexual orientation. We studied four sexually dimorphic acoustic features relevant for the qualification of feminine versus masculine voices: the fundamental frequency, its modulation, and two understudied acoustic features of speech, the harmonics-to-noise ratio (a proxy of vocal breathiness) and the jitter (a proxy of vocal roughness). Results showed that homosexual men displayed significantly higher pitch modulation patterns and less breathy voices compared to heterosexual men, with values shifted toward those of heterosexual women. Lastly, testosterone levels did not influence any of the investigated acoustic features. Combined with the literature conducted in other languages, our findings bring new support for the feminization hypothesis and suggest that the feminization of some acoustic features could be shared across languages.
20

Niwano, Katsuko, und Kuniaki Sugai. „Acoustic Determinants Eliciting Japanese Infants' Vocal Response to Maternal Speech“. Psychological Reports 90, Nr. 1 (Februar 2002): 83–90. http://dx.doi.org/10.2466/pr0.2002.90.1.83.

Abstract:
Generally, infants prefer infant-directed speech to adult-directed speech. This study investigated which acoustic features of maternal infant-directed speech elicit effectively 3-mo.-old infants' vocal response. The participants were 40 Japanese mother and infant dyads. Vocal f0 from the mother's speech and the infant's vocalization was extracted using Computerized Speech Laboratory (CSL4300) and custom software. The acoustical features measured were mean fundamental frequency (f0), and f0 contour. The rate of the infant's vocal response was significantly higher when the maternal infant-directed speech was terminated with a falling contour rather than a rising or flat contour. There was no significant difference between the mean f0 of the maternal infant-directed speech followed or not followed by the infant's vocal response. This suggests that the falling contour of terminal maternal infant-directed speech serves to elicit the 3-mo.-old infant's vocal response.
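
A hedged sketch of labelling the terminal f0 contour of an utterance as falling, rising, or flat, in the spirit of the contour measure described above. The study used CSL and custom software; the pYIN tracker, the 300 ms terminal window, and the one-semitone threshold below are assumptions.

```python
import numpy as np
import librosa

def terminal_contour(wav_path, window_s=0.3, semitone_thresh=1.0):
    y, sr = librosa.load(wav_path, sr=16000)
    f0, _, _ = librosa.pyin(y, fmin=75.0, fmax=600.0, sr=sr)
    f0 = f0[~np.isnan(f0)]                       # keep voiced frames only
    if f0.size < 2:
        return "unvoiced"
    hop_s = 512 / sr                             # default pyin hop, in seconds
    tail = f0[-max(2, int(window_s / hop_s)):]   # terminal portion of the contour
    slope_st = 12 * np.log2(tail[-1] / tail[0])  # pitch change over the window, in semitones
    if slope_st <= -semitone_thresh:
        return "falling"
    if slope_st >= semitone_thresh:
        return "rising"
    return "flat"

# terminal_contour("maternal_utterance.wav")  # hypothetical recording
```
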
21

Al Mahmud, Nahyan, und Shahfida Amjad Munni. „Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition“. International journal of Multimedia & Its Applications 12, Nr. 5 (30.10.2020): 1–8. http://dx.doi.org/10.5121/ijma.2020.12501.

Abstract:
The performance of various acoustic feature extraction methods has been compared in this work using Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represents the speech signals. They can be classified in either words or sub word units such as phonemes. In this work, at first linear predictive coding (LPC) is used as acoustic vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector extraction techniques like Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP) have also been used. These two methods closely resemble the human auditory system. These feature vectors are then trained using the LSTM neural network. Then the obtained models of different phonemes are compared with different statistical tools namely Bhattacharyya Distance and Mahalanobis Distance to investigate the nature of those acoustic features.
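
For the two model-comparison measures named above, a minimal sketch is given below, computed on Gaussian summaries (mean and covariance) of per-phoneme feature vectors; the Gaussian form of the Bhattacharyya distance and the synthetic data are assumptions, not the authors' implementation.

```python
import numpy as np

def mahalanobis(x, mean, cov):
    d = x - mean
    return float(np.sqrt(d @ np.linalg.inv(cov) @ d))

def bhattacharyya_gaussian(m1, c1, m2, c2):
    c = (c1 + c2) / 2.0
    d = m1 - m2
    term1 = 0.125 * d @ np.linalg.inv(c) @ d
    term2 = 0.5 * np.log(np.linalg.det(c) / np.sqrt(np.linalg.det(c1) * np.linalg.det(c2)))
    return float(term1 + term2)

# e.g., MFCC (or PLP) vectors pooled for two phoneme models:
rng = np.random.default_rng(2)
phone_a = rng.standard_normal((120, 13))
phone_b = 0.5 + rng.standard_normal((150, 13))
print("Bhattacharyya:", bhattacharyya_gaussian(phone_a.mean(0), np.cov(phone_a.T),
                                               phone_b.mean(0), np.cov(phone_b.T)))
print("Mahalanobis:  ", mahalanobis(phone_a[0], phone_b.mean(0), np.cov(phone_b.T)))
```
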
22

C, Gunasekar, Sabrigirinathan C, Vinayagavel K und Ramkumar K. „The acoustic parameters for analysing speech with complete dentures“. International Journal of Dental Research 5, Nr. 2 (06.07.2017): 115. http://dx.doi.org/10.14419/ijdr.v5i2.7789.

Abstract:
Difficulty with speech in patients who wear dentures is a major concern. Various studies have aimed to improve speech through different techniques, suggestions, and conclusions, but the better test is to analyse speech with the complete dentures in place. Here we give a brief overview of speech production and perception, the relevant acoustic features, and the parameters that are useful for analysing these acoustic features in relation to speech quality in denture wearers.
23

Amano, Akio. „Speech recognition apparatus capable of discriminating between similar acoustic features of speech“. Journal of the Acoustical Society of America 94, Nr. 1 (Juli 1993): 613. http://dx.doi.org/10.1121/1.408210.

24

Zhang, Zhan, Yuehai Wang und Jianyi Yang. „Accent Recognition with Hybrid Phonetic Features“. Sensors 21, Nr. 18 (18.09.2021): 6258. http://dx.doi.org/10.3390/s21186258.

Abstract:
The performance of voice-controlled systems is usually influenced by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. As accent is a high-level abstract feature that has a profound relationship with language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results demonstrate that our approach can obtain an 8.02% relative improvement compared with the Transformer baseline, showing the merits of the proposed method.
25

Lee, Moa, und Joon-Hyuk Chang. „Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots“. Applied Sciences 10, Nr. 13 (02.07.2020): 4602. http://dx.doi.org/10.3390/app10134602.

Abstract:
Speech recognition for intelligent robots seems to suffer from performance degradation due to ego-noise. The ego-noise is caused by the motors, fans, and mechanical parts inside the intelligent robots especially when the robot moves or shakes its body. To overcome the problems caused by the ego-noise, we propose a robust speech recognition algorithm that uses motor-state information of the robot as an auxiliary feature. For this, we use two deep neural networks (DNN) in this paper. Firstly, we design the latent features using a bottleneck layer, one of the internal layers having a smaller number of hidden units relative to the other layers, to represent whether the motor is operating or not. The latent features maximizing the representation of the motor-state information are generated by taking the motor data and acoustic features as the input of the first DNN. Secondly, once the motor-state dependent latent features are designed at the first DNN, the second DNN, accounting for acoustic modeling, receives the latent features as the input along with the acoustic features. We evaluated the proposed system on LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rate (WER) are superior to that of a conventional speech recognition system.
26

Swarna Kuchibhotla, Dr, und Mr Niranjan M.S.R. „Emotional Classification of Acoustic Information With Optimal Feature Subset Selection Methods“. International Journal of Engineering & Technology 7, Nr. 2.32 (31.05.2018): 39. http://dx.doi.org/10.14419/ijet.v7i2.32.13521.

Abstract:
This paper focuses on the classification of various acoustic emotional corpora with frequency-domain features using feature subset selection methods. The emotional speech samples are classified into neutral, happy, fear, anger, disgust, and sad states using statistical properties of spectral features estimated from Berlin and Spanish emotional utterances. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) feature subset selection algorithms are used to extract the more informative features. In both the Berlin and Spanish corpora, the number of emotional speech samples available for training is smaller than the number of features extracted from each sample, a situation known as the curse of dimensionality. Because of this high-dimensional feature vector, the efficiency of the classifier decreases and the computation time increases. To further improve the classifier, an optimal subset of features is needed and is obtained with feature subset selection methods; this enhances the performance of the system with high efficiency and lower computation time. The classifier used in this work is the standard K Nearest Neighbour (KNN) classifier. Experimental evaluation showed that the performance of the classifier is enhanced with SFFS because it eliminates the nesting effect suffered by SFS. The results also showed that an optimal feature subset is a better choice for classification than the full feature set.
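
A hedged sketch of floating forward selection (SFFS) wrapped around a KNN classifier, as described above; the mlxtend selector (floating=True) is a stand-in implementation and the data are placeholders, not the authors' corpora.

```python
import numpy as np
from mlxtend.feature_selection import SequentialFeatureSelector as SFS
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(3)
X = rng.standard_normal((200, 60))   # spectral-feature statistics per utterance
y = rng.integers(0, 6, 200)          # six emotion classes

sffs = SFS(KNeighborsClassifier(n_neighbors=5),
           k_features=15, forward=True, floating=True,   # floating=True -> SFFS
           scoring="accuracy", cv=5)
sffs = sffs.fit(X, y)
print("selected feature indices:", sffs.k_feature_idx_)
```
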
27

Missaoui, Ibrahim, und Zied Lachiri. „An Extraction Method of Acoustic Features for Speech Recognition“. Research Journal of Applied Sciences, Engineering and Technology 12, Nr. 9 (05.05.2016): 964–67. http://dx.doi.org/10.19026/rjaset.12.2814.

28

Shahnawazuddin, Syed, Rohit Sinha und Gayadhar Pradhan. „Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition“. IEEE Signal Processing Letters 24, Nr. 8 (August 2017): 1128–32. http://dx.doi.org/10.1109/lsp.2017.2705085.

29

Kubo, Rieko, und Masato Akagi. „Acoustic features of intelligible speech produced under reverberant environments“. Journal of the Acoustical Society of America 144, Nr. 3 (September 2018): 1802. http://dx.doi.org/10.1121/1.5067954.

30

Rosenhouse, Judith K. „Assessing acoustic features in the speech of asylum seekers“. Journal of the Acoustical Society of America 133, Nr. 5 (Mai 2013): 3244. http://dx.doi.org/10.1121/1.4805198.

31

Nasir, Md, Brian Robert Baucom, Panayiotis Georgiou und Shrikanth Narayanan. „Predicting couple therapy outcomes based on speech acoustic features“. PLOS ONE 12, Nr. 9 (21.09.2017): e0185123. http://dx.doi.org/10.1371/journal.pone.0185123.

32

Romigh, Griffin, Clayton Rothwell, Brandon Greenwell und Meagan Newman. „Modeling uncertainty in spontaneous speech: Lexical and acoustic features“. Journal of the Acoustical Society of America 140, Nr. 4 (Oktober 2016): 3401. http://dx.doi.org/10.1121/1.4970912.

33

Zvarevashe, Kudakwashe, und Oludayo Olugbara. „Ensemble Learning of Hybrid Acoustic Features for Speech Emotion Recognition“. Algorithms 13, Nr. 3 (22.03.2020): 70. http://dx.doi.org/10.3390/a13030070.

Abstract:
Automatic recognition of emotion is important for facilitating seamless interactivity between a human being and intelligent robot towards the full realization of a smart society. The methods of signal processing and machine learning are widely applied to recognize human emotions based on features extracted from facial images, video files or speech signals. However, these features were not able to recognize the fear emotion with the same level of precision as other emotions. The authors propose the agglutination of prosodic and spectral features from a group of carefully selected features to realize hybrid acoustic features for improving the task of emotion recognition. Experiments were performed to test the effectiveness of the proposed features extracted from speech files of two public databases and used to train five popular ensemble learning algorithms. Results show that random decision forest ensemble learning of the proposed hybrid acoustic features is highly effective for speech emotion recognition.
34

Buckley, Daniel P., Manuel Diaz Cadiz, Tanya L. Eadie und Cara E. Stepp. „Acoustic Model of Perceived Overall Severity of Dysphonia in Adductor-Type Laryngeal Dystonia“. Journal of Speech, Language, and Hearing Research 63, Nr. 8 (10.08.2020): 2713–22. http://dx.doi.org/10.1044/2020_jslhr-19-00354.

Abstract:
Purpose This study is a secondary analysis of existing data. The goal of the study was to construct an acoustic model of perceived overall severity of dysphonia in adductory laryngeal dystonia (AdLD). We predicted that acoustic measures (a) related to voice and pitch breaks and (b) related to vocal effort would form the primary elements of a model corresponding to auditory-perceptual ratings of overall severity of dysphonia. Method Twenty inexperienced listeners evaluated the overall severity of dysphonia of speech stimuli from 19 individuals with AdLD. Acoustic features related to primary signs of AdLD (hyperadduction resulting in pitch and voice breaks) and to a potential secondary symptom of AdLD (vocal effort, measures of relative fundamental frequency) were computed from the speech stimuli. Multiple linear regression analysis was applied to construct an acoustic model of the overall severity of dysphonia. Results The acoustic model included an acoustic feature related to pitch and voice breaks and three acoustic measures derived from relative fundamental frequency; it explained 84.9% of the variance in the auditory-perceptual ratings of overall severity of dysphonia in the speech samples. Conclusions Auditory-perceptual ratings of overall severity of dysphonia in AdLD were related to acoustic features of primary signs (pitch and voice breaks, hyperadduction associated with laryngeal spasms) and were also related to acoustic features of vocal effort. This suggests that compensatory vocal effort may be a secondary symptom in AdLD. Future work to generalize this acoustic model to a larger, independent data set is necessary before clinical translation is warranted.
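
A minimal sketch of the modelling step described above, multiple linear regression of perceived severity on acoustic predictors with the variance explained reported as R²; the predictor layout and synthetic data are placeholders, not the study's measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(4)
n = 19                                  # one row per speaker stimulus
acoustic = rng.standard_normal((n, 4))  # e.g., a voice-break measure plus three RFF measures
severity = acoustic @ np.array([2.0, -1.0, 0.5, 1.5]) + rng.normal(0, 0.5, n)

model = LinearRegression().fit(acoustic, severity)
print("variance explained (R^2):", round(model.score(acoustic, severity), 3))
```
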
35

Duffy, Joseph R., Edythe A. Strand, Heather Clark, Mary Machulda, Jennifer L. Whitwell und Keith A. Josephs. „Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates“. American Journal of Speech-Language Pathology 24, Nr. 2 (Mai 2015): 88–100. http://dx.doi.org/10.1044/2015_ajslp-14-0174.

Abstract:
Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time.
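
Among the temporal measures listed above is a pairwise variability index; the sketch below implements the commonly used normalized variant (nPVI) over successive segment durations. Whether this matches the study's exact formulation is an assumption.

```python
import numpy as np

def npvi(durations):
    """Normalized pairwise variability index over successive durations (in %)."""
    d = np.asarray(durations, dtype=float)
    pairs = np.abs(d[1:] - d[:-1]) / ((d[1:] + d[:-1]) / 2.0)
    return 100.0 * pairs.mean()

# e.g., syllable durations (seconds) from a speech-like alternating-motion task:
print(round(npvi([0.21, 0.19, 0.34, 0.18, 0.25]), 1))
```
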
36

Oh, Yoo Rhee, Kiyoung Park und Jeon Gyu Park. „Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)- Based Voice-Activity Detector“. Applied Sciences 10, Nr. 12 (14.06.2020): 4091. http://dx.doi.org/10.3390/app10124091.

Abstract:
This paper aims to design an online, low-latency, and high-performance speech recognition system using a bidirectional long short-term memory (BLSTM) acoustic model. To achieve this, we adopt a server-client model and a context-sensitive-chunk-based approach. The speech recognition server manages a main thread and a decoder thread for each client and one worker thread. The main thread communicates with the connected client, extracts speech features, and buffers the features. The decoder thread performs speech recognition, including the proposed multichannel parallel acoustic score computation of a BLSTM acoustic model, the proposed deep neural network-based voice activity detector, and Viterbi decoding. The proposed acoustic score computation method estimates the acoustic scores of a context-sensitive-chunk BLSTM acoustic model for the batched speech features from concurrent clients, using the worker thread. The proposed deep neural network-based voice activity detector detects short pauses in the utterance to reduce response latency, while the user utters long sentences. From the experiments of Korean speech recognition, the number of concurrent clients is increased from 22 to 44 using the proposed acoustic score computation. When combined with the frame skipping method, the number is further increased up to 59 clients with a small accuracy degradation. Moreover, the average user-perceived latency is reduced from 11.71 s to 3.09–5.41 s by using the proposed deep neural network-based voice activity detector.
37

Bae, Youkyung, David P. Kuehn, Charles A. Conway und Bradley P. Sutton. „Real-Time Magnetic Resonance Imaging of Velopharyngeal Activities with Simultaneous Speech Recordings“. Cleft Palate-Craniofacial Journal 48, Nr. 6 (November 2011): 695–707. http://dx.doi.org/10.1597/09-158.

Abstract:
Objective To examine the relationships between acoustic and physiologic aspects of the velopharyngeal mechanism during acoustically nasalized segments of speech in normal individuals by combining fast magnetic resonance imaging (MRI) with simultaneous speech recordings and subsequent acoustic analyses. Design Ten normal Caucasian adult individuals participated in the study. Midsagittal dynamic magnetic resonance imaging (MRI) and simultaneous speech recordings were performed while participants were producing repetitions of two rate-controlled nonsense syllables including /zanaza/ and /zunuzu/. Acoustic features of nasalization represented as the peak amplitude and the bandwidth of the first resonant frequency (F1) were derived from speech at the rate of 30 sets per second. Physiologic information was based on velar and tongue positional changes measured from the dynamic MRI data, which were acquired at a rate of 21.4 images per second and resampled with a corresponding rate of 30 images per second. Each acoustic feature of nasalization was regressed on gender, vowel context, and velar and tongue positional variables. Results Acoustic features of nasalization represented by F1 peak amplitude and bandwidth changes were significantly influenced by the vowel context surrounding the nasal consonant, velar elevated position, and tongue height at the tip. Conclusions Fast MRI combined with acoustic analysis was successfully applied to the investigation of acoustic-physiologic relationships of the velopharyngeal mechanism with the type of speech samples employed in the present study. Future applications are feasible to examine how anatomic and physiologic deviations of the velopharyngeal mechanism would be acoustically manifested in individuals with velopharyngeal incompetence.
38

Xiong, Feifei, Stefan Goetze, Birger Kollmeier und Bernd T. Meyer. „Exploring Auditory-Inspired Acoustic Features for Room Acoustic Parameter Estimation From Monaural Speech“. IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, Nr. 10 (Oktober 2018): 1809–20. http://dx.doi.org/10.1109/taslp.2018.2843537.

39

Kuchibhotla, Swarna, Hima Deepthi Vankayalapati und Koteswara Rao Anne. „An optimal two stage feature selection for speech emotion recognition using acoustic features“. International Journal of Speech Technology 19, Nr. 4 (02.08.2016): 657–67. http://dx.doi.org/10.1007/s10772-016-9358-0.

40

Ren, Guofeng, Guicheng Shao und Jianmei Fu. „Articulatory-to-Acoustic Conversion Using BiLSTM-CNN Word-Attention-Based Method“. Complexity 2020 (26.09.2020): 1–10. http://dx.doi.org/10.1155/2020/4356981.

Abstract:
In recent years, along with the development of artificial intelligence (AI) and man-machine interaction technology, speech recognition and production have had to adapt to this rapid development, which requires improving recognition accuracy by adding novel features, fusing features, and improving recognition methods. Aiming at developing a novel recognition feature and its application to speech recognition, this paper presents a new method for articulatory-to-acoustic conversion. In the study, we converted articulatory features (i.e., tongue velocities and lip motion) into acoustic features (i.e., the second formant and mel-cepstra). By considering the graphical representation of the articulators’ motion, the study combined Bidirectional Long Short-Term Memory (BiLSTM) with a convolutional neural network (CNN) and adopted the idea of word attention in Mandarin to extract semantic features. We used the electromagnetic articulography (EMA) database designed by Taiyuan University of Technology, which contains 299 disyllables and sentences of Mandarin from ten speakers, and extracted 8-dimensional articulatory features and a 1-dimensional semantic feature relying on the word-attention layer; we then trained on 200 samples and tested on 99 samples for the articulatory-to-acoustic conversion. Finally, Root Mean Square Error (RMSE), Mean Mel-Cepstral Distortion (MMCD), and the correlation coefficient were used to evaluate the conversion and to compare it with a Gaussian Mixture Model (GMM) and a BiLSTM recurrent neural network (BiLSTM-RNN). The results showed that the MMCD of the Mel-Frequency Cepstral Coefficients (MFCC) was 1.467 dB and the RMSE of F2 was 22.10 Hz. These results can be used in feature fusion and speech recognition to improve recognition accuracy.
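
For the two conversion metrics reported above, a hedged sketch is given below: RMSE for a predicted formant track and mean mel-cepstral distortion (MMCD) between converted and reference cepstra. The standard MCD constant (10*sqrt(2)/ln(10)) and the synthetic tracks are assumptions; the authors' exact definitions may differ.

```python
import numpy as np

def rmse(pred, ref):
    pred, ref = np.asarray(pred, float), np.asarray(ref, float)
    return float(np.sqrt(np.mean((pred - ref) ** 2)))

def mean_mcd(pred_mcep, ref_mcep):
    """pred_mcep, ref_mcep: (frames, order) mel-cepstra, 0th coefficient excluded."""
    diff = np.asarray(pred_mcep, float) - np.asarray(ref_mcep, float)
    per_frame = (10.0 / np.log(10.0)) * np.sqrt(2.0 * np.sum(diff ** 2, axis=1))
    return float(per_frame.mean())

rng = np.random.default_rng(5)
f2_ref = 1500 + 200 * rng.standard_normal(99)   # reference F2 track (Hz)
f2_pred = f2_ref + rng.normal(0, 22, 99)        # converted F2 track (Hz)
print("F2 RMSE (Hz):", round(rmse(f2_pred, f2_ref), 1))
```
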
41

Parjane, Natalia, Sunghye Cho, Sharon Ash, Katheryn A. Q. Cousins, Sanjana Shellikeri, Mark Liberman, Leslie M. Shaw, David J. Irwin, Murray Grossman und Naomi Nevler. „Digital Speech Analysis in Progressive Supranuclear Palsy and Corticobasal Syndromes“. Journal of Alzheimer's Disease 82, Nr. 1 (29.06.2021): 33–45. http://dx.doi.org/10.3233/jad-201132.

Abstract:
Background: Progressive supranuclear palsy syndrome (PSPS) and corticobasal syndrome (CBS) as well as non-fluent/agrammatic primary progressive aphasia (naPPA) are often associated with misfolded 4-repeat tau pathology, but the diversity of the associated speech features is poorly understood. Objective: Investigate the full range of acoustic and lexical properties of speech to test the hypothesis that PSPS-CBS show a subset of speech impairments found in naPPA. Methods: Acoustic and lexical measures, extracted from natural, digitized semi-structured speech samples using novel, automated methods, were compared in PSPS-CBS (n = 87), naPPA (n = 25), and healthy controls (HC, n = 41). We related these measures to grammatical performance and speech fluency, core features of naPPA, to neuropsychological measures of naming, executive, memory and visuoconstructional functioning, and to cerebrospinal fluid (CSF) phosphorylated tau (pTau) levels in patients with available biofluid analytes. Results: Both naPPA and PSPS-CBS speech produced shorter speech segments, longer pauses, higher pause rates, reduced fundamental frequency (f0) pitch ranges, and slower speech rate compared to HC. naPPA speech was distinct from PSPS-CBS with shorter speech segments, more frequent pauses, slower speech rate, reduced verb production, and higher partial word production. In both groups, acoustic duration measures generally correlated with speech fluency, measured as words per minute, and grammatical performance. Speech measures did not correlate with standard neuropsychological measures. CSF pTau levels correlated with f0 range in PSPS-CBS and naPPA. Conclusion: Lexical and acoustic speech features of PSPS-CBS overlaps those of naPPA and are related to CSF pTau levels.
42

Brodbeck, Christian, Alex Jiao, L. Elliot Hong und Jonathan Z. Simon. „Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers“. PLOS Biology 18, Nr. 10 (22.10.2020): e3000883. http://dx.doi.org/10.1371/journal.pbio.3000883.

Abstract:
Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectrotemporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectrotemporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a 2-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectrotemporally overlapping regions. Early (approximately 70-millisecond) responses to nonoverlapping spectrotemporal features are seen for both talkers. When competing talkers’ spectrotemporal features mask each other, the individual representations persist, but they occur with an approximately 20-millisecond delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech.
43

Cherif, Youssouf Ismail, und Abdelhakim Dahimene. „IMPROVED VOICE-BASED BIOMETRICS USING MULTI-CHANNEL TRANSFER LEARNING“. IADIS INTERNATIONAL JOURNAL ON COMPUTER SCIENCE AND INFORMATION SYSTEMS 15, Nr. 1 (07.10.2020): 99–113. http://dx.doi.org/10.33965/ijcsis_2020150108.

Abstract:
Identifying the speaker has become imperative in the modern age, especially since most personal and professional appliances rely on voice commands or speech in general to operate. To be both smart and safe, these systems need to discern the identity of the speaker rather than just the words that have been said, especially considering the numerous advanced methods that have been developed to generate fake speech segments. The objective of this paper is to improve upon existing voice-based biometrics to keep up with these synthesizers. The proposed method focuses on defining novel, more speaker-adapted features by applying artificial neural networks and transfer learning. The approach uses pre-trained networks to define a mapping from two complementary acoustic features to speaker-adapted phonetic features. The complementary acoustic features are paired to provide information about both how the speech segments are perceived (type 1 feature) and how they are produced (type 2 feature). The approach was evaluated using both a small and a large closed-speaker data set. Primary results are encouraging and confirm the usefulness of such an approach for extracting speaker-adapted features, whether for classical machine learning algorithms or advanced neural structures such as LSTMs or CNNs.
44

Dehaene-Lambertz, G. „Cerebral Specialization for Speech and Non-Speech Stimuli in Infants“. Journal of Cognitive Neuroscience 12, Nr. 3 (Mai 2000): 449–60. http://dx.doi.org/10.1162/089892900562264.

Abstract:
Early cerebral specialization and lateralization for auditory processing in 4-month-old infants was studied by recording high-density evoked potentials to acoustical and phonetic changes in a series of repeated stimuli (either tones or syllables). Mismatch responses to these stimuli exhibit a distinct topography suggesting that different neural networks within the temporal lobe are involved in the perception and representation of the different features of an auditory stimulus. These data confirm that specialized modules are present within the auditory cortex very early in development. However, both for syllables and continuous tones, higher voltages were recorded over the left hemisphere than over the right with no significant interaction of hemisphere by type of stimuli. This suggests that there is no greater left hemisphere involvement in phonetic processing than in acoustic processing during the first months of life.
45

Ding, Nai, und Jonathan Z. Simon. „Neural coding of continuous speech in auditory cortex during monaural and dichotic listening“. Journal of Neurophysiology 107, Nr. 1 (Januar 2012): 78–89. http://dx.doi.org/10.1152/jn.00297.2011.

Abstract:
The cortical representation of the acoustic features of continuous speech is the foundation of speech perception. In this study, noninvasive magnetoencephalography (MEG) recordings are obtained from human subjects actively listening to spoken narratives, in both simple and cocktail party-like auditory scenes. By modeling how acoustic features of speech are encoded in ongoing MEG activity as a spectrotemporal response function, we demonstrate that the slow temporal modulations of speech in a broad spectral region are represented bilaterally in auditory cortex by a phase-locked temporal code. For speech presented monaurally to either ear, this phase-locked response is always more faithful in the right hemisphere, but with a shorter latency in the hemisphere contralateral to the stimulated ear. When different spoken narratives are presented to each ear simultaneously (dichotic listening), the resulting cortical neural activity precisely encodes the acoustic features of both of the spoken narratives, but slightly weakened and delayed compared with the monaural response. Critically, the early sensory response to the attended speech is considerably stronger than that to the unattended speech, demonstrating top-down attentional gain control. This attentional gain is substantial even during the subjects' very first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.
46

Zahorian, Stephen A., Hongbing Hu und Jiang Wu. „Time/frequency resolution of acoustic features for automatic speech recognition.“ Journal of the Acoustical Society of America 128, Nr. 4 (Oktober 2010): 2324. http://dx.doi.org/10.1121/1.3508203.

47

Kusumoto, Akiko, und Nancy Vaughan. „Comparison of acoustic features of time‐compressed and natural speech“. Journal of the Acoustical Society of America 116, Nr. 4 (Oktober 2004): 2600–2601. http://dx.doi.org/10.1121/1.4785377.

48

Mikumo, Mariko. „Relationship between the acoustic features and impression evaluation in speech“. Proceedings of the Annual Convention of the Japanese Psychological Association 79 (22.09.2015): 3AM—072–3AM—072. http://dx.doi.org/10.4992/pacjpa.79.0_3am-072.

49

Valente, Fabio, Mathew Magimai Doss, Christian Plahl, Suman Ravuri und Wen Wang. „Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features“. IEEE Transactions on Audio, Speech, and Language Processing 19, Nr. 8 (November 2011): 2439–50. http://dx.doi.org/10.1109/tasl.2011.2139206.

50

Chan, C. P., P. C. Ching und Tan Lee. „Noisy speech recognition using de-noised multiresolution analysis acoustic features“. Journal of the Acoustical Society of America 110, Nr. 5 (November 2001): 2567–74. http://dx.doi.org/10.1121/1.1398054.
