To see the other types of publications on this topic, follow the link: Acoustic speech features.

Journal articles on the topic 'Acoustic speech features'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 journal articles for your research on the topic 'Acoustic speech features.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse journal articles across a wide variety of disciplines and organise your bibliography correctly.

1

Masih, Dawa A. A., Nawzad K. Jalal, Manar N. A. Mohammed, and Sulaiman A. Mustafa. "The Assessment of Acoustical Characteristics for Recent Mosque Buildings in Erbil City of Iraq." ARO-The Scientific Journal of Koya University 9, no. 1 (March 1, 2021): 51–66. http://dx.doi.org/10.14500/aro.10784.

Full text
Abstract:
The study of mosque acoustics, concerning acoustical features, sound quality for speech intelligibility, and additional practical acoustic criteria, is commonly overlooked. Acoustic quality is vital to the fundamental use of mosques, in terms of contributing toward prayers and worshippers’ appreciation. This paper undertakes a comparative analysis of the acoustic quality level and the acoustical characteristics of two modern mosque buildings constructed in Erbil city. It investigates and examines the acoustical quality and performance of these two mosques and their prayer halls through room simulation using ODEON Room Acoustics Software, to assess the degree of speech intelligibility according to acoustic criteria relative to the spatial requirements and design guidelines. The sound pressure level and other room-acoustic indicators, such as reverberation time (T30), early decay time, and speech transmission index, are tested. The outcomes demonstrate the quality of acoustics in the investigated mosques under semi-occupied and fully occupied conditions. The results indicate that the sound quality within both mosques is unsatisfactory when the loudspeakers are off.
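The reverberation measures named in this abstract (T30 and early decay time) are both derived from the backward-integrated energy decay of a room impulse response. The sketch below is not the paper's ODEON workflow; it is a minimal Python illustration of that standard computation, run on a synthetic exponentially decaying impulse response as a stand-in, with fit ranges of -5 to -35 dB for T30 and 0 to -10 dB for EDT assumed here.

```python
import numpy as np

def schroeder_decay_db(ir):
    """Backward-integrated (Schroeder) energy decay curve of an impulse response, in dB."""
    energy = np.cumsum(ir[::-1] ** 2)[::-1]          # reverse cumulative energy
    energy = energy / energy[0]                      # 0 dB at the start of the decay
    return 10.0 * np.log10(np.maximum(energy, 1e-12))

def decay_time(ir, fs, start_db, end_db, extrapolate_db=60.0):
    """Reverberation time from a straight-line fit of the decay between two levels."""
    edc = schroeder_decay_db(ir)
    t = np.arange(len(ir)) / fs
    mask = (edc <= start_db) & (edc >= end_db)
    slope = np.polyfit(t[mask], edc[mask], 1)[0]     # decay rate in dB per second
    return -extrapolate_db / slope

# Synthetic impulse response with a nominal T60 of 1.5 s (a stand-in for a
# measured or simulated prayer-hall response).
fs = 16000
t = np.arange(int(2.5 * fs)) / fs
rng = np.random.default_rng(0)
ir = rng.standard_normal(len(t)) * np.exp(-3.0 * np.log(10) * t / 1.5)

print("T30:", round(decay_time(ir, fs, -5.0, -35.0), 2), "s")   # fit over -5 to -35 dB
print("EDT:", round(decay_time(ir, fs, 0.0, -10.0), 2), "s")    # early decay time
```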
APA, Harvard, Vancouver, ISO, and other styles
2

Vyaltseva, Darya. "Acoustic Features of Twins’ Speech." Vestnik Volgogradskogo gosudarstvennogo universiteta. Serija 2. Jazykoznanije 16, no. 3 (November 15, 2017): 227–34. http://dx.doi.org/10.15688/jvolsu2.2017.3.24.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Sepulveda-Sepulveda, Alexander, and German Castellanos-Domínguez. "Time-Frequency Energy Features for Articulator Position Inference on Stop Consonants." Ingeniería y Ciencia 8, no. 16 (November 30, 2012): 37–56. http://dx.doi.org/10.17230/ingciencia.8.16.2.

Full text
Abstract:
Acoustic-to-articulatory inversion offers new perspectives and interesting applications in the speech processing field; however, it remains an open issue. This paper presents a method to estimate the distribution of the articulatory information contained in the acoustics of stop consonants, whose parametrization is achieved by using the wavelet packet transform. The main focus is on measuring the relevant acoustic information, in terms of statistical association, for the inference of the position of the critical articulators involved in stop consonant production. The Kendall rank correlation coefficient is used as the relevance measure. The maps of relevant time-frequency features are calculated for the MOCHA-TIMIT database, from which stop consonants are extracted and analysed. The proposed method obtains a set of time-frequency components closely related to the articulatory phenomenon, which offers a deeper understanding of the relationship between the articulatory and acoustical phenomena. The relevance maps are tested in an acoustic-to-articulatory mapping system based on Gaussian mixture models, where it is shown that they are suitable for improving the performance of such systems on stop consonants. The method could be extended to other manners of articulation, e.g., fricatives, in order to adapt the present method to acoustic-to-articulatory mapping systems over whole speech.
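As a rough illustration of the relevance measure described above, the sketch below computes the absolute Kendall rank-correlation coefficient between each column of a feature matrix and an articulator trajectory. The data are random stand-ins, not MOCHA-TIMIT, and the feature and articulator names are hypothetical; only the use of Kendall's tau as a relevance score mirrors the abstract.

```python
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)

# Stand-ins: rows are stop-consonant tokens, columns are time-frequency energy
# features (e.g., wavelet-packet band energies); y is one critical articulator's
# position (e.g., tongue-tip height). Real data would come from MOCHA-TIMIT.
n_tokens, n_features = 300, 40
X = rng.normal(size=(n_tokens, n_features))
y = 0.8 * X[:, 5] - 0.5 * X[:, 12] + rng.normal(scale=0.5, size=n_tokens)

# Relevance map: |Kendall tau| between each feature and the articulator trace.
relevance = np.array([abs(kendalltau(X[:, j], y)[0]) for j in range(n_features)])

for j in np.argsort(relevance)[::-1][:5]:
    print(f"feature {j:2d}  |tau| = {relevance[j]:.3f}")
```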
APA, Harvard, Vancouver, ISO, and other styles
4

Ishimoto, Yuichi, and Noriko Suzuki. "Acoustic features of speech after glossectomy." Journal of the Acoustical Society of America 120, no. 5 (November 2006): 3350–51. http://dx.doi.org/10.1121/1.4781416.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Shuiskaya, Tatiana V., and Svetlana V. Androsova. "Acoustic Features of Child Speech Sounds: Consonants." Theoretical and Applied Linguistics 2, no. 3 (2016): 123–37. http://dx.doi.org/10.22250/2410-7190_2016_2_3_123_137.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Kobayashi, Maori, Yasuhiro Hamada, and Masato Akagi. "Acoustic features in speech for emergency perception." Journal of the Acoustical Society of America 144, no. 3 (September 2018): 1835. http://dx.doi.org/10.1121/1.5068086.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Roh, Yong-Wan, Dong-Ju Kim, Woo-Seok Lee, and Kwang-Seok Hong. "Novel acoustic features for speech emotion recognition." Science in China Series E: Technological Sciences 52, no. 7 (June 9, 2009): 1838–48. http://dx.doi.org/10.1007/s11431-009-0204-3.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Yamamoto, Katsuhiko, Toshio Irino, Toshie Matsui, Shoko Araki, Keisuke Kinoshita, and Tomohiro Nakatani. "Analysis of acoustic features for speech intelligibility prediction models." Journal of the Acoustical Society of America 140, no. 4 (October 2016): 3114. http://dx.doi.org/10.1121/1.4969744.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Jiang, Wei, Zheng Wang, Jesse S. Jin, Xianfeng Han, and Chunguang Li. "Speech Emotion Recognition with Heterogeneous Feature Unification of Deep Neural Network." Sensors 19, no. 12 (June 18, 2019): 2730. http://dx.doi.org/10.3390/s19122730.

Full text
Abstract:
Automatic speech emotion recognition is a challenging task due to the gap between acoustic features and human emotions, and it relies strongly on the discriminative acoustic features extracted for a given recognition task. In this work, we propose a novel deep neural architecture to extract informative feature representations from heterogeneous acoustic feature groups, which may contain redundant and unrelated information that leads to low emotion recognition performance. After obtaining the informative features, a fusion network is trained to jointly learn the discriminative acoustic feature representation, and a Support Vector Machine (SVM) is used as the final classifier for the recognition task. Experimental results on the IEMOCAP dataset demonstrate that the proposed architecture improves recognition performance, achieving an accuracy of 64% compared to existing state-of-the-art approaches.
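For the final classification stage described above (an SVM applied to learned feature representations), a minimal scikit-learn sketch might look like the following; the fused representations are simulated at random here, and the dimensionality and class count are assumptions rather than values taken from the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(2)

# Stand-ins for fused utterance-level representations and 4 emotion labels.
n_utt, dim, n_classes = 400, 128, 4
y = rng.integers(0, n_classes, size=n_utt)
X = rng.normal(size=(n_utt, dim)) + 0.3 * y[:, None]   # weakly class-separated

# Standardise, then classify with an RBF-kernel SVM; report cross-validated accuracy.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(clf, X, y, cv=5)
print(f"5-fold accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```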
APA, Harvard, Vancouver, ISO, and other styles
10

Zlokarnik, Igor. "Adding articulatory features to acoustic features for automatic speech recognition." Journal of the Acoustical Society of America 97, no. 5 (May 1995): 3246. http://dx.doi.org/10.1121/1.411699.

Full text
APA, Harvard, Vancouver, ISO, and other styles
11

Byun, Sung-Woo, and Seok-Pil Lee. "A Study on a Speech Emotion Recognition System with Effective Acoustic Features Using Deep Learning Algorithms." Applied Sciences 11, no. 4 (February 21, 2021): 1890. http://dx.doi.org/10.3390/app11041890.

Full text
Abstract:
The goal of the human interface is to recognize the user’s emotional state precisely. In speech emotion recognition research, the most important issue is the effective parallel use of the extraction of proper speech features and an appropriate classification engine. Well-defined speech databases are also needed to accurately recognize and analyze emotions from speech signals. In this work, we constructed a Korean emotional speech database for speech emotion analysis and proposed a feature combination that can improve emotion recognition performance using a recurrent neural network model. To investigate the acoustic features that can reflect distinct momentary changes in emotional expression, we extracted F0, Mel-frequency cepstrum coefficients, spectral features, harmonic features, and others. Statistical analysis was performed to select an optimal combination of acoustic features that affect the emotion in speech. We used a recurrent neural network model to classify emotions from speech. The results show that the proposed system performs more accurately than those of previous studies.
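Frame-level features of the kinds listed above (F0, MFCCs, spectral descriptors) are commonly extracted with librosa; the sketch below is a generic illustration of that step, run on a synthetic tone rather than the Korean emotional speech database, with parameter values chosen arbitrarily.

```python
import numpy as np
import librosa

# Stand-in signal (a 440 Hz tone); a real utterance would be loaded with
# librosa.load(path, sr=16000).
sr = 16000
t = np.arange(sr) / sr
y = (0.1 * np.sin(2 * np.pi * 440.0 * t)).astype(np.float32)

mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)           # cepstral features
centroid = librosa.feature.spectral_centroid(y=y, sr=sr)     # a spectral feature
f0 = librosa.yin(y, fmin=60, fmax=400, sr=sr)                # frame-level F0

# Align frame counts and build a simple utterance-level vector of means and
# standard deviations (one possible input to a recurrent classifier).
n = min(mfcc.shape[1], centroid.shape[1], f0.shape[0])
frames = np.vstack([mfcc[:, :n], centroid[:, :n], f0[np.newaxis, :n]])
utterance_vector = np.concatenate([frames.mean(axis=1), frames.std(axis=1)])
print(utterance_vector.shape)   # 15 frame-level features -> 30 summary values
```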
APA, Harvard, Vancouver, ISO, and other styles
12

Lang, Haitao, and Jie Yang. "Speech Enhancement Based on Fusion of Both Magnitude/Phase-Aware Features and Targets." Electronics 9, no. 7 (July 10, 2020): 1125. http://dx.doi.org/10.3390/electronics9071125.

Full text
Abstract:
Recently, supervised learning methods, especially deep neural network-based (DNN) methods, have shown promising performance in the application of single-channel speech enhancement. Generally, those approaches extract the acoustic features directly from the noisy speech to train a magnitude-aware target. In this paper, we propose to extract the acoustic features not only from the noisy speech but also from the pre-estimated speech, noise, and phase separately, and then fuse them into a new complementary feature for the purpose of obtaining a more discriminative acoustic representation. In addition, on the basis of learning a magnitude-aware target, we also utilize the fusion feature to learn a phase-aware target, thereby further improving the accuracy of the recovered speech. We conduct extensive experiments, including performance comparison with some typical existing methods, generalization ability evaluation on unseen noise, an ablation study, and subjective tests by human listeners, to demonstrate the feasibility and effectiveness of the proposed method. Experimental results prove that the proposed method has the ability to improve the quality and intelligibility of the reconstructed speech.
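To make the magnitude/phase terminology concrete, here is a small sketch (not the paper's DNN pipeline) that decomposes a noisy mixture into STFT magnitude and phase, forms one common magnitude-aware target (an ideal ratio mask), and resynthesises an enhanced signal with the noisy phase; the signals and STFT settings are arbitrary stand-ins.

```python
import numpy as np
from scipy.signal import istft, stft

rng = np.random.default_rng(3)
fs = 16000
clean = 0.1 * np.sin(2 * np.pi * 300 * np.arange(fs) / fs)   # stand-in "speech"
noisy = clean + 0.05 * rng.standard_normal(fs)

# Magnitude/phase decomposition of the noisy mixture.
_, _, Z_noisy = stft(noisy, fs=fs, nperseg=512)
_, _, Z_clean = stft(clean, fs=fs, nperseg=512)
mag, phase = np.abs(Z_noisy), np.angle(Z_noisy)

# A common magnitude-aware training target: the ideal ratio mask (IRM).
noise_mag = np.abs(Z_noisy - Z_clean)
irm = np.abs(Z_clean) / (np.abs(Z_clean) + noise_mag + 1e-8)

# Apply the mask to the noisy magnitude and reuse the noisy phase to reconstruct.
_, enhanced = istft(irm * mag * np.exp(1j * phase), fs=fs, nperseg=512)
print(enhanced.shape, noisy.shape)
```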
APA, Harvard, Vancouver, ISO, and other styles
13

Yoon, Yang-soo, Yongxin Li, and Qian-Jie Fu. "Speech Recognition and Acoustic Features in Combined Electric and Acoustic Stimulation." Journal of Speech, Language, and Hearing Research 55, no. 1 (February 2012): 105–24. http://dx.doi.org/10.1044/1092-4388(2011/10-0325).

Full text
APA, Harvard, Vancouver, ISO, and other styles
14

Harding, Philip, and Ben Milner. "Reconstruction-based speech enhancement from robust acoustic features." Speech Communication 75 (December 2015): 62–75. http://dx.doi.org/10.1016/j.specom.2015.09.011.

Full text
APA, Harvard, Vancouver, ISO, and other styles
15

Morton, John, Mitchell Sommers, Steven Lulich, Abeer Alwan, and Harish Arsikere. "Acoustic features mediating height estimation from human speech." Journal of the Acoustical Society of America 134, no. 5 (November 2013): 4072. http://dx.doi.org/10.1121/1.4830873.

Full text
APA, Harvard, Vancouver, ISO, and other styles
16

Onoe, K., S. Sato, S. Homma, A. Kobayashi, T. Imai, and T. Takagi. "Bi-Spectral Acoustic Features for Robust Speech Recognition." IEICE Transactions on Information and Systems E91-D, no. 3 (March 1, 2008): 631–34. http://dx.doi.org/10.1093/ietisy/e91-d.3.631.

Full text
APA, Harvard, Vancouver, ISO, and other styles
17

Tursunov, Anvarjon, Soonil Kwon, and Hee-Suk Pang. "Discriminating Emotions in the Valence Dimension from Speech Using Timbre Features." Applied Sciences 9, no. 12 (June 17, 2019): 2470. http://dx.doi.org/10.3390/app9122470.

Full text
Abstract:
The most used and well-known acoustic features of a speech signal, the Mel frequency cepstral coefficients (MFCC), cannot characterize emotions in speech sufficiently when a classification is performed to classify both discrete emotions (i.e., anger, happiness, sadness, and neutral) and emotions in the valence dimension (positive and negative). The main reason for this is that some of the discrete emotions, such as anger and happiness, share similar acoustic features in the arousal dimension (high and low) but differ in the valence dimension. Timbre is a sound quality that can discriminate between two sounds even with the same pitch and loudness. In this paper, we analyzed timbre acoustic features to improve the classification performance of discrete emotions as well as emotions in the valence dimension. Sequential forward selection (SFS) was used to find the most relevant acoustic features among the timbre acoustic features. The experiments were carried out on the Berlin Emotional Speech Database and the Interactive Emotional Dyadic Motion Capture Database. A support vector machine (SVM) and a long short-term memory recurrent neural network (LSTM-RNN) were used to classify emotions. Significant classification performance improvements were achieved using a combination of baseline and the most relevant timbre acoustic features, which were found by applying SFS to a classification of emotions for the Berlin Emotional Speech Database. From extensive experiments, it was found that timbre acoustic features could characterize emotions sufficiently in speech in the valence dimension.
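Sequential forward selection of the kind used above can be reproduced with scikit-learn's SequentialFeatureSelector wrapped around an SVM; the sketch below runs on synthetic data instead of the Berlin or IEMOCAP corpora, and the feature count and number of selected features are placeholders.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.svm import SVC

# Stand-in data: rows are utterances, columns are candidate timbre descriptors,
# labels are valence classes (positive/negative).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=0)

# Greedy sequential forward selection (SFS) wrapped around an SVM classifier.
sfs = SequentialFeatureSelector(SVC(kernel="rbf"), n_features_to_select=6,
                                direction="forward", cv=5)
sfs.fit(X, y)
print("selected feature indices:", np.flatnonzero(sfs.get_support()))
```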
APA, Harvard, Vancouver, ISO, and other styles
18

de Boer, Janna, Alban Voppel, Frank Wijnen, and Iris Sommer. "T59. Acoustic Speech Markers for Schizophrenia." Schizophrenia Bulletin 46, Supplement_1 (April 2020): S253–S254. http://dx.doi.org/10.1093/schbul/sbaa029.619.

Full text
Abstract:
Background: Clinicians routinely use impressions of speech as an element of mental status examination, including ‘pressured’ speech in mania and ‘monotone’ or ‘soft’ speech in depression or psychosis. In psychosis in particular, descriptions of speech are used to monitor (negative) symptom severity. Recent advances in computational linguistics have paved the way towards automated speech analyses as a biomarker for psychosis. In the present study, we assessed the diagnostic value of acoustic speech features in schizophrenia. We hypothesized that a classifier would be highly accurate (~80%) in classifying patients and healthy controls. Methods: Natural speech samples were obtained from 86 patients with schizophrenia and 77 age- and gender-matched healthy controls through a semi-structured interview, using a set of neutral open-ended questions. Symptom severity was rated by consensus rating of two trained researchers, blinded to phonetic analysis, with the Positive And Negative Syndrome Scale (PANSS). Acoustic features were extracted with OpenSMILE, employing the Geneva Minimalistic Acoustic Parameter Set (GeMAPS), which comprises standardized analyses of pitch (F0), formants (F1, F2 and F3, i.e. acoustic resonance frequencies that indicate the position and movement of the articulatory muscles during speech production), speech quality, and the length of voiced and unvoiced regions. Speech features were fed into a linear kernel support vector machine (SVM) with leave-one-out cross-validation to assess their value for psychosis diagnosis. Results: Demographic analyses revealed no differences between patients with schizophrenia and healthy controls in age or parental education. An automated machine-learning speech classifier reached an accuracy of 82.8% in classifying patients with schizophrenia and controls on speech features alone. Important features in the model were variation in loudness, spectral slope (i.e. the gradual decay in energy in high-frequency speech sounds) and the amount of voiced regions (i.e. segments of the interview where the participant was speaking). PANSS positive, negative and general scores were significantly correlated with pitch, formant frequencies and the length of voiced and unvoiced regions. Discussion: This study demonstrates that an algorithm using quantified features of speech can objectively differentiate patients with schizophrenia from controls with high accuracy. Further validation in an independent sample is required. Employing standardized parameter sets ensures easy replication and comparison of analyses and can be used for cross-linguistic studies. Although at an early stage, the field of clinical computational linguistics introduces a powerful tool for diagnosis and prognosis of psychosis and neuropsychiatric disorders in general. We consider this new diagnostic tool to be of high potential given its ease of acquisition, low costs, and low patient burden. For example, this tool could easily be implemented as a smartphone app to be used in treatment settings.
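The classification step described above (a linear-kernel SVM with leave-one-out cross-validation over per-speaker acoustic feature vectors) can be sketched as follows. The feature matrix here is random noise standing in for GeMAPS functionals that would normally be extracted with openSMILE, and the group sizes and feature dimensionality are illustrative, not the study's.

```python
import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(4)

# Stand-ins: 88 GeMAPS-style functionals per speaker; label 1 = patient, 0 = control.
n_speakers, n_feats = 60, 88
y = np.repeat([0, 1], n_speakers // 2)
X = rng.normal(size=(n_speakers, n_feats)) + 0.4 * y[:, None]

# Linear SVM evaluated with leave-one-out cross-validation.
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
acc = cross_val_score(clf, X, y, cv=LeaveOneOut()).mean()
print(f"leave-one-out accuracy: {acc:.3f}")
```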
APA, Harvard, Vancouver, ISO, and other styles
19

Suire, Alexandre, Arnaud Tognetti, Valérie Durand, Michel Raymond, and Melissa Barkat-Defradas. "Speech Acoustic Features: A Comparison of Gay Men, Heterosexual Men, and Heterosexual Women." Archives of Sexual Behavior 49, no. 7 (March 31, 2020): 2575–83. http://dx.doi.org/10.1007/s10508-020-01665-3.

Full text
Abstract:
Potential differences between homosexual and heterosexual men have been studied on a diverse set of social and biological traits. Regarding acoustic features of speech, researchers have hypothesized a feminization of such characteristics in homosexual men, but previous investigations have so far produced mixed results. Moreover, most studies have been conducted with English-speaking populations, which calls for further cross-linguistic examinations. Lastly, no studies have so far investigated the potential role of testosterone in the association between sexual orientation and speech acoustic features. To fill these gaps, we explored potential differences in acoustic features of speech between homosexual and heterosexual native French men and investigated whether the former showed a trend toward feminization by comparing their speech to that of heterosexual native French women. Lastly, we examined whether testosterone levels mediated the association between speech acoustic features and sexual orientation. We studied four sexually dimorphic acoustic features relevant for the qualification of feminine versus masculine voices: the fundamental frequency, its modulation, and two understudied acoustic features of speech, the harmonics-to-noise ratio (a proxy of vocal breathiness) and the jitter (a proxy of vocal roughness). Results showed that homosexual men displayed significantly higher pitch modulation patterns and less breathy voices compared to heterosexual men, with values shifted toward those of heterosexual women. Lastly, testosterone levels did not influence any of the investigated acoustic features. Combined with the literature conducted in other languages, our findings bring new support for the feminization hypothesis and suggest that the feminization of some acoustic features could be shared across languages.
APA, Harvard, Vancouver, ISO, and other styles
20

Niwano, Katsuko, and Kuniaki Sugai. "Acoustic Determinants Eliciting Japanese Infants' Vocal Response to Maternal Speech." Psychological Reports 90, no. 1 (February 2002): 83–90. http://dx.doi.org/10.2466/pr0.2002.90.1.83.

Full text
Abstract:
Generally, infants prefer infant-directed speech to adult-directed speech. This study investigated which acoustic features of maternal infant-directed speech effectively elicit 3-mo.-old infants' vocal responses. The participants were 40 Japanese mother and infant dyads. Vocal f0 from the mother's speech and the infant's vocalization was extracted using the Computerized Speech Laboratory (CSL4300) and custom software. The acoustical features measured were mean fundamental frequency (f0) and f0 contour. The rate of the infant's vocal response was significantly higher when the maternal infant-directed speech was terminated with a falling contour rather than a rising or flat contour. There was no significant difference between the mean f0 of the maternal infant-directed speech followed or not followed by the infant's vocal response. This suggests that the falling contour of terminal maternal infant-directed speech serves to elicit the 3-mo.-old infant's vocal response.
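A crude way to label the terminal f0 contour of an utterance as falling, rising, or flat, which is the acoustic distinction at issue above, is to fit a line to the last few voiced f0 frames; the window length and flatness threshold below are arbitrary choices, not values from the study.

```python
import numpy as np

def terminal_contour(f0_hz, window=10, flat_threshold_hz=8.0):
    """Label the end of an f0 contour (voiced frames only) as falling, rising, or flat."""
    tail = np.asarray(f0_hz[-window:], dtype=float)
    slope = np.polyfit(np.arange(len(tail)), tail, 1)[0]   # Hz per frame
    change = slope * (len(tail) - 1)                       # total Hz change over the tail
    if change <= -flat_threshold_hz:
        return "falling"
    if change >= flat_threshold_hz:
        return "rising"
    return "flat"

# Toy contours standing in for the ends of maternal infant-directed utterances.
print(terminal_contour(np.linspace(320, 240, 15)))   # falling
print(terminal_contour(np.linspace(240, 320, 15)))   # rising
print(terminal_contour(np.full(15, 280.0)))          # flat
```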
APA, Harvard, Vancouver, ISO, and other styles
21

Al Mahmud, Nahyan, and Shahfida Amjad Munni. "Qualitative Analysis of PLP in LSTM for Bangla Speech Recognition." International journal of Multimedia & Its Applications 12, no. 5 (October 30, 2020): 1–8. http://dx.doi.org/10.5121/ijma.2020.12501.

Full text
Abstract:
The performance of various acoustic feature extraction methods has been compared in this work using a Long Short-Term Memory (LSTM) neural network in a Bangla speech recognition system. The acoustic features are a series of vectors that represent the speech signal. They can be classified into either words or sub-word units such as phonemes. In this work, linear predictive coding (LPC) is first used as the acoustic vector extraction technique. LPC has been chosen due to its widespread popularity. Then other vector extraction techniques, such as Mel frequency cepstral coefficients (MFCC) and perceptual linear prediction (PLP), have also been used. These two methods closely resemble the human auditory system. These feature vectors are then trained using the LSTM neural network. Then the obtained models of different phonemes are compared with different statistical tools, namely the Bhattacharyya distance and the Mahalanobis distance, to investigate the nature of those acoustic features.
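The two model-comparison measures named above have closed forms for Gaussian models of the feature frames. The sketch below computes both for two synthetic 13-dimensional feature sets standing in for the frames of two phonemes; the dimensionality and data are placeholders, not values from the paper.

```python
import numpy as np

def mahalanobis(mu1, mu2, cov):
    """Mahalanobis distance between two mean vectors under a shared covariance."""
    d = mu1 - mu2
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

def bhattacharyya(mu1, cov1, mu2, cov2):
    """Bhattacharyya distance between two Gaussian distributions."""
    cov = 0.5 * (cov1 + cov2)
    d = mu1 - mu2
    term1 = 0.125 * d @ np.linalg.solve(cov, d)
    logdets = [np.linalg.slogdet(c)[1] for c in (cov, cov1, cov2)]
    term2 = 0.5 * (logdets[0] - 0.5 * (logdets[1] + logdets[2]))
    return float(term1 + term2)

rng = np.random.default_rng(5)
# Stand-ins for frame-level feature vectors (e.g., 13-dim MFCC/PLP) of two phonemes.
A = rng.normal(loc=0.0, size=(500, 13))
B = rng.normal(loc=0.3, size=(500, 13))

muA, covA = A.mean(axis=0), np.cov(A, rowvar=False)
muB, covB = B.mean(axis=0), np.cov(B, rowvar=False)
print("Mahalanobis  :", round(mahalanobis(muA, muB, 0.5 * (covA + covB)), 3))
print("Bhattacharyya:", round(bhattacharyya(muA, covA, muB, covB), 3))
```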
APA, Harvard, Vancouver, ISO, and other styles
22

Gunasekar C., Sabrigirinathan C., Vinayagavel K., and Ramkumar K. "The acoustic parameters for analysing speech with complete dentures." International Journal of Dental Research 5, no. 2 (July 6, 2017): 115. http://dx.doi.org/10.14419/ijdr.v5i2.7789.

Full text
Abstract:
Difficulty in speech among patients who wear dentures is a major concern. Various studies have offered techniques, suggestions, and conclusions for improving speech, but the better and more exact test is to analyse speech with complete dentures. Here we briefly review speech production and perception, the relevant acoustic features, and the parameters useful for analysing these acoustic features in relation to the quality of speech in denture wearers.
APA, Harvard, Vancouver, ISO, and other styles
23

Amano, Akio. "Speech recognition apparatus capable of discriminating between similar acoustic features of speech." Journal of the Acoustical Society of America 94, no. 1 (July 1993): 613. http://dx.doi.org/10.1121/1.408210.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Zhang, Zhan, Yuehai Wang, and Jianyi Yang. "Accent Recognition with Hybrid Phonetic Features." Sensors 21, no. 18 (September 18, 2021): 6258. http://dx.doi.org/10.3390/s21186258.

Full text
Abstract:
The performance of voice-controlled systems is usually influenced by accented speech. To make these systems more robust, frontend accent recognition (AR) technologies have received increased attention in recent years. As accent is a high-level abstract feature that has a profound relationship with language knowledge, AR is more challenging than other language-agnostic audio classification tasks. In this paper, we use an auxiliary automatic speech recognition (ASR) task to extract language-related phonetic features. Furthermore, we propose a hybrid structure that incorporates the embeddings of both a fixed acoustic model and a trainable acoustic model, making the language-related acoustic feature more robust. We conduct several experiments on the AESRC dataset. The results demonstrate that our approach can obtain an 8.02% relative improvement compared with the Transformer baseline, showing the merits of the proposed method.
APA, Harvard, Vancouver, ISO, and other styles
25

Lee, Moa, and Joon-Hyuk Chang. "Augmented Latent Features of Deep Neural Network-Based Automatic Speech Recognition for Motor-Driven Robots." Applied Sciences 10, no. 13 (July 2, 2020): 4602. http://dx.doi.org/10.3390/app10134602.

Full text
Abstract:
Speech recognition for intelligent robots seems to suffer from performance degradation due to ego-noise. The ego-noise is caused by the motors, fans, and mechanical parts inside the intelligent robots, especially when the robot moves or shakes its body. To overcome the problems caused by the ego-noise, we propose a robust speech recognition algorithm that uses motor-state information of the robot as an auxiliary feature. For this, we use two deep neural networks (DNN) in this paper. Firstly, we design the latent features using a bottleneck layer, one of the internal layers having a smaller number of hidden units relative to the other layers, to represent whether the motor is operating or not. The latent features maximizing the representation of the motor-state information are generated by taking the motor data and acoustic features as the input of the first DNN. Secondly, once the motor-state dependent latent features are designed at the first DNN, the second DNN, accounting for acoustic modeling, receives the latent features as the input along with the acoustic features. We evaluated the proposed system on the LibriSpeech database. The proposed network enables efficient compression of the acoustic and motor-state information, and the resulting word error rates (WER) are superior to those of a conventional speech recognition system.
APA, Harvard, Vancouver, ISO, and other styles
26

Kuchibhotla, Swarna, and Niranjan M. S. R. "Emotional Classification of Acoustic Information With Optimal Feature Subset Selection Methods." International Journal of Engineering & Technology 7, no. 2.32 (May 31, 2018): 39. http://dx.doi.org/10.14419/ijet.v7i2.32.13521.

Full text
Abstract:
This paper mainly focuses on the classification of various acoustic emotional corpora with frequency-domain features using feature subset selection methods. The emotional speech samples are classified into neutral, happy, fear, anger, disgust, and sad states by using properties of the statistics of spectral features estimated from Berlin and Spanish emotional utterances. The Sequential Forward Selection (SFS) and Sequential Floating Forward Selection (SFFS) feature subset selection algorithms are used for extracting more informative features. The number of speech emotional samples available for training is smaller than the number of features extracted from each speech sample in both the Berlin and Spanish corpora, which is known as the curse of dimensionality. Because of this high-dimensional feature vector, the efficiency of the classifier decreases and, at the same time, the computational time increases. For additional improvement in the efficiency of the classifier, an optimal subset of features is needed and is obtained by using feature subset selection methods. This enhances the performance of the system with higher efficiency and lower computation time. The classifier used in this work is the standard K Nearest Neighbour (KNN) classifier. Experimental evaluation proved that the performance of the classifier is enhanced with SFFS because it avoids the nesting effect suffered by SFS. The results also showed that an optimal feature subset is a better choice for classification than the full feature set.
APA, Harvard, Vancouver, ISO, and other styles
27

Missaoui, Ibrahim, and Zied Lachiri. "An Extraction Method of Acoustic Features for Speech Recognition." Research Journal of Applied Sciences, Engineering and Technology 12, no. 9 (May 5, 2016): 964–67. http://dx.doi.org/10.19026/rjaset.12.2814.

Full text
APA, Harvard, Vancouver, ISO, and other styles
28

Shahnawazuddin, Syed, Rohit Sinha, and Gayadhar Pradhan. "Pitch-Normalized Acoustic Features for Robust Children's Speech Recognition." IEEE Signal Processing Letters 24, no. 8 (August 2017): 1128–32. http://dx.doi.org/10.1109/lsp.2017.2705085.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Kubo, Rieko, and Masato Akagi. "Acoustic features of intelligible speech produced under reverberant environments." Journal of the Acoustical Society of America 144, no. 3 (September 2018): 1802. http://dx.doi.org/10.1121/1.5067954.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Rosenhouse, Judith K. "Assessing acoustic features in the speech of asylum seekers." Journal of the Acoustical Society of America 133, no. 5 (May 2013): 3244. http://dx.doi.org/10.1121/1.4805198.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Nasir, Md, Brian Robert Baucom, Panayiotis Georgiou, and Shrikanth Narayanan. "Predicting couple therapy outcomes based on speech acoustic features." PLOS ONE 12, no. 9 (September 21, 2017): e0185123. http://dx.doi.org/10.1371/journal.pone.0185123.

Full text
APA, Harvard, Vancouver, ISO, and other styles
32

Romigh, Griffin, Clayton Rothwell, Brandon Greenwell, and Meagan Newman. "Modeling uncertainty in spontaneous speech: Lexical and acoustic features." Journal of the Acoustical Society of America 140, no. 4 (October 2016): 3401. http://dx.doi.org/10.1121/1.4970912.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Zvarevashe, Kudakwashe, and Oludayo Olugbara. "Ensemble Learning of Hybrid Acoustic Features for Speech Emotion Recognition." Algorithms 13, no. 3 (March 22, 2020): 70. http://dx.doi.org/10.3390/a13030070.

Full text
Abstract:
Automatic recognition of emotion is important for facilitating seamless interactivity between a human being and an intelligent robot toward the full realization of a smart society. The methods of signal processing and machine learning are widely applied to recognize human emotions based on features extracted from facial images, video files, or speech signals. However, these features have not been able to recognize the fear emotion with the same level of precision as other emotions. The authors propose the agglutination of prosodic and spectral features from a group of carefully selected features to realize hybrid acoustic features for improving the task of emotion recognition. Experiments were performed to test the effectiveness of the proposed features extracted from the speech files of two public databases, which were used to train five popular ensemble learning algorithms. Results show that random decision forest ensemble learning of the proposed hybrid acoustic features is highly effective for speech emotion recognition.
APA, Harvard, Vancouver, ISO, and other styles
34

Buckley, Daniel P., Manuel Diaz Cadiz, Tanya L. Eadie, and Cara E. Stepp. "Acoustic Model of Perceived Overall Severity of Dysphonia in Adductor-Type Laryngeal Dystonia." Journal of Speech, Language, and Hearing Research 63, no. 8 (August 10, 2020): 2713–22. http://dx.doi.org/10.1044/2020_jslhr-19-00354.

Full text
Abstract:
Purpose This study is a secondary analysis of existing data. The goal of the study was to construct an acoustic model of perceived overall severity of dysphonia in adductory laryngeal dystonia (AdLD). We predicted that acoustic measures (a) related to voice and pitch breaks and (b) related to vocal effort would form the primary elements of a model corresponding to auditory-perceptual ratings of overall severity of dysphonia. Method Twenty inexperienced listeners evaluated the overall severity of dysphonia of speech stimuli from 19 individuals with AdLD. Acoustic features related to primary signs of AdLD (hyperadduction resulting in pitch and voice breaks) and to a potential secondary symptom of AdLD (vocal effort, measures of relative fundamental frequency) were computed from the speech stimuli. Multiple linear regression analysis was applied to construct an acoustic model of the overall severity of dysphonia. Results The acoustic model included an acoustic feature related to pitch and voice breaks and three acoustic measures derived from relative fundamental frequency; it explained 84.9% of the variance in the auditory-perceptual ratings of overall severity of dysphonia in the speech samples. Conclusions Auditory-perceptual ratings of overall severity of dysphonia in AdLD were related to acoustic features of primary signs (pitch and voice breaks, hyperadduction associated with laryngeal spasms) and were also related to acoustic features of vocal effort. This suggests that compensatory vocal effort may be a secondary symptom in AdLD. Future work to generalize this acoustic model to a larger, independent data set is necessary before clinical translation is warranted.
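The modelling step above is ordinary multiple linear regression of perceptual severity ratings on a handful of acoustic predictors, with explained variance reported as R². A minimal scikit-learn sketch with simulated data is shown below; the four predictors (one voice-break measure and three relative-fundamental-frequency measures) and the speaker count are assumptions matching the abstract's description, not the study's actual measurements.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(6)

# Simulated per-speaker data: 4 acoustic predictors and mean listener ratings
# of overall dysphonia severity (both stand-ins, not the study's measurements).
n_speakers = 19
X = rng.normal(size=(n_speakers, 4))
severity = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.5, size=n_speakers)

model = LinearRegression().fit(X, severity)
print("R^2 (variance explained):", round(model.score(X, severity), 3))
print("coefficients:", np.round(model.coef_, 3))
```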
APA, Harvard, Vancouver, ISO, and other styles
35

Duffy, Joseph R., Edythe A. Strand, Heather Clark, Mary Machulda, Jennifer L. Whitwell, and Keith A. Josephs. "Primary Progressive Apraxia of Speech: Clinical Features and Acoustic and Neurologic Correlates." American Journal of Speech-Language Pathology 24, no. 2 (May 2015): 88–100. http://dx.doi.org/10.1044/2015_ajslp-14-0174.

Full text
Abstract:
Purpose This study summarizes 2 illustrative cases of a neurodegenerative speech disorder, primary progressive apraxia of speech (AOS), as a vehicle for providing an overview of the disorder and an approach to describing and quantifying its perceptual features and some of its temporal acoustic attributes. Method Two individuals with primary progressive AOS underwent speech-language and neurologic evaluations on 2 occasions, ranging from 2.0 to 7.5 years postonset. Performance on several tests, tasks, and rating scales, as well as several acoustic measures, were compared over time within and between cases. Acoustic measures were compared with performance of control speakers. Results Both patients initially presented with AOS as the only or predominant sign of disease and without aphasia or dysarthria. The presenting features and temporal progression were captured in an AOS Rating Scale, an Articulation Error Score, and temporal acoustic measures of utterance duration, syllable rates per second, rates of speechlike alternating motion and sequential motion, and a pairwise variability index measure. Conclusions AOS can be the predominant manifestation of neurodegenerative disease. Clinical ratings of its attributes and acoustic measures of some of its temporal characteristics can support its diagnosis and help quantify its salient characteristics and progression over time.
APA, Harvard, Vancouver, ISO, and other styles
36

Oh, Yoo Rhee, Kiyoung Park, and Jeon Gyu Park. "Online Speech Recognition Using Multichannel Parallel Acoustic Score Computation and Deep Neural Network (DNN)- Based Voice-Activity Detector." Applied Sciences 10, no. 12 (June 14, 2020): 4091. http://dx.doi.org/10.3390/app10124091.

Full text
Abstract:
This paper aims to design an online, low-latency, and high-performance speech recognition system using a bidirectional long short-term memory (BLSTM) acoustic model. To achieve this, we adopt a server-client model and a context-sensitive-chunk-based approach. The speech recognition server manages a main thread and a decoder thread for each client and one worker thread. The main thread communicates with the connected client, extracts speech features, and buffers the features. The decoder thread performs speech recognition, including the proposed multichannel parallel acoustic score computation of a BLSTM acoustic model, the proposed deep neural network-based voice activity detector, and Viterbi decoding. The proposed acoustic score computation method estimates the acoustic scores of a context-sensitive-chunk BLSTM acoustic model for the batched speech features from concurrent clients, using the worker thread. The proposed deep neural network-based voice activity detector detects short pauses in the utterance to reduce response latency, while the user utters long sentences. From the experiments of Korean speech recognition, the number of concurrent clients is increased from 22 to 44 using the proposed acoustic score computation. When combined with the frame skipping method, the number is further increased up to 59 clients with a small accuracy degradation. Moreover, the average user-perceived latency is reduced from 11.71 s to 3.09–5.41 s by using the proposed deep neural network-based voice activity detector.
APA, Harvard, Vancouver, ISO, and other styles
37

Bae, Youkyung, David P. Kuehn, Charles A. Conway, and Bradley P. Sutton. "Real-Time Magnetic Resonance Imaging of Velopharyngeal Activities with Simultaneous Speech Recordings." Cleft Palate-Craniofacial Journal 48, no. 6 (November 2011): 695–707. http://dx.doi.org/10.1597/09-158.

Full text
Abstract:
Objective To examine the relationships between acoustic and physiologic aspects of the velopharyngeal mechanism during acoustically nasalized segments of speech in normal individuals by combining fast magnetic resonance imaging (MRI) with simultaneous speech recordings and subsequent acoustic analyses. Design Ten normal Caucasian adult individuals participated in the study. Midsagittal dynamic magnetic resonance imaging (MRI) and simultaneous speech recordings were performed while participants were producing repetitions of two rate-controlled nonsense syllables including /zanaza/ and /zunuzu/. Acoustic features of nasalization represented as the peak amplitude and the bandwidth of the first resonant frequency (F1) were derived from speech at the rate of 30 sets per second. Physiologic information was based on velar and tongue positional changes measured from the dynamic MRI data, which were acquired at a rate of 21.4 images per second and resampled with a corresponding rate of 30 images per second. Each acoustic feature of nasalization was regressed on gender, vowel context, and velar and tongue positional variables. Results Acoustic features of nasalization represented by F1 peak amplitude and bandwidth changes were significantly influenced by the vowel context surrounding the nasal consonant, velar elevated position, and tongue height at the tip. Conclusions Fast MRI combined with acoustic analysis was successfully applied to the investigation of acoustic-physiologic relationships of the velopharyngeal mechanism with the type of speech samples employed in the present study. Future applications are feasible to examine how anatomic and physiologic deviations of the velopharyngeal mechanism would be acoustically manifested in individuals with velopharyngeal incompetence.
APA, Harvard, Vancouver, ISO, and other styles
38

Xiong, Feifei, Stefan Goetze, Birger Kollmeier, and Bernd T. Meyer. "Exploring Auditory-Inspired Acoustic Features for Room Acoustic Parameter Estimation From Monaural Speech." IEEE/ACM Transactions on Audio, Speech, and Language Processing 26, no. 10 (October 2018): 1809–20. http://dx.doi.org/10.1109/taslp.2018.2843537.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Kuchibhotla, Swarna, Hima Deepthi Vankayalapati, and Koteswara Rao Anne. "An optimal two stage feature selection for speech emotion recognition using acoustic features." International Journal of Speech Technology 19, no. 4 (August 2, 2016): 657–67. http://dx.doi.org/10.1007/s10772-016-9358-0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
40

Ren, Guofeng, Guicheng Shao, and Jianmei Fu. "Articulatory-to-Acoustic Conversion Using BiLSTM-CNN Word-Attention-Based Method." Complexity 2020 (September 26, 2020): 1–10. http://dx.doi.org/10.1155/2020/4356981.

Full text
Abstract:
In recent years, along with the development of artificial intelligence (AI) and man-machine interaction technology, speech recognition and production have had to adapt to the rapid development of AI and man-machine technology, which requires improving recognition accuracy by adding novel features, fusing features, and improving recognition methods. Aiming at developing a novel recognition feature and its application to speech recognition, this paper presents a new method for articulatory-to-acoustic conversion. In the study, we converted articulatory features (i.e., velocities of the tongue and motion of the lips) into acoustic features (i.e., the second formant and Mel-cepstra). By considering the graphical representation of the articulators’ motion, this study combined Bidirectional Long Short-Term Memory (BiLSTM) with a convolutional neural network (CNN) and adopted the idea of word attention in Mandarin to extract semantic features. In this paper, we used the electromagnetic articulography (EMA) database designed by Taiyuan University of Technology, which contains ten speakers’ 299 disyllables and sentences of Mandarin, and extracted 8-dimensional articulatory features and a 1-dimensional semantic feature relying on the word-attention layer; we then trained on 200 samples and tested on 99 samples for the articulatory-to-acoustic conversion. Finally, the Root Mean Square Error (RMSE), Mean Mel-Cepstral Distortion (MMCD), and correlation coefficient were used to evaluate the conversion effect and for comparison with a Gaussian Mixture Model (GMM) and a BiLSTM recurrent neural network (BiLSTM-RNN). The results illustrated that the MMCD of the Mel-Frequency Cepstrum Coefficients (MFCC) was 1.467 dB, and the RMSE of F2 was 22.10 Hz. The research results of this study can be used in feature fusion and speech recognition to improve the accuracy of recognition.
APA, Harvard, Vancouver, ISO, and other styles
41

Parjane, Natalia, Sunghye Cho, Sharon Ash, Katheryn A. Q. Cousins, Sanjana Shellikeri, Mark Liberman, Leslie M. Shaw, David J. Irwin, Murray Grossman, and Naomi Nevler. "Digital Speech Analysis in Progressive Supranuclear Palsy and Corticobasal Syndromes." Journal of Alzheimer's Disease 82, no. 1 (June 29, 2021): 33–45. http://dx.doi.org/10.3233/jad-201132.

Full text
Abstract:
Background: Progressive supranuclear palsy syndrome (PSPS) and corticobasal syndrome (CBS) as well as non-fluent/agrammatic primary progressive aphasia (naPPA) are often associated with misfolded 4-repeat tau pathology, but the diversity of the associated speech features is poorly understood. Objective: Investigate the full range of acoustic and lexical properties of speech to test the hypothesis that PSPS-CBS show a subset of speech impairments found in naPPA. Methods: Acoustic and lexical measures, extracted from natural, digitized semi-structured speech samples using novel, automated methods, were compared in PSPS-CBS (n = 87), naPPA (n = 25), and healthy controls (HC, n = 41). We related these measures to grammatical performance and speech fluency, core features of naPPA, to neuropsychological measures of naming, executive, memory and visuoconstructional functioning, and to cerebrospinal fluid (CSF) phosphorylated tau (pTau) levels in patients with available biofluid analytes. Results: Both naPPA and PSPS-CBS speech showed shorter speech segments, longer pauses, higher pause rates, reduced fundamental frequency (f0) pitch ranges, and slower speech rate compared to HC. naPPA speech was distinct from PSPS-CBS, with shorter speech segments, more frequent pauses, slower speech rate, reduced verb production, and higher partial word production. In both groups, acoustic duration measures generally correlated with speech fluency, measured as words per minute, and grammatical performance. Speech measures did not correlate with standard neuropsychological measures. CSF pTau levels correlated with f0 range in PSPS-CBS and naPPA. Conclusion: Lexical and acoustic speech features of PSPS-CBS overlap with those of naPPA and are related to CSF pTau levels.
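Duration measures like those reported above (speech-segment length, pause length, pause rate) can be approximated from a simple energy-based segmentation; the sketch below is a generic illustration with arbitrary frame sizes and silence threshold, not the study's automated pipeline.

```python
import numpy as np

def speech_pause_stats(y, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Crude energy-based split into speech and pause segments, with summary stats."""
    frame, hop = int(sr * frame_ms / 1000), int(sr * hop_ms / 1000)
    n_frames = 1 + (len(y) - frame) // hop
    rms = np.array([np.sqrt(np.mean(y[i * hop:i * hop + frame] ** 2))
                    for i in range(n_frames)])
    level_db = 20 * np.log10(rms / (rms.max() + 1e-12) + 1e-12)
    voiced = level_db > threshold_db

    # Run lengths of consecutive speech/pause frames, converted to seconds.
    changes = np.flatnonzero(np.diff(voiced.astype(int))) + 1
    runs = np.split(voiced, changes)
    durs = np.array([len(r) for r in runs]) * hop / sr
    is_speech = np.array([bool(r[0]) for r in runs])

    return {
        "mean_speech_segment_s": float(durs[is_speech].mean()),
        "mean_pause_s": float(durs[~is_speech].mean()) if (~is_speech).any() else 0.0,
        "pauses_per_minute": float((~is_speech).sum() / (len(y) / sr) * 60),
    }

# Toy recording: 1 s of tone ("speech"), 0.5 s of silence, 1 s of tone.
sr = 16000
tone = 0.1 * np.sin(2 * np.pi * 200 * np.arange(sr) / sr)
print(speech_pause_stats(np.concatenate([tone, np.zeros(sr // 2), tone]), sr))
```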
APA, Harvard, Vancouver, ISO, and other styles
42

Brodbeck, Christian, Alex Jiao, L. Elliot Hong, and Jonathan Z. Simon. "Neural speech restoration at the cocktail party: Auditory cortex recovers masked speech of both attended and ignored speakers." PLOS Biology 18, no. 10 (October 22, 2020): e3000883. http://dx.doi.org/10.1371/journal.pbio.3000883.

Full text
Abstract:
Humans are remarkably skilled at listening to one speaker out of an acoustic mixture of several speech sources. Two speakers are easily segregated, even without binaural cues, but the neural mechanisms underlying this ability are not well understood. One possibility is that early cortical processing performs a spectrotemporal decomposition of the acoustic mixture, allowing the attended speech to be reconstructed via optimally weighted recombinations that discount spectrotemporal regions where sources heavily overlap. Using human magnetoencephalography (MEG) responses to a 2-talker mixture, we show evidence for an alternative possibility, in which early, active segregation occurs even for strongly spectrotemporally overlapping regions. Early (approximately 70-millisecond) responses to nonoverlapping spectrotemporal features are seen for both talkers. When competing talkers’ spectrotemporal features mask each other, the individual representations persist, but they occur with an approximately 20-millisecond delay. This suggests that the auditory cortex recovers acoustic features that are masked in the mixture, even if they occurred in the ignored speech. The existence of such noise-robust cortical representations, of features present in attended as well as ignored speech, suggests an active cortical stream segregation process, which could explain a range of behavioral effects of ignored background speech.
APA, Harvard, Vancouver, ISO, and other styles
43

Cherif, Youssouf Ismail, and Abdelhakim Dahimene. "Improved Voice-Based Biometrics Using Multi-Channel Transfer Learning." IADIS International Journal on Computer Science and Information Systems 15, no. 1 (October 7, 2020): 99–113. http://dx.doi.org/10.33965/ijcsis_2020150108.

Full text
Abstract:
Identifying the speaker has become more of an imperative in the modern age, especially since most personal and professional appliances rely on voice commands or speech in general to operate. These systems need to discern the identity of the speaker, rather than just the words that have been said, to be both smart and safe, especially considering the numerous advanced methods that have been developed to generate fake speech segments. The objective of this paper is to improve upon existing voice-based biometrics to keep up with these synthesizers. The proposed method focuses on defining novel and more speaker-adapted features by employing artificial neural networks and transfer learning. The approach uses pre-trained networks to define a mapping from two complementary acoustic features to speaker-adapted phonetic features. The complementary acoustic features are paired to provide information both about how the speech segments are perceived (type 1 feature) and how they are produced (type 2 feature). The approach was evaluated using both a small and a large closed-speaker data set. Preliminary results are encouraging and confirm the usefulness of such an approach for extracting speaker-adapted features, whether for classical machine learning algorithms or advanced neural structures such as LSTM or CNN.
APA, Harvard, Vancouver, ISO, and other styles
44

Dehaene-Lambertz, G. "Cerebral Specialization for Speech and Non-Speech Stimuli in Infants." Journal of Cognitive Neuroscience 12, no. 3 (May 2000): 449–60. http://dx.doi.org/10.1162/089892900562264.

Full text
Abstract:
Early cerebral specialization and lateralization for auditory processing in 4-month-old infants was studied by recording high-density evoked potentials to acoustical and phonetic changes in a series of repeated stimuli (either tones or syllables). Mismatch responses to these stimuli exhibit a distinct topography suggesting that different neural networks within the temporal lobe are involved in the perception and representation of the different features of an auditory stimulus. These data confirm that specialized modules are present within the auditory cortex very early in development. However, both for syllables and continuous tones, higher voltages were recorded over the left hemisphere than over the right with no significant interaction of hemisphere by type of stimuli. This suggests that there is no greater left hemisphere involvement in phonetic processing than in acoustic processing during the first months of life.
APA, Harvard, Vancouver, ISO, and other styles
45

Ding, Nai, and Jonathan Z. Simon. "Neural coding of continuous speech in auditory cortex during monaural and dichotic listening." Journal of Neurophysiology 107, no. 1 (January 2012): 78–89. http://dx.doi.org/10.1152/jn.00297.2011.

Full text
Abstract:
The cortical representation of the acoustic features of continuous speech is the foundation of speech perception. In this study, noninvasive magnetoencephalography (MEG) recordings are obtained from human subjects actively listening to spoken narratives, in both simple and cocktail party-like auditory scenes. By modeling how acoustic features of speech are encoded in ongoing MEG activity as a spectrotemporal response function, we demonstrate that the slow temporal modulations of speech in a broad spectral region are represented bilaterally in auditory cortex by a phase-locked temporal code. For speech presented monaurally to either ear, this phase-locked response is always more faithful in the right hemisphere, but with a shorter latency in the hemisphere contralateral to the stimulated ear. When different spoken narratives are presented to each ear simultaneously (dichotic listening), the resulting cortical neural activity precisely encodes the acoustic features of both of the spoken narratives, but slightly weakened and delayed compared with the monaural response. Critically, the early sensory response to the attended speech is considerably stronger than that to the unattended speech, demonstrating top-down attentional gain control. This attentional gain is substantial even during the subjects' very first exposure to the speech mixture and therefore largely independent of knowledge of the speech content. Together, these findings characterize how the spectrotemporal features of speech are encoded in human auditory cortex and establish a single-trial-based paradigm to study the neural basis underlying the cocktail party phenomenon.
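The spectrotemporal response function mentioned above is, in its simplest one-dimensional form, a lagged linear filter mapping the speech envelope to neural activity, and it can be estimated with ridge-regularised regression. The sketch below simulates that estimation end to end with made-up data; it is not the study's MEG analysis pipeline, and the sampling rate, lag count, and regularisation strength are arbitrary.

```python
import numpy as np

def lagged_design(stimulus, n_lags):
    """Design matrix whose column k is the stimulus delayed by k samples."""
    X = np.zeros((len(stimulus), n_lags))
    for k in range(n_lags):
        X[k:, k] = stimulus[:len(stimulus) - k]
    return X

rng = np.random.default_rng(7)
fs, n, n_lags = 100, 6000, 30            # 100 Hz envelope sampling, 60 s of data
stimulus = rng.standard_normal(n)        # stand-in for the speech envelope

# Simulated neural response: the envelope filtered by a known response function plus noise.
true_trf = np.exp(-np.arange(n_lags) / 8.0) * np.sin(np.arange(n_lags) / 3.0)
response = np.convolve(stimulus, true_trf)[:n] + 0.5 * rng.standard_normal(n)

# Ridge-regularised estimate of the temporal response function.
X = lagged_design(stimulus, n_lags)
trf_hat = np.linalg.solve(X.T @ X + 1.0 * np.eye(n_lags), X.T @ response)
print("correlation with true TRF:", round(float(np.corrcoef(trf_hat, true_trf)[0, 1]), 3))
```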
APA, Harvard, Vancouver, ISO, and other styles
46

Zahorian, Stephen A., Hongbing Hu, and Jiang Wu. "Time/frequency resolution of acoustic features for automatic speech recognition." Journal of the Acoustical Society of America 128, no. 4 (October 2010): 2324. http://dx.doi.org/10.1121/1.3508203.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Kusumoto, Akiko, and Nancy Vaughan. "Comparison of acoustic features of time‐compressed and natural speech." Journal of the Acoustical Society of America 116, no. 4 (October 2004): 2600–2601. http://dx.doi.org/10.1121/1.4785377.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Mikumo, Mariko. "Relationship between the acoustic features and impression evaluation in speech." Proceedings of the Annual Convention of the Japanese Psychological Association 79 (September 22, 2015): 3AM-072. http://dx.doi.org/10.4992/pacjpa.79.0_3am-072.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

Valente, Fabio, Mathew Magimai Doss, Christian Plahl, Suman Ravuri, and Wen Wang. "Transcribing Mandarin Broadcast Speech Using Multi-Layer Perceptron Acoustic Features." IEEE Transactions on Audio, Speech, and Language Processing 19, no. 8 (November 2011): 2439–50. http://dx.doi.org/10.1109/tasl.2011.2139206.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Chan, C. P., P. C. Ching, and Tan Lee. "Noisy speech recognition using de-noised multiresolution analysis acoustic features." Journal of the Acoustical Society of America 110, no. 5 (November 2001): 2567–74. http://dx.doi.org/10.1121/1.1398054.

Full text
APA, Harvard, Vancouver, ISO, and other styles