Academic literature on the topic 'Vocal feature'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Vocal feature.'

Next to every source in the list of references there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Vocal feature"

1

Keyser, Samuel Jay, and Kenneth N. Stevens. "Feature geometry and the vocal tract." Phonology 11, no. 2 (August 1994): 207–36. http://dx.doi.org/10.1017/s0952675700001950.

Abstract:
Perhaps the most important insight in phonological theory since the introduction of the concept of the phoneme has been the role that distinctive features play in phonological theory (Jakobson et al. 1952). Most research since Jakobson's early formulation has focused on the segmental properties of these features without reference to their hierarchical organisation. Recent research, however, has shed considerable light on this latter aspect of the phoneme as a phonological unit. In his seminal article ‘The geometry of phonological features’, for example, Clements (1985), building on earlier work of scholars such as Goldsmith (1976), argues that features are not ‘bundles’ in Bloomfield's sense, but are, in fact, organised into phonological trees with each branch corresponding to what has been called a tier. An overview of the current state of feature geometry can be found in Clements & Hume (forthcoming) and Kenstowicz (1994).
2

Lv, Chaohui, Hua Lan, Ying Yu, and Shengnan Li. "Objective Evaluation Method of Broadcasting Vocal Timbre Based on Feature Selection." Wireless Communications and Mobile Computing 2022 (May 26, 2022): 1–17. http://dx.doi.org/10.1155/2022/7086599.

Abstract:
Broadcasting voice is used to convey ideas and emotions. In the selection of broadcasting and hosting professionals, vocal timbre is an important index. The subjective evaluation method is widely used, but its results carry a degree of subjectivity and uncertainty. In this paper, an objective evaluation method for broadcasting vocal timbre is proposed. First, a broadcasting vocal timbre database is constructed based on Chinese phonetic characteristics. Then, a timbre feature selection strategy is presented based on the human vocal mechanism, and the broadcast timbre characteristics are divided into three categories: source parameters, vocal tract parameters, and human hearing parameters. Finally, three models, the hidden Markov model (HMM), the Gaussian mixture model-universal background model (GMM-UBM), and long short-term memory (LSTM), are used to evaluate broadcast timbre by extracting timbre features and four timbre feature combinations. The experiments show that the selection of timbre features is scientific and effective. Moreover, the LSTM network, using a deep learning algorithm, is more accurate in the objective evaluation of broadcast timbre than the traditional HMM and GMM-UBM, and the proposed method achieves an accuracy rate of about 95% on our database.
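The model-based scoring used in evaluations like the one above can be illustrated in miniature. This sketch scores a frame sequence under single-Gaussian stand-ins for two timbre models (a real GMM-UBM or LSTM system is far richer); all means, variances, and feature values are invented for illustration.

```python
import math

def log_gaussian(x, mean, var):
    """Log density of a 1-D Gaussian."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

def score_sequence(features, mean, var):
    """Average per-frame log-likelihood under a single-Gaussian 'timbre model'."""
    return sum(log_gaussian(f, mean, var) for f in features) / len(features)

# Hypothetical spectral-centroid-like values for a test voice,
# scored against two illustrative timbre classes.
test_frames = [1.9, 2.1, 2.0, 2.2, 1.8]
score_good = score_sequence(test_frames, mean=2.0, var=0.1)  # matching class
score_poor = score_sequence(test_frames, mean=5.0, var=0.1)  # mismatched class
```

The class with the higher average log-likelihood would be reported as the objective timbre judgment.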
3

Wang, Fenqi, Delin Deng, and Ratree Wayland. "The acoustic profiles of vocal emotions in Japanese: A corpus study with generalized additive mixed modeling." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A60–A61. http://dx.doi.org/10.1121/10.0015548.

Abstract:
This study investigated vocal emotions in Japanese by analyzing acoustic features from emotional utterances in the Online Gaming Voice Chat Corpus with Emotional Label (Arimoto and Kawatsu, 2013). The corpus contains sentences produced in 8 emotions by four native Japanese speakers who are professional actors. For acoustic feature extraction, the Praat script ProsodyPro was used. Principal component analysis (PCA) was conducted to evaluate the contribution of each acoustic feature. In addition, a linear discriminant analysis (LDA) classifier was trained on the extracted acoustic features to predict emotion category and intensity. A generalized additive mixed model (GAMM) was fitted to examine the effects of gender, emotional category, and emotional intensity on the time-normalized f0 values. The GAMM results suggested effects of gender, emotion, and emotional intensity on the time-normalized f0 values of vocal emotions in Japanese. The recognition accuracy of the LDA classifier reached about 60%, suggesting that although pitch-related measures are important for differentiating vocal emotions, bio-informational features (e.g., jitter, shimmer, and harmonicity) are also informative. In addition, our correlation analysis suggested that vocal emotions are conveyed by a set of features rather than by individual features alone.
4

Jayanthi Kumari, T. R., and H. S. Jayanna. "i-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques." Journal of Intelligent Systems 29, no. 1 (May 3, 2018): 565–82. http://dx.doi.org/10.1515/jisys-2017-0047.

Abstract:
In many biometric applications, limited-data speaker verification plays a significant role in practically oriented systems for verifying a speaker. The performance of the speaker verification system needs to be improved by applying suitable techniques to the limited-data condition, in which both the training and test data last only a few seconds. This article shows the importance of feature- and score-level fusion techniques for the speaker verification system under the limited-data condition. The baseline speaker verification system uses vocal tract features such as mel-frequency cepstral coefficients and linear predictive cepstral coefficients, and excitation source features such as the linear prediction residual and linear prediction residual phase, along with i-vector modeling techniques on the NIST 2003 data set. In feature-level fusion, the vocal tract features are fused with the excitation source features; as a result, on average, the equal error rate (EER) is approximately 4% compared to individual feature performance. Further, two different types of score-level fusion are demonstrated. In the first case, fusing the scores of vocal tract features and excitation source features while keeping the modeling technique the same provides an average reduction of approximately 2% EER compared to feature-level fusion performance. In the second case, the scores of different modeling techniques are combined, which results in an EER reduction of approximately 4.5% compared with score-level fusion of different features.
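The score-level fusion the article describes can be sketched as a weighted sum of normalized subsystem scores. The min-max normalization, equal weight, and trial scores below are illustrative assumptions, not the authors' configuration.

```python
def min_max_normalize(scores):
    """Map raw scores to [0, 1] so systems with different ranges are comparable."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [0.5 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]

def fuse_scores(vocal_tract_scores, excitation_scores, w=0.5):
    """Score-level fusion: weighted sum of normalized subsystem scores."""
    a = min_max_normalize(vocal_tract_scores)
    b = min_max_normalize(excitation_scores)
    return [w * x + (1 - w) * y for x, y in zip(a, b)]

# Hypothetical scores for five verification trials from two subsystems
# (e.g. an MFCC-based system and an LP-residual-based system).
mfcc_scores = [2.1, -0.3, 1.7, 0.4, -1.2]
residual_scores = [15.0, 3.0, 12.5, 8.0, 1.0]
fused = fuse_scores(mfcc_scores, residual_scores)
```

A verification threshold is then applied to the fused score rather than to either subsystem alone.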
5

Lahiri, Rimita, Md Nasir, Manoj Kumar, So Hyun Kim, Somer Bishop, Catherine Lord, and Shrikanth Narayanan. "Interpersonal synchrony across vocal and lexical modalities in interactions involving children with autism spectrum disorder." JASA Express Letters 2, no. 9 (September 2022): 095202. http://dx.doi.org/10.1121/10.0013421.

Abstract:
Quantifying behavioral synchrony can inform clinical diagnosis, long-term monitoring, and individualised interventions in neurodevelopmental disorders characterized by deficits in communication and social interaction, such as autism spectrum disorder. In this work, three different objective measures of interpersonal synchrony are evaluated across vocal and linguistic communication modalities. For vocal prosodic and spectral features, dynamic time warping distance and the squared cosine distance of (feature-wise) complexity are used, and for lexical features, word mover's distance is applied to capture behavioral synchrony. It is shown that these interpersonal vocal and linguistic synchrony measures capture complementary information that helps in characterizing overall behavioral patterns.
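Of the three synchrony measures above, dynamic time warping distance is the simplest to sketch. This toy version compares two 1-D prosodic contours with hypothetical values; the study applies such measures to richer multidimensional vocal feature sequences.

```python
def dtw_distance(a, b):
    """Classic dynamic-programming DTW distance between two 1-D sequences."""
    n, m = len(a), len(b)
    INF = float("inf")
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # stretch a
                                 cost[i][j - 1],      # stretch b
                                 cost[i - 1][j - 1])  # match
    return cost[n][m]

# A time-stretched copy of a contour aligns almost perfectly under DTW,
# which is why it suits interaction partners speaking at different rates.
f0_child = [1.0, 2.0, 3.0, 2.0, 1.0]
f0_adult = [1.0, 1.0, 2.0, 3.0, 3.0, 2.0, 1.0]
```

Lower DTW distance between two speakers' contours would then be read as higher vocal synchrony.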
6

Wu, Yunfeng, Pinnan Chen, Yuchen Yao, Xiaoquan Ye, Yugui Xiao, Lifang Liao, Meihong Wu, and Jian Chen. "Dysphonic Voice Pattern Analysis of Patients in Parkinson’s Disease Using Minimum Interclass Probability Risk Feature Selection and Bagging Ensemble Learning Methods." Computational and Mathematical Methods in Medicine 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/4201984.

Abstract:
Analysis of quantified voice patterns is useful in the detection and assessment of dysphonia and related phonation disorders. In this paper, we first study the linear correlations between 22 voice parameters of fundamental frequency variability, amplitude variations, and nonlinear measures. The highly correlated vocal parameters are combined by using the linear discriminant analysis method. Based on the probability density functions estimated by the Parzen-window technique, we propose an interclass probability risk (ICPR) method to select the vocal parameters with small ICPR values as dominant features, and compare it with the modified Kullback-Leibler divergence (MKLD) feature selection approach. The experimental results show that the generalized logistic regression analysis (GLRA), support vector machine (SVM), and Bagging ensemble algorithm with the ICPR features as input can provide better classification results than the same classifiers with the MKLD-selected features. The SVM is much better at distinguishing normal vocal patterns, with a specificity of 0.8542. Among the three classification methods, the Bagging ensemble algorithm with ICPR features can identify 90.77% of vocal patterns, with the highest sensitivity of 0.9796 and the largest area under the receiver operating characteristic curve (0.9558). The classification results demonstrate the effectiveness of our feature selection and pattern analysis methods for dysphonic voice detection and measurement.
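The Parzen-window estimation underlying the ICPR selection can be illustrated with a generic density-overlap computation: a feature whose class-conditional densities overlap heavily is risky for classification. Treating the integral of min(p_a, p_b) as the risk proxy is a simplification of the paper's method, and the feature values below are invented.

```python
import math

def parzen_density(x, samples, h=0.5):
    """Parzen-window (Gaussian kernel) density estimate at point x."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2 * math.pi))
    return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

def overlap_risk(class_a, class_b, lo=-10.0, hi=10.0, steps=2000):
    """Numerically integrate min(p_a, p_b): large overlap = risky feature."""
    dx = (hi - lo) / steps
    total = 0.0
    for k in range(steps):
        x = lo + (k + 0.5) * dx  # midpoint rule
        total += min(parzen_density(x, class_a), parzen_density(x, class_b)) * dx
    return total

# Hypothetical values of one vocal parameter for two classes of voices.
separated = overlap_risk([0.0, 0.2, -0.1, 0.1], [5.0, 5.2, 4.9, 5.1])
confusable = overlap_risk([0.0, 0.2, -0.1, 0.1], [0.3, 0.5, 0.1, 0.4])
```

A selection rule in this spirit would keep features whose overlap risk falls below a threshold.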
7

Matassini, Lorenzo, Rainer Hegger, Holger Kantz, and Claudia Manfredi. "Analysis of vocal disorders in a feature space." Medical Engineering & Physics 22, no. 6 (July 2000): 413–18. http://dx.doi.org/10.1016/s1350-4533(00)00048-5.

8

SHIMADA M., Yohko. "Feature of infants' sound in overlapping vocal communication." Proceedings of the Annual Convention of the Japanese Psychological Association 75 (September 15, 2011): 2PM077. http://dx.doi.org/10.4992/pacjpa.75.0_2pm077.

9

Hoq, Muntasir, Mohammed Nazim Uddin, and Seung-Bo Park. "Vocal Feature Extraction-Based Artificial Intelligent Model for Parkinson’s Disease Detection." Diagnostics 11, no. 6 (June 11, 2021): 1076. http://dx.doi.org/10.3390/diagnostics11061076.

Abstract:
As a neurodegenerative disorder, Parkinson’s disease (PD) affects the nerve cells of the human brain. Early detection and treatment can help to relieve the symptoms of PD. Recent PD studies have extracted features from vocal disorders as a harbinger for PD detection, as patients face vocal changes and impairments at the early stages of PD. In this study, two hybrid models based on a Support Vector Machine (SVM) integrated with a Principal Component Analysis (PCA) and a Sparse Autoencoder (SAE) are proposed to detect PD patients based on their vocal features. The first model extracted and reduced the principal components of the vocal features based on the explained variance of each feature using PCA. For the first time, the second model used a novel Deep Neural Network (DNN) of an SAE, consisting of multiple hidden layers with L1 regularization, to compress the vocal features into a lower-dimensional latent space. In both models, the reduced features were fed into the SVM as inputs, which performed classification by learning hyperplanes, along with projecting the data into a higher dimension. An F1-score, a Matthews correlation coefficient (MCC), and a Precision-Recall curve were used, along with accuracy, to evaluate the proposed models due to the highly imbalanced data. With its highest accuracy of 0.935, F1-score of 0.951, and MCC value of 0.788, the probing results show that the proposed SAE-SVM model surpassed not only the former PCA-SVM model and other standard models including Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor (KNN), and Random Forest (RF), but also two recent studies using the same dataset. Oversampling and balancing the dataset with SMOTE boosted the performance of the models.
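The PCA stage of the PCA-SVM model can be illustrated with a two-feature toy case, where the explained-variance ratio of the first principal component has a closed form (eigenvalues of the 2x2 covariance matrix). The feature names and values are hypothetical; the actual study reduces many vocal features at once.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def explained_variance_ratio_2d(f1, f2):
    """Closed-form PCA for two features: eigenvalues of the 2x2 covariance."""
    m1, m2 = mean(f1), mean(f2)
    n = len(f1)
    a = sum((x - m1) ** 2 for x in f1) / (n - 1)                    # var(f1)
    c = sum((y - m2) ** 2 for y in f2) / (n - 1)                    # var(f2)
    b = sum((x - m1) * (y - m2) for x, y in zip(f1, f2)) / (n - 1)  # cov
    half_trace = (a + c) / 2.0
    root = math.sqrt(((a - c) / 2.0) ** 2 + b * b)
    lam1, lam2 = half_trace + root, half_trace - root               # eigenvalues
    return lam1 / (lam1 + lam2)  # fraction of variance on the first component

# Two strongly correlated vocal features (hypothetical jitter/shimmer values):
# one principal component captures nearly all the variance.
jitter = [0.1, 0.2, 0.3, 0.4, 0.5]
shimmer = [1.1, 2.05, 3.0, 4.02, 4.95]
ratio = explained_variance_ratio_2d(jitter, shimmer)
```

Components whose cumulative explained variance passes a chosen threshold would be kept as SVM inputs.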
10

Zang, Lu. "Investigation on the Extraction Methods of Timbre Features in Vocal Singing Based on Machine Learning." Computational Intelligence and Neuroscience 2022 (September 17, 2022): 1–11. http://dx.doi.org/10.1155/2022/5074829.

Abstract:
With the continuous development of digital technology, music, as an important media form, has seen its digital audio technology develop constantly, forcing the traditional music industry to begin a digital transformation. How to automatically, effectively, and quickly retrieve music information in vocal singing has become a research topic attracting much attention, and it is of great significance for the field of timbre feature recognition. With in-depth research on timbre feature recognition, the study of machine-learning-based timbre feature extraction in vocal singing has also been gradually carried out, and its performance advantages matter for solving the problem of automatic music information retrieval. This paper studies the application of machine-learning-based feature extraction algorithms to timbre feature extraction in vocal singing. Through analysis of machine learning and feature extraction methods, these techniques can be applied to the construction of timbre feature extraction algorithms to solve the problem of automatic music information retrieval. This paper analyzed vocal singing, machine learning, and feature extraction; experimentally evaluated the performance of the method; and explained it with related theoretical formulas. The results showed that the method was more accurate for timbre feature extraction in the vocal singing environment than the traditional method, with a difference of 24.27% between the two, and the proportion of satisfied users increased by 33%. This method can thus meet users' needs for timbre feature extraction in music software, with greatly improved efficiency and user satisfaction.

Dissertations / Theses on the topic "Vocal feature"

1

Moore, Elliot II. "Evaluating objective feature statistics of speech as indicators of vocal affect and depression." Diss., Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/5346.

2

Moore, Elliot. "Evaluating objective feature statistics of speech as indicators of vocal affect and depression." Available online, Georgia Institute of Technology, 2004, 2003. http://etd.gatech.edu/theses/available/etd-04062004-164738/unrestricted/moore%5Felliot%5F200312%5Fphd.pdf.

3

Carvalho, Raphael Torres Santos. "Transformada Wavelet na detecção de patologias da laringe." Universidade Federal do Ceará, 2012. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=8908.

Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
The number of non-invasive diagnostic methods has increased due to the need for simple, quick, and painless tests. Owing to the growth of technology that provides the means for extraction and signal processing, new analytical methods have been developed to help in understanding the complexity of voice signals. This dissertation presents a new idea for characterizing healthy and pathological voice signals based on a mathematical tool widely known in the literature, the Wavelet Transform (WT). The speech data used in this work consist of 60 voice samples divided into four classes: one from healthy individuals and three from people with vocal fold nodules, Reinke's edema, and neurological dysphonia. All samples were recorded using the sustained vowel /a/ in Brazilian Portuguese. The results obtained by all the pattern classifiers studied indicate that the proposed WT-based approach is a suitable technique for discriminating between healthy and pathological voices, since it performs similarly to or even better than the classical technique in terms of recognition rates.
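As a minimal illustration of wavelet-based voice characterization, the sketch below computes a one-level Haar transform and uses detail-band energy as a crude irregularity feature. The dissertation's actual wavelet family, decomposition depth, and features are not reproduced here, and the signals are synthetic.

```python
import math

def haar_dwt(signal):
    """One-level Haar wavelet transform: approximation and detail coefficients."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        approx.append((signal[i] + signal[i + 1]) / math.sqrt(2))
        detail.append((signal[i] - signal[i + 1]) / math.sqrt(2))
    return approx, detail

def detail_energy(signal):
    """Energy of the detail band: higher for sample-to-sample irregularity."""
    _, detail = haar_dwt(signal)
    return sum(d * d for d in detail)

# Regular, phonation-like waveform vs. one with strong cycle-to-cycle jitter.
smooth = [math.sin(2 * math.pi * n / 64) for n in range(256)]
jagged = [(-1) ** n * math.sin(2 * math.pi * n / 64) for n in range(256)]
```

A classifier would then be trained on such sub-band energies rather than on the raw waveform.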
4

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Abstract:
The speech signal is basically meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person, using his/her speech, from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. The thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the unvoiced half, and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, these problems must be overcome: we need a method that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
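The MACV idea described above can be sketched directly from its definition: the maximum normalized autocorrelation over a pitch-like lag range, which is defined for voiced and unvoiced frames alike. The frame length and lag bounds below are assumptions, not the thesis's settings.

```python
import math
import random

def macv(frame, min_lag=20, max_lag=160):
    """Maximum normalized autocorrelation over a pitch-like lag range.
    Near 1 for periodic (voiced-like) frames, lower for aperiodic ones,
    and defined for unvoiced frames too, unlike a raw pitch estimate."""
    n = len(frame)
    m = sum(frame) / n
    x = [v - m for v in frame]  # remove DC so correlation reflects shape
    best = 0.0
    for lag in range(min_lag, min(max_lag, n - 1) + 1):
        num = sum(x[i] * x[i + lag] for i in range(n - lag))
        e1 = sum(x[i] * x[i] for i in range(n - lag))
        e2 = sum(x[i + lag] * x[i + lag] for i in range(n - lag))
        if e1 > 0 and e2 > 0:
            best = max(best, num / math.sqrt(e1 * e2))
    return best

voiced = [math.sin(2 * math.pi * n / 50) for n in range(400)]  # period 50 samples
rng = random.Random(0)
unvoiced = [rng.uniform(-1.0, 1.0) for _ in range(400)]        # noise-like frame
```

In a full system, MACV values at several lags would be appended to the cepstral feature vector of each frame.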
5

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Thesis (Masters), Master of Philosophy (MPhil), School of Microelectronic Engineering, Faculty of Engineering and Information Technology.
6

Horwitz-Martin, Rachelle (Rachelle Laura). "Vocal modulation features in the prediction of major depressive disorder severity." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/93072.

Abstract:
Thesis: S.M., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, September 2014. Cataloged from the PDF version of the thesis; includes bibliographical references (pages 113-115).
This thesis develops a model of vocal modulations up to 50 Hz in sustained vowels as a basis for biomarkers of neurological disease, particularly Major Depressive Disorder (MDD). Two model components contribute to amplitude modulation (AM): AM from respiratory muscles, and AM from interaction between formants and frequency modulation in the fundamental-frequency harmonics. Based on the modulation model, we test three methods to extract the envelope of the third formant, from which features are extracted, using sustained vowels from the 2013 Audio/Visual Emotion Challenge. Using a Gaussian-mixture-model-based predictor, we evaluate the performance of each feature in predicting subjects' Beck MDD severity score by the root mean square error (RMSE), mean absolute error (MAE), and Spearman correlation between the actual and predicted Beck scores. Our lowest MAE and RMSE values are 8.46 and 10.32, respectively (Spearman correlation = 0.487, p < 0.001), relative to the mean MAE of 10.05 and mean RMSE of 11.86.
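A crude rectify-and-smooth envelope detector illustrates the amplitude-modulation notion the thesis builds on; the thesis itself extracts the third-formant envelope with more refined methods, and the carrier and modulator periods below are purely illustrative.

```python
import math

def am_envelope(signal, window):
    """Crude AM envelope: full-wave rectify, then moving-average smooth."""
    rect = [abs(v) for v in signal]
    half = window // 2
    env = []
    for i in range(len(rect)):
        lo, hi = max(0, i - half), min(len(rect), i + half + 1)
        env.append(sum(rect[lo:hi]) / (hi - lo))
    return env

# Carrier (period 10 samples) amplitude-modulated by a slow component
# (period 200 samples), loosely mimicking low-frequency modulation
# of a sustained vowel.
x = [(1.0 + 0.5 * math.sin(2 * math.pi * n / 200)) * math.sin(2 * math.pi * n / 10)
     for n in range(1000)]
env = am_envelope(x, window=10)
core = env[100:900]            # ignore boundary effects
ratio = max(core) / min(core)  # should approach (1 + 0.5) / (1 - 0.5) = 3
```

Modulation features (depth, rate) would then be computed from such an envelope rather than from the raw waveform.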
7

García, María Susana Avila. "Automatic tracking of 3D vocal tract features during speech production using MRI." Thesis, University of Southampton, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.437111.

8

Marchetto, Enrico. "Automatic Speaker Recognition and Characterization by means of Robust Vocal Source Features." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3427390.

Automatic Speaker Recognition is a wide research field which encompasses many topics: signal processing, human vocal and auditory physiology, statistical modelling, cognitive sciences, and so on. The study of these techniques started about thirty years ago and, since then, the improvement has been dramatic. Nonetheless, the field still poses open issues, and many active research centers around the world are working towards more reliable and better performing systems. This thesis documents a Philosophiae Doctor project funded by the privately held company RT - Radio Trevisan Elettronica Industriale S.p.A. The title of the fellowship is "Automatic speaker recognition with applications to security and intelligence". Part of the work was carried out during a six-month visit to the Speech, Music and Hearing Department of the KTH Royal Institute of Technology, Stockholm. Speaker Recognition research develops techniques to automatically associate a given human voice with a previously recorded version of it. Speaker Recognition is usually further divided into Speaker Identification and Speaker Verification; in the former, the identity of a voice has to be found among a (possibly high) number of speaker voices, while in the latter the system is provided with both a voice and a claimed identity, and the association has to be verified as a true/false statement. The recognition system also provides a confidence score for the found results. The first Part of the thesis reviews the state of the art of Speaker Recognition research. The main components of a recognition system are described: audio feature extraction, statistical modelling, and performance assessment. Over the years the research community has developed a number of audio features, used to describe the information carried by the vocal signal in a compact and deterministic way.
In every automatic recognition application, including speech or language recognition, feature extraction is the first step, in charge of substantially compressing the size of the input data without losing any important information. The choice of the features best suited to a specific application, and their tuning, are crucial to obtaining satisfactory recognition results; moreover, the definition of innovative features is a lively research direction, because it is generally recognized that existing features are still far from exploiting the whole information load carried by the vocal signal. Some audio features have, over the years, proved to perform better than others; two of them are described in Part I: Mel-Frequency Cepstral Coefficients and Linear Prediction Coefficients. More refined and experimental features are also introduced, and are explained in Part III. Statistical modelling is introduced, particularly by discussing the structure of Gaussian Mixture Models and their training through the EM algorithm; specific modelling techniques for recognition, such as the Universal Background Model, are described. Scoring is the last phase of a Speaker Recognition process and involves a number of normalizations; it compensates for different recording conditions or model issues. Part I continues by presenting a number of audio databases that are commonly used in the literature as benchmarks to compare results or recognition systems, in particular TIMIT and the NIST Speaker Recognition Evaluation - SRE 2004. A recognition prototype system has been built during the PhD project, and it is detailed in Part II. The first Chapter describes the proposed application, relating to intelligence and security. The application fulfils specific requirements of the Authorities when investigations involve phone wiretapping or environmental interceptions.
In these cases the Authorities have to listen to a large number of recordings, most of which are not related to the investigations. The idea of the application is to automatically detect and label speakers, giving the possibility of searching for a specific speaker through the recording collection. This can avoid wasted time, resulting in an economic advantage. Many difficulties arise from the phone lines, which are known to degrade the speech signal and reduce recognition performance; the main issues are the narrow audio bandwidth, additive noise, and convolution noise, the last resulting in phase distortion. The second Chapter in Part II describes the developed Speaker Recognition system in detail, and a number of design choices are discussed. During development the research scope of the system was crucial: a lot of effort was put into obtaining a system with good performance that remains easily and deeply modifiable. The assessment of results on different databases posed further challenges, which were solved with a unified interface to the databases. The fundamental components of a speaker recognition system have been developed, along with some speed-up improvements. Lastly, the whole software can run on a cluster computer without any reconfiguration, a crucial characteristic for assessing performance on big databases in reasonable time. During the three-year project some works related to Speaker Recognition, although not directly part of it, were also developed. These developments are described in Part II as extensions of the prototype. First, a Voice Activity Detector suitable for noisy recordings is explained. The first step of feature extraction is to find and select, from a given recording, only the segments containing voice; this is not a trivial task when the recording is noisy and a simple "energy threshold" approach fails.
The developed VAD is based on advanced features, computed from Wavelet Transforms, which are further processed using an adaptive threshold. A second application is Speaker Diarization: it automatically segments an audio recording containing different speakers. The outputs of the diarization are a segmentation and a speaker label for each segment, resulting in a "who speaks when" answer. The third and last collateral work is a Noise Reduction system for voice applications, developed on a hardware DSP. The noise reduction algorithm adaptively detects the noise and reduces it, keeping only the voice; it works in real time using only a small portion of the DSP computing power. Lastly, Part III discusses innovative audio features, which are the main novel contribution of this thesis. The features are obtained from the glottal flow, so the first Chapter in this Part describes the anatomy of the vocal folds and of the vocal tract. The working principle of the phonation apparatus is described and the importance of vocal fold physics is pointed out. The glottal flow is an input air flow for the vocal tract, which acts as a filter; an open-source toolkit for the inversion of the vocal tract filter is introduced: it makes it possible to estimate the glottal flow from speech recordings. A description of some methods used to characterize the glottal flow numerically is given. In the subsequent Chapter, a definition of the novel glottal features is presented. The glottal flow estimates are not always reliable, so a first step detects and deletes unlikely flows. A numerical procedure then groups and sorts the flow estimates, preparing them for statistical modelling. Performance measures are then discussed, comparing the novel features against the standard ones on the reference databases TIMIT and SRE 2004. A Chapter is dedicated to a different research work, related to glottal flow characterization. 
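The idea of estimating the excitation by inverting the vocal tract filter can be illustrated with plain linear prediction. This is a hedged sketch, not the open-source toolkit used in the thesis: a two-pole filter stands in for the vocal tract, and the autocorrelation LPC method recovers an inverse filter whose residual approximates the source.

```python
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc(x, order):
    # Autocorrelation-method linear prediction coefficients A(z).
    r = np.correlate(x, x, mode="full")[len(x) - 1 : len(x) + order]
    a = solve_toeplitz((r[:-1], r[:-1]), r[1:])  # Yule-Walker equations
    return np.concatenate(([1.0], -a))

# Toy "speech": a pulse train (glottal excitation) through a resonant filter
# standing in for the vocal tract.
fs = 8000
excitation = np.zeros(fs // 4)
excitation[::80] = 1.0                    # 100 Hz pulse train
tract = [1.0, -1.6, 0.9]                  # simple two-pole "vocal tract"
speech = lfilter([1.0], tract, excitation)

a = lpc(speech, order=2)
residual = lfilter(a, [1.0], speech)      # inverse filtering ≈ source estimate
# The residual is flatter than the speech: most of the resonant structure
# attributable to the "tract" has been removed.
```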
A physical model of the vocal folds is presented, with a number of control rules able to describe the vocal fold dynamics. The rules make it possible to translate a specific pharyngeal muscular set-up into mechanical parameters of the model, which result in a specific glottal flow (obtained after a computer simulation of the model). The so-called Inverse Problem is defined in this way: given a glottal flow, find the muscular set-up which, used to drive a model simulation, obtains the same glottal flow as the given one. The inverse problem carries a number of difficulties, such as the non-uniqueness of the inversion and the sensitivity to slight variations in the input flow. An optimization control technique has been developed and is explained. The final Chapter summarizes the achievements of the thesis. Along with this discussion, a roadmap for future improvements to the features is sketched. In the end, a summary of the articles published in and submitted to both conferences and journals is presented.
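The optimization view of the Inverse Problem can be sketched on a toy parametric pulse: given a target flow, search for the parameters that resynthesize it. The two-parameter "model" below (amplitude and open-phase fraction) is purely illustrative and is not the physical vocal fold model of the thesis.

```python
import numpy as np
from scipy.optimize import minimize

# One glottal cycle sampled at 200 points.
t = np.linspace(0.0, 1.0, 200)

def flow(params):
    # Toy parametric pulse: a half-sine over the open phase, zero afterwards.
    amp, open_frac = params
    pulse = np.where(t < open_frac,
                     np.sin(np.pi * t / np.maximum(open_frac, 1e-6)), 0.0)
    return amp * pulse

# "Given" flow whose generating parameters we pretend not to know.
target = flow([1.2, 0.6])

def cost(params):
    # Squared resynthesis error between candidate and target flow.
    return np.sum((flow(params) - target) ** 2)

# Derivative-free search, as the simulated flow need not be differentiable
# in the control parameters.
res = minimize(cost, x0=[1.0, 0.5], method="Nelder-Mead")
# res.x recovers approximately (1.2, 0.6).
```

In the real problem the "flow" comes from a full simulation of the fold model, which makes each cost evaluation expensive and the landscape non-unique, hence the dedicated control technique.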
Automatic Speaker Recognition is a broad research field encompassing many topics: signal processing, vocal and auditory physiology, statistical modelling tools, the study of language, and so on. The study of these techniques began roughly thirty years ago and great improvements have been made since. Nevertheless, the field continues to pose open questions, and research groups around the world keep working towards recognition systems that are more reliable and better performing. This thesis documents a Philosophiae Doctor project funded by the private company RT - Radio Trevisan Elettronica Industriale S.p.A. The title of the scholarship is "Automatic speaker recognition with applications to security and intelligence". Part of the work took place during a six-month visit to the Speech, Music and Hearing Department of KTH - Royal Institute of Technology, Stockholm. Speaker Recognition research develops technologies to automatically associate a given human voice with a previously recorded version of the same. Speaker Recognition is usually better specified in terms of Speaker Verification or Speaker Identification. Identification consists of retrieving the identity of a voice among a (possibly large) number of voices modelled by the system; in Verification, given a voice and an identity, the system is asked to verify the association between the two. Recognition systems also produce a Score attesting to the reliability of the answer. The first Part of the thesis reviews the state of the art in Speaker Recognition. 
The main components of a recognition prototype are described: audio feature extraction, statistical modelling and performance assessment. Over time, the research community has developed a number of acoustic features: techniques to describe the vocal signal numerically in a compact, deterministic way. In every recognition application, including speech or language recognition, feature extraction is the first step: its purpose is to drastically reduce the size of the input data without losing any significant information. The choice of the features best suited to a specific application, and their tuning, are crucial to obtaining good recognition results; moreover, the definition of new features is an active research field, because the scientific community believes that existing features are still far from exploiting the whole information load carried by the vocal signal. Some features have established themselves over time thanks to their better performance: Mel-Frequency Cepstral Coefficients and Linear Prediction Coefficients; these features are described in Part I. Statistical modelling is also introduced, explaining the structure of Gaussian Mixture Models and their training algorithm (Expectation-Maximization). Specific modelling techniques, such as the Universal Background Model, complete the description of the statistical tools used for recognition. Scoring, finally, is the phase in which the recognition system produces its results; it comprises several normalization procedures that compensate, for example, for modelling issues or for the different acoustic conditions under which the audio data were recorded. 
Part I then presents some audio databases commonly used in the literature as benchmarks for comparing the performance of recognition systems; in particular, TIMIT and the NIST Speaker Recognition Evaluation (SRE) 2004 are presented. These databases are suited to performance evaluation on telephone audio, which is of interest for this thesis; the topic is discussed further in Part II. During the PhD project a prototype recognition system was designed and built, and it is discussed in Part II. The first Chapter describes the proposed recognition application; Speaker Recognition technology is applied to telephone lines, with reference to security and intelligence. The application answers a specific need of the Authorities when investigations involve phone wiretapping. In these cases the Authorities have to listen to large amounts of telephone data, most of which turn out to be useless for the investigation. The application idea consists of automatically identifying and labelling the speakers present in the interceptions, thus allowing the search for a specific speaker within the collection of recordings. This could reduce wasted time, with a resulting economic advantage. Audio from telephone lines makes automatic recognition difficult, because it significantly degrades the signal and therefore worsens performance. Some well-known issues of telephone audio are its narrow bandwidth, additive noise and convolutional noise; the latter causes phase distortion, which alters the waveform of the signal. The second Chapter of Part II describes the developed Speaker Recognition system in detail, discussing the various design choices. 
The fundamental components of a recognition system have been developed, with some improvements to contain the computational load. During development, the research purpose of the software was considered paramount: much effort was devoted to obtaining a system with good performance that nevertheless remained easy to modify, even deeply. The need (and opportunity) to assess the prototype's performance placed further requirements on the development, which were met by adopting a common interface to the various databases. Finally, all the modules of the developed software can run on a computing cluster (a high-performance machine for parallel computation); this characteristic of the prototype was crucial in allowing a thorough performance evaluation of the software in reasonable times. During the doctoral project, studies related to Speaker Recognition, though not directly part of it, were also carried out. These developments are described in Part II as extensions of the prototype. First, a Voice Activity Detector suitable for use in the presence of noise is presented. This component is particularly important as the first step of feature extraction: only the audio segments that actually contain vocal signal must be selected and kept. In situations with significant background noise, simple "energy threshold" approaches fail. The developed detector is based on advanced features, obtained through Wavelet Transforms and further processed with adaptive thresholding. A second application is a prototype for Speaker Diarization, i.e. the automatic labelling of audio recordings containing several speakers. 
The result of the procedure consists of a segmentation of the audio and a series of labels, one for each segment; the system provides a "who speaks when" answer. The third and last study collateral to Speaker Recognition is the development of a Noise Reduction system on a dedicated DSP hardware platform. The reduction algorithm detects the noise adaptively and reduces it, trying to keep only the vocal signal; processing takes place in real time while using only a very limited portion of the DSP's computing resources. Part III of the thesis finally introduces innovative audio features, which constitute the main novel contribution of the thesis. These features are obtained from the glottal flow, so the first Chapter of the Part discusses the anatomy of the vocal tract and of the vocal folds. The working principle of phonation and the importance of vocal fold physics are described. The glottal flow is an input to the vocal tract, which acts as a filter. An open-source software tool for vocal tract inversion is described: it allows the glottal flow to be estimated from plain voice recordings. Some of the methods used to characterize the glottal flow numerically are then presented. The following Chapter presents the definition of the new glottal features. Glottal flow estimates are not always reliable, so, during the extraction of the new features, a first step detects and excludes the flows judged untrustworthy. A numerical procedure then groups and sorts the flow estimates, preparing them for statistical modelling. The glottal features, applied to Speaker Recognition on the TIMIT and NIST SRE 2004 databases, are compared with the standard features. 
The final Chapter of Part III is dedicated to a different research work, nevertheless related to glottal flow characterization. A physical model of the vocal folds is presented, controlled by a set of numerical rules and capable of describing the dynamics of the folds themselves. The rules make it possible to translate a specific setting of the glottal muscles into the mechanical parameters of the model, which lead to a specific glottal flow (obtained after a computer simulation of the model). The so-called Inverse Problem is defined as follows: given a glottal flow, find a setting of the glottal muscles which, used to drive the physical model, allows the resynthesis of a glottal signal as similar as possible to the given one. The inverse problem entails a series of difficulties, such as the non-uniqueness of the inversion and the sensitivity to even small variations of the input flow. A control optimization technique has been developed and is described. The concluding chapter of the thesis summarizes the results obtained. Alongside this discussion, a work plan for the development of the introduced features is presented. Finally, the publications produced are listed.
APA, Harvard, Vancouver, ISO, and other styles
9

Almeida, Náthalee Cavalcanti de. "Sistema inteligente para diagnóstico de patologias na laringe utilizando máquinas de vetor de suporte." Universidade Federal do Rio Grande do Norte, 2010. http://repositorio.ufrn.br:8080/jspui/handle/123456789/15149.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The human voice is an important communication tool, and any disorder of the voice can have profound implications for the social and professional life of an individual. Digital signal processing techniques have been used in the acoustic analysis of vocal disorders caused by pathologies in the larynx, owing to their simplicity and noninvasive nature. This work deals with the acoustic analysis of voice signals affected by pathologies in the larynx, specifically edema and nodules on the vocal folds. The purpose of this work is to develop a voice classification system to help in the pre-diagnosis of pathologies in the larynx, as well as in the monitoring of pharmacological and post-surgical treatments. Linear Prediction Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCC) and the coefficients obtained through the Wavelet Packet Transform (WPT) are applied to extract relevant characteristics of the voice signal. The Support Vector Machine (SVM) is used for the classification task; it builds optimal hyperplanes that maximize the margin of separation between the classes involved. The generated hyperplane is determined by the support vectors, which are subsets of points of these classes. On the database used in this work, the results showed good performance, with a hit rate of 98.46% in the classification of normal versus pathological voices in general, and 98.75% in the classification between the pathologies themselves: edema and nodules.
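The SVM classification step described above can be sketched as follows; the feature vectors here are synthetic stand-ins for the LPC/MFCC/WPT coefficients, so the accuracy figure is illustrative only, not the 98.46% reported by the dissertation:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in data: 20-dim vectors playing the role of acoustic
# features; label 0 = normal voice, 1 = pathological.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (150, 20)),
               rng.normal(1.5, 1.0, (150, 20))])
y = np.array([0] * 150 + [1] * 150)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3,
                                      random_state=0, stratify=y)

# RBF-kernel SVM: finds a maximum-margin separating surface; the decision
# boundary is determined by the support vectors.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(Xtr, ytr)
acc = clf.score(Xte, yte)   # held-out accuracy on this easy synthetic task
```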
APA, Harvard, Vancouver, ISO, and other styles
10

Lovett, Victoria Anne. "Voice Features of Sjogren's Syndrome: Examination of Relative Fundamental Frequency (RFF) During Connected Speech." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/5749.

Full text
Abstract:
The purpose of this study was to examine the effectiveness of relative fundamental frequency (RFF) in quantifying voice disorder severity and possible change with treatment in individuals with Primary Sjögren's Syndrome (SS). Participants completed twice-daily audio recordings during an ABAB within-subjects experimental study investigating the effects of nebulized saline on voice production in this population. Voice samples of the Rainbow Passage from seven of the eight individuals with Primary SS involved in a larger investigation met inclusion criteria for analysis, for a total of 555 tokens. The results indicated that RFF values for this sample were similar to previously reported RFF values for individuals with voice disorders. RFF values improved with nebulized saline treatment but did not fall within the normal range for typical speakers. These findings were similar to other populations of voice disorders who experienced improvement, but not complete normalization, of RFF with treatment. Patient-based factors, such as age and diagnosis as well as measurement and methodological factors, might affect RFF values. The results from this study indicate that RFF is a potentially useful measure in quantifying voice production and disorder severity in individuals with Primary SS.
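RFF itself is a simple normalization: the fundamental frequency of each voicing cycle adjacent to a voiceless consonant, expressed in semitones relative to the speaker's steady-state f0. A minimal sketch with a hypothetical cycle-by-cycle contour (the values below are invented for illustration, not data from this study):

```python
import numpy as np

def rff_semitones(f0_cycles, steady_f0):
    # Relative Fundamental Frequency: each cycle's f0 in semitones relative
    # to the steady-state f0 (12 * log2 of the frequency ratio).
    return 12.0 * np.log2(np.asarray(f0_cycles, dtype=float) / steady_f0)

# Hypothetical contour: ten voicing cycles approaching a voiceless consonant,
# with f0 drifting upward from a 200 Hz steady state.
offset_cycles = [200, 201, 202, 204, 207, 210, 214, 219, 225, 232]
rff = rff_semitones(offset_cycles, steady_f0=200.0)
# rff[0] is ~0 ST (steady state); the deviation grows towards the consonant.
```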
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Vocal feature"

1

Kizin, M. M. Russkoe vokalʹnoe iskusstvo v ėkrannoĭ kulʹture. Moskva: Izdatelʹstvo "Soglasie", 2017.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
2

Sleepless in Seattle: Music from the TriStar Pictures feature film. Miami, FL: CPP/Belwin, 1993.

Find full text
APA, Harvard, Vancouver, ISO, and other styles
3

Adams, David. A Handbook of Diction for Singers. 3rd ed. Oxford University Press, New York, 2022. http://dx.doi.org/10.1093/oso/9780197639504.001.0001.

Full text
Abstract:
Abstract The 3rd edition of A Handbook of Diction for Singers is a guide to help classical singers achieve professional levels of lyric diction in Italian, French, and German, the three major languages of the classical vocal repertory. It serves as a textbook for student singers, as well as a reference for voice teachers, vocal coaches, and conductors. The presentation is based on the International Phonetic Alphabet (IPA). The newly created chapter 1, “An Introduction to Specific Sounds,” introduces the relevant phonetic symbols, with descriptions of how each sound is produced and reference to the positioning of the articulators (tongue, jaw, lips, glottis) for each sound. Comparisons of sample words from each language, including English, are provided. Each of the three languages is given its own chapter, with discussion not only of the sounds but also of features such as diacritical marks, word stress, vowel length, syllabification, and word structure. Example words have been expanded from previous editions, and most words are translated into English and transcribed into phonetic symbols. There are multiple musical examples, as well as basic exercises for specific sounds and IPA transcription (a new feature). Fine points not available in other textbooks are covered, such as extensive information on the open and close vowel sounds of e and o in Italian, the sequencing of consonant sounds and word structure in German, and vowel length and details of the treatment of mute e in French. Additional resources are discussed for each language, and sample texts are given with IPA transcriptions and translations.
APA, Harvard, Vancouver, ISO, and other styles
4

Shaibani, Aziz. Dysphonia. Oxford University Press, 2018. http://dx.doi.org/10.1093/med/9780190661304.003.0008.

Full text
Abstract:
Lack of function or malfunction of the vocal cords is not as common a manifestation of neuromuscular disorders as dysarthria. It is typically seen in central diseases such as Parkinson disease. Certain muscle and nerve disorders affect the vocal cords, but in these cases other features of these diseases make the diagnosis easy. Myasthenia gravis (MG) may present with intermittent hoarseness only, early in the course of the disease. Consultation with an ear, nose, throat (ENT) specialist is recommended to characterize the type of cord pathology. Hysterical hoarseness and weakness are not unusual presentations in neuromuscular clinics. Unilateral vocal cord palsy is usually due to recurrent laryngeal nerve pathology.
APA, Harvard, Vancouver, ISO, and other styles
5

Owens, Matthew, and Graham F. Welch. Choral Pedagogy and the Construction of Identity. Edited by Frank Abrahams and Paul D. Head. Oxford University Press, 2017. http://dx.doi.org/10.1093/oxfordhb/9780199373369.013.9.

Full text
Abstract:
Following an initiative of the early 1990s, the majority of United Kingdom cathedrals now have girl as well as boy cathedral choristers, often alternating in the singing of the daily services. One of the original political challenges in this musico-cultural initiative was whether or not it was possible for girl choristers to attain the same vocal quality as their male counterparts. Empirical studies, however, suggest that there is considerable overlap between the psycho-acoustic vocal features of girls’ and boys’ singing, such that it is often difficult perceptually to distinguish between the two, particularly for the relatively naïve listener. Moreover, the music repertoire usually reaches across gender. The chapter provides an overview of these recent developments and explores how the musical director can best shape the vocal products of their choristers, while being sensitive to particular vocal production issues that relate to the development of girls’ voices.
APA, Harvard, Vancouver, ISO, and other styles
6

Manning, Jane. Vocal Repertoire for the Twenty-First Century, Volume 1. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780199391028.001.0001.

Full text
Abstract:
In this new follow-up to her highly regarded New Vocal Repertory, volumes 1 and 2, English concert and opera soprano Jane Manning provides a seasoned expert’s guidance and insight into the vocal genre she calls home. This book surveys a diverse array of contemporary vocal music available in the twentieth century. It suggests specific pieces for different voices, abilities, and occasions. Choices range from substantial song cycles to shorter pieces suitable for encores, examinations, or auditions. Almost all works are for voice and piano, but there are some for solo voice. The volume also covers a rich variety of musical styles, reflected here along with some revised and updated articles on works featured in the previous edition, in order to keep them in circulation. Furthermore, this volume includes the broadest possible selection of works confined to settings of the English language; two works in Latin and one piece in fake Russian are the only exceptions. In addition, there are certain songs, culled from some diploma syllabus many years ago, which seem to have progressed unchallenged through successive generations despite a wealth of viable alternatives; teachers can thus be inclined to steer students toward pieces with which they are already familiar.
APA, Harvard, Vancouver, ISO, and other styles
7

Kaplan, Tamara, and Tracey Milligan. Movement Disorders 1: Tourette’s Syndrome, Essential Tremor, and Parkinson’s Disease (DRAFT). Oxford University Press, 2018. http://dx.doi.org/10.1093/med/9780190650261.003.0011.

Full text
Abstract:
The video in this chapter explores movement disorders, and focuses on Tourette’s Syndrome, Essential tremor, and Parkinson’s Disease. It outlines the characteristics of each, such as motor and vocal tics in Tourette’s Syndrome, postural or kinetic tremor in Essential tremor, and the four hallmark features of Parkinson’s Disease (bradykinesia, resting tremor, cogwheel rigidity, and postural instability).
APA, Harvard, Vancouver, ISO, and other styles
8

Shaibani, Aziz. Dysphonia. Oxford University Press, 2015. http://dx.doi.org/10.1093/med/9780199898152.003.0008.

Full text
Abstract:
Dysfunction of the vocal cords (dysphonia) is not as common a manifestation of neuromuscular disorders as dysarthria. It is typically seen in central diseases such as Parkinson disease and spasmodic dysphonia. Certain muscle and nerve disorders affect the vocal cords, but in these cases other features of these diseases make the diagnosis easy. Myasthenia may present with only intermittent hoarseness early in the course of the disease. Consultation with an ear, nose, and throat (ENT) specialist is recommended to characterize the type of cord pathology. Hysterical hoarseness and weakness are not an unusual presentation in neuromuscular clinics but, again, they do not occur in isolation, and other psychogenic symptoms such as give-way weakness and functional gait disorder often lead to the right diagnosis.
APA, Harvard, Vancouver, ISO, and other styles
9

Malawey, Victoria. A Blaze of Light in Every Word. Oxford University Press, 2020. http://dx.doi.org/10.1093/oso/9780190052201.001.0001.

Full text
Abstract:
A Blaze of Light in Every Word presents a conceptual model for analyzing vocal delivery in popular song recordings focused on three overlapping areas of inquiry: pitch, prosody, and quality. The domain of pitch, which refers to listeners’ perceptions of frequency, considers range, tessitura, intonation, and registration. Prosody, the pacing and flow of delivery, comprises phrasing, metric placement, motility, embellishment, and consonantal articulation. Qualitative elements include timbre, phonation, onset, resonance, clarity, paralinguistic effects, and loudness. Intersecting all three domains is the area of technological mediation, which considers how external technologies, such as layering, overdubbing, pitch modification, recording transmission, compression, reverb, spatial placement, delay, and other electronic effects, impact voice in recorded music. Though the book focuses primarily on the sonic and material aspects of vocal delivery, it situates these aspects among broader cultural, philosophical, and anthropological approaches to voice with the goal to better understand the relationship between sonic content and its signification. Drawing upon transcription and spectrographic analysis as the primary means of representation, as well as modes of analysis, this book features in-depth analyses of a wide array of popular song recordings spanning genres from indie rock to hip-hop to death metal, develops analytical tools for understanding how individual dimensions make singing voices both complex and unique, and synthesizes how multiple aspects interact to better understand the multidimensionality of singing voices.
APA, Harvard, Vancouver, ISO, and other styles
10

Davé, Shilpa S. Apu’s Brown Voice. University of Illinois Press, 2017. http://dx.doi.org/10.5406/illinois/9780252037405.003.0003.

Full text
Abstract:
This chapter discusses the character Apu, exploring how his appearance on the television show The Simpsons in the 1990s was a departure from previous Hollywood and television representations of South Asians in the United States. Whereas South Asians were previously depicted as brief visitors or exotic foreigners, Apu symbolizes a permanent Indian immigrant presence in the United States. Yet, his brown-voice performance racializes and differentiates him from other Americans. The chapter theorizes the use of brown voice and discusses how animated characters, in particular, become a significant subject to study vocal accents and voiceovers. Animated characters are unique because one of their most important defining features is their voice, and, thus, animation emphasizes the voice as a site of interest in thinking about racial performance.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Vocal feature"

1

Bieser, Armin. "Amplitude Envelope Encoding as a Feature for Temporal Information Processing in the Auditory Cortex of Squirrel Monkeys." In Current Topics in Primate Vocal Communication, 221–33. Boston, MA: Springer US, 1995. http://dx.doi.org/10.1007/978-1-4757-9930-9_12.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Rajasekharreddy, Poreddy, and E. S. Gopi. "Feature Selection for Vocal Segmentation Using Social Emotional Optimization Algorithm." In Socio-cultural Inspired Metaheuristics, 69–91. Singapore: Springer Singapore, 2019. http://dx.doi.org/10.1007/978-981-13-6569-0_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Majidnezhad, Vahid, and Igor Kheidorov. "A Novel Method for Feature Extraction in Vocal Fold Pathology Diagnosis." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 96–105. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-37893-5_11.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wang, Rui, Jianli Qi, and Daifu Qiao. "An Online Vocal Music Teaching Timbre Evaluation Method Based on Feature Comparison." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 482–94. Cham: Springer Nature Switzerland, 2022. http://dx.doi.org/10.1007/978-3-031-21164-5_37.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Chen, Ying, Jia-yin Chen, and Ai-ping Zhang. "Design of Mobile Teaching Platform for Vocal Piano Accompaniment Course Based on Feature Comparison." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 430–42. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-94554-1_34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Casalini, Iacopo, Marco Marini, and Luca Fanucci. "FPGA Implementation of a Configurable Vocal Feature Extraction Embedded System for Dysarthric Speech Recognition." In Lecture Notes in Electrical Engineering, 221–28. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-030-95498-7_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Chau, Chun Keung, Chak Shun Lai, and Bertram Emil Shi. "Feature vs. Model Based Vocal Tract Length Normalization for a Speech Recognition-Based Interactive Toy." In Active Media Technology, 134–43. Berlin, Heidelberg: Springer Berlin Heidelberg, 2001. http://dx.doi.org/10.1007/3-540-45336-9_17.

8

Kawakami, Yuta, Longbiao Wang, Atsuhiko Kai, and Seiichi Nakagawa. "Speaker Identification by Combining Various Vocal Tract and Vocal Source Features." In Text, Speech and Dialogue, 382–89. Cham: Springer International Publishing, 2014. http://dx.doi.org/10.1007/978-3-319-10816-2_46.

9

Rusko, Milan, and Jozef Juhár. "Towards Annotation of Nonverbal Vocal Gestures in Slovak." In Verbal and Nonverbal Features of Human-Human and Human-Machine Interaction, 255–65. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-70872-8_20.

10

Madhavi, Maulik C., Shubham Sharma, and Hemant A. Patil. "Vocal Tract Length Normalization Features for Audio Search." In Text, Speech, and Dialogue, 387–95. Cham: Springer International Publishing, 2015. http://dx.doi.org/10.1007/978-3-319-24033-6_44.


Conference papers on the topic "Vocal feature"

1

Ai, Jiaqi, Yi Zuo, Junxia Liu, Peichao He, Tieshan Li, and C. L. Philip Chen. "Application of hierarchical clustering analysis for vocal feature extraction." In 2019 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE). IEEE, 2019. http://dx.doi.org/10.1109/csde48274.2019.9162362.

2

Danilovaite, Monika. "Perceptually Motivated Feature set for Vocal Folds State Assessment." In 2020 IEEE 8th Workshop on Advances in Information, Electronic and Electrical Engineering (AIEEE). IEEE, 2021. http://dx.doi.org/10.1109/aieee51419.2021.9435765.

3

Liu, Fagui, and Ning Jing. "A General Method of New Feature Service Development on VOCAL." In IEE Mobility Conference 2005. The Second International Conference on Mobile Technology, Applications and Systems. IEEE, 2005. http://dx.doi.org/10.1109/mtas.2005.207199.

4

Liu, Fagui, and Ning Jing. "A general method of new feature service development on VOCAL." In IEE Mobility Conference 2005. The Second International Conference on Mobile Technology, Applications and Systems. IEE, 2005. http://dx.doi.org/10.1049/cp:20051516.

5

Atmaja, Bagus Tris, and Akira Sasou. "Leveraging Pre-Trained Acoustic Feature Extractor For Affective Vocal Bursts Tasks." In 2022 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC). IEEE, 2022. http://dx.doi.org/10.23919/apsipaasc55919.2022.9980083.

6

Mamun, Muntasir, Md Ishtyaq Mahmud, Md Iqbal Hossain, Asm Mohaimenul Islam, Md Salim Ahammed, and Md Milon Uddin. "Vocal Feature Guided Detection of Parkinson’s Disease Using Machine Learning Algorithms." In 2022 IEEE 13th Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON). IEEE, 2022. http://dx.doi.org/10.1109/uemcon54665.2022.9965732.

7

Neto, Benedito G. Aguiar, Joseana M. Fechine, Silvana Cunha Costa, and Menaka Muppa. "Feature Estimation for Vocal Fold Edema Detection Using Short-Term Cepstral Analysis." In 2007 IEEE 7th International Symposium on BioInformatics and BioEngineering. IEEE, 2007. http://dx.doi.org/10.1109/bibe.2007.4375707.

8

Turrisi, Rosanna, Raffaele Tavarone, and Leonardo Badino. "Improving Generalization of Vocal Tract Feature Reconstruction: From Augmented Acoustic Inversion to Articulatory Feature Reconstruction without Articulatory Data." In 2018 IEEE Spoken Language Technology Workshop (SLT). IEEE, 2018. http://dx.doi.org/10.1109/slt.2018.8639537.

9

Moura, Shayenne, and Marcelo Queiroz. "Instrumental Sensibility of Vocal Detector Based on Spectral Features." In Simpósio Brasileiro de Computação Musical. Sociedade Brasileira de Computação - SBC, 2019. http://dx.doi.org/10.5753/sbcm.2019.10451.

Abstract:
Detecting voice in a mixture of sound sources remains a challenging task in MIR research. The musical content can be perceived in many different ways as instrumentation varies. We evaluate how instrumentation affects singing voice detection in pieces using a standard spectral feature (MFCC). We trained Random Forest models on song remixes containing specific subsets of sound sources and compared them to models trained on the original songs. We thus present a preliminary analysis of the classification accuracy results.
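The MFCC pipeline this abstract relies on (framing, power spectrum, mel filterbank, log compression, DCT) can be sketched in plain NumPy. This is an illustrative simplification, not the authors' implementation: the function name `mfcc_like`, the frame sizes, and the synthetic input are assumptions, and in the paper the resulting per-frame features feed a Random Forest classifier.

```python
import numpy as np

def mfcc_like(signal, sr=22050, n_fft=1024, hop=512, n_mels=26, n_mfcc=13):
    """Simplified MFCC extraction: frame -> power spectrum -> mel filterbank -> log -> DCT-II."""
    # Frame the signal with a Hann window
    n_frames = 1 + (len(signal) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack([signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    # Power spectrum of each frame
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    # Triangular mel filterbank: filter centers equally spaced on the mel scale
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    bin_pts = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        l, c, r = bin_pts[m - 1], bin_pts[m], bin_pts[m + 1]
        fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    log_mel = np.log(power @ fbank.T + 1e-10)
    # DCT-II to decorrelate the filterbank energies; keep the first n_mfcc coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_mfcc), 2 * n + 1) / (2 * n_mels))
    return log_mel @ dct.T

# Example: MFCC-like features of one second of a synthetic 440 Hz tone
sr = 22050
t = np.arange(sr) / sr
feats = mfcc_like(np.sin(2 * np.pi * 440 * t), sr=sr)
print(feats.shape)  # → (42, 13): one 13-dimensional feature vector per frame
```

In a detection setup like the one described, each frame's coefficient vector (or statistics pooled over frames) would be labeled vocal/non-vocal and passed to the classifier.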
10

Gumuscu, Abdulkadir, Kerim Karadag, Mehmet Emin Tenekeci, and Ibrahim Berkan Aydilek. "Genetic algorithm based feature selection on diagnosis of Parkinson disease via vocal analysis." In 2017 25th Signal Processing and Communications Applications Conference (SIU). IEEE, 2017. http://dx.doi.org/10.1109/siu.2017.7960384.


Reports on the topic "Vocal feature"

1

Pedersen, Gjertrud. Symphonies Reframed. Norges Musikkhøgskole, August 2018. http://dx.doi.org/10.22501/nmh-ar.481294.

Abstract:
Symphonies Reframed recreates symphonies as chamber music. The project aims to capture the features that are unique to chamber music, at the juncture between the “soloistic small” and the “orchestral large”. A new ensemble model, the “triharmonic ensemble” with 7-9 musicians, has been created to serve this purpose. By choosing this size range, we are looking to facilitate group interplay without the need of a conductor. We also want to facilitate a richness of sound colours by involving piano, strings and winds. The exact combination of instruments is chosen in accordance with the features of the original score. The ensemble setup may take two forms: nonet with piano, wind quartet and string quartet (with double bass) or septet with piano, wind trio and string trio. As a group, these instruments have a rich tonal range with continuous and partly overlapping registers. This paper will illuminate three core questions: What artistic features emerge when changing from large orchestral structures to mid-sized chamber groups? How do the performers reflect on their musical roles in the chamber ensemble? What educational value might the reframing yield? Since its inception in 2014, the project has evolved to include works with vocal, choral and soloistic parts, as well as sonata literature. Ensembles of students and professors have rehearsed, interpreted and performed our transcriptions of works by Brahms, Schumann and Mozart. We have also carried out interviews and critical discussions with the students about their experiences of the concrete projects and their reflections on their own learning processes in general. Chamber ensembles and orchestras are exponents of different original repertoire. The difference in artistic output thus hinges upon both ensemble structure and the composition at hand. Symphonies Reframed seeks to enable an assessment of the qualities that are specific to the performing corpus and not beholden to any particular piece of music.
Our transcriptions have enabled comparisons and reflections, using original compositions as a reference point. Some of our ensemble musicians have had first-hand experience with performing the original works as well. Others have encountered the works for the first time through our productions. This has enabled a multi-angled approach to the three central themes of our research. This text was produced in 2018.