Journal articles on the topic 'Vocal feature'

Consult the top 50 journal articles for your research on the topic 'Vocal feature.'


1

Keyser, Samuel Jay, and Kenneth N. Stevens. "Feature geometry and the vocal tract." Phonology 11, no. 2 (August 1994): 207–36. http://dx.doi.org/10.1017/s0952675700001950.

Abstract:
Perhaps the most important insight in phonological theory since the introduction of the concept of the phoneme has been the role that distinctive features play in phonological theory (Jakobson et al. 1952). Most research since Jakobson's early formulation has focused on the segmental properties of these features without reference to their hierarchical organisation. Recent research, however, has shed considerable light on this latter aspect of the phoneme as a phonological unit. In his seminal article ‘The geometry of phonological features’, for example, Clements (1985), building on earlier work of scholars such as Goldsmith (1976), argues that features are not ‘bundles’ in Bloomfield's sense, but are, in fact, organised into phonological trees with each branch corresponding to what has been called a tier. An overview of the current state of feature geometry can be found in Clements & Hume (forthcoming) and Kenstowicz (1994).
2

Lv, Chaohui, Hua Lan, Ying Yu, and Shengnan Li. "Objective Evaluation Method of Broadcasting Vocal Timbre Based on Feature Selection." Wireless Communications and Mobile Computing 2022 (May 26, 2022): 1–17. http://dx.doi.org/10.1155/2022/7086599.

Abstract:
Broadcasting voice is used to convey ideas and emotions, and in the selection of broadcasting and hosting professionals, vocal timbre is an important index. Subjective evaluation is widely used, but its results carry a degree of subjectivity and uncertainty. In this paper, an objective evaluation method for broadcasting vocal timbre is proposed. Firstly, a broadcasting vocal timbre database is constructed based on Chinese phonetic characteristics. Then, a timbre feature selection strategy is presented based on the human vocal mechanism, and the broadcast timbre characteristics are divided into three categories: source parameters, vocal tract parameters, and human hearing parameters. Finally, three models, the hidden Markov model (HMM), the Gaussian Mixture Model–Universal Background Model (GMM-UBM), and long short-term memory (LSTM), are used to evaluate broadcast timbre by extracting timbre features and four timbre feature combinations. The experiments show that the selection of timbre features is scientific and effective. Moreover, the LSTM network, a deep learning approach, is more accurate in the objective evaluation of broadcast timbre than the traditional HMM and GMM-UBM, and the proposed method achieves an accuracy of about 95% on our database.
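The GMM-UBM baseline named above is a standard construction in speaker and timbre modeling. As a rough illustration only, a minimal scikit-learn sketch follows; `background_feats`, `class_feats`, and `test_feats` are assumed (frames × dimensions) arrays of timbre features, and re-fitting from UBM initialization stands in for the MAP adaptation a full implementation would use.

```python
# Minimal GMM-UBM sketch (scikit-learn); background_feats, class_feats and
# test_feats are assumed (n_frames, n_dims) timbre-feature arrays, e.g. MFCCs.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(background_feats, n_components=64):
    """Universal background model fitted on pooled speech from many speakers."""
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag",
                          max_iter=200, random_state=0)
    ubm.fit(background_feats)
    return ubm

def adapt_class_model(ubm, class_feats):
    """Class GMM initialised from the UBM (a stand-in for MAP adaptation)."""
    gmm = GaussianMixture(n_components=ubm.n_components, covariance_type="diag",
                          weights_init=ubm.weights_, means_init=ubm.means_,
                          random_state=0)
    gmm.fit(class_feats)
    return gmm

def llr_score(gmm, ubm, test_feats):
    """Average log-likelihood ratio; higher = closer to the class than background."""
    return gmm.score(test_feats) - ubm.score(test_feats)
```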
3

Wang, Fenqi, Delin Deng, and Ratree Wayland. "The acoustic profiles of vocal emotions in Japanese: A corpus study with generalized additive mixed modeling." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A60–A61. http://dx.doi.org/10.1121/10.0015548.

Abstract:
This study investigated vocal emotions in Japanese by analyzing acoustic features from emotional utterances in the Online Gaming Voice Chat Corpus with Emotional Label (Arimoto and Kawatsu, 2013). The corpus contains recorded sentences produced in 8 emotions by four native Japanese speakers who are professional actors. For acoustic feature extraction, the Praat script ProsodyPro was used. Principal component analysis (PCA) was conducted to evaluate the contribution of each acoustic feature. In addition, a linear discriminant analysis (LDA) classifier was trained on the extracted acoustic features to predict emotion category and intensity. A generalized additive mixed model (GAMM) was fitted to examine the effects of gender, emotional category, and emotional intensity on the time-normalized f0 values, and its results supported effects of all three factors. The recognition accuracy of the LDA classifier reached about 60%, suggesting that although pitch-related measures are important for differentiating vocal emotions, bio-informational features (e.g., jitter, shimmer, and harmonicity) are also informative. In addition, our correlation analysis suggested that vocal emotions are conveyed by sets of features rather than by individual features alone.
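The PCA-plus-LDA workflow described here is conventional; a hedged scikit-learn sketch follows, where `X` (utterances × acoustic features) and `y` (emotion labels) are assumed inputs rather than the corpus's actual data.

```python
# Illustrative PCA + LDA pipeline for vocal-emotion classification (scikit-learn).
# X: (n_utterances, n_features) acoustic measures, y: emotion labels -- assumed.
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

clf = make_pipeline(StandardScaler(),
                    PCA(n_components=0.95),          # keep 95% explained variance
                    LinearDiscriminantAnalysis())
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())  # chance = 1/8

clf.fit(X, y)
# Per-component contributions, as evaluated with PCA in the study:
print(clf.named_steps["pca"].explained_variance_ratio_)
```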
4

Jayanthi Kumari, T. R., and H. S. Jayanna. "i-Vector-Based Speaker Verification on Limited Data Using Fusion Techniques." Journal of Intelligent Systems 29, no. 1 (May 3, 2018): 565–82. http://dx.doi.org/10.1515/jisys-2017-0047.

Abstract:
In many biometric applications, limited-data speaker verification plays a significant role in practice-oriented systems. The performance of a speaker verification system under the limited-data condition, in which both the training and test data last only a few seconds, needs to be improved by applying suitable techniques. This article shows the importance of feature-level and score-level fusion techniques for speaker verification under this condition. The baseline system uses vocal tract features (mel-frequency cepstral coefficients and linear predictive cepstral coefficients) and excitation source features (the linear prediction residual and the linear prediction residual phase) with i-vector modeling on the NIST 2003 data set. In feature-level fusion, the vocal tract features are fused with the excitation source features; as a result, the average equal error rate (EER) is approximately 4%, an improvement over the individual feature performance. Further, two types of score-level fusion are demonstrated. In the first case, fusing the scores of vocal tract and excitation source features while keeping the modeling technique the same provides an average reduction of approximately 2% EER compared with feature-level fusion. In the second case, the scores of different modeling techniques are combined, resulting in an EER reduction of approximately 4.5% compared with score-level fusion of different features.
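Score-level fusion of the kind described reduces, in its simplest form, to a weighted sum of normalized scores. The sketch below, with assumed arrays `scores_vt` and `scores_src` (per-trial scores from the two feature systems) and binary `labels`, also shows a basic EER computation; the equal weights are an assumption, not the paper's setting.

```python
# Sketch of score-level fusion and EER evaluation; scores_vt, scores_src and
# labels (1 = target trial) are assumed arrays over the same verification trials.
import numpy as np

def znorm(s):
    return (s - s.mean()) / s.std()

def eer(scores, labels):
    """Equal error rate found by sweeping a decision threshold."""
    thresholds = np.sort(scores)
    fa = np.array([(scores[labels == 0] >= t).mean() for t in thresholds])
    fr = np.array([(scores[labels == 1] < t).mean() for t in thresholds])
    i = int(np.argmin(np.abs(fa - fr)))
    return (fa[i] + fr[i]) / 2

fused = 0.5 * znorm(scores_vt) + 0.5 * znorm(scores_src)  # equal-weight fusion
print("fused EER:", eer(fused, labels))
```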
5

Lahiri, Rimita, Md Nasir, Manoj Kumar, So Hyun Kim, Somer Bishop, Catherine Lord, and Shrikanth Narayanan. "Interpersonal synchrony across vocal and lexical modalities in interactions involving children with autism spectrum disorder." JASA Express Letters 2, no. 9 (September 2022): 095202. http://dx.doi.org/10.1121/10.0013421.

Abstract:
Quantifying behavioral synchrony can inform clinical diagnosis, long-term monitoring, and individualised interventions in neuro-developmental disorders characterized by deficits in communication and social interaction, such as autism spectrum disorder. In this work, three objective measures of interpersonal synchrony are evaluated across vocal and linguistic communication modalities. For vocal prosodic and spectral features, dynamic time warping distance and squared cosine distance of (feature-wise) complexity are used, and for lexical features, word mover's distance is applied to capture behavioral synchrony. These interpersonal vocal and linguistic synchrony measures are shown to capture complementary information that helps characterize overall behavioral patterns.
6

Wu, Yunfeng, Pinnan Chen, Yuchen Yao, Xiaoquan Ye, Yugui Xiao, Lifang Liao, Meihong Wu, and Jian Chen. "Dysphonic Voice Pattern Analysis of Patients in Parkinson’s Disease Using Minimum Interclass Probability Risk Feature Selection and Bagging Ensemble Learning Methods." Computational and Mathematical Methods in Medicine 2017 (2017): 1–11. http://dx.doi.org/10.1155/2017/4201984.

Abstract:
Analysis of quantified voice patterns is useful in the detection and assessment of dysphonia and related phonation disorders. In this paper, we first study the linear correlations between 22 voice parameters covering fundamental frequency variability, amplitude variations, and nonlinear measures. Highly correlated vocal parameters are combined using the linear discriminant analysis method. Based on probability density functions estimated with the Parzen-window technique, we propose an interclass probability risk (ICPR) method that selects the vocal parameters with small ICPR values as dominant features, and we compare it with a modified Kullback-Leibler divergence (MKLD) feature selection approach. The experimental results show that generalized logistic regression analysis (GLRA), the support vector machine (SVM), and the Bagging ensemble algorithm input with ICPR features provide better classification results than the same classifiers with MKLD-selected features. The SVM is much better at distinguishing normal vocal patterns, with a specificity of 0.8542. Among the three classification methods, the Bagging ensemble with ICPR features identifies 90.77% of vocal patterns, with the highest sensitivity of 0.9796 and the largest area under the receiver operating characteristic curve, 0.9558. These results demonstrate the effectiveness of our feature selection and pattern analysis methods for dysphonic voice detection and measurement.
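The exact ICPR definition is given in the paper; as an illustrative stand-in, the sketch below estimates Parzen-window densities for the two classes and scores a feature by their overlap, assuming `x_normal` and `x_dysphonic` are 1-D arrays of one vocal parameter per group.

```python
# Parzen-window class-overlap score as a proxy for ICPR-style feature ranking.
# x_normal, x_dysphonic: assumed 1-D arrays of one voice parameter (e.g. jitter).
import numpy as np
from scipy.stats import gaussian_kde

def class_overlap(x_normal, x_dysphonic, n_grid=512):
    """Integral of min(p0, p1): small overlap = well-separated, low-risk feature."""
    lo = min(x_normal.min(), x_dysphonic.min())
    hi = max(x_normal.max(), x_dysphonic.max())
    grid = np.linspace(lo, hi, n_grid)
    p0 = gaussian_kde(x_normal)(grid)      # Parzen estimate, Gaussian window
    p1 = gaussian_kde(x_dysphonic)(grid)
    return np.trapz(np.minimum(p0, p1), grid)

# Rank all 22 parameters by ascending overlap; keep the lowest as dominant features.
```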
7

Matassini, Lorenzo, Rainer Hegger, Holger Kantz, and Claudia Manfredi. "Analysis of vocal disorders in a feature space." Medical Engineering & Physics 22, no. 6 (July 2000): 413–18. http://dx.doi.org/10.1016/s1350-4533(00)00048-5.

8

Shimada, Yohko M. "Feature of infants' sound in overlapping vocal communication." Proceedings of the Annual Convention of the Japanese Psychological Association 75 (September 15, 2011): 2PM077. http://dx.doi.org/10.4992/pacjpa.75.0_2pm077.

9

Hoq, Muntasir, Mohammed Nazim Uddin, and Seung-Bo Park. "Vocal Feature Extraction-Based Artificial Intelligent Model for Parkinson’s Disease Detection." Diagnostics 11, no. 6 (June 11, 2021): 1076. http://dx.doi.org/10.3390/diagnostics11061076.

Abstract:
As a neurodegenerative disorder, Parkinson’s disease (PD) affects the nerve cells of the human brain. Early detection and treatment can help to relieve the symptoms of PD. Recent PD studies have extracted features from vocal disorders as a harbinger for PD detection, as patients experience vocal changes and impairments at the early stages of PD. In this study, two hybrid models based on a Support Vector Machine (SVM) integrated with a Principal Component Analysis (PCA) and a Sparse Autoencoder (SAE) are proposed to detect PD patients based on their vocal features. The first model extracts and reduces the principal components of the vocal features based on the explained variance of each feature using PCA. For the first time, the second model uses a novel Deep Neural Network (DNN), an SAE consisting of multiple hidden layers with L1 regularization, to compress the vocal features into a lower-dimensional latent space. In both models, the reduced features are fed into the SVM, which performs classification by learning hyperplanes while projecting the data into a higher dimension. Because the data are highly imbalanced, the F1-score, the Matthews Correlation Coefficient (MCC), and the Precision-Recall curve are used along with accuracy to evaluate the proposed models. With its highest accuracy of 0.935, F1-score of 0.951, and MCC of 0.788, the SAE-SVM model surpassed not only the PCA-SVM model and other standard models, including the Multilayer Perceptron (MLP), Extreme Gradient Boosting (XGBoost), K-Nearest Neighbor (KNN), and Random Forest (RF), but also two recent studies using the same dataset. Oversampling and balancing the dataset with SMOTE further boosted the performance of the models.
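The first (PCA-SVM) model follows a standard pattern; a minimal scikit-learn sketch is below, with `X` (vocal features) and binary labels `y` assumed, and the 95% variance threshold chosen for illustration rather than taken from the paper.

```python
# Illustrative PCA-SVM pipeline with the imbalance-aware metrics named above.
# X, y are assumed vocal-feature and PD-label arrays.
from sklearn.decomposition import PCA
from sklearn.metrics import f1_score, matthews_corrcoef
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
model = make_pipeline(StandardScaler(),
                      PCA(n_components=0.95),   # components by explained variance
                      SVC(kernel="rbf"))        # hyperplane in a lifted space
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print("F1:", f1_score(y_te, pred), "MCC:", matthews_corrcoef(y_te, pred))
```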
10

Zang, Lu. "Investigation on the Extraction Methods of Timbre Features in Vocal Singing Based on Machine Learning." Computational Intelligence and Neuroscience 2022 (September 17, 2022): 1–11. http://dx.doi.org/10.1155/2022/5074829.

Abstract:
With the continuous development of digital technology, music, as an important form of media, is undergoing rapid digitization, forcing the traditional music industry onto the road of digital transformation. How to retrieve music information in vocal singing automatically, effectively, and quickly has become a research topic of considerable current interest, making the field of timbre feature recognition highly relevant. With deepening research on timbre feature recognition, the study of machine learning for timbre feature extraction in vocal singing has gradually developed, and its performance advantages are significant for solving the problem of automatic music information retrieval. This paper studies the application of machine learning-based feature extraction algorithms to timbre feature extraction in vocal singing. Through analysis of machine learning and feature extraction methods, these techniques can be applied to the construction of timbre feature extraction algorithms to support automatic music information retrieval. The paper analyzes vocal singing, machine learning, and feature extraction, evaluates the performance of the method experimentally, and explains it with the relevant theoretical formulas. The results showed that the method was more accurate than the traditional method for timbre feature extraction in the vocal singing environment, with a difference of 24.27% between the two, and the proportion of satisfied users increased by 33%. The method can thus meet users' needs for timbre feature extraction in music software, with greatly improved working efficiency and user satisfaction.
11

Ma, He, Yi Zuo, Tieshan Li, and C. L. Philip Chen. "Data-Driven Decision-Support System for Speaker Identification Using E-Vector System." Scientific Programming 2020 (June 29, 2020): 1–13. http://dx.doi.org/10.1155/2020/4748606.

Abstract:
Recently, biometric authorization using fingerprints, voiceprints, and facial features has garnered considerable attention from the public with the development of recognition techniques and the popularization of the smartphone. Among such biometrics, the voiceprint identifies a person as reliably as a fingerprint while, like face recognition, operating in a noncontact mode. Speech signal processing is one of the keys to accuracy in voice recognition. Most voice-identification systems still employ the mel-scale frequency cepstrum coefficient (MFCC) as the key vocal feature, but the quality and accuracy of MFCCs depend on a prepared phrase, which restricts them to text-dependent speaker identification. In contrast, several newer features, such as the d-vector, involve a black-box process of vocal feature learning. To address these aspects, a novel data-driven approach for vocal feature extraction based on a decision-support system (DSS) is proposed in this study. Each speech signal can be transformed into a vector representing the vocal features using this DSS. The establishment of the DSS involves three steps: (i) voice data preprocessing, (ii) hierarchical cluster analysis of the inverse discrete cosine transform cepstrum coefficients, and (iii) learning the E-vector through minimization of the Euclidean metric. Comparative experiments verify the E-vectors extracted by this DSS against other vocal feature measures on both text-dependent and text-independent datasets. In the experiments containing one utterance per speaker, the average accuracy of the E-vector is approximately 1.5% higher than that of the MFCC. In the experiments containing multiple utterances per speaker, the average micro-F1 score of the E-vector is approximately 2.1% higher than that of the MFCC. The E-vector shows remarkable advantages on both the Texas Instruments/Massachusetts Institute of Technology corpus and the LibriSpeech corpus. These improvements strengthen speaker identification and enhance its usability in real-world identification tasks.
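The E-vector pipeline is the paper's contribution and is not reproduced here, but the MFCC baseline it is compared against can be sketched in a few lines with librosa (the filename is hypothetical).

```python
# Baseline MFCC extraction with librosa; "utterance.wav" is a hypothetical file.
import librosa

y, sr = librosa.load("utterance.wav", sr=16000)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)   # (13, n_frames)
vec = mfcc.mean(axis=1)                              # crude fixed-length summary
```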
12

Yogakanthi, Saiumaeswar, Christine Wools, and Susan Mathers. "Unilateral vocal cord adductor weakness: an atypical manifestation of motor neurone disease." BMJ Neurology Open 3, no. 2 (October 2021): e000205. http://dx.doi.org/10.1136/bmjno-2021-000205.

Abstract:
Background: Bulbar involvement is a recognised feature of motor neuron disease/amyotrophic lateral sclerosis (MND/ALS), both as a presenting complaint and as a consequence of advancing disease. Hoarseness and dysphonia have been associated with vocal cord abductor weakness. This is usually bilateral and has also been reported as the presenting clinical feature in a handful of patients with superoxide dismutase 1 (SOD1) gene mutations. Presentation with an isolated, unilateral vocal cord adductor weakness, however, is atypical and rare. Case: In this report, we detail the case of a 38-year-old woman with dysphonia and a family history of an SOD1 mutation. Neurological features remained confined to the territory of the left vagus nerve for the next 12 months, before a more rapid rate of disease dissemination and progression. Conclusions: This case highlights the importance of recognition of vocal cord palsy as an early manifestation of MND/ALS and the critical need for monitoring to recognise potential disease progression.
13

Sakai, Motoki. "Estimation of Heart Rate from Vocal Frequency Based on Support Vector Machine." International Journal of Advances in Scientific Research 2, no. 1 (January 30, 2016): 16. http://dx.doi.org/10.7439/ijasr.v2i1.2849.

Abstract:
Heart rate (HR) is one of the vital signs used to assess physical condition; it would be beneficial if HR could be obtained easily without special medical instruments. In this study, features of vocal frequency were used to estimate HR, because voice can easily be recorded with a common device such as a smartphone. Previous studies proposed that a support vector machine (SVM) adopting the inner product as its kernel function was efficient for estimating HR to a certain extent, but they did not examine the effectiveness of other kernel functions, such as the hyperbolic tangent. Therefore, this study identified the most effective kernel function for kernel ridge regression (KRR) and investigated which features of vocal frequency estimate HR effectively. To evaluate effectiveness, experiments were conducted with two subjects; 60 pairs of HR and voice recordings were measured per subject. Four kernel functions (the inner product, Gaussian, polynomial, and hyperbolic tangent functions) were compared, and effective features of vocal frequency were selected with the sequential feature selection (SFS) method. As a consequence, the hyperbolic tangent function worked best, and high-frequency components of the voice were efficient. However, the results also indicate that the vocal spectrum components effective for estimating HR differ depending on the prediction model.
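The kernel comparison reported above can be reproduced in outline with scikit-learn's KernelRidge, where the hyperbolic tangent kernel is named "sigmoid"; `X` (vocal frequency features) and `hr` (measured heart rates) are assumed arrays.

```python
# Comparing the four kernels from the study in kernel ridge regression.
# X (vocal frequency features) and hr (heart rates) are assumed arrays.
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

for kernel in ["linear", "rbf", "polynomial", "sigmoid"]:  # sigmoid = tanh kernel
    krr = KernelRidge(kernel=kernel, alpha=1.0)
    r2 = cross_val_score(krr, X, hr, cv=5).mean()          # default scoring: R^2
    print(kernel, round(r2, 3))
```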
14

Zhang, Xulong, Yi Yu, Yongwei Gao, Xi Chen, and Wei Li. "Research on Singing Voice Detection Based on a Long-Term Recurrent Convolutional Network with Vocal Separation and Temporal Smoothing." Electronics 9, no. 9 (September 7, 2020): 1458. http://dx.doi.org/10.3390/electronics9091458.

Abstract:
Singing voice detection, or vocal detection, is a classification task that determines whether a given audio segment contains singing voices. This task plays a very important role in vocal-related music information retrieval tasks, such as singer identification. Although humans can easily distinguish between singing and nonsinging parts, it is still very difficult for machines to do so. Most existing methods focus on audio feature engineering with classifiers, which rely on the experience of the algorithm designer. In recent years, deep learning has been widely used in computer hearing. To extract essential features that reflect the audio content and characterize the vocal context in the time domain, this study adopted a long-term recurrent convolutional network (LRCN) to realize vocal detection. The convolutional layers in the LRCN perform feature extraction, and the long short-term memory (LSTM) layer learns the time-sequence relationship. Preprocessing with singing voice and accompaniment separation and postprocessing with time-domain smoothing were combined to form a complete system. Experiments on five public datasets investigated the impacts of different features for fusion, frame size, and block size on LRCN temporal relationship learning, as well as the effects of preprocessing and postprocessing on performance. The results confirm that the proposed singing voice detection algorithm reaches the state-of-the-art level on public datasets.
15

Alhussein, Musaed, Zulfiqar Ali, Muhammad Imran, and Wadood Abdul. "Automatic Gender Detection Based on Characteristics of Vocal Folds for Mobile Healthcare System." Mobile Information Systems 2016 (2016): 1–12. http://dx.doi.org/10.1155/2016/7805217.

Abstract:
Automatic gender detection may be useful in a mobile healthcare system. For example, some pathologies, such as the vocal fold cyst, occur mainly in female patients; if an automatic gender detection method is embedded in the system, it is easier for a healthcare professional to assess the patient and prescribe appropriate medication. In the human voice production system, the contribution of the vocal folds is vital. The length of the vocal folds is gender dependent: a male speaker has longer vocal folds than a female speaker. Because of the longer vocal folds, a male voice is deeper and therefore carries more intensity. Based on this idea, a new type of time-domain acoustic feature for an automatic gender detection system is proposed in this paper. The proposed feature measures voice intensity by calculating the area under a modified voice contour to differentiate between males and females. Two different databases are used to show that the proposed feature is independent of text, spoken language, dialect region, recording system, and environment. The accuracies obtained for clean and noisy speech are 98.27% and 96.55%, respectively.
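The paper defines its own modified voice contour; as a rough sketch of the area-under-contour idea only, the snippet below integrates a frame-wise RMS energy contour with librosa and numpy (the filename is hypothetical, and RMS is a stand-in for the paper's contour).

```python
# Area under a voice intensity contour as a time-domain gender feature (sketch).
import librosa
import numpy as np

y, sr = librosa.load("speech.wav", sr=16000)      # hypothetical file
rms = librosa.feature.rms(y=y)[0]                 # frame-wise energy contour
times = librosa.times_like(rms, sr=sr)
area = np.trapz(rms, times)                       # area under the contour
# Longer (male) vocal folds -> more intensity -> larger area; a threshold
# on the area then separates the two classes.
```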
16

Kim, Keun Ho, Boncho Ku, Namsik Kang, Young-Su Kim, Jun-Su Jang, and Jong Yeol Kim. "Study of a Vocal Feature Selection Method and Vocal Properties for Discriminating Four Constitution Types." Evidence-Based Complementary and Alternative Medicine 2012 (2012): 1–10. http://dx.doi.org/10.1155/2012/831543.

Abstract:
In traditional Korean medicine, the voice has been used to classify the four constitution types and to recognize a subject's health condition by extracting meaningful physical quantities. In this paper, we propose a method for selecting reliable variables from various voice features, such as frequency derivative features, frequency band ratios, and intensity, extracted from vowels and a sentence. Further, we suggest a process for extracting independent variables by eliminating explanatory variables, reducing their correlation, and removing outlying data to enable reliable discriminant analysis. The suitable division of the data for analysis, according to the gender and age of subjects, is also discussed. Finally, the vocal features are applied in a discriminant analysis to classify each constitution type. This method of voice classification can be widely used in the u-Healthcare system of personalized medicine and for improving diagnostic accuracy.
17

Yang, Jingzhou. "Personalized Song Recommendation System Based on Vocal Characteristics." Mathematical Problems in Engineering 2022 (March 16, 2022): 1–10. http://dx.doi.org/10.1155/2022/3605728.

Abstract:
Finding favorite songs among the enormous number available has become a difficult problem. Song recommendation algorithms make personalized recommendations by analyzing a user's historical behavior, which can reduce information fatigue and improve the user experience. This paper studies a personalized song recommendation algorithm based on vocal features, in three parts. The first is spectrum and note feature extraction. The spectrum provides three types of features: time domain, frequency domain, and amplitude. These implicitly describe the rhythm, notes, and high-pitched or soothing character of songs, and automatic note recognition methods are additionally explored as explicit classification features. The characteristic of this work is the use of combined spectrum and note features as the classification basis. The second is song classification with a convolutional neural network (CNN) over a set of song categories. For training the CNN, the ELU and ReLU activations and the RMSProp and Adam optimizers were explored, and their performance and characteristics during training were compared. Classification was compared under two configurations: the spectrum alone as the classification basis, and the combined spectrum and note features as the basis. The third is a personalized song recommendation method based on CNN classification. The reasons why CNN classification is not directly suitable for song recommendation are analyzed, and a recommendation method based on song fragment classification is proposed, together with a threshold model that distinguishes pseudo-discrete from truly discrete fragments to improve the accuracy of song classification.
18

Goble, J. R., P. F. Suarez, S. K. Rogers, D. W. Ruck, C. Arndt, and M. Kabrisky. "A facial feature communications interface for the non-vocal." IEEE Engineering in Medicine and Biology Magazine 12, no. 3 (September 1993): 46–48. http://dx.doi.org/10.1109/51.232340.

19

McGlashan, J. A., and D. G. Golding-Wood. "Snoring as a presenting feature of the Shy-Drager syndrome." Journal of Laryngology & Otology 103, no. 6 (June 1989): 610–11. http://dx.doi.org/10.1017/s002221510010948x.

Abstract:
A 67-year-old man with a progressive snoring habit is presented. Fluctuant bilateral abductor vocal cord paralysis was later recognized, together with autonomic features suggesting a diagnosis of Shy-Drager syndrome. Snoring as a presenting feature of this condition has been infrequently described. This case highlights the importance of careful assessment of snorers.
20

Zhang, Xiaoyan. "Research on Modeling of Vocal State Duration Based on Spectrogram Analysis." E3S Web of Conferences 236 (2021): 04043. http://dx.doi.org/10.1051/e3sconf/202123604043.

Abstract:
In the early stages of vocal music education, students generally do not understand the structure of the human vocal apparatus and are unsure how to produce their voices scientifically. With the continuous development of computers and the great increase in processing speed, favorable conditions have emerged for applying spectrum analysis technology in vocal music teaching. In this paper, we first study the GMM-SVM and the DBN and combine them to extract a deep Gaussian supervector (DGS), and we further construct the DGCS feature on the basis of the DGS. We then study the convolutional neural network (CNN), which has achieved great success in image recognition in recent years, and design a CNN model to extract deep fusion features of vocal music. Experimental simulations show that the CNN fusion-based speaker recognition system achieves a very good recognition rate.
21

Al, Walid Abdullah, Wonjae Cha, and Il Dong Yun. "Reinforcement Learning Based Vocal Fold Localization in Preoperative Neck CT for Injection Laryngoplasty." Applied Sciences 13, no. 1 (December 25, 2022): 262. http://dx.doi.org/10.3390/app13010262.

Abstract:
Transcutaneous injection laryngoplasty is a well-known procedure for treating a paralyzed vocal fold by injecting augmentation material into it. Vocal fold localization therefore plays a vital role in preoperative planning, as the fold location is required to determine the optimal injection route. In this communication, we propose a mirror-environment reinforcement learning (RL) algorithm for localizing the right and left vocal folds in preoperative neck CT. RL-based methods have shown noteworthy results in general anatomic landmark localization in recent years. However, such methods train individual agents for localizing each fold, although the right and left vocal folds lie in close proximity and have high feature similarity. Exploiting the lateral symmetry between the two folds, the proposed mirror environment allows a single agent to localize both folds by treating the left fold as a flipped version of the right fold. Localization of both folds can thus be trained in a single session that exploits the inter-fold correlation and avoids redundant feature learning. Experiments with 120 CT volumes showed improved localization performance and training efficiency compared with the standard RL method.
22

Al-Jarrah, Leena Ali, and Mustafa Hasan Alqudah. "Vocal Harmony in Zaid Bin Ali Reading: الانسجام الصوتي في قراءة زيد بن علي." المجلة العربية للعلوم و نشر الأبحاث 7, no. 2 (June 29, 2021): 26–40. http://dx.doi.org/10.26389/ajsrp.m241120.

Abstract:
The Quran readings are a source of Arab linguistic heritage, representing various linguistic phenomena, many of them functional vocal phenomena. The reading of Zaid Bin Ali undoubtedly contains several vocal phenomena that merit deep study. The importance of this study lies in examining and analyzing the vocal phenomena whose function is vocal harmony, in order to identify their types and their influence on the structure of the word. Among the most important results: the dominant feature of the verbal performance of the reading of Zaid bin Ali is vocal harmony and lightness.
23

Day, Nancy F., Amanda K. Kinnischtzke, Murtaza Adam, and Teresa A. Nick. "Top-Down Regulation of Plasticity in the Birdsong System: “Premotor” Activity in the Nucleus HVC Predicts Song Variability Better Than It Predicts Song Features." Journal of Neurophysiology 100, no. 5 (November 2008): 2956–65. http://dx.doi.org/10.1152/jn.90501.2008.

Abstract:
We studied real-time changes in brain activity during active vocal learning in the zebra finch songbird. The song nucleus HVC is required for the production of learned song. To quantify the relationship of HVC activity and behavior, HVC population activity during repeated vocal sequences (motifs) was recorded and temporally aligned relative to the motif, millisecond by millisecond. Somewhat surprisingly, HVC activity did not reliably predict any vocal feature except amplitude and, to a lesser extent, entropy and pitch goodness (sound periodicity). Variance in “premotor” HVC activity did not reliably predict variance in behavior. In contrast, HVC activity inversely predicted the variance of amplitude, entropy, frequency, pitch, and FM. We reasoned that, if HVC was involved in song learning, the relationship of HVC activity to learned features would be developmentally regulated. To test this hypothesis, we compared the HVC song feature relationships in adults and juveniles in the sensorimotor “babbling” period. We found that the relationship of HVC activity to variance in FM was developmentally regulated, with the greatest difference at an HVC vocalization lag of 50 ms. Collectively, these data show that, millisecond by millisecond, bursts in HVC activity predict song stability on-line during singing, whereas decrements in HVC activity predict plasticity. These relationships between neural activity and plasticity may play a role in vocal learning in songbirds by enabling the selective stabilization of parts of the song that match a learned tutor model.
24

Sethi, Nitin K., Josh Torgovnick, Edward Arsura, Prahlad K. Sethi, and Anuradha Batra. "Vocal cord palsy: An uncommon presenting feature of myasthenia gravis." Annals of Indian Academy of Neurology 14, no. 1 (2011): 42. http://dx.doi.org/10.4103/0972-2327.78049.

25

Gunduz, Hakan. "Deep Learning-Based Parkinson’s Disease Classification Using Vocal Feature Sets." IEEE Access 7 (2019): 115540–51. http://dx.doi.org/10.1109/access.2019.2936564.

26

Khadivi Heris, Hossein, Babak Seyed Aghazadeh, and Mansour Nikkhah-Bahrami. "Optimal feature selection for the assessment of vocal fold disorders." Computers in Biology and Medicine 39, no. 10 (October 2009): 860–68. http://dx.doi.org/10.1016/j.compbiomed.2009.06.014.

27

Kushch, Viktoriia. "Pop-song and academic chamber vocal music: points of crossing." National Academy of Managerial Staff of Culture and Arts Herald, no. 2 (September 17, 2021): 273–78. http://dx.doi.org/10.32461/2226-3209.2.2021.240083.

Abstract:
The purpose of the article is to identify the points of crossing between pop songs and academic chamber vocal music in the Ukrainian cultural and artistic space of the second half of the 20th century. The methodology involves analytical, systemic, and historical-cultural methods to identify the relationship between the pop song genre and academic chamber vocal music in Ukrainian musical culture of the period. The scientific novelty of the work lies in characterizing I. Karabits' pop songs from the point of view of their combination of pop and academic chamber vocal features. Conclusions. Pop song and chamber vocal music, represented by the genre of solo singing, developed separately in the Ukrainian cultural space of the 1950s-1980s, but their paths often crossed. In their interaction, pop-song creativity undergoes a process of academization, while academic music takes on the qualities of the popular hit. Based on the analysis of two popular pop songs by I. Karabits, «My land is my love» and «A song for good», a specific feature of a number of the composer's vocal compositions was discovered and described: they are functionally ambivalent, corresponding to the aesthetics of both academic and pop music, and are therefore works of dual use, suited to both the academic and the pop stage. This duality rests on the musical component of a vocal work, which, with variability in the interpretation of the instrumental (and sometimes vocal) component, can bring out the features of either academic or pop music.
28

Kacur, Juraj, Boris Puterka, Jarmila Pavlovicova, and Milos Oravec. "On the Speech Properties and Feature Extraction Methods in Speech Emotion Recognition." Sensors 21, no. 5 (March 8, 2021): 1888. http://dx.doi.org/10.3390/s21051888.

Abstract:
Many speech emotion recognition systems have been designed using different features and classification methods. Still, there is a lack of knowledge and reasoning about the underlying speech characteristics and processing, i.e., how basic characteristics, methods, and settings affect accuracy, and to what extent. This study extends the physical perspective on speech emotion recognition by analyzing basic speech characteristics and modeling methods, e.g., time characteristics (segmentation, window types, and classification regions, i.e., lengths and overlaps), frequency ranges, frequency scales, processing of whole-speech (spectrogram), vocal tract (filter banks, linear prediction coefficient (LPC) modeling), and excitation (inverse LPC filtering) signals, magnitude and phase manipulations, cepstral features, etc. In the evaluation phase, a state-of-the-art classification method and rigorous statistical tests were applied, namely N-fold cross validation, the paired t-test, and rank and Pearson correlations. The results revealed several settings in a 75% accuracy range (seven emotions). The most successful methods were based on vocal tract features using psychoacoustic filter banks covering the 0-8 kHz frequency range. Spectrograms carrying vocal tract and excitation information also score well. It was found that even basic processing such as pre-emphasis, segmentation, and magnitude modification can dramatically affect the results. Most findings are robust, exhibiting strong correlations across the tested databases.
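The two best-scoring feature families, psychoacoustic filter-bank energies over 0-8 kHz for the vocal tract and LPC with inverse filtering for the excitation, can be sketched with librosa as follows; the filename is hypothetical, and a real system would compute LPC frame-wise rather than over the whole signal.

```python
# Vocal tract and excitation features in the spirit of the settings above.
import librosa
import numpy as np

y, sr = librosa.load("emotional_speech.wav", sr=16000)   # hypothetical file
fbank = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=40, fmax=8000)
logfb = librosa.power_to_db(fbank)               # log filter-bank energies, 0-8 kHz
a = librosa.lpc(y, order=12)                     # all-pole vocal tract model
residual = np.convolve(y, a, mode="same")        # inverse LPC filter ~ excitation
```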
29

Srinivasa Murthy, Y. V., and Shashidhar G. Koolagudi. "Classification of vocal and non-vocal segments in audio clips using genetic algorithm based feature selection (GAFS)." Expert Systems with Applications 106 (September 2018): 77–91. http://dx.doi.org/10.1016/j.eswa.2018.04.005.

30

Mo, Wenwen, and Yuan Yuan. "Design of Interactive Vocal Guidance and Artistic Psychological Intervention System Based on Emotion Recognition." Occupational Therapy International 2022 (June 17, 2022): 1–9. http://dx.doi.org/10.1155/2022/1079097.

Abstract:
Research on artistic psychological intervention that judges emotional fluctuations by extracting emotional features from interactive vocal signals has become a topic with great potential for development. Based on an emotion-recognition theory of interactive vocal guidance, this paper studies the design of an artistic psychological intervention system. A vocal music emotion recognition algorithm first trains an interactive recognition network, whose input is a row vector composed of different vocal features and which finally recognizes vocal music of different emotional categories, solving the problem of low data coupling in the intervention system. The emotion recognition experiments based on the interactive recognition network examine six aspects: the number of training iterations, the vocal guidance rate, the number of emotion recognition signal nodes in the artistic psychological intervention layer, the number of sample sets, different feature combinations, and the number of emotion types. The input data of the system are training-class learning videos, in which actions and expressions must be recognized before scoring. Because the sample indicators were unbalanced, the R statistical analysis tool was used to balance the data with an artificial data synthesis method, yielding 279 uniformly classified samples; the resulting 279 × 7 dataset was used for the statistical identification of participants. The experimental results show that, under four different kinds of interactive vocal guidance, the vocal emotion recognition rate lies between 65.85% and 91.00%, supporting the use of music therapy in artistic psychological intervention.
31

Javaid, Muhammad, Habib Ur Rehman Afridi, Fazali Wahid, Qaisar Khan, Naseemul Haq, and Isteraj Khan Shahabi. "Clinico-Etiological Sketch of Vocal Cord Palsy." Journal of Saidu Medical College, Swat 3, no. 2 (April 19, 2021): 359–62. http://dx.doi.org/10.52206/jsmc.2013.3.2.359-362.

Abstract:
OBJECTIVE: To determine the clinical features and causes of vocal cord paralysis in our set-up.
MATERIAL AND METHODS: This descriptive study was conducted in the Department of ENT, Head & Neck Surgery, Hayatabad Medical Complex, Peshawar from January 2010 to December 2012. All newly diagnosed patients of any age and either gender were included. After enrollment a detailed history was taken, and thorough ENT and systemic examination was conducted, especially focusing on the causes of vocal cord palsy. After routine investigation, endoscopic examination of the upper aero-digestive tract was carried out to establish the diagnosis and causes of vocal cord paralysis. All these patients were followed regularly. The data were collected on a pre-designed proforma and analyzed using SPSS version 15.
RESULTS: We studied 90 patients over a period of 3 years (2010-2012). Males outnumbered females (n=60), with a male to female ratio of 2:1. Patients were in the age range of 6-85 years, with a mean age of 47.33 ± SD 21.15 years. The commonest presentation was change of voice (100%). The majority of patients (n=52, 57.77%) were non-smokers. Hoarseness was the dominant (94.44%) presentation. The commonest cause of vocal cord palsy was idiopathic (34.44%), followed by thyroid surgery (21.11%). Left vocal cord palsy was the commonest finding.
CONCLUSION: We conclude from this study that vocal cord paralysis is still a challenge for the ENT surgeon. Although in our study the idiopathic cause of vocal cord palsy was on top, thyroidectomy contributes greatly to vocal cord paralysis, which can be further minimized if the surgeon takes time to identify the recurrent laryngeal nerve during thyroid surgery.
KEY WORDS: Vocal cord palsy, vocal fold paralysis, hoarseness, clinical feature, etiology
32

Hommel, Bernhard, and Jochen Müsseler. "Action-feature integration blinds to feature-overlapping perceptual events: Evidence from manual and vocal actions." Quarterly Journal of Experimental Psychology 59, no. 3 (March 2006): 509–23. http://dx.doi.org/10.1080/02724980443000836.

33

Palo, Hemanta Kumar, Mihir Narayan Mohanty, and Mahesh Chandra. "Speech Emotion Analysis of Different Age Groups Using Clustering Techniques." International Journal of Information Retrieval Research 8, no. 1 (January 2018): 69–85. http://dx.doi.org/10.4018/ijirr.2018010105.

Abstract:
The shape, length, and size of the vocal tract and vocal folds vary with the age of the human being, and may also vary with sickness or other conditions. Arguably, the features extracted from utterances for the recognition task may differ across age groups, and this is further complicated by different emotions. The recognition system demands suitable feature extraction and clustering techniques that can separate the emotional utterances. Psychologists, criminal investigators, professional counselors, law enforcement agencies, and a host of other such entities may find such analysis useful. In this article, emotion has been studied for three different age groups of people using basic age-dependent features: pitch, speech rate, and log energy. The feature sets have been clustered for the different age groups using the K-means and Fuzzy c-means (FCM) algorithms for the boredom, sadness, and anger states. The K-means algorithm outperformed the FCM algorithm in terms of better clustering and lower computation time, as the authors' results suggest.
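The clustering step is standard K-means over the three basic features; a minimal sketch follows, assuming `X` holds per-utterance pitch, speech rate, and log energy for one age group.

```python
# K-means clustering of (pitch, speech rate, log energy) vectors; X is assumed.
from sklearn.cluster import KMeans

km = KMeans(n_clusters=3, n_init=10, random_state=0)  # boredom / sadness / anger
cluster_ids = km.fit_predict(X)
# Comparing cluster_ids with the true emotion labels shows how separable the
# age group's emotional utterances are with these basic features.
```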
34

Garcia, Maxime, and Andrea Ravignani. "Acoustic allometry and vocal learning in mammals." Biology Letters 16, no. 7 (July 2020): 20200081. http://dx.doi.org/10.1098/rsbl.2020.0081.

Abstract:
Acoustic allometry is the study of how animal vocalizations reflect their body size. A key aim of this research is to identify outliers to acoustic allometry principles and pinpoint the evolutionary origins of such outliers. A parallel strand of research investigates species capable of vocal learning, the experience-driven ability to produce novel vocal signals through imitation or modification of existing vocalizations. Modification of vocalizations is a common feature found when studying both acoustic allometry and vocal learning. Yet, these two fields have only been investigated separately to date. Here, we review and connect acoustic allometry and vocal learning across mammalian clades, combining perspectives from bioacoustics, anatomy and evolutionary biology. Based on this, we hypothesize that, as a precursor to vocal learning, some species might have evolved the capacity for volitional vocal modulation via sexual selection for 'dishonest' signalling. We provide preliminary support for our hypothesis by showing significant associations between allometric deviation and vocal learning in a dataset of 164 mammals. Our work offers a testable framework for future empirical research linking allometric principles with the evolution of vocal learning.
35

Lindenmayer, D. B., R. B. Cunningham, and B. D. Lindenmayer. "Sound recording of bird vocalisations in forests. II. Longitudinal profiles in vocal activity." Wildlife Research 31, no. 2 (2004): 209. http://dx.doi.org/10.1071/wr02063.

Abstract:
As early morning bird vocalisation is a major feature of many bird communities, longitudinal profiles of vocal activity data, collected using sound recorders, were compared for a range of habitat types in the Tumut area of south-eastern Australia. There was a significant, and roughly linear, decline in vocal activity across the morning after an initial early peak of activity. Vocal activity persisted longer at sites located within large areas of continuous eucalypt forest than in the strip- and patch-shaped eucalypt remnants surrounded by extensive stands of radiata pine or at sites dominated by stands of radiata pine. There was evidence that the pattern of persistence of vocal activity differed among the different bird groups.
36

Zhou, Changwei, Lili Zhang, Yuanbo Wu, Xiaojun Zhang, Di Wu, and Zhi Tao. "Effects of Sulcus Vocalis Depth on Phonation in Three-Dimensional Fluid-Structure Interaction Laryngeal Models." Applied Bionics and Biomechanics 2021 (April 8, 2021): 1–11. http://dx.doi.org/10.1155/2021/6662625.

Abstract:
Sulcus vocalis is an indentation parallel to the edge of the vocal fold, which may extend into the cover and ligament layers of the vocal fold or deeper. The effects of sulcus vocalis depth d on phonation and vocal cord vibration are investigated in this study. Three-dimensional laryngeal models were established for healthy vocal folds (0 mm) and for different types of sulcus vocalis with typical depths of 1 mm, 2 mm, and 3 mm. These fluid-structure interaction (FSI) models are computed numerically by a sequential coupling method, which includes an immersed boundary method (IBM) for modelling the glottal airflow and a finite-element method (FEM) for modelling the vocal fold tissue. The results show that a deeper sulcus vocalis in the cover layer decreases the vibrating frequency of the vocal folds and expands the prephonatory glottal half-width, which increases the phonation threshold pressure. A larger sulcus vocalis depth makes it more difficult for the vocal folds to vibrate and phonate. These effects suggest that features such as the phonation threshold pressure could assist in distinguishing healthy vocal folds from different types of sulcus vocalis.
37

Moon, Seong Kyu, Moon Seung Beag, Mi Ji Lee, and Seung Woo Kim. "A Case of Glottic Amyloidosis Presenting as Leukoplakia." Korean Journal of Otorhinolaryngology-Head and Neck Surgery 65, no. 5 (May 21, 2022): 293–95. http://dx.doi.org/10.3342/kjorl-hns.2022.00038.

Abstract:
Amyloidosis is defined as a deposit of amyloid substance. While it rarely occurs in the head and neck region, it is there most commonly found in the larynx. Laryngeal amyloidosis can occur in the false vocal cord, the ventricle, and the glottis, and its typical presentation is a round yellowish submucosal mass. A 72-year-old male presented with voice change that had begun a couple of years earlier. Rigid laryngoscopy showed a whitish patch on the medial and superior surfaces of the left true vocal fold, and laryngeal microsurgery yielded a pathological diagnosis of amyloidosis. With a relevant review of the literature, we report this case as it demonstrates rare, atypical features of laryngeal amyloidosis.
38

Ravindran, Sindhu, Neoh Siew-Chin, and Hariharan Muthusamy. "Optimal Selection of Long Time Acoustic Features Using GA for the Assessment of Vocal Fold Disorders." Applied Mechanics and Materials 239-240 (December 2012): 65–70. http://dx.doi.org/10.4028/www.scientific.net/amm.239-240.65.

Abstract:
In recent times, vocal fold problems have increased dramatically due to unhealthy social habits and voice abuse. Non-invasive methods such as acoustic analysis of voice signals can be used to investigate such problems. Various feature extraction techniques are used to classify voice signals into normal and pathological; among them, long-time acoustical parameters are used by many researchers. Selecting the best long-time acoustical parameters is important to reduce computational complexity and to achieve better accuracy with a minimum number of features, and various feature reduction or feature selection methods have been proposed for this purpose. In this work, genetic algorithm (GA) based optimal selection of long-time acoustical parameters is proposed to achieve higher accuracy with a minimum number of features. The classification is carried out using a k-nearest neighbour (k-NN) classifier. In comparison with other works in the literature, the simulation results show that the GA needs a minimum of only 5 features to classify the voice signals and achieves a better accuracy of 94.29%.
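A compact illustration of GA-based feature selection with a k-NN fitness follows; the population size, operators, and sparsity penalty are assumptions made for the sketch, not the paper's settings, and `X`, `y` are assumed feature/label arrays.

```python
# Genetic-algorithm feature selection with k-NN cross-validated fitness (sketch).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

def fitness(mask, X, y):
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(KNeighborsClassifier(5),
                          X[:, mask.astype(bool)], y, cv=5).mean()
    return acc - 0.01 * mask.sum()       # small penalty favours fewer features

def ga_select(X, y, pop_size=20, gens=30, p_mut=0.05):
    n = X.shape[1]
    pop = rng.integers(0, 2, size=(pop_size, n))     # random binary feature masks
    for _ in range(gens):
        fits = np.array([fitness(ind, X, y) for ind in pop])
        pop = pop[np.argsort(fits)[::-1]]            # best first
        children = []
        while len(children) < pop_size - 2:          # keep two elites
            a, b = pop[rng.integers(0, pop_size // 2, 2)]  # parents from top half
            cut = int(rng.integers(1, n))
            child = np.concatenate([a[:cut], b[cut:]])     # one-point crossover
            flip = rng.random(n) < p_mut                   # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.vstack([pop[:2], np.array(children)])
    return pop[0].astype(bool)                       # elite mask of final generation
```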
39

DuBois, Adrienne L., Stephen Nowicki, and William A. Searcy. "Swamp sparrows modulate vocal performance in an aggressive context." Biology Letters 5, no. 2 (December 16, 2008): 163–65. http://dx.doi.org/10.1098/rsbl.2008.0626.

Abstract:
Vocal performance refers to the proficiency with which a bird sings songs that are challenging to produce, and can be measured in simple trilled songs by their deviation from an upper bound regression of frequency bandwidth on trill rate. Here, we show that male swamp sparrows (Melospiza georgiana) increase the vocal performance of individual song types in aggressive contexts by increasing both the trill rate and frequency bandwidth. These results are the first to demonstrate flexible modulation by songbirds of this aspect of vocal performance and are consistent with this signal feature having a role in aggressive communication.
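The performance measure has a simple worked form: fit a line to the upper bound of bandwidth as a function of trill rate, then take each song's shortfall from that line. A sketch follows, with `trill_rate` and `bandwidth` assumed per-song arrays and the binning details chosen for illustration.

```python
# Vocal performance as deviation from an upper-bound regression (sketch).
# trill_rate (Hz) and bandwidth (Hz) are assumed per-song measurements.
import numpy as np

def upper_bound_fit(trill_rate, bandwidth, n_bins=10):
    """Fit a line through the maximum bandwidth seen in each trill-rate bin."""
    bins = np.linspace(trill_rate.min(), trill_rate.max(), n_bins + 1)
    idx = np.clip(np.digitize(trill_rate, bins) - 1, 0, n_bins - 1)
    xs, ys = [], []
    for b in range(n_bins):
        m = idx == b
        if m.any():
            xs.append(trill_rate[m][np.argmax(bandwidth[m])])
            ys.append(bandwidth[m].max())
    return np.polyfit(xs, ys, 1)          # slope, intercept

slope, intercept = upper_bound_fit(trill_rate, bandwidth)
deviation = (slope * trill_rate + intercept) - bandwidth  # smaller = higher performance
```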
40

Castellucci, Gregg A. "Neural and cognitive mechanisms for vocal communication." Journal of the Acoustical Society of America 152, no. 4 (October 2022): A275. http://dx.doi.org/10.1121/10.0016252.

Abstract:
Vocal communication is a central feature of social behavior across numerous species. While the neural systems underlying vocalization in humans and animals are well described, less is known about how these circuits enable naturalistic vocal interactions. To investigate how the human brain gives rise to an ethologically relevant vocal interaction, conversational turn-taking, a series of intracranial recording and perturbation experiments are performed to precisely assay neural activity while neurosurgical patients engage in both task-based and unconstrained turn-taking. In these social contexts, spatially and functionally distinct networks are uncovered that are critical for a speaker's ability to comprehend their partner's turns, plan their own turns, and articulate the speech comprising those turns. To better understand the neural mechanisms underlying specific computations relevant to vocal communication, a theoretical framework is constructed consisting of the cognitive modules required for generating communicative action during interaction (e.g., vocalization, co-speech gesture). This model is designed to account for the behavioral and neurobiological features of both naturalistic human language and animal communication; this species-general framework is therefore intended to facilitate the identification of cognitive analogues between human and non-human interaction, which may rely on similar neural mechanisms. [Work supported by the NIH & Simons Collaboration on the Global Brain.]
41

Majidnezhad, Vahid, and Igor Kheidorov. "A Novel GMM-Based Feature Reduction for Vocal Fold Pathology Diagnosis." Research Journal of Applied Sciences, Engineering and Technology 11, no. 9 (February 21, 2013): 2245–54. http://dx.doi.org/10.19026/rjaset.5.4779.

42

Hariharan, M., Kemal Polat, and Sazali Yaacob. "A new feature constituting approach to detection of vocal fold pathology." International Journal of Systems Science 45, no. 8 (May 14, 2013): 1622–34. http://dx.doi.org/10.1080/00207721.2013.794905.

43

Loza, Sergiy, and Darina Kupina. "Genre stylistic features of O. Belash's chamber-vocal creativeness." Музикознавча думка Дніпропетровщини, no. 18 (November 12, 2020): 15–25. http://dx.doi.org/10.33287/222014.

Abstract:
The purpose of this article is to identify the specifics of O. Bilash's chamber vocal creativity in the context of Ukrainian musical culture of the 20th century. The work uses comparative, historiographic, analytical, genre, style, and axiological methods. Its scientific novelty lies in the first attempt to detail the specifics of Bilash's chamber-vocal output. Chamber songs form the basis of Bilash's work, and among their diversity several groups can be distinguished, chiefly works on civic themes, landscape lyrics, and intimate lyrics. Among the best features of his work are lyrical reflection, high sentiment in the best sense of the word, laconic expression, and precise psychologism. An integral feature of his chamber-vocal creativity is romance, which manifests itself on the genre, style, and intonation levels and brings his chamber-vocal works close to 19th-century song lyrics. It is the romances that are the most revealing of his individual style among the genre variety of his chamber-vocal works. Conclusions. The celebrated composer O. Bilash is distinguished by a bright melodic talent, which forms the foundation of his creative style, and his most characteristic works bear the signs of his own indigenous style. His chamber-vocal work is marked by a deep national identity, identifying itself with folk imagery and vernacular song, and by a sincere elevation of national feeling within an individual compositional style.
APA, Harvard, Vancouver, ISO, and other styles
44

Wahyuni, Ni Made Putri, Luh Arida Ayu Rahning Putri, I. Gusti Ngurah Anom Cahyadi Putra, Dewa Made Bayu Atmaja Darmawan, Made Agung Raharja, and Agus Muliantara. "Implementasi Metode K-Nearest Neighbor Dalam Mengklasifikasikan Jenis Suara Berdasarkan Jangkauan Vokal." JELIKU (Jurnal Elektronik Ilmu Komputer Udayana) 11, no. 1 (July 20, 2022): 187. http://dx.doi.org/10.24843/jlk.2022.v11.i01.p20.

Full text
Abstract:
Humans have voice characteristics with different vocal ranges: male voices are classified as Tenor, Baritone, or Bass, while female voices are classified as Soprano, Mezzo-soprano, or Alto. Determining the vocal range, especially for a singer, normally requires a vocal trainer or a musical instrument, which can be difficult to access. Therefore, a voice classification system based on vocal range was created using the Harmonic Product Spectrum (HPS) feature extraction method and the K-Nearest Neighbors (KNN) classification method, with the parameter k varied from 1 to 40. Testing achieved the highest accuracy at k = 8, namely 88.88%, demonstrating that the KNN method gives good results in classifying voice type. Keywords: Classification, Vocal range, Harmonic Product Spectrum, K-Nearest Neighbors
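The abstract describes the pipeline only at a high level. As a rough illustration, here is a minimal sketch of such a system in Python, assuming a labeled set of recordings; the HPS implementation details and the names (hps_pitch, vocal_range_features, X_train, y_train) are illustrative assumptions, not the authors' code.

```python
# Hypothetical sketch of an HPS + KNN vocal-range classifier.
# Feature extraction: Harmonic Product Spectrum pitch estimates per frame;
# classification: scikit-learn KNeighborsClassifier. Data loading is assumed.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def hps_pitch(frame, sr, n_harmonics=4):
    """Estimate the fundamental frequency of one windowed frame via HPS."""
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    hps = spectrum.copy()
    for h in range(2, n_harmonics + 1):
        # Multiply by the spectrum downsampled by factor h; harmonics align at f0.
        decimated = spectrum[::h]
        hps[:len(decimated)] *= decimated
    peak_bin = np.argmax(hps[1:]) + 1    # skip the DC bin
    return peak_bin * sr / len(frame)    # convert bin index to Hz

def vocal_range_features(signal, sr, frame_len=2048, hop=512):
    """Summarize one recording by statistics of its frame-wise pitch track."""
    pitches = np.array([hps_pitch(signal[i:i + frame_len], sr)
                        for i in range(0, len(signal) - frame_len, hop)])
    return [pitches.min(), pitches.max(), pitches.mean(), pitches.std()]

# X: one feature row per singer; y: labels such as "soprano", ..., "bass".
# k = 8 follows the paper's best result out of k = 1..40; the training and
# test sets themselves are assumed to come from labeled recordings.
# knn = KNeighborsClassifier(n_neighbors=8).fit(X_train, y_train)
# print(knn.predict(X_test))
```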
APA, Harvard, Vancouver, ISO, and other styles
45

Costantini, Giovanni, Valerio Cesarini, Pietro Di Leo, Federica Amato, Antonio Suppa, Francesco Asci, Antonio Pisani, Alessandra Calculli, and Giovanni Saggio. "Artificial Intelligence-Based Voice Assessment of Patients with Parkinson’s Disease Off and On Treatment: Machine vs. Deep-Learning Comparison." Sensors 23, no. 4 (February 18, 2023): 2293. http://dx.doi.org/10.3390/s23042293.

Full text
Abstract:
Parkinson’s Disease (PD) is one of the most common non-curable neurodegenerative diseases. Diagnosis is achieved clinically on the basis of different symptoms with considerable delays from the onset of neurodegenerative processes in the central nervous system. In this study, we investigated early and full-blown PD patients based on the analysis of their voice characteristics with the aid of the most commonly employed machine learning (ML) techniques. A custom dataset was made with hi-fi quality recordings of vocal tasks gathered from Italian healthy control subjects and PD patients, divided into early diagnosed, off-medication patients on the one hand, and mid-advanced patients treated with L-Dopa on the other. Following the current state of the art, several ML pipelines were compared using different feature selection and classification algorithms, and deep learning was also explored with a custom CNN architecture. Results show how feature-based ML and deep learning achieve comparable results in terms of classification, with KNN, SVM and naïve Bayes classifiers performing similarly, with a slight edge for KNN. Much more evident is the predominance of CFS as the best feature selector. The selected features act as relevant vocal biomarkers capable of differentiating healthy subjects, early untreated PD patients and mid-advanced L-Dopa treated patients.
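As an illustration of the kind of pipeline comparison the abstract describes, here is a minimal scikit-learn sketch. CFS, the paper's best-performing feature selector, has no scikit-learn implementation (it is commonly run in Weka), so a univariate selector stands in for it here; the synthetic data and parameter choices are likewise assumptions.

```python
# Hypothetical sketch: cross-validate several feature-selection + classifier
# pipelines on the same table of acoustic features.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

# X: acoustic features per recording; y: {healthy, early PD, mid-advanced PD}.
# Random data is used only so the sketch runs end to end.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(90, 40)), rng.integers(0, 3, size=90)

classifiers = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "NaiveBayes": GaussianNB(),
}
for name, clf in classifiers.items():
    pipe = Pipeline([
        ("select", SelectKBest(mutual_info_classif, k=10)),  # stand-in for CFS
        ("clf", clf),
    ])
    scores = cross_val_score(pipe, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")
```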
APA, Harvard, Vancouver, ISO, and other styles
46

He, Zhuo. "Vocal Music Recognition Based on Deep Convolution Neural Network." Scientific Programming 2022 (February 2, 2022): 1–10. http://dx.doi.org/10.1155/2022/7905992.

Full text
Abstract:
In order to achieve fast and accurate music technique recognition and enhancement for vocal music teaching, the paper proposes a music recognition method based on a combination of transfer learning and a CNN (convolutional neural network). First, the most standard-timbre vocal music is preprocessed by panning, flipping, rotating, and scaling, and then manually classified by vocal technique features such as breathing method, articulation method, pronunciation method, and pitch-region training. Then, based on the transfer learning method, the weight parameters of a convolutional model pretrained on a sound dataset are transferred to the sound recognition task: the convolutional and pooling layers of the pretrained model are used as feature extraction layers, the top layer is redesigned as a global average pooling layer and a Softmax output layer, and some of the convolutional layers are frozen during training. The experimental results show that the average test accuracy of the model is 86%, the training time is about half that of the original model, and the model size is only 74.2 M. The F1 values of the model are 0.88, 0.80, 0.83, and 0.85 across the four aspects of breathing method, articulation method, pronunciation method, and pitch-region training. The results indicate that the method is efficient, effective, and transferable for voice and vocal music teaching research.
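A minimal sketch of the transfer-learning setup the abstract describes, assuming spectrogram-like image inputs: a pretrained CNN serves as a frozen feature extractor, and the top is replaced with global average pooling plus a softmax layer. The choice of MobileNetV2, the input shape, and the four class labels are illustrative assumptions, not the paper's exact architecture.

```python
# Hypothetical sketch: reuse a pretrained CNN with a new GAP + softmax head.
import tensorflow as tf

base = tf.keras.applications.MobileNetV2(
    input_shape=(128, 128, 3),   # e.g., spectrogram images of vocal excerpts
    include_top=False,
    weights="imagenet",
)
base.trainable = False           # freeze the pretrained convolutional layers
                                 # (the paper freezes only some of them)

model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 technique classes
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(train_ds, validation_data=val_ds, epochs=10)  # datasets assumed
```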
APA, Harvard, Vancouver, ISO, and other styles
47

Manalu, Laura Megawaty, and Richard Jr Kapoyos. "Pembelajaran Ansambel Vokal di Era Revolusi Industri 4.0." Musikolastika: Jurnal Pertunjukan dan Pendidikan Musik 4, no. 1 (June 25, 2022): 37–49. http://dx.doi.org/10.24036/musikolastika.v4i1.83.

Full text
Abstract:
The era of Industrial Revolution 4.0 is marked by the rapid flow of information. Since the beginning of 2020, face-to-face activities have been limited due to the COVID-19 pandemic. This has had an impact on the education system, particularly in the field of vocal music, and overcoming these problems takes creativity and innovation. This study uses a qualitative, descriptive approach, that is, research that intends to describe situations or events (Basrowi, 2008). Based on its data sources, this is field research in the form of descriptive qualitative research. Vocal ensemble learning in the era of Industrial Revolution 4.0 can be carried out using several digital applications, among them the Zoom Meeting application. The video call feature and the other features of Zoom each have their own advantages, and using both offers a new paradigm for the online vocal ensemble practice learning process.
APA, Harvard, Vancouver, ISO, and other styles
48

ter Haar, Sita M., Ahana A. Fernandez, Maya Gratier, Mirjam Knörnschild, Claartje Levelt, Roger K. Moore, Michiel Vellema, Xiaoqin Wang, and D. Kimbrough Oller. "Cross-species parallels in babbling: animals and algorithms." Philosophical Transactions of the Royal Society B: Biological Sciences 376, no. 1836 (September 6, 2021): 20200239. http://dx.doi.org/10.1098/rstb.2020.0239.

Full text
Abstract:
A key feature of vocal ontogeny in a variety of taxa with extensive vocal repertoires is a developmental pattern in which vocal exploration is followed by a period of category formation that results in a mature species-specific repertoire. Vocal development preceding the adult repertoire is often called ‘babbling’, a term used to describe aspects of vocal development in species of vocal-learning birds, some marine mammals, some New World monkeys, some bats and humans. The paper summarizes the results of research on babbling in examples from five taxa and proposes a unifying definition facilitating their comparison. There are notable similarities across these species in the developmental pattern of vocalizations, suggesting that vocal production learning might require babbling. However, the current state of the literature is insufficient to confirm this suggestion. We suggest directions for future research to elucidate this issue, emphasizing the importance of (i) expanding the descriptive data and seeking species with complex mature repertoires where babbling may not occur or may occur only to a minimal extent; (ii) (quasi-)experimental research to tease apart possible mechanisms of acquisition and/or self-organizing development; and (iii) computational modelling as a methodology to test hypotheses about the origins and functions of babbling. This article is part of the theme issue ‘Vocal learning in animals and humans’.
APA, Harvard, Vancouver, ISO, and other styles
49

Buckley, Daniel P., Manuel Diaz Cadiz, Tanya L. Eadie, and Cara E. Stepp. "Acoustic Model of Perceived Overall Severity of Dysphonia in Adductor-Type Laryngeal Dystonia." Journal of Speech, Language, and Hearing Research 63, no. 8 (August 10, 2020): 2713–22. http://dx.doi.org/10.1044/2020_jslhr-19-00354.

Full text
Abstract:
Purpose This study is a secondary analysis of existing data. The goal of the study was to construct an acoustic model of perceived overall severity of dysphonia in adductory laryngeal dystonia (AdLD). We predicted that acoustic measures (a) related to voice and pitch breaks and (b) related to vocal effort would form the primary elements of a model corresponding to auditory-perceptual ratings of overall severity of dysphonia. Method Twenty inexperienced listeners evaluated the overall severity of dysphonia of speech stimuli from 19 individuals with AdLD. Acoustic features related to primary signs of AdLD (hyperadduction resulting in pitch and voice breaks) and to a potential secondary symptom of AdLD (vocal effort, measures of relative fundamental frequency) were computed from the speech stimuli. Multiple linear regression analysis was applied to construct an acoustic model of the overall severity of dysphonia. Results The acoustic model included an acoustic feature related to pitch and voice breaks and three acoustic measures derived from relative fundamental frequency; it explained 84.9% of the variance in the auditory-perceptual ratings of overall severity of dysphonia in the speech samples. Conclusions Auditory-perceptual ratings of overall severity of dysphonia in AdLD were related to acoustic features of primary signs (pitch and voice breaks, hyperadduction associated with laryngeal spasms) and were also related to acoustic features of vocal effort. This suggests that compensatory vocal effort may be a secondary symptom in AdLD. Future work to generalize this acoustic model to a larger, independent data set is necessary before clinical translation is warranted.
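The modeling step is a standard multiple linear regression of listener ratings on acoustic features. A minimal sketch, with made-up data standing in for the paper's feature set, might look like this:

```python
# Hypothetical sketch: predict perceived dysphonia severity from acoustic
# features (a pitch/voice-break measure plus relative-fundamental-frequency
# measures). Feature names and values are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 19                                   # one row per speaker with AdLD
X = rng.normal(size=(n, 4))              # [break_measure, rff_1, rff_2, rff_3]
severity = rng.uniform(0, 100, size=n)   # mean listener rating per speaker

model = LinearRegression().fit(X, severity)
r_squared = model.score(X, severity)     # the paper reports R^2 = 0.849
print(model.coef_, r_squared)
```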
APA, Harvard, Vancouver, ISO, and other styles
50

Monteiro, Ronan de Azevedo, Carolina Demetrio Ferreira, and Gilmar Perbiche-Neves. "Vocal repertoire and group-specific signature in the Smooth-billed Ani, Crotophaga ani Linnaeus, 1758 (Cuculiformes, Aves)." Papéis Avulsos de Zoologia 61 (July 30, 2021): e20216159. http://dx.doi.org/10.11606/1807-0205/2021.61.59.

Full text
Abstract:
Vocal plasticity reflects the ability of animals to vary vocalizations according to context (vocal repertoire) as well as to develop vocal convergence (a vocal group signature) through the interaction of members of social groups. This feature has been widely reported for oscine, psittacine, and trochilid birds, but little investigated in birds with innate vocalization. The Smooth-billed Ani (Crotophaga ani) is a social bird with innate vocalization that lives in groups of between two and twenty individuals. Here we analyzed the vocal repertoire of this species during group activities and further investigated the existence of a vocal group signature. The study was conducted in Southeast Brazil between May 2017 and April 2018. Two groups of Smooth-billed Anis, the Guararema and Charqueada groups, were followed, and their vocalizations were recorded and contextualized according to the behavior performed. The vocal repertoire was analyzed for its composition, context, and acoustic variables. The acoustic parameters analyzed were maximum peak frequency, maximum fundamental frequency, minimum frequency, maximum frequency, and duration. To verify the groups’ vocal signatures, we tested whether the acoustic parameters varied between the monitored groups. We recorded ten vocalizations constituting the vocal repertoire of the Smooth-billed Ani, five of which (“Ahnee”, “Whine”, “Pre-flight”, “Flight” and “Vigil”) were emitted by both groups and five of which were exclusive to the Charqueada group. There were significant differences in the acoustic parameters of the “Flight” and “Vigil” vocalizations between the groups, suggesting a vocal group signature for these sounds. We established that the Smooth-billed Ani has a diverse vocal repertoire, with variation also occurring between groups of the same population. Moreover, we found evidence of a vocal group signature in vocalizations used in the contexts of cohesion, defense, and territory maintenance.
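The group-signature test amounts to comparing each acoustic parameter between the two groups, call type by call type. A minimal sketch of one such comparison, using a nonparametric test and made-up values (the exact statistical procedure is an assumption, not taken from the paper), might look like this:

```python
# Hypothetical sketch: compare one acoustic parameter of one call type
# between the two monitored groups with a Mann-Whitney U test.
import numpy as np
from scipy.stats import mannwhitneyu

# Peak frequency (Hz) of "Flight" calls from each group (illustrative values).
guararema = np.array([2100, 2150, 2080, 2210, 2175])
charqueada = np.array([2320, 2290, 2400, 2355, 2310])

stat, p = mannwhitneyu(guararema, charqueada)
print(f"U = {stat}, p = {p:.4f}")  # a small p suggests a group-specific signature
```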
APA, Harvard, Vancouver, ISO, and other styles