A selection of scientific literature on the topic "Singing voice recognition"
Cite a source in APA, MLA, Chicago, Harvard, and other citation styles
Table of contents
Consult the lists of current articles, books, dissertations, reports, and other scientific sources on the topic "Singing voice recognition".
Next to every work in the bibliography, the "Add to bibliography" option is available. Use it, and the bibliographic reference for the chosen work will be formatted automatically in the required citation style (APA, MLA, Harvard, Chicago, Vancouver, etc.).
You can also download the full text of the scientific publication as a PDF and read an online annotation of the work, if the relevant parameters are provided in the metadata.
Journal articles on the topic "Singing voice recognition"
Wang, Xiaochen, and Tao Wang. "Voice Recognition and Evaluation of Vocal Music Based on Neural Network". Computational Intelligence and Neuroscience 2022 (May 20, 2022): 1–9. http://dx.doi.org/10.1155/2022/3466987.
Liusong, Yang, and Du Hui. "Voice Quality Evaluation of Singing Art Based on 1DCNN Model". Mathematical Problems in Engineering 2022 (July 30, 2022): 1–9. http://dx.doi.org/10.1155/2022/2074844.
Huang, Chunyuan. "Vocal Music Teaching Pharyngeal Training Method Based on Audio Extraction by Big Data Analysis". Wireless Communications and Mobile Computing 2022 (May 6, 2022): 1–11. http://dx.doi.org/10.1155/2022/4572904.
Owen, Ceri. "On Singing and Listening in Vaughan Williams's Early Songs". 19th-Century Music 40, no. 3 (2017): 257–82. http://dx.doi.org/10.1525/ncm.2017.40.3.257.
Muhathir, R. Muliono, N. Khairina, M. K. Harahap, and S. M. Putri. "Analysis Discrete Hartley Transform for the recognition of female voice based on voice register in singing techniques". Journal of Physics: Conference Series 1361 (November 2019): 012039. http://dx.doi.org/10.1088/1742-6596/1361/1/012039.
Yuan, Weitao, Boxin He, Shengbei Wang, Jianming Wang, and Masashi Unoki. "Enhanced feature network for monaural singing voice separation". Speech Communication 106 (January 2019): 1–6. http://dx.doi.org/10.1016/j.specom.2018.11.004.
Hu, Meihui, Zhiwei Xiang, and Kai Li. "Application of Artificial Intelligence Voice Technology in Radio and Television Media". Journal of Physics: Conference Series 2031, no. 1 (September 1, 2021): 012051. http://dx.doi.org/10.1088/1742-6596/2031/1/012051.
Liu, Pengfei, Wenjin Deng, Hengda Li, Jintai Wang, Yinglin Zheng, Yiwei Ding, Xiaohu Guo, and Ming Zeng. "MusicFace: Music-driven expressive singing face synthesis". Computational Visual Media 10, no. 1 (February 2023): 119–36. http://dx.doi.org/10.1007/s41095-023-0343-7.
Liu, Lilin. "The New Approach Research on Singing Voice Detection Algorithm Based on Enhanced Reconstruction Residual Network". Journal of Mathematics 2022 (February 23, 2022): 1–11. http://dx.doi.org/10.1155/2022/7987592.
Le, Dinh Son, Huy Hung Ha, Dinh Quan Nguyen, Van An Tran, and The Hung Nguyen. "Researching and designing an intelligent humanoid robot for teaching English language". Ministry of Science and Technology, Vietnam 64, no. 6 (June 25, 2022): 35–39. http://dx.doi.org/10.31276/vjst.64(6).35-39.
Dissertations on the topic "Singing voice recognition"
Regnier, Lise. "Localization, Characterization and Recognition of Singing Voices". PhD thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00687475.
Vaglio, Andrea. "Leveraging lyrics from audio for MIR". Electronic thesis or dissertation, Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAT027.
Der volle Inhalt der QuelleLyrics provide a lot of information about music since they encapsulate a lot of the semantics of songs. Such information could help users navigate easily through a large collection of songs and to recommend new music to them. However, this information is often unavailable in its textual form. To get around this problem, singing voice recognition systems could be used to obtain transcripts directly from the audio. These approaches are generally adapted from the speech recognition ones. Speech transcription is a decades-old domain that has lately seen significant advancements due to developments in machine learning techniques. When applied to the singing voice, however, these algorithms provide poor results. For a number of reasons, the process of lyrics transcription remains difficult. In this thesis, we investigate several scientifically and industrially difficult ’Music Information Retrieval’ problems by utilizing lyrics information generated straight from audio. The emphasis is on making approaches as relevant in real-world settings as possible. This entails testing them on vast and diverse datasets and investigating their scalability. To do so, a huge publicly available annotated lyrics dataset is used, and several state-of-the-art lyrics recognition algorithms are successfully adapted. We notably present, for the first time, a system that detects explicit content directly from audio. The first research on the creation of a multilingual lyrics-toaudio system are as well described. The lyrics-toaudio alignment task is further studied in two experiments quantifying the perception of audio and lyrics synchronization. A novel phonotactic method for language identification is also presented. Finally, we provide the first cover song detection algorithm that makes explicit use of lyrics information extracted from audio
Marxer, Piñón Ricard. „Audio source separation for music in low-latency and high-latency scenarios“. Doctoral thesis, Universitat Pompeu Fabra, 2013. http://hdl.handle.net/10803/123808.
This thesis proposes specific methods to address the limitations of current music source separation methods in low-latency and high-latency scenarios. First, we focus on methods with low computational cost and low latency. We propose the use of Tikhonov regularization as a method for spectrum decomposition in the low-latency context. We compare it to existing techniques in pitch estimation and tracking tasks, crucial steps in many separation methods. We then use the proposed spectrum decomposition method in low-latency separation tasks targeting singing voice, bass and drums. Second, we propose several high-latency methods that improve the separation of singing voice by modeling components that are often not accounted for, such as breathiness and consonants. Finally, we explore using temporal correlations and human annotations to enhance the separation of drums and complex polyphonic music signals.
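The low-latency spectrum-decomposition step this abstract describes has a closed form, which is part of its appeal over iterative factorization updates. The sketch below is illustrative only, assuming a fixed spectral basis; the function name, basis matrix, and regularization weight are not taken from the thesis:

```python
import numpy as np

def tikhonov_decompose(x, B, lam=0.1):
    """Decompose a magnitude spectrum x (n_bins,) onto a basis B (n_bins, n_atoms)
    via Tikhonov-regularized least squares: argmin_g ||x - B g||^2 + lam ||g||^2."""
    n_atoms = B.shape[1]
    # Closed-form normal equations: (B^T B + lam I) g = B^T x
    return np.linalg.solve(B.T @ B + lam * np.eye(n_atoms), B.T @ x)
```

Because the solution is a single linear solve per frame (and `B.T @ B + lam*I` can be pre-factorized when the basis is fixed), this fits the low-latency constraint better than iterative decompositions.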
Chung, Nien-Yu, and 鍾念佑. "Recognition of Singing Voice and Instrument Sound Using Combinations of Acoustic Features". Thesis, 2016. http://ndltd.ncl.edu.tw/handle/39449100792026503384.
National Taiwan University of Science and Technology
Department of Computer Science and Information Engineering
Academic year 104
This thesis aims to recognize the class an input sound clip belongs to. The two sound classes concerned are singing sound (with vocal singing) and instrument sound (without vocal singing). The focus of this research is on testing different combinations of the considered acoustic features in order to find the most effective feature vector for sound class recognition. The acoustic coefficients considered include mel-frequency cepstral coefficients (MFCC), pitch-detection coefficients (PDC), Chroma-extended features, and their delta coefficients. The recognition method studied is based on Gaussian mixture models (GMMs). Different numbers of mixtures, e.g. 8, 16, 32 and 64, are used to train the parameters of the GMMs. These GMMs are then used in experiments for recognizing external sound clips. In the experiments on sound frame recognition, we tried 6 different feature vectors, i.e. 6 different combinations of acoustic features. Among them, the vector MFCC plus PDC was found to have a significantly better recognition rate than MFCC alone. If the feature vector is augmented with delta values and a voting mechanism is added, the best recognition rate achieved is 71.3% for sound frame recognition. In the experiments on sound clip recognition, we tried 8 different feature vectors, i.e. 8 different combinations of acoustic features. To recognize pure-instrument sound clips, the feature vector consisting of 40 coefficients was found to be the best, with a recognition rate of 97.1%. To recognize mixed-sound clips, the feature vector consisting of 17 coefficients (MFCC+PDC) was the best, with a recognition rate of 94.7%. In terms of average recognition rate, the feature vector consisting of 40 coefficients is the best, achieving 93.8%.
Therefore, the feature vector that obtains the highest recognition rate has 40 dimensions and consists of MFCC, PDC, Chroma-extended features, and their delta values.
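The GMM-based clip decision this abstract describes can be illustrated in its simplest form, a one-mixture diagonal-covariance model per class with a maximum-likelihood decision over the clip's frames. The feature dimensionality, function names, and class labels below are assumptions for illustration, not the thesis implementation:

```python
import numpy as np

def fit_diag_gaussian(frames):
    # frames: (n_frames, n_features) acoustic features such as MFCC+PDC;
    # a one-mixture diagonal GMM reduces to one diagonal Gaussian per class
    return frames.mean(axis=0), frames.var(axis=0) + 1e-6

def clip_log_likelihood(frames, model):
    mu, var = model
    # sum of per-frame diagonal-Gaussian log densities over the whole clip
    return np.sum(-0.5 * (np.log(2 * np.pi * var) + (frames - mu) ** 2 / var))

def classify_clip(frames, models):
    # maximum-likelihood decision between the trained class models
    return max(models, key=lambda name: clip_log_likelihood(frames, models[name]))
```

With more mixtures (8, 16, 32, 64, as in the thesis), `fit_diag_gaussian` would be replaced by EM training of a full GMM; the per-clip decision rule stays the same.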
Pereira, Ana Isabel Lemos do Carmo. "The influence of singing with text and a neutral syllable on Portuguese children's vocal performance, song recognition, and use of singing voice". Doctoral thesis, 2019. http://hdl.handle.net/10362/91276.
Book chapters on the topic "Singing voice recognition"
Żwan, Paweł, Piotr Szczuko, Bożena Kostek, and Andrzej Czyżewski. "Automatic Singing Voice Recognition Employing Neural Networks and Rough Sets". In Transactions on Rough Sets IX, 455–73. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008. http://dx.doi.org/10.1007/978-3-540-89876-4_25.
Rocamora, Martín, and Alvaro Pardo. "Separation and Classification of Harmonic Sounds for Singing Voice Detection". In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications, 707–14. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-33275-3_87.
Jefferson, Ann. "The Romantic Poet and the Brotherhood of Genius". In Genius in France. Princeton University Press, 2014. http://dx.doi.org/10.23943/princeton/9780691160658.003.0006.
Conference papers on the topic "Singing voice recognition"
Zhou, Huali, Yueqian Lin, Yao Shi, Peng Sun, and Ming Li. "Bisinger: Bilingual Singing Voice Synthesis". In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389659.
Gao, Xiaoxue, Xiaohai Tian, Yi Zhou, Rohan Kumar Das, and Haizhou Li. "Personalized Singing Voice Generation Using WaveRNN". In Odyssey 2020 The Speaker and Language Recognition Workshop. ISCA, 2020. http://dx.doi.org/10.21437/odyssey.2020-36.
Huang, Wen-Chin, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, and Tomoki Toda. "The Singing Voice Conversion Challenge 2023". In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389671.
Wang, Jun-You, Hung-Yi Lee, Jyh-Shing Roger Jang, and Li Su. "Zero-Shot Singing Voice Synthesis from Musical Score". In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389711.
Liu, Ruolan, Xue Wen, Chunhui Lu, Liming Song, and June Sig Sung. "Vibrato Learning in Multi-Singer Singing Voice Synthesis". In 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2021. http://dx.doi.org/10.1109/asru51503.2021.9688029.
Suzuki, Motoyuki, Sho Tomita, and Tomoki Morita. "Lyrics Recognition from Singing Voice Focused on Correspondence Between Voice and Notes". In Interspeech 2019. ISCA, 2019. http://dx.doi.org/10.21437/interspeech.2019-1318.
Khunarsal, Peerapol, Chidchanok Lursinsap, and Thanapant Raicharoen. "Singing voice recognition based on matching of spectrogram pattern". In 2009 International Joint Conference on Neural Networks (IJCNN 2009 - Atlanta). IEEE, 2009. http://dx.doi.org/10.1109/ijcnn.2009.5179014.
Liu, Songxiang, Yuewen Cao, Dan Su, and Helen Meng. "DiffSVC: A Diffusion Probabilistic Model for Singing Voice Conversion". In 2021 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2021. http://dx.doi.org/10.1109/asru51503.2021.9688219.
Chowdhury, Anurag, Austin Cozzo, and Arun Ross. "Domain Adaptation for Speaker Recognition in Singing and Spoken Voice". In ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2022. http://dx.doi.org/10.1109/icassp43922.2022.9746111.
Yamamoto, Ryuichi, Reo Yoneyama, Lester Phillip Violeta, Wen-Chin Huang, and Tomoki Toda. "A Comparative Study of Voice Conversion Models With Large-Scale Speech and Singing Data: The T13 Systems for the Singing Voice Conversion Challenge 2023". In 2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). IEEE, 2023. http://dx.doi.org/10.1109/asru57964.2023.10389779.