Doctoral dissertations on the topic "Apprentissage automatique – Musique"
The top 22 doctoral dissertations on the topic "Apprentissage automatique – Musique".
Fradet, Nathan. "Apprentissage automatique pour la modélisation de musique symbolique". Electronic Thesis or Diss., Sorbonne université, 2024. https://accesdistant.sorbonne-universite.fr/login?url=https://theses-intra.sorbonne-universite.fr/2024SORUS037.pdf.
Symbolic music modeling (SMM) covers the tasks performed by deep learning models on the symbolic music modality, including music generation and music information retrieval. SMM is often handled with sequential models that process data as sequences of discrete elements called tokens. This thesis studies how symbolic music can be tokenized, and how the different ways of doing so affect model performance and efficiency. Current challenges include the lack of software to perform this step, poor model efficiency and inexpressive tokens. We address these challenges by: 1) developing a complete, flexible and easy-to-use software library for tokenizing symbolic music; 2) analyzing the impact of various tokenization strategies on model performance; 3) increasing the performance and efficiency of models by leveraging large music vocabularies with the use of byte pair encoding; 4) building the first large-scale model for symbolic music generation
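As an illustration of point 3 in the abstract above, byte pair encoding applied to symbolic music tokens can be sketched as follows. This is a minimal sketch, not the thesis's library: the token names (`Pitch_60`, `Dur_4`), the `+` separator for merged tokens, and the function names are all hypothetical choices made for this example.

```python
from collections import Counter


def most_frequent_pair(sequences):
    """Count adjacent token pairs over all sequences; return the most common pair."""
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts.most_common(1)[0][0] if counts else None


def merge_pair(seq, pair, new_token):
    """Replace every non-overlapping occurrence of `pair` in `seq` with `new_token`."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(new_token)
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_bpe(sequences, num_merges):
    """Learn up to `num_merges` merges; return the merge list and re-encoded sequences."""
    merges = []
    for _ in range(num_merges):
        pair = most_frequent_pair(sequences)
        if pair is None:
            break
        new_token = pair[0] + "+" + pair[1]  # hypothetical naming for merged tokens
        merges.append(pair)
        sequences = [merge_pair(seq, pair, new_token) for seq in sequences]
    return merges, sequences


# Hypothetical token sequences: a pitch token followed by a duration token per note.
corpus = [
    ["Pitch_60", "Dur_4", "Pitch_64", "Dur_4", "Pitch_60", "Dur_4"],
    ["Pitch_60", "Dur_4", "Pitch_67", "Dur_2"],
]
merges, encoded = learn_bpe(corpus, num_merges=1)
print(merges)
print(encoded)
```

Each merge adds one token to the vocabulary and shortens the sequences that contain the merged pair, which is the trade-off between vocabulary size and sequence length that the abstract refers to.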
Jacques, Céline. "Méthodes d'apprentissage automatique pour la transcription automatique de la batterie". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS150.
This thesis focuses on learning methods for automatic drum transcription. They are based on a transcription algorithm using a non-negative decomposition method, NMD. This thesis raises two main issues: the adaptation of methods to the analyzed signal and the use of deep learning. Information about the analyzed signal can be taken into account in the model by introducing it during the decomposition steps. A first approach is to reformulate the decomposition step in a probabilistic context to facilitate the introduction of a posteriori information with methods such as SI-PLCA and statistical NMD. A second approach is to implement an adaptation strategy directly in the NMD: applying modelable filters to the patterns to model the recording conditions, or adapting the learned patterns directly to the signal while applying strong constraints to preserve their physical meaning. The second issue concerns the selection of the signal segments to be analyzed. It is best to analyze segments where at least one percussive event occurs. An onset detector based on a convolutional neural network (CNN) is adapted to detect only percussive onsets. Since the results obtained are very promising, the detector is then trained to detect a single instrument, which allows the three main drum instruments to be transcribed with three CNNs. Finally, the use of a multi-output CNN is studied to transcribe the drum part with a single network
Cont, Arshia. "Modélisation de l'anticipation musicale : du temps de la musique vers la musique du temps". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2008. http://tel.archives-ouvertes.fr/tel-00417565.
In addressing the first question, we introduce a mathematical framework called music information geometry, which combines information theory, differential geometry, and statistical learning to represent the relevant content of musical information. The second question is treated as a problem of automatically learning decision strategies in an environment, using interactive learning methods. For the third question, we propose a new formulation of the problem of real-time synchronization between a symbolic score and a musician. This leads us to Antescofo, a preliminary tool for writing time and interaction in computer music. Despite the variety of topics covered in this thesis, anticipatory design is the common thread among all the proposals, with the premise of reducing the structural and computational complexity of modeling and of helping to tackle complex problems in computer music.
Essid, Slim. "Classification automatique des signaux audio-fréquences : reconnaissance des instruments de musique". Phd thesis, Université Pierre et Marie Curie - Paris VI, 2005. http://pastel.archives-ouvertes.fr/pastel-00002738.
Rousseaux, Francis. "Une contribution de l'intelligence artificielle et de l'apprentissage symbolique automatique à l'élaboration d'un modèle d'enseignement de l'écoute musicale". Phd thesis, Université Pierre et Marie Curie - Paris VI, 1990. http://tel.archives-ouvertes.fr/tel-00417579.
This is how the theme becomes an object of study and research; from this perspective, however, it is necessary to take into account the state of the art in computer music and to listen to the needs expressed by musicians, in order to gain a foothold in a genuine community of interest between the two disciplines.
In any case, music is an abstract object for which several representations exist, none of them complete or general, and each with specific properties. Moreover, these representations tend to evolve, to be born and to die according to musicians' needs, even if the sonic representation remains essential and, by definition, inseparable from the abstract object. It must nevertheless be admitted that musical sound is not alone in evoking music, and that if humans feel the need to invent representations in order to better appropriate the musical phenomenon, it may be enriching to examine how this behaviour transposes to machines.
One can certainly isolate one of these representations, translate it into computational form and dedicate tools to it: this is how many computer systems approach music. But there is an approach more typical of artificial intelligence, which consists in trying to reach the abstract object through the whole set of its representations and their relations: for a computer system, showing intelligence in this context means using this diversity and multiplicity of representations; it means knowing how to rely on a shifting reality and to move within a universe of abstractions.
But representations take on meaning only with those who communicate through them, and with the activities they give rise to. One can then imagine a system that would constitute a genuine place of encounter, reflection and creation, in a word, of communication: for music is above all a medium of communication. But what is the nature of what can be communicated through such a system? For example, one could practise musical skills, experiment with new relationships between representations, in a word, appropriate the musical medium itself.
But then we need a system that can bear witness to these encounters, or more precisely one that learns to bear witness to them; this is our definition of learning in this context: a system is said to learn if it bears witness to, and possibly adapts to, a universe of musical communication. Without this requirement, the value of the communication is lost: the participants leave the system with their new wealth, whatever the success of the mediation. The challenge for a learning system is therefore to return a testimony to musicians, teachers and computer scientists so that they can benefit from it; naturally, this testimony will be required to produce useful knowledge, rather than merely accumulating events or historically ordered facts.
Thus, through open-ended teaching, students will be able to grasp and experiment with the musical medium, enrich their knowledge and obtain explanations. Teachers will create and organize this mediation and provide pedagogical oracles to the system. But artificial intelligence and symbolic machine learning are the sciences of explanation: the cognitive dimension must be brought into play so that the adequacy of the meeting place can be assessed; one must place oneself at the heart of the needs and concerns of teachers and students, attempting to formalize cognitive theories of music. One could even invent representations with a cognitive and explanatory purpose: in the long run, a system built on such a model might well be capable of making discoveries in this field itself.
Bayle, Yann. "Apprentissage automatique de caractéristiques audio : application à la génération de listes de lecture thématiques". Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0087/document.
This doctoral dissertation presents, discusses and proposes tools for automatic information retrieval in big musical databases. The main application is the supervised classification of musical themes to generate thematic playlists. The first chapter introduces the different contexts and concepts around big musical databases and their consumption. The second chapter focuses on the description of existing music databases as part of academic experiments in audio analysis. This chapter notably introduces issues concerning the variety and unequal proportions of the themes contained in a database, which remain complex to take into account in supervised classification. The third chapter explains the importance of extracting and developing relevant audio features in order to better describe the content of music tracks in these databases. This chapter explains several psychoacoustic phenomena and uses sound signal processing techniques to compute audio features. New methods of aggregating local audio features are proposed to improve song classification. The fourth chapter describes the use of the extracted audio features to sort songs by theme and thus to enable music recommendation and the automatic generation of homogeneous thematic playlists. This part involves the use of machine learning algorithms to perform music classification tasks. The contributions of this dissertation are summarized in the fifth chapter, which also proposes research perspectives in machine learning and the extraction of multi-scale audio features
Bel, Bernard. "Acquisition et représentation de connaissances en musique". Phd thesis, Aix-Marseille 3, 1990. http://tel.archives-ouvertes.fr/tel-00009692.
Carsault, Tristan. "Introduction of musical knowledge and qualitative analysis in chord extraction and prediction tasks with machine learning. : application to human-machine co-improvisation". Electronic Thesis or Diss., Sorbonne université, 2020. http://www.theses.fr/2020SORUS247.
This thesis investigates the impact of introducing musical properties in machine learning models for the extraction and inference of musical features. Furthermore, it discusses the use of musical knowledge to perform qualitative evaluations of the results. In this work, we focus on musical chords since these mid-level features are frequently used to describe harmonic progressions in Western music. Hence, amongst the variety of tasks encountered in the field of Music Information Retrieval (MIR), the two main tasks that we address are Automatic Chord Extraction (ACE) and the inference of symbolic chord sequences. In the case of musical chords, there exist inherent strong hierarchical and functional relationships. Indeed, even if two chords do not belong to the same class, they can share the same harmonic function within a chord progression. Hence, we developed a specifically tailored analyzer that focuses on the functional relations between chords to distinguish strong and weak errors. We define weak errors as misclassifications that still preserve relevance in terms of harmonic function. This reflects the fact that, in contrast to strict transcription tasks, the extraction of high-level musical features is a rather subjective task. Moreover, many creative applications would benefit from a higher level of harmonic understanding rather than an increased accuracy of label classification. For instance, one of our application cases is the development of software that interacts with a musician in real time by inferring expected chord progressions. In order to achieve this goal, we divided the project into two main tasks: a listening module and a symbolic generation module. The listening module extracts the musical structure played by the musician, whereas the generative module predicts musical sequences based on the extracted features.
In the first part of this thesis, we target the development of an ACE system that could emulate the process of musical structure discovery, as performed by musicians in improvisation contexts. Most ACE systems are built on the idea of extracting features from raw audio signals and then using these features to construct a chord classifier. This entails two major families of approaches, either rule-based or statistical models. In this work, we identify drawbacks in the use of statistical models for ACE tasks. Then, we propose to introduce prior musical knowledge in order to account for the inherent relationships between chords directly inside the loss function of learning methods. In the second part of this thesis, we focus on learning higher-level relationships inside sequences of extracted chords in order to develop models with the ability to generate potential continuations of chord sequences. In order to introduce musical knowledge in these models, we propose new architectures, multi-label training methods and novel data representations
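The distinction between weak and strong errors described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the `FUNCTION` table below is a hypothetical, simplified mapping of the diatonic triads of C major to tonic (T), subdominant (S) and dominant (D) functions, and the chord label format is invented for the example; neither is taken from the thesis's analyzer.

```python
from collections import Counter

# Hypothetical mapping from chord labels (diatonic triads of C major) to a
# harmonic function. An illustrative assumption, not the thesis's analyzer.
FUNCTION = {
    "C:maj": "T", "A:min": "T", "E:min": "T",
    "F:maj": "S", "D:min": "S",
    "G:maj": "D", "B:dim": "D",
}


def classify_error(reference, predicted):
    """Label a prediction as correct, weak (same function) or strong error."""
    if reference == predicted:
        return "correct"
    ref_function = FUNCTION.get(reference)
    if ref_function is not None and ref_function == FUNCTION.get(predicted):
        return "weak"
    return "strong"


def error_summary(references, predictions):
    """Count correct predictions, weak errors and strong errors over a sequence."""
    return Counter(classify_error(r, p) for r, p in zip(references, predictions))


references = ["C:maj", "F:maj", "G:maj", "C:maj"]
predictions = ["C:maj", "D:min", "C:maj", "A:min"]
summary = error_summary(references, predictions)
print(summary)
```

Under this toy mapping, predicting `D:min` for `F:maj` counts as a weak error because both chords are labeled subdominant, while predicting `C:maj` for `G:maj` is a strong error. Chords missing from the table are always treated as strong errors when mispredicted.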
Nistal, Hurlé Javier. "Exploring generative adversarial networks for controllable musical audio synthesis". Electronic Thesis or Diss., Institut polytechnique de Paris, 2022. http://www.theses.fr/2022IPPAT009.
Audio synthesizers are electronic musical instruments that generate artificial sounds under some parametric control. While synthesizers have evolved since they were popularized in the 70s, two fundamental challenges are still unresolved: 1) the development of synthesis systems responding to semantically intuitive parameters; 2) the design of "universal," source-agnostic synthesis techniques. This thesis researches the use of Generative Adversarial Networks (GAN) towards building such systems. The main goal is to research and develop novel tools for music production that afford intuitive and expressive means of sound manipulation, e.g., by controlling parameters that respond to perceptual properties of the sound and other high-level features. Our first work studies the performance of GANs when trained on various common audio signal representations (e.g., waveform, time-frequency representations). These experiments compare different forms of audio data in the context of tonal sound synthesis. Results show that the Magnitude and Instantaneous Frequency of the phase and the complex-valued Short-Time Fourier Transform achieve the best results. Building on this, our following work presents DrumGAN, a controllable adversarial audio synthesizer of percussive sounds. By conditioning the model on perceptual features describing high-level timbre properties, we demonstrate that intuitive control can be gained over the generation process. This work results in the development of a VST plugin generating full-resolution audio and compatible with any Digital Audio Workstation (DAW). We show extensive musical material produced by professional artists from Sony ATV using DrumGAN. The scarcity of annotations in musical audio datasets challenges the application of supervised methods to conditional generation settings. Our third contribution employs a knowledge distillation approach to extract such annotations from a pre-trained audio tagging system.
DarkGAN is an adversarial synthesizer of tonal sounds that employs the output probabilities of such a system (so-called “soft labels”) as conditional information. Results show that DarkGAN can respond moderately to many intuitive attributes, even with out-of-distribution input conditioning. Applications of GANs to audio synthesis typically learn from fixed-size two-dimensional spectrogram data analogously to the "image data" in computer vision; thus, they cannot generate sounds with variable duration. In our fourth paper, we address this limitation by exploiting a self-supervised method for learning discrete features from sequential data. Such features are used as conditional input to provide step-wise time-dependent information to the model. Global consistency is ensured by fixing the input noise z (characteristic in adversarial settings). Results show that, while models trained on a fixed-size scheme obtain better audio quality and diversity, ours can competently generate audio of any duration. One interesting direction for research is the generation of audio conditioned on preexisting musical material, e.g., the generation of some drum pattern given the recording of a bass line. Our fifth paper explores a simple pretext task tailored at learning such types of complex musical relationships. Concretely, we study whether a GAN generator, conditioned on highly compressed MP3 musical audio signals, can generate outputs resembling the original uncompressed audio. Results show that the GAN can improve the quality of the audio signals over the MP3 versions for very high compression rates (16 and 32 kbit/s). As a direct consequence of applying artificial intelligence techniques in musical contexts, we ask how AI-based technology can foster innovation in musical practice. 
Therefore, we conclude this thesis by providing a broad perspective on the development of AI tools for music production, informed by theoretical considerations and reports from real-world AI tool usage by professional artists
Françoise, Jules. "Motion-sound Mapping By Demonstration". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066105/document.
Designing the relationship between motion and sound is essential to the creation of interactive systems. This thesis proposes an approach to the design of the mapping between motion and sound called Mapping-by-Demonstration. Mapping-by-Demonstration is a framework for crafting sonic interactions from demonstrations of embodied associations between motion and sound. It draws upon existing literature emphasizing the importance of bodily experience in sound perception and cognition. It uses an interactive machine learning approach to build the mapping iteratively from user demonstrations. Drawing upon related work in the fields of animation, speech processing and robotics, we propose to fully exploit the generative nature of probabilistic models, from continuous gesture recognition to continuous sound parameter generation. We studied several probabilistic models in the light of continuous interaction. We examined both instantaneous (Gaussian Mixture Model) and temporal models (Hidden Markov Model) for recognition, regression and parameter generation. We adopted an Interactive Machine Learning perspective with a focus on learning sequence models from few examples, and continuously performing recognition and mapping. The models either focus on movement, or integrate a joint representation of motion and sound. In movement models, the system learns the association between the input movement and an output modality that might be gesture labels or movement characteristics. In motion-sound models, we model motion and sound jointly, and the learned mapping directly generates sound parameters from input movements. We explored a set of applications and experiments relating to real-world problems in movement practice, sonic interaction design, and music. We proposed two approaches to movement analysis based on Hidden Markov Model and Hidden Markov Regression, respectively.
We showed, through a use case in Tai Chi performance, how the models help characterize movement sequences across trials and performers. We presented two generic systems for movement sonification. The first system allows users to craft hand gesture control strategies for the exploration of sound textures, based on Gaussian Mixture Regression. The second system exploits the temporal modeling of Hidden Markov Regression for associating vocalizations with continuous gestures. Both systems led to interactive installations that we presented to a wide public, and we started investigating their potential to support gesture learning
Afchar, Darius. "Interpretable Music Recommender Systems". Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS608.
‘‘Why do they keep recommending me this music track?’’ ‘‘Why did our system recommend these tracks to users?’’ Nowadays, streaming platforms are the most common way to listen to recorded music. Still, music recommendations, which are at the heart of these platforms, are not an easy feat. Sometimes, both users and engineers may be equally puzzled about the behaviour of a music recommendation system (MRS). MRS have been successfully employed to help explore catalogues that may be as large as tens of millions of music tracks. Built and optimised for accuracy, real-world MRS often end up being quite complex. They may further rely on a range of interconnected modules that, for instance, analyse audio signals, retrieve metadata about albums and artists, collect and aggregate user feedback on the music service, and compute item similarities with collaborative filtering. All this complexity hinders the ability to explain recommendations and, more broadly, explain the system. Yet, explanations are essential for users to foster a long-term engagement with a system that they can understand (and forgive), and for system owners to rationalise failures and improve said system. Interpretability may also be needed to check the fairness of a decision or can be framed as a means to control the recommendations better. Moreover, we could also recursively question: Why does an explanation method explain in a certain way? Is this explanation relevant? What could be a better explanation? All these questions relate to the interpretability of MRSs. In the first half of this thesis, we explore the many flavours that interpretability can have in various recommendation tasks.
Indeed, since there is not just one recommendation task but many (e.g., sequential recommendation, playlist continuation, artist similarity), as well as many angles through which music may be represented and processed (e.g., metadata, audio signals, embeddings computed from listening patterns), there are as many settings that require specific adjustments to make explanations relevant. A topic like this one can never be exhaustively addressed. This study was guided along some of the mentioned modalities of musical objects: interpreting implicit user logs, item features, audio signals and similarity embeddings. Our contribution includes several novel methods for eXplainable Artificial Intelligence (XAI) and several theoretical results, shedding new light on our understanding of past methods. Nevertheless, similar to how recommendations may not be interpretable, explanations about them may themselves lack interpretability and justifications. Therefore, in the second half of this thesis, we found it essential to take a step back from the rationale of ML and try to address a (perhaps surprisingly) understudied question in XAI: ‘‘What is interpretability?’’ Introducing concepts from philosophy and social sciences, we stress that there is a misalignment in the way explanations from XAI are generated and unfold versus how humans actually explain. We highlight that current research tends to rely too much on intuitions or hasty reduction of complex realities into convenient mathematical terms, which leads to the canonisation of assumptions into questionable standards (e.g., sparsity entails interpretability). We have treated this part as a comprehensive tutorial addressed to ML researchers to better ground their knowledge of explanations with a precise vocabulary and a broader perspective. We provide practical advice and highlight less popular branches of XAI better aligned with human cognition. 
Of course, we also reflect back and recontextualise our methods proposed in the previous part. Overall, this enables us to formulate some perspective for our field of XAI as a whole, including its more critical and promising next steps as well as its shortcomings to overcome
Cohen-Hadria, Alice. "Estimation de descriptions musicales et sonores par apprentissage profond". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS607.
In Music Information Retrieval (MIR) and voice processing, the use of machine learning tools has become more and more standard in the last few years. In particular, many state-of-the-art systems now rely on neural networks. In this thesis, we propose a wide overview of four different MIR and voice processing tasks, using systems built with neural networks. More precisely, we use convolutional neural networks, a class of neural networks designed for images. The first task presented is music structure estimation. For this task, we show how critical the choice of input representation can be when using convolutional neural networks. The second task is singing voice detection. We present how to use a voice detection system to automatically align lyrics and audio tracks. With this alignment mechanism, we have created the largest synchronized audio and speech data set, called DALI. Singing voice separation is the third task. For this task, we present a data augmentation strategy, a way to significantly increase the size of a training set. Finally, we tackle voice anonymization. We present an anonymization method that both obfuscates content and masks the speaker identity, while preserving the acoustic scene
Scurto, Hugo. "Designing With Machine Learning for Interactive Music Dispositifs". Electronic Thesis or Diss., Sorbonne université, 2019. http://www.theses.fr/2019SORUS356.
Music is a cultural and creative practice that enables humans to express a variety of feelings and intentions through sound. Machine learning opens many prospects for designing human expression in interactive music systems. Yet, as a Computer Science discipline, machine learning remains mostly studied from an engineering sciences perspective, which often excludes humans and musical interaction from the loop of the created systems. In this dissertation, I argue in favour of designing with machine learning for interactive music systems. I claim that machine learning must be first and foremost situated in human contexts to be researched and applied to the design of interactive music systems. I present four interdisciplinary studies that support this claim, using human-centred methods and model prototypes to design and apply machine learning to four situated musical tasks: motion-sound mapping, sonic exploration, synthesis exploration, and collective musical interaction. Through these studies, I show that model prototyping helps envision designs of machine learning with human users before engaging in model engineering. I also show that the final human-centred machine learning systems not only help humans create static musical artifacts, but also support dynamic processes of expression between humans and machines. I call co-expression these processes of musical interaction between humans, who may have an expressive and creative impetus regardless of their expertise, and machines, whose learning abilities may be perceived as expressive by humans. In addition to these studies, I present five applications of the created model prototypes to the design of interactive music systems, which I publicly demonstrated in workshops, exhibitions, installations, and performances.
Using a reflexive approach, I argue that the musical contributions enabled by such design practice with machine learning may ultimately complement the scientific contributions of human-centred machine learning. I claim that music research can thus be led through dispositif design, that is, through the technical realization of aesthetically-functioning artifacts that challenge cultural norms on computer science and music
Crestel, Léopold. "Neural networks for automatic musical projective orchestration". Electronic Thesis or Diss., Sorbonne université, 2018. http://www.theses.fr/2018SORUS625.
Orchestration is the art of composing a musical discourse over a combinatorial set of instrumental possibilities. For centuries, musical orchestration has only been addressed in an empirical way, as a scientific theory of orchestration appears elusive. In this work, we attempt to build the first system for automatic projective orchestration, relying on machine learning. Hence, we start by formalizing this novel task. We focus our effort on projecting a piano piece onto a full symphonic orchestra, in the style of notable classical composers such as Mozart or Beethoven. The first objective is thus to design a system of live orchestration, which takes as input the sequence of chords played by a pianist and generates its orchestration in real time. Afterwards, we relax the real-time constraints in order to use slower but more powerful models and to generate scores in a non-causal way, which is closer to the writing process of a human composer. By observing a large dataset of orchestral music written by composers and their reduction for piano, we hope to be able to capture through statistical learning methods the mechanisms involved in the orchestration of a piano piece. Deep neural networks seem to be a promising lead for their ability to model complex behaviour from a large dataset and in an unsupervised way. More specifically, in the challenging context of symbolic music, which is characterized by a high-dimensional target space and few examples, we investigate autoregressive models. At the price of a slower generation process, autoregressive models make it possible to account for more complex dependencies between the different elements of the score, which we believe to be of the foremost importance in the case of orchestration
Louboutin, Corentin. "Modélisation multi-échelle et multi-dimensionnelle de la structure musicale par graphes polytopiques". Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S012/document.
In this thesis, we approach these questions by defining and implementing a multi-scale model for music segment structure description, called Polytopic Graph of Latent Relations (PGLR). In our work, a segment is the macroscopic constituent of the global piece. In pop songs, which are the main focus here, segments usually correspond to a chorus or a verse, lasting approximately 15 seconds and exhibiting a clear beginning and end. Under the PGLR scheme, relationships between musical elements within a musical segment are assumed to develop predominantly between homologous elements within the metrical grid at different scales simultaneously. This approach generalises to the multi-scale case the System&Contrast framework, which aims at describing, as a 2×2 square matrix, the logical system of expectation within a segment and the surprise resulting from that expectation. For regular segments of 2^n events, the PGLR lives on an n-dimensional cube (square, cube, tesseract, etc.), n being the number of scales considered simultaneously in the multi-scale model. Each vertex in the polytope corresponds to a low-scale musical element, each edge represents a relationship between two vertices and each face forms an elementary system of relationships. The estimation of the PGLR structure of a musical segment can then be obtained computationally as the joint estimation of: the description of the polytope (as a more or less regular n-polytope); the nesting configuration of the graph over the polytope, reflecting the flow of dependencies and interactions as elementary implication systems within the musical segment; and the set of relations between the nodes of the graph. The aim of the PGLR model is both to describe the time dependencies between the elements of a segment and to model the logical expectation and surprise that can be built on the observation and perception of the similarities and differences between elements with strong relationships.
The approach is presented conceptually and algorithmically, together with an extensive evaluation of the ability of different models to predict unseen data, measured using the cross-perplexity value. These experiments have been conducted on chord sequences and on rhythmic and melodic segments extracted from the RWC POP corpus. Our results illustrate the efficiency of the proposed model in capturing structural information within such data.
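The geometry described in this abstract (2^n events placed on the vertices of an n-dimensional cube, with edges as pairwise relationships and faces as elementary systems) can be illustrated with a small sketch. It assumes, purely for illustration, that events are numbered in temporal order and that two events are linked when their binary indices differ in exactly one bit; the function name `hypercube` is hypothetical and this is not the thesis's implementation.

```python
from itertools import combinations


def hypercube(n):
    """Return vertices, edges and square faces of the n-cube.

    Vertices are event indices 0 .. 2**n - 1 (assumed temporal order).
    """
    vertices = list(range(2 ** n))
    # Edge: two vertices whose binary indices differ in exactly one bit,
    # i.e. homologous events at one metrical scale.
    edges = [(v, v | (1 << b)) for v in vertices for b in range(n) if not v & (1 << b)]
    # Face: four vertices obtained by toggling two distinct bits of a base vertex.
    faces = []
    for v in vertices:
        for a, b in combinations(range(n), 2):
            if not v & (1 << a) and not v & (1 << b):
                faces.append((v, v | (1 << a), v | (1 << b), v | (1 << a) | (1 << b)))
    return vertices, edges, faces


vertices, edges, faces = hypercube(3)
print(len(vertices), len(edges), len(faces))  # 8 12 6
```

For a segment of 8 events this gives the ordinary cube; each face, such as events (0, 1, 2, 3), is one 2×2 square of the kind the System&Contrast framework describes.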
Foroughmand, Aarabi Hadrien. "Towards global tempo estimation and rhythm-oriented genre classification based on harmonic characteristics of rhythm". Electronic Thesis or Diss., Sorbonne université, 2021. http://www.theses.fr/2021SORUS018.
Automatic detection of the rhythmic structure within music is one of the challenges of the "Music Information Retrieval" research area. The advent of technology dedicated to the arts has allowed the emergence of new musical trends generally described by the term "Electronic/Dance Music" (EDM), which encompasses a plethora of sub-genres. This type of music, often dedicated to dance, is characterized by its rhythmic structure. We propose a rhythmic analysis of what defines certain musical genres, including those of EDM. To do so, we address an automatic global tempo estimation task and a genre classification task based on rhythm. Tempo and genre are two intertwined aspects, since genres are often associated with rhythmic patterns that are played in specific tempo ranges. Some so-called "handcrafted" tempo estimation systems, based on the extraction of rhythm-related characteristics, have been shown to be effective. Recently, with the appearance of annotated databases, so-called "data-driven" systems and deep learning approaches have shown progress on these tasks. In this thesis, we propose methods at the crossroads between "handcrafted" and "data-driven" systems. The development of a new representation of rhythm, combined with deep learning using convolutional neural networks, is the basis of all our work. We present our Deep Rhythm method in detail in this thesis, and we also present several extensions based on musical intuitions that allow us to improve our results.
Françoise, Jules. "Motion-sound Mapping By Demonstration". Electronic Thesis or Diss., Paris 6, 2015. http://www.theses.fr/2015PA066105.
Designing the relationship between motion and sound is essential to the creation of interactive systems. This thesis proposes an approach to the design of the mapping between motion and sound called Mapping-by-Demonstration. Mapping-by-Demonstration is a framework for crafting sonic interactions from demonstrations of embodied associations between motion and sound. It draws upon existing literature emphasizing the importance of bodily experience in sound perception and cognition. It uses an interactive machine learning approach to build the mapping iteratively from user demonstrations. Drawing upon related work in the fields of animation, speech processing and robotics, we propose to fully exploit the generative nature of probabilistic models, from continuous gesture recognition to continuous sound parameter generation. We studied several probabilistic models in the light of continuous interaction. We examined both instantaneous (Gaussian Mixture Model) and temporal (Hidden Markov Model) models for recognition, regression and parameter generation. We adopted an Interactive Machine Learning perspective with a focus on learning sequence models from few examples, and continuously performing recognition and mapping. The models either focus on movement, or integrate a joint representation of motion and sound. In movement models, the system learns the association between the input movement and an output modality that might be gesture labels or movement characteristics. In motion-sound models, we model motion and sound jointly, and the learned mapping directly generates sound parameters from input movements. We explored a set of applications and experiments relating to real-world problems in movement practice, sonic interaction design, and music. We proposed two approaches to movement analysis based on Hidden Markov Model and Hidden Markov Regression, respectively.
We showed, through a use case in Tai Chi performance, how the models help characterize movement sequences across trials and performers. We presented two generic systems for movement sonification. The first system allows users to craft hand gesture control strategies for the exploration of sound textures, based on Gaussian Mixture Regression. The second system exploits the temporal modeling of Hidden Markov Regression for associating vocalizations to continuous gestures. Both systems led to interactive installations that we presented to a wide public, and we started investigating their potential to support gesture learning.
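Gaussian Mixture Regression, named in this abstract as the basis of the first sonification system, can be sketched in a few lines. The example below is a minimal one-dimensional illustration, assuming a joint Gaussian mixture over one motion feature x and one sound parameter y has already been fitted; the component values and the function names are hypothetical, and the thesis's actual models are multidimensional.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)


def gmr_predict(x, components):
    """Expected sound parameter given motion feature x under a joint 2-D GMM.

    Each component is (weight, mean_x, mean_y, var_x, cov_xy); values are hypothetical.
    """
    # Responsibility of each component for the observed motion value.
    resp = [w * gaussian_pdf(x, mx, vx) for w, mx, _, vx, _ in components]
    total = sum(resp)
    # Mix the per-component linear regressions, weighted by responsibility.
    return sum(
        (r / total) * (my + cxy / vx * (x - mx))
        for r, (_, mx, my, vx, cxy) in zip(resp, components)
    )


components = [
    (0.5, 0.0, 100.0, 1.0, 0.5),    # e.g. a "low hand" gesture mapped near 100
    (0.5, 10.0, 400.0, 1.0, -0.5),  # e.g. a "high hand" gesture mapped near 400
]
print(gmr_predict(5.0, components))  # 252.5
```

Because the prediction is a smooth blend of per-component regressions, the sound parameter varies continuously as the gesture moves between the demonstrated regions.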
Cifka, Ondrej. "Deep learning methods for music style transfer". Electronic Thesis or Diss., Institut polytechnique de Paris, 2021. http://www.theses.fr/2021IPPAT029.
Recently, deep learning methods have enabled transforming musical material in a data-driven manner. The focus of this thesis is on a family of tasks which we refer to as (one-shot) music style transfer, where the goal is to transfer the style of one musical piece or fragment onto another. In the first part of this work, we focus on supervised methods for symbolic music accompaniment style transfer, aiming to transform a given piece by generating a new accompaniment for it in the style of another piece. The method we have developed is based on supervised sequence-to-sequence learning using recurrent neural networks (RNNs) and leverages a synthetic parallel (pairwise aligned) dataset generated for this purpose using existing accompaniment generation software. We propose a set of objective metrics to evaluate the performance on this new task and we show that the system is successful in generating an accompaniment in the desired style while following the harmonic structure of the input. In the second part, we investigate a more basic question: the role of positional encodings (PE) in music generation using Transformers. In particular, we propose stochastic positional encoding (SPE), a novel form of PE capturing relative positions while being compatible with a recently proposed family of efficient Transformers. We demonstrate that SPE allows for better extrapolation beyond the training sequence length than the commonly used absolute PE. Finally, in the third part, we turn from symbolic music to audio and address the problem of timbre transfer. Specifically, we are interested in transferring the timbre of an audio recording of a single musical instrument onto another such recording while preserving the pitch content of the latter.
We present a novel method for this task, based on an extension of the vector-quantized variational autoencoder (VQ-VAE), along with a simple self-supervised learning strategy designed to obtain disentangled representations of timbre and pitch. As in the first part, we design a set of objective metrics for the task. We show that the proposed method is able to outperform existing ones.
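For readers unfamiliar with the VQ-VAE mentioned above, its defining step is replacing each encoder output with the nearest entry of a learned codebook. The sketch below shows only that standard quantization step on toy two-dimensional vectors; the codebook values and the function name `quantize` are hypothetical, and the extension proposed in the thesis is not reproduced here.

```python
def quantize(vector, codebook):
    """Return (index, code) of the codebook entry closest to `vector`
    in squared Euclidean distance."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    index = min(range(len(codebook)), key=lambda i: sq_dist(vector, codebook[i]))
    return index, codebook[index]


# Hypothetical 3-entry codebook and three encoder outputs.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
encoder_outputs = [(0.9, 0.1), (0.1, 0.8), (0.2, 0.1)]
indices = [quantize(z, codebook)[0] for z in encoder_outputs]
print(indices)  # [1, 2, 0]
```

The decoder then sees only the selected codes, which is what makes the latent representation discrete.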
Cazau, Dorian. "Automatic Music Transcription based on Prior Knowledge from Musical Acoustics. Application to the repertoires of the Marovany zither of Madagascar". Thesis, Paris 6, 2015. http://www.theses.fr/2015PA066640/document.
Ethnomusicology is the study of musics around the world that emphasizes their cultural, social, material, cognitive and/or biological dimensions. This PhD subject, initiated by Prof. Marc CHEMILLIER, ethnomusicologist at the CAMS-EHESS laboratory, deals with the development of an automatic transcription system dedicated to the repertoires of the traditional marovany zither from Madagascar. These repertoires are orally transmitted, resulting from a process of memorization/transformation of original base musical motives. These motives represent an important cultural heritage, and are evolving continually under the influences of other musical practices and genres, mainly due to globalization. Current ethnomusicological studies aim at understanding the evolution of the traditional repertoire through the transformation of its original base motives, and at preserving this heritage. Our objectives serve this cause by providing computational tools of musical analysis to organize and structure audio recordings of this instrument. Automatic Music Transcription (AMT) consists in automatically estimating the notes in a recording, through three attributes: onset time, duration and pitch. In the long run, AMT systems, with the purpose of retrieving meaningful information from complex audio, could be used in a variety of user scenarios such as searching and organizing music collections with barely any human labor. One common denominator of our different approaches to the task of AMT lies in the use of explicit music-related prior knowledge in our computational systems. A step of this PhD thesis was then to develop tools to generate this information automatically. We chose not to restrict ourselves to a specific prior knowledge class, and rather to explore the multi-modal characteristics of musical signals, including both timbre (i.e. modeling of the generic "morphological" features of the sound related to the physics of an instrument, e.g.
intermodulation, sympathetic resonances, inharmonicity) and musicological (e.g. harmonic transition, playing dynamics, tempo and rhythm) classes. This prior knowledge can then be used in computational transcription systems. The research work on AMT performed in this PhD can be divided into a more "applied research" part (axis 1), with the development of ready-to-use operational transcription tools meeting the current needs of ethnomusicologists to get reliable automatic transcriptions, and a more "basic research" part (axis 2), providing deeper insight into the functioning of these tools. Our axis of research requires a transcription accuracy high enough (i.e. an average F-measure above 95% with standard error tolerances) to provide analytical support for musicological studies. Despite great enthusiasm for AMT challenges, and several audio-to-MIDI converters available commercially, perfect polyphonic AMT systems are out of reach of today's algorithms. In this PhD, we explore the use of multichannel capturing sensory systems for AMT of several acoustic plucked string instruments, including the following traditional African zithers: the marovany (Madagascar), the Mvet (Cameroon), the N'Goni (Mali). These systems use multiple string-dependent sensors to retrieve, string by string, some physical features of their vibrations. For the AMT task, such a system has an obvious advantage, as it allows breaking down a polyphonic musical signal into the sum of monophonic signals, one for each string.
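The accuracy target quoted in this abstract (an average F-measure above 95% with standard error tolerances) refers to note-level scoring. A minimal sketch of such a score follows, assuming notes are (onset in seconds, MIDI pitch) pairs, an onset tolerance of 50 ms, and simple greedy matching; the tolerance value, the data and the function name `note_f_measure` are illustrative assumptions, not taken from the thesis.

```python
def note_f_measure(reference, estimated, onset_tol=0.05):
    """Note-level precision, recall and F-measure.

    Notes are (onset_seconds, midi_pitch) pairs. An estimated note is a hit when
    it has the same pitch as a still-unmatched reference note and its onset lies
    within onset_tol seconds of it.
    """
    unmatched = list(reference)
    hits = 0
    for onset, pitch in estimated:
        for ref in unmatched:
            if ref[1] == pitch and abs(ref[0] - onset) <= onset_tol:
                unmatched.remove(ref)  # each reference note can be matched once
                hits += 1
                break
    precision = hits / len(estimated) if estimated else 0.0
    recall = hits / len(reference) if reference else 0.0
    f = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f


# Hypothetical ground truth and transcription output.
reference = [(0.00, 60), (0.50, 64), (1.00, 67), (1.50, 72)]
estimated = [(0.02, 60), (0.51, 64), (1.20, 67)]
print(note_f_measure(reference, estimated))
```

Here two of the three estimated notes fall within tolerance, so precision is 2/3, recall is 2/4 and the F-measure is 4/7. Evaluation toolkits typically use an optimal rather than greedy matching, which can differ when several candidate notes fall within the tolerance window.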
Roche, Fanny. "Music sound synthesis using machine learning : Towards a perceptually relevant control space". Thesis, Université Grenoble Alpes, 2020. http://www.theses.fr/2020GRALT034.
One of the main challenges of the synthesizer market and the research in sound synthesis nowadays lies in proposing new forms of synthesis allowing the creation of brand new sonorities while offering musicians more intuitive and perceptually meaningful controls to help them reach the perfect sound more easily. Indeed, today's synthesizers are very powerful tools that provide musicians with a considerable amount of possibilities for creating sonic textures, but the control of parameters still lacks user-friendliness and may require some expert knowledge about the underlying generative processes. In this thesis, we are interested in developing and evaluating new data-driven machine learning methods for music sound synthesis allowing the generation of brand new high-quality sounds while providing high-level perceptually meaningful control parameters. The first challenge of this thesis was thus to characterize the musical synthetic timbre by identifying a set of perceptual verbal descriptors that are both frequently and consensually used by musicians. Two perceptual studies were then conducted: a free verbalization test enabling us to select eight different commonly used terms for describing synthesizer sounds, and a semantic scale analysis enabling us to quantitatively evaluate the use of these terms to characterize a subset of synthetic sounds, as well as analyze how consensual they were. In a second phase, we investigated the use of machine learning algorithms to extract a high-level representation space with interesting interpolation and extrapolation properties from a dataset of sounds, the goal being to relate this space with the perceptual dimensions identified earlier. Following previous studies interested in using deep learning for music sound synthesis, we focused on autoencoder models and carried out an extensive comparative study of several kinds of autoencoders on two different datasets.
These experiments, together with a qualitative analysis made with a non-real-time prototype developed during the thesis, allowed us to validate the use of such models, and in particular the use of the variational autoencoder (VAE), as relevant tools for extracting a high-level latent space in which we can navigate smoothly and create new sounds. However, so far, no link between this latent space and the perceptual dimensions identified by the perceptual tests emerged naturally. As a final step, we thus tried to enforce perceptual supervision of the VAE by adding a regularization during the training phase. Using the subset of synthetic sounds used in the second perceptual test and the corresponding perceptual grades along the eight perceptual dimensions provided by the semantic scale analysis, it was possible to constrain, to a certain extent, some dimensions of the VAE high-level latent space so as to match these perceptual dimensions. A final comparative test was then conducted in order to evaluate the efficiency of this additional regularization for conditioning the model and (partially) leading to a perceptual control of music sound synthesis.
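The idea of adding a perceptual regularization to VAE training can be made concrete with a per-example loss. The sketch below is only one plausible form, assuming a squared-error reconstruction term, the closed-form KL divergence to a standard normal prior, and a penalty that pulls the leading latent means toward the perceptual grades; the abstract does not specify the regularizer, so the penalty, the weights `beta` and `lam`, and the function name are all hypothetical.

```python
import math


def regularized_vae_loss(x, x_hat, mu, log_var, grades, beta=1.0, lam=1.0):
    """Reconstruction + KL + perceptual penalty for one example.

    All arguments are lists of floats. `grades` holds perceptual scores for the
    first len(grades) latent dimensions (hypothetical supervision scheme).
    """
    recon = sum((a - b) ** 2 for a, b in zip(x, x_hat))
    # Closed-form KL divergence between N(mu, diag(exp(log_var))) and N(0, I).
    kl = 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv for m, lv in zip(mu, log_var))
    # Hypothetical regularizer: pull the leading latent means toward the grades.
    perceptual = sum((m - g) ** 2 for m, g in zip(mu, grades))
    return recon + beta * kl + lam * perceptual


loss = regularized_vae_loss(
    x=[1.0, 2.0], x_hat=[1.0, 2.0], mu=[1.0, 0.0], log_var=[0.0, 0.0], grades=[0.5]
)
print(loss)  # 0.75
```

With `lam` set to zero the expression reduces to the usual VAE objective, which makes it easy to compare regularized and unregularized training.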
Grégoire, Laurent. "L' émergence et l'évolution du caractère obligatoire des automatismes cognitifs". Phd thesis, Université de Bourgogne, 2013. http://tel.archives-ouvertes.fr/tel-01015620.
Bertin-Mahieux, Thierry. "Apprentissage statistique pour l'étiquetage de musique et la recommandation". Thèse, 2009. http://hdl.handle.net/1866/7214.