Dissertations / Theses on the topic 'Music information processing'

To see the other types of publications on this topic, follow the link: Music information processing.

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 42 dissertations / theses for your research on the topic 'Music information processing.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Al-Shakarchi, Ahmad. "Scalable audio processing across heterogeneous distributed resources : an investigation into distributed audio processing for Music Information Retrieval." Thesis, Cardiff University, 2013. http://orca.cf.ac.uk/47855/.

Full text
Abstract:
Audio analysis algorithms and frameworks for Music Information Retrieval (MIR) are expanding rapidly, providing new ways to discover non-trivial information from audio sources, beyond that which can be ascertained from unreliable metadata such as ID3 tags. MIR is a broad field and many aspects of the algorithms and analysis components that are used are more accurate given a larger dataset for analysis, and often require extensive computational resources. This thesis investigates if, through the use of modern distributed computing techniques, it is possible to design an MIR system that is scalable as the number of participants increases, which adheres to copyright laws and restrictions, whilst at the same time enabling access to a global database of music for MIR applications and research. A scalable platform for MIR analysis would be of benefit to the MIR and scientific community as a whole. A distributed MIR platform that encompasses the creation of MIR algorithms and workflows, their distribution, results collection and analysis, is presented in this thesis. The framework, called DART - Distributed Audio Retrieval using Triana - is designed to facilitate the submission of MIR algorithms and computational tasks against either remotely held music and audio content, or audio provided and distributed by the MIR researcher. Initially a detailed distributed DART architecture is presented, along with simulations to evaluate the validity and scalability of the architecture. The idea of a parameter sweep experiment to find the optimal parameters of the Sub-Harmonic Summation (SHS) algorithm is presented, in order to test the platform and use it to perform useful and real-world experiments that contribute new knowledge to the field. 
DART is tested on various pre-existing distributed computing platforms and the feasibility of creating a scalable infrastructure for workflow distribution is investigated throughout the thesis, along with the different workflow distribution platforms that could be integrated into the system. The DART parameter sweep experiments begin on a small scale, working up towards the goal of running experiments on thousands of nodes, in order to truly evaluate the scalability of the DART system. The result of this research is a functional and scalable distributed MIR research platform that is capable of performing real world MIR analysis, as demonstrated by the successful completion of several large scale SHS parameter sweep experiments across a variety of different input data - using various distribution methods - and through finding the optimal parameters of the implemented SHS algorithm. DART is shown to be highly adaptable both in terms of the distributed MIR analysis algorithm, as well as the distribution
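The Sub-Harmonic Summation (SHS) algorithm whose parameters the DART experiments sweep can be sketched in a few lines. This is an illustrative simplification, not the thesis's implementation: the toy spectrum, the harmonic count and the decay weight `h` are assumptions, but they are exactly the kind of parameters such a sweep would explore.

```python
# Minimal SHS sketch: score each candidate fundamental by summing
# spectral magnitude at its integer harmonics with decaying weights.

def shs_score(spectrum, f0, n_harmonics=5, h=0.8):
    """Sum spectral magnitude at integer multiples of f0, each
    harmonic k weighted by a decaying factor h**(k-1)."""
    return sum(h ** (k - 1) * spectrum.get(k * f0, 0.0)
               for k in range(1, n_harmonics + 1))

def estimate_pitch(spectrum, candidates, **kw):
    """Return the candidate fundamental with the highest SHS score."""
    return max(candidates, key=lambda f0: shs_score(spectrum, f0, **kw))

# Toy spectrum: strong partials at 100, 200, 300 Hz (fundamental 100 Hz),
# plus an unrelated peak at 150 Hz.
spectrum = {100: 1.0, 200: 0.8, 300: 0.6, 150: 0.9}
best = estimate_pitch(spectrum, candidates=[50, 100, 150])
print(best)  # 100 -- the harmonics reinforce the true fundamental
```

A parameter sweep of the kind described would simply re-run `estimate_pitch` over a grid of `n_harmonics` and `h` values and compare accuracy against ground truth.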
APA, Harvard, Vancouver, ISO, and other styles
2

Suyoto, Iman S. H., and ishs@ishs net. "Cross-Domain Content-Based Retrieval of Audio Music through Transcription." RMIT University. Computer Science and Information Technology, 2009. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20090527.092841.

Full text
Abstract:
Research in the field of music information retrieval (MIR) is concerned with methods to effectively retrieve a piece of music based on a user's query. An important goal in MIR research is the ability to successfully retrieve music stored as recorded audio using note-based queries. In this work, we consider the searching of musical audio using symbolic queries. We first examined the effectiveness of using a relative pitch approach to represent queries and pieces. Our experimental results revealed that this technique, while effective, is optimal when the whole tune is used as a query. We then suggested an algorithm involving the use of pitch classes in conjunction with the longest common subsequence algorithm between a query and target, also using the whole tune as a query. We also proposed an algorithm that works effectively when only a small part of a tune is used as a query. The algorithm makes use of a sliding window in addition to pitch classes and the longest common subsequence algorithm between a query and target. We examined the algorithm using queries based on the beginning, middle, and ending parts of pieces. We performed experiments on an audio collection and manually-constructed symbolic queries. Our experimental evaluation revealed that our techniques are highly effective, with most queries used in our experiments being able to retrieve a correct answer in the first rank position. In addition, we examined the effectiveness of duration-based features for improving retrieval effectiveness over the use of pitch only. We investigated note durations and inter-onset intervals. For this purpose, we used solely symbolic music so that we could focus on the core of the problem. A relative pitch approach alongside a relative duration representation were used in our experiments. Our experimental results showed that durations fail to significantly improve retrieval effectiveness, whereas inter-onset intervals significantly improve retrieval effectiveness.
3

Byron, Timothy Patrick. "The processing of pitch and temporal information in relational memory for melodies." View thesis, 2008. http://handle.uws.edu.au:8081/1959.7/37492.

Full text
Abstract:
Thesis (Ph.D.) -- University of Western Sydney, 2008.
A thesis submitted to the University of Western Sydney, College of Arts, School of Psychology, in fulfilment of the requirements for the degree of Doctor of Philosophy. Includes bibliographical references.
4

Meinz, Elizabeth J. "Musical experience, musical knowledge and age effects on memory for music." Thesis, Georgia Institute of Technology, 1996. http://hdl.handle.net/1853/30881.

Full text
5

Montecchio, Nicola. "Alignment and Identification of Multimedia Data: Application to Music and Gesture Processing." Doctoral thesis, Università degli studi di Padova, 2012. http://hdl.handle.net/11577/3422091.

Full text
Abstract:
The overwhelming availability of large multimedia collections poses increasingly challenging research problems regarding the organization of, and access to data. A general consensus has been reached in the Information Retrieval community, asserting the need for tools that move past metadata-based techniques and exploit directly the information contained in the media. At the same time, interaction with content has evolved beyond the traditional passive enjoyment paradigm, bringing forth the demand for advanced control and manipulation options. The aim of this thesis is to investigate techniques for multimedia data alignment and identification. In particular, music audio streams and gesture-capture time series are considered. Special attention is given to the efficiency of the proposed approaches, namely the realtime applicability of alignment algorithms and the scalability of identification strategies. The concept of alignment refers to the identification and matching of corresponding substructures in related entities. The focus of this thesis is directed towards alignment of sequences with respect to a single dimension, aiming at the identification and matching of significant events in related time series. The alignment of audio recordings of music to their symbolic representations serves as a starting point to explore different methodologies based on statistical models. A unified model for the real time alignment of music audio streams to both symbolic scores and audio references is proposed. Its advantages are twofold: unlike most state-of-the-art systems, tempo is an explicit parameter within the stochastic framework; moreover, both alignment problems can be formulated within a common framework by exploiting a continuous representation of the reference content. A novel application of audio alignment techniques was found in the domain of studio recording productions, reducing the human effort spent in manual repetitive tasks. 
Gesture alignment is closely related to the domain of music alignment, as the artistic aims and engineering solutions of both areas largely overlap. Expressivity in gesture performance can be characterized by both the choice of a particular gesture and the way the gesture is executed. The former aspect involves a gesture recognition task, while the latter is addressed by considering the time-evolution of features and the way these differ from pre-recorded templates. A model, closely related to the music alignment strategy mentioned above, is proposed, capable of simultaneously recognizing a gesture among many templates and aligning it against the correct reference in realtime, while jointly estimating signal features such as rotation, scaling and velocity. Due to the increasingly large volume of music collections, the organization of media items according to their perceptual characteristics has become of fundamental importance. In particular, content-based identification technologies provide the tools to retrieve and organize music documents. Music identification techniques should ideally be able to identify a recording -- by comparing it against a set of known recordings -- independently of the particular performance, even in the case of significantly different arrangements and interpretations. Even though alignment techniques play a central role in much of the music identification literature, the proposed methodology addresses the task using techniques usually associated with textual IR. Similarity computation is based on hashing, attempting to create collisions between vectors that are close in the feature space. The resulting compactness of the representation of audio content allows index-based retrieval strategies to be exploited for maximizing computational efficiency. A particular application is considered, regarding Cultural Heritage preservation institutions. 
A methodology is proposed to automatically identify recordings in collections of digitized tapes and vinyl discs. This scenario differs significantly from that of a typical identification task, as a query most often contains more than one relevant result (distinct music work). The audio alignment methodology mentioned above is finally exploited to carry out a precise segmentation of recordings into their individual tracks.
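The hashing idea above, deliberately creating collisions between feature vectors that are close in feature space so that an inverted index replaces exhaustive comparison, can be sketched with simple quantization. This is a generic sketch, not the thesis's descriptor: the feature values and bucket width are illustrative, and real systems add multiple offset hash tables to handle vectors that straddle a bucket boundary.

```python
def coarse_hash(features, bucket=0.25):
    """Quantize each feature dimension; vectors falling in the same
    grid cell collide and share an index entry."""
    return tuple(int(x // bucket) for x in features)

def build_index(collection, bucket=0.25):
    """Map each hash cell to the recordings whose features land there."""
    index = {}
    for track_id, features in collection.items():
        index.setdefault(coarse_hash(features, bucket), []).append(track_id)
    return index

def identify(index, features, bucket=0.25):
    """Look up only the colliding candidates instead of scanning everything."""
    return index.get(coarse_hash(features, bucket), [])

# Toy 2-D feature vectors for three known recordings.
collection = {"tape_A": [0.10, 0.60], "tape_B": [0.90, 0.10], "tape_C": [0.40, 0.40]}
index = build_index(collection)
print(identify(index, [0.12, 0.55]))  # a slightly different digitization of tape_A
```

The compactness of the hash keys is what makes index-based retrieval scale: lookup cost depends on the number of colliding candidates, not on the size of the collection.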
6

Sanden, Christopher, and University of Lethbridge Faculty of Arts and Science. "An empirical evaluation of computational and perceptual multi-label genre classification on music / Christopher Sanden." Thesis, Lethbridge, Alta. : University of Lethbridge, Dept. of Mathematics and Computer Science, c2010, 2010. http://hdl.handle.net/10133/2602.

Full text
Abstract:
Automatic music genre classification is a high-level task in the field of Music Information Retrieval (MIR). It refers to the process of automatically assigning genre labels to music for various tasks, including, but not limited to, categorization, organization and browsing. This is a topic which has seen an increase in interest recently as one of the cornerstones of MIR. However, due to the subjective and ambiguous nature of music, traditional single-label classification is inadequate. In this thesis, we study multi-label music genre classification from perceptual and computational perspectives. First, we design a set of perceptual experiments to investigate the genre-labelling behavior of individuals. The results from these experiments lead us to speculate that multi-label classification is more appropriate for classifying music genres. Second, we design a set of computational experiments to evaluate multi-label classification algorithms on music. These experiments not only support our speculation but also reveal which algorithms are more suitable for music genre classification. Finally, we propose and examine a group of ensemble approaches for combining multi-label classification algorithms to further improve classification performance.
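The ensemble idea, combining the label sets proposed by several multi-label classifiers, can be illustrated with simple majority voting. The voting rule and the toy predictions below are assumptions for illustration; the thesis evaluates a group of such combination schemes.

```python
from collections import Counter

def ensemble_vote(label_sets, threshold=0.5):
    """Keep a genre label if at least `threshold` of the base
    classifiers proposed it for the track."""
    votes = Counter(label for labels in label_sets for label in labels)
    n = len(label_sets)
    return {label for label, count in votes.items() if count / n >= threshold}

# Three base multi-label classifiers disagree on one track:
predictions = [{"rock", "pop"}, {"rock"}, {"rock", "jazz"}]
print(ensemble_vote(predictions))  # {'rock'}
```

Lowering `threshold` trades precision for recall: a permissive vote keeps more of the ambiguous genre labels that motivate the multi-label setting in the first place.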
viii, 87 leaves ; 29 cm
7

Fiebrink, Rebecca. "An exploration of feature selection as a tool for optimizing musical genre classification /." Thesis, McGill University, 2006. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=99372.

Full text
Abstract:
The computer classification of musical audio can form the basis for systems that allow new ways of interacting with digital music collections. Existing music classification systems suffer, however, from inaccuracy as well as poor scalability. Feature selection is a machine-learning tool that can potentially improve both accuracy and scalability of classification. Unfortunately, there is no consensus on which feature selection algorithms are most appropriate or on how to evaluate the effectiveness of feature selection. Based on relevant literature in music information retrieval (MIR) and machine learning and on empirical testing, the thesis specifies an appropriate evaluation method for feature selection, employs this method to compare existing feature selection algorithms, and evaluates an appropriate feature selection algorithm on the problem of musical genre classification. The outcomes include an increased understanding of the potential for feature selection to benefit MIR and a new technique for optimizing one type of classification-based system.
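A minimal filter-style feature selection step of the kind compared in the thesis can be sketched as a per-feature relevance ranking. The Fisher-style score and the toy data below are illustrative assumptions; the thesis's point is precisely that choosing and evaluating such algorithms is non-trivial.

```python
def fisher_score(values_a, values_b):
    """Between-class separation over within-class spread for one feature."""
    mean = lambda v: sum(v) / len(v)
    var = lambda v: sum((x - mean(v)) ** 2 for x in v) / len(v)
    ma, mb = mean(values_a), mean(values_b)
    return (ma - mb) ** 2 / (var(values_a) + var(values_b) + 1e-12)

def rank_features(features, labels):
    """Rank feature names by how well they separate two genre classes."""
    scores = {}
    for name, column in features.items():
        a = [x for x, y in zip(column, labels) if y == 0]
        b = [x for x, y in zip(column, labels) if y == 1]
        scores[name] = fisher_score(a, b)
    return sorted(scores, key=scores.get, reverse=True)

# 'centroid' separates the two toy genres cleanly; 'noise' does not.
features = {"centroid": [0.1, 0.2, 0.8, 0.9], "noise": [0.5, 0.1, 0.4, 0.2]}
labels = [0, 0, 1, 1]
print(rank_features(features, labels))  # ['centroid', 'noise']
```

Keeping only the top-ranked features reduces dimensionality, which is the scalability benefit the abstract points to, at the risk of discarding features that are only jointly informative.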
8

Bianchi, Frederick W. "The cognition of atonal pitch structures." Virtual Press, 1985. http://liblink.bsu.edu/uhtbin/catkey/438705.

Full text
Abstract:
The Cognition of Atonal Pitch Structures investigated the ability of a listener to internally organize atonal pitch sequences into hierarchical structures. Based on an information-processing model proposed by Deutsch and Feroe (1981), the internal organization of well-processed pitch sequences results in the formation of hierarchical structures. The more efficiently information is processed by the listener, the more organized its internal hierarchical representation in memory. Characteristic of a well-organized internal hierarchy is redundancy: each ensuing level of the hierarchical structure represents a parsimonious recoding of the lower levels. In this respect, each higher hierarchical level contains the most salient structural features extracted from lower levels. Because efficient internal organization increases redundancy, more memory space must be allocated to retain a well-processed pitch sequence. Based on this assumption, an experiment was conducted to determine the amount of information retained when listening to pre-organized atonal pitch structures and randomly organized pitch structures. Using time-duration estimation techniques (Ornstein, 1969; Block, 1974), the relative size of memory allocated for a processing task was determined. Since the subjective experience of time is influenced by the amount of information processed and retained in memory (Ornstein, 1969; Block, 1974), longer time estimations corresponded to larger memory-space allocations, and thus to more efficiently organized hierarchical structures. Conclusion: Though not significant at the .05 level (p = .21), the results indicate a tendency suggesting that atonal pitch structures were more efficiently organized into internal hierarchical structures than were random pitch structures. The results of the experiment also suggest that a relationship exists between efficient internal hierarchical organization and increased attention and enjoyment. 
The present study also investigated the influence that other parameters may have on the cognition of pre-organized music. Of interest were the characteristics inherent in music which may facilitate internal organization.
9

Streich, Sebastian. "Music complexity: a multi-faceted description of audio content." Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7545.

Full text
Abstract:
This thesis proposes a set of algorithms that can be used to compute estimates of music complexity facets from musical audio signals. They focus on aspects of acoustics, rhythm, timbre, and tonality. Music complexity is thereby considered on the coarse level of common agreement among human listeners. The target is to obtain complexity judgments through automatic computation that resemble a naive listener's point of view. The motivation for the presented research lies in the enhancement of human interaction with digital music collections. As we will discuss, there is a variety of tasks to be considered, such as collection visualization, play-list generation, or the automatic recommendation of music. Through the music complexity estimates provided by the described algorithms we can obtain access to a level of semantic music description, which allows for novel and interesting solutions to these tasks.
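As a concrete, deliberately naive example of a computable complexity facet in the spirit of the abstract, the Shannon entropy of a piece's pitch-class distribution can serve as a crude tonal-complexity estimate. This is an illustrative proxy, not one of the thesis's algorithms, and it operates on symbolic notes rather than audio.

```python
import math
from collections import Counter

def pitch_class_entropy(midi_notes):
    """Shannon entropy (bits) of the pitch-class histogram: a piece that
    hammers one note scores 0; one that uses all 12 classes evenly
    scores log2(12), about 3.58."""
    counts = Counter(n % 12 for n in midi_notes)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

monotone = [60] * 16             # a single repeated pitch
chromatic = list(range(60, 72))  # all 12 pitch classes once
print(pitch_class_entropy(monotone), pitch_class_entropy(chromatic))
```

A real system would compute facet estimates like this per facet (rhythm, timbre, tonality) directly from the audio signal, but the principle of mapping content statistics to a single interpretable number is the same.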
10

SIMONETTA, FEDERICO. "MUSIC INTERPRETATION ANALYSIS. A MULTIMODAL APPROACH TO SCORE-INFORMED RESYNTHESIS OF PIANO RECORDINGS." Doctoral thesis, Università degli Studi di Milano, 2022. http://hdl.handle.net/2434/918909.

Full text
Abstract:
This Thesis discusses the development of technologies for the automatic resynthesis of music recordings using digital synthesizers. First, the main issue is identified in the understanding of how Music Information Processing (MIP) methods can take into consideration the influence of the acoustic context on the music performance. For this, a novel conceptual and mathematical framework named “Music Interpretation Analysis” (MIA) is presented. In the proposed framework, a distinction is made between the “performance” – the physical action of playing – and the “interpretation” – the action that the performer wishes to achieve. Second, the Thesis describes further works aiming at the democratization of music production tools via automatic resynthesis: 1) it elaborates software and file formats for historical music archiving and multimodal machine-learning datasets; 2) it explores and extends MIP technologies; 3) it presents the mathematical foundations of the MIA framework and shows preliminary evaluations to demonstrate the effectiveness of the approach.
11

Liljeqvist, Sandra. "Named Entity Recognition for Search Queries in the Music Domain." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-193332.

Full text
Abstract:
This thesis addresses the problem of named entity recognition (NER) in music-related search queries. NER is the task of identifying keywords in text and classifying them into predefined categories. Previous work in the field has mainly focused on longer documents of editorial texts. However, in recent years, the application of NER for queries has attracted increased attention. This task is, however, acknowledged to be challenging due to queries being short, ungrammatical and containing minimal linguistic context. The usage of NER for queries is especially useful for the implementation of natural language queries in domain-specific search applications. These applications are often backed by a database, where the query format otherwise is restricted to keyword search or the usage of a formal query language. In this thesis, two techniques for NER for music-related queries are evaluated; a conditional random field based solution and a probabilistic solution based on context words. As a baseline, the most elementary implementation of NER, commonly applied on editorial text, is used. Both of the evaluated approaches outperform the baseline and demonstrate an overall F1 score of 79.2% and 63.4% respectively. The experimental results show a high precision for the probabilistic approach and the conditional random field based solution demonstrates an F1 score comparable to previous studies from other domains.
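The probabilistic context-word approach evaluated above can be sketched as follows: estimate how often each entity label follows a given context word in labelled queries, then tag each query token with the most frequent label given its left neighbour. The tiny training set and label names are illustrative assumptions, not the thesis's data.

```python
from collections import Counter, defaultdict

def train(labelled_queries):
    """Count label frequencies per preceding context word."""
    model = defaultdict(Counter)
    for tokens in labelled_queries:
        prev = "<s>"
        for word, label in tokens:
            model[prev][label] += 1
            prev = word
    return model

def tag(model, query):
    """Assign each token the most probable label given its left neighbour;
    unseen contexts fall back to the 'outside' label O."""
    tags, prev = [], "<s>"
    for word in query.split():
        counts = model.get(prev, Counter({"O": 1}))
        tags.append(counts.most_common(1)[0][0])
        prev = word
    return tags

train_data = [
    [("play", "O"), ("yesterday", "TRACK"), ("by", "O"), ("beatles", "ARTIST")],
    [("play", "O"), ("help", "TRACK"), ("by", "O"), ("beatles", "ARTIST")],
]
model = train(train_data)
print(tag(model, "play wonderwall by oasis"))  # ['O', 'TRACK', 'O', 'ARTIST']
```

This illustrates why short, ungrammatical queries are still tractable: even minimal left context ("play …", "… by …") carries strong label evidence, which is what the probabilistic method exploits.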
12

Hedén, Malm Jacob, and Kyle Sinclair. "Categorisation of the Emotional Tone of Music using Neural Networks." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279342.

Full text
Abstract:
Machine categorisation of the emotional content of music is an ongoing research area. Feature description and extraction for such a vague and subjective field as emotion presents a difficulty for human-designed audio processing. Research into machine categorisation of music based on genre has expanded as media companies have increased their recommendation and automation efforts, but work on categorising music based on sentiment remains lacking. We took an informed experimental approach to finding a workable solution for a multimedia company, Ichigoichie, who wished to develop a generalizable classifier of musical qualities. This consisted of first orienting ourselves within the academic literature relevant to the subject, which suggested applying spectrographic pre-processing to the sound samples and then analyzing these visually with a convolutional neural network. To verify this method, we prototyped the model in a high-level Python framework which pre-processes 10-second audio files into spectrographs and then provides these as learning data to a convolutional neural network. This network is assessed on both its categorization accuracy and its generalizability to other data sets. Our results show that the method is justifiable as a technique for machine categorization of music based on genre, and even provide evidence that such a method is technically feasible for commercial applications today.
Machine categorisation of the emotional profile of music is an ongoing research area. Traditionally, this is done with algorithms tailored to a particular type of music and categorisation domain. A drawback is that such algorithms cannot be applied across multiple use cases, and developing them requires both strong musical knowledge and technical expertise. For these reasons, a steadily growing body of research investigates whether the same goals can be achieved with machine-learning techniques, and in particular with artificial neural networks, a subfield of machine learning. In this research project we aimed to continue this line of work and ultimately answer the question of whether music can be classified and categorised by its emotional profile using artificial neural networks. Through experimental research we found that artificial neural networks are a very promising technique for music classification, and we achieved good results. The method used consisted of spectrographic audio processing followed by analysis of the spectrograms with convolutional neural networks, a type of artificial neural network designed for visual analysis.
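The spectrographic pre-processing step described above (the CNN itself is out of scope for a short sketch) amounts to slicing the signal into frames and taking the magnitude of each frame's discrete Fourier transform. The frame and hop sizes here are illustrative; real pipelines also apply a window function and mel scaling before feeding the result to the network.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitude of the DFT for the positive-frequency bins of one frame."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2)]

def spectrogram(signal, frame_size=64, hop=32):
    """Stack per-frame magnitude spectra: rows = time, columns = frequency."""
    return [dft_magnitudes(signal[i:i + frame_size])
            for i in range(0, len(signal) - frame_size + 1, hop)]

# A pure sine whose frequency falls exactly in bin 8 of a 64-sample frame
# concentrates the spectrogram's energy in column 8.
signal = [math.sin(2 * math.pi * 8 * t / 64) for t in range(256)]
spec = spectrogram(signal)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(peak_bin)  # 8
```

The resulting time-frequency grid is exactly the image-like input that makes convolutional networks, originally designed for visual analysis, applicable to audio.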
13

Presti, G. "SIGNAL TRANSFORMATIONS FOR IMPROVING INFORMATION REPRESENTATION, FEATURE EXTRACTION AND SOURCE SEPARATION." Doctoral thesis, Università degli Studi di Milano, 2017. http://hdl.handle.net/2434/470676.

Full text
Abstract:
This thesis is about new methods of signal representation in the time-frequency domain, such that the information of interest is rendered as explicit dimensions of a new space. In particular, two transformations are presented: the Bivariate Mixture Space and the Spectro-Temporal Structure-Field. The former aims at highlighting latent components of a bivariate signal based on the behaviour of each frequency base (e.g. for source separation purposes), whereas the latter aims at folding neighbourhood information of each point of an R^2 function into a vector associated with that point, so as to describe some topological properties of the function. In the audio signal processing domain, the Bivariate Mixture Space can be interpreted as a way to investigate the stereophonic space for source separation and Music Information Retrieval tasks, whereas the Spectro-Temporal Structure-Field can be used to inspect the spectro-temporal dimension (to segregate pitched from percussive sounds or track pitch modulations). These transformations are investigated and tested against state-of-the-art techniques in fields such as source separation, information retrieval and data visualization. In the field of sound and music computing, these techniques aim at improving the frequency-domain representation of signals such that the exploration of the spectrum can also be carried out in alternative spaces, such as the stereophonic panorama or a virtual percussive-vs-pitched dimension.
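The per-frequency-bin view of a stereo (bivariate) signal that motivates this kind of transform can be illustrated with a toy panning index. This is an invented sketch for intuition only, not Presti's actual Bivariate Mixture Space; all signal parameters are made up:

```python
import numpy as np

# Toy sketch: inspect how each frequency bin behaves across the two channels
# of a stereo mix via a per-bin panning index in [-1, 1].
fs = 8000
t = np.arange(fs) / fs
tone_a = np.sin(2 * np.pi * 440 * t)   # mixed mostly to the left
tone_b = np.sin(2 * np.pi * 1000 * t)  # mixed mostly to the right
left = 1.0 * tone_a + 0.1 * tone_b
right = 0.1 * tone_a + 1.0 * tone_b

L = np.abs(np.fft.rfft(left))
R = np.abs(np.fft.rfft(right))
pan = (R - L) / (R + L + 1e-12)  # -1 = left only, +1 = right only

bin_a = round(440 * len(left) / fs)   # FFT bin of tone_a
bin_b = round(1000 * len(left) / fs)  # FFT bin of tone_b
print(pan[bin_a], pan[bin_b])
```

Each bin's panning value is one example of "behaviour of a frequency base" that a separation method could cluster on.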
APA, Harvard, Vancouver, ISO, and other styles
14

Sandrock, Trudie. "Multi-label feature selection with application to musical instrument recognition." Thesis, Stellenbosch : Stellenbosch University, 2013. http://hdl.handle.net/10019/11071.

Full text
Abstract:
Thesis (PhD)--Stellenbosch University, 2013.
ENGLISH ABSTRACT: An area of data mining and statistics that is currently receiving considerable attention is the field of multi-label learning. Problems in this field are concerned with scenarios where each data case can be associated with a set of labels instead of only one. In this thesis, we review the field of multi-label learning and discuss the lack of suitable benchmark data available for evaluating multi-label algorithms. We propose a technique for simulating multi-label data, which allows good control over different data characteristics and which could be useful for conducting comparative studies in the multi-label field. We also discuss the explosion in data in recent years, and highlight the need for some form of dimension reduction in order to alleviate some of the challenges presented by working with large datasets. Feature (or variable) selection is one way of achieving dimension reduction, and after a brief discussion of different feature selection techniques, we propose a new technique for feature selection in a multi-label context, based on the concept of independent probes. This technique is empirically evaluated by using simulated multi-label data and it is shown to achieve classification accuracy with a reduced set of features similar to that achieved with a full set of features. The proposed technique for feature selection is then also applied to the field of music information retrieval (MIR), specifically the problem of musical instrument recognition. An overview of the field of MIR is given, with particular emphasis on the instrument recognition problem. The particular goal of (polyphonic) musical instrument recognition is to automatically identify the instruments playing simultaneously in an audio clip, which is not a simple task. We specifically consider the case of duets – in other words, where two instruments are playing simultaneously – and approach the problem as a multi-label classification one. 
In our empirical study, we illustrate the complexity of musical instrument data and again show that our proposed feature selection technique is effective in identifying relevant features and thereby reducing the complexity of the dataset without negatively impacting on performance.
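The independent-probe idea can be sketched as follows. This is a hypothetical, simplified variant (correlation scores, keeping features that beat every permuted probe), not the exact procedure proposed in the thesis:

```python
import numpy as np

# Simplified probe-based feature selection sketch: append permuted copies of
# the features ("probes", uncorrelated with the label by construction) and
# keep only real features whose relevance score beats every probe.
rng = np.random.default_rng(0)
n, p = 500, 10
X = rng.normal(size=(n, p))
y = (X[:, 0] + X[:, 3] + 0.5 * rng.normal(size=n) > 0).astype(int)  # driven by features 0 and 3

probes = rng.permuted(X, axis=0)  # shuffle each column independently

def score(F, y):
    # absolute Pearson correlation of each column of F with y
    yc = y - y.mean()
    Fc = F - F.mean(axis=0)
    return np.abs(Fc.T @ yc) / (np.linalg.norm(Fc, axis=0) * np.linalg.norm(yc))

real = score(X, y)
fake = score(probes, y)
selected = np.where(real > fake.max())[0]  # features beating every probe
print(selected)
```

With this construction, the informative features 0 and 3 should survive while most noise features fall below the probe threshold.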
APA, Harvard, Vancouver, ISO, and other styles
15

Oramas, Martín Sergio. "Knowledge extraction and representation learning for music recommendation and classification." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/457709.

Full text
Abstract:
In this thesis, we address the problems of classifying and recommending music present in large collections. We focus on the semantic enrichment of descriptions associated to musical items (e.g., artists biographies, album reviews, metadata), and the exploitation of multimodal data (e.g., text, audio, images). To this end, we first focus on the problem of linking music-related texts with online knowledge repositories and on the automated construction of music knowledge bases. Then, we show how modeling semantic information may impact musicological studies and helps to outperform purely text-based approaches in music similarity, classification, and recommendation. Next, we focus on learning new data representations from multimodal content using deep learning architectures, addressing the problems of cold-start music recommendation and multi-label music genre classification, combining audio, text, and images. We show how the semantic enrichment of texts and the combination of learned data representations improve the performance on both tasks.
APA, Harvard, Vancouver, ISO, and other styles
16

Weese, Joshua L. "A convolutive model for polyphonic instrument identification and pitch detection using combined classification." Thesis, Kansas State University, 2013. http://hdl.handle.net/2097/15599.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
William H. Hsu
Pitch detection and instrument identification can be achieved with relatively high accuracy for monophonic signals; however, accurately classifying polyphonic signals remains an unsolved research problem. Pitch and instrument classification is a subset of Music Information Retrieval (MIR) and automatic music transcription, both of which have numerous research and real-world applications. Several areas of research are covered in this thesis, including the fast Fourier transform, onset detection, convolution, and filtering. Basic music theory and terms are also presented in order to explain the context and structure of the data used. The focus of this thesis is the representation of musical signals in the frequency domain. Polyphonic signals with many different voices and frequencies can be exceptionally complex. This thesis presents a new model for representing the spectral structure of polyphonic signals: the Uniform MAx Gaussian Envelope (UMAGE). The new spectral envelope closely approximates the distribution of frequency partials in the spectrum while remaining resilient to rapid oscillation (noise), and it generalizes well without losing the representation of the original spectrum. When subjectively compared to other spectral envelope methods, such as the linear predictive coding envelope and the cepstrum envelope, UMAGE is able to model high-order polyphonic signals without dropping partials (frequencies present in the signal). In other words, UMAGE models a signal independently of the signal's periodicity. The performance of UMAGE is evaluated both objectively and subjectively. It is shown that UMAGE is robust at modeling the distribution of frequencies in simple and complex polyphonic signals. Combined classification (combiners), a methodology for learning large concepts, is used to simplify the learning process and boost classification results.
The output of each learner is then averaged to get the final result. UMAGE is less accurate when identifying pitches; however, its accuracy in identifying instrument groups on order-10 polyphonic signals (ten voices) is competitive with the current state of the field.
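One way to picture a max-of-Gaussians spectral envelope is sketched below. The construction here (fixed-width Gaussians placed on local spectral peaks, then a pointwise maximum) is an assumption for illustration and not necessarily the thesis's exact UMAGE definition:

```python
import numpy as np

# Illustrative max-of-Gaussians envelope: one fixed-width Gaussian per spectral
# peak, pointwise maximum over all of them, so no partial is dropped.
def max_gaussian_envelope(mag, width=8.0):
    bins = np.arange(len(mag))
    # crude peak picking: local maxima strictly above the median magnitude
    peaks = [k for k in range(1, len(mag) - 1)
             if mag[k] >= mag[k - 1] and mag[k] >= mag[k + 1]
             and mag[k] > np.median(mag)]
    gauss = [mag[k] * np.exp(-0.5 * ((bins - k) / width) ** 2) for k in peaks]
    return np.max(gauss, axis=0)

# Two-partial toy spectrum
mag = np.zeros(256)
mag[40], mag[90] = 1.0, 0.6
env = max_gaussian_envelope(mag)
```

The pointwise maximum keeps every partial's height intact regardless of how many voices contribute peaks, which is the property the abstract emphasises.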
APA, Harvard, Vancouver, ISO, and other styles
17

Hornstein, Daniel L. (Daniel Lather). "Relationships Between Selected Musical Aural Discrimination Skills and a Multivariate Measure of Intellectual Skills." Thesis, North Texas State University, 1986. https://digital.library.unt.edu/ark:/67531/metadc331803/.

Full text
Abstract:
This study attempted to explore the strength and nature of relationships between specific intellectual information processing skills included in a multi-dimensional model conceived by Guilford, and measured by Meeker's Structure of Intellect - Learning Abilities Test, and specific musical aural discrimination skills as measured by Gordon's Musical Aptitude Profile. Three research questions were posed, which involved determining the strength and the nature of the relationship between MAP melodic, rhythmic, and aesthetic discrimination abilities and the intellectual information processing skills comprising the SOI - LA. Both instruments were administered to 387 fourth, fifth, and sixth graders from schools in the Dallas area. After a pilot study established the feasibility of the study and reliability estimates of the test instruments, multiple regression analysis determined that 10% to 15% of the variance between intellectual information-processing skills and the individual musical aural discrimination abilities was in common (r = +.32 to r = +.39). It was further determined that only six specific SOI intellectual dimensions, all involving the skills of "Cognition" and "Evaluation", were significantly related to the musical aural discrimination abilities. Through the use of the Coefficient of Partial Correlation, the strength of each individual information-processing skill's unique contribution to that covariance was determined. The study indicated that "Semantic" mental information processing skills, involving the ability to recall an abstract meaning or procedure given an external stimulus, play an extremely important part within this relationship. Skills of a "Figural" nature, which involve comprehending either a physical object or an non-physical idea and separating it from other impinging stimuli also enter into the relationship, although not to so high an extent. 
Finally, it was observed that the dimensions involving an understanding of "Systems", those mental skills which deal with groupings of figures, symbols, or semantic relationships, were also important to the relationship.
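The Coefficient of Partial Correlation used above can be sketched with toy data (variable names and values are invented): correlate two variables after regressing out a shared control variable.

```python
import numpy as np

# Partial correlation of x and y controlling for z, via regression residuals.
def partial_corr(x, y, z):
    Z = np.column_stack([np.ones_like(z), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]  # residuals of x given z
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]  # residuals of y given z
    return np.corrcoef(rx, ry)[0, 1]

rng = np.random.default_rng(1)
z = rng.normal(size=2000)
x = z + 0.1 * rng.normal(size=2000)
y = z + 0.1 * rng.normal(size=2000)
print(np.corrcoef(x, y)[0, 1])  # large: both driven by z
print(partial_corr(x, y, z))    # near zero once z is controlled for
```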
APA, Harvard, Vancouver, ISO, and other styles
18

Fančal, Petr. "Analýza zvukové interpretace hudby metodami číslicového zpracování signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2017. http://www.nusl.cz/ntk/nusl-317017.

Full text
Abstract:
The aim of this master's thesis is the analysis of musical compositions from the standpoint of the time resources of music. The introduction briefly describes the basic musicological terms and variables that bear directly on timing in expressive music performance. The following part of the work is devoted to known methods of digital signal processing suitable for retrieving music information from audio recordings. In the practical part, these methods are demonstrated on three recordings in the MATLAB environment and the results are compared in terms of the agogics used.
APA, Harvard, Vancouver, ISO, and other styles
19

Serrà, Julià Joan. "Identification of versions of the same musical composition by processing audio descriptions." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/22674.

Full text
Abstract:
This work focuses on the automatic identification of musical piece versions (alternate renditions of the same musical composition like cover songs, live recordings, remixes, etc.). In particular, we propose two core approaches for version identification: model-free and model-based ones. Furthermore, we introduce the use of post-processing strategies to improve the identification of versions. For all that we employ nonlinear signal analysis tools and concepts, complex networks, and time series models. Overall, our work brings automatic version identification to an unprecedented stage where high accuracies are achieved and, at the same time, explores promising directions for future research. Although our steps are guided by the nature of the considered signals (music recordings) and the characteristics of the task at hand (version identification), we believe our methodology can be easily transferred to other contexts and domains.
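One standard ingredient of version identification, comparing chroma sequences under the optimal transposition (circular pitch shift), can be sketched as follows. This toy example is far simpler than the model-free and model-based systems of the thesis, and all data here are synthetic:

```python
import numpy as np

# Compare two chroma sequences after trying all 12 circular transpositions
# and keeping the best match (the "optimal transposition" idea).
def oti_similarity(A, B):
    # A, B: (frames, 12) chroma matrices
    best = -np.inf
    for shift in range(12):
        Bs = np.roll(B, shift, axis=1)
        sim = np.mean(np.sum(A * Bs, axis=1))  # mean frame-wise dot product
        best = max(best, sim)
    return best

rng = np.random.default_rng(2)
A = rng.random((50, 12))
B = np.roll(A, 3, axis=1)  # the same "song" transposed by 3 semitones
C = rng.random((50, 12))   # an unrelated track
print(oti_similarity(A, B), oti_similarity(A, C))
```

A transposed version scores markedly higher than an unrelated track, since one of the 12 shifts realigns its pitch content exactly.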
APA, Harvard, Vancouver, ISO, and other styles
20

Fuentes, Magdalena. "Multi-scale computational rhythm analysis : a framework for sections, downbeats, beats, and microtiming." Thesis, Université Paris-Saclay (ComUE), 2019. http://www.theses.fr/2019SACLS404.

Full text
Abstract:
Computational rhythm analysis deals with extracting and processing meaningful rhythmical information from musical audio. It proves to be a highly complex task, since dealing with real audio recordings requires the ability to handle its acoustic and semantic complexity at multiple levels of representation. Existing methods for rhythmic analysis typically focus on one of those levels, failing to exploit music’s rich structure and compromising the musical consistency of automatic estimations. In this work, we propose novel approaches for leveraging multi-scale information for computational rhythm analysis. Our models account for interrelated dependencies that musical audio naturally conveys, allowing the interplay between different time scales and accounting for music coherence across them. In particular, we conduct a systematic analysis of downbeat tracking systems, leading to convolutional-recurrent architectures that exploit short and long term acoustic modeling; we introduce a skip-chain conditional random field model for downbeat tracking designed to take advantage of music structure information (i.e. repetitions of musical sections) in a unified framework; and we propose a language model for joint tracking of beats and micro-timing in Afro-Latin American music. Our methods are systematically evaluated on a diverse group of datasets, ranging from Western music to more culturally specific genres, and compared to state-of-the-art systems and simpler variations. The overall results show that our models for downbeat tracking perform on par with the state of the art, while being more musically consistent. Moreover, our model for the joint estimation of beats and microtiming takes further steps towards more interpretable systems. The methods presented here offer novel and more holistic alternatives for computational rhythm analysis, towards a more comprehensive automatic analysis of music.
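For intuition, a classical dynamic-programming beat selection over an onset-strength curve can be sketched as below. This is a textbook-style baseline, much simpler than the convolutional-recurrent and conditional-random-field models of the thesis, with an invented impulse-train input:

```python
import numpy as np

# Dynamic-programming beat tracking on a toy onset-strength curve: each frame
# scores its onset strength plus the best predecessor one beat period back,
# with a log-squared penalty for deviating from the expected period.
def track_beats(onset_env, period, tightness=100.0):
    n = len(onset_env)
    score = onset_env.copy()
    backlink = -np.ones(n, dtype=int)
    for t in range(period // 2, n):
        lo, hi = max(0, t - 2 * period), t - period // 2
        prev = np.arange(lo, hi)
        if len(prev) == 0:
            continue
        txcost = -tightness * np.log((t - prev) / period) ** 2
        best = np.argmax(score[prev] + txcost)
        score[t] = onset_env[t] + score[prev][best] + txcost[best]
        backlink[t] = prev[best]
    beats = [int(np.argmax(score))]          # backtrace from the best frame
    while backlink[beats[-1]] >= 0:
        beats.append(int(backlink[beats[-1]]))
    return np.array(beats[::-1])

env = np.zeros(200)
env[10::20] = 1.0                  # impulses every 20 frames
beats = track_beats(env, period=20)
```

On this clean input the recovered beats land exactly on the impulses, 20 frames apart.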
APA, Harvard, Vancouver, ISO, and other styles
21

Yesiler, M. Furkan. "Data-driven musical version identification: accuracy, scalability and bias perspectives." Doctoral thesis, Universitat Pompeu Fabra, 2022. http://hdl.handle.net/10803/673264.

Full text
Abstract:
This dissertation aims at developing audio-based musical version identification (VI) systems for industry-scale corpora. To employ such systems in industrial use cases, they must demonstrate high performance on large-scale corpora while not favoring certain musicians or tracks above others. Therefore, the three main aspects we address in this dissertation are accuracy, scalability, and algorithmic bias of VI systems. We propose a data-driven model that incorporates domain knowledge in its network architecture and training strategy. We then take two main directions to further improve our model. Firstly, we experiment with data-driven fusion methods to combine information from models that process harmonic and melodic information, which greatly enhances identification accuracy. Secondly, we investigate embedding distillation techniques to reduce the size of the embeddings produced by our model, which reduces the requirements for data storage and, more importantly, retrieval time. Lastly, we analyze the algorithmic biases of our systems.
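A naive late-fusion retrieval step can be sketched as follows. The thesis learns its fusion and embedding distillation from data, whereas this toy version merely L2-normalises and concatenates two invented embedding spaces before a dot-product search:

```python
import numpy as np

# Naive late fusion: normalise each modality's embedding, concatenate, and
# retrieve by dot product (cosine-like similarity).
def fuse(harmonic, melodic):
    h = harmonic / np.linalg.norm(harmonic, axis=1, keepdims=True)
    m = melodic / np.linalg.norm(melodic, axis=1, keepdims=True)
    return np.hstack([h, m])

rng = np.random.default_rng(3)
db_h = rng.normal(size=(100, 32))  # toy "harmonic" embeddings of 100 tracks
db_m = rng.normal(size=(100, 32))  # toy "melodic" embeddings
db = fuse(db_h, db_m)

# a query that is a noisy rendition of track 42
q = fuse(db_h[42:43] + 0.1 * rng.normal(size=(1, 32)),
         db_m[42:43] + 0.1 * rng.normal(size=(1, 32)))
scores = db @ q.ravel()
print(int(np.argmax(scores)))  # expect 42
```

Shrinking these embeddings while preserving such rankings is what the distillation part of the thesis targets, since retrieval time grows with embedding size.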
APA, Harvard, Vancouver, ISO, and other styles
22

Salamon, Justin J. "Melody extraction from polyphonic music signals." Doctoral thesis, Universitat Pompeu Fabra, 2013. http://hdl.handle.net/10803/123777.

Full text
Abstract:
Music was the first mass-market industry to be completely restructured by digital technology, and today we can have access to thousands of tracks stored locally on our smartphone and millions of tracks through cloud-based music services. Given the vast quantity of music at our fingertips, we now require novel ways of describing, indexing, searching and interacting with musical content. In this thesis we focus on a technology that opens the door to a wide range of such applications: automatically estimating the pitch sequence of the melody directly from the audio signal of a polyphonic music recording, also referred to as melody extraction. Whilst identifying the pitch of the melody is something human listeners can do quite well, doing this automatically is highly challenging. We present a novel method for melody extraction based on the tracking and characterisation of the pitch contours that form the melodic line of a piece. We show how different contour characteristics can be exploited in combination with auditory streaming cues to identify the melody out of all the pitch content in a music recording using both heuristic and model-based approaches. The performance of our method is assessed in an international evaluation campaign where it is shown to obtain state-of-the-art results. In fact, it achieves the highest mean overall accuracy obtained by any algorithm that has participated in the campaign to date. We demonstrate the applicability of our method both for research and end-user applications by developing systems that exploit the extracted melody pitch sequence for similarity-based music retrieval (version identification and query-by-humming), genre classification, automatic transcription and computational music analysis. The thesis also provides a comprehensive comparative analysis and review of the current state-of-the-art in melody extraction and a first of its kind analysis of melody extraction evaluation methodology.
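The role of per-frame pitch salience can be illustrated with a deliberately naive sketch. The thesis instead tracks whole pitch contours with auditory-streaming cues; this strawman baseline, with invented salience data, just takes the most salient pitch per frame and smooths the track:

```python
import numpy as np

# Naive melody decoding: per-frame argmax of a salience matrix, then a median
# filter to suppress isolated octave/selection errors.
def naive_melody(salience, freqs, k=5):
    track = freqs[np.argmax(salience, axis=1)]  # most salient pitch per frame
    pad = k // 2
    padded = np.pad(track, pad, mode='edge')
    return np.array([np.median(padded[i:i + k]) for i in range(len(track))])

freqs = np.array([220.0, 330.0, 440.0])
rng = np.random.default_rng(4)
S = np.tile([0.2, 0.3, 0.9], (40, 1)) + 0.05 * rng.random((40, 3))
S[17] = [0.95, 0.1, 0.1]  # one spurious frame where the wrong pitch dominates
melody = naive_melody(S, freqs)
```

The median filter removes the single spurious frame; contour-based tracking generalises this by reasoning over whole pitch trajectories rather than isolated frames.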
APA, Harvard, Vancouver, ISO, and other styles
23

Dzhambazov, Georgi. "Knowledge-based probabilistic modeling for tracking lyrics in music audio signals." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404681.

Full text
Abstract:
This thesis proposes specific signal processing and machine learning methodologies for automatically aligning the lyrics of a song to its corresponding audio recording. The research carried out falls within the broader field of music information retrieval (MIR), and in this respect we aim at improving existing state-of-the-art methodologies by introducing domain-specific knowledge. The goal of this work is to devise models capable of tracking in the music audio signal the sequential aspect of one particular element of lyrics: the phonemes. Music can be understood as comprising different facets, one of which is lyrics. The models we build take into account the complementary context that exists around lyrics, which is any musical facet complementary to lyrics. The facets used in this thesis include the structure of the music composition, the structure of a melodic phrase, and the structure of a metrical cycle. From this perspective, we analyse not only the low-level acoustic characteristics, representing the timbre of the phonemes, but also higher-level characteristics, in which the complementary context manifests. We propose specific probabilistic models to represent how the transitions between consecutive sung phonemes are conditioned by different facets of complementary context. The complementary context we address unfolds in time according to principles that are particular to a music tradition. To capture these, we created corpora and datasets for two music traditions that have a rich set of such principles: Ottoman Turkish makam and Beijing opera. The datasets and the corpora comprise different data types: audio recordings, music scores, and metadata. From this perspective, the proposed models can take advantage both of the data and the music-domain knowledge of particular musical styles to improve existing baseline approaches.
As a baseline, we choose a phonetic recognizer based on hidden Markov models (HMMs): a widely used methodology for tracking phonemes in both singing and speech processing problems. We present refinements of the typical steps of existing phonetic recognizer approaches, tailored to the characteristics of the studied music traditions. On top of the refined baseline, we devise probabilistic models based on dynamic Bayesian networks (DBNs) that represent the relation of phoneme transitions to their complementary context. Two separate models are built for two granularities of complementary context: the structure of a melodic phrase (higher level) and the structure of the metrical cycle (finer level). In one model we exploit the fact that syllable durations depend on their position within a melodic phrase. Information about the melodic phrases is obtained from the score, as well as from music-specific knowledge. In another model, we analyse how vocal note onsets, estimated from audio recordings, influence the transitions between consecutive vowels and consonants. We also propose how to detect the time positions of vocal note onsets in melodic phrases by simultaneously tracking the positions in a metrical cycle (i.e. the metrical accents). To evaluate the potential of the proposed models, we use lyrics-to-audio alignment as a concrete task. Each model improves the alignment accuracy compared to the baseline, which is based solely on the acoustics of the phonetic timbre. This validates our hypothesis that knowledge of complementary context is an important stepping stone for computationally tracking lyrics, especially in the challenging case of singing with instrumental accompaniment. The outcomes of this study are not only theoretical methodologies and data, but also specific software tools that have been integrated into Dunya, a suite of tools built in the context of CompMusic, a project for advancing the computational analysis of the world's music.
With this application, we have also shown that the developed methodologies are useful not only for tracking lyrics, but also for other use cases, such as enriched music listening and appreciation, or educational purposes.
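The alignment core of such an HMM-based phonetic recognizer can be illustrated with a minimal Viterbi forced-alignment sketch. This is a hedged toy, not the thesis code: a real system decodes over trained acoustic models with duration modelling, whereas here the per-frame phoneme log-likelihoods are simply given as a matrix, and each frame may either stay in the current phoneme or advance to the next.

```python
def align_phonemes(log_lik, n_phonemes):
    """Viterbi forced alignment over a left-to-right phoneme chain.

    log_lik[t][p] is the acoustic log-likelihood of phoneme p at frame t.
    Each frame either stays in the current phoneme or advances to the next,
    so the result is a monotonic frame-to-phoneme mapping.
    """
    T = len(log_lik)
    NEG = float("-inf")
    # delta[t][p]: best score of a path ending at frame t in phoneme p
    delta = [[NEG] * n_phonemes for _ in range(T)]
    back = [[0] * n_phonemes for _ in range(T)]
    delta[0][0] = log_lik[0][0]          # the alignment must start in phoneme 0
    for t in range(1, T):
        for p in range(n_phonemes):
            stay = delta[t - 1][p]
            enter = delta[t - 1][p - 1] if p > 0 else NEG
            if stay >= enter:
                delta[t][p], back[t][p] = stay, p
            else:
                delta[t][p], back[t][p] = enter, p - 1
            delta[t][p] += log_lik[t][p]
    # backtrack from the final phoneme at the last frame
    path = [n_phonemes - 1]
    for t in range(T - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]
```

The returned frame-to-phoneme path is monotonic, which is precisely the property that forced alignment of lyrics relies on; the knowledge-based models described above can be read as reshaping the transition scores of such a chain.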
APA, Harvard, Vancouver, ISO, and other styles
24

Şentürk, Sertan. "Computational analysis of audio recordings and music scores for the description and discovery of Ottoman-Turkish Makam music." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/402102.

Full text
Abstract:
This thesis addresses several shortcomings of the current state-of-the-art methodologies in music information retrieval (MIR). In particular, it proposes several computational approaches to automatically analyze and describe music scores and audio recordings of Ottoman-Turkish makam music (OTMM). The main contributions of the thesis are the music corpus that has been created to carry out the research and the audio-score alignment methodology developed for the analysis of the corpus. In addition, several novel computational analysis methodologies are presented in the context of common MIR tasks of relevance for OTMM. Example tasks are predominant melody extraction, tonic identification, tempo estimation, makam recognition, tuning analysis, structural analysis and melodic progression analysis. These methodologies become part of a complete system called Dunya-makam for the exploration of large corpora of OTMM. The thesis starts by presenting the created CompMusic Ottoman-Turkish makam music corpus. The corpus includes 2200 music scores, more than 6500 audio recordings, and accompanying metadata. The data has been collected, annotated and curated with the help of music experts. Using criteria such as completeness, coverage and quality, we validate the corpus and show its research potential. In fact, our corpus is the largest and most representative resource of OTMM available for computational research. Several test datasets have also been created from the corpus to develop and evaluate the specific methodologies proposed for the different computational tasks addressed in the thesis. The part focusing on the analysis of music scores is centered on phrase- and section-level structural analysis. Phrase boundaries are automatically identified using an existing state-of-the-art segmentation methodology. Section boundaries are extracted using heuristics specific to the formatting of the music scores.
Subsequently, a novel method based on graph analysis is used to establish similarities across these structural elements in terms of melody and lyrics, and to label the relations semiotically. The audio analysis section of the thesis reviews the state of the art in analysing the melodic aspects of performances of OTMM. It proposes adaptations of existing predominant melody extraction methods tailored to OTMM. It also presents improvements over pitch-distribution-based tonic identification and makam recognition methodologies. The audio-score alignment methodology is the core of the thesis. It addresses the culture-specific challenges posed by the musical characteristics, music-theory-related representations and oral praxis of OTMM. Based on several techniques such as subsequence dynamic time warping, the Hough transform and variable-length Markov models, the audio-score alignment methodology is designed to handle the structural differences between music scores and audio recordings. The method is robust to the presence of non-notated melodic expressions, tempo deviations within the music performances, and differences in tonic and tuning. The methodology utilizes the outputs of the score and audio analysis, and links the audio and the symbolic data. In addition, the alignment methodology is used to obtain score-informed descriptions of audio recordings. The score-informed audio analysis not only simplifies the audio feature extraction steps that would otherwise require sophisticated audio processing approaches, but also substantially improves the performance compared with state-of-the-art methods relying solely on audio data. The analysis methodologies presented in the thesis are applied to the CompMusic Ottoman-Turkish makam music corpus and integrated into a web application aimed at culture-aware music discovery. Some of the methodologies have already been applied to other music traditions such as Hindustani, Carnatic and Greek music.
Following open research best practices, all the created data, software tools and analysis results are openly available. The methodologies, the tools and the corpus itself provide vast opportunities for future research in many fields such as music information retrieval, computational musicology and music education.
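One of the techniques named above, subsequence dynamic time warping, can be sketched in miniature: locate the best occurrence of a short score-derived sequence inside a longer audio-derived one, with a free start and end on the audio side. This is an illustrative, assumption-laden toy (scalar features, absolute-difference local cost, no handling of structural differences), not the thesis's alignment system:

```python
def subsequence_dtw(query, target):
    """Subsequence DTW: locate the best-matching occurrence of `query`
    (e.g. a pitch contour synthesised from the score) inside the longer
    `target` (e.g. the predominant melody estimated from the audio).

    Returns (cost, start, end) so that target[start:end] is the match."""
    n, m = len(query), len(target)
    INF = float("inf")
    # D[i][j]: cost of aligning query[:i] to a target subsequence ending at j
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    start = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):               # free start: the match may begin anywhere
        D[0][j] = 0.0
        start[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = abs(query[i - 1] - target[j - 1])
            best, (pi, pj) = min((D[i - 1][j - 1], (i - 1, j - 1)),
                                 (D[i - 1][j], (i - 1, j)),
                                 (D[i][j - 1], (i, j - 1)))
            D[i][j] = c + best
            start[i][j] = start[pi][pj]  # propagate where the match began
    end = min(range(1, m + 1), key=lambda j: D[n][j])   # free end
    return D[n][end], start[n][end], end
```

The free start and end on the target axis are what distinguish subsequence DTW from full DTW, and they are what makes it suitable when a score section covers only part of a recording.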
APA, Harvard, Vancouver, ISO, and other styles
25

Srinivasamurthy, Ajay. "A Data-driven bayesian approach to automatic rhythm analysis of indian art music." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/398986.

Full text
Abstract:
Large and growing collections of a wide variety of music are now available on demand to music listeners, necessitating novel ways of automatically structuring these collections using different dimensions of music. Rhythm is one of the basic music dimensions and its automatic analysis, which aims to extract musically meaningful rhythm related information from music, is a core task in Music Information Research (MIR). Musical rhythm, similar to most musical dimensions, is culture-specific and hence its analysis requires culture-aware approaches. Indian art music is one of the major music traditions of the world and has complexities in rhythm that have not been addressed by the current state of the art in MIR, motivating us to choose it as the primary music tradition for study. Our intent is to address unexplored rhythm analysis problems in Indian art music to push the boundaries of the current MIR approaches by making them culture-aware and generalizable to other music traditions. The thesis aims to build data-driven signal processing and machine learning approaches for automatic analysis, description and discovery of rhythmic structures and patterns in audio music collections of Indian art music. After identifying challenges and opportunities, we present several relevant research tasks that open up the field of automatic rhythm analysis of Indian art music. Data-driven approaches require well curated data corpora for research and efforts towards creating such corpora and datasets are documented in detail. We then focus on the topics of meter analysis and percussion pattern discovery in Indian art music. Meter analysis aims to align several hierarchical metrical events with an audio recording. Meter analysis tasks such as meter inference, meter tracking and informed meter tracking are formulated for Indian art music. 
Different Bayesian models that can explicitly incorporate higher-level metrical structure information are evaluated for the tasks, and novel extensions are proposed. The proposed methods overcome the limitations of existing approaches, and their performance indicates the effectiveness of informed meter analysis. Percussion in Indian art music uses onomatopoeic oral mnemonic syllables for the transmission of repertoire and technique, providing a language for percussion. We use these percussion syllables to define, represent and discover percussion patterns in audio recordings of percussion solos. We approach the problem of percussion pattern discovery using hidden Markov model based automatic transcription followed by an approximate string search using a data-derived percussion pattern library. Preliminary experiments on Beijing opera percussion patterns, and on both tabla and mridangam solo recordings in Indian art music, demonstrate the utility of percussion syllables, identifying further challenges to building practical discovery systems. The technologies resulting from the research in the thesis are a part of the complete set of tools being developed within the CompMusic project for a better understanding and organization of Indian art music, aimed at providing an enriched experience with listening and discovery of music. The data and tools should also be relevant for data-driven musicological studies and other MIR tasks that can benefit from automatic rhythm analysis.
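The pattern-discovery step described above, automatic transcription followed by an approximate string search over syllable sequences, can be sketched with a semi-global edit distance: find the best match of a pattern anywhere inside a (possibly error-prone) transcription. The syllables and the `max_dist` tolerance below are illustrative assumptions, not the actual data-derived pattern library:

```python
def best_match(pattern, transcription, max_dist=1):
    """Semi-global edit distance: how well does a percussion-syllable
    pattern match somewhere inside an automatic transcription?
    Tokens are syllables, e.g. bols such as 'dha', 'ge', 'na'.
    Returns the edit distance of the best occurrence, or None if it
    exceeds max_dist."""
    n, m = len(pattern), len(transcription)
    prev = [0] * (m + 1)                  # free start anywhere in the transcription
    for i in range(1, n + 1):
        cur = [i] + [0] * m               # cur[0]: delete i pattern tokens
        for j in range(1, m + 1):
            cost = 0 if pattern[i - 1] == transcription[j - 1] else 1
            cur[j] = min(prev[j - 1] + cost,  # substitute / match
                         prev[j] + 1,         # delete from the pattern
                         cur[j - 1] + 1)      # insert into the pattern
        prev = cur
    d = min(prev)                         # free end anywhere in the transcription
    return d if d <= max_dist else None
```

Allowing a small edit distance is what makes the search tolerant to the substitution and deletion errors that an HMM-based transcription inevitably produces.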
APA, Harvard, Vancouver, ISO, and other styles
26

Laurier, Cyril François. "Automatic Classification of musical mood by content-based analysis." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/51582.

Full text
Abstract:
In this work, we focus on automatically classifying music by mood. For this purpose, we propose computational models using information extracted from the audio signal. Such algorithms build on techniques from signal processing, machine learning and information retrieval. First, by studying the tagging behavior of a music social network, we derive a model to represent mood. Then, we propose a method for automatic music mood classification. We analyze the contributions of audio descriptors and how their values relate to the observed mood. We also propose a multimodal version using lyrics, contributing to the field of text retrieval. Moreover, after showing the relation between mood and genre, we present a new approach using automatic music genre classification. We demonstrate that genre-based mood classifiers give higher accuracies than standard audio models. Finally, we propose a rule extraction technique to make our models explicit.
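As a deliberately simplified stand-in for the trained classifiers studied in the thesis, a nearest-centroid sketch over descriptor vectors illustrates the basic content-based pipeline: average the descriptors of labelled examples, then assign a new track to the closest mood centroid. The feature choices and labels here are illustrative assumptions, not the thesis's descriptor set:

```python
def train_centroids(examples):
    """Nearest-centroid mood classifier over audio descriptor vectors.
    `examples` maps a mood label to a list of descriptor vectors
    (illustrative features, e.g. [mean energy, tempo])."""
    centroids = {}
    for mood, vectors in examples.items():
        dims = len(vectors[0])
        centroids[mood] = [sum(v[d] for v in vectors) / len(vectors)
                           for d in range(dims)]
    return centroids

def classify(centroids, vector):
    """Assign `vector` to the mood whose centroid is closest (squared
    Euclidean distance)."""
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(c, vector))
    return min(centroids, key=lambda mood: dist(centroids[mood]))
```

In practice the descriptors would be normalized and a stronger classifier used, but the structure (descriptor extraction, supervised training, prediction) is the same.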
In this thesis, we focus on the automatic classification of music according to the emotion it conveys. First, we study how the members of a social network use tags and keywords to describe music and the emotions it evokes, and we derive a model to represent moods. Then, we propose a method for automatic mood classification. We analyze the contributions of audio descriptors and how their values relate to the observed moods. We also propose a multimodal version of our algorithm that uses song lyrics. Finally, after studying the relationship between mood and musical genre, we present a method based on automatic genre classification. As a conceptual and algorithmic recapitulation, we propose a rule-extraction technique to understand how machine learning algorithms predict the emotion evoked by music.
APA, Harvard, Vancouver, ISO, and other styles
27

Pires, André Salim. "Métodos de segmentação musical baseados em descritores sonoros." Universidade de São Paulo, 2011. http://www.teses.usp.br/teses/disponiveis/45/45134/tde-14082011-204700/.

Full text
Abstract:
This dissertation presents a comparative study of different computational methods for musical structural segmentation, where the main goal is to delimit the boundaries of musical sections in an audio signal and to label them, i.e. to group the detected sections that correspond to the same musical part. New proposals for unsupervised structural segmentation are presented, including methods for real-time processing, achieving error rates below 12%. The methodology comprises a study of sound descriptors and ways of modeling them over time, an exposition of computational techniques for structural segmentation, and new evaluation methods that penalize both incorrect boundary detection and an incorrect number of labels. The performance of each technique is computed using different sets of sound descriptors, and the results are presented and analyzed both quantitatively and qualitatively.
A comparative study of different music structural segmentation methods is presented, where the goal is to delimit the borders of musical sections and label them, i.e. to group the sections that correspond to the same musical part. Novel proposals for unsupervised segmentation are presented, including methods for real-time segmentation, achieving strong results with error rates below 12%. Our method comprises a study of sound descriptors, an exposition of the computational techniques for structural segmentation, and a description of the evaluation methods used, which penalize both incorrect boundary detection and an incorrect number of labels. The performance of each technique is calculated using different sound descriptor sets, and the results are presented and analyzed from both quantitative and qualitative points of view.
APA, Harvard, Vancouver, ISO, and other styles
28

Ewert, Sebastian [Verfasser]. "Signal Processing Methods for Music Synchronization, Audio Matching, and Source Separation / Sebastian Ewert." Bonn : Universitäts- und Landesbibliothek Bonn, 2012. http://d-nb.info/1044867760/34.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Uren, Grethe Rachelle. "Die invloed van geskikte agtergrondmusiek op die studie -oriëntasie en prestasie van graad 8-leerders in wiskunde / Grethe Rachelle Uren." Thesis, North-West University, 2009. http://hdl.handle.net/10394/4325.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Louboutin, Corentin. "Modélisation multi-échelle et multi-dimensionnelle de la structure musicale par graphes polytopiques." Thesis, Rennes 1, 2019. http://www.theses.fr/2019REN1S012/document.

Full text
Abstract:
It is reasonable to assume that a listener does not perceive music as a mere sequence of sounds, any more than the composer conceived the piece as such. Music is made of patterns whose internal organization and mutual relationships contribute to structuring the musical discourse, at several scales simultaneously. However, it remains very difficult today to define precisely the concept of musical structure. One of the main aspects of music is that it is largely made of redundancies, in the form of exact and varied repetitions. The organization of these redundancies creates expectation in the listener; surprise can then be created by presenting elements that do not match this expectation. This thesis is based on the hypothesis that redundancy, expectation and surprise are essential elements for describing the musical structure of a segment. A number of questions follow from this observation: which musical elements contribute to the structure of a musical object? Which dependencies between these elements play an essential role in its structuring? How can a relationship between two musical elements such as chords, rhythmic patterns or melodic motifs be described? In this manuscript, elements of an answer are proposed through the formalization and implementation of a multi-scale model for describing the structure of a musical segment: Polytopic Graphs of Latent Relations (PGLR). In this work, the segments considered are the successive sections that form a musical piece. In pop music, the genre on which this work focuses, a segment is typically a verse or a chorus of about 15 seconds, with a well-defined beginning and end.
Following the PGLR formalism, the predominant dependency relationships between the musical elements of a segment are those linking elements located at homologous positions on the segment's metrical grid. This approach generalizes to the multi-scale case the System & Contrast model, which describes, as a 2×2 matrix, the system of logical expectation within a segment and the surprise arising from the realization of that expectation. For regular segments of size 2^n, the PGLR can be represented on an n-cube (square, cube, tesseract, ...), where n is the number of scales considered. Each node of the polytope corresponds to an elementary musical object (chord, motif, note...), each edge represents a relationship between two nodes, and each face represents a system of relationships. Finding the PGLR that best describes the structure of a musical segment is carried out by jointly estimating: the description of the polytope (a more or less regular n-polytope); the configuration of the graph over the polytope, describing the flow of dependencies and the interactions between elements through elementary systemic implications within the segment; and the description of the set of relationships between the nodes of the graph. The aim of the PGLR model is both to describe the temporal dependencies between the elements of a segment and to model the logical expectation and surprise arising from the observation and perception of the similarities and differences between these elements. This approach has been formalized and implemented to describe the structure of chord sequences as well as rhythmic and melodic segments, and then evaluated by its ability to predict unseen segments. The measure used for this evaluation is the cross-perplexity computed on data from the RWC POP corpus.
The results obtained show a clear advantage for the proposed multi-scale method, which appears better able to describe efficiently the structure of the tested segments.
In this thesis, we approach these questions by defining and implementing a multi-scale model for music segment structure description, called the Polytopic Graph of Latent Relations (PGLR). In our work, a segment is the macroscopic constituent of the global piece. In pop songs, which are the main focus here, segments usually correspond to a chorus or a verse, lasting approximately 15 seconds and exhibiting a clear beginning and end. Under the PGLR scheme, relationships between musical elements within a segment are assumed to develop predominantly between homologous elements within the metrical grid, at different scales simultaneously. This approach generalises to the multi-scale case the System & Contrast framework, which describes, as a 2×2 square matrix, the logical system of expectation within a segment and the surprise resulting from that expectation. For regular segments of 2^n events, the PGLR lives on an n-dimensional cube (square, cube, tesseract, etc.), n being the number of scales considered simultaneously in the multi-scale model. Each vertex in the polytope corresponds to a low-scale musical element, each edge represents a relationship between two vertices, and each face forms an elementary system of relationships. The PGLR structure of a musical segment can then be obtained computationally as the joint estimation of: the description of the polytope (a more or less regular n-polytope); the nesting configuration of the graph over the polytope, reflecting the flow of dependencies and interactions as elementary implication systems within the musical segment; and the set of relations between the nodes of the graph. The aim of the PGLR model is both to describe the time dependencies between the elements of a segment and to model the logical expectation and surprise that can be built on the observation and perception of the similarities and differences between elements with strong relationships.
The approach is presented conceptually and algorithmically, together with an extensive evaluation of the ability of different models to predict unseen data, measured using the cross-perplexity value. These experiments have been conducted on chord sequences as well as on rhythmic and melodic segments extracted from the RWC POP corpus. Our results illustrate the efficiency of the proposed model in capturing structural information within such data.
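The cross-perplexity measure used for the evaluation above can be illustrated with a small, self-contained function. This is a hedged sketch: the per-element probabilities below are invented for illustration, whereas the thesis computes them from PGLR predictions on the RWC POP corpus.

```python
import math

def cross_perplexity(probs):
    """Perplexity of a model on a held-out sequence, given the probability
    the model assigned to each observed element: 2 ** (average bits per element)."""
    bits = -sum(math.log2(p) for p in probs) / len(probs)
    return 2 ** bits

# Hypothetical per-element probabilities assigned by a structure model to a
# 4-element segment; lower perplexity means better prediction of unseen data.
print(round(cross_perplexity([0.5, 0.25, 0.5, 0.25]), 3))  # → 2.828
```

A model that assigned probability 1 to every observed element would reach the lower bound of 1.0, which is why cross-perplexity can rank structure models by predictive power.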
APA, Harvard, Vancouver, ISO, and other styles
31

Bayle, Yann. "Apprentissage automatique de caractéristiques audio : application à la génération de listes de lecture thématiques." Thesis, Bordeaux, 2018. http://www.theses.fr/2018BORD0087/document.

Full text
Abstract:
This doctoral thesis presents, discusses and proposes tools for the automatic mining of big data in the context of supervised music classification. The main application is the automatic classification of musical themes in order to generate thematic playlists. The first chapter introduces the different contexts and concepts around big musical data and its consumption. The second chapter describes the music databases existing in the context of academic audio-analysis experiments; in particular, it introduces the issues raised by the variety and unequal proportions of the themes contained in a database, which remain difficult to take into account in supervised classification. The third chapter explains the importance of extracting and developing relevant audio and musical features in order to better describe the content of the items in these databases; it explains several psychoacoustic phenomena and uses sound signal processing techniques to compute audio features. New methods for aggregating local audio features are proposed to improve track classification. The fourth chapter describes the use of the extracted musical features to sort tracks by theme and thus enable music recommendation and the automatic generation of homogeneous thematic playlists; this part involves machine learning algorithms for music classification tasks. The contributions of this thesis are summarized in the fifth chapter, which also proposes research perspectives in machine learning and multi-scale audio feature extraction.
This doctoral dissertation presents, discusses and proposes tools for automatic information retrieval in large music databases. The main application is the supervised classification of musical themes to generate thematic playlists. The first chapter introduces the different contexts and concepts around large music databases and their consumption. The second chapter focuses on the description of existing music databases used in academic experiments in audio analysis; it notably introduces issues concerning the variety and unequal proportions of the themes contained in a database, which remain complex to take into account in supervised classification. The third chapter explains the importance of extracting and developing relevant audio features in order to better describe the content of the music tracks in these databases; it explains several psychoacoustic phenomena and uses sound signal processing techniques to compute audio features. New methods of aggregating local audio features are proposed to improve song classification. The fourth chapter describes the use of the extracted audio features to sort songs by theme and thus to enable music recommendation and the automatic generation of homogeneous thematic playlists; this part involves machine learning algorithms for music classification tasks. The contributions of this dissertation are summarized in the fifth chapter, which also proposes research perspectives in machine learning and multi-scale audio feature extraction.
APA, Harvard, Vancouver, ISO, and other styles
32

Zapata, González José Ricardo. "Comparative evaluation and combination of automatic rhythm description systems." Doctoral thesis, Universitat Pompeu Fabra, 2013. http://hdl.handle.net/10803/123822.

Full text
Abstract:
The automatic analysis of musical rhythm from audio, and more specifically tempo estimation and beat tracking, is one of the fundamental open research problems in Music Information Retrieval (MIR). Automatic beat tracking is a valuable tool for the solution of other MIR problems, as it enables beat-synchronous analysis for different music tasks. Even though automatic rhythm description is a relatively mature research topic in MIR, tempo estimation and beat tracking remain unsolved problems. We describe a new method for the extraction of beat times, with a confidence value, from music audio, based on the measurement of mutual agreement between a committee of beat tracking systems. The method can also be used to identify music samples that are challenging for beat tracking, without the need for ground-truth annotations. We also conduct an extensive comparative evaluation of 32 tempo estimation and 16 beat tracking systems.
The automatic analysis of musical rhythm from audio, and more specifically tempo estimation and beat tracking, is one of the fundamental problems in Music Information Retrieval (MIR). Automatic beat tracking is a valuable tool for the solution of other MIR problems, since it enables beat-synchronous analysis of the music for other tasks. We describe a new method for extracting beats from audio signals that measures the confidence of the estimate, based on the degree of agreement within a committee of beat detection systems. This automatic method can also be used to identify songs that are difficult for beat tracking. We also carry out an extensive comparative evaluation of current automatic rhythm description systems, evaluating 32 tempo estimation algorithms and 16 beat tracking systems.
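The committee-agreement idea described above can be sketched in a few lines. This is a hedged illustration only: it uses a simple tolerance-window F-measure as the pairwise agreement metric and invented beat times, not the exact agreement measures or trackers evaluated in the thesis.

```python
from itertools import combinations

def f_measure(beats_a, beats_b, tol=0.07):
    """Pairwise agreement between two beat sequences (times in seconds):
    F-measure of one-to-one matches within a +/- tol tolerance window."""
    if not beats_a or not beats_b:
        return 0.0
    matched, used = 0, set()
    for t in beats_a:
        for j, u in enumerate(beats_b):
            if j not in used and abs(t - u) <= tol:
                matched += 1
                used.add(j)
                break
    if matched == 0:
        return 0.0
    precision = matched / len(beats_a)
    recall = matched / len(beats_b)
    return 2 * precision * recall / (precision + recall)

def committee_confidence(estimates):
    """Mean pairwise agreement over all tracker pairs; low values flag
    pieces that are challenging for beat tracking."""
    pairs = list(combinations(estimates, 2))
    return sum(f_measure(a, b) for a, b in pairs) / len(pairs)

trackers = [
    [0.50, 1.00, 1.50, 2.00],   # tracker 1
    [0.52, 1.01, 1.49, 2.02],   # tracker 2: agrees with tracker 1
    [0.75, 1.50, 2.25, 3.00],   # tracker 3: off-phase, slower tempo
]
print(round(committee_confidence(trackers), 2))  # → 0.5
```

No ground-truth annotations appear anywhere above, which is the point of the committee approach: agreement among trackers substitutes for reference beats when estimating confidence.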
APA, Harvard, Vancouver, ISO, and other styles
33

Vercellesi, G. "Digital Audio Processing in MP3 Compressed Domain and Evaluation of Perceived Audio Quality." Doctoral thesis, Università degli Studi di Milano, 2006. http://hdl.handle.net/2434/36412.

Full text
Abstract:
The state of the art offers many digital audio signal processing techniques in the uncompressed (PCM, Pulse Code Modulation) domain. Several works in the literature explain different methods to modify an audio signal in both the time and the frequency domain, in order to normalize its intensity, or to apply filters, special effects and so on. The MP3 format, however, has not yet been deeply considered in the literature: the most significant works concern the MP1 and MP2 formats. There is no exhaustive formalization of digital audio signal processing in the MP3 compressed domain, nor a software framework that allows every kind of processing algorithm to be developed and implemented in the MP3 compressed domain; there are only a few simple tools that directly split and join MP3 files or process the volume in a very simple way. In this dissertation we define different approaches to developing any kind of digital signal processing algorithm in the MP3 compressed domain. The contributions of this dissertation are: the formalization of the problem of direct MP3 processing, defining different approaches (or levels) with respect to the various steps of the decoding/encoding phases; the development of algorithms for the MP3 format that work as close as possible to the MP3 domain; and the improvement and customization of the methods and protocols described in the recommendations of the International Telecommunication Union (ITU-R) for evaluating objective and subjective perceived audio quality. We define three different domains in which MP3-coded audio information can be manipulated. We develop algorithms for frame shifting, RMS-based gain control, filtering and channel selection; filters and channel selection have been developed to downgrade MP3 files. For each algorithm we have chosen the best approach, finding the best trade-off among computation time, perceived audio quality and problems related to unmasking and aliasing.
This formalization provides the basic concepts for the development of a software framework that allows the implementation of any kind of algorithm from the PCM domain to the MP3 domain. Finally, we improve and customize the methods and protocol for evaluating objective and subjective perceived audio quality described in the ITU-R recommendations. We evaluate the objective performance of modern MP3 codecs with respect to tandem coding, study the reliability of the objective tests by comparing them with subjective ones, and compare MP3-coded audio processed following both the traditional and the direct approach to editing.
APA, Harvard, Vancouver, ISO, and other styles
34

Nascimento, Sergio Roberto Vital do. "Geoprocessamento aplicado à gestão de informações territoriais do município de Grossos-RN: estudo multitemporal do uso e ocupação do solo." Universidade Federal do Rio Grande do Norte, 2004. http://repositorio.ufrn.br:8080/jspui/handle/123456789/16787.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
This work was carried out in the municipality of Grossos-RN and had as its main objectives the elaboration of a socio-economic and physical-environmental survey and the execution of a multitemporal evaluation spanning 11 years, between 1986 and 1996, using remote sensing products to assess changes in land use and occupation, aiming at the generation of an information base for the implementation of a Geographic Information System (GIS) for the environmental management of the municipality. To this end, data from the two demographic censuses carried out by IBGE (1991 and 2000) were collected and compared, making it possible to evaluate demographic aspects (degree of urbanization, age structure, educational level) and economic aspects (income, housing, vulnerability, human development). For the physical-environmental survey, maps of the natural resources were produced (simplified geology, hydrography, geomorphology, vegetation cover, soil associations, land use and occupation), based on field observations and orbital remote sensing products (SPOT-HRVIR, Landsat 5-TM and IKONOS-II images), using digital image processing techniques. This survey is important for identifying the potentialities and fragilities of the ecosystems found, as it allows adequate planning of socio-economic development through efficient management. The project was part of a partnership between the municipal government of Grossos-RN and the Geosciences post-graduate programme of UFRN, more specifically the Geomatics laboratory (LAGEOMA).
APA, Harvard, Vancouver, ISO, and other styles
35

Teódulo, José Múcio Ramalho. "Uso de técnicas de Geoprocessamento e Sensoriamento Remoto no levantamento e integração de dados necessários à gestão ambiental dos campos de extração de óleo e gás do Canto do Amaro e Alto da Pedra no município de Mossoró - RN." Universidade Federal do Rio Grande do Norte, 2004. http://repositorio.ufrn.br:8080/jspui/handle/123456789/16788.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
The objective of this work is to identify, map and interpret the evolution of land use and occupation and the environmental vulnerability of the areas of Canto do Amaro and Alto da Pedra, in the municipality of Mossoró-RN, based on a multitemporal analysis of images from orbital remote sensors, extensive field work and a Geographic Information System (GIS). The use of spatial analysis techniques within a GIS, together with the interpretation and analysis of remote sensing (RS) products, made it possible to reach the results presented. The set of data obtained from a wide variety of sources and stored in a digital environment supports the management of information and constitutes the geographic database of this research. Prior knowledge of the spectral behaviour of natural and artificial targets, together with Digital Image Processing (DIP) algorithms, facilitated interpretation and the search for new information at the spectral level. On this basis, a varied thematic cartography was generated: maps of geology, geomorphological units, soil associations, vegetation, and land use and occupation. Crossing these maps in a GIS environment generated the natural vulnerability and environmental vulnerability maps of the Canto do Amaro and Alto da Pedra oil fields, suggesting an environmental management centred on the management of water and solid waste, and thus enabling a more complex analysis of the studied area.
APA, Harvard, Vancouver, ISO, and other styles
36

Mallangi, Siva Sai Reddy. "Low-Power Policies Based on DVFS for the MUSEIC v2 System-on-Chip." Thesis, KTH, Skolan för informations- och kommunikationsteknik (ICT), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-229443.

Full text
Abstract:
Multifunctional wearable health-monitoring devices are quite prominent these days. Usually these devices are battery-operated and are consequently limited by their battery life (from a few hours to a few weeks, depending on the application). Of late, it was realized that these devices, which are currently operated at a fixed voltage and frequency, are capable of operating at multiple voltages and frequencies. By switching these voltages and frequencies to lower values based upon power requirements, these devices can achieve tremendous benefits in the form of energy savings. Dynamic Voltage and Frequency Scaling (DVFS) techniques have proven to be handy in this situation for an efficient trade-off between energy and timely behavior. Within imec, wearable devices make use of the indigenously developed MUSEIC v2 (Multi Sensor Integrated circuit version 2.0). This system is optimized for efficient and accurate collection, processing, and transfer of data from multiple (health) sensors. MUSEIC v2 has limited means of controlling the voltage and frequency dynamically. In this thesis we explore how traditional DVFS techniques can be applied to the MUSEIC v2. Experiments were conducted to find the optimum power modes for efficient operation and to scale the supply voltage and frequency up and down. Considering the overhead caused when switching voltage and frequency, a transition analysis was also done. Real-time and non-real-time benchmarks were implemented based on these techniques, and their performance results were obtained and analyzed. In this process, several state-of-the-art scheduling algorithms and scaling techniques were reviewed to identify a suitable technique. Using our proposed scaling technique implementation, we have achieved an 86.95% power reduction on average, in contrast to the conventional way of operating the MUSEIC v2 chip's processor at a fixed voltage and frequency.
Techniques that include light sleep and deep sleep modes were also studied and implemented, testing the system's capability to accommodate Dynamic Power Management (DPM) techniques that can achieve greater benefits. A novel approach for implementing the deep sleep mechanism was also proposed and was found to obtain up to 71.54% power savings compared to the traditional way of executing deep sleep mode.
Nowadays, multifunctional wearable health devices have taken on a significant role. These devices are usually battery-powered and are therefore limited by their battery life (from a few hours to a few weeks depending on the application). Recently it has emerged that these devices, which are operated at a fixed voltage and frequency, can be operated at several voltages and frequencies. By switching to a lower voltage and frequency according to the power requirements, the devices can obtain enormous energy-saving benefits. Dynamic Voltage and Frequency Scaling (DVFS) techniques have proven useful in this context for an efficient trade-off between energy and timing behaviour. At imec, wearable devices use the internally developed MUSEIC v2 (Multi Sensor Integrated circuit version 2.0). The system is optimized for efficient and accurate collection, processing and transfer of data from several (health) sensors. MUSEIC v2 has limited means of controlling the voltage and frequency dynamically. In this thesis we investigate how traditional DVFS techniques can be applied to MUSEIC v2. Experiments were performed to find the optimal power modes and to efficiently control and scale the supply voltage and frequency. Since overhead is incurred when switching voltage and frequency, a transition analysis was also carried out. Real-time and non-real-time benchmarks were implemented based on these techniques, and the results were compiled and analysed. In this process, several state-of-the-art scheduling algorithms and scaling techniques were reviewed to identify a suitable technique. Using our proposed scaling technique implementation, we achieved an 86.95% power reduction compared with the conventional approach in which the MUSEIC v2 chip's processor operates at a fixed voltage and frequency.
Techniques including light sleep and deep sleep modes were also studied and implemented, testing the system's ability to accommodate Dynamic Power Management (DPM) techniques, which can achieve even greater benefits. A new method for implementing the deep sleep mechanism was also proposed and, according to the results obtained, it can give up to 71.54% lower energy consumption compared with the traditional way of implementing deep sleep mode.
APA, Harvard, Vancouver, ISO, and other styles
37

Hong, Wei-Hung, and 洪暐桓. "Perceptual Signal Processing for Robust Bayesian Music Information Retrieval and Analysis." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/46729319423158682600.

Full text
Abstract:
Master's
National Chiao Tung University
Master Program of Sound and Music Innovative Technologies, College of Engineering
101
In this thesis, we propose an analysis procedure for robust music information retrieval (MIR) systems. To increase the ability to describe the information in music, we take account of three perceptual phenomena: auditory physiology, psychoacoustics, and musical expectation. Furthermore, we use Bayesian statistics to learn the model's content parameters automatically. In this way, we can begin by setting the initial probability distributions of the parameters according to music theory, then fit them to proper distributions in line with the observed data. The robustness we wish to demonstrate can be broadly defined as follows: at any system level, even if there is unexpected variability in the input, the system can still provide a steady, expected output. Chord-progression recognition systems play a core role in the music information retrieval domain, so we use such a system as the example under discussion. We believe that the analysis procedure of this thesis generalizes across the field of music information retrieval. First, we propose a modified auditory perceptual model for music signal processing and use this model to design a novel music feature. Next, we propose an unsupervised, robust Bayesian chord-progression recognition system that can recognize the chord progression within a single song without requiring any training data. The two parts are evaluated on a music corpus of 180 songs from 13 Beatles albums, containing 25 chord types across major and minor triads and a no-chord class. The experimental results show that our systems perform excellently compared with the state of the art.
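As a rough illustration of the kind of chord matching such a system builds on, the sketch below scores a 12-bin chroma vector against major and minor triad templates by cosine similarity. This is a generic textbook baseline, not the thesis's Bayesian model or its perceptual feature.

```python
# Illustrative chroma-to-chord matching: build binary triad templates for all
# 12 roots in major and minor quality, then pick the template with the highest
# cosine similarity to an observed 12-bin chroma vector.
import math

NOTE_NAMES = ['C', 'C#', 'D', 'D#', 'E', 'F', 'F#', 'G', 'G#', 'A', 'A#', 'B']

def triad_template(root, minor=False):
    """Binary 12-bin template with energy on root, third, and fifth."""
    third = 3 if minor else 4
    template = [0.0] * 12
    for interval in (0, third, 7):
        template[(root + interval) % 12] = 1.0
    return template

def best_chord(chroma):
    """Return (chord_name, score) of the best-matching triad."""
    best = None
    norm = math.sqrt(sum(c * c for c in chroma))
    for root in range(12):
        for minor in (False, True):
            t = triad_template(root, minor)
            num = sum(c * x for c, x in zip(chroma, t))
            den = norm * math.sqrt(3)
            score = num / den if den else 0.0
            name = NOTE_NAMES[root] + ('m' if minor else '')
            if best is None or score > best[1]:
                best = (name, score)
    return best

# A chroma frame with energy only on C, E, and G should match C major.
chroma = [1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0]
```

A Bayesian system like the one described would replace this hard maximum with probability distributions over chords and learn the template parameters from the song itself; the template-matching core, however, is the usual starting point.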
APA, Harvard, Vancouver, ISO, and other styles
38

Byron, Timothy P., University of Western Sydney, College of Arts, and School of Psychology. "The processing of pitch and temporal information in relational memory for melodies." 2008. http://handle.uws.edu.au:8081/1959.7/37492.

Full text
Abstract:
A series of experiments investigates the roles of relational coding and expectancy in memory for melodies. The focus on memory for melodies was motivated by an argument that research on the evolutionary psychology of music cognition would be improved by further research in this area. Melody length and the use of transposition were identified in a literature review as experimental variables with the potential to shed light on the cognitive mechanisms in memory for melodies; similarly, pitch interval magnitude (PIM), melodic contour, metre, and pulse were identified as musical attributes that appear to be processed by memory for melodies. It was concluded that neither previous models of verbal short term memory (vSTM) nor previous models of memory for melodies are able to satisfactorily explain current findings on memory for melodies. The model of relational memory for melodies that is developed here aims to explain findings from the memory for melodies literature. This model emphasises the relationship between: a) perceptual processes – specifically, a relational coding mechanism which encodes pitch and temporal information in a relational form; b) a short term store; and c) the redintegration of memory traces using schematic and veridical expectancies. The relational coding mechanism, which focuses on pitch and temporal accents (cf. Jones, 1993), is assumed to be responsible for the salience of contour direction and note length, while the expectancy processes are assumed to be more responsible for the salience of increases in PIM or deviations from the temporal grid. Using a melody discrimination task, with key transposition within-pairs, in which melody length was manipulated, Experiments 1a, 1b, and 2 investigated the assumption that contour would be more reliant on the relational coding mechanism and PIM would be more reliant on expectancy processes. Experiment 1a confirmed this hypothesis using 8- and 16-note folk melodies. 
Experiment 1b used the same stimuli as Experiment 1a, except that the within-pair order was reversed in order to reduce the influence of expectancy processes. As expected, while contour was still salient under these conditions, PIM was not. Experiment 2 was similar to Experiment 1b, except that it avoided using the original melodies in same trials in order to specifically reduce the influence of veridical expectancy processes. This led to a floor effect. Overall, the results support the explanation of pitch processing in memory for melodies in the model. Experiments 3 and 4 investigated the assumption in the model that temporal processing in memory for melodies was reliant on the relational coding mechanism. Experiment 3 found that, with key transposition within-pairs, there was little difference between pulse alterations (which deviate more from the temporal grid) and metre alterations (which lengthen the note more) in short melodies, but that pulse alterations were more salient than metre alterations in long melodies. Experiment 4 showed that, with tempo transposition within-pairs, metre alterations were more salient than pulse alterations in short melodies, but that there was no difference in salience in long melodies. That metre alterations are more salient than pulse alterations in Experiment 4 strongly suggests that there is relational coding of temporal information, and that this relational coding uses note length to determine the presence of accents, as the model predicts. Experiments 5a and 5b, using a Garner interference task, transposition within-pairs, and manipulations of melody length, investigated the hypothesis derived from the model that pitch and temporal information would be integrated in the relational coding mechanism. 
Experiment 5b demonstrated an effect of Garner interference from pitch alterations on the discrimination of temporal alterations; Experiment 5a found a weaker effect of Garner interference from pitch alterations on the discrimination of temporal alterations. The presence of Garner interference in these tasks when there was transposition within melody pairs suggests that pitch and temporal information are integrated in the relational coding mechanism, as predicted in the model. Seven experiments therefore provide support for the assumption that a relational coding mechanism and LTM expectancies play a role in the discrimination of melodies. This has implications for other areas of research in music cognition. Firstly, theories of the evolution of music must be able to explain why features of these processing mechanisms could have evolved. Secondly, research into acquired amusia should have a greater focus on differences between perceptual, cognitive, and LTM processing. Thirdly, research into similarities between music processing and language processing would be improved by further research using PIM as a variable.
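The relational coding the abstract discusses can be illustrated computationally: under key transposition a melody's absolute pitches change, but its interval and contour codes, the relational representations the model assumes, do not. The short melody below is an invented example.

```python
# Transposition invariance of relational pitch codes: shifting every pitch by
# the same number of semitones changes the melody's key but leaves both its
# interval sequence and its contour (up/down/same pattern) unchanged.

def intervals(midi_pitches):
    """Relational pitch code: successive intervals in semitones."""
    return [b - a for a, b in zip(midi_pitches, midi_pitches[1:])]

def contour(midi_pitches):
    """Coarser relational code: +1 up, -1 down, 0 repeated note."""
    return [(b > a) - (b < a) for a, b in zip(midi_pitches, midi_pitches[1:])]

melody = [60, 62, 64, 62, 67]           # C D E D G (MIDI note numbers)
transposed = [p + 5 for p in melody]    # the same melody a fourth higher

# Absolute pitches differ, but both relational codes are identical,
# which is why contour can survive key transposition within melody pairs.
```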
Doctor of Philosophy (PhD)
APA, Harvard, Vancouver, ISO, and other styles
39

Byron, Timothy P. "The processing of pitch and temporal information in relational memory for melodies." Thesis, 2008. http://handle.uws.edu.au:8081/1959.7/37492.

Full text
Abstract:
A series of experiments investigates the roles of relational coding and expectancy in memory for melodies. The focus on memory for melodies was motivated by an argument that research on the evolutionary psychology of music cognition would be improved by further research in this area. Melody length and the use of transposition were identified in a literature review as experimental variables with the potential to shed light on the cognitive mechanisms in memory for melodies; similarly, pitch interval magnitude (PIM), melodic contour, metre, and pulse were identified as musical attributes that appear to be processed by memory for melodies. It was concluded that neither previous models of verbal short term memory (vSTM) nor previous models of memory for melodies are able to satisfactorily explain current findings on memory for melodies. The model of relational memory for melodies that is developed here aims to explain findings from the memory for melodies literature. This model emphasises the relationship between: a) perceptual processes – specifically, a relational coding mechanism which encodes pitch and temporal information in a relational form; b) a short term store; and c) the redintegration of memory traces using schematic and veridical expectancies. The relational coding mechanism, which focuses on pitch and temporal accents (cf. Jones, 1993), is assumed to be responsible for the salience of contour direction and note length, while the expectancy processes are assumed to be more responsible for the salience of increases in PIM or deviations from the temporal grid. Using a melody discrimination task, with key transposition within-pairs, in which melody length was manipulated, Experiments 1a, 1b, and 2 investigated the assumption that contour would be more reliant on the relational coding mechanism and PIM would be more reliant on expectancy processes. Experiment 1a confirmed this hypothesis using 8- and 16-note folk melodies. 
Experiment 1b used the same stimuli as Experiment 1a, except that the within-pair order was reversed in order to reduce the influence of expectancy processes. As expected, while contour was still salient under these conditions, PIM was not. Experiment 2 was similar to Experiment 1b, except that it avoided using the original melodies in same trials in order to specifically reduce the influence of veridical expectancy processes. This led to a floor effect. Overall, the results support the explanation of pitch processing in memory for melodies in the model. Experiments 3 and 4 investigated the assumption in the model that temporal processing in memory for melodies was reliant on the relational coding mechanism. Experiment 3 found that, with key transposition within-pairs, there was little difference between pulse alterations (which deviate more from the temporal grid) and metre alterations (which lengthen the note more) in short melodies, but that pulse alterations were more salient than metre alterations in long melodies. Experiment 4 showed that, with tempo transposition within-pairs, metre alterations were more salient than pulse alterations in short melodies, but that there was no difference in salience in long melodies. That metre alterations are more salient than pulse alterations in Experiment 4 strongly suggests that there is relational coding of temporal information, and that this relational coding uses note length to determine the presence of accents, as the model predicts. Experiments 5a and 5b, using a Garner interference task, transposition within-pairs, and manipulations of melody length, investigated the hypothesis derived from the model that pitch and temporal information would be integrated in the relational coding mechanism. 
Experiment 5b demonstrated an effect of Garner interference from pitch alterations on the discrimination of temporal alterations; Experiment 5a found a weaker effect of Garner interference from pitch alterations on the discrimination of temporal alterations. The presence of Garner interference in these tasks when there was transposition within melody pairs suggests that pitch and temporal information are integrated in the relational coding mechanism, as predicted in the model. Seven experiments therefore provide support for the assumption that a relational coding mechanism and LTM expectancies play a role in the discrimination of melodies. This has implications for other areas of research in music cognition. Firstly, theories of the evolution of music must be able to explain why features of these processing mechanisms could have evolved. Secondly, research into acquired amusia should have a greater focus on differences between perceptual, cognitive, and LTM processing. Thirdly, research into similarities between music processing and language processing would be improved by further research using PIM as a variable.
APA, Harvard, Vancouver, ISO, and other styles
40

"Stream segregation and pattern matching techniques for polyphonic music databases." 2003. http://library.cuhk.edu.hk/record=b5891706.

Full text
Abstract:
Szeto, Wai Man.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2003.
Includes bibliographical references (leaves 81-86).
Abstracts in English and Chinese.
Abstract --- p.ii
Acknowledgements --- p.vi
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Motivations and Aims --- p.1
Chapter 1.2 --- Thesis Organization --- p.6
Chapter 2 --- Preliminaries --- p.7
Chapter 2.1 --- Fundamentals of Music and Terminology --- p.7
Chapter 2.2 --- Findings in Auditory Psychology --- p.8
Chapter 3 --- Literature Review --- p.12
Chapter 3.1 --- Pattern Matching Techniques for Music Information Retrieval --- p.12
Chapter 3.2 --- Stream Segregation --- p.14
Chapter 3.3 --- Post-tonal Music Analysis --- p.15
Chapter 4 --- Proposed Method for Stream Segregation --- p.17
Chapter 4.1 --- Music Representation --- p.17
Chapter 4.2 --- Proposed Method --- p.19
Chapter 4.3 --- Application of Stream Segregation to Polyphonic Databases --- p.27
Chapter 4.4 --- Experimental Results --- p.30
Chapter 4.5 --- Summary --- p.36
Chapter 5 --- Proposed Approaches for Post-tonal Music Analysis --- p.38
Chapter 5.1 --- Pitch-Class Set Theory --- p.39
Chapter 5.2 --- Sequence-Based Approach --- p.43
Chapter 5.2.1 --- Music Representation --- p.43
Chapter 5.2.2 --- Matching Conditions --- p.44
Chapter 5.2.3 --- Algorithm --- p.46
Chapter 5.3 --- Graph-Based Approach --- p.47
Chapter 5.3.1 --- Graph Theory and Its Notations --- p.48
Chapter 5.3.2 --- Music Representation --- p.50
Chapter 5.3.3 --- Matching Conditions --- p.53
Chapter 5.3.4 --- Algorithm --- p.57
Chapter 5.4 --- Experiments --- p.67
Chapter 5.4.1 --- Experiment 1 --- p.67
Chapter 5.4.2 --- Experiment 2 --- p.68
Chapter 5.4.3 --- Experiment 3 --- p.70
Chapter 5.4.4 --- Experiment 4 --- p.75
Chapter 6 --- Conclusion --- p.79
Bibliography --- p.81
A Publications --- p.87
APA, Harvard, Vancouver, ISO, and other styles
41

Kun-Chih, Shih, and 施昆志. "An creative and interactive multimedia system for playing comfortable music in general spaces based on computer vision and image processing technique, and combined analyses of color, psychology, and music information." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/r4366v.

Full text
Abstract:
Master's
Southern Taiwan University of Science and Technology
Graduate Institute of Multimedia and Computer Entertainment Science
94
Systems based on computer vision and image processing are widely developed for scientific and medical applications. On the other hand, integrated analyses of color, psychology, music, and multimedia presentation are useful and helpful in everyday entertainment. The association of the two fields has become more and more popular in recent years and will be a trend in the future. This motivates us to design a creative and interactive multimedia system that can recognize and capture the color information of a person's clothing when they enter a space. After color recognition and extraction, we relate the color information to psychological theory to analyze the characteristics and feelings of the people in the space. Moreover, we relate the psychological theory to music theory to play appropriate music to comfort the minds of the people in the space. This application can easily be extended to exhibition centers, conference halls, coffee bars, or any space needing special music. Successful experimental results confirm the effectiveness of the proposed approach.
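A minimal sketch of such a pipeline might classify a captured dominant colour into a coarse hue family and map it, via a colour-psychology table, to a music mood. The hue boundaries and mood mappings below are invented for illustration; they are not the thesis's actual colour, psychology, or music rules.

```python
# Hypothetical colour-to-music-mood pipeline: RGB -> coarse hue family ->
# mood label. Real systems would also use saturation/brightness and a far
# richer psychology model; this only shows the shape of the mapping.
import colorsys

# Invented colour-psychology -> music-mood table (illustrative only).
MOOD_BY_HUE = {
    'red':    'energetic, fast-tempo music',
    'yellow': 'bright, major-key music',
    'green':  'relaxing, ambient music',
    'blue':   'calm, slow-tempo music',
}

def hue_family(r, g, b):
    """Classify an RGB colour (0-255 channels) into a coarse hue family."""
    h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
    degrees = h * 360
    if degrees < 45 or degrees >= 315:
        return 'red'
    if degrees < 90:
        return 'yellow'
    if degrees < 180:
        return 'green'
    return 'blue'   # coarse: cyan through violet all land here

def suggest_music(r, g, b):
    """Map the dominant clothing colour to a music mood suggestion."""
    return MOOD_BY_HUE[hue_family(r, g, b)]
```

For example, a visitor wearing predominantly blue clothing would be assigned calm, slow-tempo music under this invented table.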
APA, Harvard, Vancouver, ISO, and other styles
42

Rosão, Carlos Manuel Tadeia. "Onset detection in music signals." Master's thesis, 2012. http://hdl.handle.net/10071/5991.

Full text
Abstract:
Onset detection, that is, the task of finding the starting moments of musical notes in an audio signal, is an active research subject, since note onset detection is commonly used as a first step in high-level music processing tasks. Driven by the need to know which onset detection method best suits each high-level music processing task, two approaches are followed in this thesis in order to obtain more complete information about the different onset detection methods. The first consists of a full comparison of the performance of onset detection methods that use spectral features. Our results on two distinct datasets show that the behaviour of onset detection varies clearly between onset types and between detection functions, as well as between instrument interpretation styles. The other approach assesses the influence of the final peak selection step on the global results of onset detection. Our results show that the peak selection step deeply influences the results obtained, both positively and negatively, and that its influence differs significantly according to the onset classes and to the onset detection functions used.
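The two-stage pipeline the abstract evaluates (a spectral detection function followed by peak selection) can be sketched as follows. The half-wave rectified spectral flux and the simple threshold used here are common textbook choices, not necessarily the specific functions or parameters compared in the thesis.

```python
# Sketch of spectral-flux onset detection plus peak selection: compute the
# positive change in spectral magnitude between consecutive frames, then
# keep local maxima above a fraction of the global maximum.
import numpy as np

def spectral_flux(signal, frame_len=1024, hop=512):
    """Detection function: half-wave rectified magnitude change per frame."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    prev_mag, flux = None, []
    for i in range(n_frames):
        frame = signal[i * hop: i * hop + frame_len] * window
        mag = np.abs(np.fft.rfft(frame))
        if prev_mag is None:
            flux.append(0.0)
        else:
            flux.append(np.sum(np.maximum(mag - prev_mag, 0.0)))
        prev_mag = mag
    return np.asarray(flux)

def pick_peaks(flux, threshold_ratio=0.5):
    """Peak selection: local maxima above a fraction of the global maximum."""
    thresh = threshold_ratio * flux.max()
    return [i for i in range(1, len(flux) - 1)
            if flux[i] > thresh and flux[i] >= flux[i - 1] and flux[i] > flux[i + 1]]

# Synthetic signal: half a second of silence, then a 440 Hz tone at 44.1 kHz.
sr = 44100
signal = np.zeros(sr)
t = np.arange(sr // 2) / sr
signal[sr // 2:] = np.sin(2 * np.pi * 440 * t)
onsets = pick_peaks(spectral_flux(signal))   # one peak near frame ~43
```

Because the thesis shows that the peak-selection stage can change results as much as the detection function itself, the `threshold_ratio` here is exactly the kind of parameter whose choice deserves separate evaluation.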
APA, Harvard, Vancouver, ISO, and other styles
