
Dissertations / Theses on the topic 'Gesture Synthesis'

Consult the top 27 dissertations / theses for your research on the topic 'Gesture Synthesis.'

1

Faggi, Simone. "An Evaluation Model For Speech-Driven Gesture Synthesis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22844/.

Abstract:
The research and development of embodied agents with advanced relational capabilities is constantly evolving. In recent years, the development of behavioural signal generation models to be integrated into social robots and virtual characters has been moving from rule-based to data-driven approaches, requiring appropriate and reliable evaluation techniques. This work proposes a novel machine learning approach for the evaluation of speech-to-gesture models that is independent of the audio source. This approach enables the measurement of the quality of gestures produced by these models and provides a benchmark for their evaluation. Results show that the proposed approach is consistent with evaluations made through user studies and, furthermore, that its use allows for a reliable comparison of state-of-the-art speech-to-gesture models.
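The abstract does not spell out the metric itself, but audio-independent evaluation of gesture models is typically done at the distribution level, comparing statistics of generated motion against ground-truth motion. As a rough, hypothetical sketch of that genre of metric (a diagonal-covariance Fréchet distance over per-clip motion features, not the author's actual model):

```python
import numpy as np

def gesture_frechet_distance(real_feats, gen_feats):
    """Diagonal-covariance Fréchet distance between two gesture feature sets.

    real_feats, gen_feats: (n_clips, n_dims) arrays of motion features
    (e.g. per-clip joint-velocity statistics). Lower is better; identical
    distributions score 0. The score is audio-independent: it never looks
    at the speech signal, only at the produced motion.
    """
    mu_r, mu_g = real_feats.mean(axis=0), gen_feats.mean(axis=0)
    var_r, var_g = real_feats.var(axis=0), gen_feats.var(axis=0)
    mean_term = np.sum((mu_r - mu_g) ** 2)
    # For diagonal Gaussians the trace term reduces to a per-dimension sum.
    cov_term = np.sum(var_r + var_g - 2.0 * np.sqrt(var_r * var_g))
    return float(mean_term + cov_term)
```

A model whose gesture distribution drifts away from the reference corpus scores higher, which is the behaviour a benchmark of this kind relies on.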
2

Marrin Nakra, Teresa (Teresa Anne), 1970-. "Inside the conductor's jacket : analysis, interpretation and musical synthesis of expressive gesture." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/9165.

Abstract:
Thesis (Ph.D.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2000.
Includes bibliographical references (leaves 154-167).
We present the design and implementation of the Conductor's Jacket, a unique wearable device that measures physiological and gestural signals, together with the Gesture Construction, a musical software system that interprets these signals and applies them expressively in a musical context. Sixteen sensors have been incorporated into the Conductor's Jacket in such a way as not to encumber or interfere with the gestures of a working orchestra conductor. The Conductor's Jacket system gathers up to sixteen data channels reliably at rates of 3 kHz per channel, and also provides real-time graphical feedback. Unlike many gesture-sensing systems, it not only gathers position and acceleration data but also senses muscle tension from several locations on each arm. The Conductor's Jacket was used to gather conducting data from six subjects, three professional conductors and three students, during twelve hours of rehearsals and performances. Analyses of the data yielded thirty-five significant features that seem to reflect intuitive and natural gestural tendencies, including context-based hand switching, anticipatory 'flatlining' effects, and correlations between respiration and phrasing. The results indicate that muscle tension and respiration signals reflect several significant and expressive characteristics of a conductor's gestures. From these results we present nine hypotheses about human musical expression, including ideas about efficiency, intentionality, polyphony, signal-to-noise ratios, and musical flow state. Finally, this thesis describes the Gesture Construction, a musical software system that analyzes and performs music in real-time based on the performer's gestures and breathing signals. A bank of software filters extracts several of the features that were found in the conductor study, including beat intensities and the alternation between arms.
These features are then used to generate real-time expressive effects by shaping the beats, tempos, articulations, dynamics, and note lengths in a musical score.
by Teresa Marrin Nakra.
Ph.D.
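The abstract describes software filters that extract beat intensities from muscle-tension channels sampled at 3 kHz. As a hypothetical illustration of that style of feature extraction (rectify, smooth, pick peaks), not the thesis's actual filter bank:

```python
import numpy as np

def beat_peaks(emg, fs=3000, smooth_ms=50, thresh=0.5):
    """Crude beat extraction from one muscle-tension channel: full-wave
    rectification, moving-average envelope, then local maxima above a
    fraction of the envelope maximum. All constants are illustrative."""
    win = int(fs * smooth_ms / 1000)
    env = np.convolve(np.abs(emg), np.ones(win) / win, mode="same")
    level = thresh * env.max()
    # Keep indices that are local maxima of the envelope above the threshold.
    return [i for i in range(1, len(env) - 1)
            if env[i] > level and env[i] >= env[i - 1] and env[i] > env[i + 1]]
```

Peak positions give beat times; the envelope value at each peak is a simple stand-in for beat intensity.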
3

Pun, James Chi-Him. "Gesture recognition with application in music arrangement." Diss., University of Pretoria, 2006. http://upetd.up.ac.za/thesis/available/etd-11052007-171910/.

4

Wang, Yizhong Johnty. "Investigation of gesture control for articulatory speech synthesis with a bio-mechanical mapping layer." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/43193.

Abstract:
In the process of working with a real-time, gesture-controlled speech and singing synthesizer used for musical performance, we have documented performer-related issues and provided suggestions that will serve to improve future work in the field from an engineering and technician's perspective. One significant detrimental factor in the existing system is the degraded sound quality caused by the limitations of the one-to-one kinematic mapping between gesture input and output. To address this, a force-activated bio-mechanical mapping layer was implemented to drive an articulatory synthesizer, and the results were compared with the existing mapping system on the same task from both the performer's and the listener's perspective. The results show that adding the complex, dynamic bio-mechanical mapping layer introduces more difficulty but allows the performer a greater degree of expression, consistent with existing work in the literature. However, to the novice listener there is no significant difference in the intelligibility of the sound or the perceived quality. The results suggest that, for browsing through a vowel space, force and position input are comparable when considering output intelligibility alone, but for expressivity a complex input may be more suitable.
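The thesis's bio-mechanical layer is a full articulatory model; as a toy analogue of the core idea (inserting second-order dynamics between the force input and the parameter that drives the synthesizer, instead of a one-to-one kinematic mapping), one can use a one-dimensional mass-spring-damper. All constants are illustrative:

```python
def biomech_layer(forces, dt=0.01, m=0.05, k=8.0, c=0.4):
    """1-D mass-spring-damper as a dynamic mapping layer: the performer
    supplies a force stream; the layer outputs a position trajectory that
    would drive a synthesis parameter. Semi-implicit Euler integration."""
    x, v = 0.0, 0.0
    out = []
    for f in forces:
        v += (f - k * x - c * v) / m * dt  # acceleration from net force
        x += v * dt
        out.append(x)
    return out
```

Because the layer is underdamped, a step in force overshoots and rings before settling, which is exactly the kind of dynamic response a kinematic mapping cannot produce.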
5

Pérez Carrillo, Alfonso Antonio. "Enhancing spectral synthesis techniques with performance gestures using the violin as a case study." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7264.

Abstract:
In this work we investigate new sound synthesis techniques for imitating musical instruments, using the violin as a case study. It is multidisciplinary research, covering several fields such as spectral modeling, machine learning, analysis of musical gestures, and musical acoustics. It addresses sound production with a very empirical approach, based on the analysis of performance gestures as well as on the measurement of acoustical properties of the violin. Based on the characteristics of the main vibrating elements of the violin, we divide the study into two parts, namely the bowed string and violin body sound radiation. With regard to the bowed string, we are interested in modeling the influence of bowing controls on the spectrum of string vibration. To accomplish this task we have developed a sensing system for accurate measurement of the bowing parameters. Analysis of real performances allows a better understanding of the bowing control space, its use by performers and its effect on the timbre of the sound produced. In addition, machine learning techniques are used to design a generative timbre model that is able to predict the spectral envelopes corresponding to a sequence of bowing controls. These envelopes can then be filled with harmonic and noisy sound components to produce a synthetic string-vibration signal. In relation to the violin body, a new method for measuring acoustical violin-body impulse responses has been conceived, based on bowed glissandi and a deconvolution algorithm for non-impulsive signals. Excitation is measured as string vibration and responses are recorded with multiple microphones placed at different angles around the violin, providing complete radiation patterns at all frequencies. The results of both the bowed string and the violin body studies have been incorporated into a violin synthesizer prototype based on sample concatenation.
Predicted envelopes of the timbre model are applied to the samples as a time-varying filter, which entails smoother concatenations and phrases that follow the nuances of the controlling gestures. These transformed samples are finally convolved with a body impulse response to recreate a realistic violin sound. The different impulse responses used can enhance the listening experience by simulating different violins, or effects such as stereo or violinist motion. Additionally, an expressivity model has been integrated into the synthesizer, adding expressive features such as timing deviations, dynamics or ornaments, thus augmenting the naturalness of the synthetic performances.
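The abstract describes estimating body impulse responses by deconvolving a non-impulsive (bowed glissando) excitation from the microphone recordings. A minimal frequency-domain sketch of regularized deconvolution, assuming a circular-convolution model and a small regularization constant (the thesis's actual algorithm is more elaborate):

```python
import numpy as np

def deconvolve_response(excitation, response, eps=1e-8):
    """Estimate an impulse response by regularized spectral division.

    excitation: string-vibration signal driving the body (x)
    response:   microphone recording, modelled as y = x (*) h
    Returns an estimate of h with the same length as the inputs.
    """
    n = len(excitation)
    X = np.fft.rfft(excitation, n)
    Y = np.fft.rfft(response, n)
    # Wiener-style division: eps keeps near-zero excitation bins stable.
    H = (Y * np.conj(X)) / (np.abs(X) ** 2 + eps)
    return np.fft.irfft(H, n)
```

With a broadband excitation such as a glissando sweep, the division is well conditioned at all frequencies of interest, which is the practical motivation for that excitation choice.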
6

Thoret, Etienne. "Caractérisation acoustique des relations entre les mouvements biologiques et la perception sonore : application au contrôle de la synthèse et à l'apprentissage de gestes." Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM4780/document.

Abstract:
This thesis focused on the relations between biological movements and auditory perception, considering the specific case of graphical movements and the friction sounds they produce. The originality of this work lies in the use of sound synthesis processes that are based on a perceptual paradigm and that can be controlled by gesture models. The synthesis model made it possible to generate acoustic stimuli whose timbre was directly modulated by the velocity variations induced by a graphic gesture, in order to focus exclusively on the perceptual influence of this transformational invariant. A first study showed that we can recognize the kinematics of biological motion (the 1/3 power law) and discriminate simple geometric shapes simply by listening to the timbre variations of friction sounds that solely evoke velocity variations. A second study revealed the existence of dynamic sound prototypes corresponding to the most representative elliptic trajectories, revealing that prototypical shapes may emerge from sensorimotor coupling. A final study showed that the kinematics evoked by friction sounds can significantly affect the kinematic and geometric dimensions of a gesture in a graphic reproduction task, shedding light on the relevance of auditory perception in the multisensory integration of continuous motion in a situation never explored before. These results enabled the control of sound synthesis models from a gestural description and the creation of sonification tools for gesture learning and for the rehabilitation of a graphomotor disorder, dysgraphia.
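The 1/3 power law cited here states that the tangential speed of natural drawing movements scales as curvature to the power -1/3. A small sketch of computing the law-compliant speed profile along a sampled planar path, the kind of kinematics that, per the study, listeners can recognize in friction sounds (discrete-gradient curvature estimate; gain is arbitrary):

```python
import numpy as np

def one_third_power_law_speed(x, y, gain=1.0):
    """Tangential speed prescribed by the 1/3 power law, v = K * kappa**(-1/3),
    along a sampled planar path (x, y). Curvature is estimated with the
    parameterization-invariant formula |x'y'' - y'x''| / (x'^2 + y'^2)^1.5."""
    dx, dy = np.gradient(x), np.gradient(y)
    ddx, ddy = np.gradient(dx), np.gradient(dy)
    curvature = np.abs(dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5
    return gain * curvature ** (-1.0 / 3.0)
```

On an ellipse this yields the characteristic speed-up along the flat portions and slow-down at the tight ends; such a profile could then drive the velocity input of a friction-sound model.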
7

Devaney, Jason Wayne. "A study of articulatory gestures for speech synthesis." Thesis, University of Liverpool, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.284254.

8

Métois, Eric. "Musical sound information : musical gestures and embedding synthesis." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/29125.

9

Vigliensoni, Martin Augusto. "Touchless gestural control of concatenative sound synthesis." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104846.

Abstract:
This thesis presents research on three-dimensional position tracking technologies used to control concatenative sound synthesis and applies the results to the design of a new immersive interface for musical expression. The underlying concepts and characteristics of position tracking technologies are reviewed, and musical applications using these technologies are surveyed to exemplify their use. Four position tracking systems based on different technologies are empirically compared according to their performance parameters, technical specifications, and practical considerations of use. Concatenative sound synthesis, a corpus-based technique grounded in the segmentation, analysis and concatenation of sound units, is then discussed. Three implementations of this technique are compared according to the characteristics of the main components of their architectures. Finally, this thesis introduces SoundCloud, an implementation that extends the interaction possibilities of one of the reviewed concatenative synthesis systems with a novel visualisation application. SoundCloud allows a musician to perform with a database of sounds distributed in a three-dimensional descriptor space by exploring a performance space with her hands.
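The selection step in concatenative synthesis, here driven by hand position in a three-dimensional descriptor space, reduces in its simplest form to a nearest-neighbour lookup. A hedged sketch of that step (the descriptor names and brute-force search are illustrative, not SoundCloud's implementation):

```python
import numpy as np

def select_units(corpus_descr, target_descr):
    """Nearest-neighbour unit selection for concatenative synthesis.

    corpus_descr: (n_units, 3) descriptor vectors (e.g. pitch, loudness,
                  brightness) for the analysed sound units.
    target_descr: (n_frames, 3) trajectory traced by the performer's hand
                  in the descriptor space.
    Returns the index of the closest corpus unit for each target frame.
    """
    # Squared Euclidean distances between every target frame and every unit.
    d2 = ((target_descr[:, None, :] - corpus_descr[None, :, :]) ** 2).sum(-1)
    return d2.argmin(axis=1)
```

Real systems add concatenation costs between consecutive units and faster spatial indexing, but the descriptor-space distance above is the core of the mapping from hand position to sound.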
10

Maestre Gómez, Esteban. "Modeling instrumental gestures: an analysis/synthesis framework for violin bowing." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7562.

Abstract:
This work presents a methodology for modeling instrumental gestures in excitation-continuous musical instruments. In particular, it approaches bowing control in violin classical performance. Nearly non-intrusive sensing techniques are introduced and applied for accurately acquiring relevant timbre-related bowing control parameter signals and constructing a performance database. By defining a vocabulary of bowing parameter envelopes, the contours of bow velocity, bow pressing force, and bow-bridge distance are modeled as sequences of cubic Bézier curve segments, yielding a robust parameterization that is well suited for reconstructing the original contours with significant fidelity. An analysis/synthesis statistical modeling framework is constructed from a database of parameterized bowing-control contours, enabling a flexible mapping between score annotations and bowing parameter envelopes. The framework is used for score-based generation of synthetic bowing parameter contours through a bow planning algorithm able to reproduce the constraints imposed by the finite length of the bow. The rendered bowing control signals are successfully used to drive offline violin sound generation through two of the most widespread synthesis techniques: digital waveguide physical modeling and sample-based synthesis.
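The envelope vocabulary described above represents each bowing-parameter contour as a chain of cubic Bézier segments. A minimal sketch of sampling one such segment (the control points are hypothetical (time, value) pairs, not fitted data from the thesis):

```python
import numpy as np

def cubic_bezier(p0, p1, p2, p3, n=101):
    """Sample one cubic Bézier segment,
    B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3,
    as used to model a stretch of bow velocity, force, or bow-bridge
    distance between two anchor points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    t = np.linspace(0.0, 1.0, n)[:, None]
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
```

Chaining segments end-to-end, with the interior control points as free parameters, gives the compact, smooth parameterization of an envelope that such a framework can fit and resynthesize.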
11

Goudard, Vincent. "Représentation et contrôle dans le design interactif des instruments de musique numériques." Thesis, Sorbonne université, 2020. https://accesdistant.sorbonne-universite.fr/login?url=http://theses-intra.upmc.fr/modules/resources/download/theses/2020SORUS051.pdf.

Abstract:
Digital musical instruments appear as complex objects, positioned in continuity with the history of lutherie while marked by the strong disruption brought about by digital technology and its consequences in terms of sonic possibilities, relations between gesture and sound, listening situations, reconfigurability of instruments, and so on. This doctoral work proposes an analysis of the characteristics arising from the integration of digital technology into musical instruments, drawing notably on musicological reflection, on software and hardware development, on musical practice, and on exchanges with other musicians, instrument makers, composers, and researchers.
12

洪觀宇 and Roy Hung. "Time domain analysis and synthesis of cello tones based on perceptual quality and playing gestures." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31215348.

13

Hung, Roy. "Time domain analysis and synthesis of cello tones based on perceptual quality and playing gestures /." Hong Kong : University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20665672.

14

Evrard, Marc. "Synthèse de parole expressive à partir du texte : Des phonostyles au contrôle gestuel pour la synthèse paramétrique statistique." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112202.

Abstract:
The subject of this thesis was the study and conception of a platform for expressive speech synthesis. The LIPS3 Text-to-Speech system, developed in the context of this thesis, includes a linguistic module and a parametric statistical module (built upon HTS and STRAIGHT). The system was based on a new single-speaker corpus, designed, recorded and annotated. The first study analyzed the influence of the precision of the training corpus phonetic labeling on the synthesis quality. It showed that statistical parametric synthesis is robust to labeling and alignment errors. This addresses the issue of variation in phonetic realizations for expressive speech. The second study presents an acoustico-phonetic analysis of the corpus, characterizing the expressive space used by the speaker to instantiate the instructions that described the different expressive conditions. Voice source parameters and articulatory settings were analyzed according to their phonetic classes, which allowed for a fine phonostylistic characterization. The third study focused on intonation and rhythm. Calliphony 2.0 is a real-time chironomic interface that controls the f0 and rhythmic parameters of prosody, using drawing/writing hand gestures with a stylus and a graphic tablet. These hand-controlled modulations are used to enhance the TTS output, producing speech that is more realistic, without degradation as they are applied directly to the vocoder parameters. Intonation and rhythm stylization using this interface brings significant improvement to the prototypicality of expressivity, as well as to the general quality of synthetic speech. These studies show that parametric statistical synthesis, combined with a chironomic interface, offers an efficient solution for expressive speech synthesis, as well as a powerful tool for the study of prosody.
15

Wanderley, Marcelo Mortensen. "Interaction musicien-instrument : application au contrôle gestuel de la synthèse sonore." Paris 6, 2001. http://www.theses.fr/2001PA066175.

Abstract:
This thesis presents an analysis of musician-instrument interaction and applications to the gestural control of sound synthesis. First, a generic digital musical instrument (DMI) and its components are studied, and new approaches to its design are suggested. Instrumental playing on acoustic instruments is then analysed with the aim of proposing directions for the design of DMIs. The first part addresses the general question of interactive musical systems and introduces the main parts of a DMI. The notion of gesture in music is studied, and complements to the existing literature are suggested. Possibilities for gesture acquisition and the realisation of input devices are then defined, and evaluation techniques appropriate to the musical context are proposed. This is followed by an analysis of mapping strategies between the variables of the gestural controller and the synthesis variables. This part concludes with the description of a real-time system in which several of the ideas presented have been implemented. The second part presents the analysis of performances of clarinet pieces: the ancillary movements of instrumentalists are measured and analysed. This analysis shows that these movements are part of instrumental performance and inseparable from interpretation. Comparisons of different performances by the same player show striking similarities, both temporal and spatial. The analysis of performances of the same piece by different players reveals different levels of movement, depending on physiological, structural or interpretative factors.
Finally, the last chapter of this work studies the acoustic influence of the performer's movements and shows that, under certain recording conditions, they produce strong amplitude fluctuations of the partials. A simple model of this effect has been implemented and simulation results are presented, which suggest that these movements could be used as an additional synthesis parameter and thus improve the design of existing DMIs.
16

Fares, Mireille. "Multimodal Expressive Gesturing With Style." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS017.

Abstract:
The generation of expressive gestures allows Embodied Conversational Agents (ECAs) to articulate speech intent and content in a human-like fashion. The central theme of the manuscript is to leverage and control the ECAs' behavioral expressivity by modelling the complex multimodal behavior that humans employ during communication. The driving forces of the thesis are twofold: (1) to exploit speech prosody, visual prosody and language with the aim of synthesizing expressive and human-like behaviors for ECAs; (2) to control the style of the synthesized gestures such that we can generate them with the style of any speaker. With these motivations in mind, we first propose a semantically aware and speech-driven facial and head gesture synthesis model trained on the TEDx Corpus, which we collected. We then propose ZS-MSTM 1.0, an approach to synthesize stylized upper-body gestures, driven by the content of a source speaker's speech and corresponding to the style of any target speaker, seen or unseen by our model. It is trained on the PATS Corpus, which includes multimodal data of speakers with different behavioral styles. ZS-MSTM 1.0 is not limited to PATS speakers and can generate gestures in the style of any new speaker without further training or fine-tuning, rendering our approach zero-shot. Behavioral style is modelled from multimodal speaker data (language, body gestures, and speech) and independently of the speaker's identity ("ID"). We additionally propose ZS-MSTM 2.0 to generate stylized facial gestures in addition to the upper-body gestures. We train ZS-MSTM 2.0 on the PATS Corpus, which we extended to include dialog acts and 2D facial landmarks.
17

Bouënard, Alexandre. "Synthesis of Music Performances: Virtual Character Animation as a Controller of Sound Synthesis." PhD thesis, Université de Bretagne Sud, 2009. http://tel.archives-ouvertes.fr/tel-00497292.

Full text
Abstract:
Recent years have seen the emergence of numerous musical interfaces whose main goal is to offer new instrumental experiences. The specification of such interfaces generally draws on musicians' expertise in apprehending multiple, heterogeneous sensory data (visual, auditory and tactile). These interfaces thus involve processing these different data to design new interaction modes. This thesis focuses more specifically on the analysis, modelling and synthesis of percussion playing situations. We propose a system for synthesizing the visual and audio feedback of percussion performances, in which a virtual percussionist controls sound synthesis processes. The analysis stage shows the importance of the control of the mallet extremity by expert percussionists playing the timpani. This analysis requires the prior motion capture of the instrumental gestures of several percussionists. It leads to the extraction of parameters from the captured mallet-tip trajectories for various playing variations. These parameters are quantitatively evaluated by their ability to represent those variations. The synthesis system proposed in this work implements the physics-based animation of a virtual percussionist able to control sound synthesis processes. The physics-based animation introduces a new way of controlling the physical model through the sole specification of the mallet-tip trajectory. This control mode is particularly relevant given the importance of mallet control highlighted in the preceding analysis. The physics-based approach is furthermore used to allow the virtual percussionist to interact with a physical model of a timpani. Finally, the proposed system is used from a musical composition perspective.
New percussion playing situations are constructed through gesture scores. These are obtained by assembling and articulating canonical gesture units available in the captured data. This approach is applied to the composition and synthesis of percussion exercises, and is qualitatively evaluated by a percussion teacher.
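As a rough illustration of the gesture-score idea described in this abstract (assembling canonical gesture units from captured data into new sequences), the following sketch concatenates named units into one trajectory. This is a minimal illustration, not the thesis's system: the unit names and the mallet-tip height values are invented, and the "articulation" between units is reduced to averaging at the junction.

```python
# Canonical gesture units, here reduced to short mallet-tip height
# trajectories (names and values are illustrative, not from the corpus).
units = {
    "legato_stroke": [0.30, 0.18, 0.05, 0.18, 0.30],
    "accent_stroke": [0.40, 0.20, 0.02, 0.25, 0.40],
}

def render_score(score, units):
    """Concatenate canonical gesture units into one continuous trajectory,
    articulating consecutive units by averaging heights at the junction."""
    traj = []
    for name in score:
        unit = units[name]
        if traj:
            traj[-1] = (traj[-1] + unit[0]) / 2  # smooth the junction
            traj.extend(unit[1:])
        else:
            traj.extend(unit)
    return traj

trajectory = render_score(["legato_stroke", "accent_stroke", "legato_stroke"], units)
```

Each junction merges the end of one unit with the start of the next, so a score of three five-sample units yields a single 13-sample trajectory.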
APA, Harvard, Vancouver, ISO, and other styles
18

Montaño, Aparicio Raúl. "Prosodic and Voice Quality Cross-Language Analysis of Storytelling Expressive Categories Oriented to Text-To-Speech Synthesis." Doctoral thesis, Universitat Ramon Llull, 2016. http://hdl.handle.net/10803/390960.

Full text
Abstract:
For ages, the oral interpretation of tales and stories has been a worldwide tradition tied to entertainment, education, and the perpetuation of culture. In recent decades, some works have focused on the analysis of this particular speaking style, which is rich in subtle expressive nuances conveyed by specific acoustic cues. In line with this, there has also been growing interest in the development of storytelling applications, such as those related to interactive storytelling. This thesis deals with two key aspects of audiovisual storytellers: improving the naturalness of expressive synthetic speech by analysing storytelling speech in detail, and providing better non-verbal language to a speaking avatar by synchronizing that speech with its gestures. To that end, it is necessary to understand in detail the acoustic characteristics of this speaking style and the interaction between speech and gestures. Regarding the acoustic characteristics of storytelling speech, the related literature has dealt with its analysis in terms of prosody, while it has only been suggested that voice quality may play an important role in modelling its subtleties. In this thesis, the role of both prosody and voice quality in indirect storytelling speech is analysed across languages to identify the main expressive categories it is composed of, together with the acoustic parameters that characterize them. To do so, an analysis methodology is proposed to annotate this speaking style at the sentence level, based on storytelling discourse modes (narrative, descriptive, and dialogue), in addition to introducing narrative sub-modes.
Following this annotation methodology, the indirect speech of a story oriented to a young audience (covering the Spanish, English, French, and German versions) is analysed in terms of prosody and voice quality through statistical and discriminant analyses, after classifying the sentence-level utterances of the story into their corresponding expressive categories. The results confirm the existence of storytelling categories containing subtle expressive nuances across the considered languages, beyond the narrators' personal styles. In this sense, evidence is presented suggesting that such storytelling expressive categories are conveyed with subtler speech nuances than basic emotions, by comparing their acoustic patterns to those obtained from emotional speech data. The analyses also show that prosody and voice quality contribute almost equally to the discrimination among storytelling expressive categories, which are conveyed with similar acoustic patterns across languages. It is also worth noting the strong agreement observed in the selection of the expressive category per utterance across the narrators even though, to our knowledge, no prior indications were given to them. In order to translate all these expressive categories to a corpus-based Text-To-Speech system, a speech corpus would have to be recorded for each category. However, building ad-hoc speech corpora for each and every specific expressive style becomes a very daunting task. In this work, we introduce an alternative based on a synthesis-oriented analysis methodology designed to derive rule-based models from a small but representative set of utterances, which can be used to generate storytelling speech from neutral speech. The experiments conducted on increasing suspense as a proof of concept show the viability of the proposal in terms of naturalness and storytelling resemblance.
Finally, regarding the interaction between speech and gestures, an analysis is performed in terms of timing and emphasis, oriented to driving a 3D storytelling avatar. To that end, strength indicators are defined for both speech and gestures. After validating them through perceptual tests, an intensity rule is obtained from their correlation. Moreover, a synchrony rule is derived to determine temporal correspondences between speech and gestures. These analyses were conducted on aggressive and neutral performances to cover a broad range of emphatic levels, as a first step towards evaluating the integration of a speaking avatar after the expressive Text-To-Speech system.
APA, Harvard, Vancouver, ISO, and other styles
19

Martín-Albo, Simón Daniel. "Contributions to Pen & Touch Human-Computer Interaction." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/68482.

Full text
Abstract:
[EN] Computers are now present everywhere, but their potential is not fully exploited due to a lack of acceptance. In this thesis, the pen computer paradigm is adopted, whose main idea is to replace all input devices by a pen and/or the fingers, given that this rejection originates in unfriendly interaction devices, which must be replaced by something easier for the user. This paradigm, which was proposed several years ago, has only recently been fully implemented in products such as smartphones. But computers are effectively illiterate: they do not understand gestures or handwriting, so a recognition step is required to "translate" the meaning of these interactions into a computer-understandable language. And for this input modality to be actually usable, its recognition accuracy must be high enough. To realistically consider a broader deployment of pen computing, it is necessary to improve the accuracy of handwriting and gesture recognizers. This thesis is devoted to studying different approaches to improve the recognition accuracy of those systems. First, we investigate how to take advantage of interaction-derived information to improve the accuracy of the recognizer. In particular, we focus on the interactive transcription of text images. Here the system initially proposes an automatic transcript. If necessary, the user can make some corrections, implicitly validating a correct part of the transcript. The system must then take this validated prefix into account to suggest a suitable new hypothesis. Given that in such an application the user is constantly interacting with the system, it makes sense to adapt this interactive application for use on a pen computer. User corrections are provided by means of pen strokes, and it is therefore necessary to introduce a recognizer in charge of decoding this kind of nondeterministic user feedback.
However, this recognizer's performance can be boosted by taking advantage of interaction-derived information, such as the user-validated prefix. Then, this thesis focuses on the study of human movements, in particular hand movements, from a generation point of view, by tapping into the Kinematic Theory of rapid human movements and its Sigma-Lognormal model. Understanding how the human body generates movements, and particularly understanding the origin of human movement variability, is important in the development of a recognition system. The contribution of this thesis to this topic is important, since a new technique (which improves on previous results) to extract the Sigma-Lognormal model parameters is presented. Closely related to the previous work, this thesis studies the benefits of using synthetic data for training. The easiest way to train a recognizer is to provide "infinite" data representing all possible variations. In general, the more training data, the smaller the error. But it is usually not possible to increase the size of a training set indefinitely: recruiting participants, collecting data, labeling, etc., can be time-consuming and expensive. One way to overcome this problem is to create and use synthetically generated data that looks human-produced. We study how to create such synthetic data and explore different approaches to using it, both for handwriting and for gesture recognition. The different contributions of this thesis have obtained good results, producing several publications in international conferences and journals. Finally, three applications related to the work of this thesis are presented. First, we created Escritorie, a digital desk prototype based on the pen computer paradigm for transcribing handwritten text images. Second, we developed "Gestures à Go Go", a web application for bootstrapping gestures. Finally, we studied another interactive application under the pen computer paradigm.
In this case, we study how translation reviewing can be done more ergonomically using a pen.
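The Sigma-Lognormal model mentioned in this abstract describes the speed profile of a rapid human movement as a sum of lognormal stroke contributions. The following is a minimal sketch of the forward model only (the parameter values are illustrative, and this is not the thesis's parameter-extraction technique):

```python
import math

def lognormal_velocity(t, D, t0, mu, sigma):
    """Speed contribution of a single stroke at time t: a lognormal in
    (t - t0) with amplitude D, log-time delay mu and log-response sigma."""
    if t <= t0:
        return 0.0
    x = math.log(t - t0) - mu
    return (D / (sigma * math.sqrt(2 * math.pi) * (t - t0))
            * math.exp(-x * x / (2 * sigma * sigma)))

def sigma_lognormal_profile(t, strokes):
    """Sigma-Lognormal model: the overall speed profile is the sum of the
    lognormal contributions of the individual strokes."""
    return sum(lognormal_velocity(t, *s) for s in strokes)

# Two overlapping strokes, each (D, t0, mu, sigma) -- illustrative values.
strokes = [(5.0, 0.0, -1.6, 0.3), (3.0, 0.15, -1.5, 0.25)]
profile = [sigma_lognormal_profile(0.01 * k, strokes) for k in range(100)]
peak_time = max(range(100), key=lambda k: profile[k]) * 0.01
```

Sampling the summed profile produces the familiar bell-shaped, slightly asymmetric speed curve of rapid movements; movement variability is then modelled as variability of the per-stroke parameters.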
Martín-Albo Simón, D. (2016). Contributions to Pen & Touch Human-Computer Interaction [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/68482
APA, Harvard, Vancouver, ISO, and other styles
20

Faria, Regis Rossi Alves. "Aplicação de wavelets na análise de gestos musicais em timbres de instrumentos acústicos tradicionais." Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-18072013-104904/.

Full text
Abstract:
Expressiveness is a key element for conveying emotion in music, and its modeling is necessary to conceive more realistic synthesis systems. Musical gestures executed during a performance carry the information responsible for the perceived expressiveness, and may be tracked by means of the sonic patterns associated with them across several resolution scales. A relevant set of musical gestures was studied through a multiresolution analysis using the wavelet transform. The choice of this tool is mainly due to its natural ability to perform time-scale/frequency analysis, and to its similarities with the early stages of auditory processing. Twenty-seven musical events were captured from violin and flute performances and analyzed in order to evaluate the applicability of this tool to the identification and segregation of sonic patterns associated with expressive musical gestures. The wavelet algorithms were implemented on the MATLAB platform, employing filter banks organized in a pyramidal scheme. Graphical and sonic analysis routines and a user interface were implemented on the same platform. It was verified that wavelets enable the identification of sonic patterns associated with musical gestures, revealing different properties at different levels of the analysis. The technique proved useful for isolating noise from different sources, extracting transients associated with sudden and/or intense gestures, and segregating the tonal harmonic structure, among other important capabilities. Particularities of the technique and observed secondary effects are discussed, and the sonic patterns at the wavelet levels are correlated with the musical gestures that produced them. Future work is proposed addressing further investigation of certain musical events and phenomena observed, as well as the study of alternative implementations.
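The pyramidal filter-bank decomposition described in this abstract can be illustrated with the simplest wavelet, the Haar basis (a stand-in here; the abstract does not specify which wavelet family the thesis used). Each level splits the signal into a half-length smooth approximation and a detail band, and transients such as sudden gestures concentrate in the detail bands:

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform: split a signal into a
    half-length approximation (low-pass) and detail (high-pass) band."""
    n = len(signal) // 2
    approx = [(signal[2*i] + signal[2*i+1]) / math.sqrt(2) for i in range(n)]
    detail = [(signal[2*i] - signal[2*i+1]) / math.sqrt(2) for i in range(n)]
    return approx, detail

def pyramid(signal, levels):
    """Pyramidal (Mallat-style) decomposition: re-apply the filter bank to
    the approximation band, collecting one detail band per level."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

# A sudden "gesture-like" transient riding on a smooth ramp.
sig = [float(i) for i in range(16)]
sig[8] += 10.0
approx, details = pyramid(sig, 3)
```

Because the Haar basis is orthonormal, the signal's energy is exactly redistributed across the approximation and detail bands, which is what makes per-level comparison of patterns meaningful.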
APA, Harvard, Vancouver, ISO, and other styles
21

Perrotin, Olivier. "Chanter avec les mains : interfaces chironomiques pour les instruments de musique numériques." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112207/document.

Full text
Abstract:
This thesis deals with the real-time control of singing voice synthesis by a graphic tablet, based on the digital musical instrument Cantor Digitalis.The relevance of the graphic tablet for intonation control is first considered, showing that the tablet provides more precise pitch control than the real voice in experimental conditions.To extend the accuracy of control to any situation, a dynamic pitch warping method for intonation correction is developed. It makes it possible to play below the pitch perception limens while preserving the musician's expressivity. Objective and perceptive evaluations validate the method's efficiency.The use of new interfaces for musical expression raises the question of the modalities involved in playing the instrument. A third study reveals a preponderance of the visual modality over auditory perception for intonation control, due to the introduction of visual cues on the tablet surface. Nevertheless, this is compensated by the expressivity allowed by the interface.The writing or drawing ability acquired since early childhood enables a quick acquisition of expert control of the instrument. An ensemble of gestures dedicated to the control of different vocal effects is suggested.Finally, intensive practice of the instrument takes place within the Chorus Digitalis ensemble, to test and promote our work. Artistic research has been conducted on the choice of the Cantor Digitalis' musical repertoire. Moreover, a visual feedback display dedicated to the audience has been developed, extending the perception of the players' pitch and articulation.
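The dynamic pitch warping idea, correcting intonation below the perceptual limen while preserving expressivity, can be illustrated minimally: attract the played pitch toward the nearest equal-tempered note while keeping a fraction of the player's deviation, so vibrato and expressive inflections survive. This sketch is not Perrotin's actual method; the attraction model and the `strength` parameter are our simplification.

```python
def warp_pitch(midi_pitch, strength=0.5):
    """Pull a continuous pitch (in MIDI semitones) toward the nearest
    equal-tempered note, retaining (1 - strength) of the deviation so that
    expressive pitch movement is not flattened out entirely."""
    nearest = round(midi_pitch)
    deviation = midi_pitch - nearest
    return nearest + (1.0 - strength) * deviation
```

With `strength=1.0` the output is hard-quantized; with `strength=0.0` it is untouched; intermediate values trade accuracy against expressivity.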
APA, Harvard, Vancouver, ISO, and other styles
22

Demoucron, Matthias. "On the control of virtual violins - Physical modelling and control of bowed string instruments." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2008. http://tel.archives-ouvertes.fr/tel-00349920.

Full text
Abstract:
This thesis deals with the control of sound synthesis of bowed string instruments based on physical modelling. It builds, on the one hand, on a systematic exploration of the influence of the control parameters (bow force, bow velocity and bow-bridge distance) on the behaviour of the model, and on the other hand, on measurements of the actual control exerted by the player, in order to obtain realistic control of the physical model. A physical model based on a modal resolution of the string equation is first presented and implemented for violin sound synthesis. The behaviour of the physical model is then examined through simulations, focusing on two aspects: the "playability", i.e. the control-parameter space in which a periodic Helmholtz motion is obtained, and the variations of the properties of the synthesized sound (oscillation frequency, sound level and spectral centroid) within this parameter space. The second part of this work concerns the development of a sensor for measuring the force that the bow exerts on the string in a real playing context. The sensor is then combined with an optical motion-capture system in order to measure the complete set of the violinist's playing parameters. The last part presents the analysis of measurements of these control parameters for typical bowing patterns (sautillé, spiccato, martelé, tremolo, détaché). These measurements allow certain properties of the instrumental gesture to be described, and realistic control of the sound synthesis to be proposed for different bowing styles and musical tasks.
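The "playability" region mentioned in the abstract is classically bounded by Schelleng's maximum and minimum bow-force limits, which fall off as 1/β and 1/β² with the relative bow-bridge distance β. The sketch below uses the textbook form of these bounds with illustrative placeholder constants; it is not the thesis's simulation.

```python
def schelleng_limits(v_bow, beta, z0=0.55, mu_s=0.8, mu_d=0.3, r_bridge=5000.0):
    """Textbook Schelleng bounds on bow force for sustained Helmholtz motion.
    z0: string characteristic impedance, mu_s/mu_d: static/dynamic friction,
    r_bridge: bridge resistance -- all values here are placeholders."""
    delta_mu = mu_s - mu_d
    f_max = 2.0 * z0 * v_bow / (beta * delta_mu)                       # ~ 1/beta
    f_min = z0 ** 2 * v_bow / (2.0 * beta ** 2 * r_bridge * delta_mu)  # ~ 1/beta**2
    return f_min, f_max

def playable(force, v_bow, beta):
    """True when (force, v_bow, beta) falls inside the Helmholtz region."""
    f_min, f_max = schelleng_limits(v_bow, beta)
    return f_min <= force <= f_max
```

Sweeping `beta` and `force` on log axes with this function reproduces the wedge shape of the classic Schelleng diagram.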
APA, Harvard, Vancouver, ISO, and other styles
23

Janer, Mestres Jordi. "Singing-driven interfaces for sound synthesizers." Doctoral thesis, Universitat Pompeu Fabra, 2008. http://hdl.handle.net/10803/7550.

Full text
Abstract:
Digital musical instruments are usually decomposed into two main constituent parts: a user interface and a sound synthesis engine. The user interface is popularly referred to as a musical controller, and its design is the primary objective of this dissertation. Under the title of singing-driven interfaces, we aim to design systems that allow controlling the synthesis of musical instrument sounds with the singing voice.

This dissertation searches for the relationships between the voice and the sound of musical instruments by addressing both the voice signal description and the mapping strategies for meaningful control of the synthesized sound.
We propose two different approaches, one for controlling a singing voice synthesizer and another for controlling the synthesis of instrumental sounds. For the latter, we suggest representing the voice signal as vocal gestures, contributing several voice analysis methods.
To demonstrate the obtained results, we developed two real-time prototypes.
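A mapping from frame-level voice descriptors to synthesizer controls, the kind of strategy this dissertation investigates, can be sketched as a simple function. The descriptor set, parameter names, and ranges below are hypothetical illustrations, not Janer's actual mapping.

```python
def map_vocal_gesture(pitch_hz, energy, breathiness):
    """Map per-frame voice descriptors to hypothetical synthesizer controls.
    All three control names and scalings are our own assumptions."""
    return {
        "osc_freq": pitch_hz,                           # pitch drives oscillator frequency
        "amp": min(1.0, energy),                        # loudness drives amplitude
        "filter_cutoff": 500.0 + 4000.0 * breathiness,  # noisiness opens the filter
    }
```

Calling this once per analysis frame yields a continuous stream of control values, which is what distinguishes a voice-driven controller from a discrete note-event interface.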
APA, Harvard, Vancouver, ISO, and other styles
24

Chen, Wei Cheng, and 陳韋誠. "Gesture Recognition using HMM-based Fundamental Motion Synthesis with Implementation on a Wiimote." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/18269101288733000450.

Full text
Abstract:
Master's thesis
Chang Gung University
Department of Computer Science and Information Engineering
99
In this thesis, we use the tri-axial accelerometer of the Nintendo Wiimote as the input device of a gesture recognition system, with the Hidden Markov Model (HMM) as the recognition algorithm. We use a set of basic movements, called "fundamental motions", from which all other complex motions are synthesized; these fundamental motions serve as the HMM modeling units. In a preliminary study, we use the Arabic numerals '0' to '9' as the first recognition task. Analyzing this task, we find a set of 16 motions appropriate as HMM modeling units. The second recognition task covers the Arabic numerals '10' to '99'; here we again build on fundamental motions, but add a connection signal to bridge consecutive models. We found that using the connection signal increases the recognition rate by about 30%. With appropriate feature extraction and HMM topology, an HMM-Viterbi search algorithm achieves nearly 98% accuracy on single digits and 62.26% on average for continuous gestures of ten numbers in a row.
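The HMM-Viterbi search at the core of such a recognizer finds the most likely sequence of hidden states (here, fundamental-motion units) for an observation sequence. The following is a standard textbook Viterbi implementation with a toy two-state model, not the thesis's 16-motion system.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for discrete observations."""
    v = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            # Best predecessor for state s at time t.
            prob, prev = max(
                (v[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p) for p in states
            )
            v[t][s] = prob
            back[t][s] = prev
    # Backtrack from the best final state.
    state = max(v[-1], key=v[-1].get)
    path = [state]
    for t in range(len(obs) - 1, 0, -1):
        state = back[t][state]
        path.insert(0, state)
    return path
```

In a real system the observations would be quantized accelerometer features rather than symbols, and each state would be one HMM unit of a fundamental motion.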
APA, Harvard, Vancouver, ISO, and other styles
25

Huang, Kai-Chih, and 黃楷智. "A Study of Real-Time Image Synthesis System Using White Balance Adjustment Techniques and Gesture Interaction." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/53942254254430659675.

Full text
Abstract:
碩士
國立高雄第一科技大學
資訊管理研究所
100
In this study, we propose an interactive platform that combines augmented reality with real-time video processing techniques. With the improvement and popularity of digital multimedia products, digital cameras, smart phones and other digital photography devices have gradually become part of daily life. While traveling, most people like to shoot a lot of photos. However, some of these commemorative photos turn out unsatisfactory because of adverse conditions such as poor lighting. In general, such photos can be edited with professional image processing tools, but the editing process is complicated, time consuming and often ineffective. The purpose of this study is to develop a somatosensory interaction and real-time image synthesis system based on white balance adjustment to solve this problem. The techniques of the designed system include (1) a virtual-studio background subtraction method to extract the characters from the captured images, (2) image edge smoothing and noise removal, (3) embedding the extracted character image sequence into a background image to create a pre-synthesized picture, (4) color adjustment of the image according to color temperature information, so that the composite image shows a more natural effect, and (5) gesture and motion detection for game or interface control. The developed real-time video synthesis system allows the user to freely choose different scenic spots as the background. We also extend this method to project the segmented character image onto other video frames to produce synthetic video. The results can be applied to gymnastics, dance and body movement for educational purposes, helping users in self-training and supporting action learning.
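Color adjustment from color-temperature information, as in step (4), can be approximated by the classic gray-world white-balance algorithm: scale each channel so that its mean matches the global mean, neutralizing a color cast. This is a simpler stand-in for the system described, and the pixel data are illustrative.

```python
def gray_world_balance(pixels):
    """Gray-world white balance over a list of (r, g, b) float pixels.
    Each channel is scaled so its mean equals the mean over all channels."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    gains = [gray / m if m else 1.0 for m in means]
    return [tuple(min(255.0, p[c] * gains[c]) for c in range(3)) for p in pixels]
```

After balancing, all three channel means coincide, so a warm (reddish) cast from tungsten lighting or a cool (bluish) cast from shade is pulled back toward neutral.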
APA, Harvard, Vancouver, ISO, and other styles
26

"MirrorGen Wearable Gesture Recognition using Synthetic Videos." Master's thesis, 2018. http://hdl.handle.net/2286/R.I.51558.

Full text
Abstract:
In recent years, deep learning systems have outperformed traditional machine learning systems in most domains. There has recently been a lot of research in the field of hand gesture recognition using wearable sensors, due to the numerous advantages these systems have over vision-based ones. However, due to the lack of extensive datasets and the nature of Inertial Measurement Unit (IMU) data, it is difficult to apply deep learning techniques to them. Although many machine learning models achieve good accuracy, most of them assume that training data is available for every user, while other works that do not require user data have lower accuracies. MirrorGen is a technique that uses wearable sensor data to generate synthetic videos of hand movements, mitigating the traditional challenges of vision-based recognition such as occlusion, lighting restrictions, lack of viewpoint variation, and environmental noise. In addition, MirrorGen allows for user-independent recognition involving minimal human effort during data collection. It also leverages the advances in vision-based recognition by using techniques such as optical flow extraction and 3D convolution. Projecting the orientation (IMU) information to a video helps in gaining position information of the hands. To validate these claims, we perform entropy analysis on various configurations such as raw data, a stick model, a hand model, and real video. The human hand model is found to have an optimal entropy that helps in achieving user-independent recognition; it also serves as a pervasive option as opposed to video-based recognition. An average user-independent recognition accuracy of 99.03% was achieved for a sign language dataset with 59 different users and 20 different signs with 20 repetitions each, for a total of 23k training instances. Moreover, synthetic videos can be used to augment real videos to improve recognition accuracy.
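The step of "projecting the orientation (IMU) information to a video" amounts to turning orientation samples into 2D keypoints for synthetic frames. The pinhole-camera sketch below illustrates the idea only; the geometry, parameter names, and constants are our assumptions, not MirrorGen's actual rendering.

```python
import math

def project_hand(yaw, pitch, arm_len=1.0, focal=1.0):
    """Turn IMU yaw/pitch (radians) into a 2D image-plane point: orientation
    gives a 3D fingertip position (wrist at origin), then a pinhole camera
    on the +z axis projects it. All constants are illustrative."""
    x = arm_len * math.cos(pitch) * math.sin(yaw)
    y = arm_len * math.sin(pitch)
    z = arm_len * math.cos(pitch) * math.cos(yaw) + 2.0  # offset keeps z > 0
    return focal * x / z, focal * y / z

def synthesize_track(orientations):
    """Convert a sequence of (yaw, pitch) samples into one 2D keypoint per
    synthetic video frame."""
    return [project_hand(yaw, pitch) for yaw, pitch in orientations]
```

Rendering a hand model at each keypoint then yields frames that a video-based recognizer (optical flow, 3D convolution) can consume.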
Dissertation/Thesis
Masters Thesis Computer Science 2018
APA, Harvard, Vancouver, ISO, and other styles
27

"Modeling instrumental gestures: an analysis/synthesis framework for violin bowing." Universitat Pompeu Fabra, 2009. http://www.tesisenxarxa.net/TDX-1210109-120145/.

Full text
APA, Harvard, Vancouver, ISO, and other styles