Dissertations / Theses on the topic 'Gesture Synthesis'
Consult the top 27 dissertations / theses for your research on the topic 'Gesture Synthesis.'
Faggi, Simone. "An Evaluation Model For Speech-Driven Gesture Synthesis." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021. http://amslaurea.unibo.it/22844/.
Marrin Nakra, Teresa (Teresa Anne), 1970. "Inside the conductor's jacket : analysis, interpretation and musical synthesis of expressive gesture." Thesis, Massachusetts Institute of Technology, 2000. http://hdl.handle.net/1721.1/9165.
Includes bibliographical references (leaves 154-167).
We present the design and implementation of the Conductor's Jacket, a unique wearable device that measures physiological and gestural signals, together with the Gesture Construction, a musical software system that interprets these signals and applies them expressively in a musical context. Sixteen sensors have been incorporated into the Conductor's Jacket in such a way as to not encumber or interfere with the gestures of a working orchestra conductor. The Conductor's Jacket system gathers up to sixteen data channels reliably at rates of 3 kHz per channel, and also provides real-time graphical feedback. Unlike many gesture-sensing systems it not only gathers positional and accelerational data but also senses muscle tension from several locations on each arm. The Conductor's Jacket was used to gather conducting data from six subjects, three professional conductors and three students, during twelve hours of rehearsals and performances. Analyses of the data yielded thirty-five significant features that seem to reflect intuitive and natural gestural tendencies, including context-based hand switching, anticipatory 'flatlining' effects, and correlations between respiration and phrasing. The results indicate that muscle tension and respiration signals reflect several significant and expressive characteristics of a conductor's gestures. From these results we present nine hypotheses about human musical expression, including ideas about efficiency, intentionality, polyphony, signal-to-noise ratios, and musical flow state. Finally, this thesis describes the Gesture Construction, a musical software system that analyzes and performs music in real-time based on the performer's gestures and breathing signals. A bank of software filters extracts several of the features that were found in the conductor study, including beat intensities and the alternation between arms.
These features are then used to generate real-time expressive effects by shaping the beats, tempos, articulations, dynamics, and note lengths in a musical score.
by Teresa Marrin Nakra.
Ph.D.
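The feature bank described in this abstract lends itself to a concrete illustration. Below is a minimal, hypothetical sketch of one such feature, beat onsets picked as peaks of an acceleration channel; the data and threshold are invented, and the actual Gesture Construction filters are far more elaborate.

```python
# Hypothetical sketch: beat detection by peak-picking on a conducting
# acceleration channel, loosely inspired by the feature bank described
# above (the real system's filters are more elaborate).

def find_beats(signal, threshold):
    """Return indices of local maxima that exceed threshold."""
    beats = []
    for i in range(1, len(signal) - 1):
        if signal[i] > threshold and signal[i - 1] < signal[i] >= signal[i + 1]:
            beats.append(i)
    return beats

# Toy acceleration trace with three clear peaks.
trace = [0.0, 0.2, 1.0, 0.3, 0.1, 0.9, 0.2, 0.0, 1.1, 0.4]
print(find_beats(trace, 0.5))  # -> [2, 5, 8]
```

Beat intensity could then be read off as the peak height at each returned index.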
Pun, James Chi-Him. "Gesture recognition with application in music arrangement." Diss., University of Pretoria, 2006. http://upetd.up.ac.za/thesis/available/etd-11052007-171910/.
Wang, Yizhong Johnty. "Investigation of gesture control for articulatory speech synthesis with a bio-mechanical mapping layer." Thesis, University of British Columbia, 2012. http://hdl.handle.net/2429/43193.
Pérez Carrillo, Alfonso Antonio. "Enhancing spectral synthesis techniques with performance gestures using the violin as a case study." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7264.
Thoret, Etienne. "Caractérisation acoustique des relations entre les mouvements biologiques et la perception sonore : application au contrôle de la synthèse et à l'apprentissage de gestes." Thesis, Aix-Marseille, 2014. http://www.theses.fr/2014AIXM4780/document.
This thesis focused on the relations between biological movements and auditory perception, considering the specific case of graphical movements and the friction sounds they produce. The originality of this work lies in the use of sound synthesis processes that are based on a perceptual paradigm and can be controlled by gesture models. The synthesis model made it possible to generate acoustic stimuli whose timbre was directly modulated by the velocity variations induced by a graphic gesture, in order to focus exclusively on the perceptual influence of this transformational invariant. A first study showed that listeners can recognize biological motion kinematics (the 1/3 power law) and discriminate simple geometric shapes simply by listening to the timbre variations of friction sounds that solely evoke velocity variations. A second study revealed the existence of dynamic prototypes characterized by the sounds of the most representative elliptic trajectories, suggesting that prototypical shapes may emerge from sensorimotor coupling. A final study showed that the kinematics evoked by friction sounds can significantly affect the dynamic and geometric dimensions of visuo-motor coupling, shedding critical light on the relevance of auditory perception in the multisensory integration of continuous motion in a previously unexplored situation. These theoretical results enabled the control of sound synthesis models from a gestural description and the creation of sonification tools for gesture learning and for the rehabilitation of a graphomotor disorder, dysgraphia.
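The 1/3 power law mentioned in this abstract links tangential velocity to trajectory curvature. A small sketch under assumed constants (the gain K and the ellipse axes are illustrative, not values from the thesis) shows the velocity cue that a friction-sound model can expose: velocity dips where curvature peaks.

```python
import math

# Sketch of the 1/3 power law of biological motion: tangential velocity
# v relates to curvature kappa as v = K * kappa ** (-1/3).
# K and the ellipse dimensions below are illustrative assumptions.

def power_law_velocity(kappa, K=1.0):
    return K * kappa ** (-1.0 / 3.0)

def ellipse_curvature(u, a=2.0, b=1.0):
    """Curvature of the ellipse x = a*cos(u), y = b*sin(u)."""
    num = a * b
    den = (a ** 2 * math.sin(u) ** 2 + b ** 2 * math.cos(u) ** 2) ** 1.5
    return num / den

# Velocity is lowest at the sharp ends of the ellipse (high curvature),
# which is exactly the timbre variation a friction sound conveys.
v_sharp = power_law_velocity(ellipse_curvature(0.0))          # major-axis end
v_flat = power_law_velocity(ellipse_curvature(math.pi / 2))   # minor-axis end
print(v_sharp < v_flat)  # -> True
```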
Devaney, Jason Wayne. "A study of articulatory gestures for speech synthesis." Thesis, University of Liverpool, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.284254.
Métois, Eric. "Musical sound information : musical gestures and embedding synthesis." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/29125.
Vigliensoni, Martin Augusto. "Touchless gestural control of concatenative sound synthesis." Thesis, McGill University, 2011. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=104846.
This thesis presents a new interface for musical expression combining concatenative sound synthesis with free-space motion-capture technologies. The work begins with a survey of free-hand position-tracking devices, examining their operating principles and characteristics, along with examples of their application in musical contexts. Particular attention is given to four systems, whose technical specifications and performance (assessed with quantitative metrics) are compared experimentally. Concatenative synthesis is then described: this sound synthesis technique renders a musical target sequence from pre-recorded sounds, selected and concatenated according to how well they match the target. Three implementations of the technique are compared in order to choose one for our application. Finally, we describe SoundCloud, a new interface that extends the control possibilities of concatenative synthesis by adding a visual front end: it allows sound synthesis to be controlled with free-hand gestures by navigating a three-dimensional space of descriptors of the sounds in a database.
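The selection step of concatenative synthesis described above reduces, in its simplest form, to a nearest-neighbour search in descriptor space. A toy sketch with invented descriptors follows; the actual SoundCloud system adds three-dimensional navigation and further selection constraints.

```python
# Minimal sketch of unit selection in concatenative synthesis: for each
# target descriptor frame, pick the database unit whose descriptors are
# closest in Euclidean distance. Descriptors and values are invented.

def nearest_unit(target, units):
    """Return the index of the unit closest to the target descriptors."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(units)), key=lambda i: dist(units[i], target))

# Toy database: each unit described by (pitch, loudness, brightness).
database = [(60.0, 0.2, 0.1), (64.0, 0.8, 0.5), (67.0, 0.5, 0.9)]
target_sequence = [(63.5, 0.7, 0.4), (66.0, 0.6, 0.8)]

selection = [nearest_unit(t, database) for t in target_sequence]
print(selection)  # -> [1, 2]
```

The selected units would then be concatenated (with cross-fades) to render the target sequence.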
Maestre Gómez, Esteban. "Modeling instrumental gestures: an analysis/synthesis framework for violin bowing." Doctoral thesis, Universitat Pompeu Fabra, 2009. http://hdl.handle.net/10803/7562.
This work presents a methodology for modeling instrumental gestures in excitation-continuous musical instruments. In particular, it approaches bowing control in violin classical performance. Nearly non-intrusive sensing techniques are introduced and applied for accurately acquiring relevant timbre-related bowing control parameter signals and constructing a performance database. By defining a vocabulary of bowing parameter envelopes, the contours of bow velocity, bow pressing force, and bow-bridge distance are modeled as sequences of cubic Bézier curve segments, yielding a robust parameterization that is well suited for reconstructing original contours with significant fidelity. An analysis/synthesis statistical modeling framework is constructed from a database of parameterized contours of bowing controls, enabling a flexible mapping between score annotations and bowing parameter envelopes. The framework is used for score-based generation of synthetic bowing parameter contours through a bow planning algorithm able to reproduce constraints imposed by the finite length of the bow. Rendered bowing control signals are successfully applied to automatic performance, driving offline violin sound generation through two of the most widespread techniques: digital waveguide physical modeling and sample-based synthesis.
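The Bézier parameterization described in this abstract can be illustrated with a single-segment fit. This hedged sketch fixes the endpoints and solves a 2x2 least-squares system for the two inner control points; the thesis itself fits whole sequences of such segments to bowing-control contours.

```python
# Sketch: fit one cubic Bezier segment to a sampled (time, velocity)
# contour, endpoints fixed, inner control points solved by least squares.

def fit_cubic_bezier(samples):
    """samples: list of (t, v) with t in [0, 1]. Returns (p0, p1, p2, p3)."""
    p0, p3 = samples[0][1], samples[-1][1]
    # Residual r = v - (b0*p0 + b3*p3) = b1*p1 + b2*p2 -> 2x2 normal equations.
    a11 = a12 = a22 = c1 = c2 = 0.0
    for t, v in samples:
        b0 = (1 - t) ** 3
        b1 = 3 * (1 - t) ** 2 * t
        b2 = 3 * (1 - t) * t ** 2
        b3 = t ** 3
        r = v - b0 * p0 - b3 * p3
        a11 += b1 * b1; a12 += b1 * b2; a22 += b2 * b2
        c1 += b1 * r;   c2 += b2 * r
    det = a11 * a22 - a12 * a12
    p1 = (c1 * a22 - c2 * a12) / det
    p2 = (a11 * c2 - a12 * c1) / det
    return p0, p1, p2, p3

def bezier(t, p0, p1, p2, p3):
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)

# A contour that already is a cubic Bezier should be recovered exactly.
true_ctrl = (0.0, 0.8, 0.6, 0.1)
pts = [(i / 10, bezier(i / 10, *true_ctrl)) for i in range(11)]
fit = fit_cubic_bezier(pts)
print(all(abs(a - b) < 1e-6 for a, b in zip(fit, true_ctrl)))  # -> True
```

The four fitted numbers per segment are exactly the kind of compact parameterization the abstract describes.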
Goudard, Vincent. "Représentation et contrôle dans le design interactif des instruments de musique numériques." Thesis, Sorbonne université, 2020. https://accesdistant.sorbonne-universite.fr/login?url=http://theses-intra.upmc.fr/modules/resources/download/theses/2020SORUS051.pdf.
Digital musical instruments are complex objects, positioned in continuity with the history of lutherie while marked by the strong disruption brought about by digital technology and its consequences in terms of sonic possibilities, relations between gesture and sound, listening situations, reconfigurability of instruments, and so on. This doctoral work seeks to describe the characteristics that arise from the integration of digital technology into musical instruments, drawing on musicological reflection, software and hardware development, and musical practice, as well as numerous interactions with other musicians, instrument makers, composers and researchers.
洪觀宇 and Roy Hung. "Time domain analysis and synthesis of cello tones based on perceptual quality and playing gestures." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31215348.
Hung, Roy. "Time domain analysis and synthesis of cello tones based on perceptual quality and playing gestures /." Hong Kong : University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20665672.
Evrard, Marc. "Synthèse de parole expressive à partir du texte : Des phonostyles au contrôle gestuel pour la synthèse paramétrique statistique." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112202/document.
The subject of this thesis was the study and design of a platform for expressive speech synthesis. The LIPS3 Text-to-Speech system, developed in the context of this thesis, includes a linguistic module and a parametric statistical module (built upon HTS and STRAIGHT). The system is based on a new single-speaker corpus, designed, recorded and annotated. The first study analyzed the influence of the precision of the training corpus's phonetic labeling on synthesis quality. It showed that statistical parametric synthesis is robust to labeling and alignment errors, which addresses the issue of variation in phonetic realizations for expressive speech. The second study presents an acoustic-phonetic analysis of the corpus, characterizing the expressive space used by the speaker to instantiate the instructions that described the different expressive conditions. Voice source parameters and articulatory settings were analyzed according to their phonetic classes, which allowed for a fine phonostylistic characterization. The third study focused on intonation and rhythm. Calliphony 2.0 is a real-time chironomic interface that controls the f0 and rhythmic parameters of prosody, using drawing/writing hand gestures with a stylus and a graphic tablet. These hand-controlled modulations are used to enhance the TTS output, producing speech that is more realistic and without degradation, as they are applied directly to the vocoder parameters. Intonation and rhythm stylization using this interface brings significant improvement to the prototypicality of expressivity, as well as to the general quality of synthetic speech. These studies show that statistical parametric synthesis, combined with a chironomic interface, offers an efficient solution for expressive speech synthesis, as well as a powerful tool for the study of prosody.
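The chironomic f0 control described above can be illustrated with the simplest possible mapping, tablet ordinate to fundamental frequency on a logarithmic scale, so that equal hand displacements produce equal musical intervals. The base frequency and range here are assumptions, not values from the thesis.

```python
# Illustrative chironomic pitch mapping: tablet ordinate y in [0, 1] is
# mapped logarithmically onto a two-octave f0 range. The base frequency
# (110 Hz) and span (2 octaves) are invented for this sketch.

def y_to_f0(y, f0_min=110.0, octaves=2.0):
    return f0_min * 2.0 ** (y * octaves)

print(y_to_f0(0.0))  # -> 110.0
print(y_to_f0(0.5))  # -> 220.0  (one octave up at mid-tablet)
print(y_to_f0(1.0))  # -> 440.0
```

In a system like Calliphony 2.0 this stream of f0 values would be fed to the vocoder in real time.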
Wanderley, Marcelo Mortensen. "Interaction musicien-instrument : application au contrôle gestuel de la synthèse sonore." Paris 6, 2001. http://www.theses.fr/2001PA066175.
Fares, Mireille. "Multimodal Expressive Gesturing With Style." Electronic Thesis or Diss., Sorbonne université, 2023. http://www.theses.fr/2023SORUS017.
The generation of expressive gestures allows Embodied Conversational Agents (ECAs) to articulate speech intent and content in a human-like fashion. The central theme of the manuscript is to leverage and control ECAs' behavioral expressivity by modeling the complex multimodal behavior that humans employ during communication. The driving forces of the thesis are twofold: (1) to exploit speech prosody, visual prosody and language with the aim of synthesizing expressive and human-like behaviors for ECAs; (2) to control the style of the synthesized gestures so that we can generate them in the style of any speaker. With these motivations in mind, we first propose a semantically aware, speech-driven facial and head gesture synthesis model trained on the TEDx Corpus, which we collected. We then propose ZS-MSTM 1.0, an approach for synthesizing stylized upper-body gestures, driven by the content of a source speaker's speech and corresponding to the style of any target speaker, seen or unseen by our model. It is trained on the PATS Corpus, which includes multimodal data of speakers with different behavioral styles. ZS-MSTM 1.0 is not limited to PATS speakers and can generate gestures in the style of any new speaker without further training or fine-tuning, rendering our approach zero-shot. Behavioral style is modeled from multimodal speaker data (language, body gestures, and speech) and independently of the speaker's identity ("ID"). We additionally propose ZS-MSTM 2.0 to generate stylized facial gestures in addition to upper-body gestures. We train ZS-MSTM 2.0 on the PATS Corpus, which we extended to include dialog acts and 2D facial landmarks.
Bouënard, Alexandre. "Synthesis of Music Performances: Virtual Character Animation as a Controller of Sound Synthesis." Phd thesis, Université de Bretagne Sud, 2009. http://tel.archives-ouvertes.fr/tel-00497292.
Montaño Aparicio, Raúl. "Prosodic and Voice Quality Cross-Language Analysis of Storytelling Expressive Categories Oriented to Text-To-Speech Synthesis." Doctoral thesis, Universitat Ramon Llull, 2016. http://hdl.handle.net/10803/390960.
For ages, the oral interpretation of tales and stories has been a worldwide tradition tied to entertainment, education, and the perpetuation of culture. During the last decades, some works have focused on the analysis of this particular speaking style, rich in subtle expressive nuances conveyed by specific acoustic cues. In line with this fact, there has also been a growing interest in the development of storytelling applications, such as those related to interactive storytelling. This thesis deals with key aspects of audiovisual storytellers: improving the naturalness of expressive synthetic speech by analysing storytelling speech in detail, together with providing better non-verbal language to a speaking avatar by synchronizing that speech with its gestures. To that effect, it is necessary to understand in detail the acoustic characteristics of this particular speaking style and the interaction between speech and gestures. Regarding the acoustic characteristics of storytelling speech, the related literature has dealt with its acoustic analysis in terms of prosody, with only suggestions that voice quality may play an important role in modelling its subtleties. In this thesis, the role of both prosody and voice quality in indirect storytelling speech is analysed across languages to identify the main expressive categories that compose it, together with the acoustic parameters that characterize them. To do so, an analysis methodology is proposed to annotate this particular speaking style at the sentence level based on storytelling discourse modes (narrative, descriptive, and dialogue), besides introducing narrative sub-modes.
Considering this annotation methodology, the indirect speech of a story oriented to a young audience (covering the Spanish, English, French, and German versions) is analysed in terms of prosody and voice quality through statistical and discriminant analyses, after classifying the sentence-level utterances of the story into their corresponding expressive categories. The results confirm the existence of storytelling categories containing subtle expressive nuances across the considered languages, beyond the narrators' personal styles. In this sense, evidence is presented suggesting that such storytelling expressive categories are conveyed with subtler speech nuances than basic emotions, based on comparing their acoustic patterns to those obtained from emotional speech data. The analyses also show that prosody and voice quality contribute almost equally to the discrimination among storytelling expressive categories, which are conveyed with similar acoustic patterns across languages. It is also worth noting the strong agreement observed in the narrators' selection of the expressive category per utterance, even though, to our knowledge, no prior indications were given to them. In order to translate all these expressive categories to a corpus-based Text-To-Speech system, a speech corpus would have to be recorded for each category. However, building ad-hoc speech corpora for each and every specific expressive style is a daunting task. In this work, we introduce an alternative based on an analysis-oriented-to-synthesis methodology designed to derive rule-based models from a small but representative set of utterances, which can be used to generate storytelling speech from neutral speech. Experiments conducted on increasing suspense as a proof of concept show the viability of the proposal in terms of naturalness and storytelling resemblance.
Finally, regarding the interaction between speech and gestures, an analysis is performed in terms of timing and emphasis, aimed at driving a 3D storytelling avatar. To that effect, strength indicators are defined for speech and gestures. After validating them through perceptual tests, an intensity rule is obtained from their correlation. Moreover, a synchrony rule is derived to determine temporal correspondences between speech and gestures. These analyses were conducted on aggressive and neutral performances to cover a broad range of emphatic levels, as a first step towards integrating a speaking avatar with the expressive Text-To-Speech system.
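The derivation of an intensity rule from correlated strength indicators can be sketched as follows; the indicators (RMS energy for speech, a scalar gesture strength) and the toy data are illustrative assumptions, not the thesis's actual measurements.

```python
import math

# Hedged sketch of deriving an "intensity rule": per-utterance strength
# indicators for speech (RMS energy) and gesture (a scalar strength) are
# paired, and a linear rule mapping one to the other is fitted.

def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def linear_fit(xs, ys):
    """Least-squares slope and intercept for y ~ a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

# Toy paired observations: speech strength vs. gesture strength.
speech = [rms(f) for f in ([0.1, -0.1], [0.3, -0.3], [0.5, -0.5])]
gesture = [1.0, 3.0, 5.0]  # gesture strength grows with vocal effort

slope, intercept = linear_fit(speech, gesture)
print(round(slope, 6), round(intercept, 6))  # -> 10.0 0.0
```

The fitted rule (here gesture = 10 x speech) is the kind of mapping an avatar controller can apply per utterance.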
Martín-Albo Simón, Daniel. "Contributions to Pen & Touch Human-Computer Interaction." Doctoral thesis, Universitat Politècnica de València, 2016. http://hdl.handle.net/10251/68482.
Nowadays, computers are present everywhere, but their potential goes unexploited because of the "fear" they inspire. This thesis adopts the pen computer paradigm, whose central idea is to replace all input devices with an electronic pen or, directly, with the fingers. The rejection of computers originates in the use of interfaces that are unfriendly to humans. The paradigm dates back more than 40 years, but only recently has it begun to be implemented in mobile devices. Its slow and late adoption is probably due to the need for a recognizer that "translates" the user's strokes (handwritten text or gestures) into something the computer can understand. To think realistically about deploying the pen computer, the accuracy of text and gesture recognition must be improved, and the goal of this thesis is the study of different strategies to improve it. First, this thesis investigates how to exploit information derived from the interaction to improve recognition, specifically in the interactive transcription of handwritten text images. In interactive transcription, the system and the user work side by side to generate the transcription. The user validates the system's output by providing corrections, as handwritten text, which the system must take into account to provide a better transcription. This handwritten text must itself be recognized before it can be used. This thesis proposes exploiting contextual information, such as the prefix already validated by the user, to improve the recognition quality of the interaction. After this, the thesis focuses on the study of human movement, in particular the movement of the hands, using the Kinematic Theory and its Sigma-Lognormal model.
Understanding how the hands move while writing, and in particular the origin of handwriting variability, is important for developing a recognition system. This thesis makes an important contribution to this topic by presenting a new technique (improving on previous results) for extracting the Sigma-Lognormal model from handwritten strokes. Closely related to the previous work, the benefit of using synthetic data for training is studied. The easiest way to train a recognizer is to provide an "infinite" dataset representing all possible variations: in general, the more training data, the lower the recognizer's error. However, it is often impossible, or very expensive, to provide more data, so we studied how to create and use synthetic data that resemble real data. The different contributions of this thesis have obtained good results, yielding several publications in international conferences and journals. Finally, three applications related to the work of this thesis were also explored. First, Escritorie, a digital-desk prototype based on the pen computer paradigm for the interactive transcription of handwritten documents. Second, "Gestures à Go Go", a web application for generating synthetic data and bundling it with a recognizer quickly and easily. Last, a real interactive system under the pen computer paradigm is presented, studying how the review of machine translations can be carried out more ergonomically.
Martín-Albo Simón, D. (2016). Contributions to Pen & Touch Human-Computer Interaction [Tesis doctoral no publicada]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/68482
Faria, Regis Rossi Alves. "Aplicação de wavelets na análise de gestos musicais em timbres de instrumentos acústicos tradicionais." Universidade de São Paulo, 1997. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-18072013-104904/.
Full text
Expressiveness is a key element for conveying emotion in music, and modeling it is necessary to conceive more realistic synthesis systems. The musical gestures executed during a performance carry the information that accounts for expressiveness, and they can be tracked by means of the sonic patterns associated with them across several resolution scales. A relevant set of musical gestures was studied through a multiresolution analysis using the wavelet transform. This tool was chosen mainly for its natural ability to perform time-scale/frequency analysis and for its similarities with the early stages of auditory processing. Twenty-seven musical events were captured from violin and flute performances and analyzed in order to evaluate the applicability of this tool for the identification and segregation of sonic patterns associated with expressive musical gestures. The wavelet algorithms were implemented on the MATLAB platform, employing filter banks organized in a pyramidal scheme. Graphical and sonic analysis routines and a user interface were built on the same platform. It was verified that wavelets enable the identification of sonic patterns associated with musical gestures, revealing different properties at different levels of the analysis. The technique proved useful for isolating noise from different sources, extracting transients associated with sudden and/or intense gestures, and segregating the tonal harmonic structure, among other important features. Particularities of the technique and the secondary effects observed are discussed, and sonic patterns at the wavelet levels are correlated with the musical gestures that produced them. Future work is proposed addressing further investigation of certain musical events and phenomena observed, as well as the study of alternative implementations.
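The pyramidal filter-bank scheme described in this abstract can be illustrated with a minimal Haar wavelet decomposition, a stand-in for the MATLAB filter banks actually used in the thesis: each level splits the signal into a coarse approximation and detail coefficients, and the approximation is decomposed again.

```python
import math

def haar_step(signal):
    """One level of the Haar DWT: split a signal into a coarse
    approximation and detail coefficients (orthonormal scaling)."""
    s = math.sqrt(2.0)
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / s for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / s for i in range(half)]
    return approx, detail

def wavelet_pyramid(signal, levels):
    """Pyramidal (Mallat) decomposition: repeatedly filter and
    downsample the approximation, keeping each level's details."""
    details = []
    approx = list(signal)
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details

# A short test signal: a smooth ramp plus one abrupt jump, standing in
# for a sustained tone plus a sudden, intense gesture.
x = [i * 0.1 for i in range(16)]
x[8] += 4.0
coarse, details = wavelet_pyramid(x, 3)
# The transient concentrates energy in the finest details, while the
# ramp survives mostly in the coarse approximation, which is exactly
# the separation the abstract describes.
```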
Perrotin, Olivier. "Chanter avec les mains : interfaces chironomiques pour les instruments de musique numériques." Thesis, Paris 11, 2015. http://www.theses.fr/2015PA112207/document.
Full text
This thesis deals with the real-time control of singing voice synthesis with a graphic tablet, based on the digital musical instrument Cantor Digitalis. The relevance of the graphic tablet for intonation control is considered first, showing that the tablet provides more precise pitch control than the real voice under experimental conditions. To extend this accuracy of control to any situation, a dynamic pitch-warping method for intonation correction is developed. It enables playing below the pitch perception limens while preserving the musician's expressivity. Objective and perceptual evaluations validate the method's efficiency. The use of new interfaces for musical expression raises the question of the modalities involved in playing the instrument. A third study reveals a preponderance of the visual modality over auditory perception for intonation control, due to the introduction of visual cues on the tablet surface. Nevertheless, this is compensated by the expressivity allowed by the interface. The writing and drawing abilities acquired since early childhood enable quick acquisition of expert control of the instrument. A set of gestures dedicated to the control of different vocal effects is suggested. Finally, intensive practice of the instrument is pursued through the Chorus Digitalis ensemble, to test and promote our work. Artistic research was conducted for the choice of the Cantor Digitalis repertoire. Moreover, a visual feedback display dedicated to the audience was developed, extending the perception of the players' pitch and articulation.
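The idea behind pitch warping for intonation correction can be sketched in a few lines: pull the played pitch toward the nearest semitone while letting small expressive deviations through. This is a simplified, static sketch with a hypothetical `strength` parameter; the thesis's dynamic method adapts the warping over time rather than using a fixed strength.

```python
def warp_pitch(raw_midi, strength=0.7):
    """Pull a continuous pitch value (in MIDI note numbers) toward
    the nearest equal-tempered semitone.

    strength = 0 leaves the pitch untouched; strength = 1 snaps it
    fully to the grid. Intermediate values reduce intonation error
    while keeping part of the expressive deviation.
    """
    target = round(raw_midi)       # nearest semitone
    deviation = raw_midi - target  # signed error in semitones
    return target + (1.0 - strength) * deviation

# A pitch 30 cents sharp of A4 (MIDI 69.3) is pulled to about 9 cents sharp.
corrected = warp_pitch(69.3)
```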
Demoucron, Matthias. "On the control of virtual violins - Physical modelling and control of bowed string instruments." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2008. http://tel.archives-ouvertes.fr/tel-00349920.
Full textJaner, Mestres Jordi. "Singing-driven interfaces for sound synthesizers." Doctoral thesis, Universitat Pompeu Fabra, 2008. http://hdl.handle.net/10803/7550.
Full text
Digital musical instruments are usually decomposed into two main constituent parts: a user interface and a sound synthesis engine. The user interface is popularly referred to as a musical controller, and its design is the primary objective of this dissertation. Under the title of singing-driven interfaces, we aim to design systems that allow controlling the synthesis of musical instrument sounds with the singing voice.
This dissertation searches for the relationships between the voice and the sound of musical instruments by addressing both the voice signal description and the mapping strategies for a meaningful control of the synthesized sound.
We propose two different approaches: one for controlling a singing voice synthesizer, and another for controlling the synthesis of instrumental sounds. For the latter, we suggest representing the voice signal as vocal gestures, contributing several voice analysis methods.
To demonstrate the obtained results, we developed two real-time prototypes.
Chen, Wei Cheng, and 陳韋誠. "Gesture Recognition using HMM-based Fundamental Motion Synthesis with Implementation on a Wiimote." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/18269101288733000450.
Full text
Chang Gung University
Department of Computer Science and Information Engineering
99
In this thesis, we use the Nintendo Wiimote's tri-axial accelerometer as an input device to build a gesture recognition system, with the Hidden Markov Model (HMM) as the recognition algorithm. We use a set of basic movements, called "fundamental motions," from which all other complex motions are synthesized; these fundamental motions serve as the HMM modeling units. In a preliminary study, we use the Arabic numerals '0' to '9' as the first recognition task. Analyzing this task, we find a set of 16 motions appropriate for use as HMM modeling units. The second recognition task covers the Arabic numerals '10' to '99'; here we again build on fundamental motions, but add a connection signal to represent the transition between models. We found that using the connection signal increases the recognition rate by about 30%. With appropriate feature extraction and HMM topology, an HMM-Viterbi search algorithm achieves nearly 98% accuracy, and 62.26% on average for ten numbers performed as one continuous gesture.
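The decoding step this abstract relies on is standard Viterbi search over HMM states. A minimal discrete-observation sketch (toy states, symbols, and probabilities are all hypothetical; the thesis concatenates fundamental-motion HMMs rather than decoding a single small model):

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for a discrete-observation
    HMM, by dynamic programming over per-state path probabilities."""
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states)
            V[t][s] = prob
            back[t][s] = prev
    # Backtrack from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy example: two motion states emitting quantized accelerometer
# symbols 'low' and 'high' (all numbers invented for illustration).
states = ('rest', 'swing')
start = {'rest': 0.8, 'swing': 0.2}
trans = {'rest': {'rest': 0.7, 'swing': 0.3},
         'swing': {'rest': 0.4, 'swing': 0.6}}
emit = {'rest': {'low': 0.9, 'high': 0.1},
        'swing': {'low': 0.2, 'high': 0.8}}
best_path = viterbi(['low', 'low', 'high'], states, start, trans, emit)
```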
Huang, Kai-Chih, and 黃楷智. "A Study of Real-Time Image Synthesis System Using White Balance Adjustment Techniques and Gesture Interaction." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/53942254254430659675.
Full text
National Kaohsiung First University of Science and Technology
Graduate Institute of Information Management
100
In this study, we propose an interactive platform that combines augmented reality with real-time video processing techniques. With the improvement and popularity of digital multimedia products, digital cameras, smart phones, and other digital photography devices have gradually become part of daily life. While traveling, most people like to shoot a lot of photos. However, some of these commemorative photos turn out unsatisfactory because of adverse conditions such as poor lighting. In general, such photos can be edited with professional image processing tools, but the editing process is complicated, time consuming, and often ineffective. The purpose of this study is to develop a somatosensory interaction and real-time image synthesis system, based on white balance adjustment, to solve this problem. The techniques in the designed system include (1) a virtual-studio background subtraction method to extract the person from the captured images, (2) image edge smoothing and noise removal, (3) embedding the extracted character image sequence into a background image to create a pre-synthesized picture, (4) color adjustment of the image according to color temperature information, so that the composite image looks more natural, and (5) gesture and motion detection for game or interface control. The developed real-time video synthesis system allows the user to freely choose different scenic spots as the background. We also extended this method to project the segmented character image onto other video frames to produce a synthetic video. The results can be applied to gymnastics, dance, and body movement for educational purposes, helping users with self-training and supporting action learning.
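Step (4) above, color adjustment so the composite looks natural, can be illustrated with the classic gray-world white balance: assume the scene average is achromatic and scale each channel until the three channel means agree. This is a standard technique used here only as an illustration; the thesis's exact color-temperature method may differ.

```python
def gray_world_gains(pixels):
    """Per-channel gains under the gray-world assumption: scale each
    RGB channel so that all three channel means become equal."""
    n = len(pixels)
    means = [sum(p[c] for p in pixels) / n for c in range(3)]
    gray = sum(means) / 3.0
    return [gray / m for m in means]

def apply_gains(pixels, gains):
    """Scale every pixel by the channel gains, clamping to 8-bit range."""
    return [tuple(min(255, round(v * g)) for v, g in zip(p, gains))
            for p in pixels]

# A warm (reddish) patch: the red mean is high and the blue mean low,
# as under tungsten lighting.
patch = [(200, 150, 100), (180, 140, 110), (220, 160, 90)]
balanced = apply_gains(patch, gray_world_gains(patch))
# After correction the three channel means coincide, so the character
# extracted from one lighting condition blends into the new background.
```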
"MirrorGen Wearable Gesture Recognition using Synthetic Videos." Master's thesis, 2018. http://hdl.handle.net/2286/R.I.51558.
Full text
Dissertation/Thesis
Masters Thesis Computer Science 2018
"Modeling instrumental gestures: an analysis/synthesis framework for violin bowing." Universitat Pompeu Fabra, 2009. http://www.tesisenxarxa.net/TDX-1210109-120145/.
Full text