Dissertations / Theses on the topic 'VOICE SIGNALS'

Consult the top 50 dissertations / theses for your research on the topic 'VOICE SIGNALS.'


1

Wu, Cheng. "A typology for voice and music signals." Thesis, University of Ottawa (Canada), 2005. http://hdl.handle.net/10393/27082.

Full text
Abstract:
With the rapid increase in the availability of digital music, automatically querying a database of musical pieces has become a topic of interest. At the same time, a feasible solution to this problem offers insight into how humans perceive and classify music. In this research, we discuss our approach to classifying music into four categories: pop, classical, country and jazz. Songs are collected in wave format, and five 10-second clips are chosen at random from different parts of each song. We consider two families of features: wavelet features and time-based features, which capture the energy and temporal information of the voice signal. Instead of relying on traditional Mel-Frequency Cepstral Coefficient (MFCC) [7] methods, which are widely used in audio and music classification, we feed these features into statistical classifiers such as LDA, QDA and classification trees. Finally, we attempt an adaptive tree approach to classification. In this research, 130 songs are collected; pop songs are collected in four languages: English, Chinese, Spanish and French. Cross-validation is used to estimate the proportion of correctly classified songs. The tree method achieves a proportion of correct classification equal to 0.80 when pop and country are treated as one category.
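The classification pipeline sketched in this abstract (per-clip features fed to a statistical classifier with cross-validation) can be illustrated in a few lines. The sketch below is only a rough analogue under stated assumptions: the band-energy and RMS features, the sample rate, and the synthetic data are placeholders, not the wavelet and time-based features used in the thesis.

```python
# Illustrative sketch, not the thesis's exact features: classify 10-second clips
# with LDA using simple band-energy and short-time RMS statistics.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

SR = 22050  # assumed sample rate

def clip_features(clip, n_bands=4):
    """Relative band energies plus RMS statistics for one 10-second clip."""
    spectrum = np.abs(np.fft.rfft(clip)) ** 2
    bands = np.array_split(spectrum, n_bands)
    energies = np.array([b.sum() for b in bands])
    energies = energies / (energies.sum() + 1e-12)        # relative band energies
    frames = clip[: len(clip) // 2048 * 2048].reshape(-1, 2048)
    rms = np.sqrt((frames ** 2).mean(axis=1))             # short-time energy
    return np.concatenate([energies, [rms.mean(), rms.std()]])

# Placeholder data: in practice X comes from real clips and y from genre labels.
rng = np.random.default_rng(0)
X = np.vstack([clip_features(rng.standard_normal(SR * 10)) for _ in range(40)])
y = np.repeat(np.arange(4), 10)                           # pop/classical/country/jazz
print("5-fold CV accuracy:", cross_val_score(LinearDiscriminantAnalysis(), X, y, cv=5).mean())
```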
2

Anskaitis, Aurimas. "Analysis of Quality of Coded Voice Signals." Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2010. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2009~D_20100303_142141-66509.

Full text
Abstract:
The dissertation investigates the problem of the quality of coded voice, with the main attention paid to voice quality evaluation under packet loss conditions. The aim of the work is to improve voice quality evaluation algorithms. The tasks of the work are: to construct means for measuring the voice quality of short voice signals; to define the concept of the value of a coded voice segment and to choose corresponding value metrics; to measure the distributions of frame values in standard voice; to establish the limits of the distortions created by different codecs; and to investigate the inertia of widespread codecs and establish the length of the impact of one lost frame. The dissertation consists of the introduction, four chapters, conclusions and a list of literature. The introduction presents the novelty and topicality of the work, and the tasks and aims of the work are formulated. The first chapter is an overview of voice quality evaluation methods; the pros and cons of these methods are analysed, and the PESQ algorithm and the limits of its applicability are introduced. Lists of Lithuanian words for word intelligibility testing are also created. Chapter two presents a method of signal construction that allows the applicability of PESQ to be extended to short signals; this chapter introduces the concept of frame value, and distributions of frame values are calculated. The third chapter analyses the distortions created by coding. It is shown that coding distortions... [to full text]
3

Strange, John. "VOICE AUTHENTICATION: A STUDY OF POLYNOMIAL REPRESENTATION OF SPEECH SIGNALS." Master's thesis, University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4015.

Full text
Abstract:
A subset of speech recognition is the use of speech recognition techniques for voice authentication. Voice authentication is an alternative security application to other biometric security measures such as fingerprints or iris scans. Voice authentication has the advantage over other biometric measures that it can be used remotely, via a device like a telephone. However, it has the disadvantages that the authentication system typically requires more memory and processing time than fingerprint or iris scanning systems do, and that voice authentication research has yet to provide an authentication system as reliable as the other biometric measures. Most voice recognition systems use Hidden Markov Models (HMMs) as their basic probabilistic framework, and most use a frame-based approach to analyze voice features. One line of research that has been shown to provide more accurate results is the use of a segment-based model. HMMs impose the requirement that each frame is conditionally independent of the next. However, at a fixed frame rate, typically 10 ms, adjacent feature vectors may span the same phonetic segment; they often exhibit smooth dynamics and are highly correlated, while the relationship between features of different phonetic segments is much weaker. The segment-based approach therefore makes fewer conditional independence assumptions, and those assumptions are violated to a lesser degree than in the frame-based approach, so HMMs using segment-based approaches are more accurate. The speech polynomials (feature vectors) used in the segmental model have been shown to be Chebychev polynomials. Using the properties of these polynomials has made it possible to reduce the computation time of speech recognition systems. Representing the spoken word waveform as a Chebychev polynomial also allows the recognition system to easily extract useful and repeatable features from the waveform, allowing for more accurate identification of the speaker. This thesis describes the segmental approach to speech recognition and addresses in detail the use of Chebychev polynomials in the representation of spoken words, specifically in the area of speaker recognition.
M.S.
Department of Mathematics
Arts and Sciences
Mathematics
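The Chebyshev-polynomial representation described in the abstract above can be illustrated with NumPy: a per-frame feature trajectory over a segment is fitted with a low-order Chebyshev expansion and the coefficients serve as a compact, repeatable segment descriptor. This is only a sketch; the segment definition, feature choice and polynomial order are assumptions rather than the thesis's actual configuration.

```python
# Sketch: represent a feature trajectory over one phonetic segment by low-order
# Chebyshev coefficients (numpy.polynomial.chebyshev), usable as segment features.
import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_segment_features(trajectory, order=3):
    """Fit Chebyshev polynomials of degree <= order to a per-frame trajectory."""
    x = np.linspace(-1.0, 1.0, len(trajectory))   # map frame index onto [-1, 1]
    return C.chebfit(x, trajectory, deg=order)    # (order + 1) coefficients

# Example: a synthetic "energy over a segment" curve sampled every 10 ms
rng = np.random.default_rng(1)
trajectory = np.sin(np.pi * np.linspace(0, 1, 30)) + 0.05 * rng.standard_normal(30)
coeffs = chebyshev_segment_features(trajectory)
reconstruction = C.chebval(np.linspace(-1, 1, 30), coeffs)
print("coefficients:", np.round(coeffs, 3))
print("RMS fit error:", np.sqrt(np.mean((reconstruction - trajectory) ** 2)))
```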
4

BHATT, HARSHIT. "SPEAKER IDENTIFICATION FROM VOICE SIGNALS USING HYBRID NEURAL NETWORK." Thesis, DELHI TECHNOLOGICAL UNIVERSITY, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18865.

Full text
Abstract:
Identifying the speaker in an audio-visual environment is a crucial task which is now surfacing in the research domain. Researchers nowadays are moving towards utilizing deep neural networks to match people with their respective voices; the applications of deep learning are many-fold and include the ability to process huge volumes of data, robust training of algorithms, feasibility of optimization and reduced computation time. Previous studies have explored recurrent and convolutional neural networks incorporating GRUs, Bi-GRUs, LSTMs, Bi-LSTMs and many more [1]. This work proposes a hybrid mechanism which consists of a CNN and an LSTM network fused using an early fusion method. We accumulated a dataset of 1,330 voices by recording 3-second clips in .wav format through a Python script. The dataset consists of 14 categories, and we used 80% for training and 20% for testing. We optimized and fine-tuned the neural networks and modified them to yield optimum results. For the early fusion approach, we used the concatenation operation, which fuses the neural networks prior to the training phase. The proposed method achieves 97.72% accuracy on our dataset and outperforms existing baseline mechanisms such as MLP, LSTM, CNN, and RNN. This research serves as a contribution to ongoing research in the speaker identification domain and paves the way for future work using deep learning.
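One possible reading of the described early fusion (a CNN branch and an LSTM branch whose representations are concatenated before the classification head) is sketched below in PyTorch. The layer sizes, the MFCC input shape and the 14-class output head are illustrative assumptions, not details taken from the thesis.

```python
# Minimal PyTorch sketch of an early-fusion CNN + LSTM speaker classifier.
import torch
import torch.nn as nn

class CNNLSTMFusion(nn.Module):
    def __init__(self, n_mfcc=40, n_classes=14):
        super().__init__()
        self.cnn = nn.Sequential(                 # convolutional branch over (batch, 1, mfcc, time)
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)), nn.Flatten())          # -> (batch, 256)
        self.lstm = nn.LSTM(input_size=n_mfcc, hidden_size=64, batch_first=True)
        self.head = nn.Linear(16 * 4 * 4 + 64, n_classes)

    def forward(self, mfcc):                      # mfcc: (batch, time, n_mfcc)
        cnn_out = self.cnn(mfcc.transpose(1, 2).unsqueeze(1))    # (batch, 256)
        _, (h, _) = self.lstm(mfcc)               # last hidden state: (1, batch, 64)
        fused = torch.cat([cnn_out, h[-1]], dim=1)               # fusion by concatenation
        return self.head(fused)

model = CNNLSTMFusion()
dummy = torch.randn(8, 300, 40)                   # 8 utterances, ~3 s of frames, 40 MFCCs
print(model(dummy).shape)                         # torch.Size([8, 14])
```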
5

Chandna, Pritish. "Neural networks for singing voice extraction in monaural polyphonic music signals." Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/673414.

Full text
Abstract:
This dissertation focuses on singing voice extraction from polyphonic musical signals. In particular, we focus on two cases: contemporary popular music, which typically has a processed singing voice with instrumental accompaniment, and ensemble choral singing, which involves multiple singers singing in harmony and unison. Over the last decade, several deep learning based models have been proposed to separate the singing voice from instrumental accompaniment in a musical mixture. Most of these models assume that the musical mixture is a linear sum of the individual sources and estimate time-frequency masks to filter out the sources from the input mixture. While this assumption does not always hold, deep learning based models have shown remarkable capacity to model the separate sources in a mixture. In this thesis, we propose an alternative method for singing voice extraction. This methodology assumes that the perceived linguistic and melodic content of a singing voice signal is retained even when it is put through a non-linear mixing process. To this end, we explore language-independent representations of linguistic content in a voice signal as well as generative methodologies for voice synthesis. Using these, we propose the framework for a methodology to synthesize a clean singing voice signal from the underlying linguistic and melodic content of a processed voice signal in a musical mixture. In addition, we adapt and evaluate state-of-the-art source separation methodologies to separate the soprano, alto, tenor and bass parts of choral recordings. We also use the proposed methodology for extraction via synthesis, along with other deep learning based models, to analyze unison singing within choral recordings.
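The mask-based baseline that this abstract contrasts with (estimating a time-frequency mask and filtering the mixture's STFT) can be sketched as follows. The mask here is a crude placeholder; real systems would predict it with a neural network, so the snippet only shows the masking plumbing under assumed parameters.

```python
# Sketch of time-frequency masking for vocal separation: mask the mixture's STFT
# and invert. The toy mask stands in for a learned mask estimator.
import numpy as np
from scipy.signal import stft, istft

def apply_vocal_mask(mixture, mask_fn, sr=22050, n_fft=1024):
    _, _, X = stft(mixture, fs=sr, nperseg=n_fft)
    mask = mask_fn(np.abs(X))                     # values in [0, 1], same shape as X
    _, voice = istft(mask * X, fs=sr, nperseg=n_fft)
    return voice

def toy_mask(mag):
    """Placeholder 'predictor': keep bins louder than the per-frame median."""
    return (mag > np.median(mag, axis=0, keepdims=True)).astype(float)

sr = 22050
mixture = np.random.default_rng(2).standard_normal(sr * 5)   # stand-in for a 5-second mix
print(apply_vocal_mask(mixture, toy_mask, sr=sr).shape)
```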
6

Johansson, Dennis. "Real-time analysis, in SuperCollider, of spectral features of electroglottographic signals." Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188498.

Full text
Abstract:
This thesis presents tools and components necessary to further develop an implementation of a method that attempts to use the non-invasive electroglottographic signal to locate rapid transitions between voice registers. Implementations of sample entropy and the Discrete Fourier Transform (DFT) for the SuperCollider programming language are presented, along with the tools necessary to evaluate the method and present the results in real time. Since different algorithms have been used both for clustering and for cycle separation, a comparison between algorithms for both of these steps has also been made.
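For readers unfamiliar with the sample entropy metric mentioned above, a plain NumPy version of the standard definition is sketched below; it is not the SuperCollider implementation from the thesis, and the parameters m and r are common defaults rather than values taken from the work.

```python
# Reference-style sample entropy (SampEn) for a 1-D signal.
import numpy as np

def sample_entropy(x, m=2, r=None):
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * x.std()                                  # common tolerance choice
    def matches(length):
        templates = np.array([x[i:i + length] for i in range(len(x) - length)])
        d = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        return (d <= r).sum() - len(templates)             # exclude self-matches
    B, A = matches(m), matches(m + 1)
    return -np.log(A / B) if A > 0 and B > 0 else np.inf

rng = np.random.default_rng(3)
signal = np.sin(np.linspace(0, 20 * np.pi, 500)) + 0.1 * rng.standard_normal(500)
print("SampEn:", sample_entropy(signal))
```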
7

Mészáros, Tomáš. "Speech Analysis for Processing of Musical Signals." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234974.

Full text
Abstract:
The main goal of this work is to enrich musical signals with the characteristics of human speech. The work covers the creation of an audio effect inspired by the talk-box: analysis of the vocal tract with a suitable algorithm such as linear prediction, and application of the estimated filter to a musical audio signal. Emphasis is placed on flawless output quality, low latency and low computational demands for real-time use. The output of the work is a software plugin usable in professional audio-editing applications and, on a suitable hardware platform, also for live performance. The plugin emulates a real talk-box device and provides comparable output quality with a unique sound.
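The core talk-box idea (estimate a vocal-tract filter from a voice frame by linear prediction and run the instrument through it) can be sketched per frame as below. The LPC order, frame length and windowing are assumptions, and the real-time overlap-add and gain handling of an actual plugin are omitted.

```python
# Rough per-frame sketch of an LPC talk-box: impose the voice's spectral envelope
# 1/A(z) on an instrument frame.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_coefficients(frame, order=16):
    """Autocorrelation-method LPC; returns a = [1, -a1, ..., -ap]."""
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return np.concatenate([[1.0], -a])

def talkbox_frame(voice_frame, instrument_frame, order=16):
    a = lpc_coefficients(voice_frame * np.hanning(len(voice_frame)), order)
    excitation = instrument_frame - instrument_frame.mean()
    return lfilter([1.0], a, excitation)          # all-pole vocal-tract filter

rng = np.random.default_rng(4)
voice, instrument = rng.standard_normal(1024), rng.standard_normal(1024)  # stand-in frames
print(talkbox_frame(voice, instrument).shape)
```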
8

Borowiak, Kamila. "Brain Mechanisms for the Perception of Visual and Auditory Communication Signals – Insights from Autism Spectrum Disorder." Doctoral thesis, Humboldt-Universität zu Berlin, 2020. http://dx.doi.org/10.18452/21634.

Full text
Abstract:
Communication is ubiquitous in our everyday life. Yet individuals with autism spectrum disorder (ASD) have difficulties in social interactions and in recognizing socially relevant signals from the face and the voice. Such impairments can vastly affect the quality of life, so a profound understanding of the mechanisms behind these difficulties is strongly required. In the current dissertation, I focused on sensory brain mechanisms that underlie the perception of emotionally neutral communication signals and that so far have gained little attention in ASD research. I studied the malleability of voice-identity processing using intranasal administration of oxytocin, and thus the potential to alleviate voice-identity recognition impairments in ASD. Furthermore, I investigated the brain mechanisms that underlie recognition difficulties for visual speech in ASD, as until now evidence on visual-speech recognition in ASD was limited to behavioral findings. I applied methods of functional magnetic resonance imaging, eye tracking, and behavioral testing. The contribution of the present dissertation is twofold. First, the findings corroborate the view that atypical sensory perception is a critical cornerstone for understanding social difficulties in ASD. Dysfunction of visual and auditory sensory brain regions might contribute to difficulties in processing aspects of communication signals in ASD and modulate the efficacy of interventions for improving the behavioral deficits. Second, the findings deliver empirical support for a recent theoretical model of how the typically developing brain perceives dynamic faces. This improves our current knowledge about brain processing of visual communication signals in the typically developing population. Advanced scientific knowledge about human communication, as provided in the current dissertation, propels further empirical research and the development of clinical interventions that aim to promote communication abilities in affected individuals.
9

Mokhtari, Mehdi. "The puzzle of non verbal communication: Towards a new aspect of leadership." Thesis, Linnéuniversitetet, Institutionen för organisation och entreprenörskap (OE), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-26248.

Full text
Abstract:
Communication is surrounding us. Leaders and followers are not an exception to that rule. Indeed, leadership actors are communicating with their co-workers, their boss, their employees, the media, and so forth. However, in the course of this paper and because of its importance, the focus on non verbal communication will be adopted. Basically, this form of communication is everything except the actual words that people pronounce. Body language, tone of the voice, cultural differences, deceit signals, all these components of non verbal communication and many others will be developed. The core of this work will be understanding the main concepts of non verbal communication and then applying them to leaders’ real life situations.   This thesis will also, among other things, aim to answer the following questions: What is the importance of non verbal communication in everyday life? How are leaders using non verbal communication to give sense? Do they use deceit signals? What influences the non verbal communication? What is the emotional intelligence concept? Can the non verbal communication be extrapolated and be seen as being inter-cultural?
10

Dzhambazov, Georgi. "Knowledge-based probabilistic modeling for tracking lyrics in music audio signals." Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404681.

Full text
Abstract:
This thesis proposes specific signal processing and machine learning methodologies for automatically aligning the lyrics of a song to its corresponding audio recording. The research carried out falls within the broader field of music information retrieval (MIR), and in this respect we aim at improving some existing state-of-the-art methodologies by introducing domain-specific knowledge. The goal of this work is to devise models capable of tracking in the music audio signal the sequential aspect of one particular element of lyrics: the phonemes. Music can be understood as comprising different facets, one of which is lyrics. The models we build take into account the complementary context that exists around lyrics, that is, any musical facet complementary to the lyrics. The facets used in this thesis include the structure of the music composition, the structure of a melodic phrase and the structure of a metrical cycle. From this perspective, we analyse not only the low-level acoustic characteristics, representing the timbre of the phonemes, but also higher-level characteristics in which the complementary context manifests itself. We propose specific probabilistic models to represent how the transitions between consecutive sung phonemes are conditioned by different facets of complementary context. The complementary context we address unfolds in time according to principles that are particular to a music tradition. To capture these, we created corpora and datasets for two music traditions which have a rich set of such principles: Ottoman Turkish makam and Beijing opera. The datasets and the corpora comprise different data types: audio recordings, music scores and metadata. From this perspective, the proposed models can take advantage both of the data and of the music-domain knowledge of particular musical styles to improve existing baseline approaches. As a baseline, we choose a phonetic recognizer based on hidden Markov models (HMMs): a widely used methodology for tracking phonemes in both singing and speech processing problems. We present refinements in the typical steps of existing phonetic recognizer approaches, tailored towards the characteristics of the studied music traditions. On top of the refined baseline, we devise probabilistic models, based on dynamic Bayesian networks (DBNs), that represent the relation of phoneme transitions to their complementary context. Two separate models are built for two granularities of complementary context: the structure of a melodic phrase (higher level) and the structure of the metrical cycle (finer level). In one model we exploit the fact that syllable durations depend on their position within a melodic phrase; information about the melodic phrases is obtained from the score as well as from music-specific knowledge. In another model, we analyse how vocal note onsets, estimated from audio recordings, influence the transitions between consecutive vowels and consonants. We also propose how to detect the time positions of vocal note onsets in melodic phrases by simultaneously tracking the positions in a metrical cycle (i.e. the metrical accents). In order to evaluate the potential of the proposed models, we use lyrics-to-audio alignment as a concrete task. Each model improves the alignment accuracy compared to the baseline, which is based solely on the acoustics of the phonetic timbre. This validates our hypothesis that knowledge of complementary context is an important stepping stone for computationally tracking lyrics, especially in the challenging case of singing with instrumental accompaniment. The outcomes of this study are not only theoretical methodologies and data, but also specific software tools that have been integrated into Dunya, a suite of tools built in the context of CompMusic, a project for advancing the computational analysis of the world's music. With this application, we have also shown that the developed methodologies are useful not only for tracking lyrics, but also for other use cases, such as enriched music listening and appreciation, or for educational purposes.
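The idea of conditioning phoneme transitions on complementary context can be illustrated with a toy left-to-right Viterbi pass in which the probability of advancing to the next phoneme is boosted whenever an external cue (for example, a detected note onset) is present. All probabilities and shapes below are made up for illustration and do not reproduce the DBN models of the thesis.

```python
# Toy context-conditioned forced alignment: advance through phoneme states, with
# the transition probability modulated by an onset cue.
import numpy as np

def viterbi_align(log_emissions, onset_cue, stay=0.8, boost=0.3):
    """log_emissions: (T, S) log p(frame_t | state_s); onset_cue: (T,) in {0, 1}."""
    T, S = log_emissions.shape
    delta = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    delta[0, 0] = log_emissions[0, 0]                     # must start in the first phoneme
    for t in range(1, T):
        p_advance = min(0.99, (1 - stay) + boost * onset_cue[t])
        for s in range(S):
            stay_score = delta[t - 1, s] + np.log(1 - p_advance)
            move_score = delta[t - 1, s - 1] + np.log(p_advance) if s > 0 else -np.inf
            back[t, s] = s if stay_score >= move_score else s - 1
            delta[t, s] = max(stay_score, move_score) + log_emissions[t, s]
    path = [int(np.argmax(delta[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

rng = np.random.default_rng(5)
log_em = np.log(rng.dirichlet(np.ones(4), size=50))       # 50 frames, 4 phoneme states
onsets = (rng.random(50) < 0.1).astype(int)
print(viterbi_align(log_em, onsets))
```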
11

Carvalho, Paulo Henrique Bezerra de. "CODIFICAÇÃO DE SINAIS DE VOZ HUMANA POR DECOMPOSIÇÃO EM COMPONENTES MODULANTES." Universidade Federal do Maranhão, 2003. http://tedebc.ufma.br:8080/jspui/handle/tede/370.

Full text
Abstract:
This work proposes a speech signal encoder variation based on two concepts: the formants and the modulating components of the speech signal. The suggested coding method extracts the modulating components (instantaneous amplitude and frequency) to be transmitted. The method is based on the fact that the transmission of the speech can be replaced by the transmission of its AM-FM modulating components (amplitude modulation - frequency modulation). To obtain these components, the LPC (linear predictive coding) method is used to determine the frequencies corresponding to the first four formants of the speech spectrum within a 4 kHz band. Then, through a modified Gabor wavelet function, four narrow bands are filtered around the formants. Finally, the properties of the Hilbert transform are used to determine the modulating components of the filtered bands, in other words, the instantaneous amplitudes and frequencies. The final result is the coding of eight signals, four of which correspond to the instantaneous amplitudes and the other four to the instantaneous frequencies. The recovery of human speech from these eight signals is also presented, and intelligibility tests are applied to the samples after their respective recoveries. The results obtained showed that the method is a promising technique to be implemented in real applications.
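The Hilbert-transform step described above, which yields the instantaneous amplitude and frequency of a narrow band around a formant, can be sketched with SciPy. A Butterworth bandpass stands in here for the modified Gabor wavelet, and the sample rate, centre frequency and bandwidth are illustrative values.

```python
# Sketch of AM-FM extraction for one formant band via the analytic signal.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def am_fm_components(x, sr, f_center, bandwidth=200.0):
    low = (f_center - bandwidth / 2) / (sr / 2)
    high = (f_center + bandwidth / 2) / (sr / 2)
    b, a = butter(4, [low, high], btype="band")
    band = filtfilt(b, a, x)
    analytic = hilbert(band)
    amplitude = np.abs(analytic)                              # instantaneous amplitude
    phase = np.unwrap(np.angle(analytic))
    frequency = np.diff(phase) * sr / (2 * np.pi)             # instantaneous frequency (Hz)
    return amplitude, frequency

sr = 8000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 700 * t) * (1 + 0.3 * np.sin(2 * np.pi * 5 * t))  # toy "formant"
amp, freq = am_fm_components(x, sr, f_center=700)
print(round(amp.mean(), 2), round(freq.mean(), 1))            # ~1.0 and ~700 Hz
```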
12

Boué, Anaïs. "Data mining and volcanic eruption forecasting." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAU007/document.

Full text
Abstract:
Eruption forecasting methods are valuable tools for supporting decision making during volcanic crises if they are integrated in a global monitoring strategy and if their potential and limitations are known. Many attempts at deterministic forecasting of volcanic eruptions and landslides have been performed using the material Failure Forecast Method (FFM). This method consists in adjusting an empirical power law to precursory patterns of seismicity or deformation. Until now, most of the studies have presented hindsight forecasts, based on complete time series of precursors, and do not evaluate the method's potential for carrying out real-time forecasting with partial precursory sequences. Moreover, the limited number of published examples and the absence of systematic application of the FFM make it difficult to conclude as to the ability of the method to forecast volcanic eruptions. Thus it appears important to gain experience by carrying out systematic forecasting attempts in various eruptive contexts. In this thesis, I present a rigorous approach to the FFM designed for real-time applications on volcano-seismic precursors. I use a Bayesian approach based on the FFM theory and an automatic classification of the seismic events that do not have the same source mechanisms. The probability distributions of the data, deduced from the performance of the classification, are used as input. As output, the method provides the probability of the forecast time at each observation time before the eruption. The spread of the posterior probability density function of the prediction time and its stability with respect to the observation time are used as criteria to evaluate the reliability of the forecast. I show that the method developed here outperforms the classical application of the FFM both for hindsight and real-time attempts because it accurately takes the uncertainty of the data into account. The automatic classification of volcano-seismic signals allows for a systematic application of this forecasting method to decades of seismic data from andesitic volcanoes including Volcan de Colima (Mexico) and Merapi volcano (Indonesia), and from the basaltic volcano of Piton de la Fournaise (Reunion Island, France). The number of eruptions that are not preceded by precursors is quantified, as well as the number of seismic crises that are not followed by eruptions. Then, I use 64 precursory sequences and apply the forecasting method developed in this thesis. I thus determine in which conditions the FFM can be successfully applied, and I quantify the success rate of the method in real time and in hindsight. Only 62% of the precursory sequences analysed in this thesis were suitable for the application of the FFM, and half of the total number of eruptions are successfully forecast in hindsight. In real time, the method allows for the successful prediction of only 36% of all the eruptions considered. Nevertheless, real-time predictions are successful for 83% of the cases that fulfil the reliability criteria. Therefore, we can have good confidence in the method when the reliability criteria are met, but the deterministic real-time forecasting tool developed in this thesis is not sufficient in itself. However, it could potentially be informative combined with other forecasting methods and supervised by an observer. These results reflect the lack of knowledge concerning pre-eruptive mechanisms.
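As background to the FFM mentioned above: in its common linearized form (power-law exponent fixed at 2), the inverse of the precursor rate decays linearly with time, and extrapolating it to zero gives a forecast of the failure or eruption time. The sketch below shows only that textbook version on synthetic data; it is not the Bayesian, classification-driven method developed in the thesis.

```python
# Simplified FFM illustration on synthetic data: fit a line to 1/rate and
# extrapolate its zero-crossing to forecast the failure time.
import numpy as np

rng = np.random.default_rng(6)
t_fail = 100.0                                     # "true" failure time (arbitrary units)
t = np.arange(50.0, 95.0, 1.0)                     # observation times before failure
rate = 50.0 / (t_fail - t)                         # accelerating precursor rate
rate *= np.exp(0.1 * rng.standard_normal(len(t)))  # multiplicative noise

slope, intercept = np.polyfit(t, 1.0 / rate, 1)    # linear fit to the inverse rate
t_forecast = -intercept / slope                    # zero-crossing = forecast time
print(f"forecast failure time: {t_forecast:.1f} (true {t_fail})")
```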
13

Duchovskis, Donatas. "Aukštesnių eilių statistika grįsto balso detektavimo algoritmo sudarymas ir tyrimas." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2006. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2006~D_20060529_131458-61965.

Full text
Abstract:
This report covers the robust voice activity detection (VAD) algorithm presented in [1]. The algorithm uses higher-order statistics (HOS) metrics of the speech signal in the linear predictive coding (LPC) residual domain to classify the frames of a signal as noise or speech. The chapters of this report present the voice activity detection problem and an analysis of environmental issues for VAD, an in-depth analysis of HOS-based and standard algorithms, and a real-time HOS-based voice activity detector model. New improvements to the proposed algorithm (instantaneous SNR estimation, decision smoothing, adaptive thresholds, an artificial neural network) are introduced, and performance results of the improved algorithm compared to standard VAD algorithms are presented.
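In the spirit of the HOS-based detector described above, a minimal baseline computes the LPC residual per frame and flags frames whose skewness or kurtosis exceeds a threshold, since speech residuals tend to be far less Gaussian than noise. The LPC order, frame length and thresholds below are placeholders, not the values of the cited algorithm.

```python
# Illustrative HOS-based VAD baseline: skewness/kurtosis of the LPC residual per frame.
import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter
from scipy.stats import kurtosis, skew

def lpc_residual(frame, order=10):
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    a = solve_toeplitz((r[:order], r[:order]), r[1:order + 1])
    return lfilter(np.concatenate([[1.0], -a]), [1.0], frame)   # inverse filtering by A(z)

def hos_vad(x, frame_len=256, k_thresh=1.0, s_thresh=0.3):
    frames = x[: len(x) // frame_len * frame_len].reshape(-1, frame_len)
    flags = []
    for f in frames:
        res = lpc_residual(f)
        flags.append(abs(skew(res)) > s_thresh or kurtosis(res) > k_thresh)
    return np.array(flags)

rng = np.random.default_rng(7)
noise = rng.standard_normal(8000)
speech_like = np.sign(np.sin(2 * np.pi * 120 * np.arange(8000) / 8000)) \
    + 0.1 * rng.standard_normal(8000)
print("noise frames flagged:", hos_vad(noise).mean())
print("speech-like frames flagged:", hos_vad(speech_like).mean())
```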
14

Guy, Richard. "DISTANCE, DIALOGUE AND DIFFERENCE: A Postpositivist Approach to Understanding Distance Education in Papua New Guinea." Deakin University. School of Education, 1994. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20041209.093035.

Full text
Abstract:
This study focuses on the experiences of a group of educators engaged in a professional development program by distance education in Papua New Guinea. The participants in this study have been keeping professional journals, for periods of up to three years, about their experiences of distance education. Their discourses have been used to form a 'connected group' of research participants, who use an action framework to focus on problematic issues surrounding distance education in Papua New Guinea. It is a piece of research, framed by critical theory, and characterised by participation, collaboration, reflexivity, reciprocity and empowerment. The process of the study is based in dialogue, and takes the view that research is constituted of a transformative perspective, which alters the way research participants understand the multiple realities in which they live and work, and ultimately results in improvements in their lived experiences. The nature of the methodology privileges 'voice' and a discourse of difference from each participant, which contributes to the problematic nature of the study. The study has concerned itself, increasingly, with issues of power and control in the research process, and this has resulted in significant changes in the research as participants have become more conscious of issues such as distance, dialogue and difference. The study has evolved over a period of time in significant ways, and evidence is available that teachers in Papua New Guinea, despite structural and pedagogical barriers, are critically reflective and are able to transform their practice in ways which are consistent with the social, cultural and political contexts in which they live and work. A number of 'local' theories about research and distance education in Papua New Guinea are developed by the participants as they become informed about issues during the research. The practice of distance education and professional development, at personal and institutional levels, undergoes reconstruction during the life of the research, and the study 'signals' other ways in which distance education and professional development may be reconstructed in Papua New Guinea.
15

Wu, Nan, and Bofei Wang. "Process and Analysis of Voice Signal by MATLAB." Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-17541.

Full text
Abstract:
Delivering messages by voice is the most important, effective and common way for people to exchange information. Language is a specifically human capability, and the human voice is a commonly used tool and an important means of passing information to one another. The voice carries a large amount of information, so modern voice processing methods can be used so that people can easily transmit, store, access and apply it. In this thesis, we designed a system that collects voice and uses different filters to remove noise; after noise filtering, the voice is of higher quality for mobile communication, radio, TV and so on. We use the Microsoft recorder to collect a voice sample and then analyze its time domain, frequency spectrum and the characteristics of the voice signal. We use MATLAB functions to remove noise that has been added to the voice, and further use the bilinear transformation method to design a filter based on a Butterworth analogue prototype as well as a window-function design, and then filter the voice signal to which noise has been added. After that, we compare the time domain and frequency domain of the original and noisy voice, play back the noisy and de-noised voice, and compare the application of signal processing with FIR and IIR filters, especially from the perspective of their de-noising characteristics and applications. According to this comparison, we can determine which filter is best.
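The MATLAB workflow described here translates directly to SciPy: design an IIR Butterworth low-pass (SciPy's butter uses the bilinear transform internally) and an FIR filter by the window method, then compare their de-noising on a noisy voice-like signal. The cutoff, filter orders and test signal below are assumptions chosen for illustration.

```python
# SciPy sketch of the IIR (Butterworth, bilinear transform) vs FIR (window method)
# de-noising comparison on a synthetic "voice" plus added noise.
import numpy as np
from scipy.signal import butter, filtfilt, firwin, lfilter

sr = 8000
t = np.arange(2 * sr) / sr
voice_like = np.sin(2 * np.pi * 220 * t) + 0.5 * np.sin(2 * np.pi * 440 * t)
noisy = voice_like + 0.8 * np.random.default_rng(8).standard_normal(len(t))

cutoff = 1000.0                                    # keep the voice band, cut the hiss
b_iir, a_iir = butter(6, cutoff / (sr / 2))        # IIR design via the bilinear transform
b_fir = firwin(101, cutoff / (sr / 2))             # FIR design via a Hamming window

den_iir = filtfilt(b_iir, a_iir, noisy)            # zero-phase IIR filtering
den_fir = lfilter(b_fir, [1.0], noisy)             # causal FIR filtering
delay = (len(b_fir) - 1) // 2                      # group delay of the linear-phase FIR

def snr(ref, est):
    return 10 * np.log10(np.sum(ref ** 2) / np.sum((ref - est) ** 2))

print(f"IIR output SNR: {snr(voice_like, den_iir):.1f} dB")
print(f"FIR output SNR: {snr(voice_like[:-delay], den_fir[delay:]):.1f} dB")
```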
16

Nayfeh, Taysir H. "Multi-signal processing for voice recognition in noisy environments." Thesis, This resource online, 1991. http://scholar.lib.vt.edu/theses/available/etd-10222009-125021/.

Full text
17

Nylén, Helmer. "Detecting Signal Corruptions in Voice Recordings for Speech Therapy." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291429.

Full text
Abstract:
When recording voice samples from a patient in speech therapy the quality of the recording may be affected by different signal corruptions, for example background noise or clipping. The equipment and expertise required to identify small disturbances are not always present at smaller clinics. Therefore, this study investigates possible machine learning algorithms to automatically detect selected corruptions in speech signals, including infrasound and random muting. Five algorithms are analyzed: kernel substitution based Support Vector Machine, Convolutional Neural Network, Long Short-term Memory (LSTM), Gaussian Mixture Model based Hidden Markov Model and Generative Model based Hidden Markov Model. A tool to generate datasets of corrupted recordings is developed to test the algorithms in both single-label and multi-label settings. Mel-frequency Cepstral Coefficients are used as the main features. For each type of corruption different ways to increase the classification accuracy are tested, for example by using a Voice Activity Detector to filter out less relevant parts of the recording, changing the feature parameters, or using an ensemble of classifiers. The experiments show that a machine learning approach is feasible for this problem as a balanced accuracy of at least 75% is reached on all tested corruptions. While the single-label study gave mixed results with no algorithm clearly outperforming the others, in the multi-label case the LSTM in general performs better than other algorithms. Notably it achieves over 95% balanced accuracy on both white noise and infrasound. As the algorithms are trained only on spoken English phrases the usability of this tool in its current state is limited, but the experiments are easily expanded upon with other types of audio recordings, corruptions, features, or classification algorithms.
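Alongside the learned classifiers studied above, two of the named corruptions have simple signal-level signatures that can be checked directly as baselines; the heuristic thresholds in the sketch below are rough assumptions and not part of the thesis.

```python
# Heuristic baselines for two corruption types: clipping and infrasound.
import numpy as np
from scipy.signal import welch

def clipping_ratio(x, margin=0.99):
    """Fraction of samples sitting at (or beyond) the apparent clipping level."""
    peak = np.max(np.abs(x)) + 1e-12
    return np.mean(np.abs(x) >= margin * peak)

def infrasound_fraction(x, sr, f_cut=20.0):
    """Share of spectral power below f_cut Hz (the infrasound band)."""
    f, pxx = welch(x, fs=sr, nperseg=4096)
    return pxx[f < f_cut].sum() / (pxx.sum() + 1e-12)

sr = 16000
t = np.arange(3 * sr) / sr
clean = 0.5 * np.sin(2 * np.pi * 200 * t)
clipped = np.clip(3 * clean, -1, 1)                         # simulated clipping
rumbling = clean + 0.5 * np.sin(2 * np.pi * 10 * t)         # simulated infrasound

print("clipping ratio (clean vs clipped):", clipping_ratio(clean), clipping_ratio(clipped))
print("infrasound fraction (clean vs rumbling):",
      infrasound_fraction(clean, sr), infrasound_fraction(rumbling, sr))
```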
18

Oddiraju, Swetha. "Improving performance for adaptive filtering with voice applications." Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/6271.

Full text
Abstract:
Thesis (M.S.)--University of Missouri-Columbia, 2007.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on September 29, 2008). Includes bibliographical references.
19

SANTOS, JÚNIOR Gutemberg Gonçalves dos. "Redução de ruído para sistemas de reconhecimento de voz utilizando subespaços vetoriais." Universidade Federal de Campina Grande, 2009. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/1508.

Full text
Abstract:
The establishment of a speech-based communication interface between humans and computers has been pursued since the beginning of the computer era. Several studies have been carried out over the last six decades in order to accomplish this interface, making the commercial use of speech recognition applications possible. However, factors such as noise, reverberation and distortion, among others, degrade the performance of these systems, reducing their success rate when operating in adverse environments. With this in mind, the study of techniques to reduce the impact of these problems is of great value and has gained prominence in recent decades. The work presented in this dissertation aims to reduce problems related to the noise encountered in an automotive environment, improving the robustness of the speech recognition system. Thus, control of non-critical features of a car, such as the CD player and air conditioning, can be performed through voice commands. The proposed system is based on a speech signal preprocessing step using the signal subspace method. Its performance is related to the size (lines × columns) of the matrices that represent the input signal. Therefore, the ULLV decomposition was used because it offers lower computational complexity compared to traditional methods based on the SVD decomposition. The speech recognizer Julius, an open-source system that offers high performance, was chosen for the case study. A noisy speech database with 44800 samples was generated to model the automotive environment. Finally, the robustness of the system was evaluated and compared with a traditional noise reduction method called spectral subtraction.
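The spectral subtraction baseline that the subspace approach is compared against can be sketched as follows: average the noise magnitude spectrum over a noise-only stretch and subtract it, with flooring, from each frame of the noisy STFT. The over-subtraction factor and floor are common heuristics, not values from the dissertation.

```python
# Sketch of magnitude spectral subtraction with a noise-only reference segment.
import numpy as np
from scipy.signal import stft, istft

def spectral_subtraction(noisy, noise_only, sr, n_fft=512, alpha=1.5, floor=0.02):
    _, _, N = stft(noise_only, fs=sr, nperseg=n_fft)
    noise_mag = np.mean(np.abs(N), axis=1, keepdims=True)           # average noise spectrum
    _, _, X = stft(noisy, fs=sr, nperseg=n_fft)
    mag = np.abs(X)
    cleaned = np.maximum(mag - alpha * noise_mag, floor * mag)      # subtract with flooring
    _, y = istft(cleaned * np.exp(1j * np.angle(X)), fs=sr, nperseg=n_fft)
    return y

sr = 8000
t = np.arange(2 * sr) / sr
speech_like = np.sin(2 * np.pi * 300 * t) * (t < 1.5)
noise = 0.3 * np.random.default_rng(9).standard_normal(len(t))
print(spectral_subtraction(speech_like + noise, noise, sr).shape)
```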
APA, Harvard, Vancouver, ISO, and other styles
20

Jalalinajafabadi, Farideh. "Computerised GRBAS assessement of voice quality." Thesis, University of Manchester, 2016. https://www.research.manchester.ac.uk/portal/en/theses/computerised-grbas-assessement-of-voice-quality(7efd3263-b109-4137-87cf-b9559c61730b).html.

Full text
Abstract:
Vocal cord vibration is the source of voiced phonemes in speech. Voice quality depends on the nature of this vibration. Vocal cords can be damaged by infection, neck or chest injury, tumours and more serious diseases such as laryngeal cancer. This kind of physical damage can cause loss of voice quality. To support the diagnosis of such conditions and also to monitor the effect of any treatment, voice quality assessment is required. Traditionally, this is done ‘subjectively’ by Speech and Language Therapists (SLTs) who, in Europe, use a well-known assessment approach called ‘GRBAS’. GRBAS is an acronym for a five-dimensional scale of measurements of voice properties. The scale was originally devised and recommended by the Japanese Society of Logopedics and Phoniatrics and several European research publications. The properties are ‘Grade’, ‘Roughness’, ‘Breathiness’, ‘Asthenia’ and ‘Strain’. An SLT listens to and assesses a person’s voice while the person performs specific vocal maneuvers. The SLT is then required to record a discrete score for the voice quality in the range 0 to 3 for each GRBAS component. In requiring the services of trained SLTs, this subjective assessment makes the traditional GRBAS procedure expensive and time-consuming to administer. This thesis considers the possibility of using computer programs to perform objective assessments of voice quality conforming to the GRBAS scale. To do this, Digital Signal Processing (DSP) algorithms are required for measuring voice features that may indicate voice abnormality. The computer must be trained to convert DSP measurements to GRBAS scores, and a ‘machine learning’ approach has been adopted to achieve this. This research was made possible by the development, by Manchester Royal Infirmary (MRI) Hospital Trust, of a ‘speech database’ with the participation of clinicians, SLTs, patients and controls. The participation of five SLT scorers allowed norms to be established for GRBAS scoring, which provided ‘reference’ data for the machine learning approach.
To support the scoring procedure carried out at MRI, a software package, referred to as GRBAS Presentation and Scoring Package (GPSP), was developed for presenting voice recordings to each of the SLTs and recording their GRBAS scores. A means of assessing intra-scorer consistency was devised and built into this system. Also, the assessment of inter-scorer consistency was advanced by the invention of a new form of the ‘Fleiss Kappa’, which is applicable to ordinal as well as categorical scoring. The means of taking these assessments of scorer consistency into account when producing ‘reference’ GRBAS scores are presented in this thesis. Such reference scores are required for training the machine learning algorithms. The DSP algorithms required for feature measurements are generally well known and available as published or commercial software packages. However, an appraisal of these algorithms and the development of some DSP ‘thesis software’ was found to be necessary. Two ‘machine learning’ regression models have been developed for mapping the measured voice features to GRBAS scores. These are K Nearest Neighbor Regression (KNNR) and Multiple Linear Regression (MLR). Our research is based on sets of features, sets of data and prediction models that are different from the approaches in the current literature. The performance of the computerised system is evaluated against reference scores using a Normalised Root Mean Squared Error (NRMSE) measure. The performances of MLR and KNNR for objective prediction of GRBAS scores are compared and analysed ‘with feature selection’ and ‘without feature selection’. It was found that MLR with feature selection was better than MLR without feature selection and KNNR with and without feature selection, for all five GRBAS components. It was also found that MLR with feature selection gives scores for ‘Asthenia’ and ‘Strain’ which are closer to the reference scores than the scores given by all five individual SLT scorers. The best objective score for ‘Roughness’ was closer to the reference than the scores given by two SLTs, roughly equal to the score of one SLT and worse than the other two SLT scores. The best objective scores for ‘Breathiness’ and ‘Grade’ were further from the reference scores than the scores produced by all five SLT scorers. However, the worst ‘MLR with feature selection’ result has a normalised RMS error that is only about 3% worse than the worst SLT scoring. The results obtained indicate that objective GRBAS measurements have the potential for further development towards a commercial product that may at least be useful in augmenting the subjective assessments of SLT scorers.
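To make the regression step concrete, the sketch below shows, under invented placeholder data rather than the thesis software, how MLR and KNNR predictions for one GRBAS component could be compared with a normalised RMS error over the 0-3 scale.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import cross_val_predict

def nrmse(y_true, y_pred):
    # RMS error normalised by the width of the 0..3 GRBAS scale.
    return np.sqrt(np.mean((y_true - y_pred) ** 2)) / 3.0

# Placeholders: X is an acoustic feature matrix, y the reference scores
# for one GRBAS component (e.g. 'Breathiness').
rng = np.random.default_rng(0)
X = rng.random((100, 12))
y = rng.integers(0, 4, size=100).astype(float)

for name, model in [("MLR", LinearRegression()),
                    ("KNNR", KNeighborsRegressor(n_neighbors=5))]:
    pred = cross_val_predict(model, X, y, cv=5)
    print(name, "NRMSE:", round(nrmse(y, pred), 3))
```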
APA, Harvard, Vancouver, ISO, and other styles
21

Lindsay, Iain Andrew Blair. "A signal constellation and carrier recovery technique for voice-band modems." Thesis, University of Edinburgh, 1986. http://hdl.handle.net/1842/15216.

Full text
APA, Harvard, Vancouver, ISO, and other styles
22

Lembard, Tomáš. "Speciální aplikace VoIP." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-219188.

Full text
Abstract:
The aim of this master's thesis is the design and subsequent realization of equipment for voice transmission over a local network, together with a description of the circuits and solutions used in terms of hardware and software. The thesis covers the digitization of low-frequency signals, the structure of the IP and UDP protocols, and the implementation of the cIPS TCP/IP stack.
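Purely as an illustration of the transport step described here (not the cIPS-based firmware of the thesis), sending fixed-size PCM frames as UDP datagrams can be sketched in a few lines of Python; the destination address and frame size are made-up assumptions.

```python
import socket

DEST = ("192.168.1.20", 5004)   # hypothetical receiver on the local network
FRAME_BYTES = 320               # 20 ms of 8 kHz, 16-bit mono PCM

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

def send_pcm(stream):
    """Read fixed-size PCM frames from a binary stream and send each one
    as a single UDP datagram."""
    while True:
        frame = stream.read(FRAME_BYTES)
        if len(frame) < FRAME_BYTES:
            break
        sock.sendto(frame, DEST)
```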
APA, Harvard, Vancouver, ISO, and other styles
23

Doukas, Nikolaos. "Voice activity detection using energy based measures and source separation." Thesis, Imperial College London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245220.

Full text
APA, Harvard, Vancouver, ISO, and other styles
24

Smith, Quentin D. "Multichannel Digital Signal Processor Based Red/Black Keyset." International Foundation for Telemetering, 1992. http://hdl.handle.net/10150/611927.

Full text
Abstract:
International Telemetering Conference Proceedings / October 26-29, 1992 / Town and Country Hotel and Convention Center, San Diego, California
This paper addresses a method to provide both secure and non-secure voice communications to a DS-1 network from a common keyset. In order to comply with both the electrical isolation requirements and the operational security issues regarding voice communications, an all-digital approach to the keyset was developed based upon the AD2101 DSP. Protocols handled by the keyset include: multiple PTT modes, hot mike, telephone access, priority override, direct access, indirect access, paging, and monitor only. Special features that are addressed include: independent channel-by-channel assignment of access protocols, headset assignment, speaker assignment, and PTT assignment. Multiple microprocessors are used to implement the foregoing as well as downloadable configurations, remote keyset control and monitoring, and composite audio outputs. Partitioning of the digital design provides RED to BLACK channel isolation and RED channel to AC power isolation of greater than 107 dB.
APA, Harvard, Vancouver, ISO, and other styles
25

Fredrickson, Steven Eric. "Neural networks for speaker identification." Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294364.

Full text
APA, Harvard, Vancouver, ISO, and other styles
26

El, Malki Karim. "A novel approach to high quality voice using echo cancellation and silence detection." Thesis, University of Sheffield, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.286579.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Tryfou, Georgina. "Time-frequency reassignment for acoustic signal processing. From speech to singing voice applications." Doctoral thesis, University of Trento, 2017. http://eprints-phd.biblio.unitn.it/2562/2/PhD-Thesis.pdf.

Full text
Abstract:
The various time-frequency (TF) representations of acoustic signals share the common objective to describe the temporal evolution of the spectral content of the signal, i.e., how the energy, or intensity, of the signal is changing in time. Many TF representations have been proposed in the past, and among them the short-time Fourier transform (STFT) is the one most commonly found in the core of acoustic signal processing techniques. However, certain problems that arise from the use of the STFT have been extensively discussed in the literature. These problems concern the unavoidable trade-off between the time and frequency resolution, and the fact that the selected resolution is fixed over the whole spectrum. In order to improve upon the spectrogram, several variations have been proposed over time. One of these variations stems from a promising method called reassignment. According to this method, the traditional spectrogram, as obtained from the STFT, is reassigned to a sharper representation called the Reassigned Spectrogram (RS). In this thesis we elaborate on approaches that utilize the RS as the TF representation of acoustic signals, and we exploit this representation in the context of different applications, such as speech recognition and melody extraction. The first contribution of this work is a method for speech parametrization, which results in a set of acoustic features called time-frequency reassigned cepstral coefficients (TFRCC). Experimental results show the ability of TFRCC features to present higher level characteristics of speech, a fact that leads to advantages in phone-level speech segmentation and speech recognition. The second contribution is the use of the RS as the basis to extract objective quality measures, and in particular the reassigned cepstral distance and the reassigned point-wise distance. Both measures are used for channel selection (CS), following our proposal to perform objective quality measure based CS for improving the accuracy of speech recognition in a multi-microphone reverberant environment. The final contribution of this work is a method to detect harmonic pitch contours from singing voice signals, using a dominance weighting of the RS. This method has been exploited in the context of melody extraction from polyphonic music signals.
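For readers who want to see the difference between the two representations discussed above, the sketch below contrasts a conventional STFT spectrogram with a reassigned spectrogram; it assumes a recent librosa release, whose reassigned_spectrogram function implements the reassignment method, and uses a synthetic chirp rather than the thesis data.

```python
import numpy as np
import librosa

# Synthetic chirp standing in for a speech or singing excerpt.
sr = 16000
t = np.linspace(0, 1.0, sr, endpoint=False)
y = np.sin(2 * np.pi * (200 * t + 300 * t ** 2)).astype(np.float32)

# Conventional spectrogram: energy sits on the fixed STFT time-frequency grid.
S = np.abs(librosa.stft(y, n_fft=1024, hop_length=256))

# Reassigned spectrogram (RS): each bin's energy is relocated to its estimated
# instantaneous time and frequency, giving a sharper representation.
freqs, times, mags = librosa.reassigned_spectrogram(y=y, sr=sr,
                                                    n_fft=1024,
                                                    hop_length=256)
```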
APA, Harvard, Vancouver, ISO, and other styles
28

Maury, Ghislaine. "Mélange de signaux microondes par voie optique." Grenoble INPG, 1998. http://www.theses.fr/1998INPG0152.

Full text
Abstract:
We studied an original solution for mixing microwave signals in the optical domain. It is based on the conversion, by an interferometer, of the frequency modulation of light produced by direct modulation of a laser diode into intensity modulation. Since the interferometer is a passive device, this technique offers several advantages over conventional techniques requiring active modulators: it can easily be inserted into a wavelength-division-multiplexed optical communication system, the conversion losses are lower, and the bandwidth can be adjusted simply by choosing the geometry of the interferometer. We theoretically analysed and simulated several structures with different types of interferometers: unbalanced Mach-Zehnder, Fabry-Perot and ring resonator. Experimental measurements, performed first with a fibre-optic Mach-Zehnder and then with a bulk-optic Michelson, demonstrated the feasibility of the mixing. We then built a Mach-Zehnder interferometer in glass integrated optics, since temperature control of its substrate makes it possible to control the interference regime and optimise the mixing response: the measurement results were then stable and consistent with the simulations. Finally, we demonstrated the practical interest of the proposed solution by simulating a configuration in which one of the two input microwave signals is modulated by a digital signal, as is the case in the targeted optical telecommunication applications.
APA, Harvard, Vancouver, ISO, and other styles
29

Wheatley, John Malcolm. "A current differential feeder protection for use with leased voice frequency communications circuits." Thesis, Northumbria University, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245264.

Full text
APA, Harvard, Vancouver, ISO, and other styles
30

Commarford, Patrick. "WORKING MEMORY, SEARCH, AND SIGNAL DETECTION: IMPLICATIONS FOR INTERACTIVE VOICE RESPONSE SYSTEM MENU DESIGN." Doctoral diss., University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4050.

Full text
Abstract:
Many researchers and speech user interface practitioners assert that interactive voice response (IVR) menus must be relatively short due to constraints of the human memory system. These individuals commonly cite Miller's (1956) paper to support their claims. The current paper argues that these authors commonly misuse the information provided in Miller's paper and that hypotheses drawn from modern theories of working memory (e.g., Baddeley and Hitch, 1974) would lead to the opposite conclusion – that reducing menu length by creating a greater number of menus and a deeper structure will actually be more demanding on users' working memories and will lead to poorer performance and poorer user satisfaction. The primary purpose of this series of experiments was to gain a greater understanding of the role of working memory in speech-enabled IVR use. The experiments also sought to determine whether theories of visual search and signal detection theory (SDT) could be used to predict auditory search behavior. Results of this experiment indicate that creating a deeper structure with shorter menus is detrimental to performance and satisfaction and more demanding of working memory resources. Further, the experiment provides support for arguments developed from Macgregor, Lee, and Lam's dual criterion decision model and is a first step toward applying SDT to the IVR domain.
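As a reminder of the signal detection quantities involved, the snippet below computes sensitivity (d') and criterion from a hit rate and a false-alarm rate under the equal-variance model; the rates are invented for illustration and are not taken from the dissertation.

```python
from scipy.stats import norm

# Hypothetical rates from a menu-search task.
hit_rate, false_alarm_rate = 0.82, 0.15

d_prime = norm.ppf(hit_rate) - norm.ppf(false_alarm_rate)
criterion = -0.5 * (norm.ppf(hit_rate) + norm.ppf(false_alarm_rate))
print(round(d_prime, 2), round(criterion, 2))
```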
Ph.D.
Department of Psychology
Arts and Sciences
Psychology
APA, Harvard, Vancouver, ISO, and other styles
31

Podloucká, Lenka. "Identifikace pauz v rušeném řečovém signálu." Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217266.

Full text
Abstract:
This diploma thesis deals with pause identification in degraded speech signals. The characteristics of speech and the principles of speech signal processing are described. The aim of the work was to create a reliable method for recognizing speech and non-speech segments of a speech signal, both with and without degradation. Five empty-pause detectors were implemented in the MATLAB computing environment: an energy detector in the time domain, a two-step detector in the spectral domain, a one-step integral detector, a two-step integral detector, and a differential detector in the cepstrum. The spectral detector uses the energy characteristics of the speech signal in the first step and a statistical analysis in the second step. The cepstral detectors use integral or differential algorithms. The robustness of the detectors was tested for different types of speech degradation and different values of the signal-to-noise ratio. The influence of the different types of degradation on non-speech detection was compared across detectors using ROC (Receiver Operating Characteristic) curves.
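The simplest of the detectors described above, the time-domain energy detector, can be sketched as follows together with an ROC evaluation; the signal and frame labels are random placeholders rather than the thesis material, and the frame sizes are arbitrary assumptions.

```python
import numpy as np
from sklearn.metrics import roc_curve, auc

def frame_log_energy(x, frame_len=256, hop=128):
    """Short-time log-energy, the decision statistic of the energy detector."""
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len, hop)]
    return np.array([10 * np.log10(np.sum(f ** 2) + 1e-12) for f in frames])

# Placeholders: noisy signal and per-frame speech/non-speech labels.
rng = np.random.default_rng(0)
x = rng.standard_normal(16000)
scores = frame_log_energy(x)
labels = rng.integers(0, 2, size=len(scores))

fpr, tpr, _ = roc_curve(labels, scores)
print("AUC:", round(auc(fpr, tpr), 3))
```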
APA, Harvard, Vancouver, ISO, and other styles
32

Лавриненко, Олександр Юрійович, Александр Юрьевич Лавриненко, and Oleksandr Lavrynenko. "Методи підвищення ефективності семантичного кодування мовних сигналів." Thesis, Національний авіаційний університет, 2021. https://er.nau.edu.ua/handle/NAU/52212.

Full text
Abstract:
The thesis is devoted to a topical scientific and practical problem in telecommunication systems, namely increasing the throughput of the channel used to transmit semantic speech data by encoding it efficiently. The question of increasing the efficiency of semantic coding is therefore formulated as follows: at what minimum rate can the semantic features of speech signals be encoded for a given probability of error-free recognition? This question is answered in the present research, which is an urgent scientific and technical task given the growing trend of remote interaction between people and robotic equipment by means of speech, where the reliability of such systems depends directly on the effectiveness of the semantic coding of speech signals. The thesis investigates the well-known method of increasing the efficiency of semantic coding of speech signals based on mel-frequency cepstral coefficients, which consists in finding the average values of the coefficients of the discrete cosine transform of the logarithmic energy of the discrete Fourier transform spectrum processed by triangular filters on the mel scale. The problem is that this method does not satisfy the condition of adaptivity, so the main scientific hypothesis of the study was formulated: the efficiency of semantic coding of speech signals can be increased by using an adaptive empirical wavelet transform followed by Hilbert spectral analysis. Coding efficiency here means a reduction of the information transmission rate for a given probability of error-free recognition of the semantic features of speech signals, which significantly reduces the required passband and thereby increases the throughput of the communication channel. In the process of proving this hypothesis, the following results were obtained: 1) for the first time, a method of semantic coding of speech signals based on the empirical wavelet transform was developed; it differs from existing methods in constructing a set of adaptive band-pass Meyer wavelet filters followed by Hilbert spectral analysis to find the instantaneous amplitudes and frequencies of the intrinsic empirical mode functions, which makes it possible to determine the semantic features of speech signals and increase the efficiency of their coding; 2) for the first time, the adaptive empirical wavelet transform was proposed for multiscale analysis and semantic coding of speech signals, which increases the efficiency of spectral analysis by decomposing the high-frequency speech signal into its low-frequency components, namely the intrinsic empirical modes; 3) the method of semantic coding of speech signals based on mel-frequency cepstral coefficients was further developed by applying the basic principles of adaptive spectral analysis with the empirical wavelet transform, which increases the efficiency of this method.
Experimental research conducted in MATLAB R2020b showed that the developed method of semantic coding of speech signals based on the empirical wavelet transform reduces the coding rate from 320 to 192 bit/s and the required passband from 40 to 24 Hz, with a probability of error-free recognition of about 0.96 (96%) at a signal-to-noise ratio of 48 dB; its efficiency is thus 1.6 times higher than that of the existing method. The results obtained in the thesis can be used to build systems for the remote interaction of people and robotic equipment using speech technologies, such as speech recognition and synthesis, voice control of technical objects, low-bit-rate coding of speech information, and voice translation from foreign languages.
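As an illustration of the Hilbert spectral analysis step described in the abstract, the sketch below extracts the instantaneous amplitude and frequency of one band-limited component; a fixed Butterworth band-pass filter stands in here for one adaptive Meyer wavelet filter of the empirical wavelet transform, so this is only a simplified stand-in for the proposed method.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def mode_instantaneous_features(x, fs, band):
    """Instantaneous amplitude and frequency (Hz) of one band-limited
    component, obtained from its analytic signal."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    mode = sosfiltfilt(sos, x)          # stand-in for one empirical mode
    analytic = hilbert(mode)
    amplitude = np.abs(analytic)
    phase = np.unwrap(np.angle(analytic))
    frequency = np.diff(phase) * fs / (2 * np.pi)
    return amplitude, frequency
```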
APA, Harvard, Vancouver, ISO, and other styles
33

Calitz, Wietsche Roets. "Independent formant and pitch control applied to singing voice." Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16267.

Full text
Abstract:
Thesis (MScIng)--University of Stellenbosch, 2004.
A singing voice can be manipulated artificially by means of a digital computer in order to create new melodies or to correct existing ones. When the fundamental frequency of an audio signal representing a human voice is changed by simple algorithms, the formants of the voice tend to move to new frequency locations, making it sound unnatural. The main purpose is to design a technique by which the pitch and formants of a singing voice can be controlled independently.
APA, Harvard, Vancouver, ISO, and other styles
34

Humphrey, Megan. "A signal detection approach to the perception of affective prosody in anxious individuals : a developmental study : a thesis submitted to the Victoria University of Wellington in fulfilment of the requirements for the degree of Masters of Science in Psychology /." ResearchArchive@Victoria e-thesis, 2009. http://hdl.handle.net/10063/1255.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Kim, Jonathan Chongkang. "Classification of affect using novel voice and visual features." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/54301.

Full text
Abstract:
Emotion adds an important element to the discussion of how information is conveyed and processed by humans; indeed, it plays an important role in the contextual understanding of messages. This research is centered on investigating relevant features for affect classification, along with modeling the multimodal and multitemporal nature of emotion. The use of formant-based features for affect classification is explored. Since linear predictive coding (LPC) based formant estimators often encounter problems with modeling speech elements such as nasalized phonemes, and give inconsistent results for bandwidth estimation, a robust formant-tracking algorithm was introduced to better model the formant and spectral properties of speech. The algorithm utilizes Gaussian mixtures to estimate spectral parameters and refines the estimates using maximum a posteriori (MAP) adaptation. When the method was used for feature extraction applied to emotion classification, the results indicate that an improved formant-tracking method also provides improved emotion classification accuracy. Spectral features contain rich information about expressivity and emotion. However, most of the recent work in affective computing has not progressed beyond analyzing the mel-frequency cepstral coefficients (MFCCs) and their derivatives. A novel method for characterizing spectral peaks was introduced. The method uses a multi-resolution sinusoidal transform coding (MRSTC). Because of MRSTC's high precision in representing spectral features, including preservation of high-frequency content not present in the MFCCs, additional resolving power was demonstrated. Facial expressions were analyzed using 53 motion capture (MoCap) markers. Statistical and regression measures of these markers were used for emotion classification along with the voice features. Since different modalities use different sampling frequencies and analysis window lengths, a novel classifier fusion algorithm was introduced. This algorithm is intended to integrate classifiers trained at various analysis lengths, as well as those obtained from other modalities. Classification accuracy was statistically significantly improved using a multimodal-multitemporal approach with the introduced classifier fusion method. A practical application of the techniques for emotion classification was explored using social dyadic plays between a child and an adult. The Multimodal Dyadic Behavior (MMDB) dataset was used to automatically predict young children's levels of engagement using linguistic and non-linguistic vocal cues along with visual cues, such as the direction of a child's gaze or a child's gestures. Although this and similar research is limited by inconsistent subjective boundaries and differing theoretical definitions of emotion, a significant step toward successful emotion classification has been demonstrated; key to this progress have been the novel voice and visual features and a newly developed multimodal-multitemporal approach.
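The multimodal aspect described above can be illustrated with a simple late-fusion sketch: two classifiers trained on placeholder voice and face features whose class posteriors are averaged. This is only a generic fusion baseline under invented data, not the multitemporal fusion algorithm introduced in the dissertation.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

# Placeholders for voice features, facial-marker features and emotion labels.
rng = np.random.default_rng(0)
X_voice, X_face = rng.random((200, 20)), rng.random((200, 30))
y = rng.integers(0, 4, size=200)

voice_clf = SVC(probability=True).fit(X_voice, y)
face_clf = RandomForestClassifier(random_state=0).fit(X_face, y)

# Late fusion: average the per-class posteriors of the modality-specific
# classifiers and pick the most probable class.
fused = 0.5 * voice_clf.predict_proba(X_voice) + 0.5 * face_clf.predict_proba(X_face)
predictions = fused.argmax(axis=1)
```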
APA, Harvard, Vancouver, ISO, and other styles
36

Rae, Rebecca C. "Measures of Voice Onset Time: A Methodological Study." Bowling Green State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1522356095329958.

Full text
APA, Harvard, Vancouver, ISO, and other styles
37

Matassini, Lorenzo. "Signal analysis and modelling of non-linear non-stationary phenomena from human voice to financial markets /." [S.l. : s.n.], 2001. http://deposit.ddb.de/cgi-bin/dokserv?idn=963273256.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Dirks, Patricia Lynn. "Child voice, an interactive electroacoustic composition for soprano and computer-generated soundfiles with live digital signal processing." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape9/PQDD_0018/MQ48370.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Hutin, Claire. "Caractérisation de la voie d'adressage aux thylacoi͏̈des : cpSRP (chloroplastic Signal Recognition Particle)." Paris 11, 2002. http://www.theses.fr/2002PA112250.

Full text
Abstract:
The purpose of this study was the in vivo analysis of the cpSRP subunits, which are involved in the targeting of the photosynthetic antenna proteins (LHCPs) to the thylakoid membranes. In vitro analyses of the transit complex stoichiometry were contested. We demonstrate in a two-hybrid system that the cpSRP43 subunit is able to dimerize. Consequently, in vivo, the transit complex could carry two molecules of the substrate in association with two molecules of cpSRP54. The analysis of the double mutant ffc/chaos (cpSRP54-/cpSRP43-) demonstrated that the cpSRP pathway is the major import pathway for antenna targeting. In this post-translational function, both subunits are independent and additive, and they show different affinities for their substrate. This was confirmed in vivo in the chaos mutant, which showed that only the cpSRP43 subunit is involved in the accumulation of ELIPs in response to photo-oxidative stress. ELIPs belong to the same family as the LHCPs. The chaos mutant was used to determine ELIP function and showed that these proteins are involved in the protection of the chloroplast under stress conditions. Hence, ELIPs could be responsible for the capture of free chlorophyll released during the photo-oxidative degradation of photosynthetic complexes. Phenotypical analysis of the cpftsy mutant shows that the cpFtsY subunit is more important in vivo than the two others: it is required by all LHCPs for insertion and also acts in the co-translational activity of cpSRP. Co-immunolocalization and BIAcore analyses demonstrated that cpFtsY is responsible for the dissociation of the transit complex in the vicinity of the thylakoid membranes. CpFtsY forms a post-targeting complex in association with ALB3 and cpSRP43 during the transfer of LHCPs from the transit complex to the thylakoids.
APA, Harvard, Vancouver, ISO, and other styles
40

Navarro, Muriel. "Etude de la voie de transduction du signal Hedgehog chez Drosophila melanogaster." Nice, 2002. http://www.theses.fr/2002NICE5783.

Full text
Abstract:
Proteins of the Hedgehog (Hh) family play an important role during embryogenesis. These secreted proteins use a cytoplasmic multiprotein complex (2000 kDa) as an intracellular signal. We purified this complex using a biochemical approach in order to identify new proteins and to understand the molecular and functional interactions between the different elements of the Hh pathway. We microsequenced two candidate proteins, the first presenting homologies with proteins belonging to the chaperonin family and the other with adhesion proteins, suggesting a structural or anchoring role. Additionally, we confirmed that the Suppressor of fused protein (Su(fu)) associates with the cytoplasmic complex but also interacts with other cytoplasmic proteins. Finally, we showed that this complex associates with soluble alpha-tubulin that is not polymerized into microtubules. These findings lead to a new model of Hh signal transmission.
APA, Harvard, Vancouver, ISO, and other styles
41

Loscos, Àlex. "Spectral processing of the singing voice." Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7542.

Full text
Abstract:
This dissertation is centered on the digital processing of the singing voice, more concretely on the analysis, transformation and synthesis of this type of voice in the spectral domain, with special emphasis on those techniques relevant for music applications.

The thesis presents new formulations and procedures for both describing and transforming those attributes of the singing voice that can be regarded as voice specific. It includes, among others, algorithms for rough and growl analysis and transformation, breathiness estimation and emulation, pitch detection and modification, nasality identification, voice-to-melody conversion, voice beat onset detection, singing voice morphing, and voice-to-instrument transformation; some of them are exemplified with concrete applications.
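As a minimal illustration of the kind of pitch detection mentioned among these algorithms (not the estimator developed in the thesis), a crude autocorrelation-based f0 estimate for a single voiced frame can be written as follows.

```python
import numpy as np

def autocorrelation_pitch(frame, fs, fmin=60.0, fmax=1000.0):
    """Crude fundamental-frequency estimate of one voiced frame from the
    highest autocorrelation peak inside the plausible lag range."""
    frame = frame - np.mean(frame)
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(fs / fmax), int(fs / fmin)
    lag = lag_min + np.argmax(ac[lag_min:lag_max])
    return fs / lag
```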
APA, Harvard, Vancouver, ISO, and other styles
42

Le, Guennec Yannis. "Conversion de fréquences porteuses de signaux numériques par voie optique." Grenoble INPG, 2003. http://www.theses.fr/2003INPG0087.

Full text
APA, Harvard, Vancouver, ISO, and other styles
43

Alverio, Gustavo. "DISCUSSION ON EFFECTIVE RESTORATION OF ORAL SPEECH USING VOICE CONVERSION TECHNIQUES BASED ON GAUSSIAN MIXTURE MODELING." Master's thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2909.

Full text
Abstract:
Today's world offers many ways to communicate information, and one of the most effective is speech. Unfortunately, many people lose the ability to converse, which has a large negative psychological impact; skills such as lecturing and singing must then be restored by other means. Text-to-speech synthesis has been a popular way of restoring the capability to use oral speech. Text-to-speech synthesizers convert text into speech. Although such systems are useful, they only offer a few default voices that do not represent the voice of the user. In order to achieve full restoration, voice conversion must be introduced. Voice conversion is a method that adjusts a source voice to sound like a target voice. It consists of a training process and a conversion process. Training is conducted by composing a speech corpus to be spoken by both the source and the target voice; the corpus should encompass a variety of speech sounds. Once training is finished, the conversion function is employed to transform the source voice into the target voice. Effectively, voice conversion allows a speaker to sound like any other person, and it can therefore be applied to alter the output of a text-to-speech system so that it produces the target voice. This thesis investigates how one approach, voice conversion based on Gaussian mixture modeling, can be applied to alter the voice output of a text-to-speech synthesis system. It was found that acceptable results can be obtained with these methods. Although voice conversion and text-to-speech synthesis are effective in restoring voice, a sample of the speaker recorded before voice loss must be used during the training process. It is therefore vital that voice samples are recorded in anticipation of possible voice loss.
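To make the conversion function more tangible, the sketch below implements a generic joint-GMM mapping (a Stylianou-style minimum mean-square-error regression) from source to target spectral features; X_src and Y_tgt stand for time-aligned feature frames of equal dimension from a parallel training corpus and are assumptions of this illustration, not the exact system studied in the thesis.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

def train_joint_gmm(X_src, Y_tgt, n_components=8):
    """Fit a GMM on time-aligned joint source/target feature vectors."""
    return GaussianMixture(n_components=n_components,
                           covariance_type="full").fit(np.hstack([X_src, Y_tgt]))

def convert(gmm, X_src):
    """Map source frames to target frames via the conditional expectation
    of the target part given the source part, mixture by mixture."""
    d = X_src.shape[1]
    # Responsibilities of each mixture component for the source part only.
    resp = np.zeros((len(X_src), gmm.n_components))
    for m in range(gmm.n_components):
        mu_x = gmm.means_[m, :d]
        S_xx = gmm.covariances_[m, :d, :d]
        resp[:, m] = gmm.weights_[m] * multivariate_normal.pdf(X_src, mu_x, S_xx)
    resp /= resp.sum(axis=1, keepdims=True)

    Y_hat = np.zeros_like(X_src, dtype=float)
    for m in range(gmm.n_components):
        mu_x, mu_y = gmm.means_[m, :d], gmm.means_[m, d:]
        A = gmm.covariances_[m, d:, :d] @ np.linalg.inv(gmm.covariances_[m, :d, :d])
        Y_hat += resp[:, [m]] * (mu_y + (X_src - mu_x) @ A.T)
    return Y_hat
```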
M.S.E.E.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering MSEE
APA, Harvard, Vancouver, ISO, and other styles
44

Degottex, Gilles. "Glottal source and vocal-tract separation : estimation of glottal parameters, voice transformation and synthesis using a glottal model." Paris 6, 2010. http://www.theses.fr/2010PA066399.

Full text
Abstract:
This study addresses the problem of inverting a voice production model, given an audio recording of speech, in order to obtain a representation of the sound source generated at the glottis, the glottal source, as well as a representation of the resonances and anti-resonances created by the cavities of the vocal tract. This separation of the elements composing the voice makes it possible to manipulate the source characteristics and the timbre of the resonances independently. We assume that the glottal source is a mixed-phase signal and that the impulse response of the vocal-tract filter is a minimum-phase signal. Considering these properties, several methods are proposed to estimate the parameters of a glottal model by minimising the mean squared phase of the convolutive residual between an observed speech spectrum and its model. A final method is described in which a single shape parameter is the solution of a quasi-closed form of the observed spectrum. These methods are evaluated and compared with state-of-the-art methods using synthetic and electroglottographic signals. We also propose an analysis/synthesis procedure that estimates the vocal-tract filter from an observed spectrum and its estimated source. Preference tests were conducted and their results are presented in this study to compare the described procedure with other existing methods.
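Since the vocal-tract filter is assumed to be minimum phase, the textbook cepstral construction of a minimum-phase spectrum from a magnitude spectrum may help fix ideas; this is a generic homomorphic recipe, not one of the estimators developed in the thesis.

```python
import numpy as np

def minimum_phase_spectrum(magnitude):
    """Minimum-phase spectrum matching a given full-length (even-size)
    magnitude spectrum, built by folding the real cepstrum."""
    n = len(magnitude)
    cepstrum = np.fft.ifft(np.log(magnitude + 1e-12)).real
    window = np.zeros(n)
    window[0] = 1.0
    window[1:n // 2] = 2.0          # fold the anti-causal part onto the causal part
    window[n // 2] = 1.0
    return np.exp(np.fft.fft(cepstrum * window))
```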
APA, Harvard, Vancouver, ISO, and other styles
45

Little, M. A. "Biomechanically informed nonlinear speech signal processing." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Full text
Abstract:
Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. In order therefore to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques, and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies concludes that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically-informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals. Using a simple pattern classifier, these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically-informed, nonlinear signal processing techniques.
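The surrogate data idea at the heart of this test can be sketched in a few lines: a phase-randomised surrogate preserves the linear (second-order) structure of the signal while destroying any nonlinear or non-Gaussian structure. The code below is a generic textbook construction, not the refined technique proposed in the thesis.

```python
import numpy as np

def phase_randomised_surrogate(x, rng=None):
    """Fourier-transform surrogate: keep the magnitude spectrum, i.e. the
    linear correlations, but randomise the phases."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(x)
    X = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, size=len(X))
    phases[0] = 0.0                      # keep the DC component real
    if n % 2 == 0:
        phases[-1] = 0.0                 # keep the Nyquist bin real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n=n)
```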
APA, Harvard, Vancouver, ISO, and other styles
46

Makrickaitė, Raimonda. "Balso signalo aptikimo ir triukšmo pašalinimo algoritmo tyrimas, naudojant aukštesnės eilės statistiką." Master's thesis, Lithuanian Academic Libraries Network (LABT), 2006. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2006~D_20060529_155017-87407.

Full text
Abstract:
This work presents a robust algorithm for voice activity detection (VAD) and a noise reduction mechanism using combined properties of higher-order statistics (HOS), together with an efficient algorithm to estimate the instantaneous signal-to-noise ratio (SNR) of a speech signal in a background of acoustic noise. The flat spectral shape of the Linear Prediction Coding (LPC) residual results in distinct characteristics for the cumulants in terms of phase, periodicity and harmonic content, and yields closed-form expressions for the skewness and kurtosis. The HOS of speech are immune to Gaussian noise, which makes them particularly useful in algorithms designed for low-SNR environments. The proposed algorithm uses HOS and smoothed power estimates together with second-order measures, such as the SNR and the LPC prediction error, to identify speech and noise frames. A voicing condition for speech frames is derived from the relation between the skewness and kurtosis of voiced speech and the estimate of the smoothed noise power. The algorithm is presented and its performance is compared to a HOS-only based VAD algorithm. The results show that the proposed algorithm has an overall better performance, with noticeable improvement in Gaussian-like noises, such as street and garage noise, from high to low SNR, especially for the probability of correctly detecting speech. The proposed algorithm was also implemented on the DSK C6713.
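To illustrate the higher-order statistics the detector relies on, the snippet below computes the skewness and kurtosis of the LPC residual of a single frame; it assumes librosa for the LPC analysis and is only a feature-extraction sketch, not the VAD algorithm itself.

```python
import numpy as np
import librosa
from scipy.signal import lfilter
from scipy.stats import skew, kurtosis

def residual_hos(frame, order=12):
    """Skewness and kurtosis of the LPC residual of one speech frame,
    the higher-order statistics used to separate speech from
    Gaussian-like noise."""
    x = np.asarray(frame, dtype=float)
    a = librosa.lpc(x, order=order)       # prediction-error filter A(z)
    residual = lfilter(a, [1.0], x)       # inverse filtering
    return skew(residual), kurtosis(residual)
```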
APA, Harvard, Vancouver, ISO, and other styles
47

Ardaillon, Luc. "Synthesis and expressive transformation of singing voice." Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066511/document.

Full text
Abstract:
This thesis aimed at conducting research on the synthesis and expressive transformation of the singing voice, towards the development of a high-quality synthesizer that can generate a natural and expressive singing voice automatically from a given score and lyrics. Three main research directions can be identified: the methods for modelling the voice signal so as to automatically generate an intelligible and natural-sounding voice according to the given lyrics; the control of the synthesis so as to render an adequate interpretation of a given score while conveying some expressivity related to a specific singing style; and the transformation of the voice signal to improve its naturalness and add expressivity by varying the timbre adequately according to the pitch, intensity and voice quality. This thesis provides contributions in each of these three directions. First, a fully functional synthesis system has been developed, based on diphone concatenation. The modular architecture of this system allows different signal modelling approaches to be integrated and compared. Then the question of control is addressed, encompassing the automatic generation of the f0, intensity, and phoneme durations. The modelling of specific singing styles has also been addressed by learning the expressive variations of the modelled control parameters on commercial recordings of famous French singers. Finally, some investigations on expressive timbre transformations have been conducted for future integration into our synthesizer. These mainly concern methods related to intensity transformation, considering the effects of both the glottal source and the vocal tract, and the modelling of vocal roughness.
APA, Harvard, Vancouver, ISO, and other styles
48

Kanuri, Mohan Kumar. "Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)." ScholarWorks@UNO, 2017. http://scholarworks.uno.edu/td/2381.

Full text
Abstract:
Extraction of the singing voice from music is one of the ongoing research topics in the fields of speech recognition and audio analysis. In particular, this topic finds many applications in music, such as determining music structure, lyrics recognition, and singer recognition. Although many studies have been conducted on the separation of voice from background, there have been fewer studies on singing voice in particular. In this study, efforts were made to design a new methodology to improve the separation of vocal and non-vocal components in audio clips using REPET [14]. In the newly designed method, we tried to rectify the issues encountered in the REPET method while designing an improved repeating mask used to extract the non-vocal component of the audio. The main reason the REPET method was preferred over previous methods for this study is its independent nature; the majority of existing methods for the separation of singing voice from music are built explicitly on one or more assumptions.
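For orientation, the core of a REPET-style separation (the repeating mask, here with the repetition period simply given rather than estimated from a beat spectrum) can be sketched as follows; this is a simplified stand-in, not the Correlated Repeated Mask proposed in the thesis.

```python
import numpy as np

def repeating_soft_mask(magnitude, period_frames):
    """Soft mask for the repeating (non-vocal) part of a magnitude
    spectrogram, given an assumed repetition period in frames."""
    n_bins, n_frames = magnitude.shape
    n_segments = n_frames // period_frames
    trimmed = magnitude[:, :n_segments * period_frames]
    segments = trimmed.reshape(n_bins, n_segments, period_frames)

    # The repeating model is the element-wise median over repetitions.
    repeating = np.median(segments, axis=1)
    repeating = np.minimum(np.tile(repeating, n_segments), trimmed)

    mask = (repeating + 1e-12) / (trimmed + 1e-12)   # ~1 where accompaniment
    return mask        # vocal magnitude estimate: (1 - mask) * trimmed
```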
APA, Harvard, Vancouver, ISO, and other styles
49

Leroy, Ingrid. "Nouveaux mécanismes d'induction et de régulation de la voie de signalisation apoptotique CD95/FAS." Toulouse 3, 2005. http://www.theses.fr/2005TOU30136.

Full text
Abstract:
The Fas/CD95 apoptotic pathway is implicated in various physiological as well as pathological phenomena. Ligation of CD95 by Fas ligand (FasL) induces the formation of the DISC (Death Inducing Signaling Complex), with recruitment of FADD and caspase 8, followed by degradation of cell components leading to apoptosis. In our work, we studied new mechanisms of regulation and induction of this apoptotic pathway. First, we showed that protein kinase C ζ is a new member of the DISC that negatively regulates caspase 8 activation. In a second part, we evaluated the role of the Latent Membrane Protein 1 (LMP1) of the Epstein-Barr virus on the Fas death pathway: our results indicate that this viral protein can facilitate the Fas apoptotic pathway. This is independent of LMP1 polymorphism and is inversely correlated with the LMP1 expression level. Finally, we showed that Mithramycin A, an anti-cancer agent, can induce the Fas death pathway independently of FasL. All these results contribute to a better understanding of the Fas apoptotic pathway.
APA, Harvard, Vancouver, ISO, and other styles
50

Eliasson, Björn. "Voice Activity Detection and Noise Estimation for Teleconference Phones." Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-108395.

Full text
Abstract:
When communicating via a teleconference phone, the desired transmitted signal (speech) needs to be crystal clear so that all participants can communicate comfortably. However, many environmental conditions contaminate the signal with background noise, i.e. sounds not of interest for communication purposes, which impedes the ability to communicate. Noise can be removed from the signal if it is known, so this work evaluated different ways of estimating the characteristics of the background noise. The focus was on using speech detection to define the noise, i.e. the non-speech part of the signal, but methods that do not rely solely on speech detection and instead use characteristics of the noisy speech signal were also included. The implemented techniques were compared and evaluated against the current solution used by the teleconference phone in two ways: first for their speech detection ability, and second for their ability to correctly estimate the noise characteristics. The evaluation was based on simulations of the methods' performance in various noise conditions, ranging from harsh to mild environments. It was shown that the proposed method, as implemented in this study, improves on the existing solution in terms of speech detection ability, and that its noise estimate improves on the existing one in certain conditions. It was also concluded that using the proposed method would provide two sources of noise estimation instead of the current single source, and it was suggested to investigate how utilizing two noise estimators could affect performance.
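A common way to realise the "estimate noise only where no speech is detected" idea described here is a simple recursive update of the noise power spectrum, sketched below; the smoothing constant is an arbitrary assumption and this is not the particular estimator evaluated in the thesis.

```python
import numpy as np

def update_noise_psd(noise_psd, frame_psd, is_speech, alpha=0.95):
    """Recursively adapt the noise power spectrum only while the voice
    activity detector flags the current frame as non-speech; otherwise
    keep the previous estimate unchanged."""
    if is_speech:
        return np.asarray(noise_psd)
    return alpha * np.asarray(noise_psd) + (1.0 - alpha) * np.asarray(frame_psd)
```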
APA, Harvard, Vancouver, ISO, and other styles