Dissertations and theses on the topic "VOICE SIGNALS"
Browse the top 50 dissertations and theses on the topic "VOICE SIGNALS".
Wu, Cheng. "A typology for voice and music signals". Thesis, University of Ottawa (Canada), 2005. http://hdl.handle.net/10393/27082.
Anskaitis, Aurimas. "Analysis of Quality of Coded Voice Signals". Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2010. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2009~D_20100303_142141-66509.
The dissertation examines the problem of assessing the quality of coded voice. The main focus is on voice quality when coded speech is transmitted and voice packets are lost. The aim of the work is to improve algorithms for assessing the quality of coded voice. The objectives are: • to create a measurement tool for assessing the quality of short voice signal segments; • to define the notion of the value of coded voice segments and select value metrics; • to measure the distributions of segment values for standard spoken language; • to determine the limits of the distortions produced by different coders; • to investigate the inertia of widely used coders and determine how long the influence of lost packets on subsequent segments remains noticeable. The dissertation consists of an introduction, four research chapters, and general conclusions. The introduction presents the novelty and relevance of the work, discusses the author's contribution, and formulates the aims. The first chapter is a survey: it analyses voice quality assessment methods, their advantages and their drawbacks. As an independent part, it presents lists of Lithuanian words compiled by the author for speech intelligibility studies. The second chapter shows how the range of application of the PESQ (Perceptual Evaluation of Speech Quality) algorithm can be extended. It introduces the notion of the value of a coded voice packet and determines the statistical distributions of packet values. The third chapter examines specific distortions of coded speech and the influence of coding parameters... [see the full text for more]
Strange, John. "VOICE AUTHENTICATION: A STUDY OF POLYNOMIAL REPRESENTATION OF SPEECH SIGNALS". Master's thesis, University of Central Florida, 2005. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4015.
M.S.
Department of Mathematics
Arts and Sciences
Mathematics
BHATT, HARSHIT. "SPEAKER IDENTIFICATION FROM VOICE SIGNALS USING HYBRID NEURAL NETWORK". Thesis, DELHI TECHNOLOGICAL UNIVERSITY, 2021. http://dspace.dtu.ac.in:8080/jspui/handle/repository/18865.
Chandna, Pritish. "Neural networks for singing voice extraction in monaural polyphonic music signals". Doctoral thesis, Universitat Pompeu Fabra, 2021. http://hdl.handle.net/10803/673414.
This thesis focuses on extracting the singing voice from polyphonic music signals. In particular, we address two cases: contemporary popular music, which typically has a processed singing voice with instrumental accompaniment, and choral singing, which involves several singers singing in harmony and in unison. Over the last decade, several deep-learning-based models have been proposed for separating the voice from the instrumental accompaniment in a musical mixture. Most of these models assume that the mixture is a linear sum of the individual sources and estimate time-frequency masks to filter the sources out of the input mixture. Although this assumption does not always hold, deep-learning-based models have shown a remarkable capacity to model the sources in a mixture. In this thesis, we propose an alternative method for singing voice extraction. This methodology assumes that the linguistic and melodic content we perceive in a singing voice signal is retained even when the signal goes through a non-linear mixing process. To this end, we explore language-independent representations of the linguistic content of a voice signal, as well as generative methodologies for voice synthesis. Using these, we propose a methodology for synthesizing a clean singing voice signal from the underlying linguistic and melodic content of a processed voice signal in a musical mixture. In addition, we adapt and evaluate state-of-the-art source separation methodologies to separate the soprano, alto, tenor, and bass parts of choral recordings. We also use the proposed extraction-by-synthesis methodology, together with other deep-learning-based models, to analyse unison singing within choral recordings.
Johansson, Dennis. "Real-time analysis, in SuperCollider, of spectral features of electroglottographic signals". Thesis, KTH, Skolan för datavetenskap och kommunikation (CSC), 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-188498.
This report presents the tools and components needed to further develop an implementation of a method. The method attempts to use a non-invasive electroglottographic signal to find rapid transitions between voice registers. Implementations of sample entropy and the discrete Fourier transform are presented for the SuperCollider programming language, along with the tools needed to evaluate the method and present the results in real time. Since different algorithms have been used for both clustering and cycle separation, a comparison of the algorithms for these steps has also been made.
Mészáros, Tomáš. "Speech Analysis for Processing of Musical Signals". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234974.
Borowiak, Kamila. "Brain Mechanisms for the Perception of Visual and Auditory Communication Signals – Insights from Autism Spectrum Disorder". Doctoral thesis, Humboldt-Universität zu Berlin, 2020. http://dx.doi.org/10.18452/21634.
Communication is ubiquitous in our everyday life. Yet, individuals with autism spectrum disorder (ASD) have difficulties in social interactions and in recognizing socially relevant signals from the face and the voice. Such impairments can vastly affect the quality of life; a profound understanding of the mechanisms behind these difficulties is thus strongly required. In the current dissertation, I focused on sensory brain mechanisms that underlie the perception of emotionally neutral communication signals, which so far have gained little attention in ASD research. I studied the malleability of voice-identity processing using intranasal administration of oxytocin, and thus the potential to alleviate voice-identity recognition impairments in ASD. Furthermore, I investigated brain mechanisms that underlie recognition difficulties for visual speech in ASD, as until now evidence on visual-speech recognition in ASD was limited to behavioral findings. I applied methods of functional magnetic resonance imaging, eye tracking, and behavioral testing. The contribution of the present dissertation is twofold. First, the findings corroborate the view that atypical sensory perception is a critical cornerstone for understanding social difficulties in ASD. Dysfunction of visual and auditory sensory brain regions might contribute to difficulties in processing aspects of communication signals in ASD and modulate the efficacy of interventions for improving the behavioral deficits. Second, the findings deliver empirical support for a recent theoretical model of how the typically developing brain perceives dynamic faces. This improves our current knowledge about brain processing of visual communication signals in the typically developing population.
Advanced scientific knowledge about human communication, as provided in the current dissertation, propels further empirical research and development of clinical interventions that aim to promote communication abilities in affected individuals.
Mokhtari, Mehdi. "The puzzle of non verbal communication: Towards a new aspect of leadership". Thesis, Linnéuniversitetet, Institutionen för organisation och entreprenörskap (OE), 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:lnu:diva-26248.
Dzhambazov, Georgi. "Knowledge-based probabilistic modeling for tracking lyrics in music audio signals". Doctoral thesis, Universitat Pompeu Fabra, 2017. http://hdl.handle.net/10803/404681.
This thesis proposes machine learning and signal processing methodologies for automatically aligning the lyrics of a song with its corresponding audio recording. The research falls within the broad field of Music Information Retrieval (MIR). In this context, the thesis aims to improve some of the state-of-the-art methodologies in the field by introducing domain-specific knowledge. The goal of this work is to design models capable of detecting, in the audio signal, the sequential aspect of one particular element of lyrics: the phonemes. Music can be understood as a composition of several elements, one of which is the lyrics. The models we build take into account the complementary context of the lyrics. The context comprises all the musical aspects that complement the lyrics, of which we have used the following in this thesis: the structure of the musical composition, the structure of the melodic phrases, and the metrical accents. From this perspective, we analyse not only the low-level acoustic features that represent the timbre of the phonemes, but also the high-level features in which the complementary context manifests itself. In this work we propose specific probabilistic models that represent how the transitions between consecutive phonemes of the singing voice are affected by different aspects of the complementary context. The complementary context we consider unfolds in time according to the particular characteristics of each music tradition. To model these characteristics, we have created corpora and datasets for two music traditions that are particularly rich in these aspects: Beijing opera and Ottoman-Turkish makam music. The data are of several types: audio recordings, music scores, and metadata.
From this perspective, the proposed models can take advantage both of the data itself and of the music-tradition-specific knowledge to improve on current baseline results. As a baseline we take a phoneme recognizer based on hidden Markov models (HMMs), a methodology widely used for detecting phonemes in both singing and speech. We present improvements to the common processing steps of current phoneme recognizers, tailoring them to the characteristics of the music traditions studied. Beyond improving the baseline, we also design probabilistic models based on dynamic Bayesian networks (DBNs) that represent the relation between phoneme transitions and the complementary context. We have created two separate models for two aspects of the complementary context: the structure of the melodic phrase (high level) and the metrical structure (fine level). In one model we exploit the fact that syllable duration depends on the syllable's position within the melodic phrase. We obtain this information about the musical phrases from the score and from music-tradition-specific knowledge. In the other model we analyse how vocal note onsets, estimated directly from the audio recordings, influence the transitions between consecutive vowels and consonants. We also propose how to detect the time positions of note onsets in melodic phrases by simultaneously tracking the accents in a musical metrical cycle. To evaluate the potential of the proposed methods, we use the specific task of lyrics-to-audio alignment. Each proposed model improves alignment accuracy compared with the baseline, which relies solely on the timbral acoustic features of the phonemes.
We thereby validate our hypothesis that knowledge of the complementary context helps the automatic detection of lyrics, especially in the case of singing voice with instrumental accompaniment. The outcomes of this work are not only theoretical methodologies and data, but also specific software tools that have been integrated into Dunya, a suite of tools built in the context of the CompMusic research project, which aims to promote the computational analysis of world music. With the help of these tools, we also show that the methodologies developed can be used for other applications in the context of music education or enriched music listening.
Carvalho, Paulo Henrique Bezerra de. "CODIFICAÇÃO DE SINAIS DE VOZ HUMANA POR DECOMPOSIÇÃO EM COMPONENTES MODULANTES". Universidade Federal do Maranhão, 2003. http://tedebc.ufma.br:8080/jspui/handle/tede/370.
This work proposes a variation of a speech signal encoder based on two concepts: the formants and the modulating components of the speech signal. The proposed coding method extracts the modulating components (instantaneous amplitudes and frequencies) for transmission. It is based on the fact that transmitting the speech can be replaced by transmitting its AM-FM (amplitude modulation - frequency modulation) modulating components. To send these components, the LPC (linear predictive coding) method is used to determine the frequencies of the first four formants of the speech spectrum within a 4 kHz band. Then, using a modified Gabor wavelet function, four narrow bands around these formants are filtered. Finally, the properties of the Hilbert transform are used to determine the modulating components of the filtered bands, that is, the instantaneous amplitudes and frequencies. The final result is the coding of eight signals, four corresponding to the instantaneous amplitudes and four to the instantaneous frequencies. The recovery of the speech from the eight signals is also presented; to validate the method, five samples of human speech are used, and intelligibility tests are applied to the samples after their recovery. The results showed that the method is feasible to implement in real applications.
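The last analysis step in this abstract, recovering instantaneous amplitude and frequency from a narrow band through the Hilbert transform, can be illustrated with a minimal sketch. It is not the author's encoder: the LPC formant estimation and Gabor wavelet filtering are omitted, the function names are hypothetical, and a naive DFT stands in for an FFT so that the example needs only the standard library. The test tone (0.5 amplitude, 1000 Hz, 8 kHz sampling) is an assumption chosen for illustration.

```python
import cmath
import math


def analytic_signal(samples):
    """Analytic signal of a real sequence, using a naive O(n^2) DFT."""
    n = len(samples)
    spectrum = [
        sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Keep DC (and Nyquist when n is even), double the positive
    # frequencies, and zero the negative ones.
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
    for k in range(1, (n + 1) // 2):
        weights[k] = 2.0
    return [
        sum(weights[k] * spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
            for k in range(n)) / n
        for t in range(n)
    ]


def am_fm_components(samples, sample_rate):
    """Instantaneous amplitude (per sample) and frequency in Hz (per step)."""
    z = analytic_signal(samples)
    amplitude = [abs(v) for v in z]
    # Phase advance between neighbouring samples, converted to Hz.
    frequency = [
        cmath.phase(z[t + 1] * z[t].conjugate()) * sample_rate / (2 * math.pi)
        for t in range(len(z) - 1)
    ]
    return amplitude, frequency


# Hypothetical narrow-band input: a 1000 Hz tone of amplitude 0.5.
SAMPLE_RATE = 8000
tone = [0.5 * math.cos(2 * math.pi * 1000 * t / SAMPLE_RATE) for t in range(80)]
amplitude, frequency = am_fm_components(tone, SAMPLE_RATE)
```

For a real speech band both outputs vary over time; those two time series per formant band are what the abstract describes as the eight coded signals.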
Boué, Anaïs. "Data mining and volcanic eruption forcasting". Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAU007/document.
Eruption forecasting methods are valuable tools for supporting decision making during volcanic crises if they are integrated in a global monitoring strategy and if their potential and limitations are known. Many attempts at deterministic forecasting of volcanic eruptions and landslides have been made using the material Failure Forecast Method (FFM). This method consists in fitting an empirical power law to precursory patterns of seismicity or deformation. Until now, most of the studies have presented hindsight forecasts, based on complete time series of precursors, and do not evaluate the method's potential for carrying out real-time forecasting with partial precursory sequences. Moreover, the limited number of published examples and the absence of systematic application of the FFM make it difficult to conclude as to the ability of the method to forecast volcanic eruptions. Thus it appears important to gain experience by carrying out systematic forecasting attempts in various eruptive contexts. In this thesis, I present a rigorous approach to the FFM designed for real-time applications on volcano-seismic precursors. I use a Bayesian approach based on the FFM theory and an automatic classification of the seismic events that do not have the same source mechanisms. The probability distributions of the data deduced from the performance of the classification are used as input. As output, the method provides the probability of the forecast time at each observation time before the eruption. The spread of the posterior probability density function of the prediction time and its stability with respect to the observation time are used as criteria to evaluate the reliability of the forecast. I show that the method developed here outperforms the classical application of the FFM both for hindsight and real-time attempts because it accurately takes the uncertainty of the data into account.
The automatic classification of volcano-seismic signals allows for a systematic application of this forecasting method to decades of seismic data from andesitic volcanoes including Volcan de Colima (Mexico) and Merapi volcano (Indonesia), and from the basaltic volcano of Piton de la Fournaise (Reunion Island, France). The number of eruptions that are not preceded by precursors is quantified, as well as the number of seismic crises that are not followed by eruptions. Then, I use 64 precursory sequences and apply the forecasting method developed in this thesis. I thus determine under which conditions the FFM can be successfully applied and I quantify the success rate of the method in real-time and in hindsight. Only 62% of the precursory sequences analysed in this thesis were suitable for the application of FFM and half of the total number of eruptions are successfully forecast in hindsight. In real-time, the method allows for the successful prediction of only 36% of the total of all eruptions considered. Nevertheless, real-time predictions are successful for 83% of the cases that fulfil the reliability criteria. Therefore, we can have good confidence in the method when the reliability criteria are met, but the deterministic real-time forecasting tool developed in this thesis is not sufficient in itself. However, it could potentially be informative combined with other forecasting methods and supervised by an observer. These results reflect the lack of knowledge concerning the pre-eruptive mechanisms.
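The classical FFM that the abstract uses as its point of comparison can be sketched as follows. The sketch assumes the common special case in which the power-law exponent equals 2, so that the inverse of the precursor rate falls linearly and reaches zero at the failure time. It is not the Bayesian method developed in the thesis, and the function name and synthetic data are hypothetical.

```python
def forecast_failure_time(times, rates):
    """Fit a least-squares line to (t, 1/rate) and return its zero crossing.

    Assumes the exponent-2 case of the FFM power law, where the inverse
    rate decreases linearly towards zero at the failure time.
    """
    inverse_rates = [1.0 / r for r in rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_y = sum(inverse_rates) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times, inverse_rates))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    if slope >= 0:
        raise ValueError("inverse rate is not decreasing; no forecast possible")
    return -intercept / slope


# Synthetic precursor sequence (hypothetical): the event rate accelerates
# towards a failure at t = 10 days, observed only up to t = 7 days.
TRUE_FAILURE_TIME = 10.0
observation_times = [float(day) for day in range(8)]
event_rates = [1.0 / (0.05 * (TRUE_FAILURE_TIME - t)) for t in observation_times]
forecast = forecast_failure_time(observation_times, event_rates)
```

On noise-free synthetic data the fit recovers the failure time from a partial sequence; with real precursors the scatter of the inverse rates is what motivates the probabilistic treatment described in the abstract.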
Duchovskis, Donatas. "Aukštesnių eilių statistika grįsto balso detektavimo algoritmo sudarymas ir tyrimas". Master's thesis, Lithuanian Academic Libraries Network (LABT), 2006. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2006~D_20060529_131458-61965.
Guy, Richard. "DISTANCE, DIALOGUE AND DIFFERENCE: A Postpositivist Approach to Understanding Distance Education in Papua New Guinea". Deakin University. School of Education, 1994. http://tux.lib.deakin.edu.au./adt-VDU/public/adt-VDU20041209.093035.
Wu, Nan, and Bofei Wang. "Process and Analysis of Voice Signal by MATLAB". Thesis, Högskolan i Gävle, Avdelningen för elektronik, matematik och naturvetenskap, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hig:diva-17541.
Nayfeh, Taysir H. "Multi-signal processing for voice recognition in noisy environments". Thesis, This resource online, 1991. http://scholar.lib.vt.edu/theses/available/etd-10222009-125021/.
Nylén, Helmer. "Detecting Signal Corruptions in Voice Recordings for Speech Therapy". Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-291429.
When a patient's voice is recorded for analysis in speech therapy, the recording quality can be affected by various signal corruptions, such as background noise or clipping. The equipment and expertise needed to detect small disturbances are, however, not always available at smaller clinics. This study therefore investigates several machine learning algorithms for automatically detecting selected corruptions in speech recordings, including infrasound and random muting of the signal. Five algorithms are analysed: support vector machine, Convolutional Neural Network, Long Short-term Memory (LSTM), Gaussian mixture model-based hidden Markov model, and generator-based hidden Markov model. A tool for creating datasets of degraded recordings is developed in order to test the algorithms. We examine separately the cases where recordings may contain one corruption or several simultaneously, and mainly use a type of cepstral coefficient, MFCCs, as features. For each type of corruption we also examine ways of improving accuracy, for example by filtering out irrelevant parts of the signal with a voice activity detector, changing the feature parameters, or using an ensemble of classifiers. The experiments show that machine learning is a reasonable approach to this problem, as the balanced accuracy exceeds 75% for all tested corruptions. The part of the study that focused on single-corruption recordings produced no results indicating that one algorithm was clearly better than the others, but in the multiple-corruption case the LSTM generally outperformed the other algorithms. Notably, it reached over 95% balanced accuracy on both white noise and infrasound. Because the algorithms were trained only on spoken English sentences, the tool currently has limited practical usefulness. It is, however, easy to extend these experiments with other types of recordings, signal corruptions, features, or algorithms.
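The abstract reports balanced accuracy rather than plain accuracy. A minimal sketch of that metric, assuming the usual definition (the mean of the per-class recalls), shows why it is the safer figure when corrupted recordings are rare. The labels and predictions below are hypothetical and are not data from the thesis.

```python
def balanced_accuracy(true_labels, predicted_labels):
    """Mean of the per-class recalls over the classes present in true_labels."""
    recalls = []
    for label in sorted(set(true_labels)):
        indices = [i for i, y in enumerate(true_labels) if y == label]
        hits = sum(1 for i in indices if predicted_labels[i] == label)
        recalls.append(hits / len(indices))
    return sum(recalls) / len(recalls)


def plain_accuracy(true_labels, predicted_labels):
    """Fraction of predictions that match the true label."""
    matches = sum(1 for y, p in zip(true_labels, predicted_labels) if y == p)
    return matches / len(true_labels)


# Hypothetical imbalanced test set: 8 clean recordings, 2 corrupted ones,
# and a degenerate detector that always answers "clean".
truth = ["clean"] * 8 + ["corrupt"] * 2
always_clean = ["clean"] * 10

accuracy = plain_accuracy(truth, always_clean)
balanced = balanced_accuracy(truth, always_clean)
```

Here the degenerate detector scores 0.8 on plain accuracy but only 0.5 on balanced accuracy, the chance level for two classes.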
Oddiraju, Swetha. "Improving performance for adaptive filtering with voice applications". Diss., Columbia, Mo. : University of Missouri-Columbia, 2007. http://hdl.handle.net/10355/6271.
The entire dissertation/thesis text is included in the research.pdf file; the official abstract appears in the short.pdf file (which also appears in the research.pdf); a non-technical general description, or public abstract, appears in the public.pdf file. Title from title screen of research.pdf file (viewed on September 29, 2008). Includes bibliographical references.
SANTOS, JÚNIOR Gutemberg Gonçalves dos. "Redução de ruído para sistemas de reconhecimento de voz utilizando subespaços vetoriais". Universidade Federal de Campina Grande, 2009. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/1508.
The establishment of a speech-based communication interface between humans and computers has been pursued since the beginning of the computer era. Several advances have been made over the last six decades, making commercial use of speech recognition applications possible today. However, factors such as noise, reverberation, and distortion degrade the performance of these systems, reducing their recognition rate in adverse environments. The study of techniques that reduce the impact of these problems is therefore of great value and has gained prominence in recent decades. The work presented in this dissertation aims to reduce the problems caused by the noise typical of automotive environments, making the speech recognition systems used there more robust. In this way, non-critical features of a car, that is, features that do not put the user's life at risk, such as music players and air conditioning, can be controlled through voice commands. The proposed system is based on a preprocessing step applied to the speech signal using the signal subspace method. The performance of this method is directly related to the dimensions (rows × columns) of the matrices that represent the input signal. For this reason, the ULLV decomposition, although it is an approximation of the signal subspace method, was used because it offers lower computational complexity than traditional methods based on the SVD. The Julius speech recognizer was chosen for the case study because it is open-source software that offers high performance. A speech database of 44800 samples was generated with a model of an automotive environment. Finally, the robustness of the system was evaluated and compared with a traditional noise reduction method called spectral subtraction.
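Spectral subtraction, the baseline the dissertation compares against, can be sketched for a single frame as follows. This is one common magnitude-domain variant with a zero floor, not the signal subspace or ULLV method proposed in the work. The sketch assumes that an estimate of the noise magnitude spectrum is already available, and all names and the two-tone test signal are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(n^2) discrete Fourier transform of a real frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def inverse_dft_real(spectrum):
    """Inverse DFT, keeping only the real part of each output sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
             for k in range(n)) / n).real
        for t in range(n)
    ]


def spectral_subtraction(noisy_frame, noise_magnitude):
    """Subtract a noise magnitude estimate per bin, keeping the noisy phase."""
    cleaned = []
    for value, noise in zip(dft(noisy_frame), noise_magnitude):
        magnitude = max(abs(value) - noise, 0.0)  # floor at zero
        cleaned.append(cmath.rect(magnitude, cmath.phase(value)))
    return inverse_dft_real(cleaned)


# Hypothetical frame: a "speech" tone in bin 3 plus "noise" in bin 9.
FRAME_LENGTH = 32
speech = [math.cos(2 * math.pi * 3 * t / FRAME_LENGTH) for t in range(FRAME_LENGTH)]
noise = [0.4 * math.cos(2 * math.pi * 9 * t / FRAME_LENGTH) for t in range(FRAME_LENGTH)]
noisy = [s + v for s, v in zip(speech, noise)]

noise_estimate = [abs(v) for v in dft(noise)]  # assumed known, e.g. from a pause
enhanced = spectral_subtraction(noisy, noise_estimate)
```

Because the toy noise occupies a bin the speech does not use, the subtraction recovers the clean tone almost exactly; real automotive noise overlaps the speech spectrum, which is where the residual artefacts of this baseline arise.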
Jalalinajafabadi, Farideh. "Computerised GRBAS assessement of voice quality". Thesis, University of Manchester, 2016. https://www.research.manchester.ac.uk/portal/en/theses/computerised-grbas-assessement-of-voice-quality(7efd3263-b109-4137-87cf-b9559c61730b).html.
Lindsay, Iain Andrew Blair. "A signal constellation and carrier recovery technique for voice-band modems". Thesis, University of Edinburgh, 1986. http://hdl.handle.net/1842/15216.
Lembard, Tomáš. "Speciální aplikace VoIP". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2011. http://www.nusl.cz/ntk/nusl-219188.
Doukas, Nikolaos. "Voice activity detection using energy based measures and source separation". Thesis, Imperial College London, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245220.
Smith, Quentin D. "Multichannel Digital Signal Processor Based Red/Black Keyset". International Foundation for Telemetering, 1992. http://hdl.handle.net/10150/611927.
This paper addresses a method to provide both secure and non-secure voice communications to a DS-1 network from a common keyset. In order to comply with both the electrical isolation requirements and the operational security issues regarding voice communications, an all-digital approach to the keyset was developed based upon the AD2101 DSP. Protocols that are handled by the keyset include: Multiple PTT modes, hot mike, telephone access, priority override, direct access, indirect access, paging, and monitor only. Special features that are addressed include: independent channel by channel assignment of access protocols, headset assignment, speaker assignment, and PTT assignment. Multiple microprocessors are used to implement the foregoing as well as down-loadable configurations, remote keyset control and monitoring, and composite audio outputs. Partitioning of the digital design provides RED to BLACK channel isolation and RED channel to AC power isolation of greater than 107 dB.
Fredrickson, Steven Eric. "Neural networks for speaker identification". Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294364.
El, Malki Karim. "A novel approach to high quality voice using echo cancellation and silence detection". Thesis, University of Sheffield, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.286579.
Tryfou, Georgina. "Time-frequency reassignment for acoustic signal processing. From speech to singing voice applications". Doctoral thesis, University of Trento, 2017. http://eprints-phd.biblio.unitn.it/2562/2/PhD-Thesis.pdf.
Maury, Ghislaine. "Mélange de signaux microondes par voie optique". Grenoble INPG, 1998. http://www.theses.fr/1998INPG0152.
Wheatley, John Malcolm. "A current differential feeder protection for use with leased voice frequency communications circuits". Thesis, Northumbria University, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245264.
Commarford, Patrick. "WORKING MEMORY, SEARCH, AND SIGNAL DETECTION: IMPLICATIONS FOR INTERACTIVE VOICE RESPONSE SYSTEM MENU DESIGN". Doctoral diss., University of Central Florida, 2006. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4050.
Ph.D.
Department of Psychology
Arts and Sciences
Psychology
Podloucká, Lenka. "Identifikace pauz v rušeném řečovém signálu". Master's thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2008. http://www.nusl.cz/ntk/nusl-217266.
Лавриненко, Олександр Юрійович, Александр Юрьевич Лавриненко and Oleksandr Lavrynenko. "Методи підвищення ефективності семантичного кодування мовних сигналів". Thesis, Національний авіаційний університет, 2021. https://er.nau.edu.ua/handle/NAU/52212.
The thesis addresses a current scientific and practical problem in telecommunication systems: increasing the throughput of the channel that carries semantic speech data by coding that data efficiently. The central question is: at what minimum bit rate can the semantic features of speech signals be encoded for a given probability of error-free recognition? Answering it is an urgent scientific and technical task given the growing trend toward remote interaction between people and robotic equipment through speech, where the accuracy of such systems depends directly on the effectiveness of semantic coding of speech signals. The thesis investigates the well-known method of semantic coding of speech signals based on mel-frequency cepstral coefficients, which consists in finding the average values of the coefficients of the discrete cosine transform of the logarithmic energy of the discrete Fourier transform spectrum passed through a triangular filter bank on the mel scale. The problem is that this method does not satisfy the condition of adaptability. The main scientific hypothesis of the study was therefore formulated: the efficiency of semantic coding of speech signals can be increased by using an adaptive empirical wavelet transform followed by Hilbert spectral analysis. Coding efficiency here means a decrease in the information transmission rate for a given probability of error-free recognition of the semantic features of speech signals, which significantly reduces the required passband and thereby increases the throughput of the communication channel.
In the course of proving this hypothesis, the following results were obtained: 1) for the first time, a method of semantic coding of speech signals based on the empirical wavelet transform was developed; it differs from existing methods in constructing a set of adaptive bandpass Meyer wavelet filters and then applying Hilbert spectral analysis to find the instantaneous amplitudes and frequencies of the internal empirical mode functions, which determine the semantic features of speech signals and increase the efficiency of their coding; 2) for the first time, the adaptive empirical wavelet transform was proposed for multiscale analysis and semantic coding of speech signals, which increases the efficiency of spectral analysis by decomposing high-frequency speech oscillations into their low-frequency components, namely the internal empirical modes; 3) the method of semantic coding of speech signals based on mel-frequency cepstral coefficients was developed further by applying the basic principles of adaptive spectral analysis with the empirical wavelet transform, which increases the efficiency of this method. Experiments in MATLAB R2020b showed that the developed method of semantic coding of speech signals based on the empirical wavelet transform reduces the encoding rate from 320 to 192 bit/s and the required passband from 40 to 24 Hz, with a probability of error-free recognition of about 0.96 (96%) and a signal-to-noise ratio of 48 dB; its efficiency is thus 1.6 times that of the existing method.
The results obtained in the thesis can be used to build systems for remote interaction of people and robotic equipment using speech technologies, such as speech recognition and synthesis, voice control of technical objects, low-speed encoding of speech information, voice translation from foreign languages, etc.
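The baseline that this abstract describes (DFT spectrum, triangular mel-scale filters, logarithm of the filter energies, discrete cosine transform, averaging of the coefficients) can be illustrated with a minimal NumPy sketch. This is not the thesis author's implementation: the sampling rate, frame length, hop size, filter count, number of coefficients, the particular mel formula, and all function names are assumptions chosen only for illustration.

```python
import numpy as np


def hz_to_mel(f):
    # One common mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters evenly spaced on the mel scale.

    Returns an array of shape (n_filters, n_fft // 2 + 1).
    """
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(fs / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    bank = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, mid, hi = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - lo) / (mid - lo)
        falling = (hi - freqs) / (hi - mid)
        # Triangle: smaller of the two slopes, clipped at zero outside [lo, hi].
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x, n_out):
    """Unnormalised DCT-II along the last axis, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T


def mean_mfcc(signal, fs=8000, frame_len=256, hop=128, n_filters=20, n_coeffs=12):
    """Average mel-frequency cepstral coefficients over all frames of a signal."""
    signal = np.asarray(signal, dtype=float)
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = signal[idx] * np.hamming(frame_len)
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2       # DFT power spectrum
    energies = power @ mel_filterbank(n_filters, frame_len, fs).T
    log_energies = np.log(energies + 1e-10)                 # avoid log(0)
    coeffs = dct_ii(log_energies, n_coeffs)                 # cepstral coefficients
    return coeffs.mean(axis=0)                              # average over frames
```

Calling `mean_mfcc` on a one-second array sampled at 8 kHz returns a vector of 12 averaged coefficients. The fixed filter bank is the non-adaptive element that the abstract contrasts with the empirical wavelet transform.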
Calitz, Wietsche Roets. "Independent formant and pitch control applied to singing voice". Thesis, Stellenbosch : University of Stellenbosch, 2004. http://hdl.handle.net/10019.1/16267.
ENGLISH ABSTRACT: A singing voice can be manipulated artificially by means of a digital computer for the purposes of creating new melodies or to correct existing ones. When the fundamental frequency of an audio signal that represents a human voice is changed by simple algorithms, the formants of the voice tend to move to new frequency locations, making it sound unnatural. The main purpose is to design a technique by which the pitch and formants of a singing voice can be controlled independently.
AFRIKAANSE OPSOMMING: Onafhanklike formant- en toonhoogte beheer toegepas op ’n sangstem: ’n Sangstem kan deur ’n digitale rekenaar gemanipuleer word om nuwe melodieë te skep, of om bestaandes te verbeter. Wanneer die fundamentele frekwensie van ’n klanksein (wat ’n menslike stem voorstel) deur ’n eenvoudige algoritme verander word, skuif die oorspronklike formante na nuwe frekwensie gebiede. Dit veroorsaak dat die resultaat onnatuurlik klink. Die hoof oogmerk is om ’n tegniek te ontwerp wat die toonhoogte en die formante van ’n sangstem apart kan beheer.
Humphrey, Megan. "A signal detection approach to the perception of affective prosody in anxious individuals : a developmental study : a thesis submitted to the Victoria University of Wellington in fulfilment of the requirements for the degree of Masters of Science in Psychology /". ResearchArchive@Victoria e-thesis, 2009. http://hdl.handle.net/10063/1255.
Kim, Jonathan Chongkang. "Classification of affect using novel voice and visual features". Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/54301.
Rae, Rebecca C. "Measures of Voice Onset Time: A Methodological Study". Bowling Green State University / OhioLINK, 2018. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1522356095329958.
Matassini, Lorenzo. "Signal analysis and modelling of non-linear non-stationary phenomena from human voice to financial markets /". [S.l. : s.n.], 2001. http://deposit.ddb.de/cgi-bin/dokserv?idn=963273256.
Dirks, Patricia Lynn. "Child voice, an interactive electroacoustic composition for soprano and computer-generated soundfiles with live digital signal processing". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape9/PQDD_0018/MQ48370.pdf.
Hutin, Claire. "Caractérisation de la voie d'adressage aux thylacoïdes : cpSRP (chloroplastic Signal Recognition Particle)". Paris 11, 2002. http://www.theses.fr/2002PA112250.
The purpose of this study was the in vivo analysis of the cpSRP subunits, which are involved in targeting photosynthetic antenna proteins (LHCPs) to thylakoid membranes. In vitro analysis of the transit complex stoichiometry was contested. We demonstrate in a two-hybrid system that the cpSRP43 subunit is able to dimerize. Consequently, in vivo, the transit complex could target two molecules of the substrate in association with two molecules of cpSRP54. Analysis of the double mutant ffc/chaos (cpSRP54-/cpSRP43-) demonstrated that the cpSRP pathway is the major import pathway for antenna targeting. In this post-translational function, both subunits are independent and additive. This was confirmed in vivo in the chaos mutant, which showed that only the cpSRP43 subunit is involved in the accumulation of ELIPs in response to photo-oxidative stress. ELIPs belong to the same family as the LHCPs. The chaos mutant was used to determine ELIP function and showed that these proteins are involved in protecting the chloroplast under stress conditions. Hence, ELIPs could be responsible for capturing free chlorophyll released during photo-oxidative degradation of photosynthetic complexes. Phenotypic analysis of the cpftsy mutant shows that the cpFtsY subunit is more important in vivo than the two others. It is required by all LHCPs for insertion and also acts in the co-translational activity of cpSRP. Co-immunolocalization and BIAcore analysis demonstrated that cpFtsY is responsible for dissociating the transit complex in the vicinity of thylakoid membranes. cpFtsY forms a post-targeting complex in association with ALB3 and cpSRP43 during LHCP transfer from the transit complex to the thylakoids.
Navarro, Muriel. "Etude de la voie de transduction du signal Hedgehog chez Drosophila melanogaster". Nice, 2002. http://www.theses.fr/2002NICE5783.
The proteins of the Hedgehog (Hh) family play an important role during embryogenesis. These secreted proteins use a cytoplasmic multiprotein complex (2000 kDa) for intracellular signaling. We purified this complex using a biochemical approach in order to identify new proteins and to understand the molecular and functional interactions between the different elements of the Hh pathway. We microsequenced two candidate proteins, the first showing homology with proteins of the chaperonin family and the other with adhesion proteins, suggesting a structural or anchoring role. Additionally, we confirmed that the Suppressor of fused protein (Su(fu)) associates with the cytoplasmic complex but also interacts with other cytoplasmic proteins. Finally, we showed that this complex associates with soluble alpha-tubulin rather than with polymerized microtubules. These findings lead to a new model of Hh signal transmission.
Loscos, Àlex. "Spectral processing of the singing voice". Doctoral thesis, Universitat Pompeu Fabra, 2007. http://hdl.handle.net/10803/7542.
La tesi presenta nous procediments i formulacions per a la descripció i transformació d'aquells atributs específicament vocals de la veu cantada. La tesi inclou, entre d'altres, algorismes per l'anàlisi i la generació de desordres vocals com ara rugositat, ronquera, o veu aspirada, detecció i modificació de la freqüència fonamental de la veu, detecció de nasalitat, conversió de veu cantada a melodia, detecció de cops de veu, mutació de veu cantada, i transformació de veu a instrument; exemplificant alguns d'aquests algorismes en aplicacions concretes.
Esta tesis doctoral versa sobre el procesado digital de la voz cantada, más concretamente, sobre el análisis, transformación y síntesis de este tipo de voz basándose en el dominio espectral, con especial énfasis en aquellas técnicas relevantes para el desarrollo de aplicaciones musicales.
La tesis presenta nuevos procedimientos y formulaciones para la descripción y transformación de aquellos atributos específicamente vocales de la voz cantada. La tesis incluye, entre otros, algoritmos para el análisis y la generación de desórdenes vocales como rugosidad, ronquera, o voz aspirada, detección y modificación de la frecuencia fundamental de la voz, detección de nasalidad, conversión de voz cantada a melodía, detección de los golpes de voz, mutación de voz cantada, y transformación de voz a instrumento; ejemplificando algunos de éstos en aplicaciones concretas.
This dissertation is centered on the digital processing of the singing voice, more concretely on the analysis, transformation and synthesis of this type of voice in the spectral domain, with special emphasis on those techniques relevant for music applications.
The thesis presents new formulations and procedures for both describing and transforming those attributes of the singing voice that can be regarded as voice specific. It includes, among others, algorithms for rough and growl analysis and transformation, breathiness estimation and emulation, pitch detection and modification, nasality identification, voice to melody conversion, voice beat onset detection, singing voice morphing, and voice to instrument transformation, some of which are exemplified in concrete applications.
Le, Guennec Yannis. "Conversion de fréquences porteuses de signaux numériques par voie optique". Grenoble INPG, 2003. http://www.theses.fr/2003INPG0087.
Alverio, Gustavo. "DISCUSSION ON EFFECTIVE RESTORATION OF ORAL SPEECH USING VOICE CONVERSION TECHNIQUES BASED ON GAUSSIAN MIXTURE MODELING". Master's thesis, University of Central Florida, 2007. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2909.
M.S.E.E.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering MSEE
Degottex, Gilles. "Glottal source and vocal-tract separation : estimation of glottal parameters, voice transformation and synthesis using a glottal model". Paris 6, 2010. http://www.theses.fr/2010PA066399.
Little, M. A. "Biomechanically informed nonlinear speech signal processing". Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.
Makrickaitė, Raimonda. "Balso signalo aptikimo ir triukšmo pašalinimo algoritmo tyrimas, naudojant aukštesnės eilės statistiką". Master's thesis, Lithuanian Academic Libraries Network (LABT), 2006. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2006~D_20060529_155017-87407.
Ardaillon, Luc. "Synthesis and expressive transformation of singing voice". Thesis, Paris 6, 2017. http://www.theses.fr/2017PA066511/document.
This thesis aimed at conducting research on the synthesis and expressive transformation of the singing voice, towards the development of a high-quality synthesizer that can automatically generate a natural and expressive singing voice from a given score and lyrics. Three main research directions can be identified: the methods for modelling the voice signal to automatically generate an intelligible and natural-sounding voice according to the given lyrics; the control of the synthesis to render an adequate interpretation of a given score while conveying some expressivity related to a specific singing style; and the transformation of the voice signal to improve its naturalness and add expressivity by varying the timbre adequately according to the pitch, intensity and voice quality. This thesis provides contributions in each of those three directions. First, a fully functional synthesis system has been developed, based on diphone concatenation. The modular architecture of this system allows different signal modeling approaches to be integrated and compared. Then, the question of control is addressed, encompassing the automatic generation of the f0, intensity, and phoneme durations. The modeling of specific singing styles has also been addressed by learning the expressive variations of the modeled control parameters from commercial recordings of famous French singers. Finally, some investigations on expressive timbre transformations have been conducted, for a future integration into our synthesizer. These mainly concern methods related to intensity transformation, considering the effects of both the glottal source and the vocal tract, and the modeling of vocal roughness.
Kanuri, Mohan Kumar. "Separation of Vocal and Non-Vocal Components from Audio Clip Using Correlated Repeated Mask (CRM)". ScholarWorks@UNO, 2017. http://scholarworks.uno.edu/td/2381.
Leroy, Ingrid. "Nouveaux mécanismes d'induction et de régulation de la voie de signalisation apoptotique CD95/FAS". Toulouse 3, 2005. http://www.theses.fr/2005TOU30136.
The Fas/CD95 apoptotic pathway is implicated in various physiological as well as pathological phenomena. Ligation of CD95 by Fas ligand (FasL) induces formation of the DISC (Death Inducing Signaling Complex), with recruitment of FADD and caspase 8, followed by degradation of cell components leading to apoptosis. In this work, we studied new mechanisms of regulation and induction of this apoptotic pathway. First, we showed that protein kinase C ζ is a new DISC member that regulates caspase 8 activation. Second, we evaluated the role of Latent Membrane Protein 1 (LMP1) of Epstein-Barr virus in the Fas death pathway: our results indicate that this viral protein can facilitate the Fas apoptotic pathway. This effect is independent of LMP1 polymorphism and is inversely correlated with the LMP1 expression level. Finally, we showed that Mithramycin A can induce the Fas death pathway independently of FasL. Together, these results could contribute to a better understanding of the Fas apoptotic pathway.
Eliasson, Björn. "Voice Activity Detection and Noise Estimation for Teleconference Phones". Thesis, Umeå universitet, Institutionen för matematik och matematisk statistik, 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-108395.