Dissertations on the topic "Vocal feature"
Format your source in APA, MLA, Chicago, Harvard, and other citation styles
Consult the top 24 dissertations for your research on the topic "Vocal feature".
Next to every entry in the bibliography there is an "Add to bibliography" button. Click it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.
You can also download the full text of the scholarly publication in .pdf format and read its abstract online, when these are available in the metadata.
Browse dissertations from a wide range of disciplines and compile your bibliography correctly.
Moore, Elliot II. "Evaluating objective feature statistics of speech as indicators of vocal affect and depression." Diss., Georgia Institute of Technology, 2003. http://hdl.handle.net/1853/5346.
Moore, Elliot. "Evaluating objective feature statistics of speech as indicators of vocal affect and depression." PhD diss., Georgia Institute of Technology, 2003. http://etd.gatech.edu/theses/available/etd-04062004-164738/unrestricted/moore%5Felliot%5F200312%5Fphd.pdf.
Carvalho, Raphael Torres Santos. "Transformada Wavelet na detecção de patologias da laringe." Universidade Federal do Ceará, 2012. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=8908.
The number of non-invasive diagnostic methods has increased owing to the need for simple, quick, and painless tests. With the growth of technology providing the means for signal extraction and processing, new analytical methods have been developed to help understand the complexity of voice signals. This dissertation presents a new idea for characterizing healthy and pathological voice signals based on a mathematical tool widely known in the literature, the Wavelet Transform (WT). The speech data used in this work consist of 60 voice samples divided into four classes: one from healthy individuals and three from people with vocal fold nodules, Reinke's edema, and neurological dysphonia. All samples were recorded using the sustained vowel /a/ of Brazilian Portuguese. The results obtained by all the pattern classifiers studied indicate that the proposed WT-based approach is a suitable technique for discriminating between healthy and pathological voices, performing similarly to or better than the classical technique in terms of recognition rate.
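As a hedged illustration of the idea (not the dissertation's actual code), here is a minimal sketch of wavelet-based feature extraction in plain Python: a Haar decomposition whose per-level detail-band energies could serve as inputs to a pattern classifier.

```python
import math

def haar_step(signal):
    """One level of the Haar wavelet transform: split the signal into
    (approximation, detail) coefficient lists of half the length."""
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal) - 1, 2)]
    return approx, detail

def wavelet_energy_features(signal, levels=3):
    """Energy of the detail band at each decomposition level,
    a compact descriptor of how a voice sample distributes
    energy across frequency scales."""
    features = []
    current = list(signal)
    for _ in range(levels):
        current, detail = haar_step(current)
        features.append(sum(d * d for d in detail))
    return features
```

A classifier would then be trained on such per-level energies; real systems typically use smoother wavelets (e.g., Daubechies) and more decomposition levels.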
Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.
Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
Horwitz-Martin, Rachelle (Rachelle Laura). "Vocal modulation features in the prediction of major depressive disorder severity." Thesis, Massachusetts Institute of Technology, 2014. http://hdl.handle.net/1721.1/93072.
Повний текст джерела"September 2014." Cataloged from PDF version of thesis.
Includes bibliographical references (pages 113-115).
This thesis develops a model of vocal modulations up to 50 Hz in sustained vowels as a basis for biomarkers of neurological disease, particularly Major Depressive Disorder (MDD). Two model components contribute to amplitude modulation (AM): AM from respiratory muscles, and AM from interaction between formants and frequency modulation in the fundamental frequency harmonics. Based on the modulation model, we test three methods to extract the envelope of the third formant, from which features are extracted using sustained vowels from the 2013 Audio/Visual Emotion Challenge. Using a Gaussian-Mixture-Model-based predictor, we evaluate the performance of each feature in predicting subjects' Beck MDD severity scores by the root mean square error (RMSE), mean absolute error (MAE), and Spearman correlation between the actual and predicted Beck scores. Our lowest MAE and RMSE values are 8.46 and 10.32, respectively (Spearman correlation=0.487, p<0.001), relative to the mean MAE of 10.05 and mean RMSE of 11.86.
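The three evaluation metrics named above are standard and easy to state precisely; a plain-Python sketch (illustrative, not the thesis code):

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def _ranks(values):
    """Average ranks (1-based), with ties sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = _ranks(x), _ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = math.sqrt(sum((a - mx) ** 2 for a in rx) * sum((b - my) ** 2 for b in ry))
    return num / den
```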
by Rachelle L. Horwitz.
S.M.
García, María Susana Avila. "Automatic tracking of 3D vocal tract features during speech production using MRI." Thesis, University of Southampton, 2006. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.437111.
Marchetto, Enrico. "Automatic Speaker Recognition and Characterization by means of Robust Vocal Source Features." Doctoral thesis, Università degli studi di Padova, 2011. http://hdl.handle.net/11577/3427390.
Automatic Speaker Recognition is a broad research field spanning many topics: signal processing, vocal and auditory physiology, statistical modelling tools, language studies, etc. Work on these techniques began about thirty years ago and has advanced greatly since; nevertheless, the field still poses open questions, and research groups worldwide continue working toward more reliable and better-performing recognition systems. This thesis documents a PhD project funded by the private company RT - Radio Trevisan Elettronica Industriale S.p.A., under the grant title "Automatic speaker recognition with applications to security and intelligence". Part of the work took place during a six-month visit to the Speech, Music and Hearing Department of KTH - Royal Institute of Technology, Stockholm. Speaker Recognition research develops technologies to automatically match a given human voice to a previously recorded version of it. Speaker Recognition is usually framed more precisely as Speaker Verification or Speaker Identification: identification retrieves the identity of a voice among a (possibly large) number of voices modelled by the system, while in verification, given a voice and an identity, the system is asked to verify the association between the two. Recognition systems also produce a score attesting to the reliability of their answer. Part I of the thesis reviews the state of the art in Speaker Recognition.
It describes the main components of a recognition prototype: audio feature extraction, statistical modelling, and performance evaluation. Over time the research community has developed many acoustic features, i.e., techniques for describing the speech signal numerically in a compact, deterministic way. In every recognition application, including speech and language recognition, feature extraction is the first step: it aims to drastically reduce the size of the input data without losing any significant information. Choosing and tuning the features best suited to a specific application is crucial for good recognition results; moreover, the definition of new features remains an active research field, since the scientific community considers existing features still far from exploiting all the information carried by the speech signal. Some features have become established over time thanks to their superior performance: Mel-Frequency Cepstral Coefficients and Linear Prediction Coefficients, both described in Part I. Statistical modelling is also introduced, explaining the structure of Gaussian Mixture Models and their training algorithm (Expectation-Maximization). Specific modelling techniques, such as the Universal Background Model, complete the description of the statistical tools used for recognition. Scoring, finally, is the stage at which the recognition system produces its results; it includes several normalization procedures that compensate, for instance, for modelling problems or for the differing acoustic conditions under which the audio data were recorded.
Part I then presents some audio databases commonly used in the literature as benchmarks for comparing the performance of recognition systems; in particular, TIMIT and the NIST Speaker Recognition Evaluation (SRE) 2004 are presented. These databases are suited to evaluating performance on telephone audio, the focus of this thesis; the topic is further discussed in Part II. A prototype recognition system was designed and implemented during the PhD project and is discussed in Part II. Its first chapter describes the proposed recognition application: Speaker Recognition technology applied to telephone lines, with security and intelligence in view. The application answers a specific need of the authorities when investigations involve wiretaps. In such cases the authorities must listen to large amounts of telephone data, most of which turns out to be useless for the investigation. The idea is to automatically identify and label the speakers present in the intercepted calls, making it possible to search the collection of recordings for a specific speaker. This could reduce wasted time, yielding economic benefits. Audio from telephone lines makes automatic recognition difficult, because it significantly degrades the signal and hence worsens performance. Some problems of telephone audio are generally recognized: reduced bandwidth, additive noise, and convolutive noise; the latter causes phase distortion, which alters the signal waveform. The second chapter of Part II describes the developed Speaker Recognition system in detail and discusses the design choices made.
The fundamental components of a recognition system were developed, with some improvements to contain the computational load. During development, the research purpose of the software was considered paramount: much effort went into obtaining a system with good performance that nevertheless remained easy to modify, even deeply. The need (and opportunity) to evaluate the prototype's performance imposed further development requirements, which were met by adopting an interface common to the various databases. Finally, all modules of the developed software can run on a computing cluster (a high-performance parallel computer); this feature of the prototype was crucial for a thorough performance evaluation of the software within a reasonable time. During the doctoral project, studies related, but not directly tied, to Speaker Recognition were also conducted; they are described in Part II as extensions of the prototype. First, a Voice Activity Detector suited to noisy conditions is presented. This component is particularly important as the first step of feature extraction: only the audio segments that actually contain speech must be selected and kept. In situations with significant background noise, simple energy-threshold approaches fail. The detector built here is based on advanced features, obtained through Wavelet Transforms and further processed by adaptive thresholding. A second application is a prototype for Speaker Diarization, i.e., the automatic labelling of audio recordings containing several speakers.
The procedure outputs a segmentation of the audio and a series of labels, one per segment: the system answers the question "who spoke when". The third and last side study consists of a Noise Reduction system developed on a dedicated DSP hardware platform. The reduction algorithm detects noise adaptively and attenuates it, trying to preserve only the speech signal; processing runs in real time while using only a very limited share of the DSP's computing resources. Part III of the thesis finally introduces novel audio features, which constitute its main innovative contribution. These features are derived from the glottal flow, so the first chapter of the part discusses the anatomy of the vocal tract and vocal folds. The working principle of phonation and the importance of vocal-fold physics are described. The glottal flow is an input to the vocal tract, which acts as a filter. An open-source software tool for vocal-tract inversion is described: it allows the glottal flow to be estimated from plain voice recordings. Some of the methods used to characterize the glottal flow numerically are then presented. The next chapter defines the new glottal features. Glottal flow estimates are not always reliable, so the first step of feature extraction detects and discards flows judged untrustworthy. A numerical procedure then groups and orders the flow estimates, preparing them for statistical modelling. The glottal features, applied to Speaker Recognition on the TIMIT and NIST SRE 2004 databases, are compared with the standard features.
The final chapter of Part III is devoted to a different piece of research, still related to glottal-flow characterization. A physical model of the vocal folds is presented, controlled by a set of numerical rules, capable of describing the dynamics of the folds themselves. The rules translate a specific setting of the glottal muscles into the mechanical parameters of the model, which lead to a precise glottal flow (obtained by computer simulation of the model). The so-called Inverse Problem is defined as follows: given a glottal flow, find a setting of the glottal muscles which, used to drive the physical model, allows the re-synthesis of a glottal signal as close as possible to the given one. The inverse problem entails several difficulties, such as the non-uniqueness of the inversion and the sensitivity to even small variations of the input flow. A control-optimization technique was developed and is described. The concluding chapter of the thesis summarizes the results obtained, presents a roadmap for further development of the proposed features, and lists the publications produced.
Almeida, Náthalee Cavalcanti de. "Sistema inteligente para diagnóstico de patologias na laringe utilizando máquinas de vetor de suporte." Universidade Federal do Rio Grande do Norte, 2010. http://repositorio.ufrn.br:8080/jspui/handle/123456789/15149.
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior
The human voice is an important communication tool, and any voice disorder can have profound implications for the social and professional life of an individual. Digital signal processing techniques have been used in the acoustic analysis of vocal disorders caused by laryngeal pathologies, owing to their simplicity and noninvasive nature. This work deals with the acoustic analysis of voice signals affected by laryngeal pathologies, specifically edema and nodules on the vocal folds. Its purpose is to develop a voice classification system to support pre-diagnosis of laryngeal pathologies, as well as the monitoring of pharmacological and post-surgical treatments. Linear Prediction Coefficients (LPC), Mel-Frequency Cepstral Coefficients (MFCC), and coefficients obtained through the Wavelet Packet Transform (WPT) are applied to extract relevant characteristics of the voice signal. The Support Vector Machine (SVM) is used for the classification task; it builds optimal hyperplanes that maximize the margin of separation between the classes involved. The generated hyperplane is determined by the support vectors, which are subsets of points of these classes. On the database used in this work, the results showed good performance, with a hit rate of 98.46% for the classification of normal versus pathological voices in general, and 98.75% for distinguishing between the pathologies themselves: edema and nodules.
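The margin-maximization idea behind the SVM can be sketched with a tiny linear, hinge-loss version trained by sub-gradient descent (an illustration only; the dissertation's system presumably used a full, possibly kernelized, SVM):

```python
def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Sub-gradient descent on the regularized hinge loss:
    minimize lam/2*||w||^2 + mean(max(0, 1 - y_i*(w.x_i + b))).
    Labels must be +1 / -1."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:           # point inside the margin: push it out
                w = [wj + lr * (yi * xj - lam * wj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:                        # outside the margin: only shrink w
                w = [wj * (1 - lr * lam) for wj in w]
    return w, b

def predict(w, b, x):
    """Classify by the side of the separating hyperplane."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```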
Lovett, Victoria Anne. "Voice Features of Sjogren's Syndrome: Examination of Relative Fundamental Frequency (RFF) During Connected Speech." BYU ScholarsArchive, 2014. https://scholarsarchive.byu.edu/etd/5749.
Fux, Thibaut. "Vers un système indiquant la distance d'un locuteur par transformation de sa voix." Thesis, Grenoble, 2012. http://www.theses.fr/2012GRENT120/document.
This thesis focuses on speaker voice transformation with the aim of indicating the speaker's distance: a spoken-to-whispered voice transformation to indicate a close distance, and a spoken-to-shouted voice transformation for a rather far distance. We first perform an in-depth analysis to determine the most relevant features of whispered voices and especially of shouted voices (a much harder problem). The main contribution of this part is to show the relevance of prosodic parameters in the perception of vocal effort in a shouted voice. We then propose descriptors to better characterize prosodic contours. For the actual transformation, we propose several new transformation rules which importantly control the quality of the transformed voice. The results showed very good quality of the transformed whispered and shouted voices for relatively simple linguistic structures (CVC, CVCV, etc.).
Lang, Anja [Verfasser]. "Histomorphometrical analysis of the fibrous components of the porcine vocal folds – Stratigraphical features and their relevance for models in phoniatry / Anja Lang." Hannover : Bibliothek der Tierärztlichen Hochschule Hannover, 2014. http://d-nb.info/1054387656/34.
Boyer, Stanislas. "Contribution de l'analyse du signal vocal à la détection de l'état de somnolence et du niveau de charge mentale." Thesis, Toulouse 3, 2016. http://www.theses.fr/2016TOU30075/document.
Operational requirements of aircraft pilots may cause drowsiness and inadequate mental load levels (i.e., too low or too high) during flights. Sleep debts and circadian disruptions linked to various factors (e.g., long working periods, irregular work schedules, etc.) require pilots to challenge their biological limits. Moreover, pilots' mental workload exhibits strong fluctuations during flights: higher during critical phases (i.e., takeoff and landing), it becomes very low during cruising phases. When the mental load becomes too high or, conversely, too low, performance decreases and flight errors may occur. Implementing methods that detect drowsiness and mental load levels in near real time is a major challenge for monitoring and controlling flight activity. The aim of this thesis is therefore to determine whether the human voice can serve to detect, on the one hand, drowsiness and, on the other hand, the mental load level of an individual. In a first study, the voices of participants were recorded during a reading task before and after a night of total sleep deprivation (TSD). Drowsiness variations linked to TSD were assessed using self-evaluative and electrophysiological measures (ElectroEncephaloGraphy [EEG] and Evoked Potentials [EPs]). Results showed significant variations after the TSD in many acoustic features related to: (a) the amplitude of the glottal pulses (amplitude modulation frequency), (b) the shape of the acoustic wave (Euclidean length of the signal and its associated features), and (c) the spectrum of the vowel signal (harmonic-to-noise ratio, second formant frequency, skewness, spectral center of gravity, energy differences, spectral tilt, and Mel-frequency cepstral coefficients). Most spectral features showed different sensitivity to sleep deprivation depending on the vowel type. Significant correlations were found between several acoustic features and several objective indicators (EEG and EPs) of drowsiness.
In a second study, voices were recorded during a word-list recall task. The difficulty of the task was manipulated by varying the number of words in each list (between one and seven, corresponding to seven mental load conditions). The evoked pupillary response, known to be a useful proxy of mental load, was recorded simultaneously with speech to attest to variations in mental load level during the experimental task. Results showed that classical features (fundamental frequency and its standard deviation, shimmer, number of periods, and harmonic-to-noise ratio) and original features (amplitude modulation frequency and short-term variation in digital amplitude length) were particularly sensitive to variations in mental load. Variations in these acoustic features were correlated with those of pupil size. The results suggest that the acoustic features of the human voice identified in these experiments could represent relevant indicators for detecting the drowsiness and mental load levels of an individual. These findings open up many research and application perspectives in the field of transport safety, particularly in the aeronautical sector.
Kahn, Juliette. "Parole de locuteur : performance et confiance en identification biométrique vocale." PhD thesis, Université d'Avignon, 2011. http://tel.archives-ouvertes.fr/tel-00995071.
Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.
Moura, Giselle Borges de. "Vocalização de suínos em grupo sob diferentes condições térmicas." Universidade de São Paulo, 2013. http://www.teses.usp.br/teses/disponiveis/11/11131/tde-26042013-094034/.
Quantifying and qualifying animal well-being on livestock farms is still a challenge. Assessing well-being requires analyzing, above all, the absence of strongly negative feelings, such as pain, and the presence of positive ones, such as pleasure. The main objective was to quantify vocalization in groups of pigs under different thermal conditions. The specific objectives were to assess the existence of a vocal pattern of communication between housed groups of pigs, and to obtain the acoustic characteristics of the sound spectrum of the vocalizations related to the different microclimate conditions. The trial was carried out in a controlled-environment experimental unit for pigs at the University of Illinois (USA). Four groups of six pigs were used in the data collection. Dataloggers were installed to record environmental variables (T, °C and RH, %), which were used to calculate two thermal comfort indices: enthalpy and THI. Cardioid microphones were installed at the geometric center of each pen to record the pigs' vocalizations continuously. The microphones were connected to an amplifier, in turn connected to a DVR card installed in a computer to record audio and video. To build the pig-vocalization database, the Goldwave® software was used to separate and filter the files, excluding background noise. The sounds were then analyzed with the software Sound Analysis Pro 2011, and their acoustic characteristics were extracted. Amplitude (dB), pitch (Hz), mean frequency (Hz), peak frequency (Hz), and entropy were used to characterize the sound spectrum of the vocalizations of the groups of piglets under the different thermal conditions. A randomized block design was used, composed of two treatments and three repetitions per week, executed over two weeks. The data were sampled to analyze the behavior of the vocalization database in relation to the applied treatments.
The data were submitted to analysis of variance using proc GLM in SAS. Among the studied acoustic parameters, amplitude (dB), pitch, and entropy showed significant differences between treatments (comfort and heat-stress conditions) by Tukey's test (p<0.05). The analysis of variance also showed differences in waveform for each thermal condition across the periods of the day. Quantifying the vocalization of swine in groups under different thermal conditions is thus possible by extracting acoustic characteristics from the sound samples. The extracted sound spectrum indicated possible alterations in piglet behavior under the different thermal conditions during the periods of the day. However, the pattern-recognition stage still needs a larger and more consistent database for recognizing the spectrum of each thermal condition, whether through image analysis or through the extraction of acoustic characteristics. Among the analyzed acoustic characteristics, the amplitude (dB), pitch (Hz), and entropy of the vocalizations of groups of swine were significant in expressing the condition of the animals under different thermal conditions.
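Of the features listed above, entropy is the simplest to state precisely. A hedged sketch (Shannon entropy of a normalized power spectrum; Sound Analysis Pro itself reports a related measure, Wiener entropy):

```python
import math

def spectral_entropy(power_spectrum):
    """Shannon entropy (in bits) of a power spectrum normalized to a
    probability distribution. Low values indicate tonal, pure sounds;
    high values indicate broadband, noise-like sounds."""
    total = sum(power_spectrum)
    probs = [p / total for p in power_spectrum if p > 0]
    return -sum(p * math.log(p, 2) for p in probs)
```

A flat spectrum of N bins yields the maximum entropy log2(N); a single-bin spectrum yields 0.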
Steinholtz, Tim. "Skip connection in a MLP network for Parkinson’s classification." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-303130.
In this thesis, two architecture designs for an artificial multilayer neural network were implemented: one following the convention of an ordinary MLP network, and a new architecture introducing DenseNet-inspired skip connections into MLP networks. The models were used and evaluated for classification, with the goal of distinguishing subjects as healthy or diagnosed with Parkinson's disease based on voice features. The models were trained on a publicly available dataset for Parkinson's classification and evaluated on a held-out subset of that data, as well as on two datasets recorded in a different audio environment than the training data. The thesis sought answers to two questions: how insensitive models for Parkinson's classification are to the audio recording environment, and how the proposed skip connections in an MLP model can help improve performance and generalization capacity. The results show that the audio environment affects accuracy, but the thesis concludes that with more time this could probably be overcome, enabling good accuracy in new audio environments. As to whether the skip connections improve accuracy and generalization, the thesis cannot draw any broad conclusions given the data used: the models generally performed best with shallow networks, whereas it is in deeper networks that skip connections are argued to improve these properties. That said, looking only at the results on the data from a different recording environment, the skip-connection architecture performed better in two of the three tests conducted.
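A minimal sketch of the DenseNet-inspired idea described above (hypothetical, not the thesis code): each hidden layer receives the concatenation of the original input and all earlier activations.

```python
def dense(inputs, weights, biases, relu=True):
    """One fully connected layer: out[j] = f(b[j] + sum_i in[i]*W[i][j])."""
    out = []
    for j in range(len(biases)):
        z = biases[j] + sum(inputs[i] * weights[i][j] for i in range(len(inputs)))
        out.append(max(0.0, z) if relu else z)
    return out

def forward_with_skips(x, layers):
    """DenseNet-style MLP forward pass: each hidden layer sees the
    concatenation of the raw input and all previous activations;
    the final layer is linear."""
    collected = list(x)
    for weights, biases in layers[:-1]:
        h = dense(collected, weights, biases)
        collected = collected + h   # skip connection: keep earlier features
    w_out, b_out = layers[-1]
    return dense(collected, w_out, b_out, relu=False)
```

Each layer's weight matrix must therefore have as many rows as the running concatenation is wide, which is how DenseNet-style growth shows up in an MLP.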
Regnier, Lise. "Localization, Characterization and Recognition of Singing Voices." PhD thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00687475.
Повний текст джерела"Use of vocal source features in speaker segmentation." 2006. http://library.cuhk.edu.hk/record=b5892857.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2006.
Includes bibliographical references (leaves 77-82).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Speaker recognition --- p.1
Chapter 1.2 --- State of the art of speaker recognition techniques --- p.2
Chapter 1.3 --- Motivations --- p.5
Chapter 1.4 --- Thesis outline --- p.6
Chapter 2 --- Acoustic Features --- p.8
Chapter 2.1 --- Speech production --- p.8
Chapter 2.1.1 --- Physiology of speech production --- p.8
Chapter 2.1.2 --- Source-filter model --- p.11
Chapter 2.2 --- Vocal tract and vocal source related acoustic features --- p.14
Chapter 2.3 --- Linear predictive analysis of speech --- p.15
Chapter 2.4 --- Features for speaker recognition --- p.16
Chapter 2.4.1 --- Vocal tract related features --- p.17
Chapter 2.4.2 --- Vocal source related features --- p.19
Chapter 2.5 --- Wavelet octave coefficients of residues (WOCOR) --- p.20
Chapter 3 --- Statistical approaches to speaker recognition --- p.24
Chapter 3.1 --- Statistical modeling --- p.24
Chapter 3.1.1 --- Classification and modeling --- p.24
Chapter 3.1.2 --- Parametric vs non-parametric --- p.25
Chapter 3.1.3 --- Gaussian mixture model (GMM) --- p.25
Chapter 3.1.4 --- Model estimation --- p.27
Chapter 3.2 --- Classification --- p.28
Chapter 3.2.1 --- Multi-class classification for speaker identification --- p.28
Chapter 3.2.2 --- Two-speaker recognition --- p.29
Chapter 3.2.3 --- Model selection by statistical model --- p.30
Chapter 3.2.4 --- Performance evaluation metric --- p.31
Chapter 4 --- Content dependency study of WOCOR and MFCC --- p.32
Chapter 4.1 --- Database: CU2C --- p.32
Chapter 4.2 --- Methods and procedures --- p.33
Chapter 4.3 --- Experimental results --- p.35
Chapter 4.4 --- Discussion --- p.36
Chapter 4.5 --- Detailed analysis --- p.39
Summary --- p.41
Chapter 5 --- Speaker Segmentation --- p.43
Chapter 5.1 --- Feature extraction --- p.43
Chapter 5.2 --- Statistical methods for segmentation and clustering --- p.44
Chapter 5.2.1 --- Segmentation by spectral difference --- p.44
Chapter 5.2.2 --- Segmentation by Bayesian information criterion (BIC) --- p.47
Chapter 5.2.3 --- Segment clustering by BIC --- p.49
Chapter 5.3 --- Baseline system --- p.50
Chapter 5.3.1 --- Algorithm --- p.50
Chapter 5.3.2 --- Speech database --- p.52
Chapter 5.3.3 --- Performance metric --- p.53
Chapter 5.3.4 --- Results --- p.58
Summary --- p.60
Chapter 6 --- Application of vocal source features in speaker segmentation --- p.61
Chapter 6.1 --- Discrimination power of WOCOR against MFCC --- p.61
Chapter 6.1.1 --- Experimental set-up --- p.62
Chapter 6.1.2 --- Results --- p.63
Chapter 6.2 --- Speaker segmentation using vocal source features --- p.67
Chapter 6.2.1 --- The construction of new proposed system --- p.67
Summary --- p.72
Chapter 7 --- Conclusions --- p.74
Reference --- p.77
"Robust speaker recognition using both vocal source and vocal tract features estimated from noisy input utterances." 2007. http://library.cuhk.edu.hk/record=b5893317.
Full text of the source
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 106-115).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Introduction to Speech and Speaker Recognition --- p.1
Chapter 1.2 --- Difficulties and Challenges of Speaker Authentication --- p.6
Chapter 1.3 --- Objectives and Thesis Outline --- p.7
Chapter 2 --- Speaker Recognition System --- p.10
Chapter 2.1 --- Baseline Speaker Recognition System Overview --- p.10
Chapter 2.1.1 --- Feature Extraction --- p.12
Chapter 2.1.2 --- Pattern Generation and Classification --- p.24
Chapter 2.2 --- Performance Evaluation Metric for Different Speaker Recognition Tasks --- p.30
Chapter 2.3 --- Robustness of Speaker Recognition System --- p.30
Chapter 2.3.1 --- Speech Corpus: CU2C --- p.30
Chapter 2.3.2 --- Noise Database: NOISEX-92 --- p.34
Chapter 2.3.3 --- Mismatched Training and Testing Conditions --- p.35
Chapter 2.4 --- Summary --- p.37
Chapter 3 --- Speaker Recognition System using both Vocal Tract and Vocal Source Features --- p.38
Chapter 3.1 --- Speech Production Mechanism --- p.39
Chapter 3.1.1 --- Speech Production: An Overview --- p.39
Chapter 3.1.2 --- Acoustic Properties of Human Speech --- p.40
Chapter 3.2 --- Source-filter Model and Linear Predictive Analysis --- p.44
Chapter 3.2.1 --- Source-filter Speech Model --- p.44
Chapter 3.2.2 --- Linear Predictive Analysis for Speech Signal --- p.46
Chapter 3.3 --- Vocal Tract Features --- p.51
Chapter 3.4 --- Vocal Source Features --- p.52
Chapter 3.4.1 --- Source Related Features: An Overview --- p.52
Chapter 3.4.2 --- Source Related Features: Technical Viewpoints --- p.54
Chapter 3.5 --- Effects of Noises on Speech Properties --- p.55
Chapter 3.6 --- Summary --- p.61
Chapter 4 --- Estimation of Robust Acoustic Features for Speaker Discrimination --- p.62
Chapter 4.1 --- Robust Speech Techniques --- p.63
Chapter 4.1.1 --- Noise Resilience --- p.64
Chapter 4.1.2 --- Speech Enhancement --- p.64
Chapter 4.2 --- Spectral Subtractive-Type Preprocessing --- p.65
Chapter 4.2.1 --- Noise Estimation --- p.66
Chapter 4.2.2 --- Spectral Subtraction Algorithm --- p.66
Chapter 4.3 --- LP Analysis of Noisy Speech --- p.67
Chapter 4.3.1 --- LP Inverse Filtering: Whitening Process --- p.68
Chapter 4.3.2 --- Magnitude Response of All-pole Filter in Noisy Condition --- p.70
Chapter 4.3.3 --- Noise Spectral Reshaping --- p.72
Chapter 4.4 --- Distinctive Vocal Tract and Vocal Source Feature Extraction --- p.73
Chapter 4.4.1 --- Vocal Tract Feature Extraction --- p.73
Chapter 4.4.2 --- Source Feature Generation Procedure --- p.75
Chapter 4.4.3 --- Subband-specific Parameterization Method --- p.79
Chapter 4.5 --- Summary --- p.87
Chapter 5 --- Speaker Recognition Tasks & Performance Evaluation --- p.88
Chapter 5.1 --- Speaker Recognition Experimental Setup --- p.89
Chapter 5.1.1 --- Task Description --- p.89
Chapter 5.1.2 --- Baseline Experiments --- p.90
Chapter 5.1.3 --- Identification and Verification Results --- p.91
Chapter 5.2 --- Speaker Recognition using Source-tract Features --- p.92
Chapter 5.2.1 --- Source Feature Selection --- p.92
Chapter 5.2.2 --- Source-tract Feature Fusion --- p.94
Chapter 5.2.3 --- Identification and Verification Results --- p.95
Chapter 5.3 --- Performance Analysis --- p.98
Chapter 6 --- Conclusion --- p.102
Chapter 6.1 --- Discussion and Conclusion --- p.102
Chapter 6.2 --- Suggestion of Future Work --- p.104
"Exploitation of phase and vocal excitation modulation features for robust speaker recognition." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075192.
Full text of the source
Speaker recognition (SR) refers to the process of automatically determining or verifying the identity of a person based on his or her voice characteristics. In practical applications, a voice can be used as one of the modalities in a multimodal biometric system, or be the sole medium for identity authentication. The general area of speaker recognition encompasses two fundamental tasks: speaker identification and speaker verification.
Wang, Ning.
Adviser: Pak-Chung Ching.
Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 177-193).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
Wei, Ciou, and 邱薇. "Detecting Emotional Responses of the Customer Service Staff and Exploring General-critical-vocal Features." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/gvu5be.
Full text of the source
Yuan Ze University
Department of Industrial Engineering and Management
107
As consumer consciousness rises, an improper dialogue tone from customer service staff directly affects customer satisfaction. Developing an instant emotion discrimination system to detect improper emotional responses of service personnel would therefore help supervisors provide timely assistance and hence improve service quality. In addition, many past studies have used different corpora for voice emotion recognition, but the important voice features may vary with the corpus or its language, and so far no research has identified a single best combination of voice features (Anagnostopoulos et al., 2015). This research is therefore divided into two stages. The first stage uses customer service voice data to study voice emotion recognition methods. The second stage uses three corpora in three languages to recognize and classify voice emotions, and attempts to identify the critical features through several selection methods. The first-stage method comprised six steps: (1) labelling the dialogue data into good, moderate and bad categories; (2) preprocessing the dialogue data (removing silent parts, reducing noise, and keeping only the voice of the customer service staff) and visualizing the original sound files with the t-SNE model; (3) extracting 384 voice features with OpenSMILE; (4) selecting features by principal component analysis, Fisher's criterion, analysis of variance (ANOVA) and Random Forest; (5) building recognition models with One-Class SVM, SVM, Random Forest, ANN and CNN, and comparing the effectiveness of the feature selection methods and the modelling performance; and (6) verifying the data tags.
The second-stage method comprised five steps: (1) screening the corpora of the three languages and visualizing the original sound files with the t-SNE model; (2) extracting 384 voice features with OpenSMILE; (3) selecting the critical voice features with ANOVA over different feature combinations; (4) building SVM models with different combinations of training and testing data; and (5) comparing the classification results from each combination of critical voice features. The first-stage results showed that using Fisher's criterion, ANOVA and Random Forest for feature selection not only effectively increases the accuracy of model prediction but, more importantly, helps to reveal the important relationships between emotion and voice features; these results help detect inappropriate dialogue tone in customer service staff and hence improve service quality. The second-stage results showed that when all corpora are used for feature selection, taking both emotion and language into account, the selected features improve the accuracy on the German corpus (from 81.62% to 85.12%); the study also confirmed the difficulty of recognizing emotions across corpora in different languages. Most past voice emotion recognition research has aimed at the best accuracy on a single corpus. This study argues that recognizing multiple corpora at the same time is the biggest challenge in today's voice emotion recognition research and the trend for future work, since methods that handle multiple voice emotion datasets can be applied practically in daily life.
Keywords: voice emotion recognition, analysis of variance, feature selection of voice, corpora of different languages
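The ANOVA feature selection used in both stages above — ranking each of the 384 extracted features by its one-way F-statistic and keeping the top-scoring ones for the classifier — can be sketched as follows. This is an illustrative sketch: the data is synthetic and merely stands in for an OpenSMILE feature matrix, and the dimensions and number of informative features are invented for the example.

```python
import numpy as np

def anova_f_scores(X, y):
    """One-way ANOVA F-statistic per feature column:
    between-class variance divided by within-class variance."""
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    ss_between = np.zeros(X.shape[1])
    ss_within = np.zeros(X.shape[1])
    for c in classes:
        Xc = X[y == c]
        ss_between += len(Xc) * (Xc.mean(axis=0) - overall_mean) ** 2
        ss_within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
    df_between = len(classes) - 1
    df_within = len(y) - len(classes)
    return (ss_between / df_between) / (ss_within / df_within)

# Stand-in for a 384-dimensional OpenSMILE feature matrix, 3 emotion classes.
rng = np.random.default_rng(42)
X = rng.normal(size=(150, 384))
y = rng.integers(0, 3, size=150)
X[:, :10] += y[:, None] * 1.5          # make the first 10 features informative

scores = anova_f_scores(X, y)
top = np.argsort(scores)[::-1][:10]     # indices of the 10 highest F-scores
print(sorted(top))
```

The top-ranked feature indices would then feed an SVM or other classifier, as in the study's second stage; features whose class means barely differ receive F-scores near 1 and are dropped.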
Zita, Aleš. "Automatic analysis of videokymographic images by means of higher-level features." Master's thesis, 2013. http://www.nusl.cz/ntk/nusl-324574.
Full text of the source
Саковська, Антоніна Андріївна, and Antonina Andriivna Sakovska. "Формування стилю бельканто у майбутнього співака" [Formation of the bel canto style in the future singer]. Master's thesis, 2021. http://repository.sspu.edu.ua/handle/123456789/11785.
Full text of the source
The master's thesis is devoted to an important theme in vocal pedagogy. The analysis of the theoretical foundations of forming the bel canto style reveals the characteristic features of bel canto, based on vocal treatises of the XVIII to the first half of the XIX century: the essence and specific features of bel canto singing, its physiological and acoustic properties, and the registers of the singing voice. The methodological foundations for forming the bel canto style are represented by the definition of the structure and development of melody in forming singing skills, and by the pedagogical conditions and methodological recommendations for forming bel canto singing skills in the future singer.