Dissertations / Theses on the topic 'Vocal recognition'

Consult the top 50 dissertations / theses for your research on the topic 'Vocal recognition.'


1

Mathukumalli, Sravya. "Vocal combo android application." Kansas State University, 2016. http://hdl.handle.net/2097/32505.

Abstract:
Master of Science
Department of Computing and Information Sciences
Mitchell L. Neilsen
Nowadays, people from various backgrounds need different information on demand and rely on the web as their source of that information. But staying connected to the internet through a computer system at all times is not possible. Android, being open source, has already made its mark in mobile application development. The large smartphone user base and the ease of developing applications in Android are advantages for users as well as Android developers. Vocal Combo is an Android application that provides the required functionality on an Android-supported smartphone or tablet, giving users the flexibility of accessing information at their fingertips. The application is built using the Android SDK, which makes it easy to deploy on any Android-powered device. Vocal Combo is a combination of voice-based applications: it includes a text-to-voice converter and a voice-to-text converter. The application helps the user learn the pronunciation of various words, and also lets the user check his/her own pronunciation skills. It further provides a meaning-check function, with which the user can check the meaning of words typed in or spoken out. At any point, the user can review the history of the words whose meaning or pronunciation has been checked. The application also provides guidance on how to use it.
2

Benkrid, A. "Real time TLM vocal tract modelling." Thesis, University of Nottingham, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.352958.

3

古山貴文 (Furuyama, Takafumi). "Vocal recognition in primate : comparison between Japanese macaques and humans." Thesis, 2017. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13045021/?lang=0.

4

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Abstract:
The speech signal is basically meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person from his/her speech, among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers, yet in current speaker recognition systems it is rarely used, since it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. The thesis illustrates, using a text-independent speaker identification system, the reasonable performance of cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the other half (the unvoiced frames), and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). In order to use pitch information for speaker recognition, these problems must be overcome: we need a method that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch doubling or halving errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
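As an illustration of the MACV idea described in this abstract, the sketch below computes the peak of the normalised autocorrelation within a plausible pitch-lag range for a single frame, a quantity that is defined for voiced and unvoiced frames alike. The sampling rate, pitch range, and single-peak variant are assumptions for illustration, not values taken from the thesis.

```python
import numpy as np

def macv_features(frame, fs=8000, fmin=60.0, fmax=400.0):
    """Sketch of maximum autocorrelation value (MACV) features for one frame.

    Rather than an explicit pitch estimate, use the height of the
    autocorrelation peak within the plausible pitch-lag range; this is
    well defined even for unvoiced frames. fs, fmin, fmax are assumed.
    """
    x = frame - np.mean(frame)
    ac = np.correlate(x, x, mode="full")[len(x) - 1:]   # one-sided autocorrelation
    ac = ac / (ac[0] + 1e-12)                           # normalise so ac[0] == 1
    lo, hi = int(fs / fmax), int(fs / fmin)             # lag range for 60-400 Hz
    peak = lo + np.argmax(ac[lo:hi])
    return np.array([ac[peak], peak / fs])              # peak value and its lag (s)
```

In a system like the one described, these values would simply be appended to each frame's cepstral feature vector.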
5

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Abstract:
The speech signal is basically meant to carry information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person from his/her speech, among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented with pitch and voicing information. Pitch provides very important and useful information for identifying speakers, yet in current speaker recognition systems it is rarely used, since it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. The thesis illustrates, using a text-independent speaker identification system, the reasonable performance of cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, speech is not always periodic; only about half of the frames are voiced, so pitch cannot be estimated for the other half (the unvoiced frames), and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, pitch estimation methods are not very reliable: they classify some frames as unvoiced when they are really voiced, and they make pitch estimation errors (such as doubling or halving the pitch value, depending on the method). In order to use pitch information for speaker recognition, these problems must be overcome: we need a method that does not use the pitch value directly as a feature and that works reliably for voiced as well as unvoiced frames. We propose a method that uses the autocorrelation function of the given frame to derive pitch-related features, which we call maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch doubling or halving errors. Using these MACV features along with the cepstral features, speaker identification performance is improved by 45%.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
6

Keenan, Sumir. "Identity information in bonobo vocal communication : from sender to receiver." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSES038/document.

Abstract:
Identity information is vital for highly social species, as it facilitates individual recognition and allows differentiation between social partners in many contexts, such as dominance hierarchies, territorial defence, mating, parent-offspring identification, and group cohesion and coordination. In many species, vocalisations can be the most effective communication channel through complex environments and over long distances, and they are encoded with the stable features of an individual's voice. Associations between these individual vocal signatures and accumulated social knowledge about conspecifics can greatly increase an animal's fitness, as they facilitate adaptively constructive social decisions. This thesis investigates the encoding and decoding of identity information in the vocal communication system of the bonobo, Pan paniscus. We first investigated the stability of vocal signatures across the five most common call types in the bonobo vocal repertoire. Results showed that while all call types have the potential to code identity information, loud calls used during times of high arousal and for distance communication have the strongest individual vocal signatures. Following the first study, we investigated whether social familiarity and relatedness affect the acoustic features that code individual information in the bark call type. Overall, we found strong evidence for vocal convergence; specifically, individuals who are related and familiar, independently from one another, are more vocally similar to one another than unrelated and unfamiliar individuals. In a final study we tested whether bonobos are capable of using the encoded identity information to recognise past group members that they no longer live with. Through a series of playback experiments we demonstrated that bonobos are capable of recognising familiar individuals from vocalisations alone, even after years of separation. Collectively, the results of this thesis show that the encoding and decoding of identity information in bonobo vocalisations is a dynamic system, subject to modification through social processes but robust enough to allow for individual recognition over time. In conclusion, these studies contribute to a better understanding of the vocal communication system of a non-human primate species with a unique and complex social network.
7

Ma, Zongqiang. "Spontaneous speech recognition using statistical dynamic models for the vocal-tract-resonance dynamics." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0020/NQ53993.pdf.

8

Yuditskaya, Sophia. "Automatic vocal recognition of a child's perceived emotional state within the Speechome corpus." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62086.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 137-149).
With over 230,000 hours of audio/video recordings of a child growing up in the home setting from birth to the age of three, the Human Speechome Project has pioneered a comprehensive, ecologically valid observational dataset that introduces far-reaching new possibilities for the study of child development. By offering in vivo observation of a child's daily life experience at ultra-dense, longitudinal time scales, the Speechome corpus holds great potential for discovering developmental insights that have thus far eluded observation. The work of this thesis aspires to enable the use of the Speechome corpus for empirical study of emotional factors in early child development. To fully harness the benefits of Speechome for this purpose, an automated mechanism must be created to perceive the child's emotional state within this medium. Due to the latent nature of emotion, we sought objective, directly measurable correlates of the child's perceived emotional state within the Speechome corpus, focusing exclusively on acoustic features of the child's vocalizations and surrounding caretaker speech. Using Partial Least Squares regression, we applied these features to build a model that simulates human perceptual heuristics for determining a child's emotional state. We evaluated the perceptual accuracy of models built across child-only, adult-only, and combined feature sets within the overall sampled dataset, as well as controlling for social situations, vocalization behaviors (e.g. crying, laughing, babble), individual caretakers, and developmental age between 9 and 24 months. Child and combined models consistently demonstrated high perceptual accuracy, with overall adjusted R-squared values of 0.54 and 0.58, respectively, and an average of 0.59 and 0.67 per month. Comparative analysis across longitudinal and socio-behavioral contexts yielded several notable developmental and dyadic insights. In the process, we have developed a data mining and analysis methodology for modeling perceived child emotion and quantifying caretaker intersubjectivity that we hope to extend to future datasets across multiple children, as new deployments of the Speechome recording technology are established. Such large-scale comparative studies promise an unprecedented view into the nature of emotional processes in early childhood and potentially enlightening discoveries about autism and other developmental disorders.
by Sophia Yuditskaya.
S.M.
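A minimal sketch of the Partial Least Squares modelling step mentioned in this abstract, using scikit-learn; the feature matrix, rating scale, and number of latent components are hypothetical stand-ins for the thesis's acoustic features and human emotion ratings.

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# Hypothetical data: rows are vocalisation segments, columns are acoustic
# features (pitch, energy, duration statistics, ...); y holds continuous
# human ratings of the child's perceived emotional state.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))
y = rng.normal(size=500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
pls = PLSRegression(n_components=10)     # latent components; tuned in practice
pls.fit(X_tr, y_tr)
print("held-out R^2:", r2_score(y_te, pls.predict(X_te)))
```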
9

Schall, Sonja. "The face in your voice–how audiovisual learning benefits vocal communication." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2014. http://dx.doi.org/10.18452/17039.

Abstract:
Face and voice of a person are strongly associated with each other and usually perceived as a single entity. Despite the natural co-occurrence of faces and voices, brain research has traditionally approached their perception from a unisensory perspective: research into face perception has focused exclusively on the visual system, while research into voice perception has exclusively probed the auditory system. In this thesis, I suggest that the brain has adapted to the multisensory nature of faces and voices and that this adaptation is evident even when one input stream is missing, that is, when input is actually unisensory. Specifically, the current work investigates how the brain exploits previously learned voice-face associations to optimize the auditory processing of voices and vocal speech. Three empirical studies providing spatiotemporal brain data via functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG) constitute this thesis. All data were acquired while participants listened to auditory-only speech samples of previously familiarized speakers (with or without seeing the speakers' faces). Three key findings demonstrate that previously learned visual speaker information supports the auditory analysis of vocal sounds: (i) face-sensitive areas were part of the sensory network activated by voices, (ii) the auditory analysis of voices was temporally facilitated by learned facial associations, and (iii) multisensory interactions between face- and voice/speech-sensitive regions were increased. The current work challenges traditional unisensory views on vocal perception and suggests instead that voice and vocal speech perception profit from a multisensory neural processing scheme.
10

Garau, Giulia. "Speaker normalisation for large vocabulary multiparty conversational speech recognition." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3983.

Abstract:
One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording devices, noises etc.) and to the variability of speech across different speakers (i.e. due to different accents, coarticulation of phonemes and different vocal tract characteristics). Vocal tract length normalisation (VTLN) aims at normalising the acoustic signal, making it independent from the vocal tract length. This is done by a speaker specific warping of the frequency axis parameterised through a warping factor. In this thesis the application of VTLN to multiparty conversational speech was investigated focusing on the meeting domain. This is a challenging task showing a great variability of the speech acoustics both across different speakers and across time for a given speaker. VTL, the distance between the lips and the glottis, varies over time. We observed that the warping factors estimated using Maximum Likelihood seem to be context dependent: appearing to be influenced by the current conversational partner and being correlated with the behaviour of formant positions and the pitch. This is because VTL also influences the frequency of vibration of the vocal cords and thus the pitch. In this thesis we also investigated pitch-adaptive acoustic features with the goal of further improving the speaker normalisation provided by VTLN. We explored the use of acoustic features obtained using a pitch-adaptive analysis in combination with conventional features such as Mel frequency cepstral coefficients. These spectral representations were combined both at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA), and at the system level using ROVER. We evaluated this approach on a challenging large vocabulary speech recognition task: multiparty meeting transcription. We found that VTLN benefits the most from pitch-adaptive features. Our experiments also suggested that combining conventional and pitch-adaptive acoustic features using HLDA results in a consistent, significant decrease in the word error rate across all the tasks. Combining at the system level using ROVER resulted in a further significant improvement. Further experiments compared the use of pitch adaptive spectral representation with the adoption of a smoothed spectrogram for the extraction of cepstral coefficients. It was found that pitch adaptive spectral analysis, providing a representation which is less affected by pitch artefacts (especially for high pitched speakers), delivers features with an improved speaker independence. Furthermore this has also shown to be advantageous when HLDA is applied. The combination of a pitch adaptive spectral representation and VTLN based speaker normalisation in the context of LVCSR for multiparty conversational speech led to more speaker independent acoustic models improving the overall recognition performances.
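The speaker-specific frequency warping at the heart of VTLN is commonly implemented as a piecewise-linear function of the frequency axis; a sketch under that assumption follows (the 0.85 break point and the 0.88-1.12 warping-factor grid are conventional choices, not values reported in the thesis).

```python
import numpy as np

def vtln_warp(freqs, alpha, f_nyquist, break_frac=0.85):
    """Piecewise-linear VTLN warp of a frequency axis (a common variant).

    freqs: frequencies in Hz; alpha: speaker warping factor. The axis is
    scaled by alpha up to a break point, then bent so that the Nyquist
    frequency maps onto itself.
    """
    fb = break_frac * min(f_nyquist, f_nyquist / alpha)
    return np.where(
        freqs <= fb,
        alpha * freqs,
        alpha * fb + (f_nyquist - alpha * fb) / (f_nyquist - fb) * (freqs - fb),
    )

# alpha is typically chosen per speaker by maximum likelihood over a grid:
# candidates = np.arange(0.88, 1.13, 0.02)
```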
11

Bhullar, Naureen. "Effects of Facial and Vocal Emotion on Word Recognition in 11-to-13-month-old infants." Diss., Virginia Tech, 2007. http://hdl.handle.net/10919/27502.

Abstract:
The speech commonly addressed to infants (infant-directed speech or IDS) is believed to have multiple functions, including communication of emotion and highlighting linguistic aspects of speech. However, these two functions are most often studied separately, so that the influence of emotional prosody (the changes in intonation and vocal quality that relate to emotion) on linguistic processing in infants has rarely been addressed. Given that language learning during infancy occurs in the context of natural infant-caretaker exchanges that most certainly include emotion communication and co-regulation, it is important to integrate the concepts of emotional communication and linguistic communication in studying language learning. This study examined the influence of both positive (happy) and negative (sad) face+voice contexts on word recognition in 11-to-13-month-old infants. It was hypothesized that infants would learn and subsequently recognize words when they were delivered in a happy context, but would experience more difficulty in learning and/or recognition of the same words when delivered in a sad context. The general pattern of results confirmed these predictions: after habituating to sentences containing a specific target nonsense word, infants in the Happy Condition recovered their attention to the same sentences with a novel target word. In contrast, infants in the Sad Condition showed no significant recovery to a change in target words. These results contribute to our understanding of how emotional tone can facilitate and/or attenuate attention in older infants as they engage in language learning with their caretakers.
Ph. D.
12

Bonzi, Francesco. "Lyrics Instrumentalness: An Automatic System for Vocal and Instrumental Recognition in Polyphonic Music with Deep Learning." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2021.

Abstract:
Human voice recognition is a crucial task in music information retrieval. In this master thesis we developed an innovative AI system, called Instrumentalness, to address instrumental song tagging. An extended pipeline was proposed to fit the Instrumentalness requirements, with respect to the well-known tasks of singing voice detection and singing voice segmentation. The available datasets were surveyed in depth and two different approaches were tried: the first involves strongly labeled datasets and tested different neural architectures, while the second used an attention mechanism to address a weakly labeled dataset, experimenting with different loss functions. Transfer learning was used to take advantage of the most recent architectures in the music information retrieval field while keeping the model efficient and effective. This work demonstrates that the quality of data is as important as its quantity. Moreover, the architectures addressing strongly labeled datasets achieved the best performance, but it is remarkable that the attention mechanism used for the weakly labeled dataset appears effective even though the dataset was imbalanced and small.
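The attention mechanism for weakly labelled clips can be sketched as attention pooling: frame-level predictions are weighted and summed into a single clip-level probability, so training needs only a clip-level tag. This PyTorch sketch is a generic illustration; the dimensions and layer choices are assumptions, not the thesis's architecture.

```python
import torch
import torch.nn as nn

class AttentionPooling(nn.Module):
    """Pool frame-level embeddings into one clip-level tag probability."""
    def __init__(self, embed_dim=128):
        super().__init__()
        self.attn = nn.Linear(embed_dim, 1)   # one attention score per frame
        self.clf = nn.Linear(embed_dim, 1)    # frame-level vocal/instrumental logit

    def forward(self, frames):                # frames: (batch, time, embed_dim)
        w = torch.softmax(self.attn(frames), dim=1)        # weights sum to 1 over time
        clip = (w * torch.sigmoid(self.clf(frames))).sum(dim=1)
        return clip.squeeze(-1)               # clip-level probability in (0, 1)

model = AttentionPooling()
x = torch.randn(4, 200, 128)   # e.g. embeddings from a pretrained (transfer) backbone
p = model(x)                   # train against weak labels with nn.BCELoss
```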
13

Panchapagesan, Sankaran. "Frequency warping by linear transformation, and vocal tract inversion for speaker normalization in automatic speech recognition." Diss., Restricted to subscribing institutions, 2008. http://proquest.umi.com/pqdweb?did=1610480121&sid=1&Fmt=2&clientId=1564&RQT=309&VName=PQD.

14

Dowell, Sacha. "Mother-pup recognition behaviour, pup vocal signatures and allosuckling in the New Zealand fur seal, Arctocephalus forsteri." Thesis, University of Canterbury. Biological Sciences, 2005. http://hdl.handle.net/10092/1267.

Abstract:
A recognition system is required between pinniped mothers and pups. For otariids this is especially important since females frequently leave their pups for foraging and must reunite on return. Pups must deal with these fasting periods during maternal absence and consequently may attempt to obtain allomaternal care from unrelated females. This research on the New Zealand fur seal (Arctocephalus forsteri) at Ohau Point, Kaikoura, New Zealand, quantified mother-pup recognition behaviour during reunions, individuality of pup calls used by mothers to recognise their pup, and the occurrence of allosuckling as a possible recognition error by females and as a strategy employed by pups to gain allomaternal care during their mothers' absence. A combination of behavioural observations, morphometry, VHF radio telemetry, acoustics and DNA genotyping were employed to study these topics. Postpartum interaction behaviours between mothers and pups appeared to facilitate development of an efficient mother-pup recognition system, involving mainly vocal and olfactory cues that were utilised during reunions. Greater selective pressure on pups to reunite resulted in an asymmetry of searching behaviour between females and pups during reunions. The vocalisations of pups were stereotypic, especially those features of the fundamental frequency and frequency of the lowest harmonic, which are likely to facilitate recognition of a pup by their mother. Pups attempted to steal milk from unrelated females more often during maternal absence and appeared to modify the intra-individual variation pattern of a feature of their vocal signatures over this period, which may assist attempts at allosuckling under nutritional stress. Fostering was demonstrated to occur despite costs to filial pups and possible costs to female reproductive success and may be attributed to development of erroneous recognition between females and non filial pups, or kin selection. This study provides a valuable contribution to the knowledge of recognition systems between pinniped mothers and pups, of alternative pup strategies under nutritional stress and of the rare occurrence of fostering in otariid pinnipeds.
15

Roberts, Briony Z. Jr. "Dialects, Sex-specificity, and Individual Recognition in the Vocal Repertoire of the Puerto Rican Parrot (Amazona vittata)." Thesis, Virginia Tech, 1997. http://hdl.handle.net/10919/79692.

Abstract:
The following study is part of a larger study examining techniques that might be of use in the release program of the Puerto Rican Parrot (Amazona vittata), including marking, capturing, and radio-tracking. The portion of the study reported here documents the vocal behavior of A. vittata during the reproductive season and examines the possibility of using vocalizations to identify individuals, determine the sex of individuals, and determine the location of an individual's breeding territory. Objectives of this study included: 1) cataloguing and categorizing the vocal repertoire of A. vittata, 2) determining whether the vocal repertoire was sex-specific and region-specific, and 3) determining if an individual's vocal repertoire could be used to identify it. The vocal repertoire was characterized using a hierarchical method and 147 calls were described. The repertoire was found to contain a high percentage (76%) of graded calls. Evolutionary strategies that may explain the complexity of such a repertoire are discussed. The vocal repertoire was found to be both sex- and region-specific. Characteristics analyzed included time and frequency parameters of sonagrams. Three methods were used to determine the feasibility of vocal recognition of individuals: bird-call pairing, sonagraphic analysis, and linear predictive coding. Sonagraphic analyses in combination with linear predictive coding techniques show the most promise as tools in voice recognition of the parrot; however, further research will be necessary to determine how reliable voice recognition may be as a method for identifying individuals in the field.
Master of Science
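Of the three methods compared above, linear predictive coding is the most readily reproduced; the sketch below derives LPC coefficients and formant-like resonances from a call recording with librosa. The file name, sampling rate, and model order are hypothetical.

```python
import numpy as np
import librosa

# Load a recorded call (hypothetical file) and fit an all-pole LPC model,
# which summarises the spectral envelope used to compare individuals.
y, sr = librosa.load("parrot_call.wav", sr=22050)
a = librosa.lpc(y, order=12)            # order 12 is an assumed, typical choice

# Roots of the LPC polynomial give rough formant (resonance) estimates.
roots = np.roots(a)
roots = roots[np.imag(roots) > 0]       # keep one root of each conjugate pair
formants = sorted(np.angle(roots) * sr / (2 * np.pi))
print(formants)                         # resonance frequencies in Hz
```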
16

Lima, Alice de Moura. "Production and perception of acoustic signals in captive bottlenose dolphins (Tursiops truncatus) : contextual use of social signals and recognition of artificial labels." Thesis, Rennes 1, 2017. http://www.theses.fr/2017REN1B048/document.

Abstract:
Studies on animal bioacoustics, traditionally relying on non-human primate and songbird models, converge towards the idea that social life is the main driving force behind the evolution of complex communication. Comparison with cetaceans is also particularly interesting from an evolutionary point of view. They are mammals forming complex social bonds, with abilities in acoustic plasticity, but they had to adapt to marine life, making habitat another determining selection force. Their natural habitat constrains sound production, usage and perception but, in the same way, constrains ethological observations, making studies of captive cetaceans an important source of knowledge on these animals. Beyond the analysis of acoustic structures, the study of the social contexts in which the different vocalizations are used is essential to the understanding of vocal communication. Compared to primates and birds, the social function of dolphins' acoustic signals remains largely misunderstood. Moreover, the way cetaceans' vocal apparatus and auditory system adapted morpho-anatomically to an underwater life is unique in the animal kingdom, but their ability to perceive sounds produced in the air remains controversial due to the lack of experimental demonstrations. The objectives of this thesis were, on the one hand, to explore the spontaneous contextual usage of acoustic signals in a captive group of bottlenose dolphins and, on the other hand, to test experimentally underwater and aerial abilities in auditory perception. Our first observational study describes the daily life of our dolphins in captivity and shows that vocal signalling reflects, at a large scale, the temporal distribution of social and non-social activities in a facility under human control. Our second observational study focuses on the immediate context of emission of the three main acoustic categories previously identified in the dolphins' vocal repertoire, i.e. whistles, burst-pulses and click trains. We found preferential associations between each vocal category and specific types of social interactions and identified context-dependent patterns of sound combinations. Our third study experimentally tested, under standardized conditions, the response of dolphins to human-made individual sound labels broadcast under and above water. We found that dolphins were able to recognize and to react only to their own label, even when it was broadcast in the air. Apart from confirming aerial hearing, these findings are in line with studies supporting the idea that dolphins possess a concept of identity. Overall, the results obtained during this thesis suggest that some social signals in the dolphin repertoire can be used to communicate specific information about the behavioural contexts of the individuals involved and that individuals are able to generalize their concept of identity to human-generated signals.
17

Keshtiari, Niloofar. "Does Gender Affect Recognition of Vocal Emotions? : evidence from Persian speakers living in a collectivist society." Berlin : Freie Universität Berlin, 2016. http://d-nb.info/1096221128/34.

18

Madison, Annelise Alissa. "Social Anxiety Symptoms, Heart Rate Variability, and Vocal Emotion Recognition: Evidence of a Normative Vagally-Mediated Positivity Bias in Women." The Ohio State University, 2019. http://rave.ohiolink.edu/etdc/view?acc_num=osu15582676377176.

19

Anglade, Yolande. "Robustesse de la connaissance automatique de la parole : étude et application dans un système d'aide vocal pour une standardiste malvoyante." Nancy 1, 1994. http://www.theses.fr/1994NAN10013.

Abstract:
This thesis presents work on the robustness of automatic speech recognition and its integration into a human-machine dialogue system. Two studies were carried out. The first aims to better understand the acoustic-phonetic changes that occur when speech is produced in noise (the Lombard effect). The conclusions of this study, conducted in French and American English, showed significant trends in many acoustic cues. The second study concerns the recognition of difficult vocabularies (e.g., the alphabet) in noisy environments. A new discrimination method is proposed: it extracts the similar portions of words using acoustic knowledge, parameterizes them, and feeds them to a multi-layer perceptron that performs the recognition. Comparisons with global methods (DTW, HMM) showed a clear improvement in performance. This work was integrated into an operational spoken human-machine dialogue system designed to help a visually impaired switchboard operator look up the telephone extension numbers of employees of a large company.
20

Tripovich, Joy Sophie. "Acoustic communication in Australian fur seals." Thesis, The University of Sydney, 2006. http://hdl.handle.net/2123/1690.

Abstract:
Communication is a fundamental process that allows animals to effectively transfer information between groups or individuals. Recognition plays an essential role in permitting animals to distinguish individuals based upon both communicatory and non-communicatory signals, allowing animals to direct suitable behaviours towards them. Several modes of recognition exist, and in colonial breeding animals which congregate in large numbers, acoustic signalling is thought to be the most effective as it suffers less from environmental degradation. Otariid seals (fur seals and sea lions) are generally colonial breeding species which congregate at high densities on offshore islands. In contrast to the other Arctocephaline species, the Australian fur seal, Arctocephalus pusillus doriferus, along with its conspecific, the Cape fur seal, A. p. pusillus, displays many of the behavioural traits of sea lions. This may have important consequences in terms of its social structure and evolution. The acoustic communication of Australian fur seals was studied on Kanowna Island, Bass Strait, Australia. Analysing the acoustic structure of vocalisations and their use facilitates our understanding of the social function of calls in animal communication. The vocal repertoires of males, females, pups and yearlings were characterised and their behavioural context examined. Call structural variations in males were evident with changes in behavioural context, indicating parallel changes in the emotive state of the sender. For a call to be used in vocal recognition it must display stereotypy within callers and variation between them. In Australian fur seal females and pups, individuals were found to have unique calls. Mutual mother-pup recognition has been suggested for otariids, and this study supports the potential for this process to occur through the use of vocalisations. Call structural changes in pup vocalisations were also investigated over the progression of the year, from birth to weaning. Vocalisations produced by pups increased in duration and lowered in both the number of parts per call and the harmonic band containing the maximum frequency as they became older, suggesting calls change constantly as pups grow toward maturity. Descriptive reports have suggested that the bark call produced by males is important to vocal recognition. The present study quantified this through the analysis of vocalisations produced by male Australian fur seals. Results support descriptive evidence suggesting that male barks can be used to discriminate callers. Traditional playback studies further confirmed that territorial male Australian fur seals respond significantly more to the calls of strangers than to those of neighbours, supporting male vocal recognition. This study modified call features of the bark to determine their importance to vocal recognition. The results indicate that the whole frequency spectrum was important to recognition. There was also an increase in response from males when they heard more bark units, indicating the importance of repetition by a caller. Recognition occurred when males heard between 25-75% of each bark unit, indicating that the whole duration of each bark unit is not necessary for recognition to occur. This may have particular advantages for communication in acoustically complex breeding environments, where parts of calls may be degraded by the environment.
The present study also examined the life history characteristics of otariids to determine the factors likely to influence and shape their vocal behaviour. Preliminary results indicate that female density, body size and the breeding environment all influence the vocal behaviour of otariids, while duration of lactation and the degree of polygyny do not appear to be influential. Understanding these interactions may help elucidate how vocal recognition and communication have evolved in different pinniped species.
21

Tripovich, Joy Sophie. "Acoustic communication in Australian fur seals." University of Sydney, 2006. http://hdl.handle.net/2123/1690.

Abstract:
Doctor of Philosophy (PhD)
Communication is a fundamental process that allows animals to effectively transfer information between groups or individuals. Recognition plays an essential role in permitting animals to distinguish individuals based upon both communicatory and non-communicatory signals, allowing animals to direct suitable behaviours towards them. Several modes of recognition exist, and in colonial breeding animals which congregate in large numbers, acoustic signalling is thought to be the most effective as it suffers less from environmental degradation. Otariid seals (fur seals and sea lions) are generally colonial breeding species which congregate at high densities on offshore islands. In contrast to the other Arctocephaline species, the Australian fur seal, Arctocephalus pusillus doriferus, along with its conspecific, the Cape fur seal, A. p. pusillus, displays many of the behavioural traits of sea lions. This may have important consequences in terms of its social structure and evolution. The acoustic communication of Australian fur seals was studied on Kanowna Island, Bass Strait, Australia. Analysing the acoustic structure of vocalisations and their use facilitates our understanding of the social function of calls in animal communication. The vocal repertoires of males, females, pups and yearlings were characterised and their behavioural context examined. Call structural variations in males were evident with changes in behavioural context, indicating parallel changes in the emotive state of the sender. For a call to be used in vocal recognition it must display stereotypy within callers and variation between them. In Australian fur seal females and pups, individuals were found to have unique calls. Mutual mother-pup recognition has been suggested for otariids, and this study supports the potential for this process to occur through the use of vocalisations. Call structural changes in pup vocalisations were also investigated over the progression of the year, from birth to weaning. Vocalisations produced by pups increased in duration and lowered in both the number of parts per call and the harmonic band containing the maximum frequency as they became older, suggesting calls change constantly as pups grow toward maturity. Descriptive reports have suggested that the bark call produced by males is important to vocal recognition. The present study quantified this through the analysis of vocalisations produced by male Australian fur seals. Results support descriptive evidence suggesting that male barks can be used to discriminate callers. Traditional playback studies further confirmed that territorial male Australian fur seals respond significantly more to the calls of strangers than to those of neighbours, supporting male vocal recognition. This study modified call features of the bark to determine their importance to vocal recognition. The results indicate that the whole frequency spectrum was important to recognition. There was also an increase in response from males when they heard more bark units, indicating the importance of repetition by a caller. Recognition occurred when males heard between 25-75% of each bark unit, indicating that the whole duration of each bark unit is not necessary for recognition to occur. This may have particular advantages for communication in acoustically complex breeding environments, where parts of calls may be degraded by the environment.
The present study also examined the life history characteristics of otariids to determine the factors likely to influence and shape their vocal behaviour. Preliminary results indicate that female density, body size and the breeding environment all influence the vocal behaviour of otariids, while duration of lactation and the degree of polygyny do not appear to be influential. Understanding these interactions may help elucidate how vocal recognition and communication have evolved in different pinniped species.
22

Woods, Richard David. "Collective responses to acoustic threat information in jackdaws." Thesis, University of Exeter, 2016. http://hdl.handle.net/10871/25978.

Abstract:
Navigating the physical world may present only a small fraction of the challenges faced by social animals. Sociality brings with it numerous benefits, including access to important information that may otherwise be harder to come by. However, almost every aspect of these apparent benefits may also entail additional cognitive challenges, including how to interpret signals from conspecifics, whom to attend to, and how to incorporate knowledge about signallers when deciding how to respond. One approach to understanding the cognitive abilities associated with social function is to investigate social species that take part in potentially costly group behaviours, where individual decisions must be made in a social context. In this thesis I explore how jackdaws (Corvus monedula), a highly sociable corvid species, use acoustic information to coordinate collective anti-predator responses. In Chapter Two I showed, using playback experiments, that the magnitude of collective responses to anti-predator recruitment calls known as "scolding" calls depends on the identity of the caller, with larger responses to familiar colony members than to unfamiliar individuals. In Chapter Three I used habituation-dishabituation experiments to show that this vocal discrimination operates at the level of the individual, with jackdaws discriminating between the calls of different conspecifics regardless of their level of familiarity. In Chapter Four I examined whether aspects of call structure conveyed information about threat levels: high rates of scolding calls were associated with elevated threats, and playback experiments suggested that this information might result in larger group responses. The finding that jackdaws are capable of mediating their response to alarm calls based on the identity of the individual caller and on structural variation in call production raised the question of whether jackdaws employ similar forms of discrimination between acoustic cues made by predators in their environment. I investigated this in Chapter Five, using playback experiments to show that jackdaws responded not only to the vocalisations of resident predators, but that this ability extended to novel predators, and that responsiveness was mediated by the phase of the breeding season in which predators were heard. Together, these findings provide insights into how discrimination among acoustic cues can mediate group behaviour in species that respond collectively to threats.
23

Wong, Kim-Yung Eddie. "Automatic spoken language identification utilizing acoustic and phonetic speech information." Thesis, Queensland University of Technology, 2004. https://eprints.qut.edu.au/37259/1/Kim-Yung_Wong_Thesis.pdf.

Abstract:
Automatic spoken Language Identification (LID) is the process of identifying the language spoken within an utterance. The challenge that this task presents is that no prior information is available indicating the content of the utterance or the identity of the speaker. The trend of globalization and the pervasive popularity of the Internet will amplify the need for the capabilities spoken language identification systems provide. A prominent application arises in call centers dealing with speakers speaking different languages. Another important application is to index or search huge speech data archives and corpora that contain multiple languages. The aim of this research is to develop techniques targeted at producing a fast and more accurate automatic spoken LID system compared to the previous National Institute of Standards and Technology (NIST) Language Recognition Evaluation. Acoustic and phonetic speech information are targeted as the most suitable features for representing the characteristics of a language. To model the acoustic speech features a Gaussian Mixture Model based approach is employed. Phonetic speech information is extracted using existing speech recognition technology. Various techniques to improve LID accuracy are also studied. One approach examined is the employment of Vocal Tract Length Normalization to reduce the speech variation caused by different speakers. A linear data fusion technique is adopted to combine the various aspects of information extracted from speech. As a result of this research, a LID system was implemented and presented for evaluation in the 2003 Language Recognition Evaluation conducted by the NIST.
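The Gaussian Mixture Model approach to acoustic language modelling mentioned above is straightforward to sketch with scikit-learn: one GMM per language fitted on frame-level features (e.g. MFCCs), with classification by average log-likelihood. The component count and diagonal covariances are conventional assumptions, not the thesis's configuration.

```python
from sklearn.mixture import GaussianMixture

def train_lid_models(features_by_language, n_components=64):
    """Fit one GMM per language on pooled frame-level feature vectors."""
    models = {}
    for lang, feats in features_by_language.items():   # feats: (n_frames, n_dims)
        models[lang] = GaussianMixture(
            n_components=n_components, covariance_type="diag"
        ).fit(feats)
    return models

def identify(models, utterance_feats):
    """Return the language whose GMM gives the highest mean log-likelihood."""
    scores = {lang: gmm.score(utterance_feats) for lang, gmm in models.items()}
    return max(scores, key=scores.get)
```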
24

SILVA, Daniella Dias Cavalcante da. "Desenvolvimento de um IP core de pré-processamento digital de sinais de voz para aplicação em sistemas embutidos." Universidade Federal de Campina Grande, 2006. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/1293.

Abstract:
Speech is the most common means of communication used by human beings, one that distinguishes them from other living beings by allowing the exchange of ideas, the expression of opinions and the revelation of thought. With technological advances and the consequent appearance of ever more sophisticated electronic equipment, the possibility of human-machine interaction through voice has attracted great interest, both in academia and among equipment manufacturers. Research in digital speech signal processing has enabled the development of voice response, speech recognition and speaker recognition systems. However, processing requirements still hinder the implementation of these systems on devices with low computational power, such as mobile phones, palmtops and home appliances. The work developed here consists of the study and adaptation of digital speech signal processing techniques, resulting in a preprocessing library that includes pre-emphasis, division into frames and windowing, so that it can be used in the development of embedded speech or speaker recognition applications. The necessary models were adapted, implemented in a hardware description language, functionally verified and, finally, prototyped on a hardware device.
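The preprocessing functions the library implements (pre-emphasis, division into frames, windowing) follow a standard chain; a minimal NumPy sketch is given below, with typical but assumed parameter values (a 0.97 pre-emphasis coefficient, 25 ms frames, 10 ms hop) rather than the dissertation's exact configuration.

```python
import numpy as np

def preprocess(signal, fs, alpha=0.97, frame_ms=25, hop_ms=10):
    """Pre-emphasis, framing and Hamming windowing of a speech signal."""
    # Pre-emphasis: boost high frequencies, y[n] = x[n] - alpha * x[n-1].
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])

    frame_len = int(fs * frame_ms / 1000)
    hop = int(fs * hop_ms / 1000)
    n_frames = 1 + max(0, (len(emphasized) - frame_len) // hop)

    # Slice the signal into overlapping frames.
    frames = np.stack([emphasized[i * hop : i * hop + frame_len]
                       for i in range(n_frames)])

    # Apply a Hamming window to each frame to reduce edge effects.
    return frames * np.hamming(frame_len)

fs = 16000
frames = preprocess(np.random.randn(fs), fs)   # one second of dummy audio
print(frames.shape)                            # (n_frames, frame_len)
```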
APA, Harvard, Vancouver, ISO, and other styles
25

Regnier, Lise. "Localization, Characterization and Recognition of Singing Voices." Phd thesis, Université Pierre et Marie Curie - Paris VI, 2012. http://tel.archives-ouvertes.fr/tel-00687475.

Full text
Abstract:
This dissertation is concerned with the problem of describing the singing voice within the audio signal of a song. This work is motivated by the fact that the lead vocal is the element that attracts the attention of most listeners. For this reason it is common for music listeners to organize and browse music collections using information related to the singing voice such as the singer name. Our research concentrates on the three major problems of music information retrieval: the localization of the source to be described (i.e. the recognition of the elements corresponding to the singing voice in the signal of a mixture of instruments), the search of pertinent features to describe the singing voice, and finally the development of pattern recognition methods based on these features to identify the singer. For this purpose we propose a set of novel features computed on the temporal variations of the fundamental frequency of the sung melody. These features, which aim to describe the vibrato and the portamento, are obtained with the aid of a dedicated model. In practice, these features are computed on the time-varying frequency of partials obtained using the sinusoidal model. In the first experiment we show that partials corresponding to the singing voice can be accurately differentiated from the partials produced by other instruments using decisions based on the parameters of the vibrato and the portamento. Once the partials emitted by the singer are identified, the segments of the song containing singing can be directly localized. To improve the recognition of the partials emitted by the singer we propose to group partials that are related harmonically. Partials are clustered according to their degree of similarity. This similarity is computed using a set of CASA cues including their temporal frequency variations (i.e. the vibrato and the portamento). The clusters of harmonically related partials corresponding to the singing voice are identified using the vocal vibrato and the portamento parameters. Groups of vocal partials can then be re-synthesized to isolate the voice. The result of the partial grouping can also be used to transcribe the sung melody. We then propose to go further with these features and study if the vibrato and portamento characteristics can be considered as a part of the singers' signature. Previous works on singer identification describe audio signals using features extracted on the short-term amplitude spectrum. The latter features aim to characterize the timbre of the sound, which, in the case of singing, is related to the vocal tract of the singer. The features we develop in this document capture long-term information related to the intonation of the singer, which is relevant to the style and the technique of the singer. We propose a method to combine these two complementary descriptions of the singing voice to increase the recognition rate of singer identification. In addition we evaluate the robustness of each type of feature against a set of variations. We show the singing voice is a highly variable instrument. To obtain a representative model of a singer's voice it is thus necessary to build models using a large set of examples covering the full tessitura of a singer. In addition, we show that features extracted directly from the partials are more robust to the presence of an instrumental accompaniment than features derived from the amplitude spectrum.
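To make the vibrato description concrete, a simplified estimate of vibrato rate and extent can be obtained by detrending a partial's frequency trajectory and locating the dominant modulation in the typical 4-8 Hz band. The sketch below uses a synthetic trajectory and an assumed frame rate; it is not the dedicated model developed in the dissertation.

```python
import numpy as np

def vibrato_params(track, frame_rate):
    """Estimate vibrato rate (Hz) and extent (in the track's frequency
    units) from a partial's frequency trajectory."""
    # Remove the slow trend (portamento) with a linear fit.
    t = np.arange(len(track)) / frame_rate
    trend = np.polyval(np.polyfit(t, track, 1), t)
    residual = track - trend

    # Locate the dominant modulation frequency in the 4-8 Hz band.
    spectrum = np.abs(np.fft.rfft(residual * np.hanning(len(residual))))
    freqs = np.fft.rfftfreq(len(residual), 1.0 / frame_rate)
    band = (freqs >= 4) & (freqs <= 8)
    rate = freqs[band][np.argmax(spectrum[band])]
    extent = 2 * np.std(residual)          # crude peak-to-peak proxy
    return rate, extent

# Synthetic trajectory: 220 Hz partial with 6 Hz vibrato and rising pitch.
fr = 100.0                                  # assumed frames per second
t = np.arange(200) / fr
track = 220 + 10 * t + 5 * np.sin(2 * np.pi * 6 * t)
print(vibrato_params(track, fr))            # ~ (6.0, ~7)
```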
APA, Harvard, Vancouver, ISO, and other styles
26

Hale, Jennifer Ann. "The Role of Male Vocal Signals During Male-Male Competition and Female Mate Choice in Greater Prairie-Chickens (Tympanuchus cupido)." The Ohio State University, 2013. http://rave.ohiolink.edu/etdc/view?acc_num=osu1365786099.

Full text
APA, Harvard, Vancouver, ISO, and other styles
27

Alsabaan, Majed Soliman K. "Pronunciation support for Arabic learners." Thesis, University of Manchester, 2015. https://www.research.manchester.ac.uk/portal/en/theses/pronunciation-support-for-arabic-learners(3db28816-90ed-4e8b-b64c-4bbd35f98be7).html.

Full text
Abstract:
The aim of the thesis is to find out whether providing feedback to Arabic language learners will help them improve their pronunciation, particularly of words involving sounds that are not distinguished in their native languages. In addition, it aims to find out, if possible, what type of feedback is most helpful. In order to achieve this aim, we developed a computational tool with a number of component sub-tools, involving the implementation of several substantial pieces of software. The first task was to ensure the system we were building could distinguish between the more challenging sounds when they were produced by a native speaker, since without that it would not be possible to classify learners' attempts at these sounds. To this end, a number of experiments were carried out with the hidden Markov model toolkit (HTK), a well-known speech recognition toolkit, in order to ensure that it can distinguish between the confusable sounds, i.e. the ones that people have difficulty with. The developed computational tool analyses the differences between the user's pronunciation and that of a native speaker by using a grammar of minimal pairs, where each utterance is treated as coming from a family of similar words. This provides the ability to categorise learners' errors: if someone is trying to say cat and the recogniser thinks they have said cad, then it is likely that they are voicing the final consonant when it should be unvoiced. Extensive testing shows that the system can reliably distinguish such minimal pairs when they are produced by a native speaker, and that this approach does provide effective diagnostic information about errors. The tool provides feedback in three different sub-tools: as an animation of the vocal tract, as a synthesised version of the target utterance, and as a set of written instructions. The tool was evaluated by placing it in a classroom setting and asking 50 Arabic students to use the different versions of the tool. Each student had a thirty-minute session with the tool, working their way through a set of pronunciation exercises at their own pace. The results of this group showed that their pronunciation does improve over the course of the session, though it was not possible to determine whether the improvement is sustained over an extended period. The evaluation was done from three points of view: quantitative analysis, qualitative analysis, and a questionnaire. Firstly, the quantitative analysis gives raw numbers telling whether a learner had improved their pronunciation or not. Secondly, the qualitative analysis shows a behaviour pattern of what a learner did and how they used the tool. Thirdly, the questionnaire gives feedback from learners and their comments about the tool. We found that providing feedback does appear to help Arabic language learners, but we did not have enough data to see which form of feedback is most helpful. However, we provide an informative analysis of behaviour patterns showing how Arabic students used and interacted with the tool, which could be useful for further data analysis.
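The minimal-pair diagnosis can be pictured as a lookup from (intended word, recognised word) to an error category; the toy sketch below uses invented English pairs rather than the thesis's Arabic material, and the recogniser itself (HTK-based in the thesis) is not reproduced.

```python
# A toy version of the minimal-pair diagnosis idea: the recogniser
# (not shown) returns the word it heard, and comparing it with the
# intended word within a minimal-pair family yields a pronunciation
# error category. Words and categories are illustrative stand-ins.
MINIMAL_PAIRS = {
    ("cat", "cad"): "final consonant voiced when it should be unvoiced",
    ("cad", "cat"): "final consonant unvoiced when it should be voiced",
    ("sin", "thin"): "/s/ produced instead of /th/",
}

def diagnose(target, recognised):
    if recognised == target:
        return "pronunciation accepted"
    return MINIMAL_PAIRS.get((target, recognised),
                             "mispronunciation outside the pair family")

print(diagnose("cat", "cad"))
```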
APA, Harvard, Vancouver, ISO, and other styles
28

El, Kansouli Mohamed. "Contribution à la commande vocale d'une machine : Application au robot marcheur EMA4." Valenciennes, 1989. https://ged.uphf.fr/nuxeo/site/esupversions/16a27432-2240-4452-8371-17d179e15a53.

Full text
APA, Harvard, Vancouver, ISO, and other styles
29

Sklar, Alexander Gabriel. "Channel Modeling Applied to Robust Automatic Speech Recognition." Scholarly Repository, 2007. http://scholarlyrepository.miami.edu/oa_theses/87.

Full text
Abstract:
In automatic speech recognition systems (ASRs), training is a critical phase for the system's success. Communication media, either analog (such as analog landline phones) or digital (VoIP), distort the speaker's speech signal, often in very complex ways: linear distortion occurs in all channels, either in the magnitude or phase spectrum. Non-linear but time-invariant distortion will always appear in all real systems. In digital systems we also have network effects, which produce packet losses, delays and repeated packets. Finally, one cannot really assert what path a signal will take, so error or distortion along the way is almost a certainty. The channel introduces an acoustical mismatch between the speaker's signal and the trained data in the ASR, which results in poor recognition performance. The approach so far has been to try to undo the havoc produced by the channels, i.e. to compensate for the channel's behavior. In this thesis, we try to characterize the effects of different transmission media and use that as an inexpensive and repeatable way to train ASR systems.
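The idea of training on channel-matched data can be illustrated by synthetically distorting clean speech; the sketch below applies a simple linear filter and random packet loss, with every parameter (filter coefficients, loss probability, packet size) assumed for illustration.

```python
import numpy as np
from scipy.signal import lfilter

def simulate_channel(speech, fs, drop_prob=0.02, packet_ms=20):
    """Apply a crude telephone-like linear distortion plus VoIP-style
    packet loss to clean speech, for channel-matched ASR training."""
    # Linear distortion: a fixed low-order IIR filter (assumed shape).
    distorted = lfilter([1.0, -0.6], [1.0, -0.3], speech)

    # Packet loss: zero out randomly chosen fixed-size packets.
    packet = int(fs * packet_ms / 1000)
    out = distorted.copy()
    for start in range(0, len(out) - packet, packet):
        if np.random.rand() < drop_prob:
            out[start:start + packet] = 0.0
    return out

fs = 8000
noisy = simulate_channel(np.random.randn(2 * fs), fs)
print(noisy.shape)
```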
APA, Harvard, Vancouver, ISO, and other styles
30

Zelinka, Petr. "Zvyšování účinnosti strojového rozpoznávání řeči." Doctoral thesis, Vysoké učení technické v Brně. Fakulta elektrotechniky a komunikačních technologií, 2012. http://www.nusl.cz/ntk/nusl-233578.

Full text
Abstract:
This work identifies the causes of the unsatisfactory reliability of contemporary systems for automatic speech recognition when deployed in demanding conditions. The impact of the individual sources of performance degradation is documented, and a list of known methods for their identification from the recognized signal is given. An overview of the usual methods to suppress the impact of disruptive influences on speech recognition performance is provided. The essential contribution of the work is the formulation of new approaches to constructing acoustic models of noisy speech and nonstationary noise, allowing high recognition performance in challenging conditions. The viability of the proposed methods is verified on an isolated-word speech recognizer utilizing a several-hour-long recording of real operating room background noise recorded at the Uniklinikum Marburg in Germany. This work is the first to identify the impact of changes in a speaker's vocal effort on the reliability of automatic speech recognition over the full vocal effort range (i.e. whispering through shouting). A new concept of a speech recognizer immune to changes in vocal effort is proposed. For the purposes of research on changes in vocal effort, a new speech database, BUT-VE1, was created.
APA, Harvard, Vancouver, ISO, and other styles
31

Carvalho, Raphael Torres Santos. "Transformada Wavelet na detecÃÃo de patologias da laringe." Universidade Federal do CearÃ, 2012. http://www.teses.ufc.br/tde_busca/arquivo.php?codArquivo=8908.

Full text
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
The number of non-invasive diagnostic methods has increased due to the need for simple, quick and painless tests. With the growth of technology that provides the means for signal extraction and processing, new analytical methods have been developed to deal with the complexity of voice signals. This dissertation presents a new idea for characterizing healthy and pathological voice signals based on a mathematical tool widely known in the literature, the Wavelet Transform (WT). The dataset used in this work consists of 60 voice samples divided into four classes: one from healthy individuals and three from people with vocal fold nodules, Reinke's edema and neurological dysphonia. All samples were recorded using the sustained vowel /a/ of Brazilian Portuguese. The results obtained by all the pattern classifiers studied indicate that the proposed WT-based approach is a suitable technique for discriminating between healthy and pathological voices, with recognition rates similar to or better than those of the classical technique.
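A plausible reading of the WT-based approach is to use subband energies of a wavelet decomposition as features for a classifier; the sketch below assumes a db4 wavelet, five decomposition levels, synthetic signals and an SVM, none of which are confirmed by the abstract.

```python
import numpy as np
import pywt
from sklearn.svm import SVC

def wavelet_energy_features(signal, wavelet="db4", level=5):
    """Log-energy of each wavelet subband of a sustained-vowel signal;
    the wavelet family and depth are assumed, not the thesis's setup."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)
    return np.array([np.log(np.sum(c ** 2) + 1e-12) for c in coeffs])

# Hypothetical dataset: synthetic voice samples with binary labels
# (0 = healthy, 1 = pathological), differing only in variance here.
rng = np.random.default_rng(1)
labels = np.array([0, 1] * 20)
X = np.array([wavelet_energy_features(rng.normal(0, 1 + y, 4096))
              for y in labels])

clf = SVC().fit(X, labels)
print(clf.score(X, labels))
```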
APA, Harvard, Vancouver, ISO, and other styles
32

Favarelli, Elia. "Algoritmi di Machine Learning per il Riconoscimento Vocale." Master's thesis, Alma Mater Studiorum - Università di Bologna, 2017.

Find full text
Abstract:
With the advent of the Internet of Things (IoT), the volume of data produced by sensors and devices connected to the network will grow exponentially. To manage such a volume of data, new processing strategies become increasingly important for classifying the collected information, extracting features that characterize a given group of data, and finally distilling its content (dimensionality reduction). This thesis studies and applies several Machine Learning algorithms suited to multidimensional classification problems and to the handling of large amounts of data (Big Data), capable of extracting features for cataloguing the data. The objective is therefore to implement a number of Machine Learning algorithms and then compare their performance by submitting various types of classification problems to them, in order to determine which algorithm is best suited to a given type of problem. Performance is evaluated with measures of classification accuracy such as the confusion matrix, the probability of false alarm (PF), and the probability of missed detection (PM).
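The evaluation measures named at the end of the abstract (confusion matrix, PF, PM) can be computed directly from binary decisions, as in the sketch below; the label vectors are made up for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical binary decisions from some classifier: label 1 is the
# event of interest, 0 its absence.
y_true = np.array([0, 0, 1, 1, 1, 0, 1, 0, 1, 0])
y_pred = np.array([0, 1, 1, 0, 1, 0, 1, 0, 1, 1])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
pf = fp / (fp + tn)   # false alarm probability: alarms among negatives
pm = fn / (fn + tp)   # missed detection probability: misses among positives
print(confusion_matrix(y_true, y_pred), pf, pm)
```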
APA, Harvard, Vancouver, ISO, and other styles
33

Koudache, Abdellah. "Contribution à la reconnaissance vocale en vue de la commande de machine : application au robot marcheur EMA4." Valenciennes, 1993. https://ged.uphf.fr/nuxeo/site/esupversions/121368bf-ebaa-4f19-9a9a-ae9f0c6a7da5.

Full text
Abstract:
The objective of the study presented in this thesis is to contribute to the integration of a vocal component into a human-machine communication system, and more particularly between an operator and a robot. After a general presentation of the problems posed by oral human-machine interfaces, the work presented here consists of: 1) first, contributing to the improvement of the performance of DTW-type voice recognition systems, i.e. those based essentially on the dynamic programming principle, by drawing on knowledge specific to the speech signal in a hybrid recognition process; the latter involves a hierarchy of fairly varied partial decisions (acoustic cues, phonetic features, a divergence criterion, DTW) as well as a final decision, based on a plausibility measure of the confidence-factor type; 2) then, by way of application, presenting the strategies implemented in a voice command system for an autonomous four-legged walking machine, its operation, and the (static or dynamic) knowledge necessary for interpreting vocal orders so that they can be executed by the robot. Finally, criteria are proposed for choosing the command language (lexical and syntactic aspects) in order to increase recognition reliability and ensure spontaneity of command, together with recommendations for reinforcing the safety of the tasks to be executed.
APA, Harvard, Vancouver, ISO, and other styles
34

Zapata, Rojas Julian. "Traduction dictée interactive : intégrer la reconnaissance vocale à l’enseignement et à la pratique de la traduction professionnelle." Thèse, Université d'Ottawa / University of Ottawa, 2012. http://hdl.handle.net/10393/23227.

Full text
Abstract:
Translation dictation (TD) is a translation technique that was widely used before professional translators’ workstations witnessed the massive influx of typewriters and personal computers. In the current era of globalization and of information and communication technologies (ICT), and in response to the growing demand for translation, certain translators and translator trainers throughout the world are seeking to (re)integrate dictation into the translation practice. Contrary to a few decades ago, when the transcription of translated texts was typically carried out by professional typists, the translation industry is currently turning to voice recognition (VR) technologies—that is, computer tools that serve to transcribe dictations automatically. Although off-the-shelf VR systems are not specifically conceived for professional translation purposes, they already seem to provide a more ergonomic and efficient approach, for those translators who are already using them, than does the conventional method, i.e., typing on a computer keyboard. This thesis introduces the notion of Interactive Translation Dictation (ITD), a translation technique that involves interaction with a VR system. The literature review conducted for this research indicated that integrating VR technologies into the practice of translation is not new; however, it showed that past efforts have proved unsuccessful. Moreover, an analysis of the needs of translators who use VR systems shed light on why translators have turned to VR software and what their opinions of these tools are. This analysis also allowed us to identify the challenges that VR technology currently presents for professional translation. This thesis is intended as a first step towards developing translation tools that are both ergonomic, i.e., that take into account the human factor, and efficient, allowing translators to meet the needs of the current translation market. The thesis also advocates a renewal of translator training programs. Integrating ITD into translation training and practice means (re)integrating spoken translation techniques that were used in the past and VR technologies that are now emerging. For such integration to be effective, significant technical, cognitive and pedagogical challenges will first need to be overcome.
APA, Harvard, Vancouver, ISO, and other styles
35

SILVA, Daniella Dias Cavalcante da. "Reconhecimento de fala contínua para o Português Brasileiro em sistemas embarcados." Universidade Federal de Campina Grande, 2011. http://dspace.sti.ufcg.edu.br:8080/jspui/handle/riufcg/1295.

Full text
Abstract:
With the advent of technology, machines predominate in almost all scenarios of everyday life, be they computers, home appliances or portable devices. The possibility of performing human-machine communication through speech, the simplest and most natural way for human beings to express their thoughts, makes this interaction easier and more productive. Although much research in digital speech signal processing has enabled the development of quite efficient speech recognition systems, processing requirements still hinder the implementation of these systems on devices with low computational power, such as mobile phones, palmtops and home appliances. To allow the implementation of speech recognition systems in this context, some works sacrifice efficiency in the recognition process in exchange for reduced size and computational requirements. The search for optimized acoustic and language modelling, associated with the use of representative databases, can thus lead to a good compromise between system performance, in terms of recognition rates, and the computational demands imposed by embedded systems. The main goal of this work is to model the architecture of a continuous speech recognition system for Brazilian Portuguese, using Hidden Markov Models, in a way that enables its implementation on an embedded system with limited computing resources. In order to select the configuration that best meets this goal, experiments and analyses were carried out to identify possible adaptations, based on mathematical simplifications and the reduction of parameters in the steps of the recognition process. Throughout this work, the relationship between recognition rate and computational cost was considered. The embedded system architecture developed and its modelling process, including the experiments, the analyses and their respective results, are presented and discussed in this document.
APA, Harvard, Vancouver, ISO, and other styles
36

Boué, Anaïs. "Data mining and volcanic eruption forecasting." Thesis, Université Grenoble Alpes (ComUE), 2015. http://www.theses.fr/2015GREAU007/document.

Full text
Abstract:
Eruption forecasting methods are valuable tools for supporting decision making during volcanic crises if they are integrated in a global monitoring strategy and if their potential and limitations are known. Many attempts at deterministic forecasting of volcanic eruptions and landslides have been performed using the material Failure Forecast Method (FFM). This method consists in adjusting an empirical power law to precursory patterns of seismicity or deformation. Until now, most of the studies have presented hindsight forecasts, based on complete time series of precursors, and do not evaluate the method's potential for carrying out real-time forecasting with partial precursory sequences. Moreover, the limited number of published examples and the absence of systematic application of the FFM make it difficult to conclude as to the ability of the method to forecast volcanic eruptions. Thus it appears important to gain experience by carrying out systematic forecasting attempts in various eruptive contexts. In this thesis, I present a rigorous approach to the FFM designed for real-time applications on volcano-seismic precursors. I use a Bayesian approach based on the FFM theory and an automatic classification of the seismic events that do not have the same source mechanisms. The probability distributions of the data deduced from the performance of the classification are used as input. As output, the method provides the probability of the forecast time at each observation time before the eruption. The spread of the posterior probability density function of the prediction time and its stability with respect to the observation time are used as criteria to evaluate the reliability of the forecast. I show that the method developed here outperforms the classical application of the FFM both for hindsight and real-time attempts because it accurately takes the uncertainty of the data into account. The automatic classification of volcano-seismic signals allows for a systematic application of this forecasting method to decades of seismic data from andesitic volcanoes, including Volcan de Colima (Mexico) and Merapi volcano (Indonesia), and from the basaltic volcano of Piton de la Fournaise (Reunion Island, France). The number of eruptions that are not preceded by precursors is quantified, as well as the number of seismic crises that are not followed by eruptions. Then, I use 64 precursory sequences and apply the forecasting method developed in this thesis. I thus determine in which conditions the FFM can be successfully applied and I quantify the success rate of the method in real-time and in hindsight. Only 62% of the precursory sequences analysed in this thesis were suitable for the application of FFM, and half of the total number of eruptions are successfully forecast in hindsight. In real-time, the method allows for the successful prediction of only 36% of all eruptions considered. Nevertheless, real-time predictions are successful for 83% of the cases that fulfil the reliability criteria. Therefore, we can have good confidence in the method when the reliability criteria are met, but the deterministic real-time forecasting tool developed in this thesis is not sufficient in itself. However, it could potentially be informative combined with other forecasting methods and supervised by an observer. These results reflect the current lack of knowledge concerning pre-eruptive mechanisms.
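The FFM itself reduces, for the common exponent value of 2, to a linear extrapolation of the inverse precursor rate to zero; the sketch below shows this classical linearized form on synthetic accelerating seismicity, not the Bayesian formulation developed in the thesis.

```python
import numpy as np

def ffm_forecast(times, rates):
    """Classic FFM linearization for exponent alpha = 2: the inverse
    precursor rate 1/r decays linearly and crosses zero at the
    forecast failure (eruption) time."""
    inv = 1.0 / np.asarray(rates)
    slope, intercept = np.polyfit(times, inv, 1)
    return -intercept / slope          # time where 1/r reaches zero

# Synthetic accelerating seismicity: rate r(t) = 1 / (k * (tf - t)).
tf, k = 100.0, 0.05
t = np.linspace(0, 90, 50)
r = 1.0 / (k * (tf - t))
print(ffm_forecast(t, r))              # ~100.0
```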
APA, Harvard, Vancouver, ISO, and other styles
37

Srinivasamurthy, Ajay. "A Data-driven bayesian approach to automatic rhythm analysis of indian art music." Doctoral thesis, Universitat Pompeu Fabra, 2016. http://hdl.handle.net/10803/398986.

Full text
Abstract:
Large and growing collections of a wide variety of music are now available on demand to music listeners, necessitating novel ways of automatically structuring these collections using different dimensions of music. Rhythm is one of the basic music dimensions and its automatic analysis, which aims to extract musically meaningful rhythm related information from music, is a core task in Music Information Research (MIR). Musical rhythm, similar to most musical dimensions, is culture-specific and hence its analysis requires culture-aware approaches. Indian art music is one of the major music traditions of the world and has complexities in rhythm that have not been addressed by the current state of the art in MIR, motivating us to choose it as the primary music tradition for study. Our intent is to address unexplored rhythm analysis problems in Indian art music to push the boundaries of the current MIR approaches by making them culture-aware and generalizable to other music traditions. The thesis aims to build data-driven signal processing and machine learning approaches for automatic analysis, description and discovery of rhythmic structures and patterns in audio music collections of Indian art music. After identifying challenges and opportunities, we present several relevant research tasks that open up the field of automatic rhythm analysis of Indian art music. Data-driven approaches require well curated data corpora for research and efforts towards creating such corpora and datasets are documented in detail. We then focus on the topics of meter analysis and percussion pattern discovery in Indian art music. Meter analysis aims to align several hierarchical metrical events with an audio recording. Meter analysis tasks such as meter inference, meter tracking and informed meter tracking are formulated for Indian art music. Different Bayesian models that can explicitly incorporate higher level metrical structure information are evaluated for the tasks and novel extensions are proposed. The proposed methods overcome the limitations of existing approaches and their performance indicate the effectiveness of informed meter analysis. Percussion in Indian art music uses onomatopoeic oral mnemonic syllables for the transmission of repertoire and technique, providing a language for percussion. We use these percussion syllables to define, represent and discover percussion patterns in audio recordings of percussion solos. We approach the problem of percussion pattern discovery using hidden Markov model based automatic transcription followed by an approximate string search using a data derived percussion pattern library. Preliminary experiments on Beijing opera percussion patterns, and on both tabla and mridangam solo recordings in Indian art music demonstrate the utility of percussion syllables, identifying further challenges to building practical discovery systems. The technologies resulting from the research in the thesis are a part of the complete set of tools being developed within the CompMusic project for a better understanding and organization of Indian art music, aimed at providing an enriched experience with listening and discovery of music. The data and tools should also be relevant for data-driven musicological studies and other MIR tasks that can benefit from automatic rhythm analysis.
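The approximate string search step of the percussion pattern discovery can be sketched as an edit-distance scan over a transcribed syllable stream; the syllables and the distance threshold below are illustrative assumptions, not the thesis's derived pattern library.

```python
def edit_distance(a, b):
    """Levenshtein distance between two syllable sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def find_pattern(transcription, pattern, max_dist=1):
    """Slide a pattern-length window over the transcription and keep
    positions whose edit distance falls below the threshold."""
    n = len(pattern)
    return [i for i in range(len(transcription) - n + 1)
            if edit_distance(transcription[i:i + n], pattern) <= max_dist]

# Hypothetical tabla-style syllable stream and query pattern.
stream = "dha dhin dhin dha dha dhin dhin dha dha tin tin ta".split()
print(find_pattern(stream, ["dha", "dhin", "dhin", "dha"]))
```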
APA, Harvard, Vancouver, ISO, and other styles
38

Patino, Villar José María. "Efficient speaker diarization and low-latency speaker spotting." Thesis, Sorbonne université, 2019. http://www.theses.fr/2019SORUS003/document.

Full text
Abstract:
Speaker diarization (SD) involves the detection of speakers within an audio stream and the intervals during which each speaker is active, i.e. the determination of ‘who spoke when’. The first part of the work presented in this thesis exploits an approach to speaker modelling involving binary keys (BKs) as a solution to SD. BK modelling is efficient and operates without external training data, as it uses test data alone. The presented contributions include the extraction of BKs based on multi-resolution spectral analysis, the explicit detection of speaker changes using BKs, as well as SD fusion techniques that combine the benefits of both BK and deep learning based solutions. The SD task is closely linked to that of speaker recognition or detection, which involves the comparison of two speech segments and the determination of whether or not they were uttered by the same speaker. Even if many practical applications require their combination, the two tasks are traditionally tackled independently of each other. The second part of this thesis considers an application where SD and speaker recognition solutions are brought together. The new task, coined low latency speaker spotting (LLSS), involves the rapid detection of known speakers within multi-speaker audio streams. It involves the re-thinking of online diarization and the manner by which diarization and detection sub-systems should best be combined.
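A binary key reduces a speech segment to a fixed-length binary vector, so segment comparison becomes a cheap bitwise similarity; the sketch below uses a Jaccard-style score over hypothetical 512-bit keys, which is one plausible choice rather than the thesis's exact scoring.

```python
import numpy as np

def bk_similarity(key_a, key_b):
    """Jaccard-style similarity between two binary keys: the fraction
    of set bits that the two segments share."""
    inter = np.sum(key_a & key_b)
    union = np.sum(key_a | key_b)
    return inter / union if union else 0.0

# Hypothetical 512-bit keys for two speech segments.
rng = np.random.default_rng(2)
a = rng.random(512) < 0.2
b = a.copy()
b[rng.choice(512, 40, replace=False)] ^= True      # perturbed copy of a
print(bk_similarity(a, b))                          # high: likely same speaker
print(bk_similarity(a, rng.random(512) < 0.2))      # low: different speakers
```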
APA, Harvard, Vancouver, ISO, and other styles
39

Liu, Jia-Ge, and 劉佳格. "Vocal password recognition." Thesis, 2014. http://ndltd.ncl.edu.tw/handle/25956858531137285592.

Full text
Abstract:
Master's thesis
National Taiwan University
Institute of Engineering Science and Ocean Engineering
Academic year 102 (ROC calendar)
The purpose of this study is to identify speakers by vocal password recognition. Each password consists of four Chinese characters. Once an examinee speaks a password, an energy method is used to separate the characters within it. The characteristic frequencies of each character are obtained by calculating the first, second and third moments of its amplitude spectrum, and these frequencies are used to recognize the speaker's identity. The sampled data comprise a training set and a testing set: the training set is used to derive the threshold and scoring rule for the recognition process, and the testing set is used to verify the accuracy of the recognition method. Three different Chinese phrases were investigated, with identification rates of 90%, 88% and 90%, respectively. These results suggest that the proposed method for vocal password recognition is feasible.
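The two stages the abstract describes, energy-based character segmentation and spectral-moment features, can be sketched as follows; the frame size, threshold and use of normalized frequency are assumptions made for illustration.

```python
import numpy as np

def segment_by_energy(signal, frame=512, thresh_ratio=0.1):
    """Mark frames whose short-time energy exceeds a fraction of the
    maximum as speech; contiguous runs are the password characters."""
    n = len(signal) // frame
    energy = np.array([np.sum(signal[i * frame:(i + 1) * frame] ** 2)
                       for i in range(n)])
    return energy > thresh_ratio * energy.max()

def spectral_moments(segment):
    """First three moments of the amplitude spectrum, used as the
    characteristic frequencies of one spoken character."""
    spec = np.abs(np.fft.rfft(segment))
    freqs = np.fft.rfftfreq(len(segment))       # normalized frequency
    p = spec / spec.sum()                       # treat as a distribution
    m1 = np.sum(freqs * p)                      # centroid
    m2 = np.sum((freqs - m1) ** 2 * p)          # spread
    m3 = np.sum((freqs - m1) ** 3 * p)          # skewness (unnormalized)
    return m1, m2, m3

sig = np.random.randn(16000)
print(segment_by_energy(sig).sum(), spectral_moments(sig))
```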
APA, Harvard, Vancouver, ISO, and other styles
40

"Speaker recognition using complementary information from vocal source and vocal tract." Thesis, 2005. http://library.cuhk.edu.hk/record=b6074159.

Full text
Abstract:
Experimental results show that source-tract information fusion can also improve the robustness of speaker recognition systems in mismatched conditions. For example, relative improvements of 15.3% and 12.6% have been achieved for speaker identification and verification, respectively.
For speaker verification, a text-dependent weighting scheme is developed. Analysis results show that the source-tract discrimination ratio varies significantly across different sounds due to the diversity of vocal system configurations in speech production. This thesis analyzes the source-tract speaker discrimination ratio for the 10 Cantonese digits, upon which a digit-dependent source-tract weighting scheme is developed. Information fusion with such digit-dependent weights improves the verification performance by a relative 39.6% in matched conditions.
This thesis investigates the feasibility of using both vocal source and vocal tract information to improve speaker recognition performance. Conventional speaker recognition systems typically employ vocal tract related acoustic features, e.g. the Mel-frequency cepstral coefficients (MFCC), for discrimination purposes. Motivated by the physiological significance of the vocal source and vocal tract system in speech production, this thesis develops a speaker recognition system that effectively incorporates these two complementary information sources for improved performance and robustness.
This thesis presents a novel approach of representing the speaker-specific vocal source characteristics. The linear predictive (LP) residual signal is adopted as a good representative of the vocal source excitation, in which the speaker specific information resides on both time and frequency domains. Haar transform and wavelet transform are applied for multi-resolution analyses of the LP residual signal. The resulting vocal source features, namely the Haar octave coefficients of residues (HOCOR) and wavelet octave coefficients of residues (WOCOR), can effectively extract the speaker-specific spectro-temporal characteristics of the LP residual signal. Particularly, with pitch-synchronous wavelet transform, the WOCOR feature set is capable of capturing the pitch-related low frequency properties and the high frequency information associated with pitch epochs, as well as their temporal variations within a pitch period and over consecutive periods. The generated vocal source and vocal tract features are complementary to each other since they are derived from two orthogonal components, the LP residual signal and LP coefficients. Therefore they can be fused to provide better speaker recognition performance. A preliminary scheme of fusing MFCC and WOCOR together illustrated that the identification and verification performance can be respectively improved by 34.6% and 23.6%, both in matched conditions.
To maximize the benefit obtained through the fusion of source and tract information, speaker-discrimination-dependent fusion techniques have been developed. For speaker identification, a confidence measure, which indicates the reliability of the vocal source feature in speaker identification, is derived based on the discrimination ratio between the source and tract features in each identification trial. Information fusion with the confidence measure provides better weighting of the scores given by the two features and avoids possible errors introduced by incorporating source information, thereby further improving identification performance. Compared with MFCC, a relative improvement of 46.8% has been achieved.
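A simplified, non-pitch-synchronous stand-in for the WOCOR idea can be built from off-the-shelf pieces: LP inverse filtering to obtain the residual, then a wavelet decomposition of that residual. The LPC order, wavelet family and depth below are assumed, and librosa/PyWavelets are used in place of the thesis's own implementation.

```python
import numpy as np
import librosa
import pywt
from scipy.signal import lfilter

def wocor_like_features(frame, order=12, wavelet="db2", level=4):
    """LP inverse filtering followed by a wavelet decomposition of the
    residual: a rough, non-pitch-synchronous stand-in for WOCOR."""
    a = librosa.lpc(frame, order=order)       # LPC polynomial, a[0] == 1
    residual = lfilter(a, [1.0], frame)       # inverse (whitening) filter
    coeffs = pywt.wavedec(residual, wavelet, level=level)
    # One norm per octave subband as the feature vector.
    return np.array([np.linalg.norm(c) for c in coeffs])

frame = np.random.randn(1024)
print(wocor_like_features(frame))
```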
Zheng Nengheng.
"November 2005."
Adviser: Pak-Chung Ching.
Source: Dissertation Abstracts International, Volume: 67-11, Section: B, page: 6647.
Thesis (Ph.D.)--Chinese University of Hong Kong, 2005.
Includes bibliographical references (p. 123-135).
Abstracts in English and Chinese.
School code: 1307.
APA, Harvard, Vancouver, ISO, and other styles
41

Wu, Yi-Chung, and 吳逸中. "Recognition of Vocal and Non-Vocal Segments in Musical Sound Tracks." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/2dkh4f.

Full text
Abstract:
Master's thesis
National Taipei University of Technology
Graduate Institute of Computer Science and Information Engineering
Academic year 100 (ROC calendar)
In this thesis, we use MFCC, LPC and LPCC feature extraction together with HMM (Hidden Markov Model) tools to train models, which are then used to recognize test songs. The songs in the database are separated into two parts, training songs and testing songs. We compare the likelihood difference between MFCC and LPCC to increase the recognition rate. In addition, we tried to recognize vocal and non-vocal segments by computing the correlation coefficient between the left and right channels of stereo songs.
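The stereo-correlation idea can be sketched per frame: material mixed to the centre, such as a lead vocal, appears in both channels and drives the left/right correlation towards 1. The frame length and the decision threshold below are assumptions.

```python
import numpy as np

def channel_correlation(left, right, frame=4096, hop=2048):
    """Per-frame Pearson correlation between the stereo channels;
    centre-panned material such as a lead vocal tends to give
    values near 1 (the decision threshold here is an assumption)."""
    corrs = []
    for start in range(0, len(left) - frame, hop):
        l = left[start:start + frame]
        r = right[start:start + frame]
        corrs.append(np.corrcoef(l, r)[0, 1])
    return np.array(corrs)

# Dummy stereo signal: shared "vocal" plus independent accompaniment.
rng = np.random.default_rng(3)
vocal = rng.normal(size=44100)
left = vocal + 0.3 * rng.normal(size=44100)
right = vocal + 0.3 * rng.normal(size=44100)
corr = channel_correlation(left, right)
print((corr > 0.8).mean())     # fraction of frames flagged as vocal
```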
APA, Harvard, Vancouver, ISO, and other styles
42

LUO, GUANG-CUN, and 羅廣村. "Intelligent Recognition System for Vocal Folds Disorder." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/10599482937880025712.

Full text
Abstract:
Master's thesis
National Taiwan University of Science and Technology
Department of Polymer Engineering
Academic year 97 (ROC calendar)
The purpose of this study was to develop an intelligent recognition system for vocal fold disorders. Image processing techniques and neural networks were applied to identify different disease conditions of the vocal folds: paralysis, benign tumor, malignant cancer, or no disease. Three randomly selected sets of original images were first converted to grayscale, and histogram equalization was applied to obtain good contrast. A statistical threshold was then used to segment the processed images, and the resulting binary images were processed by labeling, dilation and erosion to locate the glottis. The features of the three image sets served as the input to the neural network. After testing 95 samples, the experimental results show that the third set achieved the best recognition rate, reaching 93.6%. These results support the ability of the intelligent recognition system to identify vocal fold disorders, which could effectively reduce problems caused by misdiagnosis and subjective judgment in the medical field.
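The described pipeline (grayscale conversion, histogram equalization, statistical thresholding, dilation and erosion, labeling) maps naturally onto OpenCV; the sketch below is a minimal approximation with assumed parameters and a hypothetical input file, using Otsu's method as the statistical threshold.

```python
import cv2
import numpy as np

def locate_glottis(image_path):
    """Grayscale -> histogram equalization -> Otsu threshold ->
    dilation/erosion -> connected-component labeling, mirroring the
    pipeline described above (parameters are illustrative)."""
    img = cv2.imread(image_path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    eq = cv2.equalizeHist(gray)                      # improve contrast
    _, binary = cv2.threshold(eq, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    kernel = np.ones((5, 5), np.uint8)
    cleaned = cv2.erode(cv2.dilate(binary, kernel), kernel)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(cleaned)
    # Take the largest non-background component as the glottis candidate.
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])
    return centroids[largest], stats[largest]

# centroid, stats = locate_glottis("laryngoscopy_frame.png")  # hypothetical file
```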
APA, Harvard, Vancouver, ISO, and other styles
43

"Robust speaker recognition using both vocal source and vocal tract features estimated from noisy input utterances." 2007. http://library.cuhk.edu.hk/record=b5893317.

Full text
Abstract:
Wang, Ning.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2007.
Includes bibliographical references (leaves 106-115).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Introduction to Speech and Speaker Recognition --- p.1
Chapter 1.2 --- Difficulties and Challenges of Speaker Authentication --- p.6
Chapter 1.3 --- Objectives and Thesis Outline --- p.7
Chapter 2 --- Speaker Recognition System --- p.10
Chapter 2.1 --- Baseline Speaker Recognition System Overview --- p.10
Chapter 2.1.1 --- Feature Extraction --- p.12
Chapter 2.1.2 --- Pattern Generation and Classification --- p.24
Chapter 2.2 --- Performance Evaluation Metric for Different Speaker Recognition Tasks --- p.30
Chapter 2.3 --- Robustness of Speaker Recognition System --- p.30
Chapter 2.3.1 --- Speech Corpus: CU2C --- p.30
Chapter 2.3.2 --- Noise Database: NOISEX-92 --- p.34
Chapter 2.3.3 --- Mismatched Training and Testing Conditions --- p.35
Chapter 2.4 --- Summary --- p.37
Chapter 3 --- Speaker Recognition System using both Vocal Tract and Vocal Source Features --- p.38
Chapter 3.1 --- Speech Production Mechanism --- p.39
Chapter 3.1.1 --- Speech Production: An Overview --- p.39
Chapter 3.1.2 --- Acoustic Properties of Human Speech --- p.40
Chapter 3.2 --- Source-filter Model and Linear Predictive Analysis --- p.44
Chapter 3.2.1 --- Source-filter Speech Model --- p.44
Chapter 3.2.2 --- Linear Predictive Analysis for Speech Signal --- p.46
Chapter 3.3 --- Vocal Tract Features --- p.51
Chapter 3.4 --- Vocal Source Features --- p.52
Chapter 3.4.1 --- Source Related Features: An Overview --- p.52
Chapter 3.4.2 --- Source Related Features: Technical Viewpoints --- p.54
Chapter 3.5 --- Effects of Noises on Speech Properties --- p.55
Chapter 3.6 --- Summary --- p.61
Chapter 4 --- Estimation of Robust Acoustic Features for Speaker Discrimination --- p.62
Chapter 4.1 --- Robust Speech Techniques --- p.63
Chapter 4.1.1 --- Noise Resilience --- p.64
Chapter 4.1.2 --- Speech Enhancement --- p.64
Chapter 4.2 --- Spectral Subtractive-Type Preprocessing --- p.65
Chapter 4.2.1 --- Noise Estimation --- p.66
Chapter 4.2.2 --- Spectral Subtraction Algorithm --- p.66
Chapter 4.3 --- LP Analysis of Noisy Speech --- p.67
Chapter 4.3.1 --- LP Inverse Filtering: Whitening Process --- p.68
Chapter 4.3.2 --- Magnitude Response of All-pole Filter in Noisy Condition --- p.70
Chapter 4.3.3 --- Noise Spectral Reshaping --- p.72
Chapter 4.4 --- Distinctive Vocal Tract and Vocal Source Feature Extraction --- p.73
Chapter 4.4.1 --- Vocal Tract Feature Extraction --- p.73
Chapter 4.4.2 --- Source Feature Generation Procedure --- p.75
Chapter 4.4.3 --- Subband-specific Parameterization Method --- p.79
Chapter 4.5 --- Summary --- p.87
Chapter 5 --- Speaker Recognition Tasks & Performance Evaluation --- p.88
Chapter 5.1 --- Speaker Recognition Experimental Setup --- p.89
Chapter 5.1.1 --- Task Description --- p.89
Chapter 5.1.2 --- Baseline Experiments --- p.90
Chapter 5.1.3 --- Identification and Verification Results --- p.91
Chapter 5.2 --- Speaker Recognition using Source-tract Features --- p.92
Chapter 5.2.1 --- Source Feature Selection --- p.92
Chapter 5.2.2 --- Source-tract Feature Fusion --- p.94
Chapter 5.2.3 --- Identification and Verification Results --- p.95
Chapter 5.3 --- Performance Analysis --- p.98
Chapter 6 --- Conclusion --- p.102
Chapter 6.1 --- Discussion and Conclusion --- p.102
Chapter 6.2 --- Suggestion of Future Work --- p.104
APA, Harvard, Vancouver, ISO, and other styles
44

Zhang, Yu Long, and 張裕隆. "The analysis and recognition of human vocal emotions." Thesis, 1994. http://ndltd.ncl.edu.tw/handle/61261274473123342766.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Mai, Chun-Cheng, and 麥鈞程. "Convolutional Neural Networks for Vocal Password Recognition System." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/7r958r.

Full text
Abstract:
Master's thesis
National Taiwan University
Graduate Institute of Engineering Science and Ocean Engineering
Academic year 107 (ROC calendar, 2018-19)
Many different types of biometric systems have been developed out of the need for security and convenience. Biometric technology is based on the unique biological characteristics of each person, such as the fingerprint or the iris; the voice is also a biometric characteristic. One day, people may unlock their phones simply by talking to them, which would make life more convenient. Deep learning has become one of the most popular research topics since AlphaGo, and researchers have begun applying it to a wide variety of problems; convolutional neural networks are an important part of this development. This research proposes a vocal password recognition system based on a convolutional neural network: grayscale images generated from the speaker's voice signals are used as the network's input, and its classification output forms the basis of the vocal password recognition system.
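As an illustration of the described approach, here is a small sketch of a CNN over grayscale voice images in Keras. The input size, layer widths, and number of enrolled speakers are assumptions for illustration, not the network actually used in the thesis.

import tensorflow as tf
from tensorflow.keras import layers, models

n_classes = 10                                      # enrolled speakers (assumed)
model = models.Sequential([
    layers.Input(shape=(128, 128, 1)),              # grayscale "voice image"
    layers.Conv2D(16, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(n_classes, activation="softmax"),  # speaker decision
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()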
APA, Harvard, Vancouver, ISO, and other styles
46

"Use of vocal source features in speaker segmentation." 2006. http://library.cuhk.edu.hk/record=b5892857.

Full text
Abstract:
Chan Wai Nang.
Thesis (M.Phil.)--Chinese University of Hong Kong, 2006.
Includes bibliographical references (leaves 77-82).
Abstracts in English and Chinese.
Chapter 1 --- Introduction --- p.1
Chapter 1.1 --- Speaker recognition --- p.1
Chapter 1.2 --- State of the art of speaker recognition techniques --- p.2
Chapter 1.3 --- Motivations --- p.5
Chapter 1.4 --- Thesis outline --- p.6
Chapter 2 --- Acoustic Features --- p.8
Chapter 2.1 --- Speech production --- p.8
Chapter 2.1.1 --- Physiology of speech production --- p.8
Chapter 2.1.2 --- Source-filter model --- p.11
Chapter 2.2 --- Vocal tract and vocal source related acoustic features --- p.14
Chapter 2.3 --- Linear predictive analysis of speech --- p.15
Chapter 2.4 --- Features for speaker recognition --- p.16
Chapter 2.4.1 --- Vocal tract related features --- p.17
Chapter 2.4.2 --- Vocal source related features --- p.19
Chapter 2.5 --- Wavelet octave coefficients of residues (WOCOR) --- p.20
Chapter 3 --- Statistical approaches to speaker recognition --- p.24
Chapter 3.1 --- Statistical modeling --- p.24
Chapter 3.1.1 --- Classification and modeling --- p.24
Chapter 3.1.2 --- Parametric vs non-parametric --- p.25
Chapter 3.1.3 --- Gaussian mixture model (GMM) --- p.25
Chapter 3.1.4 --- Model estimation --- p.27
Chapter 3.2 --- Classification --- p.28
Chapter 3.2.1 --- Multi-class classification for speaker identification --- p.28
Chapter 3.2.2 --- Two-speaker recognition --- p.29
Chapter 3.2.3 --- Model selection by statistical model --- p.30
Chapter 3.2.4 --- Performance evaluation metric --- p.31
Chapter 4 --- Content dependency study of WOCOR and MFCC --- p.32
Chapter 4.1 --- Database: CU2C --- p.32
Chapter 4.2 --- Methods and procedures --- p.33
Chapter 4.3 --- Experimental results --- p.35
Chapter 4.4 --- Discussion --- p.36
Chapter 4.5 --- Detailed analysis --- p.39
Summary --- p.41
Chapter 5 --- Speaker Segmentation --- p.43
Chapter 5.1 --- Feature extraction --- p.43
Chapter 5.2 --- Statistical methods for segmentation and clustering --- p.44
Chapter 5.2.1 --- Segmentation by spectral difference --- p.44
Chapter 5.2.2 --- Segmentation by Bayesian information criterion (BIC) --- p.47
Chapter 5.2.3 --- Segment clustering by BIC --- p.49
Chapter 5.3 --- Baseline system --- p.50
Chapter 5.3.1 --- Algorithm --- p.50
Chapter 5.3.2 --- Speech database --- p.52
Chapter 5.3.3 --- Performance metric --- p.53
Chapter 5.3.4 --- Results --- p.58
Summary --- p.60
Chapter 6 --- Application of vocal source features in speaker segmentation --- p.61
Chapter 6.1 --- Discrimination power of WOCOR against MFCC --- p.61
Chapter 6.1.1 --- Experimental set-up --- p.62
Chapter 6.1.2 --- Results --- p.63
Chapter 6.2 --- Speaker segmentation using vocal source features --- p.67
Chapter 6.2.1 --- The construction of new proposed system --- p.67
Summary --- p.72
Chapter 7 --- Conclusions --- p.74
Reference --- p.77
APA, Harvard, Vancouver, ISO, and other styles
47

De Armas, Winston. "Vocal Frequency Estimation and Voicing State Prediction with Surface EMG Pattern Recognition." Thesis, 2013. http://hdl.handle.net/1807/35596.

Full text
Abstract:
Most electrolarynges do not allow hands-free use or pitch modulation. This study presents the potential of pattern recognition to support electrolarynx use by predicting fundamental frequency (F0) and voicing state (VS) from neck-surface EMG and respiratory trace. Respiratory trace and neck-surface EMG were collected from 10 normal adult males (18-60 years old) during different vocal tasks. Time-domain features were extracted from both signals, and a support vector machine (SVM) classifier was employed to model F0 and VS. An average mean-squared error (MSE) of 8.21 ± 3.5 semitones² was achieved for the estimation of vocal frequency. An average classification accuracy of 78.05 ± 6.3% was achieved for the prediction of voicing state from EMG, and 65.24 ± 7.8% from respiratory trace. Our results show that pattern classification of neck-muscle EMG and respiratory trace has merit for predicting F0 and VS during vocalization.
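The pattern-recognition step can be sketched with scikit-learn as follows. The specific time-domain features (mean absolute value, zero crossings, RMS) are common EMG choices assumed here for illustration, and the data are random stand-ins rather than real recordings.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def emg_features(frame):
    mav = np.mean(np.abs(frame))                             # mean absolute value
    zc = np.sum(np.diff(np.signbit(frame).astype(int)) != 0) # zero crossings
    rms = np.sqrt(np.mean(frame ** 2))
    return [mav, zc, rms]

rng = np.random.default_rng(0)
frames = rng.standard_normal((200, 256))        # 200 stand-in EMG frames
X = np.array([emg_features(f) for f in frames])
y = rng.integers(0, 2, size=200)                # voiced / unvoiced labels

clf = SVC(kernel="rbf")
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())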
APA, Harvard, Vancouver, ISO, and other styles
48

Love, Christopher D. "A speech recognition system using a neural network model for vocal shaping." 1991. http://hdl.handle.net/1993/18261.

Full text
APA, Harvard, Vancouver, ISO, and other styles
49

"Exploitation of phase and vocal excitation modulation features for robust speaker recognition." Thesis, 2011. http://library.cuhk.edu.hk/record=b6075192.

Full text
Abstract:
Mel-frequency cepstral coefficients (MFCCs) are widely adopted in speech recognition as well as speaker recognition applications. They are extracted primarily to characterize the spectral envelope of a quasi-stationary speech segment, and it has been shown that cepstral features are closely related to the linguistic content of speech. Besides the magnitude-based cepstral features, other resources in speech, e.g., the phase and the excitation source, are believed to contain useful properties for speaker discrimination. Moreover, in real situations, large variations exist between the development and application scenarios of a speaker recognition system, including channel mismatch, recording-apparatus mismatch, environmental variation, or even changes in the emotional or health state of speakers. As a consequence, magnitude-based features alone are insufficient to provide satisfactory and robust speaker recognition accuracy. The exploitation of features complementary to MFCCs may therefore provide one solution to alleviate this deficiency from a feature-based perspective.
Speaker recognition (SR) refers to the process of automatically determining or verifying the identity of a person based on his or her voice characteristics. In practical applications, a voice can be used as one of the modalities in a multimodal biometric system, or be the sole medium for identity authentication. The general area of speaker recognition encompasses two fundamental tasks: speaker identification and speaker verification.
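For reference, a minimal sketch of the magnitude-based MFCC baseline that the first paragraph of this abstract refers to, using librosa with typical parameter values assumed here (25 ms windows, 10 ms hop at 16 kHz), not the thesis's configuration:

import librosa

y, sr = librosa.load("utterance.wav", sr=16000)     # file name is illustrative
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13,
                            n_fft=400, hop_length=160)
print(mfcc.shape)   # (13, n_frames): spectral-envelope features per frame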
Wang, Ning.
Adviser: Pak-Chung Ching.
Source: Dissertation Abstracts International, Volume: 73-06, Section: B, page: .
Thesis (Ph.D.)--Chinese University of Hong Kong, 2011.
Includes bibliographical references (leaves 177-193).
Electronic reproduction. Hong Kong : Chinese University of Hong Kong, [2012] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Electronic reproduction. [Ann Arbor, MI] : ProQuest Information and Learning, [201-] System requirements: Adobe Acrobat Reader. Available via World Wide Web.
Abstract also in Chinese.
APA, Harvard, Vancouver, ISO, and other styles
50

Wan, Hung-Wei, and 萬鴻緯. "Deep Neural Network-based Anomaly Detection and Recognition in Endoscopic Images of Vocal Folds." Thesis, 2019. http://ndltd.ncl.edu.tw/handle/mtmv4t.

Full text
Abstract:
Master's thesis
Yuan Ze University
Department of Electrical Engineering (Group A)
Academic year 107 (ROC calendar, 2018-19)
With the great success of deep learning in computer vision in recent years, it has gradually been applied to many fields; this thesis combines it with biomedicine. Using deep learning on images, vocal fold medical images captured by endoscopes are analyzed to automatically detect the location of vocal fold disorders and distinguish their classes, a technique that can assist doctors in diagnosis. Unlike previous analyses of vocal fold medical images, which required manual selection of proper input images, we analyze every frame of the whole video automatically and retain only the key frames for subsequent computation. This not only reduces labor-intensive operations but also makes the method practical for clinical applications. We propose a two-stage system: stage 1 extracts the ROI from each frame of the vocal fold endoscopy videos, and stage 2 detects the location of vocal fold disorders in the extracted ROI with our deep learning model. Experiments show that our system achieves 79% sensitivity and 96% specificity in recognizing healthy versus unhealthy cases, and an accuracy of approximately 80% in classifying three disorder classes.
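Only the control flow of the two-stage design can be sketched from the abstract. Both stages are stubbed below, since the actual ROI detector and classifier are not specified here; the file name and the every-tenth-frame key-frame rule are assumptions for illustration.

import cv2

def extract_roi(frame):
    # Stage 1 stand-in: centre crop instead of a learned ROI extractor.
    h, w = frame.shape[:2]
    return frame[h // 4: 3 * h // 4, w // 4: 3 * w // 4]

def classify_roi(roi):
    # Stage 2 stand-in: returns a dummy label; a trained model goes here.
    return "healthy"

cap = cv2.VideoCapture("endoscopy.mp4")
frame_idx, results = 0, []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if frame_idx % 10 == 0:            # crude key-frame selection
        roi = extract_roi(frame)
        results.append((frame_idx, classify_roi(roi)))
    frame_idx += 1
cap.release()
print(results[:5])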
APA, Harvard, Vancouver, ISO, and other styles
