Academic literature on the topic 'Vocal recognition'

Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the lists of relevant articles, books, theses, conference reports, and other scholarly sources on the topic 'Vocal recognition.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Journal articles on the topic "Vocal recognition"

1

Guo, Taiyang, Zhi Zhu, Shunsuke Kidani, and Masashi Unoki. "Contribution of Common Modulation Spectral Features to Vocal-Emotion Recognition of Noise-Vocoded Speech in Noisy Reverberant Environments." Applied Sciences 12, no. 19 (October 4, 2022): 9979. http://dx.doi.org/10.3390/app12199979.

Full text
Abstract:
In one study on vocal emotion recognition using noise-vocoded speech (NVS), the high similarities between modulation spectral features (MSFs) and the results of vocal-emotion-recognition experiments indicated that MSFs contribute to vocal emotion recognition in a clean environment (with no noise and no reverberation). Other studies also clarified that vocal emotion recognition using NVS is not affected by noisy reverberant environments (signal-to-noise ratio is greater than 10 dB and reverberation time is less than 1.0 s). However, the contribution of MSFs to vocal emotion recognition in noisy reverberant environments is still unclear. We aimed to clarify whether MSFs can be used to explain the vocal-emotion-recognition results in noisy reverberant environments. We analyzed the results of vocal-emotion-recognition experiments and used an auditory-based modulation filterbank to calculate the modulation spectrograms of NVS. We then extracted ten MSFs as higher-order statistics of modulation spectrograms. As shown from the relationship between MSFs and vocal-emotion-recognition results, except for extremely high noisy reverberant environments, there were high similarities between MSFs and the vocal emotion recognition results in noisy reverberant environments, which indicates that MSFs can be used to explain such results in noisy reverberant environments. We also found that there are two common MSFs (MSKTk (modulation spectral kurtosis) and MSTLk (modulation spectral tilt)) that contribute to vocal emotion recognition in all daily environments.
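For readers who want to experiment with modulation spectral features, the sketch below shows one minimal way to compute two of the statistics named above (modulation spectral kurtosis and tilt) for a single acoustic band. It uses a plain Hilbert-envelope and FFT-based modulation spectrum instead of the auditory modulation filterbank used in the paper, and the sampling rate, band signal, and envelope rate are invented for the example.

```python
# Illustrative sketch only: modulation spectral kurtosis and tilt for one band.
# Not the authors' auditory-filterbank pipeline; settings are arbitrary examples.
import numpy as np
from scipy.signal import hilbert
from scipy.stats import kurtosis

def modulation_spectrum(band_signal, fs, env_fs=200):
    """Hilbert temporal envelope, crudely downsampled, then its magnitude spectrum."""
    envelope = np.abs(hilbert(band_signal))
    step = max(1, int(fs // env_fs))
    envelope = envelope[::step]
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    mod_freqs = np.fft.rfftfreq(envelope.size, d=step / fs)
    return mod_freqs, spectrum

def msf_kurtosis_and_tilt(mod_freqs, spectrum):
    """Kurtosis of the modulation spectrum and the slope (tilt) of a linear fit to it."""
    return kurtosis(spectrum), np.polyfit(mod_freqs, spectrum, deg=1)[0]

if __name__ == "__main__":
    fs = 16000
    t = np.arange(0, 1.0, 1 / fs)
    # toy "band signal": a 1 kHz carrier with 4 Hz amplitude modulation
    x = (1 + 0.5 * np.sin(2 * np.pi * 4 * t)) * np.sin(2 * np.pi * 1000 * t)
    print(msf_kurtosis_and_tilt(*modulation_spectrum(x, fs)))
```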
APA, Harvard, Vancouver, ISO, and other styles
2

Sorokin, V. N., and I. S. Makarov. "Gender recognition from vocal source." Acoustical Physics 54, no. 4 (July 2008): 571–78. http://dx.doi.org/10.1134/s1063771008040192.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Huang, Chunyuan. "Vocal Music Teaching Pharyngeal Training Method Based on Audio Extraction by Big Data Analysis." Wireless Communications and Mobile Computing 2022 (May 6, 2022): 1–11. http://dx.doi.org/10.1155/2022/4572904.

Full text
Abstract:
In the process of vocal music learning, incorrect vocalization methods and excessive use of voice have brought many problems to the voice and accumulated a lot of inflammation, so that the level of vocal music learning stagnated or even declined. How to find a way to improve yourself without damaging your voice has become a problem that we have been pursuing. Therefore, it is of great practical significance for vocal music teaching in normal universities to conduct in-depth research and discussion on “pharyngeal singing.” Based on audio extraction, this paper studies the vocal music teaching pharyngeal training method. Different methods of vocal music teaching pharyngeal training have different times. When the recognition amount is 3, the average recognition time of vocal music teaching pharyngeal training based on data mining is 0.010 seconds, the average recognition time of vocal music teaching pharyngeal training based on Internet of Things is 0.011 seconds, and the average recognition time of vocal music teaching pharyngeal training based on audio extraction is 0.006 seconds. The recognition time of the audio extraction method is much shorter than that of the other two traditional methods, because the audio extraction method can perform segmented training according to the changing trend of physical characteristics of notes, effectively extract the characteristics of vocal music teaching pharyngeal training, and shorten the recognition time. The learning of “pharyngeal singing” in vocal music teaching based on audio extraction is different from general vocal music training. It has its unique theory, concept, law, and sound image. In order to “liberate your voice,” it adopts large-capacity and large-scale training methods.
APA, Harvard, Vancouver, ISO, and other styles
4

Mo, Wenwen, and Yuan Yuan. "Design of Interactive Vocal Guidance and Artistic Psychological Intervention System Based on Emotion Recognition." Occupational Therapy International 2022 (June 17, 2022): 1–9. http://dx.doi.org/10.1155/2022/1079097.

Full text
Abstract:
The research on artistic psychological intervention to judge emotional fluctuations by extracting emotional features from interactive vocal signals has become a research topic with great potential for development. Based on the interactive vocal music instruction theory of emotion recognition, this paper studies the design of artistic psychological intervention system. This paper uses the vocal music emotion recognition algorithm to first train the interactive recognition network, in which the input is a row vector composed of different vocal music characteristics, and finally recognizes the vocal music of different emotional categories, which solves the problem of low data coupling in the artistic psychological intervention system. Among them, the vocal music emotion recognition experiment based on the interactive recognition network is mainly carried out from six aspects: the number of iterative training, the vocal music instruction rate, the number of emotion recognition signal nodes in the artistic psychological intervention layer, the number of sample sets, different feature combinations, and the number of emotion types. The input data of the system is a training class learning video, and actions and expressions need to be recognized before scoring. In the simulation process, before the completion of the sample indicators is unbalanced, the R language statistical analysis tool is used to balance the existing unbalanced data based on the artificial data synthesis method, and 279 uniformly classified samples are obtained. The 279 ∗ 7 dataset was used for statistical identification of the participants. The experimental results show that under the guidance of four different interactive vocal music, the vocal emotion recognition rate is between 65.85%-91.00%, which promotes the intervention of music therapy on artistic psychological intervention.
APA, Harvard, Vancouver, ISO, and other styles
5

Bryant, Gregory, and H. Clark Barrett. "Vocal Emotion Recognition Across Disparate Cultures." Journal of Cognition and Culture 8, no. 1-2 (2008): 135–48. http://dx.doi.org/10.1163/156770908x289242.

Full text
Abstract:
There exists substantial cultural variation in how emotions are expressed, but there is also considerable evidence for universal properties in facial and vocal affective expressions. This is the first empirical effort examining the perception of vocal emotional expressions across cultures with little common exposure to sources of emotion stimuli, such as mass media. Shuar hunter-horticulturalists from Amazonian Ecuador were able to reliably identify happy, angry, fearful and sad vocalizations produced by American native English speakers by matching emotional spoken utterances to emotional expressions portrayed in pictured faces. The Shuar performed similarly to English speakers who heard the same utterances in a content-filtered condition. These data support the hypothesis that vocal emotional expressions of basic affective categories manifest themselves in similar ways across quite disparate cultures.
APA, Harvard, Vancouver, ISO, and other styles
6

Masapollo, Matthew, Linda Polka, Lucie Menard, and Athena Vouloumanos. "Infant recognition of infant vocal signals." Journal of the Acoustical Society of America 133, no. 5 (May 2013): 3334. http://dx.doi.org/10.1121/1.4805602.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Konev, Anton, Evgeny Kostyuchenko, and Alexey Yakimuk. "The program complex for vocal recognition." Journal of Physics: Conference Series 803 (January 2017): 012077. http://dx.doi.org/10.1088/1742-6596/803/1/012077.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Sorokin, V. N., A. A. Tananykin, and V. G. Trunov. "Speaker recognition using vocal source model." Pattern Recognition and Image Analysis 24, no. 1 (March 2014): 156–73. http://dx.doi.org/10.1134/s1054661814010179.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Sorokin, V. N. "Vocal Source Contribution to Speaker Recognition." Pattern Recognition and Image Analysis 28, no. 3 (July 2018): 546–56. http://dx.doi.org/10.1134/s1054661818030197.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Houde, Robert A., and James M. Hillenbrand. "Vocal tract normalization for vowel recognition." Journal of the Acoustical Society of America 121, no. 5 (May 2007): 3189. http://dx.doi.org/10.1121/1.4782401.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Dissertations / Theses on the topic "Vocal recognition"

1

Mathukumalli, Sravya. "Vocal combo android application." Kansas State University, 2016. http://hdl.handle.net/2097/32505.

Full text
Abstract:
Master of Science
Department of Computing and Information Sciences
Mitchell L. Neilsen
Nowadays people from various backgrounds need different information on demand. People are relying on the web as a source of the information they need, but it is not possible to be connected to the internet all the time with a computer system. Android, being open source, has already made its mark in mobile application development. The largest smartphone user base and the ease of developing applications in Android are added advantages for users as well as Android developers. Vocal Combo is an Android application which provides the required functionality on an Android-supported smartphone or tablet. This provides the flexibility of accessing information at the users’ fingertips. The application is built using the Android SDK, which makes it easy to deploy on any Android-powered device. Vocal Combo is a combination of voice-based applications. It includes a Text-To-Voice converter and a Voice-To-Text converter. The application helps the user learn the pronunciation of various words; at the same time, the user can also check his/her pronunciation skills. It also provides a meaning-check function, where the user can look up the meaning of the words he types in or speaks out. At any point in time, the user can review the history of the words for which he has checked the meaning or pronunciation. The application also provides guidance to the user on how to use it.
APA, Harvard, Vancouver, ISO, and other styles
2

Benkrid, A. "Real time TLM vocal tract modelling." Thesis, University of Nottingham, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.352958.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Furuyama, Takafumi (古山 貴文). "Vocal recognition in primate : comparison between Japanese macaques and humans." Thesis, https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13045021/?lang=0, 2017. https://doors.doshisha.ac.jp/opac/opac_link/bibid/BB13045021/?lang=0.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Full text
Abstract:
The speech signal is basically meant to carry the information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., to find the identity of a person using his/her speech from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems, it is very rarely used as it cannot be reliably extracted, and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, the speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames), and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, the pitch estimation methods are not very reliable. They classify some of the frames as unvoiced when they are really voiced. Also, they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method which does not use the pitch value directly as a feature and which works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
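As a side note for implementers, the maximum autocorrelation value (MACV) idea described above can be sketched in a few lines. The snippet below is only an illustration: the lag ranges, frame length, and sampling rate are assumptions chosen for the example, not the configuration used in the thesis.

```python
# Minimal MACV-style sketch: maxima of the normalized autocorrelation within a few
# lag sub-ranges spanning plausible pitch periods (all values are assumed examples).
import numpy as np

def macv_features(frame, fs=8000, fmin=60.0, fmax=400.0, n_bands=3):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0:
        return np.zeros(n_bands)                      # silent frame
    ac = ac / ac[0]                                   # normalize so lag 0 equals 1
    lag_min = int(fs / fmax)                          # shortest plausible pitch period
    lag_max = min(int(fs / fmin), len(ac) - 1)        # longest plausible pitch period
    edges = np.linspace(lag_min, lag_max, n_bands + 1).astype(int)
    return np.array([ac[edges[i]:edges[i + 1]].max() for i in range(n_bands)])

if __name__ == "__main__":
    fs = 8000
    t = np.arange(0, 0.032, 1 / fs)                   # one 32 ms frame
    voiced = np.sign(np.sin(2 * np.pi * 120 * t))     # crude 120 Hz "voiced" frame
    unvoiced = np.random.default_rng(0).standard_normal(t.size)
    print("voiced  :", macv_features(voiced, fs))
    print("unvoiced:", macv_features(unvoiced, fs))
```

In a full system, features of this kind would be appended to each frame's cepstral feature vector before the speaker models are trained.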
APA, Harvard, Vancouver, ISO, and other styles
5

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features." Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Full text
Abstract:
The speech signal is basically meant to carry the information about the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., to find the identity of a person using his/her speech from a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems, it is very rarely used as it cannot be reliably extracted, and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. There are two main reasons why the direct use of pitch as a feature does not work for speaker recognition. First, the speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames), and the problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, the pitch estimation methods are not very reliable. They classify some of the frames as unvoiced when they are really voiced. Also, they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method which does not use the pitch value directly as a feature and which works for voiced as well as unvoiced frames in a reliable manner. We propose here a method which uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from the pitch doubling or halving type of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
6

Keenan, Sumir. "Identity information in bonobo vocal communication : from sender to receiver." Thesis, Lyon, 2016. http://www.theses.fr/2016LYSES038/document.

Full text
Abstract:
Identity information is vital for highly social species as it facilitates individual recognition and allows for differentiation between social partners in many contexts, such as dominance hierarchies, territorial defence, mating, parent-offspring identification, and group cohesion and coordination. In many species vocalisations can be the most effective communication channel through complex environments and over long distances, and are encoded with the stable features of an individual’s voice. Associations between these individual vocal signatures and accumulated social knowledge about conspecifics can greatly increase an animal’s fitness, as it facilitates adaptively constructive social decisions. This thesis investigates the encoding and decoding of identity information in the vocal communication system of the bonobo, Pan paniscus. We firstly investigated the stability of vocal signatures across the five most common call types in the bonobo vocal repertoire. Results showed that while all call types have the potential to code identity information, loud calls used during times of high arousal and for distance communication have the strongest individual vocal signatures. Following the first study, we investigated whether social familiarity and relatedness affect the acoustic features that code individual information in the bark call type. Overall, we found strong evidence for vocal convergence, and specifically, that individuals who are related and familiar, independently from one another, are more vocally similar to one another than unrelated and unfamiliar individuals. In a final study we tested whether bonobos are capable of using the encoded identity information to recognise past group members that they no longer live with. Through a series of playback experiments we demonstrated that bonobos are capable of recognising familiar individuals from vocalisations alone, even after years of separation. Collectively, the results of this thesis show that the encoding and decoding of identity information in bonobo vocalisations is a dynamic system, subject to modification through social processes but robust enough to allow for individual recognition over time. In conclusion, these studies contribute to a better understanding of the vocal communication system of a non-human primate species with a unique and complex social network.
APA, Harvard, Vancouver, ISO, and other styles
7

Ma, Zongqiang. "Spontaneous speech recognition using statistical dynamic models for the vocal-tract-resonance dynamics." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0020/NQ53993.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Yuditskaya, Sophia. "Automatic vocal recognition of a child's perceived emotional state within the Speechome corpus." Thesis, Massachusetts Institute of Technology, 2010. http://hdl.handle.net/1721.1/62086.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, School of Architecture and Planning, Program in Media Arts and Sciences, 2010.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 137-149).
With over 230,000 hours of audio/video recordings of a child growing up in the home setting from birth to the age of three, the Human Speechome Project has pioneered a comprehensive, ecologically valid observational dataset that introduces far-reaching new possibilities for the study of child development. By offering in vivo observation of a child's daily life experience at ultra-dense, longitudinal time scales, the Speechome corpus holds great potential for discovering developmental insights that have thus far eluded observation. The work of this thesis aspires to enable the use of the Speechome corpus for empirical study of emotional factors in early child development. To fully harness the benefits of Speechome for this purpose, an automated mechanism must be created to perceive the child's emotional state within this medium. Due to the latent nature of emotion, we sought objective, directly measurable correlates of the child's perceived emotional state within the Speechome corpus, focusing exclusively on acoustic features of the child's vocalizations and surrounding caretaker speech. Using Partial Least Squares regression, we applied these features to build a model that simulates human perceptual heuristics for determining a child's emotional state. We evaluated the perceptual accuracy of models built across child-only, adult-only, and combined feature sets within the overall sampled dataset, as well as controlling for social situations, vocalization behaviors (e.g. crying, laughing, babble), individual caretakers, and developmental age between 9 and 24 months. Child and combined models consistently demonstrated high perceptual accuracy, with overall adjusted R-squared values of 0.54 and 0.58, respectively, and an average of 0.59 and 0.67 per month. Comparative analysis across longitudinal and socio-behavioral contexts yielded several notable developmental and dyadic insights. In the process, we have developed a data mining and analysis methodology for modeling perceived child emotion and quantifying caretaker intersubjectivity that we hope to extend to future datasets across multiple children, as new deployments of the Speechome recording technology are established. Such large-scale comparative studies promise an unprecedented view into the nature of emotional processes in early childhood and potentially enlightening discoveries about autism and other developmental disorders.
by Sophia Yuditskaya.
S.M.
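As a hedged illustration of the Partial Least Squares regression step named in the abstract above, the snippet below fits a PLS model from synthetic acoustic-style features to a continuous perceived-emotion rating. The feature dimensions, sample counts, and labels are invented for the example and do not reproduce the Speechome data or features.

```python
# Hedged illustration: PLS regression from acoustic-style features to a perceived
# emotion rating, on purely synthetic data (dimensions invented for the example).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 40))                 # 300 vocalizations, 40 features
y = X @ rng.standard_normal(40) + 0.5 * rng.standard_normal(300)   # synthetic rating

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
pls = PLSRegression(n_components=5).fit(X_tr, y_tr)
print("held-out R^2:", r2_score(y_te, pls.predict(X_te).ravel()))
```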
APA, Harvard, Vancouver, ISO, and other styles
9

Schall, Sonja. "The face in your voice–how audiovisual learning benefits vocal communication." Doctoral thesis, Humboldt-Universität zu Berlin, Mathematisch-Naturwissenschaftliche Fakultät II, 2014. http://dx.doi.org/10.18452/17039.

Full text
Abstract:
Face and voice of a person are strongly associated with each other and usually perceived as a single entity. Despite the natural co-occurrence of faces and voices, brain research has traditionally approached their perception from a unisensory perspective. This means that research into face perception has exclusively focused on the visual system, while research into voice perception has exclusively probed the auditory system. In this thesis, I suggest that the brain has adapted to the multisensory nature of faces and voices and that this adaptation is evident even when one input stream is missing, that is, when input is actually unisensory. Specifically, the current work investigates how the brain exploits previously learned voice-face associations to optimize the auditory processing of voices and vocal speech. Three empirical studies providing spatiotemporal brain data, via functional magnetic resonance imaging (fMRI) and magnetoencephalography (MEG), constitute this thesis. All data were acquired while participants listened to auditory-only speech samples of previously familiarized speakers (with or without seeing the speakers’ faces). Three key findings demonstrate that previously learned visual speaker information supports the auditory analysis of vocal sounds: (i) face-sensitive areas were part of the sensory network activated by voices, (ii) the auditory analysis of voices was temporally facilitated by learned facial associations, and (iii) multisensory interactions between face- and voice/speech-sensitive regions were increased. The current work challenges traditional unisensory views on vocal perception and rather suggests that voice and vocal speech perception profit from a multisensory neural processing scheme.
APA, Harvard, Vancouver, ISO, and other styles
10

Garau, Giulia. "Speaker normalisation for large vocabulary multiparty conversational speech recognition." Thesis, University of Edinburgh, 2009. http://hdl.handle.net/1842/3983.

Full text
Abstract:
One of the main problems faced by automatic speech recognition is the variability of the testing conditions. This is due both to the acoustic conditions (different transmission channels, recording devices, noises etc.) and to the variability of speech across different speakers (i.e. due to different accents, coarticulation of phonemes and different vocal tract characteristics). Vocal tract length normalisation (VTLN) aims at normalising the acoustic signal, making it independent from the vocal tract length. This is done by a speaker specific warping of the frequency axis parameterised through a warping factor. In this thesis the application of VTLN to multiparty conversational speech was investigated focusing on the meeting domain. This is a challenging task showing a great variability of the speech acoustics both across different speakers and across time for a given speaker. VTL, the distance between the lips and the glottis, varies over time. We observed that the warping factors estimated using Maximum Likelihood seem to be context dependent: appearing to be influenced by the current conversational partner and being correlated with the behaviour of formant positions and the pitch. This is because VTL also influences the frequency of vibration of the vocal cords and thus the pitch. In this thesis we also investigated pitch-adaptive acoustic features with the goal of further improving the speaker normalisation provided by VTLN. We explored the use of acoustic features obtained using a pitch-adaptive analysis in combination with conventional features such as Mel frequency cepstral coefficients. These spectral representations were combined both at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA), and at the system level using ROVER. We evaluated this approach on a challenging large vocabulary speech recognition task: multiparty meeting transcription. We found that VTLN benefits the most from pitch-adaptive features. Our experiments also suggested that combining conventional and pitch-adaptive acoustic features using HLDA results in a consistent, significant decrease in the word error rate across all the tasks. Combining at the system level using ROVER resulted in a further significant improvement. Further experiments compared the use of pitch adaptive spectral representation with the adoption of a smoothed spectrogram for the extraction of cepstral coefficients. It was found that pitch adaptive spectral analysis, providing a representation which is less affected by pitch artefacts (especially for high pitched speakers), delivers features with an improved speaker independence. Furthermore this has also shown to be advantageous when HLDA is applied. The combination of a pitch adaptive spectral representation and VTLN based speaker normalisation in the context of LVCSR for multiparty conversational speech led to more speaker independent acoustic models improving the overall recognition performances.
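The speaker-specific warping factor summarised in this abstract can be illustrated with a short sketch. The code below applies a common piecewise-linear VTLN warp to a set of filterbank centre frequencies; the cutoff ratio, frequencies, and warp factors are assumptions for the example and may differ from the exact scheme used in the thesis.

```python
# Minimal sketch of piecewise-linear VTLN frequency warping (a common scheme;
# not necessarily the thesis's exact variant). alpha != 1 stretches or compresses.
import numpy as np

def vtln_warp(freqs, alpha, f_nyquist, f_cut_ratio=0.85):
    """Warp f -> alpha * f up to a cutoff, then linearly so the Nyquist maps to itself."""
    freqs = np.asarray(freqs, dtype=float)
    f_cut = f_cut_ratio * f_nyquist
    warped = alpha * freqs
    hi = freqs > f_cut
    slope = (f_nyquist - alpha * f_cut) / (f_nyquist - f_cut)
    warped[hi] = alpha * f_cut + slope * (freqs[hi] - f_cut)
    return warped

if __name__ == "__main__":
    centers = np.linspace(100, 8000, 8)            # toy filterbank centre frequencies
    for alpha in (0.9, 1.0, 1.1):                  # typical warp-factor search range
        print(alpha, np.round(vtln_warp(centers, alpha, f_nyquist=8000)))
```

In a maximum-likelihood VTLN setup of the kind described above, features warped with each candidate factor would be scored against the acoustic model and the best-scoring factor kept for each speaker.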
APA, Harvard, Vancouver, ISO, and other styles

Books on the topic "Vocal recognition"

1

Davé, Shilpa S. Introduction. University of Illinois Press, 2017. http://dx.doi.org/10.5406/illinois/9780252037405.003.0001.

Full text
Abstract:
This introductory chapter first sets out the book's purpose, which is to examine the representations and stereotypes of South Asian Americans in relation to immigrant narratives of assimilation in American film and television. It theorizes the performance of accent as a means of representing race and particularly national origin beyond visual identification. For South Asians, accent simultaneously connotes difference and privilege. To focus on an Indian vocal accent is to reconsider racialization predicated on visual recognition. The remainder of the chapter discusses vocal accents and racial hierarchies; South Asian American and Indian American identities; popular Culture, Orientalism, and racial performance; and comedy and racial performance. It concludes with an overview of the subsequent chapters.
APA, Harvard, Vancouver, ISO, and other styles
2

Brandzel, Amy L. Legal Detours of U.S. Empire. University of Illinois Press, 2017. http://dx.doi.org/10.5406/illinois/9780252040030.003.0004.

Full text
Abstract:
This chapter uses Rice v. Cayetano (2000), a Supreme Court case involving a white citizen's challenge to Native Hawaiian representation, as a springboard to explore how race and coloniality are set up as oppositional, anti-intersectional politics. Kanaka Maoli and other indigenous scholars and activists have been quite vocal in critiquing the ways in which the discourse of civil rights and racism serves to obscure and undermine sovereignty claims and critiques of colonialism. The chapter adds to these critiques by demonstrating how the combination of legal and historical discourses sets up a battle between the recognition of racism and the recognition of settler colonialism. It illuminates how discourses of citizenship, law, and history collude to (re)produce the misrecognition and disaggregation of anticolonialist and antiracist endeavors.
APA, Harvard, Vancouver, ISO, and other styles
3

Provine, Robert R. Beyond the Smile. Oxford University Press, 2017. http://dx.doi.org/10.1093/acprof:oso/9780190613501.003.0011.

Full text
Abstract:
With the expectation that innovation, insight, and discovery will come from researching neglected topics, this chapter explores human instincts, including yawning, laughing, vocal crying, emotional tearing, coughing, nausea and vomiting, itching and scratching, and changes in scleral color. The critical change approach is exploited to analyze recently evolved, uniquely human traits (e.g., human-type laughter and speech, emotional tearing, scleral color cues) and compare them with their primate antecedents, seeking the specific neurological, glandular, and muscular processes responsible for their genesis. Particular attention is paid to contagious behaviors, with the anticipation that they may reveal the roots of sociality and empathy. Few of these curious behaviors are traditionally considered in the context of facial expression or emotion, but they deserve recognition for what they can contribute to behavioral neuroscience and social biology.
APA, Harvard, Vancouver, ISO, and other styles
4

Sampson, Brett G., and Andrew D. Bersten. Therapeutic approach to bronchospasm and asthma. Oxford University Press, 2016. http://dx.doi.org/10.1093/med/9780199600830.003.0111.

Full text
Abstract:
The optimal management of bronchospasm and acute asthma is reliant upon confirmation of the diagnosis of asthma, detection of life-threatening complications, recognition of β2 agonist toxicity, and exclusion of important asthma mimics (such as vocal cord dysfunction and left ventricular failure). β2 agonists, anticholinergics, and corticosteroids are the mainstay of treatment. β2 agonists should be preferentially administered by metered dose inhaler via a spacer, and corticosteroids by the oral route, reserving nebulized (and intravenous) salbutamol, as well as intravenous hydrocortisone, for situations when these routes are not possible. A single intravenous dose of magnesium may be of benefit in severe asthma, but repeat dosing is likely to cause serious side effects. Parenteral administration of adrenaline may prevent the need for intubation in the patient in extremis. Aminophylline has an unfavourable side effect profile and has not been shown to offer additional benefit in adults. However, it does have a role in paediatric asthma. Unproven medical therapies with potential benefit include ketamine, heliox, inhalational anaesthetics, and leukotriene antagonists. The need for ventilatory support is usually preceded by worsening dynamic hyperinflation, exhaustion, hypoxia, reduced conscious state, or a combination of these. While non-invasive ventilation may have a temporizing role to allow time for response to medical therapy, there is insufficient evidence for its use, and it should not delay invasive ventilation. If invasive ventilation is indicated, a strategy of hypoventilation and permissive hypercapnoea minimizes barotrauma and dynamic hyperinflation. Extracorporeal support may have a role as a rescue therapy.
APA, Harvard, Vancouver, ISO, and other styles
5

Kayser, Casey. Marginalized. University Press of Mississippi, 2021. http://dx.doi.org/10.14325/mississippi/9781496835901.001.0001.

Full text
Abstract:
In contrast to other literary genres, drama has received little attention in southern studies, and women playwrights in general receive less recognition than their male counterparts. This book addresses these gaps in its examination of the work of southern women playwrights, making the argument that representations of the American South on stage are complicated by difficulties of identity, genre, and region. Success in American drama is defined as having a play staged in the capital of theatre culture, New York City, the city that might be viewed as most antithetical to the South in terms of geography and ideology. Further, women playwrights, women playwrights of color, and those who express queer identities have been vocal about persistent inequities in American theatre which have created obstacles to their success. Drama creates unique problems for playwrights through its concentrated focus on place, dialect, and character; the multiple layers of authorship; the collective reception format; and the demand for exaggeration within production. These issues, as they interact with regional conditions and perceptions, pose problems for southern women playwrights in navigating how to represent a marginalized region on the stage. Through analysis of the dramatic texts, the rhetoric of reviews of productions, as well as what the playwrights themselves have said about their plays and its productions, this book delineates these challenges and argues that playwrights confront obstacles through various conscious strategies. These approaches lead audiences to reconsider monolithic understandings of northern and southern regions and ultimately, they create new visions of the South.
APA, Harvard, Vancouver, ISO, and other styles

Book chapters on the topic "Vocal recognition"

1

Gautier, Jean-Pierre, and Annie Gautier-Hion. "Vocal Quavering: A Basis for Recognition in Forest Guenons." In Primate Vocal Communication, 15–30. Berlin, Heidelberg: Springer Berlin Heidelberg, 1988. http://dx.doi.org/10.1007/978-3-642-73769-5_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Wang, Ning, P. C. Ching, and Tan Lee. "Exploration of Phase and Vocal Excitation Modulation Features for Speaker Recognition." In Biometric Recognition, 251–59. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. http://dx.doi.org/10.1007/978-3-642-35136-5_31.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Krothapalli, Sreenivasa Rao, and Shashidhar G. Koolagudi. "Emotion Recognition Using Vocal Tract Information." In SpringerBriefs in Electrical and Computer Engineering, 67–78. New York, NY: Springer New York, 2012. http://dx.doi.org/10.1007/978-1-4614-5143-3_4.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Tholeti, Guru Sree Ram, Deepika Ghanta, N. V. S. Guru Sai Sarma Chilukuri, and Shahana Bano. "Vocal Source Builds Divergence in Gender Recognition." In Lecture Notes in Electrical Engineering, 171–83. Singapore: Springer Singapore, 2021. http://dx.doi.org/10.1007/978-981-16-3690-5_16.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Stoddard, Philip Kraft. "20. Vocal Recognition of Neighbors by Territorial Passerines." In Ecology and Evolution of Acoustic Communication in Birds, edited by Donald E. Kroodsma and Edward H. Miller, 356–74. Ithaca, NY: Cornell University Press, 2020. http://dx.doi.org/10.7591/9781501736957-028.

Full text
APA, Harvard, Vancouver, ISO, and other styles
6

Ouhnini, Ahmed, Brahim Aksasse, and Mohammed Ouanan. "Vocal Parameters Analysis for Amazigh Phonemes Recognition System." In Digital Technologies and Applications, 43–53. Cham: Springer International Publishing, 2022. http://dx.doi.org/10.1007/978-3-031-01942-5_5.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Zimmermann, Axel. "Artificial Neural Networks for Analysis and Recognition of Primate Vocal Communication." In Current Topics in Primate Vocal Communication, 29–46. Boston, MA: Springer US, 1995. http://dx.doi.org/10.1007/978-1-4757-9930-9_2.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Charrier, Isabelle. "Mother–Offspring Vocal Recognition and Social System in Pinnipeds." In Coding Strategies in Vertebrate Acoustic Communication, 231–46. Cham: Springer International Publishing, 2020. http://dx.doi.org/10.1007/978-3-030-39200-0_9.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Agarwal, Gaurav, Vikas Maheshkar, Sushila Maheshkar, and Sachi Gupta. "Vocal Mood Recognition: Text Dependent Sequential and Parallel Approach." In Advances in Intelligent Systems and Computing, 131–42. Singapore: Springer Singapore, 2018. http://dx.doi.org/10.1007/978-981-13-1819-1_14.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Asawa, Krishna, and Raj Vardhan. "HMM Modeling of User Mood through Recognition of Vocal Emotions." In Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering, 229–38. Berlin, Heidelberg: Springer Berlin Heidelberg, 2013. http://dx.doi.org/10.1007/978-3-642-36642-0_23.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Conference papers on the topic "Vocal recognition"

1

Mayorga, P., and L. Besacier. "Voice over IP and Vocal Recognition." In 2006 3rd International Conference on Electrical and Electronics Engineering. IEEE, 2006. http://dx.doi.org/10.1109/iceee.2006.251861.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Masapollo, Matthew, Linda Polka, Lucie Menard, and Athena Vouloumanos. "Infant recognition of infant vocal signals." In ICA 2013 Montreal. ASA, 2013. http://dx.doi.org/10.1121/1.4798777.

Full text
APA, Harvard, Vancouver, ISO, and other styles
3

Zhang, Yichi, and Zhiyao Duan. "Retrieving sounds by vocal imitation recognition." In 2015 IEEE 25th International Workshop on Machine Learning for Signal Processing (MLSP). IEEE, 2015. http://dx.doi.org/10.1109/mlsp.2015.7324316.

Full text
APA, Harvard, Vancouver, ISO, and other styles
4

Murthy, Y. V. Srinivasa, Shashidhar G. Koolagudi, and Vishnu G. Swaroop. "Vocal and Non-vocal Segmentation based on the Analysis of Formant Structure." In 2017 Ninth International Conference on Advances in Pattern Recognition (ICAPR). IEEE, 2017. http://dx.doi.org/10.1109/icapr.2017.8593164.

Full text
APA, Harvard, Vancouver, ISO, and other styles
5

Hong, Alexander, Yuma Tsuboi, Goldie Nejat, and Beno Benhabib. "Multimodal Affect Recognition for Assistive Human-Robot Interactions." In 2017 Design of Medical Devices Conference. American Society of Mechanical Engineers, 2017. http://dx.doi.org/10.1115/dmd2017-3332.

Full text
Abstract:
Socially assistive robots can provide cognitive assistance with activities of daily living, and promote social interactions to those suffering from cognitive impairments and/or social disorders. They can be used as aids for a number of different populations including those living with dementia or autism spectrum disorder, and for stroke patients during post-stroke rehabilitation [1]. Our research focuses on developing socially assistive intelligent robots capable of partaking in natural human-robot interactions (HRI). In particular, we have been working on the emotional aspects of the interactions to provide engaging settings, which in turn lead to better acceptance by the intended users. Herein, we present a novel multimodal affect recognition system for the robot Luke, Fig. 1(a), to engage in emotional assistive interactions. Current multimodal affect recognition systems mainly focus on inputs from facial expressions and vocal intonation [2], [3]. Body language has also been used to determine human affect during social interactions, but has yet to be explored in the development of multimodal recognition systems. Body language has been strongly correlated to vocal intonation [4]. The combined modalities provide emotional information due to the temporal development underlying the neural interaction in audiovisual perception [5]. In this paper, we present a novel multimodal recognition system that uniquely combines inputs from both body language and vocal intonation in order to autonomously determine user affect during assistive HRI.
APA, Harvard, Vancouver, ISO, and other styles
6

Alghowinem, Sharifa, Roland Goecke, Julien Epps, Michael Wagner, and Jeffrey Cohn. "Cross-Cultural Depression Recognition from Vocal Biomarkers." In Interspeech 2016. ISCA, 2016. http://dx.doi.org/10.21437/interspeech.2016-1339.

Full text
APA, Harvard, Vancouver, ISO, and other styles
7

Gu, Yu, Eric Postma, and Hai-Xiang Lin. "Vocal Emotion Recognition with Log-Gabor Filters." In MM '15: ACM Multimedia Conference. New York, NY, USA: ACM, 2015. http://dx.doi.org/10.1145/2808196.2811635.

Full text
APA, Harvard, Vancouver, ISO, and other styles
8

Asawa, Krishna, Vikrant Verma, and Ankit Agrawal. "Recognition of vocal emotions from acoustic profile." In the International Conference. New York, New York, USA: ACM Press, 2012. http://dx.doi.org/10.1145/2345396.2345512.

Full text
APA, Harvard, Vancouver, ISO, and other styles
9

Hanchate, D. B., Mohini Nalawade, Manoj Pawar, Vijay Pophale, and Prabhat Kumar Maurya. "Vocal digit recognition using Artificial Neural Network." In 2010 2nd International Conference on Computer Engineering and Technology. IEEE, 2010. http://dx.doi.org/10.1109/iccet.2010.5486314.

Full text
APA, Harvard, Vancouver, ISO, and other styles
10

Bonjyotsna, A., and M. Bhuyan. "Signal processing for segmentation of vocal and non-vocal regions in songs: A review." In 2013 International Conference on Signal Processing, Image Processing, and Pattern Recognition (ICSIPR). IEEE, 2013. http://dx.doi.org/10.1109/icsipr.2013.6497965.

Full text
APA, Harvard, Vancouver, ISO, and other styles

Reports on the topic "Vocal recognition"

1

Zhan, Puming, and Alex Waibel. Vocal Tract Length Normalization for Large Vocabulary Continuous Speech Recognition. Fort Belvoir, VA: Defense Technical Information Center, May 1997. http://dx.doi.org/10.21236/ada333514.

Full text
APA, Harvard, Vancouver, ISO, and other styles
