Doctoral dissertations on the topic "Voice recognition"

Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles.

Browse the top 50 doctoral dissertations on the topic "Voice recognition".

An "Add to bibliography" button appears next to every work in the list. Use it, and we will automatically generate a bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the publication as a ".pdf" file and read its abstract online, where these are available in the work's metadata.

Browse doctoral dissertations from a wide range of disciplines and assemble your bibliography correctly.

1

Kamarauskas, Juozas. "Speaker recognition by voice". Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2009. http://vddb.library.lt/obj/LT-eLABa-0001:E.02~2009~D_20090615_093847-20773.

Abstract:
This dissertation investigates speaker recognition by voice. It reviews speaker recognition systems and their evolution, the main recognition problems, feature sets, and the speaker modelling and matching methods used in text-independent and text-dependent speaker recognition. A text-independent speaker recognition system was developed during this work, using Gaussian mixture models for speaker modelling and pattern matching. An automatic method for voice activity detection is proposed; it is fast and requires no additional actions from the user, such as marking examples of speech and noise in the signal. A feature set is also proposed, consisting of excitation-source (glottal) parameters and vocal-tract parameters: the fundamental frequency serves as the excitation-source parameter, and four formants with three antiformants serve as the vocal-tract parameters. To equalise the variances of the formants and antiformants, we propose expressing them on the mel-frequency scale. Standard mel-frequency cepstral coefficients (MFCC), the baseline features in speech and speaker recognition, were also implemented in the recognition system for comparison. Speaker recognition experiments showed that the proposed feature set outperformed standard mel-frequency cepstral coefficients. The equal error rate (EER) was 5.17% using the proposed... [to full text]
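The equal error rate (EER) quoted above is the threshold-independent summary usually reported for speaker recognition: the operating point at which the false-acceptance and false-rejection rates coincide. As a rough illustration of the metric (a sketch, not the author's implementation), it can be estimated from lists of genuine and impostor match scores by sweeping a decision threshold:

```python
def eer(genuine, impostor):
    # Sweep a decision threshold over all observed scores; the EER is
    # read off where the false-accept rate (impostor scores at or above
    # the threshold) meets the false-reject rate (genuine scores below it).
    best_gap, best_rate = float("inf"), None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        if abs(far - frr) < best_gap:
            best_gap, best_rate = abs(far - frr), (far + frr) / 2
    return best_rate
```

On real systems the score sets rarely yield an exact crossing, so the rates at the nearest threshold are averaged, as above; interpolated variants also exist.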
2

Al-Kilani, Menia. "Voice-signature-based Speaker Recognition". University of the Western Cape, 2017. http://hdl.handle.net/11394/5888.

Abstract:
Magister Scientiae - MSc (Computer Science)
Personal identification and the protection of data are important issues because of the ubiquity of computing, and they have thus become interesting areas of research in computer science. People have long used a variety of means to identify individuals and to protect themselves, their property and their information, mostly locks, passwords, smartcards and biometrics. Verifying individuals by their physical or behavioural features is more secure than using data such as passwords or smartcards, because everyone has unique features that distinguish him or her from others; furthermore, a person's biometrics are difficult to imitate or steal. Biometric technologies represent a significant component of a comprehensive digital-identity solution and play an important role in security. The technologies that support identification and authentication of individuals are based on either their physiological or their behavioural characteristics. Live data, in this instance the human voice, is the topic of this research. The aim is to recognise a person's voice and to identify the user by verifying that his or her voice matches the recorded voice signature in a system database. To address the main research question, "What is the best way to identify a person by his or her voice signature?", design science research was employed; this methodology develops an artefact for solving a problem. Initially a pilot study was conducted using visual representations of voice signatures, to check whether speakers can be identified without feature extraction or matching methods. Subsequently, experiments were conducted with 6300 data sets derived from the Texas Instruments and Massachusetts Institute of Technology audio database.
Two feature-extraction methods were considered, mel-frequency cepstral coefficients and linear prediction cepstral coefficients, with Support Vector Machines used for classification. The resulting systems were compared in terms of effectiveness, and the system using mel-frequency cepstral coefficients for feature extraction gave marginally better results for speaker recognition.
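The mel scale underlying the cepstral coefficients compared above warps physical frequency to approximate human pitch perception, compressing high frequencies relative to low ones. A common closed form is the HTK-style mapping (the thesis does not say which variant it uses, so this is an illustrative assumption):

```python
import math

def hz_to_mel(f):
    # HTK-style mel mapping: roughly linear below 1 kHz, logarithmic above.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Exact inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
```

MFCC pipelines space their triangular filterbank uniformly in mel, then convert the band edges back to hertz with the inverse mapping.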
3

Alkilani, Menia Mohamed. "Voice signature based Speaker Recognition". University of the Western Cape, 2017. http://hdl.handle.net/11394/6196.

Abstract:
Magister Scientiae - MSc (Computer Science)
Personal identification and the protection of data are important issues because of the ubiquity of computing, and they have thus become interesting areas of research in computer science. People have long used a variety of means to identify individuals and to protect themselves, their property and their information.
4

Laird, Esther. "Voice recognition and auditory-visual integration in person recognition". Thesis, University of Sussex, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.487906.

Abstract:
The human ability to recognise a voice is important for social interaction and speech comprehension. In everyday recognitions, the voice can be encountered alone (e.g. over a telephone) or with a face, and the person being recognised can be familiar or unfamiliar (such as a witness choosing a perpetrator from a lineup). This thesis presents 7 studies covering each of these situations. The first paper presents 3 studies on recognition of unfamiliar voices when there is a change in emotional tone between learning and test phases. A tone change reduces recognition accuracy when there is no specific encoding strategy at the learning phase. Familiarisation at the learning phase reduces the tone change effect but concentrating on word content at the learning phase does not. The second paper presents 3 studies investigating the limitations of the face overshadowing effect (voice recognition is worse when the voice is learned with a face than if it is learned alone). Blurring faces made face recognition more difficult but did not affect voice recognition. In experiment 2, participants learned a sentence repeated 3 times, either with the face changing on each repetition or staying the same. Face recognition accuracy was lower when there were 3 faces, but this did not affect voice recognition. In experiment 3, inverting faces made face recognition more difficult but did not affect voice recognition. The third paper reports that episodic memory for a celebrity is improved when a face and voice are given compared to just a face. A model of person recognition is presented that builds on existing models (e.g. Burton, Bruce & Johnston, 1990; Belin, 2004). It accounts for unfamiliar and familiar voice recognition and the benefits and costs of auditory-visual integration.
5

Clotworthy, Christopher John. "A study of automated voice recognition". Thesis, Queen's University Belfast, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356909.

6

Damjanovic, Ljubica. "Memory processes in familiar voice recognition". Thesis, University of Essex, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.413126.

7

Vipperla, Ravichander. "Automatic Speech Recognition for ageing voices". Thesis, University of Edinburgh, 2011. http://hdl.handle.net/1842/5725.

Abstract:
With ageing, human voices undergo several changes, typically characterised by increased hoarseness, breathiness, changes in articulatory patterns and a slower speaking rate. The focus of this thesis is to understand the impact of ageing on Automatic Speech Recognition (ASR) performance and to improve ASR accuracy for older voices. Baseline results on three corpora indicate that word error rates (WER) for older adults are significantly higher than those for younger adults, and that the decrease in accuracy is larger for male speakers than for females. Acoustic parameters such as jitter and shimmer, which measure glottal source disfluencies, were found to be significantly higher for older adults. However, the hypothesis that these changes explain the difference in WER between the two age groups is proven incorrect: experiments that artificially introduce glottal source disfluencies into speech from younger adults show no significant impact on WERs. Changes in fundamental frequency, observed quite often in older voices, have only a marginal impact on ASR accuracy. An analysis of phoneme errors between younger and older speakers shows that certain phonemes, especially low vowels, are more affected by ageing, although these changes vary across speakers. Another factor strongly associated with ageing voices is a decrease in the rate of speech. Experiments analysing the impact of slower speaking rate on ASR accuracy indicate that insertion errors increase when decoding slower speech with models trained on relatively faster speech. We then propose a way to characterise speakers in acoustic space based on speaker adaptation transforms and observe that speakers (especially males) can be segregated by age with reasonable accuracy. Inspired by this, we look at supervised hierarchical acoustic models based on gender and age.
Significant improvements in word accuracies are achieved over the baseline results with such models. The idea is then extended to construct unsupervised hierarchical models which also outperform the baseline models by a good margin. Finally, we hypothesize that the ASR accuracies can be improved by augmenting the adaptation data with speech from acoustically closest speakers. A strategy to select the augmentation speakers is proposed. Experimental results on two corpora indicate that the hypothesis holds true only when the amount of available adaptation is limited to a few seconds. The efficacy of such a speaker selection strategy is analysed for both younger and older adults.
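Jitter and shimmer, the glottal-source measures examined above, are both relative perturbation statistics: jitter measures cycle-to-cycle variation in glottal period lengths, shimmer the variation in their peak amplitudes. A minimal sketch of the shared computation (one common definitional variant; several exist, and the thesis does not state which it uses):

```python
def relative_perturbation(values):
    # Mean absolute difference between consecutive values, divided by
    # the mean value. Applied to glottal cycle lengths this is jitter;
    # applied to per-cycle peak amplitudes it is shimmer.
    diffs = [abs(a - b) for a, b in zip(values, values[1:])]
    return (sum(diffs) / len(diffs)) / (sum(values) / len(values))
```

A perfectly steady voice gives 0; alternating period lengths of 10 ms and 12 ms, for example, give a jitter of 2/11, which would be reported as about 18%.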
8

Iliadi, Konstantina. "Bio-inspired voice recognition for speaker identification". Thesis, University of Southampton, 2016. https://eprints.soton.ac.uk/413949/.

Abstract:
Speaker identification (SID) aims to identify the underlying speaker(s) given a speech utterance. The first component of a speaker identification system is the front-end, or feature extractor. Feature extraction transforms the raw speech signal into a compact but effective representation that is more stable and discriminative than the original signal. Since the front-end is the first component in the chain, the quality of the later components is strongly determined by its quality. Existing approaches have used several feature extraction methods adopted directly from the speech recognition task. However, the natures of these two tasks are contradictory: speaker variability is one of the major error sources in speech recognition, whereas in speaker recognition it is precisely the information we wish to extract. In this thesis, the possible benefits of adopting a biologically-inspired model of human auditory processing in the front-end of a SID system are examined. This auditory model, the Auditory Image Model (AIM), generates the stabilised auditory image (SAI). Features are extracted from the SAI by breaking it into boxes of different scales. Vector quantization (VQ) is used to create the speaker database, holding the speakers' reference templates that are used for pattern matching against the features of the target speakers to be identified. These features are also compared to Mel-frequency cepstral coefficients (MFCCs), the clearest example of a feature set that is used extensively in speaker recognition but was originally developed for speech recognition. Another important parameter in SID systems is the dimensionality of the features. This study addresses that issue by identifying the most speaker-specific features and refining the system configuration to obtain a lower-dimensional representation of the auditory features.
Furthermore, after evaluating the system's performance in quiet conditions, another primary topic of speaker recognition is investigated. SID systems can perform well under matched training and test conditions, but their performance degrades significantly because of the mismatch caused by background noise in real-world environments, so achieving robustness in SID systems is an important research problem. In the second experimental part of this thesis, the developed version of the system is assessed on speaker data sets of different sizes. Clean speech is used for the training phase, while speech in the presence of babble noise is used for speaker testing. The results suggest that the extracted auditory feature vectors lead to much better performance, i.e. higher SID accuracy, than the MFCC-based recognition system, especially at low SNRs. Lastly, the system's performance is inspected with regard to parameters of the training and test speech data, such as the duration of the spoken material. From these experiments, the system is found to produce satisfactory identification scores for relatively short training and test speech segments.
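Vector quantization, used above to build the speakers' reference templates, reduces identification to a nearest-codebook search: the claimed identity is the enrolled speaker whose codebook best explains the test frames. A minimal sketch of the matching step (illustrative only; the toy 2-D vectors and squared-Euclidean distance here are placeholders, not the AIM-derived features of the thesis):

```python
def identify(test_frames, codebooks):
    # Score each enrolled speaker by the average squared distance from
    # each test frame to its nearest codebook vector ("distortion");
    # the speaker with the lowest average distortion wins.
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    def distortion(book):
        return sum(min(sq_dist(f, c) for c in book)
                   for f in test_frames) / len(test_frames)

    return min(codebooks, key=lambda speaker: distortion(codebooks[speaker]))
```

In a full system the codebooks themselves would be trained per speaker with a clustering algorithm such as k-means or LBG over enrolment frames.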
9

Sanders, Richard Calvin. "Voice recognition system implementation and laboratory exercise". Master's thesis, This resource online, 1995. http://scholar.lib.vt.edu/theses/available/etd-01262010-020212/.

10

Ho, Ching-Hsiang. "Speaker modelling for voice conversion". Thesis, Brunel University, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365076.

11

Eriksson, Erik J. "That voice sounds familiar : factors in speaker recognition". Doctoral thesis, Umeå : Department of Philosophy and Linguistics, Umeå University, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:umu:diva-1106.

12

Kuhn, Lisa Katharina. "Emotion recognition in the human face and voice". Thesis, Brunel University, 2015. http://bura.brunel.ac.uk/handle/2438/11216.

Abstract:
At a perceptual level, faces and voices consist of very different sensory inputs, and information processing in one modality can therefore be independent of processing in another (Adolphs & Tranel, 1999). However, there may also be a shared neural emotion network that processes stimuli independently of modality (Peelen, Atkinson, & Vuilleumier, 2010), or emotions may be processed on a more abstract cognitive level, based on meaning rather than on perceptual signals. This thesis therefore aimed to examine emotion recognition across two separate modalities in a within-subject design, comprising a cognitive study (Chapter 1) with 45 British adults, a developmental study (Chapter 2) with 54 British children, and a cross-cultural study (Chapter 3) with 98 German and British children and 78 German and British adults. Intensity ratings, choice reaction times, and correlations from confusion analyses of emotions across modalities were analysed throughout. Further, an ERP chapter investigated the time-course of emotion recognition across the two modalities. Highly correlated rating profiles of emotions in faces and voices were found, which suggests a similarity in emotion recognition across modalities. Emotion recognition in primary-school children improved with age for both modalities, although young children relied mainly on faces. British and German participants showed comparable patterns when rating basic emotions, but subtle differences were also noted, and German participants perceived emotions as less intense than British participants did. Overall, the behavioural results reported in the present thesis are consistent with the idea of a general, more abstract level of emotion processing that may act independently of modality. This could be based, for example, on a shared emotion brain network or on more general, higher-level cognitive processes activated across a range of modalities.
Although emotion recognition abilities are already evident during childhood, this thesis argued for a contribution of ‘nurture’ to emotion mechanisms as recognition was influenced by external factors such as development and culture.
13

Alzamora, Manuel I., Andrés E. Huamán, Alfredo Barrientos and Rosario del Pilar Villalta Riega. "Implementación de una herramienta de integración de varios tipos de interacción humano-computadora para el desarrollo de nuevos sistemas multimodales". International Institute of Informatics and Systemics, IIIS, 2018. http://hdl.handle.net/10757/624676.

Abstract:
People interact with their environment multimodally, that is, through the simultaneous use of their senses. In recent years, multimodal human-computer interaction has been pursued by developing new devices and using different communication channels, with the goal of providing a more natural interactive user experience. This work presents a tool that allows different types of human-computer interaction to be integrated, and tests it on a multimodal solution.
Peer reviewed
14

Xue, Sukui, and 薛苏葵. "Voice-enabled CAD system". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2010. http://hub.hku.hk/bib/B45461405.

15

Pandit, Medha. "Voice and lip based speaker verification". Thesis, University of Surrey, 2000. http://epubs.surrey.ac.uk/915/.

16

Nayfeh, Taysir H. "Multi-signal processing for voice recognition in noisy environments". Thesis, This resource online, 1991. http://scholar.lib.vt.edu/theses/available/etd-10222009-125021/.

17

Malyska, Nicolas 1977. "Automatic voice disorder recognition using acoustic amplitude modulation features". Thesis, Massachusetts Institute of Technology, 2004. http://hdl.handle.net/1721.1/30092.

Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2004.
Includes bibliographical references (p. 114-117).
An automatic dysphonia recognition system is designed that exploits amplitude modulations (AM) in voice using biologically-inspired models. This system recognizes general dysphonia and four subclasses: hyperfunction, A-P squeezing, paralysis, and vocal fold lesions. The models developed represent processing in the auditory system at the level of the cochlea, auditory nerve, and inferior colliculus. Recognition experiments using dysphonic sentence data obtained from the Kay Elemetrics Disordered Voice Database suggest that our system provides complementary information to state-of-the-art mel-cepstral features. A model for analyzing AM in dysphonic speech is also developed from a traditional communications engineering perspective. Through a case study of seven disordered voices, we show that different AM patterns occur in different frequency bands. This perspective challenges current dysphonia analysis methods that analyze AM in the time-domain signal.
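The "traditional communications engineering perspective" on AM mentioned above starts from an amplitude envelope. One elementary way to estimate it (a generic sketch, not the thesis's auditory-model-based analysis) is full-wave rectification followed by low-pass smoothing with a moving average:

```python
def am_envelope(signal, win=5):
    # Crude amplitude-modulation envelope: full-wave rectify the
    # waveform, then smooth with a centred moving average of width win
    # (truncated at the edges).
    rect = [abs(x) for x in signal]
    half = win // 2
    env = []
    for i in range(len(rect)):
        seg = rect[max(0, i - half):i + half + 1]
        env.append(sum(seg) / len(seg))
    return env
```

Band-specific AM patterns like those reported in the case study would then be obtained by band-pass filtering the signal before extracting the envelope of each band.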
by Nicolas Malyska.
S.M.
18

He, Qing Ph D. Massachusetts Institute of Technology. "An architecture for low-power voice-command recognition systems". Thesis, Massachusetts Institute of Technology, 2016. http://hdl.handle.net/1721.1/105574.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2016.
This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections.
Cataloged from student-submitted PDF version of thesis.
Includes bibliographical references (pages 149-157).
The advancements in fields such as machine-learning have allowed for a growing number of applications seeking to exploit learning methods. Many such applications involve complex algorithms working over high-dimensional features and are implemented in large scale systems where power and other resources are abundant. With emerging interest in embedded applications, nano-scale systems, and mobile devices, which are power and computation constrained, there is a rising need to find simple, low-power solutions for common applications such as voice activation. This thesis develops an ultra-low-power system architecture for voice-command recognition applications. It optimizes system resources by exploiting compact representations of the signal features and extracting them with efficient analog front-ends. The front-end performs feature pre-selection such that only a subset of all available features are chosen and extracted. Two variations of front-end feature extraction design are developed, for the applications of text-dependent speaker-verification and user-independent command recognition, respectively. For speaker-verification, the features are selected with knowledge of the speaker's fundamental frequency and are adapted based on the noise spectrum. The back-end algorithm, supporting adaptive feature selection, is a weighted dynamic time warping algorithm that removes signal misalignments and mitigates speech rate variations while preserving the signal envelope. In the case of user-independent command recognition, a universal set of features are selected without using speaker-specific information. The back-end classifier is enabled by a novel multi-band deep neural network model that processes only the selected features at each decision. In experiments, the proposed systems achieve improved accuracy with noise robustness using significantly less power consumption and computation than existing systems. 
Components of the front- and back-ends have been implemented in hardware, and the end-to-end system power consumption is kept under a few hundred μW.
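Dynamic time warping, the basis of the back-end matcher described above, aligns two feature sequences of differing speaking rate by finding the minimum-cost monotonic alignment. A textbook unweighted version (the thesis uses a weighted variant whose details are not given here) looks like:

```python
import math

def dtw(a, b, dist=lambda x, y: abs(x - y)):
    # Classic dynamic-programming alignment cost: each step may advance
    # either sequence or both, so tempo differences are absorbed while
    # the overall shape of the two signals is still compared.
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = dist(a[i - 1], b[j - 1]) + min(
                cost[i - 1][j],      # advance a only
                cost[i][j - 1],      # advance b only
                cost[i - 1][j - 1])  # advance both
    return cost[n][m]
```

In practice the scalar elements here would be feature vectors with a vector distance, and the weighting adjusts which features contribute at each step.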
by Qing He.
Ph. D.
19

Lawy, Jenny. "Ethnography of San : minority recognition and voice in Botswana". Thesis, University of Edinburgh, 2016. http://hdl.handle.net/1842/22888.

Abstract:
Over the last sixty years, anthropological interest in San has focused on their status as hunter-gatherers and, more recently, as an economically and socially marginalised minority group. In this thesis, I examine the different ways in which this indigenous minority population in Botswana manage and negotiate their relations with one another and with the broader society in which they are embedded. The research comprised eighteen months of fieldwork (April 2010 to December 2011) in Gaborone city and in a largely Naro-speaking village in Gantsi District in the west of Botswana. The participants comprised a small but relatively highly educated cadre of elite San men who presented themselves as advocates for San-related issues in the wider community, as well as San men and women in the towns and villages of the region. Early in the research process I recognised the need to make sense of the ethnography in terms of a variety of markers. Whilst this included what San actually said, it also encompassed what they did and how they did it: their behaviour, dress, and bodily techniques and practices, all of which I describe as voice. The research intersects with issues of gender, language, culture, class, identity and self-representation in the daily lives of San. I emphasise the tensions that San face in their daily struggles for recognition as human beings of equal value in Botswana's society. As the public face of this struggle, San advocates were in a difficult and ambiguous position in relation to the wider San community. As a consequence, I explore egalitarianism as a set of political and social relationships rather than as a 'sharing practice'. I identify a number of areas for further research, for example working collaboratively with San to incorporate aspects of what San called 'personal empowerment' and training.
I show that the research has wider implications for other minority groups and indigenous people worldwide who have also been subject to highly politicised and overly deterministic definitions of their identity. My work suggests possibilities for working with emerging indigenous ‘elites’, who mediate most visibly the contours of these categories of identity by purposefully combining, conflating and straddling these labels.
20

Mathukumalli, Sravya. "Vocal combo android application". Kansas State University, 2016. http://hdl.handle.net/2097/32505.

Abstract:
Master of Science
Department of Computing and Information Sciences
Mitchell L. Neilsen
Nowadays, people from various backgrounds need different information on demand and rely on the web as their source for it, but staying connected to the internet at all times through a computer system is not possible. Android, being open source, has already made its mark in mobile application development; its large smartphone user base and the ease of developing Android applications are advantages for users and developers alike. Vocal Combo is an Android application that provides the required functionality on an Android-supported smartphone or tablet, giving users the flexibility of accessing information at their fingertips. The application is built with the Android SDK, which makes it easy to deploy on any Android-powered device. Vocal Combo is a combination of voice-based applications: it includes a text-to-voice converter and a voice-to-text converter. The application helps the user learn the pronunciation of various words and, at the same time, lets the user check his or her own pronunciation skills. It also provides a meaning-check feature, with which the user can look up the meaning of words typed in or spoken aloud. At any point, the user can review the history of the words whose meaning or pronunciation was checked. The application also provides guidance to the user on how to use it.
21

Lingaria, Dhruvin M. "Assistive voice recognition device for GSM calling using Arduino UNO". Thesis, California State University, Long Beach, 2015. http://pqdtopen.proquest.com/#viewpdf?dispub=1600584.

Abstract:

Developing a smart home environment for assistive living requires great effort. The key element of the smart environment is a ubiquitous voice user interface with additional capabilities, such as the recognition of gestures, which can be a new feature of voice-controlled devices. Many identification technologies are used in current intelligent guard systems; relative to other techniques, voice recognition is generally regarded as one of the more convenient and safe recognition techniques. This assistive device project applies voice recognition technology to GSM calling. An Arduino UNO provides the interface between the voice module and the SIM900 GSM module. The platform was developed from inexpensive hardware and software elements available on the market, and the assistive device proved highly robust in use by people with disabilities. Sample voice commands were stored in the temporary memory of the ATmega328P, and field tests were conducted with several sets of voice commands. The SIM900 GSM module connected easily to the local cellular network carriers. Voice-recognised emergency calling can thus be a future direction for the biomedical field.

22

Wilson, Shawn C. "Voice recognition systems : assessment of implementation aboard U.S. naval ships". Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FWilson.pdf.

Abstract:
Thesis (M.S. in Information Systems and Operations)--Naval Postgraduate School, March 2003.
Thesis advisor(s): Michael T. McMaster, Kenneth J. Hagan. Includes bibliographical references (p. 47-49). Also available online.
23

Pillay, Surosh Govindasamy. "Voice biometrics under mismatched noise conditions". Thesis, University of Hertfordshire, 2011. http://hdl.handle.net/2299/5531.

Abstract:
This thesis describes research into effective voice biometrics (speaker recognition) under mismatched noise conditions. Over the last two decades, this class of biometrics has been the subject of considerable research due to its various applications in such areas as telephone banking, remote access control and surveillance. One of the main challenges associated with the deployment of voice biometrics in practice is that of undesired variations in speech characteristics caused by environmental noise. Such variations can in turn lead to a mismatch between the corresponding test and reference material from the same speaker. This is found to adversely affect the performance of speaker recognition in terms of accuracy. To address the above problem, a novel approach is introduced and investigated. The proposed method is based on minimising the noise mismatch between reference speaker models and the given test utterance, and involves a new form of Test-Normalisation (T-Norm) for further enhancing matching scores under the aforementioned adverse operating conditions. Through experimental investigations, based on the two main classes of speaker recognition (i.e. verification/ open-set identification), it is shown that the proposed approach can significantly improve the performance accuracy under mismatched noise conditions. In order to further improve the recognition accuracy in severe mismatch conditions, an approach to enhancing the above stated method is proposed. This, which involves providing a closer adjustment of the reference speaker models to the noise condition in the test utterance, is shown to considerably increase the accuracy in extreme cases of noisy test data. Moreover, to tackle the computational burden associated with the use of the enhanced approach with open-set identification, an efficient algorithm for its realisation in this context is introduced and evaluated. 
The thesis presents a detailed description of the research undertaken, describes the experimental investigations and provides a thorough analysis of the outcomes.
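The Test-Normalisation (T-Norm) step mentioned in the abstract can be illustrated in a few lines: the test utterance is scored against a cohort of impostor models, and the raw target score is standardised by the cohort statistics. This is a minimal sketch of standard T-Norm, not the thesis's extended variant; function and variable names are illustrative.

```python
import statistics

def t_norm(target_score, cohort_scores):
    """Test-normalise a raw match score against a cohort of impostor models.

    The test utterance is scored against every cohort model; the target
    score is then shifted and scaled by the cohort statistics, which
    compensates for score offsets introduced by noise in the test data.
    """
    mu = statistics.mean(cohort_scores)
    sigma = statistics.stdev(cohort_scores)  # cohort must have >= 2 scores
    return (target_score - mu) / sigma

# Example: a raw score of 2.0 against a cohort scoring around 0.5
normalized = t_norm(2.0, [0.4, 0.5, 0.6, 0.5])
```

A decision threshold can then be set on the normalised scores, which is far more stable across noise conditions than a threshold on raw scores.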
24

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features". Griffith University. School of Microelectronic Engineering, 2001. http://www4.gu.edu.au:8080/adt-root/public/adt-QGU20040831.115646.

Abstract:
The speech signal is primarily meant to carry the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person from his/her speech among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are listed below. First, the speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames).
The problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, the pitch estimation methods are not very reliable. They classify some frames as unvoiced when they are really voiced. Also, they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method that does not use the pitch value directly as a feature and that works for voiced as well as unvoiced frames in a reliable manner. We propose here a method that uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch doubling or halving types of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
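The MACV idea described above can be sketched directly from its definition: compute the normalised autocorrelation of a frame over the plausible pitch-lag range, split that range into bands, and keep the maximum value in each band. The band count and lag range below are illustrative choices, not the thesis's exact settings.

```python
import numpy as np

def macv_features(frame, num_bands=5, min_lag=20, max_lag=160):
    """Maximum autocorrelation value (MACV) features for one frame.

    Works for voiced and unvoiced frames alike, and is insensitive to
    pitch doubling/halving errors, since no single pitch value is picked.
    The frame must be longer than max_lag and must not be all zeros.
    """
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    # One-sided autocorrelation, normalised so that ac[0] == 1
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / ac[0]
    bands = np.array_split(ac[min_lag:max_lag], num_bands)
    return np.array([band.max() for band in bands])

# A frame with pitch period 50 samples yields a strong peak in the
# band that contains lag 50.
frame = np.sin(2 * np.pi * np.arange(400) / 50)
feats = macv_features(frame)
```

In a full system these values would be appended to the cepstral feature vector for each frame.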
25

Wildermoth, Brett Richard. "Text-Independent Speaker Recognition Using Source Based Features". Thesis, Griffith University, 2001. http://hdl.handle.net/10072/366289.

Abstract:
The speech signal is primarily meant to carry the linguistic message, but it also contains speaker-specific information. It is generated by acoustically exciting the cavities of the mouth and nose, and can be used to recognize (identify/verify) a person. This thesis deals with the speaker identification task; i.e., finding the identity of a person from his/her speech among a group of persons already enrolled during the training phase. Listeners use many audible cues in identifying speakers. These cues range from high-level cues, such as the semantics and linguistics of the speech, to low-level cues relating to the speaker's vocal tract and voice source characteristics. Generally, the vocal tract characteristics are modeled in modern-day speaker identification systems by cepstral coefficients. Although these coefficients are good at representing vocal tract information, they can be supplemented by using both pitch and voicing information. Pitch provides very important and useful information for identifying speakers. In current speaker recognition systems it is very rarely used, as it cannot be reliably extracted and is not always present in the speech signal. In this thesis, an attempt is made to utilize this pitch and voicing information for speaker identification. This thesis illustrates, through the use of a text-independent speaker identification system, the reasonable performance of the cepstral coefficients, achieving an identification error of 6%. Using pitch as a feature in a straightforward manner results in identification errors in the range of 86% to 94%, which is not very helpful. The two main reasons why the direct use of pitch as a feature does not work for speaker recognition are listed below. First, the speech is not always periodic; only about half of the frames are voiced. Thus, pitch cannot be estimated for half of the frames (i.e. for unvoiced frames).
The problem is how to account for pitch information for the unvoiced frames during the recognition phase. Second, the pitch estimation methods are not very reliable. They classify some frames as unvoiced when they are really voiced. Also, they make pitch estimation errors (such as doubling or halving of the pitch value, depending on the method). In order to use pitch information for speaker recognition, we have to overcome these problems. We need a method that does not use the pitch value directly as a feature and that works for voiced as well as unvoiced frames in a reliable manner. We propose here a method that uses the autocorrelation function of the given frame to derive pitch-related features. We call these features the maximum autocorrelation value (MACV) features. These features can be extracted for voiced as well as unvoiced frames and do not suffer from pitch doubling or halving types of pitch estimation errors. Using these MACV features along with the cepstral features, the speaker identification performance is improved by 45%.
Thesis (Masters)
Master of Philosophy (MPhil)
School of Microelectronic Engineering
Faculty of Engineering and Information Technology
Full Text
26

Mangayyagari, Srikanth. "Voice recognition system based on intra-modal fusion and accent classification". [Tampa, Fla.] : University of South Florida, 2007. http://purl.fcla.edu/usf/dc/et/SFE0002229.

27

Rashid, M. A. "Packet voice communication on carrier sense multiple access local area networks". Thesis, University of Strathclyde, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.381531.

28

Gibson, Marcia Rose. "A feasibility study on the use of a voice recognition system for training delivery". Diss., This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-08252008-162853/.

29

Rouse, Kenneth Arthur Gilbert Juan E. "Classifying speakers using voice biometrics In a multimodal world". Auburn, Ala, 2009. http://hdl.handle.net/10415/1824.

30

Sepasian, Mojtaba. "Multibiometric security in wireless communication systems". Thesis, Brunel University, 2010. http://bura.brunel.ac.uk/handle/2438/5081.

Abstract:
This thesis has aimed to explore an application of multibiometrics to secured wireless communications. The media of study for this purpose included Wi-Fi, 3G, and WiMAX, over which simulations and experimental studies were carried out to assess the performance. Specifically, restriction of access to authorised users only is provided by a technique referred to hereafter as a multibiometric cryptosystem. In brief, the system is built upon a complete challenge/response methodology in order to obtain a high level of security, on the basis of user identification by fingerprint and further confirmation by verification of the user through text-dependent speaker recognition. The first phase is enrolment, in which the database of watermarked fingerprints, with memorable texts along with the voice features based on the same texts, is created by sending them to the server through the wireless channel. The second is the verification stage, at which users who claim to be genuine are verified against the database; it consists of five steps. At the identification level, the user first presents a fingerprint and a memorable word, the memorable word being watermarked into the fingerprint, so that the system can authenticate the fingerprint, verify its validity, and retrieve the challenge for an accepted user. The following three steps then involve speaker recognition: the user responding to the challenge by text-dependent voice, the server authenticating the response, and finally the server accepting/rejecting the user. In order to implement fingerprint watermarking, i.e. incorporating the memorable word as a watermark message into the fingerprint image, a five-step algorithm has been developed.
The first three novel steps, which concern fingerprint image enhancement (CLAHE with 'Clip Limit', standard deviation analysis and sliding neighborhood operations), are followed by two further steps for embedding the watermark into, and extracting it from, the enhanced fingerprint image utilising the Discrete Wavelet Transform (DWT). In the speaker recognition stage, the limitations of this technique in wireless communication have been addressed by sending voice features (cepstral coefficients) instead of raw samples. This scheme reaps the advantages of reduced transmission time and reduced dependency of the data on the communication channel, together with no packet loss. Finally, the obtained results have verified the claims.
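The thesis's exact five-step watermarking algorithm is not reproduced here, but the general idea of DWT-domain embedding can be sketched: take a one-level Haar transform of the image, then embed the message bits into a detail sub-band by quantising coefficients. The Haar transform, the choice of the HH band, and the quantisation step delta are all illustrative assumptions.

```python
import numpy as np

def haar_dwt2(img):
    """One-level 2-D Haar DWT; returns (LL, LH, HL, HH) sub-bands.

    The image must have even height and width.
    """
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0
    lh = (a + b - c - d) / 4.0
    hl = (a - b + c - d) / 4.0
    hh = (a - b - c + d) / 4.0
    return ll, lh, hl, hh

def embed_bits(hh, bits, delta=4.0):
    """Quantisation-based embedding of watermark bits into HH coefficients:
    each carrier coefficient is snapped to a multiple of delta, with an
    extra delta/2 offset encoding a 1 bit."""
    flat = hh.flatten()                  # flatten() copies; hh is untouched
    for i, bit in enumerate(bits):
        q = np.round(flat[i] / delta) * delta
        flat[i] = q + (delta / 2.0 if bit else 0.0)
    return flat.reshape(hh.shape)

def extract_bits(hh, n_bits, delta=4.0):
    """Recover the embedded bits from the quantised coefficients."""
    flat = hh.flatten()
    return [int(abs(flat[i]) % delta > delta / 4.0) for i in range(n_bits)]

# Toy example: hide one byte worth of bits in a random "fingerprint" image
rng = np.random.default_rng(0)
img = rng.integers(0, 256, (16, 16)).astype(float)
_, _, _, hh = haar_dwt2(img)
bits = [1, 0, 1, 1, 0, 0, 1, 0]
watermarked_hh = embed_bits(hh, bits)
```

In a real system the watermarked band would be inverse-transformed back into the image; the quantisation step trades imperceptibility against robustness.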
31

D'Silva, Reginald Arthur. "Promoting reading skills of young adult EAL learners through voice recognition software". Thesis, University of British Columbia, 2011. http://hdl.handle.net/2429/37665.

Abstract:
The growing international student population in post-secondary institutions in Canada calls for Academic Exchange Programs (AEPs) to focus on promoting reading skills of English as an Additional Language (EAL) students in order to help them read academic and non-academic texts more proficiently. The current study, conducted at a major western Canadian university, investigated the effectiveness of a computer-based software program called the Reading Tutor (RT) in enhancing the reading performance of EAL young adults. A survey determined the reading preferences of participants and reading materials related to news articles were incorporated into the software. Two experimental groups, one (n=16) that self-reported a preference and the other that self-reported a non-preference (n=12) for such reading materials used the software over a period of eight weeks. A control group (n=14) served as a comparison. Results showed that a preference for reading materials positively influenced the non-transfer and transfer of reading fluency skills for non-academic reading materials in a computer-based environment. These skills also transferred to academic texts. However, the gain in reading fluency did not result in gains in comprehension. There was also a positive gain in how students self-assessed their ability to read in English in both of the experimental groups when compared to the control group. The survey also probed reading habits and found that students were in concentric domains of ESL and EFL, spending a majority of their time mainly reading in English for academic purposes. Reading for pleasure in English was only a small part of the students’ reading repertoire. The model of Concentric Domains of Instructional Environments (CDIE) stemming from these results suggests that AEPs, such as the one in the current study, may benefit from reading programs that incorporate extensive reading of non-academic reading materials. 
There appears to be only a small number of studies investigating the effectiveness of computer-based literacy tools in promoting reading skills among university EAL learners. This study makes a unique and valuable contribution to the understanding of such tools in promoting reading skills with student-preferred materials. In addition, the study adds to the understanding of the reading habits of Japanese students in AEPs.
32

MACIÁ, ABEL SEBASTIÁN SANTAMARINA. "AN EVALUATION OF BIMODAL RECOGNITION SYSTEMS BASED ON VOICE AND FACIAL IMAGES". PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO, 2016. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=29315@1.

Abstract:
PONTIFÍCIA UNIVERSIDADE CATÓLICA DO RIO DE JANEIRO
COORDENAÇÃO DE APERFEIÇOAMENTO DO PESSOAL DE ENSINO SUPERIOR
PROGRAMA DE EXCELENCIA ACADEMICA
The main objective of this dissertation is to compare the most important approaches for score-level fusion of two unimodal systems, consisting of facial recognition and speaker recognition systems. Two classification methods for each biometric modality were implemented: GMM/UBM and I-Vector/GPLDA classifiers for speaker recognition, and GMM/UBM and LBP-based classifiers for facial recognition, resulting in four different multimodal fusion combinations being explored. The score-level fusion methods investigated are divided into density-based, transformation-based and classifier-based groups, and a few variants in each group are tested. The fusion methods were tested in verification mode using two different databases, one virtual database and one bimodal database. The results of each bimodal fusion technique implemented were compared with those of the unimodal systems, showing significant recognition performance gains. Density-based fusion techniques presented the best results among all fusion approaches, at the expense of higher computational complexity due to the density estimation process.
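Of the fusion families compared above, the transformation-based one is the simplest to illustrate: bring the face and voice scores onto a common scale (here min-max normalisation) and combine them with a weighted sum. The weight and the scores below are illustrative, not values from the dissertation.

```python
def min_max_norm(scores):
    """Map a list of scores onto [0, 1]; scores must not all be equal."""
    lo, hi = min(scores), max(scores)
    return [(s - lo) / (hi - lo) for s in scores]

def weighted_sum_fusion(face_scores, voice_scores, w_face=0.5):
    """Transformation-based score-level fusion: normalise each modality's
    scores, then take a convex combination per trial."""
    face_n = min_max_norm(face_scores)
    voice_n = min_max_norm(voice_scores)
    return [w_face * f + (1 - w_face) * v for f, v in zip(face_n, voice_n)]

# Three verification trials, scored by both modalities
fused = weighted_sum_fusion([10.0, 20.0, 30.0], [0.0, 1.0, 0.5])
```

A classifier-based variant would instead feed the normalised score pairs into a trained classifier; a density-based variant would model genuine and impostor score densities and fuse via likelihood ratios, which is where the extra computational cost arises.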
33

Johnson, Joanna. "The effectiveness of voice recognition technology as used by persons with disabilities". Online version, 1998. http://www.uwstout.edu/lib/thesis/1998/1998johnsonj.pdf.

34

Count, Peter. "Utilising voice recognition software to improve reading fluency of struggling adolescent readers". Thesis, Edith Cowan University, Research Online, Perth, Western Australia, 2016. https://ro.ecu.edu.au/theses/1799.

Abstract:
Approximately 15-20% of secondary students in Australia experience reading difficulties. For many, the cognitive effort required to decode words, or the lack of automaticity in the elements that contribute to fluent reading, prevents effective reading comprehension. Because reading comprehension is of critical importance across the curriculum, students with difficulties in this area are at significant academic risk. One effective method of improving reading fluency is 'repeated readings' (NICHHD, 2000). The purpose of this study was to examine whether the use of repeated readings, delivered via a home-based program employing voice recognition software (VRS), could improve the reading fluency, and the self-perception as readers, of adolescent students experiencing reading difficulties. The intervention was designed to overcome the problems associated with delivering a repeated reading program within a secondary English classroom. These problems relate to the amount of time required to conduct such a program within the constraints of the existing curriculum, and the reluctance of students to participate in a program that would draw attention to their reading difficulties. A treatment group participated in a home-based repeated reading program using VRS over a 20-week period, and their results were compared to those of a comparison group who participated in a more traditional school-based repeated reading program. Reading fluency, comprehension and reader self-perception were measured before and after the intervention. Data were analysed using descriptive statistics and case studies. The intervention reported in this study resulted in improved reading rate, accuracy and comprehension for both the home-based treatment group and the school-based comparison group, with evidence of larger gains in the treatment group. The students' perceptions of themselves as readers, however, did not show significant gains.
35

Fredrickson, Steven Eric. "Neural networks for speaker identification". Thesis, University of Oxford, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294364.

36

Hannah, Malcolm Ian. "Prospects for applying speaker verification to unattended secure banking". Thesis, University of Abertay Dundee, 1996. http://eprints.soton.ac.uk/256265/.

37

Huliehel, Fakhralden A. "An RBFN-based system for speaker-independent speech recognition". Diss., This resource online, 1995. http://scholar.lib.vt.edu/theses/available/etd-06062008-162619/.

38

MELLO, SIMON. "VATS : Voice-Activated Targeting System". Thesis, KTH, Skolan för industriell teknik och management (ITM), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-279837.

Abstract:
Machine learning implementations in computer vision and speech recognition are widespread and growing, with both low- and high-level applications required. This paper examines whether basic implementations are good enough for real-world applications. To demonstrate this, a simple artificial neural network coded in Python, together with existing Python libraries, is used to control a laser pointer via a servomotor and an Arduino, creating a voice-activated targeting system. The neural network trained on MNIST data consistently achieves an accuracy of 0.95 ± 0.01 when classifying MNIST test data, and also classifies captured images correctly if noise levels are low. The same holds for the speech recognition, which rarely gives wrong readings. The final prototype succeeds in all domains except turning the correctly classified images into targets that the Arduino can read and aim at, and thus fails to merge the computer vision and speech recognition components.
39

DeVilliers, Edward Michael. "Implementing voice recognition and natural language processing in the NPSNET networked virtual environment". Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 1996. http://handle.dtic.mil/100.2/ADA320340.

Abstract:
Thesis (M.S. in Computer Science) Naval Postgraduate School, September 1996.
Thesis advisor(s): Nelson D. Ludlow, John S. Falby. "September 1996." Includes bibliographical references (p. 171-175). Also available online.
40

Bruijn, Christina Geertruida de. "Voice quality after dictation to speech recognition software : a perceptual and acoustic study". Thesis, University of Sheffield, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.440907.

41

Cooper, R. E. "Own-group biases in face and voice recognition : perceptual and social-cognitive influences". Thesis, University of Essex, 2015. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.702477.

Abstract:
Own-race faces are generally recognised more accurately than other-race faces (Meissner & Brigham, 2001). Two major theories attempt to explain the own-race bias (ORB) and similar own-group biases: social-cognitive and perceptual expertise theories. Perceptual theories posit that increased experience recognising own-race faces leads to a more effective processing style tuned to these faces (e.g., Stahl, Wiese & Schweinberger, 2008). Social-cognitive theories point to categorisation of other-group members at the expense of processing their individual identity, and greater motivation to attend to in-group members (Bernstein, Young & Hugenberg, 2007; Levin, 2001). Both of these theoretical accounts can be used to predict own-group biases in voice processing. An own-sex bias in voice processing was tested (experiment 2.1), and an own-accent bias was found in recognition memory (experiment 3.1). Contributions of perceptual expertise and social-cognitive mechanisms to this bias were then studied. By manipulating the supposed social power of speakers, support for the social-cognitive view was found (experiment 3.2). Event-related potentials (ERPs) revealed further support: an own-accent bias was found in ERP measures of voice discrimination, but not in the ability to discriminate between voices (while ignoring them). Support for the social-cognitive view was limited when studying faces, however. There was no evidence of own-group bias for physically similar face groups (experiments 3.3, 5.1 and 6.1). Evidence from eye-tracking showed that attention was directed towards the face areas most diagnostic for individual recognition (experiment 5.2). Knowledge of diagnostic areas is best explained by perceptual expertise. Importantly, however, unbiased participants adjusted their viewing behaviour according to the most diagnostic areas of each race.
Finally, analysis of saccades revealed greater difficulty ignoring own-race faces (experiment 6.2), although there was no such bias for physically similar social groups (experiment 6.1). The implications of these findings and directions for future research are discussed.
42

Geoffroy, Nancy Anne. "Measuring Speech Intelligibility in Voice Alarm Communication Systems". Link to electronic thesis, 2005. http://www.wpi.edu/Pubs/ETD/Available/etd-050405-192800/.

Abstract:
Thesis (M.S.) -- Worcester Polytechnic Institute.
Keywords: speech intelligibility; voice alarm communication system; common intelligibility scale (CIS); speech transmission index (STI). Includes bibliographical references (p. 80-82).
43

Kisel, Andrej. "Person Identification by Fingerprints and Voice". Doctoral thesis, Lithuanian Academic Libraries Network (LABT), 2010. http://vddb.laba.lt/obj/LT-eLABa-0001:E.02~2010~D_20101230_093643-05320.

Abstract:
This dissertation focuses on person identification problems and proposes solutions to overcome them. The first part concerns performance evaluation of fingerprint feature extraction algorithms. Modifications to a known synthesis algorithm are proposed to make it fast and suitable for performance evaluation. Matching of deformed fingerprints is discussed in the second part of the work: a new fingerprint matching algorithm that uses local structures and does not perform fingerprint alignment is proposed to match deformed fingerprints. The use of group delay features of the linear prediction model for speaker recognition is proposed in the third part of the work, together with a new similarity metric that uses these features. It is demonstrated that an automatic speaker recognition system with the proposed features and similarity metric outperforms traditional speaker identification systems. Multibiometrics using fingerprints and voice is addressed in the last part of the dissertation.
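Group delay features of the kind proposed above are derived from the phase response of the linear prediction (LP) synthesis filter: group delay is the negative derivative of the phase with respect to frequency. A minimal numerical sketch (FFT size and the finite-difference approximation are illustrative choices, not the dissertation's exact method):

```python
import numpy as np

def lp_group_delay(lp_coeffs, n_fft=512):
    """Group delay of an all-pole LP synthesis filter H(z) = 1/A(z).

    Group delay is -d(phase)/d(omega); here it is approximated by
    differencing the unwrapped phase on a uniform frequency grid.
    lp_coeffs holds A(z) = a[0] + a[1] z^-1 + ... (a[0] is normally 1).
    """
    a = np.asarray(lp_coeffs, dtype=float)
    spectrum = np.fft.rfft(a, n_fft)          # A(e^{j omega}) on [0, pi]
    phase = np.unwrap(np.angle(1.0 / spectrum))
    d_omega = np.pi / (len(phase) - 1)
    return -np.diff(phase) / d_omega

# A single real pole at z = 0.5 gives a group delay of about 1 sample at DC
gd = lp_group_delay([1.0, -0.5])
```

A feature vector for recognition would typically sample or average this curve over frequency bands, analogously to cepstral features.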
44

Kay, Peter. "A speech input modality for computer-aided drawing : user interface issues". Thesis, University of Hertfordshire, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.260794.

45

Yakushev, Oleksandr Anatoliiovych, and O. V. Kietov. "Інформаційне та програмне забезпечення мультисенсорної системи розпізнавання неправди" [Information and software support of a multisensor lie-detection system]. Thesis, Vyd-vo SumDU, 2008. http://essuir.sumdu.edu.ua/handle/123456789/20856.

46

Holmes, William Paul. "Voice input for the disabled /". Title page, contents and summary only, 1987. http://web4.library.adelaide.edu.au/theses/09ENS/09ensh749.pdf.

Abstract:
Thesis (M. Eng. Sc.)--University of Adelaide, 1987.
Typescript. Includes a copy of a paper presented at TADSEM '85 --Australian Seminar on Devices for Expressive Communication and Environmental Control, co-authored by the author. Includes bibliographical references (leaves [115-121]).
47

Ali, Asif. "Voice query-by-example for resource-limited languages using an ergodic hidden Markov model of speech". Diss., Georgia Institute of Technology, 2013. http://hdl.handle.net/1853/50363.

Abstract:
An ergodic hidden Markov model (EHMM) can be useful in extracting the underlying structure embedded in connected speech without the need for a time-aligned transcribed corpus. In this research, we present a query-by-example (QbE) spoken term detection system based on an ergodic hidden Markov model of speech. An EHMM-based representation of speech is not invariant to speaker-dependent variations, due to the unsupervised nature of the training; consequently, a single phoneme may be mapped to a number of EHMM states. The effects of speaker-dependent and context-induced variation in speech on its EHMM-based representation have been studied and used to devise schemes to minimize these variations. Speaker-invariance can be introduced into the system by identifying states with similar perceptual characteristics. In this research, two unsupervised clustering schemes have been proposed to identify perceptually similar states in an EHMM. A search framework, consisting of a graphical keyword modeling scheme and a modified Viterbi algorithm, has also been implemented. An EHMM-based QbE system has been compared to the state of the art and has been demonstrated to achieve higher precision than systems based on static clustering schemes.
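The search framework above builds on Viterbi decoding. The thesis's modified variant is not reproduced here, but the standard algorithm it extends, finding the most likely state path through an (ergodic) HMM by dynamic programming, can be sketched on a toy two-state model:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most likely state path through an HMM, in log-probability space.

    In an ergodic HMM every state can follow every other, so the inner
    max runs over all states at each step.
    """
    v = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({}); back.append({})
        for s in states:
            best_prev = max(states, key=lambda p: v[t - 1][p] + log_trans[p][s])
            v[t][s] = v[t - 1][best_prev] + log_trans[best_prev][s] + log_emit[s][obs[t]]
            back[t][s] = best_prev
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))

# Toy ergodic model: state 'A' mostly emits 'x', state 'B' mostly emits 'y'
states = ['A', 'B']
log_start = {'A': math.log(0.5), 'B': math.log(0.5)}
log_trans = {'A': {'A': math.log(0.8), 'B': math.log(0.2)},
             'B': {'A': math.log(0.2), 'B': math.log(0.8)}}
log_emit = {'A': {'x': math.log(0.9), 'y': math.log(0.1)},
            'B': {'x': math.log(0.1), 'y': math.log(0.9)}}
path = viterbi(['x', 'x', 'y', 'y'], states, log_start, log_trans, log_emit)
```

In a QbE system, the keyword model constrains part of the state space while filler states absorb the surrounding speech.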
48

Benavent, Chàfer José Vicente. "Voice line-ups: Testing aural-perceptual recognition on native speakers of a foreign language". Doctoral thesis, Universitat de Barcelona, 2020. http://hdl.handle.net/10803/668733.

Abstract:
The focal point of this PhD thesis is (foreign/native) speaker perception and recognition. To test such phenomena, jurors from British and Spanish universities were selected to answer ad hoc perception surveys (voice line-ups in three distinct languages: English, Spanish, and Dutch) in order to unravel the correlations between speaker-specific sociolinguistic factors and the acoustic parameters impinging upon success/error rates in identification and discrimination tasks. To gain a more in-depth understanding of real-life scenarios, the properties of the data were adjusted accordingly (reduced duration of voice samples, semi-spontaneous exchanges), in contrast with the ideal, controlled conditions hitherto used in experiments of this kind. From a methodological point of view, this is one of the main contributions of this work, besides being one of its challenges, since it aims to prove that differentiating speakers by means of acoustic-phonetic analysis remains feasible despite the limitations of the source material. It is concluded that language familiarity did not influence the results obtained. However, learned languages exhibit rather unpredictable behaviour. On the other hand, acoustic-phonetic analyses are shown to yield lower error rates than the jurors' responses gathered through identification tests. Nevertheless, jurors' scores in discrimination tasks reveal even fewer false alarms, with the exception of the analysis of the English voice samples (0% error rates). In light of the above, further research is naturally encouraged to verify such claims. These findings are indeed limited to some extent, given the interdisciplinary nature of speaker recognition and the presence of uncontrolled co-existing influences such as psychological states, memory, and environmental factors.
Although the statistical correlations were not as strong as one might expect, this thesis brings us a step closer to understanding the intricacies of real-life forensic voice comparison through the analysis of semi-spontaneous speech, which is arguably harder to analyse than samples recorded under laboratory conditions.
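The identification and discrimination error rates this thesis compares can be made concrete with a small sketch. This is an illustrative example only (the function names, toy responses, and scoring scheme are invented, not taken from the thesis): identification accuracy is the fraction of jurors who picked the target voice from the line-up, and a "false alarm" in a discrimination task is answering "same speaker" for two different speakers.

```python
# Hypothetical sketch of voice line-up scoring; names and data are
# illustrative, not drawn from the thesis itself.

def identification_accuracy(responses, target):
    """Fraction of jurors who correctly picked the target voice."""
    hits = sum(1 for r in responses if r == target)
    return hits / len(responses)

def false_alarm_rate(judgements):
    """judgements: list of (answered_same, truly_same) booleans.
    False-alarm rate = 'same' answers among truly-different pairs."""
    different_pairs = [ans for ans, truth in judgements if not truth]
    if not different_pairs:
        return 0.0
    return sum(different_pairs) / len(different_pairs)

# Toy data: 4 jurors identify a voice; 3 juror same/different judgements.
id_acc = identification_accuracy(["A", "B", "A", "A"], target="A")      # 0.75
fa = false_alarm_rate([(True, False), (False, False), (True, True)])    # 0.5
```

A 0% error rate, as reported for the English discrimination samples, would correspond to a false-alarm rate of 0.0 under this scheme.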
APA, Harvard, Vancouver, ISO etc. styles
49

Jafari, Moghadamfard Ramtin, and Saeid Payvar. "The Potential of Visual Features : to Improve Voice Recognition Systems in Vehicles Noisy Environment". Thesis, Högskolan i Halmstad, Sektionen för Informationsvetenskap, Data– och Elektroteknik (IDE), 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:hh:diva-27273.

Full text source
Abstract:
Multimodal biometric systems have been a subject of study in recent decades; their unique characteristics of anti-spoofing and liveness detection, plus the ability to deal with audio noise, have made them candidate technologies for improving current systems such as voice recognition, verification, and identification systems. In this work we studied the feasibility of incorporating an audio-visual voice recognition system to deal with audio noise in the truck cab environment. Speech recognition systems suffer from excessive noise from the engine, road traffic, and the car stereo system. To deal with this noise, different techniques, including active and passive noise cancelling, have been studied. Our results showed that although audio-only systems perform better in a noise-free environment, their performance drops significantly as the level of noise in truck cabins increases, which by contrast does not affect the performance of visual features. The final fused system, comprising both visual and audio cues, proved to be superior to both audio-only and video-only systems.
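The fused system described above combines audio and visual evidence so that visual cues dominate when cab noise is high. The abstract does not specify the fusion method, so the following is only a minimal sketch of one common approach, score-level fusion with a noise-dependent audio weight; the function names, the linear weighting scheme, and the SNR threshold are assumptions for illustration.

```python
# Hypothetical score-level audio-visual fusion: the audio weight
# shrinks as the estimated signal-to-noise ratio drops, so visual
# evidence dominates in a noisy truck cab. Weighting is illustrative.

def fuse_scores(audio_score, visual_score, snr_db, snr_clean=30.0):
    """Linear score fusion with an SNR-dependent audio weight.

    snr_clean is the SNR (in dB) at or above which audio is
    fully trusted; at 0 dB the decision relies on vision alone.
    """
    w_audio = max(0.0, min(1.0, snr_db / snr_clean))
    return w_audio * audio_score + (1.0 - w_audio) * visual_score

# Quiet cab: audio dominates; noisy cab: visual dominates.
quiet = fuse_scores(audio_score=0.9, visual_score=0.6, snr_db=30.0)  # 0.9
noisy = fuse_scores(audio_score=0.2, visual_score=0.6, snr_db=0.0)   # 0.6
```

This mirrors the thesis's qualitative finding: the fused output tracks the reliable audio channel in quiet conditions and falls back on the noise-immune visual channel otherwise.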
APA, Harvard, Vancouver, ISO etc. styles
50

Couroux, Christina. "Neighbor-stranger discrimination and individual recognition by voice in the American redstart (Setophaga ruticilla)". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1997. http://www.collectionscanada.ca/obj/s4/f2/dsk2/tape16/PQDD_0027/MQ37111.pdf.

Full text source
APA, Harvard, Vancouver, ISO etc. styles