Doctoral dissertations on the topic "Speech processing"

Click this link to see other types of publications on the topic: Speech processing.

Create a correct reference in APA, MLA, Chicago, Harvard, and many other styles

Select a source type:

Consult the 50 best doctoral dissertations for your research on the topic "Speech processing".

Next to every work in the list of references there is an "Add to bibliography" button. Press it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a ".pdf" file and read its abstract online, when the relevant details are provided in the metadata.

Browse doctoral dissertations from a wide variety of disciplines and compile accurate bibliographies.

1

Xu, Jue. "Adaptations in Speech Processing". Doctoral thesis, Humboldt-Universität zu Berlin, 2021. http://dx.doi.org/10.18452/23030.

Full text source
Abstract:
How language perception adapts to constantly incoming information is a key question in mind and brain research. This doctoral thesis aims to contribute to the understanding of adaptation to speaker identity and speech errors during speech processing, and to enhance our knowledge about the role of cognitive control in speech processing. For this purpose, the N400 and P600 event-related brain potentials (ERPs) in the electroencephalogram (EEG) were analyzed. Specifically, the present work addressed the question of adaptation to the speaker’s identity in processing two types of speech errors (Xu, Abdel Rahman, & Sommer, 2019), and explored proactive adaptation initiated by the detection of speech errors (Xu, Abdel Rahman, & Sommer, 2021) and by speaker (dis-)continuity across consecutive sentences in multi-speaker situations (Xu, Abdel Rahman, & Sommer, 2021, in press). Results showed that different speech processing strategies were adapted according to native or non-native speaker identity and two different types of speech errors, reflected in different N400 and P600 effects. In addition, detection of conflict (speech errors) and speaker (dis-)continuity across consecutive sentences engage cognitive control to rapidly adapt processing strategies for the following sentence, manifested in hitherto unreported sequential adaptation effects in the P600 amplitude. Based on the DMC model (Braver, 2012; Braver, Gray, & Burgess, 2007) and the monitoring theory of language perception (van de Meerendonk, Indefrey, Chwilla, & Kolk, 2011), I propose that the P600 amplitude manifests not only reactive adaptations triggered by conflict detection, i.e., the classic P600 effect reflecting reanalysis of speech processing, but also proactive adaptations in monitoring speech processing, engaging cognitive control mechanisms of attention and memory.
APA, Harvard, Vancouver, ISO, and other styles
2

Thomas, Mark R. P. "Glottal-synchronous speech processing". Thesis, Imperial College London, 2010. http://hdl.handle.net/10044/1/5611.

Full text source
Abstract:
Glottal-synchronous speech processing is a field of speech science in which the pseudoperiodicity of voiced speech is exploited. Traditionally, speech processing involves segmenting and processing short speech frames of predefined length; this may fail to exploit the inherent periodic structure of voiced speech, which glottal-synchronous speech frames have the potential to harness. Glottal-synchronous frames are often derived from the glottal closure instants (GCIs) and glottal opening instants (GOIs). The SIGMA algorithm was developed for the detection of GCIs and GOIs from the electroglottograph signal with a measured accuracy of up to 99.59%. For GCI and GOI detection from speech signals, the YAGA algorithm provides a measured accuracy of up to 99.84%. Multichannel speech-based approaches are shown to be more robust to reverberation than single-channel algorithms. The GCIs are applied to real-world applications including speech dereverberation, where SNR is improved by up to 5 dB, and to prosodic manipulation, where the importance of voicing detection in glottal-synchronous algorithms is demonstrated by subjective testing. The GCIs are further exploited in a new area of data-driven speech modelling, providing new insights into speech production and a set of tools to aid deployment into real-world applications. The technique is shown to be applicable in areas of speech coding, identification, and artificial bandwidth extension of telephone speech.
APA, Harvard, Vancouver, ISO, and other styles
3

Lucey, Simon. "Audio-visual speech processing". Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Full text source
Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular for the tasks of isolated word speech recognition and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speechreading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so that their combined performance is greater than that of any of those classifiers individually.
The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
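The classifier-combination idea summarised in this abstract can be illustrated with a minimal late-fusion sketch: each modality's classifier emits per-class log-likelihoods, and a weighted sum combines them. All scores and the weight below are assumed toy values; the thesis's framework for optimal combination is more general than this baseline rule.

```python
import numpy as np

def fuse_loglikelihoods(acoustic_ll, visual_ll, alpha=0.5):
    """Late fusion of per-class log-likelihoods from two independent
    classifiers via a weighted sum (alpha is an assumed tuning weight)."""
    return alpha * acoustic_ll + (1 - alpha) * visual_ll

# Toy per-class scores for three candidate words (assumed values).
acoustic = np.array([-12.0, -10.0, -11.0])  # the audio classifier favours word 1
visual = np.array([-8.0, -9.5, -12.0])      # the lip classifier favours word 0
fused = fuse_loglikelihoods(acoustic, visual)
print(int(np.argmax(fused)))                # index of the jointly best word
```

In a real AVSP system the weight would be tuned to the acoustic noise level, so the visual stream dominates when audio is unreliable.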
APA, Harvard, Vancouver, ISO, and other styles
4

Al-Otaibi, Abdulhadi S. "Arabic speech processing : syllabic segmentation and speech recognition". Thesis, Aston University, 1988. http://publications.aston.ac.uk/8064/.

Full text source
Abstract:
A detailed description of the Arabic phonetic system is given. The syllabic behaviour of the Arabic language is highlighted. Basic statistical properties of the Arabic language (phoneme and syllabic frequency of repetition) are included. A thorough review of the speech processing techniques used in speech analysis, synthesis, and recognition applications is presented. The development of a PC-based speech processing system is described. The system has proven to be a useful tool in Arabic speech analysis and recognition applications. A sample spectrographic study of two pairs of similar Arabic sounds was performed. It is shown that no clear acoustical property exists for distinguishing between the phonemes /O/ and /f/ except the gradual rise of F1 during formant movements (transitions). The development of an automatic Arabic syllabic segmentation algorithm is described. The performance of the algorithm is tested with monosyllabic and multisyllabic words. An overall accuracy of 92% was achieved. The main parameters affecting the accuracy of the segmentation algorithm are discussed. The syllabic units generated by applying the Arabic syllabic segmentation algorithm are utilized in the implementation of three major speech applications, namely an automatic Arabic vowel recognition system, an isolated word recognition system, and an acoustic-phonetic model for Arabic. Each application is fully described and its performance results are indicated.
APA, Harvard, Vancouver, ISO, and other styles
5

Grancharov, Volodya. "Human perception in speech processing". Doctoral thesis, Stockholm : Sound and Image Processing Laboratory, School of Electrical Engineering, Royal Institute of Technology, 2006. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4032.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
6

Duffy, Hester Elizabeth Sarah. "The processing of accented speech". Thesis, University of Plymouth, 2013. http://hdl.handle.net/10026.1/1556.

Full text source
Abstract:
This thesis examines the processing of accented speech in both infants and adults. Accents provide a natural and reasonably consistent form of inter-speaker variation in the speech signal, but it is not yet clear exactly what processes are used to normalise this form of variation, or when and how those processes develop. Two adult studies use ERP data to examine differences between the online processing of regional- and foreign-accented speech as compared to a baseline consisting of the listeners’ home accent. These studies demonstrate that the two types of accents recruit normalisation processes which are qualitatively, and not just quantitatively, different. This provided support for the hypothesis that foreign and regional accents require different mechanisms to normalise accent-based variation (Adank et al., 2009, Floccia et al., 2009), rather than for the hypothesis that different types of accents are normalised according to their perceptual distance from the listener’s own accent (Clarke & Garrett, 2004). They also provide support for the Abstract entry approach to lexical storage of variant forms, which suggests that variant forms undergo a process of prelexical normalisation, allowing access to a canonical lexical entry (Pallier et al., 2001), rather than for the Exemplar-based approach, which suggests that variant word-forms are individually represented in the lexicon (Johnson, 1997). Two further studies examined how infants segment words from continuous speech when presented with accented speakers. The first of these includes a set of behavioural experiments, which highlight some methodological issues in the existing literature and offer some potential explanations for conflicting evidence about the age at which infants are able to segment speech. 
The second uses ERP data to investigate segmentation within and across accents, and provides neurophysiological evidence that 11-month-olds are able to distinguish newly-segmented words at the auditory level even within a foreign accent, or across accents, but that they are more able to treat new word-forms as word-like in a familiar accent than a foreign accent.
APA, Harvard, Vancouver, ISO, and other styles
7

Egorova, Natalia. "Neurobiology of speech act processing". Thesis, University of Cambridge, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.648313.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
8

Wu, Lizhong. "Speech processing with neural networks". Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.259529.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
9

Saleh, Gaafar Mustafa Kamil. "Bayesian inference in speech processing". Thesis, University of Cambridge, 1997. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627179.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
10

Han, Kun. "Supervised Speech Separation And Processing". The Ohio State University, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=osu1407865723.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
11

Barger, Peter James. "Speech processing for forensic applications". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36081/1/36081_Barger_1998.pdf.

Full text source
Abstract:
This thesis examines speech processing systems appropriate for use in forensic analysis. The need for automatic speech processing systems for forensic use is justified by the increasing use of electronically recorded speech for communication. An automatic speaker identification and verification system is described which was tested on data gathered by the Queensland Police Force. Speaker identification using Gaussian mixture models (GMMs) is shown to be useful as an indicator of identity, but not sufficiently accurate to be used as the sole means of identification. It is shown that training GMMs on speech of one language and testing on speech of another language introduces significant bias into the results, which is unpredictable in its effects. This has implications for the performance of the system on subjects attempting to disguise their voices. Automatic gender identification systems are shown to be highly accurate, attaining 98% accuracy, even with very simple classifiers, and when tested on speech degraded by coding or reverberation. These gender gates are useful as initial classifiers in a larger speaker classification system and may even find independent use in a forensic environment. A dual microphone method of improving the performance of speaker identification systems in noisy environments is described. The method gives a significant improvement in log-likelihood scores when its output is used as input to a GMM. This implies that speaker identification tests may be improved in accuracy. A method of automatically assessing the quality of transmitted speech segments using a classification scheme is described. By classifying the difference between cepstral parameters describing the original speech and the transmitted speech, an estimate of the speech quality is obtained.
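The GMM-based speaker identification described in this abstract scores test features against one model per enrolled speaker and picks the highest average log-likelihood. A minimal numpy sketch with hand-set diagonal-covariance models and synthetic 2-D features (all data, parameters, and dimensionality are assumed for illustration; real systems use MFCC features and EM-trained mixtures):

```python
import numpy as np

def gmm_loglik(x, weights, means, variances):
    """Average log-likelihood of samples x under a diagonal-covariance GMM."""
    x = np.atleast_2d(x)
    comp = []
    for w, m, v in zip(weights, means, variances):
        log_norm = -0.5 * np.sum(np.log(2 * np.pi * v))
        log_prob = log_norm - 0.5 * np.sum((x - m) ** 2 / v, axis=1)
        comp.append(np.log(w) + log_prob)
    # Sum the weighted component densities in the log domain.
    return float(np.mean(np.logaddexp.reduce(comp, axis=0)))

rng = np.random.default_rng(0)
# Toy 2-D features standing in for cepstral vectors of a test utterance.
test_utt = rng.normal(loc=0.0, scale=1.0, size=(200, 2))

# Hand-set two-component models for two hypothetical enrolled speakers:
# speaker A's model is centred near the test data, speaker B's is not.
speaker_models = {
    "A": ([0.5, 0.5], [np.zeros(2), np.full(2, 0.5)], [np.ones(2), np.ones(2)]),
    "B": ([0.5, 0.5], [np.full(2, 4.0), np.full(2, 4.5)], [np.ones(2), np.ones(2)]),
}

scores = {spk: gmm_loglik(test_utt, *params) for spk, params in speaker_models.items()}
best = max(scores, key=scores.get)
print(best)  # identity of the best-scoring model
```

As the abstract notes, such scores indicate identity but are not reliable enough to be the sole forensic evidence; in practice they feed into a larger decision framework.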
APA, Harvard, Vancouver, ISO, and other styles
12

Vidal, Dos Santos Hector Yamil. "Phonological prediction in speech processing". Doctoral thesis, SISSA, 2016. http://hdl.handle.net/20.500.11767/4927.

Full text source
Abstract:
Auditory speech perception can be described as the task of mapping an auditory signal into meaning. We routinely perform this task in an automatic and effortless manner, which might conceal the complexity behind this process. It should be noted that the speech signal is highly variable, ambiguous and usually perceived in noise. One possible strategy the brain might use to handle this task is to generate predictions about the incoming auditory stream. Prediction occupies a prominent role in cognitive functions ranging from perception to motor control. In the specific case of speech perception, evidence shows that listeners are able to make predictions about incoming speech stimuli. Word processing, for example, is facilitated by the context of a sentence. Furthermore, electroencephalography studies have shown neural correlates that behave like error signals triggered when an unexpected word is encountered. But these examples of prediction in speech processing occur between words, and rely on semantic and/or syntactic knowledge. Given the salient role of prediction in other cognitive domains, we hypothesize that prediction might serve a role in speech processing even at the phonological level (within words) and independently of higher-level information such as syntax or semantics. In other words, the brain might use the first phonemes of a word to anticipate which should be the following ones. To test this hypothesis, we performed three electroencephalography experiments with an oddball design. This approach allowed us to present individual words in a context that contains neither semantic nor syntactic information. Additionally, this type of experimental design is optimal for the elicitation of event-related potentials that are well-established markers of prediction violation, such as the Mismatch Negativity (MMN) and P3b responses.
In these experiments, participants heard repetitions of standard words, among which deviant words were presented infrequently. Importantly, deviant words were composed of the same syllables as standard words, although in different combinations. For example, if in an experiment XXX and YYY were two standard words, XXY could be a deviant word. We expected that if, as we proposed, the first phonemes of a word are used to predict which should be the following ones, encountering a deviant of this kind would elicit a prediction error signal. In Chapter 3, we establish that, as expected, the presentation of deviant words, composed of an unexpected sequence of phonemes, generates a chain of well-established prediction error signals, which we take as evidence of the prediction of the forthcoming phonemes of a word. Furthermore, we show that the amplitude of these error signals can be modulated by the number of congruent syllables presented before the point of deviance, which suggests that prediction strength can increase within a word as previous predictions prove to be successful. In Chapter 4, we study the modulating role of attentional set on the chain of prediction error signals. In particular, we show that while high-level prediction (indexed by the P3b response) is strategically used depending on the task at hand, early prediction error signals such as the MMN response are generated automatically, even when participants are simply instructed to listen to all the words. These results imply that phonological predictions are automatically deployed while listening to words, regardless of the task at hand. In Chapter 5, we extend our results to a more complex stimulus set that resembles natural speech more closely. Furthermore, we show that the amplitude of the MMN and P3b prediction error signals is correlated with participants' reaction times in an on-line deviant detection task.
This provides a strong argument in favor of a functional role of phonological predictions in speech processing. Taken together, this work shows that phonological predictions can be generated even in the absence of higher-level information such as syntax and semantics. This might help the human brain to complete the challenging task of mapping such a variable and noisy signal as speech into meaning, in real time.
APA, Harvard, Vancouver, ISO, and other styles
13

Coetzee, H. J. "The development of a new objective speech quality measure for speech coding applications". Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/15474.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
14

Al-Naimi, Khaldoon Taha. "Advanced speech processing and coding techniques". Thesis, University of Surrey, 2002. http://epubs.surrey.ac.uk/843488/.

Full text source
Abstract:
Over the past two decades there has been substantial growth in speech communications and new speech-related applications. Bandwidth constraints led researchers to investigate ways of compressing speech signals whilst maintaining speech quality and intelligibility, so as to increase the possible number of customers for the given bandwidth. Because of this, a variety of speech coding techniques have been proposed over this period. At the heart of any proposed speech coding method is quantisation of the speech production model parameters that need to be transmitted to the decoder. Quantisation is a controlling factor for the targeted bit rates and for meeting quality requirements. The objectives of the research presented in this thesis are twofold. The first is to enable the development of a very low bit rate speech coder which maintains quality and intelligibility. This includes increasing the robustness to various operating conditions as well as enhancing the estimation and improving the quantisation of speech model parameters. The second objective is to provide a method for enhancing the performance of an existing speech-related application. The first objective is tackled with the aid of three techniques. Firstly, various novel estimation techniques are proposed which are such that the resultant estimated speech production model parameters have less redundant information and are highly correlated. This leads to easier quantisation (due to higher correlation) and therefore to bit saving. The second approach is to make use of the joint effect of the quantisation of spectral parameters (i.e. LSF and spectral amplitudes), given their large impact on the overall bit allocation required. Work towards the first objective also includes a third technique which enhances the estimation of a speech model parameter (i.e. the pitch) through a robust statistics-based post-processing (or tracking) method which operates in noise-contaminated environments.
Work towards the second objective focuses on an application where speech plays an important role, namely echo-canceller and noise-suppressor systems. A novel echo-canceller method is proposed which resolves most of the weaknesses present in existing echo-canceller systems and improves the system performance.
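As context for the echo-cancellation work described above, the basic setting can be sketched with a textbook normalised-LMS (NLMS) adaptive filter: estimate the echo of the far-end signal in the microphone channel and subtract it. The echo path and signals below are synthetic assumptions, and this standard baseline is not the novel method the thesis proposes.

```python
import numpy as np

def nlms_echo_canceller(far_end, mic, taps=16, mu=0.5, eps=1e-8):
    """Cancel the far-end echo in the microphone signal with an NLMS
    adaptive filter; returns the residual (echo-cancelled) signal."""
    w = np.zeros(taps)
    buf = np.zeros(taps)          # delay line of recent far-end samples
    out = np.zeros(len(mic))
    for n in range(len(mic)):
        buf = np.roll(buf, 1)
        buf[0] = far_end[n]
        e = mic[n] - w @ buf                    # error = mic minus estimated echo
        w += mu * e * buf / (buf @ buf + eps)   # normalised LMS update
        out[n] = e
    return out

rng = np.random.default_rng(2)
far = rng.normal(size=4000)
echo_path = np.array([0.5, -0.3, 0.2, 0.1])     # assumed toy room response
mic = np.convolve(far, echo_path)[: len(far)]   # mic picks up only the echo here
residual = nlms_echo_canceller(far, mic)

# Echo return loss enhancement after the filter has converged.
erle = 10 * np.log10(np.mean(mic[2000:] ** 2) / np.mean(residual[2000:] ** 2))
print(f"ERLE: {erle:.1f} dB")
```

With no near-end speech or noise, the filter converges to the echo path and the residual echo power drops sharply; real systems must also handle double-talk, which is where this baseline breaks down.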
APA, Harvard, Vancouver, ISO, and other styles
15

Zwyssig, Erich Paul. "Speech processing using digital MEMS microphones". Thesis, University of Edinburgh, 2013. http://hdl.handle.net/1842/8287.

Full text source
Abstract:
The last few years have seen the start of a unique change in microphones for consumer devices such as smartphones or tablets. Almost all analogue capacitive microphones are being replaced by digital silicon microphones or MEMS microphones. MEMS microphones perform differently to conventional analogue microphones. Their greatest disadvantage is significantly increased self-noise or decreased SNR, while their most significant benefits are ease of design and manufacturing and improved sensitivity matching. This thesis presents research on speech processing, comparing conventional analogue microphones with the newly available digital MEMS microphones. Specifically, voice activity detection, speaker diarisation (who spoke when), speech separation and speech recognition are looked at in detail. In order to carry out this research, different microphone arrays were built using digital MEMS microphones, and corpora were recorded to test existing algorithms and devise new ones. Some corpora that were created for the purpose of this research will be released to the public in 2013. It was found that the most commonly used VAD algorithm in current state-of-the-art diarisation systems is not the best-performing one, i.e. MLP-based voice activity detection consistently outperforms the more frequently used GMM-HMM-based VAD schemes. In addition, an algorithm was derived that can determine the number of active speakers in a meeting recording given audio data from a microphone array of known geometry, leading to improved diarisation results. Finally, speech separation experiments were carried out using different post-filtering algorithms, matching or exceeding current state-of-the-art results. The performance of the algorithms and methods presented in this thesis was verified by comparing their output using speech recognition tools and simple MLLR adaptation, and the results are presented as word error rates, an easily comprehensible scale.
To summarise, using speech recognition and speech separation experiments, this thesis demonstrates that the significantly reduced SNR of the MEMS microphone can be compensated for with well established adaptation techniques such as MLLR. MEMS microphones do not affect voice activity detection and speaker diarisation performance.
APA, Harvard, Vancouver, ISO, and other styles
16

Pass, A. R. "Towards pose invariant visual speech processing". Thesis, Queen's University Belfast, 2013. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.580170.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
17

Stevens, D. A. "Non-linear prediction for speech processing". Thesis, Swansea University, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.639110.

Full text source
Abstract:
For over 20 years linear prediction has been one of the most widely used methods for analysing speech signals. Linear predictors have been used to model the vocal tract in all areas of speech processing, from speech recognition to speech synthesis. However, Teager showed as early as 1980, by measuring the flow within the vocal tract during the pronunciation of a vowel sound, that the vocal tract is a non-linear system. As such, the standard linear predictors are unable to model all the vocal tract information available in the speech signal. This work looks at replacing or complementing the standard linear models with non-linear ones in order to improve the modelling of the vocal tract. Several different methods of both generating and implementing non-linear models of the vocal tract are assessed to see how much improvement in prediction can be achieved by using non-linear models, either in place of, or complementing, the standard linear models. Two basic approaches to non-linear prediction have been used. The first of these is to configure a multi-layered perceptron (MLP) as a non-linear predictor and then to train the MLP to predict the speech signal. The second method is known as a split-function approach, as it effectively splits the overall predictor function into smaller sub-functions, each of which requires a less complex predictor function than the whole. This second method uses a classification stage to determine what type of speech is present and then uses a separate predictor for each of the classifications. Initial results using a single MLP predictor proved ineffective, returning gains of 0.1 to 0.3 dB in excess of the standard LPC. This is thought to be due to an inability of the networks used to model the full dynamic complexity of the speech signal. However, with the split-function predictors it is shown that relatively high prediction gains can be achieved using a few simple sub-functions.
With four linear sub-functions gains of 2.1 dB have been achieved over the standard LPC.
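The prediction gains quoted above can be made concrete with a small sketch: fit a least-squares linear predictor to a synthetic second-order autoregressive signal (an assumed stand-in for a voiced speech frame) and report the gain in dB as the ratio of signal power to residual power.

```python
import numpy as np

def lpc_prediction_gain(x, order):
    """Fit a least-squares linear predictor of the given order and return
    the prediction gain 10*log10(signal power / residual power) in dB."""
    # Regression: predict x[n] from the previous `order` samples.
    A = np.array([x[n - order:n][::-1] for n in range(order, len(x))])
    target = x[order:]
    coeffs, *_ = np.linalg.lstsq(A, target, rcond=None)
    residual = target - A @ coeffs
    return 10 * np.log10(np.mean(target ** 2) / np.mean(residual ** 2))

# Synthetic "voiced" frame: an AR(2) process, which a 2nd-order linear
# predictor models well (an assumed stand-in for real speech).
rng = np.random.default_rng(1)
x = np.zeros(2000)
for n in range(2, len(x)):
    x[n] = 1.71 * x[n - 1] - 0.81 * x[n - 2] + rng.normal()

g = lpc_prediction_gain(x, order=2)
print(f"prediction gain: {g:.1f} dB")
```

For real speech the gain is limited by exactly the non-linear vocal tract behaviour the thesis targets, which is why split-function non-linear predictors can add dB on top of the linear baseline.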
APA, Harvard, Vancouver, ISO, and other styles
18

Little, M. A. "Biomechanically informed nonlinear speech signal processing". Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Full text source
Abstract:
Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. In order therefore to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques, and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies concludes that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals.
Using a simple pattern classifier these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically-informed, nonlinear signal processing techniques.
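The surrogate data idea referenced above can be illustrated with the classic phase-randomisation construction, which preserves a signal's power spectrum while destroying any nonlinear structure; this is the textbook baseline technique, not the improved method the thesis develops.

```python
import numpy as np

def phase_randomised_surrogate(x, rng):
    """Surrogate with the same power spectrum as x but randomised Fourier
    phases, i.e. consistent with a linear Gaussian process."""
    spec = np.fft.rfft(x)
    phases = rng.uniform(0, 2 * np.pi, len(spec))
    phases[0] = 0.0              # keep the DC bin real
    if len(x) % 2 == 0:
        phases[-1] = 0.0         # keep the Nyquist bin real for even lengths
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phases), n=len(x))

rng = np.random.default_rng(3)
x = rng.normal(size=1024)        # stand-in for a speech segment
s = phase_randomised_surrogate(x, rng)

# The surrogate preserves the amplitude spectrum (hence the autocorrelation).
print(np.allclose(np.abs(np.fft.rfft(s)), np.abs(np.fft.rfft(x))))
```

A linearity test then compares a nonlinear statistic (e.g. a recurrence or prediction-error measure) on the original against its distribution over many such surrogates; a significant difference rejects the linear Gaussian hypothesis.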
APA, Harvard, Vancouver, ISO, and other styles
19

Wang, Alice 1975. "Eigenstructure based speech processing in noise". Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/46218.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
20

Strand, Elizabeth A. "Gender Stereotype Effects in Speech Processing". The Ohio State University, 2000. http://rave.ohiolink.edu/etdc/view?acc_num=osu1380895028.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
21

Xu, Jue [Verfasser]. "Adaptations in Speech Processing / Jue Xu". Berlin : Humboldt-Universität zu Berlin, 2021. http://d-nb.info/1236896939/34.

Full text source
APA, Harvard, Vancouver, ISO, and other styles
22

Morris, Robert W. "Enhancement and recognition of whispered speech". Diss., Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
23

Preston, Jonathan. "Phonological processing and speech production in preschoolers with speech sound disorders". Related electronic resource: Current Research at SU : database of SU dissertations, recent titles available full text, 2008. http://wwwlib.umi.com/cr/syr/main.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
24

Payne, Nicole, and Saravanan Elangovan. "Musical Training Influences Temporal Processing of Speech and Non-Speech Contrasts". Digital Commons @ East Tennessee State University, 2012. https://dc.etsu.edu/etsu-works/1565.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
25

Payne, N., Saravanan Elangovan and Jacek Smurzynski. "Auditory Temporal Processing of Speech and Non-speech Contrasts in Specialized Listeners". Digital Commons @ East Tennessee State University, 2012. https://dc.etsu.edu/etsu-works/2216.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
26

Ng, Kwok-hang Ashley. "Phonological processing in children with speech disorders". Click to view the E-thesis via HKUTO, 1995. http://sunzi.lib.hku.hk/hkuto/record/B36209193.

Full text source
Abstract:
Thesis (B.Sc)--University of Hong Kong, 1995.
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, April 28, 1995." Also available in print.
APA, Harvard, Vancouver, ISO, etc. styles
27

Lukach, Melanie. "Speech production processing in the second language". Thesis, University of Ottawa (Canada), 1993. http://hdl.handle.net/10393/6779.

Full text source
Abstract:
The phenomenon of foreign accent has long been a topic of linguistic theory. Neufeld proposes that speech production, especially at the phonological level, is hampered by the use of (conscious or unconscious) knowledge that speakers have about the L2--metalinguistic knowledge. Those who begin acquiring an L2 after the age of five focus more on structural correctness than younger learners, and tend to use this metalinguistic knowledge more often. Thus, even among balanced bilinguals, in an experiment designed to induce focus on form, older learners should produce more speech errors and dysfluencies than native speakers or early bilinguals, and should tend to self-correct more often. This pattern should be even more pronounced in learners who have acquired their L2 in a formal (school) context. An experiment consisting of five tasks was designed to test these three points of Neufeld's Pre- and Post-Articulatory Verification (PAV) model. (Abstract shortened by UMI.)
APA, Harvard, Vancouver, ISO, etc. styles
28

Batri, Nadim. "Robust spectral parameter coding in speech processing". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0005/MQ43996.pdf.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
29

Prager, Richard William. "Parallel processing networks for automatic speech recognition". Thesis, University of Cambridge, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238443.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
30

Garner, Philip Neil. "Bayesian approaches to uncertainty in speech processing". Thesis, University of East Anglia, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.554321.

Full text source
Abstract:
Many techniques in speech processing require inference based on observations that are often noisy, incomplete or scarce. In such situations, it is necessary to draw on statistical techniques that themselves must be robust to the nature of the observations. The Bayesian method is a school of thought within statistics that provides such a robust framework for handling "difficult" data. In particular, it provides means to handle situations where data are scarce or even missing. Three broad situations are outlined in which the Bayesian technique is helpful to solve the associated problems. The analysis covers eight publications that appeared between 1996 and 2011. Dialogue act recognition is the inference of dialogue acts or moves from words spoken in a conversation. A technique is presented based on counting words. It is formulated to be robust to scarce words, and extended such that only discriminative words need be considered. A method of incorporating formant measurements into a hidden Markov model for automatic speech recognition is then outlined. In this case, the Bayesian method leads to a re-interpretation of the formant confidence as the variance of a probability density function describing the location of a formant. Finally, the Gaussian model of speech in noise is examined, leading to improved methods for voice activity detection and for noise robustness.
APA, Harvard, Vancouver, ISO, etc. styles
31

Macdonald, U. U. "Some results in speech processing and recognition". Thesis, University of St Andrews, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.378965.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
32

Javed, Hamza Ahmed. "Perceptual modelling and processing of reverberant speech". Thesis, Imperial College London, 2016. http://hdl.handle.net/10044/1/51415.

Full text source
Abstract:
The study of reverberation, broadly defined as the multipath propagation of sound in an enclosed space, has attracted significant interest from speech processing engineers and researchers alike. This is due in large part to the proliferation of speech processing technologies that facilitate distant-talking speech input. In such scenarios, understanding the impact of reverberation on speech quality, and mitigating its detrimental impact through dereverberation techniques, are important and practically well-motivated tasks. This research concerns both these topics. More specifically, in this work we (1) extend and develop an objective measure for predicting the level of perceived reverberation, (2) conduct an experimental investigation into reverberation perception and (3) propose the use of a spherical microphone array rake receiver to perform speech dereverberation. In order to assess the level of perceived reverberation in speech, we develop the extended Reverberation Decay Tail (RDTx) measure. Employing an improved perceptual model, the performance of the measure is first evaluated objectively. Later, we propose experimental methodologies and listening test schemes to collect subjective assessments. From the data obtained, the acoustic parameters most strongly correlated with reverberation perception are identified. The insights gained from the experimental investigation are used to further develop and validate the model of perceived reverberation incorporated in the RDTx measure. The final contribution of this work is the formulation of acoustic rake receivers in the spherical harmonic domain, which exploit signal reflections to perform speech dereverberation. Evaluating the proposed designs using widely adopted objective metrics, as well as the objective measures developed in this work, demonstrates that the constructive use of early reflections can lead to substantial dereverberation.
APA, Harvard, Vancouver, ISO, etc. styles
33

Wells, Ian. "Digital signal processing architectures for speech recognition". Thesis, University of the West of England, Bristol, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.294705.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
34

Kearns, Ruth Katherine. "Prelexical speech processing by mono- and bilinguals". Thesis, University of Cambridge, 1994. https://www.repository.cam.ac.uk/handle/1810/283696.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
35

Benatan, Matthew Aaron. "Audio-visual speech processing for multimedia localisation". Thesis, University of Leeds, 2016. http://etheses.whiterose.ac.uk/16285/.

Full text source
Abstract:
For many years, film and television have dominated the entertainment industry. Recently, with the introduction of a range of digital formats and mobile devices, multimedia’s ubiquity as the dominant form of entertainment has increased dramatically. This, in turn, has increased demand on the entertainment industry, with production companies looking to increase their revenue by providing entertainment media to a growing international market. This brings with it challenges in the form of multimedia localisation - the process of preparing content for international distribution. The industry is now looking to modernise production processes - moving what were once wholly manual practices to semi-automated workflows. A key aspect of the localisation process is the alignment of content, such as subtitles or audio, when adapting content from one region to another. One method of automating this is through using audio content as a guide, providing a solution via audio-to-text alignment. While many approaches for audio-to-text alignment currently exist, these all require language models - meaning that dozens of language models would be required for these approaches to be reliably implemented in large production companies. To address this, this thesis explores the development of audio-to-text alignment procedures which do not rely on language models, instead providing a language-independent method for aligning multimedia content. To achieve this, the project explores both audio and visual speech processing, with a focus on voice activity detection, as a means for segmenting and aligning audio and text data. The thesis first presents a novel method for detecting speech activity in entertainment media. This method is compared with the current state of the art, and demonstrates significant improvement over baseline methods. Secondly, the thesis explores a novel set of features for detecting voice activity in visual speech data.
Here, we show that the combination of landmark and appearance-based features outperforms recent methods for visual voice activity detection, and specifically that the incorporation of landmark features is particularly crucial when presented with challenging natural speech data. Lastly, a speech activity-based alignment framework is presented which demonstrates encouraging results. Here, we show that Dynamic Time Warping (DTW) can be used for segment matching and alignment of audio and subtitle data, and we also present a novel method for aligning scene-level content which outperforms DTW for sequence alignment of finer-level data. To conclude, we demonstrate that combining global and local alignment approaches achieves strong alignment estimates, but that the resulting output is not sufficient for wholly automated subtitle alignment. We therefore propose that this be used as a platform for the development of lexical-discovery based alignment techniques, as the general alignment provided by our system would improve symbolic sequence discovery for sparse dictionary-based systems.
APA, Harvard, Vancouver, ISO, etc. styles
36

Mészáros, Tomáš. "Speech Analysis for Processing of Musical Signals". Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2015. http://www.nusl.cz/ntk/nusl-234974.

Full text source
Abstract:
The main goal of this work is to enrich musical signals with characteristics of human speech. The work involves creating an audio effect inspired by the talk-box: analysing the vocal tract with a suitable algorithm such as linear prediction, and applying the estimated filter to a musical audio signal. Emphasis is placed on high output quality, low latency and low computational cost for real-time use. The outcome of the work is a software plugin usable in professional audio editing applications and, given a suitable hardware platform, also for live performance. The plugin emulates a real talk-box device and provides comparable output quality with a unique sound.
APA, Harvard, Vancouver, ISO, etc. styles
37

Roweis, Sam T. Hopfield John J. Abu-Mostafa Yaser S. Perona Pietro. "Data driven production models for speech processing /". Diss., Pasadena, Calif. : California Institute of Technology, 1999. http://resolver.caltech.edu/CaltechETD:etd-02272008-093303.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
38

Schwerin, Belinda Marie. "Modulation Domain Based Processing for Speech Enhancement". Thesis, Griffith University, 2013. http://hdl.handle.net/10072/366414.

Full text source
Abstract:
For a long time, the spectral envelope has been accepted as the principal carrier of information important to speech. Therefore much of the work done for speech applications, such as automatic speech recognition and speech enhancement, has aimed to process this envelope. For speech enhancement, given the quasi-stationarity of speech, many approaches have been based on short-time processing of speech in a Fourier analysis-modification-synthesis (AMS) framework. Within this framework, either the magnitude and/or phase spectrum can be modified by a noise suppression or signal estimation approach to achieve enhancement. Most commonly, it is the short-time (acoustic) magnitude spectrum which is modified in order to suppress noise. While there are many methods for enhancement in the literature, it is generally agreed that current methods only succeed in making noise less perceptually annoying while maintaining intelligibility, leaving much room for improvement. In more recent years, the low-frequency temporal modulations of the spectral envelope have received increasing attention. Findings of physiological and psychoacoustic experiments have indicated the importance of these modulations in the human auditory system. This has led to the view that these temporal modulations convey much of the information necessary for speech perception. Many of the efforts to apply modulation processing to the enhancement of speech originated from work in automatic speech recognition, and are based on filtering the trajectories of each acoustic band. However, these filters were typically designed to operate over the entire utterance, without accounting for the properties of speech and noise in the signal. Consequently, processed speech quality is quite poor when the corrupting noise types are dissimilar from that used to design the filters.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
Griffith School of Engineering
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, etc. styles
39

Mitchell, Heather Lynn 1968. "Cognitive-linguistic processing demands and speech breathing". Thesis, The University of Arizona, 1993. http://hdl.handle.net/10150/278341.

Full text source
Abstract:
This investigation examined the influence of cognitive-linguistic processing demands on speech breathing. Twenty women were studied during performance of two speaking activities designed to differ in cognitive-linguistic planning requirements. Speech breathing was monitored with respiratory magnetometers, from which recordings were made of anteroposterior diameter changes of the rib cage and abdomen. Results indicated that speech breathing was highly similar across speaking conditions, with the exception that the average lung volume expended per syllable was greater during performance of the more demanding speaking activity. Further analyses suggested that greater lung volume expenditures were associated with longer expiratory pause times. In conclusion, it appears that general speech breathing performance is essentially unaffected by variations in cognitive-linguistic demands; however, certain fluency-related breathing behaviors are highly sensitive to such demands.
APA, Harvard, Vancouver, ISO, etc. styles
40

Principi, Emanuele. "Pre-processing techniques for automatic speech recognition". Doctoral thesis, Università Politecnica delle Marche, 2009. http://hdl.handle.net/11566/242152.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
41

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing". Thesis, Queensland University of Technology, 2008. https://eprints.qut.edu.au/17689/3/David_Dean_Thesis.pdf.

Full text source
Abstract:
Both human perceptual studies and automatic machine-based experiments have shown that visual information from a speaker's mouth region can improve the robustness of automatic speech processing tasks, especially in the presence of acoustic noise. By taking advantage of the complementary nature of the acoustic and visual speech information, audio-visual speech processing (AVSP) applications can work reliably in more real-world situations than would be possible with traditional acoustic speech processing applications. The two most prominent applications of AVSP for viable human-computer interfaces involve the recognition of the speech events themselves, and the recognition of speakers' identities based upon their speech. However, while these two fields of speech and speaker recognition are closely related, there has been little systematic comparison of the two tasks under similar conditions in the existing literature. Accordingly, the primary focus of this thesis is to compare the suitability of general AVSP techniques for speech or speaker recognition, with a particular focus on synchronous hidden Markov models (SHMMs). The cascading appearance-based approach to visual speech feature extraction has been shown to work well in removing irrelevant static information from the lip region to greatly improve visual speech recognition performance. This thesis demonstrates that these dynamic visual speech features also provide for an improvement in speaker recognition, showing that speakers can be visually recognised by how they speak, in addition to their appearance alone. This thesis investigates a number of novel techniques for training and decoding of SHMMs that improve the audio-visual speech modelling ability of the SHMM approach over the existing state-of-the-art joint-training technique. Novel experiments are conducted to demonstrate that the reliability of the two streams during training is of little importance to the final performance of the SHMM.
Additionally, two novel techniques of normalising the acoustic and visual state classifiers within the SHMM structure are demonstrated for AVSP. Fused hidden Markov model (FHMM) adaptation is introduced as a novel method of adapting SHMMs from existing well-performing acoustic hidden Markov models (HMMs). This technique is demonstrated to provide improved audio-visual modelling over the jointly-trained SHMM approach at all levels of acoustic noise for the recognition of audio-visual speech events. However, the close coupling of the SHMM approach is shown to be less useful for speaker recognition, where a late integration approach is demonstrated to be superior.
APA, Harvard, Vancouver, ISO, etc. styles
42

Dean, David Brendan. "Synchronous HMMs for audio-visual speech processing". Queensland University of Technology, 2008. http://eprints.qut.edu.au/17689/.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
43

Wilson, W. R. "Speech motor control". Thesis, University of Essex, 1986. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.376738.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
44

陳我智 and Ngor-chi Chan. "Text-to-speech conversion for Putonghua". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1990. http://hub.hku.hk/bib/B31209580.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
45

Weywadt, Christina R. "Lateralization of pragmatic processing: a visual half-field investigation of speech act processing". Virtual Press, 2004. http://liblink.bsu.edu/uhtbin/catkey/1292992.

Full text source
Abstract:
The current study utilized a priming paradigm in conjunction with a visual half-field presentation to determine if the right hemisphere contributes to pragmatic processing. Primes included conversational dialogues that either performed a speech act or did not. The targets identified the speech act and were presented to one of the two visual fields (lvf-RH or rvf-LH). It was hypothesized that the right visual field-left hemisphere (rvf-LH) would be more accurate and faster at identifying targets regardless of the script type that preceded it, and that the left visual field-right hemisphere (lvf-RH) would be significantly more accurate and faster at identifying targets when preceded by a script that performed the identified speech act. Results indicated that the lvf-RH was more accurate and faster at identifying a target regardless of the type of script that preceded it, while the rvf-LH was differentially affected by the type of script.
Department of Psychological Science
APA, Harvard, Vancouver, ISO, etc. styles
46

Quackenbush, Schuyler Reynier. "Objective measures of speech quality". Diss., Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/13376.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
47

Reif, Angela. "Self Regulatory Depletion Effects On Speed Within A Complex Speech Processing Task". Bowling Green State University / OhioLINK, 2014. http://rave.ohiolink.edu/etdc/view?acc_num=bgsu1400183863.

Full text source
APA, Harvard, Vancouver, ISO, etc. styles
48

Nathan, Elizabeth. "The development of speech processing skills in children with and without speech difficulties". Thesis, University College London (University of London), 2001. http://discovery.ucl.ac.uk/1349803/.

Full text source
Abstract:
Children with developmental speech disorder of no known aetiology constitute a heterogeneous group, both in their presenting difficulties, which can include additional language and speech perception difficulties, and in the developmental course of the disorder. This thesis examines this heterogeneity from a developmental and psycholinguistic perspective. Using a longitudinal design, speech processing and language skills are explored over three years in a group of children with speech difficulties (n=47) and an age- and nonverbal IQ-matched longitudinal control group (n=47), mean age 4;06 - 6;07. Other measures were of developmental history, family history, psychosocial status and therapy input. Key areas of investigation were: the proportion of children whose speech later resolves; uncovering the 'resolving' and 'persisting' profile; the role of input processing in speech development, in particular, the role of accent variability; and the occurrence of dissociable speech processing patterns on matched word/nonword repetition and on speech input tasks. Group characteristics were examined through an analysis of patterns of dissociation on tasks across the group and an examination of patterns of association on speech and language measures (in comparison to the control group) in order to establish the developmental relationships between different aspects of speech processing. Thus concurrent and longitudinal relationships were examined using descriptive statistics, prospective and retrospective subgroup analyses and multiple regression analyses. A 'persisting' speech profile was identified as a pervasive speech processing and language difficulty and/or more severe speech output problems. A 'resolved' profile was confined to early, moderate, specific speech difficulties. Apart from nonword repetition, there was no evidence that speech outcome was related to different rates of speech or language development. 
Using evidence from normal and atypical development, an interactive view of speech development is outlined. Despite the need to understand development as interactive, speech output performance is argued to be the main factor mediating and constraining change between the ages of 4-6 in children with speech difficulties. An emerging discrepancy between word and nonword repetition, with nonword repetition not improving at similar rates to word repetition in some children with persisting speech difficulties, is cited as additional evidence that speech output, in particular, motor programming deficit, is the core characteristic of a persisting speech disorder.
APA, Harvard, Vancouver, ISO, etc. styles
49

Stark, Anthony. "Phase Spectrum Based Speech Processing and Spectral Energy Estimation for Robust Speech Recognition". Thesis, Griffith University, 2011. http://hdl.handle.net/10072/366490.

Full text source
Abstract:
Speech is the dominant mode of communication between humans; simple to learn, easy to use and integral to modern life. Given the importance of speech, development of a human-machine speech interface has been greatly anticipated. This challenging task is encapsulated in the digital speech processing research field. In this dissertation, two specific areas of research are considered: (1) the use of short-time Fourier spectral phase in digital speech processing and (2) the use of the minimum mean square error spectral energy estimator for environment-robust automatic speech recognition. In speech processing and modelling, the short-time Fourier spectral phase has been considered of minor importance. This is because classic psychoacoustic experiments have shown speech intelligibility to be closely related to short-time Fourier spectral magnitude. Given this result, it is unsurprising that the majority of the speech processing literature has involved exploitation of the short-time magnitude spectrum. Despite this, recent studies have shown that useful information can be extracted from the spectral phase of speech. As a result, it is now known that spectral phase possesses much of the same intelligibility information as spectral magnitude. It is this avenue of research that is explored in greater detail within this dissertation. In particular, we investigate two phase-derived quantities – the short-time instantaneous frequency spectrum and the short-time group delay spectrum. The properties of both spectra are investigated mathematically and empirically, identifying the relationship between known speech features and the underlying phase spectrum. We continue the investigation by examining two related quantities – the instantaneous frequency deviation and the group delay deviation. As a result of this research, two novel phase-based spectral representations are proposed, both of which show a high degree of information applicable to speech processing.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
Griffith School of Engineering
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, etc. styles
50

Lee, Vin-yan Vivian. "Speech errors and the language processing in Cantonese". Click to view the E-thesis via HKUTO, 2001. http://sunzi.lib.hku.hk/hkuto/record/B36207998.

Full text source
Abstract:
Thesis (B.Sc)--University of Hong Kong, 2001.
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, May 4, 2001." Also available in print.
APA, Harvard, Vancouver, ISO, etc. styles