Doctoral dissertations on the topic "Speech processing systems"


Consult the top 50 doctoral dissertations on the topic "Speech processing systems".


1

Coetzee, H. J. "The development of a new objective speech quality measure for speech coding applications". Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/15474.

2

Morris, Robert W. "Enhancement and recognition of whispered speech". Diss., Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.

3

Quackenbush, Schuyler Reynier. "Objective measures of speech quality". Diss., Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/13376.

4

Lucey, Simon. "Audio-visual speech processing". Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g., acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular for the tasks of isolated-word speech recognition and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually.
The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
5

Chiou, Fred Y. "User-interactive speech enhancement using fuzzy logic". Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/14916.

6

陳我智 and Ngor-chi Chan. "Text-to-speech conversion for Putonghua". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1990. http://hub.hku.hk/bib/B31209580.

7

Barger, Peter James. "Speech processing for forensic applications". Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36081/1/36081_Barger_1998.pdf.

Abstract:
This thesis examines speech processing systems appropriate for use in forensic analysis. The need for automatic speech processing systems for forensic use is justified by the increasing use of electronically recorded speech for communication. An automatic speaker identification and verification system is described which was tested on data gathered by the Queensland Police Force. Speaker identification using Gaussian mixture models (GMMs) is shown to be useful as an indicator of identity, but not sufficiently accurate to be used as the sole means of identification. It is shown that training GMMs on speech of one language and testing on speech of another language introduces significant bias into the results, which is unpredictable in its effects. This has implications for the performance of the system on subjects attempting to disguise their voices. Automatic gender identification systems are shown to be highly accurate, attaining 98% accuracy, even with very simple classifiers, and when tested on speech degraded by coding or reverberation. These gender gates are useful as initial classifiers in a larger speaker classification system and may even find independent use in a forensic environment. A dual microphone method of improving the performance of speaker identification systems in noisy environments is described. The method gives a significant improvement in log-likelihood scores when its output is used as input to a GMM. This implies that speaker identification tests may be improved in accuracy. A method of automatically assessing the quality of transmitted speech segments using a classification scheme is described. By classifying the difference between cepstral parameters describing the original speech and the transmitted speech, an estimate of the speech quality is obtained.
8

Yatrou, Paul M. "Analysis of predictor mistracking in ADPCM speech coders". Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=66242.

9

Fang, Jie. "Design of secure speech encryption systems". Thesis, Queensland University of Technology, 1990. https://eprints.qut.edu.au/36471/1/36471_Fang_1990.pdf.

Abstract:
This thesis investigates the design of digital speech encryption systems based on low bit rate vocoders. The speech quality and the cryptographic strength of the system are determined by the vocoder and the encryptor respectively. Three different low bit rate vocoders, a 2400 BPS LPC (Linear Prediction Coding) vocoder, a 9600 BPS MELPC (Multipulse Excited Linear Prediction Coding) vocoder and a 4800 BPS CELP (Codebook Excited Linear Prediction Coding) vocoder, have been simulated. The performances of these vocoders are evaluated by using four objective measures. The thesis considers the following aspects of a digital encryption system: security, speech quality, robustness, and system delay. Several choices of the cryptosystem for the encryption of digital speech are investigated, and the performance of the overall system is discussed. The work presented in this thesis enables a secure communication system designer to select a speech coding scheme and a cipher system to meet the required level of security and speech quality. The design of encryption systems throughout this thesis refers to mathematical analysis and simulation of such systems rather than the actual construction of electronic circuits.
10

Liu, Zhu Lin. "Speech synthesis via adaptive Fourier decomposition". Thesis, University of Macau, 2011. http://umaclib3.umac.mo/record=b2493215.

11

Chan, Ngor-chi. "Text-to-speech conversion for Putonghua /". [Hong Kong : University of Hong Kong], 1990. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12929475.

12

Mazel, David S. "Sinusoidal modeling of speech". Thesis, Georgia Institute of Technology, 1986. http://hdl.handle.net/1853/13873.

13

Alphonso, Issac John. "Network training for continuous speech recognition". Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-10252003-105104.

14

Little, M. A. "Biomechanically informed nonlinear speech signal processing". Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Abstract:
Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. In order therefore to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques, and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies, the study concludes that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically-informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals.
Using a simple pattern classifier these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically-informed, nonlinear signal processing techniques.
15

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition". Thesis, Queensland University of Technology, 2001.

16

Cao, Yuchang. "Speech enhancement with single and multiple microphones". Thesis, Queensland University of Technology, 1996.

17

Chung, Jae H. "A new homomorphic vocoder framework using analysis-by-synthesis excitation analysis". Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/15471.

18

Crosmer, Joel R. "Very low bit rate speech coding using the line spectrum pair transformation of the LPC coefficients". Diss., Georgia Institute of Technology, 1985. http://hdl.handle.net/1853/15739.

19

Wang, Raymond Jian-Wei. "Neurocomputing systems for auditory processing". Thesis, The University of Sydney, 1998. https://hdl.handle.net/2123/26278.

Abstract:
This thesis studies neural computation models and neuromorphic implementations of the auditory pathway with applications to cochlear implants and artificial auditory sensory and processing systems. Very low power analogue computation is addressed through the design of micropower analogue building blocks and an auditory preprocessing module targeted at cochlear implants. The analogue building blocks have been fabricated and tested in a standard Complementary Metal Oxide Semiconductor (CMOS) process. The auditory preprocessing module design is based on the cochlea signal processing mechanisms and low power microelectronic design methodologies. Compared to existing preprocessing techniques used in cochlear implants, the proposed design has a wider dynamic range and lower power consumption. Furthermore, it provides the phase coding as well as the place coding information that are necessary for enhanced functionality in future cochlear implants. The thesis presents neural computation based approaches to a number of signal-processing problems encountered in cochlear implants. Techniques that can improve the performance of existing devices are also presented. Neural network based models for loudness mapping and pattern recognition based channel selection strategies are described. Compared with state-of-the-art commercial cochlear implants, the thesis results show that the proposed channel selection model produces superior speech sound qualities, and the proposed loudness mapping model consumes substantially smaller amounts of memory. Aside from the applications in cochlear implants, this thesis describes a biologically plausible computational model of the auditory pathways to the superior colliculus based on current neurophysiological findings. The model encapsulates interaural time difference, interaural spectral difference, monaural pathway and auditory space map tuning in the inferior colliculus.
A biologically plausible Hebbian-like learning rule is proposed for auditory space neural map tuning, and a reinforcement learning method is used for map alignment with other sensory space maps through activity-independent cues. The validity of the proposed auditory pathway model has been verified by simulation using synthetic data. Further, a complete biologically inspired auditory simulation system is implemented in software. The system incorporates models of the external ear, the cochlea, as well as the proposed auditory pathway model. The proposed implementation can mimic the biological auditory sensory system to generate an auditory space map from 3-D sounds. A large amount of real 3-D sound signals including broadband white noise, click noise and speech are used in the simulation experiments. The effect of the auditory space map developmental plasticity is examined by simulating early auditory space map formation and auditory space map alignment with a distorted visual sensory map. Detailed simulation methods, procedures and results are presented.
20

Rose, Richard C. "The design and performance of an analysis-by-synthesis class of predictive speech coders". Diss., Georgia Institute of Technology, 1988. http://hdl.handle.net/1853/16693.

21

Fisher, Andrew John. "Speech enhancement for forensic applications". Thesis, Queensland University of Technology, 1995. https://eprints.qut.edu.au/36243/1/36243_Fisher_1995.pdf.

Abstract:
Law enforcement agencies often engage in surveillance operations which involve the recording of spoken conversations. As is often the case, these recordings are made with a single microphone under covert conditions. Under this non-ideal situation, the speech signal is highly susceptible to severe corruption by various forms of noise, the most common of which is broadband in nature. This thesis presents a study conducted to investigate the enhancement of speech recordings for forensic applications. A new speech enhancement scheme has been proposed here, to provide noise reduction without compromising the intelligibility of the speech. The scheme implements a hybrid approach combining both spectral and root-cepstral subtraction. Extensive testing using both subjective and objective intelligibility and acceptability assessment schemes indicates that the system is successful in providing intelligibility improvement and superior signal-to-noise ratio with minimal spectral distortion. In addition, the proposed system was also tested as a preprocessing stage to other speech applications such as speech recognition, speaker recognition and speech coding. The system proved to be beneficial for speech coding, while application to the recognition techniques was limited despite showing positive potential. Finally, the system was implemented in real time and was also found successful when applied to the enhancement of speech transmitted over High Frequency communication channels.
22

Hosom, John-Paul. "Automatic time alignment of phonemes using acoustic-phonetic information /". Full text open access at:, 2000. http://content.ohsu.edu/u?/etd,282.

23

Ikram, Muhammad Zubair. "Multichannel blind separation of speech signals in a reverberant environment". Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15023.

24

Wilson, Shawn C. "Voice recognition systems : assessment of implementation aboard U.S. naval ships". Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FWilson.pdf.

Abstract:
Thesis (M.S. in Information Systems and Operations)--Naval Postgraduate School, March 2003.
Thesis advisor(s): Michael T. McMaster, Kenneth J. Hagan. Includes bibliographical references (p. 47-49). Also available online.
25

Müller, J. J. "USB telephony interface device for speech recognition applications /". Link to the online version, 2005. http://hdl.handle.net/10019/1127.

26

Boulis, Constantinos. "Topic learning in text and conversational speech /". Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/5914.

27

Farges, Eric P. "An analysis-synthesis hidden Markov model of speech". Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/14775.

28

LeBlanc, Wilfrid P. (Wilfrid Paul) Carleton University Dissertation Engineering Electrical. "Speech coding at low to medium bit rates". Ottawa, 1992.

29

Anderson, David Verl. "Audio signal enhancement using multi-resolution sinusoidal modeling". Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15394.

30

Kale, Kaustubh R. "Low complexity, narrow baseline beamformer for hand-held devices". [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0001223.

31

Hild, Kenneth E. "Blind separation of convolutive mixtures using Renyi's divergence". [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0002387.

32

Keenaghan, Kevin Michael. "A Novel Non-Acoustic Voiced Speech Sensor Experimental Results and Characterization". Link to electronic thesis, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0114104-144946/.

33

Iyengar, Vasu. "A low delay 16 kbit/sec coder for speech signals /". Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63799.

34

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders". Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/36534.

35

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders". Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-06072004-131138/unrestricted/ertan%5Fali%5Fe%5F200405%5Fphd.pdf.

Abstract:
Thesis (Ph. D.)--School of Electrical and Computer Engineering, Georgia Institute of Technology, 2004. Directed by Thomas P. Barnwell, III.
Vita. Includes bibliographical references (leaves 221-226).
36

Ng, H. N. Elaine. "Effects of noise type on speech understanding". Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B37990159.

37

Ng, H. N. Elaine, and 吳凱寧. "Effects of noise type on speech understanding". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37990159.

38

Lai, Yiu Pong. "Maximum likelihood normalization for robust speech recognition /". View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20LAI.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 98-103). Also available in electronic version. Access restricted to campus users.
39

Li, Chak Fai. "Improved polynomial segment model for speech recognition /". View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20LI.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 80-84). Also available in electronic version. Access restricted to campus users.
40

Macon, Michael W. "Speech synthesis based on sinusoidal modeling". Diss., Georgia Institute of Technology, 1996. http://hdl.handle.net/1853/13904.

41

Lam, Victor T. M. "The stability of pitch synthesis filters in speech coding /". Thesis, McGill University, 1985. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63361.

42

O'Rourke, William Thomas. "Real-world evaluation of mobile phone speech enhancement algorithms". [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000585.

43

Ellis, Richard T. "Speech enhancement system implemented in CMOS". Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/14801.

44

Al-Darkazali, Mohammed. "Image processing methods to segment speech spectrograms for word level recognition". Thesis, University of Sussex, 2017. http://sro.sussex.ac.uk/id/eprint/71675/.

Abstract:
The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real-time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and acoustic speech representation in the statistical mapping causes a failure to produce a match of the acoustic speech signals by Hidden Markov Model (HMM) methods and consequently leads to classification errors. Certainly, these errors in the low level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, it seems that ASR requires additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focusses on applying image processing to speech spectrogram images (SSI). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMM) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments.
Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM-based high level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching the SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
45

Chu, Kam Keung. "Feature extraction based on perceptual non-uniform spectral compression for noisy speech recognition /". access full-text access abstract and table of contents, 2005. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?mphil-ee-b19887516a.pdf.

Abstract:
Thesis (M.Phil.)--City University of Hong Kong, 2005.
"Submitted to Department of Electronic Engineering in partial fulfillment of the requirements for the degree of Master of Philosophy" Includes bibliographical references (leaves 143-147)
46

Wasmeier, Hans. "Development of tests and preprocessing algorithms for evaluation and improvement of speech recognition units". Thesis, University of British Columbia, 1986. http://hdl.handle.net/2429/26750.

Abstract:
This study considered the evaluation of commercially available isolated word, speaker dependent, speech recognition units, and preprocessing techniques that may be used for improving their performance. The problem was considered in three separate stages. A series of tests were designed to exercise an isolated word, speaker dependent, speech recognition unit. These tests provided a sound basis for determining a given unit's strengths and weaknesses. This knowledge permits a more informed decision on the best recognition device for a given price range. As well, this knowledge may be used in the design of a robust vocabulary, and creation of guidelines for best performance. The test vocabularies were based on the forty English phonemes identified by Rabiner and Schafer [28] and the test variations were representative of common variations which may be expected in normal use. A digital archive system was implemented for storing the voice input of test subjects. This facility provided a data base for an investigation of preprocessing techniques. As well, it permits the testing of different speech recognition units with the same voice input, providing a platform for device comparison. Several speech preprocessing and performance improvement techniques were then investigated. Specifically, two types of time normalization, the enhancement of low energy phonemes and a change in training technique were investigated. These techniques permit a more accurate analysis of the failure mechanism of the speech recognition unit. They may also provide the basis for a speech preprocessor design which could be placed in front of a commercial speech recognition unit. A commercially available speech recognition unit, the NEC SR100, was used as a measure of the effectiveness of the tests and of the improvements. 
Results of the study indicated that the designed tests and the preprocessing & performance improvement techniques investigated were useful in identifying the speech recognition unit's weaknesses. Also, depending on the economics of implementation, it was found that preprocessing may provide a cost effective solution to some of the recognition unit's shortcomings.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate
47

Lee, Spencer Jaehoon Gilbert Juan E. "Post-speech-recognition processing in domain-specific text-corpus-based distributed listening system analysis, interpretation and selection of speech recognition results /". Auburn, Ala., 2006. http://repo.lib.auburn.edu/2006%20Summer/Theses/LEE_SPENCER_7.pdf.

48

Rao, Ram Raghavendra. "Audio-visual interaction in multimedia". Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.

49

Necioğlu, Burhan F. "Objectively measured descriptors for perceptual characterization of speakers". Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15035.

50

McCree, Alan V. "A new LPC vocoder model for low bit rate speech coding". Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15053.
