Dissertations / Theses on the topic 'Speech processing systems'

Consult the top 50 dissertations / theses for your research on the topic 'Speech processing systems.'

1

Coetzee, H. J. "The development of a new objective speech quality measure for speech coding applications." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/15474.

2

Morris, Robert W. "Enhancement and recognition of whispered speech." Diss., Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.

3

Quackenbush, Schuyler Reynier. "Objective measures of speech quality." Diss., Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/13376.

4

Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract:
Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually. The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.
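To make the idea of combining acoustic and visual classifiers concrete, here is a minimal Python sketch of weighted late fusion of per-class log-likelihood scores. It is not the framework proposed in the thesis; the function names, the score dictionaries, and the fixed `acoustic_weight` are all assumptions chosen for illustration.

```python
def fuse_scores(acoustic, visual, acoustic_weight=0.7):
    """Combine per-class log-likelihoods from two classifiers (hypothetical helper).

    acoustic, visual: dicts mapping class label -> log-likelihood.
    acoustic_weight: trust placed in the acoustic classifier, in [0, 1].
    """
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must lie in [0, 1]")
    visual_weight = 1.0 - acoustic_weight
    # Weighted sum of log-likelihoods, one fused score per class label.
    return {
        label: acoustic_weight * acoustic[label] + visual_weight * visual[label]
        for label in acoustic
    }


def classify(acoustic, visual, acoustic_weight=0.7):
    """Return the class label with the highest fused score."""
    fused = fuse_scores(acoustic, visual, acoustic_weight)
    return max(fused, key=fused.get)
```

In clean audio one might keep `acoustic_weight` high; under acoustic noise, lowering it lets the visual classifier dominate the decision.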
5

Chiou, Fred Y. "User-interactive speech enhancement using fuzzy logic." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/14916.

6

陳我智 and Ngor-chi Chan. "Text-to-speech conversion for Putonghua." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1990. http://hub.hku.hk/bib/B31209580.

7

Barger, Peter James. "Speech processing for forensic applications." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36081/1/36081_Barger_1998.pdf.

Abstract:
This thesis examines speech processing systems appropriate for use in forensic analysis. The need for automatic speech processing systems for forensic use is justified by the increasing use of electronically recorded speech for communication. An automatic speaker identification and verification system is described which was tested on data gathered by the Queensland Police Force. Speaker identification using Gaussian mixture models (GMMs) is shown to be useful as an indicator of identity, but not sufficiently accurate to be used as the sole means of identification. It is shown that training GMMs on speech of one language and testing on speech of another language introduces significant bias into the results, which is unpredictable in its effects. This has implications for the performance of the system on subjects attempting to disguise their voices. Automatic gender identification systems are shown to be highly accurate, attaining 98% accuracy, even with very simple classifiers, and when tested on speech degraded by coding or reverberation. These gender gates are useful as initial classifiers in a larger speaker classification system and may even find independent use in a forensic environment. A dual microphone method of improving the performance of speaker identification systems in noisy environments is described. The method gives a significant improvement in log-likelihood scores when its output is used as input to a GMM. This implies that speaker identification tests may be improved in accuracy. A method of automatically assessing the quality of transmitted speech segments using a classification scheme is described. By classifying the difference between cepstral parameters describing the original speech and the transmitted speech, an estimate of the speech quality is obtained.
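The cepstral comparison mentioned at the end of this abstract can be sketched as follows. This Python example computes a real cepstrum for one frame of original and transmitted speech and measures the Euclidean distance between the low-order coefficients. It is only an illustration under assumptions: the thesis does not specify this exact distance, and the function names, the naive DFT, `n_coeffs`, and `eps` are hypothetical choices.

```python
import cmath
import math


def real_cepstrum(frame, n_coeffs=8, eps=1e-12):
    """Return the first n_coeffs real-cepstrum coefficients of one frame."""
    n = len(frame)
    # Naive DFT (O(n^2)); adequate for short illustrative frames.
    spectrum = [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]
    # Log magnitude spectrum; eps guards against log(0).
    log_mag = [math.log(abs(value) + eps) for value in spectrum]
    # Inverse DFT of the log magnitude gives the real cepstrum.
    cepstrum = [
        sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / n) for k in range(n)).real / n
        for q in range(n)
    ]
    return cepstrum[:n_coeffs]


def cepstral_distance(original, transmitted, n_coeffs=8):
    """Euclidean distance between cepstra, skipping coefficient 0 (overall gain)."""
    a = real_cepstrum(original, n_coeffs)
    b = real_cepstrum(transmitted, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a[1:], b[1:])))
```

Skipping coefficient 0 makes the measure insensitive to a pure change in level, so only changes in spectral shape contribute. A classifier of the kind the abstract describes would then map such distances to quality categories.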
8

Yatrou, Paul M. "Analysis of predictor mistracking in ADPCM speech coders." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=66242.

9

Fang, Jie. "Design of secure speech encryption systems." Thesis, Queensland University of Technology, 1990. https://eprints.qut.edu.au/36471/1/36471_Fang_1990.pdf.

Abstract:
This thesis investigates the design of digital speech encryption systems based on low bit rate vocoders. The speech quality and the cryptographic strength of the system are determined by the vocoder and the encryptor, respectively. Three different low bit rate vocoders have been simulated: a 2400 BPS LPC (Linear Prediction Coding) vocoder, a 9600 BPS MELPC (Multipulse Excited Linear Prediction Coding) vocoder, and a 4800 BPS CELP (Codebook Excited Linear Prediction Coding) vocoder. The performances of these vocoders are evaluated by using four objective measures. The thesis considers the following aspects of a digital encryption system: security, speech quality, robustness, and system delay. Several choices of cryptosystem for the encryption of digital speech are investigated, and the performance of the overall system is discussed. The work presented in this thesis enables a secure communication system designer to select a speech coding scheme and a cipher system to meet the required level of security and speech quality. The design of encryption systems throughout this thesis refers to mathematical analysis and simulation of such systems rather than the actual construction of electronic circuits.
10

Liu, Zhu Lin. "Speech synthesis via adaptive Fourier decomposition." Thesis, University of Macau, 2011. http://umaclib3.umac.mo/record=b2493215.

11

Chan, Ngor-chi. "Text-to-speech conversion for Putonghua /." [Hong Kong : University of Hong Kong], 1990. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12929475.

12

Mazel, David S. "Sinusoidal modeling of speech." Thesis, Georgia Institute of Technology, 1986. http://hdl.handle.net/1853/13873.

13

Alphonso, Issac John. "Network training for continuous speech recognition." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-10252003-105104.

14

Little, M. A. "Biomechanically informed nonlinear speech signal processing." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Abstract:
Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. In order therefore to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques, and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies, the study concludes that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically-informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals. Using a simple pattern classifier these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically-informed, nonlinear signal processing techniques.
15

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition." Thesis, Queensland University of Technology, 2001.

16

Cao, Yuchang. "Speech enhancement with single and multiple microphones." Thesis, Queensland University of Technology, 1996.

17

Chung, Jae H. "A new homomorphic vocoder framework using analysis-by-synthesis excitation analysis." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/15471.

18

Crosmer, Joel R. "Very low bit rate speech coding using the line spectrum pair transformation of the LPC coefficients." Diss., Georgia Institute of Technology, 1985. http://hdl.handle.net/1853/15739.

19

Wang, Raymond Jian-Wei. "Neurocomputing systems for auditory processing." Thesis, The University of Sydney, 1998. https://hdl.handle.net/2123/26278.

Abstract:
This thesis studies neural computation models and neuromorphic implementations of the auditory pathway with applications to cochlear implants and artificial auditory sensory and processing systems. Very low power analogue computation is addressed through the design of micropower analogue building blocks and an auditory preprocessing module targeted at cochlear implants. The analogue building blocks have been fabricated and tested in a standard Complementary Metal Oxide Semiconductor (CMOS) process. The auditory preprocessing module design is based on the cochlea signal processing mechanisms and low power microelectronic design methodologies. Compared to existing preprocessing techniques used in cochlear implants, the proposed design has a wider dynamic range and lower power consumption. Furthermore, it provides the phase coding as well as the place coding information that are necessary for enhanced functionality in future cochlear implants. The thesis presents neural computation based approaches to a number of signal-processing problems encountered in cochlear implants. Techniques that can improve the performance of existing devices are also presented. Neural network based models for loudness mapping and pattern recognition based channel selection strategies are described. Compared with state-of-the-art commercial cochlear implants, the thesis results show that the proposed channel selection model produces superior speech sound qualities, and the proposed loudness mapping model consumes substantially smaller amounts of memory. Aside from the applications in cochlear implants, this thesis describes a biologically plausible computational model of the auditory pathways to the superior colliculus based on current neurophysiological findings. The model encapsulates interaural time difference, interaural spectral difference, monaural pathway and auditory space map tuning in the inferior colliculus. A biologically plausible Hebbian-like learning rule is proposed for auditory space neural map tuning, and a reinforcement learning method is used for map alignment with other sensory space maps through activity independent cues. The validity of the proposed auditory pathway model has been verified by simulation using synthetic data. Further, a complete biologically inspired auditory simulation system is implemented in software. The system incorporates models of the external ear, the cochlea, as well as the proposed auditory pathway model. The proposed implementation can mimic the biological auditory sensory system to generate an auditory space map from 3-D sounds. A large amount of real 3-D sound signals including broadband white noise, click noise and speech are used in the simulation experiments. The effect of the auditory space map developmental plasticity is examined by simulating early auditory space map formation and auditory space map alignment with a distorted visual sensory map. Detailed simulation methods, procedures and results are presented.
20

Rose, Richard C. "The design and performance of an analysis-by-synthesis class of predictive speech coders." Diss., Georgia Institute of Technology, 1988. http://hdl.handle.net/1853/16693.

21

Fisher, Andrew John. "Speech enhancement for forensic applications." Thesis, Queensland University of Technology, 1995. https://eprints.qut.edu.au/36243/1/36243_Fisher_1995.pdf.

Abstract:
Law enforcement agencies often engage in surveillance operations which involve the recording of spoken conversations. As is often the case, these recordings are made with a single microphone under covert conditions. Under these non-ideal conditions, the speech signal is highly susceptible to severe corruption by various forms of noise, the most common of which is broadband in nature. This thesis presents a study conducted to investigate the enhancement of speech recordings for forensic applications. A new speech enhancement scheme is proposed to provide noise reduction without compromising the intelligibility of the speech. The scheme implements a hybrid approach combining both spectral and root-cepstral subtraction. Extensive testing using both subjective and objective intelligibility and acceptability assessment schemes indicates that the system is successful in providing intelligibility improvement and superior signal-to-noise ratio with minimal spectral distortion. In addition, the proposed system was tested as a preprocessing stage to other speech applications such as speech recognition, speaker recognition and speech coding. The system proved to be beneficial for speech coding, while its benefit for the recognition techniques was limited despite showing positive potential. Finally, the system was implemented in real time and was also found successful when applied to enhancement of speech transmitted over High Frequency communication channels.
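As a rough illustration of the spectral-subtraction half of such a scheme, the following Python sketch subtracts an estimated noise magnitude spectrum from one frame and keeps the noisy phase. It is not the thesis's hybrid spectral and root-cepstral method; the function names, the naive DFT helpers, and the `floor` parameter are assumptions made for this example.

```python
import cmath
import math


def dft(frame):
    """Naive discrete Fourier transform (O(n^2)), fine for short frames."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Naive inverse DFT, returning the real part of each sample."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real / n
        for t in range(n)
    ]


def spectral_subtract(frame, noise_mag, floor=0.01):
    """Subtract a noise magnitude estimate from one frame (hypothetical helper).

    noise_mag: per-bin noise magnitude, e.g. averaged over speech-free frames.
    floor: fraction of the noisy magnitude kept as a minimum in each bin.
    """
    cleaned = []
    for bin_value, noise in zip(dft(frame), noise_mag):
        magnitude = abs(bin_value)
        phase = cmath.phase(bin_value)
        # Subtract the noise estimate, but never fall below the spectral floor.
        new_magnitude = max(magnitude - noise, floor * magnitude)
        # Rebuild the bin from the reduced magnitude and the original phase.
        cleaned.append(cmath.rect(new_magnitude, phase))
    return idft(cleaned)
```

The spectral floor is a common guard against negative magnitudes after subtraction; in practice frames would also be windowed and overlap-added, which this sketch omits.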
22

Hosom, John-Paul. "Automatic time alignment of phonemes using acoustic-phonetic information /." Full text open access at:, 2000. http://content.ohsu.edu/u?/etd,282.

23

Ikram, Muhammad Zubair. "Multichannel blind separation of speech signals in a reverberant environment." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15023.

24

Wilson, Shawn C. "Voice recognition systems : assessment of implementation aboard U.S. naval ships." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FWilson.pdf.

Abstract:
Thesis (M.S. in Information Systems and Operations)--Naval Postgraduate School, March 2003.
Thesis advisor(s): Michael T. McMaster, Kenneth J. Hagan. Includes bibliographical references (p. 47-49). Also available online.
25

Müller, J. J. "USB telephony interface device for speech recognition applications /." Link to the online version, 2005. http://hdl.handle.net/10019/1127.

26

Boulis, Constantinos. "Topic learning in text and conversational speech /." Thesis, Connect to this title online; UW restricted, 2005. http://hdl.handle.net/1773/5914.

27

Farges, Eric P. "An analysis-synthesis hidden Markov model of speech." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/14775.

28

LeBlanc, Wilfrid P. (Wilfrid Paul) Carleton University Dissertation Engineering Electrical. "Speech coding at low to medium bit rates." Ottawa, 1992.

29

Anderson, David Verl. "Audio signal enhancement using multi-resolution sinusoidal modeling." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15394.

30

Kale, Kaustubh R. "Low complexity, narrow baseline beamformer for hand-held devices." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0001223.

31

Hild, Kenneth E. "Blind separation of convolutive mixtures using Renyi's divergence." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0002387.

32

Keenaghan, Kevin Michael. "A Novel Non-Acoustic Voiced Speech Sensor Experimental Results and Characterization." Link to electronic thesis, 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0114104-144946/.

33

Iyengar, Vasu. "A low delay 16 kbit/sec coder for speech signals /." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63799.

34

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/36534.

35

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Available online, Georgia Institute of Technology, 2004:, 2003. http://etd.gatech.edu/theses/available/etd-06072004-131138/unrestricted/ertan%5Fali%5Fe%5F200405%5Fphd.pdf.

Abstract:
Thesis (Ph. D.)--School of Electrical and Computer Engineering, Georgia Institute of Technology, 2004. Directed by Thomas P. Barnwell, III.
Vita. Includes bibliographical references (leaves 221-226).
36

Ng, H. N. Elaine. "Effects of noise type on speech understanding." Click to view the E-thesis via HKUTO, 2006. http://sunzi.lib.hku.hk/hkuto/record/B37990159.

37

Ng, H. N. Elaine, and 吳凱寧. "Effects of noise type on speech understanding." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37990159.

38

Lai, Yiu Pong. "Maximum likelihood normalization for robust speech recognition /." View Abstract or Full-Text, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20LAI.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 98-103). Also available in electronic version. Access restricted to campus users.
39

Li, Chak Fai. "Improved polynomial segment model for speech recognition /." View abstract or full-text, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20LI.

Abstract:
Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 80-84). Also available in electronic version. Access restricted to campus users.
40

Macon, Michael W. "Speech synthesis based on sinusoidal modeling." Diss., Georgia Institute of Technology, 1996. http://hdl.handle.net/1853/13904.

41

Lam, Victor T. M. "The stability of pitch synthesis filters in speech coding /." Thesis, McGill University, 1985. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63361.

42

O'Rourke, William Thomas. "Real-world evaluation of mobile phone speech enhancement algorithms." [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000585.

43

Ellis, Richard T. "Speech enhancement system implemented in CMOS." Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/14801.

44

Al-Darkazali, Mohammed. "Image processing methods to segment speech spectrograms for word level recognition." Thesis, University of Sussex, 2017. http://sro.sussex.ac.uk/id/eprint/71675/.

Abstract:
The ultimate goal of automatic speech recognition (ASR) research is to allow a computer to recognize speech in real-time, with full accuracy, independent of vocabulary size, noise, speaker characteristics or accent. Today, systems are trained to learn an individual speaker's voice and larger vocabularies statistically, but accuracy is not ideal. A small gap between actual speech and acoustic speech representation in the statistical mapping causes a failure to produce a match of the acoustic speech signals by Hidden Markov Model (HMM) methods and consequently leads to classification errors. Certainly, these errors in the low level recognition stage of ASR produce unavoidable errors at the higher levels. Therefore, it seems that ASR requires additional research ideas to be incorporated within current speech recognition systems. This study seeks a new perspective on speech recognition. It incorporates a new approach for speech recognition, supporting it with wider previous research, validating it with a lexicon of 533 words and integrating it with a current speech recognition method to overcome the existing limitations. The study focusses on applying image processing to speech spectrogram images (SSI). We thus develop a new writing system, which we call the Speech-Image Recogniser Code (SIR-CODE). The SIR-CODE refers to the transposition of the speech signal to an artificial domain (the SSI) that allows the classification of the speech signal into segments. The SIR-CODE allows the matching of all speech features (formants, power spectrum, duration, cues of articulation places, etc.) in one process. This was made possible by adding a Realization Layer (RL) on top of the traditional speech recognition layer (based on HMM) to check all sequential phones of a word in a single-step matching process. The study shows that the method gives better recognition results than HMMs alone, leading to accurate and reliable ASR in noisy environments. Therefore, the addition of the RL for SSI matching is a highly promising solution to compensate for the failure of HMMs in low level recognition. In addition, the same concept of employing SSIs can be used for whole sentences to reduce classification errors in HMM based high level recognition. The SIR-CODE bridges the gap between theory and practice of phoneme recognition by matching the SSI patterns at the word level. Thus, it can be adapted for dynamic time warping on the SIR-CODE segments, which can help to achieve ASR based on SSI matching alone.
45

Chu, Kam Keung. "Feature extraction based on perceptual non-uniform spectral compression for noisy speech recognition /." access full-text access abstract and table of contents, 2005. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?mphil-ee-b19887516a.pdf.

Abstract:
Thesis (M.Phil.)--City University of Hong Kong, 2005.
"Submitted to Department of Electronic Engineering in partial fulfillment of the requirements for the degree of Master of Philosophy" Includes bibliographical references (leaves 143-147)
46

Wasmeier, Hans. "Development of tests and preprocessing algorithms for evaluation and improvement of speech recognition units." Thesis, University of British Columbia, 1986. http://hdl.handle.net/2429/26750.

Abstract:
This study considered the evaluation of commercially available isolated word, speaker dependent, speech recognition units, and preprocessing techniques that may be used for improving their performance. The problem was considered in three separate stages. A series of tests were designed to exercise an isolated word, speaker dependent, speech recognition unit. These tests provided a sound basis for determining a given unit's strengths and weaknesses. This knowledge permits a more informed decision on the best recognition device for a given price range. As well, this knowledge may be used in the design of a robust vocabulary, and creation of guidelines for best performance. The test vocabularies were based on the forty English phonemes identified by Rabiner and Schafer [28] and the test variations were representative of common variations which may be expected in normal use. A digital archive system was implemented for storing the voice input of test subjects. This facility provided a data base for an investigation of preprocessing techniques. As well, it permits the testing of different speech recognition units with the same voice input, providing a platform for device comparison. Several speech preprocessing and performance improvement techniques were then investigated. Specifically, two types of time normalization, the enhancement of low energy phonemes and a change in training technique were investigated. These techniques permit a more accurate analysis of the failure mechanism of the speech recognition unit. They may also provide the basis for a speech preprocessor design which could be placed in front of a commercial speech recognition unit. A commercially available speech recognition unit, the NEC SR100, was used as a measure of the effectiveness of the tests and of the improvements. Results of the study indicated that the designed tests and the preprocessing and performance improvement techniques investigated were useful in identifying the speech recognition unit's weaknesses. Also, depending on the economics of implementation, it was found that preprocessing may provide a cost effective solution to some of the recognition unit's shortcomings.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate
47

Lee, Spencer Jaehoon Gilbert Juan E. "Post-speech-recognition processing in domain-specific text-corpus-based distributed listening system analysis, interpretation and selection of speech recognition results /." Auburn, Ala., 2006. http://repo.lib.auburn.edu/2006%20Summer/Theses/LEE_SPENCER_7.pdf.

48

Rao, Ram Raghavendra. "Audio-visual interaction in multimedia." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.

49

Necioğlu, Burhan F. "Objectively measured descriptors for perceptual characterization of speakers." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15035.

50

McCree, Alan V. "A new LPC vocoder model for low bit rate speech coding." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15053.
