Dissertations / Theses: 'Speech processing systems'

1

Coetzee, H. J. "The development of a new objective speech quality measure for speech coding applications." Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/15474.

2

Morris, Robert W. "Enhancement and recognition of whispered speech." Diss., Georgia Institute of Technology, 2003. http://etd.gatech.edu/theses/available/etd-04082004-180338/unrestricted/morris%5frobert%5fw%5f200312%5fphd.pdf.

3

Quackenbush, Schuyler Reynier. "Objective measures of speech quality." Diss., Georgia Institute of Technology, 1995. http://hdl.handle.net/1853/13376.

4

Lucey, Simon. "Audio-visual speech processing." Thesis, Queensland University of Technology, 2002. https://eprints.qut.edu.au/36172/7/SimonLuceyPhDThesis.pdf.

Abstract:

Speech is inherently bimodal, relying on cues from the acoustic and visual speech modalities for perception. The McGurk effect demonstrates that when humans are presented with conflicting acoustic and visual stimuli, the perceived sound may not exist in either modality. This effect has formed the basis for modelling the complementary nature of acoustic and visual speech by encapsulating them into the relatively new research field of audio-visual speech processing (AVSP). Traditional acoustic-based speech processing systems have attained a high level of performance in recent years, but the performance of these systems is heavily dependent on a match between training and testing conditions. In the presence of mismatched conditions (e.g. acoustic noise) the performance of acoustic speech processing applications can degrade markedly. AVSP aims to increase the robustness and performance of conventional speech processing applications through the integration of the acoustic and visual modalities of speech, in particular the tasks of isolated word speech and text-dependent speaker recognition. Two major problems in AVSP are addressed in this thesis, the first of which concerns the extraction of pertinent visual features for effective speech reading and visual speaker recognition. Appropriate representations of the mouth are explored for improved classification performance for speech and speaker recognition. Secondly, there is the question of how to effectively integrate the acoustic and visual speech modalities for robust and improved performance. This question is explored in depth using hidden Markov model (HMM) classifiers. The development and investigation of integration strategies for AVSP required research into a new branch of pattern recognition known as classifier combination theory. A novel framework is presented for optimally combining classifiers so their combined performance is greater than any of those classifiers individually.
The benefits of this framework are not restricted to AVSP, as they can be applied to any task where there is a need for combining independent classifiers.

5

Chiou, Fred Y. "User-interactive speech enhancement using fuzzy logic." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/14916.

6

陳我智 and Ngor-chi Chan. "Text-to-speech conversion for Putonghua." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1990. http://hub.hku.hk/bib/B31209580.

7

Barger, Peter James. "Speech processing for forensic applications." Thesis, Queensland University of Technology, 1998. https://eprints.qut.edu.au/36081/1/36081_Barger_1998.pdf.

Abstract:

This thesis examines speech processing systems appropriate for use in forensic analysis. The need for automatic speech processing systems for forensic use is justified by the increasing use of electronically recorded speech for communication. An automatic speaker identification and verification system is described which was tested on data gathered by the Queensland Police Force. Speaker identification using Gaussian mixture models (GMMs) is shown to be useful as an indicator of identity, but not sufficiently accurate to be used as the sole means of identification. It is shown that training GMMs on speech of one language and testing on speech of another language introduces significant bias into the results, which is unpredictable in its effects. This has implications for the performance of the system on subjects attempting to disguise their voices. Automatic gender identification systems are shown to be highly accurate, attaining 98% accuracy, even with very simple classifiers, and when tested on speech degraded by coding or reverberation. These gender gates are useful as initial classifiers in a larger speaker classification system and may even find independent use in a forensic environment. A dual microphone method of improving the performance of speaker identification systems in noisy environments is described. The method gives a significant improvement in log-likelihood scores when its output is used as input to a GMM. This implies that speaker identification tests may be improved in accuracy. A method of automatically assessing the quality of transmitted speech segments using a classification scheme is described. By classifying the difference between cepstral parameters describing the original speech and the transmitted speech, an estimate of the speech quality is obtained.

8

Yatrou, Paul M. "Analysis of predictor mistracking in ADPCM speech coders." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=66242.

9

Fang, Jie. "Design of secure speech encryption systems." Thesis, Queensland University of Technology, 1990. https://eprints.qut.edu.au/36471/1/36471_Fang_1990.pdf.

Abstract:

This thesis investigates the design of digital speech encryption systems based on low bit rate vocoders. The speech quality and the cryptographic strength of the system are determined by the vocoder and the encryptor respectively. Three different low bit rate vocoders, a 2400 BPS LPC (Linear Prediction Coding) vocoder, a 9600 BPS MELPC (Multipulse Excited Linear Prediction Coding) vocoder and a 4800 BPS CELP (Codebook Excited Linear Prediction Coding) vocoder, have been simulated. The performances of these vocoders are evaluated by using four objective measures. The thesis considers the following aspects of a digital encryption system: security, speech quality, robustness, and system delay. Several choices of the cryptosystem for the encryption of digital speech are investigated, and the performance of the overall system is discussed. The work presented in this thesis enables a secure communication system designer to select a speech coding scheme and a cipher system to meet the required level of security and speech quality. encryption systems throughout this thesis refers to mathematical analysis and simulation of such systems rather than the actual construction of electronic circuits.

10

Liu, Zhu Lin. "Speech synthesis via adaptive Fourier decomposition." Thesis, University of Macau, 2011. http://umaclib3.umac.mo/record=b2493215.

11

Chan, Ngor-chi. "Text-to-speech conversion for Putonghua." [Hong Kong : University of Hong Kong], 1990. http://sunzi.lib.hku.hk/hkuto/record.jsp?B12929475.

12

Mazel, David S. "Sinusoidal modeling of speech." Thesis, Georgia Institute of Technology, 1986. http://hdl.handle.net/1853/13873.

13

Alphonso, Issac John. "Network training for continuous speech recognition." Master's thesis, Mississippi State : Mississippi State University, 2003. http://library.msstate.edu/etd/show.asp?etd=etd-10252003-105104.

14

Little, M. A. "Biomechanically informed nonlinear speech signal processing." Thesis, University of Oxford, 2007. http://ora.ox.ac.uk/objects/uuid:6f5b84fb-ab0b-42e1-9ac2-5f6acc9c5b80.

Abstract:

Linear digital signal processing based around linear, time-invariant systems theory finds substantial application in speech processing. The linear acoustic source-filter theory of speech production provides ready biomechanical justification for using linear techniques. Nonetheless, biomechanical studies surveyed in this thesis display significant nonlinearity and non-Gaussianity, casting doubt on the linear model of speech production. In order therefore to test the appropriateness of linear systems assumptions for speech production, surrogate data techniques can be used. This study uncovers systematic flaws in the design and use of existing surrogate data techniques, and, by making novel improvements, develops a more reliable technique. Collating the largest set of speech signals to date compatible with this new technique, this study next demonstrates that the linear assumptions are not appropriate for all speech signals. Detailed analysis shows that while vowel production from healthy subjects cannot be explained within the linear assumptions, consonants can. Linear assumptions also fail for most vowel production by pathological subjects with voice disorders. Combining this new empirical evidence with information from biomechanical studies concludes that the most parsimonious model for speech production, explaining all these findings in one unified set of mathematical assumptions, is a stochastic nonlinear, non-Gaussian model, which subsumes both Gaussian linear and deterministic nonlinear models. As a case study, to demonstrate the engineering value of nonlinear signal processing techniques based upon the proposed biomechanically-informed, unified model, the study investigates the biomedical engineering application of disordered voice measurement. A new state space recurrence measure is devised and combined with an existing measure of the fractal scaling properties of stochastic signals.
Using a simple pattern classifier these two measures outperform all combinations of linear methods for the detection of voice disorders on a large database of pathological and healthy vowels, making explicit the effectiveness of such biomechanically-informed, nonlinear signal processing techniques.

15

Wark, Timothy J. "Multi-modal speech processing for automatic speaker recognition." Thesis, Queensland University of Technology, 2001.

16

Cao, Yuchang. "Speech enhancement with single and multiple microphones." Thesis, Queensland University of Technology, 1996.

17

Chung, Jae H. "A new homomorphic vocoder framework using analysis-by-synthesis excitation analysis." Diss., Georgia Institute of Technology, 1991. http://hdl.handle.net/1853/15471.

18

Crosmer, Joel R. "Very low bit rate speech coding using the line spectrum pair transformation of the LPC coefficients." Diss., Georgia Institute of Technology, 1985. http://hdl.handle.net/1853/15739.

19

Rose, Richard C. "The design and performance of an analysis-by-synthesis class of predictive speech coders." Diss., Georgia Institute of Technology, 1988. http://hdl.handle.net/1853/16693.

20

Wang, Raymond Jian-Wei. "Neurocomputing systems for auditory processing." Thesis, The University of Sydney, 1998. https://hdl.handle.net/2123/26278.

Abstract:

This thesis studies neural computation models and neuromorphic implementations of the auditory pathway with applications to cochlear implants and artificial auditory sensory and processing systems. Very low power analogue computation is addressed through the design of micropower analogue building blocks and an auditory preprocessing module targeted at cochlear implants. The analogue building blocks have been fabricated and tested in a standard Complementary Metal Oxide Silicon (CMOS) process. The auditory pre-processing module design is based on the cochlea signal processing mechanisms and low power microelectronic design methodologies. Compared to existing preprocessing techniques used in cochlear implants, the proposed design has a wider dynamic range and lower power consumption. Furthermore, it provides the phase coding as well as the place coding information that are necessary for enhanced functionality in future cochlear implants. The thesis presents neural computation based approaches to a number of signal-processing problems encountered in cochlear implants. Techniques that can improve the performance of existing devices are also presented. Neural network based models for loudness mapping and pattern recognition based channel selection strategies are described. Compared with state-of-the-art commercial cochlear implants, the thesis results show that the proposed channel selection model produces superior speech sound qualities; and the proposed loudness mapping model consumes substantially smaller amounts of memory. Aside from the applications in cochlear implants, this thesis describes a biologically plausible computational model of the auditory pathways to the superior colliculus based on current neurophysiological findings. The model encapsulates interaural time difference, interaural spectral difference, monaural pathway and auditory space map tuning in the inferior colliculus.
A biologically plausible Hebbian-like learning rule is proposed for auditory space neural map tuning, and a reinforcement learning method is used for map alignment with other sensory space maps through activity independent cues. The validity of the proposed auditory pathway model has been verified by simulation using synthetic data. Further, a complete biologically inspired auditory simulation system is implemented in software. The system incorporates models of the external ear, the cochlea, as well as the proposed auditory pathway model. The proposed implementation can mimic the biological auditory sensory system to generate an auditory space map from 3-D sounds. A large amount of real 3-D sound signals including broadband white noise, click noise and speech are used in the simulation experiments. The effect of the auditory space map developmental plasticity is examined by simulating early auditory space map formation and auditory space map alignment with a distorted visual sensory map. Detailed simulation methods, procedures and results are presented.

21

Fisher, Andrew John. "Speech enhancement for forensic applications." Thesis, Queensland University of Technology, 1995. https://eprints.qut.edu.au/36243/1/36243_Fisher_1995.pdf.

Abstract:

Law enforcement agencies often engage in surveillance operations which involve the recording of spoken conversations. These recordings are often made with a single microphone under covert conditions. Under these non-ideal conditions, the speech signal is highly susceptible to severe corruption by various forms of noise, the most common of which is broadband in nature. This thesis presents a study conducted to investigate the enhancement of speech recordings for forensic applications. A new speech enhancement scheme is proposed to provide noise reduction without compromising the intelligibility of the speech. The scheme implements a hybrid approach combining both spectral and root-cepstral subtraction. Extensive testing, using both subjective and objective intelligibility and acceptability assessment schemes, indicates that the system is successful in providing intelligibility improvement and superior signal-to-noise ratio with minimal spectral distortion. In addition, the proposed system was tested as a preprocessing stage for other speech applications such as speech recognition, speaker recognition and speech coding. The system proved to be beneficial for speech coding, while its application to the recognition techniques was limited despite showing positive potential. Finally, the system was implemented in real time and was also found to be successful when applied to the enhancement of speech transmitted over High Frequency communication channels.

22

Hosom, John-Paul. "Automatic time alignment of phonemes using acoustic-phonetic information." 2000. http://content.ohsu.edu/u?/etd,282.

23

Ikram, Muhammad Zubair. "Multichannel blind separation of speech signals in a reverberant environment." Diss., Georgia Institute of Technology, 2001. http://hdl.handle.net/1853/15023.

24

Wilson, Shawn C. "Voice recognition systems : assessment of implementation aboard U.S. naval ships." Thesis, Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Mar%5FWilson.pdf.

Abstract:

Thesis (M.S. in Information Systems and Operations)--Naval Postgraduate School, March 2003.
Thesis advisor(s): Michael T. McMaster, Kenneth J. Hagan. Includes bibliographical references (p. 47-49). Also available online.

25

Müller, J. J. "USB telephony interface device for speech recognition applications." 2005. http://hdl.handle.net/10019/1127.

26

Boulis, Constantinos. "Topic learning in text and conversational speech." Thesis, 2005. http://hdl.handle.net/1773/5914.

27

Farges, Eric P. "An analysis-synthesis hidden Markov model of speech." Diss., Georgia Institute of Technology, 1987. http://hdl.handle.net/1853/14775.

28

LeBlanc, Wilfrid P. "Speech coding at low to medium bit rates." Diss., Carleton University (Ottawa), 1992.

29

Anderson, David Verl. "Audio signal enhancement using multi-resolution sinusoidal modeling." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15394.

30

Kale, Kaustubh R. "Low complexity, narrow baseline beamformer for hand-held devices." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0001223.

31

Hild, Kenneth E. "Blind separation of convolutive mixtures using Renyi's divergence." [Gainesville, Fla.] : University of Florida, 2003. http://purl.fcla.edu/fcla/etd/UFE0002387.

32

Keenaghan, Kevin Michael. "A Novel Non-Acoustic Voiced Speech Sensor Experimental Results and Characterization." 2004. http://www.wpi.edu/Pubs/ETD/Available/etd-0114104-144946/.

33

Iyengar, Vasu. "A low delay 16 kbit/sec coder for speech signals." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63799.

34

Ng, H. N. Elaine. "Effects of noise type on speech understanding." Thesis, The University of Hong Kong, 2006. http://sunzi.lib.hku.hk/hkuto/record/B37990159.

35

Ng, H. N. Elaine, and 吳凱寧. "Effects of noise type on speech understanding." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2006. http://hub.hku.hk/bib/B37990159.

36

Lai, Yiu Pong. "Maximum likelihood normalization for robust speech recognition." Thesis, Hong Kong University of Science and Technology, 2003. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202003%20LAI.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2003.
Includes bibliographical references (leaves 98-103). Also available in electronic version. Access restricted to campus users.

37

Li, Chak Fai. "Improved polynomial segment model for speech recognition." Thesis, Hong Kong University of Science and Technology, 2004. http://library.ust.hk/cgi/db/thesis.pl?ELEC%202004%20LI.

Abstract:

Thesis (M. Phil.)--Hong Kong University of Science and Technology, 2004.
Includes bibliographical references (leaves 80-84). Also available in electronic version. Access restricted to campus users.

38

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/36534.

39

Ertan, Ali Erdem. "Pitch-synchronous processing of speech signal for improving the quality of low bit rate speech coders." Diss., Georgia Institute of Technology, 2004. http://etd.gatech.edu/theses/available/etd-06072004-131138/unrestricted/ertan%5Fali%5Fe%5F200405%5Fphd.pdf.

Abstract:

Thesis (Ph. D.)--School of Electrical and Computer Engineering, Georgia Institute of Technology, 2004. Directed by Thomas P. Barnwell, III.
Vita. Includes bibliographical references (leaves 221-226).

40

Macon, Michael W. "Speech synthesis based on sinusoidal modeling." Diss., Georgia Institute of Technology, 1996. http://hdl.handle.net/1853/13904.

41

Lam, Victor T. M. "The stability of pitch synthesis filters in speech coding." Thesis, McGill University, 1985. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63361.

42

O'Rourke, William Thomas. "Real-world evaluation of mobile phone speech enhancement algorithms." [Gainesville, Fla.] : University of Florida, 2002. http://purl.fcla.edu/fcla/etd/UFE0000585.

43

Ellis, Richard T. "Speech enhancement system implemented in CMOS." Thesis, Georgia Institute of Technology, 2002. http://hdl.handle.net/1853/14801.

44

Chu, Kam Keung. "Feature extraction based on perceptual non-uniform spectral compression for noisy speech recognition." Thesis, City University of Hong Kong, 2005. http://libweb.cityu.edu.hk/cgi-bin/ezdb/thesis.pl?mphil-ee-b19887516a.pdf.

Abstract:

Thesis (M.Phil.)--City University of Hong Kong, 2005.
"Submitted to Department of Electronic Engineering in partial fulfillment of the requirements for the degree of Master of Philosophy." Includes bibliographical references (leaves 143-147).

45

Wasmeier, Hans. "Development of tests and preprocessing algorithms for evaluation and improvement of speech recognition units." Thesis, University of British Columbia, 1986. http://hdl.handle.net/2429/26750.

Abstract:

This study considered the evaluation of commercially available isolated word, speaker dependent, speech recognition units, and preprocessing techniques that may be used for improving their performance. The problem was considered in three separate stages. A series of tests were designed to exercise an isolated word, speaker dependent, speech recognition unit. These tests provided a sound basis for determining a given unit's strengths and weaknesses. This knowledge permits a more informed decision on the best recognition device for a given price range. As well, this knowledge may be used in the design of a robust vocabulary, and creation of guidelines for best performance. The test vocabularies were based on the forty English phonemes identified by Rabiner and Schafer [28] and the test variations were representative of common variations which may be expected in normal use. A digital archive system was implemented for storing the voice input of test subjects. This facility provided a data base for an investigation of preprocessing techniques. As well, it permits the testing of different speech recognition units with the same voice input, providing a platform for device comparison. Several speech preprocessing and performance improvement techniques were then investigated. Specifically, two types of time normalization, the enhancement of low energy phonemes and a change in training technique were investigated. These techniques permit a more accurate analysis of the failure mechanism of the speech recognition unit. They may also provide the basis for a speech preprocessor design which could be placed in front of a commercial speech recognition unit. A commercially available speech recognition unit, the NEC SR100, was used as a measure of the effectiveness of the tests and of the improvements. 
Results of the study indicated that the designed tests and the preprocessing & performance improvement techniques investigated were useful in identifying the speech recognition unit's weaknesses. Also, depending on the economics of implementation, it was found that preprocessing may provide a cost effective solution to some of the recognition unit's shortcomings.
Applied Science, Faculty of
Electrical and Computer Engineering, Department of
Graduate

46

Lee, Spencer Jaehoon Gilbert Juan E. "Post-speech-recognition processing in domain-specific text-corpus-based distributed listening system: analysis, interpretation and selection of speech recognition results." Auburn, Ala., 2006. http://repo.lib.auburn.edu/2006%20Summer/Theses/LEE_SPENCER_7.pdf.

47

Rao, Ram Raghavendra. "Audio-visual interaction in multimedia." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/13349.

48

Necioğlu, Burhan F. "Objectively measured descriptors for perceptual characterization of speakers." Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/15035.

49

McCree, Alan V. "A new LPC vocoder model for low bit rate speech coding." Diss., Georgia Institute of Technology, 1992. http://hdl.handle.net/1853/15053.

50

Mathan, Luc Stefan. "Speaker-independent access to a large lexicon." Thesis, McGill University, 1987. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=63773.
