Theses on the topic "Speech coding"

Author: Grafiati

Published: June 4, 2021

Last modified: February 19, 2023

Browse the top 50 theses on the topic "Speech coding".

1

Abboud, Karim. "Wideband CELP speech coding". Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56805.

Abstract

The purpose of this thesis is to study the coding of wideband speech and to improve on previous Code-Excited Linear Prediction (CELP) coders in terms of speech quality and bit rate. To accomplish this task, improved coding techniques are introduced and the operating bit rate is reduced while maintaining and even enhancing the speech quality.
The first approach considers the quantization of Linear Predictive Coding (LPC) parameters and uses a three-way split vector quantization. Both scalar and vector quantization are initially studied; results show that, with adequate codebook training, the second method generates better results while using fewer bits. Nevertheless, vector quantizers remain highly complex in terms of memory and number of computations. A new quantization scheme, split vector quantization (split VQ), is investigated to overcome this complexity problem. Using a new weighted distance measure as a selection criterion for split VQ, the average spectral distortion is significantly reduced to match the results obtained with scalar quantizers.
The second approach introduces a new pitch predictor with increased temporal resolution for periodicity. This new technique maintains the quality obtained with conventional multiple-coefficient predictors at a reduced bit rate. Furthermore, the conventional CELP noise weighting filter is modified to allow more freedom and better accuracy in modeling both the spectral tilt and the formant structure. Throughout this process, different noise weighting schemes are evaluated, and the results show that the new filter contributes greatly to solving the problem of high-frequency distortion.
The final wideband CELP coder operates at 11.7 kbits/s and produces reconstructed speech of high perceptual quality using the fractional pitch predictor and the new perceptual noise weighting filter.
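
Split VQ, as described in this abstract, quantizes sub-vectors of the LPC parameter vector with separate small codebooks, so the search cost grows with the sum of the sub-codebook sizes rather than their product. The sketch below is a minimal illustration, not Abboud's coder: the 3/3/4 split of a 10-dimensional vector, the toy random codebooks, and the uniform per-component weights are assumptions for demonstration (the thesis's own weighted distance measure is not given in the abstract).

```python
import random
from typing import List, Sequence

Vector = List[float]


def weighted_sq_error(x: Sequence[float], y: Sequence[float], w: Sequence[float]) -> float:
    """Weighted squared error: sum_i w_i * (x_i - y_i)^2."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))


def split_vq_encode(x: Vector, codebooks: List[List[Vector]],
                    splits: Sequence[int], weights: Vector) -> List[int]:
    """Quantize each sub-vector of x against its own codebook; return one index per split."""
    if sum(splits) != len(x):
        raise ValueError("split lengths must add up to the vector dimension")
    indices, start = [], 0
    for length, cb in zip(splits, codebooks):
        sub, w = x[start:start + length], weights[start:start + length]
        # Full search within this small sub-codebook only.
        indices.append(min(range(len(cb)), key=lambda j: weighted_sq_error(sub, cb[j], w)))
        start += length
    return indices


def split_vq_decode(indices: List[int], codebooks: List[List[Vector]]) -> Vector:
    """Concatenate the selected codewords to rebuild the full vector."""
    out: Vector = []
    for idx, cb in zip(indices, codebooks):
        out.extend(cb[idx])
    return out


if __name__ == "__main__":
    random.seed(0)
    splits = (3, 3, 4)  # hypothetical three-way split of a 10-dimensional vector
    # Hypothetical toy codebooks: 8 codewords (3 bits) per split, 9 bits in total.
    codebooks = [[[random.random() for _ in range(n)] for _ in range(8)] for n in splits]
    x = codebooks[0][5] + codebooks[1][2] + codebooks[2][7]
    weights = [1.0] * 10  # uniform weights stand in for a perceptual weighting
    idx = split_vq_encode(x, codebooks, splits, weights)
    print(idx)  # -> [5, 2, 7]
    print(split_vq_decode(idx, codebooks) == x)  # -> True
```

With three 3-bit sub-codebooks the encoder evaluates 24 distances per vector, whereas a single unsplit 9-bit codebook would need 512.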

2

Sturt, Christian. "Pitch synchronous speech coding techniques". Thesis, University of Surrey, 2003. http://epubs.surrey.ac.uk/843327/.

Abstract

Efficient source coding techniques are necessary to make optimal use of the limited bandwidth available in mobile phone networks. Most current mobile telephone communication systems compress the speech waveform by using speech coders based on the Code Excited Linear Prediction (CELP) model. Such coders give high quality speech at bit rates of 8 kbps and above. Below 8 kbps, the quality of the coded speech degrades rapidly. At rates of 6 kbps and below, parametric speech coders offer better speech quality. These coders reduce the required bit rate by transmitting certain characteristics of the speech waveform to the decoder, rather than attempting to code the waveform itself. The disadvantage of parametric coders is that the maximum achievable quality is limited by assumptions made during the coding of the speech signal. The aim of the research presented is to investigate and eliminate the factors that limit the speech quality of parametric coders. A new pitch synchronous coding model is proposed that operates on individual pitch cycle waveforms of speech rather than longer, fixed length frames as used in classic techniques. In order to implement a pitch synchronous coder, new pitch cycle detection algorithms have been proposed. Pitch synchronous parameter analysis was investigated and several new techniques have been developed. A novel pitch synchronous split-band voicing estimator has been proposed that utilises only the phase of the speech harmonics rather than the periodicity used in traditional techniques. Fixed rate quantisation of pitch synchronous speech parameters has been investigated and a joint quantisation/interpolation scheme has been proposed. This scheme has been applied to the quantisation of the pitch synchronous parameters and has been shown to outperform traditional quantisation techniques. 
A comparison of a reference parametric coder with its pitch synchronous counterpart has shown that the pitch synchronous paradigm eliminates some of the main factors that limit the speech quality in parametric coders. It is expected that this will lead to the development of speech coders that can produce speech of higher quality than current parametric coders operating at the same bit rate. Key words: Speech Coding, Pitch Synchronous, Sinusoidal Coding, Split-Band LPC Coding.

3

Kaouri, Hussein Ali. "Speech coding using vector quantisation". Thesis, Queen's University Belfast, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356934.

4

Kritzinger, Carl. "Low bit rate speech coding". Thesis, Stellenbosch : University of Stellenbosch, 2006. http://hdl.handle.net/10019.1/2078.

Abstract

Thesis (MScIng (Electrical and Electronic Engineering))--University of Stellenbosch, 2006.
Despite enormous advances in digital communication, the voice is still the primary tool with which people exchange ideas. However, uncompressed digital speech tends to require prohibitively high data rates (upwards of 64 kbps), making it impractical for many applications. Speech coding is the process of reducing the data rate of digital voice to manageable levels. Parametric speech coders, or vocoders, use a priori information about the mechanism by which speech is produced to achieve extremely efficient compression of speech signals (as low as 1 kbps). Most of this thesis is an investigation into parametric speech coding, consisting of a review of the mathematical and heuristic tools used in parametric speech coding and the implementation of an accepted standard algorithm for parametric voice coding. To examine avenues of improvement for existing vocoders, we examined some of the mathematical structure underlying parametric speech coding. Following on from this, we developed a novel approach to parametric speech coding that obtained promising results under both objective and subjective evaluation. An additional contribution of this thesis is a comparative subjective evaluation of the effect of parametric speech coding on English and Xhosa speech, in which we investigated the performance of two different encoding algorithms on the two languages.
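
Parametric coders of the kind studied here typically derive their vocal-tract model from short-term linear prediction. As a hedged illustration (the abstract does not say which analysis the thesis uses), the sketch below computes the autocorrelation of a windowed frame and solves the normal equations with the Levinson-Durbin recursion. The sign convention x[n] ≈ Σ a_k x[n-k], the function names, and the synthetic noise frame are assumptions of this example.

```python
import math
import random
from typing import List, Sequence, Tuple


def autocorrelation(frame: Sequence[float], max_lag: int) -> List[float]:
    """Autocorrelation r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (no normalization)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k)) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], List[float], float]:
    """Solve for predictor coefficients a[1..p] with x[n] ~ sum_k a[k] * x[n-k].

    Returns (a, reflection coefficients, final prediction-error energy).
    """
    if r[0] <= 0.0:
        raise ValueError("r[0] must be positive (silent frame?)")
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    refl: List[float] = []
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection (PARCOR) coefficient for order i
        refl.append(k)
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # prediction-error energy shrinks at every order
    return a[1:], refl, err


if __name__ == "__main__":
    random.seed(1)
    n_samples = 160  # one 20 ms frame at 8 kHz (illustrative)
    # Hamming window applied to synthetic noise standing in for a speech frame.
    frame = [
        (0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))) * random.gauss(0.0, 1.0)
        for n in range(n_samples)
    ]
    a, k, err = levinson_durbin(autocorrelation(frame, 10), 10)
    print("predictor:", [round(c, 3) for c in a])
    # |k| < 1 at every order corresponds to a stable all-pole synthesis filter 1/A(z).
    print("all |k| < 1:", all(abs(ki) < 1.0 for ki in k))
```

The recursion costs O(p^2) operations for order p instead of the O(p^3) of a general linear solve, which is one reason it is the usual choice in LPC analysis.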

5

Burnett, I. S. "Hybrid techniques for speech coding". Thesis, University of Bath, 1992. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.317353.

6

Al-Naimi, Khaldoon Taha. "Advanced speech processing and coding techniques". Thesis, University of Surrey, 2002. http://epubs.surrey.ac.uk/843488/.

Abstract

Over the past two decades there has been substantial growth in speech communications and new speech-related applications. Bandwidth constraints led researchers to investigate ways of compressing speech signals whilst maintaining speech quality and intelligibility, so as to increase the number of customers a given bandwidth can serve. As a result, a variety of speech coding techniques have been proposed over this period. At the heart of any speech coding method is the quantisation of the speech production model parameters that must be transmitted to the decoder. Quantisation is a controlling factor both for the targeted bit rates and for meeting quality requirements. The objectives of the research presented in this thesis are twofold. The first is to enable the development of a very low bit rate speech coder that maintains quality and intelligibility. This includes increasing robustness to various operating conditions as well as enhancing the estimation and improving the quantisation of speech model parameters. The second objective is to provide a method for enhancing the performance of an existing speech-related application. The first objective is tackled with three techniques. Firstly, various novel estimation techniques are proposed such that the resulting speech production model parameters contain less redundant information and are highly correlated. This makes quantisation easier (owing to the higher correlation) and therefore saves bits. The second technique makes use of the joint effect of quantising the spectral parameters (i.e. LSF and spectral amplitudes), given their large impact on the overall bit allocation. The third technique enhances the estimation of a speech model parameter (i.e. the pitch) through a robust, statistics-based post-processing (or tracking) method that operates in noise-contaminated environments.
Work towards the second objective focuses on an application where speech plays an important role, namely echo-canceller and noise-suppressor systems. A novel echo-canceller method is proposed which resolves most of the weaknesses present in existing echo-canceller systems and improves the system performance.

7

Zhao, David Yuheng. "Model Based Speech Enhancement and Coding". Doctoral thesis, Stockholm : Kungliga Tekniska högskolan, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-4412.

8

Katugampala, Nilantha N. "Multimode speech coding below 6 kbps". Thesis, University of Surrey, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.365141.

Abstract

The past two decades have witnessed a rapid expansion of the telecommunications industry. This growth has been primarily fuelled by the proliferation of digital communication systems and services that have become easily available through wired and wireless networks. Current research trends involving the integration and packetisation of voice, video and data channels into true multimedia communications promise a similar technological revolution in the next decade. The available bandwidth in wire-based terrestrial networks is a relatively cheap and expandable resource. However, in satellite and cellular radio systems, bandwidth is inherently limited and expensive. In order to accommodate ever-growing numbers of subscribers whilst maintaining high quality and low operational costs, it is essential to maximise spectral efficiency. The research presented in this thesis has focused on the development of new source compression algorithms, tailored for human speech, in order to improve the spectral efficiency of digital transmission systems. Recently there has been increasing interest in speech coding algorithms that combine various existing technologies in order to improve speech quality whilst maintaining the low transmission rates of existing coding techniques. The aim of the research presented in this thesis was to develop a complete hybrid coding algorithm that combines harmonic and waveform-approximating coding techniques. In order to integrate the two coding paradigms, novel phase synchronisation and classification techniques were developed. Speech synthesised using the unquantised hybrid model achieves nearly transparent perceptual quality. The hybrid model was used to develop variable bit rate coders, which are particularly advantageous for voice storage, Code Division Multiple Access (CDMA) wireless networks, packet-switched networks, and statistical multiplexing of speech for multi-channel communications.

9

Green, Richard C. "Walsh based cepstra for speech coding". Thesis, King's College London (University of London), 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.392848.

10

Ooi, James M. 1970. "Application of wavelets to speech coding". Thesis, Massachusetts Institute of Technology, 1993. http://hdl.handle.net/1721.1/12340.

11

Zolfaghari, Parham Seyed. "Sinusoidal model based segmental speech coding". Thesis, University of Cambridge, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.621177.

12

Mason, Michael. "Hybrid coding of speech and audio signals". Thesis, Queensland University of Technology, 2001.

13

Batri, Nadim. "Robust spectral parameter coding in speech processing". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1998. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape11/PQDD_0005/MQ43996.pdf.

14

Asenstorfer, John A. "Source-channel coding for CELP speech coders". Title page, contents and abstract only, 1994. http://web4.library.adelaide.edu.au/theses/09PH/09pha816.pdf.

15

Soong, Michael. "Predictive split vector quantization for speech coding". Thesis, McGill University, 1994. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=68054.

Abstract

The purpose of this thesis is to examine techniques for efficiently coding speech Linear Predictive Coding (LPC) coefficients. Vector Quantization (VQ) is an efficient approach to encode speech at low bit rates. However its exponentially growing complexity poses a formidable barrier. Thus a structured vector quantizer is normally used instead.
Summation Product Codes (SPCs) are a family of structured vector quantizers that circumvent the complexity obstacle. The performance of SPC vector quantizers can be traded off against their storage and encoding complexity. Besides the complexity factors, the design algorithm can also affect the performance of the quantizer. The conventional generalized Lloyd algorithm (GLA) generates sub-optimal codebooks. For particular SPCs such as multistage VQ, the GLA is applied to design the stage codebooks stage by stage. Joint design algorithms, on the other hand, update all the stage codebooks simultaneously.
In this thesis, a general formulation of, and an algorithmic solution to, the joint codebook design problem is provided for SPCs. The key to this algorithm is that every product code (PC) has a reference product codebook which minimizes the overall distortion. This joint design algorithm is tested with a novel SPC, namely "Predictive Split VQ (PSVQ)".
VQ of speech Line Spectral Frequencies (LSFs) using PSVQ is also presented. A key result of this work is that PSVQ, designed using the joint codebook design algorithm, requires only 20 bits/frame (20 ms frames) for transparent coding of 10th-order LSF parameters.
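
For comparison with the joint design described above, the sketch below shows only the conventional baseline the abstract contrasts it with, not the thesis's joint algorithm or PSVQ: a plain GLA (nearest-neighbour partition followed by centroid update) and a multistage VQ whose stages are trained one after another on the residuals left by earlier stages. Random initialisation, a fixed iteration count, greedy stage-by-stage encoding, and the toy training data are simplifying assumptions.

```python
import random
from typing import List, Sequence

Vector = List[float]


def sq_dist(x: Sequence[float], y: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(x: Sequence[float], codebook: List[Vector]) -> int:
    """Index of the codeword closest to x (the first one wins on ties)."""
    return min(range(len(codebook)), key=lambda j: sq_dist(x, codebook[j]))


def gla(training: List[Vector], size: int, iters: int = 20, seed: int = 0) -> List[Vector]:
    """Generalized Lloyd algorithm: alternate nearest-neighbour partition and centroid update."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(training, size)]  # random initial codewords
    dim = len(training[0])
    for _ in range(iters):
        sums = [[0.0] * dim for _ in range(size)]
        counts = [0] * size
        for x in training:  # partition step
            j = nearest(x, codebook)
            counts[j] += 1
            for d in range(dim):
                sums[j][d] += x[d]
        for j in range(size):  # centroid step; empty cells keep their old codeword
            if counts[j]:
                codebook[j] = [s / counts[j] for s in sums[j]]
    return codebook


def design_msvq_sequential(training: List[Vector], sizes: Sequence[int],
                           iters: int = 20) -> List[List[Vector]]:
    """Conventional stage-by-stage MSVQ design: each stage is trained on earlier stages' residuals."""
    residuals = [list(x) for x in training]
    stages = []
    for size in sizes:
        cb = gla(residuals, size, iters)
        stages.append(cb)
        residuals = [[r - c for r, c in zip(x, cb[nearest(x, cb)])] for x in residuals]
    return stages


def msvq_encode(x: Sequence[float], stages: List[List[Vector]]) -> List[int]:
    """Greedy encoding: choose the best codeword at each stage in turn."""
    residual = list(x)
    indices = []
    for cb in stages:
        j = nearest(residual, cb)
        indices.append(j)
        residual = [r - c for r, c in zip(residual, cb[j])]
    return indices


def msvq_decode(indices: Sequence[int], stages: List[List[Vector]]) -> Vector:
    """Sum the selected codewords of all stages."""
    out = [0.0] * len(stages[0][0])
    for j, cb in zip(indices, stages):
        out = [o + c for o, c in zip(out, cb[j])]
    return out


if __name__ == "__main__":
    train = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
    print(sorted(gla(train, 2)))  # two centroids, near (1/3, 1/3) and (31/3, 31/3)
    stages = design_msvq_sequential(train, [2, 2])
    print(msvq_encode([10.0, 11.0], stages))
```

Because each stage is fixed before the next is trained, later stages cannot influence earlier ones; that limitation is what a joint design, updating all stage codebooks together, aims to remove.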

16

Grass, John. "Quantization of predictor coefficients in speech coding". Thesis, McGill University, 1990. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=60067.

Abstract

This thesis examines techniques of efficiently coding Linear Predictive Coding (LPC) coefficients with 20 to 30 bits per 20 ms speech frame.
Scalar quantization is the first approach evaluated. Results show that Line Spectral Frequencies require significantly fewer bits than reflection coefficients for comparable performance. The second approach investigated is the use of vector-scalar quantization. In the first stage, vector quantization is performed. The second stage consists of a bank of scalar quantizers which code the vector errors between the original LPC coefficients and the components of the vector of the quantized coefficients.
The first innovation is to couple the vector and scalar quantization stages: every codebook vector is compared to the original LPC coefficient vector to produce error vectors. The second innovation in vector-scalar quantization is the incorporation of a small adaptive codebook into the large fixed codebook, which exploits the frame-to-frame correlation of the LPC coefficients at no extra cost in bits.
The performance of the vector-scalar quantization using the two new techniques is better than that of the scalar coding techniques currently used in conventional LPC coders.
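
The coupled vector-scalar search can be sketched as follows: instead of choosing the first-stage codevector on its own error, the encoder runs the scalar stage for every codevector and keeps the one with the lowest final distortion. This is a minimal illustration under assumed parameters, not Grass's quantizer: the toy first-stage codebook, the uniform mid-tread scalar quantizer with a hypothetical step size and level count shared by all components, and plain squared error are assumptions, and the small adaptive codebook is omitted.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def uniform_sq_index(e: float, step: float, levels: int) -> int:
    """Mid-tread uniform scalar quantizer; the index is clipped to +/- (levels // 2)."""
    half = levels // 2
    return max(-half, min(half, round(e / step)))


def scalar_stage(x: Sequence[float], cv: Sequence[float],
                 step: float, levels: int) -> Tuple[List[int], float]:
    """Scalar-quantize the error x - cv per component; return indices and final squared error."""
    idx = [uniform_sq_index(xi - ci, step, levels) for xi, ci in zip(x, cv)]
    recon = [ci + q * step for ci, q in zip(cv, idx)]
    return idx, sum((xi - ri) ** 2 for xi, ri in zip(x, recon))


def vs_encode(x: Sequence[float], codebook: List[Vector], step: float,
              levels: int, coupled: bool = True) -> Tuple[int, List[int], float]:
    if coupled:
        # Coupled: run the scalar stage for every codevector and keep the best final result.
        j = min(range(len(codebook)),
                key=lambda c: scalar_stage(x, codebook[c], step, levels)[1])
    else:
        # Decoupled: pick the codevector on VQ error alone, then scalar-quantize its error.
        j = min(range(len(codebook)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, codebook[c])))
    idx, dist = scalar_stage(x, codebook[j], step, levels)
    return j, idx, dist


def vs_decode(j: int, idx: Sequence[int], codebook: List[Vector], step: float) -> Vector:
    return [c + q * step for c, q in zip(codebook[j], idx)]


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [0.45, 0.45]]  # hypothetical toy first-stage codebook
    x = [0.5, 0.5]
    print(vs_encode(x, codebook, step=0.5, levels=5, coupled=True))  # -> (0, [1, 1], 0.0)
    print(vs_encode(x, codebook, step=0.5, levels=5, coupled=False))
```

In this toy case the decoupled search picks the closer codevector, whose small error then rounds to zero, while the coupled search finds a codevector whose error the scalar stage can represent exactly. By construction, the coupled search can never do worse than the decoupled one.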

17

Maroun, Nabih. "Toll-quality speech coding at 8 kbs". Thesis, McGill University, 1993. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56802.

Abstract

There has been an ongoing effort to achieve very high quality speech coding at medium transmission bit rates. Consequently, the TIA has chosen the Vector Sum Excited Linear Prediction (VSELP) implementation of an 8 kb/s coder as the standard for North American cellular digital telephony. However, only recently, in view of the increased research focus on developing toll-quality speech coding at such bit rates, has the CCITT imposed a set of specifications for standardizing low-delay coders operating at 8 kb/s. The Low-Delay Code Excited Linear Prediction (LD-CELP) coder suggested by Chen is presently the only potential candidate for CCITT standardization, achieving a one-way coding delay of 10 ms. However, like the VSELP coding algorithm, the 8 kb/s LD-CELP version does not quite yield toll-quality reconstructed speech. The purpose of the work in this thesis is to establish the minimum requirements for a coding structure capable of generating toll-quality coded speech at 8 kb/s. The thesis shows that, by slightly relaxing the coding delay constraint, perceptual enhancement techniques yield toll-quality coding once the optimization and quantization procedures of a CELP coder are redesigned and fine-tuned.

18

Suddle, Muhammad Riaz. "Speech coding in private and broadcast networks". Thesis, University of Surrey, 1996. http://epubs.surrey.ac.uk/1019/.

19

Oberhofer, Robert. "Pitch adaptive variable bitrate CELP speech coding". Thesis, University of Ulster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264811.

20

Thorpe, T. F. "Performance bounds for digital coding of speech". Thesis, University of Cambridge, 1987. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.234070.

21

Gant, Nicolas Roland Noel. "The linear predictive coding of mask speech". Thesis, University of Southampton, 1986. https://eprints.soton.ac.uk/52261/.

22

Deloche, François. "Short time-scale efficient coding of speech". Thesis, Paris, EHESS, 2019. http://www.theses.fr/2019EHES0142.

Abstract

Cochlear frequency selectivity is known to reflect the overall statistical structure of speech, in line with the hypothesis that low-level sensory processing provides efficient codes for information contained in natural stimuli. Speech signals, however, possess a complex structure, even on short time scales, as a result of the diversity of acoustic factors involved in the generation of speech. This rich structure means that advanced coding schemes based on a nonlinear representation of speech sounds could provide more efficient codes. The first step in finding efficient strategies is to describe the statistical structure of speech at a fine level: at the level of phonemes, or even finer, at the level of acoustic events. In this thesis, I use a parametric approach to explore the fine-grained statistical structure of speech. The goal of this method is to find the sparsest representation of speech sounds among a family of dictionaries of Gabor filters whose frequency selectivity follows different power laws in the high frequency range 1-8 kHz. I motivate the use of Gabor filters for the search of sparse time-frequency representations of speech signals, and I show that the dictionary method has a formal link with previous work based on Independent Component Analysis (ICA). The acoustic factors that affect the power law associated with the sparsest decomposition can be inferred from the analyses of synthetic and real data. The results suggest that an efficient speech coding strategy is to reduce frequency selectivity with sound intensity level, reflecting the nonlinear behavior of the cochlea.

23

Hoyle, Robert D. (Robert Douglas). "Digital speech coding for land mobile radio". Dissertation, Electrical Engineering, Carleton University, Ottawa, 1986.

24

Greenwood, Andrew Richard. "Articulatory speech synthesis". Thesis, University of Liverpool, 1993. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.386773.

25

Varga, A. P. "Multipulse excited linear predictive analysis in speech coding and constructive speech synthesis". Thesis, University of Cambridge, 1985. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.372909.

26

Leong, Michael. "Representing voiced speech using prototype waveform interpolation for low-rate speech coding". Thesis, McGill University, 1992. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=56796.

Abstract

In recent years, research in narrow-band digital speech coding has produced good-quality speech coders at low rates of 4.8 to 8.0 kb/s. This thesis examines the method proposed by W. B. Kleijn, called prototype waveform interpolation (PWI), for efficiently coding the voiced sections of speech, with the aim of achieving a coder below 4.8 kb/s while maintaining, or even improving, the perceptual quality of current coders.
In examining the PWI method, it was found that although the method generally works very well, there are occasional sections of the reconstructed voiced speech where audible distortion can be heard, even when the prototypes are not quantized. The research undertaken in this thesis focuses on the fundamental principles behind modelling voiced speech using PWI rather than on bit allocation for encoding the prototypes. Problems in the PWI method are identified that might have been dismissed as encoding error if full encoding had been implemented.
Kleijn uses PWI to represent voiced sections of the excitation signal, which is the residual obtained after the removal of short-term redundancies by a linear predictive filter. The problem with this method is that when the PWI-reconstructed excitation is passed through the inverse filter to synthesize the speech, undesired effects occur due to the time-varying nature of the filter. The reconstructed speech may have undesired envelope variations that result in audible warble.
This thesis proposes an energy fix-up to smooth the synthesized speech envelope when the interpolation procedure fails to provide the desired smooth, linear result. Further investigation, however, leads to the final proposal of this thesis: PWI should be performed on the clean speech signal instead of the excitation to achieve consistently reliable results for all voiced frames.

27

Accardi, Anthony J. (Anthony Joseph) 1976. "A modular approach to speech enhancement with an application to speech coding". Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/9976.

Abstract

Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science; and, Thesis (B.S.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.
Includes bibliographical references (p. 98-101).

28

Islam, Tamanna. "Interpolation of linear prediction coefficients for speech coding". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0034/MQ64229.pdf.

29

Trinkaus, Trevor R. "Perceptual coding of audio and diverse speech signals". Diss., Georgia Institute of Technology, 1999. http://hdl.handle.net/1853/13883.

30

Loo, James H. Y. (James Hung Yan). "Intraframe and interframe coding of speech spectral parameters". Thesis, McGill University, 1996. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=24065.

Abstract

Most low bit rate speech coders employ linear predictive coding (LPC) which models the short-term spectral information within each speech frame as an all-pole filter. In this thesis, we examine various methods that can efficiently encode spectral parameters for every 20 ms frame interval. Line spectral frequencies (LSF) are found to be the most effective parametric representation for spectral coding. Product code vector quantization (VQ) techniques such as split VQ (SVQ) and multi-stage VQ (MSVQ) are employed in intraframe spectral coding, where each frame vector is encoded independently from other frames. Depending on the product code structure, "transparent coding" quality is achieved for SVQ at 26-28 bits/frame and for MSVQ at 25-27 bits/frame.
Because speech is quasi-stationary, interframe coding methods such as predictive SVQ (PSVQ) can exploit the correlation between adjacent LSF vectors. Nonlinear PSVQ (NPSVQ) is introduced, in which a nonparametric, nonlinear predictor replaces the linear predictor used in PSVQ. Regardless of predictor type, PSVQ achieves a performance gain of 5-7 bits/frame over SVQ. By interleaving intraframe SVQ with PSVQ, error propagation is limited to at most one adjacent frame. At an overall bit rate of about 21 bits/frame, NPSVQ can provide coding quality similar to that of intraframe SVQ at 24 bits/frame (an average gain of 3 bits/frame). The particular form of nonlinear prediction we use incurs virtually no additional encoding computational complexity. Voicing classification is used in classified NPSVQ (CNPSVQ) to obtain an additional average gain of 1 bit/frame for unvoiced frames. Furthermore, switched-adaptive predictive SVQ (SA-PSVQ) provides an improvement of 1 bit/frame over PSVQ, or 6-8 bits/frame over SVQ, but error propagation increases to 3-7 frames. We have verified our comparative performance results using subjective listening tests.
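
"Transparent coding" results such as those above are usually judged with the spectral distortion (SD) between the original and quantized LPC spectra; a commonly cited criterion treats an average SD of about 1 dB, with fewer than 2% of frames between 2 and 4 dB and none above 4 dB, as transparent. The sketch below computes the SD of one frame from two sets of predictor coefficients on a uniform frequency grid. It is not taken from Loo's thesis; the grid size, the frequency band, the convention A(z) = 1 - Σ a_k z^-k with the gain ignored, and the example coefficients are assumptions.

```python
import cmath
import math
from typing import Optional, Sequence


def inverse_filter_mag(a: Sequence[float], omega: float) -> float:
    """|A(e^{jw})| for A(z) = 1 - sum_k a[k-1] * z^{-k}."""
    e = cmath.exp(-1j * omega)  # e^{-jw}
    return abs(1.0 - sum(ak * e ** (k + 1) for k, ak in enumerate(a)))


def spectral_distortion_db(a_ref: Sequence[float], a_q: Sequence[float], fs: float = 8000.0,
                           f_lo: float = 0.0, f_hi: Optional[float] = None,
                           n_points: int = 256) -> float:
    """RMS difference in dB between the all-pole spectra 1/|A_ref|^2 and 1/|A_q|^2."""
    if f_hi is None:
        f_hi = fs / 2.0
    total = 0.0
    for i in range(n_points):
        f = f_lo + (f_hi - f_lo) * i / n_points  # uniform grid over [f_lo, f_hi)
        omega = 2.0 * math.pi * f / fs
        # 10*log10(P_ref / P_q) equals 20*log10(|A_q| / |A_ref|) for all-pole spectra.
        d = 20.0 * math.log10(inverse_filter_mag(a_q, omega) / inverse_filter_mag(a_ref, omega))
        total += d * d
    return math.sqrt(total / n_points)


if __name__ == "__main__":
    a_ref = [1.2, -0.5]  # hypothetical stable 2nd-order predictor
    a_q = [1.15, -0.48]  # hypothetical quantized version of it
    print(round(spectral_distortion_db(a_ref, a_q), 3), "dB")
```

In a real evaluation the per-frame SD would be averaged over a test set and the outlier percentages counted, since the transparency criterion concerns the distribution over frames rather than any single frame.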

31

Ramachandran, Ravi P. "Pitch filtering in adaptive predictive coding of speech". Thesis, McGill University, 1986. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=65345.

32

Roy, Guylain. "Low-rate analysis-by-synthesis wideband speech coding". Thesis, McGill University, 1990. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=59643.

Abstract

This thesis studies low-rate wideband analysis-by-synthesis speech coders. The wideband speech signals have a bandwidth of up to 8 kHz and are sampled at 16 kHz, while the target operating bit rate is 16 kbits/sec. Applications for such a coder range from high-quality voice-mail services to teleconferencing. In order to achieve a low operating rate, the coding places more emphasis on the lower frequencies (0 to 4 kHz), while the higher frequencies (4 to 8 kHz) are coded less precisely but with little perceived degradation.
The study consists of three stages. First, aspects of wideband spectral envelope modeling using Line Spectral Frequencies (LSFs) are studied. Then, the underlying coder structure is derived from a basic Residual Excited Linear Predictive (RELP) coder. This structure is enhanced by the addition of a pitch prediction stage and by the development of full-band and split-band pitch parameter optimization procedures. These procedures are then applied to a Code Excited Linear Prediction (CELP) model. Finally, the performance of full-band and split-band CELP structures is compared.

33

Chahine, Gebrael. "Pitch modelling for speech coding at 4.8 kbits/s". Thesis, McGill University, 1993. http://digitool.Library.McGill.CA:80/R/?func=dbin-jump-full&object_id=69724.

Abstract

The purpose of this thesis is to examine techniques of efficiently modelling the Long-Term Predictor (LTP) or the pitch filter in low rate speech coders. The emphasis in this thesis is on a class of coders which are referred to as Linear Prediction (LP) based analysis-by-synthesis coders, and more specifically on the Code-Excited Linear Prediction (CELP) coder which is currently the most commonly used in low rate transmission. The experiments are performed on a CELP based coder developed by the U.S. Department of Defense (DoD) and Bell Labs, with an output bit rate of 4.8 kbits/s.
A multi-tap LTP outperforms a single-tap LTP, but at the expense of a greater number of bits. A single-tap LTP can be improved by increasing its time resolution. This results in a fractional-delay LTP, which produces a significant increase in prediction gain and perceived periodicity. It costs more bits, though fewer than the multi-tap case requires.
The first new approach in this work uses a pseudo-three-tap pitch filter with one or two degrees of freedom in the predictor coefficients. This filter gives better-quality reconstructed speech and a more desirable frequency response than a one-tap pitch prediction filter. The pseudo-three-tap pitch filter with one degree of freedom is of particular interest, because no extra bits are needed to code the pitch coefficients.
The second new approach is to perform time scaling/shifting on the original speech. This further reduces the mean square error and allows a smoother and more accurate reconstruction of the pitch structure. The time scaling technique saves 1 bit in coding the pitch parameters while very closely maintaining the quality of the reconstructed speech. In addition, the time scaling operation needs no extra bits, since no side information has to be transmitted to the receiver.
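
The abstract does not give the coder's lag search procedure. The sketch below shows the textbook integer-lag, single-tap LTP search under that caveat. For each candidate lag it correlates the target subframe with the past samples that lag points to, then keeps the lag that maximizes the normalized correlation and returns the least-squares gain. Fractional resolution, multi-tap and pseudo-three-tap filters, and time scaling are not modelled. The function name and the restriction that lags be at least one subframe long are assumptions made for this illustration.

```python
def ltp_search(history, target, min_lag, max_lag):
    """Single-tap long-term (pitch) predictor search over integer lags.

    history: past samples, most recent last.
    target: current subframe to be predicted.
    Lags must be >= len(target), so each prediction uses only past samples.
    Returns (lag, gain); gain is the least-squares optimal tap for that lag.
    """
    n = len(target)
    if min_lag < n:
        raise ValueError("min_lag must be at least the subframe length")
    best_lag, best_gain, best_score = None, 0.0, -1.0
    for lag in range(min_lag, max_lag + 1):
        start = len(history) - lag
        if start < 0:
            break  # not enough history for this (and any longer) lag
        cand = history[start:start + n]  # cand[i] corresponds to s[t + i - lag]
        corr = sum(t * c for t, c in zip(target, cand))
        energy = sum(c * c for c in cand)
        if energy == 0.0:
            continue
        score = corr * corr / energy  # error-energy reduction with the optimal gain
        if score > best_score:
            best_lag, best_gain, best_score = lag, corr / energy, score
    return best_lag, best_gain
```

For a signal that repeats exactly every 50 samples, a 40-sample target and a 40 to 80 lag range return lag 50 with a gain of 1.0.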

34

Yan, Ming. "VLSI architectures for speech and image coding applications". Thesis, Queen's University Belfast, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.356855.

35

Zemouri, Rachid. "Data compression of speech using sub-band coding". Thesis, University of Newcastle Upon Tyne, 1991. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.316094.

36

Davis, Andrew J. "Waveform coding of speech and voiceband data signals". Thesis, University of Liverpool, 1988. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.232946.

37

Lamare, Rodrigo Caiado de. "Speech coding at average rates below 2 kb/s". Pontifícia Universidade Católica do Rio de Janeiro, 2001. http://www.maxwell.vrac.puc-rio.br/Busca_etds.php?strSecao=resultado&nrSeq=1873@1.

Abstract

CONSELHO NACIONAL DE DESENVOLVIMENTO CIENTÍFICO E TECNOLÓGICO
This dissertation presents algorithms to encode speech at an average bit rate of 1.2 kb/s. A novel switched-predictive vector quantiser that outperforms previously reported schemes is proposed and assessed under noise-free and noisy channels. Efficient detectors for the pitch period and for fricative and stop sounds are examined and adapted to the proposed coder. Low bit rate excitation methods are investigated in order to reproduce rather high quality speech. A mixed multiband excitation approach with three sub-bands is employed to encode voiced frames. For unvoiced frames, fricative and stop modelling and synthesis techniques are used. This approach has been shown to provide high quality synthesised speech while reducing the bit rate for unvoiced frames to only 0.4 kb/s. To reduce coding noise and improve the decoded speech, post-filtering techniques are analysed and compared on the same platform. To reduce background noise, noise suppression methods are also examined. Finally, the proposed coder is evaluated against the North American Mixed Excitation Linear Prediction (MELP) coder through A/B comparison tests. Assessment results have shown that the proposed system, operating at 1.2 kb/s, slightly outperformed the MELP coder operating at 2.4 kb/s. In tandem connections (transcoding), the proposed algorithm also outperformed the MELP coder.
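
The abstract does not describe the internals of the switched-predictive vector quantiser. The sketch below shows the general idea under several assumptions: a first-order scalar predictor for each switch position, one residual codebook per position, and an exhaustive search. For every switch position, the previous quantized vector is predicted forward, the residual is matched against that position's codebook, and the (switch, index) pair with the lowest error is kept. The predictor values, codebooks and function name are hypothetical.

```python
def switched_predictive_vq(x, prev_q, predictors, codebooks):
    """Quantize vector x with a switched first-order predictive VQ (illustrative).

    x: current parameter vector (for example, mean-removed spectral parameters).
    prev_q: previously quantized vector, known to both encoder and decoder.
    predictors: one scalar prediction coefficient per switch position (assumed).
    codebooks: one residual codebook (list of vectors) per switch position.
    Returns (switch, index, quantized_vector); switch and index are transmitted.
    """
    best = None
    for s, (coeff, codebook) in enumerate(zip(predictors, codebooks)):
        pred = [coeff * p for p in prev_q]
        resid = [xi - pi for xi, pi in zip(x, pred)]
        for idx, code in enumerate(codebook):
            # x - x_q equals resid - code, so this is also the reconstruction error
            err = sum((r - c) ** 2 for r, c in zip(resid, code))
            if best is None or err < best[0]:
                xq = [pi + c for pi, c in zip(pred, code)]
                best = (err, s, idx, xq)
    _, s, idx, xq = best
    return s, idx, xq
```

The decoder repeats the same prediction from its own copy of prev_q and adds the selected codevector. The switch choice therefore costs only the bits needed to signal the switch position.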

38

Savvides, Vasos E. "Perceptual models in speech quality assessment and coding". Thesis, Loughborough University, 1988. https://dspace.lboro.ac.uk/2134/36273.

Abstract

The ever-increasing demand for good communications/toll quality speech has created renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work: speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed that simulates the processing stages of the peripheral auditory system. The model outputs a "running" auditory spectrum, which represents the auditory (spectral) equivalent of any acoustic sound, such as speech. Auditory spectra from coded speech segments serve as inputs to a second model, which simulates the information centre in the brain that performs the speech quality assessment.
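
The abstract does not detail the stages of the auditory model. The much simpler sketch below illustrates only the underlying notion of a critical-band (auditory) spectrum and is not Savvides's model. It maps FFT power bins onto the Bark scale with the Zwicker and Terhardt analytical approximation, then sums the power that falls in each one-Bark-wide band. The function names and the one-Bark band width are choices made for this illustration.

```python
import math


def hz_to_bark(f_hz):
    """Zwicker and Terhardt approximation of the Bark critical-band scale."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band_energies(power_spectrum, sample_rate):
    """Sum FFT power bins 0..N/2 into consecutive one-Bark-wide bands.

    power_spectrum: N/2 + 1 power values from a real N-point FFT.
    Returns a list whose entry b holds the power falling in [b, b + 1) Bark.
    """
    fft_size = 2 * (len(power_spectrum) - 1)
    n_bands = int(hz_to_bark(sample_rate / 2.0)) + 1
    bands = [0.0] * n_bands
    for k, power in enumerate(power_spectrum):
        freq = k * sample_rate / fft_size  # centre frequency of bin k in Hz
        bands[int(hz_to_bark(freq))] += power
    return bands
```

At an 8 kHz sampling rate the Nyquist frequency of 4 kHz lies just above 17 Bark, so a 256-point FFT is grouped into 18 bands. Computing such band energies frame by frame gives a crude "running" spectrum.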

39

LeBlanc, Wilfrid P. (Wilfrid Paul). "Speech coding at low to medium bit rates". Dissertation (Electrical Engineering), Carleton University, Ottawa, 1992.

40

Lucas, Adrian Edward. "Acoustic level speech recognition". Thesis, University of Surrey, 1991. http://epubs.surrey.ac.uk/2819/.

Abstract

A number of techniques have been developed over the last forty years which attempt to solve the problem of recognizing human speech by machine. Although the general problem of unconstrained, speaker-independent connected speech recognition is still not solved, some of the methods have demonstrated varying degrees of success on a number of constrained speech recognition tasks. Human speech communication is considered to take place on a number of levels, from the acoustic signal through to higher linguistic and semantic levels. At the acoustic level, the recognition process can be divided into time alignment (the removal of global and local timing differences between the unknown input speech and the stored reference templates) and reference template matching. Little attention seems to have been given to the effective use of acoustic-level contextual information to improve the performance of these tasks. In this thesis, a new template matching scheme is developed which addresses this issue and successfully allows the utilization of acoustic-level context. The method, based on Bayesian decision theory, is a dynamic time warping approach which incorporates statistical dependencies in matching errors between frames along the entire length of the reference template. In addition, the method includes a speaker compensation technique operating simultaneously. Implementation is carried out using the highly efficient branch and bound algorithm. Speech model storage requirements are quite small as a result of an elegant feature of the recursive matching criterion. Furthermore, a novel method for inferring the special speech models is introduced. The new method is tested on data drawn from nearly 8000 utterances of the 26 letters of the British English alphabet spoken by 104 speakers, split almost equally between male and female speakers.
Experiments show that the new approach is a powerful acoustic level speech recognizer achieving up to 34% better recognition performance when compared with a conventional method based on the dynamic programming algorithm.
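
For reference, the sketch below shows a conventional dynamic-programming (dynamic time warping) matcher of the kind the thesis uses as its baseline. It fills a cumulative-cost table in which each cell adds the local distance to the cheapest of its three predecessors. The symmetric step pattern, the absolute-difference local distance and the scalar features are illustrative assumptions. The thesis's Bayesian, context-dependent matching criterion is not reproduced.

```python
def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Cumulative cost of the best time alignment between sequences a and b.

    dist is the local frame distance; for real features it would compare vectors.
    """
    inf = float("inf")
    n, m = len(a), len(b)
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = dist(a[i - 1], b[j - 1])
            # predecessor cells: insertion, deletion, diagonal match
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m]
```

A template such as [1, 2, 3] matches the time-stretched input [1, 2, 2, 3] at zero cost, which is precisely the timing variation that time alignment is meant to absorb.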

41

Coetzee, H. J. "The development of a new objective speech quality measure for speech coding applications". Diss., Georgia Institute of Technology, 1990. http://hdl.handle.net/1853/15474.

42

Murray, Alan. "An investigation into a speaker dependent coding system". Thesis, Leeds Beckett University, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.321413.

43

McCourt, Paul. "Transform vector quantisation of speech at low bit rates". Thesis, Queen's University Belfast, 1995. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.282252.

44

Kura, Vijay. "Novel Pitch Detection Algorithm With Application to Speech Coding". ScholarWorks@UNO, 2003. http://scholarworks.uno.edu/td/52.

Abstract

This thesis introduces a novel method for accurate pitch detection and speech segmentation, named Multi-feature, Autocorrelation (ACR) and Wavelet Technique (MAWT). MAWT uses feature extraction, and ACR applied on Linear Predictive Coding (LPC) residuals, with a wavelet-based refinement step. MAWT opens the way for a unique approach to modeling: although speech is divided into segments, the success of voicing decisions is not crucial. Experiments demonstrate the superiority of MAWT in pitch period detection accuracy over existing methods, and illustrate its advantages for speech segmentation. These advantages are more pronounced for gain-varying and transitional speech, and under noisy conditions.
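
MAWT combines several features with a wavelet-based refinement, and the abstract does not give those details. The sketch below shows only the plain autocorrelation (ACR) step that such a method builds on. For brevity it is applied directly to a frame rather than to the LPC residual. The 50 to 400 Hz search range and the function name are assumptions.

```python
def autocorr_pitch(x, fs, f_min=50.0, f_max=400.0):
    """Estimate F0 in Hz as fs / lag of the largest autocorrelation value in range."""
    lag_min = int(fs / f_max)
    lag_max = min(int(fs / f_min), len(x) - 1)
    best_lag, best_r = None, float("-inf")
    for lag in range(lag_min, lag_max + 1):
        r = sum(x[n] * x[n + lag] for n in range(len(x) - lag))
        if r > best_r:
            best_lag, best_r = lag, r
    return fs / best_lag
```

A pulse train with one pulse every 80 samples at 8 kHz yields an estimate of 100 Hz. Multiples of the true lag (here 160) score lower because fewer pulse pairs overlap. In practice, this bias toward short lags does not prevent the halving and doubling errors that refinement stages are designed to correct.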

45

Peters, Richard Alan II. "A LINEAR PREDICTION CODING MODEL OF SPEECH (SYNTHESIS, LPC, COMPUTER, ELECTRONIC)". Thesis, The University of Arizona, 1985. http://hdl.handle.net/10150/291240.

46

Lee, Kwan Yee. "Analysis-by-synthesis linear predictive coding". Thesis, University of Surrey, 1990. http://epubs.surrey.ac.uk/844188/.

Abstract

Applications such as satellite and digital mobile radio (DMR) systems have gained widespread acceptance in recent years, and efficient digital processing techniques are gradually replacing the older analogue systems. An important subsystem of these applications is voiceband communication, especially digital speech encoding. Digital encoding of speech has been a focus of speech processing research for many years. Recently, this activity, together with rapid advances in digital hardware, has begun to produce realistic working algorithms. This is typified by the Pan-European DMR system, which operates at 13 kbit/s. Sophisticated algorithms have been developed for applications operating below this bit rate. A particular class of these, termed Analysis-by-Synthesis Linear Predictive Coding (ABS-LPC), has been the subject of active worldwide research. In this thesis, ABS-LPC algorithms are investigated with particular emphasis on the Code-Excited Linear Predictive (CELP) coding variant. The aim of the research is to produce high communication quality speech at 8 kbit/s and below by considering aspects of quantisation, computational complexity and robustness. The ABS-LPC algorithms operate by exploiting short-term and long-term correlations of speech signals. The Line Spectral Frequency (LSF) representation of the short-term correlation is examined, and various LSF derivations and quantisation procedures are presented. The ABS-LPC variants are compared for their advantages and disadvantages to determine an algorithm suitable for in-depth analysis, and the chosen variant, CELP, was pursued. A study is presented on the importance of long-term prediction and on simplifying CELP without sacrificing speech quality.
The derived alternative approaches for computing the long-term predictor and the filter excitation have enabled the previously impractical CELP algorithm to produce high communication quality speech at rates below 8 kbit/s, while remaining implementable in real time on a single chip. Refinements of the CELP algorithm followed, aimed at higher speech quality at 4.8 kbit/s and below. This work examined the weaknesses of the basic CELP algorithm and presents alternative strategies to overcome these limitations.
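
The abstract mentions several LSF derivations but does not specify them. The sketch below shows one common approach, which is an assumption rather than the thesis's procedure. It forms the sum and difference polynomials P(z) = A(z) + z^-(p+1) A(z^-1) and Q(z) = A(z) - z^-(p+1) A(z^-1). Their phase-compensated values on the unit circle are real, so the code finds their roots in (0, pi) by a sign-change search on a frequency grid, followed by bisection. The grid size, iteration count and function name are illustrative, and LSFs closer together than the grid spacing would be missed.

```python
import math


def lpc_to_lsf(a, grid_points=2048, iterations=60):
    """Convert A(z) = a[0] + a[1] z^-1 + ... + a[p] z^-p (a[0] == 1) to LSFs in radians.

    Assumes A(z) is minimum phase, so the roots of P and Q lie on the unit circle,
    are distinct, and interlace. The trivial roots at 0 and pi are excluded.
    """
    p = len(a) - 1
    ext = list(a) + [0.0]  # coefficients a_0 .. a_{p+1}, with a_{p+1} = 0
    sym = [ext[k] + ext[p + 1 - k] for k in range(p + 2)]   # P(z), symmetric
    anti = [ext[k] - ext[p + 1 - k] for k in range(p + 2)]  # Q(z), antisymmetric
    half = (p + 1) / 2.0

    def p_real(w):  # real-valued P(e^jw) after removing the linear-phase term
        return sum(c * math.cos(w * (k - half)) for k, c in enumerate(sym))

    def q_real(w):  # real-valued Q(e^jw), up to a constant factor of -2j
        return sum(c * math.sin(w * (k - half)) for k, c in enumerate(anti))

    def roots(f):
        found = []
        grid = [math.pi * (i + 0.5) / grid_points for i in range(grid_points)]
        for lo, hi in zip(grid, grid[1:]):
            f_lo = f(lo)
            if (f_lo < 0.0) != (f(hi) < 0.0):
                for _ in range(iterations):  # bisection on the bracketing interval
                    mid = 0.5 * (lo + hi)
                    f_mid = f(mid)
                    if (f_mid < 0.0) == (f_lo < 0.0):
                        lo, f_lo = mid, f_mid
                    else:
                        hi = mid
                found.append(0.5 * (lo + hi))
        return found

    return sorted(roots(p_real) + roots(q_real))
```

As a sanity check, a flat predictor A(z) = 1 of order p gives LSFs equally spaced at k * pi / (p + 1). For order 2, these are pi/3 and 2*pi/3.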

47

Murray, Iain Robert. "Simulating emotion in synthetic speech". Thesis, University of Dundee, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.306550.

48

Niranjan, Mahesan. "Modelling and classifying speech patterns". Thesis, University of Cambridge, 1990. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.303223.

49

Wu, Lizhong. "Speech processing with neural networks". Thesis, University of Cambridge, 1992. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.259529.

50

Taft, Daniel Adam. "Cochlear implant sound coding with across-frequency delays". Thesis, University of Melbourne, 2009. http://repository.unimelb.edu.au/10187/5783.

Abstract

The experiments described in this thesis investigate the temporal relationship between frequency bands in a cochlear implant sound processor. Initial studies were of cochlea-based traveling wave delays for cochlear implant sound processing strategies. These were later broadened into studies of an ensemble of across-frequency delays.
Before incorporating cochlear delays into a cochlear implant processor, a set of suitable delays was determined with a psychoacoustic calibration to pitch perception, since normal cochlear delays are a function of frequency. The first experiment assessed the perception of pitch evoked by electrical stimuli from cochlear implant electrodes. Six cochlear implant users with acoustic hearing in their non-implanted ears were recruited for this, since they were able to compare electric stimuli to acoustic tones. Traveling wave delays were then computed for each subject using the frequencies matched to their electrodes. These were similar across subjects, ranging over 0-6 milliseconds along the electrode array.
The next experiment applied the calibrated delays to the ACE strategy filter outputs before maxima selection. The effects upon speech perception in noise were assessed with cochlear implant users, and a small but significant improvement was observed. A subsequent sensitivity analysis indicated that accurate calibration of the delays might not be necessary after all; instead, a range of across-frequency delays might be similarly beneficial.
A computational investigation was performed next, in which a corpus of recorded speech was passed through the ACE cochlear implant sound processing strategy to determine how across-frequency delays altered the patterns of stimulation. A range of delay vectors was used in combination with a number of processing parameter sets and noise levels. The results showed that when frequency bands are desynchronized with across-frequency delays, additional stimuli from broadband sounds (such as the glottal pulses of vowels) are selected. Background noise contains fewer dominant impulses than a single talker and so is not enhanced in this way.
In the following experiment, speech perception with an ensemble of across-frequency delays was assessed with eight cochlear implant users. Reverse cochlear delays (high frequency delays) were equivalent to conventional cochlear delays. Benefit was diminished for larger delays. Speech recognition scores were at baseline with random delay assignments. An information transmission analysis of speech in quiet indicated that the discrimination of voiced cues was most improved with across-frequency delays. For some subjects, this was seen as improved vowel discrimination based on formant locations and improved transmission of the place of articulation of consonants.
A final study indicated that benefits to speech perception with across-frequency delays are diminished when the number of maxima selected per frame is increased above 8-out-of-22 frequency bands.
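
The sketch below illustrates the processing order described in the abstract. It delays each band's envelope first and then selects the n largest of m bands in every frame, as in the 8-out-of-22 configuration. It is a simplified model and not the ACE implementation. Envelopes are plain per-frame lists, delays are whole frames rather than milliseconds, and the function names are hypothetical.

```python
def apply_band_delays(envelopes, delays):
    """Delay each band's envelope by whole frames, zero-padding at the start.

    envelopes[b][t] is the envelope of band b at frame t; delays[b] >= 0 frames.
    Output envelopes keep their original length.
    """
    return [([0.0] * d + list(env))[: len(env)] for env, d in zip(envelopes, delays)]


def select_maxima(envelopes, frame, n_maxima):
    """Return, in ascending order, the n_maxima bands with the largest envelope at a frame."""
    ranked = sorted(range(len(envelopes)), key=lambda b: envelopes[b][frame], reverse=True)
    return sorted(ranked[:n_maxima])
```

With 22 bands whose envelope level equals the band index, the undelayed selection picks bands 14 to 21. Delaying the upper 11 bands by two frames moves the frame-0 selection down to bands 3 to 10. This shows how desynchronizing bands changes which channels are stimulated.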
