Theses: "Perceptual quality"

1

Dhakal, Prabesh, Prabhat Tiwari and Pawan Chan. "Perceptual Video Quality Assessment Tool". Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2014. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-2576.

Abstract

Subjective video quality describes how a video is perceived by the viewer and reflects his or her opinion of a particular video sequence. Subjective video quality tests are expensive in terms of time (preparation and running) and human resources. The main objective of such testing is to determine how humans perceive video quality, since they are the ultimate end users. There are many ways of testing video quality; we have used ITU-T Recommendation P.910.
In our research work, we have designed a tool that can be used to conduct mass-scale surveys or subjective tests. Absolute Category Rating (ACR) is the only method used to carry out the subjective video assessment. The test is very useful in the context of video streaming quality. The survey can be used in various countries and sectors with low internet speeds to determine which kind of video, compression technique, bit rate, or format gives the best quality.
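As a rough illustration of how ACR ratings are usually summarized, the sketch below computes a mean opinion score (MOS) with an approximate 95% confidence interval. It assumes the five-level ACR scale of ITU-T P.910 (5 = Excellent, 1 = Bad); the function name `mos_with_ci` and the sample ratings are hypothetical and are not taken from the thesis or its tool.

```python
from math import sqrt
from statistics import mean, stdev


def mos_with_ci(ratings):
    """Return (MOS, 95% CI half-width) for ACR scores on the 1-5 scale.

    Assumes at least two ratings; the interval uses a normal approximation.
    """
    if len(ratings) < 2:
        raise ValueError("need at least two ratings")
    if any(r not in (1, 2, 3, 4, 5) for r in ratings):
        raise ValueError("ACR ratings must be integers from 1 to 5")
    mos = mean(ratings)
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return mos, half_width


# Hypothetical ratings from eight viewers for one video sequence.
ratings = [4, 5, 3, 4, 4, 2, 5, 4]
mos, ci = mos_with_ci(ratings)
print(f"MOS = {mos:.3f} +/- {ci:.3f}")
```

With only a handful of viewers the interval is wide, which is one reason a tool for collecting ratings at scale is attractive.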

2

Petersson, Jonas. "A Review of Perceptual Image Quality". Thesis, Linköping University, Department of Science and Technology, 2005. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-2898.

Abstract

What is meant by print quality, and what makes people perceive the quality of an image in a certain way? An inquiry was made into which parameters strongly affect the perception of digitally printed images.

A subjective test and some measurements form the basis for the thesis. The goal was to find a tool to predict perceived image quality by investigating the connections between the subjective test and the measurements.

Some suitable images were chosen, with a variety of motifs. A test panel consisting of people who are accustomed to observing image quality answered questions about the perception of the quality. Measurements were made on a special test form to get information about the six different printers used in the investigation.

One of the discoveries was made when two images with the same colorful motif were compared. The first image got a much higher grade for general quality than the second image, even though the second image was printed with a printer that had a larger color gamut. The reason for this is that the first image consists of more saturated colors, while the second image has more details. The human eye perceives the more saturated image to be better than the image with more details. Another discovery was the correlation between the perceived general quality of a colored image and the perceived color gamut. One conclusion was that a great difference between two calculated color gamuts resulted in a large difference in the perception of the color gamuts. A discovery for an image with very few colors and many glossy surfaces was that print mottle and sharpness are strictly connected to the general quality.

3

Lervold, Mathias Gjerstad. "Measuring perceptual quality in Internet television". Thesis, Norwegian University of Science and Technology, Department of Electronics and Telecommunications, 2009. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-9841.

Abstract

In this thesis we have evaluated the Quality of Experience (QoE) of Internet television through user tests of Absolutt Fotball. Absolutt Fotball is a Norwegian live football streaming service powered by adaptive streaming from Move Networks Inc. In our tests we found that the users rated the overall quality better than other Internet video services, such as YouTube, TV2 Sumo and NRK Nett-TV, but worse than football on TV. The main problems were coding artifacts, such as blurring, edge ringing and color bleeding, as well as problems with the smoothness of playback. Response time and adaptation period were in general satisfactory; all users preferred adaptive streaming with quick starts and no interruptions over traditional streaming with constant quality and buffering at the start and sometimes during sequences. The tests also revealed that factors other than video quality could be significant for the users' overall QoE. Most notable was the delay relative to other live services, such as SMS updates, radio and live updates on the Internet. Our analyses also showed tendencies toward content and context dependencies in the QoE. For example, the result of a user's favorite team, as well as his/her viewing environment, could have an impact on his/her perception of the quality. In order to improve the QoE the service provider should evaluate the encoding stage in particular. By increasing the bit rate of the encoding, many of the problems related to coding artifacts and smoothness of playback could be reduced. The client should be optimized with regard to adaptation period, response time and live delay; however, there is a compromise to be made with the robustness and reliability of the media player. The service provider can receive feedback on the QoE in three stages: (1) full-reference objective quality assessment at the headend, such as VQM; (2) bit rate statistics from the clients; and (3) an extended user profile and a QoE tool at the user end. The proposed QoE tool in the form of a menu could include guides and tests related to user equipment and viewing environment, real-time feedback and support chat related to video quality problems, and service personalization in relation to quality/price and features. We found that controlling the QoE in Internet television is very difficult. QoE monitoring is, however, possible for the service provider, but a true end-to-end solution would require a better integration of client and user than exists today.

4

Yang, Kai-Chieh. "Perceptual quality assessment for compressed video". Diss., University of California, San Diego, 2007. http://wwwlib.umi.com/cr/ucsd/fullcit?p3284171.

Abstract

Thesis (Ph. D.)--University of California, San Diego, 2007.
Title from first page of PDF file (viewed Mar. 14, 2007). Available via ProQuest Digital Dissertations. Vita. Includes bibliographical references (p. 149-156).

5

Yasakethu, Lasith. "Perceptual quality driven 3D video communications". Thesis, University of Surrey, 2010. http://epubs.surrey.ac.uk/843261/.

Abstract

The ability to provide a more exciting, informative and entertaining end-user visual experience has created enormous interest among viewers in 3D content. Whereas traditional 2D video is sufficient for describing details of captured scenes, 3D video can provide a more realistic representation of the same scene with the additional value of depth. The success of today's 3D video services requires that end users experience a satisfactory level of perceptual quality. Thus the main focus of this research is to investigate and design efficient means of delivering the maximum perceptual quality of colour-plus-depth based 3D video services to end users. 3D content needs to be compressed effectively before it is transmitted over communication channels, due to the large amount of raw data associated with it. However, compression of 3D content introduces coding artefacts, which hinder the true perception of 3D video. In the first part of the thesis, a novel perceptual quality based rate-controlling algorithm is proposed for both 3D and 2D video encoding, by considering the perceptual quality of the video sequence. The proposed technique shows better performance and can be effectively used in off-line video coding. Although several quality models have been proposed in the literature to assess the quality of 2D video, no similar effort has been made for the quality assessment of 3D video. In the second part of the thesis, a compound 3D quality model is designed by combining dominant perceptual attributes of 3D video. While subjective test results remain the best and most precise judgment of 3D video quality, the use of the proposed quality model is an acceptable compromise for the 3D video research community to speed up the development of 3D consumer products, services and applications. Effective transmission schemes are necessary to improve the end-user perceptual quality and transmission reliability in 3D video communications. Highly compressed 3D content is very sensitive to channel errors. In the final part of the thesis, to improve the performance of 3D video transmission over bandwidth-limited and error-prone wireless channels, a perceptual quality based Joint Source Channel Coding (JSCC) approach is proposed to minimize the effect of both source and channel distortions. Key words: 3D video communications, colour-plus-depth 3D video, perceptual 3D video quality, rate controlling, 3D video transmission, and Joint Source Channel Coding (JSCC).

6

Rix, Antony W. "Perceptual techniques in audio quality assessment". Thesis, University of Edinburgh, 2003. http://hdl.handle.net/1842/14286.

Abstract

This thesis discusses quality assessment of audio communications systems, in particular telephone networks. A new technique for time-delay estimation based on a smoothed weighted histogram of frame-by-frame delays is presented. This has low complexity and is found to be more robust to non-linear distortions typical of telephone networks. This technique is further extended to identify piecewise constant delay, enabling models to be used for assessing packet-based transmission such as voice over IP, where delay may change several times during a measurement. It is shown that equalisation improves the accuracy of perceptual models for measurements that may include analogue or acoustic components. Linear transfer function estimation is found to be unreliable due to non-linear distortions. Spectral difference and phaseless cross-spectrum estimation methods for identifying and equalising the linear transfer function are implemented for this application, operating in the filter-bank and short-term Fourier spectrum domains. This thesis provides the first detailed examination of the process of selecting and mapping multiple objective perceptual distortion parameters to estimated subjective quality. The systematic variation of subjective opinion between tests is examined and addressed using a new method of monotonic polynomial regression. The effect on conventional regression techniques, and a new joint optimisation process, are considered.
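The idea of estimating a single delay from a smoothed, weighted histogram of frame-by-frame delays can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how frames are weighted or how the histogram is smoothed, so the per-frame weights, the triangular smoothing kernel, and the name `estimate_delay` are all hypothetical.

```python
from collections import defaultdict


def estimate_delay(frame_delays, weights, smooth_radius=2):
    """Pick the delay at the peak of a smoothed, weighted histogram.

    frame_delays: integer delay estimate per frame (e.g. in samples).
    weights: per-frame confidence (assumed here; e.g. frame energy).
    smooth_radius: half-width of a triangular smoothing kernel (assumed).
    """
    if not frame_delays or len(frame_delays) != len(weights):
        raise ValueError("need equally long, non-empty inputs")

    # Weighted histogram: each frame votes for its delay with its weight.
    hist = defaultdict(float)
    for delay, weight in zip(frame_delays, weights):
        hist[delay] += weight

    # Smooth with a triangular kernel and keep the highest-scoring bin.
    best_delay, best_score = None, float("-inf")
    for center in range(min(hist), max(hist) + 1):
        score = 0.0
        for offset in range(-smooth_radius, smooth_radius + 1):
            tri = smooth_radius + 1 - abs(offset)
            score += tri * hist.get(center + offset, 0.0)
        if score > best_score:
            best_delay, best_score = center, score
    return best_delay


# Hypothetical per-frame estimates: most frames agree on 10, one outlier at 42.
delays = [10, 10, 11, 10, 42, 9, 10]
weights = [1.0, 0.8, 0.5, 1.0, 0.3, 0.4, 0.9]
print(estimate_delay(delays, weights))
```

Because the outlier frame contributes only its own small weight to a distant bin, it does not shift the peak, which is the kind of robustness the abstract attributes to the histogram approach.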

7

Xia, Feng. "Perceptual coding for high-quality audio signals". Ohio : Ohio University, 1998. http://www.ohiolink.edu/etd/view.cgi?ohiou1176235728.

8

De Silva, Varuna. "Improving perceptual quality of 3D TV systems". Thesis, University of Surrey, 2011. http://epubs.surrey.ac.uk/835859/.

9

Hewage, Chaminda T. E. R. "Perceptual quality driven 3-D video over networks". Thesis, University of Surrey, 2008. http://eprints.kingston.ac.uk/22178/.

Abstract

3-D video in day-to-day life will enhance the way we represent real-world scenes and provide more natural conditions for human interaction. Therefore, 3-D video has the potential to be the next killer application in multimedia communications. However, the demand for resources (e.g. bandwidth), 3-D quality evaluation and the provision of error protection are challenges to be addressed. Thus, this thesis addresses the issues related to transmission of 3-D video over communication networks, including compression, quality evaluation, error resilience and error concealment. The first part of the thesis investigates encoding approaches for 3-D video in terms of compression efficiency and adaptability to existing communication technologies. Moreover, an encoding configuration is proposed for colour plus depth video coding based on scalable video coding principles. The proposed encoding configuration shows improved compression efficiency and scalability, which can be utilized to scale conventional video applications into stereoscopic video with a minimum increase in the bandwidth required. Quality evaluation issues of stereoscopic video are addressed in the second part of the thesis. The correlations between objective and subjective quality ratings are derived for the range of compression ratios and packet loss rates considered. The results show high correlation between candidate objective measures (e.g. PSNR of the colour image) and the measured 3-D perceptual quality attributes. The third part of the thesis investigates efficient error resilience and concealment methods for backward compatible stereoscopic video transmission over wired/wireless networks. In order to provide enhanced error recovery, the proposed methods utilize inherent characteristics of colour plus depth video and their contributions towards improved perceived quality. The error resilience methods proposed improve 3-D perception compared to equally protected transmission of colour plus depth map video. Similarly, the proposed error concealment methods recover missing information more effectively than existing 2-D error concealment methods.

10

Lentz, Joshua K. "Perceptual image quality of launch vehicle imaging telescopes". Doctoral diss., University of Central Florida, 2011. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4963.

Abstract

A large fleet (in the hundreds) of high quality telescopes is used for tracking and imaging of launch vehicles during ascent from Cape Canaveral Air Force Station and Kennedy Space Center. A maintenance tool has been developed for use with these telescopes. The tool requires rankings of telescope condition in terms of the ability to generate useful imagery. It is thus a case of ranking telescope conditions on the basis of the perceptual image quality of their imagery. Perceptual image quality metrics that are well correlated with observer opinions of image quality have been available for several decades. However, these are quite limited in their applications, not being designed to compare various optical systems. The perceptual correlation of the metrics implies that a constant image quality curve (such as the boundary between two qualitative categories labeled as excellent and good) would have a constant value of the metric. This is not the case if the optical system parameters (such as object distance or aperture diameter) are varied. No published data on such direct variation is available, and this dissertation presents an investigation into the perceptual metric responses as system parameters are varied. This investigation leads to some non-intuitive conclusions. The perceptual metrics are reviewed, as well as more common metrics and their inability to perform in the manner necessary for the research of interest. Perceptual test methods are also reviewed, as is the human visual system. Image formation theory is presented in a non-traditional form, yielding the surprising result that perceptual image quality is invariant under changes in focal length if the final displayed image remains constant. Experimental results are presented of changes in perceived image quality as aperture diameter is varied. Results are analyzed and shortcomings in the process and metrics are discussed. Using the test results, predictions are made about the form of the metric response to object distance variations, and subsequent testing was conducted to validate the predictions. The utility of the results, limitations of applicability, and the immediate ability to further generalize the results are presented.
ID: 030423279. System requirements: World Wide Web browser and PDF reader. Mode of access: World Wide Web. Thesis (Ph.D.)--University of Central Florida, 2011. Includes bibliographical references (p. 151-155).
Ph.D. (Doctorate), Center for Research and Education in Optics and Lasers, Optics and Photonics

11

Savvides, Vasos E. "Perceptual models in speech quality assessment and coding". Thesis, Loughborough University, 1988. https://dspace.lboro.ac.uk/2134/36273.

Abstract

The ever-increasing demand for good communications/toll quality speech has created a renewed interest in the perceptual impact of rate compression. Two general areas are investigated in this work, namely speech quality assessment and speech coding. In the field of speech quality assessment, a model is developed which simulates the processing stages of the peripheral auditory system. At the output of the model a "running" auditory spectrum is obtained. This represents the auditory (spectral) equivalent of any acoustic sound such as speech. Auditory spectra from coded speech segments serve as inputs to a second model. This model simulates the information centre in the brain which performs the speech quality assessment.

12

Moreno Escobar, Jesús Jaime. "Perceptual Criteria on Image Compression". Doctoral thesis, Universitat Autònoma de Barcelona, 2011. http://hdl.handle.net/10803/51428.

Abstract

Nowadays, digital images are used in many areas of everyday life, but they tend to be large. This increasing amount of information leads to the problem of image data storage. For example, it is common to represent a color pixel as a 24-bit number, where the red, green, and blue channels use 8 bits each. Consequently, this kind of color pixel can specify one of 2^24 ≈ 16.78 million colors. An image at a resolution of 512 × 512 that allocates 24 bits per pixel therefore occupies 786,432 bytes. That is why image compression is important. An important feature of image compression is that it can be lossy or lossless. A compressed image is acceptable provided the losses of image information are not perceived by the eye, which is possible under the assumption that a portion of this information is redundant. Lossless image compression is defined as mathematically decoding the same image that was encoded. Lossy image compression needs to identify two features inside the image: the redundancy and the irrelevancy of information. Thus, lossy compression modifies the image data in such a way that, when they are encoded and decoded, the recovered image is similar enough to the original one. How similar the recovered image is to the original is defined prior to the compression process and depends on the implementation to be performed. In lossy compression, current image compression schemes remove information considered irrelevant by using mathematical criteria. One of the problems of these schemes is that, although the numerical quality of the compressed image is low, it shows high visual quality, since it does not show many visible artifacts. This is because the mathematical criteria used to remove information do not take into account whether the viewed information is perceived by the Human Visual System. Therefore, the aim of an image compression scheme designed to obtain images that do not show artifacts, although their numerical quality can be low, is to eliminate the information that is not visible to the Human Visual System. Hence, this Ph.D. thesis proposes to exploit the visual redundancy existing in an image by reducing those features that are imperceptible to the Human Visual System. First, we define an image quality metric that is highly correlated with the results of psychophysical experiments performed by human observers. The proposed CwPSNR metric weights the well-known PSNR by using a particular low-level perceptual model of the Human Visual System, namely the Chromatic Induction Wavelet Model (CIWaM). Second, we propose an image compression algorithm (called Hi-SET), which exploits the high correlation and self-similarity of pixels in a given area or neighborhood by means of a fractal function. Hi-SET possesses the main features of modern image compressors; that is, it is an embedded coder, which allows progressive transmission. Third, we propose a perceptual quantizer (ρSQ), which is a modification of the uniform scalar quantizer. The ρSQ is applied to a pixel set in a certain Wavelet sub-band, that is, a global quantization. Unlike this, the proposed modification performs a local pixel-by-pixel forward and inverse quantization, introducing into this process a perceptual distortion that depends on the spatial information surrounding the pixel. Combining the ρSQ method with the Hi-SET image compressor, we define a perceptual image compressor, called ΦSET. Finally, a coding method for Region of Interest areas, ρGBbBShift, is presented, which perceptually weights pixels in these areas and maintains only the most important perceivable features in the rest of the image. Results presented in this report show that CwPSNR is the best-ranked image quality method when it is applied to the most common image compression distortions, such as JPEG and JPEG2000. CwPSNR shows the best correlation with the judgement of human observers, which is based on the results of psychophysical experiments obtained for relevant image quality databases such as TID2008, LIVE, CSIQ and IVC. Furthermore, the Hi-SET coder obtains better results, both for compression ratios and perceptual image quality, than the JPEG2000 coder and other coders that use a Hilbert fractal for image compression. Hence, when the proposed perceptual quantization is introduced into the Hi-SET coder, our compressor improves its numerical and perceptual efficiency. When the ρGBbBShift method applied to Hi-SET is compared against the MaxShift method applied to the JPEG2000 standard and to Hi-SET, the images coded by our ROI method get the best results when the overall image quality is estimated. Both the proposed perceptual quantization and the ρGBbBShift method are generalized algorithms that can be applied to other Wavelet-based image compression algorithms such as JPEG2000, SPIHT or SPECK.
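The storage figures quoted in this abstract can be checked with a few lines of arithmetic, and the plain PSNR that CwPSNR is said to weight can be sketched alongside. The `psnr` helper below is the standard unweighted definition for 8-bit samples; it does not reproduce the CIWaM-based weighting, whose details are not given in the abstract, and the constant and function names are illustrative.

```python
import math

BITS_PER_CHANNEL = 8
CHANNELS = 3  # red, green, blue

# Number of distinct colors a 24-bit pixel can encode.
colors = 2 ** (BITS_PER_CHANNEL * CHANNELS)

# Uncompressed size of a 512 x 512 image at 24 bits per pixel.
size_bytes = 512 * 512 * CHANNELS * BITS_PER_CHANNEL // 8


def psnr(original, distorted, peak=255):
    """Plain PSNR in dB between two equal-length sequences of samples."""
    if len(original) != len(distorted) or not original:
        raise ValueError("inputs must be non-empty and equally long")
    mse = sum((a - b) ** 2 for a, b in zip(original, distorted)) / len(original)
    return math.inf if mse == 0 else 10 * math.log10(peak ** 2 / mse)


print(colors)      # 16777216, i.e. about 16.78 million
print(size_bytes)  # 786432
```

A purely numerical measure such as this one treats every sample error alike, which is the limitation the abstract points to when it argues for perceptually weighted metrics.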

13

Zhu, Shu-Yu. "Perceptual wavelet coding and quality assessment for still image". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape4/PQDD_0020/MQ53450.pdf.

14

Eadie, Tanya L. "A perceptual investigation of vocal quality using backward speech". Thesis, National Library of Canada = Bibliothèque nationale du Canada, 1999. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape9/PQDD_0004/MQ42063.pdf.

15

Joveluro, Prince. "Perceptual quality estimation techniques for 2D and 3D videos". Thesis, University of Surrey, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.543277.

16

Oh, Han. "Perceptual Image Compression using JPEG2000". Diss., The University of Arizona, 2011. http://hdl.handle.net/10150/202996.

Abstract

Image sizes have increased exponentially in recent years. The resulting high-resolution images are typically encoded in a lossy fashion to achieve high compression ratios. Lossy compression can be categorized into visually lossless and visually lossy compression depending on the visibility of compression artifacts. This dissertation proposes visually lossless coding methods as well as a visually lossy coding method with perceptual quality control. All resulting codestreams are JPEG2000 Part-I compliant. Visually lossless coding is increasingly considered as an alternative to numerically lossless coding. In order to hide compression artifacts caused by quantization, visibility thresholds (VTs) are measured and used for quantization of subbands in JPEG2000. In this work, VTs are experimentally determined from statistically modeled quantization distortion, which is based on the distribution of wavelet coefficients and the dead-zone quantizer of JPEG2000. The resulting VTs are adjusted for locally changing background through a visual masking model, and then used to determine the minimum number of coding passes to be included in a codestream for visually lossless quality under desired viewing conditions. The proposed coding scheme successfully yields visually lossless images at competitive bitrates compared to those of numerically lossless coding and visually lossless algorithms in the literature. This dissertation also investigates changes in VTs as a function of display resolution and proposes a method which effectively incorporates multiple VTs for various display resolutions into the JPEG2000 framework. The proposed coding method allows for visually lossless decoding at resolutions natively supported by the wavelet transform as well as arbitrary intermediate resolutions, using only a fraction of the full-resolution codestream. When images are browsed remotely, this method can significantly reduce bandwidth usage. Contrary to images encoded in the visually lossless manner, highly compressed images inevitably have visible compression artifacts. To minimize these artifacts, many compression algorithms exploit the varying sensitivity of the human visual system (HVS) to different frequencies, which is typically obtained at the near-threshold level where distortion is just noticeable. However, it is unclear that the same frequency sensitivity applies at the supra-threshold level where distortion is highly visible. In this dissertation, the sensitivity of the HVS for several supra-threshold distortion levels is measured based on the JPEG2000 quantization distortion model. Then, a low-complexity JPEG2000 encoder using the measured sensitivity is described. The proposed visually lossy encoder significantly reduces encoding time while maintaining superior visual quality compared with conventional JPEG2000 encoders.
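For readers unfamiliar with the dead-zone quantizer mentioned in this abstract, the following minimal sketch shows the usual scalar form: the index is sign(c) * floor(|c| / step), and non-zero indices are reconstructed inside their interval. The step size and the reconstruction offset `delta = 0.5` are illustrative assumptions; the sketch does not model the visibility thresholds or the masking model of the dissertation.

```python
import math


def deadzone_quantize(coeff, step):
    """Quantization index: sign(coeff) * floor(|coeff| / step)."""
    return int(math.copysign(math.floor(abs(coeff) / step), coeff))


def deadzone_dequantize(index, step, delta=0.5):
    """Reconstruct a coefficient; delta places it inside the interval."""
    if index == 0:
        return 0.0
    return math.copysign((abs(index) + delta) * step, index)


# Hypothetical wavelet coefficients quantized with a step size of 1.0.
coeffs = [3.7, -0.9, 0.4, -2.5]
indices = [deadzone_quantize(c, 1.0) for c in coeffs]
restored = [deadzone_dequantize(q, 1.0) for q in indices]
print(indices)
print(restored)
```

Every coefficient with magnitude below the step size maps to zero, so the zero bin is twice as wide as the others; choosing the step per subband is where a visibility threshold would enter.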

17

Engelke, Ulrich. "Perceptual Quality Metric Design for Wireless Image and Video Communication". Licentiate thesis, Karlskrona : Department of Signal Processing, School of Engineering, Blekinge Institute of Technology, 2008. http://www.bth.se/fou/Forskinfo.nsf/allfirst2/00af49144047a9ccc125746c002db812?OpenDocument.

18

Huynh-Thu, Quan. "Perceptual quality assessment of communications-grade video with temporal artefacts". Thesis, University of Essex, 2009. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.502128.

19

Rohani, Mehdiabadi Behrooz. "Power control for mobile radio systems using perceptual speech quality metrics". University of Western Australia. School of Electrical, Electronic and Computer Engineering, 2007. http://theses.library.uwa.edu.au/adt-WU2007.0174.

Abstract

As the characteristics of mobile radio channels vary over time, transmit power must be controlled accordingly to ensure that the received signal level is within the receiver's sensitivity. As a consequence, modern mobile radio systems employ power control to regulate the received signal level so that it is neither below nor excessively above the receiver sensitivity, in order to maintain adequate service quality. In this context, speech quality measurement is an important aspect of the delivery of speech services, as it affects customer satisfaction as well as the usage of precious system resources. A variety of techniques for speech quality measurement has been produced over the last few years as a result of tireless research in the area of perceptual speech quality estimation. These are mainly based on psychoacoustic models of the human auditory system. However, these techniques cannot be directly applied for real-time communication purposes as they typically require a copy of the transmitted and received speech signals for their operation. This thesis presents a novel technique of incorporating perceptual speech quality metrics with power control for mobile radio systems. The technique allows for standardized perceptual speech quality measurement algorithms to be used for in-service measurement of speech quality. The accuracy of the proposed Real-Time Perceptual Speech Quality Measurement (RTPSQM) technique with respect to measuring speech quality is first validated by extensive simulations. On this basis, RTPSQM is applied to power control in the Global System for Mobile (GSM) communication and the Universal Mobile Telecommunication System (UMTS). It is shown by simulations that the use of perceptual-based power control in GSM and UMTS outperforms conventional power control in terms of reducing the transmitter signal power required for providing adequate speech quality. This in turn facilitates the observed increase in system capacity and thus offers better utilization of available system resources. To enable an analytical performance assessment of perceptual speech quality metrics in power control, the mathematical frameworks for conventional and perceptual-based power control are derived. The derivations are performed for Code Division Multiple Access (CDMA) systems and kept as generic as possible. Numerical results are presented which could be used in a system design to readily find the Erlang capacity per cell for either of the considered power control algorithms.

20

Kirtikar, Shantanu Sanatkumar. "Acoustic and Perceptual Evaluation of the Quality of Radio-Transmitted Speech". Thesis, University of Canterbury. Department of Communication Disorders, 2010. http://hdl.handle.net/10092/5305.

Abstract

Aim: When speech signals are transmitted via radio, the process of transmission may add noise to the signal of interest. This study aims to examine the effect of radio transmission on the quality of the transmitted speech signals, using a combined acoustic and perceptual approach.

Method: A standard acoustic recording of the Phonetically Balanced Kindergarten (PBK) word list read by a male speaker was played back in three conditions, one without radio transmission and two with different types of radio transmission. The vowel segments (/i, a, o, u/) embedded in the original and the re-recorded signals were analysed to yield measures of frequency loci of the first two formant frequencies (F1 and F2), amplitude difference between the first two harmonics (H1-H2), and singing power ratio (SPR). Other measures included Spectral Moment One (mean), Spectral Moment Two (variance), and the energy ratio between consonant and vowel (CV energy ratio). To examine how H1-H2 and SPR were related to the perception of vowel intelligibility and clarity, vowels at five levels of each of these two measures were selected as stimuli in the perceptual study. The auditory stimuli were presented to 20 normal-hearing listeners (10 males and 10 females, aged between 21 and 42 years). The listeners were asked to identify the vowel for each vowel stimulus in the vowel identification task and to judge from a contrast pair which vowel sounded “clearer” in the clarity discrimination task. A follow-up study using vowel stimuli with a constant length and five H1-H2 or five SPR levels was conducted on five listeners to determine the relationship between the perception of speech clarity and H1-H2 or SPR.

Results: A series of one-way or two-way analyses of variance (ANOVAs) or ANOVAs on ranks, with post-hoc tests, revealed that radio transmission had a significant effect on all of the selected acoustic measures except for the CV energy ratio. Signal degeneration due to radio transmission is characterized by changes of F1 or F2 frequencies toward a more compressed vowel space, an H1-H2 value indicating an increase of H1 dominance, an SPR value suggestive of an increase in the energy around the 2-4 kHz region, and a loss of differentiation between /s/ and /sh/ on the measures of Spectral Moments One and Two. Vowel duration was also found to play a major role in affecting the perception of vowel intelligibility and clarity. The follow-up study, with a control on vowel duration, found that SPR played a role in affecting the perception of vowel intelligibility and clarity.

Conclusion: It was concluded from the findings that measures of energy ratio between different frequency regions, as well as the frequencies of the first two formants, were sensitive in detecting the effect of radio transmission.
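The two spectral moments named in this abstract (mean and variance) can be computed by treating a normalized power spectrum as a probability distribution over frequency. The following Python sketch shows that general calculation; the function name and the input representation (parallel lists of bin frequencies and power values) are assumptions for illustration and are not taken from the thesis.

```python
def spectral_moments(freqs_hz, power):
    """Return (first moment, second central moment) of a power spectrum.

    The spectrum is normalized to sum to 1 so that it acts as a weight
    distribution over frequency. The first moment is the spectral mean
    in Hz; the second central moment is the spectral variance in Hz^2.
    """
    if len(freqs_hz) != len(power):
        raise ValueError("freqs_hz and power must have the same length")
    total = sum(power)
    if total <= 0:
        raise ValueError("spectrum has no energy")
    weights = [p / total for p in power]
    mean = sum(f * w for f, w in zip(freqs_hz, weights))
    variance = sum((f - mean) ** 2 * w for f, w in zip(freqs_hz, weights))
    return mean, variance
```

For a toy spectrum with equal power at 1000 Hz and 3000 Hz, the mean is 2000 Hz and the variance is 1,000,000 Hz². A fricative with more high-frequency energy would shift the mean upward, which is why these moments are often used to contrast sounds such as /s/ and /sh/.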

21

Engelke, Ulrich. "Modelling Perceptual Quality and Visual Saliency for Image and Video Communications". Doctoral thesis, Karlskrona : Blekinge Institute of Technology, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-00470.

Abstract

The evolution of advanced radio transmission technologies for third and future generation mobile radio systems has paved the way for the delivery of mobile multimedia services. This is further enabled through contemporary video coding standards, such as H.264/AVC, allowing wireless image and video applications to become a reality on modern mobile devices. The extensive amount of data needed to represent the visual content and the scarce channel bandwidth constitute great challenges for network operators to deliver an intended quality of service. Appropriate metrics are thus instrumental for service providers to monitor the quality as experienced by the end user. This thesis focuses on subjective and objective assessment methods of perceived visual quality in image and video communication. The content of the thesis can be broadly divided into four parts. Firstly, the focus is on the development of image quality metrics that predict perceived quality degradations due to transmission errors. The metrics follow the reduced-reference approach, thus allowing quality loss to be measured during image communication with only a small overhead of side information. The metrics are designed and validated using subjective quality ratings from two experiments. The distortion assessment performance is further demonstrated through an application for filter design. The second part of the thesis then investigates various methodologies to further improve the quality prediction performance of the metrics. In this respect, several properties of the human visual system are investigated and incorporated into the metric design. It is shown that the quality prediction performance can be considerably improved using these methodologies. The third part is devoted to analysing the impact of the complex distortion patterns on the overall perceived quality, following two goals. Firstly, the confidence of human observers is analysed to identify the difficulties during assessment of the distorted images, showing that the level of confidence is indeed highly dependent on the level of visual quality. Secondly, the impact of content saliency on the perceived quality is identified using region-of-interest selections and eye tracking data from two independent subjective experiments. It is revealed that the saliency of the distortion region indeed has an impact on the overall quality perception and also on the viewing behaviour of human observers when rating image quality. Finally, the quality perception of H.264/AVC coded video containing packet loss is analysed based on the results of a combined subjective video quality and eye tracking experiment. It is shown that the distortion location in relation to the content saliency has a tremendous impact on the overall perceived quality. Based on these findings, a framework for saliency aware video quality assessment is proposed that strongly improves the quality prediction performance of existing video quality metrics.

22

Ho, Elaine Mandy. "Effects of cultural and linguistic backgrounds on perceptual voice quality rating". Click to view the E-thesis via HKU Scholars Hub, 2005. http://lookup.lib.hku.hk/lookup/bib/B38279204.

Abstract

Thesis (B.Sc)--University of Hong Kong, 2005.
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, June 30, 2005." Also available in print.

23

Hameed, Abdul. "Perceptual Video Quality Model and its Application in Wireless Multimedia Communications". Diss., North Dakota State University, 2015. http://hdl.handle.net/10365/24867.

Abstract

With the exponential growth of video traffic over wireless networked and embedded devices such as mobile phones and sensors, mechanisms are needed to predict the perceptual quality of video in real time and with low complexity, based on which networking protocols can control video quality and optimize network resources to meet the quality of experience requirements of users. This thesis is composed of three related pieces of work. In the first piece of work, an efficient and light-weight video quality prediction model based on partial parsing of the H.264/AVC compressed bitstream is proposed. A set of features was introduced to reflect video content characteristics and distortions caused by compression and transmission; the features were obtained directly in parsing mode without decoding the pixel information in macro-blocks. Based on the features, an artificial neural network model was trained for perceptual quality prediction. In the second piece of work, a perceptual video quality prediction model is trained based on massive subjective test results. Prediction of perceptual quality is achieved through a decision tree using a set of easily calculated features from the compressed bitstream and the network. Moreover, based on the prediction model, a novel Forward Error Correction (FEC) scheme is introduced to protect video packets by taking into consideration video content characteristics, compression parameters, as well as network condition. Given a perceptual quality requirement, the error control scheme adjusts the level of protection for different components in a video stream such that the network overhead needed for transmission is minimized. In the third piece of work, a study was conducted to examine whether the previous prediction model could provide a good confidence measure in a different domain of judgments. The accuracy of judgements demonstrated the predictive validity of the confidence measure with respect to packet loss ratio traits. The results of this study were consistent with the previous one, and the experiments suggested that brief and evaluative thin slice judgments are made relatively intuitively. The present research represents a new entry into the domain of high level judgments, such as video confidence measures, through the use of our existing perceptual quality model.

24

IRWIN, LINDSAY K. "PERCEPTUAL EVALUATION OF VOICE QUALITY OF INDIVIDUALS WITH DYSPHAGIA AND DYSPHONIA". University of Cincinnati / OhioLINK, 2006. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1148416348.

25

Stokes, Tobias W. "Improving the perceptual quality of single-channel blind audio source separation". Thesis, University of Surrey, 2015. http://epubs.surrey.ac.uk/807786/.

Abstract

Given a mixture of audio sources, a blind audio source separation (BASS) tool is required to extract audio relating to one specific source whilst attenuating that related to all others. This thesis answers the question “How can the perceptual quality of BASS be improved for broadcasting applications?” The most common source separation scenario, particularly in the field of broadcasting, is single channel, and this is particularly challenging as only a limited set of cues is available. Broadcasting also requires that a source separator is automated, capable of handling non-stationary, reverberant mixtures and able to separate an unknown number of sources. In the single-channel case, the time-frequency mask is common as a method of separation. However, this process produces artefacts in the separated audio. The perceptual evaluation for audio source separation (PEASS) toolkit represents an efficient way to generate a multi-dimensional measure of perceptual quality. Initial experimental work, using ideal target and interferer estimates, uses PEASS to test variations on the ideal binary mask and shows continuous masks are perceptually better than binary while identifying a trade-off between artefacts and interferer suppression. To explore the optimisation of this trade-off, a series of sigmoidal functions are used to map target-to-mixture ratios to mask coefficients. This leads to a mask, with less target-to-mixture based discrimination than those typically found in literature, being identified as the optimum. Further experiments applying offsets, hysteresis, smoothing and frequency-dependency to the mask do not show any benefit in audio quality. The optimal sigmoidal mask is demonstrated to also be superior under non-ideal conditions using a non-negative matrix factorisation algorithm to produce the estimates. A final listening test compares the outputs of binary, ratio and optimal sigmoidal masks, concluding that listeners prefer the ratio mask to the sigmoidal mask and both continuous masks to the binary mask.
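The three mask families compared in this abstract (binary, ratio and sigmoidal) can be contrasted with a small Python sketch. It assumes that the target-to-mixture ratio of a time-frequency bin is expressed as a value between 0 and 1, and it uses a plain logistic function with a hypothetical slope and midpoint; the actual parametrization and the optimum reported in the thesis are not given here.

```python
import math


def _check_ratio(tmr: float) -> None:
    if not 0.0 <= tmr <= 1.0:
        raise ValueError("target-to-mixture ratio must lie in [0, 1]")


def binary_mask(tmr: float, threshold: float = 0.5) -> float:
    """Hard decision: keep the bin only if the target dominates."""
    _check_ratio(tmr)
    return 1.0 if tmr > threshold else 0.0


def ratio_mask(tmr: float) -> float:
    """Continuous mask equal to the target-to-mixture ratio itself."""
    _check_ratio(tmr)
    return tmr


def sigmoid_mask(tmr: float, slope: float = 10.0, midpoint: float = 0.5) -> float:
    """Logistic mapping from ratio to mask coefficient.

    slope and midpoint are illustrative assumptions: a large slope
    approaches the binary mask, a small slope discriminates less.
    """
    _check_ratio(tmr)
    return 1.0 / (1.0 + math.exp(-slope * (tmr - midpoint)))


def apply_mask(magnitudes, ratios, mask_fn):
    """Scale each mixture magnitude by the mask coefficient for its bin."""
    return [m * mask_fn(r) for m, r in zip(magnitudes, ratios)]
```

At a ratio of exactly 0.5 the sigmoidal mask returns 0.5, and lowering the slope flattens the curve, which corresponds to the "less discrimination" direction that the abstract reports as perceptually preferable.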

26

Mitchell, Helen Frances. "Defining vocal quality in female classical singers: pedagogical, acoustical and perceptual studies". University of Sydney. Australian Centre for Applied Research in Music Performance, 2005. http://hdl.handle.net/2123/710.

Abstract

The technique of “open throat” is a pedagogical concept transmitted through the oral tradition of singing. This thesis explored the pedagogical perceptions and practices of “open throat” using empirical methodologies to assess technical skill and associated vocal quality. In the first study (Mitchell, Kenny, Ryan, & Davis, 2003), we assessed the degree of consensus amongst singing pedagogues regarding the definition of, and use in the singing studio of, the technique called “open throat.” Results indicated that all fifteen pedagogues described “open throat” technique as fundamental to singing training and were positive about the sound quality it achieved, especially in classical singing. It was described as a way of maximising pharyngeal space or abducting the false vocal folds. Hypotheses generated from pedagogical beliefs expressed in this first study were then tested acoustically (Mitchell & Kenny, 2004a, 2004b). Six advanced singing students sang in two conditions: “optimal” (O), using maximal open throat, “sub-optimal” (SO), using reduced open throat and loud sub-optimal (LSO) to control for the effect of loudness. From these recordings, acoustic characteristics of vibrato (Mitchell & Kenny, 2004b) and energy distribution (Mitchell & Kenny, 2004a) were examined. Subsequent investigations of the vibrato parameters of rate, extent and onset, revealed that extent was significantly reduced and onset increased when singers did not use the technique. As inconsistent vibrato is considered indicative of poor singing, it was hypothesized that testing the energy distribution in these singers’ voices in each condition would identify the timbral changes associated with open throat. Visual inspection of long term average spectra (LTAS) confirmed differences between O and SO, but conventional measures applied to the LTAS, comparing energy peak height [singing power ratio (SPR)] and peak area [energy ratio (ER)], were not sensitive to the changes identified through visual inspection of the LTAS. These results were not consistent with the vibrato findings and suggest that conventional measures of SPR and ER are not sufficiently sensitive to evaluate LTAS. In the fourth study, fifteen expert listeners consistently and reliably identified the presence of open throat technique with 87% accuracy (Mitchell & Kenny, in press). In the fifth study, LTAS measurements were examined with respect to the perceptual ratings of singers. There was no relationship between perceptual rankings of vocal beauty and acoustic rankings of vocal quality (Kenny & Mitchell, 2004, in press). There is a vast literature of spectral energy definitions of good voice but the studies in this thesis have indicated that current acoustic methods are limited in defining vocal quality. They also suggest that current work in singing has not sufficiently incorporated perceptual ratings and descriptions of sound quality or the relationship between acoustic and perceptual factors with pedagogical practices.

27

Zeng, Yongqin. "Image segmentation algorithms incorporating perceptual quality factors for region-based image compression". Thesis, Imperial College London, 1999. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.313514.

28

Oh, Joonmi. "Human visual system informed perceptual quality assessment models for compressed medical images". Thesis, University of Birmingham, 2000. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.368425.

Abstract

Hospital and clinical environments are rapidly moving toward the digital capture, processing, storage, and transmission of medical images. X-ray cardio-angiograms are used to observe coronary blood flow, diagnose arterial disease and perform coronary angioplasty or bypass surgery. The digital storage and transmission of these cardiovascular images has significant potential to improve patient care. For example, digital images enable electronic archiving, network transmission and useful manipulation of diagnostic information such as image enhancement. The efficient compression of medical images is tremendously important for economical storage and fast transmission, since digitised medical images must be of high quality, which requires high resolution and generally results in large data volumes. The use of lossily compressed images has created a need for objective quality assessment metrics that measure viewers' perceived subjective opinions, in order to find the optimal compression rate/distortion trade-off. Quality assessment metrics based on models of the human visual system have predicted perceived quality more accurately than traditional error-based objective quality metrics. This thesis presents a proposed Multi-stage Perceptual Quality Assessment (MPQA) model for compressed images. The motivation for the development of a perceptual quality assessment is to measure (in)visible physical differences between original and processed images. MPQA produces visible distortion maps and quantitative error measures informed by considerations of the human visual system. Original and decompressed images are decomposed into different spatial frequency bands and orientations modelling the human cortex. Contrast errors are calculated for each frequency and orientation, and masked as a function of contrast sensitivity and background uncertainty. Spatially masked contrast error measurements are made across frequency bands and orientations to produce a single Perceptual Distortion Visibility Map (PDVM). A Perceptual Quality Rating (PQR) is calculated from the PDVM and transformed into a one to five scale for direct comparison with the Mean Opinion Score (MOS), generally used in subjective rating. For medical applications, acceptable decompressed medical images might be those which are perceptually pleasing, contain no visible artefacts and have no loss in diagnostic content. To investigate this problem, clinical tests identifying diagnostically acceptable image reconstructions are performed; they demonstrate that the proposed perceptual quality rating method has better agreement with observers' responses than objective error measurement methods. The vision models presented in the thesis are also implemented in the thresholding and quantisation stages of a compression algorithm. An HVS-informed perceptual thresholding and quantisation method is also shown to produce improved compression ratio performance with less visible distortions.

29

Chintala, Bala Venkata Sai Sundeep. "Objective Perceptual Quality Assessment of JPEG2000 Image Coding Format Over Wireless Channel". Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-17785.

Abstract

Compressed images constitute a dominant source of Internet traffic today. In modern multimedia communications, image compression plays an important role. The image compression standards set by the Joint Photographic Experts Group (JPEG) include JPEG and JPEG2000. The expert group came up with the JPEG image compression standard so that still pictures could be compressed to be sent over e-mail, be displayed on a webpage, and make high-resolution digital photography possible. This standard was originally based on a mathematical method, used to convert a sequence of data to the frequency domain, called the Discrete Cosine Transform (DCT). In the year 2000, however, a new standard was proposed by the expert group which came to be known as JPEG2000. The difference between the two is that the latter is capable of providing better compression efficiency. The new format also has a downside: the computation required for achieving the same sort of compression efficiency as one would get with the original JPEG format is higher. JPEG is a lossy compression standard which can throw away some less important information without causing any noticeable perceptual differences. In lossless compression, by contrast, the primary purpose is to reduce the number of bits required to represent the original image samples without any loss of information. The areas of application of the JPEG image compression standard include the Internet, digital cameras, printing, and scanning peripherals. In this thesis work, a simulator-like setup is needed for conducting the objective quality assessment. An image is given as an input to our wireless communication system and its data size is varied (e.g. 5%, 10%, 15%, etc.) and a Signal-to-Noise Ratio (SNR) value is given as input, for JPEG2000 compression. Then, this compressed image is passed through a JPEG encoder and then transmitted over a Rayleigh fading channel. The corresponding image obtained after having applied these constraints on the original image is then decoded at the receiver and the inverse discrete wavelet transform (IDWT) is applied to invert the JPEG2000 compression. Quantization is done for the coefficients, which are scalar-quantized to reduce the number of bits to represent them, without the loss of quality of the image. Then the final image is displayed on the screen. The original input image is co-passed with the images of varying data size for an SNR value at the receiver after decoding. In particular, objective perceptual quality assessment through the Structural Similarity (SSIM) index using MATLAB is provided.
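The SSIM index used for the objective assessment can be illustrated with a simplified Python sketch. It evaluates the SSIM formula once over the whole image using global means, variances and covariance, with the commonly used constants K1 = 0.01 and K2 = 0.03. This is a deliberate simplification: the usual SSIM index averages the same formula over local windows, and the thesis itself reports using MATLAB, so the function below is an illustrative assumption rather than the author's implementation.

```python
def global_ssim(x, y, dynamic_range=255.0):
    """Single-window SSIM between two equally sized images.

    x and y are flat sequences of pixel intensities. Population
    statistics (divide by n) are used over the whole image.
    """
    if len(x) != len(y) or len(x) == 0:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) * (a - mu_x) for a in x) / n
    var_y = sum((b - mu_y) * (b - mu_y) for b in y) / n
    cov_xy = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Stabilizing constants that avoid division by values near zero.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Comparing an image with itself yields a value of 1 (up to floating-point rounding), and any distortion that changes the mean, contrast or structure lowers the score. In a setup like the one described, the decoded image at each data size and SNR would be compared against the original input image.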

30

Fung, Yam-cheung Kelvin. "The use of internal versus external standards in perceptual evaluation of voice quality". Click to view the E-thesis via HKUTO, 1994. http://sunzi.lib.hku.hk/hkuto/record/B36208899.

Abstract

Thesis (B.Sc)--University of Hong Kong, 1994.
"A dissertation submitted in partial fulfilment of the requirements for the Bachelor of Science (Speech and Hearing Sciences), The University of Hong Kong, April 29, 1994." Also available in print.

31

Kramer, Elena [Verfasser]. "Predicting perceptual voice quality from objective voice parameters in dysphonic patients / Elena Kramer". Lübeck : Zentrale Hochschulbibliothek Lübeck, 2013. http://d-nb.info/1029994641/34.

32

Bruijn, Christina Geertruida de. "Voice quality after dictation to speech recognition software : a perceptual and acoustic study". Thesis, University of Sheffield, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.440907.

33

Masaki, Asako. "Optimizing acoustic and perceptual assessment of voice quality in children with vocal nodules". Thesis, Massachusetts Institute of Technology, 2009. http://hdl.handle.net/1721.1/54666.

Abstract

Thesis (Ph. D.)--Harvard-MIT Division of Health Sciences and Technology, 2009.
Cataloged from PDF version of thesis.
Includes bibliographical references (p. 105-109).
Few empirically-derived guidelines exist for optimizing the assessment of vocal function in children with voice disorders. The goal of this investigation was to identify a minimal set of speech tasks and associated acoustic analysis methods that are most salient in characterizing the impact of vocal nodules on vocal function in children. Hence, a pediatric assessment protocol was developed based on the standardized Consensus Auditory Perceptual Evaluation of Voice (CAPE-V) used to evaluate adult voices. Adult and pediatric versions of the CAPE-V protocols were used to gather recordings of vowels and sentences from adult females and children (4-6 and 8-10 year olds) with normal voices and vocal nodules, and these recordings were subjected to perceptual and acoustic analyses. Results showed that perceptual ratings for breathiness best characterized the presence of nodules in children's voices, and ratings for the production of sentences best differentiated normal voices and voices with nodules for both children and adults. Selected voice quality-related acoustic algorithms designed to quantitatively evaluate acoustic measures of vowels and sentences were modified to be pitch-independent for use in analyzing children's voices. Synthesized vowels for children and adults were used to validate the modified algorithms by systematically assessing the effects of manipulating the periodicity and spectral characteristics of the synthesizer's voicing source. In applying the validated algorithms to the recordings of subjects with normal voices and vocal nodules, the acoustic measures tended to differentiate normal voices and voices with nodules in children and adults, and some displayed significant correlations with the perceptual attributes of overall severity of dysphonia, roughness, and/or breathiness. None of the acoustic measures correlated significantly with the perceptual attribute of strain. Limitations in the strength of the correlations between acoustic measures and perceptual attributes were attributed to factors that can be addressed in future investigations, which can now utilize the algorithms that were developed in this investigation for children's voices. Preliminary recommendations are made for the clinical assessment of pediatric voice disorders.
by Asako Masaki.
Ph.D.

34

MONTEIRO, Estêvão Chaves. "Shifted Gradient Similarity: A perceptual video quality assessment index for adaptive streaming encoding". Universidade Federal de Pernambuco, 2016. https://repositorio.ufpe.br/handle/123456789/17359.

Abstract

Adaptive video streaming has become prominent due to the rising diversity of Web-enabled personal devices and the popularity of social networks. Common limitations in Internet bandwidth, decoding speed and battery power available in such devices challenge the efficiency of content encoders to preserve visual quality at reduced data rates over a wide range of display resolutions, typically compressing to lower than 1% of the massive raw data rate. Furthermore, the human visual system does not uniformly perceive losses of spatial and temporal information, so a simple physical objective model such as the mean squared error does not correlate well with perceptual quality. Objective assessment and prediction of perceptual quality of visual content has greatly improved in the past decade, but remains an open problem. Among the most relevant psychovisual quality metrics are the many versions of the Structural Similarity (SSIM) index. In this work, several of the most efficient SSIM-based metrics, such as the Multi-Scale Fast SSIM and the Gradient Magnitude Similarity Deviation (GMSD), are decomposed into their component techniques and reassembled in order to measure and understand the contribution of each technique and to develop improvements in quality and efficiency. The metrics are applied to the LIVE Mobile Video Quality and TID2008 databases and the results are correlated to the subjective data included in the databases in the form of mean opinion scores (MOS), so each metric’s degree of correlation indicates its ability to predict perceptual quality. Additionally, the metrics’ applicability to the recent, relevant psychovisual rate-distortion optimization (Psy-RDO) implementation in the x264 encoder, which currently lacks an ideal objective assessment metric, is investigated as well. The “Shifted Gradient Similarity” (SG-Sim) index is proposed with an improved feature enhancement that avoids a common unintended loss of analysis information in SSIM-based indexes and achieves considerably higher MOS correlation than the existing metrics investigated in this work. More efficient spatial pooling filters are proposed as well: the decomposed 1-D integer Gaussian filter limited to two standard deviations, and the downsampling Box filter based on the integral image, which retain respectively 99% and 98% equivalence and achieve speed gains of respectively 68% and 382%. In addition, the downsampling filter also enables broader scalability, particularly for Ultra High Definition content, and defines the “Fast SG-Sim” index version. Furthermore, SG-Sim is found to improve correlation with Psy-RDO, as an ideal encoding quality metric for x264. Finally, the algorithms and experiments used in this work are implemented in the “Video Quality Assessment in Java” (jVQA) software, based on the AviSynth and FFmpeg platforms, and designed for customization and extensibility, supporting 4K Ultra-HD content and available as free, open source code.


35

Williamson, Donald S. "DEEP LEARNING METHODS FOR IMPROVING THE PERCEPTUAL QUALITY OF NOISY AND REVERBERANT SPEECH". The Ohio State University, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=osu1461018277.


36

R, V. Krishnam Raju Kunadha Raju. "Perceptual Image Quality Prediction Using Region of Interest Based Reduced Reference Metrics Over Wireless Channel". Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13631.

Abstract

As wireless communications grow rapidly, the demand for multimedia services is also increasing. Transmitted data suffers from distortions through source encoding and transmission over error-prone channels, and these errors degrade the quality of the content. Service providers need to provide a certain Quality of Experience (QoE) to the end user, and network providers are developing several methods for better QoE. Human viewers focus mainly on the Region of Interest (ROI), where distortions are perceived to be more annoying than in the Background (BG). On this basis, the main aim of this thesis is to obtain an accurate quality prediction metric that measures the quality of the image over the ROI and the BG independently. Reduced Reference Image Quality Assessment (RRIQA), a metric for which only partial information about the reference image is available to assess the quality, is chosen for this purpose. The quality metric is measured independently over the ROI and the BG. Finally, the metrics estimated over the ROI and the BG are pooled together to get an ROI-aware metric that predicts the Mean Opinion Score (MOS) of the image. In this thesis, an ROI-aware quality metric is used to measure the quality of distorted images that are generated using a wireless channel. The MOS of the distorted images is obtained and validated against the MOS obtained from a database [1]. It is observed that the proposed image quality assessment method provides better results than the traditional approach and performs better over a wide variety of distortions. The obtained results show that impairments in the ROI are perceived to be more annoying than those in the BG.
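
The pooling step described in the abstract can be sketched as follows. The abstract does not say how the ROI and BG scores are combined, so the weighted linear combination, the default weight of 0.7 and the function name are all assumptions chosen only to illustrate the idea of weighting the ROI more heavily.

```python
def roi_aware_score(q_roi, q_bg, w_roi=0.7):
    """Pool two quality scores into one ROI-aware score.

    q_roi and q_bg are quality values measured separately over the Region of
    Interest and the Background. w_roi is a hypothetical weight in [0, 1];
    a value above 0.5 makes ROI impairments count more than BG impairments.
    """
    if not 0.0 <= w_roi <= 1.0:
        raise ValueError("w_roi must lie in [0, 1]")
    return w_roi * q_roi + (1.0 - w_roi) * q_bg
```

With `w_roi=0.7`, an image whose ROI is degraded (for example 0.4 in the ROI and 0.9 in the BG) receives a lower pooled score than one with the same two values swapped, which matches the observation that ROI impairments are more annoying.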


37

Brangers, Kirstin M. "Perceptual Ruler for Quantifying Speech Intelligibility in Cocktail Party Scenarios". UKnowledge, 2013. http://uknowledge.uky.edu/ece_etds/31.

Abstract

Systems designed to enhance intelligibility of speech in noise are difficult to evaluate quantitatively because intelligibility is subjective and often requires feedback from large populations for consistent evaluations. Attempts to quantify the evaluation have included related measures such as the Speech Intelligibility Index (SII). These require separating speech and noise signals, which precludes their use on experimental recordings. This thesis develops a procedure using an Intelligibility Ruler (IR) for efficiently quantifying intelligibility. A calibrated Mean Opinion Score (MOS) method is also implemented in order to compare repeatability over a population of 24 listeners. Results showed that subjects using the IR consistently estimated SII values of the test samples with an average standard deviation of 0.0867 between subjects on a scale from zero to one and R² = 0.9421. After a calibration procedure from a subset of subjects, the MOS method yielded similar results with an average standard deviation of 0.07620 and R² = 0.9181. While results suggest good repeatability of the IR method over a broad range of subjects, the calibrated MOS method is capable of producing results more closely related to actual SII values and is a simpler procedure for human subjects.
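
The two summary statistics quoted in the abstract can be computed along these lines. The abstract does not define them precisely, so this sketch assumes that R² is the squared Pearson correlation between estimated and reference values and that the between-subject spread is the sample standard deviation across subjects, averaged over test samples; the function names are illustrative.

```python
from statistics import mean, stdev


def r_squared(estimates, references):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(estimates), mean(references)
    sxy = sum((a - mx) * (b - my) for a, b in zip(estimates, references))
    sxx = sum((a - mx) ** 2 for a in estimates)
    syy = sum((b - my) ** 2 for b in references)
    return (sxy * sxy) / (sxx * syy)


def mean_between_subject_std(ratings):
    """Average, over samples, of the standard deviation across subjects.

    ratings[s][i] is the value subject s assigned to test sample i.
    """
    per_sample = zip(*ratings)  # regroup the ratings sample by sample
    return mean(stdev(column) for column in per_sample)
```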


38

Chan, Man-kei Karen. "Auditory perceptual learning of breathy voice quality in naive listeners based on an exemplar and prototype approach /". View the Table of Contents & Abstract, 2005. http://sunzi.lib.hku.hk/hkuto/record/B30397170.


39

洪觀宇 and Roy Hung. "Time domain analysis and synthesis of cello tones based on perceptual quality and playing gestures". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 1998. http://hub.hku.hk/bib/B31215348.


40

Hung, Roy. "Time domain analysis and synthesis of cello tones based on perceptual quality and playing gestures /". Hong Kong : University of Hong Kong, 1998. http://sunzi.lib.hku.hk/hkuto/record.jsp?B20665672.


41

Chan, Man-kei Karen and 陳文琪. "Auditory perceptual learning of breathy voice quality in naive listeners based on an exemplar and prototype approach". Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2005. http://hub.hku.hk/bib/B4501470X.


42

Meitner, Michael John. "Evaluating Web-based perceptual survey methods for assessing quality of experience on Grand Canyon river trips". Diss., The University of Arizona, 1999. http://hdl.handle.net/10150/284902.

Abstract

Forty-seven sites along the Colorado River in the Grand Canyon National Park were presented to observers at the University of Arizona using one of four different presentation methods. The representational validity of the presentation methods for quantifying the scenic beauty of locations was assessed by comparing the presentation conditions. Results indicate that in heterogeneous landscapes, such as the Grand Canyon, independent ratings of individual photographs from a common location cannot simply be averaged to find the overall rating of the location as a whole. In addition, when assessing the scenic beauty of locations that are constrained by a linear feature (the Colorado River), the order of presentation is an important variable to consider in conjunction with the mode of presentation.


43

Marins, Paulo. "Beyond 'basic audio quality' : characterizing the perceptual effects introduced by low bit rate spatial audio codecs". Thesis, University of Surrey, 2009. http://epubs.surrey.ac.uk/844634/.

Abstract

The main aim of this thesis was to characterize the perceptual effects introduced by low bit rate spatial audio codecs. The existing methodologies used to evaluate spatial audio codecs were reviewed and the most important studies conducted to assess the perceived quality of spatial audio coding systems were compared. It was found that spatial audio codecs have been evaluated according to ITU-R standards BS.1116 and BS.1534. These tests evaluate the performance of audio codecs using one perceptual attribute: basic audio quality (BAQ). This approach, although effective for assessing the overall performance of codecs, does not quantify the contribution of typical codec distortions to the perceived BAQ of the codecs or allow for the identification of independent perceptual attributes that describe the artefacts introduced by spatial audio coding systems. A series of experiments was carried out to characterize the perceptual effects introduced by low bit rate spatial audio codecs. Two initial studies were conducted to investigate the contribution of selected attributes to the BAQ of low bit rate spatial codecs. Another two experiments were performed to identify the perceptually salient dimensions or the independent perceptual attributes related to the artefacts introduced by low bit rate spatial audio coding systems. It was found that impairments related to timbral features of the sound most affect the perceived basic audio quality of the codecs. Additionally, two perceptually salient dimensions were identified and labelled as timbral and spatial. Moreover, four independent perceptual attributes (coding and high frequency noise, spatial image clarity, scene width and tone colour) were uncovered, providing a description of the perceptual effects introduced by low bit rate spatial audio codecs.


44

GOGINENI, SRI LOHITH. "DEVELOPMENT OF AN ROI AWARE FULL-REFERENCE OBJECTIVE PERCEPTUAL QUALITY METRIC ON IMAGES OVER FADING CHANNEL". Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2016. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-13610.

Abstract

In spite of technological advances in wireless systems, transmitted data suffers from impairments through both lossy source coding and transmission over error-prone channels. Due to these errors, the quality of multimedia content is degraded. The major challenge for service providers in this scenario is to measure the perceptual impact of distortions in order to provide a certain Quality of Experience (QoE) to the end user. The general tendency of the Human Visual System (HVS) suggests that artifacts in the Region-of-Interest (ROI) are perceived to be more annoying than artifacts in the Background (BG). With this assumption, the thesis aims to measure the quality of the image over the ROI and the BG independently. Visual Information Fidelity (VIF), a full-reference image quality assessment metric, is chosen for this purpose. The metrics measured over the ROI and the BG are then pooled to get an ROI-aware metric, which is used to predict the Mean Opinion Score (MOS) of an image. In this study, the ROI-aware quality metric is used to measure the quality of a set of distorted images generated using a wireless channel, and the MOS of the distorted images is estimated. Lastly, the predicted MOS is validated against the MOS obtained from subjective tests. Testing the proposed image quality assessment approach shows improved prediction performance of the ROI-aware quality metric over traditional image quality metrics. It is also observed that the approach provides a consistent improvement over a wide variety of distortions. The obtained results suggest that impairments in the ROI are perceived to be more annoying than those in the BG.


45

Lundeborg, Hammarström Inger, Elisabeth Hultcrantz, Elisabeth Ericsson and Anita McAllister. "Acoustic and perceptual aspects of vocal function in children with adenotonsillar hypertrophy —effects of surgery". Linköpings universitet, Institutionen för klinisk och experimentell medicin, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-61240.

Abstract

Objective: To evaluate the outcome of two types of tonsil surgery (tonsillectomy + adenoidectomy or tonsillotomy + adenoidectomy) on vocal function, perceptually and acoustically. Study Design: Sixty-seven children, aged 50-65 months, on the waiting list for tonsil surgery were randomized to tonsillectomy (n=33) or tonsillotomy (n=34). Fifty-seven age- and gender-matched healthy pre-school children were controls. Twenty-eight of them, aged 48-59 months, served as the control group before surgery, and 29, aged 60-71 months, after surgery. Methods: Before surgery and six months postoperatively, the children were recorded producing three sustained vowels (/A, u, i/) and 14 words. The control groups were recorded only once. Three trained speech and language pathologists performed the perceptual analysis using Visual Analogue Scales (VAS) for eight voice quality parameters. Acoustic analysis of the sustained vowels included average fundamental frequency, jitter percent, shimmer percent, noise-to-harmonic ratio and the centre frequencies of formants 1-3. Results: Before surgery the children were rated to have more hyponasality and compressed/throaty voice (p<0.05) and lower mean pitch (p<0.01) in comparison to the control group. They also had higher perturbation measures and lower frequencies of the second and third formants. After surgery there were no perceptual differences. Perturbation measures decreased but were still higher than the control group’s (p<0.05). Differences in formant frequencies for /i/ and /u/ remained. No differences were found between the two surgical methods. Conclusion: Voice quality is affected perceptually and acoustically by adenotonsillar hypertrophy. After surgery the voice is perceptually normalized but acoustic differences remain. Outcome was equal for both surgical methods.


46

Rai, kurlethimar Yashas. "Visual attention for quality prediction at fine spatio-temporal scales : from perceptual weighting towards visual disruption modeling". Thesis, Nantes, 2017. http://www.theses.fr/2017NANT4027/document.

Abstract

This thesis revisits the relationship between visual attentional processes and the perception of quality. We mainly focus on the perception of degradations in video sequences and their overall impact on perceived quality. Rather than a global approach, we work at a very localized spatio-temporal scale, better adapted to the decision process in video encoders. Two approaches linking visual attention and perceived quality are explored in the thesis. The first follows a classical approach of the distortion-weighting type. This is very useful in certain scenarios such as interactive streaming or visualization of omnidirectional content. The second approach leads us to introduce the concept of visual disruption (DV) and to explore its relation to perceived quality. We first propose techniques for studying the saccades related to DV from experimental eye-tracking data. Then, a computational model for the prediction of DV is proposed. A new objective quality measure, which we call the "Disruption Metric", is thus introduced; it allows the evaluation of the local quality of videos. The results obtained find applications in many fields, such as quality evaluation, compression, perceptually optimized transmission of visual content, and foveated rendering/transmission.


47

Zhang, Di. "INFORMATION THEORETIC CRITERIA FOR IMAGE QUALITY ASSESSMENT BASED ON NATURAL SCENE STATISTICS". Thesis, University of Waterloo, 2006. http://hdl.handle.net/10012/2842.

Abstract

Measurement of visual quality is crucial for various image and video processing applications.

The goal of objective image quality assessment is to introduce a computational quality metric that can predict image or video quality. Many methods have been proposed in the past decades. Traditionally, measurements convert the spatial data into some other feature domain, such as the Fourier domain, and compute the similarity, such as the mean square distance or Minkowski distance, between the test data and the reference or perfect data. However, only limited success has been achieved. None of the complicated metrics shows any great advantage over other existing metrics.

The common idea shared among many proposed objective quality metrics is that human visual error sensitivities vary across spatial and temporal frequency and directional channels. In this thesis, image quality assessment is approached by proposing a novel framework that computes the information lost in each channel, rather than the similarities used in previous methods. Based on natural scene statistics and several image models, an information theoretic framework is designed to compute the perceptual information contained in images and evaluate image quality in the form of entropy.

The thesis is organized as follows. Chapter I gives a general introduction to previous work in this research area and a brief description of the human visual system. In Chapter II, statistical models for natural scenes are reviewed. Chapter III proposes the core ideas about the computation of the perceptual information contained in the images. In Chapter IV, information theoretic criteria for image quality assessment are defined. Chapter V presents the simulation results in detail. In the last chapter, future directions and improvements of this research are discussed.
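
For context, the traditional distance measures that the abstract contrasts with its information theoretic approach can be sketched in a few lines. This is only an illustration of the baseline measures named in the abstract (mean square distance and Minkowski distance), not of the thesis's own framework; the flat-list input format and function names are assumptions.

```python
def minkowski_distance(test, reference, p=2.0):
    """Minkowski distance of order p between two equal-length feature lists."""
    if len(test) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum(abs(t - r) ** p for t, r in zip(test, reference)) ** (1.0 / p)


def mean_squared_error(test, reference):
    """Mean of the squared differences between test and reference values."""
    if len(test) != len(reference):
        raise ValueError("inputs must have the same length")
    return sum((t - r) ** 2 for t, r in zip(test, reference)) / len(test)
```

Setting `p=1` gives the sum of absolute differences and `p=2` the Euclidean distance; in both cases every coefficient is weighted equally, which is one reason such measures track perceived quality poorly.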


48

Huo, Donglai. "Quantitative Image Quality Evaluation of Fast Magnetic Resonance Imaging". Case Western Reserve University School of Graduate Studies / OhioLINK, 2007. http://rave.ohiolink.edu/etdc/view?acc_num=case1155913518.


49

Meneghel, Giovani Balen. "Proposta de metodologia para avaliação de métodos de iluminação global em síntese de imagens". Universidade de São Paulo, 2015. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-08072016-153418/.

Abstract

The task of generating high-quality computer images in the shortest time possible, believable to the target audience and using all available computational resources, is still a challenging procedure composed of a chain of specific processes. This work presents a study of this chain, focusing on the evaluation of Global Illumination methods used in the synthesis of photorealistic images in the areas of Animation and Visual Effects. To help users produce high-quality photorealistic images, two experiments were proposed containing several test scenes and six state-of-the-art Global Illumination methods: Path Tracing, Light Tracing, Bidirectional Path Tracing, Metropolis Light Transport, Progressive Photon Mapping and Vertex Connection and Merging. The open-source renderer Mitsuba was used to execute the tests. The quality of the produced images was analyzed using two perceptual metrics: the Structural Similarity Index (SSIM) and the Visual Difference Predictor HDR-VDP-2. From the analysis of the results, a Recommendation Guide was created that suggests, based on the characteristics of an arbitrary scene, the most suitable Global Illumination method for synthesizing images of that scene. Finally, directions for future research are presented, proposing the use of classifiers, parameter reduction methods and Artificial Intelligence to build an automatic procedure for generating high-quality photorealistic images.
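
As a rough illustration of one of the two metrics named above, the SSIM formula can be evaluated over a whole image as a single window. This is a simplification: standard SSIM implementations compute the index over local sliding windows and average the results, and the abstract does not say which implementation the thesis used. The flat-list input and the function name are assumptions; the constants 0.01 and 0.03 are the commonly used defaults.

```python
def ssim_global(x, y, dynamic_range=255.0):
    """SSIM formula applied once to two equal-length lists of pixel values."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * dynamic_range) ** 2  # stabilizes the luminance term
    c2 = (0.03 * dynamic_range) ** 2  # stabilizes the contrast/structure term
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator
```

Comparing a rendered image against a converged reference render with such a function gives a score of 1 for identical images and lower values as structure, contrast or luminance diverge.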


50

Parikh, Devangi Nikunj. "Improving the quality of speech in noisy environments". Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45889.

Abstract

In this thesis, we are interested in processing noisy speech signals that are meant to be heard by humans, and hence we approach the noise-suppression problem from a perceptual perspective. We develop a noise-suppression paradigm that is based on a model of the human auditory system, where we process signals in a way that is natural to the human ear. Under this paradigm, we transform an audio signal into a perceptual domain and process the signal in this perceptual domain. This approach allows us to reduce the background noise and the audible artifacts that are seen in traditional noise-suppression algorithms, while preserving the quality of the processed speech. We develop single- and dual-microphone algorithms based on this perceptual paradigm, and conduct subjective tests to show that this approach outperforms traditional noise-suppression techniques. Moreover, we investigate the cause of audible artifacts that are generated as a result of suppressing the noise in noisy signals, and introduce constraints on the noise-suppression gain such that these artifacts are reduced.
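
The abstract does not specify which constraints are placed on the noise-suppression gain. One widely used constraint in conventional suppressors is a lower bound (gain floor), sketched below with a generic Wiener-style gain rule. This is a stand-in for illustration and not the thesis's perceptual-domain method; the per-band power inputs, the floor value of 0.1 and the function name are assumptions.

```python
def suppression_gains(noisy_power, noise_power, gain_floor=0.1):
    """Per-band Wiener-style gains, clamped from below by gain_floor.

    noisy_power[i] and noise_power[i] are power estimates for band i.
    The floor limits how deeply any band is attenuated, which is a common
    way to reduce fluctuating residual-noise artifacts.
    """
    gains = []
    for p_noisy, p_noise in zip(noisy_power, noise_power):
        if p_noise <= 0.0:
            gain = 1.0  # no noise estimated in this band, leave it untouched
        else:
            snr = max(p_noisy - p_noise, 0.0) / p_noise
            gain = snr / (1.0 + snr)
        gains.append(max(gain, gain_floor))
    return gains
```

Raising `gain_floor` trades noise reduction for fewer audible artifacts: bands dominated by noise are attenuated less, so their residual level varies less from frame to frame.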

