Dissertations / Theses on the topic 'Multimedia signal processing'

Consult the top 28 dissertations / theses for your research on the topic 'Multimedia signal processing.'

1

Athanasiadis, Tasso. "Signal Processing Techniques for Mobile Multimedia Systems." RMIT University, Electrical and Computer Engineering, 2007. http://adt.lib.rmit.edu.au/adt/public/adt-VIT20080123.115457.

Abstract:
Recent trends in wireless communication systems show a significant demand for the delivery of multimedia services and applications over mobile networks - mobile multimedia - such as video telephony, multimedia messaging, mobile gaming, and interactive and streaming video. However, despite the ongoing development of key communication technologies that support these applications, the communication resources and bandwidth available to wireless/mobile radio systems are often severely limited. It is well known that these bottlenecks are inherently due to the processing capabilities of mobile transmission systems and the time-varying nature of wireless channel conditions and propagation environments. Therefore, new ways of processing and transmitting multimedia data over mobile radio channels have become essential, and this is the principal focus of this thesis. In this work, the performance and suitability of various signal processing techniques and transmission strategies for the transmission of multimedia data over wireless/mobile radio links are investigated. The proposed transmission systems for multimedia communication employ different data encoding schemes, including source coding in the wavelet domain, transmit diversity coding (space-time coding), and adaptive antenna beamforming (eigenbeamforming). By integrating these techniques into a robust communication system, the quality (SNR, etc.) of multimedia signals received on mobile devices is maximised while mitigating the fast fading and multipath effects of mobile channels. To support the transmission of high data-rate multimedia applications, a well-known multi-carrier transmission technology, Orthogonal Frequency Division Multiplexing (OFDM), has been implemented. As shown in this study, this results in significant performance gains when combined with other signal processing techniques such as space-time block coding (STBC). To optimise signal transmission, a novel unequal adaptive modulation scheme for the communication of multimedia data over MIMO-OFDM systems is proposed. In this system, discrete wavelet transform/subband coding is used to decompose data into their respective low-frequency and high-frequency components. Unlike traditional methods, however, the low-frequency data are processed and modulated separately, as they are more sensitive to the distortion effects of mobile radio channels. To exploit favourable subchannel states, such that the quality (SNR) of the multimedia data recovered at the receiver is optimised, a lookup matrix-adaptive bit and power allocation (LM-ABPA) algorithm is employed. Apart from improving the spectral efficiency of OFDM, the modified LM-ABPA scheme sorts and allocates the subcarriers with the highest SNR to the low-frequency data and the remaining subcarriers to the less important data. To maintain a target system SNR, the LM-ABPA loading scheme assigns appropriate signal constellation sizes and transmit power levels (modulation types) across all subcarriers, adapted to the varying channel conditions such that the average system error rate (SER/BER) is minimised. When configured for a constant data-rate load, simulation results show significant performance gains over non-adaptive systems.
In addition to the above studies, the simulation framework developed in this work is applied to investigate the performance of other signal processing techniques for multimedia communication such as blind channel equalization, and to examine the effectiveness of a secure communication system based on a logistic chaotic generator (LCG) for chaos shift-keying (CSK).
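As a rough illustration of the kind of SNR-ordered loading performed by the LM-ABPA scheme described above, the Python sketch below greedily assigns larger constellations to stronger subcarriers and serves the sensitive low-frequency class first. The SNR thresholds, bit budgets and function names are illustrative assumptions, not the thesis's actual tables or algorithm.

```python
import numpy as np

# Illustrative SNR thresholds (dB) per constellation size; not the thesis's tables.
QAM_THRESHOLDS = [(24.0, 8), (18.0, 6), (12.0, 4), (6.0, 2)]  # (min SNR, bits/symbol)

def sort_and_load(subcarrier_snr_db, low_freq_bits, high_freq_bits):
    """Assign the strongest subcarriers to the sensitive low-frequency class.

    Returns bits/symbol per subcarrier, loading each class until its bit
    budget is met or no subcarrier can support another symbol.
    """
    order = np.argsort(subcarrier_snr_db)[::-1]        # strongest first
    bits = np.zeros(len(subcarrier_snr_db), dtype=int)
    k = 0
    for budget in (low_freq_bits, high_freq_bits):     # low-frequency class served first
        loaded = 0
        while loaded < budget and k < len(order):
            snr = subcarrier_snr_db[order[k]]
            b = next((m for thr, m in QAM_THRESHOLDS if snr >= thr), 0)
            bits[order[k]] = b
            loaded += b
            k += 1
    return bits

rng = np.random.default_rng(0)
snrs = 10 * np.log10(rng.exponential(scale=10.0, size=64))  # Rayleigh-faded subcarrier SNRs
print(sort_and_load(snrs, low_freq_bits=96, high_freq_bits=64))
```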
2

Guo, Liwei. "Restoration and modeling for multimedia compression /." View abstract or full-text, 2008. http://library.ust.hk/cgi/db/thesis.pl?ECED%202008%20GUOL.

3

Bakken, Marianne. "Signal Processing for Communicating Gravity Wave Images from the NTNU Test Satellite." Thesis, Norges teknisk-naturvitenskapelige universitet, Institutt for elektronikk og telekommunikasjon, 2012. http://urn.kb.se/resolve?urn=urn:nbn:no:ntnu:diva-19229.

Abstract:
The NTNU Test Satellite (NUTS) is planned to have a payload for observation of atmospheric gravity waves. The gravity waves will be observed by means of an infrared camera imaging the perturbations in the OH airglow layer. So far, no suitable camera has been found that complies with the restrictions that follow when building a small satellite. Uncooled InGaAs has, however, been concluded to be the most suitable detector type in terms of wavelength response and weight. InGaAs sensors are known to have a high dark current when not cooled, and processing must therefore be applied to remove the background offset and noise. The combination of the high speed of the satellite and the long exposure time required by the camera will create motion blur. Simulations with synthetic test images in MATLAB showed that the integration time should be kept under 1 second in order not to destroy the wave patterns; longer integration times may, however, be required to obtain a sufficient SNR. Two signal processing solutions to this problem were investigated: motion blur removal by deconvolution, and image averaging with motion compensation. The former strategy is to apply a long exposure time to get a strong signal and then remove the blur with deconvolution techniques, using knowledge of the blur filter. Simulations applying the Lucy-Richardson (LR) algorithm showed that it was not able to remove strong blur, and that it was very sensitive to errors in the blur filter and noise in the image. The other approach is to obtain a sequence of images with short exposure time in order to avoid motion blur, and to provide the necessary SNR by shifting the images according to the known motion and combining them into one image. This concept is simpler and more reliable than the deconvolution approach, and simulations showed that it is less sensitive to errors in the speed estimate than the deconvolution algorithm. It was concluded that this is the most suitable approach for the NUTS application, and that it should be implemented on board the satellite in order to provide a good SNR for the compression to function optimally. The downlink data rate of NUTS is only 9600 bit/s, and it has been estimated that 2.45 Mb of payload data can be downloaded on average per day. This corresponds to less than 5 uncompressed images of 256 × 256 pixels with 8 bits per pixel. A sequence of overlapping combined images should be obtained to provide a scan of a desired area, and it was suggested that it should be encoded as video to enable efficient compression and transmission of as many images as possible to the ground station. A three-dimensional DPCM algorithm combined with a deadzone quantizer and stack-run coding was implemented in MATLAB. Simulations demonstrated that this simple compression scheme can provide a bit rate of less than 1 bit/px for a sequence of gravity wave images. One of the quantizers that was tried gave 0.83 bits per pixel with reasonable quality. If this number can be achieved in practice, the image transfer rate would be increased to 45 images per day, which is a significant improvement.
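A minimal sketch of the selected shift-and-average approach (not the thesis's MATLAB code): given short-exposure frames and a known per-frame displacement, each frame is shifted back along the motion before averaging, so the wave pattern adds coherently while sensor noise averages down roughly with the square root of the number of frames.

```python
import numpy as np

def stack_with_motion_compensation(frames, dx_per_frame):
    """Average short-exposure frames after compensating a known horizontal motion.

    frames: (N, H, W) array; dx_per_frame: displacement in pixels between frames.
    Integer shifts for simplicity; a real system would interpolate sub-pixel shifts.
    """
    acc = np.zeros(frames.shape[1:], dtype=np.float64)
    for i, frame in enumerate(frames):
        shift = int(round(i * dx_per_frame))
        acc += np.roll(frame, -shift, axis=1)   # undo the motion of frame i
    return acc / frames.shape[0]

# Synthetic test: a drifting sinusoidal "wave" pattern plus strong dark-current noise.
rng = np.random.default_rng(1)
x = np.arange(128)
pattern = np.sin(2 * np.pi * x / 32)[None, :] * np.ones((128, 1))
frames = np.stack([
    np.roll(pattern, int(round(i * 2.0)), axis=1) + rng.normal(0, 1.0, (128, 128))
    for i in range(16)
])
stacked = stack_with_motion_compensation(frames, dx_per_frame=2.0)
print(np.std(frames[0] - pattern), np.std(stacked - pattern))  # noise drops ~4x
```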
4

Houas, Heykel. "Allocation de ressources pour la transmission de données multimedia scalables." Phd thesis, Université de Cergy Pontoise, 2009. http://tel.archives-ouvertes.fr/tel-00767889.

Abstract:
This thesis addresses resource allocation problems for the transmission of scalable multimedia data under quality-of-service (QoS) constraints over heterogeneous networks. The wired and wireless links considered (DS-CDMA, OFDMA) are applied to image and speech transmission services over slow- or fast-fading channels, with or without multipath. The QoS of these networks is expressed in terms of perceived quality from the user's point of view (Application layer) and in terms of per-class bit error rate (BER) from the transmission point of view (Physical layer). The resources studied are the allocation of power, modulation orders and carriers, as well as unequal error protection (UEP) properties. The objective of this work is to allocate these resources so as to maximise the source rate of the hierarchically organised multimedia data (structured as importance classes), relying on perfect or partial knowledge of the propagation channels, under target performance constraints at the receiver. The link adaptation strategies we present are based on the possible truncation of part of the data to be transmitted. They also rely on the sensitivity of each class and on protecting it adequately against transmission errors, in accordance with the QoS requirements expressed for each class. The transmission schemes explore several resource optimisation criteria: minimising the system load, and optimising the robustness of the transmission to channel estimation errors. In these contexts, we describe the optimal allocation of subcarriers, modulations, code rates and energy that maximises the user's source rate while satisfying the constraints on system load and QoS. We show that these allocation schemes are adaptable to many communication systems and outperform state-of-the-art strategies.
5

Oberhofer, Robert. "Pitch adaptive variable bitrate CELP speech coding." Thesis, University of Ulster, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.264811.

6

DeBardelaben, James Anthony. "An optimization-based approach for cost-effective embedded DSP system design." Diss., Georgia Institute of Technology, 1998. http://hdl.handle.net/1853/15757.

7

Shen, Ju. "Computational Multimedia for Video Self Modeling." UKnowledge, 2014. http://uknowledge.uky.edu/cs_etds/26.

Abstract:
Video self modeling (VSM) is a behavioral intervention technique in which a learner models a target behavior by watching a video of oneself. This is the idea behind the psychological theory of self-efficacy: you can learn or model to perform certain tasks because you see yourself doing them, which provides the most ideal form of behavior modeling. The effectiveness of VSM has been demonstrated for many different types of disabilities and behavioral problems, ranging from stuttering, inappropriate social behaviors, autism and selective mutism to sports training. However, there is an inherent difficulty associated with the production of VSM material. Prolonged and persistent video recording is required to capture the rare, if not altogether absent, snippets that can be strung together to form novel video sequences of the target skill. To solve this problem, in this dissertation we use computational multimedia techniques to facilitate the creation of synthetic visual content for self-modeling that can be used by a learner and his/her therapist with a minimum amount of training data. There are three major technical contributions in my research. First, I developed an Adaptive Video Re-sampling algorithm to synthesize realistic lip-synchronized video with minimal motion jitter. Second, to denoise and complete the depth maps captured by structured-light sensing systems, I introduced a layer-based probabilistic model to account for various types of uncertainties in the depth measurement. Third, I developed a simple and robust bundle-adjustment based framework for calibrating a network of multiple wide-baseline RGB and depth cameras.
8

Sezer, Osman Gokhan. "Data-driven transform optimization for next generation multimedia applications." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42765.

Abstract:
The objective of this thesis is to formulate a generic dictionary learning method with the guiding principle that efficient representations lead to efficient estimations. The fundamental idea behind using transforms or dictionaries for signal representation is to exploit the regularity within data samples such that the redundancy of the representation is minimized subject to a level of fidelity. This observation translates to rate-distortion cost in the compression literature, where a transform that has the lowest rate-distortion cost provides a more efficient representation than the others. In our work, rather than being used as an analysis tool, the rate-distortion cost is utilized to improve the efficiency of transforms. For this, an iterative optimization method is proposed, which seeks an orthonormal transform that reduces the expected rate-distortion cost of an ensemble of data. Due to the generic nature of the new optimization method, one can design a set of orthonormal transforms either in the original signal domain or on top of a transform-domain representation. To test this claim, several image codecs are designed, which use block-, lapped- and wavelet-transform structures. Significant increases in compression performance are observed compared to the original methods. An extension of the proposed optimization method to video coding gave us state-of-the-art compression results with separable transforms. Also, using robust statistics, an explanation of the superiority of the new design over other learning-based methods such as the Karhunen-Loeve transform is provided. Finally, the new optimization method and the minimization of the "oracle" risk of diagonal estimators in signal estimation are shown to be equivalent. With the design of new diagonal estimators and risk-minimization-based adaptation, a new image denoising algorithm is proposed. While these diagonal estimators denoise local image patches, by formulating the optimal fusion of overlapping local denoised estimates, the new denoising algorithm is scaled to operate on large images. In our experiments, state-of-the-art results for transform-domain denoising are achieved.
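For context, the simplest data-driven orthonormal transform the thesis compares against is the Karhunen-Loeve transform (PCA on patches). The sketch below shows only that baseline, assuming vectorized image patches as training data; the rate-distortion-optimized design proposed in the thesis is considerably more elaborate.

```python
import numpy as np

def klt_from_patches(patches):
    """Estimate a KLT (PCA basis) from training patches.

    patches: (N, d) array of vectorized image patches. Returns an orthonormal
    (d, d) transform whose rows are covariance eigenvectors, ordered by
    decreasing variance.
    """
    centered = patches - patches.mean(axis=0)
    cov = centered.T @ centered / len(patches)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    return eigvecs[:, ::-1].T                   # rows = principal directions

rng = np.random.default_rng(0)
# Toy "image": smooth ramp plus noise, cut into 8x8 patches.
img = np.add.outer(np.arange(64), np.arange(64)).astype(float)
img += rng.normal(0, 4, img.shape)
patches = np.array([img[i:i+8, j:j+8].ravel()
                    for i in range(0, 64, 8) for j in range(0, 64, 8)])
T = klt_from_patches(patches)
coeffs = patches @ T.T                          # analysis transform
energy = (coeffs ** 2).mean(axis=0)
print("energy compaction, first 5 vs last 5 coefficients:",
      energy[:5].sum() / energy[-5:].sum())
```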
9

Narapareddy, Yagna Brahma Sai. "QoE Performance Evaluation by Introducing Video Freeze on Mobile Multimedia." Thesis, Blekinge Tekniska Högskola, Institutionen för tillämpad signalbehandling, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-18995.

Abstract:
Real-time video streaming over the mobile Internet is increasing day by day, and video quality can be badly affected by network performance issues. Video freezing and video jumping are among the serious issues that degrade the user experience, so service providers are interested in evaluating the quality of experience (QoE). We follow the methods from the International Telecommunication Union - Telecommunication Standardization Sector (ITU-T) recommendations. In this thesis, we study the effect of freezing on user experience through subjective tests, obtain mean opinion scores using a perceptual video quality assessment tool, and analyze which parts of a video are most affected by introducing freezes in selected parts.
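As a small illustration of the subjective-scoring arithmetic (not tied to any particular ITU-T recommendation's screening rules), the mean opinion score of a test condition is the average of the subjects' ratings, commonly reported with a confidence interval; the ratings below are invented.

```python
import numpy as np

def mos_with_ci(ratings):
    """Mean opinion score and ~95% confidence interval per test condition.

    ratings: (subjects, conditions) array of scores on a 1-5 scale.
    """
    mos = ratings.mean(axis=0)
    sem = ratings.std(axis=0, ddof=1) / np.sqrt(ratings.shape[0])
    return mos, 1.96 * sem

# Hypothetical scores: 24 subjects, 3 freeze conditions (none / short / long).
rng = np.random.default_rng(7)
ratings = np.clip(np.round(rng.normal([4.5, 3.6, 2.4], 0.7, (24, 3))), 1, 5)
mos, ci = mos_with_ci(ratings)
print(mos, ci)
```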
10

Uzuegbunam, Nkiruka M. A. "SELF-IMAGE MULTIMEDIA TECHNOLOGIES FOR FEEDFORWARD OBSERVATIONAL LEARNING." UKnowledge, 2018. https://uknowledge.uky.edu/ece_etds/124.

Abstract:
This dissertation investigates the development and use of self-images in augmented reality systems for learning and learning-based activities. This work focuses on self-modeling, a particular form of learning, actively employed in various settings for therapy or teaching. In particular, this work aims to develop novel multimedia systems to support the display and rendering of augmented self-images. It aims to use interactivity (via games) as a means of obtaining imagery for use in creating augmented self-images. Two multimedia systems are developed, discussed and analyzed. The proposed systems are validated in terms of their technical innovation and their clinical efficacy in delivering behavioral interventions for young children on the autism spectrum.
11

Yang, Yimin. "Exploring Hidden Coherent Feature Groups and Temporal Semantics for Multimedia Big Data Analysis." FIU Digital Commons, 2015. http://digitalcommons.fiu.edu/etd/2254.

Abstract:
Thanks to the advanced technologies and social networks that allow data to be widely shared across the Internet, there is an explosion of pervasive multimedia data, generating high demand for multimedia services and applications in various areas where people need to easily access and manage multimedia data. Towards such demands, multimedia big data analysis has become an emerging hot topic in both industry and academia, ranging from basic infrastructure, management, search, and mining to security, privacy, and applications. Within the scope of this dissertation, a multimedia big data analysis framework is proposed for semantic information management and retrieval, with a focus on rare event detection in videos. The proposed framework is able to explore hidden semantic feature groups in multimedia data and to incorporate temporal semantics, especially for video event detection. First, a hierarchical semantic data representation is presented to alleviate the semantic gap issue, and the Hidden Coherent Feature Group (HCFG) analysis method is proposed to capture the correlation between features and separate the original feature set into semantic groups, seamlessly integrating multimedia data in multiple modalities. Next, an Importance Factor based Temporal Multiple Correspondence Analysis (IF-TMCA) approach is presented for effective event detection. Specifically, the HCFG algorithm is integrated with the Hierarchical Information Gain Analysis (HIGA) method to generate the Importance Factor (IF) for producing the initial detection results. Then, the TMCA algorithm is proposed to efficiently incorporate temporal semantics for re-ranking and improving the final performance. Finally, a sampling-based ensemble learning mechanism is applied to further accommodate the imbalanced datasets. In addition to the multimedia semantic representation and class imbalance problems, lack of organization is another critical issue for multimedia big data analysis. In this framework, an affinity propagation-based summarization method is also proposed to transform the unorganized data into a better structure with clean and well-organized information. The whole framework has been thoroughly evaluated across multiple domains, such as soccer goal event detection and disaster information management.
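A heavily reduced stand-in for the idea of discovering coherent feature groups (the dissertation's HCFG procedure is more involved): compute pairwise feature correlations and greedily cluster features whose absolute correlation exceeds a threshold.

```python
import numpy as np

def coherent_feature_groups(X, threshold=0.8):
    """Greedy grouping of features by absolute Pearson correlation.

    X: (samples, features). Returns a list of feature-index groups.
    A simple stand-in for hidden coherent feature group discovery.
    """
    corr = np.abs(np.corrcoef(X, rowvar=False))
    unassigned = set(range(X.shape[1]))
    groups = []
    while unassigned:
        seed = unassigned.pop()
        group = [seed] + [j for j in list(unassigned) if corr[seed, j] >= threshold]
        unassigned -= set(group)
        groups.append(sorted(group))
    return groups

rng = np.random.default_rng(3)
base = rng.normal(size=(200, 2))
# Features 0-2 follow latent factor 0; features 3-4 follow factor 1; feature 5 is noise.
X = np.column_stack([base[:, 0], base[:, 0] * 2 + rng.normal(0, .1, 200),
                     -base[:, 0], base[:, 1], base[:, 1] + rng.normal(0, .1, 200),
                     rng.normal(size=200)])
print(coherent_feature_groups(X))
```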
12

Dikbas, Salih. "A low-complexity approach for motion-compensated video frame rate up-conversion." Diss., Georgia Institute of Technology, 2011. http://hdl.handle.net/1853/42730.

Abstract:
Video frame rate up-conversion is an important issue for multimedia systems in achieving better video quality and motion portrayal. Motion-compensated methods offer better-quality interpolated frames since the interpolation is performed along the motion trajectory. In addition, computational complexity, regularity, and memory bandwidth are important for a real-time implementation. Motion-compensated frame rate up-conversion (MC-FRC) is composed of two main parts: motion estimation (ME) and motion-compensated frame interpolation (MCFI). Since ME is an essential part of MC-FRC, a new fast motion estimation (FME) algorithm capable of producing sub-sample motion vectors at low computational complexity has been developed. Unlike existing FME algorithms, the developed algorithm considers low-complexity sub-sample accuracy in designing the search pattern for FME. The developed FME algorithm is designed in such a way that the block distortion measure (BDM) is modeled as a parametric surface in the vicinity of the integer-sample motion vector; this modeling enables low-complexity sub-sample motion estimation without pixel interpolation. MC-FRC needs more accurate motion trajectories for better video quality; hence, a novel true-motion estimation (TME) algorithm targeting to track the projected object motion has been developed for video processing applications such as motion-compensated frame interpolation (MCFI), deinterlacing, and denoising. The developed TME algorithm considers not only computational complexity and regularity but also memory bandwidth. TME is obtained by imposing implicit and explicit smoothness constraints on the block matching algorithm (BMA). In addition, it employs a novel adaptive clustering algorithm to keep the complexity at reasonable levels yet enable exploiting more spatiotemporal neighbors. To produce better-quality interpolated frames, dense motion fields at the interpolation instants are obtained for both forward and backward motion vectors (MVs); then, bidirectional motion compensation using forward and backward MVs is applied by mixing both elegantly.
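One standard way to obtain sub-sample motion vectors without pixel interpolation, in the spirit of the parametric-surface modeling described above (the thesis's exact surface model may differ), is to fit a parabola through the block-distortion values at the integer displacements around the best match and take its minimum:

```python
def subpel_offset(c_m1, c_0, c_p1):
    """Sub-sample refinement from three block-distortion values.

    c_m1, c_0, c_p1: costs at displacements -1, 0, +1 around the best integer
    motion vector. Fitting a parabola C(d) = a*d^2 + b*d + c gives the
    minimum at d* = (c_m1 - c_p1) / (2*(c_m1 - 2*c_0 + c_p1)).
    """
    denom = c_m1 - 2 * c_0 + c_p1
    if denom <= 0:                    # flat or degenerate cost surface
        return 0.0
    return 0.5 * (c_m1 - c_p1) / denom

# Costs sampled from a true quadratic with minimum at d = 0.3:
true = lambda d: 10 * (d - 0.3) ** 2 + 5
print(subpel_offset(true(-1), true(0), true(1)))   # ~0.3
```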
13

Irani, Ramin. "Error Detection for DMB Video Streams." Thesis, Blekinge Tekniska Högskola, Sektionen för ingenjörsvetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:bth-5086.

Abstract:
The purpose of this thesis is to detect errors in the Digital Multimedia Broadcasting (DMB) transport stream. DMB uses the MPEG-4 standard for encapsulating Packetized Elementary Streams (PES) and the MPEG-2 standard for assembling them into transport stream packets. Recently, much research has been carried out on video stream error detection, mostly by focusing on decoding parameters related to frames; processing complexity can be a disadvantage of these methods. In this thesis, we investigate syntax errors caused by corruption in the headers of the video transport stream, with a main focus on video streams that cannot be decoded. The proposed model is implemented by filtering video and audio packets in order to find the errors; the filters inspect several sources that can affect video stream playback. The output of this method gives the type, location and duration of the errors. The simplicity of the structure is one of the advantages of this model: it can be implemented with three simple filters for detecting errors and a calculation unit for computing the duration of an error. Fast processing is another benefit of the proposed model.
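A hedged sketch of header-level error filtering on MPEG-2 transport stream packets: the checks shown (sync byte, transport_error_indicator, continuity counter) are standard TS header fields, but the thesis's three filters and its calculation unit are not reproduced here.

```python
def check_ts_packet(packet, last_cc):
    """Flag basic header-level errors in one 188-byte MPEG-2 TS packet.

    Returns (pid, error list, continuity counter). A real checker would
    track continuity counters per PID; one PID is assumed here.
    """
    errors = []
    if len(packet) != 188 or packet[0] != 0x47:
        return None, ["sync loss"], None
    if packet[1] & 0x80:                         # transport_error_indicator
        errors.append("transport error flagged")
    pid = ((packet[1] & 0x1F) << 8) | packet[2]
    cc = packet[3] & 0x0F
    has_payload = bool(packet[3] & 0x10)         # adaptation_field_control
    if has_payload and last_cc is not None and cc != (last_cc + 1) % 16:
        errors.append(f"continuity error on PID {pid}: {last_cc} -> {cc}")
    return pid, errors, cc

# Hypothetical stream: two well-formed packets, then one with a skipped counter.
pkts = [bytes([0x47, 0x01, 0x00, 0x10 | cc]) + bytes(184) for cc in (0, 1, 3)]
last = None
for p in pkts:
    pid, errs, last = check_ts_packet(p, last)
    print(pid, errs)
```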
14

Laurier, Cyril François. "Automatic Classification of musical mood by content-based analysis." Doctoral thesis, Universitat Pompeu Fabra, 2011. http://hdl.handle.net/10803/51582.

Abstract:
In this work, we focus on automatically classifying music by mood. For this purpose, we propose computational models using information extracted from the audio signal. The foundations of such algorithms are based on techniques from signal processing, machine learning and information retrieval. First, by studying the tagging behavior of a music social network, we find a model to represent mood. Then, we propose a method for automatic music mood classification. We analyze the contributions of audio descriptors and how their values are related to the observed mood. We also propose a multimodal version using lyrics, contributing to the field of text retrieval. Moreover, after showing the relation between mood and genre, we present a new approach using automatic music genre classification. We demonstrate that genre-based mood classifiers give higher accuracies than standard audio models. Finally, we propose a rule extraction technique to make our models explicit.
15

Kharbouche, Said. "Fonctions de Croyance et Indexation Multimodale : Application à l'Identification de Personnes dans des Albums." Phd thesis, Université de Rouen, 2006. http://tel.archives-ouvertes.fr/tel-00232806.

Abstract:
This thesis is set in the context of the semi-automatic organisation of photo albums, within the specific application framework of a service prototype developed by the research and development division of France Telecom. In this setting, photos can be shared among several people and can be annotated vocally and/or textually by these different users. The indexing process developed in this thesis is not limited to indexing a collection of images: it also processes their associated comments, which makes the content multimedia. Other information can also be associated with the photos, such as the dates and places of image acquisition (known with great precision thanks in particular to the development of geo-localisation capabilities in multimedia devices), and can be exploited to organise the collection. The work in this thesis therefore focuses on multimedia documents with different modalities: image, text, sound and data. One of the objectives is to fuse the information from these different modalities in order to identify the people appearing in the images, which in turn allows the documents to be indexed. Each document in the collection is represented by its content in the different media, but is also considered in its context. To analyse each content type of a document, we use indexing tools specific to it. The context of an image is exploited from descriptors already computed on documents in the collection, using the dates and places of acquisition of the associated images. The essential contribution of this work is therefore the indexing of multimedia documents by both their content and their context.
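Since the fusion machinery named in the title is based on belief functions, here is a minimal Dempster's-rule combination over a toy frame of person identities; the mass values are invented for illustration.

```python
from itertools import product

def dempster(m1, m2):
    """Combine two mass functions (dicts: frozenset -> mass) by Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, ma), (b, mb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + ma * mb
        else:
            conflict += ma * mb                  # mass assigned to the empty set
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Toy example: face analysis vs. voice-comment analysis over identities {alice, bob}.
A, B = frozenset({"alice"}), frozenset({"bob"})
AB = A | B                                       # ignorance: either identity
m_face = {A: 0.6, B: 0.1, AB: 0.3}
m_text = {A: 0.5, AB: 0.5}
print(dempster(m_face, m_text))                  # belief concentrates on alice
```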
16

Karaman, Svebor. "Indexation de la Vidéo Portée : Application à l'Étude Épidémiologique des Maladies Liées à l'Âge." Phd thesis, Université Sciences et Technologies - Bordeaux I, 2011. http://tel.archives-ouvertes.fr/tel-00689855.

Abstract:
The research work in this doctoral thesis concerns the medical monitoring of patients with age-related dementia, using video cameras worn by the patients. The idea is to provide physicians with a new tool for the early diagnosis of age-related dementias such as Alzheimer's disease. More precisely, Instrumental Activities of Daily Living (IADL) are to be indexed automatically in the videos recorded by a wearable recording device. These videos have specific characteristics, such as strong motion and strong changes in lighting. Moreover, the target recognition task is of a very high semantic level. In this difficult context, the first analysis step is to define an equivalent to the notion of "shot" used in edited video content. We therefore developed a method for partitioning a continuously recorded video into "viewpoints" based on the apparent motion. For IADL recognition, we developed a solution based on the Hidden Markov Model (HMM) formalism. A two-level hierarchical HMM was introduced, modeling semantic activities and intermediate states. A complex set of descriptors (dynamic, static, low-level and mid-level) was exploited, and the optimal joint description spaces were identified experimentally. Among mid-level descriptors for activity recognition, we were particularly interested in the semantic objects that the person manipulates in the camera's field of view. We proposed a new concept for describing objects or images that makes use of local descriptors (SURF) and the underlying topological structure of local graphs. A nested approach to graph construction was introduced, in which the same scene can be described by several levels of graphs with an increasing number of nodes. We build these graphs by a Delaunay triangulation on SURF points, thus preserving the good properties of local descriptors, namely their invariance to affine transformations of the image plane such as rotation, translation and scaling. We use these graph descriptors within the Bag-of-Visual-Words framework. The problem of defining a distance, or dissimilarity, between graphs for unsupervised classification and recognition necessarily arises. We propose a dissimilarity measure based on the Context-Dependent Kernel (CDK) proposed by H. Sahbi, and show its relation to the classical L2 norm when comparing trivial graphs (the SURF points themselves). For activity recognition with HMMs, experiments are conducted on the world's first corpus of wearable-camera videos intended for the observation of IADL, and on public databases such as SIVAL and Caltech-101 for object recognition.
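A small sketch of the graph-construction step, assuming generic 2-D keypoints in place of actual SURF detections: a Delaunay triangulation turns the keypoint set into a planar graph whose topology is unchanged by rotation, translation and scaling of the image plane.

```python
import numpy as np
from scipy.spatial import Delaunay

def local_graph_edges(points):
    """Build a planar graph over 2-D keypoints via Delaunay triangulation.

    points: (N, 2) array of keypoint locations. Returns edges as a set of
    index pairs; the topology is invariant to similarity transforms.
    """
    tri = Delaunay(points)
    edges = set()
    for simplex in tri.simplices:               # each simplex is a triangle
        for i in range(3):
            a, b = sorted((simplex[i], simplex[(i + 1) % 3]))
            edges.add((a, b))
    return edges

rng = np.random.default_rng(5)
pts = rng.uniform(0, 100, size=(30, 2))         # stand-in for detected keypoints
edges = local_graph_edges(pts)
rotated = pts @ np.array([[0.0, -1.0], [1.0, 0.0]])   # 90-degree rotation
print(edges == local_graph_edges(rotated))      # same topology: True
```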
17

Pallone, Grégory. "DILATATION ET TRANSPOSITION SOUS CONTRAINTES PERCEPTIVES DES SIGNAUX AUDIO : APPLICATION AU TRANSFERT CINEMA-VIDEO." Phd thesis, Université de la Méditerranée - Aix-Marseille II, 2003. http://tel.archives-ouvertes.fr/tel-00003363.

Abstract:
The coexistence of two formats, cinema at 24 frames/s and video at 25 frames/s, implies accelerating or slowing down the soundtrack when transferring from one format to the other. This causes a temporal modification of the sound signal and, consequently, a spectral modification that alters the timbre. Audiovisual post-production studios wish to compensate for this effect by applying an appropriate sound transformation.

The objective of this work is to provide the audiovisual industry with a system that counteracts the timbre modification caused by the change of playback speed. This system consists of a processing algorithm and of a machine on which it is implemented. The algorithm is designed and developed to meet the constraints of sound quality and multichannel compatibility. The machine, named HARMO, is designed specifically by the GENESIS company on the basis of digital signal processors, and must meet the real-time constraint. This industrial dimension brings cost and development-time constraints into the project.

A state of the art based on a near-exhaustive bibliography leads to an original classification of existing time-stretching and pitch-shifting methods. This leads us to distinguish and study the classical time-domain and frequency-domain methods, and to introduce time-frequency methods. This classification is the basis of several innovative methods:

1. two time-frequency methods whose analysis stage is adapted to human hearing,

2. two coupled methods that combine the advantages of time-domain and frequency-domain methods,

3. a time-domain method based on an improvement of existing methods.

The algorithms are evaluated using a bank of test sounds specifically designed to reveal the characteristic defects of such algorithms. Our final choice is the time-domain approach, which we optimise by adding segmentation criteria based on normalized autocorrelation and transient detection. This algorithm is integrated into software structured for real-time, multichannel operation on the HARMO system.
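A bare-bones time-domain time-stretching sketch (plain overlap-add, without the normalized-autocorrelation synchronization and transient detection that the thesis adds on top):

```python
import numpy as np

def ola_time_stretch(x, rate, frame=1024, hop_out=256):
    """Naive overlap-add time stretching of a mono signal.

    rate > 1 shortens the signal (the cinema-to-video speed change is 25/24).
    Analysis frames are taken every hop_out*rate samples and overlap-added
    every hop_out samples with a Hann window.
    """
    window = np.hanning(frame)
    hop_in = hop_out * rate
    n_frames = int((len(x) - frame) / hop_in)
    out = np.zeros(int(len(x) / rate) + frame)
    norm = np.zeros_like(out)
    for k in range(n_frames):
        a, b = int(k * hop_in), k * hop_out
        out[b:b+frame] += x[a:a+frame] * window
        norm[b:b+frame] += window
    return out / np.maximum(norm, 1e-8)

# 440 Hz tone at 48 kHz, stretched by the 24->25 frame-rate ratio.
t = np.arange(48000) / 48000
y = ola_time_stretch(np.sin(2 * np.pi * 440 * t), rate=25/24)
print(len(t), len(y))
```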
18

Liu, Ming. "Analyse et optimisation du système asiatique de diffusion terrestre et mobile de la télévision numérique." Phd thesis, INSA de Rennes, 2011. http://tel.archives-ouvertes.fr/tel-00662247.

Abstract:
The objective of this thesis is the analysis of the Chinese digital television system (DTMB) and the optimisation of its channel estimation function. First, an in-depth analysis of this system is carried out in comparison with the DVB-T system in terms of specifications, spectral efficiency and performance. Then, the channel estimation function based on the system's pseudo-random sequence is studied in the time and frequency domains, and several improvements are made to the typical methods, in particular to handle highly time-dispersive channels. Finally, new low-complexity, data-aided iterative schemes are proposed to refine the channel estimates. The channel decoding and interleaving functions are excluded from the loop, and time/frequency filtering functions are studied to make the estimates more reliable. These new algorithms demonstrate their effectiveness compared with the common methods in the literature.
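A schematic of pseudo-random-sequence channel estimation by correlation, with a random binary sequence standing in for the DTMB PN sequence and an invented multipath channel:

```python
import numpy as np

def pn_channel_estimate(received, pn, n_taps):
    """Estimate channel taps by correlating the received signal with a known PN sequence.

    Relies on the near-ideal periodic autocorrelation of the PN sequence:
    the correlation peaks reproduce the channel impulse response.
    """
    corr = np.correlate(received, pn, mode="valid") / len(pn)
    return corr[:n_taps]

rng = np.random.default_rng(2)
pn = rng.choice([-1.0, 1.0], size=511)               # stand-in for the DTMB PN sequence
h = np.array([1.0, 0.0, 0.5, 0.0, -0.25])            # invented multipath channel
rx = np.convolve(np.tile(pn, 2), h)                   # two PN periods through the channel
rx += rng.normal(0, 0.05, rx.shape)                   # receiver noise
window = rx[511:511 + len(pn) + len(h) - 1]           # one full period plus channel tail
print(np.round(pn_channel_estimate(window, pn, len(h)), 2))  # approximately h
```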
19

Oliver, Gil José Salvador. "On the design of fast and efficient wavelet image coders with reduced memory usage." Doctoral thesis, Universitat Politècnica de València, 2008. http://hdl.handle.net/10251/1826.

Abstract:
Image compression is of great importance in multimedia systems and applications because it drastically reduces bandwidth requirements for transmission and memory requirements for storage. Although earlier standards for image compression were based on the Discrete Cosine Transform (DCT), a more recently developed mathematical technique, the Discrete Wavelet Transform (DWT), has been found to be more efficient for image coding. Despite improvements in compression efficiency, wavelet image coders significantly increase memory usage and complexity when compared with DCT-based coders. A major reason for the high memory requirements is that the usual algorithm to compute the wavelet transform requires the entire image to be in memory. Although some proposals reduce the memory usage, they present problems that hinder their implementation. In addition, some wavelet image coders, like SPIHT (which has become a benchmark for wavelet coding), always need to hold the entire image in memory. Regarding the complexity of the coders, SPIHT can be considered quite complex because it performs bit-plane coding with multiple image scans. The wavelet-based JPEG 2000 standard is still more complex because it improves coding efficiency through time-consuming methods, such as an iterative optimization algorithm based on the Lagrange multiplier method, and high-order context modeling. In this thesis, we aim to reduce memory usage and complexity in wavelet-based image coding while preserving compression efficiency. To this end, a run-length encoder and a tree-based wavelet encoder are proposed. In addition, a new algorithm to efficiently compute the wavelet transform is presented. This algorithm achieves low memory consumption by using line-by-line processing, and it employs recursion to automatically place the order in which the wavelet transform is computed, solving some synchronization problems that have not been tackled by previous proposals.
Oliver Gil, JS. (2006). On the design of fast and efficient wavelet image coders with reduced memory usage [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/1826
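A line-by-line wavelet transform operates on one image row at a time; the sketch below shows only the lifting form of the integer LeGall 5/3 filter on a single line, not the thesis's recursive, memory-optimised scheduling across rows.

```python
import numpy as np

def legall53_forward(line):
    """One level of the integer LeGall 5/3 DWT on a 1-D signal via lifting.

    Returns (lowpass, highpass) subbands, using symmetric border extension.
    """
    x = np.asarray(line, dtype=np.int64)
    even, odd = x[0::2].copy(), x[1::2].copy()
    # Predict step: detail = odd - floor((left + right) / 2)
    right = np.append(even[1:], even[-1])              # symmetric extension
    d = odd - ((even[:len(odd)] + right[:len(odd)]) >> 1)
    # Update step: approx = even + floor((d_left + d_right + 2) / 4)
    d_left = np.insert(d[:-1], 0, d[0])
    s = even + ((d_left[:len(even)] + np.append(d, d[-1])[:len(even)] + 2) >> 2)
    return s, d

line = np.array([10, 12, 14, 13, 11, 9, 8, 8])
s, d = legall53_forward(line)
print(s, d)   # smooth approximation, near-zero details on smooth data
```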
20

Pinto, Cristiane Zakimi Correia. "Qualidade de experiência de vídeo em aplicação interativa transmídia de televisão digital terrestre." Universidade de São Paulo, 2014. http://www.teses.usp.br/teses/disponiveis/3/3142/tde-28082015-103035/.

Abstract:
At the same time that transmedia productions are becoming more common, changes are occurring in Brazil that favor an increase in the daily use of interactivity. One fundamental change was the adoption of a terrestrial digital television system that, compared to the legacy analog broadcast system, provides more than better-quality sound and images on fixed television sets: it also makes it possible to receive TV signals on portable and mobile devices, besides enabling interactivity. Owing to the importance of television as a media platform in Brazil (television is present in 96.9% of Brazilian households), an analysis of perceived Quality of Experience (QoE) was made, based on the quality of the video obtained when using an interactive transmedia application for terrestrial digital television. An interactive application was developed in NCL, the standard declarative language of the Brazilian digital television system (ISDB-TB). In this application, a secondary video is loaded over broadband Internet (by streaming) at the same time as the main video is received via broadcast. A test scenario was created in which IP packet loss was introduced in a controlled way while the secondary video was being loaded, simulating what could occur on the Internet. The quality of the videos obtained under each packet-loss condition was analyzed with objective metrics, and the resulting data allowed the perceived QoE in each case to be inferred.
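A minimal example of an objective metric of the kind used to assess the degraded videos: PSNR over a frame pair, with an invented stripe of gray macroblocks standing in for a packet-loss artifact.

```python
import numpy as np

def psnr(reference, degraded, peak=255.0):
    """Peak signal-to-noise ratio in dB between two frames."""
    mse = np.mean((reference.astype(np.float64) - degraded.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(4)
frame = rng.integers(0, 256, (480, 640)).astype(np.uint8)
# Hypothetical packet loss: a horizontal stripe of macroblocks replaced by gray.
lossy = frame.copy()
lossy[192:208, :] = 128
print(round(psnr(frame, lossy), 2), "dB")
```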
21

"Energy and Quality-Aware Multimedia Signal Processing." Doctoral diss., 2012. http://hdl.handle.net/2286/R.I.15781.

Abstract:
Today's mobile devices have to support computation-intensive multimedia applications with a limited energy budget. In this dissertation, we present architecture-level and algorithm-level techniques that reduce the energy consumption of these devices with minimal impact on system quality. First, we present novel techniques to mitigate the effects of SRAM memory failures in JPEG2000 implementations operating at scaled voltages. We investigate error control coding schemes and propose an unequal error protection scheme tailored for JPEG2000 that reduces overhead without affecting the performance. Furthermore, we propose algorithm-specific techniques for error compensation that exploit the fact that in JPEG2000 the discrete wavelet transform outputs have larger values for low-frequency subband coefficients and smaller values for high-frequency subband coefficients. Next, we present the use of voltage overscaling to reduce the datapath power consumption of JPEG codecs. We propose an algorithm-specific technique which exploits the characteristics of the quantized coefficients after zig-zag scan to mitigate errors introduced by aggressive voltage scaling. Third, we investigate the effect of reducing the dynamic range for datapath energy reduction. We analyze the effect of truncation error and propose a scheme that estimates the mean value of the truncation error during the pre-computation stage and compensates for this error. Such a scheme is very effective for reducing the noise power in applications that are dominated by additions and multiplications, such as FIR filtering and transform computation. We also present a novel sum of absolute differences (SAD) scheme that is based on most-significant-bit truncation. The proposed scheme exploits the fact that most of the absolute difference (AD) calculations result in small values, and that most of the large AD values do not contribute to the SAD values of the blocks that are selected. Such a scheme is highly effective in reducing the energy consumption of the motion estimation and intra-prediction kernels in video codecs. Finally, we present several hybrid energy-saving techniques based on combinations of voltage scaling, computation reduction and dynamic range reduction that further reduce the energy consumption while keeping the performance degradation very low. For instance, a combination of computation reduction and dynamic range reduction for the Discrete Cosine Transform shows, on average, a 33% to 46% reduction in energy consumption while incurring only 0.5dB to 1.5dB loss in PSNR.
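A sketch of the most-significant-bit-truncation idea behind the SAD scheme described above (bit widths and block contents are invented; the dissertation's exact datapath is not reproduced):

```python
import numpy as np

def truncated_sad(block_a, block_b, kept_msbs=4, bits=8):
    """Sum of absolute differences computed on MSB-truncated pixels.

    Dropping (bits - kept_msbs) least-significant bits shrinks the datapath;
    most absolute differences are small, so the ranking of candidate blocks
    is usually preserved even though each AD loses precision.
    """
    shift = bits - kept_msbs
    return int(np.abs((block_a >> shift).astype(np.int32)
                      - (block_b >> shift).astype(np.int32)).sum())

rng = np.random.default_rng(6)
cur = rng.integers(0, 256, (16, 16), dtype=np.uint16)
best_cand = (cur + rng.integers(-3, 4, (16, 16))).clip(0, 255).astype(np.uint16)
bad_cand = rng.integers(0, 256, (16, 16), dtype=np.uint16)
print(truncated_sad(cur, best_cand), truncated_sad(cur, bad_cand))  # small vs large
```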
Ph.D., Electrical Engineering, 2012.
22

Wires, Kent Eugene. "Arithmetic units for digital signal processing and multimedia /." Diss., 2000. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:9995545.

23

Cheng, Le-Tien. "Implementation of Multimedia Digital Signal Processing Module Using Partial Reconfiguration Architecture." Thesis, 2009. http://ndltd.ncl.edu.tw/handle/13047088512639675017.

Abstract:
Master's thesis
Yuan Ze University
Department of Computer Science and Engineering
Academic year 97 (ROC calendar)
In this thesis, we propose the implementation of a multimedia digital image processing module using a partial reconfiguration method. We choose the discrete cosine transform (DCT) and the inverse discrete cosine transform (IDCT) as examples, and use the transpose property of the transform matrix to build an architecture in which the DCT and IDCT are implemented more easily via partial reconfiguration. We use Verilog HDL within the Xilinx ISE 9.2i design tool to complete this architecture, and then use an FPGA for functional simulation and verification of the computation results.
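The transpose property referred to above is that a separable 2-D DCT can be computed as two passes of the same 1-D transform with a transpose in between, so one hardware block can be reconfigured between the row pass, the column pass, and (with the transposed matrix) the IDCT. A small numerical sketch:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal type-II DCT matrix C, so that Y = C @ X @ C.T is the 2-D DCT."""
    k = np.arange(n).reshape(-1, 1)
    i = np.arange(n).reshape(1, -1)
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

C = dct_matrix()
X = np.arange(64, dtype=float).reshape(8, 8)
Y = C @ X @ C.T            # forward 2-D DCT: row pass, transpose, column pass
X_rec = C.T @ Y @ C        # the inverse reuses the same matrix, transposed
print(np.allclose(X, X_rec))   # True: C is orthonormal
```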
24

Chao, Wei-Min. "Design and Implementation of Pyramid Architecture for Image Signal Processing in Multimedia Applications." Thesis, 2011. http://ndltd.ncl.edu.tw/handle/48940862255957534055.

Abstract:
Doctoral dissertation
National Taiwan University
Graduate Institute of Electronics Engineering
Academic year 99 (ROC calendar)
Image-processing algorithms play an essential role in our daily life in entertainment, video communication, and computer vision. Various kinds of algorithms are linked together to carry out elaborate filtering that retrieves meaningful information from 2-D images. To run these algorithms in real time in modern VLSI designs, parallelism is the major architectural technique, leading to frameworks with many computing units and separate memory modules. Larger problems can be divided into several tasks, each operating on a piece of the data of interest, and solved simultaneously on such an architecture. When parallelizing, the usual bottleneck is getting data ready for the computing units while preserving data dependencies. In this dissertation, we address designing image-processing algorithms in tiles and discuss the benefits and issues. We then propose a pyramid architecture that allows tile-based image-processing algorithms to be applied efficiently in various kinds of systems. In applications using CMOS image sensors, an image-processing pipeline is crucial to generating high-quality images. The on-chip line buffer normally dominates the total area and power dissipation due to the filter-window buffering that is needed. As the image resolution and filter support increase, the area and power requirements increase accordingly. We propose the pyramid architecture to efficiently process a system in which the image pipeline sits between an image sensor and a video coding engine. By utilizing the features of the pyramid structure and a block-based video/image encoder, the proposed architecture is scalable from low to high resolutions and filter sizes. The input image is partitioned into floors of tiles to reduce frame-line buffers. Two computing schemes, immediate result reuse (IRR) and vertical snake scan (VSS), are utilized to reduce the overlapping redundant computation. A 90nm CMOS chip design with a 7×5 filter support for 3840×2160 Quad Full High Definition (QFHD) video at 30 frames/s is designed to demonstrate the power and area efficiency. Compared with the traditional architecture with frame-line buffers, the proposed design reduces power consumption by 25%, from 145mW to 108mW. The chip area is reduced by 65%, from 888K to 309K logic gates. The external memory bandwidth increases to 8286Mbits/s from 5972Mbits/s for YUV4:2:0 and from 7963Mbits/s for YUV4:2:2, and is reduced by 30% from 11944Mbits/s for YUV4:4:4 videos. In computer-vision applications using the scale-invariant feature transform (SIFT), the kernel size is key to building the Gaussian pyramid used to extract features in scale-space representations. SIFT implementations typically involve a high-power, general-purpose processor to keep high-quality results, but achieve less-than-real-time performance. For resource-limited embedded systems, the algorithm is simplified to a 3×3 or 7×7 kernel size such that a small 320×240 resolution, with a weakened feature-extraction capability, is feasible on a modern ASIC or FPGA platform. We carefully examine the algorithm and separate it into low-level and feature-level processes. A 90nm CMOS SIFT accelerator with a 15×11 filter support for 3 scales in an octave is designed by extending the single-pyramid architecture to a multiple-pyramid one. The design integrates 791K logic gates and 204K SRAM bits. The synthesis result shows it works at 270MHz to process 1280×960 octaves at 204 frames/s. Compared with the software implementation on a single-core processor, the speedup is 48.5 times while the algorithm quality degrades by 4.3% in repeatability. Compared with the frame-line-buffer architecture, the SRAM usage is reduced by 89.23%, from 1894 to 204 Kbits, the area efficiency is improved by 7.3 times, and the algorithm quality is improved by 34.8% in repeatability for a single-object test. The proposed design takes an additional 312 Mbytes/s of bandwidth to process 640×480 videos at 30 frames/s. The architecture provides the flexibility to trade off global bandwidth against local SRAM usage according to system constraints: the global bandwidth can be reduced by 79% to 66 Mbytes/s while the SRAM usage increases by 7.82 times to 1597 Kbits.
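The tile-based idea can be illustrated in a few lines: split the image into tiles, read each tile with enough surrounding pixels (a halo) to cover the filter support, filter, and keep the interior, so no full-frame line buffer is needed. The sketch below uses scipy's Gaussian filter as a stand-in for the image pipeline; tile and halo sizes are illustrative.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def filter_in_tiles(img, tile=64, halo=8, sigma=2.0):
    """Filter an image tile by tile with overlapping halos.

    Each tile is read with `halo` extra pixels on every side so the filtered
    interior matches full-frame filtering; only tile-sized buffers are needed.
    """
    h, w = img.shape
    out = np.empty_like(img, dtype=np.float64)
    for y in range(0, h, tile):
        for x in range(0, w, tile):
            y0, x0 = max(y - halo, 0), max(x - halo, 0)
            y1, x1 = min(y + tile + halo, h), min(x + tile + halo, w)
            patch = gaussian_filter(img[y0:y1, x0:x1].astype(np.float64),
                                    sigma, mode="nearest")
            out[y:y+tile, x:x+tile] = patch[y - y0:y - y0 + tile,
                                            x - x0:x - x0 + tile]
    return out

img = np.random.default_rng(8).integers(0, 256, (256, 256)).astype(np.float64)
full = gaussian_filter(img, 2.0, mode="nearest")
print(np.abs(full - filter_in_tiles(img)).max())  # tiny: halo covers filter support
```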
25

Wang, Mei-Rong. "Implementation of an Integrated Signal Processing Board for a Versatile Multimedia System Based on TMS320C80." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/06309175934824831413.

26

Wang, Meei-Rong. "Implementation of an Integrated Signal Processing Board for a Versatile Multimedia System Based on TMS320C80." Thesis, 1998. http://ndltd.ncl.edu.tw/handle/97266196783399958732.

Abstract:
Master's thesis
National Yunlin University of Science and Technology
Graduate Institute of Electronic and Information Engineering
Academic year 86 (ROC calendar)
This thesis presents the design and implementation of a DSP board based on the TMS320C80 multimedia video processor (MVP). We integrate various multimedia IC components to establish a PC-based versatile multimedia codec (VMMC) system. The architecture of this VMMC board comprises a high-performance C80 parallel programmable processor, memory (SRAM, DRAM, VRAM, EPROM), and peripheral devices (video capture/display, PCI interface, etc.). In this thesis, we concentrate on the design tasks of the C80 processor and the associated control circuits for memory and peripherals. A pair of Altera programmable logic devices (PLD), EPM7128, are employed to embed the control circuits designed to coordinate the memory and peripherals. We use the AHDL hardware description language to verify our design, and the ORCAD capture tool for schematic entry. The VMMC board is built as a mother board with a PCI interface and expansion slots suitable for mounting daughter boards, whereby special-purpose ASICs can be experimented with. Focusing on the core C80 processor, we present our research and development work as follows: 1. investigation of the basic functions of the peripheral devices; 2. design techniques and control flows for the C80 and peripherals; 3. flowcharts and illustrations of the software programs; 4. extensive simulation results and physical measurements of the system timing for the VMMC control circuits.
27

Zhu, Jihai. "Low-complexity block dividing coding method for image compression using wavelets : a thesis presented in partial fulfillment of the requirements for the degree of Master of Engineering in Computer Systems Engineering at Massey University, Palmerston North, New Zealand." 2007. http://hdl.handle.net/10179/704.

Abstract:
Image coding plays a key role in multimedia signal processing and communications. JPEG2000 is the latest image coding standard; it uses the EBCOT (Embedded Block Coding with Optimal Truncation) algorithm. EBCOT exhibits excellent compression performance, but with high complexity. The need to reduce this complexity while maintaining performance similar to EBCOT has inspired a significant amount of research activity in the image coding community. Within the development of image compression techniques based on wavelet transforms, the EZW (Embedded Zerotree Wavelet) and SPIHT (Set Partitioning in Hierarchical Trees) algorithms have played an important role. The EZW algorithm was the first breakthrough in wavelet-based image coding. The SPIHT algorithm achieves performance similar to EBCOT, but with fewer features. The other very important algorithm is SBHP (Sub-band Block Hierarchical Partitioning), which attracted significant investigation during the JPEG2000 development process. In this thesis, the history of the development of the wavelet transform is reviewed, and a discussion is presented on the implementation issues for wavelet transforms. The four main coding methods mentioned above for image compression using wavelet transforms are studied in detail. More importantly, the factors that affect coding efficiency are identified. The main contribution of this research is the introduction of a new low-complexity coding algorithm for image compression based on wavelet transforms. The algorithm is based on block dividing coding (BDC) with an optimised packet assembly. Our extensive simulation results show that the proposed algorithm outperforms JPEG2000 in lossless coding, even though it still leaves a narrow gap in lossy coding situations.
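A toy sketch of the block-dividing idea (a hypothetical simplification, not the thesis's BDC algorithm): recursively test whether a block of wavelet coefficients is significant against a threshold, emit one bit per block, and split only the significant blocks.

```python
import numpy as np

def block_dividing_pass(coeffs, threshold, min_size=2):
    """Emit significance bits for one bit-plane by recursive block splitting.

    A block whose max |coefficient| is below the threshold costs a single 0;
    a significant block costs a 1 and is split into four sub-blocks.
    """
    bits = []
    def visit(y, x, size):
        block = coeffs[y:y+size, x:x+size]
        if np.abs(block).max() < threshold:
            bits.append(0)                     # whole block insignificant
            return
        bits.append(1)
        if size > min_size:
            half = size // 2
            for dy in (0, half):
                for dx in (0, half):
                    visit(y + dy, x + dx, half)
    visit(0, 0, coeffs.shape[0])
    return bits

rng = np.random.default_rng(9)
coeffs = rng.laplace(0, 2, (16, 16))           # sparse, wavelet-like statistics
coeffs[0:4, 0:4] *= 20                         # energetic low-frequency corner
print(len(block_dividing_pass(coeffs, threshold=16)))
```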
28

Païs, Grégory. "Analyse conjointe texte et image pour la caractérisation de films d'animation." Phd thesis, 2010. http://tel.archives-ouvertes.fr/tel-00750619.

Abstract:
The rapid development of new information technologies has led in recent years to a considerable increase in the mass of data available to users. To exploit all of these data rationally and efficiently, the solution is to index these multimedia documents. This thesis is set in that context, and more specifically in that of indexing a digital collection of animated films, such as the one created by CITIA (Cité de l'image en mouvement). The main objective of this thesis is to propose a methodology for jointly taking into account information from image analysis and information from peri-texts (synopses, reviews, analyses, etc.). These two sources of information are of very different semantic levels, and their joint use allows a rich, semantic characterisation of video sequences. The automatic extraction of image descriptors is addressed in this work through the characterisation of the colours and of the activity of the film. The automatic analysis of the synopses characterises the theme of the film and, thanks to the actantial scenario, the action of the sequence. Finally, this information is used jointly to retrieve and locally describe action passages, and their fuzzy fusion yields the atmosphere of the film.