Dissertations / Theses on the topic 'Image representation'

Consult the top 50 dissertations / theses for your research on the topic 'Image representation.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate the bibliographic reference to the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever these are available in the metadata.

Browse dissertations / theses on a wide variety of disciplines and organise your bibliography correctly.

1

Engel, Claude. "Image et representation." Université Marc Bloch (Strasbourg) (1971-2008), 1989. http://www.theses.fr/1989STR20027.

2

Chintala, Venkatram Reddy. "Digital image data representation." Ohio : Ohio University, 1986. http://www.ohiolink.edu/etd/view.cgi?ohiou1183128563.

3

Moltisanti, Marco. "Image Representation using Consensus Vocabulary and Food Images Classification." Doctoral thesis, Università di Catania, 2016. http://hdl.handle.net/10761/3968.

Abstract:
Digital images are the result of many physical factors, such as illumination, point of view and thermal noise of the sensor. These elements may be irrelevant for a specific Computer Vision task; for instance, in the object detection task, the viewpoint and the color of the object should not be relevant in order to answer the question "Is the object present in the image?". Nevertheless, an image depends crucially on all such parameters and it is simply not possible to ignore them in analysis. Hence, finding a representation that, given a specific task, is able to keep the significant features of the image and discard the less useful ones is the first step to build a robust system in Computer Vision. One of the most popular models to represent images is the Bag-of-Visual-Words (BoW) model. Derived from text analysis, this model is based on the generation of a codebook (also called vocabulary) which is subsequently used to provide the actual image representation. Considering a set of images, the typical pipeline consists of: 1. Select a subset of images to be the training set for the model; 2. Extract the desired features from all the images; 3. Run a clustering algorithm on the features extracted from the training set: each cluster is a codeword, and the set containing all the clusters is the codebook; 4. For each feature point, find the closest codeword according to a distance function or metric; 5. Build a normalized histogram of the occurrences of each word. The choices made in the design phase strongly influence the final outcome of the representation. In this work we will discuss how to aggregate different kinds of features to obtain more powerful representations, presenting some state-of-the-art methods in the Computer Vision community. We will focus on Clustering Ensemble techniques, presenting the theoretical framework and a new approach (Section 2.5). Understanding food in everyday life (e.g., the recognition of dishes and the related ingredients, the estimation of quantity, etc.) is a problem which has been considered in different research areas due to its important impact on medical, social and anthropological aspects. For instance, an unhealthy diet can cause problems for a person's general health. Since health is strictly linked to diet, advanced Computer Vision tools to recognize food images (e.g., acquired with mobile/wearable cameras), as well as their properties (e.g., calories, volume), can help diet monitoring by providing useful information to the experts (e.g., nutritionists) to assess the food intake of patients (e.g., to combat obesity). On the other hand, the great diffusion of low-cost image acquisition devices embedded in smartphones allows people to take pictures of food and share them on the Internet (e.g., on social media); the automatic analysis of the posted images could provide information on the relationship between people and their meals and can be exploited by food retailers to better understand the preferences of a person for further recommendations of food and related products. Image representation plays a key role when trying to infer information about food items depicted in an image. We provide a deep review of the state of the art and propose two different novel representation techniques.
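To make the pipeline above concrete, here is a minimal sketch of a bag-of-visual-words codebook and histogram in Python, assuming local descriptors (e.g. SIFT) have already been extracted as NumPy arrays; the function names and parameter values are illustrative only and do not reproduce the consensus-vocabulary method of the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(training_descriptors, n_words=256, seed=0):
    """Cluster local descriptors from the training images; each cluster
    centre becomes one codeword of the visual vocabulary."""
    stacked = np.vstack(training_descriptors)            # (n_features, dim)
    return KMeans(n_clusters=n_words, n_init=10, random_state=seed).fit(stacked)

def bow_histogram(image_descriptors, codebook):
    """Assign each local descriptor to its closest codeword and build a
    normalized histogram of codeword occurrences."""
    words = codebook.predict(image_descriptors)
    hist = np.bincount(words, minlength=codebook.n_clusters).astype(float)
    return hist / max(hist.sum(), 1.0)

# Usage with hypothetical descriptor arrays:
# codebook = build_codebook([desc_img1, desc_img2], n_words=512)
# signature = bow_histogram(desc_query, codebook)
```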
4

Bowley, James. "Sparse image representation with encryption." Thesis, Aston University, 2013. http://publications.aston.ac.uk/20914/.

Abstract:
In this thesis we present an overview of sparse approximations of grey level images. The sparse representations are realized by classic Matching Pursuit (MP) based greedy selection strategies. One such technique, termed Orthogonal Matching Pursuit (OMP), is shown to be suitable for producing sparse approximations of images, if they are processed in small blocks. When the blocks are enlarged, the proposed Self Projected Matching Pursuit (SPMP) algorithm successfully renders results equivalent to OMP. A simple coding algorithm is then proposed to store these sparse approximations. This is shown, under certain conditions, to be competitive with the JPEG2000 image compression standard. An application termed image folding, which partially secures the approximated images, is then proposed. This is extended to produce a self-contained folded image, containing all the information required to perform image recovery. Finally a modified OMP selection technique is applied to produce sparse approximations of Red Green Blue (RGB) images. These RGB approximations are then folded with the self-contained approach.
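A minimal sketch of the Orthogonal Matching Pursuit loop described above, for a single vectorised image block over a dictionary with unit-norm columns; this is the generic OMP algorithm, not the SPMP variant or the coding scheme proposed in the thesis.

```python
import numpy as np

def omp(dictionary, signal, n_atoms):
    """Greedy OMP: repeatedly select the atom most correlated with the
    residual, then refit all selected atoms by least squares.
    `dictionary` has unit-norm atoms as columns; `signal` is a vectorised block."""
    residual = signal.copy()
    selected, coeffs = [], None
    for _ in range(n_atoms):
        correlations = dictionary.T @ residual
        selected.append(int(np.argmax(np.abs(correlations))))
        sub_dict = dictionary[:, selected]
        coeffs, *_ = np.linalg.lstsq(sub_dict, signal, rcond=None)
        residual = signal - sub_dict @ coeffs
    return dictionary[:, selected] @ coeffs, selected, coeffs
```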
5

Le, Huu Ton. "Improving image representation using image saliency and information gain." Thesis, Poitiers, 2015. http://www.theses.fr/2015POIT2287/document.

Abstract:
Nowadays, along with the development of multimedia technology, content based image retrieval (CBIR) has become an interesting and active research topic with an increasing number of application domains: image indexing and retrieval, face recognition, event detection, handwriting scanning, object detection and tracking, image classification, landmark detection... One of the most popular models in CBIR is Bag of Visual Words (BoVW), which is inspired by the Bag of Words model from the Information Retrieval field. In the BoVW model, images are represented by histograms of visual words from a visual vocabulary. By comparing image signatures, we can tell the difference between images. Image representation plays an important role in a CBIR system as it determines the precision of the retrieval results. In this thesis, the image representation problem is addressed. Our first contribution is to propose a new framework for visual vocabulary construction using information gain (IG) values. The IG values are computed by a weighting scheme combined with a visual attention model. Secondly, we propose to use a visual attention model to improve the performance of the proposed BoVW model. This contribution addresses the importance of salient key-points in the images by a study on the saliency of local feature detectors. Inspired by the results of this study, we use saliency as a weighting or as an additional histogram for image representation. The last contribution of this thesis to CBIR shows how our framework enhances the Bag of Visual Phrases (BoVP) model. Finally, a query expansion technique is employed to increase the retrieval scores of both the BoVW and BoVP models.
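As a rough illustration of the weighting ideas mentioned above, the sketch below builds a BoVW histogram in which each keypoint votes with its saliency value and each visual word can be reweighted by a precomputed per-word value such as information gain; the exact IG formulation and attention model of the thesis are not reproduced.

```python
import numpy as np

def weighted_bovw_histogram(word_ids, saliency, n_words, info_gain=None):
    """BoVW histogram in which each keypoint contributes its saliency value
    instead of a unit vote; bins may be reweighted by per-word values
    (e.g. information gain) before L2 normalisation."""
    hist = np.zeros(n_words)
    for word, weight in zip(word_ids, saliency):
        hist[word] += weight
    if info_gain is not None:
        hist *= info_gain
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist
```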
6

Elliott, Desmond. "Structured representation of images for language generation and image retrieval." Thesis, University of Edinburgh, 2015. http://hdl.handle.net/1842/10524.

Abstract:
A photograph typically depicts an aspect of the real world, such as an outdoor landscape, a portrait, or an event. The task of creating abstract digital representations of images has received a great deal of attention in the computer vision literature because it is rarely useful to work directly with the raw pixel data. The challenge of working with raw pixel data is that small changes in lighting can result in different digital images, which is not typically useful for downstream tasks such as object detection. One approach to representing an image is automatically extracting and quantising visual features to create a bag-of-terms vector. The bag-of-terms vector helps overcome the problems with raw pixel data but this unstructured representation discards potentially useful information about the spatial and semantic relationships between the parts of the image. The central argument of this thesis is that capturing and encoding the relationships between parts of an image will improve the performance of extrinsic tasks, such as image description or search. We explore this claim in the restricted domain of images representing events, such as riding a bicycle or using a computer. The first major contribution of this thesis is the Visual Dependency Representation: a novel structured representation that captures the prominent region–region relationships in an image. The key idea is that images depicting the same events are likely to have similar spatial relationships between the regions contributing to the event. This representation is inspired by dependency syntax for natural language, which directly captures the relationships between the words in a sentence. We also contribute a data set of images annotated with multiple human-written descriptions, labelled image regions, and gold-standard Visual Dependency Representations, and explain how the gold-standard representations can be constructed by trained human annotators. The second major contribution of this thesis is an approach to automatically predicting Visual Dependency Representations using a graph-based statistical dependency parser. A dependency parser is typically used in Natural Language Processing to automatically predict the dependency structure of a sentence. In this thesis we use a dependency parser to predict the Visual Dependency Representation of an image because we are working with a discrete image representation – that of image regions. Our approach can exploit features from the region annotations and the description to predict the relationships between objects in an image. In a series of experiments using gold-standard region annotations, we report significant improvements in labelled and unlabelled directed attachment accuracy over a baseline that assumes there are no relationships between objects in an image. Finally, we find significant improvements in two extrinsic tasks when we represent images as Visual Dependency Representations predicted from gold-standard region annotations. In an image description task, we show significant improvements in automatic evaluation measures and human judgements compared to state-of-the-art models that use either external text corpora or region proximity to guide the generation process. In the query-by-example image retrieval task, we show a significant improvement in Mean Average Precision and the precision of the top 10 images compared to a bag-of-terms approach. 
We also perform a correlation analysis of human judgements against automatic evaluation measures for the image description task. The automatic measures are standard measures adopted from the machine translation and summarization literature. The main finding of the analysis is that unigram BLEU is less correlated with human judgements than Smoothed BLEU, Meteor, or skip-bigram ROUGE.
7

Li, Xin. "Abstractive Representation Modeling for Image Classification." University of Cincinnati / OhioLINK, 2021. http://rave.ohiolink.edu/etdc/view?acc_num=ucin1623250959448677.

8

Mutelo, Risco Mulwani. "Biometric face image representation and recognition." Thesis, University of Newcastle upon Tyne, 2011. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.548004.

9

Wang, Hua. "Colour image representation by scalar variables." Thesis, Loughborough University, 1992. https://dspace.lboro.ac.uk/2134/10477.

Abstract:
A number of studies have shown that it is possible to use a colour codebook, which has a limited number of colours (typically 100-200), to replace the colour gamut and obtain a good quality reconstructed colour image. Thus colour images can be displayed on less expensive devices retaining high quality and can be stored in less space. However, a colour codebook is normally randomly arranged and the coded image, which is referred to as the index image, has no structure. This prevents the use of this kind of colour image representation in any further image processing. The objective of the research described in this thesis is to explore the possibility of making the index image meaningful, that is, the index image can retain the structure existing in the original full colour image, such as correlation and edges. In this way, a three band colour image represented by colour vectors can be transformed into a one band index image represented by scalar variables. To achieve the scalar representation of colour images, the colour codebook must be ordered to satisfy the following two conditions: (1) codewords representing similar colours must be close together in the codebook and (2) close codewords in the codebook must represent similar colours. Some effective methods are proposed for ordering the colour codebook. First, several grouping strategies are suggested for grouping the codewords representing similar colours together. Second, an ordering function is designed, which gives a quantitative measurement of the satisfaction of the two conditions of an ordered codebook. The codebook ordering is then iteratively refined by the ordering function. Finally, techniques, such as artificial codeword insertion, are developed to refine the codebook ordering further. A number of algorithms for colour codebook ordering have been tried to retain as much structure in the index image as possible. The efficiency of the algorithms for ordering a colour codebook has been tested by applying some image processing techniques to the index image. A VQ/DCT colour image coding scheme has been developed to test the possibility of compressing and decompressing the index image. Edge detection is applied to the index image to test how well the edges existing in the original colour image can be retained in the index image. Experiments demonstrate that the index image can retain a lot of the structure existing in the original colour image if the codebook is ordered by an appropriate ordering algorithm, such as the PNN-based/ordering function method together with artificial codeword insertion. Then further image processing techniques, such as image compression and edge detection, can be applied to the index image. In this way, colour image processing can be realized by index image processing in the same way as monochrome image processing. In this sense, a three-band colour image represented by colour vectors is transformed into a single band index image represented by scalar variables.
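The scalar (index) representation can be illustrated with the following sketch, which quantises an RGB image with k-means and orders the codebook by luminance so that nearby indices correspond to similar colours; this simple luminance ordering merely stands in for the PNN-based ordering-function method developed in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

def index_image(rgb, n_colours=128, seed=0):
    """Quantise an (H, W, 3) RGB image to a scalar index image plus an
    ordered colour codebook."""
    h, w, _ = rgb.shape
    pixels = rgb.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=n_colours, n_init=4, random_state=seed).fit(pixels)
    palette = km.cluster_centers_
    # Order codewords by luminance so that close indices mean similar colours.
    order = np.argsort(palette @ np.array([0.299, 0.587, 0.114]))
    rank = np.empty_like(order)
    rank[order] = np.arange(n_colours)
    index = rank[km.labels_].reshape(h, w)
    return index, palette[order]

# Reconstruction: ordered_palette[index] gives an (H, W, 3) approximation.
```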
10

Chang, William. "Representation Theoretical Methods in Image Processing." Scholarship @ Claremont, 2004. https://scholarship.claremont.edu/hmc_theses/160.

Abstract:
Image processing refers to the various operations performed on pictures that are digitally stored as an aggregate of pixels. One can enhance or degrade the quality of an image, artistically transform the image, or even find or recognize objects within the image. This paper is concerned with image processing, but from a very mathematical perspective, involving representation theory. The approach traces back to Cooley and Tukey's seminal paper on the Fast Fourier Transform (FFT) algorithm (1965). Recently, there has been a resurgence in investigating algebraic generalizations of this original algorithm with respect to different symmetry groups. My approach in the following chapters is as follows. First, I will give necessary tools from representation theory to explain how to generalize the Discrete Fourier Transform (DFT). Second, I will introduce wreath products and their application to images. Third, I will show some results from applying some elementary filters and compression methods to the spectra of images. Fourth, I will attempt to generalize my method to noncyclic wreath product transforms and apply it to images and three-dimensional geometries.
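The classical special case underlying this generalization, namely filtering or compressing an image through its Fourier spectrum, can be sketched as follows; the wreath-product transforms developed in the thesis are not shown.

```python
import numpy as np

def compress_via_dft(image, keep_fraction=0.05):
    """Keep only the largest-magnitude Fourier coefficients of a greyscale
    image and reconstruct: a crude spectral compression example."""
    spectrum = np.fft.fft2(image)
    magnitudes = np.abs(spectrum).ravel()
    k = max(1, int(keep_fraction * magnitudes.size))
    threshold = np.partition(magnitudes, -k)[-k]
    compressed = np.where(np.abs(spectrum) >= threshold, spectrum, 0)
    return np.real(np.fft.ifft2(compressed))
```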
11

Wang, John Yu An. "Layered image representation : identification of coherent components in image sequences." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/10759.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.
Includes bibliographical references (p. 105-111).
by John Yu An Wang.
Ph.D.
12

Khanna, Rajiv. "Image data compression using multiple bases representation." Thesis, This resource online, 1990. http://scholar.lib.vt.edu/theses/available/etd-12302008-063722/.

13

Begovic, Bojana. "Dictionary learning for scalable sparse image representation." Thesis, University of Strathclyde, 2016. http://oleg.lib.strath.ac.uk:80/R/?func=dbin-jump-full&object_id=26895.

Abstract:
The modern era of signal processing has developed many technical tools for recording and processing a large and growing amount of data, together with algorithms specialised for data analysis. This gives rise to new challenges in terms of data processing and modelling data representation. Fields ranging from the experimental sciences, astronomy, computer vision, neuroscience, mobile networks, etc., are all in constant search of scalable and efficient data processing tools which would enable more effective analysis of continuous video streams containing millions of pixels. Therefore, the question of digital signal representation is still of high importance, despite the fact that it has been the topic of a significant amount of work in the past. Moreover, developing new data processing methods also affects the quality of everyday life, where devices such as CCD sensors from digital cameras or cell phones are intensively used for entertainment purposes. Specifically, one of the novel processing tools is signal sparse coding, which represents signals as linear combinations of a few representational basis vectors, i.e., atoms from an overcomplete dictionary. Applications that employ sparse representation are many, such as denoising, compression, regularisation in inverse problems, feature extraction, and more. In this thesis we introduce and study a particular signal representation denoted as scalable sparse coding. It is based on a novel design for the dictionary learning algorithm, which has proven to be effective for scalable sparse representation of many modalities such as high motion video sequences, natural and solar images. The proposed algorithm is built upon the foundation of the K-SVD framework, originally designed to learn non-scalable dictionaries for natural images. The scalable dictionary learning design is mainly motivated by the main perception characteristics of the Human Visual System (HVS) mechanism. Specifically, its core structure relies on the exploitation of the spatial high-frequency image components and contrast variations in order to achieve visual scene object identification at all scalable levels. The implementation of HVS properties is carried out by introducing a semi-random Morphological Component Analysis (MCA) based initialisation of the scalable dictionary and the regularisation of its atom update mechanism. Subsequently, this enables scalable sparse image reconstruction. In general, dictionary learning for sparse representations leads to state-of-the-art image restoration results for several different problems in the field of image processing. Experiments in this thesis show that these are equally achievable by accommodating all dictionary elements to tailor the scalable data representation and reconstruction, hence modelling data that admit sparse representation in a novel manner. Furthermore, the achieved results demonstrate and validate the practicality of the proposed scheme, making it a promising candidate for many practical applications involving time scalable display, denoising and scalable compressive sensing (CS). The performed simulations include scalable sparse recovery for representation of static and dynamic data changing over time, such as video sequences and natural images. Lastly, we contribute novel approaches for scalable denoising and contrast enhancement (CE), applied to solar images corrupted with pixel-dependent Poisson and zero-mean additive white Gaussian noise.
Given that solar data contain noise introduced by charge-coupled devices within the on-board acquisition system, these artefacts have to be removed prior to image analysis. Thus, novel image denoising and contrast enhancement methods are necessary for solar preprocessing.
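For orientation, the sketch below shows a generic (non-scalable) patch-based dictionary-learning and sparse-coding step using scikit-learn, i.e. the kind of K-SVD-style baseline the thesis builds on; it is not the scalable, MCA-initialised algorithm proposed in the thesis, and the parameter values are illustrative.

```python
import numpy as np
from sklearn.decomposition import MiniBatchDictionaryLearning
from sklearn.feature_extraction.image import extract_patches_2d

def learn_patch_dictionary(grey_image, n_atoms=256, patch_size=(8, 8)):
    """Learn an overcomplete dictionary from random patches and return it
    together with the sparse codes of those patches."""
    patches = extract_patches_2d(grey_image, patch_size,
                                 max_patches=5000, random_state=0)
    data = patches.reshape(len(patches), -1).astype(float)
    data -= data.mean(axis=1, keepdims=True)      # remove per-patch DC offset
    learner = MiniBatchDictionaryLearning(n_components=n_atoms,
                                          transform_algorithm="omp",
                                          transform_n_nonzero_coefs=5,
                                          random_state=0)
    codes = learner.fit_transform(data)
    return learner.components_, codes
```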
14

Brugnot, Sylvain. "Towards a topology-based vector image representation." Thesis, University of Glasgow, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.425117.

15

Cham, Tat Jen. "Geometric representation and grouping of image curves." Thesis, University of Cambridge, 1996. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.627568.

16

Dalens, Théophile. "Learnable factored image representation for visual discovery." Thesis, Paris Sciences et Lettres (ComUE), 2019. http://www.theses.fr/2019PSLEE036.

Abstract:
The objective of this thesis is to develop tools for analyzing temporal image collections in order to identify and highlight visual trends across time. This thesis proposes an approach for analyzing unpaired visual data annotated with time stamps by generating how images would have looked if they were from different times. To isolate and transfer time-dependent appearance variations, we introduce a new trainable bilinear factor separation module. We analyze its relation to classical factored representations and concatenation-based auto-encoders. We demonstrate that this new module has clear advantages compared to standard concatenation when used in a bottleneck encoder-decoder convolutional neural network architecture. We also show that it can be inserted in a recent adversarial image translation architecture, enabling image transformation to multiple different target time periods using a single network.
17

Noble, Julia Alison. "Descriptions of image surfaces." Thesis, University of Oxford, 1989. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.238117.

18

Karmakar, Priyabrata. "Effective and efficient kernel-based image representations for classification and retrieval." Thesis, Federation University Australia, 2018. http://researchonline.federation.edu.au/vital/access/HandleResolver/1959.17/165515.

Abstract:
Image representation is a challenging task. In particular, in order to obtain better performances in different image processing applications such as video surveillance, autonomous driving, crime scene detection and automatic inspection, effective and efficient image representation is a fundamental need. The performance of these applications usually depends on how accurately images are classified into their corresponding groups or how precisely relevant images are retrieved from a database based on a query. Accuracy in image classification and precision in image retrieval depend on the effectiveness of image representation. Existing image representation methods have some limitations. For example, spatial pyramid matching, which is a popular method incorporating spatial information in image-level representation, has not been fully studied to date. In addition, the strengths of pyramid match kernel and spatial pyramid matching are not combined for better image matching. Kernel descriptors based on gradient, colour and shape overcome the limitations of histogram-based descriptors, but suffer from information loss, noise effects and high computational complexity. Furthermore, the combined performance of kernel descriptors has limitations related to computational complexity, higher dimensionality and lower effectiveness. Moreover, the potential of a global texture descriptor which is based on human visual perception has not been fully explored to date. Therefore, in this research project, kernel-based effective and efficient image representation methods are proposed to address the above limitations. An enhancement is made to spatial pyramid matching in terms of improved rotation invariance. This is done by investigating different partitioning schemes suitable to achieve rotation-invariant image representation and the proposal of a weight function for appropriate level contribution in image matching. In addition, the strengths of pyramid match kernel and spatial pyramid are combined to enhance matching accuracy between images. The existing kernel descriptors are modified and improved to achieve greater effectiveness, minimum noise effects, less dimensionality and lower computational complexity. A novel fusion approach is also proposed to combine the information related to all pixel attributes, before the descriptor extraction stage. Existing kernel descriptors are based only on gradient, colour and shape information. In this research project, a texture-based kernel descriptor is proposed by modifying an existing popular global texture descriptor. Finally, all the contributions are evaluated in an integrated system. The performances of the proposed methods are qualitatively and quantitatively evaluated on two to four different publicly available image databases. The experimental results show that the proposed methods are more effective and efficient in image representation than existing benchmark methods.
Doctor of Philosophy
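For reference, here is a minimal sketch of standard spatial-pyramid matching with a histogram-intersection kernel, the baseline scheme the thesis enhances. It assumes each keypoint has coordinates scaled to the unit square and an assigned visual-word id; the rotation-invariant partitioning and weighting proposed in the thesis are not shown.

```python
import numpy as np

def spatial_pyramid(points, word_ids, n_words, levels=3):
    """Concatenate BoVW histograms computed over 1x1, 2x2, 4x4, ... grids of
    the unit square, with the usual level weights (1/4, 1/4, 1/2 for 3 levels)."""
    points = np.asarray(points, dtype=float)       # (N, 2), coords in [0, 1]
    word_ids = np.asarray(word_ids, dtype=int)
    feats = []
    for level in range(levels):
        cells = 2 ** level
        weight = 1.0 / 2 ** (levels - 1) if level == 0 else 1.0 / 2 ** (levels - level)
        cx = np.clip((points[:, 0] * cells).astype(int), 0, cells - 1)
        cy = np.clip((points[:, 1] * cells).astype(int), 0, cells - 1)
        for j in range(cells):
            for i in range(cells):
                in_cell = word_ids[(cx == i) & (cy == j)]
                feats.append(weight * np.bincount(in_cell, minlength=n_words))
    vec = np.concatenate(feats).astype(float)
    return vec / max(vec.sum(), 1.0)

def intersection_kernel(a, b):
    """Histogram intersection similarity between two pyramid vectors."""
    return float(np.minimum(a, b).sum())
```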
19

Guha, Tanaya. "Image and video classification and image similarity measurement by learning sparse representation." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/45122.

Abstract:
Sparse representation of signals has recently emerged as a major research area. It is well-known that many natural signals can be sparsely represented using a properly chosen dictionary (e.g. formed of wavelets bases). A dictionary could be complete or overcomplete depending on whether the number of bases it contains is the same or greater than the dimensionality of the given signal. Traditionally, the use of predefined dictionaries has been prevalent in sparse analysis. However, a more generalized approach is to learn the dictionary from the signal itself. Learnt dictionaries are known to outperform predefined dictionaries in several applications. This thesis explores the application of sparse representations of signals obtained by learning overcomplete dictionaries for three applications: 1) classification of images and videos, 2) measurement of similarity between two images, and 3) assessment of perceptual quality of an image. This thesis first capitalizes on the natural discriminative ability of sparse representations to develop efficient classification algorithms. The proposed algorithms are employed in image-based face recognition and video-based human action recognition. They are shown to perform better than the state-of-the-art. The thesis then studies how to obtain a good measure of similarity between two images. Despite the long history of image similarity evaluation, open issues still exist. These include the need of developing generic similarity measures that do not assume any prior knowledge of the task at hand or the data type. This thesis develops a generic image similarity measure based on learning sparse representations. Successful application of the proposed measure to clustering, retrieval and classification of different types of images is demonstrated. The thesis then examines a highly promising approach to assess the perceptual quality of an image. This approach involves comparing the structural information of a possibly distorted image with that in its reference image. The extraction of the structural information that is important to our visual system is a challenging task. A sparse representation-based image quality assessment approach is proposed to address this issue. When compared with seven existing metrics, our method performs the best in three databases and ranks among the top three in the remaining three databases.
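The classification-by-reconstruction-residual idea behind sparse-representation classifiers can be sketched as follows: a test sample is sparse-coded over a dictionary whose atoms carry class labels and is assigned to the class whose atoms reconstruct it best. This is a generic sketch using scikit-learn's OMP solver, not the specific learnt-dictionary classifiers of the thesis.

```python
import numpy as np
from sklearn.linear_model import orthogonal_mp

def src_classify(dictionary, atom_labels, x, n_nonzero=10):
    """Sparse-code x over the whole dictionary, then assign it to the class
    whose atoms give the smallest reconstruction residual."""
    code = orthogonal_mp(dictionary, x, n_nonzero_coefs=n_nonzero)
    best_class, best_err = None, np.inf
    for c in np.unique(atom_labels):
        class_code = np.where(atom_labels == c, code, 0.0)
        err = np.linalg.norm(x - dictionary @ class_code)
        if err < best_err:
            best_class, best_err = c, err
    return best_class
```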
20

Basso, Andrea. "Image representation and coding based on zero-crossings /." Lausanne : EPFL, 1995. http://library.epfl.ch/theses/?nr=1379.

21

Ding, Ding, and Ding Ding. "Image Inpainting Based on Exemplars and Sparse Representation." Diss., The University of Arizona, 2017. http://hdl.handle.net/10150/625897.

Abstract:
Image inpainting is the process of recovering missing or deteriorated data within digital images and videos in a plausible way. It has become an important topic in the area of image processing, which leads to the understanding of the textural and structural information within the images. Image inpainting has many different applications, such as image/video restoration, text/object removal, texture synthesis, and transmission error concealment. In recent years, many algorithms have been developed to solve the image inpainting problem, which can be roughly grouped into four categories: partial differential equation-based inpainting, exemplar-based inpainting, transform domain inpainting, and hybrid image inpainting. However, the existing algorithms do not work well when the missing region to be inpainted is large, and when there is textural and structural information that needs to be recovered. To address this inpainting problem, we propose multiple algorithms: 1) perceptually aware image inpainting based on the perceptual-fidelity aware mean squared error metric, 2) image inpainting using nonlocal texture matching and nonlinear filtering, and 3) multiresolution exemplar-based image inpainting. The experimental results show that our proposed algorithms outperform other existing algorithms with respect to both qualitative analysis and observer studies when inpainting the missing regions of images.
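For a quick illustration of the inpainting task itself (not of the exemplar- and sparse-representation-based algorithms proposed in this work), OpenCV's built-in fast-marching inpainting can fill a masked region; the file names in the usage sketch are hypothetical.

```python
import cv2
import numpy as np

def simple_inpaint(image_bgr, mask):
    """Fill the pixels where mask > 0 using OpenCV's fast-marching
    inpainting (Telea); mask must be single-channel uint8."""
    mask = (mask > 0).astype(np.uint8)
    return cv2.inpaint(image_bgr, mask, 3, cv2.INPAINT_TELEA)

# Usage sketch (hypothetical file names):
# img = cv2.imread("damaged.png")
# mask = cv2.imread("mask.png", cv2.IMREAD_GRAYSCALE)
# restored = simple_inpaint(img, mask)
```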
22

Jain, Mihir. "Enhanced image and video representation for visual recognition." Phd thesis, Université Rennes 1, 2014. http://tel.archives-ouvertes.fr/tel-00996793.

Abstract:
The objective of this thesis is to improve image and video representations in order to obtain better visual recognition, both for specific entities and for more generic categories. The contributions of this thesis essentially concern methods for describing visual content. We propose methods for image retrieval by content or by textual queries, as well as methods for action recognition and localization in videos. For image retrieval, the contributions build on Hamming embedding methods. First, an asymmetric vector-to-code comparison is proposed to improve the original method, which is symmetric and relies on a code-to-code comparison. A classification method based on the matching of local descriptors is then proposed; it relies on classification performed in a similarity space associated with the Hamming embedding. For action recognition, the contributions mainly concern better ways of exploiting and representing motion. Finally, a localization method is proposed. It uses a partition of the video into super-voxels, which allows a 2D+t sampling of sequences of bounding boxes around spatio-temporal regions of interest. It relies in particular on a similarity criterion associated with motion. All the proposed methods are evaluated on public datasets. These experiments show that the methods proposed in this thesis improved upon the state of the art at the time of their publication.
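The symmetric versus asymmetric comparison mentioned above can be sketched as follows: descriptors are binarised by thresholding projected components, binary codes are compared with the Hamming distance, and the asymmetric variant compares the real-valued query directly to database codes. The margin-based penalty shown is a simplified illustration, not the exact scheme of the thesis; the projection and thresholds are assumed to be learnt beforehand.

```python
import numpy as np

def binarize(projected, thresholds):
    """Hamming embedding: one bit per projected descriptor component."""
    return (projected > thresholds).astype(np.uint8)

def hamming(code_a, code_b):
    """Symmetric code-to-code comparison."""
    return int(np.count_nonzero(code_a != code_b))

def asymmetric_distance(projected_query, code_db, thresholds):
    """Vector-to-code comparison: each disagreeing bit of the database code
    is penalised by how far the real-valued query component lies from the
    threshold (a simple margin-based variant, for illustration only)."""
    margins = projected_query - thresholds
    disagrees = (margins > 0).astype(np.uint8) != code_db
    return float(np.abs(margins[disagrees]).sum())
```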
23

Babel, Marie. "From image coding and representation to robotic vision." Habilitation à diriger des recherches, Université Rennes 1, 2012. http://tel.archives-ouvertes.fr/tel-00754550.

Abstract:
This habilitation thesis is first devoted to applications related to image representation and coding. If the image and video coding community has traditionally been focused on coding standardization processes, advanced services and functionalities have been designed in particular to match content delivery system requirements. In this sense, the complete transmission chain of encoded images has now to be considered. To characterize the ability of any communication network to ensure end-to-end quality, the notion of Quality of Service (QoS) has been introduced. First defined by the ITU-T as the set of technologies aiming at the degree of satisfaction of a user of the service, QoS is now rather restricted to solutions designed for monitoring and improving network performance parameters. However, end users are usually not bothered by pure technical performances but are more concerned about their ability to experience the desired content. In fact, QoS addresses network quality issues and provides indicators such as jitter, bandwidth, loss rate... An emerging research area is then focused on the notion of Quality of Experience (QoE, also abbreviated as QoX), which describes the quality perceived by end users. Within this context, QoE faces the challenge of predicting the behaviour of end users. When considering encoded images, many technical solutions can considerably enhance the end user experience, both in terms of services and functionalities, as well as in terms of final image quality. Ensuring the effective transport of data, and maintaining security while obtaining the desired end quality, remain key issues for video coding and streaming. The first parts of my work are then to be seen within this joint QoS/QoE context. On top of efficient coding frameworks, additional generic functionalities and services such as scalability, advanced entropy coders, content protection, error resilience, and image quality enhancement have been proposed. Related to advanced QoE services, such as Region of Interest definition, object tracking and recognition, we further closely studied pseudo-semantic representation. First designed for coding purposes, these representations aim at exploiting textural spatial redundancies at region level. Indeed, research over the past 30 years has provided numerous decorrelation tools that reduce the amount of redundancy across both spatial and temporal dimensions in image sequences. To this day, the classical video compression paradigm locally splits the images into blocks of pixels, and processes the temporal axis on a frame by frame basis, without any obvious continuity. Despite very high compression performances, such as those of the AVC and forthcoming HEVC standards, one may still advocate the use of alternative approaches. Disruptive solutions have also been proposed, and notably offer the ability to continuously process the temporal axis. However, they often rely on complex tools (e.g. wavelets, control grids) whose use is rather delicate in practice. We then investigate the viability of alternative representations that embed features of both classical and disruptive approaches. The objective is to exhibit the temporal persistence of the textural information, through a time-continuous description. At last, from this pseudo-semantic level of representation, texture tracking systems up to object tracking can be designed. From this technical solution, 3D object tracking is a logical outcome, in particular when considering robotic vision issues.
24

Plaisted, K. C. "Stimulus detection and representation : implications for search image." Thesis, University of Cambridge, 1994. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.360607.

25

Jia, Wei. "Image analysis and representation for textile design classification." Thesis, University of Dundee, 2011. https://discovery.dundee.ac.uk/en/studentTheses/c667f279-d7a6-4670-b23e-c9dbe2784266.

Abstract:
A good image representation is vital for image comparison and classification; it may affect the classification accuracy and efficiency. The purpose of this thesis was to explore novel and appropriate image representations. Another aim was to investigate these representations for image classification. Finally, novel features were examined for improving image classification accuracy. The images of interest to this thesis were textile design images. The motivation for analysing textile design images is to help designers browse images, fuel their creativity, and improve their design efficiency. In recent years, the bag-of-words model has been shown to be a good base for image representation, and there have been many attempts to go beyond this representation. Bag-of-words models have been used frequently in the classification of image data, due to good performance and simplicity. "Words" in images can have different definitions and are obtained through steps of feature detection, feature description, and codeword calculation. The model represents an image as an orderless collection of local features. However, discarding the spatial relationships of local features limits the power of this model. This thesis exploited novel image representations, the bag of shapes and region label graphs models, which were based on the bag-of-words model. In both models, an image was represented by a collection of segmented regions, and each region was described by shape descriptors. In the latter model, graphs were constructed to capture the spatial information between groups of segmented regions, and graph features were calculated based on graph theory. Novel elements include the use of MRFs to extract printed designs and woven patterns from textile images, utilisation of the extractions to form bag of shapes models, and construction of region label graphs to capture the spatial information. The extraction of textile designs was formulated as a pixel labelling problem. Algorithms for MRF optimisation and re-estimation were described and evaluated. A method for quantitative evaluation was presented and used to compare the performance of MRFs optimised using alpha-expansion and iterated conditional modes (ICM), both with and without parameter re-estimation. The results were used in the formation of the bag of shapes and region label graphs models. The bag of shapes model was a collection of MRF-segmented regions, and the shape of each region was described with generic Fourier descriptors. Each image was represented as a bag of shapes. A simple yet competitive classification scheme based on nearest neighbour class-based matching was used. Classification performance was compared to that obtained when using bags of SIFT features. To capture the spatial information, region label graphs were constructed to obtain graph features. Regions with the same label were treated as a group and each group was associated uniquely with a vertex in an undirected, weighted graph. Each region group was represented as a bag of shape descriptors. Edges in the graph denoted either the extent to which the groups' regions were spatially adjacent or the dissimilarity of their respective bags of shapes. Series of unweighted graphs were obtained by removing edges in order of weight. Finally, an image was represented using its shape descriptors along with features derived from the chromatic numbers or domination numbers of the unweighted graphs and their complements. Linear SVM classifiers were used for classification.
Experiments were carried out on data from Liberty Art Fabrics, which consisted of more than 10,000 complicated images, mainly of printed textile designs and woven patterns. The experimental data were classified into seven classes manually by assigning each image a text descriptor based on content or design type. The seven classes were floral, paisley, stripe, leaf, geometric, spot, and check. The results showed that reasonable and interesting regions were obtained from MRF segmentation, in which alpha-expansion with parameter re-estimation performs better than alpha-expansion without parameter re-estimation or ICM. This result was promising not only for textile CAD (Computer-Aided Design), allowing the textile image to be redesigned, but also for image representation. It was also found that the bag of shapes model based on MRF segmentation can obtain classification accuracy comparable to bags of SIFT features in the framework of nearest neighbour class-based matching. Finally, the results indicated that incorporating graph features extracted by constructing region label graphs can improve the classification accuracy compared to both the bag of shapes and bag of SIFT models.
26

Athavale, Prashant Vinayak. "Novel integro-differential schemes for multiscale image representation." College Park, Md.: University of Maryland, 2009. http://hdl.handle.net/1903/9691.

Abstract:
Thesis (Ph.D.) -- University of Maryland, College Park, 2009.
Thesis research directed by: Applied Mathematics & Statistics, and Scientific Computation Program. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
27

Lundberg, Simon. "Architecture as Image." Thesis, KTH, Arkitektur, 2019. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-263059.

Abstract:
My investigation has been carried out in the field of architectural representation. The aim of this project was to explore architectural imagery beyond the instrumental use of representation, with the intent to pursue the autonomy of the image, but also the dependency of architecture on spectators and interpretations, myths and images. My method is hand-made drawings and an approach to image making in four parts. The steps are observation (to depict actual buildings), dissection (to break apart and to analyze), assembling (to modify, distort, put together) and immersion (make credible, make animate). The project relates to the built environment but is not meant as a proposal. In a series of drawings, I have tried to create a playful approach to the city and a site. As motifs and motivation I have studied three areas in Stockholm: Södra stationsområdet, Skarpnäck and Starrbäcksängen. They were all residential areas constructed at the end of the 1980s and beginning of the 1990s and are heavily influenced by postmodern ideals of reconstructing a pre-functionalist city. The method aims to extract details and aspects of the existing architecture and fit them together, many times over, in order to inspire and produce imagery that is loaded with atmosphere and storytelling. In doing so I try to show that images are never just instructional manuals in the hands of architects, and to point to the potential that lies within this realization.
28

Cho, Maengsub. "Biological object representation for identification." Thesis, Loughborough University, 1992. https://dspace.lboro.ac.uk/2134/33236.

Abstract:
This thesis is concerned with the problem of how to represent a biological object for computerised identification. Images of biological objects have generally been characterised by shapes and colour patterns in the biology domain and the pattern recognition domain. Thus, it is necessary to represent the biological object using descriptors for the shape and the colour pattern. The basic requirements which a description method should satisfy are invariance to the scale, location and orientation of an object; direct involvement in the identification stage; and easy assessment of results. The major task dealt with in this thesis was to develop a shape-description method and a colour-pattern description method which could accommodate all of the basic requirements and could be generally applied in both domains. In the colour-pattern description stage, an important task was to segment a colour image into meaningful segments. The most efficient method for this task is to apply Cluster Analysis. In the image analysis and pattern recognition domains, the majority of approaches to this method have been constrained by the problem of dealing with inordinate amounts of data, i.e. a large number of pixels in an image. In order to apply Cluster Analysis directly to colour image segmentation, a data structure, the Auxiliary Means, is developed in this thesis.
29

Rehme, Koy D. "An Internal Representation for Adaptive Online Parallelization." Diss., CLICK HERE for online access, 2009. http://contentdm.lib.byu.edu/ETD/image/etd2939.pdf.

30

Tran, Thi Quynh Nhi. "Robust and comprehensive joint image-text representations." Thesis, Paris, CNAM, 2017. http://www.theses.fr/2017CNAM1096/document.

Abstract:
This thesis investigates the joint modeling of the visual and textual content of multimedia documents to address cross-modal problems. Such tasks require the ability to match information across modalities. A common representation space, obtained e.g. by Kernel Canonical Correlation Analysis, on which images and text can both be represented and directly compared, is a generally adopted solution. Nevertheless, such a joint space still suffers from several deficiencies that may hinder the performance of cross-modal tasks. An important contribution of this thesis is therefore to identify two major limitations of such a space. The first limitation concerns information that is poorly represented on the common space yet very significant for a retrieval task. The second limitation consists in a separation between modalities on the common space, which leads to coarse cross-modal matching. To deal with the first limitation concerning poorly-represented data, we put forward a model which first identifies such information and then finds ways to combine it with data that is relatively well-represented on the joint space. Evaluations on text illustration tasks show that by appropriately identifying and taking such information into account, the results of cross-modal retrieval can be strongly improved. The major work in this thesis aims to cope with the separation between modalities on the joint space to enhance the performance of cross-modal tasks. We propose two representation methods for bi-modal or uni-modal documents that aggregate information from both the visual and textual modalities projected on the joint space. Specifically, for uni-modal documents we suggest a completion process relying on an auxiliary dataset to find the corresponding information in the absent modality and then use such information to build a final bi-modal representation for a uni-modal document. Evaluations show that our approaches achieve state-of-the-art results on several standard and challenging datasets for cross-modal retrieval or bi-modal and cross-modal classification.
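A minimal sketch of the kind of joint space this work starts from: fitting Canonical Correlation Analysis on paired image and text features and measuring cross-modal similarity in the shared space. scikit-learn's linear CCA is used here in place of the kernelised variant, and the feature matrices are assumed to be paired row by row.

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def fit_joint_space(image_feats, text_feats, n_components=64):
    """Fit linear CCA on paired (image, text) training features."""
    return CCA(n_components=n_components).fit(image_feats, text_feats)

def cross_modal_similarity(cca, image_feats, text_feats):
    """Project both modalities into the common space and return the matrix
    of cosine similarities (rows: images, columns: texts)."""
    img_z, txt_z = cca.transform(image_feats, text_feats)
    img_z /= np.linalg.norm(img_z, axis=1, keepdims=True)
    txt_z /= np.linalg.norm(txt_z, axis=1, keepdims=True)
    return img_z @ txt_z.T
```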
31

Hutchins, Brett. "Bradman : representation, meaning and Australian culture /." [St. Lucia, Qld.], 2001. http://www.library.uq.edu.au/pdfserve.php?image=thesisabs/absthe16171.pdf.

32

Eberhardt, Joerg. "Digital image based surface modelling." Thesis, Coventry University, 1998. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.245098.

33

Lewis, Elise C. "Image Representation and Interactivity: An Exploration of Utility Values, Information-Needs and Image Interactivity." Thesis, University of North Texas, 2011. https://digital.library.unt.edu/ark:/67531/metadc84240/.

Abstract:
This study was designed to explore the relationships between users and interactive images. Three factors were identified that provided different perspectives on how users interact with images: image utility, information-need, and images with varying levels of interactivity. The study used a mixed methodology to gain a more comprehensive understanding of the selected factors. An image survey was used to introduce the participants to the images and to record utility values when they were given a specific task. The interviews allowed participants to provide details about their experiences with the interactive images and how these affected their utility values. Findings from the study showed that images offering the highest level of interactivity do not always generate the highest utility. Factors such as personal preference, specifically speed and control of the image, affect the usefulness of the image. Participants also provided a variety of uses where access to interactive images would be beneficial. Educational settings and research tools are a few examples of uses provided by participants.
34

Valero, Valbuena Silvia. "Hyperspectral image representation and processing with binary partition trees." Doctoral thesis, Universitat Politècnica de Catalunya, 2012. http://hdl.handle.net/10803/130832.

Abstract:
The optimal exploitation of the information provided by hyperspectral images requires the development of advanced image processing tools. Therefore, under the title Hyperspectral Image Representation and Processing with Binary Partition Trees, this PhD thesis proposes the construction and the processing of a new region-based hierarchical hyperspectral image representation: the Binary Partition Tree (BPT). This hierarchical region-based representation can be interpreted as a set of hierarchical regions stored in a tree structure. Hence, the Binary Partition Tree succeeds in presenting: (i) the decomposition of the image in terms of coherent regions and (ii) the inclusion relations of the regions in the scene. Based on region-merging techniques, the construction of the BPT is investigated in this work by studying hyperspectral region models and the associated similarity metrics. As a matter of fact, the very high dimensionality and the complexity of the data require the definition of specific region models and similarity measures. Once the BPT is constructed, the fixed tree structure allows efficient and advanced application-dependent techniques to be implemented on it. The application-dependent processing of the BPT is generally implemented through a specific pruning of the tree. Accordingly, some pruning techniques are proposed and discussed according to different applications. This Ph.D. is focused in particular on segmentation, object detection and classification of hyperspectral imagery. Experimental results on various hyperspectral data sets demonstrate the interest and the good performance of the BPT representation.
APA, Harvard, Vancouver, ISO, and other styles
35

Forsberg, Daniel. "An efficient wavelet representation for large medical image stacks." Thesis, Linköping University, Department of Electrical Engineering, 2007. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-8394.

Full text
Abstract:

Like the rest of society, modern health care has to deal with an ever-increasing information flow. Imaging modalities such as CT, MRI, US, SPECT and PET just keep producing more and more data. CT and MRI in particular, with their 3D image stacks, cause problems in terms of how to handle these data sets effectively. Usually a PACS is used to manage the information flow. Since a PACS is often implemented with a server-client setup, the management of these large data sets requires an efficient representation of medical image stacks that minimizes the amount of data transmitted between server and client and that efficiently supports the workflow of a practitioner.

In this thesis an efficient wavelet representation for large medical image stacks is proposed for use in a PACS. The representation supports features such as lossless viewing, random access, ROI viewing, scalable resolution, thick-slab viewing and progressive transmission. All of these features are believed to be essential to form an efficient tool for navigation and reconstruction of an image stack.

The proposed wavelet representation has also been implemented and found to be better, in terms of memory allocation and the amount of data transmitted between server and client, than prior solutions. Performance tests of the implementation have also shown the proposed wavelet representation to have good computational performance.
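The scalable-resolution access described above can be illustrated with a small Python sketch based on the PyWavelets package (an assumption; the thesis implementation is its own): each slice of the stack is wavelet-decomposed once, and a low-resolution preview is reconstructed by keeping only the coarsest subbands, which is the basic mechanism behind progressive transmission and thick-slab viewing.

```python
import numpy as np
import pywt

def decompose_stack(stack, wavelet="haar", levels=3):
    """stack: 3-D array (slices, rows, cols) -> list of per-slice coefficient lists."""
    return [pywt.wavedec2(sl, wavelet, level=levels) for sl in stack]

def reconstruct_at_resolution(coeffs, keep_detail_levels, wavelet="haar"):
    """Reconstruct one slice from the approximation plus only the coarsest
    `keep_detail_levels` detail bands; finer bands are zeroed, emulating a
    low-resolution preview that can be refined progressively."""
    reduced = [coeffs[0]]
    for i, (cH, cV, cD) in enumerate(coeffs[1:]):
        if i < keep_detail_levels:
            reduced.append((cH, cV, cD))
        else:
            reduced.append((np.zeros_like(cH), np.zeros_like(cV), np.zeros_like(cD)))
    return pywt.waverec2(reduced, wavelet)

# Usage: a synthetic 4-slice stack of 64x64 images, previewed at coarse resolution.
stack = np.random.rand(4, 64, 64)
coeffs = decompose_stack(stack)
preview = reconstruct_at_resolution(coeffs[0], keep_detail_levels=0)
```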

APA, Harvard, Vancouver, ISO, and other styles
36

Houghton, Michael Kevin. "Image feature matching using polynomial representation of chain codes." Thesis, University of Central Lancashire, 1993. http://clok.uclan.ac.uk/20359/.

Full text
Abstract:
In this thesis the development of a novel descriptor for boundary images represented in a chain code format is reported. This descriptor is based on a truncated series of orthogonal polynomials used to represent a piecewise continuous function derived from a chain code. This piecewise continuous function is generated from a chain code by mapping individual chain links onto real numbers. A variety of alternative mappings of chain links onto real numbers are evaluated, along with two specific families of orthogonal polynomials, namely Legendre polynomials and Chebyshev polynomials. The performance of this series descriptor for chain codes is evaluated initially by applying it to the problem of locating short chains within a long chain, and then by extending the application to matching features from pairs of similar images, where the descriptor is critically evaluated. In addition, a formal algebra is developed that provides the rule base enabling the transformation and manipulation of chain-encoded boundary images. The foundation of this algebra is the notion that the labelling of the directions of an 8-connected chain code is essentially arbitrary, and seven other, distinct and consistent labellings can be distinguished.
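A rough Python sketch of the descriptor idea follows (an illustration only, with a deliberately naive mapping of chain links to real numbers; the thesis evaluates several mappings and both Legendre and Chebyshev bases):

```python
import numpy as np
from numpy.polynomial import chebyshev

def chain_code_descriptor(chain, n_coeffs=8):
    """chain: sequence of ints in 0..7 (8-connected directions).
    Returns the first n_coeffs Chebyshev coefficients of the mapped function."""
    y = np.asarray(chain, dtype=float)          # naive mapping: the direction label itself
    x = np.linspace(-1.0, 1.0, len(y))          # Chebyshev polynomials live on [-1, 1]
    return chebyshev.chebfit(x, y, n_coeffs - 1)

def match_score(desc_a, desc_b):
    """Smaller is more similar; Euclidean distance between truncated series."""
    return float(np.linalg.norm(desc_a - desc_b))

# Usage: compare a short boundary fragment against a relabelled version of itself.
a = [0, 0, 1, 2, 2, 3, 4, 4, 5, 6, 6, 7]
b = [(d + 1) % 8 for d in a]                    # one of the alternative labellings
print(match_score(chain_code_descriptor(a), chain_code_descriptor(b)))
```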
APA, Harvard, Vancouver, ISO, and other styles
37

Turcat, Jean-Philippe. "Object-based content representation and analysis for image retrieval." Thesis, Staffordshire University, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.394142.

Full text
APA, Harvard, Vancouver, ISO, and other styles
38

Chan, Kin-lok, and 陳健樂. "Video object coding and relighting for image base representation." Thesis, The University of Hong Kong (Pokfulam, Hong Kong), 2004. http://hub.hku.hk/bib/B30221961.

Full text
APA, Harvard, Vancouver, ISO, and other styles
39

Govindarajan, Hariprasath. "Self-Supervised Representation Learning for Content Based Image Retrieval." Thesis, Linköpings universitet, Statistik och maskininlärning, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-166223.

Full text
Abstract:
Automotive technologies and fully autonomous driving have seen tremendous growth in recent times and have benefitted from extensive deep learning research. State-of-the-art deep learning methods are largely supervised and require labelled data for training. However, the annotation process for image data is time-consuming and costly in terms of human effort. It is of interest to find informative samples for labelling by Content Based Image Retrieval (CBIR). Generally, a CBIR method takes a query image as input and returns a set of images that are semantically similar to the query image. The image retrieval is achieved by transforming images to feature representations in a latent space, where it is possible to reason about image similarity in terms of image content. In this thesis, a self-supervised method is developed to learn feature representations of road scene images. The self-supervised method learns feature representations for images by adapting intermediate convolutional features from an existing deep Convolutional Neural Network (CNN). A contrastive approach based on Noise Contrastive Estimation (NCE) is used to train the feature learning model. For complex images like road scenes, where multiple image aspects can occur simultaneously, it is important to embed all the salient image aspects in the feature representation. To achieve this, the output feature representation is obtained as an ensemble of feature embeddings which are learned by focusing on different image aspects. An attention mechanism is incorporated to encourage each ensemble member to focus on different image aspects. For comparison, a self-supervised model without attention is considered, and a simple dimensionality reduction approach using SVD is treated as the baseline. The methods are evaluated on nine different evaluation datasets using CBIR performance metrics. The datasets correspond to different image aspects and concern the images at different spatial levels: global, semi-global and local. The feature representations learned by self-supervised methods are shown to perform better than the SVD approach. Taking into account that no labelled data is required for training, learning representations for road scene images using self-supervised methods appears to be a promising direction. Usage of multiple query images to emphasize a query intention is investigated and a clear improvement in CBIR performance is observed. It is inconclusive whether the addition of an attention mechanism impacts CBIR performance. The attention method shows some positive signs based on qualitative analysis and also performs better than other methods for one of the evaluation datasets containing a local aspect. This method for learning feature representations is promising but requires further research involving more diverse and complex image aspects.
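For readers unfamiliar with the contrastive objective mentioned here, the following is a generic InfoNCE-style loss sketched in Python; it is not the exact loss or architecture used in the thesis, and all names are illustrative.

```python
import numpy as np

def info_nce_loss(query, positive, negatives, temperature=0.07):
    """query, positive: (d,) embeddings; negatives: (n, d) noise/negative embeddings.
    The query is pulled towards the positive and pushed away from the negatives."""
    q = query / np.linalg.norm(query)
    pos = positive / np.linalg.norm(positive)
    neg = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    logits = np.concatenate([[q @ pos], neg @ q]) / temperature
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])                    # the positive sits at index 0

# Usage with random 128-d embeddings and 16 negatives.
rng = np.random.default_rng(0)
loss = info_nce_loss(rng.normal(size=128), rng.normal(size=128),
                     rng.normal(size=(16, 128)))
print(float(loss))
```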
APA, Harvard, Vancouver, ISO, and other styles
40

Siméoni, Oriane. "Robust image representation for classification, retrieval and object discovery." Thesis, Rennes 1, 2020. https://ged.univ-rennes1.fr/nuxeo/site/esupversions/415eb65b-d5f7-4be7-85e6-c2ecb2aba4dc.

Full text
Abstract:
Neural network representations have proved to be relevant for many computer vision tasks such as image classification, object detection, segmentation and instance-level image retrieval. A network is trained for one particular task and requires a large amount of labelled data. We propose in this thesis solutions to extract the most information with the least supervision. Focusing first on the classification task, we examine the active learning process in the context of deep learning and show that combining it with semi-supervised and unsupervised techniques greatly boosts results. We then investigate the image retrieval task, and in particular we exploit the spatial localization information available "for free" in CNN feature maps. We first propose to represent an image by a collection of affine local features detected within activation maps, which are memory-efficient and robust enough to perform spatial matching. Then, again extracting information from feature maps, we discover objects of interest in images of a dataset and gather their representations in a nearest neighbor graph. Using a centrality measure on the graph, we are able to construct a saliency map per image which focuses on the repeating objects and allows us to compute a global representation excluding clutter and background.
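The nearest-neighbour-graph and centrality idea in this abstract can be sketched in a few lines of Python (a simplified illustration, not the thesis pipeline): object descriptors are linked in a k-NN similarity graph and scored by eigenvector centrality computed with power iteration, so descriptors of frequently repeating objects receive high scores that could drive a per-image saliency map.

```python
import numpy as np

def knn_centrality(descriptors, k=5, iters=100):
    """descriptors: (n, d) array; returns an (n,) centrality score per object."""
    d = descriptors / np.linalg.norm(descriptors, axis=1, keepdims=True)
    sim = d @ d.T
    np.fill_diagonal(sim, -np.inf)               # no self-edges
    # Adjacency of the similarity graph: keep the k strongest edges per node.
    adj = np.zeros_like(sim)
    for i, row in enumerate(sim):
        nbrs = np.argsort(row)[-k:]
        adj[i, nbrs] = row[nbrs]
    adj = np.clip(np.maximum(adj, adj.T), 0, None)  # symmetrise, keep non-negative
    # Eigenvector centrality by power iteration.
    c = np.ones(len(adj)) / len(adj)
    for _ in range(iters):
        c = adj @ c
        c /= np.linalg.norm(c) + 1e-12
    return c

# Usage: 20 random object descriptors; repeated objects would cluster and score high.
scores = knn_centrality(np.random.rand(20, 64))
```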
APA, Harvard, Vancouver, ISO, and other styles
41

Bolt, Barbara. "Art beyond representation : the performative power of the image /." London : I. B. Tauris, 2004. http://catalogue.bnf.fr/ark:/12148/cb39238996v.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Giles, Paul A. "Iterated function systems and shape representation." Thesis, Durham University, 1990. http://etheses.dur.ac.uk/6188/.

Full text
Abstract:
We propose the use of iterated function systems as an isomorphic shape representation scheme for use in a machine vision environment. A concise description of the basic theory and salient characteristics of iterated function systems is presented and from this we develop a formal framework within which to embed a representation scheme. Concentrating on the problem of obtaining automatically generated two-dimensional encodings we describe implementations of two solutions. The first is based on a deterministic algorithm and makes simplifying assumptions which limit its range of applicability. The second employs a novel formulation of a genetic algorithm and is intended to function with general data input. Keywords: Machine Vision, Shape Representation, Iterated Function Systems, Genetic Algorithms.
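To make the representation concrete, the following Python sketch renders the attractor of a hand-coded iterated function system with the classic chaos game; the thesis is about inferring such maps automatically (deterministically or with a genetic algorithm), which this example does not attempt.

```python
import numpy as np

def chaos_game(maps, n_points=20000, seed=0):
    """maps: list of (A, b) affine contractions x -> A @ x + b; returns sampled points
    of the attractor obtained by random iteration."""
    rng = np.random.default_rng(seed)
    x = np.zeros(2)
    pts = np.empty((n_points, 2))
    for i in range(n_points):
        A, b = maps[rng.integers(len(maps))]     # pick one contraction at random
        x = A @ x + b
        pts[i] = x
    return pts[100:]                             # drop the transient burn-in points

# The classic three-map IFS whose attractor is the Sierpinski triangle.
half = np.eye(2) * 0.5
sierpinski = [(half, np.array([0.0, 0.0])),
              (half, np.array([0.5, 0.0])),
              (half, np.array([0.25, 0.5]))]
points = chaos_game(sierpinski)
```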
APA, Harvard, Vancouver, ISO, and other styles
43

Brostow, Gabriel Julian. "Novel Skeletal Representation for Articulated Creatures." Diss., Georgia Institute of Technology, 2004. http://hdl.handle.net/1853/5236.

Full text
Abstract:
This research examines an approach for capturing 3D surface and structural data of moving articulated creatures. Given the task of non-invasively and automatically capturing such data, a methodology and the associated experiments are presented that apply to multi-view videos of the subject's motion. Our thesis states: a functional structure and the time-varying surface of an articulated creature subject are contained in a sequence of its 3D data. A functional structure is one example of the possible arrangements of internal mechanisms (kinematic joints, springs, etc.) that is capable of performing the motions observed in the input data. Volumetric structures are frequently used as shape descriptors for 3D data. The capture of such data is being facilitated by developments in multi-view video and range scanning, extending to subjects that are alive and moving. In this research, we examine vision-based modeling and the related representation of moving articulated creatures using Spines. We define a Spine as a branching axial structure representing the shape and topology of a 3D object's limbs, and capturing the limbs' correspondence and motion over time. The Spine concept builds on skeletal representations often used to describe the internal structure of an articulated object and its significant protrusions. Our representation of a Spine provides enhancements over a 3D skeleton. These enhancements form temporally consistent limb hierarchies that contain correspondence information about real motion data. We present a practical implementation that approximates a Spine's joint probability function to reconstruct Spines for synthetic and real subjects that move. In general, our approach combines the objectives of generalized cylinders, 3D scanning, and markerless motion capture to generate baseline models from real puppets, animals, and human subjects.
APA, Harvard, Vancouver, ISO, and other styles
44

Meine, Hans. "The GeoMap representation: on topologically correct sub-pixel image analysis /." Heidelberg : Aka, 2009. http://opac.nebis.ch/cgi-bin/showAbstract.pl?u20=9783898383233.

Full text
APA, Harvard, Vancouver, ISO, and other styles
45

Gordo, Albert. "Document Image Representation, Classification and Retrieval in Large-Scale Domains." Doctoral thesis, Universitat Autònoma de Barcelona, 2013. http://hdl.handle.net/10803/117445.

Full text
Abstract:
Despite the “paperless office” ideal that started in the decade of the seventies, businesses still strive against an increasing amount of paper documentation. Although many businesses are making an effort to transform some of their internal documentation into a digital form with no intrinsic need for paper, communication with other businesses and clients in a purely digital form is a much more complex problem due to the lack of adopted standards. Companies receive huge amounts of paper documentation that need to be analyzed and processed, mostly in a manual way. A solution for this task consists in, first, automatically scanning the incoming documents. Then, document images can be analyzed and information can be extracted from the data. Documents can also be automatically dispatched to the appropriate workflows, used to retrieve similar documents in the dataset to transfer information, etc. Due to the nature of this “digital mailroom”, we need document representation methods to be general, i.e., able to cope with very different types of documents. We need the methods to be sound, i.e., able to cope with unexpected types of documents, noise, etc. And we need the methods to be scalable, i.e., able to cope with thousands or millions of documents that need to be processed, stored, and consulted. Unfortunately, current techniques of document representation, classification and retrieval are not apt for this digital mailroom framework, since they do not fulfill some or all of these requirements. In this thesis we focus on the problem of document representation aimed at classification and retrieval tasks under this digital mailroom framework. Specifically, in the first part of this thesis we present a novel document representation based on runlength histograms that achieves state-of-the-art results on public and in-house datasets of different nature and quality on classification and retrieval tasks. This representation is later modified to cope with more complex documents such as multiple-page documents, or documents that contain additional sources of information such as extracted OCR text. Then, in the second part of this thesis, we focus on the scalability requirements, particularly for retrieval tasks, where all the documents need to be available in RAM memory for the retrieval to be efficient. We propose a novel binarization method, which we dub PCAE, as well as two general asymmetric distances between binary embeddings that can significantly improve the retrieval results at a minimal extra computational cost. Finally, we note the importance of supervised learning when performing large-scale retrieval, and study several approaches that can significantly boost the results at no extra cost at query time.
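A toy Python version of a runlength-histogram descriptor in the spirit of the one described above is sketched below; the actual thesis descriptor is multi-scale and considerably richer, and the bin layout and normalisation used here are assumptions made for the example.

```python
import numpy as np

def run_lengths(line):
    """Lengths of maximal runs of equal values in a 1-D binary array, with their values."""
    change = np.flatnonzero(np.diff(line)) + 1
    bounds = np.concatenate(([0], change, [len(line)]))
    return np.diff(bounds), line[bounds[:-1]]

def runlength_histogram(binary_img, n_bins=8):
    """binary_img: 2-D array of 0/1. Returns a normalised descriptor of length 4*n_bins
    (black and white runs, along rows and along columns)."""
    edges = np.logspace(0, np.log10(max(binary_img.shape)), n_bins + 1)
    feats = []
    for img in (binary_img, binary_img.T):       # horizontal then vertical runs
        for value in (0, 1):                     # black runs then white runs
            lens = []
            for row in img:
                lengths, values = run_lengths(row)
                lens.append(lengths[values == value])
            h, _ = np.histogram(np.concatenate(lens), bins=edges)
            feats.append(h / max(h.sum(), 1))
    return np.concatenate(feats)

# Usage on a random synthetic "page".
page = (np.random.rand(128, 128) > 0.7).astype(int)
descriptor = runlength_histogram(page)
```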
APA, Harvard, Vancouver, ISO, and other styles
46

Reed, Steve. "Vector quantization applied to a facet representation of an image." Thesis, University of Ottawa (Canada), 1987. http://hdl.handle.net/10393/5453.

Full text
APA, Harvard, Vancouver, ISO, and other styles
47

Hu, Li. "Low power CMOS image sensor using adaptive address event representation /." View abstract or full-text, 2007. http://library.ust.hk/cgi/db/thesis.pl?ECED%202007%20HU.

Full text
APA, Harvard, Vancouver, ISO, and other styles
48

Hedjam, Rachid. "Visual image processing in various representation spaces for documentary preservation." Mémoire, École de technologie supérieure, 2013. http://espace.etsmtl.ca/1186/1/HEDJAM_Rachid.pdf.

Full text
Abstract:
This thesis establishes an image processing framework for the enhancement and restoration of historical document images (HDI) in two different representation spaces: the grayscale/colour space and the multispectral (MS) space. It consists of three major contributions: 1) the binarization of grayscale or colour HDI, 2) the restoration of HDI captured by MS imaging, and 3) the estimation of ground-truth (GT) data used to evaluate HDI binarization algorithms. HDI binarization is an enhancement technique that produces binary information which is easy to handle by high-level analysis methods (e.g., OCR) and is computationally cheaper than colour or grayscale images. The restoration of HDI in an MS representation space improves their readability, which is not possible with classical intensity- or colour-based restoration methods. The readability of HDI is the main concern of historians and librarians, who always wish to transfer knowledge and revive the old cultural and scientific heritage. The use of MS imaging systems is a new and attractive research direction in the field of digital HDI processing. In this thesis, these systems are also used to automatically estimate more accurate GT data for the evaluation of HDI binarization algorithms, in order to approach human-level performance. Our first contribution is a new adaptive binarization method for grayscale and colour HDI. Since degradation is present almost everywhere on the surface of HDI, binarization methods must be adapted to handle these degradation phenomena locally. Unfortunately, existing methods are not effective, because they are not able to capture low-intensity text strokes, which deteriorates the performance of optical character recognition (OCR) engines. The proposed approach first detects a subset of the most probable text pixels, which are used to estimate the local parameters of the two classes (text and background), and then performs a maximum-likelihood (ML) classification to locally classify the remaining pixels according to their class membership. To the best of our knowledge, this is the first time that parameter estimation and local classification within an ML framework have been introduced for HDI binarization, with promising results. A limitation of this method, as of all intensity-based enhancement methods, is that it is not effective on severely degraded HDI. Developing more advanced methods based on MS information is therefore a promising research alternative. In the second contribution, a new approach for the visual restoration of HDI is defined. The approach aims to provide a better visual quality of HDI to the end user (historian, librarian, etc.). More precisely, it aims to restore them from degradations while keeping their original appearance intact. In practice, this problem cannot be solved easily by classical intensity-based restoration methods.
To address these limitations, MS imaging is used to produce additional spectral images in the invisible range (infrared and ultraviolet), which gives better contrast to the content of HDI. The variational inpainting framework proposed here for HDI restoration consists in isolating the degradations in the infrared spectral images and then inpainting them in the visible spectral images. The final colour image to be visualized is thus reconstructed from the restored visible spectral images. To the best of our knowledge, this is the first time that inpainting has been used for the restoration of multispectral HDI. The experimental results are promising, and our objective, in collaboration with BAnQ (Bibliothèque et Archives nationales du Québec), is to make heritage documents available in the public domain and to build an intelligent engine to access them. It is worth noting that the proposed model can be extended to other applications based on MS images. Our third contribution, which considers a new GT estimation problem, is presented to show the importance of working with MS images rather than grayscale or colour images. GT data are necessary to compare different binarization algorithms, and they are usually generated by an expert. However, an expert's GT is always subject to labelling and judgement errors, especially in the case of degraded data processed in restricted representation spaces (grayscale or colour images). In the proposed method, several GTs generated by several experts are used in combination with the MS document image to estimate a new, more accurate GT. The idea is to include multivariate data fidelity and the experts' degree of consensus about the labels in a single Bayesian classification framework, in order to estimate the posterior probability of the new labels forming the final GT. Our experiments show that the estimated GTs are more accurate than those generated individually by an expert. To the best of our knowledge, no similar work combining expert-generated GT and MS data has been done for GT estimation.
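A heavily simplified Python sketch of the maximum-likelihood binarization idea from the first contribution is given below; the thesis estimates parameters and classifies locally over the page, whereas this illustration works globally with a percentile-based seed, and all names are illustrative.

```python
import numpy as np

def gaussian_loglik(x, mean, var):
    """Log-likelihood of x under a 1-D Gaussian with the given mean and variance."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def ml_binarise(gray, seed_percentile=20):
    """gray: 2-D float array (dark text on light background). Returns a 0/1 text mask."""
    seed_thr = np.percentile(gray, seed_percentile)
    text_seed = gray <= seed_thr                     # most probable text pixels
    bg_seed = ~text_seed
    # Estimate per-class Gaussian parameters from the seed pixels.
    mu_t, var_t = gray[text_seed].mean(), gray[text_seed].var() + 1e-6
    mu_b, var_b = gray[bg_seed].mean(), gray[bg_seed].var() + 1e-6
    # Maximum-likelihood assignment of every pixel to the text or background class.
    return (gaussian_loglik(gray, mu_t, var_t) >
            gaussian_loglik(gray, mu_b, var_b)).astype(np.uint8)

# Usage on a synthetic degraded page: light background with a darker "text" band.
page = 0.8 + 0.05 * np.random.randn(100, 100)
page[40:60, 10:90] -= 0.4
mask = ml_binarise(page)
```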
APA, Harvard, Vancouver, ISO, and other styles
49

Lin, Ya-Jing, and 林雅靜. "Image Segmentation and Its Image Information Representation." Thesis, 1995. http://ndltd.ncl.edu.tw/handle/42695777365049890505.

Full text
APA, Harvard, Vancouver, ISO, and other styles
50

Chen, Jhih-Hao, and 陳志豪. "Image Processing with Sparse Representation." Thesis, 2013. http://ndltd.ncl.edu.tw/handle/06104438120528116454.

Full text
Abstract:
Master's thesis
I-Shou University
Department of Information Engineering
Academic year 101 (ROC calendar)
In signal processing, the representation of signals is one of the important topics. Signals usually contain redundant information, which wastes a lot of memory. Sparse representation is a plausible method to avoid such waste, so it is important in signal processing. For signals with a sparse representation, the energy is concentrated in only a small portion of the components and the others are 0. This feature makes the signal easy to compress, and the small portion of non-zero components can be regarded as the features of the signal. Sparse representation can be applied to image compression, image feature extraction, image retrieval, image denoising, and image restoration. In this thesis, we consider both compressed sensing and morphological component analysis based on sparse representation. We study sparse representation in two parts. The first part is signal reconstruction with compressed sensing. Compressed sensing filters out the redundant signal and keeps only the least signal needed to reconstruct the image. For this problem we compare several kinds of matching pursuit methods for image reconstruction. We also use a machine learning method to approximate the model of signal reconstruction. The other part is morphological component analysis. It differs from compressed sensing in that it takes into account where the energy of the signal is concentrated. We use an iterative thresholding algorithm for image decomposition. We also use orthogonal matching pursuit for this problem and compare the results.
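As an illustration of one of the greedy methods compared in this thesis, here is a compact Python sketch of orthogonal matching pursuit; the dictionary size, sparsity level and synthetic test signal are arbitrary choices for the example, not values from the thesis.

```python
import numpy as np

def omp(D, y, sparsity):
    """D: (m, n) dictionary with unit-norm columns; y: (m,) signal.
    Returns an (n,) sparse coefficient vector with at most `sparsity` non-zeros."""
    residual = y.copy()
    support = []
    x = np.zeros(D.shape[1])
    for _ in range(sparsity):
        k = int(np.argmax(np.abs(D.T @ residual)))   # atom most correlated with residual
        if k not in support:
            support.append(k)
        # Re-fit the coefficients on the current support by least squares.
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x[support] = coef
    return x

# Usage: recover a 3-sparse vector from noiseless measurements with a random dictionary.
rng = np.random.default_rng(1)
D = rng.normal(size=(30, 100))
D /= np.linalg.norm(D, axis=0)
x_true = np.zeros(100)
x_true[[5, 40, 77]] = [1.0, -2.0, 0.5]
x_hat = omp(D, D @ x_true, sparsity=3)
```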
APA, Harvard, Vancouver, ISO, and other styles