Dissertations / Theses on the topic 'IMAGE RETRIEVAL TECHNIQUES'

Consult the top 50 dissertations / theses for your research on the topic 'IMAGE RETRIEVAL TECHNIQUES.'


1

Shaffrey, Cian William. "Multiscale techniques for image segmentation, classification and retrieval." Thesis, University of Cambridge, 2003. https://www.repository.cam.ac.uk/handle/1810/272033.

Full text
APA, Harvard, Vancouver, ISO, and other styles
2

Yang, Cheng 1974. "Image database retrieval with multiple-instance learning techniques." Thesis, Massachusetts Institute of Technology, 1998. http://hdl.handle.net/1721.1/50505.

Full text
Abstract:
Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1998.
Includes bibliographical references (p. 81-82).
In this thesis, we develop and test an approach to retrieving images from an image database based on content similarity. First, each picture is divided into many overlapping regions. For each region, the sub-picture is filtered and converted into a feature vector. In this way, each picture is represented by a number of different feature vectors. The user selects positive and negative image examples to train the system. During the training, a multiple-instance learning method known as the Diverse Density algorithm is employed to determine which feature vector in each image best represents the user's concept, and which dimensions of the feature vectors are important. The system tries to retrieve images with similar feature vectors from the remainder of the database. A variation of the weighted correlation statistic is used to determine image similarity. The approach is tested on a large database of natural scenes as well as single- and multiple-object images. Comparisons are made against a previous approach, and the effects of tuning various training parameters, as well as that of adjusting algorithmic details, are also studied.
by Cheng Yang.
S.M.
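The bag-of-regions scoring idea in this abstract can be sketched in a few lines: an image is a bag of region feature vectors, and it is ranked by its single best region under a weighted distance to a learned concept point. The concept point and feature weights below are illustrative stand-ins for what Diverse Density training would produce, not values from the thesis.

```python
import numpy as np

def bag_similarity(bag, concept, weights):
    """Score an image (a bag of region feature vectors) by its single
    best region under a weighted Euclidean distance: the multiple-instance
    view, where one region per image is assumed to carry the concept."""
    d = np.sqrt((((bag - concept) * weights) ** 2).sum(axis=1))
    return -d.min()                      # higher score = closer best region

# toy database: two images, each a bag of 2-D region features
img_a = np.array([[0.1, 0.9], [0.8, 0.2]])
img_b = np.array([[0.5, 0.5], [0.9, 0.9]])
concept = np.array([0.8, 0.2])           # stand-in for a learned concept point
weights = np.array([1.0, 1.0])           # stand-in for learned feature relevances

scores = {name: bag_similarity(bag, concept, weights)
          for name, bag in [("a", img_a), ("b", img_b)]}
ranked = sorted(scores, key=scores.get, reverse=True)   # "a" ranks first
```

Image "a" wins because one of its regions matches the concept exactly, even though its other region does not: precisely the per-region assumption the abstract describes.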
3

Carswell, James. "Using Raster Sketches for Digital Image Retrieval." Fogler Library, University of Maine, 2000. http://www.library.umaine.edu/theses/pdf/CarswellJD2000.pdf.

Full text
4

Zhang, Dengsheng 1963. "Image retrieval based on shape." Monash University, School of Computing and Information Technology, 2002. http://arrow.monash.edu.au/hdl/1959.1/8688.

Full text
5

Lim, Suryani. "Feature extraction, browsing and retrieval of images." Monash University, School of Computing and Information Technology, 2005. http://arrow.monash.edu.au/hdl/1959.1/9677.

Full text
6

Goncalves, Pinheiro Antonio Manuel. "Shape approximation and retrieval using scale-space techniques." Thesis, University of Essex, 2001. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.391661.

Full text
7

Li, Yuanxi. "Semantic image similarity based on deep knowledge for effective image retrieval." HKBU Institutional Repository, 2014. https://repository.hkbu.edu.hk/etd_oa/99.

Full text
Abstract:
A flourishing World Wide Web dramatically increases the amount of images uploaded and shared, and exploring them is an interesting and challenging task. While content-based image retrieval, which is based on the low-level features extracted from images, has grown relatively mature, human users are more interested in the semantic concepts behind or inside the images. Search based solely on low-level features cannot satisfy users' requirements and is not effective enough. In order to measure the semantic similarity among images and increase the accuracy of Web image retrieval, it is necessary to dig into the deep concepts and semantic meaning of the image as well as to overcome the semantic gap. By exploiting the context of Web images, knowledge bases and ontology-based similarities, through the analysis of user behaviour in image similarity evaluation, we established a set of formulas which allows efficient and accurate semantic similarity measurement of images. When jointly applied with ontology-based query expansion approaches and an adaptive image search engine for deep knowledge indexing, they are able to produce a new level of meaningful automatic image annotation, from which semantic image search may be performed. Besides, the semantic concepts can be automatically enriched in an MPEG-7 Structured Image Annotation approach. The system is evaluated quantitatively using thousands of Web images with associated human tags, together with a subjective user test. Experimental results indicate that this approach delivers highly competent performance, attaining good precision efficiency. This approach enables an advanced degree of semantic richness to be automatically associated with images, and efficient image concept similarity measurement which could previously only be performed manually. Keywords: Image Index, Image Retrieval, Semantic Similarity, Relevance Feedback, Knowledge Base, Ontology, Query Expansion, MPEG-7.
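As a hedged illustration of the ontology-based side of this work: the Wu-Palmer measure is one common way to score concept similarity on an is-a taxonomy. The dissertation's own formulas also fold in Web-image context and user behaviour, which are not modelled here, and the toy taxonomy below is invented for the example.

```python
def wu_palmer(parent, a, b):
    """Wu-Palmer similarity on an is-a taxonomy given as a child->parent
    map: 2 * depth(lcs) / (depth(a) + depth(b)), with the root at depth 1."""
    def path(n):                          # node, its parent, ..., the root
        chain = [n]
        while n in parent:
            n = parent[n]
            chain.append(n)
        return chain
    pa, pb = path(a), set(path(b))
    lcs = next(n for n in pa if n in pb)  # lowest common subsumer
    depth = lambda n: len(path(n))
    return 2 * depth(lcs) / (depth(a) + depth(b))

# invented toy taxonomy
taxonomy = {"cat": "mammal", "dog": "mammal", "mammal": "animal",
            "sparrow": "bird", "bird": "animal"}
sim_cat_dog = wu_palmer(taxonomy, "cat", "dog")          # share "mammal"
sim_cat_sparrow = wu_palmer(taxonomy, "cat", "sparrow")  # share only "animal"
```

Concepts that share a deeper common ancestor score higher, which is the basic intuition behind ontology-based image similarity.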
8

Wong, Chun Fan. "Automatic semantic image annotation and retrieval." HKBU Institutional Repository, 2010. http://repository.hkbu.edu.hk/etd_ra/1188.

Full text
9

Ling, Haibin. "Techniques for image retrieval: deformation insensitivity and automatic thumbnail cropping." College Park, Md.: University of Maryland, 2006. http://hdl.handle.net/1903/3859.

Full text
Abstract:
Thesis (Ph. D.) -- University of Maryland, College Park, 2006.
Thesis research directed by: Computer Science. Title from t.p. of PDF. Includes bibliographical references. Published by UMI Dissertation Services, Ann Arbor, Mich. Also available in paper.
10

Liu, Danzhou. "Efficient techniques for relevance feedback processing in content-based image retrieval." Doctoral diss., University of Central Florida, 2009. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/2991.

Full text
Abstract:
In content-based image retrieval (CBIR) systems, there are two general types of search: target search and category search. Unlike queries in traditional database systems, users in most cases cannot specify an ideal query to retrieve the desired results for either target search or category search in multimedia database systems, and have to rely on iterative feedback to refine their query. Efficient evaluation of such iterative queries can be a challenge, especially when the multimedia database contains a large number of entries, the search needs many iterations, and the underlying distance measure is computationally expensive. The overall processing costs, including CPU and disk I/O, are further emphasized if there are numerous concurrent accesses. To address these limitations in relevance feedback processing, we propose a generic framework, including a query model, index structures, and query optimization techniques. Specifically, this thesis has five main contributions, as follows. The first contribution is an efficient target search technique. We propose four target search methods: naive random scan (NRS), local neighboring movement (LNM), neighboring divide-and-conquer (NDC), and global divide-and-conquer (GDC). All these methods are built around a common strategy: they do not retrieve already-checked images (i.e., they shrink the search space). Furthermore, NDC and GDC exploit Voronoi diagrams to aggressively prune the search space and move towards target images. We prove theoretically and experimentally that the convergence speeds of GDC and NDC are much faster than those of NRS and recent methods. The second contribution is a method to reduce the number of expensive distance computations when answering k-NN queries with non-metric distance measures. We propose an efficient distance mapping function that transforms non-metric measures into metric ones while preserving the original distance orderings.
Then existing metric index structures (e.g., the M-tree) can be used to reduce the computational cost by exploiting the triangle inequality. The third contribution is an incremental query processing technique for Support Vector Machines (SVMs). SVMs have been widely used in multimedia retrieval to learn a concept in order to find the best matches, but they suffer from a scalability problem with larger database sizes. To address this limitation, we propose an efficient query evaluation technique employing incremental update. The proposed technique also takes advantage of a tuned index structure to efficiently prune irrelevant data, so that only a small portion of the data set needs to be accessed for query processing. This index structure also provides an inexpensive means to process the set of candidates when evaluating the final query result, and the technique works with different kernel functions and kernel parameters. The fourth contribution is a method to avoid local optimum traps. Existing CBIR systems, designed around query refinement based on relevance feedback, suffer from local optimum traps that may severely impair overall retrieval performance. We therefore propose a simulated annealing-based approach to address this important issue. When the search becomes stuck at a local optimum, we employ a neighborhood search technique (i.e., simulated annealing) to continue the search for additional matching images, thus escaping from the local optimum, and we propose an index structure to speed up this neighborhood search. Finally, the fifth contribution is a generic framework to support concurrent accesses. We develop new storage and query processing techniques to exploit sequential access and leverage inter-query concurrency to share computation.
Our experimental results, based on the Corel dataset, indicate that the proposed optimization can significantly reduce average response time while achieving better precision and recall, and that it scales to support a large user community. This latter performance characteristic is largely neglected in existing systems, making them less suitable for large-scale deployment. With the growing interest in Internet-scale image search applications, our framework offers an effective solution to the scalability problem.
Ph.D.
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Computer Science PhD
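The second contribution above, mapping a non-metric measure into a metric one so the triangle inequality can prune distance computations, can be illustrated with a single-pivot lower bound: |d(q,p) − d(p,x)| ≤ d(q,x). This sketch uses one pivot and a plain Euclidean metric rather than the thesis's M-tree index; names and parameters are illustrative.

```python
import numpy as np

def knn_with_pivot_pruning(query, data, pivot, k, dist):
    """k-NN search that skips full distance computations via the triangle
    inequality |d(q,p) - d(p,x)| <= d(q,x): the property that the thesis's
    distance mapping restores for non-metric measures. Distances from all
    points to the pivot are assumed precomputed, as a metric index would."""
    d_qp = dist(query, pivot)
    d_px = [dist(p, pivot) for p in data]       # would be built offline
    best, computed = [], 0                       # best holds (distance, index)
    for i, x in enumerate(data):
        lower = abs(d_qp - d_px[i])              # lower bound on d(query, x)
        if len(best) == k and lower >= best[-1][0]:
            continue                             # pruned: cannot enter top-k
        computed += 1
        best = sorted(best + [(dist(query, x), i)])[:k]
    return best, computed

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 8))
query = rng.normal(size=8)
euclid = lambda u, v: float(np.linalg.norm(u - v))

best, computed = knn_with_pivot_pruning(query, data, data[0], 3, euclid)
brute = sorted((euclid(query, x), i) for i, x in enumerate(data))[:3]
```

The pruned search returns exactly the brute-force top-k, because a point whose lower bound already exceeds the current k-th distance can never enter the result.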
11

Yu, Ning. "Techniques for boosting the performance in content-based image retrieval systems." Doctoral diss., University of Central Florida, 2011. http://digital.library.ucf.edu/cdm/ref/collection/ETD/id/4726.

Full text
Abstract:
Content-Based Image Retrieval (CBIR) has been an active research area for decades. In a CBIR system, one or more images are used as a query to search for similar images. Similarity is measured on low-level features such as color, shape, edge and texture. First, each image is processed and visual features are extracted, so that each image becomes a point in the feature space. Then, if two images are close to each other in the feature space, they are considered similar; that is, the k nearest neighbors are considered the most similar images to the query image. In this k-Nearest Neighbor (k-NN) model, semantically similar images are assumed to be clustered together in a single neighborhood in the high-dimensional feature space. Unfortunately, semantically similar images with different appearances are often clustered into distinct neighborhoods, which may be scattered across the feature space. Hence, confinement of the search results to a single neighborhood is the latent reason for the low recall rate of typical nearest-neighbor techniques. In this dissertation, a new image retrieval technique, the Query Decomposition (QD) model, is introduced. QD facilitates retrieval of semantically similar images from multiple neighborhoods in the feature space and hence bridges the semantic gap between the images' low-level features and their high-level semantic meaning. In the QD model, a query may be decomposed into multiple subqueries based on the user's relevance feedback to cover multiple image clusters which contain semantically similar images. The retrieval results are the k most similar images drawn from multiple discontinuous relevant clusters. To apply the benefits of the QD study, a mobile client-side relevance feedback study was conducted. With the proliferation of handheld devices, the demand for multimedia information retrieval on mobile devices has attracted more attention. A relevance feedback information retrieval process usually includes several rounds of query refinement.
Each round incurs the exchange of tens of images between the mobile device and the server. With limited wireless bandwidth, this process can incur substantial delay, making the system unfriendly to use. The Relevance Feedback Support (RFS) structure designed in the QD technique was adopted for Client-side Relevance Feedback (CRF). Since relevance feedback is done on the client side, system response is instantaneous, significantly enhancing system usability. Furthermore, since the server is not involved in relevance feedback processing, it is able to support thousands more users simultaneously. While the QD technique improves the accuracy of CBIR systems, another study in this dissertation, In-Memory Relevance Feedback, improves their efficiency. Current methods rely on searching the database, stored on disk, in each round of relevance feedback. This strategy incurs long delays, making relevance feedback less friendly to the user, especially for very large databases; thus, scalability is a limitation of existing solutions. The proposed in-memory relevance feedback technique substantially reduces the delay associated with feedback processing and therefore improves system usability. A data-independent dimensionality-reduction technique is used to compress the metadata and build a small in-memory database that supports relevance feedback operations with minimal disk accesses. The performance of this approach is compared with conventional relevance feedback techniques in terms of computational efficiency and retrieval accuracy. The results indicate that the new technique substantially reduces response time for user feedback while maintaining the quality of the retrieval. In the previous studies, the QD technique relies on a pre-defined Relevance Feedback Support structure; results and user experience indicated that this structure might confine the search range and affect the results.
In this dissertation, a novel Multiple Direction Search framework for semi-automatic annotation propagation is also studied. In this system, the user interacts with the system to provide example images and the corresponding annotations during the annotation propagation process. In each iteration, the example images are dynamically clustered and the corresponding annotations are propagated separately to each cluster: images in the local neighborhood are annotated. Furthermore, some of those images are returned to the user for further annotation. As the user marks more images, the annotation process branches into multiple directions in the feature space. The query movements can be treated as multiple-path navigation, and each path can be further split based on the user's input. In this manner, the system provides accurate annotation assistance to the user: images with the same semantic meaning but different visual characteristics can be handled effectively. Comprehensive experiments on the Corel and University of Washington image databases show that the proposed technique annotates image databases accurately and efficiently.
ID: 030646264; System requirements: World Wide Web browser and PDF reader.; Mode of access: World Wide Web.; Thesis (Ph.D.)--University of Central Florida, 2011.; Includes bibliographical references (p. 81-91).
Ph.D.
Doctorate
Computer Science
Engineering and Computer Science
Computer Science
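A minimal sketch of the Query Decomposition idea described above: the user's positive feedback is clustered, and one sub-query is issued per cluster, so results are gathered from several disjoint neighborhoods. The tiny k-means and all parameter names are illustrative, not the dissertation's algorithm.

```python
import numpy as np

def decompose_and_search(positives, database, n_sub=2, k_per_cluster=2, iters=10):
    """Query Decomposition sketch: cluster the positive feedback with a
    tiny k-means, then run one sub-query per cluster so matches come
    from several disjoint neighborhoods in the feature space."""
    centers = positives[:n_sub].astype(float).copy()
    for _ in range(iters):                       # plain k-means on the feedback
        labels = np.argmin(((positives[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(n_sub):
            if (labels == c).any():
                centers[c] = positives[labels == c].mean(axis=0)
    hits = set()
    for c in centers:                            # one sub-query per cluster
        d = ((database - c) ** 2).sum(axis=1)
        hits.update(np.argsort(d)[:k_per_cluster].tolist())
    return sorted(hits)

db = np.array([[0.0, 0.0], [0.1, 0.0],           # one relevant "look"
               [5.0, 5.0], [5.1, 5.0],           # a second, distant one
               [2.5, 2.5]])                      # an irrelevant in-between image
feedback = np.array([[0.05, 0.05], [4.9, 4.9]])  # user's positive examples
result = decompose_and_search(feedback, db)      # draws from both clusters
```

A single query centered on the mean of the same feedback would rank the in-between image (index 4) first: exactly the single-neighborhood failure the abstract describes.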
12

Aboaisha, Hosain. "The optimisation of elementary and integrative content-based image retrieval techniques." Thesis, University of Huddersfield, 2015. http://eprints.hud.ac.uk/id/eprint/26164/.

Full text
Abstract:
Image retrieval plays a major role in many image processing applications. However, a number of factors (e.g. rotation, non-uniform illumination, noise and lack of spatial information) can disrupt the outputs of image retrieval systems such that they cannot produce the desired results. In recent years, many researchers have introduced different approaches to overcome this problem. Colour-based CBIR (content-based image retrieval) and shape-based CBIR were the most commonly used techniques for obtaining image signatures. Although the colour histogram and shape descriptor have produced satisfactory results for certain applications, they still suffer from many theoretical and practical problems, a prominent one being the well-known "curse of dimensionality". In this research, a new Fuzzy Fusion-based Colour and Shape Signature (FFCSS) approach for integrating colour-only and shape-only features has been investigated to produce an effective image feature vector for database retrieval. The proposed technique is based on an optimised fuzzy colour scheme and robust shape descriptors. Experimental tests were carried out to check the behaviour of the FFCSS-based system, including the sensitivity and robustness of the proposed signature for the sampled images, especially under varied conditions of rotation, scaling, noise and light intensity. To further improve the retrieval efficiency of the devised signature model, the target image repositories were clustered into several groups using the k-means clustering algorithm at system runtime, with the search beginning at the centres of each cluster. The FFCSS-based approach has proven superior to other benchmarked classic CBIR methods, hence this research makes a substantial contribution on both the theoretical and practical fronts.
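The clustering speed-up described at the end of the abstract can be sketched as follows: the signature repository is grouped with k-means offline, and a query is compared only against the members of its nearest cluster. The signatures and seed indices below are toy values; the real system clusters FFCSS signatures.

```python
import numpy as np

def build_clusters(signatures, init, iters=20):
    """Offline step mirroring the thesis's runtime speed-up: group the
    signature repository with k-means (seeded with explicit initial
    members for reproducibility)."""
    centers = signatures[np.asarray(init)].astype(float)
    labels = np.zeros(len(signatures), dtype=int)
    for _ in range(iters):
        labels = np.argmin(((signatures[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(len(init)):
            if (labels == c).any():
                centers[c] = signatures[labels == c].mean(axis=0)
    return centers, labels

def cluster_pruned_query(q, signatures, centers, labels, k=2):
    """Online step: begin at the nearest cluster centre and rank only
    that cluster's members, instead of scanning the whole repository."""
    c = int(np.argmin(((centers - q) ** 2).sum(-1)))
    members = np.flatnonzero(labels == c)
    d = ((signatures[members] - q) ** 2).sum(axis=1)
    return members[np.argsort(d)[:k]].tolist()

sigs = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],      # three toy blobs
                 [5.0, 5.0], [5.2, 5.0], [5.0, 5.2],
                 [10.0, 0.0], [10.2, 0.0], [10.0, 0.2]])
centers, labels = build_clusters(sigs, init=[0, 3, 6])
hits = cluster_pruned_query(np.array([5.05, 5.0]), sigs, centers, labels)
```

Only three of the nine signatures are compared against the query, while the two nearest images are still found.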
13

Yoon, Janghyun. "A network-aware semantics-sensitive image retrieval system." Diss., Georgia Institute of Technology, 2003. Available online: http://etd.gatech.edu/theses/available/etd-04082004-180459/unrestricted/yoon%5fjanghyun%5f200312%5fphd.pdf.

Full text
14

Vila, Duran Marius. "Information theory techniques for multimedia data classification and retrieval." Doctoral thesis, Universitat de Girona, 2015. http://hdl.handle.net/10803/302664.

Full text
Abstract:
We are in the information age, where most data is stored in digital format. Thus, the management of digital documents and videos requires the development of efficient techniques for automatic analysis. Among them, capturing the similarity or dissimilarity between different document images or video frames is extremely important. In this thesis, we first analyze, for several image resolutions, the behavior of three different families of image-based similarity measures applied to invoice classification. In these three sets of measures, the computation of the similarity between two images is based, respectively, on intensity differences, mutual information, and normalized compression distance. As the best results are obtained with mutual information-based measures, we proceed to investigate the application of three different Tsallis-based generalizations of mutual information for different entropic indexes. These three generalizations derive, respectively, from the Kullback-Leibler distance, the difference between entropy and conditional entropy, and the Jensen-Shannon divergence. In relation to digital video processing, we propose two different information-theoretic approaches, based respectively on Tsallis mutual information and Jensen-Tsallis divergence, to detect the abrupt shot boundaries of a video sequence and to select the most representative keyframe of each shot. Finally, Shannon entropy has been commonly used to quantify image informativeness. The main drawback of this measure is that it does not take into account the spatial distribution of pixels. In this thesis, we analyze four information-theoretic measures that overcome this limitation. Three of them (entropy rate, excess entropy, and erasure entropy) consider the image as a stationary stochastic process, while the fourth (partitional information) is based on an information channel between image regions and histogram bins.
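A minimal version of the mutual-information family of measures compared in this thesis, in its plain Shannon form (the Tsallis generalizations studied there differ): estimate the joint intensity histogram of two images and compute I(X;Y) from it.

```python
import numpy as np

def mutual_information(img_a, img_b, bins=8):
    """Shannon mutual information (in bits) between two equal-size
    greyscale images, estimated from their joint intensity histogram."""
    joint, _, _ = np.histogram2d(img_a.ravel(), img_b.ravel(),
                                 bins=bins, range=[[0, 256], [0, 256]])
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)          # marginal of img_a
    py = pxy.sum(axis=0, keepdims=True)          # marginal of img_b
    nz = pxy > 0
    return float((pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz])).sum())

rng = np.random.default_rng(1)
a = rng.integers(0, 256, size=(32, 32))
b = rng.integers(0, 256, size=(32, 32))          # independent of a
mi_self = mutual_information(a, a)               # ~ entropy of a: high
mi_noise = mutual_information(a, b)              # close to zero
```

An image compared with itself scores near the binned entropy (about 3 bits for 8 bins), while two independent images score near zero, which is what makes the measure usable for classification.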
15

Tew, Kevin. "Skuery: manipulation of S-expressions using XQuery techniques." Diss., 2006. http://contentdm.lib.byu.edu/ETD/image/etd1677.pdf.

Full text
16

Voulgaris, Georgios. "Techniques for content-based image characterization in wavelets domain." Thesis, University of South Wales, 2008. https://pure.southwales.ac.uk/en/studentthesis/techniques-for-contentbased-image-characterization-in-wavelets-domain(14c72275-a91e-4ba7-ada8-bdaee55de194).html.

Full text
Abstract:
This thesis documents the research which has led to the design of a number of techniques aiming to improve the performance of content-based image retrieval (CBIR) systems in the wavelets domain using texture analysis. Attention was focused on CBIR in the transform domain, and in particular on wavelets, because of their excellent characteristics for compression and texture extraction applications and their wide adoption in both the research community and industry. The issue of performance is addressed in terms of accuracy and speed. The rationale for this research builds upon the conclusion that CBIR has not yet reached a good balance of accuracy, efficiency and speed for wide adoption in practical applications. The issue of bridging the sensory gap, defined as "[the difference] between the object in the real world and the information in a (computational) description derived from a recording of that scene", has yet to be resolved. Furthermore, speed improvement remains an uncharted territory, as does feature extraction directly from the bitstream of compressed images. To address the above requirements, the first part of this work introduces three techniques designed to jointly address the issues of accuracy and processing cost in texture characterization in the wavelets domain. The second part introduces a new model for mapping the wavelet coefficients of an orthogonal wavelet transformation to a circular locus. The model is applied in order to design a novel rotation-invariant texture descriptor. All of the aforementioned techniques are also designed to bridge the gap between texture-based image retrieval and image compression by using appropriately compatible design parameters. The final part introduces three techniques for improving the speed of a CBIR query through more efficient calculation of the L1-distance when it is used as an image similarity metric.
The contributions conclude with a novel technique which, in conjunction with a widely adopted wavelet-based compression algorithm, extracts texture information directly from the compressed bit-stream, yielding savings in speed and storage requirements. The experimental findings indicate that the proposed techniques form a solid groundwork which can be extended to practical applications.
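A hedged sketch of the wavelets-domain texture pipeline this thesis works in: a one-level Haar transform, subband-energy signatures, and the L1 distance between them. The descriptor here is deliberately minimal and is not the thesis's rotation-invariant model.

```python
import numpy as np

def haar_level(img):
    """One level of the 2-D Haar wavelet transform (orthogonal up to
    scaling), returning the approximation and three detail subbands."""
    a = (img[:, ::2] + img[:, 1::2]) / 2.0       # row-wise average / detail
    d = (img[:, ::2] - img[:, 1::2]) / 2.0
    ll = (a[::2] + a[1::2]) / 2.0                # approximation
    lh = (a[::2] - a[1::2]) / 2.0                # horizontal detail
    hl = (d[::2] + d[1::2]) / 2.0                # vertical detail
    hh = (d[::2] - d[1::2]) / 2.0                # diagonal detail
    return ll, lh, hl, hh

def texture_signature(img):
    """Mean absolute energy of each detail subband: a tiny wavelet
    texture descriptor for illustration only."""
    _, lh, hl, hh = haar_level(img.astype(float))
    return np.array([np.abs(s).mean() for s in (lh, hl, hh)])

l1 = lambda s, t: float(np.abs(s - t).sum())     # the similarity metric

checker = (np.indices((16, 16)).sum(axis=0) % 2).astype(float)  # fine texture
grad = np.tile(np.arange(16.0) / 16, (16, 1))                   # smooth ramp
query = 1.0 - checker                            # same texture, inverted tones

d_texture = l1(texture_signature(query), texture_signature(checker))
d_smooth = l1(texture_signature(query), texture_signature(grad))
```

The inverted checkerboard matches the original checkerboard exactly (identical subband energies) and is far from the smooth ramp, illustrating why detail-subband energy captures texture rather than raw intensity.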
17

Teng, Shyh Wei 1973. "Image indexing and retrieval based on vector quantization." Monash University, Gippsland School of Computing and Information Technology, 2003. http://arrow.monash.edu.au/hdl/1959.1/5764.

Full text
18

Conser, Erik Timothy. "Improved Scoring Models for Semantic Image Retrieval Using Scene Graphs." PDXScholar, 2017. https://pdxscholar.library.pdx.edu/open_access_etds/3879.

Full text
Abstract:
Image retrieval via a structured query is explored in Johnson, et al. [7]. The query is structured as a scene graph and a graphical model is generated from the scene graph's object, attribute, and relationship structure. Inference is performed on the graphical model with candidate images and the energy results are used to rank the best matches. In [7], scene graph objects that are not in the set of recognized objects are not represented in the graphical model. This work proposes and tests two approaches for modeling the unrecognized objects in order to leverage the attribute and relationship models to improve image retrieval performance.
19

Huang, Ranxi. "Semi-automated techniques for the retrieval of dermatological condition in color skin images /." Online version of thesis, 2009. http://hdl.handle.net/1850/11355.

Full text
20

Faichney, Jolon. "Content-Based Retrieval of Digital Video." Thesis, Griffith University, 2005. http://hdl.handle.net/10072/365697.

Full text
Abstract:
In the next few years consumers will have access to large amounts of video and image data either created by themselves with digital video and still cameras or by having access to other image and video content electronically. Existing personal computer hardware and software has not been designed to manage large quantities of multimedia content. As a result, research in the area of content-based video retrieval (CBVR) has been underway for the last fifteen years. This research aims to improve CBVR by providing an accurate and reliable shape-colour representation and by providing a new 3D user interface called DomeWorld for the efficient browsing of large video databases. Existing feature extraction techniques designed for use in large databases are typically simple techniques as they must conform to the limited processing and storage constraints that are exhibited by large scale databases. Conversely, more complex feature extraction techniques provide higher level descriptions of the underlying data but are time consuming and require large amounts of storage making them less useful for large databases. In this thesis a technique for medium to high level shape representation is presented that exhibits efficient storage and query performance. The technique uses a very accurate contour detection system that incorporates a new asymmetry edge detector which is shown to perform better than other contour detection techniques combined with a new summarisation technique to efficiently store contours. In addition, contours are represented by histograms further reducing space requirements and increasing query performance. A new type of histogram is introduced called the fuzzy histogram and is applied to content-based retrieval systems for the first time. Fuzzy histograms improve the ranking of query results over non-fuzzy techniques especially in low bin-count histogram configurations. 
The fuzzy contour histogram approach is compared with an exhaustive contour comparison technique and is found to provide equivalent or better results. A number of colour distribution representation techniques were investigated for integration with the contour histogram, and the fuzzy HSV histogram was found to provide the best performance. When the colour and contour histograms were integrated, fewer bins were required overall, as each histogram compensates for the other's weaknesses; only a quarter as many bins were needed as for either the colour or contour histogram alone, further reducing query times and storage requirements. This research also improves the user experience with a new user interface called DomeWorld that uses three-dimensional translucent domes. Existing user interfaces are designed either for image databases, for browsing videos, or for browsing large non-multimedia data sets. DomeWorld is designed to browse both image and video databases through a number of innovative techniques, including hierarchical clustering, radial space-filling layout of nodes, three-dimensional presentation, and translucent domes that allow the hierarchical nature of the data to be viewed while also showing the relationships between child nodes several levels deep. A taxonomy of existing image, video, and large-data-set user interfaces is presented and the proposed user interface is evaluated within that framework. It is found that video database user interfaces have four requirements: context and detail, gisting, clustering, and integration of video and images. None of the 27 evaluated user interfaces satisfies all four requirements. The DomeWorld user interface is designed to satisfy all of them and presents a step forward in CBVR user interaction. This thesis investigates two important areas of CBVR, structural indexing and user interaction, and presents techniques which advance the field.
These two areas will become very important in the future when users must access and manage large collections of image and video content.
Thesis (PhD Doctorate)
Doctor of Philosophy (PhD)
School of Information Technology
Science, Environment, Engineering and Technology
Full Text
APA, Harvard, Vancouver, ISO, and other styles
21

PIRAS, LUCA. "Interactive search techniques for content-based retrieval from archives of images." Doctoral thesis, Università degli Studi di Cagliari, 2011. http://hdl.handle.net/11584/266315.

Full text
Abstract:
A quick investigation by file type readily shows that one of the most popular search engines holds about 10 billion images in its indexes. Even though this figure is probably an underestimate of the real number, it immediately gives us an idea of how central images are to human communication. Such an exorbitant number confronts us with the enormous difficulties encountered when dealing with them. Until now, images have always been accompanied by textual data: descriptions, tags, labels, ... which are used to retrieve them from the archives. However, it is clear that the growth of recent years no longer permits this type of cataloguing. Furthermore, by its very nature, manual cataloguing is subjective, partial and without doubt subject to error. To overcome this situation, a kind of search based on the intrinsic characteristics of images, such as colours and shapes, has gained a foothold in recent years. This information is converted into numerical vectors, and by comparing them it is possible to find images that have similar characteristics. Clearly, a search at this level of image representation is far from the user's perception of the images. To allow interaction between users and retrieval systems and to improve performance, the user is involved in the search by giving feedback on the relevance of the images retrieved so far. In this way the kind of images that interest the user can be learnt by the system, and an improvement can be obtained at the next iteration. These techniques, although studied for many years, still present open issues. High-dimensional feature spaces, a lack of relevant training images, and feature spaces with low discriminative capability are just some of the problems encountered.
In this thesis these problems are faced by proposing innovative solutions, both to improve on the performance of methods proposed in the literature and to give retrieval systems greater generalization capability. Techniques of data fusion, both at the feature-space level and at the level of different retrieval techniques, will be presented, showing that the former allow greater discriminative capability while the latter provide more robustness to the system. To overcome the lack of training images, a method to generate synthetic patterns will be proposed, allowing in this way a more balanced learning. Finally, new methods to measure similarity between images and to explore the feature space more efficiently will be proposed. The presented results show that the proposed approaches are indeed helpful in resolving some of the main problems in content based image retrieval.
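The relevance-feedback loop described in this abstract can be illustrated by the classic Rocchio query-point-movement rule (a generic textbook sketch, not the thesis's own learning method; the weights alpha, beta and gamma are conventional defaults): the query vector is pulled toward the centroid of images marked relevant and pushed away from those marked non-relevant.

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio query-point movement: shift the query toward the centroid of
    relevant feature vectors and away from non-relevant ones."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)
    return q

# One feedback round: two relevant examples, one non-relevant
q = rocchio_update([0.0, 0.0],
                   relevant=[[1.0, 0.0], [1.0, 2.0]],
                   nonrelevant=[[-2.0, 0.0]])
```

At the next iteration the system re-ranks the database against the moved query, so images resembling the positive examples rise in the result list.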
APA, Harvard, Vancouver, ISO, and other styles
22

Moreux, Jean-Philippe, and Guillaume Chiron. "Image Retrieval in Digital Libraries: A Large Scale Multicollection Experimentation of Machine Learning techniques." Sächsische Landesbibliothek - Staats- und Universitätsbibliothek Dresden, 2017. https://slub.qucosa.de/id/qucosa%3A16444.

Full text
Abstract:
While historically digital heritage libraries were first powered in image mode, they quickly took advantage of OCR technology to index printed collections and consequently improve the scope and performance of the information retrieval services offered to users. But access to iconographic resources has not progressed in the same way, and the latter remain in the shadows: incomplete and heterogeneous manual indexation, data silos by iconographic genre. Today, however, it would be possible to make better use of these resources, especially by exploiting the enormous volumes of OCR produced during the last two decades, and thus valorize these engravings, drawings, photographs, maps, etc. for their own value but also as an attractive entry point into the collections, supporting discovery and serendipity from document to document and collection to collection. This article presents an ETL (extract-transform-load) approach to this need, which aims to: identify and extract iconography wherever it may be found, in image collections but also in printed materials (dailies, magazines, monographs); transform, harmonize and enrich the image descriptive metadata (in particular with machine learning classification tools); and load it all into a web app dedicated to image retrieval. The approach is pragmatically dual, since it involves leveraging existing digital resources and (virtually) off-the-shelf technologies.
Si historiquement, les bibliothèques numériques patrimoniales furent d’abord alimentées par des images, elles profitèrent rapidement de la technologie OCR pour indexer les collections imprimées afin d’améliorer périmètre et performance du service de recherche d’information offert aux utilisateurs. Mais l’accès aux ressources iconographiques n’a pas connu les mêmes progrès et ces dernières demeurent dans l’ombre : indexation manuelle lacunaire, hétérogène et non viable à grande échelle ; silos documentaires par genre iconographique ; recherche par le contenu (CBIR, content-based image retrieval) encore peu opérationnelle sur les collections patrimoniales. Aujourd’hui, il serait pourtant possible de mieux valoriser ces ressources, en particulier en exploitant les énormes volumes d’OCR produits durant les deux dernières décennies (tant comme descripteur textuel que pour l’identification automatique des illustrations imprimées). Et ainsi mettre en valeur ces gravures, dessins, photographies, cartes, etc. pour leur valeur propre mais aussi comme point d’entrée dans les collections, en favorisant découverte et rebond de document en document, de collection à collection. Cet article décrit une approche ETL (extract-transform-load) appliquée aux images d’une bibliothèque numérique à vocation encyclopédique : identifier et extraire l’iconographie partout où elle se trouve (dans les collections image mais aussi dans les imprimés : presse, revue, monographie) ; transformer, harmoniser et enrichir ses métadonnées descriptives grâce à des techniques d’apprentissage machine – machine learning – pour la classification et l’indexation automatiques ; charger ces données dans une application web dédiée à la recherche iconographique (ou dans d’autres services de la bibliothèque). Approche qualifiée de pragmatique à double titre, puisqu’il s’agit de valoriser des ressources numériques existantes et de mettre à profit des technologies (quasiment) mâtures.
APA, Harvard, Vancouver, ISO, and other styles
23

Bosilj, Petra. "Image indexing and retrieval using component trees." Thesis, Lorient, 2016. http://www.theses.fr/2016LORIS396/document.

Full text
Abstract:
Cette thèse explore l’utilisation de représentations hiérarchiques des images issues de la morphologie mathématique, les arbres des coupes, pour la recherche et la classification d’images. Différents types de structures arborescentes sont analysés et une nouvelle classification en deux superclasses est proposée, ainsi qu’une contribution à l’indexation et à la représentation de ces structures par des dendogrammes. Deux contributions à la recherche d’images sont proposées, l’une sur la détection de régions d’intérêt et l’autre sur la description de ces régions. Les régions MSER peuvent être détectées par un algorithme s’appuyant sur une représentation des images par arbres min et max. L’utilisation d’autres structures arborescentes sous-jacentes permet de détecter des régions présentant des propriétés de stabilité différentes. Un nouveau détecteur, basé sur les arbres des formes, est proposé et évalué en recherche d’images. Pour la description des régions, le concept de spectres de formes 2D permettant de décrire globalement une image est étendu afin de proposer un descripteur local, au pouvoir discriminant plus puissant. Ce nouveau descripteur présente de bonnes propriétés à la fois de compacité et d’invariance à la rotation et à la translation. Une attention particulière a été portée à la préservation de l’invariance à l’échelle. Le descripteur est évalué à la fois en classification d’images et en recherche d’images satellitaires. Enfin, une technique de simplification des arbres de coupes est présentée, qui permet à l’utilisateur de réévaluer les mesures du niveau d’agrégation des régions imposé par les arbres des coupes
This thesis explores component trees, hierarchical structures from Mathematical Morphology, and their application to image retrieval and related tasks. The distinct component trees are analyzed and a novel classification into two superclasses is proposed, as well as a contribution to the indexing and representation of the hierarchies using dendrograms. The first contribution to the field of image retrieval is the development of a novel feature detector, built upon the well-established MSER detector. The tree-based implementation of the MSER detector allows changing the underlying tree in order to produce features with different stability properties. This resulted in the Tree of Shapes based Maximally Stable Region detector, leading to improvements over MSER in retrieval performance. Focusing on feature description, we extend the concept of 2D pattern spectra and adapt their global variant to more powerful, local schemes. Computed on the components of the Min/Max-tree, they are histograms holding information on the distribution of image region attributes. Rotation and translation invariance is preserved from the global descriptor, while special attention is given to achieving scale invariance. We report results comparable to SIFT in image classification, as well as outperforming morphology-based descriptors in satellite image retrieval, with a descriptor shorter than SIFT. Finally, a preprocessing or simplification technique for component trees is also presented, allowing the user to reevaluate the measures of region level of aggregation imposed on a component tree. The thesis concludes by outlining future perspectives based on its content.
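The MSER stability criterion underlying this detector work can be sketched on a single branch of a component tree (the areas below are invented toy data; a real detector evaluates every branch of the tree built from an image): a component is maximally stable at thresholds where its area changes least.

```python
# Toy data: areas of one nested component as the threshold t increases.
# The plateau 118/117/116 is the stable region we expect to detect.
areas = dict(zip(range(10), [400, 390, 385, 120, 118, 117, 116, 30, 10, 2]))

def mser_stability(areas, delta=1):
    """MSER stability along one branch: q(t) = (A(t-d) - A(t+d)) / A(t).
    Thresholds where q(t) is minimal mark maximally stable regions."""
    q = {}
    for t in areas:
        if t - delta in areas and t + delta in areas:
            q[t] = (areas[t - delta] - areas[t + delta]) / areas[t]
    return q

q = mser_stability(areas)
best = min(q, key=q.get)   # most stable threshold along this branch
```

Swapping the underlying hierarchy, e.g. a Tree of Shapes instead of a min/max-tree, changes which components exist and hence which regions come out as stable, which is the lever the thesis exploits.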
APA, Harvard, Vancouver, ISO, and other styles
24

Goodrum, Abby A. (Abby Ann). "Evaluation of Text-Based and Image-Based Representations for Moving Image Documents." Thesis, University of North Texas, 1997. https://digital.library.unt.edu/ark:/67531/metadc500441/.

Full text
Abstract:
Document representation is a fundamental concept in information retrieval (IR), and has been relied upon in textual IR systems since the advent of library catalogs. The reliance upon text-based representations of stored information has been perpetuated in conventional systems for the retrieval of moving images as well. Although newer systems have added image-based representations of moving image documents as aids to retrieval, there has been little research examining how humans interpret these different types of representations. Such basic research has the potential to inform IR system designers about how best to aid users of their systems in retrieving moving images. One key requirement for the effective use of document representations in either textual or image form is the degree to which these representations are congruent with the original documents. A measure of congruence is the degree to which human responses to representations are similar to responses produced by the document being represented. The aim of this study was to develop a model for the representation of moving images based upon human judgements of representativeness. The study measured the degree of congruence between moving image documents and their representations, both text- and image-based, in a non-retrieval environment with and without task constraints. Multidimensional scaling (MDS) was used to examine the dimensional dispersions of human judgements for the full moving images and their representations.
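The MDS step mentioned here can be illustrated with classical (Torgerson) MDS, which recovers a spatial configuration from a matrix of pairwise dissimilarities; this is a generic numpy sketch of the technique, not the study's own computation or data.

```python
import numpy as np

def classical_mds(D, k=2):
    """Classical (Torgerson) MDS: embed n items in k dimensions from an
    n x n matrix of pairwise distances via double centering."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J               # double-centred Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:k]          # top-k eigenvalues
    L = np.sqrt(np.maximum(vals[idx], 0))
    return vecs[:, idx] * L                   # coordinates, one row per item

# Three "documents" whose dissimilarities place them on a line (gaps 1, 1)
D = np.array([[0, 1, 2],
              [1, 0, 1],
              [2, 1, 0]], dtype=float)
X = classical_mds(D, k=1)
```

With human similarity judgements in place of exact distances, the recovered axes are then inspected for interpretable perceptual dimensions.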
APA, Harvard, Vancouver, ISO, and other styles
25

Zlatoff, Nicolas. "Indexation d'images 2D : vers une reconnaissance d'objets multi-critères = Content-based image retrieval : On the way to object features." Lyon, INSA, 2006. http://theses.insa-lyon.fr/publication/2006ISAL0039/these.pdf.

Full text
Abstract:
D'importants volumes d'images numériques, conduisent aujourd'hui à une forte demande d'outils permettant d'indexer puis de rechercher une image. Indexer une image consiste à en extraire une signature. Rechercher une image dans une base consiste alors à comparer plusieurs signatures entre elles. Une indexation est dite basée sur le contenu lorsqu'elle utilise les données de bas niveau (couleur, texture) de l'image pour construire la signature. De tels systèmes sont face à une limitation fondamentale : ils permettent aux utilisateurs de rechercher des images d'après leurs caractéristiques de bas niveaux (matière) alors que ces derniers préfèreraient une recherche plus sémantique, relative à ce que l'image décrit (les objets présents, par exemple). Dans cette thèse, nous proposons un système d'indexation qui permet de réduire le fossé entre les données de bas niveau et la sémantique. Tout d'abord, l'utilisateur formule, lors de la requête, un modèle (prototype) de l'objet recherché. Lors de la comparaison, entre ce modèle et les images de la base, plusieurs critères sont utilisés, comme la forme mais aussi l'organisation spatiale de différentes zones d'intérêt. Une étape cruciale consiste justement à extraire de telles zones d'intérêt. Les approches de segmentation sont souvent entachées d'erreur, notamment à cause de variation d'éclairage dans la scène. Nous proposons donc de ne pas décrire une image par une segmentation unique mais plutôt par une hiérarchie de segmentations. Celle-ci représente l'image à différents niveaux de détails et se construit à partir de regroupements successifs de régions (groupements perceptuels), basés à la fois sur des critères de bas niveaux mais aussi géométriques. Durant la comparaison entre un modèle et une image, nous considérons les correspondances entre chacune des parties au lieu d'utiliser seulement le modèle dans sa globalité. 
Plus précisément, la correspondance prend en compte les formes des parties, à travers les descripteurs ART (Angular Radial Transform) et CSS (Curvature Scale Space). En outre, l'organisation spatiale des parties entre elles est également prise en compte. Toutes ces caractéristiques sont combinées entre elles, par la théorie de l'évidence de Shafer afin d'en déduire une mesure unique de similarité
Huge volumes of digital images have recently created strong needs for indexing and retrieval tools. Indexing an image consists in extracting a signature from it; retrieving an image from an image database then implies comparing several signatures together. We call content-based image retrieval systems those which build a signature from low-level signal features such as colour or texture. Such systems face a crucial limitation today: they allow retrieving an image from a signal point of view, while users usually seek a more semantic search, related to what the image depicts (objects, for instance). In this thesis, we have proposed an indexing system which may help bridge the gap between low-level features and semantics. First, the user formulates a kind of model (prototype) of the object sought. Then, while comparing this model with each image from the database, several features are considered, such as shape but also structural relationships between regions of interest. The extraction of those regions remains an open and challenging problem: segmentation approaches are often error-prone, because of artifacts from slight variations in the illumination of the scene. That is why we do not describe an image with one unique segmentation, but rather with a hierarchy of segmentations. This represents the image at several levels of detail, and is built by iterative perceptual groupings of regions, considering both low-level and geometric features. When comparing a model with an image, we use one-to-one matching between model parts and image regions, instead of considering the model as a whole. More precisely, comparison is based on shape similarity (through the Angular Radial Transform and Curvature Scale Space) and on structural relationships among the parts of the object. All these features are then combined, using Dempster-Shafer theory of evidence, to derive one single similarity measure
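Dempster's rule of combination, the Dempster-Shafer mechanism used here to fuse shape and structural similarity into one measure, can be sketched as follows (the two mass functions are invented toy values, not the thesis's; 'M' = the object matches, 'N' = it does not):

```python
def dempster_combine(m1, m2):
    """Dempster's rule: combine two mass functions whose focal elements are
    frozensets of hypotheses; mass on conflicting pairs is renormalised away."""
    combined, conflict = {}, 0.0
    for a, w1 in m1.items():
        for b, w2 in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2
    norm = 1.0 - conflict
    return {k: v / norm for k, v in combined.items()}

# Two cues: shape similarity and spatial layout, each giving partial support
shape  = {frozenset('M'): 0.6, frozenset('MN'): 0.4}   # 0.4 = "don't know"
layout = {frozenset('M'): 0.5, frozenset('MN'): 0.5}
belief = dempster_combine(shape, layout)
```

Mass left on the full frame {M, N} models each cue's ignorance, which is what makes evidence theory attractive for combining unreliable descriptors.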
APA, Harvard, Vancouver, ISO, and other styles
26

O'Connor, Maureen J. Patillo Paul J. "Reengineering human performance and fatigue research through use of physiological monitoring devices, web-based and mobile device data collection methods, and integrated data storage techniques /." Monterey, Calif. : Springfield, Va. : Naval Postgraduate School ; Available from National Technical Information Service, 2003. http://library.nps.navy.mil/uhtbin/hyperion-image/03Dec%5FO'Connor.pdf.

Full text
Abstract:
Thesis (M.S. in Information Technology Management)--Naval Postgraduate School, December 2003.
Thesis advisor(s): Nita L. Miller, Thomas J. Housel. Includes bibliographical references (p. 115-117). Also available online.
APA, Harvard, Vancouver, ISO, and other styles
27

Cáceres, Sheila Maricela Pinto. "Técnicas de visualização para sistemas de recuperação de imagens por conteúdo." [s.n.], 2010. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275783.

Full text
Abstract:
Orientador: Ricardo da Silva Torres
Dissertação (mestrado) - Universidade Estadual de Campinas, Instituto de Computação
Made available in DSpace on 2018-08-17T03:09:27Z (GMT). No. of bitstreams: 1 Caceres_SheilaMaricelaPinto_M.pdf: 5019224 bytes, checksum: cf87ae15c741c8322d4d398124abee74 (MD5) Previous issue date: 2010
Resumo: Um sistema de Recuperação de Imagens por Conteúdo (CBIR) oferece mecanismos necessários para busca e recuperação de imagens baseando-se em propriedades visuais como cor, textura, forma, etc. Em um processo de busca de imagens, a apresentação de resultados é um componente essencial, na medida em que a obtenção desses resultados é o motivo da existência do sistema. Consequentemente, o uso de técnicas de visualização apropriadas pode determinar o sucesso ou o fracasso de um sistema CBIR. Técnicas de visualização são valiosas ferramentas na exploração de grandes quantidades de dados, como coleções de imagens. Contudo, técnicas para visualizar imagens retornadas por sistemas CBIR têm sido pobremente exploradas. Este trabalho apresenta um estudo comparativo e avaliação de várias técnicas de visualização para sistemas CBIR. Como resultado desse estudo, propõe-se um conjunto de técnicas originais que tentam suprir algumas das limitações identificadas em métodos da literatura. Dentre as características das técnicas propostas, destacam-se o enfoque baseado no centro e o uso de técnicas de agrupamento de dados para representar a similaridade intrínseca entre as imagens retornadas. Resultados experimentais mostram que os métodos propostos superam outras estratégias de visualização, considerando-se diversos critérios, como adequação para mostrar resultados em sistemas CBIR, quantidade de informação oferecida, satisfação de usuário, etc. 
As principais contribuições deste trabalho são: (i) estudo comparativo e análise de sete técnicas de visualização, quatro delas existentes na literatura e três técnicas novas propostas; (ii) avaliação de duas técnicas da literatura nunca antes avaliadas: anéis concêntricos e espiral; (iii) especificação e implementação de três novas técnicas de visualização baseadas em agrupamento; (iv) especificação e implementação de um framework para desenvolvimento de novas estruturas visuais para sistemas CBIR no qual foram implementadas as técnicas de visualização estudadas
Abstract: A Content-Based Image Retrieval (CBIR) system offers the mechanisms needed to search and retrieve images based on visual properties such as color, texture, shape, etc. In an image search process, the presentation of results is an essential component, as the retrieval of relevant images is the reason for the system's existence. Consequently, the use of appropriate visualization techniques may determine the success of a CBIR system. Visualization techniques are valuable tools for exploring large quantities of data, such as image collections. However, techniques for visualizing images in CBIR systems have been poorly explored. This work presents a comparative study of several visualization techniques for CBIR systems. As a result of this study, several original techniques were proposed to provide some of the characteristics absent from existing methods, such as a center-based focus and the use of clustering approaches to represent the intrinsic similarity between retrieved images. Experimental results show that the proposed methods outperform other visualization strategies on several criteria, such as suitability for showing CBIR results, information load, user satisfaction, etc. The main contributions of this work are: (i) a comparative study and analysis of seven visualization techniques, four of them from the literature and three new ones; (ii) validation of two techniques never evaluated before: concentric rings and spiral; (iii) specification and implementation of three new clustering-based visualization techniques; (iv) specification and implementation of a framework for developing new visual structures for content-based image retrieval systems. The studied techniques were implemented using this framework
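A centre-focused spiral layout of ranked results, one of the visualization families evaluated here, can be sketched with golden-angle (Vogel) placement; this is an illustrative layout, not the dissertation's exact technique, and the spacing parameter is an assumption.

```python
import math

def spiral_layout(n, spacing=10.0):
    """Centre-focused spiral: result 0 (most similar image) sits at the
    origin, later ranks spiral outward at the golden angle."""
    golden = math.pi * (3 - math.sqrt(5))     # golden angle, ~137.5 degrees
    pts = []
    for i in range(n):
        r = spacing * math.sqrt(i)            # sqrt keeps density roughly uniform
        theta = i * golden
        pts.append((r * math.cos(theta), r * math.sin(theta)))
    return pts

# Thumbnail positions for the top 50 retrieved images
pts = spiral_layout(50)
```

The design choice this encodes is that distance from the centre maps to rank, so a glance at the screen reads as a similarity gradient.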
Mestrado
Ciência da Computação
Mestre em Ciência da Computação
APA, Harvard, Vancouver, ISO, and other styles
28

Le, Huu Ton. "Improving image representation using image saliency and information gain." Thesis, Poitiers, 2015. http://www.theses.fr/2015POIT2287/document.

Full text
Abstract:
De nos jours, avec le développement des nouvelles technologies multimédia, la recherche d’images basée sur le contenu visuel est un sujet de recherche en plein essor avec de nombreux domaines d'application: indexation et recherche d’images, la graphologie, la détection et le suivi d’objets... Un des modèles les plus utilisés dans ce domaine est le sac de mots visuels qui tire son inspiration de la recherche d’information dans des documents textuels. Dans ce modèle, les images sont représentées par des histogrammes de mots visuels à partir d'un dictionnaire visuel de référence. La signature d’une image joue un rôle important car elle détermine la précision des résultats retournés par le système de recherche.Dans cette thèse, nous étudions les différentes approches concernant la représentation des images. Notre première contribution est de proposer une nouvelle méthodologie pour la construction du vocabulaire visuel en utilisant le gain d'information extrait des mots visuels. Ce gain d’information est la combinaison d’un modèle de recherche d’information avec un modèle d'attention visuelle.Ensuite, nous utilisons un modèle d'attention visuelle pour améliorer la performance de notre modèle de sacs de mots visuels. Cette étude de la saillance des descripteurs locaux souligne l’importance d’utiliser un modèle d’attention visuelle pour la description d’une image.La dernière contribution de cette thèse au domaine de la recherche d’information multimédia démontre comment notre méthodologie améliore le modèle des sacs de phrases visuelles. Finalement, une technique d’expansion de requêtes est utilisée pour augmenter la performance de la recherche par les deux modèles étudiés
Nowadays, along with the development of multimedia technology, content based image retrieval (CBIR) has become an interesting and active research topic with an increasing number of application domains: image indexing and retrieval, face recognition, event detection, handwriting scanning, object detection and tracking, image classification, landmark detection... One of the most popular models in CBIR is Bag of Visual Words (BoVW), which is inspired by the Bag of Words model from the Information Retrieval field. In the BoVW model, images are represented by histograms of visual words from a visual vocabulary; by comparing image signatures, we can tell the difference between images. Image representation plays an important role in a CBIR system as it determines the precision of the retrieval results. In this thesis, the image representation problem is addressed. Our first contribution is to propose a new framework for visual vocabulary construction using information gain (IG) values. The IG values are computed by a weighting scheme combined with a visual attention model. Secondly, we propose to use a visual attention model to improve the performance of the proposed BoVW model. This contribution addresses the importance of salient key-points in the images through a study of the saliency of local feature detectors. Inspired by the results of this study, we use saliency as a weighting or as an additional histogram for image representation. The last contribution of this thesis to CBIR shows how our framework enhances the BoVP model. Finally, a query expansion technique is employed to increase the retrieval scores of both the BoVW and BoVP models.
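The BoVW signature construction can be sketched as nearest-word assignment over a toy vocabulary (the vocabulary, descriptors, and the optional weights argument, which stands in for the information-gain weighting described above, are all invented for illustration):

```python
import numpy as np

def bovw_signature(descriptors, vocabulary, weights=None):
    """Bag-of-Visual-Words signature: assign each local descriptor to its
    nearest visual word, then build a normalised (optionally weighted)
    histogram of word counts."""
    d2 = ((descriptors[:, None, :] - vocabulary[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                              # nearest word per descriptor
    hist = np.bincount(words, minlength=len(vocabulary)).astype(float)
    if weights is not None:                                # e.g. per-word IG weights
        hist *= weights
    return hist / max(hist.sum(), 1e-12)

# Two-word vocabulary, three local descriptors from one image
vocab = np.array([[0.0, 0.0], [1.0, 1.0]])
desc  = np.array([[0.1, 0.0], [0.9, 1.1], [1.0, 0.9]])
sig = bovw_signature(desc, vocab)
```

In a full system the vocabulary would come from clustering (e.g. k-means) over descriptors of a training set, and images are compared through their signature histograms.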
APA, Harvard, Vancouver, ISO, and other styles
29

Li, Honglin. "Hierarchical video semantic annotation the vision and techniques /." Connect to this title online, 2003. http://rave.ohiolink.edu/etdc/view?acc%5Fnum=osu1071863899.

Full text
Abstract:
Thesis (Ph. D.)--Ohio State University, 2003.
Title from first page of PDF file. Document formatted into pages; contains xv, 146 p.; also includes graphics. Includes bibliographical references (p. 136-146).
APA, Harvard, Vancouver, ISO, and other styles
30

Bahga, Arshdeep. "Technologies for context based video search." Thesis, Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/33824.

Full text
Abstract:
This thesis presents methods and a system for video search over the internet or an intranet. The objective is to design a real-time, automated video clustering and search system that provides users of the search engine with the most relevant videos responsive to a query at a particular moment in time, together with supplementary information that may also be useful. The thesis highlights methods to mitigate the effect of the semantic gap faced by current content-based video search approaches. A context-sensitive video ranking scheme is used, wherein the context is generated in an automated manner.
APA, Harvard, Vancouver, ISO, and other styles
31

Xiong, Li. "Resilient Reputation and Trust Management: Models and Techniques." Diss., Georgia Institute of Technology, 2005. http://hdl.handle.net/1853/7483.

Full text
Abstract:
The continued advances in service-oriented computing and global communications have created a strong technology push for online information sharing and business transactions among enterprises, organizations and individuals. While these communities offer enormous opportunities, they also present potential threats due to a lack of trust. Reputation systems provide a way for building trust through social control by harnessing the community knowledge in the form of feedback. Although feedback-based reputation systems help community participants decide who to trust and encourage trustworthy behavior, they also introduce vulnerabilities due to potential manipulations by dishonest or malicious players. Therefore, building an effective and resilient reputation system remains a big challenge for the wide deployment of service-oriented computing. This dissertation proposes a decentralized reputation based trust supporting framework called PeerTrust, focusing on models and techniques for resilient reputation management against feedback aggregation related vulnerabilities, especially feedback sparsity with potential feedback manipulation, feedback oscillation, and loss of feedback privacy. This dissertation research has made three unique contributions for building a resilient decentralized reputation system. First, we develop a core reputation model with important trust parameters and a coherent trust metric for quantifying and comparing the trustworthiness of participants. We develop decentralized strategies for implementing the trust model in an efficient and secure manner. Second, we develop techniques countering potential vulnerabilities associated with feedback aggregation, including a similarity inference scheme to counter feedback sparsity with potential feedback manipulations, and a novel metric based on Proportional, Integral, and Derivative (PID) model to handle strategic oscillating behavior of participants. 
Third but not least, we develop privacy-conscious trust management models and techniques to address the loss of feedback privacy. We develop a set of novel probabilistic decentralized privacy-preserving computation protocols for important primitive operations. We show how feedback aggregation can be divided into individual steps that utilize the above primitive protocols, through an example reputation algorithm based on kNN classification. We perform experimental evaluations for each of the proposed schemes and show the feasibility, effectiveness, and cost of our approach. The PeerTrust framework presents an important step forward with respect to developing attack-resilient reputation trust systems.
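The PID idea for damping strategically oscillating behaviour can be illustrated with a toy proportional/integral/derivative combination of feedback ratings (the gains and the exact formula are assumptions for illustration, not PeerTrust's actual metric): the derivative term penalises sudden rating swings, so oscillators score below consistently good peers.

```python
def pid_trust(feedback, kp=0.6, ki=0.3, kd=0.1):
    """Toy PID-style reputation: combine the current rating (P), the running
    history average (I), and the recent change (D) into one trust score."""
    scores = []
    integral = 0.0
    prev = feedback[0]
    for i, f in enumerate(feedback, start=1):
        integral += f
        p, hist_avg, d = f, integral / i, f - prev
        scores.append(kp * p + ki * hist_avg + kd * d)
        prev = f
    return scores

steady = pid_trust([0.9] * 6)             # consistently trustworthy peer
oscillating = pid_trust([0.9, 0.1] * 3)   # peer alternating good/bad behaviour
```

Tuning the gains trades responsiveness to recent behaviour against stability of the long-run reputation.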
APA, Harvard, Vancouver, ISO, and other styles
32

Tao, Cui. "Schema Matching and Data Extraction over HTML Tables." Diss., CLICK HERE for online access, 2003. http://contentdm.lib.byu.edu/ETD/image/etd279.pdf.

Full text
APA, Harvard, Vancouver, ISO, and other styles
33

Rasheed, Zeeshan. "Video categorization using semantics and semiotics." Doctoral diss., University of Central Florida, 2003. http://digital.library.ucf.edu/cdm/ref/collection/RTD/id/2888.

Full text
Abstract:
University of Central Florida College of Engineering Thesis
There is a great need to automatically segment, categorize, and annotate video data, and to develop efficient tools for browsing and searching. We believe that the categorization of videos can be achieved by exploring the concepts and meanings of the videos. This task requires bridging the gap between low-level content and high-level concepts (or semantics). Once a relationship is established between the low-level computable features of the video and its semantics, the user would be able to navigate through videos using concepts and ideas (for example, a user could extract only those scenes in an action film that actually contain fights) rather than sequentially browsing the whole video. However, this relationship must follow the norms of human perception and abide by the rules that are most often followed by the creators (directors) of these videos. These rules are called film grammar in video production literature. Like any natural language, this grammar has several dialects, but it has been acknowledged to be universal. Therefore, the knowledge of film grammar can be exploited effectively for the understanding of films. To interpret an idea using the grammar, we need first to understand the symbols, as in natural languages, and second, to understand the rules of combination of these symbols to represent concepts. In order to develop algorithms that exploit this film grammar, it is necessary to relate the symbols of the grammar to computable video features.
Ph.D.
Doctorate;
School of Electrical Engineering and Computer Science
Engineering and Computer Science
Electrical Engineering and Computer Science
120 p.
xix, 120 leaves, bound : ill., (some col.) ; 28 cm.
APA, Harvard, Vancouver, ISO, and other styles
34

Awad, Dounia. "Vers un système perceptuel de reconnaissance d'objets." Thesis, La Rochelle, 2014. http://www.theses.fr/2014LAROS017/document.

Full text
Abstract:
The main objective of this thesis is to propose a pipeline for object recognition that is close to human perception while addressing the complexity problems of content-based image retrieval (CBIR): query run time and memory use. We first propose a filter based on a visual attention system that selects, among the interest points extracted by traditional detectors, those that are salient according to human interest [Awad 12]. Testing this approach on the VOC 2005 databases, with Perreira Da Silva's attention system as the filter, showed that keeping only 40% of the interest points (extracted by Harris-Laplace and Laplacian detectors), i.e. filtering out 60%, only slightly degrades the performance of an object recognition system (average AUC difference of about 1%) while yielding an important gain in complexity (40% in query run time and 60% in complexity). We then address the high dimensionality of descriptors by proposing a hybrid perceptual-texture descriptor [Awad 14], which encodes the spatial frequency of features considered perceptually relevant in visual attention research, such as color, contrast, and orientation. This descriptor has the advantage of producing feature vectors with half the dimensionality of state-of-the-art descriptors. Evaluated in an object recognition system on the VOC 2007 databases, it showed a slight drop in performance (about 5% loss in average precision) compared to the original system based on the SIFT descriptor, with a 50% gain in complexity. Finally, to test the overall efficiency of our descriptor, we ran a further experiment on the VOC 2005 databases using the visual attention system itself as the interest point detector. Again the system showed only a slight drop in performance (about 3% loss in average precision) while complexity was drastically reduced (about 50% gain in query run time and 70% in complexity).
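The filtering step described in this abstract can be sketched as follows. This is an illustrative reading only, not the thesis code: the saliency-map representation, the keypoint format, and the 40% keep-rate are stand-in assumptions chosen to match the operating point reported above.

```python
import numpy as np

def filter_keypoints(keypoints, saliency_map, keep_ratio=0.4):
    """Keep only the keypoints that fall on the most salient locations.

    keypoints    : list of (x, y) pixel coordinates from any detector
                   (e.g. Harris-Laplace or Laplacian).
    saliency_map : 2-D array, higher value = more visually salient.
    keep_ratio   : fraction of keypoints to keep (0.4 matches the
                   "keep 40%" operating point reported above).
    """
    scores = np.array([saliency_map[y, x] for (x, y) in keypoints])
    n_keep = max(1, int(len(keypoints) * keep_ratio))
    best = np.argsort(scores)[::-1][:n_keep]  # most salient first
    return [keypoints[i] for i in sorted(best)]

# Toy example: a 4x4 saliency map whose top-left corner is salient.
saliency = np.zeros((4, 4))
saliency[0, 0], saliency[0, 1] = 1.0, 0.8
pts = [(0, 0), (1, 0), (3, 3), (2, 2), (1, 1)]
kept = filter_keypoints(pts, saliency, keep_ratio=0.4)
```

The downstream recognizer then receives only `kept`, which is what produces the reported gains in query run time and memory.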
APA, Harvard, Vancouver, ISO, and other styles
35

Vemulapalli, Smita. "Audio-video based handwritten mathematical content recognition." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/45958.

Full text
Abstract:
Recognizing handwritten mathematical content is a challenging problem, and more so when such content appears in classroom videos. However, given the fact that in such videos the handwritten text and the accompanying audio refer to the same content, a combination of video and audio based recognizer has the potential to significantly improve the content recognition accuracy. This dissertation, using a combination of video and audio based recognizers, focuses on improving the recognition accuracy associated with handwritten mathematical content in such videos. Our approach makes use of a video recognizer as the primary recognizer and a multi-stage assembly, developed as part of this research, is used to facilitate effective combination with an audio recognizer. Specifically, we address the following challenges related to audio-video based handwritten mathematical content recognition: (1) Video Preprocessing - generates a timestamped sequence of segmented characters from the classroom video in the face of occlusions and shadows caused by the instructor, (2) Ambiguity Detection - determines the subset of input characters that may have been incorrectly recognized by the video based recognizer and forwards this subset for disambiguation, (3) A/V Synchronization - establishes correspondence between the handwritten character and the spoken content, (4) A/V Combination - combines the synchronized outputs from the video and audio based recognizers and generates the final recognized character, and (5) Grammar Assisted A/V Based Mathematical Content Recognition - utilizes a base mathematical speech grammar for both character and structure disambiguation. Experiments conducted using videos recorded in a classroom-like environment demonstrate the significant improvements in recognition accuracy that can be achieved using our techniques.
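The ambiguity-detection and A/V-combination stages can be illustrated with a small sketch. The margin test, the weights, and the score format are all hypothetical; the dissertation's multi-stage assembly is certainly more elaborate than this weighted fallback.

```python
def combine_av(video_scores, audio_scores, ambiguity_margin=0.2, w_video=0.6):
    """Combine a video recognizer (primary) with an audio recognizer.

    video_scores / audio_scores : dicts mapping candidate symbols to
    normalized confidences. If the video recognizer's top choice wins by
    more than `ambiguity_margin`, trust it; otherwise disambiguate with a
    weighted combination of both modalities. The margin and weights are
    illustrative stand-ins, not values from the dissertation.
    """
    ranked = sorted(video_scores, key=video_scores.get, reverse=True)
    top = ranked[0]
    second = ranked[1] if len(ranked) > 1 else ranked[0]
    if video_scores[top] - video_scores[second] > ambiguity_margin:
        return top  # unambiguous: the video decision stands
    combined = {
        c: w_video * video_scores.get(c, 0.0)
           + (1 - w_video) * audio_scores.get(c, 0.0)
        for c in set(video_scores) | set(audio_scores)
    }
    return max(combined, key=combined.get)

# 'x' vs 'times' is visually ambiguous but acoustically clear:
video = {"x": 0.45, "times": 0.40, "y": 0.15}
audio = {"times": 0.9, "x": 0.1}
result = combine_av(video, audio)
```

Only the ambiguous subset of characters ever reaches the combination step, which is what keeps the audio recognizer's cost bounded.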
APA, Harvard, Vancouver, ISO, and other styles
36

Andrade, Felipe dos Santos Pinto de 1986. "Combinação de descritores locais e globais para recuperação de imagens e vídeos por conteúdo." [s.n.], 2012. http://repositorio.unicamp.br/jspui/handle/REPOSIP/275668.

Full text
Abstract:
Advisors: Ricardo da Silva Torres, Hélio Pedrini
Master's thesis - Universidade Estadual de Campinas, Instituto de Computação
Recently, the fusion of descriptors has become a trend for improving performance in image and video retrieval tasks. Descriptors can be global or local, depending on how they analyze visual content. Most existing works have focused on the fusion of a single type of descriptor. In contrast, this work analyzes the impact of combining global and local descriptors. We perform a comparative study of different types of descriptors and all of their possible combinations. Furthermore, we investigate different models for extracting and comparing local and global features of images and videos, and evaluate the use of genetic programming as a suitable technique for combining these descriptors. Extensive experiments following a rigorous experimental design show that global and local descriptors complement each other and that, when combined, they outperform both other combinations and single descriptors.
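The late-fusion idea can be sketched minimally. Note that in the dissertation the combination function is *learned* by genetic programming; the fixed weighted sum below is only a simple stand-in for such a learned function, and all names are illustrative.

```python
def fuse_rankings(score_lists, weights):
    """Late fusion of similarity scores from several descriptors.

    score_lists : dict descriptor_name -> {image_id: similarity in [0, 1]}
    weights     : dict descriptor_name -> weight
    Returns image ids ranked by the weighted sum of per-descriptor scores.
    """
    ids = set()
    for scores in score_lists.values():
        ids |= set(scores)
    fused = {
        i: sum(w * score_lists[d].get(i, 0.0) for d, w in weights.items())
        for i in ids
    }
    return sorted(fused, key=fused.get, reverse=True)

# A global (e.g. color) descriptor and a local (bag-of-features)
# descriptor disagree; fusion lets their strengths complement each other.
global_scores = {"img1": 0.9, "img2": 0.5, "img3": 0.2}
local_scores = {"img1": 0.3, "img2": 0.8, "img3": 0.9}
ranking = fuse_rankings(
    {"global": global_scores, "local": local_scores},
    {"global": 0.5, "local": 0.5},
)
```

Replacing the fixed `weights` with an evolved expression tree over the per-descriptor scores is the step genetic programming plays in the thesis.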
Mestrado
Ciência da Computação
Mestre em Ciência da Computação
APA, Harvard, Vancouver, ISO, and other styles
37

Gorisse, David. "Passage à l’échelle des méthodes de recherche sémantique dans les grandes bases d’images." Thesis, Cergy-Pontoise, 2010. http://www.theses.fr/2010CERG0519/document.

Full text
Abstract:
Over the last decade, the digital revolution has made the quantity of digital photos available to everyone grow faster than the processing capacity of computers. Current search tools were designed for small data volumes; their complexity generally prevents searching large corpora with response times acceptable to users. In this thesis, we propose solutions for scaling up content-based image search engines. First, we consider automatic search engines in which images are indexed as global histograms; scalability is obtained by introducing a new index structure adapted to this context, which allows approximate but more efficient nearest-neighbor searches. Second, we turn to more sophisticated engines that improve search quality by working with local features such as bags of interest points. Finally, we propose a strategy to reduce the computational complexity of interactive search engines, which refine their results using the annotations users provide during search sessions. Our strategy quickly selects the most relevant images to annotate by optimizing an active learning method.
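The trade-off behind approximate nearest-neighbor search over global histograms can be illustrated with generic sign-LSH: bucket histograms by the signs of a few random projections, then scan only the query's bucket instead of the whole collection. This is a textbook technique shown to illustrate the idea, not the index structure designed in the thesis.

```python
import numpy as np

class RandomProjectionIndex:
    """Approximate nearest-neighbour index for global histograms
    (illustrative sign-LSH sketch)."""

    def __init__(self, dim, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.planes = rng.standard_normal((n_bits, dim))
        self.buckets = {}

    def _key(self, h):
        # One bit per random hyperplane: which side does h fall on?
        return tuple((self.planes @ h > 0).astype(int))

    def add(self, image_id, histogram):
        self.buckets.setdefault(self._key(histogram), []).append(
            (image_id, np.asarray(histogram, float)))

    def query(self, histogram):
        # Exact scan restricted to the query's bucket only.
        candidates = self.buckets.get(self._key(histogram), [])
        if not candidates:
            return None
        return min(candidates,
                   key=lambda c: np.linalg.norm(c[1] - histogram))[0]

idx = RandomProjectionIndex(dim=4)
idx.add("a", [1, 0, 0, 0])
idx.add("b", [0, 0, 0, 1])
```

The approximation comes from true neighbors occasionally landing in other buckets; the speed-up comes from never scanning them.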
APA, Harvard, Vancouver, ISO, and other styles
38

Borba, Gustavo Benvenutti. "Automatic extraction of regions of interest from images based on visual attention models." Universidade Tecnológica Federal do Paraná, 2010. http://repositorio.utfpr.edu.br/jspui/handle/1/1295.

Full text
Abstract:
UOL; CAPES
This thesis presents a method for the extraction of regions of interest (ROIs) from images. By ROIs we mean the most prominent semantic objects in the image, of any size and at any position. The novel method is based on computational models of visual attention (VA), operates in a completely bottom-up and unsupervised way, and places no constraints on the category of the input images. At the core of the architecture are the VA models proposed by Itti, Koch, and Niebur and by Stentiford. The first model takes color, intensity, and orientation features into account and outputs coordinates corresponding to points of attention (POAs) in the image. The second model considers color features and outputs rough areas of attention (AOAs). In the proposed architecture, the POAs and AOAs are combined to establish the contours of the ROIs. Two implementations of this architecture are presented, a 'first version' and an 'improved version'. The first version relies mainly on traditional mathematical morphology operations and was applied in two novel region-based image retrieval systems. In the first system, images are clustered on the basis of their ROIs instead of the global characteristics of the image. This yields a semantically more meaningful organization of the database, since the resulting clusters tend to contain objects of the same category. The second system combines traditional global-based retrieval with region-based retrieval under a multiple-example query scheme. In the improved version of the architecture, the main stages are a spatial coherence analysis between the representations produced by the two VA models and a multiscale representation of the AOAs. Compared to the first version, the improved version is more versatile, especially with respect to the sizes of the ROIs present in the images. It was evaluated directly on a wide variety of images from different publicly available databases, with ground truth in the form of bounding boxes and true object contours, using precision, recall, F1, and area of overlap as performance measures. The results are of very high quality, particularly considering the purely bottom-up, unsupervised nature of the approach.
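One simplified reading of combining the two models' outputs: keep an attention area only if it is confirmed by at least one point of attention. The bounding-box representation and the containment test are our assumptions for illustration; the thesis establishes actual ROI contours, not boxes.

```python
def select_roi_boxes(aoa_boxes, poas):
    """Combine the two attention models' outputs.

    aoa_boxes : candidate attention areas from the Stentiford-style model,
                as (x0, y0, x1, y1) bounding boxes (illustrative format).
    poas      : points of attention (x, y) from the Itti-Koch-Niebur model.
    An area is kept as a region of interest only if at least one point of
    attention falls inside it.
    """
    def contains(box, p):
        x0, y0, x1, y1 = box
        return x0 <= p[0] <= x1 and y0 <= p[1] <= y1

    return [b for b in aoa_boxes if any(contains(b, p) for p in poas)]

boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (50, 50, 60, 60)]
points = [(5, 5), (55, 58)]
rois = select_roi_boxes(boxes, points)
```

The middle box is dropped because neither attention model agrees on it, which is the essence of the spatial coherence analysis described above.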
APA, Harvard, Vancouver, ISO, and other styles
39

Gbehounou, Syntyche. "Indexation de bases d'images : évaluation de l'impact émotionnel." Thesis, Poitiers, 2014. http://www.theses.fr/2014POIT2295/document.

Full text
Abstract:
The goal of this work is to propose an approach for recognizing the emotional impact of images, building on the techniques used in content-based image retrieval (descriptors, image representation) and testing that architecture on a more complex task: classifying images according to their emotion, which we define as "Negative", "Neutral", or "Positive". Emotions are related both to the image content and to personal experience, so no universal, high-performing emotion recognition system could be expected: we are not sensitive to the same things throughout our lives, and differences appear with age and gender. We try to overcome these inconsistencies by making the evaluation of the image databases as heterogeneous as possible. Our first contribution goes in this direction: we propose a set of 350 diversified images, very widely rated by people around the world. During this work we also studied the contribution of visual saliency, both during the subjective experiments and during image classification. The descriptors we chose were, for the most part, evaluated beforehand on a database dedicated to content-based image retrieval, so as to retain only the most relevant ones; they prove complementary and perform consistently. Our approach, which draws on a well-codified CBIR architecture, leads to very interesting results both on the database we built and on IAPS, the reference database for analyzing the emotional impact of images.
APA, Harvard, Vancouver, ISO, and other styles
40

Allani, Atig Olfa. "Une approche de recherche d'images basée sur la sémantique et les descripteurs visuels." Thesis, Paris 8, 2017. http://www.theses.fr/2017PA080032.

Full text
Abstract:
Image retrieval is a very active research area. Several approaches that map low-level features to high-level semantics have been proposed, among them object recognition, ontologies, and relevance feedback. However, their main limitations are a high dependence on reliable external resources and an inability to combine semantic and visual information effectively. This thesis proposes a system based on a pattern graph combining semantic and visual features, targeted selection of visual features for the online retrieval phase, and improved visualization of results. The idea is (1) to build a pattern graph composed of a modular ontology and a graph-based model for organizing semantic information, (2) to build collections of visual features to guide feature selection during the online retrieval phase, and (3) to improve the visualization of results by integrating the semantic relations deduced from the pattern graph. During pattern graph construction, the ontology modules associated with each domain are built automatically from textual corpora and external resources. The region graphs summarize the visual information in a condensed form and classify it according to its domain; the pattern graph is then obtained by composing the ontology modules. In building the visual feature collections, association rules are used to deduce best practices for the use of visual features in image retrieval. Finally, results visualization exploits the rich information attached to images to improve their presentation and reduce the level of abstraction with respect to the logic that generated them. Our system was tested on three image databases. The results show an improvement in the retrieval process, a better fit of the visual features to the covered domains, and a richer visualization of the results.
APA, Harvard, Vancouver, ISO, and other styles
41

Cantalloube, Faustine. "Détection et caractérisation d'exoplanètes dans des images à grand contraste par la résolution de problème inverse." Thesis, Université Grenoble Alpes (ComUE), 2016. http://www.theses.fr/2016GREAY017/document.

Full text
Abstract:
Direct imaging of exoplanets provides valuable information about the light they emit, their interactions with their host star's environment, and their nature. To image such objects, advanced data processing tools adapted to the instrument are needed. In particular, quasi-static speckles in the images, due to optical aberrations distorting the light from the observed star, prevent planetary signals from being distinguished. In this thesis I present two innovative image processing methods, both based on an inverse-problem approach, that disentangle the quasi-static speckles from the planetary signals; my work consisted of improving these two algorithms so that they can process on-sky images.
The first, ANDROMEDA, is an algorithm dedicated to point-source detection and characterization via a maximum likelihood approach. ANDROMEDA exploits the temporal diversity provided by the rotation of the image field (where the astrophysical objects lie) during the observation, while the pupil (where the aberrations originate) is kept fixed, to recognize the deterministic signature of a rotating companion over the stellar halo. Starting from the application of the original version to real data, I proposed and qualified improvements to account for residuals not modeled by the method: slowly varying low-order structures due to adaptive optics residuals and the remaining level of correlated noise in the data. Once ANDROMEDA was operational on real data, I analyzed its performance and its sensitivity to the user parameters, demonstrating the robustness of the method. A detailed comparison with the algorithms most widely used by the exoplanet imaging community showed that ANDROMEDA is competitive, with practical advantages; in particular, it is the only method that allows fully unsupervised detection. Through numerous applications to on-sky data from different instruments, ANDROMEDA proved its reliability and its efficiency in extracting the information contained in the images rapidly and systematically, with only one user parameter to tune. These applications also opened perspectives for adapting the tool to the current major challenges of exoplanet imaging.
The second method, MEDUSAE, consists in jointly estimating the aberrations (responsible for the speckle field) and the circumstellar objects of scientific interest by relying on a coronagraphic image formation model; it exploits the redundancy of information provided by multispectral data. To refine the inversion strategy and identify the most critical parameters, I applied MEDUSAE to a data set generated with the model used in the inversion. I then applied the method to more realistic simulated data in order to study the impact of the discrepancy between the inversion model and real data. Finally, I applied MEDUSAE to real data; the preliminary results identified the important information the method requires and suggested several directions of work for making the algorithm operational on on-sky data.
APA, Harvard, Vancouver, ISO, and other styles
42

Dang, Quoc Bao. "Information spotting in huge repositories of scanned document images." Thesis, La Rochelle, 2018. http://www.theses.fr/2018LAROS024/document.

Full text
Abstract:
Ce travail vise à développer un cadre générique qui est capable de produire des applications de localisation d'informations à partir d’une caméra (webcam, smartphone) dans des très grands dépôts d'images de documents numérisés et hétérogènes via des descripteurs locaux. Ainsi, dans cette thèse, nous proposons d'abord un ensemble de descripteurs qui puissent être appliqués sur des contenus aux caractéristiques génériques (composés de textes et d’images) dédié aux systèmes de recherche et de localisation d'images de documents. Nos descripteurs proposés comprennent SRIF, PSRIF, DELTRIF et SSKSRIF qui sont construits à partir de l’organisation spatiale des points d’intérêts les plus proches autour d'un point-clé pivot. Tous ces points sont extraits à partir des centres de gravité des composantes connexes de l‘image. A partir de ces points d’intérêts, des caractéristiques géométriques invariantes aux dégradations sont considérées pour construire nos descripteurs. SRIF et PSRIF sont calculés à partir d'un ensemble local des m points d’intérêts les plus proches autour d'un point d’intérêt pivot. Quant aux descripteurs DELTRIF et SSKSRIF, cette organisation spatiale est calculée via une triangulation de Delaunay formée à partir d'un ensemble de points d’intérêts extraits dans les images. Cette seconde version des descripteurs permet d’obtenir une description de forme locale sans paramètres. En outre, nous avons également étendu notre travail afin de le rendre compatible avec les descripteurs classiques de la littérature qui reposent sur l’utilisation de points d’intérêts dédiés de sorte qu'ils puissent traiter la recherche et la localisation d'images de documents à contenu hétérogène. La seconde contribution de cette thèse porte sur un système d'indexation de très grands volumes de données à partir d’un descripteur volumineux. Ces deux contraintes viennent peser lourd sur la mémoire du système d’indexation. 
En outre, la très grande dimensionnalité des descripteurs peut amener à une réduction de la précision de l'indexation, réduction liée au problème de dimensionnalité. Nous proposons donc trois techniques d'indexation robustes, qui peuvent toutes être employées sans avoir besoin de stocker les descripteurs locaux dans la mémoire du système. Cela permet, in fine, d’économiser la mémoire et d’accélérer le temps de recherche de l’information, tout en s’abstrayant d’une validation de type distance. Pour cela, nous avons proposé trois méthodes s’appuyant sur des arbres de décisions : « randomized clustering tree indexing” qui hérite des propriétés des kd-tree, « kmean-tree » et les « random forest » afin de sélectionner de manière aléatoire les K dimensions qui permettent de combiner la plus grande variance expliquée pour chaque nœud de l’arbre. Nous avons également proposé une fonction de hachage étendue pour l'indexation de contenus hétérogènes provenant de plusieurs couches de l'image. Comme troisième contribution de cette thèse, nous avons proposé une méthode simple et robuste pour calculer l'orientation des régions obtenues par le détecteur MSER, afin que celui-ci puisse être combiné avec des descripteurs dédiés. Comme la plupart de ces descripteurs visent à capturer des informations de voisinage autour d’une région donnée, nous avons proposé un moyen d'étendre les régions MSER en augmentant le rayon de chaque région. Cette stratégie peut également être appliquée à d'autres régions détectées afin de rendre les descripteurs plus distinctifs. Enfin, afin d'évaluer les performances de nos contributions, et en nous fondant sur l'absence d'ensemble de données publiquement disponibles pour la localisation d’information hétérogène dans des images capturées par une caméra, nous avons construit trois jeux de données qui sont disponibles pour la communauté scientifique
This work aims at developing a generic framework able to produce camera-based information-spotting applications over huge repositories of heterogeneous-content document images via local descriptors. The targeted systems take as input a portion of a camera-acquired image as a query, and return the focused portions of database images that best match it. We first propose a set of generic feature descriptors for camera-based document image retrieval and spotting systems. Our proposed descriptors comprise SRIF, PSRIF, DELTRIF and SSKSRIF, which are built from the spatial layout of the nearest keypoints around a pivot keypoint; all keypoints are extracted from the centroids of connected components. From these keypoints, invariant geometrical features are taken into account in the descriptor. SRIF and PSRIF are computed from a local set of the m nearest keypoints around a keypoint, while DELTRIF and SSKSRIF obtain a parameter-free local shape description via a Delaunay triangulation formed from the set of keypoints extracted from a document image. Furthermore, we propose a framework to compute the descriptors on the spatial layout of dedicated keypoints (e.g. SURF, SIFT or ORB) so that they can deal with heterogeneous-content camera-based document image retrieval and spotting. In practice, a large-scale indexing system with an enormous number of descriptors places a heavy burden on memory when they are stored. In addition, the high dimensionality of the descriptors can reduce indexing accuracy. We propose three robust indexing frameworks that can be employed without storing local descriptors in memory, which saves memory and speeds up retrieval by discarding distance validation. The randomized clustering tree indexing inherits from kd-trees, kmean-trees and random forests the way K dimensions are selected at random and combined with the highest-variance dimension at each node of the tree.
We also proposed a weighted Euclidean distance between two data points, computed and oriented along the highest-variance dimension. The second proposed method relies on an indexing system that employs a single simple hash table for indexing and retrieving without storing database descriptors. Besides, we propose an extended hashing-based method for indexing multiple kinds of features coming from multiple layers of the image. Along with the proposed descriptors and indexing frameworks, we propose a simple and robust way to compute the shape orientation of MSER regions so that they can be combined with dedicated descriptors (e.g. SIFT, SURF, ORB) in a rotation-invariant manner. When descriptors are able to capture neighborhood information around MSER regions, we propose a way to extend MSER regions by increasing the radius of each region. This strategy can also be applied to other detected regions in order to make descriptors more distinctive. Moreover, we employ the extended hashing-based method for indexing multiple kinds of features from multiple layers of images; this system applies not only to a uniform feature type but also to multiple feature types from separate layers. Finally, in order to assess the performance of our contributions, and given that no public dataset exists for camera-based document image retrieval and spotting systems, we built a new dataset which has been made freely and publicly available to the scientific community. This dataset contains portions of document images acquired with a camera as queries. It is composed of three kinds of information: textual content, graphical content and heterogeneous content.
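As a rough illustration of the idea behind SRIF-style descriptors — describing a pivot keypoint by rotation- and scale-invariant geometry of its m nearest keypoints — here is a minimal NumPy sketch. The function name and the exact feature set (sorted distance ratios plus relative angles) are simplifications of ours, not the thesis's actual formulation:

```python
import numpy as np

def srif_like_descriptor(points, pivot_idx, m=4):
    """Sketch of an SRIF-like local descriptor: characterize a pivot keypoint
    by degradation-invariant geometry of its m nearest neighbours.
    (Hypothetical simplification; the real SRIF/PSRIF are more elaborate.)"""
    pts = np.asarray(points, dtype=float)
    pivot = pts[pivot_idx]
    d = np.linalg.norm(pts - pivot, axis=1)
    d[pivot_idx] = np.inf                       # exclude the pivot itself
    nn = np.argsort(d)[:m]                      # m nearest keypoints
    vecs = pts[nn] - pivot
    dists = np.linalg.norm(vecs, axis=1)
    ratios = np.sort(dists / dists.max())       # scale-invariant distance ratios
    angles = np.arctan2(vecs[:, 1], vecs[:, 0])
    rel = np.sort((angles - angles[0]) % (2 * np.pi))  # rotation-invariant angles
    return np.concatenate([ratios, rel])
```

Because only ratios and relative angles are kept, the descriptor is unchanged when the keypoint cloud is rotated and uniformly rescaled around the pivot.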
APA, Harvard, Vancouver, ISO, and other styles
43

Basunia, Mahmudunnabi. "A Recursive Phase Retrieval Technique Using Transport of Intensity: Reconstruction of Imaged Phase and 3D Surfaces." University of Dayton / OhioLINK, 2016. http://rave.ohiolink.edu/etdc/view?acc_num=dayton1481049563470488.

Full text
APA, Harvard, Vancouver, ISO, and other styles
44

Faessel, Nicolas. "Indexation et interrogation de pages web décomposées en blocs visuels." Thesis, Aix-Marseille 3, 2011. http://www.theses.fr/2011AIX30014/document.

Full text
Abstract:
This thesis deals with the indexing and querying of Web pages. In this context, we propose a new model, BlockWeb, based on the decomposition of Web pages into a hierarchy of visual blocks. The model takes into account the visual importance of each block and the permeability of each block to the content of its neighbouring blocks on the page. This decomposition has several advantages for indexing and querying. In particular, it allows querying at a finer granularity than the page: the blocks most similar to a query can be returned instead of the whole page. A page is represented as a directed acyclic graph in which each node is associated with a block and labeled with the importance of that block, and each arc is labeled with the permeability of the target block to the source block. In order to build this graph from the block-tree representation of a page, we propose a new language, XIML (XML Indexing Management Language), a rule-based language in the style of XSLT. We experimented with our model on two distinct applications: finding the best entry point in a corpus of electronic newspaper articles, and indexing and retrieving images in a corpus from the ImagEval 2006 campaign. We present the results of these experiments.
This thesis is about indexing and querying Web pages. We propose a new model called BlockWeb, based on the decomposition of Web pages into a hierarchy of visual blocks. This model takes into account the visual importance of each block as well as the permeability of each block's content to its neighbouring blocks on the page. Splitting up a page into blocks has several advantages in terms of indexing and querying. It allows querying the system at a finer granularity than the whole page: the blocks most similar to the query can be returned instead of the whole page. A page is modeled as a directed acyclic graph, the IP graph, where each node is associated with a block and labeled with that block's coefficient of importance, and each arc is labeled with the coefficient of permeability of the target node's content to the source node's content. In order to build this graph from the block-tree representation of a page, we propose a new language, XIML (acronym for XML Indexing Management Language), a rule-based language like XSLT. The model has been assessed on two distinct datasets: finding the best entry point in a dataset of electronic newspaper articles, and image indexing and querying in a dataset drawn from Web pages of the ImagEval 2006 campaign. We present the results of these experiments.
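A minimal sketch of the importance/permeability idea: each block's index term weights combine its own text, scaled by its importance coefficient, with terms "leaking" in along incoming arcs, scaled by the arc's permeability. The data layout and the exact combination rule below are our illustrative assumptions, not the thesis's XIML semantics:

```python
from collections import Counter

def block_term_weights(blocks, arcs):
    """blocks: {block_id: (importance, Counter_of_term_frequencies)}
       arcs:   {(src_id, dst_id): permeability}  (content flows src -> dst).
       Returns per-block term weights (sketch of the IP-graph weighting)."""
    weights = {}
    for bid, (imp, terms) in blocks.items():
        # own content, scaled by the block's importance
        w = Counter({t: imp * f for t, f in terms.items()})
        # content permeating in from neighbouring blocks
        for (src, dst), perm in arcs.items():
            if dst == bid:
                src_imp, src_terms = blocks[src]
                for t, f in src_terms.items():
                    w[t] += perm * src_imp * f
        weights[bid] = w
    return weights
```

With this weighting, a block can match a query on a term it does not itself contain, provided a permeable neighbour does.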
APA, Harvard, Vancouver, ISO, and other styles
45

Tarafdar, Arundhati. "Wordspotting from multilingual and stylistic documents." Thesis, Tours, 2017. http://www.theses.fr/2017TOUR4022/document.

Full text
Abstract:
Document image analysis (DIA) tools and methods now make it possible to run keyword searches over document image collections even when no transcription is available. In this context, much work has already been done on OCR and on word-spotting systems dedicated to textual documents with a simple layout. By contrast, very few approaches have addressed search in documents containing multi-oriented and multi-scale text, as found in graphical documents. For instance, images of geographic maps may contain symbols, graphics and text at different orientations and sizes; in these documents, characters may also be connected to one another or to graphical elements. Word spotting in such documents is therefore a difficult task. In this thesis we propose a set of tools and methods for spotting words written in Bangla or English (Roman script) in images of geographic documents. The proposed approach relies on several original contributions.
Word spotting in graphical documents is a very challenging task. To address such scenarios, this thesis develops a word-spotting system dedicated to geographical documents containing Bangla and English (Roman) scripts. In the proposed system, the text and graphics layers are first separated using filtering, clustering and self-reinforcement through a classifier. In addition, instead of a binary decision, a probabilistic measurement is used to represent the text components. Subsequently, in the text layer, a water-reservoir-based character segmentation approach is applied to extract individual characters from the document. These isolated characters are then recognized using rotation-invariant features coupled with an SVM classifier. Well-recognized characters are grouped by size, and an initial spotting pass searches for the query word among these groups of characters. If the system spots a word only partially because of noise, SIFT is applied to identify the missing portion of that partial match. Experimental results on document images in Roman and Bangla scripts show that the method is able to spot locations in text-labeled graphical documents. Experiments are carried out on an annotated dataset developed for this work, which we have made publicly available to other researchers.
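The size-based grouping step above can be sketched very simply: sort the recognized characters by height and start a new group whenever the next character's height exceeds the previous one's by more than a tolerance. The tolerance value and the `(label, height)` representation are our assumptions; the thesis's grouping criteria are richer:

```python
def group_by_size(chars, tol=0.2):
    """chars: list of (label, height) for recognized characters.
    Group characters whose heights are within `tol` relative difference
    (sketch of the size-based grouping step before spotting)."""
    groups = []
    for label, h in sorted(chars, key=lambda c: c[1]):
        if groups and h <= groups[-1][-1][1] * (1 + tol):
            groups[-1].append((label, h))   # close enough: same size group
        else:
            groups.append([(label, h)])     # start a new size group
    return groups
```

Spotting then only has to match the query word against characters inside each group, rather than against every character on the map.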
APA, Harvard, Vancouver, ISO, and other styles
46

SIDHU, ARUNIMA. "ANALYSIS OF IMAGE RETRIEVAL TECHNIQUES." Thesis, 2016. http://dspace.dtu.ac.in:8080/jspui/handle/repository/15273.

Full text
Abstract:
In recent years we have observed a rapid rise in the size of digital image collections: gigabytes of images are created daily by both civilian and military equipment. To manage the incredible amount of data pouring in every day from so many sources, we need efficient storage and retrieval systems for images. Work on image retrieval has been actively carried on since the 1970s. Image retrieval can be approached from two angles: one is text based and the other is visual based. We have divided our work into two phases. First, we retrieve text from an image using Support Vector Machines (SVM) and Maximally Stable Extremal Regions (MSER); second, we retrieve images using the Bag of Visual Words (BoVW) technique, applied to inscription images. Text retrieval is carried out by first obtaining, from the SVM, the images containing text, and then further processing the SVM result using MSER. The proposed method for inscription image retrieval can be used to recognize inscriptions in languages from across the world. SURF (Speeded-Up Robust Features) is used as the image feature extractor. A visual vocabulary is created by representing each image as a histogram of visual words, which supports the retrieval process. Using SURF ensures scalability, faster processing, and better results on darkened and blurred images. We demonstrate the method on a combined set of 300 inscription images covering several languages.
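The BoVW step above amounts to quantizing each local feature (here, a SURF vector) against a visual vocabulary and counting word occurrences. A minimal sketch, assuming the vocabulary centroids were already learned (e.g. by k-means over a training set):

```python
import numpy as np

def bovw_histogram(descriptors, vocabulary):
    """Map each local descriptor (e.g. a SURF vector) to its nearest
    visual word and return a normalized bag-of-visual-words histogram.
    (Sketch; the vocabulary would come from clustering training features.)"""
    desc = np.asarray(descriptors, dtype=float)
    vocab = np.asarray(vocabulary, dtype=float)
    # squared Euclidean distance of every descriptor to every visual word
    d2 = ((desc[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()
```

Two images can then be compared by any histogram distance, regardless of how many features each image produced.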
APA, Harvard, Vancouver, ISO, and other styles
47

Lin, Hui-Chuan, and 林惠娟. "A Study of Information Hiding and Image Retrieval Techniques for compressed Images." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/47958365589227005176.

Full text
Abstract:
Master's thesis
National Taichung Institute of Technology
Institute of Information Technology and Applications
95
This thesis focuses on information steganography, copyright protection and image retrieval for compressed images. Three schemes are proposed. The first is a tree-growth-based watermarking technique. To safeguard rightful image ownership, a representative logo or owner information can be hidden in the host image. The scheme divides the codewords into two groups using a tree-growing structure, and the copyright information is embedded during vector quantization compression. This scheme is simple and robust, and protects the copyright efficiently. The second proposed scheme is a steganography technique based on the palette method, in which secret information is hidden in a cover image to ensure transmission security. In this method, the palette colors are divided into two groups using a de-clustering scheme, and the median edge detector (MED) predictor and flags are applied in information hiding. This not only increases the hiding capacity, but also recovers the original image at the same time as the message is extracted, solving the problem of palette-image distortion after data hiding. On the other hand, as a result of rapid Internet growth, the volume of multimedia content in circulation is growing at an almost geometric rate. Many content-based image retrieval (CBIR) technologies have been proposed in the literature, but few target palette-based images, even though such images are widely used on the Internet. This thesis therefore proposes a new retrieval scheme for palette images: the palette colors (PC) are used directly as the first index, and the edge-direction histogram (EDH) as the second index. The main advantage is that heavy computation is avoided. Furthermore, this compression format can save two-thirds of the image storage space in the database.
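The MED predictor mentioned above is a standard component of lossless image coding (it is the predictor of JPEG-LS): it estimates a pixel from its left, upper and upper-left neighbours, snapping to whichever neighbour a detected edge suggests. A minimal sketch of the predictor itself (how the thesis combines it with flags for hiding is not shown here):

```python
def med_predict(a, b, c):
    """Median edge detector (MED) predictor.
    a = left neighbour, b = above neighbour, c = upper-left neighbour."""
    if c >= max(a, b):      # edge above-left is bright: predict the darker side
        return min(a, b)
    if c <= min(a, b):      # edge above-left is dark: predict the brighter side
        return max(a, b)
    return a + b - c        # smooth region: planar prediction
```

The prediction error (actual minus predicted value) is small in smooth regions, which is what leaves room to embed message bits with little distortion.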
APA, Harvard, Vancouver, ISO, and other styles
48

Huang, Shiuan, and 黃暄. "Multi-Query Image Retrieval using CNN and SIFT Techniques." Thesis, 2016. http://ndltd.ncl.edu.tw/handle/7e8v85.

Full text
Abstract:
Master's thesis
National Chiao Tung University
Department of Electronics Engineering, Institute of Electronics
104
Due to the rapid growth in the number of images, content-based image retrieval over large databases has become an essential tool in image processing. Although there are many published studies on this topic, it remains challenging to perform an advanced search, for example retrieving a specific building from a viewpoint different from the photographing angles in the database. In addition, if the user can provide additional images as second or third queries, how do we combine the information provided by these multiple queries? We therefore propose a multi-query fusion method to achieve better accuracy. In this study, we test two different types of features designed for retrieval: the Scale-Invariant Feature Transform (SIFT) as the low-level feature and a Convolutional Neural Network (CNN) feature as the high-level feature. With the SIFT features, a Bag-of-Words model is implemented using the Term Frequency–Inverse Document Frequency (TF–IDF) retrieval algorithm. AlexNet is adopted as our CNN model and is modified into a Siamese triplet network to match the image retrieval purpose. The network weights are pre-trained on ImageNet and fine-tuned on specific landmark retrieval datasets. We use the CNN as a feature extractor rather than as an image classifier; the loss function measures the similarity between the query and similar or dissimilar images. Several levels of data fusion are proposed. The first combines the CNN's 6th-layer and 7th-layer features. The second combines the information provided by the SIFT and CNN features. The third combines the information provided by multiple queries. Where appropriate, we try both early fusion and late fusion. Our best proposed method exceeds most state-of-the-art retrieval methods for a single query.
The multi-query retrieval can further increase the retrieval accuracy.
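Late fusion of multiple queries can be sketched very compactly: score the whole database once per query, then merge the per-query score lists before ranking. The choice of mean versus max as the merge rule below is our illustrative assumption, not the thesis's specific fusion scheme:

```python
import numpy as np

def late_fuse(score_lists, mode="mean"):
    """Late fusion of per-query similarity scores over the same database.
    Each row of `score_lists` holds one query's scores for every DB image.
    Returns database indices ranked best-match first. (Sketch only.)"""
    s = np.asarray(score_lists, dtype=float)
    fused = s.mean(axis=0) if mode == "mean" else s.max(axis=0)
    return np.argsort(-fused)   # descending fused score
```

An image that scores moderately well under every query can thus outrank one that matches only a single query strongly.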
APA, Harvard, Vancouver, ISO, and other styles
49

Tsai, Tsung Ting, and 蔡宗廷. "Content-Based Image Retrieval based on Search Engine Techniques." Thesis, 2008. http://ndltd.ncl.edu.tw/handle/13568655121103117364.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Computer Science and Information Engineering
96
Content-based image retrieval has been an important research topic for a long time; however, reducing the search time in a large image database remains a challenging problem. Information retrieval (IR), the core of text search engine techniques, offers well-known and efficient methods for searching large databases. Our solution therefore simply extracts "visual words" from images, analogous to the words in articles, so that IR methods can be applied directly. Our method can be divided into three parts. (1) Extract visual words. We recursively divide an image into four equal-sized blocks, and two methods are proposed to extract visual words from these blocks. (2) Build the index. We create an index between visual words and images, with the associated TF-IDF weights, in the database. The key technique in this part is the inverted index, which reduces the time and computing resources needed when searching words using the index. (3) Search images. We search for images similar to the query image using the created index. Two methods, (a) Count Match and (b) Vector Model, are proposed to estimate the similarity between the query image and the images in the database. We evaluated the proposed methods on image databases crawled from auction web pages.
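The inverted index and a count-style match can be sketched as follows. The exact data shapes are our assumptions; the thesis's Count Match scheme additionally applies TF-IDF weights:

```python
from collections import defaultdict

def build_inverted_index(images):
    """images: {image_id: set of visual word ids}. The inverted index maps
    each visual word to the images containing it, so a query only has to
    touch the posting lists of its own words."""
    index = defaultdict(set)
    for img, words in images.items():
        for w in words:
            index[w].add(img)
    return index

def count_match(index, query_words):
    """Rank database images by how many of the query's visual words they share
    (unweighted sketch of a Count-Match-style similarity)."""
    votes = defaultdict(int)
    for w in query_words:
        for img in index.get(w, ()):
            votes[img] += 1
    return sorted(votes, key=votes.get, reverse=True)
```

Only images sharing at least one visual word with the query ever appear in `votes`, which is what keeps search time far below a linear scan of the database.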
APA, Harvard, Vancouver, ISO, and other styles
50

Liu, Yi-Min, and 柳依旻. "Adaptive Relevance Feedback Techniques for Content-Based Image Retrieval." Thesis, 2007. http://ndltd.ncl.edu.tw/handle/18126839365646713051.

Full text
Abstract:
Master's thesis
National Chi Nan University
Department of Information Management
95
Due to the popularity of the Internet and the growing demand for image access, the volume of image databases is exploding; hence we need more efficient and effective image search technology. Relevance feedback (RF) is an interaction process between the user and the system in which the user's information need is satisfied by the system's retrievals. Traditional RF techniques use the same system parameter values for all types of query images, and it is questionable whether the best performance can be obtained with such a setting. We therefore propose self-adapting parameterization, via particle swarm optimization, for the traditional relevance feedback approaches, including the query vector modification (QVM) and feature relevance estimation (FRE) methods. As different system parameter values can then be used to handle different types of queries, the retrieval system becomes more efficient and effective.
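Query vector modification is classically the Rocchio update: move the query feature vector toward the centroid of user-marked relevant images and away from non-relevant ones. A minimal sketch; note that the fixed alpha/beta/gamma defaults below are exactly the kind of parameters the thesis proposes to adapt per query via particle swarm optimization:

```python
import numpy as np

def rocchio_update(query, relevant, nonrelevant,
                   alpha=1.0, beta=0.75, gamma=0.15):
    """Classic Rocchio query-vector modification (QVM sketch).
    relevant/nonrelevant: arrays of feature vectors marked by the user."""
    q = alpha * np.asarray(query, dtype=float)
    if len(relevant):
        q += beta * np.mean(relevant, axis=0)    # pull toward relevant centroid
    if len(nonrelevant):
        q -= gamma * np.mean(nonrelevant, axis=0)  # push away from non-relevant
    return q
```

Each feedback round re-runs retrieval with the updated vector, so the result list drifts toward the user's intent.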
APA, Harvard, Vancouver, ISO, and other styles