Dissertations / Theses on the topic 'Local visual feature'

Author: Grafiati

Published: 10 March 2023

Consult the top 20 dissertations / theses for your research on the topic 'Local visual feature.'

1

Andreasson, Henrik. "Local visual feature based localisation and mapping by mobile robots." Doctoral thesis, Örebro : Örebro University, 2008. http://urn.kb.se/resolve?urn=urn:nbn:se:oru:diva-2444.

2

Manivannan, Siyamalan. "Visual feature learning with application to medical image classification." Thesis, University of Dundee, 2015. https://discovery.dundee.ac.uk/en/studentTheses/10e26212-e836-4ccd-9b12-a576458de5eb.

Abstract:

Various hand-crafted features have been explored for medical image classification, including SIFT and Local Binary Patterns (LBP). However, hand-crafted features may not be optimally discriminative for classifying images from particular domains (e.g. colonoscopy), as they are not necessarily tuned to the domain's characteristics. In this work, I focus on learning highly discriminative local features and image representations to achieve the best possible classification performance for medical images, particularly for colonoscopy and histology (cell) images. I propose approaches to learn local features using unsupervised and weakly-supervised methods, and an approach to improve feature encoding methods such as bag-of-words. Unlike existing work, the proposed weakly-supervised approach uses image-level labels to learn the local features. Requiring image-level labels instead of region-level labels makes annotation less expensive and closer to the data normally available from clinical practice, hence more feasible in practice. In this thesis, first, I propose a generalised version of the LBP descriptor called the Generalised Local Ternary Patterns (gLTP), which is inspired by the success of LBP and its variants for colonoscopy image classification. gLTP is robust to both noise and illumination changes, and I demonstrate its competitive performance compared to the best performing LBP-based descriptors on two different datasets (colonoscopy and histology). However, LBP-based descriptors (including gLTP) lose information due to the binarisation step involved in their construction. Therefore, I then propose a descriptor called the Extended Multi-Resolution Local Patterns (xMRLP), which is real-valued and reduces information loss. I propose unsupervised and weakly-supervised learning approaches to learn the set of parameters in xMRLP. 
I show that the learned descriptors give competitive or better performance compared to other descriptors such as root-SIFT and Random Projections. Finally, I propose an approach to improve feature encoding methods. The approach captures inter-cluster features, providing context information in the feature as well as in the image spaces, in addition to the intra-cluster features often captured by conventional feature encoding approaches. The proposed approaches have been evaluated on three datasets: 2-class colonoscopy (2,100 images), 3-class colonoscopy (2,800 images) and histology (public dataset, containing 13,596 images). Some experiments on radiology images (IRMA dataset, public) are also reported. I show state-of-the-art or superior classification performance on colonoscopy and histology datasets.
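For readers unfamiliar with the binarisation step the abstract refers to, the following is a minimal sketch of the textbook 8-neighbour LBP code for a single 3x3 patch. It is not gLTP or xMRLP from the thesis; the function name `lbp_code` and the clockwise bit ordering are assumptions made for illustration only.

```python
def lbp_code(patch):
    """Basic 8-neighbour LBP code for the centre pixel of a 3x3 patch.

    `patch` is a 3x3 nested list of grey levels. The bit ordering
    (clockwise from the top-left neighbour) is an illustrative choice.
    """
    center = patch[1][1]
    # Neighbour coordinates, clockwise starting at the top-left pixel.
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    code = 0
    for bit, (r, c) in enumerate(offsets):
        # Binarisation: only the sign of the difference to the centre is kept.
        if patch[r][c] >= center:
            code |= 1 << bit
    return code
```

Because only the comparison result is stored, a neighbour that is slightly brighter than the centre and one that is much brighter produce the same bit, which is the kind of information loss a real-valued descriptor is meant to reduce.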

3

Emir, Erdem. "A Comparative Performance Evaluation Of Scale Invariant Interest Point Detectors For Infrared And Visual Images." Master's thesis, METU, 2008. http://etd.lib.metu.edu.tr/upload/2/12610159/index.pdf.

Abstract:

In this thesis, the performance of four state-of-the-art feature detectors, along with SIFT and SURF descriptors, in matching object features of mid-wave infrared, long-wave infrared and visual-band images is evaluated across viewpoints and changing distance conditions. The feature detectors used are the Scale Invariant Feature Transform (SIFT), multiscale Harris-Laplace, multiscale Hessian-Laplace and Speeded Up Robust Features (SURF) detectors, all of which are invariant to image scale and rotation. Features are extracted from images of different blackbodies, human faces and vehicles, and the reliability of matching is explored between different views of these objects, each within its own category. All of these feature detectors provide good matching performance in infrared-band images compared with visual-band images. The matching performance for mid-wave and long-wave infrared images is also compared in this study, and it is observed that long-wave infrared images provide good matching performance for objects at lower temperatures, whereas mid-wave infrared-band images provide good matching performance for objects at higher temperatures. The SURF detector and descriptor are found to outperform the other detectors and descriptors in matching human face images in the long-wave infrared band.

4

Ferro, Demetrio. "Effects of attention on visual processing between cortical layers and cortical areas V1 and V4." Doctoral thesis, Università degli studi di Trento, 2019. http://hdl.handle.net/11572/246290.

Abstract:

Visual attention improves sensory processing, as well as perceptual readout and behavior. Over the last decades, many proposals have been put forth to explain how attention affects visual neural processing. These include the modulation of neural firing rates and synchrony, neural tuning properties, and rhythmic, subthreshold activity. Despite the wealth of knowledge provided by previous studies, the way attention shapes interactions between cortical layers within and between visual sensory areas is only just emerging. To investigate this, we studied neural signals from macaque V1 and V4 visual areas, while monkeys performed a covert, feature-based spatial attention task. The data were simultaneously recorded from laminar electrodes disposed normal to cortical surface in both areas (16 contacts, 150 μm inter-contact spacing). Stimuli presentation was based on the overlap of the receptive fields (RFs) of V1 and V4. Channel depths alignment was referenced to laminar layer IV, based on spatial current source density and temporal latency analyses. Our analyses mainly focused on the study of Local Field Potential (LFP) signals, for which we applied local (bipolar) re-referencing offline. We investigated the effects of attention on LFP spectral power and laminar interactions between LFP signals at different depths, both at the local level within V1 and V4, and at the inter-areal level across V1 and V4. Inspired by current progress from literature, we were interested in the characterization of frequency-specific laminar interactions, which we investigated both in terms of rhythmic synchronization by computing spectral coherence, and in terms of directed causal influence, by computing Granger causalities (GCs). The spectral power of LFPs in different frequency bands showed relatively small differences along cortical depths both in V1 and in V4. However, we found attentional effects on LFP spectral power consistent with previous literature. 
For V1 LFPs, attention to stimuli in the RF location mainly resulted in a shift of the low-gamma (∼30-50 Hz) spectral power peak towards higher frequencies (by ∼3-4 Hz), increases in power for frequency bands above the low-gamma peak frequencies, and decreases in power below these frequencies. For V4 LFPs, attention towards stimuli in RF locations caused a decrease in power for frequencies < 20 Hz and a broadband increase for frequencies > 20 Hz. Attention affected spectral coherence within V1 and within V4 layers in a similar way to the spectral power modulation described above. Spectral coherence between V1 and V4 channel pairs was increased by attention mainly in the beta band (∼15-30 Hz) and the low-gamma range (∼30-50 Hz). Attention affected GC interactions in a layer- and frequency-dependent manner in complex ways, not always compliant with predictions made by the canonical models of laminar feed-forward and feed-back interactions. Within V1, attention increased feed-forward efficacy across almost all low-frequency bands (∼2-50 Hz). Within V4, attention mostly increased GCs in the low and high gamma frequency ranges in a 'downwards' direction within the column, i.e. from supragranular to granular and to infragranular layers. Increases were also evident in an 'upwards' direction from granular to supragranular layers. For inter-areal GCs, the dominant changes were an increase in the gamma frequency range from V1 granular and infragranular layers to V4 supragranular and granular layers, as well as an increase from V4 supragranular layers to all V1 layers.

5

Zhu, Chao. "Effective and efficient visual description based on local binary patterns and gradient distribution for object recognition." Phd thesis, Ecole Centrale de Lyon, 2012. http://tel.archives-ouvertes.fr/tel-00755644.

Abstract:

This thesis is devoted to the problem of computer-based visual object recognition, which has become a very popular and important research topic in recent years thanks to its many applications, such as image and video indexing and retrieval, security access control, and video surveillance. Despite the considerable effort and progress made in recent years, it remains an open problem and is still considered one of the most difficult problems in the computer vision community, mainly because of inter-class similarities and intra-class variations such as occlusion, background clutter, and changes in viewpoint, pose, scale and illumination. Today's popular approaches to object recognition are based on descriptors and classifiers: they generally first extract visual descriptors from images and videos, and then perform classification using machine learning algorithms on the extracted features. It is therefore important to design a good visual description, which should be both discriminative and computationally efficient, while being robust to the variations mentioned above. In this context, the objective of this thesis is to propose novel contributions to the task of visual object recognition, in particular several new visual descriptors that represent the visual content of images and videos effectively and efficiently for object recognition. The proposed descriptors are intended to capture visual information from different aspects. 
First, we propose six multi-scale colour LBP features to address the main shortcomings of the original LBP, namely its lack of colour information and its sensitivity to non-monotonic changes in lighting conditions. By extending the original LBP to a multi-scale form in different colour spaces, the proposed features not only have more discriminative power, because they capture more local information, but also possess certain invariance properties with respect to different lighting variations. In addition, their performance is further improved by applying a coarse-to-fine image division strategy, computing the proposed features within image blocks in order to encode the spatial information of texture structures. The proposed features capture the global distribution of texture information in images. Second, we propose a new method for reducing the dimensionality of LBP, called the orthogonal combination of LBP (OC-LBP). It is used to build a new distribution-based local descriptor in a manner similar to SIFT. Our goal is to build a more efficient local descriptor by replacing the costly gradient information with local texture patterns within the SIFT scheme. As an extension of our first contribution, we also extend the OC-LBP descriptor to different colour spaces and propose six colour OC-LBP descriptors to improve the discriminative power and photometric invariance of the intensity-based descriptor. The proposed descriptors capture the local distribution of texture information in images. Third, we introduce DAISY, a new fast local descriptor based on gradient distribution, to the field of visual object recognition. [...]

6

Abid, Muhammad Rizwan. "Visual Recognition of a Dynamic Arm Gesture Language for Human-Robot and Inter-Robot Communication." Thesis, Université d'Ottawa / University of Ottawa, 2015. http://hdl.handle.net/10393/32800.

Abstract:

This thesis presents a novel Dynamic Gesture Language Recognition (DGLR) system for human-robot and inter-robot communication. We developed and implemented an experimental setup consisting of a humanoid robot/android able to recognize and execute in real time all the arm gestures of the Dynamic Gesture Language (DGL) in a similar way to humans. Our DGLR system comprises two main subsystems: an image processing (IP) module and a linguistic recognition system (LRS) module. The IP module recognizes individual DGL gestures. In this module, we use the bag-of-features (BOFs) and a local part model approach for dynamic gesture recognition from images. Dynamic gesture classification is conducted using the BOFs and nonlinear support-vector-machine (SVM) methods. The multiscale local part model preserves the temporal context. The IP module was tested using two databases, one consisting of images of a human performing a series of dynamic arm gestures under different environmental conditions and a second consisting of images of an android performing the same series of arm gestures. The LRS module uses a novel formal grammar approach to accept DGL-wise valid sequences of dynamic gestures and reject invalid ones. The LRS consists of two subsystems: one using a Linear Formal Grammar (LFG) to derive the valid sequence of dynamic gestures and another using a Stochastic Linear Formal Grammar (SLFG) to occasionally recover gestures that were not recognized by the IP module. Experimental results show that the DGLR system had a slightly better overall performance when recognizing gestures made by a human subject (98.92% recognition rate) than those made by the android (97.42% recognition rate).

7

Ventura, Royo Carles. "Visual object analysis using regions and local features." Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/398407.

Abstract:

The first part of this dissertation focuses on an analysis of the spatial context in semantic image segmentation. First, we review how spatial context has been tackled in the literature by local features and spatial aggregation techniques. From a discussion about whether the context is beneficial or not for object recognition, we extend a Figure-Border-Ground segmentation for local feature aggregation with ground truth annotations to a more realistic scenario where object proposal techniques are used instead. Whereas the Figure and Ground regions represent the object and the surround respectively, the Border is a region around the object contour, which is found to be the region with the richest contextual information for object recognition. Furthermore, we propose a new contour-based spatial aggregation technique for the local features within the object region, based on a division of the region into four subregions. Both contributions have been tested on a semantic segmentation benchmark with a combination of context-free and context-dependent local features that allows the models to learn automatically whether the context is beneficial or not for each semantic category. The second part of this dissertation addresses the semantic segmentation of a set of closely related images from an uncalibrated multiview scenario. State-of-the-art semantic segmentation algorithms fail to segment the objects correctly from some viewpoints when the techniques are applied independently to each viewpoint image. The lack of a large number of annotations for multiview segmentation makes it impossible to obtain a model that is robust to viewpoint changes. In this second part, we exploit the spatial correlation that exists between the images from different viewpoints to obtain a more robust semantic segmentation. First, we review the state-of-the-art co-clustering, co-segmentation and video segmentation techniques that aim to segment the set of images in a generic way, i.e. without considering semantics. 
Then, a new architecture that considers motion information and provides a multiresolution segmentation is proposed for the co-clustering framework; it outperforms state-of-the-art techniques for generic multiview segmentation. Finally, the proposed multiview segmentation is combined with the semantic segmentation results, giving a method for automatic resolution selection and a coherent semantic multiview segmentation.

8

Bai, Hequn. "Mobile 3D Visual Search based on Local Stereo Image Features." Thesis, KTH, Ljud- och bildbehandling, 2012. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-102603.

Abstract:

Many recent applications using local image features focus on 2D image recognition. Such applications cannot distinguish between real objects and photos of objects. In this project, we present a 3D object recognition method using stereo images. Using the 3D information of the objects obtained from stereo images, objects with similar image descriptions but different 3D shapes, such as real objects and photos of objects, can be distinguished. In addition, the feature matching performance is improved compared with the method using only local image features. Since local image features may consume higher bitrates than transmitting the compressed images themselves, we evaluate the performance of a recently proposed low-bitrate local image feature descriptor, CHoG, in 3D object reconstruction and recognition, and propose a difference compression method based on the quantized CHoG descriptor, which further reduces bitrates.

9

Le, Viet Phuong. "Logo detection, recognition and spotting in context by matching local visual features." Thesis, La Rochelle, 2015. http://www.theses.fr/2015LAROS029/document.

Abstract:

This thesis presents a logo spotting framework applied to spotting logo images in document images and focused on document categorization and document retrieval problems. We also present three key-point matching methods: simple key-point matching with the nearest neighbor, matching by a 2-nearest-neighbor matching rule, and matching by two local descriptors at different matching stages. The last two matching methods are improvements of the first method. In addition, using a density-based clustering method to group the matches in our proposed spotting framework helps not only to segment the candidate logo region but also to reject incorrect matches as outliers. Moreover, to maximize performance and to locate logos, a two-stage algorithm is proposed for geometric verification based on homography with RANSAC. Since key-point-based approaches are computationally costly, we have also worked on optimizing our proposed framework. The problems of text/graphics separation are studied. We propose a method for segmenting text and non-text in document images based on a set of powerful connected component features. We applied dimensionality reduction techniques to reduce the high-dimensional vector of local descriptors, and approximate nearest neighbor (ANN) search algorithms to optimize our proposed framework. In addition, we have also conducted experiments for a document retrieval system on the text and non-text segmented documents with the ANN algorithm. The results show that the computation time of the system decreases sharply, by 56%, while its accuracy decreases slightly, by nearly 2.5%. Overall, we have proposed an effective and efficient approach for solving the problem of logo spotting in document images. We have designed our approach to be flexible for future improvements by us and by other researchers. We believe that our work could be considered a step towards solving the problem of complete analysis and understanding of document images.
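The abstract does not spell out its 2-nearest-neighbor matching rule. One common reading is a Lowe-style ratio test, in which a match is kept only when the best candidate is clearly closer than the second best. The sketch below illustrates that reading under stated assumptions: the function name `ratio_test_matches`, the brute-force search, the Euclidean distance and the 0.75 threshold are illustrative choices, not details taken from the thesis.

```python
import math


def ratio_test_matches(query_desc, train_desc, ratio=0.75):
    """Match descriptors with a 2-nearest-neighbour ratio test.

    Returns (query_index, train_index) pairs. `ratio` is an assumed
    threshold; smaller values keep fewer, more distinctive matches.
    """
    matches = []
    for qi, q in enumerate(query_desc):
        # Brute-force distances to every reference descriptor, nearest first.
        dists = sorted((math.dist(q, t), ti) for ti, t in enumerate(train_desc))
        if len(dists) < 2:
            continue  # the rule needs two neighbours to compare
        (d1, best), (d2, _) = dists[0], dists[1]
        # Keep the match only if the nearest neighbour is clearly better.
        if d1 < ratio * d2:
            matches.append((qi, best))
    return matches
```

Ambiguous key-points, whose two nearest neighbours are almost equally close, are dropped before any clustering or geometric verification stage sees them.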

10

Asbach, Mark [Verfasser]. "Modeling for part-based visual object detection based on local features / Mark Asbach." Aachen : Hochschulbibliothek der Rheinisch-Westfälischen Technischen Hochschule Aachen, 2012. http://d-nb.info/1021938211/34.

11

Khoualed, Samir. "Descripteurs augmentés basés sur l'information sémantique contextuelle." Phd thesis, Université Blaise Pascal - Clermont-Ferrand II, 2012. http://tel.archives-ouvertes.fr/tel-00853815.

Abstract:

Techniques for describing the characteristic features of an image are ubiquitous in many computer vision applications. In this manuscript we propose an extension for describing (representing) and matching image features. The proposed extension consists of an original approach for learning, or estimating, the semantic presence of local features in images. The semantic information obtained is then exploited, in conjunction with the bag-of-words paradigm, to build a high-performing image descriptor. The resulting descriptor combines two types of information: local and contextual-semantic. The proposed approach can be generalised and adapted to any local image descriptor, greatly improving its performance, especially when the image is subject to constrained imaging conditions. The performance of the proposed approach is evaluated on real images in both the 2D and 3D domains. In the 2D domain, we addressed a problem related to matching features across images. In the 3D domain, we solved problems of matching and aligning partial three-dimensional views. The results obtained show that, with our approach, performance is markedly better than that of other existing methods.

12

Beran, Vítězslav. "On-line Analýza Dat s Využitím Vizuálních Slovníků." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2011. http://www.nusl.cz/ntk/nusl-261247.

Abstract:

This work presents a new adaptive method for on-line, real-time video search using visual vocabularies. The new method focuses on low computational cost and search accuracy in on-line use. It builds on techniques used with static visual vocabularies. These common techniques are modified so that they can adapt to changing data. The mechanisms that address this in the new method are a dynamic inverse document frequency, an adaptive visual vocabulary and a variable inverted index. The proposed approach was evaluated on a video search task, and the presented results show the properties of the adaptive method in comparison with the static approach. The new adaptive method is based on the concept of a sliding (floating) window, which defines how data are selected for adaptation and for processing. Along with the concept, a mathematical apparatus is defined that makes it possible to evaluate how best to use the concept for various video processing methods. The adaptive method is of practical use specifically in video processing systems where a change in the character of the visual data is expected, or where the character of the visual data is not known in advance.
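As a rough illustration of how an inverse document frequency can follow changing data, the sketch below keeps document frequencies only for the most recent frames in a fixed-size window. It is a hedged reading of the abstract, not the thesis's algorithm: the class name `SlidingWindowIdf`, the fixed window size and the smoothed logarithmic IDF formula are all assumptions.

```python
import math
from collections import Counter, deque


class SlidingWindowIdf:
    """Inverse document frequency over the last `window_size` documents."""

    def __init__(self, window_size):
        self.window_size = window_size
        self.window = deque()      # sets of visual-word ids, oldest first
        self.doc_freq = Counter()  # word id -> number of window docs containing it

    def add(self, visual_words):
        """Add one document (e.g. a video frame) and evict the oldest if needed."""
        words = set(visual_words)
        self.window.append(words)
        self.doc_freq.update(words)
        if len(self.window) > self.window_size:
            for w in self.window.popleft():
                self.doc_freq[w] -= 1
                if self.doc_freq[w] == 0:
                    del self.doc_freq[w]

    def idf(self, word):
        """Smoothed IDF; the +1 terms avoid division by zero for unseen words."""
        df = self.doc_freq.get(word, 0)
        return math.log((1 + len(self.window)) / (1 + df))
```

With this design a visual word that was common in old frames regains a high weight once those frames leave the window, which is the adaptive behaviour a static vocabulary weighting cannot provide.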

13

Řezníček, Ivo. "ROZPOZNÁNÍ ČINNOSTÍ ČLOVĚKA VE VIDEU." Doctoral thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2014. http://www.nusl.cz/ntk/nusl-261240.

Abstract:

This dissertation deals with improving systems for human action recognition. The current state of knowledge in this field is presented. This includes methods of acquiring digital images and videos, together with ways of representing these entities on a computer. It further presents how feature vector extractors and spatio-temporal feature vector extractors are used, and how these data are prepared for further processing. Classification methods are an example of such subsequent processing. Video segments of variable length are generally used for processing. The main contribution of this work is the stated hypothesis about the optimal length of video sequence analysis, at which the quality of the solution is comparable to that of a solution without any restriction on the length of the video sequence. An algorithm for verifying this hypothesis is designed, implemented and tested. The hypothesis was experimentally verified using this algorithm. In searching for the optimal length, a certain improvement in classification quality was also achieved. Experiments, results and future uses of this work are also presented.

14

Veľas, Martin. "Automatické třídění fotografií podle obsahu." Master's thesis, Vysoké učení technické v Brně. Fakulta informačních technologií, 2013. http://www.nusl.cz/ntk/nusl-236399.

Abstract:

This thesis deals with content-based automatic photo categorization. The aim of the work is to experiment with advanced techniques of image representation and to create a classifier that is able to process a large image dataset with sufficient accuracy and computation speed. A traditional solution based on visual codebooks is enhanced by computing color features, soft assignment of visual words to extracted feature vectors, the use of image segmentation in the process of visual codebook creation, and dividing the picture into cells that are processed separately. A linear SVM classifier with explicit data embedding is used for its efficiency. Finally, results of experiments with the above-mentioned image categorization techniques are discussed.
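Soft assignment, mentioned in the abstract, spreads each feature vector over several visual words instead of voting for the single nearest one. The sketch below shows one common variant with Gaussian kernel weights; the kernel, the `sigma` parameter and the function names are assumptions for illustration, since the abstract does not say which weighting the thesis uses.

```python
import math


def soft_assign(feature, codebook, sigma=1.0):
    """Return normalised Gaussian weights of `feature` over all visual words."""
    weights = [
        math.exp(-math.dist(feature, word) ** 2 / (2 * sigma ** 2))
        for word in codebook
    ]
    total = sum(weights)
    if total == 0.0:
        # All weights underflowed: fall back to a hard vote for the nearest word.
        nearest = min(range(len(codebook)),
                      key=lambda i: math.dist(feature, codebook[i]))
        return [1.0 if i == nearest else 0.0 for i in range(len(codebook))]
    return [w / total for w in weights]


def soft_bow_histogram(features, codebook, sigma=1.0):
    """Average the soft assignments of all features into one histogram."""
    hist = [0.0] * len(codebook)
    for f in features:
        for i, w in enumerate(soft_assign(f, codebook, sigma)):
            hist[i] += w
    n = len(features)
    return [h / n for h in hist] if n else hist
```

A feature lying halfway between two visual words contributes equally to both bins, so small shifts in the descriptor no longer flip the whole vote from one bin to another.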

15

Zhao, Zhipeng. "Towards a local-global visual feature-based framework for recognition." 2009. http://hdl.rutgers.edu/1782.2/rucore10001600001.ETD.000051935.

16

BALLAN, LAMBERTO. "Object and event recognition in multimedia archives using local visual features." Doctoral thesis, 2011. http://hdl.handle.net/2158/485661.

17

Li, Jung-Lin, and 李忠霖. "Stereo Visual Navigation Based on Local Scale-Invariant Feature Transform and Its Nao Embedded System Implementation." Thesis, 2010. http://ndltd.ncl.edu.tw/handle/17978402803821569526.

Abstract:

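Master's degree
National Yunlin University of Science and Technology
Master's Program, Department of Electrical Engineering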
98
Stereo vision navigation is a fundamental capability of an intelligent robot, enabling obstacle avoidance, path planning, map building, and environmental localization. However, conventional feature detection methods cannot provide enough evenly distributed feature points and therefore cannot accomplish stereo vision navigation. Meanwhile, the intelligent robot often requires extra ultrasonic or infrared sensors for assistance. In this thesis, a Local Scale-Invariant Feature Transform (SIFT) method is proposed to obtain more, and more evenly distributed, feature points, so that accurate 3D environment modeling and a detailed stereo map can be accomplished easily. Experimental results verify that the proposed Local SIFT detects more, and more reliable, feature points. In addition, this thesis implements a simplified stereo vision navigation based on grayscale histogram segmentation on the Nao embedded robot. Implementation results show that the simplified vision navigation based on grayscale histogram analysis is simple and efficient.

18

Alqasrawi, Yousef T. N., Daniel Neagu, and Peter I. Cowling. "Fusing integrated visual vocabularies-based bag of visual words and weighted colour moments on spatial pyramid layout for natural scene image classification." 2013. http://hdl.handle.net/10454/9604.

Abstract:

The bag of visual words (BOW) model is an efficient image representation technique for image categorization and annotation tasks. Building good visual vocabularies, from automatically extracted image feature vectors, produces discriminative visual words, which can improve the accuracy of image categorization tasks. Most approaches that use the BOW model in categorizing images ignore useful information that can be obtained from image classes to build visual vocabularies. Moreover, most BOW models use intensity features extracted from local regions and disregard colour information, which is an important characteristic of any natural scene image. In this paper, we show that integrating visual vocabularies generated from each image category improves the BOW image representation and improves accuracy in natural scene image classification. We use a keypoint density-based weighting method to combine the BOW representation with image colour information on a spatial pyramid layout. In addition, we show that visual vocabularies generated from training images of one scene image dataset can plausibly represent another scene image dataset on the same domain. This helps in reducing time and effort needed to build new visual vocabularies. The proposed approach is evaluated over three well-known scene classification datasets with 6, 8 and 15 scene categories, respectively, using 10-fold cross-validation. The experimental results, using support vector machines with histogram intersection kernel, show that the proposed approach outperforms baseline methods such as Gist features, rgbSIFT features and different configurations of the BOW model.
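
The histogram intersection kernel named in this abstract has a simple closed form, K(x, y) = Σ min(x_i, y_i). The sketch below is only an illustration of that formula. The function names are hypothetical, and the paper's actual pipeline (spatial pyramid, colour moments, SVM training) is not reproduced here.

```python
def histogram_intersection(h1, h2):
    """Histogram intersection kernel: sum of element-wise minima."""
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    return sum(min(a, b) for a, b in zip(h1, h2))


def gram_matrix(histograms):
    """Pairwise kernel values, in the form a precomputed-kernel SVM expects.

    Hypothetical helper for illustration; not part of the cited paper.
    """
    return [[histogram_intersection(a, b) for b in histograms]
            for a in histograms]
```

For L1-normalized histograms the kernel value lies between 0 and 1, and equals 1 only when the two histograms are identical.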

19

Yen, Chu-Chun, and 顏竹君. "Local Features Based Person Authentication Using Visual Speech with Random Passwords." Thesis, 2012. http://ndltd.ncl.edu.tw/handle/04481825368885259299.

20

Frisky, Aufaclav Zatu Kusuma, and 柯奧福. "Visual Speech Recognition and Password Verification Using Local Spatiotemporal Features and Kernel Sparse Representation Classifier." Thesis, 2015. http://ndltd.ncl.edu.tw/handle/03868492706552896766.

Abstract:

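Master's degree
National Central University
In-service Master's Program, Department of Computer Science and Information Engineering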
103
Visual speech recognition (VSR) applications play an important role in various aspects of human life, with research efforts being put into recognition systems for security, biometrics, and human-machine interaction. In this thesis, we propose two lip-based systems. In the first system, we propose a letter recognition system using spatiotemporal feature descriptors. The system adopts non-negative matrix factorization (NMF) to reduce the dimensionality of the features and a kernel sparse representation classifier for the classification step. We use local texture and local temporal features to represent the visual lip data. The visual lip data are first preprocessed by enhancing the image contrast and are then used for feature extraction. In our experiments, accuracies of 67.13%, 45.37%, and 63.12% are achieved in the semi-speaker-dependent, speaker-independent, and speaker-dependent settings, respectively, on the AVLetters database. We also compare our method with other methods on the AVLetters 2 database. Using the same configuration, our method achieves an accuracy of 89.02% in the speaker-dependent case and 25.9% in the speaker-independent case, outperforming the other methods under the same configuration. In the second system, we propose a new approach to lip-based passwords for home entrance security, using confidence points in a home automation system. We also propose new features based on a modified spatiotemporal descriptor, which adopt L2-Hellinger normalization and use two-dimensional semi-non-negative matrix factorization (2D Semi-NMF) for dimensionality reduction. For the classifier, we propose a forward-backward kernel sparse representation classifier (FB-KSRC). Our experimental results show that the system is quite robust in classifying passwords. We applied this system to the AVLetters 2 dataset. Using ten visual passwords, each a combination of five letters from AVLetters 2, and running all combination experiments, the results show that our system can verify the passwords very well. In the complexity experiment, we also obtain a reasonable classification time for a real-world implementation.
