Dissertations / Theses on the topic 'Visual object'


Create a spot-on reference in APA, MLA, Chicago, Harvard, and other styles


Consult the top 50 dissertations / theses for your research on the topic 'Visual object.'

Next to every source in the list of references, there is an 'Add to bibliography' button. Click it, and we will automatically generate a bibliographic reference for the chosen work in the citation style you need: APA, MLA, Harvard, Chicago, Vancouver, etc.

You can also download the full text of the academic publication as a PDF and read its abstract online whenever it is available in the metadata.

Browse dissertations / theses from a wide variety of disciplines and organise your bibliography correctly.

1

Figueroa Flores, Carola. "Visual Saliency for Object Recognition, and Object Recognition for Visual Saliency." Doctoral thesis, Universitat Autònoma de Barcelona, 2021. http://hdl.handle.net/10803/671964.

Abstract:
For humans, the recognition of objects is an almost instantaneous, precise and extremely adaptable process. Furthermore, we have the innate capability to learn new object classes from only a few examples. The human brain lowers the complexity of the incoming data by filtering out part of the information and only processing those things that capture our attention. This, mixed with our biological predisposition to respond to certain shapes or colors, allows us to recognize at a single glance the most important or salient regions of an image. This mechanism can be observed by analyzing on which parts of images subjects place attention; where they fix their eyes when an image is shown to them. The most accurate way to record this behavior is to track eye movements while displaying images. Computational saliency estimation aims to identify to what extent regions or objects stand out with respect to their surroundings to human observers. Saliency maps can be used in a wide range of applications including object detection, image and video compression, and visual tracking. The majority of research in the field has focused on automatically estimating saliency maps given an input image. Instead, in this thesis, we set out to incorporate saliency maps in an object recognition pipeline: we want to investigate whether saliency maps can improve object recognition results. In this thesis, we identify several problems related to visual saliency estimation. First, to what extent the estimation of saliency can be exploited to improve the training of an object recognition model when scarce training data is available. To solve this problem, we design an image classification network that incorporates saliency information as input. This network processes the saliency map through a dedicated network branch and uses the resulting features to modulate the standard bottom-up visual features of the original image input. We will refer to this technique as saliency-modulated image classification (SMIC). In extensive experiments on standard benchmark datasets for fine-grained object recognition, we show that our proposed architecture can significantly improve performance, especially on datasets with scarce training data. Next, we address the main drawback of the above pipeline: SMIC requires an explicit saliency algorithm that must be trained on a saliency dataset. To solve this, we implement a hallucination mechanism that allows us to incorporate the saliency estimation branch in an end-to-end trained neural network architecture that only needs the RGB image as input. A side effect of this architecture is the estimation of saliency maps. In experiments, we show that this architecture can obtain results on object recognition similar to SMIC, but without the requirement of ground-truth saliency maps to train the system. Finally, we evaluate the accuracy of the saliency maps that occur as a side effect of object recognition. For this purpose, we use a set of benchmark datasets for saliency evaluation based on eye-tracking experiments. Surprisingly, the estimated saliency maps are very similar to the maps that are computed from human eye-tracking experiments. Our results show that these saliency maps can obtain competitive results on saliency benchmarks. On one synthetic saliency dataset this method even obtains state-of-the-art results without ever having seen an actual saliency image for training.
Universitat Autònoma de Barcelona. Programa de Doctorat en Informàtica
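The modulation idea in SMIC lends itself to a compact sketch. Below is a minimal, hypothetical PyTorch rendering of the two-branch design described in the abstract: a dedicated branch processes the saliency map and its output gates the bottom-up image features. All layer sizes and the sigmoid gating are illustrative assumptions, not the thesis architecture.

```python
# Illustrative sketch of saliency-modulated classification (SMIC-style):
# a dedicated branch processes the saliency map and its output gates the
# bottom-up image features. Layer sizes here are arbitrary assumptions.
import torch
import torch.nn as nn

class SaliencyModulatedClassifier(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.image_branch = nn.Sequential(      # bottom-up visual features
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.saliency_branch = nn.Sequential(   # dedicated saliency branch
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 64, 3, padding=1), nn.Sigmoid(),  # gates in [0, 1]
        )
        self.head = nn.Linear(64, num_classes)

    def forward(self, image, saliency):
        feats = self.image_branch(image)
        gate = self.saliency_branch(saliency)
        modulated = feats * gate                # saliency modulates features
        pooled = modulated.mean(dim=(2, 3))     # global average pooling
        return self.head(pooled)

logits = SaliencyModulatedClassifier(num_classes=200)(
    torch.randn(1, 3, 224, 224), torch.rand(1, 1, 224, 224))
```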
2

Fergus, Robert. "Visual object category recognition." Thesis, University of Oxford, 2005. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.425029.

3

Wallenberg, Marcus. "Embodied Visual Object Recognition." Doctoral thesis, Linköpings universitet, Datorseende, 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-132762.

Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
4

Nguyen, Duong B. T. "The visual object editing kit." Carleton University Dissertation, Computer Science. Ottawa, 1993.

5

Tauber, Zinovi. "Visual object retrieval based on locales." Thesis, National Library of Canada = Bibliothèque nationale du Canada, 2000. http://www.collectionscanada.ca/obj/s4/f2/dsk1/tape3/PQDD_0013/MQ61504.pdf.

6

Breuel, Thomas M. "Geometric Aspects of Visual Object Recognition." Thesis, Massachusetts Institute of Technology, 1992. http://hdl.handle.net/1721.1/7342.

Abstract:
This thesis presents three important results in visual object recognition based on shape. (1) A new algorithm (RAST; Recognition by Adaptive Subdivisions of Transformation space) is presented that has lower average-case complexity than any known recognition algorithm. (2) It is shown, both theoretically and empirically, that representing 3D objects as collections of 2D views (the "View-Based Approximation") is feasible and affects the reliability of 3D recognition systems no more than other commonly made approximations. (3) The problem of recognition in cluttered scenes is considered from a Bayesian perspective; the commonly used "bounded-error error measure" is demonstrated to correspond to an independence assumption. It is shown that by modeling the statistical properties of real scenes better, objects can be recognized more reliably.
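The core of RAST, adaptive subdivision of transformation space with optimistic match bounds, can be illustrated in a few lines. The sketch below is a toy version restricted to 2D translations, assuming NumPy; the bound, box sizes and stopping threshold are illustrative choices, not Breuel's implementation.

```python
# Toy branch-and-bound over a 2D translation space, in the spirit of RAST:
# boxes of transformation space are scored with an optimistic bound and
# adaptively subdivided until a small box with the best match is found.
import numpy as np

def match_count(t, model, image, eps):
    """Exact number of model points matching some image point at translation t."""
    return int(sum(np.min(np.linalg.norm(image - (p + t), axis=1)) <= eps
                   for p in model))

def match_bound(box, model, image, eps):
    """Optimistic bound on match_count over all translations inside `box`."""
    (x0, x1), (y0, y1) = box
    center = np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])
    radius = np.hypot(x1 - x0, y1 - y0) / 2.0   # half-diagonal of the box
    return int(sum(np.min(np.linalg.norm(image - (p + center), axis=1))
                   <= eps + radius for p in model))

def rast(model, image, box, eps=1.0, min_size=0.5):
    best_count, best_t = 0, None
    queue = [(match_bound(box, model, image, eps), box)]
    while queue:
        queue.sort(key=lambda item: -item[0])   # expand most promising box
        bound, ((x0, x1), (y0, y1)) = queue.pop(0)
        if bound <= best_count:                 # no box can beat current best
            break
        if max(x1 - x0, y1 - y0) < min_size:    # small box: test its center
            t = np.array([(x0 + x1) / 2.0, (y0 + y1) / 2.0])
            c = match_count(t, model, image, eps)
            if c > best_count:
                best_count, best_t = c, t
            continue
        xm, ym = (x0 + x1) / 2.0, (y0 + y1) / 2.0
        for child in [((x0, xm), (y0, ym)), ((xm, x1), (y0, ym)),
                      ((x0, xm), (ym, y1)), ((xm, x1), (ym, y1))]:
            queue.append((match_bound(child, model, image, eps), child))
    return best_count, best_t

model = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
image = model + np.array([3.0, 2.0])            # model translated by (3, 2)
print(rast(model, image, box=((-5.0, 5.0), (-5.0, 5.0))))
```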
7

Meger, David Paul. "Visual object recognition for mobile platforms." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44682.

Abstract:
A robot must recognize objects in its environment in order to complete numerous tasks. Significant progress has been made in modeling visual appearance for image recognition, but the performance of current state-of-the-art approaches still falls short of that required by applications. This thesis describes visual recognition methods that leverage the spatial information sources available on-board mobile robots, such as the position of the platform in the world and the range data from its sensors, in order to significantly improve performance. Our research includes: a physical robotic platform that is capable of state-of-the-art recognition performance; a re-usable data set that facilitates study of the robotic recognition problem by the scientific community; and a three dimensional object model that demonstrates improved robustness to clutter. Based on our 3D model, we describe algorithms that integrate information across viewpoints, relate objects to auxiliary 3D sensor information, plan paths to next-best-views, explicitly model object occlusions and reason about the sub-parts of objects in 3D. Our approaches have been proven experimentally on-board the Curious George robot platform, which placed first in an international object recognition challenge for mobile robots for several years. We have also collected a large set of visual experiences from a robot, annotated the true objects in this data and made it public to the research community for use in performance evaluation. A path planning system derived from our model has been shown to hasten confident recognition by allowing informative viewpoints to be observed quickly. In each case studied, our system demonstrates significant improvements in recognition rate, in particular on realistic cluttered scenes, which promises more successful task execution for robotic platforms in the future.
8

Fu, Huanzhang. "Contributions to generic visual object categorization." Phd thesis, Ecole Centrale de Lyon, 2010. http://tel.archives-ouvertes.fr/tel-00599713.

Abstract:
This thesis is dedicated to the active research topic of generic Visual Object Categorization (VOC), which can be widely used in many applications such as video indexation and retrieval, video monitoring, security access control, automobile driving support, etc. Due to many realistic difficulties, it is still considered to be one of the most challenging problems in computer vision and pattern recognition. In this context, we have proposed in this thesis our contributions, especially concerning the two main components of the methods addressing VOC problems, namely feature selection and image representation. Firstly, an Embedded Sequential Forward feature Selection algorithm (ESFS) has been proposed for VOC. Its aim is to select the most discriminant features for obtaining a good performance for the categorization. It is mainly based on the commonly used sub-optimal search method Sequential Forward Selection (SFS), which relies on the simple principle of incrementally adding the most relevant features. However, ESFS not only adds incrementally the most relevant features in each step but also merges them in an embedded way, thanks to the concept of combined mass functions from evidence theory, which also offers the benefit of a computational cost much lower than that of the original SFS. Secondly, we have proposed novel image representations to model the visual content of an image, namely Polynomial Modeling and Statistical Measures based Image Representation, called PMIR and SMIR respectively. They allow us to overcome the main drawback of the popular "bag of features" method, which is the difficulty of fixing the optimal size of the visual vocabulary. They have been tested along with our proposed region-based features and SIFT. Two different fusion strategies, early and late, have also been considered to merge information from different "channels" represented by the different types of features. Thirdly, we have proposed two approaches for VOC relying on sparse representation, including a reconstructive method (R_SROC) as well as a reconstructive and discriminative one (RD_SROC). Indeed, the sparse representation model has originally been used in signal processing as a powerful tool for acquiring, representing and compressing high-dimensional signals. Thus, we have proposed to adapt these interesting principles to the VOC problem. R_SROC relies on the intuitive assumption that an image can be represented by a linear combination of training images from the same category. Therefore, the sparse representations of images are first computed through solving the ℓ1 norm minimization problem and then used as new feature vectors for images to be classified by traditional classifiers such as SVM. To improve the discrimination ability of the sparse representation to better fit the classification problem, we have also proposed RD_SROC, which adds a discrimination term, such as the Fisher discrimination measure or the output of an SVM classifier, to the standard sparse representation objective function in order to learn a reconstructive and discriminative dictionary. Moreover, we have also proposed to combine the reconstructive and discriminative dictionary and the adapted pure reconstructive dictionary for a given category so that the discrimination power can be further increased. The efficiency of all the methods proposed in this thesis has been evaluated on popular image datasets including SIMPLIcity, Caltech101 and Pascal2007.
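For reference, a plain Sequential Forward Selection loop of the kind ESFS builds on looks as follows. This is a hedged sketch assuming scikit-learn; it shows only the greedy add-most-relevant-feature principle, not the evidence-theoretic merging that distinguishes ESFS.

```python
# Baseline greedy Sequential Forward Selection: repeatedly add the feature
# that most improves a cross-validated score. ESFS extends this idea with
# evidence-theoretic merging, which is not reproduced here.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import LinearSVC

def sequential_forward_selection(X, y, max_features=10):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        def cv_score(j):
            return cross_val_score(LinearSVC(), X[:, selected + [j]], y, cv=3).mean()
        best = max(remaining, key=cv_score)     # add the most relevant feature
        selected.append(best)
        remaining.remove(best)
    return selected

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 20))
y = (X[:, 3] + X[:, 7] > 0).astype(int)         # only features 3 and 7 matter
print(sequential_forward_selection(X, y, max_features=3))
```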
9

Choi, Changhyun. "Visual object perception in unstructured environments." Diss., Georgia Institute of Technology, 2014. http://hdl.handle.net/1853/53003.

Abstract:
As robotic systems move from well-controlled settings to increasingly unstructured environments, they are required to operate in highly dynamic and cluttered scenarios. Finding an object, estimating its pose, and tracking its pose over time within such scenarios are challenging problems. Although various approaches have been developed to tackle these problems, the scope of objects addressed and the robustness of solutions remain limited. In this thesis, we target a robust object perception using visual sensory information, which spans from the traditional monocular camera to the more recently emerged RGB-D sensor, in unstructured environments. Toward this goal, we address four critical challenges to robust 6-DOF object pose estimation and tracking that current state-of-the-art approaches have, as yet, failed to solve. The first challenge is how to increase the scope of objects by allowing visual perception to handle both textured and textureless objects. A large number of 3D object models are widely available in online object model databases, and these object models provide significant prior information including geometric shapes and photometric appearances. We note that using both geometric and photometric attributes available from these models enables us to handle both textured and textureless objects. This thesis presents our efforts to broaden the spectrum of objects to be handled by combining geometric and photometric features. The second challenge is how to dependably estimate and track the pose of an object despite the clutter in backgrounds. Difficulties in object perception rise with the degree of clutter. Background clutter is likely to lead to false measurements, and false measurements tend to result in inaccurate pose estimates. To tackle significant clutter in backgrounds, we present two multiple pose hypotheses frameworks: a particle filtering framework for tracking and a voting framework for pose estimation. Handling of object discontinuities during tracking, such as severe occlusions, disappearances, and blurring, presents another important challenge. In an ideal scenario, a tracked object is visible throughout the entirety of tracking. However, when an object happens to be occluded by other objects or disappears due to the motions of the object or the camera, difficulties ensue. Because the continuous tracking of an object is critical to robotic manipulation, we propose to devise a method to measure tracking quality and to re-initialize tracking as necessary. The final challenge we address is performing these tasks within real-time constraints. Our particle filtering and voting frameworks, while time-consuming, are composed of repetitive, simple and independent computations. Inspired by that observation, we propose to run massively parallelized frameworks on a GPU for those robotic perception tasks which must operate within strict time constraints.
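One of the multiple-hypothesis frameworks mentioned above is a particle filter. The sketch below shows a single predict-weight-resample step, assuming NumPy, with a 2D position standing in for a full 6-DOF pose and a Gaussian stand-in for a real image-based likelihood.

```python
# Minimal particle-filter step for pose tracking (illustrative only): the
# state is 2D rather than 6-DOF, and the Gaussian likelihood is a stand-in
# for a real image-based measurement model.
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, measurement,
                         motion_std=0.05, meas_std=0.1):
    # 1. Predict: diffuse particles with the motion model.
    particles = particles + rng.normal(0.0, motion_std, particles.shape)
    # 2. Weight: score each particle against the measurement.
    dists = np.linalg.norm(particles - measurement, axis=1)
    weights = weights * np.exp(-0.5 * (dists / meas_std) ** 2)
    weights /= weights.sum()
    # 3. Resample: multiply high-weight particles, drop low-weight ones.
    idx = rng.choice(len(particles), size=len(particles), p=weights)
    return particles[idx], np.full(len(particles), 1.0 / len(particles))

particles = rng.uniform(-1, 1, size=(500, 2))
weights = np.full(500, 1.0 / 500)
particles, weights = particle_filter_step(particles, weights,
                                          np.array([0.2, -0.3]))
```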
10

Buchler, Daniela Martins. "Visual perception of the designed object." Thesis, Staffordshire University, 2007. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.442502.

Abstract:
This investigation deals with the issue of visual perception of the designed object, which is relevant in the context of product differentiation, particularly in the case where incremental style changes are made to the external shape design of the product. Such cases present a problem regarding the effectiveness of product differentiation, which this research claims is a matter of visual perception. The problem is that in order for product differentiation to be effective, the design changes must be perceptible. Perceptible differentiation is explained as a function of the physical change, i.e. the 'real' difference, and also of the relevance for the observer of that change, i.e. the 'perceived' difference. This study therefore focuses on the comparison between these two aspects of the designed object: the physical design and the perceived design. Literature from both material culture and the so-called indirect account of perception suggests that visual perception is an interpretation of the artefacts that we see. This visual perception is a function of the physical aspect of that object and of the individual cultural background of the observer. However, it was found that between these two accounts there are theoretical incompatibilities which this study claims could be resolved through scholarly investigation of visual perception of the designed object. The thesis takes these two accounts into consideration and proposes a more comprehensive model of visual perception of the designed object that details and extends the material culture understanding of what constitutes the perceptual experience with the designed object and the role of form in that experience. Theory building was conducted across the disciplines of psychology of perception and design. A revised model was proposed for the area of designed object studies, which was informed by Gregory's theoretical framework and incorporated empirical explorations into the model development process. The study therefore contributes knowledge to the research area of design, more specifically to cross-disciplinary methods for theory building on visual perception of the designed object.
11

Fang, Jianzhong. "Computational approaches to visual object detection." Thesis, University of Nottingham, 2004. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.416393.

12

Moghaddam, Baback 1963. "Probabilistic visual learning for object detection." Thesis, Massachusetts Institute of Technology, 1997. http://hdl.handle.net/1721.1/10242.

Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 1997.
Includes bibliographical references (leaves 78-82).
by Baback Moghaddam.
Ph.D.
13

Lim, Joseph J. (Joseph Jaewhan). "Toward visual understanding of everyday object." Thesis, Massachusetts Institute of Technology, 2015. http://hdl.handle.net/1721.1/101574.

Abstract:
Thesis: Ph. D., Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2015.
Cataloged from PDF version of thesis.
Includes bibliographical references (pages 83-92).
The computer vision community has made impressive progress on object recognition using large scale data. However, for any visual system to interact with objects, it needs to understand much more than simply recognizing where the objects are. The goal of my research is to explore and solve object understanding tasks for interaction - finding an object's pose in 3D, understanding its various states and transformations, and interpreting its physical interactions. In this thesis, I will focus on two specific aspects of this agenda: 3D object pose estimation and object state understanding. Precise pose estimation is a challenging problem. One reason is that an object's appearance inside an image can vary a lot based on different conditions (e.g. location, occlusions, and lighting). I address these issues by utilizing 3D models directly. The goal is to develop a method that can exploit all possible views provided by a 3D model - a single 3D model represents infinitely many 2D views of the same object. I have developed a method that uses the 3D geometry of an object for pose estimation. The method can then also learn additional real-world statistics, such as which poses appear more frequently, which area is more likely to contain an object, and which parts are commonly occluded and discriminative. These methods allow us to localize and estimate the exact pose of objects in natural images. Finally, I will also describe the work on learning and inferring different states and transformations an object class can undergo. Objects in visual scenes come in a rich variety of transformed states. A few classes of transformation have been heavily studied in computer vision: mostly simple, parametric changes in color and geometry. However, transformations in the physical world occur in many more flavors, and they come with semantic meaning: e.g., bending, folding, aging, etc. Hence, the goal is to learn about an object class, in terms of their states and transformations, using the collection of images from the image search engine.
by Joseph J. Lim.
Ph. D.
14

Tuovinen, Antti-Pekka. "Object-oriented engineering of visual languages." Helsinki : University of Helsinki, 2002. http://ethesis.helsinki.fi/julkaisut/mat/tieto/vk/tuovinen/.

15

Thanikasalam, Kokul. "Appearance based online visual object tracking." Thesis, Queensland University of Technology, 2019. https://eprints.qut.edu.au/130875/1/Kokul_Thanikasalam_Thesis.pdf.

Abstract:
This thesis presents research contributions to the field of computer vision based visual object tracking. This study investigates appearance-based object tracking using traditional hand-crafted and deep features. The thesis proposes a real-time, high-accuracy tracking framework which follows a deep similarity tracking strategy. This thesis also proposes several deep tracking frameworks for high-accuracy tracking and for managing the loss of spatial information. The research findings of the study could be used in a range of applications, including visual surveillance systems.
16

Yang, Tao. "Visual tracking and object motion prediction for intelligent vehicles." Thesis, Bourgogne Franche-Comté, 2019. http://www.theses.fr/2019UBFCA005.

Abstract:
Object tracking and motion prediction are important for autonomous vehicles and can be applied in many other fields. First, we design a single-object tracker using compressive tracking to correct optical flow tracking, in order to achieve a balance between performance and processing speed. Considering the efficiency of compressive feature extraction, we apply this tracker to multi-object tracking to improve performance without slowing processing down too much. Second, we improve the DCF-based single-object tracker by introducing multi-layer CNN features, spatial reliability analysis (through a foreground mask) and a conditional model-updating strategy. Then, we apply the DCF-based CNN tracker to multi-object tracking. The pre-trained VGGNet-19 and DCFNet are tested as feature extractors respectively. The discriminative model achieved by DCF is considered for data association. Third, two LSTM models (seq2seq and seq2dense) are proposed for motion prediction of vehicles and pedestrians in the camera coordinate frame. Based on visual data and 3D point clouds (LiDAR), a Kalman filter based multi-object tracking system with a 3D detector is used to generate the object trajectories for testing. The proposed models and a polynomial regression model, considered as the baseline, are compared for evaluation.
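The Kalman-filter core of such a tracking pipeline is compact enough to sketch. The following NumPy example implements textbook constant-velocity predict and update steps; the matrices and noise levels are generic assumptions, not the thesis's tuned parameters.

```python
# Constant-velocity Kalman filter in image coordinates, the standard core
# of a tracking-by-detection pipeline (textbook matrices, generic noise).
import numpy as np

dt = 0.1
F = np.array([[1, 0, dt, 0],      # state: [x, y, vx, vy]
              [0, 1, 0, dt],
              [0, 0, 1,  0],
              [0, 0, 0,  1]], dtype=float)
H = np.array([[1, 0, 0, 0],       # only position is measured
              [0, 1, 0, 0]], dtype=float)
Q, R = np.eye(4) * 1e-3, np.eye(2) * 1e-2

def predict(x, P):
    return F @ x, F @ P @ F.T + Q

def update(x, P, z):
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)            # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

x, P = np.zeros(4), np.eye(4)
x, P = predict(x, P)
x, P = update(x, P, np.array([0.5, 0.2]))
```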
17

Hussain, Sabit ul. "Machine Learning Methods for Visual Object Detection." Thesis, Grenoble, 2011. http://www.theses.fr/2011GRENM070/document.

Abstract:
The goal of this thesis is to develop better practical methods for detecting common object classes in real world images. We present a family of object detectors that combine Histogram of Oriented Gradient (HOG), Local Binary Pattern (LBP) and Local Ternary Pattern (LTP) features with efficient Latent SVM classifiers and effective dimensionality reduction and sparsification schemes to give state-of-the-art performance on several important datasets including PASCAL VOC2006 and VOC2007, INRIA Person and ETHZ. The three main contributions are as follows. Firstly, we pioneer the use of Local Ternary Pattern features for object detection, showing that LTP gives better overall performance than HOG and LBP, because it captures both rich local texture and object shape information while being resistant to variations in lighting conditions. It thus works well both for classes that are recognized mainly by their structure and ones that are recognized mainly by their textures. We also show that HOG, LBP and LTP complement one another, so that an extended feature set that incorporates all three of them gives further improvements in performance. Secondly, in order to tackle the speed and memory usage problems associated with high-dimensional modern feature sets, we propose two effective dimensionality reduction techniques. The first, feature projection using Partial Least Squares, allows detectors to be trained more rapidly with negligible loss of accuracy and no loss of run time speed for linear detectors. The second, feature selection using SVM weight truncation, allows active feature sets to be reduced in size by almost an order of magnitude with little or no loss, and often a small gain, in detector accuracy. Despite its simplicity, this feature selection scheme outperforms all of the other sparsity enforcing methods that we have tested. Lastly, we describe work in progress on Local Quantized Patterns (LQP), a generalized form of local pattern features that uses lookup table based vector quantization to provide local pattern style pixel neighbourhood codings that have the speed of LBP/LTP and some of the flexibility and power of traditional visual word representations. Our experiments show that LQP outperforms all of the other feature sets tested including HOG, LBP and LTP
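The LTP feature itself is simple to state: each neighbour is compared to the centre pixel with a tolerance t and coded as +1, 0 or -1, and the ternary code is split into two binary patterns. A minimal NumPy sketch for a single 3x3 neighbourhood, with an assumed threshold t=5:

```python
# Local Ternary Pattern for one 3x3 neighbourhood, split into the usual
# upper/lower binary codes (a sketch of the feature, not the full detector).
import numpy as np

def ltp_codes(patch, t=5):
    """patch: 3x3 array; returns (upper, lower) 8-bit LTP codes."""
    c = patch[1, 1]
    # 8 neighbours in clockwise order starting at the top-left corner.
    nb = patch[[0, 0, 0, 1, 2, 2, 2, 1], [0, 1, 2, 2, 2, 1, 0, 0]]
    upper = sum(1 << i for i, v in enumerate(nb) if v > c + t)   # +1 states
    lower = sum(1 << i for i, v in enumerate(nb) if v < c - t)   # -1 states
    return upper, lower

print(ltp_codes(np.array([[90, 100, 120],
                          [80, 100, 140],
                          [60,  95, 101]])))
```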
18

Craddock, Matthew Peter. "Comparing the attainment of object constancy in haptic and visual object recognition." Thesis, University of Liverpool, 2010. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.539615.

19

Gepperth, Alexander Rainer Tassilo. "Neural learning methods for visual object detection." [S.l.] : [s.n.], 2006. http://deposit.ddb.de/cgi-bin/dokserv?idn=981053998.

20

Allred, Sarah R. "The Neural basis of visual object perception /." Thesis, Connect to this title online; UW restricted, 2006. http://hdl.handle.net/1773/10645.

21

Mahmood, Hamid. "Visual Attention-based Object Detection and Recognition." Thesis, Linköpings universitet, Institutionen för datavetenskap, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-94024.

Abstract:
This thesis is about visual attention, from understanding the human visual system to applying this mechanism in a real-world computer vision application. This has been achieved by taking advantage of the latest findings about human visual attention and the increased performance of computers. These two facts played a vital role in simulating many different aspects of this visual behavior. In addition, bio-inspired visual attention systems have become practical due to the emergence of different interdisciplinary approaches to vision, which leads to a beneficial interaction between scientists from different fields. The high complexity of computer vision problems has led to the visual attention paradigm being considered as part of real-time computer vision solutions, for which demand is increasing. In this thesis work, different aspects of the visual attention paradigm have been dealt with, ranging from biological modeling to the implementation of real-world computer vision tasks based on this visual behavior. The implementation of a traffic sign detection and recognition system that benefits from this mechanism is the central part of this thesis work.
22

Hussain, Sibt Ul. "Machine Learning Methods for Visual Object Detection." Phd thesis, Université de Grenoble, 2011. http://tel.archives-ouvertes.fr/tel-00680048.

23

Rebai, Ahmed. "Interactive Object Retrieval using Interpretable Visual Models." Phd thesis, Université Paris Sud - Paris XI, 2011. http://tel.archives-ouvertes.fr/tel-00608467.

Abstract:
This thesis is an attempt to improve visual object retrieval by allowing users to interact with the system. Our solution lies in constructing an interactive system that allows users to define their own visual concept from a concise set of visual patches given as input. These patches, which represent the most informative clues of a given visual category, are trained beforehand with a supervised learning algorithm in a discriminative manner. Then, and in order to specialize their models, users have the possibility to send their feedback on the model itself by choosing and weighting the patches they are confident of. The real challenge consists in how to generate concise and visually interpretable models. Our contribution relies on two points. First, in contrast to the state-of-the-art approaches that use bag-of-words, we propose embedding local visual features without any quantization, which means that each component of the high-dimensional feature vectors used to describe an image is associated to a unique and precisely localized image patch. Second, we suggest using regularization constraints in the loss function of our classifier to favor sparsity in the models produced. Sparsity is indeed preferable for concision (a reduced number of patches in the model) as well as for decreasing prediction time. To meet these objectives, we developed a multiple-instance learning scheme using a modified version of the BLasso algorithm. BLasso is a boosting-like procedure that behaves in the same way as Lasso (Least Absolute Shrinkage and Selection Operator). It efficiently regularizes the loss function with an additive L1-constraint by alternating between forward and backward steps at each iteration. The method we propose here is generic in the sense that it can be used with any local features or feature sets representing the content of an image region.
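The BLasso procedure itself is involved, but the sparsity effect the thesis targets is easy to demonstrate. The sketch below, assuming scikit-learn, uses a plain L1-penalized logistic regression as a stand-in: the penalty drives most patch weights to zero, leaving a concise, interpretable set of active patches. The synthetic data and the choice of C are illustrative.

```python
# Stand-in for BLasso's L1 regularization: an L1-penalized classifier over
# patch features keeps only a sparse, interpretable set of non-zero weights.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 500))               # 500 candidate patch features
w_true = np.zeros(500)
w_true[:5] = 2.0                              # only 5 are truly informative
y = (X @ w_true + rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)
print("non-zero patch weights:", np.count_nonzero(clf.coef_))
```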
24

Webber, James. "Visual object-oriented development of parallel applications." Thesis, University of Newcastle Upon Tyne, 2000. http://hdl.handle.net/10443/1762.

Abstract:
Developing software for parallel architectures is a notoriously difficult task, compounded further by the range of available parallel architectures. There has been little research effort invested in how to engineer parallel applications for more general problem domains than the traditional numerically intensive domain. This thesis addresses these issues. An object-oriented paradigm for the development of general-purpose parallel applications, with full lifecycle support, is proposed and investigated, and a visual programming language to support that paradigm is developed. This thesis presents experiences and results from experiments with this new model for parallel application development.
25

Villalba, Michael Joseph. "Fast visual recognition of large object sets." Thesis, Massachusetts Institute of Technology, 1990. http://hdl.handle.net/1721.1/42211.

26

Aghajanian, J. "Patch-based models for visual object classes." Thesis, University College London (University of London), 2011. http://discovery.ucl.ac.uk/1306170/.

Abstract:
This thesis concerns models for visual object classes that exhibit a reasonable amount of regularity, such as faces, pedestrians, cells and human brains. Such models are useful for making “within-object” inferences such as determining their individual characteristics and establishing their identity. For example, the model could be used to predict the identity of a face, the pose of a pedestrian or the phenotype of a cell and segment parts of a human brain. Existing object modelling techniques have several limitations. First, most current methods have targeted the above tasks individually using object specific representations; therefore, they cannot be applied to other problems without major alterations. Second, most methods have been designed to work with small databases which do not contain the variations in pose, illumination, occlusion and background clutter seen in ‘real world’ images. Consequently, many existing algorithms fail when tested on unconstrained databases. Finally, the complexity of the training procedure in these methods makes it impractical to use large datasets. In this thesis, we investigate patch-based models for object classes. Our models are capable of exploiting very large databases of objects captured in uncontrolled environments. We represent the test image with a regular grid of patches from a library of images of the same object. All the domain specific information is held in this library: we use one set of images of the object to help draw inferences about others. In each experimental chapter we investigate a different within-object inference task. In particular we develop models for classification, regression, semantic segmentation and identity recognition. In each task, we achieve results that are comparable to or better than the state of the art. We conclude that patch-based representation can be successfully used for the above tasks and shows promise for other applications such as generation and localization.
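The patch-grid representation can be illustrated with a toy nearest-neighbour version. In the NumPy sketch below, each cell of a regular grid over the test image is matched to its nearest patch in a library, and the matched labels vote for an image-level decision; the library, grid size and voting rule are made-up placeholders for the model's actual probabilistic inference.

```python
# Toy patch-grid inference: match each grid cell of a test image to its
# nearest library patch and let the matched labels vote for the decision.
import numpy as np

def patch_grid_label(image, library_patches, library_labels, cell=8):
    h, w = image.shape
    votes = []
    for i in range(0, h - cell + 1, cell):
        for j in range(0, w - cell + 1, cell):
            patch = image[i:i + cell, j:j + cell].ravel()
            dists = np.linalg.norm(library_patches - patch, axis=1)
            votes.append(library_labels[np.argmin(dists)])  # nearest patch
    values, counts = np.unique(votes, return_counts=True)
    return values[np.argmax(counts)]          # majority vote over the grid

rng = np.random.default_rng(1)
library = rng.normal(size=(100, 64))          # 100 flattened 8x8 patches
labels = rng.integers(0, 2, size=100)
print(patch_grid_label(rng.normal(size=(32, 32)), library, labels))
```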
27

Revie, Gavin F. "Object based attention in visual word processing." Thesis, University of Dundee, 2015. https://discovery.dundee.ac.uk/en/studentTheses/205c8224-4954-4b76-aa8c-b0ecd40a6591.

Abstract:
This thesis focusses on whether words are treated like visual objects by the human attentional system. Previous research has shown an attentional phenomenon that is associated specifically with objects: this is known as “object based attention” (e.g. Egly, Driver & Rafal, 1994). This is where drawing a participant’s attention (cuing) to any part of a visual object facilitates target detection at non-cued locations within that object. That is, the cue elevates visual attention across the whole object. The primary objective of this thesis was to demonstrate this effect using words instead of objects. The main finding of this thesis is that this effect can indeed be found within English words – but only when they are presented in their canonical horizontal orientation. The effect is also highly sensitive to the type of cue and target used. Cues which draw attention to the “wholeness” of the word appear to amplify the object based effect. A secondary finding of this thesis is that under certain circumstances participants apply some form of attentional mapping to words which respects the direction of reading. Participants are faster (or experience less cost) when prompted to move their attention in accord with reading direction than against. This effect only occurs when the word stimuli are used repeatedly during the course of the experiment. The final finding of this thesis is that both the object based attentional effect and the reading direction effect described above can be found using either real words or a non-lexical stimulus: specifically symbol strings. This strongly implies that these phenomena are not exclusively associated with word stimuli, but are instead associated with lower level visual processing. Nonetheless, it is considered highly likely that these processes are involved in the day to day process of reading.
28

Kinuthia, Charles. "Visual Object Detector for Vehicle Teleoperation Applications." Thesis, KTH, Skolan för elektroteknik och datavetenskap (EECS), 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-276857.

Abstract:
Self-driving vehicles have recently gained attention from vehicle manufacturers due to breakthroughs in machine learning and AI algorithms. One of the areas that has sparked interest is the improved perception of vehicles by employing accurate real-time object detectors, aided by the fast computing resources available. As more vehicles become autonomous, there will be a need to monitor and remotely control vehicles to handle edge-case scenarios that are difficult to automate or foresee. This would require streaming video from the vehicle to a teleoperator driver. Due to network degradation caused by bandwidth fluctuations and handover operations, streaming the video is not enough. One can improve the experience of teleoperators by highlighting detected objects in the visual scene, such as vehicles and pedestrians. The main contribution of this thesis work is a real-time visual object detector that has accuracy comparable to Faster R-CNN. Furthermore, the proposed detector is modular, meaning that retraining of the entire model is not required to detect new types of object classes. Finally, the detector is tested on a video with network degradation artifacts to assess its performance.
29

Lindqvist, Zebh. "Design Principles for Visual Object Recognition Systems." Thesis, Luleå tekniska universitet, Institutionen för system- och rymdteknik, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:ltu:diva-80769.

Full text
Abstract:
Today's smartphones are capable of accomplishing far more advanced tasks than reading emails. With the modern framework TensorFlow, visual object recognition becomes possible using smartphone resources. This thesis shows that the main challenge does not lie in developing an artifact which performs visual object recognition. Instead, the main challenge lies in developing an ecosystem which allows for continuous improvement of the system’s ability to accomplish the given task without laborious and inefficient data collection. This thesis presents four design principles which contribute to an efficient ecosystem with quick initiation of new object classes and efficient data collection which is used to continuously improve the system’s ability to recognize smart meters in varying environments in an automated fashion.
APA, Harvard, Vancouver, ISO, and other styles
30

Teynor, Alexandra. "Visual object class recognition using local descriptions." [S.l. : s.n.], 2008. http://nbn-resolving.de/urn:nbn:de:bsz:25-opus-62371.

Full text
APA, Harvard, Vancouver, ISO, and other styles
31

Wu, Hanwei. "Object Ranking for Mobile 3D Visual Search." Thesis, KTH, Skolan för elektro- och systemteknik (EES), 2015. http://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-175146.

Full text
Abstract:
In this thesis, we study object ranking in mobile 3D visual search. Conventional ranking methods rank objects based on their appearance in images captured by mobile devices, while ignoring the underlying 3D geometric information. We therefore propose to use mobile 3D visual search to improve the ranking by exploiting the underlying 3D geometry of the objects. We develop a fast 3D geometric verification algorithm to re-rank the objects at low computational complexity. In this scheme, object geometry, such as rounded corners, sharp edges, or planar surfaces, is considered alongside object appearance for 3D object ranking. We also investigate flaws of conventional vocabulary trees and improve the ranking results by introducing a credibility value into the TF-IDF scheme. By combining the novel vocabulary trees with fast 3D geometric verification, we improve both the recall/data-rate performance and the subjective ranking results for mobile 3D visual search.
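As a rough illustration of the credibility idea, here is a toy TF-IDF ranker over quantized visual words (vocabulary-tree leaves) in which a per-word credibility factor scales the standard term weight. The thesis's actual definition of the credibility value is not given here, so the multiplicative form is an assumption:

```python
import numpy as np
from collections import Counter

def tfidf_ranking(query_words, db_word_lists, credibility=None):
    """Toy TF-IDF ranking over visual words. `credibility` is a hypothetical
    per-word weight in [0, 1]; with None this reduces to plain TF-IDF."""
    n_docs = len(db_word_lists)
    df = Counter(w for words in db_word_lists for w in set(words))
    idf = {w: np.log(n_docs / df[w]) for w in df}
    cred = credibility or {}

    def vec(words):
        tf = Counter(words)
        v = {w: (tf[w] / len(words)) * idf.get(w, 0.0) * cred.get(w, 1.0) for w in tf}
        norm = np.sqrt(sum(x * x for x in v.values())) + 1e-12
        return {w: x / norm for w, x in v.items()}

    q = vec(query_words)
    # Cosine similarity between query and each database signature.
    scores = [sum(q[w] * vec(words).get(w, 0.0) for w in q) for words in db_word_lists]
    return np.argsort(scores)[::-1]  # best-ranked database objects first

# Example: three database objects described by quantized local features.
db = [[1, 2, 2, 7], [3, 3, 5, 7], [1, 2, 7, 7]]
print(tfidf_ranking([1, 2, 7], db, credibility={7: 0.3}))
```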
APA, Harvard, Vancouver, ISO, and other styles
32

Wu, Zheng. "Occlusion reasoning for multiple object visual tracking." Thesis, Boston University, 2013. https://hdl.handle.net/2144/12892.

Full text
Abstract:
Thesis (Ph.D.)--Boston University
Occlusion reasoning for visual object tracking in uncontrolled environments is a challenging problem. It becomes significantly more difficult when dense groups of indistinguishable objects are present in the scene that cause frequent inter-object interactions and occlusions. We present several practical solutions that tackle the inter-object occlusions for video surveillance applications. In particular, this thesis proposes three methods. First, we propose "reconstruction-tracking," an online multi-camera spatial-temporal data association method for tracking large groups of objects imaged with low resolution. As a variant of the well-known Multiple-Hypothesis-Tracker, our approach localizes the positions of objects in 3D space with possibly occluded observations from multiple camera views and performs temporal data association in 3D. Second, we develop "track linking," a class of offline batch processing algorithms for long-term occlusions, where the decision has to be made based on the observations from the entire tracking sequence. We construct a graph representation to characterize occlusion events and propose an efficient graph-based/combinatorial algorithm to resolve occlusions. Third, we propose a novel Bayesian framework where detection and data association are combined into a single module and solved jointly. Almost all traditional tracking systems address the detection and data association tasks separately in sequential order. Such a design implies that the output of the detector has to be reliable in order to make the data association work. Our framework takes advantage of the often complementary nature of the two subproblems, which not only avoids the error propagation issue from which traditional "detection-tracking approaches" suffer but also eschews common heuristics such as "nonmaximum suppression" of hypotheses by modeling the likelihood of the entire image. The thesis describes a substantial number of experiments, involving challenging, notably distinct simulated and real data, including infrared and visible-light data sets recorded ourselves or taken from data sets publicly available. In these videos, the number of objects ranges from a dozen to a hundred per frame in both monocular and multiple views. The experiments demonstrate that our approaches achieve results comparable to those of state-of-the-art approaches.
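The graph-based track-linking algorithm itself is not reproduced in the abstract. As a hedged stand-in, long-gap linking can be posed as a minimum-cost assignment between tracks that end (e.g. at an occlusion) and tracks that start later, here solved with SciPy's Hungarian solver; the position-only cost is a simplification, since the thesis also reasons over occlusion events, appearance and timing:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def link_tracks(track_ends, track_starts, max_cost=50.0):
    """Minimal stand-in for offline track linking: associate tracks that end
    with tracks that start, by endpoint-position cost, rejecting links whose
    cost exceeds a gate so those tracks remain unlinked."""
    ends = np.asarray(track_ends, dtype=float)      # (m, 2) last positions
    starts = np.asarray(track_starts, dtype=float)  # (n, 2) first positions
    cost = np.linalg.norm(ends[:, None, :] - starts[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    return [(int(i), int(j)) for i, j in zip(rows, cols) if cost[i, j] <= max_cost]

# Track 0 plausibly continues as new track 0; track 1 as new track 2.
print(link_tracks([(10, 10), (80, 40)], [(12, 14), (200, 200), (78, 43)]))
```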
APA, Harvard, Vancouver, ISO, and other styles
33

Van, Thielen Tessa. "From object towards island." Thesis, Konstfack, Institutionen för Konst (K), 2017. http://urn.kb.se/resolve?urn=urn:nbn:se:konstfack:diva-5924.

Full text
Abstract:
A NOTE FROM ME TO YOU ABOUT WHAT I WILL BE HIDING In a moment I will introduce myself to you as a storyteller and a constant traveller; bothdescriptions are accurate and so is the description of an ‘object maker’. In combination withstories, objects are created and either the story or the object is presented, and so you will have tosettle for only one element in the upcoming writings. Singular edition photographs, books or evenengraved instruments will be hidden from you during the tales. Documentation; this could be the simplest way of describing my method, things are simplywitnessed and written down, or traces are caught on camera. Objects as documentation and thenthe reverse; stories within stories and photographs within photographs. There is an aim to makeall information obtained into a riddle.It is experiences which trigger these tales: people I meet, or texts which magically match with reallife events and synchronize. I use the word ‘magically’ since my own plans always turn out to bedisappointing. I attempt to plan, but often something else takes over; leaving chance to create thework. Most objects involved are rather small and simple; there exists a distance between these objectsand you. They are about their own details; about the small engravings one might pass; about thewood used for the framing and about the printing process which makes a blue bright. They areabout time; they require time. Time to reveal the amount of information they keep captured withinthemselves. They are about places; they require space to breath in and out. The photographic is present in these tales and it is also the photographic which makes me hidethese objects from you now; to reveal or hide a bigger context. It is the photographic which sinksinto my way of thinking. I use text and stories and the imaginative and the real as a photographer,despite not being one. However, I do document the objects previously mentioned and theircontext and find new life in these images. This is where the work continues, it is a non-stop‘connecting-the-dots’ way of working. I do not care for fiction; although I do not care for the fantastic either. Truth is there is aboringness to this process, as things are sometimes repetitive and executed in the here. Aninability to access ‘the there’, at first sight, could perhaps be said. But let’s admit there is no fun intelling tales which will actually happen. You will travel through stories in where various sorts of information are combined and narrationsout-of-nowhere are created. Connections between gossip and truth and present and past andearth and earth are intertwined into a sort of imaginary world in where nature is centralized.A riddle and at the same time an insight into an artistic process. The travel never ends
APA, Harvard, Vancouver, ISO, and other styles
34

Yang, Fan. "Visual Infrastructure based Accurate Object Recognition and Localization." The Ohio State University, 2017. http://rave.ohiolink.edu/etdc/view?acc_num=osu1492752246062673.

Full text
APA, Harvard, Vancouver, ISO, and other styles
35

Piñol, Naranjo Mónica. "Reinforcement learning of visual descriptors for object recognition." Doctoral thesis, Universitat Autònoma de Barcelona, 2014. http://hdl.handle.net/10803/283927.

Full text
Abstract:
The human visual system is able to recognize the object in an image even if the object is partially occluded, seen from various points of view, in different colors, or at different distances. To do this, the eye obtains an image and extracts features that are sent to the brain, where the object is recognized. In computer vision, the object recognition field tries to learn from the behaviour of the human visual system to achieve the same goal. Hence, one algorithm is used to identify representative features of the scene (the detector), another is used to describe these points (the descriptor), and finally the extracted information is used to classify the object in the scene. Selecting this set of algorithms is a very complicated task and thus a very active research field. In this thesis we focus on the selection/learning of the best descriptor for a given image. The state of the art offers several descriptors, but there is no single best one: the choice depends on the scenes used (the dataset) and on the algorithm chosen for classification. We propose a framework based on reinforcement learning and bag of features to choose the best descriptor for a given image. The system can analyse the behaviour of different learning algorithms and descriptor sets. Furthermore, the proposed framework for improving the classification/recognition rate can be used, with minor changes, in other computer vision fields such as video retrieval.
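As a hedged sketch of the selection idea, reduced to a state-free epsilon-greedy bandit (the thesis's framework conditions the choice on the image via bag of features, which this sketch drops for brevity), learning which descriptor earns the highest classification reward could look like:

```python
import numpy as np

rng = np.random.default_rng(1)
descriptors = ["SIFT", "SURF", "BRIEF"]  # candidate description algorithms
q = np.zeros(len(descriptors))           # running value estimate per descriptor
n = np.zeros(len(descriptors))

def classify_with(descriptor_idx, image) -> float:
    """Placeholder: run detector + descriptor + classifier and return 1.0 if
    the image was classified correctly, else 0.0. Simulated here."""
    true_quality = [0.6, 0.75, 0.5][descriptor_idx]
    return float(rng.random() < true_quality)

# Epsilon-greedy loop: explore descriptors, reinforce those that lead to
# correct classifications, as in a simple reinforcement-learning formulation.
for image in range(500):
    a = rng.integers(len(descriptors)) if rng.random() < 0.1 else int(np.argmax(q))
    r = classify_with(a, image)
    n[a] += 1
    q[a] += (r - q[a]) / n[a]            # incremental mean update

print(dict(zip(descriptors, np.round(q, 2))))  # learned value per descriptor
```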
APA, Harvard, Vancouver, ISO, and other styles
36

Ventura, Royo Carles. "Visual object analysis using regions and local features." Doctoral thesis, Universitat Politècnica de Catalunya, 2016. http://hdl.handle.net/10803/398407.

Full text
Abstract:
The first part of this dissertation focuses on an analysis of the spatial context in semantic image segmentation. First, we review how spatial context has been tackled in the literature by local features and spatial aggregation techniques. Starting from a discussion about whether context is beneficial or not for object recognition, we extend a Figure-Border-Ground segmentation for local feature aggregation with ground-truth annotations to a more realistic scenario where object proposal techniques are used instead. Whereas the Figure and Ground regions represent the object and the surround respectively, the Border is a region around the object contour, which is found to be the region with the richest contextual information for object recognition. Furthermore, we propose a new contour-based spatial aggregation technique for the local features within the object region, dividing the region into four subregions. Both contributions have been tested on a semantic segmentation benchmark with a combination of context-free and context-dependent local features, which lets the models automatically learn whether context is beneficial for each semantic category. The second part of this dissertation addresses semantic segmentation for a set of closely related images from an uncalibrated multiview scenario. State-of-the-art semantic segmentation algorithms fail to segment objects correctly from some viewpoints when applied independently to each viewpoint image, and the lack of large annotated datasets for multiview segmentation does not allow training a model that is robust to viewpoint changes. In this second part, we exploit the spatial correlation that exists between the different viewpoint images to obtain a more robust semantic segmentation. First, we review state-of-the-art co-clustering, co-segmentation and video segmentation techniques that aim to segment the set of images in a generic way, i.e. without considering semantics. Then, a new co-clustering architecture that considers motion information and provides a multiresolution segmentation is proposed, which outperforms state-of-the-art techniques for generic multiview segmentation. Finally, the proposed multiview segmentation is combined with the semantic segmentation results, giving a method for automatic resolution selection and a coherent semantic multiview segmentation.
APA, Harvard, Vancouver, ISO, and other styles
37

Wilson, Susan E. "Perceptual organization and symmetry in visual object recognition." Thesis, University of British Columbia, 1991. http://hdl.handle.net/2429/29802.

Full text
Abstract:
A system has been implemented which is able to detect symmetrical groupings in edge images. The initial stages of the algorithm consist of edge detection, curve smoothing, and the extension of the perceptual grouping phase of the SCERPO [Low87] vision system to enable detection of instances of endpoint proximity and curvilinearity among curved segments. The symmetry detection stage begins by first locating points along object boundaries which are significant in terms of curvature. These key points are then tested against each other in order to detect locally symmetric pairs. An iterative grouping procedure is then applied which matches these pairs together using a more global definition of symmetry. The end result of this process is a set of pairs of key points along the boundary of an object which are bilaterally symmetric, along with the axis of symmetry for the object or sub-object. This paper describes the implementation of this system and presents several examples of the results obtained using real images. The output of the system is intended for use as indexing features in a model-based object recognition system, such as SCERPO, which requires as input a set of spatial correspondences between image features and model features.
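A minimal geometric version of the local pair test, under the assumption that key points carry unit tangent directions, checks whether one tangent mirrors onto the other across the perpendicular bisector of the segment joining the two points:

```python
import numpy as np

def locally_symmetric(p1, t1, p2, t2, tol_deg=10.0):
    """Toy test of whether two boundary points (positions p, unit tangents t)
    form a bilaterally symmetric pair: reflecting t1 across the perpendicular
    bisector of segment p1-p2 should reproduce t2 up to sign and tolerance.
    A sketch of the local pair test, not the SCERPO-based implementation."""
    p1, t1, p2, t2 = map(np.asarray, (p1, t1, p2, t2))
    axis = p2 - p1
    axis = axis / np.linalg.norm(axis)

    def reflect(v):
        # Reflection across the line normal to `axis` (the bisector direction).
        return v - 2.0 * np.dot(v, axis) * axis

    cos_angle = abs(np.dot(reflect(t1), t2))  # sign-invariant tangent match
    return cos_angle >= np.cos(np.radians(tol_deg))

# Two points on a symmetric contour with mirrored tangents:
print(locally_symmetric((0, 0), (0.6, 0.8), (4, 0), (-0.6, 0.8)))  # True
```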
APA, Harvard, Vancouver, ISO, and other styles
38

Wallenberg, Marcus, and Per-Erik Forssén. "A Research Platform for Embodied Visual Object Recognition." Linköpings universitet, Datorseende, 2010. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-70769.

Full text
Abstract:
We present in this paper a research platform for development and evaluation of embodied visual object recognition strategies. The platform uses a stereoscopic peripheral-foveal camera system and a fast pan-tilt unit to perform saliency-based visual search. This is combined with a classification framework based on the bag-of-features paradigm with the aim of targeting, classifying and recognising objects. Interaction with the system is done via typed commands and speech synthesis. We also report the current classification performance of the system.
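As a generic illustration of the bag-of-features paradigm named above (not the platform's actual code), an image signature is a normalized histogram of local descriptors quantized against a visual vocabulary, and classification can then compare signatures:

```python
import numpy as np

def bof_histogram(descriptors, vocab):
    """Quantize local descriptors to their nearest visual word and build a
    normalized occurrence histogram: the bag-of-features image signature."""
    d2 = ((descriptors[:, None, :] - vocab[None, :, :]) ** 2).sum(axis=2)
    words = d2.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocab)).astype(float)
    return hist / hist.sum()

rng = np.random.default_rng(0)
vocab = rng.normal(size=(32, 64))  # visual vocabulary (e.g. k-means centres)
train = {c: bof_histogram(rng.normal(loc=c, size=(200, 64)), vocab) for c in (0, 1)}
query = bof_histogram(rng.normal(loc=1, size=(150, 64)), vocab)
# Nearest-neighbour classification between class signatures (expected: 1).
print(min(train, key=lambda c: np.abs(train[c] - query).sum()))
```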
APA, Harvard, Vancouver, ISO, and other styles
39

Firouzi, Hadi. "Visual non-rigid object tracking in dynamic environments." Thesis, University of British Columbia, 2013. http://hdl.handle.net/2429/44629.

Full text
Abstract:
This research presents machine vision techniques to visually track an object of interest in an image sequence in which the target's appearance, scale, orientation, shape, and position may change significantly over time. The images are captured in gray-scale by a non-stationary camera in a dynamic environment, and the initial location of the target is given. The contributions of this thesis include two robust object tracking techniques and an adaptive similarity measure which can significantly improve the performance of visual tracking. In the first technique, the target is initially partitioned into several sub-regions, and each sub-region is then represented by two distinct adaptive templates, namely immediate and delayed templates. At every tracking step, the translational transformation of each sub-region is preliminarily estimated using the immediate template by a multi-start gradient-based search, and the delayed template is then employed to correct the estimation. After this two-step optimization, the target is tracked by robust fusion of the new sub-region locations. In experiments, the proposed tracker is more robust against appearance variation and occlusion than traditional trackers. Similarly, in the second technique the target is represented by two heterogeneous Gaussian-based templates which model both short- and long-term changes in the target appearance. The target localization in this technique features an iterative multi-start optimization that takes generic transformations into account using a combination of sampling- and gradient-based algorithms in a probabilistic framework. Unlike the two-step optimization of the first method, the templates are used simultaneously to find the best location of the target, which further increases both the efficiency and accuracy of the proposed tracker. Lastly, an adaptive metric for estimating the similarity between the target model and new images is proposed. In this work, a weighted L2-norm is used to calculate the target similarity measure. A histogram-based classifier is learned on-line to categorize the L2-norm errors into three classes, each of which assigns a weight to the corresponding error. Including the proposed similarity measure remarkably improves the robustness of visual tracking against severe and long-term occlusion.
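A sketch of the adaptive similarity idea follows, with fixed thresholds standing in for the on-line learned histogram classifier and illustrative class weights (both are assumptions of this sketch, not values from the thesis):

```python
import numpy as np

def weighted_l2_similarity(template, patch, bins=(0.1, 0.3), weights=(1.0, 0.5, 0.0)):
    """Per-pixel squared errors are categorized into three classes
    (small / medium / large, here by fixed thresholds), and each class
    contributes with its own weight, down-weighting pixels that are likely
    occluded or changed. Higher return value means more similar."""
    err = (np.asarray(template, float) - np.asarray(patch, float)) ** 2
    cls = np.digitize(err, bins)          # class 0, 1 or 2 per pixel
    w = np.asarray(weights)[cls]
    return -np.sum(w * err)

t = np.array([0.2, 0.5, 0.9, 0.4])
# Third pixel is grossly wrong (as under occlusion) but gets weight 0:
print(weighted_l2_similarity(t, t + np.array([0.01, 0.02, 0.9, 0.0])))
```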
APA, Harvard, Vancouver, ISO, and other styles
40

Leeds, Daniel Demeny. "Searching for the Visual Components of Object Perception." Research Showcase @ CMU, 2013. http://repository.cmu.edu/dissertations/313.

Full text
Abstract:
The nature of the visual properties used for object perception in mid- and high-level vision areas of the brain is poorly understood. Past studies have employed simplistic stimuli probing models limited in descriptive power and mathematical underpinnings. Unfortunately, the pursuit of more complex stimuli and properties requires searching through a wide, unknown space of models and of images. The difficulty of this pursuit is exacerbated in brain research by the limited number of stimulus responses that can be collected for a given human subject over the course of an experiment. To more quickly identify complex visual features underlying cortical object perception, I develop, test, and use a novel method in which stimuli for use in the ongoing study are selected in real time based on fMRI-measured cortical responses to recently selected and displayed stimuli. A variation of the simplex method controls this ongoing selection as part of a search in visual space for images producing maximal activity, measured in real time, in a pre-determined 1 cm³ brain region. I probe cortical selectivities during this search using photographs of real-world objects and synthetic "Fribble" objects. Real-world objects are used to understand perception of naturally occurring visual properties. These objects are characterized by feature descriptors computed from the scale invariant feature transform (SIFT), a popular computer vision method that is well established in its utility for aiding computer object recognition and that I recently found to account for intermediate-level representations in the visual object processing pathway in the brain. Fribble objects are used to study object perception in an arena in which visual properties are well defined a priori. They are constructed from multiple well-defined shapes, and variation of each of these component shapes produces a clear space of visual stimuli. I study the behavior of my novel real-time fMRI search method to assess its value in the investigation of cortical visual perception, and I study the complex visual properties my method identifies as highly activating selected brain regions in the visual object processing pathway. While further technical and biological challenges remain, my method uncovers reliable and interesting cortical properties for most subjects, though only for selected searches performed for each subject. I identify brain regions selective for holistic and component object shapes and for varying surface properties, providing examples of more precise selectivities within classes of visual properties previously associated with cortical object representation. I also find examples of "surround suppression," in which cortical activity is inhibited upon viewing stimuli deviating slightly from the visual properties preferred by a brain region, expanding on similar observations at lower levels of vision.
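The realtime selection can be pictured as a simplex search in a visual feature space. A minimal sketch of the basic reflection move follows, with the fMRI activation replaced by a placeholder response vector; the feature space, step size and response values are assumptions of this sketch:

```python
import numpy as np

def next_stimulus(simplex_feats, responses, alpha=1.0):
    """One reflection step of a simplex search for response-maximizing
    stimuli: replace the worst-responding vertex with its reflection through
    the centroid of the remaining vertices. In the experiment, `responses`
    would be real-time fMRI activations and each vertex a displayed image's
    position in a visual feature space."""
    feats = np.asarray(simplex_feats, float)
    worst = int(np.argmin(responses))
    centroid = np.delete(feats, worst, axis=0).mean(axis=0)
    proposal = centroid + alpha * (centroid - feats[worst])
    return worst, proposal  # display the image nearest `proposal` next

simplex = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(next_stimulus(simplex, responses=[0.2, 0.9, 0.7]))  # reflects vertex 0 to (1, 1)
```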
APA, Harvard, Vancouver, ISO, and other styles
41

Lovell, Kylie Sarah. "Implicit and explicit processes in visual object recognition." Thesis, University of Reading, 2002. http://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.430835.

Full text
APA, Harvard, Vancouver, ISO, and other styles
42

Sudderth, Erik B. (Erik Blaine) 1977. "Graphical models for visual object recognition and tracking." Thesis, Massachusetts Institute of Technology, 2006. http://hdl.handle.net/1721.1/34023.

Full text
Abstract:
Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2006.
We develop statistical methods which allow effective visual detection, categorization, and tracking of objects in complex scenes. Such computer vision systems must be robust to wide variations in object appearance, the often small size of training databases, and ambiguities induced by articulated or partially occluded objects. Graphical models provide a powerful framework for encoding the statistical structure of visual scenes, and developing corresponding learning and inference algorithms. In this thesis, we describe several models which integrate graphical representations with nonparametric statistical methods. This approach leads to inference algorithms which tractably recover high-dimensional, continuous object pose variations, and learning procedures which transfer knowledge among related recognition tasks. Motivated by visual tracking problems, we first develop a nonparametric extension of the belief propagation (BP) algorithm. Using Monte Carlo methods, we provide general procedures for recursively updating particle-based approximations of continuous sufficient statistics. Efficient multiscale sampling methods then allow this nonparametric BP algorithm to be flexibly adapted to many different applications. As a particular example, we consider a graphical model describing the hand's three-dimensional (3D) structure, kinematics, and dynamics. This graph encodes global hand pose via the 3D position and orientation of several rigid components, and thus exposes local structure in a high-dimensional articulated model. Applying nonparametric BP, we recover a hand tracking algorithm which is robust to outliers and local visual ambiguities. Via a set of latent occupancy masks, we also extend our approach to consistently infer occlusion events in a distributed fashion. In the second half of this thesis, we develop methods for learning hierarchical models of objects, the parts composing them, and the scenes surrounding them. Our approach couples topic models originally developed for text analysis with spatial transformations, and thus consistently accounts for geometric constraints. By building integrated scene models, we may discover contextual relationships, and better exploit partially labeled training images. We first consider images of isolated objects, and show that sharing parts among object categories improves accuracy when learning from few examples. Turning to multiple object scenes, we propose nonparametric models which use Dirichlet processes to automatically learn the number of parts underlying each object category, and objects composing each scene. Adapting these transformed Dirichlet processes to images taken with a binocular stereo camera, we learn integrated, 3D models of object geometry and appearance. This leads to a Monte Carlo algorithm which automatically infers 3D scene structure from the predictable geometry of known object categories.
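A drastically simplified, one-dimensional sketch of the particle-based message update behind nonparametric BP (the thesis works with high-dimensional pose, multiscale samplers, and general potentials; the Gaussian pairwise potential here is an assumption) might read:

```python
import numpy as np

def nbp_message(particles, weights, pair_sigma, local_lik, n_out=200, seed=0):
    """One Monte Carlo belief-propagation message update over a continuous
    1-D state. Incoming message is a weighted sample set; particles are
    resampled, propagated through the pairwise potential
    psi(x_s, x_t) = N(x_t; x_s, sigma^2), then reweighted by the target
    node's local evidence."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(particles), size=n_out, p=weights / np.sum(weights))
    out = np.asarray(particles)[idx] + rng.normal(0.0, pair_sigma, n_out)
    w = local_lik(out)
    return out, w / w.sum()

# Incoming message centred at 0; evidence at the target node favours 2.0.
msg = np.linspace(-1, 1, 50)
out, w = nbp_message(msg, np.ones(50), pair_sigma=1.0,
                     local_lik=lambda x: np.exp(-0.5 * (x - 2.0) ** 2))
print(round(float(np.sum(w * out)), 2))  # weighted mean pulled toward 2
```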
APA, Harvard, Vancouver, ISO, and other styles
43

Kuo, Michael. "Learning visual object categories from few training examples." Thesis, Massachusetts Institute of Technology, 2011. http://hdl.handle.net/1721.1/66430.

Full text
Abstract:
Thesis (M. Eng.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2011.
During visual perception of complex objects, humans fixate on salient regions of a particular object, moving their gaze from one region to another in order to gain information about that object. The Bayesian Integrate and Shift (BIAS) model is a recently proposed model for learning visual object categories that is modeled after the process of human visual perception, integrating information from within and across fixations. Previous works have described preliminary evaluations of the BIAS model and demonstrated that it can learn new object categories from only a few examples. In this thesis, we introduce and evaluate improvements to the learning algorithm, demonstrate that the model benefits from using information from fixating on multiple regions of a particular object, evaluate the limitations of the model when learning different object categories, and assess the performance of the learning algorithm when objects are partially occluded.
APA, Harvard, Vancouver, ISO, and other styles
44

Sun, Yaoru. "Hierarchical object-based visual attention for machine vision." Thesis, University of Edinburgh, 2003. http://hdl.handle.net/1842/316.

Full text
Abstract:
Human vision uses mechanisms of covert attention to selectively process interesting information and overt eye movements to extend this selective ability. Thus, visual tasks can be dealt with effectively by limited processing resources. Modelling visual attention for machine vision systems is not only critical but also challenging. In the machine vision literature many conventional attention models have been developed, but they are all space-based only and cannot perform object-based selection. In consequence, they fail to work in real-world visual environments due to the intrinsic limitations of the space-based attention theory upon which these models are built. The aim of the work presented in this thesis is to provide a novel human-like visual selection framework based on the object-based attention theory recently developed in psychophysics. The proposed solution, a Hierarchical Object-based Attention Framework (HOAF) based on grouping competition, consists of two closely coupled visual selection models: (1) hierarchical object-based visual (covert) attention and (2) object-based attention-driven (overt) saccadic eye movements. The Hierarchical Object-based Attention Model (HOAM) is the primary selection mechanism and the Object-based Attention-Driven Saccading model (OADS) has a supporting role; both are combined in the integrated visual selection framework HOAF. This thesis first describes the proposed object-based attention model HOAM, the primary component of the selection framework HOAF. The model is based on recent psychophysical results on object-based visual attention and adopts grouping-based competition to integrate object-based and space-based attention, so as to achieve object-based hierarchical selectivity. The behaviour of the model is demonstrated on a number of synthetic images simulating psychophysical experiments, as well as on real-world natural scenes. The experimental results showed that the performance of our object-based attention model HOAM concurs with the main findings in the psychophysical literature on object-based and space-based visual attention. Moreover, HOAM has outstanding hierarchical selectivity from far to near and from coarse to fine by features, objects, spatial regions, and their groupings in complex natural scenes. This successful performance arises from three original mechanisms in the model: grouping-based saliency evaluation, integrated competition between groupings, and hierarchical selectivity. The model is the first implemented machine vision model of integrated object-based and space-based visual attention. The thesis then addresses another proposed model, of Object-based Attention-Driven Saccadic eye movements (OADS), built upon the object-based attention model HOAM as an overt saccading component within the object-based selection framework HOAF. This model, like our object-based attention model HOAM, is also the first implemented machine vision saccading model which makes a clear distinction between (covert) visual attention and overt saccading movements in a two-level selection system, an important feature of human vision not yet explored in conventional machine vision saccading systems. In the saccading model OADS, a log-polar retina-like sensor is employed to simulate human-like foveation imaging for space-variant sensing. Through a novel mechanism for attention-driven orienting, the sensor fixates on new destinations determined by object-based attention. Hence it helps attention to selectively process interesting objects located at the periphery of the whole field of view to accomplish large-scale visual selection tasks. By another proposed novel mechanism for temporary inhibition of return, OADS can simulate the human saccading/attention behaviour of refixating/reattending interesting objects for further detailed inspection. This thesis concludes that the proposed human-like visual selection solution, HOAF, which is inspired by psychophysical object-based attention theory and grouping-based competition, is particularly useful for machine vision. HOAF is a general and effective visual selection framework integrating object-based attention and attention-driven saccadic eye movements with biological plausibility and object-based hierarchical selectivity from coarse to fine in a space-time context.
APA, Harvard, Vancouver, ISO, and other styles
45

Peterson, Jason W. "Visual assessment of object color chroma and colorfulness /." Online version of thesis, 1994. http://hdl.handle.net/1850/11868.

Full text
APA, Harvard, Vancouver, ISO, and other styles
46

Wallenberg, Marcus. "Components of Embodied Visual Object Recognition : Object Perception and Learning on a Robotic Platform." Licentiate thesis, Linköpings universitet, Datorseende, 2013. http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-93812.

Full text
Abstract:
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, and the implementation of the system itself. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. Finally, in order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. All of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied.
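The channel-coding combination of cues mentioned above can be sketched as soft-histogram encoding with overlapping cos-squared basis functions; the specific basis, pooling, and the concatenation step below are assumptions of this sketch, not the thesis's exact construction:

```python
import numpy as np

def channel_encode(x, centers, width=1.5):
    """Encode scalar values into overlapping cos^2 channels (a soft
    histogram). Each value activates a few neighbouring channels, which
    preserves more information than hard histogram binning."""
    d = np.abs(np.asarray(x)[:, None] - centers[None, :])
    resp = np.cos(np.pi * d / (2 * width)) ** 2
    resp[d >= width] = 0.0
    return resp.mean(axis=0)  # pooled channel vector for the region

centers = np.linspace(0, 1, 8)
color_ch = channel_encode(np.random.default_rng(0).random(500), centers)
depth_ch = channel_encode(np.random.default_rng(1).random(500), centers)
# One possible cue combination: concatenate per-cue channel vectors into a
# joint descriptor for the segmentation stage (an assumption of this sketch).
joint = np.concatenate([color_ch, depth_ch])
print(joint.shape)  # (16,)
```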
Embodied Visual Object Recognition
APA, Harvard, Vancouver, ISO, and other styles
47

Naha, Shujon. "Zero-shot Learning for Visual Recognition Problems." IEEE, 2015. http://hdl.handle.net/1993/31806.

Full text
Abstract:
In this thesis we discuss different aspects of zero-shot learning and propose solutions for three challenging visual recognition problems: 1) unknown object recognition from images 2) novel action recognition from videos and 3) unseen object segmentation. In all of these three problems, we have two different sets of classes, the “known classes”, which are used in the training phase and the “unknown classes” for which there is no training instance. Our proposed approach exploits the available semantic relationships between known and unknown object classes and use them to transfer the appearance models from known object classes to unknown object classes to recognize unknown objects. We also propose an approach to recognize novel actions from videos by learning a joint model that links videos and text. Finally, we present a ranking based approach for zero-shot object segmentation. We represent each unknown object class as a semantic ranking of all the known classes and use this semantic relationship to extend the segmentation model of known classes to segment unknown class objects.
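As a toy version of the ranking-based transfer, an unseen class can be scored by combining known-class classifier outputs weighted by semantic similarity; the class names and all numbers below are illustrative, not from the thesis:

```python
import numpy as np

def zero_shot_score(known_scores, semantic_sim):
    """Score an unseen class by transferring known-class classifier outputs,
    weighted by each known class's semantic similarity to the unseen class
    (a simple stand-in for the ranking-based transfer described above)."""
    s = np.asarray(semantic_sim, float)
    return float(np.dot(known_scores, s / s.sum()))

# Known classes: horse, cow, car. Hypothetical unseen class "zebra" is
# semantically close to horse, somewhat to cow, and not to car.
known_scores = np.array([0.8, 0.4, 0.1])  # classifier outputs on the image
print(zero_shot_score(known_scores, semantic_sim=[0.9, 0.5, 0.05]))
```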
APA, Harvard, Vancouver, ISO, and other styles
48

Corradi, Tadeo. "Integrating visual and tactile robotic perception." Thesis, University of Bath, 2018. https://ethos.bl.uk/OrderDetails.do?uin=uk.bl.ethos.761005.

Full text
Abstract:
The aim of this project is to enable robots to recognise objects and object categories by combining vision and touch. In this thesis, a novel inexpensive tactile sensor design is presented, together with a complete, probabilistic sensor-fusion model. The potential of the model is demonstrated in four areas: (i) shape recognition, where the sensor outperforms its most similar rival; (ii) single-touch object recognition, where state-of-the-art results are produced; (iii) visuo-tactile object recognition, demonstrating the benefits of multi-sensory object representations; and (iv) object classification, which has not been reported in the literature to date. Both the sensor design and the novel database were made available. Tactile data collection is performed by a robot. An extensive analysis of data encodings, data processing, and classification methods is presented. The conclusions reached are: (i) the inexpensive tactile sensor can be used for basic shape and object recognition; (ii) object recognition combining vision and touch in a probabilistic manner improves accuracy over either modality alone; (iii) when both vision and touch perform poorly independently, the proposed sensor-fusion model provides faster learning, i.e. fewer training samples are required to achieve similar accuracy; (iv) such a sensor-fusion model is more accurate than either modality alone when classifying unseen objects, as well as when recognising individual objects from amongst similar objects of the same class; (v) preliminary potential is identified for a real-life application, underwater object classification; and (vi) the sensor-fusion model provides improvements in classification even over award-winning deep-learning-based computer vision models.
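One standard way to realize such probabilistic fusion, offered here only as a hedged sketch rather than the thesis's exact model, is a naive-Bayes combination of per-modality class posteriors under a conditional-independence assumption:

```python
import numpy as np

def fuse_posteriors(p_vision, p_touch, prior=None):
    """Fuse per-modality class posteriors assuming the two sensors are
    conditionally independent given the object class. One copy of the class
    prior is divided out so it is not counted twice:
    p(c | v, t) is proportional to p(c | v) * p(c | t) / p(c)."""
    pv, pt = np.asarray(p_vision, float), np.asarray(p_touch, float)
    prior = np.ones_like(pv) / len(pv) if prior is None else np.asarray(prior)
    joint = pv * pt / prior
    return joint / joint.sum()

# Vision is unsure between mug and bowl; touch strongly suggests a handle.
print(fuse_posteriors(p_vision=[0.5, 0.45, 0.05], p_touch=[0.7, 0.2, 0.1]))
```

A practical virtue of this form is that when one modality is uninformative (near-uniform posterior), the fused result simply follows the other, which matches the behaviour the abstract reports when vision or touch performs poorly alone.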
APA, Harvard, Vancouver, ISO, and other styles
49

Zoccoli, Sandra L. "Object features and object recognition Semantic memory abilities during the normal aging process /." Ann Arbor, Mich. : ProQuest, 2007. http://gateway.proquest.com/openurl?url_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:dissertation&res_dat=xri:pqdiss&rft_dat=xri:pqdiss:3288933.

Full text
Abstract:
Thesis (Ph.D. in Psychology)--S.M.U., 2007.
Source: Dissertation Abstracts International, Volume: 68-11, Section: B, page: 7695. Adviser: Alan S. Brown.
APA, Harvard, Vancouver, ISO, and other styles
50

Eren, Kanat Selda. "Visual Object Representations: Effects Of Feature Frequency And Similarity." Phd thesis, METU, 2011. http://etd.lib.metu.edu.tr/upload/12613978/index.pdf.

Full text
Abstract:
The effects of feature frequency and similarity on object recognition have been examined through behavioral experiments, and a model of the formation of visual object representations and old/new recognition has been proposed. A number of experiments were conducted to test the hypothesis that the frequency and similarity of object features affect old/new responses to test stimuli in a later recognition task. In the first experiment, when the feature frequencies were controlled, there was a significant increase in the percentage of "old" responses for unstudied objects as the number of frequently repeated features (FRFs) on the object increased. In the second experiment, where all features had equal frequency, the similarity of test objects did not affect old/new responses. An evaluation of models of object recognition and categorization with respect to the experimental results showed that these models can only partially explain the findings. A comprehensive model of the formation of visual object representations and old/new recognition, called CDZ-VIS, built on the Convergence-Divergence Zone framework of Damasio (1989), has been proposed. According to this framework, co-occurring object features converge to upper-layer units in the hierarchical representation, which act as binding units. As more objects are displayed, frequent object features cause grouping of these binding units, which converge to upper binding units. The performance of the CDZ-VIS model on the feature frequency and similarity experiments of the present study was shown to be closer to that of the human participants than the performance of two models from the categorization literature.
APA, Harvard, Vancouver, ISO, and other styles