Dissertations / Theses on the topic 'Pore segmentation'

Consult the top 28 dissertations / theses for your research on the topic 'Pore segmentation.'

1

Ding, Nan. "3D Modeling of the Lamina Cribrosa in OCT Data." Electronic Thesis or Diss., Sorbonne université, 2024. http://www.theses.fr/2024SORUS148.

Abstract:
The lamina cribrosa (LC) is a 3D collagenous mesh in the optic nerve head that plays a crucial role in the mechanisms and diagnosis of glaucoma, the second leading cause of blindness in the world. The LC is composed of so-called "pores", namely axonal paths within the collagenous mesh, through which the axons pass to reach the brain. In vivo 3D observation of the LC pores is now possible thanks to advances in Optical Coherence Tomography (OCT) technology. In this study, we aim to automatically perform the 3D reconstruction of pore paths from OCT volumes, in order to study the remodeling of the lamina cribrosa during glaucoma and better understand this disease. The limited axial resolution of conventional OCT as well as the low signal-to-noise ratio (SNR) pose challenges for the robust characterization of axonal paths with enough reliability, knowing that it is difficult even for experts to identify the pores in a single en-face image. To this end, our first contribution introduces an innovative method to register and fuse two orthogonal 3D OCT volumes in order to enhance the pores. This is, to our knowledge, the first time that orthogonal OCT volumes are jointly exploited to achieve better image quality. Experimental results demonstrate that our algorithm is robust and leads to accurate alignment. Our second contribution presents a context-aware attention U-Net, a deep learning approach using partial point annotations for accurate pore segmentation in every 2D en-face image. This work is also, to the best of our knowledge, the first attempt to address the LC pore reconstruction problem using deep learning methods. Through a comparative analysis with other state-of-the-art methods, we demonstrate the superior performance of the proposed approach. Our robust and accurate pore registration and segmentation methods provide a solid foundation for the 3D reconstruction of axonal pathways, our third contribution. We propose a pore tracking method based on a locally applied parametric active contour algorithm. Our model integrates the characteristics of low intensity and regularity of pores. Combined with the 2D segmentation maps, it enables us to reconstruct the axonal paths in 3D plane by plane. These results pave the way for the calculation of biomarkers characterizing the LC and facilitate medical interpretation.
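The plane-by-plane reconstruction described above lends itself to a simple linking scheme. Below is a minimal illustrative sketch, not the thesis's algorithm (which uses locally applied parametric active contours): pore centroids from each 2D en-face segmentation map are chained across slices by nearest-neighbour matching under a smoothness threshold. All names and thresholds are assumptions.

```python
# Hypothetical sketch: link pore centroids across consecutive en-face
# slices to form 3D axonal paths, assuming binary 2D segmentation maps.
import numpy as np
from scipy import ndimage

def pore_centroids(mask):
    """Centroids (row, col) of connected pore regions in one 2D mask."""
    labels, n = ndimage.label(mask)
    return np.array(ndimage.center_of_mass(mask, labels, range(1, n + 1)))

def link_paths(masks, max_jump=3.0):
    """Greedy nearest-neighbour linking of centroids between slices."""
    paths = [[tuple(c)] for c in pore_centroids(masks[0])]
    for z in range(1, len(masks)):
        cents = pore_centroids(masks[z])
        if len(cents) == 0:
            continue
        taken = set()
        for path in paths:
            prev = np.array(path[-1])
            d = np.linalg.norm(cents - prev, axis=1)
            j = int(np.argmin(d))
            if d[j] < max_jump and j not in taken:  # smoothness constraint
                path.append(tuple(cents[j]))
                taken.add(j)
    return paths
```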
2

Wagh, Ameya Yatindra. "A Deep 3D Object Pose Estimation Framework for Robots with RGB-D Sensors." Digital WPI, 2019. https://digitalcommons.wpi.edu/etd-theses/1287.

Abstract:
The task of object detection and pose estimation has widely been addressed using template matching techniques. However, these algorithms are sensitive to outliers and occlusions, and have high latency due to their iterative nature. Recent research in computer vision and deep learning has shown great improvements in the robustness of these algorithms. However, one of the major drawbacks of these algorithms is that they are specific to the objects. Moreover, the estimation of pose depends significantly on RGB image features. As these algorithms are trained on meticulously labeled large datasets with ground-truth object poses, it is difficult to re-train them for real-world applications. To overcome this problem, we propose a two-stage pipeline of convolutional neural networks which uses RGB images to localize objects in 2D space and depth images to estimate a 6DoF pose. Thus, the pose estimation network learns only the geometric features of the object and is not biased by its color features. We evaluate the performance of this framework on the LINEMOD dataset, which is widely used to benchmark object pose estimation frameworks, and find the results comparable with state-of-the-art algorithms using RGB-D images. Secondly, to show the transferability of the proposed pipeline, we implement it on the ATLAS robot for a pick-and-place experiment. As the distributions of the LINEMOD images and of the images captured by the MultiSense sensor on ATLAS differ, we generate a synthetic dataset from very few real-world images captured by the MultiSense sensor. We use this dataset to train just the object detection networks used in the ATLAS robot experiment.
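As a rough illustration of the two-stage idea described in this abstract, the sketch below assumes a pre-trained 2D detector returning a bounding box from the RGB image and a small depth-only CNN regressing translation plus a unit quaternion. The architecture and function names are placeholders, not the thesis's actual networks.

```python
# Illustrative skeleton of the two-stage idea: an RGB detector gives a
# 2D box; a depth-only CNN regresses a 6DoF pose from the depth crop.
import torch
import torch.nn as nn

class DepthPoseNet(nn.Module):
    """Regresses translation (3) + rotation as a unit quaternion (4)."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
        )
        self.head = nn.Linear(64 * 4 * 4, 7)

    def forward(self, depth_crop):
        x = self.features(depth_crop).flatten(1)
        t_q = self.head(x)
        t, q = t_q[:, :3], t_q[:, 3:]
        q = q / q.norm(dim=1, keepdim=True)  # normalise to a unit quaternion
        return t, q

def estimate_pose(rgb, depth, detector, pose_net):
    """detector is an assumed callable returning (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = detector(rgb)            # stage 1: 2D box from RGB only
    crop = depth[None, None, y0:y1, x0:x1]    # stage 2: depth-only pose input
    return pose_net(torch.as_tensor(crop, dtype=torch.float32))
```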
3

Seguin, Guillaume. "Analyse des personnes dans les films stéréoscopiques." Thesis, Paris Sciences et Lettres (ComUE), 2016. http://www.theses.fr/2016PSLEE021/document.

Abstract:
People are at the center of many computer vision tasks, such as surveillance systems or self-driving cars. They are also at the center of most visual contents, potentially providing very large datasets for training models and algorithms. While stereoscopic data have long been studied, it is only recently that feature-length stereoscopic ("3D") movies became widely available. In this thesis, we study how we can exploit the additional information provided by 3D movies for person analysis. We first explore how to extract a notion of depth from stereo movies in the form of disparity maps. We then evaluate how person detection and human pose estimation methods perform on such data. Leveraging the relative ease of the person detection task in 3D movies, we develop a method to automatically harvest examples of persons in 3D movies and train a person detector for standard color movies. We then focus on the task of segmenting multiple people in videos. We first propose a method to segment multiple people in 3D videos by combining cues derived from pose estimates with ones derived from disparity maps. We formulate the segmentation problem as a multi-label Conditional Random Field problem, and our method integrates an occlusion model to produce a layered, multi-instance segmentation. After showing the effectiveness of this approach as well as its limitations, we propose a second model which relies only on tracks of person detections and not on pose estimates. We formulate our problem as a convex optimization one, with the minimization of a quadratic cost under linear equality or inequality constraints. These constraints weakly encode the localization information provided by person detections. This method does not explicitly require pose estimates or disparity maps but can integrate these additional cues. Our method can also be used for segmenting instances of other object classes from videos. We evaluate all these aspects and demonstrate the superior performance of this new method.
4

Madadi, Meysam. "Human segmentation, pose estimation and applications." Doctoral thesis, Universitat Autònoma de Barcelona, 2017. http://hdl.handle.net/10803/457900.

Abstract:
Automatically analyzing humans in photographs or videos has great potential applications in computer vision, including medical diagnosis, sports, entertainment, movie editing and surveillance, just to name a few. The body, face and hand are the most studied components of humans. The body has many variabilities in shape and clothing, along with high degrees of freedom in pose. The face has many muscles, causing many visible deformations, besides variable shape and hair style. The hand is a small object that moves fast and has high degrees of freedom. Adding human characteristics to all the aforementioned variabilities makes human analysis quite a challenging task. In this thesis, we developed human segmentation in different modalities. In a first scenario, we segmented the human body and hand in depth images using example-based shape warping. We developed a shape descriptor based on shape context and class probabilities of shape regions to extract nearest neighbors. We then considered rigid affine alignment vs. non-rigid iterative shape warping. In a second scenario, we segmented the face in RGB images using convolutional neural networks (CNN). We modeled a conditional random field with recurrent neural networks. In our model, pairwise kernels are not fixed but learned during training. We trained the network end-to-end using adversarial networks, which improved hair segmentation by a high margin. We also worked on 3D hand pose estimation in depth images. In a generative approach, we fitted a finger model separately for each finger based on our example-based rigid hand segmentation. We minimized an energy function based on overlapping area, depth discrepancy and finger collisions. We also applied linear models in joint trajectory space to refine occluded joints based on the error of visible joints and the trajectory smoothness of invisible joints. In a CNN-based approach, we developed a tree-structured network to train specific features for each finger and fused them for global pose consistency. We also formulated physical and appearance constraints as loss functions. Finally, we developed a number of applications consisting of human soft-biometrics measurement and garment retexturing. We also generated several datasets in this thesis, covering human segmentation, synthetic hand pose, garment retexturing and Italian gestures.
5

Chen, Daniel Chien Yu. "Image segmentation and pose estimation of humans in video." Thesis, Queensland University of Technology, 2014. https://eprints.qut.edu.au/66230/1/Daniel_Chen_Thesis.pdf.

Abstract:
This thesis introduces improved techniques for automatically estimating the pose of humans from video. It examines a complete workflow for estimating pose, from the segmentation of the raw video stream to extract silhouettes, to using the silhouettes to determine the relative orientation of parts of the human body. The proposed segmentation algorithms have improved performance and reduced complexity, while the pose estimation shows superior accuracy in difficult cases of self-occlusion.
6

Sandhu, Romeil Singh. "Statistical methods for 2D image segmentation and 3D pose estimation." Diss., Georgia Institute of Technology, 2010. http://hdl.handle.net/1853/37245.

Abstract:
The field of computer vision focuses on developing techniques to exploit and extract information from underlying data that may represent images or other multidimensional data. In particular, two well-studied problems in computer vision are the fundamental tasks of 2D image segmentation and 3D pose estimation from a 2D scene. In this thesis, we first introduce two novel methodologies that solve 2D image segmentation and 3D pose estimation independently of each other. Then, by leveraging the advantages of certain techniques from each problem, we couple both tasks in a variational and non-rigid manner through a single energy functional. Thus, the three theoretical components and contributions of this thesis are as follows: Firstly, a new distribution metric for 2D image segmentation is introduced. This is employed within the geometric active contour (GAC) framework. Secondly, a novel particle filtering approach is proposed for the problem of estimating the pose of two point sets that differ by a rigid body transformation. Thirdly, the two techniques of image segmentation and pose estimation are coupled in a single energy functional for a class of 3D rigid objects. After laying the groundwork and presenting these contributions, we then turn to their applicability to real-world problems such as visual tracking. In particular, we present an example where we develop a novel tracking scheme for 3-D Laser RADAR imagery. We should mention, however, that the proposed contributions are solutions for general imaging problems and can therefore be applied to medical imaging problems such as extracting the prostate from MRI imagery.
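For readers unfamiliar with such coupled formulations, a generic energy functional over a contour C and a rigid pose g might look like the following. This is an illustration of the formulation type, not the thesis's exact functional: a region-based segmentation term plus a term tying the contour to the projection of a 3D model under the pose.

```latex
% Illustrative coupled energy (a sketch, not the thesis's exact functional):
% r_in, r_out are region statistics, \pi is the camera projection, M the
% 3D model, and g ranges over rigid transformations in SE(3).
\begin{equation}
E(C, g) = \int_{\text{in}(C)} r_{\text{in}}(x)\,dx
        + \int_{\text{out}(C)} r_{\text{out}}(x)\,dx
        + \lambda \oint_{C} \operatorname{dist}^{2}\!\left(x,\ \pi\big(g \cdot M\big)\right) ds
\end{equation}
```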
7

Delerue, Jean-François. "Segmentation 3D, application à l'extraction de réseaux de pores et à la caractérisation hydrodynamique des sols." Paris 11, 2001. http://www.theses.fr/2001PA112141.

Abstract:
Soil, and porous materials in general, can be seen as the union of two parts: the solid part, made up of different materials (clay, rock, etc.), and the void part (the pore space) through which fluids can flow. Precise knowledge of the 3D structure of the void part should allow a better understanding of flow phenomena, and even prediction of the hydraulic properties of these materials. Recent progress in image acquisition, notably X-ray tomography, has made volumetric images of soil increasingly accessible. My work consisted in adapting existing image analysis algorithms, and developing new ones, in order to describe the structures of the void parts in volumetric soil images. To carry out this description, I propose several original algorithms: an algorithm for computing Voronoi diagrams on a discrete space, a skeletonization algorithm based on the selection of Voronoi boundary points, and a region-growing segmentation algorithm using geodesic-type distances. These algorithms form a pipeline which, applied to an arbitrary object, decomposes it according to local size criteria. In the case of a soil image, the void part of the soil is segmented into regions corresponding to pores (elementary parts of the pore space with homogeneous aperture) and a pore network is created. From this network it is possible, by analogy with electrical networks, to compute the equivalent hydraulic conductivity for the studied image. More generally, I propose a set of procedures allowing, among other things, the simulation of fluid intrusion and extrusion processes in the pore space, the simulation of mercury porosimetry, and the computation of aperture distributions. Although designed for the study of soils, this 3D imaging work could be applied to other fields.
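The electrical-network analogy mentioned above reduces to solving Kirchhoff's node equations on the pore network. The sketch below is a minimal illustration under assumed inputs (a list of pore-to-pore throat conductances), not the thesis's implementation: fixing a unit pressure drop between inlet and outlet, the total inlet flow equals the equivalent conductance.

```python
# Hedged sketch of the electrical-network analogy: pores are nodes,
# throats are conductances; solving Kirchhoff's node equations for the
# pressures gives the equivalent conductance of the sample.
import numpy as np

def equivalent_conductance(n_nodes, throats, inlet, outlet):
    """throats: list of (i, j, g) pore-to-pore conductances."""
    G = np.zeros((n_nodes, n_nodes))
    for i, j, g in throats:                 # assemble the network Laplacian
        G[i, i] += g; G[j, j] += g
        G[i, j] -= g; G[j, i] -= g
    p = np.zeros(n_nodes)
    p[inlet] = 1.0                          # unit pressure drop inlet->outlet
    free = [k for k in range(n_nodes) if k not in (inlet, outlet)]
    A = G[np.ix_(free, free)]
    b = -G[np.ix_(free, [inlet])].ravel() * p[inlet]
    p[free] = np.linalg.solve(A, b)         # interior pore pressures
    # Total flow leaving the inlet equals the equivalent conductance.
    return float(G[inlet] @ p)

# Two throats of conductance 2.0 in series -> equivalent conductance 1.0.
print(equivalent_conductance(3, [(0, 1, 2.0), (1, 2, 2.0)], 0, 2))
```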
8

Hewa, Thondilege Akila Sachinthani Pemasiri. "Multimodal Image Correspondence." Thesis, Queensland University of Technology, 2022. https://eprints.qut.edu.au/235433/1/Akila%2BHewa%2BThondilege%2BThesis%281%29.pdf.

Abstract:
Multimodal images are used across many application areas, including medicine and surveillance. Due to the different characteristics of different imaging modalities, developing image processing algorithms for multimodal images is challenging. This thesis proposes effective solutions for the challenging problem of multimodal semantic correspondence, where the connections between similar components are established across images from different modalities. The proposed methods, which are based on deep learning techniques, have been applied to several applications including epilepsy type classification and 3D reconstruction of the human hand from visible and X-ray images. The proposed algorithms can be adapted to many other imaging modalities.
9

Calzavara, Ivan. "Human pose augmentation for facilitating Violence Detection in videos: a combination of the deep learning methods DensePose and VioNet." Thesis, Mittuniversitetet, Institutionen för informationssystem och –teknologi, 2020. http://urn.kb.se/resolve?urn=urn:nbn:se:miun:diva-40842.

Abstract:
In recent years, deep learning, a critical technology in computer vision, has achieved remarkable milestones in many fields, such as image classification and object detection. In particular, it has also been introduced to address the problem of violence detection, which is a big challenge considering the complexity of establishing an exact definition for the phenomenon of violence. Thanks to the ever increasing development of new technologies for surveillance, we nowadays have access to an enormous database of videos that can be analyzed to find any abnormal behavior. However, with such a huge amount of data it is unrealistic to examine all of it manually. Deep learning techniques, instead, can automatically study, learn and perform classification operations. In the context of violence detection, with the extraction of visually harmful patterns, it is possible to design various descriptors to represent features that can identify them. In this research we tackle the task of generating new augmented datasets in order to simplify the identification step performed by a violence detection technique in the field of deep learning. The novelty of this work is to introduce the usage of the DensePose model to enrich the images in a dataset by highlighting (i.e. by identifying and segmenting) all the human beings present in them. With this approach we gained knowledge of how this algorithm performs on videos with a violent context and how the violence detection network benefits from this procedure. Performance has been evaluated in terms of segmentation accuracy and efficiency of the violence detection network, as well as from the computational point of view. Results show that the context of the scene is the major indicator that leads the DensePose model to correctly segment human beings, and that the context of violence does not seem to be the most suitable field for the application of this model, since the common overlap of bodies (a distinctive aspect of violence) acts as a disadvantage for the segmentation. For this reason, the violence detection network does not exploit its full potential. Finally, we show how such augmented datasets can boost the training speed by reducing the time needed for the weights-update phase, making this procedure a helpful add-on for implementations in different contexts where the identification of human beings plays the major role.
10

Karabagli, Bilal. "Vérification automatique des montages d'usinage par vision : application à la sécurisation de l'usinage." Phd thesis, Université Toulouse le Mirail - Toulouse II, 2013. http://tel.archives-ouvertes.fr/tel-01018079.

Abstract:
The term "closed-door machining", frequently used by aeronautics and automotive SMEs, refers to the secure automation of the machining process for mechanical parts. In our work, we focus on the verification of the machining setup before launching the machining phase itself. We propose a contactless solution, based on monocular vision (a single camera), to automatically recognize the elements of the setup (raw workpiece, positioning pins, clamping rods, etc.) and to verify that their actual placement (done by the operator) conforms to the desired digital 3D setup model (CAD model), in order to prevent any risk of collision with the machining tool.
11

Goudie, Duncan. "Discriminative hand-object pose estimation from depth images using convolutional neural networks." Thesis, University of Manchester, 2018. https://www.research.manchester.ac.uk/portal/en/theses/discriminative-handobject-pose-estimation-from-depth-images-using-convolutional-neural-networks(f677870a-779f-460a-948d-10fc045e094c).html.

Abstract:
This thesis investigates the task of estimating the pose of a hand interacting with an object from a depth image. The main contribution of this thesis is the development of our discriminative one-shot hand-object pose estimation system. To the best of our knowledge, this is the first attempt at a one-shot hand-object pose estimation system. It is a two stage system consisting of convolutional neural networks. The first stage segments the object out of the hand from the depth image. This hand-minus-object depth image is combined with the original input depth image to form a 2-channel image for use in the second stage, pose estimation. We show that using this 2-channel image produces better pose estimation performance than a single stage pose estimation system taking just the input depth map as input. We also believe that we are amongst the first to research hand-object segmentation. We use fully convolutional neural networks to perform hand-object segmentation from a depth image. We show that this is a superior approach to random decision forests for this task. Datasets were created to train our hand-object pose estimator stage and hand-object segmentation stage. The hand-object pose labels were estimated semi-automatically with a combined manual annotation and generative approach. The segmentation labels were inferred automatically with colour thresholding. To the best of our knowledge, there were no public datasets for these two tasks when we were developing our system. These datasets have been or are in the process of being publicly released.
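The 2-channel input construction described above is straightforward; a minimal sketch (with an assumed object mask produced by the segmentation stage) might look like this:

```python
# Sketch of the 2-channel input: the raw depth map stacked with a
# "hand-minus-object" depth map derived from an assumed object mask.
import numpy as np

def make_two_channel(depth, object_mask):
    """Zero out object pixels to get the hand-minus-object channel."""
    hand_only = np.where(object_mask, 0.0, depth)
    return np.stack([depth, hand_only], axis=0)  # (2, H, W) for the pose CNN
```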
12

Lee, Jehoon. "Statistical and geometric methods for visual tracking with occlusion handling and target reacquisition." Diss., Georgia Institute of Technology, 2012. http://hdl.handle.net/1853/43582.

Abstract:
Computer vision is the science that studies how machines understand scenes and automatically make decisions based on meaningful information extracted from an image or multi-dimensional data of the scene, like human vision. One common and well-studied field of computer vision is visual tracking, a challenging and active research area in the computer vision community. Visual tracking is the task of continuously estimating the pose of an object of interest against the background in consecutive frames of an image sequence. It is a ubiquitous task and a fundamental technology of computer vision that provides low-level information used for high-level applications such as visual navigation, human-computer interaction, and surveillance systems. The focus of the research in this thesis is visual tracking and its applications. More specifically, the objective of this research is to design a reliable tracking algorithm for a deformable object that is robust to clutter and capable of occlusion handling and target reacquisition in realistic tracking scenarios, using statistical and geometric methods. To this end, the approaches developed in this thesis make extensive use of region-based active contours and particle filters in a variational framework. In addition, to deal with occlusion and target reacquisition problems, we exploit the benefits of coupling 2D and 3D information of an image and an object. In this thesis, first, we present an approach for tracking a moving object based on 3D range information in stereoscopic temporal imagery by combining particle filtering and geometric active contours. Range information is weighted by the proposed Gaussian weighting scheme to improve the segmentation achieved by active contours. In addition, this work presents an on-line shape learning method based on principal component analysis to reacquire track of an object in the event that it disappears from the field of view and reappears later. Second, we propose an approach to jointly track a rigid object in a 2D image sequence and to estimate its pose in 3D space. In this work, we take advantage of knowledge of a 3D model of an object and we employ particle filtering to generate and propagate the translation and rotation parameters in a decoupled manner. Moreover, to continuously track the object in the presence of occlusions, we propose an occlusion detection and handling scheme based on the control of the degree of dependence between predictions and measurements of the system. Third, we introduce a fast level-set based algorithm applicable to real-time applications. In this algorithm, a contour-based tracker is improved in terms of computational complexity and the tracker performs real-time curve evolution for detecting multiple windows. Lastly, we deal with rapid human motion in the context of object segmentation and visual tracking. Specifically, we introduce a model-free and marker-less approach for human body tracking based on a dynamic color model and geometric information of a human body from a monocular video sequence. The contributions of this thesis are summarized as follows: 1. A reliable algorithm to track deformable objects in a sequence consisting of 3D range data, combining particle filtering and statistics-based active contour models. 2. An effective handling scheme, based on the object's 2D shape information, for the challenging situations in which the tracked object disappears completely from the image domain during tracking. 3. A robust 2D-3D pose tracking algorithm using a 3D shape prior and particle filters on SE(3). 4. An occlusion handling scheme based on the degree of trust between predictions and measurements of the tracking system, which is controlled in an online fashion. 5. Fast level-set based active contour models applicable to real-time object detection. 6. A model-free and marker-less approach for tracking rapid human motion based on a dynamic color model and geometric information of a human body.
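Since particle filtering recurs throughout these contributions, the following generic sampling-importance-resampling skeleton may help fix ideas. The `alpha` parameter only crudely mimics the thesis's idea of adjusting trust between predictions and measurements under occlusion, and is an assumption.

```python
# Generic SIR particle-filter step; alpha -> 0 downweights measurements
# (illustrative of the prediction/measurement trade-off, not the
# thesis's exact occlusion-handling scheme).
import numpy as np

rng = np.random.default_rng(0)

def particle_filter_step(particles, weights, dynamics, likelihood, alpha=1.0):
    # Predict: propagate each particle through the motion model.
    particles = np.array([dynamics(p) for p in particles])
    # Update: weight by the measurement likelihood, tempered by alpha.
    weights = weights * likelihood(particles) ** alpha
    weights /= weights.sum()
    # Resample when the effective sample size collapses.
    if 1.0 / np.sum(weights ** 2) < 0.5 * len(weights):
        idx = rng.choice(len(particles), size=len(particles), p=weights)
        particles = particles[idx]
        weights = np.full(len(weights), 1.0 / len(weights))
    return particles, weights
```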
13

Kong, Longbo. "Accurate Joint Detection from Depth Videos towards Pose Analysis." Thesis, University of North Texas, 2018. https://digital.library.unt.edu/ark:/67531/metadc1157524/.

Abstract:
Joint detection is vital for characterizing human pose and serves as a foundation for a wide range of computer vision applications such as physical training, health care, and entertainment. This dissertation proposes two methods to detect joints in the human body for pose analysis. The first method detects joints by combining a body model and automatic feature point detection. The human body model maps the detected extreme points to the corresponding body parts of the model and detects the positions of implicit joints. The dominant joints are then detected, after the implicit joints and extreme points have been located, by a shortest-path based method. The main contribution of this work is a hybrid framework for detecting joints on the human body that is robust to different body shapes or proportions, pose variations and occlusions. Another contribution is the idea of using geodesic features of the human body to build a model for guiding human pose detection and estimation. The second method first segments the human body into parts and then detects joints by focusing the detection algorithm on each limb. The advantage of applying body part segmentation first is that it narrows down the search area for each joint, so that the joint detection method can provide more stable and accurate results.
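The geodesic idea above, finding extreme points such as the head, hands and feet as repeated farthest points on the body surface, can be sketched as follows; the graph construction and parameters are assumptions, not the dissertation's exact method.

```python
# Hedged sketch: extreme points as repeated farthest points under
# geodesic distance on a graph built over body surface points.
import numpy as np
from scipy.sparse.csgraph import dijkstra

def extreme_points(adjacency, start=0, k=5):
    """adjacency: sparse (n, n) matrix of edge lengths over surface points."""
    picked = [start]
    for _ in range(k):
        d = dijkstra(adjacency, directed=False, indices=picked).min(axis=0)
        d[~np.isfinite(d)] = -1.0            # ignore disconnected points
        picked.append(int(np.argmax(d)))     # farthest point = new extremity
    return picked[1:]                        # drop the arbitrary seed
```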
14

Aziz, Kheir Eddine. "Suivi multi-caméras de personnes dans un environnement contraint." Thesis, Aix-Marseille, 2012. http://www.theses.fr/2012AIXM4093.

Abstract:
Consumption is considered one of the simple forms of daily life. The evolution of modern society has led to an environment heavily loaded with objects, signs and interactions based on commercial transactions. Added to this phenomenon are the accelerating renewal of the available offer and purchasing power, which has become a growing concern for the majority of consumers, with price inflation a recurring subject. Given this complexity and these considerable economic stakes, the need to model consumer purchasing behavior in the various sectors of activity is an essential step for major economic actors and analysts. In 2008, the Cliris company launched a project on multi-camera tracking of customer trajectories. The project rests on the development of an automatic multi-stream analysis system based on multi-camera customer tracking. This system makes it possible to analyze customer traffic and paths in large retail areas. In this CIFRE thesis, we addressed the whole multi-camera people-tracking process, while emphasizing the applicative side of the problem, contributing answers to the following questions: 1. How to track an individual from a single-camera video stream while handling occlusions? 2. How to count people in dense areas? 3. How to recognize an individual at different points of the store from multi-camera video streams, and thus follow his path?
15

Blanc, Beyne Thibault. "Estimation de posture 3D à partir de données imprécises et incomplètes : application à l'analyse d'activité d'opérateurs humains dans un centre de tri." Thesis, Toulouse, INPT, 2020. http://www.theses.fr/2020INPT0106.

Abstract:
In the context of studying strenuousness and ergonomics at work for the prevention of musculoskeletal disorders, the company Ebhys wants to develop a tool for analyzing the activity of human operators in a waste sorting center by measuring ergonomic indicators. To cope with the uncontrolled environment of the sorting center, and to ease the acceptability of the device, these indicators are measured from depth images. An ergonomic study allows us to define the indicators to be measured: the zones of movement of the operator's hands and the angulations of certain joints of the upper body. They are therefore indicators that can be obtained from an analysis of the operator's 3D pose. The software for computing the indicators is thus composed of three parts: the first segments the operator from the rest of the scene to ease the 3D pose estimation, the second estimates the operator's 3D pose, and the third uses the operator's 3D pose to compute the ergonomic indicators. First of all, we propose an algorithm that extracts the operator from the rest of the depth image. To do this, we use a first automatic segmentation based on static background removal and selection of a moving element given its position and size. This first segmentation is used to train a neural network that improves the results. The network is trained using the segmentations obtained from the first automatic segmentation, from which the best quality samples are automatically selected during training. Next, we build a neural network model to estimate the operator's 3D pose. We propose a study that allows us to find a light and optimal model for 3D pose estimation on synthetic depth images, which we generate numerically. However, while this network gives outstanding performance on synthetic depth images, it is not directly applicable to the real depth images that we acquired in an industrial context. To overcome this issue, we finally build a module that transforms the synthetic depth images into more realistic ones. This image-to-image translation model modifies the style of the depth image without changing its content, keeping the 3D pose of the operator from the synthetic source image unchanged in the translated realistic depth frames. These more realistic depth images are then used to re-train the 3D pose estimation network, finally obtaining a convincing 3D pose estimation on depth images acquired in real conditions, from which the ergonomic indicators can be computed.
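The first automatic segmentation stage described above (static background removal plus selection of a moving blob by size; a position test could be added similarly) can be sketched roughly as follows; thresholds and names are illustrative.

```python
# Minimal sketch of the first segmentation stage: depth-based static
# background removal, then selecting one moving blob by its area.
import numpy as np
from scipy import ndimage

def segment_operator(depth, background, min_area=2000, depth_tol=0.05):
    moving = np.abs(depth - background) > depth_tol   # not the static scene
    labels, n = ndimage.label(moving)
    best, best_area = None, 0
    for lab in range(1, n + 1):
        area = int((labels == lab).sum())
        if area >= min_area and area > best_area:     # size criterion
            best, best_area = lab, area
    return labels == best if best is not None else np.zeros_like(moving)
```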
16

Cabras, Paolo. "3D Pose estimation of continuously deformable instruments in robotic endoscopic surgery." Thesis, Strasbourg, 2016. http://www.theses.fr/2016STRAD007/document.

Abstract:
Knowing the 3D position of robotized instruments can be useful in a surgical context, e.g., for their automatic control or for gesture guidance. We propose two methods to infer the 3D pose of a single bending section instrument equipped with colored markers, using only the images provided by the monocular camera embedded in the endoscope. A graph-based method is used to segment the markers. Their corners are extracted by detecting color transitions along Bézier curves fitted on edge points. These features are used to estimate the 3D pose of the instrument using an adaptive model that takes into account the mechanical play of the system. Since this method can be affected by model uncertainties, the image-to-3D-pose function can instead be learned from a training set. We opted for two techniques, which we improved: a Radial Basis Function network with Gaussian kernels and Locally Weighted Projection Regression. The proposed methods are validated on a robotic experimental cell and on in-vivo sequences.
17

Usher, Kane. "Visual homing for a car-like vehicle." Thesis, Queensland University of Technology, 2005. https://eprints.qut.edu.au/16309/1/Kane_Usher_Thesis.pdf.

Abstract:
This thesis addresses the pose stabilization of a car-like vehicle using omnidirectional visual feedback. The presented method allows a vehicle to servo to a pre-learnt target pose based on feature bearing angle and range discrepancies between the vehicle's current view of the environment and that seen at the learnt location. The best example of such a task is the use of visual feedback for autonomous parallel-parking of an automobile. Much of the existing work in pose stabilization is highly theoretical in nature with few examples of implementations on 'real' vehicles, let alone vehicles representative of those found in industry. The work in this thesis develops a suitable test platform and implements vision-based pose stabilization techniques. Many of the existing techniques were found to fail due to vehicle steering and velocity loop dynamics, and more significantly, with steering input saturation. A technique which does cope with the characteristics of 'real' vehicles is to divide the task into predefined stages, essentially dividing the state space into sub-manifolds. For a car-like vehicle, the strategy used is to stabilize the vehicle to the line which has the correct orientation and contains the target location. Once on the line, the vehicle then servos to the desired pose. This strategy can accommodate velocity and steering loop dynamics, and input saturation. It can also allow the use of linear control techniques for system analysis and tuning of control gains. To perform pose stabilization, good estimates of vehicle pose are required. A simple, yet robust, method derived from the visual homing literature is to sum the range vectors to all the landmarks in the workspace and divide by the total number of landmarks--the Improved Average Landmark Vector. By subtracting the IALV at the target location from the currently calculated IALV, an estimate of vehicle pose is obtained. In this work, views of the world are provided by an omnidirectional camera, while a magnetic compass provides a reference direction. The landmarks used are red road cones which are segmented from the omnidirectional colour images using a pre-learnt, two-dimensional lookup table of their colour profile. Range to each landmark is estimated using a model of the optics of the system, based on a flat-Earth assumption. A linked-list based method is used to filter the landmarks over time. Complementary filtering techniques, which combine the vision data with vehicle odometry, are used to improve the quality of the measurements.
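The Improved Average Landmark Vector computation described in this abstract is compact enough to sketch directly (assuming range vectors expressed in a compass-aligned frame):

```python
# Direct sketch of the IALV idea from the abstract: average the range
# vectors to all landmarks; the difference between the current and
# target IALVs estimates the pose offset to be driven to zero.
import numpy as np

def ialv(landmark_vectors):
    """landmark_vectors: (n, 2) range vectors in a compass-aligned frame."""
    return np.mean(landmark_vectors, axis=0)

def homing_vector(current_landmarks, target_ialv):
    return ialv(current_landmarks) - target_ialv  # servo this toward zero
```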
18

Simó, Serra Edgar. "Understanding human-centric images : from geometry to fashion." Doctoral thesis, Universitat Politècnica de Catalunya, 2015. http://hdl.handle.net/10803/327030.

Abstract:
Understanding humans from photographs has always been a fundamental goal of computer vision. Early works focused on simple tasks such as detecting the location of individuals by means of bounding boxes. As the field progressed, harder and higher-level tasks were undertaken. For example, from human detection came 2D and 3D human pose estimation, in which the task consists of identifying the location in the image or in space of all the different body parts, e.g., head, torso, knees, arms, etc. Human attributes also became a great source of interest, as they allow recognizing individuals and other properties such as gender or age. Later, attention turned to recognizing the action being performed, which in general relies on the previous work on pose estimation and attribute classification. Currently, even higher-level tasks are being conducted, such as predicting the motivations of human behavior or identifying the fashionability of an individual from a photograph. In this thesis we have developed a hierarchy of tools that covers this whole range of problems, from low-level feature point descriptors to high-level fashion-aware conditional random field models, all with the objective of understanding humans from monocular RGB images. In order to build these high-level models it is paramount to have a battery of robust and reliable low- and mid-level cues. Along these lines, we have proposed two low-level keypoint descriptors: one based on the theory of heat diffusion on images, and another that uses a convolutional neural network to learn discriminative image patch representations. We also introduce distinct low-level generative models for representing human pose: in particular, we present a discrete model based on a directed acyclic graph and a continuous model that consists of poses clustered on a Riemannian manifold. As mid-level cues we propose two 3D human pose estimation algorithms: one that estimates the 3D pose given a noisy 2D estimation, and an approach that simultaneously estimates both the 2D and 3D pose. Finally, we formulate higher-level models built upon low- and mid-level cues for human understanding. Concretely, we focus on two different tasks in the context of fashion: semantic segmentation of clothing, and predicting fashionability from images with metadata, to ultimately provide fashion advice to the user. In summary, to robustly extract knowledge from images with the presence of humans it is necessary to build high-level models that integrate low- and mid-level cues. In general, using and understanding strong features is critical for obtaining reliable performance. The main contribution of this thesis is in proposing a variety of low-, mid- and high-level algorithms for human-centric images that can be integrated into higher-level models for comprehending humans from photographs, as well as in tackling novel fashion-oriented problems.
APA, Harvard, Vancouver, ISO, and other styles
20

Ting, Chen-Kang, and 丁介棡. "Market Segmentation and Service Satisfaction of Electronic Commerce for Port of Kaohsiung." Thesis, 2001. http://ndltd.ncl.edu.tw/handle/01424498989465839392.

Full text
Abstract:
Master's thesis, National Cheng Kung University, Department of Transportation Management, ROC academic year 89 (2000-2001)
This study examines the market segmentation and service satisfaction of electronic commerce for the Port of Kaohsiung. First, the importance and satisfaction levels of the service attributes and of the selection criteria for electronic commerce were surveyed. Five factors were then extracted from the service attributes by factor analysis: harbor service, enquiry service, communication and support service, port information service, and information service. In addition, five selection-criteria factors were found: content, facilities, technological support, security, and response. Third, a cluster analysis was used to classify the ocean carriers into three groups, characterized as multiple- and technological-services-oriented firms, information- and content-oriented firms, and port-information-oriented firms. Finally, the results indicated that schedule search and on-line support were perceived as the most important service attributes and needed improvement in the current electronic commerce services. This framework may be a useful approach for port authorities developing strategies in the electronic commerce market.
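For readers unfamiliar with the two statistical steps the abstract names, a hedged sketch follows of how factor extraction and carrier clustering could be chained. The 5-factor/3-group settings come from the abstract, while the `ratings` matrix and the library choices are assumptions, not the study's actual procedure.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans

# Placeholder data: e.g., 60 carriers rating 20 service attributes on a 1-5 scale.
rng = np.random.default_rng(0)
ratings = rng.uniform(1, 5, size=(60, 20))

# Step 1: extract five latent service factors (as in the abstract).
factors = FactorAnalysis(n_components=5).fit_transform(ratings)

# Step 2: cluster carriers into three market segments on their factor scores.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(factors)
print(np.bincount(groups))  # carriers per segment
```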
APA, Harvard, Vancouver, ISO, and other styles
21

Kuo, Hung-chin, and 郭宏志. "Feature Detection in Random Scan Data Using Pole Method for Segmentation in Reverse Engineering." Thesis, 2006. http://ndltd.ncl.edu.tw/handle/21722477390367189164.

Full text
Abstract:
Master's thesis, National Chung Cheng University, Graduate Institute of Mechanical Engineering, ROC academic year 95 (2006-2007)
Scanned point data are usually converted to STL format for manipulation in reverse engineering. NURBS surfaces remain the main way of handling scanned data in conventional industry, and the results must be processed further, for instance by manually fitting curves to establish the NURBS surfaces. This requires considerable cost and labor, and differing operator experience can produce differing results. In this study we establish a preprocessing step for NURBS surface construction — segmentation — to reduce the errors of the conventional approach caused by differing human judgements. The research provides a segmentation procedure that precedes NURBS surface modeling, so that NURBS surfaces can be built directly on the segmented regions. The main idea is to use computational geometry to establish the relations between points and then extract the feature points. We then compute a minimum spanning tree (MST) over these feature points to form boundary lines. Finally, the different regions are identified using the properties of the B-rep solid model, achieving the segmentation. This process can reduce human error and simplify the conventional steps. It is expected to shorten the processing time of NURBS surface construction and, under a standardized process, to ensure product quality, which is of considerable benefit to industry.
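The MST step described above can be sketched concisely; this is a minimal illustration under the assumption that feature points have already been extracted as an (n, 3) array, not the thesis's implementation.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree

def boundary_edges(points):
    """Connect feature points with a Euclidean MST; its edges serve as boundary lines."""
    dist = squareform(pdist(points))           # dense pairwise distance matrix
    mst = minimum_spanning_tree(dist).tocoo()  # sparse MST with n-1 edges
    return list(zip(mst.row, mst.col))         # index pairs = candidate boundary segments

pts = np.random.rand(100, 3)                   # stand-in for extracted feature points
print(len(boundary_edges(pts)))                # 99 edges for 100 points
```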
APA, Harvard, Vancouver, ISO, and other styles
22

Xu, Changhai 1977. "Steps towards the object semantic hierarchy." Thesis, 2011. http://hdl.handle.net/2152/ETD-UT-2011-08-3797.

Full text
Abstract:
An intelligent robot must be able to perceive and reason robustly about its world in terms of objects, among other foundational concepts. The robot can draw on rich data for object perception from continuous sensory input, in contrast to the usual formulation that focuses on objects in isolated still images. Additionally, the robot needs multiple object representations to deal with different tasks and/or different classes of objects. We propose the Object Semantic Hierarchy (OSH), which consists of multiple representations with different ontologies. The OSH factors the problems of object perception so that intermediate states of knowledge about an object have natural representations, with relatively easy transitions from less structured to more structured representations. Each layer in the hierarchy builds an explanation of the sensory input stream, in terms of a stochastic model consisting of a deterministic model and an unexplained "noise" term. Each layer is constructed by identifying new invariants from the previous layer. In the final model, the scene is explained in terms of constant background and object models, and low-dimensional dynamic poses of the observer and objects. The OSH contains two types of layers: the Object Layers and the Model Layers. The Object Layers describe how the static background and each foreground object are individuated, and the Model Layers describe how the model for the static background or each foreground object evolves from less structured to more structured representations. Each object or background model contains the following layers: (1) 2D object in 2D space (2D2D): a set of constant 2D object views, and the time-variant 2D object poses, (2) 2D object in 3D space (2D3D): a collection of constant 2D components, with their individual time-variant 3D poses, and (3) 3D object in 3D space (3D3D): the same collection of constant 2D components but with invariant relations among their 3D poses, and the time-variant 3D pose of the object as a whole. In building 2D2D object models, a fundamental problem is to segment out foreground objects in the pixel-level sensory input from the background environment, where motion information is an important cue to perform the segmentation. Traditional approaches for moving object segmentation usually appeal to motion analysis on pure image information without exploiting the robot's motor signals. We observe, however, that the background motion (from the robot's egocentric view) has stronger correlation to the robot's motor signals than the motion of foreground objects. Based on this observation, we propose a novel approach to segmenting moving objects by learning homography and fundamental matrices from motor signals. In building 2D3D and 3D3D object models, estimating camera motion parameters plays a key role. We propose a novel method for camera motion estimation that takes advantage of both planar features and point features and fuses constraints from both homography and essential matrices in a single probabilistic framework. Using planar features greatly improves estimation accuracy over using point features only, and with the help of point features, the solution ambiguity from a planar feature is resolved. Compared to the two classic approaches that apply the constraint of either homography or essential matrix, the proposed method gives more accurate estimation results and avoids the drawbacks of the two approaches.
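The motor-signal-based segmentation idea lends itself to a short sketch: assuming a background homography `H_bg` has already been predicted from the robot's motor signals (the learning step itself is omitted), tracked points whose observed motion disagrees with the `H_bg` prediction are flagged as foreground. All names here are illustrative placeholders, not the thesis's implementation.

```python
import numpy as np

def foreground_mask(H_bg, pts_prev, pts_curr, tol=2.0):
    """pts_* are (n, 2) matched image points; True where motion disagrees with H_bg."""
    ones = np.ones((len(pts_prev), 1))
    proj = (H_bg @ np.hstack([pts_prev, ones]).T).T  # warp previous points by H_bg
    proj = proj[:, :2] / proj[:, 2:3]                # back to inhomogeneous coordinates
    residual = np.linalg.norm(proj - pts_curr, axis=1)
    return residual > tol  # badly explained by background motion => foreground
```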
APA, Harvard, Vancouver, ISO, and other styles
23

GARRO, Valeria. "Image localization and parsing using 3D structure." Doctoral thesis, 2013. http://hdl.handle.net/11562/533354.

Full text
Abstract:
The aim of this thesis is the study of two fundamental problems in computer vision: localization from images and semantic image segmentation. The first contribution of this thesis is the development of a complete system that obtains accurate and fast localization of a hand-held camera device, leveraging not only a dataset of registered images but also the three-dimensional information obtained from a Structure from Motion reconstruction. We exploit the 3D structure in two different ways: first, it is directly involved in the camera registration, making available robust 2D-3D correspondences instead of 2D-2D pairs of matched features; furthermore, we take advantage of the image clustering computed by the Structure from Motion algorithm during the retrieval step of the localization system, improving both the robustness and the efficiency of that algorithmic stage. The second part of the thesis consists of an in-depth analysis of one of the main components of the localization system, camera pose estimation from 2D-3D correspondences. In particular, we present a novel formulation of the Perspective-n-Point problem, also known as exterior orientation, as an instance of the anisotropic orthogonal Procrustes problem. The last contribution of the thesis is a new approach to semantic image segmentation in urban environments that deeply involves the Structure from Motion 3D structure, in terms of label transfer from a pre-labeled image to a query image. The query image can be either an image belonging to the SfM dataset that lacks semantic information, or an external image that has just been localized by the aforementioned localization system. The label assignment problem is modeled as a Markov random field whose nodes are the superpixels of the query image.
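To give a flavor of the Procrustes machinery involved, the following sketch solves the classic *isotropic* orthogonal Procrustes special case, recovering in closed form via SVD the rotation that best aligns two corresponding 3D point sets; the thesis's anisotropic formulation of PnP generalizes this. `A` and `B` are assumed (n, 3) corresponding point sets, not inputs from the actual system.

```python
import numpy as np

def procrustes_rotation(A, B):
    """Rotation R minimizing ||(B - mean) - R (A - mean)|| over corresponding 3D points."""
    A0, B0 = A - A.mean(axis=0), B - B.mean(axis=0)  # remove centroids
    U, _, Vt = np.linalg.svd(B0.T @ A0)              # SVD of the cross-covariance
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])  # guard against reflections
    return U @ D @ Vt
```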
APA, Harvard, Vancouver, ISO, and other styles
24

Schoeler, Markus. "Visual Perception of Objects and their Parts in Artificial Systems." Doctoral thesis, 2015. http://hdl.handle.net/11858/00-1735-0000-0023-9669-A.

Full text
APA, Harvard, Vancouver, ISO, and other styles
25

Patrão, Bruno André Santos. "Biblioteca para Desenvolvimento de Aplicações de Realidade Aumentada Baseada em Marcadores Binários." Master's thesis, 2011. http://hdl.handle.net/10316/99649.

Full text
Abstract:
Master's dissertation in Electrical and Computer Engineering (Visual Information Technologies), presented to the Faculty of Sciences and Technology of the University of Coimbra.
The present work explores the area of Augmented Reality and focuses on the integration of virtual objects into real environments in real time. The main objective is to develop a comprehensive, easy-to-use library to be integrated into OpenAR, enabling the creation of Augmented Reality applications. The key elements for interaction with this library are simple markers carrying a binary code, created to deliver information about the real world in real time. The processes that form the basis of this work are described and characterized, namely image binarization, marker detection, and the extraction and application of pose to virtual objects. The results of this work demonstrate the importance and utility of implementing an Augmented Reality system of this nature in different areas, such as human-computer interaction, entertainment, education, medicine/psychology, and industry. As future work, improvements in the visualization and realism of the virtual environments are proposed.
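A hedged sketch of the pipeline the abstract names (binarization, marker detection, pose extraction) is given below using OpenCV 4. It is not the OpenAR implementation: corner ordering and binary-code decoding are omitted, and the camera matrix `K` and marker size are assumed placeholders.

```python
import cv2
import numpy as np

def detect_marker_pose(gray, K, marker_size=0.05):
    # Image binarization (Otsu threshold on the grayscale frame).
    _, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # Marker detection: look for large quadrilateral contours.
    contours, _ = cv2.findContours(binary, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        quad = cv2.approxPolyDP(c, 0.03 * cv2.arcLength(c, True), True)
        if len(quad) == 4 and cv2.contourArea(quad) > 500:
            # Pose extraction: solve PnP for the marker's four corners.
            s = marker_size / 2.0
            obj = np.array([[-s, s, 0], [s, s, 0], [s, -s, 0], [-s, -s, 0]], np.float32)
            ok, rvec, tvec = cv2.solvePnP(obj, quad.reshape(4, 2).astype(np.float32), K, None)
            if ok:
                return rvec, tvec  # pose to apply to the virtual object
    return None
```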
APA, Harvard, Vancouver, ISO, and other styles
26

Félix, Inês Dinis. "Deep Learning for Markerless Surgical Navigation in Orthopedics." Master's thesis, 2020. http://hdl.handle.net/10316/92155.

Full text
Abstract:
Project work for the Integrated Master's in Biomedical Engineering, presented to the Faculty of Sciences and Technology.
Total Knee Arthroplasty (TKA) is a surgical procedure performed in patients suffering from knee arthritis. The correct positioning of the implants is strongly related to multiple surgical variables that have a tremendous impact on the success of the surgery. Computer-based navigation systems have been investigated and developed to assist the surgeon in accurately controlling those surgical variables. This thesis focuses on navigation for TKA and addresses two problems that many point out as fundamental for its broader acceptance. The first problem is that existing technologies are very costly and require additional bone incisions for fixing the markers to be tracked, which are usually bulky and interfere with the standard surgical flow. This work presents a markerless navigation system that supports the surgeon in accurately performing the TKA procedure. The proposed system uses a mobile RGB-D camera to replace existing optical tracking systems and does not require markers to be tracked. We combine an effective deep learning-based approach for accurately segmenting the bone surface with a robust geometry-based algorithm for registering the bones with pre-operative models. The favorable performance of our pipeline is achieved by (1) employing a semi-supervised labeling approach to generate training data from real TKA surgery data, (2) using effective data augmentation techniques to improve the generalization capability, and (3) using appropriate depth data. The construction of this complete markerless registration prototype, which generalizes to unseen intra-operative data, is non-obvious, and relevant insights and future research directions can be derived from it. The experimental results show encouraging performance for video-based TKA. The second problem is the lack of accuracy in localizing landmarks during image-free navigation, which can lead to significant errors in implant positioning. This thesis presents a proof-of-concept method that uses deep learning for the automatic detection of landmarks from visual input alone. The aim is to provide real-time suggestions to assist the surgeon in this task, which can be useful for decision making and for reducing variability. Experimental validation with one landmark shows that the method achieves reliable results, and extension to the remaining landmarks can be extrapolated.
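The geometry-based registration step can be illustrated with a bare-bones iterative closest point (ICP) loop that rigidly aligns the segmented intra-operative bone point cloud `src` to the pre-operative model `dst`. The thesis's robust algorithm is more elaborate; this is only a sketch under assumed (n, 3) array inputs.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp(src, dst, iters=30):
    """Rigidly align point cloud src to dst; returns rotation R and translation t."""
    tree = cKDTree(dst)
    R, t = np.eye(3), np.zeros(3)
    for _ in range(iters):
        moved = src @ R.T + t
        _, idx = tree.query(moved)  # nearest-neighbor correspondences
        A = moved - moved.mean(axis=0)
        B = dst[idx] - dst[idx].mean(axis=0)
        U, _, Vt = np.linalg.svd(B.T @ A)  # Kabsch: best rotation from cross-covariance
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))])
        R_step = U @ D @ Vt
        t_step = dst[idx].mean(axis=0) - moved.mean(axis=0) @ R_step.T
        R, t = R_step @ R, R_step @ t + t_step  # accumulate the incremental transform
    return R, t
```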
APA, Harvard, Vancouver, ISO, and other styles
27

Kundu, Jogendra Nath. "Self-Supervised Domain Adaptation Frameworks for Computer Vision Tasks." Thesis, 2022. https://etd.iisc.ac.in/handle/2005/5782.

Full text
Abstract:
There is a strong incentive to build intelligent machines that can understand and adapt to changes in the visual world without human supervision. While humans and animals learn to perceive the world on their own, almost all state-of-the-art vision systems rely heavily on external supervision from millions of manually annotated training examples. Gathering such large-scale manual annotations for structured vision tasks, such as monocular depth estimation, scene segmentation, and human pose estimation, faces several practical limitations. Usually, the annotations are gathered in two broad ways: 1) via specialized instruments (sensors) or laboratory setups, or 2) via manual annotation. Both processes have drawbacks: while human annotations are expensive, scarce, or error-prone, instrument-based annotations are often noisy or limited to specific laboratory environments. Such limitations not only stand as a major bottleneck in our efforts to gather unambiguous ground truth but also limit the diversity of the collected labeled datasets. This motivates us to develop innovative ways of utilizing synthetic environments to create labeled synthetic datasets with noise-free, unambiguous ground truths. However, the performance of models trained on such synthetic data degrades markedly when tested on real-world samples because of input distribution shift (a.k.a. domain shift). Unsupervised domain adaptation (DA) seeks learning techniques that can minimize the domain discrepancy between a labeled source and an unlabeled target, but it remains mostly unexplored for challenging structured-prediction vision tasks. Motivated by the above observations, my research focuses on addressing the following key aspects: (1) developing algorithms that support improved transferability under domain and task shifts, (2) leveraging inter-entity or cross-modal relationships to develop self-supervised objectives, and (3) instilling natural priors to constrain model outputs within the realm of natural distributions. First, we present AdaDepth, an unsupervised domain adaptation (DA) strategy for the pixel-wise regression task of monocular depth estimation. Mode collapse is a common phenomenon observed during adversarial training in the absence of paired supervision; without access to target depth maps, we address this challenge using a novel content-congruent regularization technique. In follow-up work, we introduced UM-Adapt, a unified framework that addresses two distinct objectives in a multi-task adaptation setting: a) achieving balanced performance across all tasks and b) performing domain adaptation in an unsupervised setting. This is realized using two novel regularization strategies: contour-based content regularization and exploitation of inter-task coherency through a novel cross-task distillation module. Moving forward, we identified key issues in existing domain adaptation algorithms that greatly hinder their practical deployability. Existing approaches demand the coexistence of source and target data, which is highly impractical when data sharing is restricted by proprietary or privacy concerns. To address this, we propose a new setting, termed Source-Free DA, and tailored learning protocols for the dense prediction task of semantic segmentation and for image classification, both with and without category shift. Further, we investigate self-supervised domain adaptation for the challenging task of monocular 3D human pose estimation. The key differentiating factor in our approach is the idea of infusing a model-based structural prior as a means of constraining pose predictions within the realm of natural pose and shape distributions. Towards self-supervised learning, our contribution lies in the effective use of new inter-entity relationships to discern the co-salient foreground appearance, and thereby the corresponding pose, from just a pair of images with diverse backgrounds. Unlike self-supervised solutions that aim for better generalization, self-adaptive solutions aim for target-specific adaptation, i.e., adaptation to deployment-specific environmental attributes. To this end, we propose a self-adaptive method to align the latent spaces of human pose from unpaired image-to-latent and pose-to-latent mappings, by enforcing well-formed non-local latent-space relations available for the unpaired image (or video) and pose (or motion) domains. This idea of non-local relation distillation, as opposed to the broadly employed general contrastive learning techniques, shows significant improvements in self-adaptation performance. Finally, in recent work, we propose a novel way to effectively utilize uncertainty estimation for out-of-distribution (OOD) detection, thus enabling inference-time self-adaptation: the ability to discern OOD samples allows a model to assess when to re-adapt while deployed in a continually changing environment. Such solutions are in high demand for effective real-world deployment across various industries, from virtual and augmented reality to gaming and health-care applications.
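The uncertainty-gated self-adaptation idea from the last paragraph can be sketched as follows, using predictive entropy as a stand-in OOD score; the thesis's uncertainty estimator and adaptation routine are more sophisticated, and `model`, `adapt`, and the threshold are assumed placeholders.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def predictive_entropy(model, x):
    probs = F.softmax(model(x), dim=1)                      # (batch, classes)
    return -(probs * probs.clamp_min(1e-12).log()).sum(1)   # per-sample entropy

def maybe_adapt(model, x, adapt, threshold=1.5):
    # Treat a high-entropy batch as out-of-distribution and trigger re-adaptation.
    if predictive_entropy(model, x).mean() > threshold:
        adapt(model, x)  # deployment-time self-adaptation step (placeholder)
```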
APA, Harvard, Vancouver, ISO, and other styles
28

Silva, Tomé Pereira da. "Desenvolvimento de plataforma móvel para futebol robótico." Master's thesis, 2010. http://hdl.handle.net/1822/65406.

Full text
Abstract:
Master's dissertation in the Integrated Cycle of Studies Leading to the Master's Degree in Industrial Electronics and Computer Engineering.
Nowadays, robotics has numerous practical applications, from providing help to humans to situations where precision and repeatability make robots great working tools in many different areas. In some cases, when the agent works in an uncontrolled environment, it has to adapt to that environment in order to complete its task. This is the most complex situation, and it is also the setting of the main goal of this thesis: the construction of an autonomous robot able to play football according to the RoboCup rules. The work presented here encompasses the design and construction of a robot football player prototype, with software that can control the robot autonomously as well as software to support the competitions in which it operates. The construction of the robot is analyzed in terms of structure, shape, component arrangement, and the materials used; the software was developed from scratch within a new organized structure; finally, several software modules were created, from network communication to hardware control and image processing, as well as the other modules necessary to run a real game. In the last chapters, critical aspects for improvement are described and discussed, as well as future solutions to the problems encountered.
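As an illustration of the kind of image-processing module such a robot needs (the thesis does not publish its code), here is a hedged sketch of color-based ball detection in HSV space, a common RoboCup technique; the orange bounds are assumptions that would need on-site calibration.

```python
import cv2
import numpy as np

def find_ball(frame_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    # Orange ball color range in HSV; these bounds must be calibrated on site.
    mask = cv2.inRange(hsv, np.array([5, 120, 120]), np.array([20, 255, 255]))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, np.ones((5, 5), np.uint8))
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    (x, y), r = cv2.minEnclosingCircle(max(contours, key=cv2.contourArea))
    return int(x), int(y), int(r)  # ball center and radius in pixels
```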
APA, Harvard, Vancouver, ISO, and other styles